Why does Random Forest variable importance not sum to 100%?

The randomForest package in R has the importance() function to get both node impurity and mean permutation importance for variables. Why, when calculating mean permutation importance, do the results not sum to 100%?



Here's a simple reproducible example:



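library(randomForest)
data(iris)
# the argument that sets the number of trees is ntree, not ntrees
iris.rf <- randomForest(Species ~ ., importance = TRUE, data = iris, ntree = 1000)
imp <- importance(iris.rf, type = 1)  # type = 1: permutation importance
sum_imp <- sum(imp)
sum_imp # != 100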


Thanks

  • Why do you assume it should sum to 1? I see no reason for that belief.
    – Firebug
    Sep 8 at 18:31

  • See Measures of variable importance in random forests
    – Firebug
    Sep 8 at 21:22

r random-forest importance

asked Sep 8 at 14:02
Micha

1 Answer

As far as I can tell, variable importance is measuring either: a) the increase in out-of-bag prediction error when the variable's values are permuted, or b) the total decrease in node impurity from the splits made on the variable. (Both are averaged over all trees in the forest.) For the permutation measure, importance() also divides by the standard error by default (scale = TRUE). Neither of these is a probability or a share of some fixed total, so there's no reason they should add up to 100%.
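
To see that the measures described above live on different scales, here is a minimal sketch that reuses the iris model from the question. The seed is an arbitrary choice, only there so repeated runs build the same forest; type and scale are arguments of importance() in the randomForest package.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so repeated runs give the same forest
iris.rf <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 1000)

# Permutation importance as returned by default: the mean decrease in
# out-of-bag accuracy divided by its standard error (scale = TRUE).
imp_scaled <- importance(iris.rf, type = 1)

# The same measure without that division: the raw mean decrease in accuracy.
imp_raw <- importance(iris.rf, type = 1, scale = FALSE)

# Node-impurity importance: mean decrease in the Gini index.
imp_gini <- importance(iris.rf, type = 2)

# Three different units, hence three different totals.
c(scaled = sum(imp_scaled), raw = sum(imp_raw), gini = sum(imp_gini))
```

None of the three totals is constrained to any particular value, and they will shift somewhat from one forest to the next.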



You can, of course, divide by the sum of all importances to get a percentage, but I think that would create confusion: you now have a percentage of what exactly?
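
For what it's worth, that division is a one-liner. The sketch below uses a hypothetical helper, normalize_importance, and made-up importance values (not output from a real forest). Permutation importances can come out negative, in which case shares of the total stop making sense, so the helper refuses negative input.

```r
# Hypothetical helper: rescale non-negative importances to percentages of their sum.
normalize_importance <- function(imp) {
  stopifnot(is.numeric(imp), all(imp >= 0), sum(imp) > 0)
  100 * imp / sum(imp)
}

# Made-up values for illustration only, not output from a real forest.
imp <- c(Sepal.Length = 6, Sepal.Width = 4, Petal.Length = 30, Petal.Width = 40)
share <- normalize_importance(imp)
share
sum(share)  # 100 by construction
```

With a fitted forest you could pass in a column of the importance matrix instead, for example normalize_importance(importance(iris.rf, type = 2)[, 1]). The result only says how the importances compare with one another; it is not a percentage of accuracy or of variance explained.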



(Welcome to the site!)

  • Thanks for the welcome. I expect I'll be back :-)
    – Micha
    Sep 9 at 6:50

answered Sep 8 at 14:43
Wayne