
Feature Importance and Ensemble Methods: a new perspective


Ensemble methods deliver excellent predictive performance but are hard to interpret. Feature importance should not only count how many times a feature is used across the weak learners, but also measure how much that feature contributes to the result. A detailed example and implementation are provided in a Jupyter notebook in Python for "xgboost", the extreme gradient boosting library.


I - Feature importance in ensemble algorithms - state of the art

  1. Feature importance in sklearn/xgboost: basically counts the occurrences of a feature in all the weak learners
  2. Construction of the trees in xgboost: if the trees are deep enough, every feature is going to be used
  3. Global feature importance is misleading: a given feature might be critical for one subpopulation but completely irrelevant for another (e.g. multi-class classification)
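To make the counting criticism concrete, here is a minimal sketch in plain Python on two hand-made trees. The dict layout, the feature names `income` and `zip_code`, and the gain values are all hypothetical, not output from xgboost. Counting splits, which is what xgboost's `Booster.get_score(importance_type="weight")` reports, ranks `zip_code` first because it is split on three times, while summing the gain of those splits (the idea behind the `"total_gain"` importance type) ranks `income` first.

```python
from collections import Counter

# Hypothetical toy ensemble: each tree is a nested dict. Internal nodes hold
# the split feature and the split's gain (loss reduction); leaves hold a value.
# These trees are hand-made for illustration, not trained by xgboost.
TREES = [
    {"feature": "income", "gain": 50.0,
     "left": {"feature": "zip_code", "gain": 0.5,
              "left": {"leaf": -0.4}, "right": {"leaf": -0.3}},
     "right": {"feature": "zip_code", "gain": 0.4,
               "left": {"leaf": 0.3}, "right": {"leaf": 0.4}}},
    {"feature": "zip_code", "gain": 0.6,
     "left": {"leaf": -0.05}, "right": {"leaf": 0.05}},
]


def walk(node):
    """Yield every internal (split) node of a tree, depth first."""
    if "leaf" in node:
        return
    yield node
    yield from walk(node["left"])
    yield from walk(node["right"])


def split_count_importance(trees):
    """Number of splits using each feature across all trees."""
    return Counter(n["feature"] for t in trees for n in walk(t))


def total_gain_importance(trees):
    """Sum of the gains of all splits using each feature."""
    totals = Counter()
    for t in trees:
        for n in walk(t):
            totals[n["feature"]] += n["gain"]
    return totals


print(split_count_importance(TREES).most_common())
print(total_gain_importance(TREES).most_common())
```

The counts ignore that the `zip_code` splits barely separate the leaves, which is exactly the weakness described in item 1.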

II - Real feature importance in xgboost

  1. Prediction influence: the first splits influence the prediction more than the last splits, so the importance of a feature must be weighted by the discrimination it provides
  2. Point-to-point feature importance: following the path of a given prediction, it is possible to weigh the importance of every used feature
  3. A relevant assessment of feature importance: explanation of a given prediction, and aggregation on a set of data points
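The path-following idea of items 1 to 3 can be sketched in plain Python on a single hand-made tree. This is a minimal sketch under stated assumptions, not the notebook's implementation: every node is assumed to store an expected value (with xgboost this must be derived, for instance as a cover-weighted average of the leaves below, since only leaves store values), and the change in expected value at each split is credited to the split feature. This is the same idea as the approximation xgboost exposes through `Booster.predict(..., pred_contribs=True, approx_contribs=True)`. The aggregation over several points by mean absolute contribution is one possible choice among several.

```python
from collections import defaultdict

# Hypothetical tree. Every node carries "value", the expected prediction at
# that node (assumed: cover-weighted mean of the leaves below, equal covers
# here). A sample goes left when x[feature] < threshold.
TREE = {
    "feature": "age", "threshold": 30, "value": 0.0,
    "left": {"feature": "income", "threshold": 50, "value": -0.4,
             "left": {"value": -0.6}, "right": {"value": -0.2}},
    "right": {"feature": "income", "threshold": 80, "value": 0.4,
              "left": {"value": 0.3}, "right": {"value": 0.5}},
}


def path_contributions(tree, x):
    """Attribute the prediction for sample x to the features on its path.

    Each split moves the expected value from the parent to the child, and
    that change is credited to the split feature. Root value plus all
    contributions equals the leaf value, i.e. the prediction.
    """
    contrib = defaultdict(float)
    node = tree
    while "feature" in node:
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        child = node[branch]
        contrib[node["feature"]] += child["value"] - node["value"]
        node = child
    return dict(contrib), node["value"]


def mean_abs_importance(tree, samples):
    """Aggregate point-to-point contributions over a set of samples."""
    totals = defaultdict(float)
    for x in samples:
        contrib, _ = path_contributions(tree, x)
        for feature, c in contrib.items():
            totals[feature] += abs(c)
    return {f: t / len(samples) for f, t in totals.items()}


contrib, pred = path_contributions(TREE, {"age": 25, "income": 60})
print(contrib, pred)
```

For the sample above, `age` moves the expected value from 0.0 to -0.4 and `income` moves it back up by 0.2, so the first split carries the larger share of the explanation, in line with item 1.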

III - Implementation and examples

  1. Point-to-point feature importance illustration and implementation explanation
  2. Evolution of feature importance with respect to learning iterations
  3. Cancellation of noisy variables
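Because boosting adds up the outputs of its trees, point-to-point contributions can be summed tree by tree, and the importance recomputed after each round gives its evolution with respect to learning iterations. The sketch below is a hypothetical toy under stated assumptions, not the notebook's code: the three trees, the feature names `signal` and `noise`, and the use of mean absolute contribution are all illustrative choices. It shows one mechanism by which a noisy variable's importance can cancel out: its contributions in two later trees have opposite signs for the same sample.

```python
from collections import defaultdict

# Hypothetical boosted ensemble. Each node stores its expected output
# ("value"); samples go left when x[feature] < threshold.
TREES = [
    {"feature": "signal", "threshold": 0, "value": 0.0,
     "left": {"value": -1.0}, "right": {"value": 1.0}},
    {"feature": "noise", "threshold": 0, "value": 0.0,
     "left": {"value": -0.2}, "right": {"value": 0.2}},
    {"feature": "noise", "threshold": 0, "value": 0.0,
     "left": {"value": 0.2}, "right": {"value": -0.2}},
]
SAMPLES = [{"signal": 1, "noise": 1}, {"signal": -1, "noise": -1}]


def path_contributions(tree, x):
    """Credit each split's change in expected value to its split feature."""
    contrib = defaultdict(float)
    node = tree
    while "feature" in node:
        child = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
        contrib[node["feature"]] += child["value"] - node["value"]
        node = child
    return contrib


def importance_by_iteration(trees, samples):
    """Mean absolute contribution per feature after each boosting round.

    A sample's contributions are summed over the first k trees before the
    absolute value is taken, so opposite-signed contributions cancel.
    """
    running = [defaultdict(float) for _ in samples]
    history = []
    for tree in trees:
        for acc, x in zip(running, samples):
            for feature, c in path_contributions(tree, x).items():
                acc[feature] += c
        features = {f for acc in running for f in acc}
        history.append({f: sum(abs(acc[f]) for acc in running) / len(samples)
                        for f in features})
    return history


for k, imp in enumerate(importance_by_iteration(TREES, SAMPLES), start=1):
    print(k, imp)
```

A plain split count over the same three trees would rank `noise` (two splits) above `signal` (one split), whereas the cumulative contributions of `noise` fall back to zero after the third round.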

IV - Limits and ways forward

  1. A word on correlated variables
  2. Is there a compromise between performance and interpretation?

