Feature Importance

Probabilistic inference from TensorFlow?

RFE - Closer to causality than feature importance

from sklearn.feature_selection import RFE
from lightgbm import LGBMClassifier

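estimator = LGBMClassifier()
# n_features_to_select must be passed by keyword in recent scikit-learn releases
selector = RFE(estimator, n_features_to_select=8, step=1)
selector = selector.fit(X, y)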

# For now I will do this: keep only the columns RFE ranked 1 (the selected ones).
# Assumes X is a pandas DataFrame; avoids the transpose, which left a stray "Value" row.
X_df = X.loc[:, selector.ranking_ == 1]


Recursive feature elimination makes sense in some regard: it might eliminate a correlated feature that could have been powerful but shares its power with another. It is very similar to leaving out highly correlated features.
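A minimal sketch of that behaviour, on synthetic data with a near-duplicate column. Everything here is an assumption for illustration: the column names, the data, and the use of scikit-learn's LogisticRegression in place of LGBMClassifier so the example has no extra dependency. Which of the two correlated columns survives is not guaranteed; the point is only to see how the ranking treats them.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Two informative signals, a near-duplicate of the first, and two noise columns.
signal_a = rng.normal(size=n)
signal_b = rng.normal(size=n)
X_demo = pd.DataFrame({
    "a": signal_a,
    "a_copy": signal_a + rng.normal(scale=0.05, size=n),  # highly correlated with "a"
    "b": signal_b,
    "noise_1": rng.normal(size=n),
    "noise_2": rng.normal(size=n),
})
y_demo = (signal_a + signal_b + rng.normal(scale=0.5, size=n) > 0).astype(int)

# RFE refits the model after every elimination and drops the weakest feature each round.
demo_selector = RFE(LogisticRegression(), n_features_to_select=2, step=1)
demo_selector.fit(X_demo, y_demo)

# ranking_ == 1 marks the surviving features; a larger rank means it was dropped earlier.
ranking = pd.Series(demo_selector.ranking_, index=X_demo.columns).sort_values()
kept = list(X_demo.columns[demo_selector.support_])
print(ranking)
print("kept:", kept)
```

Comparing the ranks of "a" and "a_copy" shows whether the pair was split up, which is the effect a manual correlation filter would also have.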

This is the trap you stepped into: the predictor is the prediction model, and predictor variables are the variables fed to the predictor. Instead of saying predictor variables, you can just say features.

It does not matter if you overfit when measuring feature importance, as long as you are not using it for feature selection.

To be truthful, the prediction exercise is not that interesting. What is more interesting is
fitting the data to the response, i.e. training the model and looking at the feature
interactions it proposes. Feature importance is only interesting to the extent that it
differs from a correlation plot; in that regard, higher-dimensional interactions become

Feature Importance + Selection Draft
There are just 4 worthwhile approaches to measure feature importance. 

  1. Backward Induction - Drop Feature Retrain Model - Identify Performance
  1. Permutation Importance - Randomly shuffle feature instances - Identify Performance
  1. SHAP Values - Features' Average Contribution to Outcome - More Consistent