Normalizing the output will not affect the shape of ff, so it's generally not necessary.
Probabilistic inference from TensorFlow?
The way I calculate non-linear, non-monotonic "correlation" (dependency? association?) is very simple: I just train a non-linear model between the two variables and see how well they predict each other. You can do this for each pair of features in a dataset, then range-standardize between the highest and lowest RMSE and subtract the values from 1, so you get what looks like a typical correlation matrix.
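A rough sketch of that pairwise cross-prediction idea (the `dependency_matrix` helper and the choice of a random forest as the non-linear model are my own illustration, not a fixed recipe):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

def dependency_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """For each ordered pair (a, b), fit a non-linear model a -> b and
    record the RMSE; then range-standardize and flip so 1 = strongest
    dependency, like a correlation matrix."""
    cols = df.columns
    rmse = pd.DataFrame(0.0, index=cols, columns=cols)
    for a in cols:
        for b in cols:
            if a == b:
                continue
            model = RandomForestRegressor(n_estimators=50, random_state=0)
            model.fit(df[[a]], df[b])
            pred = model.predict(df[[a]])
            rmse.loc[a, b] = np.sqrt(mean_squared_error(df[b], pred))
    # Range-standardize between the highest and lowest RMSE, subtract from 1
    lo, hi = rmse.values.min(), rmse.values.max()
    dep = 1 - (rmse - lo) / (hi - lo)
    for c in cols:
        dep.loc[c, c] = 1.0  # a variable trivially "predicts" itself
    return dep
```

Note this is scale-sensitive (RMSE is not unit-free), and in-sample fits will flatter the scores; out-of-fold predictions would be the more honest variant.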
RFE - Closer to causality than feature importance
from sklearn.feature_selection import RFE
from lightgbm import LGBMClassifier

estimator = LGBMClassifier()
# Eliminate one feature per iteration until 8 remain
selector = RFE(estimator, n_features_to_select=8, step=1)
selector = selector.fit(X, y)
## For now I will do this
# selector.support_ is a boolean mask over the columns (ranking_ == 1),
# so this keeps exactly the selected features
X_df = X.loc[:, selector.support_]
Recursive Feature Elimination makes sense in some regard, as it might eliminate a correlated feature that would have been powerful on its own but shares its predictive power with another. In that sense it is very similar to dropping one of a pair of highly correlated features.
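A quick toy illustration of that shared-power effect (my own example, using random-forest impurity importances rather than LightGBM's): duplicating a strong feature splits its importance between the two near-copies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
signal = rng.normal(size=(1000, 1))
noise = rng.normal(size=(1000, 1))
y = (signal[:, 0] > 0).astype(int)  # target depends only on the signal

# The signal twice (slightly perturbed copy) plus a pure-noise feature
X_dup = np.hstack([signal, signal + 0.01 * rng.normal(size=(1000, 1)), noise])

imp = RandomForestClassifier(random_state=0).fit(X_dup, y).feature_importances_
# The two correlated columns split between them the importance the signal
# would have earned alone; RFE would simply eliminate one of the pair.
```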
This is the trap you stepped into: a predictor is the prediction model, while predictor variables are the inputs to the predictor. Instead of saying predictor variables, you can just say features.
It does not matter if you overfit for feature importance, as long as you are not using it for feature selection.
To be truthful, the prediction exercise is not that interesting. What is more interesting is fitting the data to the response, i.e. training the model and looking at the feature interactions it proposes. Feature importance is only interesting to the extent that it differs from a correlation plot; in that regard, higher-dimensional interactions become visible.
Feature Importance + Selection Draft
There are just 4 worthwhile approaches for measuring feature importance.
- Backward Induction - drop a feature, retrain the model, and measure the change in performance
- Permutation Importance - randomly shuffle a feature's values and measure the change in performance
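The permutation approach can be sketched minimally like this (the synthetic dataset and choice of model are placeholders of my own, not part of the notes):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline = model.score(X_test, y_test)

rng = np.random.default_rng(0)
importances = []
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    # Shuffle one column to break its link with the target
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importances.append(baseline - model.score(X_perm, y_test))
# Larger score drop => more important feature
```

sklearn also ships this as `sklearn.inspection.permutation_importance`, which repeats the shuffle several times and averages.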