Feature Importance Chart in neural network using Keras in Python

I am using Python (3.6), Anaconda (64 bit), and Spyder (3.1.2). I have already set up a neural network model using Keras (2.0.6) for a regression problem (one response, 10 variables). I was wondering how I can generate a feature importance chart like so:
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor

def base_model():
    model = Sequential()
    model.add(Dense(200, input_dim=10, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

clf = KerasRegressor(build_fn=base_model, epochs=100, batch_size=5, verbose=0)
clf.fit(X_train, Y_train)

I was recently looking for the answer to this question and found something that was useful for what I was doing, so I thought it would be helpful to share. I ended up using the permutation importance module from the eli5 package. It works most easily with a scikit-learn model. Luckily, Keras provides a wrapper for sequential models. As shown in the code below, using it is very straightforward.
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
import eli5
from eli5.sklearn import PermutationImportance

def base_model():
    model = Sequential()
    ...
    return model

X = ...
y = ...

my_model = KerasRegressor(build_fn=base_model, **sk_params)
my_model.fit(X, y)

perm = PermutationImportance(my_model, random_state=1).fit(X, y)
eli5.show_weights(perm, feature_names=X.columns.tolist())
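To get an actual chart out of this (show_weights renders an HTML table in a notebook), one option is to plot the feature_importances_ attribute of the fitted PermutationImportance object directly. A minimal sketch, assuming X is a pandas DataFrame and matplotlib is available:
import matplotlib.pyplot as plt

# mean importance per feature across the permutation rounds
importances = perm.feature_importances_
feature_names = X.columns.tolist()

plt.barh(feature_names, importances)
plt.xlabel('Permutation importance (drop in score)')
plt.tight_layout()
plt.show()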

This is a relatively old post with relatively old answers, so I would like to offer another suggestion: using SHAP to determine feature importance for your Keras models. SHAP supports both 2d and 3d arrays, whereas eli5 currently only supports 2d arrays (so if your model uses layers which require 3d input, like LSTM or GRU, eli5 will not work).
Here is the link to an example of how SHAP can plot the feature importance for your Keras models, but in case the link ever breaks, some sample code is provided below as well (taken from said link):
import shap

# load your data here, e.g. X and y
# create and fit your model here

# load JS visualization code to notebook
shap.initjs()

# explain the model's predictions using SHAP
# (TreeExplainer targets tree-based models; the same syntax works for XGBoost,
# LightGBM, CatBoost, scikit-learn and spark models)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# visualize the first prediction's explanation (use matplotlib=True to avoid Javascript)
shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :])

# bar chart of mean absolute SHAP values per feature
shap.summary_plot(shap_values, X, plot_type="bar")
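Since TreeExplainer targets tree-based models, for a Keras network you would swap in shap.DeepExplainer (or shap.KernelExplainer as a model-agnostic fallback). A rough sketch, assuming X is a pandas DataFrame and model is the fitted Keras model; the background sample size of 100 is an arbitrary choice:
import numpy as np
import shap

# DeepExplainer needs a background dataset; a small sample of the training data is typical
background = X.values[np.random.choice(X.shape[0], 100, replace=False)]
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X.values)

# older SHAP versions return a list with one array per model output
vals = shap_values[0] if isinstance(shap_values, list) else shap_values
shap.summary_plot(vals, X, plot_type="bar")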

At the moment Keras doesn't provide any functionality to extract the feature importance.
You can check this previous question:
Keras: Any way to get variable importance?
or the related Google Group thread: Feature importance
Spoiler: In the Google Group thread, someone announced an open-source project to solve this issue.

Related

XGBoost model quantization - Sklearn model quantization

I am looking for solutions to quantize sklearn models, specifically XGBoost models.
I did find solutions to quantize PyTorch and TensorFlow models, but nothing for sklearn.
Solutions tried:
I converted the sklearn model to ONNX and then tried to quantize the ONNX model, but that didn't work either. Here is the link to the bug.
Any pointers or solutions that can be shared would be of great help.
Someone has answered the question in the bug report you linked.
Try not adding the final ZipMap node, via the 'zipmap' option:
onx = convert_sklearn(clr, initial_types=initial_type,
                      options={'zipmap': False})
I'd be interested to know if that works for you.
BTW, you can use onnxmltools to convert an XGBoost model to ONNX, according to this.
Here is the sample code:
import onnx
import onnxmltools
from onnxmltools.convert.common.data_types import FloatTensorType
from xgboost import XGBClassifier

clf = XGBClassifier()
# fit the classifier...

onnx_model_path = "xgb_classifier.onnx"
# num_features is the number of input features the model expects
initial_type = [('float_input', FloatTensorType([None, num_features]))]
onnx_model = onnxmltools.convert.convert_xgboost(clf, initial_types=initial_type, target_opset=10)
onnx.save(onnx_model, onnx_model_path)
Note that:
The model must be trained using the scikit-learn API of xgboost.
The training data passed to XGBClassifier().fit() must not have feature names associated with it. For example, if your training data is a DataFrame called df, which has column names, you will need to use a representation without column names (i.e. df.values) when training, as shown in the sketch below.
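To make the note about feature names concrete, a short sketch (df and y are hypothetical training data):
from xgboost import XGBClassifier

# df is a pandas DataFrame with named columns; y is the target
clf = XGBClassifier()
# train on the bare numpy array so no feature names get attached to the booster
clf.fit(df.values, y)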

SVC Classifier to Keras CNN with probabilities or confidence to distinguish untrained classes

This question is pretty similar to this one and based on this post over GitHub, in the sense that I am trying to convert an SVM multiclass classification model (e.g., using sklearn) to a Keras model.
Specifically, I am looking for a way of retrieving probabilities (similar to SVC's probability=True) or a confidence value at the end, so that I can define some sort of threshold and be able to distinguish between trained classes and non-trained ones. That is, if I train my model with 3 or 4 classes but then feed it a 5th class it wasn't trained on, it will still output some prediction, even if totally wrong. I want to avoid that in some way.
I got the following working reasonably well, but it relies on picking the maximum value at the end (argmax), which I would like to avoid:
model = Sequential()
model.add(Dense(30, input_shape=(30,), activation='relu', kernel_initializer='he_uniform'))
# output classes
model.add(Dense(3, kernel_regularizer=regularizers.l2(0.1)))
# the activation is linear by default, which works; softmax makes the accuracy get stuck
# at 33% if targeting 3 classes, or 25% if targeting 4
# model.add(Activation('softmax'))
model.compile(loss='categorical_hinge', optimizer=keras.optimizers.Adam(lr=1e-3), metrics=['accuracy'])
Any ideas on how to tackle this untrained-class problem? Something like Platt scaling or temperature scaling would work, if I can still save the model as ONNX afterwards.
As I suspected, I got softmax to work by scaling the features (input) of the model. No need for stop gradient or anything. I was specifically using really big numbers which, despite training well, were preventing softmax (logistic regression) from working properly. The scaling of the features can be done, for instance, through the following code:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
By doing this, the SVM-like model built with Keras outputs probabilities as originally intended.
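With probabilities available, one way to handle the untrained-class problem is a simple confidence threshold on the softmax output. A rough sketch; the 0.8 cut-off and the names X_new and scaler are assumptions, and the threshold should be tuned on validation data:
import numpy as np

probs = model.predict(scaler.transform(X_new))   # softmax probabilities per class
confidence = probs.max(axis=1)                   # probability of the winning class
threshold = 0.8                                  # hypothetical cut-off
pred = np.where(confidence >= threshold, probs.argmax(axis=1), -1)  # -1 marks "unknown"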

How to obtain the Tensorflow code version of a NN built in Keras?

I have been working with Keras for a week or so. I know that Keras can use either TensorFlow or Theano as a backend. In my case, I am using TensorFlow.
So I'm wondering: is there a way to write a NN in Keras, and then print out the equivalent version in TensorFlow?
MVE
For instance suppose I write
# create sequential model
model = Sequential()
# add layers
model.add(Dense(100, input_dim=10, activation='relu'))
model.add(Dense(1, activation='linear'))
# compile model
model.compile(optimizer='adam', loss='mse')
# fit
model.fit(Xtrain, ytrain, epochs=100, batch_size=32)
# predict
ypred = model.predict(Xtest, batch_size=32)
# evaluate
result = model.evaluate(Xtest, ytest)
This code might be wrong, since I just started, but I think you get the idea.
What I want to do is write down this code, run it (or not even, maybe!) and then have a function or something that will produce the TensorFlow code that Keras has written to do all these calculations.
First, let's clarify some of the language in the question. TensorFlow (and Theano) use computational graphs to perform tensor computations. So, when you ask if there is a way to "print out the equivalent version" in Tensorflow, or "produce TensorFlow code," what you're really asking is, how do you export a TensorFlow graph from a Keras model?
As the Keras author states in this thread,
When you are using the TensorFlow backend, your Keras code is actually building a TF graph. You can just grab this graph.
Keras only uses one graph and one session.
However, he links to a tutorial whose details are now outdated. But the basic concept has not changed.
We just need to:
Get the TensorFlow session
Export the computation graph from the TensorFlow session
Do it with Keras
The keras_to_tensorflow repository contains a short example, in an iPython notebook, of how to export a model from Keras for use in TensorFlow. It essentially does this through TensorFlow directly. It isn't a clearly written example, but I'm throwing it out there as a resource.
Do it with TensorFlow
It turns out we can actually get the TensorFlow session that Keras is using from TensorFlow itself, using the tf.contrib.keras.backend.get_session() function. It's pretty simple to do - just import and call. This returns the TensorFlow session.
Once you have the TensorFlow session variable, you can use the SavedModelBuilder to save your computational graph (guide + example to using SavedModelBuilder in the TensorFlow docs). If you're wondering how the SavedModelBuilder works and what it actually gives you, the SavedModelBuilder Readme in the Github repo is a good guide.
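For illustration, a minimal sketch of those two steps with the TF 1.x API (the export directory is a placeholder, and no signature map is defined, which a real serving setup would need):
import tensorflow as tf
from keras import backend as K

# 1. get the TensorFlow session Keras is using
sess = K.get_session()   # tf.contrib.keras.backend.get_session() also works

# 2. export the graph and variables with SavedModelBuilder
builder = tf.saved_model.builder.SavedModelBuilder('./exported_model')
builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
builder.save()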
P.S. - If you are planning on heavy usage of TensorFlow + Keras in combination, have a look at the other modules available in tf.contrib.keras
So you want to use a different function for your neurons instead of WX + b. Well, in TensorFlow you calculate this product explicitly, so for example you write
y_ = tf.matmul(X, W)
You simply have to write your own formula and let the network learn. It should not be difficult to implement.
In addition, what you are trying to do (according to the paper you link) is called batch normalization and is relatively standard. The idea is that you normalize your intermediate steps (in the different layers). Check, for example, the batch normalization paper https://arxiv.org/abs/1502.03167 or these course notes: https://bcourses.berkeley.edu/files/66022277/download?download_frd=1&verifier=oaU8pqXDDwZ1zidoDBTgLzR8CPSkWe6MCBKUYan7
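If you end up doing this on the Keras side, batch normalization is available as a ready-made layer; a minimal sketch (layer sizes are arbitrary):
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization

model = Sequential()
model.add(Dense(100, input_dim=10, activation='relu'))
model.add(BatchNormalization())   # normalizes the activations of the previous layer
model.add(Dense(1))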
Hope that helps,
Umberto

Using Keras serialized model with dropout in pyspark

I have several neural networks built using Keras that I have so far used mostly in Jupyter. I often save scikit-learn models with joblib and Keras models with json + hdf5, and use them in other notebooks without issue.
I made a Python Spark application that can make use of those serialized models in cluster mode. The joblib models are working fine; however, I encountered an issue with Keras.
Here is the model used in notebook and pyspark:
def build_gru_model():
    model = Sequential()
    model.add(Embedding(max_nb_words, 128, input_length=max_sequence_length, dropout=0.2))
    model.add(GRU(128, dropout_W=0.2, dropout_U=0.2))
    model.add(Dense(2, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
Both are called the same way:
preds = model.predict_proba(data, verbose=0)
However, only in Spark do I get the error:
MissingInputError: ("An input of the graph, used to compute DimShuffle{x,x,x,x}(keras_learning_phase), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", keras_learning_phase)
I've done the mandatory search and found: https://github.com/fchollet/keras/issues/2430 which points to https://keras.io/getting-started/faq/
If I indeed remove dropout from my model, it works. However, I fail to understand how to implement something that would allow me to keep dropout during the training phase, as described in the FAQ.
Based on the model code, how one would accomplish this?
You can try to put the following (before your prediction):
import keras.backend as K
K.set_learning_phase(0)
This should set your learning phase to 0 (test time).
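Alternatively, the FAQ approach referenced in the question is to build a backend function that takes the learning phase as an explicit input, so you can decide per call whether dropout is active. A sketch, assuming model is the compiled GRU model:
import keras.backend as K

# function from the model input (plus learning phase flag) to the model output
get_output = K.function([model.layers[0].input, K.learning_phase()],
                        [model.layers[-1].output])

preds_test = get_output([data, 0])[0]    # 0 = test mode, dropout disabled
preds_train = get_output([data, 1])[0]   # 1 = train mode, dropout active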

Multiple regression output nodes in tensorflow learn

I am relatively new to TensorFlow and want to use the DNNRegressor from tf.contrib.learn for a regression task. But instead of one output node, I would like to have several (let's say ten, for example).
How can I configure my regressor to adjust many output nodes to fit my needs?
My question is related to the following ones already asked on SO, but there seems to be no working answer (I am using TensorFlow version 0.11)
skflow regression predict multiple values
Multiple target columns with SkFlow TensorFlowDNNRegressor
It seems that using tflearn is the other choice.
Update: I realize we should use Keras as a well-developed API for TensorFlow + Theano.
Using tflearn this works:
net = tfl.input_data(shape=[None, n_features1, n_features2], name='input')
net = tfl.fully_connected(net, 128, activation='relu')
net = tfl.fully_connected(net, n_features, activation='linear')
net = tfl.regression(net, batch_size=batch_size, loss='mean_square', name='target')
Replace the single fully connected layer of 128 nodes here with whatever network architecture you want. And don't forget to choose the loss function appropriate to your problem, e.g., cross-entropy for classification.
Versions: Python 2.7.11, TensorFlow 0.10.0rc0, TFLearn 0.2.1.
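For reference, later versions of tf.contrib.learn expose a label_dimension argument on DNNRegressor that covers the multi-output case directly; a hedged sketch (this may not be available in 0.11, and n_features is assumed):
import tensorflow as tf

feature_columns = [tf.contrib.layers.real_valued_column("x", dimension=n_features)]
regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[128, 128],
                                          label_dimension=10)   # ten output nodes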
