I originally trained the model with sklearn and then converted it to ONNX. I now want to get the sklearn model back. The only related thing I can find online is converting ONNX to TF-Lite.
Thanks in advance!
I am currently trying to create a TensorFlow DNN model with a multilabel target variable, and while my code has run without issues so far, the imbalanced nature of the dataset I'm working with is causing problems.
As per recommendations in Keras' documentation, I've applied an initial bias to the model. I've also tried to enable the class_weight parameter in the model's fit function, and this is where I'm stuck.
https://github.com/tensorflow/tensorflow/issues/41448
There seems to be a known bug in this method, as described in the GitHub issue above, and my attempts at creating a workaround haven't been successful. I'd appreciate any advice, because I'm honestly at a loss. I'm currently running TensorFlow 2.4.
You are using a slightly old version of TensorFlow. This worked for me on a multiclass dataset using TensorFlow 2.7 and Keras 2.7:
from sklearn.utils.class_weight import compute_class_weight
import numpy as np

# Compute one weight per class, inversely proportional to class frequency
class_weights = compute_class_weight(class_weight="balanced",
                                     classes=np.unique(y_train),
                                     y=y_train)

model.fit(
    ...,
    class_weight=dict(enumerate(class_weights))
)
The values of y_train must be integers in the range [0, NUMBER_CLASSES - 1] for this code to work correctly. You can accomplish this using LabelEncoder.
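For example, a minimal sketch of the encoding step (assuming y_train starts out as raw, possibly non-numeric labels):

from sklearn.preprocessing import LabelEncoder

# Map arbitrary labels (e.g. strings) to integers 0..NUMBER_CLASSES - 1
le = LabelEncoder()
y_train = le.fit_transform(y_train)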
Alternatively, you can use sample_weight instead of class_weight to accomplish the same thing (in fact, Keras internally converts class_weight to sample_weight). Both parameters are documented in Model.fit's API reference.
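For instance, a hedged sketch of the sample_weight route, using sklearn's compute_sample_weight helper:

from sklearn.utils.class_weight import compute_sample_weight

# One weight per training example, derived from its class frequency
sample_weights = compute_sample_weight(class_weight="balanced", y=y_train)

model.fit(X_train, y_train, sample_weight=sample_weights)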
Other easy-to-implement and effective methods to combat data imbalance are oversampling and undersampling, which have a similar effect to using class_weight. You can use them in case you have problems using class_weight or sample_weight.
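As an illustration, a minimal oversampling sketch with the imbalanced-learn package (an assumption: imblearn is installed, as it is not part of scikit-learn itself):

from imblearn.over_sampling import RandomOverSampler

# Duplicate minority-class samples until all classes are balanced
ros = RandomOverSampler(random_state=42)
X_resampled, y_resampled = ros.fit_resample(X_train, y_train)

model.fit(X_resampled, y_resampled)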
I have an H2O AutoML-generated GBM model in Python. I wonder if we can convert this into a standard sklearn model so that I can fit it into my ecosystem of other sklearn models.
I can see the model properties as below when I print the model.
If direct conversion from H2O to sklearn is not feasible, is there a way we can use the above properties to recreate the GBM in sklearn? The terminology looks slightly different from the standard sklearn GBM parameters.
Thanks in advance.
This will be tricky, since the packages are quite different: sklearn is based on Python/Cython/C, while H2O runs on Java. The underlying algorithm implementations may also differ. However, you can try matching/translating your hyperparameters between the two, since they are similar.
Additionally, it would be a good idea to have a library-agnostic ecosystem so that you can interchange models from different frameworks.
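As a purely illustrative sketch (the adapter class and its name are hypothetical, not a real API), a thin wrapper can give an H2O model the same predict() surface your sklearn models expose:

import h2o

class H2OModelAdapter:
    # Hypothetical adapter exposing a sklearn-style predict()
    def __init__(self, h2o_model):
        self.h2o_model = h2o_model

    def predict(self, X):
        # Convert a pandas DataFrame to an H2OFrame, predict,
        # and hand back a plain pandas DataFrame
        frame = h2o.H2OFrame(X)
        return self.h2o_model.predict(frame).as_data_frame()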
Is it possible to extract a native XGBoost model file from an H2OXGBoostEstimator model in Python and read it with the raw XGBoost Python API? Thanks!
You can try these two "H2O-to-XGBoost" methods to extract the XGBoost hyperparameters and DMatrix from a trained H2O model, which (according to the docs) will give you exactly the same native XGBoost Python model.
import xgboost as xgb

# Extract the native XGBoost hyperparameters from the trained H2O model
nativeXGBoostParam = h2oModelD.convert_H2OXGBoostParams_2_XGBoostParams()

# Convert the H2OFrame (features myX, target y) to an XGBoost DMatrix
nativeXGBoostInput = data.convert_H2OFrame_2_DMatrix(myX, y, h2oModelD)

# Train a native XGBoost model with the extracted parameters
nativeModel = xgb.train(params=nativeXGBoostParam[0],
                        dtrain=nativeXGBoostInput,
                        num_boost_round=nativeXGBoostParam[1])
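To sanity-check the conversion (an illustrative extra step, not from the docs), you can score the native model on the converted DMatrix and compare against the H2O model's predictions:

# Predictions from the extracted native XGBoost model
native_preds = nativeModel.predict(nativeXGBoostInput)
print(native_preds[:5])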
More info:
docs
example
I am using Python (3.6), Anaconda (64-bit), and Spyder (3.1.2). I have already set up a neural network model using Keras (2.0.6) for a regression problem (one response, 10 variables). I was wondering how I can generate a feature importance chart for a model like this:
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor

def base_model():
    model = Sequential()
    model.add(Dense(200, input_dim=10, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

clf = KerasRegressor(build_fn=base_model, epochs=100, batch_size=5, verbose=0)
clf.fit(X_train, Y_train)
I was recently looking for the answer to this question and found something useful for what I was doing, so I thought it would be helpful to share. I ended up using the permutation importance module from the eli5 package. It works most easily with scikit-learn models, and luckily Keras provides a wrapper for sequential models. As the code below shows, using it is very straightforward.
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasRegressor
import eli5
from eli5.sklearn import PermutationImportance

def base_model():
    model = Sequential()
    ...
    return model

X = ...
y = ...

my_model = KerasRegressor(build_fn=base_model, **sk_params)
my_model.fit(X, y)

# Shuffle each feature in turn and measure the drop in model score
perm = PermutationImportance(my_model, random_state=1).fit(X, y)
eli5.show_weights(perm, feature_names=X.columns.tolist())
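Note that show_weights renders HTML, so it is most useful in a notebook. As a hedged alternative for a plain script, the same explanation can be formatted as text:

# Print the permutation importances as plain text outside a notebook
print(eli5.format_as_text(eli5.explain_weights(perm, feature_names=X.columns.tolist())))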
This is a relatively old post with relatively old answers, so I would like to offer another suggestion: using SHAP to determine feature importance for your Keras models. SHAP supports both 2D and 3D arrays, whereas eli5 currently only supports 2D arrays (so if your model uses layers that require 3D input, like LSTM or GRU, eli5 will not work).
Here is the link to an example of how SHAP can plot the feature importance for your Keras models, but in case it ever becomes broken, some sample code is provided below as well (taken from said link):
import shap

# load your data here, e.g. X and y
# create and fit your model here

# load the JS visualization code into the notebook
shap.initjs()

# explain the model's predictions using SHAP
# (the same syntax works for LightGBM, CatBoost, scikit-learn and Spark models)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# visualize the first prediction's explanation (use matplotlib=True to avoid JavaScript)
shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :])

# bar chart of mean absolute SHAP value per feature
shap.summary_plot(shap_values, X, plot_type="bar")
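One caveat: TreeExplainer targets tree ensembles, so for the Keras network itself you would typically reach for shap.DeepExplainer instead. A minimal hedged sketch (assuming X is a NumPy array and model is a fitted Keras model):

import shap

# Use a small background sample to estimate the expected model output
background = X[:100]
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X[:10])

# Same summary bar chart as above, for the explained subset
shap.summary_plot(shap_values, X[:10], plot_type="bar")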
At the moment, Keras doesn't provide any functionality to extract feature importance.
You can check this previous question:
Keras: Any way to get variable importance?
or the related Google Group thread: Feature importance
Spoiler: in the Google Group thread, someone announced an open-source project to solve this issue.