XGBoost model quantization - Sklearn model quantization - python

I am looking for solutions to quantize sklearn models, specifically XGBoost models.
I did find solutions to quantize PyTorch and TensorFlow models, but nothing for sklearn.
Solutions tried:
I converted the sklearn model to ONNX and then tried to quantize the ONNX model, but that didn't work either. Here is the link to the bug.
Any pointers or solutions would be of great help.

Someone has already answered the question in the bug you linked.
Try not adding the final ZipMap node, by using the 'zipmap' option:
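from skl2onnx import convert_sklearn
# clr: the fitted scikit-learn classifier; initial_type: its input signature
onx = convert_sklearn(clr, initial_types=initial_type,
                      options={'zipmap': False})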
I'd be interested to know whether that works for you.
By the way, you can use onnxmltools to convert an XGBoost model to ONNX, as described here.
Sample code:
import onnx
import onnxmltools
from onnxmltools.convert.common.data_types import FloatTensorType
from xgboost import XGBClassifier
clf = XGBClassifier()
# fit the classifier...
onnx_model_path = "xgb_classifier.onnx"
# num_features: the number of input columns the classifier was trained on
initial_type = [('float_input', FloatTensorType([None, num_features]))]
onnx_model = onnxmltools.convert.convert_xgboost(clf, initial_types=initial_type, target_opset=10)
onnx.save(onnx_model, onnx_model_path)
Note that:
The model must be trained using xgboost's scikit-learn API.
The training data passed to XGBClassifier().fit() must not have feature names associated with it. For example, if your training data is a DataFrame called df, which has column names, you will need to use a representation without column names (i.e. df.values) when training.

Related

Converting onnx model to sklearn

I originally trained the model with sklearn and then converted it to ONNX. I now want to get the sklearn model back. The only related thing I can find online is converting ONNX to TF Lite.
Thanks in advance!

Are there any alternate ways to apply class weights to tensorflow neural networks?

I am currently trying to create a TensorFlow DNN model with a multilabel target variable, and while my code hasn't had any problems so far, the imbalanced nature of the dataset I'm working with has caused a few issues.
As per the recommendations in Keras' documentation, I've applied an initial bias to the model. I've also tried to use the class_weight parameter of the model's fit function, and this is where I'm stuck:
https://github.com/tensorflow/tensorflow/issues/41448
There seems to be a known bug in this method, as seen in this GitHub link, and my attempts at creating a workaround haven't been successful at all. I'd appreciate any advice on a workaround, because to be honest I'm at a loss myself. I'm currently running TensorFlow 2.4.
You are using a slightly old version of TensorFlow. This worked for me on a multiclass dataset using TensorFlow 2.7 and Keras 2.7:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
class_weights = compute_class_weight(class_weight="balanced",
                                     classes=np.unique(y_train),
                                     y=y_train)
model.fit(
    ...,
    class_weight=dict(enumerate(class_weights))
)
The values of y_train must be integers in the range [0, NUMBER_CLASSES - 1] for this code to work correctly. You can accomplish this using LabelEncoder.
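A small sketch of the LabelEncoder step, assuming string labels; raw_labels is a hypothetical stand-in for your own targets. LabelEncoder assigns the integers 0..n_classes-1 in sorted label order, so the positions of compute_class_weight's output line up with the class indices Keras expects.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical string labels standing in for the real training targets
raw_labels = np.array(["cat", "dog", "cat", "cat", "bird", "dog"])

# Map labels to integers 0..n_classes-1 (classes are sorted: bird, cat, dog)
encoder = LabelEncoder()
y_train = encoder.fit_transform(raw_labels)

class_weights = compute_class_weight(class_weight="balanced",
                                     classes=np.unique(y_train), y=y_train)
# Keras expects a {class_index: weight} dictionary
class_weight = dict(enumerate(class_weights))
```

Keep the fitted encoder around so that predicted class indices can be mapped back with encoder.inverse_transform.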
Alternatively, you can use sample_weight instead of class_weight to accomplish the same thing (in fact, Keras internally converts class_weight to sample_weight). Here you can find the documentation about these parameters.
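To make the class_weight/sample_weight relationship concrete, here is a sketch in plain NumPy. The "balanced" heuristic is n_samples / (n_classes * count of each class), as documented by scikit-learn; giving every sample the weight of its class yields an equivalent sample_weight array. The helper names are illustrative only, and the code assumes every class in 0..n_classes-1 occurs at least once (otherwise np.bincount produces a zero count).

```python
import numpy as np

def balanced_class_weights(y):
    """'balanced' weights: n_samples / (n_classes * count of each class).
    Assumes every class index 0..max(y) is present in y."""
    counts = np.bincount(y)
    return len(y) / (len(counts) * counts)

def class_to_sample_weight(y, weights):
    """Give each sample the weight of its class."""
    return weights[y]

y_train = np.array([0, 0, 0, 1])
w = balanced_class_weights(y_train)  # minority class gets the larger weight
sample_weight = class_to_sample_weight(y_train, w)
# In Keras this would be passed as model.fit(X_train, y_train, sample_weight=sample_weight)
```

Because sample_weight is per sample, it is also the more flexible option when a single per-class weight is not expressive enough.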
Other easy-to-implement and effective methods to combat data imbalance are oversampling and undersampling, which have a similar effect to using class_weight. You can use them if you run into problems with class_weight or sample_weight.
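As an illustration, random oversampling can be sketched in a few lines of NumPy: rows of each smaller class are drawn with replacement until every class matches the largest one. This is a simple single-label version written for this example (the imbalanced-learn package offers a maintained RandomOverSampler); it should be applied to the training split only, so duplicated rows do not leak into validation data.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate rows of smaller classes (sampling with replacement) until
    every class has as many rows as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # size=0 for the majority class, so it is kept unchanged
        extra = rng.choice(members, size=target - n, replace=True)
        idx.append(np.concatenate([members, extra]))
    # Shuffle so the duplicated rows are not grouped at the end
    idx = rng.permutation(np.concatenate(idx))
    return X[idx], y[idx]

X = np.arange(10).reshape(5, 2)
y = np.array([0, 0, 0, 0, 1])
X_res, y_res = random_oversample(X, y)
```

Undersampling is the mirror image: draw, without replacement, only as many rows of each class as the smallest class has.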

Converting H2O AutoML Model to Sklearn Model

I have a GBM model generated by H2O AutoML in Python. I wonder whether it can be converted into a standard sklearn model so that it fits into my ecosystem of other sklearn models.
I can see the model properties as below when I print the model.
If direct conversion from H2O to sklearn is not feasible, is there a way we can use the above properties to recreate GBM in sklearn? These terminologies look slightly different from the standard sklearn GBM parameters.
Thanks in advance.
It will be tricky, since the packages are quite different: sklearn is based on Python/Cython/C, while H2O uses Java. The underlying algorithms could also differ. However, you can try matching/translating your hyperparameters between the two, since many of them are similar.
Additionally, it would be a good idea to have an ecosystem that is library agnostic so that you can interchange different models.
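A sketch of what such a translation could look like. The mapping below covers only the H2O GBM parameters with a reasonably close scikit-learn GradientBoostingClassifier counterpart, and it is an approximation: the algorithms differ (for example, H2O bins numeric columns into histograms), so the recreated model will not reproduce H2O's predictions exactly. The parameter values in the usage line are hypothetical.

```python
from sklearn.ensemble import GradientBoostingClassifier

# H2O GBM parameter -> (sklearn parameter, type conversion). Approximate only.
H2O_TO_SKLEARN = {
    "ntrees": ("n_estimators", int),
    "max_depth": ("max_depth", int),
    "learn_rate": ("learning_rate", float),
    "sample_rate": ("subsample", float),
    # H2O samples columns per split; sklearn's float max_features is a fraction too
    "col_sample_rate": ("max_features", float),
    # H2O's min_rows is a row count; a float in sklearn would mean a fraction
    "min_rows": ("min_samples_leaf", int),
}

def translate_h2o_params(h2o_params):
    """Keep only parameters with a known sklearn counterpart."""
    kwargs = {}
    for name, value in h2o_params.items():
        if name in H2O_TO_SKLEARN:
            sk_name, convert = H2O_TO_SKLEARN[name]
            kwargs[sk_name] = convert(value)
    return kwargs

# Hypothetical values, e.g. copied from the printed H2O model parameters
params = translate_h2o_params({"ntrees": 50, "max_depth": 5, "learn_rate": 0.1,
                               "sample_rate": 0.8, "min_rows": 10.0, "nbins": 20})
model = GradientBoostingClassifier(**params)
```

Parameters without a counterpart (nbins in the example) are silently dropped, so it is worth checking which ones mattered in the H2O model.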

Extract native xgboost model from H2OXGBoostEstimator model in Python

Is it possible to extract native xgboost model pickle file from H2OXGBoostEstimator model in Python and read in by raw XGBoost Python API? Thanks!
You can try these two "h2o-to-xgboost" methods, which extract the XGBoost hyperparameters and a DMatrix from a trained H2O model; according to the docs, training on them gives you exactly the same native XGBoost Python model.
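import xgboost as xgb
# h2oModelD: the trained H2OXGBoostEstimator; data: the H2OFrame it was trained on
# myX: list of predictor column names; y: name of the response column
nativeXGBoostParam = h2oModelD.convert_H2OXGBoostParams_2_XGBoostParams()
nativeXGBoostInput = data.convert_H2OFrame_2_DMatrix(myX, y, h2oModelD)
nativeModel = xgb.train(dtrain=nativeXGBoostInput,
                        params=nativeXGBoostParam[0],
                        num_boost_round=nativeXGBoostParam[1])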
More info:
docs
example

Feature Importance Chart in neural network using Keras in Python

I am using Python 3.6 with Anaconda (64-bit) and Spyder 3.1.2. I have already set up a neural network model using Keras 2.0.6 for a regression problem (one response, 10 variables). I was wondering how I can generate a feature importance chart like so:
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
def base_model():
    model = Sequential()
    model.add(Dense(200, input_dim=10, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
clf = KerasRegressor(build_fn=base_model, epochs=100, batch_size=5, verbose=0)
clf.fit(X_train, Y_train)
I was recently looking for the answer to this question and found something useful for what I was doing, so I thought I'd share it. I ended up using the permutation importance module from the eli5 package. It works most easily with scikit-learn models; luckily, Keras provides a scikit-learn wrapper for Sequential models. As shown in the code below, using it is very straightforward.
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
import eli5
from eli5.sklearn import PermutationImportance
def base_model():
    model = Sequential()
    ...
    return model
X = ...  # a pandas DataFrame, since X.columns is used below
y = ...
# sk_params: keyword arguments for the wrapper, e.g. epochs and batch_size
my_model = KerasRegressor(build_fn=base_model, **sk_params)
my_model.fit(X, y)
perm = PermutationImportance(my_model, random_state=1).fit(X, y)
eli5.show_weights(perm, feature_names=X.columns.tolist())
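The idea behind permutation importance is simple enough to sketch without eli5: shuffle one column at a time and measure how much the model's score drops. The version below is written for this example (it is not eli5's implementation) and uses R² as the score; predict can be any function, such as a wrapped Keras model's predict. scikit-learn also ships sklearn.inspection.permutation_importance for estimators that follow its API.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in R^2 when each column of X is shuffled independently.
    `predict` maps an (n, d) array to n predictions."""
    rng = np.random.default_rng(seed)

    def r2(y_true, y_pred):
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - y_true.mean()) ** 2)
        return 1.0 - ss_res / ss_tot

    baseline = r2(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - r2(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances

# Toy regression where feature 2 is irrelevant
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(500, 3))
y_demo = 3 * X_demo[:, 0] + 0.5 * X_demo[:, 1]
imp = permutation_importance(lambda A: 3 * A[:, 0] + 0.5 * A[:, 1], X_demo, y_demo)
```

A bar chart of imp against the feature names gives the kind of feature importance chart the question asks for.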
This is a relatively old post with relatively old answers, so I would like to suggest another option: using SHAP to determine feature importance for your Keras models. SHAP supports both 2D and 3D arrays, whereas eli5 currently supports only 2D arrays (so if your model uses layers that require 3D input, like LSTM or GRU, eli5 will not work).
Here is the link to an example of how SHAP can plot the feature importance for your Keras models; in case the link ever breaks, some sample code and plots are provided below as well (taken from said link):
import shap
# load your data here, e.g. X and y
# create and fit your model here
# load JS visualization code to notebook
shap.initjs()
# explain the model's predictions using SHAP
# (same syntax works for LightGBM, CatBoost, scikit-learn and spark models)
# Note: TreeExplainer only supports tree-based models; for a Keras network,
# use shap.DeepExplainer or shap.KernelExplainer instead
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# visualize the first prediction's explanation (use matplotlib=True to avoid Javascript)
shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])
# global importance: mean absolute SHAP value per feature, as a bar chart
shap.summary_plot(shap_values, X, plot_type="bar")
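For a Keras network, shap.KernelExplainer(model.predict, background) is the model-agnostic option: it estimates Shapley values by replacing "missing" features with rows from a background dataset. To show what is being estimated, here is a brute-force exact computation written for this example (not SHAP's code). It enumerates every feature subset, so its cost grows as 2^n and it is only feasible for a handful of features; f and the data below are hypothetical.

```python
import itertools
import math

import numpy as np

def exact_shapley(f, x, background):
    """Exact Shapley values for one input row x. The value of a coalition S
    is the mean model output when features in S come from x and the
    remaining features come from each background row."""
    n = x.shape[0]

    def value(subset):
        mixed = background.copy()
        idx = list(subset)
        mixed[:, idx] = x[idx]
        return f(mixed).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

# Hypothetical linear model, where the answer is known in closed form
w = np.array([2.0, -1.0, 0.5])
f = lambda A: A @ w
background = np.random.default_rng(0).normal(size=(50, 3))
x = np.array([1.0, 2.0, 3.0])
phi = exact_shapley(f, x, background)
```

For a linear model each value reduces to w_i * (x_i - mean of background column i), and the values always sum to f(x) minus the mean background prediction, which is a handy sanity check on any explainer's output.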
At the moment Keras doesn't provide any functionality to extract feature importance.
You can check this previous question:
Keras: Any way to get variable importance?
or the related Google Group: Feature importance
Spoiler: In the Google Group, someone announced an open source project to solve this issue.
