Is it possible to extract a native XGBoost model (as a pickle file) from an H2OXGBoostEstimator model in Python and read it in with the raw XGBoost Python API? Thanks!
You can try these two "h2o-to-xgboost" methods to extract the XGBoost hyperparameters and a DMatrix from a trained H2O model. According to the docs, training with them will give you exactly the same native XGBoost Python model.
nativeXGBoostParam = h2oModelD.convert_H2OXGBoostParams_2_XGBoostParams()
nativeXGBoostInput = data.convert_H2OFrame_2_DMatrix(myX, y, h2oModelD)
nativeModel = xgb.train(dtrain=nativeXGBoostInput,
                        params=nativeXGBoostParam[0],
                        num_boost_round=nativeXGBoostParam[1])
More info:
docs
example
I am looking for solutions to quantize sklearn models. I am specifically looking for XGBoost models.
I did find solutions to quantize pytorch and tensorflow models but nothing on sklearn.
Solutions tried:
I converted the sklearn model to ONNX and then tried to quantize the ONNX model, but that didn't work either. Here is the link to the bug.
If any pointers or solutions can be shared, it would be of great help.
Someone has already answered the question in the bug you linked.
Try dropping the final ZipMap node by setting the 'zipmap' option to False:
onx = convert_sklearn(clr, initial_types=initial_type,
                      options={'zipmap': False})
I'm interested to know whether that works for you.
BTW, you can use onnxmltools to convert an XGBoost model to ONNX according to this.
Sample code:
import onnx
import onnxmltools
from onnxmltools.convert.common.data_types import FloatTensorType
from xgboost import XGBClassifier
clf = XGBClassifier()
# fit the classifier...
onnx_model_path = "xgb_classifier.onnx"
# num_features is the number of input columns the classifier was trained on
initial_type = [('float_input', FloatTensorType([None, num_features]))]
onnx_model = onnxmltools.convert.convert_xgboost(clf, initial_types=initial_type, target_opset=10)
onnx.save(onnx_model, onnx_model_path)
Note that:
The model must be trained using the scikit-learn API of xgboost.
The training data passed to XGBClassifier().fit() must not have feature names associated with it. For example, if your training data is a DataFrame called df, which has column names, you will need to use a representation without column names (i.e. df.values) when training.
When training and saving an xgboost model in Python using the base API (i.e., xgboost.train(args)), we can then save the parameters using .save_model():
import xgboost
model = xgboost.train(args) # Learning API
model.save_model(args)
loaded_model = xgboost.XGBRegressor() # Scikit-Learn API
loaded_model.load_model(args)
How can we load this trained model into the xgboost sklearn API? My goal is to load a trained xgboost model (trained using the Learning API) into the xgboost Scikit-Learn API as a fitted model, so that I can then leverage other sklearn functions.
The approach I included in the code above does not enable the loaded model to work with other sklearn functions, and I get a NotFittedError when I try to use other sklearn functions on the model.
Here is a link to the Python API for the model I am using: https://xgboost.readthedocs.io/en/latest/python/python_api.html
I am training the model using the 'Learning API' and trying to load the model into the 'Scikit-Learn API'.
Assuming you have used one of the standard classifiers or models from the scikit-learn package, you can save and load your models using pickle:
import pickle
model.fit(X_train, y_train)  # scikit-learn estimators are trained with fit()
saved_model = pickle.dumps(model)
# Load the pickled model
loaded_model = pickle.loads(saved_model)
# Using the loaded model to predict new data
loaded_model.predict(X_test)
You can also save the model to a file and load it afterwards:
import pickle
model.fit(X_train, y_train)
# Save the pickled model (pickle needs the file opened in binary mode)
with open('model.obj', 'wb') as file_pi:
    pickle.dump(model, file_pi)
# Load the pickled model
with open('model.obj', 'rb') as filehandler:
    loaded_model = pickle.load(filehandler)
# Using the loaded model to predict new data
loaded_model.predict(X_test)
I have an ensemble model that combines both TensorFlow and scikit-learn, and I would like to save this ensemble model as a single box that I can feed data into to generate the output. My code is below:
def model_base_LSTM(***):
    ***
model = model_base_LSTM(***)
ensem_model = BaggingRegressor(base_estimator=model, n_estimators=15)
ensem_model.fit(x_train, y_train)
bag_mod_pred = ensem_model.predict(x_test_bag)
from joblib import dump, load
dump(ensem_model, 'LSTM_Ensemble.joblib')
TypeError: can't pickle _thread._local objects
How can I solve this problem?
You can save your TensorFlow (and even PyTorch) models with Scikit-Learn, but only if you use Neuraxle and its saving mechanics.
Neuraxle is an extension of Scikit-Learn to make it more compatible with all deep learning libraries.
The trick is performed by using Neuraxle-TensorFlow or Neuraxle-PyTorch.
Why so?
Using either Neuraxle-TensorFlow or Neuraxle-PyTorch provides you with a saver that allows your model to be serialized correctly. Correct serialization is what ensures compatibility between scikit-learn and your deep learning framework when it comes time to save or parallelize things. You can read how Neuraxle solves this with savers here.
Code Examples
Here is a full project example from A to Z where TensorFlow is used with Neuraxle as if it was used with Scikit-Learn.
Here is another practical example where TensorFlow is used within a scikit-learn-like pipeline
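To illustrate the general idea behind a custom saver, here is a minimal standard-library sketch. It is not Neuraxle's API. The error in the question appears because pickle reaches an object holding thread-local state. One common workaround is to drop that member when saving and rebuild it when loading. `ModelWrapper` and its `_session` attribute are hypothetical stand-ins for a wrapped deep learning model.

```python
import pickle
import threading


class ModelWrapper:
    """Hypothetical estimator that holds unpicklable runtime state."""

    def __init__(self, weights):
        self.weights = list(weights)
        # Stand-in for a session or graph handle; thread-locals cannot be pickled.
        self._session = threading.local()

    def predict(self, rows):
        # Plain dot product so the example stays self-contained.
        return [sum(w * x for w, x in zip(self.weights, row)) for row in rows]

    def __getstate__(self):
        # Save everything except the runtime handle.
        state = self.__dict__.copy()
        del state["_session"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and recreate the runtime handle.
        self.__dict__.update(state)
        self._session = threading.local()


def roundtrip(obj):
    """Serialize and deserialize an object in memory."""
    return pickle.loads(pickle.dumps(obj))
```

A real TensorFlow model would also need its weights written with the framework's own save function. The sketch only shows where such a hook would sit.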
Suppose I have successfully trained an XGBoost machine learning model in Python.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7)
model = XGBClassifier()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
I want to port this model to another system which will be written in C/C++. To do this, I need to know the internal logic of the trained XGBoost model and translate it into a series of if-then-else statements, like decision trees, if I am not wrong.
How can this be done? How to find out the internal logic of the XGBoost trained model to implement it on another system?
I am using python 3.7.
m2cgen is an awesome package that converts Scikit-Learn compatible models into raw code. If you are using XGBoost's sklearn wrappers (which it looks like you are), then you can simply call something like this:
model = XGBClassifier()
model.fit(x_train, y_train)
...
import m2cgen as m2c
with open('./model.c','w') as f:
    code = m2c.export_to_c(model)
    f.write(code)
The really awesome thing about this package is that it supports many different types of models, such as
Linear
SVM
Tree
Random Forest
Boosting
One more thing. m2cgen also supports multiple languages such as
C
C#
Dart
Go
Haskell
Java
JavaScript
PHP
PowerShell
Python
R
Visual Basic
I hope this helps!
Someone wrote a script that does exactly this. Check out https://github.com/popcorn/xgb2cpp
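To see roughly what such a conversion involves, here is a minimal sketch using only the standard library. It assumes the per-tree JSON layout that `model.get_booster().get_dump(dump_format="json")` returns (`split`, `split_condition`, `yes`, `no`, `children`, `leaf`). The sample tree is made up, missing values are ignored, and feature names are assumed to be the default `f0`, `f1`, and so on. The helper names `tree_to_c` and `evaluate` are hypothetical.

```python
import json

# Made-up tree in the layout of one entry from get_dump(dump_format="json").
SAMPLE_TREE = json.loads("""
{"nodeid": 0, "split": "f0", "split_condition": 0.5, "yes": 1, "no": 2, "missing": 1,
 "children": [
   {"nodeid": 1, "leaf": -0.4},
   {"nodeid": 2, "split": "f1", "split_condition": 2.0, "yes": 3, "no": 4, "missing": 3,
    "children": [{"nodeid": 3, "leaf": 0.1}, {"nodeid": 4, "leaf": 0.6}]}
 ]}
""")


def tree_to_c(node, indent=1):
    """Emit nested C if/else statements for one tree (missing values ignored)."""
    pad = "    " * indent
    if "leaf" in node:
        return f"{pad}return {node['leaf']};\n"
    children = {child["nodeid"]: child for child in node["children"]}
    feature_index = int(node["split"][1:])  # "f1" -> 1; default feature names only
    return (
        f"{pad}if (x[{feature_index}] < {node['split_condition']}) {{\n"
        + tree_to_c(children[node["yes"]], indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_c(children[node["no"]], indent + 1)
        + f"{pad}}}\n"
    )


def evaluate(node, x):
    """Walk the same tree in Python, to check the generated logic."""
    while "leaf" not in node:
        children = {child["nodeid"]: child for child in node["children"]}
        feature_index = int(node["split"][1:])
        branch = "yes" if x[feature_index] < node["split_condition"] else "no"
        node = children[node[branch]]
    return node["leaf"]


c_source = "float tree_0(const float *x) {\n" + tree_to_c(SAMPLE_TREE) + "}\n"
```

For a full model you would generate one such function per tree and add up the leaf values. For a binary classifier the sum is a raw margin that still has to go through the model's base score and the logistic function, so compare the ported output against `model.predict_proba` before relying on it.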
A common way of using any ML/DL model is to build a simple RESTful API with flask or bottle (both are lightweight Python frameworks), which can then be called from any language.
You can also containerize the RESTful API with Docker if you are developing a big project with a lot of models.
Containerized RESTful APIs are also used to deploy models on the cloud, e.g. on AWS.
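The REST approach can be sketched as follows. This minimal example uses the standard library's WSGI interface rather than flask or bottle, so that it runs without extra packages. The `predict` function is a hypothetical stand-in for a trained model, and the JSON field names `features` and `prediction` are assumptions made for illustration.

```python
import io
import json


def predict(features):
    """Hypothetical stand-in for a trained model's predict()."""
    return sum(features) / len(features)


def app(environ, start_response):
    """WSGI app: POST a JSON body {"features": [...]} to get a prediction back."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    features = payload.get("features")
    if environ.get("REQUEST_METHOD") != "POST" or not features:
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [json.dumps({"error": "POST a non-empty 'features' list"}).encode()]
    body = json.dumps({"prediction": predict(features)}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def call_app(method, body=b""):
    """Call the app in-process, without opening a socket."""
    status = {}
    environ = {
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, lambda code, headers: status.update(code=code))
    return status["code"], json.loads(b"".join(chunks))
```

To serve it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library is enough for a quick trial. A flask or bottle version would follow the same request-in, JSON-out shape.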
If you are interested in the logic behind any ML model, have a look at its source code (on GitHub).
I am learning XGBoost and I am using Python (3.x). I came across the XGBoost cv function. Suppose I have two models, gbt1 and gbt2, which I created using XGBClassifier. Now I want to use XGBoost's cv method for cross-validation. I noticed that I didn't need to specify which model I am trying to optimize; I just need to pass the params and a DMatrix. My question is: how does XGBoost determine which model or estimator to use?
cv_df = xgb.cv(params, DTrain, num_boost_round = 5, nfold=n_folds,
               early_stopping_rounds=early_stopping)