Save sklearn cross validation object - python

Following the tutorial for sklearn, I attempted to save an object that was created via sklearn but was unsuccessful. It appears the problem is with the cross validation object, as I can save the actual (final) model.
Given:
rf_model = RandomForestRegressor(n_estimators=1000, n_jobs=4, compute_importances = False)
cvgridsrch = GridSearchCV(estimator=rf_model, param_grid=parameters,n_jobs=4)
cvgridsrch.fit(X,y)
This will succeed:
joblib.dump(cvgridsrch.best_estimator_, 'C:\\Users\\Desktop\\DMA\\cvgridsrch.pkl', compress=9)
and this will fail:
joblib.dump(cvgridsrch, 'C:\\Users\\Desktop\\DMA\\cvgridsrch.pkl', compress=9)
with error:
PicklingError: Can't pickle <type 'instancemethod'>: it's not found as __builtin__.instancemethod
How to save the full object?

If you are using Python 2, try:
import dill
so that lambda functions and instance methods can be pickled.
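A minimal sketch of that approach, assuming the fitted cvgridsrch object from the question (dill exposes dump/load with the same interface as pickle):
import dill
# dill can serialize the instance methods that the standard pickler rejects
with open('cvgridsrch.pkl', 'wb') as f:
    dill.dump(cvgridsrch, f)
# and later:
with open('cvgridsrch.pkl', 'rb') as f:
    cvgridsrch = dill.load(f)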

One possible cause could be a multithreading issue, for which you may refer to this Stack Overflow answer.
Also, is it possible for you to dump your object not via joblib but with a more fundamental method like pickle (and not even cPickle, which is more restrictive)?
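For example, a minimal sketch of that attempt with the standard pickle module, assuming the fitted cvgridsrch from the question (if nothing else, the traceback may be more informative than joblib's):
import pickle
with open('cvgridsrch.pkl', 'wb') as f:
    pickle.dump(cvgridsrch, f)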

I know this is an old question, but it might be useful for people coming here having the same, or similar, problem.
I'm not sure of the specific error message, but I managed to successfully save the entire GridSearchCV object in my own project by using pickle:
import pickle
gs = GridSearchCV(estimator, param_grid)  # create the grid search object
gs.fit(X, y)  # fit the model
with open('file_name', 'wb') as f:
    pickle.dump(gs, f)  # save the object to a file
Then you can use
with open('file_name', 'rb') as f:
    gs = pickle.load(f)
to read the file and hence be able to use the object again.

Related

AttributeError: Can't get attribute 'tokenizer' on <module '__main__'>

I trained a logistic regression model on textual data and saved it using pickle. But when I try to load the model for testing, I get the error mentioned in the title while executing the following line:
model = pickle.load(open("sentiment.model", "rb"))
Following is the code used for saving the model:
import pickle
print("[INFO] saving Model...")
f = open('sentiment.model', "wb")
# first I saved the best_estimator_
f.write(pickle.dumps(gs_lr_tfidf.best_estimator_))
# then I also saved the model completely, without selecting any attribute, i.e.:
# f.write(pickle.dumps(gs_lr_tfidf))
# but neither helped and I got the same error
f.close()
print("[INFO] Model saved!")
This error doesn't show up when I load the model in the same notebook right after training (in the same runtime). But it does occur when I try to load the model in a different runtime, even though the model-loading code is the same. Why is this happening?
I think the problem comes from the behaviour of pickle. As @hafiz031 said, it works normally when you run the same code in the same file. So the short answer is: you need to import the tokenizer (from whatever library you use) before you load the model.
For people who read Chinese, you can go to this CSDN link for more info.
For people who don't, sorry for my bad English and I'll try my best to explain.
The documentation says:
pickle.loads(data, /, *, fix_imports=True, encoding='ASCII', errors='strict', buffers=None)
Return the reconstituted object hierarchy of the pickled representation data of an object. data must be a bytes-like object.
There is an implicit requirement when you use pickle.loads: the object hierarchy must be declared before you load it. Intuitively, imagine you bring USD to the North Pole and want to trade it for fish with a penguin. Since penguins have no concept of money, they won't make the deal. It's the same with pickle: if you haven't imported the tokenizer beforehand, then when pickle loads the bytes back into a tokenizer, it doesn't know what 'tokenizer' is and raises an error. That's why your code works in the training file but fails when you load the model in a different file.
In my case, I just import an extra lib.
# import your own lib
import pickle
import nltk.tokenize
import gensim
import sklearn
#...
model = pickle.load(open("sentiment.model", "rb"))
#model.predict()
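A common cause of this exact error is a custom tokenizer function defined in the training notebook itself: pickle then records it as __main__.tokenizer, which does not exist in a fresh runtime. A minimal sketch of the fix, assuming a hypothetical module of your own named my_tokenizers, imported in both the training and the loading script:
# my_tokenizers.py (hypothetical module name)
def tokenizer(text):
    return text.split()

# training script: importing it makes pickle record my_tokenizers.tokenizer, not __main__.tokenizer
from my_tokenizers import tokenizer

# loading script: the same import makes the name resolvable again
from my_tokenizers import tokenizer
import pickle
model = pickle.load(open("sentiment.model", "rb"))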

AttributeError: 'XGBClassifier' object has no attribute 'save_raw'

I keep getting the error mentioned in the title whenever I try to run my file. I'm basically using this file https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/training.py
and the error happens on line 38 at save_raw
I've tried reinstalling different versions of xgboost with both pip and git clone, nothing seems to work. Can someone help me?
I am using the latest version of scikit, python and xgboost.
if xgb_model is not None:
    if not isinstance(xgb_model, STRING_TYPES):
        xgb_model = xgb_model.save_raw()  # Error here
    bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
    nboost = len(bst.get_dump())
I have experience with saving an XGBRegressor, and I think it is the same with XGBClassifier.
save_model and load_model work, but some auxiliary attributes will not be saved or loaded.
def load_model(self, fname):
    """
    Load the model from a file.

    The model is loaded from an XGBoost internal binary format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature names) will not be loaded. Label encodings (text labels to numeric labels) will also be lost.

    If you are using only the Python interface, we recommend pickling the model object for best results.
    """
So another solution should be considered.
For me, the pickle package works well:
import pickle
pickle.dump(model, open("boston_earlyStopping.dat", "wb"))
new_model = pickle.load(open("boston_earlyStopping.dat", "rb"))
new_model.best_ntree_limit
99
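For comparison, a minimal sketch of the native save_model/load_model round trip on the sklearn wrapper (a fitted model and the file name are assumptions; as the docstring above warns, Python-side auxiliary attributes are not preserved):
from xgboost import XGBRegressor

model = XGBRegressor()
# ... model.fit(X, y) ...
model.save_model('boston.model')  # XGBoost internal format

restored = XGBRegressor()
restored.load_model('boston.model')  # booster weights restored; auxiliary attributes are not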

Load LightGBM model from string or buffer

I would like to load a LightGBM model from a string or buffer rather than a file on disk.
It seems that there is a method called model_from_string (documentation link), but it produces an error, which seemingly defeats the purpose of the method as I understand it.
import boto3
import lightgbm as lgb
import io
model_path = 'some/path/here'
s3_bucket = boto3.resource('s3').Bucket('some-bucket')
obj = s3_bucket.Object(model_path)
buf = io.BytesIO()
try:
    obj.download_fileobj(buf)
except Exception as e:
    raise e
else:
    model = lgb.Booster().model_from_string(buf.read().decode("UTF-8"))
which produces the following error:
TypeError: Need at least one training dataset or model file to create booster instance
Alternatively, I thought that I might be able to use the regular loading method
lgb.Booster(model_file=buf.read().decode("UTF-8"))
... but this also doesn't work.
FileNotFoundError: [Errno 2] No such file or directory: ''
Now, I realize that I can create a workaround by writing the buffer to disk, and then reading it. However, this feels very redundant and inefficient.
Thus, my question is: how can I instantiate a model to use for prediction without pointing to an actual file on disk?
It seems that there is an undocumented parameter model_str which can be used to initialize the lgb.Booster object.
model = lgb.Booster({'model_str': buf.read().decode("UTF-8")})
Source: https://github.com/Microsoft/LightGBM/issues/2097#issuecomment-482332232
Credit goes to Nikita Titov aka StrikerRUS on GitHub.
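As a side note, in more recent LightGBM releases model_str also appears to be accepted as a direct keyword argument of the Booster constructor, so a sketch like the following may work as well (the buffer must be rewound with buf.seek(0) if it has already been read):
buf.seek(0)  # rewind the buffer before re-reading
model = lgb.Booster(model_str=buf.read().decode("UTF-8"))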

How to save to disk / export a lightgbm LGBMRegressor model trained in python?

Hi, I am unable to find a way to save a lightgbm.LGBMRegressor model to a file for later re-use.
Try:
my_model.booster_.save_model('mode.txt')
#load from model:
bst = lgb.Booster(model_file='mode.txt')
Note: the API states that
bst = lgb.train(…)
bst.save_model('model.txt', num_iteration=bst.best_iteration)
Depending on the version, one of the above works. For a generic approach, you can also use pickle or something similar to freeze your model:
import joblib
# save model
joblib.dump(my_model, 'lgb.pkl')
# load model
gbm_pickle = joblib.load('lgb.pkl')
Let me know if that helps
For Python 3.7 and lightgbm==2.3.1, I found that the previous answers were insufficient to correctly save and load a model. The following worked:
lgbr = lightgbm.LGBMRegressor(n_estimators=200, max_depth=5)
lgbr.fit(train[num_columns], train["prep_time_seconds"])
preds = lgbr.predict(predict[num_columns])
lgbr.booster_.save_model('lgbr_base.txt')
Finally, we can validate that this worked via:
model = lightgbm.Booster(model_file='lgbr_base.txt')
model.predict(predict[num_columns])
Without the above, I was getting the error: AttributeError: 'LGBMRegressor' object has no attribute 'save_model'
With the latest version of LightGBM, using import lightgbm as lgb, here is how to do it:
model.save_model('lgb_classifier.txt', num_iteration=model.best_iteration)
and then you can read the model as follows:
model = lgb.Booster(model_file='lgb_classifier.txt')
Or, saving and reloading a trained Booster directly:
clf.save_model('lgbm_model.mdl')
clf = lgb.Booster(model_file='lgbm_model.mdl')

TypeError: can't pickle face_BasicFaceRecognizer objects

I want to store the trained object; however, the error shown above occurs. What should I do if I need to store this trained model?
import cv2
import numpy as np
import pickle
fishface = cv2.face.createFisherFaceRecognizer()
m = fishface.train(training_data, np.asarray(training_labels))
output = open('data.pkl', 'wb')
pickle.dump(fishface, output)
Unfortunately, OpenCV bindings typically don't support pickling. You'll have to use built-in OpenCV serialization.
You can do m.save("serialized_recognizer.cv2") and at runtime, m.load("serialized_recognizer.cv2"), if m is an instantiated FaceRecognizer.
This is what I have used to store the trained object in my current working directory. It saves me from bundling all the training data with my application and from spending time retraining the model every run.
recognizer1 = cv2.face.createLBPHFaceRecognizer()
recognizer1.train(images, np.array(labels))
# save at the end of the program using:
recognizer1.save('qwe.xml')
# load it later in a different program/instance using:
recognizer1 = cv2.face.createLBPHFaceRecognizer()
recognizer1.load('qwe.xml')
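Note that in OpenCV 3.3+ the factory and serialization methods were renamed; if the calls above fail with an AttributeError, a sketch using the newer names (images and labels assumed as above):
recognizer1 = cv2.face.LBPHFaceRecognizer_create()
recognizer1.train(images, np.array(labels))
recognizer1.write('qwe.xml')  # replaces save()
recognizer1 = cv2.face.LBPHFaceRecognizer_create()
recognizer1.read('qwe.xml')   # replaces load()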
