Cannot save scikit-learn model using joblib? - python

I have the Ensemble model that combines both tensorflow and scikit-learn. And I would like to save this Ensemble model as a box to feed data in and generate the output. My code is as below
def model_base_LSTM(***):
***
model = model_base_LSTM(***)
ensem_model = BaggingRegressor(base_estimator=model, n_estimators=15)
ensem_model.fit(x_train, y_train)
bag_mod_pred = ensem_model.predict(x_test_bag)
from joblib import dump, load
dump(ensem_model, 'LSTM_Ensemble.joblib')
TypeError: can't pickle _thread._local objects
So, how to solve this problem??

You can save your TensorFlow (and even PyTorch) models with Scikit-Learn, but only if you use Neuraxle and its saving mechanics.
Neuraxle is an extension of Scikit-Learn to make it more compatible with all deep learning libraries.
The trick is performed by using Neuraxle-TensorFlow or Neuraxle-PyTorch.
Why so?
Using one of Neuraxle-TensorFlow or Neuraxle-PyTorch will provide you with a saver to allow your thing to be serialized correctly. You want it to be serialized correctly to be able to ensure compatibility between scikit-learn and your Deep Learning framework when it comes time to save or parallelize things and so forth. You can read how Neuraxle solves this with savers here.
Code Examples
Here is a full project example from A to Z where TensorFlow is used with Neuraxle as if it was used with Scikit-Learn.
Here is another practical example where TensorFlow is used within a scikit-learn-like pipeline

Related

How to open and analyze Google Tables model using python/tf?

I read several discussions about this and still cannot make it work for my case
Have a classification model trained using Google Tables.
Exported the model and download the directory with cli.
My goal is to get a better understanding of the model trained by google, study it, understand its decisions. And later try to prune it to improve performance.
I'm using this code, just to start:
import tensorflow as tf
from tensorflow import keras
import struct2tensor
location = "model_dir"
model = tf.saved_model.load(location)
model.summary()
I get this error:
AttributeError: 'AutoTrackable' object has no attribute 'summary'
the variable model is of type:
<tensorflow.python.training.tracking.tracking.AutoTrackable at 0x7fa8eaa7ed30>
And I stuck there, don't know how to continue. Using Python 3.8 and the last version of those libraries. Any idea of how can I proceed?
Thanks!
The proper method to load your model depends on your file formatting.
You can see in the Tensorflow documentation that "The object returned by tf.saved_model.load is not a Keras object (i.e. doesn't have .fit, .predict, etc. methods)" and "Use tf.keras.models.load_model to restore the Keras model".
I'm not sure if you want to use the keras module or not, but since you have imported it I assume you do. In that case I would recommend checking this other Stackoverflow thread where it is explained how to use the tf.keras.models.load_model method depending if your model is saved as .pb or .h5.
If the model is saved as .pb you should use it with the string pointing to the directory where the model is saved, as you did in your code snippet but in this case using the keras method:
model = tf.keras.models.load_model('model_dir')
If instead it's saved as .h5 you should use it specifying it:
model = tf.keras.models.load_model('my_model_in_h5.h5')

What are all the formats to save machine learning model in scikit-learn, keras, tensorflow and mxnet?

There are many ways to save a model and its weights. It is confusing when there are so many ways and not any source where we can read and compare their properties.
Some of the formats I know are:
1. YAML File - Structure only
2. JSON File - Structure only
3. H5 Complete Model - Keras
4. H5 Weights only - Keras
5. ProtoBuf - Deployment using TensorFlow serving
6. Pickle - Scikit-learn
7. Joblib - Scikit-learn - replacement for Pickle, for objects containing large data.
Discussion:
Unlike scikit-learn, Keras does not recommend you save models using pickle. Instead, models are saved as an HDF5 file. The HDF5 file contains everything you need to not only load the model to make predictions (i.e., architecture and trained parameters) but also to restart training (i.e., loss and optimizer settings and the current state).
What are other formats to save the model for Scikit-learn, Keras, Tensorflow, and Mxnet? Also what info I am missing about each of the above-discussed formats?
There are also formats like onnx which basically supports most of the frameworks and helps in removing the confusion of using different formats for different frameworks.
There exists also TFJS format, which enables you to use the model on web or node.js environments. Additionally, you will need TF Lite format to make inference on mobile and edge devices. Most recently, TF Lite for Microcontrollers exports the model as a byte array in C header file.
Your question on formats for saving a model has multiple possible answers, based on why you want to save your model:
Save your model to resume training it later
Save your model to load it for inference later
These scenarios give you a couple of options:
You could save your model using the library-specific saving functions; if you want to resume training, make sure that you have saved all the information you need to really be able to resume training. Formats here will vary by library, and indeed are not aimed at being formats that you would inspect or read in any way - they are just files. If you are looking for a library that wraps all of these save functions behind a common API, you should check out the modelstore Python library.
You can also want to use a common format like ONNX; there are converters from Keras to ONNX and scikit-learn to ONNX available; but it is uncommon to use this format to later resume training. The benefit here is that they are all saved to a common format, which may streamline the process of loading them later.

Using Pytorch's dataloaders & transforms with sklearn

I have been using pytorch a lot and got used to their dataloaders and transforms, in particular when it comes to data augmentation, as they're very user-friendly and easy to understand.
However, I need to run some ML models from sklearn.
Is there a way to use pytorch's dataloaders for sklearn ?
Yes, you can. You can do this with sklearn's partial_fit method. Read HERE.
6.1.3. Incremental learning
Finally, for 3. we have a number of options inside scikit-learn. Although all algorithms cannot learn
incrementally (i.e. without seeing all the instances at once), all
estimators implementing the partial_fit API are candidates. Actually,
the ability to learn incrementally from a mini-batch of instances
(sometimes called “online learning”) is key to out-of-core learning as
it guarantees that at any given time there will be only a small amount
of instances in the main memory. Choosing a good size for the
mini-batch that balances relevancy and memory footprint could involve
some tuning [1].
Not all algorithms can do this, however.
Then, you can use pytorch's dataloader to preprocess the data and feed it in batches to partial_fit.
I came across the skorch library recently and this could help you.
"The goal of skorch is to make it possible to use PyTorch with sklearn. "
From the skorch docs:
class skorch.dataset.Dataset(X, y=None, length=None)
General dataset wrapper that can be used in conjunction with PyTorch DataLoader.
I guess you could use the Dataset class for wrapping your PyTorch DataLoader and use sklearn models. If you would like to use other PyTorch features like PyTorch Tensors you could also do that.

Deep learning: save and load a universal machine model through different libraries

My questions can be divided into two parts.
Is there a format of machine learning model file that can be used through different libraries? For example, I saved a model by pytorch, then load it using tensorflow?
If not, is there a library that can help transfer the formats so that a pytorch machine learning model can be used directly in keras?
The reason why I ask this question is that recently I need to adjust some of my previous trained models in tensorflow to pytorch.
An update for this question:
Facebook and Microsoft are going to launch a model standard called ONNX, which is used for transferring models between different framworks, for example between Pytorch to Caffe2. Link in the following:
https://research.fb.com/facebook-and-microsoft-introduce-new-open-ecosystem-for-interchangeable-ai-frameworks/
An further update for this question:
Tensorflow itself use Protocol Buffer format to store model file, which can be used for transfer between different models. Link in the following:
https://www.tensorflow.org/extend/tool_developers/
Very interesting question. A neural network is a mathematical abstraction consisting of a network of layers (convolution, recurrent, ...), operations (dot product, non-linearity, ...) and their respective parameters (weights, biases).
AFAIK, there's not an universal model file. Nonetheless, different libraries allow users to save their models in a binary format.
There's no library for conversion but there's effort on github repo that addresses this question.
Predictive Markup Modeling Language (PMML) is an XML-based representation language for many machine learning models. It's an open standard that's used by many companies for serializing and deserializing models. I've used libraries that support PMML for machine learning models like SVM and decision trees but have not used it for deep learning models. However, there are open source projects that will work with Tensorflow and Keras, but these libraries seem to be for serializing and deserializing for use with the same library. You might want to check if PMML is making progress for serializing and deserializing between libraries.
If not, is there a library that can help transfer the formats so that a pytorch machine learning model can be used directly in keras?
You can try a Pytorch2Keras converter.
In that moment, it supports base layers like Conv2d, Linear, Activations, Element-wise operations. So, I converted ResNet50 with error 1e-6.

Can I export RapidMiner model to integrate with python?

I have trained a classifier model using RapidMiner after a trying a lot of algorithms and evaluate it on my dataset.
I also export the model from RapidMiner as XML and pkl file, but I can't read it in my python program (scikit-learn).
Is there any way to import RapidMiner classifier/model in a python program and use it to predict or classify new data in my end application?
Practically, I would say no - just train your model in sklearn from the beginning if that's where you want it.
Your RapidMiner model is some kind of object. The two formats you are exporting as are just storage methods. Sklearn models are a different kind of object. You can't directly save one and load it into the other. A similar example would be to ask if you can take an airplane engine and load it into a train.
To do what you're asking, you'll need to take the underlying data that your classifier saved, find the format, and then figure out a way to get it in the same format as a sklearn classifier. This is dependent on what type of classifier you have. For example, if you're using a bayesian model, you could somehow capture the prior probabilities and then use those, but this isn't trivial.
You could use the pmml extenstion for RapidMiner to export your model.
For python there is for example the augustus library that can work with pmml files.

Categories

Resources