I have trained a classifier model in RapidMiner after trying a lot of algorithms and evaluating them on my dataset.
I also exported the model from RapidMiner as an XML file and a pkl file, but I can't read either of them in my Python program (scikit-learn).
Is there any way to import a RapidMiner classifier/model into a Python program and use it to predict or classify new data in my end application?
Practically, I would say no - just train your model in sklearn from the beginning if that's where you want it.
Your RapidMiner model is some kind of object. The two formats you are exporting as are just storage methods. Sklearn models are a different kind of object. You can't directly save one and load it into the other. A similar example would be to ask if you can take an airplane engine and load it into a train.
To do what you're asking, you'll need to take the underlying data that your classifier saved, work out its format, and then find a way to get it into the same form a sklearn classifier expects. How you do that depends on what type of classifier you have. For example, if you're using a Bayesian model, you could extract the prior probabilities (and the per-feature parameters) and reuse those, but this isn't trivial.
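To give a feel for what "capturing the parameters" means in the Bayesian case, here is a minimal sketch using only the standard library. It assumes a Gaussian naive Bayes model and that you have somehow read the priors, means and variances out of the export; the `EXPORTED` dictionary and its values are made up for illustration and do not reflect RapidMiner's actual file layout.

```python
import math

# Hypothetical parameters, as if read out of an exported naive Bayes model.
# Each class has a prior and, per feature, a (mean, variance) pair.
EXPORTED = {
    "spam": {"prior": 0.4, "features": [(5.0, 1.0), (1.0, 0.5)]},
    "ham": {"prior": 0.6, "features": [(1.0, 1.0), (3.0, 0.5)]},
}


def log_gaussian(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(row, params=EXPORTED):
    """Return the class with the highest log posterior for one observation."""
    scores = {}
    for label, p in params.items():
        score = math.log(p["prior"])
        for x, (mean, var) in zip(row, p["features"]):
            score += log_gaussian(x, mean, var)
        scores[label] = score
    return max(scores, key=scores.get)
```

The hard part in practice is not this arithmetic but locating the parameters in the export and confirming that the original tool used the same distributional assumptions.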
You could use the PMML extension for RapidMiner to export your model.
For Python there is, for example, the Augustus library, which can work with PMML files.
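Since PMML is plain XML, you can inspect an exported file with the standard library before picking a scoring library. The sketch below is only an inspection helper, not a scorer; `SAMPLE_PMML` is a hand-written toy document (not schema-complete and not a real RapidMiner export), and `summarize_pmml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written PMML-like document standing in for a real export.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="toy example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="label" optype="categorical" dataType="string"/>
  </DataDictionary>
  <NaiveBayesModel functionName="classification" threshold="0.001"/>
</PMML>"""

# Top-level elements that are not models themselves.
NON_MODEL = {"Header", "DataDictionary", "TransformationDictionary",
             "MiningBuildTask", "Extension"}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(text):
    """Return (model element names, data field names) for a PMML string."""
    root = ET.fromstring(text)
    models = [local_name(c.tag) for c in root if local_name(c.tag) not in NON_MODEL]
    fields = [el.get("name") for el in root.iter() if local_name(el.tag) == "DataField"]
    return models, fields
```

Knowing the model element (for example a tree model versus a naive Bayes model) and the expected input fields makes it easier to check whether a given Python PMML library supports your export.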
Related
I am trying to train a custom object detector using TFLite Model Maker (https://www.tensorflow.org/lite/tutorials/model_maker_object_detection). I want to deploy the trained TFLite model to a Coral Edge TPU. I want to use multiple TensorFlow TFRecord files as training input, as with the Object Detection API. I tried
tflite_model_maker.object_detector.DataLoader(
    tfrecord_file_patten, size, label_map, annotations_json_file=None
)
but I have not been able to get it to work. I have the following questions:
Is it possible to use TFRecords for training as described above?
Is it also possible to pass multiple CSV files for training?
For multiple CSV files, you could probably just append one file to the other. Then you'd just have to pass one csv file.
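Appending one CSV to another can be sketched with the standard library as below. This assumes the files have no header row and share the same column layout; `merge_csv_files` is a hypothetical helper name, not part of Model Maker.

```python
import csv


def merge_csv_files(input_paths, output_path):
    """Append the rows of every input CSV to one output CSV, in order."""
    rows_written = 0
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in input_paths:
            with open(path, newline="") as in_file:
                for row in csv.reader(in_file):
                    if row:  # skip blank lines
                        writer.writerow(row)
                        rows_written += 1
    return rows_written
```

If your files do carry a header row, you would need to keep it from the first file only.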
As for passing a TFRecord instead, this should be possible. I'm also attempting to do this, so if I get it working I'll update my post. Looking at the source, it seems from_cache is the function used internally. Following that structure, you should be able to create a DataLoader object in a similar way:
train_data = DataLoader(tfrecord_file_patten, meta_data['size'],
meta_data['label_map'], ann_json_file)
In this case, tfrecord_file_patten should point to the TFRecord(s) holding your training data. You can construct the validation and test data the same way. This will work provided you're constructing your TFRecords correctly. There appears to be some inconsistency in how it's done in different places, so make sure you follow the same structure in creating the TFRecords as found in the Model Maker source. This worked for me. One specific thing to watch out for is to use an integer for the 'image/source_id' feature in your TFExamples. If you use a string, it'll throw an error.
I've trained and tested several supervised models using sklearn and xgboost (on the same data). The xgboost model performs slightly better than sklearn's LassoCV.
I'm trying to find a way to export the model object so that non-technical people can interact with it in Excel and/or VBA. Specifically, they need to be able to enter all feature values for a new observation and have the exported sklearn or xgboost model output a new prediction.
I know I can save sklearn and xgboost objects as .pkl files. Is there an interface or API that can take input from Excel, pass it to the .pkl model file, and return the correct scalar prediction? There are about 40-50 features that will need to be entered and passed to the exported model. After the model is exported from Python, it will not need to be retrained, only used for prediction.
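For the linear model specifically, one low-tech route (a suggestion of this sketch, not something from the thread) is to skip the .pkl entirely and export the intercept and coefficients, since a linear prediction is just the intercept plus a weighted sum that Excel can compute with SUMPRODUCT. On a fitted LassoCV these values live in `coef_` and `intercept_`; the helper names below are hypothetical, and the approach does not carry over to the xgboost model.

```python
import csv


def export_linear_model(feature_names, coefficients, intercept, path):
    """Write one row per term so a spreadsheet can rebuild the prediction."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["term", "coefficient"])
        writer.writerow(["(intercept)", intercept])
        for name, coef in zip(feature_names, coefficients):
            writer.writerow([name, coef])


def predict_linear(coefficients, intercept, values):
    """The same arithmetic the spreadsheet would perform for one observation."""
    return intercept + sum(c * v for c, v in zip(coefficients, values))
```

Any preprocessing applied before fitting (scaling, encoding) would have to be reproduced in the spreadsheet as well, which is the main limitation of this approach.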
I have built a gbm classifier in R using the gbm library.
gbm2<-gbm(deal_stage~.,data=train,train.fraction=1,
interaction.depth=4,shrinkage=.001,
n.trees=6000,bag.fraction=0.5,cv.folds=5,
distribution="bernoulli",verbose=T)
r2pmml(gbm2,"/gbm_test.pmml",compact=TRUE)
Then in Python, when I try to make predictions from the PMML file, I get different results from what I had in R.
from pypmml import Model
model = Model.fromFile('gbm_test.pmml')
model.predict(observation)
Overall, the accuracy on both the training set and the test set differs between the R model and the PMML model scored in Python.
My dataset contains integer and string features. There are missing values in some fields, which should normally be handled by the classifier.
I would greatly appreciate any advice on what I should change to make my predictions in Python match what I observe in R!
Thanks!
If you're using JPMML tools for converting models to PMML files, then you should be also using JPMML evaluators for scoring these PMML files. The JPMML software project has extensive integration test coverage, which covers the whole pipeline.
Do you get the correct predictions if you switch from PyPMML to JPMML-Evaluator-Python?
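Whichever evaluator you try, it helps to compare the two sets of predictions row by row rather than only through accuracy. A minimal sketch, assuming you have exported the R probabilities and collected the Python ones into two equally ordered lists (`compare_predictions` is a hypothetical helper name):

```python
def compare_predictions(expected, actual, tolerance=1e-6):
    """Return (index, expected, actual) for rows that differ by more than tolerance."""
    if len(expected) != len(actual):
        raise ValueError("prediction lists have different lengths")
    return [
        (i, e, a)
        for i, (e, a) in enumerate(zip(expected, actual))
        if abs(e - a) > tolerance
    ]
```

If the mismatching rows turn out to be mostly the ones with missing values or string features, that may point to how those inputs are passed to the evaluator rather than to the model itself.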
There are many ways to save a model and its weights. It is confusing when there are so many ways and not any source where we can read and compare their properties.
Some of the formats I know are:
1. YAML File - Structure only
2. JSON File - Structure only
3. H5 Complete Model - Keras
4. H5 Weights only - Keras
5. ProtoBuf - Deployment using TensorFlow serving
6. Pickle - Scikit-learn
7. Joblib - Scikit-learn - replacement for Pickle, for objects containing large data.
Discussion:
Unlike scikit-learn, Keras does not recommend you save models using pickle. Instead, models are saved as an HDF5 file. The HDF5 file contains everything you need to not only load the model to make predictions (i.e., architecture and trained parameters) but also to restart training (i.e., loss and optimizer settings and the current state).
What other formats exist for saving models in Scikit-learn, Keras, TensorFlow, and MXNet? Also, what information am I missing about each of the formats discussed above?
There are also formats like ONNX, which most frameworks can export to and which helps remove the confusion of using a different format for each framework.
There is also the TFJS format, which lets you use the model on the web or in Node.js environments. Additionally, you will need the TF Lite format to run inference on mobile and edge devices. Most recently, TF Lite for Microcontrollers exports the model as a byte array in a C header file.
Your question on formats for saving a model has multiple possible answers, based on why you want to save your model:
Save your model to resume training it later
Save your model to load it for inference later
These scenarios give you a couple of options:
You could save your model using the library-specific saving functions; if you want to resume training, make sure that you have saved all the information you need to really be able to resume training. Formats here will vary by library, and indeed are not aimed at being formats that you would inspect or read in any way - they are just files. If you are looking for a library that wraps all of these save functions behind a common API, you should check out the modelstore Python library.
You may also want to use a common format like ONNX; there are converters from Keras to ONNX and from scikit-learn to ONNX, but it is uncommon to use this format to resume training later. The benefit here is that all models are saved in one common format, which may streamline the process of loading them later.
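To make the difference between a structure-only file and a full save concrete, here is a small standard-library sketch. The `model_state` dictionary is a made-up stand-in built from plain Python types, not a real Keras or scikit-learn object; it only illustrates which pieces each kind of file carries.

```python
import json
import pickle

# A toy "model" made of built-in types only (hypothetical, for illustration).
model_state = {
    "architecture": {"layers": [{"type": "dense", "units": 2}]},
    "weights": [[0.1, -0.2], [0.3, 0.4]],
    "optimizer": {"name": "sgd", "learning_rate": 0.01, "step": 120},
}

# Structure only: human-readable, but no weights and no optimizer state.
structure_json = json.dumps(model_state["architecture"])

# Full state: an opaque byte string that restores everything that was saved.
full_blob = pickle.dumps(model_state)

restored_structure = json.loads(structure_json)
restored_full = pickle.loads(full_blob)
```

The structure-only file is enough to rebuild an untrained model, the weights are needed for inference, and the optimizer state only matters if you intend to resume training.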
I am new to machine learning and want to implement the following:
I want to create a custom .mlmodel whose input is an xls or csv file (or the NSData of such a file) and whose output is a double or an array.
The Python file should be able to read the input data, because I am going to use it as train_data.
Python will do some calculation on this input data and provide a prediction from it (I will do this calculation using sklearn's LinearRegression).
Can anyone please help me with how I can do this?
You can use python to train your model with SKLearn as you suggested. This is a good post on getting started with that (make sure you use Sklearn and not Statsmodels).
https://towardsdatascience.com/simple-and-multiple-linear-regression-in-python-c928425168f9
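If it helps to see what the fit actually computes, here is a standard-library sketch of single-feature least squares on CSV data. It is a stand-in for the idea, not sklearn's LinearRegression itself, and the `CSV_TEXT` data and column names are made up for illustration.

```python
import csv
import io

# Hypothetical training data; in practice this would come from your CSV file.
CSV_TEXT = """x,y
1,3
2,5
3,7
4,9
"""


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
xs = [float(r["x"]) for r in rows]
ys = [float(r["y"]) for r in rows]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predicted y for a new input x = 5
```

With sklearn you would get the same slope and intercept from a fitted model's `coef_` and `intercept_`, and the library handles multiple features for you.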
When you have trained your model, you can convert it using Apple's coremltools:
https://github.com/apple/coremltools
Once you've converted it, you can add your .mlmodel file to your Xcode project. You'll then need to write some code to collect all of your model inputs from your app and pass them to the model.
Good luck!