How to deploy a .pkl machine learning model made with fastai on another machine as a desktop Python script - python

I have followed the fastai documentation and the videos on how to create an ML model that can detect different home care products such as soaps and deodorants.
I have now reached the point where I have a model that supposedly works, with an error rate of 0.03...
To my understanding that is roughly a 97% accurate model, but I have no idea how to predict on other images on another machine. I exported it using learn.export('Home_Care_Model.pkl') as described in the documentation, with no luck.
The documentation states that I would need to define the model again, with the classes and training set and so on, but I am now on another computer that does not have those files, and I cannot run it on the web because the end goal is a Python script that runs on any desktop.
What I am aiming for is a folder of unsorted images that, when I run the model over it, gets separated into two different folders according to the prediction.
I have been searching for an answer to this, and to be honest I am not sure whether I am simply not understanding it well enough, because every attempt I make to get this model working has come up empty.
Here is my training code:
from fastai import *
from fastai.vision import *
%matplotlib inline
%reload_ext autoreload
%autoreload 2
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
path = Path('my files path...')
print(path)
for folder in ('soap', 'deo'):  # I have more classes, but listing them all would waste space.
    print(folder)
    verify_images(path/folder, max_size=500)
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.3, ds_tfms=get_transforms(),
                                  size=224, num_workers=4).normalize(imagenet_stats)
data.classes
from fastai.metrics import error_rate
learn= create_cnn(data, models.resnet34, metrics=error_rate)
learn
defaults.device = torch.device('cuda')
learn.fit_one_cycle(5)
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(4, max_lr=slice(3e-6,3e-5))
learn.save('day2Test_02')
from fastai.widgets import *
ds, idxs = DatasetFormatter().from_toplosses(learn)
ImageCleaner(ds, idxs, path)
df = pd.read_csv(path/'cleaned.csv', header='infer')
df.head()
df[(df['name'].apply(lambda x: len(x)<5))]
np.random.seed(42)
db = (ImageList.from_df(df, path)
      .random_split_by_pct(0.2)
      .label_from_df()
      .transform(get_transforms(), size=224)
      .databunch(bs=8)).normalize(imagenet_stats)
data.classes, data.c, len(data.train_ds), len(data.valid_ds)
db.classes, db.c, len(db.train_ds), len(db.valid_ds)
learn.data = db
learn.freeze()
learn.fit_one_cycle(4)
learn.save('day2Test_02_01')
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(4, max_lr=slice(3e-5,3e-4))
learn.save('dat2Test_test2')
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
learn.export('day2Test_test2.pkl')
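For reference, here is a minimal sketch of the desktop inference script the question describes, assuming fastai v1 (the version the training code above uses); the folder paths are placeholders, and only the exported .pkl plus the unsorted images need to exist on the other machine.

from pathlib import Path
import shutil

from fastai.vision import load_learner, open_image

model_dir = Path('folder/containing/the/exported/pkl')   # placeholder paths
unsorted_dir = Path('folder/with/unsorted/images')
sorted_dir = Path('folder/for/sorted/images')

# load_learner rebuilds the model and its classes from the exported file,
# so the training data and notebook are not needed on this machine.
learn = load_learner(model_dir, 'Home_Care_Model.pkl')

for img_path in unsorted_dir.iterdir():
    if img_path.suffix.lower() not in ('.jpg', '.jpeg', '.png'):
        continue
    pred_class, pred_idx, probs = learn.predict(open_image(img_path))
    dest = sorted_dir / str(pred_class)          # one folder per predicted class
    dest.mkdir(parents=True, exist_ok=True)
    shutil.move(str(img_path), str(dest / img_path.name))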

Related

How to load a finetuned sciBERT model in AllenNLP?

I have finetuned the SciBERT model on the SciIE dataset. The repository uses AllenNLP to finetune the model. The training is executed as follows:
python -m allennlp.run train $CONFIG_FILE --include-package scibert -s "$#"
After a successful training I have a model.tar.gz file as output that contains weights.th, config.json, and a vocabulary folder. I have tried to load it with the allennlp Predictor:
from allennlp.predictors.predictor import Predictor
predictor = Predictor.from_path("model.tar.gz")
But I get the following error:
ConfigurationError: bert-pretrained not in acceptable choices for
dataset_reader.token_indexers.bert.type: ['single_id', 'characters',
'elmo_characters', 'spacy', 'pretrained_transformer',
'pretrained_transformer_mismatched']. You should either use the
--include-package flag to make sure the correct module is loaded, or use a fully qualified class name in your config file like {"model":
"my_module.models.MyModel"} to have it imported automatically.
I have never worked with AllenNLP, so I am quite lost about what to do.
For reference, this is the part of the config that describes the token indexers:
"token_indexers": {
"bert": {
"type": "bert-pretrained",
"do_lowercase": "false",
"pretrained_model": "/home/tomaz/neo4j/scibert/model/vocab.txt",
"use_starting_offsets": true
}
}
I am using this allennlp version:
Name: allennlp
Version: 1.2.1
Edit:
I think I have made a lot of progress. I have to use the same version that was used to train the model, and I can import the modules like so:
from allennlp.predictors.predictor import Predictor
from scibert.models.bert_crf_tagger import *
from scibert.models.bert_text_classifier import *
from scibert.models.dummy_seq2seq import *
from scibert.dataset_readers.classification_dataset_reader import *
predictor = Predictor.from_path("scibert_ner/model.tar.gz",
                                dataset_reader="classification_dataset_reader")
predictor.predict(
    sentence="Did Uriah honestly think he could beat The Legend of Zelda in under three hours?"
)
Now I get an error:
No default predictor for model type bert_crf_tagger.
Please specify a predictor explicitly
I know that I can use predictor_name to specify a predictor explicitly, but I haven't got the faintest idea which name to pick that would work.
I have seen a lot of people having this problem. Upon going through the repository code, I found this to be the easiest way to run the predictions:
python -m allennlp.run predict /path/to/saved_model/model.tar.gz /path/to/test.txt \
    --include-package scibert --use-dataset-reader \
    --output-file /path/to/where/you/want/predict.txt \
    --predictor sentence-tagger --batch-size 16
What did I add? The predictor sentence-tagger. Once you go through the repository, you will find that the registered predictor is sentence-tagger, although the DEFAULT_DICT of the taggers contains sentence_tagger. A lot of confusion, right?
This answer also saves you from writing a predictor.
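For completeness, a hedged sketch of the same thing from Python rather than the CLI, assuming allennlp 1.x: importing the scibert modules plays the role of --include-package, and predictor_name selects the registered predictor (try "sentence_tagger" or "sentence-tagger", whichever name the repository actually registers).

from allennlp.predictors.predictor import Predictor

# Importing these registers the custom model and dataset reader types,
# the same way --include-package scibert does on the command line.
from scibert.models.bert_crf_tagger import *
from scibert.dataset_readers.classification_dataset_reader import *

predictor = Predictor.from_path("scibert_ner/model.tar.gz",
                                predictor_name="sentence_tagger")
print(predictor.predict(
    sentence="Did Uriah honestly think he could beat The Legend of Zelda in under three hours?"
))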

Problem with loading language_model_learner fastai

I have a problem with the fastai library. My code is below:
import fastai
from fastai.text import *
import os
import pandas as pd
import fastai
from fastai import *
lab = df.columns[0]
data_lm = TextLMDataBunch.from_csv(r'/AWD', 'data.csv', label_cols = lab, text_cols = ['text'])
data_clas = TextClasDataBunch.from_csv(r'/AWD', 'data.csv', vocab = data_lm.train_ds.vocab, bs = 256,label_cols = lab, text_cols=['text'])
data_lm.save('data_lm_export.pkl')
data_clas.save('data_clas.pkl')
learn = language_model_learner(data_lm,AWD_LSTM,drop_mult = 0.3)
learn.lr_find()
learn.recorder.plot(skip_end=10)
learn.fit_one_cycle(10,1e-2,moms=(0.8,0.7))
learn.save('fit_head')
learn.load('fit_head')
My data is quite big, so each epoch in fit_one_cycle lasts about 6 hours. My resources only allow me to run a SLURM job for 70 hours, after which my whole script gets cancelled, so I wanted to divide my script into pieces; the first and longest part has to train and save fit_head. Everything was fine, but when I then wanted to load my model to train it further, I got this error:
RuntimeError: Error(s) in loading state_dict for SequentialRNN:
size mismatch for 0.encoder.weight: copying a param with shape torch.Size([54376, 400]) from checkpoint, the shape in current model is torch.Size([54720, 400]).
I have checked similar problems in GitHub issues and Stack Overflow posts and tried solutions like the one below, but I cannot find anything useful.
data_clas.vocab.stoi = data_lm.vocab.stoi
data_clas.vocab.itos = data_lm.vocab.itos
Is there any way to load the trained model without running into this issue?
When you do learner.save(), only the model weights (the state dict) are saved to your disk, not the model architecture or the data objects (including the vocabulary) used to build the learner.
To train the model in a different session, you must first define the model itself. Remember to use the same code to define your new model. Since your data is quite heavy, as you mentioned, you can use a very small subset (~16 records) of your data to create this new model, then do learn.load(model_path), and you should be able to resume training.
You can then point the learner back at the full training data with learn.data.train_dl = new_dl.
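For illustration, a minimal sketch of that resume step with fastai v1, under one assumption that differs slightly from the above: instead of rebuilding the data from a subset, it reloads the DataBunch already saved in the question (data_lm_export.pkl), which keeps the vocabulary, and therefore the embedding matrix size, identical to the checkpoint that the size-mismatch error complains about.

from fastai.text import *

# Reload the DataBunch saved with data_lm.save('data_lm_export.pkl') so the
# vocab (and hence the shape of 0.encoder.weight) matches the 'fit_head' checkpoint.
data_lm = load_data(r'/AWD', 'data_lm_export.pkl', bs=256)

learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.load('fit_head')                          # weights saved by the first job
learn.fit_one_cycle(10, 1e-2, moms=(0.8, 0.7))  # continue training
learn.save('fit_head_stage2')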

How can a loaded LGBM model produce different predictions on different machines?

I have previously saved an LGBM model with pickle. The program takes the content of a given web page, then classifies it.
The loaded model works well on my local computers (a desktop and a laptop), but it gives nonsense predictions when it runs on an EC2 instance. What could cause this?
The models are exactly the same on all three machines (desktop, laptop and EC2 instance)
The input data of the model (after the preprocessing steps) are the same
The code is the same (cloned via Git)
The predictions do make sense on the local machines
The numpy, scikit-learn and other package versions are the same on all three machines
I load the model with:
self.mapper = pickle.load(open(model_path + "mapper.pkl", "rb"))
self.model = pickle.load(open(model_path + "model.pkl", "rb"))
And I do predictions with this, in case you wonder (sorry for the messy code; this part wasn't written by me):
data = self.pipeline.transform([{"url_name": url}])
df = pd.DataFrame({"text": [data]})
processed = self.mapper.transform(df)
processed = [a[0] for a in processed]
ws_results = OrderedDict()
for p1, p2 in sorted(zip(self.classes, self.model.predict_proba(processed).tolist()[0]),
                     reverse=True, key=lambda x: x[1])[:3]:
    ws_results[p1] = round(p2, 3)
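One hedged way to narrow this down (a debugging sketch, not code from the question; model_path and 'processed_row.npy' are placeholders) is to compare, on each machine, the lightgbm version itself, the exact bytes of the pickled artifacts, and the prediction on one identical preprocessed row saved with np.save on the desktop and copied to EC2.

import hashlib
import pickle

import lightgbm
import numpy as np
import sklearn

print("lightgbm", lightgbm.__version__,
      "sklearn", sklearn.__version__,
      "numpy", np.__version__)

model_path = "path/to/models/"   # placeholder: same directory used in the question

# Same bytes on every machine? If the hashes differ, the artifacts differ.
for name in ("mapper.pkl", "model.pkl"):
    with open(model_path + name, "rb") as f:
        print(name, hashlib.md5(f.read()).hexdigest())

# Score one identical, already-preprocessed row on every machine.
with open(model_path + "model.pkl", "rb") as f:
    model = pickle.load(f)
row = np.load("processed_row.npy")
print(model.predict_proba(row.reshape(1, -1))[0])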

Use a saved trained model to predict on a new dataset

I am using theano, sklearn and numpy in Python. I found this code for saving my trained network and predicting on my new dataset at this link: https://github.com/lzhbrian/RBM-DBN-theano-DL4J/blob/master/src/theano/code/logistic_sgd.py. The part of the code I am using is this:
"""
An example of how to load a trained model and use it
to predict labels.
"""
def predict():
    # load the saved model
    classifier = pickle.load(open('best_model.pkl'))

    # compile a predictor function
    predict_model = theano.function(
        inputs=[classifier.input],
        outputs=classifier.y_pred)

    # We can test it on some examples from the test set
    dataset = 'mnist.pkl.gz'
    datasets = load_data(dataset)
    test_set_x, test_set_y = datasets[2]
    test_set_x = test_set_x.get_value()

    predicted_values = predict_model(test_set_x[:10])
    print("Predicted values for the first 10 examples in test set:")
    print(predicted_values)

if __name__ == '__main__':
    sgd_optimization_mnist()
The code for the neural network model I want to save, load and predict with is https://github.com/aseveryn/deep-qa. I could save and load the model with cPickle, but I continuously get errors in the "# compile a predictor function" part:
predict_model = theano.function(inputs=[classifier.input],outputs=classifier.y_pred)
Actually I am not certain what I need to put in the inputs according to my code. Which one is right?
inputs=[main.predict_prob_batch.batch_iterator],
outputs=test_nnet.layers[-1].y_pred)

inputs=[predict_prob_batch.batch_iterator],
outputs=test_nnet.layers[-1].y_pred)

inputs=[MiniBatchIteratorConstantBatchSize.dataset],
outputs=test_nnet.layers[-1].y_pred)

inputs=[sgd_trainer.MiniBatchIteratorConstantBatchSize.dataset],
outputs=test_nnet.layers[-1].y_pred)

Or none of them?
With each one I tried, I got errors:
ImportError: No module named MiniBatchIteratorConstantBatchSize
or
NameError: global name 'predict_prob_batch' is not defined
I would really appreciate it if you could help me.
I also used these commands to run the code, but I still get the errors:
python -c 'from run_nnet import predict; from sgd_trainer import MiniBatchIteratorConstantBatchSize; from MiniBatchIteratorConstantBatchSize import dataset; print predict()'
python -c 'from run_nnet import predict; from sgd_trainer import *; from MiniBatchIteratorConstantBatchSize import dataset; print predict()'
Thank you, and please let me know if you know a better way to predict on a new dataset with the loaded trained model.

Install issues with 'lr_utils' in python

I am trying to complete some homework in a DeepLearning.ai course assignment.
When I try the assignment on the Coursera platform everything works fine; however, when I try to do the same imports on my local machine it gives me an error:
ModuleNotFoundError: No module named 'lr_utils'
I have tried resolving the issue by installing lr_utils but to no avail.
There is no mention of this module online, and I have started to wonder whether it is proprietary to deeplearning.ai.
Or can we resolve this issue in some other way?
You will be able to find lr_utils.py and all the other .py files (and thus the code inside them) required by the assignments as follows:
Go to the first assignment (i.e. Python Basics with numpy), which you can always access whether you are a paid user or not.
Then click on the 'Open' button in the menu bar above.
Then you can include the code of the modules directly in your code.
As per the answer above, lr_utils is part of the deep learning course and is a utility to download the data sets. It should readily work with the paid version of the course, but in case you 'lost' access to it, I noticed this GitHub project has lr_utils.py as well as some data sets:
https://github.com/andersy005/deep-learning-specialization-coursera/tree/master/01-Neural-Networks-and-Deep-Learning/week2/Programming-Assignments
Note:
The Chinese website links did not work when I looked at them; maybe the server storing the files expired. I did see that this GitHub project had some datasets as well as the lr_utils file, though.
EDIT: The link no longer seems to work. Maybe this one will do?
https://github.com/knazeri/coursera/blob/master/deep-learning/1-neural-networks-and-deep-learning/2-logistic-regression-as-a-neural-network/lr_utils.py
Download the datasets from the answer above.
And use this code (It's better than the above since it closes the files after usage):
def load_dataset():
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])

    with h5py.File('datasets/test_catvnoncat.h5', "r") as test_dataset:
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])
        classes = np.array(test_dataset["list_classes"][:])

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
"lr_utils" is not official library or something like that.
Purpose of "lr_utils" is to fetch the dataset that is required for course.
option (didn't work for me): go to this page and there is a python code for downloading dataset and creating "lr_utils"
I had a problem with fetching data from provided url (but at least you can try to run it, maybe it will work)
option (worked for me): in the comments (at the same page 1) there are links for manually downloading dataset and "lr_utils.py", so here they are:
link for dataset download
link for lr_utils.py script download
Remember to extract the dataset when you download it, and put the dataset folder and "lr_utils.py" in the same folder as the Python script that uses them (the script with the line "import lr_utils").
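As a tiny illustration of the layout described above (assuming lr_utils.py and the datasets folder sit next to your own script), the import then looks like this:

from lr_utils import load_dataset

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
print(train_set_x_orig.shape, test_set_x_orig.shape)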
The way I fixed this problem was by:
Clicking File -> Open; you will see the lr_utils.py file (it does not matter whether you have the paid or free version of the course).
Opening the lr_utils.py file in Jupyter Notebooks and clicking File -> Download (store it in your own folder), then rerunning the module imports. It will work like magic.
I did the same process for the datasets folder.
You can download the train and test datasets directly here: https://github.com/berkayalan/Deep-Learning/tree/master/datasets
And you need to add this code to the beginning:
import numpy as np
import h5py
import os
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
I faced a similar problem and followed these steps:
1. Import the following libraries:
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
2. Download train_catvnoncat.h5 and test_catvnoncat.h5 from either of the links below:
[https://github.com/berkayalan/Neural-Networks-and-Deep-Learning/tree/master/datasets]
or
[https://github.com/JudasDie/deeplearning.ai/tree/master/Improving%20Deep%20Neural%20Networks/Week1/Regularization/datasets]
3. Create a folder named datasets and put these two files in it.
[Note: the datasets folder and your source code file should be in the same directory.]
4. Run the following code:
def load_dataset():
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])

    with h5py.File('datasets/test_catvnoncat.h5', "r") as test_dataset:
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])
        classes = np.array(test_dataset["list_classes"][:])

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
5. Load the data:
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
Check the datasets:
print(len(train_set_x_orig))
print(len(test_set_x_orig))
Your dataset is ready; you can check the length of the train_set_x_orig and test_set_x_orig variables. For mine, they were 209 and 50.
I could download the dataset directly from the Coursera page.
Once you open the Coursera notebook, go to File -> Open and a file browser window will be displayed.
There the notebooks and datasets are listed; you can go to the datasets folder and download the required data for the assignment. The lr_utils.py file is also available for download.
Below is the code; just save it in a file named "lr_utils.py" and you can use it.
import numpy as np
import h5py
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
If your code cannot find your newly created lr_utils.py file, just add this code:
import sys
sys.path.append("full path of the directory where you saved lr_utils.py file")
Here is the way to get the dataset, as per #ThinkBonobo:
https://github.com/andersy005/deep-learning-specialization-coursera/tree/master/01-Neural-Networks-and-Deep-Learning/week2/Programming-Assignments/datasets
Write an lr_utils.py file, as in the answer above by #StationaryTraveller, and put it in any directory on sys.path.
def load_dataset():
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        ....
BUT make sure that you delete the 'datasets/' prefix, because now the name of your data file is just train_catvnoncat.h5.
Restart the kernel and good luck.
I may add to the answers that you can save the lr_utils script to disk and import it as a module using the importlib.util functions, in the following way.
The code below comes from the general thread about importing functions from external files into the current session:
How to import a module given the full path?
### Source load_dataset() function from a file
import importlib.util

# Specify a name (I think it can be whatever) and the path to the lr_utils.py script locally on your PC:
util_script = importlib.util.spec_from_file_location("utils function", "D:/analytics/Deep_Learning_AI/functions/lr_utils.py")
# Make a module
load_utils = importlib.util.module_from_spec(util_script)
# Execute it on the fly
util_script.loader.exec_module(load_utils)
# Load your function
load_utils.load_dataset()
# Then you can use your load_dataset() coming from above specified 'module' called load_utils
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_utils.load_dataset()
# This could be a general way of calling different user specified modules so I did the same for the rest of the neural network function and put them into separate file to keep my script clean.
# Just remember that Python treat it like a module so you need to prefix the function name with a 'module' name eg.:
# d = nnet_utils.model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1000, learning_rate = 0.005, print_cost = True)
nnet_script = importlib.util.spec_from_file_location("utils function", "D:/analytics/Deep_Learning_AI/functions/lr_nnet.py")
nnet_utils = importlib.util.module_from_spec(nnet_script)
nnet_script.loader.exec_module(nnet_utils)
That was the most convenient way for me to source functions/methods from different files in Python so far.
I come from an R background, where you can call the one-line function source() to bring an external script's contents into your current session.
The above answers didn't help me, and some links had expired.
So, lr_utils is not a pip library but a file in the same notebook workspace on the Coursera website.
You can click on "Open", and it'll open the explorer where you can download everything that you would want to run in another environment.
(I used this on a browser.)
This is how I solved mine: I copied the lr_utils file and pasted it into my notebook, and then I zipped the dataset files with the code below so I could download and extract the archive. Note: run the code in the Coursera notebook and select only the zipped file in the directory to download.
!pip install zipfile36
import zipfile

zf = zipfile.ZipFile('datasets/train_catvnoncat_h5.zip', mode='w')
try:
    zf.write('datasets/train_catvnoncat.h5')
    zf.write('datasets/test_catvnoncat.h5')
finally:
    zf.close()
