Error in loading model in PyTorch - python

I have the following code snippet:
from train import predict
import random
import torch
ann=torch.load('ann.pt') #importing trained model
while True:
    k = raw_input("User:")
    intent, top_value, top_index = predict(str(k), ann)
    print(intent)
When I run the script, it throws the following error:
Traceback (most recent call last):
File "test.py", line 6, in <module>
ann=torch.load('ann.pt') #importing trained model
File "/home/local/ZOHOCORP/raghav-5305/miniconda2/lib/python2.7/site-packages/torch/serialization.py", line 261, in load
return _load(f, map_location, pickle_module)
File "/home/local/ZOHOCORP/raghav-5305/miniconda2/lib/python2.7/site-packages/torch/serialization.py", line 409, in _load
result = unpickler.load()
AttributeError: 'module' object has no attribute 'ANN'
The ann.pt file is in the same folder as my script.
Kindly help me identify and fix the error so that I can load the model.
Thanks in advance.

When you save the whole model rather than just its parameters, PyTorch pickles the parameters but stores only a reference to the model class (its module path and name), not the class itself. Changing the directory structure or refactoring the code can therefore break loading.
As the documentation points out, this approach is not recommended; prefer saving and loading only the parameters:
...the serialized data is bound to the specific classes and the exact directory structure used, so it can break in various ways when used in other projects, or after some serious refactors.
For more help, it'll be useful to show your saving code.
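To illustrate the failure mode, here is a minimal sketch that uses only the standard pickle module, which torch.load uses by default. The module name fake_train and the ANN class below are hypothetical stand-ins, not the asker's real code; the point is only that the pickled data stores a reference to the class, so loading fails with an AttributeError once the class can no longer be found at that location.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module that defined the model class at save time.
train_mod = types.ModuleType("fake_train")


class ANN(object):
    def __init__(self, weights):
        self.weights = weights


# Make the class look as if it lives in fake_train (assumption for the demo).
ANN.__module__ = "fake_train"
ANN.__qualname__ = "ANN"
train_mod.ANN = ANN
sys.modules["fake_train"] = train_mod

# The pickle contains the instance data plus the reference "fake_train.ANN",
# not the class definition itself.
blob = pickle.dumps(ANN([0.1, 0.2]))

# Simulate a refactor: the class is no longer reachable at the recorded location.
del train_mod.ANN
try:
    pickle.loads(blob)
    load_error = None
except AttributeError as exc:
    load_error = exc  # the class lookup failed, as in the traceback above

# Once the class is importable from the same place again, loading works.
train_mod.ANN = ANN
restored = pickle.loads(blob)
```

For the parameter-only route, the usual PyTorch pattern is torch.save(model.state_dict(), path) when saving, then constructing the model class in code and calling model.load_state_dict(torch.load(path)) when loading. The class definition still has to be importable, but it is no longer baked into the saved file.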

Related

How to migrate from keras.engine.[...] to tf.keras.engine?

As discussed in keras vs. tensorflow.python.keras - which one to use? I generally do not want to include keras in my machine learning project. Yet in the following code example using keras works and tf.keras does not:
>>> import keras
>>> print(keras.engine.keras_tensor.KerasTensor)
<class 'keras.engine.keras_tensor.KerasTensor'>
and
>>> import tensorflow as tf
>>> print(tf.keras.engine.keras_tensor.KerasTensor)
Traceback (most recent call last):
File "c:\Users\reifv\root\Heidelberg Master\Netflix_AI_codes\working env - Copy\Netflix_AI\main copy.py", line 4, in <module>
print(tf.keras.engine.keras_tensor.KerasTensor)
File "C:\Users\reifv\root\Heidelberg Master\Netflix_AI_codes\working env - Copy\Netflix_AI\venvnf\lib\site-packages\tensorflow\python\util\lazy_loader.py", line 59, in __getattr__
return getattr(module, item)
AttributeError: module 'keras.api._v2.keras' has no attribute 'engine'
This is one example of tf.keras not working, but I encountered this issue several times along my project. Therefore, a general solution would be much appreciated, too.

Why does the Dequantize node fail to prepare?

Background
I'm playing around with MediaPipe for hand tracking and found this useful wrapper for loading MediaPipe's hand_landmark.tflite model. It works without any problems for me on Ubuntu 18.04 with Tensorflow 1.14.0.
However, when I try to use a newer, recently released model, I run into the following error:
INFO: Initialized TensorFlow Lite runtime.
Traceback (most recent call last):
File "/home/user/code/.../repo/models/test_model.py", line 12, in <module>
use_mediapipe_model()
File "/home/user/code/.../repo/models/test_model.py", line 8, in use_mediapipe_model
interp_joint.allocate_tensors()
File "/home/user/code/env/lib/python3.6/site-packages/tensorflow/lite/python/interpreter.py", line 95, in allocate_tensors
return self._interpreter.AllocateTensors()
File "/home/user/code/env/lib/python3.6/site-packages/tensorflow/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 106, in AllocateTensors
return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_AllocateTensors(self)
RuntimeError: tensorflow/lite/kernels/dequantize.cc:62 op_context.input->type == kTfLiteUInt8 || op_context.input->type == kTfLiteInt8 was not true.Node number 0 (DEQUANTIZE) failed to prepare.
When looking at the two models in Netron, I can see that the newer model uses nodes of the type Dequantize which seem to cause the problem. As I'm a beginner when it comes to Tensorflow I don't really know where to go from here.
Code to reproduce the error
from pathlib import Path
import tensorflow as tf

def use_mediapipe_model():
    interp_joint = tf.lite.Interpreter(
        f"{Path(__file__).parent}/hand_landmark.tflite")  # path to model
    interp_joint.allocate_tensors()

if __name__ == "__main__":
    use_mediapipe_model()
Question
Is the problem related to the version of Tensorflow that I'm using or am I doing something wrong when it comes to loading the .tflite models?
This model doesn't work in TF 1.14.0. You need at least 1.15.2.
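If you want to fail early with a clear message instead of hitting the DEQUANTIZE error, a small version guard is one option. This is only a sketch: the helper names version_tuple and meets_minimum are made up for illustration, and the 1.15.2 default is taken from the answer above.

```python
import re


def version_tuple(version):
    """Return the leading numeric components of a version string as ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "post1"
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="1.15.2"):
    """True if the installed version is at least the required minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# Hypothetical usage in the loading script:
#   import tensorflow as tf
#   if not meets_minimum(tf.__version__):
#       raise RuntimeError("TensorFlow >= 1.15.2 is needed for this model")
```

The comparison ignores suffixes such as rc1 or post1, which is good enough for a coarse guard but not a full version parser.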

Pipeline error: "AttributeError: 'ColumnTransformer' object has no attribute '_feature_names_in'"

I am trying to use scikit-learn to predict on new data using a pipeline object I trained back in February. Since Friday, February 28th, the predict function no longer works for my pipeline object and fails with the following error:
>>> df = pd.read_csv('test_df_for_example.csv')
>>> mdl = joblib.load('split_0_model.pkl')
>>> mdl.predict(df)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/metaestimators.py", line 116, in <lambda>
out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/pipeline.py", line 419, in predict
Xt = transform.transform(Xt)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/compose/_column_transformer.py", line 587, in transform
self._validate_features(X.shape[1], X_feature_names)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/compose/_column_transformer.py", line 411, in _validate_features
if ((self._feature_names_in is None or feature_names is None)
AttributeError: 'ColumnTransformer' object has no attribute '_feature_names_in'
I am using Microsoft Azure's virtual machines to do this predicting (although I ran the above code on my local computer), so controlling the versions of the modules is difficult, and most of the time I am forced to use the latest versions of packages. I believe this error comes from scikit-learn's new version 0.22.2.post1, which I am using.
I have an example CSV with testing data here
The model file pickled with joblib here
And code to reproduce the error here
And yaml environment file here
Is there any way I can upgrade my model so that this error does not occur?
Thanks!
Kristine
I recommend pinning down versions in your YAML, especially with the speed of releases in the azureml space.
So downgrading sklearn to the last stable build for your use case may be the solution, or upgrading the rest of your code base to accommodate the new sklearn version.
Ex.
- pip:
  - sklearn==0.20.0
  - azureml-sdk==1.0.85
  - etc...
Thanks to Nema, using the following specifications I'm able to use the previous scikit-learn version to load my model:
- scikit-learn<=0.21.3
- azureml-sdk<=1.0.83
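Alongside pinning, it can help to record the library version next to the model so that a mismatch shows up as a clear message rather than an AttributeError deep inside predict. The sketch below uses plain pickle and made-up helper names (dump_with_version, load_checked); it is an assumption about how you might structure this, not part of scikit-learn or joblib. In real use you would pass sklearn.__version__ as the version string.

```python
import pickle


def dump_with_version(model, library_version):
    """Serialize the model together with the library version used to train it."""
    payload = {"library_version": library_version, "model": model}
    return pickle.dumps(payload)


def load_checked(blob, installed_version):
    """Load the model, refusing if it was saved under a different library version."""
    payload = pickle.loads(blob)
    saved_version = payload["library_version"]
    if saved_version != installed_version:
        raise RuntimeError(
            "model was saved with version %s but version %s is installed"
            % (saved_version, installed_version)
        )
    return payload["model"]


# Stand-in object for the demo; a fitted pipeline would go here instead.
blob = dump_with_version({"coef": [1, 2]}, "0.21.3")
model = load_checked(blob, "0.21.3")
```

An exact-match check is deliberately strict; you could relax it to compare only the major and minor components if patch releases are known to be compatible for your model.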

Running model in custom Data Set(2 classes) error in testing phase (MXNet framework)

I am running a model on my own dataset with 2 classes (the project was implemented for training/testing with ImageNet). I have made all the changes (in config files etc.), but after training finishes successfully, I get the following error when testing starts:
wrote gt roidb to ./data/cache/ImageNetVID_DET_val_gt_roidb.pkl
Traceback (most recent call last):
File "experiments/dff_rfcn/dff_rfcn_end2end_train_test.py", line 20, in <module>
test.main()
File "experiments/dff_rfcn/../../dff_rfcn/test.py", line 53, in main
args.vis, args.ignore_cache, args.shuffle, config.TEST.HAS_RPN, config.dataset.proposal, args.thresh, logger=logger, output_path=final_output_path)
File "experiments/dff_rfcn/../../dff_rfcn/function/test_rcnn.py", line 68, in test_rcnn
roidbs_seg_lens[gpu_id] += x['frame_seg_len']
KeyError: 'frame_seg_len'
I cleaned the cache file before running. As I have read in previous topics, this might be an issue with .pkl files from previous datasets left in the cache. What may have caused this error? I also want to mention that I changed the .txt filenames that feed the neural network (if this is important), and that training finishes well.
This is my first time running a deep learning project, so please show some understanding.
MXNet typically does not use pickle directly to serialize the model architecture and trained weights.
With the Gluon API, you can save the weights of the model (i.e. a Block) to a file with .save_params() and then load them back from the file with .load_params() (newer MXNet releases name these .save_parameters() and .load_parameters()). You 'save' the model architecture by keeping the code used to define the model. See an example of this here.
With the Module API, you can create checkpoints at the end of each epoch, which save the symbol (i.e. model architecture) and the parameters (i.e. model weights). See here.
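import mxnet as mx
# model_prefix, net and train_iter are assumed to be defined by your training script
checkpoint = mx.callback.do_checkpoint(model_prefix)
mod = mx.mod.Module(symbol=net)
mod.fit(train_iter, num_epoch=5, epoch_end_callback=checkpoint)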
You can then load the model of a given checkpoint (e.g. 42 in this example)
sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, 42)
mod.set_params(arg_params, aux_params)

Unable to load a pretrained model

After training my model for almost 2 days, 3 files were generated:
best_model.ckpt.data-00000-of-00001
best_model.ckpt.index
best_model.ckpt.meta
where best_model is my model name.
When I try to import my model using the following command
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('best_model.ckpt.meta')
    saver.restore(sess, "best_model.ckpt")
I get the following error
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/shreyash/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1577, in import_meta_graph
**kwargs)
File "/home/shreyash/.local/lib/python2.7/site-packages/tensorflow/python/framework/meta_graph.py", line 498, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/home/shreyash/.local/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 259, in import_graph_def
raise ValueError('No op named %s in defined operations.' % node.op)
ValueError: No op named attn_add_fun_f32f32f32 in defined operations.
How to fix this?
I have referred to this post: TensorFlow, why there are 3 files after saving the model?
Tensorflow version 1.0.0 installed using pip
Linux version 16.04
python 2.7
The importer can't find a very specific function in your graph, namely attn_add_fun_f32f32f32, which is likely one of the attention functions.
You have probably run into this issue. However, they say it's bundled in TensorFlow 1.0. Double-check that the installed TensorFlow version contains attention_decoder_fn.py (or, if you are using another library, check that it's there).
If it's there, here are your options:
- Rename this operation, if possible. You might want to read this discussion for workarounds.
- Duplicate your graph definition, so that you don't have to call import_meta_graph, and restore the model into the current graph instead.
