Cannot load a pickle file using joblib of sklearn

Cannot load a pickle file using joblib of sklearn - python

I trained a model in a cluster, downloaded it (pkl format) and tried to load locally. I know that sklearn's version of joblib was used to save a model mymodel.pkl (but I don't know which exactly version...).
from sklearn.externals import joblib
print(joblib.__version__)
model = joblib.load("mymodel.pkl")
I use the version 0.13.0 of sklearn's joblib locally.
This is the error that I got:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-100-d0a3c42e5c53> in <module>
3 print(joblib.__version__)
4
----> 5 model = joblib.load("mymodel.pkl")
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py in load(filename, mmap_mode)
596 return load_compatibility(fobj)
597
--> 598 obj = _unpickle(fobj, filename, mmap_mode)
599
600 return obj
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
524 obj = None
525 try:
--> 526 obj = unpickler.load()
527 if unpickler.compat_mode:
528 warnings.warn("The file '%s' has been generated with a "
~\AppData\Local\Continuum\anaconda3\lib\pickle.py in load(self)
1083 raise EOFError
1084 assert isinstance(key, bytes_types)
-> 1085 dispatch[key[0]](self)
1086 except _Stop as stopinst:
1087 return stopinst.value
KeyError: 239
Update:
Also I tried, but got an error AttributeError: 'str' object has no attribute 'readable':
with io.BufferedReader("mymodel.pkl") as pickle_file:
model = pickle.load(pickle_file)

You tried to dump it with joblib.dump('pipeline','mymodel.pkl'). This only dumped the string 'pipeline'! Not your actual pipeline object.
Dump it correctly with:
joblib.dump(pipeline,'mymodel.pkl')
...then read back with:
model = joblib.load('mymodel.pkl')

Related

AttributeError: 'IntervalArray' object has no attribute '_dtype' while reading pickle file to dataframe

I want to read a pickle file into a data frame. However, I get the following error message and I don't know how to solve the issue.
df_batches = pd.read_pickle(folder_in / "batches_and_phases.p")
Error Message:
AttributeError Traceback (most recent call last)
File ~\Master_Thesis\mypython\lib\site-packages\pandas\io\pickle.py:205, in read_pickle(filepath_or_buffer, compression, storage_options)
204 warnings.simplefilter("ignore", Warning)
--> 205 return pickle.load(handles.handle)
206 except excs_to_catch:
207 # e.g.
208 # "No module named 'pandas.core.sparse.series'"
209 # "Can't get attribute '__nat_unpickle' on <module 'pandas._libs.tslib"
File ~\Master_Thesis\mypython\lib\site-packages\pandas\_libs\internals.pyx:750, in pandas._libs.internals.BlockManager.__setstate__()
File ~\Master_Thesis\mypython\lib\site-packages\pandas\_libs\internals.pyx:767, in pandas._libs.internals.BlockManager.__setstate__()
File ~\Master_Thesis\mypython\lib\site-packages\pandas\core\internals\blocks.py:2143, in ensure_block_shape(values, ndim)
2142 if values.ndim < ndim:
-> 2143 if not is_1d_only_ea_dtype(values.dtype):
2144 # TODO(EA2D): https://github.com/pandas-dev/pandas/issues/23023
2145 # block.shape is incorrect for "2D" ExtensionArrays
2146 # We can't, and don't need to, reshape.
2147 values = cast("np.ndarray | DatetimeArray | TimedeltaArray", values)
File ~\Master_Thesis\mypython\lib\site-packages\pandas\core\arrays\interval.py:624, in IntervalArray.dtype(self)
622 #property
623 def dtype(self) -> IntervalDtype:
--> 624 return self._dtype
AttributeError: 'IntervalArray' object has no attribute '_dtype'
Thank you!

Joblib process not working when loading model

I have created an image classification model using pre-trained model inceptionV3. After I trained the model on my dataset I saved the model using joblib. When trying to load the model Im getting error "Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ram://1ea4479d-6a25-4562-965a-428f7eb33342/variables/variables
You may be trying to load on a different device from the computational device. Consider setting the experimental_io_device option in tf.saved_model.LoadOptions to the io_device such as '/job:localhost'."
Any idea why is this message appearing or is it because you cant use joblib to save a model made from pre-trained model. Below is the code and the error
import joblib
joblib.dump(inceptionv3_model, 'inceptV3_model.pkl')
model_inceptionv3 = joblib.load('inceptV3_model.pkl')
FileNotFoundError Traceback (most recent call last)
<ipython-input-14-8ed26b03fd7d> in <module>
1 # loading the model
----> 2 model_inceptionv3 = joblib.load('C:/Users/Indranil/inceptV3_model.pkl')
~\anaconda3\lib\site-packages\joblib\numpy_pickle.py in load(filename, mmap_mode)
583 return load_compatibility(fobj)
584
--> 585 obj = _unpickle(fobj, filename, mmap_mode)
586 return obj
~\anaconda3\lib\site-packages\joblib\numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
502 obj = None
503 try:
--> 504 obj = unpickler.load()
505 if unpickler.compat_mode:
506 warnings.warn("The file '%s' has been generated with a "
~\anaconda3\lib\pickle.py in load(self)
1208 raise EOFError
1209 assert isinstance(key, bytes_types)
-> 1210 dispatch[key[0]](self)
1211 except _Stop as stopinst:
1212 return stopinst.value
~\anaconda3\lib\pickle.py in load_reduce(self)
1585 args = stack.pop()
1586 func = stack[-1]
-> 1587 stack[-1] = func(*args)
1588 dispatch[REDUCE[0]] = load_reduce
1589
~\anaconda3\lib\site-packages\keras\saving\pickle_utils.py in deserialize_model_from_bytecode(serialized_model)
46 with tf.io.gfile.GFile(dest_path, "wb") as f:
47 f.write(archive.extractfile(name).read())
---> 48 model = save_module.load_model(temp_dir)
49 tf.io.gfile.rmtree(temp_dir)
50 return model
~\anaconda3\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
~\anaconda3\lib\site-packages\tensorflow\python\saved_model\load.py in load_internal(export_dir, tags, options, loader_cls, filters)
975 ckpt_options, options, filters)
976 except errors.NotFoundError as err:
--> 977 raise FileNotFoundError(
978 str(err) + "\n You may be trying to load on a different device "
979 "from the computational device. Consider setting the "
FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ram://1ea4479d-6a25-4562-965a-428f7eb33342/variables/variables
You may be trying to load on a different device from the computational device. Consider setting the `experimental_io_device` option in `tf.saved_model.LoadOptions` to the io_device such as '/job:localhost'.

Trying to save model in h5 (using keras and tensorflow) - unable to create link

RuntimeError: Traceback (most recent call
last) in ()
128 # Run experiments with the Shifted Patch Tokenization and Locality Self Attention modified ViT
129 vit_net = create_vit_classifier(vanilla=False)
--> 130 vit_model = run_experiment(vit_net)
3 frames /usr/local/lib/python3.7/dist-packages/h5py/_hl/group.py in
setitem(self, name, obj)
371
372 if isinstance(obj, HLObject):
--> 373 h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
374
375 elif isinstance(obj, SoftLink):
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5o.pyx in h5py.h5o.link()
RuntimeError: Unable to create link (name already exists)

BertForTokenClassification Not loading

I tried to load a Bert model from local directory and it was showing an error
I am using cuda 10.0 version and pytorch 1.6.0
Code to load model:-
output_dir = './ner_model/'
model = BertForTokenClassification.from_pretrained(output_dir)
tokenizer = BertTokenizer.from_pretrained(output_dir)
model.to(device)
Any help would be appreicated
ReadError: invalid header
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
~\anaconda3\envs\env\lib\site-packages\transformers\modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
511 try:
--> 512 state_dict = torch.load(resolved_archive_file, map_location="cpu")
513 except Exception:
~\anaconda3\envs\env\lib\site-packages\torch\serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
385 try:
--> 386 return _load(f, map_location, pickle_module, **pickle_load_args)
387 finally:
~\anaconda3\envs\env\lib\site-packages\torch\serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
558 # .zip is used for torch.jit.save and will throw an un-pickling error here
--> 559 raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
560 # if not a tarfile, reset file offset and proceed
RuntimeError: ./ner_model/pytorch_model.bin is a zip archive (did you mean to use torch.jit.load()?)
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-13-770da388c2c8> in <module>
23
24 output_dir = './ner_model/'
---> 25 model = BertForTokenClassification.from_pretrained(output_dir)
26 tokenizer = BertTokenizer.from_pretrained(output_dir)
27 model.to(device)
~\anaconda3\envs\env\lib\site-packages\transformers\modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
513 except Exception:
514 raise OSError(
--> 515 "Unable to load weights from pytorch checkpoint file. "
516 "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. "
517 )
OSError: Unable to load weights from pytorch checkpoint file. If you tried to load a PyTorch model from a TF 2.0 checkpoint, pleas
e set from_tf=True.

Word2Vec error when loading in GoogleNews data

I am following a tutorial here: https://towardsdatascience.com/multi-class-text-classification-model-comparison-and-selection-5eb066197568
I am at the part "Word2vec and Logistic Regression". I have downloaded the "GoogleNews-vectors-negative300.bin.gz" file and I am tyring to apply it to my own text data. However when I get to the following code:
%%time
from gensim.models import Word2Vec
wv = gensim.models.KeyedVectors.load_word2vec_format("/data/users/USERS/File_path/classifier/GoogleNews_Embedding/GoogleNews-vectors-negative300.bin.gz", binary=True)
wv.init_sims(replace=True)
I run into the following error:
/data/users/msmith/env/lib64/python3.6/site-packages/smart_open/smart_open_lib.py:398: UserWarning: This function is deprecated, use smart_open.open instead. See the migration notes for details: https://github.com/RaRe-Technologies/smart_open/blob/master/README.rst#migrating-to-the-new-open-function
'See the migration notes for details: %s' % _MIGRATION_NOTES_URL
---------------------------------------------------------------------------
EOFError Traceback (most recent call last)
<timed exec> in <module>
~/env/lib64/python3.6/site-packages/gensim/models/keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
1492 return _load_word2vec_format(
1493 cls, fname, fvocab=fvocab, binary=binary, encoding=encoding, unicode_errors=unicode_errors,
-> 1494 limit=limit, datatype=datatype)
1495
1496 def get_keras_embedding(self, train_embeddings=False):
~/env/lib64/python3.6/site-packages/gensim/models/utils_any2vec.py in _load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
383 with utils.ignore_deprecation_warning():
384 # TODO use frombuffer or something similar
--> 385 weights = fromstring(fin.read(binary_len), dtype=REAL).astype(datatype)
386 add_word(word, weights)
387 else:
/usr/lib64/python3.6/gzip.py in read(self, size)
274 import errno
275 raise OSError(errno.EBADF, "read() on write-only GzipFile object")
--> 276 return self._buffer.read(size)
277
278 def read1(self, size=-1):
/usr/lib64/python3.6/_compression.py in readinto(self, b)
66 def readinto(self, b):
67 with memoryview(b) as view, view.cast("B") as byte_view:
---> 68 data = self.read(len(byte_view))
69 byte_view[:len(data)] = data
70 return len(data)
/usr/lib64/python3.6/gzip.py in read(self, size)
480 break
481 if buf == b"":
--> 482 raise EOFError("Compressed file ended before the "
483 "end-of-stream marker was reached")
484
EOFError: Compressed file ended before the end-of-stream marker was reached
Any idea whats gone wrong/ how to overcome this issue?
Thanks in advance!

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Cannot load a pickle file using joblib of sklearn - python

You tried to dump it with joblib.dump('pipeline','mymodel.pkl'). This only dumped the string 'pipeline'! Not your actual pipeline object. Dump it correctly with: joblib.dump(pipeline,'mymodel.pkl') ...then read back with: model = joblib.load('mymodel.pkl')

Related

AttributeError: 'IntervalArray' object has no attribute '_dtype' while reading pickle file to dataframe

Joblib process not working when loading model

Trying to save model in h5 (using keras and tensorflow) - unable to create link

BertForTokenClassification Not loading

Word2Vec error when loading in GoogleNews data

Categories

Resources