UserDict and UserList subclasses seem to behave unreliably with the pickle module. How do I fix this bug?
test_pickle.py
import pickle
class UserList(list):
    pass
class UserDict(dict):
    pass
u = UserList([])
for i in range(10):
    d = UserDict()
    d.u = u
    u.append(d)
pickle.dump(u, open("ttt.pcl", 'wb'))
$ python test_pickle.py
... <~300 traceback lines>
File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems
save(v)
File "/usr/lib/python2.7/pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python2.7/pickle.py", line 405, in save_reduce
self.memoize(obj)
File "/usr/lib/python2.7/pickle.py", line 244, in memoize
assert id(obj) not in self.memo
AssertionError
Now, if I increase the number of elements in UserList, it gets even "better":
import pickle
class UserList(list):
    pass
class UserDict(dict):
    pass
u = UserList([])
for i in range(100):
    d = UserDict()
    d.u = u
    u.append(d)
pickle.dump(u, open("ttt.pcl", 'wb'))
$ python test_pickle.py
... <more lines than my terminal can handle>
File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems
save(v)
File "/usr/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "/usr/lib/python2.7/copy_reg.py", line 71, in _reduce_ex
state = base(self)
RuntimeError: maximum recursion depth exceeded while calling a Python object
You have some circular references.
d.u = u
u.append(d)
Pickle protocol 0 has issues with this, as you are experiencing. The simplest way to fix: specify protocol=-1:
pickle.dump(u, open("ttt.pcl", 'wb'), protocol=-1)
From the docs:
There are currently 3 different protocols which can be used for pickling.
Protocol version 0 is the original ASCII protocol and is backwards compatible with earlier versions of Python.
Protocol version 1 is the old binary format which is also compatible with earlier versions of Python.
Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.
Protocol 0 is the default (in Python 2), and specifying -1 means "use the highest protocol available".
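For readers on Python 3, the same cyclic structure can be sketched as follows. This is a minimal illustration rather than the original poster's exact setup: the in-memory round trip through pickle.dumps and pickle.loads, and the explicit pickle.HIGHEST_PROTOCOL (which is what -1 resolves to), are choices made for the example.

```python
import pickle


class UserList(list):
    pass


class UserDict(dict):
    pass


# Build the same cycle as in the question: the list holds the dicts,
# and every dict points back at the list through an attribute.
u = UserList()
for _ in range(10):
    d = UserDict()
    d.u = u
    u.append(d)

# HIGHEST_PROTOCOL is what protocol=-1 resolves to.
blob = pickle.dumps(u, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

# Each restored dict should point back at the restored list itself.
print(all(item.u is restored for item in restored))
```

With a binary protocol the container is recorded in the pickler's memo before its items are written, so the back-references resolve to the object being built instead of recursing.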
I am applying some preprocessing to the CIFAR100 dataset
from datasets.load import load_dataset
from datasets import Features, Array3D
from transformers.models.vit.feature_extraction_vit import ViTFeatureExtractor
# Resampling & Normalization
feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k')
dataset = load_dataset('cifar100', split='train[:100]')
features = Features({
'pixel_values': Array3D(dtype="float32", shape=(3, 224, 224)),
**dataset.features,
})
dataset = dataset.map(lambda batch, col_name: feature_extractor(batch[col_name]),
features=features, fn_kwargs={'col_name': 'img'}, batched=True)
I got the following warning, which means datasets cannot cache the transformed dataset.
Reusing dataset cifar100 (/home/qys/.cache/huggingface/datasets/cifar100/cifar100/1.0.0/f365c8b725c23e8f0f8d725c3641234d9331cd2f62919d1381d1baa5b3ba3142)
Parameter 'function'=<function <lambda> at 0x7f3279f3eef0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
Curiously, I can pickle/dill foo, but not lambda x: foo(x), despite the fact that they have exactly the same effect. I guess that's related to the problem?
>>> def foo(x): return x + 1
...
>>> Hasher.hash(foo)
'ff7fae499aa1d820'
>>> Hasher.hash(lambda x: foo(x))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/fingerprint.py", line 237, in hash
return cls.hash_default(value)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/fingerprint.py", line 230, in hash_default
return cls.hash_bytes(dumps(value))
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 564, in dumps
dump(obj, file)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 539, in dump
Pickler(file, recurse=True).dump(obj)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/dill/_dill.py", line 620, in dump
StockPickler.dump(self, obj)
File "/home/qys/.pyenv/versions/3.10.4/lib/python3.10/pickle.py", line 487, in dump
self.save(obj)
File "/home/qys/.pyenv/versions/3.10.4/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 862, in save_function
dill._dill._save_with_postproc(
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/dill/_dill.py", line 1153, in _save_with_postproc
pickler.write(pickler.get(pickler.memo[id(dest)][0]))
KeyError: 139847629663936
I have also tried making the function accessible from the top level of a module, i.e.
preprocessor = lambda batch: feature_extractor(batch['img'])
dataset = dataset.map(preprocessor, features=features, batched=True)
However, it still doesn't work
>>> from datasets.fingerprint import Hasher
>>> preprocessor = lambda batch: feature_extractor(batch['img'])
>>> Hasher.hash(preprocessor)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/fingerprint.py", line 237, in hash
return cls.hash_default(value)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/fingerprint.py", line 230, in hash_default
return cls.hash_bytes(dumps(value))
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 564, in dumps
dump(obj, file)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 539, in dump
Pickler(file, recurse=True).dump(obj)
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/dill/_dill.py", line 620, in dump
StockPickler.dump(self, obj)
File "/home/qys/.pyenv/versions/3.10.4/lib/python3.10/pickle.py", line 487, in dump
self.save(obj)
File "/home/qys/.pyenv/versions/3.10.4/lib/python3.10/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 862, in save_function
dill._dill._save_with_postproc(
File "/home/qys/Research/embedder/.venv/lib/python3.10/site-packages/dill/_dill.py", line 1153, in _save_with_postproc
pickler.write(pickler.get(pickler.memo[id(dest)][0]))
KeyError: 140408024252096
In Python 3.9, pickle hashes the glob_ids dictionary in addition to the globs of a function. To make hashing deterministic when the globals are not in the same order, the order of glob_ids needs to be made deterministic. PR to fix: https://github.com/huggingface/datasets/pull/4516
(Until merged, a temporary fix is to use an older version of dill:
pip install "dill<0.3.5"
see https://github.com/huggingface/datasets/issues/4506#issuecomment-1157417219)
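To see whether an environment already falls under the suggested "dill<0.3.5" pin, a small check along these lines may help. It is only a sketch: version_tuple and is_older_than are hypothetical helpers written for this example, and the comparison looks at the numeric release parts only.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '0.3.5.1' into (0, 3, 5, 1); stop at the first non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # something like '6rc1': keep the 6, ignore the rest
    return tuple(parts)


def is_older_than(installed, bound):
    """Compare two version strings by their numeric parts only."""
    return version_tuple(installed) < version_tuple(bound)


try:
    dill_version = version("dill")
except PackageNotFoundError:
    dill_version = None  # dill is not installed in this environment

if dill_version is not None:
    print("dill", dill_version, "older than 0.3.5:", is_older_than(dill_version, "0.3.5"))
```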
I am getting an error like the one below on the 500 Internal Server Error page.
File "/usr/local/lib/python3.5/dist-packages/pyDatalog/pyParser.py", line 388, in __call__
literal = Literal.make(self._pyD_name, tuple(args), kwargs)
File "/usr/local/lib/python3.5/dist-packages/pyDatalog/pyParser.py", line 510, in make
return precalculations & Query(predicate_name, terms, kwargs, prearity, aggregate)
File "/usr/local/lib/python3.5/dist-packages/pyDatalog/pyParser.py", line 574, in __init__
Literal.__init__(self, predicate_name, terms, kwargs, prearity, aggregate)
File "/usr/local/lib/python3.5/dist-packages/pyDatalog/pyParser.py", line 500, in __init__
self.lua = pyEngine.Literal(self.predicate_name, tbl, self.prearity, aggregate)
File "pyDatalog\pyEngine.py", line 402, in pyDatalog.pyEngine.Literal.__init__ (pyDatalog/pyEngine.c:15254)
File "pyDatalog\pyEngine.py", line 333, in pyDatalog.pyEngine.Pred.__new__ (pyDatalog/pyEngine.c:13917)
File "pyDatalog\pyEngine.py", line 334, in pyDatalog.pyEngine.Pred.__new__ (pyDatalog/pyEngine.c:13431)
AttributeError: '_thread._local' object has no attribute 'logic'
How can I access the pyDatalog variables?
I found the answer in the Thread safety and multi-models section of the documentation, in case it helps anyone else facing the same problem.
A Python program may start several threads. Each thread should have these statements to initialize pyDatalog:
from pyDatalog import pyDatalog, Logic
Logic() # initializes the pyDatalog engine
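The error message mentions a '_thread._local' object, which suggests the engine state is stored per thread. The sketch below does not use pyDatalog at all; it is a standard-library illustration of why state set up in one thread is missing in another. The names _engine, init_engine, and engine_ready are hypothetical, and init_engine merely plays the role that Logic() plays above.

```python
import threading

# Stand-in for per-thread engine state; the attribute name "logic"
# mirrors the error message, everything else here is hypothetical.
_engine = threading.local()


def init_engine():
    """Plays the role of Logic(): set up state for the calling thread only."""
    _engine.logic = {}


def engine_ready():
    """Report whether the calling thread has initialized its own state."""
    return hasattr(_engine, "logic")


results = {}


def worker(label, initialize):
    if initialize:
        init_engine()
    results[label] = engine_ready()


init_engine()  # initialization in the main thread only

threads = [
    threading.Thread(target=worker, args=("without_init", False)),
    threading.Thread(target=worker, args=("with_init", True)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

The thread that skips initialization sees no "logic" attribute even though the main thread set one, which matches the shape of the AttributeError in the traceback.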
I tried to use pickle to dump an MDAnalysis.Universe object, but I got an error message like this:
Traceback (most recent call last):
File "convert.py", line 9, in <module>
blob = pickle.dumps(u)
File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
Pickler(file, protocol).dump(obj)
File "/usr/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/usr/lib/python2.7/pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python2.7/pickle.py", line 419, in save_reduce
save(state)
File "/usr/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems
save(v)
File "/usr/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "/usr/lib/python2.7/copy_reg.py", line 84, in _reduce_ex
dict = getstate()
TypeError: 'AtomGroup' object is not callable
Any suggestions would be appreciated!
Updated answer (for MDAnalysis ≥ 2.0)
Since MDAnalysis 2.0.0 (August 2021), Universes can be pickled.
import MDAnalysis as mda
import pickle
u = mda.Universe(topology, trajectory)
pickle.dump(u, open("universe.pkl", "wb"))
# load pickled universe
u_pickled = pickle.load(open("universe.pkl", "rb"))
# test that we get same positions
(u_pickled.atoms.positions == u.atoms.positions).all()
# -> True
# but that universes are different
u == u_pickled
# -> False
See also Parallelizing Analysis in the User Guide.
Old answer
MDAnalysis.Universe objects contain some objects that cannot be serialized and pickled by the standard mechanisms, such as open file descriptors. One would need to write specialized __getstate__() and __setstate__() methods as described in the Pickle protocol but none of this is implemented as of the current 0.8.1 (April 2014) release.
The specific error is explained by Manel in his comment on MDAnalysis Issue 173: pickle searches for a __getstate__() method. Although that method is not implemented, Universe, which manages its own attributes to generate "virtual attributes" on the fly, interprets the lookup as an atom selection and eventually returns an empty AtomGroup instance. Pickle then calls this instance because it believes it to be the local implementation of __getstate__. AtomGroup, however, is not callable, and the error results.
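The mechanism can be reproduced without MDAnalysis. The classes below are hypothetical stand-ins, not the library's code: Group plays the role of AtomGroup, and GreedyContainer answers every unknown attribute the way the old Universe did. The lookup uses __setstate__ rather than __getstate__ because recent Python versions define a default __getstate__ on object, so the fallback would never be reached for that name.

```python
class Group:
    """Stand-in for a non-callable selection result such as AtomGroup."""

    def __init__(self, selection):
        self.selection = selection


class GreedyContainer:
    """Answers every unknown attribute with a selection object."""

    def __getattr__(self, name):
        return Group(name)


class CarefulContainer:
    """Same idea, but refuses to invent special (dunder) attributes."""

    def __getattr__(self, name):
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        return Group(name)


# pickle probes for optional hooks with getattr(); a greedy __getattr__
# hands back an object where pickle expects a method or nothing at all.
greedy_hook = getattr(GreedyContainer(), "__setstate__", None)
careful_hook = getattr(CarefulContainer(), "__setstate__", None)

print(type(greedy_hook).__name__, callable(greedy_hook))
print(careful_hook)
```

Raising AttributeError for dunder names is one common way to keep a dynamic __getattr__ from interfering with protocol lookups.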
As you probably noticed, you get a quicker response by asking on the MDAnalysis user list or by filing an issue; Stack Overflow is typically lower on the developers' list for answering such specific questions.
I am attempting to pickle the pygame.Surface object, which is not pickleable by default. What I've done is to add the classic picklability functions to the class and overwrite it. This way it will work with the rest of my code.
class TemporarySurface(pygame.Surface):
    def __getstate__(self):
        print '__getstate__ executed'
        return (pygame.image.tostring(self,IMAGE_TO_STRING_FORMAT),self.get_size())
    def __setstate__(self,state):
        print '__setstate__ executed'
        tempsurf = pygame.image.frombuffer(state[0],state[1],IMAGE_TO_STRING_FORMAT)
        pygame.Surface.__init__(self,tempsurf)
pygame.Surface = TemporarySurface
Here is an example of my traceback when I try to pickle a few recursive objects:
Traceback (most recent call last):
File "dibujar.py", line 981, in save_project
pickler.dump((key,value))
File "/usr/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/usr/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python2.7/pickle.py", line 562, in save_tuple
save(element)
File "/usr/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "/usr/lib/python2.7/copy_reg.py", line 71, in _reduce_ex
state = base(self)
ValueError: size needs to be (int width, int height)
The part that puzzles me is that the print statement is not being executed. Is __getstate__ even being called? I'm confused here, and I'm not exactly sure what information to put up. Let me know if anything additional would help.
As the documentation says, the primary entry point for pickling extension types is the __reduce__ or __reduce_ex__ methods. Given the error, it seems that the default __reduce__ implementation is not compatible with pygame.Surface's constructor.
So you'd be better off providing a __reduce__ method for Surface, or registering one externally via the copy_reg module. I would suggest the latter, since it doesn't involve monkey patching. You probably want something like:
import copy_reg
def pickle_surface(surface):
    return construct_surface, (pygame.image.tostring(surface, IMAGE_TO_STRING_FORMAT), surface.get_size())
def construct_surface(data, size):
    return pygame.image.frombuffer(data, size, IMAGE_TO_STRING_FORMAT)
construct_surface.__safe_for_unpickling__ = True
copy_reg.pickle(pygame.Surface, pickle_surface)
That should be all you need. Make sure that the construct_surface function is available at the top level of a module, though: the unpickling process needs to be able to locate the function, and it might be running in a different interpreter instance.
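In Python 3 the module is named copyreg. The following sketch shows the same registration pattern without requiring pygame; FakeSurface is a hypothetical stand-in for pygame.Surface, and its pixels attribute is an assumption made so the example has some data to carry across.

```python
import copyreg
import pickle


class FakeSurface:
    """Hypothetical stand-in for pygame.Surface: built from a size, holds raw bytes."""

    def __init__(self, size):
        width, height = size
        self.size = (width, height)
        self.pixels = bytes(width * height)


def construct_surface(data, size):
    # Must live at module top level so the unpickler can find it by name.
    surface = FakeSurface(size)
    surface.pixels = data
    return surface


def pickle_surface(surface):
    # Return (callable, args): the recipe pickle stores instead of the object.
    return construct_surface, (surface.pixels, surface.size)


# Register the reducer externally; the class itself is left untouched.
copyreg.pickle(FakeSurface, pickle_surface)

original = FakeSurface((4, 2))
original.pixels = bytes(range(8))
copy = pickle.loads(pickle.dumps(original))
```

Because the reducer is registered from outside, there is no need to subclass or replace the original type.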
I am new to python. I have a file data.pkl. What I would like to do is get the data from the file. I looked at http://docs.python.org/library/pickle.html, 11.1.7 example and tried exactly that.
My code looks like this:
import pprint, pickle
pkl_file = open('data.pkl', 'rb')
data1 = pickle.load(pkl_file)
pprint.pprint(data1)
pkl_file.close()
But it is giving me error:
Traceback (most recent call last):
File "/home/sadiksha/workspace/python/test.py", line 5, in <module>
data1 = pickle.load(pkl_file)
File "/usr/lib/python2.7/pickle.py", line 1378, in load
return Unpickler(file).load()
File "/usr/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 966, in load_string
raise ValueError, "insecure string pickle"
Can anyone please tell me what I am doing wrong here?
It seems that your pickle file was either not written correctly (that is, not opened in binary mode with 'wb') or was somehow corrupted. Try creating your own pickle file and reading it back in. That should do the trick.
As for the pickle file specified, it is definitely corrupted.
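A minimal round trip that writes and reads its own pickle file might look like this. The sample dictionary and the temporary directory are assumptions made for the sketch; the important detail is that both open() calls use binary mode.

```python
import os
import pickle
import tempfile

data = {"a": [1, 2.0, 3 + 4j], "b": ("text", b"bytes"), "c": None}

# A temporary directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.pkl")

    # Binary mode on both sides: 'wb' to write, 'rb' to read.
    with open(path, "wb") as pkl_file:
        pickle.dump(data, pkl_file)

    with open(path, "rb") as pkl_file:
        loaded = pickle.load(pkl_file)
```

If this round trip works but the existing data.pkl still fails to load, the problem most likely lies in how that file was produced or transferred.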