I am new to Python and am playing with pickle, and I don't understand how this works:
I define a defaultdict and write it to a pickle file. Then, in a different script, I read it back and it still behaves like a defaultdict, even without importing collections.
script1:
import pickle
from collections import defaultdict
x = defaultdict(list)
x['a'].append(1)
print(x)
with open('pick', 'wb') as f:
    pickle.dump(x, f)
script2:
import pickle
with open('pick', 'rb') as f:
    x = pickle.load(f)
x['b'].append(2)
print(x)
y = dict()
try:
    y['b'].append(2)
    print(y)
except KeyError:
    print("Can't append to y")
running:
$ python3 pick2.py
defaultdict(<class 'list'>, {'a': [1], 'b': [2]})
Can't append to y
So the second script doesn't import defaultdict, but the pickled x still acts like one. I'm confused :)
How does this work in Python? Thanks for any info :)
First of all, if you look at the pickle docs, specifically:
pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored
So what this is telling us is that pickle will import the module that defines the object you are unpickling.
We can show this with a small example, consider the following folder structure:
parent/
|-- a.py
|-- sub/
sub is an empty sub-folder
a.py holds an example class
# a.py
class ExampleClass:
    def __init__(self):
        self.var = 'This is a string'
Now start the Python console in the parent directory:
alex@toaster:parent$ python3
>>> import pickle
>>> from a import ExampleClass
>>> x = ExampleClass()
>>> x.var
'This is a string'
>>> with open('eg.p', 'wb') as f:
...     pickle.dump(x, f)
Exit the shell. Move to the sub directory and try to load the pickled ExampleClass object.
alex@toaster:sub$ python3
>>> import pickle
>>> with open('../eg.p', 'rb') as f:
...     x = pickle.load(f)
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
ModuleNotFoundError: No module named 'a'
We get a ModuleNotFoundError because pickle cannot load the class definition from the module a (it's in a different directory). In your case, Python can load the collections.defaultdict class because the collections module is on the PYTHONPATH. However, to continue to use the module(s) imported by pickle, you still need to import them yourself; e.g. if you want to create another defaultdict in script2.py.
To find out more about modules look here, specifically 6.1.2 The Module Search Path.
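You can also see the mechanism for yourself with the standard library's pickletools module: the pickle stream records the defining module and class name as text, and the loader imports from those names. A minimal sketch (the exact opcodes shown depend on the pickle protocol in use):

import pickle, pickletools
from collections import defaultdict

x = defaultdict(list)
pickletools.dis(pickle.dumps(x))
# the disassembly contains something like:
#     SHORT_BINUNICODE 'collections'
#     SHORT_BINUNICODE 'defaultdict'
#     STACK_GLOBAL
# i.e. pickle stores *where* the class lives, not the class's code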
Related
Or: is there a way to serialize and save a class from a script so that it can still be loaded if the script is deleted?
Consider three Python scripts that are in the same directory:
test.py
import pickle
import test_class_pickle
tc = test_class_pickle.Test()
pickle.dump(tc, open("/home/user/testclass", "wb"))
test_class_pickle.py
class Test:
    def __init__(self):
        self.var1 = "Hello!"
        self.var2 = "Goodbye!"

    def print_vars(self):
        print(self.var1, self.var2)
test_class_unpickle.py
import pickle
tc = pickle.load(open("/home/user/testclass", "rb"))
print(tc.var1, tc.var2)
When I run test.py, it imports the Test class from test_class_pickle, creates an instance of it, and saves it to a file using pickle. When I run test_class_unpickle.py, it loads the instance back into memory as expected.
However, when I delete test_class_pickle.py and run test_class_unpickle.py again, it throws this exception:
Traceback (most recent call last):
File "/home/sam/programs/python/testing/test_class_unpickle.py", line 3, in <module>
tc = pickle.load(open("/home/sam/testclass", "rb"))
ModuleNotFoundError: No module named 'test_class_pickle'
Is there a way I can save class instances to a file without relying on the original script's continued existence? It would be nice if I didn't have to use something like json (which would require me to get a list of all the attributes of each class, write them into a dictionary, etc.), because the classes reference other classes, which reference other classes, and each class has several functions that operate on the data.
Here's a way to get dill to do it. dill only stores the definitions of objects defined in __main__, not those in separate modules. The following function redefines a separate module in __main__ so that its definitions are stored in the pickle file. Based on this answer: https://stackoverflow.com/a/64758608/355230.
test.py
import dill
from pathlib import Path
import test_class_pickle
def mainify_module(module):
    import __main__  # This module.
    import importlib, inspect, sys

    code = inspect.getsource(module)
    spec = importlib.util.spec_from_loader(module.__name__, loader=None)
    module_obj = importlib.util.module_from_spec(spec)
    exec(code, __main__.__dict__)
    sys.modules[module.__name__] = module_obj  # Replace in cache.
    globals()[module.__name__] = module_obj  # Redefine.
pkl_filepath = Path('testclass.pkl')
pkl_filepath.unlink(missing_ok=True) # Delete any existing file.
mainify_module(test_class_pickle)
tc = Test('I Am the Walrus!')
with open(pkl_filepath, 'wb') as file:
    dill.dump(tc, file)
print(f'dill pickle file {pkl_filepath.name!r} created')
test_class_pickle.py
class Test:
    def __init__(self, baz):
        self.var1 = "Hello!"
        self.var2 = "Goodbye!"
        self.foobar = baz

    def print_vars(self):
        print(self.var1, self.var2, self.foobar)
test_class_unpickle.py
import dill
pkl_filepath = 'testclass.pkl'
with open(pkl_filepath, 'rb') as file:
    tc = dill.load(file)
tc.print_vars() # -> Hello! Goodbye! I Am the Walrus!
If, as in your example, the class you want to serialize is self-contained, i.e. it doesn't reference global objects or other custom classes of the same package, a simpler workaround is to temporarily "orphan" the class:
import dill
import test_class_pickle
def pickle_class_by_value(file, obj, **kwargs):
    cls = obj if isinstance(obj, type) else type(obj)
    cls_module = cls.__module__
    cls.__module__ = None  # dill will think this class is orphaned...
    dill.dump(obj, file, **kwargs)  # ...and serialize it by value
    cls.__module__ = cls_module
tc = test_class_pickle.Test()
with open("/home/user/testclass.pkl", "wb") as file:
pickle_class_by_value(file, tc)
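Loading should then work even without test_class_pickle on the path, since the class definition travels inside the pickle by value. A short, untested sketch of the reading side:

import dill

# test_class_pickle.py may be deleted by now; the pickle carries the class.
with open("/home/user/testclass.pkl", "rb") as file:
    tc = dill.load(file)
tc.print_vars()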
My module contains a class which should be pickleable, both instance and definition
I have the following structure:
MyModule
|-Submodule
|-MyClass
In other questions on SO I have already found that dill is able to pickle class definitions, and sure enough it works when I copy the definition of MyClass into a separate script and pickle it there, like this:
import dill as pickle

class MyClass(object):
    ...

instance = MyClass(...)
with open(..., 'wb') as file:
    pickle.dump(instance, file)
However, it does not work when importing the class:
Pickling:
from MyModule.Submodule import MyClass
import dill as pickle
instance = MyClass(...)
with open(..., 'wb') as file:
    pickle.dump(instance, file)
Loading:
import dill as pickle
with open(..., 'rb') as file:
    instance = pickle.load(file)
>>> ModuleNotFoundError: No module named 'MyModule'
I think the class definition is saved by reference, although it shouldn't be according to dill's default settings. This works correctly when MyClass is known as __main__.MyClass, which happens when the class is defined in the main script.
I am wondering, is there any way to detach MyClass from MyModule? Any way to make it act like a top-level definition (__main__.MyClass) so dill knows how to load it on my other machine?
Relevant question:
Why dill dumps external classes by reference no matter what
Dill indeed only stores the definitions of objects in __main__, not those in modules, so one way around this problem is to redefine those objects in __main__:
def mainify(obj):
    import __main__
    import inspect
    import ast

    s = inspect.getsource(obj)
    m = ast.parse(s)
    co = compile(m, '<string>', 'exec')
    exec(co, __main__.__dict__)
And then:
from MyModule.Submodule import MyClass
import dill as pickle
mainify(MyClass)
instance = MyClass(...)
with open(..., 'wb') as file:
    pickle.dump(instance, file)
Now you should be able to load the pickle from anywhere, even where the MyModule.Submodule is not available.
I'm the dill author. This is a duplicate of the question you refer to above. The relevant GitHub feature request is: https://github.com/uqfoundation/dill/issues/128.
I think the larger issue is that you want to pickle an object defined in another file that is not installed. This is currently not possible, I believe.
As a workaround, I believe you may be able to pickle with dill.source by extracting the source code of the class (or module) and pickling that dynamically, or extracting the source code and compiling a new object in __main__.
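To illustrate the first idea, dill.source.getsource can extract the class's source, which you can then exec into __main__ yourself. A rough sketch (names taken from the question, untested here):

import __main__
from dill.source import getsource
from MyModule.Submodule import MyClass

exec(getsource(MyClass), __main__.__dict__)  # re-create MyClass in __main__
# instances made from __main__.MyClass should now pickle by value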
I managed to save the instance and definition of my class using the following dirty hack:
class MyClass(object):
    def save(self, path):
        import __main__
        with open(__file__) as f:
            code = compile(f.read(), "somefile.py", 'exec')
        globals = __main__.__dict__
        locals = {'instance': self, 'savepath': path}
        exec(code, globals, locals)

if __name__ == '__main__':
    # Script is loaded at top level; MyClass is now available under the
    # qualname '__main__.MyClass'.
    import dill as pickle
    # Copy the attributes of the 'MyModule.Submodule.MyClass' instance to a
    # new 'MyClass' instance.
    new_instance = MyClass.__new__(MyClass)
    new_instance.__dict__ = locals()['instance'].__dict__
    with open(locals()['savepath'], 'wb') as f:
        pickle.dump(new_instance, f)
Using exec, the file is executed from within __main__, so the class definition gets saved as well.
This script should not be executed as the main script except via the save function.
I have some code in the form of a string and would like to make a module out of it without writing to disk.
When I try using imp and a StringIO object to do this, I get:
>>> imp.load_source('my_module', '', StringIO('print "hello world"'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: load_source() argument 3 must be file, not instance
>>> imp.load_module('my_module', StringIO('print "hello world"'), '', ('', '', 0))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: load_module arg#2 should be a file or None
How can I create the module without having an actual file? Alternatively, how can I wrap a StringIO in a file without writing to disk?
UPDATE:
NOTE: This issue is also a problem in Python 3.
The code I'm trying to load is only partially trusted. I've gone through it with ast and determined that it doesn't import anything or do anything I don't like, but I don't trust it enough to run it while I have local variables around that could get modified, and I don't trust my own code to stay out of the way of the code I'm trying to import.
I created an empty module that only contains the following:
def load(code):
    # Delete all local variables
    globals()['code'] = code
    del locals()['code']
    # Run the code
    exec(globals()['code'])
    # Delete any global variables we've added
    del globals()['load']
    del globals()['code']
    # Copy k so we can use it
    if 'k' in locals():
        globals()['k'] = locals()['k']
        del locals()['k']
    # Copy the rest of the variables
    for k in locals().keys():
        globals()[k] = locals()[k]
Then you can import mymodule and call mymodule.load(code). This works for me because I've ensured that the code I'm loading does not use globals. Also, the global keyword is only a parser directive and can't refer to anything outside of the exec.
This really is way too much work to import the module without writing to disk, but if you ever want to do this, I believe it's the best way.
Here is how to import a string as a module (Python 2.x):
import sys, imp
my_code = 'a = 5'
mymodule = imp.new_module('mymodule')
exec my_code in mymodule.__dict__
In Python 3, exec is a function, so this should work:
import sys, imp
my_code = 'a = 5'
mymodule = imp.new_module('mymodule')
exec(my_code, mymodule.__dict__)
Now access the module attributes (and functions, classes etc) as:
print(mymodule.a)  # -> 5
To make any later import of the same name pick up this module, add it to sys.modules:
sys.modules['mymodule'] = mymodule
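A regular import statement then returns the same in-memory module, since imports are resolved via sys.modules first:

import mymodule  # no file involved; found in sys.modules
assert mymodule.a == 5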
imp.new_module has been deprecated since Python 3.4, but it still works as of Python 3.9.
imp.new_module was replaced by importlib.util.module_from_spec.
importlib.util.module_from_spec
is preferred over using types.ModuleType to create a new module as
spec is used to set as many import-controlled attributes on the module
as possible.
importlib.util.spec_from_loader
uses available loader APIs, such as InspectLoader.is_package(), to
fill in any missing information on the spec.
These module attributes are __builtins__, __doc__, __loader__, __name__, __package__ and __spec__.
import sys, importlib.util

def import_module_from_string(name: str, source: str):
    """
    Import module from source string.
    Example use:
    import_module_from_string("m", "f = lambda: print('hello')")
    m.f()
    """
    spec = importlib.util.spec_from_loader(name, loader=None)
    module = importlib.util.module_from_spec(spec)
    exec(source, module.__dict__)
    sys.modules[name] = module
    globals()[name] = module
# demo
# note: "if True:" allows to indent the source string
import_module_from_string('hello_module', '''if True:
def hello():
print('hello')
''')
hello_module.hello()
You could simply create a Module object and stuff it into sys.modules and put your code inside.
Something like:
import sys
from types import ModuleType

mycode = 'answer = 42'  # any source string
mod = ModuleType('mymodule')
sys.modules['mymodule'] = mod
exec(mycode, mod.__dict__)
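Afterwards, a normal import resolves to the in-memory module, so other code can use it as usual (a quick check, using the mycode string above):

import mymodule          # served from sys.modules, no file needed
print(mymodule.answer)   # -> 42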
If the code for the module is in a string, you can forgo using StringIO and use it directly with exec, as illustrated below with a file named dynmodule.py.
Works in Python 2 & 3.
from __future__ import print_function

class _DynamicModule(object):
    def load(self, code):
        execdict = {'__builtins__': None}  # optional, to increase safety
        exec(code, execdict)
        keys = execdict.get(
            '__all__',  # use __all__ attribute if defined
            # else all non-private attributes
            (key for key in execdict if not key.startswith('_')))
        for key in keys:
            setattr(self, key, execdict[key])

# replace this module object in sys.modules with empty _DynamicModule instance
# see Stack Overflow question:
# https://stackoverflow.com/questions/5365562/why-is-the-value-of-name-changing-after-assignment-to-sys-modules-name
import sys as _sys
_ref, _sys.modules[__name__] = _sys.modules[__name__], _DynamicModule()
if __name__ == '__main__':
    import dynmodule  # name of this module
    import textwrap   # for more readable code formatting in sample string

    # string to be loaded can come from anywhere or be generated on-the-fly
    module_code = textwrap.dedent("""\
        foo, bar, baz = 5, 8, 2

        def func():
            return foo*bar + baz

        __all__ = 'foo', 'bar', 'func'  # 'baz' not included
        """)

    dynmodule.load(module_code)  # defines module's contents
    print('dynmodule.foo:', dynmodule.foo)
    try:
        print('dynmodule.baz:', dynmodule.baz)
    except AttributeError:
        print('no dynmodule.baz attribute was defined')
    else:
        print('Error: there should be no dynmodule.baz module attribute')
    print('dynmodule.func() returned:', dynmodule.func())
Output:
dynmodule.foo: 5
no dynmodule.baz attribute was defined
dynmodule.func() returned: 42
Setting the '__builtins__' entry to None in the execdict dictionary prevents the code from directly executing any built-in functions, like __import__, and so makes running it safer. You can ease that restriction by selectively adding things to it you feel are OK and/or required.
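For example, a small whitelist instead of None (a sketch; pick whichever names the loaded code legitimately needs, and note this is a convenience barrier, not a real sandbox):

code = "values = [3, 1, 2]\nsmallest = min(values)"  # hypothetical input
safe_builtins = {'len': len, 'range': range, 'min': min, 'max': max}
execdict = {'__builtins__': safe_builtins}
exec(code, execdict)         # code may call len/range/min/max, nothing else directly
print(execdict['smallest'])  # -> 1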
It's also possible to add your own predefined utilities and attributes which you'd like made available to the code thereby creating a custom execution context for it to run in. That sort of thing can be useful for implementing a "plug-in" or other user-extensible architecture.
You could use exec or eval to execute Python code from a string. See here, here and here.
The documentation for imp.load_source says (my emphasis):
The file argument is the source file, open for reading as text, from the beginning. It must currently be a real file object, not a user-defined class emulating a file.
... so you may be out of luck with this method, I'm afraid.
Perhaps eval would be enough for you in this case?
This sounds like a rather surprising requirement, though - it might help if you add some more to your question about the problem you're really trying to solve.
I'm trying to integrate a project, Project A, built by a colleague, into another Python project. Now this colleague has not used relative imports in his code but instead done
from packageA.moduleA import ClassA
from packageA.moduleA import ClassB
and consequently pickled the classes with cPickle. For neatness I'd like to hide the package he built (Project A) inside my project. This, however, changes the path of the classes defined in packageA. No problem, I'll just redefine the imports using
from ..packageA.moduleA import ClassA
from ..packageA.moduleA import ClassB
but now unpickling the classes fails with the following message:
with open(fname) as infile: self.clzA = cPickle.load(infile)
ImportError: No module named packageA.moduleA
So why doesn't cPickle see the module definitions? Do I need to add the root of packageA to the system path? Is this the correct way to solve the problem?
The cPickled file looks something like
ccopy_reg
_reconstructor
p1
(cpackageA.moduleA
ClassA
p2
c__builtin__
object
p3
NtRp4
The old project hierarchy is of the sort
packageA/
    __init__.py
    moduleA.py
    moduleB.py
packageB/
    __init__.py
    moduleC.py
    moduleD.py
I'd like to put all of that into a WrapperPackage
MyPackage/
.. __init__.py
.. myModuleX.py
.. myModuleY.py
WrapperPackage/
.. __init__.py
.. packageA/
.. __init__.py
.. moduleA.py
.. moduleB.py
.. packageB/
.. __init__.py
.. moduleC.py
.. moduleD.py
You'll need to create an alias for the pickle import to work; add the following to the __init__.py file of the WrapperPackage package:
from .packageA import * # Ensures that all the modules have been loaded in their new locations *first*.
from . import packageA # imports WrapperPackage/packageA
import sys
sys.modules['packageA'] = packageA # creates a packageA entry in sys.modules
It may be that you'll need to create additional entries though:
from .packageA import moduleA  # make each submodule importable under its old name
sys.modules['packageA.moduleA'] = moduleA
# etc.
Now cPickle will find packageA.moduleA and packageA.moduleB again at their old locations.
You may want to re-write the pickle file afterwards; the new module location will be used at that point. The additional aliases created above ensure that the modules in question carry the new location name for cPickle to pick up when writing the classes again.
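A sketch of that re-writing step, with the aliases from __init__.py already in effect (the filenames are placeholders):

import cPickle

with open('old.pickle', 'rb') as infile:
    obj = cPickle.load(infile)      # resolved via the sys.modules aliases
with open('new.pickle', 'wb') as outfile:
    cPickle.dump(obj, outfile)      # records the new WrapperPackage path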
In addition to @MartinPieters' answer, the other way of doing this is to set the find_global method of a cPickle.Unpickler instance, or to extend the pickle.Unpickler class.
def map_path(mod_name, kls_name):
    if mod_name.startswith('packageA'):  # catch all old module names
        mod = __import__('WrapperPackage.%s' % mod_name, fromlist=[mod_name])
        return getattr(mod, kls_name)
    else:
        mod = __import__(mod_name)
        return getattr(mod, kls_name)
import cPickle as pickle

with open('dump.pickle', 'r') as fh:
    unpickler = pickle.Unpickler(fh)
    unpickler.find_global = map_path
    obj = unpickler.load()  # object will now contain the new class path reference

with open('dump-new.pickle', 'w') as fh:
    pickle.dump(obj, fh)  # ClassA will now have a new path in 'dump-new'
A more detailed explanation of the process for both pickle and cPickle can be found here.
One possible solution is to directly edit the pickle file (if you have access). I ran into this same problem of a changed module path; I had saved my files with pickle.HIGHEST_PROTOCOL, so they should be binary in theory, but the module path was sitting at the top of the pickle file in plain text. So I just did a find-and-replace on all of the instances of the old module path with the new one and voila, they loaded correctly.
I'm sure this solution is not for everyone, especially if you have a very complex pickled object, but it is a quick and dirty data fix that worked for me!
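For what it's worth, here is the idea as a sketch. It can only work when module paths are stored as newline-terminated text (the GLOBAL opcode used by protocols 0-2); protocol 4+ length-prefixes its strings, so a blind byte replacement there would corrupt the file. Filenames and module names are placeholders:

with open('dump.pickle', 'rb') as fh:
    data = fh.read()
data = data.replace(b'packageA.moduleA', b'WrapperPackage.packageA.moduleA')
with open('dump-new.pickle', 'wb') as fh:
    fh.write(data)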
This is my basic pattern for flexible unpickling via an unambiguous and fast transition map, since there are usually only a few known classes (besides the primitive data types) that are relevant for pickling. This also protects unpickling against erroneous or maliciously constructed data, which, after all, can execute arbitrary Python code (!) upon a simple pickle.load(), with or without error-prone sys.modules fiddling.
Python 2 & 3:
from __future__ import print_function
import sys

try:
    import cPickle as pickle, copy_reg as copyreg
except ImportError:
    import pickle, copyreg
class OldZ:
    a = 1

class Z(object):
    a = 2

class Dangerous:
    pass
_unpickle_map_safe = {
    # all possible and allowed (!) classes & upgrade paths
    (__name__, 'Z'): Z,
    (__name__, 'OldZ'): Z,
    ('old.package', 'OldZ'): Z,
    ('__main__', 'Z'): Z,
    ('__main__', 'OldZ'): Z,
    # basically required
    ('copy_reg', '_reconstructor'): copyreg._reconstructor,
    ('__builtin__', 'object'): copyreg._reconstructor,
}
def unpickle_find_class(modname, clsname):
    print("DEBUG unpickling: %(modname)s . %(clsname)s" % locals())
    try:
        return _unpickle_map_safe[(modname, clsname)]
    except KeyError:
        raise pickle.UnpicklingError(
            "%(modname)s . %(clsname)s not allowed" % locals())
if pickle.__name__ == 'cPickle':  # PY2
    def SafeUnpickler(f):
        u = pickle.Unpickler(f)
        u.find_global = unpickle_find_class
        return u
else:  # PY3 & Python2-pickle.py
    class SafeUnpickler(pickle.Unpickler):
        find_class = staticmethod(unpickle_find_class)
def test(fn='./z.pkl'):
    z = OldZ()
    z.b = 'teststring' + sys.version
    pickle.dump(z, open(fn, 'wb'), 2)
    pickle.dump(Dangerous(), open(fn + 'D', 'wb'), 2)
    # load again
    o = SafeUnpickler(open(fn, 'rb')).load()
    print(pickle, "loaded:", o, o.a, o.b)
    assert o.__class__ is Z
    try:
        raise SafeUnpickler(open(fn + 'D', 'rb')).load() and AssertionError
    except pickle.UnpicklingError:
        print('OK: Dangerous not allowed')
if __name__ == '__main__':
    test()