Pickle class definition in module with dill - python

My module contains a class that should be pickleable, both the instance and the class definition.
I have the following structure:
MyModule
|-Submodule
|-MyClass
In other questions on SO I have already found that dill is able to pickle class definitions and surely enough it works by copying the definition of MyClass into a separate script and pickling it there, like this:
import dill as pickle

class MyClass(object):
    ...

instance = MyClass(...)
with open(..., 'wb') as file:
    pickle.dump(instance, file)
However, it does not work when importing the class:
Pickling:
from MyModule.Submodule import MyClass
import dill as pickle
instance = MyClass(...)
with open(.., 'wb') as file:
    pickle.dump(instance, file)
Loading:
import dill as pickle
with open(..., 'rb') as file:
    instance = pickle.load(file)
>>> ModuleNotFoundError: No module named 'MyModule'
I think the class definition is saved by reference, although it should not have as per default settings in dill. This is done correctly when MyClass is known as __main__.MyClass, which happens when the class is defined in the main script.
I am wondering, is there any way to detach MyClass from MyModule? Any way to make it act like a top level import (__main__.MyClass) so dill knows how to load it on my other machine?
Relevant question:
Why dill dumps external classes by reference no matter what

Dill indeed only stores definitions of objects in __main__, and not those in modules, so one way around this problem is to redefine those objects in main:
def mainify(obj):
    import __main__
    import inspect
    import ast

    s = inspect.getsource(obj)
    m = ast.parse(s)
    co = compile(m, '<string>', 'exec')
    exec(co, __main__.__dict__)
And then:
from MyModule.Submodule import MyClass
import dill as pickle
mainify(MyClass)
instance = MyClass(...)
with open(.., 'wb') as file:
    pickle.dump(instance, file)
Now you should be able to load the pickle from anywhere, even where the MyModule.Submodule is not available.
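To see what re-executing the source in __main__ changes, here is a minimal sketch that uses only the standard library (no dill). The module name shipped_module_demo and the class Greeter are hypothetical stand-ins for MyModule.Submodule and MyClass, written to a temporary directory so the example is self-contained. It assumes the interpreter's __main__ module has its usual __name__ of '__main__'.

```python
import importlib
import inspect
import sys
import tempfile
from pathlib import Path

import __main__


def mainify(obj):
    """Re-execute obj's source so the new definition lives in __main__."""
    source = inspect.getsource(obj)
    exec(compile(source, "<string>", "exec"), __main__.__dict__)


# Hypothetical stand-in for MyModule.Submodule, written to a temp directory.
MODULE_SOURCE = "class Greeter:\n    def greet(self):\n        return 'hi'\n"

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "shipped_module_demo.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        shipped = importlib.import_module("shipped_module_demo")
        original_module = shipped.Greeter.__module__
        mainify(shipped.Greeter)  # the .py file must still exist here
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("shipped_module_demo", None)

# The re-executed class is a new object whose __module__ is __main__.
rebuilt = __main__.Greeter
print(original_module)
print(rebuilt.__module__)
```

Note that the rebuilt class is a different object from the imported one, so instances created before calling mainify still point at the original module. That is presumably why the snippet above creates the instance only after mainify(MyClass).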

I'm the dill author. This is a duplicate of the question you refer to above. The relevant GitHub feature request is: https://github.com/uqfoundation/dill/issues/128.
I think the larger issue is that you want to pickle an object defined in another file that is not installed. This is currently not possible, I believe.
As a workaround, I believe you may be able to pickle with dill.source by extracting the source code of the class (or module) and pickling that dynamically, or extracting the source code and compiling a new object in __main__.

I managed to save the instance and definition of my class using the following dirty hack:
class MyClass(object):
    def save(self, path):
        import __main__
        # Re-run this file's source with __main__'s namespace as globals
        with open(__file__) as f:
            code = compile(f.read(), "somefile.py", 'exec')
        globals = __main__.__dict__
        locals = {'instance': self, 'savepath': path}
        exec(code, globals, locals)

if __name__ == '__main__':
    # Script is loaded in top level, MyClass is now available under the qualname '__main__.MyClass'
    import dill as pickle
    # copy the attributes of the 'MyModule.Submodule.MyClass' instance to a new 'MyClass' instance.
    new_instance = MyClass.__new__(MyClass)
    new_instance.__dict__ = locals()['instance'].__dict__
    with open(locals()['savepath'], 'wb') as f:
        pickle.dump(new_instance, f)
Using the exec statement the file can be executed from within __main__, so the class definition will be saved as well.
This script should not be executed as main script without using the save function.

Related

Is there a way to serialize a class such that it can be unserialized independent of its original script?

Or, is there a way to serialize and save a class from a script, that can still be loaded if the script is deleted?
Consider three Python scripts that are in the same directory:
test.py
import pickle
import test_class_pickle
tc = test_class_pickle.Test()
pickle.dump(tc, open("/home/user/testclass", "wb"))
test_class_pickle.py
class Test:
    def __init__(self):
        self.var1 = "Hello!"
        self.var2 = "Goodbye!"

    def print_vars(self):
        print(self.var1, self.var2)
test_class_unpickle.py
import pickle
tc = pickle.load(open("/home/user/testclass", "rb"))
print(tc.var1, tc.var2)
When I run test.py, it imports the Test class from test_class_pickle, creates an instance of it, and saves it to a file using pickle. When I run test_class_unpickle.py, it loads the class back into memory as expected.
However, when I delete test_class_pickle.py and run test_class_unpickle.py again, it throws this exception:
Traceback (most recent call last):
  File "/home/sam/programs/python/testing/test_class_unpickle.py", line 3, in <module>
    tc = pickle.load(open("/home/sam/testclass", "rb"))
ModuleNotFoundError: No module named 'test_class_pickle'
Is there a way I can save class instances to a file without relying on the original script's continuous existence? It would be nice if I didn't have to use something like json (which would require me to get a list of all the attributes of the class, write them into a dictionary, etc.), because all the classes are also handling other classes, which are handling other classes, etc., and each class has several functions that handle the data.
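The error above follows from how pickle stores classes: by reference (module name plus class name), not by value. A small sketch of that behaviour, using the standard-library Fraction class as a stand-in for Test (an assumption made so the example runs without any extra files):

```python
import pickle
from fractions import Fraction

# Pickle an instance of a class that lives in an importable module.
data = pickle.dumps(Fraction(1, 3))

# The byte stream names the module and the class; it does not contain
# the class body, so unpickling must import the module again.
mentions_module = b"fractions" in data
mentions_class = b"Fraction" in data

restored = pickle.loads(data)  # works only because 'fractions' is importable
print(mentions_module, mentions_class, restored)
```

If the module named in the stream cannot be imported at load time, pickle.load raises ModuleNotFoundError, which is what happens once test_class_pickle.py is deleted.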
Here's a way to get dill to do it. Dill only stores definitions of objects defined in __main__, but not those in separate modules. The following function redefines a separate module in __main__ so they will be stored in the pickle file. Based on this answer https://stackoverflow.com/a/64758608/355230.
test.py
import dill
from pathlib import Path
import test_class_pickle

def mainify_module(module):
    import __main__  # This module.
    import importlib.util, inspect, sys

    code = inspect.getsource(module)
    spec = importlib.util.spec_from_loader(module.__name__, loader=None)
    module_obj = importlib.util.module_from_spec(spec)
    exec(code, __main__.__dict__)
    sys.modules[module.__name__] = module_obj  # Replace in cache.
    globals()[module.__name__] = module_obj  # Redefine.

pkl_filepath = Path('testclass.pkl')
pkl_filepath.unlink(missing_ok=True)  # Delete any existing file.

mainify_module(test_class_pickle)
tc = Test('I Am the Walrus!')
with open(pkl_filepath, 'wb') as file:
    dill.dump(tc, file)
print(f'dill pickle file {pkl_filepath.name!r} created')
test_class_pickle.py
class Test:
    def __init__(self, baz):
        self.var1 = "Hello!"
        self.var2 = "Goodbye!"
        self.foobar = baz

    def print_vars(self):
        print(self.var1, self.var2, self.foobar)
test_class_unpickle.py
import dill
pkl_filepath = 'testclass.pkl'
with open(pkl_filepath, 'rb') as file:
    tc = dill.load(file)
tc.print_vars() # -> Hello! Goodbye! I Am the Walrus!
If, as in your example, the class you want to serialize is self-contained, i.e. it doesn't reference global objects or other custom classes of the same package, a simpler workaround is to temporarily "orphan" the class:
import dill
import test_class_pickle

def pickle_class_by_value(file, obj, **kwargs):
    cls = obj if isinstance(obj, type) else type(obj)
    cls_module = cls.__module__
    cls.__module__ = None  # dill will think this class is orphaned...
    try:
        dill.dump(obj, file, **kwargs)  # and serialize it by value
    finally:
        cls.__module__ = cls_module  # always restore the real module name

tc = test_class_pickle.Test()
with open("/home/user/testclass.pkl", "wb") as file:
    pickle_class_by_value(file, tc)

Loading source code with inspect.getsource() fails saying it can't read a built-in class. (It's not)

When I load the source code of a class from a module directly, it's fine:
import arg_master
inspect.getsource(func)
When I load a module with spec_from_file_location and go for a function it's fine.
When I load a module with spec_from_file_location and go for a class,
it fails with:
TypeError: <class 'mymod.ArgMaster'> is a built-in class
(it's not. I wrote it.)
Here is my full source:
import os, inspect, importlib.util
filename = 'arg_master.py'
spec = importlib.util.spec_from_file_location("mymod", filename)
mymod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mymod)
func = vars(mymod)['ArgMaster']
inspect.getsource(func) #<<< Fails
Load method number 2 also fails:
import importlib.machinery, inspect, types
filename = 'arg_master.py'
loader = importlib.machinery.SourceFileLoader('mod', filename)
mod = types.ModuleType(loader.name)
loader.exec_module(mod)
func = vars(mod)['ArgMaster']
inspect.getsource(func)
Edit: I found a hackish solution:
import os, inspect, importlib
filename = 'arg_master.py'
name = os.path.basename(filename)
name = os.path.splitext(name)[0]
mod = importlib.import_module(name)
func = vars(mod)['ArgMaster']
inspect.getsource(func)
I get the same error with your code, which oddly enough works for functions, but not classes. This below is essentially the same as your solution, just slightly less "hacky" as you don't have to mess with file paths:
import inspect, importlib
cls = getattr(importlib.import_module('arg_master'), 'ArgMaster')
print(inspect.getsource(cls))
This will work if the two modules are in the same directory. If you need to do relative imports, you might have to mess around with package=__package__ or similar.
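A likely explanation for why functions work but classes do not: inspect finds a function's file through its code object, but for a class it looks up sys.modules[cls.__module__] and reads that module's __file__. A module built with spec_from_file_location is not registered in sys.modules automatically, so the lookup fails and inspect reports a "built-in class". The sketch below registers the module before executing it. The file contents are a hypothetical stand-in for arg_master.py, written to a temporary directory so the example runs on its own.

```python
import importlib.util
import inspect
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the real arg_master.py.
SOURCE = "class ArgMaster:\n    def parse(self):\n        return 'parsed'\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "arg_master.py"
    path.write_text(SOURCE)

    spec = importlib.util.spec_from_file_location("mymod", str(path))
    mymod = importlib.util.module_from_spec(spec)
    sys.modules["mymod"] = mymod  # register so inspect can find the class's module
    try:
        spec.loader.exec_module(mymod)
        # Must run while the source file still exists on disk.
        class_source = inspect.getsource(mymod.ArgMaster)
    finally:
        del sys.modules["mymod"]

print(class_source)
```

Registering under the same name that was passed to spec_from_file_location matters, because that name becomes the class's __module__.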

python create and register module via compile function [duplicate]

I have some code in the form of a string and would like to make a module out of it without writing to disk.
When I try using imp and a StringIO object to do this, I get:
>>> imp.load_source('my_module', '', StringIO('print "hello world"'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: load_source() argument 3 must be file, not instance
>>> imp.load_module('my_module', StringIO('print "hello world"'), '', ('', '', 0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: load_module arg#2 should be a file or None
How can I create the module without having an actual file? Alternatively, how can I wrap a StringIO in a file without writing to disk?
UPDATE:
NOTE: This issue is also a problem in python3.
The code I'm trying to load is only partially trusted. I've gone through it with ast and determined that it doesn't import anything or do anything I don't like, but I don't trust it enough to run it when I have local variables running around that could get modified, and I don't trust my own code to stay out of the way of the code I'm trying to import.
I created an empty module that only contains the following:
def load(code):
    # Delete all local variables
    globals()['code'] = code
    del locals()['code']
    # Run the code
    exec(globals()['code'])
    # Delete any global variables we've added
    del globals()['load']
    del globals()['code']
    # Copy k so we can use it
    if 'k' in locals():
        globals()['k'] = locals()['k']
        del locals()['k']
    # Copy the rest of the variables
    for k in locals().keys():
        globals()[k] = locals()[k]
Then you can import mymodule and call mymodule.load(code). This works for me because I've ensured that the code I'm loading does not use globals. Also, the global keyword is only a parser directive and can't refer to anything outside of the exec.
This really is way too much work to import the module without writing to disk, but if you ever want to do this, I believe it's the best way.
Here is how to import a string as a module (Python 2.x):
import sys,imp
my_code = 'a = 5'
mymodule = imp.new_module('mymodule')
exec my_code in mymodule.__dict__
In Python 3, exec is a function, so this should work:
import sys,imp
my_code = 'a = 5'
mymodule = imp.new_module('mymodule')
exec(my_code, mymodule.__dict__)
Now access the module attributes (and functions, classes etc) as:
print(mymodule.a)
>>> 5
To ignore any next attempt to import, add the module to sys:
sys.modules['mymodule'] = mymodule
imp.new_module has been deprecated since Python 3.4. It still works as of Python 3.9, but the imp module was removed entirely in Python 3.12.
imp.new_module was replaced with importlib.util.module_from_spec.
importlib.util.module_from_spec is preferred over using types.ModuleType to create a new module, as spec is used to set as many import-controlled attributes on the module as possible.
importlib.util.spec_from_loader uses available loader APIs, such as InspectLoader.is_package(), to fill in any missing information on the spec.
These module attributes are __builtins__, __doc__, __loader__, __name__, __package__ and __spec__.
import sys, importlib.util

def import_module_from_string(name: str, source: str):
    """
    Import module from source string.
    Example use:
    import_module_from_string("m", "f = lambda: print('hello')")
    m.f()
    """
    spec = importlib.util.spec_from_loader(name, loader=None)
    module = importlib.util.module_from_spec(spec)
    exec(source, module.__dict__)
    sys.modules[name] = module
    globals()[name] = module

# demo
# note: "if True:" allows to indent the source string
import_module_from_string('hello_module', '''if True:
    def hello():
        print('hello')
''')
hello_module.hello()
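One caveat with the helper above: globals()[name] binds the module in the namespace of the file that defines the helper, which is not necessarily the caller's. A variant that returns the module and lets the caller choose the name may be easier to reuse. This is only a sketch: module_from_source and its register flag are hypothetical names, and textwrap.dedent stands in for the "if True:" trick.

```python
import importlib.util
import sys
import textwrap


def module_from_source(name, source, register=False):
    """Build a module object from a source string and return it."""
    spec = importlib.util.spec_from_loader(name, loader=None)
    module = importlib.util.module_from_spec(spec)
    # dedent lets the caller indent the source string naturally.
    exec(textwrap.dedent(source), module.__dict__)
    if register:
        sys.modules[name] = module  # later "import name" will reuse it
    return module


hello_module = module_from_source("hello_module", """
    def hello():
        return 'hello'
""")
print(hello_module.hello())
```

Leaving register at False keeps sys.modules untouched, which is convenient when the generated module is only needed locally.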
You could simply create a Module object and stuff it into sys.modules and put your code inside.
Something like:
import sys
from types import ModuleType
mod = ModuleType('mymodule')
sys.modules['mymodule'] = mod
exec(mycode, mod.__dict__)
If the code for the module is in a string, you can forgo using StringIO and use it directly with exec, as illustrated below with a file named dynmodule.py.
Works in Python 2 & 3.
from __future__ import print_function

class _DynamicModule(object):
    def load(self, code):
        execdict = {'__builtins__': None}  # optional, to increase safety
        exec(code, execdict)
        keys = execdict.get(
            '__all__',  # use __all__ attribute if defined
            # else all non-private attributes
            (key for key in execdict if not key.startswith('_')))
        for key in keys:
            setattr(self, key, execdict[key])

# replace this module object in sys.modules with empty _DynamicModule instance
# see Stack Overflow question:
# https://stackoverflow.com/questions/5365562/why-is-the-value-of-name-changing-after-assignment-to-sys-modules-name
import sys as _sys
_ref, _sys.modules[__name__] = _sys.modules[__name__], _DynamicModule()

if __name__ == '__main__':
    import dynmodule  # name of this module
    import textwrap  # for more readable code formatting in sample string

    # string to be loaded can come from anywhere or be generated on-the-fly
    module_code = textwrap.dedent("""\
        foo, bar, baz = 5, 8, 2
        def func():
            return foo*bar + baz
        __all__ = 'foo', 'bar', 'func'  # 'baz' not included
        """)

    dynmodule.load(module_code)  # defines module's contents

    print('dynmodule.foo:', dynmodule.foo)
    try:
        print('dynmodule.baz:', dynmodule.baz)
    except AttributeError:
        print('no dynmodule.baz attribute was defined')
    else:
        print('Error: there should be no dynmodule.baz module attribute')

    print('dynmodule.func() returned:', dynmodule.func())
Output:
dynmodule.foo: 5
no dynmodule.baz attribute was defined
dynmodule.func() returned: 42
Setting the '__builtins__' entry to None in the execdict dictionary prevents the code from directly executing any built-in functions, like __import__, and so makes running it safer. You can ease that restriction by selectively adding things to it you feel are OK and/or required.
It's also possible to add your own predefined utilities and attributes which you'd like made available to the code thereby creating a custom execution context for it to run in. That sort of thing can be useful for implementing a "plug-in" or other user-extensible architecture.
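As a sketch of easing the restriction selectively, the '__builtins__' entry can be a dictionary that contains only the names you want to expose. The allow-list and the run_restricted helper below are hypothetical choices for illustration. This narrows what the code can reach by name, but it should not be treated as a security sandbox.

```python
# Hypothetical allow-list: only these built-ins are visible to the code.
ALLOWED_BUILTINS = {"len": len, "range": range, "sum": sum}


def run_restricted(code):
    """Execute code with a reduced set of built-ins and return its namespace."""
    namespace = {"__builtins__": ALLOWED_BUILTINS}
    exec(code, namespace)
    namespace.pop("__builtins__", None)
    return namespace


result = run_restricted("total = sum(range(len('abcd')))")
print(result["total"])  # 0 + 1 + 2 + 3 = 6

try:
    run_restricted("handle = open('x')")  # open is not in the allow-list
except NameError as exc:
    blocked = str(exc)
    print("blocked:", blocked)
```

Extra entries placed in the namespace dictionary before calling exec (helper functions, constants) become the predefined utilities available to the loaded code.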
You could use exec or eval to execute Python code given as a string.
The documentation for imp.load_source says (my emphasis):
The file argument is the source file, open for reading as text, from the beginning. It must currently be a real file object, not a user-defined class emulating a file.
... so you may be out of luck with this method, I'm afraid.
Perhaps eval would be enough for you in this case?
This sounds like a rather surprising requirement, though - it might help if you add some more to your question about the problem you're really trying to solve.

How do I change the name of an imported library?

I am using jython with a third party application. The third party application has some builtin libraries foo. To do some (unit) testing we want to run some code outside of the application. Since foo is bound to the application we decided to write our own mock implementation.
However there is one issue, we implemented our mock class in python while their class is in java. Thus to use their code one would do import foo and foo is the mock class afterwards. However if we import the python module like this we get the module attached to the name, thus one has to write foo.foo to get to the class.
For convenience reason we would love to be able to write from ourlib.thirdparty import foo to bind foo to the foo-class. However we would like to avoid to import all the classes in ourlib.thirdparty directly, since the loading time for each file takes quite a while.
Is there any way to do this in Python? (I did not get far with import hooks. I tried simply returning the class from load_module, or overwriting what I write to sys.modules; I think both approaches are ugly, particularly the latter.)
edit:
ok: here is what the files in ourlib.thirdparty look like, simplified (without magic):
foo.py:
try:
    import foo
except ImportError:
    class foo:
        ....
Actually they look like this:
foo.py:
class foo:
    ....
__init__.py in ourlib.thirdparty
import sys
import os.path
import imp

#TODO: 3.0 importlib.util abstract base classes could greatly simplify this code or make it prettier.
class Importer(object):
    def __init__(self, path_entry):
        if not path_entry.startswith(os.path.join(os.path.dirname(__file__), 'thirdparty')):
            raise ImportError('Custom importer only for thirdparty objects')
        self._importTuples = {}

    def find_module(self, fullname):
        module = fullname.rpartition('.')[2]
        try:
            if fullname not in self._importTuples:
                fileObj, self._importTuples[fullname] = imp.find_module(module)
                if isinstance(fileObj, file):
                    fileObj.close()
        except:
            print 'backup'
            path = os.path.join(os.path.join(os.path.dirname(__file__), 'thirdparty'), module+'.py')
            if not os.path.isfile(path):
                return None
                raise ImportError("Could not find dummy class for %s (%s)\n(searched:%s)" % (module, fullname, path))
            self._importTuples[fullname] = path, ('.py', 'r', imp.PY_SOURCE)
        return self

    def load_module(self, fullname):
        fp = None
        python = False
        print fullname
        if self._importTuples[fullname][1][2] in (imp.PY_SOURCE, imp.PY_COMPILED, imp.PY_FROZEN):
            fp = open( self._importTuples[fullname][0], self._importTuples[fullname][1][1])
            python = True
        try:
            imp.load_module(fullname, fp, *self._importTuples[fullname])
        finally:
            if python:
                module = fullname.rpartition('.')[2]
                #setattr(sys.modules[fullname], module, getattr(sys.modules[fullname], module))
                #sys.modules[fullname] = getattr(sys.modules[fullname], module)
                if isinstance(fp, file):
                    fp.close()
                return getattr(sys.modules[fullname], module)

sys.path_hooks.append(Importer)
As others have remarked, this is such a common need in Python that the import statement itself has a syntax for it:
from foo import foo as original_foo, for example,
or even import foo as module_foo.
It is interesting to note that the import statement binds a name to the imported module or object in the local context. However, the dictionary sys.modules (in the sys module, of course) is a live reference to all imported modules, using their names as keys. This mechanism plays a key role in preventing Python from re-reading and re-executing an already imported module: if several of your modules or sub-modules import the same foo module, it is read just once, and subsequent imports use the reference stored in sys.modules.
Besides the "import ... as" syntax, modules in Python are just objects: you can assign any other name to them at run time.
So, the following code would also work perfectly for you:
import foo
original_foo = foo
class foo(Mock):
    ...
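Because import consults sys.modules first, a stand-in module can also be registered there by hand, after which both import forms resolve to it. The sketch below illustrates that mechanism only. The module name thirdparty_foo and the mock class body are hypothetical, and it does not address the lazy-loading concern from the question.

```python
import sys
import types


class foo(object):
    """Hypothetical mock of the application's Java class."""

    def ping(self):
        return "mock"


# Build an empty module object, attach the mock class, and register it.
stub = types.ModuleType("thirdparty_foo")
stub.foo = foo
sys.modules["thirdparty_foo"] = stub

# Both statements are served from sys.modules; no file is read.
from thirdparty_foo import foo as imported_foo
import thirdparty_foo as alias

print(imported_foo is foo, alias is stub)
```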
