Unpickling Python objects with a changed module path

I'm trying to integrate a project (Project A) built by a colleague into another Python project. This colleague has not used relative imports in his code but instead done
from packageA.moduleA import ClassA
from packageA.moduleA import ClassB
and consequently pickled instances of those classes with cPickle. For neatness I'd like to hide the package he built (Project A) inside my project. This however changes the path of the classes defined in packageA. No problem, I'll just redefine the imports as
from ..packageA.moduleA import ClassA
from ..packageA.moduleA import ClassB
but now unpickling the classes fails with the following message:
with open(fname) as infile: self.clzA = cPickle.load(infile)
ImportError: No module named packageA.moduleA
So why doesn't cPickle see the module definitions? Do I need to add the root of packageA to the system path? Is that the correct way to solve the problem?
The cPickled file looks something like
ccopy_reg
_reconstructor
p1
(cpackageA.moduleA
ClassA
p2
c__builtin__
object
p3
NtRp4
The old project hierarchy is of the sort
packageA/
    __init__.py
    moduleA.py
    moduleB.py
packageB/
    __init__.py
    moduleC.py
    moduleD.py
I'd like to put all of that into a WrapperPackage
MyPackage/
    __init__.py
    myModuleX.py
    myModuleY.py
    WrapperPackage/
        __init__.py
        packageA/
            __init__.py
            moduleA.py
            moduleB.py
        packageB/
            __init__.py
            moduleC.py
            moduleD.py

You'll need to create an alias for the pickle import to work; add the following to the __init__.py file of the WrapperPackage package:
from .packageA import * # Ensures that all the modules have been loaded in their new locations *first*.
from . import packageA # imports WrapperPackage/packageA
import sys
sys.modules['packageA'] = packageA # creates a packageA entry in sys.modules
It may be that you'll need to create additional entries though:
sys.modules['packageA.moduleA'] = packageA.moduleA  # and similarly for moduleB, etc.
Now cPickle will find packageA.moduleA and packageA.moduleB again at their old locations.
You may want to re-write the pickle file afterwards; the new module location will be used at that time. The aliases created above ensure that the classes in question carry the new location for cPickle to pick up when writing them out again.
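As a rough sketch of that round-trip (the file name and protocol here are assumptions, not from the original answer):
# hypothetical re-dump: load via the aliases above, then write the pickle back
# so it stores the class's current (new) module path for future loads
import cPickle

with open('clzA.pkl', 'rb') as infile:
    obj = cPickle.load(infile)      # the old 'packageA.moduleA' resolves through the alias
with open('clzA.pkl', 'wb') as outfile:
    cPickle.dump(obj, outfile, 2)   # records whatever module path the class now lives under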

In addition to Martijn Pieters' answer, another way of doing this is to set the find_global attribute on a cPickle.Unpickler instance, or to extend the pickle.Unpickler class.
def map_path(mod_name, kls_name):
    if mod_name.startswith('packageA'):  # catch all old module names
        mod = __import__('WrapperPackage.%s' % mod_name, fromlist=[mod_name])
        return getattr(mod, kls_name)
    else:
        mod = __import__(mod_name)
        return getattr(mod, kls_name)

import cPickle as pickle

with open('dump.pickle', 'rb') as fh:
    unpickler = pickle.Unpickler(fh)
    unpickler.find_global = map_path
    obj = unpickler.load()  # object will now contain the new class path reference

with open('dump-new.pickle', 'wb') as fh:
    pickle.dump(obj, fh)  # ClassA will now have a new path in 'dump-new'
A more detailed explanation of the process for both pickle and cPickle can be found here.
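On Python 3, where cPickle and the find_global hook are gone, the equivalent is overriding Unpickler.find_class; here is a minimal sketch (not from the original answers), assuming the same WrapperPackage layout:
import pickle

class RenamingUnpickler(pickle.Unpickler):
    def find_class(self, mod_name, kls_name):
        if mod_name.startswith('packageA'):          # old location
            mod_name = 'WrapperPackage.' + mod_name  # new location
        return super().find_class(mod_name, kls_name)

with open('dump.pickle', 'rb') as fh:
    obj = RenamingUnpickler(fh).load()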

One possible solution is to directly edit the pickle file (if you have access). I ran into this same problem of a changed module path; I had saved the files with pickle.HIGHEST_PROTOCOL, so they should be binary in theory, but the module path was sitting at the top of the pickle file in plain text. So I just did a find-and-replace on all instances of the old module path with the new one and voila, they loaded correctly.
I'm sure this solution is not for everyone, especially if you have a very complex pickled object, but it is a quick and dirty data fix that worked for me!
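For reference, a quick-and-dirty sketch of that byte-level rename (file names and dotted paths are assumptions). Pickle protocols up to 2 store the module path as newline-terminated text, so a plain replacement works; protocol 4+ length-prefixes those strings, so a replacement with a different-length name would corrupt the file:
with open('dump.pickle', 'rb') as fh:
    data = fh.read()

# swap every occurrence of the old dotted path for the new one
data = data.replace(b'packageA.moduleA', b'WrapperPackage.packageA.moduleA')

with open('dump-fixed.pickle', 'wb') as fh:
    fh.write(data)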

This is my basic pattern for flexible unpickling, via an unambiguous and fast transition map, since usually only a few known classes (besides the primitive data types) are relevant for pickling. This also protects unpickling against erroneous or maliciously constructed data, which after all can execute arbitrary Python code (!) on a simple pickle.load(), with or without error-prone sys.modules fiddling.
Python 2 & 3:
from __future__ import print_function
import sys
try:
    import cPickle as pickle, copy_reg as copyreg
except ImportError:
    import pickle, copyreg

class OldZ:
    a = 1
class Z(object):
    a = 2
class Dangerous:
    pass

_unpickle_map_safe = {
    # all possible and allowed (!) classes & upgrade paths
    (__name__, 'Z'): Z,
    (__name__, 'OldZ'): Z,
    ('old.package', 'OldZ'): Z,
    ('__main__', 'Z'): Z,
    ('__main__', 'OldZ'): Z,
    # basically required
    ('copy_reg', '_reconstructor'): copyreg._reconstructor,
    ('__builtin__', 'object'): object,
}

def unpickle_find_class(modname, clsname):
    print("DEBUG unpickling: %(modname)s . %(clsname)s" % locals())
    try:
        return _unpickle_map_safe[(modname, clsname)]
    except KeyError:
        raise pickle.UnpicklingError(
            "%(modname)s . %(clsname)s not allowed" % locals())

if pickle.__name__ == 'cPickle':  # PY2
    def SafeUnpickler(f):
        u = pickle.Unpickler(f)
        u.find_global = unpickle_find_class
        return u
else:  # PY3 & Python2-pickle.py
    class SafeUnpickler(pickle.Unpickler):
        find_class = staticmethod(unpickle_find_class)

def test(fn='./z.pkl'):
    z = OldZ()
    z.b = 'teststring' + sys.version
    pickle.dump(z, open(fn, 'wb'), 2)
    pickle.dump(Dangerous(), open(fn + 'D', 'wb'), 2)
    # load again
    o = SafeUnpickler(open(fn, 'rb')).load()
    print(pickle, "loaded:", o, o.a, o.b)
    assert o.__class__ is Z
    try:
        raise SafeUnpickler(open(fn + 'D', 'rb')).load() and AssertionError
    except pickle.UnpicklingError:
        print('OK: Dangerous not allowed')

if __name__ == '__main__':
    test()

Related

Set a module's class method as an attribute of that module from outside that module

The objective:
I have a package with submodules that I would like to be accessible in the most straightforward way possible. The submodules contain classes to take advantage of the class structure, but they don't need to be instantiated (as they contain only static and class methods). So, ideally, I would like to access them as follows:
from myPackage.subModule import someMethod
print (someMethod)
from myPackage import subModule
print (subModule.someMethod)
import myPackage
print(myPackage.subModule.someMethod)
Here is the package structure:
myPackage/
    __init__.py
    subModule
    subModule2
    etc.
Example of a typical submodule:
# submodule.py
class SomeClass():
    someAttr = list(range(10))

    @classmethod
    def someMethod(cls):
        pass

    @staticmethod
    def someMethod2():
        pass
Here is the code I have in my __init__.py. In order to achieve the above, it attempts to set attributes for each class at the package level, and the same for its methods at the sub-module level.
# __init__.py
import os
import sys
import inspect
import importlib

def import_submodules(package, filetypes=('py', 'pyc', 'pyd'), ignoreStartingWith='_'):
    '''Import submodules to the given package, expose any classes at the package level
    and their respective class methods at submodule level.

    :Parameters:
        package (str)(obj) = A python package.
        filetypes (str)(tuple) = Filetype extension(s) to include.
        ignoreStartingWith (str)(tuple) = Ignore submodules starting with given chars.
    '''
    if isinstance(package, str):
        package = sys.modules[package]
    if not package:
        return

    pkg_dir = os.path.dirname(os.path.abspath(package.__file__))
    sys.path.append(pkg_dir)  # append this dir to the system path.

    for mod_name in os.listdir(pkg_dir):
        if mod_name.startswith(ignoreStartingWith):
            continue
        elif os.path.isfile(os.path.join(pkg_dir, mod_name)):
            mod_name, *mod_ext = mod_name.rsplit('.', 1)
            if filetypes:
                if not mod_ext or mod_ext[0] not in filetypes:
                    continue
            mod = importlib.import_module(mod_name)
            vars(package)[mod_name] = mod

            classes = inspect.getmembers(mod, inspect.isclass)
            for cls_name, clss in classes:
                vars(package)[cls_name] = clss

                methods = inspect.getmembers(clss, inspect.isfunction)
                for method_name, method in methods:
                    vars(mod)[method_name] = method
    del mod_name

import_submodules(__name__)
At issue is this line:
vars(mod)[method_name] = method
Which ultimately results in: (indicating that the attribute was not set)
from myPackage.subModule import someMethod
ImportError: cannot import name 'someMethod' from 'myPackage.subModule'
I am able to set the methods as attributes of the module from within that module, but setting them from outside (i.e. in the package __init__.py) isn't working as written. I understand this isn't ideal to begin with, but my current reasoning is that the ease of use outweighs any perceived issues with namespace pollution. I am, of course, always open to counter-arguments.
I just checked it on my machine.
Created a package myPackage with a module subModule that has a function someMethod.
I ran a Python shell with its working directory set to the directory that myPackage is in, and to get these 3 import statements to work:
from myPackage.subModule import someMethod
from myPackage import subModule
import myPackage
All I had to do was to create an __init__.py with this line in it:
from . import subModule
Found a nice "hacky" solution -
subModule.py:
class myClass:
    @staticmethod
    def someMethod():
        print("I have a bad feeling about this")

myInstance = myClass()
someMethod = myInstance.someMethod
__init__.py is empty
Still scratching my head as to why I am unable to do this from the package __init__.py, but this solution works, with the caveat that it has to be called at the end of each submodule. Perhaps someone can chime in, in the future, as to why this wasn't working when completely contained in the __init__.py.
import sys
import inspect

def addMembers(module, ignoreStartingWith='_'):
    '''Expose class members at module level.

    :Parameters:
        module (str)(obj) = A python module.
        ignoreStartingWith (str)(tuple) = Ignore class members starting with given chars.

    ex. call: addMembers(__name__)
    '''
    if isinstance(module, str):
        module = sys.modules[module]
    if not module:
        return

    classes = inspect.getmembers(module, inspect.isclass)

    for cls_name, clss in classes:
        cls_members = [(o, getattr(clss, o)) for o in dir(clss) if not o.startswith(ignoreStartingWith)]
        for name, mem in cls_members:
            vars(module)[name] = mem
This is the solution I ended up going with. It needs to be put at the end of each submodule of your package. But, it is simple and in addition to all the standard ways of importing, allows you to import a method directly:
def __getattr__(attr):
    '''Attempt to get a class attribute.

    :Parameters:
        attr (str): A name of a class attribute.

    :Return:
        (obj) The attribute.
    '''
    try:
        return getattr(Someclass, attr)
    except AttributeError as error:
        raise AttributeError(f'{__file__} in __getattr__\n\t{error} ({type(attr).__name__})')
from somePackage.someModule import someMethod
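For context: this relies on module-level __getattr__ (PEP 562, Python 3.7+), which from-imports fall back to when the name isn't a regular module attribute. A minimal self-contained sketch with hypothetical names:
# somePackage/someModule.py  (hypothetical layout)
class SomeClass:
    @staticmethod
    def someMethod():
        return 'hello'

def __getattr__(attr):
    # fall back to the class, so `from somePackage.someModule import someMethod` resolves
    return getattr(SomeClass, attr)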

How to expose every name in sub-module in __init__.py of a package?

I defined a package that include a dynamically growing set of modules:
- mypackage
  - __init__.py
  - module1.py
  - module2.py
  - module3.py
  ... many more .py files will be added
I could expose every name in every module in __init__.py like this:
from module1 import *
from module2 import *
from module3 import *
Now when I import mypackage in client code, I get all the names defined in the sub-modules:
# suppose funcA is defined in module1, class B is defined in module2
import mypackage
mypackage.funcA() # call module1.funcA
b = mypackage.B() # module2.B
The problem is, I could define many new modules in mypackage, and I don't want to add an extra line from modulex import * to __init__.py, every time I add a new module to the package.
What is the best way to dynamically export names in all submodules?
In Python 3 the solution is straightforward using importlib and pkgutil.
Placing the following code in __init__.py is the same as typing from submodule import * for all submodules (even nested ones).
import importlib
import pkgutil
for mod_info in pkgutil.walk_packages(__path__, __name__ + '.'):
    mod = importlib.import_module(mod_info.name)

    # Emulate `from mod import *`
    try:
        names = mod.__dict__['__all__']
    except KeyError:
        names = [k for k in mod.__dict__ if not k.startswith('_')]

    globals().update({k: getattr(mod, k) for k in names})
If you only want to include immediate submodules (e.g. pkg.mod but not pkg.mod.submod), replace walk_packages with iter_modules.
I emulated from mod import * based on this answer: How to do from module import * using importlib?
I'm not sure if this is what you mean, but if I've understood correctly, do this in your __init__.py file:
import os

__all__ = []
for module in os.listdir(os.path.dirname(__file__)):
    if module != '__init__.py' and module[-3:] == '.py':
        __all__.append(module[:-3])
This adds the name of every .py file in the package to __all__.
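As a usage sketch (package and module names assumed): with __all__ populated this way, a star-import of the package binds the submodules themselves rather than the names inside them:
from mypackage import *   # imports module1, module2, ... as module objects

module1.funcA()           # contents still need the module prefix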
Simply adding module names to __all__ will not always serve the purpose. I came across such an issue and also required them to be imported in addition to adding them to __all__. This is the code I came up with to make it work in my case. I didn't have any sub packages, so this code works only at the top level.
import os
import glob

modules = glob.glob(os.path.dirname(__file__) + '/*.py')
__all__ = []

for mod in modules:
    if not str(mod).endswith('__init__.py'):
        package_prefix = __name__ + '.'
        module_name = str(mod)[str(mod).rfind('\\') + 1:-3]  # note: assumes Windows path separators
        __all__.append(module_name)
        __import__(package_prefix + module_name, globals(), locals(), [''])

Python dynamic import and __all__

I am facing a behaviour that I don't understand, even though I feel it is a very basic question...
Imagine you have a python package mypackage containing a module mymodule with 2 files __init__.py and my_object.py. The latter file contains a class named MyObject.
I am trying to write an automatic import within the __init__.py that is equivalent to:
__init__.py
__all__ = ['MyObject']
from my_object import MyObject
in order to be able to do:
from mypackage.mymodule import MyObject
I came up with a solution that fills __all__ with all the classes' names. It uses __import__ (I also tried the importlib.import_module() method), but when I try to import MyObject from mymodule, it keeps telling me:
ImportError: cannot import name MyObject
Here is the script I started with:
import os
import sys
import inspect
import importlib

classes = []

for module in os.listdir(os.path.dirname(__file__)):
    if module != '__init__.py' and module[-3:] == '.py':
        module_name = module[:-3]
        import_name = '%s.%s' % (__name__, module_name)
        # Import module here! Looking for an equivalent to 'from module import MyObject'
        # importlib.import_module(import_name)  # Same behaviour
        __import__(import_name, globals(), locals(), ['*'])
        members = inspect.getmembers(sys.modules[import_name],
                                     lambda member: inspect.isclass(member) and member.__module__.startswith(__name__))
        names = [member[0] for member in members]
        classes.extend(names)

__all__ = classes
Can someone help me on that point?
Many thanks,
You just need to assign attributes on the module: globals().update(members).
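A sketch of the __init__.py above with that line added (names assumed from the question); binding the collected members into globals() is what makes from mypackage.mymodule import MyObject resolve:
import os
import sys
import inspect
import importlib

classes = []

for module in os.listdir(os.path.dirname(__file__)):
    if module != '__init__.py' and module[-3:] == '.py':
        import_name = '%s.%s' % (__name__, module[:-3])
        importlib.import_module(import_name)
        members = inspect.getmembers(sys.modules[import_name],
                                     lambda m: inspect.isclass(m) and m.__module__.startswith(__name__))
        globals().update(members)              # the missing step: bind the classes here
        classes.extend(name for name, _ in members)

__all__ = classes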

How do I change the name of an imported library?

I am using jython with a third party application. The third party application has some builtin libraries foo. To do some (unit) testing we want to run some code outside of the application. Since foo is bound to the application we decided to write our own mock implementation.
However there is one issue: we implemented our mock class in Python, while their class is in Java. Thus, to substitute their code one would do import foo and have foo be the mock class afterwards. However, if we import the Python module like this, we get the module bound to the name, so one has to write foo.foo to get to the class.
For convenience reasons we would love to be able to write from ourlib.thirdparty import foo to bind foo to the foo class. However we would like to avoid importing all the classes in ourlib.thirdparty directly, since the loading time for each file takes quite a while.
Is there any way to do this in Python? (I did not get far with import hooks; I tried simply returning the class from load_module or overwriting what I write to sys.modules. I think both approaches are ugly, particularly the latter.)
edit:
OK, here is what the files in ourlib.thirdparty look like, simplified (without magic):
foo.py:
try:
    import foo
except ImportError:
    class foo:
        ....
Actually they look like this:
foo.py:
class foo:
    ....
__init__.py in ourlib.thirdparty
import sys
import os.path
import imp

#TODO: 3.0 importlib.util abstract base classes could greatly simplify this code or make it prettier.
class Importer(object):
    def __init__(self, path_entry):
        if not path_entry.startswith(os.path.join(os.path.dirname(__file__), 'thirdparty')):
            raise ImportError('Custom importer only for thirdparty objects')
        self._importTuples = {}

    def find_module(self, fullname):
        module = fullname.rpartition('.')[2]
        try:
            if fullname not in self._importTuples:
                fileObj, self._importTuples[fullname] = imp.find_module(module)
                if isinstance(fileObj, file):
                    fileObj.close()
        except:
            print 'backup'
            path = os.path.join(os.path.join(os.path.dirname(__file__), 'thirdparty'), module + '.py')
            if not os.path.isfile(path):
                return None
                raise ImportError("Could not find dummy class for %s (%s)\n(searched:%s)" % (module, fullname, path))
            self._importTuples[fullname] = path, ('.py', 'r', imp.PY_SOURCE)
        return self

    def load_module(self, fullname):
        fp = None
        python = False
        print fullname
        if self._importTuples[fullname][1][2] in (imp.PY_SOURCE, imp.PY_COMPILED, imp.PY_FROZEN):
            fp = open(self._importTuples[fullname][0], self._importTuples[fullname][1][1])
            python = True
        try:
            imp.load_module(fullname, fp, *self._importTuples[fullname])
        finally:
            if python:
                module = fullname.rpartition('.')[2]
                #setattr(sys.modules[fullname], module, getattr(sys.modules[fullname], module))
                #sys.modules[fullname] = getattr(sys.modules[fullname], module)
                if isinstance(fp, file):
                    fp.close()
                return getattr(sys.modules[fullname], module)

sys.path_hooks.append(Importer)
As others have remarked, it is such a plain thing in Python that the import statement itself has a syntax for it:
from foo import foo as original_foo, for example -
or even import foo as module_foo
It is interesting to note that the import statement binds a name to the imported module or object in the local context; however, the dictionary sys.modules (in the sys module, of course) holds a live reference to all imported modules, keyed by their names. This mechanism plays a key role in preventing Python from re-reading and re-executing an already imported module at runtime (that is, if several of your modules or sub-modules import the same foo module, it is read just once; subsequent imports use the reference stored in sys.modules).
And, besides the "import ... as" syntax, modules in Python are just objects: you can assign any other name to them at run time.
So, the following code would also work perfectly for you:
import foo
original_foo = foo

class foo(Mock):
    ...
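A small sketch of how that re-binding might be wired into the package (module names assumed; note this imports the class eagerly, which gives up the lazy loading the question asks about):
# ourlib/thirdparty/__init__.py
# re-export the class under the module's name, so test code can write
# `from ourlib.thirdparty import foo` and get the class rather than the module
from .foo import foo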

How can I discover classes in a specific package in python?

I have a package of plug-in style modules. It looks like this:
/Plugins
/Plugins/__init__.py
/Plugins/Plugin1.py
/Plugins/Plugin2.py
etc...
Each .py file contains a class that derives from PluginBaseClass. So I need to list every module in the Plugins package and then search for any classes that implement PluginBaseClass. Ideally I want to be able to do something like this:
for klass in iter_plugins(project.Plugins):
    action = klass()
    action.run()
I have seen some other answers out there, but my situation is different. I have an actual import of the base package (i.e. import project.Plugins) and I need to find the classes after discovering the modules.
Edit: here's a revised solution. I realised I was making a mistake while testing my previous one, and it doesn't really work the way you would expect. So here is a more complete solution:
import os
from imp import find_module
from types import ModuleType, ClassType

def iter_plugins(package):
    """Receives package (as a string) and, for all of its contained modules,
    generates all classes that are subclasses of PluginBaseClass."""
    # Despite the function name, "find_module" will find the package
    # (the "filename" part of the return value will be None, in this case)
    filename, path, description = find_module(package)
    # dir(some_package) will not list the modules within the package,
    # so we explicitly look for files. If you need to recursively descend
    # a directory tree, you can adapt this to use os.walk instead of os.listdir
    modules = sorted(set(i.partition('.')[0]
                         for i in os.listdir(path)
                         if i.endswith(('.py', '.pyc', '.pyo'))
                         and not i.startswith('__init__.py')))
    pkg = __import__(package, fromlist=modules)
    for m in modules:
        module = getattr(pkg, m)
        if type(module) == ModuleType:
            for c in dir(module):
                klass = getattr(module, c)
                if (type(klass) == ClassType and
                        klass is not PluginBaseClass and
                        issubclass(klass, PluginBaseClass)):
                    yield klass
My previous solution was:
You could try something like:
from types import ModuleType
import Plugins

classes = []

for item in dir(Plugins):
    module = getattr(Plugins, item)
    # Get all (and only) modules in Plugins
    if type(module) == ModuleType:
        for c in dir(module):
            klass = getattr(module, c)
            if isinstance(klass, PluginBaseClass):
                classes.append(klass)
Actually, even better, if you want some modularity:
from types import ModuleType

def iter_plugins(package):
    # This assumes "package" is a package name.
    # If it's the package itself, you can remove this __import__
    pkg = __import__(package)
    for item in dir(pkg):
        module = getattr(pkg, item)
        if type(module) == ModuleType:
            for c in dir(module):
                klass = getattr(module, c)
                if issubclass(klass, PluginBaseClass):
                    yield klass
You may (and probably should) define __all__ in __init__.py as a list of the submodules in your package; this is so that you support people doing from Plugins import *. If you have done so, you can iterate over the modules with
import Plugins
import sys

modules = {}
for module in Plugins.__all__:
    __import__(module)
    modules[module] = sys.modules[module]
    # iterate over dir(module) as above
The reason another answer posted here fails is that __import__ imports the lowest-level module, but returns the top-level one (see the docs). I don't know why.
Scanning modules isn't a good idea. If you need a class registry you should look at metaclasses or use existing solutions like zope.interface.
A simple solution using metaclasses may look like this:
from functools import reduce

class DerivationRegistry(type):
    def __init__(cls, name, bases, cls_dict):
        type.__init__(cls, name, bases, cls_dict)
        cls._subclasses = set()
        for base in bases:
            if isinstance(base, DerivationRegistry):
                base._subclasses.add(cls)

    def getSubclasses(cls):
        return reduce(set.union,
                      (succ.getSubclasses() for succ in cls._subclasses
                       if isinstance(succ, DerivationRegistry)),
                      cls._subclasses)

class Base(object):
    __metaclass__ = DerivationRegistry

class Cls1(object):
    pass

class Cls2(Base):
    pass

class Cls3(Cls2, Cls1):
    pass

class Cls4(Cls3):
    pass

print(Base.getSubclasses())
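Note that the __metaclass__ attribute only has an effect on Python 2; on Python 3 the metaclass goes in the class header instead. A one-line sketch of the Python 3 spelling (not part of the original answer):
class Base(object, metaclass=DerivationRegistry):
    pass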
If you don't know what's going to be in Plugins ahead of time, you can get a list of python files in the package's directory, and import them like so:
# compute a list of modules in the Plugins package
import os
import Plugins
plugin_modules = [f[:-3] for f in os.listdir(os.path.dirname(Plugins.__file__))
                  if f.endswith('.py') and f != '__init__.py']
Sorry, that comprehension might be a mouthful for someone relatively new to python. Here's a more verbose version (might be easier to follow):
plugin_modules = []
package_path = Plugins.__file__
file_list = os.listdir(os.path.dirname(package_path))

for file_name in file_list:
    if file_name.endswith('.py') and file_name != '__init__.py':
        plugin_modules.append(file_name[:-3])  # strip '.py', as in the comprehension above
Then you can use __import__ to get the module:
# get the first one
plugin = __import__('Plugins.' + plugin_modules[0])
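As another answer above points out, __import__ with a dotted name returns the top-level package rather than the submodule, so the plugin module itself still has to be fetched. A small sketch (names assumed) of the usual workarounds:
# either reach into the returned top-level package...
pkg = __import__('Plugins.' + plugin_modules[0])
plugin = getattr(pkg, plugin_modules[0])

# ...or pass a non-empty fromlist so __import__ returns the submodule directly
plugin = __import__('Plugins.' + plugin_modules[0], fromlist=[plugin_modules[0]])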
