Pickling dynamically created types

I've been trying to get some dynamically created types (i.e. ones created by calling 3-arg type()) to pickle and unpickle nicely. I've been using this module switching trick to hide the details from users of the module and give clean semantics.
I've learned several things already:
The type must be findable with getattr on the module itself
The type must be consistent with what getattr finds, that is to say if we call pickle.dumps(o) then it must be true that type(o) == getattr(module, 'name of type')
Where I'm stuck though is that there still seems to be something odd going on - it seems to be calling __getstate__ on something unexpected.
Here's the simplest setup I've got that reproduces the issue, testing with Python 3.5, but I'd like to target back to 3.3 if possible:
# module.py
import sys
import functools
def dump(self):
    return b'Some data' # Dummy for testing
def undump(self, data):
    print('Undump: %r' % data) # Do nothing for testing
# Cheaty demo way to make this consistent
@functools.lru_cache(maxsize=None)
def make_type(name):
    return type(name, (), {
        '__getstate__': dump,
        '__setstate__': undump,
    })
class Magic(object):
    def __init__(self, path):
        self.path = path
    def __getattr__(self, name):
        print('Getting thing: %s (from: %s)' % (name, self.path))
        # for simple testing all calls to make_type must end in last x.y.z.last
        if name != 'last':
            if self.path:
                return Magic(self.path + '.' + name)
            else:
                return Magic(name)
        return make_type(self.path + '.' + name)
# Make the switch
sys.modules[__name__] = Magic('')
And then a quick way to exercise that:
import module
import pickle
f=module.foo.bar.woof.last()
print(f.__getstate__()) # See, *this* works
print('Pickle starts here')
print(pickle.dumps(f))
Which then gives:
Getting thing: foo (from: )
Getting thing: bar (from: foo)
Getting thing: woof (from: foo.bar)
Getting thing: last (from: foo.bar.woof)
b'Some data'
Pickle starts here
Getting thing: __spec__ (from: )
Getting thing: _initializing (from: __spec__)
Getting thing: foo (from: )
Getting thing: bar (from: foo)
Getting thing: woof (from: foo.bar)
Getting thing: last (from: foo.bar.woof)
Getting thing: __getstate__ (from: foo.bar.woof)
Traceback (most recent call last):
File "test.py", line 7, in <module>
print(pickle.dumps(f))
TypeError: 'Magic' object is not callable
I wasn't expecting to see anything looking up __getstate__ on module.foo.bar.woof, but even if we force that lookup to fail by adding:
if name == '__getstate__': raise AttributeError()
into our __getattr__ it still fails with:
Traceback (most recent call last):
File "test.py", line 7, in <module>
print(pickle.dumps(f))
_pickle.PicklingError: Can't pickle <class 'module.Magic'>: it's not the same object as module.Magic
What gives? Am I missing something with __spec__? The docs for __spec__ pretty much just stress setting it appropriately, but don't seem to actually explain much.
More importantly, the bigger question is: how am I supposed to make types that I programmatically generate via a pseudo-module's __getattr__ implementation pickle properly?
(And obviously once I've managed to get pickle.dumps to produce something I expect pickle.loads to call undump with the same thing)

To pickle f, pickle needs to pickle f's class, module.foo.bar.woof.last.
The docs don't claim support for pickling arbitrary classes. They claim the following:
The following types can be pickled:
...
classes that are defined at the top level of a module
module.foo.bar.woof.last isn't defined at the top level of a module, even a pretend module like module. In this not-officially-supported case, the pickle logic ends up trying to pickle module.foo.bar.woof, either here:
elif parent is not module:
    self.save_reduce(getattr, (parent, lastname))
or here
else if (parent != module) {
    PickleState *st = _Pickle_GetGlobalState();
    PyObject *reduce_value = Py_BuildValue("(O(OO))",
                                st->getattr, parent, lastname);
    status = save_reduce(self, reduce_value, NULL);
module.foo.bar.woof can't be pickled for multiple reasons. It returns a non-callable Magic instance for all unsupported method lookups, like __getstate__, which is where your first error comes from. The module-switching thing prevents finding the Magic class to pickle it, which is where your second error comes from. There are probably more incompatibilities.
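For contrast, here is a minimal sketch of the supported case, where a type built with 3-arg type() is exposed as a top-level attribute of an importable module. The module name dynamic_types_demo and the simplified make_type are assumptions made for this illustration; they are not part of the question's code.

```python
import pickle
import sys
import types

# Hypothetical module name, used only for this sketch.
REGISTRY_NAME = 'dynamic_types_demo'
registry = types.ModuleType(REGISTRY_NAME)
sys.modules[REGISTRY_NAME] = registry


def make_type(name):
    """Create a class and expose it at the top level of the registry module."""
    cls = type(name, (), {'__module__': REGISTRY_NAME})
    setattr(registry, name, cls)
    return cls


Point = make_type('Point')
p = Point()
p.x = 3

# pickle stores the class by reference (module name plus class name),
# so the round trip works because getattr(registry, 'Point') is Point.
restored = pickle.loads(pickle.dumps(p))
print(type(restored) is Point, restored.x)

# A type that cannot be reached by name from its module is rejected.
Hidden = type('Hidden', (), {'__module__': REGISTRY_NAME})
try:
    pickle.dumps(Hidden())
except (pickle.PicklingError, AttributeError) as exc:
    hidden_failed = True
    print('Cannot pickle Hidden:', type(exc).__name__)
else:
    hidden_failed = False
```

The point of the sketch is only that a flat name on a real module object satisfies pickle's lookup; it does not attempt the dotted x.y.z.last paths from the question.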

Making the class callable turns out to be just another wrong direction. Thanks to this hack, though, I found a workaround that lets the class be resolved by its type. Judging from the error <class 'module.Magic'>: it's not the same object as module.Magic, the pickler's lookup does not go through the same call and ends up with a different object from the one being pickled. This is a common problem when pickling classes that instantiate themselves, where an object is looked up by its class. The workaround is therefore to patch the class with its own type: @mock.patch('module.Magic', type(module.Magic)). This is only a short answer.
Main.py
import module1
import pickle
import mock
f = module1.foo.bar.woof.last
print(f().__getstate__()) # See, *this* works
print('Pickle starts here')
@mock.patch('module1.Magic', type(module1.Magic))
def pickleit():
    return pickle.dumps(f())
print(pickleit())
Magic class
class Magic(object):
    def __init__(self, value):
        self.path = value
    __class__: lambda x:x
    def __getstate__(self):
        print ("Shoot me! i'm at " + self.path )
        return dump(self)
    def __setstate__(self,value):
        print ('something will never occur')
        return undump(self,value)
    def __spec__(self):
        print ("Wrong side of the planet ")
    def _initializing(self):
        print ("Even farther lost ")
    def __getattr__(self, name):
        print('Getting thing: %s (from: %s)' % (name, self.path))
        # for simple testing all calls to make_type must end in last x.y.z.last
        if name != 'last':
            if self.path:
                return Magic(self.path + '.' + name)
            else:
                return Magic(name)
        print('terminal stage' )
        return make_type(self.path + '.' + name)
Even if this is more of a lucky edge off the bat than a clean hit, I could see the content dumped to my console.

Related

multiprocessing and modules

I am attempting to use multiprocessing to call derived class member function defined in a different module. There seem to be several questions dealing with calling class methods from the same module, but none from different modules. For example, if I have the following structure:
main.py
multi/
    __init__.py (empty)
    base.py
    derived.py
main.py
from multi.derived import derived
from multi.base import base
if __name__ == '__main__':
    base().multiFunction()
    derived().multiFunction()
base.py
import multiprocessing
# The following two functions wrap calling a class method
def wrapPoolMapArgs(classInstance, functionName, argumentLists):
    className = classInstance.__class__.__name__
    return zip([className] * len(argumentLists), [functionName] * len(argumentLists), [classInstance] * len(argumentLists), argumentLists)
def executeWrappedPoolMap(args, **kwargs):
    classType = eval(args[0])
    funcType = getattr(classType, args[1])
    funcType(args[2], args[3:], **kwargs)
class base:
    def multiFunction(self):
        mppool = multiprocessing.Pool()
        mppool.map(executeWrappedPoolMap, wrapPoolMapArgs(self, 'method', range(3)))
    def method(self,args):
        print "base.method: " + args.__str__()
derived.py
from base import base
class derived(base):
    def method(self,args):
        print "derived.method: " + args.__str__()
Output
base.method: (0,)
base.method: (1,)
base.method: (2,)
Traceback (most recent call last):
File "e:\temp\main.py", line 6, in <module>
derived().multiFunction()
File "e:\temp\multi\base.py", line 15, in multiFunction
mppool.map(executeWrappedPoolMap, wrapPoolMapArgs(self, 'method', range(3)))
File "C:\Program Files\Python27\lib\multiprocessing\pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "C:\Program Files\Python27\lib\multiprocessing\pool.py", line 567, in get
raise self._value
NameError: name 'derived' is not defined
I have tried fully qualifying the class name in the wrapPoolMapArgs function, but that just gives the same error, saying multi is not defined.
Is there some way to achieve this, or must I restructure to have all classes in the same module if I want to use multiprocessing with inheritance?

This is almost certainly caused by the eval-based approach to dynamically invoking specific code.
In executeWrappedPoolMap (in base.py), you convert the str name of a class to the class itself with classType = eval(args[0]). But eval is executed in the scope of executeWrappedPoolMap, which is in base.py, and it can't find derived (because the name doesn't exist in base.py).
Stop passing the name and pass the class object itself: use classInstance.__class__ instead of classInstance.__class__.__name__. multiprocessing will pickle it for you, and you can use it directly on the other end instead of using eval (which is nearly always wrong; it's a code smell of the strongest sort).
BTW, the reason the traceback isn't very helpful is that the exception is raised in the worker, caught, pickled, sent back to the main process, and re-raised there. The traceback you see is from that re-raise, not from where the NameError actually occurred (which was the eval line).
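As a rough sketch of that suggestion in Python 3 syntax: the tuples carry the class object itself, and the executor unpacks it and uses getattr, with no eval involved. The names Base, Derived, wrap_pool_map_args, execute_wrapped and the mapper parameter are made up for this illustration.

```python
def wrap_pool_map_args(instance, method_name, values):
    """Build one argument tuple per value, carrying the class object itself."""
    cls = type(instance)
    return [(cls, method_name, instance, value) for value in values]


def execute_wrapped(args):
    """Look the method up on the class that was passed in; no eval needed."""
    cls, method_name, instance, value = args
    return getattr(cls, method_name)(instance, value)


class Base:
    def method(self, arg):
        return 'Base.method: %r' % (arg,)

    def run_all(self, values, mapper=map):
        # `mapper` defaults to the built-in map so the sketch runs in-process.
        return list(mapper(execute_wrapped,
                           wrap_pool_map_args(self, 'method', values)))


class Derived(Base):
    def method(self, arg):
        return 'Derived.method: %r' % (arg,)


base_results = Base().run_all(range(3))
derived_results = Derived().run_all(range(3))
print(base_results)
print(derived_results)
```

To run this across processes, the map method of a multiprocessing.Pool could presumably be passed as mapper, under the usual if __name__ == '__main__': guard. The classes then need to be importable in the workers, which is the same requirement pickle places on any class.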

Checking unwanted type change in Python

I come from static-type programming and I'm interested in understanding the rationale behind dynamic-type programming to check if dynamic-type languages can better fit my needs.
I've read about the theory behind duck typing. I've also read that unit testing (desirable and used in static-type programming) becomes a necessity in dynamic languages, where compile-time checks are missing.
However, I'm still afraid I'm missing the big picture. In particular, how can you check for a mistake where a variable's type is accidentally changed?
Let's make a very simple example in Python:
#! /usr/bin/env python
userid = 3
defaultname = "foo"
username = raw_input("Enter your name: ")
if username == defaultname:
    # Bug: here we meant userid...
    username = 2
# Here username can be either an int or a string
# depending on the branch taken.
import re
match_string = re.compile("oo")
if (match_string.match(username)):
    print "Match!"
Pylint, pychecker and pyflakes do not warn about this issue.
What is the Pythonic way of dealing with this kind of error?
Should the code be wrapped with a try/catch ?

This will not give you checks at compile time, but as you suggested using a try/catch, I will assume that runtime checks would also be helpful.
If you use classes, you could hook your own type checks in the __setattr__ method. For example:
import datetime
# ------------------------------------------------------------------------------
# TypedObject
# ------------------------------------------------------------------------------
class TypedObject(object):
    attr_types = {'id' : int,
                  'start_time' : datetime.time,
                  'duration' : float}
    __slots__ = attr_types.keys()
    # --------------------------------------------------------------------------
    # __setattr__
    # --------------------------------------------------------------------------
    def __setattr__(self, name, value):
        if name not in self.__slots__:
            raise AttributeError(
                "'%s' object has no attribute '%s'"
                % (self.__class__.__name__, name))
        if type(value) is not self.attr_types[name]:
            raise TypeError(
                "'%s' object attribute '%s' must be of type '%s'"
                % (self.__class__.__name__, name,
                   self.attr_types[name].__name__))
        # call __setattr__ on parent class
        super(TypedObject, self).__setattr__(name, value)
That would result in:
>>> my_typed_object = TypedObject()
>>> my_typed_object.id = "XYZ" # ERROR
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 28, in __setattr__
TypeError: 'TypedObject' object attribute 'id' must be of type 'int'
>>> my_typed_object.id = 123 # OK
You could go on and make the TypedObject above more generic, so that your classes could inherit from it.
Another (probably better) solution (pointed out here) could be to use Enthought Traits.
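A more generic variant could look roughly like the following Python 3 sketch, where subclasses only declare their own attr_types. The User class and its fields are hypothetical, and isinstance is used here instead of the exact type comparison above, so subclasses of the declared type are accepted too.

```python
class TypedObject:
    """Base class: subclasses declare allowed attributes in attr_types."""
    attr_types = {}

    def __setattr__(self, name, value):
        expected = self.attr_types.get(name)
        if expected is None:
            raise AttributeError(
                "'%s' object has no attribute '%s'"
                % (type(self).__name__, name))
        if not isinstance(value, expected):
            raise TypeError(
                "'%s' object attribute '%s' must be of type '%s'"
                % (type(self).__name__, name, expected.__name__))
        super().__setattr__(name, value)


class User(TypedObject):
    # Hypothetical fields, mirroring the variables in the question.
    attr_types = {'userid': int, 'username': str}


user = User()
user.userid = 3
user.username = 'foo'

try:
    user.username = 2   # the accidental type change from the question
except TypeError as exc:
    message = str(exc)
    print(message)
```

This only guards attribute assignment on instances; plain local variables like those in the question's script are not covered by it.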

Python pickle crash when trying to return default value in getattr

I have a dictionary-like class that I use to store some values as attributes. I recently added some logic (__getattr__) to return None if an attribute doesn't exist. As soon as I did this, pickle crashed, and I would like some insight into why.
Test Code:
import cPickle
class DictionaryLike(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __iter__(self):
        return iter(self.__dict__)
    def __getitem__(self, key):
        if(self.__dict__.has_key(key)):
            return self.__dict__[key]
        else:
            return None
    ''' This is the culprit...'''
    def __getattr__(self, key):
        print 'Retreiving Value ' , key
        return self.__getitem__(key)
class SomeClass(object):
    def __init__(self, kwargs={}):
        self.args = DictionaryLike(**kwargs)
someClass = SomeClass()
content = cPickle.dumps(someClass,-1)
print content
Result:
Retreiving Value __getnewargs__
Traceback (most recent call last):
  File <<file>> line 29, in <module>
    content = cPickle.dumps(someClass,-1)
TypeError: 'NoneType' object is not callable
Did I do something stupid? I had read a post that deepcopy() might require that I throw an exception if a key doesn't exist? If this is the case is there any easy way to achieve what I want without throwing an exception?
The end result I want is that if someone calls
someClass.args.i_dont_exist
I want it to return None.

Implementing __getattr__ is a bit tricky, since it is called for every non-existing attribute. In your case, the pickle module tests your class for the __getnewargs__ special method and receives None, which is obviously not callable.
You might want to alter __getattr__ to call the base implementation for magic names:
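def __getattr__(self, key):
    if key.startswith('__') and key.endswith('__'):
        # object defines no __getattr__, so this lookup raises AttributeError,
        # which is the signal pickle needs for a missing special method
        return super(DictionaryLike, self).__getattr__(key)
    return self.__getitem__(key)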
I usually pass through all names starting with an underscore, so that I can sidestep the magic for internal symbols.

You need to raise an AttributeError when an attribute is not present in your class:
def __getattr__(self, key):
    i = self.__getitem__(key)
    if i is None:
        raise AttributeError(key)
    return i
I am going to assume that this behavior is required. From the Python documentation for __getattr__: "Called when an attribute lookup has not found the attribute in the usual places (i.e. it is not an instance attribute nor is it found in the class tree for self). name is the attribute name. This method should return the (computed) attribute value or raise an AttributeError exception."
There is no way to tell pickle (or similar modules) that the attribute it is looking for does not exist unless you raise the exception. In your error message, for example, pickle is looking for a special method called __getnewargs__; if no AttributeError is raised, pickle expects the returned value to be callable.
One potential workaround you could try is to define all of the special methods pickle looks for as dummy methods.
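Putting the two answers together, a minimal Python 3 sketch might raise AttributeError for any underscore-prefixed name and return None for everything else. The underscore rule is the convention mentioned in the first answer; the class body is otherwise reduced to the essentials and is an illustration, not the asker's full class.

```python
import pickle


class DictionaryLike:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def __getattr__(self, key):
        # Only called when normal attribute lookup has already failed.
        if key.startswith('_'):
            # Let pickle, copy, and friends see that the special
            # method really does not exist.
            raise AttributeError(key)
        return None


args = DictionaryLike(a=1)
restored = pickle.loads(pickle.dumps(args))
print(restored.a, restored.i_dont_exist)
```

Ordinary missing names still come back as None, while pickle's probes for optional special methods get the AttributeError they expect.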

Python pickle - how does it break?

Everyone knows pickle is not a secure way to store user data. It even says so on the box.
I'm looking for examples of strings or data structures that break pickle parsing in the currently supported versions of CPython >= 2.4. Are there things that can be pickled but not unpickled? Are there problems with particular unicode characters? Really big data structures? Obviously the old ASCII protocol has some issues, but what about the most current binary form?
I'm particularly curious about ways in which the pickle loads operation can fail, especially when given a string produced by pickle itself. Are there any circumstances in which pickle will continue parsing past the .?
What sort of edge cases are there?
Edit: Here are some examples of the sort of thing I'm looking for:
In Python 2.4, you can pickle an array without error, but you can't unpickle it. http://bugs.python.org/issue1281383
You can't reliably pickle objects that inherit from dict and call __setitem__ before instance variables are set with __setstate__. This can be a gotcha when pickling Cookie objects. See http://bugs.python.org/issue964868 and http://bugs.python.org/issue826897
Python 2.4 (and 2.5?) will return a pickle value for infinity (or values close to it like 1e100000), but may (depending on platform) fail when loading. See http://bugs.python.org/issue880990 and http://bugs.python.org/issue445484
This last item is interesting because it reveals a case where the STOP marker does not actually stop parsing - when the marker exists as part of a literal, or more generally, when not preceded by a newline.

This is a greatly simplified example of what pickle didn't like about my data structure.
import cPickle as pickle
class Member(object):
    def __init__(self, key):
        self.key = key
        self.pool = None
    def __hash__(self):
        return self.key
class Pool(object):
    def __init__(self):
        self.members = set()
    def add_member(self, member):
        self.members.add(member)
        member.pool = self
member = Member(1)
pool = Pool()
pool.add_member(member)
# binary mode, since HIGHEST_PROTOCOL is a binary format
with open("test.pkl", "wb") as f:
    pickle.dump(member, f, pickle.HIGHEST_PROTOCOL)
with open("test.pkl", "rb") as f:
    x = pickle.load(f)
Pickle is known to be a little funny with circular structures, but if you toss custom hash functions and sets/dicts into the mix then things get quite hairy.
In this particular example it partially unpickles the member and then encounters the pool. So it then partially unpickles the pool and encounters the members set. So it creates the set and tries to add the partially unpickled member to the set. At which point it dies in the custom hash function, because the member is only partially unpickled. I dread to think what might happen if you had an "if hasattr..." in the hash function.
$ python --version
Python 2.6.5
$ python test.py
Traceback (most recent call last):
File "test.py", line 25, in <module>
x = pickle.load(f)
File "test.py", line 8, in __hash__
return self.key
AttributeError: ("'Member' object has no attribute 'key'", <type 'set'>, ([<__main__.Member object at 0xb76cdaac>],))
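The ordering problem described above can be reproduced in Python 3 with a sketch like the one below. It also shows one possible way around it, which is an assumption on my part rather than something from the original answer: pickle the containing pool instead of the member, so that each member is fully rebuilt before it is hashed into the set.

```python
import pickle


class Member:
    def __init__(self, key):
        self.key = key
        self.pool = None

    def __hash__(self):
        return self.key


class Pool:
    def __init__(self):
        self.members = set()

    def add_member(self, member):
        self.members.add(member)
        member.pool = self


member = Member(1)
pool = Pool()
pool.add_member(member)

# Starting from the member: the set is rebuilt while the member
# still has no attributes, so __hash__ fails during loading.
try:
    pickle.loads(pickle.dumps(member))
    member_first_failed = False
except AttributeError:
    member_first_failed = True

# Starting from the pool: the member's state is restored before
# it is added to the set, so hashing works.
restored_pool = pickle.loads(pickle.dumps(pool))
restored_member = next(iter(restored_pool.members))
print(member_first_failed, restored_member.key)
```

Whether pickling from the container is practical depends on the application; it simply changes the order in which objects are reconstructed.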

If you are interested in how things fail with pickle (or cPickle, as it's just a slightly different import), you can use this growing list of all the different object types in python to test against fairly easily.
https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
The package dill includes functions that discover how an object fails to pickle, for example by catching the error it throws and returning it to the user.
dill.dill has these functions, which you could also build for pickle or cPickle, simply with a cut-and-paste and an import pickle or import cPickle as pickle (or import dill as pickle):
def copy(obj, *args, **kwds):
    """use pickling to 'copy' an object"""
    return loads(dumps(obj, *args, **kwds))
# quick sanity checking
def pickles(obj,exact=False,safe=False,**kwds):
    """quick check if object pickles with dill"""
    if safe: exceptions = (Exception,) # RuntimeError, ValueError
    else:
        exceptions = (TypeError, AssertionError, PicklingError, UnpicklingError)
    try:
        pik = copy(obj, **kwds)
        try:
            result = bool(pik.all() == obj.all())
        except AttributeError:
            result = pik == obj
        if result: return True
        if not exact:
            return type(pik) == type(obj)
        return False
    except exceptions:
        return False
and includes these in dill.detect:
def baditems(obj, exact=False, safe=False): #XXX: obj=globals() ?
    """get items in object that fail to pickle"""
    if not hasattr(obj,'__iter__'): # is not iterable
        return [j for j in (badobjects(obj,0,exact,safe),) if j is not None]
    obj = obj.values() if getattr(obj,'values',None) else obj
    _obj = [] # can't use a set, as items may be unhashable
    [_obj.append(badobjects(i,0,exact,safe)) for i in obj if i not in _obj]
    return [j for j in _obj if j is not None]
def badobjects(obj, depth=0, exact=False, safe=False):
    """get objects that fail to pickle"""
    if not depth:
        if pickles(obj,exact,safe): return None
        return obj
    return dict(((attr, badobjects(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
def badtypes(obj, depth=0, exact=False, safe=False):
    """get types for objects that fail to pickle"""
    if not depth:
        if pickles(obj,exact,safe): return None
        return type(obj)
    return dict(((attr, badtypes(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
and this last function, which is what you can use to test the objects in dill._objects
def errors(obj, depth=0, exact=False, safe=False):
    """get errors for objects that fail to pickle"""
    if not depth:
        try:
            pik = copy(obj)
            if exact:
                assert pik == obj, \
                    "Unpickling produces %s instead of %s" % (pik,obj)
            assert type(pik) == type(obj), \
                "Unpickling produces %s instead of %s" % (type(pik),type(obj))
            return None
        except Exception:
            import sys
            return sys.exc_info()[1]
    return dict(((attr, errors(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
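For the plain standard-library pickle, the copy and pickles helpers above could be adapted along these lines. This is a simplified sketch, not dill's code: it catches every Exception (like safe=True above) and drops the .all() comparison used for array-like objects.

```python
import pickle


def copy(obj, **kwds):
    """Round-trip an object through pickle."""
    return pickle.loads(pickle.dumps(obj, **kwds))


def pickles(obj, exact=False):
    """Return True if obj survives a pickle round trip."""
    try:
        pik = copy(obj)
        if bool(pik == obj):
            return True
    except Exception:
        return False
    # Not equal after the round trip: accept a same-type result
    # unless an exact match was requested.
    return (not exact) and type(pik) is type(obj)


print(pickles([1, 2, 3]))
print(pickles(lambda: None))
print(pickles(object()), pickles(object(), exact=True))
```

A lambda fails because pickle stores functions by name and cannot find it again; a bare object() round-trips to a new, unequal instance, so it only passes when exact is False.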

It is possible to pickle class instances. If I knew what classes your application uses, then I could subvert them. A contrived example:
import string
import subprocess
class Command(object):
    def __init__(self, command):
        self._command = self._sanitize(command)
    @staticmethod
    def _sanitize(command):
        return filter(lambda c: c in string.letters, command)
    def run(self):
        subprocess.call('/usr/lib/myprog/%s' % self._command, shell=True)
Now if your program creates Command instances and saves them using pickle, and I could subvert or inject into that storage, then I could run any command I choose by setting self._command directly.
In practice my example should never pass for secure code anyway. But note that if the sanitize function is secure, then so is the entire class, apart from the possible use of pickle from untrusted data breaking this. Therefore, there exist programs which are secure but can be made insecure by the inappropriate use of pickle.
The danger is that your pickle-using code could be subverted along the same principle but in innocent-looking code where the vulnerability is far less obvious. The best thing to do is to always avoid using pickle to load untrusted data.
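If pickled data from a less trusted source has to be read anyway, the pickle documentation describes restricting which globals may be loaded by overriding Unpickler.find_class. The sketch below follows that pattern; the allow-list is an arbitrary choice for illustration, and this narrows the attack surface rather than making pickle safe. In particular, it would not help if a class like Command were itself on the allow-list.

```python
import builtins
import io
import pickle

# Arbitrary allow-list for this sketch.
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve a small set of harmless builtins.
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse everything else.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads, but with the restricted find_class."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


allowed = restricted_loads(pickle.dumps([1, 2, range(15)]))

try:
    restricted_loads(pickle.dumps(eval))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(allowed, blocked)
```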

How to override built-in getattr in Python?

I know how to override an object's __getattr__() to handle calls to undefined object functions. However, I would like to achieve the same behavior for the builtin getattr() function. For instance, consider code like this:
call_some_undefined_function()
Normally, that simply produces an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'call_some_undefined_function' is not defined
I want to override getattr() so that I can intercept the call to "call_some_undefined_function()" and figure out what to do.
Is this possible?

I can only think of a way to do this by calling eval.
class Global(dict):
    def undefined(self, *args, **kargs):
        return u'ran undefined'
    def __getitem__(self, key):
        if dict.has_key(self, key):
            return dict.__getitem__(self, key)
        return self.undefined
src = """
def foo():
    return u'ran foo'
print foo()
print callme(1,2)
"""
code = compile(src, '<no file>', 'exec')
globals = Global()
eval(code, globals)
The above outputs
ran foo
ran undefined

You haven't said why you're trying to do this. I had a use case where I wanted to be capable of handling typos that I made during interactive Python sessions, so I put this into my python startup file:
import sys
import re
def nameErrorHandler(type, value, traceback):
    if not isinstance(value, NameError):
        # Let the normal error handler handle this:
        nameErrorHandler.originalExceptHookFunction(type, value, traceback)
        return
    name = re.search(r"'(\S+)'", value.message).group(1)
    # At this point we know that there was an attempt to use name
    # which ended up not being defined anywhere.
    # Handle this however you want...
nameErrorHandler.originalExceptHookFunction = sys.excepthook
sys.excepthook = nameErrorHandler
Hopefully this is helpful for anyone in the future who wants to have a special error handler for undefined names... whether this is helpful for the OP or not is unknown since they never actually told us what their intended use-case was.
