What's the exact usage of __reduce__ in Pickler - python

I know that in order to be picklable, a class has to override the __reduce__ method, and that it has to return a string or a tuple.
How does this function work?
What is the exact usage of __reduce__? When will it be used?

When you try to pickle an object, there might be some attributes that don't serialize well. One example of this is an open file handle: pickle won't know how to handle it and will raise an error.
You can tell the pickle module how to handle such objects natively, directly within the class. Let's see an example of an object with a single attribute, an open file handle:
import pickle

class Test(object):
    def __init__(self, file_path="test1234567890.txt"):
        # An open file in write mode
        self.some_file_i_have_opened = open(file_path, 'wb')

my_test = Test()

# Now, watch what happens when we try to pickle this object:
pickle.dumps(my_test)
It should fail and give a traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  --- snip snip a lot of lines ---
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
However, had we defined a __reduce__ method in our Test class, pickle would have known how to serialize this object:
import pickle

class Test(object):
    def __init__(self, file_path="test1234567890.txt"):
        # Used later in __reduce__
        self._file_name_we_opened = file_path
        # An open file in write mode
        self.some_file_i_have_opened = open(self._file_name_we_opened, 'wb')

    def __reduce__(self):
        # we return a tuple of class_name to call,
        # and optional parameters to pass when re-creating
        return (self.__class__, (self._file_name_we_opened, ))

my_test = Test()
saved_object = pickle.dumps(my_test)

# Just print the representation of the string of the object,
# because it contains newlines.
print(repr(saved_object))
This should give you something like: "c__main__\nTest\np0\n(S'test1234567890.txt'\np1\ntp2\nRp3\n.", which can be used to recreate the object with open file handles:
print(vars(pickle.loads(saved_object)))
In general, the __reduce__ method needs to return a tuple with at least two elements:
A callable that will be invoked to re-create the object. In this case, self.__class__.
A tuple of arguments to pass to that callable. In the example it's a single string: the path of the file to open.
Consult the docs for a detailed explanation of what else the __reduce__ method can return.
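For completeness, here is a minimal sketch (mine, not from the answer above) of the optional third element: a state dict that pickle merges into the new object's __dict__ after the callable runs. The notes attribute is made up for illustration:

import pickle

class Test2(object):
    def __init__(self, file_path="test1234567890.txt"):
        self._file_name_we_opened = file_path
        self.some_file_i_have_opened = open(self._file_name_we_opened, 'wb')
        self.notes = ""  # hypothetical extra state, restored via the third element

    def __reduce__(self):
        # (callable, args, state): pickle calls the callable with args,
        # then restores state with obj.__dict__.update(state) by default.
        return (self.__class__, (self._file_name_we_opened,), {'notes': self.notes})

restored = pickle.loads(pickle.dumps(Test2()))
print(restored.notes)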

Related

bonobo method fails after being overridden

I am using a light ETL library called bonobo.
The csv writer bonobo.CsvWriter class has a factory method:
def writer_factory(self, file):
    return csv.writer(file, **self.get_dialect_kwargs()).writerow
with the docs:
class CsvWriter(FileWriter, CsvHandler):
    @Method(
        __doc__='''
        Builds the CSV writer, a.k.a an object we can pass a field collection to be written as one line in the
        target file.
        Defaults to builtin csv.writer(...).writerow, but can be overriden to fit your special needs.
        '''
    )
I'd like to add some extra parameters to customize my csv file, so I try to override it as such:
class quoCsvWriter(bonobo.CsvWriter):
    def writer_factory(self, file):
        return csv.writer(file, **self.get_dialect_kwargs(), quoting=csv.QUOTE_NONNUMERIC).writerow
When I add the node to the chain, the program shows:
Traceback (most recent call last):
  File "geocoding.py", line 162, in <module>
    get_graph(),
  File "geocoding.py", line 135, in get_graph
    quoCsvWriter('db_addresses.csv')
  File "/Users/xxxx/xxxx/lib/python3.6/site-packages/bonobo/config/configurables.py", line 152, in __new__
    missing.remove(name)
KeyError: 'writer_factory'
Any hints are appreciated.
Update:
Meanwhile, when I try to do
bonobo.CsvWriter('filename.csv', quoting=csv.QUOTE_MINIMAL)
it throws an error:
TypeError: "quoting" must be an integer
As of bonobo 0.6, directly overriding Method instances in subclasses is non-trivial. Instead, you should provide an overridden implementation via the constructor arguments:
def writer_factory(self, file):
    return csv.writer(file, **{**self.get_dialect_kwargs(), 'quoting': csv.QUOTE_NONNUMERIC}).writerow

def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(
        extract,
        bonobo.CsvWriter('...', writer_factory=writer_factory),
    )
    return graph
If you really want to subclass for this use case, you can do it by overriding the get_dialect_kwargs() method instead:
@use_context
class QuoteNonNumericCsvWriter(bonobo.CsvWriter):
    def get_dialect_kwargs(self):
        return {
            **super().get_dialect_kwargs(),
            'quoting': csv.QUOTE_NONNUMERIC,
        }
This should work as expected.
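For instance, the subclass plugs into a graph exactly like the stock writer (a sketch; extract stands in for whatever upstream node you already have):

def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(
        extract,
        QuoteNonNumericCsvWriter('db_addresses.csv'),
    )
    return graph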
Of course, as of bonobo 0.6.2 you can override quoting directly from the writer constructor; there was a bug around this before, but the fix is now released.
def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(
        extract,
        bonobo.CsvWriter('...', quoting=csv.QUOTE_NONNUMERIC),
    )
    return graph
All three approaches have exactly the same behaviour; you should favour the last one.
Hope that helps.

Pickling dynamically created types

I've been trying to get some dynamically created types (i.e. ones created by calling 3-arg type()) to pickle and unpickle nicely. I've been using this module switching trick to hide the details from users of the module and give clean semantics.
I've learned several things already:
The type must be findable with getattr on the module itself
The type must be consistent with what getattr finds, that is to say if we call pickle.dumps(o) then it must be true that type(o) == getattr(module, 'name of type')
Where I'm stuck though is that there still seems to be something odd going on - it seems to be calling __getstate__ on something unexpected.
Here's the simplest setup I've got that reproduces the issue, testing with Python 3.5, but I'd like to target back to 3.3 if possible:
# module.py
import sys
import functools

def dump(self):
    return b'Some data'  # Dummy for testing

def undump(self, data):
    print('Undump: %r' % data)  # Do nothing for testing

# Cheaty demo way to make this consistent
@functools.lru_cache(maxsize=None)
def make_type(name):
    return type(name, (), {
        '__getstate__': dump,
        '__setstate__': undump,
    })

class Magic(object):
    def __init__(self, path):
        self.path = path

    def __getattr__(self, name):
        print('Getting thing: %s (from: %s)' % (name, self.path))
        # for simple testing all calls to make_type must end in last: x.y.z.last
        if name != 'last':
            if self.path:
                return Magic(self.path + '.' + name)
            else:
                return Magic(name)
        return make_type(self.path + '.' + name)

# Make the switch
sys.modules[__name__] = Magic('')
And then a quick way to exercise that:
import module
import pickle

f = module.foo.bar.woof.last()
print(f.__getstate__())  # See, *this* works

print('Pickle starts here')
print(pickle.dumps(f))
Which then gives:
Getting thing: foo (from: )
Getting thing: bar (from: foo)
Getting thing: woof (from: foo.bar)
Getting thing: last (from: foo.bar.woof)
b'Some data'
Pickle starts here
Getting thing: __spec__ (from: )
Getting thing: _initializing (from: __spec__)
Getting thing: foo (from: )
Getting thing: bar (from: foo)
Getting thing: woof (from: foo.bar)
Getting thing: last (from: foo.bar.woof)
Getting thing: __getstate__ (from: foo.bar.woof)
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(pickle.dumps(f))
TypeError: 'Magic' object is not callable
I wasn't expecting to see anything looking up __getstate__ on module.foo.bar.woof, but even if we force that lookup to fail by adding:
if name == '__getstate__': raise AttributeError()
into our __getattr__ it still fails with:
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(pickle.dumps(f))
_pickle.PicklingError: Can't pickle <class 'module.Magic'>: it's not the same object as module.Magic
What gives? Am I missing something with __spec__? The docs for __spec__ pretty much just stress setting it appropriately, but don't seem to actually explain much.
More importantly, the bigger question is: how am I supposed to make types that I programmatically generate via a pseudo-module's __getattr__ implementation pickle properly?
(And obviously, once I've managed to get pickle.dumps to produce something, I expect pickle.loads to call undump with the same thing.)
To pickle f, pickle needs to pickle f's class, module.foo.bar.woof.last.
The docs don't claim support for pickling arbitrary classes. They claim the following:
The following types can be pickled:
...
classes that are defined at the top level of a module
module.foo.bar.woof.last isn't defined at the top level of a module, even a pretend module like module. In this not-officially-supported case, the pickle logic ends up trying to pickle module.foo.bar.woof, either here:
elif parent is not module:
    self.save_reduce(getattr, (parent, lastname))
or here
else if (parent != module) {
    PickleState *st = _Pickle_GetGlobalState();
    PyObject *reduce_value = Py_BuildValue("(O(OO))",
                                           st->getattr, parent, lastname);
    status = save_reduce(self, reduce_value, NULL);
module.foo.bar.woof can't be pickled for multiple reasons. It returns a non-callable Magic instance for all unsupported method lookups, like __getstate__, which is where your first error comes from. The module-switching thing prevents finding the Magic class to pickle it, which is where your second error comes from. There are probably more incompatibilities.
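One workaround (my sketch, not taken from either answer) is to stop relying on pickle finding the class by its qualified name at all: give each generated type a __reduce__ that routes through a factory function defined at the top level of a real, importable module. The module name factory.py and the helpers _registry and _rebuild are hypothetical:

# factory.py (a real module, not the switched-out pretend one)
_registry = {}

def _rebuild(name, state):
    # Re-create (or fetch) the dynamic type, then restore its state.
    cls = make_type(name)
    obj = cls.__new__(cls)
    obj.__setstate__(state)
    return obj

def make_type(name):
    if name not in _registry:
        def __reduce__(self):
            # (callable, args): pickle only has to locate _rebuild,
            # which *is* defined at the top level of factory.py.
            return (_rebuild, (name, self.__getstate__()))
        _registry[name] = type(name, (), {
            '__getstate__': lambda self: b'Some data',  # dummy, as in the question
            '__setstate__': lambda self, data: None,    # dummy, as in the question
            '__reduce__': __reduce__,
        })
    return _registry[name]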
As it seems, and as was already shown above, making the class callable just drifts off in another wrong direction. Thanks to a hack, I found a workaround that makes the class retrievable by its type. Following the context of the error <class 'module.Magic'>: it's not the same object as module.Magic, the pickler does not go through the same call twice and ends up resolving the name to a different object than the class it is pickling; this is a common problem when pickling self-instantiating classes. The fix, for this instance, is to patch the class with its type: @mock.patch('module.Magic', type(module.Magic)). This is a short answer for something.
Main.py
import module
import pickle
import mock

f = module.foo.bar.woof.last
print(f().__getstate__())  # See, *this* works
print('Pickle starts here')

@mock.patch('module.Magic', type(module.Magic))
def pickleit():
    return pickle.dumps(f())

print(pickleit())
Magic class
class Magic(object):
    def __init__(self, value):
        self.path = value

    __class__: lambda x: x

    def __getstate__(self):
        print("Shoot me! i'm at " + self.path)
        return dump(self)

    def __setstate__(self, value):
        print('something will never occur')
        return undump(self, value)

    def __spec__(self):
        print("Wrong side of the planet ")

    def _initializing(self):
        print("Even farther lost ")

    def __getattr__(self, name):
        print('Getting thing: %s (from: %s)' % (name, self.path))
        # for simple testing all calls to make_type must end in last: x.y.z.last
        if name != 'last':
            if self.path:
                return Magic(self.path + '.' + name)
            else:
                return Magic(name)
        print('terminal stage')
        return make_type(self.path + '.' + name)
Even if this is more a glancing hit off the edge of the bat than a clean strike, I could see the content dumped to my console.

Cannot unpickle Exception subclass

Simplified version of "Why is my custom exception unpickle failing".
I am trying to pickle a 'simple' exception subclass. It pickles OK, but when unpickling it falls over:
import pickle

class ABError(Exception):
    def __init__(self, a, b):
        self.a = a
        self.b = b

ab_err = ABError("aaaa", "bbbb")
pickled = pickle.dumps(ab_err)
original = pickle.loads(pickled)  # Fails
Error:
Traceback (most recent call last):
  File "p.py", line 12, in <module>
    original = pickle.loads(pickled)  # Fails
  File "/usr/lib/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
TypeError: __init__() takes exactly 3 arguments (1 given)
An earlier comment suggested the issue is because the built-in Exception class supplies a __setstate__() method. However, it's not clear to me whether this is expected behaviour or not; it certainly seems surprising, since doing the same thing with a subclass of object works OK.
The BaseException class defines a custom __reduce__ method in exceptions.c, which returns the arguments to pass to __init__. The exact code is:
if (self->args && self->dict)
    return PyTuple_Pack(3, Py_TYPE(self), self->args, self->dict);
else
    return PyTuple_Pack(2, Py_TYPE(self), self->args);
According to the __reduce__ documentation:
the first item of the tuple is the callable to invoke. Here, that will be the exception class.
the second item is the tuple of arguments to pass to the callable. Here, that will be self.args.
the third item is a dict to merge into self.__dict__.
So from this, BaseException.__reduce__ will make unpickle invoke the exception's constructor with given args.
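In pure-Python terms, the C code above behaves roughly like this sketch (not the real implementation):

def __reduce__(self):
    if self.args and self.__dict__:
        return (type(self), self.args, self.__dict__)
    return (type(self), self.args)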
You have two options: either override __reduce__, or put the required arguments in self.args, either directly or by letting the parent class do it:
import pickle

class ABError(Exception):
    def __init__(self, a, b):
        self.a = a
        self.b = b
        # self.args = (a, b)
        # maybe better, let the base class's __init__ do it =>
        super(ABError, self).__init__(a, b)

ab_err = ABError("aaaa", "bbbb")
pickled = pickle.dumps(ab_err)
original = pickle.loads(pickled)  # no longer fails
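The first option, overriding __reduce__ yourself, would look roughly like this sketch: keep __init__ as it was and tell pickle explicitly how to rebuild the exception from its attributes.

import pickle

class ABError(Exception):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __reduce__(self):
        # (callable, args): unpickling calls ABError(self.a, self.b)
        return (self.__class__, (self.a, self.b))

ab_err = ABError("aaaa", "bbbb")
original = pickle.loads(pickle.dumps(ab_err))  # works as well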
Note that the original issue comes from the rather naive way BaseException's pickle handling works. It is fixed in recent Python 3 releases; your question's original code works fine on Python 3.5, for instance.

multiprocessing and modules

I am attempting to use multiprocessing to call a derived-class member function defined in a different module. There seem to be several questions dealing with calling class methods from the same module, but none from different modules. For example, if I have the following structure:
main.py
multi/
    __init__.py (empty)
    base.py
    derived.py
main.py
from multi.derived import derived
from multi.base import base

if __name__ == '__main__':
    base().multiFunction()
    derived().multiFunction()
base.py
import multiprocessing

# The following two functions wrap calling a class method
def wrapPoolMapArgs(classInstance, functionName, argumentLists):
    className = classInstance.__class__.__name__
    return zip([className] * len(argumentLists), [functionName] * len(argumentLists), [classInstance] * len(argumentLists), argumentLists)

def executeWrappedPoolMap(args, **kwargs):
    classType = eval(args[0])
    funcType = getattr(classType, args[1])
    funcType(args[2], args[3:], **kwargs)

class base:
    def multiFunction(self):
        mppool = multiprocessing.Pool()
        mppool.map(executeWrappedPoolMap, wrapPoolMapArgs(self, 'method', range(3)))

    def method(self, args):
        print "base.method: " + args.__str__()
derived.py
from base import base

class derived(base):
    def method(self, args):
        print "derived.method: " + args.__str__()
Output
base.method: (0,)
base.method: (1,)
base.method: (2,)
Traceback (most recent call last):
  File "e:\temp\main.py", line 6, in <module>
    derived().multiFunction()
  File "e:\temp\multi\base.py", line 15, in multiFunction
    mppool.map(executeWrappedPoolMap, wrapPoolMapArgs(self, 'method', range(3)))
  File "C:\Program Files\Python27\lib\multiprocessing\pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Program Files\Python27\lib\multiprocessing\pool.py", line 567, in get
    raise self._value
NameError: name 'derived' is not defined
I have tried fully qualifying the class name in the wrapPoolMapArgs method, but that just gives the same error, saying multi is not defined.
Is there some way to achieve this, or must I restructure to have all classes in the same package if I want to use multiprocessing with inheritance?
This is almost certainly caused by the ridiculous eval based approach to dynamically invoking specific code.
In executeWrappedPoolMap (in base.py), you convert a str name of a class to the class itself with classType = eval(args[0]). But eval is executed in the scope of executeWrappedPoolMap, which is in base.py, and can't find derived (because the name doesn't exist in base.py).
Stop passing the name and pass the class object itself: classInstance.__class__ instead of classInstance.__class__.__name__. multiprocessing will pickle it for you, and you can use it directly on the other end instead of using eval (which is nearly always wrong; it's a code smell of the strongest sort). A sketch of this fix follows.
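A minimal sketch of that fix, reusing the question's names (the rewrite itself is mine, not quoted from the original code):

# in base.py
def wrapPoolMapArgs(classInstance, functionName, argumentLists):
    classType = classInstance.__class__  # the class object itself, picklable by reference
    return zip([classType] * len(argumentLists), [functionName] * len(argumentLists), [classInstance] * len(argumentLists), argumentLists)

def executeWrappedPoolMap(args, **kwargs):
    classType = args[0]  # no eval: the class arrives unpickled, ready to use
    funcType = getattr(classType, args[1])
    funcType(args[2], args[3:], **kwargs)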
BTW, the reason the traceback isn't super helpful is that the exception is raised in the worker, caught, pickled, and sent back to the main process to be re-raised. The traceback you see is from that re-raise, not from where the NameError actually occurred (which was in the eval line).

How to wrap built-in methods in Python? (or 'how to pass them by reference')

I want to wrap the built-in open function with a wrapper that also catches exceptions. Here's a test example that works:
truemethod = open

def fn(*args, **kwargs):
    try:
        return truemethod(*args, **kwargs)
    except (IOError, OSError):
        sys.exit('Can\'t open \'{0}\'. Error #{1[0]}: {1[1]}'.format(args[0], sys.exc_info()[1].args))

open = fn
I want to make a generic version of it:
def wrap(method, exceptions=(OSError, IOError)):
    truemethod = method
    def fn(*args, **kwargs):
        try:
            return truemethod(*args, **kwargs)
        except exceptions:
            sys.exit('Can\'t open \'{0}\'. Error #{1[0]}: {1[1]}'.format(args[0], sys.exc_info()[1].args))
    method = fn
But it doesn't work:
>>> wrap(open)
>>> open
<built-in function open>
Apparently, method is a copy of the parameter, not a reference as I expected. Any pythonic workaround?
The problem with your code is that inside wrap, the method = fn statement simply rebinds the local name method; it doesn't change the global open. You'll have to assign to those names yourself:
def wrap(method, exceptions=(OSError, IOError)):
    def fn(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except exceptions:
            sys.exit('Can\'t open \'{0}\'. Error #{1[0]}: {1[1]}'.format(args[0], sys.exc_info()[1].args))
    return fn

open = wrap(open)
foo = wrap(foo)
Try adding global open. In the general case, you might want to look at this section of the manual:
This module provides direct access to all ‘built-in’ identifiers of Python; for example, __builtin__.open is the full name for the built-in function open(). See chapter Built-in Objects.
This module is not normally accessed explicitly by most applications, but can be useful in modules that provide objects with the same name as a built-in value, but in which the built-in of that name is also needed. For example, in a module that wants to implement an open() function that wraps the built-in open(), this module can be used directly:
import __builtin__

def open(path):
    f = __builtin__.open(path, 'r')
    return UpperCaser(f)

class UpperCaser:
    '''Wrapper around a file that converts output to upper-case.'''
    def __init__(self, f):
        self._f = f

    def read(self, count=-1):
        return self._f.read(count).upper()

# ...
CPython implementation detail: Most modules have the name __builtins__ (note the 's') made available as part of their globals. The value of __builtins__ is normally either this module or the value of this module's __dict__ attribute. Since this is an implementation detail, it may not be used by alternate implementations of Python.
You can just add return fn at the end of your wrap function and then do:
>>> open = wrap(open)
>>> open('bhla')
Traceback (most recent call last):
  File "<pyshell#24>", line 1, in <module>
    open('bhla')
  File "<pyshell#18>", line 7, in fn
    sys.exit('Can\'t open \'{0}\'. Error #{1[0]}: {1[1]}'.format(args[0], sys.exc_info()[1].args))
SystemExit: Can't open 'bhla'. Error #2: No such file or directory
