Unpickle binary file to text [duplicate] - python

I need to do some maintenance on a system that basically looks like:
(Complicated legacy Python program) -> binary pickle file -> (Another complicated legacy Python program)
This requires figuring out exactly what is in the intermediate pickle file. I suspect the file format is much simpler than the code that generates and consumes it, and it would help if I could verify that by eyeballing the file itself instead of having to figure out exactly what all the code does.
Is there a way to take the binary pickle file and convert it, not to live objects in memory (which is what every page I could find with a Google search reckons 'unpickle' means) but to some readable text format? JSON, XML, whatever, I'm not fussy about the exact format, anything would do as long as it is a complete and readable representation of the contents that I can load up in a text editor and look at.

Python's native types are readable enough. The hard part in your way is that unpickling will automatically try to import the modules that define the classes of any pickled instances in your file.
Fortunately, Python is flexible enough that it is possible to temporarily hack the import machinery in order to fool the unpickler and hand it fake classes to fill in with the pickled attributes.
Then, it is a matter of converting the dictionaries of the instances that were unpickled in this way back to something human-readable.
Fortunately, I maintain a pet project that performs this temporary import-system hacking, so I could lift a couple of lines of code from there to do the same here.
In order to test this, I ended up creating a stand-alone script. As the comments in it spell out: don't try to incorporate this into a larger program - it will break the running Python program as it is, by creating fake modules - but it should be enough for you to visualize what is pickled in there. It would be impossible to match all the corner cases you might have, so you will have to work from here, mostly on the "pythonize" function below:
import pickle, pprint, sys
from collections.abc import Sequence, Mapping, Set
from contextlib import contextmanager

def pythonize(obj):
    # Recursively convert unpickled objects into plain Python containers.
    if isinstance(obj, (str, bytes)):
        return obj
    if isinstance(obj, (Sequence, Set)):
        container = []
        for element in obj:
            container.append(pythonize(element))
        return container
    elif isinstance(obj, Mapping):
        container = {}
    else:
        # Instances are rendered as dicts, tagged with their class name.
        container = {"$CLS": obj.__class__.__qualname__}
        if not hasattr(obj, "__dict__"):
            return repr(obj)
        obj = obj.__dict__
    for key, value in obj.items():
        container[key] = pythonize(value)
    return container

class FakeModule:
    # Creates a stand-in class for any attribute that is requested.
    def __getattr__(self, attr):
        cls = type(attr, (), {})
        setattr(self, attr, cls)
        return cls

def fake_importer(name, globals=None, locals=None, fromlist=(), level=0):
    module = sys.modules[name] = FakeModule()
    return module

@contextmanager
def fake_import_system():
    # With code lifted from https://github.com/jsbueno/extradict - MapGetter functionality
    builtins = __builtins__ if isinstance(__builtins__, dict) else __builtins__.__dict__
    original_import = builtins["__import__"]
    builtins["__import__"] = fake_importer
    try:
        yield None
    finally:
        builtins["__import__"] = original_import

def unpickle_to_text(stream: bytes):
    # WARNING: this example will wreak havoc in loaded modules!
    # Do not use it as part of a complex system!!
    with fake_import_system():
        result = pickle.loads(stream)
    pythonized = pythonize(result)
    return pprint.pformat(pythonized)

if __name__ == "__main__":
    print(unpickle_to_text(open(sys.argv[1], "rb").read()))
Update: as this might be of use to more people, I just made a gist out of this code. Maybe it is even worth putting on pip: https://gist.github.com/jsbueno/b72a20cba121926bec19163780390b92

If the application is old enough it might use pickle protocol 0, which is human-readable.
You could try the pickletools module; its command-line interface was added in Python 3.2.
Running python3 -m pickletools <file> will "disassemble" the pickle file for you.
Alternatively, you could try loading the data using data = pickle.load(f) and then immediately dumping it using print(json.dumps(data)). Note that this might fail, because pickle can represent more things than JSON can.
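For example, a minimal sketch combining both approaches (the file name is hypothetical):

import pickle, pickletools, json

# Disassemble the pickle stream to inspect its opcodes and embedded values.
with open("data.pickle", "rb") as f:  # hypothetical file name
    pickletools.dis(f.read())

# Or round-trip through JSON for a readable dump; default=repr keeps this
# from failing outright on objects JSON cannot represent.
with open("data.pickle", "rb") as f:
    data = pickle.load(f)
print(json.dumps(data, indent=2, default=repr))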

Related

Change default multiprocessing unpickler class

I have a multiprocessing program on Device A which uses a queue and a SyncManager to make this accessible over the network. The queue stores a custom class from a module on the device which gets automatically pickled by the multiprocessing package as module.class.
On another device reading the queue via a SyncManager, I have the same module as part of a package instead of top-level as it was on Device A. This means I get a ModuleNotFoundError when I attempt to read an item from the queue as the unpickler doesn't know the module is now package.module.
I've seen this work-around which uses a new class based on pickle.Unpickler and seems the least hacky and extensible: https://stackoverflow.com/a/53327348/5683049
However, I don't know how to specify the multiprocessing unpickler class to use.
I see this can be done for the reducer class so I assume there is a way to also set the unpickler?
I have never seen a way to do this. You may have to hack around it: let the multiprocessing system think you're passing byte strings or byte arrays, and have your user code perform the pickling and unpickling, as sketched below.
A hack? Yes. But not much worse than what you already have to do.
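A minimal sketch of that idea; the RenameUnpickler here follows the work-around linked in the question, and the module names are hypothetical:

import io
import pickle

class RenameUnpickler(pickle.Unpickler):
    # Redirect the old top-level module name to its new package location.
    def find_class(self, module, name):
        if module == "old_module_name":  # hypothetical names
            module = "new_package.module_name"
        return super().find_class(module, name)

def to_wire(obj):
    # Serialize before handing the payload over, so the multiprocessing
    # machinery only ever sees plain bytes.
    return pickle.dumps(obj)

def from_wire(payload):
    # Deserialize on the receiving side with the renaming unpickler.
    return RenameUnpickler(io.BytesIO(payload)).load()

# Usage: queue.put(to_wire(item)) on Device A;
# item = from_wire(queue.get()) on the reading device.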
Using a mixture of:
How to change the serialization method used by the multiprocessing module?
https://stackoverflow.com/a/53327348/5683049
I was able to get this working using code similar to the following:
import io
import pickle
import multiprocessing
from multiprocessing.reduction import ForkingPickler, AbstractReducer

class RenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        renamed_module = module
        if module == "old_module_name":
            renamed_module = "new_package.module_name"
        return super(RenameUnpickler, self).find_class(renamed_module, name)

class MyForkingPickler(ForkingPickler):
    # Method signature from pickle._loads
    @staticmethod
    def loads(s, /, *, fix_imports=True, encoding="ASCII", errors="strict",
              buffers=None):
        if isinstance(s, str):
            raise TypeError("Can't load pickle from unicode string")
        file = io.BytesIO(s)
        return RenameUnpickler(file, fix_imports=fix_imports, buffers=buffers,
                               encoding=encoding, errors=errors).load()

class MyPickleReducer(AbstractReducer):
    ForkingPickler = MyForkingPickler
    register = MyForkingPickler.register

multiprocessing.context._default_context.reducer = MyPickleReducer()
This could be useful if you want to further override how the unpickling is performed, but in my original case it is probably just easier to redirect the module using:
import sys
from new_package import module_name

sys.modules['old_module_name'] = module_name

Searching for import location

I'm trying to write a Python script/function to help determine where to import
something from. I want this functionality specifically for PyQt4 but it could
be extended and useful for the Python standard library.
To accomplish this I need to dynamically search modules, shared objects,
classes, and possibly more 'types' of things for a given search term.
I want to pass a module/class/etc. as a string, import it dynamically and then
search it.
The interface of the function would be something like this:
search(object_to_search, search_term)
Examples:
search('datetime', 'today') -> 'datetime.datetime.today'
search('PyQt4', 'NonModal') -> 'PyQt4.QtCore.Qt.NonModal'
search('PyQt4', 'QIcon') -> 'PyQt4.QtGui.QIcon'
I suppose this could also support wildcards like '*' to search all of
sys.path, however this is beyond the scope of what I'm concerned about at this
time.
This type of script would be especially useful for PyQt4. There are a lot of
'enum-like' things like NonModal above and finding the location to import
or reference them from can be cumbersome.
I tend to use the C++ Qt documentation quite a bit for PyQt4 information since
it's typically easy to translate the examples into Python. The C++ Qt
documentation is more comprehensive. It's also much easier to find sample code
for Qt proper. However, the import locations of some of these 'enum-like'
attributes are hard to find in PyQt4 without manually searching the docs or
guessing. The goal is to automate this process and just have the script tell
me the location of the thing I'm looking for.
I have tried a few different ways of implementing this including just searching
the object recursively with dir(), using the inspect module, and a hacky
version of import/discovery for shared objects. Below are my various
implementations, none of which really works quite right.
Is this type of thing even possible? If so, what is the best way to do this?
Also, I've tried to trim down the number of things I search by omitting
built-in things, functions, etc., but I'm not convinced this is necessary or
even correct.
Attempted solution:
def search_obj_inspect(obj, search_term):
    """Search given object for 'search_term' with 'inspect' module"""
    import inspect

    def searchable_type(obj):
        is_class = inspect.isclass(obj)
        root_type_or_obj = is_class and obj not in [object, type]
        abstract_class = is_class and inspect.isabstract(obj)
        method_or_func = inspect.ismethod(obj) or inspect.isfunction(obj)
        try:
            hidden_attribute = obj.__name__.startswith('__')
        except AttributeError:
            hidden_attribute = False
        searchable = True
        # Avoid infinite recursion when looking through 'object' and 'type' b/c
        # they are members of each other
        if True in [abstract_class, root_type_or_obj, hidden_attribute,
                    method_or_func, inspect.isbuiltin(obj)]:
            searchable = False
        return searchable

    # FIXME: Search obj itself for search term? obj.__name__ == search_term?
    for n, v in inspect.getmembers(obj,
                                   predicate=lambda x: searchable_type(x)):
        if search_term in n:
            try:
                return v.__name__
            except AttributeError:
                return str(v)
    return None
def search_obj_dir(obj, search_term):
    """Search given object and all its python submodules for attr"""
    import re
    import types
    SEARCHABLE_TYPES = (types.ModuleType, types.ClassType, types.InstanceType,
                        types.DictProxyType)
    for item in dir(obj):
        if item.startswith('__'):
            continue
        if re.search(search_term, item):
            return obj.__dict__[item].__module__
        else:
            try:
                submod = obj.__dict__[item]
            except (KeyError, AttributeError):
                # Nothing left to recursively search
                continue
            if not isinstance(submod, SEARCHABLE_TYPES):
                continue
            #mod = search_obj_dir(submod, search_term)
            #if mod:
            #    return '.'.join([mod, submod.__name__])
    return None
def search_so(module, attr):
    """Search given module's shared objects for attr"""
    import os
    import re
    try:
        path = module.__path__[0]
    except (AttributeError, IndexError):
        return False
    for root, dirs, files in os.walk(path):
        for filename in files:
            if filename.endswith('.so'):
                py_module_name = filename.split('.')[0]
                so_module = __import__(module.__name__, globals(), locals(),
                                       [py_module_name])
                if re.search(attr, py_module_name):
                    return ''.join([module.__name__, '.', py_module_name])
                try:
                    found = search_obj_dir(
                        so_module.__dict__[py_module_name], attr)
                except KeyError:
                    # This isn't a top-level so, might be a subpackage
                    # FIXME: Could recursively search SO as well, but imports
                    # might get weird
                    continue
                if found:
                    return found
Thanks to @JonClements for the hint to look into the pydoc module. I found a way to achieve pretty much what I wanted with a combination of the pkgutil and inspect modules.
The 'trick' is to go through all the modules in a package (like PyQt4) with pkgutil.iter_modules; then all non-package objects can be searched with inspect.getmembers. The general idea is the following:
for pkg in given_pkg:
    for module_or_class in pkg:
        for attribute in module_or_class:
You can see the full solution here.
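A condensed sketch of that approach (error handling is minimal and only one level of class attributes is searched; the full solution is more thorough):

import importlib
import inspect
import pkgutil

def search(package_name, search_term):
    """Yield dotted paths of attributes matching search_term in a package."""
    package = importlib.import_module(package_name)
    for _, mod_name, _ in pkgutil.iter_modules(package.__path__):
        try:
            module = importlib.import_module('%s.%s' % (package_name, mod_name))
        except ImportError:
            continue
        for name, value in inspect.getmembers(module):
            if search_term in name:
                yield '%s.%s.%s' % (package_name, mod_name, name)
            if inspect.isclass(value):
                # Also search one level of class attributes ('enum-like' values).
                for attr_name, _ in inspect.getmembers(value):
                    if search_term in attr_name:
                        yield '%s.%s.%s.%s' % (package_name, mod_name,
                                               name, attr_name)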

Unpickling a function into a different context in Python

I have written a Python interface to a process-centric job distribution system we're developing/using internally at my workplace. While reasonably skilled programmers, the primary people using this interface are research scientists, not software developers, so ease-of-use and keeping the interface out of the way to the greatest degree possible is paramount.
My library unrolls a sequence of inputs into a sequence of pickle files on a shared file server, then spawns jobs that load those inputs, perform the computation, pickle the results, and exit; the client script then picks back up and produces a generator that loads and yields the results (or rethrows any exception the calculation function raised).
This is only useful since the calculation function itself is one of the serialized inputs. cPickle is quite content to pickle function references, but requires the pickled function to be reimportable in the same context. This is problematic. I've already solved the problem of finding the module to reimport it, but the vast majority of the time, it is a top-level function that is pickled and, thus, does not have a module path. The only strategy I've found to be able to unpickle such a function on the computation nodes is this nauseating little approach towards simulating the original environment in which the function was pickled before unpickling it:
...
# At this point, we've identified the source of the target function.
# A string by its name lives in "modname".
# In the real code, there is significant try/except work here.
targetModule = __import__(modname)
globalRef = globals()
for thingie in dir(targetModule):
    if thingie not in globalRef:
        globalRef[thingie] = targetModule.__dict__[thingie]

# sys.argv[2]: the path to the pickle file common to all jobs, which contains
# any data in common to all invocations of the target function, then the
# target function itself
commonFile = open(sys.argv[2], "rb")
commonUnpickle = cPickle.Unpickler(commonFile)
commonData = commonUnpickle.load()

# the actual function unpack I'm having trouble with:
doIt = commonUnpickle.load()
The final line is the most important one here: it's where my module picks up the function it should actually be running. This code, as written, works as desired, but directly manipulating the symbol tables like this is unsettling.
How can I do this, or something very much like this, that does not force the research scientists to separate their calculation scripts into a proper class structure (they use Python like the most excellent graphing calculator ever and I would like to continue to let them do so) the way Pickle desperately wants, without the unpleasant, unsafe, and just plain scary __dict__-and-globals() manipulation I'm using above? I fervently believe there has to be a better way, but exec "from {0} import *".format(modname) didn't do it, several attempts to inject the pickle load into the targetModule reference didn't do it, and eval("commonUnpickle.load()", targetModule.__dict__, locals()) didn't do it. All of these fail with Unpickle's AttributeError over being unable to find the function in <module>.
What is a better way?
Pickling functions can be rather annoying if trying to move them into a different context. If the function does not reference anything from the module that it is in and references (if anything) modules that are guaranteed to be imported, you might check some code from a Rudimentary Database Engine found on the Python Cookbook.
In order to support views, the academic module grabs the code from the callable when pickling the query. When it comes time to unpickle the view, a LambdaType instance is created with the code object and a reference to a namespace containing all imported modules. The solution has limitations but worked well enough for the exercise.
Example for Views
class _View:

    def __init__(self, database, query, *name_changes):
        "Initializes _View instance with details of saved query."
        self.__database = database
        self.__query = query
        self.__name_changes = name_changes

    def __getstate__(self):
        "Returns everything needed to pickle _View instance."
        return self.__database, self.__query.__code__, self.__name_changes

    def __setstate__(self, state):
        "Sets the state of the _View instance when unpickled."
        database, query, name_changes = state
        self.__database = database
        self.__query = types.LambdaType(query, sys.modules)
        self.__name_changes = name_changes
Sometimes it appears necessary to make modifications to the registered modules available in the system. If, for example, you need to make reference to the first module (__main__), you may need to create a new module object with your available namespace loaded into it. The same recipe used the following technique.
Example for Modules
def test_northwind():
    "Loads and runs some test on the sample Northwind database."
    import os, imp, sys
    # Patch the module namespace to recognize this file.
    name = os.path.splitext(os.path.basename(sys.argv[0]))[0]
    module = imp.new_module(name)
    vars(module).update(globals())
    sys.modules[name] = module
Your question was long, and I was too caffeinated to make it through all of it… However, I think you are looking to do something that there's a pretty good existing solution for already. There's a fork of the parallel python (i.e. pp) library that takes functions and objects and serializes them, sends them to different servers, and then unpickles and executes them. The fork lives inside the pathos package, but you can download it independently here:
http://danse.cacr.caltech.edu/packages/dev_danse_us
The "other context" in that case is another server… and the objects are transported by converting the objects to source code and then back to objects.
If you are looking to use pickling, much in the way you are doing already, there's an extension to mpi4py that serializes arguments and functions, and returns pickled return values… The package is called pyina, and is commonly used to ship code and objects to cluster nodes in coordination with a cluster scheduler.
Both pathos and pyina provide map abstractions (and pipe), and try to hide all of the details of parallel computing behind the abstractions, so scientists don't need to learn anything except how to program normal serial python. They just use one of the map or pipe functions, and get parallel or distributed computing.
Oh, I almost forgot. The dill serializer includes dump_session and load_session functions that allow the user to easily serialize their entire interpreter session and send it to another computer (or just save it for later use). That's pretty handy for changing contexts, in a different sense.
Get dill, pathos, and pyina here: https://github.com/uqfoundation
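For instance, a minimal sketch of the session-based approach with dill (dump_session and load_session are real dill functions; the file name is arbitrary, and the two halves would normally run in different interpreters):

import dill

# On the sending machine: capture the whole interpreter session,
# including top-level functions defined in __main__.
def calculate(x):
    return x * x

dill.dump_session("session.pkl")

# On the receiving machine: restore the session and call the function.
dill.load_session("session.pkl")
print(calculate(4))  # -> 16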
For a module to be recognized as loaded, I think it must be in sys.modules, not just have its content imported into your global/local namespace. Try to exec everything, then get the result out of an artificial environment.
env = {"fn": sys.argv[2]}
code = """\
import %s  # maybe more
import cPickle
commonFile = open(fn, "rb")
commonUnpickle = cPickle.Unpickler(commonFile)
commonData = commonUnpickle.load()
doIt = commonUnpickle.load()
"""
exec code % modname in env
return env["doIt"]
While functions are advertised as first-class objects in Python, this is one case where it can be seen that they are really second-class objects. It is the reference to the callable, not the object itself, that is pickled. (You cannot directly pickle a lambda expression.)
There is an alternate usage of __import__ that you might prefer:
import sys
from functools import partial

def importer(modulename, symbols=None):
    u"importer('foo') returns module foo; importer('foo', ['bar']) returns {'bar': object}"
    if modulename in sys.modules:
        module = sys.modules[modulename]
    else:
        module = __import__(modulename, fromlist=['*'])
    if symbols is None:
        return module
    else:
        return dict(zip(symbols, map(partial(getattr, module), symbols)))
So these would all be basically equivalent:
from mymodule.mysubmodule import myfunction
myfunction = importer('mymodule.mysubmodule').myfunction
globals()['myfunction'] = importer('mymodule.mysubmodule', ['myfunction'])['myfunction']

How is a Python project set up?

I am doing some heavy commandline stuff (not really web based) and am new to Python, so I was wondering how to set up my files/folders/etc. Are there "header" files where I can keep all the DB connection stuff?
How/where do I define classes and objects?
Just to give you an example of a typical Python module's source, here's something with some explanation. This is a file named "Dims.py". This is not the whole file, just some parts to give an idea what's going on.
#!/usr/bin/env python
This is the standard first line telling the shell how to execute this file. Saying /usr/bin/env python instead of /usr/bin/python tells the shell to find Python via the user's PATH; the desired Python may well be in ~/bin or /usr/local/bin.
"""Library for dealing with lengths and locations."""
If the first thing in the file is a string, it is the docstring for the module. A docstring is a string that appears immediately after the start of an item, which can be accessed via its __doc__ property. In this case, since it is the module's docstring, if a user imports this file with import Dims, then Dims.__doc__ will return this string.
# Units
MM_BASIC = 1500000
MILS_BASIC = 38100
IN_BASIC = MILS_BASIC * 1000
There are a lot of good guidelines for formatting and naming conventions in a document known as PEP (Python Enhancement Proposal) 8. These are module-level variables (constants, really) so they are written in all caps with underscores. No, I don't follow all the rules; old habits die hard. Since you're starting fresh, follow PEP 8 unless you can't.
_SCALING = 1
_SCALES = {
    "mm_basic": MM_BASIC,
    "mm": MM_BASIC,
    "mils_basic": MILS_BASIC,
    "mil": MILS_BASIC,
    "mils": MILS_BASIC,
    "basic": 1,
    1: 1
}
These module-level variables have leading underscores in their names. This gives them a limited amount of "privacy", in that from Dims import * will not pull them in. However, if you need to mess with one, you can explicitly say something like from Dims import _SCALING as scaling.
def UnitsToScale(units=None):
    """Scales the given units to the current scaling."""
    if units is None:
        return _SCALING
    elif units not in _SCALES:
        raise ValueError("unrecognized units: '%s'." % units)
    return _SCALES[units]
UnitsToScale is a module-level function. Note the docstring and the use of default values and exceptions. No spaces around the = in default value declarations.
class Length(object):
    """A length. Makes unit conversions easier.

    The basic, mm, and mils properties can be used to get or set the length
    in the desired units.
    >>> x = Length(mils=1000)
    >>> x.mils
    1000.0
    >>> x.mm
    25.399999999999999
    >>> x.basic
    38100000L
    >>> x.mils = 100
    >>> x.mm
    2.54
    """
The class declaration. Note that the docstring has things in it that look like Python command-line sessions. These are called doctests, in that they are test code embedded in the docstring. More on this later.
    def __init__(self, unscaled=0, basic=None, mm=None, mils=None, units=None):
        """Constructs a Length.

        Default constructor creates a length of 0.
        >>> Length()
        Length(basic=0)

        Length(<float>) or Length(<string>) creates a length with the given
        value at the current scale factor.
        >>> Length(1500)
        Length(basic=1500)
        >>> Length("1500")
        Length(basic=1500)
        """
        # Straight copy
        if isinstance(unscaled, Length):
            self._x = unscaled._x
            return
        # rest omitted
This is the initializer. Unlike C++, you only get one, but you can use default arguments to make it look like several different constructors are available.
    def _GetBasic(self): return self._x
    def _SetBasic(self, x): self._x = x
    basic = property(_GetBasic, _SetBasic, doc="""
        This returns the length in basic units.""")
This is a property. It allows you to have getter/setter functions while using the same syntax as you would for accessing any other data member, in this case, myLength.basic = 10 does the same thing as myLength._SetBasic(10). Because you can do this, you should not write getter/setter functions for your data members by default. Just operate directly on the data members. If you need to have getter/setter functions later, you can convert the data member to a property and your module's users won't need to change their code. Note that the docstring is on the property, not the getter/setter functions.
If you have a property that is read-only, you can use property as a decorator to declare it. For example, if the above property was to be read-only, I would write:
    @property
    def basic(self):
        """This returns the length in basic units."""
        return self._x
Note that the name of the property is the name of the getter method. You can also use decorators to declare setter methods in Python 2.6 or later.
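For example, a short sketch of the decorator-based setter, equivalent to the property() call shown earlier:

    @property
    def basic(self):
        """This returns the length in basic units."""
        return self._x

    @basic.setter
    def basic(self, x):
        self._x = x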
    def __mul__(self, other):
        """Multiplies a Length by a scalar.
        >>> Length(10)*10
        Length(basic=100)
        >>> 10*Length(10)
        Length(basic=100)
        """
        if type(other) not in _NumericTypes:
            return NotImplemented
        return Length(basic=self._x * other)
This overrides the * operator. Note that you can return the special value NotImplemented to tell Python that this operation isn't implemented (in this case, if you try to multiply by a non-numeric type like a string).
    __rmul__ = __mul__
Since code is just a value like anything else, you can assign the code of one method to another. This line tells Python that the something * Length operation uses the same code as Length * something. Don't Repeat Yourself.
Now that the class is declared, I can get back to module code. In this case, I have some code that I want to run only if this file is executed by itself, not if it's imported as a module. So I use the following test:
if __name__ == "__main__":
Then the code in the if is executed only if this is being run directly. In this file, I have the code:
    import doctest
    doctest.testmod()
This goes through all the docstrings in the module and looks for lines that look like Python prompts with commands after them. The lines following are assumed to be the output of the command. If the commands output something else, the test is considered to have failed and the actual output is printed. Read the doctest module documentation for all the details.
One final note about doctests: They're useful, but they're not the most versatile or thorough tests available. For those, you'll want to read up on unittests (the unittest module).
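For comparison, a minimal unittest sketch for the Length class above (the expected value mirrors the doctest examples):

import unittest

class LengthTest(unittest.TestCase):
    def test_mils_to_mm(self):
        x = Length(mils=1000)
        self.assertAlmostEqual(x.mm, 25.4)

if __name__ == "__main__":
    unittest.main()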
Each python source file is a module. There are no "header" files. The basic idea is that when you import "foo" it'll load the code from "foo.py" (or a previously compiled version of it). You can then access the stuff from the foo module by saying foo.whatever.
There seem to be two ways for arranging things in Python code. Some projects use a flat layout, where all of the modules are at the top-level. Others use a hierarchy. You can import foo/bar/baz.py by importing "foo.bar.baz". The big gotcha with hierarchical layout is to have __init__.py in the appropriate directories (it can even be empty, but it should exist).
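For example, the hierarchical layout behind that foo.bar.baz import looks like this (baz.py holds whatever you access as foo.bar.baz):

foo/
    __init__.py
    bar/
        __init__.py
        baz.py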
Classes are defined like this:
class MyClass(object):
    def __init__(self, x):
        self.x = x

    def printX(self):
        print self.x
To create an instance:
z = MyClass(5)
You can organize it in whatever way makes the most sense for your application. I don't exactly know what you're doing so I can't be certain what the best organization would be for you, but you can pretty much split it up as you see fit and just import what you need.
You can define classes in any file, and you can define as many classes as you would like in a script (unlike Java). There are no official header files (not like C or C++), but you can use config files to store info about connecting to a DB, whatever, and use configparser (a standard library module) to organize them.
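For instance, a minimal sketch of keeping DB connection settings in a config file (file, section, and key names are all hypothetical; the module name shown is Python 3's):

import configparser

# settings.ini might contain:
# [database]
# host = localhost
# user = admin

config = configparser.ConfigParser()
config.read("settings.ini")
db_host = config["database"]["host"]
db_user = config["database"]["user"]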
It makes sense to keep like things in the same file, so if you have a GUI, you might have one file for the interface, and if you have a CLI, you might keep that in a file by itself. It's less important how your files are organized and more important how the source is organized into classes and functions.
This would be the place to look for that: http://docs.python.org/reference/.
First of all, compile and install pip: http://pypi.python.org/pypi/pip. It is like Ubuntu's apt-get. You run it via a Terminal by typing in pip install package-name. It has a database of packages, so you can install/uninstall stuff quite easily with it.
As for importing and "header" files, from what I can tell, if you run import foo, Python looks for foo.py in the current folder; if it's not there, it searches the rest of sys.path (including installed eggs, folders unzipped in the Python module directory) and imports what it finds.
As for defining classes and objects, here's a basic example:
class foo(foobar2): # I am extending a class, in this case 'foobar2'. I take no arguments.
__init__(self, the, list, of, args = True): # Instead, the arguments get passed to me. This still lets you define a 'foo()' objects with three arguments, only you let '__init__' take them.
self.var = 'foo'
def bar(self, args):
self.var = 'bar'
def foobar(self): # Even if you don't need arguments, never leave out the self argument. It's required for classes.
print self.var
foobar = foo('the', 'class', 'args') # This is how you initialize me!
Read more on this in the Python Reference, but my only tip is to never forget the self argument in class functions. It will save you a lot of debugging headaches...
Good luck!
There's no fixed structure for Python programs, but you can take a Django project as an example. A Django project consists of one settings.py module, where global settings (like your example with DB connection properties) are stored, and pluggable applications. Each application has its own models.py module, which stores database models and, possibly, other domain-specific objects. All the rest is up to you.
Note that this advice is not specific to Python. In C/C++ you probably used a similar structure and kept settings in XML. Just forget about headers and put settings in a plain .py file, that's all.

Python: Using `copyreg` to define reducers for types that already have reducers

(Keep in mind I'm working in Python 3, so a solution needs to work in Python 3.)
I would like to use the copyreg module to teach Python how to pickle functions. When I tried to do it, the _Pickler object would still try to pickle functions using the save_global function. (Which doesn't work for unbound methods, and that's the motivation for doing this.)
It seems like _Pickler first tries to look in its own dispatch for the type of the object that you want to pickle before looking in copyreg.dispatch_table. I'm not sure if this is intentional.
Is there any way for me to tell Python to pickle functions with the reducer that I provide?
The following hack seems to work in Python 3.1...:
import copyreg

def functionpickler(f):
    print('pickling', f.__name__)
    return f.__name__

ft = type(functionpickler)
copyreg.pickle(ft, functionpickler)

import pickle
pickle.Pickler = pickle._Pickler
del pickle.Pickler.dispatch[ft]

s = pickle.dumps(functionpickler)
print('Result is', s)
Out of this, the two hackish lines are:
pickle.Pickler = pickle._Pickler
del pickle.Pickler.dispatch[ft]
You need to remove the dispatch entry for functions' type because otherwise it preempts the copyreg registration; and I don't think you can do that on the C-coded Pickler so you need to set it to the Python-coded one.
It would be a bit less of a hack to subclass _Pickler with a class of your own which makes its own dispatch (copying the parent's and removing the entry for the function type), and then use your subclass explicitly (calling its dump method) rather than pickle.dump; however, it would also be a bit less convenient than this monkeypatching of pickle itself.
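A minimal sketch of that subclass approach, reusing the functionpickler reducer registered above (class and variable names are illustrative):

import copyreg
import io
import pickle

def functionpickler(f):
    return f.__name__

ft = type(functionpickler)
copyreg.pickle(ft, functionpickler)

class FunctionAwarePickler(pickle._Pickler):
    # Copy the parent's dispatch and drop the entry for the function type,
    # so the copyreg-registered reducer is consulted instead.
    dispatch = dict(pickle._Pickler.dispatch)
    dispatch.pop(ft, None)

buf = io.BytesIO()
FunctionAwarePickler(buf).dump(functionpickler)
print(buf.getvalue())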
