Recently, I have been asked to make "our C++ lib work in the cloud".
Basically, the lib is compute-intensive (it calculates prices), so it would make sense.
I have built a SWIG interface to produce a Python version, with the aim of using MapReduce with MRJob.
I wanted to serialize the objects to a file and then, in a mapper, deserialize them and calculate the price.
For example:
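    import dill
    from mrjob.job import MRJob

    class MRTest(MRJob):
        def mapper(self, key, value):
            # value holds the serialized object; rebuild it, then price it
            obj = dill.loads(value)
            yield (key, obj.price())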
But now I have reached a dead end, since it seems that dill cannot handle SWIG extension types:
PicklingError: Can't pickle <class 'SwigPyObject'>: it's not found as builtins.SwigPyObject
Is there a way to make this work properly?
I'm the dill author. That's correct, dill can't pickle C++ objects. When you see "it's not found as builtins.some_object…", that almost invariably means that you are trying to pickle some object that is not written in python, but uses python to bind to C/C++ (i.e. an extension type). You have no hope of directly pickling such objects with a python serializer.
However, since you are interested in pickling a subclass of an extension type, you can actually do it. All you will need to do is to give your object the appropriate state you want to save as an instance attribute or attributes, and provide a __reduce__ method to tell dill (or pickle) how to save the state of your object. This method is how python deals with serializing extension types. See:
https://docs.python.org/2/library/pickle.html#pickling-and-unpickling-extension-types
There are probably better examples, but here's at least one example:
https://stackoverflow.com/a/19874769/4646678
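As a minimal sketch of the __reduce__ approach described above: the Pricer class, its spot and strike attributes, and the threading.Lock used as a stand-in for the unpicklable C++ handle are all hypothetical. A real SWIG subclass would keep whatever constructor arguments it needs to rebuild the underlying C++ object.

```python
import pickle
import threading


class Pricer:
    """Hypothetical stand-in for a subclass of a SWIG-wrapped C++ class."""

    def __init__(self, spot, strike):
        # Plain Python attributes: the state we want to save.
        self.spot = spot
        self.strike = strike
        # Stand-in for the C++ handle; a lock cannot be pickled either.
        self._handle = threading.Lock()

    def price(self):
        return max(self.spot - self.strike, 0.0)

    def __reduce__(self):
        # Rebuild by calling Pricer(spot, strike) on load, which recreates
        # the handle instead of trying to serialize it.
        return (self.__class__, (self.spot, self.strike))


original = Pricer(105.0, 100.0)
restored = pickle.loads(pickle.dumps(original))
print(restored.price())
```

Without the __reduce__ method, pickle.dumps(original) would fail on the lock attribute, much as it fails on SwigPyObject in the question.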
Related
I have a large object in my Python 3 code which, when I try to pickle it with the pickle module, throws the following error:
TypeError: cannot serialize '_io.BufferedReader' object
However, dill.dump() and dill.load() are able to save and restore the object seamlessly.
What causes the trouble for the pickle module?
Given that dill saves and reconstructs the object without any error, is there any way to verify that the pickling and unpickling with dill went well?
How's it possible that pickle fails, but dill succeeds?
I'm the dill author.
1) The easiest thing to do is look at this file: https://github.com/uqfoundation/dill/blob/master/dill/_objects.py. It lists what pickle can serialize, and what dill can serialize.
2) You can try dill.copy, dill.check, and dill.pickles to check different levels of pickling and unpickling. dill also includes more utilities for detecting and diagnosing serialization issues in dill.detect and dill.pointers.
3) dill is built on pickle, and augments it by registering new serialization functions.
4) dill includes serialization variants which enable the user to choose from different object dependency serialization strategies (in dill.settings), including source code extraction and object reconstitution with dill.source (an extension of the stdlib inspect module).
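One rough way to check that a round trip "went well" is to serialize, deserialize, and compare the result with the original. The helper below is a hypothetical sketch, not part of dill or pickle. It defaults to the stdlib pickle functions, and, assuming dill is installed, dill.dumps and dill.loads could be passed in instead. It only gives a meaningful answer for objects that define equality.

```python
import pickle


def survives_roundtrip(obj, dumps=pickle.dumps, loads=pickle.loads):
    """Return True if obj serializes, deserializes, and compares equal."""
    try:
        clone = loads(dumps(obj))
    except Exception:
        # Any failure while dumping or loading counts as "did not survive".
        return False
    return clone == obj


plain_data = {"a": [1, 2, 3], "b": ("x", 4.5)}
anonymous_function = lambda x: x + 1  # stdlib pickle cannot serialize this

print(survives_roundtrip(plain_data))
print(survives_roundtrip(anonymous_function))
```

For objects without a useful __eq__, comparing selected attributes of the clone against the original is a reasonable substitute for the final equality check.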
Can anybody help me understand how and why to choose between, e.g., pickle and dill?
My use case is the following.
I would like to dump an object which is an instance of a class derived by multiple inheritance from some external library classes. Moreover, one attribute of the class is a dictionary which has a function as a value.
Unfortunately, that function is defined inside a method of the class.
    class MyClass:
        def f(self):
            def that_function():
                pass  # do something
            # back within f() scope
            self.mydata = {'foo': that_function}
Any comments regarding robustness to external dependencies?
Or any other library I could consider for serialization?
I'm the dill author. You should use pickle if all the objects you want to pickle can be pickled by pickle.dump. If one or more of the objects are unpicklable with pickle, then use dill. See the pickle docs for what can be pickled with pickle. dill can pickle most python objects, with some exceptions.
If you want to consider alternatives to dill, there's cloudpickle, which has a similar functionality to dill (and is very similar to dill when using dill.settings['recurse'] = True).
There are other serialization libraries, like json, but they actually serialize fewer kinds of objects than pickle does, so you would not choose them to serialize a user-built class.
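To make the "try pickle first" rule concrete, here is a small stdlib-only sketch. The Holder class, module_level, and picklable_with_stdlib are hypothetical names chosen to mirror the question's case of a dictionary whose value is a function defined inside a method.

```python
import pickle


def module_level():
    return "ok"


class Holder:
    def __init__(self, use_local):
        def local_function():
            return "ok"
        # A function defined inside a method cannot be looked up by name,
        # which is what the stdlib pickle relies on for functions.
        self.mydata = {"foo": local_function if use_local else module_level}


def picklable_with_stdlib(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


print(picklable_with_stdlib(Holder(use_local=False)))
print(picklable_with_stdlib(Holder(use_local=True)))
```

When the check fails, as it does for the nested function, that is the point at which dill (or cloudpickle) becomes the practical choice.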
Python docs mention this word a lot and I want to know what it means.
It simply means it can be serialized by the pickle module. For a basic explanation of this, see "What can be pickled and unpickled?". "Pickling Class Instances" provides more details, and shows how classes can customize the process.
Things that are usually not picklable are, for example, sockets, file handles, database connections, and so on. Everything that's built up (recursively) from basic python types (dicts, lists, primitives, objects, object references, even circular ones) can be pickled by default.
You can implement custom pickling code that will, for example, store the configuration of a database connection and restore it afterwards, but you will need special, custom logic for this.
All of this makes pickling a lot more powerful than xml, json and yaml (but definitely not as readable).
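The custom logic mentioned above for a database connection might look like the following sketch. The Store class is hypothetical, and sqlite3 is used only because it ships with Python. The idea is to drop the live connection when pickling and reopen it from the saved configuration when unpickling.

```python
import pickle
import sqlite3


class Store:
    """Hypothetical wrapper that owns a live database connection."""

    def __init__(self, path):
        self.path = path                   # configuration: safe to pickle
        self.conn = sqlite3.connect(path)  # live connection: not picklable

    def __getstate__(self):
        # Save everything except the connection itself.
        state = self.__dict__.copy()
        del state["conn"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then open a fresh connection.
        self.__dict__.update(state)
        self.conn = sqlite3.connect(self.path)


store = Store(":memory:")
clone = pickle.loads(pickle.dumps(store))
print(clone.conn.execute("SELECT 1").fetchone())
```

Only the configuration travels through the pickle. With an in-memory database like the one used here, the clone gets a new, empty database rather than a copy of the original's contents.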
These are all great answers, but for anyone who's new to programming and still confused here's the simple answer:
Pickling an object means making it so you can store it as it currently is, long term (often on a hard disk). A bit like saving in a video game.
So anything that's actively changing (like a live connection to a database) can't be stored directly (though you could probably figure out a way to store the information needed to create a new connection, and that you could pickle).
Bonus definition: Serializing is packaging something in a form that can be handed off to another program. Unserializing is unpacking something you got sent so that you can use it.
Pickling is the process in which python objects are converted into a simple binary representation (a byte stream) that can be written to a file and stored. This is done to store python objects, and is also called serialization. You can infer from this what de-serialization or unpickling means.
So when we say an object is picklable it means that the object can be serialized using the pickle module of python.
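A short sketch of that process with the pickle module, using made-up sample data and a temporary file so that nothing is left behind:

```python
import os
import pickle
import tempfile

data = {"name": "widget", "prices": [1.5, 2.25], "tags": ("a", "b")}

# Serialize to an in-memory bytes object.
blob = pickle.dumps(data)

# Serialize to a file and read it back; pickle files are opened in binary mode.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    with open(path, "wb") as fh:
        pickle.dump(data, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == data)
```

The restored object is an equal but separate copy of the original.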
Title says it all. It seems like it ought to be possible (somehow) to implement python-side pickling for PyObjC objects whose Objective-C classes implement NSCoding without re-implementing everything from scratch. That said, while value-semantic members would probably be straightforward, by-reference object graphs and conditional coding might be tricky. How might you get the two sides to "collaborate" on the object graph parts?
PyObjC does support writing Python objects to a (keyed) archive (that is, any object that can be pickled implements NSCoding).
That’s probably the easiest way to serialize arbitrary graphs of Python and Objective-C objects.
As I wrote in the comments for another answer, I ran into problems when trying to find a way to implement pickle support for any object that implements NSCoding, due to incompatibilities in how NSArchiver and pickle traverse the object graph (IIRC primarily when restoring the archive).
Shouldn't it be pretty straightforward?
On pickling, call encodeWithCoder on the object using an NSArchiver or something. Have pickle store that string.
On unpickling, use NSUnarchiver to create an NSObject from the pickled string.