So I have a class that remembers its instances based on an ID. When unpickling such an object there are two cases I'd like to handle:
No other instance with that ID exists, so a new instance is generated and __setstate__ should be called normally.
When another instance of that ID exists, that object is returned and I want to avoid calling __setstate__ on it.
Please take a look at this class definition.
You will notice that, as a workaround, I currently update only attributes that evaluate to False.
I can think of two general strategies to solve this:
As in this previous answer, I could define another function that is called with the result from __getnewargs__ and attach some attribute that tells me whether to run __setstate__ or not (sketched below).
If I could tell whether the class' __call__ method was called from within pickle, I could do the same.
I do not know if it's possible to further interfere with the pickling machinery and I don't want to subclass the unpickler.
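To make the first strategy concrete, here is a minimal sketch of what I have in mind (the _instances registry and the _skip_setstate flag are just illustrative names):

    import pickle

    class Remembered:
        _instances = {}  # illustrative cache: ID -> instance

        def __new__(cls, uid):
            # pickle calls cls.__new__(cls, *<result of __getnewargs__>),
            # so the registry lookup naturally happens here.
            try:
                inst = cls._instances[uid]
                inst._skip_setstate = True   # an instance with this ID already exists
            except KeyError:
                inst = super().__new__(cls)
                inst.uid = uid
                inst._skip_setstate = False
                cls._instances[uid] = inst
            return inst

        def __getnewargs__(self):
            return (self.uid,)

        def __setstate__(self, state):
            # Drop the flag; skip restoring state onto a pre-existing instance.
            if self.__dict__.pop("_skip_setstate", False):
                return
            self.__dict__.update(state)

    a = Remembered("some-id")
    a.value = 42
    b = pickle.loads(pickle.dumps(a))
    assert b is a   # same object, and __setstate__ did not overwrite it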
Thoughts, recommendations, completely different solutions? Thank you.
I've been working on multiprocessing and C++ extensions and I don't quite get the __getstate_manages_dict__ attribute (I know how to use it, but I'm not really sure how it works). The Boost.Python documentation for pickle support says this:
The author of a Boost.Python extension class might provide a __getstate__ method without considering the possibilities that:
* his class is used in Python as a base class. Most likely the __dict__ of instances of the derived class needs to be pickled in order to restore the instances correctly.
* the user adds items to the instance's __dict__ directly. Again, the __dict__ of the instance then needs to be pickled.
To alert the user to this highly unobvious problem, a safety guard is provided. If __getstate__ is defined and the instance's __dict__ is not empty, Boost.Python tests if the class has an attribute __getstate_manages_dict__. An exception is raised if this attribute is not defined.
I've seen some examples where the object's __dict__ is returned in __getstate__ and then updated in __setstate__. What is this __dict__ referring to? Is it the __dict__ attribute of the derived class object? Also, why does this dict need to be handled explicitly if pickle calls __init__ to create a new object and then sets the attributes?
Thanks
I know how to use it, but I'm not really sure how it works
It's a boolean value that is False by default. The point is to signal to Boost.Python that the __getstate__/__setstate__ implementation handles the instance's __dict__ attribute, so that the information won't be lost in the pickling process.
The idea is that Boost.Python can't actually determine whether the code is written properly, so instead you are made to jump through this extra hoop so that, if you are unaware of the problem, you see an error message - as the docs say, to alert the user to this highly unobvious problem.
It's not doing anything magical. It's just there to confirm that you "read the warranty", so to speak.
The Boost.Python documentation for pickle support says this:
This is just explaining the reasons why it's important to consider the __dict__ contents - even if you don't want to pickle all the attributes that you set explicitly in the __init__ (for example, because the class holds a reference to a large resource that you intend to load in some other way, or a results cache, or...). In short, instances of your class might contain information that you didn't expect them to contain, and that your __getstate__ implementation won't know how to handle, unless it takes the instance's __dict__ into account.
Hence the "practical advice" offered: "If __getstate__ is required, include the instance's __dict__ in the Python object that is returned."
What is this __dict__ referring to? Is it the __dict__ attribute of the derived class object?
It's the __dict__ attribute of the instance that __getstate__ was called upon. That could be an instance of a derived class, if that class doesn't override the methods. Or it could be an instance of the base class, which may or may not have had extra attributes added outside the class implementation.
Also, why does this dict need to be handled explicitly if pickle calls __init__ to create a new object and then sets the attributes?
See above. When you get the attributes (so that the pickle file can be written), you need to make sure that you actually get all the necessary attributes, or else they'll be missing upon restoration. Hard-coded logic can miss some.
Recently I have been learning about managed attributes in Python, and a common theme with properties and descriptors is that they have to be assigned as class attributes. But nowhere can I find an explanation of why, and especially why they cannot be assigned as instance attributes. So my question actually has two parts:
why do properties / descriptor instances have to be class attributes?
why can properties / descriptor instances not be instance attributes?
It is because of the way Python resolves attributes:
- First it checks whether the attribute is defined at the class level.
  - If yes, it checks whether it is a property or a data descriptor.
    - If yes, it follows this "path" (the descriptor protocol is invoked).
    - If no, it checks whether it is a simple class variable (looking up the parent classes, if any).
      - If yes, it checks whether the instance overrides this class attribute value; if yes, it returns the overridden value, if no, it returns the class attribute value.
  - If no, it checks whether the instance declares this attribute.
    - If yes, it returns the instance attribute value.
    - If no, it raises AttributeError.
Voila ;-)
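You can see this precedence at work in a small example (the names are illustrative):

    class Descr:
        """A data descriptor: it defines both __get__ and __set__."""
        def __get__(self, obj, objtype=None):
            return "from the data descriptor"
        def __set__(self, obj, value):
            raise AttributeError("read-only")

    class C:
        attr = Descr()

    c = C()
    c.__dict__["attr"] = "from the instance dict"  # bypasses Descr.__set__
    print(c.attr)   # "from the data descriptor": the class-level descriptor wins

    c.__dict__["other"] = Descr()
    print(c.other)  # the Descr object itself: instance attributes are returned as-is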
EDIT
I just found this link, which explains it better than I do.
Another nice illustration.
why do properties/descriptor instances have to be class attributes?
They don't have to be, they just are. This was a design decision that probably has many more reasons to back it up than I can think of (simplifying the implementation, separating classes from objects).
why can properties/descriptor instances not be instance attributes?
They could be, you can always override __getattribute__ in order to invoke any descriptors accessed on an instance or forbid them altogether if you desire.
Keep in mind that the fact that Python won't stop you from doing this doesn't mean it's a good idea.
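For illustration only, a sketch that invokes descriptors found in the instance __dict__ (it handles only __get__; supporting __set__/__delete__ would require overriding __setattr__ and __delattr__ as well):

    class InstanceDescriptors:
        def __getattribute__(self, name):
            value = super().__getattribute__(name)
            # If the attribute came from the instance __dict__ and is itself
            # a descriptor, invoke its __get__ manually; the normal lookup
            # would simply return the descriptor object.
            instance_dict = object.__getattribute__(self, "__dict__")
            if name in instance_dict and hasattr(type(value), "__get__"):
                return type(value).__get__(value, self, type(self))
            return value

    class Thing(InstanceDescriptors):
        pass

    t = Thing()
    t.answer = property(lambda obj: 42)  # a descriptor stored on the instance
    print(t.answer)                      # 42, not the property object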
According to this post:
Python memoising/deferred lookup property decorator
A memoizing decorator can be used to declare a lazy property in a class. There is even an 'official' package that can be used out of the box:
https://pypi.python.org/pypi/lazy
However, both of these implementations have a severe problem: any memoized values will be pickled along with the object by Python. If these values are unpicklable, the program will break.
My question is: is there an easy way to implement Scala's "@transient lazy val" declaration without too much tinkering? This declaration should remember the property across multiple invocations, and drop it once the class/object is serialized.
I'm not aware of the Scala implementation details, but the easiest solution that comes to my mind, if you're satisfied with the other aspects of the 'lazy property' library you've found, would be to implement the __getstate__ and __setstate__ object methods, as described in Pickling and unpickling normal class instances.
These methods are called by the pickling machinery during object instance (de)serialization. This way you get fine-grained control over how, and which, attributes of your object are serialized.
You should also read the corresponding documentation on the other two pickle-related methods (take care with __getinitargs__ specifically): the initialization of deserialized objects in Python differs from the common __new__ & __init__ sequence.
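For example, combining functools.cached_property (Python 3.8+) with a __getstate__ that drops the cached value gives roughly the behaviour of Scala's @transient lazy val (Report and data are made-up names):

    from functools import cached_property

    class Report:
        def __init__(self, path):
            self.path = path

        @cached_property
        def data(self):
            # Expensive and possibly unpicklable (file handles, locks, ...).
            return open(self.path)

        def __getstate__(self):
            # Drop the memoized value so it is lazily recomputed after
            # unpickling; cached_property stores it under the method's name.
            state = self.__dict__.copy()
            state.pop("data", None)
            return state

No __setstate__ is needed here: the default behaviour of updating the new instance's __dict__ with the returned state is exactly what we want.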
I'm familiar with the theory about __new__ vs __init__. The former defines how an instance of a class is created (a new object in memory), whereas the latter initializes it (assigns the initial state to its attributes/fields). There are a couple of articles on the web about this, such as this one:
Use __new__ when you need to control the creation of a new instance.
Use __init__ when you need to control initialization of a new instance.
As I said, I do understand the difference, yet I can't imagine a real-world example of a situation where I would need to use __new__ instead of __init__. If I can customize something during object creation, I can move it to object initialization, as long as it's the same object. The mentioned link says:
In general, you shouldn't need to override __new__ unless you're subclassing an immutable type like str, int, unicode or tuple.
And here comes my question - can someone give an example of a situation when overriding __new__ is in fact the right solution that can't be done using __init__, and why is that?
The Singleton pattern is the most obvious example.
Once you've created one more object, it is no longer a singleton, right?
Thus, you have to handle this while the object is being created, and one variant of the solution is to use the __new__() method.
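A minimal sketch (note that __init__ still runs on every call, which you may need to guard against as well):

    class Singleton:
        _instance = None

        def __new__(cls):
            # __init__ alone cannot prevent a second object from being
            # allocated; only __new__ controls the creation itself.
            if cls._instance is None:
                cls._instance = super().__new__(cls)
            return cls._instance

    a = Singleton()
    b = Singleton()
    assert a is b   # both names refer to the one and only instance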
I overrode the save() method of my Foo class so that when I create a Foo instance, some logic occurs. It works well.
Nevertheless, I have other methods in other classes that update Foo instances, and of course I have to save the changes by calling the save() method. But I want them to update directly, without going through the logic I wrote for object creation.
Is there an elegant solution to that?
What about overriding the __init__() method instead of save()? (I was told that was bad practice, but I'm not sure I understand why.)
Thank you.
You should not override __init__, because that is called in all cases when a model is being instantiated, including when you load it from the database.
A good way to do what you want is to check the value of self.pk within your save method: if it is None, then this is a new instance being created.
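A sketch of that check (Foo, name and run_creation_logic are placeholder names):

    from django.db import models

    class Foo(models.Model):
        name = models.CharField(max_length=100)

        def save(self, *args, **kwargs):
            is_new = self.pk is None       # pk is only set after the first save
            super().save(*args, **kwargs)
            if is_new:
                self.run_creation_logic()  # creation-only logic goes here

        def run_creation_logic(self):
            """Placeholder for the logic that should run only on creation."""

Code that updates existing instances calls save() as usual and simply never enters the is_new branch.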