I have a class whose instances are classes. Each instance has its own name. I can add that name to the enclosing module when the new class is created, which allows me to pickle.dump it. Originally, I didn't add the name to the module; instead I had a __reduce__ method in the top-level class. Unfortunately, there's some mysterious code in the pickle module that special-cases subclasses of type and looks the name up in the module rather than calling __reduce__. If that code were inside a try block, and simply fell through to the __reduce__ path on failure, I think using __reduce__ would work fine.
Anyway, the problem with just adding the name is that on pickle.load the name doesn't exist, so the load fails. Presumably I could add a module-level __getattr__ to interpret the name and create the corresponding class, but module-level __getattr__ is only supported from Python 3.7 onward (PEP 562).
The __reduce__ method on the class instances allows instances of those class instances to be pickled successfully, since it recreates the class instances on pickle.load; but pickling the class instances themselves doesn't work.
Other than making a slightly nonstandard pickle module, is there some reasonable solution to allow pickling of the class instances?
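One workaround that avoids patching pickle itself is to register a reducer for the metaclass with copyreg: the global dispatch table is consulted before pickle's subclass-of-type special case, so __reduce__-style logic gets a chance to run after all. A minimal sketch, where Meta and make_class are hypothetical names for the class-of-classes and its factory:

```python
import copyreg
import pickle

class Meta(type):
    """Stand-in for a class whose instances are classes."""

def make_class(name):
    # Hypothetical factory: recreates the class from its name on load.
    return Meta(name, (), {})

def _reduce_class(cls):
    # Tell pickle to rebuild the class by calling make_class(name).
    return (make_class, (cls.__name__,))

# copyreg's dispatch table is checked before the subclass-of-type
# special case, so this reducer is used instead of save_global.
copyreg.pickle(Meta, _reduce_class)

C = make_class("C")
C2 = pickle.loads(pickle.dumps(C))
print(C2.__name__)  # C
```

The same hook also works per-Pickler via the dispatch_table attribute if you don't want to change global pickling behavior.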
I have a Python class which uses class variables to maintain some instance-wide state. However, when testing various scenarios via pytest, I'm realizing that the class-level attributes are shared across the tests, since it's the same interpreter being used. Is there any pattern or way around this, other than running each class as a separate invocation of the pytest command or resetting every class attribute manually between tests? It's looking like a big no, but I wanted to ask and make sure.
P.S. For those who will immediately ask why I'm using class attributes: this code will run on AWS Lambda. With this pattern I can cache objects in memory as class attributes between Lambda invocations and be assured that instance attributes are cleared each time the Lambda runs. It's an attempt at using some OOP and writing base Lambda classes, with logging and various helpers already implemented, that other devs can leverage.
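One pattern that avoids per-attribute manual resets is to snapshot the class state once and restore it around each test. A sketch, with snapshot_class_state/restore_class_state as hypothetical helper names and LambdaBase standing in for the base Lambda class:

```python
import copy

def snapshot_class_state(cls):
    """Deep-copy the class's own non-dunder attributes."""
    return {k: copy.deepcopy(v) for k, v in vars(cls).items()
            if not k.startswith("__")}

def restore_class_state(cls, snapshot):
    """Reset the class's attributes to a previously captured snapshot."""
    for k in list(vars(cls)):
        if not k.startswith("__"):
            delattr(cls, k)
    for k, v in snapshot.items():
        setattr(cls, k, v)

class LambdaBase:
    cache = {}  # class-level state shared across invocations

# In pytest, the pair can be wrapped in an autouse fixture, e.g.:
#
#   @pytest.fixture(autouse=True)
#   def clean_class_state():
#       snap = snapshot_class_state(LambdaBase)
#       yield
#       restore_class_state(LambdaBase, snap)

snap = snapshot_class_state(LambdaBase)
LambdaBase.cache["warm"] = object()   # state mutated by one "test"
restore_class_state(LambdaBase, snap)
print(LambdaBase.cache)  # {}
```

The deep copy matters: restoring a reference to the same dict would not undo in-place mutations made by a test.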
https://docs.python.org/3.3/library/imp.html#imp.reload
When reload(module) is executed:
Python modules’ code is recompiled and the module-level code
reexecuted, defining a new set of objects which are bound to names in
the module’s dictionary. The init function of extension modules is not
called a second time. As with all other objects in Python the old
objects are only reclaimed after their reference counts drop to zero.
The names in the module namespace are updated to point to any new or
changed objects. Other references to the old objects (such as names
external to the module) are not rebound to refer to the new objects
and must be updated in each namespace where they occur if that is
desired.
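The last sentence of the quote is easy to demonstrate. A self-contained sketch that writes a throwaway module (here named demo_mod for illustration) to a temp directory, imports it, edits it, and reloads:

```python
import importlib
import pathlib
import sys
import tempfile

# After reload(), names inside the module point at new objects, but
# external references to the old objects are not rebound.
sys.dont_write_bytecode = True  # avoid stale .pyc complications
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "demo_mod.py").write_text("VALUE = 1\n")
sys.path.insert(0, tmp)

import demo_mod
external = demo_mod.VALUE          # name external to the module

pathlib.Path(tmp, "demo_mod.py").write_text("VALUE = 2\n")
importlib.invalidate_caches()
importlib.reload(demo_mod)

print(demo_mod.VALUE, external)    # 2 1
```

(Note that imp.reload is the Python 3.3 spelling; in current Python the same function lives in importlib.)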
I've been working on multiprocessing and C++ extensions and I don't quite get the __getstate_manages_dict__ attribute (I know how to use it, but I'm not really sure how it works). The Boost.Python documentation for pickle support says this:
The author of a Boost.Python extension class might provide a __getstate__ method without considering the possibilities that:

* his class is used in Python as a base class. Most likely the __dict__ of instances of the derived class needs to be pickled in order to restore the instances correctly.
* the user adds items to the instance's __dict__ directly. Again, the __dict__ of the instance then needs to be pickled.
To alert the user to this highly unobvious problem, a safety guard is
provided. If __getstate__ is defined and the instance's __dict__ is
not empty, Boost.Python tests if the class has an attribute
__getstate_manages_dict__. An exception is raised if this attribute is not defined:
I've seen some examples where the object's __dict__ is returned in __getstate__ and then updated in __setstate__. What is this __dict__ referring to? Is it the __dict__ attribute of the derived class object? Also, why does this dict need to be handled explicitly if pickle calls __init__ to create a new object and then sets the attributes?
Thanks
I know how to use it, but I'm not really sure how it works
It's a boolean value that is False by default. The point is to signal to Boost that the __getstate__/__setstate__ implementation handles the instance's __dict__ attribute, so that the information won't be lost in the pickling process.
The idea is that Boost.Python can't actually determine whether the code is written properly, so instead you are made to jump through this extra hoop so that, if you are unaware of the problem, you see an error message: as they say, it is there to alert the user to this highly unobvious problem.
It's not doing anything magical. It's just there to confirm that you "read the warranty", so to speak.
The boost/Python documentation for pickle support says this:
This is just explaining the reasons why it's important to consider the __dict__ contents - even if you don't want to pickle all the attributes that you set explicitly in the __init__ (for example, because the class holds a reference to a large resource that you intend to load in some other way, or a results cache, or...). In short, instances of your class might contain information that you didn't expect them to contain, and that your __getstate__ implementation won't know how to handle, unless it takes the instance's __dict__ into account.
Hence the "practical advice" offered: "If __getstate__ is required, include the instance's __dict__ in the Python object that is returned."
What is this __dict__ referring to? Is it the __dict__ attribute of the derived class object?
It's the __dict__ attribute of the instance that __getstate__ was called upon. That could be an instance of a derived class, if that class doesn't override the methods. Or it could be an instance of the base class, which may or may not have had extra attributes added outside the class implementation.
Also, why does this dict need to be handled explicitly if pickle calls __init__ to create a new object and then sets the attributes?
See above. When you get the attributes (so that the pickle file can be written), you need to make sure that you actually get all the necessary attributes, or else they'll be missing upon restoration; hard-coded logic can miss some. (Note too that pickle does not, in general, call __init__ on load: it creates the object without initialization and then restores state, via __setstate__ if one is defined.)
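A pure-Python analogue of the advice (the same logic applies to a Boost.Python __getstate__/__setstate__ pair): returning the instance's __dict__ preserves attributes that the base class's author never anticipated.

```python
import pickle

class Base:
    """Mimics an extension class that defines its own pickling."""
    def __init__(self, x):
        self.x = x
    def __getstate__(self):
        # Returning the instance __dict__ (rather than a hand-picked
        # tuple of fields) keeps attributes added by subclasses or by
        # users assigning to the instance directly.
        return self.__dict__
    def __setstate__(self, state):
        self.__dict__.update(state)

class Derived(Base):
    def __init__(self, x, y):
        super().__init__(x)
        self.y = y

d = Derived(1, 2)
d.extra = "added after construction"   # item added to __dict__ directly
d2 = pickle.loads(pickle.dumps(d))
print(d2.x, d2.y, d2.extra)  # 1 2 added after construction
```

Had __getstate__ returned only (self.x,), both y and extra would have been silently dropped, which is exactly the scenario the __getstate_manages_dict__ guard warns about.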
Python allows methods to be added to instances of a class rather than to the whole class, as demonstrated here: Adding a Method to an Existing Object Instance. Most of the time this seems like a bad idea for consistent behavior of classes. When might this be necessary? Why does Python allow this at all?
Python doesn't specifically allow this; it's just a consequence of the way the Python object model works. Methods are just object attributes like any other, and generally you can add any attribute to an existing object.
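For completeness, the usual way to bind a function to a single instance is types.MethodType. A short sketch (Greeter and shout are made-up names):

```python
import types

class Greeter:
    def __init__(self, name):
        self.name = name

def shout(self):
    return self.name.upper() + "!"

g = Greeter("alice")
g.shout = types.MethodType(shout, g)  # bound to this one instance
print(g.shout())                      # ALICE!

# Other instances are unaffected: the class itself has no shout.
print(hasattr(Greeter("bob"), "shout"))  # False
```

Assigning the bare function (g.shout = shout) would also work but would not bind self automatically; types.MethodType gives the same calling convention as a normal method.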
I'm having a tricky problem in the game I'm working on. I'm using Pygame to develop it. I happen to be one of those developers who never uses the default __dict__ object variable; I always define __slots__ to clarify the variables an object can have (I have a classmethod that reads the slots to determine the variables needed from a config file).
Anyway, I just realized that this effort isn't working in some of my classes; they still have a __dict__ variable and can have arbitrary attributes assigned, even though they explicitly define their __slots__. I think this is because they are inheriting from pygame.sprite.Sprite, which has a __dict__. If this is the case, how do I suppress creation of this dict? (I thought explicitly defining __slots__ was supposed to.) Or could I be mistaken about the cause? Thanks for any insight; it's hard to find information about this particular problem via searches.
The only way to suppress arbitrary attributes and the __dict__ container of them, is to use __slots__ as you are and inherit from a class that does the same. A subclass of a class that has a __dict__ will always have a __dict__. The only way around it is to not inherit from this class (but, for example, use composition instead.)
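This is easy to verify in plain Python; in the sketch below, FreeBase stands in for a base class like pygame.sprite.Sprite that does not define __slots__:

```python
class FreeBase:             # stands in for pygame.sprite.Sprite
    pass

class Slotted(FreeBase):
    __slots__ = ("x",)      # ineffective: the base contributes a __dict__

class FullySlotted:
    __slots__ = ("x",)      # every class in the MRO uses __slots__

s = Slotted()
s.anything = 1              # silently allowed via the inherited __dict__
print(hasattr(s, "__dict__"))               # True

f = FullySlotted()
print(hasattr(f, "__dict__"))               # False
try:
    f.anything = 1
except AttributeError:
    print("arbitrary attributes blocked")
```

So the memory/strictness benefit of __slots__ only materializes when the entire inheritance chain (apart from object) defines it.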
I have a python module called model with basically the following content:
class Database:
    class Publication(object):
        pass

    class Article(Publication):
        pass

    class Book(Publication):
        pass

class AnotherDatabase:
    class Seminar(object):
        pass

...
I define the objects in the database as classes nested under a main class in order to organize them more distinctively. The objects are parsed from a large XML file, which takes time. I would like to pickle the imported objects to make them loadable in shorter time.
I get the error:
pickle.PicklingError: Can't pickle <class 'project.model.Article'>: it's not found as project.model.Article
The class is reported as project.model.Article, not project.model.Database.Article as defined. Can I fix this error and keep the classes nested as above? Is it a bad idea to organize classes by nesting them?
When an inner class is created, there is no way for the interpreter to know which class it was defined inside of; this information is not recorded. This is why pickle does not know where to look for the class Article.
Because of this there are numerous issues when using inner classes, not just when it comes to pickling. If there are classes at the module scope with the same name, it introduces a lot of ambiguity, as there is no easy way to tell the two types apart (e.g. with repr or when debugging).
As a result it is generally best to avoid nested classes in Python unless you have a very good reason for doing so.
It's certainly a lot simpler to keep your classes unnested. As an alternative, you can use packages to group the classes together.
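For example, the same grouping could be expressed with a package instead of nested classes (a hypothetical layout):

```
project/
    model/
        __init__.py          # optionally re-export the classes
        database.py          # Publication, Article, Book
        another_database.py  # Seminar
```

With this layout the classes live at module scope, so pickle can find Article as project.model.database.Article.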
In any case, there is an alternate serializer named cerealizer which I think could handle the nested classes. You would need to register the classes with it before deserialization. I've used it before when pickle wouldn't suffice (also problems related to the classes) and it works well!