How to create a persistent class using pickle in Python - python

New to python...
I have the following class Key, that extends dict:
class Key( dict ):
    def __init__( self ):
        self = { some dictionary stuff... }
    def __getstate__(self):
        state = self.__dict__.copy()
        return state
    def __setstate__(self, state):
        self.__dict__.update( state )
I want to save an instance of the class with its data using pickle.dump and then retrieve the data using pickle.load. I understand that I am supposed to somehow change the getstate and the setstate, however, am not entirely clear on how I am supposed to do that... any help would be greatly appreciated!

I wrote a subclass of dict that does this. Here it is:
class AttrDict(dict):
    """A dictionary with attribute-style access. It maps attribute access to
    the real dictionary. """
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
    def __getstate__(self):
        # Return a list: dict views cannot be pickled in Python 3.
        return list(self.__dict__.items())
    def __setstate__(self, items):
        for key, val in items:
            self.__dict__[key] = val
    def __repr__(self):
        return "%s(%s)" % (self.__class__.__name__, dict.__repr__(self))
    def __setitem__(self, key, value):
        return super(AttrDict, self).__setitem__(key, value)
    def __getitem__(self, name):
        return super(AttrDict, self).__getitem__(name)
    def __delitem__(self, name):
        return super(AttrDict, self).__delitem__(name)
    __getattr__ = __getitem__
    __setattr__ = __setitem__
    def copy(self):
        return AttrDict(self)
It basically converts the state to a plain sequence of (key, value) pairs, and reads those pairs back when unpickling.
But be aware that you have to have the original source file available to unpickle. Pickling does not actually save the class itself, only the instance state. Python needs the original class definition to re-create the instance.
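As a side note to the question: a dict subclass usually needs no custom __getstate__/__setstate__ at all, because pickle stores both the dictionary items and the instance __dict__ by default. A minimal sketch, assuming Python 3 and a class defined at module level (the label attribute is made up for illustration):

```python
import pickle


class Key(dict):
    """Dict subclass carrying one extra instance attribute (hypothetical)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.label = "unnamed"


original = Key(a=1, b=2)
original.label = "primary"

# Round-trip through bytes; pickle.dump / pickle.load do the same with a file.
restored = pickle.loads(pickle.dumps(original))
```

Note that __init__ is not called when unpickling; the attribute comes back because it was saved as part of the instance state.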

Related

How to use __setitem__ properly?

I want to make a data object:
class GameData:
    def __init__(self, data={}):
        self.data = data
    def __getitem__(self, item):
        return self.data[item]
    def __setitem__(self, key, value):
        self.data[key] = value
    def __getattr__(self, item):
        return self.data[item]
    def __setattr__(self, key, value):
        self.data[key] = value
    def __repr__(self):
        return str(self.data)
When I create a GameData object, I get a RecursionError. How can I avoid this recursion?
In the assignment self.data = data, __setattr__ is called, as it is for every attribute assignment. Its body evaluates self.data[key], but self has no attribute called data at that moment, so __getattr__ is called to look it up. __getattr__ itself evaluates self.data, which calls __getattr__ again. This is an infinite recursion.
Use object.__setattr__(self, 'data', data) to do the assignment when implementing __setattr__.
class GameData:
    def __init__(self, data=None):
        object.__setattr__(self, 'data', {} if data is None else data)
    def __getitem__(self, item):
        return self.data[item]
    def __setitem__(self, key, value):
        self.data[key] = value
    def __getattr__(self, item):
        return self.data[item]
    def __setattr__(self, key, value):
        self.data[key] = value
    def __repr__(self):
        return str(self.data)
For details, see the __getattr__ manual
Additionally, do not use mutable objects as default parameter because the same object {} in the default argument is shared between GameData instances.
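The shared default can be seen directly. A small sketch (the class names Shared and Separate are made up for this illustration):

```python
class Shared:
    def __init__(self, data={}):       # one dict, created once at definition time
        self.data = data


class Separate:
    def __init__(self, data=None):     # a fresh dict for every instance
        self.data = {} if data is None else data


a, b = Shared(), Shared()
a.data["hp"] = 10
leaks_with_default_dict = "hp" in b.data    # b sees the value written through a

c, d = Separate(), Separate()
c.data["hp"] = 10
leaks_with_none_default = "hp" in d.data    # d is unaffected
```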

Python property setting does not assign the right instance attributes (leading underscores) when assigning via __setattr__

I can't set the right properties of an instance when setting their attributes via setattr in a factory method.
Given the following code where data is a simple dict containing e.g. { "age": "64", ...}
def factory(data):
    obj = MyClass()
    for k, v in data.items():
        setattr(obj, k, v)
    return obj
class MyClass(object):
    def __init__(self):
        self._age = None
        # more...
    @property
    def age(self):
        return self._age
    @age.setter
    def age(self, value):
        some_validation(value)
        self._age = value
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
    def __getitem__(self, item):
        return self.__dict__.get(item, None)
    def __getattr__(self, item):
        self.__dict__[item] = None
        return None
    def __str__(self):
        return json.dumps(self, default=lambda o: o.__dict__)
c = factory(data)
print(c)
I always get the following output when printing the created object:
{"_age": "64", ...}
But I need to have
{"age": "64", ...}
Why does the setattr method assign the leading underscore?
Some of the things you are trying to achieve get mixed up, like wanting to print __dict__ for a readable representation, but using private attributes for properties. Let's start from scratch and see how we can implement your class correctly.
You are trying to implement a class whose attributes can be accessed both as keys and as attributes. That is fine and can be accomplished in a more concise way.
class MyClass:
    ...
    def __getitem__(self, item):
        return self.__getattribute__(item)
    def __setitem__(self, key, value):
        return self.__setattr__(key, value)
You also want None to be returned when an attribute does not exist. This is covered by __getattr__ which is called exactly when an attribute does not exist.
    def __getattr__(self, _):
        return None
Then you want to add some validation to some attributes with property. It is indeed the correct way to proceed.
    @property
    def age(self):
        return self._age
    @age.setter
    def age(self, value):
        # some validation here
        self._age = value
And finally you want to be able to have a nice string representation of your instance. We have to be careful for that since we had to add some private attributes that we do not want to print.
What we are going to do is implement a keys method to allow casting to dict. This method only returns keys for attributes that are neither private nor methods.
    def keys(self):
        return [k for k in dir(self) if not k.startswith('_') and not callable(self[k])]
    def __str__(self):
        return json.dumps(dict(self))
This does the right thing.
obj = MyClass()
obj.age = 3
print(obj)
# prints: {"age": 3}
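Put together, the pieces above might look like the following. This is only a sketch: the check inside the setter is a stand-in (the question does not show what some_validation does), and the sample data is invented.

```python
import json


class MyClass:
    def __getitem__(self, item):
        return self.__getattribute__(item)

    def __setitem__(self, key, value):
        return self.__setattr__(key, value)

    def __getattr__(self, _):
        # Only called when normal lookup fails: missing attributes read as None.
        return None

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        # Hypothetical validation, standing in for some_validation(value).
        if not str(value).isdigit():
            raise ValueError("age must be a non-negative integer")
        self._age = value

    def keys(self):
        # Public, non-callable attributes only; this hides _age and the methods.
        return [k for k in dir(self)
                if not k.startswith('_') and not callable(self[k])]

    def __str__(self):
        return json.dumps(dict(self))


def factory(data):
    obj = MyClass()
    for k, v in data.items():
        setattr(obj, k, v)
    return obj


c = factory({"age": "64", "name": "Ann"})
text = str(c)
```

Because setattr goes through the property, the value is validated and stored in _age, while the string representation reports it under the public name age.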

How to implement mutable PickleTypes that automatically update on change

SQLAlchemy offers the PickleType and offers mutation tracking for any type that is mutable (like a dict).
The SQLAlchemy documentation mentions that this is the way to implement a mutable PickleType but it does not state exactly how to proceed with it.
Note: I want to store a dict in the PickleType.
How do you implement this?
While the documentation mentions some examples, it is not sufficient in my eyes, so I will add my implementation here that can be used to implement a mutable dict that is pickled and stored in the database.
Use the MutableDict example from the docs:
from sqlalchemy.ext.mutable import Mutable
class MutableDict(Mutable, dict):
    @classmethod
    def coerce(cls, key, value):
        if not isinstance(value, MutableDict):
            if isinstance(value, dict):
                return MutableDict(value)
            return Mutable.coerce(key, value)
        else:
            return value
    def __delitem__(self, key):
        dict.__delitem__(self, key)
        self.changed()
    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        self.changed()
    def __getstate__(self):
        return dict(self)
    def __setstate__(self, state):
        # Restore from the pickled state, not from self.
        self.update(state)
Now create a column to be tracked:
class MyModel(Base):
    data = Column(MutableDict.as_mutable(PickleType))
I would like to see some other examples that are maybe more advanced or possibly use different data structures. What would a generic approach for pickle look like? Is there one (I suppose not, or SQLAlchemy would have one).
Here's a solution I came up with. It wraps any type and detects any attribute sets and calls Mutable.changed(). It also wraps function calls and detects changes by taking a snapshot of the object before and after and comparing. Should work for Pickleable types...
import pickle
from sqlalchemy.ext.mutable import Mutable
class MutableTypeWrapper(Mutable):
    top_attributes = ['_underlying_object',
                      '_underlying_type',
                      '_last_state',
                      '_snapshot_update',
                      '_snapshot_changed',
                      '_notify_if_changed',
                      'changed',
                      '__getstate__',
                      '__setstate__',
                      'coerce']
    @classmethod
    def coerce(cls, key, value):
        if not isinstance(value, MutableTypeWrapper):
            try:
                return MutableTypeWrapper(value)
            except Exception:
                return Mutable.coerce(key, value)
        else:
            return value
    def __getstate__(self):
        return self._underlying_object
    def __setstate__(self, state):
        self._underlying_type = type(state)
        self._underlying_object = state
    def __init__(self, underlying_object, underlying_type=None):
        if (underlying_object is None and underlying_type is None):
            print('Both underlying object and type are none.')
            raise RuntimeError('Unable to create MutableTypeWrapper with no underlying object or type.')
        if (underlying_object is not None):
            self._underlying_object = underlying_object
        else:
            self._underlying_object = underlying_type()
        if (underlying_type is not None):
            self._underlying_type = underlying_type
        else:
            self._underlying_type = type(underlying_object)
    def __getattr__(self, attr):
        if (attr in MutableTypeWrapper.top_attributes):
            return object.__getattribute__(self, attr)
        orig_attr = self._underlying_object.__getattribute__(attr)
        if callable(orig_attr):
            def hooked(*args, **kwargs):
                self._snapshot_update()
                result = orig_attr(*args, **kwargs)
                self._notify_if_changed()
                # prevent underlying from becoming unwrapped
                if result == self._underlying_object:
                    return self
                return result
            return hooked
        else:
            return orig_attr
    def __setattr__(self, attr, value):
        if (attr in MutableTypeWrapper.top_attributes):
            object.__setattr__(self, attr, value)
            return
        self._underlying_object.__setattr__(attr, value)
        self.changed()
    def _snapshot_update(self):
        self._last_state = pickle.dumps(self._underlying_object,
                                        pickle.HIGHEST_PROTOCOL)
    def _snapshot_changed(self):
        return self._last_state != pickle.dumps(self._underlying_object,
                                                pickle.HIGHEST_PROTOCOL)
    def _notify_if_changed(self):
        if (self._snapshot_changed()):
            self.changed()
And then use it with PickleType as follows:
class TestModel(Base):
    __tablename__ = 'testtable'
    id = Column(Integer, primary_key=True)
    obj = Column(MutableTypeWrapper.as_mutable(PickleType))
The disadvantage here is the underlying class is snapshotted before every function call, and then changes are compared after in order to verify if the underlying object has changed. This will have a significant performance impact.
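The snapshot idea can be tried in isolation, without SQLAlchemy. A rough sketch in plain Python (call_and_detect_change is a made-up helper name, not part of any library): it pickles the object before and after a method call and compares the bytes, which is also where the cost comes from, since the whole object is serialized twice per call.

```python
import pickle


def call_and_detect_change(obj, method_name, *args, **kwargs):
    """Call obj.<method_name>(...) and report whether obj's pickled form changed."""
    before = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    result = getattr(obj, method_name)(*args, **kwargs)
    after = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    return result, before != after


items = [3, 1, 2]
_, changed_by_sort = call_and_detect_change(items, "sort")       # mutates the list
count, changed_by_count = call_and_detect_change(items, "count", 1)  # read-only call
```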
The other way to ensure that your PickleType objects are updated when you modify them is to copy and assign them before committing changes.

Clean way to disable `__setattr__` until after initialization

I've written the following wrapper class. I want to define __setattr__ such that it redirects all attributes to the wrapped class. However, this prevents me from initializing the wrapper class. Any elegant way to fix this?
class Wrapper:
    def __init__(self, value):
        # How to use the default '__setattr__' inside '__init__'?
        self.value = value
    def __setattr__(self, name, value):
        setattr(self.value, name, value)
You are catching all assignments, which prevents the constructor from assigning self.value. You can use self.__dict__ to access the instance dictionary. Try:
class Wrapper:
    def __init__(self, value):
        self.__dict__['value'] = value
    def __setattr__(self, name, value):
        setattr(self.value, name, value)
Another way using object.__setattr__:
class Wrapper(object):
    def __init__(self, value):
        object.__setattr__(self, 'value', value)
    def __setattr__(self, name, value):
        setattr(self.value, name, value)
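A quick way to check that the redirection works, sketched with types.SimpleNamespace as a stand-in for the wrapped object (the attribute name color is arbitrary):

```python
from types import SimpleNamespace


class Wrapper:
    def __init__(self, value):
        # Bypass our own __setattr__ so the wrapper itself keeps the reference.
        object.__setattr__(self, 'value', value)

    def __setattr__(self, name, value):
        # Every later assignment is forwarded to the wrapped object.
        setattr(self.value, name, value)


target = SimpleNamespace()
w = Wrapper(target)
w.color = "red"    # ends up on target, not on w
```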
A way to disable the __setattr__ until after initialization without changing the self.value = value syntax in the __init__ method is covered here. In short, embed knowledge of initialization in the object and use it in the __setattr__ method. For your Wrapper:
class Wrapper:
    __initialized = False
    def __init__(self, value):
        self.value = value
        self.__initialized = True
    def __setattr__(self, name, value):
        if self.__initialized:
            # your __setattr__ implementation here, e.g. the forwarding from the question:
            setattr(self.value, name, value)
        else:
            object.__setattr__(self, name, value)
With __getattr__ overridden as well:
class Wrapper:
    def __init__(self,wrapped):
        self.__dict__['wrapped'] = wrapped
    def __setattr__(self,name,value):
        setattr(self.__dict__['wrapped'],name,value)
    def __getattr__(self,name):
        return getattr(self.__dict__['wrapped'],name)
class A:
    def __init__(self,a):
        self.a = a
wa = Wrapper(A(3))
#wa.a == wa.wrapped.a == 3
As suggested in other answers, one idea is to directly access the object dictionary to bypass setattr resolution.
For something easy to read, I suggest the following:
def __init__(self,wrapped1, wrapped2):
    vars(self).update(dict(
        _wrapped1=wrapped1,
        _wrapped2=wrapped2,
    ))
Using vars is optional, but I find it nicer than directly accessing self.__dict__, and the inline dict() notation allows for grouping all instance variable initialization in a visible block with minimum boilerplate code overhead.

How to clear instance data without setattr?

I wanted to make a class in Python that can be stored and transferred (via a DB, a single file, or just a string). I wrote this (shortened):
from json import JSONDecoder, JSONEncoder
def json_decode(object): return JSONDecoder().decode(object)
def json_encode(object): return JSONEncoder().encode(object)
class Storage:
    __separator__ = 'ANY OF ANYS'
    __keys__ = []
    __vals__ = []
    __slots__ = ('__keys__', '__vals__', '__separator__')
    def __getattr__(self, key):
        try:
            return self.__vals__[self.__keys__.index(key)]
        except IndexError:
            raise AttributeError
    def __setattr__(self, key, val):
        self.__keys__.append(key)
        self.__vals__.append(val)
    def store(self):
        return (json_encode(self.__keys__) + self.__separator__ +
                json_encode(self.__vals__))
    def restore(self, stored):
        stored = stored.split(self.__separator__)
        for (key, val) in zip(json_decode(stored[0]), json_decode(stored[1])):
            setattr(self, key, val)
And yes, that works, but when I create more instances, all of them behave like a singleton.
So how do I set an attribute on an instance without going through __setattr__?
PS. I had an idea: add a pass-through for keys/vals in set/getattr, but that would make a mess.
Your __separator__, __keys__, __vals__ and __slots__ are attributes of the Storage class object itself, not of its instances. I don't know if it's exactly the same thing, but I'd call them static variables of the class.
If you want to have different values for each instance of Storage, define each of these variables in your __init__ function:
class Storage(object):
    __slots__ = ('__keys__', '__vals__', '__separator__')
    def __init__(self):
        super(Storage, self).__setattr__('__separator__', "ANY OF ANYS")
        super(Storage, self).__setattr__('__keys__', [])
        super(Storage, self).__setattr__('__vals__', [])
    def __getattr__(self, key):
        try:
            vals = getattr(self, '__vals__')
            keys = getattr(self, '__keys__')
            return vals[keys.index(key)]
        except ValueError:
            # list.index raises ValueError when the key is unknown
            raise AttributeError(key)
    def __setattr__(self, key, val):
        vals = getattr(self, '__vals__')
        keys = getattr(self, '__keys__')
        vals.append(val)
        keys.append(key)
edited so getattr and setattr works
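The difference between the two placements is easy to reproduce in isolation. A minimal sketch (SharedStorage and OwnStorage are made-up names for this comparison):

```python
class SharedStorage:
    keys = []                  # class attribute: one list for every instance

    def add(self, key):
        self.keys.append(key)


class OwnStorage:
    def __init__(self):
        self.keys = []         # instance attribute: a new list per instance

    def add(self, key):
        self.keys.append(key)


a, b = SharedStorage(), SharedStorage()
a.add("x")                     # b.keys now also contains "x"

c, d = OwnStorage(), OwnStorage()
c.add("x")                     # d.keys stays empty
```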
I got that problem 2 days ago. Don't know if that's exactly your problem, but you said that about "its like I have a singleton"
You could make your Storage class a subclass of a special base class like this:
class Singleton(object):
    def __new__(cls, *args, **kwargs):
        if '_inst' not in vars(cls):
            cls._inst = super(Singleton, cls).__new__(cls)
        return cls._inst
class Storage(Singleton):
    ....
As long as you don't override __new__() in your subclass, all subsequent calls to create new instances after the first will return the one first created.
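A short check of that behaviour, as a sketch in Python 3 syntax (the Cache class is invented here only to show that each subclass gets its own single instance):

```python
class Singleton:
    def __new__(cls, *args, **kwargs):
        # vars(cls) only looks at this exact class, so every subclass
        # caches its own instance instead of inheriting the parent's.
        if '_inst' not in vars(cls):
            cls._inst = super().__new__(cls)
        return cls._inst


class Storage(Singleton):
    pass


class Cache(Singleton):
    pass


first = Storage()
second = Storage()
other = Cache()
```

Keep in mind that __init__, if defined, still runs on every call, even though the same object is returned.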
