I wrote a very simple program to subclass a dictionary. I wanted to try out the __missing__ method in Python.
After some research I found out that in Python 2 it's available in defaultdict. (In Python 3 we use collections.UserDict, though.)
__getitem__ is the one responsible for calling the __missing__ method when the key isn't found.
When I implement __getitem__ in the following program I get a KeyError, but when I implement it without __getitem__, I get the desired value.
import collections

class DictSubclass(collections.defaultdict):
    def __init__(self, dic):
        if dic is None:
            self.data = None
        else:
            self.data = dic

    def __setitem__(self, key, value):
        self.data[key] = value

    ########################
    def __getitem__(self, key):
        return self.data[key]
    ########################

    def __missing__(self, key):
        self.data[key] = None

dic = {'a': 4, 'b': 10}
d1 = DictSubclass(dic)
d2 = DictSubclass(None)
print d1[2]
I thought I needed to implement __getitem__ since it's responsible for calling __missing__. I understand that the class definition of defaultdict has a __getitem__ method. But even so, say I wanted to write my own __getitem__, how would I do it?
The dict type will always try to call __missing__. All that defaultdict does is provide an implementation; if you are providing your own __missing__ method you don't have to subclass defaultdict at all.
See the dict documentation:
d[key]
Return the item of d with key key. Raises a KeyError if key is not in the map.
If a subclass of dict defines a method __missing__() and key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call. No other operations or methods invoke __missing__().
However, you need to leave the default __getitem__ method in place, or at least call it. If you override dict.__getitem__ with your own version and don't call the base implementation, __missing__ is never called.
You could call __missing__ from your own implementation:
def __getitem__(self, key):
    if key not in self.data:
        return self.__missing__(key)
    return self.data[key]
or you could call the original implementation:
def __getitem__(self, key):
    if key not in self.data:
        return super(DictSubclass, self).__getitem__(key)
    return self.data[key]
In Python 2, you can just subclass UserDict.UserDict:
from UserDict import UserDict

class DictSubclass(UserDict):
    def __missing__(self, key):
        self.data[key] = None
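For completeness, a plain dict subclass in Python 3 will call __missing__ on its own; neither defaultdict nor UserDict is required. A minimal sketch (the class name DefaultingDict and the 'default' sentinel are just illustrative):

```python
class DefaultingDict(dict):
    # plain dict consults __missing__ (when defined) whenever d[key]
    # fails to find the key; no defaultdict needed
    def __missing__(self, key):
        value = 'default'   # illustrative sentinel value
        self[key] = value
        return value

d = DefaultingDict()
d['a'] = 1
print(d['missing'])  # prints: default
```

Note that __missing__ is only consulted by the d[key] syntax; d.get(key) still returns None for absent keys without inserting anything.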
Related
I'm trying to modify a third-party dict class to make it immutable after a certain point.
With most classes, I can assign to method slots to modify behavior.
However, this doesn't seem possible with all methods in all classes. In particular for dict, I can reassign update, but not __setitem__.
Why? How are they different?
For example:
class Freezable(object):
    def _not_modifiable(self, *args, **kw):
        raise NotImplementedError()

    def freeze(self):
        """
        Disallow mutating methods from now on.
        """
        print "FREEZE"
        self.__setitem__ = self._not_modifiable
        self.update = self._not_modifiable
        # ... others
        return self

class MyDict(dict, Freezable):
    pass

d = MyDict()
d.freeze()
print d.__setitem__  # <bound method MyDict._not_modifiable of {}>
d[2] = 3             # no error -- this is incorrect.
d.update({4: 5})     # raises NotImplementedError
Note that you can define the class __setitem__, e.g.:
def __setitem__(self, key, value):
    if self.update is Freezable._not_modifiable:
        raise TypeError('{} has been frozen'.format(id(self)))
    dict.__setitem__(self, key, value)
(This method is a bit clumsy; there are other options. But it's one way to make it work even though Python calls the class's __setitem__ directly.)
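The underlying reason the instance assignment has no effect is that Python looks up special methods such as __setitem__ on the type, not on the instance, when executing d[k] = v. One possible flag-based variant (the class name FreezableDict is mine, not from the question):

```python
class FreezableDict(dict):
    # Special methods are looked up on the type, not the instance,
    # so rebinding d.__setitem__ cannot intercept d[k] = v.
    # Instead, check a flag inside the class-level __setitem__.
    _frozen = False

    def freeze(self):
        self._frozen = True  # shadows the class attribute per instance
        return self

    def __setitem__(self, key, value):
        if self._frozen:
            raise TypeError('{} has been frozen'.format(id(self)))
        dict.__setitem__(self, key, value)

d = FreezableDict()
d[2] = 3          # allowed before freezing
d.freeze()
try:
    d[4] = 5      # now rejected
    raised = False
except TypeError:
    raised = True
```

Caveat: dict.update (and the dict constructor) still bypass __setitem__, so a complete implementation would need the same flag check in update and friends.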
I am trying to use pickle to transfer python objects over the wire between 2 servers. I created a simple class, that subclasses dict and I am trying to use pickle for the marshalling:
def value_is_not_none(value):
    return value is not None

class CustomDict(dict):
    def __init__(self, cond=lambda x: x is not None):
        super().__init__()
        self.cond = cond

    def __setitem__(self, key, value):
        if self.cond(value):
            dict.__setitem__(self, key, value)
I first tried to use pickle for the marshalling, but when I un-marshalled I received an error related to the lambda expression.
Then I tried to do the marshalling with dill, but it seemed __init__ was not called.
Then I tried again with pickle, but passed the value_is_not_none() function as the cond parameter - again, __init__() did not seem to be invoked and the un-marshalling failed in __setitem__() (cond is None).
Why is that? What am I missing here?
If I try to run the following code:
obj = CustomDict(cond=value_is_not_none)
obj['hello'] = ['world']
payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
obj2 = pickle.loads(payload)
it fails with
AttributeError: 'CustomDict' object has no attribute 'cond'
This is a different question than: Python, cPickle, pickling lambda functions
as I tried using dill with lambda and it failed to work, and I also tried passing a function and it also failed.
pickle is loading your dictionary data before it has restored the attributes on your instance. As such the self.cond attribute is not yet set when __setitem__ is called for the dictionary key-value pairs.
Note that pickle will never call __init__; instead it'll create an entirely blank instance and restore the __dict__ attribute namespace on that directly.
You have two options:
default to cond=None and ignore the condition if it is still set to None:
class CustomDict(dict):
    def __init__(self, cond=None):
        super().__init__()
        self.cond = cond

    def __setitem__(self, key, value):
        if getattr(self, 'cond', None) is None or self.cond(value):
            dict.__setitem__(self, key, value)
The getattr() there is needed because a blank instance has no cond attribute at all (it is not set to None, the attribute is entirely missing). You could add cond = None to the class:
class CustomDict(dict):
    cond = None
and then just test for if self.cond is None or self.cond(value):.
Define a custom __reduce__ method to control how the initial object is created when restored:
def _default_cond(v):
    return v is not None

class CustomDict(dict):
    def __init__(self, cond=_default_cond):
        super().__init__()
        self.cond = cond

    def __setitem__(self, key, value):
        if self.cond(value):
            dict.__setitem__(self, key, value)

    def __reduce__(self):
        return (CustomDict, (self.cond,), None, None, iter(self.items()))
__reduce__ is expected to return a tuple with:
A callable that can be pickled directly (here the class does fine)
A tuple of positional arguments for that callable; on unpickling the first element is called passing in the second as arguments, so by setting this to (self.cond,) we ensure that the new instance is created with cond passed in as an argument and now CustomDict.__init__() will be called.
The next 2 positions are for a __setstate__ method (ignored here) and for list-like types, so we set these to None.
The last element is an iterator for the key-value pairs that pickle then will restore for us.
Note that I replaced the default value for cond with a function here too so you don't have to rely on dill for the pickling.
I have the following custom class:
class MyArray(OrderedDict):
    def __init__(self, *args):
        OrderedDict.__init__(self, *args)

    def __getitem__(self, key):
        if not hasattr(key, '__iter__'):
            return OrderedDict.__getitem__(self, key)
        return MyArray((k, self[k]) for k in key)
This class does exactly what I want when I have multiple keys, but doesn't do what I want for single keys.
Let me demonstrate what my code outputs:
x = MyArray()
x[0] = 3
x[1] = 4
x[2] = 5
print x[1,0,2]
MyArray([(1,4),(0,3),(2,5)])
But then:
print x[1]
4
I want it to be:
MyArray([(1,4)])
Here was my attempt to fix it to act the way I want (which led to infinite recursion):
class MyArray(OrderedDict):
    def __getitem__(self, key):
        if not hasattr(key, '__iter__'):
            return MyArray({key: OrderedDict.__getitem__(self, key)})
        return MyArray((k, OrderedDict.__getitem__(self, k)) for k in key)
The key here is to realize that self[k] is the same as self.__getitem__(k), so you don't want to use self[k] inside __getitem__ unless you are in fact trying to do some recursion. Instead, always use OrderedDict.__getitem__(self, key).
On an unrelated note, you generally don't want to create a method that just calls the same method of the parent class, ie:
class MyArray(OrderedDict):
    def __init__(self, *args):
        OrderedDict.__init__(self, *args)
Just delete the method and python will call the parent class method for you, inheritance is awesome :).
update:
After some digging I found that you get infinite recursion when you try to print a MyArray, because OrderedDict.__repr__ calls OrderedDict.items, which then calls OrderedDict.__getitem__ (in the form of self[key]), which then calls __repr__ on each of the items... The issue here is that you're modifying __getitem__ to do something very different from what it does in the parent class. If you want this class to have the full functionality of a Python dict, you'll need to override every method that uses self[key] anywhere in its body. You can start with items, i.e. something like:
def items(self):
    'od.items() -> list of (key, value) pairs in od'
    return [(key, OrderedDict.__getitem__(self, key)) for key in self]
When you hit this kind of thing it's often better to drop the subclassing and just have the OrderedDict be an attribute of the new class, something like:
class MyArray(object):
    def __init__(self, *args):
        self.data = OrderedDict(*args)

    def __getitem__(self, key):
        if not hasattr(key, '__iter__'):
            return MyArray([(key, self.data[key])])
        return MyArray([(k, self.data[k]) for k in key])
The infinite recursion was happening in self.items() as Bi Rico pointed out.
Here is a code that works (essentially overrides self.items())
class MyArray(OrderedDict):
    def __getitem__(self, key):
        if not hasattr(key, '__iter__'):
            return MyArray({key: OrderedDict.__getitem__(self, key)})
        return MyArray((k, OrderedDict.__getitem__(self, k)) for k in key)

    def items(self):
        '''
        called when the dictionary is printed
        '''
        return [(k, OrderedDict.__getitem__(self, k)) for k in self]
The code above would have worked without the items definition if I had inherited from dict instead of OrderedDict.
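A short run-through of that final class, using assertions rather than print so it works under both Python 2 and 3. One caveat for Python 3: strings also have __iter__ there, so a string key would be iterated character by character by this hasattr check.

```python
from collections import OrderedDict

class MyArray(OrderedDict):
    def __getitem__(self, key):
        if not hasattr(key, '__iter__'):
            return MyArray({key: OrderedDict.__getitem__(self, key)})
        return MyArray((k, OrderedDict.__getitem__(self, k)) for k in key)

    def items(self):
        # bypass self[key] so printing/iteration does not recurse
        return [(k, OrderedDict.__getitem__(self, k)) for k in self]

x = MyArray()
x[0] = 3
x[1] = 4
x[2] = 5

sub = x[1, 0, 2]           # tuple key -> sub-array in the given order
assert isinstance(sub, MyArray)
assert sub.items() == [(1, 4), (0, 3), (2, 5)]
assert x[1].items() == [(1, 4)]   # single key also wrapped in MyArray
```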
I had to write a class of some sort that overrides __getattribute__. Basically, my class is a container which saves every user-added property to self._meta, which is a dictionary.
class Container(object):
    def __init__(self, **kwargs):
        super(Container, self).__setattr__('_meta', OrderedDict())
        #self._meta = OrderedDict()
        super(Container, self).__setattr__('_hasattr', lambda key: key in self._meta)
        for attr, value in kwargs.iteritems():
            self._meta[attr] = value

    def __getattribute__(self, key):
        try:
            return super(Container, self).__getattribute__(key)
        except:
            if key in self._meta:
                return self._meta[key]
            else:
                raise AttributeError, key

    def __setattr__(self, key, value):
        self._meta[key] = value
#usage:
>>> a = Container()
>>> a
<__main__.Container object at 0x0000000002B2DA58>
>>> a.abc = 1 #set an attribute
>>> a._meta
OrderedDict([('abc', 1)]) #attribute is in ._meta dictionary
I have some classes which inherit the Container base class, and some of their methods have the @property decorator.
class Response(Container):
    @property
    def rawtext(self):
        if self._hasattr("value") and self.value is not None:
            _raw = self.__repr__()
            _raw += "|%s" % (self.value.encode("utf-8"))
            return _raw
The problem is that .rawtext isn't accessible (I get an AttributeError). Every key in ._meta is accessible, and every attribute added by __setattr__ of the object base class is accessible, but the methods turned into properties by the @property decorator aren't. I think it has to do with my way of overriding __getattribute__ in the Container base class. What should I do to make properties from @property accessible?
I think you should probably look at __getattr__ instead of __getattribute__ here. The difference is this: __getattribute__ is called unconditionally if it exists -- __getattr__ is only called if Python can't find the attribute via other means.
I completely agree with mgilson. If you want a sample code which should be equivalent to your code but work well with properties you can try:
class Container(object):
    def __init__(self, **kwargs):
        self._meta = OrderedDict()
        #self._hasattr = lambda key: key in self._meta  #???
        for attr, value in kwargs.iteritems():
            self._meta[attr] = value

    def __getattr__(self, key):
        try:
            return self._meta[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        if key in ('_meta', '_hasattr'):
            super(Container, self).__setattr__(key, value)
        else:
            self._meta[key] = value
I really do not understand your _hasattr attribute. You put it as an attribute but it's actually a function that has access to self... shouldn't it be a method?
Actually, I think you should simply use the built-in function hasattr:
class Response(Container):
    @property
    def rawtext(self):
        if hasattr(self, 'value') and self.value is not None:
            _raw = self.__repr__()
            _raw += "|%s" % (self.value.encode("utf-8"))
            return _raw
Note that hasattr(container, attr) will return True also for _meta.
Another thing that puzzles me is why you use an OrderedDict. I mean, you iterate over kwargs, and that iteration has random order since it's a normal dict, and add the items to the OrderedDict. Now you have _meta containing the values in random order.
If you aren't sure whether you need to have a specific order or not, simply use dict and eventually swap to OrderedDict later.
By the way: never ever use a try: ... except: without specifying the exception to catch. In your code you actually wanted to catch only AttributeError, so you should have done:
try:
    return super(Container, self).__getattribute__(key)
except AttributeError:
    # stuff
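Putting mgilson's suggestion together, here is a rough Python 3 rendition (items instead of iteritems, no utf-8 encode, and a simplified rawtext body) showing that a @property on a subclass is reachable once __getattr__ is used:

```python
from collections import OrderedDict

class Container(object):
    def __init__(self, **kwargs):
        self._meta = OrderedDict()
        for attr, value in kwargs.items():
            self._meta[attr] = value

    def __getattr__(self, key):
        # only reached when normal lookup fails, so properties
        # defined on subclasses are found first
        try:
            return self._meta[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        if key == '_meta':
            super(Container, self).__setattr__(key, value)
        else:
            self._meta[key] = value

class Response(Container):
    @property
    def rawtext(self):
        if hasattr(self, 'value') and self.value is not None:
            return self.__repr__() + '|%s' % self.value
        return None

r = Response(value='hello')
```

Here r.value goes through __getattr__ into _meta, while r.rawtext is resolved by the normal (type-level) lookup before __getattr__ is ever consulted.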
I am debugging some code and I want to find out when a particular dictionary is accessed. Well, it's actually a class that subclasses dict and implements a couple of extra features. Anyway, what I would like to do is subclass dict myself and override __getitem__ and __setitem__ to produce some debugging output. Right now, I have
class DictWatch(dict):
    def __init__(self, *args):
        dict.__init__(self, args)

    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        log.info("GET %s['%s'] = %s" % str(dict.get(self, 'name_label')), str(key), str(val)))
        return val

    def __setitem__(self, key, val):
        log.info("SET %s['%s'] = %s" % str(dict.get(self, 'name_label')), str(key), str(val)))
        dict.__setitem__(self, key, val)
'name_label' is a key which will eventually be set that I want to use to identify the output. I have then changed the class I am instrumenting to subclass DictWatch instead of dict and changed the call to the superconstructor. Still, nothing seems to be happening. I thought I was being clever, but I wonder if I should be going a different direction.
Thanks for the help!
Another issue when subclassing dict is that the built-in __init__ doesn't call update, and the built-in update doesn't call __setitem__. So, if you want all setitem operations to go through your __setitem__ function, you should make sure that it gets called yourself:
class DictWatch(dict):
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        print('GET', key)
        return val

    def __setitem__(self, key, val):
        print('SET', key, val)
        dict.__setitem__(self, key, val)

    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)

    def update(self, *args, **kwargs):
        print('update', args, kwargs)
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
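A condensed version of the class above, run end to end, showing that the constructor, update, and item assignment all funnel through __setitem__:

```python
class DictWatch(dict):
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)   # route initial items through __setitem__

    def __setitem__(self, key, val):
        print('SET', key, val)
        dict.__setitem__(self, key, val)

    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v                # self[k] = v calls our __setitem__

d = DictWatch({'a': 1}, b=2)   # prints SET for 'a' and 'b'
d['c'] = 3                     # prints SET for 'c'
```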
What you're doing should absolutely work. I tested out your class, and aside from a missing opening parenthesis in your log statements, it works just fine. There are only two things I can think of. First, is the output of your log statement set correctly? You might need to put a logging.basicConfig(level=logging.DEBUG) at the top of your script.
Second, __getitem__ and __setitem__ are only called during [] accesses. So make sure you only access DictWatch via d[key], rather than d.get() and d.set()
Consider subclassing UserDict or UserList. These classes are intended to be subclassed whereas the normal dict and list are not, and contain optimisations.
That should not really change the result (which should work, given suitable logging threshold values), but your __init__ should be:

def __init__(self, *args, **kwargs):
    dict.__init__(self, *args, **kwargs)

because if you call your constructor with DictWatch([(1,2),(2,3)]) or DictWatch(a=1, b=2), the original version will fail.
(Or, better, don't define a constructor for this at all.)
As Andrew Pate's answer proposed, subclassing collections.UserDict instead of dict is much less error prone.
Here is an example showing an issue when inheriting dict naively:
class MyDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 10)

d = MyDict(a=1, b=2)  # Bad! MyDict.__setitem__ not called
d.update(c=3)         # Bad! MyDict.__setitem__ not called
d['d'] = 4            # Good!
print(d)              # {'a': 1, 'b': 2, 'c': 3, 'd': 40}
UserDict inherits from collections.abc.MutableMapping, so this works as expected:
class MyDict(collections.UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 10)

d = MyDict(a=1, b=2)  # Good: MyDict.__setitem__ correctly called
d.update(c=3)         # Good: MyDict.__setitem__ correctly called
d['d'] = 4            # Good
print(d)              # {'a': 10, 'b': 20, 'c': 30, 'd': 40}
Similarly, you only have to implement __getitem__ to automatically be compatible with key in my_dict, my_dict.get, …
Note: UserDict is not a subclass of dict, so isinstance(UserDict(), dict) will fail (but isinstance(UserDict(), collections.abc.MutableMapping) will work).
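Running the UserDict variant end to end (with assertions instead of prints) confirms both points: every write path hits the override, and the instance is a MutableMapping but not a dict:

```python
import collections
import collections.abc

class MyDict(collections.UserDict):
    def __setitem__(self, key, value):
        # constructor, update(), and d[k] = v all land here
        super().__setitem__(key, value * 10)

d = MyDict(a=1, b=2)
d.update(c=3)
d['d'] = 4

assert dict(d) == {'a': 10, 'b': 20, 'c': 30, 'd': 40}
assert not isinstance(d, dict)
assert isinstance(d, collections.abc.MutableMapping)
```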
All you will have to do is
class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)
A sample usage for my personal use
### EXAMPLE
import collections.abc

class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)

    def __setitem__(self, key, item):
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, collections.abc.Iterable)):
            # self.__dict__[key] = item
            super(BatchCollection, self).__setitem__(key, item)
        else:
            raise Exception(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")
Note: tested only in python3
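A possible usage sketch, using collections.abc.Iterable (the bare collections.Iterable alias was removed in Python 3.10) and raising TypeError rather than a bare Exception. As discussed earlier on this page, __init__ and update still bypass this __setitem__:

```python
import collections.abc

class BatchCollection(dict):
    def __setitem__(self, key, item):
        # accept only (database_name, table_name) -> iterable pairs
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, collections.abc.Iterable)):
            super().__setitem__(key, item)
        else:
            raise TypeError(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")

bc = BatchCollection()
bc[('mydb', 'mytable')] = [1, 2, 3]   # accepted
try:
    bc['mydb'] = [1, 2, 3]            # rejected: key is not a 2-tuple
    rejected = False
except TypeError:
    rejected = True
```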