How to properly subclass dict and override __getitem__ & __setitem__ - python

I am debugging some code and I want to find out when a particular dictionary is accessed. Well, it's actually a class that subclasses dict and implements a couple of extra features. Anyway, what I would like to do is subclass dict myself and override __getitem__ and __setitem__ to produce some debugging output. Right now, I have:
class DictWatch(dict):
    def __init__(self, *args):
        dict.__init__(self, args)
    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        log.info("GET %s['%s'] = %s" % str(dict.get(self, 'name_label')), str(key), str(val)))
        return val
    def __setitem__(self, key, val):
        log.info("SET %s['%s'] = %s" % str(dict.get(self, 'name_label')), str(key), str(val)))
        dict.__setitem__(self, key, val)
'name_label' is a key, set at some point, that I want to use to identify the output. I then changed the class I am instrumenting to subclass DictWatch instead of dict, and changed the call to the superclass constructor. Still, nothing seems to happen. I thought I was being clever, but I wonder if I should be going in a different direction.
Thanks for the help!

Another issue when subclassing dict is that the built-in __init__ doesn't call update, and the built-in update doesn't call __setitem__. So, if you want all setitem operations to go through your __setitem__ function, you should make sure that it gets called yourself:
class DictWatch(dict):
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)
    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        print('GET', key)
        return val
    def __setitem__(self, key, val):
        print('SET', key, val)
        dict.__setitem__(self, key, val)
    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)
    def update(self, *args, **kwargs):
        print('update', args, kwargs)
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
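To check which operations actually reach the override, here is a small sketch that records calls instead of printing them (the class name RecordingDictWatch and its calls list are illustrative, not part of the answer). Construction and update() now go through __setitem__, but other dict methods such as setdefault() still write directly in CPython, so they would need their own override.

```python
class RecordingDictWatch(dict):
    """Illustrative variant of DictWatch that records SET calls in a list."""

    def __init__(self, *args, **kwargs):
        self.calls = []  # create the log before update() triggers __setitem__
        self.update(*args, **kwargs)

    def __setitem__(self, key, val):
        self.calls.append(('SET', key, val))
        dict.__setitem__(self, key, val)

    def update(self, *args, **kwargs):
        # Route every pair through __setitem__, as in the answer above
        for k, v in dict(*args, **kwargs).items():
            self[k] = v


d = RecordingDictWatch({'a': 1}, b=2)
d.update(c=3)
d.setdefault('e', 5)  # dict.setdefault stores the value without calling __setitem__
print(d.calls)        # [('SET', 'a', 1), ('SET', 'b', 2), ('SET', 'c', 3)]
```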

What you're doing should absolutely work. I tested your class, and aside from a missing opening parenthesis in your log statements, it works just fine. There are only two things I can think of. First, is your logging output configured correctly? You might need to put logging.basicConfig(level=logging.DEBUG) at the top of your script.
Second, __getitem__ and __setitem__ are only called during [] accesses. So make sure you only access DictWatch via d[key], rather than d.get() or other methods such as d.update() and d.setdefault(), which bypass your overrides.
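A minimal sketch of the question's class with logging configured, sidestepping the parenthesis problem by passing the values as logging arguments instead of %-formatting them by hand (the logger obtained via logging.getLogger(__name__) and the sample key are assumptions). Only the [] accesses produce log lines; d.get() does not.

```python
import logging

logging.basicConfig(level=logging.DEBUG)  # root logger defaults to WARNING, hiding INFO
log = logging.getLogger(__name__)


class DictWatch(dict):
    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        # dict.get(self, ...) calls the base method, so it does not recurse into __getitem__
        log.info("GET %s['%s'] = %s", dict.get(self, 'name_label'), key, val)
        return val

    def __setitem__(self, key, val):
        log.info("SET %s['%s'] = %s", dict.get(self, 'name_label'), key, val)
        dict.__setitem__(self, key, val)


d = DictWatch()
d['name_label'] = 'vm1'  # logged (SET)
d['name_label']          # logged (GET)
d.get('name_label')      # not logged: dict.get does not call __getitem__
```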

Consider subclassing collections.UserDict or collections.UserList. These classes are intended to be subclassed, whereas the built-in dict and list contain optimisations that skip overridden methods.
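A sketch of the question's DictWatch rebuilt on collections.UserDict (printing instead of logging, for brevity). Because UserDict is written in Python on top of MutableMapping, the constructor, update() and get() all go through the two overridden methods.

```python
from collections import UserDict


class DictWatch(UserDict):
    def __getitem__(self, key):
        val = super().__getitem__(key)
        print('GET', key, val)
        return val

    def __setitem__(self, key, val):
        print('SET', key, val)
        super().__setitem__(key, val)


d = DictWatch({'a': 1}, b=2)  # prints SET a 1, then SET b 2
d.update(c=3)                 # prints SET c 3
d.get('a')                    # prints GET a 1
```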

This should not really change the result (your code should work, given a suitable logging threshold), but your __init__ should be:
def __init__(self, *args, **kwargs):
    dict.__init__(self, *args, **kwargs)
instead, because with the current version DictWatch([(1,2),(2,3)]) silently builds the wrong dict ({(1, 2): (2, 3)}) and DictWatch(a=1,b=2) raises a TypeError.
(Or, better, don't define a constructor at all.)
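A quick sketch of the suggested constructor, showing that both call styles now build the expected dict. As another answer points out, dict.__init__ still fills the dict without calling an overridden __setitem__.

```python
class DictWatch(dict):
    # Forward everything, so DictWatch accepts the same arguments as dict()
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)


a = DictWatch([(1, 2), (2, 3)])
b = DictWatch(a=1, b=2)
print(a, b)  # {1: 2, 2: 3} {'a': 1, 'b': 2}
```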

As Andrew Pate's answer proposed, subclassing collections.UserDict instead of dict is much less error-prone.
Here is an example showing an issue when inheriting dict naively:
class MyDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 10)
d = MyDict(a=1, b=2)  # Bad! MyDict.__setitem__ not called
d.update(c=3)  # Bad! MyDict.__setitem__ not called
d['d'] = 4  # Good!
print(d)  # {'a': 1, 'b': 2, 'c': 3, 'd': 40}
UserDict inherits from collections.abc.MutableMapping, so this works as expected:
import collections
class MyDict(collections.UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 10)
d = MyDict(a=1, b=2)  # Good: MyDict.__setitem__ correctly called
d.update(c=3)  # Good: MyDict.__setitem__ correctly called
d['d'] = 4  # Good
print(d)  # {'a': 10, 'b': 20, 'c': 30, 'd': 40}
Similarly, you only have to implement __getitem__ to automatically be compatible with key in my_dict, my_dict.get, …
Note: UserDict is not a subclass of dict, so isinstance(UserDict(), dict) returns False (but isinstance(UserDict(), collections.abc.MutableMapping) returns True).
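A sketch contrasting the two base classes (the Doubling class names are hypothetical): overriding only __getitem__ changes what get() returns on a UserDict subclass but not on a plain dict subclass, and the isinstance checks behave as described.

```python
import collections
from collections.abc import MutableMapping


class Doubling(dict):
    def __getitem__(self, key):
        return dict.__getitem__(self, key) * 2


class DoublingUser(collections.UserDict):
    def __getitem__(self, key):
        return super().__getitem__(key) * 2


plain = Doubling(a=1)
user = DoublingUser(a=1)
print(plain['a'], plain.get('a'))  # 2 1  (dict.get ignores the override)
print(user['a'], user.get('a'))    # 2 2  (UserDict's get goes through __getitem__)
print(isinstance(user, dict), isinstance(user, MutableMapping))  # False True
```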

All you will have to do is
class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)
A sample usage for my personal use
### EXAMPLE
import collections.abc
class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)
    def __setitem__(self, key, item):
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, collections.abc.Iterable)):
            # self.__dict__[key] = item
            super(BatchCollection, self).__setitem__(key, item)
        else:
            raise Exception(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")
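Note: tested only in Python 3. The collections.Iterable alias was removed in Python 3.10, so use collections.abc.Iterable there.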
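One gap in the example above: dict.__init__ does not call the overridden __setitem__, so BatchCollection({'bad': 1}) is accepted without validation. A sketch that routes construction through the same check (raising ValueError rather than a bare Exception, and the inpt=None default, are choices made here, not the original author's):

```python
from collections.abc import Iterable


class BatchCollection(dict):
    def __init__(self, inpt=None):
        super().__init__()
        # dict.__init__(inpt) would skip __setitem__, so feed the items through update()
        self.update(inpt or {})

    def update(self, *args, **kwargs):
        # Send every pair through __setitem__ so it gets validated
        for key, item in dict(*args, **kwargs).items():
            self[key] = item

    def __setitem__(self, key, item):
        if not (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, Iterable)):
            raise ValueError(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")
        super().__setitem__(key, item)


batches = BatchCollection({('db', 'users'): [1, 2, 3]})
```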

Related

Custom key function for python defaultdict

What is a good way to define a custom key function analogous to the key argument to list.sort, for use in a collections.defaultdict?
Here's an example use case:
import collections
class Path(object):
    def __init__(self, start, end, *other_features):
        self._first = start
        self._last = end
        self._rest = other_features
    def startpoint(self):
        return self._first
    def endpoint(self):
        return self._last
    # Maybe it has __eq__ and __hash__, maybe not
paths = [... a list of Path objects ...]
by_endpoint = collections.defaultdict(list)
for p in paths:
    by_endpoint[p.endpoint()].append(p)
# do stuff that depends on lumping paths with the same endpoint together
What I want is a way to tell by_endpoint to use Path.endpoint as the key function, similar to the key argument to list.sort, without putting this key definition into the Path class itself (via __eq__ and __hash__), since it is just as sensible to support "lumping by start point".
Something like this maybe:
from collections import defaultdict
class defaultkeydict(defaultdict):
    def __init__(self, default_factory, key=lambda x: x, *args, **kwargs):
        defaultdict.__init__(self, default_factory, *args, **kwargs)
        self.key_func = key
    def __getitem__(self, key):
        return defaultdict.__getitem__(self, self.get_key(key))
    def __setitem__(self, key, value):
        defaultdict.__setitem__(self, self.get_key(key), value)
    def get_key(self, key):
        try:
            return self.key_func(key)
        except Exception:
            return key
Note the logic that falls back to the passed-in key if the key function can't be executed. That way you can still access the items using strings or whatever keys.
Now:
p = Path("Seattle", "Boston")
d = defaultkeydict(list, key=lambda x: x.endpoint())
d[p].append(p)
print(d) # defaultdict(<type 'list'>, {'Boston': [<__main__.Path object at ...>]})
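If subclassing is not a requirement, a plain helper that takes the key function as an argument, the way list.sort does, may be simpler. This is a different technique from the subclass above; group_by is a hypothetical name, and the tuples stand in for Path objects:

```python
from collections import defaultdict


def group_by(items, key):
    """Collect items into lists keyed by key(item); a sketch, not a stdlib function."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return groups


# (start, end) tuples stand in for Path objects here
paths = [('Seattle', 'Boston'), ('Denver', 'Boston'), ('Seattle', 'Austin')]
by_endpoint = group_by(paths, key=lambda p: p[1])
by_startpoint = group_by(paths, key=lambda p: p[0])
print(dict(by_endpoint))
# {'Boston': [('Seattle', 'Boston'), ('Denver', 'Boston')], 'Austin': [('Seattle', 'Austin')]}
```

With the Path class from the question, group_by(paths, Path.endpoint) and group_by(paths, Path.startpoint) would give the two groupings without touching __eq__ or __hash__.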

Python 2 __missing__ method

I wrote a very simple program to subclass a dictionary. I wanted to try the __missing__ method in Python.
After some research I found out that in Python 2 it's available in defaultdict. (In Python 3 we use collections.UserDict, though.)
The __getitem__ method is the one responsible for calling the __missing__ method if the key isn't found.
When I implement __getitem__ in the following program I get a KeyError, but when I leave it out, I get the desired value.
import collections
class DictSubclass(collections.defaultdict):
    def __init__(self,dic):
        if dic is None:
            self.data = None
        else:
            self.data = dic
    def __setitem__(self,key,value):
        self.data[key] = value
    ########################
    def __getitem__(self,key):
        return self.data[key]
    ########################
    def __missing__(self,key):
        self.data[key] = None
dic = {'a':4,'b':10}
d1 = DictSubclass(dic)
d2 = DictSubclass(None)
print d1[2]
I thought I needed to implement __getitem__ since it's responsible for calling __missing__. I understand that the class definition of defaultdict has a __getitem__ method. But even so, say I wanted to write my own __getitem__, how would I do it?
The dict type will always try to call __missing__. All that defaultdict does is provide an implementation; if you are providing your own __missing__ method you don't have to subclass defaultdict at all.
See the dict documentation:
d[key]
Return the item of d with key key. Raises a KeyError if key is not in the map.
If a subclass of dict defines a method __missing__() and key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call. No other operations or methods invoke __missing__().
However, you need to leave the default __getitem__ method in place, or at least call it. If you override dict.__getitem__ with your own version and do not call the base implementation, __missing__ is never called.
You could call __missing__ from your own implementation:
def __getitem__(self, key):
    if key not in self.data:
        return self.__missing__(key)
    return self.data[key]
or you could call the original implementation:
def __getitem__(self, key):
    if key not in self.data:
        return super(DictSubclass, self).__getitem__(key)
    return self.data[key]
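The same idea in Python 3, as a sketch that stores items in the dict itself instead of in a separate self.data (a simplification of the question's design): the custom __getitem__ does its own work and then delegates to super().__getitem__, which calls __missing__ on a miss.

```python
class DictSubclass(dict):
    def __getitem__(self, key):
        print('looking up', key)         # your own behaviour goes here
        return super().__getitem__(key)  # dict.__getitem__ calls __missing__ on a miss

    def __missing__(self, key):
        self[key] = None                 # store a default, as the question does
        return None


d = DictSubclass({'a': 4, 'b': 10})
print(d[2])  # prints "looking up 2", then None
print(d)     # {'a': 4, 'b': 10, 2: None}
```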
In Python 2, you can just subclass UserDict.UserDict:
from UserDict import UserDict
class DictSubclass(UserDict):
    def __missing__(self, key):
        self.data[key] = None
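In Python 3 the equivalent base class is collections.UserDict, whose __getitem__ also calls __missing__ for absent keys; a minimal sketch (returning None explicitly is a stylistic choice here):

```python
from collections import UserDict


class DictSubclass(UserDict):
    def __missing__(self, key):
        # UserDict.__getitem__ calls this when key is not in self.data
        self.data[key] = None
        return None


d1 = DictSubclass({'a': 4, 'b': 10})
print(d1[2])    # None
print(d1.data)  # {'a': 4, 'b': 10, 2: None}
```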

Custom OrderedDict that returns itself

I have the following custom class:
class MyArray (OrderedDict):
    def __init__ (self,*args):
        OrderedDict.__init__(self,*args)
    def __getitem__ (self, key):
        if not hasattr (key, '__iter__'):
            return OrderedDict.__getitem__ (self,key)
        return MyArray((k,self[k]) for k in key)
This class does exactly what I want when I have multiple keys, but not for single keys.
Let me demonstrate what my code outputs:
x = MyArray()
x[0] = 3
x[1] = 4
x[2] = 5
print x[1,0,2]
MyArray([(1,4),(0,3),(2,5)])
But then:
print x[1]
4
I want it to be:
MyArray([(1,4)])
Here is my attempt to fix it to act the way I want (which led to infinite recursion):
class MyArray (OrderedDict):
    def __getitem__ (self, key):
        if not hasattr (key, '__iter__'):
            return MyArray({key:OrderedDict.__getitem__ (self,key)})
        return MyArray((k,OrderedDict.__getitem__ (self,k)) for k in key)
The key here is to realize that self[k] is the same as self.__getitem__(k), so you don't want to use self[k] inside __getitem__ unless you actually want recursion. Instead, always use OrderedDict.__getitem__(self, key).
On an unrelated note, you generally don't want to create a method that just calls the same method of the parent class, ie:
class MyArray (OrderedDict):
    def __init__ (self,*args):
        OrderedDict.__init__(self,*args)
Just delete the method and python will call the parent class method for you, inheritance is awesome :).
update:
After some digging I found that you get infinite recursion when you try to print a MyArray, because OrderedDict.__repr__ calls OrderedDict.items, which then calls OrderedDict.__getitem__ (in the form of self[key]); that now returns a new MyArray, whose __repr__ is called in turn, and so on. The issue here is that you're modifying __getitem__ to do something very different from what it does in the parent class. If you want this class to have the full functionality of its parent class, you'll need to override every method that uses self[key]. You can start with items, i.e. something like:
def items(self):
    'od.items() -> list of (key, value) pairs in od'
    return [(key, OrderedDict.__getitem__(self, key)) for key in self]
When you hit this kind of thing it's often better to drop the subclassing and just have the OrderedDict be an attribute of the new class, something like:
class MyArray(object):
    def __init__(self, *args):
        self.data = OrderedDict(*args)
    def __getitem__(self, key):
        if not hasattr (key, '__iter__'):
            return MyArray([(key, self.data[key])])
        return MyArray([(k, self.data[k]) for k in key])
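A Python 3 sketch of this composition approach, adding the __setitem__ and __repr__ that the question's demo needs. It tests for tuple or list keys with isinstance rather than hasattr(key, '__iter__'), because in Python 3 strings also have __iter__; the trade-off (an assumption of this sketch) is that a tuple can no longer be used as a single key.

```python
from collections import OrderedDict


class MyArray:
    """Wraps an OrderedDict; indexing always returns a new MyArray."""

    def __init__(self, *args):
        self.data = OrderedDict(*args)

    def __getitem__(self, key):
        # A tuple or list selects several keys; anything else is a single key
        keys = key if isinstance(key, (tuple, list)) else [key]
        return MyArray((k, self.data[k]) for k in keys)

    def __setitem__(self, key, value):
        self.data[key] = value

    def __repr__(self):
        return 'MyArray(%r)' % list(self.data.items())


x = MyArray()
x[0] = 3
x[1] = 4
x[2] = 5
print(x[1, 0, 2])  # MyArray([(1, 4), (0, 3), (2, 5)])
print(x[1])        # MyArray([(1, 4)])
```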
The infinite recursion was happening in self.items() as Bi Rico pointed out.
Here is a code that works (essentially overrides self.items())
class MyArray (OrderedDict):
    def __getitem__ (self, key):
        if not hasattr (key, '__iter__'):
            return MyArray({key:OrderedDict.__getitem__ (self,key)})
        return MyArray((k,OrderedDict.__getitem__ (self,k)) for k in key)
    def items(self):
        '''
        called when the dictionary is printed
        '''
        return [(k, OrderedDict.__getitem__(self, k)) for k in self]
The code above would have worked without the items definition if I had inherited from dict instead of OrderedDict.

Subclassing dict causes error TypeError: argument of type 'type' is not iterable

I'm trying to subclass dict but I get the error in the subject when trying to override the __getitem__ function. I obtained the initial implementation of the derived class from this post.
The idea is to check if a key exists (a key is a tuple of 2 strings), and if not add the key to the dictionary and return its value. The values of the dictionary are obtained by calling eval() on the concatenated strings of the key. By the way, I know of the existence of default dictionaries (which may help in this case), but I wanted to do it differently. This is the code
class DictSubclass(dict):
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)
    def __getitem__(self, key):
        if not key in dict:
            fn = '{}_{}'.format(key[0], key[1])
            dict.__setitem__(key,eval(fn))
        val = dict.__getitem__(self, key)
        return val
    def __setitem__(self, key, val):
        dict.__setitem__(self, key, val)
    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)
    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
Why am I getting the following error?
line 21, in __getitem__
if not key in dict:
TypeError: argument of type 'type' is not iterable
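You should have used if not key in self:. As written, if not key in dict: asks whether the key is in the class dict itself, and since a class object is not iterable, the test fails with this TypeError. The next line has a second bug: dict.__setitem__(key, eval(fn)) is missing self and should be dict.__setitem__(self, key, eval(fn)).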
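A sketch of the corrected method, keeping the eval() approach from the question (foo_bar is a hypothetical module-level name for eval to find; eval() executes arbitrary code, so only use it with trusted keys):

```python
class DictSubclass(dict):
    def __getitem__(self, key):
        if key not in self:  # membership test on the instance, not on the class
            fn = '{}_{}'.format(key[0], key[1])
            # eval() resolves the name in this module's globals; trusted input only
            dict.__setitem__(self, key, eval(fn))  # note the explicit self
        return dict.__getitem__(self, key)


foo_bar = 42  # hypothetical value that eval('foo_bar') will find
d = DictSubclass()
print(d['foo', 'bar'])  # 42
```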

Correct usage of a getter/setter for dictionary values

I'm pretty new to Python, so if there's anything here that's flat-out bad, please point it out.
I have an object with this dictionary:
traits = {'happy': 0, 'worker': 0, 'honest': 0}
The value for each trait should be an int in the range 1-10, and new traits should not be allowed to be added. I want getter/setters so I can make sure these constraints are being kept. Here's how I made the getter and setter now:
def getTrait(self, key):
    if key not in self.traits.keys():
        raise KeyError
    return self.traits[key]
def setTrait(self, key, value):
    if key not in self.traits.keys():
        raise KeyError
    value = int(value)
    if value < 1 or value > 10:
        raise ValueError
    self.traits[key] = value
I read on this website about the property() method. But I don't see an easy way to make use of it for getting/setting the values inside the dictionary. Is there a better way to do this? Ideally I would like the usage of this object to be obj.traits['happy'] = 14, which would invoke my setter method and throw a ValueError since 14 is over 10.
If you are willing to use syntax like obj['happy'] = 14 then you could use __getitem__ and __setitem__:
def __getitem__(self, key):
    if key not in self.traits.keys():
        raise KeyError
    ...
    return self.traits[key]
def __setitem__(self, key, value):
    if key not in self.traits.keys():
        raise KeyError
    ...
    self.traits[key] = value
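A sketch of the obj['happy'] = 14 style as a complete class (the Person name and the starting values of 1, chosen because 0 is outside the 1-10 range, are assumptions):

```python
class Person:
    def __init__(self):
        self.traits = {'happy': 1, 'worker': 1, 'honest': 1}

    def __getitem__(self, key):
        return self.traits[key]  # unknown traits raise KeyError on their own

    def __setitem__(self, key, value):
        if key not in self.traits:
            raise KeyError(key)  # refuse to add new traits
        value = int(value)
        if not 1 <= value <= 10:
            raise ValueError('{} not in range [1,10]'.format(value))
        self.traits[key] = value


p = Person()
p['happy'] = 7
print(p['happy'])  # 7
```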
If you really do want obj.traits['happy'] = 14 then you could define a subclass of dict and make obj.traits an instance of this subclass.
The subclass would then override __getitem__ and __setitem__ (see below).
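PS. To subclass dict so that update() goes through your __setitem__, inherit from both collections.abc.MutableMapping and dict, in that order; otherwise dict.update would not call the new __setitem__. (The old alias collections.MutableMapping was removed in Python 3.10.) The constructor still resolves to dict.__init__, so the initial values are not validated, which is why the 0 values below are accepted.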
import collections.abc
class TraitsDict(collections.abc.MutableMapping, dict):
    def __getitem__(self, key):
        return dict.__getitem__(self, key)
    def __setitem__(self, key, value):
        value = int(value)
        if not 1 <= value <= 10:
            raise ValueError('{v} not in range [1,10]'.format(v=value))
        dict.__setitem__(self, key, value)
    def __delitem__(self, key):
        dict.__delitem__(self, key)
    def __iter__(self):
        return dict.__iter__(self)
    def __len__(self):
        return dict.__len__(self)
    def __contains__(self, x):
        return dict.__contains__(self, x)
class Person(object):
    def __init__(self):
        self.traits = TraitsDict({'happy': 0, 'worker': 0, 'honest': 0})
p = Person()
print(p.traits['happy'])
# 0
p.traits['happy'] = 1
print(p.traits['happy'])
# 1
p.traits['happy'] = 14
# ValueError: 14 not in range [1,10]
Some obvious tips come to my mind first:
Do not use the .keys() method when checking for the existence of a key (instead of if key not in self.traits.keys(), use if key not in self.traits).
Do not explicitly raise KeyError when reading: it is raised anyway if you try to access a nonexistent key.
Your code could look like this after the above changes:
def getTrait(self, key):
    return self.traits[key]
def setTrait(self, key, value):
    if key not in self.traits:
        raise KeyError
    value = int(value)
    if value < 1 or value > 10:
        raise ValueError
    self.traits[key] = value
PS. I did not check your code thoroughly for correctness; there may be other issues.
and new traits should not be allowed to be added.
The natural way to do this is to use an object instead of a dictionary, and set the class' __slots__.
The value for each trait should be an int in the range 1-10... I want getter/setters so I can make sure these constraints are being kept.
The natural way to do this is to use an object instead of a dictionary, so that you can write getter/setter logic that's part of the class, and wrap them up as properties. Since all these properties will work the same way, we can do some refactoring to write code that generates a property given an attribute name.
The following is probably over-engineered:
def one_to_ten(attr):
    def get(obj): return getattr(obj, attr)
    def set(obj, val):
        val = int(val)
        if not 1 <= val <= 10: raise ValueError
        setattr(obj, attr, val)
    return property(get, set)
def create_traits_class(*traits):
    class Traits(object):
        __slots__ = ['_' + trait for trait in traits]
        for trait in traits: locals()[trait] = one_to_ten('_' + trait)
        def __init__(self, **kwargs):
            for k, v in kwargs.items(): setattr(self, k, v)
            for trait in traits: assert hasattr(self, trait), "Missing trait in init"
        def __repr__(self):
            return 'Traits(%s)' % ', '.join(
                '%s = %s' % (trait, getattr(self, trait)) for trait in traits
            )
    return Traits
example_type = create_traits_class('happy', 'worker', 'honest')
example_instance = example_type(happy=3, worker=8, honest=4)
# and you can set the .traits of some other object to example_instance.
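A less generic alternative to the generated class, sketched with a descriptor that uses __set_name__ (Python 3.6+) instead of property(); the OneToTen name and the fixed list of traits are illustrative. __slots__ still prevents new attributes from being added:

```python
class OneToTen:
    """Descriptor that stores an int in 1..10 under a private slot name."""

    def __set_name__(self, owner, name):
        self.private = '_' + name  # e.g. 'happy' is stored in slot '_happy'

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private)

    def __set__(self, obj, value):
        value = int(value)
        if not 1 <= value <= 10:
            raise ValueError('{} not in range [1,10]'.format(value))
        setattr(obj, self.private, value)


class Traits:
    __slots__ = ('_happy', '_worker', '_honest')  # no other attributes allowed
    happy = OneToTen()
    worker = OneToTen()
    honest = OneToTen()

    def __init__(self, happy, worker, honest):
        self.happy, self.worker, self.honest = happy, worker, honest


t = Traits(happy=3, worker=8, honest=4)
t.worker = '9'   # converted with int(), like the original setter
print(t.worker)  # 9
```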
