Custom OrderedDict that returns itself - python

I have the following custom class:
class MyArray(OrderedDict):
    def __init__(self, *args):
        OrderedDict.__init__(self, *args)
    def __getitem__(self, key):
        if not hasattr(key, '__iter__'):
            return OrderedDict.__getitem__(self, key)
        return MyArray((k, self[k]) for k in key)
This class does exactly what I want when I pass multiple keys, but not when I pass a single key.
Let me demonstrate what my code outputs:
x = MyArray()
x[0] = 3
x[1] = 4
x[2] = 5
print x[1,0,2]
MyArray([(1,4),(0,3),(2,5)])
But then:
print x[1]
4
I want it to be:
MyArray([(1,4)])
Here is my attempt to make it behave the way I want (which led to infinite recursion):
class MyArray(OrderedDict):
    def __getitem__(self, key):
        if not hasattr(key, '__iter__'):
            return MyArray({key: OrderedDict.__getitem__(self, key)})
        return MyArray((k, OrderedDict.__getitem__(self, k)) for k in key)

The key here is to realize that self[k] is the same as self.__getitem__(k) so you don't want to use self[k] inside __getitem__, unless you are in fact trying to do some recursion. Instead always use OrderedDict.__getitem__ (self,key).
On an unrelated note, you generally don't want to define a method that just calls the same method of the parent class, i.e.:
class MyArray(OrderedDict):
    def __init__(self, *args):
        OrderedDict.__init__(self, *args)
Just delete the method and Python will call the parent class method for you; inheritance is awesome :).
update:
After some digging I found that you get infinite recursion when you try to print a MyArray because OrderedDict.__repr__ calls OrderedDict.items, which then calls OrderedDict.__getitem__ (in the form of self[key]), and then calls __repr__ on each of the items ... The issue here is that you're modifying __getitem__ to do something very different from what it does in the parent class. If you want this class to keep the full functionality of the parent class, you'll need to override every method that uses self[key] anywhere in its body. You can start with items, i.e. something like:
def items(self):
    'od.items() -> list of (key, value) pairs in od'
    return [(key, OrderedDict.__getitem__(self, key)) for key in self]
When you hit this kind of thing it's often better to drop the subclassing and just have the OrderedDict be an attribute of the new class, something like:
class MyArray(object):
    def __init__(self, *args):
        self.data = OrderedDict(*args)
    def __getitem__(self, key):
        if not hasattr(key, '__iter__'):
            return MyArray([(key, self.data[key])])
        return MyArray([(k, self.data[k]) for k in key])
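For reference, here is a minimal Python 3 sketch of the same composition idea. The __setitem__ and __repr__ methods are additions for illustration, and treating only tuples and lists as "several keys" is an assumption (in Python 3, strings also have __iter__, so the hasattr test above would split string keys into characters).

```python
from collections import OrderedDict


class MyArray:
    """Wraps an OrderedDict; indexing always returns another MyArray."""

    def __init__(self, *args):
        self.data = OrderedDict(*args)

    def __setitem__(self, key, value):
        self.data[key] = value

    def __getitem__(self, key):
        # Assumption: only tuples and lists are treated as several keys.
        keys = key if isinstance(key, (tuple, list)) else [key]
        return MyArray((k, self.data[k]) for k in keys)

    def __repr__(self):
        return "MyArray(%r)" % list(self.data.items())


x = MyArray()
x[0] = 3
x[1] = 4
x[2] = 5
print(x[1, 0, 2])  # MyArray([(1, 4), (0, 3), (2, 5)])
print(x[1])        # MyArray([(1, 4)])
```

Because nothing here inherits from OrderedDict, no inherited method can call the overridden __getitem__ behind your back, so the recursion problem does not arise.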

The infinite recursion was happening in self.items() as Bi Rico pointed out.
Here is code that works (it essentially overrides self.items()):
class MyArray(OrderedDict):
    def __getitem__(self, key):
        if not hasattr(key, '__iter__'):
            return MyArray({key: OrderedDict.__getitem__(self, key)})
        return MyArray((k, OrderedDict.__getitem__(self, k)) for k in key)
    def items(self):
        '''
        called when the dictionary is printed
        '''
        return [(k, OrderedDict.__getitem__(self, k)) for k in self]
The code above would have worked without the items definition if I had inherited from dict instead of OrderedDict.

Related

Should I implement __contains__ method when implementing Mapping or MutableMapping in Python?

I wrote code for a custom dictionary backed by a list.
According to the collections.abc module, I only need to implement __getitem__, __iter__ and __len__ when implementing Mapping, and I did.
But code that uses the in statement, which calls __contains__ internally, always seems to return True.
Should I write a __contains__ method too?
from collections.abc import Mapping
class ListDict(Mapping):
    def __init__(self):
        self._data = []  # [(key, value)]
    def __getitem__(self, item):
        for k, v in self._data:
            if k == item:
                return v
    def __iter__(self):
        return iter(e[0] for e in self._data)
    def __len__(self):
        return len(self._data)
d = ListDict()
print('key' in d)  # True
You forgot to handle the key-not-found case in your __getitem__. The provided __contains__ implementation relies on __getitem__ raising a KeyError for not-present keys. Your __getitem__ is broken, so the provided __contains__ breaks too.
Raise a KeyError:
def __getitem__(self, item):
    for k, v in self._data:
        if k == item:
            return v
    raise KeyError(item)
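Put together, a corrected version might look like the sketch below. The pairs constructor argument is a hypothetical convenience added here so the example has some data; it is not part of the original class.

```python
from collections.abc import Mapping


class ListDict(Mapping):
    def __init__(self, pairs=()):
        # pairs is a hypothetical convenience parameter for this example
        self._data = list(pairs)  # [(key, value)]

    def __getitem__(self, item):
        for k, v in self._data:
            if k == item:
                return v
        # Mapping.__contains__ and Mapping.get rely on this KeyError
        raise KeyError(item)

    def __iter__(self):
        return (k for k, _ in self._data)

    def __len__(self):
        return len(self._data)


d = ListDict([("a", 1)])
print("a" in d)          # True
print("key" in d)        # False
print(d.get("key", 0))   # 0
```

With the KeyError in place, the mixin methods inherited from Mapping (__contains__, get, and so on) behave as expected without any further overrides.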

Why can I reassign dict.update but not dict.__setitem__

I'm trying to modify a third-party dict class to make it immutable after a certain point.
With most classes, I can assign to method slots to modify behavior.
However, this doesn't seem possible with all methods in all classes. In particular for dict, I can reassign update, but not __setitem__.
Why? How are they different?
For example:
class Freezable(object):
    def _not_modifiable(self, *args, **kw):
        raise NotImplementedError()
    def freeze(self):
        """
        Disallow mutating methods from now on.
        """
        print "FREEZE"
        self.__setitem__ = self._not_modifiable
        self.update = self._not_modifiable
        # ... others
        return self
class MyDict(dict, Freezable):
    pass
d = MyDict()
d.freeze()
print d.__setitem__  # <bound method MyDict._not_modifiable of {}>
d[2] = 3  # no error -- this is incorrect.
d.update({4:5})  # raises NotImplementedError
Note that you can define the class __setitem__, e.g.:
def __setitem__(self, key, value):
    # Compare bound methods with ==; an "is" test against
    # Freezable._not_modifiable would never be true.
    if self.update == self._not_modifiable:
        raise TypeError('{} has been frozen'.format(id(self)))
    dict.__setitem__(self, key, value)
(This method is a bit clumsy; there are other options. But it's one way to make it work even though Python calls the class's __setitem__ directly.)
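The difference between update and __setitem__ comes from how Python looks up special methods: d[key] = value uses the __setitem__ found on the type, not on the instance, whereas d.update(...) is an ordinary attribute lookup that sees instance attributes first. The following Python 3 sketch illustrates this; FreezableDict and its _frozen flag are hypothetical names used only for the demonstration.

```python
class FreezableDict(dict):
    """A dict subclass that rejects item assignment once frozen."""

    _frozen = False  # hypothetical flag, flipped by freeze()

    def freeze(self):
        self._frozen = True
        return self

    def __setitem__(self, key, value):
        if self._frozen:
            raise TypeError("dict has been frozen")
        super().__setitem__(key, value)


d = FreezableDict()
# An instance attribute named __setitem__ is ignored by d[key] = value,
# because the interpreter looks the special method up on type(d).
d.__setitem__ = lambda key, value: None
d[1] = "stored"
print(d)  # {1: 'stored'}

d.freeze()
try:
    d[2] = "rejected"
except TypeError as exc:
    print(exc)  # dict has been frozen
```

Note that this sketch only guards item assignment; a real freeze would also need to cover update, pop, setdefault and the other mutating methods.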

python getting object properties with __dict__

Using py3, I have an object that uses the @property decorator:
class O(object):
    def __init__(self):
        self._a = None
    @property
    def a(self):
        return 1
Accessing the attribute a via __dict__ (with _a) doesn't return the property-decorated value but the initialized value None:
o = O()
print(o.a, o.__dict__['_a'])
1 None
Is there a generic way to make this work? I mostly need this for
def __str__(self):
    return ' '.join('{}: {}'.format(key, val) for key, val in self.__dict__.items())
Of course self.__dict__["_a"] will return self._a (well actually it's the other way round - self._a will return self.__dict__["_a"] - but anyway), not self.a. The only thing the property does here is automatically invoke its getter (your a(self) function) so you don't have to type the parens; otherwise it's just a plain method call.
If you want something that works with properties too, you'll have to get those manually from dir(self.__class__) and getattr(self.__class__, name), i.e.:
def __str__(self):
    # py3: items() returns a view, so copy it into a list
    # (in py2, self.__dict__.items() is already a list)
    attribs = list(self.__dict__.items())
    for name in dir(self.__class__):
        obj = getattr(self.__class__, name)
        if isinstance(obj, property):
            val = obj.__get__(self, self.__class__)
            attribs.append((name, val))
    return ' '.join('{}: {}'.format(key, val) for key, val in attribs)
Note that this won't prevent _a from appearing in attribs. If you want to avoid this, you'll also have to filter out protected names from the attribs list (all protected names, since you ask for something generic):
def __str__(self):
    attribs = [(k, v) for k, v in self.__dict__.items() if not k.startswith("_")]
    for name in dir(self.__class__):
        # a protected property is somewhat uncommon but
        # let's stay consistent with plain attribs
        if name.startswith("_"):
            continue
        obj = getattr(self.__class__, name)
        if isinstance(obj, property):
            val = obj.__get__(self, self.__class__)
            attribs.append((name, val))
    return ' '.join('{}: {}'.format(key, val) for key, val in attribs)
Also note that this won't handle other computed attributes (property is just one generic implementation of the descriptor protocol). At this point, your best bet for something that's still as generic as possible but that can be customised if needed is to implement the above as a mixin class with a couple hooks for specialization:
class PropStrMixin(object):
    # add other descriptor types you want to include in the
    # attribs list
    _COMPUTED_ATTRIBUTES_CLASSES = [property]
    def _get_attr_list(self):
        attribs = [(k, v) for k, v in self.__dict__.items() if not k.startswith("_")]
        for name in dir(self.__class__):
            # a protected property is somewhat uncommon but
            # let's stay consistent with plain attribs
            if name.startswith("_"):
                continue
            obj = getattr(self.__class__, name)
            # isinstance expects a single class or a tuple of classes
            if isinstance(obj, tuple(self._COMPUTED_ATTRIBUTES_CLASSES)):
                val = obj.__get__(self, self.__class__)
                attribs.append((name, val))
        return attribs
    def __str__(self):
        attribs = self._get_attr_list()
        return ' '.join('{}: {}'.format(key, val) for key, val in attribs)
class YourClass(SomeParent, PropStrMixin):
    # here you can add to _COMPUTED_ATTRIBUTES_CLASSES
    _COMPUTED_ATTRIBUTES_CLASSES = PropStrMixin._COMPUTED_ATTRIBUTES_CLASSES + [SomeCustomDescriptor]
A property is basically a "computed attribute". In general, the property's value is not stored anywhere; it is computed on demand. That's why you cannot find it in the __dict__.
The @property decorator replaces the method in the class with a descriptor object, which then calls the original method as its getter. This happens at the class level.
The lookup for o.a starts at the instance. It does not exist there, the class is checked in the next step. O.a exists and is a descriptor (because it has special methods for the descriptor protocol), so the descriptor's getter is called and the returned value is used.
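The lookup order described above can be checked directly. This is a small sketch using the class from the question; nothing beyond the standard descriptor protocol is assumed.

```python
class O:
    def __init__(self):
        self._a = None

    @property
    def a(self):
        return 1


o = O()
# The instance dictionary only holds what __init__ stored.
print(o.__dict__)                              # {'_a': None}
# The property object lives in the class dictionary.
print(isinstance(O.__dict__["a"], property))   # True
# Attribute access on the instance ends up calling the descriptor's __get__.
print(O.__dict__["a"].__get__(o, O))           # 1
print(o.a)                                     # 1
```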
(EDITED)
There is not a general way to dump the name:value pairs for the descriptors. Classes including the bases must be inspected, this part is not difficult. However retrieving the values is equivalent to a function call and may have unexpected and undesirable side-effects. For a different perspective I'd like to quote a comment by bruno desthuilliers here: "property get should not have unwanted side effects (if it does then there's an obvious design error)".
You can also have the getter update self._a, so that the stored value always matches what the getter returns:
class O(object):
    def __init__(self):
        self._a = self.a
    @property
    def a(self):
        self._a = 1
        return self._a
A bit redundant, maybe, but setting self._a = None initially is useless in this case.
In case you need a setter
This also works, provided you remove the first line of the getter:
    @a.setter
    def a(self, value):
        self._a = value

Python 2 __missing__ method

I wrote a very simple program to subclass a dictionary. I wanted to try the __missing__ method in Python.
After some research I found out that in Python 2 it's available in defaultdict. (In Python 3 we use collections.UserDict, though.)
The __getitem__ method is the one responsible for calling the __missing__ method if the key isn't found.
When I implement __getitem__ in the following program I get a KeyError, but when I leave it out, I get the desired value.
import collections
class DictSubclass(collections.defaultdict):
    def __init__(self, dic):
        if dic is None:
            self.data = None
        else:
            self.data = dic
    def __setitem__(self, key, value):
        self.data[key] = value
    ########################
    def __getitem__(self, key):
        return self.data[key]
    ########################
    def __missing__(self, key):
        self.data[key] = None
dic = {'a':4,'b':10}
d1 = DictSubclass(dic)
d2 = DictSubclass(None)
print d1[2]
I thought I needed to implement __getitem__ since it's responsible for calling __missing__. I understand that the class definition of defaultdict has a __getitem__ method. But even so, say I wanted to write my own __getitem__, how would I do it?
The dict type will always try to call __missing__. All that defaultdict does is provide an implementation; if you are providing your own __missing__ method you don't have to subclass defaultdict at all.
See the dict documentation:
d[key]
Return the item of d with key key. Raises a KeyError if key is not in the map.
If a subclass of dict defines a method __missing__() and key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call. No other operations or methods invoke __missing__().
However, you need to leave the default __getitem__ method in place, or at least call it. If you override dict.__getitem__ with your own version and do not call the base implementation, __missing__ is never called.
You could call __missing__ from your own implementation:
def __getitem__(self, key):
    if key not in self.data:
        return self.__missing__(key)
    return self.data[key]
or you could call the original implementation:
def __getitem__(self, key):
    if key not in self.data:
        return super(DictSubclass, self).__getitem__(key)
    return self.data[key]
In Python 2, you can just subclass UserDict.UserDict:
from UserDict import UserDict
class DictSubclass(UserDict):
    def __missing__(self, key):
        self.data[key] = None
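As a point of comparison, here is a minimal Python 3 sketch that subclasses dict directly and leaves __getitem__ alone, so the built-in lookup calls __missing__ on its own. The class name DefaultNone is made up for this example.

```python
class DefaultNone(dict):
    """dict subclass whose missing keys are created with the value None."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when key is absent.
        self[key] = None
        return None


d = DefaultNone({"a": 4, "b": 10})
print(d["a"])                   # 4
print(d[2])                     # None
print(2 in d)                   # True
# dict.get does not go through __missing__.
print(d.get("zzz", "fallback")) # fallback
print("zzz" in d)               # False
```

This matches the documentation quoted above: only the d[key] operation invokes __missing__; get and the in operator do not.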

How to properly subclass dict and override __getitem__ & __setitem__

I am debugging some code and I want to find out when a particular dictionary is accessed. Well, it's actually a class that subclasses dict and implements a couple of extra features. Anyway, what I would like to do is subclass dict myself and override __getitem__ and __setitem__ to produce some debugging output. Right now, I have
class DictWatch(dict):
    def __init__(self, *args):
        dict.__init__(self, args)
    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        log.info("GET %s['%s'] = %s" % str(dict.get(self, 'name_label')), str(key), str(val)))
        return val
    def __setitem__(self, key, val):
        log.info("SET %s['%s'] = %s" % str(dict.get(self, 'name_label')), str(key), str(val)))
        dict.__setitem__(self, key, val)
'name_label' is a key which will eventually be set that I want to use to identify the output. I have then changed the class I am instrumenting to subclass DictWatch instead of dict and changed the call to the superconstructor. Still, nothing seems to be happening. I thought I was being clever, but I wonder if I should be going a different direction.
Thanks for the help!
Another issue when subclassing dict is that the built-in __init__ doesn't call update, and the built-in update doesn't call __setitem__. So, if you want all setitem operations to go through your __setitem__ function, you should make sure that it gets called yourself:
class DictWatch(dict):
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)
    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        print('GET', key)
        return val
    def __setitem__(self, key, val):
        print('SET', key, val)
        dict.__setitem__(self, key, val)
    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)
    def update(self, *args, **kwargs):
        print('update', args, kwargs)
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
What you're doing should absolutely work. I tested out your class, and aside from a missing opening parenthesis in your log statements, it works just fine. There are only two things I can think of. First, is the output of your log statement set correctly? You might need to put a logging.basicConfig(level=logging.DEBUG) at the top of your script.
Second, __getitem__ and __setitem__ are only called during [] accesses. So make sure you only access DictWatch via d[key], rather than through methods such as d.get() or d.setdefault().
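Both points can be seen in a small Python 3 sketch of the class from the question, with the parentheses fixed and logging configured. The logger name "dictwatch" is an arbitrary choice for this example, and the constructor is simply left out.

```python
import logging

# Without this (or an equivalent handler setup), INFO messages are dropped.
logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("dictwatch")  # arbitrary logger name


class DictWatch(dict):
    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        log.info("GET %s[%r] = %r", dict.get(self, "name_label"), key, val)
        return val

    def __setitem__(self, key, val):
        log.info("SET %s[%r] = %r", dict.get(self, "name_label"), key, val)
        dict.__setitem__(self, key, val)


d = DictWatch()
d["name_label"] = "demo"  # logged through __setitem__
d["x"] = 1                # logged through __setitem__
d["x"]                    # logged through __getitem__
d.get("x")                # not logged: dict.get bypasses __getitem__
d.update(y=2)             # not logged: dict.update bypasses __setitem__
```

Only the three [] operations produce log lines; the get and update calls still work but leave no trace, which is the behavior the other answers on this page address.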
Consider subclassing UserDict or UserList. These classes are intended to be subclassed, whereas the built-in dict and list are not: the built-ins contain optimisations that can bypass overridden methods.
That should not really change the result (which should work, given a suitable logging threshold), but your __init__ should be:
def __init__(self, *args, **kwargs):
    dict.__init__(self, *args, **kwargs)
instead, because otherwise calling DictWatch([(1,2),(2,3)]) or DictWatch(a=1,b=2) will fail or give the wrong result.
(Or, better, don't define a constructor at all.)
As Andrew Pate's answer proposed, subclassing collections.UserDict instead of dict is much less error prone.
Here is an example showing an issue when inheriting dict naively:
class MyDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 10)
d = MyDict(a=1, b=2)  # Bad! MyDict.__setitem__ not called
d.update(c=3)  # Bad! MyDict.__setitem__ not called
d['d'] = 4  # Good!
print(d)  # {'a': 1, 'b': 2, 'c': 3, 'd': 40}
UserDict inherits from collections.abc.MutableMapping, so this works as expected:
import collections
class MyDict(collections.UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 10)
d = MyDict(a=1, b=2)  # Good: MyDict.__setitem__ correctly called
d.update(c=3)  # Good: MyDict.__setitem__ correctly called
d['d'] = 4  # Good
print(d)  # {'a': 10, 'b': 20, 'c': 30, 'd': 40}
Similarly, you only have to implement __getitem__ to automatically be compatible with key in my_dict, my_dict.get, …
Note: UserDict is not a subclass of dict, so isinstance(UserDict(), dict) returns False (but isinstance(UserDict(), collections.abc.MutableMapping) returns True).
All you have to do is:
class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)
A sample from my own use:
### EXAMPLE
import collections.abc
class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)
    def __setitem__(self, key, item):
        # collections.Iterable was removed in Python 3.10;
        # the ABC lives in collections.abc
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, collections.abc.Iterable)):
            # self.__dict__[key] = item
            super(BatchCollection, self).__setitem__(key, item)
        else:
            raise Exception(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")
Note: tested only in Python 3.
