I have subclassed the dict and need to detect all its modifications.
(I know I cannot detect an in-place modification of a stored value. That's OK.)
My code:
def __setitem__(self, key, value):
    super().__setitem__(key, value)
    self.modified = True
def __delitem__(self, key):
    super().__delitem__(key)
    self.modified = True
The problem is it works only for a straightforward assignment or deletion. It does not detect changes made by pop(), popitem(), clear() and update().
Why are __setitem__ and __delitem__ bypassed when items are added or deleted? Do I have to redefine all those methods (pop, etc.) as well?
For this use case you should not subclass dict; use the abstract base classes from the collections.abc module of the Python standard library instead.
Subclass the MutableMapping abstract class and override __getitem__, __setitem__, __delitem__, __iter__ and __len__, delegating each of them to an inner dict.
The abstract base class builds all the other methods (pop, popitem, clear, update, setdefault) on top of those five.
import collections.abc
class MyDict(collections.abc.MutableMapping):
    def __init__(self):
        self.d = {}
        # other initializations ...
    def __setitem__(self, key, value):
        self.d[key] = value
        self.modified = True
    ...
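A fuller sketch of the same idea may help. The class name `TrackedDict` and the choice to start with `modified = False` are assumptions made for illustration, not part of the question. Since the mixin methods that MutableMapping supplies are written in terms of the five methods below, the flag is set whichever way the mapping is changed:

```python
from collections.abc import MutableMapping


class TrackedDict(MutableMapping):
    """Mapping that records whether it has been modified (illustrative name)."""

    def __init__(self, *args, **kwargs):
        self._d = {}
        self.modified = False
        self.update(*args, **kwargs)  # mixin update() goes through __setitem__
        self.modified = False         # assumption: initial contents are not a change

    def __getitem__(self, key):
        return self._d[key]

    def __setitem__(self, key, value):
        self._d[key] = value
        self.modified = True

    def __delitem__(self, key):
        del self._d[key]
        self.modified = True

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


d = TrackedDict(a=1, b=2)
d.pop("a")           # mixin pop() calls __delitem__
print(d.modified)    # True
```

The trade-off is that `isinstance(d, dict)` is False for such an object, so it only fits code that accepts any mapping.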
pop, popitem, clear, update are not implemented through __setitem__ and __delitem__.
You must redefine them also.
I suggest looking at the OrderedDict implementation for an example.
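If you do want to stay with a dict subclass, one way to avoid writing each override by hand is to wrap the mutating methods in a loop. This is only a sketch: the names `FlaggingDict` and `_flagging` are hypothetical, and the list of methods is an assumption about which mutators matter to you (for example, `|=` on Python 3.9+ is not covered).

```python
class FlaggingDict(dict):
    """dict subclass whose mutating methods set `modified` (illustrative name)."""

    modified = False


def _flagging(name):
    """Return a wrapper around dict.<name> that sets the flag after the call."""
    original = getattr(dict, name)

    def method(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        self.modified = True
        return result

    method.__name__ = name
    return method


# Assumed list of mutators; extend it if you rely on others.
for _name in ("__setitem__", "__delitem__", "pop", "popitem",
              "clear", "update", "setdefault"):
    setattr(FlaggingDict, _name, _flagging(_name))

d = FlaggingDict(a=1)
print(d.modified)   # False: dict's constructor does not go through these methods
d.update(b=2)
print(d.modified)   # True
```

Note that the flag is set on every successful call, even one that changes nothing, such as `pop(key, default)` for a missing key.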
I'm working on a project and I have a dictionary in Python. Due to this being an "append-only" dictionary, I need a way to disable .pop, .popitem, .clear, etc. Is this possible in python?
I have tried:
mydict = dict()
#attempt 1:
mydict.pop = None
#Get error pop is read-only
#attempt 2:
del mydict.pop
#Get error pop is read-only
I have tried this on all delete methods with the same results.
Using inheritance to remove functionality violates Liskov substitution (a hypothetical dict subclass would behave erroneously when treated as a dict instance). And besides, I can easily just dict.clear(my_subclassed_dict_instance). The thing you're asking for is not, in Python terminology, a dict and shouldn't masquerade as a subclass of one.
What you're looking for is a fresh class entirely. You want a class that contains a dict, not one that is a dict.
from collections.abc import Mapping
class MyAppendOnlyContainer(Mapping):
    def __init__(self, **kwargs):
        self._impl = kwargs
    def __getitem__(self, key):
        return self._impl[key]
    def __setitem__(self, key, value):
        self._impl[key] = value
    def __iter__(self):
        return iter(self._impl)
    def __len__(self):
        return len(self._impl)
    # Plus whatever other functionality you need ...
Note that collections.abc.Mapping requires __getitem__, __iter__, and __len__ (which your use case faithfully implements) and gives you get, __contains__, and several other useful dict-like helper functions. What you have is not a dict. What you have is a Mapping with benefits.
Subclass the dict() class:
class NewDict(dict):
    def clear(self):
        pass
    def pop(self, *args):  # Note: returns None
        pass
    def popitem(self, *args):
        pass
Now when you create your dict, create it using:
myDict = NewDict()
I quickly dumped this into the command line interpreter and it seems to work.
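Before relying on this, it is worth checking what the subclass does not block. The sketch below repeats `NewDict` and shows two ways items can still disappear: `del`, because `__delitem__` is not overridden, and calling `dict.clear` directly on the instance, as the other answer points out.

```python
class NewDict(dict):
    def clear(self):
        pass

    def pop(self, *args):
        pass

    def popitem(self, *args):
        pass


d = NewDict(a=1, b=2)
d.clear()
print(len(d))    # 2: the overridden clear() does nothing

del d["a"]       # __delitem__ is not overridden
print(len(d))    # 1

dict.clear(d)    # calls the base implementation, bypassing the override
print(len(d))    # 0
```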
I just realised that __setattr__ doesn't work on the class itself. So this implementation,
class Integer:
    me_is_int = 0
    def __setattr__(self, name, value):
        if not isinstance(value, int):
            raise TypeError
        super().__setattr__(name, value)  # actually store the value
doesn't raise on this:
Integer.me_is_int = "lol"
So, switching to a metaclass:
class IntegerMeta(type):
    def __setattr__(cls, name, value):
        if not isinstance(value, int):
            raise TypeError
        super().__setattr__(name, value)  # actually store the value
class Integer(metaclass=IntegerMeta):
    me_is_int = 0
this works, but this:
Integer().me_is_int = "lol"
doesn't work yet again. So do I need to copy the __setattr__ method in Integer again to make it work on instances? Is it not possible for Integer to use IntegerMeta's __setattr__ for instances?
You are right in your reasoning: a custom __setattr__ special method on the metaclass affects every attribute set on the class, and one on the class affects every attribute set on its instances.
With that in mind, if you don't want to duplicate code, you can have the metaclass itself inject the logic into each class when it is created.
The way you've written it, even as an example, is dangerous, because it affects every attribute set on the class or its instances. If you have a list of the attributes you want to guard, though, the same approach works.
attributes_to_guard = {"me_is_int",}
class Meta(type):
    def __init__(cls, name, bases, ns, **kw):
        # This line itself would not work if the setattr would not check
        # for a restricted set of attributes to guard:
        cls.__setattr__ = cls.__class__.__setattr__
        # Also, note that this overrides any manually customized
        # __setattr__ on the classes. The mechanism to call those,
        # and still add the guarding logic in the metaclass would be
        # more complicated, but it can be done
        super().__init__(name, bases, ns, **kw)
    def __setattr__(self, name, value):
        if name in attributes_to_guard and not isinstance(value, int):
            raise TypeError()
        # Store the value. The same function runs for classes (through the
        # metaclass) and for instances (through the injected copy), so pick
        # the base implementation that matches.
        base = type if isinstance(self, type) else object
        base.__setattr__(self, name, value)
class Integer(metaclass=Meta):
    me_is_int = 0
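To check the behaviour end to end, here is a self-contained variant of the approach with a small usage test. It is a sketch under assumptions: the helper `rejects` is hypothetical, and the step that actually stores the value (choosing between `type.__setattr__` and `object.__setattr__`) is my addition so that valid assignments are kept.

```python
attributes_to_guard = {"me_is_int"}


class Meta(type):
    def __init__(cls, name, bases, ns, **kw):
        super().__init__(name, bases, ns, **kw)
        # Reuse the metaclass function as the instance-level __setattr__.
        cls.__setattr__ = Meta.__setattr__

    def __setattr__(self, name, value):
        if name in attributes_to_guard and not isinstance(value, int):
            raise TypeError(f"{name} must be an int")
        # `self` is a class when called through the metaclass and an
        # ordinary instance when called through the injected copy.
        base = type if isinstance(self, type) else object
        base.__setattr__(self, name, value)


class Integer(metaclass=Meta):
    me_is_int = 0


def rejects(action):
    """Return True if calling `action` raises TypeError (illustrative helper)."""
    try:
        action()
    except TypeError:
        return True
    return False


print(rejects(lambda: setattr(Integer, "me_is_int", "lol")))     # True
print(rejects(lambda: setattr(Integer(), "me_is_int", "lol")))   # True
print(rejects(lambda: setattr(Integer(), "me_is_int", 5)))       # False
```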
I would like to subclass list and trigger an event (data checking) every time any change happens to the data. Here is an example subclass:
class MyList(list):
    def __init__(self, sequence):
        super().__init__(sequence)
        self._test()
    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._test()
    def append(self, value):
        super().append(value)
        self._test()
    def _test(self):
        """ Some kind of check on the data. """
        if not self == sorted(self):
            raise ValueError("List not sorted.")
Here, I am overriding the methods __init__, __setitem__ and append to perform the check whenever the data changes. I think this approach is undesirable, so my question is: is there a possibility of triggering the data check automatically whenever any kind of mutation happens to the underlying data structure?
As you say, this is not the best way to go about it. To correctly implement this, you'd need to know about every method that can change the list.
The way to go is to implement your own list (or rather a mutable sequence). The best way to do this is to use the abstract base classes from Python which you find in the collections.abc module. You have to implement only a minimum amount of methods and the module automatically implements the rest for you.
For your specific example, this would be something like this:
from collections.abc import MutableSequence
class MyList(MutableSequence):
    def __init__(self, iterable=()):
        self._list = list(iterable)
    def __getitem__(self, key):
        return self._list.__getitem__(key)
    def __setitem__(self, key, item):
        self._list.__setitem__(key, item)
        # trigger change handler
    def __delitem__(self, key):
        self._list.__delitem__(key)
        # trigger change handler
    def __len__(self):
        return self._list.__len__()
    def insert(self, index, item):
        self._list.insert(index, item)
        # trigger change handler
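As a sketch of what the "trigger change handler" comments could become for the sortedness check from the question, the version below calls a hypothetical `_changed` method after every mutation. The class name `SortedCheckedList` is illustrative. Since `append` and `extend` come from MutableSequence and are built on `insert`, they are checked without being overridden:

```python
from collections.abc import MutableSequence


class SortedCheckedList(MutableSequence):
    def __init__(self, iterable=()):
        self._list = list(iterable)
        self._changed()

    def _changed(self):
        """Change handler: here, the sortedness check from the question."""
        if self._list != sorted(self._list):
            raise ValueError("List not sorted.")

    def __getitem__(self, key):
        return self._list[key]

    def __setitem__(self, key, item):
        self._list[key] = item
        self._changed()

    def __delitem__(self, key):
        del self._list[key]
        self._changed()

    def __len__(self):
        return len(self._list)

    def insert(self, index, item):
        self._list.insert(index, item)
        self._changed()


nums = SortedCheckedList([1, 2, 3])
nums.append(4)          # mixin append() calls insert()
nums.extend([5, 6])     # mixin extend() calls append()
try:
    nums.append(0)
except ValueError as exc:
    print(exc)          # List not sorted.
```

In this simple form the check runs after the mutation, so the offending item stays in the list when the exception is raised; validating before mutating would avoid that.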
Performance
Some methods are slow in their default implementation. For example __contains__ is defined in the Sequence class as follows:
def __contains__(self, value):
    for v in self:
        if v is value or v == value:
            return True
    return False
Depending on your class, you might be able to implement this faster. However, performance is often less important than writing code which is easy to understand. It can also make writing a class harder, because you're then responsible for implementing the methods correctly.
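As one illustration of a faster override, a sequence that is known to be kept sorted can answer membership tests with a binary search instead of the linear scan. This is a sketch, assuming a hypothetical read-only `SortedSeq` class whose items are mutually comparable:

```python
from bisect import bisect_left
from collections.abc import Sequence


class SortedSeq(Sequence):
    """Read-only sequence kept in sorted order (illustrative name)."""

    def __init__(self, iterable=()):
        self._items = sorted(iterable)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __contains__(self, value):
        # Binary search: O(log n) instead of the O(n) default.
        i = bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value


s = SortedSeq([5, 1, 3])
print(3 in s, 4 in s)   # True False
```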
This Python 2 example:
class LoggingDict(dict):
    # Simple example of extending a builtin class
    def __setitem__(self, key, value):
        logging.info('Setting %r to %r' % (key, value))
        super(LoggingDict, self).__setitem__(key, value)
and this Python 3 example:
class LoggingDict(dict):
    # Simple example of extending a builtin class
    def __setitem__(self, key, value):
        logging.info('Setting %r to %r' % (key, value))
        super().__setitem__(key, value)
illustrate the fact that Python 2's super requires explicit class and self arguments (but Python 3's doesn't). Why is that? It seems like an irritating limitation.
The link in AKS' comment provides the answer here:
Let's say that in the Python 2 example I thought, "I don't like that explicit class reference. What if I change the name of the class or move this code and forget to update it?" Let's say I then thought, a-ha, I'll replace the explicit class name with self.__class__, and wrote:
class LoggingDict(dict):
    # Simple example of extending a builtin class
    def __setitem__(self, key, value):
        logging.info('Setting %r to %r' % (key, value))
        super(self.__class__, self).__setitem__(key, value)
Now I create a subclass called SpecialisedLoggingDict of LoggingDict (which doesn't override __setitem__), instantiate it and call __setitem__ on it.
Now self refers to an instance of SpecialisedLoggingDict, so self.__class__ is SpecialisedLoggingDict and super resolves to the next class in its MRO, LoggingDict. We go straight back into LoggingDict.__setitem__, entering infinite recursion.
The essential point is that in Python 2 a method doesn't really know which class it was defined in, it only knows the class of the instance on which it's being called. Python 3 does compile-time "magic", adding a __class__ cell to functions so that super() can be used without the explicit class reference.
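The failure mode can be reproduced in Python 3 as well, since the two-argument form of super still exists there. The sketch below drops the logging call to stay self-contained. The class `FixedLoggingDict` is a name I made up to show the `__class__` cell that the zero-argument form relies on:

```python
class LoggingDict(dict):
    def __setitem__(self, key, value):
        # self.__class__ is the runtime class, not the class defining this method.
        super(self.__class__, self).__setitem__(key, value)


class SpecialisedLoggingDict(LoggingDict):
    pass


LoggingDict()["a"] = 1              # fine: self.__class__ is LoggingDict

try:
    SpecialisedLoggingDict()["a"] = 1
except RecursionError:
    print("infinite recursion")     # LoggingDict.__setitem__ keeps calling itself


class FixedLoggingDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value)


# The compiler gives the method a __class__ cell holding the defining class.
code = FixedLoggingDict.__setitem__.__code__
print("__class__" in code.co_freevars)   # True
```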
I am a fairly average Python programmer and have not written any complex or major application in Python before. I was reading about new-style classes and came across something new to me that I am still trying to understand: the unification of data types and classes.
class defaultdict(dict):
    def __init__(self, default=None):
        dict.__init__(self)
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.default
What really confuses me is why they were unified. I can't picture a reason that makes it so important. I'd be glad if anybody could shed some light on this. Thank you.
The primary reason was to allow for built-in types to be subclassed in the same way user-created classes could be. Prior to new-style classes, to create a dict-like class, you needed to subclass from a specially designed UserDict class, or produce a custom class that provided the full dict protocol. Now, you can just do class MySpecialDict(dict): and override the methods you want to modify.
For the full rundown, see PEP 252 - Making Types Look More Like Classes
For an example, here's a dict subclass that logs modifications to it:
def log(msg):
    ...
class LoggingDict(dict):
    def __setitem__(self, key, value):
        super(LoggingDict, self).__setitem__(key, value)
        log('Updated: {}={}'.format(key, value))
Any instance of LoggingDict can be used wherever a regular dict is expected:
def add_value_to_dict(d, key, value):
    d[key] = value
logging_dict = LoggingDict()
add_value_to_dict(logging_dict, 'testkey', 'testvalue')
If you used a function instead of LoggingDict:
def log_value(d, key, value):
    log('Updated: {}={}'.format(key, value))
mydict = dict()
How would you pass mydict to add_value_to_dict and have it log the addition without having to make add_value_to_dict know about log_value?