In Pinax's UserDict.py:
def __getitem__(self, key):
    if key in self.data:
        return self.data[key]
    if hasattr(self.__class__, "__missing__"):
        return self.__class__.__missing__(self, key)
Why does it look up __missing__ on self.__class__ rather than on self?
Thanks
The UserDict.py presented here emulates built-in dict closely, so for example:
>>> class m(dict):
...     def __missing__(self, key): return key + key
...
>>> a=m()
>>> a['ciao']
'ciaociao'
Just as you can override the special method __missing__ to deal with missing keys when you subclass the built-in dict, you can override it when you subclass that UserDict.
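The same behavior can be checked against the standard library's own UserDict, whose __getitem__ follows the same pattern. This is a minimal sketch, assuming Python 3 (where the class lives in collections); the class name Doubling is made up for illustration:

```python
from collections import UserDict


class Doubling(UserDict):
    # UserDict.__getitem__ calls this only when the key is absent.
    def __missing__(self, key):
        return key + key


a = Doubling(x="stored")
hit = a["x"]          # key present: the stored value is returned
miss = a["ciao"]      # key absent: __missing__ supplies the result
stored = "ciao" in a  # __missing__ itself did not insert anything
```

Note that __missing__ only computes a return value here; if you want the key to be remembered, the method has to store it explicitly.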
The official Python docs for dict are here, and they do say:
New in version 2.5: If a subclass of dict defines a method __missing__(), if the key key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call if the key is not present. No other operations or methods invoke __missing__(). If __missing__() is not defined, KeyError is raised. __missing__() must be a method; it cannot be an instance variable. For an example, see collections.defaultdict.
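The "must be a method; it cannot be an instance variable" part is what the hasattr(self.__class__, "__missing__") check in the question mirrors: the built-in dict looks the hook up on the type, not on the instance. A small sketch of that behavior (the class names Plain and Hooked are illustrative only):

```python
class Plain(dict):
    pass


d = Plain()
# An instance attribute named __missing__ is never consulted by d[key].
d.__missing__ = lambda key: "from instance"
try:
    d["x"]
    instance_hook_used = True
except KeyError:
    instance_hook_used = False


class Hooked(dict):
    # Defined on the class, so d[key] finds and calls it.
    def __missing__(self, key):
        return "from class"


class_result = Hooked()["x"]
```

Looking the method up via self.__class__ in UserDict.py reproduces this, so an attribute set on a single instance is ignored there too.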
If you want to use default values in a dict (aka __missing__), you can check out defaultdict from the collections module:
from collections import defaultdict
a = defaultdict(int)
a[1] # -> 0
a[2] += 1
a # -> defaultdict(<class 'int'>, {1: 0, 2: 1})
Our use case is that if a key doesn't exist in the dictionary and we are trying to fetch the value against that key then a list with only that key should be returned as the default value.
Below is an example:
>>> dic = defaultdict(<function 'custom_default_function'>, {1: [1,2,6], 3: [3,6,8]})
>>> print(dic[1])
[1,2,6]
>>> print(dic[5])
[5]
For key 1 the output is fine, since the key is in dic. But when we look up key 5, which is missing, the code should print [5], i.e. a list with only the key as its element.
I tried to write a default function, but I can't work out how to pass the key to it as a parameter.
from collections import defaultdict
def default_function(key):
    return key
# Defining the dict
d = defaultdict(default_function)
d[1] = [1,4]
d[2] = [2,3]
print(d[4]) # This raises a TypeError because the required positional argument of default_function is missing
Where am I going wrong and how can I resolve this using defaultdict in Python?
defaultdict will not generate a new value that depends on the key: its default_factory is called with no arguments.
You could inherit from dict and override __missing__:
class MyDict(dict):
    def __init__(self):
        super().__init__()
    def __missing__(self, key):
        self[key] = [key]
        return self[key]
my_dict = MyDict()
print(my_dict[5]) # -> [5]
print(my_dict) # -> {5: [5]}
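If you would rather keep the defaultdict interface (a factory passed to the constructor, optionally with initial data), one possible variation is to subclass defaultdict and override __missing__ so that the factory receives the key. This is only a sketch; the name KeyDefaultDict is hypothetical and not part of the standard library:

```python
from collections import defaultdict


class KeyDefaultDict(defaultdict):
    """defaultdict variant whose factory is called with the missing key."""

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        # Unlike the stock defaultdict, pass the key to the factory.
        value = self[key] = self.default_factory(key)
        return value


d = KeyDefaultDict(lambda key: [key], {1: [1, 2, 6], 3: [3, 6, 8]})
existing = d[1]   # key present: stored list is returned
created = d[5]    # key absent: factory builds the value from the key
```

As with the plain dict subclass, the generated value is stored, so later lookups of the same key return the same list object.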
Two other questions here might help:
Accessing key in factory of defaultdict
Is there a clever way to pass the key to defaultdict's default_factory?
Mypy considers this to be valid with strict = true:
from typing import Dict, TypeVar
KeyType = TypeVar("KeyType")
ValueType = TypeVar("ValueType")
class InvertibleDict(Dict[KeyType, ValueType]):
    def __inverse__(self) -> "InvertibleDict[ValueType, KeyType]":
        new_instance: "InvertibleDict[ValueType, KeyType]" = self.__class__()
        for key, value in self.items():
            new_instance[value] = key
        return new_instance
However, it does not accept the following, more concise version of the same code, saying that "Keywords must be strings" on the last line:
from typing import Dict, TypeVar
KeyType = TypeVar("KeyType")
ValueType = TypeVar("ValueType")
class InvertibleDict(Dict[KeyType, ValueType]):
    def __inverse__(self) -> "InvertibleDict[ValueType, KeyType]":
        return self.__class__(**{value: key for key, value in self.items()})
Mypy is correct here; it is catching a bug in your implementation (the beauty of static type checking). The type of:
{value: key for key, value in self.items()}
is Dict[ValueType, KeyType], but unpacking it with ** will fail in general when you do:
dict(**some_mapping)
where the keys are not guaranteed to be strings.
Observe:
>>> dict(**{1:2,3:4})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: keywords must be strings
You just want:
return self.__class__({value: key for key, value in self.items()})
Which won't fail in general:
>>> dict({1:2,3:4})
{1: 2, 3: 4}
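Put together, the concise version with the mapping passed positionally might look like the sketch below. It reuses the names from the question and only demonstrates the runtime behavior with non-string keys; it makes no claim about type-checker output:

```python
from typing import Dict, TypeVar

KeyType = TypeVar("KeyType")
ValueType = TypeVar("ValueType")


class InvertibleDict(Dict[KeyType, ValueType]):
    def __inverse__(self) -> "InvertibleDict[ValueType, KeyType]":
        # Positional mapping argument: keys may be any hashable object.
        return self.__class__({value: key for key, value in self.items()})


original = InvertibleDict({1: "a", 2: "b"})
inverted = original.__inverse__()
round_trip = inverted.__inverse__()  # int keys again, which ** would reject
```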
Personally, I would go with your first implementation regardless, to avoid building a temporary dictionary (roughly doubling the memory used) and making a needless second pass.
Note that you would probably never use ** unpacking to initialize a dict; the keyword-argument form of the constructor is a convenience for writing something like:
>>> dict(foo=1, bar=2)
{'foo': 1, 'bar': 2}
You can even use this handy trick when copying a dictionary but wanting to force a value for particular string keys:
>>> dict({'foo': 1, 'bar': 2}, bar=42)
{'foo': 1, 'bar': 42}
Just for laughs I tried return self.__class__({value: key for key, value in self.items()}), which seems to work the same and passes mypy checks. TIL dicts can be initialised with a dict rather than **kwargs.
I've got a class that has replaced __iter__ to hide extra unneeded data. I've made the rest of my code backwards compatible by setting iteritems to either dict.iteritems or dict.items depending on the python version, and I can then call iteritems(class_object), but it doesn't seem to work well with my class.
It'll be easier to explain with an example:
class Test(dict):
    def __init__(self, some_dict):
        self.some_dict = some_dict
        super(self.__class__, self).__init__(self.some_dict)
    def __iter__(self):
        for k, v in self.some_dict.iteritems():
            yield k, v['value']
test_dict = {
    'a': {'value': 'what',
          'hidden': 123},
    'b': {'value': 'test'}
}
If I do Test(test_dict).__iter__(), then it correctly returns {'a': 'what', 'b': 'test'}
If I add iteritems = __iter__ to the class, then it also works when doing Test(test_dict).iteritems()
However, no matter what I try, doing dict.iteritems(Test(test_dict)) defaults to the standard dict iterating, and returns {'a': {'hidden': 123, 'value': 'what'}, 'b': {'value': 'test'}}
I've tried a couple of trace functions but they don't go deep enough to figure out what's going on.
The dict.iteritems() method reaches straight into the internal data structures of the dict implementation. You passed in a subclass of dict so those same data structures are there for it to access. You can't override this behaviour.
Not that dict.iteritems() would ever use __iter__; the latter produces keys only, not key-value pairs!
You should instead define iteritems differently; given a PY3 boolean variable that is False for Python 2, True otherwise:
from operator import methodcaller
iteritems = methodcaller('items' if PY3 else 'iteritems')
Now iteritems(object) is translated to object.iteritems() or object.items(), as needed, and the correct method is always called.
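To see the difference between the two styles of call, here is a small sketch. PY3 is derived from sys.version_info, and the class Visible is a made-up stand-in for the question's Test class that overrides items() directly:

```python
import sys
from operator import methodcaller

PY3 = sys.version_info[0] >= 3
iteritems = methodcaller('items' if PY3 else 'iteritems')


class Visible(dict):
    """Hypothetical dict subclass that exposes only each entry's 'value'."""

    def items(self):
        return [(k, v['value']) for k, v in dict.items(self)]

    iteritems = items  # same behavior under the Python 2 name


test_dict = {'a': {'value': 'what', 'hidden': 123}, 'b': {'value': 'test'}}
obj = Visible(test_dict)

via_method = dict(iteritems(obj))   # dispatches to Visible.items
via_dict = dict(dict.items(obj))    # bypasses the override entirely
```

methodcaller performs a normal attribute lookup on the object, so the subclass override is honored, whereas calling the unbound dict method goes straight to the built-in implementation.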
Next, to extend dictionary behaviour, instead of subclassing dict, I'd subclass collections.MutableMapping (*):
try:
    from collections.abc import MutableMapping  # Python 3
except ImportError:
    from collections import MutableMapping  # Python 2
class Test(MutableMapping):
    def __init__(self, some_dict):
        self.some_dict = some_dict.copy()
    def __getitem__(self, key):
        return self.some_dict[key]
    def __setitem__(self, key, value):
        self.some_dict[key] = value
    def __delitem__(self, key):
        del self.some_dict[key]
    def __len__(self):
        return len(self.some_dict)
    def __iter__(self):
        for k, v in self.some_dict.iteritems():
            yield k, v['value']
This implements all the same methods that dict provides, except for copy and the dict.fromkeys() class method.
You could instead inherit from collections.UserDict(), which adds those two remaining methods:
try:
    # Python 2
    from UserDict import UserDict
except ImportError:
    from collections import UserDict
class Test(UserDict):
    def __iter__(self):
        for k, v in self.data.iteritems():
            yield k, v['value']
Only an alternate __iter__ implementation is needed in that case.
In either case, you still can't use dict.iteritems on these objects, because that method can only work with actual dict objects.
(*) collections.MutableMapping is the Python 2 location of that class; the official Python 3 location is collections.abc.MutableMapping. Python 3 kept an alias under the old name for Python 2-compatible code up to version 3.9, but the alias was removed in Python 3.10.
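As an alternative design (not the one shown above), a Python 3 mapping can keep __iter__ yielding keys, as the mapping protocol expects, and do the hiding in __getitem__ instead; the inherited items(), values() and get() then all see only the 'value' part. A rough sketch, where the class name ValueView and its storage layout are assumptions made for illustration:

```python
from collections.abc import MutableMapping


class ValueView(MutableMapping):
    """Mapping that exposes only the 'value' field of each stored entry."""

    def __init__(self, some_dict):
        self._data = dict(some_dict)  # shallow copy of the outer dict

    def __getitem__(self, key):
        return self._data[key]['value']

    def __setitem__(self, key, value):
        # Keep any hidden fields already stored for this key.
        self._data.setdefault(key, {})['value'] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)  # keys only

    def __len__(self):
        return len(self._data)


test_dict = {'a': {'value': 'what', 'hidden': 123}, 'b': {'value': 'test'}}
view = ValueView(test_dict)
pairs = dict(view.items())  # items() comes from the MutableMapping mixin
```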
my_dict = {'a': 1}
I wish for my_dict['a'] to behave the same as my_dict.get('a')
That way, if I do my_dict['b'], I will not raise an error but get the default None value, the same way you would get it from my_dict.get('b')
In the case of my_dict = {'a': {'b': 2}} I could do my_dict['a']['b'] and it would act as my_dict.get('a').get('b')
When doing my_dict['b'] = 2 it will act same as my_dict.update({'b': 2})
Is it possible to do so that I will not have to inherit from dict?
You can use a collections.defaultdict() object to add a new value to the dictionary each time you try to access a non-existing key:
>>> from collections import defaultdict
>>> d = defaultdict(lambda: None)
>>> d['a'] is None
True
>>> d
defaultdict(<function <lambda> at 0x10f463e18>, {'a': None})
If you don't want the key added, create a subclass of dict that implements the __missing__ method:
class DefaultNoneDict(dict):
    def __missing__(self, key):
        return None
This explicitly won't add new keys:
>>> d = DefaultNoneDict()
>>> d['a'] is None
True
>>> d
{}
If you wanted to chain .get() calls, you'll have to return an empty dictionary instead, otherwise dict.get(keyA).get(keyB) will fail with an attribute error (the first None returned won't have a .get() method).
Generally speaking, it is better to stick to the default type and be explicit. There is nothing wrong with:
value = some_d.get(outer, {}).get(inner)
Using a defaultdict or a dict subclass with a custom __missing__ hook has a downside: it will always produce a default when the key is missing, even when you accidentally produced incorrect keys somewhere else in your code. I often opt for an explicit dict.get() or dict.setdefault() codepath over defaultdict precisely because I want a non-existing key to produce an error in other parts of my project.
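If the explicit chained .get() calls become repetitive for deeper nesting, they can be wrapped in a small helper while still leaving ordinary dicts untouched. The function below is a hypothetical utility, not something from the standard library:

```python
def deep_get(mapping, *keys, default=None):
    """Follow keys through nested dicts; return default if any step is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


my_dict = {'a': {'b': 2}}
found = deep_get(my_dict, 'a', 'b')
missing_outer = deep_get(my_dict, 'x', 'b')
missing_inner = deep_get(my_dict, 'a', 'z', default=0)
```

Plain my_dict['x'] still raises KeyError elsewhere in the code, so accidental bad keys are not silently masked.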
How does:
dict = {}
if key not in dict:
    dict[key] = foo
Compare to:
try:
    dict[key]
except KeyError:
    dict[key] = foo
i.e., is looking up a key this way any faster than the linear search through dict.keys() that I assume the first form will do?
Just to clarify one point: if key not in d doesn't do a linear search through d's keys. It uses the dict's hash table to quickly find the key.
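One way to convince yourself of this is to count how many equality comparisons a membership test performs. The sketch below uses a made-up key class, Counted, that records calls to __eq__; in CPython a lookup hashes the key and compares only against entries whose hash matches:

```python
class Counted:
    """Key type that counts how often it is compared for equality."""
    eq_calls = 0

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return hash(self.n)

    def __eq__(self, other):
        Counted.eq_calls += 1
        return isinstance(other, Counted) and self.n == other.n


d = {Counted(i): i for i in range(1000)}

Counted.eq_calls = 0
present = Counted(500) in d   # equal to a stored key, but a different object
absent = Counted(5000) in d
comparisons = Counted.eq_calls  # stays tiny compared with len(d)
```

A linear search would need hundreds of comparisons for these two tests; the hash table needs only a handful at most.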
You're looking for the setdefault method:
>>> r = {}
>>> r.setdefault('a', 'b')
'b'
>>> r
{'a': 'b'}
>>> r.setdefault('a', 'e')
'b'
>>> r
{'a': 'b'}
The answer depends on how often the key is already in the dict (BTW, has anyone mentioned to you how bad an idea it is to hide a builtin such as dict behind a variable?)
if key not in dct:
    dct[key] = foo
If the key is in the dictionary this does one dictionary lookup. If the key is not in the dictionary it looks up the dictionary twice.
try:
    dct[key]
except KeyError:
    dct[key] = foo
This may be slightly faster for the case where the key is in the dictionary, but throwing an exception has quite a big overhead, so it is almost always not the best option.
dct.setdefault(key, foo)
This one is slightly tricky: it always involves two dictionary lookups: the first one is to find the setdefault method in the dict class, the second is to look for key in the dct object. Also if foo is an expression it will be evaluated every time whereas the earlier options only evaluate it when they have to.
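The point about the default being evaluated every time is easy to demonstrate. In this sketch, expensive() is a hypothetical stand-in for any costly expression, and the calls list simply records how often it runs:

```python
calls = []


def expensive():
    """Stand-in for a costly default; records every invocation."""
    calls.append(1)
    return []


d = {'a': [0]}

# The argument is evaluated before setdefault runs, even though 'a' exists.
d.setdefault('a', expensive())
calls_after_setdefault = len(calls)

# The explicit test only evaluates the default when it is needed.
if 'a' not in d:
    d['a'] = expensive()
calls_after_if = len(calls)
```

When the default is a cheap literal this does not matter, but for something like a database query or a large object it can dominate the cost.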
Also look at collections.defaultdict. That is the most appropriate solution for a large class of situations like this.
Try: my_dict.setdefault(key, default). It's slightly slower than the other options, though.
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
#!/usr/bin/env python
example_dict = dict(zip(range(10), range(10)))
def kn(key, d):
    if key not in d:
        d[key] = 'foo'
def te(key, d):
    try:
        d[key]
    except KeyError:
        d[key] = 'foo'
def sd(key, d):
    d.setdefault(key, 'foo')
if __name__ == '__main__':
    from timeit import Timer
    # Key 2 is already in example_dict, so this times the "key present" case.
    t = Timer("kn(2, example_dict)", "from __main__ import kn, example_dict")
    print(t.timeit())
    t = Timer("te(2, example_dict)", "from __main__ import te, example_dict")
    print(t.timeit())
    t = Timer("sd(2, example_dict)", "from __main__ import sd, example_dict")
    print(t.timeit())
# kn: 0.249855041504
# te: 0.244259119034
# sd: 0.375113964081
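The script above only measures the case where the key is already present. To compare the three approaches when the key is missing, each call needs a dictionary that does not contain it. The following is a sketch for Python 3; the helper time_miss and the iteration count are arbitrary choices, and the numbers will vary by machine and interpreter, so none are quoted here:

```python
import timeit


def kn(key, d):
    if key not in d:
        d[key] = 'foo'


def te(key, d):
    try:
        d[key]
    except KeyError:
        d[key] = 'foo'


def sd(key, d):
    d.setdefault(key, 'foo')


def time_miss(func, number=20000):
    # A fresh empty dict on every call guarantees the key is missing.
    return timeit.timeit(lambda: func(2, {}), number=number)


results = {func.__name__: time_miss(func) for func in (kn, te, sd)}
for name, seconds in sorted(results.items()):
    print("%s (miss): %.4f s" % (name, seconds))
```

The timings include the cost of creating the empty dict and calling the lambda, which is the same for all three functions, so only the differences between them are meaningful.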
my_dict.get(key, foo) returns foo if key isn't in my_dict. The default value is None, so my_dict.get(key) will return None if key isn't in my_dict. The first of your options is better if you want to just add key to your dictionary. Don't worry about speed here. If you find that populating your dictionary is a hot spot in your program, then think about it. But it isn't. So don't.