my_dict = {'a': 1}
I wish for my_dict['a'] to behave the same as my_dict.get('a')
That way, my_dict['b'] would not raise an error but return the default None value, just as my_dict.get('b') does.
In the case of my_dict = {'a': {'b': 2}} I could do my_dict['a']['b'] and it would act as my_dict.get('a').get('b')
When doing my_dict['b'] = 2, it will act the same as my_dict.update({'b': 2}).
Is it possible to do this without inheriting from dict?
You can use a collections.defaultdict() object to add a new value to the dictionary each time you try to access a non-existing key:
>>> from collections import defaultdict
>>> d = defaultdict(lambda: None)
>>> d['a'] is None
True
>>> d
defaultdict(<function <lambda> at 0x10f463e18>, {'a': None})
If you don't want the key added, create a subclass of dict that implements the __missing__ method:
class DefaultNoneDict(dict):
    def __missing__(self, key):
        return None
This explicitly won't add new keys:
>>> d = DefaultNoneDict()
>>> d['a'] is None
True
>>> d
{}
If you want to chain .get() calls, you'll have to return an empty dictionary instead; otherwise dict.get(keyA).get(keyB) fails with an AttributeError (the first None returned has no .get() method).
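A minimal sketch of the "return an empty dictionary" variant, assuming you want subscript chaining like the question's my_dict['a']['b']. NestedGetDict is a hypothetical name. Note that dict.get() never calls __missing__, so d.get('x') still returns None, and that writing through a missing key (d['x']['y'] = 1) silently goes into a throwaway dict.

```python
class NestedGetDict(dict):
    """Hypothetical dict subclass: a missing key yields an empty NestedGetDict."""

    def __missing__(self, key):
        # Return a fresh empty instance WITHOUT storing it, so lookups never
        # add keys and chained subscripts such as d['x']['y'] do not fail.
        return NestedGetDict()


d = NestedGetDict({'a': NestedGetDict({'b': 2})})
print(d['a']['b'])  # 2
print(d['x']['y'])  # {} (an empty NestedGetDict, not None)
print(d)            # {'a': {'b': 2}}
```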
Generally speaking, it is better to stick to the default type and be explicit. There is nothing wrong with:
value = some_d.get(outer, {}).get(inner)
Using a defaultdict or a dict subclass with a custom __missing__ hook has a downside: it will always produce a default when the key is missing, even when you accidentally produced incorrect keys somewhere else in your code. I often opt for an explicit dict.get() or dict.setdefault() codepath over defaultdict precisely because I want a non-existing key to produce an error in other parts of my project.
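The explicit style might look like this for both reading and writing; config and its keys are hypothetical data for illustration. Reads pass {} as the outer default, and writes use dict.setdefault() to create the inner dict only when needed, so a read with a mistyped key never silently inserts anything.

```python
# Hypothetical nested configuration, used only for illustration.
config = {'db': {'host': 'localhost'}}

# Reading: supply {} as the outer default so the chained .get() always works.
host = config.get('db', {}).get('host')    # 'localhost'
port = config.get('db', {}).get('port')    # None
user = config.get('auth', {}).get('user')  # None, and 'auth' is NOT inserted

# Writing: setdefault() creates the inner dict only when you need it.
config.setdefault('auth', {})['user'] = 'admin'
print(config)  # {'db': {'host': 'localhost'}, 'auth': {'user': 'admin'}}
```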
Our use case: if a key doesn't exist in the dictionary and we try to fetch the value for that key, a list containing only that key should be returned as the default value.
Below is an example:
>>> dic = defaultdict(<function 'custom_default_function'>, {1: [1,2,6], 3: [3,6,8]})
>>> print(dic[1])
[1, 2, 6]
>>> print(dic[5])
[5]
For the key 1, the output is completely fine because the key is in dic. But when we look up the key 5, the default value that the code prints should be [5], i.e. a list with the key as its only element.
I tried to write a default function, but I can't figure out how to pass a parameter to it.
from collections import defaultdict
def default_function(key):
    return key
# Defining the dict
d = defaultdict(default_function)
d[1] = [1, 4]
d[2] = [2, 3]
print(d[4])  # Raises TypeError: defaultdict calls default_function() with no arguments, so the positional argument 'key' is missing
Where am I going wrong and how can I resolve this using defaultdict in Python?
defaultdict will not generate a new value that depends on the key, because it always calls its default_factory with no arguments.
Instead, you could inherit from dict and override __missing__:
class MyDict(dict):
    def __init__(self):
        super().__init__()
    def __missing__(self, key):
        self[key] = [key]
        return self[key]
my_dict = MyDict()
print(my_dict[5])  # -> [5]
print(my_dict)  # -> {5: [5]}
There are two other answers that might help:
Accessing key in factory of defaultdict
Is there a clever way to pass the key to defaultdict's default_factory?
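One way to get a key-dependent default while keeping defaultdict's interface is to subclass it and override __missing__ so that default_factory receives the key. This is a sketch, and KeyDefaultDict is a hypothetical name; as with any defaultdict, d.get() does not trigger the factory.

```python
from collections import defaultdict


class KeyDefaultDict(defaultdict):
    """Hypothetical defaultdict variant whose default_factory receives the key."""

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        # Unlike a plain defaultdict, call the factory WITH the missing key,
        # store the result, and return it.
        value = self[key] = self.default_factory(key)
        return value


d = KeyDefaultDict(lambda key: [key], {1: [1, 2, 6], 3: [3, 6, 8]})
print(d[1])  # [1, 2, 6]
print(d[5])  # [5]
```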
Mypy considers this to be valid with strict = true:
from typing import Dict, TypeVar
KeyType = TypeVar("KeyType")
ValueType = TypeVar("ValueType")
class InvertibleDict(Dict[KeyType, ValueType]):
    def __inverse__(self) -> "InvertibleDict[ValueType, KeyType]":
        new_instance: "InvertibleDict[ValueType, KeyType]" = self.__class__()
        for key, value in self.items():
            new_instance[value] = key
        return new_instance
However, it does not accept the following, more concise version of the same code, saying that "Keywords must be strings" on the last line:
from typing import Dict, TypeVar
KeyType = TypeVar("KeyType")
ValueType = TypeVar("ValueType")
class InvertibleDict(Dict[KeyType, ValueType]):
    def __inverse__(self) -> "InvertibleDict[ValueType, KeyType]":
        return self.__class__(**{value: key for key, value in self.items()})
Mypy is correct here: it is catching a bug in your implementation (the beauty of static type checking). The type of:
{value: key for key, value in self.items()}
Is Dict[ValueType, KeyType], but that will fail in general when you do:
dict(**some_mapping)
Where the keys are not guaranteed to be strings.
Observe:
>>> dict(**{1:2,3:4})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: keywords must be strings
You just want:
return self.__class__({value: key for key, value in self.items()})
Which won't fail in general:
>>> dict({1:2,3:4})
{1: 2, 3: 4}
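Putting that fix into the original class, a runnable sketch (the sample data is hypothetical) shows the positional form working with non-string keys:

```python
from typing import Dict, TypeVar

KeyType = TypeVar("KeyType")
ValueType = TypeVar("ValueType")


class InvertibleDict(Dict[KeyType, ValueType]):
    def __inverse__(self) -> "InvertibleDict[ValueType, KeyType]":
        # Pass the comprehension positionally: the constructor accepts any
        # mapping, whereas ** unpacking only accepts string keys.
        return self.__class__({value: key for key, value in self.items()})


d = InvertibleDict({1: 'a', 2: 'b'})
print(d.__inverse__())  # {'a': 1, 'b': 2}
```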
Personally, I would go with your first implementation regardless: the comprehension version builds a temporary dict and then copies it into the new instance, which needlessly doubles the memory required and adds a second pass over the data.
Note that you would probably never use ** unpacking to initialize a dict; the keyword-argument form of the constructor is a convenience for writing something like:
>>> dict(foo=1, bar=2)
{'foo': 1, 'bar': 2}
You can even use this handy trick when copying a dictionary but wanting to force a value for particular string keys:
>>> dict({'foo': 1, 'bar': 2}, bar=42)
{'foo': 1, 'bar': 42}
Just for laughs I tried return self.__class__({value: key for key, value in self.items()}), which seems to work the same and passes mypy checks. TIL dicts can be initialised with a dict rather than **kwargs.
I saw this example at pythontips. I don't understand the second line, where defaultdict takes "tree" as an argument and returns a "tree".
import collections
tree = lambda: collections.defaultdict(tree)
some_dict = tree()
some_dict['color']['favor'] = "yellow"
# Works fine
After running this code, I inspected some_dict:
defaultdict(<function <lambda> at 0x7f19ae634048>,
            {'color': defaultdict(<function <lambda> at 0x7f19ae634048>, {'favor': 'yellow'})})
This is a pretty clever way to create a recursive defaultdict. It's a little tricky to understand at first but once you dig into what's happening, it's actually a pretty simple use of recursion.
In this example, we define a recursive lambda function, tree, that returns a defaultdict whose constructor is tree. Let's rewrite this using regular functions for clarity.
from collections import defaultdict
from pprint import pprint
def get_recursive_dict():
    return defaultdict(get_recursive_dict)
Note that we're returning defaultdict(get_recursive_dict) and not defaultdict(get_recursive_dict()). We want to pass defaultdict a callable object (i.e. the function get_recursive_dict). Actually calling get_recursive_dict() would result in infinite recursion.
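To make the difference concrete, here is a small sketch contrasting the two; broken is a hypothetical name for the incorrect version, and the RecursionError is caught only so the example can finish.

```python
from collections import defaultdict


def get_recursive_dict():
    # Pass the function object itself; defaultdict calls it later, on demand.
    return defaultdict(get_recursive_dict)


ok = get_recursive_dict()
ok['a']['b']['c'] = 1  # every missing level is created lazily


def broken():
    # Calling broken() here runs it immediately, which calls it again, forever.
    return defaultdict(broken())


try:
    broken()
    recursed = False
except RecursionError:
    recursed = True

print(recursed)  # True
```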
If we call get_recursive_dict, we get an empty defaultdict whose default factory is the function get_recursive_dict.
recursive_dict = get_recursive_dict()
print(recursive_dict)
# defaultdict(<function get_recursive_dict at 0x0000000004FFC4A8>, {})
Let's see this in action. Accessing the missing key 'alice' creates it, and its value defaults to an empty defaultdict whose default factory is the function get_recursive_dict. Notice that this is the same default factory as our recursive_dict!
print(recursive_dict['alice'])
# defaultdict(<function get_recursive_dict at 0x0000000004AF46D8>, {})
print(recursive_dict)
# defaultdict(<function get_recursive_dict at 0x0000000004AF46D8>, {'alice': defaultdict(<function get_recursive_dict at 0x0000000004AF46D8>, {})})
So we can create as many nested dictionaries as we want.
recursive_dict['bob']['age'] = 2
recursive_dict['charlie']['food']['dessert'] = 'cake'
print(recursive_dict)
# defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'charlie': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'food': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'dessert': 'cake'})}), 'bob': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'age': 2}), 'alice': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {})})
Once you assign an ordinary (non-dict) value to a key, you can no longer create nested dictionaries beneath that key.
recursive_dict['bob']['age']['year'] = 2016
# TypeError: 'int' object does not support item assignment
I hope this clears things up!
Two points to note:
lambda represents an anonymous function.
Functions are first-class objects in Python. They may be assigned to a variable like any other object.
So here are two different ways to define functionally identical objects. They are recursive functions because they reference themselves.
from collections import defaultdict
# anonymous
tree = lambda: defaultdict(tree)
# explicit
def tree(): return defaultdict(tree)
Running the final two lines of the question's code with each definition in turn, you see only a subtle difference in how the default factory is named in the repr:
# anonymous
defaultdict(<function __main__.<lambda>()>,
{'color': defaultdict(<function __main__.<lambda>()>,
{'favor': 'yellow'})})
# explicit
defaultdict(<function __main__.tree()>,
{'color': defaultdict(<function __main__.tree()>,
{'favor': 'yellow'})})
It's easier to see with a simpler example: if you define a = lambda: a, then a() returns a itself. So:
>>> a = lambda: a
>>> a()()()()
<function <lambda> at 0x102bffd08>
They're doing this with the defaultdict too. tree is a function returning a defaultdict whose default value is yet another defaultdict, and so on.
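You can check this directly by inspecting the default_factory attribute at each level; a short sketch using the named-function form of tree:

```python
from collections import defaultdict


def tree():
    return defaultdict(tree)


t = tree()
inner = t['color']  # missing key, so tree() is called to build the value

# Every level shares the same factory: the function tree itself.
print(t.default_factory is tree)      # True
print(inner.default_factory is tree)  # True
print(type(inner))                    # <class 'collections.defaultdict'>
```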
I wasn't aware of this either; I thought tree would have to be defined first. Maybe it's a special Python rule? Edit: No, I forgot that Python looks up names at runtime, and by the time the lambda is called, tree already refers to it. In C++ there's compile-time reference checking, but you can still define functions that reference themselves.
It seems like a way to create behavior that some users wouldn't expect. For example, if you accidentally rebind tree later, your defaultdict breaks:
>>> import collections
>>> tree = lambda: collections.defaultdict(tree)
>>> some_dict = tree()
>>> tree = 4
>>> some_dict[4][3] = 2 # TypeError: first argument must be callable or None
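If that fragility matters, one common alternative is a dict subclass whose __missing__ builds the nested value from type(self) rather than from a global name. Tree is a hypothetical name, and the example deliberately rebinds it to show that existing instances keep working.

```python
class Tree(dict):
    """Hypothetical autovivifying dict that does not look up a global name."""

    def __missing__(self, key):
        # type(self) is resolved from the instance, so rebinding the name
        # Tree later does not affect dictionaries that already exist.
        value = self[key] = type(self)()
        return value


some_dict = Tree()
Tree = 4  # the same accidental rebinding as above
some_dict['color']['favor'] = "yellow"
print(some_dict)  # {'color': {'favor': 'yellow'}}
```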
In someone else's code I read the following two lines:
x = defaultdict(lambda: 0)
y = defaultdict(lambda: defaultdict(lambda: 0))
As the argument of defaultdict is a default factory, I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed. Am I correct?
And what about y? It seems that the default factory will create a defaultdict with default 0. But what does that mean concretely? I tried to play around with it in Python shell, but couldn't figure out what it is exactly.
I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed.
That's right. This is more idiomatically written as:
x = defaultdict(int)
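A quick sketch to confirm both points, reading a missing key from a defaultdict(int) (the key 'k' is arbitrary):

```python
from collections import defaultdict

x = defaultdict(int)  # int() returns 0, just like lambda: 0
v = x['k']            # reading a missing key inserts ('k', 0) first
print(v)              # 0
print(dict(x))        # {'k': 0}
```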
In the case of y, when you do y["ham"]["spam"], the key "ham" is inserted in y if it does not exist. The value associated with it becomes a defaultdict in which "spam" is automatically inserted with a value of 0.
I.e., y is a kind of "two-tiered" defaultdict. If "ham" is not in y, then evaluating y["ham"]["spam"] is like doing
y["ham"] = {}
y["ham"]["spam"] = 0
in terms of an ordinary dict.
You are correct for what the first one does. As for y, it will create a defaultdict with default 0 when a key doesn't exist in y, so you can think of this as a nested dictionary. Consider the following example:
from collections import defaultdict
y = defaultdict(lambda: defaultdict(lambda: 0))
print(y['k1']['k2'])  # 0
print(dict(y['k1']))  # {'k2': 0}
To create an equivalent nested dictionary structure without defaultdict you would need to create an inner dict for y['k1'] and then set y['k1']['k2'] to 0, but defaultdict does all of this behind the scenes when it encounters keys it hasn't seen:
y = {}
y['k1'] = {}
y['k1']['k2'] = 0
The following function may help for playing around with this on an interpreter to better your understanding:
from collections import defaultdict
def to_dict(d):
    if isinstance(d, defaultdict):
        return {k: to_dict(v) for k, v in d.items()}
    return d
This will return the dict equivalent of a nested defaultdict, which is a lot easier to read, for example:
>>> y = defaultdict(lambda: defaultdict(lambda: 0))
>>> y['a']['b'] = 5
>>> y
defaultdict(<function <lambda> at 0xb7ea93e4>, {'a': defaultdict(<function <lambda> at 0xb7ea9374>, {'b': 5})})
>>> to_dict(y)
{'a': {'b': 5}}
defaultdict takes a zero-argument callable as the first argument to its constructor; it is called when a key is not found, as you correctly explained.
lambda: 0 will of course always return zero, but the preferred method to do that is defaultdict(int), which will do the same thing.
As for the second part, the author would like to create a new defaultdict(int), or a nested dictionary, whenever a key is not found in the top-level dictionary.
All the answers are good enough; I'm adding this one to give a bit more information:
"defaultdict requires an argument that is callable. The return value of that callable is the default value that the dictionary returns when you try to access it with a key that does not exist."
Here's an example
from collections import defaultdict
SAMPLE = {'Age': 28, 'Salary': 2000}
SAMPLE = defaultdict(lambda: 0, SAMPLE)
>>> SAMPLE
defaultdict(<function <lambda> at 0x0000000002BF7C88>, {'Salary': 2000, 'Age': 28})
>>> SAMPLE['Age']
28
>>> SAMPLE['Phone']  # 'Phone' does not exist in SAMPLE, so the default 0 is returned
0
y = defaultdict(lambda: defaultdict(lambda: 0))
is helpful for nested counting: y['a']['b'] += 1 works even when neither 'a' nor 'b' exists yet.
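For example, a sketch of two-level counting with hypothetical (outer, inner) pairs; the inner factory uses int, which is equivalent to lambda: 0:

```python
from collections import defaultdict

# Hypothetical input data: (outer, inner) pairs to be counted.
pairs = [('a', 'b'), ('a', 'b'), ('a', 'c'), ('x', 'y')]

y = defaultdict(lambda: defaultdict(int))
for outer, inner in pairs:
    y[outer][inner] += 1  # both levels are created on first use

print({k: dict(v) for k, v in y.items()})
# {'a': {'b': 2, 'c': 1}, 'x': {'y': 1}}
```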
How does:
dict = {}
if key not in dict:
dict[key] = foo
Compare to:
try:
dict[key]
except KeyError:
dict[key] = foo
i.e., is looking up a key in any way faster than the linear search through dict.keys() that I assume the first form does?
Just to clarify one point: if key not in d doesn't do a linear search through d's keys. It uses the dict's hash table to quickly find the key.
You're looking for the setdefault method:
>>> r = {}
>>> r.setdefault('a', 'b')
'b'
>>> r
{'a': 'b'}
>>> r.setdefault('a', 'e')
'b'
>>> r
{'a': 'b'}
The answer depends on how often the key is already in the dict (BTW, has anyone mentioned to you how bad an idea it is to hide a builtin such as dict behind a variable?)
if key not in dct:
dct[key] = foo
If the key is in the dictionary, this does one dictionary lookup. If the key is not in the dictionary, it looks up the dictionary twice (once for the in test and once for the assignment).
try:
dct[key]
except KeyError:
dct[key] = foo
This may be slightly faster when the key is in the dictionary, but when it is missing, raising and catching the exception has quite a big overhead, so it is rarely the best option.
dct.setdefault(key, foo)
This one is slightly tricky: it always involves two dictionary lookups. The first finds the setdefault method in the dict class; the second looks for key in the dct object. Also, if foo is an expression, it is evaluated every time, whereas the earlier options only evaluate it when they have to.
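The eager evaluation of setdefault's default argument can be observed with a counting function; make_default is a hypothetical stand-in for an expensive expression:

```python
calls = 0


def make_default():
    """Hypothetical 'expensive' default that counts its own evaluations."""
    global calls
    calls += 1
    return []


d = {'key': [1]}

# setdefault: the argument is evaluated even though 'key' already exists.
d.setdefault('key', make_default())

# Membership test: make_default() only runs when the key is missing.
if 'key' not in d:
    d['key'] = make_default()

print(calls)     # 1
print(d['key'])  # [1]
```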
Also look at collections.defaultdict. That is the most appropriate solution for a large class of situations like this.
Try: my_dict.setdefault(key, default). It's slightly slower than the other options, though.
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
#!/usr/bin/env python
example_dict = dict(zip(range(10), range(10)))
def kn(key, d):
    if key not in d:
        d[key] = 'foo'
def te(key, d):
    try:
        d[key]
    except KeyError:
        d[key] = 'foo'
def sd(key, d):
    d.setdefault(key, 'foo')
if __name__ == '__main__':
    from timeit import Timer
    t = Timer("kn(2, example_dict)", "from __main__ import kn, example_dict")
    print(t.timeit())
    t = Timer("te(2, example_dict)", "from __main__ import te, example_dict")
    print(t.timeit())
    t = Timer("sd(2, example_dict)", "from __main__ import sd, example_dict")
    print(t.timeit())
# kn: 0.249855041504
# te: 0.244259119034
# sd: 0.375113964081
my_dict.get(key, foo) returns foo if key isn't in my_dict. The default value is None, so my_dict.get(key) will return None if key isn't in my_dict. The first of your options is better if you want to just add key to your dictionary. Don't worry about speed here. If you find that populating your dictionary is a hot spot in your program, then think about it. But it isn't. So don't.
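The difference between get() and setdefault() in a nutshell (the key and default are arbitrary):

```python
d = {}

print(d.get('k', 'foo'))         # foo
print('k' in d)                  # False: get() never inserts the key
print(d.setdefault('k', 'foo'))  # foo
print('k' in d)                  # True: setdefault() inserts it
```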