I saw this example at pythontips. I do not understand the second line, where defaultdict takes tree as its argument and tree itself returns a defaultdict.
import collections
tree = lambda: collections.defaultdict(tree)
some_dict = tree()
some_dict['color']['favor'] = "yellow"
# Works fine
After running this code, I inspected some_dict:
defaultdict(<function <lambda> at 0x7f19ae634048>,
            {'color': defaultdict(<function <lambda> at 0x7f19ae634048>, {'favor': 'yellow'})})
This is a pretty clever way to create a recursive defaultdict. It's a little tricky to understand at first but once you dig into what's happening, it's actually a pretty simple use of recursion.
In this example, we define a recursive lambda function, tree, that returns a defaultdict whose default factory is tree itself. Let's rewrite this using regular functions for clarity.
from collections import defaultdict
from pprint import pprint
def get_recursive_dict():
    return defaultdict(get_recursive_dict)
Note that we're returning defaultdict(get_recursive_dict) and not defaultdict(get_recursive_dict()). We want to pass defaultdict a callable object (i.e. the function get_recursive_dict). Actually calling get_recursive_dict() would result in infinite recursion.
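One way to confirm the difference is to inspect the default_factory attribute. The following is a minimal sketch; the function broken is a hypothetical counter-example that exists only to show the failure mode described above.

```python
from collections import defaultdict


def get_recursive_dict():
    # Pass the function itself; defaultdict calls it later, on a missing key.
    return defaultdict(get_recursive_dict)


def broken():
    # Hypothetical counter-example: broken() is evaluated before defaultdict
    # is ever constructed, so the call never bottoms out.
    return defaultdict(broken())


d = get_recursive_dict()
print(d.default_factory is get_recursive_dict)  # True

try:
    broken()
except RecursionError:
    recursion_failed = True
else:
    recursion_failed = False
print(recursion_failed)  # True
```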
If we call get_recursive_dict, we get an empty defaultdict whose default factory is the function get_recursive_dict.
recursive_dict = get_recursive_dict()
print(recursive_dict)
# defaultdict(<function get_recursive_dict at 0x0000000004FFC4A8>, {})
Let's see this in action. Access the key 'alice' and its corresponding value defaults to an empty defaultdict whose default factory is the function get_recursive_dict. Notice that this is the same default factory as our recursive_dict!
print(recursive_dict['alice'])
# defaultdict(<function get_recursive_dict at 0x0000000004AF46D8>, {})
print(recursive_dict)
# defaultdict(<function get_recursive_dict at 0x0000000004AF46D8>, {'alice': defaultdict(<function get_recursive_dict at 0x0000000004AF46D8>, {})})
So we can create as many nested dictionaries as we want.
recursive_dict['bob']['age'] = 2
recursive_dict['charlie']['food']['dessert'] = 'cake'
print(recursive_dict)
# defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'charlie': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'food': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'dessert': 'cake'})}), 'bob': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'age': 2}), 'alice': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {})})
Once you assign a non-dictionary value to a key, you can no longer nest arbitrarily deep below that key.
recursive_dict['bob']['age']['year'] = 2016
# TypeError: 'int' object does not support item assignment
I hope this clears things up!
Two points to note:
lambda represents an anonymous function.
Functions are first-class objects in Python. They may be assigned to a variable like any other object.
So here are 2 different ways to define functionally identical objects. They are recursive functions because they reference themselves.
from collections import defaultdict
# anonymous
tree = lambda: defaultdict(tree)
# explicit
def tree(): return defaultdict(tree)
Running the final 2 lines with these different definitions in turn, you see only a subtle difference in the naming of the defaultdict type:
# anonymous
defaultdict(<function __main__.<lambda>()>,
            {'color': defaultdict(<function __main__.<lambda>()>,
                                  {'favor': 'yellow'})})
# explicit
defaultdict(<function __main__.tree()>,
            {'color': defaultdict(<function __main__.tree()>,
                                  {'favor': 'yellow'})})
It's easier to see if you try this: a = lambda: a, you'll see that a() returns a. So...
>>> a = lambda: a
>>> a()()()()
<function <lambda> at 0x102bffd08>
They're doing this with the defaultdict too. tree is a function returning a defaultdict whose default value is yet another defaultdict, and so on.
I wasn't actually aware of this either. I thought tree would have to be defined first. Maybe it's a special Python rule? (EDIT:) No, I forgot that Python does the name lookup at runtime, and tree already points to the lambda then. In C++ there's compile-time reference checking, but you can define functions that reference themselves.
It seems like a way to create behavior that some users wouldn't expect. Like say you accidentally redefine tree later, your defaultdict is broken:
>>> import collections
>>> tree = lambda: collections.defaultdict(tree)
>>> some_dict = tree()
>>> tree = 4
>>> some_dict[4][3] = 2 # TypeError: first argument must be callable or None
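If the rebinding problem above is a concern, one possible workaround is to keep the self-referencing name in a local scope, where later assignments at module level cannot reach it. This is only a sketch; make_tree and node are hypothetical names, not part of the original snippet.

```python
from collections import defaultdict


def make_tree():
    # `node` lives in make_tree's local scope, so rebinding a module-level
    # name later cannot change what the factory refers to.
    def node():
        return defaultdict(node)
    return node()


some_dict = make_tree()
make_tree = 4            # rebinding the outer name does not affect some_dict
some_dict[4][3] = 2      # still works
print(some_dict[4][3])   # 2
```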
I am using the snippet from this answer, more specifically the one that uses a lambda.
I have been trying to implement this using the := operator in Python 3.8 and I ended up with two different implementations, both of which I believe do what I want.
Snippet 1 (using a lambda defined beforehand, same as snippet in linked answer)
from collections import defaultdict
from pprint import pprint
func = lambda:defaultdict(func)
infinite_nested_dicts_one = func()
infinite_nested_dicts_one[1][2][3][4][5]['a'] = '12345a'
infinite_nested_dicts_one[1][2][3][4][5]['b'] = '12345b'
print(infinite_nested_dicts_one[1][2][3][4][5]['a']) # '12345a'
print(infinite_nested_dicts_one[1][2][3][4][5]['b']) # '12345b'
pprint(infinite_nested_dicts_one)
Output 1
12345a
12345b
defaultdict(<function <lambda> at 0x000001A7405EA790>,
{1: defaultdict(<function <lambda> at 0x000001A7405EA790>,
{2: defaultdict(<function <lambda> at 0x000001A7405EA790>,
{3: defaultdict(<function <lambda> at 0x000001A7405EA790>,
{4: defaultdict(<function <lambda> at 0x000001A7405EA790>,
{5: defaultdict(<function <lambda> at 0x000001A7405EA790>,
{'a': '12345a',
'b': '12345b'})})})})})})
Snippet 2 (using assignment operator)
from collections import defaultdict
from pprint import pprint
infinite_nested_dicts_two = (func:=defaultdict(lambda:func))
infinite_nested_dicts_two[1][2][3][4][5]['a'] = '12345a'
infinite_nested_dicts_two[1][2][3][4][5]['b'] = '12345b'
print(infinite_nested_dicts_two[1][2][3][4][5]['a']) # '12345a'
print(infinite_nested_dicts_two[1][2][3][4][5]['b']) # '12345b'
pprint(infinite_nested_dicts_two)
Output 2
12345a
12345b
defaultdict(<function <lambda> at 0x0000022F1A0EA790>,
{1: <Recursion on defaultdict with id=2401323545280>,
2: <Recursion on defaultdict with id=2401323545280>,
3: <Recursion on defaultdict with id=2401323545280>,
4: <Recursion on defaultdict with id=2401323545280>,
5: <Recursion on defaultdict with id=2401323545280>,
'a': '12345a',
'b': '12345b'})
In both cases accessing the outer most defaultdict for [1][2][3][4][5]['a'] and [1][2][3][4][5]['b'] gives me the same output.
So the question is
Why is the pprint output different?
Are these identical or not?
Are they both arbitrary nested defaultdict of defaultdict?
Am I missing some details?
PS :
I am aware that the syntactical equivalent using := is infinite_nested_dicts_two = (func:=lambda:defaultdict(func))().
Do let me know if any clarification is needed. Many Thanks.
Your second version assigns a defaultdict instance to func (and, redundantly, to infinite_nested_dicts_two). Accordingly, instead of using func (which is not a function!) as its default-generating function, it uses a constant function that always returns func—i.e., the dictionary itself is the default value. The necessary result is the heavily recursive structure revealed by pprint, since every level of indexing adds a key to, and returns, the same object.
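The difference can be checked directly with identity tests. The sketch below mirrors Snippet 1 and Snippet 2 with shorter, hypothetical names (node, nested, shared) and avoids := so that it also runs on older Python 3 versions.

```python
from collections import defaultdict


def node():
    return defaultdict(node)


nested = node()                        # like Snippet 1: a new dict per level
shared = defaultdict(lambda: shared)   # like Snippet 2: the default is the dict itself

nested[1][2]['a'] = 'x'
shared[1][2]['a'] = 'x'

print(nested[1] is nested)   # False
print(shared[1] is shared)   # True
print(list(nested))          # [1]
print(list(shared))          # [1, 2, 'a']
```

In the second structure every level of indexing lands on the same object, so all keys end up side by side in one dictionary.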
my_dict = {'a': 1}
I wish for my_dict['a'] to behave the same as my_dict.get('a')
That way, if I do my_dict['b'], I will not raise an error but get the default None value, the same way you would get it from my_dict.get('b')
In the case of my_dict = {'a': {'b': 2}} I could do my_dict['a']['b'] and it would act as my_dict.get('a').get('b')
When doing my_dict['b'] = 2, it should act the same as my_dict.update({'b': 2}).
Is it possible to do this without inheriting from dict?
You can use a collections.defaultdict() object to add a new value to the dictionary each time you try to access a non-existing key:
>>> from collections import defaultdict
>>> d = defaultdict(lambda: None)
>>> d['a'] is None
True
>>> d
defaultdict(<function <lambda> at 0x10f463e18>, {'a': None})
If you don't want the key added, create a subclass of dict that implements the __missing__ method:
class DefaultNoneDict(dict):
    def __missing__(self, key):
        return None
This explicitly won't add new keys:
>>> d = DefaultNoneDict()
>>> d['a'] is None
True
>>> d
{}
If you wanted to chain .get() calls, you'll have to return an empty dictionary instead, otherwise dict.get(keyA).get(keyB) will fail with an attribute error (the first None returned won't have a .get() method).
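As a rough illustration of that trade-off, the sketch below returns an empty instance for missing keys without storing it. ChainableDict is a hypothetical name, and the nested values must themselves be ChainableDict instances for chained lookups to work at every level.

```python
class ChainableDict(dict):
    """Hypothetical dict subclass: a missing key yields an empty instance."""

    def __missing__(self, key):
        # Return a fresh empty mapping; nothing is stored in self.
        return type(self)()


d = ChainableDict(a=ChainableDict(b=2))
print(d['a']['b'])   # 2
print(d['x']['y'])   # {}
print(d.get('x'))    # None
print(d)             # {'a': {'b': 2}}
```

Note that a fully missing chain ends in an empty dictionary rather than None, and that .get() bypasses __missing__ entirely.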
Generally speaking, it is better to stick to the default type and be explicit. There is nothing wrong with:
value = some_d.get(outer, {}).get(inner)
Using a defaultdict or a dict subclass with a custom __missing__ hook has a downside: either will always produce a default when the key is missing, even when you accidentally produced incorrect keys somewhere else in your code. I often opt for an explicit dict.get() or dict.setdefault() codepath over defaultdict precisely because I want a non-existing key to produce an error in other parts of my project.
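A small sketch of what the explicit codepath can look like with a plain dict; the names inventory, apples, pears and missing_key are made up for illustration.

```python
inventory = {}

# Writes create the intermediate dict explicitly.
inventory.setdefault('fruit', {})['apple'] = 3

# Reads state their fallback at the call site.
apples = inventory.get('fruit', {}).get('apple')
pears = inventory.get('fruit', {}).get('pear')

# A mistyped key elsewhere in the code still fails loudly.
try:
    inventory['friut']['apple']
except KeyError as exc:
    missing_key = exc.args[0]

print(apples, pears, missing_key)   # 3 None friut
```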
Print dict and defaultdict:
>>> d = {'key': 'value'}
>>> print(d)
{'key': 'value'}
>>> dd = defaultdict(lambda: 'value')
>>> dd['key']
'value'
>>> print(dd)
defaultdict(<function <lambda> at 0x7fbd44cb6b70>, {'key': 'value'})
With nested structure it becomes ugly:
>>> nested_d = {'key1': {'key2': {'key3': 'value'}}}
>>> print(nested_d)
{'key1': {'key2': {'key3': 'value'}}}
>>> def factory():
...     return defaultdict(factory)
...
>>> nested_dd = defaultdict(factory)
>>> nested_dd['key1']['key2']['key3'] = 'value'
>>> print(nested_dd)
defaultdict(<function factory at 0x7fbd44cd4ea0>, {'key1': defaultdict(<function factory at 0x7fbd44cd4ea0>, {'key2': defaultdict(<function factory at 0x7fbd44cd4ea0>, {'key3': 'value'})})})
Were there any reasons for not making it human-readable by default? (UPD: I mean what are the reasons behind not having custom __str__ defined for defaultdict by default?)
repr() output (defaultdict has no __str__, only __repr__) is debugging output. It is not meant to be pretty, it is meant to be functional. It tells you the type, the repr() of the callable that produces the default, and the contents.
From the __repr__ documentation:
This is typically used for debugging, so it is important that the representation is information-rich and unambiguous.
Like all data types in Python (except for strings, for obvious reasons), defaultdict defines no informal representation (__str__), because it is up to the programmer to decide what output is suitable for their use case. No default can be set for that, because use cases vary so widely. Output for a file has different needs than output to a GUI or to a web page, for example.
In Python 2, convert the object to a plain dictionary first, then use pprint() if you want 'pretty' output:
def todict(d):
    if not isinstance(d, dict):
        return d
    return {k: todict(v) for k, v in d.items()}
pprint(todict(nested_dd))
In Python 3, pprint supports defaultdict directly:
>>> pprint(nested_dd)
defaultdict(<function factory at 0x105ed2f28>,
{'key1': defaultdict(<function factory at 0x105ed2f28>,
{'key2': defaultdict(<function factory at 0x105ed2f28>,
{'key3': 'value'})})})
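Another option, if the keys and values happen to be JSON-serializable, is to lean on the json module, which treats dict subclasses as plain objects. This is a sketch of an alternative, not something the answer above proposes; the variable text is a made-up name.

```python
import json
from collections import defaultdict


def factory():
    return defaultdict(factory)


nested_dd = defaultdict(factory)
nested_dd['key1']['key2']['key3'] = 'value'

# json.dumps ignores the default factory and prints only the contents.
text = json.dumps(nested_dd, indent=2)
print(text)
```

This only works when every key is a string or number and every leaf value is something JSON can encode.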
There's no way to know what, if anything, the author(s) were thinking or even whether they gave it much consideration at all.
For the specific case of nested defaultdicts, as shown in your example code:
def factory():
    return defaultdict(factory)
nested_dd = defaultdict(factory)
nested_dd['key1']['key2']['key3'] = 'value'
You can avoid the issue by subclassing dict like this instead:
class Tree(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value
nested_dd = Tree()
nested_dd['key1']['key2']['key3'] = 'value'
print(nested_dd) # -> {'key1': {'key2': {'key3': 'value'}}}
Since the subclass doesn't define its own __repr__() or __str__() methods, instances of it will print (and pprint) just like regular dict instances do.
In someone else's code I read the following two lines:
x = defaultdict(lambda: 0)
y = defaultdict(lambda: defaultdict(lambda: 0))
As the argument of defaultdict is a default factory, I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed. Am I correct?
And what about y? It seems that the default factory will create a defaultdict with default 0. But what does that mean concretely? I tried to play around with it in Python shell, but couldn't figure out what it is exactly.
I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed.
That's right. This is more idiomatically written
x = defaultdict(int)
In the case of y, when you do y["ham"]["spam"], the key "ham" is inserted in y if it does not exist. The value associated with it becomes a defaultdict in which "spam" is automatically inserted with a value of 0.
I.e., y is a kind of "two-tiered" defaultdict. If "ham" not in y, then evaluating y["ham"]["spam"] is like doing
y["ham"] = {}
y["ham"]["spam"] = 0
in terms of ordinary dict.
You are correct for what the first one does. As for y, it will create a defaultdict with default 0 when a key doesn't exist in y, so you can think of this as a nested dictionary. Consider the following example:
y = defaultdict(lambda: defaultdict(lambda: 0))
print(y['k1']['k2'])  # 0
print(dict(y['k1']))  # {'k2': 0}
To create an equivalent nested dictionary structure without defaultdict you would need to create an inner dict for y['k1'] and then set y['k1']['k2'] to 0, but defaultdict does all of this behind the scenes when it encounters keys it hasn't seen:
y = {}
y['k1'] = {}
y['k1']['k2'] = 0
The following function may help for playing around with this on an interpreter to better your understanding:
def to_dict(d):
    if isinstance(d, defaultdict):
        return dict((k, to_dict(v)) for k, v in d.items())
    return d
This will return the dict equivalent of a nested defaultdict, which is a lot easier to read, for example:
>>> y = defaultdict(lambda: defaultdict(lambda: 0))
>>> y['a']['b'] = 5
>>> y
defaultdict(<function <lambda> at 0xb7ea93e4>, {'a': defaultdict(<function <lambda> at 0xb7ea9374>, {'b': 5})})
>>> to_dict(y)
{'a': {'b': 5}}
defaultdict takes a zero-argument callable to its constructor, which is called when the key is not found, as you correctly explained.
lambda: 0 will of course always return zero, but the preferred method to do that is defaultdict(int), which will do the same thing.
As for the second part, the author would like to create a new defaultdict(int), or a nested dictionary, whenever a key is not found in the top-level dictionary.
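The insert-on-read behaviour discussed in the question is easy to verify. A minimal sketch follows; the key names are arbitrary.

```python
from collections import defaultdict

x = defaultdict(int)      # int() returns 0, same effect as lambda: 0

value = x['k']            # missing key: the factory runs and ('k', 0) is stored
print(value, dict(x))     # 0 {'k': 0}

print(x.get('other'))     # None
print('other' in x)       # False
```

Only item access (x[k]) triggers the factory; .get() and membership tests leave the dictionary untouched.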
The other answers are good; I am adding this one to give a little more information:
"defaultdict requires an argument that is callable. The return value of that callable is the default value that the dictionary returns when you try to access it with a key that does not exist."
Here's an example
SAMPLE= {'Age':28, 'Salary':2000}
SAMPLE = defaultdict(lambda:0,SAMPLE)
>>> SAMPLE
defaultdict(<function <lambda> at 0x0000000002BF7C88>, {'Salary': 2000, 'Age': 28})
>>> SAMPLE['Age']    # an existing key returns its stored value
28
>>> SAMPLE['Phone']  # a missing key returns the default, 0, and stores it
0
y = defaultdict(lambda:defaultdict(lambda:0))
will be helpful if you try this y['a']['b'] += 1
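For instance, a two-level counter can be built without any existence checks. This is a small sketch with made-up data; pair_counts, observations and plain are hypothetical names.

```python
from collections import defaultdict

pair_counts = defaultdict(lambda: defaultdict(int))

observations = [('a', 'b'), ('a', 'b'), ('a', 'c'), ('z', 'b')]
for outer, inner in observations:
    # Both levels are created on demand, so += works on the first visit.
    pair_counts[outer][inner] += 1

plain = {k: dict(v) for k, v in pair_counts.items()}
print(plain)   # {'a': {'b': 2, 'c': 1}, 'z': {'b': 1}}
```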
Update: dicts retaining insertion order is guaranteed for Python 3.7+
I want to use a .py file like a config file.
So using the {...} notation I can create a dictionary using strings as keys but the definition order is lost in a standard python dictionary.
My question: is it possible to override the {...} notation so that I get an OrderedDict() instead of a dict()?
I was hoping that simply overriding dict constructor with OrderedDict (dict = OrderedDict) would work, but it doesn't.
Eg:
dict = OrderedDict
dictname = {
'B key': 'value1',
'A key': 'value2',
'C key': 'value3'
}
print dictname.items()
Output:
[('B key', 'value1'), ('A key', 'value2'), ('C key', 'value3')]
Here's a hack that almost gives you the syntax you want:
class _OrderedDictMaker(object):
    def __getitem__(self, keys):
        if not isinstance(keys, tuple):
            keys = (keys,)
        assert all(isinstance(key, slice) for key in keys)
        return OrderedDict([(k.start, k.stop) for k in keys])
ordereddict = _OrderedDictMaker()
from nastyhacks import ordereddict
menu = ordereddict[
"about" : "about",
"login" : "login",
'signup': "signup"
]
Edit: Someone else discovered this independently, and has published the odictliteral package on PyPI that provides a slightly more thorough implementation - use that package instead
To literally get what you are asking for, you have to fiddle with the syntax tree of your file. I don't think it is advisable to do so, but I couldn't resist the temptation to try. So here we go.
First, we create a module with a function my_execfile() that works like the built-in execfile(), except that all occurrences of dictionary displays, e.g. {3: 4, "a": 2} are replaced by explicit calls to the dict() constructor, e.g. dict([(3, 4), ('a', 2)]). (Of course we could directly replace them by calls to collections.OrderedDict(), but we don't want to be too intrusive.) Here's the code:
import ast
class DictDisplayTransformer(ast.NodeTransformer):
    def visit_Dict(self, node):
        self.generic_visit(node)
        list_node = ast.List(
            [ast.copy_location(ast.Tuple(list(x), ast.Load()), x[0])
             for x in zip(node.keys, node.values)],
            ast.Load())
        name_node = ast.Name("dict", ast.Load())
        new_node = ast.Call(ast.copy_location(name_node, node),
                            [ast.copy_location(list_node, node)],
                            [], None, None)
        return ast.copy_location(new_node, node)
def my_execfile(filename, globals=None, locals=None):
    if globals is None:
        globals = {}
    if locals is None:
        locals = globals
    node = ast.parse(open(filename).read())
    transformed = DictDisplayTransformer().visit(node)
    exec compile(transformed, filename, "exec") in globals, locals
With this modification in place, we can modify the behaviour of dictionary displays by overwriting dict. Here is an example:
# test.py
from collections import OrderedDict
print {3: 4, "a": 2}
dict = OrderedDict
print {3: 4, "a": 2}
Now we can run this file using my_execfile("test.py"), yielding the output
{'a': 2, 3: 4}
OrderedDict([(3, 4), ('a', 2)])
Note that for simplicity, the above code doesn't touch dictionary comprehensions, which should be transformed to generator expressions passed to the dict() constructor. You'd need to add a visit_DictComp() method to the DictDisplayTransformer class. Given the above example code, this should be straight-forward.
Again, I don't recommend this kind of messing around with the language semantics. Did you have a look into the ConfigParser module?
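The code above targets Python 2 (execfile(), the exec statement, and the five-argument ast.Call). For readers on Python 3, the same idea might be sketched roughly as follows. The helper exec_with_dict_calls is a hypothetical name, the sketch takes source text rather than a filename, and it deliberately leaves displays containing ** unpacking untouched.

```python
import ast
from collections import OrderedDict


class DictDisplayTransformer(ast.NodeTransformer):
    def visit_Dict(self, node):
        self.generic_visit(node)
        # A None key marks {**other} unpacking; leave those displays alone.
        if any(key is None for key in node.keys):
            return node
        pairs = ast.List(
            elts=[ast.Tuple(elts=[k, v], ctx=ast.Load())
                  for k, v in zip(node.keys, node.values)],
            ctx=ast.Load())
        call = ast.Call(func=ast.Name(id="dict", ctx=ast.Load()),
                        args=[pairs], keywords=[])
        return ast.copy_location(call, node)


def exec_with_dict_calls(source, namespace):
    # Hypothetical helper: rewrite {...} displays into dict([...]) calls,
    # then run the code with `dict` looked up in the given namespace.
    tree = DictDisplayTransformer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    exec(compile(tree, "<config>", "exec"), namespace)
    return namespace


ns = exec_with_dict_calls("settings = {'b': 1, 'a': 2}", {"dict": OrderedDict})
print(type(ns["settings"]).__name__)   # OrderedDict
```

Since plain dicts already preserve insertion order in current Python versions, this is mostly of interest as an illustration of AST rewriting.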
OrderedDict is not "standard python syntax", however, an ordered set of key-value pairs (in standard python syntax) is simply:
[('key1 name', 'value1'), ('key2 name', 'value2'), ('key3 name', 'value3')]
To explicitly get an OrderedDict:
OrderedDict([('key1 name', 'value1'), ('key2 name', 'value2'), ('key3 name', 'value3')])
Another alternative is to sort dictname.items(), if that's all you need:
sorted(dictname.items())
As of CPython 3.6, all dictionaries preserve insertion order. In 3.6 this was an implementation detail of dict that should not be relied upon; the language guarantees it from Python 3.7 onward.
Insertion order is always preserved in the new dict implementation:
>>> x = {'a': 1, 'b':2, 'c':3 }
>>> list(x.keys())
['a', 'b', 'c']
As of python 3.6 **kwargs order [PEP468] and class attribute order [PEP520] are preserved. The new compact, ordered dictionary implementation is used to implement the ordering for both of these.
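Both guarantees can be observed with a few lines. This is a minimal sketch; keyword_order and Config are hypothetical names chosen to echo the config-file use case from the question.

```python
def keyword_order(**kwargs):
    # Keyword arguments arrive in the order they were written (PEP 468).
    return list(kwargs)


class Config:
    # The class namespace remembers definition order (PEP 520).
    b_key = 'value1'
    a_key = 'value2'
    c_key = 'value3'


print(keyword_order(b=1, a=2, c=3))   # ['b', 'a', 'c']

attrs = [name for name in vars(Config) if not name.startswith('__')]
print(attrs)                          # ['b_key', 'a_key', 'c_key']
```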
What you are asking for is impossible, but if a config file in JSON syntax is sufficient you can do something similar with the json module:
>>> import json, collections
>>> d = json.JSONDecoder(object_pairs_hook = collections.OrderedDict)
>>> d.decode('{"a":5,"b":6}')
OrderedDict([(u'a', 5), (u'b', 6)])
The one solution I found is to patch python itself, making the dict object remember the order of insertion.
This then works for all kind of syntaxes:
x = {'a': 1, 'b':2, 'c':3 }
y = dict(a=1, b=2, c=3)
etc.
I have taken the ordereddict C implementation from https://pypi.python.org/pypi/ruamel.ordereddict/ and merged back into the main python code.
If you do not mind re-building the python interpreter, here is a patch for Python 2.7.8:
https://github.com/fwyzard/cpython/compare/2.7.8...ordereddict-2.7.8.diff
If what you are looking for is a way to get easy-to-use initialization syntax - consider creating a subclass of OrderedDict and adding operators to it that update the dict, for example:
from collections import OrderedDict
class OrderedMap(OrderedDict):
    def __add__(self,other):
        self.update(other)
        return self
d = OrderedMap()+{1:2}+{4:3}+{"key":"value"}
d will then be OrderedMap([(1, 2), (4, 3), ('key','value')])
Another possible syntactic-sugar example using the slicing syntax:
class OrderedMap(OrderedDict):
    def __getitem__(self, index):
        if isinstance(index, slice):
            self[index.start] = index.stop
            return self
        else:
            return OrderedDict.__getitem__(self, index)
d = OrderedMap()[1:2][6:4][4:7]["a":"H"]