I have a large list like:
[A][B1][C1]=1
[A][B1][C2]=2
[A][B2]=3
[D][E][F][G]=4
I want to build a multi-level dict like:
A
--B1
-----C1=1
-----C2=2
--B2=3
D
--E
----F
------G=4
I know that if I use a recursive defaultdict I can write table[A][B1][C1]=1, table[A][B2]=2, but this only works if I hardcode those insert statements.
While parsing the list, I don't know beforehand how many []'s I need to call table[key1][key2][...].
You can do it without even defining a class:
from collections import defaultdict
nested_dict = lambda: defaultdict(nested_dict)
nest = nested_dict()
nest[0][1][2][3][4][5] = 6
Your example says that at any level there can be a value, and also a dictionary of sub-elements. That is called a tree, and there are many implementations available for them. This is one:
from collections import defaultdict
class Tree(defaultdict):
    def __init__(self, value=None):
        super(Tree, self).__init__(Tree)
        self.value = value
root = Tree()
root.value = 1
root['a']['b'].value = 3
print(root.value)
print(root['a']['b'].value)
print(root['c']['d']['f'].value)
Outputs:
1
3
None
You could do something similar by writing the input in JSON and using json.load to read it as a structure of nested dictionaries.
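For instance, a minimal sketch of that idea, using json.loads on a string for brevity (the keys are taken from the question):
import json
text = '{"A": {"B1": {"C1": 1, "C2": 2}, "B2": 3}, "D": {"E": {"F": {"G": 4}}}}'
table = json.loads(text)
print(table["A"]["B1"]["C1"])     # 1
print(table["D"]["E"]["F"]["G"])  # 4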
I think the simplest implementation of a recursive dictionary is this. Only leaf nodes can contain values.
# Define recursive dictionary
from collections import defaultdict
tree = lambda: defaultdict(tree)
Usage:
# Create instance
mydict = tree()
mydict['a'] = 1
mydict['b']['a'] = 2
mydict['c']
mydict['d']['a']['b'] = 0
# Print
import prettyprint
prettyprint.pp(mydict)
Output:
{
  "a": 1,
  "b": {
    "a": 2
  },
  "c": {},
  "d": {
    "a": {
      "b": 0
    }
  }
}
I'd do it with a subclass of dict that defines __missing__:
>>> class NestedDict(dict):
...     def __missing__(self, key):
...         self[key] = NestedDict()
...         return self[key]
...
>>> table = NestedDict()
>>> table['A']['B1']['C1'] = 1
>>> table
{'A': {'B1': {'C1': 1}}}
You can't do it directly with defaultdict, because defaultdict expects its factory function at initialization time and, at that point, there's no way to refer to the not-yet-defined defaultdict itself. The construct above does the same thing a defaultdict does, but since it's a named class (NestedDict), it can reference itself as missing keys are encountered. It is also possible to subclass defaultdict and override __init__.
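For reference, a minimal sketch of that defaultdict-subclass variant might look like this (the class name is just illustrative):
from collections import defaultdict
class SelfNestingDict(defaultdict):
    def __init__(self):
        # Pass the class itself as the default factory; no lambda or outer name needed.
        super().__init__(SelfNestingDict)
table = SelfNestingDict()
table['A']['B1']['C1'] = 1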
This is equivalent to the recursive lambda version, but avoids lambda notation. Perhaps it's easier to read?
from collections import defaultdict

def dict_factory():
    return defaultdict(dict_factory)

your_dict = dict_factory()
Also, from the comments: if you'd like to merge values from an existing dict, you can simply call
your_dict[0][1][2].update({"some_key": "some_value"})
to add those values to the nested dict.
Dan O'Huiginn posted a very nice solution on his journal in 2010:
http://ohuiginn.net/mt/2010/07/nested_dictionaries_in_python.html
>>> class NestedDict(dict):
...     def __getitem__(self, key):
...         if key in self: return self.get(key)
...         return self.setdefault(key, NestedDict())
>>> eggs = NestedDict()
>>> eggs[1][2][3][4][5]
{}
>>> eggs
{1: {2: {3: {4: {5: {}}}}}}
You may achieve this with a recursive defaultdict.
from collections import defaultdict
def tree():
    def the_tree():
        return defaultdict(the_tree)
    return the_tree()
It is important to protect the default factory name, the_tree here, in a closure ("private" local function scope). Avoid the one-liner lambda version, which breaks if the factory name is later rebound (the name is only looked up when a missing key is hit); implement this with a def instead.
The accepted answer, using a lambda, has a flaw: instances rely on the nested_dict name existing in an outer scope. If for whatever reason the factory name cannot be resolved (e.g. it was rebound or deleted), then pre-existing instances also become subtly broken:
>>> nested_dict = lambda: defaultdict(nested_dict)
>>> nest = nested_dict()
>>> nest[0][1][2][3][4][6] = 7
>>> del nested_dict
>>> nest[8][9] = 10
# NameError: name 'nested_dict' is not defined
To add to Hugo's answer, to have a max depth:
l = lambda x: defaultdict(lambda: l(x - 1)) if x > 0 else defaultdict(dict)
arr = l(2)
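For illustration, the depth cap then behaves like this:
arr[0][1][2]      # {} -- levels are created on demand down to here
arr[0][1][2][3]   # KeyError: the innermost level is a plain dict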
A slightly different possibility that allows regular dictionary initialization:
from collections import defaultdict
def superdict(arg=()):
    update = lambda obj, arg: obj.update(arg) or obj
    return update(defaultdict(superdict), arg)
Example:
>>> d = {"a":1}
>>> sd = superdict(d)
>>> sd["b"]["c"] = 2
You could use a NestedDict.
from ndicts.ndicts import NestedDict
nd = NestedDict()
nd[0, 1, 2, 3, 4, 5] = 6
The result as a dictionary:
>>> nd.to_dict()
{0: {1: {2: {3: {4: {5: 6}}}}}}
To install ndicts:
pip install ndicts
Is there a way to make a defaultdict also be the default for the defaultdict? (i.e. infinite-level recursive defaultdict?)
I want to be able to do:
x = defaultdict(...stuff...)
x[0][1][0]
{}
So, I can do x = defaultdict(defaultdict), but that's only a second level:
x[0]
{}
x[0][0]
KeyError: 0
There are recipes that can do this. But can it be done simply, using only the normal defaultdict arguments?
Note this is asking how to do an infinite-level recursive defaultdict, so it's distinct from Python: defaultdict of defaultdict?, which asked how to do a two-level defaultdict.
I'll probably just end up using the bunch pattern, but when I realized I didn't know how to do this, it got me interested.
The other answers here tell you how to create a defaultdict which contains "infinitely many" defaultdicts, but they fail to address what I think may have been your initial need, which was simply to have a two-level defaultdict.
You may have been looking for:
defaultdict(lambda: defaultdict(dict))
The reasons why you might prefer this construct are:
It is more explicit than the recursive solution, and therefore likely more understandable to the reader.
It enables the "leaf" of the defaultdict to be something other than a dictionary, e.g. defaultdict(lambda: defaultdict(list)) or defaultdict(lambda: defaultdict(set)), as in the sketch below.
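For instance, a small sketch of the list-leaf variant (the keys here are placeholders):
from collections import defaultdict
d = defaultdict(lambda: defaultdict(list))
d['fruit']['red'].append('apple')
d['fruit']['red'].append('cherry')
print(d['fruit']['red'])  # ['apple', 'cherry']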
For an arbitrary number of levels:
from collections import defaultdict

def rec_dd():
    return defaultdict(rec_dd)
>>> x = rec_dd()
>>> x['a']['b']['c']['d']
defaultdict(<function rec_dd at 0x7f0dcef81500>, {})
>>> import json
>>> print(json.dumps(x))
{"a": {"b": {"c": {"d": {}}}}}
Of course you could also do this with a lambda, but I find lambdas to be less readable. In any case it would look like this:
rec_dd = lambda: defaultdict(rec_dd)
There is a nifty trick for doing that:
tree = lambda: defaultdict(tree)
Then you can create your x with x = tree().
Similar to BrenBarn's solution, but the name tree does not appear inside the lambda itself, so it keeps working even after the variable is rebound or deleted:
tree = (lambda f: f(f))(lambda a: (lambda: defaultdict(a(a))))
Then you can create each new x with x = tree().
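For example, existing instances keep working even after the name is deleted, because the factory never looks up the name tree:
x = tree()
x['a']['b'] = 1
del tree
x['c']['d'] = 2  # still works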
For the def version, we can use function closure scope to protect the data structure from the flaw where existing instances stop working if the tree name is rebound. It looks like this:
from collections import defaultdict
def tree():
    def the_tree():
        return defaultdict(the_tree)
    return the_tree()
I would also propose a more OOP-styled implementation, which supports infinite nesting as well as a properly formatted repr.
from collections import defaultdict

class NestedDefaultDict(defaultdict):
    def __init__(self, *args, **kwargs):
        super(NestedDefaultDict, self).__init__(NestedDefaultDict, *args, **kwargs)

    def __repr__(self):
        return repr(dict(self))
Usage:
my_dict = NestedDefaultDict()
my_dict['a']['b'] = 1
my_dict['a']['c']['d'] = 2
my_dict['b']
print(my_dict) # {'a': {'b': 1, 'c': {'d': 2}}, 'b': {}}
I based this off Andrew's answer here.
If you are looking to load data from JSON or an existing dict into the nested defaultdict, see this example:
from collections import defaultdict

def nested_defaultdict(existing=None, **kwargs):
    if existing is None:
        existing = {}
    if not isinstance(existing, dict):
        return existing
    existing = {key: nested_defaultdict(val) for key, val in existing.items()}
    return defaultdict(nested_defaultdict, existing, **kwargs)
https://gist.github.com/nucklehead/2d29628bb49115f3c30e78c071207775
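A brief usage sketch (the data here is made up):
import json
data = json.loads('{"a": {"b": 1}}')
nd = nested_defaultdict(data)
nd['a']['new_key']['deeper'] = 2  # missing levels are still created on demand
print(nd['a']['b'])               # 1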
Here is a function for building a defaultdict with an arbitrary base type and an arbitrary depth of nesting.
(cross posting from Can't pickle defaultdict)
def wrap_defaultdict(instance, times=1):
    """Wrap an instance an arbitrary number of `times` to create nested defaultdict.

    Parameters
    ----------
    instance - list, dict, int, collections.Counter
    times - the number of nested keys above `instance`; if `times=3` dd[one][two][three] = instance

    Notes
    -----
    using `x.copy` allows pickling (loading to ipyparallel cluster or pkldump)
        - thanks https://stackoverflow.com/questions/16439301/cant-pickle-defaultdict
    """
    from collections import defaultdict

    def _dd(x):
        return defaultdict(x.copy)

    dd = defaultdict(instance)
    for i in range(times - 1):
        dd = _dd(dd)
    return dd
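A short usage sketch (the keys are arbitrary):
dd = wrap_defaultdict(list, times=2)
dd['one']['two'].append('x')  # the leaf is a fresh list
print(dd['one']['two'])       # ['x']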
Based on Chris W's answer; however, to address the type annotation concern, you can make it a factory function that defines the detailed types. For example, this is the final solution to my problem when I was researching this question:
from collections import defaultdict

def frequency_map_factory() -> dict[str, dict[str, int]]:
    """
    Provides a recorder of: per X:str, frequency of Y:str occurrences.
    """
    return defaultdict(lambda: defaultdict(int))
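For example (the words here are made up):
freq = frequency_map_factory()
for word in ["spam", "spam", "eggs"]:
    freq["breakfast"][word] += 1
print(freq["breakfast"]["spam"])  # 2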
Here is a recursive function to convert a recursive defaultdict to a normal dict:
from collections import defaultdict

def defdict_to_dict(defdict, finaldict):
    # pass in an empty dict for finaldict
    for k, v in defdict.items():
        if isinstance(v, defaultdict):
            # new level created and that is the new value
            finaldict[k] = defdict_to_dict(v, {})
        else:
            finaldict[k] = v
    return finaldict
defdict_to_dict(my_rec_default_dict, {})
@nucklehead's response can be extended to handle arrays in JSON as well:
from collections import defaultdict

def nested_dict(existing=None, **kwargs):
    if existing is None:
        existing = defaultdict()
    if isinstance(existing, list):
        existing = [nested_dict(val) for val in existing]
    if not isinstance(existing, dict):
        return existing
    existing = {key: nested_dict(val) for key, val in existing.items()}
    return defaultdict(nested_dict, existing, **kwargs)
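For example, loading a JSON document that mixes objects and arrays (the data is made up):
import json
data = json.loads('{"users": [{"name": "a"}, {"name": "b"}]}')
nd = nested_dict(data)
nd['users'][0]['new_field'] = 1  # dicts inside the list are nested defaultdicts too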
Here's a solution similar to @Stanislav's answer that works with multiprocessing and also allows for termination of the nesting:
from collections import defaultdict
from functools import partial

class NestedDD(defaultdict):
    def __init__(self, n, *args, **kwargs):
        self.n = n
        factory = partial(build_nested_dd, n=n - 1) if n > 1 else int
        super().__init__(factory, *args, **kwargs)

    def __repr__(self):
        return repr(dict(self))

def build_nested_dd(n):
    return NestedDD(n)
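A short usage sketch:
d = build_nested_dd(3)
d['a']['b']['c'] += 1  # three dict levels, then an int leaf
print(d)               # {'a': {'b': {'c': 1}}}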
Is there a way to have a defaultdict(defaultdict(int)) in order to make the following code work?
for x in stuff:
    d[x.a][x.b] += x.c_int
d needs to be built ad-hoc, depending on x.a and x.b elements.
I could use:
for x in stuff:
    d[x.a, x.b] += x.c_int
but then I wouldn't be able to use:
d.keys()
d[x.a].keys()
Yes, like this:
defaultdict(lambda: defaultdict(int))
The argument of a defaultdict (in this case lambda: defaultdict(int)) is called when you try to access a key that doesn't exist. Its return value is set as the new value of that key, which means in our case the value of d[Key_doesnt_exist] will be defaultdict(int).
If you try to access a key from this last defaultdict, i.e. d[Key_doesnt_exist][Key_doesnt_exist], it will return 0, which is the return value of the argument of the last defaultdict, i.e. int().
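A small self-contained demonstration (toy keys), which also shows the d.keys() and d[x.a].keys() access the question asks about:
from collections import defaultdict
d = defaultdict(lambda: defaultdict(int))
d['a']['b'] += 3  # both levels are created on demand
d['a']['c'] += 1
print(list(d.keys()))       # ['a']
print(list(d['a'].keys()))  # ['b', 'c']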
The parameter to the defaultdict constructor is the function which will be called for building new elements. So let's use a lambda!
>>> from collections import defaultdict
>>> d = defaultdict(lambda : defaultdict(int))
>>> print(d[0])
defaultdict(<class 'int'>, {})
>>> print(d[0]["x"])
0
Since Python 2.7, there's an even better solution using Counter:
>>> from collections import Counter
>>> c = Counter()
>>> c["goodbye"]+=1
>>> c["and thank you"]=42
>>> c["for the fish"]-=5
>>> c
Counter({'and thank you': 42, 'goodbye': 1, 'for the fish': -5})
Some bonus features:
>>> c.most_common()[:2]
[('and thank you', 42), ('goodbye', 1)]
For more information see PyMOTW - Collections - Container data types and Python Documentation - collections
Previous answers have addressed how to make a two-level or n-level defaultdict. In some cases you want an infinite one:
from collections import defaultdict

def ddict():
    return defaultdict(ddict)
Usage:
>>> d = ddict()
>>> d[1]['a'][True] = 0.5
>>> d[1]['b'] = 3
>>> import pprint; pprint.pprint(d)
defaultdict(<function ddict at 0x7fcac68bf048>,
            {1: defaultdict(<function ddict at 0x7fcac68bf048>,
                            {'a': defaultdict(<function ddict at 0x7fcac68bf048>,
                                              {True: 0.5}),
                             'b': 3})})
I find it slightly more elegant to use partial:
import functools
from collections import defaultdict

dd_int = functools.partial(defaultdict, int)
defaultdict(dd_int)
Of course, this is the same as a lambda.
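For example, the outer dictionary is then created with that partial as its factory:
d = defaultdict(dd_int)
d['x']['y'] += 1    # the inner defaultdict(int) is created on demand
print(d['x']['y'])  # 1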
For reference, it's possible to implement a generic nested defaultdict factory method through:
from collections import defaultdict
from functools import partial
from itertools import repeat
def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()
The depth defines the number of nested dictionaries before the type defined in default_factory is used.
For example:
my_dict = nested_defaultdict(list, 3)
my_dict['a']['b']['c'].append('e')
Others have correctly answered your question of how to get the following to work:
for x in stuff:
    d[x.a][x.b] += x.c_int
An alternative would be to use tuples for keys:
d = defaultdict(int)
for x in stuff:
    d[x.a, x.b] += x.c_int
    # ^^^^^^^^^ tuple key
The nice thing about this approach is that it is simple and easily extended. If you need a mapping three levels deep, just use a three-item tuple for the key.