Need help finishing this code to flatten a dictionary [duplicate] - python
Suppose you have a dictionary like:
{'a': 1,
'c': {'a': 2,
'b': {'x': 5,
'y' : 10}},
'd': [1, 2, 3]}
How would you go about flattening that into something like:
{'a': 1,
'c_a': 2,
'c_b_x': 5,
'c_b_y': 10,
'd': [1, 2, 3]}
Basically the same way you would flatten a nested list, you just have to do the extra work for iterating the dict by key/value, creating new keys for your new dictionary and creating the dictionary at final step.
import collections
def flatten(d, parent_key='', sep='_'):
items = []
for k, v in d.items():
new_key = parent_key + sep + k if parent_key else k
if isinstance(v, collections.MutableMapping):
items.extend(flatten(v, new_key, sep=sep).items())
else:
items.append((new_key, v))
return dict(items)
>>> flatten({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]})
{'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
For Python >= 3.3, change the import to from collections.abc import MutableMapping to avoid a deprecation warning and change collections.MutableMapping to just MutableMapping.
Or if you are already using pandas, You can do it with json_normalize() like so:
import pandas as pd
d = {'a': 1,
'c': {'a': 2, 'b': {'x': 5, 'y' : 10}},
'd': [1, 2, 3]}
df = pd.json_normalize(d, sep='_')
print(df.to_dict(orient='records')[0])
Output:
{'a': 1, 'c_a': 2, 'c_b_x': 5, 'c_b_y': 10, 'd': [1, 2, 3]}
There are two big considerations that the original poster needs to consider:
Are there keyspace clobbering issues? For example, {'a_b':{'c':1}, 'a':{'b_c':2}} would result in {'a_b_c':???}. The below solution evades the problem by returning an iterable of pairs.
If performance is an issue, does the key-reducer function (which I hereby refer to as 'join') require access to the entire key-path, or can it just do O(1) work at every node in the tree? If you want to be able to say joinedKey = '_'.join(*keys), that will cost you O(N^2) running time. However if you're willing to say nextKey = previousKey+'_'+thisKey, that gets you O(N) time. The solution below lets you do both (since you could merely concatenate all the keys, then postprocess them).
(Performance is not likely an issue, but I'll elaborate on the second point in case anyone else cares: In implementing this, there are numerous dangerous choices. If you do this recursively and yield and re-yield, or anything equivalent which touches nodes more than once (which is quite easy to accidentally do), you are doing potentially O(N^2) work rather than O(N). This is because maybe you are calculating a key a then a_1 then a_1_i..., and then calculating a then a_1 then a_1_ii..., but really you shouldn't have to calculate a_1 again. Even if you aren't recalculating it, re-yielding it (a 'level-by-level' approach) is just as bad. A good example is to think about the performance on {1:{1:{1:{1:...(N times)...{1:SOME_LARGE_DICTIONARY_OF_SIZE_N}...}}}})
Below is a function I wrote flattenDict(d, join=..., lift=...) which can be adapted to many purposes and can do what you want. Sadly it is fairly hard to make a lazy version of this function without incurring the above performance penalties (many python builtins like chain.from_iterable aren't actually efficient, which I only realized after extensive testing of three different versions of this code before settling on this one).
from collections import Mapping
from itertools import chain
from operator import add
_FLAG_FIRST = object()
def flattenDict(d, join=add, lift=lambda x:(x,)):
results = []
def visit(subdict, results, partialKey):
for k,v in subdict.items():
newKey = lift(k) if partialKey==_FLAG_FIRST else join(partialKey,lift(k))
if isinstance(v,Mapping):
visit(v, results, newKey)
else:
results.append((newKey,v))
visit(d, results, _FLAG_FIRST)
return results
To better understand what's going on, below is a diagram for those unfamiliar with reduce(left), otherwise known as "fold left". Sometimes it is drawn with an initial value in place of k0 (not part of the list, passed into the function). Here, J is our join function. We preprocess each kn with lift(k).
[k0,k1,...,kN].foldleft(J)
/ \
... kN
/
J(k0,J(k1,J(k2,k3)))
/ \
/ \
J(J(k0,k1),k2) k3
/ \
/ \
J(k0,k1) k2
/ \
/ \
k0 k1
This is in fact the same as functools.reduce, but where our function does this to all key-paths of the tree.
>>> reduce(lambda a,b:(a,b), range(5))
((((0, 1), 2), 3), 4)
Demonstration (which I'd otherwise put in docstring):
>>> testData = {
'a':1,
'b':2,
'c':{
'aa':11,
'bb':22,
'cc':{
'aaa':111
}
}
}
from pprint import pprint as pp
>>> pp(dict( flattenDict(testData) ))
{('a',): 1,
('b',): 2,
('c', 'aa'): 11,
('c', 'bb'): 22,
('c', 'cc', 'aaa'): 111}
>>> pp(dict( flattenDict(testData, join=lambda a,b:a+'_'+b, lift=lambda x:x) ))
{'a': 1, 'b': 2, 'c_aa': 11, 'c_bb': 22, 'c_cc_aaa': 111}
>>> pp(dict( (v,k) for k,v in flattenDict(testData, lift=hash, join=lambda a,b:hash((a,b))) ))
{1: 12416037344,
2: 12544037731,
11: 5470935132935744593,
22: 4885734186131977315,
111: 3461911260025554326}
Performance:
from functools import reduce
def makeEvilDict(n):
return reduce(lambda acc,x:{x:acc}, [{i:0 for i in range(n)}]+range(n))
import timeit
def time(runnable):
t0 = timeit.default_timer()
_ = runnable()
t1 = timeit.default_timer()
print('took {:.2f} seconds'.format(t1-t0))
>>> pp(makeEvilDict(8))
{7: {6: {5: {4: {3: {2: {1: {0: {0: 0,
1: 0,
2: 0,
3: 0,
4: 0,
5: 0,
6: 0,
7: 0}}}}}}}}}
import sys
sys.setrecursionlimit(1000000)
forget = lambda a,b:''
>>> time(lambda: dict(flattenDict(makeEvilDict(10000), join=forget)) )
took 0.10 seconds
>>> time(lambda: dict(flattenDict(makeEvilDict(100000), join=forget)) )
[1] 12569 segmentation fault python
... sigh, don't think that one is my fault...
[unimportant historical note due to moderation issues]
Regarding the alleged duplicate of Flatten a dictionary of dictionaries (2 levels deep) of lists
That question's solution can be implemented in terms of this one by doing sorted( sum(flatten(...),[]) ). The reverse is not possible: while it is true that the values of flatten(...) can be recovered from the alleged duplicate by mapping a higher-order accumulator, one cannot recover the keys. (edit: Also it turns out that the alleged duplicate owner's question is completely different, in that it only deals with dictionaries exactly 2-level deep, though one of the answers on that page gives a general solution.)
If you're using pandas there is a function hidden in pandas.io.json._normalize1 called nested_to_record which does this exactly.
from pandas.io.json._normalize import nested_to_record
flat = nested_to_record(my_dict, sep='_')
1 In pandas versions 0.24.x and older use pandas.io.json.normalize (without the _)
Here is a kind of a "functional", "one-liner" implementation. It is recursive, and based on a conditional expression and a dict comprehension.
def flatten_dict(dd, separator='_', prefix=''):
return { prefix + separator + k if prefix else k : v
for kk, vv in dd.items()
for k, v in flatten_dict(vv, separator, kk).items()
} if isinstance(dd, dict) else { prefix : dd }
Test:
In [2]: flatten_dict({'abc':123, 'hgf':{'gh':432, 'yu':433}, 'gfd':902, 'xzxzxz':{"432":{'0b0b0b':231}, "43234":1321}}, '.')
Out[2]:
{'abc': 123,
'gfd': 902,
'hgf.gh': 432,
'hgf.yu': 433,
'xzxzxz.432.0b0b0b': 231,
'xzxzxz.43234': 1321}
Not exactly what the OP asked, but lots of folks are coming here looking for ways to flatten real-world nested JSON data which can have nested key-value json objects and arrays and json objects inside the arrays and so on. JSON doesn't include tuples, so we don't have to fret over those.
I found an implementation of the list-inclusion comment by #roneo to the answer posted by #Imran :
https://github.com/ScriptSmith/socialreaper/blob/master/socialreaper/tools.py#L8
import collections
def flatten(dictionary, parent_key=False, separator='.'):
"""
Turn a nested dictionary into a flattened dictionary
:param dictionary: The dictionary to flatten
:param parent_key: The string to prepend to dictionary's keys
:param separator: The string used to separate flattened keys
:return: A flattened dictionary
"""
items = []
for key, value in dictionary.items():
new_key = str(parent_key) + separator + key if parent_key else key
if isinstance(value, collections.MutableMapping):
items.extend(flatten(value, new_key, separator).items())
elif isinstance(value, list):
for k, v in enumerate(value):
items.extend(flatten({str(k): v}, new_key).items())
else:
items.append((new_key, value))
return dict(items)
Test it:
flatten({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3] })
>> {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'd.0': 1, 'd.1': 2, 'd.2': 3}
Annd that does the job I need done: I throw any complicated json at this and it flattens it out for me.
All credits to https://github.com/ScriptSmith .
Code:
test = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}
def parse_dict(init, lkey=''):
ret = {}
for rkey,val in init.items():
key = lkey+rkey
if isinstance(val, dict):
ret.update(parse_dict(val, key+'_'))
else:
ret[key] = val
return ret
print(parse_dict(test,''))
Results:
$ python test.py
{'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
I am using python3.2, update for your version of python.
This is not restricted to dictionaries, but every mapping type that implements .items(). Further ist faster as it avoides an if condition. Nevertheless credits go to Imran:
def flatten(d, parent_key=''):
items = []
for k, v in d.items():
try:
items.extend(flatten(v, '%s%s_' % (parent_key, k)).items())
except AttributeError:
items.append(('%s%s' % (parent_key, k), v))
return dict(items)
How about a functional and performant solution in Python3.5?
from functools import reduce
def _reducer(items, key, val, pref):
if isinstance(val, dict):
return {**items, **flatten(val, pref + key)}
else:
return {**items, pref + key: val}
def flatten(d, pref=''):
return(reduce(
lambda new_d, kv: _reducer(new_d, *kv, pref),
d.items(),
{}
))
This is even more performant:
def flatten(d, pref=''):
return(reduce(
lambda new_d, kv: \
isinstance(kv[1], dict) and \
{**new_d, **flatten(kv[1], pref + kv[0])} or \
{**new_d, pref + kv[0]: kv[1]},
d.items(),
{}
))
In use:
my_obj = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y': 10}}, 'd': [1, 2, 3]}
print(flatten(my_obj))
# {'d': [1, 2, 3], 'cby': 10, 'cbx': 5, 'ca': 2, 'a': 1}
If you are a fan of pythonic oneliners:
my_dict={'a': 1,'c': {'a': 2,'b': {'x': 5,'y' : 10}},'d': [1, 2, 3]}
list(pd.json_normalize(my_dict).T.to_dict().values())[0]
returns:
{'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'd': [1, 2, 3]}
You can leave the [0] from the end, if you have a list of dictionaries and not just a single dictionary.
My Python 3.3 Solution using generators:
def flattenit(pyobj, keystring=''):
if type(pyobj) is dict:
if (type(pyobj) is dict):
keystring = keystring + "_" if keystring else keystring
for k in pyobj:
yield from flattenit(pyobj[k], keystring + k)
elif (type(pyobj) is list):
for lelm in pyobj:
yield from flatten(lelm, keystring)
else:
yield keystring, pyobj
my_obj = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y': 10}}, 'd': [1, 2, 3]}
#your flattened dictionary object
flattened={k:v for k,v in flattenit(my_obj)}
print(flattened)
# result: {'c_b_y': 10, 'd': [1, 2, 3], 'c_a': 2, 'a': 1, 'c_b_x': 5}
Utilizing recursion, keeping it simple and human readable:
def flatten_dict(dictionary, accumulator=None, parent_key=None, separator="."):
if accumulator is None:
accumulator = {}
for k, v in dictionary.items():
k = f"{parent_key}{separator}{k}" if parent_key else k
if isinstance(v, dict):
flatten_dict(dictionary=v, accumulator=accumulator, parent_key=k)
continue
accumulator[k] = v
return accumulator
Call is simple:
new_dict = flatten_dict(dictionary)
or
new_dict = flatten_dict(dictionary, separator="_")
if we want to change the default separator.
A little breakdown:
When the function is first called, it is called only passing the dictionary we want to flatten. The accumulator parameter is here to support recursion, which we see later. So, we instantiate accumulator to an empty dictionary where we will put all of the nested values from the original dictionary.
if accumulator is None:
accumulator = {}
As we iterate over the dictionary's values, we construct a key for every value. The parent_key argument will be None for the first call, while for every nested dictionary, it will contain the key pointing to it, so we prepend that key.
k = f"{parent_key}{separator}{k}" if parent_key else k
In case the value v the key k is pointing to is a dictionary, the function calls itself, passing the nested dictionary, the accumulator (which is passed by reference, so all changes done to it are done on the same instance) and the key k so that we can construct the concatenated key. Notice the continue statement. We want to skip the next line, outside of the if block, so that the nested dictionary doesn't end up in the accumulator under key k.
if isinstance(v, dict):
flatten_dict(dict=v, accumulator=accumulator, parent_key=k)
continue
So, what do we do in case the value v is not a dictionary? Just put it unchanged inside the accumulator.
accumulator[k] = v
Once we're done we just return the accumulator, leaving the original dictionary argument untouched.
NOTE
This will work only with dictionaries that have strings as keys. It will work with hashable objects implementing the __repr__ method, but will yield unwanted results.
Simple function to flatten nested dictionaries. For Python 3, replace .iteritems() with .items()
def flatten_dict(init_dict):
res_dict = {}
if type(init_dict) is not dict:
return res_dict
for k, v in init_dict.iteritems():
if type(v) == dict:
res_dict.update(flatten_dict(v))
else:
res_dict[k] = v
return res_dict
The idea/requirement was:
Get flat dictionaries with no keeping parent keys.
Example of usage:
dd = {'a': 3,
'b': {'c': 4, 'd': 5},
'e': {'f':
{'g': 1, 'h': 2}
},
'i': 9,
}
flatten_dict(dd)
>> {'a': 3, 'c': 4, 'd': 5, 'g': 1, 'h': 2, 'i': 9}
Keeping parent keys is simple as well.
I was thinking of a subclass of UserDict to automagically flat the keys.
class FlatDict(UserDict):
def __init__(self, *args, separator='.', **kwargs):
self.separator = separator
super().__init__(*args, **kwargs)
def __setitem__(self, key, value):
if isinstance(value, dict):
for k1, v1 in FlatDict(value, separator=self.separator).items():
super().__setitem__(f"{key}{self.separator}{k1}", v1)
else:
super().__setitem__(key, value)
The advantages it that keys can be added on the fly, or using standard dict instanciation, without surprise:
>>> fd = FlatDict(
... {
... 'person': {
... 'sexe': 'male',
... 'name': {
... 'first': 'jacques',
... 'last': 'dupond'
... }
... }
... }
... )
>>> fd
{'person.sexe': 'male', 'person.name.first': 'jacques', 'person.name.last': 'dupond'}
>>> fd['person'] = {'name': {'nickname': 'Bob'}}
>>> fd
{'person.sexe': 'male', 'person.name.first': 'jacques', 'person.name.last': 'dupond', 'person.name.nickname': 'Bob'}
>>> fd['person.name'] = {'civility': 'Dr'}
>>> fd
{'person.sexe': 'male', 'person.name.first': 'jacques', 'person.name.last': 'dupond', 'person.name.nickname': 'Bob', 'person.name.civility': 'Dr'}
This is similar to both imran's and ralu's answer. It does not use a generator, but instead employs recursion with a closure:
def flatten_dict(d, separator='_'):
final = {}
def _flatten_dict(obj, parent_keys=[]):
for k, v in obj.iteritems():
if isinstance(v, dict):
_flatten_dict(v, parent_keys + [k])
else:
key = separator.join(parent_keys + [k])
final[key] = v
_flatten_dict(d)
return final
>>> print flatten_dict({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]})
{'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
The answers above work really well. Just thought I'd add the unflatten function that I wrote:
def unflatten(d):
ud = {}
for k, v in d.items():
context = ud
for sub_key in k.split('_')[:-1]:
if sub_key not in context:
context[sub_key] = {}
context = context[sub_key]
context[k.split('_')[-1]] = v
return ud
Note: This doesn't account for '_' already present in keys, much like the flatten counterparts.
Davoud's solution is very nice but doesn't give satisfactory results when the nested dict also contains lists of dicts, but his code be adapted for that case:
def flatten_dict(d):
items = []
for k, v in d.items():
try:
if (type(v)==type([])):
for l in v: items.extend(flatten_dict(l).items())
else:
items.extend(flatten_dict(v).items())
except AttributeError:
items.append((k, v))
return dict(items)
def flatten(unflattened_dict, separator='_'):
flattened_dict = {}
for k, v in unflattened_dict.items():
if isinstance(v, dict):
sub_flattened_dict = flatten(v, separator)
for k2, v2 in sub_flattened_dict.items():
flattened_dict[k + separator + k2] = v2
else:
flattened_dict[k] = v
return flattened_dict
I actually wrote a package called cherrypicker recently to deal with this exact sort of thing since I had to do it so often!
I think the following code would give you exactly what you're after:
from cherrypicker import CherryPicker
dct = {
'a': 1,
'c': {
'a': 2,
'b': {
'x': 5,
'y' : 10
}
},
'd': [1, 2, 3]
}
picker = CherryPicker(dct)
picker.flatten().get()
You can install the package with:
pip install cherrypicker
...and there's more docs and guidance at https://cherrypicker.readthedocs.io.
Other methods may be faster, but the priority of this package is to make such tasks easy. If you do have a large list of objects to flatten though, you can also tell CherryPicker to use parallel processing to speed things up.
here's a solution using a stack. No recursion.
def flatten_nested_dict(nested):
stack = list(nested.items())
ans = {}
while stack:
key, val = stack.pop()
if isinstance(val, dict):
for sub_key, sub_val in val.items():
stack.append((f"{key}_{sub_key}", sub_val))
else:
ans[key] = val
return ans
Using generators:
def flat_dic_helper(prepand,d):
if len(prepand) > 0:
prepand = prepand + "_"
for k in d:
i = d[k]
if isinstance(i, dict):
r = flat_dic_helper(prepand + k,i)
for j in r:
yield j
else:
yield (prepand + k,i)
def flat_dic(d):
return dict(flat_dic_helper("",d))
d = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}
print(flat_dic(d))
>> {'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
Here's an algorithm for elegant, in-place replacement. Tested with Python 2.7 and Python 3.5. Using the dot character as a separator.
def flatten_json(json):
if type(json) == dict:
for k, v in list(json.items()):
if type(v) == dict:
flatten_json(v)
json.pop(k)
for k2, v2 in v.items():
json[k+"."+k2] = v2
Example:
d = {'a': {'b': 'c'}}
flatten_json(d)
print(d)
unflatten_json(d)
print(d)
Output:
{'a.b': 'c'}
{'a': {'b': 'c'}}
I published this code here along with the matching unflatten_json function.
If you want to flat nested dictionary and want all unique keys list then here is the solution:
def flat_dict_return_unique_key(data, unique_keys=set()):
if isinstance(data, dict):
[unique_keys.add(i) for i in data.keys()]
for each_v in data.values():
if isinstance(each_v, dict):
flat_dict_return_unique_key(each_v, unique_keys)
return list(set(unique_keys))
I always prefer access dict objects via .items(), so for flattening dicts I use the following recursive generator flat_items(d). If you like to have dict again, simply wrap it like this: flat = dict(flat_items(d))
def flat_items(d, key_separator='.'):
"""
Flattens the dictionary containing other dictionaries like here: https://stackoverflow.com/questions/6027558/flatten-nested-python-dictionaries-compressing-keys
>>> example = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}
>>> flat = dict(flat_items(example, key_separator='_'))
>>> assert flat['c_b_y'] == 10
"""
for k, v in d.items():
if type(v) is dict:
for k1, v1 in flat_items(v, key_separator=key_separator):
yield key_separator.join((k, k1)), v1
else:
yield k, v
def flatten_nested_dict(_dict, _str=''):
'''
recursive function to flatten a nested dictionary json
'''
ret_dict = {}
for k, v in _dict.items():
if isinstance(v, dict):
ret_dict.update(flatten_nested_dict(v, _str = '_'.join([_str, k]).strip('_')))
elif isinstance(v, list):
for index, item in enumerate(v):
if isinstance(item, dict):
ret_dict.update(flatten_nested_dict(item, _str= '_'.join([_str, k, str(index)]).strip('_')))
else:
ret_dict['_'.join([_str, k, str(index)]).strip('_')] = item
else:
ret_dict['_'.join([_str, k]).strip('_')] = v
return ret_dict
Using dict.popitem() in straightforward nested-list-like recursion:
def flatten(d):
if d == {}:
return d
else:
k,v = d.popitem()
if (dict != type(v)):
return {k:v, **flatten(d)}
else:
flat_kv = flatten(v)
for k1 in list(flat_kv.keys()):
flat_kv[k + '_' + k1] = flat_kv[k1]
del flat_kv[k1]
return {**flat_kv, **flatten(d)}
If you do not mind recursive functions, here is a solution. I have also taken the liberty to include an exclusion-parameter in case there are one or more values you wish to maintain.
Code:
def flatten_dict(dictionary, exclude = [], delimiter ='_'):
flat_dict = dict()
for key, value in dictionary.items():
if isinstance(value, dict) and key not in exclude:
flatten_value_dict = flatten_dict(value, exclude, delimiter)
for k, v in flatten_value_dict.items():
flat_dict[f"{key}{delimiter}{k}"] = v
else:
flat_dict[key] = value
return flat_dict
Usage:
d = {'a':1, 'b':[1, 2], 'c':3, 'd':{'a':4, 'b':{'a':7, 'b':8}, 'c':6}, 'e':{'a':1,'b':2}}
flat_d = flatten_dict(dictionary=d, exclude=['e'], delimiter='.')
print(flat_d)
Output:
{'a': 1, 'b': [1, 2], 'c': 3, 'd.a': 4, 'd.b.a': 7, 'd.b.b': 8, 'd.c': 6, 'e': {'a': 1, 'b': 2}}
Variation of this Flatten nested dictionaries, compressing keys with max_level and custom reducer.
def flatten(d, max_level=None, reducer='tuple'):
if reducer == 'tuple':
reducer_seed = tuple()
reducer_func = lambda x, y: (*x, y)
else:
raise ValueError(f'Unknown reducer: {reducer}')
def impl(d, pref, level):
return reduce(
lambda new_d, kv:
(max_level is None or level < max_level)
and isinstance(kv[1], dict)
and {**new_d, **impl(kv[1], reducer_func(pref, kv[0]), level + 1)}
or {**new_d, reducer_func(pref, kv[0]): kv[1]},
d.items(),
{}
)
return impl(d, reducer_seed, 0)
I tried some of the solutions on this page - though not all - but those I tried failed to handle the nested list of dict.
Consider a dict like this:
d = {
'owner': {
'name': {'first_name': 'Steven', 'last_name': 'Smith'},
'lottery_nums': [1, 2, 3, 'four', '11', None],
'address': {},
'tuple': (1, 2, 'three'),
'tuple_with_dict': (1, 2, 'three', {'is_valid': False}),
'set': {1, 2, 3, 4, 'five'},
'children': [
{'name': {'first_name': 'Jessica',
'last_name': 'Smith', },
'children': []
},
{'name': {'first_name': 'George',
'last_name': 'Smith'},
'children': []
}
]
}
}
Here's my makeshift solution:
def flatten_dict(input_node: dict, key_: str = '', output_dict: dict = {}):
if isinstance(input_node, dict):
for key, val in input_node.items():
new_key = f"{key_}.{key}" if key_ else f"{key}"
flatten_dict(val, new_key, output_dict)
elif isinstance(input_node, list):
for idx, item in enumerate(input_node):
flatten_dict(item, f"{key_}.{idx}", output_dict)
else:
output_dict[key_] = input_node
return output_dict
which produces:
{
owner.name.first_name: Steven,
owner.name.last_name: Smith,
owner.lottery_nums.0: 1,
owner.lottery_nums.1: 2,
owner.lottery_nums.2: 3,
owner.lottery_nums.3: four,
owner.lottery_nums.4: 11,
owner.lottery_nums.5: None,
owner.tuple: (1, 2, 'three'),
owner.tuple_with_dict: (1, 2, 'three', {'is_valid': False}),
owner.set: {1, 2, 3, 4, 'five'},
owner.children.0.name.first_name: Jessica,
owner.children.0.name.last_name: Smith,
owner.children.1.name.first_name: George,
owner.children.1.name.last_name: Smith,
}
A makeshift solution and it's not perfect.
NOTE:
it doesn't keep empty dicts such as the address: {} k/v pair.
it won't flatten dicts in nested tuples - though it would be easy to add using the fact that python tuples act similar to lists.
You can use recursion in order to flatten your dictionary.
import collections
def flatten(
nested_dict,
seperator='.',
name=None,
):
flatten_dict = {}
if not nested_dict:
return flatten_dict
if isinstance(
nested_dict,
collections.abc.MutableMapping,
):
for key, value in nested_dict.items():
if name is not None:
flatten_dict.update(
flatten(
nested_dict=value,
seperator=seperator,
name=f'{name}{seperator}{key}',
),
)
else:
flatten_dict.update(
flatten(
nested_dict=value,
seperator=seperator,
name=key,
),
)
else:
flatten_dict[name] = nested_dict
return flatten_dict
if __name__ == '__main__':
nested_dict = {
1: 'a',
2: {
3: 'c',
4: {
5: 'e',
},
6: [1, 2, 3, 4, 5, ],
},
}
print(
flatten(
nested_dict=nested_dict,
),
)
Output:
{
"1":"a",
"2.3":"c",
"2.4.5":"e",
"2.6":[1, 2, 3, 4, 5]
}
Related
Extract varying levels of nested key value pairs from dictionary [duplicate]
Suppose you have a dictionary like: {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]} How would you go about flattening that into something like: {'a': 1, 'c_a': 2, 'c_b_x': 5, 'c_b_y': 10, 'd': [1, 2, 3]}
Basically the same way you would flatten a nested list, you just have to do the extra work for iterating the dict by key/value, creating new keys for your new dictionary and creating the dictionary at final step. import collections def flatten(d, parent_key='', sep='_'): items = [] for k, v in d.items(): new_key = parent_key + sep + k if parent_key else k if isinstance(v, collections.MutableMapping): items.extend(flatten(v, new_key, sep=sep).items()) else: items.append((new_key, v)) return dict(items) >>> flatten({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}) {'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10} For Python >= 3.3, change the import to from collections.abc import MutableMapping to avoid a deprecation warning and change collections.MutableMapping to just MutableMapping.
Or if you are already using pandas, You can do it with json_normalize() like so: import pandas as pd d = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]} df = pd.json_normalize(d, sep='_') print(df.to_dict(orient='records')[0]) Output: {'a': 1, 'c_a': 2, 'c_b_x': 5, 'c_b_y': 10, 'd': [1, 2, 3]}
There are two big considerations that the original poster needs to consider: Are there keyspace clobbering issues? For example, {'a_b':{'c':1}, 'a':{'b_c':2}} would result in {'a_b_c':???}. The below solution evades the problem by returning an iterable of pairs. If performance is an issue, does the key-reducer function (which I hereby refer to as 'join') require access to the entire key-path, or can it just do O(1) work at every node in the tree? If you want to be able to say joinedKey = '_'.join(*keys), that will cost you O(N^2) running time. However if you're willing to say nextKey = previousKey+'_'+thisKey, that gets you O(N) time. The solution below lets you do both (since you could merely concatenate all the keys, then postprocess them). (Performance is not likely an issue, but I'll elaborate on the second point in case anyone else cares: In implementing this, there are numerous dangerous choices. If you do this recursively and yield and re-yield, or anything equivalent which touches nodes more than once (which is quite easy to accidentally do), you are doing potentially O(N^2) work rather than O(N). This is because maybe you are calculating a key a then a_1 then a_1_i..., and then calculating a then a_1 then a_1_ii..., but really you shouldn't have to calculate a_1 again. Even if you aren't recalculating it, re-yielding it (a 'level-by-level' approach) is just as bad. A good example is to think about the performance on {1:{1:{1:{1:...(N times)...{1:SOME_LARGE_DICTIONARY_OF_SIZE_N}...}}}}) Below is a function I wrote flattenDict(d, join=..., lift=...) which can be adapted to many purposes and can do what you want. Sadly it is fairly hard to make a lazy version of this function without incurring the above performance penalties (many python builtins like chain.from_iterable aren't actually efficient, which I only realized after extensive testing of three different versions of this code before settling on this one). from collections import Mapping from itertools import chain from operator import add _FLAG_FIRST = object() def flattenDict(d, join=add, lift=lambda x:(x,)): results = [] def visit(subdict, results, partialKey): for k,v in subdict.items(): newKey = lift(k) if partialKey==_FLAG_FIRST else join(partialKey,lift(k)) if isinstance(v,Mapping): visit(v, results, newKey) else: results.append((newKey,v)) visit(d, results, _FLAG_FIRST) return results To better understand what's going on, below is a diagram for those unfamiliar with reduce(left), otherwise known as "fold left". Sometimes it is drawn with an initial value in place of k0 (not part of the list, passed into the function). Here, J is our join function. We preprocess each kn with lift(k). [k0,k1,...,kN].foldleft(J) / \ ... kN / J(k0,J(k1,J(k2,k3))) / \ / \ J(J(k0,k1),k2) k3 / \ / \ J(k0,k1) k2 / \ / \ k0 k1 This is in fact the same as functools.reduce, but where our function does this to all key-paths of the tree. >>> reduce(lambda a,b:(a,b), range(5)) ((((0, 1), 2), 3), 4) Demonstration (which I'd otherwise put in docstring): >>> testData = { 'a':1, 'b':2, 'c':{ 'aa':11, 'bb':22, 'cc':{ 'aaa':111 } } } from pprint import pprint as pp >>> pp(dict( flattenDict(testData) )) {('a',): 1, ('b',): 2, ('c', 'aa'): 11, ('c', 'bb'): 22, ('c', 'cc', 'aaa'): 111} >>> pp(dict( flattenDict(testData, join=lambda a,b:a+'_'+b, lift=lambda x:x) )) {'a': 1, 'b': 2, 'c_aa': 11, 'c_bb': 22, 'c_cc_aaa': 111} >>> pp(dict( (v,k) for k,v in flattenDict(testData, lift=hash, join=lambda a,b:hash((a,b))) )) {1: 12416037344, 2: 12544037731, 11: 5470935132935744593, 22: 4885734186131977315, 111: 3461911260025554326} Performance: from functools import reduce def makeEvilDict(n): return reduce(lambda acc,x:{x:acc}, [{i:0 for i in range(n)}]+range(n)) import timeit def time(runnable): t0 = timeit.default_timer() _ = runnable() t1 = timeit.default_timer() print('took {:.2f} seconds'.format(t1-t0)) >>> pp(makeEvilDict(8)) {7: {6: {5: {4: {3: {2: {1: {0: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0}}}}}}}}} import sys sys.setrecursionlimit(1000000) forget = lambda a,b:'' >>> time(lambda: dict(flattenDict(makeEvilDict(10000), join=forget)) ) took 0.10 seconds >>> time(lambda: dict(flattenDict(makeEvilDict(100000), join=forget)) ) [1] 12569 segmentation fault python ... sigh, don't think that one is my fault... [unimportant historical note due to moderation issues] Regarding the alleged duplicate of Flatten a dictionary of dictionaries (2 levels deep) of lists That question's solution can be implemented in terms of this one by doing sorted( sum(flatten(...),[]) ). The reverse is not possible: while it is true that the values of flatten(...) can be recovered from the alleged duplicate by mapping a higher-order accumulator, one cannot recover the keys. (edit: Also it turns out that the alleged duplicate owner's question is completely different, in that it only deals with dictionaries exactly 2-level deep, though one of the answers on that page gives a general solution.)
If you're using pandas there is a function hidden in pandas.io.json._normalize1 called nested_to_record which does this exactly. from pandas.io.json._normalize import nested_to_record flat = nested_to_record(my_dict, sep='_') 1 In pandas versions 0.24.x and older use pandas.io.json.normalize (without the _)
Here is a kind of a "functional", "one-liner" implementation. It is recursive, and based on a conditional expression and a dict comprehension. def flatten_dict(dd, separator='_', prefix=''): return { prefix + separator + k if prefix else k : v for kk, vv in dd.items() for k, v in flatten_dict(vv, separator, kk).items() } if isinstance(dd, dict) else { prefix : dd } Test: In [2]: flatten_dict({'abc':123, 'hgf':{'gh':432, 'yu':433}, 'gfd':902, 'xzxzxz':{"432":{'0b0b0b':231}, "43234":1321}}, '.') Out[2]: {'abc': 123, 'gfd': 902, 'hgf.gh': 432, 'hgf.yu': 433, 'xzxzxz.432.0b0b0b': 231, 'xzxzxz.43234': 1321}
Not exactly what the OP asked, but lots of folks are coming here looking for ways to flatten real-world nested JSON data which can have nested key-value json objects and arrays and json objects inside the arrays and so on. JSON doesn't include tuples, so we don't have to fret over those. I found an implementation of the list-inclusion comment by #roneo to the answer posted by #Imran : https://github.com/ScriptSmith/socialreaper/blob/master/socialreaper/tools.py#L8 import collections def flatten(dictionary, parent_key=False, separator='.'): """ Turn a nested dictionary into a flattened dictionary :param dictionary: The dictionary to flatten :param parent_key: The string to prepend to dictionary's keys :param separator: The string used to separate flattened keys :return: A flattened dictionary """ items = [] for key, value in dictionary.items(): new_key = str(parent_key) + separator + key if parent_key else key if isinstance(value, collections.MutableMapping): items.extend(flatten(value, new_key, separator).items()) elif isinstance(value, list): for k, v in enumerate(value): items.extend(flatten({str(k): v}, new_key).items()) else: items.append((new_key, value)) return dict(items) Test it: flatten({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3] }) >> {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'd.0': 1, 'd.1': 2, 'd.2': 3} Annd that does the job I need done: I throw any complicated json at this and it flattens it out for me. All credits to https://github.com/ScriptSmith .
Code: test = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]} def parse_dict(init, lkey=''): ret = {} for rkey,val in init.items(): key = lkey+rkey if isinstance(val, dict): ret.update(parse_dict(val, key+'_')) else: ret[key] = val return ret print(parse_dict(test,'')) Results: $ python test.py {'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10} I am using python3.2, update for your version of python.
This is not restricted to dictionaries, but every mapping type that implements .items(). Further ist faster as it avoides an if condition. Nevertheless credits go to Imran: def flatten(d, parent_key=''): items = [] for k, v in d.items(): try: items.extend(flatten(v, '%s%s_' % (parent_key, k)).items()) except AttributeError: items.append(('%s%s' % (parent_key, k), v)) return dict(items)
How about a functional and performant solution in Python3.5? from functools import reduce def _reducer(items, key, val, pref): if isinstance(val, dict): return {**items, **flatten(val, pref + key)} else: return {**items, pref + key: val} def flatten(d, pref=''): return(reduce( lambda new_d, kv: _reducer(new_d, *kv, pref), d.items(), {} )) This is even more performant: def flatten(d, pref=''): return(reduce( lambda new_d, kv: \ isinstance(kv[1], dict) and \ {**new_d, **flatten(kv[1], pref + kv[0])} or \ {**new_d, pref + kv[0]: kv[1]}, d.items(), {} )) In use: my_obj = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y': 10}}, 'd': [1, 2, 3]} print(flatten(my_obj)) # {'d': [1, 2, 3], 'cby': 10, 'cbx': 5, 'ca': 2, 'a': 1}
If you are a fan of pythonic oneliners: my_dict={'a': 1,'c': {'a': 2,'b': {'x': 5,'y' : 10}},'d': [1, 2, 3]} list(pd.json_normalize(my_dict).T.to_dict().values())[0] returns: {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'd': [1, 2, 3]} You can leave the [0] from the end, if you have a list of dictionaries and not just a single dictionary.
My Python 3.3 Solution using generators: def flattenit(pyobj, keystring=''): if type(pyobj) is dict: if (type(pyobj) is dict): keystring = keystring + "_" if keystring else keystring for k in pyobj: yield from flattenit(pyobj[k], keystring + k) elif (type(pyobj) is list): for lelm in pyobj: yield from flatten(lelm, keystring) else: yield keystring, pyobj my_obj = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y': 10}}, 'd': [1, 2, 3]} #your flattened dictionary object flattened={k:v for k,v in flattenit(my_obj)} print(flattened) # result: {'c_b_y': 10, 'd': [1, 2, 3], 'c_a': 2, 'a': 1, 'c_b_x': 5}
Utilizing recursion, keeping it simple and human readable: def flatten_dict(dictionary, accumulator=None, parent_key=None, separator="."): if accumulator is None: accumulator = {} for k, v in dictionary.items(): k = f"{parent_key}{separator}{k}" if parent_key else k if isinstance(v, dict): flatten_dict(dictionary=v, accumulator=accumulator, parent_key=k) continue accumulator[k] = v return accumulator Call is simple: new_dict = flatten_dict(dictionary) or new_dict = flatten_dict(dictionary, separator="_") if we want to change the default separator. A little breakdown: When the function is first called, it is called only passing the dictionary we want to flatten. The accumulator parameter is here to support recursion, which we see later. So, we instantiate accumulator to an empty dictionary where we will put all of the nested values from the original dictionary. if accumulator is None: accumulator = {} As we iterate over the dictionary's values, we construct a key for every value. The parent_key argument will be None for the first call, while for every nested dictionary, it will contain the key pointing to it, so we prepend that key. k = f"{parent_key}{separator}{k}" if parent_key else k In case the value v the key k is pointing to is a dictionary, the function calls itself, passing the nested dictionary, the accumulator (which is passed by reference, so all changes done to it are done on the same instance) and the key k so that we can construct the concatenated key. Notice the continue statement. We want to skip the next line, outside of the if block, so that the nested dictionary doesn't end up in the accumulator under key k. if isinstance(v, dict): flatten_dict(dict=v, accumulator=accumulator, parent_key=k) continue So, what do we do in case the value v is not a dictionary? Just put it unchanged inside the accumulator. accumulator[k] = v Once we're done we just return the accumulator, leaving the original dictionary argument untouched. NOTE This will work only with dictionaries that have strings as keys. It will work with hashable objects implementing the __repr__ method, but will yield unwanted results.
Simple function to flatten nested dictionaries. For Python 3, replace .iteritems() with .items() def flatten_dict(init_dict): res_dict = {} if type(init_dict) is not dict: return res_dict for k, v in init_dict.iteritems(): if type(v) == dict: res_dict.update(flatten_dict(v)) else: res_dict[k] = v return res_dict The idea/requirement was: Get flat dictionaries with no keeping parent keys. Example of usage: dd = {'a': 3, 'b': {'c': 4, 'd': 5}, 'e': {'f': {'g': 1, 'h': 2} }, 'i': 9, } flatten_dict(dd) >> {'a': 3, 'c': 4, 'd': 5, 'g': 1, 'h': 2, 'i': 9} Keeping parent keys is simple as well.
I was thinking of a subclass of UserDict to automagically flat the keys. class FlatDict(UserDict): def __init__(self, *args, separator='.', **kwargs): self.separator = separator super().__init__(*args, **kwargs) def __setitem__(self, key, value): if isinstance(value, dict): for k1, v1 in FlatDict(value, separator=self.separator).items(): super().__setitem__(f"{key}{self.separator}{k1}", v1) else: super().__setitem__(key, value) The advantages it that keys can be added on the fly, or using standard dict instanciation, without surprise: >>> fd = FlatDict( ... { ... 'person': { ... 'sexe': 'male', ... 'name': { ... 'first': 'jacques', ... 'last': 'dupond' ... } ... } ... } ... ) >>> fd {'person.sexe': 'male', 'person.name.first': 'jacques', 'person.name.last': 'dupond'} >>> fd['person'] = {'name': {'nickname': 'Bob'}} >>> fd {'person.sexe': 'male', 'person.name.first': 'jacques', 'person.name.last': 'dupond', 'person.name.nickname': 'Bob'} >>> fd['person.name'] = {'civility': 'Dr'} >>> fd {'person.sexe': 'male', 'person.name.first': 'jacques', 'person.name.last': 'dupond', 'person.name.nickname': 'Bob', 'person.name.civility': 'Dr'}
This is similar to both imran's and ralu's answer. It does not use a generator, but instead employs recursion with a closure: def flatten_dict(d, separator='_'): final = {} def _flatten_dict(obj, parent_keys=[]): for k, v in obj.iteritems(): if isinstance(v, dict): _flatten_dict(v, parent_keys + [k]) else: key = separator.join(parent_keys + [k]) final[key] = v _flatten_dict(d) return final >>> print flatten_dict({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}) {'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
The answers above work really well. Just thought I'd add the unflatten function that I wrote: def unflatten(d): ud = {} for k, v in d.items(): context = ud for sub_key in k.split('_')[:-1]: if sub_key not in context: context[sub_key] = {} context = context[sub_key] context[k.split('_')[-1]] = v return ud Note: This doesn't account for '_' already present in keys, much like the flatten counterparts.
Davoud's solution is very nice but doesn't give satisfactory results when the nested dict also contains lists of dicts, but his code be adapted for that case: def flatten_dict(d): items = [] for k, v in d.items(): try: if (type(v)==type([])): for l in v: items.extend(flatten_dict(l).items()) else: items.extend(flatten_dict(v).items()) except AttributeError: items.append((k, v)) return dict(items)
def flatten(unflattened_dict, separator='_'): flattened_dict = {} for k, v in unflattened_dict.items(): if isinstance(v, dict): sub_flattened_dict = flatten(v, separator) for k2, v2 in sub_flattened_dict.items(): flattened_dict[k + separator + k2] = v2 else: flattened_dict[k] = v return flattened_dict
I actually wrote a package called cherrypicker recently to deal with this exact sort of thing since I had to do it so often! I think the following code would give you exactly what you're after: from cherrypicker import CherryPicker dct = { 'a': 1, 'c': { 'a': 2, 'b': { 'x': 5, 'y' : 10 } }, 'd': [1, 2, 3] } picker = CherryPicker(dct) picker.flatten().get() You can install the package with: pip install cherrypicker ...and there's more docs and guidance at https://cherrypicker.readthedocs.io. Other methods may be faster, but the priority of this package is to make such tasks easy. If you do have a large list of objects to flatten though, you can also tell CherryPicker to use parallel processing to speed things up.
here's a solution using a stack. No recursion. def flatten_nested_dict(nested): stack = list(nested.items()) ans = {} while stack: key, val = stack.pop() if isinstance(val, dict): for sub_key, sub_val in val.items(): stack.append((f"{key}_{sub_key}", sub_val)) else: ans[key] = val return ans
Using generators: def flat_dic_helper(prepand,d): if len(prepand) > 0: prepand = prepand + "_" for k in d: i = d[k] if isinstance(i, dict): r = flat_dic_helper(prepand + k,i) for j in r: yield j else: yield (prepand + k,i) def flat_dic(d): return dict(flat_dic_helper("",d)) d = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]} print(flat_dic(d)) >> {'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
Here's an algorithm for elegant, in-place replacement. Tested with Python 2.7 and Python 3.5. Using the dot character as a separator. def flatten_json(json): if type(json) == dict: for k, v in list(json.items()): if type(v) == dict: flatten_json(v) json.pop(k) for k2, v2 in v.items(): json[k+"."+k2] = v2 Example: d = {'a': {'b': 'c'}} flatten_json(d) print(d) unflatten_json(d) print(d) Output: {'a.b': 'c'} {'a': {'b': 'c'}} I published this code here along with the matching unflatten_json function.
If you want to flat nested dictionary and want all unique keys list then here is the solution: def flat_dict_return_unique_key(data, unique_keys=set()): if isinstance(data, dict): [unique_keys.add(i) for i in data.keys()] for each_v in data.values(): if isinstance(each_v, dict): flat_dict_return_unique_key(each_v, unique_keys) return list(set(unique_keys))
I always prefer access dict objects via .items(), so for flattening dicts I use the following recursive generator flat_items(d). If you like to have dict again, simply wrap it like this: flat = dict(flat_items(d)) def flat_items(d, key_separator='.'): """ Flattens the dictionary containing other dictionaries like here: https://stackoverflow.com/questions/6027558/flatten-nested-python-dictionaries-compressing-keys >>> example = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]} >>> flat = dict(flat_items(example, key_separator='_')) >>> assert flat['c_b_y'] == 10 """ for k, v in d.items(): if type(v) is dict: for k1, v1 in flat_items(v, key_separator=key_separator): yield key_separator.join((k, k1)), v1 else: yield k, v
def flatten_nested_dict(_dict, _str=''): ''' recursive function to flatten a nested dictionary json ''' ret_dict = {} for k, v in _dict.items(): if isinstance(v, dict): ret_dict.update(flatten_nested_dict(v, _str = '_'.join([_str, k]).strip('_'))) elif isinstance(v, list): for index, item in enumerate(v): if isinstance(item, dict): ret_dict.update(flatten_nested_dict(item, _str= '_'.join([_str, k, str(index)]).strip('_'))) else: ret_dict['_'.join([_str, k, str(index)]).strip('_')] = item else: ret_dict['_'.join([_str, k]).strip('_')] = v return ret_dict
Using dict.popitem() in straightforward nested-list-like recursion: def flatten(d): if d == {}: return d else: k,v = d.popitem() if (dict != type(v)): return {k:v, **flatten(d)} else: flat_kv = flatten(v) for k1 in list(flat_kv.keys()): flat_kv[k + '_' + k1] = flat_kv[k1] del flat_kv[k1] return {**flat_kv, **flatten(d)}
If you do not mind recursive functions, here is a solution. I have also taken the liberty to include an exclusion-parameter in case there are one or more values you wish to maintain. Code: def flatten_dict(dictionary, exclude = [], delimiter ='_'): flat_dict = dict() for key, value in dictionary.items(): if isinstance(value, dict) and key not in exclude: flatten_value_dict = flatten_dict(value, exclude, delimiter) for k, v in flatten_value_dict.items(): flat_dict[f"{key}{delimiter}{k}"] = v else: flat_dict[key] = value return flat_dict Usage: d = {'a':1, 'b':[1, 2], 'c':3, 'd':{'a':4, 'b':{'a':7, 'b':8}, 'c':6}, 'e':{'a':1,'b':2}} flat_d = flatten_dict(dictionary=d, exclude=['e'], delimiter='.') print(flat_d) Output: {'a': 1, 'b': [1, 2], 'c': 3, 'd.a': 4, 'd.b.a': 7, 'd.b.b': 8, 'd.c': 6, 'e': {'a': 1, 'b': 2}}
Variation of this Flatten nested dictionaries, compressing keys with max_level and custom reducer. def flatten(d, max_level=None, reducer='tuple'): if reducer == 'tuple': reducer_seed = tuple() reducer_func = lambda x, y: (*x, y) else: raise ValueError(f'Unknown reducer: {reducer}') def impl(d, pref, level): return reduce( lambda new_d, kv: (max_level is None or level < max_level) and isinstance(kv[1], dict) and {**new_d, **impl(kv[1], reducer_func(pref, kv[0]), level + 1)} or {**new_d, reducer_func(pref, kv[0]): kv[1]}, d.items(), {} ) return impl(d, reducer_seed, 0)
I tried some of the solutions on this page - though not all - but those I tried failed to handle the nested list of dict. Consider a dict like this: d = { 'owner': { 'name': {'first_name': 'Steven', 'last_name': 'Smith'}, 'lottery_nums': [1, 2, 3, 'four', '11', None], 'address': {}, 'tuple': (1, 2, 'three'), 'tuple_with_dict': (1, 2, 'three', {'is_valid': False}), 'set': {1, 2, 3, 4, 'five'}, 'children': [ {'name': {'first_name': 'Jessica', 'last_name': 'Smith', }, 'children': [] }, {'name': {'first_name': 'George', 'last_name': 'Smith'}, 'children': [] } ] } } Here's my makeshift solution: def flatten_dict(input_node: dict, key_: str = '', output_dict: dict = {}): if isinstance(input_node, dict): for key, val in input_node.items(): new_key = f"{key_}.{key}" if key_ else f"{key}" flatten_dict(val, new_key, output_dict) elif isinstance(input_node, list): for idx, item in enumerate(input_node): flatten_dict(item, f"{key_}.{idx}", output_dict) else: output_dict[key_] = input_node return output_dict which produces: { owner.name.first_name: Steven, owner.name.last_name: Smith, owner.lottery_nums.0: 1, owner.lottery_nums.1: 2, owner.lottery_nums.2: 3, owner.lottery_nums.3: four, owner.lottery_nums.4: 11, owner.lottery_nums.5: None, owner.tuple: (1, 2, 'three'), owner.tuple_with_dict: (1, 2, 'three', {'is_valid': False}), owner.set: {1, 2, 3, 4, 'five'}, owner.children.0.name.first_name: Jessica, owner.children.0.name.last_name: Smith, owner.children.1.name.first_name: George, owner.children.1.name.last_name: Smith, } A makeshift solution and it's not perfect. NOTE: it doesn't keep empty dicts such as the address: {} k/v pair. it won't flatten dicts in nested tuples - though it would be easy to add using the fact that python tuples act similar to lists.
You can use recursion in order to flatten your dictionary. import collections def flatten( nested_dict, seperator='.', name=None, ): flatten_dict = {} if not nested_dict: return flatten_dict if isinstance( nested_dict, collections.abc.MutableMapping, ): for key, value in nested_dict.items(): if name is not None: flatten_dict.update( flatten( nested_dict=value, seperator=seperator, name=f'{name}{seperator}{key}', ), ) else: flatten_dict.update( flatten( nested_dict=value, seperator=seperator, name=key, ), ) else: flatten_dict[name] = nested_dict return flatten_dict if __name__ == '__main__': nested_dict = { 1: 'a', 2: { 3: 'c', 4: { 5: 'e', }, 6: [1, 2, 3, 4, 5, ], }, } print( flatten( nested_dict=nested_dict, ), ) Output: { "1":"a", "2.3":"c", "2.4.5":"e", "2.6":[1, 2, 3, 4, 5] }
Add value of the already existing key in dict in a form of container in Python
Is there any built-in function that would do the following? dictionary = {‘a’:1, ‘b’:2, ‘c’:3} dictionary.update(c=10) # what happens dictionary ---- {‘a’:1, ‘b’:2, ‘c’:10} # what I want to happen: dictionary ---- {‘a’:1, ‘b’:2, ‘c’:(3, 10)} By default if keys are the same, later key would override earlier one. If the key is already present in dict, the value of the new key: value pair would be added to already existing value in a form of container, like tuple, or list or set. I can write a helper function to do so but I believe it should be something built-in for this matter.
You can do this from collections import defaultdict d = defaultdict(list) d["a"].append(1) d["b"].append(2) d["c"].append(3) d["c"].append(10) print(d) Result defaultdict(list, {'a': [1], 'b': [2], 'c': [3, 10]})
Your desired solution is not very elegant, so I am going to propose an alternative one. Tuples are immutable. Let's use lists instead, because we can easily append to them. The data type of the values should be consistent. Use lists in any case, even for single values. Let's use a defaultdict such that we don't have to initialize lists manually. Putting it together: >>> from collections import defaultdict >>> d = defaultdict(list) >>> for v, k in enumerate('abc', 1): ... d[k].append(v) ... >>> d defaultdict(<class 'list'>, {'a': [1], 'b': [2], 'c': [3]}) >>> d['c'].append(10) >>> d defaultdict(<class 'list'>, {'a': [1], 'b': [2], 'c': [3, 10]})
You could rewrite the update function by creating a new class: In Python bulitins.py: def update(self, E=None, **F): # known special case of dict.update """ D.update([E, ]**F) -> None. Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k] """ pass So I write this(Inherit from UserDict, suggested by #timgeb): from collections import UserDict class CustomDict(UserDict): def __init__(self): super().__init__() def update(self, E=None, **F) -> None: if E: if isinstance(E, dict): for k in E: self[k] = E[k] else: for k, v in E: self[k] = v else: if isinstance(F, dict): for key in F: if isinstance(self[key], list): self[key].append(F[key]) else: self[key] = [self[key], F[key]] dictionary = CustomDict() dictionary.update({'a': 1, 'b': 2, 'c': 3}) print(dictionary) dictionary.update(a=3) print(dictionary) dictionary.update(a=4) print(dictionary) Result: {'a': 1, 'b': 2, 'c': 3} {'a': [1, 3], 'b': 2, 'c': 3} {'a': [1, 3, 4], 'b': 2, 'c': 3} Maybe there are some logic errors in my code,but welcome to point out.
Perhaps you could use something like: dictionary = {'a':1, 'b':2, 'c':3} dictionary.update({'c': 10 if not dictionary.get('c') else tuple([dictionary['c'],] + [10,])}) # {'a': 1, 'b': 2, 'c': (3, 10)} But it should probably be wrapped into a function to make things clean. The general pattern would be (I suppose, based on your question): dict = {...} if 'a' not in dict: do_this() # just add it to the dict? else: do_that() # build a tuple or list? In your above question you're mixing types -- I'm not sure if you want that, a more pythonic approach might be to have all the values as list and use a defaultdict.
Python3 dictionary comprehension with sub-dictionary upacking?
Suppose one has a dictionary, root which consists of key:value pairs, where some values are themselves dictionaries. Can one (and if so, how) unpack these sub dictionaries via dictionary comprehension? e.g. {k: v if type(v) is not dict else **v for k, v in root.items()} example: root = {'a': 1, 'b': {'c': 2, 'd': 3}} result = {'a': 1, 'c': 2, 'd': 3}
I guess I should post as an answer with a more broad explanation to help you as it is a little bit different to other existing questions { _k: _v for k, v in root.items() for _k, _v in ( # here I create a dummy dictionary if non exists v if isinstance(v, dict) else {k: v} ).items() # and iterate that } The key part of understanding is that you need consistent and generic logic for the comprehension to work. You can do this by creating dummy nested dictionaries where they don't previously exist using v if isinstance(v, dict) else {k: v} Then this is a simple nested dictionary unpacking exercise. To help in your future comprehensions I would recommend writing the code out e.g. res = dict() for k,v in root.items(): d = v if isinstance(v, dict) else {k: v} for _k, _v in d.items(): res[_k] = _v and work backwards from this Useful references Putting a simple if-then-else statement on one line Nested dictionary comprehension python
If you have several levels of nested dictionaries, I suggest you the following solution based on a recursive function: def flatten(res, root): for k,v in root.items(): if isinstance(v, dict): flatten(res, v) else: res[k] = v root = {'a': 1, 'b': {'c': 2, 'd': {'e': 5, 'f': 6}}} result = {} flatten(result, root) print(result) # {'a': 1, 'c': 2, 'e': 5, 'f': 6}
Here is a recursive solution. In the function _flatten_into_kv_pairs, we iterate through the key and values pair and yield those keys/values if the value is a not a dictionary. If it is, then we recursively call _flatten_into_kv_pairs with the yield from construct. The function flatten_dict is just a shell which turns sequence of key/value pairs back into a dictionary. def _flatten_into_kv_pairs(dict_object): for k, v in dict_object.items(): if isinstance(v, dict): yield from _flatten_into_kv_pairs(v) else: yield k, v def flatten_dict(dict_object): return dict(_flatten_into_kv_pairs(dict_object)) root = {'a': 1, 'b': {'c': 2, 'd': 3, 'e': {'f': 4, 'g': 5}}} print(flatten_dict(root)) Output: {'a': 1, 'c': 2, 'd': 3, 'f': 4, 'g': 5}
How can I find dict keys for matching values in two dicts?
I have two dictionaries mapping IDs to values. For simplicity, lets say those are the dictionaries: d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3} d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1} As named, the dictionaries are not symmetrical. I would like to get a dictionary of keys from dictionaries d_source and d_target whose values match. The resulting dictionary would have d_source keys as its own keys, and d_target keys as that keys value (in either a list, tuple or set format). This would be The expected returned value for the above example should be the following list: {'a': ('1', 'A'), 'b': ('B',), 'c': ('C',), '3': ('C',)} There are two somewhat similar questions, but those solutions can't be easily applied to my question. Some characteristics of the data: Source would usually be smaller than target. Having roughly few thousand sources (tops) and a magnitude more targets. Duplicates in the same dict (both d_source and d_target) are not too likely on values. matches are expected to be found for (a rough estimate) not more than 50% than d_source items. All keys are integers. What is the best (performance wise) solution to this problem? Modeling data into other datatypes for improved performance is totally ok, even when using third party libraries (i'm thinking numpy)
All answers have O(n^2) efficiency which isn't very good so I thought of answering myself. I use 2(source_len) + 2(dict_count)(dict_len) memory and I have O(2n) efficiency which is the best you can get here I believe. Here you go: from collections import defaultdict d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3} d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1} def merge_dicts(source_dict, *rest): flipped_rest = defaultdict(list) for d in rest: while d: k, v = d.popitem() flipped_rest[v].append(k) return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()} new_dict = merge_dicts(d_source, d_target) By the way, I'm using a tuple in order not to link the resulting lists together. As you've added specifications for the data, here's a closer matching solution: d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3} d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1} def second_merge_dicts(source_dict, *rest): """Optimized for ~50% source match due to if statement addition. Also uses less memory. """ unique_values = set(source_dict.values()) flipped_rest = defaultdict(list) for d in rest: while d: k, v = d.popitem() if v in unique_values: flipped_rest[v].append(k) return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()} new_dict = second_merge_dicts(d_source, d_target)
from collections import defaultdict from pprint import pprint d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3} d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1} d_result = defaultdict(list) {d_result[a].append(b) for a in d_source for b in d_target if d_source[a] == d_target[b]} pprint(d_result) Output: {'3': ['C'], 'a': ['A', '1'], 'b': ['B'], 'c': ['C']} Timing results: from collections import defaultdict from copy import deepcopy from random import randint from timeit import timeit def Craig_match(source, target): result = defaultdict(list) {result[a].append(b) for a in source for b in target if source[a] == target[b]} return result def Bharel_match(source_dict, *rest): flipped_rest = defaultdict(list) for d in rest: while d: k, v = d.popitem() flipped_rest[v].append(k) return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()} def modified_Bharel_match(source_dict, *rest): """Optimized for ~50% source match due to if statement addition. Also uses less memory. """ unique_values = set(source_dict.values()) flipped_rest = defaultdict(list) for d in rest: while d: k, v = d.popitem() if v in unique_values: flipped_rest[v].append(k) return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()} # generate source, target such that: # a) ~10% duplicate values in source and target # b) 2000 unique source keys, 20000 unique target keys # c) a little less than 50% matches source value to target value # d) numeric keys and values source = {} for k in range(2000): source[k] = randint(0, 1800) target = {} for k in range(20000): if k < 1000: target[k] = randint(0, 2000) else: target[k] = randint(2000, 19000) best_time = {} approaches = ('Craig', 'Bharel', 'modified_Bharel') for a in approaches: best_time[a] = None for _ in range(3): for approach in approaches: test_source = deepcopy(source) test_target = deepcopy(target) statement = 'd=' + approach + '_match(test_source,test_target)' setup = 'from __main__ import test_source, test_target, ' + approach + '_match' t = timeit(stmt=statement, setup=setup, number=1) if not best_time[approach] or (t < best_time[approach]): best_time[approach] = t for approach in approaches: print(approach, ':', '%0.5f' % best_time[approach]) Output: Craig : 7.29259 Bharel : 0.01587 modified_Bharel : 0.00682
Here is another solution. There are a lot of ways to do this for key1 in d1: for key2 in d2: if d1[key1] == d2[key2]: stuff Note that you can use any name for key1 and key2.
This maybe "cheating" in some regards, although if you are looking for the matching values of the keys regardless of the case sensitivity then you might be able to do: import sets aa = {'a': 1, 'b': 2, 'c':3} bb = {'A': 1, 'B': 2, 'd': 3} bbl = {k.lower():v for k,v in bb.items()} result = {k:k.upper() for k,v in aa.iteritems() & bbl.viewitems()} print( result ) Output: {'a': 'A', 'b': 'B'} The bbl declaration changes the bb keys into lowercase (it could be either aa, or bb). * I only tested this on my phone, so just throwing this idea out there I suppose... Also, you've changed your question radically since I began composing my answer, so you get what you get.
It is up to you to determine the best solution. Here is a solution: def dicts_to_tuples(*dicts): result = {} for d in dicts: for k,v in d.items(): result.setdefault(v, []).append(k) return [tuple(v) for v in result.values() if len(v) > 1] d1 = {'a': 1, 'b': 2, 'c':3} d2 = {'A': 1, 'B': 2} print dicts_to_tuples(d1, d2)
Is there any pythonic way to combine two dicts (adding values for keys that appear in both)?
For example I have two dicts: Dict A: {'a': 1, 'b': 2, 'c': 3} Dict B: {'b': 3, 'c': 4, 'd': 5} I need a pythonic way of 'combining' two dicts such that the result is: {'a': 1, 'b': 5, 'c': 7, 'd': 5} That is to say: if a key appears in both dicts, add their values, if it appears in only one dict, keep its value.
Use collections.Counter: >>> from collections import Counter >>> A = Counter({'a':1, 'b':2, 'c':3}) >>> B = Counter({'b':3, 'c':4, 'd':5}) >>> A + B Counter({'c': 7, 'b': 5, 'd': 5, 'a': 1}) Counters are basically a subclass of dict, so you can still do everything else with them you'd normally do with that type, such as iterate over their keys and values.
A more generic solution, which works for non-numeric values as well: a = {'a': 'foo', 'b':'bar', 'c': 'baz'} b = {'a': 'spam', 'c':'ham', 'x': 'blah'} r = dict(a.items() + b.items() + [(k, a[k] + b[k]) for k in set(b) & set(a)]) or even more generic: def combine_dicts(a, b, op=operator.add): return dict(a.items() + b.items() + [(k, op(a[k], b[k])) for k in set(b) & set(a)]) For example: >>> a = {'a': 2, 'b':3, 'c':4} >>> b = {'a': 5, 'c':6, 'x':7} >>> import operator >>> print combine_dicts(a, b, operator.mul) {'a': 10, 'x': 7, 'c': 24, 'b': 3}
>>> A = {'a':1, 'b':2, 'c':3} >>> B = {'b':3, 'c':4, 'd':5} >>> c = {x: A.get(x, 0) + B.get(x, 0) for x in set(A).union(B)} >>> print(c) {'a': 1, 'c': 7, 'b': 5, 'd': 5}
Intro: There are the (probably) best solutions. But you have to know it and remember it and sometimes you have to hope that your Python version isn't too old or whatever the issue could be. Then there are the most 'hacky' solutions. They are great and short but sometimes are hard to understand, to read and to remember. There is, though, an alternative which is to to try to reinvent the wheel. - Why reinventing the wheel? - Generally because it's a really good way to learn (and sometimes just because the already-existing tool doesn't do exactly what you would like and/or the way you would like it) and the easiest way if you don't know or don't remember the perfect tool for your problem. So, I propose to reinvent the wheel of the Counter class from the collections module (partially at least): class MyDict(dict): def __add__(self, oth): r = self.copy() try: for key, val in oth.items(): if key in r: r[key] += val # You can custom it here else: r[key] = val except AttributeError: # In case oth isn't a dict return NotImplemented # The convention when a case isn't handled return r a = MyDict({'a':1, 'b':2, 'c':3}) b = MyDict({'b':3, 'c':4, 'd':5}) print(a+b) # Output {'a':1, 'b': 5, 'c': 7, 'd': 5} There would probably others way to implement that and there are already tools to do that but it's always nice to visualize how things would basically works.
Definitely summing the Counter()s is the most pythonic way to go in such cases but only if it results in a positive value. Here is an example and as you can see there is no c in result after negating the c's value in B dictionary. In [1]: from collections import Counter In [2]: A = Counter({'a':1, 'b':2, 'c':3}) In [3]: B = Counter({'b':3, 'c':-4, 'd':5}) In [4]: A + B Out[4]: Counter({'d': 5, 'b': 5, 'a': 1}) That's because Counters were primarily designed to work with positive integers to represent running counts (negative count is meaningless). But to help with those use cases,python documents the minimum range and type restrictions as follows: The Counter class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you could store anything in the value field. The most_common() method requires only that the values be orderable. For in-place operations such as c[key] += 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for update() and subtract() which allow negative and zero values for both inputs and outputs. The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison. The elements() method requires integer counts. It ignores zero and negative counts. So for getting around that problem after summing your Counter you can use Counter.update in order to get the desire output. It works like dict.update() but adds counts instead of replacing them. In [24]: A.update(B) In [25]: A Out[25]: Counter({'d': 5, 'b': 5, 'a': 1, 'c': -1})
myDict = {} for k in itertools.chain(A.keys(), B.keys()): myDict[k] = A.get(k, 0)+B.get(k, 0)
The one with no extra imports! Their is a pythonic standard called EAFP(Easier to Ask for Forgiveness than Permission). Below code is based on that python standard. # The A and B dictionaries A = {'a': 1, 'b': 2, 'c': 3} B = {'b': 3, 'c': 4, 'd': 5} # The final dictionary. Will contain the final outputs. newdict = {} # Make sure every key of A and B get into the final dictionary 'newdict'. newdict.update(A) newdict.update(B) # Iterate through each key of A. for i in A.keys(): # If same key exist on B, its values from A and B will add together and # get included in the final dictionary 'newdict'. try: addition = A[i] + B[i] newdict[i] = addition # If current key does not exist in dictionary B, it will give a KeyError, # catch it and continue looping. except KeyError: continue EDIT: thanks to jerzyk for his improvement suggestions.
import itertools import collections dictA = {'a':1, 'b':2, 'c':3} dictB = {'b':3, 'c':4, 'd':5} new_dict = collections.defaultdict(int) # use dict.items() instead of dict.iteritems() for Python3 for k, v in itertools.chain(dictA.iteritems(), dictB.iteritems()): new_dict[k] += v print dict(new_dict) # OUTPUT {'a': 1, 'c': 7, 'b': 5, 'd': 5} OR Alternative you can use Counter as #Martijn has mentioned above.
For a more generic and extensible way check mergedict. It uses singledispatch and can merge values based on its types. Example: from mergedict import MergeDict class SumDict(MergeDict): #MergeDict.dispatch(int) def merge_int(this, other): return this + other d2 = SumDict({'a': 1, 'b': 'one'}) d2.merge({'a':2, 'b': 'two'}) assert d2 == {'a': 3, 'b': 'two'}
From python 3.5: merging and summing Thanks to #tokeinizer_fsj that told me in a comment that I didn't get completely the meaning of the question (I thought that add meant just adding keys that eventually where different in the two dictinaries and, instead, i meant that the common key values should be summed). So I added that loop before the merging, so that the second dictionary contains the sum of the common keys. The last dictionary will be the one whose values will last in the new dictionary that is the result of the merging of the two, so I thing the problem is solved. The solution is valid from python 3.5 and following versions. a = { "a": 1, "b": 2, "c": 3 } b = { "a": 2, "b": 3, "d": 5 } # Python 3.5 for key in b: if key in a: b[key] = b[key] + a[key] c = {**a, **b} print(c) >>> c {'a': 3, 'b': 5, 'c': 3, 'd': 5} Reusable code a = {'a': 1, 'b': 2, 'c': 3} b = {'b': 3, 'c': 4, 'd': 5} def mergsum(a, b): for k in b: if k in a: b[k] = b[k] + a[k] c = {**a, **b} return c print(mergsum(a, b))
Additionally, please note a.update( b ) is 2x faster than a + b from collections import Counter a = Counter({'menu': 20, 'good': 15, 'happy': 10, 'bar': 5}) b = Counter({'menu': 1, 'good': 1, 'bar': 3}) %timeit a + b; ## 100000 loops, best of 3: 8.62 µs per loop ## The slowest run took 4.04 times longer than the fastest. This could mean that an intermediate result is being cached. %timeit a.update(b) ## 100000 loops, best of 3: 4.51 µs per loop
One line solution is to use dictionary comprehension. C = { k: A.get(k,0) + B.get(k,0) for k in list(B.keys()) + list(A.keys()) }
def merge_with(f, xs, ys): xs = a_copy_of(xs) # dict(xs), maybe generalizable? for (y, v) in ys.iteritems(): xs[y] = v if y not in xs else f(xs[x], v) merge_with((lambda x, y: x + y), A, B) You could easily generalize this: def merge_dicts(f, *dicts): result = {} for d in dicts: for (k, v) in d.iteritems(): result[k] = v if k not in result else f(result[k], v) Then it can take any number of dicts.
This is a simple solution for merging two dictionaries where += can be applied to the values, it has to iterate over a dictionary only once a = {'a':1, 'b':2, 'c':3} dicts = [{'b':3, 'c':4, 'd':5}, {'c':9, 'a':9, 'd':9}] def merge_dicts(merged,mergedfrom): for k,v in mergedfrom.items(): if k in merged: merged[k] += v else: merged[k] = v return merged for dct in dicts: a = merge_dicts(a,dct) print (a) #{'c': 16, 'b': 5, 'd': 14, 'a': 10}
Here's yet another option using dictionary comprehensions combined with the behavior of dict(): dict3 = dict(dict1, **{ k: v + dict1.get(k, 0) for k, v in dict2.items() }) # {'a': 4, 'b': 2, 'c': 7, 'g': 1} From https://docs.python.org/3/library/stdtypes.html#dict: https://docs.python.org/3/library/stdtypes.html#dict and also If keyword arguments are given, the keyword arguments and their values are added to the dictionary created from the positional argument. The dict comprehension **{ k: v + dict1.get(v, 0), v in dict2.items() } handles adding dict1[1] to v. We don't need an explicit if here because the default value for our dict1.get can be set to 0 instead.
This solution is easy to use, it is used as a normal dictionary, but you can use the sum function. class SumDict(dict): def __add__(self, y): return {x: self.get(x, 0) + y.get(x, 0) for x in set(self).union(y)} A = SumDict({'a': 1, 'c': 2}) B = SumDict({'b': 3, 'c': 4}) # Also works: B = {'b': 3, 'c': 4} print(A + B) # OUTPUT {'a': 1, 'b': 3, 'c': 6}
The above solutions are great for the scenario where you have a small number of Counters. If you have a big list of them though, something like this is much nicer: from collections import Counter A = Counter({'a':1, 'b':2, 'c':3}) B = Counter({'b':3, 'c':4, 'd':5}) C = Counter({'a': 5, 'e':3}) list_of_counts = [A, B, C] total = sum(list_of_counts, Counter()) print(total) # Counter({'c': 7, 'a': 6, 'b': 5, 'd': 5, 'e': 3}) The above solution is essentially summing the Counters by: total = Counter() for count in list_of_counts: total += count print(total) # Counter({'c': 7, 'a': 6, 'b': 5, 'd': 5, 'e': 3}) This does the same thing but I think it always helps to see what it is effectively doing underneath.
What about: def dict_merge_and_sum( d1, d2 ): ret = d1 ret.update({ k:v + d2[k] for k,v in d1.items() if k in d2 }) ret.update({ k:v for k,v in d2.items() if k not in d1 }) return ret A = {'a': 1, 'b': 2, 'c': 3} B = {'b': 3, 'c': 4, 'd': 5} print( dict_merge_and_sum( A, B ) ) Output: {'d': 5, 'a': 1, 'c': 7, 'b': 5}
More conventional way to combine two dict. Using modules and tools are good but understanding the logic behind it will help in case you don't remember the tools. Program to combine two dictionary adding values for common keys. def combine_dict(d1,d2): for key,value in d1.items(): if key in d2: d2[key] += value else: d2[key] = value return d2 combine_dict({'a':1, 'b':2, 'c':3},{'b':3, 'c':4, 'd':5}) output == {'b': 5, 'c': 7, 'd': 5, 'a': 1}
Here's a very general solution. You can deal with any number of dict + keys that are only in some dict + easily use any aggregation function you want: def aggregate_dicts(dicts, operation=sum): """Aggregate a sequence of dictionaries using `operation`.""" all_keys = set().union(*[el.keys() for el in dicts]) return {k: operation([dic.get(k, None) for dic in dicts]) for k in all_keys} example: dicts_same_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3}] aggregate_dicts(dicts_same_keys, operation=sum) #{'x': 3, 'y': 6} example non-identical keys and generic aggregation: dicts_diff_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3, 'c': 4}] def mean_no_none(l): l_no_none = [el for el in l if el is not None] return sum(l_no_none) / len(l_no_none) aggregate_dicts(dicts_diff_keys, operation=mean_no_none) # {'x': 1.0, 'c': 4.0, 'y': 2.0}
dict1 = {'a':1, 'b':2, 'c':3} dict2 = {'a':3, 'g':1, 'c':4} dict3 = {} # will store new values for x in dict1: if x in dict2: #sum values with same key dict3[x] = dict1[x] +dict2[x] else: #add the values from x to dict1 dict3[x] = dict1[x] #search for new values not in a for x in dict2: if x not in dict1: dict3[x] = dict2[x] print(dict3) # {'a': 4, 'b': 2, 'c': 7, 'g': 1}
Merging three dicts a,b,c in a single line without any other modules or libs If we have the three dicts a = {"a":9} b = {"b":7} c = {'b': 2, 'd': 90} Merge all with a single line and return a dict object using c = dict(a.items() + b.items() + c.items()) Returning {'a': 9, 'b': 2, 'd': 90}