Merging dict of dicts and sum values - python

I'm looking for a way to merge multiple dicts with each other, which contain nested dicts too. The number of nested dicts is not static but dynamic.
In the end, the final dict should contain all the dicts of dicts, with their values summed:
COUNTRY1 = {'a': {'X': 10, 'Y': 18, 'Z': 17}, 'b': {'AA':{'AAx':45,'AAy':22},'BB':{'BBx':45,'BBy':22}}, 'c': 100}
COUNTRY2 = {'a': {'U': 12, 'V': 34, 'W': 23}, 'b': {'AA':{'AAz':23,'AAa':26},'BB':{'BBz':11,'BBa':15}}, 'c': 115}
COUNTRY3 = {'a': {'Y': 15, 'Z': 14, 'X': 12}, 'b': {'AA':{'AAx':45,'AAz':22},'BB':{'BBy':45,'BBz':22}}, 'c': 232}
# After merging the dictionaries the result should look like:
>>> ALL
{'a': {'X': 22, 'Y': 33, 'Z': 31, 'U': 12, 'V': 34, 'W': 23}, 'b': {'AA':{'AAx':90,'AAy':22,'AAz':45,'AAa':26},'BB':{'BBx':45,'BBy':67, 'BBz':33,'BBa':15}}, 'c': 447}
I tried the following code, which handles at most 3 levels of nested dicts. Unfortunately, it doesn't do what I expected. Besides, it doesn't look very clean; I feel like this could be done with a recursive function, but I can't find a way to do it.
COUNTRIES = ['COUNTRY1','COUNTRY2', 'COUNTRY3']
ALL = {}
for COUNTRY_CODE in COUNTRIES:
    COUNTRY = pickle.load(open(COUNTRY_CODE+".p", "rb"))
    keys = COUNTRY.keys()
    for key in keys:
        try:
            keys2 = COUNTRY[key].keys()
            print(key, keys2)
            for key2 in keys2:
                try:
                    keys3 = COUNTRY[key][key2].keys()
                    print(key2, keys3)
                    for key3 in keys3:
                        try:
                            keys4 = COUNTRY[key][key2][key3].keys()
                            print(key3, keys4)
                        except:
                            print(key3, "NO KEY3")
                            if not key3 in ALL[key][key2]:
                                ALL[key][key2][key3] = COUNTRY[key][key2][key3]
                            else:
                                ALL[key][key2][key3] =+ COUNTRY[key][key2][key3]
                except:
                    print(key2, "NO KEY2")
                    if not key2 in ALL[key]:
                        ALL[key][key2] = COUNTRY[key][key2]
                    else:
                        ALL[key][key2] =+ COUNTRY[key][key2]
        except:
            print(key, "NO KEY")
            if not key in ALL:
                ALL[key] = COUNTRY[key]
            else:
                ALL[key] =+ COUNTRY[key]
print(ALL)

The issue is that you need to decide what to do with each dictionary key based on the type of its value. The basic idea is:
- Input is a pair of dictionaries; output is the sum dictionary.
- Step along both input dictionaries.
- If a value is a dictionary, recurse.
- If a value is a number, add it to the other number.
This is fairly easy to implement with a comprehension:
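def add_dicts(d1, d2):
    def add_values(v1, v2):
        # Key is missing from d2: keep d1's value unchanged
        if v2 is None:
            return v1
        try:
            # Numbers can be added directly
            return v1 + v2
        except TypeError:
            # dict + dict raises TypeError, so recurse into the sub-dicts
            return add_dicts(v1, v2)
    result = d2.copy()
    result.update({k: add_values(v, d2.get(k)) for k, v in d1.items()})
    return result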
The copy ensures that any keys in d2 that are not also in d1 are simply copied over.
You can now sum as follows:
ALL = add_dicts(add_dicts(COUNTRY1, COUNTRY2), COUNTRY3)
More generally, you can use functools.reduce to do this for an indefinite number of dictionaries:
from functools import reduce
dicts = [COUNTRY1, COUNTRY2, COUNTRY3]
ALL = reduce(add_dicts, dicts)
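As a self-contained check, the sketch below repeats add_dicts (with the inner helper named add_values so it does not shadow the built-in sum) and compares the result with the expected ALL from the question. Passing {} as the initial value to reduce is our own addition: with it, an empty list of dicts returns {} instead of raising TypeError. merge_all is a hypothetical wrapper name.

```python
from functools import reduce

def add_dicts(d1, d2):
    """Return a new dict with the values of two nested dicts summed key by key."""
    def add_values(v1, v2):
        if v2 is None:          # key only present in d1
            return v1
        try:
            return v1 + v2      # numbers add directly
        except TypeError:       # dict + dict is not defined, so recurse
            return add_dicts(v1, v2)
    result = d2.copy()          # keys only present in d2 are kept as-is
    result.update({k: add_values(v, d2.get(k)) for k, v in d1.items()})
    return result

def merge_all(dicts):
    # {} as the initial value makes merge_all([]) return {} instead of raising
    return reduce(add_dicts, dicts, {})

COUNTRY1 = {'a': {'X': 10, 'Y': 18, 'Z': 17}, 'b': {'AA': {'AAx': 45, 'AAy': 22}, 'BB': {'BBx': 45, 'BBy': 22}}, 'c': 100}
COUNTRY2 = {'a': {'U': 12, 'V': 34, 'W': 23}, 'b': {'AA': {'AAz': 23, 'AAa': 26}, 'BB': {'BBz': 11, 'BBa': 15}}, 'c': 115}
COUNTRY3 = {'a': {'Y': 15, 'Z': 14, 'X': 12}, 'b': {'AA': {'AAx': 45, 'AAz': 22}, 'BB': {'BBy': 45, 'BBz': 22}}, 'c': 232}

EXPECTED = {'a': {'X': 22, 'Y': 33, 'Z': 31, 'U': 12, 'V': 34, 'W': 23},
            'b': {'AA': {'AAx': 90, 'AAy': 22, 'AAz': 45, 'AAa': 26},
                  'BB': {'BBx': 45, 'BBy': 67, 'BBz': 33, 'BBa': 15}},
            'c': 447}

ALL = merge_all([COUNTRY1, COUNTRY2, COUNTRY3])
print(ALL == EXPECTED)  # True
```

Because add_dicts always builds new dicts instead of updating its arguments, the input dictionaries are left unchanged.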

Define two functions like the ones below:
def cal_sum(lst):
    final_dict = dict()
    for l in lst:
        add_into(final_dict, l)
    return final_dict
def add_into(final_dict, iter_dict):
    # Accumulate iter_dict into final_dict in place, recursing into sub-dicts
    for k, v in iter_dict.items():
        if isinstance(v, dict):
            add_into(final_dict.setdefault(k, dict()), v)
        elif isinstance(v, (int, float)):
            final_dict[k] = final_dict.get(k, 0) + v
Calling the above code as follows produces the desired output:
>>> print(cal_sum([COUNTRY1, COUNTRY2, COUNTRY3]))
{'a': {'U': 12, 'W': 23, 'V': 34, 'Y': 33, 'X': 22, 'Z': 31}, 'c': 447, 'b': {'AA': {'AAa': 26, 'AAy': 22, 'AAx': 90, 'AAz': 45}, 'BB': {'BBa': 15, 'BBz': 33, 'BBy': 67, 'BBx': 45}}}


Formatting to nested dict [duplicate]

I have a flattened dictionary which I want to make into a nested one, of the form
flat = {'X_a_one': 10,
'X_a_two': 20,
'X_b_one': 10,
'X_b_two': 20,
'Y_a_one': 10,
'Y_a_two': 20,
'Y_b_one': 10,
'Y_b_two': 20}
I want to convert it to the form
nested = {'X': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}},
'Y': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}}}
The structure of the flat dictionary is such that there should not be any problems with ambiguities. I want it to work for dictionaries of arbitrary depth, but performance is not really an issue. I've seen lots of methods for flattening a nested dictionary, but basically none for nesting a flattened dictionary. The values stored in the dictionary are either scalars or strings, never iterables.
So far I have got something which can take the input
test_dict = {'X_a_one': '10',
'X_b_one': '10',
'X_c_one': '10'}
to the output
test_out = {'X': {'a_one': '10',
'b_one': '10',
'c_one': '10'}}
using the code
def nest_once(inp_dict):
    out = {}
    if isinstance(inp_dict, dict):
        for key, val in inp_dict.items():
            if '_' in key:
                head, tail = key.split('_', 1)
                if head not in out.keys():
                    out[head] = {tail: val}
                else:
                    out[head].update({tail: val})
            else:
                out[key] = val
    return out
test_out = nest_once(test_dict)
But I'm having trouble working out how to make this into something which recursively creates all levels of the dictionary.
Any help would be appreciated!
(As for why I want to do this: I have a file whose structure is equivalent to a nested dict, and I want to store this file's contents in the attributes dictionary of a NetCDF file and retrieve it later. However NetCDF only allows you to put flat dictionaries as the attributes, so I want to unflatten the dictionary I previously stored in the NetCDF file.)
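Building directly on nest_once, one way to reach every level is to call it again on each value it turned into a dict. The sketch below uses a slightly condensed copy of nest_once (with dict.setdefault) and assumes, as the question states, that keys are unambiguous and that the original values are never dicts themselves, so every dict found after a pass is one that nest_once created.

```python
def nest_once(inp_dict):
    """Split each key at its first '_' and group the tails under the head."""
    out = {}
    for key, val in inp_dict.items():
        if '_' in key:
            head, tail = key.split('_', 1)
            out.setdefault(head, {})[tail] = val
        else:
            out[key] = val
    return out

def nest(inp_dict):
    """Apply nest_once, then recurse into every value that became a dict."""
    out = nest_once(inp_dict)
    return {k: nest(v) if isinstance(v, dict) else v for k, v in out.items()}

test_dict = {'X_a_one': '10', 'X_b_one': '10', 'X_c_one': '10'}
print(nest(test_dict))
# {'X': {'a': {'one': '10'}, 'b': {'one': '10'}, 'c': {'one': '10'}}}
```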
Here is my take:
def nest_dict(flat):
    result = {}
    for k, v in flat.items():
        _nest_dict_rec(k, v, result)
    return result
def _nest_dict_rec(k, v, out):
    k, *rest = k.split('_', 1)
    if rest:
        _nest_dict_rec(rest[0], v, out.setdefault(k, {}))
    else:
        out[k] = v
flat = {'X_a_one': 10,
'X_a_two': 20,
'X_b_one': 10,
'X_b_two': 20,
'Y_a_one': 10,
'Y_a_two': 20,
'Y_b_one': 10,
'Y_b_two': 20}
nested = {'X': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}},
'Y': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}}}
print(nest_dict(flat) == nested)
# True
source = flat  # the flat dict from the question
output = {}
for k, v in source.items():
    # always start at the root.
    current = output
    # This is the part you're struggling with.
    pieces = k.split('_')
    # iterate from the beginning until the second to last place
    for piece in pieces[:-1]:
        if piece not in current:
            # if a dict doesn't exist at an index, then create one
            current[piece] = {}
        # as you walk into the structure, update your current location
        current = current[piece]
    # The reason you're using the second to last is because the last place
    # represents the place you're actually storing the item
    current[pieces[-1]] = v
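The loop above can be packaged as a reusable function. In this sketch the name unflatten_iterative and the sep parameter are our own additions, and dict.setdefault stands in for the explicit membership check: it returns the existing sub-dict, or inserts and returns a new empty one.

```python
def unflatten_iterative(source, sep='_'):
    """Build a nested dict from a flat dict whose keys are joined by sep."""
    output = {}
    for k, v in source.items():
        current = output                  # always start at the root
        pieces = k.split(sep)
        for piece in pieces[:-1]:
            # descend, creating the intermediate dict if it is missing
            current = current.setdefault(piece, {})
        current[pieces[-1]] = v           # the last piece holds the value
    return output

flat = {'X_a_one': 10, 'X_a_two': 20, 'X_b_one': 10, 'X_b_two': 20,
        'Y_a_one': 10, 'Y_a_two': 20, 'Y_b_one': 10, 'Y_b_two': 20}
print(unflatten_iterative(flat))
# {'X': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}, 'Y': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}}
```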
Here's one way using collections.defaultdict, borrowing heavily from this previous answer. There are 3 steps:
1. Create a nested defaultdict of defaultdict objects.
2. Iterate over the items in the flat input dictionary.
3. Build the defaultdict result according to the structure derived from splitting keys by _, using getFromDict to walk the result dictionary.
This is a complete example:
from collections import defaultdict
from functools import reduce
from operator import getitem
def getFromDict(dataDict, mapList):
    """Iterate nested dictionary"""
    return reduce(getitem, mapList, dataDict)
# instantiate nested defaultdict of defaultdicts
tree = lambda: defaultdict(tree)
d = tree()
# iterate input dictionary
for k, v in flat.items():
    *keys, final_key = k.split('_')
    getFromDict(d, keys)[final_key] = v
# d now has this structure (shown as a regular dict):
# {'X': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}},
#  'Y': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}}
As a final step, you can convert your defaultdict to a regular dict, though usually this step is not necessary.
def default_to_regular_dict(d):
    """Convert nested defaultdict to regular dict of dicts."""
    if isinstance(d, defaultdict):
        d = {k: default_to_regular_dict(v) for k, v in d.items()}
    return d
# convert back to regular dict
res = default_to_regular_dict(d)
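One practical reason to convert: on a nested defaultdict, merely reading a missing key inserts an empty subtree, where a plain dict would raise KeyError instead. The sketch below repeats the steps above with a named tree() function in place of the lambda (a style choice with the same behavior) and shows that side effect.

```python
from collections import defaultdict
from functools import reduce
from operator import getitem

def tree():
    """A defaultdict whose missing values are themselves trees."""
    return defaultdict(tree)

def default_to_regular_dict(d):
    """Convert nested defaultdict to regular dict of dicts."""
    if isinstance(d, defaultdict):
        d = {k: default_to_regular_dict(v) for k, v in d.items()}
    return d

flat = {'X_a_one': 10, 'X_a_two': 20, 'Y_b_one': 10}

d = tree()
for k, v in flat.items():
    *keys, final_key = k.split('_')
    reduce(getitem, keys, d)[final_key] = v   # walk (and create) the path

res = default_to_regular_dict(d)
print(res)  # {'X': {'a': {'one': 10, 'two': 20}}, 'Y': {'b': {'one': 10}}}

_ = d['Z']          # just a lookup ...
print('Z' in d)     # True: ... but it inserted an empty subtree
print('Z' in res)   # False: the converted copy is unaffected
```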
The other answers are cleaner, but since you mentioned recursion we do have other options.
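def nest(d):
    grouped = {}
    for k in d:
        i = k.find('_')
        if i == -1:
            # no separator left: k is a leaf key
            grouped[k] = d[k]
            continue
        s, t = k[:i], k[i+1:]
        if s in grouped:
            grouped[s][t] = d[k]
        else:
            grouped[s] = {t: d[k]}
    # recurse into the groups that were just built
    return {k: (nest(v) if isinstance(v, dict) else v) for k, v in grouped.items()}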
You can use itertools.groupby:
import itertools, json
flat = {'Y_a_two': 20, 'Y_a_one': 10, 'X_b_two': 20, 'X_b_one': 10, 'X_a_one': 10, 'X_a_two': 20, 'Y_b_two': 20, 'Y_b_one': 10}
_flat = [[*a.split('_'), b] for a, b in flat.items()]
def create_dict(d):
    # group the rows by their first key piece (groupby needs sorted input)
    _d = {a: list(b) for a, b in itertools.groupby(sorted(d, key=lambda x: x[0]), key=lambda x: x[0])}
    # rows of length 2 are [last_key, value] leaves; longer rows still have deeper levels
    return {a: create_dict([i[1:] for i in b]) if len(b[0]) > 2 else b[0][-1] for a, b in _d.items()}
print(json.dumps(create_dict(_flat), indent=3))
Output:
{
"Y": {
"b": {
"two": 20,
"one": 10
},
"a": {
"two": 20,
"one": 10
}
},
"X": {
"b": {
"two": 20,
"one": 10
},
"a": {
"two": 20,
"one": 10
}
}
}
Another non-recursive solution with no imports. It splits the logic between inserting a single key-value pair of the flat dict and mapping over all the key-value pairs of the flat dict.
def insert(dct, lst):
    """
    dct: a dict to be modified in place.
    lst: list of elements representing a hierarchy of keys
    followed by a value.
    Example: with dct = {} and lst = [1, 2, 3],
    dct becomes {1: {2: 3}}.
    """
    for x in lst[:-2]:
        # store the (possibly new) sub-dict under x, then descend into it
        dct[x] = dct = dct.get(x, dict())
    dct.update({lst[-2]: lst[-1]})
def unflat(dct):
    # empty dict to store the result
    result = dict()
    # create an iterator of lists representing hierarchical indices followed by the value
    lsts = ([*k.split("_"), v] for k, v in dct.items())
    # insert each list into the result
    for lst in lsts:
        insert(result, lst)
    return result
result = unflat(flat)
# {'X': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}},
# 'Y': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}}
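The line dct[x] = dct = dct.get(x, dict()) in insert works because of how chained assignment is evaluated: Python computes the right-hand side once, then assigns it to each target from left to right. So dct[x] is still resolved against the outer dict, and only afterwards is dct rebound to the sub-dict. The sketch below shows this, and what happens if the two targets are written in the opposite order.

```python
root = {}
dct = root
dct['X'] = dct = dct.get('X', {})   # subscript target first, using the old dct
print(root)                # {'X': {}}
print(dct is root['X'])    # True: dct now points at the sub-dict

other = {}
dct = other
dct = dct['X'] = dct.get('X', {})   # rebinds dct first, then sets dct['X'] = dct
print(other)               # {}: the outer dict was never touched
print(dct['X'] is dct)     # True: the new dict now contains itself
```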
Here is a reasonably readable recursive solution:
def unflatten_dict(a, result=None, sep='_'):
    if result is None:
        result = dict()
    for k, v in a.items():
        k, *rest = k.split(sep, 1)
        if rest:
            unflatten_dict({rest[0]: v}, result.setdefault(k, {}), sep=sep)
        else:
            result[k] = v
    return result
flat = {'X_a_one': 10,
'X_a_two': 20,
'X_b_one': 10,
'X_b_two': 20,
'Y_a_one': 10,
'Y_a_two': 20,
'Y_b_one': 10,
'Y_b_two': 20}
print(unflatten_dict(flat))
# {'X': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}},
# 'Y': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}}
This is based on a couple of the answers above, uses no imports, and has only been tested in Python 3.
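Since the motivation is storing a nested dict as flat NetCDF attributes and reading it back, a matching flatten step makes it possible to check the round trip. flatten_dict below is our own hypothetical helper (not from any answer or library), and unflatten_dict is copied from the answer above. The round trip only holds if no original key already contains the separator, so it helps to pick a sep that never occurs in your keys.

```python
def flatten_dict(nested, parent_key='', sep='_'):
    """Join nested keys with sep; the inverse of unflatten_dict for safe keys."""
    flat = {}
    for k, v in nested.items():
        key = f'{parent_key}{sep}{k}' if parent_key else k
        if isinstance(v, dict):
            flat.update(flatten_dict(v, key, sep))   # recurse into sub-dicts
        else:
            flat[key] = v
    return flat

def unflatten_dict(a, result=None, sep='_'):
    if result is None:
        result = dict()
    for k, v in a.items():
        k, *rest = k.split(sep, 1)
        if rest:
            unflatten_dict({rest[0]: v}, result.setdefault(k, {}), sep=sep)
        else:
            result[k] = v
    return result

nested = {'X': {'a': {'one': 10, 'two': 20}}, 'Y': {'b': {'one': 10}}}
assert unflatten_dict(flatten_dict(nested)) == nested

# A key that contains the separator does not survive the round trip ...
tricky = {'my_key': {'a': 1}}
print(unflatten_dict(flatten_dict(tricky)))                    # {'my': {'key': {'a': 1}}}
# ... unless a separator that never appears in the keys is used
print(unflatten_dict(flatten_dict(tricky, sep='.'), sep='.'))  # {'my_key': {'a': 1}}
```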
Install ndicts:
pip install ndicts
Then, in your script:
from ndicts.ndicts import NestedDict
flat = {'X_a_one': 10,
'X_a_two': 20,
'X_b_one': 10,
'X_b_two': 20,
'Y_a_one': 10,
'Y_a_two': 20,
'Y_b_one': 10,
'Y_b_two': 20}
nd = NestedDict()
for key, value in flat.items():
    n_key = tuple(key.split("_"))
    nd[n_key] = value
If you need the result as a dictionary:
>>> nd.to_dict()
{'X': {'a': {'one': 10, 'two': 20},
'b': {'one': 10, 'two': 20}},
'Y': {'a': {'one': 10, 'two': 20},
'b': {'one': 10, 'two': 20}}}

How to merge two objects into a tuple with the two elements?

I'm new to Python, and I'm trying to merge two objects into a tuple with the two elements.
I've tried merging lists, summing them and so on, but nothing worked as I wanted. The code I'm providing doesn't work either:
def merge(dict1,dict2):
    for key in dict2:
        if key in dict1:
            dict2[key]=dict2[key]+dict1[key]
        else:
            pass
    return dict2
The input is this:
a = {'x': [1,2,3], 'y': 1, 'z': set([1,2,3]), 'w': 'qweqwe', 't': {'a': [1, 2]}, 'm': [1]}
And this:
b = {'x': [4,5,6], 'y': 4, 'z': set([4,2,3]), 'w': 'asdf', 't': {'a': [3, 2]}, 'm': "wer"}
And I want the output to be this:
{'x': [1,2,3,4,5,6], 'y': 5, 'z': set([1,2,3,4]), 'w': 'qweqweasdf', 't': {'a': [1, 2, 3, 2]}, 'm': ([1], "wer")}
where ([1], "wer") is a single tuple.
Given there are so many types, a lot of type checking is required, but this simple recursion should work.
Also, this assumes the keys in a and b are the same, as they are in the example.
def merge(a,b,new_dict):
    for key in a.keys():
        if type(a[key]) != type(b[key]):
            new_dict[key] = (a[key],b[key])
        elif type(a[key]) == dict:
            new_dict[key] = merge(a[key],b[key],{})
        elif type(a[key]) == set:
            new_dict[key] = a[key]|b[key]
        else:
            new_dict[key] = a[key] + b[key]
    return new_dict
c = merge(a,b,{})
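The answer above assumes a and b have the same keys; if a key of a is missing from b, b[key] raises KeyError. A sketch that relaxes this by copying keys found in only one dict unchanged (this extension is ours; the type-based rules are the answer's, including the exact-type comparison, so for example True and 1 would end up in a tuple):

```python
def merge(a, b):
    """Merge two dicts; keys present in only one of them are copied unchanged."""
    new_dict = {}
    for key in {**a, **b}:            # a's keys first, then keys only in b
        if key not in b:
            new_dict[key] = a[key]
        elif key not in a:
            new_dict[key] = b[key]
        elif type(a[key]) != type(b[key]):
            new_dict[key] = (a[key], b[key])   # different types: pair them up
        elif isinstance(a[key], dict):
            new_dict[key] = merge(a[key], b[key])
        elif isinstance(a[key], set):
            new_dict[key] = a[key] | b[key]
        else:
            new_dict[key] = a[key] + b[key]    # lists, strings, numbers
    return new_dict

print(merge({'p': 1, 'q': [1]}, {'q': [2], 'r': 'x'}))
# {'p': 1, 'q': [1, 2], 'r': 'x'}
```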

Python compare matching keys and print values, keys

I know there are somewhat similar questions, but none have answered specifically what I'm trying to do, and I haven't had any luck with it. I have two dictionaries, and I want to print the keys and values from the first one whose values are larger than those in the second dictionary, or whose keys are not in the second dictionary at all.
So for instance:
dict1 = {'T': 5, 'X': 10, 'Y': 15, 'Z': 25}
dict2 = {'U': 10, 'X': 11, 'Y': 15, 'Z': 15}
How do I get it to only print 'T': 5, 'Z': 25?
for key in dict1:
    if key not in dict2 or dict1[key] > dict2[key]:
        print("'%s': %d" % (key, dict1[key]))
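If you want the matching pairs as a dict rather than printed lines, the same condition works in a dict comprehension (a sketch; result is just an illustrative name):

```python
dict1 = {'T': 5, 'X': 10, 'Y': 15, 'Z': 25}
dict2 = {'U': 10, 'X': 11, 'Y': 15, 'Z': 15}

# keep the entries of dict1 whose key is missing from dict2 or whose value is larger
result = {k: v for k, v in dict1.items() if k not in dict2 or v > dict2[k]}
print(result)  # {'T': 5, 'Z': 25}
```

The order of the two tests matters: `k not in dict2` is checked first, so `dict2[k]` is only looked up for keys that exist, and no KeyError is raised.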

How to check if a string contains a dictionary

I want to recursively parse all string values in a dict with ast.literal_eval(value), but skip that eval if the string doesn't contain a dict. I want this because I have a string value in a dict that is itself a dict, and I would like that value to be an actual dict. It's best to give an example:
my_dict = {'a': 42, 'b': "my_string", 'c': "{'d': 33, 'e': 'another string'}"}
I don't want to call ast.literal_eval(my_dict['c']) directly; I want a generic solution where I can call convert_to_dict(my_dict).
I wanted to write my own method, but I don't know how to check whether a string contains a dict; otherwise ast.literal_eval will fail, hence the question.
You can check if you have a dict after using literal_eval and reassign:
from ast import literal_eval
def reassign(d):
    for k, v in d.items():
        try:
            evald = literal_eval(v)
            if isinstance(evald, dict):
                d[k] = evald
        except (ValueError, SyntaxError):
            # not a string holding a Python literal: leave the value alone
            pass
Just pass in the dict:
In [2]: my_dict = {'a': 42, 'b': "my_string", 'c': "{'d': 33, 'e': 'another string'}"}
In [3]: reassign(my_dict)
In [4]: my_dict
Out[4]: {'a': 42, 'b': 'my_string', 'c': {'d': 33, 'e': 'another string'}}
In [5]: my_dict = {'a': '42', 'b': "my_string", '5': "{'d': 33, 'e': 'another string', 'other_dict':{'foo':'bar'}}"}
In [6]: reassign(my_dict)
In [7]: my_dict
Out[7]:
{'5': {'d': 33, 'e': 'another string', 'other_dict': {'foo': 'bar'}},
'a': '42',
'b': 'my_string'}
Be aware that if your dict contains certain other objects, such as datetime objects, literal_eval would fail, so whether this works really depends on what your dict can contain.
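ast.literal_eval reports failure with two different exceptions: ValueError when the string parses but is not a literal (for example a bare name like my_string), and SyntaxError when it is not valid Python at all (for example "two words"). A small helper that catches both and only accepts dicts might look like this; the name parse_dict_string is hypothetical:

```python
from ast import literal_eval

def parse_dict_string(value):
    """Return the dict encoded in value, or None if value is not a dict literal."""
    if not isinstance(value, str):
        return None                      # only strings are parsed here
    try:
        parsed = literal_eval(value)
    except (ValueError, SyntaxError):
        return None                      # not a Python literal at all
    return parsed if isinstance(parsed, dict) else None

print(parse_dict_string("{'d': 33, 'e': 'another string'}"))  # {'d': 33, 'e': 'another string'}
print(parse_dict_string("my_string"))    # None (ValueError inside)
print(parse_dict_string("two words"))    # None (SyntaxError inside)
print(parse_dict_string("42"))           # None (a literal, but not a dict)
```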
If you need a recursive approach, all you need is to call reassign on the new dict.
def reassign(d):
    for k, v in d.items():
        try:
            evald = literal_eval(v)
            if isinstance(evald, dict):
                d[k] = evald
                # recurse so strings inside the new dict are converted too
                reassign(evald)
        except (ValueError, SyntaxError):
            pass
And again just pass the dict:
In [10]: my_dict = {'a': 42, 'b': "my_string", 'c': "{'d': 33, 'e': \"{'f' : 64}\"}"}
In [11]: reassign(my_dict)
In [12]: my_dict
Out[12]: {'a': 42, 'b': 'my_string', 'c': {'d': 33, 'e': {'f': 64}}}
And if you want a new dict:
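from ast import literal_eval
from copy import deepcopy
def reassign(d):
    for k, v in d.items():
        try:
            evald = literal_eval(v)
        except (ValueError, SyntaxError):
            # not a string holding a Python literal: copy it over unchanged
            yield k, deepcopy(v)
            continue
        if isinstance(evald, dict):
            yield k, dict(reassign(evald))
        else:
            # a literal, but not a dict (e.g. '42'): keep the original value
            yield k, deepcopy(v)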
Which will give you a new dict:
In [17]: my_dict = {'a': [1, 2, [3]], 'b': "my_string", 'c': "{'d': 33, 'e': \"{'f' : 64}\"}"}
In [18]: new = dict(reassign(my_dict))
In [19]: my_dict["a"][-1].append(4)
In [20]: new
Out[20]: {'a': [1, 2, [3]], 'b': 'my_string', 'c': {'d': 33, 'e': {'f': 64}}}
In [21]: my_dict
Out[21]:
{'a': [1, 2, [3, 4]],
'b': 'my_string',
'c': '{\'d\': 33, \'e\': "{\'f\' : 64}"}'}
You need to make sure to deepcopy objects, or you won't get a truly independent copy of the dict when it contains nested objects like the list of lists above.
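The difference is easy to see in isolation: a shallow copy (dict(...) or .copy()) creates a new outer dict but shares the inner objects, while copy.deepcopy copies them recursively.

```python
from copy import deepcopy

original = {'a': [1, 2, [3]]}
shallow = dict(original)     # new outer dict, but the same inner list objects
deep = deepcopy(original)    # inner lists copied recursively

original['a'][-1].append(4)
print(shallow['a'])  # [1, 2, [3, 4]]  (shares the list with original)
print(deep['a'])     # [1, 2, [3]]     (independent copy)
```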
Here is a proposition that handles recursion. As suggested in the comments, it tries to eval everything, then checks whether the result is a dict: if it is, we recurse; otherwise we skip the value. I slightly altered the initial dict to show that it handles recursion fine:
import ast
my_dict = {'a': 42, 'b': "my_string", 'c': "{'d': 33, 'e': \"{'f' : 64}\"}"}
def recursive_dict_eval(old_dict):
    new_dict = old_dict.copy()
    for key, value in old_dict.items():
        try:
            evaled_value = ast.literal_eval(value)
        except (SyntaxError, ValueError):
            # SyntaxError, ValueError are the literal_eval exceptions: keep the value
            continue
        if isinstance(evaled_value, dict):
            new_dict[key] = recursive_dict_eval(evaled_value)
    return new_dict
print(my_dict)
print(recursive_dict_eval(my_dict))
Output:
{'a': 42, 'b': 'my_string', 'c': '{\'d\': 33, \'e\': "{\'f\' : 64}"}'}
{'a': 42, 'b': 'my_string', 'c': {'e': {'f': 64}, 'd': 33}}
The general idea referenced in my comment above is to run through the dictionary and try to evaluate each value. Store the result in a local variable, then check whether that evaluated expression is a dictionary. If so, reassign it in the passed-in dict; if not, leave it alone.
import ast
my_dict = {'a': 42, 'b': "my_string", 'c': "{'d': 33, 'e': 'another string'}"}
def convert_to_dict(d):
    for key, val in d.items():
        try:
            check = ast.literal_eval(val)
        except (ValueError, SyntaxError):
            continue
        if isinstance(check, dict):
            d[key] = check
    return d
convert_to_dict(my_dict)
The other answers were really good and led me to the right solution, but the previously accepted answer had a bug. Here is my working solution:
import ast
def recursive_dict_eval(myDict):
    for key, value in myDict.items():
        if isinstance(value, dict):
            # already a real dict: descend into it and convert its string values
            recursive_dict_eval(value)
            continue
        try:
            evaled_value = ast.literal_eval(value)
        except (SyntaxError, ValueError):
            # SyntaxError, ValueError are the literal_eval exceptions: keep the value
            continue
        if isinstance(evaled_value, dict):
            myDict[key] = recursive_dict_eval(evaled_value)
    return myDict
If you need to handle nested strings that define dicts, json.loads with an object_hook might work for you:
import json
def convert_subdicts(d):
    for k, v in d.items():
        try:
            # Try to decode a dict; object_hook converts any nested dicts too
            newv = json.loads(v, object_hook=convert_subdicts)
        except (TypeError, ValueError):
            # TypeError: not a string; ValueError (incl. JSONDecodeError): not JSON
            continue
        else:
            if isinstance(newv, dict):
                d[k] = newv  # Replace with decoded dict
    return d
# JSON requires double quotes, so the embedded string must be valid JSON
origdict = {'a': 42, 'b': "my_string", 'c': '{"d": 33, "e": "another string"}'}
newdict = convert_subdicts(origdict.copy())  # Omit .copy() if mutating origdict okay
That should recursively handle the case where the contained dicts might contain strs values that define subdicts. If you don't need to handle that case, you can omit the use of the object_hook, or replace json.loads entirely with ast.literal_eval.
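One caveat with json.loads: JSON only allows double-quoted strings, so a Python-style, single-quoted string like the one in the question is rejected by json.loads (and would therefore be left untouched), while ast.literal_eval accepts it.

```python
import json
from ast import literal_eval

s = "{'d': 33, 'e': 'another string'}"   # Python-style quotes, as in the question

try:
    json.loads(s)
    json_ok = True
except json.JSONDecodeError as exc:
    # JSON only allows double-quoted strings and property names
    json_ok = False
    print('json.loads rejected it:', exc.msg)

print(literal_eval(s))                                 # {'d': 33, 'e': 'another string'}
print(json.loads('{"d": 33, "e": "another string"}'))  # {'d': 33, 'e': 'another string'}
```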

How can I get a list of nested dictionary keys as dot separated strings?

Say I have a dictionary that looks like:
d = {'a': 1, 'b': {'sa': 11, 'sb': 22, 'sc': {'ssa': 111, 'ssb': 222}}, 'c': 3}
I want a list of all the keys whose values aren't other dicts, but represented by their dot notation (assuming you 'dot' at each level of the dict). To put it another way, I want the compound, dot-notation key for all values that have no children.
For example, for the above dict, I would like to get (not necessarily in any order):
['a',
'b.sa',
'b.sb',
'b.sc.ssa',
'b.sc.ssb',
'c']
I'm sure there is a more elegant way to solve this problem but this should get you started.
d = {'a': 1, 'b': {'sa': 11, 'sb': 22, 'sc': {'ssa': 111, 'ssb': 222}}, 'c': 3}
def dotter(d, key, dots):
    if isinstance(d, dict):
        for k in d:
            dotter(d[k], key + '.' + k if key else k, dots)
    else:
        dots.append(key)
    return dots
print(dotter(d, '', []))
d = {'a': 1, 'b': {'sa': 11, 'sb': 22, 'sc': {'ssa': 111, 'ssb': 222}}, 'c': 3}
def fun(k, d, pre):
    path = '%s.%s' % (pre, k) if pre else k
    return path if type(d[k]) is not dict else ",".join([fun(i, d[k], path) for i in d[k]])
print(",".join([fun(k, d, '') for k in d]).split(','))
Output:
['a', 'c', 'b.sc.ssa', 'b.sc.ssb', 'b.sb', 'b.sa']
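Joining the paths with ',' and splitting them again breaks if a key itself contains a comma. A generator sketch avoids that round trip and the accumulator argument; dotted_keys is our own name, and it assumes all keys are strings (an empty sub-dict yields no path).

```python
def dotted_keys(d, prefix=''):
    """Yield the dot-notation path of every leaf (non-dict value) in d."""
    for k, v in d.items():
        path = f'{prefix}.{k}' if prefix else k
        if isinstance(v, dict):
            yield from dotted_keys(v, path)   # descend into the sub-dict
        else:
            yield path

d = {'a': 1, 'b': {'sa': 11, 'sb': 22, 'sc': {'ssa': 111, 'ssb': 222}}, 'c': 3}
print(list(dotted_keys(d)))
# ['a', 'b.sa', 'b.sb', 'b.sc.ssa', 'b.sc.ssb', 'c']
```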
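In case you want a dict that also keeps the values:
def dotter(mixed, key='', dots=None):
    # avoid a mutable default argument: create a fresh result dict per top-level call
    if dots is None:
        dots = {}
    if isinstance(mixed, dict):
        for (k, v) in mixed.items():
            dotter(v, '%s.%s' % (key, k) if key else k, dots)
    else:
        dots[key] = mixed
    return dots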
>>> d = {'a': 1, 'b': {'sa': 11, 'sb': 22, 'sc': {'ssa': 111, 'ssb': 222}}, 'c': 3}
>>> dotted_dict = dotter(d)
>>> print(dotted_dict)
{'a': 1, 'c': 3, 'b.sa': 11, 'b.sb': 22, 'b.sc.ssb': 222, 'b.sc.ssa': 111}
