Combining Two Dictionaries by Like Keys

Let's say I have two dictionaries like so:
first_dict = {'1': 3, '2': 4, '3':8, '9': 20}
second_dict = {'3': 40, '9': 28, '100': 3}
Now here's the idea: I want to get all the keys that are the same, and make the entries for those keys into a dictionary of each value.
For example:
combined_dict = {'3': {'first_dict': 8, 'second_dict': 40}, '9': {'first_dict': 20, 'second_dict':28}}
What would be the best way to accomplish this for larger dictionaries?

Use dictionary view objects:
combined_dict = {key: {'first_dict': first_dict[key], 'second_dict': second_dict[key]}
                 for key in first_dict.viewkeys() & second_dict}
The expression first_dict.viewkeys() & second_dict uses a set intersection to list just the keys that both dictionaries have in common. dict.viewkeys() (Python 2.7) gives us a view into the dictionary that acts like a set without creating a whole new set object. In Python 3, dict.keys() returns the same kind of view, so first_dict.keys() & second_dict does the same job.
This makes the above expression more efficient than creating two sets and intersecting those, especially when dealing with large dictionaries.
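For readers on Python 3, here is a minimal sketch of the same idea using dict.keys(), which already returns a set-like view. It reuses the dictionaries from the question; nothing else is assumed.

```python
first_dict = {'1': 3, '2': 4, '3': 8, '9': 20}
second_dict = {'3': 40, '9': 28, '100': 3}

# Keys views support set operations, so no explicit set() conversion is needed.
common_keys = first_dict.keys() & second_dict.keys()

# Build one small dict per shared key, labelled by the source dictionary.
combined_dict = {
    key: {'first_dict': first_dict[key], 'second_dict': second_dict[key]}
    for key in common_keys
}
print(combined_dict)
```

The intersection itself is a plain set, so the order of keys in combined_dict is not guaranteed.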

common_keys = set(first_dict.keys()) & set(second_dict.keys())
combined_dict = {key: {'first_dict': first_dict[key],
                       'second_dict': second_dict[key]}
                 for key in common_keys}

This works with an arbitrary number of dicts:
def combine(**kwargs):
    return {k: {d: kwargs[d][k] for d in kwargs}
            for k in set.intersection(*map(set, kwargs.values()))}
For example:
print(combine(
    one={'1': 3, '2': 4, '3': 8, '9': 20},
    two={'3': 40, '9': 28, '100': 3},
    three={'3': 14, '9': 42}))
# {'9': {'one': 20, 'three': 42, 'two': 28}, '3': {'one': 8, 'three': 14, 'two': 40}}

Related

Why is this dictionary turning into a tuple?

I have a complex dictionary:
l = {10: [{'a':1, 'T':'y'}, {'a':2, 'T':'n'}], 20: [{'a':3,'T':'n'}]}
When I try to iterate over the dictionary, I don't get a dictionary whose values are lists of dictionaries. Instead I get tuples, like so:
for m in l.items():
    print(m)
(10, [{'a': 1, 'T': 'y'}, {'a': 2, 'T': 'n'}])
(20, [{'a': 3, 'T': 'n'}])
But when I just print l I get my original dictionary:
In [7]: l
Out[7]: {10: [{'a': 1, 'T': 'y'}, {'a': 2, 'T': 'n'}], 20: [{'a': 3, 'T': 'n'}]}
How do I iterate over the dictionary? I still need the keys and to process each dictionary in the value list.

There are two questions here. First, you ask why this is turned into a "tuple". The answer is that this is what the .items() method on dictionaries yields: one (key, value) tuple for each entry.
Knowing this, you can decide how to use it. You can unpack the tuple into its two parts during iteration:
for k, v in l.items():
    # Now k holds the key and v holds the value,
    # so you can either use the value directly
    print(v[0])
    # or access it using the key
    value = l[k]
    print(value[0])
    # Both yield the same value
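As a rough illustration of the point about .items(), the sketch below first looks at a single item to show that it really is a tuple, then unpacks each item to reach the inner dictionaries. The names first_item and rows are made up for this example.

```python
l = {10: [{'a': 1, 'T': 'y'}, {'a': 2, 'T': 'n'}], 20: [{'a': 3, 'T': 'n'}]}

# Each element yielded by .items() is a (key, value) tuple.
first_item = next(iter(l.items()))
print(type(first_item).__name__, first_item)

# Unpack the tuple, then walk the list of inner dictionaries.
rows = []
for key, records in l.items():
    for record in records:
        rows.append((key, record['a'], record['T']))
print(rows)
```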

When iterating over a dictionary's items, you can unpack each pair into two variables:
for key, value in l.items():
    print(key, value)

I often rely on pprint when processing a nested object, to see at a glance what structure I am dealing with.
from pprint import pprint
l = {10: [{'a':1, 'T':'y'}, {'a':2, 'T':'n'}], 20: [{'a':3,'T':'n'}]}
pprint(l, indent=4, width=40)
Output:
{   10: [   {'T': 'y', 'a': 1},
            {'T': 'n', 'a': 2}],
    20: [{'T': 'n', 'a': 3}]}
Others have already answered with implementations.

Thanks for all the help. I did figure out how to process this. Here is the implementation I came up with:
for m in l.items():
    k, v = m
    print(f"key: {k}, val: {v}")
    for n in v:
        print(f"key: {n['a']}, val: {n['T']}")
Thanks for everyone's help!

Formatting to nested dict [duplicate]

I have a flattened dictionary which I want to make into a nested one, of the form
flat = {'X_a_one': 10,
'X_a_two': 20,
'X_b_one': 10,
'X_b_two': 20,
'Y_a_one': 10,
'Y_a_two': 20,
'Y_b_one': 10,
'Y_b_two': 20}
I want to convert it to the form
nested = {'X': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}},
'Y': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}}}
The structure of the flat dictionary is such that there should not be any problems with ambiguities. I want it to work for dictionaries of arbitrary depth, but performance is not really an issue. I've seen lots of methods for flattening a nested dictionary, but basically none for nesting a flattened dictionary. The values stored in the dictionary are either scalars or strings, never iterables.
So far I have got something which can take the input
test_dict = {'X_a_one': '10',
'X_b_one': '10',
'X_c_one': '10'}
to the output
test_out = {'X': {'a_one': '10',
'b_one': '10',
'c_one': '10'}}
using the code
def nest_once(inp_dict):
    out = {}
    if isinstance(inp_dict, dict):
        for key, val in inp_dict.items():
            if '_' in key:
                head, tail = key.split('_', 1)
                if head not in out.keys():
                    out[head] = {tail: val}
                else:
                    out[head].update({tail: val})
            else:
                out[key] = val
    return out
test_out = nest_once(test_dict)
But I'm having trouble working out how to make this into something which recursively creates all levels of the dictionary.
Any help would be appreciated!
(As for why I want to do this: I have a file whose structure is equivalent to a nested dict, and I want to store this file's contents in the attributes dictionary of a NetCDF file and retrieve it later. However NetCDF only allows you to put flat dictionaries as the attributes, so I want to unflatten the dictionary I previously stored in the NetCDF file.)

Here is my take:
def nest_dict(flat):
    result = {}
    for k, v in flat.items():
        _nest_dict_rec(k, v, result)
    return result
def _nest_dict_rec(k, v, out):
    k, *rest = k.split('_', 1)
    if rest:
        _nest_dict_rec(rest[0], v, out.setdefault(k, {}))
    else:
        out[k] = v
flat = {'X_a_one': 10,
'X_a_two': 20,
'X_b_one': 10,
'X_b_two': 20,
'Y_a_one': 10,
'Y_a_two': 20,
'Y_b_one': 10,
'Y_b_two': 20}
nested = {'X': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}},
'Y': {'a': {'one': 10,
'two': 20},
'b': {'one': 10,
'two': 20}}}
print(nest_dict(flat) == nested)
# True

# source is the flat input dictionary
output = {}
for k, v in source.items():
    # always start at the root.
    current = output
    # This is the part you're struggling with.
    pieces = k.split('_')
    # iterate from the beginning until the second to last place
    for piece in pieces[:-1]:
        if piece not in current:
            # if a dict doesn't exist at an index, then create one
            current[piece] = {}
        # as you walk into the structure, update your current location
        current = current[piece]
    # The reason you're using the second to last is because the last place
    # represents the place you're actually storing the item
    current[pieces[-1]] = v
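The loop above can be wrapped in a function so it is easy to reuse and test. This is only a sketch: the name nest_flat_dict and the sep parameter are made up for illustration, and dict.setdefault stands in for the explicit "create if missing" check.

```python
def nest_flat_dict(source, sep='_'):
    """Turn {'X_a_one': 10} into {'X': {'a': {'one': 10}}} iteratively."""
    output = {}
    for key, value in source.items():
        # Always start at the root for each flat key.
        current = output
        *parents, leaf = key.split(sep)
        for piece in parents:
            # Create the intermediate dict if needed, then step into it.
            current = current.setdefault(piece, {})
        # The last piece names the slot that holds the value.
        current[leaf] = value
    return output


flat = {'X_a_one': 10, 'X_a_two': 20, 'X_b_one': 10, 'Y_a_one': 10}
nested = nest_flat_dict(flat)
print(nested)
```

Like the loop it is based on, this assumes no key is both a leaf and a prefix of another key (for example 'X_a' together with 'X_a_one').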

Here's one way using collections.defaultdict, borrowing heavily from this previous answer. There are 3 steps:
Create a nested defaultdict of defaultdict objects.
Iterate items in flat input dictionary.
Build defaultdict result according to the structure derived from splitting keys by _, using getFromDict to iterate the result dictionary.
This is a complete example:
from collections import defaultdict
from functools import reduce
from operator import getitem
def getFromDict(dataDict, mapList):
    """Iterate nested dictionary"""
    return reduce(getitem, mapList, dataDict)
# instantiate nested defaultdict of defaultdicts
tree = lambda: defaultdict(tree)
d = tree()
# iterate input dictionary
for k, v in flat.items():
    *keys, final_key = k.split('_')
    getFromDict(d, keys)[final_key] = v
{'X': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}},
'Y': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}}
As a final step, you can convert your defaultdict to a regular dict, though usually this step is not necessary.
def default_to_regular_dict(d):
    """Convert nested defaultdict to regular dict of dicts."""
    if isinstance(d, defaultdict):
        d = {k: default_to_regular_dict(v) for k, v in d.items()}
    return d
# convert back to regular dict
res = default_to_regular_dict(d)

The other answers are cleaner, but since you mentioned recursion we do have other options.
def nest(d):
    _ = {}
    for k in d:
        i = k.find('_')
        if i == -1:
            _[k] = d[k]
            continue
        s, t = k[:i], k[i+1:]
        if s in _:
            _[s][t] = d[k]
        else:
            _[s] = {t: d[k]}
    return {k: (nest(_[k]) if type(_[k]) == type(d) else _[k]) for k in _}

You can use itertools.groupby:
import itertools, json
flat = {'Y_a_two': 20, 'Y_a_one': 10, 'X_b_two': 20, 'X_b_one': 10, 'X_a_one': 10, 'X_a_two': 20, 'Y_b_two': 20, 'Y_b_one': 10}
_flat = [[*a.split('_'), b] for a, b in flat.items()]
def create_dict(d):
    _d = {a: list(b) for a, b in itertools.groupby(sorted(d, key=lambda x: x[0]), key=lambda x: x[0])}
    return {a: create_dict([i[1:] for i in b]) if len(b) > 1 else b[0][-1] for a, b in _d.items()}
print(json.dumps(create_dict(_flat), indent=3))
Output:
{
   "Y": {
      "b": {
         "two": 20,
         "one": 10
      },
      "a": {
         "two": 20,
         "one": 10
      }
   },
   "X": {
      "b": {
         "two": 20,
         "one": 10
      },
      "a": {
         "two": 20,
         "one": 10
      }
   }
}
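One caveat with the groupby version above: it decides whether a group is a leaf with len(b) > 1, so a branch that has only one child (for example a flat dict containing just 'X_a_one') is collapsed to its value. A possible variant, sketched here under the assumption that keys are unambiguous, looks at the length of each row instead. The name create_dict_by_depth is made up for this example.

```python
import itertools


def create_dict_by_depth(rows):
    """rows are lists like ['X', 'a', 'one', 10]: key pieces followed by a value."""
    out = {}
    # groupby only groups adjacent items, so sort by the first key piece first.
    ordered = sorted(rows, key=lambda row: row[0])
    for head, group in itertools.groupby(ordered, key=lambda row: row[0]):
        group = list(group)
        if len(group[0]) == 2:
            # Only [key, value] is left: this is a leaf.
            out[head] = group[0][1]
        else:
            # Drop the first key piece and nest one level deeper.
            out[head] = create_dict_by_depth([row[1:] for row in group])
    return out


flat = {'X_a_one': 10, 'Y_b': 5}
rows = [[*key.split('_'), value] for key, value in flat.items()]
nested = create_dict_by_depth(rows)
print(nested)
```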

Here is another non-recursive solution with no imports. It splits the logic into two parts: inserting a single key-value pair of the flat dict into the result, and mapping that insertion over all key-value pairs of the flat dict.
def insert(dct, lst):
    """
    dct: a dict to be modified inplace.
    lst: list of elements representing a hierarchy of keys
    followed by a value.
    dct = {}
    lst = [1, 2, 3]
    resulting value of dct: {1: {2: 3}}
    """
    for x in lst[:-2]:
        dct[x] = dct = dct.get(x, dict())
    dct.update({lst[-2]: lst[-1]})
def unflat(dct):
    # empty dict to store the result
    result = dict()
    # create an iterator of lists representing hierarchical indices followed by the value
    lsts = ([*k.split("_"), v] for k, v in dct.items())
    # insert each list into the result
    for lst in lsts:
        insert(result, lst)
    return result
result = unflat(flat)
# {'X': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}},
# 'Y': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}}

Here is a reasonably readable recursive solution:
def unflatten_dict(a, result=None, sep='_'):
    if result is None:
        result = dict()
    for k, v in a.items():
        k, *rest = k.split(sep, 1)
        if rest:
            unflatten_dict({rest[0]: v}, result.setdefault(k, {}), sep=sep)
        else:
            result[k] = v
    return result
flat = {'X_a_one': 10,
'X_a_two': 20,
'X_b_one': 10,
'X_b_two': 20,
'Y_a_one': 10,
'Y_a_two': 20,
'Y_b_one': 10,
'Y_b_two': 20}
print(unflatten_dict(flat))
# {'X': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}},
# 'Y': {'a': {'one': 10, 'two': 20}, 'b': {'one': 10, 'two': 20}}}
This is based on a couple of the above answers, uses no imports, and has only been tested in Python 3.

Install ndicts
pip install ndicts
Then in your script
from ndicts.ndicts import NestedDict
flat = {'X_a_one': 10,
'X_a_two': 20,
'X_b_one': 10,
'X_b_two': 20,
'Y_a_one': 10,
'Y_a_two': 20,
'Y_b_one': 10,
'Y_b_two': 20}
nd = NestedDict()
for key, value in flat.items():
    n_key = tuple(key.split("_"))
    nd[n_key] = value
If you need the result as a dictionary:
>>> nd.to_dict()
{'X': {'a': {'one': 10, 'two': 20},
'b': {'one': 10, 'two': 20}},
'Y': {'a': {'one': 10, 'two': 20},
'b': {'one': 10, 'two': 20}}}

Merging dict of dicts and sum values

I'm looking for a way to merge multiple dicts with each other, which contain nested dicts too. The number of nested dicts is not static but dynamic.
At the end the Final dict should contain all the dicts of dicts and the sum of their values:
COUNTRY1 = {'a': {'X': 10, 'Y': 18, 'Z': 17}, 'b': {'AA':{'AAx':45,'AAy':22},'BB':{'BBx':45,'BBy':22}}, 'c': 100}
COUNTRY2 = {'a': {'U': 12, 'V': 34, 'W': 23}, 'b': {'AA':{'AAz':23,'AAa':26},'BB':{'BBz':11,'BBa':15}}, 'c': 115}
COUNTRY3 = {'a': {'Y': 15, 'Z': 14, 'X': 12}, 'b': {'AA':{'AAx':45,'AAz':22},'BB':{'BBy':45,'BBz':22}}, 'c': 232}
# After merging the dictionaries the result should look like:
ALL
>>> {'a': {'X': 22, 'Y': 33, 'Z': 31, 'U': 12, 'V': 34, 'W': 23}, 'b': {'AA':{'AAx':90,'AAy':22,'AAz':45,'AAa':26},'BB':{'BBx':45,'BBy':67, 'BBz':33,'BBa':15}}, 'c': 447}
I tried the following code, which handles nesting up to a maximum of 3 levels. Unfortunately the code doesn't do what I expected. It also doesn't look very clean. I feel like this could be done with a recursive function, but I can't find a way to do it.
COUNTRIES = ['COUNTRY1','COUNTRY2', 'COUNTRY3']
ALL = {}
for COUNTRY_CODE in COUNTRIES:
    COUNTRY = pickle.load(open(COUNTRY_CODE+".p", "rb"))
    keys = COUNTRY.keys()
    for key in keys:
        try:
            keys2 = COUNTRY[key].keys()
            print(key, keys2)
            for key2 in keys2:
                try:
                    keys3 = COUNTRY[key][key2].keys()
                    print(key2, keys3)
                    for key3 in keys3:
                        try:
                            keys4 = COUNTRY[key][key2][key3].keys()
                            print(key3, keys4)
                        except:
                            print(key3, "NO KEY3")
                            if not key3 in ALL[key][key2]:
                                ALL[key][key2][key3] = COUNTRY[key][key2][key3]
                            else:
                                ALL[key][key2][key3] =+ COUNTRY[key][key2][key3]
                except:
                    print(key2, "NO KEY2")
                    if not key2 in ALL[key]:
                        ALL[key][key2] = COUNTRY[key][key2]
                    else:
                        ALL[key][key2] =+ COUNTRY[key][key2]
        except:
            print(key, "NO KEY")
            if not key in ALL:
                ALL[key] = COUNTRY[key]
            else:
                ALL[key] =+ COUNTRY[key]
print(ALL)

The issue is that you need to determine what to do with a dictionary key based on the type of the value. The basic idea is:
Input is a pair of dictionaries, output is the sum dictionary
Step along both input dictionaries
If a value is a dictionary, recurse
If a value is a number, add it to the other number
This is fairly easy to implement with a comprehension:
def add_dicts(d1, d2):
    def sum(v1, v2):
        if v2 is None:
            return v1
        try:
            return v1 + v2
        except TypeError:
            return add_dicts(v1, v2)
    result = d2.copy()
    result.update({k: sum(v, d2.get(k)) for k, v in d1.items()})
    return result
The copy ensures that any keys in d2 that are not also in d1 are simply copied over.
You can now sum as follows:
ALL = add_dicts(add_dicts(COUNTRY1, COUNTRY2), COUNTRY3)
More generally, you can use functools.reduce to do this for an indefinite number of dictionaries:
from functools import reduce
dicts = [COUNTRY1, COUNTRY2, COUNTRY3]
ALL = reduce(add_dicts, dicts)
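To make the recursion concrete, here is a self-contained sketch that can be run against the data from the question. It is a variant of the idea above, not the exact code: it checks types with isinstance rather than catching TypeError, and it assumes that a key holds either a dict in every input or a number in every input.

```python
from functools import reduce


def add_dicts(d1, d2):
    """Return a new dict: nested dicts are merged recursively, numbers are added."""
    result = dict(d1)
    for key, value in d2.items():
        if key not in result:
            # Only d2 has this key, so take its value unchanged.
            result[key] = value
        elif isinstance(value, dict) and isinstance(result[key], dict):
            # Both sides are dicts: recurse one level down.
            result[key] = add_dicts(result[key], value)
        else:
            # Both sides are numbers: add them.
            result[key] = result[key] + value
    return result


COUNTRY1 = {'a': {'X': 10, 'Y': 18, 'Z': 17},
            'b': {'AA': {'AAx': 45, 'AAy': 22}, 'BB': {'BBx': 45, 'BBy': 22}},
            'c': 100}
COUNTRY2 = {'a': {'U': 12, 'V': 34, 'W': 23},
            'b': {'AA': {'AAz': 23, 'AAa': 26}, 'BB': {'BBz': 11, 'BBa': 15}},
            'c': 115}
COUNTRY3 = {'a': {'Y': 15, 'Z': 14, 'X': 12},
            'b': {'AA': {'AAx': 45, 'AAz': 22}, 'BB': {'BBy': 45, 'BBz': 22}},
            'c': 232}

ALL = reduce(add_dicts, [COUNTRY1, COUNTRY2, COUNTRY3])
print(ALL)
```

Because every level is rebuilt as a new dict, the input dictionaries are left unmodified.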

Make two functions like below:
def cal_sum(lst):
    final_dict = dict()
    for l in lst:
        sum(final_dict, l)
    return final_dict
def sum(final_dict, iter_dict):
    for k, v in iter_dict.items():
        if isinstance(v, dict):
            sum(final_dict.setdefault(k, dict()), v)
        elif isinstance(v, int):
            final_dict[k] = final_dict.get(k, 0) + v
Calling the above code as follows produces the desired output:
>>> print(cal_sum([COUNTRY1, COUNTRY2, COUNTRY3]))
{'a': {'U': 12, 'W': 23, 'V': 34, 'Y': 33, 'X': 22, 'Z': 31}, 'c': 447, 'b': {'AA': {'AAa': 26, 'AAy': 22, 'AAx': 90, 'AAz': 45}, 'BB': {'BBa': 15, 'BBz': 33, 'BBy': 67, 'BBx': 45}}}

python trim down dictionaries in a list of dictionaries

I have the following list, which can contain multiple dictionaries of different sizes.
The keys in each dictionary are unique, but one key may exist in different dictionaries. Values are unique across dictionaries.
I want to trim down my dictionaries so that each one keeps only the keys whose value is the highest among all dictionaries.
For example, the key '1258' exists in three of the four dictionaries, and it has the highest value only in the last one, so in the reconstructed list, this key and its value will be in the last dictionary only.
If a key doesn't exist in any other dictionary, it remains in the dictionary it belongs to.
Here is sample data:
[{'1258': 1.0167004,
'160': 1.5989301000000002,
'1620': 1.3058813000000002,
'2571': 0.7914598,
'26': 4.554409,
'2943': 0.5072369,
'2951': 0.4955711,
'2952': 1.2380746000000002,
'2953': 1.6159719,
'2958': 0.4340355,
'2959': 0.6026906,
'2978': 0.619001,
'2985': 1.5677016,
'3075': 1.04948,
'3222': 0.9721148000000001,
'3388': 1.680108,
'341': 0.8871856,
'3443': 0.6000103,
'361': 2.6682623000000003,
'4': 5.227341,
'601': 2.2614983999999994,
'605': 0.6303175999999999,
'9': 5.0326675},
{'1457': 5.625237999999999,
'1469': 25.45585200000001,
'1470': 25.45585200000001,
'160': 0.395728,
'1620': 0.420267,
'2571': 0.449151,
'26': 0.278281,
'601': 0.384822,
'605': 5.746278700000001,
'9': 1.487241},
{'1258': 0.27440200000000003,
'1457': 0.8723639999999999,
'1620': 0.182567,
'2571': 0.197134,
'2943': 0.3461654,
'2951': 0.47372800000000004,
'2952': 0.6662919999999999,
'2953': 0.6725458,
'2958': 0.4437159,
'2959': 0.690856,
'2985': 0.8106226999999999,
'3075': 0.352618,
'3222': 0.7866500000000001,
'3388': 0.760664,
'3443': 0.129771,
'601': 0.345448,
'605': 1.909823,
'9': 0.888999},
{'1258': 1.0853083,
'160': 0.622579,
'1620': 0.7419095,
'2571': 0.9828758,
'2943': 2.254124,
'2951': 0.6294688,
'2952': 1.0965362,
'2953': 1.8409954000000002,
'2958': 0.7394122999999999,
'2959': 0.9398920000000001,
'2978': 0.672122,
'2985': 1.2385512999999997,
'3075': 0.912366,
'3222': 0.8364904,
'3388': 0.37316499999999997,
'341': 1.0399186,
'3443': 0.547093,
'361': 0.3313275,
'601': 0.5318834,
'605': 0.2909876}]

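Here's one approach, written for Python 2, where the None that d.get(k) returns for a missing key compares as smaller than any number. I shortened your example to one that's easier to reason about.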
>>> dcts = [
... {1:2, 3:4, 5:6},
... {1:3, 6:7, 8:9},
... {6:10, 8:11, 9:12}]
>>>
>>> [{k:v for k,v in d.items() if v == max(d.get(k) for d in dcts)} for d in dcts]
[{3: 4, 5: 6}, {1: 3}, {8: 11, 9: 12, 6: 10}]
Edit:
This version is more efficient because the max is computed only once for each key (again Python 2, since it uses viewkeys()):
>>> from operator import or_
>>> from functools import reduce
>>> allkeys = reduce(or_, (d.viewkeys() for d in dcts))
>>> max_vals = {k:max(d.get(k) for d in dcts) for k in allkeys}
>>> result = [{k:v for k,v in d.items() if v == max_vals[k]} for d in dcts]
>>> result
[{3: 4, 5: 6}, {1: 3}, {8: 11, 9: 12, 6: 10}]
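On Python 3, viewkeys() no longer exists and None cannot be compared with numbers, so max(d.get(k) ...) raises a TypeError when a key is missing from some dictionary. A minimal sketch of the same two-pass idea that avoids both issues is shown below. It reuses the shortened example data, and max_vals and result keep the names used above.

```python
dcts = [
    {1: 2, 3: 4, 5: 6},
    {1: 3, 6: 7, 8: 9},
    {6: 10, 8: 11, 9: 12},
]

# First pass: record the highest value seen for each key across all dicts.
max_vals = {}
for d in dcts:
    for k, v in d.items():
        if k not in max_vals or v > max_vals[k]:
            max_vals[k] = v

# Second pass: keep an entry only in the dict that holds that highest value.
result = [{k: v for k, v in d.items() if v == max_vals[k]} for d in dcts]
print(result)
```

This relies on the statement in the question that values are unique across dictionaries. If two dictionaries shared the same maximum for a key, both would keep it.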

Is it possible to add <key, value> pair at the end of the dictionary in python

When I introduce new pair it is inserted at the beginning of dictionary. Is it possible to append it at the end?

UPDATE
As of Python 3.7, dictionaries remember insertion order. By simply adding a new key, you can be sure that it will be "at the end" when you iterate over the dictionary.
Original answer (for versions before Python 3.7):
Dictionaries have no order, and thus have no beginning or end. The display order is arbitrary.
If you need order, you can use a list of tuples instead of a dict:
In [1]: mylist = []
In [2]: mylist.append(('key', 'value'))
In [3]: mylist.insert(0, ('foo', 'bar'))
You'll be able to easily convert it into a dict later:
In [4]: dict(mylist)
Out[4]: {'foo': 'bar', 'key': 'value'}
Alternatively, use a collections.OrderedDict as suggested by IamAlexAlright.

A dict in Python (before 3.7) is not "ordered". In Python 2.7+ there's collections.OrderedDict, but apart from that, no. The key point of a dictionary in Python is efficient key -> value lookup. The order you're seeing the items in is arbitrary and depends on the hash algorithm.

No. Check the OrderedDict from collections module.

A dictionary is an ordered collection (it keeps insertion order as of Python 3.7).
To add data to a dict, use update.
Adding a new key-value pair:
Dic.update({'key': 'value'})
If the key is a string that is also a valid identifier, you can pass it as a keyword argument, without the curly braces:
Dic.update(key='value')

If you intend for updated values to move to the end of the dict then you can pop the key first then update the dict.
For example:
In [1]: number_dict = {str(index): index for index in range(10)}
In [2]: number_dict.update({"3": 13})
In [3]: number_dict
Out[3]:
{'0': 0,
'1': 1,
'2': 2,
'3': 13,
'4': 4,
'5': 5,
'6': 6,
'7': 7,
'8': 8,
'9': 9}
In [4]: number_dict = {str(index): index for index in range(10)}
In [5]: number_dict.pop("3", None)
In [6]: number_dict.update({"3": 13})
In [7]: number_dict
Out[7]:
{'0': 0,
'1': 1,
'2': 2,
'4': 4,
'5': 5,
'6': 6,
'7': 7,
'8': 8,
'9': 9,
'3': 13}
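The same behaviour can be checked in a short script. The sketch below assumes Python 3.7 or later, where plain dicts keep insertion order, and also shows collections.OrderedDict.move_to_end as an alternative to popping and re-adding the key.

```python
from collections import OrderedDict

number_dict = {str(index): index for index in range(5)}

# Updating an existing key keeps its original position.
number_dict["3"] = 13
order_after_update = list(number_dict)

# Popping the key first and re-adding it moves it to the end.
number_dict.pop("3")
number_dict["3"] = 13
order_after_pop = list(number_dict)

# OrderedDict offers move_to_end for the same effect without popping.
ordered = OrderedDict((str(index), index) for index in range(5))
ordered["3"] = 13
ordered.move_to_end("3")
order_after_move = list(ordered)

print(order_after_update, order_after_pop, order_after_move)
```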
