Merge multiple dictionaries conditionally - python

I have N dictionaries that contain the same keys, with values that are integers. I want to merge these into a single dictionary based on the maximum value. Currently I have something like this:
max_dict = {}
for dict in original_dict_list:
for key, val in dict.iteritems():
if key not in max_dict or max_dict[key] < val:
max_dict[key] = val
Is there a better (or more "pythonic") way of doing this?

Use collection.Counter() objects instead, they support 'merging' counts natively:
from collections import Counter
max_dict = Counter()
for d in original_dict_list:
max_dict |= Counter(d)
or even:
from collections import Counter
from operator import or_
max_dict = reduce(or_, map(Counter, original_dict_list))
Counter objects are multi-sets (also called 'bags' sometimes). The | operator performs a union on two counters, storing the maximum count for a given key.
A Counter is also a straight subclass of dict, so you can (mostly) treat it like any other dictionary.
Demo:
>>> from collections import Counter
>>> from operator import or_
>>> original_dict_list = [{'foo': 3, 'bar': 10}, {'foo': 42, 'spam': 20}, {'bar': 5, 'ham': 10}]
>>> reduce(or_, map(Counter, original_dict_list))
Counter({'foo': 42, 'spam': 20, 'bar': 10, 'ham': 10})

Assuming not all dictionaries contain all keys:
keys = set(k for x in original_dict_list for k in x)
max_dict = {k:max([x[k] for x in original_dict_list if k in x]) for k in keys}

max_dict = { k:max([v]+[ _dict[k] for _dict in original_dict_list[1:] ])
for k,v in original_dict_list[0].items() }

Related

Adding dictionaries together in Python

If i have 2 dictionaries x={'a':1,'b':2} and y={'a':1,'b':3}
and i want the output z={'a':2,'b':5}, is there a z=dict.add(x,y) function or should i convert both dictionaries into dataframes and then add them together with z=x.add(y)?
You could use Counter in this case for example:
from pprint import pprint
from collections import Counter
x={'a':1,'b':2}
y={'a':1,'b':3}
c = Counter()
c.update(x)
c.update(y)
pprint(dict(c))
Output:
{'a': 2, 'b': 5}
Or using +:
from pprint import pprint
from collections import Counter
x={'a':1,'b':2}
y={'a':1,'b':3}
pprint(dict(Counter(x) + Counter(y)))
collections.Counter is the natural method, but you can also use a dictionary comprehension after calculating the union of your dictionary keys:
x = {'a':1, 'b':2}
y = {'a':1, 'b':3}
dict_tup = (x, y)
keys = set().union(*dict_tup)
z = {k: sum(i.get(k, 0) for i in dict_tup) for k in keys}
print(z)
{'a': 2, 'b': 5}
Code:
from collections import Counter
x = {"a":1, "b":2}
y = {"a":1, "b":3}
c = Counter(x)
c += Counter(y)
z = dict(c)
print(z)
Output:
{'a': 2, 'b': 5}

Filter out elements that occur less times than a minimum threshold

After trying to count the occurrences of an element in a list using the below code
from collections import Counter
A = ['a','a','a','b','c','b','c','b','a']
A = Counter(A)
min_threshold = 3
After calling Counter on A above, a counter object like this is formed:
>>> A
Counter({'a': 4, 'b': 3, 'c': 2})
From here, how do I filter only 'a' and 'b' using minimum threshold value of 3?
Build your Counter, then use a dict comprehension as a second, filtering step.
{x: count for x, count in A.items() if count >= min_threshold}
# {'a': 4, 'b': 3}
As covered by Satish BV, you can iterate over your Counter with a dictionary comprehension. You could use items (or iteritems for more efficiency and if you're on Python 2) to get a sequence of (key, value) tuple pairs.
And then turn that into a Counter.
my_dict = {k: v for k, v in A.iteritems() if v >= min_threshold}
filteredA = Counter(my_dict)
Alternatively, you could iterate over the original Counter and remove the unnecessary values.
for k, v in A.items():
if v < min_threshold:
A.pop(k)
This looks nicer:
{ x: count for x, count in A.items() if count >= min_threshold }
You could remove the keys from the dictionary that are below 3:
for key, cnts in list(A.items()): # list is important here
if cnts < min_threshold:
del A[key]
Which gives you:
>>> A
Counter({'a': 4, 'b': 3})

How can you get a python dictionary to have duplicate keys holding values?

I'm working on an assignment. Is there anyway a dictionary can have duplicate keys and hold the same or different values. Here is an example of what i'm trying to do:
dict = {
'Key1' : 'Helo', 'World'
'Key1' : 'Helo'
'Key1' : 'Helo', 'World'
}
I tried doing this but when I associate any value to key1, it gets added to the same key1.
Is this possible with a dictionary? If not what other data structure I can use to implement this process?
Use dictionaries of lists to hold multiple values.
One way to have multiple values to a key is to use a dictionary of lists.
x = { 'Key1' : ['Hello', 'World'],
'Key2' : ['Howdy', 'Neighbor'],
'Key3' : ['Hey', 'Dude']
}
To get the list you want (or make a new one), I recommend using setdefault.
my_list = x.setdefault(key, [])
Example:
>>> x = {}
>>> x['abc'] = [1,2,3,4]
>>> x
{'abc': [1, 2, 3, 4]}
>>> x.setdefault('xyz', [])
[]
>>> x.setdefault('abc', [])
[1, 2, 3, 4]
>>> x
{'xyz': [], 'abc': [1, 2, 3, 4]}
Using defaultdict for the same functionality
To make this even easier, the collections module has a defaultdict object that simplifies this. Just pass it a constructor/factory.
from collections import defaultdict
x = defaultdict(list)
x['key1'].append(12)
x['key1'].append(13)
You can also use dictionaries of dictionaries or even dictionaries of sets.
>>> from collections import defaultdict
>>> dd = defaultdict(dict)
>>> dd
defaultdict(<type 'dict'>, {})
>>> dd['x']['a'] = 23
>>> dd
defaultdict(<type 'dict'>, {'x': {'a': 23}})
>>> dd['x']['b'] = 46
>>> dd['y']['a'] = 12
>>> dd
defaultdict(<type 'dict'>, {'y': {'a': 12}, 'x': {'a': 23, 'b': 46}})
I think you want collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
list_of_values = [['Hello', 'World'], 'Hello', ['Hello', 'World']]
for v in list_of_values:
d['Key1'].append(v)
print d
This will deal with duplicate keys, and instead of overwriting the key, it will append something to that list of values.
Keys are unique to the data. Consider using some other value for a key or consider using a different data structure to hold this data.
for example:
don't use a persons address as a unique key because several people might live there.
a person's social security number or a drivers license is a much better unique id of a person.
you can create your own id to force it to be unique.

Merge several Python dictionaries [duplicate]

This question already has answers here:
How to merge dicts, collecting values from matching keys?
(17 answers)
Closed 3 months ago.
I have to merge list of python dictionary. For eg:
dicts[0] = {'a':1, 'b':2, 'c':3}
dicts[1] = {'a':1, 'd':2, 'c':'foo'}
dicts[2] = {'e':57,'c':3}
super_dict = {'a':[1], 'b':[2], 'c':[3,'foo'], 'd':[2], 'e':[57]}
I wrote the following code:
super_dict = {}
for d in dicts:
for k, v in d.items():
if super_dict.get(k) is None:
super_dict[k] = []
if v not in super_dict.get(k):
super_dict[k].append(v)
Can it be presented more elegantly / optimized?
Note
I found another question on SO but its about merging exactly 2 dictionaries.
You can iterate over the dictionaries directly -- no need to use range. The setdefault method of dict looks up a key, and returns the value if found. If not found, it returns a default, and also assigns that default to the key.
super_dict = {}
for d in dicts:
for k, v in d.iteritems(): # d.items() in Python 3+
super_dict.setdefault(k, []).append(v)
Also, you might consider using a defaultdict. This just automates setdefault by calling a function to return a default value when a key isn't found.
import collections
super_dict = collections.defaultdict(list)
for d in dicts:
for k, v in d.iteritems(): # d.items() in Python 3+
super_dict[k].append(v)
Also, as Sven Marnach astutely observed, you seem to want no duplication of values in your lists. In that case, set gets you what you want:
import collections
super_dict = collections.defaultdict(set)
for d in dicts:
for k, v in d.iteritems(): # d.items() in Python 3+
super_dict[k].add(v)
from collections import defaultdict
dicts = [{'a':1, 'b':2, 'c':3},
{'a':1, 'd':2, 'c':'foo'},
{'e':57, 'c':3} ]
super_dict = defaultdict(set) # uses set to avoid duplicates
for d in dicts:
for k, v in d.items(): # use d.iteritems() in python 2
super_dict[k].add(v)
you can use this behaviour of dict. (a bit elegant)
a = {'a':1, 'b':2, 'c':3}
b = {'d':1, 'e':2, 'f':3}
c = {1:1, 2:2, 3:3}
merge = {**a, **b, **c}
print(merge) # {'a': 1, 'b': 2, 'c': 3, 'd': 1, 'e': 2, 'f': 3, 1: 1, 2: 2, 3: 3}
and you are good to go :)
Merge the keys of all dicts, and for each key assemble the list of values:
super_dict = {}
for k in set(k for d in dicts for k in d):
super_dict[k] = [d[k] for d in dicts if k in d]
The expression set(k for d in dicts for k in d) builds a set of all unique keys of all dictionaries. For each of these unique keys, we use the list comprehension [d[k] for d in dicts if k in d] to build the list of values from all dicts for this key.
Since you only seem to one the unique value of each key, you might want to use sets instead:
super_dict = {}
for k in set(k for d in dicts for k in d):
super_dict[k] = set(d[k] for d in dicts if k in d)
It seems like most of the answers using comprehensions are not all that readable. In case any gets lost in the mess of answers above this might be helpful (although extremely late...). Just loop over the items of each dict and place them in a separate one.
super_dict = {key:val for d in dicts for key,val in d.items()}
When the value of the keys are in list:
from collections import defaultdict
dicts = [{'a':[1], 'b':[2], 'c':[3]},
{'a':[11], 'd':[2], 'c':['foo']},
{'e':[57], 'c':[3], "a": [1]} ]
super_dict = defaultdict(list) # uses set to avoid duplicates
for d in dicts:
for k, v in d.items(): # use d.iteritems() in python 2
super_dict[k] = list(set(super_dict[k] + v))
combined_dict = {}
for elem in super_dict.keys():
combined_dict[elem] = super_dict[elem]
combined_dict
## output: {'a': [1, 11], 'b': [2], 'c': [3, 'foo'], 'd': [2], 'e': [57]}
I have a very easy to go solution without any imports.
I use the dict.update() method.
But sadly it will overwrite, if same key appears in more than one dictionary, then the most recently merged dict's value will appear in the output.
dict1 = {'Name': 'Zara', 'Age': 7}
dict2 = {'Sex': 'female' }
dict3 = {'Status': 'single', 'Age': 27}
dict4 = {'Occupation':'nurse', 'Wage': 3000}
def mergedict(*args):
output = {}
for arg in args:
output.update(arg)
return output
print(mergedict(dict1, dict2, dict3, dict4))
The output is this:
{'Name': 'Zara', 'Age': 27, 'Sex': 'female', 'Status': 'single', 'Occupation': 'nurse', 'Wage': 3000}
Perhaps a more modern and concise approach for those who use python 3.3 or later versions is the use of ChainMap from the collections module.
from collections import ChainMap
d1 = {'a': 1, 'b': 3}
d2 = {'c': 2}
d3 = {'d': 7, 'a': 9}
d4 = {}
combo = dict(ChainMap(d1, d2, d3, d4))
# {'d': 7, 'a': 1, 'c': 2, 'b': 3}
For a larger collection of dict objects then star operator works
dict(ChainMap(*dict_collection))
Note that the resulting dictionary seems to only keep the value of the first key it encounters in the ordered collection and ignores any further duplicates.
This may be a bit more elegant:
super_dict = {}
for d in dicts:
for k, v in d.iteritems():
l=super_dict.setdefault(k,[])
if v not in l:
l.append(v)
UPDATE: made change suggested by Sven
UPDATE: changed to avoid duplicates (thanks Marcin and Steven)
Never forget that the standard libraries have a wealth of tools for dealing with dicts and iteration:
from itertools import chain
from collections import defaultdict
super_dict = defaultdict(list)
for k,v in chain.from_iterable(d.iteritems() for d in dicts):
if v not in super_dict[k]: super_dict[k].append(v)
Note that the if v not in super_dict[k] can be avoided by using defaultdict(set) as per Steven Rumbalski's answer.
If you assume that the keys in which you are interested are at the same nested level, you can recursively traverse each dictionary and create a new dictionary using that key, effectively merging them.
merged = {}
for d in dicts:
def walk(d,merge):
for key, item in d.items():
if isinstance(item, dict):
merge.setdefault(key, {})
walk(item, merge[key])
else:
merge.setdefault(key, [])
merge[key].append(item)
walk(d,merged)
For example, say you have the following dictionaries you want to merge.
dicts = [{'A': {'A1': {'FOO': [1,2,3]}}},
{'A': {'A1': {'A2': {'BOO': [4,5,6]}}}},
{'A': {'A1': {'FOO': [7,8]}}},
{'B': {'B1': {'COO': [9]}}},
{'B': {'B2': {'DOO': [10,11,12]}}},
{'C': {'C1': {'C2': {'POO':[13,14,15]}}}},
{'C': {'C1': {'ROO': [16,17]}}}]
Using the key at each level, you should get something like this:
{'A': {'A1': {'FOO': [[1, 2, 3], [7, 8]],
'A2': {'BOO': [[4, 5, 6]]}}},
'B': {'B1': {'COO': [[9]]},
'B2': {'DOO': [[10, 11, 12]]}},
'C': {'C1': {'C2': {'POO': [[13, 14, 15]]},
'ROO': [[16, 17]]}}}
Note: I assume the leaf at each branch is a list of some kind, but you can obviously change the logic to do whatever is necessary for your situation.
This is a more recent enhancement over the prior answer by ElbowPipe, using newer syntax introduced in Python 3.9 for merging dictionaries. Note that this answer does not merge conflicting values into a list!
> import functools
> import operator
> functools.reduce(operator.or_, [{0:1}, {2:3, 4:5}, {2:6}])
{0: 1, 2: 6, 4: 5}
For a oneliner, the following could be used:
{key: {d[key] for d in dicts if key in d} for key in {key for d in dicts for key in d}}
although readibility would benefit from naming the combined key set:
combined_key_set = {key for d in dicts for key in d}
super_dict = {key: {d[key] for d in dicts if key in d} for key in combined_key_set}
Elegance can be debated but personally I prefer comprehensions over for loops. :)
(The dictionary and set comprehensions are available in Python 2.7/3.1 and newer.)
python 3.x (reduce is builtin for python 2.x, so no need to import if in 2.x)
import operator
from functools import operator.add
a = [{'a': 1}, {'b': 2}, {'c': 3, 'd': 4}]
dict(reduce(operator.add, map(list,(map(dict.items, a))))
map(dict.items, a) # converts to list of key, value iterators
map(list, ... # converts to iterator equivalent of [[[a, 1]], [[b, 2]], [[c, 3],[d,4]]]
reduce(operator.add, ... # reduces the multiple list down to a single list
My solution is similar to #senderle proposed, but instead of for loop I used map
super_dict = defaultdict(set)
map(lambda y: map(lambda x: super_dict[x].add(y[x]), y), dicts)
The use of defaultdict is good, this also can be done with the use of itertools.groupby.
import itertools
# output all dict items, and sort them by key
dicts_ele = sorted( ( item for d in dicts for item in d.items() ), key = lambda x: x[0] )
# groups items by key
ele_groups = itertools.groupby( dicts_ele, key = lambda x: x[0] )
# iterates over groups and get item value
merged = { k: set( v[1] for v in grouped ) for k, grouped in ele_groups }
and obviously, you can merge this block of code into one-line style
merged = {
k: set( v[1] for v in grouped )
for k, grouped in (
itertools.groupby(
sorted(
( item for d in dicts for item in d.items() ),
key = lambda x: x[0]
),
key = lambda x: x[0]
)
)
}
I'm a bit late to the game but I did it in 2 lines with no dependencies beyond python itself:
flatten = lambda *c: (b for a in c for b in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))
o = reduce(lambda d1,d2: dict((k, list(flatten([d1.get(k), d2.get(k)]))) for k in set(d1.keys() + d2.keys())), dicts)
# output:
# {'a': [1, 1, None], 'c': [3, 'foo', 3], 'b': [2, None, None], 'e': [None, 57], 'd': [None, 2, None]}
Though if you don't care about nested lists, then:
o2 = reduce(lambda d1,d2: dict((k, [d1.get(k), d2.get(k)]) for k in set(d1.keys() + d2.keys())), dicts)
# output:
# {'a': [[1, 1], None], 'c': [[3, 'foo'], 3], 'b': [[2, None], None], 'e': [None, 57], 'd': [[None, 2], None]}

How to sum dict elements

In Python,
I have list of dicts:
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
I want one final dict that will contain the sum of all dicts.
I.e the result will be: {'a':5, 'b':7}
N.B: every dict in the list will contain same number of key, value pairs.
You can use the collections.Counter
counter = collections.Counter()
for d in dict1:
counter.update(d)
Or, if you prefer oneliners:
functools.reduce(operator.add, map(collections.Counter, dict1))
A little ugly, but a one-liner:
dictf = reduce(lambda x, y: dict((k, v + y[k]) for k, v in x.iteritems()), dict1)
Leveraging sum() should get better performance when adding more than a few dicts
>>> dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
>>> from operator import itemgetter
>>> {k:sum(map(itemgetter(k), dict1)) for k in dict1[0]} # Python2.7+
{'a': 5, 'b': 7}
>>> dict((k,sum(map(itemgetter(k), dict1))) for k in dict1[0]) # Python2.6
{'a': 5, 'b': 7}
adding Stephan's suggestion
>>> {k: sum(d[k] for d in dict1) for k in dict1[0]} # Python2.7+
{'a': 5, 'b': 7}
>>> dict((k, sum(d[k] for d in dict1)) for k in dict1[0]) # Python2.6
{'a': 5, 'b': 7}
I think Stephan's version of the Python2.7 code reads really nicely
This might help:
def sum_dict(d1, d2):
for key, value in d1.items():
d1[key] = value + d2.get(key, 0)
return d1
>>> dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
>>> reduce(sum_dict, dict1)
{'a': 5, 'b': 7}
The following code shows one way to do it:
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
final = {}
for k in dict1[0].keys(): # Init all elements to zero.
final[k] = 0
for d in dict1:
for k in d.keys():
final[k] = final[k] + d[k] # Update the element.
print final
This outputs:
{'a': 5, 'b': 7}
as you desired.
Or, as inspired by kriss, better but still readable:
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
final = {}
for d in dict1:
for k in d.keys():
final[k] = final.get(k,0) + d[k]
print final
I pine for the days of the original, readable Python :-)
I was interested in the performance of the proposed Counter, reduce and sum methods for large lists. Maybe someone else is interested in this as well.
You can have a look here: https://gist.github.com/torstenrudolf/277e98df296f23ff921c
I tested the three methods for this list of dictionaries:
dictList = [{'a': x, 'b': 2*x, 'c': x**2} for x in xrange(10000)]
the sum method showed the best performance, followed by reduce and Counter was the slowest. The time showed below is in seconds.
In [34]: test(dictList)
Out[34]:
{'counter': 0.01955194902420044,
'reduce': 0.006518083095550537,
'sum': 0.0018319153785705566}
But this is dependent on the number of elements in the dictionaries. the sum method will slow down faster than the reduce.
l = [{y: x*y for y in xrange(100)} for x in xrange(10000)]
In [37]: test(l, num=100)
Out[37]:
{'counter': 0.2401433277130127,
'reduce': 0.11110662937164306,
'sum': 0.2256883692741394}
You can also use the pandas sum function to compute the sum:
import pandas as pd
# create a DataFrame
df = pd.DataFrame(dict1)
# compute the sum and convert to dict.
dict(df.sum())
This results in:
{'a': 5, 'b': 7}
It also works for floating points:
dict2 = [{'a':2, 'b':3.3},{'a':3, 'b':4.5}]
dict(pd.DataFrame(dict2).sum())
Gives the correct results:
{'a': 5.0, 'b': 7.8}
In Python 2.7 you can replace the dict with a collections.Counter object. This supports addition and subtraction of Counters.
Here is a reasonable beatiful one.
final = {}
for k in dict1[0].Keys():
final[k] = sum(x[k] for x in dict1)
return final
Here is another working solution (python3), quite general as it works for dict, lists, arrays. For non-common elements, the original value will be included in the output dict.
def mergsum(a, b):
for k in b:
if k in a:
b[k] = b[k] + a[k]
c = {**a, **b}
return c
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
print(mergsum(dict1[0], dict1[1]))
One further one line solution
dict(
functools.reduce(
lambda x, y: x.update(y) or x, # update, returns None, and we need to chain.
dict1,
collections.Counter())
)
This creates only one counter, uses it as an accumulator and finally converts back to a dict.

Categories

Resources