Related
I have a list of dicts, and I'd like to remove the dicts with identical key and value pairs.
For this list: [{'a': 123}, {'b': 123}, {'a': 123}]
I'd like to return this: [{'a': 123}, {'b': 123}]
Another example:
For this list: [{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}, {'a': 123, 'b': 1234}]
I'd like to return this: [{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}]
Try this:
[dict(t) for t in {tuple(d.items()) for d in l}]
The strategy is to convert the list of dictionaries to a list of tuples where the tuples contain the items of the dictionary. Since the tuples can be hashed, you can remove duplicates using set (using a set comprehension here, older python alternative would be set(tuple(d.items()) for d in l)) and, after that, re-create the dictionaries from tuples with dict.
where:
l is the original list
d is one of the dictionaries in the list
t is one of the tuples created from a dictionary
Edit: If you want to preserve ordering, the one-liner above won't work since set won't do that. However, with a few lines of code, you can also do that:
l = [{'a': 123, 'b': 1234},
{'a': 3222, 'b': 1234},
{'a': 123, 'b': 1234}]
seen = set()
new_l = []
for d in l:
t = tuple(d.items())
if t not in seen:
seen.add(t)
new_l.append(d)
print new_l
Example output:
[{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}]
Note: As pointed out by #alexis it might happen that two dictionaries with the same keys and values, don't result in the same tuple. That could happen if they go through a different adding/removing keys history. If that's the case for your problem, then consider sorting d.items() as he suggests.
Another one-liner based on list comprehensions:
>>> d = [{'a': 123}, {'b': 123}, {'a': 123}]
>>> [i for n, i in enumerate(d) if i not in d[n + 1:]]
[{'b': 123}, {'a': 123}]
Here since we can use dict comparison, we only keep the elements that are not in the rest of the initial list (this notion is only accessible through the index n, hence the use of enumerate).
If using a third-party package would be okay then you could use iteration_utilities.unique_everseen:
>>> from iteration_utilities import unique_everseen
>>> l = [{'a': 123}, {'b': 123}, {'a': 123}]
>>> list(unique_everseen(l))
[{'a': 123}, {'b': 123}]
It preserves the order of the original list and ut can also handle unhashable items like dictionaries by falling back on a slower algorithm (O(n*m) where n are the elements in the original list and m the unique elements in the original list instead of O(n)). In case both keys and values are hashable you can use the key argument of that function to create hashable items for the "uniqueness-test" (so that it works in O(n)).
In the case of a dictionary (which compares independent of order) you need to map it to another data-structure that compares like that, for example frozenset:
>>> list(unique_everseen(l, key=lambda item: frozenset(item.items())))
[{'a': 123}, {'b': 123}]
Note that you shouldn't use a simple tuple approach (without sorting) because equal dictionaries don't necessarily have the same order (even in Python 3.7 where insertion order - not absolute order - is guaranteed):
>>> d1 = {1: 1, 9: 9}
>>> d2 = {9: 9, 1: 1}
>>> d1 == d2
True
>>> tuple(d1.items()) == tuple(d2.items())
False
And even sorting the tuple might not work if the keys aren't sortable:
>>> d3 = {1: 1, 'a': 'a'}
>>> tuple(sorted(d3.items()))
TypeError: '<' not supported between instances of 'str' and 'int'
Benchmark
I thought it might be useful to see how the performance of these approaches compares, so I did a small benchmark. The benchmark graphs are time vs. list-size based on a list containing no duplicates (that was chosen arbitrarily, the runtime doesn't change significantly if I add some or lots of duplicates). It's a log-log plot so the complete range is covered.
The absolute times:
The timings relative to the fastest approach:
The second approach from thefourtheye is fastest here. The unique_everseen approach with the key function is on the second place, however it's the fastest approach that preserves order. The other approaches from jcollado and thefourtheye are almost as fast. The approach using unique_everseen without key and the solutions from Emmanuel and Scorpil are very slow for longer lists and behave much worse O(n*n) instead of O(n). stpks approach with json isn't O(n*n) but it's much slower than the similar O(n) approaches.
The code to reproduce the benchmarks:
from simple_benchmark import benchmark
import json
from collections import OrderedDict
from iteration_utilities import unique_everseen
def jcollado_1(l):
return [dict(t) for t in {tuple(d.items()) for d in l}]
def jcollado_2(l):
seen = set()
new_l = []
for d in l:
t = tuple(d.items())
if t not in seen:
seen.add(t)
new_l.append(d)
return new_l
def Emmanuel(d):
return [i for n, i in enumerate(d) if i not in d[n + 1:]]
def Scorpil(a):
b = []
for i in range(0, len(a)):
if a[i] not in a[i+1:]:
b.append(a[i])
def stpk(X):
set_of_jsons = {json.dumps(d, sort_keys=True) for d in X}
return [json.loads(t) for t in set_of_jsons]
def thefourtheye_1(data):
return OrderedDict((frozenset(item.items()),item) for item in data).values()
def thefourtheye_2(data):
return {frozenset(item.items()):item for item in data}.values()
def iu_1(l):
return list(unique_everseen(l))
def iu_2(l):
return list(unique_everseen(l, key=lambda inner_dict: frozenset(inner_dict.items())))
funcs = (jcollado_1, Emmanuel, stpk, Scorpil, thefourtheye_1, thefourtheye_2, iu_1, jcollado_2, iu_2)
arguments = {2**i: [{'a': j} for j in range(2**i)] for i in range(2, 12)}
b = benchmark(funcs, arguments, 'list size')
%matplotlib widget
import matplotlib as mpl
import matplotlib.pyplot as plt
plt.style.use('ggplot')
mpl.rcParams['figure.figsize'] = '8, 6'
b.plot(relative_to=thefourtheye_2)
For completeness here is the timing for a list containing only duplicates:
# this is the only change for the benchmark
arguments = {2**i: [{'a': 1} for j in range(2**i)] for i in range(2, 12)}
The timings don't change significantly except for unique_everseen without key function, which in this case is the fastest solution. However that's just the best case (so not representative) for that function with unhashable values because it's runtime depends on the amount of unique values in the list: O(n*m) which in this case is just 1 and thus it runs in O(n).
Disclaimer: I'm the author of iteration_utilities.
Other answers would not work if you're operating on nested dictionaries such as deserialized JSON objects. For this case you could use:
import json
set_of_jsons = {json.dumps(d, sort_keys=True) for d in X}
X = [json.loads(t) for t in set_of_jsons]
If you are using Pandas in your workflow, one option is to feed a list of dictionaries directly to the pd.DataFrame constructor. Then use drop_duplicates and to_dict methods for the required result.
import pandas as pd
d = [{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}, {'a': 123, 'b': 1234}]
d_unique = pd.DataFrame(d).drop_duplicates().to_dict('records')
print(d_unique)
[{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}]
Sometimes old-style loops are still useful. This code is little longer than jcollado's, but very easy to read:
a = [{'a': 123}, {'b': 123}, {'a': 123}]
b = []
for i in range(len(a)):
if a[i] not in a[i+1:]:
b.append(a[i])
If you want to preserve the Order, then you can do
from collections import OrderedDict
print OrderedDict((frozenset(item.items()),item) for item in data).values()
# [{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}]
If the order doesn't matter, then you can do
print {frozenset(item.items()):item for item in data}.values()
# [{'a': 3222, 'b': 1234}, {'a': 123, 'b': 1234}]
Not a universal answer, but if your list happens to be sorted by some key, like this:
l=[{'a': {'b': 31}, 't': 1},
{'a': {'b': 31}, 't': 1},
{'a': {'b': 145}, 't': 2},
{'a': {'b': 25231}, 't': 2},
{'a': {'b': 25231}, 't': 2},
{'a': {'b': 25231}, 't': 2},
{'a': {'b': 112}, 't': 3}]
then the solution is as simple as:
import itertools
result = [a[0] for a in itertools.groupby(l)]
Result:
[{'a': {'b': 31}, 't': 1},
{'a': {'b': 145}, 't': 2},
{'a': {'b': 25231}, 't': 2},
{'a': {'b': 112}, 't': 3}]
Works with nested dictionaries and (obviously) preserves order.
You can use a set, but you need to turn the dicts into a hashable type.
seq = [{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}, {'a': 123, 'b': 1234}]
unique = set()
for d in seq:
t = tuple(d.iteritems())
unique.add(t)
Unique now equals
set([(('a', 3222), ('b', 1234)), (('a', 123), ('b', 1234))])
To get dicts back:
[dict(x) for x in unique]
Easiest way, convert each item in the list to string, since dictionary is not hashable. Then you can use set to remove the duplicates.
list_org = [{'a': 123}, {'b': 123}, {'a': 123}]
list_org_updated = [ str(item) for item in list_org]
print(list_org_updated)
["{'a': 123}", "{'b': 123}", "{'a': 123}"]
unique_set = set(list_org_updated)
print(unique_set)
{"{'b': 123}", "{'a': 123}"}
You can use the set, but if you do want a list, then add the following:
import ast
unique_list = [ast.literal_eval(item) for item in unique_set]
print(unique_list)
[{'b': 123}, {'a': 123}]
input_list =[{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}, {'a': 123, 'b': 1234}]
#output required => [{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}]
#code
list = [{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}, {'a': 123, 'b': 1234}]
empty_list = []
for item in list:
if item not in empty_list:
empty_list.append(item)
print("previous list =",list)
print("Updated list =",empty_list)
#output
previous list = [{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}, {'a': 123, 'b': 1234}]
Updated list = [{'a': 123, 'b': 1234}, {'a': 3222, 'b': 1234}]
Here's a quick one-line solution with a doubly-nested list comprehension (based on #Emmanuel 's solution).
This uses a single key (for example, a) in each dict as the primary key, rather than checking if the entire dict matches
[i for n, i in enumerate(list_of_dicts) if i.get(primary_key) not in [y.get(primary_key) for y in list_of_dicts[n + 1:]]]
It's not what OP asked for, but it's what brought me to this thread, so I figured I'd post the solution I ended up with
Not so short but easy to read:
list_of_data = [{'a': 123}, {'b': 123}, {'a': 123}]
list_of_data_uniq = []
for data in list_of_data:
if data not in list_of_data_uniq:
list_of_data_uniq.append(data)
Now, list list_of_data_uniq will have unique dicts.
Remove duplications by custom key:
def remove_duplications(arr, key):
return list({key(x): x for x in arr}.values())
A lot of good examples searching for duplicate values and keys, below is the way we filter out whole dictionary duplicate data in lists. Use dupKeys = [] if your source data is comprised of EXACT formatted dictionaries and looking for duplicates. Otherwise set dupKeys = to the key names of the data you want to not have duplicate entries of, can be 1 to n keys. It aint elegant, but works and is very flexible
import binascii
collected_sensor_data = [{"sensor_id":"nw-180","data":"XXXXXXX"},
{"sensor_id":"nw-163","data":"ZYZYZYY"},
{"sensor_id":"nw-180","data":"XXXXXXX"},
{"sensor_id":"nw-97", "data":"QQQQQZZ"}]
dupKeys = ["sensor_id", "data"]
def RemoveDuplicateDictData(collected_sensor_data, dupKeys):
checkCRCs = []
final_sensor_data = []
if dupKeys == []:
for sensor_read in collected_sensor_data:
ck1 = binascii.crc32(str(sensor_read).encode('utf8'))
if not ck1 in checkCRCs:
final_sensor_data.append(sensor_read)
checkCRCs.append(ck1)
else:
for sensor_read in collected_sensor_data:
tmp = ""
for k in dupKeys:
tmp += str(sensor_read[k])
ck1 = binascii.crc32(tmp.encode('utf8'))
if not ck1 in checkCRCs:
final_sensor_data.append(sensor_read)
checkCRCs.append(ck1)
return final_sensor_data
final_sensor_data = [{"sensor_id":"nw-180","data":"XXXXXXX"},
{"sensor_id":"nw-163","data":"ZYZYZYY"},
{"sensor_id":"nw-97", "data":"QQQQQZZ"}]
If you don't care about scale and crazy performance, simple func:
# Filters dicts with the same value in unique_key
# in: [{'k1': 1}, {'k1': 33}, {'k1': 1}]
# out: [{'k1': 1}, {'k1': 33}]
def remove_dup_dicts(list_of_dicts: list, unique_key) -> list:
unique_values = list()
unique_dicts = list()
for obj in list_of_dicts:
val = obj.get(unique_key)
if val not in unique_values:
unique_values.append(val)
unique_dicts.append(obj)
return unique_dicts
i have this scenario
x=['a','b','c'] #Header
y=[(1,2,3),(4,5,6)] #data
I need to create below structure
[{'a':1, 'b':2, 'c':3}, {'a':4, 'b':5, 'c':6}]
Any better way of doing this(like a python expert)
rows=[]
for row in range(0,len(y)):
rec={}
for col in range(0, len(x)):
rec[x[col]]=y[row][col]
rows.append(rec)
print(rows)
above code will give the desired result, but i am looking for a one liner solution some thing like below
rows=list( ( {x[col]:y[row][col]} for row in range(0,len(y)) for col in range(0, len(x)) ) )
output:
[{'a': 1}, {'b': 2}, {'c': 3}, {'a': 4}, {'b': 5}, {'c': 6}]
but this gives list as individual dict's rather than a combined dict. Any ideas???
You could write a generator that iterates over data. Then for each item in data use zip to generate iterable of (header, value) tuples that you pass to dict:
>>> x = ['a','b','c']
>>> y = [(1,2,3),(4,5,6)]
>>> gen = (dict(zip(x, z)) for z in y)
>>> list(gen)
[{'a': 1, 'c': 3, 'b': 2}, {'a': 4, 'c': 6, 'b': 5}]
Update The example above uses generator expression instead of list since the code writing CSV would only need one row at a time. Generating the full list would require much more memory with no benefit.
Given the two dictionaries below, I'm trying to create a list of new dictionaries combining the items from one (com), which will repeat for each member, with only the values of the second (e), entered one at a time, under key 'n'. E.g. first list member would be:
{'n': 330, 'b': 2, 'a': 1}
If I use update() within the list comprehension to add the key-pair values from the first dictionary to the result I get a list with two None members.
I've tried different ways to write this, e.g. using map() and on both python 2 and 3; so I ask the experts.
>>> com
{'b': 2, 'a': 1}
>>> e
{'p': 330, 'r': 220}
>>> [n for rt in e.values() for n in [{'n':rt}]]
[{'n': 330}, {'n': 220}]
>>> [n.update(com) for rt in e.values() for n in [{'n':rt}]]
[None, None]
To clarify the problem statement:
Given a fixed map com and a map containing a series of elements (I would find the name 'nvals' more mnemonic), return a list of maps, where each element of the list has one of the values from nvals as the value of key 'n', and all the elements of com.
So the following code returns the required result:
>>> com = {'b': 2, 'a': 1}
>>> nvals = {'p': 330, 'r': 220}
>>> l = []
>>> for n in nvals.values():
... d = dict(com) # Make sure to clone the dictionary.
... d["n"] = n
... l.append(d)
...
>>> l
[{'n': 330, 'b': 2, 'a': 1}, {'n': 220, 'b': 2, 'a': 1}]
>>>
The trouble is that is not a list comprehension
>>> [dict(list(com.items()) + [('n', n)]) for n in nvals.values()]
[{'n': 330, 'b': 2, 'a': 1}, {'n': 220, 'b': 2, 'a': 1}]
>>>
Seems to meet the requirements.
I have to say, I think I find the loop easier to read.
Try this:
com = {'b': 2, 'a': 1}
e = {'p': 330, 'r': 220}
res = [{i:j for i,j in com.items()+[("n",v)]} for v in e.values()]
Or this:
res = map(lambda x: dict([("n",x)]+com.items()), e.values())
Or maybe:
res = [dict(zip(["n"]+com.keys(),[v]+com.values())) for v in e.values()]
Don't you just love python?
I have a dictionary inside a defaultdict. I noticed that the dictionary is being shared across keys and therefore it takes the values of the last write. How can I isolate those dictionaries?
>>> from collections import defaultdict
>>> defaults = [('a', 1), ('b', {})]
>>> dd = defaultdict(lambda: dict(defaults))
>>> dd[0]
{'a': 1, 'b': {}}
>>> dd[1]
{'a': 1, 'b': {}}
>>> dd[0]['b']['k'] = 'v'
>>> dd
defaultdict(<function <lambda> at 0x7f4b3688b398>, {0: {'a': 1, 'b': {'k': 'v'}}, 1:{'a': 1, 'b': {'k': 'v'}}})
>>> dd[1]['b']['k'] = 'v2'
>>> dd
defaultdict(<function <lambda> at 0x7f4b3688b398>, {0: {'a': 1, 'b': {'k': 'v2'}}, 1: {'a': 1, 'b': {'k': 'v2'}}})
Notice that v was set to v2 for both dictionaries. Why is that? and how to change this behavior without much performance overhead?
When you do dict(defaults) you're not copying the inner dictionary, just making another reference to it. So when you change that dictionary, you're going to see the change everywhere it's referenced.
You need deepcopy here to avoid the problem:
import copy
from collections import defaultdict
defaults = {'a': 1, 'b': {}}
dd = defaultdict(lambda: copy.deepcopy(defaults))
Or you need to not use the same inner mutable objects in successive calls by not repeatedly referencing defaults:
dd = defaultdict(lambda: {'a': 1, 'b': {}})
Your values all contain references to the same object from defaults: you rebuild the outer dict, but not the inner one. Just make a function that creates a new, separate object:
def builder():
return {'a': 1, 'b': {}}
dd = defaultdict(builder)
For example I have two dicts:
Dict A: {'a': 1, 'b': 2, 'c': 3}
Dict B: {'b': 3, 'c': 4, 'd': 5}
I need a pythonic way of 'combining' two dicts such that the result is:
{'a': 1, 'b': 5, 'c': 7, 'd': 5}
That is to say: if a key appears in both dicts, add their values, if it appears in only one dict, keep its value.
Use collections.Counter:
>>> from collections import Counter
>>> A = Counter({'a':1, 'b':2, 'c':3})
>>> B = Counter({'b':3, 'c':4, 'd':5})
>>> A + B
Counter({'c': 7, 'b': 5, 'd': 5, 'a': 1})
Counters are basically a subclass of dict, so you can still do everything else with them you'd normally do with that type, such as iterate over their keys and values.
A more generic solution, which works for non-numeric values as well:
a = {'a': 'foo', 'b':'bar', 'c': 'baz'}
b = {'a': 'spam', 'c':'ham', 'x': 'blah'}
r = dict(a.items() + b.items() +
[(k, a[k] + b[k]) for k in set(b) & set(a)])
or even more generic:
def combine_dicts(a, b, op=operator.add):
return dict(a.items() + b.items() +
[(k, op(a[k], b[k])) for k in set(b) & set(a)])
For example:
>>> a = {'a': 2, 'b':3, 'c':4}
>>> b = {'a': 5, 'c':6, 'x':7}
>>> import operator
>>> print combine_dicts(a, b, operator.mul)
{'a': 10, 'x': 7, 'c': 24, 'b': 3}
>>> A = {'a':1, 'b':2, 'c':3}
>>> B = {'b':3, 'c':4, 'd':5}
>>> c = {x: A.get(x, 0) + B.get(x, 0) for x in set(A).union(B)}
>>> print(c)
{'a': 1, 'c': 7, 'b': 5, 'd': 5}
Intro:
There are the (probably) best solutions. But you have to know it and remember it and sometimes you have to hope that your Python version isn't too old or whatever the issue could be.
Then there are the most 'hacky' solutions. They are great and short but sometimes are hard to understand, to read and to remember.
There is, though, an alternative which is to to try to reinvent the wheel.
- Why reinventing the wheel?
- Generally because it's a really good way to learn (and sometimes just because the already-existing tool doesn't do exactly what you would like and/or the way you would like it) and the easiest way if you don't know or don't remember the perfect tool for your problem.
So, I propose to reinvent the wheel of the Counter class from the collections module (partially at least):
class MyDict(dict):
def __add__(self, oth):
r = self.copy()
try:
for key, val in oth.items():
if key in r:
r[key] += val # You can custom it here
else:
r[key] = val
except AttributeError: # In case oth isn't a dict
return NotImplemented # The convention when a case isn't handled
return r
a = MyDict({'a':1, 'b':2, 'c':3})
b = MyDict({'b':3, 'c':4, 'd':5})
print(a+b) # Output {'a':1, 'b': 5, 'c': 7, 'd': 5}
There would probably others way to implement that and there are already tools to do that but it's always nice to visualize how things would basically works.
Definitely summing the Counter()s is the most pythonic way to go in such cases but only if it results in a positive value. Here is an example and as you can see there is no c in result after negating the c's value in B dictionary.
In [1]: from collections import Counter
In [2]: A = Counter({'a':1, 'b':2, 'c':3})
In [3]: B = Counter({'b':3, 'c':-4, 'd':5})
In [4]: A + B
Out[4]: Counter({'d': 5, 'b': 5, 'a': 1})
That's because Counters were primarily designed to work with positive integers to represent running counts (negative count is meaningless). But to help with those use cases,python documents the minimum range and type restrictions as follows:
The Counter class itself is a dictionary
subclass with no restrictions on its keys and values. The values are
intended to be numbers representing counts, but you could store
anything in the value field.
The most_common() method requires only
that the values be orderable.
For in-place operations such as c[key]
+= 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are
supported. The same is also true for update() and subtract() which
allow negative and zero values for both inputs and outputs.
The multiset methods are designed only for use cases with positive values.
The inputs may be negative or zero, but only outputs with positive
values are created. There are no type restrictions, but the value type
needs to support addition, subtraction, and comparison.
The elements() method requires integer counts. It ignores zero and negative counts.
So for getting around that problem after summing your Counter you can use Counter.update in order to get the desire output. It works like dict.update() but adds counts instead of replacing them.
In [24]: A.update(B)
In [25]: A
Out[25]: Counter({'d': 5, 'b': 5, 'a': 1, 'c': -1})
myDict = {}
for k in itertools.chain(A.keys(), B.keys()):
myDict[k] = A.get(k, 0)+B.get(k, 0)
The one with no extra imports!
Their is a pythonic standard called EAFP(Easier to Ask for Forgiveness than Permission). Below code is based on that python standard.
# The A and B dictionaries
A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
# The final dictionary. Will contain the final outputs.
newdict = {}
# Make sure every key of A and B get into the final dictionary 'newdict'.
newdict.update(A)
newdict.update(B)
# Iterate through each key of A.
for i in A.keys():
# If same key exist on B, its values from A and B will add together and
# get included in the final dictionary 'newdict'.
try:
addition = A[i] + B[i]
newdict[i] = addition
# If current key does not exist in dictionary B, it will give a KeyError,
# catch it and continue looping.
except KeyError:
continue
EDIT: thanks to jerzyk for his improvement suggestions.
import itertools
import collections
dictA = {'a':1, 'b':2, 'c':3}
dictB = {'b':3, 'c':4, 'd':5}
new_dict = collections.defaultdict(int)
# use dict.items() instead of dict.iteritems() for Python3
for k, v in itertools.chain(dictA.iteritems(), dictB.iteritems()):
new_dict[k] += v
print dict(new_dict)
# OUTPUT
{'a': 1, 'c': 7, 'b': 5, 'd': 5}
OR
Alternative you can use Counter as #Martijn has mentioned above.
For a more generic and extensible way check mergedict. It uses singledispatch and can merge values based on its types.
Example:
from mergedict import MergeDict
class SumDict(MergeDict):
#MergeDict.dispatch(int)
def merge_int(this, other):
return this + other
d2 = SumDict({'a': 1, 'b': 'one'})
d2.merge({'a':2, 'b': 'two'})
assert d2 == {'a': 3, 'b': 'two'}
From python 3.5: merging and summing
Thanks to #tokeinizer_fsj that told me in a comment that I didn't get completely the meaning of the question (I thought that add meant just adding keys that eventually where different in the two dictinaries and, instead, i meant that the common key values should be summed). So I added that loop before the merging, so that the second dictionary contains the sum of the common keys. The last dictionary will be the one whose values will last in the new dictionary that is the result of the merging of the two, so I thing the problem is solved. The solution is valid from python 3.5 and following versions.
a = {
"a": 1,
"b": 2,
"c": 3
}
b = {
"a": 2,
"b": 3,
"d": 5
}
# Python 3.5
for key in b:
if key in a:
b[key] = b[key] + a[key]
c = {**a, **b}
print(c)
>>> c
{'a': 3, 'b': 5, 'c': 3, 'd': 5}
Reusable code
a = {'a': 1, 'b': 2, 'c': 3}
b = {'b': 3, 'c': 4, 'd': 5}
def mergsum(a, b):
for k in b:
if k in a:
b[k] = b[k] + a[k]
c = {**a, **b}
return c
print(mergsum(a, b))
Additionally, please note a.update( b ) is 2x faster than a + b
from collections import Counter
a = Counter({'menu': 20, 'good': 15, 'happy': 10, 'bar': 5})
b = Counter({'menu': 1, 'good': 1, 'bar': 3})
%timeit a + b;
## 100000 loops, best of 3: 8.62 µs per loop
## The slowest run took 4.04 times longer than the fastest. This could mean that an intermediate result is being cached.
%timeit a.update(b)
## 100000 loops, best of 3: 4.51 µs per loop
One line solution is to use dictionary comprehension.
C = { k: A.get(k,0) + B.get(k,0) for k in list(B.keys()) + list(A.keys()) }
def merge_with(f, xs, ys):
xs = a_copy_of(xs) # dict(xs), maybe generalizable?
for (y, v) in ys.iteritems():
xs[y] = v if y not in xs else f(xs[x], v)
merge_with((lambda x, y: x + y), A, B)
You could easily generalize this:
def merge_dicts(f, *dicts):
result = {}
for d in dicts:
for (k, v) in d.iteritems():
result[k] = v if k not in result else f(result[k], v)
Then it can take any number of dicts.
This is a simple solution for merging two dictionaries where += can be applied to the values, it has to iterate over a dictionary only once
a = {'a':1, 'b':2, 'c':3}
dicts = [{'b':3, 'c':4, 'd':5},
{'c':9, 'a':9, 'd':9}]
def merge_dicts(merged,mergedfrom):
for k,v in mergedfrom.items():
if k in merged:
merged[k] += v
else:
merged[k] = v
return merged
for dct in dicts:
a = merge_dicts(a,dct)
print (a)
#{'c': 16, 'b': 5, 'd': 14, 'a': 10}
Here's yet another option using dictionary comprehensions combined with the behavior of dict():
dict3 = dict(dict1, **{ k: v + dict1.get(k, 0) for k, v in dict2.items() })
# {'a': 4, 'b': 2, 'c': 7, 'g': 1}
From https://docs.python.org/3/library/stdtypes.html#dict:
https://docs.python.org/3/library/stdtypes.html#dict
and also
If keyword arguments are given, the keyword arguments and their values are added to the dictionary created from the positional argument.
The dict comprehension
**{ k: v + dict1.get(v, 0), v in dict2.items() }
handles adding dict1[1] to v. We don't need an explicit if here because the default value for our dict1.get can be set to 0 instead.
This solution is easy to use, it is used as a normal dictionary, but you can use the sum function.
class SumDict(dict):
def __add__(self, y):
return {x: self.get(x, 0) + y.get(x, 0) for x in set(self).union(y)}
A = SumDict({'a': 1, 'c': 2})
B = SumDict({'b': 3, 'c': 4}) # Also works: B = {'b': 3, 'c': 4}
print(A + B) # OUTPUT {'a': 1, 'b': 3, 'c': 6}
The above solutions are great for the scenario where you have a small number of Counters. If you have a big list of them though, something like this is much nicer:
from collections import Counter
A = Counter({'a':1, 'b':2, 'c':3})
B = Counter({'b':3, 'c':4, 'd':5})
C = Counter({'a': 5, 'e':3})
list_of_counts = [A, B, C]
total = sum(list_of_counts, Counter())
print(total)
# Counter({'c': 7, 'a': 6, 'b': 5, 'd': 5, 'e': 3})
The above solution is essentially summing the Counters by:
total = Counter()
for count in list_of_counts:
total += count
print(total)
# Counter({'c': 7, 'a': 6, 'b': 5, 'd': 5, 'e': 3})
This does the same thing but I think it always helps to see what it is effectively doing underneath.
What about:
def dict_merge_and_sum( d1, d2 ):
ret = d1
ret.update({ k:v + d2[k] for k,v in d1.items() if k in d2 })
ret.update({ k:v for k,v in d2.items() if k not in d1 })
return ret
A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
print( dict_merge_and_sum( A, B ) )
Output:
{'d': 5, 'a': 1, 'c': 7, 'b': 5}
More conventional way to combine two dict. Using modules and tools are good but understanding the logic behind it will help in case you don't remember the tools.
Program to combine two dictionary adding values for common keys.
def combine_dict(d1,d2):
for key,value in d1.items():
if key in d2:
d2[key] += value
else:
d2[key] = value
return d2
combine_dict({'a':1, 'b':2, 'c':3},{'b':3, 'c':4, 'd':5})
output == {'b': 5, 'c': 7, 'd': 5, 'a': 1}
Here's a very general solution. You can deal with any number of dict + keys that are only in some dict + easily use any aggregation function you want:
def aggregate_dicts(dicts, operation=sum):
"""Aggregate a sequence of dictionaries using `operation`."""
all_keys = set().union(*[el.keys() for el in dicts])
return {k: operation([dic.get(k, None) for dic in dicts]) for k in all_keys}
example:
dicts_same_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3}]
aggregate_dicts(dicts_same_keys, operation=sum)
#{'x': 3, 'y': 6}
example non-identical keys and generic aggregation:
dicts_diff_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3, 'c': 4}]
def mean_no_none(l):
l_no_none = [el for el in l if el is not None]
return sum(l_no_none) / len(l_no_none)
aggregate_dicts(dicts_diff_keys, operation=mean_no_none)
# {'x': 1.0, 'c': 4.0, 'y': 2.0}
dict1 = {'a':1, 'b':2, 'c':3}
dict2 = {'a':3, 'g':1, 'c':4}
dict3 = {} # will store new values
for x in dict1:
if x in dict2: #sum values with same key
dict3[x] = dict1[x] +dict2[x]
else: #add the values from x to dict1
dict3[x] = dict1[x]
#search for new values not in a
for x in dict2:
if x not in dict1:
dict3[x] = dict2[x]
print(dict3) # {'a': 4, 'b': 2, 'c': 7, 'g': 1}
Merging three dicts a,b,c in a single line without any other modules or libs
If we have the three dicts
a = {"a":9}
b = {"b":7}
c = {'b': 2, 'd': 90}
Merge all with a single line and return a dict object using
c = dict(a.items() + b.items() + c.items())
Returning
{'a': 9, 'b': 2, 'd': 90}