Find common (key: value) pairs given N dictionaries in Python

I would like to find an easy and/or fast way to find all common (key: value) pairs given N dictionaries in Python (3.x would be best).
PROBLEM
Given a set of 3 dicts (it could be any number of dicts; three is just for the example):
n1 = {'a': 1, 'b': 2, 'c': 3}
n2 = {'a': 1, 'b': 4, 'c': 3, 'd': 4}
n3 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
The result for the common (key: value) pairs of n1, n2 and n3 should be:
({'a': 1, 'c': 3})
And for n2 and n3 it should be
({'a': 1, 'c': 3, 'd': 4})
I first thought about a brute-force algorithm that checks every (key: value) pair of every dict.
Here is an implementation using a recursive algorithm:
SOLUTION A
list_dict = [n1, n2, n3]
def finding_uniquness(ls):
    def recursion(ls, result):
        if not ls:
            return result
        result = {k: v for k, v in result.items() for k1, v1 in ls[0].items() if k == k1 and v == v1}
        return recursion(ls[1:], result)
    return recursion(ls[1:], ls[0])
finding_uniquness(list_dict)
# {'c': 3, 'a': 1}
But it is not easy to read, and the complexity is high.
(I'm not sure how to calculate the complexity, but since we compare every element of every dict, it should be O(N²)?)
Then I thought about sets, because they can naturally compare all the elements.
SOLUTION B
import functools
list_dict = [n1, n2, n3]
set_list = [set(n.items()) for n in list_dict]
functools.reduce(lambda x, y: x & y, set_list)
# {('a', 1), ('c', 3)}
It is much better than the previous solution. Unfortunately, when one of the keys has a list as its value, it throws an error:
>>> n = {'a': [], 'b': 2, 'c': 3}
>>> set(n.items())
TypeError: unhashable type: 'list'
My question is then twofold:
is there any better algorithm than SOLUTION A?
or is there a way to avoid the TypeError with SOLUTION B?
Of course, any other remarks are welcome.

Simpler and more efficient way:
>>> {k: v
     for k, v in list_dict[0].items()
     if all(k in d and d[k] == v
            for d in list_dict[1:])}
{'c': 3, 'a': 1}
Using an extra variable for list_dict[1:] might be beneficial; otherwise the short-circuiting of all somewhat goes to waste (a sketch of that variant is shown below). Or, if you don't need the list afterwards, you could just pop the "master" dictionary:
>>> {k: v
     for k, v in list_dict.pop().items()
     if all(k in d and d[k] == v
            for d in list_dict)}
{'c': 3, 'a': 1}
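For reference, the extra-variable variant mentioned above might look like this (a sketch, starting again from the full list_dict; rest is just an illustrative name):
>>> rest = list_dict[1:]
>>> {k: v
     for k, v in list_dict[0].items()
     if all(k in d and d[k] == v
            for d in rest)}
{'c': 3, 'a': 1}
This way all only iterates over a list that is built once, instead of re-slicing list_dict for every key of the first dict.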
Or using get with a default that cannot be in the dictionary, as suggested by Jean-François Fabre:
>>> marker = object()
>>> {k: v
     for k, v in list_dict.pop().items()
     if all(d.get(k, marker) == v
            for d in list_dict)}
{'c': 3, 'a': 1}

If unhashable values are a problem you can always compute the intersection of the keys up-front by using .keys() and then compare only the values associated with the keys that all dictionaries have in common:
import operator as op
from functools import reduce
common_keys = reduce(op.and_, (d.keys() for d in my_dicts))
common_items = {}
for key in common_keys:
    value = my_dicts[0][key]
    if all(d[key] == value for d in my_dicts):
        common_items[key] = value
This should be quite a bit faster than Solution A and slower than Solution B, but it works on all inputs.

A batteries-included version.
To handle unhashable types, we use pickling; replace it with dill or json or any other predictable serialization to taste.
import collections
import itertools
import pickle
def findCommonPairs(dicts):
    all_pairs = itertools.chain(*[d.items() for d in dicts])
    cnt = collections.Counter(map(pickle.dumps, all_pairs))
    return [pickle.loads(pickled_pair)
            for pickled_pair, count in cnt.items()
            if count == len(dicts)]
>>> findCommonPairs([n1, n2, n3])
[('a', 1), ('c', 3)]
>>> findCommonPairs([{'a': [1,2], 'b': [2,3]}, {'a': [1,2]}])
[('a', [1, 2])]
Note that serialization only goes so far. To properly compare dicts of dicts, for instance, the nested dicts must be turned into (key, value) pairs and sorted before serialization. Any structures that reference each other may have issues (or not). Replace pickling with a custom predictable serializer if you care about these issues.
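As a rough sketch of such a predictable serializer (not part of the original answer, and assuming the keys within any one dict are mutually comparable), one could sort nested dict items by key before serializing:
import json

def canonical(obj):
    # Recursively replace dicts with their items sorted by key, so two
    # equal dicts always produce the same structure.
    if isinstance(obj, dict):
        return [(k, canonical(v)) for k, v in sorted(obj.items())]
    if isinstance(obj, (list, tuple)):
        return [canonical(x) for x in obj]
    return obj

def stable_dumps(pair):
    # default=repr is a last resort for values json does not know about.
    return json.dumps(canonical(pair), default=repr)
Something like stable_dumps could then stand in for pickle.dumps in findCommonPairs above, keeping a side table from each serialized string back to its original pair instead of relying on loads.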

Related

Merging a list of dicts into one dict [duplicate]

I have multiple dicts (or sequences of key-value pairs) like this:
d1 = {key1: x1, key2: y1}
d2 = {key1: x2, key2: y2}
How can I efficiently get a result like this, as a new dict?
d = {key1: (x1, x2), key2: (y1, y2)}
See also: How can one make a dictionary with duplicate keys in Python?.
Here's a general solution that will handle an arbitrary number of dictionaries, including cases when keys are in only some of the dictionaries:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(list)
for d in (d1, d2): # you can list as many input dicts as you want here
    for key, value in d.items():
        dd[key].append(value)
print(dd) # result: defaultdict(<type 'list'>, {1: [2, 6], 3: [4, 7]})
assuming all keys are always present in all dicts:
ds = [d1, d2]
d = {}
for k in d1.iterkeys():
    d[k] = tuple(d[k] for d in ds)
Note: in Python 3.x, use the code below:
ds = [d1, d2]
d = {}
for k in d1.keys():
    d[k] = tuple(d[k] for d in ds)
and if the dicts contain numpy arrays:
import numpy as np

ds = [d1, d2]
d = {}
for k in d1.keys():
    d[k] = np.concatenate(list(d[k] for d in ds))
This function merges two dicts even if the keys in the two dictionaries are different:
def combine_dict(d1, d2):
    return {
        k: tuple(d[k] for d in (d1, d2) if k in d)
        for k in set(d1.keys()) | set(d2.keys())
    }
Example:
d1 = {
    'a': 1,
    'b': 2,
}
d2 = {
    'b': 'boat',
    'c': 'car',
}
combine_dict(d1, d2)
# Returns: {
# 'a': (1,),
# 'b': (2, 'boat'),
# 'c': ('car',)
# }
dict1 = {'m': 2, 'n': 4}
dict2 = {'n': 3, 'm': 1}
Making sure that the keys are in the same order:
dict2_sorted = {i:dict2[i] for i in dict1.keys()}
keys = dict1.keys()
values = zip(dict1.values(), dict2_sorted.values())
dictionary = dict(zip(keys, values))
gives:
{'m': (2, 1), 'n': (4, 3)}
If you only have d1 and d2,
from collections import defaultdict
d = defaultdict(list)
for a, b in list(d1.items()) + list(d2.items()):
    d[a].append(b)
Here is one approach you can use which would work even if both dictionaries don't have the same keys:
d1 = {'a':'test','b':'btest','d':'dreg'}
d2 = {'a':'cool','b':'main','c':'clear'}
d = {}
for key in set(list(d1.keys()) + list(d2.keys())):
    try:
        d.setdefault(key, []).append(d1[key])
    except KeyError:
        pass
    try:
        d.setdefault(key, []).append(d2[key])
    except KeyError:
        pass
print(d)
This would generate the output below:
{'a': ['test', 'cool'], 'c': ['clear'], 'b': ['btest', 'main'], 'd': ['dreg']}
Using precomputed keys
def merge(dicts):
    # First, figure out which keys are present.
    keys = set().union(*dicts)
    # Build a dict with those keys, using a list comprehension to
    # pull the values from the source dicts.
    return {
        k: [d[k] for d in dicts if k in d]
        for k in keys
    }
This is essentially Flux's answer, generalized for a list of input dicts.
The set().union trick works by making a set union of the keys in all the source dictionaries. The union method on a set (we start with an empty one) can accept an arbitrary number of arguments, and make a union of each input with the original set; and it can accept other iterables (it does not require other sets for the arguments) - it will iterate over them and look for all unique elements. Since iterating over a dict yields its keys, they can be passed directly to the union method.
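A minimal illustration of that trick (the dicts here are made up for the demo):
dicts = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
print(set().union(*dicts))  # {'a', 'b', 'c'} - the union of all the keys (print order may vary)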
In the case where the keys of all inputs are known to be the same, this can be simplified: the keys can be hard-coded (or inferred from one of the inputs), and the if check in the list comprehension becomes unnecessary:
def merge(dicts):
    return {
        k: [d[k] for d in dicts]
        for k in dicts[0].keys()
    }
This is analogous to blubb's answer, but using a dict comprehension rather than an explicit loop to build the final result.
We could also try something like Mahdi Ghelichi's answer:
def merge(dicts):
    values = zip(*(d.values() for d in dicts))
    return dict(zip(dicts[0].keys(), values))
This should work in Python 3.5 and below: dicts with identical keys will store them in the same order, during the same run of the program (if you run the program again, you may get a different ordering, but still a consistent one).
In 3.6 and above, dictionaries preserve their insertion order (though they are only guaranteed to do so by the specification in 3.7 and above). Thus, input dicts could have the same keys in a different order, which would cause the first zip to combine the wrong values.
We can work around this by "sorting" the input dicts (re-creating them with keys in a consistent order), like [{k: d[k] for k in dicts[0].keys()} for d in dicts]. (In older versions, this would be extra work with no net effect.) However, this adds complexity, and this double-zip approach really doesn't offer any advantages over the previous one using a dict comprehension. A sketch with that normalization applied follows.
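A sketch of the double-zip variant with that normalization applied, assuming as above that all the dicts share the same keys:
def merge(dicts):
    # Re-create each dict with its keys in the order of the first one, so the
    # zip over values lines up with the zip over keys.
    normalized = [{k: d[k] for k in dicts[0]} for d in dicts]
    values = zip(*(d.values() for d in normalized))
    return dict(zip(dicts[0].keys(), values))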
Building the result explicitly, discovering keys on the fly
As in Eli Bendersky's answer, but as a function:
from collections import defaultdict
def merge(dicts):
    result = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            result[key].append(value)
    return result
This will produce a defaultdict, a subclass of dict defined by the standard library. The equivalent code using only built-in dicts might look like:
def merge(dicts):
    result = {}
    for d in dicts:
        for key, value in d.items():
            result.setdefault(key, []).append(value)
    return result
Using other container types besides lists
The precomputed-key approach will work fine to make tuples; replace the list comprehension [d[k] for d in dicts if k in d] with tuple(d[k] for d in dicts if k in d). This passes a generator expression to the tuple constructor. (There is no "tuple comprehension".)
Since tuples are immutable and don't have an append method, the explicit loop approach should be modified by replacing .append(value) with += (value,). However, this may perform poorly if there is a lot of key duplication, since it must create a new tuple each time. It might be better to produce lists first and then convert the final result with something like {k: tuple(v) for (k, v) in merged.items()}.
Similar modifications can be made to get sets (although there is a set comprehension, using {}), Numpy arrays etc. For example, we can generalize both approaches with a container type like so:
def merge(dicts, value_type=list):
    # First, figure out which keys are present.
    keys = set().union(*dicts)
    # Build a dict with those keys, passing a generator expression to
    # value_type to pull the values from the source dicts.
    return {
        k: value_type(d[k] for d in dicts if k in d)
        for k in keys
    }
and
from collections import defaultdict
def merge(dicts, value_type=list):
    # We stick with a hard-coded `list` for the first part,
    # because even other mutable types will offer different interfaces.
    result = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            result[key].append(value)
    # This is redundant for the default case, of course.
    return {k: value_type(v) for (k, v) in result.items()}
If the input values are already sequences
Rather than wrapping the values from the source in a new list, often people want to take inputs where the values are all already lists, and concatenate those lists in the output (or concatenate tuples or 1-dimensional Numpy arrays, combine sets, etc.).
This is still a trivial modification. For precomputed keys, use a nested list comprehension, ordered to get a flat result:
def merge(dicts):
    keys = set().union(*dicts)
    return {
        k: [v for d in dicts if k in d for v in d[k]]
        # Alternately:
        # k: [v for d in dicts for v in d.get(k, [])]
        for k in keys
    }
One might instead think of using sum to concatenate results from the original list comprehension. Don't do this - it will perform poorly when there are a lot of duplicate keys. The built-in sum isn't optimized for sequences (and will explicitly disallow "summing" strings) and will try to create a new list with each addition internally.
With the explicit loop approach, use .extend instead of .append:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].extend(value)
return result
The extend method of lists accepts any iterable, so this will work with inputs that have tuples for the values - of course, it still uses lists in the output; and of course, those can be converted back as shown previously.
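For example, with made-up inputs (dict(...) just converts the returned defaultdict for display):
print(dict(merge([{'a': (1, 2)}, {'a': (3,), 'b': (4,)}])))
# {'a': [1, 2, 3], 'b': [4]}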
If the inputs have one item each
A common version of this problem involves input dicts that each have a single key-value pair. Alternately, the input might be (key, value) tuples (or lists).
The above approaches will still work, of course. For tuple inputs, converting them to dicts first, like [{k: v} for (k, v) in tuples], allows for using them directly. Alternately, the explicit iteration approach can be modified to accept the tuples directly, like in Victoria Stuart's answer:
from collections import defaultdict
def merge(pairs):
    result = defaultdict(list)
    for key, value in pairs:
        result[key].extend(value)
    return result
(The code was simplified because there is no need to iterate over key-value pairs when there is only one of them and it has been provided directly.)
However, for these single-item cases it may work better to sort the values by key and then use itertools.groupby. In this case, it will be easier to work with the tuples. That looks like:
from itertools import groupby
def merge(tuples):
    # groupby only groups adjacent items, so sort by key first.
    tuples = sorted(tuples, key=lambda t: t[0])
    grouped = groupby(tuples, key=lambda t: t[0])
    return {k: [kv[1] for kv in ts] for k, ts in grouped}
Here, t is used as a name for one of the tuples from the input. The grouped iterator will provide pairs of a "key" value k (the first element that was common to the tuples being grouped) and an iterator ts over the tuples in that group. Then we extract the values from the key-value pairs kv in the ts, make a list from those, and use that as the value for the k key in the resulting dict.
To merge one-item dicts this way, of course, convert them to tuples first. One simple way to do this, for a list of one-item dicts, is [next(iter(d.items())) for d in dicts].
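Putting those pieces together for a list of one-item dicts (a small, self-contained illustration with made-up inputs):
from itertools import groupby

one_item_dicts = [{'a': 1}, {'b': 2}, {'a': 3}]
pairs = sorted((next(iter(d.items())) for d in one_item_dicts), key=lambda t: t[0])
merged = {k: [v for _, v in group] for k, group in groupby(pairs, key=lambda t: t[0])}
print(merged)  # {'a': [1, 3], 'b': [2]}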
Assuming there are two dictionaries with exactly the same keys, below is the most succinct way of doing it (Python 3 should be used for both solutions).
d1 = {'a': 1, 'b': 2, 'c':3}
d2 = {'a': 5, 'b': 6, 'c':7}
# get keys from one of the dictionary
ks = [k for k in d1.keys()]
print(ks)
['a', 'b', 'c']
# call values from each dictionary on available keys
d_merged = {k: (d1[k], d2[k]) for k in ks}
print(d_merged)
{'a': (1, 5), 'b': (2, 6), 'c': (3, 7)}
# to merge values as list
d_merged = {k: [d1[k], d2[k]] for k in ks}
print(d_merged)
{'a': [1, 5], 'b': [2, 6], 'c': [3, 7]}
If there are two dictionaries with some common keys, but a few different keys, a list of all the keys should be prepared.
d1 = {'a': 1, 'b': 2, 'c':3, 'd': 9}
d2 = {'a': 5, 'b': 6, 'c':7, 'e': 4}
# get keys from one of the dictionary
d1_ks = [k for k in d1.keys()]
d2_ks = [k for k in d2.keys()]
all_ks = set(d1_ks + d2_ks)
print(all_ks)
{'a', 'b', 'c', 'd', 'e'}
# call values from each dictionary on available keys
d_merged = {k: [d1.get(k), d2.get(k)] for k in all_ks}
print(d_merged)
{'d': [9, None], 'a': [1, 5], 'b': [2, 6], 'c': [3, 7], 'e': [None, 4]}
There is a great library, funcy, that does what you need in just one short line.
from funcy import join_with
from pprint import pprint
d1 = {"key1": "x1", "key2": "y1"}
d2 = {"key1": "x2", "key2": "y2"}
list_of_dicts = [d1, d2]
merged_dict = join_with(tuple, list_of_dicts)
pprint(merged_dict)
Output:
{'key1': ('x1', 'x2'), 'key2': ('y1', 'y2')}
More info here: funcy -> join_with.
def merge(d1, d2, merge):
    result = dict(d1)
    for k, v in d2.items():
        if k in result:
            result[k] = merge(result[k], v)
        else:
            result[k] = v
    return result

d1 = {'a': 1, 'b': 2}
d2 = {'a': 1, 'b': 3, 'c': 2}
print(merge(d1, d2, lambda x, y: (x, y)))
{'a': (1, 1), 'c': 2, 'b': (2, 3)}
If keys are nested:
d1 = { 'key1': { 'nkey1': 'x1' }, 'key2': { 'nkey2': 'y1' } }
d2 = { 'key1': { 'nkey1': 'x2' }, 'key2': { 'nkey2': 'y2' } }
ds = [d1, d2]
d = {}
for k in d1.keys():
    for k2 in d1[k].keys():
        d.setdefault(k, {})
        d[k].setdefault(k2, [])
        d[k][k2] = tuple(d[k][k2] for d in ds)
yields:
{'key1': {'nkey1': ('x1', 'x2')}, 'key2': {'nkey2': ('y1', 'y2')}}
Modifying this answer to create a dictionary of tuples (what the OP asked for), instead of a dictionary of lists:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(tuple)
for d in (d1, d2): # you can list as many input dicts as you want here
    for key, value in d.items():
        dd[key] += (value,)
print(dd)
The above prints the following:
defaultdict(<class 'tuple'>, {1: (2, 6), 3: (4, 7)})
d1 ={'B': 10, 'C ': 7, 'A': 20}
d2 ={'B': 101, 'Y ': 7, 'X': 8}
d3 ={'A': 201, 'Y ': 77, 'Z': 8}
def CreateNewDictionaryAssemblingAllValues1(d1, d2, d3):
    aa = {
        k: [d[k] for d in (d1, d2, d3) if k in d] for k in set(d1.keys() | d2.keys() | d3.keys())
    }
    print(aa)
    return aa

CreateNewDictionaryAssemblingAllValues1(d1, d2, d3)
"""
Output :
{'X': [8], 'C ': [7], 'Y ': [7, 77], 'Z': [8], 'B': [10, 101], 'A': [20, 201]}
"""
From blubb's answer:
You can also directly form the tuple using the values from each dict:
ds = [d1, d2]
d = {}
for k in d1.keys():
    d[k] = (d1[k], d2[k])
This might be useful if you had a specific ordering for your tuples
ds = [d1, d2, d3, d4]
d = {}
for k in d1.keys():
    d[k] = (d3[k], d1[k], d4[k], d2[k])  # if you wanted the tuple in order of d3, d1, d4, d2
The method below merges two dictionaries having the same keys.
def update_dict(dict1: dict, dict2: dict) -> dict:
    output_dict = {}
    for key in dict1.keys():
        output_dict.update({key: []})
        if type(dict1[key]) != str:
            for value in dict1[key]:
                output_dict[key].append(value)
        else:
            output_dict[key].append(dict1[key])
        if type(dict2[key]) != str:
            for value in dict2[key]:
                output_dict[key].append(value)
        else:
            output_dict[key].append(dict2[key])
    return output_dict
Input: d1 = {key1: x1, key2: y1} d2 = {key1: x2, key2: y2}
Output: {'key1': ['x1', 'x2'], 'key2': ['y1', 'y2']}
dicts = [dict1,dict2,dict3]
out = dict(zip(dicts[0].keys(),[[dic[list(dic.keys())[key]] for dic in dicts] for key in range(0,len(dicts[0]))]))
A compact possibility
d1={'a':1,'b':2}
d2={'c':3,'d':4}
context={**d1, **d2}
context
{'b': 2, 'c': 3, 'd': 4, 'a': 1}

Python: Merging multiple dictionaries with the same keys and different values [duplicate]


How can I find dict keys for matching values in two dicts?

I have two dictionaries mapping IDs to values. For simplicity, let's say these are the dictionaries:
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
As named, the dictionaries are not symmetrical.
I would like to get a dictionary of keys from d_source and d_target whose values match. The resulting dictionary would have d_source keys as its own keys, and the matching d_target keys as each key's value (in either a list, tuple or set format).
The expected return value for the above example should be the following:
{'a': ('1', 'A'),
'b': ('B',),
'c': ('C',),
'3': ('C',)}
There are two somewhat similar questions, but those solutions can't be easily applied to my question.
Some characteristics of the data:
Source would usually be smaller than target: roughly a few thousand sources (tops) and an order of magnitude more targets.
Duplicate values within the same dict (both d_source and d_target) are not very likely.
Matches are expected to be found for (a rough estimate) no more than 50% of d_source items.
All keys are integers.
What is the best (performance wise) solution to this problem?
Modeling the data into other datatypes for improved performance is totally OK, even when using third-party libraries (I'm thinking numpy).
All answers have O(n²) efficiency, which isn't very good, so I thought of answering myself.
I use 2·(source_len) + 2·(dict_count)·(dict_len) memory and I have O(2n) efficiency, which is the best you can get here, I believe.
Here you go:
from collections import defaultdict
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
def merge_dicts(source_dict, *rest):
    flipped_rest = defaultdict(list)
    for d in rest:
        while d:
            k, v = d.popitem()
            flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
new_dict = merge_dicts(d_source, d_target)
By the way, I'm using a tuple in order not to link the resulting lists together.
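A small illustration of the aliasing this avoids (not from the original answer): if several source keys share a value, returning the lists directly would hand them all the same mutable object.
shared = ['A']
d = {'a': shared, 'b': shared}
d['a'].append('X')
print(d['b'])  # ['A', 'X'] - both keys see the change; tuples side-step this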
As you've added specifications for the data, here's a closer matching solution:
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
def second_merge_dicts(source_dict, *rest):
    """Optimized for ~50% source match due to if statement addition.
    Also uses less memory.
    """
    unique_values = set(source_dict.values())
    flipped_rest = defaultdict(list)
    for d in rest:
        while d:
            k, v = d.popitem()
            if v in unique_values:
                flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
new_dict = second_merge_dicts(d_source, d_target)
from collections import defaultdict
from pprint import pprint
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
d_result = defaultdict(list)
{d_result[a].append(b) for a in d_source for b in d_target if d_source[a] == d_target[b]}
pprint(d_result)
Output:
{'3': ['C'],
'a': ['A', '1'],
'b': ['B'],
'c': ['C']}
Timing results:
from collections import defaultdict
from copy import deepcopy
from random import randint
from timeit import timeit
def Craig_match(source, target):
    result = defaultdict(list)
    {result[a].append(b) for a in source for b in target if source[a] == target[b]}
    return result

def Bharel_match(source_dict, *rest):
    flipped_rest = defaultdict(list)
    for d in rest:
        while d:
            k, v = d.popitem()
            flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}

def modified_Bharel_match(source_dict, *rest):
    """Optimized for ~50% source match due to if statement addition.
    Also uses less memory.
    """
    unique_values = set(source_dict.values())
    flipped_rest = defaultdict(list)
    for d in rest:
        while d:
            k, v = d.popitem()
            if v in unique_values:
                flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
# generate source, target such that:
# a) ~10% duplicate values in source and target
# b) 2000 unique source keys, 20000 unique target keys
# c) a little less than 50% matches source value to target value
# d) numeric keys and values
source = {}
for k in range(2000):
    source[k] = randint(0, 1800)

target = {}
for k in range(20000):
    if k < 1000:
        target[k] = randint(0, 2000)
    else:
        target[k] = randint(2000, 19000)
best_time = {}
approaches = ('Craig', 'Bharel', 'modified_Bharel')
for a in approaches:
    best_time[a] = None

for _ in range(3):
    for approach in approaches:
        test_source = deepcopy(source)
        test_target = deepcopy(target)
        statement = 'd=' + approach + '_match(test_source,test_target)'
        setup = 'from __main__ import test_source, test_target, ' + approach + '_match'
        t = timeit(stmt=statement, setup=setup, number=1)
        if not best_time[approach] or (t < best_time[approach]):
            best_time[approach] = t

for approach in approaches:
    print(approach, ':', '%0.5f' % best_time[approach])
Output:
Craig : 7.29259
Bharel : 0.01587
modified_Bharel : 0.00682
Here is another solution; there are a lot of ways to do this:
for key1 in d1:
    for key2 in d2:
        if d1[key1] == d2[key2]:
            ...  # the values match; do something with key1 and key2 here
Note that you can use any name for key1 and key2.
This maybe "cheating" in some regards, although if you are looking for the matching values of the keys regardless of the case sensitivity then you might be able to do:
aa = {'a': 1, 'b': 2, 'c': 3}
bb = {'A': 1, 'B': 2, 'd': 3}
bbl = {k.lower(): v for k, v in bb.items()}
result = {k: k.upper() for k, v in aa.items() & bbl.items()}
print(result)
Output:
{'a': 'A', 'b': 'B'}
The bbl declaration changes the bb keys into lowercase (it could be either aa, or bb).
* I only tested this on my phone, so just throwing this idea out there I suppose... Also, you've changed your question radically since I began composing my answer, so you get what you get.
It is up to you to determine the best solution. Here is a solution:
def dicts_to_tuples(*dicts):
    result = {}
    for d in dicts:
        for k, v in d.items():
            result.setdefault(v, []).append(k)
    return [tuple(v) for v in result.values() if len(v) > 1]

d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'A': 1, 'B': 2}
print(dicts_to_tuples(d1, d2))

Python - Linking three dictionaries into one if keys are same [duplicate]


Is there any pythonic way to combine two dicts (adding values for keys that appear in both)?

For example I have two dicts:
Dict A: {'a': 1, 'b': 2, 'c': 3}
Dict B: {'b': 3, 'c': 4, 'd': 5}
I need a pythonic way of 'combining' two dicts such that the result is:
{'a': 1, 'b': 5, 'c': 7, 'd': 5}
That is to say: if a key appears in both dicts, add their values, if it appears in only one dict, keep its value.
Use collections.Counter:
>>> from collections import Counter
>>> A = Counter({'a':1, 'b':2, 'c':3})
>>> B = Counter({'b':3, 'c':4, 'd':5})
>>> A + B
Counter({'c': 7, 'b': 5, 'd': 5, 'a': 1})
Counters are basically a subclass of dict, so you can still do everything else with them you'd normally do with that type, such as iterate over their keys and values.
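For instance (a small illustration, not from the original answer):
total = A + B
print(total['b'])          # 5 - plain dict-style lookup
for key, value in total.items():
    print(key, value)      # iterate over it like any other dict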
A more generic solution, which works for non-numeric values as well (Python 2 syntax; in Python 3, dict.items() views cannot be concatenated with +):
a = {'a': 'foo', 'b':'bar', 'c': 'baz'}
b = {'a': 'spam', 'c':'ham', 'x': 'blah'}
r = dict(a.items() + b.items() +
         [(k, a[k] + b[k]) for k in set(b) & set(a)])
or even more generic:
import operator

def combine_dicts(a, b, op=operator.add):
    return dict(a.items() + b.items() +
                [(k, op(a[k], b[k])) for k in set(b) & set(a)])
For example:
>>> a = {'a': 2, 'b':3, 'c':4}
>>> b = {'a': 5, 'c':6, 'x':7}
>>> import operator
>>> print combine_dicts(a, b, operator.mul)
{'a': 10, 'x': 7, 'c': 24, 'b': 3}
>>> A = {'a':1, 'b':2, 'c':3}
>>> B = {'b':3, 'c':4, 'd':5}
>>> c = {x: A.get(x, 0) + B.get(x, 0) for x in set(A).union(B)}
>>> print(c)
{'a': 1, 'c': 7, 'b': 5, 'd': 5}
Intro:
There are the (probably) best solutions. But you have to know them and remember them, and sometimes you have to hope that your Python version isn't too old or whatever the issue could be.
Then there are the most 'hacky' solutions. They are great and short, but sometimes are hard to understand, to read and to remember.
There is, though, an alternative, which is to try to reinvent the wheel.
- Why reinvent the wheel?
- Generally because it's a really good way to learn (and sometimes just because the already-existing tool doesn't do exactly what you would like and/or the way you would like it), and it's the easiest way if you don't know or don't remember the perfect tool for your problem.
So, I propose to reinvent the wheel of the Counter class from the collections module (partially at least):
class MyDict(dict):
    def __add__(self, oth):
        r = self.copy()
        try:
            for key, val in oth.items():
                if key in r:
                    r[key] += val  # You can customize the merge operation here
                else:
                    r[key] = val
        except AttributeError:  # In case oth isn't a dict
            return NotImplemented  # The convention when a case isn't handled
        return r

a = MyDict({'a': 1, 'b': 2, 'c': 3})
b = MyDict({'b': 3, 'c': 4, 'd': 5})
print(a + b)  # Output {'a': 1, 'b': 5, 'c': 7, 'd': 5}
There are probably other ways to implement this, and there are already tools that do it, but it's always nice to visualize how things basically work.
Definitely, summing Counter()s is the most Pythonic way to go in such cases, but only if the sums are positive values. Here is an example, and as you can see there is no c in the result after negating c's value in the B dictionary.
In [1]: from collections import Counter
In [2]: A = Counter({'a':1, 'b':2, 'c':3})
In [3]: B = Counter({'b':3, 'c':-4, 'd':5})
In [4]: A + B
Out[4]: Counter({'d': 5, 'b': 5, 'a': 1})
That's because Counters were primarily designed to work with positive integers to represent running counts (a negative count is meaningless). But to help with those use cases, Python documents the minimum range and type restrictions as follows:
The Counter class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you could store anything in the value field.
The most_common() method requires only that the values be orderable.
For in-place operations such as c[key] += 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for update() and subtract() which allow negative and zero values for both inputs and outputs.
The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.
The elements() method requires integer counts. It ignores zero and negative counts.
So, to get around that problem after summing your Counters, you can use Counter.update in order to get the desired output. It works like dict.update() but adds counts instead of replacing them.
In [24]: A.update(B)
In [25]: A
Out[25]: Counter({'d': 5, 'b': 5, 'a': 1, 'c': -1})
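The same trick scales to more than two Counters: start from an empty Counter and update() it with each one, which keeps the zero and negative totals that + would drop. A small sketch, reusing A and B from above plus a made-up C:
from collections import Counter

A = Counter({'a': 1, 'b': 2, 'c': 3})
B = Counter({'b': 3, 'c': -4, 'd': 5})
C = Counter({'c': -1, 'e': 0})          # hypothetical extra counter

total = Counter()
for counter in (A, B, C):
    total.update(counter)               # adds counts instead of replacing them

# total == Counter({'a': 1, 'b': 5, 'c': -2, 'd': 5, 'e': 0})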
import itertools

myDict = {}
for k in itertools.chain(A.keys(), B.keys()):
    myDict[k] = A.get(k, 0) + B.get(k, 0)
The one with no extra imports!
There is a Pythonic standard called EAFP (Easier to Ask for Forgiveness than Permission). The code below is based on that standard.
# The A and B dictionaries
A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}

# The final dictionary. Will contain the final output.
newdict = {}

# Make sure every key of A and B gets into the final dictionary 'newdict'.
newdict.update(A)
newdict.update(B)

# Iterate through each key of A.
for i in A.keys():
    # If the same key exists in B, the values from A and B are added together
    # and included in the final dictionary 'newdict'.
    try:
        addition = A[i] + B[i]
        newdict[i] = addition
    # If the current key does not exist in dictionary B, a KeyError is raised;
    # catch it and continue looping.
    except KeyError:
        continue
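For contrast, the LBYL (Look Before You Leap) version of the same loop checks membership instead of catching the KeyError; which style reads better here is mostly a matter of taste. A minimal sketch using the same A and B:
newdict = {}
newdict.update(A)
newdict.update(B)

for i in A:
    if i in B:                    # look before you leap instead of try/except
        newdict[i] = A[i] + B[i]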
EDIT: thanks to jerzyk for his improvement suggestions.
import itertools
import collections

dictA = {'a': 1, 'b': 2, 'c': 3}
dictB = {'b': 3, 'c': 4, 'd': 5}

new_dict = collections.defaultdict(int)
# use dict.iteritems() instead of dict.items() on Python 2
for k, v in itertools.chain(dictA.items(), dictB.items()):
    new_dict[k] += v

print(dict(new_dict))
# OUTPUT
{'a': 1, 'c': 7, 'b': 5, 'd': 5}
OR
Alternatively, you can use Counter as @Martijn has mentioned above.
For a more generic and extensible way, check out mergedict. It uses singledispatch and can merge values based on their types.
Example:
from mergedict import MergeDict

class SumDict(MergeDict):
    @MergeDict.dispatch(int)
    def merge_int(this, other):
        return this + other

d2 = SumDict({'a': 1, 'b': 'one'})
d2.merge({'a': 2, 'b': 'two'})

assert d2 == {'a': 3, 'b': 'two'}
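Assuming the same dispatch pattern works for other value types (an assumption based on the int example above; check mergedict's documentation to be sure), list values could be merged by concatenation in the same style:
from mergedict import MergeDict

class ConcatDict(MergeDict):
    @MergeDict.dispatch(list)       # assumed to mirror the dispatch(int) example above
    def merge_list(this, other):
        return this + other

d = ConcatDict({'tags': ['a'], 'n': 1})
d.merge({'tags': ['b'], 'n': 2})    # lists are concatenated, other values are replaced
# d == {'tags': ['a', 'b'], 'n': 2}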
From Python 3.5: merging and summing
Thanks to @tokeinizer_fsj for telling me in a comment that I didn't completely get the meaning of the question (I thought that add meant just adding keys that might differ between the two dictionaries, whereas it meant that the common key values should be summed). So I added that loop before the merging, so that the second dictionary contains the sum of the common keys. The last dictionary will be the one whose values survive in the new dictionary that results from merging the two, so I think the problem is solved. The solution is valid from Python 3.5 onwards.
a = {
    "a": 1,
    "b": 2,
    "c": 3
}

b = {
    "a": 2,
    "b": 3,
    "d": 5
}

# Python 3.5
for key in b:
    if key in a:
        b[key] = b[key] + a[key]

c = {**a, **b}
print(c)
>>> c
{'a': 3, 'b': 5, 'c': 3, 'd': 5}
Reusable code
a = {'a': 1, 'b': 2, 'c': 3}
b = {'b': 3, 'c': 4, 'd': 5}

def mergsum(a, b):
    for k in b:
        if k in a:
            b[k] = b[k] + a[k]
    c = {**a, **b}
    return c

print(mergsum(a, b))
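One thing to watch out for: mergsum modifies b in place, since the summed values are written back into it before the merge. If you need both inputs to stay untouched, a non-mutating variant could look like this (just a sketch, assuming numeric values):
def mergsum_pure(a, b):
    c = dict(b)                    # copy, so neither a nor b is modified
    for k, v in a.items():
        c[k] = c.get(k, 0) + v     # missing keys default to 0
    return c

a = {'a': 1, 'b': 2, 'c': 3}
b = {'b': 3, 'c': 4, 'd': 5}
print(mergsum_pure(a, b))          # {'b': 5, 'c': 7, 'd': 5, 'a': 1}
print(b)                           # unchanged: {'b': 3, 'c': 4, 'd': 5}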
Additionally, please note that a.update(b) is about 2x faster than a + b:
from collections import Counter
a = Counter({'menu': 20, 'good': 15, 'happy': 10, 'bar': 5})
b = Counter({'menu': 1, 'good': 1, 'bar': 3})
%timeit a + b;
## 100000 loops, best of 3: 8.62 µs per loop
## The slowest run took 4.04 times longer than the fastest. This could mean that an intermediate result is being cached.
%timeit a.update(b)
## 100000 loops, best of 3: 4.51 µs per loop
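Keep in mind the two are not interchangeable: a + b builds and returns a new Counter (and drops non-positive totals), while a.update(b) changes a in place and returns None. Roughly:
c = a + b      # new Counter, a and b untouched
a.update(b)    # a itself now holds the summed counts; the call returns None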
A one-line solution is to use a dictionary comprehension:
C = { k: A.get(k,0) + B.get(k,0) for k in list(B.keys()) + list(A.keys()) }
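With the running example dictionaries this gives:
A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
C = {k: A.get(k, 0) + B.get(k, 0) for k in list(B.keys()) + list(A.keys())}
# {'b': 5, 'c': 7, 'd': 5, 'a': 1}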
def merge_with(f, xs, ys):
    xs = dict(xs)  # copy, so the original mapping isn't mutated
    for (y, v) in ys.items():
        xs[y] = v if y not in xs else f(xs[y], v)
    return xs

merge_with((lambda x, y: x + y), A, B)
You could easily generalize this:
def merge_dicts(f, *dicts):
    result = {}
    for d in dicts:
        for (k, v) in d.items():
            result[k] = v if k not in result else f(result[k], v)
    return result
Then it can take any number of dicts.
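For example, summing the running example dictionaries plus a third one:
import operator

A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
C = {'c': 9, 'a': 9}

merge_dicts(operator.add, A, B, C)
# {'a': 10, 'b': 5, 'c': 16, 'd': 5}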
This is a simple solution for merging two dictionaries where += can be applied to the values; it only has to iterate over each dictionary once.
a = {'a': 1, 'b': 2, 'c': 3}

dicts = [{'b': 3, 'c': 4, 'd': 5},
         {'c': 9, 'a': 9, 'd': 9}]

def merge_dicts(merged, mergedfrom):
    for k, v in mergedfrom.items():
        if k in merged:
            merged[k] += v
        else:
            merged[k] = v
    return merged

for dct in dicts:
    a = merge_dicts(a, dct)

print(a)
# {'c': 16, 'b': 5, 'd': 14, 'a': 10}
Here's yet another option using dictionary comprehensions combined with the behavior of dict():
dict3 = dict(dict1, **{ k: v + dict1.get(k, 0) for k, v in dict2.items() })
# {'a': 4, 'b': 2, 'c': 7, 'g': 1}
From https://docs.python.org/3/library/stdtypes.html#dict:

If keyword arguments are given, the keyword arguments and their values are added to the dictionary created from the positional argument.
The dict comprehension
{ k: v + dict1.get(k, 0) for k, v in dict2.items() }
handles adding dict1[k] to v. We don't need an explicit if here because the default value for dict1.get can be set to 0 instead.
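The answer itself never defines dict1 and dict2; judging from the printed result they were presumably something like this (an assumption on my part):
dict1 = {'a': 1, 'b': 2, 'c': 3}   # assumed inputs, reconstructed from the printed result
dict2 = {'a': 3, 'g': 1, 'c': 4}

dict3 = dict(dict1, **{k: v + dict1.get(k, 0) for k, v in dict2.items()})
print(dict3)  # {'a': 4, 'b': 2, 'c': 7, 'g': 1}
Note that the ** unpacking in the dict() call only works when all keys are strings, since they are passed as keyword arguments.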
This solution is easy to use: it behaves like a normal dictionary, but lets you add instances together with +.
class SumDict(dict):
    def __add__(self, y):
        return {x: self.get(x, 0) + y.get(x, 0) for x in set(self).union(y)}

A = SumDict({'a': 1, 'c': 2})
B = SumDict({'b': 3, 'c': 4})  # Also works: B = {'b': 3, 'c': 4}
print(A + B)  # OUTPUT {'a': 1, 'b': 3, 'c': 6}
The above solutions are great for the scenario where you have a small number of Counters. If you have a big list of them though, something like this is much nicer:
from collections import Counter
A = Counter({'a':1, 'b':2, 'c':3})
B = Counter({'b':3, 'c':4, 'd':5})
C = Counter({'a': 5, 'e':3})
list_of_counts = [A, B, C]
total = sum(list_of_counts, Counter())
print(total)
# Counter({'c': 7, 'a': 6, 'b': 5, 'd': 5, 'e': 3})
The above solution is essentially summing the Counters by:
total = Counter()
for count in list_of_counts:
    total += count

print(total)
# Counter({'c': 7, 'a': 6, 'b': 5, 'd': 5, 'e': 3})
This does the same thing but I think it always helps to see what it is effectively doing underneath.
What about:
def dict_merge_and_sum(d1, d2):
    ret = d1
    ret.update({k: v + d2[k] for k, v in d1.items() if k in d2})
    ret.update({k: v for k, v in d2.items() if k not in d1})
    return ret
A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
print( dict_merge_and_sum( A, B ) )
Output:
{'d': 5, 'a': 1, 'c': 7, 'b': 5}
A more conventional way to combine two dicts. Using modules and tools is good, but understanding the logic behind them will help in case you don't remember the tools.
A program to combine two dictionaries, adding values for common keys:
def combine_dict(d1, d2):
    for key, value in d1.items():
        if key in d2:
            d2[key] += value
        else:
            d2[key] = value
    return d2

combine_dict({'a': 1, 'b': 2, 'c': 3}, {'b': 3, 'c': 4, 'd': 5})
# output == {'b': 5, 'c': 7, 'd': 5, 'a': 1}
Here's a very general solution. You can deal with any number of dicts, handle keys that appear in only some of them, and easily use any aggregation function you want:
def aggregate_dicts(dicts, operation=sum):
    """Aggregate a sequence of dictionaries using `operation`."""
    all_keys = set().union(*[el.keys() for el in dicts])
    return {k: operation([dic.get(k, None) for dic in dicts]) for k in all_keys}
example:
dicts_same_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3}]
aggregate_dicts(dicts_same_keys, operation=sum)
#{'x': 3, 'y': 6}
Example with non-identical keys and a generic aggregation:
dicts_diff_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3, 'c': 4}]

def mean_no_none(l):
    l_no_none = [el for el in l if el is not None]
    return sum(l_no_none) / len(l_no_none)
aggregate_dicts(dicts_diff_keys, operation=mean_no_none)
# {'x': 1.0, 'c': 4.0, 'y': 2.0}
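Since operation just receives the list of collected values, any aggregation works; for instance, taking the per-key maximum over the first set of dicts:
aggregate_dicts(dicts_same_keys, operation=max)
# {'x': 2, 'y': 3}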
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'a': 3, 'g': 1, 'c': 4}
dict3 = {}  # will store the combined values

for x in dict1:
    if x in dict2:    # sum values with the same key
        dict3[x] = dict1[x] + dict2[x]
    else:             # key only in dict1, copy its value
        dict3[x] = dict1[x]

# pick up the keys that are only in dict2
for x in dict2:
    if x not in dict1:
        dict3[x] = dict2[x]

print(dict3)  # {'a': 4, 'b': 2, 'c': 7, 'g': 1}
Merging three dicts a,b,c in a single line without any other modules or libs
If we have the three dicts
a = {"a":9}
b = {"b":7}
c = {'b': 2, 'd': 90}
Merge them all with a single line and get back a dict object (wrapping the items() views in list() so this also works on Python 3):
c = dict(list(a.items()) + list(b.items()) + list(c.items()))
Returning
{'a': 9, 'b': 2, 'd': 90}
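The same merge (later dicts simply override earlier ones; values are not summed) can also be written more idiomatically in Python 3 with dict unpacking:
a = {"a": 9}
b = {"b": 7}
c = {'b': 2, 'd': 90}

merged = {**a, **b, **c}
print(merged)  # {'a': 9, 'b': 2, 'd': 90}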
