Python - Linking three dictionaries into one if keys are same [duplicate] - python

I have multiple dicts (or sequences of key-value pairs) like this:
d1 = {key1: x1, key2: y1}
d2 = {key1: x2, key2: y2}
How can I efficiently get a result like this, as a new dict?
d = {key1: (x1, x2), key2: (y1, y2)}
See also: How can one make a dictionary with duplicate keys in Python?.

Here's a general solution that will handle an arbitrary amount of dictionaries, with cases when keys are in only some of the dictionaries:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(list)
for d in (d1, d2): # you can list as many input dicts as you want here
for key, value in d.items():
dd[key].append(value)
print(dd) # result: defaultdict(<type 'list'>, {1: [2, 6], 3: [4, 7]})

assuming all keys are always present in all dicts:
ds = [d1, d2]
d = {}
for k in d1.iterkeys():
d[k] = tuple(d[k] for d in ds)
Note: In Python 3.x use below code:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = tuple(d[k] for d in ds)
and if the dic contain numpy arrays:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = np.concatenate(list(d[k] for d in ds))

This function merges two dicts even if the keys in the two dictionaries are different:
def combine_dict(d1, d2):
return {
k: tuple(d[k] for d in (d1, d2) if k in d)
for k in set(d1.keys()) | set(d2.keys())
}
Example:
d1 = {
'a': 1,
'b': 2,
}
d2` = {
'b': 'boat',
'c': 'car',
}
combine_dict(d1, d2)
# Returns: {
# 'a': (1,),
# 'b': (2, 'boat'),
# 'c': ('car',)
# }

dict1 = {'m': 2, 'n': 4}
dict2 = {'n': 3, 'm': 1}
Making sure that the keys are in the same order:
dict2_sorted = {i:dict2[i] for i in dict1.keys()}
keys = dict1.keys()
values = zip(dict1.values(), dict2_sorted.values())
dictionary = dict(zip(keys, values))
gives:
{'m': (2, 1), 'n': (4, 3)}

If you only have d1 and d2,
from collections import defaultdict
d = defaultdict(list)
for a, b in d1.items() + d2.items():
d[a].append(b)

Here is one approach you can use which would work even if both dictonaries don't have same keys:
d1 = {'a':'test','b':'btest','d':'dreg'}
d2 = {'a':'cool','b':'main','c':'clear'}
d = {}
for key in set(d1.keys() + d2.keys()):
try:
d.setdefault(key,[]).append(d1[key])
except KeyError:
pass
try:
d.setdefault(key,[]).append(d2[key])
except KeyError:
pass
print d
This would generate below input:
{'a': ['test', 'cool'], 'c': ['clear'], 'b': ['btest', 'main'], 'd': ['dreg']}

Using precomputed keys
def merge(dicts):
# First, figure out which keys are present.
keys = set().union(*dicts)
# Build a dict with those keys, using a list comprehension to
# pull the values from the source dicts.
return {
k: [d[k] for d in dicts if k in d]
for k in keys
}
This is essentially Flux's answer, generalized for a list of input dicts.
The set().union trick works by making a set union of the keys in all the source dictionaries. The union method on a set (we start with an empty one) can accept an arbitrary number of arguments, and make a union of each input with the original set; and it can accept other iterables (it does not require other sets for the arguments) - it will iterate over them and look for all unique elements. Since iterating over a dict yields its keys, they can be passed directly to the union method.
In the case where the keys of all inputs are known to be the same, this can be simplified: the keys can be hard-coded (or inferred from one of the inputs), and the if check in the list comprehension becomes unnecessary:
def merge(dicts):
return {
k: [d[k] for d in dicts]
for k in dicts[0].keys()
}
This is analogous to blubb's answer, but using a dict comprehension rather than an explicit loop to build the final result.
We could also try something like Mahdi Ghelichi's answer:
def merge(dicts):
values = zip(*(d.values() for d in ds))
return dict(zip(dicts[0].keys(), values))
This should work in Python 3.5 and below: dicts with identical keys will store them in the same order, during the same run of the program (if you run the program again, you may get a different ordering, but still a consistent one).
In 3.6 and above, dictionaries preserve their insertion order (though they are only guaranteed to do so by the specification in 3.7 and above). Thus, input dicts could have the same keys in a different order, which would cause the first zip to combine the wrong values.
We can work around this by "sorting" the input dicts (re-creating them with keys in a consistent order, like [{k:d[k] for k in dicts[0].keys()} for d in dicts]. (In older versions, this would be extra work with no net effect.) However, this adds complexity, and this double-zip approach really doesn't offer any advantages over the previous one using a dict comprehension.
Building the result explicitly, discovering keys on the fly
As in Eli Bendersky's answer, but as a function:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].append(value)
return result
This will produce a defaultdict, a subclass of dict defined by the standard library. The equivalent code using only built-in dicts might look like:
def merge(dicts):
result = {}
for d in dicts:
for key, value in d.items():
result.setdefault(key, []).append(value)
return result
Using other container types besides lists
The precomputed-key approach will work fine to make tuples; replace the list comprehension [d[k] for d in dicts if k in d] with tuple(d[k] for d in dicts if k in d). This passes a generator expression to the tuple constructor. (There is no "tuple comprehension".)
Since tuples are immutable and don't have an append method, the explicit loop approach should be modified by replacing .append(value) with += (value,). However, this may perform poorly if there is a lot of key duplication, since it must create a new tuple each time. It might be better to produce lists first and then convert the final result with something like {k: tuple(v) for (k, v) in merged.items()}.
Similar modifications can be made to get sets (although there is a set comprehension, using {}), Numpy arrays etc. For example, we can generalize both approaches with a container type like so:
def merge(dicts, value_type=list):
# First, figure out which keys are present.
keys = set().union(*dicts)
# Build a dict with those keys, using a list comprehension to
# pull the values from the source dicts.
return {
k: value_type(d[k] for d in dicts if k in d)
for k in keys
}
and
from collections import defaultdict
def merge(dicts, value_type=list):
# We stick with hard-coded `list` for the first part,
# because even other mutable types will offer different interfaces.
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].append(value)
# This is redundant for the default case, of course.
return {k:value_type(v) for (k, v) in result}
If the input values are already sequences
Rather than wrapping the values from the source in a new list, often people want to take inputs where the values are all already lists, and concatenate those lists in the output (or concatenate tuples or 1-dimensional Numpy arrays, combine sets, etc.).
This is still a trivial modification. For precomputed keys, use a nested list comprehension, ordered to get a flat result:
def merge(dicts):
keys = set().union(*dicts)
return {
k: [v for d in dicts if k in d for v in d[k]]
# Alternately:
# k: [v for d in dicts for v in d.get(k, [])]
for k in keys
}
One might instead think of using sum to concatenate results from the original list comprehension. Don't do this - it will perform poorly when there are a lot of duplicate keys. The built-in sum isn't optimized for sequences (and will explicitly disallow "summing" strings) and will try to create a new list with each addition internally.
With the explicit loop approach, use .extend instead of .append:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].extend(value)
return result
The extend method of lists accepts any iterable, so this will work with inputs that have tuples for the values - of course, it still uses lists in the output; and of course, those can be converted back as shown previously.
If the inputs have one item each
A common version of this problem involves input dicts that each have a single key-value pair. Alternately, the input might be (key, value) tuples (or lists).
The above approaches will still work, of course. For tuple inputs, converting them to dicts first, like [{k:v} for (k, v) in tuples], allows for using the directly. Alternately, the explicit iteration approach can be modified to accept the tuples directly, like in Victoria Stuart's answer:
from collections import defaultdict
def merge(pairs):
result = defaultdict(list)
for key, value in pairs:
result[key].extend(value)
return result
(The code was simplified because there is no need to iterate over key-value pairs when there is only one of them and it has been provided directly.)
However, for these single-item cases it may work better to sort the values by key and then use itertools.groupby. In this case, it will be easier to work with the tuples. That looks like:
from itertools import groupby
def merge(tuples):
grouped = groupby(tuples, key=lambda t: t[0])
return {k: [kv[1] for kv in ts] for k, ts in grouped}
Here, t is used as a name for one of the tuples from the input. The grouped iterator will provide pairs of a "key" value k (the first element that was common to the tuples being grouped) and an iterator ts over the tuples in that group. Then we extract the values from the key-value pairs kv in the ts, make a list from those, and use that as the value for the k key in the resulting dict.
To merge one-item dicts this way, of course, convert them to tuples first. One simple way to do this, for a list of one-item dicts, is [next(iter(d.items())) for d in dicts].

Assuming there are two dictionaries with exact same keys, below is the most succinct way of doing it (python3 should be used for both the solution).
d1 = {'a': 1, 'b': 2, 'c':3}
d2 = {'a': 5, 'b': 6, 'c':7}
# get keys from one of the dictionary
ks = [k for k in d1.keys()]
print(ks)
['a', 'b', 'c']
# call values from each dictionary on available keys
d_merged = {k: (d1[k], d2[k]) for k in ks}
print(d_merged)
{'a': (1, 5), 'b': (2, 6), 'c': (3, 7)}
# to merge values as list
d_merged = {k: [d1[k], d2[k]] for k in ks}
print(d_merged)
{'a': [1, 5], 'b': [2, 6], 'c': [3, 7]}
If there are two dictionaries with some common keys, but a few different keys, a list of all the keys should be prepared.
d1 = {'a': 1, 'b': 2, 'c':3, 'd': 9}
d2 = {'a': 5, 'b': 6, 'c':7, 'e': 4}
# get keys from one of the dictionary
d1_ks = [k for k in d1.keys()]
d2_ks = [k for k in d2.keys()]
all_ks = set(d1_ks + d2_ks)
print(all_ks)
['a', 'b', 'c', 'd', 'e']
# call values from each dictionary on available keys
d_merged = {k: [d1.get(k), d2.get(k)] for k in all_ks}
print(d_merged)
{'d': [9, None], 'a': [1, 5], 'b': [2, 6], 'c': [3, 7], 'e': [None, 4]}

There is a great library funcy doing what you need in a just one, short line.
from funcy import join_with
from pprint import pprint
d1 = {"key1": "x1", "key2": "y1"}
d2 = {"key1": "x2", "key2": "y2"}
list_of_dicts = [d1, d2]
merged_dict = join_with(tuple, list_of_dicts)
pprint(merged_dict)
Output:
{'key1': ('x1', 'x2'), 'key2': ('y1', 'y2')}
More info here: funcy -> join_with.

def merge(d1, d2, merge):
result = dict(d1)
for k,v in d2.iteritems():
if k in result:
result[k] = merge(result[k], v)
else:
result[k] = v
return result
d1 = {'a': 1, 'b': 2}
d2 = {'a': 1, 'b': 3, 'c': 2}
print merge(d1, d2, lambda x, y:(x,y))
{'a': (1, 1), 'c': 2, 'b': (2, 3)}

If keys are nested:
d1 = { 'key1': { 'nkey1': 'x1' }, 'key2': { 'nkey2': 'y1' } }
d2 = { 'key1': { 'nkey1': 'x2' }, 'key2': { 'nkey2': 'y2' } }
ds = [d1, d2]
d = {}
for k in d1.keys():
for k2 in d1[k].keys():
d.setdefault(k, {})
d[k].setdefault(k2, [])
d[k][k2] = tuple(d[k][k2] for d in ds)
yields:
{'key1': {'nkey1': ('x1', 'x2')}, 'key2': {'nkey2': ('y1', 'y2')}}

Modifying this answer to create a dictionary of tuples (what the OP asked for), instead of a dictionary of lists:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(tuple)
for d in (d1, d2): # you can list as many input dicts as you want here
for key, value in d.items():
dd[key] += (value,)
print(dd)
The above prints the following:
defaultdict(<class 'tuple'>, {1: (2, 6), 3: (4, 7)})

d1 ={'B': 10, 'C ': 7, 'A': 20}
d2 ={'B': 101, 'Y ': 7, 'X': 8}
d3 ={'A': 201, 'Y ': 77, 'Z': 8}
def CreateNewDictionaryAssemblingAllValues1(d1,d2,d3):
aa = {
k :[d[k] for d in (d1,d2,d3) if k in d ] for k in set(d1.keys() | d2.keys() | d3.keys() )
}
aap = print(aa)
return aap
CreateNewDictionaryAssemblingAllValues1(d1, d2, d3)
"""
Output :
{'X': [8], 'C ': [7], 'Y ': [7, 77], 'Z': [8], 'B': [10, 101], 'A': [20, 201]}
"""

From blubb answer:
You can also directly form the tuple using values from each list
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = (d1[k], d2[k])
This might be useful if you had a specific ordering for your tuples
ds = [d1, d2, d3, d4]
d = {}
for k in d1.keys():
d[k] = (d3[k], d1[k], d4[k], d2[k]) #if you wanted tuple in order of d3, d1, d4, d2

Using below method we can merge two dictionaries having same keys.
def update_dict(dict1: dict, dict2: dict) -> dict:
output_dict = {}
for key in dict1.keys():
output_dict.update({key: []})
if type(dict1[key]) != str:
for value in dict1[key]:
output_dict[key].append(value)
else:
output_dict[key].append(dict1[key])
if type(dict2[key]) != str:
for value in dict2[key]:
output_dict[key].append(value)
else:
output_dict[key].append(dict2[key])
return output_dict
Input: d1 = {key1: x1, key2: y1} d2 = {key1: x2, key2: y2}
Output: {'key1': ['x1', 'x2'], 'key2': ['y1', 'y2']}

dicts = [dict1,dict2,dict3]
out = dict(zip(dicts[0].keys(),[[dic[list(dic.keys())[key]] for dic in dicts] for key in range(0,len(dicts[0]))]))

A compact possibility
d1={'a':1,'b':2}
d2={'c':3,'d':4}
context={**d1, **d2}
context
{'b': 2, 'c': 3, 'd': 4, 'a': 1}

Related

Merging a list of dicts into one dict [duplicate]

I have multiple dicts (or sequences of key-value pairs) like this:
d1 = {key1: x1, key2: y1}
d2 = {key1: x2, key2: y2}
How can I efficiently get a result like this, as a new dict?
d = {key1: (x1, x2), key2: (y1, y2)}
See also: How can one make a dictionary with duplicate keys in Python?.
Here's a general solution that will handle an arbitrary amount of dictionaries, with cases when keys are in only some of the dictionaries:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(list)
for d in (d1, d2): # you can list as many input dicts as you want here
for key, value in d.items():
dd[key].append(value)
print(dd) # result: defaultdict(<type 'list'>, {1: [2, 6], 3: [4, 7]})
assuming all keys are always present in all dicts:
ds = [d1, d2]
d = {}
for k in d1.iterkeys():
d[k] = tuple(d[k] for d in ds)
Note: In Python 3.x use below code:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = tuple(d[k] for d in ds)
and if the dic contain numpy arrays:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = np.concatenate(list(d[k] for d in ds))
This function merges two dicts even if the keys in the two dictionaries are different:
def combine_dict(d1, d2):
return {
k: tuple(d[k] for d in (d1, d2) if k in d)
for k in set(d1.keys()) | set(d2.keys())
}
Example:
d1 = {
'a': 1,
'b': 2,
}
d2` = {
'b': 'boat',
'c': 'car',
}
combine_dict(d1, d2)
# Returns: {
# 'a': (1,),
# 'b': (2, 'boat'),
# 'c': ('car',)
# }
dict1 = {'m': 2, 'n': 4}
dict2 = {'n': 3, 'm': 1}
Making sure that the keys are in the same order:
dict2_sorted = {i:dict2[i] for i in dict1.keys()}
keys = dict1.keys()
values = zip(dict1.values(), dict2_sorted.values())
dictionary = dict(zip(keys, values))
gives:
{'m': (2, 1), 'n': (4, 3)}
If you only have d1 and d2,
from collections import defaultdict
d = defaultdict(list)
for a, b in d1.items() + d2.items():
d[a].append(b)
Here is one approach you can use which would work even if both dictonaries don't have same keys:
d1 = {'a':'test','b':'btest','d':'dreg'}
d2 = {'a':'cool','b':'main','c':'clear'}
d = {}
for key in set(d1.keys() + d2.keys()):
try:
d.setdefault(key,[]).append(d1[key])
except KeyError:
pass
try:
d.setdefault(key,[]).append(d2[key])
except KeyError:
pass
print d
This would generate below input:
{'a': ['test', 'cool'], 'c': ['clear'], 'b': ['btest', 'main'], 'd': ['dreg']}
Using precomputed keys
def merge(dicts):
# First, figure out which keys are present.
keys = set().union(*dicts)
# Build a dict with those keys, using a list comprehension to
# pull the values from the source dicts.
return {
k: [d[k] for d in dicts if k in d]
for k in keys
}
This is essentially Flux's answer, generalized for a list of input dicts.
The set().union trick works by making a set union of the keys in all the source dictionaries. The union method on a set (we start with an empty one) can accept an arbitrary number of arguments, and make a union of each input with the original set; and it can accept other iterables (it does not require other sets for the arguments) - it will iterate over them and look for all unique elements. Since iterating over a dict yields its keys, they can be passed directly to the union method.
In the case where the keys of all inputs are known to be the same, this can be simplified: the keys can be hard-coded (or inferred from one of the inputs), and the if check in the list comprehension becomes unnecessary:
def merge(dicts):
return {
k: [d[k] for d in dicts]
for k in dicts[0].keys()
}
This is analogous to blubb's answer, but using a dict comprehension rather than an explicit loop to build the final result.
We could also try something like Mahdi Ghelichi's answer:
def merge(dicts):
values = zip(*(d.values() for d in ds))
return dict(zip(dicts[0].keys(), values))
This should work in Python 3.5 and below: dicts with identical keys will store them in the same order, during the same run of the program (if you run the program again, you may get a different ordering, but still a consistent one).
In 3.6 and above, dictionaries preserve their insertion order (though they are only guaranteed to do so by the specification in 3.7 and above). Thus, input dicts could have the same keys in a different order, which would cause the first zip to combine the wrong values.
We can work around this by "sorting" the input dicts (re-creating them with keys in a consistent order, like [{k:d[k] for k in dicts[0].keys()} for d in dicts]. (In older versions, this would be extra work with no net effect.) However, this adds complexity, and this double-zip approach really doesn't offer any advantages over the previous one using a dict comprehension.
Building the result explicitly, discovering keys on the fly
As in Eli Bendersky's answer, but as a function:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].append(value)
return result
This will produce a defaultdict, a subclass of dict defined by the standard library. The equivalent code using only built-in dicts might look like:
def merge(dicts):
result = {}
for d in dicts:
for key, value in d.items():
result.setdefault(key, []).append(value)
return result
Using other container types besides lists
The precomputed-key approach will work fine to make tuples; replace the list comprehension [d[k] for d in dicts if k in d] with tuple(d[k] for d in dicts if k in d). This passes a generator expression to the tuple constructor. (There is no "tuple comprehension".)
Since tuples are immutable and don't have an append method, the explicit loop approach should be modified by replacing .append(value) with += (value,). However, this may perform poorly if there is a lot of key duplication, since it must create a new tuple each time. It might be better to produce lists first and then convert the final result with something like {k: tuple(v) for (k, v) in merged.items()}.
Similar modifications can be made to get sets (although there is a set comprehension, using {}), Numpy arrays etc. For example, we can generalize both approaches with a container type like so:
def merge(dicts, value_type=list):
# First, figure out which keys are present.
keys = set().union(*dicts)
# Build a dict with those keys, using a list comprehension to
# pull the values from the source dicts.
return {
k: value_type(d[k] for d in dicts if k in d)
for k in keys
}
and
from collections import defaultdict
def merge(dicts, value_type=list):
# We stick with hard-coded `list` for the first part,
# because even other mutable types will offer different interfaces.
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].append(value)
# This is redundant for the default case, of course.
return {k:value_type(v) for (k, v) in result}
If the input values are already sequences
Rather than wrapping the values from the source in a new list, often people want to take inputs where the values are all already lists, and concatenate those lists in the output (or concatenate tuples or 1-dimensional Numpy arrays, combine sets, etc.).
This is still a trivial modification. For precomputed keys, use a nested list comprehension, ordered to get a flat result:
def merge(dicts):
keys = set().union(*dicts)
return {
k: [v for d in dicts if k in d for v in d[k]]
# Alternately:
# k: [v for d in dicts for v in d.get(k, [])]
for k in keys
}
One might instead think of using sum to concatenate results from the original list comprehension. Don't do this - it will perform poorly when there are a lot of duplicate keys. The built-in sum isn't optimized for sequences (and will explicitly disallow "summing" strings) and will try to create a new list with each addition internally.
With the explicit loop approach, use .extend instead of .append:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].extend(value)
return result
The extend method of lists accepts any iterable, so this will work with inputs that have tuples for the values - of course, it still uses lists in the output; and of course, those can be converted back as shown previously.
If the inputs have one item each
A common version of this problem involves input dicts that each have a single key-value pair. Alternately, the input might be (key, value) tuples (or lists).
The above approaches will still work, of course. For tuple inputs, converting them to dicts first, like [{k:v} for (k, v) in tuples], allows for using the directly. Alternately, the explicit iteration approach can be modified to accept the tuples directly, like in Victoria Stuart's answer:
from collections import defaultdict
def merge(pairs):
result = defaultdict(list)
for key, value in pairs:
result[key].extend(value)
return result
(The code was simplified because there is no need to iterate over key-value pairs when there is only one of them and it has been provided directly.)
However, for these single-item cases it may work better to sort the values by key and then use itertools.groupby. In this case, it will be easier to work with the tuples. That looks like:
from itertools import groupby
def merge(tuples):
grouped = groupby(tuples, key=lambda t: t[0])
return {k: [kv[1] for kv in ts] for k, ts in grouped}
Here, t is used as a name for one of the tuples from the input. The grouped iterator will provide pairs of a "key" value k (the first element that was common to the tuples being grouped) and an iterator ts over the tuples in that group. Then we extract the values from the key-value pairs kv in the ts, make a list from those, and use that as the value for the k key in the resulting dict.
To merge one-item dicts this way, of course, convert them to tuples first. One simple way to do this, for a list of one-item dicts, is [next(iter(d.items())) for d in dicts].
Assuming there are two dictionaries with exact same keys, below is the most succinct way of doing it (python3 should be used for both the solution).
d1 = {'a': 1, 'b': 2, 'c':3}
d2 = {'a': 5, 'b': 6, 'c':7}
# get keys from one of the dictionary
ks = [k for k in d1.keys()]
print(ks)
['a', 'b', 'c']
# call values from each dictionary on available keys
d_merged = {k: (d1[k], d2[k]) for k in ks}
print(d_merged)
{'a': (1, 5), 'b': (2, 6), 'c': (3, 7)}
# to merge values as list
d_merged = {k: [d1[k], d2[k]] for k in ks}
print(d_merged)
{'a': [1, 5], 'b': [2, 6], 'c': [3, 7]}
If there are two dictionaries with some common keys, but a few different keys, a list of all the keys should be prepared.
d1 = {'a': 1, 'b': 2, 'c':3, 'd': 9}
d2 = {'a': 5, 'b': 6, 'c':7, 'e': 4}
# get keys from one of the dictionary
d1_ks = [k for k in d1.keys()]
d2_ks = [k for k in d2.keys()]
all_ks = set(d1_ks + d2_ks)
print(all_ks)
['a', 'b', 'c', 'd', 'e']
# call values from each dictionary on available keys
d_merged = {k: [d1.get(k), d2.get(k)] for k in all_ks}
print(d_merged)
{'d': [9, None], 'a': [1, 5], 'b': [2, 6], 'c': [3, 7], 'e': [None, 4]}
There is a great library funcy doing what you need in a just one, short line.
from funcy import join_with
from pprint import pprint
d1 = {"key1": "x1", "key2": "y1"}
d2 = {"key1": "x2", "key2": "y2"}
list_of_dicts = [d1, d2]
merged_dict = join_with(tuple, list_of_dicts)
pprint(merged_dict)
Output:
{'key1': ('x1', 'x2'), 'key2': ('y1', 'y2')}
More info here: funcy -> join_with.
def merge(d1, d2, merge):
result = dict(d1)
for k,v in d2.iteritems():
if k in result:
result[k] = merge(result[k], v)
else:
result[k] = v
return result
d1 = {'a': 1, 'b': 2}
d2 = {'a': 1, 'b': 3, 'c': 2}
print merge(d1, d2, lambda x, y:(x,y))
{'a': (1, 1), 'c': 2, 'b': (2, 3)}
If keys are nested:
d1 = { 'key1': { 'nkey1': 'x1' }, 'key2': { 'nkey2': 'y1' } }
d2 = { 'key1': { 'nkey1': 'x2' }, 'key2': { 'nkey2': 'y2' } }
ds = [d1, d2]
d = {}
for k in d1.keys():
for k2 in d1[k].keys():
d.setdefault(k, {})
d[k].setdefault(k2, [])
d[k][k2] = tuple(d[k][k2] for d in ds)
yields:
{'key1': {'nkey1': ('x1', 'x2')}, 'key2': {'nkey2': ('y1', 'y2')}}
Modifying this answer to create a dictionary of tuples (what the OP asked for), instead of a dictionary of lists:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(tuple)
for d in (d1, d2): # you can list as many input dicts as you want here
for key, value in d.items():
dd[key] += (value,)
print(dd)
The above prints the following:
defaultdict(<class 'tuple'>, {1: (2, 6), 3: (4, 7)})
d1 ={'B': 10, 'C ': 7, 'A': 20}
d2 ={'B': 101, 'Y ': 7, 'X': 8}
d3 ={'A': 201, 'Y ': 77, 'Z': 8}
def CreateNewDictionaryAssemblingAllValues1(d1,d2,d3):
aa = {
k :[d[k] for d in (d1,d2,d3) if k in d ] for k in set(d1.keys() | d2.keys() | d3.keys() )
}
aap = print(aa)
return aap
CreateNewDictionaryAssemblingAllValues1(d1, d2, d3)
"""
Output :
{'X': [8], 'C ': [7], 'Y ': [7, 77], 'Z': [8], 'B': [10, 101], 'A': [20, 201]}
"""
From blubb answer:
You can also directly form the tuple using values from each list
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = (d1[k], d2[k])
This might be useful if you had a specific ordering for your tuples
ds = [d1, d2, d3, d4]
d = {}
for k in d1.keys():
d[k] = (d3[k], d1[k], d4[k], d2[k]) #if you wanted tuple in order of d3, d1, d4, d2
Using below method we can merge two dictionaries having same keys.
def update_dict(dict1: dict, dict2: dict) -> dict:
output_dict = {}
for key in dict1.keys():
output_dict.update({key: []})
if type(dict1[key]) != str:
for value in dict1[key]:
output_dict[key].append(value)
else:
output_dict[key].append(dict1[key])
if type(dict2[key]) != str:
for value in dict2[key]:
output_dict[key].append(value)
else:
output_dict[key].append(dict2[key])
return output_dict
Input: d1 = {key1: x1, key2: y1} d2 = {key1: x2, key2: y2}
Output: {'key1': ['x1', 'x2'], 'key2': ['y1', 'y2']}
dicts = [dict1,dict2,dict3]
out = dict(zip(dicts[0].keys(),[[dic[list(dic.keys())[key]] for dic in dicts] for key in range(0,len(dicts[0]))]))
A compact possibility
d1={'a':1,'b':2}
d2={'c':3,'d':4}
context={**d1, **d2}
context
{'b': 2, 'c': 3, 'd': 4, 'a': 1}

Python: Merging multiple dictionaries with the same keys and different values [duplicate]

I have multiple dicts (or sequences of key-value pairs) like this:
d1 = {key1: x1, key2: y1}
d2 = {key1: x2, key2: y2}
How can I efficiently get a result like this, as a new dict?
d = {key1: (x1, x2), key2: (y1, y2)}
See also: How can one make a dictionary with duplicate keys in Python?.
Here's a general solution that will handle an arbitrary amount of dictionaries, with cases when keys are in only some of the dictionaries:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(list)
for d in (d1, d2): # you can list as many input dicts as you want here
for key, value in d.items():
dd[key].append(value)
print(dd) # result: defaultdict(<type 'list'>, {1: [2, 6], 3: [4, 7]})
assuming all keys are always present in all dicts:
ds = [d1, d2]
d = {}
for k in d1.iterkeys():
d[k] = tuple(d[k] for d in ds)
Note: In Python 3.x use below code:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = tuple(d[k] for d in ds)
and if the dic contain numpy arrays:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = np.concatenate(list(d[k] for d in ds))
This function merges two dicts even if the keys in the two dictionaries are different:
def combine_dict(d1, d2):
return {
k: tuple(d[k] for d in (d1, d2) if k in d)
for k in set(d1.keys()) | set(d2.keys())
}
Example:
d1 = {
'a': 1,
'b': 2,
}
d2` = {
'b': 'boat',
'c': 'car',
}
combine_dict(d1, d2)
# Returns: {
# 'a': (1,),
# 'b': (2, 'boat'),
# 'c': ('car',)
# }
dict1 = {'m': 2, 'n': 4}
dict2 = {'n': 3, 'm': 1}
Making sure that the keys are in the same order:
dict2_sorted = {i:dict2[i] for i in dict1.keys()}
keys = dict1.keys()
values = zip(dict1.values(), dict2_sorted.values())
dictionary = dict(zip(keys, values))
gives:
{'m': (2, 1), 'n': (4, 3)}
If you only have d1 and d2,
from collections import defaultdict
d = defaultdict(list)
for a, b in d1.items() + d2.items():
d[a].append(b)
Here is one approach you can use which would work even if both dictonaries don't have same keys:
d1 = {'a':'test','b':'btest','d':'dreg'}
d2 = {'a':'cool','b':'main','c':'clear'}
d = {}
for key in set(d1.keys() + d2.keys()):
try:
d.setdefault(key,[]).append(d1[key])
except KeyError:
pass
try:
d.setdefault(key,[]).append(d2[key])
except KeyError:
pass
print d
This would generate below input:
{'a': ['test', 'cool'], 'c': ['clear'], 'b': ['btest', 'main'], 'd': ['dreg']}
Using precomputed keys
def merge(dicts):
# First, figure out which keys are present.
keys = set().union(*dicts)
# Build a dict with those keys, using a list comprehension to
# pull the values from the source dicts.
return {
k: [d[k] for d in dicts if k in d]
for k in keys
}
This is essentially Flux's answer, generalized for a list of input dicts.
The set().union trick works by making a set union of the keys in all the source dictionaries. The union method on a set (we start with an empty one) can accept an arbitrary number of arguments, and make a union of each input with the original set; and it can accept other iterables (it does not require other sets for the arguments) - it will iterate over them and look for all unique elements. Since iterating over a dict yields its keys, they can be passed directly to the union method.
In the case where the keys of all inputs are known to be the same, this can be simplified: the keys can be hard-coded (or inferred from one of the inputs), and the if check in the list comprehension becomes unnecessary:
def merge(dicts):
return {
k: [d[k] for d in dicts]
for k in dicts[0].keys()
}
This is analogous to blubb's answer, but using a dict comprehension rather than an explicit loop to build the final result.
We could also try something like Mahdi Ghelichi's answer:
def merge(dicts):
values = zip(*(d.values() for d in ds))
return dict(zip(dicts[0].keys(), values))
This should work in Python 3.5 and below: dicts with identical keys will store them in the same order, during the same run of the program (if you run the program again, you may get a different ordering, but still a consistent one).
In 3.6 and above, dictionaries preserve their insertion order (though they are only guaranteed to do so by the specification in 3.7 and above). Thus, input dicts could have the same keys in a different order, which would cause the first zip to combine the wrong values.
We can work around this by "sorting" the input dicts (re-creating them with keys in a consistent order, like [{k:d[k] for k in dicts[0].keys()} for d in dicts]. (In older versions, this would be extra work with no net effect.) However, this adds complexity, and this double-zip approach really doesn't offer any advantages over the previous one using a dict comprehension.
Building the result explicitly, discovering keys on the fly
As in Eli Bendersky's answer, but as a function:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].append(value)
return result
This will produce a defaultdict, a subclass of dict defined by the standard library. The equivalent code using only built-in dicts might look like:
def merge(dicts):
result = {}
for d in dicts:
for key, value in d.items():
result.setdefault(key, []).append(value)
return result
Using other container types besides lists
The precomputed-key approach will work fine to make tuples; replace the list comprehension [d[k] for d in dicts if k in d] with tuple(d[k] for d in dicts if k in d). This passes a generator expression to the tuple constructor. (There is no "tuple comprehension".)
Since tuples are immutable and don't have an append method, the explicit loop approach should be modified by replacing .append(value) with += (value,). However, this may perform poorly if there is a lot of key duplication, since it must create a new tuple each time. It might be better to produce lists first and then convert the final result with something like {k: tuple(v) for (k, v) in merged.items()}.
Similar modifications can be made to get sets (although there is a set comprehension, using {}), Numpy arrays etc. For example, we can generalize both approaches with a container type like so:
def merge(dicts, value_type=list):
# First, figure out which keys are present.
keys = set().union(*dicts)
# Build a dict with those keys, using a list comprehension to
# pull the values from the source dicts.
return {
k: value_type(d[k] for d in dicts if k in d)
for k in keys
}
and
from collections import defaultdict
def merge(dicts, value_type=list):
# We stick with hard-coded `list` for the first part,
# because even other mutable types will offer different interfaces.
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].append(value)
# This is redundant for the default case, of course.
return {k:value_type(v) for (k, v) in result}
If the input values are already sequences
Rather than wrapping the values from the source in a new list, often people want to take inputs where the values are all already lists, and concatenate those lists in the output (or concatenate tuples or 1-dimensional Numpy arrays, combine sets, etc.).
This is still a trivial modification. For precomputed keys, use a nested list comprehension, ordered to get a flat result:
def merge(dicts):
keys = set().union(*dicts)
return {
k: [v for d in dicts if k in d for v in d[k]]
# Alternately:
# k: [v for d in dicts for v in d.get(k, [])]
for k in keys
}
One might instead think of using sum to concatenate results from the original list comprehension. Don't do this - it will perform poorly when there are a lot of duplicate keys. The built-in sum isn't optimized for sequences (and will explicitly disallow "summing" strings) and will try to create a new list with each addition internally.
With the explicit loop approach, use .extend instead of .append:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].extend(value)
return result
The extend method of lists accepts any iterable, so this will work with inputs that have tuples for the values - of course, it still uses lists in the output; and of course, those can be converted back as shown previously.
If the inputs have one item each
A common version of this problem involves input dicts that each have a single key-value pair. Alternately, the input might be (key, value) tuples (or lists).
The above approaches will still work, of course. For tuple inputs, converting them to dicts first, like [{k:v} for (k, v) in tuples], allows for using the directly. Alternately, the explicit iteration approach can be modified to accept the tuples directly, like in Victoria Stuart's answer:
from collections import defaultdict
def merge(pairs):
result = defaultdict(list)
for key, value in pairs:
result[key].extend(value)
return result
(The code was simplified because there is no need to iterate over key-value pairs when there is only one of them and it has been provided directly.)
However, for these single-item cases it may work better to sort the values by key and then use itertools.groupby. In this case, it will be easier to work with the tuples. That looks like:
from itertools import groupby
def merge(tuples):
grouped = groupby(tuples, key=lambda t: t[0])
return {k: [kv[1] for kv in ts] for k, ts in grouped}
Here, t is used as a name for one of the tuples from the input. The grouped iterator will provide pairs of a "key" value k (the first element that was common to the tuples being grouped) and an iterator ts over the tuples in that group. Then we extract the values from the key-value pairs kv in the ts, make a list from those, and use that as the value for the k key in the resulting dict.
To merge one-item dicts this way, of course, convert them to tuples first. One simple way to do this, for a list of one-item dicts, is [next(iter(d.items())) for d in dicts].
Assuming there are two dictionaries with exact same keys, below is the most succinct way of doing it (python3 should be used for both the solution).
d1 = {'a': 1, 'b': 2, 'c':3}
d2 = {'a': 5, 'b': 6, 'c':7}
# get keys from one of the dictionary
ks = [k for k in d1.keys()]
print(ks)
['a', 'b', 'c']
# call values from each dictionary on available keys
d_merged = {k: (d1[k], d2[k]) for k in ks}
print(d_merged)
{'a': (1, 5), 'b': (2, 6), 'c': (3, 7)}
# to merge values as list
d_merged = {k: [d1[k], d2[k]] for k in ks}
print(d_merged)
{'a': [1, 5], 'b': [2, 6], 'c': [3, 7]}
If there are two dictionaries with some common keys, but a few different keys, a list of all the keys should be prepared.
d1 = {'a': 1, 'b': 2, 'c':3, 'd': 9}
d2 = {'a': 5, 'b': 6, 'c':7, 'e': 4}
# get keys from one of the dictionary
d1_ks = [k for k in d1.keys()]
d2_ks = [k for k in d2.keys()]
all_ks = set(d1_ks + d2_ks)
print(all_ks)
['a', 'b', 'c', 'd', 'e']
# call values from each dictionary on available keys
d_merged = {k: [d1.get(k), d2.get(k)] for k in all_ks}
print(d_merged)
{'d': [9, None], 'a': [1, 5], 'b': [2, 6], 'c': [3, 7], 'e': [None, 4]}
There is a great library funcy doing what you need in a just one, short line.
from funcy import join_with
from pprint import pprint
d1 = {"key1": "x1", "key2": "y1"}
d2 = {"key1": "x2", "key2": "y2"}
list_of_dicts = [d1, d2]
merged_dict = join_with(tuple, list_of_dicts)
pprint(merged_dict)
Output:
{'key1': ('x1', 'x2'), 'key2': ('y1', 'y2')}
More info here: funcy -> join_with.
def merge(d1, d2, merge):
result = dict(d1)
for k,v in d2.iteritems():
if k in result:
result[k] = merge(result[k], v)
else:
result[k] = v
return result
d1 = {'a': 1, 'b': 2}
d2 = {'a': 1, 'b': 3, 'c': 2}
print merge(d1, d2, lambda x, y:(x,y))
{'a': (1, 1), 'c': 2, 'b': (2, 3)}
If keys are nested:
d1 = { 'key1': { 'nkey1': 'x1' }, 'key2': { 'nkey2': 'y1' } }
d2 = { 'key1': { 'nkey1': 'x2' }, 'key2': { 'nkey2': 'y2' } }
ds = [d1, d2]
d = {}
for k in d1.keys():
for k2 in d1[k].keys():
d.setdefault(k, {})
d[k].setdefault(k2, [])
d[k][k2] = tuple(d[k][k2] for d in ds)
yields:
{'key1': {'nkey1': ('x1', 'x2')}, 'key2': {'nkey2': ('y1', 'y2')}}
Modifying this answer to create a dictionary of tuples (what the OP asked for), instead of a dictionary of lists:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(tuple)
for d in (d1, d2): # you can list as many input dicts as you want here
for key, value in d.items():
dd[key] += (value,)
print(dd)
The above prints the following:
defaultdict(<class 'tuple'>, {1: (2, 6), 3: (4, 7)})
d1 ={'B': 10, 'C ': 7, 'A': 20}
d2 ={'B': 101, 'Y ': 7, 'X': 8}
d3 ={'A': 201, 'Y ': 77, 'Z': 8}
def CreateNewDictionaryAssemblingAllValues1(d1,d2,d3):
aa = {
k :[d[k] for d in (d1,d2,d3) if k in d ] for k in set(d1.keys() | d2.keys() | d3.keys() )
}
aap = print(aa)
return aap
CreateNewDictionaryAssemblingAllValues1(d1, d2, d3)
"""
Output :
{'X': [8], 'C ': [7], 'Y ': [7, 77], 'Z': [8], 'B': [10, 101], 'A': [20, 201]}
"""
From blubb answer:
You can also directly form the tuple using values from each list
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = (d1[k], d2[k])
This might be useful if you had a specific ordering for your tuples
ds = [d1, d2, d3, d4]
d = {}
for k in d1.keys():
d[k] = (d3[k], d1[k], d4[k], d2[k]) #if you wanted tuple in order of d3, d1, d4, d2
Using below method we can merge two dictionaries having same keys.
def update_dict(dict1: dict, dict2: dict) -> dict:
output_dict = {}
for key in dict1.keys():
output_dict.update({key: []})
if type(dict1[key]) != str:
for value in dict1[key]:
output_dict[key].append(value)
else:
output_dict[key].append(dict1[key])
if type(dict2[key]) != str:
for value in dict2[key]:
output_dict[key].append(value)
else:
output_dict[key].append(dict2[key])
return output_dict
Input: d1 = {key1: x1, key2: y1} d2 = {key1: x2, key2: y2}
Output: {'key1': ['x1', 'x2'], 'key2': ['y1', 'y2']}
dicts = [dict1,dict2,dict3]
out = dict(zip(dicts[0].keys(),[[dic[list(dic.keys())[key]] for dic in dicts] for key in range(0,len(dicts[0]))]))
A compact possibility
d1={'a':1,'b':2}
d2={'c':3,'d':4}
context={**d1, **d2}
context
{'b': 2, 'c': 3, 'd': 4, 'a': 1}

Deleting dictionary keys from a provided list in Python [duplicate]

I know how to remove an entry, 'key' from my dictionary d, safely. You do:
if d.has_key('key'):
del d['key']
However, I need to remove multiple entries from a dictionary safely. I was thinking of defining the entries in a tuple as I will need to do this more than once.
entities_to_remove = ('a', 'b', 'c')
for x in entities_to_remove:
if x in d:
del d[x]
However, I was wondering if there is a smarter way to do this?
Using dict.pop:
d = {'some': 'data'}
entries_to_remove = ('any', 'iterable')
for k in entries_to_remove:
d.pop(k, None)
Using Dict Comprehensions
final_dict = {key: value for key, value in d if key not in [key1, key2]}
where key1 and key2 are to be removed.
In the example below, keys "b" and "c" are to be removed & it's kept in a keys list.
>>> a
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> keys = ["b", "c"]
>>> print {key: a[key] for key in a if key not in keys}
{'a': 1, 'd': 4}
>>>
Why not like this:
entries = ('a', 'b', 'c')
the_dict = {'b': 'foo'}
def entries_to_remove(entries, the_dict):
for key in entries:
if key in the_dict:
del the_dict[key]
A more compact version was provided by mattbornski using dict.pop()
a solution is using map and filter functions
python 2
d={"a":1,"b":2,"c":3}
l=("a","b","d")
map(d.__delitem__, filter(d.__contains__,l))
print(d)
python 3
d={"a":1,"b":2,"c":3}
l=("a","b","d")
list(map(d.__delitem__, filter(d.__contains__,l)))
print(d)
you get:
{'c': 3}
If you also need to retrieve the values for the keys you are removing, this would be a pretty good way to do it:
values_removed = [d.pop(k, None) for k in entities_to_remove]
You could of course still do this just for the removal of the keys from d, but you would be unnecessarily creating the list of values with the list comprehension. It is also a little unclear to use a list comprehension just for the function's side effect.
Found a solution with pop and map
d = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'b', 'c']
list(map(d.pop, keys))
print(d)
The output of this:
{'d': 'valueD'}
I have answered this question so late just because I think it will help in the future if anyone searches the same. And this might help.
Update
The above code will throw an error if a key does not exist in the dict.
DICTIONARY = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'l', 'c']
def remove_key(key):
DICTIONARY.pop(key, None)
list(map(remove_key, keys))
print(DICTIONARY)
output:
DICTIONARY = {'b': 'valueB', 'd': 'valueD'}
Some timing tests for cpython 3 shows that a simple for loop is the fastest way, and it's quite readable. Adding in a function doesn't cause much overhead either:
timeit results (10k iterations):
all(x.pop(v) for v in r) # 0.85
all(map(x.pop, r)) # 0.60
list(map(x.pop, r)) # 0.70
all(map(x.__delitem__, r)) # 0.44
del_all(x, r) # 0.40
<inline for loop>(x, r) # 0.35
def del_all(mapping, to_remove):
"""Remove list of elements from mapping."""
for key in to_remove:
del mapping[key]
For small iterations, doing that 'inline' was a bit faster, because of the overhead of the function call. But del_all is lint-safe, reusable, and faster than all the python comprehension and mapping constructs.
I have no problem with any of the existing answers, but I was surprised to not find this solution:
keys_to_remove = ['a', 'b', 'c']
my_dict = {k: v for k, v in zip("a b c d e f g".split(' '), [0, 1, 2, 3, 4, 5, 6])}
for k in keys_to_remove:
try:
del my_dict[k]
except KeyError:
pass
assert my_dict == {'d': 3, 'e': 4, 'f': 5, 'g': 6}
Note: I stumbled across this question coming from here. And my answer is related to this answer.
I have tested the performance of three methods:
# Method 1: `del`
for key in remove_keys:
if key in d:
del d[key]
# Method 2: `pop()`
for key in remove_keys:
d.pop(key, None)
# Method 3: comprehension
{key: v for key, v in d.items() if key not in remove_keys}
Here are the results of 1M iterations:
del: 2.03s 2.0 ns/iter (100%)
pop(): 2.38s 2.4 ns/iter (117%)
comprehension: 4.11s 4.1 ns/iter (202%)
So both del and pop() are the fastest. Comprehensions are 2x slower.
But anyway, we speak nanoseconds here :) Dicts in Python are ridiculously fast.
Why not:
entriestoremove = (2,5,1)
for e in entriestoremove:
if d.has_key(e):
del d[e]
I don't know what you mean by "smarter way". Surely there are other ways, maybe with dictionary comprehensions:
entriestoremove = (2,5,1)
newdict = {x for x in d if x not in entriestoremove}
inline
import functools
#: not key(c) in d
d = {"a": "avalue", "b": "bvalue", "d": "dvalue"}
entitiesToREmove = ('a', 'b', 'c')
#: python2
map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove)
#: python3
list(map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove))
print(d)
# output: {'d': 'dvalue'}
I think using the fact that the keys can be treated as a set is the nicest way if you're on python 3:
def remove_keys(d, keys):
to_remove = set(keys)
filtered_keys = d.keys() - to_remove
filtered_values = map(d.get, filtered_keys)
return dict(zip(filtered_keys, filtered_values))
Example:
>>> remove_keys({'k1': 1, 'k3': 3}, ['k1', 'k2'])
{'k3': 3}
It would be nice to have full support for set methods for dictionaries (and not the unholy mess we're getting with Python 3.9) so that you could simply "remove" a set of keys. However, as long as that's not the case, and you have a large dictionary with potentially a large number of keys to remove, you might want to know about the performance. So, I've created some code that creates something large enough for meaningful comparisons: a 100,000 x 1000 matrix, so 10,000,00 items in total.
from itertools import product
from time import perf_counter
# make a complete worksheet 100000 * 1000
start = perf_counter()
prod = product(range(1, 100000), range(1, 1000))
cells = {(x,y):x for x,y in prod}
print(len(cells))
print(f"Create time {perf_counter()-start:.2f}s")
clock = perf_counter()
# remove everything above row 50,000
keys = product(range(50000, 100000), range(1, 100))
# for x,y in keys:
# del cells[x, y]
for n in map(cells.pop, keys):
pass
print(len(cells))
stop = perf_counter()
print(f"Removal time {stop-clock:.2f}s")
10 million items or more is not unusual in some settings. Comparing the two methods on my local machine I see a slight improvement when using map and pop, presumably because of fewer function calls, but both take around 2.5s on my machine. But this pales in comparison to the time required to create the dictionary in the first place (55s), or including checks within the loop. If this is likely then its best to create a set that is a intersection of the dictionary keys and your filter:
keys = cells.keys() & keys
In summary: del is already heavily optimised, so don't worry about using it.
Another map() way to remove list of keys from dictionary
and avoid raising KeyError exception
dic = {
'key1': 1,
'key2': 2,
'key3': 3,
'key4': 4,
'key5': 5,
}
keys_to_remove = ['key_not_exist', 'key1', 'key2', 'key3']
k = list(map(dic.pop, keys_to_remove, keys_to_remove))
print('k=', k)
print('dic after = \n', dic)
**this will produce output**
k= ['key_not_exist', 1, 2, 3]
dic after = {'key4': 4, 'key5': 5}
Duplicate keys_to_remove is artificial, it needs to supply defaults values for dict.pop() function.
You can add here any array with len_ = len(key_to_remove)
For example
dic = {
'key1': 1,
'key2': 2,
'key3': 3,
'key4': 4,
'key5': 5,
}
keys_to_remove = ['key_not_exist', 'key1', 'key2', 'key3']
k = list(map(dic.pop, keys_to_remove, np.zeros(len(keys_to_remove))))
print('k=', k)
print('dic after = ', dic)
** will produce output **
k= [0.0, 1, 2, 3]
dic after = {'key4': 4, 'key5': 5}
def delete_keys_from_dict(dictionary, keys):
"""
Deletes the unwanted keys in the dictionary
:param dictionary: dict
:param keys: list of keys
:return: dict (modified)
"""
from collections.abc import MutableMapping
keys_set = set(keys)
modified_dict = {}
for key, value in dictionary.items():
if key not in keys_set:
if isinstance(value, list):
modified_dict[key] = list()
for x in value:
if isinstance(x, MutableMapping):
modified_dict[key].append(delete_keys_from_dict(x, keys_set))
else:
modified_dict[key].append(x)
elif isinstance(value, MutableMapping):
modified_dict[key] = delete_keys_from_dict(value, keys_set)
else:
modified_dict[key] = value
return modified_dict
_d = {'a': 1245, 'b': 1234325, 'c': {'a': 1245, 'b': 1234325}, 'd': 98765,
'e': [{'a': 1245, 'b': 1234325},
{'a': 1245, 'b': 1234325},
{'t': 767}]}
_output = delete_keys_from_dict(_d, ['a', 'b'])
_expected = {'c': {}, 'd': 98765, 'e': [{}, {}, {'t': 767}]}
print(_expected)
print(_output)
I'm late to this discussion but for anyone else. A solution may be to create a list of keys as such.
k = ['a','b','c','d']
Then use pop() in a list comprehension, or for loop, to iterate over the keys and pop one at a time as such.
new_dictionary = [dictionary.pop(x, 'n/a') for x in k]
The 'n/a' is in case the key does not exist, a default value needs to be returned.

Merge several Python dictionaries [duplicate]

This question already has answers here:
How to merge dicts, collecting values from matching keys?
(17 answers)
Closed 3 months ago.
I have to merge list of python dictionary. For eg:
dicts[0] = {'a':1, 'b':2, 'c':3}
dicts[1] = {'a':1, 'd':2, 'c':'foo'}
dicts[2] = {'e':57,'c':3}
super_dict = {'a':[1], 'b':[2], 'c':[3,'foo'], 'd':[2], 'e':[57]}
I wrote the following code:
super_dict = {}
for d in dicts:
for k, v in d.items():
if super_dict.get(k) is None:
super_dict[k] = []
if v not in super_dict.get(k):
super_dict[k].append(v)
Can it be presented more elegantly / optimized?
Note
I found another question on SO but its about merging exactly 2 dictionaries.
You can iterate over the dictionaries directly -- no need to use range. The setdefault method of dict looks up a key, and returns the value if found. If not found, it returns a default, and also assigns that default to the key.
super_dict = {}
for d in dicts:
for k, v in d.iteritems(): # d.items() in Python 3+
super_dict.setdefault(k, []).append(v)
Also, you might consider using a defaultdict. This just automates setdefault by calling a function to return a default value when a key isn't found.
import collections
super_dict = collections.defaultdict(list)
for d in dicts:
for k, v in d.iteritems(): # d.items() in Python 3+
super_dict[k].append(v)
Also, as Sven Marnach astutely observed, you seem to want no duplication of values in your lists. In that case, set gets you what you want:
import collections
super_dict = collections.defaultdict(set)
for d in dicts:
for k, v in d.iteritems(): # d.items() in Python 3+
super_dict[k].add(v)
from collections import defaultdict
dicts = [{'a':1, 'b':2, 'c':3},
{'a':1, 'd':2, 'c':'foo'},
{'e':57, 'c':3} ]
super_dict = defaultdict(set) # uses set to avoid duplicates
for d in dicts:
for k, v in d.items(): # use d.iteritems() in python 2
super_dict[k].add(v)
you can use this behaviour of dict. (a bit elegant)
a = {'a':1, 'b':2, 'c':3}
b = {'d':1, 'e':2, 'f':3}
c = {1:1, 2:2, 3:3}
merge = {**a, **b, **c}
print(merge) # {'a': 1, 'b': 2, 'c': 3, 'd': 1, 'e': 2, 'f': 3, 1: 1, 2: 2, 3: 3}
and you are good to go :)
Merge the keys of all dicts, and for each key assemble the list of values:
super_dict = {}
for k in set(k for d in dicts for k in d):
super_dict[k] = [d[k] for d in dicts if k in d]
The expression set(k for d in dicts for k in d) builds a set of all unique keys of all dictionaries. For each of these unique keys, we use the list comprehension [d[k] for d in dicts if k in d] to build the list of values from all dicts for this key.
Since you only seem to one the unique value of each key, you might want to use sets instead:
super_dict = {}
for k in set(k for d in dicts for k in d):
super_dict[k] = set(d[k] for d in dicts if k in d)
It seems like most of the answers using comprehensions are not all that readable. In case any gets lost in the mess of answers above this might be helpful (although extremely late...). Just loop over the items of each dict and place them in a separate one.
super_dict = {key:val for d in dicts for key,val in d.items()}
When the value of the keys are in list:
from collections import defaultdict
dicts = [{'a':[1], 'b':[2], 'c':[3]},
{'a':[11], 'd':[2], 'c':['foo']},
{'e':[57], 'c':[3], "a": [1]} ]
super_dict = defaultdict(list) # uses set to avoid duplicates
for d in dicts:
for k, v in d.items(): # use d.iteritems() in python 2
super_dict[k] = list(set(super_dict[k] + v))
combined_dict = {}
for elem in super_dict.keys():
combined_dict[elem] = super_dict[elem]
combined_dict
## output: {'a': [1, 11], 'b': [2], 'c': [3, 'foo'], 'd': [2], 'e': [57]}
I have a very easy to go solution without any imports.
I use the dict.update() method.
But sadly it will overwrite, if same key appears in more than one dictionary, then the most recently merged dict's value will appear in the output.
dict1 = {'Name': 'Zara', 'Age': 7}
dict2 = {'Sex': 'female' }
dict3 = {'Status': 'single', 'Age': 27}
dict4 = {'Occupation':'nurse', 'Wage': 3000}
def mergedict(*args):
output = {}
for arg in args:
output.update(arg)
return output
print(mergedict(dict1, dict2, dict3, dict4))
The output is this:
{'Name': 'Zara', 'Age': 27, 'Sex': 'female', 'Status': 'single', 'Occupation': 'nurse', 'Wage': 3000}
Perhaps a more modern and concise approach for those who use python 3.3 or later versions is the use of ChainMap from the collections module.
from collections import ChainMap
d1 = {'a': 1, 'b': 3}
d2 = {'c': 2}
d3 = {'d': 7, 'a': 9}
d4 = {}
combo = dict(ChainMap(d1, d2, d3, d4))
# {'d': 7, 'a': 1, 'c': 2, 'b': 3}
For a larger collection of dict objects then star operator works
dict(ChainMap(*dict_collection))
Note that the resulting dictionary seems to only keep the value of the first key it encounters in the ordered collection and ignores any further duplicates.
This may be a bit more elegant:
super_dict = {}
for d in dicts:
for k, v in d.iteritems():
l=super_dict.setdefault(k,[])
if v not in l:
l.append(v)
UPDATE: made change suggested by Sven
UPDATE: changed to avoid duplicates (thanks Marcin and Steven)
Never forget that the standard libraries have a wealth of tools for dealing with dicts and iteration:
from itertools import chain
from collections import defaultdict
super_dict = defaultdict(list)
for k,v in chain.from_iterable(d.iteritems() for d in dicts):
if v not in super_dict[k]: super_dict[k].append(v)
Note that the if v not in super_dict[k] can be avoided by using defaultdict(set) as per Steven Rumbalski's answer.
If you assume that the keys in which you are interested are at the same nested level, you can recursively traverse each dictionary and create a new dictionary using that key, effectively merging them.
merged = {}
for d in dicts:
def walk(d,merge):
for key, item in d.items():
if isinstance(item, dict):
merge.setdefault(key, {})
walk(item, merge[key])
else:
merge.setdefault(key, [])
merge[key].append(item)
walk(d,merged)
For example, say you have the following dictionaries you want to merge.
dicts = [{'A': {'A1': {'FOO': [1,2,3]}}},
{'A': {'A1': {'A2': {'BOO': [4,5,6]}}}},
{'A': {'A1': {'FOO': [7,8]}}},
{'B': {'B1': {'COO': [9]}}},
{'B': {'B2': {'DOO': [10,11,12]}}},
{'C': {'C1': {'C2': {'POO':[13,14,15]}}}},
{'C': {'C1': {'ROO': [16,17]}}}]
Using the key at each level, you should get something like this:
{'A': {'A1': {'FOO': [[1, 2, 3], [7, 8]],
'A2': {'BOO': [[4, 5, 6]]}}},
'B': {'B1': {'COO': [[9]]},
'B2': {'DOO': [[10, 11, 12]]}},
'C': {'C1': {'C2': {'POO': [[13, 14, 15]]},
'ROO': [[16, 17]]}}}
Note: I assume the leaf at each branch is a list of some kind, but you can obviously change the logic to do whatever is necessary for your situation.
This is a more recent enhancement over the prior answer by ElbowPipe, using newer syntax introduced in Python 3.9 for merging dictionaries. Note that this answer does not merge conflicting values into a list!
> import functools
> import operator
> functools.reduce(operator.or_, [{0:1}, {2:3, 4:5}, {2:6}])
{0: 1, 2: 6, 4: 5}
For a oneliner, the following could be used:
{key: {d[key] for d in dicts if key in d} for key in {key for d in dicts for key in d}}
although readibility would benefit from naming the combined key set:
combined_key_set = {key for d in dicts for key in d}
super_dict = {key: {d[key] for d in dicts if key in d} for key in combined_key_set}
Elegance can be debated but personally I prefer comprehensions over for loops. :)
(The dictionary and set comprehensions are available in Python 2.7/3.1 and newer.)
python 3.x (reduce is builtin for python 2.x, so no need to import if in 2.x)
import operator
from functools import operator.add
a = [{'a': 1}, {'b': 2}, {'c': 3, 'd': 4}]
dict(reduce(operator.add, map(list,(map(dict.items, a))))
map(dict.items, a) # converts to list of key, value iterators
map(list, ... # converts to iterator equivalent of [[[a, 1]], [[b, 2]], [[c, 3],[d,4]]]
reduce(operator.add, ... # reduces the multiple list down to a single list
My solution is similar to #senderle proposed, but instead of for loop I used map
super_dict = defaultdict(set)
map(lambda y: map(lambda x: super_dict[x].add(y[x]), y), dicts)
The use of defaultdict is good, this also can be done with the use of itertools.groupby.
import itertools
# output all dict items, and sort them by key
dicts_ele = sorted( ( item for d in dicts for item in d.items() ), key = lambda x: x[0] )
# groups items by key
ele_groups = itertools.groupby( dicts_ele, key = lambda x: x[0] )
# iterates over groups and get item value
merged = { k: set( v[1] for v in grouped ) for k, grouped in ele_groups }
and obviously, you can merge this block of code into one-line style
merged = {
k: set( v[1] for v in grouped )
for k, grouped in (
itertools.groupby(
sorted(
( item for d in dicts for item in d.items() ),
key = lambda x: x[0]
),
key = lambda x: x[0]
)
)
}
I'm a bit late to the game but I did it in 2 lines with no dependencies beyond python itself:
flatten = lambda *c: (b for a in c for b in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))
o = reduce(lambda d1,d2: dict((k, list(flatten([d1.get(k), d2.get(k)]))) for k in set(d1.keys() + d2.keys())), dicts)
# output:
# {'a': [1, 1, None], 'c': [3, 'foo', 3], 'b': [2, None, None], 'e': [None, 57], 'd': [None, 2, None]}
Though if you don't care about nested lists, then:
o2 = reduce(lambda d1,d2: dict((k, [d1.get(k), d2.get(k)]) for k in set(d1.keys() + d2.keys())), dicts)
# output:
# {'a': [[1, 1], None], 'c': [[3, 'foo'], 3], 'b': [[2, None], None], 'e': [None, 57], 'd': [[None, 2], None]}

Removing multiple keys from a dictionary safely

I know how to remove an entry, 'key' from my dictionary d, safely. You do:
if d.has_key('key'):
del d['key']
However, I need to remove multiple entries from a dictionary safely. I was thinking of defining the entries in a tuple as I will need to do this more than once.
entities_to_remove = ('a', 'b', 'c')
for x in entities_to_remove:
if x in d:
del d[x]
However, I was wondering if there is a smarter way to do this?
Using dict.pop:
d = {'some': 'data'}
entries_to_remove = ('any', 'iterable')
for k in entries_to_remove:
d.pop(k, None)
Using Dict Comprehensions
final_dict = {key: value for key, value in d if key not in [key1, key2]}
where key1 and key2 are to be removed.
In the example below, keys "b" and "c" are to be removed & it's kept in a keys list.
>>> a
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> keys = ["b", "c"]
>>> print {key: a[key] for key in a if key not in keys}
{'a': 1, 'd': 4}
>>>
Why not like this:
entries = ('a', 'b', 'c')
the_dict = {'b': 'foo'}
def entries_to_remove(entries, the_dict):
for key in entries:
if key in the_dict:
del the_dict[key]
A more compact version was provided by mattbornski using dict.pop()
a solution is using map and filter functions
python 2
d={"a":1,"b":2,"c":3}
l=("a","b","d")
map(d.__delitem__, filter(d.__contains__,l))
print(d)
python 3
d={"a":1,"b":2,"c":3}
l=("a","b","d")
list(map(d.__delitem__, filter(d.__contains__,l)))
print(d)
you get:
{'c': 3}
If you also need to retrieve the values for the keys you are removing, this would be a pretty good way to do it:
values_removed = [d.pop(k, None) for k in entities_to_remove]
You could of course still do this just for the removal of the keys from d, but you would be unnecessarily creating the list of values with the list comprehension. It is also a little unclear to use a list comprehension just for the function's side effect.
Found a solution with pop and map
d = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'b', 'c']
list(map(d.pop, keys))
print(d)
The output of this:
{'d': 'valueD'}
I have answered this question so late just because I think it will help in the future if anyone searches the same. And this might help.
Update
The above code will throw an error if a key does not exist in the dict.
DICTIONARY = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'l', 'c']
def remove_key(key):
DICTIONARY.pop(key, None)
list(map(remove_key, keys))
print(DICTIONARY)
output:
DICTIONARY = {'b': 'valueB', 'd': 'valueD'}
Some timing tests for cpython 3 shows that a simple for loop is the fastest way, and it's quite readable. Adding in a function doesn't cause much overhead either:
timeit results (10k iterations):
all(x.pop(v) for v in r) # 0.85
all(map(x.pop, r)) # 0.60
list(map(x.pop, r)) # 0.70
all(map(x.__delitem__, r)) # 0.44
del_all(x, r) # 0.40
<inline for loop>(x, r) # 0.35
def del_all(mapping, to_remove):
"""Remove list of elements from mapping."""
for key in to_remove:
del mapping[key]
For small iterations, doing that 'inline' was a bit faster, because of the overhead of the function call. But del_all is lint-safe, reusable, and faster than all the python comprehension and mapping constructs.
I have no problem with any of the existing answers, but I was surprised to not find this solution:
keys_to_remove = ['a', 'b', 'c']
my_dict = {k: v for k, v in zip("a b c d e f g".split(' '), [0, 1, 2, 3, 4, 5, 6])}
for k in keys_to_remove:
try:
del my_dict[k]
except KeyError:
pass
assert my_dict == {'d': 3, 'e': 4, 'f': 5, 'g': 6}
Note: I stumbled across this question coming from here. And my answer is related to this answer.
I have tested the performance of three methods:
# Method 1: `del`
for key in remove_keys:
if key in d:
del d[key]
# Method 2: `pop()`
for key in remove_keys:
d.pop(key, None)
# Method 3: comprehension
{key: v for key, v in d.items() if key not in remove_keys}
Here are the results of 1M iterations:
del: 2.03s 2.0 ns/iter (100%)
pop(): 2.38s 2.4 ns/iter (117%)
comprehension: 4.11s 4.1 ns/iter (202%)
So both del and pop() are the fastest. Comprehensions are 2x slower.
But anyway, we speak nanoseconds here :) Dicts in Python are ridiculously fast.
Why not:
entriestoremove = (2,5,1)
for e in entriestoremove:
if d.has_key(e):
del d[e]
I don't know what you mean by "smarter way". Surely there are other ways, maybe with dictionary comprehensions:
entriestoremove = (2,5,1)
newdict = {x for x in d if x not in entriestoremove}
inline
import functools
#: not key(c) in d
d = {"a": "avalue", "b": "bvalue", "d": "dvalue"}
entitiesToREmove = ('a', 'b', 'c')
#: python2
map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove)
#: python3
list(map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove))
print(d)
# output: {'d': 'dvalue'}
I think using the fact that the keys can be treated as a set is the nicest way if you're on python 3:
def remove_keys(d, keys):
to_remove = set(keys)
filtered_keys = d.keys() - to_remove
filtered_values = map(d.get, filtered_keys)
return dict(zip(filtered_keys, filtered_values))
Example:
>>> remove_keys({'k1': 1, 'k3': 3}, ['k1', 'k2'])
{'k3': 3}
It would be nice to have full support for set methods for dictionaries (and not the unholy mess we're getting with Python 3.9) so that you could simply "remove" a set of keys. However, as long as that's not the case, and you have a large dictionary with potentially a large number of keys to remove, you might want to know about the performance. So, I've created some code that creates something large enough for meaningful comparisons: a 100,000 x 1000 matrix, so 10,000,00 items in total.
from itertools import product
from time import perf_counter
# make a complete worksheet 100000 * 1000
start = perf_counter()
prod = product(range(1, 100000), range(1, 1000))
cells = {(x,y):x for x,y in prod}
print(len(cells))
print(f"Create time {perf_counter()-start:.2f}s")
clock = perf_counter()
# remove everything above row 50,000
keys = product(range(50000, 100000), range(1, 100))
# for x,y in keys:
# del cells[x, y]
for n in map(cells.pop, keys):
pass
print(len(cells))
stop = perf_counter()
print(f"Removal time {stop-clock:.2f}s")
10 million items or more is not unusual in some settings. Comparing the two methods on my local machine I see a slight improvement when using map and pop, presumably because of fewer function calls, but both take around 2.5s on my machine. But this pales in comparison to the time required to create the dictionary in the first place (55s), or including checks within the loop. If this is likely then its best to create a set that is a intersection of the dictionary keys and your filter:
keys = cells.keys() & keys
In summary: del is already heavily optimised, so don't worry about using it.
Another map() way to remove list of keys from dictionary
and avoid raising KeyError exception
dic = {
'key1': 1,
'key2': 2,
'key3': 3,
'key4': 4,
'key5': 5,
}
keys_to_remove = ['key_not_exist', 'key1', 'key2', 'key3']
k = list(map(dic.pop, keys_to_remove, keys_to_remove))
print('k=', k)
print('dic after = \n', dic)
**this will produce output**
k= ['key_not_exist', 1, 2, 3]
dic after = {'key4': 4, 'key5': 5}
Duplicate keys_to_remove is artificial, it needs to supply defaults values for dict.pop() function.
You can add here any array with len_ = len(key_to_remove)
For example
dic = {
'key1': 1,
'key2': 2,
'key3': 3,
'key4': 4,
'key5': 5,
}
keys_to_remove = ['key_not_exist', 'key1', 'key2', 'key3']
k = list(map(dic.pop, keys_to_remove, np.zeros(len(keys_to_remove))))
print('k=', k)
print('dic after = ', dic)
** will produce output **
k= [0.0, 1, 2, 3]
dic after = {'key4': 4, 'key5': 5}
def delete_keys_from_dict(dictionary, keys):
"""
Deletes the unwanted keys in the dictionary
:param dictionary: dict
:param keys: list of keys
:return: dict (modified)
"""
from collections.abc import MutableMapping
keys_set = set(keys)
modified_dict = {}
for key, value in dictionary.items():
if key not in keys_set:
if isinstance(value, list):
modified_dict[key] = list()
for x in value:
if isinstance(x, MutableMapping):
modified_dict[key].append(delete_keys_from_dict(x, keys_set))
else:
modified_dict[key].append(x)
elif isinstance(value, MutableMapping):
modified_dict[key] = delete_keys_from_dict(value, keys_set)
else:
modified_dict[key] = value
return modified_dict
_d = {'a': 1245, 'b': 1234325, 'c': {'a': 1245, 'b': 1234325}, 'd': 98765,
'e': [{'a': 1245, 'b': 1234325},
{'a': 1245, 'b': 1234325},
{'t': 767}]}
_output = delete_keys_from_dict(_d, ['a', 'b'])
_expected = {'c': {}, 'd': 98765, 'e': [{}, {}, {'t': 767}]}
print(_expected)
print(_output)
I'm late to this discussion but for anyone else. A solution may be to create a list of keys as such.
k = ['a','b','c','d']
Then use pop() in a list comprehension, or for loop, to iterate over the keys and pop one at a time as such.
new_dictionary = [dictionary.pop(x, 'n/a') for x in k]
The 'n/a' is in case the key does not exist, a default value needs to be returned.

Categories

Resources