How to sum dict elements - python

In Python,
I have list of dicts:
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
I want one final dict that will contain the sum of all dicts.
I.e the result will be: {'a':5, 'b':7}
N.B: every dict in the list will contain same number of key, value pairs.

You can use the collections.Counter
counter = collections.Counter()
for d in dict1:
counter.update(d)
Or, if you prefer oneliners:
functools.reduce(operator.add, map(collections.Counter, dict1))

A little ugly, but a one-liner:
dictf = reduce(lambda x, y: dict((k, v + y[k]) for k, v in x.iteritems()), dict1)

Leveraging sum() should get better performance when adding more than a few dicts
>>> dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
>>> from operator import itemgetter
>>> {k:sum(map(itemgetter(k), dict1)) for k in dict1[0]} # Python2.7+
{'a': 5, 'b': 7}
>>> dict((k,sum(map(itemgetter(k), dict1))) for k in dict1[0]) # Python2.6
{'a': 5, 'b': 7}
adding Stephan's suggestion
>>> {k: sum(d[k] for d in dict1) for k in dict1[0]} # Python2.7+
{'a': 5, 'b': 7}
>>> dict((k, sum(d[k] for d in dict1)) for k in dict1[0]) # Python2.6
{'a': 5, 'b': 7}
I think Stephan's version of the Python2.7 code reads really nicely

This might help:
def sum_dict(d1, d2):
for key, value in d1.items():
d1[key] = value + d2.get(key, 0)
return d1
>>> dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
>>> reduce(sum_dict, dict1)
{'a': 5, 'b': 7}

The following code shows one way to do it:
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
final = {}
for k in dict1[0].keys(): # Init all elements to zero.
final[k] = 0
for d in dict1:
for k in d.keys():
final[k] = final[k] + d[k] # Update the element.
print final
This outputs:
{'a': 5, 'b': 7}
as you desired.
Or, as inspired by kriss, better but still readable:
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
final = {}
for d in dict1:
for k in d.keys():
final[k] = final.get(k,0) + d[k]
print final
I pine for the days of the original, readable Python :-)

I was interested in the performance of the proposed Counter, reduce and sum methods for large lists. Maybe someone else is interested in this as well.
You can have a look here: https://gist.github.com/torstenrudolf/277e98df296f23ff921c
I tested the three methods for this list of dictionaries:
dictList = [{'a': x, 'b': 2*x, 'c': x**2} for x in xrange(10000)]
the sum method showed the best performance, followed by reduce and Counter was the slowest. The time showed below is in seconds.
In [34]: test(dictList)
Out[34]:
{'counter': 0.01955194902420044,
'reduce': 0.006518083095550537,
'sum': 0.0018319153785705566}
But this is dependent on the number of elements in the dictionaries. the sum method will slow down faster than the reduce.
l = [{y: x*y for y in xrange(100)} for x in xrange(10000)]
In [37]: test(l, num=100)
Out[37]:
{'counter': 0.2401433277130127,
'reduce': 0.11110662937164306,
'sum': 0.2256883692741394}

You can also use the pandas sum function to compute the sum:
import pandas as pd
# create a DataFrame
df = pd.DataFrame(dict1)
# compute the sum and convert to dict.
dict(df.sum())
This results in:
{'a': 5, 'b': 7}
It also works for floating points:
dict2 = [{'a':2, 'b':3.3},{'a':3, 'b':4.5}]
dict(pd.DataFrame(dict2).sum())
Gives the correct results:
{'a': 5.0, 'b': 7.8}

In Python 2.7 you can replace the dict with a collections.Counter object. This supports addition and subtraction of Counters.

Here is a reasonable beatiful one.
final = {}
for k in dict1[0].Keys():
final[k] = sum(x[k] for x in dict1)
return final

Here is another working solution (python3), quite general as it works for dict, lists, arrays. For non-common elements, the original value will be included in the output dict.
def mergsum(a, b):
for k in b:
if k in a:
b[k] = b[k] + a[k]
c = {**a, **b}
return c
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
print(mergsum(dict1[0], dict1[1]))

One further one line solution
dict(
functools.reduce(
lambda x, y: x.update(y) or x, # update, returns None, and we need to chain.
dict1,
collections.Counter())
)
This creates only one counter, uses it as an accumulator and finally converts back to a dict.

Related

Python convert dict of lists to dict of sets?

I have:
myDict = {'a': [1,2,3], 'b':[4,5,6], 'c':[7,8,9]}
I want:
myDict = {'a': set([1,2,3]), 'b':set([4,5,6]), 'c':set([7,8,9])}
Is there a one-liner I can use to do this rather than looping through it and converting the type of the values?
You'll have to loop anyway:
{key: set(value) for key, value in yourData.items()}
If you're using Python 3.6+, you can also do this:
dict(zip(myDict.keys(), map(set, myDict.values())))
This can be done with map by mapping the values to type set
myDict = dict(map(lambda x: (x[0], set(x[1])), myDict.items()))
Or with either version of dictionary comprehension as well
myDict = {k: set(v) for k, v in myDict.items()}
myDict = {k: set(myDict[k]) for k in myDict}
You can use comprehension for it:
Basically, loop through the key-value pairs and create set out of each value for the corresponding key.
>>> myDict = {'a': [1,2,3], 'b':[4,5,6], 'c':[7,8,9]}
>>> myDict = {k: set(v) for k, v in myDict.items()}
>>> myDict
{'a': {1, 2, 3}, 'b': {4, 5, 6}, 'c': {8, 9, 7}}
You can't do it without looping anyway, but you can have the looping done in one line, with the following code:
myDict = {k:set(v) for k, v in myDict.items()}
This is basically traversing each item in your dictionary and converting the lists to sets and combining the key(str):value(set) pairs to a new dictionary and assigning it back to myDict variable.

Find unique (key: value) pair given N dictionaries in python

I would like to find an easy and/or fast way to find all common couple (pair: value) given N dictionaries in python. (3.X would be best)
PROBLEM
Given a set of 3 dicts (but it could be any dict, it is just for the example)
n1 = {'a': 1, 'b': 2, 'c': 3}
n2 = {'a': 1, 'b': 4, 'c': 3, 'd': 4}
n3 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
The result for common (key: values) for n1, n2 and n3
should be:
({'a': 1, 'c': 3})
And for n2 and n3 it should be
({'a': 1, 'c': 3, 'd': 4})
I first though about using a brute force algorithm that will check every pair (key: value) for every dict
Here is a implementation using a recursive algorithm
SOLUTION A
list_dict = [n1, n2, n3]
def finding_uniquness(ls):
def recursion(ls, result):
if not ls:
return result
result = {k: v for k, v in result.items() for k1, v1 in ls[0].items() if k == k1 and v == v1}
return recursion(ls[1:], result)
return recursion(ls[1:], ls[0])
finding_uniquness(list_dict)
# {'c': 3, 'a': 1}
But it is not easily understandable and the complexity is high
(I'm not sure how to calculate complexity; but since we compare all the elements on all dict, the complexity should be O(N²)?)
Then, I though about Sets. because it could naturally compare all the element
SOLUTION B
import functools
list_dict = [n1, n2, n3]
set_list = [set(n.items()) for n in list_dict]
functools.reduce(lambda x, y: x & y, set_list)
# {('a', 1), ('c', 3)}
It is so much better than the previous solution, unfortunately, when one of the key have a list as values it throws an error:
>>> n = {'a': [], 'b': 2, 'c': 3}
>>> set(n.items())
TypeError: unhashable type: 'list'
My question is then double:
is there any better algorithm than SOLUTION A?
or is there a way to avoid the TypeError with SOLUTION B?
of course, any other remarks will be welcome.
Simpler and more efficient way:
>>> {k: v
for k, v in list_dict[0].items()
if all(k in d and d[k] == v
for d in list_dict[1:])}
{'c': 3, 'a': 1}
Using an extra variable for list_dict[1:] might be beneficial, otherwise the short-circuiting of all somewhat goes to waste. Or if you don't need the list afterwards you could just pop the "master" dictionary:
>>> {k: v
for k, v in list_dict.pop().items()
if all(k in d and d[k] == v
for d in list_dict)}
{'c': 3, 'a': 1}
Or using get with a default that cannot be in the dictionary as suggested by #Jean-FrançoisFabre:
>>> marker = object()
>>> {k: v
for k, v in list_dict.pop().items()
if all(d.get(k, marker) == v
for d in list_dict)}
{'c': 3, 'a': 1}
If unhashable values are a problem you can always compute the intersection of the keys up-front by using .keys() and then compare only the values associated with the keys that all dictionaries have in common:
import operator as op
from functools import reduce
common_keys = reduce(op.and_, (d.keys() for d in my_dicts))
common_items = {}
for key in common_keys:
value = my_dicts[0][key]
if all(d[key] == value for d in my_dicts):
common_items[key] = value
This should be pretty faster than solution a, slower than solution b, but works on all inputs.
A batteries-included version.
To handle unhashable types, we use pickling; replace it with dill or json or any other predictable serialization to taste.
import collections
import itertools
import pickle
def findCommonPairs(dicts):
all_pairs = itertools.chain(*[d.items() for d in dicts])
cnt = collections.Counter(map(pickle.dumps, all_pairs))
return [pickle.loads(pickled_pair)
for pickled_pair, count in cnt.items()
if count == len(dicts)]
>>> findCommonPairs([n1, n2, n3])
[('a', 1), ('c', 3)]
>>> findCommonPairs([{'a': [1,2], 'b': [2,3]}, {'a': [1,2]}])
[('a', [1, 2])]
Note that serialization only goes so far. To properly compare dicts if dicts, for instance, these dicts must be turned into (key, value) pairs and sorted before serialization. Any structures that reference each other may have issues (or not). Replace pickling with a custom predictable serializer if you care about these issues.

Deleting dictionary keys from a provided list in Python [duplicate]

I know how to remove an entry, 'key' from my dictionary d, safely. You do:
if d.has_key('key'):
del d['key']
However, I need to remove multiple entries from a dictionary safely. I was thinking of defining the entries in a tuple as I will need to do this more than once.
entities_to_remove = ('a', 'b', 'c')
for x in entities_to_remove:
if x in d:
del d[x]
However, I was wondering if there is a smarter way to do this?
Using dict.pop:
d = {'some': 'data'}
entries_to_remove = ('any', 'iterable')
for k in entries_to_remove:
d.pop(k, None)
Using Dict Comprehensions
final_dict = {key: value for key, value in d if key not in [key1, key2]}
where key1 and key2 are to be removed.
In the example below, keys "b" and "c" are to be removed & it's kept in a keys list.
>>> a
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> keys = ["b", "c"]
>>> print {key: a[key] for key in a if key not in keys}
{'a': 1, 'd': 4}
>>>
Why not like this:
entries = ('a', 'b', 'c')
the_dict = {'b': 'foo'}
def entries_to_remove(entries, the_dict):
for key in entries:
if key in the_dict:
del the_dict[key]
A more compact version was provided by mattbornski using dict.pop()
a solution is using map and filter functions
python 2
d={"a":1,"b":2,"c":3}
l=("a","b","d")
map(d.__delitem__, filter(d.__contains__,l))
print(d)
python 3
d={"a":1,"b":2,"c":3}
l=("a","b","d")
list(map(d.__delitem__, filter(d.__contains__,l)))
print(d)
you get:
{'c': 3}
If you also need to retrieve the values for the keys you are removing, this would be a pretty good way to do it:
values_removed = [d.pop(k, None) for k in entities_to_remove]
You could of course still do this just for the removal of the keys from d, but you would be unnecessarily creating the list of values with the list comprehension. It is also a little unclear to use a list comprehension just for the function's side effect.
Found a solution with pop and map
d = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'b', 'c']
list(map(d.pop, keys))
print(d)
The output of this:
{'d': 'valueD'}
I have answered this question so late just because I think it will help in the future if anyone searches the same. And this might help.
Update
The above code will throw an error if a key does not exist in the dict.
DICTIONARY = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'l', 'c']
def remove_key(key):
DICTIONARY.pop(key, None)
list(map(remove_key, keys))
print(DICTIONARY)
output:
DICTIONARY = {'b': 'valueB', 'd': 'valueD'}
Some timing tests for cpython 3 shows that a simple for loop is the fastest way, and it's quite readable. Adding in a function doesn't cause much overhead either:
timeit results (10k iterations):
all(x.pop(v) for v in r) # 0.85
all(map(x.pop, r)) # 0.60
list(map(x.pop, r)) # 0.70
all(map(x.__delitem__, r)) # 0.44
del_all(x, r) # 0.40
<inline for loop>(x, r) # 0.35
def del_all(mapping, to_remove):
"""Remove list of elements from mapping."""
for key in to_remove:
del mapping[key]
For small iterations, doing that 'inline' was a bit faster, because of the overhead of the function call. But del_all is lint-safe, reusable, and faster than all the python comprehension and mapping constructs.
I have no problem with any of the existing answers, but I was surprised to not find this solution:
keys_to_remove = ['a', 'b', 'c']
my_dict = {k: v for k, v in zip("a b c d e f g".split(' '), [0, 1, 2, 3, 4, 5, 6])}
for k in keys_to_remove:
try:
del my_dict[k]
except KeyError:
pass
assert my_dict == {'d': 3, 'e': 4, 'f': 5, 'g': 6}
Note: I stumbled across this question coming from here. And my answer is related to this answer.
I have tested the performance of three methods:
# Method 1: `del`
for key in remove_keys:
if key in d:
del d[key]
# Method 2: `pop()`
for key in remove_keys:
d.pop(key, None)
# Method 3: comprehension
{key: v for key, v in d.items() if key not in remove_keys}
Here are the results of 1M iterations:
del: 2.03s 2.0 ns/iter (100%)
pop(): 2.38s 2.4 ns/iter (117%)
comprehension: 4.11s 4.1 ns/iter (202%)
So both del and pop() are the fastest. Comprehensions are 2x slower.
But anyway, we speak nanoseconds here :) Dicts in Python are ridiculously fast.
Why not:
entriestoremove = (2,5,1)
for e in entriestoremove:
if d.has_key(e):
del d[e]
I don't know what you mean by "smarter way". Surely there are other ways, maybe with dictionary comprehensions:
entriestoremove = (2,5,1)
newdict = {x for x in d if x not in entriestoremove}
inline
import functools
#: not key(c) in d
d = {"a": "avalue", "b": "bvalue", "d": "dvalue"}
entitiesToREmove = ('a', 'b', 'c')
#: python2
map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove)
#: python3
list(map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove))
print(d)
# output: {'d': 'dvalue'}
I think using the fact that the keys can be treated as a set is the nicest way if you're on python 3:
def remove_keys(d, keys):
to_remove = set(keys)
filtered_keys = d.keys() - to_remove
filtered_values = map(d.get, filtered_keys)
return dict(zip(filtered_keys, filtered_values))
Example:
>>> remove_keys({'k1': 1, 'k3': 3}, ['k1', 'k2'])
{'k3': 3}
It would be nice to have full support for set methods for dictionaries (and not the unholy mess we're getting with Python 3.9) so that you could simply "remove" a set of keys. However, as long as that's not the case, and you have a large dictionary with potentially a large number of keys to remove, you might want to know about the performance. So, I've created some code that creates something large enough for meaningful comparisons: a 100,000 x 1000 matrix, so 10,000,00 items in total.
from itertools import product
from time import perf_counter
# make a complete worksheet 100000 * 1000
start = perf_counter()
prod = product(range(1, 100000), range(1, 1000))
cells = {(x,y):x for x,y in prod}
print(len(cells))
print(f"Create time {perf_counter()-start:.2f}s")
clock = perf_counter()
# remove everything above row 50,000
keys = product(range(50000, 100000), range(1, 100))
# for x,y in keys:
# del cells[x, y]
for n in map(cells.pop, keys):
pass
print(len(cells))
stop = perf_counter()
print(f"Removal time {stop-clock:.2f}s")
10 million items or more is not unusual in some settings. Comparing the two methods on my local machine I see a slight improvement when using map and pop, presumably because of fewer function calls, but both take around 2.5s on my machine. But this pales in comparison to the time required to create the dictionary in the first place (55s), or including checks within the loop. If this is likely then its best to create a set that is a intersection of the dictionary keys and your filter:
keys = cells.keys() & keys
In summary: del is already heavily optimised, so don't worry about using it.
Another map() way to remove list of keys from dictionary
and avoid raising KeyError exception
dic = {
'key1': 1,
'key2': 2,
'key3': 3,
'key4': 4,
'key5': 5,
}
keys_to_remove = ['key_not_exist', 'key1', 'key2', 'key3']
k = list(map(dic.pop, keys_to_remove, keys_to_remove))
print('k=', k)
print('dic after = \n', dic)
**this will produce output**
k= ['key_not_exist', 1, 2, 3]
dic after = {'key4': 4, 'key5': 5}
Duplicate keys_to_remove is artificial, it needs to supply defaults values for dict.pop() function.
You can add here any array with len_ = len(key_to_remove)
For example
dic = {
'key1': 1,
'key2': 2,
'key3': 3,
'key4': 4,
'key5': 5,
}
keys_to_remove = ['key_not_exist', 'key1', 'key2', 'key3']
k = list(map(dic.pop, keys_to_remove, np.zeros(len(keys_to_remove))))
print('k=', k)
print('dic after = ', dic)
** will produce output **
k= [0.0, 1, 2, 3]
dic after = {'key4': 4, 'key5': 5}
def delete_keys_from_dict(dictionary, keys):
"""
Deletes the unwanted keys in the dictionary
:param dictionary: dict
:param keys: list of keys
:return: dict (modified)
"""
from collections.abc import MutableMapping
keys_set = set(keys)
modified_dict = {}
for key, value in dictionary.items():
if key not in keys_set:
if isinstance(value, list):
modified_dict[key] = list()
for x in value:
if isinstance(x, MutableMapping):
modified_dict[key].append(delete_keys_from_dict(x, keys_set))
else:
modified_dict[key].append(x)
elif isinstance(value, MutableMapping):
modified_dict[key] = delete_keys_from_dict(value, keys_set)
else:
modified_dict[key] = value
return modified_dict
_d = {'a': 1245, 'b': 1234325, 'c': {'a': 1245, 'b': 1234325}, 'd': 98765,
'e': [{'a': 1245, 'b': 1234325},
{'a': 1245, 'b': 1234325},
{'t': 767}]}
_output = delete_keys_from_dict(_d, ['a', 'b'])
_expected = {'c': {}, 'd': 98765, 'e': [{}, {}, {'t': 767}]}
print(_expected)
print(_output)
I'm late to this discussion but for anyone else. A solution may be to create a list of keys as such.
k = ['a','b','c','d']
Then use pop() in a list comprehension, or for loop, to iterate over the keys and pop one at a time as such.
new_dictionary = [dictionary.pop(x, 'n/a') for x in k]
The 'n/a' is in case the key does not exist, a default value needs to be returned.

Merge several Python dictionaries [duplicate]

This question already has answers here:
How to merge dicts, collecting values from matching keys?
(17 answers)
Closed 3 months ago.
I have to merge list of python dictionary. For eg:
dicts[0] = {'a':1, 'b':2, 'c':3}
dicts[1] = {'a':1, 'd':2, 'c':'foo'}
dicts[2] = {'e':57,'c':3}
super_dict = {'a':[1], 'b':[2], 'c':[3,'foo'], 'd':[2], 'e':[57]}
I wrote the following code:
super_dict = {}
for d in dicts:
for k, v in d.items():
if super_dict.get(k) is None:
super_dict[k] = []
if v not in super_dict.get(k):
super_dict[k].append(v)
Can it be presented more elegantly / optimized?
Note
I found another question on SO but its about merging exactly 2 dictionaries.
You can iterate over the dictionaries directly -- no need to use range. The setdefault method of dict looks up a key, and returns the value if found. If not found, it returns a default, and also assigns that default to the key.
super_dict = {}
for d in dicts:
for k, v in d.iteritems(): # d.items() in Python 3+
super_dict.setdefault(k, []).append(v)
Also, you might consider using a defaultdict. This just automates setdefault by calling a function to return a default value when a key isn't found.
import collections
super_dict = collections.defaultdict(list)
for d in dicts:
for k, v in d.iteritems(): # d.items() in Python 3+
super_dict[k].append(v)
Also, as Sven Marnach astutely observed, you seem to want no duplication of values in your lists. In that case, set gets you what you want:
import collections
super_dict = collections.defaultdict(set)
for d in dicts:
for k, v in d.iteritems(): # d.items() in Python 3+
super_dict[k].add(v)
from collections import defaultdict
dicts = [{'a':1, 'b':2, 'c':3},
{'a':1, 'd':2, 'c':'foo'},
{'e':57, 'c':3} ]
super_dict = defaultdict(set) # uses set to avoid duplicates
for d in dicts:
for k, v in d.items(): # use d.iteritems() in python 2
super_dict[k].add(v)
you can use this behaviour of dict. (a bit elegant)
a = {'a':1, 'b':2, 'c':3}
b = {'d':1, 'e':2, 'f':3}
c = {1:1, 2:2, 3:3}
merge = {**a, **b, **c}
print(merge) # {'a': 1, 'b': 2, 'c': 3, 'd': 1, 'e': 2, 'f': 3, 1: 1, 2: 2, 3: 3}
and you are good to go :)
Merge the keys of all dicts, and for each key assemble the list of values:
super_dict = {}
for k in set(k for d in dicts for k in d):
super_dict[k] = [d[k] for d in dicts if k in d]
The expression set(k for d in dicts for k in d) builds a set of all unique keys of all dictionaries. For each of these unique keys, we use the list comprehension [d[k] for d in dicts if k in d] to build the list of values from all dicts for this key.
Since you only seem to one the unique value of each key, you might want to use sets instead:
super_dict = {}
for k in set(k for d in dicts for k in d):
super_dict[k] = set(d[k] for d in dicts if k in d)
It seems like most of the answers using comprehensions are not all that readable. In case any gets lost in the mess of answers above this might be helpful (although extremely late...). Just loop over the items of each dict and place them in a separate one.
super_dict = {key:val for d in dicts for key,val in d.items()}
When the value of the keys are in list:
from collections import defaultdict
dicts = [{'a':[1], 'b':[2], 'c':[3]},
{'a':[11], 'd':[2], 'c':['foo']},
{'e':[57], 'c':[3], "a": [1]} ]
super_dict = defaultdict(list) # uses set to avoid duplicates
for d in dicts:
for k, v in d.items(): # use d.iteritems() in python 2
super_dict[k] = list(set(super_dict[k] + v))
combined_dict = {}
for elem in super_dict.keys():
combined_dict[elem] = super_dict[elem]
combined_dict
## output: {'a': [1, 11], 'b': [2], 'c': [3, 'foo'], 'd': [2], 'e': [57]}
I have a very easy to go solution without any imports.
I use the dict.update() method.
But sadly it will overwrite, if same key appears in more than one dictionary, then the most recently merged dict's value will appear in the output.
dict1 = {'Name': 'Zara', 'Age': 7}
dict2 = {'Sex': 'female' }
dict3 = {'Status': 'single', 'Age': 27}
dict4 = {'Occupation':'nurse', 'Wage': 3000}
def mergedict(*args):
output = {}
for arg in args:
output.update(arg)
return output
print(mergedict(dict1, dict2, dict3, dict4))
The output is this:
{'Name': 'Zara', 'Age': 27, 'Sex': 'female', 'Status': 'single', 'Occupation': 'nurse', 'Wage': 3000}
Perhaps a more modern and concise approach for those who use python 3.3 or later versions is the use of ChainMap from the collections module.
from collections import ChainMap
d1 = {'a': 1, 'b': 3}
d2 = {'c': 2}
d3 = {'d': 7, 'a': 9}
d4 = {}
combo = dict(ChainMap(d1, d2, d3, d4))
# {'d': 7, 'a': 1, 'c': 2, 'b': 3}
For a larger collection of dict objects then star operator works
dict(ChainMap(*dict_collection))
Note that the resulting dictionary seems to only keep the value of the first key it encounters in the ordered collection and ignores any further duplicates.
This may be a bit more elegant:
super_dict = {}
for d in dicts:
for k, v in d.iteritems():
l=super_dict.setdefault(k,[])
if v not in l:
l.append(v)
UPDATE: made change suggested by Sven
UPDATE: changed to avoid duplicates (thanks Marcin and Steven)
Never forget that the standard libraries have a wealth of tools for dealing with dicts and iteration:
from itertools import chain
from collections import defaultdict
super_dict = defaultdict(list)
for k,v in chain.from_iterable(d.iteritems() for d in dicts):
if v not in super_dict[k]: super_dict[k].append(v)
Note that the if v not in super_dict[k] can be avoided by using defaultdict(set) as per Steven Rumbalski's answer.
If you assume that the keys in which you are interested are at the same nested level, you can recursively traverse each dictionary and create a new dictionary using that key, effectively merging them.
merged = {}
for d in dicts:
def walk(d,merge):
for key, item in d.items():
if isinstance(item, dict):
merge.setdefault(key, {})
walk(item, merge[key])
else:
merge.setdefault(key, [])
merge[key].append(item)
walk(d,merged)
For example, say you have the following dictionaries you want to merge.
dicts = [{'A': {'A1': {'FOO': [1,2,3]}}},
{'A': {'A1': {'A2': {'BOO': [4,5,6]}}}},
{'A': {'A1': {'FOO': [7,8]}}},
{'B': {'B1': {'COO': [9]}}},
{'B': {'B2': {'DOO': [10,11,12]}}},
{'C': {'C1': {'C2': {'POO':[13,14,15]}}}},
{'C': {'C1': {'ROO': [16,17]}}}]
Using the key at each level, you should get something like this:
{'A': {'A1': {'FOO': [[1, 2, 3], [7, 8]],
'A2': {'BOO': [[4, 5, 6]]}}},
'B': {'B1': {'COO': [[9]]},
'B2': {'DOO': [[10, 11, 12]]}},
'C': {'C1': {'C2': {'POO': [[13, 14, 15]]},
'ROO': [[16, 17]]}}}
Note: I assume the leaf at each branch is a list of some kind, but you can obviously change the logic to do whatever is necessary for your situation.
This is a more recent enhancement over the prior answer by ElbowPipe, using newer syntax introduced in Python 3.9 for merging dictionaries. Note that this answer does not merge conflicting values into a list!
> import functools
> import operator
> functools.reduce(operator.or_, [{0:1}, {2:3, 4:5}, {2:6}])
{0: 1, 2: 6, 4: 5}
For a oneliner, the following could be used:
{key: {d[key] for d in dicts if key in d} for key in {key for d in dicts for key in d}}
although readibility would benefit from naming the combined key set:
combined_key_set = {key for d in dicts for key in d}
super_dict = {key: {d[key] for d in dicts if key in d} for key in combined_key_set}
Elegance can be debated but personally I prefer comprehensions over for loops. :)
(The dictionary and set comprehensions are available in Python 2.7/3.1 and newer.)
python 3.x (reduce is builtin for python 2.x, so no need to import if in 2.x)
import operator
from functools import operator.add
a = [{'a': 1}, {'b': 2}, {'c': 3, 'd': 4}]
dict(reduce(operator.add, map(list,(map(dict.items, a))))
map(dict.items, a) # converts to list of key, value iterators
map(list, ... # converts to iterator equivalent of [[[a, 1]], [[b, 2]], [[c, 3],[d,4]]]
reduce(operator.add, ... # reduces the multiple list down to a single list
My solution is similar to #senderle proposed, but instead of for loop I used map
super_dict = defaultdict(set)
map(lambda y: map(lambda x: super_dict[x].add(y[x]), y), dicts)
The use of defaultdict is good, this also can be done with the use of itertools.groupby.
import itertools
# output all dict items, and sort them by key
dicts_ele = sorted( ( item for d in dicts for item in d.items() ), key = lambda x: x[0] )
# groups items by key
ele_groups = itertools.groupby( dicts_ele, key = lambda x: x[0] )
# iterates over groups and get item value
merged = { k: set( v[1] for v in grouped ) for k, grouped in ele_groups }
and obviously, you can merge this block of code into one-line style
merged = {
k: set( v[1] for v in grouped )
for k, grouped in (
itertools.groupby(
sorted(
( item for d in dicts for item in d.items() ),
key = lambda x: x[0]
),
key = lambda x: x[0]
)
)
}
I'm a bit late to the game but I did it in 2 lines with no dependencies beyond python itself:
flatten = lambda *c: (b for a in c for b in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))
o = reduce(lambda d1,d2: dict((k, list(flatten([d1.get(k), d2.get(k)]))) for k in set(d1.keys() + d2.keys())), dicts)
# output:
# {'a': [1, 1, None], 'c': [3, 'foo', 3], 'b': [2, None, None], 'e': [None, 57], 'd': [None, 2, None]}
Though if you don't care about nested lists, then:
o2 = reduce(lambda d1,d2: dict((k, [d1.get(k), d2.get(k)]) for k in set(d1.keys() + d2.keys())), dicts)
# output:
# {'a': [[1, 1], None], 'c': [[3, 'foo'], 3], 'b': [[2, None], None], 'e': [None, 57], 'd': [[None, 2], None]}

Removing multiple keys from a dictionary safely

I know how to remove an entry, 'key' from my dictionary d, safely. You do:
if d.has_key('key'):
del d['key']
However, I need to remove multiple entries from a dictionary safely. I was thinking of defining the entries in a tuple as I will need to do this more than once.
entities_to_remove = ('a', 'b', 'c')
for x in entities_to_remove:
if x in d:
del d[x]
However, I was wondering if there is a smarter way to do this?
Using dict.pop:
d = {'some': 'data'}
entries_to_remove = ('any', 'iterable')
for k in entries_to_remove:
d.pop(k, None)
Using Dict Comprehensions
final_dict = {key: value for key, value in d if key not in [key1, key2]}
where key1 and key2 are to be removed.
In the example below, keys "b" and "c" are to be removed & it's kept in a keys list.
>>> a
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> keys = ["b", "c"]
>>> print {key: a[key] for key in a if key not in keys}
{'a': 1, 'd': 4}
>>>
Why not like this:
entries = ('a', 'b', 'c')
the_dict = {'b': 'foo'}
def entries_to_remove(entries, the_dict):
for key in entries:
if key in the_dict:
del the_dict[key]
A more compact version was provided by mattbornski using dict.pop()
a solution is using map and filter functions
python 2
d={"a":1,"b":2,"c":3}
l=("a","b","d")
map(d.__delitem__, filter(d.__contains__,l))
print(d)
python 3
d={"a":1,"b":2,"c":3}
l=("a","b","d")
list(map(d.__delitem__, filter(d.__contains__,l)))
print(d)
you get:
{'c': 3}
If you also need to retrieve the values for the keys you are removing, this would be a pretty good way to do it:
values_removed = [d.pop(k, None) for k in entities_to_remove]
You could of course still do this just for the removal of the keys from d, but you would be unnecessarily creating the list of values with the list comprehension. It is also a little unclear to use a list comprehension just for the function's side effect.
Found a solution with pop and map
d = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'b', 'c']
list(map(d.pop, keys))
print(d)
The output of this:
{'d': 'valueD'}
I have answered this question so late just because I think it will help in the future if anyone searches the same. And this might help.
Update
The above code will throw an error if a key does not exist in the dict.
DICTIONARY = {'a': 'valueA', 'b': 'valueB', 'c': 'valueC', 'd': 'valueD'}
keys = ['a', 'l', 'c']
def remove_key(key):
DICTIONARY.pop(key, None)
list(map(remove_key, keys))
print(DICTIONARY)
output:
DICTIONARY = {'b': 'valueB', 'd': 'valueD'}
Some timing tests for cpython 3 shows that a simple for loop is the fastest way, and it's quite readable. Adding in a function doesn't cause much overhead either:
timeit results (10k iterations):
all(x.pop(v) for v in r) # 0.85
all(map(x.pop, r)) # 0.60
list(map(x.pop, r)) # 0.70
all(map(x.__delitem__, r)) # 0.44
del_all(x, r) # 0.40
<inline for loop>(x, r) # 0.35
def del_all(mapping, to_remove):
"""Remove list of elements from mapping."""
for key in to_remove:
del mapping[key]
For small iterations, doing that 'inline' was a bit faster, because of the overhead of the function call. But del_all is lint-safe, reusable, and faster than all the python comprehension and mapping constructs.
I have no problem with any of the existing answers, but I was surprised to not find this solution:
keys_to_remove = ['a', 'b', 'c']
my_dict = {k: v for k, v in zip("a b c d e f g".split(' '), [0, 1, 2, 3, 4, 5, 6])}
for k in keys_to_remove:
try:
del my_dict[k]
except KeyError:
pass
assert my_dict == {'d': 3, 'e': 4, 'f': 5, 'g': 6}
Note: I stumbled across this question coming from here. And my answer is related to this answer.
I have tested the performance of three methods:
# Method 1: `del`
for key in remove_keys:
if key in d:
del d[key]
# Method 2: `pop()`
for key in remove_keys:
d.pop(key, None)
# Method 3: comprehension
{key: v for key, v in d.items() if key not in remove_keys}
Here are the results of 1M iterations:
del: 2.03s 2.0 ns/iter (100%)
pop(): 2.38s 2.4 ns/iter (117%)
comprehension: 4.11s 4.1 ns/iter (202%)
So both del and pop() are the fastest. Comprehensions are 2x slower.
But anyway, we speak nanoseconds here :) Dicts in Python are ridiculously fast.
Why not:
entriestoremove = (2,5,1)
for e in entriestoremove:
if d.has_key(e):
del d[e]
I don't know what you mean by "smarter way". Surely there are other ways, maybe with dictionary comprehensions:
entriestoremove = (2,5,1)
newdict = {x for x in d if x not in entriestoremove}
inline
import functools
#: not key(c) in d
d = {"a": "avalue", "b": "bvalue", "d": "dvalue"}
entitiesToREmove = ('a', 'b', 'c')
#: python2
map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove)
#: python3
list(map(lambda x: functools.partial(d.pop, x, None)(), entitiesToREmove))
print(d)
# output: {'d': 'dvalue'}
I think using the fact that the keys can be treated as a set is the nicest way if you're on python 3:
def remove_keys(d, keys):
to_remove = set(keys)
filtered_keys = d.keys() - to_remove
filtered_values = map(d.get, filtered_keys)
return dict(zip(filtered_keys, filtered_values))
Example:
>>> remove_keys({'k1': 1, 'k3': 3}, ['k1', 'k2'])
{'k3': 3}
It would be nice to have full support for set methods for dictionaries (and not the unholy mess we're getting with Python 3.9) so that you could simply "remove" a set of keys. However, as long as that's not the case, and you have a large dictionary with potentially a large number of keys to remove, you might want to know about the performance. So, I've created some code that creates something large enough for meaningful comparisons: a 100,000 x 1000 matrix, so 10,000,00 items in total.
from itertools import product
from time import perf_counter
# make a complete worksheet 100000 * 1000
start = perf_counter()
prod = product(range(1, 100000), range(1, 1000))
cells = {(x,y):x for x,y in prod}
print(len(cells))
print(f"Create time {perf_counter()-start:.2f}s")
clock = perf_counter()
# remove everything above row 50,000
keys = product(range(50000, 100000), range(1, 100))
# for x,y in keys:
# del cells[x, y]
for n in map(cells.pop, keys):
pass
print(len(cells))
stop = perf_counter()
print(f"Removal time {stop-clock:.2f}s")
10 million items or more is not unusual in some settings. Comparing the two methods on my local machine I see a slight improvement when using map and pop, presumably because of fewer function calls, but both take around 2.5s on my machine. But this pales in comparison to the time required to create the dictionary in the first place (55s), or including checks within the loop. If this is likely then its best to create a set that is a intersection of the dictionary keys and your filter:
keys = cells.keys() & keys
In summary: del is already heavily optimised, so don't worry about using it.
Another map() way to remove list of keys from dictionary
and avoid raising KeyError exception
dic = {
'key1': 1,
'key2': 2,
'key3': 3,
'key4': 4,
'key5': 5,
}
keys_to_remove = ['key_not_exist', 'key1', 'key2', 'key3']
k = list(map(dic.pop, keys_to_remove, keys_to_remove))
print('k=', k)
print('dic after = \n', dic)
**this will produce output**
k= ['key_not_exist', 1, 2, 3]
dic after = {'key4': 4, 'key5': 5}
Duplicate keys_to_remove is artificial, it needs to supply defaults values for dict.pop() function.
You can add here any array with len_ = len(key_to_remove)
For example
dic = {
'key1': 1,
'key2': 2,
'key3': 3,
'key4': 4,
'key5': 5,
}
keys_to_remove = ['key_not_exist', 'key1', 'key2', 'key3']
k = list(map(dic.pop, keys_to_remove, np.zeros(len(keys_to_remove))))
print('k=', k)
print('dic after = ', dic)
** will produce output **
k= [0.0, 1, 2, 3]
dic after = {'key4': 4, 'key5': 5}
def delete_keys_from_dict(dictionary, keys):
"""
Deletes the unwanted keys in the dictionary
:param dictionary: dict
:param keys: list of keys
:return: dict (modified)
"""
from collections.abc import MutableMapping
keys_set = set(keys)
modified_dict = {}
for key, value in dictionary.items():
if key not in keys_set:
if isinstance(value, list):
modified_dict[key] = list()
for x in value:
if isinstance(x, MutableMapping):
modified_dict[key].append(delete_keys_from_dict(x, keys_set))
else:
modified_dict[key].append(x)
elif isinstance(value, MutableMapping):
modified_dict[key] = delete_keys_from_dict(value, keys_set)
else:
modified_dict[key] = value
return modified_dict
_d = {'a': 1245, 'b': 1234325, 'c': {'a': 1245, 'b': 1234325}, 'd': 98765,
'e': [{'a': 1245, 'b': 1234325},
{'a': 1245, 'b': 1234325},
{'t': 767}]}
_output = delete_keys_from_dict(_d, ['a', 'b'])
_expected = {'c': {}, 'd': 98765, 'e': [{}, {}, {'t': 767}]}
print(_expected)
print(_output)
I'm late to this discussion but for anyone else. A solution may be to create a list of keys as such.
k = ['a','b','c','d']
Then use pop() in a list comprehension, or for loop, to iterate over the keys and pop one at a time as such.
new_dictionary = [dictionary.pop(x, 'n/a') for x in k]
The 'n/a' is in case the key does not exist, a default value needs to be returned.

Categories

Resources