I have a list of dictionaries that all have the same keys. Some of the dictionaries are identical except for one value. How can I merge those into a single dictionary that holds the differing values as a list?
Let me give you an example.
Say I have these dictionaries:
[{'a':1, 'b':2,'c':3},{'a':1, 'b':2,'c':4},{'a':1, 'b':3,'c':3},{'a':1, 'b':3,'c':4}]
My desired output would be this:
[{'a':1, 'b':2,'c':[3,4]},{'a':1, 'b':3,'c':[3,4]}]
I've tried nested for and if statements, but that is too expensive and ugly, and I'm sure there must be a better way. Could you give me a hand?
How could I do that for any dictionaries, assuming they all have the same number of keys and that I know the name of the key whose values should be merged into a list (c in this case)?
Thanks!
Use a collections.defaultdict to group the c values by a and b tuple keys:
from collections import defaultdict
lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 3, "c": 4},
]
d = defaultdict(list)
for x in lst:
    d[x["a"], x["b"]].append(x["c"])
result = [{"a": a, "b": b, "c": c} for (a, b), c in d.items()]
print(result)
You could also use itertools.groupby if lst is already ordered by a and b:
from itertools import groupby
from operator import itemgetter
lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 3, "c": 4},
]
result = [
    {"a": a, "b": b, "c": [x["c"] for x in g]}
    for (a, b), g in groupby(lst, key=itemgetter("a", "b"))
]
print(result)
Or if lst is not ordered by a and b, we can sort by those two keys as well:
result = [
    {"a": a, "b": b, "c": [x["c"] for x in g]}
    for (a, b), g in groupby(
        sorted(lst, key=itemgetter("a", "b")), key=itemgetter("a", "b")
    )
]
print(result)
Output:
[{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
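The reason for the sort is that groupby only merges consecutive items that share a key. The following is a minimal sketch of that behaviour; unsorted_lst and the helper group_c are hypothetical names introduced here for illustration, not part of the answer above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: the same records as above, but not ordered by "a" and "b".
unsorted_lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 4},
]
key = itemgetter("a", "b")


def group_c(records):
    # groupby starts a new group every time the key changes between neighbours.
    return [
        {"a": a, "b": b, "c": [x["c"] for x in g]}
        for (a, b), g in groupby(records, key=key)
    ]


without_sort = group_c(unsorted_lst)                 # equal keys are not adjacent
with_sort = group_c(sorted(unsorted_lst, key=key))   # equal keys are adjacent
print(len(without_sort), len(with_sort))
```

Without the sort, each record ends up in its own group, so nothing is merged. The defaultdict version does not have this requirement.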
Update
For a more generic solution for any amount of keys:
def merge_lst_dicts(lst, keys, merge_key):
    groups = defaultdict(list)
    for item in lst:
        # Group on the values of the chosen keys
        key = tuple(item.get(k) for k in keys)
        groups[key].append(item.get(merge_key))
    return [
        {**dict(zip(keys, group_key)), merge_key: merged_values}
        for group_key, merged_values in groups.items()
    ]
print(merge_lst_dicts(lst, ["a", "b"], "c"))
# [{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
You could use a temporary dict to solve this problem:
$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
>>> di=[{'a':1, 'b':2,'c':3},{'a':1, 'b':2,'c':4},{'a':1, 'b':3,'c':3},{'a':1, 'b':3,'c':4}]
>>> from collections import defaultdict as dd
>>> dt=dd(list) #default dict of list
>>> for d in di: #create temp dict with 'a','b' as tuple and append 'c'
...     dt[d['a'],d['b']].append(d['c'])
...
>>> ol=[] #output list
>>> for k,v in dt.items(): #Create final output from temp
...     ol.append({'a':k[0],'b':k[1], 'c':v})
...
>>> ol #output
[{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
If the input dicts have a large number of keys, extracting the tuple
for the temporary dict can be automated.
If the keys that define the merge condition are known in advance, they can simply be a constant tuple, e.g.
keys=('a','b') #in this case, merging happens over these keys
If they are not known until runtime, we can get these keys using the zip function and a set difference, e.g.
>>> di
[{'a': 1, 'b': 2, 'c': 3}, {'a': 1, 'b': 2, 'c': 4}, {'a': 1, 'b': 3, 'c': 3}, {'a': 1, 'b': 3, 'c': 4}]
>>> key_to_ignore_for_merge='c'
>>> keys=tuple(sorted(set(list(zip(*zip(*di)))[0])-{key_to_ignore_for_merge}))
>>> keys
('a', 'b')
At this point, we can use map to extract the tuple of values for these keys only:
>>> dt=dd(list)
>>> for d in di:
...     dt[tuple(map(d.get,keys))].append(d[key_to_ignore_for_merge])
...
>>> dt
defaultdict(<class 'list'>, {(1, 2): [3, 4], (1, 3): [3, 4]})
Now, recreating the dictionaries from the defaultdict and keys requires some zip magic again!
>>> ol=[]
>>> for k,v in dt.items():
...     dtt=dict(zip(keys, k))
...     dtt[key_to_ignore_for_merge]=v
...     ol.append(dtt)
...
>>> ol
[{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
This solution assumes that you only know the key whose values can differ (e.g. 'c') and that everything else is determined at runtime.
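The steps above can be collected into one function. This is only a sketch under the stated assumption that every dict has the same keys as the first one; the name merge_all_but is made up here, and it derives the grouping keys with a comprehension over the first dict rather than with zip and a set difference, which keeps the original key order.

```python
from collections import defaultdict


def merge_all_but(dicts, merge_key):
    """Group dicts on every key except merge_key and collect merge_key's values."""
    if not dicts:
        return []
    # Assumption: all dicts share the keys of the first one.
    keys = tuple(k for k in dicts[0] if k != merge_key)
    groups = defaultdict(list)
    for d in dicts:
        groups[tuple(d[k] for k in keys)].append(d[merge_key])
    # Rebuild one dict per group: grouping keys first, then the merged list.
    return [
        {**dict(zip(keys, group)), merge_key: values}
        for group, values in groups.items()
    ]


di = [
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 1, 'b': 2, 'c': 4},
    {'a': 1, 'b': 3, 'c': 3},
    {'a': 1, 'b': 3, 'c': 4},
]
merged = merge_all_but(di, 'c')
print(merged)
# [{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
```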
Related
I have list of identical dictionaries:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
I need to get something like this:
a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]
I know how to do it using for .. in .., but is there a way to do it without looping?
If I do
a, b, c = zip(*my_list)
I'm getting
a = ('a', 'a', 'a')
b = ('b', 'b', 'b')
c = ('c', 'c', 'c')
Any solution?
You need to extract all the values in my_list. You could try:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
a, b, c = zip(*map(lambda d: d.values(), my_list))
print(a, b, c)
# (1, 4, 7) (2, 5, 8) (3, 6, 9)
As @Alexandre pointed out, this only works when the dicts all have their keys in the same order. If you can't guarantee the order, consider yatu's answer.
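If the key order inside the dicts cannot be relied on, one option is to pull the values out by name instead of by position. The sketch below uses operator.itemgetter for that; the shuffled my_list is a hypothetical input chosen to show that the order no longer matters.

```python
from operator import itemgetter

# Hypothetical input: same data, but each dict lists its keys in a different order.
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'c': 6, 'a': 4, 'b': 5}, {'b': 8, 'c': 9, 'a': 7}]

# itemgetter('a', 'b', 'c') returns the tuple (d['a'], d['b'], d['c']) for each dict d.
a, b, c = zip(*map(itemgetter('a', 'b', 'c'), my_list))
print(a, b, c)
# (1, 4, 7) (2, 5, 8) (3, 6, 9)
```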
You will have to loop to obtain the values from the inner dictionaries. Probably the most appropriate structure would be a dictionary that maps each letter to a list of its values. Assigning to separate variables is usually not the best idea, as it only works for a fixed number of variables.
You can iterate over the inner dictionaries, and append to a defaultdict as:
from collections import defaultdict
out = defaultdict(list)
for d in my_list:
    for k, v in d.items():
        out[k].append(v)
print(out)
#defaultdict(list, {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})
pandas has a DataFrame factory method for just this, so if you already have it as a dependency or if the input data is large enough:
import pandas as pd
my_list = ...
df = pd.DataFrame.from_records(my_list)
a = list(df['a']) # df['a'] is a pandas Series, essentially a wrapped C array
b = list(df['b'])
c = list(df['c'])
Please find the code below. I believe that the version with a loop is much easier to read.
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
# we assume that all dictionaries have the same keys
a, b, c = map(list, map(lambda k: map(lambda d: d[k], my_list), my_list[0]))
print(a,b,c)
I have a dictionary, with this value:
{"a": 1, "b": 2, "c": 3}
I would like to rename the key b to B, without it losing its second place. In Python 3.7 and higher, dictionaries preserve insertion order, so the order of the keys can be counted on and might mean something. The end result I'm looking for is:
{"a": 1, "B": 2, "c": 3}
The obvious code would be to run:
>>> dictionary["B"] = dictionary.pop("b")
>>> dictionary
{'a': 1, 'c': 3, 'B': 2}
However, this doesn't preserve the order as desired.
foo = {'c': 2, 'b': 4, 'J': 7}
foo = {key if key != 'b' else 'B': value for key, value in foo.items()}
foo
Out[7]: {'c': 2, 'B': 4, 'J': 7}
If performance is not a concern, you could do the following, which modifies the dictionary d in place:
d = {"a": 1, "b": 2, "c": 3, "d": 4}
replacement = {"b": "B"}
for k, v in list(d.items()):
    d[replacement.get(k, k)] = d.pop(k)
print(d)
Output:
{'a': 1, 'B': 2, 'c': 3, 'd': 4}
Notice that the above solution will work for any numbers of keys to be replaced. Also note that you need to iterate over a copy of d.items() (using list(d.items())), as you shouldn't iterate over a dictionary while modifying its keys.
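The order survives because every key, renamed or not, is popped and re-inserted in its original sequence. Here is a small sketch that wraps the loop in a function; the name rename_keys_inplace is invented for this example, and it assumes none of the new names collides with a key that already exists in the dictionary.

```python
def rename_keys_inplace(d, replacement):
    """Rename keys of d in place while keeping each key's position."""
    # Iterate over a snapshot of the keys, because d is modified inside the loop.
    for k in list(d):
        # Each key is removed and appended again (possibly under a new name),
        # so after the last iteration the original order is rebuilt.
        d[replacement.get(k, k)] = d.pop(k)
    return d


d = {"a": 1, "b": 2, "c": 3, "d": 4}
same_object = d
rename_keys_inplace(d, {"b": "B", "d": "D"})
print(d)
# {'a': 1, 'B': 2, 'c': 3, 'D': 4}
```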
As a variant of the existing answers that also works for more than one replacement, you can define another dictionary that maps each key to be replaced to its new name:
>>> d = {"a": 1, "b": 2, "c": 3}
>>> repl = {"b": "B"}
>>> {repl.get(k, k): d[k] for k in d}
{'a': 1, 'B': 2, 'c': 3}
Of course, this still creates a new dictionary instead of updating the existing one and thus takes O(n) time, but at least it does so just once for all keys that need to be updated.
dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = dict()
for key in dict1:
    if key == 'b':
        dict2[key.upper()] = dict1[key]
    else:
        dict2[key] = dict1[key]
dict1 = dict2 #if you want to have it in original dict
You can test for whatever key you want in the if statement.
Given any number of dictionaries, how would one go about merging them all together, such that the merged dictionary contains all the dictionaries' keys and sums the values of the keys they share?
e.g.
d1 = {'a': 2, 'b': 3, 'c': 1}
d2 = {'a': 3, 'b': 2, 'c': 3}
d3 = {'b': 8, 'd': 2}
Our merged dictionary would look like this:
{'a': 5, 'b': 13, 'c': 4, 'd': 2}
Can this be done via kwargs? I am aware that one can do:
{**d1, **d2, **d3}
But can this be done for n-defined dictionaries?
You can use a Counter:
from collections import Counter
d1 = {'a': 2, 'b': 3, 'c': 1}
d2 = {'a': 3, 'b': 2, 'c': 3}
d3 = {'b': 8, 'd': 2}
list_of_dicts = [d1, d2, d3]
cnt = Counter()
for d in list_of_dicts:
    cnt.update(d)
print(cnt)
# Counter({'b': 13, 'a': 5, 'c': 4, 'd': 2})
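For an arbitrary number of dictionaries, the same loop can be wrapped in a function that takes *dicts. This is a sketch rather than the answerer's code, and merge_sum is a made-up name. It also shows one thing worth knowing if you are tempted to add Counters with + instead: Counter addition drops keys whose total is zero or negative, whereas update keeps them.

```python
from collections import Counter


def merge_sum(*dicts):
    """Sum the values of matching keys across any number of dicts."""
    total = Counter()
    for d in dicts:
        total.update(d)  # update() adds the counts instead of replacing them
    return dict(total)


d1 = {'a': 2, 'b': 3, 'c': 1}
d2 = {'a': 3, 'b': 2, 'c': 3}
d3 = {'b': 8, 'd': 2}
merged = merge_sum(d1, d2, d3)
print(merged)
# {'a': 5, 'b': 13, 'c': 4, 'd': 2}

# Hypothetical data with a negative value, to compare update() with "+".
with_update = merge_sum({'x': 1}, {'x': -1, 'y': 2})
with_plus = dict(sum((Counter(d) for d in [{'x': 1}, {'x': -1, 'y': 2}]), Counter()))
print(with_update, with_plus)
# {'x': 0, 'y': 2} {'y': 2}
```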
Per your comment regarding defaultdict, here is an approach along those lines. That said, I prefer the Counter approach in the answer from @Raphael.
from collections import defaultdict
def mergesum(*dicts):
    merged = defaultdict(int)
    for k, v in (item for d in dicts for item in d.items()):
        merged[k] += v
    return merged
d1 = {'a': 2, 'b': 3, 'c': 1}
d2 = {'a': 3, 'b': 2, 'c': 3}
d3 = {'b': 8, 'd': 2}
result = mergesum(d1, d2, d3)
print(result)
# defaultdict(<class 'int'>, {'a': 5, 'b': 13, 'c': 4, 'd': 2})
For school I am writing a small program that keeps a ranking list for a game.
I am using dicts for this, with the player's name as the key and the score as the value.
There will be 10 games, and each game will have an automatic ranking system whose results I print to a file.
I've already managed to code the ranking system, but now I'm facing a bigger challenge that I cannot solve:
I have to make an overall ranking, which means a player name can appear in several contests with several scores, but I need to keep only the highest score for each duplicate.
In short: I need some help with keeping, for each duplicate key, the highest value.
Like this:
dict1 = {"a": 6, "b": 4, "c": 2, "g": 1}
dict2 = {"a": 3, "f": 4, "g": 5, "d": 2}
dictcombined = {'a': 6, 'b': 4, 'c': 2, 'g': 5, 'f': 4, 'd': 2}
The normal merge option just takes the value from the second dict.
Thanks in advance.
You need a function that keeps track of the highest score for each player. It adds a player to the total if they are not already there; otherwise it updates their score only if the new one is higher.
Something like this:
def addScores(scores, total):
    for player in scores:
        if player not in total or total[player] < scores[player]:
            total[player] = scores[player]
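A possible way to use this across several games is to start with an empty total and feed each game's dict through the function. The sketch below repeats addScores so that it runs on its own; the games list and the ranking step are assumptions added for illustration, using the two dicts from the question.

```python
def addScores(scores, total):
    # Keep a player's score only if it is new or beats the stored one.
    for player in scores:
        if player not in total or total[player] < scores[player]:
            total[player] = scores[player]


# Hypothetical list of per-game results (only two of the ten games shown).
games = [
    {"a": 6, "b": 4, "c": 2, "g": 1},
    {"a": 3, "f": 4, "g": 5, "d": 2},
]

overall = {}
for game in games:
    addScores(game, overall)
print(overall)
# {'a': 6, 'b': 4, 'c': 2, 'g': 5, 'f': 4, 'd': 2}

# Overall ranking, highest score first.
ranking = sorted(overall.items(), key=lambda item: item[1], reverse=True)
print(ranking[0])
# ('a', 6)
```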
This works like a charm:
dict1 = {"a": 6, "z": 4, "g": 1, "hh": 50, "ggg": 1}
dict2 = {"a": 3, "g": 5, "d": 2, "hh": 50}
for key in dict1:
    if key not in dict2 or dict1[key] > dict2[key]:
        dict2[key] = dict1[key]
print (dict1)
print (dict2)
dict3 = {**dict1, **dict2}
print (dict3)
Now I can compare dict3 with other dicts and so on.
Here's a variation on Matt Eding's answer that compares each value individually instead of creating sets of values. As a plus, it doesn't need any imports.
def combine_dicts(chooser, *dicts):
    combined = {}
    for d in dicts:
        for k, v in d.items():
            if k not in combined:
                combined[k] = v
            else:
                combined[k] = chooser(v, combined[k])
    return combined
Usage:
>>> combine_dicts(max, dict1, dict2)
{'a': 6, 'b': 4, 'c': 2, 'g': 5, 'f': 4, 'd': 2}
Here is my generalized solution to your question. It's a function that can combine an arbitrary number of dictionaries and has an option for other comparison functions should you want to say, keep track of the minimum values instead.
import collections
def combine_dicts(func, *dicts):
    default = collections.defaultdict(set)
    for d in dicts:
        for k, v in d.items():
            default[k].add(v)
    return {k: func(v) for k, v in default.items()}
It uses a defaultdict with set as its default_factory to collect the different values seen for each key. Then it uses a dictionary comprehension to reduce each set to the desired value.
dict1 = {"a": 6, "b": 4, "c": 2, "g": 1}
dict2 = {"a": 3, "d": 2, "f": 4, "g": 5}
dict_comb = combine_dicts(max, dict1, dict2)
print(dict_comb) # -> {'a': 6, 'b': 4, 'c': 2, 'd': 2, 'f': 4, 'g': 5}
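One caveat with collecting into a set: equal values from different dicts collapse into one. That is harmless for max or min, but it changes the result for functions such as sum. If that matters, a variant that collects into lists could look like the following sketch; combine_dicts_listed and the dicts x and y are hypothetical names used only here.

```python
from collections import defaultdict


def combine_dicts_listed(func, *dicts):
    """Like the set-based version, but keeps repeated values by using lists."""
    grouped = defaultdict(list)
    for d in dicts:
        for k, v in d.items():
            grouped[k].append(v)
    return {k: func(v) for k, v in grouped.items()}


x = {"a": 2, "b": 1}
y = {"a": 2, "c": 5}
summed = combine_dicts_listed(sum, x, y)
highest = combine_dicts_listed(max, x, y)
print(summed)   # {'a': 4, 'b': 1, 'c': 5}
print(highest)  # {'a': 2, 'b': 1, 'c': 5}

# With a set, the two equal values for "a" would be stored once.
print(sum({2, 2}))  # 2
```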
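Yet another approach, surprisingly not proposed yet, that uses only built-ins. Sorting the (key, value) pairs puts the highest value for each key last, and dict keeps the last value it sees for a duplicate key: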
>>> dict(sorted([*dict1.items(), *dict2.items()]))
{'a': 6, 'b': 4, 'c': 2, 'd': 2, 'f': 4, 'g': 5}
If your key-value pairs are less "lexicographic", you may want to target the numerics specifically, doing
>>> dict(sorted([*dict1.items(), *dict2.items()], key=lambda item: item[1]))
{'g': 5, 'c': 2, 'd': 2, 'a': 6, 'b': 4, 'f': 4}
You might consider using Pandas for this. It also has a ton of other helpful functionality for working with data.
There's probably an ideal way to solve this, but the first thing I thought of is to create two Series (which are sort of like dicts), concatenate them, group by the labels (a, b, c, etc.), then get the max for each group.
import pandas as pd
s1, s2 = [pd.Series(d, name='Scores') for d in [dict1, dict2]]
result = pd.concat([s1, s2]).groupby(level=0).max()
>>> result
a 6
b 4
c 2
d 2
f 4
g 5
Name: Scores, dtype: int64
If you want the result as a dict:
>>> result.to_dict()
{'a': 6, 'b': 4, 'c': 2, 'd': 2, 'f': 4, 'g': 5}
I have a dictionary of dictionaries:
d = {"a": {"x":1, "y":2, "z":3}, "b": {"x":2, "y":3, "z":4}, "c": {"x":3, "y":4, "z":5}}
And I want to convert it to:
new_d = {"x":[1, 2, 3], "y": [2, 3, 4], "z": [3, 4, 5]}
The requirement is that new_d[key][i] and new_d[another_key][i] should be in the same sub-dictionary of d.
So I created new_d = {} and then:
for key in d.values()[0].keys():
    new_d[key] = [d.values()[i][key] for i in range(len(d.values()))]
This gives me what I expected, but I am just wondering if there are some built-in functions for this operation or there are better ways to do it.
There is no built-in function for this operation, no. I'd just loop over values directly:
new_d = {}
for sub in d.itervalues():  # Python 3: use d.values()
    for key, value in sub.iteritems():  # Python 3: use sub.items()
        new_d.setdefault(key, []).append(value)
This avoids creating a new list for the dict.values() call each time.
Note that dictionaries have no guaranteed order before Python 3.7. The values in the resulting lists will still fit your criteria, however; they'll be added in the same order for each of the keys in new_d:
>>> d = {"a": {"x":1, "y":2, "z":3}, "b": {"x":2, "y":3, "z":4}, "c": {"x":3, "y":4, "z":5}}
>>> new_d = {}
>>> for sub in d.values():
...     for key, value in sub.items():
...         new_d.setdefault(key, []).append(value)
...
>>> new_d
{'x': [1, 2, 3], 'y': [2, 3, 4], 'z': [3, 4, 5]}
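The index alignment above relies on every sub-dictionary having the same keys. If some keys can be missing, the lists would get different lengths and new_d[key][i] would no longer refer to the same sub-dictionary. A sketch of one way to handle that, assuming a filler value such as None is acceptable (transpose_nested and fill are names invented for this example):

```python
def transpose_nested(d, fill=None):
    """Turn {outer: {inner: value}} into {inner: [value per outer key]}."""
    # Collect every inner key in first-seen order.
    keys = []
    for sub in d.values():
        for key in sub:
            if key not in keys:
                keys.append(key)
    # sub.get(key, fill) keeps one slot per sub-dictionary, so indexes stay aligned.
    return {key: [sub.get(key, fill) for sub in d.values()] for key in keys}


# Hypothetical input in which "a" lacks "z" and "b" lacks "y".
d = {"a": {"x": 1, "y": 2}, "b": {"x": 2, "z": 4}, "c": {"x": 3, "y": 4, "z": 5}}
new_d = transpose_nested(d)
print(new_d)
# {'x': [1, 2, 3], 'y': [2, None, 4], 'z': [None, 4, 5]}
```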
List Comprehension Method
If you like dictionary and list comprehensions ...
d1 = {"a": {"x": 1, "y": 2, "z": 3},
"b": {"x": 2, "y": 3, "z": 4},
"c": {"x": 3, "y": 4, "z": 5}}
dl1 = {kl: [v for di in d1.values() for k, v in di.items() if k == kl]
       for di in d1.values() for kl in di.keys()}
print(dl1)
And yields the results hoped for ...
{'x': [1, 2, 3], 'y': [2, 3, 4], 'z': [3, 4, 5]}