Add values for the same keys in a list of dictionaries - Python

I have a list of dictionaries:
[{'name':'Jay', 'value':'1'},{'name':'roc', 'value':'9'},{'name':'Jay', 'value':'7'},{'name':'roc', 'value':'2'}]
I want it to be:
[{'name':'Jay', 'value':'8'},{'name':'roc', 'value':'11'}]
I tried looping through it, but I couldn't find an example of how to do this. Any hint or idea would be appreciated.

You can use a defaultdict:
lst = [{'name':'Jay', 'value':'1'},{'name':'roc', 'value':'9'},{'name':'Jay', 'value':'7'},{'name':'roc', 'value':'2'}]
1) sum values for each name:
from collections import defaultdict

result = defaultdict(int)
for d in lst:
    result[d['name']] += int(d['value'])
2) convert the name-value pairs back to a list of dictionaries:
[{'name': name, 'value': value} for name, value in result.items()]
# [{'name': 'roc', 'value': 11}, {'name': 'Jay', 'value': 8}]
Or if you want the value as str type, as commented by @Kevin:
[{'name': name, 'value': str(value)} for name, value in result.items()]
# [{'name': 'roc', 'value': '11'}, {'name': 'Jay', 'value': '8'}]

This is a good use case for itertools.groupby.
from itertools import groupby
from operator import itemgetter
orig = [{'name':'Jay', 'value':'1'},
        {'name':'roc', 'value':'9'},
        {'name':'Jay', 'value':'7'},
        {'name':'roc', 'value':'2'}]
get_name = itemgetter('name')
result = [{'name': name, 'value': str(sum(int(d['value']) for d in dicts))}
          for name, dicts in groupby(sorted(orig, key=get_name), key=get_name)]
Breaking it down:
get_name is a function that, given a dictionary, returns the value of its "name" key, i.e. get_name = lambda x: x['name'].
sorted returns the list of dictionaries sorted by the value of the "name" key.
groupby returns an iterator of (name, dicts) where dicts is a list (ok, generator) of the dicts that share name as the value of the "name" key. (Grouping only occurs for consecutive items with the same key value, hence the need to sort the list in the previous step.)
The result is a list of new dictionaries using the given name and the sum of all the related "value" elements.
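To make the sorting requirement concrete, here is a tiny sketch showing that groupby only merges runs of consecutive equal keys:

```python
from itertools import groupby

data = ['a', 'b', 'a']

# Without sorting, the two 'a' items are not adjacent, so they land in separate groups.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(data)]

# After sorting, equal keys are adjacent and each key forms exactly one group.
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(data))]

print(unsorted_groups)  # [('a', 1), ('b', 1), ('a', 1)]
print(sorted_groups)    # [('a', 2), ('b', 1)]
```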

Similar to Psidom's answer but using collections.Counter which is the perfect candidate for accumulating integer values.
import collections
d =[{'name':'Jay', 'value':'1'},{'name':'roc', 'value':'9'},{'name':'Jay', 'value':'7'},{'name':'roc', 'value':'2'}]
c = collections.Counter()
for sd in d:
    c[sd["name"]] += int(sd["value"])
Then rebuild the dicts if needed, converting the values back to strings:
print([{"name":n,"value":str(v)} for n,v in c.items()])
result:
[{'name': 'Jay', 'value': '8'}, {'name': 'roc', 'value': '11'}]
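One reason Counter fits accumulation so well: two partial tallies can be merged directly with +, which a plain dict can't do. A small sketch:

```python
from collections import Counter

# Two partial tallies, e.g. from processing the list in two chunks.
part1 = Counter({'Jay': 1, 'roc': 9})
part2 = Counter({'Jay': 7, 'roc': 2})

total = part1 + part2  # element-wise addition of counts
print(dict(total))  # {'Jay': 8, 'roc': 11}
```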

For the sake of completeness, without collections.defaultdict:
data = [{'name': 'Jay', 'value': '1'}, {'name': 'roc', 'value': '9'},
{'name': 'Jay', 'value': '7'}, {'name': 'roc', 'value': '2'}]
result = {}
# accumulate
for element in data:
    result[element["name"]] = result.get(element["name"], 0) + int(element["value"])
# unpack
result = [{"name": element, "value": result[element]} for element in result]
# optionally, you can loop through result.items()
# you can also convert the values back to str if needed
print(result)
# prints: [{'name': 'Jay', 'value': 8}, {'name': 'roc', 'value': 11}]

Another way to solve this is by using groupby from the itertools module:
from itertools import groupby
a = [{'name':'Jay', 'value':'1'},{'name':'roc', 'value':'9'},{'name':'Jay', 'value':'7'},{'name':'roc', 'value':'2'}]
final = []
for k, v in groupby(sorted(a, key=lambda x: x["name"]), lambda x: x["name"]):
    final.append({"name": k, "value": str(sum(int(j["value"]) for j in v))})
print(final)
Output:
[{'name': 'Jay', 'value': '8'}, {'name': 'roc', 'value': '11'}]

ld = [{'name':'Jay', 'value':'1'},{'name':'roc', 'value':'9'},{'name':'Jay', 'value':'7'},{'name':'roc', 'value':'2'}]
tempDict = {}
finalList = []
for d in ld:
    name = d['name']
    value = d['value']
    if name not in tempDict:
        tempDict[name] = 0
    tempDict[name] += int(value)

# tempDict => {'Jay': 8, 'roc': 11}
for name, value in tempDict.items():
    finalList.append({'name': name, 'value': value})
print(finalList)
# [{'name': 'Jay', 'value': 8}, {'name': 'roc', 'value': 11}]

Here's another way, using pandas:
import pandas as pd

names = [{'name':'Jay', 'value':'1'},{'name':'roc', 'value':'9'},{'name':'Jay', 'value':'7'},
         {'name':'roc', 'value':'2'}]
df = pd.DataFrame(names)
df['value'] = df['value'].astype(int)
group = df.groupby('name')['value'].sum().to_dict()
result = [{'name': name, 'value': value} for name, value in group.items()]
Which outputs:
[{'value': 8, 'name': 'Jay'}, {'value': 11, 'name': 'roc'}]

Related

Sort list of dictionaries according to custom ordering

I need to sort a list of dictionaries according to a predefined list of values.
I am using the code below, and it works fine, but l_of_dicts contains values that are not in sort_l (the predefined list of values):
l_of_dicts = [{'name':'Max','years':18},{'name':'John','years':25},{'name':'Ana','years':19},{'name':'Melis','years':38},{'name':'Ivana','years':38}]
sort_l = ['Ana','Melis','John','Max','Peter']
res = sorted(l_of_dicts, key=lambda ele: sort_l.index(list(ele.values())[0]))
I get an error :
ValueError: 'Ivana' is not in list
Is it possible to ignore those values? Or, even better, extract them to another list?
The script below does the job but looks heavier than it should be. Maybe there's an easier way to do it.
Script:
def sort_dict_list(dict_list, sorting_list):
    result = []
    not_used_values = []
    temp = {}
    for index, dictionary in enumerate(dict_list):
        temp[dictionary['name']] = index
    for item in sorting_list:
        if item in temp.keys():
            result.append(dict_list[temp[item]])
        else:
            not_used_values.append(item)
    return result, not_used_values
You could do some preprocessing to store the ordering of only names in l_of_dicts:
l_of_dicts = [{'name': 'Max', 'years': 18}, {'name': 'John', 'years': 25},
              {'name': 'Ana', 'years': 19}, {'name': 'Melis', 'years': 38}]
names_in_dicts = {d['name'] for d in l_of_dicts if 'name' in d}
sort_l = ['Ana', 'Melis', 'Ivana', 'John', 'Max', 'Peter']
sort_order = {name: order for order, name in enumerate(sort_l)
              if name in names_in_dicts}
print(sort_order)  # {'Ana': 0, 'Melis': 1, 'John': 3, 'Max': 4}
sorted_l_of_dicts = sorted(l_of_dicts, key=lambda d: sort_order[d['name']])
print(sorted_l_of_dicts)
# [{'name': 'Ana', 'years': 19}, {'name': 'Melis', 'years': 38},
#  {'name': 'John', 'years': 25}, {'name': 'Max', 'years': 18}]
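To also answer the "extract them to another list" part of the question, here's a minimal sketch that partitions the dicts into those whose names appear in sort_l (sorted accordingly) and the leftovers:

```python
l_of_dicts = [{'name': 'Max', 'years': 18}, {'name': 'John', 'years': 25},
              {'name': 'Ana', 'years': 19}, {'name': 'Melis', 'years': 38},
              {'name': 'Ivana', 'years': 38}]
sort_l = ['Ana', 'Melis', 'John', 'Max', 'Peter']

sort_order = {name: i for i, name in enumerate(sort_l)}

# Dicts whose name is in sort_l, in the predefined order.
matched = sorted((d for d in l_of_dicts if d['name'] in sort_order),
                 key=lambda d: sort_order[d['name']])
# Dicts whose name is not in sort_l.
leftovers = [d for d in l_of_dicts if d['name'] not in sort_order]

print([d['name'] for d in matched])  # ['Ana', 'Melis', 'John', 'Max']
print(leftovers)                     # [{'name': 'Ivana', 'years': 38}]
```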

Summing up list of dictionaries based on a condition and deleting few keys

I have a list of dictionaries with dynamic keys(keys are generated from code) as follows:
l = [{"key1": 1, "author": "test", "year": "2011"},
     {"key2": 5, "author": "test", "year": "2012"},
     {"key1": 3, "author": "test1", "year": "2012"},
     {"key1": 1, "author": "test", "year": "2012"}]
Now I want to add up the first-key values when the keys are the same, and group them. My final list should look like this:
l=[{"key1":2,"author":"test","year":["2011","2012"]},{"key2":5,"author":"test","year":"2012"},{"key1":3,"author":"test1","year":"2012"}]
I have tried pandas groupby, but I can't use it directly because the keys are auto-generated. The code I tried is:
(pd.DataFrame(l)
 .groupby(['author', 'year'], as_index=False)
 .key1.sum()
 .to_dict('r'))
What could be a better approach?
Rules:
1) Sum the two values if the first key in the dictionaries is the same and the other keys, author and year, also match.
2) If the authors differ, don't add the values up.
3) If the author is the same but the years differ, group the years together and add up the key values.
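These rules can also be implemented in pure Python, without pandas. A minimal sketch (it relies on dicts preserving insertion order, so next(iter(d)) yields each dict's auto-generated value key):

```python
l = [{"key1": 1, "author": "test", "year": "2011"},
     {"key2": 5, "author": "test", "year": "2012"},
     {"key1": 3, "author": "test1", "year": "2012"},
     {"key1": 1, "author": "test", "year": "2012"}]

groups = {}
for d in l:
    value_key = next(iter(d))        # the dict's first (auto-generated) key
    gk = (value_key, d["author"])    # group by value key and author
    entry = groups.setdefault(gk, {"count": 0, "years": []})
    entry["count"] += d[value_key]   # sum the values within the group
    if d["year"] not in entry["years"]:
        entry["years"].append(d["year"])  # collect distinct years

result = [{k: e["count"], "author": a,
           "year": e["years"][0] if len(e["years"]) == 1 else e["years"]}
          for (k, a), e in groups.items()]
print(result)
# [{'key1': 2, 'author': 'test', 'year': ['2011', '2012']},
#  {'key2': 5, 'author': 'test', 'year': '2012'},
#  {'key1': 3, 'author': 'test1', 'year': '2012'}]
```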
You'd be better off with a cleaner data structure, where nothing is special about the first mapping of your dicts, and where that first mapping is split into e.g. 'key': first_mapping_key and 'count': first_mapping_value.
One way to do that from your list of dicts structure (where "the first key is special") is:
def transform(d):
    (k, v), *t = d.items()
    return dict(key=k, count=v, **dict(t))

lmod = [transform(d) for d in l]
lmod
# out:
[{'key': 'key1', 'count': 1, 'author': 'test', 'year': '2011'},
 {'key': 'key2', 'count': 5, 'author': 'test', 'year': '2012'},
 {'key': 'key1', 'count': 3, 'author': 'test1', 'year': '2012'},
 {'key': 'key1', 'count': 1, 'author': 'test', 'year': '2012'}]
Now you can easily groupby and aggregate to your heart's content. For example:
(pd.DataFrame(lmod)
 .query('count != 0')
 .groupby(['key', 'author'])
 .agg({'count': sum, 'year': set})
)
The second topic is how to group by and aggregate without using pandas. Here is a way to do that using first principles (with only core library functions):
def grp_key(d):
    return d['key'], d['author']

def expect_single(a):
    values = set(a)
    assert len(values) == 1
    return next(iter(values))

_funcdict = {
    'key': expect_single,
    'author': expect_single,
    'count': sum,
}

def agg(lod):
    keys = {k: 1 for d in lod for k in d}  # insertion-order union of all keys
    d = {k: _funcdict.get(k, set)(d.get(k) for d in lod) for k in keys}
    return d
Application:
out = [
    agg(list(g))
    for k, g in groupby(sorted([d for d in lmod if d['count'] != 0],
                               key=grp_key), key=grp_key)
]
out
# output:
[{'key': 'key1', 'count': 2, 'author': 'test', 'year': {'2011', '2012'}},
 {'key': 'key1', 'count': 3, 'author': 'test1', 'year': {'2012'}},
 {'key': 'key2', 'count': 5, 'author': 'test', 'year': {'2012'}}]
Alternatively, try a groupby-agg on the result of .groupby(['author', 'year']). Aggregation is applied to each key except author and year in separate steps.
df = pd.DataFrame(l)
df_gp = df.groupby(['author', 'year'], as_index=False).sum()

def agg_key(df, key):
    return df[df[key] != 0].groupby("author", as_index=False).agg({
        # collect the years
        "year": lambda sr: [str(el) for el in sr],
        # sum the key
        key: "sum",
    }).to_dict(orient="records")

# keys except author and year
keys = df.columns[~df.columns.isin(["author", "year"])]
# apply the aggregation and flatten the list of lists
ans = [el for key in keys for el in agg_key(df_gp, key)]
Output
print(ans)
[{'author': 'test', 'year': ['2011', '2012'], 'key1': 2.0},
 {'author': 'test1', 'year': ['2012'], 'key1': 3.0},
 {'author': 'test', 'year': ['2012'], 'key2': 5.0}]
A single "year" is returned as a single-element list instead of a str, for the sake of type consistency (recommended).

Is there a more pythonic/compact way to create this dictionary?

I have a dictionary of lists of dictionaries that looks like this:
original_dict = {
    1: [{'name': 'Sam'}, {'name': 'Mahmoud'}, {'name': 'Xiao'}],
    2: [{'name': 'Olufemi'}, {'name': 'Kim'}, {'name': 'Rafael'}]
}
I know that the names in the lists in this dictionary are all unique, i.e. the same name will not appear multiple times in this structure. I want to compile a dictionary of all the sub-dictionaries, keyed by their names. I want the result to look like this:
result_dict = {
    'Sam': {'name': 'Sam'},
    'Mahmoud': {'name': 'Mahmoud'},
    'Xiao': {'name': 'Xiao'},
    'Olufemi': {'name': 'Olufemi'},
    'Kim': {'name': 'Kim'},
    'Rafael': {'name': 'Rafael'}
}
So far my solution looks like this:
result_dict = {}
for list_of_dicts in original_dict.values():
    for curr_dict in list_of_dicts:
        result_dict[curr_dict['name']] = curr_dict
But is there a more pythonic/compact way to do this? Maybe using dict comprehension?
You can use dictionary comprehension.
result = {name['name']: name for k, v in original.items() for name in v}
The first for clause iterates through the dictionary's key-value pairs, and the second iterates through each dict in each value list.
Yes, just rewrite your loops as a dict comprehension:
original = {
    1: [{'name': 'Sam'}, {'name': 'Mahmoud'}, {'name': 'Xiao'}],
    2: [{'name': 'Olufemi'}, {'name': 'Kim'}, {'name': 'Rafael'}]
}
result = {d['name']: d for l in original.values() for d in l}
from pprint import pprint
pprint(result)
Output:
{'Kim': {'name': 'Kim'},
 'Mahmoud': {'name': 'Mahmoud'},
 'Olufemi': {'name': 'Olufemi'},
 'Rafael': {'name': 'Rafael'},
 'Sam': {'name': 'Sam'},
 'Xiao': {'name': 'Xiao'}}

Replace dictionaries keys with values from list

keys = ['id', 'name', 'address']
list = [{'Value': 1}, {'Value': 'Example name'}, {'VarCharValue': 'GA'}]
Looking for the most pythonic way to replace the dicts' keys. I tried a for loop with list indexes, but it was ugly. Expected result:
list = [{'id': 1}, {'name': 'Example name'}, {'address': 'GA'}]
You can use a list comprehension with zip. To extract the only value in a dictionary d, you can use next(iter(d.values())) or list(d.values())[0].
K = ['id', 'name', 'address']
L = [{'Value': 1}, {'Value': 'Example name'}, {'VarCharValue': 'GA'}]
res = [{k: next(iter(v.values()))} for k, v in zip(K, L)]
[{'id': 1}, {'name': 'Example name'}, {'address': 'GA'}]
If you don't want to use iter(), you can use list(), which looks almost the same as jpp's solution
res = [{k: list(v.values())[0]} for k,v in zip(K,L)]
In this, you simply convert the dict_values object to a list, and get the first item, instead of getting the iterator and calling next on it.
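One caveat with any zip-based pairing: if the two lists happen to differ in length, zip truncates silently. A small sketch of the failure mode:

```python
K = ['id', 'name', 'address']
L = [{'Value': 1}, {'Value': 'Example name'}]  # one dict short

# zip() stops at the shorter input, so the 'address' key is silently dropped.
res = [{k: next(iter(v.values()))} for k, v in zip(K, L)]
print(res)  # [{'id': 1}, {'name': 'Example name'}]

# On Python 3.10+, zip(K, L, strict=True) raises ValueError on a length
# mismatch instead of truncating, which surfaces the bug early.
```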

How to compare two lists of dicts for multiple key, value pairs?

I have two lists of dicts, one is a modified subset of the other. I would like to get the elements of list_one that don't appear in list_two, based on two keys. Example:
list_one = [{'name': 'alf', 'age': 25},
            {'name': 'alf', 'age': 50},
            {'name': 'cid', 'age': 30}]
list_two = [{'name': 'alf', 'age': 25, 'hair_color': 'brown'},
            {'name': 'cid', 'age': 30, 'hair_color': 'black'}]
desired_list = [{'name': 'alf', 'age': 50}]
How can I accomplish this? I have a feeling it is with some sort of list comprehension, as such:
desired_list = [x for x in list_one if x['name'] != x2['name'] and x['age'] != x2['age']
for all x2 in list_two]
I think this is easily done with two comprehensions as:
Code:
have_2 = {(d['name'], d['age']) for d in list_two}
extra = [d for d in list_one if (d['name'], d['age']) not in have_2]
This first creates a set of tuples which we already have, then checks which dicts do not match any of these existing keys.
Test Code:
list_one = [{'name': 'alf', 'age': 25},
            {'name': 'alf', 'age': 50},
            {'name': 'cid', 'age': 30}]
list_two = [{'name': 'alf', 'age': 25, 'hair_color': 'brown'},
            {'name': 'cid', 'age': 30, 'hair_color': 'black'}]

have_2 = {(d['name'], d['age']) for d in list_two}
extra = [d for d in list_one if (d['name'], d['age']) not in have_2]
print(extra)
Results:
[{'name': 'alf', 'age': 50}]
Yet another possible solution:
>>> list(filter(lambda x: not any([set(x.items()).issubset(y.items()) for y in list_two]), list_one))
[{'age': 50, 'name': 'alf'}]
or:
>>> s2 = [set(i.items()) for i in list_two]
>>> list(filter(lambda x: not any([set(x.items()).issubset(y) for y in s2]), list_one))
[{'age': 50, 'name': 'alf'}]
The advantage of this approach is that it does not need to know the "keys" ('age' and 'name') present in both dictionary sets.
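Building on that key-agnostic idea, here's a sketch (assuming all values are hashable) that derives the comparison keys automatically, as the intersection of all dicts' keys, and precomputes hashed projections so each membership test is O(1):

```python
list_one = [{'name': 'alf', 'age': 25},
            {'name': 'alf', 'age': 50},
            {'name': 'cid', 'age': 30}]
list_two = [{'name': 'alf', 'age': 25, 'hair_color': 'brown'},
            {'name': 'cid', 'age': 30, 'hair_color': 'black'}]

# Derive the comparison keys instead of hard-coding them: the keys that
# every dict in both lists shares.
common = set.intersection(*(set(d) for d in list_one + list_two))

# Hash each dict's projection onto the common keys for O(1) lookups.
seen = {frozenset((k, d[k]) for k in common) for d in list_two}
diff = [d for d in list_one
        if frozenset((k, d[k]) for k in common) not in seen]
print(diff)  # [{'name': 'alf', 'age': 50}]
```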
Use this (note that zip pairs the two lists positionally and stops at the shorter one, so it only happens to work here because of how the entries line up; it is not a general solution):
new_list = [i for i, j in zip(list_one, list_two) if i['name'] != j['name'] and i['age'] != j['age']]
print(new_list)
Output
[{'name': 'alf', 'age': 50}]
An efficient way would be to convert your two structures to dicts, keyed by the two values, then create the result dict:
key = lambda dct: (dct['name'], dct['age'])
d1 = {key(dct): dct for dct in list_one}
d2 = {key(dct): dct for dct in list_two}
desired_d = {k: v for k, v in d1.items() if k not in d2}
print(desired_d)
print(desired_d.values())
diff = [
    e for e in list_one
    if (e['name'], e['age']) not in set((e['name'], e['age']) for e in list_two)
]
print(diff)
