Out of all the months in the year, I need to find the month with the largest total balance, that is, the month whose "amount" values add up to the largest sum.
lst = [
{'account': 'x\\*', 'amount': 300, 'day': 3, 'month': 'June'},
{'account': 'y\\*', 'amount': 550, 'day': 9, 'month': 'May'},
{'account': 'z\\*', 'amount': -200, 'day': 21, 'month': 'June'},
{'account': 'g', 'amount': 80, 'day': 10, 'month': 'May'},
{'account': 'x\\*', 'amount': 30, 'day': 16, 'month': 'August'},
{'account': 'x\\*', 'amount': 100, 'day': 5, 'month': 'June'},
]
The problem is that both the "amount" and the month name are stored as dictionary values, not keys.
I tried to compute the total for each month separately, but I need a for loop that finds the month with the highest total "amount".
My attempt:
get_sum = lambda my_list, month: sum(d['amount']
                                     for d in my_list if d['month'] == month)
total_June = get_sum(lst, 'June')
total_August = get_sum(lst, 'August')
A simple solution with pandas.
import pandas as pd
lst = [
{'account': 'x\\*', 'amount': 300, 'day': 3, 'month': 'June'},
{'account': 'y\\*', 'amount': 550, 'day': 9, 'month': 'May'},
{'account': 'z\\*', 'amount': -200, 'day': 21, 'month': 'June'},
{'account': 'g', 'amount': 80, 'day': 10, 'month': 'May'},
{'account': 'x\\*', 'amount': 30, 'day': 16, 'month': 'August'},
{'account': 'x\\*', 'amount': 100, 'day': 5, 'month': 'June'},
]
# convert list of dictionaries to dataframe
df = pd.DataFrame(lst)
# Sum the amounts per month; the result is a Series indexed by month.
totals_by_month = df.groupby("month")["amount"].sum()
# idxmax returns the index label (the month) of the largest total.
best_month = totals_by_month.idxmax()
# Print the month and its total as a plain list
print([best_month, int(totals_by_month[best_month])])
['May', 630]
Note that pandas adds a substantial dependency to the project. That said, pandas is commonly imported anyway for data science or data manipulation tasks. Pierre D's solutions here are definitely faster.
One possibility (among many):
from itertools import groupby
from operator import itemgetter
mo_total = {
    k: sum([d.get('amount', 0) for d in v])
    for k, v in groupby(sorted(lst, key=itemgetter('month')), key=itemgetter('month'))
}
>>> mo_total
{'August': 30, 'June': 200, 'May': 630}
>>> max(mo_total.items(), key=lambda kv: kv[1])
('May', 630)
Without itemgetter:
bymonth = lambda d: d.get('month')
mo_total = {
    k: sum([d.get('amount', 0) for d in v])
    for k, v in groupby(sorted(lst, key=bymonth), key=bymonth)
}
Yet another way, using defaultdict:
from collections import defaultdict
tot = defaultdict(int)
for d in lst:
    tot[d['month']] += d.get('amount', 0)
>>> tot
defaultdict(int, {'June': 200, 'May': 630, 'August': 30})
>>> max(tot, key=lambda k: tot[k])
'May'
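Since the question asks for a for loop, here is a minimal sketch that finds the largest total with plain loops only, without max(). The records are trimmed to the two keys that matter, and the names totals and best_month are just illustrative:

```python
# Trimmed copy of the question's data: only 'amount' and 'month' are needed here
lst = [
    {'amount': 300, 'month': 'June'},
    {'amount': 550, 'month': 'May'},
    {'amount': -200, 'month': 'June'},
    {'amount': 80, 'month': 'May'},
    {'amount': 30, 'month': 'August'},
    {'amount': 100, 'month': 'June'},
]

# First loop: accumulate the total amount per month
totals = {}
for d in lst:
    totals[d['month']] = totals.get(d['month'], 0) + d['amount']

# Second loop: remember the month with the largest total seen so far
best_month = None
for month, total in totals.items():
    if best_month is None or total > totals[best_month]:
        best_month = month

print(best_month, totals[best_month])  # May 630
```

If two months tie, this keeps the one that appears first in the data.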
I have a list of dictionaries:
data = [{'average': 2, 'day': '2022-01-01'},
{'average': 3, 'day': '2022-01-02'},
{'average': 5, 'day': '2022-01-03'},
{'sum': 8, 'day': '2022-01-01'},
{'sum': 15, 'day': '2022-01-02'},
{'sum': 9, 'day': '2022-01-03'},
{'total_value': 19, 'day': '2022-01-01'},
{'total_value': 99, 'day': '2022-01-02'},
{'total_value': 15, 'day': '2022-01-03'}]
I want my output as:
output = [{'average': 2, 'sum': 8, 'total_value': 19, 'day': '2022-01-01'},
{'average': 3, 'sum': 15, 'total_value': 99, 'day': '2022-01-02'},
{'average': 5, 'sum': 9, 'total_value': 15, 'day': '2022-01-03'}]
The output combines the values based on their date. My approach so far has been to separate everything out into different dictionaries (date_dict, sum_dict, etc.) and then bring them all together, but that doesn't seem to work and is extremely sloppy.
You could iterate over data and create a dictionary using day as key:
data = [{'average': 2, 'day': '2022-01-01'},
{'average': 3, 'day': '2022-01-02'},
{'average': 5, 'day': '2022-01-03'},
{'sum': 8, 'day': '2022-01-01'},
{'sum': 15, 'day': '2022-01-02'},
{'sum': 9, 'day': '2022-01-03'},
{'total_value': 19, 'day': '2022-01-01'},
{'total_value': 99, 'day': '2022-01-02'},
{'total_value': 15, 'day': '2022-01-03'}]
output = {}
for item in data:
    if item['day'] not in output:
        output[item['day']] = item
    else:
        output[item['day']].update(item)
print(list(output.values()))
Out:
[
{'average': 2, 'day': '2022-01-01', 'sum': 8, 'total_value': 19},
{'average': 3, 'day': '2022-01-02', 'sum': 15, 'total_value': 99},
{'average': 5, 'day': '2022-01-03', 'sum': 9, 'total_value': 15}
]
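One thing to be aware of: the loop above stores the first dict seen for each day and then updates it in place, so the dicts inside data are modified too. If that matters, a small variation (a sketch, using a shortened sample and the illustrative name merged) builds fresh dicts with setdefault():

```python
# Shortened sample in the same shape as the question's data
data = [
    {'average': 2, 'day': '2022-01-01'},
    {'average': 3, 'day': '2022-01-02'},
    {'sum': 8, 'day': '2022-01-01'},
    {'sum': 15, 'day': '2022-01-02'},
]

merged = {}
for item in data:
    # setdefault creates a new empty dict per day, so the input dicts are never modified
    merged.setdefault(item['day'], {}).update(item)

output = list(merged.values())
print(output)
```

The result has the same content as before, but data is left untouched.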
Had a bit of fun and made it with dict/list comprehensions. Check out the neat | operator in Python 3.9+ :-)
Python <3.9
from collections import ChainMap
data_grouped_by_day = {
    day: dict(ChainMap(*[d for d in data if d["day"] == day]))
    for day in {d["day"] for d in data}
}
for day, group_data in data_grouped_by_day.items():
    group_data.update(day=day)
result = list(data_grouped_by_day.values())
Python 3.9+
from collections import ChainMap
result = [
    dict(ChainMap(*[d for d in data if d["day"] == day])) | {"day": day}
    for day in {d["day"] for d in data}
]
The output in both cases is (the order of the keys and of the days may vary, because the days come from a set):
[{'total_value': 99, 'day': '2022-01-02', 'sum': 15, 'average': 3},
{'total_value': 15, 'day': '2022-01-03', 'sum': 9, 'average': 5},
{'total_value': 19, 'day': '2022-01-01', 'sum': 8, 'average': 2}]
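If a stable order is wanted, one option (a sketch, assuming Python 3.9+ for the dict | operator) is to collect the days with dict.fromkeys(), which keeps first-seen order, and fold each group together with functools.reduce:

```python
from functools import reduce
from operator import or_

# Shortened sample in the same shape as the question's data
data = [
    {'average': 2, 'day': '2022-01-01'},
    {'average': 3, 'day': '2022-01-02'},
    {'sum': 8, 'day': '2022-01-01'},
    {'sum': 15, 'day': '2022-01-02'},
]

# dict.fromkeys keeps the days in first-seen order (a set would not)
days = dict.fromkeys(d["day"] for d in data)

result = [
    # Start from an empty dict and merge every record of this day with |
    reduce(or_, (d for d in data if d["day"] == day), {})
    for day in days
]
print(result)
```

Like the comprehensions above, this rescans data once per day, so it is fine for small inputs but quadratic in the worst case.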
I have a list of lists of dicts that I want to group by multiple keys.
I tried sorting the items of each dict and collecting them with setdefault():
data = [
[],
[{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020}, {'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020}, {'value': 126, 'bot': 'DB', 'month':8, 'year': 2021}],
[],
[{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020}, {'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}, {'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}],
[{'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}],
[{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020}, {'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021}, {'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020}, {'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021}, {'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021}, {'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}]
]
output_dict = {}
for i in data:
    if not i:
        pass
    for j in i:
        for key,val in sorted(j.items()):
            output_dict.setdefault(val, []).append(key)
print(output_dict)
{'DB': ['bot', 'bot', 'bot', 'bot', 'bot', 'bot', 'bot', 'bot', 'bot'], 9: ['month', 'month', 'month'], 8: ['value'], 2020: ['year', 'year', 'year', 'year', 'year'], 10: ['month', 'month'], 79: ['value'], 126: ['value'], 2021: ['year', 'year', 'year', 'year', 'year', 'year', 'year', 'year'], 'GEMBOT': ['bot', 'bot', 'bot', 'bot'], 11: ['month', 'month'], 222: ['value'], 4: ['month', 'month', 'month'], 623: ['value'], 628: ['value'], 0: ['value'], 703: ['value'], 3: ['month'], 1081: ['value'], 1335: ['value'], 1920: ['value'], 1: ['month'], 2132: ['value'], 2: ['month'], 2383: ['value']}
But I want the output like this.
[{ "bot": "DB",
"date": "Sept 20",
"value": 134
},{"bot": "DB",
"date": "Oct 20",
"value": 79
}.. So on ]
Is there an efficient way to flatten this list?
Thanks in advance
Two things will make this easier to answer. The first is a list comprehension that will promote sub-items:
data_reshaped = [cell for row in data for cell in row]
This takes your original data and flattens it a bit to:
[
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020},
{'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021},
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020},
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021},
{'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020},
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021},
{'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021},
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021},
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
]
Now we can iterate over that using a compound key and setdefault() to aggregate the results. Note: if you prefer collections.defaultdict(), as I do, you can swap it in for setdefault().
results = {}
for cell in data_reshaped:
    key = f"{cell['bot']}_{cell['year']}_{cell['month']}"
    value = cell["value"] # save the value so we can reset cell next
    cell["value"] = 0 # setting this to 0 cleans up the next line.
    results.setdefault(key, cell)["value"] += value
This should allow you to print the aggregated results:
for result in results.values():
    print(result)
Giving:
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020}
{'value': 1414, 'bot': 'DB', 'month': 10, 'year': 2020}
{'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021}
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020}
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020}
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021}
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021}
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021}
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
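For comparison, here is a rough sketch of the collections.defaultdict variant mentioned above. It uses a tuple as the compound key and leaves the input dicts unmodified; the shortened sample and the name totals are only for illustration:

```python
from collections import defaultdict

# Shortened sample in the same shape as data_reshaped
data_reshaped = [
    {'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020},
    {'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
    {'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020},
    {'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
]

totals = defaultdict(int)
for cell in data_reshaped:
    # A tuple key avoids building a string key and keeps the parts separate
    totals[(cell['bot'], cell['year'], cell['month'])] += cell['value']

# Rebuild one dict per (bot, year, month) group
results = [
    {'value': value, 'bot': bot, 'month': month, 'year': year}
    for (bot, year, month), value in totals.items()
]
print(results)
```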
Full solution:
data = [
[],
[
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020},
{'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 126, 'bot': 'DB', 'month':8, 'year': 2021}
],
[],
[
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020},
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}
],
[
{'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}
],
[
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020},
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021},
{'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021},
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021},
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
]
]
data_reshaped = [cell for row in data for cell in row]
results = {}
for cell in data_reshaped:
    key = f"{cell['bot']}_{cell['year']}_{cell['month']}"
    value = cell["value"]
    cell["value"] = 0
    results.setdefault(key, cell)["value"] += value
for result in results.values():
    print(result)
Again giving:
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020}
{'value': 1414, 'bot': 'DB', 'month': 10, 'year': 2020}
{'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021}
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020}
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020}
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021}
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021}
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021}
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
I will leave it to you to figure out casting the two date fields to some other presentation as that seems out of context with the question at hand.
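As a starting point for that last step, here is one possible sketch. The helper name with_date_label and the hard-coded MONTH_ABBR tuple are assumptions made for illustration; the question's example uses "Sept", so adjust the abbreviations to taste:

```python
# English month abbreviations, hard-coded so the result does not depend on the locale
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def with_date_label(cell):
    """Return a new dict where month and year are replaced by a 'Mon YY' string."""
    return {
        "bot": cell["bot"],
        "date": f"{MONTH_ABBR[cell['month'] - 1]} {cell['year'] % 100:02d}",
        "value": cell["value"],
    }

labeled = with_date_label({'value': 1414, 'bot': 'DB', 'month': 10, 'year': 2020})
print(labeled)  # {'bot': 'DB', 'date': 'Oct 20', 'value': 1414}
```

Applied to the aggregated results, this would be [with_date_label(r) for r in results.values()].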
Maybe try:
from pprint import pprint
import datetime
output_dict = []
for i in data:
    if i:
        for j in i:
            temp = {}  # start a fresh record for each input dict
            # sorted() visits "month" before "year", so the date is built in order
            for key, val in sorted(j.items()):
                if key == "bot":
                    temp["bot"] = val
                elif key == "value":
                    temp["value"] = val
                elif key == "month":
                    month = datetime.datetime.strptime(str(val), "%m")
                    temp["date"] = month.strftime("%b")
                elif key == "year":
                    temp["date"] = str(temp["date"]) + " " + str(val)
            output_dict.append(temp)
pprint(output_dict)
The final results are shown as follows:
[{'bot': 'DB', 'date': 'Sep 2020', 'value': 8},
{'bot': 'DB', 'date': 'Oct 2020', 'value': 79},
{'bot': 'DB', 'date': 'Aug 2021', 'value': 126},
{'bot': 'GEMBOT', 'date': 'Nov 2020', 'value': 222},
{'bot': 'GEMBOT', 'date': 'Apr 2021', 'value': 623},
{'bot': 'GEMBOT', 'date': 'Sep 2021', 'value': 628},
{'bot': 'GEMBOT', 'date': 'Apr 2021', 'value': 0},
{'bot': 'DB', 'date': 'Nov 2020', 'value': 703},
{'bot': 'DB', 'date': 'Mar 2021', 'value': 1081},
{'bot': 'DB', 'date': 'Oct 2020', 'value': 1335},
{'bot': 'DB', 'date': 'Apr 2021', 'value': 1920},
{'bot': 'DB', 'date': 'Jan 2021', 'value': 2132},
{'bot': 'DB', 'date': 'Feb 2021', 'value': 2383}]
Maybe try:
output = []
for i in data:
    if not i:
        pass
    for j in i:
        output.append(j)
If you then want to sort it, you can use sorted_output = sorted(output, key=lambda k: k['bot']) to sort by bot, for example. To sort by date, you could compute a single value that expresses the date in months and sort on that.
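A minimal sketch of that date sort, assuming each dict carries integer 'year' and 'month' keys as in the question (the helper name months_total is made up for this example):

```python
# Small sample in the same shape as the flattened output
output = [
    {'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021},
    {'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020},
    {'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020},
]

def months_total(cell):
    # Collapse year and month into a single sortable count of months
    return cell['year'] * 12 + cell['month']

sorted_output = sorted(output, key=months_total)
print([(c['year'], c['month']) for c in sorted_output])
# [(2020, 9), (2020, 11), (2021, 8)]
```

Sorting on the tuple (cell['year'], cell['month']) would give the same order without the arithmetic.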
I'd like to create a dictionary inside a dictionary in Python using the setdefault() method.
I'm trying to combine names and dates of birth using the following dictionaries.
names = {'Will': 'january', 'Mary': 'february', 'George': 'march', 'Steven': 'april', 'Peter': 'may'}
dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03', 'Mary': '2/02'}
I was trying to use setdefault() to achieve this:
res_dict = dict()
for v, k in names.items():
    for v1, k1 in dates.items():
        res_dict.setdefault(v, {}).append(k)
        res_dict.setdefault(v1, {}).append(k1)
return res_dict
but it gives me an error.
The result should be:
res_dict = {'Will': {'january': '7/01'}, 'Mary' : {'february': '2/02'} ,'George': {'march': '21/03'}, 'Steven': {'april': '14/03'}, 'Peter': {'may': ''}}
How can I get the desired result using setdefault()?
You could try this:
In [17]: results = {}
In [18]: for k, v in names.iteritems():
   ....:     results[k] = {v: dates.setdefault(k, '')}
   ....:
   ....:
In [20]: results
Out[20]:
{'George': {'march': '21/03'},
'Mary': {'february': '2/02'},
'Peter': {'may': ''},
'Steven': {'april': '14/03'},
'Will': {'january': '7/01'}}
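For reference, the attempt in the question calls .append() on the value returned by setdefault(v, {}), which is a dict, and dicts have no append method. Here is a sketch of how setdefault() could be used on the result dict itself, written for Python 3 (so items() instead of iteritems()); dates.get() is used so that dates is not modified:

```python
names = {'Will': 'january', 'Mary': 'february', 'George': 'march',
         'Steven': 'april', 'Peter': 'may'}
dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03', 'Mary': '2/02'}

res_dict = {}
for name, month in names.items():
    # setdefault returns the inner dict (creating it if missing), so we can assign into it
    res_dict.setdefault(name, {})[month] = dates.get(name, '')

print(res_dict)
```

Note that dates.setdefault(k, '') in the snippet above also inserts the missing key into dates, whereas dates.get(name, '') only reads.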
And as to your comment regarding adding month and day, you can add them similarly:
In [28]: for k, v in names.iteritems():
   ....:     results[k] = {'month': v, 'day': dates.setdefault(k, '')}
   ....:
   ....:
In [30]: results
Out[30]:
{'George': {'day': '21/03', 'month': 'march'},
'Mary': {'day': '2/02', 'month': 'february'},
'Peter': {'day': '', 'month': 'may'},
'Steven': {'day': '14/03', 'month': 'april'},
'Will': {'day': '7/01', 'month': 'january'}}
And if you want to omit day completely in the case where a value doesn't exist:
In [8]: results = {}
In [9]: for k, v in names.iteritems():
   ...:     results[k] = {'month': v}
   ...:     if dates.has_key(k):
   ...:         results[k]['day'] = dates[k]
   ...:
   ...:
In [10]: results
Out[10]:
{'George': {'day': '21/03', 'month': 'march'},
'Mary': {'day': '2/02', 'month': 'february'},
'Peter': {'month': 'may'},
'Steven': {'day': '14/03', 'month': 'april'},
'Will': {'day': '7/01', 'month': 'january'}}
And in the odd case where you know the date but not the month, iterating through the set of all keys (as @KayZhu suggested) with a defaultdict may be the easiest solution:
In [1]: from collections import defaultdict
In [2]: names = {'Will': 'january', 'Mary': 'february', 'George': 'march', 'Steven': 'april', 'Peter': 'may'}
In [3]: dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03', 'Mary': '2/02', 'Marat': '27/03'}
In [4]: results = defaultdict(dict)
In [5]: for name in set(names.keys() + dates.keys()):
   ...:     if name in names:
   ...:         results[name]['month'] = names[name]
   ...:     if name in dates:
   ...:         results[name]['day'] = dates[name]
   ...:
   ...:
In [6]: for k, v in results.iteritems():
   ...:     print k, v
   ...:
   ...:
George {'day': '21/03', 'month': 'march'}
Will {'day': '7/01', 'month': 'january'}
Marat {'day': '27/03'}
Steven {'day': '14/03', 'month': 'april'}
Peter {'month': 'may'}
Mary {'day': '2/02', 'month': 'february'}
A simple one-liner:
In [38]: names = {'Will': 'january', 'Mary': 'february', 'George': 'march', 'Steven': 'april', 'Peter': 'may'}
In [39]: dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03', 'Mary': '2/02'}
In [40]: dict((name,{names[name]:dates.get(name,'')}) for name in names)
Out[40]:
{'George': {'march': '21/03'},
'Mary': {'february': '2/02'},
'Peter': {'may': ''},
'Steven': {'april': '14/03'},
'Will': {'january': '7/01'}}
You will need to get the union of the keys from names and dates first:
>>> for k in set(names.keys() + dates.keys()):
...     res_dict[k] = {names.setdefault(k, ''): dates.setdefault(k, None)}
...
...
>>> res_dict
{'Will': {'january': '7/01'}, 'Steven': {'april': '14/03'}, 'Peter': {'may': None},
'Mary': {'february': '2/02'}, 'George': {'march': '21/03'}}
Otherwise, you will miss results whose keys are in dates but not in names.
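The names.keys() + dates.keys() expression only works in Python 2, where keys() returns lists. In Python 3 the key views support set operations directly, so the same idea could be sketched as follows (the extra 'Marat' entry is borrowed from the earlier answer to show a key that exists only in dates):

```python
names = {'Will': 'january', 'Mary': 'february', 'George': 'march',
         'Steven': 'april', 'Peter': 'may'}
dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03',
         'Mary': '2/02', 'Marat': '27/03'}

res_dict = {}
# keys() views behave like sets in Python 3, so | gives the union of both key sets
for name in sorted(names.keys() | dates.keys()):
    # get() reads with a default and, unlike setdefault(), leaves names and dates unchanged
    res_dict[name] = {names.get(name, ''): dates.get(name)}

print(res_dict)
```

Here a name without a month ends up under an empty-string key, mirroring the snippet above; whether that is the right placeholder depends on how the result is used.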