Python group by multiple keys in a dict [closed]

I have a list of lists of dicts that I want to group by multiple keys.
I tried sorting the items of each dict with Python's default sort:
data = [
[],
[{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020}, {'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020}, {'value': 126, 'bot': 'DB', 'month':8, 'year': 2021}],
[],
[{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020}, {'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}, {'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}],
[{'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}],
[{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020}, {'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021}, {'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020}, {'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021}, {'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021}, {'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}]
]
output_dict = {}
for i in data:
    if not i:
        pass
    for j in i:
        for key, val in sorted(j.items()):
            output_dict.setdefault(val, []).append(key)
print(output_dict)
{'DB': ['bot', 'bot', 'bot', 'bot', 'bot', 'bot', 'bot', 'bot', 'bot'], 9: ['month', 'month', 'month'], 8: ['value'], 2020: ['year', 'year', 'year', 'year', 'year'], 10: ['month', 'month'], 79: ['value'], 126: ['value'], 2021: ['year', 'year', 'year', 'year', 'year', 'year', 'year', 'year'], 'GEMBOT': ['bot', 'bot', 'bot', 'bot'], 11: ['month', 'month'], 222: ['value'], 4: ['month', 'month', 'month'], 623: ['value'], 628: ['value'], 0: ['value'], 703: ['value'], 3: ['month'], 1081: ['value'], 1335: ['value'], 1920: ['value'], 1: ['month'], 2132: ['value'], 2: ['month'], 2383: ['value']}
But I want the output like this.
[{ "bot": "DB",
"date": "Sept 20",
"value": 134
},{"bot": "DB",
"date": "Oct 20",
"value": 79
}.. So on ]
Is there an efficient way to flatten and group this list?
Thanks in advance.

Two things will make this easier to answer. The first is a list comprehension that will promote sub-items:
data_reshaped = [cell for row in data for cell in row]
This takes your original data and flattens it to:
[
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020},
{'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021},
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020},
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021},
{'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020},
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021},
{'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021},
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021},
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
]
Now we can iterate over that using a compound key and setdefault() to aggregate the results. If you would rather use collections.defaultdict(), as I usually do, swap it in for setdefault().
results = {}
for cell in data_reshaped:
    key = f"{cell['bot']}_{cell['year']}_{cell['month']}"
    value = cell["value"]  # save the value so we can reset cell next
    cell["value"] = 0  # setting this to 0 cleans up the next line.
    results.setdefault(key, cell)["value"] += value
This should allow you to print each aggregated result:
for result in results.values():
    print(result)
Giving:
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020}
{'value': 1414, 'bot': 'DB', 'month': 10, 'year': 2020}
{'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021}
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020}
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020}
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021}
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021}
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021}
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
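For comparison, here is a minimal sketch of the collections.defaultdict variant mentioned above. It assumes a tuple key instead of the formatted string, uses a trimmed sample of the question's data, and builds new result dicts rather than modifying the input; the names totals and results are only illustrative.

```python
from collections import defaultdict

# A trimmed sample in the same nested shape as the question's data.
data = [
    [],
    [{'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020}],
    [{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}],
    [{'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
     {'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020}],
]

# Sum the values per (bot, year, month); missing keys start at int() == 0.
totals = defaultdict(int)
for row in data:
    for cell in row:
        totals[(cell['bot'], cell['year'], cell['month'])] += cell['value']

# Rebuild one dict per group; the input dicts are left untouched.
results = [
    {'bot': bot, 'year': year, 'month': month, 'value': value}
    for (bot, year, month), value in totals.items()
]
print(results)
```

Because the totals live in a separate mapping, the original data list can still be reused afterwards.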
Full solution:
data = [
[],
[
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020},
{'value': 79, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 126, 'bot': 'DB', 'month':8, 'year': 2021}
],
[],
[
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020},
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021},
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}
],
[
{'value': 0, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}
],
[
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020},
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021},
{'value': 1335, 'bot': 'DB', 'month': 10, 'year': 2020},
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021},
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021},
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
]
]
data_reshaped = [cell for row in data for cell in row]
results = {}
for cell in data_reshaped:
    key = f"{cell['bot']}_{cell['year']}_{cell['month']}"
    value = cell["value"]
    cell["value"] = 0
    results.setdefault(key, cell)["value"] += value
for result in results.values():
    print(result)
Again giving:
{'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020}
{'value': 1414, 'bot': 'DB', 'month': 10, 'year': 2020}
{'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021}
{'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020}
{'value': 623, 'bot': 'GEMBOT', 'month': 4, 'year': 2021}
{'value': 628, 'bot': 'GEMBOT', 'month': 9, 'year': 2021}
{'value': 703, 'bot': 'DB', 'month': 11, 'year': 2020}
{'value': 1081, 'bot': 'DB', 'month': 3, 'year': 2021}
{'value': 1920, 'bot': 'DB', 'month': 4, 'year': 2021}
{'value': 2132, 'bot': 'DB', 'month': 1, 'year': 2021}
{'value': 2383, 'bot': 'DB', 'month': 2, 'year': 2021}
I will leave it to you to figure out casting the two date fields to some other presentation as that seems out of context with the question at hand.
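If it helps, one possible way to do that date presentation is sketched below. It assumes the standard library's calendar.month_abbr (which follows the current locale, so it gives 'Sep' rather than the question's 'Sept' in the default English/C locale) and a two-digit year; format_date is a hypothetical helper name.

```python
import calendar

def format_date(month, year):
    """Return a label such as 'Sep 20' for month=9, year=2020."""
    # calendar.month_abbr[1..12] holds the abbreviated month names
    # for the current locale; index 0 is an empty string.
    return f"{calendar.month_abbr[month]} {year % 100:02d}"

# One aggregated record, in the shape printed above.
result = {'value': 1414, 'bot': 'DB', 'month': 10, 'year': 2020}

presented = {
    'bot': result['bot'],
    'date': format_date(result['month'], result['year']),
    'value': result['value'],
}
print(presented)
```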

Maybe try:
from pprint import pprint
import datetime

output_dict = []
for i in data:
    if i:
        for j in i:
            temp = {}  # start a fresh dict for every record
            for key, val in sorted(j.items()):
                if key == "bot":
                    temp["bot"] = val
                elif key == "value":
                    temp["value"] = val
                elif key == "month":
                    month = datetime.datetime.strptime(str(val), "%m")
                    temp["date"] = month.strftime("%b")
                elif key == "year":
                    # "month" sorts before "year", so temp["date"] already holds the month name
                    temp["date"] = str(temp["date"]) + " " + str(val)
            output_dict.append(temp)
pprint(output_dict)
The final results are shown as follows:
[{'bot': 'DB', 'date': 'Sep 2020', 'value': 8},
{'bot': 'DB', 'date': 'Oct 2020', 'value': 79},
{'bot': 'DB', 'date': 'Aug 2021', 'value': 126},
{'bot': 'GEMBOT', 'date': 'Nov 2020', 'value': 222},
{'bot': 'GEMBOT', 'date': 'Apr 2021', 'value': 623},
{'bot': 'GEMBOT', 'date': 'Sep 2021', 'value': 628},
{'bot': 'GEMBOT', 'date': 'Apr 2021', 'value': 0},
{'bot': 'DB', 'date': 'Nov 2020', 'value': 703},
{'bot': 'DB', 'date': 'Mar 2021', 'value': 1081},
{'bot': 'DB', 'date': 'Oct 2020', 'value': 1335},
{'bot': 'DB', 'date': 'Apr 2021', 'value': 1920},
{'bot': 'DB', 'date': 'Jan 2021', 'value': 2132},
{'bot': 'DB', 'date': 'Feb 2021', 'value': 2383}]

Maybe try:
output = []
for i in data:
    if not i:
        continue
    for j in i:
        output.append(j)
And then if you want to sort it, you can use sorted_output = sorted(output, key=lambda k: k['bot']) to sort it by bot, for example. If you want to sort it by date, you could compute a single value that expresses the date in months and sort on that.
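The date sort suggested above could be sketched in two ways, shown here on a small assumed sample: a (year, month) tuple key, or a single "date in months" integer. Both give the same order.

```python
# A few flattened records, as produced by the loop above.
output = [
    {'value': 126, 'bot': 'DB', 'month': 8, 'year': 2021},
    {'value': 222, 'bot': 'GEMBOT', 'month': 11, 'year': 2020},
    {'value': 8, 'bot': 'DB', 'month': 9, 'year': 2020},
]

# Tuples compare element by element, so (year, month) sorts chronologically.
by_date = sorted(output, key=lambda k: (k['year'], k['month']))

# The "date in months" idea: collapse year and month into one integer.
by_months = sorted(output, key=lambda k: k['year'] * 12 + k['month'])

print([(d['year'], d['month']) for d in by_date])
```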

Related

Count duplicates in dictionary by specific keys

I have a list of dictionaries and I need to count duplicates by specific keys.
For example:
[
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
If I specify 'age' and 'country', it should return:
[
{
'age': 10,
'country': 'USA',
'count': 2
},
{
'age': 10,
'country': 'Canada',
'count': 2
},
{
'age': 15,
'country': 'Canada',
'count': 1
}
]
Or if I specify 'name' and 'height':
[
{
'name': 'John',
'height': 185,
'count': 2
},
{
'name': 'Mark',
'height': 180,
'count': 2
},
{
'name': 'Doe',
'height': 185,
'count': 1
}
]
Maybe there is a way to implement this with Counter?

You can use itertools.groupby on a sorted list (groupby only merges adjacent items, so the data must be sorted by the same key first):
>>> data = [
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
>>> from itertools import groupby
>>> key = 'age', 'country'
>>> list_sorter = lambda x: tuple(x[k] for k in key)
>>> grouper = lambda x: tuple(x[k] for k in key)
>>> result = [
{**dict(zip(key, k)), 'count': len([*g])}
for k, g in
groupby(sorted(data, key=list_sorter), grouper)
]
>>> result
[{'age': 10, 'country': 'Canada', 'count': 2},
{'age': 10, 'country': 'USA', 'count': 2},
{'age': 15, 'country': 'Canada', 'count': 1}]
>>> key = 'name', 'height'
>>> result = [
{**dict(zip(key, k)), 'count': len([*g])}
for k, g in
groupby(sorted(data, key=list_sorter), grouper)
]
>>> result
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
If you use pandas, you can chain pandas.DataFrame.groupby, DataFrameGroupBy.size, pandas.Series.to_frame, pandas.DataFrame.reset_index and finally pandas.DataFrame.to_dict with orient='records':
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> df.groupby(list(key)).size().to_frame('count').reset_index().to_dict('records')
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
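Since the question asks about Counter, here is a minimal sketch of that route. It assumes every record contains all of the requested keys; count_by is a hypothetical helper name, not something from the answers above.

```python
from collections import Counter

data = [
    {'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
    {'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
    {'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
    {'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
    {'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185},
]

def count_by(records, keys):
    """Count records that share the same values for the given keys."""
    # Each record is reduced to a hashable tuple of the selected values.
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    # Turn every (values, count) pair back into a dict.
    return [{**dict(zip(keys, combo)), 'count': n}
            for combo, n in counts.items()]

print(count_by(data, ('age', 'country')))
print(count_by(data, ('name', 'height')))
```

Unlike the groupby version, this needs no prior sort, and the groups come out in order of first appearance.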

Parse multiple nested file in a pandas dataframe

I have this file, which is a sample result from an Elasticsearch query.
[{'key': 'hkdshkdsd',
'doc_count': 1851,
'aggs_fs': {'doc_count_error_upper_bound': 0,
'sum_other_doc_count': 697,
'buckets': [{'key': 'jdsjodsjod',
'doc_count': 113,
'agg_date': {'buckets': [{'key_as_string': '2020-09-07T14:00:00.000Z',
'key': 1599487200000,
'doc_count': 20,
'agg_ave': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T15:00:00.000Z',
'key': 1599490800000,
'doc_count': 19,
'agg_ave': {'value': 40.22999954223633},
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T16:00:00.000Z',
'key': 1599494400000,
'doc_count': 27,
'agg_ave': {'value': 40.22999954223633},
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T17:00:00.000Z',
'key': 1599498000000,
'doc_count': 20,
'agg_ave': {'value': 40.22999954223633},
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T18:00:00.000Z',
'key': 1599501600000,
'doc_count': 23,
'agg_ave': {'value': 40.22999954223633},
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T19:00:00.000Z',
'key': 1599505200000,
'doc_count': 4,
'agg_ave': {'value': 40.22999954223633},
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T20:00:00.000Z',
'key': 1599508800000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T21:00:00.000Z',
'key': 1599512400000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T22:00:00.000Z',
'key': 1599516000000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-07T23:00:00.000Z',
'key': 1599519600000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-08T00:00:00.000Z',
'key': 1599523200000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-08T01:00:00.000Z',
'key': 1599526800000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}},
{'key_as_string': '2020-09-08T02:00:00.000Z',
'key': 1599530400000,
'doc_count': 0,
'aggs_ma': {'value': 40.22999954223633}}]}}]}}]
I need to convert this file into a pandas DataFrame. I tried json_normalize, but it seems to normalize only the first level, and when I try to keep normalizing the nested keys it raises an error.
Can somebody help me?
Thanks

I use this code:
import pandas as pd

dfs = []
for i in your_list:
    df = pd.DataFrame.from_dict(i, orient='index')
    # pd.DataFrame.from_dict(i, orient='index').T may be what you need instead
    dfs.append(df)  # append the new frame, not the list itself
full_df = pd.concat(dfs)
It just made life easier than json_normalize.
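Another option, sketched here under assumptions, is to walk the three nesting levels by hand and collect one flat row per innermost bucket. The sample below is a trimmed copy of the response in the question, and the column names (outer_key, inner_key, timestamp) are made up for illustration.

```python
# A trimmed sample with the same nesting as the response in the question.
response = [
    {'key': 'hkdshkdsd', 'doc_count': 1851,
     'aggs_fs': {'buckets': [
         {'key': 'jdsjodsjod', 'doc_count': 113,
          'agg_date': {'buckets': [
              {'key_as_string': '2020-09-07T14:00:00.000Z', 'doc_count': 20,
               'agg_ave': {'value': 40.22999954223633}},
              {'key_as_string': '2020-09-07T20:00:00.000Z', 'doc_count': 0,
               'aggs_ma': {'value': 40.22999954223633}},
          ]}},
     ]}},
]

rows = []
for outer in response:
    for inner in outer['aggs_fs']['buckets']:
        for bucket in inner['agg_date']['buckets']:
            rows.append({
                'outer_key': outer['key'],
                'inner_key': inner['key'],
                'timestamp': bucket['key_as_string'],
                'doc_count': bucket['doc_count'],
                # agg_ave and aggs_ma are missing from some buckets,
                # so .get() falls back to None instead of raising.
                'agg_ave': bucket.get('agg_ave', {}).get('value'),
                'aggs_ma': bucket.get('aggs_ma', {}).get('value'),
            })

print(len(rows))
```

The resulting list of flat dicts can then be handed to pd.DataFrame(rows), with missing values showing up as NaN/None.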

creating multiple index on a pandas dataframe

I am trying to convert a DataFrame to a dict in the format below:
name age country state pincode
user1 10 in tn 1
user2 11 in tx 2
user3 12 eu gh 3
user4 13 eu io 4
user5 14 us pi 5
user6 15 us ew 6
The output should group users by country, with each country mapping to a dictionary of users, and each user mapping to a dictionary of that user's details:
{
'in': {
'user1': {'age': 10, 'state': 'tn', 'pincode': 1},
'user2': {'age': 11, 'state': 'tx', 'pincode': 2}
},
'eu': {
'user3': {'age': 12, 'state': 'gh', 'pincode': 3},
'user4': {'age': 13, 'state': 'io', 'pincode': 4},
},
'us': {
'user5': {'age': 14, 'state': 'pi', 'pincode': 5},
'user6': {'age': 15, 'state': 'ew', 'pincode': 6},
}
}
I am currently doing this with the statement below (it is not quite correct, since I initialize a list inside the loop where it should be a dict):
op2 = {}
for i, row in sample2.iterrows():
    if row['country'] not in op2:
        op2[row['country']] = []
    op2[row['country']] = {row['name']: {'age': row['age'], 'state': row['state'], 'pincode': row['pincode']}}
I want the solution to keep working if additional columns are added to the df, for example a telephone number. Since the statement I have written is static, it won't include the additional columns in my output. Is there a built-in method in pandas that does this?

You can combine to_dict with groupby:
{k: v.drop('country', axis=1).to_dict('index')
 for k, v in df.set_index('name').groupby('country')}
Output:
{'eu': {'user3': {'age': 12, 'state': 'gh', 'pincode': 3},
'user4': {'age': 13, 'state': 'io', 'pincode': 4}},
'in': {'user1': {'age': 10, 'state': 'tn', 'pincode': 1},
'user2': {'age': 11, 'state': 'tx', 'pincode': 2}},
'us': {'user5': {'age': 14, 'state': 'pi', 'pincode': 5},
'user6': {'age': 15, 'state': 'ew', 'pincode': 6}}}
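To make the resulting shape concrete, here is a rough pandas-free sketch of the same grouping. It assumes the rows are available as plain dicts (for example from df.to_dict('records')) and uses a shortened sample; it is not the answer's method, only an illustration of what it produces.

```python
# Rows as plain dicts, e.g. what df.to_dict('records') would give.
rows = [
    {'name': 'user1', 'age': 10, 'country': 'in', 'state': 'tn', 'pincode': 1},
    {'name': 'user2', 'age': 11, 'country': 'in', 'state': 'tx', 'pincode': 2},
    {'name': 'user3', 'age': 12, 'country': 'eu', 'state': 'gh', 'pincode': 3},
]

nested = {}
for row in rows:
    # Keep every column except the two used for grouping, so extra
    # columns (a telephone number, say) flow through automatically.
    details = {k: v for k, v in row.items() if k not in ('name', 'country')}
    nested.setdefault(row['country'], {})[row['name']] = details

print(nested)
```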

How can i add the dictionary into list using append function or the other function?

Excuse me, I need your help!
Code Script
tracks_ = []
track = {}
if category == 'reference':
    for i in range(len(tracks)):
        if len(tracks) >= 1:
            _track = tracks[i]
            track['id'] = _track['id']
            tracks_.append(track)
print(tracks_)
tracks File
[{'id': 345, 'mode': 'ghost', 'missed': 27, 'box': [0.493, 0.779, 0.595, 0.808], 'score': 89, 'class': 1, 'time': 3352}, {'id': 347, 'mode': 'ghost', 'missed': 9, 'box': [0.508, 0.957, 0.631, 0.996], 'score': 89, 'class': 1, 'time': 5463}, {'id': 914, 'mode': 'track', 'missed': 0, 'box': [0.699, 0.496, 0.991, 0.581], 'score': 87, 'class': 62, 'time': 6549}, {'id': 153, 'mode': 'track', 'missed': 0, 'box': [0.613, 0.599, 0.88, 0.689], 'score': 73, 'class': 62, 'time': 6549}, {'id': 588, 'mode': 'track', 'missed': 0, 'box': [0.651, 0.685, 0.958, 0.775], 'score': 79, 'class': 62, 'time': 6549}, {'id': 972, 'mode': 'track', 'missed': 0, 'box': [0.632, 0.04, 0.919, 0.126], 'score': 89, 'class': 62, 'time': 6549}, {'id': 300, 'mode': 'ghost', 'missed': 6, 'box': [0.591, 0.457, 0.74, 0.498], 'score': 71, 'class': 62, 'time': 5716}]
Based on the script and the input above, I want to print tracks_, and the result is:
[{'id': 300}, {'id': 300}, {'id': 300}, {'id': 300}, {'id': 300}, {'id': 300}, {'id': 300}]
But the printed result should look like this:
[{'id': 345}, {'id': 347}, {'id': 914}, {'id': 153}, {'id': 588}, {'id': 972}, {'id': 300}]

You are appending the same dict to your list tracks_ on every iteration, so the list only holds references to one dict. In practice there is a single dict in tracks_, and any modification to track is reflected in all elements of the list. To fix this, create a new dict on each iteration:
if category == 'reference' and len(tracks) >= 1:
    for d in tracks:
        tracks_.append({'id': d['id']})
You could also use a list comprehension:
tracks_ = [{'id': t['id']} for t in tracks]
tracks_
output:
[{'id': 345},
{'id': 347},
{'id': 914},
{'id': 153},
{'id': 588},
{'id': 972},
{'id': 300}]
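The difference between appending one shared dict and appending a fresh dict each time can be seen in this small sketch, which uses a shortened, assumed tracks list; the names shared, aliased and fresh are illustrative only.

```python
tracks = [{'id': 345}, {'id': 347}, {'id': 914}]

# Buggy pattern: one dict is mutated and appended repeatedly.
shared = {}
aliased = []
for t in tracks:
    shared['id'] = t['id']   # overwrites the value in the single dict
    aliased.append(shared)   # appends another reference to that dict

# Fixed pattern: a new dict is created on every iteration.
fresh = []
for t in tracks:
    fresh.append({'id': t['id']})

print(aliased)                    # every entry shows the last id
print(fresh)                      # each entry keeps its own id
print(aliased[0] is aliased[-1])  # the entries are the same object
```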

Python List of Dictionaries iterations

What is wrong with this code? It returns an empty list.
week = []
for d in week:
    day_num = calendar.weekday(d.year, d.month, d.day)
    day_name = calendar.day_name[day_num]
    daydate = {"day_name": day_name,
               "day": d.day,
               "month": d.month,
               "year": d.year,
               }
    week.append(daydate)
return week

Because the list week is empty initially, the for loop is iterated zero times.

Your week list is set to [] just before the for statement, so the loop has no elements to iterate over. You have to either:
remove the week = [] line if week has already been defined, or
add elements to the list before the loop.
I fixed your code. Perhaps it is not week that you want to iterate over, but another variable.

import calendar
from datetime import date, timedelta

import numpy as np
import pandas as pd

def generateDays(start_date, weeks):
    days = 7 * weeks
    week = []
    for day in np.arange(days):
        # np.arange yields numpy integers, so cast to int for timedelta
        a_date = pd.to_datetime(start_date + timedelta(days=int(day)))
        day_num = calendar.weekday(a_date.year, a_date.month, a_date.day)
        day_name = calendar.day_name[day_num]
        daydate = {"day_name": day_name,
                   "day": a_date.day,
                   "month": a_date.month,
                   "year": a_date.year,
                   }
        week.append(daydate)
    return week

print(generateDays(date.today(), 2))
output
[{'day_name': 'Wednesday', 'day': 16, 'month': 6, 'year': 2021}, {'day_name': 'Thursday', 'day': 17, 'month': 6, 'year': 2021}, {'day_name': 'Friday', 'day': 18, 'month': 6, 'year': 2021}, {'day_name': 'Saturday', 'day': 19, 'month': 6, 'year': 2021}, {'day_name': 'Sunday', 'day': 20, 'month': 6, 'year': 2021}, {'day_name': 'Monday', 'day': 21, 'month': 6, 'year': 2021}, {'day_name': 'Tuesday', 'day': 22, 'month': 6, 'year': 2021}, {'day_name': 'Wednesday', 'day': 23, 'month': 6, 'year': 2021}, {'day_name': 'Thursday', 'day': 24, 'month': 6, 'year': 2021}, {'day_name': 'Friday', 'day': 25, 'month': 6, 'year': 2021}, {'day_name': 'Saturday', 'day': 26, 'month': 6, 'year': 2021}, {'day_name': 'Sunday', 'day': 27, 'month': 6, 'year': 2021}, {'day_name': 'Monday', 'day': 28, 'month': 6, 'year': 2021}, {'day_name': 'Tuesday', 'day': 29, 'month': 6, 'year': 2021}]
