sorting and grouping multiple dictionaries in python - python

I have an unknown number of dictionaries each identified by a specific code.
All of the values are create dynamically, so the codes to group by are unknown.
I am hoping someone might be able to help me identify the best way to group the dictionaries so that I can then move through each to produce a table. There are actually about 7 items in the dictionary.
Each dictionary is a row in the table.
example:
results = ['GROUP1':{'name':'Sam', 'code':'CDZ', 'cat_name':'category1', 'cat_code':'GROUP1'}, 'GROUP1':{'name':'James', 'code':'CDF', 'cat_name':'category1', 'cat_code':'GROUP1'}, 'GROUP2':{'name':'Ellie', 'code':'CDT', 'cat_name':'category2', 'cat_code':'GROUP2'}]
I want to be able to format these dictionaries into a table using to produce the following:
GROUP1 - category1
CODE | NAME
CDZ | Sam
CDF | James
GROUP2 - category2
CODE | NAME
CDT | Ellie
Thanks so much in advance.

If results is a list of dicts like this
>>> results = [
... {'name': 'Sam', 'code': 'CDZ', 'cat_name': 'category1', 'cat_code': 'GROUP1'},
... {'name': 'James', 'code': 'CDF', 'cat_name': 'category1', 'cat_code': 'GROUP1'},
... {'name': 'Ellie', 'code': 'CDT', 'cat_name': 'category2', 'cat_code': 'GROUP2'}]
>>> from collections import defaultdict
>>> D = defaultdict(list)
>>> for item in results:
... D[item['cat_code'], item['cat_name']].append((item['code'], item['name']))
...
>>> import pprint
>>> pprint.pprint(dict(D))
{('GROUP1', 'category1'): [('CDZ', 'Sam'), ('CDF', 'James')],
('GROUP2', 'category2'): [('CDT', 'Ellie')]}
You can iterate through D.items() and do whatever you like

Your first two problems are that your example input is not valid Python (the dict literal syntax is {}, not []), and you cannot have two identical keys in one dict. Until you fix these, we can't do anything. I'm not sure how you'd want to fix the duplicate keys problem--maybe you actually want an array of dicts instead of a dict of dicts?

Related

Create a nested dict containing list from a file

For example, for the txt file of
Math, Calculus, 5
Math, Vector, 3
Language, English, 4
Language, Spanish, 4
into the dictionary of:
data={'Math':{'name':[Calculus, Vector], 'score':[5,3]}, 'Language':{'name':[English, Spanish], 'score':[4,4]}}
I am having trouble with appending value to create list inside the smaller dict. I'm very new to this and I would not understand importing command. Thank you so much for all your help!
For each line, find the 3 values, then add them to a dict structure
from pathlib import Path
result = {}
for row in Path("test.txt").read_text().splitlines():
subject_type, subject, score = row.split(", ")
if subject_type not in result:
result[subject_type] = {'name': [], 'score': []}
result[subject_type]['name'].append(subject)
result[subject_type]['score'].append(int(score))
You can simplify it with the use of a defaultdict that creates the mapping if the key isn't already present
result = defaultdict(lambda: {'name': [], 'score': []}) # from collections import defaultdict
for row in Path("test.txt").read_text().splitlines():
subject_type, subject, score = row.split(", ")
result[subject_type]['name'].append(subject)
result[subject_type]['score'].append(int(score))
With pandas.DataFrame you can directly the formatted data and output the format you want
import pandas as pd
df = pd.read_csv("test.txt", sep=", ", engine="python", names=['key', 'name', 'score'])
df = df.groupby('key').agg(list)
result = df.to_dict(orient='index')
From your data:
data={'Math':{'name':['Calculus', 'Vector'], 'score':[5,3]},
'Language':{'name':['English', 'Spanish'], 'score':[4,4]}}
If you want to append to the list inside your dictionary, you can do:
data['Math']['name'].append('Algebra')
data['Math']['score'].append(4)
If you want to add a new dictionary, you can do:
data['Science'] = {'name':['Chemisty', 'Biology'], 'score':[2,3]}
I am not sure if that is what you wanted but I hope it helps!

Is there a way to sort a dictionary from the outside in

I'm trying to create an event manager in which a dictionary stores the events like this
my_dict = {'2020':
{'9': {'8': ['School ']},
'11': {'13': ['Doctors ']},
'8': {'31': ['Interview']}
},
'2021': {}}
In which the outer key is the year the middle key is a month and the most inner key is a date which leads to a list of events.
I'm trying to first sort it so that the months are in order then sort it again so that the days are in order. Thanks in advance
Use-case
DevOrangeCrush wishes to sort on keys in a nested dictionary where the nesting occurs on multiple levels
Solution
Normalize the data so that the dates match ISO8601 format, for easier sorting
In plain English, this means make sure you always use two digits for month and date, and always use four digits for year
Re-normalize the original dictionary data structure into a single list of dictionaries, where each dictionary represents a row, and the list represents an outer containing table
this is known as an Array of Hashes in perl-speak
this is known as a list of objects in JSON-speak
Once your data is restructured you are solving a much more well-known, well-documented, and more obvious problem, how to sort a simple list of dictionaries (which is already documented in the See also section of this answer).
Example
import pprint
## original data is formatted as a nested dictionary, which is clumsy
my_dict = {'2020':
{'9': {'8': ['School ']}, '11':
{'13': ['Doctors ']},'8':
{'31': ['Interview']}}, '2021': {}
}
## we want the data formatted as a standard table (aka list of dictionary)
## this is the most common format for this kind of data as you would see in
## databases and spreadsheets
mydata_table = []
ddtemp = dict()
for year in my_dict:
for month in my_dict[year].keys():
ddtemp['month'] = '{0:02d}'.format(*[int(month)])
ddtemp['year'] = year
for day in my_dict[year][month].keys():
ddtemp['day'] = '{0:02d}'.format(*[int(day)])
mydata_row = dict()
mydata_row['year'] = '{year}'.format(**ddtemp)
mydata_row['month'] = '{month}'.format(**ddtemp)
mydata_row['day'] = '{day}'.format(**ddtemp)
mydata_row['task_list'] = my_dict[year][month][day]
mydata_row['date'] = '{year}-{month}-{day}'.format(**ddtemp)
mydata_table.append(mydata_row)
pass
pass
pass
## output result is now easily sorted and there is no data loss
## you will have to modify this if you want to deal with years that
## do not have any associated task_list data
pprint.pprint(mydata_table)
'''
## now we have something that can be sorted using well-known python idioms
## and easily manipulated using data-table semantics
## (search, sort, filter-by, group-by, select, project ... etc)
[
{'date': '2020-09-08','day': '08',
'month': '09','task_list': ['School '],'year': '2020'},
{'date': '2020-11-13','day': '13',
'month': '11','task_list': ['Doctors '],'year': '2020'},
{'date': '2020-08-31','day': '31',
'month': '08','task_list': ['Interview'],'year': '2020'},
]
'''
See also
How to sort a python list-of-dictionary
How to sort objects by multiple keys
Why you should use ISO8601 date format
ISO8601 vs timestamp
To get sorted events data, you can do something like this:
def sort_events(my_dict):
new_events_data = dict()
for year, month_data in my_dict.items():
new_month_data = dict()
for month, day_data in month_data.items():
sorted_day_data = sorted(day_data.items(), key=lambda kv: int(kv[0]))
new_month_data[month] = OrderedDict(sorted_day_data)
sorted_months_data = sorted(new_month_data.items(), key=lambda kv: int(kv[0]))
new_events_data[year] = OrderedDict(sorted_months_data)
return new_events_data
Output:
{'2020': OrderedDict([('8', OrderedDict([('31', ['Interview'])])),
('9', OrderedDict([('8', ['School '])])),
('11', OrderedDict([('13', ['Doctors '])]))]),
'2021': OrderedDict()}
A simple dict can't be ordered, you could do it using a OrderedDict but if you simply need to get it sorted while iterating on it do like this
for year in sorted(map(int, my_dict)):
year_dict = my_dict[str(year)]
for month in sorted(map(int, year_dict)):
month_dict = year_dict[str(month)]
for day in sorted(map(int, month_dict)):
events = month_dict[str(day)]
for event in events:
print(year, month, day, event)
Online Demo
The conversion to int is to ensure right ordering between the numbers, without you'll get 1, 10, 11, .., 2, 20, 21
A dictionary in Python does not have an order, you might want to try the OrderedDict class from the collections Module which remembers the order of insertion.
Of course you would have to sort and reinsert the elements whenever you insert a new element which should be placed before any of the existing elements.
If you care about order maybe a different data structure works better. For example a list of lists.

get a value of a key associated with other key's values in dictionaries Python

I have two lists of dictionaries and json list and I need to grab a value of a specific key based on the value of a key from another dictionary. My data looks like this:
opps = [{'Product2Id': '100','Price': '1645'}, {'Product2Id': '101','Price': '5478'}]
products = [{'Id': '100', 'Name': 'Insertion'}, [{'Id': '101', 'Name': 'Print'}]
sales_json = {'Insertion': {'name': 'BAZ', 'id': '95'}, 'Print': {'name': 'BIC', 'id': '105'}
I need to loop through opps and assign a value to a new variable from sales_json. But for a specific Id that are stored in products and in opps
I tried the following:
for index, my_dict in enumerate(opps):
new_name = sales_json[products[my_dict["Product2Id"]]["Name"]]["name"]
Gives me an error.
The desired output is:
print(new_name)
BAZ,
BIC
You are trying to use the list products as a dictionary. Instead, you should first build a product number to name dictionary from it:
prod_num_to_name = {d['Id']: d['Name'] for d in products}
Then, you can run the loop you wanted, modified like this:
for index, my_dict in enumerate(opps):
new_name = sales_json[prod_num_to_name[my_dict["Product2Id"]]]["name"]
print new_name
To return a list of names that match the criteria, using a List Comprehension:
names = [ sales_json[product['Name']]['name'] for opp in opps for product in products if product['Id'] == opp['Product2Id']]
print (names)
Prints the list of names:
['BAZ', 'BIC']

Filtering through a list with embedded dictionaries

I've got a json format list with some dictionaries within each list, it looks like the following:
[{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
The amount of entries within the list can be up to 100. I plan to present the 'name' for each entry, one result at a time, for those that have London as a town. The rest are of no use to me. I'm a beginner at python so I would appreciate a suggestion in how to go about this efficiently. I initially thought it would be best to remove all entries that don't have London and then I can go through them one by one.
I also wondered if it might be quicker to not filter but to cycle through the entire json and select the names of entries that have the town as London.
You can use filter:
data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
london_dicts = filter(lambda d: d['venue']['town'] == 'London', data)
for d in london_dicts:
print(d)
This is as efficient as it can get because:
The loop is written in C (in case of CPython)
filter returns an iterator (in Python 3), which means that the results are loaded to memory one by one as required
One way is to use list comprehension:
>>> data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
>>> [d for d in data if d['venue']['town'] == 'London']
[{'id': 17,
'name': 'Alfred',
'venue': {'id': 456, 'town': 'London'},
'month': 'February'},
{'id': 17,
'name': 'Mary',
'venue': {'id': 56, 'town': 'London'},
'month': 'December'}]

How to create a Pandas DataFrame from a list of OrderedDicts?

I have the following list:
o_dict_list = [(OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
(OrderedDict([('StreetNamePreType', 'AVENUE'), ('StreetName', 'Washington')]), 'Ambiguous'),
(OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous')]
And like the title says, I am trying to take this list and create a pandas dataframe where the columns are: 'StreetNamePreType' and 'StreetName' and the rows contain the corresponding values for each key in the OrderedDict.
I have done some searching on StackOverflow to get some guidance on how to create a dataframe, see here but I am getting an error when I run this code (I am trying to replicate what is going on in that response).
from collections import Counter, OrderedDict
import pandas as pd
col = Counter()
for k in o_dict_list:
col.update(k)
df = pd.DataFrame([k.values() for k in o_dict_list], columns = col.keys())
When I run this code, the error I get is: TypeError: unhashable type: 'OrderedDict'
I looked up this error, here, I get that there is a problem with the datatypes, but I, unfortunately, I don't know enough about the inner workings of Python/Pandas to resolve this problem on my own.
I suspect that my list of OrderedDict is not exactly the same as in here which is why I am not getting my code to work. More specifically, I believe I have a list of sets, and each element contains an OrderedDict. The example, that I have linked to here seems to be a true list of OrderedDicts.
Again, I don't know enough about the inner workings of Python/Pandas to resolve this problem on my own and am looking for help.
I would use list comprehension to do this as follows.
pd.DataFrame([o_dict_list[i][0] for i, j in enumerate(o_dict_list)])
See the output below.
StreetNamePreType StreetName
0 ROAD Coffee
1 AVENUE Washington
2 ROAD Quartz
extracting the OrderedDict objects from your list and then use pd.Dataframe should work
values= []
for i in range(len(o_dict_list)):
values.append(o_dict_list[i][0])
pd.DataFrame(values)
StreetNamePreType StreetName
0 ROAD Coffee
1 AVENUE Washington
2 ROAD Quartz
d = [{'points': 50, 'time': '5:00', 'year': 2010},
{'points': 25, 'time': '6:00', 'month': "february"},
{'points':90, 'time': '9:00', 'month': 'january'},
{'points_h1':20, 'month': 'june'}]
pd.DataFrame(d)

Categories

Resources