Get top rows by one column Django - python

I'm building a job to categorize earnings and expenses from a transactions ("movimentations") app.
To do this, I need to get, for each title, the category that is used most often.
For example, in this scenario:
| title | category | count |
| ----- | -------------- | ----- |
| Pizza | food | 6 |
| Pizza | others_expense | 1 |
| Pizza | refund | 1 |
I want to return just the first row, because the title is the same and the category food is the one used most frequently.
Code example
I want to get the result using only the Django ORM, because I work with different databases and it is faster than iterating over a large list.
Model:
class Movimentation(models.Model):
    title = models.CharField(max_length=50)
    value = models.FloatField()
    category = models.CharField(max_length=50)
Query:
My current query in the Django ORM is:
Movimentation.objects \
    .values('title', 'category') \
    .annotate(count=Count('*'))
Result:
[
{'title': 'Pizza', 'category': 'food', 'count': 6},
{'title': 'Pizza', 'category': 'others_expense', 'count': 1},
{'title': 'Pizza', 'category': 'refund', 'count': 1},
{'title': 'Hamburguer', 'category': 'food', 'count': 1},
{'title': 'Clothing', 'category': 'personal', 'count': 18},
{'title': 'Clothing', 'category': 'home', 'count': 15},
{'title': 'Clothing', 'category': 'others_expense', 'count': 1}
]
Expected result:
In this case, I would get just one row per title, with the most used category.
[
{'title': 'Pizza', 'category': 'food', 'count': 6},
{'title': 'Hamburguer', 'category': 'food', 'count': 1},
{'title': 'Clothing', 'category': 'personal', 'count': 18}
]
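No ORM-only answer is recorded here. As a fallback, one possible approach is to let the database do the grouping with the query above and then reduce the grouped rows in Python, since that list has only one entry per (title, category) pair. The sketch below is a minimal illustration, assuming `rows` holds the result of that query; the helper name `top_category_per_title` is hypothetical, and on a tie the row seen first is kept.

```python
# Grouped rows as returned by the values()/annotate() query in the question.
rows = [
    {'title': 'Pizza', 'category': 'food', 'count': 6},
    {'title': 'Pizza', 'category': 'others_expense', 'count': 1},
    {'title': 'Pizza', 'category': 'refund', 'count': 1},
    {'title': 'Hamburguer', 'category': 'food', 'count': 1},
    {'title': 'Clothing', 'category': 'personal', 'count': 18},
    {'title': 'Clothing', 'category': 'home', 'count': 15},
    {'title': 'Clothing', 'category': 'others_expense', 'count': 1},
]


def top_category_per_title(rows):
    """Keep, for each title, the row with the highest count (hypothetical helper)."""
    best = {}
    for row in rows:
        current = best.get(row['title'])
        # Strictly greater: on a tie, the first row seen wins.
        if current is None or row['count'] > current['count']:
            best[row['title']] = row
    # Dicts preserve insertion order, so titles come out in first-seen order.
    return list(best.values())


top_rows = top_category_per_title(rows)
print(top_rows)
```

This does not satisfy the ORM-only requirement; it only limits the Python-side work to the already aggregated rows.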

Related

Filter list based on value existing in another list

I have a first list A =
[{'name': 'PASSWORD', 'id': '5f2496e5-dc40-418a-92e0-098e4642a92e'},
{'name': 'PERSON_NAME', 'id': '3a255440-e2aa-4c4d-993f-4cdef3237920'},
{'name': 'PERU_DNI_NUMBER', 'id': '41f41303-4a71-4732-a8a4-0eecea464562'},
{'name': 'PHONE_NUMBER', 'id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b'},
{'name': 'POLAND_NATIONAL_ID_NUMBER',
'id': '32c49d92-6d5f-408e-b41e-dfec76ceae6a'}]
and I have a second list B :
[{'name': 'PHONE_NUMBER', 'count': '96'}]
I want to filter the first list based on the second in order to have the following list :
[{'name': 'PHONE_NUMBER', 'count': '96', 'id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b'}]
I have used the following code, but I don't get the right output:
filtered = []
for x,i in DLP_job[i]['name']:
    if x,i in ids[i]['name']:
        filtered.append(x)
print(filtered)
Here is the solution
A = [{'name': 'PASSWORD', 'id': '5f2496e5-dc40-418a-92e0-098e4642a92e'},
{'name': 'PERSON_NAME', 'id': '3a255440-e2aa-4c4d-993f-4cdef3237920'},
{'name': 'PERU_DNI_NUMBER', 'id': '41f41303-4a71-4732-a8a4-0eecea464562'},
{'name': 'PHONE_NUMBER', 'id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b'},
{'name': 'POLAND_NATIONAL_ID_NUMBER',
'id': '32c49d92-6d5f-408e-b41e-dfec76ceae6a'}]
B = [{'name': 'PHONE_NUMBER', 'count': '96'}]
print([{**x, **y} for x in A for y in B if y['name'] == x['name']])
One way is to walk both lists, and wherever you have matching name keys, use the merger of the 2 dicts:
l1 = [{'name': 'PASSWORD', 'id': '5f2496e5-dc40-418a-92e0-098e4642a92e'},
{'name': 'PERSON_NAME', 'id': '3a255440-e2aa-4c4d-993f-4cdef3237920'},
{'name': 'PERU_DNI_NUMBER', 'id': '41f41303-4a71-4732-a8a4-0eecea464562'},
{'name': 'PHONE_NUMBER', 'id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b'},
{'name': 'POLAND_NATIONAL_ID_NUMBER',
'id': '32c49d92-6d5f-408e-b41e-dfec76ceae6a'}]
l2 = [{'name': 'PHONE_NUMBER', 'count': '96'}, {'name': 'PERSON_NAME', 'count': '100'}]
result = []
for d2 in l2:
    for d1 in l1:
        if d1['name'] == d2['name']:
            result.append({**d1, **d2})
print(result)
[{'name': 'PHONE_NUMBER', 'id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b', 'count': '96'},
{'name': 'PERSON_NAME', 'id': '3a255440-e2aa-4c4d-993f-4cdef3237920', 'count': '100'}]
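Both answers above compare every entry of one list with every entry of the other. For larger lists, a lookup table keyed by name may avoid the nested loop. The following is a minimal sketch, assuming names are unique within A; the variable names `by_name` and `merged` are illustrative only.

```python
A = [{'name': 'PASSWORD', 'id': '5f2496e5-dc40-418a-92e0-098e4642a92e'},
     {'name': 'PERSON_NAME', 'id': '3a255440-e2aa-4c4d-993f-4cdef3237920'},
     {'name': 'PERU_DNI_NUMBER', 'id': '41f41303-4a71-4732-a8a4-0eecea464562'},
     {'name': 'PHONE_NUMBER', 'id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b'},
     {'name': 'POLAND_NATIONAL_ID_NUMBER',
      'id': '32c49d92-6d5f-408e-b41e-dfec76ceae6a'}]
B = [{'name': 'PHONE_NUMBER', 'count': '96'}]

# Index A by name once, so each entry of B needs a single dict lookup.
by_name = {entry['name']: entry for entry in A}

# Merge each B entry with its match in A; entries without a match are skipped.
merged = [{**b, **by_name[b['name']]} for b in B if b['name'] in by_name]
print(merged)
```

Putting the B entry first in the merge yields the key order shown in the question (name, count, id).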

python pandas unnest data column containing a list of dictionaries

we have the following dataframe:
import pandas as pd
our_df = pd.DataFrame(data = {'rank': {0: 1, 1: 2}, 'title_name': {0: "And It's Still Alright", 1: 'Black Madonna'}, 'title_id': {0: '120034150', 1: '106938609'}, 'artist_id': {0: '222521', 1: '200160'}, 'artist_name': {0: 'Nathaniel Rateliff', 1: 'Cage The Elephant'}, 'label': {0: 'CNCO', 1: 'RCA'}, 'metrics': {0: [{'name': 'Rank', 'value': 1}, {'name': 'Song', 'value': "And It's Still Alright"}, {'name': 'Artist', 'value': 'Nathaniel Rateliff'}, {'name': 'TP Spins', 'value': 933}, {'name': '+/- Chg. Spins', 'value': -32}, {'name': 'LP Spins', 'value': 965}, {'name': 'Stations', 'value': '44/46'}, {'name': 'Adds', 'value': 0}, {'name': 'TP Audience', 'value': 1260000}, {'name': '+/- Chg. Audience', 'value': -40600}, {'name': 'LP Audience', 'value': 1300600}, {'name': 'TP Stream', 'value': 413101}], 1: [{'name': 'Rank', 'value': 2}, {'name': 'Song', 'value': 'Black Madonna'}, {'name': 'Artist', 'value': 'Cage The Elephant'}, {'name': 'TP Spins', 'value': 814}, {'name': '+/- Chg. Spins', 'value': 38}, {'name': 'LP Spins', 'value': 776}, {'name': 'Stations', 'value': '38/46'}, {'name': 'Adds', 'value': 0}, {'name': 'TP Audience', 'value': 1283400}, {'name': '+/- Chg. Audience', 'value': -21600}, {'name': 'LP Audience', 'value': 1305000}, {'name': 'TP Stream', 'value': 362366}]}})
and we are looking to convert the metrics column into 12 new columns in our dataframe, using the metric's name field as the column name, and value field as the field in the dataframe. Something like this:
rank title_name title_id artist_id artist_name label Rank Song ...
1 'And It's Still Alright' 120034150 222521 'Nathaniel Rateliff' 'CNCO' 1 "And It's Still Alright"
Here's what the value in the metrics column looks like for row 1:
our_df['metrics'][0]
[{'name': 'Rank', 'value': 1},
{'name': 'Song', 'value': "And It's Still Alright"},
{'name': 'Artist', 'value': 'Nathaniel Rateliff'},
{'name': 'TP Spins', 'value': 933},
{'name': '+/- Chg. Spins', 'value': -32},
{'name': 'LP Spins', 'value': 965},
{'name': 'Stations', 'value': '44/46'},
{'name': 'Adds', 'value': 0},
{'name': 'TP Audience', 'value': 1260000},
{'name': '+/- Chg. Audience', 'value': -40600},
{'name': 'LP Audience', 'value': 1300600},
{'name': 'TP Stream', 'value': 413101}]
The +/- in the column names may be problematic though, along with the . in Chg. This dataframe would be best if all the column names were snake_case, if the +/- was replaced with plus_minus, and if the . in Chg. was simply dropped.
Edit: we can assume that the metric names will be the same in every row in the dataframe. However, there may be other dataframes with different metric names, so it would be preferable if the names 'Rank', 'Song', 'Artist', etc. were not hardcoded. Here is the original list before it was converted into a pandas dataframe:
raw_data = [{'rank': 1,
'title_name': 'BUTTER',
'title_id': '',
'artist_id': '',
'artist_name': 'BTS',
'label': '',
'peak_position': 1,
'last_week_rank': 7,
'last_2week_rank': 8,
'metrics': [{'name': 'Rank', 'value': 1},
{'name': 'Song', 'value': 'BUTTER'},
{'name': 'Artist', 'value': 'BTS'},
{'name': 'Label Description', 'value': None},
{'name': 'Label', 'value': ' '},
{'name': 'Last Week Rank', 'value': 7},
{'name': 'Last 2 Week Rank', 'value': 8},
{'name': 'Weeks On Chart', 'value': 15}]},
{'rank': 2,
'title_name': 'STAY',
'title_id': '',
'artist_id': '',
'artist_name': 'THE KID LAROI & JUS',
'label': '',
'peak_position': 1,
'last_week_rank': 1,
'last_2week_rank': 1,
'metrics': [{'name': 'Rank', 'value': 2},
{'name': 'Song', 'value': 'STAY'},
{'name': 'Artist', 'value': 'THE KID LAROI & JUS'},
{'name': 'Label Description', 'value': None},
{'name': 'Label', 'value': ' '},
{'name': 'Last Week Rank', 'value': 1},
{'name': 'Last 2 Week Rank', 'value': 1},
{'name': 'Weeks On Chart', 'value': 8}]}]
Most likely, the fastest way is to process raw_data as plain Python dictionaries and only then construct a DataFrame from the result.
records = []
for rec in raw_data:
    for metric in rec['metrics']:
        # process name: snake_case > drop '.' > '+/-' to 'plus_minus'
        name = metric['name'].lower().replace(' ','_').replace('.','').replace('+/-','plus_minus')
        rec[name] = metric['value']
    rec.pop('metrics') # drop metric records (note: this modifies raw_data in place)
    records.append(rec)
df = pd.DataFrame(records)
Output
Resulting df
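| | rank | title_name | title_id | artist_id | artist_name | label | peak_position | last_week_rank | last_2week_rank | song | artist | label_description | last_2_week_rank | weeks_on_chart |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | BUTTER | | | BTS | | 1 | 7 | 8 | BUTTER | BTS | | 8 | 15 |
| 1 | 2 | STAY | | | THE KID LAROI & JUS | | 1 | 1 | 1 | STAY | THE KID LAROI & JUS | | 1 | 8 |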
Setup
raw_data = [{'rank': 1,
'title_name': 'BUTTER',
'title_id': '',
'artist_id': '',
'artist_name': 'BTS',
'label': '',
'peak_position': 1,
'last_week_rank': 7,
'last_2week_rank': 8,
'metrics': [{'name': 'Rank', 'value': 1},
{'name': 'Song', 'value': 'BUTTER'},
{'name': 'Artist', 'value': 'BTS'},
{'name': 'Label Description', 'value': None},
{'name': 'Label', 'value': ' '},
{'name': 'Last Week Rank', 'value': 7},
{'name': 'Last 2 Week Rank', 'value': 8},
{'name': 'Weeks On Chart', 'value': 15}]},
{'rank': 2,
'title_name': 'STAY',
'title_id': '',
'artist_id': '',
'artist_name': 'THE KID LAROI & JUS',
'label': '',
'peak_position': 1,
'last_week_rank': 1,
'last_2week_rank': 1,
'metrics': [{'name': 'Rank', 'value': 2},
{'name': 'Song', 'value': 'STAY'},
{'name': 'Artist', 'value': 'THE KID LAROI & JUS'},
{'name': 'Label Description', 'value': None},
{'name': 'Label', 'value': ' '},
{'name': 'Last Week Rank', 'value': 1},
{'name': 'Last 2 Week Rank', 'value': 1},
{'name': 'Weeks On Chart', 'value': 8}]}]
Using the example's data as raw_data, i.e.
our_df = pd.DataFrame(data = {'rank': {0: 1, 1: 2}, 'title_name': {0: "And It's Still Alright", 1: 'Black Madonna'}, 'title_id': {0: '120034150', 1: '106938609'}, 'artist_id': {0: '222521', 1: '200160'}, 'artist_name': {0: 'Nathaniel Rateliff', 1: 'Cage The Elephant'}, 'label': {0: 'CNCO', 1: 'RCA'}, 'metrics': {0: [{'name': 'Rank', 'value': 1}, {'name': 'Song', 'value': "And It's Still Alright"}, {'name': 'Artist', 'value': 'Nathaniel Rateliff'}, {'name': 'TP Spins', 'value': 933}, {'name': '+/- Chg. Spins', 'value': -32}, {'name': 'LP Spins', 'value': 965}, {'name': 'Stations', 'value': '44/46'}, {'name': 'Adds', 'value': 0}, {'name': 'TP Audience', 'value': 1260000}, {'name': '+/- Chg. Audience', 'value': -40600}, {'name': 'LP Audience', 'value': 1300600}, {'name': 'TP Stream', 'value': 413101}], 1: [{'name': 'Rank', 'value': 2}, {'name': 'Song', 'value': 'Black Madonna'}, {'name': 'Artist', 'value': 'Cage The Elephant'}, {'name': 'TP Spins', 'value': 814}, {'name': '+/- Chg. Spins', 'value': 38}, {'name': 'LP Spins', 'value': 776}, {'name': 'Stations', 'value': '38/46'}, {'name': 'Adds', 'value': 0}, {'name': 'TP Audience', 'value': 1283400}, {'name': '+/- Chg. Audience', 'value': -21600}, {'name': 'LP Audience', 'value': 1305000}, {'name': 'TP Stream', 'value': 362366}]}})
raw_data = our_df.to_dict(orient='records')
Output
Resulting df from the solution above
| | rank | title_name | title_id | artist_id | artist_name | label | song | artist | tp_spins | plus_minus_chg_spins | lp_spins | stations | adds | tp_audience | plus_minus_chg_audience | lp_audience | tp_stream |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | And It's Still Alright | 120034150 | 222521 | Nathaniel Rateliff | CNCO | And It's Still Alright | Nathaniel Rateliff | 933 | -32 | 965 | 44/46 | 0 | 1260000 | -40600 | 1300600 | 413101 |
| 1 | 2 | Black Madonna | 106938609 | 200160 | Cage The Elephant | RCA | Black Madonna | Cage The Elephant | 814 | 38 | 776 | 38/46 | 0 | 1283400 | -21600 | 1305000 | 362366 |
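The loop in this answer adds keys to the dictionaries in raw_data and pops 'metrics' from them, so the input is changed. If the original records need to stay intact, a non-mutating variant could look like the sketch below. The helpers `clean_name` and `flatten_record` are hypothetical names introduced for illustration, and `sample` is a shortened stand-in for raw_data.

```python
import re


def clean_name(name):
    """Turn a metric name into snake_case (hypothetical helper)."""
    # '+/-' becomes 'plus_minus' and dots are dropped, as requested in the question.
    name = name.replace('+/-', 'plus_minus').replace('.', '')
    # Collapse runs of whitespace into single underscores, then lowercase.
    return re.sub(r'\s+', '_', name.strip()).lower()


def flatten_record(rec):
    """Return a new flat dict; the input record is left untouched."""
    flat = {key: value for key, value in rec.items() if key != 'metrics'}
    for metric in rec.get('metrics', []):
        flat[clean_name(metric['name'])] = metric['value']
    return flat


# Shortened stand-in for raw_data, using values from the question.
sample = [{'rank': 2,
           'title_name': 'Black Madonna',
           'metrics': [{'name': 'Rank', 'value': 2},
                       {'name': '+/- Chg. Spins', 'value': 38},
                       {'name': 'TP Spins', 'value': 814}]}]

records = [flatten_record(rec) for rec in sample]
print(records)
# The list of flat dicts can then be passed to pd.DataFrame(records).
```

As in the original loop, a metric whose cleaned name matches an existing key (here 'Rank' and 'rank') overwrites that key.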
Let's start decomposing your issue. After defining our_df we can generate a new dataframe based on the column metrics with:
pd.concat([pd.DataFrame({x['name']:x['value'] for x in y},index=[0]) for y in our_df['metrics']])
Which outputs:
Rank Song ... LP Audience TP Stream
0 1 And It's Still Alright ... 1300600 413101
0 2 Black Madonna ... 1305000 362366
Next it's just a question of joining them together with pd.concat() or merge. I assume the common key is the column Rank therefore I'll use merge:
our_df.drop(columns=['metrics']).merge(pd.concat([pd.DataFrame({x['name']:x['value'] for x in y},index=[0]) for y in our_df['metrics']]),left_on='rank',right_on='Rank')
Outputting the full dataframe
rank title_name ... LP Audience TP Stream
0 1 And It's Still Alright ... 1300600 413101
1 2 Black Madonna ... 1305000 362366
Alternative that might be robust against missing names
metric_df = our_df.apply(
    lambda r:
        pd.Series(
            index=list(map(lambda d: d['name'], r['metrics']))+['rank'],
            data=list(map(lambda d: d['value'], r['metrics']))+[r['rank']],
        ),
    axis=1,
)
our_df.merge(metric_df, on='rank')
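Another option is to build a long-format frame from the metrics column and pivot it (keyword arguments are used for pivot, since newer pandas versions no longer accept them positionally):
box = pd.concat({index: pd.DataFrame(ent)
                 for index, ent in
                 zip(our_df.index, our_df.metrics)})
(our_df
 .drop(columns='metrics')
 .join(box.droplevel(-1))
 .pivot(index=['rank', 'title_name', 'title_id', 'artist_id', 'artist_name', 'label'],
        columns='name',
        values='value')
 .reset_index()
)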
name rank title_name title_id artist_id artist_name label +/- Chg. Audience +/- Chg. Spins Adds Artist LP Audience LP Spins Rank Song Stations TP Audience TP Spins TP Stream
0 1 And It's Still Alright 120034150 222521 Nathaniel Rateliff CNCO -40600 -32 0 Nathaniel Rateliff 1300600 965 1 And It's Still Alright 44/46 1260000 933 413101
1 2 Black Madonna 106938609 200160 Cage The Elephant RCA -21600 38 0 Cage The Elephant 1305000 776 2 Black Madonna 38/46 1283400 814 362366
Working on the raw_data:
from itertools import chain, product
metrics = [ent['metrics'] for ent in raw_data]
non_metrics = [{key: value
                for key, value
                in ent.items()
                if key != 'metrics'}
               for ent in raw_data]
combo = zip(metrics, non_metrics)
combo = (product(metrics, [non_metrics])
         for metrics, non_metrics in combo)
combo = chain.from_iterable(combo)
combo = [{**left, **right} for left, right in combo]
pd.DataFrame(combo)
name value rank title_name title_id artist_id artist_name label peak_position last_week_rank last_2week_rank
0 Rank 1 1 BUTTER BTS 1 7 8
1 Song BUTTER 1 BUTTER BTS 1 7 8
2 Artist BTS 1 BUTTER BTS 1 7 8
3 Label Description None 1 BUTTER BTS 1 7 8
4 Label 1 BUTTER BTS 1 7 8
5 Last Week Rank 7 1 BUTTER BTS 1 7 8
6 Last 2 Week Rank 8 1 BUTTER BTS 1 7 8
7 Weeks On Chart 15 1 BUTTER BTS 1 7 8
8 Rank 2 2 STAY THE KID LAROI & JUS 1 1 1
9 Song STAY 2 STAY THE KID LAROI & JUS 1 1 1
10 Artist THE KID LAROI & JUS 2 STAY THE KID LAROI & JUS 1 1 1
11 Label Description None 2 STAY THE KID LAROI & JUS 1 1 1
12 Label 2 STAY THE KID LAROI & JUS 1 1 1
13 Last Week Rank 1 2 STAY THE KID LAROI & JUS 1 1 1
14 Last 2 Week Rank 1 2 STAY THE KID LAROI & JUS 1 1 1
15 Weeks On Chart 8 2 STAY THE KID LAROI & JUS 1 1 1
You can then reshape/transform into whatever you desire.
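As one example of such a reshape, the long rows (one per metric) can be folded back into one wide row per rank. The sketch below uses plain dictionaries and a shortened, hand-copied subset of the long output above; `long_rows`, `wide`, and `wide_rows` are illustrative names, and it assumes 'rank' uniquely identifies a record.

```python
# Shortened subset of the long-format rows shown above (two metrics per record).
long_rows = [
    {'name': 'Rank', 'value': 1, 'rank': 1, 'title_name': 'BUTTER'},
    {'name': 'Weeks On Chart', 'value': 15, 'rank': 1, 'title_name': 'BUTTER'},
    {'name': 'Rank', 'value': 2, 'rank': 2, 'title_name': 'STAY'},
    {'name': 'Weeks On Chart', 'value': 8, 'rank': 2, 'title_name': 'STAY'},
]

wide = {}
for row in long_rows:
    # Start each wide row from the non-metric fields the first time a rank is seen.
    base = wide.setdefault(
        row['rank'],
        {key: value for key, value in row.items() if key not in ('name', 'value')},
    )
    # Each metric becomes its own column, named after the metric.
    base[row['name']] = row['value']

wide_rows = list(wide.values())
print(wide_rows)
```

With pandas, the same step would typically be a pivot on the 'name' and 'value' columns, as in the earlier part of this answer.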

TypeError when convert list of dictionaries to a DataFrame

I have a_list:
[{
    'attributes': {
        'title': 'apple',
        'id': '5543',
        'owner': 'tom',
    }
},
{
    'attributes': {
        'title': 'pear',
        'id': '5432',
        'owner': 'suzy',
    }
},
{
    'attributes': {
        'title': 'orange',
        'id': '1234',
        'owner': 'james',
    }
}]
I am trying to return this as a simple dataframe.
From looking at other posts, the nested dictionary leads me to think I should be using json_normalize and passing in the attributes for the record_path.
df = pd.json_normalize(a_list, record_path='attributes', meta= ['title', 'id', 'owner'])
However, this returns an exception:
TypeError: {'attributes': {'title': 'apple', 'id': '5543', 'owner': 'tom'}} has non list value {'title': 'apple', 'id': '5543', 'owner': 'tom'} for path attributes. Must be list or null.
What have I done wrong here?
You can use a simple list comprehension:
df = pd.DataFrame([d["attributes"] for d in a_list])
print(df)
Prints:
title id owner
0 apple 5543 tom
1 pear 5432 suzy
2 orange 1234 james
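As for why the original call fails: the error message states that the value found at the record_path must be a list (or null), whereas each 'attributes' value here is a single dict. A nested dict only needs to be unwrapped or flattened, not expanded into several rows. The stdlib-only sketch below illustrates both options; `flatten` is a hypothetical helper written for this example, not a pandas function, and its dotted keys resemble the column names that `pd.json_normalize(a_list)` produces by default.

```python
def flatten(d, parent_key='', sep='.'):
    """Flatten nested dicts into one level with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in d.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the key path along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


a_list = [{'attributes': {'title': 'apple', 'id': '5543', 'owner': 'tom'}},
          {'attributes': {'title': 'pear', 'id': '5432', 'owner': 'suzy'}},
          {'attributes': {'title': 'orange', 'id': '1234', 'owner': 'james'}}]

# Option 1: flatten, keeping the outer key as a prefix.
flat_rows = [flatten(item) for item in a_list]

# Option 2: unwrap the single nested dict, as in the answer above.
unwrapped = [item['attributes'] for item in a_list]

print(flat_rows[0])
print(unwrapped[0])
```

Either list of flat dicts can be passed directly to pd.DataFrame.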

Merge 2 pandas dictionary columns

I have a dataframe simplified here with 3 columns.
| id | channels | facebookCount |
|:---- |:------:| -----:|
| 0 | {'channel': 'Google', 'count': 0.0} | 3 |
| 1 | {'channel': 'Google', 'count': 4.0} | 0 |
| 2 | {'channel': 'Google', 'count': 3.0} | 6 |
The channels column was a simple count column like facebookCount. However, I transformed it into a dictionary using apply and a lambda, like this:
data_df["channels"] = data_df["googleCount"].apply(
    lambda x: {} if x is None else {"channel": "Google", "count": x})
How can I construct the channel column so that it has data for both facebook and google so that I have a list containing 2 dictionaries as seen below:
| id | channels |
|:---- |:------:|
| 0 | [{'channel': 'Google', 'count': 0.0}, {'channel': 'Facebook', 'count': 3.0}] |
| 1 | [{'channel': 'Google', 'count': 4.0}, {'channel': 'Facebook', 'count': 0.0}] |
| 2 | [{'channel': 'Google', 'count': 3.0}, {'channel': 'Facebook', 'count': 6.0}] |
I have tried creating both dictionaries and then setting the channels column, as well as creating one dictionary and then merging the two using apply, a lambda, and a helper function, like this:
dict1 = data_df["30DayGoogleCampaignCount"].apply(
    lambda x: {"channel": "Google", "count": x})
data_df["paidMediaChannels"] = data_df["30DayFacebookCampaignCount"].apply(
    lambda x: self.Merge(dict1, {"channel": "facebook", "count": x}))
def Merge(self, dict1, dict2):
    return(dict2.update(dict1))
Try something like:
import pandas as pd
df = pd.DataFrame({'id': [0, 1, 2],
                   'channels': [{'channel': 'Google', 'count': 0.0},
                                {'channel': 'Google', 'count': 4.0},
                                {'channel': 'Google', 'count': 3.0}],
                   'facebookCount': [3, 0, 6]})
# Create List
df['channels'] = df.apply(
    lambda x: [x['channels'],
               {'channel': 'Facebook',
                'count': x['facebookCount']}],
    axis=1
)
# Drop facebookCount Column
df = df.drop(columns='facebookCount')
print(df.to_string())
df:
id channels
0 0 [{'channel': 'Google', 'count': 0.0}, {'channel': 'Facebook', 'count': 3}]
1 1 [{'channel': 'Google', 'count': 4.0}, {'channel': 'Facebook', 'count': 0}]
2 2 [{'channel': 'Google', 'count': 3.0}, {'channel': 'Facebook', 'count': 6}]
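Note that this output keeps the Facebook counts as integers, whereas the desired table in the question shows floats. The row-wise construction can be sketched without pandas as follows, assuming the two count columns are available as plain lists (`google_counts` and `facebook_counts` are illustrative names); `float()` is applied so that both counts end up with the same type.

```python
# Stand-ins for the two count columns from the question.
google_counts = [0.0, 4.0, 3.0]
facebook_counts = [3, 0, 6]

# Build one list of two dicts per row, converting both counts to float.
channels = [
    [{'channel': 'Google', 'count': float(g)},
     {'channel': 'Facebook', 'count': float(f)}]
    for g, f in zip(google_counts, facebook_counts)
]

for row in channels:
    print(row)
```

The resulting list has one entry per row and could be assigned to the channels column of a DataFrame with the same number of rows.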

Sort a list of dict with a key from another list of dict

In the following example, I would like to sort the animals by the alphabetical order of their category, which is stored in a separate list of dictionaries.
category = [{'uid': 0, 'name': 'mammals'},
{'uid': 1, 'name': 'birds'},
{'uid': 2, 'name': 'fish'},
{'uid': 3, 'name': 'reptiles'},
{'uid': 4, 'name': 'invertebrates'},
{'uid': 5, 'name': 'amphibians'}]
animals = [{'name': 'horse', 'category': 0},
{'name': 'whale', 'category': 2},
{'name': 'mollusk', 'category': 4},
{'name': 'tuna ', 'category': 2},
{'name': 'worms', 'category': 4},
{'name': 'frog', 'category': 5},
{'name': 'dog', 'category': 0},
{'name': 'salamander', 'category': 5},
{'name': 'horse', 'category': 0},
{'name': 'octopus', 'category': 4},
{'name': 'alligator', 'category': 3},
{'name': 'monkey', 'category': 0},
{'name': 'kangaroos', 'category': 0},
{'name': 'salmon', 'category': 2}]
sorted_animals = sorted(animals, key=lambda k: k['category'])
How could I achieve this?
Thanks.
You are now sorting on the category id. All you need to do is map that id to a lookup for a given category name.
Create a dictionary for the categories first so you can directly map the numeric id to the associated name from the category list, then use that mapping when sorting:
catuid_to_name = {c['uid']: c['name'] for c in category}
sorted_animals = sorted(animals, key=lambda k: catuid_to_name[k['category']])
Demo:
>>> from pprint import pprint
>>> category = [{'uid': 0, 'name': 'mammals'},
... {'uid': 1, 'name': 'birds'},
... {'uid': 2, 'name': 'fish'},
... {'uid': 3, 'name': 'reptiles'},
... {'uid': 4, 'name': 'invertebrates'},
... {'uid': 5, 'name': 'amphibians'}]
>>> animals = [{'name': 'horse', 'category': 0},
... {'name': 'whale', 'category': 2},
... {'name': 'mollusk', 'category': 4},
... {'name': 'tuna ', 'category': 2},
... {'name': 'worms', 'category': 4},
... {'name': 'frog', 'category': 5},
... {'name': 'dog', 'category': 0},
... {'name': 'salamander', 'category': 5},
... {'name': 'horse', 'category': 0},
... {'name': 'octopus', 'category': 4},
... {'name': 'alligator', 'category': 3},
... {'name': 'monkey', 'category': 0},
... {'name': 'kangaroos', 'category': 0},
... {'name': 'salmon', 'category': 2}]
>>> catuid_to_name = {c['uid']: c['name'] for c in category}
>>> pprint(catuid_to_name)
{0: 'mammals',
1: 'birds',
2: 'fish',
3: 'reptiles',
4: 'invertebrates',
5: 'amphibians'}
>>> sorted_animals = sorted(animals, key=lambda k: catuid_to_name[k['category']])
>>> pprint(sorted_animals)
[{'category': 5, 'name': 'frog'},
{'category': 5, 'name': 'salamander'},
{'category': 2, 'name': 'whale'},
{'category': 2, 'name': 'tuna '},
{'category': 2, 'name': 'salmon'},
{'category': 4, 'name': 'mollusk'},
{'category': 4, 'name': 'worms'},
{'category': 4, 'name': 'octopus'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'dog'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'monkey'},
{'category': 0, 'name': 'kangaroos'},
{'category': 3, 'name': 'alligator'}]
Note that within each category, the dictionaries have been left in relative input order. You could return a tuple of values from the sorting key to further apply a sorting order within each category, e.g.:
sorted_animals = sorted(
    animals,
    key=lambda k: (catuid_to_name[k['category']], k['name'])
)
would sort by animal name within each category, producing:
>>> pprint(sorted(animals, key=lambda k: (catuid_to_name[k['category']], k['name'])))
[{'category': 5, 'name': 'frog'},
{'category': 5, 'name': 'salamander'},
{'category': 2, 'name': 'salmon'},
{'category': 2, 'name': 'tuna '},
{'category': 2, 'name': 'whale'},
{'category': 4, 'name': 'mollusk'},
{'category': 4, 'name': 'octopus'},
{'category': 4, 'name': 'worms'},
{'category': 0, 'name': 'dog'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'kangaroos'},
{'category': 0, 'name': 'monkey'},
{'category': 3, 'name': 'alligator'}]
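The lookup `catuid_to_name[k['category']]` raises a KeyError if an animal refers to a category uid that is not in the list, and names with stray whitespace (such as 'tuna ') are compared as they are. A more defensive sort key might look like the sketch below. It uses a shortened version of the data, the 'yeti' entry with uid 9 is a made-up example of an unknown category, and the helper name `sort_key` is hypothetical.

```python
category = [{'uid': 0, 'name': 'mammals'},
            {'uid': 2, 'name': 'fish'},
            {'uid': 5, 'name': 'amphibians'}]
animals = [{'name': 'whale', 'category': 2},
           {'name': 'tuna ', 'category': 2},
           {'name': 'dog', 'category': 0},
           {'name': 'frog', 'category': 5},
           {'name': 'yeti', 'category': 9}]  # made-up entry with an unknown uid

catuid_to_name = {c['uid']: c['name'] for c in category}


def sort_key(animal):
    """Sort by category name, then by stripped animal name (hypothetical helper)."""
    cat_name = catuid_to_name.get(animal['category'])
    # The leading boolean pushes animals with an unknown category to the end.
    return (cat_name is None, cat_name or '', animal['name'].strip())


sorted_animals = sorted(animals, key=sort_key)
names = [animal['name'] for animal in sorted_animals]
print(names)
```

Whether unknown categories should sort last, sort first, or raise an error depends on the data; sorting them last is only one possible choice.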
IMO, your category structure is more complicated than it needs to be. As long as the uid is nothing but the index, you could simply use a list for that:
category = [c['name'] for c in category]
# ['mammals', 'birds', 'fish', 'reptiles', 'invertebrates', 'amphibians']
sorted_animals = sorted(animals, key=lambda k: category[k['category']])
#[{'name': 'frog', 'category': 5}, {'name': 'salamander', 'category': 5}, {'name': 'whale', 'category': 2}, {'name': 'tuna ', 'category': 2}, {'name': 'salmon', 'category': 2}, {'name': 'mollusk', 'category': 4}, {'name': 'worms', 'category': 4}, {'name': 'octopus', 'category': 4}, {'name': 'horse', 'category': 0}, {'name': 'dog', 'category': 0}, {'name': 'horse', 'category': 0}, {'name': 'monkey', 'category': 0}, {'name': 'kangaroos', 'category': 0}, {'name': 'alligator', 'category': 3}]
