python pandas unnest data column containing a list of dictionaries - python

we have the following dataframe:
import pandas as pd
our_df = pd.DataFrame(data = {'rank': {0: 1, 1: 2}, 'title_name': {0: "And It's Still Alright", 1: 'Black Madonna'}, 'title_id': {0: '120034150', 1: '106938609'}, 'artist_id': {0: '222521', 1: '200160'}, 'artist_name': {0: 'Nathaniel Rateliff', 1: 'Cage The Elephant'}, 'label': {0: 'CNCO', 1: 'RCA'}, 'metrics': {0: [{'name': 'Rank', 'value': 1}, {'name': 'Song', 'value': "And It's Still Alright"}, {'name': 'Artist', 'value': 'Nathaniel Rateliff'}, {'name': 'TP Spins', 'value': 933}, {'name': '+/- Chg. Spins', 'value': -32}, {'name': 'LP Spins', 'value': 965}, {'name': 'Stations', 'value': '44/46'}, {'name': 'Adds', 'value': 0}, {'name': 'TP Audience', 'value': 1260000}, {'name': '+/- Chg. Audience', 'value': -40600}, {'name': 'LP Audience', 'value': 1300600}, {'name': 'TP Stream', 'value': 413101}], 1: [{'name': 'Rank', 'value': 2}, {'name': 'Song', 'value': 'Black Madonna'}, {'name': 'Artist', 'value': 'Cage The Elephant'}, {'name': 'TP Spins', 'value': 814}, {'name': '+/- Chg. Spins', 'value': 38}, {'name': 'LP Spins', 'value': 776}, {'name': 'Stations', 'value': '38/46'}, {'name': 'Adds', 'value': 0}, {'name': 'TP Audience', 'value': 1283400}, {'name': '+/- Chg. Audience', 'value': -21600}, {'name': 'LP Audience', 'value': 1305000}, {'name': 'TP Stream', 'value': 362366}]}})
and we are looking to convert the metrics column into 12 new columns in our dataframe, using the metric's name field as the column name, and value field as the field in the dataframe. Something like this:
rank title_name title_id artist_id artist_name label Rank Song ...
1 'And It's Still Alright' 120034150 222521 'Nathaniel Rateliff' 'CNCO' 1 "And It's Still Alright"
Here's what the value in the metrics column looks like for row 1:
our_df['metrics'][0]
[{'name': 'Rank', 'value': 1},
{'name': 'Song', 'value': "And It's Still Alright"},
{'name': 'Artist', 'value': 'Nathaniel Rateliff'},
{'name': 'TP Spins', 'value': 933},
{'name': '+/- Chg. Spins', 'value': -32},
{'name': 'LP Spins', 'value': 965},
{'name': 'Stations', 'value': '44/46'},
{'name': 'Adds', 'value': 0},
{'name': 'TP Audience', 'value': 1260000},
{'name': '+/- Chg. Audience', 'value': -40600},
{'name': 'LP Audience', 'value': 1300600},
{'name': 'TP Stream', 'value': 413101}]
The +/- in the column names may be problematic though, along with the . in Chg. This dataframe would be best if all the column names were snake_case, if the +/- was replaced with plus_minus, and if the . in Chg. was simply dropped.
Edit: we can assume that the metric names will be the same in every row in the dataframe. However, there may be other dataframes with different metric names, so it would be preferable if the names 'Rank', 'Song', 'Artist', etc. were not hardcoded. Here is the original list before it was converted into a pandas dataframe:
raw_data = [{'rank': 1,
'title_name': 'BUTTER',
'title_id': '',
'artist_id': '',
'artist_name': 'BTS',
'label': '',
'peak_position': 1,
'last_week_rank': 7,
'last_2week_rank': 8,
'metrics': [{'name': 'Rank', 'value': 1},
{'name': 'Song', 'value': 'BUTTER'},
{'name': 'Artist', 'value': 'BTS'},
{'name': 'Label Description', 'value': None},
{'name': 'Label', 'value': ' '},
{'name': 'Last Week Rank', 'value': 7},
{'name': 'Last 2 Week Rank', 'value': 8},
{'name': 'Weeks On Chart', 'value': 15}]},
{'rank': 2,
'title_name': 'STAY',
'title_id': '',
'artist_id': '',
'artist_name': 'THE KID LAROI & JUS',
'label': '',
'peak_position': 1,
'last_week_rank': 1,
'last_2week_rank': 1,
'metrics': [{'name': 'Rank', 'value': 2},
{'name': 'Song', 'value': 'STAY'},
{'name': 'Artist', 'value': 'THE KID LAROI & JUS'},
{'name': 'Label Description', 'value': None},
{'name': 'Label', 'value': ' '},
{'name': 'Last Week Rank', 'value': 1},
{'name': 'Last 2 Week Rank', 'value': 1},
{'name': 'Weeks On Chart', 'value': 8}]}]

Most likely, the fastest way is to process raw_data as a dictionary and only then construct a DataFrame with it.
records = []
for rec in raw_data:
for metric in rec['metrics']:
# process name: snake_case > drop '.' > '+/-' to 'plus_minus'
name = metric['name'].lower().replace(' ','_').replace('.','').replace('+/-','plus_minus')
rec[name] = metric['value']
rec.pop('metrics') # drop metric records
records.append(rec)
df = pd.DataFrame(records)
Output
Resulting df
rank
title_name
title_id
artist_id
artist_name
label
peak_position
last_week_rank
last_2week_rank
song
artist
label_description
last_2_week_rank
weeks_on_chart
0
1
BUTTER
BTS
1
7
8
BUTTER
BTS
8
15
1
2
STAY
THE KID LAROI & JUS
1
1
1
STAY
THE KID LAROI & JUS
1
8
Setup
raw_data = [{'rank': 1,
'title_name': 'BUTTER',
'title_id': '',
'artist_id': '',
'artist_name': 'BTS',
'label': '',
'peak_position': 1,
'last_week_rank': 7,
'last_2week_rank': 8,
'metrics': [{'name': 'Rank', 'value': 1},
{'name': 'Song', 'value': 'BUTTER'},
{'name': 'Artist', 'value': 'BTS'},
{'name': 'Label Description', 'value': None},
{'name': 'Label', 'value': ' '},
{'name': 'Last Week Rank', 'value': 7},
{'name': 'Last 2 Week Rank', 'value': 8},
{'name': 'Weeks On Chart', 'value': 15}]},
{'rank': 2,
'title_name': 'STAY',
'title_id': '',
'artist_id': '',
'artist_name': 'THE KID LAROI & JUS',
'label': '',
'peak_position': 1,
'last_week_rank': 1,
'last_2week_rank': 1,
'metrics': [{'name': 'Rank', 'value': 2},
{'name': 'Song', 'value': 'STAY'},
{'name': 'Artist', 'value': 'THE KID LAROI & JUS'},
{'name': 'Label Description', 'value': None},
{'name': 'Label', 'value': ' '},
{'name': 'Last Week Rank', 'value': 1},
{'name': 'Last 2 Week Rank', 'value': 1},
{'name': 'Weeks On Chart', 'value': 8}]}]
Using the example's data as raw_data, i.e.
our_df = pd.DataFrame(data = {'rank': {0: 1, 1: 2}, 'title_name': {0: "And It's Still Alright", 1: 'Black Madonna'}, 'title_id': {0: '120034150', 1: '106938609'}, 'artist_id': {0: '222521', 1: '200160'}, 'artist_name': {0: 'Nathaniel Rateliff', 1: 'Cage The Elephant'}, 'label': {0: 'CNCO', 1: 'RCA'}, 'metrics': {0: [{'name': 'Rank', 'value': 1}, {'name': 'Song', 'value': "And It's Still Alright"}, {'name': 'Artist', 'value': 'Nathaniel Rateliff'}, {'name': 'TP Spins', 'value': 933}, {'name': '+/- Chg. Spins', 'value': -32}, {'name': 'LP Spins', 'value': 965}, {'name': 'Stations', 'value': '44/46'}, {'name': 'Adds', 'value': 0}, {'name': 'TP Audience', 'value': 1260000}, {'name': '+/- Chg. Audience', 'value': -40600}, {'name': 'LP Audience', 'value': 1300600}, {'name': 'TP Stream', 'value': 413101}], 1: [{'name': 'Rank', 'value': 2}, {'name': 'Song', 'value': 'Black Madonna'}, {'name': 'Artist', 'value': 'Cage The Elephant'}, {'name': 'TP Spins', 'value': 814}, {'name': '+/- Chg. Spins', 'value': 38}, {'name': 'LP Spins', 'value': 776}, {'name': 'Stations', 'value': '38/46'}, {'name': 'Adds', 'value': 0}, {'name': 'TP Audience', 'value': 1283400}, {'name': '+/- Chg. Audience', 'value': -21600}, {'name': 'LP Audience', 'value': 1305000}, {'name': 'TP Stream', 'value': 362366}]}})
raw_data = our_df.to_dict(orient='records')
Output
Resulting df from the solution above
rank
title_name
title_id
artist_id
artist_name
label
song
artist
tp_spins
plus_minus_chg_spins
lp_spins
stations
adds
tp_audience
plus_minus_chg_audience
lp_audience
tp_stream
0
1
And It's Still Alright
120034150
222521
Nathaniel Rateliff
CNCO
And It's Still Alright
Nathaniel Rateliff
933
-32
965
44/46
0
1260000
-40600
1300600
413101
1
2
Black Madonna
106938609
200160
Cage The Elephant
RCA
Black Madonna
Cage The Elephant
814
38
776
38/46
0
1283400
-21600
1305000
362366

Let's start decomposing your issue. After defining our_df we can generate a new dataframe based on the column metrics with:
pd.concat([pd.DataFrame({x['name']:x['value'] for x in y},index=[0]) for y in our_df['metrics']]
Which outputs:
Rank Song ... LP Audience TP Stream
0 1 And It's Still Alright ... 1300600 413101
0 2 Black Madonna ... 1305000 362366
Next it's just a question of joining them together with pd.concat() or merge. I assume the common key is the column Rank therefore I'll use merge:
our_df.drop(columns=['metrics']).merge(pd.concat([pd.DataFrame({x['name']:x['value'] for x in y},index=[0]) for y in our_df['metrics']]),left_on='rank',right_on='Rank')
Outputting the full dataframe
rank title_name ... LP Audience TP Stream
0 1 And It's Still Alright ... 1300600 413101
1 2 Black Madonna ... 1305000 362366

Alternative that might be robust against missing names
metric_df = our_df.apply(
lambda r:
pd.Series(
index=list(map(lambda d: d['name'], r['metrics']))+['rank'],
data=list(map(lambda d: d['value'], r['metrics']))+[r['rank']],
),
axis=1,
)
our_df.merge(metric_df, on='rank')

box = pd.concat({index : pd.DataFrame(ent)
for index, ent in
zip( our_df.index, our_df.metrics)})
( our_df
.drop(columns = 'metrics')
.join(box.droplevel(-1))
.pivot(['rank', 'title_name', 'title_id', 'artist_id', 'artist_name', 'label'],
'name',
'value')
.reset_index()
)
name rank title_name title_id artist_id artist_name label +/- Chg. Audience +/- Chg. Spins Adds Artist LP Audience LP Spins Rank Song Stations TP Audience TP Spins TP Stream
0 1 And It's Still Alright 120034150 222521 Nathaniel Rateliff CNCO -40600 -32 0 Nathaniel Rateliff 1300600 965 1 And It's Still Alright 44/46 1260000 933 413101
1 2 Black Madonna 106938609 200160 Cage The Elephant RCA -21600 38 0 Cage The Elephant 1305000 776 2 Black Madonna 38/46 1283400 814 362366
Working on the raw_data:
from itertools import chain, product
metrics = [ent['metrics'] for ent in raw_data]
non_metrics = [{key : value
for key, value
in ent.items()
if key != 'metrics'}
for ent in raw_data]
combo = zip(metrics, non_metrics)
combo = (product(metrics, [non_metrics])
for metrics, non_metrics in combo)
combo = chain.from_iterable(combo)
combo = [{**left, **right} for left, right in combo]
pd.DataFrame(combo)
name value rank title_name title_id artist_id artist_name label peak_position last_week_rank last_2week_rank
0 Rank 1 1 BUTTER BTS 1 7 8
1 Song BUTTER 1 BUTTER BTS 1 7 8
2 Artist BTS 1 BUTTER BTS 1 7 8
3 Label Description None 1 BUTTER BTS 1 7 8
4 Label 1 BUTTER BTS 1 7 8
5 Last Week Rank 7 1 BUTTER BTS 1 7 8
6 Last 2 Week Rank 8 1 BUTTER BTS 1 7 8
7 Weeks On Chart 15 1 BUTTER BTS 1 7 8
8 Rank 2 2 STAY THE KID LAROI & JUS 1 1 1
9 Song STAY 2 STAY THE KID LAROI & JUS 1 1 1
10 Artist THE KID LAROI & JUS 2 STAY THE KID LAROI & JUS 1 1 1
11 Label Description None 2 STAY THE KID LAROI & JUS 1 1 1
12 Label 2 STAY THE KID LAROI & JUS 1 1 1
13 Last Week Rank 1 2 STAY THE KID LAROI & JUS 1 1 1
14 Last 2 Week Rank 1 2 STAY THE KID LAROI & JUS 1 1 1
15 Weeks On Chart 8 2 STAY THE KID LAROI & JUS 1 1 1
You can then reshape/transform into whatever you desire.

Related

Filter list based on value existing in another list

I have o first list A=
[{'name': 'PASSWORD', 'id': '5f2496e5-dc40-418a-92e0-098e4642a92e'},
{'name': 'PERSON_NAME', 'id': '3a255440-e2aa-4c4d-993f-4cdef3237920'},
{'name': 'PERU_DNI_NUMBER', 'id': '41f41303-4a71-4732-a8a4-0eecea464562'},
{'name': 'PHONE_NUMBER', 'id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b'},
{'name': 'POLAND_NATIONAL_ID_NUMBER',
'id': '32c49d92-6d5f-408e-b41e-dfec76ceae6a'}]
and I have a second list B :
[{'name': 'PHONE_NUMBER', 'count': '96'}]
I want to filter the first list based on the second in order to have the following list :
[{'name': 'PHONE_NUMBER', 'count': '96','id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b'}.
I have used the following code but I dont get the right ouptut:
filtered = []
for x,i in DLP_job[i]['name']:
if x,i in ids[i]['name']:
filtered.append(x)
print(filtered)
Here is the solution
A = [{'name': 'PASSWORD', 'id': '5f2496e5-dc40-418a-92e0-098e4642a92e'},
{'name': 'PERSON_NAME', 'id': '3a255440-e2aa-4c4d-993f-4cdef3237920'},
{'name': 'PERU_DNI_NUMBER', 'id': '41f41303-4a71-4732-a8a4-0eecea464562'},
{'name': 'PHONE_NUMBER', 'id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b'},
{'name': 'POLAND_NATIONAL_ID_NUMBER',
'id': '32c49d92-6d5f-408e-b41e-dfec76ceae6a'}]
B = [{'name': 'PHONE_NUMBER', 'count': '96'}]
print([{**x, **y} for x in A for y in B if y['name'] == x['name']])
One way is to walk both lists, and wherever you have matching name keys, use the merger of the 2 dicts:
l1 = [{'name': 'PASSWORD', 'id': '5f2496e5-dc40-418a-92e0-098e4642a92e'},
{'name': 'PERSON_NAME', 'id': '3a255440-e2aa-4c4d-993f-4cdef3237920'},
{'name': 'PERU_DNI_NUMBER', 'id': '41f41303-4a71-4732-a8a4-0eecea464562'},
{'name': 'PHONE_NUMBER', 'id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b'},
{'name': 'POLAND_NATIONAL_ID_NUMBER',
'id': '32c49d92-6d5f-408e-b41e-dfec76ceae6a'}]
l2 = [{'name': 'PHONE_NUMBER', 'count': '96'}, {'name': 'PERSON_NAME', 'count': '100'}]
result = []
for d2 in l2:
for d1 in l1:
if d1['name'] == d2['name']:
result.append({**d1, **d2})
print(result)
[{'name': 'PHONE_NUMBER', 'id': 'ac24413b-bb8f-4adc-ada5-a984f145a70b', 'count': '96'},
{'name': 'PERSON_NAME', 'id': '3a255440-e2aa-4c4d-993f-4cdef3237920', 'count': '100'}]

getting 'TypeError: object of type 'float' has no len()' when trying to convert Json into Dataframe

I am trying to create dataframe from the json which I fetched from Quickbooks APAgingSummary API, but I am getting an error "TypeError: object of type 'float' has no len()", when I am inserting json_normalize data in the form of list to pandas. I used the same code for creating Dataframe from Quickbooks AccountListDetail API Json and it was working fine.
This code was used for fetching data:
base_url = 'https://sandbox-quickbooks.api.intuit.com'
url = f"{base_url}/v3/company/{auth_client.realm_id}/reports/AgedPayables?&minorversion=62"
auth_header = f'Bearer {auth_client.access_token}'
headers = {
'Authorization': auth_header,
'Accept': 'application/json'
}
response = requests.get(url, headers=headers)
responseJson = response.json()
responseJson
This is the responseJson:
{'Header': {'Time': '2021-10-05T04:33:02-07:00',
'ReportName': 'AgedPayables',
'DateMacro': 'today',
'StartPeriod': '2021-10-05',
'EndPeriod': '2021-10-05',
'SummarizeColumnsBy': 'Total',
'Currency': 'USD',
'Option': [{'Name': 'report_date', 'Value': '2021-10-05'},
{'Name': 'NoReportData', 'Value': 'false'}]},
'Columns': {'Column': [{'ColTitle': '', 'ColType': 'Vendor'},
{'ColTitle': 'Current',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': 'current'}]},
{'ColTitle': '1 - 30',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': '0'}]},
{'ColTitle': '31 - 60',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': '1'}]},
{'ColTitle': '61 - 90',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': '2'}]},
{'ColTitle': '91 and over',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': '3'}]},
{'ColTitle': 'Total',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': 'total'}]}]},
'Rows': {'Row': [{'ColData': [{'value': 'Brosnahan Insurance Agency',
'id': '31'},
{'value': ''},
{'value': '241.23'},
{'value': ''},
{'value': ''},
{'value': ''},
{'value': '241.23'}]},
{'ColData': [{'value': "Diego's Road Warrior Bodyshop", 'id': '36'},
{'value': '755.00'},
{'value': ''},
{'value': ''},
{'value': ''},
{'value': ''},
{'value': '755.00'}]},
{'ColData': [{'value': 'Norton Lumber and Building Materials', 'id': '46'},
{'value': ''},
{'value': '205.00'},
{'value': ''},
{'value': ''},
{'value': ''},
{'value': '205.00'}]},
{'ColData': [{'value': 'PG&E', 'id': '48'},
{'value': ''},
{'value': ''},
{'value': '86.44'},
{'value': ''},
{'value': ''},
{'value': '86.44'}]},
{'ColData': [{'value': 'Robertson & Associates', 'id': '49'},
{'value': ''},
{'value': '315.00'},
{'value': ''},
{'value': ''},
{'value': ''},
{'value': '315.00'}]},
{'Summary': {'ColData': [{'value': 'TOTAL'},
{'value': '755.00'},
{'value': '761.23'},
{'value': '86.44'},
{'value': '0.00'},
{'value': '0.00'},
{'value': '1602.67'}]},
'type': 'Section',
'group': 'GrandTotal'}]}}
this is the code where I am getting the error:
colHeaders = []
for i in responseJson['Columns']['Column']:
colHeaders.append(i['ColTitle'])
responseDf = pd.json_normalize(responseJson["Rows"]["Row"])
responseDf[colHeaders] = pd.DataFrame(responseDf.ColData.tolist(), index= responseDf.index)
this is the responseDf after json_normalize:
ColData type group Summary.ColData
0 [{'value': 'Brosnahan Insurance Agency', 'id':... NaN NaN NaN
1 [{'value': 'Diego's Road Warrior Bodyshop', 'i... NaN NaN NaN
2 [{'value': 'Norton Lumber and Building Materia... NaN NaN NaN
3 [{'value': 'PG&E', 'id': '48'}, {'value': ''},... NaN NaN NaN
4 [{'value': 'Robertson & Associates', 'id': '49... NaN NaN NaN
5 NaN Section GrandTotal [{'value': 'TOTAL'}, {'value': '755.00'}, {'va...
each element of ColData contains list of dictionaries.
and This is the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-215-6ce65ce2ac94> in <module>
6
7 responseDf = pd.json_normalize(responseJson["Rows"]["Row"])
----> 8 responseDf[colHeaders] = pd.DataFrame(responseDf.ColData.tolist(), index= responseDf.index)
9 responseDf
10
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
507 if is_named_tuple(data[0]) and columns is None:
508 columns = data[0]._fields
--> 509 arrays, columns = to_arrays(data, columns, dtype=dtype)
510 columns = ensure_index(columns)
511
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in to_arrays(data, columns, coerce_float, dtype)
522 return [], [] # columns if columns is not None else []
523 if isinstance(data[0], (list, tuple)):
--> 524 return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
525 elif isinstance(data[0], abc.Mapping):
526 return _list_of_dict_to_arrays(
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
559 else:
560 # list of lists
--> 561 content = list(lib.to_object_array(data).T)
562 # gh-26429 do not raise user-facing AssertionError
563 try:
pandas\_libs\lib.pyx in pandas._libs.lib.to_object_array()
TypeError: object of type 'float' has no len()
Any help will be really appreciated.
You got the error because there is NaN value on the ColData column in responseDf. NaN is considered float type and has no len(), hence the error.
To solve the problem, you can init the NaN with list of empty dict with .fillna(), as follows:
responseDf['ColData'] = responseDf['ColData'].fillna({i: [{}] for i in responseDf.index})
Put the codes immediately after the line with pd.json_normalize
The full set of codes will be:
colHeaders = []
for i in responseJson['Columns']['Column']:
colHeaders.append(i['ColTitle'])
responseDf = pd.json_normalize(responseJson["Rows"]["Row"])
## Add the code here
responseDf['ColData'] = responseDf['ColData'].fillna({i: [{}] for i in responseDf.index})
responseDf[colHeaders] = pd.DataFrame(responseDf.ColData.tolist(), index= responseDf.index)
Then, you will get through the error and get the result of responseDf, as follows:
print(responseDf)
ColData type group Summary.ColData Current 1 - 30 31 - 60 61 - 90 91 and over Total
0 [{'value': 'Brosnahan Insurance Agency', 'id': '31'}, {'value': ''}, {'value': '241.23'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '241.23'}] NaN NaN NaN {'value': 'Brosnahan Insurance Agency', 'id': '31'} {'value': ''} {'value': '241.23'} {'value': ''} {'value': ''} {'value': ''} {'value': '241.23'}
1 [{'value': 'Diego's Road Warrior Bodyshop', 'id': '36'}, {'value': '755.00'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '755.00'}] NaN NaN NaN {'value': 'Diego's Road Warrior Bodyshop', 'id': '36'} {'value': '755.00'} {'value': ''} {'value': ''} {'value': ''} {'value': ''} {'value': '755.00'}
2 [{'value': 'Norton Lumber and Building Materials', 'id': '46'}, {'value': ''}, {'value': '205.00'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '205.00'}] NaN NaN NaN {'value': 'Norton Lumber and Building Materials', 'id': '46'} {'value': ''} {'value': '205.00'} {'value': ''} {'value': ''} {'value': ''} {'value': '205.00'}
3 [{'value': 'PG&E', 'id': '48'}, {'value': ''}, {'value': ''}, {'value': '86.44'}, {'value': ''}, {'value': ''}, {'value': '86.44'}] NaN NaN NaN {'value': 'PG&E', 'id': '48'} {'value': ''} {'value': ''} {'value': '86.44'} {'value': ''} {'value': ''} {'value': '86.44'}
4 [{'value': 'Robertson & Associates', 'id': '49'}, {'value': ''}, {'value': '315.00'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '315.00'}] NaN NaN NaN {'value': 'Robertson & Associates', 'id': '49'} {'value': ''} {'value': '315.00'} {'value': ''} {'value': ''} {'value': ''} {'value': '315.00'}
5 [{}] Section GrandTotal [{'value': 'TOTAL'}, {'value': '755.00'}, {'value': '761.23'}, {'value': '86.44'}, {'value': '0.00'}, {'value': '0.00'}, {'value': '1602.67'}] {} None None None None None None

append all values in a dict if all elements exists and remove duplicates

I have a scenario where I have three dictionaries which I want merge into one but the condition is while I compare the three dictionaries with key name if there are duplicates need to remove them.
Here is what I have tried :
dict1= {'d1': [{'name': 'app1', 'id': 7134}, {'name': 'app2', 'id': 242}, {'name': 'yest app', 'id': 67},{'name': 'abc jam app', 'id': 6098}]}
dict2= {'d2': [{'name': 'app1 ', 'id': 30}, {'name': 'app2', 'id': 82}, {'name': 'yest app', 'id': 17}]}
dict3= {'d3': [{'name': 'app1', 'id': 70}, {'name': 'app2', 'id': 2582},{'name': 'availabla2z', 'id': 6667}]}
dict2 = {i:j for i,j in dict2.items() if i not in dict1}
dict3 = {i:j for i,j in dict3.items() if i not in dict2}
But the same do not give results also I am not sure how to compare three dicts for that matter.
and since if you look at the data dict1 is having an element 'name': 'app1' where as the same element is there in dict2 like this 'name': 'app1 ' (with a space) not sure how to format this as well and get a final dict like below as result.
{'final': [{'name': 'app1 ', 'id': 30}, {'name': 'app2', 'id': 82}, {'name': 'yest app', 'id': 17},{'name': 'abc jam app', 'id': 6098},{'name': 'availabla2z', 'id': 6667}]}
Here's a solution, taking advantage of this other SO answer (useful for python-2.x alternatives) that will remove duplicates without any particular rule:
final_dict = dict()
final_dict["final"] = dict1["d1"] + dict2["d2"] + dict3["d3"]
final_dict["final"] = list(
{v['name'].strip():v for v in final_dict["final"]}.values()
) # see usage of .strip() to handle space problems you mention
print(final_dict)
Result:
{'final': [
{'name': 'app1', 'id': 70},
{'name': 'app2', 'id': 2582},
{'name': 'yest app', 'id': 17},
{'name': 'abc jam app', 'id': 6098},
{'name': 'availabla2z', 'id': 6667}]
}
Here is a working Updater code:
dict1= {'d1': [{'name': 'app1', 'id': 7134}, {'name': 'app2', 'id': 242}, {'name': 'yest app', 'id': 67},{'name': 'abc jam app', 'id': 6098}]}
dict2= {'d2': [{'name': 'app1 ', 'id': 30}, {'name': 'app2', 'id': 82}, {'name': 'yest app', 'id': 17}]}
dict3= {'d3': [{'name': 'app1', 'id': 70}, {'name': 'app2', 'id': 2582},{'name': 'availabla2z', 'id': 6667}]}
final = {'final':[]}
for i in dict1['d1']:
final['final'].append(i)
for k,l in zip(dict3['d3'],range(len(dict1['d1']))):
if k['name'] == final['final'][l]['name']:
final['final'][l].update(k)
else:
final['final'].append(k)
for j,l in zip(dict2['d2'],range(len(dict1['d1']))):
if j['name'].strip() == final['final'][l]['name'].strip():
final['final'][l].update(j)
else:
final['final'].append(j)
This gives:
{'final': [{'name': 'app1 ', 'id': 30}, {'name': 'app2', 'id': 82}, {'name': 'yest app', 'id': 17}, {'name': 'abc jam app', 'id': 6098}, {'name': 'availabla2z', 'id': 6667}]}
You could group all dictionaries together by name using defaultdict:
from collections import defaultdict
d = defaultdict(list)
for lst in (dict1.values(), dict2.values(), dict3.values()):
for sublst in lst:
for dic in sublst:
d[dic["name"].strip()].append(dic)
Then choose the dictionaries with the smallest id value using min(). This still works for the requirements since it still chooses one dictionary and matches the output requested.
from operator import itemgetter
result = {'field': [min(x, key=itemgetter('id')) for x in d.values()]}
print(result)
Output:
{'field': [{'name': 'app1', 'id': 30}, {'name': 'app2', 'id': 82}, {'name': 'yest app', 'id': 17}, {'name': 'abc jam app', 'id': 6098}, {'name': 'availabla2z', 'id': 6667}]}

Sorting nested dictionary according to a key

i have a dictionary as below :
{' PLATINUM': [{'Name': 'MATH',
'Description': 'You can earn up to 50 Rs per year',
'value': 50},
{'Name': 'SCIENCE',
'Description': 'You can earn up to 100 Rs per year',
'value': 100},
{'Name': 'TOTAL',
'Description': 'You can earn up to 200 Rs per year',
'value': 200},
{'Name': 'SOCIAL',
'Description': 'You can earn up to 50 Rs per year',
'value': 50}],
'TITANIUM': [{'Name': 'SOCIAL',
'Description': 'You can earn up to 20 Rs per year',
'value': 20},
{'Name': 'MATH',
'Description': 'You can earn up to 10 Rs per year',
'value': 10},
{'Name': 'TOTAL',
'Description': 'You can earn up to 30 Rs per year',
'value': 30}]}
I wanted it to be sorted at each level - 'PLATINUM','TITANIUM' (as many levels) with the 'value'.
so expected dictionary will look like :
{' PLATINUM': [
{'Name': 'TOTAL',
'Description': 'You can earn up to 200 Rs per year',
'value': 200},
{'Name': 'SCIENCE',
'Description': 'You can earn up to 100 Rs per year',
'value': 100},
{'Name': 'MATH',
'Description': 'You can earn up to 50 Rs per year',
'value': 50},
{'Name': 'SOCIAL',
'Description': 'You can earn up to 50 Rs per year',
'value': 50}],
'TITANIUM': [
{'Name': 'TOTAL',
'Description': 'You can earn up to 30 Rs per year',
'value': 30}
{'Name': 'SOCIAL',
'Description': 'You can earn up to 20 Rs per year',
'value': 20},
{'Name': 'MATH',
'Description': 'You can earn up to 10 Rs per year',
'value': 10}]}
Can any one help me to achieve it with python code ?
You could use the following dictionary comprehension, where the inner dictionaries are sorted according to the key value:
from operator import itemgetter
d = {' PLATINUM': [{'Name': 'MATH', 'Description': 'You ...'}
{k:sorted(d[k], key=itemgetter('value'), reverse=True) for k in d}
Output
{' PLATINUM': [{'Name': 'TOTAL',
'Description': 'You can earn up to 200 Rs per year',
'value': 200},
{'Name': 'SCIENCE',
'Description': 'You can earn up to 100 Rs per year',
'value': 100},
{'Name': 'MATH',
'Description': 'You can earn up to 50 Rs per year',
'value': 50},
{'Name': 'SOCIAL',
'Description': 'You can earn up to 50 Rs per year',
'value': 50}],
'TITANIUM': [{'Name': 'TOTAL',
'Description': 'You can earn up to 30 Rs per year',
'value': 30},
{'Name': 'SOCIAL',
'Description': 'You can earn up to 20 Rs per year',
'value': 20},
{'Name': 'MATH',
'Description': 'You can earn up to 10 Rs per year',
'value': 10}]}

Sort a list of dict with a key from another list of dict

In the following example, I would like to sort the animals by the alphabetical order of their category, which is stored in an order dictionnary.
category = [{'uid': 0, 'name': 'mammals'},
{'uid': 1, 'name': 'birds'},
{'uid': 2, 'name': 'fish'},
{'uid': 3, 'name': 'reptiles'},
{'uid': 4, 'name': 'invertebrates'},
{'uid': 5, 'name': 'amphibians'}]
animals = [{'name': 'horse', 'category': 0},
{'name': 'whale', 'category': 2},
{'name': 'mollusk', 'category': 4},
{'name': 'tuna ', 'category': 2},
{'name': 'worms', 'category': 4},
{'name': 'frog', 'category': 5},
{'name': 'dog', 'category': 0},
{'name': 'salamander', 'category': 5},
{'name': 'horse', 'category': 0},
{'name': 'octopus', 'category': 4},
{'name': 'alligator', 'category': 3},
{'name': 'monkey', 'category': 0},
{'name': 'kangaroos', 'category': 0},
{'name': 'salmon', 'category': 2}]
sorted_animals = sorted(animals, key=lambda k: (k['category'])
How could I achieve this?
Thanks.
You are now sorting on the category id. All you need to do is map that id to a lookup for a given category name.
Create a dictionary for the categories first so you can directly map the numeric id to the associated name from the category list, then use that mapping when sorting:
catuid_to_name = {c['uid']: c['name'] for c in category}
sorted_animals = sorted(animals, key=lambda k: catuid_to_name[k['category']])
Demo:
>>> from pprint import pprint
>>> category = [{'uid': 0, 'name': 'mammals'},
... {'uid': 1, 'name': 'birds'},
... {'uid': 2, 'name': 'fish'},
... {'uid': 3, 'name': 'reptiles'},
... {'uid': 4, 'name': 'invertebrates'},
... {'uid': 5, 'name': 'amphibians'}]
>>> animals = [{'name': 'horse', 'category': 0},
... {'name': 'whale', 'category': 2},
... {'name': 'mollusk', 'category': 4},
... {'name': 'tuna ', 'category': 2},
... {'name': 'worms', 'category': 4},
... {'name': 'frog', 'category': 5},
... {'name': 'dog', 'category': 0},
... {'name': 'salamander', 'category': 5},
... {'name': 'horse', 'category': 0},
... {'name': 'octopus', 'category': 4},
... {'name': 'alligator', 'category': 3},
... {'name': 'monkey', 'category': 0},
... {'name': 'kangaroos', 'category': 0},
... {'name': 'salmon', 'category': 2}]
>>> catuid_to_name = {c['uid']: c['name'] for c in category}
>>> pprint(catuid_to_name)
{0: 'mammals',
1: 'birds',
2: 'fish',
3: 'reptiles',
4: 'invertebrates',
5: 'amphibians'}
>>> sorted_animals = sorted(animals, key=lambda k: catuid_to_name[k['category']])
>>> pprint(sorted_animals)
[{'category': 5, 'name': 'frog'},
{'category': 5, 'name': 'salamander'},
{'category': 2, 'name': 'whale'},
{'category': 2, 'name': 'tuna '},
{'category': 2, 'name': 'salmon'},
{'category': 4, 'name': 'mollusk'},
{'category': 4, 'name': 'worms'},
{'category': 4, 'name': 'octopus'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'dog'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'monkey'},
{'category': 0, 'name': 'kangaroos'},
{'category': 3, 'name': 'alligator'}]
Note that within each category, the dictionaries have been left in relative input order. You could return a tuple of values from the sorting key to further apply a sorting order within each category, e.g.:
sorted_animals = sorted(
animals,
key=lambda k: (catuid_to_name[k['category']], k['name'])
)
would sort by animal name within each category, producing:
>>> pprint(sorted(animals, key=lambda k: (catuid_to_name[k['category']], k['name'])))
[{'category': 5, 'name': 'frog'},
{'category': 5, 'name': 'salamander'},
{'category': 2, 'name': 'salmon'},
{'category': 2, 'name': 'tuna '},
{'category': 2, 'name': 'whale'},
{'category': 4, 'name': 'mollusk'},
{'category': 4, 'name': 'octopus'},
{'category': 4, 'name': 'worms'},
{'category': 0, 'name': 'dog'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'kangaroos'},
{'category': 0, 'name': 'monkey'},
{'category': 3, 'name': 'alligator'}]
imo your category structure is far too complicated - at least as long as the uid is nothing but the index, you could simply use a list for that:
category = [c['name'] for c in category]
# ['mammals', 'birds', 'fish', 'reptiles', 'invertebrates', 'amphibians']
sorted_animals = sorted(animals, key=lambda k: category[k['category']])
#[{'name': 'frog', 'category': 5}, {'name': 'salamander', 'category': 5}, {'name': 'whale', 'category': 2}, {'name': 'tuna ', 'category': 2}, {'name': 'salmon', 'category': 2}, {'name': 'mollusk', 'category': 4}, {'name': 'worms', 'category': 4}, {'name': 'octopus', 'category': 4}, {'name': 'horse', 'category': 0}, {'name': 'dog', 'category': 0}, {'name': 'horse', 'category': 0}, {'name': 'monkey', 'category': 0}, {'name': 'kangaroos', 'category': 0}, {'name': 'alligator', 'category': 3}]

Categories

Resources