Creating pandas DataFrame from dictionary in Python - python

I have data coming from API like below:
> {'Message': {'Success': True, 'ErrorMessage': ''},
> 'StoresAttributes': [{'StoreCode': '1004',
> 'Categories': [{'Code': 'Lctn',
> 'Attribute': {'Code': 'Long', 'Value': '16.99390523395146'}},
> {'Code': 'Lctn',
> 'Attribute': {'Code': 'Lat', 'Value': '52.56718450856377'}},
> {'Code': 'Offr', 'Attribute': {'Code': 'Bake', 'Value': 'True'}},
> {'Code': 'Pay', 'Attribute': {'Code': 'SCO', 'Value': 'True'}}]},
> {'StoreCode': '1005',
> 'Categories': [{'Code': 'Lctn',
> 'Attribute': {'Code': 'Long', 'Value': '14.2339250'}},
> {'Code': 'Lctn', 'Attribute': {'Code': 'Lat', 'Value': '53.8996090'}},
> {'Code': 'Offr', 'Attribute': {'Code': 'Bake', 'Value': 'True'}},
> {'Code': 'Pay', 'Attribute': {'Code': 'SCO', 'Value': 'True'}},
> {'Code': 'Offr', 'Attribute': {'Code': 'Bchi', 'Value': 'True'}}]},
And I want to make a DataFrame from it. I have tried a loop and the pd.DataFrame() constructor, but neither worked properly.
What I want to achieve is a df with the following columns:
StoreCode: 1004,
Long: 16.99,
Lat: 52.56,
Bake: True.
Can anyone please help?
Below is a screenshot of my (erroneous) result from json_normalize.

You can use json_normalize then pivot:
import pandas as pd
data = {'Message': {'Success': True, 'ErrorMessage': ''}, 'StoresAttributes': [{'StoreCode': '1004', 'Categories': [{'Code': 'Lctn', 'Attribute': {'Code': 'Long', 'Value': '16.99390523395146'}}, {'Code': 'Lctn', 'Attribute': {'Code': 'Lat', 'Value': '52.56718450856377'}}, {'Code': 'Offr', 'Attribute': {'Code': 'Bake', 'Value': 'True'}}, {'Code': 'Pay', 'Attribute': {'Code': 'SCO', 'Value': 'True'}}]}, {'StoreCode': '1005', 'Categories': [{'Code': 'Lctn', 'Attribute': {'Code': 'Long', 'Value': '14.2339250'}}, {'Code': 'Lctn', 'Attribute': {'Code': 'Lat', 'Value': '53.8996090'}}, {'Code': 'Offr', 'Attribute': {'Code': 'Bake', 'Value': 'True'}}, {'Code': 'Pay', 'Attribute': {'Code': 'SCO', 'Value': 'True'}}, {'Code': 'Offr', 'Attribute': {'Code': 'Bchi', 'Value': 'True'}}]}]}
# Flatten each store's Categories list, carrying StoreCode along as metadata
df = pd.json_normalize(data['StoresAttributes'], meta='StoreCode', record_path='Categories')
# One row per store, one column per attribute code
df.pivot(columns='Attribute.Code', values='Attribute.Value', index='StoreCode')
Output:
Attribute.Code Bake Bchi Lat Long SCO
StoreCode
1004 True NaN 52.56718450856377 16.99390523395146 True
1005 True True 53.8996090 14.2339250 True
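Note that the pivoted values are still strings. If numeric coordinates and real booleans are wanted, a follow-up conversion could look like this (a sketch on a minimal frame shaped like the pivoted output above):

```python
import pandas as pd

# Minimal frame mirroring the pivoted output above (all values are strings)
wide = pd.DataFrame(
    {"Lat": ["52.56718450856377", "53.8996090"],
     "Long": ["16.99390523395146", "14.2339250"],
     "Bake": ["True", "True"]},
    index=pd.Index(["1004", "1005"], name="StoreCode"),
)

# Coordinates to floats, flag columns to real booleans
wide[["Lat", "Long"]] = wide[["Lat", "Long"]].apply(pd.to_numeric)
wide["Bake"] = wide["Bake"] == "True"
```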

You can use json_normalize() function like this:
data = [
{"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
{"name": {"given": "Mark", "family": "Regner"}},
{"id": 2, "name": "Faye Raker"},
]
pd.json_normalize(data)
Output:
id name.first name.last name.given name.family name
0 1.0 Coleen Volk NaN NaN NaN
1 NaN NaN NaN Mark Regner NaN
2 2.0 NaN NaN NaN NaN Faye Raker
You can refer to the pandas documentation to learn more about the json_normalize() function.

Related

multinested json python to df

I have a json response with productdetails of multiple products in this structure
all_content=[{'productcode': '0502570SRE',
'brand': {'code': 'MJ', 'name': 'Marie Jo'},
'series': {'code': '0257', 'name': 'DANAE'},
'family': {'code': '0257SRE', 'name': 'DANAE Red'},
'style': {'code': '0502570'},
'introSeason': {'code': '226', 'name': 'WINTER 2022'},
'seasons': [{'code': '226', 'name': 'WINTER 2022'}],
'otherColors': [{'code': '0502570ZWA'}, {'code': '0502570PIR'}],
'synonyms': [{'code': '0502571SRE'}],
'stayerB2B': False,
'name': [{'language': 'de', 'text': 'DANAE Rot Rioslip'},
{'language': 'sv', 'text': 'DANAE Red '},
{'language': 'en', 'text': 'DANAE Red rio briefs'},
{'language': 'it', 'text': 'DANAE Rouge slip brasiliano'},
{'language': 'fr', 'text': 'DANAE Rouge slip brésilien'},
{'language': 'da', 'text': 'DANAE Red rio briefs'},
{'language': 'nl', 'text': 'DANAE Rood rioslip'},
.......]
What I need is a dataframe with, for each productcode, only the values of specific keys in the subdictionaries. For example:
productcode synonyms_code name_language_nl
0522570SRE 0522571SRE rioslip
I've tried a nested for loop, which gives me a list of all values of one specific key-value pair in a subdict:
for item in all_content:
    synonyms_details = item['synonyms']
    for i in synonyms_details:
        print(i['code'])
How do I get from here to a DF like this
productcode synonyms_code name_language_nl
0522570SRE 0522571SRE rioslip
I took another route with json_normalize to make a flattened df with the original key in a column:
# test with json_normalize to extract list_of_dicts_of_list_of_dicts
# meta_prefix necessary to avoid conflicts from multiple uses of 'code' as a key
df_syn = pd.json_normalize(all_content, record_path=['synonyms'], meta='code', meta_prefix='org_')
result
code org_code
0 0162934FRO 0162935FRO
1 0241472TFE 0241473TF
I changed the column names for the merge with the original df:
df_syn = df_syn.rename(columns={"code": "syn_code", "org_code":"code"})
result
syn_code code
0 0162934FRO 0162935FRO
1 0241472TFE 0241473T
I merged the flattened df into the original df with a left merge on the shared key:
df = pd.merge(left=df, right=df_syn, how='left', left_on='code', right_on='code')
Result: see the last column. NaN because not every product has a synonym.
code brand series family style introSeason seasons otherColors synonyms stayerB2B ... composition recycleComposition media assortmentIds preOrderPeriod firstDeliveryDates productGroup language text syn_code
0 0502570SRE {'code': 'MJ', 'name': 'Marie Jo'} {'code': '0257', 'name': 'DANAE'} {'code': '0257SRE', 'name': 'DANAE Red'} {'code': '0502570'} {'code': '226', 'name': 'WINTER 2022'} [{'code': '226', 'name': 'WINTER 2022'}] [{'code': '0502570ZWA'}, {'code': '0502570PIR'}] [] False ... [{'material': [{'language': 'de', 'text': 'Pol... [{'origin': [{'language': 'de', 'text': 'Nicht... [{'type': 'IMAGE', 'imageType': 'No body pictu... [BO_MJ] {'startDate': '2022-01-01', 'endDate': '2022-0... {'common': '2022-09-05', 'deviations': []} {'code': '0SRI1'} nl DANAE Rood rioslip NaN
The next step is to go one level deeper to get the values of a list_of_lists_of_lists.
Any suggestions for a more straightforward way? Extracting data like this for all nested values is quite time-consuming.
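For what it's worth, one shortcut for the target shape above is to skip the normalize/merge passes and build the rows with a plain comprehension. A sketch, assuming each product has at most the keys shown (the sample here is a trimmed-down copy of the data):

```python
import pandas as pd

all_content = [{
    "productcode": "0502570SRE",
    "synonyms": [{"code": "0502571SRE"}],
    "name": [{"language": "nl", "text": "DANAE Rood rioslip"},
             {"language": "en", "text": "DANAE Red rio briefs"}],
}]

rows = []
for item in all_content:
    # First synonym code, if any; None otherwise (shows as NaN in the df)
    syn = item["synonyms"][0]["code"] if item.get("synonyms") else None
    # Pick the Dutch name out of the list of {language, text} dicts
    name_nl = next((n["text"] for n in item.get("name", [])
                    if n["language"] == "nl"), None)
    rows.append({"productcode": item["productcode"],
                 "synonyms_code": syn,
                 "name_language_nl": name_nl})

df = pd.DataFrame(rows)
```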

How to check if a value is inside a JSON object and get the object that contains the key with that value?

I have this JSON object
{'result':
{'chk_in': '2022-05-28',
'chk_out': '2022-05-30',
'currency': 'USD',
'rates': [
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0},
{'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0},
{'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0},
],
}
How do I check whether any of the 'code' keys inside 'rates' contains the value 'Expedia', and get the matching object?
This is what I tried, but it did not succeed:
if ('code', 'Expedia') in json_object['result']['rates'].items():
    print(json_object['result']['rates'])
else:
    print('no price')
json_object['result']['rates'] is a list, not a dict, so it doesn't have an items() method. What you want is something like:
[rate for rate in json_object['result']['rates'] if rate['code'] == 'Expedia']
This will give you a list of all the dictionaries in rates matching the criteria you're looking for; there might be none, one, or more.
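If only the first matching rate is needed, `next()` with a default avoids building the full list:

```python
json_object = {"result": {"rates": [
    {"code": "Expedia", "name": "Expedia", "rate": 299.0, "tax": 70.0},
    {"code": "BookingCom", "name": "Booking.com", "rate": 299.0, "tax": 73.0},
]}}

# First dict whose 'code' is 'Expedia', or None if there is no match
match = next((r for r in json_object["result"]["rates"]
              if r["code"] == "Expedia"), None)
print(match["rate"] if match else "no price")
```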
The issue is that the data inside the "rates" key is a list. Take a look at the following:
>>> data = {'result':
{'chk_in': '2022-05-28',
'chk_out': '2022-05-30',
'currency': 'USD',
'rates': [
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0},
{'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0},
{'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0},
],
}}
>>> type(data["result"]["rates"])
<class 'list'>
I'm not sure I understand exactly what you're trying to do, but maybe you are looking for something like this:
>>> data["result"]["rates"]
[{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0}, {'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0}, {'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0}]
>>> for rate in data["result"]["rates"]:
...     code = rate["code"]
...     if code == "Expedia":
...         print(rate)
...
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0}
>>>
You want to look through the list of rates and search for the ones with el['code'] == 'Expedia':
d = {'result':
{'chk_in': '2022-05-28',
'chk_out': '2022-05-30',
'currency': 'USD',
'rates': [
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0},
{'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0},
{'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0},
],
}
}
print([el for el in d['result']['rates'] if el['code']=='Expedia'])
>>> [{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0}]

Getting "TypeError: object of type 'float' has no len()" when trying to convert JSON into a DataFrame

I am trying to create a dataframe from the JSON I fetched from the QuickBooks APAgingSummary API, but I am getting the error "TypeError: object of type 'float' has no len()" when inserting the json_normalize data, in the form of a list, into pandas. I used the same code to create a DataFrame from the QuickBooks AccountListDetail API JSON and it worked fine.
This code was used for fetching data:
base_url = 'https://sandbox-quickbooks.api.intuit.com'
url = f"{base_url}/v3/company/{auth_client.realm_id}/reports/AgedPayables?&minorversion=62"
auth_header = f'Bearer {auth_client.access_token}'
headers = {
    'Authorization': auth_header,
    'Accept': 'application/json'
}
response = requests.get(url, headers=headers)
responseJson = response.json()
responseJson
This is the responseJson:
{'Header': {'Time': '2021-10-05T04:33:02-07:00',
'ReportName': 'AgedPayables',
'DateMacro': 'today',
'StartPeriod': '2021-10-05',
'EndPeriod': '2021-10-05',
'SummarizeColumnsBy': 'Total',
'Currency': 'USD',
'Option': [{'Name': 'report_date', 'Value': '2021-10-05'},
{'Name': 'NoReportData', 'Value': 'false'}]},
'Columns': {'Column': [{'ColTitle': '', 'ColType': 'Vendor'},
{'ColTitle': 'Current',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': 'current'}]},
{'ColTitle': '1 - 30',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': '0'}]},
{'ColTitle': '31 - 60',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': '1'}]},
{'ColTitle': '61 - 90',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': '2'}]},
{'ColTitle': '91 and over',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': '3'}]},
{'ColTitle': 'Total',
'ColType': 'Money',
'MetaData': [{'Name': 'ColKey', 'Value': 'total'}]}]},
'Rows': {'Row': [{'ColData': [{'value': 'Brosnahan Insurance Agency',
'id': '31'},
{'value': ''},
{'value': '241.23'},
{'value': ''},
{'value': ''},
{'value': ''},
{'value': '241.23'}]},
{'ColData': [{'value': "Diego's Road Warrior Bodyshop", 'id': '36'},
{'value': '755.00'},
{'value': ''},
{'value': ''},
{'value': ''},
{'value': ''},
{'value': '755.00'}]},
{'ColData': [{'value': 'Norton Lumber and Building Materials', 'id': '46'},
{'value': ''},
{'value': '205.00'},
{'value': ''},
{'value': ''},
{'value': ''},
{'value': '205.00'}]},
{'ColData': [{'value': 'PG&E', 'id': '48'},
{'value': ''},
{'value': ''},
{'value': '86.44'},
{'value': ''},
{'value': ''},
{'value': '86.44'}]},
{'ColData': [{'value': 'Robertson & Associates', 'id': '49'},
{'value': ''},
{'value': '315.00'},
{'value': ''},
{'value': ''},
{'value': ''},
{'value': '315.00'}]},
{'Summary': {'ColData': [{'value': 'TOTAL'},
{'value': '755.00'},
{'value': '761.23'},
{'value': '86.44'},
{'value': '0.00'},
{'value': '0.00'},
{'value': '1602.67'}]},
'type': 'Section',
'group': 'GrandTotal'}]}}
This is the code where I am getting the error:
colHeaders = []
for i in responseJson['Columns']['Column']:
    colHeaders.append(i['ColTitle'])

responseDf = pd.json_normalize(responseJson["Rows"]["Row"])
responseDf[colHeaders] = pd.DataFrame(responseDf.ColData.tolist(), index=responseDf.index)
This is the responseDf after json_normalize:
ColData type group Summary.ColData
0 [{'value': 'Brosnahan Insurance Agency', 'id':... NaN NaN NaN
1 [{'value': 'Diego's Road Warrior Bodyshop', 'i... NaN NaN NaN
2 [{'value': 'Norton Lumber and Building Materia... NaN NaN NaN
3 [{'value': 'PG&E', 'id': '48'}, {'value': ''},... NaN NaN NaN
4 [{'value': 'Robertson & Associates', 'id': '49... NaN NaN NaN
5 NaN Section GrandTotal [{'value': 'TOTAL'}, {'value': '755.00'}, {'va...
Each element of ColData contains a list of dictionaries.
And this is the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-215-6ce65ce2ac94> in <module>
6
7 responseDf = pd.json_normalize(responseJson["Rows"]["Row"])
----> 8 responseDf[colHeaders] = pd.DataFrame(responseDf.ColData.tolist(), index= responseDf.index)
9 responseDf
10
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
507 if is_named_tuple(data[0]) and columns is None:
508 columns = data[0]._fields
--> 509 arrays, columns = to_arrays(data, columns, dtype=dtype)
510 columns = ensure_index(columns)
511
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in to_arrays(data, columns, coerce_float, dtype)
522 return [], [] # columns if columns is not None else []
523 if isinstance(data[0], (list, tuple)):
--> 524 return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
525 elif isinstance(data[0], abc.Mapping):
526 return _list_of_dict_to_arrays(
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
559 else:
560 # list of lists
--> 561 content = list(lib.to_object_array(data).T)
562 # gh-26429 do not raise user-facing AssertionError
563 try:
pandas\_libs\lib.pyx in pandas._libs.lib.to_object_array()
TypeError: object of type 'float' has no len()
Any help will be really appreciated.
You got the error because there are NaN values in the ColData column of responseDf. NaN is of type float and has no len(), hence the error.
To solve the problem, you can replace each NaN with a list holding an empty dict using .fillna(), as follows:
responseDf['ColData'] = responseDf['ColData'].fillna({i: [{}] for i in responseDf.index})
Place this line immediately after the pd.json_normalize call.
The full code will be:
colHeaders = []
for i in responseJson['Columns']['Column']:
    colHeaders.append(i['ColTitle'])

responseDf = pd.json_normalize(responseJson["Rows"]["Row"])

## Add the code here
responseDf['ColData'] = responseDf['ColData'].fillna({i: [{}] for i in responseDf.index})

responseDf[colHeaders] = pd.DataFrame(responseDf.ColData.tolist(), index=responseDf.index)
Then you will get past the error, and responseDf will look as follows:
print(responseDf)
ColData type group Summary.ColData Current 1 - 30 31 - 60 61 - 90 91 and over Total
0 [{'value': 'Brosnahan Insurance Agency', 'id': '31'}, {'value': ''}, {'value': '241.23'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '241.23'}] NaN NaN NaN {'value': 'Brosnahan Insurance Agency', 'id': '31'} {'value': ''} {'value': '241.23'} {'value': ''} {'value': ''} {'value': ''} {'value': '241.23'}
1 [{'value': 'Diego's Road Warrior Bodyshop', 'id': '36'}, {'value': '755.00'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '755.00'}] NaN NaN NaN {'value': 'Diego's Road Warrior Bodyshop', 'id': '36'} {'value': '755.00'} {'value': ''} {'value': ''} {'value': ''} {'value': ''} {'value': '755.00'}
2 [{'value': 'Norton Lumber and Building Materials', 'id': '46'}, {'value': ''}, {'value': '205.00'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '205.00'}] NaN NaN NaN {'value': 'Norton Lumber and Building Materials', 'id': '46'} {'value': ''} {'value': '205.00'} {'value': ''} {'value': ''} {'value': ''} {'value': '205.00'}
3 [{'value': 'PG&E', 'id': '48'}, {'value': ''}, {'value': ''}, {'value': '86.44'}, {'value': ''}, {'value': ''}, {'value': '86.44'}] NaN NaN NaN {'value': 'PG&E', 'id': '48'} {'value': ''} {'value': ''} {'value': '86.44'} {'value': ''} {'value': ''} {'value': '86.44'}
4 [{'value': 'Robertson & Associates', 'id': '49'}, {'value': ''}, {'value': '315.00'}, {'value': ''}, {'value': ''}, {'value': ''}, {'value': '315.00'}] NaN NaN NaN {'value': 'Robertson & Associates', 'id': '49'} {'value': ''} {'value': '315.00'} {'value': ''} {'value': ''} {'value': ''} {'value': '315.00'}
5 [{}] Section GrandTotal [{'value': 'TOTAL'}, {'value': '755.00'}, {'value': '761.23'}, {'value': '86.44'}, {'value': '0.00'}, {'value': '0.00'}, {'value': '1602.67'}] {} None None None None None None
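Note that the new columns still hold `{'value': ...}` dicts rather than plain strings. If the bare values are wanted, one way to unwrap them is a per-column apply (a sketch on a toy frame shaped like the result):

```python
import pandas as pd

# Toy frame shaped like two of the columns produced above
responseDf = pd.DataFrame({
    "Current": [{"value": ""}, {"value": "755.00"}],
    "Total": [{"value": "241.23"}, {"value": "755.00"}],
})

for col in ["Current", "Total"]:
    # Pull the 'value' entry out of each cell dict, leaving non-dict cells alone
    responseDf[col] = responseDf[col].apply(
        lambda d: d.get("value") if isinstance(d, dict) else d)
```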

From Dataframe to nested Dictionary in Python

I want to transform a dataframe into a specific nested dictionary format.
The dataframe df looks like this:
OC OZ ON WT DC DZ DN
0 PL 97 TP 59 DE 63 DC
3 US 61 SU 9 US 95 SU
The expected output is this:
{'location':
{
'zipCode':
{'country': 'PL',
'code': '97'},
'location': {'id': '1'},
'longName': 'TP',
},
'CarriageParameter':
{'road':
{'truckLoad': 'Auto'}
},
'load':
{'weight': '59',
'unit': 'ton',
'showEmissionsAtResponse': 'true'
}
},
{'location':
{
'zipCode':
{'country': 'DE',
'code': '63'},
'location': {'id': '2'},
'longName': 'DC'
},
'CarriageParameter':
{'road':
{'truckLoad': 'Auto'}
},
'unload': {
'weight': '59',
'unit': 'ton',
'showEmissionsAtResponse': 'true'
}
}
I've tried the code below, but I am only getting one part of the dictionary:
dic = {}
dic['section'] = []
for ix, row in df.iterrows():
    in_dict1 = {
        'location': {
            'zipCode': {'country': row['OC'],
                        'code': row['OZ']},
            'location': {'id': '1'},
            'longName': row['ON'],
        },
        'CarriageParameter': {'road': {'truckLoad': 'Auto'}},
        'load': {'weight': str(row['WT']),
                 'unit': 'ton',
                 'showEmissionsAtResponse': 'true'}
    },
    in_dict2 = {
        'location': {
            'zipCode': {'country': row['DC'],
                        'code': row['DZ']},
            'location': {'id': '2'},
            'longName': row['DN']
        },
        'CarriageParameter': {'road': {'truckLoad': 'Auto'}},
        'unload': {'weight': str(row['WT']),
                   'unit': 'ton',
                   'showEmissionsAtResponse': 'true'}
    }
    dic['section'].append(in_dict1)
The pretty print of the first row is below:
{'section': [{'CarriageParameter': {'road': {'truckLoad': 'Auto'}},
'load': {'showEmissionsAtResponse': 'true',
'unit': 'ton',
'weight': '59'},
'location': {'location': {'id': '1'},
'longName': 'TP COLEP_GRABICA PL',
'zipCode': {'code': '97-306', 'country': 'PL'}}}]}
I would expect the second part of the dictionary to appear as well; it seems to get lost somewhere.
How do I fix this issue?
It's because you never append the second dictionary. You could try this:
import pandas as pd
import io

s_e = '''
OC  OZ  ON  WT  DC  DZ  DN
0  PL  97  TP  59  DE  63  DC
3  US  61  SU  9  US  95  SU
'''
df = pd.read_csv(io.StringIO(s_e), sep='\s\s+', parse_dates=[1, 2], engine='python')

dic = {}
dic['section'] = []
for ix, row in df.iterrows():
    in_dict1 = {
        'location': {
            'zipCode': {'country': row['OC'],
                        'code': row['OZ']},
            'location': {'id': '1'},
            'longName': row['ON'],
        },
        'CarriageParameter': {'road': {'truckLoad': 'Auto'}},
        'load': {'weight': str(row['WT']),
                 'unit': 'ton',
                 'showEmissionsAtResponse': 'true'}
    }
    in_dict2 = {
        'location': {
            'zipCode': {'country': row['DC'],
                        'code': row['DZ']},
            'location': {'id': '2'},
            'longName': row['DN']
        },
        'CarriageParameter': {'road': {'truckLoad': 'Auto'}},
        'unload': {'weight': str(row['WT']),
                   'unit': 'ton',
                   'showEmissionsAtResponse': 'true'}
    }
    dic['section'].append(in_dict1)
    dic['section'].append(in_dict2)
print(dic['section'])
Output:
[{'location': {'zipCode': {'country': 'PL', 'code': 97}, 'location': {'id': '1'}, 'longName': 'TP'}, 'CarriageParameter': {'road': {'truckLoad': 'Auto'}}, 'load': {'weight': '59', 'unit': 'ton', 'showEmissionsAtResponse': 'true'}}, {'location': {'zipCode': {'country': 'DE', 'code': 63}, 'location': {'id': '2'}, 'longName': 'DC'}, 'CarriageParameter': {'road': {'truckLoad': 'Auto'}}, 'unload': {'weight': '59', 'unit': 'ton', 'showEmissionsAtResponse': 'true'}}, {'location': {'zipCode': {'country': 'US', 'code': 61}, 'location': {'id': '1'}, 'longName': 'SU'}, 'CarriageParameter': {'road': {'truckLoad': 'Auto'}}, 'load': {'weight': '9', 'unit': 'ton', 'showEmissionsAtResponse': 'true'}}, {'location': {'zipCode': {'country': 'US', 'code': 95}, 'location': {'id': '2'}, 'longName': 'SU'}, 'CarriageParameter': {'road': {'truckLoad': 'Auto'}}, 'unload': {'weight': '9', 'unit': 'ton', 'showEmissionsAtResponse': 'true'}}]
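A slightly tidier variant of the same loop factors the repeated literal into a helper so the load/unload halves share one template. A sketch, assuming the same column names (the `entry` helper is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame([["PL", "97", "TP", 59, "DE", "63", "DC"],
                   ["US", "61", "SU", 9, "US", "95", "SU"]],
                  columns=["OC", "OZ", "ON", "WT", "DC", "DZ", "DN"])

def entry(country, code, name, loc_id, weight, kind):
    # Build one 'section' element; kind is 'load' or 'unload'
    return {
        "location": {"zipCode": {"country": country, "code": code},
                     "location": {"id": loc_id},
                     "longName": name},
        "CarriageParameter": {"road": {"truckLoad": "Auto"}},
        kind: {"weight": str(weight), "unit": "ton",
               "showEmissionsAtResponse": "true"},
    }

section = []
for row in df.to_dict("records"):
    section.append(entry(row["OC"], row["OZ"], row["ON"], "1", row["WT"], "load"))
    section.append(entry(row["DC"], row["DZ"], row["DN"], "2", row["WT"], "unload"))
dic = {"section": section}
```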

Max-valued elements from a list, grouped by element code

I want to extract, for each code, all elements with the maximum running date from a list of dictionaries.
Here is what I have so far:
import datetime
from itertools import groupby
commission_list = [
{'code': 'COMMISSION_CODE1', 'runningdt': datetime.datetime.strptime('2016-04-12', '%Y-%m-%d'), 'value': 150},
{'code': 'COMMISSION_CODE1', 'runningdt': datetime.datetime.strptime('2016-04-12', '%Y-%m-%d'), 'value': 450},
{'code': 'COMMISSION_CODE1', 'runningdt': datetime.datetime.strptime('2016-04-16', '%Y-%m-%d'), 'value': 140},
{'code': 'COMMISSION_CODE1', 'runningdt': datetime.datetime.strptime('2016-04-17', '%Y-%m-%d'), 'value': 120},
{'code': 'COMMISSION_CODE1', 'runningdt': datetime.datetime.strptime('2016-04-17', '%Y-%m-%d'), 'value': 220},
{'code': 'COMMISSION_CODE2', 'runningdt': datetime.datetime.strptime('2016-04-11', '%Y-%m-%d'), 'value': 150},
{'code': 'COMMISSION_CODE2', 'runningdt': datetime.datetime.strptime('2016-04-15', '%Y-%m-%d'), 'value': 140},
{'code': 'COMMISSION_CODE2', 'runningdt': datetime.datetime.strptime('2016-04-16', '%Y-%m-%d'), 'value': 160},
{'code': 'COMMISSION_CODE2', 'runningdt': datetime.datetime.strptime('2016-04-19', '%Y-%m-%d'), 'value': 210},
{'code': 'COMMISSION_CODE3', 'runningdt': datetime.datetime.strptime('2016-04-16', '%Y-%m-%d'), 'value': 330},
{'code': 'COMMISSION_CODE3', 'runningdt': datetime.datetime.strptime('2016-04-20', '%Y-%m-%d'), 'value': 310},
{'code': 'COMMISSION_CODE3', 'runningdt': datetime.datetime.strptime('2016-04-20', '%Y-%m-%d'), 'value': 410},
]
latest_run_commissions = []
for key, commission_group in groupby(commission_list, lambda x: x['code']):
    tem = list(commission_group)
    the_last_com = max(tem, key=lambda x: x['runningdt'])
    filtered_objs = filter(lambda f: f['runningdt'] == the_last_com['runningdt'], tem)
    for o in filtered_objs:
        latest_run_commissions.append(o)

for f in latest_run_commissions:
    print(f)
    print(" ")
Are there any more effective and efficient ways out there? Your advice or suggestions would be much appreciated.
You can use itemgetter from the operator module to do this efficiently.
In [16]: from operator import itemgetter
In [17]: sorted_data = sorted(commission_list, key=itemgetter('code'))
In [18]: for g, data in groupby(sorted_data, key=itemgetter('code')):
   ....:     print(max(data, key=itemgetter('runningdt')))
   ....:
{'runningdt': datetime.datetime(2016, 4, 17, 0, 0), 'code': 'COMMISSION_CODE1', 'value': 120}
{'runningdt': datetime.datetime(2016, 4, 19, 0, 0), 'code': 'COMMISSION_CODE2', 'value': 210}
{'runningdt': datetime.datetime(2016, 4, 20, 0, 0), 'code': 'COMMISSION_CODE3', 'value': 310}
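One caveat: max() keeps a single element per group, while the question asks for every element that shares the latest date (note the ties on 2016-04-17 and 2016-04-20). A two-pass sketch that keeps ties, shown on a trimmed-down copy of the data:

```python
import datetime

commission_list = [
    {"code": "COMMISSION_CODE1", "runningdt": datetime.datetime(2016, 4, 17), "value": 120},
    {"code": "COMMISSION_CODE1", "runningdt": datetime.datetime(2016, 4, 17), "value": 220},
    {"code": "COMMISSION_CODE1", "runningdt": datetime.datetime(2016, 4, 12), "value": 150},
    {"code": "COMMISSION_CODE2", "runningdt": datetime.datetime(2016, 4, 19), "value": 210},
]

# Pass 1: find the latest date per code
latest = {}
for c in commission_list:
    if c["code"] not in latest or c["runningdt"] > latest[c["code"]]:
        latest[c["code"]] = c["runningdt"]

# Pass 2: keep every row matching its code's latest date (ties included)
latest_run_commissions = [c for c in commission_list
                          if c["runningdt"] == latest[c["code"]]]
```

This also avoids the need to sort the list first, which groupby() requires.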
