Nested dicts with empty lists to Pandas dataframe columns - python

I have some data from an API that I am trying to convert to a Pandas dataframe.
I am struggling to extract the 'station_xyz__cr' id number from the list in a nested dict (where a list can be empty as in the middle dataset).
output = {'data': [{'abc_serial_number__c': 'ABC2020-07571',
'id': 'V48000000000F79',
'modified_date__v': '2020-06-15T05:13:14.000Z',
'name__v': 'VVV-001039',
'station_xyz__cr': {'data': [{'id': 'V5J000000000B86'}],
'responseDetails': {'limit': 250,
'offset': 0,
'size': 1,
'total': 1}}},
{'abc_serial_number__c': 'ABC2020-09952',
'id': 'V48000000001B94',
'modified_date__v': '2020-06-24T11:30:40.000Z',
'name__v': 'VVV-004040',
'station_xyz__cr': {'data': [],
'responseDetails': {'limit': 250,
'offset': 0,
'size': 1,
'total': 1}}},
{'abc_serial_number__c': 'ABC2020-09196',
'id': 'V48000000001B95',
'modified_date__v': '2020-06-23T09:38:18.000Z',
'name__v': 'VVV-004041',
'station_xyz__cr': {'data': [{'id': 'V5J000000000Z10'}],
'responseDetails': {'limit': 250,
'offset': 0,
'size': 1,
'total': 1}}}],
'responseDetails': {'limit': 1000, 'offset': 0, 'size': 3, 'total': 3},
'responseStatus': 'SUCCESS'}
I'm trying to get the nested id data into a column in the dataframe something like this:
station_xyz__cr.data.id
0 V5J000000000B86
1 None
2 V5J000000000Z10
I've tried converting to a dataframe with json_normalize (dropping the columns I don't need):
df = pd.json_normalize(output['data'])
df = df.loc[:, ~df.columns.str.startswith('station_xyz__cr.responseDetails')]
print(df)
abc_serial_number__c id modified_date__v name__v \
0 ABC2020-07571 V48000000000F79 2020-06-15T05:13:14.000Z VVV-001039
1 ABC2020-09952 V48000000001B94 2020-06-24T11:30:40.000Z VVV-004040
2 ABC2020-09196 V48000000001B95 2020-06-23T09:38:18.000Z VVV-004041
station_xyz__cr.data
0 [{'id': 'V5J000000000B86'}]
1 []
2 [{'id': 'V5J000000000Z10'}]
but I'm struggling to convert the 'station_xyz__cr.data' list of dicts to a simple dataframe of the ids:
df2 = pd.DataFrame(df['station_xyz__cr.data'].tolist(), index= df.index)
df2 = df2.rename(columns = {0:'station_xyz__cr.data'})
df2
station_xyz__cr.data
0 {'id': 'V5J000000000B86'}
1 None
2 {'id': 'V5J000000000Z10'}
The None causes problems when I try to extract further.
I tried replacing the None, but I could only replace it with 0:
df.fillna(0, inplace=True)

Build a boolean mask of the rows that hold None. Then use that mask to set those cells to a default value with the same shape as the other rows (a dict with an 'id' key), so the next stage of the data flow can treat every row the same way.
isna_idx = pd.isnull(df2['station_xyz__cr.data'])
df2.loc[isna_idx, ['station_xyz__cr.data']] = {'id': '...'}
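As an alternative to filling in a placeholder dict, the id can be pulled straight out of each list with a small helper that returns None for an empty list. This is only a sketch: the helper name first_id is hypothetical, and the sample records are a trimmed copy of the data in the question.

```python
import pandas as pd

# Trimmed copy of the API response from the question (only the keys used here).
output = {"data": [
    {"id": "V48000000000F79", "station_xyz__cr": {"data": [{"id": "V5J000000000B86"}]}},
    {"id": "V48000000001B94", "station_xyz__cr": {"data": []}},
    {"id": "V48000000001B95", "station_xyz__cr": {"data": [{"id": "V5J000000000Z10"}]}},
]}


def first_id(items):
    """Hypothetical helper: 'id' of the first dict in items, or None if the list is empty."""
    return items[0]["id"] if items else None


df = pd.json_normalize(output["data"])
# json_normalize flattens nested dicts but leaves the inner list as-is,
# so each cell of 'station_xyz__cr.data' is a (possibly empty) list of dicts.
df["station_xyz__cr.data.id"] = df["station_xyz__cr.data"].apply(first_id)

station_ids = df["station_xyz__cr.data.id"].tolist()
print(station_ids)
```

The middle row ends up with a missing value rather than raising an IndexError, which is the shape the question asks for.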

Related

Convert a single column dataframe into an array/list of dictionaries in python

My dataframe is as shown:
score
timestamp
1645401600.0 10.4
1645405200.0 22.4
1645408800.0 36.2
I want to convert it to an array of dictionaries.
Expected Result is :
result=[
{
timestamp:1645401600.0
score:10.4
},
{
timestamp:1645405200.0
score:22.4
},
{
timestamp:1645408800.0
score:36.2
}
]
Reset the index and then use to_dict:
result = df.reset_index().to_dict('records')
Output:
>>> result
[{'timestamp': 1645401600.0, 'score': 10.4},
{'timestamp': 1645405200.0, 'score': 22.4},
{'timestamp': 1645408800.0, 'score': 36.2}]
df.to_dict('records')
This is what you are looking for
Important: parameter is 'records' and not 'record'
You can use to_dict with 'records' (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html):
df.to_dict('records')
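One caveat worth checking: in the question, timestamp is the index, and to_dict('records') only emits columns. The sketch below rebuilds the sample frame (the construction is an assumption based on the printed output in the question) and compares the two calls.

```python
import pandas as pd

# Rebuild the frame from the question: 'timestamp' is the index, 'score' the only column.
df = pd.DataFrame(
    {"score": [10.4, 22.4, 36.2]},
    index=pd.Index([1645401600.0, 1645405200.0, 1645408800.0], name="timestamp"),
)

# Without reset_index the index values are not part of the records.
without_index = df.to_dict("records")

# reset_index turns the index into a regular column first, so it is kept.
with_index = df.reset_index().to_dict("records")

print(without_index)
print(with_index)
```

So for this particular frame, the reset_index step is what brings timestamp into each dict.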
Here you go
import pandas as pd
df = pd.DataFrame({'score': [10.4, 22.4, 36.2]},
index = [1645401600.0, 1645405200.0, 1645408800.0])
df = df.rename_axis('timestamp').reset_index()
df = df.to_dict('records')
Output
>>> [{'timestamp': 1645401600.0, 'score': 10.4},
{'timestamp': 1645405200.0, 'score': 22.4},
{'timestamp': 1645408800.0, 'score': 36.2}]

How to filter through nested dictionaries in a list [duplicate]

I have a patient_list that contains nested dictionaries as below:
patient_list
[ {"id": 1
"hospital":{"id": 1
"doctor":{"id": 1}}
},
{"id": 2
"hospital":{"id": 2
"doctor":{"id": 1}}
}]
I am trying to filter it so that I get all the patients that belong to the doctor with id=1. I have tried different variations of the following solution, but I did not succeed. I am not sure if it is the correct approach. I will appreciate any help.
id=1
result = [d for d in patient_list['hospital']['doctor'] if d['id'] == id]
You cannot index a list with strings; the lookup has to be done on each list item.
idx = 1
results = [d for d in patient_list if d['hospital']['doctor']['id'] == idx]
This would translate into:
idx = 1
results = []
for patient in patient_list:
    hospital = patient['hospital']
    doctor = hospital['doctor']
    doctor_id = doctor['id']
    if doctor_id == idx:
        results.append(patient)
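If some patients might lack a hospital or doctor entry, chained .get calls avoid a KeyError. The following is a minimal sketch under that assumption: the third record and the helper name doctor_id_of are made up for illustration, and the list literal is written with the commas needed to make it valid Python.

```python
patient_list = [
    {"id": 1, "hospital": {"id": 1, "doctor": {"id": 1}}},
    {"id": 2, "hospital": {"id": 2, "doctor": {"id": 1}}},
    {"id": 3, "hospital": {"id": 2}},  # hypothetical record with no doctor
]


def doctor_id_of(patient):
    """Hypothetical helper: nested doctor id, or None if any level is missing."""
    return patient.get("hospital", {}).get("doctor", {}).get("id")


idx = 1
results = [p for p in patient_list if doctor_id_of(p) == idx]
patient_ids = [p["id"] for p in results]
print(patient_ids)
```

Naming the variable idx rather than id also keeps the built-in id function usable.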
The JSON in the question is incorrect.
Assuming the json is as follows:
In [6]: patient_list
Out[6]:
[{'id': 1, 'hospital': {'id': 1}, 'doctor': {'id': 1}},
{'id': 2, 'hospital': {'id': 2}, 'doctor': {'id': 1}}]
In [7]: [d for d in patient_list if d["doctor"]['id'] == id]
Out[7]:
[{'id': 1, 'hospital': {'id': 1}, 'doctor': {'id': 1}},
{'id': 2, 'hospital': {'id': 2}, 'doctor': {'id': 1}}]

How can I add column values from a pandas dataframe as a new key value pair in another column having dictionary?

I have a dataframe for example
df = {'dicts':[{'id': 0, 'text': 'Willamette'},
{'id': 1, 'text': 'Valley'}],
'ner': ["Person", "Location"]}
df= pd.DataFrame(df)
I want the end result to look like this:
{'id': 0, 'text': 'Willamette', 'ner': 'Person'}
{'id': 1, 'text': 'Valley', 'ner': 'Location'}
I am using the following logic, but it isn't working for me:
for i, rows in df["dicts"].iteritems():
    for cat in df['ner']:
        df["dicts"][i]=df["dicts"][i].update({'ner' : df['ner'][cat]})
How can I solve this?
IIUC
d = pd.DataFrame(df.dicts.tolist(), index=df.index).join(df[['ner']]).to_dict('records')
[{'id': 0, 'text': 'Willamette', 'ner': 'Person'}, {'id': 1, 'text': 'Valley', 'ner': 'Location'}]
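One likely reason the loop in the question fails is that dict.update modifies the dict in place and returns None, so the assignment overwrites each dict with None. A sketch of a simpler route, assuming the same two-row frame, is to pair the two columns with zip and build new dicts:

```python
import pandas as pd

df = pd.DataFrame({
    "dicts": [{"id": 0, "text": "Willamette"}, {"id": 1, "text": "Valley"}],
    "ner": ["Person", "Location"],
})

# Build a new dict per row; {**d, ...} copies d, so the originals stay untouched.
merged = [{**d, "ner": ner} for d, ner in zip(df["dicts"], df["ner"])]

# Optionally store the merged dicts back in the frame.
df["dicts_with_ner"] = merged
print(merged)
```

The column name dicts_with_ner is just an illustrative choice; assigning back to df["dicts"] would work too.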

Extract key and value from json to new dataframe

I have a dataframe whose columns hold JSON values nested several levels deep. I would like to extract the innermost keys and values into a new dataframe. A sample column value is below:
{'shipping_assignments': [{'shipping': {'address': {'address_type':
'shipping', 'city': 'Calder', 'country_id': 'US',
'customer_address_id': 1, 'email': 'roni_cost#example.com',
'entity_id': 1, 'firstname': 'Veronica', 'lastname': 'Costello',
'parent_id': 1, 'postcode': '49628-7978', 'region': 'Michigan',
'region_code': 'MI', 'region_id': 33, 'street': ['6146 Honey Bluff
Parkway'], 'telephone': '(555) 229-3326'}, 'method':
'flatrate_flatrate', 'total': {'base_shipping_amount': 5,
'base_shipping_discount_amount': 0,
'base_shipping_discount_tax_compensation_amnt': 0,
'base_shipping_incl_tax': 5, 'base_shipping_invoiced': 5,
'base_shipping_tax_amount': 0, 'shipping_amount': 5,
'shipping_discount_amount': 0,
'shipping_discount_tax_compensation_amount': 0, 'shipping_incl_tax':
5, 'shipping_invoiced': 5, 'shipping_tax_amount': 0}}, 'items':
[{'amount_refunded': 0, 'applied_rule_ids': '1',
'base_amount_refunded': 0, 'base_discount_amount': 0,
'base_discount_invoiced': 0, 'base_discount_tax_compensation_amount':
0, 'base_discount_tax_compensation_invoiced': 0,
'base_original_price': 29, 'base_price': 29, 'base_price_incl_tax':
31.39, 'base_row_invoiced': 29, 'base_row_total': 29, 'base_row_total_incl_tax': 31.39, 'base_tax_amount': 2.39,
'base_tax_invoiced': 2.39, 'created_at': '2019-09-27 10:03:45',
'discount_amount': 0, 'discount_invoiced': 0, 'discount_percent': 0,
'free_shipping': 0, 'discount_tax_compensation_amount': 0,
'discount_tax_compensation_invoiced': 0, 'is_qty_decimal': 0,
'item_id': 1, 'name': 'Iris Workout Top', 'no_discount': 0,
'order_id': 1, 'original_price': 29, 'price': 29, 'price_incl_tax':
31.39, 'product_id': 1434, 'product_type': 'configurable', 'qty_canceled': 0, 'qty_invoiced': 1, 'qty_ordered': 1,
'qty_refunded': 0, 'qty_shipped': 1, 'row_invoiced': 29, 'row_total':
29, 'row_total_incl_tax': 31.39, 'row_weight': 1, 'sku':
'WS03-XS-Red', 'store_id': 1, 'tax_amount': 2.39, 'tax_invoiced':
2.39, 'tax_percent': 8.25, 'updated_at': '2019-09-27 10:03:46', 'weight': 1, 'product_option': {'extension_attributes':
{'configurable_item_options': [{'option_id': '141', 'option_value':
167}, {'option_id': '93', 'option_value': 58}]}}}]}],
'payment_additional_info': [{'key': 'method_title', 'value': 'Check /
Money order'}], 'applied_taxes': [{'code': 'US-MI--Rate 1', 'title':
'US-MI--Rate 1', 'percent': 8.25, 'amount': 2.39, 'base_amount':
2.39}], 'item_applied_taxes': [{'type': 'product', 'applied_taxes': [{'code': 'US-MI--Rate 1', 'title': 'US-MI--Rate 1', 'percent':
8.25, 'amount': 2.39, 'base_amount': 2.39}]}], 'converting_from_quote': True}
The above is the value of a single row of the dataframe column df['x'].
My code to convert it is below:
sample = data['x'].tolist()
data = json.dumps(sample)
df = pd.read_json(data)
it gives new dataframe with columns
Index(['applied_taxes', 'converting_from_quote', 'item_applied_taxes',
'payment_additional_info', 'shipping_assignments'],
dtype='object')
When I tried the same approach on a column whose rows hold nested values:
m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
m_sample = m_df.tolist()
m_data = json.dumps(m_sample)
c_df = pd.read_json(m_data)
It doesn't work
Check this link to get the beautified_json
I came across a beautiful ETL package in Python called petl. Load the list of dicts (the parsed JSON) into a table with its fromdicts function:
order_table = fromdicts(data_list)
If any of the columns holds a nested dict, use unpackdict(order_table, 'nested_col') to unpack it.
In my case, I need to unpack the applied_taxes column. The code below unpacks it and appends each key and value as a column and row in the same table.
order_table = unpackdict(order_table, 'applied_taxes')
If you want to know more, see petl.
It seems that your mistake was in tolist(). Try the following:
import pandas as pd
import json
import re
data = {"shipping_assignments":[{"shipping":{"address":{"address_type":"shipping","city":"Calder","country_id":"US","customer_address_id":1,"email":"roni_cost#example.com","entity_id":1,"firstname":"Veronica","lastname":"Costello","parent_id":1,"postcode":"49628-7978","region":"Michigan","region_code":"MI","region_id":33,"street":["6146 Honey Bluff Parkway"],"telephone":"(555) 229-3326"},"method":"flatrate_flatrate","total":{"base_shipping_amount":5,"base_shipping_discount_amount":0,"base_shipping_discount_tax_compensation_amnt":0,"base_shipping_incl_tax":5,"base_shipping_invoiced":5,"base_shipping_tax_amount":0,"shipping_amount":5,"shipping_discount_amount":0,"shipping_discount_tax_compensation_amount":0,"shipping_incl_tax":5,"shipping_invoiced":5,"shipping_tax_amount":0}},"items":[{"amount_refunded":0,"applied_rule_ids":"1","base_amount_refunded":0,"base_discount_amount":0,"base_discount_invoiced":0,"base_discount_tax_compensation_amount":0,"base_discount_tax_compensation_invoiced":0,"base_original_price":29,"base_price":29,"base_price_incl_tax":31.39,"base_row_invoiced":29,"base_row_total":29,"base_row_total_incl_tax":31.39,"base_tax_amount":2.39,"base_tax_invoiced":2.39,"created_at":"2019-09-27 10:03:45","discount_amount":0,"discount_invoiced":0,"discount_percent":0,"free_shipping":0,"discount_tax_compensation_amount":0,"discount_tax_compensation_invoiced":0,"is_qty_decimal":0,"item_id":1,"name":"Iris Workout Top","no_discount":0,"order_id":1,"original_price":29,"price":29,"price_incl_tax":31.39,"product_id":1434,"product_type":"configurable","qty_canceled":0,"qty_invoiced":1,"qty_ordered":1,"qty_refunded":0,"qty_shipped":1,"row_invoiced":29,"row_total":29,"row_total_incl_tax":31.39,"row_weight":1,"sku":"WS03-XS-Red","store_id":1,"tax_amount":2.39,"tax_invoiced":2.39,"tax_percent":8.25,"updated_at":"2019-09-27 
10:03:46","weight":1,"product_option":{"extension_attributes":{"configurable_item_options":[{"option_id":"141","option_value":167},{"option_id":"93","option_value":58}]}}}]}],"payment_additional_info":[{"key":"method_title","value":"Check / Money order"}],"applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}],"item_applied_taxes":[{"type":"product","applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}]}],"converting_from_quote":"True"}
df = pd.read_json(json.dumps(data))
m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
c_df = pd.read_json(json.dumps(list(m_df)))
print(c_df)
prints the following:
0
0 {'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-R...
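Another option that avoids the regex and the JSON round trip is pd.json_normalize with record_path, which expands a list of dicts stored under a key into one row per dict. The sketch below uses a trimmed copy of the sample row (only two of its keys); applying it to the real column would presumably look like pd.json_normalize(data['x'].tolist(), record_path='applied_taxes'), assuming each cell is already a dict.

```python
import pandas as pd

# Trimmed copy of one row value from the question.
row = {
    "applied_taxes": [
        {"code": "US-MI-*-Rate 1", "title": "US-MI-*-Rate 1",
         "percent": 8.25, "amount": 2.39, "base_amount": 2.39},
    ],
    "converting_from_quote": True,
}

# record_path points at the list to expand; each dict in it becomes a row,
# and its keys become columns.
taxes = pd.json_normalize([row], record_path="applied_taxes")
print(taxes)
```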

Getting TypeError when trying to retrieve values from keys in a list of dictionaries

I have an array of dictionaries in a pandas DataFrame:
0 [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
1 [{'id': 12, 'name': 'Adventure'}, {'id': 88, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
2 [{'id': 10749, 'name': 'Romance'}, {'id': 77, 'name': 'Horror'}]
I am trying to get all the names from a single row into a simple list of strings, like "Horror, Family, Drama", for each row in the dataset.
I tried this code but I am getting the error: string indices must be integers
for y in df:
    names = [x['name'] for x in y]
Any help is appreciated.
Iterating over a DataFrame iterates over the names of its columns:
In [15]: df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
In [16]: df
Out[16]:
   a  b
0  1  4
1  2  5
2  3  6
In [17]: for x in df:
    ...:     print(x)
    ...:
a
b
It is like a dict, which iterates over its keys.
You need something like:
df['your_column'].apply(lambda x: [d['name'] for d in x])
IIUC, each item is a dict, not a list, so you should use .get:
[[y.get('name') for y in x ]for x in df['your columns']]
Out[578]:
[['Animation', 'Comedy', 'Family'],
['Adventure', 'Fantasy', 'Family'],
['Romance', 'Horror']]
If the column holds strings rather than lists, convert them first:
import ast
df.a=df.a.apply(ast.literal_eval)
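Putting the pieces together, here is a minimal sketch that assumes the column holds string representations of the lists, which would explain the "string indices must be integers" error. The column names genres and names are hypothetical, and the rows are shortened from the question.

```python
import ast

import pandas as pd

# Assumption: each cell is a string that looks like a list of dicts.
df = pd.DataFrame({"genres": [
    "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]",
    "[{'id': 10749, 'name': 'Romance'}, {'id': 77, 'name': 'Horror'}]",
]})

# Parse each string into a real list of dicts (literal_eval only accepts literals).
parsed = df["genres"].apply(ast.literal_eval)

# Join the 'name' values of each row into one comma-separated string.
df["names"] = parsed.apply(lambda items: ", ".join(d["name"] for d in items))

names = df["names"].tolist()
print(names)
```

If the cells are already lists, the literal_eval step can be dropped and the second apply used directly on the column.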
