Converting nested dictionary into a Pandas DataFrame - python

I have the following dictionary
{'data':[{'action_values': [
{'action_type': 'offsite_conversion', 'value': '5479.8'},
{'action_type': 'omni_add_to_cart', 'value': '9217.55'},
{'action_type': 'omni_purchase', 'value': '5479.8'},
{'action_type': 'add_to_cart', 'value': '9217.55'}]}]}
I am trying to convert it so that each action_type becomes a pandas DataFrame column and the corresponding value becomes the row. Something like:
   offsite_conversion  omni_add_to_cart  omni_purchase  add_to_cart
0              5479.8           9217.55         5479.8      9217.55

Using pd.json_normalize():
import pandas as pd
# data is the dictionary from the question
df = pd.json_normalize(data=data["data"], record_path="action_values").transpose().reset_index(drop=True)
df = df.rename(columns=df.iloc[0]).drop(df.index[0]).reset_index(drop=True)
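A shorter alternative, offered only as a sketch: build each row directly with a dict comprehension. It assumes every action_type occurs at most once per entry, and it converts the string values to floats, which the answer above does not do.

```python
import pandas as pd

data = {'data': [{'action_values': [
    {'action_type': 'offsite_conversion', 'value': '5479.8'},
    {'action_type': 'omni_add_to_cart', 'value': '9217.55'},
    {'action_type': 'omni_purchase', 'value': '5479.8'},
    {'action_type': 'add_to_cart', 'value': '9217.55'}]}]}

# Each element of data['data'] becomes one row: action_type -> value.
# float() is an added assumption; drop it to keep the original strings.
rows = [
    {item['action_type']: float(item['value']) for item in entry['action_values']}
    for entry in data['data']
]
df = pd.DataFrame(rows)
print(df)
```

This avoids the transpose and rename steps, and it scales to several entries in data['data'] because each entry simply adds another row.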

Related

How can I change the dictionary output format in Python

I am getting output in one format, but I want it in a different format.
Any help will be appreciated.
Thank you in advance.
I've tried to convert my data into an array, but it doesn't work the way I want.
This is my output:
{'date': '2021-12-30 17:31:05.865139', 'sub_data': [{'key': 'day0', 'value': 255}, {'key': 'day1', 'value': 1}, {'key': 'day3', 'value': 8}, {'key': 'day7', 'value': 2}, {'key': 'day15', 'value': 3}, {'key': 'day30', 'value': 5}]}
{'date': '2021-12-31 17:31:05.907697', 'sub_data': [{'key': 'day0', 'value': 222}, {'key': 'day1', 'value': 1}, {'key': 'day3', 'value': 0}, {'key': 'day7', 'value': 0}, {'key': 'day15', 'value': 1}, {'key': 'day30', 'value': 0}]}]
There are a few ways to build the pandas DataFrame you want. Your output data is nested, so the values have to be pulled out first. One complication is that in the sub data the dictionary keys are literally called 'key' rather than the actual column name. With a custom function you can prepare the data as needed:
Option I:
import pandas as pd
from datetime import datetime

def generate_dataframe(dataset):
    # Collect one dict per row; DataFrame.append was removed in pandas 2.0
    rows = []
    for data in dataset:
        dataframe_row = {}
        # Convert the date string to a short day/month/year label
        date_time_obj = datetime.strptime(data['date'], '%Y-%m-%d %H:%M:%S.%f')
        dataframe_row['date'] = date_time_obj.strftime("%d%b%y")
        # Use each 'key' entry as the column name and 'value' as the cell
        for vals in data['sub_data']:
            dataframe_row[vals['key']] = vals['value']
        rows.append(dataframe_row)
    return pd.DataFrame(rows)

# output_I and output_II are the two dictionaries shown in the question
dataset = [output_I, output_II]
df = generate_dataframe(dataset)
Option II:
Extract the sub data and transpose it:
def process_sub_data(data):
    # Convert sub_data to a DataFrame first
    df_data = pd.DataFrame(data['sub_data'])
    # Transpose the DataFrame
    df_data = df_data.T
    # Promote the first row (the keys) to column names
    df_data.columns = df_data.iloc[0]
    df_data = df_data.iloc[1:].reset_index(drop=True)
    return df_data
Option III:
You can try to format the nested data with
df_res = pd.json_normalize(data, max_level=2)
This will not work properly here, because your column names (day1, ..., day30) are stored as values, not as keys of the dict.
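json_normalize can still be part of a solution if the sub data is first flattened to long format and then pivoted. The following is a sketch under assumptions: the sample is abbreviated from the question to two keys per entry, and the names long_df and wide_df are illustrative.

```python
import pandas as pd

# Abbreviated version of the question's data (only day0 and day1 kept).
dataset = [
    {'date': '2021-12-30 17:31:05.865139',
     'sub_data': [{'key': 'day0', 'value': 255}, {'key': 'day1', 'value': 1}]},
    {'date': '2021-12-31 17:31:05.907697',
     'sub_data': [{'key': 'day0', 'value': 222}, {'key': 'day1', 'value': 1}]},
]

# Long format: one row per (date, key) pair with columns key, value, date.
long_df = pd.json_normalize(dataset, record_path='sub_data', meta=['date'])

# Wide format: one row per date, one column per key.
wide_df = long_df.pivot(index='date', columns='key', values='value').reset_index()
wide_df.columns.name = None  # drop the leftover 'key' label on the column axis
print(wide_df)
```

Note that pivot sorts the new columns alphabetically, so with the full data day15 would come before day3; reorder the columns afterwards if that matters.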
Hope I could help :)

Pandas: Convert dictionary to dataframe where keys and values are the columns

I have a dictionary like so:
d = {'3a0fe308-b78d-4080-a68b-84fdcbf5411e': 'SUCCEEDED-HALL-IC_GBI', '7c975c26-f9fc-4579-822d-a1042b82cb17': 'SUCCEEDED-AEN-IC_GBI', '9ff20206-a841-4dbf-a736-a35fcec604f3': 'SUCCEEDED-ESP2-IC_GBI'}
I would like to convert my dictionary into something like this, with all the keys in one list and all the values in another, so that I can build a DataFrame from it:
d = {'key': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e', '7c975c26-f9fc-4579-822d-a1042b82cb17', '9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI', 'SUCCEEDED-AEN-IC_GBI', 'SUCCEEDED-ESP2-IC_GBI']}
What would be the best way to go about this?
You can easily create a DataFrame like this:
import pandas as pd
d = {'3a0fe308-b78d-4080-a68b-84fdcbf5411e': 'SUCCEEDED-HALL-IC_GBI',
'7c975c26-f9fc-4579-822d-a1042b82cb17': 'SUCCEEDED-AEN-IC_GBI',
'9ff20206-a841-4dbf-a736-a35fcec604f3': 'SUCCEEDED-ESP2-IC_GBI'}
table = pd.DataFrame(d.items(), columns=['key', 'value'])
If you just want to rearrange your Dictionary you could do this:
d2 = {'key': list(d.keys()), 'value': list(d.values())}
Since you tagged pandas, try:
pd.Series(d).reset_index(name='value').to_dict('list')
Output:
{'index': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e',
'7c975c26-f9fc-4579-822d-a1042b82cb17',
'9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI',
'SUCCEEDED-AEN-IC_GBI',
'SUCCEEDED-ESP2-IC_GBI']}
Pure python:
{'key':list(d.keys()), 'value': list(d.values())}
output:
{'key': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e',
'7c975c26-f9fc-4579-822d-a1042b82cb17',
'9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI',
'SUCCEEDED-AEN-IC_GBI',
'SUCCEEDED-ESP2-IC_GBI']}
You can create the DataFrame by zipping the key and value lists with the zip function:
import pandas as pd
df = pd.DataFrame(list(zip(d.keys(),d.values())), columns=['key','value'])

Converting Dictionary Values to new Dictionary

I am pulling json data from an API and have a number of columns in my dataframe that contain dictionaries. These dictionaries are written so that the id and the value are two separate entries in the dictionary like so:
{'id': 'AnnualUsage', 'value': '13071'}
Some of the rows for these columns contain only one dictionary entry like shown above, but others can contain up to 7:
[{'id': 'AnnualUsage', 'value': '13071'},
{'id': 'TestId', 'value': 'Z13753'},
{'id': 'NumberOfMe', 'value': '3'},
{'id': 'Prem', 'value': '960002'},
{'id': 'ProjectID', 'value': '0039'},
{'id': 'Region', 'value': 'CHR'},
{'id': 'Tariff', 'value': 'Multiple'},
{'id': 'Number', 'value': '06860702'}]
When I attempt to break this dictionary down into separate column attributes
CTG_df2 = pd.concat([CTG_df['id'], CTG_df['applicationUDFs'].apply(pd.Series)], axis=1)
I end up with columns in a dataframe each containing a dictionary of the above entry i.e.
{'id': 'AnnualUsageDE', 'value': '13071'}
Is there a way for me to convert my dictionary values into new key-value pairs? For instance I would like to convert from:
{'id': 'AnnualUsageDE', 'value': '13071'}
to
{'AnnualUsageDE': '13071'}
If this is possible I will then be able to create new columns from these attributes.
You can do a dict comprehension. From your list of dicts, compose a new dict where the key is the id of each element and the value is the value of each element.
original = [{'id': 'AnnualUsage', 'value': '13071'},
{'id': 'TestId', 'value': 'Z13753'},
{'id': 'NumberOfMe', 'value': '3'},
{'id': 'Prem', 'value': '960002'},
{'id': 'ProjectID', 'value': '0039'},
{'id': 'Region', 'value': 'CHR'},
{'id': 'Tariff', 'value': 'Multiple'},
{'id': 'Number', 'value': '06860702'}]
newdict = {subdict['id']: subdict['value'] for subdict in original}
print(newdict)
# Keys keep their insertion order on Python 3.7+ (shown wrapped for readability):
# {'AnnualUsage': '13071',
# 'TestId': 'Z13753',
# 'NumberOfMe': '3',
# 'Prem': '960002',
# 'ProjectID': '0039',
# 'Region': 'CHR',
# 'Tariff': 'Multiple',
# 'Number': '06860702'}
You can iterate through the list and add each id/value pair to a new dictionary:
newdict = dict()
for x in original:
    newdict[x["id"]] = x["value"]
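To get from the converted dictionaries to actual columns, the same comprehension can be applied to every cell of the column. This is a minimal sketch: the small CTG_df below is a hypothetical stand-in for the asker's DataFrame, and flatten_udfs is an illustrative helper name. It assumes a cell holds either a single dict or a list of dicts.

```python
import pandas as pd

# Hypothetical stand-in for CTG_df: one cell is a single dict, the other a list.
CTG_df = pd.DataFrame({
    'id': [1, 2],
    'applicationUDFs': [
        {'id': 'AnnualUsage', 'value': '13071'},
        [{'id': 'AnnualUsage', 'value': '500'}, {'id': 'Region', 'value': 'CHR'}],
    ],
})

def flatten_udfs(cell):
    """Turn one dict or a list of {'id', 'value'} dicts into {id: value}."""
    entries = [cell] if isinstance(cell, dict) else cell
    return {entry['id']: entry['value'] for entry in entries}

# One flattened dict per row; missing attributes become NaN in the new columns.
udf_columns = pd.DataFrame(
    CTG_df['applicationUDFs'].apply(flatten_udfs).tolist(),
    index=CTG_df.index,
)
CTG_df2 = pd.concat([CTG_df['id'], udf_columns], axis=1)
print(CTG_df2)
```

Passing index=CTG_df.index keeps the new columns aligned with the original rows even if the DataFrame does not use a default integer index.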

Separate pd DataFrame Rows that are dictionaries into columns

I am extracting some data from an API and having challenges transforming it into a proper dataframe.
The resulting DataFrame df is arranged as such:
Index Column
0 {'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]}
1 {'different-email#email.com': [{'action': 'data', 'date': 'date'}]}
I am trying to split the emails into one column and the list into a separate column:
Index Column1 Column2
0 email#email.com [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]
Ideally, each 'action'/'date' pair would have its own separate row; however, I believe I can do the further unpacking myself.
After looking around I tried lots of solutions without success, such as:
df.apply(pd.Series) # does nothing
pd.DataFrame(df['column'].values.tolist()) # makes each dictionary key a separate column,
# where most of the rows are NaN except the one that holds the paired value
Edit:
Since several comments asked about the initial format of the data from the API: it's a list of dictionaries:
[{'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]},{'different-email#email.com': [{'action': 'data', 'date': 'date'}]}]
Thanks
One naive way of doing this is as below:
import pandas as pd

inp = [{'email#email.com': [{'action': 'data', 'date': 'date'}, {'action': 'data', 'date': 'date'}]}
       , {'different-email#email.com': [{'action': 'data', 'date': 'date'}]}]
rows = []  # DataFrame.set_value was removed in pandas 1.0, so collect rows first
for each in inp:  # iterate through the list of dicts
    for k, v in each.items():  # take each key/value pair
        for eachv in v:  # the value is a list, so iterate through it
            print(str(eachv))
            rows.append({'Column1': k, 'Column2': str(eachv)})
df = pd.DataFrame(rows)
I am sure there might be a better way of writing this. Hope this helps :)
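Assuming you have already read it into a DataFrame, you can use the following:
import ast
# literal_eval is only needed when the cells are strings rather than dicts
df['Column'] = df['Column'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
# Each dict holds a single pair; dict views cannot be indexed in Python 3
df['email'] = df['Column'].apply(lambda x: next(iter(x.keys())))
df['value'] = df['Column'].apply(lambda x: next(iter(x.values())))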
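If the goal is one row per action/date pair, as the question mentions, the original list of dictionaries can be flattened before any DataFrame is built. A minimal sketch, assuming the input has the list-of-dicts shape given in the question's edit; the column name email is an illustrative choice.

```python
import pandas as pd

inp = [
    {'email#email.com': [{'action': 'data', 'date': 'date'},
                         {'action': 'data', 'date': 'date'}]},
    {'different-email#email.com': [{'action': 'data', 'date': 'date'}]},
]

# One output row per event: the email plus that event's own keys.
rows = [
    {'email': email, **event}
    for item in inp
    for email, events in item.items()
    for event in events
]
df = pd.DataFrame(rows)
print(df)
```

Because each event dict is unpacked into the row, action and date arrive as real columns rather than as a stringified dictionary.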

JSON to Pandas Dataframe not knowing if JSON will have all the columns of the dataframe

I am doing a research project and trying to pull thousands of quarterly results for companies from the SEC EDGAR API.
Each result is a list of dictionaries structured as follows:
[{'field': 'othercurrentliabilities', 'value': 6886000000.0},
{'field': 'otherliabilities', 'value': 13700000000.0},
{'field': 'propertyplantequipmentnet', 'value': 15789000000.0}...]
I want each result to be a row of a pandas DataFrame. The issue is that the results may not all have the same fields, depending on the data available. For each column (field) of the DataFrame, I would like to check whether it is present in the result's fields and, if it is, add the result's value to the row. If not, I would like to add np.nan. How would I go about doing this?
A list/dict comprehension ought to work here:
In [11]: s
Out[11]:
[[{'field': 'othercurrentliabilities', 'value': 6886000000.0},
{'field': 'otherliabilities', 'value': 13700000000.0},
{'field': 'propertyplantequipmentnet', 'value': 15789000000.0}],
[{'field': 'othercurrentliabilities', 'value': 6886000000.0}]]
In [12]: pd.DataFrame([{d["field"]: d["value"] for d in row} for row in s])
Out[12]:
   othercurrentliabilities  otherliabilities  propertyplantequipmentnet
0             6.886000e+09      1.370000e+10               1.578900e+10
1             6.886000e+09               NaN                        NaN
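If the DataFrame should always have a fixed set of columns, including fields that no result contains, reindex can enforce that after the comprehension. A sketch only: expected_columns is a hypothetical list and 'goodwill' is an invented field name used for illustration.

```python
import pandas as pd

s = [[{'field': 'othercurrentliabilities', 'value': 6886000000.0},
      {'field': 'otherliabilities', 'value': 13700000000.0}],
     [{'field': 'othercurrentliabilities', 'value': 6886000000.0}]]

# Hypothetical list of columns the final DataFrame must always contain.
expected_columns = ['othercurrentliabilities', 'otherliabilities', 'goodwill']

df = pd.DataFrame([{d['field']: d['value'] for d in row} for row in s])
# reindex adds missing columns filled with NaN and drops unexpected ones.
df = df.reindex(columns=expected_columns)
print(df)
```

Fields absent from a single result are already filled with NaN by the DataFrame constructor; reindex only adds the columns that are absent from every result.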
Make a list of df.result.rows[x]['values'], like below:
s = []
for x in range(df.result.totalrows[0]):
    s.append(df.result.rows[x]['values'])
    print(x)
df1 = pd.DataFrame([{d["field"]: d["value"] for d in row} for row in s])
df1
This will give you the result.
