Python Create dataframe from nested dict with lists - python

I am trying to create a dataframe / csv that looks like this
App
id
stages
requestCpu
requestMemory
appName
123
dev
1000
1024
appName
123
staging
3200
1024
The dict data looks like this and includes quite a lot of apps, however all the data inside the apps looks the same with the dict layout:
test_data = {"appName": {"id": "123", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}, "appName2"...}
I used something like this before:
df = pd.DataFrame.from_dict(test_data, orient='index')
df = pd.concat([df.drop(['stages'], axis=1), (df['stages'].apply(pd.Series))], axis=1)
df.index.name = "App"
However this wasn't able to split up the list part and also the stages were now in columns so not how i wanted it to look..
Any help much appreciated, thanks

Easiest solution would be to iterate the rows prior to loading it with pandas:
import pandas as pd
test_data = {"appName": {"id": "123", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}, "appName2": {"id": "456", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}}
rows = []
for app, app_data in test_data.items():
for stage, stage_data in app_data["stages"].items():
row = {
"App": app,
"id": app_data["id"],
"stages": stage
}
for metric in stage_data:
metric_name, metric_value = list(metric.items())[0]
row[metric_name] = metric_value
rows.append(row)
df = pd.json_normalize(rows)
# Reorder columns
df = df[["App", "id", "stages", "request.cpu", "request.memory"]]
Output:
App
id
stages
request.cpu
request.memory
0
appName
123
dev
1000
1024
1
appName
123
staging
3200
1024
2
appName2
456
dev
1000
1024
3
appName2
456
staging
3200
1024

Related

How to remove delimeted pipe from my json column and split them to different columns and their respective values

"description": ID|100|\nName|Sam|\nCity|New York City|\nState|New York|\nContact|1234567890|\nEmail|1234#yahoo.com|
This is how my code in json looks like. I wanted to convert this json file to excel sheet to split the nested column to separate columns and have used pandas for it, but couldn't achieve it. The output I want in my excel sheet is:
ID Name City State Contact Email
100 Sam New York City New York 1234567890 1234#yahoo.com
I want to remove those pipes and the solution should be in pandas. Please help me out with this.
The code I am trying:
I want output as:
The output on excel sheet:
[2]: https://i.stack.imgur.com/QjSUU.png
The list of dict column looks like:
"assignees": [{
"id": 1234,
"username": "xyz",
"name": "XYZ",
"state": "active",
"avatar_url": "aaaaaaaaaaaaaaa",
"web_url": "bbbbbbbbbbb"
},
{
"id": 5678,
"username": "abcd",
"name": "ABCD",
"state": "active",
"avatar_url": "hhhhhhhhhhh",
"web_url": "mmmmmmmmm"
}
],
This could be one way:
import pandas as pd
df = pd.read_json('Sample.json')
df2 = pd.DataFrame()
for i in df.index:
desc = df['description'][i]
attributes = desc.split("\n")
d = {}
for attrib in attributes:
if not(attrib.startswith('Name') or attrib.startswith('-----')):
kv = attrib.split("|")
d[kv[0]] = kv[1]
df2 = df2.append(d, ignore_index=True)
print(df2)
df2.to_csv("output.csv")
Output xls:

convert json data to pandas dataframe in python (dictionary inside list )

I have json data like below:
{"name": "Monkey", "image": "https://media.npr.org/assets/img/2017/09/12/macaca_nigra_self-portrait-3e0070aa19a7fe36e802253048411a38f14a79f8-s800-c85.webp", "attributes": [{"trait_type": "Bones", "value": "Zombie"}, {"trait_type": "Clothes", "value": "Striped"}, {"trait_type": "Mouth", "value": "Bubblegum"}, {"trait_type": "Eyes", "value": "Black Sunglasses"}, {"trait_type": "Hat", "value": "Sushi"}, {"trait_type": "Background", "value": "Purple"}]}
I want to convert this json data as pandas dataframe only selecting the attributes as filter it as below:
Bones Clothes Mouth Eyes Hat Background
zombie striped bubblegum black sushi purple
Can any expert please help me to get the output as i mentioned
Thank you
There is probably a prettier solution but this does the job:
import json
import pandas as pd
with open('file.json') as f:
trait_types= []
values = []
data = json.load(f)
df = pd.DataFrame(data)
for key in data['attributes']:
trait_types.append(key['trait_type'])
values.append(key['value'])
df = pd.DataFrame({
'trait type': trait_types,
'value' : values})
print(df)

Flattening nested JSON to pandas.DataFrame: Ordering and Naming Columns based on dictionary values

My question raised when I exploited this helpful answer provided by Trenton McKinney on the issue of flattening multiple nested JSON-files for handling in pandas.
Following his advice, I have used the flatten_json function described here to flatten a batch of nested json files. However, I have run into a problem with the uniformity of my JSON-files.
A single JSON-File looks roughly like this made-up example data:
{
"product": "example_productname",
"product_id": "example_productid",
"product_type": "example_producttype",
"producer": "example_producer",
"currency": "example_currency",
"client_id": "example_clientid",
"supplement": [
{
"supplementtype": "RTZ",
"price": 300000,
"rebate": "500",
},
{
"supplementtype": "CVB",
"price": 500000,
"rebate": "250",
},
{
"supplementtype": "JKL",
"price": 100000,
"rebate": "750",
},
],
}
Utilizing the referenced code, I will end up with data looking like this:
product
product_id
product_type
producer
currency
client_id
supplement_0_supplementtype
supplement_0_price
supplement_0_rebate
supplement_1_supplementtype
supplement_1_price
supplement_1_rebate
etc
example_productname
example_productid
example_type
example_producer
example_currency
example_clientid
RTZ
300000
500
CVB
500000
250
etc
example_productname2
example_productid2
example_type2
example_producer2
example_currency2
example_clientid2
CVB
500000
250
RTZ
300000
500
etc
There are multiple issues with this.
Firstly, in my data, there is a limited list of "supplements", however, they do not always appear, and if they do, they are not always in the same order. In the example table, you can see that the two "supplements" switched positions in the second row. I would prefer a fixed order of the "supplement columns".
Secondly, the best option would be a table like this:
product
product_id
product_type
producer
currency
client_id
supplement_RTZ_price
supplement_RTZ_rebate
supplement_CVB_price
supplement_CVB_rebate
etc
example_productname
example_productid
example_type
example_producer
example_currency
example_clientid
300000
500
500000
250
etc
I have tried editing the flatten_json function referenced, but I don't have an inkling of how to make this work.
The solution consists of simply editing the dictionary (thanks to Andrej Kesely). I just added a pass to exceptions in case some columns are inexistent.
d = {
"product": "example_productname",
"product_id": "example_productid",
"product_type": "example_producttype",
"producer": "example_producer",
"currency": "example_currency",
"client_id": "example_clientid",
"supplement": [
{
"supplementtype": "RTZ",
"price": 300000,
"rebate": "500",
},
{
"supplementtype": "CVB",
"price": 500000,
"rebate": "250",
},
{
"supplementtype": "JKL",
"price": 100000,
"rebate": "750",
},
],
}
for s in d["supplement"]:
try:
d["supplementtype_{}_price".format(s["supplementtype"])] = s["price"]
except:
pass
try:
d["supplementtype_{}_rebate".format(s["supplementtype"])] = s["rebate"]
except:
pass
del d["supplement"]
df = pd.DataFrame([d])
print(df)
product product_id product_type producer currency client_id supplementtype_RTZ_price supplementtype_RTZ_rebate supplementtype_CVB_price supplementtype_CVB_rebate supplementtype_JKL_price supplementtype_JKL_rebate
0 example_productname example_productid example_producttype example_producer example_currency example_clientid 300000 500 500000 250 100000 750
The used/referenced code:
def flatten_json(nested_json: dict, exclude: list=[''], sep: str='_') -> dict:
"""
Flatten a list of nested dicts.
"""
out = dict()
def flatten(x: (list, dict, str), name: str='', exclude=exclude):
if type(x) is dict:
for a in x:
if a not in exclude:
flatten(x[a], f'{name}{a}{sep}')
elif type(x) is list:
i = 0
for a in x:
flatten(a, f'{name}{i}{sep}')
i += 1
else:
out[name[:-1]] = x
flatten(nested_json)
return out
# list of files
files = ['test1.json', 'test2.json']
# list to add dataframe from each file
df_list = list()
# iterate through files
for file in files:
with open(file, 'r') as f:
# read with json
data = json.loads(f.read())
# flatten_json into a dataframe and add to the dataframe list
df_list.append(pd.DataFrame.from_dict(flatten_json(data), orient='index').T)
# concat all dataframes together
df = pd.concat(df_list).reset_index(drop=True)
You can modify the dictionary before you create dataframe from it:
d = {
"product": "example_productname",
"product_id": "example_productid",
"product_type": "example_producttype",
"producer": "example_producer",
"currency": "example_currency",
"client_id": "example_clientid",
"supplement": [
{
"supplementtype": "RTZ",
"price": 300000,
"rebate": "500",
},
{
"supplementtype": "CVB",
"price": 500000,
"rebate": "250",
},
{
"supplementtype": "JKL",
"price": 100000,
"rebate": "750",
},
],
}
for s in d["supplement"]:
d["supplementtype_{}_price".format(s["supplementtype"])] = s["price"]
d["supplementtype_{}_rebate".format(s["supplementtype"])] = s["rebate"]
del d["supplement"]
df = pd.DataFrame([d])
print(df)
Prints:
product product_id product_type producer currency client_id supplementtype_RTZ_price supplementtype_RTZ_rebate supplementtype_CVB_price supplementtype_CVB_rebate supplementtype_JKL_price supplementtype_JKL_rebate
0 example_productname example_productid example_producttype example_producer example_currency example_clientid 300000 500 500000 250 100000 750

How to handle a JSON that returns a list of dict-like objects in Pandas?

I am using an API from collegefootballdata.com to get data on scores and betting lines. I want to use betting lines to infer expected win % and then compare that to actual results (I feel like my team loses too many games where we are big favorites and want to test that.) This code retrieves one game for example purposes.
parameters = {
"gameId": 401112435,
"year": 2019
}
response = requests.get("https://api.collegefootballdata.com/lines", params=parameters)
The JSON output is this:
[
{
"awayConference": "ACC",
"awayScore": 28,
"awayTeam": "Virginia Tech",
"homeConference": "ACC",
"homeScore": 35,
"homeTeam": "Boston College",
"id": 401112435,
"lines": [
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "57.5",
"provider": "consensus",
"spread": "4.5"
},
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "57",
"provider": "Caesars",
"spread": "4.5"
},
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "58",
"provider": "numberfire",
"spread": "4.5"
},
{
"formattedSpread": "Virginia Tech -4.5",
"overUnder": "56.5",
"provider": "teamrankings",
"spread": "4.5"
}
],
"season": 2019,
"seasonType": "regular",
"week": 1
}
]
I'm then loading into a pandas dataframe with:
def jstring(obj):
# create a formatted string of the Python JSON object
text = json.dumps(obj, sort_keys=True, indent=4)
return text
json_str = jstring(response.json())
df = pd.read_json(json_str)
This creates a dataframe with a "lines" column that contains the entire lines section of the JSON as a string. Ultimately, I want to use the "spread" value in the block where "provider" = "consensus". Everything else is extraneous for my purposes. I've tried exploding the column with
df = df.explode('lines')
which gives me 4 rows with something like this for each game (as expected):
{'formattedSpread': 'Virginia Tech -4.5', 'overUnder': '57.5', 'provider': 'consensus', 'spread': '4.5'}
Here is where I'm stuck. I want to keep only the rows where 'provider' = 'consensus', and further I need to have 'spread' to use as a separate variable/column in my analysis. I've tried exploding a 2nd time, df.split, df.replace to change { to [ and explode as a list, all to no avail. Any help is appreciated!!
This is probably what you're looking for -
EDIT: Handling special case.
import pandas as pd
import requests
params = {
"gameId": 401112435,
"year": 2019,
}
r = requests.get("https://api.collegefootballdata.com/lines", params=params)
df = pd.DataFrame(r.json()) # Create a DataFrame with a lines column that contains JSON
df = df.explode('lines') # Explode the DataFrame so that each line gets its own row
df = df.reset_index(drop=True) # After explosion, the indices are all the same - this resets them so that you can align the DataFrame below cleanly
def fill_na_lines(lines):
if pd.isna(lines):
return {k: None for k in ['provider', 'spread', 'formattedSpread', 'overUnder']}
return lines
df.lines = df.lines.apply(fill_na_lines)
lines_df = pd.DataFrame(df.lines.tolist()) # A separate lines DataFrame created from the lines JSON column
df = pd.concat([df, lines_df], axis=1) # Concatenating the two DataFrames along the vertical axis.
# Now you can filter down to whichever rows you need.
df = df[df.provider == 'consensus']
The documentation on joining DataFrames in different ways is probably useful.

How to convert JSON into a table using python?

HI I'm a beginner with python.
I have a csv file which I retrieve from my database. The table has several columns, one of which contains data in json. I have difficulty converting json data into a table to be saved in pdf.
First I load the csv. Then I take the column that has the data in json and I convert them.
df = pd.DataFrame(pd.read_csv('database.csv',sep=';'))
print(df.head())
for index, row in df.iterrows():
str = row["json_data"]
val = ast.literal_eval(str)
val1 = json.loads(str)
A sample of my json is that
{
"title0": {
"id": "1",
"ex": "90",
},
"title1": {
"name": "Alex",
"surname": "Boris",
"code": "RBAMRA4",
},
"title2": {
"company": "LA",
"state": "USA",
"example": "9090",
}
}
I'm trying to create a table like this
-------------------------------------------\
title0
--------------------------------------------\
id 1
ex 90
---------------------------------------------\
title 1
---------------------------------------------\
name Alex
surname Boris
code RBAMRA4
----------------------------------------------\
title2
----------------------------------------------\
company LA
state USA
example 9090
You can use the Python JSON library to achieve this.
import json
my_str = open("outfile").read()
val1 = json.loads(my_str)
for key in val1:
print("--------------------\n"+key+"\n--------------------")
for k in val1[key]:
print(k, val1[key][k])
Load the JSON data into the json.jsonloads function, this will deserialize the string and convert it to a python object (the whole object becomes a dict).
Then you loop through the dict the way you want.
--------------------
title0
--------------------
id 1
ex 90
--------------------
title1
--------------------
name Alex
surname Boris
code RBAMRA4
--------------------
title2
--------------------
company LA
state USA
example 9090
Read about parsing a dict, then you will understand the for loop.

Categories

Resources