import nested data into pandas from a json file


I have a generated file as follows:
[{"intervals": [{"overwrites": 35588.4, "latency": 479.52}, {"overwrites": 150375.0, "latency": 441.1485001192274}], "uid": "23"}]
I simplified the file a bit for space reasons (there are more columns besides "overwrites" and "latency"). I would like to import the data into a dataframe so I can later plot the latency. I tried the following:
with open(os.path.join(path, "my_file.json")) as json_file:
    curr_list = json.load(json_file)
df = pd.Series(curr_list[0]['intervals'])
print(df)
which returned:
0 {u'overwrites': 35588.4, u'latency...
1 {u'overwrites': 150375.0, u'latency...
However, I couldn't store the data in a structure that lets me access the latency field like this:
graph = df[['latency']]
graph.plot(title="latency")
Any ideas?
Thanks for the help!

I think you can use json_normalize:
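import pandas as pd

data = [{"intervals": [{"overwrites": 35588.4, "latency": 479.52},
                       {"overwrites": 150375.0, "latency": 441.1485001192274}],
         "uid": "23"}]

# pandas >= 1.0 exposes json_normalize at the top level;
# older versions need: from pandas.io.json import json_normalize
result = pd.json_normalize(data, 'intervals', ['uid'])
# (column order in the printed frame can vary with the pandas version)
print(result)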
    latency  overwrites uid
0  479.5200     35588.4  23
1  441.1485    150375.0  23
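To connect this back to the question, here is a minimal sketch that reads the same structure from a file-like object and then selects the latency column. The `io.StringIO` object is only a stand-in for `my_file.json`, and the plotting call is left commented out because it assumes matplotlib is installed.

```python
import io
import json

import pandas as pd

# Stand-in for my_file.json from the question (same structure, inlined here).
json_file = io.StringIO(
    '[{"intervals": [{"overwrites": 35588.4, "latency": 479.52},'
    ' {"overwrites": 150375.0, "latency": 441.1485001192274}], "uid": "23"}]'
)
curr_list = json.load(json_file)

# One row per interval; "uid" is repeated on every row as metadata.
df = pd.json_normalize(curr_list, record_path="intervals", meta=["uid"])

# Column selection now works the way the question wanted.
graph = df[["latency"]]
print(graph)
# graph.plot(title="latency")  # assumes matplotlib is installed
```

Since each interval is a flat dict, `pd.DataFrame(curr_list[0]['intervals'])` should give the same two value columns when `uid` is not needed.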

Related

How to use pandas to get output in tabular format

The code below worked fine until I tried to prettify the output with pandas. Can anyone suggest how to print the output in tabular format, with headers and borders?
Old code:
import eikon as ek
ek.set_app_key('8854542454521546fgf4f4gfg5f4')
df, err = ek.get_data('ESCO.NS',['TR.DivUnadjustedGross','TR.DivExDate','TR.DivType'],{'SDate': '2020-07-01','EDate': '2021-07-26','DivType': '61:70'})
print(df)
New code:
import eikon as ek
import pandas as pd
ek.set_app_key('8854542454521546fgf4f4gfg5f4')
df = pd.dataframe(ek.get_data('ESCO.NS',['TR.DivUnadjustedGross','TR.DivExDate','TR.DivType'],{'SDate': '2020-07-01','EDate': '2021-07-26','DivType': '61:70'}))
print(df, headers='Keys',tablefmt='psql')

ek.get_data returns two values, the data and an error object, so unpack them before building the DataFrame. Note also that the class is pd.DataFrame (capitalised), not pd.dataframe:
import eikon as ek
import pandas as pd

ek.set_app_key('8854542454521546fgf4f4gfg5f4')
data, err = ek.get_data('ESCO.NS', ['TR.DivUnadjustedGross', 'TR.DivExDate', 'TR.DivType'],
                        {'SDate': '2020-07-01', 'EDate': '2021-07-26', 'DivType': '61:70'})
df = pd.DataFrame(data)
print(df)
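The `headers=` and `tablefmt='psql'` arguments in the question belong to the third-party tabulate package (`tabulate(df, headers='keys', tablefmt='psql')`), not to `print`. If installing it is not an option, a rough sketch of a bordered table using only pandas could look like the following. The helper name `to_bordered_table` and the sample frame are made up for illustration; the real frame would come from `ek.get_data`.

```python
import pandas as pd


def to_bordered_table(df: pd.DataFrame) -> str:
    """Render df as text with a header row and +---+ borders (hypothetical helper)."""
    headers = [str(c) for c in df.columns]
    rows = [[str(v) for v in row] for row in df.itertuples(index=False)]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(cell) for cell in col) for col in zip(headers, *rows)]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    return "\n".join([rule, fmt(headers), rule, *(fmt(r) for r in rows), rule])


# Made-up stand-in for the frame returned by ek.get_data.
df = pd.DataFrame({"Instrument": ["ESCO.NS", "ESCO.NS"], "Dividend": [2.5, 5.0]})
table = to_bordered_table(df)
print(table)
```

The index is deliberately left out here; `df.reset_index()` before the call would include it as an ordinary column.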

Issues with .JSON file conversion and CSV manipulation in Python

sorry for the long post! I'm a bit Python-illiterate, so please bear with me:
I am working on a project that uses extracted Fitbit resting heart-rate data to compare heart-rate values between a series of years.
The fitbit data exports as a .json file that I am attempting to convert to .csv for further analysis.
I pulled a script from GitHub that converts .json files to .csv-formatted files; however, when inputting the resting heart rate data I run into a few problems.
Sample lines from .json:
[{
  "dateTime" : "09/30/16 00:00:00",
  "value" : {
    "date" : "09/30/16",
    "value" : 76.83736383927637,
    "error" : 2.737363838373737
  }
Section of GitHub code that transforms nested frame into columns:
# reading json into dataframes
resting_hr_df = get_json_to_df(file_list=resting_hr_file_list).reset_index()
# Heart rate contains a sub json that are explicitly converted into column
resting_hr_df['date'] = resting_hr_df['value'].transform(lambda x: make_new_df_value(x, 'date'))
resting_hr_df['value'] = resting_hr_df['value'].transform(lambda x: make_new_df_value(x, 'value'))
resting_hr_df['error'] = resting_hr_df['value'].transform(lambda x: make_new_df_value(x, 'error'))
resting_hr_df = resting_hr_df.drop(['value', 'index'], axis=1)
There are two variables named 'value' and I think this is causing the issue.
When using the transform function in pandas to assign variable names for the nested dataframe keys, the second ‘value’ values store as 0 in the .csv file.
How should I store the values?

The problem is that this is a nested JSON file. The solution is to load the file with the json module and then flatten it into a pandas DataFrame with json_normalize:
import json
import pandas as pd

with open('filename.json') as data_file:
    data = json.load(data_file)
resting_hr_df = pd.json_normalize(data)
resting_hr_df
Output resting_hr_df:
| | dateTime | value.date | value.value | value.error |
|---:|:------------------|:-------------|--------------:|--------------:|
| 0 | 09/30/16 00:00:00 | 09/30/16 | 76.8374 | 2.73736 |
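As a follow-up sketch, the flattened columns can be renamed and written out as CSV. The inline sample below mirrors the question's structure, with closing brackets added only so it parses; the column names on the right-hand side of the rename are a choice made for this example. One plausible cause of the original problem is that the GitHub snippet overwrites the 'value' column before the 'error' line reads from it, which flattening in a single step avoids.

```python
import io
import json

import pandas as pd

# Inline sample in the shape shown in the question (closing brackets added so it parses).
raw = io.StringIO("""
[{
  "dateTime": "09/30/16 00:00:00",
  "value": {"date": "09/30/16", "value": 76.83736383927637, "error": 2.737363838373737}
}]
""")
data = json.load(raw)

resting_hr_df = pd.json_normalize(data)
# Drop the "value." prefix so the inner keys get plain column names.
resting_hr_df = resting_hr_df.rename(
    columns={"value.date": "date", "value.value": "value", "value.error": "error"}
)
# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = resting_hr_df.to_csv(index=False)
print(csv_text)
```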

You can also solve this with the built-in pandas function read_json:
import pandas as pd
df = pd.read_json('abc.json', orient='index')
data = df.to_csv(index=False)
print(data)
This converts the JSON to CSV text in a couple of lines.

How to update a pandas dataframe, from multiple API calls

I need to write a Python script that does the following:
Read a csv file with the columns (person_id, name, flag). The file has 3000 rows.
For each person_id in the csv file, call a URL with that person_id in a GET request:
http://api.myendpoint.intranet/get-data/1234
The URL returns information about the person_id, as in the example below. I need to take all the rents objects and save them to my csv. This is my attempt so far, followed by the output I need:
import pandas as pd
import requests
ids = pd.read_csv(f"{path}/data.csv", delimiter=';')
person_rents = df = pd.DataFrame([], columns=list('person_id','carId','price','rentStatus'))
for id in ids:
response = request.get(f'endpoint/{id["person_id"]}')
json = response.json()
person_rents.append( [person_id, rent['carId'], rent['price'], rent['rentStatus'] ] )
pd.read_csv(f"{path}/data.csv", delimiter=';' )
person_id;name;flag;cardId;price;rentStatus
1000;Joseph;1;6638;1000;active
1000;Joseph;1;5566;2000;active
Response example
{
    "active": false,
    "ctodx": false,
    "rents": [{
            "carId": 6638,
            "price": 1000,
            "rentStatus": "active"
        }, {
            "carId": 5566,
            "price": 2000,
            "rentStatus": "active"
        }
    ],
    "responseCode": "OK",
    "status": [{
            "request": 345,
            "requestStatus": "F"
        }, {
            "requestId": 678,
            "requestStatus": "P"
        }
    ],
    "transaction": false
}
After saving the additional data from the response to the csv, I need to get data from another endpoint using the carId in the URL. The mileage result must be saved in the same csv.
http://api.myendpoint.intranet/get-mileage/6638
http://api.myendpoint.intranet/get-mileage/5566
The return for each call will be like this
{"mileage":1000.0000}
{"mileage":550.0000}
The final output must be
person_id;name;flag;cardId;price;rentStatus;mileage
1000;Joseph;1;6638;1000;active;1000.0000
1000;Joseph;1;5566;2000;active;550.0000
Can someone help me with this script?
It could use pandas or any Python 3 library.

Code Explanation
Create dataframe, df, with pd.read_csv.
It is expected that all of the values in 'person_id', are unique.
Use .apply on 'person_id', to call prepare_data.
prepare_data expects 'person_id' to be a str or int, as indicated by the type annotation, Union[int, str]
Call the API, which will return a dict, to the prepare_data function.
Convert the 'rents' key, of the dict, into a dataframe, with pd.json_normalize.
Use .apply on 'carId', to call the API, and extract the 'mileage', which is added to dataframe data, as a column.
Add 'person_id' to data, which can be used to merge df with s.
Convert pd.Series, s to a dataframe, with pd.concat, and then merge df and s, on person_id.
Save to a csv with DataFrame.to_csv in the desired form.
Potential Issues
If there's an issue, it's most likely to occur in the call_api function.
As long as call_api returns a dict, like the response shown in the question, the remainder of the code will work correctly to produce the desired output.
import pandas as pd
import requests
import json
from typing import Union


def call_api(url: str) -> dict:
    r = requests.get(url)
    return r.json()


def prepare_data(uid: Union[int, str]) -> pd.DataFrame:
    d_url = f'http://api.myendpoint.intranet/get-data/{uid}'
    m_url = 'http://api.myendpoint.intranet/get-mileage/'
    # get the rent data from the api call
    rents = call_api(d_url)['rents']
    # normalize rents into a dataframe
    data = pd.json_normalize(rents)
    # get the mileage data from the api call and add it to data as a column
    data['mileage'] = data.carId.apply(lambda cid: call_api(f'{m_url}{cid}')['mileage'])
    # add person_id as a column to data, which will be used to merge data to df
    data['person_id'] = uid
    return data


# read data from file
df = pd.read_csv('file.csv', sep=';')
# call prepare_data
s = df.person_id.apply(prepare_data)
# s is a Series of DataFrames, which can be combined with pd.concat
s = pd.concat([v for v in s])
# join df with s, on person_id
df = df.merge(s, on='person_id')
# save to csv
df.to_csv('output.csv', sep=';', index=False)
If there are any errors when running this code:
Leave a comment to let me know.
Edit your question and paste the entire traceback, as text, into a code block.
Example
# given the following start dataframe
   person_id    name  flag
0       1000  Joseph     1
1        400     Sam     1
# resulting dataframe using the same data for both id 1000 and 400
   person_id    name  flag  carId  price rentStatus  mileage
0       1000  Joseph     1   6638   1000     active   1000.0
1       1000  Joseph     1   5566   2000     active   1000.0
2        400     Sam     1   6638   1000     active   1000.0
3        400     Sam     1   5566   2000     active   1000.0

There are many different ways to implement this. One of them would be, like you started in your comment:
read the CSV file with pandas
for each line take the person_id and build a call
the delivered JSON response can then be taken from the rents
the carId is then extracted for each individual rental
finally this is collected in a row_list
the row_list is then converted back to csv via pandas
A very simple solution without any error handling could look something like this:
from types import SimpleNamespace
import pandas as pd
import requests
import json

path = '/some/path/'
df = pd.read_csv(f'{path}/data.csv', delimiter=';')
rows_list = []
for _, row in df.iterrows():
    rentCall = f'http://api.myendpoint.intranet/get-data/{row.person_id}'
    print(rentCall)
    response = requests.get(rentCall)
    # parse the JSON into objects so fields can be read as attributes (r.rents, rent.carId)
    r = json.loads(response.text, object_hook=lambda d: SimpleNamespace(**d))
    for rent in r.rents:
        mileageCall = f'http://api.myendpoint.intranet/get-mileage/{rent.carId}'
        print(mileageCall)
        response2 = requests.get(mileageCall)
        m = json.loads(response2.text, object_hook=lambda d: SimpleNamespace(**d))
        # take the status from the rent itself, as in the desired output
        rows_list.append((row['person_id'], row['name'], row['flag'], rent.carId, rent.price, rent.rentStatus, m.mileage))
df = pd.DataFrame(rows_list, columns=('person_id', 'name', 'flag', 'carId', 'price', 'rentStatus', 'mileage'))
print(df.to_csv(index=False, sep=';'))

Speeding up with multiprocessing
You mention that you have 3000 rows, which means that you'll have to make a lot of API calls. Depending on the connection, every one of these calls might take a while. As a result, performing this in a sequential way might be too slow. The majority of the time, your program will just be waiting on a response from the server without doing anything else.
We can improve this by running the calls in parallel with multiprocessing.
I use all the code from Trenton's answer, but replace the following sequential call:
# call prepare_data
s = df.person_id.apply(prepare_data)
with a parallel alternative:
from multiprocessing import Pool

n_processes = 20  # Experiment with this to see what works well
# on Windows and macOS, run this under an `if __name__ == '__main__':` guard
with Pool(n_processes) as p:
    s = p.map(prepare_data, df.person_id)
Because the work is mostly waiting on the network, a thread pool might be faster, but you'll have to test that by replacing the import with
from multiprocessing.pool import ThreadPool as Pool
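For comparison, here is a hedged sketch of the thread-based variant using the standard library's `concurrent.futures`. The `prepare_data` below is an offline stand-in that returns fixed rents instead of calling the API, and `max_workers=8` is an arbitrary starting point, so treat both as assumptions to replace with the real function and a tuned value.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def prepare_data(uid):
    """Offline stand-in for the real prepare_data: fixed rents, no HTTP calls."""
    return pd.DataFrame(
        {"carId": [6638, 5566], "price": [1000, 2000], "person_id": uid}
    )


df = pd.DataFrame({"person_id": [1000, 400], "name": ["Joseph", "Sam"], "flag": [1, 1]})

# Threads suit I/O-bound work such as waiting on HTTP responses.
with ThreadPoolExecutor(max_workers=8) as executor:
    # executor.map keeps results in the same order as the input ids.
    frames = list(executor.map(prepare_data, df.person_id))

s = pd.concat(frames, ignore_index=True)
result = df.merge(s, on="person_id")
print(result)
```

Threads avoid pickling the function and its results between processes, which is one reason they are often the simpler choice for network-bound calls.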

Easiest way to get API data into str/int format python

I have read through many articles and posts on connecting to an API and formatting the result as int/str. I did manage to get it working, but in possibly the most long-winded way ever, and it's really ugly. Could someone show me the shortest, most efficient way to accomplish what the code below does? Any suggestions would be greatly appreciated. Basically I'm looking to print "eos" as a str and the price as an int. Thanks!
import urllib
import json
import pandas as pd
import numpy as np
import requests
r = requests.get('https://api.coinmarketcap.com/v1/ticker/eos/')
with open('events.csv','w') as fd:
    fd.write(r.text)
data = pd.read_csv('events.csv', names=['Choose One'])
i = data.iloc[[6], [0]]
a = str(i)
name,price = a.split(":")
string = price[2:-1]
print(string)

It's simpler to use pandas read_json to read the response straight into a DataFrame. read_json automatically assigns a suitable dtype to each column; then use column selection to pick the 'name' and 'price_usd' columns (in this case there is only one row, but the same code works with multiple rows), i.e.
import pandas as pd
df = pd.read_json('https://api.coinmarketcap.com/v1/ticker/eos/')
print(df[['name', 'price_usd']].apply(lambda row: '{}: {:.0f}'.format(row['name'], row['price_usd']), axis=1))
Using .0f in the format string displays the price_usd value rounded to an integer, so the output will be:
0    EOS: 9
Alternatively, the round function rounds the float values, i.e.
In [34]: import pandas as pd
    ...: df = pd.read_json('https://api.coinmarketcap.com/v1/ticker/eos/')
    ...: print(df[['name', 'price_usd']].apply(lambda row: '{}: {:}'.format(row['name'], round(row['price_usd'], 2)), axis=1))
0    EOS: 8.99
dtype: object

Simply use json.loads(r.text) or, more easily, r.json() directly.
Say, right now the api returns the following data:
[
{
"id": "eos",
"name": "EOS",
"symbol": "EOS",
"rank": "9",
"price_usd": "9.31992",
"price_btc": "0.00106154",
"24h_volume_usd": "596467000.0",
"market_cap_usd": "6034993504.0",
"available_supply": "647537050.0",
"total_supply": "900000000.0",
"max_supply": "1000000000.0",
"percent_change_1h": "1.3",
"percent_change_24h": "-6.81",
"percent_change_7d": "-36.4",
"last_updated": "1517755757"
}
]
r.json() gives you the parsed data (a list of dicts) directly; otherwise parse it with data = json.loads(r.text). Either way, save it to a pandas DataFrame with df = pd.DataFrame(data), which then looks like the following:
In [15]: df
Out[15]:
24h_volume_usd available_supply id last_updated market_cap_usd max_supply name percent_change_1h percent_change_24h percent_change_7d price_btc price_usd rank symbol total_supply
0 596467000.0 647537050.0 eos 1517755757 6034993504.0 1000000000.0 EOS 1.3 -6.81 -36.4 0.00106154 9.31992 9 EOS 900000000.0
Access the data with pandas indexing:
In [8]: df[['name', 'price_usd']]
Out[8]:
name price_usd
0 EOS 9.29186
Or for printing:
In [18]: print(df.loc[0, 'name'], ': ', df.loc[0, 'price_usd'])
EOS : 9.31992
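If pandas is not needed at all, the values can be pulled out with plain Python. The sketch below works on a trimmed copy of the sample payload above rather than a live request, so the string in `text` is an assumption standing in for `r.text`.

```python
import json

# Trimmed copy of the sample payload above; with requests this would be r.text.
text = '[{"id": "eos", "name": "EOS", "price_usd": "9.31992"}]'

record = json.loads(text)[0]          # the payload is a list holding one dict
name = record["id"]                   # already a str
price = float(record["price_usd"])    # the sample payload stores numbers as strings

# int() truncates toward zero; use round(price) for nearest-integer rounding.
print(name, int(price))
```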

Nested JSON file into Pandas Dataframe

I'm having trouble getting this nested JSON object into a pandas dataframe using python:
{
"count":275,
"calls":[
{
"connectedTo":"18885068980",
"serviceName":"",
"callGuid":"01541af0-d87c-4911-a868-f5ac573d1e31",
"origin":"+19178558701",
"stateChangedAt":"2016-04-15T18:21:23Z",
"sequence":9,
"appletName":"ACD Sales General"
}
]
}
I've tried using json_normalize and am going in circles. Any help would be very much appreciated!

I know this uses json_normalize, which you said you tried, but I think this is what you are trying to do:
import json
import pandas as pd
from pprint import pprint

j = json.dumps(  # to create the json
    {'count': 275,
     "calls":
         [{'connectedTo': "18885068980",
           "serviceName": "",
           "callGuid": "01541af0-d87c-4911-a868-f5ac573d1e31",
           "stateChangedAt": "2016-04-15T18:21:23Z",
           "sequence": 9,
           "appletName": "ACD Sales General"}]})
data = json.loads(j)
# pandas >= 1.0; older versions need: from pandas.io.json import json_normalize
pprint(pd.json_normalize(data['calls']))
which returns
appletName callGuid connectedTo \
0 ACD Sales General 01541af0-d87c-4911-a868-f5ac573d1e31 18885068980
sequence serviceName stateChangedAt
0 9 2016-04-15T18:21:23Z
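To keep the top-level "count" alongside each call, `json_normalize` accepts `record_path` and `meta` arguments. The following is a small sketch on the question's object; converting the timestamp with `pd.to_datetime` is an optional extra that the question did not ask for.

```python
import pandas as pd

data = {
    "count": 275,
    "calls": [
        {
            "connectedTo": "18885068980",
            "serviceName": "",
            "callGuid": "01541af0-d87c-4911-a868-f5ac573d1e31",
            "origin": "+19178558701",
            "stateChangedAt": "2016-04-15T18:21:23Z",
            "sequence": 9,
            "appletName": "ACD Sales General",
        }
    ],
}

# record_path picks the list to expand; meta copies top-level keys onto each row.
calls_df = pd.json_normalize(data, record_path="calls", meta=["count"])
# Optional: parse the ISO 8601 timestamp into a datetime column.
calls_df["stateChangedAt"] = pd.to_datetime(calls_df["stateChangedAt"])
# Transposed only so the single wide row prints on narrow screens.
print(calls_df.T)
```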
