How to convert a nested JSON object to a dataframe? - python
I am getting a JSON object returned from an API call which looks like this:
{"meta":{"symbol":"AAPL","interval":"1min","currency":"USD","exchange_timezone":"America/New_York","exchange":"NASDAQ","mic_code":"XNGS","type":"Common Stock"},"values":[{"datetime":"2022-06-06 15:59:00","open":"146.14999","high":"146.47000","low":"146.09000","close":"146.14000","volume":"1364826"},{"datetime":"2022-06-06 15:58:00","open":"146.14000","high":"146.17999","low":"146.08000","close":"146.14680","volume":"358111"},{"datetime":"2022-06-06 15:57:00","open":"146.30499","high":"146.33000","low":"146.13000","close":"146.14000","volume":"0"},{"datetime":"2022-06-06 15:56:00","open":"146.25999","high":"146.34500","low":"146.20000","close":"146.31000","volume":"306725"},{"datetime":"2022-06-06 15:55:00","open":"146.14999","high":"146.38000","low":"146.07001","close":"146.25999","volume":"384471"},{"datetime":"2022-06-06 15:54:00","open":"145.95000","high":"146.25999","low":"145.91000","close":"146.15500","volume":"287583"},{"datetime":"2022-06-06 15:53:00","open":"145.97000","high":"146.10001","low":"145.89760","close":"145.94569","volume":"231640"},{"datetime":"2022-06-06 15:52:00","open":"145.96500","high":"146.00000","low":"145.78999","close":"145.96500","volume":"189185"},{"datetime":"2022-06-06 15:51:00","open":"145.89000","high":"146.00000","low":"145.74001","close":"145.96001","volume":"182617"},{"datetime":"2022-06-06 15:50:00","open":"145.74001","high":"146.11290","low":"145.74001","close":"145.89500","volume":"376980"},{"datetime":"2022-06-06 15:49:00","open":"145.63499","high":"145.85001","low":"145.63000","close":"145.73000","volume":"190471"},{"datetime":"2022-06-06 15:48:00","open":"145.61000","high":"145.71001","low":"145.58000","close":"145.65131","volume":"138908"},{"datetime":"2022-06-06 15:47:00","open":"145.64999","high":"145.65500","low":"145.53999","close":"145.61011","volume":"166144"},{"datetime":"2022-06-06 15:46:00","open":"145.81500","high":"145.82500","low":"145.62061","close":"145.66000","volume":"175801"},{"datetime":"2022-06-06 15:45:00","open":"145.88989","high":"145.98000","low":"145.80780","close":"145.81880","volume":"161626"},{"datetime":"2022-06-06 15:44:00","open":"145.80000","high":"145.89000","low":"145.77000","close":"145.89000","volume":"89067"},{"datetime":"2022-06-06 15:43:00","open":"145.95000","high":"145.97000","low":"145.78500","close":"145.80000","volume":"180386"},{"datetime":"2022-06-06 15:42:00","open":"145.84000","high":"146.09000","low":"145.82001","close":"145.96989","volume":"377760"},{"datetime":"2022-06-06 15:41:00","open":"145.59000","high":"145.86000","low":"145.59000","close":"145.83730","volume":"283091"},{"datetime":"2022-06-06 15:40:00","open":"145.46001","high":"145.60001","low":"145.36000","close":"145.58501","volume":"159567"},{"datetime":"2022-06-06 15:39:00","open":"145.50999","high":"145.56850","low":"145.45000","close":"145.47009","volume":"113975"},{"datetime":"2022-06-06 15:38:00","open":"145.30000","high":"145.50880","low":"145.24010","close":"145.50500","volume":"174004"},{"datetime":"2022-06-06 15:37:00","open":"145.44000","high":"145.44000","low":"145.27000","close":"145.30000","volume":"189831"},{"datetime":"2022-06-06 15:36:00","open":"145.54890","high":"145.54890","low":"145.38000","close":"145.44000","volume":"101993"},{"datetime":"2022-06-06 15:35:00","open":"145.53000","high":"145.56000","low":"145.41000","close":"145.54500","volume":"114006"},{"datetime":"2022-06-06 15:34:00","open":"145.58501","high":"145.60789","low":"145.50999","close":"145.52010","volume":"108473"},{"datetime":"2022-06-06 
15:33:00","open":"145.53999","high":"145.60500","low":"145.47000","close":"145.58501","volume":"133996"},{"datetime":"2022-06-06 15:32:00","open":"145.56500","high":"145.64000","low":"145.46030","close":"145.53999","volume":"131019"},{"datetime":"2022-06-06 15:31:00","open":"145.34500","high":"145.60001","low":"145.34000","close":"145.58800","volume":"238105"},{"datetime":"2022-06-06 15:30:00","open":"145.34500","high":"145.35001","low":"145.27000","close":"145.34000","volume":"136026"}],"status":"ok"}
I am interested in the "values" section, the datetime,h,l,o,c,v values and I want to import them into a dataframe.
My code is:
resp = requests.get(url)
which generates the above response. Then:
df = pd.DataFrame(resp)
which provides this (iterating a requests.Response yields raw byte chunks, hence the b'...' fragments):
0 b'{"meta":{"symbol":"AAPL","interval":"1day","...
1 b'":"XNGS","type":"Common Stock"},"values":[{"...
2 b'e":"146.14000","volume":"65217850"},{"dateti...
3 b'5.39000","volume":"88471302"},{"datetime":"2...
4 b'1","volume":"72348100"},{"datetime":"2022-06...
How can I skip the meta section and populate the dataframe only with the values that I need?
I have tried:
df = pd.DataFrame(resp.meta.values)
and
df = pd.DataFrame(resp['meta']['values'])
which raise AttributeError (no attribute 'meta') and TypeError (not subscriptable) respectively, because resp is a requests.Response object, not a parsed dict.
Edit to fit actual solution:
You should be able to load your API response with:
data = resp.json()
pd.DataFrame(data['values'])
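Since every field in "values" comes back as a string, here is a short follow-up sketch (the float/datetime conversions are an addition, not part of the original answer; column names are taken from the response above):

data = resp.json()
df = pd.DataFrame(data['values'])
# the API returns all fields as strings, so convert them explicitly
df['datetime'] = pd.to_datetime(df['datetime'])
df[['open', 'high', 'low', 'close', 'volume']] = df[['open', 'high', 'low', 'close', 'volume']].astype(float)
# index by timestamp, oldest first
df = df.set_index('datetime').sort_index()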
Related
Most efficient way of converting RESTful output to dataframe
I have output from a REST call that I've converted to JSON. It's a highly nested collection of dicts and lists, but I'm eventually able to convert it to a dataframe as follows:

import pandas as pd
from requests import get

url = 'http://stats.oecd.org/SDMX-JSON/data/MEI_FIN/IR3TIB.GBR+USA.M/all'
params = {'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'}
r = get(url, params=params)
x = r.json()
d = x['dataSets'][0]['series']
a = pd.DataFrame(d['0:0:0']['observations'])
b = pd.DataFrame(d['0:1:0']['observations'])

This works, absent some manipulation to make it easier to work with, and as there are multiple time series I can do a version of the same for each, but it goes without saying it's kind of clunky. Is there a better/cleaner way to do this?
The pandasdmx library makes this super-simple:

import pandasdmx as sdmx

df = sdmx.Request('OECD').data(
    resource_id='MEI_FIN',
    key='IR3TIB.GBR+USA.M',
    params={'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'},
).write()
Absent any responses, here's the solution I came up with. I added a list comprehension to get each series into a dataframe, and then a transpose, as this source resulted in the series being aligned across rows instead of down columns.

import pandas as pd
from requests import get

url = 'http://stats.oecd.org/SDMX-JSON/data/MEI_FIN/IR3TIB.GBR+USA.M/all'
params = {'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'}
r = get(url, params=params)
x = r.json()
d = x['dataSets'][0]['series']
df = [pd.DataFrame(d[i]['observations']).loc[0] for i in d]
df = pd.DataFrame(df).T
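A slightly tidier variant of the same idea, sketched here as an addition (not from the original post), uses pd.concat so the series keys, e.g. '0:0:0', become the column names directly:

import pandas as pd
from requests import get

url = 'http://stats.oecd.org/SDMX-JSON/data/MEI_FIN/IR3TIB.GBR+USA.M/all'
params = {'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'}
d = get(url, params=params).json()['dataSets'][0]['series']
# one column per series, keyed by its series id
df = pd.concat({k: pd.DataFrame(v['observations']).loc[0] for k, v in d.items()}, axis=1)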
Can't figure out why I get error when trying to get JSON value into DF
Been looking online but can't figure out why I'm getting the error, as the data is available in the JSON. I'm trying to extract the "pull_request_contributors" value from the JSON and put it into a DF. I get the error:

KeyError: "Try running with errors='ignore' as key 'pull_request_contributors' is not always present"

Code

cg = CoinGeckoAPI()
ts = '01-01-2017'
cs = 'bitcoin'
# get data
result = cg.get_coin_history_by_id(cs, ts)
# pull_request_contributors
df_pr = pd_json.json_normalize(data, record_path='developer_data', meta=['pull_request_contributors']).set_index(ts)

JSON

{'community_data': {'facebook_likes': 40055, 'reddit_accounts_active_48h': '4657.4', 'reddit_average_comments_48h': 186.5, 'reddit_average_posts_48h': 3.75, 'reddit_subscribers': 1014816, 'twitter_followers': 64099}, 'developer_data': {'closed_issues': 3845, 'commit_count_4_weeks': 245, 'forks': 22024, 'pull_request_contributors': 564, 'pull_requests_merged': 6163, 'stars': 36987, 'subscribers': 3521, 'total_issues': 4478}...

Expected output

date        bitcoin
01-01-2017      564
Since the field pull_request_contributors is not available in each object, pandas cannot build the dataframe. Run

df_pr = pd_json.json_normalize(data, record_path='developer_data', meta=['pull_request_contributors'], errors='ignore').set_index(ts)

to ignore missing fields.

EDIT

json_normalize creates a table with all fields as columns, and their values make up the rows. So for what you want to achieve I wouldn't go with json_normalize, since you already know which particular field you want to read. Here's how I would do it:

ts = '01-01-2017'
cs = 'bitcoin'
df_pr = pd_json.json_normalize(data['developer_data'])
df = pd.DataFrame(data=[{'date': ts, cs: data['developer_data']['pull_request_contributors']}]).set_index('date')

This way we simply construct the DataFrame, without first normalizing the response. If the response is a string and not a dict (I don't know what the CoinGeckoAPI returns), you can decode it first with

import json
data = json.loads(json_string)

Hope this helps.
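For reference, a self-contained sketch of that direct construction (developer_data abbreviated from the question's JSON; the printed output is shown as a comment):

import pandas as pd

ts, cs = '01-01-2017', 'bitcoin'
data = {'developer_data': {'pull_request_contributors': 564}}  # abbreviated from the question
df = pd.DataFrame([{'date': ts, cs: data['developer_data']['pull_request_contributors']}]).set_index('date')
print(df)
#             bitcoin
# date
# 01-01-2017      564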
Create Proper Dataframe from SDMX Response, Python 3.6
I want to prepare a dataset from the data available at http://stat.data.abs.gov.au/Index.aspx?DataSetCode=ATSI_BIRTHS_SUMM

Data API: http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetData/ATSI_BIRTHS_SUMM/1+4+5+7+8+9+10+13+14+15+18+19+20.IM+IB.0+1+2+3+4+5+6+7.A/all

from pandasdmx import Request

Agency_Code = 'ABS'
Dataset_Id = 'ATSI_BIRTHS_SUMM'
ABS = Request(Agency_Code)
data_response = ABS.data(resource_id='ATSI_BIRTHS_SUMM')
print(data_response.url)
DF = data_response.write(data_response.data.obs(with_values=True, with_attributes=True), parse_time=False)

The above gives the error:

ValueError: Type names and field names cannot be a keyword: 'None'

DF = data_response.write(data_response.data.series, parse_time=False) works, but the dimension items come in column-wise.

Support links:
http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetDataStructure/all
http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetDataStructure/ATSI_BIRTHS_SUMM
http://stat.data.abs.gov.au/Index.aspx?DataSetCode=ATSI_BIRTHS_SUMM

Please suggest a better way to retrieve the data.
Your example

DF = data_response.write(data_response.data.series, parse_time=False)

produces a stacked DataFrame; with unstack().reset_index() you will get a "flat" DataFrame:

data_response.write().unstack().reset_index()

  MEASURE INDIGENOUS_STATUS ASGS_2011 FREQUENCY TIME_PERIOD       0
0       1                IM         0         A        2001  8334.0

Is this what you are looking for?
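If it helps to see what the unstack is doing, here is a toy stand-in (a hypothetical two-series frame, not the actual SDMX response) that reproduces the shape of the output above:

import pandas as pd

# stand-in for data_response.write(): dimensions in the columns, time in the index
cols = pd.MultiIndex.from_tuples(
    [('1', 'IM', '0', 'A'), ('1', 'IB', '0', 'A')],
    names=['MEASURE', 'INDIGENOUS_STATUS', 'ASGS_2011', 'FREQUENCY'])
stacked = pd.DataFrame([[8334.0, 120.0]],  # 120.0 is a made-up second value
                       index=pd.Index(['2001'], name='TIME_PERIOD'),
                       columns=cols)
flat = stacked.unstack().reset_index()  # one row per (dimensions, period) combination
print(flat)
#   MEASURE INDIGENOUS_STATUS ASGS_2011 FREQUENCY TIME_PERIOD       0
# 0       1                IM         0         A        2001  8334.0
# 1       1                IB         0         A        2001   120.0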
Cannot Access dict with duplicate values dynamically from Api - Python for Binance
I am attempting to format this API: https://www.binance.com/api/v1/ticker/allBookTickers

Here is an abbreviated version of the response:

[{"symbol":"ETHBTC","bidPrice":"0.07200500","bidQty":"0.67800000","askPrice":"0.07203200","askQty":"7.19200000"},{"symbol":"LTCBTC","bidPrice":"0.01281100","bidQty":"10.90000000","askPrice":"0.01282500","askQty":"1.01000000"}]

Each dict is saved as an index in the list. My issue is that each dict starts with 'symbol' rather than with the name, like 'ETHBTC'. I can call by index number, but as there are hundreds of dicts in the API response, I need a method of typing in, for instance, 'ETHBTC' to get that dict. This is what it would look like in an ideal world, but I have no idea how to achieve this; any help would be greatly appreciated:

data = requests.get('https://www.binance.com/api/v1/ticker/allBookTickers')
data = data.json()
ltc = data['LTCBTC']
Use the following code:

import requests

# fetch data from the url using requests
data = requests.get('https://www.binance.com/api/v1/ticker/allBookTickers')
# parse the JSON response
dataJson = data.json()
# build a dictionary from the list, using each symbol value as the key
dataDictionary = {d['symbol']: d for d in dataJson}
# access a ticker by its symbol
ltc = dataDictionary['LTCBTC']
print(ltc)
# now you can read ltc values by key, and so on for other values
print(ltc['askPrice'])

In this code we created a Python dictionary from the response returned.
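And since this thread is about dataframes: the same symbol-keyed lookup can also be done with pandas (a sketch of my own, not part of the original answer):

import pandas as pd
import requests

data = requests.get('https://www.binance.com/api/v1/ticker/allBookTickers').json()
df = pd.DataFrame(data).set_index('symbol')  # one row per ticker, keyed by symbol
print(df.loc['LTCBTC', 'askPrice'])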
Pandas Google Distance Matrix API - Pass coordinates into URL
I am working with the Google Distance Matrix API, where I want to feed coordinates from a dataframe into the API and return the duration and distance between the two points. Here is my dataframe:

import pandas as pd
import simplejson
import urllib
import numpy as np

Record    orig_lat     orig_lng    dest_lat     dest_lng
1       40.7484405  -74.0073127  40.7115242  -74.0145492
2       40.7421218  -73.9878531  40.7727216  -73.9863531

First, I need to combine orig_lat & orig_lng and dest_lat & dest_lng into strings, which I then pass into the URL. So I've tried creating the variables orig_coord & dest_coord, then passing them into the URL and returning values:

orig_coord = df[['orig_lat','orig_lng']].apply(lambda x: '{},{}'.format(x[0], x[1]), axis=1)
dest_coord = df[['dest_lat','dest_lng']].apply(lambda x: '{},{}'.format(x[0], x[1]), axis=1)

for row in df.itertuples():
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, end_coord)
    result = simplejson.load(urllib.urlopen(url))
    df['driving_time_text'] = result['rows'][0]['elements'][0]['duration']['text']

But I get the following error:

TypeError: <lambda>() got an unexpected keyword argument 'axis'

So my question is: how do I concatenate values from two columns into a string, then pass that string into a URL and output the result? Thank you in advance!
Hmm, I am not sure how you constructed your data frame. Maybe post those details? But if you can live with referencing tuple elements positionally, this worked for me:

import pandas as pd

data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549},
        {'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353}]
df = pd.DataFrame(data)

for row in df.itertuples():
    orig_coord = '{},{}'.format(row[1], row[2])
    dest_coord = '{},{}'.format(row[3], row[4])
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    print(url)

produces

http://maps.googleapis.com/maps/api/distancematrix/json?origins=40.748441,-74.007313&destinations=40.711524,-74.014549&units=imperial&MYGOOGLEAPIKEY
http://maps.googleapis.com/maps/api/distancematrix/json?origins=40.742122,-73.987853&destinations=40.772722,-73.986353&units=imperial&MYGOOGLEAPIKEY

To update the data frame with the result, since row is a tuple and not writeable, you might want to keep track of the current index as you iterate. Maybe something like this:

data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549, 'result': -1},
        {'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353, 'result': -1}]
df = pd.DataFrame(data)

for i_row, row in enumerate(df.itertuples()):
    orig_coord = '{},{}'.format(row[1], row[2])
    dest_coord = '{},{}'.format(row[3], row[4])
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    # do stuff to get your result, then write it back by position
    df.loc[i_row, 'result'] = result
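For what it's worth, the TypeError in the question usually means the .apply(..., axis=1) ended up running on a Series rather than a DataFrame; Series.apply has no axis argument and forwards it to the lambda. A vectorized way to build the coordinate strings that sidesteps apply entirely (my own sketch, reusing the question's column names):

orig_coord = df['orig_lat'].astype(str) + ',' + df['orig_lng'].astype(str)
dest_coord = df['dest_lat'].astype(str) + ',' + df['dest_lng'].astype(str)
# one URL per row, built without any explicit loop
urls = ('http://maps.googleapis.com/maps/api/distancematrix/json?origins=' + orig_coord
        + '&destinations=' + dest_coord + '&units=imperial&MYGOOGLEAPIKEY')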