How to convert a nested JSON object to a dataframe? - python
I am getting a JSON object returned from an API call which looks like this:
{"meta":{"symbol":"AAPL","interval":"1min","currency":"USD","exchange_timezone":"America/New_York","exchange":"NASDAQ","mic_code":"XNGS","type":"Common Stock"},"values":[{"datetime":"2022-06-06 15:59:00","open":"146.14999","high":"146.47000","low":"146.09000","close":"146.14000","volume":"1364826"},{"datetime":"2022-06-06 15:58:00","open":"146.14000","high":"146.17999","low":"146.08000","close":"146.14680","volume":"358111"},{"datetime":"2022-06-06 15:57:00","open":"146.30499","high":"146.33000","low":"146.13000","close":"146.14000","volume":"0"},{"datetime":"2022-06-06 15:56:00","open":"146.25999","high":"146.34500","low":"146.20000","close":"146.31000","volume":"306725"},{"datetime":"2022-06-06 15:55:00","open":"146.14999","high":"146.38000","low":"146.07001","close":"146.25999","volume":"384471"},{"datetime":"2022-06-06 15:54:00","open":"145.95000","high":"146.25999","low":"145.91000","close":"146.15500","volume":"287583"},{"datetime":"2022-06-06 15:53:00","open":"145.97000","high":"146.10001","low":"145.89760","close":"145.94569","volume":"231640"},{"datetime":"2022-06-06 15:52:00","open":"145.96500","high":"146.00000","low":"145.78999","close":"145.96500","volume":"189185"},{"datetime":"2022-06-06 15:51:00","open":"145.89000","high":"146.00000","low":"145.74001","close":"145.96001","volume":"182617"},{"datetime":"2022-06-06 15:50:00","open":"145.74001","high":"146.11290","low":"145.74001","close":"145.89500","volume":"376980"},{"datetime":"2022-06-06 15:49:00","open":"145.63499","high":"145.85001","low":"145.63000","close":"145.73000","volume":"190471"},{"datetime":"2022-06-06 15:48:00","open":"145.61000","high":"145.71001","low":"145.58000","close":"145.65131","volume":"138908"},{"datetime":"2022-06-06 15:47:00","open":"145.64999","high":"145.65500","low":"145.53999","close":"145.61011","volume":"166144"},{"datetime":"2022-06-06 15:46:00","open":"145.81500","high":"145.82500","low":"145.62061","close":"145.66000","volume":"175801"},{"datetime":"2022-06-06 15:45:00","open":"145.88989","high":"145.98000","low":"145.80780","close":"145.81880","volume":"161626"},{"datetime":"2022-06-06 15:44:00","open":"145.80000","high":"145.89000","low":"145.77000","close":"145.89000","volume":"89067"},{"datetime":"2022-06-06 15:43:00","open":"145.95000","high":"145.97000","low":"145.78500","close":"145.80000","volume":"180386"},{"datetime":"2022-06-06 15:42:00","open":"145.84000","high":"146.09000","low":"145.82001","close":"145.96989","volume":"377760"},{"datetime":"2022-06-06 15:41:00","open":"145.59000","high":"145.86000","low":"145.59000","close":"145.83730","volume":"283091"},{"datetime":"2022-06-06 15:40:00","open":"145.46001","high":"145.60001","low":"145.36000","close":"145.58501","volume":"159567"},{"datetime":"2022-06-06 15:39:00","open":"145.50999","high":"145.56850","low":"145.45000","close":"145.47009","volume":"113975"},{"datetime":"2022-06-06 15:38:00","open":"145.30000","high":"145.50880","low":"145.24010","close":"145.50500","volume":"174004"},{"datetime":"2022-06-06 15:37:00","open":"145.44000","high":"145.44000","low":"145.27000","close":"145.30000","volume":"189831"},{"datetime":"2022-06-06 15:36:00","open":"145.54890","high":"145.54890","low":"145.38000","close":"145.44000","volume":"101993"},{"datetime":"2022-06-06 15:35:00","open":"145.53000","high":"145.56000","low":"145.41000","close":"145.54500","volume":"114006"},{"datetime":"2022-06-06 15:34:00","open":"145.58501","high":"145.60789","low":"145.50999","close":"145.52010","volume":"108473"},{"datetime":"2022-06-06 
15:33:00","open":"145.53999","high":"145.60500","low":"145.47000","close":"145.58501","volume":"133996"},{"datetime":"2022-06-06 15:32:00","open":"145.56500","high":"145.64000","low":"145.46030","close":"145.53999","volume":"131019"},{"datetime":"2022-06-06 15:31:00","open":"145.34500","high":"145.60001","low":"145.34000","close":"145.58800","volume":"238105"},{"datetime":"2022-06-06 15:30:00","open":"145.34500","high":"145.35001","low":"145.27000","close":"145.34000","volume":"136026"}],"status":"ok"}
I am interested in the "values" section, the datetime,h,l,o,c,v values and I want to import them into a dataframe.
My code is:
resp = requests.get(url)
which generates the above response. Then:
df = pd.DataFrame(resp)
which provides this (iterating a requests.Response yields raw byte chunks, hence the b'...' fragments):
0 b'{"meta":{"symbol":"AAPL","interval":"1day","...
1 b'":"XNGS","type":"Common Stock"},"values":[{"...
2 b'e":"146.14000","volume":"65217850"},{"dateti...
3 b'5.39000","volume":"88471302"},{"datetime":"2...
4 b'1","volume":"72348100"},{"datetime":"2022-06...
How can I skip the meta section and populate the dataframe only with the values that I need?
I have tried:
df = pd.DataFrame(resp.meta.values)
and
df = pd.DataFrame(resp['meta']['values'])
which raise AttributeError (no attribute 'meta') and TypeError (not subscriptable) respectively, because resp is a requests.Response object, not a parsed dict.
Edit to fit actual solution:
You should be able to load your API response with:
data = resp.json()
pd.DataFrame(data['values'])
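Since every field in "values" comes back as a string, here is a short follow-up sketch (the float/datetime conversions are an addition, not part of the original answer; column names are taken from the response above):

data = resp.json()
df = pd.DataFrame(data['values'])
# the API returns all fields as strings, so convert them explicitly
df['datetime'] = pd.to_datetime(df['datetime'])
df[['open', 'high', 'low', 'close', 'volume']] = df[['open', 'high', 'low', 'close', 'volume']].astype(float)
# index by timestamp, oldest first
df = df.set_index('datetime').sort_index()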
Related
Most efficient way of converting RESTful output to dataframe
I have output from a REST call that I've converted to JSON. It's a highly nested collection of dicts and lists, but I'm eventually able to convert it to a dataframe as follows:

import pandas as pd
from requests import get

url = 'http://stats.oecd.org/SDMX-JSON/data/MEI_FIN/IR3TIB.GBR+USA.M/all'
params = {'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'}
r = get(url, params=params)
x = r.json()
d = x['dataSets'][0]['series']
a = pd.DataFrame(d['0:0:0']['observations'])
b = pd.DataFrame(d['0:1:0']['observations'])

This works, absent some manipulation to make it easier to work with, and as there are multiple time series I can do a version of the same for each, but it goes without saying it's kind of clunky. Is there a better/cleaner way to do this?
The pandasdmx library makes this super-simple:

import pandasdmx as sdmx

df = sdmx.Request('OECD').data(
    resource_id='MEI_FIN',
    key='IR3TIB.GBR+USA.M',
    params={'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'},
).write()
Absent any responses, here's the solution I came up with. I added a list comprehension to get each series into a dataframe, and then a transpose, as this source resulted in the series being aligned across rows instead of down columns.

import pandas as pd
from requests import get

url = 'http://stats.oecd.org/SDMX-JSON/data/MEI_FIN/IR3TIB.GBR+USA.M/all'
params = {'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'}
r = get(url, params=params)
x = r.json()
d = x['dataSets'][0]['series']
df = [pd.DataFrame(d[i]['observations']).loc[0] for i in d]
df = pd.DataFrame(df).T
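A slightly tidier variant of the same idea, sketched here as an addition (not from the original post), uses pd.concat so the series keys, e.g. '0:0:0', become the column names directly:

import pandas as pd
from requests import get

url = 'http://stats.oecd.org/SDMX-JSON/data/MEI_FIN/IR3TIB.GBR+USA.M/all'
params = {'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'}
d = get(url, params=params).json()['dataSets'][0]['series']
# one column per series, keyed by its series id
df = pd.concat({k: pd.DataFrame(v['observations']).loc[0] for k, v in d.items()}, axis=1)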
Can't figure out why I get error when trying to get JSON value into DF
Been looking online but can't figure out why I'm getting the error, as the data is available in the JSON. I'm trying to extract the "pull_request_contributors" value from the JSON and put it into a DF. I get the error:

KeyError: "Try running with errors='ignore' as key 'pull_request_contributors' is not always present"

Code

cg = CoinGeckoAPI()
ts = '01-01-2017'
cs = 'bitcoin'
# get data
result = cg.get_coin_history_by_id(cs, ts)
# pull_request_contributors
df_pr = pd_json.json_normalize(data, record_path='developer_data', meta=['pull_request_contributors']).set_index(ts)

JSON

{'community_data': {'facebook_likes': 40055, 'reddit_accounts_active_48h': '4657.4', 'reddit_average_comments_48h': 186.5, 'reddit_average_posts_48h': 3.75, 'reddit_subscribers': 1014816, 'twitter_followers': 64099}, 'developer_data': {'closed_issues': 3845, 'commit_count_4_weeks': 245, 'forks': 22024, 'pull_request_contributors': 564, 'pull_requests_merged': 6163, 'stars': 36987, 'subscribers': 3521, 'total_issues': 4478}...

Expected output

date        bitcoin
01-01-2017      564
Since the field pull_request_contributors is not available in each object, pandas cannot build the dataframe. Run

df_pr = pd_json.json_normalize(data, record_path='developer_data', meta=['pull_request_contributors'], errors='ignore').set_index(ts)

to ignore missing fields.

EDIT

json_normalize creates a table with all fields as columns, and their values make up the rows. So for what you want to achieve I wouldn't go with json_normalize, since you already know which particular field you want to read. Here's how I would do it:

ts = '01-01-2017'
cs = 'bitcoin'
df_pr = pd_json.json_normalize(data['developer_data'])
df = pd.DataFrame(data=[{'date': ts, cs: data['developer_data']['pull_request_contributors']}]).set_index('date')

This way we simply construct the DataFrame, without first normalizing the response. If the response is a string and not a dict (I don't know what the CoinGeckoAPI returns), you can decode it first with

import json
data = json.loads(json_string)

Hope this helps.
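For reference, a self-contained sketch of that direct construction (developer_data abbreviated from the question's JSON; the printed output is shown as a comment):

import pandas as pd

ts, cs = '01-01-2017', 'bitcoin'
data = {'developer_data': {'pull_request_contributors': 564}}  # abbreviated from the question
df = pd.DataFrame([{'date': ts, cs: data['developer_data']['pull_request_contributors']}]).set_index('date')
print(df)
#             bitcoin
# date
# 01-01-2017      564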
Create Proper Dataframe from SDMX Response, Python 3.6
I want to prepare a dataset from the data available at http://stat.data.abs.gov.au/Index.aspx?DataSetCode=ATSI_BIRTHS_SUMM

Data API: http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetData/ATSI_BIRTHS_SUMM/1+4+5+7+8+9+10+13+14+15+18+19+20.IM+IB.0+1+2+3+4+5+6+7.A/all

from pandasdmx import Request

Agency_Code = 'ABS'
Dataset_Id = 'ATSI_BIRTHS_SUMM'
ABS = Request(Agency_Code)
data_response = ABS.data(resource_id='ATSI_BIRTHS_SUMM')
print(data_response.url)
DF = data_response.write(data_response.data.obs(with_values=True, with_attributes=True), parse_time=False)

The above gives the error:

ValueError: Type names and field names cannot be a keyword: 'None'

DF = data_response.write(data_response.data.series, parse_time=False) works, but the dimension items come in column-wise.

Support links:
http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetDataStructure/all
http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetDataStructure/ATSI_BIRTHS_SUMM
http://stat.data.abs.gov.au/Index.aspx?DataSetCode=ATSI_BIRTHS_SUMM

Please suggest a better way to retrieve the data.
Your example

DF = data_response.write(data_response.data.series, parse_time=False)

produces a stacked DataFrame; with unstack().reset_index() you will get a "flat" DataFrame:

data_response.write().unstack().reset_index()

  MEASURE INDIGENOUS_STATUS ASGS_2011 FREQUENCY TIME_PERIOD       0
0       1                IM         0         A        2001  8334.0

Is this what you are looking for?
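If it helps to see what the unstack is doing, here is a toy stand-in (a hypothetical two-series frame, not the actual SDMX response) that reproduces the shape of the output above:

import pandas as pd

# stand-in for data_response.write(): dimensions in the columns, time in the index
cols = pd.MultiIndex.from_tuples(
    [('1', 'IM', '0', 'A'), ('1', 'IB', '0', 'A')],
    names=['MEASURE', 'INDIGENOUS_STATUS', 'ASGS_2011', 'FREQUENCY'])
stacked = pd.DataFrame([[8334.0, 120.0]],  # 120.0 is a made-up second value
                       index=pd.Index(['2001'], name='TIME_PERIOD'),
                       columns=cols)
flat = stacked.unstack().reset_index()  # one row per (dimensions, period) combination
print(flat)
#   MEASURE INDIGENOUS_STATUS ASGS_2011 FREQUENCY TIME_PERIOD       0
# 0       1                IM         0         A        2001  8334.0
# 1       1                IB         0         A        2001   120.0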
Cannot Access dict with duplicate values dynamically from Api - Python for Binance
I am attempting to format this API: https://www.binance.com/api/v1/ticker/allBookTickers

Here is an abbreviated version of the response:

[{"symbol":"ETHBTC","bidPrice":"0.07200500","bidQty":"0.67800000","askPrice":"0.07203200","askQty":"7.19200000"},{"symbol":"LTCBTC","bidPrice":"0.01281100","bidQty":"10.90000000","askPrice":"0.01282500","askQty":"1.01000000"}]

Each dict is saved as an index in the list. My issue is that each dict starts with 'symbol' rather than with the name, like 'ETHBTC'. I can call by index number, but as there are hundreds of dicts in the API response, I need a method of typing in, for instance, 'ETHBTC' to get that dict. This is what it would look like in an ideal world, but I have no idea how to achieve this; any help would be greatly appreciated:

data = requests.get('https://www.binance.com/api/v1/ticker/allBookTickers')
data = data.json()
ltc = data['LTCBTC']
Use the following code:

import requests

# fetch data from the url using requests
data = requests.get('https://www.binance.com/api/v1/ticker/allBookTickers')
# parse the JSON response
dataJson = data.json()
# build a dictionary from the list, using each symbol value as the key
dataDictionary = {d['symbol']: d for d in dataJson}
# access a ticker by its symbol
ltc = dataDictionary['LTCBTC']
print(ltc)
# now you can read ltc values by key, and so on for other values
print(ltc['askPrice'])

In this code we created a Python dictionary from the response returned.
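And since this thread is about dataframes: the same symbol-keyed lookup can also be done with pandas (a sketch of my own, not part of the original answer):

import pandas as pd
import requests

data = requests.get('https://www.binance.com/api/v1/ticker/allBookTickers').json()
df = pd.DataFrame(data).set_index('symbol')  # one row per ticker, keyed by symbol
print(df.loc['LTCBTC', 'askPrice'])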
Pandas Google Distance Matrix API - Pass coordinates into URL
I am working with the Google Distance Matrix API, where I want to feed coordinates from a dataframe into the API and return the duration and distance between the two points. Here is my dataframe:

import pandas as pd
import simplejson
import urllib
import numpy as np

Record    orig_lat     orig_lng    dest_lat     dest_lng
1       40.7484405  -74.0073127  40.7115242  -74.0145492
2       40.7421218  -73.9878531  40.7727216  -73.9863531

First, I need to combine orig_lat & orig_lng and dest_lat & dest_lng into strings, which I then pass into the URL. So I've tried creating the variables orig_coord & dest_coord, then passing them into the URL and returning values:

orig_coord = df[['orig_lat','orig_lng']].apply(lambda x: '{},{}'.format(x[0], x[1]), axis=1)
dest_coord = df[['dest_lat','dest_lng']].apply(lambda x: '{},{}'.format(x[0], x[1]), axis=1)

for row in df.itertuples():
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, end_coord)
    result = simplejson.load(urllib.urlopen(url))
    df['driving_time_text'] = result['rows'][0]['elements'][0]['duration']['text']

But I get the following error:

TypeError: <lambda>() got an unexpected keyword argument 'axis'

So my question is: how do I concatenate values from two columns into a string, then pass that string into a URL and output the result? Thank you in advance!
Hmm, I am not sure how you constructed your data frame. Maybe post those details? But if you can live with referencing tuple elements positionally, this worked for me:

import pandas as pd

data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549},
        {'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353}]
df = pd.DataFrame(data)

for row in df.itertuples():
    orig_coord = '{},{}'.format(row[1], row[2])
    dest_coord = '{},{}'.format(row[3], row[4])
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    print(url)

produces

http://maps.googleapis.com/maps/api/distancematrix/json?origins=40.748441,-74.007313&destinations=40.711524,-74.014549&units=imperial&MYGOOGLEAPIKEY
http://maps.googleapis.com/maps/api/distancematrix/json?origins=40.742122,-73.987853&destinations=40.772722,-73.986353&units=imperial&MYGOOGLEAPIKEY

To update the data frame with the result, since row is a tuple and not writeable, you might want to keep track of the current index as you iterate. Maybe something like this:

data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549, 'result': -1},
        {'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353, 'result': -1}]
df = pd.DataFrame(data)

for i_row, row in enumerate(df.itertuples()):
    orig_coord = '{},{}'.format(row[1], row[2])
    dest_coord = '{},{}'.format(row[3], row[4])
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    # do stuff to get your result, then write it back by position
    df.loc[i_row, 'result'] = result
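For what it's worth, the TypeError in the question usually means the .apply(..., axis=1) ended up running on a Series rather than a DataFrame; Series.apply has no axis argument and forwards it to the lambda. A vectorized way to build the coordinate strings that sidesteps apply entirely (my own sketch, reusing the question's column names):

orig_coord = df['orig_lat'].astype(str) + ',' + df['orig_lng'].astype(str)
dest_coord = df['dest_lat'].astype(str) + ',' + df['dest_lng'].astype(str)
# one URL per row, built without any explicit loop
urls = ('http://maps.googleapis.com/maps/api/distancematrix/json?origins=' + orig_coord
        + '&destinations=' + dest_coord + '&units=imperial&MYGOOGLEAPIKEY')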