Most efficient way of converting RESTful output to dataframe - python

I have output from a REST call that I've converted to JSON.
It's a highly nested collection of dicts and lists, but I'm eventually able to convert it to dataframe as follows:
import pandas as pd
from requests import get

url = 'http://stats.oecd.org/SDMX-JSON/data/MEI_FIN/IR3TIB.GBR+USA.M/all'
params = {
    'startTime': '2008-06',
    'dimensionAtObservation': 'TimeDimension'
}
r = get(url, params=params)
x = r.json()
d = x['dataSets'][0]['series']
a = pd.DataFrame(d['0:0:0']['observations'])
b = pd.DataFrame(d['0:1:0']['observations'])
This works, absent some manipulation to make it easier to work with, and since there are multiple time series I can repeat a version of the same for each, but it goes without saying that it's kind of clunky.
Is there a better/cleaner way to do this?

The pandasdmx library makes this super-simple:
import pandasdmx as sdmx
df = sdmx.Request('OECD').data(
    resource_id='MEI_FIN',
    key='IR3TIB.GBR+USA.M',
    params={'startTime': '2008-06', 'dimensionAtObservation': 'TimeDimension'},
).write()

Absent any responses, here's the solution I came up with. I added a list comprehension to get each series into a dataframe, and then a transpose, as this source resulted in the series being aligned across rows instead of down columns.
import pandas as pd
from requests import get

url = 'http://stats.oecd.org/SDMX-JSON/data/MEI_FIN/IR3TIB.GBR+USA.M/all'
params = {
    'startTime': '2008-06',
    'dimensionAtObservation': 'TimeDimension'
}
r = get(url, params=params)
x = r.json()
d = x['dataSets'][0]['series']
df = [pd.DataFrame(d[i]['observations']).loc[0] for i in d]
df = pd.DataFrame(df).T
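If you also want readable column names, SDMX-JSON responses typically carry a structure section describing the series dimensions. A hedged sketch that reconstructs a label for each series key (this assumes the usual SDMX-JSON layout, so check x['structure'] in your own response first):

# Assumption: x['structure']['dimensions']['series'] lists one dict per
# series dimension, each with a 'values' list of {'id': ..., 'name': ...}
dims = x['structure']['dimensions']['series']

def label(series_key):
    # a key like '0:1:0' holds one value index per series dimension
    idx = [int(i) for i in series_key.split(':')]
    return ':'.join(dims[pos]['values'][i]['id'] for pos, i in enumerate(idx))

# d iterates in the same order the columns were built above
df.columns = [label(k) for k in d]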

Related

How to convert a nested JSON object to a dataframe?

I am getting a JSON object returned from an API call which looks like this:
{"meta":{"symbol":"AAPL","interval":"1min","currency":"USD","exchange_timezone":"America/New_York","exchange":"NASDAQ","mic_code":"XNGS","type":"Common Stock"},"values":[{"datetime":"2022-06-06 15:59:00","open":"146.14999","high":"146.47000","low":"146.09000","close":"146.14000","volume":"1364826"},{"datetime":"2022-06-06 15:58:00","open":"146.14000","high":"146.17999","low":"146.08000","close":"146.14680","volume":"358111"},{"datetime":"2022-06-06 15:57:00","open":"146.30499","high":"146.33000","low":"146.13000","close":"146.14000","volume":"0"},{"datetime":"2022-06-06 15:56:00","open":"146.25999","high":"146.34500","low":"146.20000","close":"146.31000","volume":"306725"},{"datetime":"2022-06-06 15:55:00","open":"146.14999","high":"146.38000","low":"146.07001","close":"146.25999","volume":"384471"},{"datetime":"2022-06-06 15:54:00","open":"145.95000","high":"146.25999","low":"145.91000","close":"146.15500","volume":"287583"},{"datetime":"2022-06-06 15:53:00","open":"145.97000","high":"146.10001","low":"145.89760","close":"145.94569","volume":"231640"},{"datetime":"2022-06-06 15:52:00","open":"145.96500","high":"146.00000","low":"145.78999","close":"145.96500","volume":"189185"},{"datetime":"2022-06-06 15:51:00","open":"145.89000","high":"146.00000","low":"145.74001","close":"145.96001","volume":"182617"},{"datetime":"2022-06-06 15:50:00","open":"145.74001","high":"146.11290","low":"145.74001","close":"145.89500","volume":"376980"},{"datetime":"2022-06-06 15:49:00","open":"145.63499","high":"145.85001","low":"145.63000","close":"145.73000","volume":"190471"},{"datetime":"2022-06-06 15:48:00","open":"145.61000","high":"145.71001","low":"145.58000","close":"145.65131","volume":"138908"},{"datetime":"2022-06-06 15:47:00","open":"145.64999","high":"145.65500","low":"145.53999","close":"145.61011","volume":"166144"},{"datetime":"2022-06-06 15:46:00","open":"145.81500","high":"145.82500","low":"145.62061","close":"145.66000","volume":"175801"},{"datetime":"2022-06-06 15:45:00","open":"145.88989","high":"145.98000","low":"145.80780","close":"145.81880","volume":"161626"},{"datetime":"2022-06-06 15:44:00","open":"145.80000","high":"145.89000","low":"145.77000","close":"145.89000","volume":"89067"},{"datetime":"2022-06-06 15:43:00","open":"145.95000","high":"145.97000","low":"145.78500","close":"145.80000","volume":"180386"},{"datetime":"2022-06-06 15:42:00","open":"145.84000","high":"146.09000","low":"145.82001","close":"145.96989","volume":"377760"},{"datetime":"2022-06-06 15:41:00","open":"145.59000","high":"145.86000","low":"145.59000","close":"145.83730","volume":"283091"},{"datetime":"2022-06-06 15:40:00","open":"145.46001","high":"145.60001","low":"145.36000","close":"145.58501","volume":"159567"},{"datetime":"2022-06-06 15:39:00","open":"145.50999","high":"145.56850","low":"145.45000","close":"145.47009","volume":"113975"},{"datetime":"2022-06-06 15:38:00","open":"145.30000","high":"145.50880","low":"145.24010","close":"145.50500","volume":"174004"},{"datetime":"2022-06-06 15:37:00","open":"145.44000","high":"145.44000","low":"145.27000","close":"145.30000","volume":"189831"},{"datetime":"2022-06-06 15:36:00","open":"145.54890","high":"145.54890","low":"145.38000","close":"145.44000","volume":"101993"},{"datetime":"2022-06-06 15:35:00","open":"145.53000","high":"145.56000","low":"145.41000","close":"145.54500","volume":"114006"},{"datetime":"2022-06-06 15:34:00","open":"145.58501","high":"145.60789","low":"145.50999","close":"145.52010","volume":"108473"},{"datetime":"2022-06-06 
15:33:00","open":"145.53999","high":"145.60500","low":"145.47000","close":"145.58501","volume":"133996"},{"datetime":"2022-06-06 15:32:00","open":"145.56500","high":"145.64000","low":"145.46030","close":"145.53999","volume":"131019"},{"datetime":"2022-06-06 15:31:00","open":"145.34500","high":"145.60001","low":"145.34000","close":"145.58800","volume":"238105"},{"datetime":"2022-06-06 15:30:00","open":"145.34500","high":"145.35001","low":"145.27000","close":"145.34000","volume":"136026"}],"status":"ok"}
I am interested in the "values" section (the datetime, open, high, low, close, and volume values) and I want to import them into a dataframe.
My code is:
resp = requests.get(url)
which generates the above response. Then:
df = pd.DataFrame(resp)
which provides this:
0 b'{"meta":{"symbol":"AAPL","interval":"1day","...
1 b'":"XNGS","type":"Common Stock"},"values":[{"...
2 b'e":"146.14000","volume":"65217850"},{"dateti...
3 b'5.39000","volume":"88471302"},{"datetime":"2...
4 b'1","volume":"72348100"},{"datetime":"2022-06...
How can I skip the meta section and populate the dataframe only with the values that I need?
I have tried:
df = pd.DataFrame(resp.meta.values)
and
df = pd.DataFrame(resp['meta']['values'])
which return errors ("no attribute meta" and "not subscriptable", respectively).
Edit to fit actual solution:
You should be able to load your API response with:
data = resp.json()
pd.DataFrame(data['values'])
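Since this API returns every field as a string, a follow-up sketch for converting the types and indexing by time (column names taken from the response above):

df = pd.DataFrame(data['values'])
# all fields arrive as strings, so convert them
df['datetime'] = pd.to_datetime(df['datetime'])
cols = ['open', 'high', 'low', 'close', 'volume']
df[cols] = df[cols].astype(float)
df = df.set_index('datetime').sort_index()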

How can I make two values in the same column into separate columns in Python

With these commands:
import requests
import pandas as pd
url = "https://api.binance.com/api/v3/depth?symbol=BNBUSDT&limit=100"
payload = {}
headers = {
    'Content-Type': 'application/json'
}
response = requests.request("GET", url, headers=headers, data=payload).json()
depth = pd.DataFrame(response, columns=["bids","asks"])
print(depth)
outputs:
                           bids                           asks
0   [382.40000000, 86.84800000]   [382.50000000, 196.24600000]
1  [382.30000000, 174.26400000]   [382.60000000, 116.10300000]
First I need to change the table to this format:
   bidsValue     bidsQuantity  asksValue     asksQuantity  rangeTotalbidsQuantity  rangeTotalasksQuantity
0  382.40000000  86.84800000   382.50000000  196.24600000
1  382.30000000  174.26400000  382.60000000  116.10300000
Then I need to convert the column values to float so I can calculate the total quantity over a specific range of values (e.g. from row 0, where bidsValue is 400.00, down to the row where bidsValue is 380.00; "?" because I don't know that row number in advance). First I must find the row number where bidsValue is 380.00 (for example it is 59), then sum bidsQuantity from row 0 to row 59, and finally write the result into the rangeTotalbidsQuantity column.
I've been trying for hours and have tried many commands from pandas.DataFrame, but I couldn't do it.
Thank you!
You can use this solution.
In your case this would look something like this:
depth['bidsValue'], depth['bidsQuantity'] = zip(*list(depth['bids'].values))
depth['asksValue'], depth['asksQuantity'] = zip(*list(depth['asks'].values))
depth = depth.drop(columns=['bids', 'asks'])
For the second part look at this tutorial.
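As a sketch of that second part (my own take, assuming the bids are sorted from best price downwards and using the column names from above):

# convert the string values to float so they can be compared and summed
depth[['bidsValue', 'bidsQuantity']] = depth[['bidsValue', 'bidsQuantity']].astype(float)
# sum bidsQuantity over all rows whose bidsValue falls in the target range
mask = (depth['bidsValue'] <= 400.00) & (depth['bidsValue'] >= 380.00)
depth['rangeTotalbidsQuantity'] = depth.loc[mask, 'bidsQuantity'].sum()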
You can use for example:
pd.DataFrame(df['bids'].tolist())
Then concatenate that to the original dataframe.
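Put together, that could look like this (a sketch; the column names are the ones the question asks for):

bids = pd.DataFrame(depth['bids'].tolist(), columns=['bidsValue', 'bidsQuantity'])
asks = pd.DataFrame(depth['asks'].tolist(), columns=['asksValue', 'asksQuantity'])
# drop the original list columns and attach the expanded ones
depth = pd.concat([depth.drop(columns=['bids', 'asks']), bids, asks], axis=1)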
You can apply() pandas.Series and assign the result to new columns
depth[ ['bidsValue', 'bidsQuantity'] ] = depth['bids'].apply(pd.Series)
depth[ ['asksValue', 'asksQuantity'] ] = depth['asks'].apply(pd.Series)
and afterwards remove the original bids and asks columns
depth = depth.drop(columns=['bids', 'asks'])
Full working code with other changes
import requests
import pandas as pd
url = "https://api.binance.com/api/v3/depth"
payload = {
    'symbol': 'BNBUSDT',
    'limit': 100,
}
response = requests.get(url, params=payload)
#print(response.status_code)
data = response.json()
depth = pd.DataFrame(data, columns=["bids","asks"])
#print(depth)
depth[ ['bidsValue', 'bidsQuantity'] ] = depth['bids'].apply(pd.Series)
depth[ ['asksValue', 'asksQuantity'] ] = depth['asks'].apply(pd.Series)
depth = depth.drop(columns=['bids', 'asks'])
print(depth.to_string())

How to update a pandas dataframe from multiple API calls

I need to write a Python script to:
Read a csv file with the columns (person_id, name, flag). The file has 3000 rows.
Based on the person_id from the csv file, I need to call a URL passing the person_id to do a GET
http://api.myendpoint.intranet/get-data/1234
The URL will return some information about the person_id, like the example below. I need to get all the rents objects and save them to my csv. Here is what I have tried so far:
import pandas as pd
import requests

ids = pd.read_csv(f"{path}/data.csv", delimiter=';')
person_rents = df = pd.DataFrame([], columns=list('person_id','carId','price','rentStatus'))
for id in ids:
    response = request.get(f'endpoint/{id["person_id"]}')
    json = response.json()
    person_rents.append([person_id, rent['carId'], rent['price'], rent['rentStatus']])
pd.read_csv(f"{path}/data.csv", delimiter=';' )
person_id;name;flag;cardId;price;rentStatus
1000;Joseph;1;6638;1000;active
1000;Joseph;1;5566;2000;active
Response example:
{
    "active": false,
    "ctodx": false,
    "rents": [{
        "carId": 6638,
        "price": 1000,
        "rentStatus": "active"
    }, {
        "carId": 5566,
        "price": 2000,
        "rentStatus": "active"
    }],
    "responseCode": "OK",
    "status": [{
        "request": 345,
        "requestStatus": "F"
    }, {
        "requestId": 678,
        "requestStatus": "P"
    }],
    "transaction": false
}
After saving the additional data from the response to the csv, I need to get data from another endpoint, using the carId in the URL. The mileage result must be saved in the same csv.
http://api.myendpoint.intranet/get-mileage/6638
http://api.myendpoint.intranet/get-mileage/5566
The return for each call will be like this:
{"mileage":1000.0000}
{"mileage":550.0000}
The final output must be
person_id;name;flag;cardId;price;rentStatus;mileage
1000;Joseph;1;6638;1000;active;1000.0000
1000;Joseph;1;5566;2000;active;550.0000
Can someone help me with this script? It could be done with pandas or any Python 3 lib.
Code Explanation
Create dataframe, df, with pd.read_csv.
It is expected that all of the values in 'person_id', are unique.
Use .apply on 'person_id', to call prepare_data.
prepare_data expects 'person_id' to be a str or int, as indicated by the type annotation, Union[int, str]
Inside prepare_data, call the API, which will return a dict.
Convert the 'rents' key, of the dict, into a dataframe, with pd.json_normalize.
Use .apply on 'carId', to call the API, and extract the 'mileage', which is added to dataframe data, as a column.
Add 'person_id' to data, which can be used to merge df with s.
Convert the pd.Series s into a single dataframe with pd.concat, and then merge df and s on person_id.
Save to a csv with pd.to_csv in the desired form.
Potential Issues
If there's an issue, it's most likely to occur in the call_api function.
As long as call_api returns a dict, like the response shown in the question, the remainder of the code will work correctly to produce the desired output.
import pandas as pd
import requests
import json
from typing import Union

def call_api(url: str) -> dict:
    r = requests.get(url)
    return r.json()

def prepare_data(uid: Union[int, str]) -> pd.DataFrame:
    d_url = f'http://api.myendpoint.intranet/get-data/{uid}'
    m_url = 'http://api.myendpoint.intranet/get-mileage/'
    # get the rent data from the api call
    rents = call_api(d_url)['rents']
    # normalize rents into a dataframe
    data = pd.json_normalize(rents)
    # get the mileage data from the api call and add it to data as a column
    data['mileage'] = data.carId.apply(lambda cid: call_api(f'{m_url}{cid}')['mileage'])
    # add person_id as a column to data, which will be used to merge data to df
    data['person_id'] = uid
    return data

# read data from file
df = pd.read_csv('file.csv', sep=';')
# call prepare_data
s = df.person_id.apply(prepare_data)
# s is a Series of DataFrames, which can be combined with pd.concat
s = pd.concat([v for v in s])
# join df with s, on person_id
df = df.merge(s, on='person_id')
# save to csv
df.to_csv('output.csv', sep=';', index=False)
If there are any errors when running this code:
Leave a comment, to let me know.
Edit your question, and paste the entire traceback, as text, into a code block.
Example
# given the following start dataframe
   person_id    name  flag
0       1000  Joseph     1
1        400     Sam     1

# resulting dataframe using the same data for both id 1000 and 400
   person_id    name  flag  carId  price rentStatus  mileage
0       1000  Joseph     1   6638   1000     active   1000.0
1       1000  Joseph     1   5566   2000     active   1000.0
2        400     Sam     1   6638   1000     active   1000.0
3        400     Sam     1   5566   2000     active   1000.0
There are many different ways to implement this. One of them would be, like you started in your comment:
read the CSV file with pandas
for each line take the person_id and build a call
the delivered JSON response can then be taken from the rents
the carId is then extracted for each individual rental
finally this is collected in a row_list
the row_list is then converted back to csv via pandas
A very simple solution without any error handling could look something like this:
from types import SimpleNamespace
import pandas as pd
import requests
import json

path = '/some/path/'
df = pd.read_csv(f'{path}/data.csv', delimiter=';')

rows_list = []
for _, row in df.iterrows():
    rentCall = f'http://api.myendpoint.intranet/get-data/{row.person_id}'
    print(rentCall)
    response = requests.get(rentCall)
    r = json.loads(response.text, object_hook=lambda d: SimpleNamespace(**d))
    for rent in r.rents:
        mileageCall = f'http://api.myendpoint.intranet/get-mileage/{rent.carId}'
        print(mileageCall)
        response2 = requests.get(mileageCall)
        m = json.loads(response2.text, object_hook=lambda d: SimpleNamespace(**d))
        state = "active" if r.active else "inactive"
        rows_list.append((row['person_id'], row['name'], row['flag'], rent.carId, rent.price, state, m.mileage))

df = pd.DataFrame(rows_list, columns=('person_id', 'name', 'flag', 'carId', 'price', 'rentStatus', 'mileage'))
print(df.to_csv(index=False, sep=';'))
Speeding up with multiprocessing
You mention that you have 3000 rows, which means that you'll have to make a lot of API calls. Depending on the connection, every one of these calls might take a while. As a result, performing this in a sequential way might be too slow. The majority of the time, your program will just be waiting on a response from the server without doing anything else.
We can improve this performance by using multiprocessing.
I use all the code from Trenton's answer, but I replace the following sequential call:
# call prepare_data
s = df.person_id.apply(prepare_data)
With a parallel alternative:
from multiprocessing import Pool

n_processes = 20  # Experiment with this to see what works well
with Pool(n_processes) as p:
    s = p.map(prepare_data, df.person_id)
Alternatively, a thread pool might be faster, since these API calls are I/O-bound, but you'll have to test that by replacing the import with from multiprocessing.pool import ThreadPool as Pool, as in the sketch below.
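Put together, the thread-based variant would look like this (a sketch; prepare_data and df come from the answer above):

from multiprocessing.pool import ThreadPool as Pool

n_threads = 20  # tune this; threads suit I/O-bound API calls
with Pool(n_threads) as p:
    s = p.map(prepare_data, df.person_id)
# p.map returns a list of DataFrames, so the later
# s = pd.concat([v for v in s]) step still works unchanged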

Pandas Google Distance Matrix API - Pass coordinates into URL

I am working with the Google Distance Matrix API, where I want to feed coordinates from a dataframe into the API and return the duration and distance between the two points.
Here is my dataframe:
import pandas as pd
import simplejson
import urllib
import numpy as np
Record    orig_lat     orig_lng    dest_lat     dest_lng
1       40.7484405  -74.0073127  40.7115242  -74.0145492
2       40.7421218  -73.9878531  40.7727216  -73.9863531
First, I need to combine orig_lat & orig_lng and dest_lat & dest_lng into strings, which I then pass into the URL. So I've tried creating the variables orig_coord & dest_coord, then passing them into the URL and returning values:
orig_coord = df[['orig_lat','orig_lng']].apply(lambda x: '{},{}'.format(x[0],x[1]), axis=1)
dest_coord = df[['dest_lat','dest_lng']].apply(lambda x: '{},{}'.format(x[0],x[1]), axis=1)
for row in df.itertuples():
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord,end_coord)
    result = simplejson.load(urllib.urlopen(url))
    df['driving_time_text'] = result['rows'][0]['elements'][0]['duration']['text']
But I get the following error: "TypeError: <lambda>() got an unexpected keyword argument 'axis'"
So my question is: how do I concatenate values from two columns into a string, then pass that string into a URL and output the result?
Thank you in advance!
Hmm, I am not sure how you constructed your data frame. Maybe post those details? But if you can live with referencing tuple elements positionally, this worked for me:
import pandas as pd

data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549},
        {'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353}]
df = pd.DataFrame(data)
for row in df.itertuples():
    orig_coord = '{},{}'.format(row[1], row[2])
    dest_coord = '{},{}'.format(row[3], row[4])
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    print(url)
produces
http://maps.googleapis.com/maps/api/distancematrix/json?origins=40.748441,-74.007313&destinations=40.711524,-74.014549&units=imperial&MYGOOGLEAPIKEY
http://maps.googleapis.com/maps/api/distancematrix/json?origins=40.742122,-73.987853&destinations=40.772722,-73.986353&units=imperial&MYGOOGLEAPIKEY
To update the data frame with the result, since row is a tuple and not writeable, you might want to keep track of the current index as you iterate. Maybe something like this:
data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549, 'result': -1},
        {'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353, 'result': -1}]
df = pd.DataFrame(data)
i_row = 0
for row in df.itertuples():
    orig_coord = '{},{}'.format(row[1], row[2])
    dest_coord = '{},{}'.format(row[3], row[4])
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    # Do stuff to get your result
    df.loc[i_row, 'result'] = result
    i_row += 1
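To fill in the "Do stuff" step, a sketch using requests (the rows/elements parsing follows the question's own code; treat the exact URL parameters as assumptions):

import requests

resp = requests.get(url).json()
# same response layout the question parses
result = resp['rows'][0]['elements'][0]['duration']['text']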

Python pandas map dict keys to values

I have a csv for input, whose row values I'd like to join into a new field. This new field is a constructed url, which will then be processed by the requests.post() method.
I am constructing my url correctly, but my issue is with the data object that should be passed to requests. How can I have the correct values passed to their proper keys when my dictionary is unordered? If I need to use an ordered dict, how can I properly set it up with my current format?
Here is what I have:
import pandas as pd
import numpy as np
import requests
test_df = pd.read_csv('frame1.csv')
headers = {'content-type': 'application/x-www-form-urlencoded'}
test_df['FIRST_NAME'] = test_df['FIRST_NAME'].astype(str)
test_df['LAST_NAME'] = test_df['LAST_NAME'].astype(str)
test_df['ADDRESS_1'] = test_df['ADDRESS_1'].astype(str)
test_df['CITY'] = test_df['CITY'].astype(str)
test_df['req'] = 'site-url.com?' + '&FIRST_NAME=' + test_df['FIRST_NAME'] + '&LAST_NAME=' + \
                 test_df['LAST_NAME'] + '&ADDRESS_1=' + test_df['ADDRESS_1'] + '&CITY=' + test_df['CITY']
arr = test_df.values
d = {'FIRST_NAME': test_df['FIRST_NAME'], 'LAST_NAME': test_df['LAST_NAME'],
     'ADDRESS_1': test_df['ADDRESS_1'], 'CITY': test_df['CITY']}
test_df = pd.DataFrame(arr[0:, 0:], columns=d, dtype=np.str)
data = test_df.to_dict()
data = {k: v for k, v in data.items()}
test_df['raw_result'] = test_df['req'].apply(lambda x: requests.post(x, headers=headers,
                                                                     data=data).content)
test_df.to_csv('frame1_result.csv')
I tried to map values to keys with a dict comprehension, but the assignment of a key like FIRST_NAME could end up mapping to values from an arbitrary field like test_df['CITY'].
Not sure if I understand the problem correctly. However, you can pass an argument to the to_dict function, e.g.
data = test_df.to_dict(orient='records')
which will give you output as follows: [{'FIRST_NAME': ..., 'LAST_NAME': ...}, {'FIRST_NAME': ..., 'LAST_NAME': ...}] (a list with the same length as test_df). This might be one easy way to map each payload to the correct row.
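For instance (a sketch, assuming test_df still has the req column and the headers built in the question), each request can then be paired with its own row's payload:

records = test_df.to_dict(orient='records')
# send one POST per row, with that row's field values as the form data
# (note: each rec also contains the 'req' field; drop it first if the endpoint objects)
test_df['raw_result'] = [
    requests.post(url, headers=headers, data=rec).content
    for url, rec in zip(test_df['req'], records)
]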
