Python: converting URL JSON response to pandas DataFrame

Hi, I am making a call to a web service from Python with the following code:
import json
import urllib.request
import pandas as pd
response = urllib.request.urlopen(req)
string = response.read().decode('utf-8')
json_obj = json.loads(string)
df = pd.DataFrame(json_obj)
print(df)
The result of this is:
Results
forecast [2.1632421537363355]
index [{'SaleDate': 1644278400000, 'OfferingGroupId': 0...
prediction_interval [[-114.9747272420262, 119.30121154949884]]
What I am trying to do now is to get the data in the DataFrame as:
Forecast SaleDate OfferingGroupId
2.1632421537363355 2022-02-08 0
I have tried so many different things that I have lost count.
Could you please help me with this?

You could first convert the JSON string to a dictionary (thanks @JonSG):
import json
response = urllib.request.urlopen(req)
string = response.read().decode('utf-8')
data = json.loads(string)
or, if you make the request with the requests library instead of urllib, use the json method of its response object:
data = response.json()
then use pandas.json_normalize, where you can directly pass in the record and meta paths of your data, to convert the dictionary to a pandas DataFrame:
import pandas as pd
out = pd.json_normalize(data['Results'], record_path = ['index'], meta = ['forecast'])
Output:
SaleDate OfferingGroupId forecast
0 1644278400000 0 2.163242
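SaleDate in this output is still an epoch timestamp in milliseconds. To match the desired output from the question, a short follow-up conversion should do it (a sketch using the column names shown above):
out['SaleDate'] = pd.to_datetime(out['SaleDate'], unit='ms').dt.date
out = out.rename(columns={'forecast': 'Forecast'})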

Related

How to extract data from an api using python and convert it into a pandas data frame

I want to load the data from an API into a pandas data frame. How may I do that? The following is my code snippet:
import requests
import json
response_API = requests.get('https://data.spiceai.io/eth/v0.1/gasfees?period=1d')
#print(response_API.status_code)
data = response_API.text
parse_json = json.loads(data)
Almost there. The JSON is clean, so you can feed it directly into a DataFrame:
import requests
import pandas as pd
response_API = requests.get('https://data.spiceai.io/eth/v0.1/gasfees?period=1d')
data = response_API.json()
df = pd.DataFrame(data)
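If the payload ever comes back with nested objects rather than a flat list of records, pandas.json_normalize is usually a safer default than the plain DataFrame constructor (a minimal sketch, assuming data is a list of possibly nested dicts):
df = pd.json_normalize(data)  # flattens nested keys into dotted column names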

How to read Json data with unbalanced array length in Python

I have been trying to fetch JSON data from an API using Python so that I can transfer that data to a sqlite3 database. The issue is that the data is unbalanced. My end goal is to transfer this JSON data to a .db file in sqlite3.
Here is what I did:
import pandas as pd
url = "https://baseballsavant.mlb.com/gf?game_pk=635886"
df = pd.read_json(url)
print(df)
This is the error I am getting:
raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
It's not obvious what you want your final DataFrame to look like, but passing orient='index' avoids the problem in this case.
import pandas as pd
url = "https://baseballsavant.mlb.com/gf?game_pk=635886"
df = pd.read_json(url, orient='index')
print(df)
You could also request the data with, for example, the requests module and prepare it before loading it into a DataFrame:
import requests
import pandas as pd
url = "https://baseballsavant.mlb.com/gf?game_pk=635886"
response = requests.get(url)
data = response.json()
"""
Do data transformations here
"""
df = pd.DataFrame.from_dict(data)
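For example, one generic preparation step (a sketch that assumes nothing about this API's key names) is to keep only the top-level entries that are lists of records and normalize each one into its own DataFrame:
import requests
import pandas as pd
url = "https://baseballsavant.mlb.com/gf?game_pk=635886"
data = requests.get(url).json()
# one DataFrame per top-level key whose value is a list of records;
# the filter skips the scalar and dict entries of unequal "length"
frames = {key: pd.json_normalize(value)
          for key, value in data.items()
          if isinstance(value, list) and value and isinstance(value[0], dict)}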

Is there an easy way to convert this API get request into a DataFrame?

I am trying to get this US Census Bureau API data GET request into a dataframe. I thought it was a list of lists, but it is showing up as a NoneType. Is there a way to make this into a dataframe that could be easily exported to a CSV file?
import requests
# The Basic API Request:
# Build base URL
HOST = "https://api.census.gov/data"
year = "2010"
dataset = "dec/sf1"
base_url = "/".join([HOST, year, dataset])
# Specify Census variables and other predicates
get_vars = ["NAME","P013001","P037001"]
predicates = {}
predicates["get"] = ",".join(get_vars)
predicates["for"] = "state:*"
# Execute the request, examine text of response object
data = requests.get(base_url, params=predicates)
print(data.text)
This does produce the following output:
[["NAME","P013001","P037001","state"],
["Alabama","37.9","3.02","01"],
["Alaska","33.8","3.21","02"],
["Arizona","35.9","3.19","04"],
...
["Wyoming","36.8","2.96","56"],
["Puerto Rico","36.9","3.17","72"]]
data.text is a string, so you can parse it with the json module. Try this:
import json
import pandas as pd
data = pd.DataFrame(json.loads(data.text)[1:], columns=['NAME', 'P013001', 'P037001', 'state'])
and you'll get a DataFrame with one row per state and those four columns.
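Since the first row of the Census response is the header row, you could also build the column names from the payload itself instead of hardcoding them (a sketch reusing the request from the question):
import requests
import pandas as pd
resp = requests.get(base_url, params=predicates)
rows = resp.json()  # a list of lists; the first row holds the column names
df = pd.DataFrame(rows[1:], columns=rows[0])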

Empty JSON Objects

I have a few columns in my data that are encoded in JSON format. I am trying to convert this data into several columns in a pandas DataFrame. I am having problems with the empty JSON objects: they are not letting me use the json.loads function to decode the JSON into several columns.
Here is my code:
from json import loads, dumps
import pandas as pd
json_columns = ['customDimensions','device','geoNetwork','hits','totals','trafficSource']
for json in json_columns:
    if df[json][0].startswith('['):
        df[json] = df[json].apply(lambda x: x[1:-1])
        df[json] = df[json].apply(lambda x: x.replace('\'', "\""))
    json_load = df[json].apply(loads)
    json_list = list(json_load)
    json_data = dumps(json_list)
    df = df.join(pd.read_json(json_data))
    df = df.drop(json, axis=1)
Here are some examples:
"customDimensions": [], "customMetrics": [], "customVariables": [],"experiment": []
I want these empty objects to return NULL when decoded into the pandas DataFrame. The hits column contains the empty JSON objects shown above, which results in the error below.
JSONDecodeError: Expecting value: line 1 column 79 (char 78)
I have shared a sample from my source data frame, in case anyone needs to have a look at it.
URL: https://drive.google.com/file/d/142tOk03WxxPF30xE9G_UtJ2j0KbpHWc4/view?usp=sharing
Thanks
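One possible workaround, assuming the failing cells really are the empty containers shown above: guard the loads call with a small helper (safe_loads is hypothetical, not part of any library) that maps empty strings and empty JSON containers to None before decoding:
from json import loads
def safe_loads(s):
    # hypothetical helper: treat empty strings and empty JSON
    # containers as missing values instead of raising JSONDecodeError
    if s is None or not s.strip() or s.strip() in ('[]', '{}'):
        return None
    return loads(s)
json_load = df[json].apply(safe_loads)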

Parse a json file to get the right columns to insert into bigquery

I'm relatively new to Python and I am trying to get some exchange rate data from the ECB free api:
GET https://api.exchangeratesapi.io/latest?base=GBP
I want to ultimately end up with this data in a BigQuery table. Loading the data into BQ is fine, but getting it into the right column/row format before sending it to BQ is the problem.
I want to end up with a table like this:
Currency Rate Date
CAD 1.629.. 2019-08-27
HKD 9.593.. 2019-08-27
ISK 152.6.. 2019-08-27
... ... ...
I've tried a few things but not quite got there yet:
import json
import requests
# api-endpoint
URL = "https://api.exchangeratesapi.io/latest?base=GBP"
# sending get request and saving the response as response object
r = requests.get(url=URL)
# extracting data in json format
data = r.json()
with open('data.json', 'w') as outfile:
    json.dump(data['rates'], outfile)
a_dict = {'date': '2019-08-26'}
with open('data.json') as f:
    data = json.load(f)
data.update(a_dict)
with open('data.json', 'w') as f:
    json.dump(data, f)
print(data)
Here is the original json file:
{
  "rates": {
    "CAD": 1.6296861353,
    "HKD": 9.593490542,
    "ISK": 152.6759753684,
    "PHP": 64.1305429339,
    "DKK": 8.2428443501,
    "HUF": 363.2604778172,
    "CZK": 28.4888284523,
    "GBP": 1.0,
    "RON": 5.2195062629,
    "SEK": 11.8475893558,
    "IDR": 17385.9684034803,
    "INR": 87.6742617713,
    "BRL": 4.9997236134,
    "RUB": 80.646191945,
    "HRK": 8.1744110201,
    "JPY": 130.2223254066,
    "THB": 37.5852652759,
    "CHF": 1.2042718318,
    "EUR": 1.1055465269,
    "MYR": 5.1255348081,
    "BGN": 2.1622278974,
    "TRY": 7.0550451616,
    "CNY": 8.6717964026,
    "NOK": 11.0104695256,
    "NZD": 1.9192287707,
    "ZAR": 18.6217151449,
    "USD": 1.223287232,
    "MXN": 24.3265563331,
    "SGD": 1.6981194654,
    "AUD": 1.8126540855,
    "ILS": 4.3032293014,
    "KRW": 1482.7479464473,
    "PLN": 4.8146551248
  },
  "base": "GBP",
  "date": "2019-08-23"
}
Welcome! How about this, as one way to tackle your problem.
# import the pandas library so we can use its from_dict function:
import pandas as pd
# subset the json to a dict of exchange rates and country codes:
d = data['rates']
# create a dataframe from this data, using pandas from_dict function:
df = pd.DataFrame.from_dict(d,orient='index')
# add a column for date (this value is taken from the json data):
df['date'] = data['date']
# name our columns, to keep things clean
df.columns = ['rate','date']
This gives you:
rate date
CAD 1.629686 2019-08-23
HKD 9.593491 2019-08-23
ISK 152.675975 2019-08-23
PHP 64.130543 2019-08-23
...
In this case the currency is the index of the DataFrame; if you'd prefer it as a column of its own, just add:
df['currency'] = df.index
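Alternatively, reset_index does the same in one step (the rename just gives the generated column a friendlier name):
df = df.reset_index().rename(columns={'index': 'currency'})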
You can then write this dataframe out to a .csv file, or write it into BigQuery.
For this I'd recommend you take a look at the BigQuery client library. It can be a little hard to get your head around at first, so you may also want to check out pandas.DataFrame.to_gbq, which is easier but less robust (see this link for more detail on the client library vs. the pandas function).
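As a minimal sketch of the to_gbq route (the destination table and project id below are placeholders, and it assumes the pandas-gbq package is installed):
# placeholders: substitute your own dataset.table and GCP project id
df.to_gbq('Testing.Exchange_Rates', project_id='your-project-id', if_exists='replace')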
Thanks Ben P for the help.
Here is my script that works for those interested. It uses an internal library my team uses for the BQ load, but the rest is pandas and requests:
from aa.py.gcp import GCPAuth, GCPBigQueryClient
from aa.py.log import StandardLogger
import requests, os, pandas as pd
# Connect to BigQuery
logger = StandardLogger('test').logger
auth = GCPAuth(logger=logger)
credentials_path = 'XXX'
credentials = auth.get_credentials(credentials_path)
gcp_bigquery = GCPBigQueryClient(logger=logger)
gcp_bigquery.connect(credentials)
# api-endpoint
URL = "https://api.exchangeratesapi.io/latest?base=GBP"
# sending get request and saving the response as response object
r = requests.get(url=URL)
# extracting data in json format
data = r.json()
# extract rates object from json
d = data['rates']
# split currency and rate for dataframe
df = pd.DataFrame.from_dict(d,orient='index')
# add date element to dataframe
df['date'] = data['date']
#column names
df.columns = ['rate', 'date']
# print dataframe
print(df)
# write dataframe to csv
df.to_csv('data.csv', sep='\t', encoding='utf-8')
#########################################
# write csv to BQ table
file_path = os.getcwd()
file_name = 'data.csv'
dataset_id = 'Testing'
table_id = 'Exchange_Rates'
response = gcp_bigquery.load_file_into_table(file_path, file_name, dataset_id, table_id, source_format='CSV', field_delimiter="\t", create_disposition='CREATE_NEVER', write_disposition='WRITE_TRUNCATE',skip_leading_rows=1)
