How to read JSON data with unbalanced array length in Python

I have been trying to fetch JSON data from an API using Python so that I can transfer it to a SQLite database (a .db file created with sqlite3). The issue is that the data is unbalanced: the arrays in the response have different lengths.
Here is what I did:
import pandas as pd
url = "https://baseballsavant.mlb.com/gf?game_pk=635886"
df = pd.read_json(url)
print(df)
This is the error I am getting:
raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length

It's not obvious what you want your final DataFrame to look like, but passing orient='index' to read_json avoids the problem in this case.
import pandas as pd
url = "https://baseballsavant.mlb.com/gf?game_pk=635886"
df = pd.read_json(url, orient='index')
print(df)
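To see why orient='index' helps, here is a minimal sketch on a small made-up dictionary (not the real API response) whose lists have different lengths. With the default column orientation every key becomes a column, so all lists must have the same length. With the index orientation every key becomes a row, and shorter rows are padded with NaN.

```python
import pandas as pd

# Hypothetical stand-in for the API response: two lists of different lengths.
data = {"a": [1, 2, 3], "b": [4, 5]}

# Default orientation: keys become columns, so unequal lengths raise ValueError.
try:
    pd.DataFrame(data)
    column_error = None
except ValueError as exc:
    column_error = str(exc)

# Index orientation: keys become rows; the shorter row is padded with NaN.
df = pd.DataFrame.from_dict(data, orient="index")
print(column_error)
print(df)
```

Whether a row-per-key layout is useful depends on the real response, which may mix scalars, lists, and nested objects.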
You could also request the data with, for example, the requests module and prepare it before loading it into a DataFrame:
import pandas as pd
import requests
url = "https://baseballsavant.mlb.com/gf?game_pk=635886"
response = requests.get(url)
data = response.json()  # parse the response body into Python objects
# Do data transformations here
df = pd.DataFrame.from_dict(data)
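As one possible transformation step, the sketch below wraps each list in a pandas Series so that lists of different lengths can share a DataFrame, then writes the result to SQLite, which was the stated end goal. The dictionary, the column names, and the table name "game" are made up for illustration; the real response will need its own preparation.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for the parsed JSON: lists of different lengths.
data = {"inning": [1, 2, 3], "runs": [0, 2]}

# Wrapping each list in a Series lets pandas align on the index and
# fill the missing positions with NaN instead of raising ValueError.
df = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})

# Write the frame to SQLite; ":memory:" keeps the example self-contained,
# a file name such as "game.db" would create a .db file instead.
conn = sqlite3.connect(":memory:")
df.to_sql("game", conn, index=False, if_exists="replace")
rows = conn.execute("SELECT inning, runs FROM game").fetchall()
conn.close()
print(rows)
```

Note that SQLite columns can only hold scalar values, so nested lists or dictionaries in the response would have to be flattened or serialized (for example with json.dumps) before calling to_sql.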

Related

How to extract data from an api using python and convert it into a pandas data frame

I want to load the data from an API into a pandas data frame. How may I do that? The following is my code snippet:
import requests
import json
response_API = requests.get('https://data.spiceai.io/eth/v0.1/gasfees?period=1d')
#print(response_API.status_code)
data = response_API.text
parse_json = json.loads(data)
Almost there. The JSON is clean, so you can pass it directly to a DataFrame:
import pandas as pd
import requests
response_API = requests.get('https://data.spiceai.io/eth/v0.1/gasfees?period=1d')
data = response_API.json()
df = pd.DataFrame(data)

Python converting URL JSON response to pandas dataframe

Hi I am making a call to a web service from Python with the following code:
response = urllib.request.urlopen(req)
string = response.read().decode('utf-8')
json_obj = json.loads(string)
df = pd.DataFrame(json_obj)
print(df)
The result of this is:
Results
forecast [2.1632421537363355]
index [{'SaleDate': 1644278400000, 'OfferingGroupId': 0...
prediction_interval [[-114.9747272420262, 119.30121154949884]]
What I am trying to do now is to have data in DataFrame as:
Forecast SaleDate OfferingGroupId
2.1632421537363355 2022-02-08 0
I have tried so many different things that I have lost count.
Could you please help me with this?
You could first convert the JSON string to a dictionary (thanks @JonSG):
import json
response = urllib.request.urlopen(req)
string = response.read().decode('utf-8')
data = json.loads(string)
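or, if you make the call with the requests library instead of urllib (the urllib response object has no json method), use the json method of its response: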
data = response.json()
then use pandas.json_normalize where you can directly pass in the record and meta paths of your data to convert the dictionary to a pandas DataFrame object:
import pandas as pd
out = pd.json_normalize(data['Results'], record_path = ['index'], meta = ['forecast'])
Output:
SaleDate OfferingGroupId forecast
0 1644278400000 0 2.163242
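To get from there to the layout asked for in the question (Forecast, SaleDate as a date, OfferingGroupId), a minimal sketch follows. It uses only the fields visible in the question's printed output, and it assumes that the n-th forecast value belongs to the n-th record in index; both points are assumptions about the service, not something the question confirms.

```python
import pandas as pd

# Parsed response, reduced to the fields visible in the question's output.
data = {
    "Results": {
        "forecast": [2.1632421537363355],
        "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}],
        "prediction_interval": [[-114.9747272420262, 119.30121154949884]],
    }
}

results = data["Results"]
# One row per record in "index".
df = pd.DataFrame(results["index"])
# SaleDate is a Unix timestamp in milliseconds.
df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
# Assumes forecast[i] belongs to index[i].
df.insert(0, "Forecast", results["forecast"])
print(df)
```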

Is there an easy way to convert this API get request into a DataFrame?

I am trying to load the result of this US Census Bureau API GET request into a DataFrame. I thought the response was a list of lists, but it shows up as NoneType. Is there a way to turn it into a DataFrame that can easily be exported to a CSV file?
import requests
# The Basic API Request:
# Build base URL
HOST = "https://api.census.gov/data"
year = "2010"
dataset = "dec/sf1"
base_url = "/".join([HOST, year, dataset])
# Specify Census variables and other predicates
get_vars = ["NAME","P013001","P037001"]
predicates = {}
predicates["get"] = ",".join(get_vars)
predicates["for"] = "state:*"
# Execute the request, examine text of response object
data = requests.get(base_url, params=predicates)
print(data.text)
This does produce the following output:
[["NAME","P013001","P037001","state"],
["Alabama","37.9","3.02","01"],
["Alaska","33.8","3.21","02"],
["Arizona","35.9","3.19","04"],
...
["Wyoming","36.8","2.96","56"],
["Puerto Rico","36.9","3.17","72"]]
data.text is a string, so you can parse it with the json module. Try this:
import json
import pandas as pd
data = pd.DataFrame(json.loads(data.text)[1:], columns=['NAME', 'P013001', 'P037001', 'state'])
and you'll get a DataFrame with one row per state.
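A variation that avoids hardcoding the column names is sketched below: it takes them from the first row of the response, converts the two numeric columns, and produces CSV text. It runs on a few rows copied from the output shown in the question rather than on a live request; the variable names are illustrative.

```python
import json
import pandas as pd

# A few rows copied from the response text shown in the question.
text = '''[["NAME","P013001","P037001","state"],
["Alabama","37.9","3.02","01"],
["Alaska","33.8","3.21","02"],
["Arizona","35.9","3.19","04"]]'''

rows = json.loads(text)
# First row holds the column names, the rest is data.
df = pd.DataFrame(rows[1:], columns=rows[0])
# Every value arrives as a string; convert the two measures to numbers.
numeric_cols = ["P013001", "P037001"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

The state column is left as a string on purpose, since converting it to a number would drop the leading zero in codes such as "01".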

Loading a series of JSON objects in pandas dataframe

I have downloaded a sample dataset from here that is a series of JSON objects.
{...}
{...}
I need to load them to a pandas dataframe. I have tried below code
import pandas as pd
import json
filename = "sample-S2-records"
df = pd.DataFrame.from_records(map(json.loads, "sample-S2-records"))
But there seems to be parsing error
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
What am I missing?
You can try pandas.read_json method:
import pandas as pd
data = pd.read_json('/path/to/file.json', lines=True)
print(data)
I have tested it with this file, and it works fine.
The function needs a list of JSON objects. For example,
data = [ json_obj_1,json_obj_2,....]
The file does not contain the list syntax; it just holds a series of JSON objects, one per line. The following would solve the issue:
import io
import json
import pandas as pd
# Load content to a variable
with open('../sample-S2-records/sample-S2-records', 'r') as content_file:
    content = content_file.read().strip()
# Split content by new line
content = content.split('\n')
# Parse each line, which holds one JSON object, and store the results in a list
json_list = []
for each_line in content:
    json_list.append(json.loads(each_line))
# Serialize the list to a JSON array string and load it
# (wrapped in StringIO, since newer pandas versions deprecate passing a literal JSON string)
df = pd.read_json(io.StringIO(json.dumps(json_list)))
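As for what the original attempt was missing: map(json.loads, "sample-S2-records") iterates over the characters of the file name string and never opens the file, so json.loads fails on the very first character. The sketch below compares the two approaches from the answers on a small in-memory sample. The two records are made up, and io.StringIO stands in for the real file.

```python
import io
import json
import pandas as pd

# Hypothetical file content: one JSON object per line (JSON Lines).
text = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# Option 1: let pandas parse the input line by line.
df_lines = pd.read_json(io.StringIO(text), lines=True)

# Option 2: parse each non-empty line yourself, then build the frame.
records = [json.loads(line) for line in text.splitlines() if line.strip()]
df_manual = pd.DataFrame(records)

print(df_lines)
print(df_manual)
```

With a real file, pd.read_json(path, lines=True) replaces the StringIO wrapper, and iterating over an open file handle replaces text.splitlines().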

JSON from API call to pandas dataframe

I'm trying to get an API call and save it as a dataframe.
problem is that I need the data from the 'result' column.
Didn't succeed to do that.
I'm basically just trying to save the API call as a csv file in order to work with it.
P.S when I do this with a "JSON to CSV converter" from the web it does it as I wish. (example: https://konklone.io/json/)
import requests
import pandas as pd
import json
res = requests.get("http://api.etherscan.io/api?module=account&action=txlist&address=0xddbd2b932c763ba5b1b7ae3b362eac3e8d40121a&startblock=0&endblock=99999999&sort=asc&apikey=YourApiKeyToken")
j = res.json()
j
df = pd.DataFrame(j)
df.head()
Try this
import requests
import pandas as pd
import json
res = requests.get("http://api.etherscan.io/api?module=account&action=txlist&address=0xddbd2b932c763ba5b1b7ae3b362eac3e8d40121a&startblock=0&endblock=99999999&sort=asc&apikey=YourApiKeyToken")
j = res.json()
# print(j)
filename ="temp.csv"
df = pd.DataFrame(j['result'])
print(df.head())
df.to_csv(filename)
Looks like you need:
df = pd.DataFrame(j["result"])
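The reason pd.DataFrame(j) does not work as hoped is that the top-level object mixes scalar fields with one list, so pandas keeps each transaction as a dictionary inside a single 'result' column. A slightly more defensive sketch is below. The payload is a made-up sample in the shape the answers rely on, result_to_frame is a hypothetical helper name, and the check assumes that a failed call may return something other than a list under "result".

```python
import pandas as pd

# Hypothetical payload in the shape the answers rely on: the rows sit
# under "result". Field names and values are illustrative only.
payload = {
    "status": "1",
    "message": "OK",
    "result": [
        {"blockNumber": "100", "hash": "0xabc", "value": "10"},
        {"blockNumber": "101", "hash": "0xdef", "value": "20"},
    ],
}


def result_to_frame(payload):
    """Build a DataFrame from payload["result"], which must be a list of records."""
    result = payload.get("result")
    if not isinstance(result, list):
        # Assumption: on failure the API may put a message here instead of rows.
        raise ValueError(f"unexpected 'result': {result!r}")
    return pd.DataFrame(result)


df = result_to_frame(payload)
print(df.head())
```

From here, df.to_csv("temp.csv", index=False) writes the file without the extra index column.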
