Putting JSON API in Pandas Dataframe - python

I have a problem converting a JSON API response into a pandas DataFrame. The JSON file has the following structure:
{"place":{"AMS":[{"UTC":"14-11-2017 10:00","ValidUTC":"14-11-2017 00:00","Cardinality":"4",...},{"UTC":"14-11-2017 11:00",...}]}}
Now I want the columns UTC, ValidUTC, Cardinality, etc. in my pandas DataFrame, so I tried the json_normalize function:
import requests
from pandas import json_normalize

main_api = 'https://api.xxx'
url = main_api
json_data = requests.get(url).json()
df = json_normalize(json_data, 'place', ['AMS'])
and
main_api = 'https://api.xxx'
url = main_api
json_data = requests.get(url).json()
df = json_normalize(json_data, 'place')
df = json_normalize(json_data, 'AMS')
but neither seems to work. Does anyone have an idea how to convert this JSON correctly into a pandas DataFrame?

Refer to JSON to pandas DataFrame, where normalizing JSON is described very well. Besides that, you can pass a parsing function for the JSON columns.

Not sure if I recreated your input correctly (you should add the full JSON without the ...).
data = {"place":{"AMS":[{"UTC":"14-11-2017 10:00","ValidUTC":"14-11-2017 00:00","Cardinality":"4"}, {"UTC":"15-11-2017 10:00","ValidUTC":"15-11-2017 00:00","Cardinality":"5"} ]}}
pd.json_normalize(data['place']['AMS'])
Output
UTC ValidUTC Cardinality
0 14-11-2017 10:00 14-11-2017 00:00 4
1 15-11-2017 10:00 15-11-2017 00:00 5
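If you'd rather not index into the dictionary yourself, json_normalize can also walk the nesting for you via record_path; a small sketch on the sample data above (not from the original answer):
import pandas as pd

data = {"place": {"AMS": [{"UTC": "14-11-2017 10:00", "ValidUTC": "14-11-2017 00:00", "Cardinality": "4"}]}}
# record_path descends data['place']['AMS'] and normalizes the list it finds there
df = pd.json_normalize(data, record_path=['place', 'AMS'])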

Related

Re-write a json file to add missing data values with Pandas

I am trying to re-write a JSON file to add missing data values, but I can't seem to get the code to re-write the data in the JSON file. Here is the code to fill in the missing data:
import pandas as pd
import json
data_df = pd.read_json("Data_test.json")
#replacing empty strings with nan
df2 = data_df.mask(data_df == "")
#filling the nan with data from above.
df2["Food_cat"].fillna(method="ffill", inplace=True,)
"Data_test.json" is the file with the list of dictionary and I am trying to either edit this json file or create a new one with the filled in data that was missing.
I have tried using
with open('complete_data', 'w') as f:
    json.dump(df2, f)
but it does not seem to work (a DataFrame is not directly JSON serializable). Is there a way to edit the current data or create a new JSON file with the completed data?
This is the original data; I would like to keep this format.
Try this:
import pandas as pd
import json
data_df = pd.read_json("Data_test.json")
#replacing empty strings with nan
df2 = data_df.mask(data_df == "")
#filling the nan with data from above.
df2["Food_cat"].fillna(method="ffill", inplace=True,)
df2.to_json('path_of_file.json')
Tell me if it works.
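Note that to_json defaults to orient='columns'; since the original file is a list of dictionaries, you likely want orient='records' to keep that shape (a sketch; the file name is just an example):
# orient='records' writes a JSON array of row dicts, matching a
# "list of dictionaries" input file; indent just pretty-prints it
df2.to_json('complete_data.json', orient='records', indent=2)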

Python converting URL JSON response to pandas dataframe

Hi, I am making a call to a web service from Python with the following code:
import urllib.request
import json
import pandas as pd

response = urllib.request.urlopen(req)
string = response.read().decode('utf-8')
json_obj = json.loads(string)
df = pd.DataFrame(json_obj)
print(df)
The result of this is:
Results
forecast [2.1632421537363355]
index [{'SaleDate': 1644278400000, 'OfferingGroupId': 0...
prediction_interval [[-114.9747272420262, 119.30121154949884]]
What I am trying to do now is to have data in DataFrame as:
Forecast SaleDate OfferingGroupId
2.1632421537363355 2022-02-08 0
I have tried so many different things that I have lost count.
Could you please help me with this?
You could first convert the json string to a dictionary (thanks #JonSG):
import json
response = urllib.request.urlopen(req)
string = response.read().decode('utf-8')
data = json.loads(string)
or, if you make the request with the requests library instead of urllib, use the json method of the response:
data = response.json()
then use pandas.json_normalize where you can directly pass in the record and meta paths of your data to convert the dictionary to a pandas DataFrame object:
import pandas as pd
out = pd.json_normalize(data['Results'], record_path = ['index'], meta = ['forecast'])
Output:
SaleDate OfferingGroupId forecast
0 1644278400000 0 2.163242
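The SaleDate column is still in epoch milliseconds; to get the 2022-02-08 style value from the question, a short follow-up on the out DataFrame above:
# 1644278400000 ms since the epoch corresponds to 2022-02-08
out['SaleDate'] = pd.to_datetime(out['SaleDate'], unit='ms')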

Parse a json file to get the right columns to insert into bigquery

I'm relatively new to Python, and I am trying to get some exchange rate data from the ECB's free API:
GET https://api.exchangeratesapi.io/latest?base=GBP
I want to ultimately end up with this data in a bigquery table. Loading the data to BQ is fine, but getting it into the right column/row format before sending it the BQ is the problem.
I want to end up with a table like this:
Currency Rate Date
CAD 1.629.. 2019-08-27
HKD 9.593.. 2019-08-27
ISK 152.6.. 2019-08-27
... ... ...
I've tried a few things but have not quite got there yet:
import requests
import json

# api-endpoint
URL = "https://api.exchangeratesapi.io/latest?base=GBP"
# sending get request and saving the response as response object
r = requests.get(url=URL)
# extracting data in json format
data = r.json()
with open('data.json', 'w') as outfile:
    json.dump(data['rates'], outfile)
a_dict = {'date': '2019-08-26'}
with open('data.json') as f:
    data = json.load(f)
data.update(a_dict)
with open('data.json', 'w') as f:
    json.dump(data, f)
print(data)
Here is the original json file:
{
  "rates": {
    "CAD": 1.6296861353,
    "HKD": 9.593490542,
    "ISK": 152.6759753684,
    "PHP": 64.1305429339,
    "DKK": 8.2428443501,
    "HUF": 363.2604778172,
    "CZK": 28.4888284523,
    "GBP": 1.0,
    "RON": 5.2195062629,
    "SEK": 11.8475893558,
    "IDR": 17385.9684034803,
    "INR": 87.6742617713,
    "BRL": 4.9997236134,
    "RUB": 80.646191945,
    "HRK": 8.1744110201,
    "JPY": 130.2223254066,
    "THB": 37.5852652759,
    "CHF": 1.2042718318,
    "EUR": 1.1055465269,
    "MYR": 5.1255348081,
    "BGN": 2.1622278974,
    "TRY": 7.0550451616,
    "CNY": 8.6717964026,
    "NOK": 11.0104695256,
    "NZD": 1.9192287707,
    "ZAR": 18.6217151449,
    "USD": 1.223287232,
    "MXN": 24.3265563331,
    "SGD": 1.6981194654,
    "AUD": 1.8126540855,
    "ILS": 4.3032293014,
    "KRW": 1482.7479464473,
    "PLN": 4.8146551248
  },
  "base": "GBP",
  "date": "2019-08-23"
}
Welcome! How about this as one way to tackle your problem:
# import the pandas library so we can use its from_dict function:
import pandas as pd
# subset the json to a dict of exchange rates and country codes:
d = data['rates']
# create a dataframe from this data, using pandas from_dict function:
df = pd.DataFrame.from_dict(d,orient='index')
# add a column for date (this value is taken from the json data):
df['date'] = data['date']
# name our columns, to keep things clean
df.columns = ['rate','date']
This gives you:
rate date
CAD 1.629686 2019-08-23
HKD 9.593491 2019-08-23
ISK 152.675975 2019-08-23
PHP 64.130543 2019-08-23
...
In this case the currency is the index of the dataframe; if you'd prefer it as a column of its own, just add:
df['currency'] = df.index
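Alternatively (a minor variation, not from the original answer), you can promote the index to a named column in one step:
# name the index 'currency', then turn it into a regular column
df = df.rename_axis('currency').reset_index()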
You can then write this dataframe out to a .csv file, or write it into BigQuery.
For this I'd recommend you take a look at the BigQuery client library; this can be a little hard to get your head around at first, so you may also want to check out pandas.DataFrame.to_gbq, which is easier but less robust (see this link for more detail on the client library vs. the pandas function).
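For the simpler route, a minimal sketch with to_gbq (requires the pandas-gbq package; the project id is a placeholder, and the dataset/table names are taken from later in this thread):
# writes the dataframe straight to BigQuery; 'replace' mirrors WRITE_TRUNCATE
df.to_gbq('Testing.Exchange_Rates', project_id='your-project-id', if_exists='replace')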
Thanks Ben P for the help.
Here is my script that works, for those interested. It uses an internal library my team uses for the BQ load, but the rest is pandas and requests:
from aa.py.gcp import GCPAuth, GCPBigQueryClient
from aa.py.log import StandardLogger
import requests, os, pandas as pd
# Connect to BigQuery
logger = StandardLogger('test').logger
auth = GCPAuth(logger=logger)
credentials_path = 'XXX'
credentials = auth.get_credentials(credentials_path)
gcp_bigquery = GCPBigQueryClient(logger=logger)
gcp_bigquery.connect(credentials)
# api-endpoint
URL = "https://api.exchangeratesapi.io/latest?base=GBP"
# sending get request and saving the response as response object
r = requests.get(url=URL)
# extracting data in json format
data = r.json()
# extract rates object from json
d = data['rates']
# split currency and rate for dataframe
df = pd.DataFrame.from_dict(d,orient='index')
# add date element to dataframe
df['date'] = data['date']
#column names
df.columns = ['rate', 'date']
# print dataframe
print(df)
# write dataframe to csv
df.to_csv('data.csv', sep='\t', encoding='utf-8')
#########################################
# write csv to BQ table
file_path = os.getcwd()
file_name = 'data.csv'
dataset_id = 'Testing'
table_id = 'Exchange_Rates'
response = gcp_bigquery.load_file_into_table(file_path, file_name, dataset_id, table_id, source_format='CSV', field_delimiter="\t", create_disposition='CREATE_NEVER', write_disposition='WRITE_TRUNCATE',skip_leading_rows=1)
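If you don't have that internal library, an equivalent sketch with the official google-cloud-bigquery client (credentials, dataset, and table names are assumptions based on the script above):
from google.cloud import bigquery

client = bigquery.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter='\t',
    skip_leading_rows=1,
    write_disposition='WRITE_TRUNCATE',
)
with open('data.csv', 'rb') as f:
    job = client.load_table_from_file(f, 'Testing.Exchange_Rates', job_config=job_config)
job.result()  # wait for the load job to finish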

Python dataframe to JSON add additional fields

I am looking to transform my dataframe to JSON:
Age Eye Gender
30 blue male
In my current code, I convert the dataframe to JSON and get the result below:
json_file = df.to_json(orient='records')
json_file
'[{"Age":30,"Eye":"blue","Gender":"male"}]'
However, I want to add an additional layer that carries an id and a name, with the converted records nested under 'info':
{'id': '5231',
 'name': 'Bob',
 'info': [
   {'age': '30'}, {'eye': 'blue'}, {'gender': 'male'}
 ]
}
How would I add the additional fields? I tried reading the docs, but I do not see a clear answer on how to add the extra fields during the dataframe-to-JSON conversion.
Based on the data you provided, this is your answer:
import pandas as pd

a = {'id': '5231',
     'name': 'Bob',
     }
df = pd.DataFrame({'Age': [30], 'Eye': ['blue'], 'Gender': ['male']})
# to_dict(orient='records') gives a list of row dicts; to_json would
# give one JSON string, which is not the nested structure you want
a['info'] = df.to_dict(orient='records')
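To get the final JSON text (and check the shape), a short follow-up sketch:
import json

# prints {"id": "5231", "name": "Bob", "info": [{"Age": 30, "Eye": "blue", "Gender": "male"}]}
print(json.dumps(a))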

parse date-time while reading 'csv' file with pandas

I am trying to parse dates while I am reading my data from a csv file. The command that I use is
df = pd.read_csv('/Users/n....', names=names, parse_dates=['date'])
And it is working on my files generally.
But I have a couple of data sets with a variety of date formats. I mean, the date format is like (09/20/15 09:59) in some lines while it is like (2015-09-20 10:22:01.013) in other lines of the same file, and the command above doesn't work on those files. It works when I delete parse_dates=['date'], but then I can't use the date column as a datetime; it is read as plain strings. I would appreciate it if anyone could answer this!
Pandas read_csv accepts a date_parser argument in which you can define your own date parsing function. So, since you have 2 different datetime formats, you can simply do (the format strings below match the two examples from the question):
import datetime

def date_parser(d):
    try:
        d = datetime.datetime.strptime(d, "%m/%d/%y %H:%M")
    except ValueError:
        try:
            d = datetime.datetime.strptime(d, "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            pass  # neither format matches; decide how to handle this
    return d

df = pd.read_csv('/Users/n....',
                 names=names,
                 parse_dates=['date'],
                 date_parser=date_parser)
This parses dates in both formats in that column.
Like this:
df = pd.read_csv(file, names=names)
df['date'] = pd.to_datetime(df['date'])
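If the automatic inference ever guesses a format wrong, a more explicit variant (a sketch, with the format strings taken from the question) is to parse each format separately and combine the results:
import pandas as pd

df = pd.read_csv(file, names=names)
# rows that fail one format become NaT and are filled from the other parse
us_style = pd.to_datetime(df['date'], format='%m/%d/%y %H:%M', errors='coerce')
iso_style = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S.%f', errors='coerce')
df['date'] = us_style.fillna(iso_style)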
