Python Normalize JSON to DataFrame

I have been trying to normalize this JSON data for quite some time now, but I am getting stuck at a very basic step. I think the answer might be quite simple. I will take any help provided.
import json
import urllib.request
import pandas as pd

url = "https://www.recreation.gov/api/camps/availability/campground/232447/month?start_date=2021-05-01T00%3A00%3A00.000Z"
with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode())
#data = json.dumps(data, indent=4)
df = pd.json_normalize(data=data['campsites'], record_path='availabilities', meta='campsites')
print(df)
My expected df result is as follows:

[Expected DataFrame output]

One approach (not using pd.json_normalize) is to iterate through a list of the unique campsites and convert the data for each campsite to a DataFrame. The list of campsite-specific DataFrames can then be concatenated using pd.concat.
Specifically:
## generate a list of unique campsites
unique_campsites = list(data['campsites'].keys())

## function that returns a DataFrame for each campsite,
## renaming the index to 'date'
def campsite_to_df(data, campsite):
    out_df = pd.DataFrame(data['campsites'][campsite]).reset_index()
    out_df = out_df.rename({'index': 'date'}, axis=1)
    return out_df

## generate a list of DataFrames, one per campsite
df_list = [campsite_to_df(data, cs) for cs in unique_campsites]

## concatenate the list of DataFrames into a single DataFrame,
## convert campsite id to integer and sort by campsite + date
df_full = pd.concat(df_list)
df_full['campsite_id'] = df_full['campsite_id'].astype(int)
df_full = df_full.sort_values(by=['campsite_id', 'date'], ascending=True)

## remove extraneous columns and rename campsite_id to campsites
df_full = df_full[['campsite_id', 'date', 'availabilities',
                   'max_num_people', 'min_num_people', 'type_of_use']]
df_full = df_full.rename({'campsite_id': 'campsites'}, axis=1)
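For completeness, pd.json_normalize can also be made to work by flattening first and then melting. A minimal sketch, assuming each entry of data['campsites'] is a dict whose 'availabilities' field maps date strings to availability statuses (as the answer above implies):
## flatten each campsite record; nested 'availabilities' dates become columns
records = list(data['campsites'].values())
wide = pd.json_normalize(records)  # one row per campsite
avail_cols = [c for c in wide.columns if c.startswith('availabilities.')]
## melt the per-date columns back into rows of (campsite_id, date, availability)
df = wide.melt(id_vars=['campsite_id'], value_vars=avail_cols,
               var_name='date', value_name='availabilities')
df['date'] = df['date'].str.removeprefix('availabilities.')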

Related

How do I save each iteration of a for loop in one big DataFrame?

I want to gather all the historical prices of each stock in the S&P500 in Python. I'm using a package from IEX Cloud which gives me the historical prices of an individual stock. I want a for loop to run through a list of the tickers/symbols from the stock index so that I get all the data in a single DataFrame.
This is the code that produces a DataFrame - in this example I've chosen AAPL for a two-year period:
import pyEX as p
sym = 'AAPL'
stock_list = stocks['Ticker'].tolist()
c = p.Client(api_token='TOKEN', version='stable')
timeframe = '2y'
df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]
df
This DataFrame contains the date and the daily closing price. Does anyone have ideas on how to loop through my list of tickers, so that I get a comprehensive DataFrame of all the historical prices?
Thank you.
Create an empty list to append to, and concat everything together after you iterate over all the tickers:
import pyEX as p
import pandas as pd

stock_list = stocks['Ticker'].tolist()
c = p.Client(api_token='TOKEN', version='stable')
timeframe = '2y'

dfs = []  # create an empty list
for sym in stock_list:  # iterate over your ticker list
    df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]  # create your frame
    dfs.append(df)  # append frame to list

final_df = pd.concat(dfs)  # concat all your frames together into one
Update with Try-Except
import pyEX as p
import pandas as pd

stock_list = stocks['Ticker'].tolist()
c = p.Client(api_token='TOKEN', version='stable')
timeframe = '2y'

dfs = []  # create an empty list
for sym in stock_list:  # iterate over your ticker list
    try:
        df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]  # create your frame
        dfs.append(df)  # append frame to list
    except KeyError:
        print(f'KeyError for {sym}')

final_df = pd.concat(dfs)  # concat all your frames together into one
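One caveat with keeping only the close column: after concatenation you can no longer tell which rows belong to which ticker. A small tweak (a sketch reusing the loop above) tags each frame before appending:
dfs = []
for sym in stock_list:
    try:
        df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]
        df['symbol'] = sym  # remember which ticker these rows came from
        dfs.append(df)
    except KeyError:
        print(f'KeyError for {sym}')
final_df = pd.concat(dfs)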

Create nested json lines from pipe delimited flat file using python

I have a pipe-delimited text file, shown below. In that file, the same ID, CODE, and NUM combination can have different INC and INC_DESC values:
ID|CODE|NUM|INC|INC_DESC
"F1"|"W1"|1|1001|"INC1001"
"F1"|"W1"|1|1002|"INC1002"
"F1"|"W1"|1|1003|"INC1003"
"F2"|"W1"|1|1002|"INC1003"
"F2"|"W1"|1|1003|"INC1004"
"F2"|"W2"|1|1003|"INC1003"
We want to create JSON like below, where the different INC and INC_DESC values come as an array for the same combination of ID, CODE, and NUM:
{"ID":"F1","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1001, "INC_DESC":"INC1001"},{"INC":1002, "INC_DESC":"INC1002"},{"INC":1003, "INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1002, "INC_DESC":"INC1002"},{"INC":1003, "INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W2","NUM":1,"INC_DTL":[{"INC":1003, "INC_DESC":"INC1003"}]}
I tried the below, but it is not generating the nested structure I want:
import pandas as pd

Input_File = r'V:\input.dat'
df = pd.read_csv(Input_File, sep='|')
json_output = r'V:\outfile.json'
output = df.to_json(json_output, orient='records')
One way to get the nested structure is to group by ID, CODE, and NUM, aggregate INC and INC_DESC into lists, and then build the INC_DTL records from those lists:
import pandas as pd

# agg function: wrap each group's column so the values survive the groupby
def agg_that(x):
    return [x]

Input_File = r'V:\input.dat'
df = pd.read_csv(Input_File, sep='|')

# groupby columns
df = df.groupby(['ID', 'CODE', 'NUM']).agg(agg_that).reset_index()

# create new column
df['INC_DTL'] = df.apply(
    lambda x: [{'INC': inc, 'INC_DESC': dsc}
               for inc, dsc in zip(x['INC'][0], x['INC_DESC'][0])], axis=1)

# drop old columns
df.drop(['INC', 'INC_DESC'], axis=1, inplace=True)

json_output = r'V:\outfile.json'
output = df.to_json(json_output, orient='records', lines=True)
OUTPUT:
{"ID":"F1","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1001,"INC_DESC":"INC1001"},{"INC":1002,"INC_DESC":"INC1002"},{"INC":1003,"INC_DESC":"INC1003"}]}
{"ID":"F1","CODE":"W2","NUM":1,"INC_DTL":[{"INC":1003,"INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1002,"INC_DESC":"INC1003"},{"INC":1003,"INC_DESC":"INC1004"}]}

Iterate and concat multiple pandas DataFrames

I have the below code for a pandas operation that parses a JSON payload, picks certain columns, and concatenates them along axis 1:
df_columns_raw_1 = df_tables_normalized['columns'][1]
df_columns_normalize_1 = pd.json_normalize(df_columns_raw_1)
df_colName_1 = df_columns_normalize_1['columnName']
df_table_1 = df_columns_normalize_1['tableName']
df_colLen_1 = df_columns_normalize_1['columnLength']
df_colDataType_1 = df_columns_normalize_1['columnDatatype']
result_1 = pd.concat([df_table_1, df_colName_1,df_colLen_1,df_colDataType_1], axis=1)
bigdata = pd.concat([result_1, result_2, ..., result_500], ignore_index=True, sort=False)
I need to iterate and automate the above code so that bigdata concatenates everything up to result_500, instead of writing it out manually for all 500 DataFrames.
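One way to automate this (a sketch, assuming df_tables_normalized['columns'] holds all 500 entries and every entry normalizes to the four columns used above):
import pandas as pd

cols = ['tableName', 'columnName', 'columnLength', 'columnDatatype']
results = []
for raw in df_tables_normalized['columns']:  # iterate over every entry
    normalized = pd.json_normalize(raw)      # flatten the nested JSON
    results.append(normalized[cols])         # keep only the wanted columns
bigdata = pd.concat(results, ignore_index=True, sort=False)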

How to append to the bottom row of two columns of a csv file using pandas?

I have a function that appends today's date and a random integer to the bottom of their respective columns each time the function is called.
def date_to_csv():
    import datetime
    import pandas as pd
    from random import randint
    df = pd.read_csv("test.csv")
    df['Date'] = [datetime.date.today()]
    df['Price'] = [randint(1, 100)]
    df.to_csv('test.csv', mode='a', index=False, header=None)
For the first two times the function is called, it works as expected and produces this in the csv file:
Date,Price
2021-06-26,29
2021-06-26,97
However, calling the function afterwards raises an error: 'ValueError: Length of values (1) does not match length of index (2)'
I plan to call the function for n consecutive days on the same csv file.
Try:
df = pd.read_csv("test.csv")
df = df.append({'Date': datetime.date.today(), 'Price': randint(1, 100)}, ignore_index=True)
df.to_csv('test.csv', mode='a', index=False, header=None)
As Rob Raymond said, your date list and random-number list do not match the length of your csv, so you should make them match. Also note that every time you write with to_csv, the mode is 'a', which appends new rows to the file:
df = pd.read_csv("test.csv")
length = df.shape[0]
df['Date'] = [datetime.date.today() for _ in range(length)]
df['Price'] = [randint(1,100) for _ in range(length)]
df.to_csv('test.csv',mode='a',index=False,header=None)
To only append a new row after each run:
df = pd.read_csv("test.csv")
df = df.append({'Date':datetime.date.today(), 'Price':randint(1,100)},ignore_index=True)
df.to_csv('test.csv',mode='w',index=False)
Alternatively, build a one-row DataFrame and append just that row to the file:
df = pd.DataFrame({'Date': [datetime.date.today()], 'Price': [randint(1, 100)]})
df.to_csv('test.csv', mode='a', index=False, header=None)
Or append the row directly with the csv module:
import csv
with open('test.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([datetime.date.today(), randint(1, 100)])

Multiple columns from a file into a single column of lists in pandas

I'm new to pandas and need to prepare a table using pandas, imitating the exact function performed by the following code snippet:
with open(r'D:/DataScience/ml-100k/u.item') as f:
    temp = ''
    for line in f:
        fields = line.rstrip('\n').split('|')
        movieId = int(fields[0])
        name = fields[1]
        geners = fields[5:25]
        geners = map(int, geners)
My question is how to add a geners column in pandas that does the same as:
geners = fields[5:25]
It's not clear to me what you intend to accomplish -- a single genres column containing fields 5-25 concatenated? Or separate genre columns for fields 5-25?
For the latter, you can use [pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html):
import pandas as pd
cols = ['movieId', 'name'] + ['genre_' + str(i) for i in range(5, 25)]
df = pd.read_csv(r'D:/DataScience/ml-100k/u.item', delimiter='|', names=cols)
For the former, you could concatenate the genres into, say, a space-separated list, using:
df['genres'] = df[cols[2:]].astype(str).apply(lambda x: ' '.join(x), axis=1)
df.drop(cols[2:], axis=1, inplace=True)  # drop the separate genre_N columns
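And if you literally want a single column of lists (as in the question title), a minimal sketch, starting again from the df produced by read_csv above:
## collapse the genre flag columns into one column holding Python lists
df['genres'] = df[cols[2:]].values.tolist()
df.drop(cols[2:], axis=1, inplace=True)  # drop the separate genre_N columns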
