Python Normalize JSON to DataFrame

I have been trying to normalize this JSON data for quite some time now, but I am getting stuck at a very basic step. I think the answer might be quite simple. I will take any help provided.
import json
import urllib.request
import pandas as pd

url = "https://www.recreation.gov/api/camps/availability/campground/232447/month?start_date=2021-05-01T00%3A00%3A00.000Z"
with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode())
#data = json.dumps(data, indent=4)
df = pd.json_normalize(data=data['campsites'], record_path='availabilities', meta='campsites')
print(df)
My expected df result is as follows:

[Expected DataFrame output]

One approach (not using pd.json_normalize) is to iterate through a list of the unique campsites and convert the data for each campsite to a DataFrame. The list of campsite-specific DataFrames can then be concatenated using pd.concat.
Specifically:
## generate a list of unique campsites
unique_campsites = list(data['campsites'].keys())

## function that returns a DataFrame for each campsite,
## renaming the index to 'date'
def campsite_to_df(data, campsite):
    out_df = pd.DataFrame(data['campsites'][campsite]).reset_index()
    out_df = out_df.rename({'index': 'date'}, axis=1)
    return out_df

## generate a list of DataFrames, one per campsite
df_list = [campsite_to_df(data, cs) for cs in unique_campsites]

## concatenate the list of DataFrames into a single DataFrame,
## convert campsite id to integer and sort by campsite + date
df_full = pd.concat(df_list)
df_full['campsite_id'] = df_full['campsite_id'].astype(int)
df_full = df_full.sort_values(by=['campsite_id', 'date'], ascending=True)

## remove extraneous columns and rename campsite_id to campsites
df_full = df_full[['campsite_id', 'date', 'availabilities',
                   'max_num_people', 'min_num_people', 'type_of_use']]
df_full = df_full.rename({'campsite_id': 'campsites'}, axis=1)
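For completeness, pd.json_normalize can also be made to work by flattening first and then melting. A minimal sketch, assuming each entry of data['campsites'] is a dict whose 'availabilities' field maps date strings to availability statuses (as the answer above implies):
## flatten each campsite record; nested 'availabilities' dates become columns
records = list(data['campsites'].values())
wide = pd.json_normalize(records)  # one row per campsite
avail_cols = [c for c in wide.columns if c.startswith('availabilities.')]
## melt the per-date columns back into rows of (campsite_id, date, availability)
df = wide.melt(id_vars=['campsite_id'], value_vars=avail_cols,
               var_name='date', value_name='availabilities')
df['date'] = df['date'].str.removeprefix('availabilities.')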

Related

How do I save each iteration of a for loop in one big DataFrame?

I want to gather all the historical prices of each stock in the S&P500 in Python. I'm using a package from IEX Cloud which gives me the historical prices of an individual stock. I want a for loop to run through a list of the tickers/symbols from the stock index so that I get all the data in a single DataFrame.
This is the code that produces a DataFrame - in this example I've chosen AAPL for a two-year period:
import pyEX as p
sym = 'AAPL'
stock_list = stocks['Ticker'].tolist()
c = p.Client(api_token='TOKEN', version='stable')
timeframe = '2y'
df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]
df
This DataFrame contains the date and the daily closing price. Does anyone have ideas on how to loop through my list of tickers, so that I get a comprehensive DataFrame of all the historical prices?
Thank you.
Create an empty list to append to, and concat everything together after you iterate over all the tickers:
import pyEX as p
import pandas as pd

stock_list = stocks['Ticker'].tolist()
c = p.Client(api_token='TOKEN', version='stable')
timeframe = '2y'

dfs = []  # create an empty list
for sym in stock_list:  # iterate over your ticker list
    df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]  # create your frame
    dfs.append(df)  # append frame to list

final_df = pd.concat(dfs)  # concat all your frames together into one
Update with Try-Except
import pyEX as p
import pandas as pd

stock_list = stocks['Ticker'].tolist()
c = p.Client(api_token='TOKEN', version='stable')
timeframe = '2y'

dfs = []  # create an empty list
for sym in stock_list:  # iterate over your ticker list
    try:
        df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]  # create your frame
        dfs.append(df)  # append frame to list
    except KeyError:
        print(f'KeyError for {sym}')

final_df = pd.concat(dfs)  # concat all your frames together into one
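One caveat with keeping only the close column: after concatenation you can no longer tell which rows belong to which ticker. A small tweak (a sketch reusing the loop above) tags each frame before appending:
dfs = []
for sym in stock_list:
    try:
        df = c.chartDF(symbol=sym, timeframe=timeframe)[['close']]
        df['symbol'] = sym  # remember which ticker these rows came from
        dfs.append(df)
    except KeyError:
        print(f'KeyError for {sym}')
final_df = pd.concat(dfs)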

Create nested json lines from pipe delimited flat file using python

I have a pipe-delimited text file, shown below. In that file, the same ID, CODE, and NUM combination can have different INC and INC_DESC values:
ID|CODE|NUM|INC|INC_DESC
"F1"|"W1"|1|1001|"INC1001"
"F1"|"W1"|1|1002|"INC1002"
"F1"|"W1"|1|1003|"INC1003"
"F2"|"W1"|1|1002|"INC1003"
"F2"|"W1"|1|1003|"INC1004"
"F2"|"W2"|1|1003|"INC1003"
We want to create JSON like below, where the different INC and INC_DESC values come as an array for the same combination of ID, CODE, and NUM:
{"ID":"F1","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1001, "INC_DESC":"INC1001"},{"INC":1002, "INC_DESC":"INC1002"},{"INC":1003, "INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1002, "INC_DESC":"INC1002"},{"INC":1003, "INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W2","NUM":1,"INC_DTL":[{"INC":1003, "INC_DESC":"INC1003"}]}
I tried the below, but it is not generating the nested structure I want:
import pandas as pd

Input_File = r'V:\input.dat'
df = pd.read_csv(Input_File, sep='|')
json_output = r'V:\outfile.json'
output = df.to_json(json_output, orient='records')
One way to get the nested structure is to group by ID, CODE, and NUM, aggregate INC and INC_DESC into lists, and then build the INC_DTL records from those lists:
import pandas as pd

# agg function: wrap each group's column so the values survive the groupby
def agg_that(x):
    return [x]

Input_File = r'V:\input.dat'
df = pd.read_csv(Input_File, sep='|')

# groupby columns
df = df.groupby(['ID', 'CODE', 'NUM']).agg(agg_that).reset_index()

# create new column
df['INC_DTL'] = df.apply(
    lambda x: [{'INC': inc, 'INC_DESC': dsc}
               for inc, dsc in zip(x['INC'][0], x['INC_DESC'][0])], axis=1)

# drop old columns
df.drop(['INC', 'INC_DESC'], axis=1, inplace=True)

json_output = r'V:\outfile.json'
output = df.to_json(json_output, orient='records', lines=True)
OUTPUT:
{"ID":"F1","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1001,"INC_DESC":"INC1001"},{"INC":1002,"INC_DESC":"INC1002"},{"INC":1003,"INC_DESC":"INC1003"}]}
{"ID":"F1","CODE":"W2","NUM":1,"INC_DTL":[{"INC":1003,"INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1002,"INC_DESC":"INC1003"},{"INC":1003,"INC_DESC":"INC1004"}]}

Iterate and concat multiple pandas DataFrames

I have the below code for a pandas operation that parses a JSON payload, picks certain columns, and concatenates them along axis 1:
df_columns_raw_1 = df_tables_normalized['columns'][1]
df_columns_normalize_1 = pd.json_normalize(df_columns_raw_1)
df_colName_1 = df_columns_normalize_1['columnName']
df_table_1 = df_columns_normalize_1['tableName']
df_colLen_1 = df_columns_normalize_1['columnLength']
df_colDataType_1 = df_columns_normalize_1['columnDatatype']
result_1 = pd.concat([df_table_1, df_colName_1,df_colLen_1,df_colDataType_1], axis=1)
bigdata = pd.concat([result_1, result_2, ..., result_500], ignore_index=True, sort=False)
I need to iterate and automate the above code so that bigdata concatenates everything up to result_500, instead of writing it out manually for all 500 DataFrames.
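One way to automate this (a sketch, assuming df_tables_normalized['columns'] holds all 500 entries and every entry normalizes to the four columns used above):
import pandas as pd

cols = ['tableName', 'columnName', 'columnLength', 'columnDatatype']
results = []
for raw in df_tables_normalized['columns']:  # iterate over every entry
    normalized = pd.json_normalize(raw)      # flatten the nested JSON
    results.append(normalized[cols])         # keep only the wanted columns
bigdata = pd.concat(results, ignore_index=True, sort=False)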

How to append to the bottom row of two columns of a csv file using pandas?

I have a function that appends today's date and a random integer to the bottom of their respective columns each time the function is called.
def date_to_csv():
    import datetime
    import pandas as pd
    from random import randint
    df = pd.read_csv("test.csv")
    df['Date'] = [datetime.date.today()]
    df['Price'] = [randint(1, 100)]
    df.to_csv('test.csv', mode='a', index=False, header=None)
For the first two times the function is called, it works as expected and produces this in the csv file:
Date,Price
2021-06-26,29
2021-06-26,97
However, calling the function afterwards raises an error: 'ValueError: Length of values (1) does not match length of index (2)'
I plan to call the function for n consecutive days on the same csv file.
Try:
df = pd.read_csv("test.csv")
df = df.append({'Date': datetime.date.today(), 'Price': randint(1, 100)}, ignore_index=True)
df.to_csv('test.csv', mode='a', index=False, header=None)
As Rob Raymond said, your date list and random-number list do not match the length of your csv, so you should make them match. Also note that every time you write with to_csv, the mode is 'a', which appends new rows to the file:
df = pd.read_csv("test.csv")
length = df.shape[0]
df['Date'] = [datetime.date.today() for _ in range(length)]
df['Price'] = [randint(1,100) for _ in range(length)]
df.to_csv('test.csv',mode='a',index=False,header=None)
To only append a new row after each run:
df = pd.read_csv("test.csv")
df = df.append({'Date':datetime.date.today(), 'Price':randint(1,100)},ignore_index=True)
df.to_csv('test.csv',mode='w',index=False)
Alternatively, build a one-row DataFrame and append just that row to the file:
df = pd.DataFrame({'Date': [datetime.date.today()], 'Price': [randint(1, 100)]})
df.to_csv('test.csv', mode='a', index=False, header=None)
Or append the row directly with the csv module:
import csv
with open('test.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([datetime.date.today(), randint(1, 100)])

Multiple columns from a file into a single column of lists in pandas

I'm new to pandas and need to prepare a table using pandas, imitating the exact function performed by the following code snippet:
with open(r'D:/DataScience/ml-100k/u.item') as f:
    temp = ''
    for line in f:
        fields = line.rstrip('\n').split('|')
        movieId = int(fields[0])
        name = fields[1]
        geners = fields[5:25]
        geners = map(int, geners)
My question is how to add a geners column in pandas that does the same as:
geners = fields[5:25]
It's not clear to me what you intend to accomplish -- a single genres column containing fields 5-25 concatenated? Or separate genre columns for fields 5-25?
For the latter, you can use [pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html):
import pandas as pd
cols = ['movieId', 'name'] + ['genre_' + str(i) for i in range(5, 25)]
df = pd.read_csv(r'D:/DataScience/ml-100k/u.item', delimiter='|', names=cols)
For the former, you could concatenate the genres into, say, a space-separated list, using:
df['genres'] = df[cols[2:]].astype(str).apply(lambda x: ' '.join(x), axis=1)
df.drop(cols[2:], axis=1, inplace=True)  # drop the separate genre_N columns
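And if you literally want a single column of lists (as in the question title), a minimal sketch, starting again from the df produced by read_csv above:
## collapse the genre flag columns into one column holding Python lists
df['genres'] = df[cols[2:]].values.tolist()
df.drop(cols[2:], axis=1, inplace=True)  # drop the separate genre_N columns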
