store for loop results in one dataframe - python

I am trying to store all the dataframes generated by this code in just one dataframe. This is my code:
df_big = pd.DataFrame()
for i in range(1,3):
    df = pd.read_csv('https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2016-0' + str(i) + '.csv')
    df_big.append(df)
print(df_big.shape)
However, my result is an empty DF. Any help would be appreciated.

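Your loop never stores anything: DataFrame.append does not modify df_big in place. It returns a new DataFrame, and because that return value is discarded, df_big stays empty. (DataFrame.append was also deprecated and then removed in pandas 2.0.)
Try using pd.concat and assigning the result back: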
import pandas as pd
df_big = pd.DataFrame()
df = pd.DataFrame(['a','b','c'])
df_big = pd.concat([df_big,df])
print(df_big)
print("Shape of df_big: " + str(df_big.shape))
Output:
   0
0  a
1  b
2  c
Shape of df_big: (3, 1)
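Applied to the loop in the question, a common pattern is to collect each DataFrame in a Python list and call pd.concat once after the loop, instead of growing df_big on every iteration. This is a minimal sketch of that pattern; the two in-memory CSV strings are hypothetical stand-ins for the NYC taxi files, so it runs without downloading anything.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the monthly CSV files in the question;
# with the real data these would be the URLs passed to pd.read_csv.
sources = [
    io.StringIO("trip_id,fare\n1,10.5\n2,7.0\n"),
    io.StringIO("trip_id,fare\n3,12.25\n"),
]

frames = []  # one DataFrame per file
for src in sources:
    frames.append(pd.read_csv(src))  # list.append mutates the list in place

# Concatenate once at the end; ignore_index builds a fresh 0..n-1 index
df_big = pd.concat(frames, ignore_index=True)
print(df_big.shape)
```

With the real files, sources would be the URLs built in the question, e.g. ['https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2016-0' + str(i) + '.csv' for i in range(1, 3)].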

Related

Filter a data frame with similar or near time values (vlookup in dataframe)

I'd like to merge two data frames on near time values, keeping one index fixed and searching for the closest match in the other data frame (similar to VLOOKUP in Excel). Can you recommend another workflow?
I followed this process, but it is not working:
import pandas as pd
# read csv data
path = r"C:\Users\Documents\"
df1 = pd.read_csv(path + '\obs_heads.csv')
df2 = pd.read_csv(path + '\sim.csv')
t = pd.merge_asof(df1, df2, on="A2")
print(t)
Input:
Data frame 1:
Data frame 2:
Output:
Error:
Thanks,
After reading more posts here, I found the answer (thanks to the whole community):
Joining Two Different Dataframes on Timestamp
Pandas date range returns "could not convert string to Timestamp" for yyyy-ww
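import pandas as pd
# read csv data (raw strings keep the backslashes in the Windows paths literal)
path = r"C:\Users\1"
df1 = pd.read_csv(path + r'\obs_heads2.csv')
df2 = pd.read_csv(path + r'\HEAD_compiled_export.csv')
# parse the times and use them as the index; merge_asof requires sorted keys
df1['Times'] = pd.to_datetime(df1['Times'])
df1 = df1.set_index('Times').sort_index()
df2['Times'] = pd.to_datetime(df2['Times'])
df2 = df2.set_index('Times').sort_index()
# match each row of df1 to the nearest time in df2, at most 5 minutes away
tol = pd.Timedelta('5 minute')
t = pd.merge_asof(left=df1, right=df2, right_index=True, left_index=True, direction='nearest', tolerance=tol)
t.to_csv(path + r"\File Name.csv")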
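If you prefer to keep the time column as an ordinary column rather than the index, merge_asof also accepts on=. This is a minimal sketch with made-up timestamps and hypothetical obs/sim columns; it shows direction='nearest' picking the closest time on either side, and tolerance leaving rows with no match within 5 minutes as NaN.

```python
import pandas as pd

# Hypothetical observed and simulated series (values and names invented)
df1 = pd.DataFrame({
    "Times": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 02:00"]),
    "obs": [1.0, 2.0, 3.0],
})
df2 = pd.DataFrame({
    "Times": pd.to_datetime(["2021-01-01 00:03", "2021-01-01 00:58", "2021-01-01 02:30"]),
    "sim": [10.0, 20.0, 30.0],
})

# merge_asof requires both frames to be sorted on the key column
df1 = df1.sort_values("Times")
df2 = df2.sort_values("Times")

# For each df1 row, take the df2 row with the closest time,
# but only if it is no more than 5 minutes away
t = pd.merge_asof(df1, df2, on="Times", direction="nearest",
                  tolerance=pd.Timedelta("5 minutes"))
print(t)
```

Here 02:00 has no df2 time within 5 minutes (the closest is 02:30), so its sim value is NaN.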

Saving multiple dataframes to CSV using loop in python

I am trying to save multiple dataframes to csv in a loop using pandas, while keeping the name of the dataframe.
import pandas as pd
df1 = pd.DataFrame({'Col1':range(1,5), 'Col2':range(6,10)})
df2 = pd.DataFrame({'Col1':range(1,5), 'Col2':range(11,15)})
frames = [df1,df2]
for data in frames:
    data['New'] = data['Col1']+data['Col2']
for data in frames:
    data.to_csv('C:/Users/User/Desktop/{}.csv'.format(data))
This doesn't work. The outcome I am looking for is both dataframes saved to my desktop in CSV format, as:
df1.csv
df2.csv
Thanks.
You just need to set the CSV file names yourself, like so:
names = ["df1", "df2"]
for name, data in zip(names, frames):
    data.to_csv('C:/Users/User/Desktop/{}.csv'.format(name))
Hope this helps. Note that I did not use the format function, and the files are written to the directory I am working in (the current working directory).
import pandas as pd
df1 = pd.DataFrame({'Col1':range(1,5), 'Col2':range(6,10)})
df2 = pd.DataFrame({'Col1':range(1,5), 'Col2':range(11,15)})
frames = [df1,df2]
for data in frames:
    data['New'] = data['Col1']+data['Col2']
n = 0
for data in frames:
    n = n + 1
    data.to_csv('df' + str(n) + ".csv")
In this loop:
for data in frames:
    data.to_csv('C:/Users/User/Desktop/{}.csv'.format(data))
You are looping over a list of DataFrame objects, so .format(data) inserts the DataFrame's whole printed table into the file path rather than a name such as df1.
Instead, you could use enumerate() to get the indexes as well as the objects, and then use the indexes to build the file name.
for idx, data in enumerate(frames):
    # add 1 so the file numbers start from 1 instead of 0
    data.to_csv('df{}.csv'.format(idx + 1))
Otherwise you can loop through your list just using the index like this:
for i in range(len(frames)):
    frames[i].to_csv('df{}.csv'.format(i + 1))
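Another option is to store the frames in a dict keyed by the file name you want, so each name stays attached to its data. A minimal sketch; the temporary folder is a stand-in for the Desktop path from the question, and index=False is a choice (drop it to keep the row index in the file).

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame({'Col1': range(1, 5), 'Col2': range(6, 10)})
df2 = pd.DataFrame({'Col1': range(1, 5), 'Col2': range(11, 15)})

# Each key is the file name to use for its DataFrame
frames = {"df1": df1, "df2": df2}

# Stand-in output folder; replace with e.g. 'C:/Users/User/Desktop'
out_dir = tempfile.mkdtemp()

for name, data in frames.items():
    data['New'] = data['Col1'] + data['Col2']
    data.to_csv(os.path.join(out_dir, f"{name}.csv"), index=False)
```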

How to write back the result to the csv file using pandas without replacing the existing data?

I am trying to read the file, do a calculation, and write the result back to the same file. But the result replaces the original data. How can I change this? Please help me.
import pandas as pd
df = pd.read_csv(r'C:\Users\Asus\Downloads\Number of Employed Persons by Status In Employment, Malaysia.csv')
print(df.to_string())
mean1 = df['Value'].mean()
sum1 = df['Value'].sum()
print ('Mean Value: ' + str(mean1))
print ('Sum of Value: ' + str(sum1))
df = pd.DataFrame([['Mean Value: ' + str(mean1)], ['Sum of Value: ' + str(sum1)]])
df.to_csv(r'C:\Users\Asus\Downloads\Number of Employed Persons by Status In Employment, Malaysia.csv', index=False)
print(df)
Do you want to add the data at the bottom of the file?
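Overwriting the data is not the best approach, in my opinion, but this is one solution:
import pandas as pd
df = pd.read_csv('data.csv')
mean1 = df['Value'].mean()
sum1 = df['Value'].sum()
# add two rows after the last index label
# (this assumes df has exactly two columns and an integer index)
df.loc[df.index.max() + 1] = ['Mean','Sum']
df.loc[df.index.max() + 1] = [mean1, sum1]
# write the extended data back over the original file
df.to_csv('data.csv', index=False)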
Another option: at the end, save everything into an .xlsx file, keeping the original data (dataframe1) on the first sheet and the analysis on a second sheet. Loading the data from CSV is still the better choice when there is a large amount of data.
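If the goal in the title is to keep the existing rows and add the results underneath, to_csv can append instead of rewriting the file. A minimal sketch, assuming the CSV has a text column (called Status here, a hypothetical name) next to Value, so the summary rows can carry labels in the same column layout:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the question's CSV
path = os.path.join(tempfile.mkdtemp(), "data.csv")
pd.DataFrame({"Status": ["A", "B", "C"], "Value": [10, 20, 30]}).to_csv(path, index=False)

df = pd.read_csv(path)
# Summary rows laid out with the same columns as the file
summary = pd.DataFrame({
    "Status": ["Mean", "Sum"],
    "Value": [df["Value"].mean(), df["Value"].sum()],
})

# mode="a" appends below the existing rows instead of overwriting them;
# header=False stops the column names from being written a second time
summary.to_csv(path, mode="a", header=False, index=False)

result = pd.read_csv(path)
print(result)
```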

Python Normalize JSON to DataFrame

I have been trying to normalize this JSON data for quite some time now, but I am getting stuck at a very basic step. I think the answer might be quite simple. I will take any help provided.
import json
import urllib.request
import pandas as pd
url = "https://www.recreation.gov/api/camps/availability/campground/232447/month?start_date=2021-05-01T00%3A00%3A00.000Z"
with urllib.request.urlopen(url) as url:
    data = json.loads(url.read().decode())
#data = json.dumps(data, indent=4)
df = pd.json_normalize(data = data['campsites'], record_path= 'availabilities', meta = 'campsites')
print(df)
My expected DataFrame output is as follows:
One approach (not using pd.json_normalize) is to iterate through a list of the unique campsites and convert the data for each campsite to a DataFrame. The list of campsite-specific DataFrames can then be concatenated using pd.concat.
Specifically:
## generate a list of unique campsites
unique_campsites = list(data['campsites'].keys())
## function that returns a DataFrame for each campsite,
## renaming the index to 'date'
def campsite_to_df(data, campsite):
    out_df = pd.DataFrame(data['campsites'][campsite]).reset_index()
    out_df = out_df.rename({'index': 'date'}, axis = 1)
    return out_df
## generate a list of DataFrames, one per campsite
df_list = [campsite_to_df(data, cs) for cs in unique_campsites]
## concatenate the list of DataFrames into a single DataFrame,
## convert campsite id to integer and sort by campsite + date
df_full = pd.concat(df_list)
df_full['campsite_id'] = df_full['campsite_id'].astype(int)
df_full = df_full.sort_values(by = ['campsite_id','date'],
                              ascending = True)
## remove extraneous columns and rename campsite_id to campsites
df_full = df_full[['campsite_id','date','availabilities',
                   'max_num_people','min_num_people','type_of_use']]
df_full = df_full.rename({'campsite_id': 'campsites'}, axis = 1)
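A different technique, which avoids building one DataFrame per campsite, is to flatten the nested JSON into a list of plain dicts, one per (campsite, date) pair, and build a single DataFrame from that. The sketch assumes each campsite entry has the fields used in the answer above; the inline payload is invented so the example runs offline, and the real API response may carry more fields.

```python
import pandas as pd

# Invented payload shaped like the fields used in the answer above
data = {
    "campsites": {
        "102": {
            "campsite_id": "102",
            "availabilities": {"2021-05-01T00:00:00Z": "Reserved",
                               "2021-05-02T00:00:00Z": "Available"},
            "max_num_people": 6, "min_num_people": 0, "type_of_use": "Overnight",
        },
        "101": {
            "campsite_id": "101",
            "availabilities": {"2021-05-01T00:00:00Z": "Available"},
            "max_num_people": 8, "min_num_people": 0, "type_of_use": "Overnight",
        },
    }
}

# One record per (campsite, date) pair
records = [
    {
        "campsites": int(site["campsite_id"]),
        "date": date,
        "availabilities": status,
        "max_num_people": site["max_num_people"],
        "min_num_people": site["min_num_people"],
        "type_of_use": site["type_of_use"],
    }
    for site in data["campsites"].values()
    for date, status in site["availabilities"].items()
]

# ISO date strings sort correctly as text
df_full = (pd.DataFrame(records)
           .sort_values(["campsites", "date"])
           .reset_index(drop=True))
print(df_full)
```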

Split each line of a dataframe and turn into excel file - 'list' object has no attribute 'to_frame' error

I have a dataframe and want to create an excel file for each row on that dataframe.
What am I missing?
import pandas as pd
d = {'col1':['a','b','c','d','e','f','g','h','i','j'] , 'col2': [1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(data=d)
b=[]
for i, row in df.iterrows():
    content = b.to_frame().T
    content.to_excel("file number" + str(i))
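First, b is an empty list, and a Python list has no to_frame() method, which is exactly what the error says. to_frame() is a pandas Series method, and each row that iterrows() yields is a Series.
So you could set b = row inside the loop,
or just use row instead of b (see the example below).
Second, you need to add ".xlsx" to the end of the Excel file name.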
For instance:
import pandas as pd
d = {'col1':['a','b','c','d','e','f','g','h','i','j'] , 'col2':[1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(data=d)
for i, row in df.iterrows():
    # turn the row (a Series) into a one-row DataFrame
    content = row.to_frame().T
    content.to_excel("file number" + str(i) + '.xlsx')
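One detail worth knowing: when the columns mix strings and integers, the Series that iterrows() yields has object dtype, so row.to_frame().T loses the original column types. Selecting with a list, df.iloc[[i]], returns a one-row DataFrame that keeps them. A minimal comparison that writes no files; it relies on the default 0..n-1 index, so that the label i from iterrows() is also the row position.

```python
import pandas as pd

d = {'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]}
df = pd.DataFrame(data=d)

for i, row in df.iterrows():
    from_row = row.to_frame().T   # Series -> one-row DataFrame, object columns
    from_iloc = df.iloc[[i]]      # list selector keeps a DataFrame and its dtypes
    print(from_row.dtypes.tolist(), from_iloc.dtypes.tolist())
```

Either frame can then be passed to to_excel(...) as in the answer; writing .xlsx files requires an Excel engine such as openpyxl to be installed.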
