I have about 13 dataframes and I need to write all of them to CSV, so I thought I could use a for loop.
For example:
data1 = pd.DataFrame({'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]})
data2 = pd.DataFrame({'Name':['ABC', 'EFG', 'HIJ', 'LMN'],'Age':[2,3,9,4]})
..
data13 = ....
list_df = [data1, data2,.....,data13]
for i in list_df:
    list_df[i].to_csv(...)
But I get an error saying a list can't have dataframes. What can I do to loop through the dataframes by variable name?
for i in list_df:
    i.to_csv(...)
Here the variable i is the individual dataframe in the list. I think you are treating it as an index, which it is not.
for i, x in enumerate(list_df):
    list_df[i].to_csv(...)
This would work too, because with enumerate i really is the index.
Just do this:
for i in list_df:
    i.to_csv(...)
Your i references the dataframe itself, not its position in the list. You can do it in one of the following ways:
for i in list_df:
    i.to_csv(...)
or:
for i, df in enumerate(list_df):
    list_df[i].to_csv(...)
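Since every file needs its own name, enumerate is also handy for building the filenames. Here is a minimal sketch, assuming a naming scheme like data1.csv, data2.csv (that scheme, the written list and the temporary output folder are my own choices for illustration, not something from the question):

```python
import tempfile
from pathlib import Path

import pandas as pd

data1 = pd.DataFrame({"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]})
data2 = pd.DataFrame({"Name": ["ABC", "EFG", "HIJ", "LMN"], "Age": [2, 3, 9, 4]})
list_df = [data1, data2]

# A temporary folder keeps the sketch self-contained; use any folder you like.
out_dir = Path(tempfile.mkdtemp())

written = []
# start=1 makes the file numbers line up with the variable names (data1, data2, ...).
for n, df in enumerate(list_df, start=1):
    path = out_dir / f"data{n}.csv"  # hypothetical naming scheme
    df.to_csv(path, index=False)     # index=False leaves the row index out of the file
    written.append(path)
```

The same loop works unchanged for all 13 dataframes once they are in list_df.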
I have 4 different dataframes containing time series data that all have the same structure.
My goal is to take each individual dataframe and pass it through a function I have defined that will group them by datestamp, sum the columns and return a new dataframe with the columns I want. So in total I want 4 new dataframes that have only the data I want.
I just looked through this post:
Loop through different dataframes and perform actions using a function
but applying this did not change my results.
Here is my code:
I am putting the dataframes in a list so I can iterate through them
dfs = [vds, vds2, vds3, vds4]
This is my function I want to pass each dataframe through:
def VDS_pre(df):
    df = df.groupby(['datestamp','timestamp']).sum().reset_index()
    df = df.rename(columns={'datestamp': 'Date','timestamp':'Time','det_vol': 'VolumeVDS'})
    df = df[['Date','Time','VolumeVDS']]
    return df
This is the loop I made to iterate through my dataframe list and pass each one through my function:
for df in dfs:
    df = VDS_pre(df)
However once I go through my loop and go to print out the dataframes, they have not been modified and look like they initially did. Thanks for the help!
However once I go through my loop and go to print out the dataframes, they have not been modified and look like they initially did.
Yes, this is actually the case. The reason why they have not been modified is:
Assigning to item inside a for item in lst: loop only rebinds the name item. It has no effect on lst or on the variables from which the list items got their values, as the following code demonstrates:
v1=1; v2=2; v3=3
lst = [v1,v2,v3]
for item in lst:
    item = 0
print(lst, v1, v2, v3) # gives: [1, 2, 3] 1 2 3
To achieve the result you expect to obtain you can use a list comprehension and the list unpacking feature of Python:
vds,vds2,vds3,vds4=[VDS_pre(df) for df in [vds,vds2,vds3,vds4]]
or the following code, which uses a list of strings holding the variable names of the dataframes:
sdfs = ['vds', 'vds2', 'vds3', 'vds4']
for sdf in sdfs:
    exec(f'{sdf} = VDS_pre({sdf})')
Now printing vds, vds2, vds3 and vds4 will output the modified dataframes.
Pandas frame operations return a new copy of the data. Your snippet stores the result in the loop variable df, which is never written back to your initial list. This is why nothing is stored after the loop has run.
If you don't need to keep original frames, you may simply overwrite them:
for i, df in enumerate(dfs):
    dfs[i] = VDS_pre(df)
Otherwise, use a second list and append each result to it.
l = []
for df in dfs:
    df2 = VDS_pre(df)
    l.append(df2)
Or, even better, use a list comprehension to rewrite this snippet as a single line of code.
Now you are able to store the result of your processing.
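For reference, the list comprehension version could look like the sketch below. The sample frames are made up (only the column names come from the question) and processed is just a name I picked:

```python
import pandas as pd


def VDS_pre(df):
    # Same steps as the function in the question.
    df = df.groupby(["datestamp", "timestamp"]).sum().reset_index()
    df = df.rename(columns={"datestamp": "Date", "timestamp": "Time", "det_vol": "VolumeVDS"})
    return df[["Date", "Time", "VolumeVDS"]]


# Hypothetical sample data standing in for the real time series.
vds = pd.DataFrame({
    "datestamp": ["2020-01-01", "2020-01-01"],
    "timestamp": ["00:00", "00:00"],
    "det_vol": [1, 2],
})
vds2 = pd.DataFrame({
    "datestamp": ["2020-01-02", "2020-01-02"],
    "timestamp": ["00:00", "00:05"],
    "det_vol": [10, 20],
})
dfs = [vds, vds2]

# One line: build a new list of processed frames and leave the originals alone.
processed = [VDS_pre(df) for df in dfs]
```

Here processed[0] is the result for vds and processed[1] the result for vds2, while vds and vds2 themselves stay as they were.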
Additionally, if your frames have the same structure and can be merged into a single frame, you may consider concatenating them first and then applying your function once to the result. That would be the more idiomatic pandas approach.
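A rough sketch of the concat idea, assuming you still want to tell the frames apart afterwards. The source column and the sample data are my own additions for illustration; without such a column the sums would be taken across all frames together:

```python
import pandas as pd

# Hypothetical sample data with the column names from the question.
vds = pd.DataFrame({
    "datestamp": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "timestamp": ["00:00", "00:00", "00:00"],
    "det_vol": [1, 2, 3],
})
vds2 = pd.DataFrame({
    "datestamp": ["2020-01-01", "2020-01-01"],
    "timestamp": ["00:00", "00:05"],
    "det_vol": [10, 20],
})

# Stack the frames; "source" records which frame each row came from.
combined = pd.concat(
    [vds.assign(source="vds"), vds2.assign(source="vds2")],
    ignore_index=True,
)

# One groupby over everything instead of one call per frame.
summed = (
    combined.groupby(["source", "datestamp", "timestamp"])["det_vol"]
    .sum()
    .reset_index()
    .rename(columns={"datestamp": "Date", "timestamp": "Time", "det_vol": "VolumeVDS"})
)
```

If you need separate frames again, something like summed[summed["source"] == "vds"] gives you the rows for one input.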
I am trying to loop through a list of dataframes (list_a) and reindex them using another list (list_b). The function .loc works fine in the loop below:
for z, x in zip(list_a, list_b):
    z.loc([x])
I just have no clue how to save the result as new dataframes.
I imagine the code might start as follows:
df_new = pd.DataFrame()
for i in df_list:
    for z, x in zip(df_list, regions_order):
        z.loc([x])
Do you have any suggestions?
Thanks a lot for your help!
list_a=[df1, df2]
df1 and df2 consist of an index='colors' and one column='freq' (float)
list_b=['green', 'yellow', 'blue', 'brown']
You might be confusing these two functions:
zip https://docs.python.org/3/library/functions.html#zip
and enumerate https://docs.python.org/3/library/functions.html#enumerate
In that case, I would suggest trying this:
for i, df in enumerate(list_a):
    list_a[i] = df.reindex(list_b)
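If you would rather keep the original frames and get new ones, a list comprehension does the same thing. A small sketch, assuming frames shaped as described in the question (the freq values and the name new_frames are made up):

```python
import pandas as pd

# Hypothetical frames: index "colors", one float column "freq".
df1 = pd.DataFrame(
    {"freq": [0.5, 0.3, 0.2]},
    index=pd.Index(["blue", "green", "brown"], name="colors"),
)
df2 = pd.DataFrame(
    {"freq": [0.6, 0.4]},
    index=pd.Index(["yellow", "green"], name="colors"),
)
list_a = [df1, df2]
list_b = ["green", "yellow", "blue", "brown"]

# reindex returns a new frame, so the originals in list_a are untouched.
# Labels from list_b that a frame does not have show up as NaN rows.
new_frames = [df.reindex(list_b) for df in list_a]
```

Each frame in new_frames now has its rows in the order of list_b.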
I want to append 3 variables to an empty dataframe after each loop.
dfvol = dfvol.append([stock,mean,median],columns=['Stock','Mean','Median'])
Columns in Dataframe should be ['Stock','Median','Mean']
Result should be:
How can I solve the problem, because something with the append code is wrong.
You're trying to use a syntax for creating a new dataframe to append to it, which is not going to work.
Here is one way you can try to do what you want:
df.loc[len(df)] = [stock,mean,median]
A better approach is to collect the entries in a list and, once your loop is done, create the dataframe from that list (instead of appending to df on every iteration).
Like this:
some_list = []
for a in b:
    some_list.append([stock,mean,median])
df = pd.DataFrame(some_list, columns = ['Stock','Mean','Median'])
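To make that runnable, here is a minimal sketch with made-up input. The volumes dict and its numbers are purely hypothetical, since the question does not show where stock, mean and median come from:

```python
import pandas as pd

# Hypothetical input: one list of values per stock.
volumes = {"AAPL": [100, 200, 600], "FB": [50, 70, 90]}

rows = []
for stock, values in volumes.items():
    series = pd.Series(values)
    # Collect plain rows; no dataframe is touched inside the loop.
    rows.append([stock, series.mean(), series.median()])

# Build the dataframe once, after the loop.
dfvol = pd.DataFrame(rows, columns=["Stock", "Mean", "Median"])
```

Building the frame once at the end avoids copying the whole dataframe on every iteration.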
The append method doesn't work like that. You would only use the columns parameter if you were creating a DataFrame object. You either want to create a second temporary DataFrame and append it to the main DataFrame like this:
df_tmp = pd.DataFrame([[stock,mean,median]], columns=['Stock','Mean','Median'])
dfvol = dfvol.append(df_tmp)
...or you can use a dictionary like this:
dfvol = dfvol.append({'Stock':stock,'Mean':mean,'Median':median}, ignore_index=True)
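One caveat for newer pandas versions: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on a current install the same idea is written with pd.concat. A minimal sketch, using the example values that appear elsewhere in this thread (the frames list is my own name):

```python
import pandas as pd

frames = []
for stock, mean, median in [("AAPL", 600.356, 281.788), ("FB", 700.245, 344.55)]:
    # One small single-row frame per iteration.
    frames.append(pd.DataFrame([[stock, mean, median]], columns=["Stock", "Mean", "Median"]))

# Concatenate once at the end; ignore_index=True renumbers the rows 0..n-1.
dfvol = pd.concat(frames, ignore_index=True)
```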
Like this:
In [256]: dfvol = pd.DataFrame()
In [257]: stock = ['AAPL', 'FB']
In [258]: mean = [600.356, 700.245]
In [259]: median = [281.788, 344.55]
In [265]: dfvol = dfvol.append(pd.DataFrame(zip(stock, mean, median), columns=['Stock','Mean','Median']))
In [265]: dfvol
Out[265]:
  Stock     Mean   Median
0  AAPL  600.356  281.788
1    FB  700.245  344.550
Check the append notation here. There are multiple ways to do it.
dfvol = dfvol.append(pd.DataFrame([[stock,mean,median]],columns=['Stock','Mean','Median']))
I have 30 dataframes and each one has a single column. The column names are long and look something like this:
df1.columns = ['123.ABC_xyz_1.CB_1.S_01.M_01.Pmax']
df2.columns = ['123.ABC_xyz_1.CB_1.S_01.M_02.Pmax']
..
df30.columns = ['123.ABC_xyz_1.CB_1.S_01.M_30.Pmax']
I want to trim their names so that they end up looking something like this:
df1.columns = ['M1Pmax']
df2.columns = ['M2Pmax']
..
df30.columns = ['M30Pmax']
I thought of something like this:
df_list = [df1,df2,....,df30]
for i,k in enumerate(df_list):
    df_list[i].columns = [col_name+'_df[i]{}'.format(df_list[i]) for col_name in df_list[i].columns]
However, my above code is not working properly.
How to do it?
You are trying to use the dataframe itself in the name which is not gonna work. I am assuming you were trying to use the name of the dataframe. You are also not shortening anything in your code but just making it longer. I would suggest something like:
df_list = [df1,df2,....,df30]
for i, df in enumerate(df_list, start=1):
    df.columns = ['M{}'.format(i)+col_name.split(".")[-1] for col_name in df.columns]
IIUC
l=[]
for i in df_list:
    i.columns=i.columns.str.split('.').str[-2:].str.join('').str.replace('_','')
    l.append(i)
Why not do it like this?
# List of all dataframes
df_list = [df1,df2,....,df30]
# List of column names for all dataframes
column_names =[['M1Pmax'],['M2Pmax'],...., ['M30Pmax']]
for i in range(len(df_list)):
    df_list[i].columns = column_names[i]
Hope this helps!
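If you want exactly the M1Pmax form without typing 30 names by hand, a regular expression can pull the number and the suffix out of the long name. This is only a sketch under the assumption that every name ends in M_<number>.<suffix> as in the question; shorten is a hypothetical helper name and the sample frames are made up:

```python
import re

import pandas as pd


def shorten(name):
    """Hypothetical helper: '...M_01.Pmax' -> 'M1Pmax'."""
    match = re.search(r"M_(\d+)\.(\w+)$", name)
    if match is None:
        return name  # leave names alone that do not follow the pattern
    number, suffix = match.groups()
    # int() drops the leading zero, so '01' becomes 1.
    return f"M{int(number)}{suffix}"


df1 = pd.DataFrame({"123.ABC_xyz_1.CB_1.S_01.M_01.Pmax": [1.0]})
df30 = pd.DataFrame({"123.ABC_xyz_1.CB_1.S_01.M_30.Pmax": [2.0]})
df_list = [df1, df30]

for df in df_list:
    # Assigning to .columns changes the dataframe in place.
    df.columns = [shorten(col) for col in df.columns]
```

Because the number is read from the column name itself, this does not depend on the order of the dataframes in the list.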
I am new to pandas/Python. I am reading an .xlsx file and from it I created a bunch of dataframes, 16 to be precise, plus a master dataframe which is empty. Now I want to append all 16 dataframes to the master dataframe one by one, using a for loop.
One method I thought of is iterating through a list. But can df_1, df_2 etc. be stored in a list so that we can iterate over them?
Suppose I had CSV files instead; then
df1 = pd.read_csv('---.csv')
df2 = pd.read_csv('---.csv')
then i create a list,
filenames = ['---.csv','---.csv']
create an empty master dataframe :
master_df= []
finally, loop through the list :
for f in filenames:
    master_df.append(pd.read_csv(f))
But this doesn't apply to my case. I need something similar for dataframes that already exist, so how can I iterate over all of them? Any solution would be appreciated.
FINALLY, this is my master_df :
master_df = pd.DataFrame({'Variable_Name': [], 'Value':[], 'Count': []})
and this is the 1st dataframe :
df_1 = pd.DataFrame({
    'Variable_Name': ['Track', 'Track', 'Track', 'Track'],
    'Value': ['Track 38','Track 39', 'Track 40', 'Track 37'],
    'Count': [161, 160, 158, 152]})
Similarly 15 more are there.
This is because DataFrame.append() returns a new dataframe, and that object has to be stored somewhere.
Try:
for f in filenames:
    master_df = master_df.append(pd.read_csv(f))
More info of append function: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html
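For dataframes that already exist in memory, you can put them in a list and combine them in one go with pd.concat, which also works on pandas 2.0 and later, where DataFrame.append no longer exists. A minimal sketch: df_1 is taken from the question, while df_2 and its values are invented to stand in for the other 15 frames:

```python
import pandas as pd

df_1 = pd.DataFrame({
    "Variable_Name": ["Track", "Track", "Track", "Track"],
    "Value": ["Track 38", "Track 39", "Track 40", "Track 37"],
    "Count": [161, 160, 158, 152],
})
# Hypothetical second frame with the same columns.
df_2 = pd.DataFrame({
    "Variable_Name": ["Genre", "Genre"],
    "Value": ["Rock", "Jazz"],
    "Count": [90, 45],
})

# Yes, dataframes can be stored in a list like any other object.
frames = [df_1, df_2]

# Stack them into the master dataframe; ignore_index=True renumbers the rows.
master_df = pd.concat(frames, ignore_index=True)
```

With this approach there is no need to create an empty master dataframe first.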