I have 30 DataFrames, each with a single column. The column names are long and look like this:
df1.columns = ['123.ABC_xyz_1.CB_1.S_01.M_01.Pmax']
df2.columns = ['123.ABC_xyz_1.CB_1.S_01.M_02.Pmax']
..
df30.columns = ['123.ABC_xyz_1.CB_1.S_01.M_30.Pmax']
I want to shorten the names so that they end up like this:
df1.columns = ['M1Pmax']
df2.columns = ['M2Pmax']
..
df30.columns = ['M30Pmax']
I thought of something like this:
df_list = [df1,df2,....,df30]
for i, k in enumerate(df_list):
    df_list[i].columns = [col_name+'_df[i]{}'.format(df_list[i]) for col_name in df_list[i].columns]
However, this code does not work properly.
How can I do this?
You are trying to use the DataFrame itself in the name, which is not going to work; I assume you meant to use the DataFrame's name. Your code also does not shorten anything, it only makes the names longer. I would suggest something like:
df_list = [df1,df2,....,df30]
# start=1 so that the first frame becomes M1 rather than M0
for i, df in enumerate(df_list, start=1):
    df.columns = ['M{}'.format(i) + col_name.split('.')[-1] for col_name in df.columns]
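If I understand correctly:
l = []
for i in df_list:
    # keep the last two dot-separated parts, then drop the underscore and any leading zeros
    i.columns = i.columns.str.split('.').str[-2:].str.join('').str.replace('_0*', '', regex=True)
    l.append(i)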
Why not do it like this?
# List of all DataFrames
df_list = [df1,df2,....,df30]
# List of column names for each DataFrame
column_names = [['M1Pmax'],['M2Pmax'],...., ['M30Pmax']]
for i in range(len(df_list)):
    # each entry is already a list, so assign it directly
    df_list[i].columns = column_names[i]
Hope this helps!
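As a further variation on the answers above, here is a minimal sketch that derives the short name from the long name itself, so it does not depend on the order of the list. The helper name shorten_column and the two small sample frames are made up for illustration, and it assumes every column name ends in a part like 'M_01' followed by a label like 'Pmax'.

```python
import pandas as pd


def shorten_column(name):
    """Turn '...M_01.Pmax' into 'M1Pmax'.

    Assumes the last two dot-separated parts look like 'M_<number>' and '<label>'.
    """
    prefix_part, label = name.split(".")[-2:]
    prefix, number = prefix_part.split("_")
    # int() drops the leading zero: '01' -> 1
    return "{}{}{}".format(prefix, int(number), label)


# Two sample frames standing in for df1 ... df30
df1 = pd.DataFrame({"123.ABC_xyz_1.CB_1.S_01.M_01.Pmax": [1.0, 2.0]})
df30 = pd.DataFrame({"123.ABC_xyz_1.CB_1.S_01.M_30.Pmax": [3.0, 4.0]})

df_list = [df1, df30]
for df in df_list:
    df.columns = [shorten_column(c) for c in df.columns]
```

Because the number is read from the name, a frame placed out of order in df_list would still get the right label.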
I'm starting to lose my mind a bit. I have:
df = pd.DataFrame(bunch_of_stuff)
df2 = df.loc[bunch_of_conditions].copy()
def transform_df2(df2):
    df2['new_col'] = [rand()]*len(df2)
    df2['existing_column_1'] = [list of new values]
    return df2
df2 = transform_df2(df2)
I now want to re-insert df2 into df, so that it overwrites all of the corresponding existing records.
What would be the best way to do this? df.loc[df2.index] = df2? That doesn't bring over any of the new columns in df2, though.
You have the right method with pd.concat. However, you can optimize it a little by using a boolean mask, which avoids recomputing the index difference:
m = bunch_of_conditions
df2 = df[m].copy()
df = pd.concat([df[~m], df2]).sort_index()
Why do you want to make a copy of your DataFrame? Isn't it simpler to use the DataFrame itself?
One way I did it was:
df = pd.concat([df.loc[~df.index.isin(df2.index)], df2])
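To make the mask-based approach concrete, here is a small self-contained sketch. The data, the condition and the column values are invented for illustration; only the pd.concat([df[~m], df2]).sort_index() pattern comes from the answers above.

```python
import pandas as pd

# Made-up data standing in for bunch_of_stuff
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Made-up condition standing in for bunch_of_conditions
m = df["a"] > 2

# Transform the subset: add a new column and overwrite an existing one
df2 = df[m].copy()
df2["new_col"] = 0.5
df2["b"] = [300, 400]

# Keep the untouched rows, add the transformed ones, restore the original order
result = pd.concat([df[~m], df2]).sort_index()
```

Rows that were not part of df2 get NaN in new_col, since they never had a value for it.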
I am looking into creating a big dataframe (pandas) from several individual frames. The data is organized in MF4-Files and the number of source files varies for each cycle. The goal is to have this process automated.
Creation of Dataframes:
df = (MDF('File1.mf4')).to_dataframe(channels)
df1 = (MDF('File2.mf4')).to_dataframe(channels)
df2 = (MDF('File3.mf4')).to_dataframe(channels)
These Dataframes are then merged:
df = pd.concat([df, df1, df2], axis=0)
How can I do this without dynamically creating variables for df, df1 etc.? Or is there no other way?
I have all file paths in a list of the form:
Filepath = ['File1.mf4', 'File2.mf4','File3.mf4',]
Now I am thinking of looping through it and dynamically creating the data frames df, df1, ... df1000. Any advice here?
Edit: here is the full code:
df = (MDF('File1.mf4')).to_dataframe(channels)
df1 = (MDF('File2.mf4')).to_dataframe(channels)
df2 = (MDF('File3.mf4')).to_dataframe(channels)
#The Data has some offset:
x = df.index.max()
df1.index += x
x = df1.index.max()
df2.index += x
#With correct index now the data can be merged
df = pd.concat([df, df1, df2], axis=0)
As I read your question, you already have a predefined list of files. So just:
l = []
for f in [ list ... of ... files ]:
    df = load_file(f) # however you load it
    l.append(df)
big_df = pd.concat(l)
del l, df, f # if you want to clean it up
You therefore don't need to manually specify variable names for your data sub-sections. If you also want to do checks or column renaming between the various files, you can also just put that into the for-loop (or alternatively, if you want to simplify to a list comprehension, into the load_file function body).
Try this:
df_list = [(MDF(file)).to_dataframe(channels) for file in Filepath]
df = pd.concat(df_list)
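Neither loop above applies the index offset from the question's edit. Here is a rough sketch of how that step could be folded into the loop. The function name concat_with_offsets is made up, and small in-memory frames stand in for the MF4 data; in the question's setting each frame would instead come from MDF(path).to_dataframe(channels).

```python
import pandas as pd


def concat_with_offsets(frames):
    """Concatenate frames, shifting each index by the running maximum so far."""
    parts = []
    offset = 0
    for frame in frames:
        shifted = frame.copy()
        shifted.index = shifted.index + offset
        # Same rule as in the question: the next frame starts at this frame's max
        offset = shifted.index.max()
        parts.append(shifted)
    return pd.concat(parts, axis=0)


# Stand-in frames with a time-like index that restarts at 0 in every file
frames = [
    pd.DataFrame({"v": [1.0, 2.0]}, index=[0.0, 0.5]),
    pd.DataFrame({"v": [3.0, 4.0]}, index=[0.0, 0.5]),
    pd.DataFrame({"v": [5.0, 6.0]}, index=[0.0, 0.5]),
]
combined = concat_with_offsets(frames)
```

This mirrors the question's offset logic as written, so if every file starts at 0 the boundary value appears twice in the combined index.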
I want to append three values to an initially empty DataFrame on each loop iteration.
dfvol = dfvol.append([stock,mean,median],columns=['Stock','Mean','Median'])
Columns in Dataframe should be ['Stock','Median','Mean']
Result should be:
How can I solve this? Something is wrong with my append code.
You're trying to use a syntax for creating a new dataframe to append to it, which is not going to work.
Here is one way to do what you want:
df.loc[len(df)] = [stock,mean,median]
A better approach is to collect the entries in a list and create the DataFrame from that list once the loop is done, instead of appending to the DataFrame on every iteration.
Like this:
some_list = []
for a in b:
    some_list.append([stock,mean,median])
df = pd.DataFrame(some_list, columns = ['Stock','Mean','Median'])
The append method doesn't work like that. You would only use the columns parameter if you were creating a DataFrame object. You either want to create a second temporary DataFrame and append it to the main DataFrame like this:
df_tmp = pd.DataFrame([[stock,mean,median]], columns=['Stock','Mean','Median'])
dfvol = dfvol.append(df_tmp)
...or you can use a dictionary like this:
dfvol = dfvol.append({'Stock':stock,'Mean':mean,'Median':median}, ignore_index=True)
Like this:
In [256]: dfvol = pd.DataFrame()
In [257]: stock = ['AAPL', 'FB']
In [258]: mean = [600.356, 700.245]
In [259]: median = [281.788, 344.55]
In [265]: dfvol = dfvol.append(pd.DataFrame(zip(stock, mean, median), columns=['Stock','Mean','Median']))
In [265]: dfvol
Out[265]:
  Stock     Mean   Median
0  AAPL  600.356  281.788
1    FB  700.245  344.550
Check the append notation in the documentation. There are multiple ways to do it.
dfvol = dfvol.append(pd.DataFrame([[stock,mean,median]],columns=['Stock','Mean','Median']))
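One caveat for anyone reading this on a recent pandas: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append-based snippets above only run on older versions. A minimal sketch of the same idea with pd.concat follows. The results list is invented sample data standing in for whatever the loop computes.

```python
import pandas as pd

# Invented per-iteration results; in the question these come from the loop
results = [("AAPL", 600.356, 281.788), ("FB", 700.245, 344.55)]

columns = ["Stock", "Mean", "Median"]
rows = []
for stock, mean, median in results:
    # One single-row frame per iteration
    rows.append(pd.DataFrame([[stock, mean, median]], columns=columns))

# Concatenate once at the end instead of growing the frame inside the loop
dfvol = pd.concat(rows, ignore_index=True)
```

Collecting plain lists and calling pd.DataFrame once, as suggested earlier in the thread, is usually simpler still.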
I need your help:
I have a DataFrame df3:
I am using pivot_table:
df4 = df3.pivot_table(index=['Number','Department','Task'], columns='Date', values='Score', fill_value='N/A')
The output df4 looks like:
Why are the rows where Task is empty not shown?
What am I doing wrong?
I would like to create a DataFrame like this:
I think you need to replace the missing values before calling pivot_table, because by default rows with a missing value in one of the index columns are dropped:
cols = ['Number','Department','Task']
df3[cols] = df3[cols].fillna('N/A')
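Here is a small sketch of the effect, using invented data since the original tables are not shown. It compares pivot_table with and without filling the missing Task values first.

```python
import numpy as np
import pandas as pd

# Invented data: the second Number/Department pair has no Task
df3 = pd.DataFrame({
    "Number": [1, 1, 2, 2],
    "Department": ["HR", "HR", "IT", "IT"],
    "Task": ["A", "A", np.nan, np.nan],
    "Date": ["2020-01", "2020-02", "2020-01", "2020-02"],
    "Score": [5, 6, 7, 8],
})
cols = ["Number", "Department", "Task"]

# With default settings, rows whose index keys contain NaN are left out
without_fill = df3.pivot_table(index=cols, columns="Date", values="Score")

# Filling the keys first keeps those rows under the 'N/A' label
filled = df3.copy()
filled[cols] = filled[cols].fillna("N/A")
with_fill = filled.pivot_table(index=cols, columns="Date", values="Score")
```

Printing without_fill and with_fill side by side shows the row for Number 2 only in the second result.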
So I have a DataFrame with around 40 columns. They contain (made-up) scores for a test. The columns are currently named as follows:
Student, Date, Score, Score.1, Score.2 all the way to Score.39.
We were asked to reset the column names so they match the score (change Score to Score.1, Score.1 to Score.2, Score.2 to Score.3 and so on).
My code looks like this now:
import pandas as pd
prog = pd.read_excel('File.xlsx')
for c in prog.columns:
    prog[c].rename(columns = lambda x : 'Score_' + x)
Unfortunately, this does not give the output I want. I was hoping someone could show me how to do this.
Thanks in advance
John Galt came up with the solution in the comments:
cols = prog.columns.tolist()
prog.columns = cols[:2] + ['Score_%i' % i for i in range(1, len(cols[2:])+1)]
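A quick way to check the renaming is to run it on a tiny stand-in frame. The five-column empty frame below is invented for illustration; the real one comes from pd.read_excel('File.xlsx') and has about 40 columns.

```python
import pandas as pd

# Small stand-in for the frame read from the Excel file
prog = pd.DataFrame(columns=["Student", "Date", "Score", "Score.1", "Score.2"])

cols = prog.columns.tolist()
# Keep the first two names, then number the score columns starting at 1
prog.columns = cols[:2] + ["Score_%i" % i for i in range(1, len(cols[2:]) + 1)]

print(list(prog.columns))
```

Assigning to prog.columns replaces all names at once, which is why no loop over the columns is needed.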