python pandas loops to melt or pivot multiple df - python

I have several DataFrames with the same structure. I'd like to write a loop that melts each of them or creates a pivot table from each.
I tried the following, but it does not work:
my_df = [df1, df2, df3]
for df in my_df:
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
for df in my_df:
    df = pd.pivot_table(df, values = 'my_value', index = ['A','B','C'], columns = ['my_column'])
Any help would be great. Thank you in advance

You need to assign the output to a new list of DataFrames:
out = []
for df in my_df:
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
    out.append(df)
The same idea as a list comprehension:
out = [pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value') for df in my_df]
If you need to overwrite the original values in the list:
for i, df in enumerate(my_df):
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
    my_df[i] = df
print(my_df)
If you need to overwrite the variables df1, df2, df3:
df1, df2, df3 = [pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value') for df in my_df]
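The question also asks about the pivot step, which the answer above does not show. A minimal sketch of the full round trip, assuming small made-up frames (the data and the names `melted`, `pivoted` and `unchanged` are illustrative, not from the question). It passes `var_name='my_column'` to `pd.melt` so that the later `pivot_table` call has a column of that name to pivot on:

```python
import pandas as pd

# Hypothetical sample frames with a shared structure (not from the question).
df1 = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y'], 'C': [10, 20],
                    'p': [0.1, 0.2], 'q': [0.3, 0.4]})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['z', 'w'], 'C': [30, 40],
                    'p': [0.5, 0.6], 'q': [0.7, 0.8]})
my_df = [df1, df2]

# Rebinding the loop variable leaves both the list and df1/df2 untouched.
for df in my_df:
    df = pd.melt(df, id_vars=['A', 'B', 'C'], value_name='my_value')
unchanged = list(my_df[0].columns)  # still the original wide columns

# Collect the results instead. var_name names the column that holds the
# former column labels, so pivot_table can refer to it afterwards.
melted = [pd.melt(df, id_vars=['A', 'B', 'C'],
                  var_name='my_column', value_name='my_value')
          for df in my_df]

# Pivot each melted frame back to wide form (the default aggfunc is the mean).
pivoted = [pd.pivot_table(df, values='my_value', index=['A', 'B', 'C'],
                          columns=['my_column'])
           for df in melted]
```

Each entry of `pivoted` is indexed by A, B and C, with one column per former value column.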

Related

Creating DataFrames from cell ranges to create an output

Here is my code:
import pandas as pd
import os
data_location = ""
os.chdir(data_location)
df1 = pd.read_excel('Calculation - (Vodafone) July 22.xlsx', sheet_name='PPD Summary',
                    index_col=False)
df2 = df1.iat[3, 5]
df3 = df1.iat[4, 5]
df4 = '9999305'
df5 = df1.iat[3, 1]
df6 = df1.iat[4, 1]
df7 = df1.iat[3, 6]
df8 = df1.iat[4, 6]
print(df4, df5, df2, df7)
print(df4, df6, df3, df8)
Running this script will return me the following which I want to output to a csv:
9999305 0.007018639425878576 GB GBP
9999305 0.006709984038878434 IE EUR
The cells which contain the information I need are in B5:B6, F5:F6 & G5:G6. I have tried using openpyxl to get the cell ranges, but I am struggling to present and output them so that the resulting csv looks like the above.
Try:
result = pd.DataFrame([[df4, df5, df2, df7],
                       [df4, df6, df3, df8]])
result.to_csv('filename.csv', header=False, index=False)
'filename.csv' will contain:
9999305,0.007018639425878576,GB,GBP
9999305,0.006709984038878434,IE,EUR
If you want just to print them in a comma-separated-format:
print(df4, df5, df2, df7, sep=',')
print(df4, df6, df3, df8, sep=',')
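Since the question mentions cell ranges, the same rows can also be selected in one step with `iloc` rather than one `iat` call per cell. A minimal sketch, assuming a stand-in frame in place of the real sheet (the workbook is not available here, so `sheet`, `block` and the column label `'id'` are hypothetical; the values are copied from the output shown in the question):

```python
import pandas as pd

# Hypothetical stand-in for the 'PPD Summary' sheet: only the six cells
# used in the question are filled in, everything else is empty.
rows = [[None] * 7 for _ in range(5)]
rows[3][1], rows[3][5], rows[3][6] = 0.007018639425878576, 'GB', 'GBP'
rows[4][1], rows[4][5], rows[4][6] = 0.006709984038878434, 'IE', 'EUR'
sheet = pd.DataFrame(rows)

# Rows 3-4 and columns 1, 5, 6 are the same positions the iat calls read.
block = sheet.iloc[3:5, [1, 5, 6]].copy()

# Prepend the constant identifier as the first column.
block.insert(0, 'id', '9999305')

# With no path argument, to_csv returns the CSV text as a string.
csv_text = block.to_csv(header=False, index=False)
print(csv_text)
```

Passing a filename to `to_csv` instead writes the same two lines to disk.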

rename columns according to list

I have 3 lists of data frames, and I want to add a suffix to each column depending on which list its data frame belongs to. Everything is in order, so the first item in the suffix list should be appended to the columns of the data frames in the first list, and so on. My attempt is below, but it adds every item in the suffix list to each column.
In the expected output
all columns in dfs in cat_a need group1 appended
all columns in dfs in cat_b need group2 appended
all columns in dfs in cat_c need group3 appended
data and code are here
import numpy as np
import pandas as pd
df1, df2, df3, df4 = (pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('a', 'b')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('c', 'd')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('e', 'f')),
                      pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=('g', 'h')))
cat_a = [df1, df2]
cat_b = [df3, df4, df2]
cat_c = [df1]
suffix =['group1', 'group2', 'group3']
dfs = [cat_a, cat_b, cat_c]
for x, y in enumerate(dfs):
    for i in y:
        suff=suffix
        i.columns = i.columns + '_' + suff[x]
Thanks for taking a look!
Brian Joseph's answer is great*, but I'd like to point out that you were very close, you just weren't renaming the columns correctly. Your last line should be like this:
i.columns = [col + '_' + suff[x] for col in i.columns]
instead of this:
i.columns = i.columns + '_' + suff[x]
Assuming you want to have multiple suffixes for some dataframes, I think this is what you want:
suffix_mapper = {
    'group1': [df1, df2],
    'group2': [df3, df4, df2],
    'group3': [df1]
}
for suffix, dfs in suffix_mapper.items():
    for df in dfs:
        df.columns = [f"{col}_{suffix}" for col in df.columns]
I think the issue is that you're not taking copies of the dataframes, so the same dataframe object is referenced from several cat lists and gets renamed once for each list it appears in.
Try:
cat_a = [df1.copy(), df2.copy()]
cat_b = [df3.copy(), df4.copy(), df2.copy()]
cat_c = [df1.copy()]
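To make the shared-reference effect concrete, here is a minimal sketch with two tiny made-up frames (the data and the names `renamed`, `originals_after` and `stacked` are illustrative). It uses `DataFrame.add_suffix`, which returns a renamed copy, as one possible alternative to calling `.copy()` by hand, and then repeats the in-place renaming to show the suffixes piling up:

```python
import pandas as pd

# Hypothetical small frames; df1 deliberately appears in two groups.
df1 = pd.DataFrame({'a': [1], 'b': [2]})
df2 = pd.DataFrame({'c': [3], 'd': [4]})
cat_a = [df1, df2]
cat_c = [df1]
suffix = ['group1', 'group3']

# add_suffix returns a new frame, so a shared frame yields one
# independent result per group and the originals are not modified.
renamed = [[frame.add_suffix(f'_{s}') for frame in frames]
           for frames, s in zip([cat_a, cat_c], suffix)]
originals_after = list(df1.columns)

# Assigning to .columns mutates the shared object, so df1 is renamed
# once for cat_a and again for cat_c.
for frames, s in zip([cat_a, cat_c], suffix):
    for frame in frames:
        frame.columns = [f'{c}_{s}' for c in frame.columns]
stacked = list(df1.columns)

print(list(renamed[1][0].columns))  # copy made for the cat_c group
print(stacked)                      # suffixes accumulated on df1
```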

Combine series by date

The following 2 series of stocks in a single excel file:
Can they be combined using the date as the index?
The result should be like this:
You need a simple df.merge() here:
df = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
OR
df = df1.join(df2, how='outer')
I am trying this:
df3 = pd.concat([df1, df2]).sort_values('Date').reset_index(drop=True)
or, on pandas versions before 2.0 (DataFrame.append was removed in 2.0):
df3 = df1.append(df2).sort_values('Date').reset_index(drop=True)
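The original spreadsheet images are not available, so the following is only a sketch of the outer join on made-up data. The column names `AAA` and `BBB` and the dates are hypothetical, and it assumes each frame has a `Date` column that can be moved into the index:

```python
import pandas as pd

# Hypothetical price series for two stocks with partly overlapping dates.
df1 = pd.DataFrame({'Date': pd.to_datetime(['2020-01-01', '2020-01-02']),
                    'AAA': [1.0, 2.0]}).set_index('Date')
df2 = pd.DataFrame({'Date': pd.to_datetime(['2020-01-02', '2020-01-03']),
                    'BBB': [3.0, 4.0]}).set_index('Date')

# An outer join keeps every date found in either frame; a date missing
# from one side gets NaN in that side's column.
combined = df1.join(df2, how='outer')
print(combined)
```

Unlike the join, `pd.concat([df1, df2])` along the default axis stacks the rows of one frame under the other instead of aligning them by date.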

Changing pandas dataframe by reference

I have two large DataFrames that I don't want to make copies of, but want to apply the same change to. How can I do this properly? For example, this is similar to what I want to do, but on a smaller scale. This only creates the temporary variable df that gives the result of each DataFrame, but I want both DataFrames to be themselves changed:
import pandas as pd
df1 = pd.DataFrame({'a':[1,2,3]})
df2 = pd.DataFrame({'a':[0,1,5,7]})
for df in [df1, df2]:
    df = df[df['a'] < 3]
We can use query with inplace=True:
df1 = pd.DataFrame({'a':[1,2,3]})
df2 = pd.DataFrame({'a':[0,1,5,7]})
for df in [df1, df2]:
    df.query('a<3',inplace=True)
df1
   a
0  1
1  2
df2
   a
0  0
1  1
I don't think this is the best solution, but it should do the job. Note that it replaces the entries of dfs; the names df1 and df2 still refer to the unfiltered DataFrames.
import pandas as pd
df1 = pd.DataFrame({'a':[1,2,3]})
df2 = pd.DataFrame({'a':[0,1,5,7]})
dfs = [df1, df2]
for i, df in enumerate(dfs):
    dfs[i] = df[df['a'] < 3]
dfs[0]
   a
0  1
1  2
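Another way to change the original objects, sketched here as an alternative rather than a recommendation, is to drop the unwanted rows with `DataFrame.drop(..., inplace=True)`. The loop variable then refers to the same object as `df1` or `df2`, and the method mutates that object instead of rebinding the name. Whether pandas avoids an internal copy when doing this is not guaranteed, so it should not be assumed to save memory:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'a': [0, 1, 5, 7]})

for df in [df1, df2]:
    # Labels of the rows that fail the condition a < 3.
    to_drop = df[df['a'] >= 3].index
    # inplace=True modifies the object that df1/df2 also refer to.
    df.drop(to_drop, inplace=True)

print(df1['a'].tolist())
print(df2['a'].tolist())
```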

Renaming columns of dataframe list in Pandas

I have a list with lots of dataframes
col = ['open', 'high', 'low', 'close']
index = [1, 2, 3, 4]
df1 = pd.DataFrame(columns=col, index=index)
df2 = pd.DataFrame(columns=col, index=index)
df3 = pd.DataFrame(columns=col, index=index)
dflist = [df1, df2, df3]
I need to rename all the columns of all the dataframes in the list. I need to add the name of each dataframe to the name of each column. I tried to do it with a for loop.
for key in dflist:
    key.rename(columns=lambda x: key+x)
Obviously, this is not working. The desired output would be:
In [1]: df1.columns.tolist()
Out [2]: ['df1open', 'df1high', 'df1low', 'df1close']
In [3]: df2.columns.tolist()
Out [4]: ['df2open', 'df2high', 'df2low', 'df2close']
In [5]: df3.columns.tolist()
Out [6]: ['df3open', 'df3high', 'df3low', 'df3close']
Thanks for your help.
You want to use a dict instead of a list to store the DataFrames, if you need to somehow access their "names" and manipulate them programmatically (think when you have thousands of them). Also note the use of the inplace argument, which is common in pandas:
import pandas as pd
col = ['open', 'high', 'low', 'close']
index = [1, 2, 3, 4]
df_all = {'df1': pd.DataFrame(columns=col, index=index),
          'df2': pd.DataFrame(columns=col, index=index),
          'df3': pd.DataFrame(columns=col, index=index)}
for key, df in df_all.items():
    df.rename(columns=lambda x: key+x, inplace=True)
print(df_all['df1'].columns.tolist())
Output:
['df1open', 'df1high', 'df1low', 'df1close']
There are a couple of issues here. Firstly, dflist is the list of DataFrames, as opposed to the names of those DataFrames. So df1 is not the same as "df1", which means that key + x isn't a string concatenation.
Secondly, the rename() function returns a new DataFrame. So you have to pass the inplace=True parameter to overwrite the existing column names.
Try this instead:
dflist = ['df1', 'df2', 'df3']
for key in dflist:
    df = eval(key)
    df.rename(columns=lambda x: key+x, inplace=True)
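If looking names up with `eval` is undesirable, the dict approach from the first answer can be combined with `DataFrame.add_prefix`, which returns a renamed copy. A minimal sketch, where the dict names `df_all` and `renamed` are illustrative:

```python
import pandas as pd

col = ['open', 'high', 'low', 'close']
index = [1, 2, 3, 4]

# Keep the frames in a dict so each one carries its name as the key.
df_all = {name: pd.DataFrame(columns=col, index=index)
          for name in ['df1', 'df2', 'df3']}

# add_prefix builds a new frame per entry; df_all itself is left as it was.
renamed = {name: frame.add_prefix(name) for name, frame in df_all.items()}

print(renamed['df1'].columns.tolist())
```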
