Combinations of DataFrames from list - python

I have this:
dfs_in_list = [df1, df2, df3, df4, df5]
I want to concatenate all combinations of them one after the other (in a loop), like:
pd.concat([df1, df2], axis=1)
pd.concat([df1, df3], axis=1)
pd.concat([df1, df2, df3], axis=1)
...
pd.concat([df2, df3, df4, df5], axis=1)
Any ideas?

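import itertools
import pandas as pd

dfs_in_list = [df1, df2, df3, df4, df5]
combinations = []
# Lengths 2 .. len - 1; use len(dfs_in_list) + 1 as the stop to also include all five at once.
for length in range(2, len(dfs_in_list)):
    combinations.extend(itertools.combinations(dfs_in_list, length))
for c in combinations:
    result = pd.concat(c, axis=1)  # use or store each combined frame here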
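Here is a self-contained sketch of the same idea that keeps every result, keyed by the names of the frames that went into it. The `frames` dict, its `df1`..`df5` keys, and the `col1`..`col5` column names are made up for illustration, and this version also includes the all-five combination.

```python
import itertools

import pandas as pd

# Hypothetical stand-ins for df1..df5: one column each, named after the frame.
frames = {f"df{i}": pd.DataFrame({f"col{i}": [i, i * 10]}) for i in range(1, 6)}

results = {}
# Lengths 2..5; use range(2, len(frames)) to skip the all-five combination.
for length in range(2, len(frames) + 1):
    # Iterating a dict yields its keys, so each combination is a tuple of names.
    for names in itertools.combinations(frames, length):
        results[names] = pd.concat([frames[n] for n in names], axis=1)

print(len(results))  # 26 = C(5,2) + C(5,3) + C(5,4) + C(5,5)
print(results[("df1", "df3")])
```

Keying by name tuples makes it easy to look up one particular combination afterwards, for example `results[("df2", "df3", "df4", "df5")]`.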

Related

Creating DataFrames from cell ranges to create an output

Here is my code:
import pandas as pd
import os
data_location = ""
os.chdir(data_location)
df1 = pd.read_excel('Calculation - (Vodafone) July 22.xlsx', sheet_name='PPD Summary',
index_col=False)
df2 = df1.iat[3, 5]
df3 = df1.iat[4, 5]
df4 = '9999305'
df5 = df1.iat[3, 1]
df6 = df1.iat[4, 1]
df7 = df1.iat[3, 6]
df8 = df1.iat[4, 6]
print(df4, df5, df2, df7)
print(df4, df6, df3, df8)
Running this script prints the following, which I want to write to a CSV file:
9999305 0.007018639425878576 GB GBP
9999305 0.006709984038878434 IE EUR
The cells that contain the information I need are B5:B6, F5:F6 and G5:G6. I have tried using openpyxl to get the cell ranges, but I am struggling to arrange and output them so that the resulting CSV looks like the above.
Try:
result = pd.DataFrame([[df4, df5, df2, df7],
[df4, df6, df3, df8]])
result.to_csv('filename.csv', header=False, index=False)
'filename.csv' will contain:
9999305,0.007018639425878576,GB,GBP
9999305,0.006709984038878434,IE,EUR
If you just want to print them in comma-separated format:
print(df4, df5, df2, df7, sep=',')
print(df4, df6, df3, df8, sep=',')
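As an alternative to picking cells one by one, the same rows can probably be taken in a single slice. This is only a sketch: the `sheet` frame below is a made-up stand-in for what `pd.read_excel` returns, and the column name `code` is hypothetical. The positions follow the `.iat` calls above (rows 3-4, columns 1, 5 and 6).

```python
import pandas as pd

# Made-up stand-in for the 'PPD Summary' sheet: 5 rows x 7 columns,
# with the values of interest in rows 3-4 (cells B5:B6, F5:F6, G5:G6).
rows = [
    [None] * 7,
    [None] * 7,
    [None] * 7,
    [None, 0.007018639425878576, None, None, None, "GB", "GBP"],
    [None, 0.006709984038878434, None, None, None, "IE", "EUR"],
]
sheet = pd.DataFrame(rows)

# Take both rows and the three columns at once, then prepend the constant.
result = sheet.iloc[3:5, [1, 5, 6]].copy()
result.insert(0, "code", "9999305")

# Without a path, to_csv returns the CSV text instead of writing a file.
csv_text = result.to_csv(header=False, index=False)
print(csv_text)
```

Passing a filename to `to_csv` instead writes the same two lines to disk.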

python pandas loops to melt or pivot multiple df

I have several df with the same structure. I'd like to create a loop to melt them or create a pivot table.
I tried the following, but it is not working:
my_df = [df1, df2, df3]
for df in my_df:
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
for df in my_df:
    df = pd.pivot_table(df, values = 'my_value', index = ['A','B','C'], columns = ['my_column'])
Any help would be great. Thank you in advance
You need to assign the output to a new list of DataFrames, because rebinding the loop variable df does not change the list:
out = []
for df in my_df:
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
    out.append(df)
The same idea as a list comprehension:
out = [pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value') for df in my_df]
If you need to overwrite the original values in the list:
for i, df in enumerate(my_df):
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
    my_df[i] = df
print (my_df)
If you need to overwrite the variables df1, df2, df3:
df1, df2, df3 = [pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value') for df in my_df]
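To see the difference on concrete data, here is a small sketch. The `make_frame` helper and the value columns `m1` and `m2` are invented for illustration. `var_name='my_column'` is added so that the later pivot has a column of that name to spread.

```python
import pandas as pd

# Hypothetical frames sharing the id columns A, B, C plus two value columns.
def make_frame(offset):
    return pd.DataFrame({
        "A": ["x", "y"], "B": [1, 2], "C": ["p", "q"],
        "m1": [offset, offset + 1], "m2": [offset + 2, offset + 3],
    })

my_df = [make_frame(0), make_frame(10)]

# Rebinding the loop variable leaves the frames in the list untouched.
for df in my_df:
    df = pd.melt(df, id_vars=["A", "B", "C"], value_name="my_value")
print(my_df[0].shape)  # still (2, 5)

# Collecting the results in a new list keeps them.
melted = [
    pd.melt(df, id_vars=["A", "B", "C"], var_name="my_column", value_name="my_value")
    for df in my_df
]
print(melted[0].shape)  # (4, 5)

# The same pattern works for pivoting each melted frame back to wide form.
wide = [
    pd.pivot_table(df, values="my_value", index=["A", "B", "C"], columns="my_column")
    for df in melted
]
print(wide[0])
```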

Combine series by date

I have the following two stock series in a single Excel file:
Can they be combined using the date as the index?
The result should look like this:
You need a simple df.merge() here:
df = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
OR
df = df1.join(df2, how='outer')
I am trying this:
df3 = pd.concat([df1, df2]).sort_values('Date').reset_index(drop=True)
or, on pandas versions before 2.0 (DataFrame.append was removed in 2.0):
df3 = df1.append(df2).sort_values('Date').reset_index(drop=True)
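The two approaches above give different shapes, which a small made-up example shows. The tickers `AAA` and `BBB` and the prices are invented. It assumes each frame has a `Date` column that can be turned into the index.

```python
import pandas as pd

# Made-up prices; the only overlapping date is 2020-01-02.
df1 = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-01-01", "2020-01-02"]), "AAA": [10.0, 11.0]}
).set_index("Date")
df2 = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-01-02", "2020-01-03"]), "BBB": [20.0, 21.0]}
).set_index("Date")

# Outer join on the date index: one row per date, one column per series,
# NaN where a series has no value for that date.
combined = df1.join(df2, how="outer")
print(combined)

# Row-wise concat stacks the frames instead: the shared date appears twice.
stacked = (
    pd.concat([df1.reset_index(), df2.reset_index()])
    .sort_values("Date")
    .reset_index(drop=True)
)
print(stacked)
```

If the goal is one row per date with the two series side by side, the join (or the equivalent index-based merge) is the one that matches.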

How to change the columns of multiple dataframes?

I have 8 dataframes I am working with. I want to rename all of the columns of each data frame to the same strings. I have tried:
dfs = [df1, df2, df3, df4, df5, df6, df7, df8, df9]
renames_dfs = []
for df in dfs:
    renames_dfs.append(df.rename(columns={'column1':'column2','column3':'column4'}))
#renames_dfs
Where I would keep going with the column names beyond 4. It also would put the new renamed dataframes in a list, whereas I want them to be new variables.
Do you mean this, to rename those columns:
dfs = [df1, df2, df3, df4, df5, df6, df7, df8, df9]
renames_dfs = []
for df in dfs:
    df.rename(columns={'column1':'column2','column3':'column4'}, inplace=True)
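A short sketch of why the in-place version changes the original variables: the loop mutates the same objects that `df1`, `df2`, ... refer to. The frames and the `mapping`, `originals` and `renamed` names are made up for illustration.

```python
import pandas as pd

# Hypothetical frames that start with the same column names.
df1 = pd.DataFrame({"column1": [1], "column3": [2]})
df2 = pd.DataFrame({"column1": [3], "column3": [4]})

mapping = {"column1": "column2", "column3": "column4"}

# inplace=True mutates each object, so df1 and df2 themselves see the new names.
for df in [df1, df2]:
    df.rename(columns=mapping, inplace=True)
print(list(df1.columns))  # ['column2', 'column4']

# Alternative: leave the originals alone and keep renamed copies in a dict,
# keyed by a name of your choice, instead of creating many new variables.
originals = {"sales": pd.DataFrame({"column1": [5], "column3": [6]})}
renamed = {name: df.rename(columns=mapping) for name, df in originals.items()}
print(list(renamed["sales"].columns))
```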

combine two dataframes with same index (unordered)

I don't know why this is confusing me so much. I am trying to combine two dataframes, and both share the same index (although as a note, they may not be in the same order).
df1 = |firstrow  10|
      |secondrow 15|
df2 = |secondrow 115 |
      |firstrow  1000|
and I want the resulting dataframe to be:
result = |firstrow  10 1000|
         |secondrow 15 115 |
I have tried doing this:
df = pd.merge(df1,df2, on="INDEXNAME"), but it throws a KeyError on INDEXNAME
thanks!
I think you can use concat (by default outer join):
df = pd.concat([df1, df2], axis=1)
And if you need an inner join:
df = pd.concat([df1, df2], axis=1, join='inner')
Or merge (by default inner join) with parameters left_index and right_index:
df = pd.merge(df1, df2, left_index=True, right_index=True)
Sample:
df1 = pd.DataFrame({'a':[10,15]}, index=['firstrow','secondrow'])
df2 = pd.DataFrame({'b':[115,1000]}, index=['secondrow','firstrow'])
print (df1)
            a
firstrow   10
secondrow  15
print (df2)
              b
secondrow   115
firstrow   1000
print (pd.concat([df1, df2], axis=1))
            a     b
secondrow  15   115
firstrow   10  1000
print (pd.merge(df1, df2, left_index=True, right_index=True))
            a     b
secondrow  15   115
firstrow   10  1000
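The KeyError in the question most likely arises because neither frame has a column (or named index level) called INDEXNAME, which is what `on=` looks for. The sketch below names the index and turns it into a column first, then checks by label that this agrees with the index-based calls. The row order of the results is deliberately not assumed, so values are looked up with `.loc`.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [10, 15]}, index=["firstrow", "secondrow"])
df2 = pd.DataFrame({"b": [115, 1000]}, index=["secondrow", "firstrow"])

# Give the index a name so reset_index() produces a column called INDEXNAME.
df1.index.name = "INDEXNAME"
df2.index.name = "INDEXNAME"

# Merge on that column, then move it back into the index.
via_columns = pd.merge(
    df1.reset_index(), df2.reset_index(), on="INDEXNAME"
).set_index("INDEXNAME")

# The index-based variants align rows by label, not by position.
via_index = pd.merge(df1, df2, left_index=True, right_index=True)
aligned = pd.concat([df1, df2], axis=1)

# Look rows up by label rather than relying on their order.
print(via_columns.loc["firstrow"].tolist())
print(via_index.loc["secondrow"].tolist())
print(aligned.loc["firstrow"].tolist())
```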
