Python join dataframes by index - python

I'm working with multiple dataframes in Python and want to map one onto the other based on a common column (similar to INDEX/MATCH in Excel). I want to join the date column of one dataframe to the index of the other dataframe (where the date is stored as the index). How do I refer to the index in the join? For reference, I want to subtract the ROI in dataframe 1 (S&P 500) from the ROI in dataframe 2 (awk_price). The dataframes are shown below.
I currently have a merged dataframe using
pd.merge(awk_price,sp_500, left_index=True, right_on='Date')
I would love to just add a column to df2 that holds the ROI from dataframe 2 minus the ROI from dataframe 1, but I can't figure out how to "map" the date column from dataframe 1 to the index of dataframe 2.
Dataframe 2 (awk_price)
Dataframe 1 (sp_500)

You can use reset_index(), and then rename the column:
df=df1.reset_index().rename(columns={"index": "Date"})
df
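As a rough sketch of the full task in the question, the merge from the question can be followed by a plain column subtraction. The column name "ROI", the sample dates, and the sample values are assumptions made for illustration, since the original dataframes were only shown as images.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the question.
# awk_price keeps its dates in the index; sp_500 keeps them in a "Date" column.
dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
awk_price = pd.DataFrame({"ROI": [2.0, -1.0, 3.0]}, index=dates)
sp_500 = pd.DataFrame({"Date": dates, "ROI": [1.0, 0.0, 2.0]})

# Join the index of the left frame to the "Date" column of the right frame.
# Both frames have an "ROI" column, so suffixes keep them apart.
merged = pd.merge(
    awk_price,
    sp_500,
    left_index=True,
    right_on="Date",
    suffixes=("_awk", "_sp500"),
)

# ROI of awk_price minus ROI of the S&P 500, row by row.
merged["ROI_diff"] = merged["ROI_awk"] - merged["ROI_sp500"]
print(merged)
```

An alternative under the same assumptions is to call sp_500.set_index("Date") first, so that both frames share a date index and a direct subtraction of the two ROI columns aligns on it.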

Related

Rows into columns in a multi-index Pandas DataFrame

I have a DataFrame with a multi-index, one of dates and the other numbered from 0 to 1267 as the image shows.
How do I have the index 0-1267 as columns instead of rows and have the dates as the only row index?
Select a column and then use Series.unstack on the first level:
df1 = df['CUMULATIVE FRACTION'].unstack(0)
Or, if you need a MultiIndex in the columns, use DataFrame.unstack:
df2 = df.unstack(0)
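A small sketch of both calls follows. It assumes the numbered level is level 0 of the MultiIndex and the date level is level 1; the question's image is not available, so the level order, the level names "n" and "date", and the values are all hypothetical. If the levels are in the other order, unstack(1) would be the matching call.

```python
import pandas as pd

# Hypothetical MultiIndex: numbered level first, date level second.
dates = pd.to_datetime(["2020-01-01", "2020-01-02"])
idx = pd.MultiIndex.from_product([range(3), dates], names=["n", "date"])
df = pd.DataFrame({"CUMULATIVE FRACTION": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}, index=idx)

# Series.unstack(0) moves level 0 (the numbers) into the columns,
# leaving the dates as the only row index.
df1 = df["CUMULATIVE FRACTION"].unstack(0)

# DataFrame.unstack(0) does the same for every column and produces
# a two-level column index: (original column name, number).
df2 = df.unstack(0)

print(df1)
print(df2.columns.nlevels)
```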

Copying dataframes columns into another dataframe

I have two dataframes, df1 and df2, where df1 has 9 columns and df2 has 8 columns. I want to replace the first 8 columns of df1 with those of df2. How can this be done? I tried with iloc but did not succeed.
Following are the files:
https://www.filehosting.org/file/details/842516/tpkA0t2vAtkrqKTb/df1.csv for df1
https://www.filehosting.org/file/details/842517/8XpizwCAX79p9rrZ/df2.csv for df2
import pandas as pd
df1=pd.DataFrame({0:[1,1,1,0,0,0],1:[0,1,0,0,0,0],2:[1,1,1,0,0,0],3:[0,0,0,2,3,4],4:[0,0,0,0,1,0],5:[0,0,0,2,1,2]})
df2=pd.DataFrame({6:[2,2,2,0,0,0],7:[0,2,0,0,0,0],8:[2,2,2,0,0,0],'d':[0,0,0,2,3,4],'e':[0,0,0,0,1,0],'f':[0,0,0,2,1,2]})
z=pd.concat([df1.iloc[:,3:],df2.iloc[:,0:3]],axis=1)
Here I have concatenated the fourth through last columns of the first dataframe (iloc[:,3:]) with the first three columns of the second dataframe (iloc[:,0:3]). In the same way, you can concatenate whichever rows or columns you need.
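Applied to the shapes in the question (9 columns and 8 columns), the same idea might look like the sketch below. The frames are small made-up stand-ins, not the linked CSV files, and the column names a0..a8 and b0..b7 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with the shapes from the question: 9 and 8 columns.
df1 = pd.DataFrame(np.arange(27).reshape(3, 9), columns=[f"a{i}" for i in range(9)])
df2 = pd.DataFrame(np.arange(100, 124).reshape(3, 8), columns=[f"b{i}" for i in range(8)])

# Option 1: build a new frame from all of df2 plus the 9th column of df1.
# The result carries df2's column names for the first 8 columns.
result = pd.concat([df2, df1.iloc[:, 8:]], axis=1)

# Option 2: overwrite the values in a copy of df1 and keep df1's column names.
# to_numpy() drops df2's labels so the assignment is purely positional.
in_place = df1.copy()
in_place.iloc[:, :8] = df2.to_numpy()

print(result)
print(in_place)
```

Option 1 relies on both frames sharing the same row index; option 2 ignores labels entirely and only requires the same number of rows.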

pandas - Subtract 2 similar pivot table of data frame

I have a 36 rows x 36 columns pivot table, which I build using the code below:
df_pivoted = pd.pivot_table(df,index='From',columns='To',values='count')
df_pivoted.fillna(0,inplace=True)
I transpose the same dataframe using this code:
df_trans = df_pivoted.transpose()
and want to subtract the two dataframes with this code:
new_pivoted = df_pivoted - df_trans
It gives me a 72 rows x 72 columns dataframe with NaN in every cell.
Then I tried other code:
delta = df_pivoted.subtract(df_trans, fill_value=0)
However, it yields a 72 rows x 72 columns dataframe that looks like this:
Please help me find the difference between the original dataframe and the transposed dataframe.
After transposing your DataFrame (pivot table), you have a new DataFrame in which the columns have become the index and vice versa. When you subtract one DataFrame from another, pandas aligns them on their index and column labels and fills NaN wherever the labels do not match.
If you need to subtract the values regardless of index and columns, use:
delta = df_pivoted.values - df_trans.values
If you want the transposed values to carry the index and columns of df_pivoted:
df_trans = pd.DataFrame(data=df_pivoted.transpose().values,
                        index=df_pivoted.index,
                        columns=df_pivoted.columns)
delta = df_pivoted - df_trans
Now simple subtraction works.
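The label alignment described above can be reproduced on a tiny frame. This is only a sketch: the 2 x 2 pivot and its labels are made up, and it assumes the 'From' and 'To' labels do not coincide, which is one way to end up with a doubled shape full of NaN.

```python
import pandas as pd

# Hypothetical 2 x 2 pivot whose row labels and column labels differ.
df_pivoted = pd.DataFrame(
    [[0, 5], [2, 0]],
    index=pd.Index(["a", "b"], name="From"),
    columns=pd.Index(["x", "y"], name="To"),
)
df_trans = df_pivoted.transpose()

# Label-based subtraction: pandas takes the union of the labels on both axes,
# so no cell exists in both frames and every result is NaN.
aligned = df_pivoted - df_trans

# Position-based subtraction on the underlying arrays ignores the labels.
delta = df_pivoted.to_numpy() - df_trans.to_numpy()

# Wrap the array again if the original labels are wanted on the result.
delta_df = pd.DataFrame(delta, index=df_pivoted.index, columns=df_pivoted.columns)

print(aligned.shape)
print(delta_df)
```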
Hope that helps!

DataFrame merge on column gives NaN

I have two DataFrames. The first, df:
indegree  interrupts  Subject
1         2           Weather
2         3           Weather
4         5           Weather
The second, join:
Subject  interrupts_mean  indegree_mean
weather  2                3
The second is a lot shorter, since it holds the means for each of the different subjects in the first dataframe.
When I merge both DataFrames with
pd.merge(df,join,left_index=True,right_index=True,how='left')
it merges, but the columns from the second dataframe are NaN in the new dataframe. I suppose this is because the DataFrames are not the same length. How can I merge on Subject so that the values from the second DataFrame are repeated for every matching row in the new DataFrame?
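One possible approach, sketched here under assumptions, is to merge on the Subject column instead of on the two indexes. The frames below copy the values shown in the question, but the sketch assumes the subject is spelled identically in both frames; a 'Weather' versus 'weather' mismatch would itself produce NaN and would need normalizing first.

```python
import pandas as pd

# Stand-ins for the two frames in the question (values copied from it).
# Assumption: "Weather" is capitalized the same way in both frames.
df = pd.DataFrame({
    "indegree": [1, 2, 4],
    "interrupts": [2, 3, 5],
    "Subject": ["Weather", "Weather", "Weather"],
})
join = pd.DataFrame({
    "Subject": ["Weather"],
    "interrupts_mean": [2],
    "indegree_mean": [3],
})

# A left merge keyed on "Subject" repeats the single row of `join`
# for every row of `df` that has the same subject.
merged = pd.merge(df, join, on="Subject", how="left")
print(merged)
```

Merging with left_index=True and right_index=True compares row positions 0, 1, 2 with the single index 0 of the second frame, which is why only the first row would find a match.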

How to copy one DataFrame column in to another Dataframe if their indexes values are the same

After creating a DataFrame with some duplicated values in the column named 'keys':
import pandas as pd
df = pd.DataFrame({'keys': [1,2,2,3,3,3,3],'values':[1,2,3,4,5,6,7]})
I go ahead and create two more DataFrames which are the consolidated versions of the original DataFrame df. Those newly created DataFrames will have no duplicated cell values under the 'keys' column:
df_sum = df.groupby('keys').sum().reset_index()
df_mean = df.groupby('keys').mean().reset_index()
As you can see, the df_sum['values'] cell values were all summed together,
while the df_mean['values'] cell values were averaged with the mean() method.
Lastly I rename the 'values' column in both dataframes with:
df_sum.columns = ['keys', 'sums']
df_mean.columns = ['keys', 'means']
Now I would like to copy the df_mean['means'] column into the dataframe df_sum.
How to achieve this?
The Photoshopped image below illustrates the dataframe I would like to create. Both the 'sums' and 'means' columns are merged into a single DataFrame:
There are several ways to do this. Using the merge method of the dataframe is the most efficient.
df_both = df_sum.merge(df_mean, how='left', on='keys')
df_both
Out[1]:
   keys  sums  means
0     1     1    1.0
1     2     5    2.5
2     3    22    5.5
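As one of the other ways alluded to above, a key-based lookup with Series.map can copy the single column without a full merge. This is a sketch that rebuilds df_sum and df_mean from the question's data; it is an alternative technique, not the method used in the answer.

```python
import pandas as pd

# Rebuild the frames from the question.
df = pd.DataFrame({'keys': [1, 2, 2, 3, 3, 3, 3], 'values': [1, 2, 3, 4, 5, 6, 7]})
df_sum = df.groupby('keys').sum().reset_index()
df_mean = df.groupby('keys').mean().reset_index()
df_sum.columns = ['keys', 'sums']
df_mean.columns = ['keys', 'means']

# Turn df_mean into a lookup Series indexed by 'keys',
# then look up each key of df_sum in it.
means_by_key = df_mean.set_index('keys')['means']
df_sum['means'] = df_sum['keys'].map(means_by_key)
print(df_sum)
```

Because the lookup goes through the 'keys' values, it does not depend on the two frames having their rows in the same order.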
I think pandas.merge() is the function you are looking for, for example pd.merge(df_sum, df_mean, on="keys"). Besides, this result can also be produced with a single agg call, as follows:
df.groupby('keys')['values'].agg(['sum', 'mean']).reset_index()
#    keys  sum  mean
# 0     1    1   1.0
# 1     2    5   2.5
# 2     3   22   5.5
