pandas - Subtract 2 similar pivot table of data frame

I have a 36 rows x 36 columns pivot-table dataframe, which I build using the code below:
df_pivoted = pd.pivot_table(df,index='From',columns='To',values='count')
df_pivoted.fillna(0,inplace=True)
I transpose the same dataframe using this code:
df_trans = df_pivoted.transpose()
and want to subtract those two dataframes with this code:
new_pivoted = df_pivoted - df_trans
It gives me a 72 rows x 72 columns dataframe with NaN in every cell.
Then I tried another approach:
delta = df_pivoted.subtract(df_trans, fill_value=0)
However, it still yields a 72 rows x 72 columns dataframe.
Please help me find the difference between the original dataframe and the transposed dataframe.

After transposing your DataFrame (the pivot table), you have a new DataFrame in which the columns have become the index and vice versa. When you subtract one DataFrame from another, pandas aligns the values on column and index labels and fills NaN wherever a label exists on only one side.
If you need to subtract the values regardless of index and columns, use:
delta = df_pivoted.values - df_trans.values
If you want df_trans to carry the same columns and index as df_pivoted:
df_trans = pd.DataFrame(data=df_pivoted.transpose().values,
                        index=df_pivoted.index,
                        columns=df_pivoted.columns)
delta = df_pivoted - df_trans
Now simple subtraction works.
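As a minimal sketch of the approach above, the following builds a small square pivot table and subtracts its transpose through NumPy arrays, so no label alignment takes place. The sample data, the `labels` list, and the `reindex` step are assumptions made for illustration only; they force the From and To axes to share one ordered set of labels, which a positional subtraction relies on.

```python
import pandas as pd

# Hypothetical sample data standing in for the original 36 x 36 table.
df = pd.DataFrame({
    "From": ["A", "A", "B", "C"],
    "To": ["B", "C", "A", "A"],
    "count": [5, 2, 3, 7],
})

# Assumption: both axes should cover the same labels in the same order.
labels = ["A", "B", "C"]
df_pivoted = (
    pd.pivot_table(df, index="From", columns="To", values="count")
    .reindex(index=labels, columns=labels)
    .fillna(0)
)

# Subtract position by position, then reattach the original labels.
delta = pd.DataFrame(
    df_pivoted.to_numpy() - df_pivoted.to_numpy().T,
    index=df_pivoted.index,
    columns=df_pivoted.columns,
)
print(delta)
```

Because the result is a matrix minus its own transpose, `delta` is antisymmetric: the value at (A, B) is the negative of the value at (B, A).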
Hope that helps!

Related

How to compare two columns in different pandas dataframes, store the differences in a 3rd dataframe

I need to compare two dataframes, df1 (blue) and df2 (orange), store only the rows of df2 (orange) that are not in df1 in a separate dataframe, and then add those rows to df1 while assigning function 6 and sector 20 to the employees that were not present in df1 (blue).
I know how to find the differences between the data frames and store that in a third data frame, but I'm stuck trying to figure out how to store only the rows of df2 that are not in df1.

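You can try this:
Get a list of the IDs from orange that you want to keep
Filter df2 with that list
Append the filtered rows to df1
df1 --> blue, df2 --> orange
import pandas as pd
# Assign the default function and sector to every row of df2
df2['Function'] = 6
df2['Sector'] = 20
# IDs that appear in df2 but not in df1
ids_df2_keep = [e for e in df2['ID'] if e not in list(df1['ID'])]
df2 = df2[df2['ID'].isin(ids_df2_keep)]
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df1 = pd.concat([df1, df2])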

This has been answered in "pandas get rows which are NOT in other dataframe".
Store the common rows as a merge, then select the rows of df2 that do not share those values.
The ~ operator negates the expression, so it selects everything that is NOT IN instead of IN.
common = df1.merge(df2,on=['ID','Name'])
df = df2[(~df2['ID'].isin(common['ID']))&(~df2['Name'].isin(common['Name']))]
This was tested using some of your data:
df1 = pd.DataFrame({'ID':[125,134,156],'Name':['John','Mary','Bill'],'func':[1,2,2]})
df2 = pd.DataFrame({'ID':[125,139,133],'Name':['John','Joana','Linda']})
Output is:
    ID   Name
1  139  Joana
2  133  Linda
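As an alternative sketch, the same "rows of df2 that are not in df1" selection can be written as a left merge with `indicator=True`, which adds a `_merge` column telling where each row came from. The sample frames are the ones from the answer above; the names `merged` and `only_in_df2` are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [125, 134, 156],
                    "Name": ["John", "Mary", "Bill"],
                    "func": [1, 2, 2]})
df2 = pd.DataFrame({"ID": [125, 139, 133],
                    "Name": ["John", "Joana", "Linda"]})

# Left merge on the key columns; rows found only in df2 are marked "left_only".
merged = df2.merge(
    df1[["ID", "Name"]].drop_duplicates(),
    on=["ID", "Name"],
    how="left",
    indicator=True,
)
only_in_df2 = merged.loc[merged["_merge"] == "left_only", ["ID", "Name"]]
print(only_in_df2)
```

Unlike two separate `isin` checks, this compares the (ID, Name) pair as a whole, so a row is treated as common only when both values match the same row in df1.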

combining dataframes and adding values on common date index

I have many dataframes with one column (same name in all) whose indexes are date ranges. I want to merge/combine these dataframes into one, summing the values where any dates are common. Below is a simplified example:
import numpy as np
import pandas as pd
range1 = pd.date_range('2021-10-01','2021-11-01')
range2 = pd.date_range('2021-11-01','2021-12-01')
df1 = pd.DataFrame(np.random.rand(len(range1),1), columns=['value'], index=range1)
df2 = pd.DataFrame(np.random.rand(len(range2),1), columns=['value'], index=range2)
Here '2021-11-01' appears in both df1 and df2 with different values.
I would like to obtain a single dataframe of 62 rows (32+31-1) where the 2021-11-01 date contains the sum of its values in df1 and df2.

We can use pd.concat() on the two dataframes, then df.reset_index() to get a new regular integer index, rename the date column, and then use df.groupby().sum().
df = pd.concat([df1,df2]) # this gives 63 rows by 1 column, where the column is the values and the dates are the index
df = df.reset_index() # moves the dates to a column, now called 'index', and makes a new integer index
df = df.rename(columns={'index':'Date'}) #renames the column
df.groupby('Date').sum()
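A shorter variant of the same idea, offered as a sketch: grouping directly on the index with `groupby(level=0)` avoids the reset and rename steps. The constant values used here (1.0 and 2.0) are an assumption that replaces the random data so the sum on the shared date is easy to check.

```python
import numpy as np
import pandas as pd

range1 = pd.date_range("2021-10-01", "2021-11-01")
range2 = pd.date_range("2021-11-01", "2021-12-01")

# Constant values instead of random ones, so the overlap is easy to verify.
df1 = pd.DataFrame({"value": np.ones(len(range1))}, index=range1)
df2 = pd.DataFrame({"value": np.full(len(range2), 2.0)}, index=range2)

# Stack the frames, then sum rows that share the same date in the index.
combined = pd.concat([df1, df2]).groupby(level=0).sum()

print(len(combined))
print(combined.loc[pd.Timestamp("2021-11-01"), "value"])
```

The dates stay in the index of `combined`, which is usually convenient for later time-based selection.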

Fill Pandas Dataframe with existing dataframe but retain shape

I have created a Pandas dataframe using:
df = pd.DataFrame(index=np.arange(140), columns=np.arange(20))
Which gives me an empty dataframe with 140 rows and 20 columns.
I have another dataframe with 120 columns and 20 rows, I call it df2. I would like to add these rows to fill df, but still retain the shape of 140x20.
When I use:
newdf = df.append(df2)
I get a dataframe with 280 rows and 20 columns.

df.iloc[:len(df2), :] = df2.values
will do the job. Since the number of columns is the same, this assignment is safe. The remaining rows of df stay NaN. This writes the df2 records at the beginning of df. If you want them at the end instead, you can similarly use df.iloc[-len(df2):, :] = df2.values
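The assignment above can be tried out with the following sketch. It assumes df2 has 120 rows and 20 columns (the answer relies on the column counts matching) and fills it with placeholder integers.

```python
import numpy as np
import pandas as pd

# Empty 140 x 20 frame, as in the question (every cell is NaN).
df = pd.DataFrame(index=np.arange(140), columns=np.arange(20))

# Assumed shape for df2: 120 rows x 20 columns of placeholder values.
df2 = pd.DataFrame(np.arange(120 * 20).reshape(120, 20))

# Copy df2 into the first 120 rows by position; the shape of df is unchanged.
df.iloc[:len(df2), :] = df2.to_numpy()

print(df.shape)
```

The last 20 rows are left untouched and remain NaN, so the frame keeps its 140 x 20 shape.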

Convert 2 dataframe columns into 1 series

How do you take 2 columns from a dataframe and create a series (1 column as index)?
number  a
one     1
two     2
three   3
If the above were a dataframe, how would I convert it to a series with the number column as the index?
I tried:
pd.Series(df['a'], index = df.number)
but all the values become NaN.

You need set_index, and then select column a:
s = df.set_index('number')['a']
For your own attempt to work, pass .values so that a NumPy array is used and index alignment is avoided:
s = pd.Series(df['a'].values, index = df.number)
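A small sketch, using the three-row frame from the question, shows why the original attempt produced NaN and how both fixes behave. The variable names `aligned` and `s_values` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"number": ["one", "two", "three"], "a": [1, 2, 3]})

# Passing a Series plus a new index makes pandas reindex it: the labels
# 0, 1, 2 are looked up under "one", "two", "three", and none of them match.
aligned = pd.Series(df["a"], index=df["number"])

# Fix 1: move the column into the index, then select the value column.
s = df.set_index("number")["a"]

# Fix 2: pass a plain NumPy array, which carries no labels to align on.
s_values = pd.Series(df["a"].to_numpy(), index=df["number"])

print(aligned.isna().all())
print(s["two"])
```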

How to copy one DataFrame column in to another Dataframe if their indexes values are the same

After creating a DataFrame with some duplicated cell values in the column named 'keys':
import pandas as pd
df = pd.DataFrame({'keys': [1,2,2,3,3,3,3],'values':[1,2,3,4,5,6,7]})
I go ahead and create two more DataFrames, which are consolidated versions of the original DataFrame df. The newly created DataFrames have no duplicated cell values in the 'keys' column:
df_sum = df.groupby('keys').sum().reset_index()
df_mean = df.groupby('keys').mean().reset_index()
As you can see, the df_sum['values'] cell values were all summed together,
while the df_mean['values'] cell values were averaged with the mean() method.
Lastly, I rename the 'values' column in both dataframes with:
df_sum.columns = ['keys', 'sums']
df_mean.columns = ['keys', 'means']
Now I would like to copy the df_mean['means'] column into the dataframe df_sum.
How can I achieve this?
The result I would like has both the 'sums' and 'means' columns merged into a single DataFrame.

There are several ways to do this. Using the DataFrame's own merge method is one of the most direct:
df_both = df_sum.merge(df_mean, how='left', on='keys')
df_both
Out[1]:
   keys  sums  means
0     1     1    1.0
1     2     5    2.5
2     3    22    5.5

I think pandas.merge() is the function you are looking for, for example pd.merge(df_sum, df_mean, on = "keys"). Alternatively, the same result can be computed in one step with a single agg call, as follows:
df.groupby('keys')['values'].agg(['sum', 'mean']).reset_index()
#    keys  sum  mean
# 0     1    1   1.0
# 1     2    5   2.5
# 2     3   22   5.5
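For comparison, here is a runnable sketch that puts the merge approach next to a named-aggregation variant, which yields the 'sums' and 'means' column names directly. The name `df_named` is introduced here for illustration, and the named form of `agg` assumes pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({"keys": [1, 2, 2, 3, 3, 3, 3],
                   "values": [1, 2, 3, 4, 5, 6, 7]})

# Approach 1: build the two summaries separately, then merge on the key.
df_sum = df.groupby("keys").sum().reset_index()
df_sum.columns = ["keys", "sums"]
df_mean = df.groupby("keys").mean().reset_index()
df_mean.columns = ["keys", "means"]
df_both = df_sum.merge(df_mean, how="left", on="keys")

# Approach 2: one groupby with named aggregation, no merge needed.
df_named = df.groupby("keys", as_index=False).agg(
    sums=("values", "sum"),
    means=("values", "mean"),
)

print(df_both)
print(df_named)
```

Both frames end up with the columns keys, sums and means; the second form skips the intermediate dataframes when they are not needed for anything else.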
