I have a 36 rows x 36 columns pivot-table dataframe, which I build using the code below:
df_pivoted = pd.pivot_table(df,index='From',columns='To',values='count')
df_pivoted.fillna(0,inplace=True)
I transpose the same dataframe using this code:
df_trans = df_pivoted.transpose()
and want to subtract the two dataframes with this code:
new_pivoted = df_pivoted - df_trans
It gives me a 72 rows x 72 columns dataframe with NaN in every cell.
Then I tried another approach:
delta = df_pivoted.subtract(df_trans, fill_value=0)
However, it also yields a 72 rows x 72 columns dataframe that looks like this:
Please help me find the difference between the original dataframe and its transpose.
After transposing your DataFrame (pivot table) you have a new DataFrame where the columns become the index and vice versa. When you subtract one DataFrame from another, pandas aligns them on both the index and the column labels, and fills NaN wherever a label is not present in both.
If you need to subtract the values regardless of index and columns, use:
delta = df_pivoted.values - df_trans.values
If you want df_trans to use the same index and columns as df_pivoted:
df_trans = pd.DataFrame(data=df_pivoted.transpose().values,
                        index=df_pivoted.index,
                        columns=df_pivoted.columns)
delta = df_pivoted - df_trans
Now simple subtraction works.
Hope that helps!
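To see the alignment behaviour on something small, here is a minimal sketch. The 2x2 frame and its labels ("a", "b", "x", "y") are made up for illustration and are not the asker's data; the point is only that row labels which differ from column labels produce the union of both, filled with NaN.

```python
import pandas as pd

# Hypothetical 2x2 pivot whose row labels differ from its column labels.
pivoted = pd.DataFrame([[0.0, 3.0], [1.0, 0.0]],
                       index=["a", "b"], columns=["x", "y"])
trans = pivoted.transpose()

# Label-based subtraction: pandas takes the union of the labels on both
# axes, so the result is 4x4, and no (row, column) pair exists in both frames.
aligned = pivoted - trans

# Position-based subtraction: work on the underlying NumPy arrays, then
# put the original labels back.
positional = pd.DataFrame(pivoted.to_numpy() - trans.to_numpy(),
                          index=pivoted.index, columns=pivoted.columns)

print(aligned.shape)
print(positional)
```

The same doubling (36 to 72) is what you would expect if none of the 'From' labels match the 'To' labels exactly, for example because of a dtype or whitespace difference, so that may be worth checking as well.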
I need to compare two dataframes, df1 (blue) and df2 (orange), store only the rows of df2 (orange) that are not in df1 in a separate dataframe, and then add those rows to df1 while assigning function 6 and sector 20 to the employees that were not present in df1 (blue).
I know how to find the differences between the data frames and store that in a third data frame, but I'm stuck trying to figure out how to store only the rows of df2 that are not in df1.
You can try this:
Get a list of the orange IDs you want to keep
Filter df2 with that list
Append
df1 --> blue, df2 --> orange
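import pandas as pd
# Assign the default function and sector to the df2 rows
df2['Function'] = 6
df2['Sector'] = 20
# IDs that appear in df2 but not in df1
ids_df1 = set(df1['ID'])
ids_df2_keep = [e for e in df2['ID'] if e not in ids_df1]
df2 = df2[df2['ID'].isin(ids_df2_keep)]
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df1 = pd.concat([df1, df2])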
This has been answered in pandas get rows which are NOT in other dataframe
Store it as a merge and simply select the rows that do not share common values.
~ negates the expression, select all that are NOT IN instead of IN.
common = df1.merge(df2,on=['ID','Name'])
df = df2[(~df2['ID'].isin(common['ID']))&(~df2['Name'].isin(common['Name']))]
This was tested using some of your data:
df1 = pd.DataFrame({'ID':[125,134,156],'Name':['John','Mary','Bill'],'func':[1,2,2]})
df2 = pd.DataFrame({'ID':[125,139,133],'Name':['John','Joana','Linda']})
Output is:
    ID   Name
1  139  Joana
2  133  Linda
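Another way to express the same "rows of df2 not in df1" filter is a left merge with indicator=True, which tests the ('ID', 'Name') pair as a whole rather than each column separately. This is only a sketch using the test data above; the name only_in_df2 is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [125, 134, 156],
                    'Name': ['John', 'Mary', 'Bill'],
                    'func': [1, 2, 2]})
df2 = pd.DataFrame({'ID': [125, 139, 133],
                    'Name': ['John', 'Joana', 'Linda']})

# Left-merge df2 against the key columns of df1; indicator=True adds a
# '_merge' column that says whether each row matched.
merged = df2.merge(df1[['ID', 'Name']].drop_duplicates(),
                   on=['ID', 'Name'], how='left', indicator=True)

# Keep only the rows that exist in df2 but have no match in df1.
only_in_df2 = merged.loc[merged['_merge'] == 'left_only', ['ID', 'Name']]
print(only_in_df2)
```

The practical difference from the isin version is that two rows sharing an ID but not a Name (or the reverse) are treated as different rows here.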
I have many dataframes with one column (same name in all) whose indexes are date ranges. I want to merge/combine these dataframes into one, summing the values where any dates are common. Below is a simplified example:
import numpy as np
import pandas as pd
range1 = pd.date_range('2021-10-01','2021-11-01')
range2 = pd.date_range('2021-11-01','2021-12-01')
df1 = pd.DataFrame(np.random.rand(len(range1),1), columns=['value'], index=range1)
df2 = pd.DataFrame(np.random.rand(len(range2),1), columns=['value'], index=range2)
Here '2021-11-01' appears in both df1 and df2 with different values.
I would like to obtain a single dataframe of 62 rows (32+31-1) where the 2021-11-01 date contains the sum of its values in df1 and df2.
We can use pd.concat() on the two dataframes, then df.reset_index() to get a new regular-integer index, rename the date column, and then use df.groupby().sum().
df = pd.concat([df1,df2]) # this gives 63 rows by 1 column, where the column is the values and the dates are the index
df = df.reset_index() # moves the dates to a column, now called 'index', and makes a new integer index
df = df.rename(columns={'index':'Date'}) #renames the column
df.groupby('Date').sum()
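A shorter variant of the same idea, as a sketch: grouping on the index level directly avoids the reset_index and rename steps. The constant values 1.0 and 2.0 stand in for the random data here (an assumption for illustration) so that the overlapping date is easy to check.

```python
import numpy as np
import pandas as pd

range1 = pd.date_range('2021-10-01', '2021-11-01')
range2 = pd.date_range('2021-11-01', '2021-12-01')

# Constant values instead of np.random.rand, purely for illustration.
df1 = pd.DataFrame({'value': np.ones(len(range1))}, index=range1)
df2 = pd.DataFrame({'value': np.full(len(range2), 2.0)}, index=range2)

# Stack the frames, then sum rows that share the same index value (date).
combined = pd.concat([df1, df2]).groupby(level=0).sum()

print(len(combined))
print(combined.loc[pd.Timestamp('2021-11-01'), 'value'])
```

Because pd.concat accepts a list of any length, the same line should carry over to many dataframes, not just two.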
I have created a Pandas dataframe using:
df = pd.DataFrame(index=np.arange(140), columns=np.arange(20))
Which gives me an empty dataframe with 140 rows and 20 columns.
I have another dataframe with 120 rows and 20 columns, which I call df2. I would like to add these rows to fill df, but still retain the shape of 140x20.
When I use:
newdf = df.append(df2)
I get a dataframe with 280 rows and 20 columns.
df.iloc[:len(df2), :] = df2.values
will do the job. Since the number of columns is the same, this assignment is safe. The remaining values in df stay NaN. This writes the df2 records at the beginning of df. If you want them at the end instead, you can similarly do df.iloc[-len(df2):, :] = df2.values
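A small self-contained check of the positional assignment, as a sketch. The contents of df2 (all ones) are made up for illustration; only the shapes follow the question.

```python
import numpy as np
import pandas as pd

# Empty 140x20 target frame, as in the question.
df = pd.DataFrame(index=np.arange(140), columns=np.arange(20))

# Hypothetical 120x20 source frame filled with ones.
df2 = pd.DataFrame(np.ones((120, 20)))

# Overwrite the first len(df2) rows by position; the shape of df is unchanged.
df.iloc[:len(df2), :] = df2.values

print(df.shape)
print(df.iloc[len(df2):].isna().all().all())
```

Using .values sidesteps label alignment, so the rows land by position even if df2 has a different index.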
How do you take 2 columns from a dataframe and create a series (1 column as index)?
number a
one 1
two 2
three 3
if the above was a dataframe, how would I convert it to a series with number column being the index?
I tried:
pd.Series(df['a'], index = df.number)
but all the values become nan.
You need set_index, then select column a:
s = df.set_index('number')['a']
For your own solution to work, you need to pass the values as a NumPy array to avoid index alignment:
s = pd.Series(df['a'].values, index = df.number)
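To make the alignment issue concrete, here is a minimal sketch that rebuilds the three-row frame from the question and compares the attempts. The variable name misaligned is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'number': ['one', 'two', 'three'], 'a': [1, 2, 3]})

# Passing a Series plus a new index makes pandas reindex the Series:
# df['a'] is labelled 0, 1, 2, none of which match 'one', 'two', 'three'.
misaligned = pd.Series(df['a'], index=df['number'])

# Option 1: move 'number' into the index first, then pick the column.
s = df.set_index('number')['a']

# Option 2: strip the labels by passing the raw NumPy array.
s_values = pd.Series(df['a'].values, index=df['number'])

print(misaligned.isna().all())
print(s)
```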
After creating a DataFrame with some duplicated cell values in column with the name 'keys':
import pandas as pd
df = pd.DataFrame({'keys': [1,2,2,3,3,3,3],'values':[1,2,3,4,5,6,7]})
I go ahead and create two more DataFrames which are the consolidated versions of the original DataFrame df. Those newly created DataFrames will have no duplicated cell values under the 'keys' column:
df_sum = df.groupby('keys').sum().reset_index()
df_mean = df.groupby('keys').mean().reset_index()
As you can see, the df_sum['values'] cell values were summed together for each key,
while the df_mean['values'] cell values were averaged with the mean() method.
Lastly I rename the 'values' column in both dataframes with:
df_sum.columns = ['keys', 'sums']
df_mean.columns = ['keys', 'means']
Now I would like to copy the df_mean['means'] column into the dataframe df_sum.
How to achieve this?
The Photoshopped image below illustrates the dataframe I would like to create. Both 'sums' and 'means' columns are merged into a single DataFrame:
There are several ways to do this. Using the dataframe's merge method is the most efficient.
df_both = df_sum.merge(df_mean, how='left', on='keys')
df_both
Out[1]:
   keys  sums  means
0     1     1    1.0
1     2     5    2.5
2     3    22    5.5
I think pandas.merge() is the function you are looking for, e.g. pd.merge(df_sum, df_mean, on="keys"). Alternatively, the same result can be computed with a single agg call, as follows:
df.groupby('keys')['values'].agg(['sum', 'mean']).reset_index()
#   keys  sum  mean
#0     1    1   1.0
#1     2    5   2.5
#2     3   22   5.5
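If the goal is to end up with the column names 'sums' and 'means' directly, named aggregation can do the renaming in the same call. A minimal sketch on the question's data; it assumes a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

df = pd.DataFrame({'keys': [1, 2, 2, 3, 3, 3, 3],
                   'values': [1, 2, 3, 4, 5, 6, 7]})

# Named aggregation: each keyword becomes an output column, defined as
# (source column, aggregation function).
df_both = df.groupby('keys', as_index=False).agg(
    sums=('values', 'sum'),
    means=('values', 'mean'),
)
print(df_both)
```

This skips the intermediate df_sum and df_mean frames, so there is no separate merge step to get wrong.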