I have a unique requirement, where I need the mean of the common columns (per row) from two dataframes.
I cannot think of a pythonic way of doing this. I know I can loop through the two data frames, find the common columns, and then take the mean of the rows where the key matches.
Assuming I have the Data Frames below:
DF1:
Key A B C D E
K1 2 3 4 5 8
K2 2 3 4 5 8
K3 2 3 4 5 8
K4 2 3 4 5 8
DF2:
Key A B C D
K1 4 7 4 7
K2 4 7 4 7
K3 4 7 4 7
K4 4 7 4 7
The result DF should contain the mean values of the two DFs, column by column, for each row where the Key matches.
ResultDF:
Key A B C D
K1 3 5 4 6
K2 3 5 4 6
K3 3 5 4 6
K4 3 5 4 6
I know I should put sample code here, but I cannot think of any logic for achieving this so far.
Use DataFrame.add with Key as the index:
df1.set_index('Key').add(df2.set_index('Key')).dropna(axis=1) / 2
A B C D
Key
K1 3 5 4 6
K2 3 5 4 6
K3 3 5 4 6
K4 3 5 4 6
Alternative with concat + groupby.
pd.concat([df1, df2], axis=0).dropna(axis=1).groupby('Key').mean()
A B C D
Key
K1 3 5 4 6
K2 3 5 4 6
K3 3 5 4 6
K4 3 5 4 6
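Both one-liners can be checked end-to-end on the sample frames from the question; a minimal runnable sketch:

```python
import pandas as pd

# sample frames from the question
df1 = pd.DataFrame({'Key': ['K1', 'K2', 'K3', 'K4'],
                    'A': [2]*4, 'B': [3]*4, 'C': [4]*4, 'D': [5]*4, 'E': [8]*4})
df2 = pd.DataFrame({'Key': ['K1', 'K2', 'K3', 'K4'],
                    'A': [4]*4, 'B': [7]*4, 'C': [4]*4, 'D': [7]*4})

# add() aligns on both index and columns; E exists only in df1,
# so it becomes NaN and dropna(axis=1) removes it
res_add = df1.set_index('Key').add(df2.set_index('Key')).dropna(axis=1) / 2

# concat/groupby route: stack the rows, drop columns with NaN, average per Key
res_gb = pd.concat([df1, df2], axis=0).dropna(axis=1).groupby('Key').mean()
```

The two results are identical; the add() version keeps only columns present in both frames because the column-wise alignment turns unmatched columns into NaN.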
Try adding the two data frames together, then use the pandas apply function with a lambda that divides x by two:
import pandas as pd
df1 = pd.DataFrame({'A': [2,2]})
df2 = pd.DataFrame({'A': [4,4]})
print((df1+df2).apply(lambda x: x/2))
Output:
A
0 3.0
1 3.0
Note: this is just a dummy data frame.
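As a side note, the apply/lambda step is optional here: arithmetic operators broadcast element-wise over a whole frame, so a plain division gives the same result.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2]})
df2 = pd.DataFrame({'A': [4, 4]})

# arithmetic operators broadcast over the frame,
# so apply(lambda x: x/2) can be replaced by a plain division
out = (df1 + df2) / 2
```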
Related
I have a dataframe:
A B C v1 v2 v3
q 2 3 4 9 1
8 f 2 7 2 4
I want to calculate a new column that will hold the RMS (sqrt(sum(x^2))) of all the v columns.
So the new df will be:
A B C v1 v2 v3 v_rms
q 2 3 4 9 1 9.9
8 f 2 7 2 4 8.3
since sqrt(4^2 + 9^2 + 1^2) = 9.9, sqrt(7^2 + 2^2 + 4^2) = 8.3
What is the best way to do so?
Use DataFrame.filter to select the v columns, then square them with DataFrame.pow, sum across rows, and take the square root with pow(1/2):
df['v_rms'] = df.filter(like='v').pow(2).sum(axis=1).pow(1/2)
print (df)
A B C v1 v2 v3 v_rms
0 q 2 3 4 9 1 9.899495
1 8 f 2 7 2 4 8.306624
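A self-contained version of the above, with an equivalent numpy formulation (np.sqrt in place of pow(1/2)):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['q', 8], 'B': [2, 'f'], 'C': [3, 2],
                   'v1': [4, 7], 'v2': [9, 2], 'v3': [1, 4]})

# filter(like='v') keeps every column whose name contains 'v'
df['v_rms'] = df.filter(like='v').pow(2).sum(axis=1).pow(0.5)

# equivalent: df['v_rms'] = np.sqrt(df.filter(like='v').pow(2).sum(axis=1))
```

Note that filter(like='v') matches on a substring; if other column names could also contain "v", the stricter filter(regex=r'^v\d+$') would be safer.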
I have a data frame where I need to convert every column into rows listing its unique values.
A B C
1 2 2
1 2 3
5 2 9
Desired output
X1 V1
A 1
A 5
B 2
C 2
C 3
C 9
I can get the unique values with the unique() function, but I don't know how to get the desired output in pandas.
You can use melt and drop_duplicates:
df.melt(var_name='X1', value_name='V1').drop_duplicates()
Output:
X1 V1
0 A 1
2 A 5
3 B 2
6 C 2
7 C 3
8 C 9
P.S. You can add .reset_index(drop=True) if you want a sequential integer index.
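Putting it together as a runnable sketch with the sample data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 5], 'B': [2, 2, 2], 'C': [2, 3, 9]})

# melt stacks every column into (variable, value) pairs, and
# drop_duplicates keeps one row per distinct pair within each column
out = (df.melt(var_name='X1', value_name='V1')
         .drop_duplicates()
         .reset_index(drop=True))
```

Because drop_duplicates looks at the (X1, V1) pair, a value like 2 that appears under several columns is kept once per column, which matches the desired output.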
I have 3 DataFrames, all with over 100 rows and 1000 columns. I am trying to combine them into one DataFrame in such a way that common columns from each DataFrame are summed up. I understand there is a summation method, pd.DataFrame.sum(), but with over 1000 columns I cannot add each common column manually. I am attaching sample DataFrames and the result I want. Help will be appreciated.
#Sample DataFrames.
df_1 = pd.DataFrame({'a':[1,2,3],'b':[2,1,0],'c':[1,3,5]})
df_2 = pd.DataFrame({'a':[1,1,0],'b':[2,1,4],'c':[1,0,2],'d':[2,2,2]})
df_3 = pd.DataFrame({'a':[1,2,3],'c':[1,3,5], 'x':[2,3,4]})
#Result.
df_total = pd.DataFrame({'a':[3,5,6],'b':[4,2,4],'c':[3,6,12],'d':[2,2,2], 'x':[2,3,4]})
df_total
a b c d x
0 3 4 3 2 2
1 5 2 6 2 3
2 6 4 12 2 4
Let us do pd.concat, then sum over the duplicated column labels:
out = pd.concat([df_1,df_2,df_3],axis=1).sum(level=0,axis=1)
(In pandas >= 2.0, where sum(level=...) was removed, the equivalent is pd.concat([df_1,df_2,df_3],axis=1).T.groupby(level=0).sum().T.)
Out[7]:
a b c d x
0 3 4 3 2 2
1 5 2 6 2 3
2 6 4 12 2 4
You can add with fill_value=0:
df_1.add(df_2, fill_value=0).add(df_3, fill_value=0).astype(int)
Output:
a b c d x
0 3 4 3 2 2
1 5 2 6 2 3
2 6 4 12 2 4
Note: pandas intrinsically aligns most operations on labels (both the row index and the column names).
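With more than a handful of frames, the chained add calls can be folded with functools.reduce; a sketch on the sample frames:

```python
from functools import reduce

import pandas as pd

df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 1, 0], 'c': [1, 3, 5]})
df_2 = pd.DataFrame({'a': [1, 1, 0], 'b': [2, 1, 4], 'c': [1, 0, 2], 'd': [2, 2, 2]})
df_3 = pd.DataFrame({'a': [1, 2, 3], 'c': [1, 3, 5], 'x': [2, 3, 4]})

# fold the list pairwise with add(fill_value=0) so a column missing
# from one frame counts as zero instead of producing NaN
df_total = reduce(lambda acc, df: acc.add(df, fill_value=0),
                  [df_1, df_2, df_3]).astype(int)
```

This scales to any number of frames without writing one .add() call per frame.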
I saw a more primitive version of this question here, but my dataframe has different group names and I want to calculate the difference separately for each of them.
A B C
0 a 3 5
1 a 6 9
2 b 3 8
3 b 11 19
I want to group by A and then find the difference between the alternate B and C values, something like this:
A B C dA
0 a 3 5 6
1 a 6 9 NaN
2 b 3 8 16
3 b 11 19 NaN
I tried:
df['dA']=df.groupby('A')(['C']-['B'])
df['dA']=df.groupby('A')['C']-df.groupby('A')['B']
Neither of them helped.
What mistake am I making?
IIUC, here is one way to perform the calculation:
# create the data frame
from io import StringIO
import pandas as pd
data = '''idx A B C
0 a 3 5
1 a 6 9
2 b 3 8
3 b 11 19
'''
df = pd.read_csv(StringIO(data), sep='\s+', engine='python').set_index('idx')
Now, compute dA. I take the last value of C less the first value of B, as grouped by A. (Is this right? Or is it max(C) less min(B)?) If you're guaranteed to have the A values in pairs, then @BenT's shift() would be more concise.
dA = (
(df.groupby('A')['C'].transform('last') -
df.groupby('A')['B'].transform('first'))
.drop_duplicates()
.rename('dA'))
print(pd.concat([df, dA], axis=1))
A B C dA
idx
0 a 3 5 6.0
1 a 6 9 NaN
2 b 3 8 16.0
3 b 11 19 NaN
I used groupby().transform() to preserve index values, to support the concat operation.
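The pair-wise shift() alternative mentioned above might look like this (a sketch, assuming exactly two rows per A group):

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': [3, 6, 3, 11],
                   'C': [5, 9, 8, 19]})

# shift C up by one within each group: the first row of a pair then
# sees the last C, from which we subtract the first B; the second
# row of each pair is left as NaN, matching the desired output
df['dA'] = df.groupby('A')['C'].shift(-1) - df['B']
```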
This question already has answers here:
pandas replace multiple values one column
(7 answers)
Closed 3 years ago.
df = pd.DataFrame({'Tissues':['a1','x2','y3','b','c1','v2','w3'], 'M':[1,2,'a',4,'b','a',7]})
df.set_index('Tissues')
The dataframe looks like:
M
Tissues
a1 1
x2 2
y3 a
b 4
c1 b
v2 a
w3 7
How can I replace all 'a's in column M with a specific value, say 2, and all 'b's with 3?
I tried:
replace_values = {'a':2, 'b':3}
df['M'] = df['M'].map(replace_values)
but that changed values not present in the keys of replace_values to NaN:
Tissues M
0 a1 NaN
1 x2 NaN
2 y3 2.0
3 b NaN
4 c1 3.0
5 v2 2.0
6 w3 NaN
I see that I can do
df.loc[(df['M'] == 'a')] = 2
but can I do this efficiently for a, b and so on, instead of one by one?
Use df.replace:
df = pd.DataFrame({'Tissues':['a1','x2','y3','b','c1','v2','w3'], 'M':[1,2,'a',4,'b','a',7]})
df.set_index('Tissues')
replace_values = {'a':2, 'b':3}
df['M'] = df['M'].replace(replace_values)
Output:
>>> df
Tissues M
0 a1 1
1 x2 2
2 y3 2
3 b 4
4 c1 3
5 v2 2
6 w3 7
Fix your code by adding fillna:
df['M'] = df['M'].map(replace_values).fillna(df.M)
df
Tissues M
0 a1 1.0
1 x2 2.0
2 y3 2.0
3 b 4.0
4 c1 3.0
5 v2 2.0
6 w3 7.0
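The dtype difference between the two answers is worth noting: replace only touches matching values and leaves everything else alone, while map rebuilds the whole column (keys missing from the dict become NaN) and the fillna step upcasts it to float. A small sketch contrasting them:

```python
import pandas as pd

df = pd.DataFrame({'Tissues': ['a1', 'x2', 'y3', 'b', 'c1', 'v2', 'w3'],
                   'M': [1, 2, 'a', 4, 'b', 'a', 7]})
replace_values = {'a': 2, 'b': 3}

# replace: only 'a' and 'b' are touched, the rest pass through unchanged
via_replace = df['M'].replace(replace_values)

# map: every value is looked up, missing keys become NaN,
# and fillna restores them from the original column
via_map = df['M'].map(replace_values).fillna(df['M'])
```

Both give the same values; only the resulting dtype differs, which is why the map/fillna output above shows 1.0, 2.0, and so on.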
Use df.replace:
replace_values = {'a':2, 'b':3}
df = df.replace({"M": replace_values})
Results:
Tissues M
0 a1 1
1 x2 2
2 y3 2
3 b 4
4 c1 3
5 v2 2
6 w3 7