Pandas multi column mean - python

I have a pandas DataFrame and would like to get the column-wise mean(),
as below.
A B C D
1 10 100 1000 10000
2 20 200 2000 20000
3 30 300 3000 30000
4 40 400 4000 40000
5 50 500 5000 50000
Answer:
A B C D
30 300 3000 30000
Please suggest a way to do it.
I have tried df.mean() and other variations of it.

Chain to_frame with T (transpose) to turn the mean() Series into a one-row DataFrame:
print (df.mean().to_frame().T)
A B C D
0 30.0 300.0 3000.0 30000.0
Or:
print (pd.DataFrame(df.mean().values.reshape(1,-1), columns=df.columns))
A B C D
0 30.0 300.0 3000.0 30000.0
Or:
print (pd.DataFrame(np.mean(df.values, axis=0).reshape(1,-1), columns=df.columns))
A B C D
0 30.0 300.0 3000.0 30000.0
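A minimal, self-contained sketch of the first variant, using the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50],
    'B': [100, 200, 300, 400, 500],
    'C': [1000, 2000, 3000, 4000, 5000],
    'D': [10000, 20000, 30000, 40000, 50000],
})

# df.mean() returns a Series indexed by column name;
# to_frame() makes it a one-column DataFrame and .T flips it
# into a single row with the original columns restored.
out = df.mean().to_frame().T
print(out)
#       A      B       C        D
# 0  30.0  300.0  3000.0  30000.0
```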

Related

Pandas Multiply 2D by 1D Dataframe

Looking for an elegant way to multiply a 2D dataframe by a 1D series where the indices and column names align
df1 =
Index  A  B
1      1  5
2      2  6
3      3  7
4      4  8
df2 =
  Coef
A   10
B  100
Something like...
df3 = df1.mul(df2)
To get:
Index   A    B
1      10  500
2      20  600
3      30  700
4      40  800
There is no such thing as a 1D DataFrame; slice out a single column as a Series to get 1D, then multiply (mul aligns a Series with the columns by default, axis=1):
df3 = df1.mul(df2['Coef'])
Output:
A B
1 10 500
2 20 600
3 30 700
4 40 800
If Index is a column:
df3 = df1.mul(df2['Coef']).combine_first(df1)[df1.columns]
Output:
Index A B
0 1.0 10.0 500.0
1 2.0 20.0 600.0
2 3.0 30.0 700.0
3 4.0 40.0 800.0
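A minimal sketch of the accepted answer, with the question's data and Index as the DataFrame index:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                   index=[1, 2, 3, 4])
df2 = pd.DataFrame({'Coef': [10, 100]}, index=['A', 'B'])

# Selecting df2['Coef'] yields a Series whose index ('A', 'B')
# aligns with df1's columns, so mul broadcasts column-wise.
df3 = df1.mul(df2['Coef'])
print(df3)
#     A    B
# 1  10  500
# 2  20  600
# 3  30  700
# 4  40  800
```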

Python: Divide row in one DataFrame by all rows in another DataFrame

I have two DataFrames as follows:
df1:
A B C D
index
0 10000 20000 30000 40000
df2:
time type A B C D
index
0 5/2020 unit 1000 4000 900 200
1 6/2020 unit 7000 2000 600 4000
I want to divide df1.iloc[0] by all rows in df2 to get the following:
df:
time type A B C D
index
0 5/2020 unit 10 5 33.33 200
1 6/2020 unit 1.43 10 50 10
I tried to use df1.iloc[0].div(df2.iloc[:]) but that gave me NaNs for all rows other than the 0 index.
Any suggestions would be greatly appreciated. Thank you.
Let us do
df2.update(df2.loc[:,df1.columns].rdiv(df1.iloc[0]))
df2
Out[861]:
time type A B C D
0 5/2020 unit 10.000000 5.0 33.333333 200.0
1 6/2020 unit 1.428571 10.0 50.000000 10.0
Another way to do it, using NumPy divide (note it is the question's df1, whose single row broadcasts over df2's rows):
df2.update(np.divide(df1.to_numpy(), df2.loc[:, df1.columns]))
df2
time type A B C D
0 5/2020 unit 10.000000 5.0 33.333333 200.0
1 6/2020 unit 1.428571 10.0 50.000000 10.0
You can use div.
df = df2.apply(lambda x:df1.iloc[0].div(x[df1.columns]), axis=1)
print(df):
A B C D
index
0 10.000000 5.0 33.333333 200.0
1 1.428571 10.0 50.000000 10.0
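The rdiv answer can be reproduced with a minimal sketch (values from the question; the numeric columns are built as floats so update keeps the dtypes simple):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [10000.0], 'B': [20000.0],
                    'C': [30000.0], 'D': [40000.0]})
df2 = pd.DataFrame({'time': ['5/2020', '6/2020'],
                    'type': ['unit', 'unit'],
                    'A': [1000.0, 7000.0], 'B': [4000.0, 2000.0],
                    'C': [900.0, 600.0], 'D': [200.0, 4000.0]})

# rdiv flips the operands: for each shared column this computes
# df1.iloc[0] / df2[col], broadcasting df1's single row over all rows.
# update writes the result back into df2, leaving time/type untouched.
df2.update(df2.loc[:, df1.columns].rdiv(df1.iloc[0]))
print(df2)
```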

Python loop for calculating sum of column values in pandas

I have the below data frame (a single column, col):
col
a
100
200
200
b
20
30
40
c
400
50
I need help to calculate the sum of values for each item and place it in a 2nd column, which ideally should look like below:
a 500
100
200
200
b 90
20
30
40
c 450
400
50
If you need sums per group, convert column col to numeric, build group keys by forward-filling the non-numeric labels, and use GroupBy.transform:
s = pd.to_numeric(df['col'], errors='coerce')
mask = s.isna()
df.loc[mask, 'new'] = s.groupby(df['col'].where(mask).ffill()).transform('sum')
print (df)
col new
0 a 500.0
1 100 NaN
2 200 NaN
3 200 NaN
4 b 90.0
5 20 NaN
6 30 NaN
7 40 NaN
8 c 450.0
9 400 NaN
10 50 NaN
Or, reusing the same transformed sums as a string column:
new = s.groupby(df['col'].where(mask).ffill()).transform('sum')
df['new'] = np.where(mask, new.astype(int), '')
print (df)
col new
0 a 500
1 100
2 200
3 200
4 b 90
5 20
6 30
7 40
8 c 450
9 400
10 50
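The accepted approach, assembled into a runnable sketch (the column is read as strings, matching the question's mix of labels and numbers):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['a', '100', '200', '200',
                           'b', '20', '30', '40',
                           'c', '400', '50']})

# Non-numeric rows ('a', 'b', 'c') become NaN here, so the mask
# is True exactly on the group-label rows.
s = pd.to_numeric(df['col'], errors='coerce')
mask = s.isna()

# Forward-fill the labels to tag each number with its group,
# sum per group, and write the total onto the label rows only.
new = s.groupby(df['col'].where(mask).ffill()).transform('sum')
df.loc[mask, 'new'] = new
print(df)
```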

Merge Columns with the Same name in the same dataframe if null

I have a dataframe that looks like this
Depth DT DT DT GR GR GR
1 100 NaN 45 NaN 100 50 NaN
2 200 NaN 45 NaN 100 50 NaN
3 300 NaN 45 NaN 100 50 NaN
4 400 NaN NaN 50 100 50 NaN
5 500 NaN NaN 50 100 50 NaN
I need to merge the same-named columns into one, keeping the first non-null value across the duplicates for each row.
In the end the data frame should look like
Depth DT GR
1 100 45 100
2 200 45 100
3 300 45 100
4 400 50 100
5 500 50 100
I am a beginner in pandas. I tried drop_duplicates, but it couldn't do what I wanted. Any suggestions?
IIUC, you can do:
(df.set_index('Depth')
.groupby(level=0, axis=1).first()
.reset_index())
output:
Depth DT GR
0 100 45.0 100.0
1 200 45.0 100.0
2 300 45.0 100.0
3 400 50.0 100.0
4 500 50.0 100.0
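A self-contained sketch of the same idea. Since groupby(..., axis=1) is deprecated in recent pandas, this variant transposes, groups by column label, and transposes back, which is equivalent and works across versions:

```python
import numpy as np
import pandas as pd

# Duplicate column labels, with the non-null value split across copies.
df = pd.DataFrame(
    [[100, np.nan, 45, np.nan, 100, 50, np.nan],
     [200, np.nan, 45, np.nan, 100, 50, np.nan],
     [300, np.nan, 45, np.nan, 100, 50, np.nan],
     [400, np.nan, np.nan, 50, 100, 50, np.nan],
     [500, np.nan, np.nan, 50, 100, 50, np.nan]],
    columns=['Depth', 'DT', 'DT', 'DT', 'GR', 'GR', 'GR'])

# Group the transposed frame by column label and take the first
# non-null value per label, then transpose back and restore Depth.
out = (df.set_index('Depth')
         .T.groupby(level=0).first()
         .T.reset_index())
print(out)
```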

Map value from one row as a new column in pandas

I have a pandas dataframe:
SrNo value
a nan
1 100
2 200
3 300
b nan
1 500
2 600
3 700
c nan
1 900
2 1000
I want my final DataFrame as:
value new_col
100 a
200 a
300 a
500 b
600 b
700 b
900 c
1000 c
i.e., the values under SrNo 'a' should get 'a' in a new column, and similarly for 'b' and 'c'.
Create the new column with where, using an isnull condition, then ffill to replace NaNs by forward filling.
Last, remove the NaN rows with dropna and the helper column with drop:
print (df['SrNo'].where(df['value'].isnull()))
0 a
1 NaN
2 NaN
3 NaN
4 b
5 NaN
6 NaN
7 NaN
8 c
9 NaN
10 NaN
Name: SrNo, dtype: object
df['new_col'] = df['SrNo'].where(df['value'].isnull()).ffill()
df = df.dropna().drop(columns='SrNo')
print (df)
value new_col
1 100.0 a
2 200.0 a
3 300.0 a
5 500.0 b
6 600.0 b
7 700.0 b
9 900.0 c
10 1000.0 c
Here's one way
In [2160]: df.assign(
    new_col=df.SrNo.str.extract(r'(\D+)', expand=False).ffill()
).dropna().drop(columns='SrNo')
Out[2160]:
value new_col
1 100.0 a
2 200.0 a
3 300.0 a
5 500.0 b
6 600.0 b
7 700.0 b
9 900.0 c
10 1000.0 c
Another way: replace the numbers with NaN and ffill():
df['col'] = df['SrNo'].replace('([0-9]+)', np.nan, regex=True).ffill()
df = df.dropna(subset=['value']).drop(columns='SrNo')
Output:
value col
1 100.0 a
2 200.0 a
3 300.0 a
5 500.0 b
6 600.0 b
7 700.0 b
9 900.0 c
10 1000.0 c
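A runnable sketch of the first answer, with the question's data (drop(columns=...) replaces the older positional drop('SrNo', 1)):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'SrNo': ['a', '1', '2', '3', 'b', '1', '2', '3', 'c', '1', '2'],
    'value': [np.nan, 100, 200, 300, np.nan, 500, 600, 700,
              np.nan, 900, 1000],
})

# Keep SrNo only on the header rows (where value is NaN),
# forward-fill that label down each block, then drop the headers.
df['new_col'] = df['SrNo'].where(df['value'].isnull()).ffill()
df = df.dropna(subset=['value']).drop(columns='SrNo')
print(df)
```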
