Pandas create column for mean of grouped count - python

I have a grouped dataframe that looks as follows:
player_id  shot_type  count
01         03         3
02         01         3
03         03         2
03         01         4
I want to add an additional column holding the mean of count over the rows that share the same shot_type, which would look as follows:
player_id  shot_type  count  mean_shot_type_count_player
01         03         3      (3+2)/2
02         01         3      (3+4)/2
03         03         2      (3+2)/2
03         01         4      (3+4)/2

Use GroupBy.transform:
df['mean_shot_type_count_player'] = df.groupby('shot_type')['count'].transform('mean')
print(df)
Output:
  player_id shot_type  count  mean_shot_type_count_player
0        01        03      3                          2.5
1        02        01      3                          3.5
2        03        03      2                          2.5
3        03        01      4                          3.5
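For reference, a minimal self-contained reproduction of the above (a sketch: the id columns are assumed to be strings, and the third row's shot_type is assumed to be 03, which is what the (3+2)/2 mean implies):

import pandas as pd

# hypothetical reconstruction of the grouped frame; ids kept as strings
df = pd.DataFrame({
    'player_id': ['01', '02', '03', '03'],
    'shot_type': ['03', '01', '03', '01'],
    'count':     [3, 3, 2, 4],
})

# broadcast the per-shot_type mean of count back onto every row
df['mean_shot_type_count_player'] = df.groupby('shot_type')['count'].transform('mean')
print(df)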

Related

How to check the nearest matching value between two fields in the same table and add data to a third field using Pandas?

I have one table:
Index  Month_1  Month_2  Paid
01     12       10
02     09       03
03     02       04
04     01       08
The output should be:
Index  Month_1  Month_2  Paid
01     12       10       Yes
02     09       03
03     02       04       Yes
04     01       08
Logic: put 'Yes' in the Paid field for rows where Month_1 and Month_2 are close to each other.
You can subtract the columns, take the absolute value, and test whether it is less than or equal to a threshold (e.g. 2), then set the values with numpy.where:
import numpy as np

df['Paid'] = np.where(df['Month_1'].sub(df['Month_2']).abs().le(2), 'Yes', '')
print(df)
  Index  Month_1  Month_2 Paid
0    01       12       10  Yes
1    02        9        3
2    03        2        4  Yes
3    04        1        8
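For completeness, a self-contained sketch of the same approach (assuming Month_1 and Month_2 are integers; Index is kept as a string to preserve the leading zeros):

import numpy as np
import pandas as pd

# hypothetical reconstruction of the input table
df = pd.DataFrame({
    'Index':   ['01', '02', '03', '04'],
    'Month_1': [12, 9, 2, 1],
    'Month_2': [10, 3, 4, 8],
})

# 'Yes' where the absolute difference is within the threshold of 2
df['Paid'] = np.where(df['Month_1'].sub(df['Month_2']).abs().le(2), 'Yes', '')
print(df)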

groupby two columns and count unique values from a third column

I have the following df1:
id  period  color  size  rate
1   01      red    12    30
1   02      red    12    30
2   01      blue   12    35
3   03      blue   12    35
4   01      blue   12    35
4   02      blue   12    35
5   01      pink   10    40
6   01      pink   10    40
I need to create a new df2 whose index concatenates the three columns as color-size-rate, then group by 'period' and get the count of unique ids.
My final df should have the following structure:
index       period  count
red-12-30   01      1
red-12-30   02      1
blue-12-35  01      2
blue-12-35  03      1
blue-12-35  02      1
pink-10-40  01      2
Thank you in advance for your help.
try .agg('-'.join) and .groupby
df1 = (df.groupby([df[["color", "size", "rate"]].astype(str)
                     .agg("-".join, axis=1).rename("index"), "period"])
         .agg(count=("id", "nunique"))
         .reset_index())
print(df1)
        index  period  count
0  blue-12-35       1      2
1  blue-12-35       2      1
2  blue-12-35       3      1
3  pink-10-40       1      2
4   red-12-30       1      1
5   red-12-30       2      1
You can also achieve this with a plain groupby (note it needs nunique rather than count to get unique ids, and the numeric columns must be cast to str before joining):
df2 = df1.groupby(['color', 'size', 'rate', 'period'])['id'].nunique().reset_index(name='count')
df2['index'] = df2.apply(lambda x: '-'.join([str(x['color']), str(x['size']), str(x['rate'])]), axis=1)
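Both answers assume an input frame shaped like the hypothetical reconstruction below (period is read as an integer here, which is why it prints without its leading zero in the first answer's output; the first answer refers to the frame as df, the second as df1):

import pandas as pd

df1 = pd.DataFrame({
    'id':     [1, 1, 2, 3, 4, 4, 5, 6],
    'period': [1, 2, 1, 3, 1, 2, 1, 1],
    'color':  ['red', 'red', 'blue', 'blue', 'blue', 'blue', 'pink', 'pink'],
    'size':   [12, 12, 12, 12, 12, 12, 10, 10],
    'rate':   [30, 30, 35, 35, 35, 35, 40, 40],
})
df = df1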

Custom Melt / Collapse Pandas

I have another problem with pandas. I can do the task below using loops, but that would be very inefficient given the size of the input. Please let me know if there is a pandas solution.
I would like to create a new DF_C based on DF A. I need to create multiple rows based on the columns COL_A and B_COL (their values are separated by commas). State will always have one element in it.
The sequence of rows does not matter.
I have a DF A:
State  COL_A   B_COL
01     01      03, 01
02     01, 03  01, 04
02     07      03
04     01      05
I would like a resulting df_c:
State  COL_A  B_COL
01     01     03
01     01     01
02     01     01
02     01     04
02     03     01
02     03     04
02     07     03
04     01     05
You can do this by first using str.split on both COL_A and B_COL, then chaining one explode on each column, like:
df_ = (df.assign(COL_A=lambda x: x['COL_A'].str.split(', '),
                 B_COL=lambda x: x['B_COL'].str.split(', '))
         .explode('COL_A')
         .explode('B_COL'))
print(df_)
  State COL_A B_COL
0     1    01    03
0     1    01    01
1     2    01    01
1     2    01    04
1     2    03    01
1     2    03    04
2     2    07    03
3     4    01    05
EDIT: if you are after efficiency, maybe consider doing
df_ = pd.DataFrame(
    [(s, a, b)
     for s, cola, colb in zip(df['State'], df['COL_A'], df['B_COL'])
     for a in cola.split(', ') for b in colb.split(', ')],
    columns=df.columns)
An alternative to Ben.T's second solution, using itertools:
from itertools import product, chain

flatten = chain.from_iterable
result = flatten(product([state], col_a.split(', '), b_col.split(', '))
                 for state, col_a, b_col in df.to_numpy())
pd.DataFrame(result, columns=df.columns)
  State COL_A B_COL
0     1    01    03
1     1    01    01
2     2    01    01
3     2    01    04
4     2    03    01
5     2    03    04
6     2    07    03
7     4    01    05
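All three snippets assume DF A in roughly this shape (a hypothetical reconstruction; State is read as an integer here, which is why it prints without its leading zero in the outputs above):

import pandas as pd

df = pd.DataFrame({
    'State': [1, 2, 2, 4],
    'COL_A': ['01', '01, 03', '07', '01'],
    'B_COL': ['03, 01', '01, 04', '03', '05'],
})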

conditional dataframe shift

I have the below dataframe:
ID1  ID2  mon  price
10   2    06   500
20   3    07   200
20   3    08   300
20   3    09   400
21   2    07   100
21   2    08   200
21   2    09   300
Required output:
ID1  ID2  mon  price  ID1_shift  ID2_shift  mon_shift  price_shift
10   2    06   500    10         2          06         500
20   3    07   200    20         3          07         200
20   3    08   300    20         3          07         200
20   3    09   400    20         3          08         300
21   2    07   100    21         2          07         100
21   2    08   200    21         2          07         100
21   2    09   300    21         2          08         200
I tried using df.shift() in different ways but was not successful.
Your valuable comments will be helpful.
I want to shift the dataframe grouped by (ID1, ID2) and, where the shift yields NaN, fill with the current row's values.
I tried the below, but it works with a single column only.
df["price_shift"] = df.groupby(["ID1", "ID2"]).price.shift().fillna(df["price"])
Thanks
I came up with the below, but it is only feasible for a small number of columns. Is there any way the complete row can be shifted with the same group by?
df1['price_shift'] = df.groupby(['ID1', 'ID2']).price.shift(1).fillna(df['price'])
df1['mon_shift'] = df.groupby(['ID1', 'ID2']).mon.shift(1).fillna(df['mon'])
df1[['ID1_shift', 'ID2_shift']] = df[['ID1', 'ID2']]
df2 = pd.concat([df, df1], axis=1)
df2
Try the below, which applies your single-column pattern to every column in a loop:
for column_name in list(df.columns):
    df[column_name + "_shift"] = (df.groupby(["ID1", "ID2"])[column_name]
                                    .shift(1)
                                    .fillna(df[column_name]))
cheers
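To shift the complete row in one call, a sketch along these lines should also work: DataFrameGroupBy.shift shifts every non-key column within each group, and a DataFrame-level fillna aligns on column names to fill the NaNs from the current row. The frame construction below is a hypothetical reconstruction of the question's data.

import pandas as pd

# hypothetical reconstruction; mon kept as a string to preserve leading zeros
df = pd.DataFrame({
    'ID1':   [10, 20, 20, 20, 21, 21, 21],
    'ID2':   [2, 3, 3, 3, 2, 2, 2],
    'mon':   ['06', '07', '08', '09', '07', '08', '09'],
    'price': [500, 200, 300, 400, 100, 200, 300],
})

# shift every non-key column within each (ID1, ID2) group, then fall back
# to the current row's values wherever the shift produced NaN
shifted = df.groupby(['ID1', 'ID2']).shift(1).fillna(df)

# groupby.shift drops the grouping keys; restore them unshifted, since a
# shift within a group cannot change them anyway
shifted[['ID1', 'ID2']] = df[['ID1', 'ID2']]

df2 = df.join(shifted.add_suffix('_shift'))
print(df2)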

How to create a column for each year from a single date column containing year and month?

If I have data like this:
Date     Values
2005-01  10
2005-02  20
2005-03  30
2006-01  40
2006-02  50
2006-03  70
How can I turn the years into columns, like this?
Date  2005  2006
01    10    40
02    20    50
03    30    70
Thanks.
You can use split with pivot:
df[['year','month']] = df.Date.str.split('-', expand=True)
df = df.pivot(index='month', columns='year', values='Values')
print(df)
year   2005  2006
month
01       10    40
02       20    50
03       30    70
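A self-contained version of the above, assuming Date is a plain 'YYYY-MM' string column (hypothetical reconstruction):

import pandas as pd

df = pd.DataFrame({
    'Date':   ['2005-01', '2005-02', '2005-03',
               '2006-01', '2006-02', '2006-03'],
    'Values': [10, 20, 30, 40, 50, 70],
})

# split 'YYYY-MM' into separate year and month columns,
# then pivot the years out into columns
df[['year', 'month']] = df.Date.str.split('-', expand=True)
print(df.pivot(index='month', columns='year', values='Values'))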
