pandas - Select Last Row of Column Based on Different Column Value


I've got a dataframe like
Season Game Event_Num Home Away Margin
0 2016-17 1 1 0 0 0
1 2016-17 1 2 0 0 0
2 2016-17 1 3 0 2 2
3 2016-17 1 4 0 2 2
4 2016-17 1 5 0 2 2
.. ... ... ... ... ... ...
95 2017-18 5 53 17 10 7
96 2017-18 5 54 17 10 7
97 2017-18 5 55 17 10 7
98 2017-18 5 56 17 10 7
99 2017-18 5 57 17 10 7
Ultimately, I'd like to take the last row of each Game played (the last row for Game 1, Game 2, and so on) so I can see what the final margin was, and I'd like to do this for every unique Season.
For example, if there were 3 games played for 2 unique seasons then the df would look something like:
Season Game Event_Num Home Away Final Margin
0 2016-17 1 1 90 80 10
1 2016-17 2 2 83 88 5
2 2016-17 3 3 67 78 11
3 2017-18 1 4 101 102 1
4 2017-18 2 5 112 132 20
5 2017-18 3 6
Is there a good way to do something like this? TIA.

Try:
df.groupby(['Season','Game']).tail(1)
output
Season Game Event_Num Home Away Margin
4 2016-17 1 5 0 2 2
9 2017-18 5 57 17 10 7
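A minimal, self-contained sketch of the same idea. It assumes the last event of a game is the one with the highest Event_Num, and the scores below are made up for illustration:

```python
import pandas as pd

# Hypothetical play-by-play data: two seasons, two games each.
df = pd.DataFrame({
    "Season":    ["2016-17"] * 4 + ["2017-18"] * 4,
    "Game":      [1, 1, 2, 2, 1, 1, 2, 2],
    "Event_Num": [1, 2, 1, 2, 1, 2, 1, 2],
    "Home":      [0, 90, 2, 83, 3, 101, 0, 112],
    "Away":      [2, 80, 0, 88, 0, 102, 2, 132],
})
df["Margin"] = (df["Home"] - df["Away"]).abs()

# Sort so the last event of each game really is the last row of its group,
# then keep one row per (Season, Game).
final = (
    df.sort_values(["Season", "Game", "Event_Num"])
      .groupby(["Season", "Game"])
      .tail(1)
      .reset_index(drop=True)
)
print(final)
```

If the frame is already in event order, the sort_values step can be dropped. groupby(...).last() is a possible alternative, but it skips NaN values column by column, so tail(1) is the safer choice when whole rows are wanted.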

Related

Efficient way to generate new columns of the minimum value of certain subset of another column in Pandas Dataframe

I have a dataframe of performance of athletes in different races that looks like
Race_ID Date Athlete_ID Time Rank
1 2022-10-12 1 55 3
1 2022-10-12 2 52 2
1 2022-10-12 3 48 1
1 2022-10-12 4 58 5
1 2022-10-12 5 59 6
1 2022-10-12 6 57 4
2 2022-09-17 1 43 2
2 2022-09-17 2 48 4
2 2022-09-17 3 42 1
2 2022-09-17 4 50 5
2 2022-09-17 5 44 3
3 2022-08-11 1 56 4
3 2022-08-11 2 55 3
3 2022-08-11 3 51 2
3 2022-08-11 4 50 1
4 2022-05-30 1 43 2
4 2022-05-30 2 44 3
4 2022-05-30 3 40 1
4 2022-05-30 4 49 6
4 2022-05-30 5 48 5
4 2022-05-30 6 47 4
and I sort it according to Athlete_ID and Date:
df.sort_values(by=['Athlete_ID','Date'], ascending=[True,True], inplace=True)
and I get
Race_ID Date Athlete_ID Time Rank
4 2022-05-30 1 43 2
3 2022-08-11 1 56 4
2 2022-09-17 1 43 2
1 2022-10-12 1 55 3
4 2022-05-30 2 44 3
3 2022-08-11 2 55 3
2 2022-09-17 2 48 4
1 2022-10-12 2 52 2
4 2022-05-30 3 40 1
3 2022-08-11 3 51 2
2 2022-09-17 3 42 1
1 2022-10-12 3 48 1
4 2022-05-30 4 49 6
3 2022-08-11 4 50 1
2 2022-09-17 4 50 5
1 2022-10-12 4 58 5
4 2022-05-30 5 48 5
2 2022-09-17 5 44 3
1 2022-10-12 5 59 6
4 2022-05-30 6 47 4
1 2022-10-12 6 57 4
For each Athlete_ID, I want to generate a new column Minimum_time#t-1 whose value is the minimum time of the LAST race that the athlete ran, and 0 otherwise, so the desired output looks like:
Race_ID Date Athlete_ID Time Rank Minimum_time#t-1
4 2022-05-30 1 43 2 0 #since that's the first race athlete1 ran
3 2022-08-11 1 56 4 40 #the last race athlete1 ran is race 4 and the fastest time is 40
2 2022-09-17 1 43 2 50 #the last race athlete1 ran is race 3 and the fastest time is 50
1 2022-10-12 1 55 3 42
4 2022-05-30 2 44 3 0
3 2022-08-11 2 55 3 40
2 2022-09-17 2 48 4 50
1 2022-10-12 2 52 2 42
4 2022-05-30 3 40 1 0
3 2022-08-11 3 51 2 40
2 2022-09-17 3 42 1 50
1 2022-10-12 3 48 1 42
4 2022-05-30 4 49 6 0
3 2022-08-11 4 50 1 40
2 2022-09-17 4 50 5 50
1 2022-10-12 4 58 5 42
4 2022-05-30 5 48 5 0
2 2022-09-17 5 44 3 40
1 2022-10-12 5 59 6 42
4 2022-05-30 6 47 4 0 #since that's the first race athlete6 ran
1 2022-10-12 6 57 4 40 #the last race athlete6 ran is race 4 and the fastest time is 40
The way I did this is to first define a function:
def minimum_time(Race_ID):
    return df.loc[df['Race_ID'] == Race_ID]['Time'].min()
and then use shift to get the Race_ID for the last race of the athletes and then apply minimum_time to it:
df.sort_values(by=['Athlete_ID','Date'], ascending=[True,False], inplace=True)
df['Race_ID#t-1'] = df.groupby('Athlete_ID')['Race_ID'].shift(-1).replace(np.nan, 0)
df['Minimum_time#t-1'] = df['Race_ID#t-1'].map(minimum_time).replace(np.nan, 0)
It works, but it's very slow for large datasets. Is there a more computationally efficient way to do this? Thank you.

I would use a different method to get the minimum time, and fillna instead of replace:
# get min time per race
best = df.groupby('Race_ID')['Time'].min()
# shift to get the previous race (df sorted by Athlete_ID, then Date ascending)
# map best time for this race, then fill NaNs with 0
df['Minimum_time#t-1'] = (df.groupby('Athlete_ID')['Race_ID']
                          .shift(1).map(best)
                          .fillna(0, downcast='infer')
                          )
output:
Race_ID Date Athlete_ID Time Rank Minimum_time#t-1
15 4 2022-05-30 1 43 2 0
11 3 2022-08-11 1 56 4 40
6 2 2022-09-17 1 43 2 50
0 1 2022-10-12 1 55 3 42
16 4 2022-05-30 2 44 3 0
12 3 2022-08-11 2 55 3 40
7 2 2022-09-17 2 48 4 50
1 1 2022-10-12 2 52 2 42
17 4 2022-05-30 3 40 1 0
13 3 2022-08-11 3 51 2 40
8 2 2022-09-17 3 42 1 50
2 1 2022-10-12 3 48 1 42
18 4 2022-05-30 4 49 6 0
14 3 2022-08-11 4 50 1 40
9 2 2022-09-17 4 50 5 50
3 1 2022-10-12 4 58 5 42
19 4 2022-05-30 5 48 5 0
10 2 2022-09-17 5 44 3 40
4 1 2022-10-12 5 59 6 42
20 4 2022-05-30 6 47 4 0
5 1 2022-10-12 6 57 4 40
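For reference, here is a self-contained sketch of that approach on a small made-up sample (the values are not the question's data). It uses fillna(0).astype(int) instead of the downcast argument, which newer pandas versions (2.2 and later, as far as I know) deprecate:

```python
import pandas as pd

# Small made-up sample in the same shape as the question's data.
df = pd.DataFrame({
    "Race_ID":    [1, 1, 2, 2, 3, 3],
    "Date":       ["2022-10-12", "2022-10-12", "2022-09-17",
                   "2022-09-17", "2022-08-11", "2022-08-11"],
    "Athlete_ID": [1, 2, 1, 2, 1, 3],
    "Time":       [55, 52, 43, 48, 56, 51],
})
df["Date"] = pd.to_datetime(df["Date"])

# Oldest race first within each athlete, so shift(1) means "previous race".
df = df.sort_values(["Athlete_ID", "Date"])

# Fastest time per race, computed once instead of once per row.
best = df.groupby("Race_ID")["Time"].min()

# Previous Race_ID per athlete (NaN for the athlete's first race),
# looked up in `best`, with 0 where there is no previous race.
prev_race = df.groupby("Athlete_ID")["Race_ID"].shift(1)
df["Minimum_time#t-1"] = prev_race.map(best).fillna(0).astype(int)
print(df)
```

The speed-up comes from computing the per-race minimum a single time; the original function rescanned the whole frame for every row.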

How to get the number of events in a regular interval of time in a dataframe

Assume I have a dataframe like the one below, where each row gives the number of events that occurred in that second.
Time events_occured
1 2
2 3
3 7
4 4
5 6
6 3
7 86
8 26
9 7
10 26
. .
. .
. .
996 56
997 26
998 97
999 58
1000 34
Now I need the cumulative number of events within every 5-second window.
For example, 22 events occurred in the first 5 seconds, 148 events occurred from seconds 6 to 10, and so on.

Like this:
In [647]: df['cumulative'] = df.events_occured.groupby(df.index // 5).cumsum()
In [648]: df
Out[648]:
Time events_occured cumulative
0 1 2 2
1 2 3 5
2 3 7 12
3 4 4 16
4 5 6 22
5 6 3 3
6 7 86 89
7 8 26 115
8 9 7 122
9 10 26 148

If there are missing values of Time, grouping on df.index can put rows in the wrong window, so group on df['Time'] instead.
This also works when Time starts at any value N and when values greater than N are missing.
GROUP_SIZE = 5
df['cumulative'] = df.events_occured\
    .groupby(df['Time'].sub(df['Time'].min()) // GROUP_SIZE).cumsum()
print(df)
Time events_occured cumulative
0 1 2 2
1 2 3 5
2 3 7 12
3 4 4 16
4 5 6 22
5 6 3 3
6 7 86 89
7 8 26 115
8 9 7 122
9 10 26 148
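If only one total per window is needed (22 for the first 5 seconds, 148 for the next 5), the same grouping key can be used with sum() instead of cumsum(). A small sketch using the first ten rows from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Time": range(1, 11),
    "events_occured": [2, 3, 7, 4, 6, 3, 86, 26, 7, 26],
})

GROUP_SIZE = 5
# Window id 0 covers the first GROUP_SIZE seconds, 1 the next, and so on.
window = (df["Time"] - df["Time"].min()) // GROUP_SIZE

# One row per window instead of a running total per row.
totals = df.groupby(window)["events_occured"].sum()
print(totals.tolist())
```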

assign a number id for every 4 rows in pandas dataframe

I have a pandas dataframe like this:
pd.DataFrame({'week': ['2019-w01', '2019-w02','2019-w03','2019-w04',
'2019-w05','2019-w06','2019-w07','2019-w08',
'2019-w9','2019-w10','2019-w11','2019-w12'],
'value': [11,22,33,34,57,88,2,9,10,1,76,14],
'period': [1,1,1,1,2,2,2,2,3,3,3,3]})
week value
0 2019-w1 11
1 2019-w2 22
2 2019-w3 33
3 2019-w4 34
4 2019-w5 57
5 2019-w6 88
6 2019-w7 2
7 2019-w8 9
8 2019-w9 10
9 2019-w10 1
10 2019-w11 76
11 2019-w12 14
What I need is shown below: I would like to assign a period ID to every 4-week interval.
week value period
0 2019-w01 11 1
1 2019-w02 22 1
2 2019-w03 33 1
3 2019-w04 34 1
4 2019-w05 57 2
5 2019-w06 88 2
6 2019-w07 2 2
7 2019-w08 9 2
8 2019-w9 10 3
9 2019-w10 1 3
10 2019-w11 76 3
11 2019-w12 14 3
what is the best way to achieve that? Thanks.

try with:
df['period']=(pd.to_numeric(df['week'].str.split('-').str[-1]
.str.replace('w',''))//4).shift(fill_value=0).add(1)
print(df)
week value period
0 2019-w01 11 1
1 2019-w02 22 1
2 2019-w03 33 1
3 2019-w04 34 1
4 2019-w05 57 2
5 2019-w06 88 2
6 2019-w07 2 2
7 2019-w08 9 2
8 2019-w9 10 3
9 2019-w10 1 3
10 2019-w11 76 3
11 2019-w12 14 3
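Two other ways to get the same ids, as a sketch. The first assumes the rows are already in week order with no gaps, so it only uses the row position; the second reads the week number from the label. The column names period_by_row and period_by_week are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "week": ["2019-w01", "2019-w02", "2019-w03", "2019-w04",
             "2019-w05", "2019-w06", "2019-w07", "2019-w08",
             "2019-w9", "2019-w10", "2019-w11", "2019-w12"],
    "value": [11, 22, 33, 34, 57, 88, 2, 9, 10, 1, 76, 14],
})

# Positional variant: every block of 4 consecutive rows gets the next id.
df["period_by_row"] = np.arange(len(df)) // 4 + 1

# Label-based variant: weeks 1-4 -> 1, weeks 5-8 -> 2, and so on.
week_no = df["week"].str.extract(r"w(\d+)$", expand=False).astype(int)
df["period_by_week"] = (week_no - 1) // 4 + 1
print(df)
```

The positional variant matches the title ("every 4 rows"); the label-based one keeps working if some weeks are missing from the frame.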

Defining Target based on two column values

I am new to python and I was facing some issue solving the following problem.
I have the following dataframe:
SoldDate CountSoldperMonth
2019-06-01 20
5
10
12
33
16
50
27
2019-05-01 2
5
11
13
2019-04-01 32
35
39
42
47
55
61
80
I need to add a Target column such that for the top 5 values in 'CountSoldperMonth' for a particular SoldDate, target should be 1 else 0. If the number of rows in 'CountSoldperMonth' for a particular 'SoldDate' is less than 5 then only the row with highest count will be marked as 1 in the Target and rest as 0. The resulting dataframe should look as below.
SoldDate CountSoldperMonth Target
2019-06-01 20 1
5 0
10 0
12 0
33 1
16 1
50 1
27 1
2019-05-01 2 0
5 0
11 0
13 1
2019-04-01 32 0
35 0
39 0
42 1
47 1
55 1
61 1
80 1
How do I do this?

In your case, use groupby and express your rules with an if...else inside apply:
df.groupby('SoldDate').CountSoldperMonth.\
    apply(lambda x : x==max(x) if len(x)<5 else x.isin(sorted(x)[-5:])).astype(int)
Out[346]:
0 1
1 0
2 0
3 0
4 1
5 1
6 1
7 1
8 0
9 0
10 0
11 1
12 0
13 0
14 0
15 1
16 1
17 1
18 1
19 1
Name: CountSoldperMonth, dtype: int32
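A variant using transform, which writes the result straight back as a column. This is a sketch that assumes SoldDate is filled in on every row (in the question it is only shown on the first row of each group); the helper name top_flags is made up:

```python
import pandas as pd

df = pd.DataFrame({
    "SoldDate": ["2019-06-01"] * 8 + ["2019-05-01"] * 4 + ["2019-04-01"] * 8,
    "CountSoldperMonth": [20, 5, 10, 12, 33, 16, 50, 27,
                          2, 5, 11, 13,
                          32, 35, 39, 42, 47, 55, 61, 80],
})

def top_flags(counts: pd.Series) -> pd.Series:
    # Fewer than 5 rows: flag only the maximum; otherwise flag the 5 largest.
    k = 1 if len(counts) < 5 else 5
    flags = pd.Series(0, index=counts.index)
    flags.loc[counts.nlargest(k).index] = 1
    return flags

df["Target"] = df.groupby("SoldDate")["CountSoldperMonth"].transform(top_flags)
print(df)
```

One difference from the isin version: nlargest flags exactly k rows, so when values are tied at the cutoff only the first ones are marked, while isin marks every tied row.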

Pandas compare 2 dataframes by specific rows in all columns

I have the following Pandas dataframe of some raw numbers:
import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 10000)
col_raw_headers = ['07_08_19 #1','07_08_19 #2','07_08_19 #2.1','11_31_19 #1','11_31_19 #1.1','11_31_19 #1.3','12_15_20 #1','12_15_20 #2','12_15_20 #2.1','12_15_20 #2.2']
col_raw_trial_info = ['Quantity1','Quantity2','Quantity3','Quantity4','Quantity5','Quantity6','TimeStamp',np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]
cols_raw = [[1,75,9,7,-4,0.4,'07/08/2019 05:11'],[1,11,20,-17,12,0.8,'07/08/2019 10:54'],[2,0.9,17,102,56,0.6,'07/08/2019 21:04'],[1,70,4,75,0.8,0.4,'11/31/2019 11:15'],[2,60,74,41,-36,0.3,'11/31/2019 16:50'],[3,17,12,-89,30,0.1,'11/31/2019 21:33'],[1,6,34,496,-84,0.5,'12/15/2020 01:36'],[1,3,43,12,-23,0.5,'12/15/2020 07:01'],[2,5,92,17,64,0.5,'12/15/2020 11:15'],[3,7,11,62,-11,0.5,'12/15/2020 21:45']]
both_values = [[1,2,3,4,8,4,3,8,7],[6,5,3,7,3,23,27,3,11],[65,3,6,78,9,2,45,6,7],[4,3,6,8,3,5,66,32,84],[2,3,11,55,3,7,33,65,34],[22,1,6,32,5,6,4,3,898],[1,6,3,2,6,55,22,6,23],[34,37,46,918,0,37,91,12,68],[51,20,1,34,12,59,78,6,101],[12,71,34,94,1,73,46,51,21]]
processed_cols = ['c_1trial','14_1','14_2','8_1','8_2','8_3','28_1','24_1','24_2','24_3']
df_raw = pd.DataFrame(zip(*cols_raw))
df_temp = pd.DataFrame(zip(*both_values))
df_raw = pd.concat([df_raw,df_temp])
df_raw.columns=col_raw_headers
df_raw.insert(0,'Tr_id',col_raw_trial_info)
df_raw.reset_index(drop=True,inplace=True)
It looks like this:
Tr_id 07_08_19 #1 07_08_19 #2 07_08_19 #2.1 11_31_19 #1 11_31_19 #1.1 11_31_19 #1.3 12_15_20 #1 12_15_20 #2 12_15_20 #2.1 12_15_20 #2.2
0 Quantity1 1 1 2 1 2 3 1 1 2 3
1 Quantity2 75 11 0.9 70 60 17 6 3 5 7
2 Quantity3 9 20 17 4 74 12 34 43 92 11
3 Quantity4 7 -17 102 75 41 -89 496 12 17 62
4 Quantity5 -4 12 56 0.8 -36 30 -84 -23 64 -11
5 Quantity6 0.4 0.8 0.6 0.4 0.3 0.1 0.5 0.5 0.5 0.5
6 TimeStamp 07/08/2019 05:11 07/08/2019 10:54 07/08/2019 21:04 11/31/2019 11:15 11/31/2019 16:50 11/31/2019 21:33 12/15/2020 01:36 12/15/2020 07:01 12/15/2020 11:15 12/15/2020 21:45
7 NaN 1 6 65 4 2 22 1 34 51 12
8 NaN 2 5 3 3 3 1 6 37 20 71
9 NaN 3 3 6 6 11 6 3 46 1 34
10 NaN 4 7 78 8 55 32 2 918 34 94
11 NaN 8 3 9 3 3 5 6 0 12 1
12 NaN 4 23 2 5 7 6 55 37 59 73
13 NaN 3 27 45 66 33 4 22 91 78 46
14 NaN 8 3 6 32 65 3 6 12 6 51
15 NaN 7 11 7 84 34 898 23 68 101 21
I have a separate dataframe of a processed version of these numbers where:
some of the header rows from above have been deleted,
the column names have been changed
Here is the second dataframe:
df_processed = pd.DataFrame(zip(*both_values),columns=processed_cols)
df_processed = df_processed.iloc[:, [3,4,9,7,0,2,1,6,8,5]]  # reorder columns by position
8_1 8_2 24_3 24_1 c_1trial 14_2 14_1 28_1 24_2 8_3
0 4 2 12 34 1 65 6 1 51 22
1 3 3 71 37 2 3 5 6 20 1
2 6 11 34 46 3 6 3 3 1 6
3 8 55 94 918 4 78 7 2 34 32
4 3 3 1 0 8 9 3 6 12 5
5 5 7 73 37 4 2 23 55 59 6
6 66 33 46 91 3 45 27 22 78 4
7 32 65 51 12 8 6 3 6 6 3
8 84 34 21 68 7 7 11 23 101 898
Common parts of each dataframe:
For each column, rows 8 onwards of the raw dataframe are the same as row 1 onwards from the processed dataframe. The order of columns in both dataframes is not the same.
Output combination:
I am looking to compare rows 8-16 in columns 1-10 of the raw dataframe df_raw to the processed dataframe df_processed. If the columns match each other, then I would like to extract rows 1-7 of df_raw and the column header from df_processed.
Example:
The values in column c_1trial match only the values in rows 8-16 of the column 07_08_19 #1. This takes 2 steps: (1) I would like to find some way to determine that these 2 columns match each other; (2) if 2 columns do match each other, I would like to select rows from the matching columns, as in the sample output.
Here is the output I am looking to get:
Tr_id 07_08_19 #1 07_08_19 #2 07_08_19 #2.1 11_31_19 #1 11_31_19 #1.1 11_31_19 #1.3 12_15_20 #1 12_15_20 #2 12_15_20 #2.1 12_15_20 #2.2
Quantity1 1 1 2 1 2 3 1 1 2 3
Quantity2 75 11 0.9 70 60 17 6 3 5 7
Quantity3 9 20 17 4 74 12 34 43 92 11
Proc_Name c_1trial 14_1 14_2 8_1 8_2 8_3 28_1 24_1 24_2 24_3
Quantity4 7 -17 102 75 41 -89 496 12 17 62
Quantity5 -4 12 56 0.8 -36 30 -84 -23 64 -11
Quantity6 0.4 0.8 0.6 0.4 0.3 0.1 0.5 0.5 0.5 0.5
TimeStamp 07/08/2019 05:11 07/08/2019 10:54 07/08/2019 21:04 11/31/2019 11:15 11/31/2019 16:50 11/31/2019 21:33 12/15/2020 01:36 12/15/2020 07:01 12/15/2020 11:15 12/15/2020 21:45
My attempts are giving trouble:
print (df_raw.iloc[7:,1:] == df_processed).all(axis=1)
gives
ValueError: Can only compare identically-labeled DataFrame objects
and
print (df_raw.ix[7:].values == df_processed.values) #gives False
gives
False
The problem with my second attempt is that I am not selecting .all(axis=1). When I make a comparison I want to do this across all rows of every column, not just one row.
Question:
Is there a way to select out the output I showed above from these 2 dataframes?

Does this look like the output you're looking for?
Raw dataframe df:
Tr_id 07_08_19 07_08_19.1 07_08_19.2 11_31_19 11_31_19.1
0 Quantity1 1 1 2 1 2
1 Quantity2 75 11 0.9 70 60
2 Quantity3 9 20 17 4 74
3 Quantity4 7 -17 102 75 41
4 Quantity5 -4 12 56 0.8 -36
5 Quantity6 0.4 0.8 0.6 0.4 0.3
6 TimeStamp 07/08/2019 07/08/2019 07/08/2019 11/31/2019 11/31/2019
7 NaN 1 6 65 4 2
8 NaN 2 5 3 3 3
9 NaN 3 3 6 6 11
10 NaN 4 7 78 8 55
11 NaN 8 3 9 3 3
12 NaN 4 23 2 5 7
13 NaN 3 27 45 66 33
14 NaN 8 3 6 32 65
15 NaN 7 11 7 84 34
11_31_19.2 12_15_20 12_15_20.1 12_15_20.2 12_15_20.3
0 3 1 1 2 3
1 17 6 3 5 7
2 12 34 43 92 11
3 -89 496 12 17 62
4 30 -84 -23 64 -11
5 0.1 0.5 0.5 0.5 0.5
6 11/31/2019 12/15/2020 12/15/2020 12/15/2020 12/15/2020
7 22 1 34 51 12
8 1 6 37 20 71
9 6 3 46 1 34
10 32 2 918 34 94
11 5 6 0 12 1
12 6 55 37 59 73
13 4 22 91 78 46
14 3 6 12 6 51
15 898 23 68 101 21
Processed dataframe dfp:
8_1 8_2 24_3 24_1 c_1trial 14_2 14_1 28_1 24_2 8_3
0 4 2 12 34 1 65 6 1 51 22
1 3 3 71 37 2 3 5 6 20 1
2 6 11 34 46 3 6 3 3 1 6
3 8 55 94 918 4 78 7 2 34 32
4 3 3 1 0 8 9 3 6 12 5
5 5 7 73 37 4 2 23 55 59 6
6 66 33 46 91 3 45 27 22 78 4
7 32 65 51 12 8 6 3 6 6 3
8 84 34 21 68 7 7 11 23 101 898
Code:
df = pd.read_csv('raw_df.csv') # raw dataframe
dfp = pd.read_csv('processed_df.csv') # processed dataframe
dfr = df.drop('Tr_id', axis=1)
x = pd.DataFrame()
for col_raw in dfr.columns:
    for col_p in dfp.columns:
        # compare by position (.to_numpy()), since the row labels differ (7-15 vs 0-8)
        if (dfr.tail(9).astype(int)[col_raw].to_numpy() == dfp[col_p].to_numpy()).all():
            series = dfr[col_raw].head(7).tolist()
            series.append(col_raw)
            x[col_p] = series
x = pd.concat([df['Tr_id'].head(7), x], axis=1)
Output:
Tr_id c_1trial 14_1 14_2 8_1 8_2
0 Quantity1 1 1 2 1 2
1 Quantity2 75 11 0.9 70 60
2 Quantity3 9 20 17 4 74
3 Quantity4 7 -17 102 75 41
4 Quantity5 -4 12 56 0.8 -36
5 Quantity6 0.4 0.8 0.6 0.4 0.3
6 TimeStamp 07/08/2019 07/08/2019 07/08/2019 11/31/2019 11/31/2019
7 NaN 07_08_19 07_08_19.1 07_08_19.2 11_31_19 11_31_19.1
8_3 28_1 24_1 24_2 24_3
0 3 1 1 2 3
1 17 6 3 5 7
2 12 34 43 92 11
3 -89 496 12 17 62
4 30 -84 -23 64 -11
5 0.1 0.5 0.5 0.5 0.5
6 11/31/2019 12/15/2020 12/15/2020 12/15/2020 12/15/2020
7 11_31_19.2 12_15_20 12_15_20.1 12_15_20.2 12_15_20.3
I think the code could be more concise but maybe this does the job.

Alternative solution, using the DataFrame.isin() method:
In [171]: df1
Out[171]:
a b c
0 1 1 3
1 0 2 4
2 4 2 2
3 0 3 3
4 0 4 4
In [172]: df2
Out[172]:
a b c
0 0 3 3
1 1 1 1
2 0 3 4
3 4 2 3
4 0 4 4
In [173]: common = pd.merge(df1, df2)
In [174]: common
Out[174]:
a b c
0 0 3 3
1 0 4 4
In [175]: df1[df1.isin(common.to_dict('list')).all(axis=1)]
Out[175]:
a b c
3 0 3 3
4 0 4 4
Or if you want to subtract second data set from the first one. I.e. Pandas equivalent for SQL's:
select col1, .., colN from tableA
minus
select col1, .., colN from tableB
in Pandas:
In [176]: df1[~df1.isin(common.to_dict('list')).all(axis=1)]
Out[176]:
a b c
0 1 1 3
1 0 2 4
2 4 2 2
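One caveat with the isin approach: passing a dict of lists tests each column on its own, so a row can pass even if that exact combination of values never appears as a row in the other frame. A sketch of a row-exact alternative using merge with indicator=True, on the same df1 and df2 (the names in_both and only_in_df1 are just illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 0, 4, 0, 0], "b": [1, 2, 2, 3, 4], "c": [3, 4, 2, 3, 4]})
df2 = pd.DataFrame({"a": [0, 1, 0, 4, 0], "b": [3, 1, 3, 2, 4], "c": [3, 1, 4, 3, 4]})

# Left-join on all shared columns and record where each row was found.
# drop_duplicates() keeps the join from multiplying rows of df1.
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)

# Rows of df1 that also exist in df2 (the intersection).
in_both = merged.loc[merged["_merge"] == "both", ["a", "b", "c"]]

# Rows of df1 that do not exist in df2 (the SQL "minus").
only_in_df1 = merged.loc[merged["_merge"] == "left_only", ["a", "b", "c"]]
print(in_both)
print(only_in_df1)
```

Note that merge returns a fresh 0..n-1 index, so the original row labels of df1 are only preserved when df1 already had a default index.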

I came up with this using loops. It is very disappointing:
holder = []
for randm,pp in enumerate(list(df_processed)):
    list1 = df_processed[pp].tolist()
    for car,rr in enumerate(list(df_raw)):
        list2 = df_raw.loc[7:,rr].tolist()
        if list1==list2:
            holder.append([rr,pp])
df_intermediate = pd.DataFrame(holder,columns=['A','B'])
df_c = df_raw.loc[:6,df_intermediate.iloc[:,0].tolist()]
df_c.loc[df_c.shape[0]] = df_intermediate.iloc[:,1].tolist()
df_c.insert(0,list(df_raw)[0],df_raw[list(df_raw)[0]])
df_c.iloc[-1,0]='Proc_Name'
df_c = df_c.reindex([0,1,2]+[7]+[3,4,5,6]).reset_index(drop=True)
Output:
Tr_id 11_31_19 #1 11_31_19 #1.1 12_15_20 #2.2 12_15_20 #2 07_08_19 #1 07_08_19 #2.1 07_08_19 #2 12_15_20 #1 12_15_20 #2.1 11_31_19 #1.3
0 Quantity1 1 2 3 1 1 2 1 1 2 3
1 Quantity2 70 60 7 3 75 0.9 11 6 5 17
2 Quantity3 4 74 11 43 9 17 20 34 92 12
3 Proc_Name 8_1 8_2 24_3 24_1 c_1trial 14_2 14_1 28_1 24_2 8_3
4 Quantity4 75 41 62 12 7 102 -17 496 17 -89
5 Quantity5 0.8 -36 -11 -23 -4 56 12 -84 64 30
6 Quantity6 0.4 0.3 0.5 0.5 0.4 0.6 0.8 0.5 0.5 0.1
7 TimeStamp 11/31/2019 11:15 11/31/2019 16:50 12/15/2020 21:45 12/15/2020 07:01 07/08/2019 05:11 07/08/2019 21:04 07/08/2019 10:54 12/15/2020 01:36 12/15/2020 11:15 11/31/2019 21:33
The order of the columns is different than what I required, but that is a minor problem.
The real problem with this approach is using loops.
I wish there was a better way to do this using some built-in Pandas functionality. If you have a better solution, please post it. thank you.
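One possible way to avoid the nested loops, as a sketch on toy stand-in frames (not the question's data): key each processed column by a tuple of its values, then look up each raw column's data rows in that dict. The names N_HEADER, by_values and proc_name are made up for illustration, and the approach assumes the values compare equal as-is (for example ints on both sides, not ints versus strings):

```python
import pandas as pd

# Toy stand-ins: 2 header rows followed by 3 data rows in the raw frame.
df_raw = pd.DataFrame({
    "Tr_id": ["Quantity1", "TimeStamp", None, None, None],
    "run #1": [1, "07/08/2019", 1, 2, 3],
    "run #2": [2, "07/09/2019", 6, 5, 3],
})
df_processed = pd.DataFrame({"b_proc": [6, 5, 3], "a_proc": [1, 2, 3]})

N_HEADER = 2  # number of header rows in the raw frame (7 in the question)

# Index each processed column by its values so a lookup is one dict access.
by_values = {tuple(df_processed[c]): c for c in df_processed.columns}

# Data rows of the raw frame, without the label column.
raw_data = df_raw.iloc[N_HEADER:].drop(columns="Tr_id")
proc_name = {
    col: by_values.get(tuple(raw_data[col]))  # None when nothing matches
    for col in raw_data.columns
}

# Header rows plus one extra row holding the matched processed names.
result = df_raw.iloc[:N_HEADER].copy()
result.loc[len(result)] = ["Proc_Name"] + [proc_name[c] for c in raw_data.columns]
print(result)
```

This keeps the raw column order, and the Proc_Name row can be moved with reindex as in the loop version. If two processed columns hold identical values, the dict keeps only the last one, so that case would need separate handling.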
