How to calculate cumulative groupby counts in Pandas with point in time? - python

I have a df that contains multiple weekly snapshots of JIRA tickets. I want to calculate the YTD counts of tickets.
The df looks like this:
pointInTime ticketId
2008-01-01 111
2008-01-01 222
2008-01-01 333
2008-01-07 444
2008-01-07 555
2008-01-07 666
2008-01-14 777
2008-01-14 888
2008-01-14 999
So if I do df.groupby(['pointInTime'])['ticketId'].count() I can get the count of IDs in every snapshot. But what I want to achieve is to calculate the cumulative sum,
and end up with a df that looks like this:
pointInTime ticketId cumCount
2008-01-01 111 3
2008-01-01 222 3
2008-01-01 333 3
2008-01-07 444 6
2008-01-07 555 6
2008-01-07 666 6
2008-01-14 777 9
2008-01-14 888 9
2008-01-14 999 9
So for 2008-01-07, the ticket count would be the count for 2008-01-07 plus the count for 2008-01-01.
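For reference, here is a minimal sketch that rebuilds this sample frame and the per-snapshot counts (the column names are taken from the table above; the construction itself is my own and not part of the original question):
import pandas as pd

df = pd.DataFrame({
    'pointInTime': ['2008-01-01'] * 3 + ['2008-01-07'] * 3 + ['2008-01-14'] * 3,
    'ticketId': [111, 222, 333, 444, 555, 666, 777, 888, 999],
})
# per-snapshot counts are 3, 3, 3, so the running totals should be 3, 6, 9
print(df.groupby('pointInTime')['ticketId'].count())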

Use GroupBy.count and cumsum, then map the result back to "pointInTime":
df['cumCount'] = (
    df['pointInTime'].map(df.groupby('pointInTime')['ticketId'].count().cumsum()))
df
pointInTime ticketId cumCount
0 2008-01-01 111 3
1 2008-01-01 222 3
2 2008-01-01 333 3
3 2008-01-07 444 6
4 2008-01-07 555 6
5 2008-01-07 666 6
6 2008-01-14 777 9
7 2008-01-14 888 9
8 2008-01-14 999 9
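To see what gets mapped, the intermediate Series (count per snapshot, then cumsum) looks like this; a quick sketch reusing the df above:
counts = df.groupby('pointInTime')['ticketId'].count().cumsum()
print(counts)
# pointInTime
# 2008-01-01    3
# 2008-01-07    6
# 2008-01-14    9
# Name: ticketId, dtype: int64
df['pointInTime'].map(counts) then simply looks up each row's snapshot in this table.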

I am using value_counts:
df.pointInTime.map(df.pointInTime.value_counts().sort_index().cumsum())
Out[207]:
0 3
1 3
2 3
3 6
4 6
5 6
6 9
7 9
8 9
Name: pointInTime, dtype: int64
Or
pd.Series(np.arange(len(df))+1,index=df.index).groupby(df['pointInTime']).transform('last')
Out[216]:
0 3
1 3
2 3
3 6
4 6
5 6
6 9
7 9
8 9
dtype: int32

Here's an approach: transform with the group size, then multiply by the 1-based codes from pd.factorize on pointInTime:
df['cumCount'] = (df.groupby('pointInTime').ticketId
                    .transform('size')
                    .mul(pd.factorize(df.pointInTime)[0] + 1))
pointInTime ticketId cumCount
0 2008-01-01 111 3
1 2008-01-01 222 3
2 2008-01-01 333 3
3 2008-01-07 444 6
4 2008-01-07 555 6
5 2008-01-07 666 6
6 2008-01-14 777 9
7 2008-01-14 888 9
8 2008-01-14 999 9
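For clarity, pd.factorize assigns each distinct pointInTime an integer code in order of first appearance, so the codes plus one act as 1-based snapshot positions; a quick hedged check on the sample data (note that multiplying the group size by the position only matches the running total because every snapshot here has the same number of rows):
codes, uniques = pd.factorize(df.pointInTime)
print(codes + 1)
# [1 1 1 2 2 2 3 3 3]  -> group size 3 times position 1, 2, 3 gives 3, 6, 9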

Related

Create new column with largest number indexes based on values of another column

I have a DataFrame with two columns: 'goods name' and their 'overall sales'. I need to make another column that ranks the sales from 1, 2, 3..., where 1 is the largest number, 2 the second largest, and so on.
Hope you can help me.
My dataframe:
lst = [['Keyboard1', 1860], ['Keyboard2', 1650], ['Keyboard3', 900], ['Keyboard4', 1230], ['Keyboard5', 1150], ['Keyboard6', 1345],
['Mouse1', 3100], ['Mouse2', 2900], ['Mouse3', 3050], ['Mouse4', 2750], ['Mouse5', 4100], ['Mouse6', 3910]]
df = pd.DataFrame(lst, columns = ['Goods', 'Sales'])
Goods Sales
0 Keyboard1 1860
1 Keyboard2 1650
2 Keyboard3 900
3 Keyboard4 1230
4 Keyboard5 1150
5 Keyboard6 1345
6 Mouse1 3100
7 Mouse2 2900
8 Mouse3 3050
9 Mouse4 2750
10 Mouse5 4100
11 Mouse6 3910
I'm trying to use this code:
import pandas as pd
import numpy as np
df = df.sort_values('Sales', ascending = False)
df['Largest'] = np.arange(len(df))+1
But I get the ranking across all goods; I need the ranking for each type of good separately. My result:
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 7
1 Keyboard2 1650 8
5 Keyboard6 1345 9
3 Keyboard4 1230 10
4 Keyboard5 1150 11
2 Keyboard3 900 12
Here is the output I need:
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
Just do:
# remove the trailing digits to get the product group
df['goods_group'] = df['Goods'].str.replace(r'\d+$', '', regex=True)
# sort by the new column and sales
df = df.sort_values(['goods_group', 'Sales'], ascending=False)
# create largest column
df['largest'] = df.groupby('goods_group').cumcount() + 1
# drop the helper column
res = df.drop(columns='goods_group')
print(res)
Output
Goods Sales largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
Try adding these lines to the end of the code:
df['new'] = df['Goods'].str[:-1]
df['Largest'] = df.groupby('new').cumcount() + 1
df = df.drop('new', axis=1)
print(df)
Output:
Goods Sales new Largest
10 Mouse5 4100 Mouse 1
11 Mouse6 3910 Mouse 2
6 Mouse1 3100 Mouse 3
8 Mouse3 3050 Mouse 4
7 Mouse2 2900 Mouse 5
9 Mouse4 2750 Mouse 6
0 Keyboard1 1860 Keyboard 1
1 Keyboard2 1650 Keyboard 2
5 Keyboard6 1345 Keyboard 3
3 Keyboard4 1230 Keyboard 4
4 Keyboard5 1150 Keyboard 5
2 Keyboard3 900 Keyboard 6
You could groupby Goods without the digits:
>>> df = df.sort_values('Sales', ascending=False)
>>> df
Goods Sales
10 Mouse5 4100
11 Mouse6 3910
6 Mouse1 3100
8 Mouse3 3050
7 Mouse2 2900
9 Mouse4 2750
0 Keyboard1 1860
1 Keyboard2 1650
5 Keyboard6 1345
3 Keyboard4 1230
4 Keyboard5 1150
2 Keyboard3 900
>>> df['Largest'] = df.groupby(df['Goods'].replace(r'\d+', '', regex=True)).cumcount() + 1
>>> df
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
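For completeness, a hedged alternative sketch of my own (not one of the answers above) that avoids sorting the whole frame by ranking Sales inside each extracted group:
group = df['Goods'].str.replace(r'\d+$', '', regex=True)   # 'Keyboard', 'Mouse'
df['Largest'] = (df.groupby(group)['Sales']
                   .rank(ascending=False, method='first')
                   .astype(int))
rank(ascending=False) gives 1 to the largest Sales within each group, which matches the requested output.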

Mark duplicates based on time difference between successive rows

There are duplicated transactions in a bank dataframe (DF). ID is the customer ID. A duplicated transaction is a multi-swipe, where a vendor accidentally charges a customer's card multiple times within a short time span (2 minutes here).
DF = pd.DataFrame({
    'ID': ['111', '111', '111', '111', '222', '222', '222', '333', '333', '333', '333', '111'],
    'Dollar': [1, 3, 1, 10, 25, 8, 25, 9, 20, 9, 9, 10],
    'transactionDateTime': ['2016-01-08 19:04:50', '2016-01-29 19:03:55', '2016-01-08 19:05:50',
                            '2016-01-08 20:08:50', '2016-01-08 19:04:50', '2016-02-08 19:04:50',
                            '2016-03-08 19:04:50', '2016-01-08 19:04:50', '2016-03-08 19:05:53',
                            '2016-01-08 19:03:20', '2016-01-08 19:02:15', '2016-02-08 20:08:50']})
DF['transactionDateTime'] = pd.to_datetime(DF['transactionDateTime'])
ID Dollar transactionDateTime
0 111 1 2016-01-08 19:04:50
1 111 3 2016-01-29 19:03:55
2 111 1 2016-01-08 19:05:50
3 111 10 2016-01-08 20:08:50
4 222 25 2016-01-08 19:04:50
5 222 8 2016-02-08 19:04:50
6 222 25 2016-03-08 19:04:50
7 333 9 2016-01-08 19:04:50
8 333 20 2016-03-08 19:05:53
9 333 9 2016-01-08 19:03:20
10 333 9 2016-01-08 19:02:15
11 111 10 2016-02-08 20:08:50
I want to add a column to my dataframe that flags the duplicated transactions (the dollar amount for the same customer ID should be the same, and the transaction times should be less than 2 minutes apart). Please consider the first transaction to be "normal".
ID Dollar transactionDateTime Duplicated?
0 111 1 2016-01-08 19:04:50 No
1 111 3 2016-01-29 19:03:55 No
2 111 1 2016-01-08 19:05:50 Yes
3 111 10 2016-01-08 20:08:50 No
4 222 25 2016-01-08 19:04:50 No
5 222 8 2016-02-08 19:04:50 No
6 222 25 2016-03-08 19:04:50 No
7 333 9 2016-01-08 19:04:50 Yes
8 333 20 2016-03-08 19:05:53 No
9 333 9 2016-01-08 19:03:20 Yes
10 333 9 2016-01-08 19:02:15 No
11 111 10 2016-02-08 20:08:50 No
IIUC, you can groupby and diff to check whether the difference between successive transactions is less than 120 seconds:
DF['Duplicated?'] = (DF.sort_values(['transactionDateTime'])
                       .groupby(['ID', 'Dollar'], sort=False)['transactionDateTime']
                       .diff()
                       .dt.total_seconds()
                       .lt(120))
DF
ID Dollar transactionDateTime Duplicated?
0 111 1 2016-01-08 19:04:50 False
1 111 3 2016-01-29 19:03:55 False
2 111 1 2016-01-08 19:05:50 True
3 111 10 2016-01-08 20:08:50 False
4 222 25 2016-01-08 19:04:50 False
5 222 8 2016-02-08 19:04:50 False
6 222 25 2016-03-08 19:04:50 False
7 333 9 2016-01-08 19:04:50 True
8 333 20 2016-03-08 19:05:53 False
9 333 9 2016-01-08 19:03:20 True
10 333 9 2016-01-08 19:02:15 False
11 111 10 2016-02-08 20:08:50 False
Note that your data isn't sorted, so you must sort it first to get a meaningful result.
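To see why sorting matters, here is a small hedged check of my own using the DF defined in the question: in file order the gaps for customer 333 come out negative, so a "less than 120 seconds" test on unsorted data is meaningless.
print(DF[DF['ID'] == '333']['transactionDateTime'].diff())        # includes negative gaps
sorted_333 = DF[DF['ID'] == '333'].sort_values('transactionDateTime')
print(sorted_333['transactionDateTime'].diff())                   # NaT, 65 s, 90 s, ~60 days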
You can use:
m = (DF.groupby('ID')['transactionDateTime'].diff() / np.timedelta64(1, 'm')).le(2)
DF['Duplicated?'] = np.where(DF.Dollar.duplicated() & m, 'Yes', 'No')
print(DF)
ID Dollar transactionDateTime Duplicated?
0 111 1 2016-01-08 19:04:50 No
1 111 3 2016-01-29 19:03:55 No
2 111 1 2016-01-08 19:05:50 Yes
3 111 10 2016-01-08 20:08:50 No
4 222 25 2016-01-08 19:04:50 No
5 222 8 2016-02-08 19:04:50 No
6 222 25 2016-03-08 19:04:50 No
7 333 9 2016-01-08 19:04:50 No
8 333 20 2016-03-08 19:05:53 No
9 333 9 2016-01-08 19:03:20 Yes
10 333 9 2016-01-08 19:02:15 Yes
11 111 10 2016-02-08 20:08:50 No
We can first mark the duplicate payments in your Dollar column. Then mark per customer if the difference is less than 2 minutes:
DF.sort_values(['ID', 'transactionDateTime'], inplace=True)
m1 = DF.groupby('ID', sort=False)['Dollar'].apply(lambda x: x.duplicated())
m2 = DF.groupby('ID', sort=False)['transactionDateTime'].diff() <= pd.Timedelta(2, unit='minutes')
DF['Duplicated?'] = np.where(m1 & m2, 'Yes', 'No')
ID Dollar transactionDateTime Duplicated?
0 111 1 2016-01-08 19:04:50 No
1 111 1 2016-01-08 19:05:50 Yes
2 111 10 2016-01-08 20:08:50 No
3 111 3 2016-01-29 19:03:55 No
4 111 10 2016-02-08 20:08:50 No
5 222 25 2016-01-08 19:04:50 No
6 222 8 2016-02-08 19:04:50 No
7 222 25 2016-03-08 19:04:50 No
8 333 9 2016-01-08 19:02:15 No
9 333 9 2016-01-08 19:03:20 Yes
10 333 9 2016-01-08 19:04:50 Yes
11 333 20 2016-03-08 19:05:53 No
I create a pd.Timedelta(minutes=2) to compare against the diff():
m2 = pd.Timedelta(minutes=2)
DF['dup'] = (DF.sort_values('transactionDateTime')
               .groupby(['Dollar', 'ID']).transactionDateTime
               .diff().abs().le(m2).astype(int))
Out[272]:
Dollar ID transactionDateTime dup
0 1 111 2016-01-08 19:04:50 0
1 3 111 2016-01-29 19:03:55 0
2 1 111 2016-01-08 19:05:50 1
3 10 111 2016-01-08 20:08:50 0
4 25 222 2016-01-08 19:04:50 0
5 8 222 2016-02-08 19:04:50 0
6 25 222 2016-03-08 19:04:50 0
7 9 333 2016-01-08 19:04:50 1
8 20 333 2016-03-08 19:05:53 0
9 9 333 2016-01-08 19:03:20 1
10 9 333 2016-01-08 19:02:15 0
11 10 111 2016-02-08 20:08:50 0
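Putting the pieces together, a hedged end-to-end sketch of my own that combines the ideas above (sort, diff per (ID, Dollar), 2-minute threshold) and produces the Yes/No column the question asks for; the first transaction of each group has a NaT gap, which compares as False and therefore stays 'No':
gap = (DF.sort_values('transactionDateTime')
         .groupby(['ID', 'Dollar'])['transactionDateTime']
         .diff())
DF['Duplicated?'] = gap.lt(pd.Timedelta(minutes=2)).map({True: 'Yes', False: 'No'})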

How to compress rows after groupby in pandas

I have performed a groupby on my dataframe.
grouped = data_df.groupby(['Cluster','Visit Number Final'])['Visitor_ID'].count()
I am getting the below output:
data_df.groupby(['Cluster','Visit Number Final'])['Visitor_ID'].count()
Out[81]:
Cluster Visit Number Final
0 1 21846
2 1485
3 299
4 95
5 24
6 8
7 3
1 1 33600
2 2283
3 404
4 117
5 34
6 7
2 1 5858
2 311
3 55
4 14
5 6
6 3
7 1
3 1 19699
2 1101
3 214
4 78
5 14
6 8
7 3
4 1 10086
2 344
3 59
4 14
5 3
6 1
Name: Visitor_ID, dtype: int64
Now I want to compress the rows whose Visit Number Final is > 3 (add a new row that holds the sum for Visit Number Final 4, 5, 6, ...). I am trying groupby.filter but not getting the expected output.
My final output should look like
Cluster Visit Number Final
0 1 21846
2 1485
3 299
>=4 130
1 1 33600
2 2283
3 404
>=4 158
2 1 5858
2 311
3 55
>=4 24
3 1 19699
2 1101
3 214
>=4 103
4 1 10086
2 344
3 59
>=4 18
The easiest way is to replace the 'Visit Number Final' values bigger than 3 before you group the dataframe:
data_df.loc[data_df['Visit Number Final'] > 3, 'Visit Number Final'] = '>=4'
data_df.groupby(['Cluster','Visit Number Final'])['Visitor_ID'].count()
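A small hedged variant of the same idea (my own sketch, not part of the original answer): casting the key to string first keeps the group labels in a single dtype, so the grouping never has to compare ints with the '>=4' string.
key = data_df['Visit Number Final'].astype(str).mask(data_df['Visit Number Final'] > 3, '>=4')
data_df.groupby(['Cluster', key])['Visitor_ID'].count()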
Try this (here df is assumed to be the grouped result above, as a DataFrame with the count column named 'Number Final'):
visit_val = df.index.get_level_values(1)
grp = np.where(visit_val > 3, '>=4', visit_val)
(df.groupby(['Cluster', grp])['Number Final'].sum()
   .reset_index().rename(columns={'level_1': 'Visit'}))
Output:
Cluster Visit Number Final
0 0 1 21846
1 0 2 1485
2 0 3 299
3 0 >=4 130
4 1 1 33600
5 1 2 2283
6 1 3 404
7 1 >=4 158
8 2 1 5858
9 2 2 311
10 2 3 55
11 2 >=4 24
12 3 1 19699
13 3 2 1101
14 3 3 214
15 3 >=4 103
16 4 1 10086
17 4 2 344
18 4 3 59
19 4 >=4 18
Or, to get a dataframe that keeps the MultiIndex:
(df.groupby(['Cluster', grp])['Number Final'].sum()
   .rename_axis(['Cluster', 'Visit']).to_frame())
Output:
Number Final
Cluster Visit
0 1 21846
2 1485
3 299
>=4 130
1 1 33600
2 2283
3 404
>=4 158
2 1 5858
2 311
3 55
>=4 24
3 1 19699
2 1101
3 214
>=4 103
4 1 10086
2 344
3 59
>=4 18

Pandas: sum values in some column

I need to group elements and sum one column.
member_id event_path event_duration
0 111 vk.com 1
1 111 twitter.com 4
2 111 facebook.com 56
3 111 vk.com 23
4 222 vesti.ru 6
5 222 facebook.com 23
6 222 vk.com 56
7 333 avito.ru 8
8 333 avito.ru 4
9 444 mail.ru 7
10 444 vk.com 20
11 444 yandex.ru 40
12 111 vk.com 10
13 222 vk.com 20
And I want to unify member_id and event_path and sum event_duration.
Desired output:
member_id event_path event_duration
0 111 vk.com 34
1 111 twitter.com 4
2 111 facebook.com 56
4 222 vesti.ru 6
5 222 facebook.com 23
6 222 vk.com 76
7 333 avito.ru 12
9 444 mail.ru 7
10 444 vk.com 20
11 444 yandex.ru 40
I use
df['event_duration'] = df.groupby(['member_id', 'event_path'])['event_duration'].transform('sum')
but I get
member_id event_path event_duration
0 111 vk.com 34
1 111 twitter.com 4
2 111 facebook.com 56
3 111 vk.com 34
4 222 vesti.ru 6
5 222 facebook.com 23
6 222 vk.com 76
7 333 avito.ru 12
8 333 avito.ru 12
9 444 mail.ru 7
10 444 vk.com 20
11 444 yandex.ru 40
12 111 vk.com 34
13 222 vk.com 76
What am I doing wrong?
You need groupby with parameters sort=False and as_index=False with aggregation sum:
df = df.groupby(['member_id','event_path'],sort=False,as_index=False)['event_duration'].sum()
print (df)
member_id event_path event_duration
0 111 vk.com 34
1 111 twitter.com 4
2 111 facebook.com 56
3 222 vesti.ru 6
4 222 facebook.com 23
5 222 vk.com 76
6 333 avito.ru 12
7 444 mail.ru 7
8 444 vk.com 20
9 444 yandex.ru 40
Another possible solution is to add reset_index:
df = df.groupby(['member_id', 'event_path'],sort=False)['event_duration'].sum().reset_index()
print (df)
member_id event_path event_duration
0 111 vk.com 34
1 111 twitter.com 4
2 111 facebook.com 56
3 222 vesti.ru 6
4 222 facebook.com 23
5 222 vk.com 76
6 333 avito.ru 12
7 444 mail.ru 7
8 444 vk.com 20
9 444 yandex.ru 40
The transform function is used to add an aggregated calculation back to the original df as a new column.
What you are doing wrong is assigning the result back to a column of the original dataframe: transform broadcasts the group sum to every row, so the duplicate rows stay instead of being collapsed into one row per group.
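A minimal hedged sketch of the difference, using the member_id data above: transform keeps one row per original row, while an aggregating sum collapses each group to a single row, which is what the question asks for.
# transform: same length as df, the group total is repeated on every matching row
df.groupby(['member_id', 'event_path'])['event_duration'].transform('sum')
# aggregation: one row per (member_id, event_path) pair
df.groupby(['member_id', 'event_path'], as_index=False)['event_duration'].sum()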

Pandas: union duplicate strings

I have a dataframe
ID url date active_seconds
111 vk.com 12.01.2016 5
111 facebook.com 12.01.2016 4
111 facebook.com 12.01.2016 3
111 twitter.com 12.01.2016 12
222 vk.com 12.01.2016 8
222 twitter.com 12.01.2016 34
111 facebook.com 12.01.2016 5
and I need to get
ID url date active_seconds
111 vk.com 12.01.2016 5
111 facebook.com 12.01.2016 7
111 twitter.com 12.01.2016 12
222 vk.com 12.01.2016 8
222 twitter.com 12.01.2016 34
111 facebook.com 12.01.2016 5
If I try
df.groupby(['ID', 'url'])['active_seconds'].sum()
it merges all matching rows, including the non-consecutive ones. What should I do to get the desired output?
(s != s.shift()).cumsum() is a typical way to identify groups of contiguous identifiers
pd.DataFrame.assign is a convenient way to add a new column to a copy of a dataframe and chain more methods
pivot_table allows us to reconfigure our table and aggregate
args - this is a style preference of mine to keep code cleaner looking. I'll pass these arguments to pivot_table via *args
reset_index * 2 to clean up and get to final result
args = ('active_seconds', ['g', 'ID', 'url', 'date'], None, 'sum')
df.assign(g=df.ID.ne(df.ID.shift()).cumsum()).pivot_table(*args) \
.reset_index([1, 2, 3]).reset_index(drop=True)
ID url date active_seconds
0 111 facebook.com 12.01.2016 7
1 111 twitter.com 12.01.2016 12
2 111 vk.com 12.01.2016 5
3 222 twitter.com 12.01.2016 34
4 222 vk.com 12.01.2016 8
5 111 facebook.com 12.01.2016 5
Solution 1 - cumsum by column url only:
You need to group by a custom Series created by a cumsum over a boolean mask, but the url column then needs to be aggregated with first. Then remove the url level with reset_index and finally reorder the columns with reindex:
g = (df.url != df.url.shift()).cumsum()
print (g)
0 1
1 2
2 2
3 3
4 4
5 5
6 6
Name: url, dtype: int32
g = (df.url != df.url.shift()).cumsum()
#another solution with ne
#g = df.url.ne(df.url.shift()).cumsum()
print (df.groupby([df.ID, df.date, g], sort=False).agg({'active_seconds':'sum', 'url':'first'})
         .reset_index(level='url', drop=True)
         .reset_index()
         .reindex(columns=df.columns))
ID url date active_seconds
0 111 vk.com 12.01.2016 5
1 111 facebook.com 12.01.2016 7
2 111 twitter.com 12.01.2016 12
3 222 vk.com 12.01.2016 8
4 222 twitter.com 12.01.2016 34
5 111 facebook.com 12.01.2016 5
g = (df.url != df.url.shift()).cumsum().rename('tmp')
print (g)
0 1
1 2
2 2
3 3
4 4
5 5
6 6
Name: tmp, dtype: int32
print (df.groupby([df.ID, df.url, df.date, g], sort=False)['active_seconds']
         .sum()
         .reset_index(level='tmp', drop=True)
         .reset_index())
ID url date active_seconds
0 111 vk.com 12.01.2016 5
1 111 facebook.com 12.01.2016 7
2 111 twitter.com 12.01.2016 12
3 222 vk.com 12.01.2016 8
4 222 twitter.com 12.01.2016 34
5 111 facebook.com 12.01.2016 5
Solution 2 - cumsum by columns ID and url:
g = df[['ID','url']].ne(df[['ID','url']].shift()).cumsum()
print (g)
ID url
0 1 1
1 1 2
2 1 2
3 1 3
4 2 4
5 2 5
6 3 6
print (df.groupby([g.ID, df.date, g.url], sort=False)
         .agg({'active_seconds':'sum', 'url':'first'})
         .reset_index(level='url', drop=True)
         .reset_index()
         .reindex(columns=df.columns))
ID url date active_seconds
0 1 vk.com 12.01.2016 5
1 1 facebook.com 12.01.2016 7
2 1 twitter.com 12.01.2016 12
3 2 vk.com 12.01.2016 8
4 2 twitter.com 12.01.2016 34
5 3 facebook.com 12.01.2016 5
And a solution that adds the column df.url, but it is necessary to rename the columns in the helper df:
g = df[['ID','url']].ne(df[['ID','url']].shift()).cumsum()
g.columns = g.columns + '1'
print (g)
ID1 url1
0 1 1
1 1 2
2 1 2
3 1 3
4 2 4
5 2 5
6 3 6
print (df.groupby([df.ID, df.url, df.date, g.ID1, g.url1], sort=False)['active_seconds']
         .sum()
         .reset_index(level=['ID1','url1'], drop=True)
         .reset_index())
ID url date active_seconds
0 111 vk.com 12.01.2016 5
1 111 facebook.com 12.01.2016 7
2 111 twitter.com 12.01.2016 12
3 222 vk.com 12.01.2016 8
4 222 twitter.com 12.01.2016 34
5 111 facebook.com 12.01.2016 5
Timings:
Similar solutions, but pivot_table is slower than groupby:
In [180]: %timeit (df.assign(g=df.ID.ne(df.ID.shift()).cumsum()).pivot_table('active_seconds', ['g', 'ID', 'url', 'date'], None, 'sum').reset_index([1, 2, 3]).reset_index(drop=True))
100 loops, best of 3: 5.02 ms per loop
In [181]: %timeit (df.groupby([df.ID, df.url, df.date, (df.url != df.url.shift()).cumsum().rename('tmp')], sort=False)['active_seconds'].sum().reset_index(level='tmp', drop=True).reset_index())
100 loops, best of 3: 3.62 ms per loop
It looks like you want a cumsum():
In [195]: df.groupby(['ID', 'url'])['active_seconds'].cumsum()
Out[195]:
0 5
1 4
2 7
3 12
4 8
5 34
6 12
Name: active_seconds, dtype: int64
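For reference, a compact hedged sketch of my own distilling the consecutive-block idea above (it assumes the df shown in the question): rows only merge when both ID and url repeat back to back.
block = df[['ID', 'url']].ne(df[['ID', 'url']].shift()).any(axis=1).cumsum()
(df.groupby(block)
   .agg({'ID': 'first', 'url': 'first', 'date': 'first', 'active_seconds': 'sum'})
   .reset_index(drop=True))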
