I am trying to track cumulative sums of the 'Value' column, where a new sum should begin every time a 1 appears in the 'Signal' column.
So in the table below I need three cumulative sums, starting at index values 3, 6, and 9, each ending at index value 11:
Index  Value  Signal
    0      3       0
    1      8       0
    2      8       0
    3      7       1
    4      9       0
    5     10       0
    6     14       1
    7     10       0
    8     10       0
    9      4       1
   10     10       0
   11     10       0
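For reproducibility, a minimal construction of this frame ('Index' kept as a regular column, since the merge in the answer below uses left_on='Index'):

import pandas as pd

df = pd.DataFrame({
    'Index':  range(12),
    'Value':  [3, 8, 8, 7, 9, 10, 14, 10, 10, 4, 10, 10],
    'Signal': [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
})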
What would be a way to do it?
Expected Output:
Index  Value  Signal  Cumsum_1  Cumsum_2  Cumsum_3
    0      3       0         0         0         0
    1      8       0         0         0         0
    2      8       0         0         0         0
    3      7       1         7         0         0
    4      9       0        16         0         0
    5     10       0        26         0         0
    6     14       1        40        14         0
    7     10       0        50        24         0
    8     10       0        60        34         0
    9      4       1        64        38         4
   10     10       0        74        48        14
   11     10       0        84        58        24
You can pivot, bfill, then cumsum:
df.merge(df.assign(id=df['Signal'].cumsum().add(1))
           .pivot(index='Index', columns='id', values='Value')
           .bfill(axis=1).fillna(0, downcast='infer')
           .cumsum()
           .add_prefix('cumsum'),
         left_on='Index', right_index=True
         )
Output:
Index Value Signal cumsum1 cumsum2 cumsum3 cumsum4
0 0 3 0 3 0 0 0
1 1 8 0 11 0 0 0
2 2 8 0 19 0 0 0
3 3 7 1 26 7 0 0
4 4 9 0 35 16 0 0
5 5 10 0 45 26 0 0
6 6 14 1 59 40 14 0
7 7 10 0 69 50 24 0
8 8 10 0 79 60 34 0
9 9 4 1 83 64 38 4
10 10 10 0 93 74 48 14
11 11 10 0 103 84 58 24
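Note that cumsum1 also covers the rows before the first signal. To match the expected output exactly (only Cumsum_1 to Cumsum_3), here is a sketch adapting the code above: start the block id at 0 so the pre-signal rows land in column 0, then drop that column before merging:

out = (df.assign(id=df['Signal'].cumsum())
         .pivot(index='Index', columns='id', values='Value')
         .bfill(axis=1).fillna(0, downcast='infer')
         .cumsum()
         .drop(columns=0)   # column 0 only covers rows before the first signal
         .add_prefix('Cumsum_'))
df.merge(out, left_on='Index', right_index=True)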
Older answer:
IIUC, you can use groupby.cumsum:
df['cumsum'] = df.groupby(df['Signal'].cumsum())['Value'].cumsum()
Output:
Index Value Signal cumsum
0 0 3 0 3
1 1 8 0 11
2 2 8 0 19
3 3 7 1 7
4 4 9 0 16
5 5 10 0 26
6 6 14 1 14
7 7 10 0 24
8 8 10 0 34
9 9 4 1 4
10 10 10 0 14
11 11 10 0 24
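To see why this works: the grouper df['Signal'].cumsum() labels each block of rows, incrementing at every 1, so each cumulative sum restarts exactly on a signal row:

print(df['Signal'].cumsum().tolist())
# [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]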
Related
This question is based on this thread.
I have the following dataframe:
diff_hours stage sensor
0 0 20
0 0 21
0 0 21
1 0 22
5 0 21
0 0 22
0 1 20
7 1 23
0 1 24
0 3 25
0 3 28
6 0 21
0 0 22
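For reproducibility, the frame can be built with:

import pandas as pd

df = pd.DataFrame({
    'diff_hours': [0, 0, 0, 1, 5, 0, 0, 7, 0, 0, 0, 6, 0],
    'stage':      [0, 0, 0, 0, 0, 0, 1, 1, 1, 3, 3, 0, 0],
    'sensor':     [20, 21, 21, 22, 21, 22, 20, 23, 24, 25, 28, 21, 22],
})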
I need to calculate an accumulated value of diff_hours while stage is growing. When stage drops back to 0, the accumulated value acc_hours should restart at 0, even though diff_hours might not be equal to 0.
The proposed solution is this one:
blocks = df['stage'].diff().lt(0).cumsum()
df['acc_hours'] = df['diff_hours'].groupby(blocks).cumsum()
Output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 6
12 0 0 22 6
In row 11 the value of acc_hours is 6. I need it to restart at 0, because stage dropped from 3 back to 0 in row 11.
The expected output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 0
12 0 0 22 0
How can I implement this logic?
The expected output is a bit ambiguous, so what about a simple mask? Masking only the value on the row of the change (the outputs below appear to include two extra rows, 13 and 14, appended to illustrate the behavior after the reset):
m = df['stage'].diff().lt(0)
df['acc_hours'] = (df.groupby(m.cumsum())
                     ['diff_hours'].cumsum()
                     .mask(m, 0)
                   )
Output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 0
12 0 0 22 6
13 3 0 22 9
14 0 0 22 9
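For reference, with the 13-row frame from the question, m flags only row 11 (the drop from 3 to 0), and m.cumsum() splits the frame into two blocks at that point:

print(m.astype(int).tolist())
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]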
Or, ignoring the value completely by masking before the groupby, so that the masked diff_hours no longer contributes to later sums (compare rows 12-14 with the previous output):
m = df['stage'].diff().lt(0)
df['acc_hours'] = (df['diff_hours'].mask(m, 0)
                     .groupby(m.cumsum())
                     .cumsum()
                   )
Output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 0
12 0 0 22 0
13 3 0 22 3
14 0 0 22 3
I want to assign a unique id based on runs of the same value in a column. For example, I have a table like this:
df = pd.DataFrame({'A': [0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,1,1,1]})
Eventually I would like my output table to look like this:
     A  id
1    0   1
2    0   1
3    0   1
4    0   1
5    0   1
6    0   1
7    1   2
8    1   2
9    1   2
10   1   2
11   1   2
12   1   2
13   0   3
14   0   3
15   0   3
16   0   3
17   0   3
18   0   3
19   1   4
20   1   4
21   1   4
22   0   5
23   0   5
24   0   5
25   0   5
26   1   6
27   1   6
28   1   6
I tried df.groupby(['A'], sort=False).ngroup() + 1 but it's not giving me what I want, since ngroup numbers the groups by value rather than by consecutive runs. Any help and guidance will be appreciated, thanks!
diff + cumsum:
df['id'] = df.A.diff().ne(0).cumsum()
df
A id
0 0 1
1 0 1
2 0 1
3 0 1
4 0 1
5 0 1
6 0 1
7 1 2
8 1 2
9 1 2
10 1 2
11 1 2
12 1 2
13 0 3
14 0 3
15 0 3
16 0 3
17 0 3
18 0 3
19 1 4
20 1 4
21 1 4
22 0 5
23 0 5
24 0 5
25 0 5
26 1 6
27 1 6
28 1 6
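How it works: diff() is non-zero (or NaN on the first row) exactly where A changes, so ne(0) marks the start of each run and cumsum() numbers the runs:

print(df.A.diff().ne(0).astype(int).tolist()[:9])
# [1, 0, 0, 0, 0, 0, 0, 1, 0]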
Alternatively, the third-party pdrle package (run-length encoding helpers for pandas) can do this directly:
import pdrle
df["id"] = pdrle.get_id(df["A"]) + 1
df
# A id
# 0 0 1
# 1 0 1
# 2 0 1
# 3 0 1
# 4 0 1
# 5 0 1
# 6 0 1
# 7 1 2
# 8 1 2
# 9 1 2
# 10 1 2
# 11 1 2
# 12 1 2
# 13 0 3
# 14 0 3
# 15 0 3
# 16 0 3
# 17 0 3
# 18 0 3
# 19 1 4
# 20 1 4
# 21 1 4
# 22 0 5
# 23 0 5
# 24 0 5
# 25 0 5
# 26 1 6
# 27 1 6
# 28 1 6
I would like a new column (not_ordered_in_STREET_x_before_my_car) that counts the None values in my DataFrame before the current row, grouped by x and sorted by x and y.
import numpy as np
import pandas as pd

x_start = 1
y_start = 1
size_city = 10

cars = pd.DataFrame({
    'x': np.repeat(np.arange(x_start, x_start + size_city), size_city),
    'y': np.tile(np.arange(y_start, y_start + size_city), size_city),
    'pizza_ordered': np.repeat([None, None, 1, 6, 3, 7, 5, None, 8, 9,
                                0, None, None, None, 4, None, 11, 12, 14, 15], 5),
})
The first four columns (including the index) are what I have; the fifth is the one I want.
x y pizza_ordered not_ordered_in_STREET_x_before_my_car
0 1 1 None 0
1 1 2 None 1
2 1 3 1 2
3 1 4 2 2
4 1 5 1 2
5 1 6 1 2
6 1 7 1 2
7 1 8 None 2
8 1 9 1 3
9 1 10 4 3
10 2 1 1 0
11 2 2 None 0
12 2 3 None 1
13 2 4 None 2
14 2 5 4 3
15 2 6 None 3
16 2 7 5 4
17 2 8 3 4
18 2 9 1 4
19 2 10 1 4
This is what I have tried, but it does not work.
cars = cars.sort_values(['x', 'y'])
cars['not_ordered_in_STREET_x_before_my_car'] = cars.where(cars['pizza_ordered'].isnull()).groupby(['x']).cumcount().add(1)
You can try:
cars["not_ordered_in_STREET_x_before_my_car"] = cars.groupby("x")[
"pizza_ordered"
].transform(lambda x: x.isna().cumsum().shift(1).fillna(0).astype(int
))
print(cars)
Prints:
x y pizza_ordered not_ordered_in_STREET_x_before_my_car
0 1 1 None 0
1 1 2 None 1
2 1 3 None 2
3 1 4 None 3
4 1 5 None 4
5 1 6 None 5
6 1 7 None 6
7 1 8 None 7
8 1 9 None 8
9 1 10 None 9
10 2 1 1 0
11 2 2 1 0
12 2 3 1 0
13 2 4 1 0
14 2 5 1 0
15 2 6 6 0
16 2 7 6 0
17 2 8 6 0
18 2 9 6 0
19 2 10 6 0
20 3 1 3 0
21 3 2 3 0
22 3 3 3 0
23 3 4 3 0
24 3 5 3 0
25 3 6 7 0
26 3 7 7 0
27 3 8 7 0
28 3 9 7 0
29 3 10 7 0
30 4 1 5 0
31 4 2 5 0
32 4 3 5 0
33 4 4 5 0
34 4 5 5 0
35 4 6 None 0
36 4 7 None 1
37 4 8 None 2
38 4 9 None 3
39 4 10 None 4
40 5 1 8 0
41 5 2 8 0
42 5 3 8 0
43 5 4 8 0
44 5 5 8 0
45 5 6 9 0
46 5 7 9 0
47 5 8 9 0
48 5 9 9 0
49 5 10 9 0
50 6 1 0 0
51 6 2 0 0
52 6 3 0 0
53 6 4 0 0
54 6 5 0 0
55 6 6 None 0
56 6 7 None 1
57 6 8 None 2
58 6 9 None 3
59 6 10 None 4
60 7 1 None 0
61 7 2 None 1
62 7 3 None 2
63 7 4 None 3
64 7 5 None 4
65 7 6 None 5
66 7 7 None 6
67 7 8 None 7
68 7 9 None 8
69 7 10 None 9
70 8 1 4 0
71 8 2 4 0
72 8 3 4 0
73 8 4 4 0
74 8 5 4 0
75 8 6 None 0
76 8 7 None 1
77 8 8 None 2
78 8 9 None 3
79 8 10 None 4
80 9 1 11 0
81 9 2 11 0
82 9 3 11 0
83 9 4 11 0
84 9 5 11 0
85 9 6 12 0
86 9 7 12 0
87 9 8 12 0
88 9 9 12 0
89 9 10 12 0
90 10 1 14 0
91 10 2 14 0
92 10 3 14 0
93 10 4 14 0
94 10 5 14 0
95 10 6 15 0
96 10 7 15 0
97 10 8 15 0
98 10 9 15 0
99 10 10 15 0
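An equivalent sketch without the lambda: compute the null flags once, take a cumulative sum within each street, then shift within each group so the current row is excluded:

isnull = cars['pizza_ordered'].isna()
cars['not_ordered_in_STREET_x_before_my_car'] = (
    isnull.groupby(cars['x']).cumsum()
          .groupby(cars['x']).shift(1, fill_value=0)
          .astype(int)
)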
A simpler global count (note that this does not reset per street and counts the current row as well):
cars['not_ordered_in_STREET_x_before_my_car'] = pd.isnull(cars['pizza_ordered']).cumsum()
I have a data frame, and within each id I need to group each run of consecutive timeatAcc values greater than 0 and place the run's sum at its last occurrence. My code is below:
import pandas as pd

data = {'id': [7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
               1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
        'timeatAcc': [0,0,0,0,0,0,0,0,1,1,1,0,0,1,1,0,0,1,1,1,
                      1,1,1,0,0,0,0,0,1,1,1,0,0,1,1,0,0,0,0,0]}
df = pd.DataFrame(data, columns=['id', 'timeatAcc'])

df['consecutive'] = (df['id']
                     .groupby((df['timeatAcc'] != df['timeatAcc'].shift()).cumsum())
                     .transform('size') * df['timeatAcc'])
print(df)
Any help is appreciated; thanks in advance.
Let's try groupby().diff():
# diff(-1) is "current minus next" within each id, so it equals 1
# exactly on the last 1 of a run that is followed by a 0
df['Occurences'] = df.groupby('id')['timeatAcc'].diff(-1).eq(1).astype(int)
Output:
id timeatAcc Occurences
0 7 0 0
1 7 0 0
2 7 0 0
3 7 0 0
4 7 0 0
5 7 0 0
6 7 0 0
7 7 0 0
8 7 1 0
9 7 1 0
10 7 1 1
11 7 0 0
12 7 0 0
13 7 1 0
14 7 1 1
15 7 0 0
16 7 0 0
17 7 1 0
18 7 1 0
19 7 1 0
20 1 1 0
21 1 1 0
22 1 1 1
23 1 0 0
24 1 0 0
25 1 0 0
26 1 0 0
27 1 0 0
28 1 1 0
29 1 1 0
30 1 1 1
31 1 0 0
32 1 0 0
33 1 1 0
34 1 1 1
35 1 0 0
36 1 0 0
37 1 0 0
38 1 0 0
39 1 0 0
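To see the marker in isolation (rows 8-11 of id 7 hold the run 1, 1, 1 followed by a 0):

d = df.groupby('id')['timeatAcc'].diff(-1)
print(d.iloc[8:12].tolist())
# [0.0, 0.0, 1.0, 0.0]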
Update: to get the sum instead of 1:
import numpy as np

df['Occurences'] = df.groupby(['id', df['timeatAcc'].eq(0).cumsum()])['timeatAcc'].transform('sum')
df['Occurences'] = np.where(df.groupby('id')['timeatAcc'].diff(-1).eq(1),
                            df['Occurences'], 0)
Output:
id timeatAcc Occurences
0 7 0 0
1 7 0 0
2 7 0 0
3 7 0 0
4 7 0 0
5 7 0 0
6 7 0 0
7 7 0 0
8 7 1 0
9 7 1 0
10 7 1 3
11 7 0 0
12 7 0 0
13 7 1 0
14 7 1 2
15 7 0 0
16 7 0 0
17 7 1 0
18 7 1 0
19 7 1 0
20 1 1 0
21 1 1 0
22 1 1 3
23 1 0 0
24 1 0 0
25 1 0 0
26 1 0 0
27 1 0 0
28 1 1 0
29 1 1 0
30 1 1 3
31 1 0 0
32 1 0 0
33 1 1 0
34 1 1 2
35 1 0 0
36 1 0 0
37 1 0 0
38 1 0 0
39 1 0 0
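The two steps can also be combined into one pass (a sketch): label each run of 1s, sum it, and keep the sum only where the run ends inside its id group:

runs = df['timeatAcc'].eq(0).cumsum()                  # zeros delimit the runs of 1s
sums = df.groupby(['id', runs])['timeatAcc'].transform('sum')
ends = df.groupby('id')['timeatAcc'].diff(-1).eq(1)    # last 1 of each run
df['Occurences'] = sums.where(ends, 0)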
I have a table which looks like this:
msno date num_25 num_50 num_75 num_985 num_100 num_unq \
0 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 20150513 0 0 0 0 1 1
1 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 20150709 9 1 0 0 7 11
2 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150105 3 3 0 0 68 36
3 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150306 1 0 1 1 97 27
4 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150501 3 0 0 0 38 38
5 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150702 4 0 1 1 33 10
6 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150830 3 1 0 0 4 7
7 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20151107 1 0 0 0 4 5
8 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160110 2 0 1 0 11 6
9 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160316 9 3 4 1 67 50
10 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160510 5 3 2 1 67 66
11 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160804 1 4 5 0 36 43
12 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160926 7 1 0 1 38 20
13 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20161115 0 1 4 1 38 40
14 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20170106 0 0 0 1 39 38
15 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20151201 3 3 2 0 8 11
16 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20160628 0 0 1 1 1 3
17 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20170106 2 1 0 0 35 34
18 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20150803 0 0 0 0 16 11
19 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20160527 4 3 0 2 2 11
20 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20160808 14 3 4 1 15 31
How should I sum up the columns 'num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq', and 'total_secs' to get the totals, so that only one unique msno remains?
For example, after grouping all rows with the same msno, it should produce the result below, discarding the date column.
msno num_25 num_50 num_75 num_985 num_100 num_unq \
0 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 9 1 0 0 8 12
I tried this, but msno is still duplicated and the date column is still there.
df_user_logs_v2.groupby(['msno', 'date'])['num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq', 'total_secs'].sum()
Use drop + groupby + sum:
df = df_user_logs_v2.drop('date', axis=1).groupby('msno', as_index=False).sum()
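Equivalently, you can select the numeric columns explicitly instead of dropping date (using the column list from the question):

cols = ['num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq', 'total_secs']
df = df_user_logs_v2.groupby('msno', as_index=False)[cols].sum()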