cumcount() None

cumcount() None - python

I would like to have a new columnen ( not_ordered_in_STREET_x_before_my_car ) that counts the None values in my Dataframe up until the row I am in, groupted by x, sorted by x and y.
import pandas as pd
x_start = 1
y_start = 1
size_city = 10
cars = pd.DataFrame({'x': np.repeat(np.arange(x_start,x_start+size_city),size_city),
'y': np.tile(np.arange(y_start,y_start+size_city),size_city),
'pizza_ordered' : np.repeat([None,None,1,6,3,7,5,None,8,9,0,None,None,None,4,None,11,12,14,15],5)})
The first 4 columns is what I have, and the 5th is the one I want.
x y pizza_ordered not_ordered_in_STREET_x_before_my_car
0 1 1 None 0
1 1 2 None 1
2 1 3 1 2
3 1 4 2 2
4 1 5 1 2
5 1 6 1 2
6 1 7 1 2
7 1 8 None 2
8 1 9 1 3
9 1 10 4 3
10 2 1 1 0
11 2 2 None 0
12 2 3 None 1
13 2 4 None 2
14 2 5 4 3
15 2 6 None 3
16 2 7 5 4
17 2 8 3 4
18 2 9 1 4
19 2 10 1 4
This is what I have tried, but it does not work.
cars = cars.sort_values(['x', 'y'])
cars['not_ordered_in_STREET_x_before_my_car'] = cars.where(cars['pizza_ordered'].isnull()).groupby(['x']).cumcount().add(1)

You can try:
cars["not_ordered_in_STREET_x_before_my_car"] = cars.groupby("x")[
"pizza_ordered"
].transform(lambda x: x.isna().cumsum().shift(1).fillna(0).astype(int
))
print(cars)
Prints:
x y pizza_ordered not_ordered_in_STREET_x_before_my_car
0 1 1 None 0
1 1 2 None 1
2 1 3 None 2
3 1 4 None 3
4 1 5 None 4
5 1 6 None 5
6 1 7 None 6
7 1 8 None 7
8 1 9 None 8
9 1 10 None 9
10 2 1 1 0
11 2 2 1 0
12 2 3 1 0
13 2 4 1 0
14 2 5 1 0
15 2 6 6 0
16 2 7 6 0
17 2 8 6 0
18 2 9 6 0
19 2 10 6 0
20 3 1 3 0
21 3 2 3 0
22 3 3 3 0
23 3 4 3 0
24 3 5 3 0
25 3 6 7 0
26 3 7 7 0
27 3 8 7 0
28 3 9 7 0
29 3 10 7 0
30 4 1 5 0
31 4 2 5 0
32 4 3 5 0
33 4 4 5 0
34 4 5 5 0
35 4 6 None 0
36 4 7 None 1
37 4 8 None 2
38 4 9 None 3
39 4 10 None 4
40 5 1 8 0
41 5 2 8 0
42 5 3 8 0
43 5 4 8 0
44 5 5 8 0
45 5 6 9 0
46 5 7 9 0
47 5 8 9 0
48 5 9 9 0
49 5 10 9 0
50 6 1 0 0
51 6 2 0 0
52 6 3 0 0
53 6 4 0 0
54 6 5 0 0
55 6 6 None 0
56 6 7 None 1
57 6 8 None 2
58 6 9 None 3
59 6 10 None 4
60 7 1 None 0
61 7 2 None 1
62 7 3 None 2
63 7 4 None 3
64 7 5 None 4
65 7 6 None 5
66 7 7 None 6
67 7 8 None 7
68 7 9 None 8
69 7 10 None 9
70 8 1 4 0
71 8 2 4 0
72 8 3 4 0
73 8 4 4 0
74 8 5 4 0
75 8 6 None 0
76 8 7 None 1
77 8 8 None 2
78 8 9 None 3
79 8 10 None 4
80 9 1 11 0
81 9 2 11 0
82 9 3 11 0
83 9 4 11 0
84 9 5 11 0
85 9 6 12 0
86 9 7 12 0
87 9 8 12 0
88 9 9 12 0
89 9 10 12 0
90 10 1 14 0
91 10 2 14 0
92 10 3 14 0
93 10 4 14 0
94 10 5 14 0
95 10 6 15 0
96 10 7 15 0
97 10 8 15 0
98 10 9 15 0
99 10 10 15 0

cars['not_ordered_in_STREET_x_before_my_car'] = pd.isnull(cars['pizza_ordered']).cumsum()

Related

pandas: Create new column by comparing DataFrame rows of one column of DataFrame

assume i have df:
pd.DataFrame({'data': [0,0,0,1,1,1,2,2,2,3,3,4,4,5,5,0,0,0,0,2,2,2,2,4,4,4,4]})
data
0 0
1 0
2 0
3 1
4 1
5 1
6 2
7 2
8 2
9 3
10 3
11 4
12 4
13 5
14 5
15 0
16 0
17 0
18 0
19 2
20 2
21 2
22 2
23 4
24 4
25 4
26 4
I'm looking for a way to create a new column in df that shows the number of data items repeated in new column For example:
data new
0 0 1
1 0 2
2 0 3
3 1 1
4 1 2
5 1 3
6 2 1
7 2 2
8 2 3
9 3 1
10 3 2
11 4 1
12 4 2
13 5 1
14 5 2
15 0 1
16 0 2
17 0 3
18 0 4
19 2 1
20 2 2
21 2 3
22 2 4
23 4 1
24 4 2
25 4 3
26 4 4
My logic was to get the rows to python list compare and create a new list.
Is there a simple way to do this?

Example
df = pd.DataFrame({'data': [0,0,0,1,1,1,2,2,2,3,3,4,4,5,5,0,0,0,0,2,2,2,2,4,4,4,4]})
Code
grouper = df['data'].ne(df['data'].shift(1)).cumsum()
df['new'] = df.groupby(grouper).cumcount().add(1)
df
data new
0 0 1
1 0 2
2 0 3
3 1 1
4 1 2
5 1 3
6 2 1
7 2 2
8 2 3
9 3 1
10 3 2
11 4 1
12 4 2
13 5 1
14 5 2
15 0 1
16 0 2
17 0 3
18 0 4
19 2 1
20 2 2
21 2 3
22 2 4
23 4 1
24 4 2
25 4 3
26 4 4

Cumulative Sum based on a Trigger

I am trying to track cumulative sums of the 'Value' column that should begin every time I get 1 in the 'Signal' column.
So in the table below I need to obtain 3 cumulative sums starting at values 3, 6, and 9 of the index, and each sum ending at value 11 of the index:
Index
Value
Signal
0
3
0
1
8
0
2
8
0
3
7
1
4
9
0
5
10
0
6
14
1
7
10
0
8
10
0
9
4
1
10
10
0
11
10
0
What would be a way to do it?
Expected Output:
Index
Value
Signal
Cumsum_1
Cumsum_2
Cumsum_3
0
3
0
0
0
0
1
8
0
0
0
0
2
8
0
0
0
0
3
7
1
7
0
0
4
9
0
16
0
0
5
10
0
26
0
0
6
14
1
40
14
0
7
10
0
50
24
0
8
10
0
60
34
0
9
4
1
64
38
4
10
10
0
74
48
14
11
10
0
84
58
24

You can pivot, bfill, then cumsum:
df.merge(df.assign(id=df['Signal'].cumsum().add(1))
.pivot(index='Index', columns='id', values='Value')
.bfill(axis=1).fillna(0, downcast='infer')
.cumsum()
.add_prefix('cumsum'),
left_on='Index', right_index=True
)
output:
Index Value Signal cumsum1 cumsum2 cumsum3 cumsum4
0 0 3 0 3 0 0 0
1 1 8 0 11 0 0 0
2 2 8 0 19 0 0 0
3 3 7 1 26 7 0 0
4 4 9 0 35 16 0 0
5 5 10 0 45 26 0 0
6 6 14 1 59 40 14 0
7 7 10 0 69 50 24 0
8 8 10 0 79 60 34 0
9 9 4 1 83 64 38 4
10 10 10 0 93 74 48 14
11 11 10 0 103 84 58 24
older answer
IIUC, you can use groupby.cumsum:
df['cumsum'] = df.groupby(df['Signal'].cumsum())['Value'].cumsum()
output:
Index Value Signal cumsum
0 0 3 0 3
1 1 8 0 11
2 2 8 0 19
3 3 7 1 7
4 4 9 0 16
5 5 10 0 26
6 6 14 1 14
7 7 10 0 24
8 8 10 0 34
9 9 4 1 4
10 10 10 0 14
11 11 10 0 24

Iterate over three lists with different lengths simultaneously

So I tried to iterate over 3 lists simultaneously using zip and itertools cycle in python 3, but it gave me something I didn't want. Suppose that I have
list_a = [0,1,2,3,4,5,6,7,8,9,10,11]
list_b = [0,1,2,3,4,5,6,7,8,9,10,11]
list_c = [0,1,2,3,4,5,6,7,8,9,10,11,
12,13,14,15,16,17,18,19,20,21,22,23,
24,25,26,27,28,29,30,31,32,33,34,35,
36,37,38,39,40,41,42,43,44,45,46,47,
48,49,50,51,52,53,54,55,56,57,58,59,
60,61,62,63,64,65,66,67,68,69,70,71,
72,73,74,75,76,77,78,79,80,81,82,83,
84,85,86,87,88,89,90,91,92,93,94,95,
96,97,98,99,100,101,102,103,104,105,106,107,
108,109,110,111,112,113,114,115,116,117,118,119,
120,121,122,123,124,125,126,127,128,129,130,131,
132,133,134,135,136,137,138,139,140,141,142,143]
I have tried this:
from itertools import cycle
for val_a in list_a:
for val_b, val_c in zip(cycle(list_b), list_c):
print(val_a, val_b, val_c)
my output is:
0 0 0
0 1 1
0 2 2
0 3 3
0 4 4
0 5 5
0 6 6
0 7 7
0 8 8
0 9 9
0 10 10
0 11 11
0 0 12
0 1 13
0 2 14
0 3 15
0 4 16
0 5 17
0 6 18
0 7 19
0 8 20
0 9 21
0 10 22
0 11 23
0 0 24
0 1 25
0 2 26
0 3 27
0 4 28
0 5 29
0 6 30
0 7 31
0 8 32
0 9 33
0 10 34
0 11 35
. . .
. . .
. . .
. . .
. . .
and so on...
I expect the output:
0 0 0
0 1 1
0 2 2
0 3 3
0 4 4
0 5 5
0 6 6
0 7 7
0 8 8
0 9 9
0 10 10
0 11 11
1 0 12
1 1 13
1 2 14
1 3 15
1 4 16
1 5 17
1 6 18
1 7 19
1 8 20
1 9 21
1 10 22
1 11 23
2 0 24
2 1 25
2 2 26
2 3 27
2 4 28
2 5 29
2 6 30
2 7 31
2 8 32
2 9 33
2 10 34
2 11 35
. . .
. . .
. . .
. . .
. . .
11 9 141
11 10 142
11 11 143
I have tried without using itertools cycle, using itertools.izip_longest and changing the order of iteration of lists. What should I do?

It appears you don't want to cycle through any lists at all. Instead you want to go through every element in b for each element in a, while incrementing c.
Turn c into an iterator like so to increment it, and proceed with the nested for loop like so:
iter_c = iter(list_c)
for val_a in list_a:
for val_b, val_c in zip(list_b, iter_c):
print(val_a, val_b, val_c)
Output:
0 0 0
0 1 1
0 2 2
0 3 3
0 4 4
0 5 5
0 6 6
0 7 7
0 8 8
0 9 9
0 10 10
0 11 11
1 0 12
1 1 13
1 2 14
1 3 15
1 4 16
1 5 17
1 6 18
1 7 19
1 8 20
1 9 21
1 10 22
1 11 23
2 0 24
2 1 25
2 2 26
2 3 27
2 4 28
2 5 29
2 6 30
2 7 31
2 8 32
2 9 33
2 10 34
2 11 35
. . .
. . .
. . .
. . .
. . .
11 9 141
11 10 142
11 11 143

Pandas group by with sum on few columns and retain the other column

I have a table which look like this.
msno date num_25 num_50 num_75 num_985 num_100 num_unq \
0 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 20150513 0 0 0 0 1 1
1 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 20150709 9 1 0 0 7 11
2 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150105 3 3 0 0 68 36
3 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150306 1 0 1 1 97 27
4 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150501 3 0 0 0 38 38
5 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150702 4 0 1 1 33 10
6 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150830 3 1 0 0 4 7
7 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20151107 1 0 0 0 4 5
8 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160110 2 0 1 0 11 6
9 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160316 9 3 4 1 67 50
10 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160510 5 3 2 1 67 66
11 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160804 1 4 5 0 36 43
12 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160926 7 1 0 1 38 20
13 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20161115 0 1 4 1 38 40
14 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20170106 0 0 0 1 39 38
15 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20151201 3 3 2 0 8 11
16 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20160628 0 0 1 1 1 3
17 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20170106 2 1 0 0 35 34
18 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20150803 0 0 0 0 16 11
19 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20160527 4 3 0 2 2 11
20 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20160808 14 3 4 1 15 31
How should I sum up the columns 'num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq', 'total_secs' to get the total and left only one unique msno number?
For example, after group all same msno number rows, it will produce result below, discarding date column.
msno num_25 num_50 num_75 num_985 num_100 num_unq \
0 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 9 1 0 0 8 12
I tried this but the msno still duplicated and date column is still there.
df_user_logs_v2.groupby(['msno', 'date'])['num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq', 'total_secs'].sum()

Use drop + groupby + sum:
df = df_user_logs_v2.drop('date', axis=1).groupby('msno', as_index=False).sum()

Pivot column and column values in pandas dataframe

I have a dataframe that looks like this, but with 26 rows and 110 columns:
index/io 1 2 3 4
0 42 53 23 4
1 53 24 6 12
2 63 12 65 34
3 13 64 23 43
Desired output:
index io value
0 1 42
0 2 53
0 3 23
0 4 4
1 1 53
1 2 24
1 3 6
1 4 12
2 1 63
2 2 12
...
I have tried with dict and lists by transforming the dataframe to dict, and then create a new list with index values and update in new dict with io.
indx = []
for key, value in mydict.iteritems():
for k, v in value.iteritems():
indx.append(key)
indxio = {}
for element in indx:
for key, value in mydict.iteritems():
for k, v in value.iteritems():
indxio.update({element:k})
I know this is too far probably, but it's the only thing I could think of. The process was too long, so I stopped.

You can use set_index, stack, and reset_index().
df.set_index("index/io").stack().reset_index(name="value")\
.rename(columns={'index/io':'index','level_1':'io'})
Output:
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43

You need set_index + stack + rename_axis + reset_index:
df = df.set_index('index/io').stack().rename_axis(('index','io')).reset_index(name='value')
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
Solution with melt, rename, but there is different order of values, so sort_values is necessary:
d = {'index/io':'index'}
df = df.melt('index/io', var_name='io', value_name='value') \
.rename(columns=d).sort_values(['index','io']).reset_index(drop=True)
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
And alternative solution for numpy lovers:
df = df.set_index('index/io')
a = np.repeat(df.index, len(df.columns))
b = np.tile(df.columns, len(df.index))
c = df.values.ravel()
cols = ['index','io','value']
df = pd.DataFrame(np.column_stack([a,b,c]), columns = cols)
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

cumcount() None - python

cars['not_ordered_in_STREET_x_before_my_car'] = pd.isnull(cars['pizza_ordered']).cumsum()

Related

pandas: Create new column by comparing DataFrame rows of one column of DataFrame

Cumulative Sum based on a Trigger

Iterate over three lists with different lengths simultaneously

Pandas group by with sum on few columns and retain the other column

Pivot column and column values in pandas dataframe

Categories

Resources