cumcount() None - python

I would like to have a new column (not_ordered_in_STREET_x_before_my_car) that counts the None values in my DataFrame up until the row I am in, grouped by x and sorted by x and y.
import numpy as np
import pandas as pd

x_start = 1
y_start = 1
size_city = 10
cars = pd.DataFrame({'x': np.repeat(np.arange(x_start, x_start + size_city), size_city),
                     'y': np.tile(np.arange(y_start, y_start + size_city), size_city),
                     'pizza_ordered': np.repeat([None, None, 1, 6, 3, 7, 5, None, 8, 9, 0, None, None, None, 4, None, 11, 12, 14, 15], 5)})
The first four columns are what I have, and the fifth is the one I want.
x y pizza_ordered not_ordered_in_STREET_x_before_my_car
0 1 1 None 0
1 1 2 None 1
2 1 3 1 2
3 1 4 2 2
4 1 5 1 2
5 1 6 1 2
6 1 7 1 2
7 1 8 None 2
8 1 9 1 3
9 1 10 4 3
10 2 1 1 0
11 2 2 None 0
12 2 3 None 1
13 2 4 None 2
14 2 5 4 3
15 2 6 None 3
16 2 7 5 4
17 2 8 3 4
18 2 9 1 4
19 2 10 1 4
This is what I have tried, but it does not work.
cars = cars.sort_values(['x', 'y'])
cars['not_ordered_in_STREET_x_before_my_car'] = cars.where(cars['pizza_ordered'].isnull()).groupby(['x']).cumcount().add(1)

You can try:
cars["not_ordered_in_STREET_x_before_my_car"] = cars.groupby("x")[
    "pizza_ordered"
].transform(lambda x: x.isna().cumsum().shift(1).fillna(0).astype(int))
print(cars)
Prints:
x y pizza_ordered not_ordered_in_STREET_x_before_my_car
0 1 1 None 0
1 1 2 None 1
2 1 3 None 2
3 1 4 None 3
4 1 5 None 4
5 1 6 None 5
6 1 7 None 6
7 1 8 None 7
8 1 9 None 8
9 1 10 None 9
10 2 1 1 0
11 2 2 1 0
12 2 3 1 0
13 2 4 1 0
14 2 5 1 0
15 2 6 6 0
16 2 7 6 0
17 2 8 6 0
18 2 9 6 0
19 2 10 6 0
20 3 1 3 0
21 3 2 3 0
22 3 3 3 0
23 3 4 3 0
24 3 5 3 0
25 3 6 7 0
26 3 7 7 0
27 3 8 7 0
28 3 9 7 0
29 3 10 7 0
30 4 1 5 0
31 4 2 5 0
32 4 3 5 0
33 4 4 5 0
34 4 5 5 0
35 4 6 None 0
36 4 7 None 1
37 4 8 None 2
38 4 9 None 3
39 4 10 None 4
40 5 1 8 0
41 5 2 8 0
42 5 3 8 0
43 5 4 8 0
44 5 5 8 0
45 5 6 9 0
46 5 7 9 0
47 5 8 9 0
48 5 9 9 0
49 5 10 9 0
50 6 1 0 0
51 6 2 0 0
52 6 3 0 0
53 6 4 0 0
54 6 5 0 0
55 6 6 None 0
56 6 7 None 1
57 6 8 None 2
58 6 9 None 3
59 6 10 None 4
60 7 1 None 0
61 7 2 None 1
62 7 3 None 2
63 7 4 None 3
64 7 5 None 4
65 7 6 None 5
66 7 7 None 6
67 7 8 None 7
68 7 9 None 8
69 7 10 None 9
70 8 1 4 0
71 8 2 4 0
72 8 3 4 0
73 8 4 4 0
74 8 5 4 0
75 8 6 None 0
76 8 7 None 1
77 8 8 None 2
78 8 9 None 3
79 8 10 None 4
80 9 1 11 0
81 9 2 11 0
82 9 3 11 0
83 9 4 11 0
84 9 5 11 0
85 9 6 12 0
86 9 7 12 0
87 9 8 12 0
88 9 9 12 0
89 9 10 12 0
90 10 1 14 0
91 10 2 14 0
92 10 3 14 0
93 10 4 14 0
94 10 5 14 0
95 10 6 15 0
96 10 7 15 0
97 10 8 15 0
98 10 9 15 0
99 10 10 15 0

A shorter vectorized variant: take a grouped cumulative sum of the null flags, then subtract each row's own flag so only earlier cars on the same street are counted:
is_null = cars['pizza_ordered'].isna().astype(int)
cars['not_ordered_in_STREET_x_before_my_car'] = is_null.groupby(cars['x']).cumsum() - is_null
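A minimal check of the shift-and-cumsum idea on a toy frame (the data and the `before` column name here are illustrative, not the question's full example):

```python
import pandas as pd

cars = pd.DataFrame({'x': [1, 1, 1, 2, 2],
                     'y': [1, 2, 3, 1, 2],
                     'pizza_ordered': [None, 1, None, None, 2]})
# per street x: cumulative None count, shifted down one row so the
# current row's own None (if any) is excluded
cars['before'] = cars.groupby('x')['pizza_ordered'].transform(
    lambda s: s.isna().cumsum().shift(1).fillna(0).astype(int))
print(cars['before'].tolist())  # [0, 1, 1, 0, 1]
```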


pandas: Create new column by comparing DataFrame rows of one column of DataFrame

Assume I have this df:
import pandas as pd
df = pd.DataFrame({'data': [0,0,0,1,1,1,2,2,2,3,3,4,4,5,5,0,0,0,0,2,2,2,2,4,4,4,4]})
data
0 0
1 0
2 0
3 1
4 1
5 1
6 2
7 2
8 2
9 3
10 3
11 4
12 4
13 5
14 5
15 0
16 0
17 0
18 0
19 2
20 2
21 2
22 2
23 4
24 4
25 4
26 4
I'm looking for a way to create a new column in df that counts how many times each value has repeated consecutively. For example:
data new
0 0 1
1 0 2
2 0 3
3 1 1
4 1 2
5 1 3
6 2 1
7 2 2
8 2 3
9 3 1
10 3 2
11 4 1
12 4 2
13 5 1
14 5 2
15 0 1
16 0 2
17 0 3
18 0 4
19 2 1
20 2 2
21 2 3
22 2 4
23 4 1
24 4 2
25 4 3
26 4 4
My idea was to convert the rows to a Python list, compare them, and build a new list.
Is there a simpler way to do this?
Example
df = pd.DataFrame({'data': [0,0,0,1,1,1,2,2,2,3,3,4,4,5,5,0,0,0,0,2,2,2,2,4,4,4,4]})
Code
grouper = df['data'].ne(df['data'].shift(1)).cumsum()
df['new'] = df.groupby(grouper).cumcount().add(1)
df
data new
0 0 1
1 0 2
2 0 3
3 1 1
4 1 2
5 1 3
6 2 1
7 2 2
8 2 3
9 3 1
10 3 2
11 4 1
12 4 2
13 5 1
14 5 2
15 0 1
16 0 2
17 0 3
18 0 4
19 2 1
20 2 2
21 2 3
22 2 4
23 4 1
24 4 2
25 4 3
26 4 4
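To see why the grouper works: comparing each value with its predecessor and taking the cumulative sum assigns a distinct id to every consecutive run, which cumcount then counts within (a small sketch):

```python
import pandas as pd

s = pd.Series([0, 0, 1, 1, 0])
# True at every position where the value changes; cumsum turns those
# change points into run ids (the trailing 0 starts a new run)
run_id = s.ne(s.shift(1)).cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 3]
```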

Cumulative Sum based on a Trigger

I am trying to track cumulative sums of the 'Value' column, each of which should begin every time a 1 appears in the 'Signal' column.
So in the table below I need three cumulative sums, starting at index values 3, 6, and 9, and each ending at index 11:
Index Value Signal
0 3 0
1 8 0
2 8 0
3 7 1
4 9 0
5 10 0
6 14 1
7 10 0
8 10 0
9 4 1
10 10 0
11 10 0
What would be a way to do it?
Expected Output:
Index Value Signal Cumsum_1 Cumsum_2 Cumsum_3
0 3 0 0 0 0
1 8 0 0 0 0
2 8 0 0 0 0
3 7 1 7 0 0
4 9 0 16 0 0
5 10 0 26 0 0
6 14 1 40 14 0
7 10 0 50 24 0
8 10 0 60 34 0
9 4 1 64 38 4
10 10 0 74 48 14
11 10 0 84 58 24
You can pivot, bfill, then cumsum:
df.merge(df.assign(id=df['Signal'].cumsum().add(1))
           .pivot(index='Index', columns='id', values='Value')
           .bfill(axis=1).fillna(0, downcast='infer')
           .cumsum()
           .add_prefix('cumsum'),
         left_on='Index', right_index=True
         )
output:
Index Value Signal cumsum1 cumsum2 cumsum3 cumsum4
0 0 3 0 3 0 0 0
1 1 8 0 11 0 0 0
2 2 8 0 19 0 0 0
3 3 7 1 26 7 0 0
4 4 9 0 35 16 0 0
5 5 10 0 45 26 0 0
6 6 14 1 59 40 14 0
7 7 10 0 69 50 24 0
8 8 10 0 79 60 34 0
9 9 4 1 83 64 38 4
10 10 10 0 93 74 48 14
11 11 10 0 103 84 58 24
older answer
IIUC, you can use groupby.cumsum:
df['cumsum'] = df.groupby(df['Signal'].cumsum())['Value'].cumsum()
output:
Index Value Signal cumsum
0 0 3 0 3
1 1 8 0 11
2 2 8 0 19
3 3 7 1 7
4 4 9 0 16
5 5 10 0 26
6 6 14 1 14
7 7 10 0 24
8 8 10 0 34
9 9 4 1 4
10 10 10 0 14
11 11 10 0 24
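A minimal check of the block-id idea: Signal.cumsum() labels every stretch of rows between triggers, so each group's running sum restarts at a trigger row (toy data, not the question's table):

```python
import pandas as pd

df = pd.DataFrame({'Value': [3, 8, 7, 9], 'Signal': [0, 0, 1, 0]})
# the block id increments on each 1 in Signal, restarting the running sum
df['cumsum'] = df.groupby(df['Signal'].cumsum())['Value'].cumsum()
print(df['cumsum'].tolist())  # [3, 11, 7, 16]
```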

Iterate over three lists with different lengths simultaneously

So I tried to iterate over 3 lists simultaneously using zip and itertools.cycle in Python 3, but it gave me something I didn't want. Suppose that I have
list_a = [0,1,2,3,4,5,6,7,8,9,10,11]
list_b = [0,1,2,3,4,5,6,7,8,9,10,11]
list_c = [0,1,2,3,4,5,6,7,8,9,10,11,
12,13,14,15,16,17,18,19,20,21,22,23,
24,25,26,27,28,29,30,31,32,33,34,35,
36,37,38,39,40,41,42,43,44,45,46,47,
48,49,50,51,52,53,54,55,56,57,58,59,
60,61,62,63,64,65,66,67,68,69,70,71,
72,73,74,75,76,77,78,79,80,81,82,83,
84,85,86,87,88,89,90,91,92,93,94,95,
96,97,98,99,100,101,102,103,104,105,106,107,
108,109,110,111,112,113,114,115,116,117,118,119,
120,121,122,123,124,125,126,127,128,129,130,131,
132,133,134,135,136,137,138,139,140,141,142,143]
I have tried this:
from itertools import cycle

for val_a in list_a:
    for val_b, val_c in zip(cycle(list_b), list_c):
        print(val_a, val_b, val_c)
my output is:
0 0 0
0 1 1
0 2 2
0 3 3
0 4 4
0 5 5
0 6 6
0 7 7
0 8 8
0 9 9
0 10 10
0 11 11
0 0 12
0 1 13
0 2 14
0 3 15
0 4 16
0 5 17
0 6 18
0 7 19
0 8 20
0 9 21
0 10 22
0 11 23
0 0 24
0 1 25
0 2 26
0 3 27
0 4 28
0 5 29
0 6 30
0 7 31
0 8 32
0 9 33
0 10 34
0 11 35
. . .
. . .
. . .
. . .
. . .
and so on...
I expect the output:
0 0 0
0 1 1
0 2 2
0 3 3
0 4 4
0 5 5
0 6 6
0 7 7
0 8 8
0 9 9
0 10 10
0 11 11
1 0 12
1 1 13
1 2 14
1 3 15
1 4 16
1 5 17
1 6 18
1 7 19
1 8 20
1 9 21
1 10 22
1 11 23
2 0 24
2 1 25
2 2 26
2 3 27
2 4 28
2 5 29
2 6 30
2 7 31
2 8 32
2 9 33
2 10 34
2 11 35
. . .
. . .
. . .
. . .
. . .
11 9 141
11 10 142
11 11 143
I have also tried without itertools.cycle, with itertools.zip_longest, and with changing the order of iteration of the lists. What should I do?
It appears you don't want to cycle through any lists at all. Instead you want to go through every element in b for each element in a, while continuously advancing through c.
Turn list_c into an iterator so that it keeps its position across iterations of the outer loop, and proceed with the nested for loop:
iter_c = iter(list_c)
for val_a in list_a:
    for val_b, val_c in zip(list_b, iter_c):
        print(val_a, val_b, val_c)
Output:
0 0 0
0 1 1
0 2 2
0 3 3
0 4 4
0 5 5
0 6 6
0 7 7
0 8 8
0 9 9
0 10 10
0 11 11
1 0 12
1 1 13
1 2 14
1 3 15
1 4 16
1 5 17
1 6 18
1 7 19
1 8 20
1 9 21
1 10 22
1 11 23
2 0 24
2 1 25
2 2 26
2 3 27
2 4 28
2 5 29
2 6 30
2 7 31
2 8 32
2 9 33
2 10 34
2 11 35
. . .
. . .
. . .
. . .
. . .
11 9 141
11 10 142
11 11 143
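An equivalent formulation without an explicit iterator, assuming len(list_c) == len(list_a) * len(list_b): recover the a and b positions from the flat index of c with divmod (a sketch on shortened lists):

```python
list_a = [0, 1]
list_b = [0, 1, 2]
list_c = list(range(6))

triples = []
for i, val_c in enumerate(list_c):
    # divmod splits the flat index into (position in a, position in b)
    ia, ib = divmod(i, len(list_b))
    triples.append((list_a[ia], list_b[ib], val_c))
print(triples)  # [(0, 0, 0), (0, 1, 1), (0, 2, 2), (1, 0, 3), (1, 1, 4), (1, 2, 5)]
```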

Pandas group by with sum on few columns and retain the other column

I have a table which looks like this.
msno date num_25 num_50 num_75 num_985 num_100 num_unq \
0 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 20150513 0 0 0 0 1 1
1 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 20150709 9 1 0 0 7 11
2 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150105 3 3 0 0 68 36
3 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150306 1 0 1 1 97 27
4 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150501 3 0 0 0 38 38
5 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150702 4 0 1 1 33 10
6 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150830 3 1 0 0 4 7
7 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20151107 1 0 0 0 4 5
8 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160110 2 0 1 0 11 6
9 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160316 9 3 4 1 67 50
10 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160510 5 3 2 1 67 66
11 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160804 1 4 5 0 36 43
12 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160926 7 1 0 1 38 20
13 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20161115 0 1 4 1 38 40
14 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20170106 0 0 0 1 39 38
15 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20151201 3 3 2 0 8 11
16 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20160628 0 0 1 1 1 3
17 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20170106 2 1 0 0 35 34
18 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20150803 0 0 0 0 16 11
19 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20160527 4 3 0 2 2 11
20 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20160808 14 3 4 1 15 31
How should I sum up the columns 'num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq', 'total_secs' per user so that only one row per unique msno remains?
For example, after grouping all rows with the same msno, it should produce the result below, discarding the date column.
msno num_25 num_50 num_75 num_985 num_100 num_unq \
0 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 9 1 0 0 8 12
I tried this, but msno is still duplicated and the date column is still there.
df_user_logs_v2.groupby(['msno', 'date'])['num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq', 'total_secs'].sum()
Use drop + groupby + sum:
df = df_user_logs_v2.drop('date', axis=1).groupby('msno', as_index=False).sum()
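The same result can also be reached without dropping date first, by selecting only the columns to aggregate (a sketch on toy data with made-up msno values and a subset of the num_* columns):

```python
import pandas as pd

df = pd.DataFrame({'msno': ['a', 'a', 'b'],
                   'date': [20150513, 20150709, 20150105],
                   'num_25': [0, 9, 3],
                   'num_50': [0, 1, 3]})
# group by msno only; the column selection leaves date out of the sum,
# and as_index=False keeps msno as a regular column
out = df.groupby('msno', as_index=False)[['num_25', 'num_50']].sum()
print(out)
```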

Pivot column and column values in pandas dataframe

I have a dataframe that looks like this, but with 26 rows and 110 columns:
index/io 1 2 3 4
0 42 53 23 4
1 53 24 6 12
2 63 12 65 34
3 13 64 23 43
Desired output:
index io value
0 1 42
0 2 53
0 3 23
0 4 4
1 1 53
1 2 24
1 3 6
1 4 12
2 1 63
2 2 12
...
I have tried with dicts and lists: transforming the DataFrame to a dict, then building a list of the index values and updating a new dict with io.
indx = []
for key, value in mydict.items():
    for k, v in value.items():
        indx.append(key)

indxio = {}
for element in indx:
    for key, value in mydict.items():
        for k, v in value.items():
            indxio.update({element: k})
I know this is probably far off, but it's the only thing I could think of. The process was too long, so I stopped.
You can use set_index, stack, and reset_index().
df.set_index("index/io").stack().reset_index(name="value")\
.rename(columns={'index/io':'index','level_1':'io'})
Output:
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
You need set_index + stack + rename_axis + reset_index:
df = df.set_index('index/io').stack().rename_axis(('index','io')).reset_index(name='value')
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
Solution with melt and rename; the values come out in a different order, so sort_values is necessary:
d = {'index/io':'index'}
df = df.melt('index/io', var_name='io', value_name='value') \
.rename(columns=d).sort_values(['index','io']).reset_index(drop=True)
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
And an alternative solution for numpy lovers:
df = df.set_index('index/io')
a = np.repeat(df.index, len(df.columns))
b = np.tile(df.columns, len(df.index))
c = df.values.ravel()
cols = ['index','io','value']
df = pd.DataFrame(np.column_stack([a,b,c]), columns = cols)
print (df)
index io value
0 0 1 42
1 0 2 53
2 0 3 23
3 0 4 4
4 1 1 53
5 1 2 24
6 1 3 6
7 1 4 12
8 2 1 63
9 2 2 12
10 2 3 65
11 2 4 34
12 3 1 13
13 3 2 64
14 3 3 23
15 3 4 43
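A minimal check of the set_index + stack route on a 2×2 slice of such a frame (values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'index/io': [0, 1], 1: [42, 53], 2: [53, 24]})
# stack moves the io column labels into the row index, producing long format;
# reset_index then restores 'index/io' and the unnamed level ('level_1') as columns
long_df = df.set_index('index/io').stack().reset_index(name='value') \
            .rename(columns={'index/io': 'index', 'level_1': 'io'})
print(long_df['value'].tolist())  # [42, 53, 53, 24]
```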
