Iterate over three lists with different lengths simultaneously - python

So I tried to iterate over three lists simultaneously using zip and itertools.cycle in Python 3, but it gave me something I didn't want. Suppose I have
list_a = [0,1,2,3,4,5,6,7,8,9,10,11]
list_b = [0,1,2,3,4,5,6,7,8,9,10,11]
list_c = [0,1,2,3,4,5,6,7,8,9,10,11,
12,13,14,15,16,17,18,19,20,21,22,23,
24,25,26,27,28,29,30,31,32,33,34,35,
36,37,38,39,40,41,42,43,44,45,46,47,
48,49,50,51,52,53,54,55,56,57,58,59,
60,61,62,63,64,65,66,67,68,69,70,71,
72,73,74,75,76,77,78,79,80,81,82,83,
84,85,86,87,88,89,90,91,92,93,94,95,
96,97,98,99,100,101,102,103,104,105,106,107,
108,109,110,111,112,113,114,115,116,117,118,119,
120,121,122,123,124,125,126,127,128,129,130,131,
132,133,134,135,136,137,138,139,140,141,142,143]
I have tried this:
from itertools import cycle

for val_a in list_a:
    for val_b, val_c in zip(cycle(list_b), list_c):
        print(val_a, val_b, val_c)
my output is:
0 0 0
0 1 1
0 2 2
0 3 3
0 4 4
0 5 5
0 6 6
0 7 7
0 8 8
0 9 9
0 10 10
0 11 11
0 0 12
0 1 13
0 2 14
0 3 15
0 4 16
0 5 17
0 6 18
0 7 19
0 8 20
0 9 21
0 10 22
0 11 23
0 0 24
0 1 25
0 2 26
0 3 27
0 4 28
0 5 29
0 6 30
0 7 31
0 8 32
0 9 33
0 10 34
0 11 35
. . .
. . .
. . .
. . .
. . .
and so on...
I expect the output:
0 0 0
0 1 1
0 2 2
0 3 3
0 4 4
0 5 5
0 6 6
0 7 7
0 8 8
0 9 9
0 10 10
0 11 11
1 0 12
1 1 13
1 2 14
1 3 15
1 4 16
1 5 17
1 6 18
1 7 19
1 8 20
1 9 21
1 10 22
1 11 23
2 0 24
2 1 25
2 2 26
2 3 27
2 4 28
2 5 29
2 6 30
2 7 31
2 8 32
2 9 33
2 10 34
2 11 35
. . .
. . .
. . .
. . .
. . .
11 9 141
11 10 142
11 11 143
I have also tried it without itertools.cycle, with itertools.izip_longest, and with the order of iteration of the lists changed, but none of that gave the expected output. What should I do?

It appears you don't want to cycle through any list at all. Instead, you want to go through every element of list_b for each element of list_a, while steadily advancing through list_c.
Turn list_c into an iterator so it keeps its position between passes of the outer loop, and keep the nested for loop:
iter_c = iter(list_c)

for val_a in list_a:
    for val_b, val_c in zip(list_b, iter_c):
        print(val_a, val_b, val_c)
Output:
0 0 0
0 1 1
0 2 2
0 3 3
0 4 4
0 5 5
0 6 6
0 7 7
0 8 8
0 9 9
0 10 10
0 11 11
1 0 12
1 1 13
1 2 14
1 3 15
1 4 16
1 5 17
1 6 18
1 7 19
1 8 20
1 9 21
1 10 22
1 11 23
2 0 24
2 1 25
2 2 26
2 3 27
2 4 28
2 5 29
2 6 30
2 7 31
2 8 32
2 9 33
2 10 34
2 11 35
. . .
. . .
. . .
. . .
. . .
11 9 141
11 10 142
11 11 143
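An equivalent sketch without the explicit iterator, assuming list_a, list_b and list_c are defined as in the question (12, 12 and 144 elements): itertools.product yields the (val_a, val_b) pairs in the same nested-loop order, so they can be zipped with list_c directly.
from itertools import product

# product(list_a, list_b) walks all of list_b for each element of list_a,
# and zip pairs every (val_a, val_b) with the next element of list_c
for (val_a, val_b), val_c in zip(product(list_a, list_b), list_c):
    print(val_a, val_b, val_c)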

Related

How to calculate an accumulated value conditionally?

This question is based on this thread.
I have the following dataframe:
diff_hours stage sensor
0 0 20
0 0 21
0 0 21
1 0 22
5 0 21
0 0 22
0 1 20
7 1 23
0 1 24
0 3 25
0 3 28
6 0 21
0 0 22
I need to calculate an accumulated value of diff_hours while stage is growing. When stage drops back to 0, the accumulated value acc_hours should reset to 0, even though diff_hours might not be equal to 0.
The proposed solution is this one:
blocks = df['stage'].diff().lt(0).cumsum()
df['acc_hours'] = df['diff_hours'].groupby(blocks).cumsum()
Output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 6
12 0 0 22 6
In row 11 the value of acc_hours is 6. I need it to reset to 0, because stage dropped from 3 back to 0 in that row.
The expected output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 0
12 0 0 22 0
How can I implement this logic?
The expected output is a little unclear; what about a simple mask?
Masking only the value on the row where the change happens:
m = df['stage'].diff().lt(0)
df['acc_hours'] = (df.groupby(m.cumsum())
                     ['diff_hours'].cumsum()
                     .mask(m, 0)
                   )
Output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 0
12 0 0 22 6
13 3 0 22 9
14 0 0 22 9
Or, ignoring the value on that row completely by masking before the groupby:
m = df['stage'].diff().lt(0)
df['acc_hours'] = (df['diff_hours'].mask(m, 0)
                     .groupby(m.cumsum())
                     .cumsum()
                   )
Output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 0
12 0 0 22 0
13 3 0 22 3
14 0 0 22 3
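For reference, here is a small sketch (rebuilding df from the example data in the question) that shows the mask and the block ids the groupby key is made of:
import pandas as pd

df = pd.DataFrame({
    'diff_hours': [0, 0, 0, 1, 5, 0, 0, 7, 0, 0, 0, 6, 0],
    'stage':      [0, 0, 0, 0, 0, 0, 1, 1, 1, 3, 3, 0, 0],
    'sensor':     [20, 21, 21, 22, 21, 22, 20, 23, 24, 25, 28, 21, 22],
})

m = df['stage'].diff().lt(0)   # True exactly on the rows where stage drops
blocks = m.cumsum()            # block id goes up by 1 at every drop, so each block is one groupby group
print(pd.concat([df['stage'], m.rename('dropped'), blocks.rename('block')], axis=1))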

pandas: Create new column by comparing DataFrame rows of one column of DataFrame

Assume I have this df:
pd.DataFrame({'data': [0,0,0,1,1,1,2,2,2,3,3,4,4,5,5,0,0,0,0,2,2,2,2,4,4,4,4]})
data
0 0
1 0
2 0
3 1
4 1
5 1
6 2
7 2
8 2
9 3
10 3
11 4
12 4
13 5
14 5
15 0
16 0
17 0
18 0
19 2
20 2
21 2
22 2
23 4
24 4
25 4
26 4
I'm looking for a way to create a new column in df that holds a running count of how many times the current data value has repeated consecutively so far. For example:
data new
0 0 1
1 0 2
2 0 3
3 1 1
4 1 2
5 1 3
6 2 1
7 2 2
8 2 3
9 3 1
10 3 2
11 4 1
12 4 2
13 5 1
14 5 2
15 0 1
16 0 2
17 0 3
18 0 4
19 2 1
20 2 2
21 2 3
22 2 4
23 4 1
24 4 2
25 4 3
26 4 4
My plan was to pull the rows into a Python list, compare them, and build a new list.
Is there a simpler way to do this?
Example
df = pd.DataFrame({'data': [0,0,0,1,1,1,2,2,2,3,3,4,4,5,5,0,0,0,0,2,2,2,2,4,4,4,4]})
Code
grouper = df['data'].ne(df['data'].shift(1)).cumsum()
df['new'] = df.groupby(grouper).cumcount().add(1)
df
data new
0 0 1
1 0 2
2 0 3
3 1 1
4 1 2
5 1 3
6 2 1
7 2 2
8 2 3
9 3 1
10 3 2
11 4 1
12 4 2
13 5 1
14 5 2
15 0 1
16 0 2
17 0 3
18 0 4
19 2 1
20 2 2
21 2 3
22 2 4
23 4 1
24 4 2
25 4 3
26 4 4
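To see why this works, inspect the grouper built in the Code above: it starts a new group id at every row where data differs from the previous row, so cumcount restarts for each consecutive run.
# grouper id increases whenever the value changes, giving one id per consecutive run
grouper = df['data'].ne(df['data'].shift(1)).cumsum()
print(grouper.tolist())
# [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9]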

Cumulative Sum based on a Trigger

I am trying to track cumulative sums of the 'Value' column that should begin every time I get 1 in the 'Signal' column.
So in the table below I need to obtain three cumulative sums, starting at index values 3, 6, and 9 respectively, and each ending at index value 11:
Index  Value  Signal
0      3      0
1      8      0
2      8      0
3      7      1
4      9      0
5      10     0
6      14     1
7      10     0
8      10     0
9      4      1
10     10     0
11     10     0
What would be a way to do it?
Expected Output:
Index  Value  Signal  Cumsum_1  Cumsum_2  Cumsum_3
0      3      0       0         0         0
1      8      0       0         0         0
2      8      0       0         0         0
3      7      1       7         0         0
4      9      0       16        0         0
5      10     0       26        0         0
6      14     1       40        14        0
7      10     0       50        24        0
8      10     0       60        34        0
9      4      1       64        38        4
10     10     0       74        48        14
11     10     0       84        58        24
You can pivot, bfill, then cumsum:
df.merge(df.assign(id=df['Signal'].cumsum().add(1))
           .pivot(index='Index', columns='id', values='Value')
           .bfill(axis=1).fillna(0, downcast='infer')
           .cumsum()
           .add_prefix('cumsum'),
         left_on='Index', right_index=True
         )
output:
Index Value Signal cumsum1 cumsum2 cumsum3 cumsum4
0 0 3 0 3 0 0 0
1 1 8 0 11 0 0 0
2 2 8 0 19 0 0 0
3 3 7 1 26 7 0 0
4 4 9 0 35 16 0 0
5 5 10 0 45 26 0 0
6 6 14 1 59 40 14 0
7 7 10 0 69 50 24 0
8 8 10 0 79 60 34 0
9 9 4 1 83 64 38 4
10 10 10 0 93 74 48 14
11 11 10 0 103 84 58 24
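For readability, here is the same chain written step by step as a sketch, with the intermediates named:
# id tags each row with the latest cumulative sum that has started (it increases at every Signal == 1 row)
wide = (df.assign(id=df['Signal'].cumsum().add(1))
          .pivot(index='Index', columns='id', values='Value'))
# backfilling across columns lets earlier cumsums keep accumulating later rows,
# and rows before a cumsum's trigger are filled with 0
wide = wide.bfill(axis=1).fillna(0, downcast='infer')
# accumulate each column and merge back onto the original frame
out = df.merge(wide.cumsum().add_prefix('cumsum'),
               left_on='Index', right_index=True)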
older answer
IIUC, you can use groupby.cumsum:
df['cumsum'] = df.groupby(df['Signal'].cumsum())['Value'].cumsum()
output:
Index Value Signal cumsum
0 0 3 0 3
1 1 8 0 11
2 2 8 0 19
3 3 7 1 7
4 4 9 0 16
5 5 10 0 26
6 6 14 1 14
7 7 10 0 24
8 8 10 0 34
9 9 4 1 4
10 10 10 0 14
11 11 10 0 24
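If you instead want columns exactly like the expected output in the question, where each Cumsum_n stays at 0 until its own trigger row, a minimal sketch (assuming df holds the example frame with columns Index, Value and Signal) is to build one column per signal row:
# each row with Signal == 1 starts its own cumulative sum; rows before it contribute 0
for n, start in enumerate(df.loc[df['Signal'].eq(1), 'Index'], start=1):
    df[f'Cumsum_{n}'] = df['Value'].where(df['Index'] >= start, 0).cumsum()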

How can I create unique id based on the value in the other column

I want to assign a unique id based on the values in a column. For example, I have a table like this:
df = pd.DataFrame({'A': [0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,1,1,1]})
Eventually I would like my output table to look like this:
    A  id
0   0   1
1   0   1
2   0   1
3   0   1
4   0   1
5   0   1
6   0   1
7   1   2
8   1   2
9   1   2
10  1   2
11  1   2
12  1   2
13  0   3
14  0   3
15  0   3
16  0   3
17  0   3
18  0   3
19  1   4
20  1   4
21  1   4
22  0   5
23  0   5
24  0   5
25  0   5
26  1   6
27  1   6
28  1   6
I tried data.groupby(['a'], sort=False).ngroup() + 1 but it does not give what I want. Any help and guidance will be appreciated, thanks!
diff + cumsum:
df['id'] = df.A.diff().ne(0).cumsum()
df
A id
0 0 1
1 0 1
2 0 1
3 0 1
4 0 1
5 0 1
6 0 1
7 1 2
8 1 2
9 1 2
10 1 2
11 1 2
12 1 2
13 0 3
14 0 3
15 0 3
16 0 3
17 0 3
18 0 3
19 1 4
20 1 4
21 1 4
22 0 5
23 0 5
24 0 5
25 0 5
26 1 6
27 1 6
28 1 6
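To see why the ngroup() attempt from the question gives a different result: ngroup() assigns one id per distinct value over the whole column, so every block of 0s gets the same id, while diff().ne(0).cumsum() starts a new id at every row where A changes. A quick comparison (df as defined in the question):
print(df.groupby('A', sort=False).ngroup().add(1).tolist())  # one id per distinct value: only 1s and 2s
print(df.A.diff().ne(0).cumsum().tolist())                   # one id per consecutive block: 1 through 6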
Another option is the third-party pdrle package:
import pdrle
df["id"] = pdrle.get_id(df["A"]) + 1
df
# A id
# 0 0 1
# 1 0 1
# 2 0 1
# 3 0 1
# 4 0 1
# 5 0 1
# 6 0 1
# 7 1 2
# 8 1 2
# 9 1 2
# 10 1 2
# 11 1 2
# 12 1 2
# 13 0 3
# 14 0 3
# 15 0 3
# 16 0 3
# 17 0 3
# 18 0 3
# 19 1 4
# 20 1 4
# 21 1 4
# 22 0 5
# 23 0 5
# 24 0 5
# 25 0 5
# 26 1 6
# 27 1 6
# 28 1 6

cumcount() None

I would like to have a new column (not_ordered_in_STREET_x_before_my_car) that counts the None values in my DataFrame up until the row I am in, grouped by x and sorted by x and y.
import numpy as np
import pandas as pd

x_start = 1
y_start = 1
size_city = 10
cars = pd.DataFrame({'x': np.repeat(np.arange(x_start, x_start + size_city), size_city),
                     'y': np.tile(np.arange(y_start, y_start + size_city), size_city),
                     'pizza_ordered': np.repeat([None, None, 1, 6, 3, 7, 5, None, 8, 9, 0, None, None, None, 4, None, 11, 12, 14, 15], 5)})
The first four columns are what I have, and the fifth is the one I want:
x y pizza_ordered not_ordered_in_STREET_x_before_my_car
0 1 1 None 0
1 1 2 None 1
2 1 3 1 2
3 1 4 2 2
4 1 5 1 2
5 1 6 1 2
6 1 7 1 2
7 1 8 None 2
8 1 9 1 3
9 1 10 4 3
10 2 1 1 0
11 2 2 None 0
12 2 3 None 1
13 2 4 None 2
14 2 5 4 3
15 2 6 None 3
16 2 7 5 4
17 2 8 3 4
18 2 9 1 4
19 2 10 1 4
This is what I have tried, but it does not work.
cars = cars.sort_values(['x', 'y'])
cars['not_ordered_in_STREET_x_before_my_car'] = cars.where(cars['pizza_ordered'].isnull()).groupby(['x']).cumcount().add(1)
You can try:
cars["not_ordered_in_STREET_x_before_my_car"] = cars.groupby("x")[
"pizza_ordered"
].transform(lambda x: x.isna().cumsum().shift(1).fillna(0).astype(int
))
print(cars)
Prints:
x y pizza_ordered not_ordered_in_STREET_x_before_my_car
0 1 1 None 0
1 1 2 None 1
2 1 3 None 2
3 1 4 None 3
4 1 5 None 4
5 1 6 None 5
6 1 7 None 6
7 1 8 None 7
8 1 9 None 8
9 1 10 None 9
10 2 1 1 0
11 2 2 1 0
12 2 3 1 0
13 2 4 1 0
14 2 5 1 0
15 2 6 6 0
16 2 7 6 0
17 2 8 6 0
18 2 9 6 0
19 2 10 6 0
20 3 1 3 0
21 3 2 3 0
22 3 3 3 0
23 3 4 3 0
24 3 5 3 0
25 3 6 7 0
26 3 7 7 0
27 3 8 7 0
28 3 9 7 0
29 3 10 7 0
30 4 1 5 0
31 4 2 5 0
32 4 3 5 0
33 4 4 5 0
34 4 5 5 0
35 4 6 None 0
36 4 7 None 1
37 4 8 None 2
38 4 9 None 3
39 4 10 None 4
40 5 1 8 0
41 5 2 8 0
42 5 3 8 0
43 5 4 8 0
44 5 5 8 0
45 5 6 9 0
46 5 7 9 0
47 5 8 9 0
48 5 9 9 0
49 5 10 9 0
50 6 1 0 0
51 6 2 0 0
52 6 3 0 0
53 6 4 0 0
54 6 5 0 0
55 6 6 None 0
56 6 7 None 1
57 6 8 None 2
58 6 9 None 3
59 6 10 None 4
60 7 1 None 0
61 7 2 None 1
62 7 3 None 2
63 7 4 None 3
64 7 5 None 4
65 7 6 None 5
66 7 7 None 6
67 7 8 None 7
68 7 9 None 8
69 7 10 None 9
70 8 1 4 0
71 8 2 4 0
72 8 3 4 0
73 8 4 4 0
74 8 5 4 0
75 8 6 None 0
76 8 7 None 1
77 8 8 None 2
78 8 9 None 3
79 8 10 None 4
80 9 1 11 0
81 9 2 11 0
82 9 3 11 0
83 9 4 11 0
84 9 5 11 0
85 9 6 12 0
86 9 7 12 0
87 9 8 12 0
88 9 9 12 0
89 9 10 12 0
90 10 1 14 0
91 10 2 14 0
92 10 3 14 0
93 10 4 14 0
94 10 5 14 0
95 10 6 15 0
96 10 7 15 0
97 10 8 15 0
98 10 9 15 0
99 10 10 15 0
A much shorter suggestion is a plain cumulative count of the Nones:
cars['not_ordered_in_STREET_x_before_my_car'] = pd.isnull(cars['pizza_ordered']).cumsum()
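Note, though, that this counts None values across all streets and includes the current row. A sketch of a variant that groups by x and excludes the current row (it should match the transform answer above):
isnull = cars['pizza_ordered'].isnull().astype(int)
# running count of Nones within each street, minus the current row's own None (if any)
cars['not_ordered_in_STREET_x_before_my_car'] = (
    isnull.groupby(cars['x']).cumsum() - isnull
)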
