Iterate over rows and calculate values - python

I have the following pandas dataframe:
temp stage issue_datetime
20 1 2022/11/30 19:20
21 1 2022/11/30 19:21
20 1 None
25 1 2022/11/30 20:10
30 2 None
22 2 2022/12/01 10:00
22 2 2022/12/01 10:01
31 3 2022/12/02 11:00
32 3 2022/12/02 11:01
19 1 None
20 1 None
I want to get the following result:
temp stage num_issues
20 1 3
21 1 3
20 1 3
25 1 3
30 2 2
22 2 2
22 2 2
31 3 2
32 3 2
19 1 0
20 1 0
Basically, I need to calculate the number of non-None per continuous value of stage and create a new column called num_issues.
How can I do it?

You can find the blocks of continuous value with cumsum on the diff, then groupby that and transform the non-null`
blocks = df['stage'].ne(df['stage'].shift()).cumsum()
df['num_issues'] = df['issue_datetime'].notna().groupby(blocks).transform('sum')
# or
# df['num_issues'] = df['issue_datetime'].groupby(blocks).transform('count')
Output:
temp stage issue_datetime num_issues
0 20 1 2022/11/30 19:20 3
1 21 1 2022/11/30 19:21 3
2 20 1 None 3
3 25 1 2022/11/30 20:10 3
4 30 2 None 2
5 22 2 2022/12/01 10:00 2
6 22 2 2022/12/01 10:01 2
7 31 3 2022/12/02 11:00 2
8 32 3 2022/12/02 11:01 2
9 19 1 None 0
10 20 1 None 0

Related

how to replace specific data using pandas python or excel

I having a data csv file containing some data. In which i have some data within semi colons. In these semi colon there is some specific id numbers and i need to replace it with the specific location name.
Available data
24CFA4A-12L - GF Electrical corridor
Replacing data within semicolons of id number
1;1;35;;1/2/1/37 24CFA4A;;;0;;;
Files with data - https://gofile.io/d/bQDppz
Thank you if anyone have solution.
[![Main data to replaced after finding id number and replacing with location ][3]][3]
Supposing you have dataframes:
df1 = pd.read_excel("ID_list.xlsx", header=None)
df2 = pd.read_excel("location.xlsx", header=None)
df1:
0
0 1;1;27;;1/2/1/29 25BAB3D;;;0;;;
1 1;1;27;1;;;;0;;;
2 1;1;28;;1/2/1/30 290E6D2;;;0;;;
3 1;1;28;1;;;;0;;;
4 1;1;29;;1/2/1/31 28BA737;;;0;;;
5 1;1;29;1;;;;0;;;
6 1;1;30;;1/2/1/32 2717823;;;0;;;
7 1;1;30;1;;;;0;;;
8 1;1;31;;1/2/1/33 254DEAA;;;0;;;
9 1;1;31;1;;;;0;;;
10 1;1;32;;1/2/1/34 28AE041;;;0;;;
11 1;1;32;1;;;;0;;;
12 1;1;33;;1/2/1/35 254DE82;;;0;;;
13 1;1;33;1;;;;0;;;
14 1;1;34;;1/2/1/36 2539D70;;;0;;;
15 1;1;34;1;;;;0;;;
16 1;1;35;;1/2/1/37 24CFA4A;;;0;;;
17 1;1;35;1;;;;0;;;
18 1;1;36;;1/2/1/39 28F023E;;;0;;;
19 1;1;36;1;;;;0;;;
20 1;1;37;;1/2/1/40 2717831;;;0;;;
21 1;1;37;1;;;;0;;;
22 1;1;38;;1/2/1/41 2397D75;;;0;;;
23 1;1;38;1;;;;0;;;
24 1;1;39;;1/2/1/42 287844C;;;0;;;
25 1;1;39;1;;;;0;;;
26 1;1;40;;1/2/1/43 28784F0;;;0;;;
27 1;1;40;1;;;;0;;;
28 1;1;41;;1/2/1/44 2865B67;;;0;;;
29 1;1;41;1;;;;0;;;
30 1;1;42;;1/2/1/45 2865998;;;0;;;
31 1;1;42;1;;;;0;;;
32 1;1;43;;1/2/1/46 287852F;;;0;;;
33 1;1;43;1;;;;0;;;
34 1;1;44;;1/2/1/47 287AC43;;;0;;;
35 1;1;44;1;;;;0;;;
36 1;1;45;;1/2/1/48 287ACF8;;;0;;;
37 1;1;45;1;;;;0;;;
38 1;1;46;;1/2/1/49 2878586;;;0;;;
39 1;1;46;1;;;;0;;;
40 1;1;47;;1/2/1/50 2878474;;;0;;;
41 1;1;47;1;;;;0;;;
42 1;1;48;;1/2/1/51 2846315;;;0;;;
df2:
0 1
0 GF General Dining TC 254DEAA-02L
1 GF General Dining TC 2717823-26L
2 GF General Dining FC 28BA737-50L
3 GF Preparation FC 25BAB3D-10L
4 GF Preparation TC 290E6D2-01M
5 GF Hospital Kitchen FC 25BAB2F-10L
6 GF Hospital Kitchen TC 2906F5C-01M
7 GF Food Preparation FC 25F5723-10L
8 GF Food Preparation TC 29070D6-01M
9 GF KITCHEN Corridor 254DF5D-02L
Then:
df1 = df1[0].str.split(";", expand=True)
df1[4] = df1[4].apply(lambda x: v[-1] if (v := x.split()) else "")
df2[1] = df2[1].apply(lambda x: x.split("-")[0])
df1:
0 1 2 3 4 5 6 7 8 9 10
0 1 1 27 25BAB3D 0
1 1 1 27 1 0
2 1 1 28 290E6D2 0
3 1 1 28 1 0
4 1 1 29 28BA737 0
5 1 1 29 1 0
6 1 1 30 2717823 0
7 1 1 30 1 0
8 1 1 31 254DEAA 0
9 1 1 31 1 0
10 1 1 32 28AE041 0
11 1 1 32 1 0
12 1 1 33 254DE82 0
13 1 1 33 1 0
14 1 1 34 2539D70 0
15 1 1 34 1 0
16 1 1 35 24CFA4A 0
17 1 1 35 1 0
18 1 1 36 28F023E 0
19 1 1 36 1 0
20 1 1 37 2717831 0
21 1 1 37 1 0
22 1 1 38 2397D75 0
23 1 1 38 1 0
24 1 1 39 287844C 0
25 1 1 39 1 0
26 1 1 40 28784F0 0
27 1 1 40 1 0
28 1 1 41 2865B67 0
29 1 1 41 1 0
30 1 1 42 2865998 0
31 1 1 42 1 0
32 1 1 43 287852F 0
33 1 1 43 1 0
34 1 1 44 287AC43 0
35 1 1 44 1 0
36 1 1 45 287ACF8 0
37 1 1 45 1 0
38 1 1 46 2878586 0
39 1 1 46 1 0
40 1 1 47 2878474 0
41 1 1 47 1 0
42 1 1 48 2846315 0
df2:
0 1
0 GF General Dining TC 254DEAA
1 GF General Dining TC 2717823
2 GF General Dining FC 28BA737
3 GF Preparation FC 25BAB3D
4 GF Preparation TC 290E6D2
5 GF Hospital Kitchen FC 25BAB2F
6 GF Hospital Kitchen TC 2906F5C
7 GF Food Preparation FC 25F5723
8 GF Food Preparation TC 29070D6
9 GF KITCHEN Corridor 254DF5D
To replace the values:
m = dict(zip(df2[1], df2[0]))
df1[4] = df1[4].replace(m)
df1:
0 1 2 3 4 5 6 7 8 9 10
0 1 1 27 GF Preparation FC 0
1 1 1 27 1 0
2 1 1 28 GF Preparation TC 0
3 1 1 28 1 0
4 1 1 29 GF General Dining FC 0
5 1 1 29 1 0
6 1 1 30 GF General Dining TC 0
7 1 1 30 1 0
8 1 1 31 GF General Dining TC 0
9 1 1 31 1 0
10 1 1 32 28AE041 0
11 1 1 32 1 0
12 1 1 33 254DE82 0
13 1 1 33 1 0
14 1 1 34 2539D70 0
15 1 1 34 1 0
16 1 1 35 24CFA4A 0
17 1 1 35 1 0
18 1 1 36 28F023E 0
19 1 1 36 1 0
20 1 1 37 2717831 0
21 1 1 37 1 0
22 1 1 38 2397D75 0
23 1 1 38 1 0
24 1 1 39 287844C 0
25 1 1 39 1 0
26 1 1 40 28784F0 0
27 1 1 40 1 0
28 1 1 41 2865B67 0
29 1 1 41 1 0
30 1 1 42 2865998 0
31 1 1 42 1 0
32 1 1 43 287852F 0
33 1 1 43 1 0
34 1 1 44 287AC43 0
35 1 1 44 1 0
36 1 1 45 287ACF8 0
37 1 1 45 1 0
38 1 1 46 2878586 0
39 1 1 46 1 0
40 1 1 47 2878474 0
41 1 1 47 1 0
42 1 1 48 2846315 0

Get next value from range after reaching specific multiples

I have a range of values i iterating through the number of hours in a year (8760) starting at 1. For every hour, the variable hour increments by 1 until it reaches 24 where it restarts. The variable year_day increments by 1 after every 24 hours is reached. Eg
i hour year_day
1 1 1
2 2 1
3 3 1
...
23 23 1
24 1 2
25 2 2
...
47 24 2
48 1 3
49 2 3
I'm struggling to make it so that when i = 24, hour also is 24 and year_day remains at 1. Then when i is the next value directly after a multiple is found, the hour restarts at 1 and year_day increments by 1. In other words, everytime it reaches midnight, the hour = 24 and year_day is still the previous day. Eg
i hour year_day
23 23 1
24 24 1
25 1 2
...
47 23 2
48 24 2
49 1 3
Here is the code:
hour = 0
year_day = 1
for i in range(1, 8761):
hour = hour + 1
if i % 24 == 0:
hour = 1
year_day = year_day + 1
print(i, hour, year_day)
Your code is ok, you just need to start with hour=1 and print before the if statement. Try the following:
hour = 1
year_day = 1
for i in range(1, 8761):
print(i, hour, year_day)
hour+=1
if i % 24 == 0:
hour = 1
year_day = year_day + 1
Output:
...
21 21 1
22 22 1
23 23 1
24 24 1
25 1 2
26 2 2
27 3 2
...
I have used a pandas approach to this question. The code is as follows:
import numpy as np
import pandas as pd
i = list(range(1,50))
df = pd.DataFrame(i, columns=["i"])
df["hours"] = df["i"]%24
df["hours"][df["hours"]==0] = 24
df["days"] = (df["i"]//24.1+1).astype(int)
display(df)
The output is:
i hours days
0 1 1 1
1 2 2 1
2 3 3 1
3 4 4 1
4 5 5 1
5 6 6 1
6 7 7 1
7 8 8 1
8 9 9 1
9 10 10 1
10 11 11 1
11 12 12 1
12 13 13 1
13 14 14 1
14 15 15 1
15 16 16 1
16 17 17 1
17 18 18 1
18 19 19 1
19 20 20 1
20 21 21 1
21 22 22 1
22 23 23 1
23 24 24 1
24 25 1 2
25 26 2 2
26 27 3 2
27 28 4 2
28 29 5 2
29 30 6 2
30 31 7 2
31 32 8 2
32 33 9 2
33 34 10 2
34 35 11 2
35 36 12 2
36 37 13 2
37 38 14 2
38 39 15 2
39 40 16 2
40 41 17 2
41 42 18 2
42 43 19 2
43 44 20 2
44 45 21 2
45 46 22 2
46 47 23 2
47 48 24 2
48 49 1 3
hour = 0
year_day = 1
for i in range(1, 8761):
if i % 24 == 0:
hour = 0
year_day += 1
hour += 1
print(i, hour, year_day)
Returns:
20 20 1
. . .
24 1 2
25 2 2
. . .
46 23 2
47 24 2
48 1 3

assign a number id for every 4 rows in pandas dataframe

I have a pandas dataframe like this:
pd.DataFrame({'week': ['2019-w01', '2019-w02','2019-w03','2019-w04',
'2019-w05','2019-w06','2019-w07','2019-w08',
'2019-w9','2019-w10','2019-w11','2019-w12'],
'value': [11,22,33,34,57,88,2,9,10,1,76,14],
'period': [1,1,1,1,2,2,2,2,3,3,3,3]})
week value
0 2019-w1 11
1 2019-w2 22
2 2019-w3 33
3 2019-w4 34
4 2019-w5 57
5 2019-w6 88
6 2019-w7 2
7 2019-w8 9
8 2019-w9 10
9 2019-w10 1
10 2019-w11 76
11 2019-w12 14
what I need is like below. I would like to assign a period ID every 4-week interval.
week value period
0 2019-w01 11 1
1 2019-w02 22 1
2 2019-w03 33 1
3 2019-w04 34 1
4 2019-w05 57 2
5 2019-w06 88 2
6 2019-w07 2 2
7 2019-w08 9 2
8 2019-w9 10 3
9 2019-w10 1 3
10 2019-w11 76 3
11 2019-w12 14 3
what is the best way to achieve that? Thanks.
try with:
df['period']=(pd.to_numeric(df['week'].str.split('-').str[-1]
.str.replace('w',''))//4).shift(fill_value=0).add(1)
print(df)
week value period
0 2019-w01 11 1
1 2019-w02 22 1
2 2019-w03 33 1
3 2019-w04 34 1
4 2019-w05 57 2
5 2019-w06 88 2
6 2019-w07 2 2
7 2019-w08 9 2
8 2019-w9 10 3
9 2019-w10 1 3
10 2019-w11 76 3
11 2019-w12 14 3

Defining Target based on two column values

I am new to python and I was facing some issue solving the following problem.
I have the following dataframe:
SoldDate CountSoldperMonth
2019-06-01 20
5
10
12
33
16
50
27
2019-05-01 2
5
11
13
2019-04-01 32
35
39
42
47
55
61
80
I need to add a Target column such that for the top 5 values in 'CountSoldperMonth' for a particular SoldDate, target should be 1 else 0. If the number of rows in 'CountSoldperMonth' for a particular 'SoldDate' is less than 5 then only the row with highest count will be marked as 1 in the Target and rest as 0. The resulting dataframe should look as below.
SoldDate CountSoldperMonth Target
2019-06-01 20 1
5 0
10 0
12 0
33 1
16 1
50 1
27 1
2019-05-01 2 0
5 0
11 0
13 1
2019-04-01 32 0
35 0
39 0
42 1
47 1
55 1
61 1
80 1
How do I do this?
In your case , using groupby with your rules chain with apply if...else
df.groupby('SoldDate').CountSoldperMonth.\
apply(lambda x : x==max(x) if len(x)<=5 else x.isin(sorted(x)[-5:])).astype(int)
Out[346]:
0 1
1 0
2 0
3 0
4 1
5 1
6 1
7 1
8 0
9 0
10 0
11 1
12 0
13 0
14 0
15 1
16 1
17 1
18 1
19 1
Name: CountSoldperMonth, dtype: int32

Pandas DataFrame Return Value from Column Index

I have a dataframe that has values of the different column numbers for another dataframe. Is there a way that I can just return the value from the other dataframe instead of just having the column index.
I basically want to match up the index between the Push and df dataframes. The values in the Push dataframe contain what column I want to return from the df dataframe.
Push dataframe:
0 1
0 1 2
1 0 3
2 0 3
3 1 3
4 0 2
df dataframe:
0 1 2 3 4
0 10 11 22 33 44
1 10 11 22 33 44
2 10 11 22 33 44
3 10 11 22 33 44
4 10 11 22 33 44
return:
0 1
0 11 22
1 10 33
2 10 33
3 11 33
4 10 22
You can do it with np.take ; However this function works on the flattened array. push must be shift like that :
In [285]: push1 = push.values+np.arange(0,25,5)[:,None]
In [229]: pd.DataFrame(df.values.take(push1))
EDIT
No, I just reinvent np.choose :
In [24]: df
Out[24]:
0 1 2 3 4
0 0 1 2 3 4
1 10 11 12 13 14
2 20 21 22 23 24
3 30 31 32 33 34
4 40 41 42 43 44
In [25]: push
Out[25]:
0 1
0 1 2
1 0 3
2 0 3
3 1 3
4 0 2
In [27]: np.choose(push.T,df).T
Out[27]:
0 1
0 1 2
1 10 13
2 20 23
3 31 33
4 40 42
We using melt then replace notice (df1 is your push , df2 is your df)
df1.astype(str).replace(df2.melt().drop_duplicates().set_index('variable').value.to_dict())
Out[31]:
0 1
0 11 22
1 10 33
2 10 33
3 11 33
4 10 22

Categories

Resources