I have a data table that looks like this:
ID ARRIVAL_DATE_TIME DISPOSITION_DATE
1 2021-11-07 08:35:00 2021-11-07 17:58:00
2 2021-11-07 13:16:00 2021-11-08 02:52:00
3 2021-11-07 15:12:00 2021-11-07 21:08:00
I want to be able to count the number of patients in our location by date/hour and by hour of day. I imagine I would eventually have to transform this data into the format seen below and then create a pivot table, but I'm not sure how to do that first transformation. So, for example, ID 1 would have a row for each hour between '2021-11-07 08:35:00' and '2021-11-07 17:58:00'.
ID DATE_HOUR_IN_ED HOUR_IN_ED
1 2021-11-07 08:00:00 8:00
1 2021-11-07 09:00:00 9:00
1 2021-11-07 10:00:00 10:00
1 2021-11-07 11:00:00 11:00
...
2 2021-11-07 13:00:00 13:00
2 2021-11-07 14:00:00 14:00
2 2021-11-07 15:00:00 15:00
....
Use to_datetime with Series.dt.floor to remove the minutes, then concat Series built from a date_range per row, and finally create the DataFrame with the constructor:
df['ARRIVAL_DATE_TIME'] = pd.to_datetime(df['ARRIVAL_DATE_TIME']).dt.floor('H')
s = pd.concat([pd.Series(r.ID,pd.date_range(r.ARRIVAL_DATE_TIME,
r.DISPOSITION_DATE, freq='H'))
for r in df.itertuples()])
df1 = pd.DataFrame({'ID':s.to_numpy(),
'DATE_HOUR_IN_ED':s.index,
'HOUR_IN_ED': s.index.strftime('%H:%M')})
print (df1)
ID DATE_HOUR_IN_ED HOUR_IN_ED
0 1 2021-11-07 08:00:00 08:00
1 1 2021-11-07 09:00:00 09:00
2 1 2021-11-07 10:00:00 10:00
3 1 2021-11-07 11:00:00 11:00
4 1 2021-11-07 12:00:00 12:00
5 1 2021-11-07 13:00:00 13:00
6 1 2021-11-07 14:00:00 14:00
7 1 2021-11-07 15:00:00 15:00
8 1 2021-11-07 16:00:00 16:00
9 1 2021-11-07 17:00:00 17:00
10 2 2021-11-07 13:00:00 13:00
11 2 2021-11-07 14:00:00 14:00
12 2 2021-11-07 15:00:00 15:00
13 2 2021-11-07 16:00:00 16:00
14 2 2021-11-07 17:00:00 17:00
15 2 2021-11-07 18:00:00 18:00
16 2 2021-11-07 19:00:00 19:00
17 2 2021-11-07 20:00:00 20:00
18 2 2021-11-07 21:00:00 21:00
19 2 2021-11-07 22:00:00 22:00
20 2 2021-11-07 23:00:00 23:00
21 2 2021-11-08 00:00:00 00:00
22 2 2021-11-08 01:00:00 01:00
23 2 2021-11-08 02:00:00 02:00
24 3 2021-11-07 15:00:00 15:00
25 3 2021-11-07 16:00:00 16:00
26 3 2021-11-07 17:00:00 17:00
27 3 2021-11-07 18:00:00 18:00
28 3 2021-11-07 19:00:00 19:00
29 3 2021-11-07 20:00:00 20:00
30 3 2021-11-07 21:00:00 21:00
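The question ultimately asks for a per-hour patient count, which is one groupby away from df1. A minimal self-contained sketch of the whole pipeline (the final census step is my addition, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'ARRIVAL_DATE_TIME': ['2021-11-07 08:35:00', '2021-11-07 13:16:00', '2021-11-07 15:12:00'],
    'DISPOSITION_DATE': ['2021-11-07 17:58:00', '2021-11-08 02:52:00', '2021-11-07 21:08:00'],
})

# Floor arrivals to the hour, then expand each stay into one row per hour present
df['ARRIVAL_DATE_TIME'] = pd.to_datetime(df['ARRIVAL_DATE_TIME']).dt.floor('H')
df['DISPOSITION_DATE'] = pd.to_datetime(df['DISPOSITION_DATE'])
s = pd.concat([pd.Series(r.ID, pd.date_range(r.ARRIVAL_DATE_TIME, r.DISPOSITION_DATE, freq='H'))
               for r in df.itertuples()])
df1 = pd.DataFrame({'ID': s.to_numpy(), 'DATE_HOUR_IN_ED': s.index})

# Census: how many distinct patients were present during each hour
census = df1.groupby('DATE_HOUR_IN_ED')['ID'].nunique()
```

`census` can then be reshaped or plotted directly, which replaces the pivot-table step the question anticipated.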
Alternative solution:
df['ARRIVAL_DATE_TIME'] = pd.to_datetime(df['ARRIVAL_DATE_TIME']).dt.floor('H')
L = [pd.date_range(s,e, freq='H')
for s, e in df[['ARRIVAL_DATE_TIME','DISPOSITION_DATE']].to_numpy()]
df['DATE_HOUR_IN_ED'] = L
df = (df.drop(['ARRIVAL_DATE_TIME','DISPOSITION_DATE'], axis=1)
.explode('DATE_HOUR_IN_ED')
.reset_index(drop=True)
.assign(HOUR_IN_ED = lambda x: x['DATE_HOUR_IN_ED'].dt.strftime('%H:%M')))
Try this:
import pandas as pd
import numpy as np
df = pd.read_excel('test.xls')
df1 = (df.set_index(['ID'])
.assign(DATE_HOUR_IN_ED=lambda x: [pd.date_range(s,d, freq='H')
for s,d in zip(x.ARRIVAL_DATE_TIME, x.DISPOSITION_DATE)])
['DATE_HOUR_IN_ED'].explode()
.reset_index()
)
df1['DATE_HOUR_IN_ED'] = df1['DATE_HOUR_IN_ED'].dt.floor('H')
df1['HOUR_IN_ED'] = df1['DATE_HOUR_IN_ED'].dt.strftime('%H:%M')
print(df1)
Output:
ID DATE_HOUR_IN_ED HOUR_IN_ED
0 1 2021-11-07 08:00:00 08:00
1 1 2021-11-07 09:00:00 09:00
2 1 2021-11-07 10:00:00 10:00
3 1 2021-11-07 11:00:00 11:00
4 1 2021-11-07 12:00:00 12:00
5 1 2021-11-07 13:00:00 13:00
6 1 2021-11-07 14:00:00 14:00
7 1 2021-11-07 15:00:00 15:00
8 1 2021-11-07 16:00:00 16:00
9 1 2021-11-07 17:00:00 17:00
10 2 2021-11-07 13:00:00 13:00
11 2 2021-11-07 14:00:00 14:00
12 2 2021-11-07 15:00:00 15:00
13 2 2021-11-07 16:00:00 16:00
14 2 2021-11-07 17:00:00 17:00
15 2 2021-11-07 18:00:00 18:00
16 2 2021-11-07 19:00:00 19:00
17 2 2021-11-07 20:00:00 20:00
18 2 2021-11-07 21:00:00 21:00
19 2 2021-11-07 22:00:00 22:00
20 2 2021-11-07 23:00:00 23:00
21 2 2021-11-08 00:00:00 00:00
22 2 2021-11-08 01:00:00 01:00
23 2 2021-11-08 02:00:00 02:00
24 3 2021-11-07 15:00:00 15:00
25 3 2021-11-07 16:00:00 16:00
26 3 2021-11-07 17:00:00 17:00
27 3 2021-11-07 18:00:00 18:00
28 3 2021-11-07 19:00:00 19:00
29 3 2021-11-07 20:00:00 20:00
I'm creating a Python program using pandas and datetime libraries that will calculate the pay from my casual job each week, so I can cross reference my bank statement instead of looking through payslips.
The data that I am analysing is from the Google Calendar API that is synced with my work schedule. It prints the events in that particular calendar to a csv file in this format:
   Start             End               Title  Hours
0  02.12.2020 07:00  02.12.2020 16:00  Shift    9.0
1  04.12.2020 18:00  04.12.2020 21:00  Shift    3.0
2  05.12.2020 07:00  05.12.2020 12:00  Shift    5.0
3  06.12.2020 09:00  06.12.2020 18:00  Shift    9.0
4  07.12.2020 19:00  07.12.2020 23:00  Shift    4.0
5  08.12.2020 19:00  08.12.2020 23:00  Shift    4.0
6  09.12.2020 10:00  09.12.2020 15:00  Shift    5.0
As I am a casual at this job, I have to take a few things into consideration, like penalty rates (the base rate, after 6pm on Monday to Friday, Saturday, and Sunday all have different rates). I'm wondering if I can analyse this csv using datetime and calculate how many hours are before 6pm and how many after. Using this as an example, the output would be like:
   Start             End               Title  Hours
1  04.12.2020 15:00  04.12.2020 21:00  Shift    6.0
   Start             End               Title  Total Hours  Hours before 3pm  Hours after 3pm
1  04.12.2020 15:00  04.12.2020 21:00  Shift          6.0               3.0              3.0
I can use this to get the day of the week but I'm just not sure how to analyse certain bits of time for penalty rates:
df['day_of_week'] = df['Start'].dt.day_name()
I appreciate any help in Python or even other coding languages/techniques this can be applied to:)
Edit:
This is how my dataframe is looking at the moment
   Start                End                  Title  Hours  day_of_week     Pay  week_of_year
0  2020-12-02 07:00:00  2020-12-02 16:00:00  Shift    9.0    Wednesday  337.30            49
EDIT
In response to David Erickson's comment.
    value                variable   bool
0   2020-12-02 07:00:00  Start     False
1   2020-12-02 08:00:00  Start     False
2   2020-12-02 09:00:00  Start     False
3   2020-12-02 10:00:00  Start     False
4   2020-12-02 11:00:00  Start     False
5   2020-12-02 12:00:00  Start     False
6   2020-12-02 13:00:00  Start     False
7   2020-12-02 14:00:00  Start     False
8   2020-12-02 15:00:00  Start     False
9   2020-12-02 16:00:00  End       False
10  2020-12-04 18:00:00  Start     False
11  2020-12-04 19:00:00  Start      True
12  2020-12-04 20:00:00  Start      True
13  2020-12-04 21:00:00  End        True
14  2020-12-05 07:00:00  Start     False
15  2020-12-05 08:00:00  Start     False
16  2020-12-05 09:00:00  Start     False
17  2020-12-05 10:00:00  Start     False
18  2020-12-05 11:00:00  Start     False
19  2020-12-05 12:00:00  End       False
20  2020-12-06 09:00:00  Start     False
21  2020-12-06 10:00:00  Start     False
22  2020-12-06 11:00:00  Start     False
23  2020-12-06 12:00:00  Start     False
24  2020-12-06 13:00:00  Start     False
25  2020-12-06 14:00:00  Start     False
26  2020-12-06 15:00:00  Start     False
27  2020-12-06 16:00:00  Start     False
28  2020-12-06 17:00:00  Start     False
29  2020-12-06 18:00:00  End       False
30  2020-12-07 19:00:00  Start     False
31  2020-12-07 20:00:00  Start      True
32  2020-12-07 21:00:00  Start      True
33  2020-12-07 22:00:00  Start      True
34  2020-12-07 23:00:00  End        True
35  2020-12-08 19:00:00  Start     False
36  2020-12-08 20:00:00  Start      True
37  2020-12-08 21:00:00  Start      True
38  2020-12-08 22:00:00  Start      True
39  2020-12-08 23:00:00  End        True
40  2020-12-09 10:00:00  Start     False
41  2020-12-09 11:00:00  Start     False
42  2020-12-09 12:00:00  Start     False
43  2020-12-09 13:00:00  Start     False
44  2020-12-09 14:00:00  Start     False
45  2020-12-09 15:00:00  End       False
46  2020-12-11 19:00:00  Start     False
47  2020-12-11 20:00:00  Start      True
48  2020-12-11 21:00:00  Start      True
49  2020-12-11 22:00:00  Start      True
UPDATE: (2020-12-19)
I have simply filtered out the Start rows, as you were correct that an extra row was being calculated. Also, I passed dayfirst=True to pd.to_datetime() to convert the dates correctly. I have also cleaned up the output with some extra columns.
higher_pay = 40
lower_pay = 30
df['Start'], df['End'] = pd.to_datetime(df['Start'], dayfirst=True), pd.to_datetime(df['End'], dayfirst=True)
start = df['Start']
df1 = df[['Start', 'End']].melt(value_name='Date').set_index('Date')
s = df1.groupby('variable').cumcount()
df1 = df1.groupby(s, group_keys=False).resample('1H').asfreq().join(s.rename('Shift').to_frame()).ffill().reset_index()
df1 = df1[~df1['Date'].isin(start)]
df1['Day'] = df1['Date'].dt.day_name()
df1['Week'] = df1['Date'].dt.isocalendar().week
m = (df1['Date'].dt.hour > 18) | (df1['Day'].isin(['Saturday', 'Sunday']))
df1['Higher Pay Hours'] = np.where(m, 1, 0)
df1['Lower Pay Hours'] = np.where(m, 0, 1)
df1['Pay'] = np.where(m, higher_pay, lower_pay)
df1 = df1.groupby(['Shift', 'Day', 'Week']).sum().reset_index()
df2 = df.merge(df1, how='left', left_index=True, right_on='Shift').drop('Shift', axis=1)
df2
Out[1]:
Start End Title Hours Day Week \
0 2020-12-02 07:00:00 2020-12-02 16:00:00 Shift 9.0 Wednesday 49
1 2020-12-04 18:00:00 2020-12-04 21:00:00 Shift 3.0 Friday 49
2 2020-12-05 07:00:00 2020-12-05 12:00:00 Shift 5.0 Saturday 49
3 2020-12-06 09:00:00 2020-12-06 18:00:00 Shift 9.0 Sunday 49
4 2020-12-07 19:00:00 2020-12-07 23:00:00 Shift 4.0 Monday 50
5 2020-12-08 19:00:00 2020-12-08 23:00:00 Shift 4.0 Tuesday 50
6 2020-12-09 10:00:00 2020-12-09 15:00:00 Shift 5.0 Wednesday 50
Higher Pay Hours Lower Pay Hours Pay
0 0 9 270
1 3 0 120
2 5 0 200
3 9 0 360
4 4 0 160
5 4 0 160
6 0 5 150
There are probably more concise ways to do this, but I thought resampling the dataframe and then counting the hours would be a clean approach. You can melt the dataframe to get Start and End into the same column, then fill in the gap hours with resample, making sure to group by the 'Start' and 'End' values that were initially on the same row. The easiest way to figure out which rows were initially together is to take the cumulative count (cumcount) of the values in the new dataframe grouped by 'Start' and 'End'. I'll show how this works later in the answer.
Full Code:
df['Start'], df['End'] = pd.to_datetime(df['Start']), pd.to_datetime(df['End'])
df = df[['Start', 'End']].melt().set_index('value')
df = df.groupby(df.groupby('variable').cumcount(), group_keys=False).resample('1H').asfreq().ffill().reset_index()
m = (df['value'].dt.hour > 18) | (df['value'].dt.day_name().isin(['Saturday', 'Sunday']))
print('Normal Rate No. of Hours', df[~m].shape[0])
print('Higher Rate No. of Hours', df[m].shape[0])
Normal Rate No. of Hours 26
Higher Rate No. of Hours 20
Adding some more details...
Step 1: Melt the dataframe: You only need two columns 'Start' and 'End' to get your desired output
df = df[['Start', 'End']].melt().set_index('value')
df
Out[1]:
variable
value
2020-02-12 07:00:00 Start
2020-04-12 18:00:00 Start
2020-05-12 07:00:00 Start
2020-06-12 09:00:00 Start
2020-07-12 19:00:00 Start
2020-08-12 19:00:00 Start
2020-09-12 10:00:00 Start
2020-02-12 16:00:00 End
2020-04-12 21:00:00 End
2020-05-12 12:00:00 End
2020-06-12 18:00:00 End
2020-07-12 23:00:00 End
2020-08-12 23:00:00 End
2020-09-12 15:00:00 End
Step 2: Create groups in preparation for resample: as you can see, groups 0-6 line up with each other, pairing each 'Start' with the 'End' it was originally on the same row with.
df.groupby('variable').cumcount()
Out[2]:
value
2020-02-12 07:00:00 0
2020-04-12 18:00:00 1
2020-05-12 07:00:00 2
2020-06-12 09:00:00 3
2020-07-12 19:00:00 4
2020-08-12 19:00:00 5
2020-09-12 10:00:00 6
2020-02-12 16:00:00 0
2020-04-12 21:00:00 1
2020-05-12 12:00:00 2
2020-06-12 18:00:00 3
2020-07-12 23:00:00 4
2020-08-12 23:00:00 5
2020-09-12 15:00:00 6
Step 3: Resample the data per group by hour to fill in the gaps for each group:
df.groupby(df.groupby('variable').cumcount(), group_keys=False).resample('1H').asfreq().ffill().reset_index()
Out[3]:
value variable
0 2020-02-12 07:00:00 Start
1 2020-02-12 08:00:00 Start
2 2020-02-12 09:00:00 Start
3 2020-02-12 10:00:00 Start
4 2020-02-12 11:00:00 Start
5 2020-02-12 12:00:00 Start
6 2020-02-12 13:00:00 Start
7 2020-02-12 14:00:00 Start
8 2020-02-12 15:00:00 Start
9 2020-02-12 16:00:00 End
10 2020-04-12 18:00:00 Start
11 2020-04-12 19:00:00 Start
12 2020-04-12 20:00:00 Start
13 2020-04-12 21:00:00 End
14 2020-05-12 07:00:00 Start
15 2020-05-12 08:00:00 Start
16 2020-05-12 09:00:00 Start
17 2020-05-12 10:00:00 Start
18 2020-05-12 11:00:00 Start
19 2020-05-12 12:00:00 End
20 2020-06-12 09:00:00 Start
21 2020-06-12 10:00:00 Start
22 2020-06-12 11:00:00 Start
23 2020-06-12 12:00:00 Start
24 2020-06-12 13:00:00 Start
25 2020-06-12 14:00:00 Start
26 2020-06-12 15:00:00 Start
27 2020-06-12 16:00:00 Start
28 2020-06-12 17:00:00 Start
29 2020-06-12 18:00:00 End
30 2020-07-12 19:00:00 Start
31 2020-07-12 20:00:00 Start
32 2020-07-12 21:00:00 Start
33 2020-07-12 22:00:00 Start
34 2020-07-12 23:00:00 End
35 2020-08-12 19:00:00 Start
36 2020-08-12 20:00:00 Start
37 2020-08-12 21:00:00 Start
38 2020-08-12 22:00:00 Start
39 2020-08-12 23:00:00 End
40 2020-09-12 10:00:00 Start
41 2020-09-12 11:00:00 Start
42 2020-09-12 12:00:00 Start
43 2020-09-12 13:00:00 Start
44 2020-09-12 14:00:00 Start
45 2020-09-12 15:00:00 End
Step 4: From there, you can calculate the boolean series I have called m. True values represent the conditions for the "Higher Rate" being met.
m = (df['value'].dt.hour > 18) | (df['value'].dt.day_name().isin(['Saturday', 'Sunday']))
m
Out[4]:
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 True
11 True
12 True
13 True
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
30 True
31 True
32 True
33 True
34 True
35 True
36 True
37 True
38 True
39 True
40 True
41 True
42 True
43 True
44 True
45 True
Step 5: Filter the dataframe by the mask to count the total hours at the normal rate and at the higher rate, and print the values.
print('Normal Rate No. of Hours', df[~m].shape[0])
print('Higher Rate No. of Hours', df[m].shape[0])
Normal Rate No. of Hours 26
Higher Rate No. of Hours 20
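If you only need the before/after-6pm split per shift, rather than an hour-by-hour expansion, clipping each interval against a 6 pm cutoff avoids resampling entirely. A sketch of my own (not the answer's method), assuming no shift crosses midnight and using a hypothetical two-shift input in the question's day-first date format:

```python
import pandas as pd

# Hypothetical two-shift input, dates day-first as in the question's CSV
shifts = pd.DataFrame({
    'Start': pd.to_datetime(['02.12.2020 07:00', '04.12.2020 18:00'], dayfirst=True),
    'End':   pd.to_datetime(['02.12.2020 16:00', '04.12.2020 21:00'], dayfirst=True),
})

# 6 pm on each shift's own day
cutoff = shifts['Start'].dt.normalize() + pd.Timedelta(hours=18)

# Portion of each interval falling before/after the cutoff, floored at zero
before = (shifts['End'].clip(upper=cutoff) - shifts['Start']).clip(lower=pd.Timedelta(0))
after = (shifts['End'] - shifts['Start'].clip(lower=cutoff)).clip(lower=pd.Timedelta(0))
shifts['hours_before_6pm'] = before.dt.total_seconds() / 3600
shifts['hours_after_6pm'] = after.dt.total_seconds() / 3600
```

This keeps one row per shift, so weekday/weekend rates can still be applied afterwards from the Start column.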
I have a DataFrame where 'B' is a category and 'Boy' is an event. For boys {1, 2, 3, 4}, B = 1 is allotted: Boy 1 uses B for 10 minutes, starting at Start = 12:00 and ending at End = 12:10, and the next boy should start from the previous boy's End time. So for B = 1 there are four samples, and for B = 2 a different four samples.
Input Sample :
B Boy Start End Out
1 1 12:00 12:10 0:10
1 2 12:01 12:11 0:10
1 3 12:02 12:12 0:10
1 4 12:03 12:13 0:10
2 5 12:00 12:10 0:05
2 6 12:01 12:11 0:05
2 7 12:02 12:12 0:05
2 8 12:03 12:13 0:05
3 9 12:00 12:10 0:03
3 10 12:01 12:11 0:03
3 11 12:02 12:12 0:03
3 12 12:03 12:13 0:03
Code Tried :
data_1['End'] = pd.to_datetime(data_1['Start']) + pd.to_timedelta(data_1['Out'])
for i in range(1, len(data_1)):
data_1.loc[i, 'Start'] = data_1.loc[i-1, 'End']
Output :
B Boy Start End Out
1 1 12:00 12:10 0:10
1 2 12:10 12:20 0:10
1 3 12:20 12:30 0:10
1 4 12:30 12:40 0:10
2 5 12:40 12:45 0:05
2 6 12:45 12:50 0:05
2 7 12:50 12:55 0:05
2 8 12:55 13:00 0:05
3 9 13:00 13:03 0:03
3 10 13:03 13:06 0:03
3 11 13:06 13:09 0:03
3 12 13:09 13:12 0:03
Code Failed :
new_Start_time = []
for i,item in data_1.groupby('B'):
temp_list = [item.iloc[0,2]]
list_all = [item.iloc[0,3]]
for j in range(len(list_all)):
temp_list[j+1] = [list_all[j] for i in range(len(list_all) - 1) ]
temp_list.append(temp_list[j])
new_Start_time.extend(temp_list)
data_1['new_Start_time'] = new_Start_time
Error : IndexError: list assignment index out of range
Expected Result :
B Boy Start End Out
1 1 12:00 12:10 0:10
1 2 12:10 12:20 0:10
1 3 12:20 12:30 0:10
1 4 12:30 12:40 0:10
2 5 12:00 12:05 0:05
2 6 12:05 12:10 0:05
2 7 12:10 12:15 0:05
2 8 12:15 12:20 0:05
3 9 12:00 12:03 0:03
3 10 12:03 12:06 0:03
3 11 12:06 12:09 0:03
3 12 12:09 12:12 0:03
Thanks In Advance
I found a solution. It is not the best if your table is really big, but it works.
First I converted the columns to datetime and timedelta:
df["Start"] = pd.to_datetime(df["Start"], format='%H:%M')
df["End"] = pd.to_datetime(df["End"], format='%H:%M')
df["Out"] = pd.to_timedelta("0"+df["Out"]+":00")
Then the code to create the new start and end columns:
new_start =[]
new_end = []
for i, group in df.groupby("B"):
temp_start =[]
temp_end = []
out = group.iloc[0,4]
for j in range(0,group.shape[0]):
if j==0:
temp_start.append(group.iloc[0,2])
temp_end.append(group.iloc[0,2]+out)
else:
temp_start.append(temp_end[j-1])
temp_end.append(temp_start[j]+out)
new_start.extend(temp_start)
new_end.extend(temp_end)
Now update the old start and end columns with the new values:
df["Start"]= new_start
df["End"] = new_end
df
Output:
B Boy Start End Out
0 1 1 1900-01-01 12:00:00 1900-01-01 12:10:00 00:10:00
1 1 2 1900-01-01 12:10:00 1900-01-01 12:20:00 00:10:00
2 1 3 1900-01-01 12:20:00 1900-01-01 12:30:00 00:10:00
3 1 4 1900-01-01 12:30:00 1900-01-01 12:40:00 00:10:00
4 2 5 1900-01-01 12:00:00 1900-01-01 12:05:00 00:05:00
5 2 6 1900-01-01 12:05:00 1900-01-01 12:10:00 00:05:00
6 2 7 1900-01-01 12:10:00 1900-01-01 12:15:00 00:05:00
7 2 8 1900-01-01 12:15:00 1900-01-01 12:20:00 00:05:00
8 3 9 1900-01-01 12:00:00 1900-01-01 12:03:00 00:03:00
9 3 10 1900-01-01 12:03:00 1900-01-01 12:06:00 00:03:00
10 3 11 1900-01-01 12:06:00 1900-01-01 12:09:00 00:03:00
11 3 12 1900-01-01 12:09:00 1900-01-01 12:12:00 00:03:00
You can use:
def toTimeDelta(s):
h = pd.to_timedelta(s.str.split(':').str[0].astype(int), unit='h')
m = pd.to_timedelta(s.str.split(':').str[1].astype(int), unit='m')
return h + m
def fx(s):
s = s.transform(toTimeDelta)
out = s['Out'].copy()
out.iloc[0] += s['Start'].iloc[0]
s['End'] = out.cumsum()
s['Start'].iloc[1:] = s['End'].shift().iloc[1:]
return s
df[['Start', 'End', 'Out']] = df.groupby('B')[['Start', 'End', 'Out']].apply(fx)
Result:
# print(df)
B Boy Start End Out
0 1 1 12:00:00 12:10:00 00:10:00
1 1 2 12:10:00 12:20:00 00:10:00
2 1 3 12:20:00 12:30:00 00:10:00
3 1 4 12:30:00 12:40:00 00:10:00
4 2 5 12:00:00 12:05:00 00:05:00
5 2 6 12:05:00 12:10:00 00:05:00
6 2 7 12:10:00 12:15:00 00:05:00
7 2 8 12:15:00 12:20:00 00:05:00
8 3 9 12:00:00 12:03:00 00:03:00
9 3 10 12:03:00 12:06:00 00:03:00
10 3 11 12:06:00 12:09:00 00:03:00
11 3 12 12:09:00 12:12:00 00:03:00
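A loop-free variant of the same idea: convert Out to timedeltas, take each group's first Start, and chain the intervals with a per-group cumulative sum. A sketch using a hypothetical minimal input (column names follow the question):

```python
import pandas as pd

df = pd.DataFrame({
    'B':     [1, 1, 1, 1, 2, 2, 2, 2],
    'Boy':   [1, 2, 3, 4, 5, 6, 7, 8],
    'Start': ['12:00'] * 8,
    'Out':   ['0:10'] * 4 + ['0:05'] * 4,
})

out = pd.to_timedelta(df['Out'] + ':00')            # '0:10' -> 0 days 00:10:00
# Each group's first Start, as a timedelta since midnight
first = pd.to_timedelta(df.groupby('B')['Start'].transform('first') + ':00')
# Cumulative durations within each B chain the intervals back to back
df['End'] = first + out.groupby(df['B']).cumsum()
df['Start'] = df['End'] - out
```

Working in timedeltas also avoids the dummy 1900-01-01 date that pd.to_datetime attaches to bare times.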
Hi guys and gals,
I need help with dividing this pandas dataframe into night-time and day-time data. Let's assume the night is after 17:00 and before 08:30, and the day is between 08:30 and 17:00.
Date Time Open High Low Close Vol
7 2019-09-02 05:00 11919.9 11929.7 11917.7 11918.9 240
8 2019-09-02 06:00 11920.7 11940.4 11917.7 11927.9 240
9 2019-09-02 07:00 11927.4 11966.2 11927.2 11936.4 240
10 2019-09-02 08:00 11936.9 11955.9 11928.1 11951.4 240
11 2019-09-02 09:00 11951.4 11960.2 11939.4 11954.4 240
12 2019-09-02 10:00 11953.9 11995.9 11951.4 11976.9 240
13 2019-09-02 11:00 11976.7 11979.4 11956.2 11965.9 240
14 2019-09-02 12:00 11966.2 11971.4 11956.4 11965.4 240
15 2019-09-02 13:00 11965.7 11969.7 11943.4 11947.7 240
16 2019-09-02 14:00 11947.4 11962.4 11943.9 11960.7 240
17 2019-09-02 15:00 11960.9 11964.2 11901.2 11934.9 240
18 2019-09-02 16:00 11934.9 11939.7 11921.4 11929.7 240
19 2019-09-02 17:00 11929.9 11940.4 11928.4 11938.2 236
20 2019-09-02 18:00 11937.9 11938.2 11934.7 11938.2 176
21 2019-09-02 19:00 11937.9 11948.7 11937.7 11943.2 196
between_time only shows times for the current date, so that alone doesn't do it.
One idea is to convert the Time column to timedeltas and filter by a boolean mask with Series.between:
mask = (pd.to_timedelta(df['Time'].astype(str).add(':00'))
.between(pd.Timedelta('08:30:00'), pd.Timedelta('17:00:00')))
df1 = df[mask]
print (df1)
Date Time Open High Low Close Vol
11 2019-09-02 09:00 11951.4 11960.2 11939.4 11954.4 240
12 2019-09-02 10:00 11953.9 11995.9 11951.4 11976.9 240
13 2019-09-02 11:00 11976.7 11979.4 11956.2 11965.9 240
14 2019-09-02 12:00 11966.2 11971.4 11956.4 11965.4 240
15 2019-09-02 13:00 11965.7 11969.7 11943.4 11947.7 240
16 2019-09-02 14:00 11947.4 11962.4 11943.9 11960.7 240
17 2019-09-02 15:00 11960.9 11964.2 11901.2 11934.9 240
18 2019-09-02 16:00 11934.9 11939.7 11921.4 11929.7 240
19 2019-09-02 17:00 11929.9 11940.4 11928.4 11938.2 236
df2 = df[~mask]
print (df2)
Date Time Open High Low Close Vol
7 2019-09-02 05:00 11919.9 11929.7 11917.7 11918.9 240
8 2019-09-02 06:00 11920.7 11940.4 11917.7 11927.9 240
9 2019-09-02 07:00 11927.4 11966.2 11927.2 11936.4 240
10 2019-09-02 08:00 11936.9 11955.9 11928.1 11951.4 240
20 2019-09-02 18:00 11937.9 11938.2 11934.7 11938.2 176
21 2019-09-02 19:00 11937.9 11948.7 11937.7 11943.2 196
EDIT:
Another idea uses DataFrame.between_time, but it requires a DatetimeIndex:
df['Datetime'] = pd.to_datetime(df['Date'].astype(str) + ' ' + df['Time'].astype(str))
df = df.set_index('Datetime')
day = df.between_time('08:30','17:00')
night = df[~df.index.isin(day.index)]
I would try something like this; obviously change the times to what you need, but this is the general idea.
In [58]: df = pd.DataFrame({"Time":[
...: "05:00",
...: "06:00",
...: "07:00",
...: "08:00",
...: "09:00",
...: "10:00",
...: "11:00",
...: "12:00",
...: "13:00",
...: "14:00",
...: "15:00",
...: "16:00",
...: "17:00",
...: "18:00",
...: "19:00"]})
In [59]: df = df.set_index(pd.to_datetime(df["Time"]))
In [60]: df
Out[60]:
Time
Time
2019-09-15 05:00:00 05:00
2019-09-15 06:00:00 06:00
2019-09-15 07:00:00 07:00
2019-09-15 08:00:00 08:00
2019-09-15 09:00:00 09:00
2019-09-15 10:00:00 10:00
2019-09-15 11:00:00 11:00
2019-09-15 12:00:00 12:00
2019-09-15 13:00:00 13:00
2019-09-15 14:00:00 14:00
2019-09-15 15:00:00 15:00
2019-09-15 16:00:00 16:00
2019-09-15 17:00:00 17:00
2019-09-15 18:00:00 18:00
2019-09-15 19:00:00 19:00
In [61]: df["time_desc"] = "night"
In [62]: df
Out[62]:
Time time_desc
Time
2019-09-15 05:00:00 05:00 night
2019-09-15 06:00:00 06:00 night
2019-09-15 07:00:00 07:00 night
2019-09-15 08:00:00 08:00 night
2019-09-15 09:00:00 09:00 night
2019-09-15 10:00:00 10:00 night
2019-09-15 11:00:00 11:00 night
2019-09-15 12:00:00 12:00 night
2019-09-15 13:00:00 13:00 night
2019-09-15 14:00:00 14:00 night
2019-09-15 15:00:00 15:00 night
2019-09-15 16:00:00 16:00 night
2019-09-15 17:00:00 17:00 night
2019-09-15 18:00:00 18:00 night
2019-09-15 19:00:00 19:00 night
In [63]: df.loc[df.between_time("06:30", "18:00").index, "time_desc"] = "day"
In [64]: df
Out[64]:
Time time_desc
Time
2019-09-15 05:00:00 05:00 night
2019-09-15 06:00:00 06:00 night
2019-09-15 07:00:00 07:00 day
2019-09-15 08:00:00 08:00 day
2019-09-15 09:00:00 09:00 day
2019-09-15 10:00:00 10:00 day
2019-09-15 11:00:00 11:00 day
2019-09-15 12:00:00 12:00 day
2019-09-15 13:00:00 13:00 day
2019-09-15 14:00:00 14:00 day
2019-09-15 15:00:00 15:00 day
2019-09-15 16:00:00 16:00 day
2019-09-15 17:00:00 17:00 day
2019-09-15 18:00:00 18:00 day
2019-09-15 19:00:00 19:00 night
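The same day/night split can also be done without any index juggling, using a timedelta mask directly on the Time column. A sketch of my own combining the two ideas above (the 08:30 to 17:00 window comes from the question):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Time': ['05:00', '08:00', '08:30', '12:00', '17:00', '18:00']})

# Times as offsets since midnight; no date or index needed
t = pd.to_timedelta(df['Time'] + ':00')
day_mask = (t >= pd.Timedelta('08:30:00')) & (t <= pd.Timedelta('17:00:00'))
df['time_desc'] = np.where(day_mask, 'day', 'night')
```

`df[day_mask]` and `df[~day_mask]` then give the day and night frames in one step.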
I have df data sorted like below:
day_name Day_id
time
2019-05-20 19:00:00 Monday 0
2018-12-31 15:00:00 Monday 0
2019-02-25 17:00:00 Monday 0
2019-05-06 20:00:00 Monday 0
2019-03-12 12:00:00 Tuesday 1
2019-04-16 15:00:00 Tuesday 1
2019-04-02 18:00:00 Tuesday 1
2019-02-05 09:00:00 Tuesday 1
2019-05-28 21:00:00 Tuesday 1
2019-01-15 12:00:00 Tuesday 1
2019-06-04 20:00:00 Tuesday 1
2018-12-04 07:00:00 Tuesday 1
2019-01-22 11:00:00 Tuesday 1
2019-01-09 07:00:00 Wednesday 2
2019-03-06 16:00:00 Wednesday 2
2019-06-19 17:00:00 Wednesday 2
2019-04-10 20:00:00 Wednesday 2
2019-04-24 15:00:00 Wednesday 2
2019-01-31 08:00:00 Thursday 3
2019-01-03 08:00:00 Thursday 3
2019-02-28 19:00:00 Thursday 3
2019-05-23 20:00:00 Thursday 3
2018-12-20 07:00:00 Thursday 3
2019-05-09 19:00:00 Thursday 3
2019-06-28 15:00:00 Friday 4
2019-03-22 12:00:00 Friday 4
2019-03-29 14:00:00 Friday 4
2018-12-15 08:00:00 Saturday 5
2019-02-17 11:00:00 Sunday 6
2019-06-16 19:00:00 Sunday 6
2018-12-02 08:00:00 Sunday 6
Currently, with the help of this post:
df = df.groupby(df.day_name).count().plot(kind="bar")
plt.show()
my output is:
How can I plot the histogram with the days of the week in the proper order: Monday, Tuesday, ...?
I have found several approaches (1, 2, 3) to solve this, but can't find a way to use them in my case.
Thank you all for your hard work.
You need sort=False in groupby:
m = df.groupby(df.day_name,sort=False).count().plot(kind="bar")
plt.show()
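Note that sort=False works here because the frame happens to be pre-sorted by Day_id. If it weren't, one option (a sketch of my own, not from the answer) is to impose an explicit weekday order on the counts with reindex:

```python
import pandas as pd

order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
         'Friday', 'Saturday', 'Sunday']
# Hypothetical day_name column values
day_name = pd.Series(['Sunday', 'Monday', 'Tuesday', 'Monday', 'Friday'])

# Count occurrences, then force Monday-first order; missing days become 0
counts = day_name.value_counts().reindex(order, fill_value=0)
# counts.plot(kind='bar'); plt.show()
```

This is robust to any input order and also shows empty bars for days with no data.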
I have a column of data (in pandas) that has a time of day:
0 8:00 AM
1 11:00 AM
2 8:00 AM
3 4:00 PM
4 9:00 AM
5
6 9:00 AM
7
8 9:00 AM
9
10 9:00 AM
11
12 9:00 AM
13
14 8:00 AM
15 11:00 AM
16 8:00 AM
17 11:00 AM
18 9:00 AM
19
20 9:00 AM
21
22 9:00 AM
23
24 9:00 AM
25
26 9:00 AM
27
28 9:00 AM
I would like to convert this to something similar to this:
0 2015-11-11 08:00:00
1 2015-11-11 11:00:00
2 2015-11-11 08:00:00
3 2015-11-11 16:00:00
4 2015-11-11 09:00:00
5 NaT
6 2015-11-11 09:00:00
7 NaT
8 2015-11-11 09:00:00
9 NaT
10 2015-11-11 09:00:00
11 NaT
12 2015-11-11 09:00:00
13 NaT
14 2015-11-11 08:00:00
15 2015-11-11 11:00:00
16 2015-11-11 08:00:00
17 2015-11-11 11:00:00
18 2015-11-11 09:00:00
19 NaT
20 2015-11-11 09:00:00
21 NaT
22 2015-11-11 09:00:00
23 NaT
24 2015-11-11 09:00:00
25 NaT
26 2015-11-11 09:00:00
27 NaT
28 2015-11-11 09:00:00
29 NaT
But without the date added to it. I am then trying to merge my pandas columns into a single column so I can iterate through it. I have tried combining them with astype(str), with no success in pd.merge.
Any ideas on how to use the to_datetime function in pandas while just keeping it as UTC time?
Considering the following input Data:
data = ['8:00 AM',
'11:00 AM',
'8:00 AM',
'4:00 PM',
'9:00 AM',
'',
'9:00 AM',
'',
'9:00 AM']
Code:
import pandas as pd
x = pd.to_datetime(data).time
pd.Series(x)
Output:
0 08:00:00
1 11:00:00
2 08:00:00
3 16:00:00
4 09:00:00
5 NaN
6 09:00:00
7 NaN
8 09:00:00
dtype: object
If you have other data in another series you would like to join into the same dataframe:
x = pd.Series(x)
y = pd.Series(range(9))
pd.concat([x, y], axis=1)
0 1
0 08:00:00 0
1 11:00:00 1
2 08:00:00 2
Finally, if you prefer the columns merged as strings, try this:
z = pd.concat([x, y], axis=1)
z[0].astype(str) + ' foo ' + z[1].astype(str)
0 08:00:00 foo 0
1 11:00:00 foo 1
2 08:00:00 foo 2
3 16:00:00 foo 3
4 09:00:00 foo 4
5 nan foo 5
6 09:00:00 foo 6
7 nan foo 7
8 09:00:00 foo 8
dtype: object