Python Datetime calculation

I would like to create a function in Python for the following calculation: if the end time of a shift falls between 20:00 and 06:00, it should add an extra 25% (in minutes) for each hour worked after 20:00.
Any suggestions?

UPDATED:
Here is a way to do what I believe your question asks:
from datetime import datetime, timedelta

def getHours(startTime, endTime, extraFraction):
    if endTime < startTime:
        raise ValueError(f'endTime {endTime} is before startTime {startTime}')
    startDateStr = startTime.strftime("%Y-%m-%d")
    # bonus window: 20:00 on the start date until 06:00 the next day
    bonusStartTime = datetime.strptime(startDateStr + " " + "20:00:00", "%Y-%m-%d %H:%M:%S")
    # end of the previous night's window: 06:00 on the start date
    prevBonusEndTime = datetime.strptime(startDateStr + " " + "06:00:00", "%Y-%m-%d %H:%M:%S")
    bonusEndTime = prevBonusEndTime + timedelta(days=1)
    bonusPeriod = timedelta(days=0)
    duration = endTime - startTime
    hours = duration.total_seconds() // 3600
    if hours > 24:
        # every full day contains one complete 10-hour bonus window
        fullDays = hours // 24
        bonusPeriod += fullDays * (bonusEndTime - bonusStartTime)
        endTime -= timedelta(days=fullDays)
    # overlap with the previous night's window (before 06:00 on the start date)
    if startTime < prevBonusEndTime:
        bonusPeriod += prevBonusEndTime - startTime
    if endTime < prevBonusEndTime:
        bonusPeriod -= prevBonusEndTime - endTime
    # overlap with the window that opens at 20:00 on the start date
    if startTime > bonusStartTime:
        bonusPeriod -= startTime - bonusStartTime
    if endTime > bonusStartTime:
        bonusPeriod += min(endTime, bonusEndTime) - bonusStartTime
    delta = duration + bonusPeriod * extraFraction
    return delta
Explanation:
confirm that startTime is not after endTime; otherwise raise an exception
set the following:
prevBonusEndTime as 06:00 on the day of startTime
bonusStartTime as 20:00 on the day of startTime
bonusEndTime as 06:00 on the day after startTime
if endTime is more than 24 hours after startTime, add one full 10-hour bonus window to bonusPeriod for each full day (24-hour period) in the shift, and rewind endTime by that number of days
add to or subtract from bonusPeriod the overlap between startTime, endTime and the intervals 00:00 to prevBonusEndTime and bonusStartTime to bonusEndTime.
Test code:
def testing(start, end):
    print(f'start {start}, end {end}, actual hours {getHours(start, end, 0)}, effective hours {getHours(start, end, 0.25)}')

startTime = datetime.strptime("2022-05-26 06:00:00", "%Y-%m-%d %H:%M:%S")
endTime = startTime
for h in range(0, 48, 3):
    testing(startTime, endTime + timedelta(hours=h))
endTime += timedelta(hours=48)
for h in range(0, 48, 3):
    testing(startTime + timedelta(hours=h), endTime)
Output:
start 2022-05-26 06:00:00, end 2022-05-26 06:00:00, actual hours 0:00:00, effective hours 0:00:00
start 2022-05-26 06:00:00, end 2022-05-26 09:00:00, actual hours 3:00:00, effective hours 3:00:00
start 2022-05-26 06:00:00, end 2022-05-26 12:00:00, actual hours 6:00:00, effective hours 6:00:00
start 2022-05-26 06:00:00, end 2022-05-26 15:00:00, actual hours 9:00:00, effective hours 9:00:00
start 2022-05-26 06:00:00, end 2022-05-26 18:00:00, actual hours 12:00:00, effective hours 12:00:00
start 2022-05-26 06:00:00, end 2022-05-26 21:00:00, actual hours 15:00:00, effective hours 15:15:00
start 2022-05-26 06:00:00, end 2022-05-27 00:00:00, actual hours 18:00:00, effective hours 19:00:00
start 2022-05-26 06:00:00, end 2022-05-27 03:00:00, actual hours 21:00:00, effective hours 22:45:00
start 2022-05-26 06:00:00, end 2022-05-27 06:00:00, actual hours 1 day, 0:00:00, effective hours 1 day, 2:30:00
start 2022-05-26 06:00:00, end 2022-05-27 09:00:00, actual hours 1 day, 3:00:00, effective hours 1 day, 5:30:00
start 2022-05-26 06:00:00, end 2022-05-27 12:00:00, actual hours 1 day, 6:00:00, effective hours 1 day, 8:30:00
start 2022-05-26 06:00:00, end 2022-05-27 15:00:00, actual hours 1 day, 9:00:00, effective hours 1 day, 11:30:00
start 2022-05-26 06:00:00, end 2022-05-27 18:00:00, actual hours 1 day, 12:00:00, effective hours 1 day, 14:30:00
start 2022-05-26 06:00:00, end 2022-05-27 21:00:00, actual hours 1 day, 15:00:00, effective hours 1 day, 17:45:00
start 2022-05-26 06:00:00, end 2022-05-28 00:00:00, actual hours 1 day, 18:00:00, effective hours 1 day, 21:30:00
start 2022-05-26 06:00:00, end 2022-05-28 03:00:00, actual hours 1 day, 21:00:00, effective hours 2 days, 1:15:00
start 2022-05-26 06:00:00, end 2022-05-28 06:00:00, actual hours 2 days, 0:00:00, effective hours 2 days, 5:00:00
start 2022-05-26 09:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 21:00:00, effective hours 2 days, 2:00:00
start 2022-05-26 12:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 18:00:00, effective hours 1 day, 23:00:00
start 2022-05-26 15:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 15:00:00, effective hours 1 day, 20:00:00
start 2022-05-26 18:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 12:00:00, effective hours 1 day, 17:00:00
start 2022-05-26 21:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 9:00:00, effective hours 1 day, 13:45:00
start 2022-05-27 00:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 6:00:00, effective hours 1 day, 10:00:00
start 2022-05-27 03:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 3:00:00, effective hours 1 day, 6:15:00
start 2022-05-27 06:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 0:00:00, effective hours 1 day, 2:30:00
start 2022-05-27 09:00:00, end 2022-05-28 06:00:00, actual hours 21:00:00, effective hours 23:30:00
start 2022-05-27 12:00:00, end 2022-05-28 06:00:00, actual hours 18:00:00, effective hours 20:30:00
start 2022-05-27 15:00:00, end 2022-05-28 06:00:00, actual hours 15:00:00, effective hours 17:30:00
start 2022-05-27 18:00:00, end 2022-05-28 06:00:00, actual hours 12:00:00, effective hours 14:30:00
start 2022-05-27 21:00:00, end 2022-05-28 06:00:00, actual hours 9:00:00, effective hours 11:15:00
start 2022-05-28 00:00:00, end 2022-05-28 06:00:00, actual hours 6:00:00, effective hours 7:30:00
start 2022-05-28 03:00:00, end 2022-05-28 06:00:00, actual hours 3:00:00, effective hours 3:45:00
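As a cross-check of the getHours case analysis, here is a minimal sketch of a more direct approach (not the code above): visit every 20:00 to 06:00 window that can touch the shift and add up the overlaps. night_overlap is a hypothetical name, and the sketch assumes naive datetimes and a fixed window. By my reading of getHours, a shift shorter than 24 hours that starts after 20:00 and runs past 20:00 the next day (for example 22:00 to 21:00) gets 8 bonus hours instead of 9, because the second evening is never visited; summing overlaps window by window avoids that kind of special case.

```python
from datetime import datetime, time, timedelta

def night_overlap(start: datetime, end: datetime,
                  night_start: time = time(20, 0),
                  night_end: time = time(6, 0)) -> timedelta:
    """Total time between start and end that falls inside the nightly window."""
    if end < start:
        raise ValueError(f"end {end} is before start {start}")
    total = timedelta(0)
    # Begin with the window that opened the evening before the start date,
    # because it is still open in the early morning of the start date.
    day = start.date() - timedelta(days=1)
    while datetime.combine(day, night_start) < end:
        window_start = datetime.combine(day, night_start)
        window_end = datetime.combine(day + timedelta(days=1), night_end)
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > timedelta(0):
            total += overlap
        day += timedelta(days=1)
    return total

# 22:00 to 21:00 the next day: 8 hours the first night plus 1 hour after 20:00
shift_start = datetime(2022, 5, 26, 22, 0)
print(night_overlap(shift_start, shift_start + timedelta(hours=23)))  # 9:00:00
```

The effective hours from this update would then be duration + 0.25 * night_overlap(start, end).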
UPDATE #2:
Here is slightly modified code that outputs regular hours, bonus hours (i.e., hours in the bonus window from 20:00 to 06:00) and extra hours (25% * bonus hours):
from datetime import datetime, timedelta

def getRegularAndBonusHours(startTime, endTime):
    if endTime < startTime:
        raise ValueError(f'endTime {endTime} is before startTime {startTime}')
    startDateStr = startTime.strftime("%Y-%m-%d")
    bonusStartTime = datetime.strptime(startDateStr + " " + "20:00:00", "%Y-%m-%d %H:%M:%S")
    prevBonusEndTime = datetime.strptime(startDateStr + " " + "06:00:00", "%Y-%m-%d %H:%M:%S")
    bonusEndTime = prevBonusEndTime + timedelta(days=1)
    bonusPeriod = timedelta(days=0)
    duration = endTime - startTime
    hours = duration.total_seconds() // 3600
    if hours > 24:
        fullDays = hours // 24
        bonusPeriod += fullDays * (bonusEndTime - bonusStartTime)
        endTime -= timedelta(days=fullDays)
    if startTime < prevBonusEndTime:
        bonusPeriod += prevBonusEndTime - startTime
    if endTime < prevBonusEndTime:
        bonusPeriod -= prevBonusEndTime - endTime
    if startTime > bonusStartTime:
        bonusPeriod -= startTime - bonusStartTime
    if endTime > bonusStartTime:
        bonusPeriod += min(endTime, bonusEndTime) - bonusStartTime
    return duration, bonusPeriod

def getHours(startTime, endTime, extraFraction):
    duration, bonusPeriod = getRegularAndBonusHours(startTime, endTime)
    delta = duration + bonusPeriod * extraFraction
    return delta

def testing(start, end):
    duration, bonusPeriod = getRegularAndBonusHours(start, end)
    def getHoursRoundedUp(delta):
        return delta.days * 24 + delta.seconds // 3600 + (1 if delta.seconds % 3600 else 0)
    regularHours, bonusHours = getHoursRoundedUp(duration), getHoursRoundedUp(bonusPeriod)
    print(f'start {start}, end {end}, regular {regularHours}, bonus {bonusHours}, extra {0.25 * bonusHours}')

startTime = datetime.strptime("2022-05-26 06:00:00", "%Y-%m-%d %H:%M:%S")
endTime = startTime
for h in range(0, 48, 3):
    testing(startTime, endTime + timedelta(hours=h))
endTime += timedelta(hours=48)
for h in range(0, 48, 3):
    testing(startTime + timedelta(hours=h), endTime)
Output:
start 2022-05-26 06:00:00, end 2022-05-26 06:00:00, regular 0, bonus 0, extra 0.0
start 2022-05-26 06:00:00, end 2022-05-26 09:00:00, regular 3, bonus 0, extra 0.0
start 2022-05-26 06:00:00, end 2022-05-26 12:00:00, regular 6, bonus 0, extra 0.0
start 2022-05-26 06:00:00, end 2022-05-26 15:00:00, regular 9, bonus 0, extra 0.0
start 2022-05-26 06:00:00, end 2022-05-26 18:00:00, regular 12, bonus 0, extra 0.0
start 2022-05-26 06:00:00, end 2022-05-26 21:00:00, regular 15, bonus 1, extra 0.25
start 2022-05-26 06:00:00, end 2022-05-27 00:00:00, regular 18, bonus 4, extra 1.0
start 2022-05-26 06:00:00, end 2022-05-27 03:00:00, regular 21, bonus 7, extra 1.75
start 2022-05-26 06:00:00, end 2022-05-27 06:00:00, regular 24, bonus 10, extra 2.5
start 2022-05-26 06:00:00, end 2022-05-27 09:00:00, regular 27, bonus 10, extra 2.5
start 2022-05-26 06:00:00, end 2022-05-27 12:00:00, regular 30, bonus 10, extra 2.5
start 2022-05-26 06:00:00, end 2022-05-27 15:00:00, regular 33, bonus 10, extra 2.5
start 2022-05-26 06:00:00, end 2022-05-27 18:00:00, regular 36, bonus 10, extra 2.5
start 2022-05-26 06:00:00, end 2022-05-27 21:00:00, regular 39, bonus 11, extra 2.75
start 2022-05-26 06:00:00, end 2022-05-28 00:00:00, regular 42, bonus 14, extra 3.5
start 2022-05-26 06:00:00, end 2022-05-28 03:00:00, regular 45, bonus 17, extra 4.25
start 2022-05-26 06:00:00, end 2022-05-28 06:00:00, regular 48, bonus 20, extra 5.0
start 2022-05-26 09:00:00, end 2022-05-28 06:00:00, regular 45, bonus 20, extra 5.0
start 2022-05-26 12:00:00, end 2022-05-28 06:00:00, regular 42, bonus 20, extra 5.0
start 2022-05-26 15:00:00, end 2022-05-28 06:00:00, regular 39, bonus 20, extra 5.0
start 2022-05-26 18:00:00, end 2022-05-28 06:00:00, regular 36, bonus 20, extra 5.0
start 2022-05-26 21:00:00, end 2022-05-28 06:00:00, regular 33, bonus 19, extra 4.75
start 2022-05-27 00:00:00, end 2022-05-28 06:00:00, regular 30, bonus 16, extra 4.0
start 2022-05-27 03:00:00, end 2022-05-28 06:00:00, regular 27, bonus 13, extra 3.25
start 2022-05-27 06:00:00, end 2022-05-28 06:00:00, regular 24, bonus 10, extra 2.5
start 2022-05-27 09:00:00, end 2022-05-28 06:00:00, regular 21, bonus 10, extra 2.5
start 2022-05-27 12:00:00, end 2022-05-28 06:00:00, regular 18, bonus 10, extra 2.5
start 2022-05-27 15:00:00, end 2022-05-28 06:00:00, regular 15, bonus 10, extra 2.5
start 2022-05-27 18:00:00, end 2022-05-28 06:00:00, regular 12, bonus 10, extra 2.5
start 2022-05-27 21:00:00, end 2022-05-28 06:00:00, regular 9, bonus 9, extra 2.25
start 2022-05-28 00:00:00, end 2022-05-28 06:00:00, regular 6, bonus 6, extra 1.5
start 2022-05-28 03:00:00, end 2022-05-28 06:00:00, regular 3, bonus 3, extra 0.75
UPDATE #3
Latest clarification from OP in a comment indicates:
A need to record in Excel the allowances received for night work
The goal in the Excel sheet is to enter separately the start time, end time, working time (without supplement), and night-work supplement (25% from 20:00 to 06:00) for each hour of night work started.
Here is updated code to create the required data, and optionally to use a pandas DataFrame to write it to an Excel file. Test inputs are used to explore a range of start and end times, including partial hours:
from datetime import datetime, timedelta

def getRegularAndBonusHours(startTime, endTime):
    if endTime < startTime:
        raise ValueError(f'endTime {endTime} is before startTime {startTime}')
    startDateStr = startTime.strftime("%Y-%m-%d")
    bonusStartTime = datetime.strptime(startDateStr + " " + "20:00:00", "%Y-%m-%d %H:%M:%S")
    prevBonusEndTime = datetime.strptime(startDateStr + " " + "06:00:00", "%Y-%m-%d %H:%M:%S")
    bonusEndTime = prevBonusEndTime + timedelta(days=1)
    bonusPeriod = timedelta(days=0)
    duration = endTime - startTime
    hours = duration.total_seconds() // 3600
    if hours > 24:
        fullDays = hours // 24
        bonusPeriod += fullDays * (bonusEndTime - bonusStartTime)
        endTime -= timedelta(days=fullDays)
    if startTime < prevBonusEndTime:
        bonusPeriod += prevBonusEndTime - startTime
    if endTime < prevBonusEndTime:
        bonusPeriod -= prevBonusEndTime - endTime
    if startTime > bonusStartTime:
        bonusPeriod -= startTime - bonusStartTime
    if endTime > bonusStartTime:
        bonusPeriod += min(endTime, bonusEndTime) - bonusStartTime
    return duration, bonusPeriod

def testing(start, end):
    duration, bonusPeriod = getRegularAndBonusHours(start, end)
    def getHoursFromDelta(delta, roundUp=False):
        # roundUp=True counts every started hour as a full hour
        if roundUp:
            return delta.days * 24 + delta.seconds // 3600 + (1 if delta.seconds % 3600 else 0)
        # total_seconds() includes the days part, so shifts over 24 hours are not truncated
        return delta.total_seconds() / 3600
    # working time without supplement, and night hours started (rounded up)
    workingHours, bonusHours = getHoursFromDelta(duration), getHoursFromDelta(bonusPeriod, True)
    return start, end, workingHours, bonusHours * 0.25

# calculate test results
results = []
startTime = datetime.strptime("2022-05-26 06:00:00", "%Y-%m-%d %H:%M:%S")
endTime = startTime
for halfHours in range(0, 2 * 48, 5):
    results.append(testing(startTime, endTime + timedelta(hours=halfHours / 2)))
endTime += timedelta(hours=48)
for halfHours in range(0, 2 * 48, 5):
    results.append(testing(startTime + timedelta(hours=halfHours / 2), endTime))

# print results in fixed-width columns
headings = ['Start Time', 'End Time', 'Working Hours', '25% of Supplemental Hours Started']
print(''.join(f'{x:30}' for x in headings))
for row in results:
    print(''.join(f'{str(x):30}' for x in row))

# OPTIONAL: save results in pandas dataframe and save as Excel file
# (column_dimensions is an openpyxl worksheet attribute, so that engine is requested explicitly)
import pandas as pd
df = pd.DataFrame(results, columns=headings)
print(df)
with pd.ExcelWriter('TestTimesheet.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, index=False, sheet_name='Timesheet')
    ws = writer.sheets['Timesheet']
    for column in df:
        column_length = max(df[column].astype(str).map(len).max(), len(column))
        col_idx = df.columns.get_loc(column)
        ws.column_dimensions[chr(ord('A') + col_idx)].width = column_length
Output:
Start Time End Time Working Hours 25% of Supplemental Hours Started
0 2022-05-26 06:00:00 2022-05-26 06:00:00 0.0 0.00
1 2022-05-26 06:00:00 2022-05-26 08:30:00 2.5 0.00
2 2022-05-26 06:00:00 2022-05-26 11:00:00 5.0 0.00
3 2022-05-26 06:00:00 2022-05-26 13:30:00 7.5 0.00
4 2022-05-26 06:00:00 2022-05-26 16:00:00 10.0 0.00
5 2022-05-26 06:00:00 2022-05-26 18:30:00 12.5 0.00
6 2022-05-26 06:00:00 2022-05-26 21:00:00 15.0 0.25
7 2022-05-26 06:00:00 2022-05-26 23:30:00 17.5 1.00
8 2022-05-26 06:00:00 2022-05-27 02:00:00 20.0 1.50
9 2022-05-26 06:00:00 2022-05-27 04:30:00 22.5 2.25
10 2022-05-26 06:00:00 2022-05-27 07:00:00 25.0 2.50
11 2022-05-26 06:00:00 2022-05-27 09:30:00 27.5 2.50
12 2022-05-26 06:00:00 2022-05-27 12:00:00 30.0 2.50
13 2022-05-26 06:00:00 2022-05-27 14:30:00 32.5 2.50
14 2022-05-26 06:00:00 2022-05-27 17:00:00 35.0 2.50
15 2022-05-26 06:00:00 2022-05-27 19:30:00 37.5 2.50
16 2022-05-26 06:00:00 2022-05-27 22:00:00 40.0 3.00
17 2022-05-26 06:00:00 2022-05-28 00:30:00 42.5 3.75
18 2022-05-26 06:00:00 2022-05-28 03:00:00 45.0 4.25
19 2022-05-26 06:00:00 2022-05-28 05:30:00 47.5 5.00
20 2022-05-26 06:00:00 2022-05-28 06:00:00 48.0 5.00
21 2022-05-26 08:30:00 2022-05-28 06:00:00 45.5 5.00
22 2022-05-26 11:00:00 2022-05-28 06:00:00 43.0 5.00
23 2022-05-26 13:30:00 2022-05-28 06:00:00 40.5 5.00
24 2022-05-26 16:00:00 2022-05-28 06:00:00 38.0 5.00
25 2022-05-26 18:30:00 2022-05-28 06:00:00 35.5 5.00
26 2022-05-26 21:00:00 2022-05-28 06:00:00 33.0 4.75
27 2022-05-26 23:30:00 2022-05-28 06:00:00 30.5 4.25
28 2022-05-27 02:00:00 2022-05-28 06:00:00 28.0 3.50
29 2022-05-27 04:30:00 2022-05-28 06:00:00 25.5 3.00
30 2022-05-27 07:00:00 2022-05-28 06:00:00 23.0 2.50
31 2022-05-27 09:30:00 2022-05-28 06:00:00 20.5 2.50
32 2022-05-27 12:00:00 2022-05-28 06:00:00 18.0 2.50
33 2022-05-27 14:30:00 2022-05-28 06:00:00 15.5 2.50
34 2022-05-27 17:00:00 2022-05-28 06:00:00 13.0 2.50
35 2022-05-27 19:30:00 2022-05-28 06:00:00 10.5 2.50
36 2022-05-27 22:00:00 2022-05-28 06:00:00 8.0 2.00
37 2022-05-28 00:30:00 2022-05-28 06:00:00 5.5 1.50
38 2022-05-28 03:00:00 2022-05-28 06:00:00 3.0 0.75
39 2022-05-28 05:30:00 2022-05-28 06:00:00 0.5 0.25
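In this update, "for each hour started" is implemented by rounding the night time up to whole hours before multiplying by 0.25. Isolated as a small helper, that step could be sketched as follows; night_supplement is a hypothetical name, and its input is assumed to be the night time as a timedelta, such as the bonusPeriod returned by getRegularAndBonusHours.

```python
import math
from datetime import timedelta

def night_supplement(night_time: timedelta, rate: float = 0.25) -> float:
    """Supplement in hours, counting every started night hour as a full hour."""
    started_hours = math.ceil(night_time.total_seconds() / 3600)
    return started_hours * rate

# 4.5 night hours count as 5 started hours -> 5 * 0.25
print(night_supplement(timedelta(hours=4, minutes=30)))  # 1.25
```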

from datetime import datetime, timedelta

# soup_datum (a date string in dd.mm.yyyy format) and endingtime (a datetime)
# are defined elsewhere in the poster's script and are not shown here
soup_dienstbegin = '20:00'
soup_dienstende = '23:00'
startinfo = f'{soup_datum} {soup_dienstbegin}'
IcsStartData = datetime.strptime(startinfo, "%d.%m.%Y %H:%M")
endTime = endingtime
startTime = IcsStartData

def zeit_zuschlag_function(startTime, endTime):
    if endTime < startTime:
        raise ValueError(f'endTime {endTime} is before startTime {startTime}')
    startDateStr = startTime.strftime("%d.%m.%Y")
    # supplement window: 22:00 on the start date until midnight
    zeit_zuschlag_start = datetime.strptime(startDateStr + " " + "22:00", "%d.%m.%Y %H:%M")
    # 00:00 on the start date (end of the previous day's window)
    zeit_zuschlag_end = datetime.strptime(startDateStr + " " + "00:00", "%d.%m.%Y %H:%M")
    zeit_zuschlag_time = zeit_zuschlag_end + timedelta(days=1)
    zeit_zuschlag_zeit = timedelta(days=0)
    dienstdauer = endTime - startTime
    hours = dienstdauer.total_seconds() // 3600
    if hours > 24:
        fullDays = hours // 24
        zeit_zuschlag_zeit += fullDays * (zeit_zuschlag_time - zeit_zuschlag_start)
        endTime -= timedelta(days=fullDays)
    if startTime < zeit_zuschlag_end:
        zeit_zuschlag_zeit += zeit_zuschlag_end - startTime
    if endTime < zeit_zuschlag_end:
        zeit_zuschlag_zeit -= zeit_zuschlag_end - endTime
    if startTime > zeit_zuschlag_start:
        zeit_zuschlag_zeit -= startTime - zeit_zuschlag_start
    if endTime > zeit_zuschlag_start:
        zeit_zuschlag_zeit += min(endTime, zeit_zuschlag_time) - zeit_zuschlag_start
    return dienstdauer, zeit_zuschlag_zeit

def nacht_zulagen_stunden(startTime, endTime, extraFraction):
    dienstdauer, zeit_zuschlag_zeit = zeit_zuschlag_function(startTime, endTime)
    delta = dienstdauer + zeit_zuschlag_zeit * extraFraction
    return delta

def nacht_zulagen_executing(start, end):
    dienstdauer, zeit_zuschlag_zeit = zeit_zuschlag_function(start, end)
    def nacht_zulagen_hours_roundeup(delta):
        return delta.days * 24 + delta.seconds // 3600 + (1 if delta.seconds % 3600 else 0)
    regularHours, nachtszulage = nacht_zulagen_hours_roundeup(dienstdauer), nacht_zulagen_hours_roundeup(zeit_zuschlag_zeit)
    # night-work time supplement: 10% per hour
    nachtarbeit_zeit_zuschlag_10 = zeit_zuschlag_zeit / 100 * 110 - (zeit_zuschlag_zeit)
    print(nachtarbeit_zeit_zuschlag_10)

for h in range(1):
    nacht_zulagen_executing(startTime, endTime + timedelta(hours=h))

Related

Split time series in intervals of non-uniform length

I have a time series with breaks (times w/o recordings) in between. A simplified example would be:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.rand(13), columns=["values"],
    index=pd.date_range(start='1/1/2020 11:00:00', end='1/1/2020 23:00:00', freq='H'))
df.iloc[4:7] = np.nan
df.dropna(inplace=True)
df
values
2020-01-01 11:00:00 0.100339
2020-01-01 12:00:00 0.054668
2020-01-01 13:00:00 0.209965
2020-01-01 14:00:00 0.551023
2020-01-01 18:00:00 0.495879
2020-01-01 19:00:00 0.479905
2020-01-01 20:00:00 0.250568
2020-01-01 21:00:00 0.904743
2020-01-01 22:00:00 0.686085
2020-01-01 23:00:00 0.188166
Now I would like to split it into intervals wherever the gap between consecutive timestamps exceeds a certain time span (e.g. 2h). In the example above this would be:
( values
2020-01-01 11:00:00 0.100339
2020-01-01 12:00:00 0.054668
2020-01-01 13:00:00 0.209965
2020-01-01 14:00:00 0.551023,
values
2020-01-01 18:00:00 0.495879
2020-01-01 19:00:00 0.479905
2020-01-01 20:00:00 0.250568
2020-01-01 21:00:00 0.904743
2020-01-01 22:00:00 0.686085
2020-01-01 23:00:00 0.188166)
I was a bit surprised that I didn't find anything on this, since I thought it was a common problem. My current solution to get the start and end index of each interval is:
from datetime import timedelta
import pandas as pd

def intervals(data: pd.DataFrame, delta_t: timedelta = timedelta(hours=2)):
    data = data.sort_values(by=['event_timestamp'], ignore_index=True)
    breaks = (data['event_timestamp'].diff() > delta_t).astype(bool).values
    ranges = []
    start = 0
    end = start
    for i, e in enumerate(breaks):
        if not e:
            end = i
            if i == len(breaks) - 1:
                ranges.append((start, end))
                start = i
                end = start
        elif i != 0:
            ranges.append((start, end))
            start = i
            end = start
    return ranges
Any suggestions on how I could do this in a smarter way? I suspect it should somehow be possible using groupby.
Yes, you can use the very convenient np.split:
dt = pd.Timedelta('2H')
parts = np.split(df, np.where(np.diff(df.index) > dt)[0] + 1)
Which gives, for your example:
>>> parts
[ values
2020-01-01 11:00:00 0.557374
2020-01-01 12:00:00 0.942296
2020-01-01 13:00:00 0.181189
2020-01-01 14:00:00 0.758822,
values
2020-01-01 18:00:00 0.682125
2020-01-01 19:00:00 0.818187
2020-01-01 20:00:00 0.053515
2020-01-01 21:00:00 0.572342
2020-01-01 22:00:00 0.423129
2020-01-01 23:00:00 0.882215]
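If you only need the first and last timestamp of each interval (what intervals() in the question returns, as positions), the same gap test can feed a groupby instead of a Python loop. A minimal sketch, assuming the timestamps are in a sorted DatetimeIndex and that a gap strictly longer than dt starts a new interval (the same > test used with np.split above); interval_id and bounds are illustrative names:

```python
import numpy as np
import pandas as pd

# Example data as in the question: hourly values with 15:00 to 17:00 missing.
idx = pd.date_range("2020-01-01 11:00", "2020-01-01 23:00", freq="60min")
df = pd.DataFrame({"values": np.random.rand(len(idx))}, index=idx).drop(idx[4:7])

dt = pd.Timedelta(hours=2)
ts = df.index.to_series()
# A new interval starts wherever the gap to the previous timestamp exceeds dt;
# cumsum turns those break flags into a running interval number.
interval_id = (ts.diff() > dt).cumsum()
bounds = ts.groupby(interval_id).agg(["first", "last"])
print(bounds)
```

The same interval_id can be passed to df.groupby(interval_id.values) to iterate over the interval DataFrames themselves.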
@Pierre, thanks for your input. I have now arrived at a solution that is convenient for me:
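from datetime import timedelta

df['diff'] = df.index.to_series().diff()
max_gap = timedelta(hours=2)
df['gapId'] = 0
# mark each row that follows a gap of at least max_gap, then number the groups
df.loc[df['diff'] >= max_gap, ['gapId']] = 1
df['gapId'] = df['gapId'].cumsum()
list(df.groupby('gapId'))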
gives:
[(0,
values date diff gapId
0 1.0 2020-01-01 11:00:00 NaT 0
1 1.0 2020-01-01 12:00:00 0 days 01:00:00 0
2 1.0 2020-01-01 13:00:00 0 days 01:00:00 0
3 1.0 2020-01-01 14:00:00 0 days 01:00:00 0),
(1,
values date diff gapId
7 1.0 2020-01-01 18:00:00 0 days 04:00:00 1
8 1.0 2020-01-01 19:00:00 0 days 01:00:00 1
9 1.0 2020-01-01 20:00:00 0 days 01:00:00 1
10 1.0 2020-01-01 21:00:00 0 days 01:00:00 1
11 1.0 2020-01-01 22:00:00 0 days 01:00:00 1
12 1.0 2020-01-01 23:00:00 0 days 01:00:00 1)]

Transform the Random time intervals to 30 mins Structured interval

I have this DataFrame listing the time periods during which some tasks happened:
Date Start Time End Time
0 2016-01-01 0:00:00 2016-01-01 0:10:00 2016-01-01 0:25:00
1 2016-01-01 0:00:00 2016-01-01 1:17:00 2016-01-01 1:31:00
2 2016-01-02 0:00:00 2016-01-02 0:30:00 2016-01-02 0:32:00
... ... ... ...
I want to convert this df into 30-minute intervals.
Expected outcome:
Date Hours
1 2016-01-01 0:30:00 0:15
2 2016-01-01 1:00:00 0:00
3 2016-01-01 1:30:00 0:13
4 2016-01-01 2:00:00 0:01
5 2016-01-01 2:30:00 0:00
6 2016-01-01 3:00:00 0:00
... ...
47 2016-01-01 23:30:00 0:00
48 2016-01-02 23:59:59 0:00
49 2016-01-02 00:30:00 0:00
50 2016-01-02 01:00:00 0:02
... ...
I was trying to do this with a for loop, which was getting tedious. Is there a simple way to do it in pandas?
IIUC, you can discard the Date column, compute the time difference between start and end, group by 30 minutes and aggregate with first (assuming you always have only one entry per 30-minute slot). Note that this assigns each task's whole duration to the slot in which it starts:
print (df.assign(Diff=df["End Time"]-df["Start Time"])
.groupby(pd.Grouper(key="Start Time", freq="30T"))
.agg({"Diff": "first"})
.fillna(pd.Timedelta(seconds=0)))
Diff
Start Time
2016-01-01 00:00:00 0 days 00:15:00
2016-01-01 00:30:00 0 days 00:00:00
2016-01-01 01:00:00 0 days 00:14:00
2016-01-01 01:30:00 0 days 00:00:00
2016-01-01 02:00:00 0 days 00:00:00
2016-01-01 02:30:00 0 days 00:00:00
...
2016-01-02 00:30:00 0 days 00:02:00
The idea is to create a Series of zeros with a per-minute DatetimeIndex spanning the earliest start time to the latest end time. Then add 1 at each Start Time and subtract 1 at each End Time. cumsum then gives, for each minute, the number of tasks running; you can then resample.sum per 30 minutes and reset_index. The last line of code only formats the Hours column.
# create a series of 0 with a per-minute datetime index
res = pd.Series(data=0,
                index=pd.DatetimeIndex(pd.date_range(df['Start Time'].min(),
                                                     df['End Time'].max(),
                                                     freq='T'),
                                       name='Dates'),
                name='Hours')
# add 1 at each start time and subtract 1 at each end time
res[df['Start Time']] += 1
res[df['End Time']] -= 1
# cumsum gives the number of running tasks for each minute, then resample per 30 minutes
res = (res.cumsum()
       .resample('30T', label='right').sum()
       .reset_index('Dates')
       )
# change the format of the Hours column (optional)
res['Hours'] = pd.to_datetime(res['Hours'], format='%M').dt.strftime('%H:%M')  # or .dt.time
print(res)
Dates Hours
0 2016-01-01 00:30:00 00:15
1 2016-01-01 01:00:00 00:00
2 2016-01-01 01:30:00 00:13
3 2016-01-01 02:00:00 00:01
4 2016-01-01 02:30:00 00:00
5 2016-01-01 03:00:00 00:00
...
48 2016-01-02 00:30:00 00:00
49 2016-01-02 01:00:00 00:02
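If a readable loop is acceptable instead of the vectorised versions above, a sketch that splits each task across the 30-minute slots it overlaps could look like this (so 01:17 to 01:31 contributes 13 minutes to the slot ending 01:30 and 1 minute to the slot ending 02:00, as in the expected outcome). It assumes Start Time and End Time are datetime64 columns; minutes_per_slot is a hypothetical helper name, and slots are labelled by their right edge.

```python
import pandas as pd

def minutes_per_slot(tasks: pd.DataFrame, freq: str = "30min") -> pd.Series:
    """Busy time per slot, labelled by the slot's right edge."""
    step = pd.Timedelta(freq)
    totals = {}
    for start, end in zip(tasks["Start Time"], tasks["End Time"]):
        slot = start.floor(freq)
        while slot < end:
            slot_end = slot + step
            # Only the part of the task inside [slot, slot_end) counts here.
            overlap = min(end, slot_end) - max(start, slot)
            totals[slot_end] = totals.get(slot_end, pd.Timedelta(0)) + overlap
            slot = slot_end
    busy = pd.Series(totals).sort_index()
    # Fill in empty slots between the first and last busy slot.
    full_range = pd.date_range(busy.index.min(), busy.index.max(), freq=freq)
    return busy.reindex(full_range, fill_value=pd.Timedelta(0))

tasks = pd.DataFrame({
    "Start Time": pd.to_datetime(["2016-01-01 00:10", "2016-01-01 01:17", "2016-01-02 00:30"]),
    "End Time": pd.to_datetime(["2016-01-01 00:25", "2016-01-01 01:31", "2016-01-02 00:32"]),
})
print(minutes_per_slot(tasks).head())
```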

Convert a Pandas Column to Hours and Minutes

I have one field in a Pandas DataFrame that is in integer format. How do I convert it to a datetime format and append the column to my DataFrame? Specifically, I need hours and minutes.
Example:
DataFrame Name: df
The column as a list: df.index
dtype='int64'
Sample data in df.index -- [0, 15, 30, 45, 100, 115, 130, 145, 200...2300, 2315, 2330, 2345]
I tried pd.to_datetime(df.index, format='') but it is returning the wrong format.
You have an index that has time values as HHMM represented by an integer. In order to convert this to a datetime dtype, you have to first make strings that can be correctly converted by the to_datetime() method.
time_strs = df.index.astype(str).str.zfill(4)
This converts all of the integer values to strings that are zero padded to 4 characters, so 15 becomes the string "0015" for example.
Now you can use the format "%H%M" to convert to a datetime object:
pd.to_datetime(time_strs, format="%H%M")
Then use the datetime attributes of the result (for example .hour and .minute) to access the hours and minutes.
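Putting these steps together on a few of the sample values, a minimal sketch (the hour, minute and hh_mm column names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(index=pd.Index([0, 15, 30, 45, 100, 115, 2300, 2345], name="time"))

# Zero-pad the HHMM integers to four characters (15 -> "0015"), then parse them.
times = pd.to_datetime(df.index.astype(str).str.zfill(4), format="%H%M")
df["hour"] = times.hour
df["minute"] = times.minute
df["hh_mm"] = times.strftime("%H:%M")
print(df)
```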
import pandas as pd
df = pd.DataFrame({'time':[0, 15, 30, 45, 100, 115, 130, 145, 200, 2300, 2315, 2330, 2345]})
df.set_index('time', inplace=True)
# zero-pad to four digits (15 -> '0015') so that '%H%M' reads both hours and minutes
df['datetime_dtype'] = pd.to_datetime(df.index.astype(str).str.zfill(4), format='%H%M')
df['str_dtype'] = df['datetime_dtype'].astype(str).str[11:16]
print(df)
datetime_dtype str_dtype
time
0 1900-01-01 00:00:00 00:00
15 1900-01-01 00:15:00 00:15
30 1900-01-01 00:30:00 00:30
45 1900-01-01 00:45:00 00:45
100 1900-01-01 01:00:00 01:00
115 1900-01-01 01:15:00 01:15
130 1900-01-01 01:30:00 01:30
145 1900-01-01 01:45:00 01:45
200 1900-01-01 02:00:00 02:00
2300 1900-01-01 23:00:00 23:00
2315 1900-01-01 23:15:00 23:15
2330 1900-01-01 23:30:00 23:30
2345 1900-01-01 23:45:00 23:45
print(df.dtypes)
datetime_dtype datetime64[ns]
str_dtype object
dtype: object
If you want dates in 2020 rather than 1900, you can add a Timedelta:
delta = pd.Timedelta(weeks=6278, hours=0, minutes=0)
df['datetime_dtype_2020'] = df['datetime_dtype'] + delta
print(df)
datetime_dtype str_dtype datetime_dtype_2020
time
0 1900-01-01 00:00:00 00:00 2020-04-27 00:00:00
15 1900-01-01 00:15:00 00:15 2020-04-27 00:15:00
30 1900-01-01 00:30:00 00:30 2020-04-27 00:30:00
45 1900-01-01 00:45:00 00:45 2020-04-27 00:45:00
100 1900-01-01 01:00:00 01:00 2020-04-27 01:00:00
115 1900-01-01 01:15:00 01:15 2020-04-27 01:15:00
130 1900-01-01 01:30:00 01:30 2020-04-27 01:30:00
145 1900-01-01 01:45:00 01:45 2020-04-27 01:45:00
200 1900-01-01 02:00:00 02:00 2020-04-27 02:00:00
2300 1900-01-01 23:00:00 23:00 2020-04-27 23:00:00
2315 1900-01-01 23:15:00 23:15 2020-04-27 23:15:00
2330 1900-01-01 23:30:00 23:30 2020-04-27 23:30:00
2345 1900-01-01 23:45:00 23:45 2020-04-27 23:45:00
If you only want hours and minutes, then you can use datetime.time objects.
import datetime

def int_to_time(i):
    if i < 60:
        return datetime.time(0, i)
    elif i < 1000:
        return datetime.time(int(str(i)[0]), int(str(i)[1:]))
    else:
        return datetime.time(int(str(i)[0:2]), int(str(i)[2:]))

# an Index has no .apply; .map applies the function element-wise
df.index.map(int_to_time)
Example
import datetime
import numpy as np
import pandas as pd
ints = [i for i in np.random.randint(0, 2400, 100) if i % 100 < 60][0:5]
df = pd.DataFrame({'a': ints})
>>> df
0 1559
1 1712
2 1233
3 953
4 938
>>> df['a'].apply(int_to_time)
0 15:59:00
1 17:12:00
2 12:33:00
3 09:53:00
4 09:38:00
From there, you can access the hour and minute attributes of the values:
>>> df['a'].apply(int_to_time).apply(lambda x: (x.hour, x.minute))
0 (15, 59)
1 (17, 12)
2 (12, 33)
3 (9, 53)
4 (9, 38)
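The string slicing in int_to_time can also be written with integer arithmetic, which handles one-, two-, three- and four-digit values the same way. A sketch of that variant, assuming the values are valid HHMM integers (minutes below 60, hours below 24):

```python
import datetime
import pandas as pd

def int_to_time(i: int) -> datetime.time:
    # HHMM as an integer: the hundreds give the hour, the remainder the minutes.
    hours, minutes = divmod(int(i), 100)
    return datetime.time(hours, minutes)

s = pd.Series([0, 15, 953, 1559, 2345])
print(s.map(int_to_time))
```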

Generating list of 5 minute interval between two times

I have the following strings:
start = "07:00:00"
end = "17:00:00"
How can I generate a list of 5-minute intervals between those times, i.e.:
["07:00:00","07:05:00",...,"16:55:00","17:00:00"]
This works for me; I'm sure you can figure out how to put the results in a list instead of printing them:
>>> import datetime
>>> start = "07:00:00"
>>> end = "17:00:00"
>>> delta = datetime.timedelta(minutes=5)
>>> start = datetime.datetime.strptime(start, '%H:%M:%S')
>>> end = datetime.datetime.strptime(end, '%H:%M:%S')
>>> t = start
>>> while t <= end:
...     print(datetime.datetime.strftime(t, '%H:%M:%S'))
...     t += delta
...
07:00:00
07:05:00
07:10:00
07:15:00
07:20:00
07:25:00
07:30:00
07:35:00
07:40:00
07:45:00
07:50:00
07:55:00
08:00:00
08:05:00
08:10:00
08:15:00
08:20:00
08:25:00
08:30:00
08:35:00
08:40:00
08:45:00
08:50:00
08:55:00
09:00:00
09:05:00
09:10:00
09:15:00
09:20:00
09:25:00
09:30:00
09:35:00
09:40:00
09:45:00
09:50:00
09:55:00
10:00:00
10:05:00
10:10:00
10:15:00
10:20:00
10:25:00
10:30:00
10:35:00
10:40:00
10:45:00
10:50:00
10:55:00
11:00:00
11:05:00
11:10:00
11:15:00
11:20:00
11:25:00
11:30:00
11:35:00
11:40:00
11:45:00
11:50:00
11:55:00
12:00:00
12:05:00
12:10:00
12:15:00
12:20:00
12:25:00
12:30:00
12:35:00
12:40:00
12:45:00
12:50:00
12:55:00
13:00:00
13:05:00
13:10:00
13:15:00
13:20:00
13:25:00
13:30:00
13:35:00
13:40:00
13:45:00
13:50:00
13:55:00
14:00:00
14:05:00
14:10:00
14:15:00
14:20:00
14:25:00
14:30:00
14:35:00
14:40:00
14:45:00
14:50:00
14:55:00
15:00:00
15:05:00
15:10:00
15:15:00
15:20:00
15:25:00
15:30:00
15:35:00
15:40:00
15:45:00
15:50:00
15:55:00
16:00:00
16:05:00
16:10:00
16:15:00
16:20:00
16:25:00
16:30:00
16:35:00
16:40:00
16:45:00
16:50:00
16:55:00
17:00:00
>>>
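To collect the values in a list rather than printing them, the same loop can append to a list. A minimal sketch; time_slots is a hypothetical helper name:

```python
from datetime import datetime, timedelta
from typing import List

def time_slots(start: str, end: str, minutes: int = 5) -> List[str]:
    """Return 'HH:MM:SS' strings from start to end inclusive, in steps of `minutes`."""
    fmt = "%H:%M:%S"
    t = datetime.strptime(start, fmt)
    stop = datetime.strptime(end, fmt)
    step = timedelta(minutes=minutes)
    slots = []
    while t <= stop:
        slots.append(t.strftime(fmt))
        t += step
    return slots

slots = time_slots("07:00:00", "17:00:00")
print(slots[:3], slots[-1], len(slots))  # ['07:00:00', '07:05:00', '07:10:00'] 17:00:00 121
```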
Try:
# import modules
from datetime import datetime, timedelta
# Create start and end datetime objects from the strings
start = datetime.strptime("07:00:00", "%H:%M:%S")
end = datetime.strptime("17:00:00", "%H:%M:%S")
# min_gap: step size in minutes
min_gap = 5
# compute the list of times; the + 1 makes the end time itself part of the list
arr = [(start + timedelta(minutes=min_gap * i)).strftime("%H:%M:%S")
       for i in range(int((end - start).total_seconds() / 60.0 / min_gap) + 1)]
print(arr)
# ['07:00:00', '07:05:00', '07:10:00', '07:15:00', '07:20:00', '07:25:00', '07:30:00', ..., '16:55:00', '17:00:00']
Explanations:
First, you need to convert the time strings to datetime objects. strptime does that.
Then, we find the number of minutes between the start and the end datetime:
(end-start).total_seconds() / 60.0
However, in our case, we only want a value every n minutes, so we divide this by n.
Also, since we iterate over this number of steps, we need to convert it to an int for range, and add 1 so that the end time is included. That results in:
int((end-start).total_seconds() / 60.0 / min_gap) + 1
Then, for each step of the loop, we add min_gap * i minutes to the start datetime, which is what timedelta is designed for: timedelta(minutes=min_gap * i).
Finally, we convert each datetime object back to a string using strftime.

Pandas filtering values in dataframe

I have this dataframe. The columns represent the highs and the lows in daily EURUSD price:
df.low df.high
2013-01-17 16:00:00 1.33394 2013-01-17 20:00:00 1.33874
2013-01-18 18:00:00 1.32805 2013-01-18 09:00:00 1.33983
2013-01-21 00:00:00 1.32962 2013-01-21 09:00:00 1.33321
2013-01-22 11:00:00 1.32667 2013-01-22 09:00:00 1.33715
2013-01-23 17:00:00 1.32645 2013-01-23 14:00:00 1.33545
2013-01-24 10:00:00 1.32860 2013-01-24 18:00:00 1.33926
2013-01-25 04:00:00 1.33497 2013-01-25 17:00:00 1.34783
2013-01-28 10:00:00 1.34246 2013-01-28 16:00:00 1.34771
2013-01-29 13:00:00 1.34143 2013-01-29 21:00:00 1.34972
2013-01-30 08:00:00 1.34820 2013-01-30 21:00:00 1.35873
2013-01-31 13:00:00 1.35411 2013-01-31 17:00:00 1.35944
I combined them into a third column (df.extremes):
df.extremes
2013-01-17 16:00:00 1.33394
2013-01-17 20:00:00 1.33874
2013-01-18 18:00:00 1.32805
2013-01-18 09:00:00 1.33983
2013-01-21 00:00:00 1.32962
2013-01-21 09:00:00 1.33321
2013-01-22 09:00:00 1.33715
2013-01-22 11:00:00 1.32667
2013-01-23 14:00:00 1.33545
2013-01-23 17:00:00 1.32645
2013-01-24 10:00:00 1.32860
2013-01-24 18:00:00 1.33926
2013-01-25 04:00:00 1.33497
2013-01-25 17:00:00 1.34783
2013-01-28 10:00:00 1.34246
2013-01-28 16:00:00 1.34771
2013-01-29 13:00:00 1.34143
2013-01-29 21:00:00 1.34972
2013-01-30 08:00:00 1.34820
2013-01-30 21:00:00 1.35873
2013-01-31 13:00:00 1.35411
2013-01-31 17:00:00 1.35944
But now I want to filter some values from df.extremes.
To explain what to filter, here is some pseudocode:
IF following the index we move from: previous df.low --> df.low --> df.high:
IF df.low > previous df.low: delete df.low
IF df.low < previous df.low: delete previous df.low
If I try to work this out with a for loop, it gives me a KeyError: 1.3339399999999999.
day = df.groupby(pd.TimeGrouper('D'))
is_day_min = day.extremes.apply(lambda x: x == x.min())
for i in df.extremes:
    if is_day_min[i] == True and is_day_min[i+1] == True:
        if df.extremes[i] > df.extremes[i+1]:
            del df.extremes[i]
for i in df.extremes:
    if is_day_min[i] == True and is_day_min[i+1] == True:
        if df.extremes[i] < df.extremes[i+1]:
            del df.extremes[i+1]
How can I filter/delete the values as I explained in the pseudocode?
I am struggling with indexing and booleans and can't solve this. I strongly suspect I need a lambda function, but I don't know how to apply it. Please bear with me; I've been trying this for a long time. I hope I've been clear enough.
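All you're really missing is a way of saying "previous low" in a vectorized fashion. That's spelled df['low'].shift(1) (shift(-1) would give the next low instead). Once you have that, each rule in your pseudocode becomes a boolean mask:
prev = df.low.shift(1)
nxt = df.low.shift(-1)
# rule 1: delete a low that is higher than the previous low
# rule 2: delete the previous low when the current one is lower, i.e. a low that is higher than the next low
filtered_df = df[~((df.low > prev) | (df.low > nxt))]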
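A quick way to check which direction each shift goes, using the first four lows from the question's table:

```python
import pandas as pd

low = pd.Series([1.33394, 1.32805, 1.32962, 1.32667])
# shift(1) lines each row up with the row before it; shift(-1) with the row after it.
print(pd.DataFrame({"low": low, "prev": low.shift(1), "next": low.shift(-1)}))
```

The first row has no previous value and the last row has no next value, so those cells are NaN, and comparisons against NaN are False, which keeps those rows in a mask like the one above.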
