Python Datetime calculation
I would like to write a Python function for the following calculation: if a shift ends after 20:00 (i.e. inside the night window from 20:00 to 06:00), it should add an extra 25% in minutes for each hour worked after 20:00.
Any suggestions?
UPDATED:
Here is a way to do what I believe your question asks:
from datetime import datetime, timedelta
def getHours(startTime, endTime, extraFraction):
    if endTime < startTime:
        raise ValueError(f'endTime {endTime} is before startTime {startTime}')
    startDateStr = startTime.strftime("%Y-%m-%d")
    # 20:00 on the day of startTime (start of the bonus window)
    bonusStartTime = datetime.strptime(startDateStr + " " + "20:00:00", "%Y-%m-%d %H:%M:%S")
    # 06:00 on the day of startTime (end of the previous night's bonus window)
    prevBonusEndTime = datetime.strptime(startDateStr + " " + "06:00:00", "%Y-%m-%d %H:%M:%S")
    # 06:00 on the day after startTime (end of the bonus window)
    bonusEndTime = prevBonusEndTime + timedelta(days=1)
    bonusPeriod = timedelta(days=0)
    duration = endTime - startTime
    hours = duration.total_seconds() // 3600
    # >= (not >) so that a shift of exactly 24 hours, e.g. 21:00 to 21:00, still counts its second evening
    if hours >= 24:
        # each full 24-hour period contains exactly one 10-hour bonus window
        fullDays = hours // 24
        bonusPeriod += fullDays * (bonusEndTime - bonusStartTime)
        endTime -= timedelta(days=fullDays)
    # overlap with 00:00-06:00 on the day of startTime
    if startTime < prevBonusEndTime:
        bonusPeriod += prevBonusEndTime - startTime
    if endTime < prevBonusEndTime:
        bonusPeriod -= prevBonusEndTime - endTime
    # overlap with the window from 20:00 on the day of startTime to 06:00 the next day
    if startTime > bonusStartTime:
        bonusPeriod -= startTime - bonusStartTime
    if endTime > bonusStartTime:
        bonusPeriod += min(endTime, bonusEndTime) - bonusStartTime
    delta = duration + bonusPeriod * extraFraction
    return delta
Explanation:
confirm startTime is not after endTime, otherwise raise an exception
set the following:
prevBonusEndTime as 06:00 on the day of startTime (the end of the previous night's bonus window)
bonusStartTime as 20:00 on the day of startTime
bonusEndTime as 06:00 on the day after startTime
if endTime is 24 hours or more after startTime, add one full 10-hour bonus window to bonusPeriod for each full day (24-hour period) and rewind endTime by that number of full days; duration keeps the full, un-rewound length
add to or subtract from bonusPeriod the overlap (in addition to anything counted above) between startTime, endTime and the intervals 00:00 to prevBonusEndTime and bonusStartTime to bonusEndTime
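As a cross-check on getHours, here is a minimal sketch of a different technique (not the one used above): instead of adding and subtracting partial intervals, it walks over every nightly 20:00 to 06:00 window that could touch the shift and sums the overlaps. The names night_overlap and effective_hours are hypothetical, and it assumes naive datetimes (no time zones or DST changes).

```python
from datetime import datetime, timedelta

def night_overlap(start, end, night_start_hour=20, night_end_hour=6):
    """Total time between start and end that falls inside the nightly window
    night_start_hour:00 to night_end_hour:00 on the following day."""
    if end < start:
        raise ValueError(f'end {end} is before start {start}')
    total = timedelta(0)
    # Begin one day early so a window that opened the evening before `start` is included.
    day = datetime.combine(start.date() - timedelta(days=1), datetime.min.time())
    while day <= end:
        window_start = day + timedelta(hours=night_start_hour)
        window_end = day + timedelta(days=1, hours=night_end_hour)
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > timedelta(0):  # negative or zero means no overlap with this night
            total += overlap
        day += timedelta(days=1)
    return total

def effective_hours(start, end, extra_fraction=0.25):
    """Shift length plus extra_fraction of the time worked inside the night window."""
    return (end - start) + night_overlap(start, end) * extra_fraction

start = datetime(2022, 5, 26, 6)
print(effective_hours(start, start + timedelta(hours=24)))
```

Because each night is handled separately, shifts longer than 24 hours need no special rewinding step.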
Test code:
def testing(start, end):
    print(f'start {start}, end {end}, actual hours {getHours(start, end, 0)}, effective hours {getHours(start, end, 0.25)}')
startTime = datetime.strptime("2022-05-26 06:00:00", "%Y-%m-%d %H:%M:%S")
endTime = startTime
for h in range(0, 48, 3):
    testing(startTime, endTime + timedelta(hours=h))
endTime += timedelta(hours=48)
for h in range(0, 48, 3):
    testing(startTime + timedelta(hours=h), endTime)
Output:
start 2022-05-26 06:00:00, end 2022-05-26 06:00:00, actual hours 0:00:00, effective hours 0:00:00
start 2022-05-26 06:00:00, end 2022-05-26 09:00:00, actual hours 3:00:00, effective hours 3:00:00
start 2022-05-26 06:00:00, end 2022-05-26 12:00:00, actual hours 6:00:00, effective hours 6:00:00
start 2022-05-26 06:00:00, end 2022-05-26 15:00:00, actual hours 9:00:00, effective hours 9:00:00
start 2022-05-26 06:00:00, end 2022-05-26 18:00:00, actual hours 12:00:00, effective hours 12:00:00
start 2022-05-26 06:00:00, end 2022-05-26 21:00:00, actual hours 15:00:00, effective hours 15:15:00
start 2022-05-26 06:00:00, end 2022-05-27 00:00:00, actual hours 18:00:00, effective hours 19:00:00
start 2022-05-26 06:00:00, end 2022-05-27 03:00:00, actual hours 21:00:00, effective hours 22:45:00
start 2022-05-26 06:00:00, end 2022-05-27 06:00:00, actual hours 1 day, 0:00:00, effective hours 1 day, 2:30:00
start 2022-05-26 06:00:00, end 2022-05-27 09:00:00, actual hours 1 day, 3:00:00, effective hours 1 day, 5:30:00
start 2022-05-26 06:00:00, end 2022-05-27 12:00:00, actual hours 1 day, 6:00:00, effective hours 1 day, 8:30:00
start 2022-05-26 06:00:00, end 2022-05-27 15:00:00, actual hours 1 day, 9:00:00, effective hours 1 day, 11:30:00
start 2022-05-26 06:00:00, end 2022-05-27 18:00:00, actual hours 1 day, 12:00:00, effective hours 1 day, 14:30:00
start 2022-05-26 06:00:00, end 2022-05-27 21:00:00, actual hours 1 day, 15:00:00, effective hours 1 day, 17:45:00
start 2022-05-26 06:00:00, end 2022-05-28 00:00:00, actual hours 1 day, 18:00:00, effective hours 1 day, 21:30:00
start 2022-05-26 06:00:00, end 2022-05-28 03:00:00, actual hours 1 day, 21:00:00, effective hours 2 days, 1:15:00
start 2022-05-26 06:00:00, end 2022-05-28 06:00:00, actual hours 2 days, 0:00:00, effective hours 2 days, 5:00:00
start 2022-05-26 09:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 21:00:00, effective hours 2 days, 2:00:00
start 2022-05-26 12:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 18:00:00, effective hours 1 day, 23:00:00
start 2022-05-26 15:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 15:00:00, effective hours 1 day, 20:00:00
start 2022-05-26 18:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 12:00:00, effective hours 1 day, 17:00:00
start 2022-05-26 21:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 9:00:00, effective hours 1 day, 13:45:00
start 2022-05-27 00:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 6:00:00, effective hours 1 day, 10:00:00
start 2022-05-27 03:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 3:00:00, effective hours 1 day, 6:15:00
start 2022-05-27 06:00:00, end 2022-05-28 06:00:00, actual hours 1 day, 0:00:00, effective hours 1 day, 2:30:00
start 2022-05-27 09:00:00, end 2022-05-28 06:00:00, actual hours 21:00:00, effective hours 23:30:00
start 2022-05-27 12:00:00, end 2022-05-28 06:00:00, actual hours 18:00:00, effective hours 20:30:00
start 2022-05-27 15:00:00, end 2022-05-28 06:00:00, actual hours 15:00:00, effective hours 17:30:00
start 2022-05-27 18:00:00, end 2022-05-28 06:00:00, actual hours 12:00:00, effective hours 14:30:00
start 2022-05-27 21:00:00, end 2022-05-28 06:00:00, actual hours 9:00:00, effective hours 11:15:00
start 2022-05-28 00:00:00, end 2022-05-28 06:00:00, actual hours 6:00:00, effective hours 7:30:00
start 2022-05-28 03:00:00, end 2022-05-28 06:00:00, actual hours 3:00:00, effective hours 3:45:00
UPDATE #2:
Here is slightly modified code that outputs regular hours, bonus hours (i.e., hours in the bonus window from 20:00 to 06:00) and extra hours (25% * bonus hours):
from datetime import datetime, timedelta
def getRegularAndBonusHours(startTime, endTime):
    if endTime < startTime:
        raise ValueError(f'endTime {endTime} is before startTime {startTime}')
    startDateStr = startTime.strftime("%Y-%m-%d")
    # bonus window: 20:00 on the day of startTime until 06:00 the next day
    bonusStartTime = datetime.strptime(startDateStr + " " + "20:00:00", "%Y-%m-%d %H:%M:%S")
    prevBonusEndTime = datetime.strptime(startDateStr + " " + "06:00:00", "%Y-%m-%d %H:%M:%S")
    bonusEndTime = prevBonusEndTime + timedelta(days=1)
    bonusPeriod = timedelta(days=0)
    duration = endTime - startTime
    hours = duration.total_seconds() // 3600
    # >= so that a shift of exactly 24 hours is also rewound
    if hours >= 24:
        fullDays = hours // 24
        bonusPeriod += fullDays * (bonusEndTime - bonusStartTime)
        endTime -= timedelta(days=fullDays)
    if startTime < prevBonusEndTime:
        bonusPeriod += prevBonusEndTime - startTime
    if endTime < prevBonusEndTime:
        bonusPeriod -= prevBonusEndTime - endTime
    if startTime > bonusStartTime:
        bonusPeriod -= startTime - bonusStartTime
    if endTime > bonusStartTime:
        bonusPeriod += min(endTime, bonusEndTime) - bonusStartTime
    return duration, bonusPeriod
def getHours(startTime, endTime, extraFraction):
    duration, bonusPeriod = getRegularAndBonusHours(startTime, endTime)
    delta = duration + bonusPeriod * extraFraction
    return delta
def testing(start, end):
    duration, bonusPeriod = getRegularAndBonusHours(start, end)
    def getHoursRoundedUp(delta):
        # any started hour counts as a full hour
        return delta.days * 24 + delta.seconds // 3600 + (1 if delta.seconds % 3600 else 0)
    regularHours, bonusHours = getHoursRoundedUp(duration), getHoursRoundedUp(bonusPeriod)
    print(f'start {start}, end {end}, regular {regularHours}, bonus {bonusHours}, extra {0.25 * bonusHours}')
startTime = datetime.strptime("2022-05-26 06:00:00", "%Y-%m-%d %H:%M:%S")
endTime = startTime
for h in range(0, 48, 3):
    testing(startTime, endTime + timedelta(hours=h))
endTime += timedelta(hours=48)
for h in range(0, 48, 3):
    testing(startTime + timedelta(hours=h), endTime)
Output:
start 2022-05-26 06:00:00, end 2022-05-26 06:00:00, regular 0, bonus 0, extra 0.0
start 2022-05-26 06:00:00, end 2022-05-26 09:00:00, regular 3, bonus 0, extra 0.0
start 2022-05-26 06:00:00, end 2022-05-26 12:00:00, regular 6, bonus 0, extra 0.0
start 2022-05-26 06:00:00, end 2022-05-26 15:00:00, regular 9, bonus 0, extra 0.0
start 2022-05-26 06:00:00, end 2022-05-26 18:00:00, regular 12, bonus 0, extra 0.0
start 2022-05-26 06:00:00, end 2022-05-26 21:00:00, regular 15, bonus 1, extra 0.25
start 2022-05-26 06:00:00, end 2022-05-27 00:00:00, regular 18, bonus 4, extra 1.0
start 2022-05-26 06:00:00, end 2022-05-27 03:00:00, regular 21, bonus 7, extra 1.75
start 2022-05-26 06:00:00, end 2022-05-27 06:00:00, regular 24, bonus 10, extra 2.5
start 2022-05-26 06:00:00, end 2022-05-27 09:00:00, regular 27, bonus 10, extra 2.5
start 2022-05-26 06:00:00, end 2022-05-27 12:00:00, regular 30, bonus 10, extra 2.5
start 2022-05-26 06:00:00, end 2022-05-27 15:00:00, regular 33, bonus 10, extra 2.5
start 2022-05-26 06:00:00, end 2022-05-27 18:00:00, regular 36, bonus 10, extra 2.5
start 2022-05-26 06:00:00, end 2022-05-27 21:00:00, regular 39, bonus 11, extra 2.75
start 2022-05-26 06:00:00, end 2022-05-28 00:00:00, regular 42, bonus 14, extra 3.5
start 2022-05-26 06:00:00, end 2022-05-28 03:00:00, regular 45, bonus 17, extra 4.25
start 2022-05-26 06:00:00, end 2022-05-28 06:00:00, regular 48, bonus 20, extra 5.0
start 2022-05-26 09:00:00, end 2022-05-28 06:00:00, regular 45, bonus 20, extra 5.0
start 2022-05-26 12:00:00, end 2022-05-28 06:00:00, regular 42, bonus 20, extra 5.0
start 2022-05-26 15:00:00, end 2022-05-28 06:00:00, regular 39, bonus 20, extra 5.0
start 2022-05-26 18:00:00, end 2022-05-28 06:00:00, regular 36, bonus 20, extra 5.0
start 2022-05-26 21:00:00, end 2022-05-28 06:00:00, regular 33, bonus 19, extra 4.75
start 2022-05-27 00:00:00, end 2022-05-28 06:00:00, regular 30, bonus 16, extra 4.0
start 2022-05-27 03:00:00, end 2022-05-28 06:00:00, regular 27, bonus 13, extra 3.25
start 2022-05-27 06:00:00, end 2022-05-28 06:00:00, regular 24, bonus 10, extra 2.5
start 2022-05-27 09:00:00, end 2022-05-28 06:00:00, regular 21, bonus 10, extra 2.5
start 2022-05-27 12:00:00, end 2022-05-28 06:00:00, regular 18, bonus 10, extra 2.5
start 2022-05-27 15:00:00, end 2022-05-28 06:00:00, regular 15, bonus 10, extra 2.5
start 2022-05-27 18:00:00, end 2022-05-28 06:00:00, regular 12, bonus 10, extra 2.5
start 2022-05-27 21:00:00, end 2022-05-28 06:00:00, regular 9, bonus 9, extra 2.25
start 2022-05-28 00:00:00, end 2022-05-28 06:00:00, regular 6, bonus 6, extra 1.5
start 2022-05-28 03:00:00, end 2022-05-28 06:00:00, regular 3, bonus 3, extra 0.75
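The rounding in getHoursRoundedUp (any started hour counts as a full hour) can also be written with math.ceil on total_seconds(). A minimal sketch, assuming the timedelta is non-negative; hours_rounded_up and extra_hours are hypothetical names, and the 0.25 rate is the one from the question.

```python
import math
from datetime import timedelta

def hours_rounded_up(delta):
    """Whole hours in a non-negative timedelta, counting any started hour as a full hour."""
    # total_seconds() already includes the days component
    return math.ceil(delta.total_seconds() / 3600)

def extra_hours(bonus_period, rate=0.25):
    """Extra hours earned for the bonus period, based on started hours."""
    return rate * hours_rounded_up(bonus_period)

print(hours_rounded_up(timedelta(hours=3, minutes=1)), extra_hours(timedelta(hours=19)))
```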
UPDATE #3
Latest clarification from OP in a comment indicates:
A need to record in Excel the allowances received for night work
The goal in the Excel sheet is to enter separately the start time, end time, working time (without supplement), and night-work supplement (25% from 20:00 to 06:00) for each hour of night work started.
Here is updated code to create the required data result, and optionally to use a pandas dataframe to put this into an Excel file. Test inputs are used to explore a range of start and end times, including partial hours:
import math
from datetime import datetime, timedelta
def getRegularAndBonusHours(startTime, endTime):
    if endTime < startTime:
        raise ValueError(f'endTime {endTime} is before startTime {startTime}')
    startDateStr = startTime.strftime("%Y-%m-%d")
    # bonus window: 20:00 on the day of startTime until 06:00 the next day
    bonusStartTime = datetime.strptime(startDateStr + " " + "20:00:00", "%Y-%m-%d %H:%M:%S")
    prevBonusEndTime = datetime.strptime(startDateStr + " " + "06:00:00", "%Y-%m-%d %H:%M:%S")
    bonusEndTime = prevBonusEndTime + timedelta(days=1)
    bonusPeriod = timedelta(days=0)
    duration = endTime - startTime
    hours = duration.total_seconds() // 3600
    # >= so that a shift of exactly 24 hours is also rewound
    if hours >= 24:
        fullDays = hours // 24
        bonusPeriod += fullDays * (bonusEndTime - bonusStartTime)
        endTime -= timedelta(days=fullDays)
    if startTime < prevBonusEndTime:
        bonusPeriod += prevBonusEndTime - startTime
    if endTime < prevBonusEndTime:
        bonusPeriod -= prevBonusEndTime - endTime
    if startTime > bonusStartTime:
        bonusPeriod -= startTime - bonusStartTime
    if endTime > bonusStartTime:
        bonusPeriod += min(endTime, bonusEndTime) - bonusStartTime
    return duration, bonusPeriod
def getHoursFromDelta(delta, roundUp=False):
    # total_seconds() includes whole days, so shifts longer than 24 hours are not truncated
    hours = delta.total_seconds() / 3600
    return math.ceil(hours) if roundUp else hours
def testing(start, end):
    duration, bonusPeriod = getRegularAndBonusHours(start, end)
    # working time without supplement; bonus hours with every started hour counted in full
    workingHours, bonusHours = getHoursFromDelta(duration), getHoursFromDelta(bonusPeriod, True)
    return start, end, workingHours, bonusHours * 0.25
# calculate test results
results = []
startTime = datetime.strptime("2022-05-26 06:00:00", "%Y-%m-%d %H:%M:%S")
endTime = startTime
for halfHours in range(0, 2 * 48, 5):
    results.append(testing(startTime, endTime + timedelta(hours=halfHours / 2)))
endTime += timedelta(hours=48)
for halfHours in range(0, 2 * 48, 5):
    results.append(testing(startTime + timedelta(hours=halfHours / 2), endTime))
# print results
headings = ['Start Time', 'End Time', 'Working Hours', '25% of Supplemental Hours Started']
print(''.join(f'{x:30}' for x in headings))
for row in results:
    print(''.join(f'{str(x):30}' for x in row))
# OPTIONAL: save results in pandas dataframe and save as Excel file
import pandas as pd
df = pd.DataFrame(results, columns=headings)
print(df)
# ws.column_dimensions is an openpyxl worksheet attribute, so request the openpyxl engine explicitly
with pd.ExcelWriter('TestTimesheet.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, index=False, sheet_name='Timesheet')
    ws = writer.sheets['Timesheet']
    for column in df:
        column_length = max(df[column].astype(str).map(len).max(), len(column))
        col_idx = df.columns.get_loc(column)
        ws.column_dimensions[chr(ord('A') + col_idx)].width = column_length
Output:
Start Time End Time Working Hours 25% of Supplemental Hours Started
0 2022-05-26 06:00:00 2022-05-26 06:00:00 0.0 0.00
1 2022-05-26 06:00:00 2022-05-26 08:30:00 2.5 0.00
2 2022-05-26 06:00:00 2022-05-26 11:00:00 5.0 0.00
3 2022-05-26 06:00:00 2022-05-26 13:30:00 7.5 0.00
4 2022-05-26 06:00:00 2022-05-26 16:00:00 10.0 0.00
5 2022-05-26 06:00:00 2022-05-26 18:30:00 12.5 0.00
6 2022-05-26 06:00:00 2022-05-26 21:00:00 15.0 0.25
7 2022-05-26 06:00:00 2022-05-26 23:30:00 17.5 1.00
8 2022-05-26 06:00:00 2022-05-27 02:00:00 20.0 1.50
9 2022-05-26 06:00:00 2022-05-27 04:30:00 22.5 2.25
10 2022-05-26 06:00:00 2022-05-27 07:00:00 25.0 2.50
11 2022-05-26 06:00:00 2022-05-27 09:30:00 27.5 2.50
12 2022-05-26 06:00:00 2022-05-27 12:00:00 30.0 2.50
13 2022-05-26 06:00:00 2022-05-27 14:30:00 32.5 2.50
14 2022-05-26 06:00:00 2022-05-27 17:00:00 35.0 2.50
15 2022-05-26 06:00:00 2022-05-27 19:30:00 37.5 2.50
16 2022-05-26 06:00:00 2022-05-27 22:00:00 40.0 3.00
17 2022-05-26 06:00:00 2022-05-28 00:30:00 42.5 3.75
18 2022-05-26 06:00:00 2022-05-28 03:00:00 45.0 4.25
19 2022-05-26 06:00:00 2022-05-28 05:30:00 47.5 5.00
20 2022-05-26 06:00:00 2022-05-28 06:00:00 48.0 5.00
21 2022-05-26 08:30:00 2022-05-28 06:00:00 45.5 5.00
22 2022-05-26 11:00:00 2022-05-28 06:00:00 43.0 5.00
23 2022-05-26 13:30:00 2022-05-28 06:00:00 40.5 5.00
24 2022-05-26 16:00:00 2022-05-28 06:00:00 38.0 5.00
25 2022-05-26 18:30:00 2022-05-28 06:00:00 35.5 5.00
26 2022-05-26 21:00:00 2022-05-28 06:00:00 33.0 4.75
27 2022-05-26 23:30:00 2022-05-28 06:00:00 30.5 4.25
28 2022-05-27 02:00:00 2022-05-28 06:00:00 28.0 3.50
29 2022-05-27 04:30:00 2022-05-28 06:00:00 25.5 3.00
30 2022-05-27 07:00:00 2022-05-28 06:00:00 23.0 2.50
31 2022-05-27 09:30:00 2022-05-28 06:00:00 20.5 2.50
32 2022-05-27 12:00:00 2022-05-28 06:00:00 18.0 2.50
33 2022-05-27 14:30:00 2022-05-28 06:00:00 15.5 2.50
34 2022-05-27 17:00:00 2022-05-28 06:00:00 13.0 2.50
35 2022-05-27 19:30:00 2022-05-28 06:00:00 10.5 2.50
36 2022-05-27 22:00:00 2022-05-28 06:00:00 8.0 2.00
37 2022-05-28 00:30:00 2022-05-28 06:00:00 5.5 1.50
38 2022-05-28 03:00:00 2022-05-28 06:00:00 3.0 0.75
39 2022-05-28 05:30:00 2022-05-28 06:00:00 0.5 0.25
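If pandas or openpyxl is not available, a minimal sketch using only the standard library can write the same four columns to a CSV file that Excel can open. It assumes the 20:00 to 06:00 window and the 25% rate on started bonus hours; night_overlap, timesheet_row and write_timesheet are hypothetical names, and the overlap is computed per night rather than with the rewinding approach above.

```python
import csv
import io
import math
from datetime import datetime, timedelta

HEADINGS = ['Start Time', 'End Time', 'Working Hours', '25% of Supplemental Hours Started']

def night_overlap(start, end, night_start_hour=20, night_end_hour=6):
    # Sum the overlap with each nightly window; start one day early to catch
    # a window that opened the evening before `start`.
    total = timedelta(0)
    day = datetime.combine(start.date() - timedelta(days=1), datetime.min.time())
    while day <= end:
        window_start = day + timedelta(hours=night_start_hour)
        window_end = day + timedelta(days=1, hours=night_end_hour)
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > timedelta(0):
            total += overlap
        day += timedelta(days=1)
    return total

def timesheet_row(start, end, rate=0.25):
    # working time without supplement, in hours
    working_hours = (end - start).total_seconds() / 3600
    # every started hour inside the night window counts in full
    started_bonus_hours = math.ceil(night_overlap(start, end).total_seconds() / 3600)
    return [start.strftime('%Y-%m-%d %H:%M'), end.strftime('%Y-%m-%d %H:%M'),
            working_hours, rate * started_bonus_hours]

def write_timesheet(shifts, file_obj):
    writer = csv.writer(file_obj)
    writer.writerow(HEADINGS)
    for start, end in shifts:
        writer.writerow(timesheet_row(start, end))

buffer = io.StringIO()
write_timesheet([(datetime(2022, 5, 26, 6), datetime(2022, 5, 26, 21)),
                 (datetime(2022, 5, 26, 23, 30), datetime(2022, 5, 28, 6))], buffer)
print(buffer.getvalue())
```

For a real file, pass `open('TestTimesheet.csv', 'w', newline='')` instead of the StringIO buffer.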
Adaptation of the same approach (German variable names) for a supplement window from 22:00 to midnight at a 10% rate:
from datetime import datetime, timedelta
# hypothetical example values: soup_datum and endingtime were not defined in the original snippet
soup_datum = '26.05.2022'
soup_dienstbegin = '20:00'
soup_dienstende = '23:00'
startinfo = f'{soup_datum} {soup_dienstbegin}'
IcsStartData = datetime.strptime(startinfo, "%d.%m.%Y %H:%M")
endingtime = datetime.strptime(f'{soup_datum} {soup_dienstende}', "%d.%m.%Y %H:%M")
endTime = endingtime
startTime = IcsStartData
def zeit_zuschlag_function(startTime, endTime):
    if endTime < startTime:
        raise ValueError(f'endTime {endTime} is before startTime {startTime}')
    startDateStr = startTime.strftime("%d.%m.%Y")
    # 22:00 on the day of startTime (start of the supplement window)
    zeit_zuschlag_start = datetime.strptime(startDateStr + " " + "22:00", "%d.%m.%Y %H:%M")
    # 00:00 on the day of startTime (end of the previous night's window)
    zeit_zuschlag_end = datetime.strptime(startDateStr + " " + "00:00", "%d.%m.%Y %H:%M")
    # 00:00 on the day after startTime (end of the supplement window)
    zeit_zuschlag_time = zeit_zuschlag_end + timedelta(days=1)
    zeit_zuschlag_zeit = timedelta(days=0)
    dienstdauer = endTime - startTime
    hours = dienstdauer.total_seconds() // 3600
    # >= so that a shift of exactly 24 hours is also rewound
    if hours >= 24:
        # each full 24-hour period contains exactly one supplement window
        fullDays = hours // 24
        zeit_zuschlag_zeit += fullDays * (zeit_zuschlag_time - zeit_zuschlag_start)
        endTime -= timedelta(days=fullDays)
    if startTime < zeit_zuschlag_end:
        zeit_zuschlag_zeit += zeit_zuschlag_end - startTime
    if endTime < zeit_zuschlag_end:
        zeit_zuschlag_zeit -= zeit_zuschlag_end - endTime
    if startTime > zeit_zuschlag_start:
        zeit_zuschlag_zeit -= startTime - zeit_zuschlag_start
    if endTime > zeit_zuschlag_start:
        zeit_zuschlag_zeit += min(endTime, zeit_zuschlag_time) - zeit_zuschlag_start
    return dienstdauer, zeit_zuschlag_zeit
def nacht_zulagen_stunden(startTime, endTime, extraFraction):
    dienstdauer, zeit_zuschlag_zeit = zeit_zuschlag_function(startTime, endTime)
    delta = dienstdauer + zeit_zuschlag_zeit * extraFraction
    return delta
def nacht_zulagen_executing(start, end):
    dienstdauer, zeit_zuschlag_zeit = zeit_zuschlag_function(start, end)
    def nacht_zulagen_hours_roundeup(delta):
        return delta.days * 24 + delta.seconds // 3600 + (1 if delta.seconds % 3600 else 0)
    regularHours, nachtszulage = nacht_zulagen_hours_roundeup(dienstdauer), nacht_zulagen_hours_roundeup(zeit_zuschlag_zeit)
    # night-work time supplement: 10% per hour, i.e. 10% of the time inside the window
    nachtarbeit_zeit_zuschlag_10 = zeit_zuschlag_zeit / 100 * 110 - (zeit_zuschlag_zeit)
    print(nachtarbeit_zeit_zuschlag_10)
for h in range(1):
    nacht_zulagen_executing(startTime, endTime + timedelta(hours=h))
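The adaptation above hard-codes the 22:00 to midnight window and computes 10% as `zeit_zuschlag_zeit / 100 * 110 - zeit_zuschlag_zeit`. A minimal sketch with the window bounds and rate as parameters instead; night_supplement is a hypothetical name, and it assumes the window crosses midnight (window_end_hour <= window_start_hour, with 0 meaning midnight) and naive datetimes.

```python
from datetime import datetime, timedelta

def night_supplement(start, end, window_start_hour=22, window_end_hour=0, rate=0.10):
    """Return (shift duration, time inside the nightly window, rate * window time).

    Assumes the window crosses midnight: window_end_hour <= window_start_hour,
    where window_end_hour=0 means the window ends at midnight.
    """
    if end < start:
        raise ValueError(f'end {end} is before start {start}')
    in_window = timedelta(0)
    # check every night from the evening before `start` up to the day of `end`
    day = datetime.combine(start.date() - timedelta(days=1), datetime.min.time())
    while day <= end:
        w_start = day + timedelta(hours=window_start_hour)
        w_end = day + timedelta(days=1, hours=window_end_hour)
        overlap = min(end, w_end) - max(start, w_start)
        if overlap > timedelta(0):
            in_window += overlap
        day += timedelta(days=1)
    return end - start, in_window, in_window * rate

start = datetime.strptime('26.05.2022 20:00', '%d.%m.%Y %H:%M')
end = datetime.strptime('26.05.2022 23:00', '%d.%m.%Y %H:%M')
print(night_supplement(start, end))
```

For the 20:00 to 23:00 example this counts one hour in the window and a six-minute supplement, matching what the adapted code prints.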
Related
Split time series in intervals of non-uniform length
I have a time series with breaks (times without recordings) in between. A simplified example would be:
df = pd.DataFrame(
    np.random.rand(13),
    columns=["values"],
    index=pd.date_range(start='1/1/2020 11:00:00', end='1/1/2020 23:00:00', freq='H'))
df.iloc[4:7] = np.nan
df.dropna(inplace=True)
df
                       values
2020-01-01 11:00:00  0.100339
2020-01-01 12:00:00  0.054668
2020-01-01 13:00:00  0.209965
2020-01-01 14:00:00  0.551023
2020-01-01 18:00:00  0.495879
2020-01-01 19:00:00  0.479905
2020-01-01 20:00:00  0.250568
2020-01-01 21:00:00  0.904743
2020-01-01 22:00:00  0.686085
2020-01-01 23:00:00  0.188166
Now I would like to split it into intervals that are separated by a certain time span (e.g. 2h). In the example above this would be:
(                       values
2020-01-01 11:00:00  0.100339
2020-01-01 12:00:00  0.054668
2020-01-01 13:00:00  0.209965
2020-01-01 14:00:00  0.551023,
                       values
2020-01-01 18:00:00  0.495879
2020-01-01 19:00:00  0.479905
2020-01-01 20:00:00  0.250568
2020-01-01 21:00:00  0.904743
2020-01-01 22:00:00  0.686085
2020-01-01 23:00:00  0.188166)
I was a bit surprised that I didn't find anything on this, since I thought it was a common problem. My current solution to get the start and end index of each interval is:
def intervals(data: pd.DataFrame, delta_t: timedelta = timedelta(hours=2)):
    data = data.sort_values(by=['event_timestamp'], ignore_index=True)
    breaks = (data['event_timestamp'].diff() > delta_t).astype(bool).values
    ranges = []
    start = 0
    end = start
    for i, e in enumerate(breaks):
        if not e:
            end = i
            if i == len(breaks) - 1:
                ranges.append((start, end))
                start = i
                end = start
        elif i != 0:
            ranges.append((start, end))
            start = i
            end = start
    return ranges
Any suggestions how I could do this in a smarter way? I suspect this should be possible somehow using groupby.
Yes, you can use the very convenient np.split:
dt = pd.Timedelta('2H')
parts = np.split(df, np.where(np.diff(df.index) > dt)[0] + 1)
Which gives, for your example:
>>> parts
[                       values
2020-01-01 11:00:00  0.557374
2020-01-01 12:00:00  0.942296
2020-01-01 13:00:00  0.181189
2020-01-01 14:00:00  0.758822,
                       values
2020-01-01 18:00:00  0.682125
2020-01-01 19:00:00  0.818187
2020-01-01 20:00:00  0.053515
2020-01-01 21:00:00  0.572342
2020-01-01 22:00:00  0.423129
2020-01-01 23:00:00  0.882215]
@Pierre thanks for your input. I now got to a solution which is convenient for me:
df['diff'] = df.index.to_series().diff()
max_gap = timedelta(hours=2)
df['gapId'] = 0
df.loc[df['diff'] >= max_gap, ['gapId']] = 1
df['gapId'] = df['gapId'].cumsum()
list(df.groupby('gapId'))
gives:
[(0,    values                date             diff  gapId
  0      1.0 2020-01-01 11:00:00              NaT      0
  1      1.0 2020-01-01 12:00:00  0 days 01:00:00      0
  2      1.0 2020-01-01 13:00:00  0 days 01:00:00      0
  3      1.0 2020-01-01 14:00:00  0 days 01:00:00      0),
 (1,    values                date             diff  gapId
  7      1.0 2020-01-01 18:00:00  0 days 04:00:00      1
  8      1.0 2020-01-01 19:00:00  0 days 01:00:00      1
  9      1.0 2020-01-01 20:00:00  0 days 01:00:00      1
  10     1.0 2020-01-01 21:00:00  0 days 01:00:00      1
  11     1.0 2020-01-01 22:00:00  0 days 01:00:00      1
  12     1.0 2020-01-01 23:00:00  0 days 01:00:00      1)]
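For plain Python lists of timestamps (no pandas), a minimal sketch of the same gap-splitting idea: start a new run whenever the step from the previous timestamp exceeds max_gap. It uses a strict `>` like the np.split answer (the groupby version uses `>=`); split_at_gaps is a hypothetical name.

```python
from datetime import datetime, timedelta

def split_at_gaps(timestamps, max_gap=timedelta(hours=2)):
    """Split timestamps into runs; a new run starts whenever the step from the
    previous timestamp is larger than max_gap."""
    runs = []
    for ts in sorted(timestamps):
        if runs and ts - runs[-1][-1] <= max_gap:
            runs[-1].append(ts)  # close enough to the previous point: same run
        else:
            runs.append([ts])    # first point, or the gap was too large
    return runs

stamps = [datetime(2020, 1, 1, h) for h in (11, 12, 13, 14, 18, 19, 20, 21, 22, 23)]
for run in split_at_gaps(stamps):
    print(run[0], '->', run[-1], f'({len(run)} points)')
```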
Transform the Random time intervals to 30 mins Structured interval
I have this DataFrame recording the time periods in which some tasks happened:
   Date                Start Time          End Time
0  2016-01-01 0:00:00  2016-01-01 0:10:00  2016-01-01 0:25:00
1  2016-01-01 0:00:00  2016-01-01 1:17:00  2016-01-01 1:31:00
2  2016-01-02 0:00:00  2016-01-02 0:30:00  2016-01-02 0:32:00
...  ...               ...                 ...
I want to convert this df to 30-minute intervals. Expected outcome:
    Date                 Hours
1   2016-01-01 0:30:00   0:15
2   2016-01-01 1:00:00   0:00
3   2016-01-01 1:30:00   0:13
4   2016-01-01 2:00:00   0:01
5   2016-01-01 2:30:00   0:00
6   2016-01-01 3:00:00   0:00
... ...                  ...
47  2016-01-01 23:30:00  0:00
48  2016-01-02 23:59:59  0:00
49  2016-01-02 00:30:00  0:00
50  2016-01-02 01:00:00  0:02
... ...                  ...
I was trying to do it with a for loop, which was getting tedious. Is there a simple way to do this in pandas?
IIUC you can discard the Date column, get the time difference between start and end, group by 30 minutes and aggregate on first (assuming you always have only one entry per 30-minute slot):
print(df.assign(Diff=df["End Time"] - df["Start Time"])
        .groupby(pd.Grouper(key="Start Time", freq="30T"))
        .agg({"Diff": "first"})
        .fillna(pd.Timedelta(seconds=0)))
                               Diff
Start Time
2016-01-01 00:00:00 0 days 00:15:00
2016-01-01 00:30:00 0 days 00:00:00
2016-01-01 01:00:00 0 days 00:14:00
2016-01-01 01:30:00 0 days 00:00:00
2016-01-01 02:00:00 0 days 00:00:00
2016-01-01 02:30:00 0 days 00:00:00
...
2016-01-02 00:30:00 0 days 00:02:00
The idea is to create a series of zeros with a per-minute DatetimeIndex spanning from the earliest start time to the latest end time. Then add 1 at each Start Time and subtract 1 at each End Time. You can then use cumsum to count the minutes between Start and End, resample.sum per 30 minutes and reset_index. The last line of code gets the proper format in the Hours column.
# create a series of 0 with a datetime index
res = pd.Series(data=0,
                index=pd.DatetimeIndex(pd.date_range(df['Start Time'].min(),
                                                     df['End Time'].max(),
                                                     freq='T'),
                                       name='Dates'),
                name='Hours')
# add 1 at each start time and subtract 1 at each end time
res[df['Start Time']] += 1
res[df['End Time']] -= 1
# cumsum to get the right value for each minute, then resample per 30 minutes
res = (res.cumsum()
          .resample('30T', label='right').sum()
          .reset_index('Dates')
      )
# change the format of the Hours column, honestly not necessary
res['Hours'] = pd.to_datetime(res['Hours'], format='%M').dt.strftime('%H:%M')  # or .dt.time
print(res)
                 Dates  Hours
0  2016-01-01 00:30:00  00:15
1  2016-01-01 01:00:00  00:00
2  2016-01-01 01:30:00  00:13
3  2016-01-01 02:00:00  00:01
4  2016-01-01 02:30:00  00:00
5  2016-01-01 03:00:00  00:00
...
48 2016-01-02 00:30:00  00:00
49 2016-01-02 01:00:00  00:02
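A minimal pure-Python sketch of the same per-bin bookkeeping, assuming each row is a (start, end) pair of datetimes and that bins are aligned to midnight. Like `label='right'` in the resample answer, it labels each bin by its end time, but unlike resample it omits bins with no work. busy_minutes_per_bin is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def busy_minutes_per_bin(intervals, bin_size=timedelta(minutes=30)):
    """Map each bin's end time to the minutes of (start, end) intervals inside that bin."""
    totals = defaultdict(float)
    for start, end in intervals:
        # floor `start` to a bin boundary counted from midnight of its day
        midnight = datetime.combine(start.date(), datetime.min.time())
        bin_start = midnight + ((start - midnight) // bin_size) * bin_size
        while bin_start < end:
            bin_end = bin_start + bin_size
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > timedelta(0):
                totals[bin_end] += overlap.total_seconds() / 60
            bin_start = bin_end
    return dict(sorted(totals.items()))

d1, d2 = datetime(2016, 1, 1), datetime(2016, 1, 2)
tasks = [
    (d1 + timedelta(minutes=10), d1 + timedelta(minutes=25)),
    (d1 + timedelta(hours=1, minutes=17), d1 + timedelta(hours=1, minutes=31)),
    (d2 + timedelta(minutes=30), d2 + timedelta(minutes=32)),
]
for label, minutes in busy_minutes_per_bin(tasks).items():
    print(label, minutes)
```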
Convert a Pandas Column to Hours and Minutes
I have one field in a Pandas DataFrame that is in integer format. How do I convert it to a DateTime format and append the column to my DataFrame? Specifically, I need hours and minutes.
Example:
DataFrame name: df
The column as a list: df.index, dtype='int64'
Sample data in df.index: [0, 15, 30, 45, 100, 115, 130, 145, 200...2300, 2315, 2330, 2345]
I tried pd.to_datetime(df.index, format='') but it is returning the wrong format.
You have an index that has time values as HHMM represented by an integer. In order to convert this to a datetime dtype, you have to first make strings that can be correctly converted by the to_datetime() method.
time_strs = df.index.astype(str).str.zfill(4)
This converts all of the integer values to strings that are zero-padded to 4 characters, so 15 becomes the string "0015", for example. Now you can use the format "%H%M" to convert to a datetime object:
pd.to_datetime(time_strs, format="%H%M")
And then use the methods of datetime objects to access the hours and minutes.
import pandas as pd
df = pd.DataFrame({'time': [0, 15, 30, 45, 100, 115, 130, 145, 200, 2300, 2315, 2330, 2345]})
df.set_index('time', inplace=True)
df['datetime_dtype'] = pd.to_datetime(df.index, format='%H', exact=False)
df['str_dtype'] = df['datetime_dtype'].astype(str).str[11:16]
print(df)
           datetime_dtype str_dtype
time
0     1900-01-01 00:00:00     00:00
15    1900-01-01 15:00:00     15:00
30    1900-01-01 03:00:00     03:00
45    1900-01-01 04:00:00     04:00
100   1900-01-01 10:00:00     10:00
115   1900-01-01 11:00:00     11:00
130   1900-01-01 13:00:00     13:00
145   1900-01-01 14:00:00     14:00
200   1900-01-01 20:00:00     20:00
2300  1900-01-01 23:00:00     23:00
2315  1900-01-01 23:00:00     23:00
2330  1900-01-01 23:00:00     23:00
2345  1900-01-01 23:00:00     23:00
print(df.dtypes)
datetime_dtype    datetime64[ns]
str_dtype                 object
dtype: object
If you want to get back to this year, you can use a time delta.
delta = pd.Timedelta(weeks=6278, hours=0, minutes=0)
df['datetime_dtype_2020'] = df['datetime_dtype'] + delta
print(df)
           datetime_dtype str_dtype datetime_dtype_2020
time
0     1900-01-01 00:00:00     00:00 2020-04-27 00:00:00
15    1900-01-01 15:00:00     15:00 2020-04-27 15:00:00
30    1900-01-01 03:00:00     03:00 2020-04-27 03:00:00
45    1900-01-01 04:00:00     04:00 2020-04-27 04:00:00
100   1900-01-01 10:00:00     10:00 2020-04-27 10:00:00
115   1900-01-01 11:00:00     11:00 2020-04-27 11:00:00
130   1900-01-01 13:00:00     13:00 2020-04-27 13:00:00
145   1900-01-01 14:00:00     14:00 2020-04-27 14:00:00
200   1900-01-01 20:00:00     20:00 2020-04-27 20:00:00
2300  1900-01-01 23:00:00     23:00 2020-04-27 23:00:00
2315  1900-01-01 23:00:00     23:00 2020-04-27 23:00:00
2330  1900-01-01 23:00:00     23:00 2020-04-27 23:00:00
2345  1900-01-01 23:00:00     23:00 2020-04-27 23:00:00
If you only want hours and minutes, then you can use datetime.time objects.
import datetime
def int_to_time(i):
    if i < 60:
        return datetime.time(0, i)
    elif i < 1000:
        return datetime.time(int(str(i)[0]), int(str(i)[1:]))
    else:
        return datetime.time(int(str(i)[0:2]), int(str(i)[2:]))
# an Index has no .apply; .map applies the function to each value
df.index.map(int_to_time)
Example
import datetime
import numpy as np
import pandas as pd
ints = [i for i in np.random.randint(0, 2400, 100) if i % 100 < 60][0:5]
df = pd.DataFrame({'a': ints})
>>> df
      a
0  1559
1  1712
2  1233
3   953
4   938
>>> df['a'].apply(int_to_time)
0    15:59:00
1    17:12:00
2    12:33:00
3    09:53:00
4    09:38:00
From there, you can access the hour and minute properties of the values:
>>> df['a'].apply(int_to_time).apply(lambda x: (x.hour, x.minute))
0    (15, 59)
1    (17, 12)
2    (12, 33)
3     (9, 53)
4     (9, 38)
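Because the integers are HHMM, `divmod(value, 100)` splits them into hours and minutes directly, without string slicing. A minimal sketch; hhmm_to_time is a hypothetical name, and values whose last two digits exceed 59 (e.g. 175) raise ValueError from datetime.time. With pandas it could be applied as `df.index.map(hhmm_to_time)`.

```python
import datetime

def hhmm_to_time(value):
    """Convert an integer such as 2345 (HHMM) to datetime.time(23, 45)."""
    hours, minutes = divmod(int(value), 100)
    return datetime.time(hours, minutes)

print([hhmm_to_time(v) for v in (0, 15, 145, 2345)])
```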
Generating list of 5 minute interval between two times
I have the following strings:
start = "07:00:00"
end = "17:00:00"
How can I generate a list of times at 5-minute intervals between them, i.e. ["07:00:00","07:05:00",...,"16:55:00","17:00:00"]?
This works for me; I'm sure you can figure out how to put the results in a list instead of printing them out:
>>> import datetime
>>> start = "07:00:00"
>>> end = "17:00:00"
>>> delta = datetime.timedelta(minutes=5)
>>> start = datetime.datetime.strptime(start, '%H:%M:%S')
>>> end = datetime.datetime.strptime(end, '%H:%M:%S')
>>> t = start
>>> while t <= end:
...     print(datetime.datetime.strftime(t, '%H:%M:%S'))
...     t += delta
...
07:00:00 07:05:00 07:10:00 07:15:00 07:20:00 07:25:00 07:30:00 07:35:00 07:40:00 07:45:00 07:50:00 07:55:00 08:00:00 08:05:00 08:10:00 08:15:00 08:20:00 08:25:00 08:30:00 08:35:00 08:40:00 08:45:00 08:50:00 08:55:00 09:00:00 09:05:00 09:10:00 09:15:00 09:20:00 09:25:00 09:30:00 09:35:00 09:40:00 09:45:00 09:50:00 09:55:00 10:00:00 10:05:00 10:10:00 10:15:00 10:20:00 10:25:00 10:30:00 10:35:00 10:40:00 10:45:00 10:50:00 10:55:00 11:00:00 11:05:00 11:10:00 11:15:00 11:20:00 11:25:00 11:30:00 11:35:00 11:40:00 11:45:00 11:50:00 11:55:00 12:00:00 12:05:00 12:10:00 12:15:00 12:20:00 12:25:00 12:30:00 12:35:00 12:40:00 12:45:00 12:50:00 12:55:00 13:00:00 13:05:00 13:10:00 13:15:00 13:20:00 13:25:00 13:30:00 13:35:00 13:40:00 13:45:00 13:50:00 13:55:00 14:00:00 14:05:00 14:10:00 14:15:00 14:20:00 14:25:00 14:30:00 14:35:00 14:40:00 14:45:00 14:50:00 14:55:00 15:00:00 15:05:00 15:10:00 15:15:00 15:20:00 15:25:00 15:30:00 15:35:00 15:40:00 15:45:00 15:50:00 15:55:00 16:00:00 16:05:00 16:10:00 16:15:00 16:20:00 16:25:00 16:30:00 16:35:00 16:40:00 16:45:00 16:50:00 16:55:00 17:00:00
>>>
Try:
# import modules
from datetime import datetime, timedelta
# create start and end datetime objects from the strings
start = datetime.strptime("07:00:00", "%H:%M:%S")
end = datetime.strptime("17:00:00", "%H:%M:%S")
# gap between entries, in minutes
min_gap = 5
# compute the datetime interval; + 1 so that the end time itself is included
arr = [(start + timedelta(hours=min_gap*i/60)).strftime("%H:%M:%S")
       for i in range(int((end-start).total_seconds() / 60.0 / min_gap) + 1)]
print(arr)
# ['07:00:00', '07:05:00', '07:10:00', '07:15:00', '07:20:00', '07:25:00', '07:30:00', ..., '16:55:00', '17:00:00']
Explanations:
First, you need to convert the date strings to datetime objects. strptime does that!
Then, we find the number of minutes between the starting datetime and the ending datetime. This discussion solved it! We can do it like this:
(end-start).total_seconds() / 60.0
However, in our case we only want to iterate every n minutes, so we need to divide by n. Also, as we iterate over this number, we need to convert it to int for the for loop, and add 1 so that the end time is part of the result. That results in:
int((end-start).total_seconds() / 60.0 / min_gap) + 1
Then, on each iteration of the loop, we add the corresponding number of minutes to the initial datetime, which is what timedelta is designed for. As the parameter, we specify the number of hours we want to add: min_gap*i/60.
Finally, we convert this datetime object back to a string using strftime.
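A minimal sketch that combines the two answers: the `while t <= end` loop from the first (so 17:00:00 is included) wrapped in a generator that yields strings, so `list()` gives the list the question asks for. time_range is a hypothetical name, and it assumes start and end are on the same day.

```python
from datetime import datetime, timedelta

def time_range(start, end, step=timedelta(minutes=5), fmt='%H:%M:%S'):
    """Yield formatted times from start to end inclusive, every `step`."""
    t = datetime.strptime(start, fmt)
    stop = datetime.strptime(end, fmt)
    while t <= stop:
        yield t.strftime(fmt)
        t += step

times = list(time_range("07:00:00", "17:00:00"))
print(times[:3], '...', times[-2:])
```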
Pandas filtering values in dataframe
I have this DataFrame. The columns represent the highs and the lows in the daily EURUSD price:
df.low                          df.high
2013-01-17 16:00:00  1.33394    2013-01-17 20:00:00  1.33874
2013-01-18 18:00:00  1.32805    2013-01-18 09:00:00  1.33983
2013-01-21 00:00:00  1.32962    2013-01-21 09:00:00  1.33321
2013-01-22 11:00:00  1.32667    2013-01-22 09:00:00  1.33715
2013-01-23 17:00:00  1.32645    2013-01-23 14:00:00  1.33545
2013-01-24 10:00:00  1.32860    2013-01-24 18:00:00  1.33926
2013-01-25 04:00:00  1.33497    2013-01-25 17:00:00  1.34783
2013-01-28 10:00:00  1.34246    2013-01-28 16:00:00  1.34771
2013-01-29 13:00:00  1.34143    2013-01-29 21:00:00  1.34972
2013-01-30 08:00:00  1.34820    2013-01-30 21:00:00  1.35873
2013-01-31 13:00:00  1.35411    2013-01-31 17:00:00  1.35944
I combined them into a third column (df.extremes):
df.extremes
2013-01-17 16:00:00  1.33394
2013-01-17 20:00:00  1.33874
2013-01-18 18:00:00  1.32805
2013-01-18 09:00:00  1.33983
2013-01-21 00:00:00  1.32962
2013-01-21 09:00:00  1.33321
2013-01-22 09:00:00  1.33715
2013-01-22 11:00:00  1.32667
2013-01-23 14:00:00  1.33545
2013-01-23 17:00:00  1.32645
2013-01-24 10:00:00  1.32860
2013-01-24 18:00:00  1.33926
2013-01-25 04:00:00  1.33497
2013-01-25 17:00:00  1.34783
2013-01-28 10:00:00  1.34246
2013-01-28 16:00:00  1.34771
2013-01-29 13:00:00  1.34143
2013-01-29 21:00:00  1.34972
2013-01-30 08:00:00  1.34820
2013-01-30 21:00:00  1.35873
2013-01-31 13:00:00  1.35411
2013-01-31 17:00:00  1.35944
But now I want to filter some values from df.extremes. To explain what to filter, I'll try with this pseudocode:
IF following the index we move from: previous df.low --> df.low --> df.high:
    IF df.low > previous df.low: delete df.low
    IF df.low < previous df.low: delete previous df.low
If I try to work this out with a for loop, it gives me a KeyError: 1.3339399999999999.
day = df.groupby(pd.TimeGrouper('D'))
is_day_min = day.extremes.apply(lambda x: x == x.min())
for i in df.extremes:
    if is_day_min[i] == True and is_day_min[i+1] == True:
        if df.extremes[i] > df.extremes[i+1]:
            del df.extremes[i]
for i in df.extremes:
    if is_day_min[i] == True and is_day_min[i+1] == True:
        if df.extremes[i] < df.extremes[i+1]:
            del df.extremes[i+1]
How do I filter/delete the values as I explained in the pseudocode? I am struggling with indexing and booleans and can't solve this. I strongly suspect that I need to use a lambda function, but I don't know how to apply it. So please have mercy; I've been trying this for a long time. I hope I've been clear enough.
All you're really missing is a way of saying "previous low" in a vectorized fashion. That's spelled df['low'].shift(1) (shift(-1) would give the next low instead). Once you have that it's just:
prev = df.low.shift(1)
filtered_df = df[~((df.low > prev) | (df.low < prev))]
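One reading of the question's pseudocode is: when two lows follow each other in time with no high between them, keep only the lower one. A minimal plain-Python sketch of that interpretation (an assumption, not the answer's vectorized method), where each event is a hypothetical (timestamp, kind, price) tuple sorted by timestamp; drop_redundant_lows is a hypothetical name.

```python
def drop_redundant_lows(events):
    """events: (timestamp, kind, price) tuples sorted by timestamp, kind 'low' or 'high'.
    When two lows occur back to back, keep only the lower one."""
    result = []
    for event in events:
        _, kind, price = event
        if result and kind == 'low' and result[-1][1] == 'low':
            if price < result[-1][2]:
                result[-1] = event  # current low is lower: delete the previous low
            # otherwise the current low is higher (or equal): delete it by skipping
        else:
            result.append(event)
    return result

events = [
    ('2013-01-22 09:00', 'high', 1.33715),
    ('2013-01-22 11:00', 'low', 1.32667),
    ('2013-01-23 14:00', 'high', 1.33545),
    ('2013-01-23 17:00', 'low', 1.32645),
    ('2013-01-24 10:00', 'low', 1.32860),
    ('2013-01-24 18:00', 'high', 1.33926),
]
for event in drop_redundant_lows(events):
    print(event)
```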