Python3, nested dict comparison (recursive?)

I'm writing a program that takes a .csv file and produces 'metrics' for ticket closure data. Each ticket has one or more time entries; the goal is to grab the 'delta' (i.e. the time difference) for open -> close and for time_start -> time_end on a PER TICKET basis. (These are not real variable names; they're just for the purpose of this question.)
So, say we have ticket 12345 that has 3 time entries like so:
ticket: 12345
open: 2016-09-26 00:00:00.000 close: 2016-09-27 00:01:00.000
time_start: 2016-09-26 00:01:00.000 time_end: 2016-09-26 00:02:00.000
ticket: 12345
open: 2016-09-26 00:00:00.000 close: 2016-09-27 00:01:00.000
time_start: 2016-09-26 00:01:00.000 time_end: 2016-09-26 00:02:00.000
ticket: 12345
open: 2016-09-26 00:00:00.000 close: 2016-09-27 00:01:00.000
time_start: 2016-09-26 00:01:00.000 time_end: 2016-09-27 00:02:00.000
I'd like to have the program display ONE entry for this, adding up the 'deltas', like so:
ticket: 12345
Delta open/close ($total time from open to close):
Delta start/end: ($total time of ALL ticket time entries added up)
Here's what I have so far.
.csv example:
Ticket #,Ticket Type,Opened,Closed,Time Entry Day,Start,End
737385,Software,2016-09-06 12:48:31.680,2016-09-06 15:41:52.933,2016-09-06 00:00:00.000,1900-01-01 15:02:00.417,1900-01-01 15:41:00.417
737318,Hardware,2016-09-06 12:20:28.403,2016-09-06 14:35:58.223,2016-09-06 00:00:00.000,1900-01-01 14:04:00.883,1900-01-01 14:35:00.883
737296,Printing/Scan/Fax,2016-09-06 11:37:10.387,2016-09-06 13:33:07.577,2016-09-06 00:00:00.000,1900-01-01 13:29:00.240,1900-01-01 13:33:00.240
737273,Software,2016-09-06 10:54:40.177,2016-09-06 13:28:24.140,2016-09-06 00:00:00.000,1900-01-01 13:17:00.860,1900-01-01 13:28:00.860
737261,Software,2016-09-06 10:33:09.070,2016-09-06 13:19:41.573,2016-09-06 00:00:00.000,1900-01-01 13:05:00.113,1900-01-01 13:15:00.113
737238,Software,2016-09-06 09:52:57.090,2016-09-06 14:42:16.287,2016-09-06 00:00:00.000,1900-01-01 12:01:00.350,1900-01-01 12:04:00.350
737238,Software,2016-09-06 09:52:57.090,2016-09-06 14:42:16.287,2016-09-06 00:00:00.000,1900-01-01 14:36:00.913,1900-01-01 14:42:00.913
737220,Password,2016-09-06 09:28:16.060,2016-09-06 11:41:16.750,2016-09-06 00:00:00.000,1900-01-01 11:30:00.303,1900-01-01 11:36:00.303
737197,Hardware,2016-09-06 08:50:23.197,2016-09-06 14:02:18.817,2016-09-06 00:00:00.000,1900-01-01 13:48:00.530,1900-01-01 14:02:00.530
736964,Internal,2016-09-06 01:02:27.453,2016-09-06 05:46:00.160,2016-09-06 00:00:00.000,1900-01-01 06:38:00.917,1900-01-01 06:45:00.917
Time_Entry.py:
#! /usr/bin/python
from datetime import *

class Time_Entry:
    def __init__(self, ticket_no, time_entry_day, opened, closed, start, end):
        self.ticket_no = ticket_no
        self.time_entry_day = time_entry_day
        self.opened = opened
        self.closed = closed
        self.start = datetime.strptime(start, '%Y-%m-%d %H:%M:%S.%f')
        self.end = datetime.strptime(end, '%Y-%m-%d %H:%M:%S.%f')
        self.total_open_close_delta = 0
        self.total_start_end_delta = 0

    def open_close_delta(self, topen, tclose):
        open_time = datetime.strptime(topen, '%Y-%m-%d %H:%M:%S.%f')
        if tclose != '\\N':
            close_time = datetime.strptime(tclose, '%Y-%m-%d %H:%M:%S.%f')
            self.total_open_close_delta = close_time - open_time

    def start_end_delta(self, tstart, tend):
        start_time = datetime.strptime(tstart, '%Y-%m-%d %H:%M:%S.%f')
        end_time = datetime.strptime(tend, '%Y-%m-%d %H:%M:%S.%f')
        start_end_delta = (end_time - start_time).seconds
        self.total_start_end_delta += start_end_delta
        return (self.total_start_end_delta)

    def add_start_end_delta(self, delta):
        self.total_start_end_delta += delta

    def display(self):
        print('Ticket #: %7.7s Start: %-15s End: %-15s Delta: %-10s' % (self.ticket_no, self.start.time(), self.end.time(), self.total_start_end_delta))
Which is called by metrics.py:
#! /usr/bin/python
import csv
import pprint
from Time_Entry import *

file = '/home/jmd9qs/userdrive/metrics.csv'

# setup CSV, load up a list of dicts
reader = csv.DictReader(open(file))
dict_list = []
for line in reader:
    dict_list.append(line)

def load_tickets(ticket_list):
    for i, key in enumerate(ticket_list):
        ticket_no = key['Ticket #']
        time_entry_day = key['Time Entry Day']
        opened = key['Opened']
        closed = key['Closed']
        start = key['Start']
        end = key['End']
        time_entry = Time_Entry(ticket_no, time_entry_day, opened, closed, start, end)
        time_entry.open_close_delta(opened, closed)
        time_entry.start_end_delta(start, end)
        for h, key2 in enumerate(ticket_list):
            ticket_no2 = key2['Ticket #']
            time_entry_day2 = key2['Time Entry Day']
            opened2 = key2['Opened']
            closed2 = key2['Closed']
            start2 = key2['Start']
            end2 = key2['End']
            time_entry2 = Time_Entry(ticket_no2, time_entry_day2, opened2, closed2, start2, end2)
            if time_entry.ticket_no == time_entry2.ticket_no and i != h:
                # add delta and remove second time_entry from dict (no counting twice)
                time_entry2_delta = time_entry2.start_end_delta(start2, end2)
                time_entry.add_start_end_delta(time_entry2_delta)
                del dict_list[h]
        time_entry.display()

load_tickets(dict_list)
This seems to work OK so far; however, I get multiple lines of output per ticket instead of one with the 'deltas' added. FYI the way the program displays output is different from my example, which is intentional. See example below:
Ticket #: 738388 Start: 15:24:00.313000 End: 15:35:00.313000 Delta: 2400
Ticket #: 738388 Start: 16:30:00.593000 End: 16:40:00.593000 Delta: 1260
Ticket #: 738381 Start: 15:40:00.763000 End: 16:04:00.767000 Delta: 1440
Ticket #: 738357 Start: 13:50:00.717000 End: 14:10:00.717000 Delta: 1200
Ticket #: 738231 Start: 11:16:00.677000 End: 11:21:00.677000 Delta: 720
Ticket #: 738203 Start: 16:15:00.710000 End: 16:31:00.710000 Delta: 2160
Ticket #: 738203 Start: 09:57:00.060000 End: 10:02:00.060000 Delta: 1560
Ticket #: 738203 Start: 12:26:00.597000 End: 12:31:00.597000 Delta: 900
Ticket #: 738135 Start: 13:25:00.880000 End: 13:50:00.880000 Delta: 2040
Ticket #: 738124 Start: 07:56:00.117000 End: 08:31:00.117000 Delta: 2100
Ticket #: 738121 Start: 07:47:00.903000 End: 07:52:00.903000 Delta: 300
Ticket #: 738115 Start: 07:15:00.443000 End: 07:20:00.443000 Delta: 300
Ticket #: 737926 Start: 06:40:00.813000 End: 06:47:00.813000 Delta: 420
Ticket #: 737684 Start: 18:50:00.060000 End: 20:10:00.060000 Delta: 13380
Ticket #: 737684 Start: 13:00:00.560000 End: 13:08:00.560000 Delta: 8880
Ticket #: 737684 Start: 08:45:00 End: 10:00:00 Delta: 9480
Note that there are a few tickets with more than one entry, which is what I don't want.
Any notes on style, convention, etc. are also welcome, as I'm trying to be more 'Pythonic'.

The problem here is that with a nested loop like the one you implemented, you examine each matching pair of entries twice. Let me explain it better:
ticket_list = [111111, 111111, 666666, 777777]  # let's simplify by considering the ids only
# I'm trying to keep the same variable names
for i, key1 in enumerate(ticket_list):  # outer loop
    cnt = 1
    for h, key2 in enumerate(ticket_list):  # inner loop
        if key1 == key2 and i != h:
            print('>> match on i:', i, '- h:', h)
            cnt += 1
    print('Found', key1, cnt, 'times')
See how it double counts the 111111
>> match on i: 0 - h: 1
Found 111111 2 times
>> match on i: 1 - h: 0
Found 111111 2 times
Found 666666 1 times
Found 777777 1 times
That's because you will match the 111111 both when the inner loop examines the first position and the outer the second (i: 0, h: 1), and again when the outer is on the second position and the inner is on the first (i: 1, h: 0).
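For comparison, if all you need is how many times each ticket id occurs, you don't need a nested loop at all; collections.Counter does the counting for you (a minimal sketch on the simplified id list, not the approach proposed below):
from collections import Counter

ticket_list = [111111, 111111, 666666, 777777]

for ticket_no, cnt in Counter(ticket_list).items():
    print('Found', ticket_no, cnt, 'times')  # each id is reported exactly once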
A proposed solution
A better solution for your problem is to group the entries for the same ticket together and then sum your deltas. groupby is ideal for this task. I took the liberty of rewriting some of the code:
Here I modified the constructor to accept the dictionary itself, which makes passing the parameters less messy. I also removed the methods that added up the deltas; we'll see why later.
import csv
import itertools
from datetime import *

class Time_Entry(object):
    def __init__(self, entry):
        self.ticket_no = entry['Ticket #']
        self.time_entry_day = entry['Time Entry Day']
        self.opened = datetime.strptime(entry['Opened'], '%Y-%m-%d %H:%M:%S.%f')
        self.closed = datetime.strptime(entry['Closed'], '%Y-%m-%d %H:%M:%S.%f')
        self.start = datetime.strptime(entry['Start'], '%Y-%m-%d %H:%M:%S.%f')
        self.end = datetime.strptime(entry['End'], '%Y-%m-%d %H:%M:%S.%f')
        self.total_open_close_delta = (self.closed - self.opened).seconds
        self.total_start_end_delta = (self.end - self.start).seconds

    def display(self):
        print('Ticket #: %7.7s Start: %-15s End: %-15s Delta: %-10s' % (self.ticket_no, self.start.time(), self.end.time(), self.total_start_end_delta))
Here we load the data using a list comprehension; the final output will be a list of Time_Entry objects:
with open('metrics.csv') as ticket_list:
    time_entry_list = [Time_Entry(line) for line in csv.DictReader(ticket_list)]

print(time_entry_list)
# [<Time_Entry object at 0x101142f60>, <Time_Entry object at 0x10114d048>, <Time_Entry object at 0x1011fddd8>, ... ]
In the nested-loop version, instead, you kept rebuilding Time_Entry objects inside the inner loop, which means that for 100 entries you end up constructing 10,000 temporary objects! Building the list up front lets us initialize each Time_Entry only once.
Here comes the magic: we can use groupby to collect all the objects with the same ticket_no into the same list:
time_entry_list.sort(key=lambda x: x.ticket_no)  # groupby needs the entries sorted by the grouping key
ticket_grps = itertools.groupby(time_entry_list, key=lambda x: x.ticket_no)
tickets = [(id, [t for t in tickets]) for id, tickets in ticket_grps]
The final result in tickets is a list of tuples, with the ticket id in the first position and the list of associated Time_Entry objects in the second:
print(tickets)
# [('737385', [<Time_Entry object at 0x101142f60>]),
# ('737318', [<Time_Entry object at 0x10114d048>]),
# ('737238', [<Time_Entry object at 0x1011fdd68>, <Time_Entry object at 0x1011fde80>]),
# ...]
Finally we can iterate over all the tickets and, again using a list comprehension, build a list containing only the deltas so we can sum them. You can now see why we removed the old methods that updated the deltas: each entry simply stores its own value, and we sum them externally.
Here is your result:
for ticket in tickets:
    print('ticket:', ticket[0])
    # extract the list of deltas and then sum them
    print('Delta open / close:', sum([entry.total_open_close_delta for entry in ticket[1]]))
    print('Delta start / end:', sum([entry.total_start_end_delta for entry in ticket[1]]))
    print('(found {} occurrences)'.format(len(ticket[1])))
    print()
Output:
ticket: 736964
Delta open / close: 17012
Delta start / end: 420
(found 1 occurrences)
ticket: 737197
Delta open / close: 18715
Delta start / end: 840
(found 1 occurrences)
ticket: 737220
Delta open / close: 7980
Delta start / end: 360
(found 1 occurrences)
ticket: 737238
Delta open / close: 34718
Delta start / end: 540
(found 2 occurrences)
ticket: 737261
Delta open / close: 9992
Delta start / end: 600
(found 1 occurrences)
ticket: 737273
Delta open / close: 9223
Delta start / end: 660
(found 1 occurrences)
ticket: 737296
Delta open / close: 6957
Delta start / end: 240
(found 1 occurrences)
ticket: 737318
Delta open / close: 8129
Delta start / end: 1860
(found 1 occurrences)
ticket: 737385
Delta open / close: 10401
Delta start / end: 2340
(found 1 occurrences)
At the end of the story: list comprehensions can be super useful; they allow you to do a lot with very compact syntax. The Python standard library also contains a lot of ready-to-use tools that can really come to your aid, so get familiar with it!
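As one more illustration of those ready-to-use tools: the same grouping can also be done with collections.defaultdict, which has the nice property of not requiring the list to be sorted first (a small sketch, assuming the time_entry_list built above):
from collections import defaultdict

groups = defaultdict(list)
for entry in time_entry_list:
    groups[entry.ticket_no].append(entry)  # entries with the same ticket end up in one list

for ticket_no, entries in groups.items():
    print('ticket:', ticket_no)
    print('Delta open / close:', sum(e.total_open_close_delta for e in entries))
    print('Delta start / end:', sum(e.total_start_end_delta for e in entries))
    print()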

Related

How do I add a daterange to open trades during when backtesting with backtrader?

I am trying to backtest a strategy where trades are only opened between 08:30 and 16:00, using backtrader.
With the attempt below my code runs but returns no trades, so my closing balance is the same as the opening balance. If I remove this filter the code runs correctly and trades open and close, so the filter is definitely the issue. Can anyone please help?
I have tried adding the datetime column of the data to a data feed using the below code:
def __init__(self):
    # Keep a reference to the "close" line in the data[0] dataseries
    self.dataclose = self.datas[0].close
    self.datatime = mdates.num2date(self.datas[0].datetime)
    self.datatsi = self.datas[0].tsi
    self.datapsar = self.datas[0].psar
    self.databbmiddle = self.datas[0].bbmiddle
    self.datastlower = self.datas[0].stlower
    self.datastupper = self.datas[0].stupper
    # To keep track of pending orders
    self.order = None
I then used the following code to try to filter by this time range:
# Check if we are in the market
if not self.position:
    current_time = self.datatime[0].time()
    if datetime.time(8, 30) < current_time < datetime.time(16, 0):
        if self.datatsi < 0 and self.datastupper[0] > self.dataclose[0] and self.datastlower[1] < self.dataclose[1] and self.dataclose[0] < self.databbmiddle[0] and self.datapsar[0] > self.dataclose[0]:
            self.log('SELL CREATE, %.2f' % self.dataclose[0])
            # Keep track of the created order to avoid a 2nd order
            os = self.sell_bracket(size=100, price=sp1, stopprice=sp2, limitprice=sp3)
            self.orefs = [o.ref for o in os]
        else:
            o1 = self.buy(exectype=bt.Order.Limit, price=bp1, transmit=False)
            print('{}: Oref {} / Buy at {}'.format(self.datetime.date(), o1.ref, bp1))
            o2 = self.sell(exectype=bt.Order.Stop, price=bp2, parent=o1, transmit=False)
            print('{}: Oref {} / Sell Stop at {}'.format(self.datetime.date(), o2.ref, bp2))
            o3 = self.sell(exectype=bt.Order.Limit, price=bp3, parent=o1, transmit=True)
            print('{}: Oref {} / Sell Limit at {}'.format(self.datetime.date(), o3.ref, bp3))
            self.orefs = [o1.ref, o2.ref, o3.ref]  # self.sell(size=100, exectype=bt.Order.Limit, price=self.data.close[0]+16, parent=self.order, parent_bracket=bt.Order.Market)
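For reference, backtrader exposes the current bar's time of day directly on the datetime line, which avoids the matplotlib num2date conversion shown above. Below is a minimal sketch of such a session filter inside a standard bt.Strategy; it is an assumption about how the filter could be expressed, not a verified fix for the full strategy:
import datetime
import backtrader as bt

class SessionFilteredStrategy(bt.Strategy):
    def next(self):
        # time of day of the current bar, straight from the data feed
        bar_time = self.datas[0].datetime.time(0)
        if not datetime.time(8, 30) < bar_time < datetime.time(16, 0):
            return  # outside the session window: do not open new trades
        if not self.position:
            pass  # entry logic (signals, bracket orders, ...) would go here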

TypeError: coercing to Unicode: need string or buffer, datetime.timedelta found

I'm currently getting this error:
Traceback (most recent call last):
  File "/Users/user/Documents/test.py", line 44, in <module>
    get_slots(hours, appointments)
  File "/Users/user/Documents/test.py", line 36, in get_slots
    while start + duration <= end:
TypeError: coercing to Unicode: need string or buffer, datetime.timedelta found
My code:
from datetime import timedelta
import datetime

# notice the additional brackets to keep the 2 slots as two separate lists. So, 930-1230 is one slot, 1330-1400 is another.
# HOURS AND APPOINTMENTS ARE GENERATED BY GATHERING DATA FROM DATABASE
hours = [[u'08:00', u'17:00']]
appointments = [(u'12:00', u'12:30'), (u'10:30', u'11:00')]

def get_slots(hours, appointments, duration=timedelta(hours=1)):
    slots = sorted([(hours[0][0], hours[0][0])] + appointments + [(hours[0][1], hours[0][1])])
    for start, end in ((slots[i][1], slots[i+1][0]) for i in range(len(slots)-1)):
        assert start <= end, "Cannot attend all appointments"
        while start + duration <= end:
            json = []
            json.append("{:%H:%M} - {:%H:%M}".format(start, start + duration))
            start += duration
    return json

if __name__ == "__main__":
    get_slots(hours, appointments)
The code should output something like:
09:00 - 10:00
10:30 - 11:30
13:00 - 14:00
14:00 - 15:00
I found this code from Python - finding time slots
You have to convert both the start and end strings to datetime objects. See the example below:
from datetime import timedelta
import datetime

# notice the additional brackets to keep the 2 slots as two separate lists. So, 930-1230 is one slot, 1330-1400 is another.
# HOURS AND APPOINTMENTS ARE GENERATED BY GATHERING DATA FROM DATABASE
hours = [[u'08:00', u'17:00']]
appointments = [(u'12:00', u'12:30'), (u'10:30', u'11:00')]

def get_slots(hours, appointments, duration=timedelta(hours=1)):
    slots = sorted([(hours[0][0], hours[0][0])] + appointments + [(hours[0][1], hours[0][1])])
    json = []  # collect the free slots across all gaps
    for start, end in ((slots[i][1], slots[i+1][0]) for i in range(len(slots)-1)):
        # convert the strings to datetime objects before doing arithmetic on them
        start = datetime.datetime.strptime(start, "%H:%M")
        end = datetime.datetime.strptime(end, "%H:%M")
        assert start <= end, "Cannot attend all appointments"
        while start + duration <= end:
            json.append("{:%H:%M} - {:%H:%M}".format(start, start + duration))
            start += duration
    return json

if __name__ == "__main__":
    x = get_slots(hours, appointments)
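Note that datetime.strptime with only '%H:%M' produces datetimes on the default date 1900-01-01, which is fine here because all the slots are compared and formatted within a single day. With json initialised once before the loop (as above), a small usage sketch looks like this:
for slot in get_slots(hours, appointments):
    print(slot)
# with the sample data this prints hour-long free slots such as
# '08:00 - 09:00', '09:00 - 10:00', '11:00 - 12:00', '12:30 - 13:30', ...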

How to loop function using a list of variables?

I have a function that prints OHLCV data for stock prices from a websocket. It works, but I have to copy it for each variable (Var1 to Var14) to get each individual stock's data. How would I automate this process, given that I have the list:
varlist = [var1, var2, var3...var14]
and my code is:
def process_messages_for_var1(msg):
    if msg['e'] == 'error':
        print(msg['m'])
    # If message is a trade, print the OHLC data
    else:
        # Convert time into understandable structure
        transactiontime = msg['k']['T'] / 1000
        transactiontime = datetime.fromtimestamp(transactiontime).strftime('%d %b %Y %H:%M:%S')
        # Process this message once websocket starts
        print("{} - {} - Interval {} - Open: {} - Close: {} - High: {} - Low: {} - Volume: {}".
              format(transactiontime, msg['s'], msg['k']['i'], msg['k']['o'], msg['k']['c'], msg['k']['h'], msg['k']['l'], msg['k']['v']))
        # Also, put information into an array
        kline_array_msg = "{},{},{},{},{},{}".format(
            msg['k']['T'], msg['k']['o'], msg['k']['c'], msg['k']['h'], msg['k']['l'], msg['k']['v'])
        # Insert at first position
        kline_array_dct[var1].insert(0, kline_array_msg)
        if (len(kline_array_dct[var1]) > window):
            # Remove last message when res_array size is > of FIXED_SIZE
            del kline_array_dct[var1][-1]
I'm hoping to get the following result (notice how function name also changes):
def process_messages_for_var2(msg):
    if msg['e'] == 'error':
        print(msg['m'])
    # If message is a trade, print the OHLC data
    else:
        # Convert time into understandable structure
        transactiontime = msg['k']['T'] / 1000
        transactiontime = datetime.fromtimestamp(transactiontime).strftime('%d %b %Y %H:%M:%S')
        # Process this message once websocket starts
        print("{} - {} - Interval {} - Open: {} - Close: {} - High: {} - Low: {} - Volume: {}".
              format(transactiontime, msg['s'], msg['k']['i'], msg['k']['o'], msg['k']['c'], msg['k']['h'], msg['k']['l'], msg['k']['v']))
        # Also, put information into an array
        kline_array_msg = "{},{},{},{},{},{}".format(
            msg['k']['T'], msg['k']['o'], msg['k']['c'], msg['k']['h'], msg['k']['l'], msg['k']['v'])
        # Insert at first position
        kline_array_dct[var2].insert(0, kline_array_msg)
        if (len(kline_array_dct[var2]) > window):
            # Remove last message when res_array size is > of FIXED_SIZE
            del kline_array_dct[var2][-1]
You can adjust the function so that it takes one of the vars as an argument. I.e.,
def process_messages(msg, var):
    ...
    kline_array_dct[var].insert(0, kline_array_msg)
    if (len(kline_array_dct[var]) > window):
        # Remove last message when res_array size is > of FIXED_SIZE
        del kline_array_dct[var][-1]
If the processes are generally the same, just define one of them, and give it more arguments:
def process_messages(msg, var):
Then, you can adjust your process code to run through each var when you call it. You can do this by removing the numbered vars in the process code:
if msg['e'] == 'error':
    print(msg['m'])
# If message is a trade, print the OHLC data
else:
    # Convert time into understandable structure
    transactiontime = msg['k']['T'] / 1000
    transactiontime = datetime.fromtimestamp(transactiontime).strftime('%d %b %Y %H:%M:%S')
    # Process this message once websocket starts
    print("{} - {} - Interval {} - Open: {} - Close: {} - High: {} - Low: {} - Volume: {}".
          format(transactiontime, msg['s'], msg['k']['i'], msg['k']['o'], msg['k']['c'], msg['k']['h'], msg['k']['l'], msg['k']['v']))
    # Also, put information into an array
    kline_array_msg = "{},{},{},{},{},{}".format(
        msg['k']['T'], msg['k']['o'], msg['k']['c'], msg['k']['h'], msg['k']['l'], msg['k']['v'])
    # Insert at first position
    kline_array_dct[var].insert(0, kline_array_msg)
    if (len(kline_array_dct[var]) > window):
        # Remove last message when res_array size is > of FIXED_SIZE
        del kline_array_dct[var][-1]
Then, create a simple for loop to call the process for each var in the list:
for var in varlist:
    process_messages(msg, var)
The for loop will call the process for each var in the list.
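If your websocket client expects a single-argument callback (so you cannot pass var at call time), a common pattern is to bind each var up front with functools.partial. The register_callback call below is a hypothetical stand-in for whatever registration API your client actually uses:
from functools import partial

for var in varlist:
    # each handler is process_messages with `var` already filled in,
    # so the websocket only has to supply `msg`
    handler = partial(process_messages, var=var)
    register_callback(var, handler)  # hypothetical registration function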

Python: Finding time taken for each event in dataframe based on condition

I have a df with two columns, timestamp & eventType.
timestamp is ordered chronologically, and eventType can be one of ['start', 'change', 'end', 'resolve'].
['start', 'change'] denotes the start of an event
['end','resolve'] denotes the end of an event
createdTime actionName
2020-03-16 18:28:14 start
2020-03-17 19:12:42 end
2020-03-18 19:56:10 change
2020-03-19 21:29:13 change
2020-03-20 21:42:06 end
2020-03-21 18:28:14 start
2020-03-21 19:12:42 resolve
2020-03-22 19:56:10 change
2020-03-22 21:29:13 change
2020-03-23 21:42:06 end
I wish to calculate the time delta from each start/change event to the next end/resolve event.
An event can have several start/change statuses before it is resolved, so an event's start time should be taken from its first start/change row.
The output needs to be a list of the time deltas taken for each event in the df.
Thanks in advance :)
Edit
The expected outcome should be a list containing the time taken for each event.
event_times = ['24:44:28', '49:45:56', '0:44:28', '25:45:56']
Better late than never?
import pandas as pd  # assumes df already holds the createdTime / actionName columns from the question

df['createdTime'] = pd.to_datetime(df.createdTime)

starts = ['start', 'change']
ends = ['end', 'resolve']

prev_status = 'end'
spans = []
for i in range(len(df)):
    curr_status = df.actionName[i]
    if curr_status in starts and prev_status in starts:
        pass
    elif curr_status in starts and prev_status in ends:
        start_time = df.createdTime[i]
    elif curr_status in ends and prev_status in starts:
        t = df.createdTime[i] - start_time
        hours = t.days * 24 + t.seconds // 3600
        minutes = t.seconds % 3600 // 60
        seconds = t.seconds % 60
        spans.append(f"{hours}:{minutes}:{seconds}")
    elif curr_status in ends and prev_status in ends:
        raise ValueError(f"Two ends in a row at index {i}.")
    else:
        raise ValueError(f"Unrecognized action type at index {i}.")
    prev_status = curr_status

print(spans)
gives
['24:44:28', '49:45:56', '0:44:28', '25:45:56']
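For comparison, here is a vectorized sketch of the same idea. It assumes the same df, already in chronological order, and that every end/resolve row closes the event opened by the nearest preceding start/change row:
import pandas as pd

df['createdTime'] = pd.to_datetime(df['createdTime'])
is_end = df['actionName'].isin(['end', 'resolve'])

# a new event starts on the row right after the previous end/resolve
event_id = is_end.shift(fill_value=True).cumsum()

# delta between the first (start/change) and last (end/resolve) row of each event
event_times = (
    df.groupby(event_id)['createdTime']
      .agg(lambda s: s.iloc[-1] - s.iloc[0])
      .tolist()
)
print(event_times)
# [Timedelta('1 days 00:44:28'), Timedelta('2 days 01:45:56'), ...]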

Split up datetime interval according to labeled partition of a week

I have shift which is a datetime interval (a pair of datetimes). My weeks have a labeled partition (every week is the same: divided into parts, and each part has a label). I want to split up shift into labeled parts (i.e. into several subintervals), according to the partition of the week.
Example. Suppose shift is the interval 2019-10-21 18:30 - 2019-10-22 08:00, and the partition of the week is as follows: Monday to Friday 07:00 - 19:00 has label A, and the rest of the week has label B.
In this case the splitting of shift should be the following list of labeled subintervals:
2019-10-21 18:30 - 2019-10-21 19:00 with label A,
2019-10-21 19:00 - 2019-10-22 07:00 with label B, and
2019-10-22 07:00 - 2019-10-22 08:00 with label A.
How do I do this in general?
Input: a datetime interval (pair), and a labeled partition of the week (how to best represent this?)
Output: a list of labeled datetime intervals (pairs).
Note that shift can start in one week and end in another week (e.g. Sunday evening to Monday morning); each week does have the same labeled partition.
Here's a way to obtain the desired intervals:
from collections import namedtuple
from datetime import datetime, timedelta
import itertools as it

# Built-in as `it.pairwise` in Python 3.10+
def pairwise(iterable):
    it = iter(iterable)
    a = next(it, None)
    for b in it:
        yield (a, b)
        a = b

def beginning_of_week(d: datetime) -> datetime:
    ''' Returns the datetime object for the beginning of the week the provided day is in. '''
    return (d - timedelta(days=d.weekday())).replace(hour=0, minute=0, second=0, microsecond=0)

Partition = namedtuple('Partition', ('start', 'stop', 'label'))  # output format

def _partition_shift_within_week(start: int, stop: int, partitions):
    ''' Splits the shift (defined by `start` and `stop`) into partitions within one week. '''
    # Get partitions as ranges of absolute offsets from the beginning of the week in seconds
    labels = [x for _, x in partitions]
    absolute_offsets = it.accumulate(int(x.total_seconds()) for x, _ in partitions)
    ranges = [range(x, y) for x, y in pairwise((0, *absolute_offsets))]
    first_part_idx = [start in x for x in ranges].index(True)
    last_part_idx = [stop in x for x in ranges].index(True)
    # pair each covered range with its own label
    for r, label in zip(ranges[first_part_idx:last_part_idx + 1], labels[first_part_idx:last_part_idx + 1]):
        yield Partition(
            timedelta(seconds=max(r.start, start)),  # start of subinterval
            timedelta(seconds=min(r.stop, stop)),    # end of the subinterval
            label
        )

def _partition_shift_unjoined(shift, partitions):
    ''' Partitions a shift across weeks with partitions unjoined at the week edges. '''
    start_monday = beginning_of_week(shift[0])
    stop_monday = beginning_of_week(shift[1])
    seconds_offsets = (
        int((shift[0] - start_monday).total_seconds()),
        *[604800] * ((stop_monday - start_monday).days // 7),
        int((shift[1] - stop_monday).total_seconds()),
    )
    for x, y in pairwise(seconds_offsets):
        num_weeks, x = divmod(x, 604800)
        for part in _partition_shift_within_week(x, y - (y == 604800), partitions):
            weeks_offset = timedelta(weeks=num_weeks)
            yield Partition(
                start_monday + weeks_offset + part.start,
                start_monday + weeks_offset + part.stop,
                part.label
            )

def partition_shift(shift, partitions):
    ''' Partitions a shift across weeks. '''
    results = []
    for part in _partition_shift_unjoined(shift, partitions):
        if len(results) and results[-1].label == part.label:
            results[-1] = Partition(results[-1].start, part.stop, part.label)
        else:
            results.append(part)
    return results
Usage example:
shift = (datetime(2019, 10, 21, 18, 30), datetime(2019, 10, 22, 8, 0))

# Partitions are stored as successive offsets from the beginning of the week
partitions = (
    (timedelta(hours=7), 'B'),   # Monday morning (midnight to 07:00)
    (timedelta(hours=12), 'A'),
    (timedelta(hours=12), 'B'),  # Monday night & Tuesday morning (til 07:00)
    (timedelta(hours=12), 'A'),
    (timedelta(hours=12), 'B'),  # Tuesday night & Wednesday morning (til 07:00)
    (timedelta(hours=12), 'A'),
    (timedelta(hours=12), 'B'),  # Wednesday night & Thursday morning (til 07:00)
    (timedelta(hours=12), 'A'),
    (timedelta(hours=12), 'B'),  # Thursday night & Friday morning (til 07:00)
    (timedelta(hours=12), 'A'),
    (timedelta(hours=53), 'B'),  # Friday night & the weekend
)

for start, end, label in partition_shift(shift, partitions):
    print(f"'{start}' - '{end}', label: {label}")
Output:
'2019-10-21 18:30:00' - '2019-10-21 19:00:00', label: A
'2019-10-21 19:00:00' - '2019-10-22 07:00:00', label: B
'2019-10-22 07:00:00' - '2019-10-22 08:00:00', label: A
This approach assumes that the partitions are given as successive offsets from the beginning of the week. The question did not specify how the partitions would be provided, so I chose this format. It's nice because it guarantees they do not overlap, and it uses time deltas instead of being tied to some particular date.
Converting other ways of specifying partitions into this one, or adapting this answer to work with other ways of specifying partitions, is left as an exercise for the reader.
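For instance, the question's Monday-to-Friday 07:00 - 19:00 rule can be turned into that successive-offsets format with a small helper. This is a hypothetical convenience, not part of the answer above; it reproduces the partitions tuple from the usage example:
def weekday_partitions(work_start=7, work_end=19, workdays=range(5),
                       work_label='A', default_label='B'):
    # hour marks (since Monday 00:00) at which the label changes, covering the whole week
    boundaries = [(0, default_label)]
    for day in workdays:
        boundaries.append((day * 24 + work_start, work_label))
        boundaries.append((day * 24 + work_end, default_label))
    boundaries.append((7 * 24, None))  # end-of-week sentinel
    # turn consecutive boundaries into (duration, label) pairs
    return tuple(
        (timedelta(hours=h1 - h0), label)
        for (h0, label), (h1, _) in zip(boundaries, boundaries[1:])
        if h1 > h0
    )

assert weekday_partitions() == partitions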
Here's another usage example, using the same partitions as before, but a shift that starts in the previous week, thereby demonstrating that this approach works even when the shift spans multiple weeks.
shift = (datetime(2019, 10, 19, 18, 30), datetime(2019, 10, 22, 8, 0))

for start, end, label in partition_shift(shift, partitions):
    print(f"'{start}' - '{end}', label: {label}")
Output:
'2019-10-19 18:30:00' - '2019-10-21 07:00:00', label: B
'2019-10-21 07:00:00' - '2019-10-21 19:00:00', label: A
'2019-10-21 19:00:00' - '2019-10-22 07:00:00', label: B
'2019-10-22 07:00:00' - '2019-10-22 08:00:00', label: A
I'd approach this by building the generic data structure, and then mapping week-minutes on top of it.
The generic structure looks like this:
class OrderedRangeMap:
    """ ranges must be contiguous ; 0..limit """
    def __init__(self, limit, default_value=""):
        self.ranges = [(0, default_value), (limit, None)]

    def find(self, key):
        # could do bsearch
        # what if value < self.ranges[0]?
        kv = self.ranges[0]
        if key < kv[0]:
            return None, 0, False
        # what if value = self.ranges[0]?
        # what if value == vl[0]?
        for i, kv in enumerate(self.ranges):
            k = kv[0]
            if key < k:
                return kvp, i-1, False
            if key == k:
                return kv, i, True
            kvp = kv
        # off the end
        return None, len(self.ranges)-1, False

    def add(self, skey, ekey, value):
        newblock = (skey, value)
        oldblock, si, sx = self.find(skey)
        endblock, ei, ex = self.find(ekey)
        if sx:  # if start match, replace the oldblock
            self.ranges[si] = newblock
        else:  # else insert after the oldblock
            # bump
            si += 1
            ei += 1
            self.ranges.insert(si, newblock)
        if si == ei:
            # insert the split block after that
            self.ranges.insert(si+1, (ekey, oldblock[1]))
        else:
            # different blocks
            # end block starts at new end point
            self.ranges[ei] = (ekey, endblock[1])
            # delete any in between
            del self.ranges[si+1:ei]
        # is that it?

    def __getitem__(self, key):
        block, index, match = self.find(key)
        if index >= len(self.ranges) - 1:
            return block[0], block[0], block[1]
        return block[0], self.ranges[index+1][0], block[1]

def test_orm():
    orm = OrderedRangeMap(100, "B")
    assert orm.ranges == [(0,"B"),(100,None)]
    # s/e in same block
    orm.add(10, 20, "A")
    assert orm.ranges == [(0,"B"),(10,"A"),(20,"B"),(100,None)]
    # s/e in same blocks, matches
    orm.add(10, 13, "a")
    assert orm.ranges == [(0,"B"),(10,"a"),(13,"A"),(20,"B"),(100,None)]
    # more blocks
    orm.add(30, 50, "c")
    assert orm.ranges == [(0,"B"),(10,"a"),(13,"A"),(20,"B"),(30,"c"),(50,"B"),(100,None)]
    # s/e in different blocks, no matches
    orm.add(15, 33, "d")
    assert orm.ranges == [(0,"B"),(10,"a"),(13,"A"),(15,"d"),(33,"c"),(50,"B"),(100,None)]
    # s/e in different blocks, s matches
    orm.add(15, 44, "e")
    assert orm.ranges == [(0,"B"),(10,"a"),(13,"A"),(15,"e"),(44,"c"),(50,"B"),(100,None)]
    # s/e in different blocks, s & e matches
    orm.add(13, 50, "f")
    assert orm.ranges == [(0,"B"),(10,"a"),(13,"f"),(50,"B"),(100,None)]
    # NOT tested: add outside of original range

test_orm()
(Edited to add:)
The upper layer converts from datetime to week minutes
import datetime

class WeekShiftLabels:
    # this is assuming Monday=0
    week_minutes = 7*24*60

    def __init__(self, default_label="?"):
        self.orm = OrderedRangeMap(self.week_minutes, default_label)

    def add(self, dow, starttime, endtime, label):
        dm = dow * 24*60
        st = dm + t2m(starttime)
        et = dm + t2m(endtime)
        self.orm.add(st, et, label)

    def __getitem__(self, dt):
        wm = dt2wm(dt)
        block, index, match = self.orm.find(wm)
        if index >= len(self.orm.ranges) - 1:
            return None
        return block[1]

    class WSLI:
        # This doesn't handle modulo week_minutes
        def __init__(self, wsl, sdt, edt):
            self.wsl = wsl
            self.base = sdt - datetime.timedelta(days=sdt.weekday())
            t = sdt.time()
            self.base -= datetime.timedelta(hours=t.hour, minutes=t.minute)
            self.i = dt2wm(sdt)
            self.em = dt2wm(edt)

        def __next__(self):
            if self.i < 0:
                raise StopIteration
            block, index, match = self.wsl.orm.find(self.i)
            if not block:
                raise StopIteration  # or something else
            start = wm2dt(self.base, self.i)
            end = self.wsl.orm.ranges[index+1][0]
            if end >= self.em:
                end = self.em
                self.i = -1
            else:
                self.i = end
            end = wm2dt(self.base, end)
            return start, end, block[1]

        def __iter__(self):
            return self

    def __call__(self, sdt, edt):
        return self.WSLI(self, sdt, edt)

def dt2wm(dt):
    t = dt.time()
    return dt.weekday() * 24*60 + t.hour*60 + t.minute

def wm2dt(base, wm):
    return base + datetime.timedelta(minutes=wm)

def t2m(t):
    return t.hour*60 + t.minute

def test_wsl():
    wsl = WeekShiftLabels("B")
    st = datetime.time(hour=7)
    et = datetime.time(hour=19)
    for dow in range(0, 6):
        wsl.add(dow, st, et, "A")
    r = list(wsl(datetime.datetime(2019, 10, 21, 18, 30), datetime.datetime(2019, 10, 22, 8, 0)))
    assert len(r) == 3
    assert r[0] == (datetime.datetime(2019, 10, 21, 18, 30), datetime.datetime(2019, 10, 21, 19, 0), 'A')
    assert r[1] == (datetime.datetime(2019, 10, 21, 19, 0), datetime.datetime(2019, 10, 22, 7, 0), 'B')
    assert r[2] == (datetime.datetime(2019, 10, 22, 7, 0), datetime.datetime(2019, 10, 22, 8, 0), 'A')

test_wsl()
You have not defined what happens if one of your shift limits (start or end) lies inside one part of your week and the other limit lies outside. For example, what happens if you have
2019-10-21 18:30 - 2019-10-21 19:00
Is it A or B? You could either make a rule that if any part of it is in "B" the label is B, or just test the start or the end, or take the average, etc. So I will show how to check whether one specific datetime lies inside the intervals. I don't know of a library that automates this task any more than the datetime library does.
datetime library
import datetime

now = datetime.datetime.now()
hour = now.hour
# day of the week as int, where Mon 0 and Sun 6
day = now.weekday()

if day < 5 and hour >= 7 and hour < 19:
    label = "A"
else:
    label = "B"
print(label)
You could also check whether hour or day lies in a range or list. For example, if you want to treat the hour between 12 and 13 o'clock as a break:
if hour in range(7, 12) or hour in range(13, 19):
    # do something
docs for more info: https://docs.python.org/3.8/library/datetime.html
People also often recommend the Pendulum library, but scrolling through its docs I can't see any method that makes your task easier than the above code. Of course you could do something like this (but it doesn't seem easier to me; this code is not tested):
alternate solution using pendulum
import pendulum

now = pendulum.now()
daystart = now.start_of('day')
weekstart = now.start_of('week')

if now < weekstart.add(days=5) and now > daystart.add(hours=7) and now < daystart.add(hours=19):
    label = "A"
else:
    label = "B"
pendulum docs: https://pendulum.eustace.io/docs
It should be noted, though, that both solutions can be implemented (with some adjustments) in either library (pendulum or datetime), and probably in many others I haven't mentioned as well.
Bonus
Since you asked for a way to handle such things more generally, here is one last thing: how you could take the first solution and make it a bit more generic:
import datetime

gethour = lambda dt: dt.hour
getday = lambda dt: dt.weekday()

timeframes = {
    "A": {
        getday: range(0, 5),                      # Monday to Friday
        gethour: [*range(7, 12), *range(13, 19)]  # working hours, minus the lunch break
    },
    "break": {
        getday: range(0, 5),
        gethour: [12]
    }
}
default = "B"

now = datetime.datetime.now()
for tag, timeframe in timeframes.items():
    label = tag
    for getter, limit in timeframe.items():
        if not getter(now) in limit:
            label = default
            break
    if label != default:
        break
print(label)
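As a quick check, the same matching logic can be wrapped in a function and fed fixed datetimes instead of now (a small sketch using the question's example week):
def label_for(dt):
    # same matching loop as above, reusable for arbitrary datetimes
    for tag, timeframe in timeframes.items():
        if all(getter(dt) in limit for getter, limit in timeframe.items()):
            return tag
    return default

print(label_for(datetime.datetime(2019, 10, 21, 18, 30)))  # A      (Monday, during working hours)
print(label_for(datetime.datetime(2019, 10, 21, 19, 30)))  # B      (Monday, after 19:00)
print(label_for(datetime.datetime(2019, 10, 21, 12, 30)))  # break  (Monday, lunch hour)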
