Updated to remove extraneous text and ambiguity.
The Rules:
An employee accrues 8 hours of Paid Time Off on the day after each quarter ends. The quarters are:
Jan 1 - Mar 31
Apr 1 - Jun 30
Jul 1 - Sep 30
Oct 1 - Dec 31
The Problem
Using python, I need to define the guts of the following function:
def acrued_hours_between(start_date, end_date):
    # stuff
    return integer
I'm currently using Python, and wondering what the correct approach to something like this would be.
I'm assuming that using DateTime objects, and possibly the dateutil module, would help here, but my brain isn't wrapping around this problem for some reason.
Update
I imagine the calculation being somewhat simple, as the problem is:
"How many hours of Paid Time Off are accrued from start_date to end_date?" given the above "rules".
The OP's edit mentions the real underlying problem is:
"How many hours of Paid Time Off are accrued from X-date to Y-date?"
I agree, and I'd compute that in the most direct and straightforward way, e.g.:
import datetime
import itertools

accrual_months_days = (1, 1), (4, 1), (7, 1), (10, 1)

def accruals(begin_date, end_date, hours_per=8):
    """Vacation accrued between begin_date and end_date included."""
    cur_year = begin_date.year - 1
    result = 0
    # walk the quarter start dates in order, year after year
    for m, d in itertools.cycle(accrual_months_days):
        if m == 1: cur_year += 1
        d = datetime.date(cur_year, m, d)
        if d < begin_date: continue
        if d > end_date: return result
        result += hours_per

if __name__ == '__main__':  # examples
    print(accruals(datetime.date(2010, 1, 12), datetime.date(2010, 9, 20)))
    print(accruals(datetime.date(2010, 4, 20), datetime.date(2012, 12, 21)))
    print(accruals(datetime.date(2010, 12, 21), datetime.date(2012, 4, 20)))
A direct formula would of course be faster, but could be tricky to do it without bugs -- if nothing else, this "correct by inspection" example can serve to calibrate the faster one automatically, by checking that they agree over a large sample of date pairs (be sure to include in the latter all corner cases such as first and last days of quarters of course).
I would sort all the events for a particular employee in time order and simulate the events in that order, checking that the available paid time off never falls below zero. A paid time off request is an event with a value -(number of hours). Jan 1st has an event with value +8 hours.
Every time a modification is made to the data, run the simulation again from the start.
The advantage of this method is that it will detect situations in which a new event is valid at that time but causes the number of free days to drop such that a later event which previously was valid now becomes invalid.
This could be optimized by storing intermediate results in a cache but since you will likely only have a few hundred events per employee this optimization probably won't be necessary.
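The replay described above can be sketched as follows. This is a minimal illustration, not the answerer's code: the function name `first_overdraw` and the `(date, hours)` tuple layout are assumptions made for the example.

```python
from datetime import date

def first_overdraw(events, opening_balance=0):
    """Replay (date, hours) events in date order.

    Returns the first event that drives the balance below zero,
    or None if the whole history is valid.
    """
    balance = opening_balance
    # Sort by date; on the same day apply accruals (+) before requests (-).
    for when, hours in sorted(events, key=lambda e: (e[0], -e[1])):
        balance += hours
        if balance < 0:
            return (when, hours)
    return None

events = [
    (date(2010, 1, 1), +8),    # quarterly accrual
    (date(2010, 4, 1), +8),    # quarterly accrual
    (date(2010, 5, 10), -16),  # request that was valid when entered
]
print(first_overdraw(events))  # None

# A request entered later but dated earlier invalidates the May request.
print(first_overdraw(events + [(date(2010, 2, 1), -8)]))
```

Because the whole history is replayed on every change, the second call reports the May request as the first one that can no longer be honoured.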
This can be done with plain old integer math:
from datetime import date

def hours_accrued(start, end):
    '''hours_accrued(date, date) -> int

    Answers the question "How many hours of Paid Time Off
    are accrued from X-date to Y-date?"

    >>> hours_accrued(date(2010, 4, 20), date(2012, 12, 21))
    80
    >>> hours_accrued(date(2010, 12, 21), date(2012, 4, 20))
    48
    '''
    # // keeps the quarter index an integer on Python 3 as well
    return ( 4*(end.year - start.year)
        + ((end.month-1)//3 - (start.month-1)//3) ) * 8
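One way to gain confidence in a closed-form version is to compare it with a day-by-day count over date pairs that sit on quarter edges. The sketch below does that. The helper names and the sample window (2010 and 2011) are assumptions for illustration. Note that the formula counts quarter boundaries crossed, so an accrual falling exactly on the start date is not counted, and the brute-force reference is written with the same convention.

```python
from datetime import date, timedelta

def hours_formula(start, end, hours_per=8):
    """Closed form: quarter boundaries crossed between start and end."""
    quarters = (4 * (end.year - start.year)
                + (end.month - 1) // 3 - (start.month - 1) // 3)
    return quarters * hours_per

def hours_brute_force(start, end, hours_per=8):
    """Count accrual days (first day of a quarter) with start < day <= end."""
    total = 0
    day = start + timedelta(days=1)
    while day <= end:
        if day.day == 1 and day.month in (1, 4, 7, 10):
            total += hours_per
        day += timedelta(days=1)
    return total

# Sample the day before, the day of, and the day after each quarter start.
samples = []
for year in (2010, 2011):
    for month in (1, 4, 7, 10):
        edge = date(year, month, 1)
        samples += [edge - timedelta(days=1), edge, edge + timedelta(days=1)]

mismatches = [(s, e) for s in samples for e in samples
              if s <= e and hours_formula(s, e) != hours_brute_force(s, e)]
print(len(mismatches))  # 0
```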
I would count all free days before the date in question, then subtract the number of used days before then in order to come to a value for the maximum number of allowable days.
Set up a tuple for each date range (we'll call them quarters). In the tuple store the quarter (as a cardinal index, or as a begin date), the maximum accrued hours for a quarter, and the number of used hours in a quarter. You'll want to have a set of tuples that are sorted for this to work, so a plain list probably isn't your best option. A dictionary might be a better way to approach this, with the quarter as the key and the max/used entries returned in the tuple, as it can be "sorted".
(Note: I looked at the original explanation and rewrote my answer)
Get a copy of the set of all quarters for a given employee, sorted by the quarter's date. Iterate over each quarter summing the difference between the maximum per-quarter allotment of vacation time and the time "spent" on that quarter until you reach the quarter that the request date falls into. This gives accumulated time.
If accumulated time plus the time allotted for the requested quarter is not as much as the requested hours, fail immediately and reject the request. Otherwise, continue iterating up to the quarter of your request.
If there is sufficient accumulated time, continue iterating over the copied set, computing the new available times on a per-quarter basis, starting with the left-over time from your initial calculation.
If any quarter has a computed time falling below zero, fail immediately and reject the request. Otherwise, continue until you run out of quarters.
If all quarters are computed, update the original set of data with the copy and grant the request.
Related
I've seen about 30 similar posts to this, but nothing really doing exactly what I'm looking for, and some just don't work.
I'm trying to return a list of N business dates, to then iterate through a dictionary and pull data out according to the corresponding dates.
Assuming the current date is:
refreshed = str(data['Meta Data']['3. Last Refreshed'])
For completeness, the value of the above right now is: 2020-1-30
I want to be able to calculate n days prior to this date.
I don't really want to import a bunch of funky modules, and have tried a function using a loop and datetime.date.isoweekday() - but I always come across an issue when passing refreshed in.
One of the main issues I'm seeing with some of the examples elsewhere is that they calculate the dates from datetime.date.today(). Seemingly it's fine to pass that to isoweekday(), but I can't pass refreshed to isoweekday() to calculate its 0-6 reference. I've tried using strftime() to reformat the date into a suitable format for isoweekday(), but to no avail.
Subtracting days from a date
You can subtract 30 days from a datetime.datetime object by subtracting a datetime.timedelta object:
>>> import datetime
>>> datetime.datetime.today()
datetime.datetime(2020, 10, 31, 10, 20, 0, 704133)
>>> datetime.datetime.today() - datetime.timedelta(30)
datetime.datetime(2020, 10, 1, 10, 19, 49, 680385)
>>> datetime.datetime.strptime('2020-01-30', '%Y-%m-%d') - datetime.timedelta(30)
datetime.datetime(2019, 12, 31, 0, 0)
Skipping week-ends by subtracting 7 days instead of 5
Starting from date d, you want to subtract N=30 non-week-end days. A general way could be:
Figure out which day of the week is d;
Figure out how many week-ends there are between d and d-N;
Remove the appropriate number of days.
However, you want to subtract 30 days, and 30 is a multiple of 5. This makes things particularly easy: when you go back 5 business days from a weekday, you are guaranteed to cross exactly one week-end. So you can immediately remove 7 days instead of 5. (This assumes d itself is not a Saturday or Sunday.)
Removing 30 days is the same as removing 6 times 5 days. So you can remove 6 times 7 days instead, which is achieved by subtracting datetime.timedelta(42) from your date.
Note: this accounts for week-ends, but not for special holidays.
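A minimal sketch of that shortcut, assuming the start date is a weekday and N is a multiple of 5. The function name is made up for this example, and holidays are ignored.

```python
from datetime import date, timedelta

def minus_business_days_multiple_of_5(d, n):
    """Go back n business days from weekday d, where n is a multiple of 5."""
    if d.weekday() >= 5:
        raise ValueError("d must fall on Monday..Friday")
    if n % 5 != 0:
        raise ValueError("n must be a multiple of 5")
    # every 5 business days span exactly 7 calendar days
    return d - timedelta(days=7 * (n // 5))

print(minus_business_days_multiple_of_5(date(2020, 1, 30), 30))  # 2019-12-19
```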
Skipping week-ends iteratively
You can test for days of the week using .weekday(). This is already answered on this other question: Loop through dates except for week-ends
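The iterative approach might look like the sketch below. It uses only the standard library and parses the string first, since weekday() is a method of date objects and not of strings. The function name `previous_business_dates` is an assumption, and holidays are not handled.

```python
from datetime import datetime, timedelta

def previous_business_dates(date_string, n):
    """Return the n business dates (Mon-Fri) before date_string, newest first."""
    day = datetime.strptime(date_string, '%Y-%m-%d').date()
    found = []
    while len(found) < n:
        day -= timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            found.append(day)
    return found

dates = previous_business_dates('2020-1-30', 3)
print([d.isoformat() for d in dates])  # ['2020-01-29', '2020-01-28', '2020-01-27']
```

The resulting date objects can be turned back into dictionary keys with strftime() or isoformat(), whichever matches the key format of the data.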
You can add N days using a timedelta:
data['Meta Data']['3. Last Refreshed'] = pd.to_datetime(data['Meta Data']['3. Last Refreshed']) + pd.to_timedelta(4, unit="D")
Replace 4 with your n days.
I would like to add months to a date in pandas. The results may go beyond the year 2262. There is a solution for a relatively small number of months:
import numpy as np
import pandas as pd
pd.Timestamp('2018-01-22 00:00:00') + np.timedelta64(12, 'M')
which results in
Timestamp('2019-01-22 05:49:12')
However, when I add a larger number (so that the result goes beyond the year 2262):
pd.Timestamp('2018-01-22 00:00:00') + np.timedelta64(3650, 'M')
Python does return a result:
Timestamp('1737-09-02 14:40:26.290448384')
How to cope with this?
datetime
Pandas.Timestamp aims to handle much finer time resolution down to the nanosecond. This precision takes up enough of the 64 bits allocated to it that it can only go up to the year 2262. However, datetime.datetime does not have this limitation and can go up to year 9999. If you start working with datetime objects instead of Timestamp objects, you'll lose some functionality but you will be able to go beyond 2262.
Your number of months also went beyond the maximum number of days for a Timedelta.
Let's begin by picking a more reasonably sized number of months that is a nice multiple of 48 (four years).
d = pd.Timedelta(48, 'M')
And our date
t = pd.Timestamp('2018-01-22')
A multiple that represents how many times our 48 months fits inside the desired 3650 months.
m = 3650 / 48
Then we can use to_pydatetime and to_pytimedelta
t.to_pydatetime() + d.to_pytimedelta() * m
datetime.datetime(2322, 3, 24, 14, 15, 0, 1)
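If calendar months are wanted instead of average month lengths, the month arithmetic can be done directly on a datetime object. The following is a sketch of that alternative, not the method used above. The helper `add_months` is hypothetical, and it clamps the day when the target month is shorter.

```python
import calendar
from datetime import datetime

def add_months(dt, months):
    """Add calendar months to dt, clamping the day to the target month's length."""
    index = dt.year * 12 + (dt.month - 1) + months
    year, month = divmod(index, 12)
    month += 1  # back to 1-based month numbers
    last_day = calendar.monthrange(year, month)[1]
    return dt.replace(year=year, month=month, day=min(dt.day, last_day))

print(add_months(datetime(2018, 1, 22), 3650))  # 2322-03-22 00:00:00
```

This keeps the day of the month and the time of day unchanged, and it works for any result up to year 9999, the limit of datetime.datetime.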
I have a forex data set which contains the date, time, open, high, low and close over a 1 year period in 1 minute increments.
I'd like to calculate the percent return over a 6-hour interval. My current code:
from datetime import timedelta
from decimal import Decimal

def perc_return(time_interval, data_dict):
    p_returns = []
    for element in data_dict:
        next_element_time = element + timedelta(hours=time_interval)
        print(next_element_time)
        if next_element_time in data_dict:
            p_return = ((data_dict[next_element_time][1] -
                         data_dict[element][0])/data_dict[element][0])*100
            p_returns.append([element, next_element_time, p_return])
        else:
            break
    return p_returns

price_dict = {}
for sample in raw_samples:
    price_dict[sample[0]] = [Decimal(sample[1]), Decimal(sample[4])]
p_returns = perc_return(6, price_dict)
This code takes the raw data and converts it into a dictionary with the timestamp as the key and the open and close as the value, then uses that to calculate the 6-hour percent returns. The problem is that, bizarrely, adding 6 hours somehow changes the date by several months. Furthermore, this doesn't work if the next date is not in the dataset (for example, if the date is a holiday or the market is closed), so it does not traverse the entire dataset.
Sample output:
2007-10-24 03:14:00
2007-03-16 11:19:00
2007-07-25 14:43:00
[[datetime.datetime(2007, 10, 23, 21, 14), datetime.datetime(2007, 10, 24, 3, 14), Decimal('-0.3296857463524130190796857464')], [datetime.datetime(2007, 3, 16, 5, 19), datetime.datetime(2007, 3, 16, 11, 19), Decimal('-0.03752626838787151005703992795')]]
My raw dataset:
2007.01.01,20:00,1.323700,1.323700,1.323600,1.323700,0
2007.01.01,20:01,1.323600,1.323700,1.323600,1.323600,0
2007.01.01,20:02,1.323700,1.323800,1.323600,1.323600,0
2007.01.01,20:03,1.323700,1.323700,1.323600,1.323600,0
2007.01.01,20:04,1.323500,1.323600,1.323500,1.323500,0
2007.01.01,20:05,1.323600,1.323700,1.323500,1.323700,0
...
Are there any other libraries available to python that could do this without these errors?
EDIT Some more information that was requested:
Regarding the break and what I want to do if the 6-hour interval does not exist: my original (naive) thought process was that if this happens, I must have reached the end of the dataset (hence the break). Ideally, however, I'd like to retrieve the datapoint nearest to the 6-hour interval.
element is simply a datetime object containing the timestamp. It looks like this:
2007-10-24 03:14:00
Finally, regarding the error: if you look at the output, the first three lines show next_element_time, that is, the time after adding 6 hours. Notice that they are not in any recognizable order and skip entire months. I now realize this is because dictionaries in Python are unordered when they are built and accessed. This issue can probably be fixed by using sets or another, ordered, data structure. However, I still need help dividing the dataset into 6-hour intervals to take the percent return.
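One possible way to handle both points, ordering and the nearest datapoint, is to sort the timestamps once and binary-search for the target time. The sketch below assumes the same layout as price_dict (timestamp mapped to an open/close pair). The function names are illustrative, and picking the closest timestamp on either side of the target is an assumption about what "nearest" should mean.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from decimal import Decimal

def nearest_time(sorted_times, target):
    """Return the element of sorted_times closest to target."""
    i = bisect_left(sorted_times, target)
    candidates = sorted_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t - target))

def perc_returns(prices, hours):
    """prices maps datetime -> (open, close); returns (start, end, percent) tuples."""
    times = sorted(prices)
    results = []
    for start in times:
        target = start + timedelta(hours=hours)
        if target > times[-1]:
            break  # no data that far ahead: the end of the dataset is reached
        end = nearest_time(times, target)
        open_price = prices[start][0]
        close_price = prices[end][1]
        results.append((start, end, (close_price - open_price) / open_price * 100))
    return results

prices = {
    datetime(2007, 1, 1, 20, 0): (Decimal('1.0'), Decimal('1.0')),
    datetime(2007, 1, 2, 1, 59): (Decimal('1.1'), Decimal('1.1')),
    datetime(2007, 1, 2, 2, 5): (Decimal('1.2'), Decimal('1.5')),
}
for row in perc_returns(prices, 6):
    print(row)
```

With large gaps in the data the nearest point can be far from the target, so a maximum tolerated distance may be worth adding.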
I want to build a simulation model of a production network with SimPy comprising the following features with regard to time:
Plants work from Monday to Friday (with two shifts of 8 hours)
Heavy trucks drive on all days of the week except Sunday
Light trucks drive on all days of the week, including Sunday
To this purpose, I want to construct a BroadcastPipe as given in the docs combined with timeouts to make the objects wait during days they are not working (for the plants additional logic is required to model shifts). This BroadcastPipe would just count the days (assuming 24*60 minutes for each day) and then say "It's Monday, everybody". The objects (plant, light and heavy trucks) would then process this information individually and act accordingly.
Now, I wonder whether there is an elegant method to link simulation time to regular Python calendar objects in order to easily access days of the week. This would be useful for clarity and enhancements like bank holidays and varying starting days. Do you have any advice on how to do this (or general advice on how to model this better)? Thanks in advance!
I usually set a start date and define it to be equal to the simulation time (Environment.now) 0. Since SimPy's simulation time has no inherent unit, I also define that it is in seconds. Using arrow, I can then easily calculate an actual date and time from the current simulation time:
import arrow
import simpy

start = arrow.get('2015-01-01T00:00:00')
env = simpy.Environment()
# do some simulation ...
# shift() adds a relative offset (older arrow releases used replace() for this)
current_date = start.shift(seconds=env.now)
print('Current weekday:', current_date.weekday())
You might use the datetime module and create a day_of_week object, though you would still need to calculate the elapsed time:
import datetime

# yyyy = four digit year integer
# mm = 1- or 2-digit month integer
# dd = 1- or 2-digit day integer
day_of_week = datetime.datetime(yyyy, mm, dd).strftime('%a')
if day_of_week == 'Mon':
    # Do Monday tasks...
    pass
elif day_of_week == 'Tue':
    # Tuesday...
    pass
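The elapsed-time calculation can be done with timedelta alone. The sketch below assumes simulation time is counted in minutes from a chosen start date, as in the question. The names `START`, `sim_to_datetime` and `is_working_day` are made up for illustration, and in a SimPy model the argument would be env.now.

```python
from datetime import datetime, timedelta

START = datetime(2015, 1, 1)  # calendar moment assigned to simulation time 0

def sim_to_datetime(sim_minutes, start=START):
    """Convert simulation time, counted in minutes, to a datetime."""
    return start + timedelta(minutes=sim_minutes)

def is_working_day(sim_minutes, days_off=(5, 6)):
    """True unless the weekday (Monday=0 .. Sunday=6) is in days_off."""
    return sim_to_datetime(sim_minutes).weekday() not in days_off

DAY = 24 * 60
# 2015-01-01 is a Thursday, so four days later is a Monday.
print(sim_to_datetime(4 * DAY).weekday())      # 0
print(is_working_day(2 * DAY))                 # plants on Saturday: False
print(is_working_day(2 * DAY, days_off=(6,)))  # heavy trucks on Saturday: True
print(is_working_day(3 * DAY, days_off=()))    # light trucks on Sunday: True
```

Bank holidays could be handled the same way by checking the converted date against a set of dates.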
Given a date range how to calculate the number of weekends partially or wholly within that range?
(A few definitions as requested:
take 'weekend' to mean Saturday and Sunday.
The date range is inclusive i.e. the end date is part of the range
'wholly or partially' means that any part of the weekend falling within the date range means the whole weekend is counted.)
To simplify I imagine you only actually need to know the duration and what day of the week the initial day is...
I darn well know it's going to involve doing integer division by 7 and some logic to add 1 depending on the remainder, but I can't quite work out what...
Extra points for answers in Python ;-)
Edit
Here's my final code.
Weekends are Friday and Saturday (as we are counting nights stayed) and days are 0-indexed starting from Monday. I used onebyone's algorithm and Tom's code layout. Thanks a lot folks.
def calc_weekends(start_day, duration):
    days_until_weekend = [5, 4, 3, 2, 1, 1, 6]
    adjusted_duration = duration - days_until_weekend[start_day]
    if adjusted_duration < 0:
        weekends = 0
    else:
        weekends = (adjusted_duration//7)+1
    if start_day == 5 and duration % 7 == 0: #Saturday to Saturday is an exception
        weekends += 1
    return weekends

if __name__ == "__main__":
    days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    for start_day in range(0,7):
        for duration in range(1,16):
            print("%s to %s (%s days): %s weekends" % (days[start_day], days[(start_day+duration) % 7], duration, calc_weekends(start_day, duration)))
        print()
General approach for this kind of thing:
For each day of the week, figure out how many days are required before a period starting on that day "contains a weekend". For instance, if "contains a weekend" means "contains both the Saturday and the Sunday", then we have the following table:
Sunday: 8
Monday: 7
Tuesday: 6
Wednesday: 5
Thursday: 4
Friday: 3
Saturday: 2
For "partially or wholly", we have:
Sunday: 1
Monday: 6
Tuesday: 5
Wednesday: 4
Thursday: 3
Friday: 2
Saturday: 1
Obviously this doesn't have to be coded as a table, now that it's obvious what it looks like.
Then, given the day-of-week of the start of your period, subtract[*] the magic value from the length of the period in days (probably end-start+1, to include both fenceposts). If the result is less than 0, it contains 0 weekends. If it is equal to or greater than 0, then it contains (at least) 1 weekend.
Then you have to deal with the remaining days. In the first case this is easy, one extra weekend per full 7 days. This is also true in the second case for every starting day except Sunday, which only requires 6 more days to include another weekend. So in the second case for periods starting on Sunday you could count 1 weekend at the start of the period, then subtract 1 from the length and recalculate from Monday.
More generally, what's happening here for "whole or part" weekends is that we're checking to see whether we start midway through the interesting bit (the "weekend"). If so, we can either:
1) Count one, move the start date to the end of the interesting bit, and recalculate.
2) Move the start date back to the beginning of the interesting bit, and recalculate.
In the case of weekends, there's only one special case which starts midway, so (1) looks good. But if you were getting the date as a date+time in seconds rather than day, or if you were interested in 5-day working weeks rather than 2-day weekends, then (2) might be simpler to understand.
[*] Unless you're using unsigned types, of course.
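The "partially or wholly" case with option (1) for the Sunday start can be sketched as below. This is one reading of the approach, taking the weekend to be Saturday and Sunday and the range to be inclusive. The names `DAYS_UNTIL_WEEKEND` and `weekends_touched` are assumptions for the example.

```python
from datetime import date

# Days needed before a period starting Mon..Sat (weekday 0..5) touches a weekend.
DAYS_UNTIL_WEEKEND = [6, 5, 4, 3, 2, 1]

def weekends_touched(start, end):
    """Count Sat/Sun weekends wholly or partly inside [start, end], inclusive."""
    length = (end - start).days + 1
    if length <= 0:
        return 0
    count = 0
    weekday = start.weekday()  # Monday=0 .. Sunday=6
    if weekday == 6:
        # Starting on Sunday: count that weekend, then restart from Monday.
        count, length, weekday = 1, length - 1, 0
    remaining = length - DAYS_UNTIL_WEEKEND[weekday]
    if remaining >= 0:
        count += 1 + remaining // 7  # one more weekend per further 7 days
    return count

print(weekends_touched(date(2020, 2, 3), date(2020, 2, 7)))  # Mon..Fri: 0
print(weekends_touched(date(2020, 2, 1), date(2020, 2, 2)))  # Sat..Sun: 1
print(weekends_touched(date(2020, 2, 2), date(2020, 2, 8)))  # Sun..Sat: 2
```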
My general approach for this sort of thing: don't start messing around trying to reimplement your own date logic. It's hard, i.e. you'll screw it up for the edge cases and look bad. Hint: if you have mod 7 arithmetic anywhere in your program, or are treating dates as integers anywhere in your program: you fail. If I saw the "accepted solution" anywhere in (or even near) my codebase, someone would need to start over. It beggars the imagination that anyone who considers themselves a programmer would vote that answer up.
Instead, use the built-in date/time logic that comes with Python:
First, get a list of all of the days that you're interested in:
from datetime import date, timedelta

# date.weekday() numbers the days Monday=0 .. Sunday=6
FRI = 4; SAT = 5

# a couple of random test dates
now = date.today()
start_date = now - timedelta(57)
end_date = now - timedelta(13)
print(start_date, '...', end_date) # debug

days = [date.fromordinal(d) for d in
        range( start_date.toordinal(),
               end_date.toordinal()+1 )]
Next, filter down to just the days which are weekends. In your case you're interested in Friday and Saturday nights, which are 4 and 5 in weekday() numbering. (Notice how I'm not trying to roll this part into the previous list comprehension, since that'd be hard to verify as correct).
weekend_days = [d for d in days if d.weekday() in (FRI,SAT)]
for day in weekend_days: # debug
    print(day, day.weekday()) # debug
Finally, you want to figure out how many weekends are in your list. This is the tricky part, but there are really only four cases to consider, one for each end for either Friday or Saturday. Concrete examples help make it clearer, plus this is really the sort of thing you want documented in your code:
num_weekends = len(weekend_days) // 2

# if we start on Friday and end on Saturday we're ok,
# otherwise add one weekend
#
# F,S|F,S|F,S ==3 and 3we, +0
# F,S|F,S|F ==2 but 3we, +1
# S|F,S|F,S ==2 but 3we, +1
# S|F,S|F ==2 but 3we, +1

ends = (weekend_days[0].weekday(), weekend_days[-1].weekday())
if ends != (FRI, SAT):
    num_weekends += 1

print(num_weekends) # your answer
Shorter, clearer and easier to understand means that you can have more confidence in your code, and can get on with more interesting problems.
To count whole weekends, just adjust the number of days so that you start on a Monday, then divide by seven. (Note that if the start day is a weekday, add days to move to the previous Monday, and if it is on a weekend, subtract days to move to the next Monday since you already missed this weekend.)
days = {"Saturday":-2, "Sunday":-1, "Monday":0, "Tuesday":1, "Wednesday":2, "Thursday":3, "Friday":4}

def n_full_weekends(n_days, start_day):
    n_days += days[start_day]
    if n_days <= 0:
        n_weekends = 0
    else:
        n_weekends = n_days//7
    return n_weekends

if __name__ == "__main__":
    tests = [("Tuesday", 10, 1), ("Monday", 7, 1), ("Wednesday", 21, 3), ("Saturday", 1, 0), ("Friday", 1, 0),
             ("Friday", 3, 1), ("Wednesday", 3, 0), ("Sunday", 8, 1), ("Sunday", 21, 2)]
    for start_day, n_days, expected in tests:
        print(start_day, n_days, expected, n_full_weekends(n_days, start_day))
If you want to know partial weekends (or weeks), just look at the fractional part of the division by seven.
You would need external logic besides raw math. You need a calendar library (or, if you have a decent amount of time, implement it yourself) to define what a weekend is, what day of the week you start on and end on, etc.
Take a look at Python's calendar module.
Without a logical definition of days in your code, a purely mathematical method would fail on corner cases, like an interval of 1 day or, I believe, anything shorter than a full week (or shorter than 6 days if you allow partials).