How do I find the business time difference between two datetime objects in Python?
Business time can be 9-17
I want also to consider holidays. I have a function is_holiday(date) that takes a date object.
def business_time_between(datetime1, datetime2):
    return (business_time_after(datetime1)
            + business_time_per_day * days_between(datetime1, datetime2)
            + business_time_before(datetime2))
I'm sure you can figure out how to implement business_time_after, business_time_before, business_time_per_day, and days_between
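For concreteness, here is one way those helpers might be fleshed out with the standard datetime module. This is a sketch, not a definitive implementation: it assumes a 9-17 workday, Monday to Friday, and treats the asker's is_holiday(date) as given (stubbed out below).

```python
import datetime

BUSINESS_START = datetime.time(9, 0)
BUSINESS_END = datetime.time(17, 0)
BUSINESS_SECONDS_PER_DAY = 8 * 3600

def is_holiday(date):
    # stand-in for the asker's holiday checker
    return False

def is_business_day(date):
    return date.weekday() < 5 and not is_holiday(date)

def clamp(t):
    """Clamp a wall-clock time into [BUSINESS_START, BUSINESS_END]."""
    return max(BUSINESS_START, min(BUSINESS_END, t))

def business_time_after(dt):
    """Business seconds remaining in dt's day."""
    if not is_business_day(dt.date()):
        return 0
    start = datetime.datetime.combine(dt.date(), clamp(dt.time()))
    end = datetime.datetime.combine(dt.date(), BUSINESS_END)
    return int((end - start).total_seconds())

def business_time_before(dt):
    """Business seconds already elapsed in dt's day."""
    if not is_business_day(dt.date()):
        return 0
    start = datetime.datetime.combine(dt.date(), BUSINESS_START)
    end = datetime.datetime.combine(dt.date(), clamp(dt.time()))
    return int((end - start).total_seconds())

def business_days_between(d1, d2):
    """Whole business days strictly between dates d1 and d2."""
    day = d1 + datetime.timedelta(days=1)
    count = 0
    while day < d2:
        if is_business_day(day):
            count += 1
        day += datetime.timedelta(days=1)
    return count

def business_time_between(dt1, dt2):
    """Business seconds between dt1 and dt2 (dt1 <= dt2 assumed)."""
    if dt1.date() == dt2.date():
        return business_time_after(dt1) - business_time_after(dt2)
    return (business_time_after(dt1)
            + BUSINESS_SECONDS_PER_DAY * business_days_between(dt1.date(), dt2.date())
            + business_time_before(dt2))
```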
I did not really find a simple pythonic answer to this problem.
The package businesstime suggested by J.F. Sebastian does what I wanted, but I think that the algorithms can be optimized and simplified.
For example, if we change the model of the day we can use the standard python datetime operations.
Let me be clearer: if our work day is 9-17, we can say that our day is made of 8 hours rather than 24. So 09:00 -> 0:00 and 17:00 -> 8:00. Now what we have to do is create a function that converts real times into this new model, in this way:
if the time is earlier than our start time (09:00), we don't care about it and convert it to 0:00
if it is a business time, we subtract the start time: for example, 13:37 becomes 4:37
if the time is later than our stop time (17:00), we don't care about it and convert it to 8:00
After this conversion, all the calculations can be done easily with the Python datetime package, just as with a normal datetime.
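The three conversion rules above can be sketched as a small function (names and the 9-17 boundaries are taken from this answer; times outside business hours saturate at the day-model boundaries):

```python
import datetime

START = datetime.time(9, 0)   # business day start
END = datetime.time(17, 0)    # business day end

def to_day_model(t):
    """Map a wall-clock time onto the 8-hour business-day model:
    09:00 -> 0:00, 13:37 -> 4:37, anything before 09:00 -> 0:00,
    anything after 17:00 -> 8:00 (a full business day)."""
    if t <= START:
        return datetime.timedelta(0)
    if t >= END:
        return datetime.timedelta(hours=8)
    anchor = datetime.date.min
    return (datetime.datetime.combine(anchor, t)
            - datetime.datetime.combine(anchor, START))
```

With both endpoints mapped into this model, the business time between them is ordinary timedelta arithmetic.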
Thanks also to user2085282's answer, you can find this concept implemented in the repo django-business-time.
With the model conversion, I found it extremely easy to add another feature: the lunch break.
In fact, if you have a lunch break 12-13, you can map every time inside that interval to 12:00 in your new day model. But pay attention to also shift times after 13:00, to account for the hour of lunch break.
I am sorry that what I developed is just a Django application and not a plain Python package, but for the moment it works for me. And sorry also for my code, I am a newbie :)
PS: the full documentation is coming soon!
I'm trying to think through a sort of extra credit project: optimizing our schedule.
Givens:
“demand” numbers that go down to the 1/2 hour. These tell us the ideal number of people we’d have on at any given time;
8-hour shift, plus an hour lunch break scheduled more than 2 hours from both the start and the end of the shift (9 hours from start to finish);
Breaks: 2x 30 minute breaks in the middle of the shift;
For simplicity, can assume an employee would have the same schedule every day.
Desired result:
Dictionary or data frame with the best-case distribution of start times, breaks, lunches across an input number of employees such that the difference between staffed and demanded labor is minimized.
My Python is pretty basic, so my first guess was to generate all of the possible shift permutations (the points at which one could take breaks or lunches), have Python select x of them at random (x = number of employees available) many times over, and report which combination best allocates the labor. That seems cumbersome and silly, but my limitations are such that I can't see beyond such a solution.
I have tried to look for libraries or tools that help with this, but the question here (how to distribute start times and breaks within a shift) doesn't seem to be widely discussed. I'm open to hearing that this is several years off for me, but...
Appreciate anyone’s guidance!
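For what it's worth, the random-sampling idea described above can be sketched in plain Python. Everything below (the half-hour slot grid, the constraint windows, the absolute-difference score) is an illustrative assumption, not a known-good scheduler:

```python
import random

SLOTS = 48  # half-hour slots in a day

def shift_coverage(start, lunch, breaks):
    """Coverage vector for one employee: a 9-hour span from `start`,
    minus a 2-slot lunch and two 1-slot breaks (all in half-hour slots)."""
    cov = [0] * SLOTS
    for s in range(start, start + 18):  # 9 hours = 18 slots
        cov[s % SLOTS] = 1
    for s in (lunch, lunch + 1, breaks[0], breaks[1]):
        cov[s % SLOTS] = 0
    return cov

def candidate_shifts():
    """Enumerate (start, lunch, breaks) combos that roughly respect the
    rules: lunch > 2h from either end, breaks inside the shift."""
    shifts = []
    for start in range(0, 30):
        for lunch in range(start + 4, start + 13):
            for b1 in range(start + 2, start + 16):
                for b2 in range(b1 + 2, start + 16):
                    if b1 in (lunch, lunch + 1) or b2 in (lunch, lunch + 1):
                        continue
                    shifts.append((start, lunch, (b1, b2)))
    return shifts

def score(assignment, demand):
    """Total absolute gap between staffed and demanded labor."""
    staffed = [0] * SLOTS
    for start, lunch, breaks in assignment:
        for i, c in enumerate(shift_coverage(start, lunch, breaks)):
            staffed[i] += c
    return sum(abs(s - d) for s, d in zip(staffed, demand))

def random_search(demand, n_employees, iterations=2000, seed=0):
    """Sample random assignments and keep the best-scoring one."""
    rng = random.Random(seed)
    shifts = candidate_shifts()
    best, best_score = None, float("inf")
    for _ in range(iterations):
        assignment = [rng.choice(shifts) for _ in range(n_employees)]
        s = score(assignment, demand)
        if s < best_score:
            best, best_score = assignment, s
    return best, best_score
```

Pure random search is indeed cumbersome; the same pieces (candidate shifts, a coverage score) also slot directly into an integer-programming or local-search formulation later.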
I am measuring run times of a program in seconds. Depending on the amount of data I input, that can take milliseconds to days. Is there a Python module that I can use to convert the number of seconds to the most useful unit and display that? Approximations are fine.
For example, 50 should become 50 seconds, 590 should become 10 minutes, 100000 should become 1 day, or something like that. I could write the basic thing myself, but I am sure people have thought about this more than I have and have considered many of the edge cases I wouldn't think of in 1000 years :)
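The "basic thing" really is only a few lines of stdlib code. A minimal sketch (function name is made up) that reproduces the examples above by rounding to the most significant unit, with none of a real library's edge-case handling:

```python
def approx_duration(seconds):
    """Render a duration using only its most significant unit."""
    units = [("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1)]
    for name, size in units:
        if seconds >= size:
            n = round(seconds / size)
            return "%d %s%s" % (n, name, "" if n == 1 else "s")
    return "0 seconds"

print(approx_duration(50))      # 50 seconds
print(approx_duration(590))     # 10 minutes
print(approx_duration(100000))  # 1 day
```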
Edit: I noticed tqdm must have some logic associated with that, as it selects the length of the ETA string accordingly. Compare
for _ in tqdm.tqdm(range(10)): time.sleep(1)
with
for _ in tqdm.tqdm(range(100000)): time.sleep(1)
Edit: I have also found this Gist, but I would prefer code with at least some maintenance :)
https://gist.github.com/alexwlchan/73933442112f5ae431cc
Close the question if you want, humanize.naturaldelta is the answer:
This modest package contains various common humanization utilities, like turning a number into a fuzzy human readable duration ('3 minutes ago') or into a human readable size or throughput. It works with python 2.7 and 3.3 and is localized to Russian, French, Korean and Slovak.
https://github.com/jmoiron/humanize
I just found arrow:
Arrow is a Python library that offers a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps.
It has humanize(), too, and seems much better maintained:
https://arrow.readthedocs.io/en/latest/#humanize
I'm trying to determine (as my application is dealing with lots of data from different sources and different time zones, formats, etc) how best to store my data AND work with it.
For example, should I store everything as UTC? This means when I fetch data I need to determine what timezone it is currently in, and if it's NOT UTC, do the necessary conversion to make it so. (Note, I'm in EST).
Then, when performing computations on the data, should I extract it (say it's UTC) and convert it into MY time zone (EST), so it makes sense when I'm looking at it? Or should I keep it in UTC and do all my calculations there?
A lot of this data is time series and will be graphed, and the graph will be in EST.
This is a Python project, so lets say I have a data structure that is:
"id1": {
    "interval": 60,                               # seconds, subDict['interval']
    "last": "2013-01-29 02:11:11.151996+00:00"    # UTC, subDict['last']
},
And I need to operate on this by determining whether the current time (now()) is > the last + interval (have the 60 seconds elapsed)? So in code:
lastTime = dateutil.parser.parse(subDict['last'])
utcNow = datetime.datetime.utcnow().replace(tzinfo=tz.tzutc())
if lastTime + datetime.timedelta(seconds=subDict['interval']) < utcNow:
    print "Time elapsed, do something!"
Does that make sense? I'm working with UTC everywhere, both stored and computationally...
Also, if anyone has links to good write-ups on how to work with timestamps in software, I'd love to read it. Possibly like a Joel On Software for timestamp usage in applications ?
It seems to me as though you're already doing things 'the right way'. Users will probably expect to interact in their local time zone (input and output), but it's normal to store normalized dates in UTC format so that they are unambiguous and to simplify calculation. So, normalize to UTC as soon as possible, and localize as late as possible.
Some small amount of information about Python and timezone processing can be found here:
Django timezone implementation
pytz Documentation
My current preference is to store dates as unix timestamp tv_sec values in backend storage, and convert to Python datetime.datetime objects during processing. Processing is usually done with a datetime object in the UTC timezone, converted to the local user's timezone just before output. I find that having a rich object such as a datetime.datetime helps with debugging.
Timezones are a nuisance to deal with, and you probably need to decide on a case-by-case basis whether it's worth the effort to support them correctly.
For example, let's say you're calculating daily counts for bandwidth used. Some questions that may arise are:
What happens on a daylight saving boundary? Should you just assume that a day is always 24 hours for ease of calculation, or does every daily calculation need to account for days that have more or fewer hours at the daylight saving boundaries?
When presenting a localized time, does it matter if a time is repeated? E.g. in an hourly report displayed in local time without a time zone attached, will it confuse the user to have a missing hour of data, or a repeated hour, around daylight saving changes?
Since, as far as I can see, you do not seem to be having any implementation problems, I would focus on design aspects rather than on code and timestamp format. I have experience participating in the design of network support for a navigation system implemented as a distributed system on a local network. The nature of that system is such that there is a lot of data (often conflicting) coming from different sources, so resolving conflicts and keeping data integrity is rather tricky. Just some thoughts based on that experience.
Timestamping data, even in a distributed system spanning many computers, is usually not a problem, provided you do not need higher resolution than the system time functions provide, or better time synchronization than your OS components offer.
In the simplest case, using UTC is quite reasonable, and for most tasks it's enough. However, it's important to understand the purpose of the timestamps in your system from the very beginning of the design. Time values (whether Unix time or formatted UTC strings) can sometimes be equal. If you have to resolve data conflicts based on timestamps (that is, always select the newer, or the older, value among several received from different sources), you need to decide whether an incorrectly resolved conflict (one that can be resolved in more than one way because the timestamps are equal) is a fatal problem for your design or not. The probable options are:
If 99.99% of conflicts are resolved the same way on all nodes, you do not care about the remaining 0.01%, and they do not break data integrity. In that case you may safely continue using something like UTC.
If strict resolution of all conflicts is a must for you, you have to design your own timestamping scheme. Timestamps may include a time component (perhaps not system time but some higher-resolution timer), a sequence number (to produce unique timestamps even when the time resolution is insufficient) and a node identifier (to let different nodes of your system generate completely unique timestamps).
Finally, what you need may not be timestamps based on time at all. Do you really need to calculate the time difference between a pair of timestamps? Isn't it enough to be able to order timestamps, without connecting them to real moments in time? If you only need comparisons, not time calculations, timestamps based on sequential counters rather than on real time are a good choice (see Lamport time for more details).
If you need strict conflict resolving, or if you need very high time resolution, you will probably have to write your own timestamp service.
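A sequential-counter timestamp of the kind described above (a counter plus a node identifier, giving ordering without depending on real time) can be sketched in a few lines; this is a minimal Lamport clock, and the class and method names are illustrative:

```python
class LamportClock:
    """Minimal logical clock: gives event ordering, not wall-clock time."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        """Local event: advance and return a unique (counter, node_id) stamp."""
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, stamp):
        """On receiving a message, jump past the sender's counter."""
        self.counter = max(self.counter, stamp[0]) + 1
        return (self.counter, self.node_id)
```

Since the stamps are tuples, they compare lexicographically, so the node identifier breaks ties between equal counters and every stamp in the system is totally ordered.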
Many ideas and clues may be borrowed from a book by A. Tanenbaum, "Distributed systems: Principles and paradigms". When I faced such problems, it helped me a lot, and there is a separate chapter dedicated to timestamps generation in it.
I think the best approach is to store all timestamp data as UTC. When you read it in, immediately convert to UTC; right before display, convert from UTC to your local time zone.
You might even want to have your code print all timestamps twice, once in local time and once in UTC... it depends on how much data you need to fit on a screen at once.
I am a big fan of the RFC 3339 timestamp format. It is unambiguous to both humans and machines. What is best about it is that almost nothing is optional, so it always looks the same:
2013-01-29T19:46:00.00-08:00
I prefer to convert timestamps to single float values for storage and computations, and then convert back to the datetime format for display. I wouldn't keep money in floats, but timestamp values are well within the precision of float values!
Working with time floats makes a lot of code very easy:
if time_now() >= last_time + interval:
    print("interval has elapsed")
It looks like you are already doing it pretty much this way, so I can't suggest any dramatic improvements.
I wrote some library functions to parse timestamps into Python time float values, and convert time float values back to timestamp strings. Maybe something in here will be useful to you:
http://home.blarg.net/~steveha/pyfeed.html
I suggest you look at feed.date.rfc3339. BSD license, so you can just use the code if you like.
EDIT: Question: How does this help with timezones?
Answer: If every timestamp you store is stored in UTC time as a Python time float value (number of seconds since the epoch, with optional fractional part), you can directly compare them; subtract one from another to find out the interval between them; etc. If you use RFC 3339 timestamps, then every timestamp string has the timezone right there in the timestamp string, and it can be correctly converted to UTC time by your code. If you convert from a float value to a timestamp string value right before displaying, the timezone will be correct for local time.
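On Python 3.7+ this float-and-RFC-3339 round trip needs no third-party code, since datetime.fromisoformat understands numeric offsets (the 'Z' suffix and two-digit fractions like the example above require 3.11+). A small sketch; the function names are illustrative:

```python
import datetime

def rfc3339_to_float(stamp):
    """Parse an RFC 3339 timestamp with a numeric offset into
    UTC seconds since the epoch."""
    return datetime.datetime.fromisoformat(stamp).timestamp()

def float_to_rfc3339(seconds, tz=datetime.timezone.utc):
    """Format epoch seconds as an RFC 3339 string in the given zone."""
    return datetime.datetime.fromtimestamp(seconds, tz).isoformat()

# Two stamps written in different zones compare equal as floats:
assert (rfc3339_to_float("2013-01-29T19:46:00-08:00")
        == rfc3339_to_float("2013-01-30T03:46:00+00:00"))
```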
Also, as I said, it looks like he is already pretty much doing this, so I don't think I can give any amazing advice.
Personally, I'm using the Unix-time standard. It's very convenient for storage due to its simple representation: merely a sequence of numbers. Since internally it represents UTC time, you have to make sure to generate it properly (converting from other timestamps) before storing, and format it according to whichever time zone you want.
Once you have a common timestamp format in the backend data (tz aware), plotting the data is very easy, as it is just a matter of setting the destination TZ.
As an example:
import time
import datetime
import pytz

# a pre-encoded date: seconds since the unix epoch (UTC)
example = {
    "id1": {
        "interval": 60,
        "last": 1359521160.62,
    }
}

# this will print using your system timezone
print time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(example['id1']['last']))

# this will use an ISO country code to localize the timestamp:
# attach UTC first, then convert to the target zone
countrytz = pytz.country_timezones['BR'][0]
it = pytz.timezone(countrytz)
utc_dt = pytz.utc.localize(datetime.datetime.utcfromtimestamp(example['id1']['last']))
print utc_dt.astimezone(it)
I'm wondering if anyone knows of a good date and time library that has correctly-implemented features like the following:
Microsecond resolution
Daylight savings
Example: it knows that 2:30am did not exist in the US on 8 March 2009 for timezones that respect daylight savings.
I should be able to specify a timezone like "US/Eastern" and it should be smart enough to know whether a given timestamp should correspond to EST or EDT.
Custom date ranges
The ability to create specialized business calendars that skip over weekends and holidays.
Custom time ranges
The ability to define business hours so that times requested outside the business hours can be rounded up or down to the next or previous valid hour.
Arithmetic
Be able to add and subtract integer amounts of all units (years, months, weeks, days, hours, minutes, ...). Note that adding something like 0.5 days isn't well-defined because it could mean 12 hours or it could mean half the duration of a day (which isn't 24 hours on daylight savings changes).
Natural boundary alignment
Given a timestamp, I'd like to be able to do things like round down to the nearest decade, year, month, week, ..., quarter hour, hour, etc.
I'm currently using Python, though I'm happy to have a solution in another language like perl, C, or C++.
I've found that the built-in Python libraries lack sophistication with their daylight savings logic and there isn't an obvious way (to me) to set up things like custom time ranges.
Python's standard library datetime module is deliberately limited to non-controversial aspects that aren't changed all the time by legislative fiat; that's why it deliberately excludes direct support for timezones, DST, fuzzy parsing, ill-defined arithmetic (such as "one month later"...) and the like. On top of it, dateutil adds many kinds of manipulations, and pytz adds timezones (including DST issues), covering most of what you're asking for. The exception is truly explosive things like "holidays", which vary wildly not just across political jurisdictions but even across employers within a single jurisdiction. E.g., in the US some employers consider "Columbus Day" a holiday, but many don't; and some, with offices in many locations, have it as a holiday in some locations but not in others. Given this utter, total chaos, expecting to find a general-purpose library that somehow magically makes sense of it is pretty weird.
Take a look at the dateutil and possibly mx.DateTime packages.
Saw this the other day; I haven't used it myself, but it looks promising: http://crsmithdev.com/arrow/
I should be able to specify a timezone like "US/Eastern" and it should be smart enough to know whether a given timestamp should correspond to EST or EDT.
This part isn't always possible - in the same way that 2:30am doesn't exist for one day of the year (in timezones with daylight saving that switches at 2:00am), 2:30am exists twice for another day - once in EDT and then an hour later in EST. If you pass that date/time to the library, how does it know which of the two times you're talking about?
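Since PEP 495 (Python 3.6), this ambiguity is resolved explicitly by the fold attribute: fold=0 selects the first occurrence of a repeated wall time, fold=1 the second. A quick illustration with the stdlib zoneinfo module (Python 3.9+; America/New_York is the modern name for US/Eastern):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

tz = ZoneInfo("America/New_York")
# 2013-11-03 01:30 occurred twice in this zone: first in EDT, then in EST.
first = datetime(2013, 11, 3, 1, 30, tzinfo=tz)           # fold=0: first occurrence
second = datetime(2013, 11, 3, 1, 30, fold=1, tzinfo=tz)  # fold=1: second occurrence
print(first.tzname(), second.tzname())  # EDT EST
```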
Although the book is over ten years old, I would strongly recommend reading Standard C Date/Time Library: Programming the World's Calendars and Clocks by Lance Latham. This is one of those books that you will pick back up from time to time in amazement that it got written at all. The author goes into more detail than you want about calendars and timekeeping systems, and along the way develops the source code to a library (written in C) to handle all of the calculations.
Amazingly, it seems to be still in print...
I just released a Python library called Fleming (https://github.com/ambitioninc/fleming), and it appears to solve two of your problems, with proper handling of Daylight Saving Time.
Problem 1, Arithmetic - Fleming has an add_timedelta function that takes a timedelta (from Python's datetime module) or a relativedelta from python-dateutil and adds it to a datetime object. The add_timedelta function handles the case when the datetime object crosses a DST boundary. Check out https://github.com/ambitioninc/fleming#add_timedelta for a complete explanation and examples. Here is a short example:
import fleming
import datetime

# dt is assumed to be a timezone-aware datetime defined earlier,
# e.g. 2013-03-14 20:00:00-04:00 in US/Eastern
dt = fleming.add_timedelta(dt, datetime.timedelta(weeks=2, days=1))
print dt
2013-03-29 20:00:00-04:00

# Do timedelta arithmetic such that it starts in DST and crosses over into no DST.
# Note that the hours stay intact and the timezone changes
dt = fleming.add_timedelta(dt, datetime.timedelta(weeks=-4))
print dt
2013-03-01 20:00:00-05:00
Problem 2, Natural Boundary Alignment - Fleming has a floor function that can take an arbitrary alignment. Let's say your time was datetime(2013, 2, 3) and you gave it a floor interval of month=3: it will round down to the nearest trimonth (quarter). You could similarly specify the nearest decade by using year=10 in the arguments. Check out (https://github.com/ambitioninc/fleming#floordt-within_tznone-yearnone-monthnone-weeknone-daynone-hournone-minutenone-secondnone-microsecondnone) for complete examples and illustrations. Here is a quick one:
import fleming
import datetime
# Get the starting of a quarter by using month=3
print fleming.floor(datetime.datetime(2013, 2, 4), month=3)
2013-01-01 00:00:00
I'm using the python dateutil module for a calendaring application which supports repeating events. I really like the ability to parse ical rrules using the rrulestr() function. Also, using rrule.between() to get dates within a given interval is very fast.
However, as soon as I try doing any other operations (i.e. list slices, before(), after(), ...) everything begins to crawl. It seems like dateutil tries to calculate every date, even if all I want is the last date with rrule.before(datetime.max).
Is there any way of avoiding these unnecessary calculations?
My guess is probably not. Finding the last date before datetime.max means calculating every recurrence up to datetime.max, and that will reasonably be a LOT of recurrences. It might be possible to add shortcuts for some of the simpler recurrences: if an event repeats every year on the same date, for example, you don't really need to compute the recurrences in between. But for rules like "every third X", or rules with a maximum number of recurrences, you still would. I guess dateutil doesn't have these shortcuts; they would probably be quite complex to implement reliably.
May I ask why you need to find the last recurrence before datetime.max? It is, after all, almost eight thousand years into the future... :-)
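One practical workaround, assuming you control the rule string: give the rule an explicit COUNT or UNTIL, so that "the last recurrence" is reached after a bounded walk instead of one spanning eight thousand years. A sketch with dateutil (the specific rule here is illustrative):

```python
import datetime
from dateutil.rrule import rrulestr

# A rule bounded by UNTIL: before(datetime.max) only has to walk one
# year of daily recurrences instead of ~8000 years' worth.
rule = rrulestr("DTSTART:20130101T090000\nRRULE:FREQ=DAILY;UNTIL=20131231T090000")
last = rule.before(datetime.datetime.max, inc=True)
print(last)  # 2013-12-31 09:00:00
```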