Full-featured date and time library - python

I'm wondering if anyone knows of a good date and time library that has correctly-implemented features like the following:
Microsecond resolution
Daylight savings
Example: it knows that 2:30am did not exist in the US on 8 March 2009 for timezones that respect daylight savings.
I should be able to specify a timezone like "US/Eastern" and it should be smart enough to know whether a given timestamp should correspond to EST or EDT.
Custom date ranges
The ability to create specialized business calendars that skip over weekends and holidays.
Custom time ranges
The ability to define business hours so that times requested outside the business hours can be rounded up or down to the next or previous valid hour.
Arithmetic
Be able to add and subtract integer amounts of all units (years, months, weeks, days, hours, minutes, ...). Note that adding something like 0.5 days isn't well-defined because it could mean 12 hours or it could mean half the duration of a day (which isn't 24 hours on daylight savings changes).
Natural boundary alignment
Given a timestamp, I'd like to be able to do things like round down to the nearest decade, year, month, week, ..., quarter hour, hour, etc.
I'm currently using Python, though I'm happy to have a solution in another language like Perl, C, or C++.
I've found that the built-in Python libraries lack sophistication with their daylight savings logic and there isn't an obvious way (to me) to set up things like custom time ranges.

Python's standard library datetime module is deliberately limited to non-controversial aspects that aren't constantly being changed by legislative fiat -- that's why it deliberately excludes direct support for timezones, DST, fuzzy parsing, ill-defined arithmetic (such as "one month later"...) and the like. On top of it, dateutil (for many kinds of manipulations) and pytz (for timezones, including DST issues) add most of what you're asking for, though not truly explosive things like "holidays", which vary wildly not just across political jurisdictions but even across employers within a single jurisdiction (e.g. in the US some employers consider "Columbus Day" a holiday, but many don't -- and some, with offices in many locations, observe it as a holiday in some locations but not in others). Given this utter, total chaos, expecting to find a general-purpose library that somehow magically makes sense of it is pretty unrealistic.

Take a look at the dateutil and possibly mx.DateTime packages.

Saw this the other day; I haven't used it myself, but it looks promising: http://crsmithdev.com/arrow/

I should be able to specify a timezone like "US/Eastern" and it should be smart enough to know whether a given timestamp should correspond to EST or EDT.
This part isn't always possible - in the same way that 2:30am doesn't exist for one day of the year (in timezones with daylight saving that switches at 2:00am), 2:30am exists twice for another day - once in EDT and then an hour later in EST. If you pass that date/time to the library, how does it know which of the two times you're talking about?
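For what it's worth, pytz makes this ambiguity explicit rather than guessing: localizing with is_dst=None raises an error for wall-clock times that occur twice (or not at all), and you can pick an interpretation yourself.

```python
import datetime
import pytz

eastern = pytz.timezone('US/Eastern')

# 1:30am on 4 Nov 2012 occurred twice in US/Eastern (first EDT, then EST).
ambiguous = datetime.datetime(2012, 11, 4, 1, 30)
try:
    eastern.localize(ambiguous, is_dst=None)
except pytz.exceptions.AmbiguousTimeError:
    print('ambiguous wall-clock time')

# Or pick an interpretation explicitly:
print(eastern.localize(ambiguous, is_dst=True))   # the EDT occurrence
print(eastern.localize(ambiguous, is_dst=False))  # the EST occurrence
```

The same call raises pytz.exceptions.NonExistentTimeError for the missing 2:30am on spring-forward days.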

Although the book is over ten years old, I would strongly recommend reading Standard C Date/Time Library: Programming the World's Calendars and Clocks by Lance Latham. This is one of those books that you will pick back up from time to time in amazement that it got written at all. The author goes into more detail than you want about calendars and timekeeping systems, and along the way develops the source code to a library (written in C) to handle all of the calculations.
Amazingly, it seems to be still in print...

I just released a Python library called Fleming (https://github.com/ambitioninc/fleming), and it appears to solve two of your problems while handling Daylight Saving Time properly.
Problem 1, Arithmetic - Fleming has an add_timedelta function that takes a timedelta (from Python's datetime module) or a relativedelta from python-dateutil and adds it to a datetime object. The add_timedelta function handles the case when the datetime object crosses a DST boundary. Check out https://github.com/ambitioninc/fleming#add_timedelta for a complete explanation and examples. Here is a short example:
import datetime
import fleming
import pytz

# Starting value assumed for illustration: an aware datetime in US/Eastern (EDT)
dt = pytz.timezone('US/Eastern').localize(datetime.datetime(2013, 3, 14, 20))
dt = fleming.add_timedelta(dt, datetime.timedelta(weeks=2, days=1))
print dt
2013-03-29 20:00:00-04:00
# Do timedelta arithmetic such that it starts in DST and crosses over into no DST.
# Note that the hours stay intact and the timezone changes
dt = fleming.add_timedelta(dt, datetime.timedelta(weeks=-4))
print dt
2013-03-01 20:00:00-05:00
Problem 2, Natural Boundary Alignment - Fleming has a floor function that can take an arbitrary alignment. Let's say your time was datetime(2013, 2, 3) and you gave it a floor interval of month=3. This means it will round to the nearest trimonth (quarter). You could similarly specify nearest decade by using year=10 in the arguments. Check out (https://github.com/ambitioninc/fleming#floordt-within_tznone-yearnone-monthnone-weeknone-daynone-hournone-minutenone-secondnone-microsecondnone) for complete examples and illustrations. Here is a quick one:
import fleming
import datetime
# Get the starting of a quarter by using month=3
print fleming.floor(datetime.datetime(2013, 2, 4), month=3)
2013-01-01 00:00:00

Related

Reading time zone strings in python

I have a series of strings of dates and want to make all of them into datetime objects for a searchable database. I'm really having trouble understanding time zones.
In the applications I'm working with, sometimes the time zone is expressed as three letters ("EST") and sometimes it's "-5:00".
pytz.timezone('EST') is okay, but pytz.timezone('UTC+05:00') does not seem to work, even though I can get pytz to output UTC as a string. For the UTC-offset times we have datetime.timezone(datetime.timedelta(hours=-5)), but that won't be consistent with pytz.timezone. I would like to use pytz.timezone for all the time zones. How do I do it?
Since some of your strings have only abbreviations, you're unfortunately out of luck: there is no general solution.
The problem is that time zone abbreviations are not unique. Some common examples: CST has 3 different interpretations (Central (US), China, or Cuba), and IST has 3 different interpretations (Israel, Ireland, and India). Which should your program pick?
It's not a problem just for Python. It's the nature of time zone abbreviations. You'd need some other information to disambiguate.
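The offset-style strings, at least, can be handled without pytz at all; only the abbreviations need a hand-maintained mapping that encodes your own disambiguation policy. A sketch (the function name and the mapping are made up for illustration; datetime.timezone needs Python 3.2+):

```python
import datetime
import re

# App-specific policy for the abbreviations you choose to trust,
# e.g. resolve each value via pytz.timezone(...):
ABBREVIATIONS = {'EST': 'US/Eastern'}

def tz_from_offset(text):
    # Turn strings like "-5:00" or "+05:30" into a fixed-offset tzinfo.
    m = re.fullmatch(r'([+-])(\d{1,2}):(\d{2})', text)
    if m is None:
        raise ValueError('not a UTC offset: %r' % text)
    sign = -1 if m.group(1) == '-' else 1
    delta = datetime.timedelta(hours=int(m.group(2)), minutes=int(m.group(3)))
    return datetime.timezone(sign * delta)

print(tz_from_offset('-5:00'))
```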

Business time difference between datetime objects in Python

How do I find the business time difference between two datetime objects in Python?
Business time can be 9-17
I want also to consider holidays. I have a function is_holiday(date) that takes a date object.
def business_time_between(datetime1, datetime2):
    return (business_time_after(datetime1) +
            business_time_per_day * days_between(datetime1, datetime2) +
            business_time_before(datetime2))
I'm sure you can figure out how to implement business_time_after, business_time_before, business_time_per_day, and days_between
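A fleshed-out sketch of those helpers, assuming a 9-17 business day and stubbing is_holiday; all names here are illustrative, not from any library:

```python
import datetime

OPEN_HOUR, CLOSE_HOUR = 9, 17

def is_holiday(date):
    return False  # stub: plug in your own holiday calendar

def is_business_day(date):
    return date.weekday() < 5 and not is_holiday(date)

def clamp(dt):
    # Confine a datetime to the 9-17 window of its own day.
    start = dt.replace(hour=OPEN_HOUR, minute=0, second=0, microsecond=0)
    end = dt.replace(hour=CLOSE_HOUR, minute=0, second=0, microsecond=0)
    return min(max(dt, start), end)

def business_time_between(dt1, dt2):
    # Sum the business-hours overlap of [dt1, dt2] day by day.
    total = datetime.timedelta()
    day = dt1.date()
    while day <= dt2.date():
        if is_business_day(day):
            lo = clamp(max(dt1, datetime.datetime.combine(day, datetime.time.min)))
            hi = clamp(min(dt2, datetime.datetime.combine(day, datetime.time.max)))
            if hi > lo:
                total += hi - lo
        day += datetime.timedelta(days=1)
    return total
```

For example, Friday 16:00 to Monday 10:00 yields two business hours (one on each side of the weekend).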
I did not really find a simple pythonic answer to this problem.
The package businesstime suggested by J.F. Sebastian does what I wanted, but I think that the algorithms can be optimized and simplified.
For example, if we change the model of the day we can use the standard python datetime operations.
To be clearer: if our work day is 9-17, we may say that our day is made of 8 hours rather than 24. So 9:00 maps to 0:00 and 17:00 maps to 08:00. Now what we have to do is create a function that converts a real time into our new model in this way:
if it is a time lower than our start time (09:00), we don't care about it and we convert it to 00:00
if it is a business time, we convert it by subtracting the start time; for example, 13:37 becomes 04:37
if it is a time greater than our stop time (17:00), we don't care about it and we convert it to 08:00
After this conversion all the calculation can be made easily with the python datetime package as you do with a normal datetime.
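A minimal sketch of that conversion (the 9-17 window and the names are illustrative):

```python
import datetime

START, STOP = datetime.time(9, 0), datetime.time(17, 0)

def to_business_model(dt):
    # Map a real datetime onto the 8-hour "business day" model:
    # before 09:00 -> 00:00, 13:37 -> 04:37, after 17:00 -> 08:00.
    t = dt.time()
    if t <= START:
        offset = datetime.timedelta()
    elif t >= STOP:
        offset = datetime.timedelta(hours=8)
    else:
        offset = (datetime.datetime.combine(dt.date(), t)
                  - datetime.datetime.combine(dt.date(), START))
    return datetime.datetime.combine(dt.date(), datetime.time()) + offset
```

Differences between two converted values are then plain timedelta subtraction.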
Thanks also to user2085282's answer, you can find this concept implemented in the repo django-business-time.
With the model conversion, I found it extremely easy to add another feature: the lunch break.
In fact, if you have a lunch break from 12 to 13, you can map all the times inside that interval to 12:00 in your new day model. But pay attention to also convert the times after 13:00, accounting for the one hour of lunch break.
I am sorry that what I developed is just a Django application and not a Python package, but for the moment it works for me. And apologies for my code; I am a newbie :)
PS: the full documentation is coming soon!

Design - How to handle timestamps (storage) and when performing computations ; Python

I'm trying to determine (as my application is dealing with lots of data from different sources and different time zones, formats, etc) how best to store my data AND work with it.
For example, should I store everything as UTC? This means when I fetch data I need to determine what timezone it is currently in, and if it's NOT UTC, do the necessary conversion to make it so. (Note, I'm in EST).
Then, when performing computations on the data, should I extract it (say it's UTC) and convert into MY time zone (EST), so it makes sense when I'm looking at it? Or should I keep it in UTC and do all my calculations there?
A lot of this data is time series and will be graphed, and the graph will be in EST.
This is a Python project, so let's say I have a data structure that is:
"id1": {
"interval": 60, <-- seconds, subDict['interval']
"last": "2013-01-29 02:11:11.151996+00:00" <-- UTC, subDict['last']
},
And I need to operate on this by determining whether the current time (now()) is > last + interval (have the 60 seconds elapsed)? So in code:
lastTime = dateutil.parser.parse(subDict['last'])
utcNow = datetime.datetime.utcnow().replace(tzinfo=tz.tzutc())
if lastTime + datetime.timedelta(seconds=subDict['interval']) < utcNow:
print "Time elapsed, do something!"
Does that make sense? I'm working with UTC everywhere, both stored and computationally...
Also, if anyone has links to good write-ups on how to work with timestamps in software, I'd love to read it. Possibly like a Joel On Software for timestamp usage in applications ?
It seems to me as though you're already doing things 'the right way'. Users will probably expect to interact in their local time zone (input and output), but it's normal to store normalized dates in UTC format so that they are unambiguous and to simplify calculation. So, normalize to UTC as soon as possible, and localize as late as possible.
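A minimal sketch of that pattern using the stdlib zoneinfo module (Python 3.9+; the same shape works with pytz on older versions, and the function names here are illustrative):

```python
import datetime
from zoneinfo import ZoneInfo

UTC = datetime.timezone.utc

def store(naive_local, tz_name):
    # Normalize as soon as possible: attach the source zone, convert to UTC.
    return naive_local.replace(tzinfo=ZoneInfo(tz_name)).astimezone(UTC)

def display(utc_dt, tz_name):
    # Localize as late as possible, right before presentation.
    return utc_dt.astimezone(ZoneInfo(tz_name))

# 21:00 EST on 29 Jan is 02:00 UTC the next day.
stored = store(datetime.datetime(2013, 1, 29, 21, 0), 'US/Eastern')
print(stored)
```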
Some small amount of information about Python and timezone processing can be found here:
Django timezone implementation
pytz Documentation
My current preference is to store dates as Unix timestamp tv_sec values in backend storage, and convert to Python datetime.datetime objects during processing. Processing will usually be done with a datetime object in the UTC timezone and then converted to a local user's timezone just before output. I find that having a rich object such as a datetime.datetime helps with debugging.
Timezones are a nuisance to deal with, and you probably need to determine on a case-by-case basis whether it's worth the effort to support them correctly.
For example, let's say you're calculating daily counts for bandwidth used. Some questions that may arise are:
What happens on a daylight saving boundary? Should you just assume that a day is always 24 hours for ease of calculation, or do you need to check, in every daily calculation, that a day may have fewer or more hours at the daylight saving boundary?
When presenting a localized time, does it matter if a time is repeated? E.g., if you have an hourly report displayed in local time without a time zone attached, will it confuse the user to have a missing hour of data, or a repeated hour of data, around daylight saving changes?
Since, as far as I can see, you do not seem to be having any implementation problems, I would focus on design aspects rather than on code and timestamp format. I have experience participating in the design of network support for a navigation system implemented as a distributed system on a local network. The nature of that system is such that there is a lot of data (often conflicting) coming from different sources, so resolving conflicts and keeping data integrity is rather tricky. Just some thoughts based on that experience.
Timestamping data, even in a distributed system spanning many computers, usually is not a problem as long as you do not need higher resolution than the system time functions provide, or better time synchronization accuracy than your OS components provide.
In the simplest case using UTC is quite reasonable, and for most of tasks it's enough. However, it's important to understand the purpose of using time stamps in your system from the very beginning of design. Time values (no matter if it is Unix time or formatted UTC strings) sometimes may be equal. If you have to resolve data conflicts based on timestamps (I mean, to always select a newer (or an older) value among several received from different sources), you need to understand if an incorrectly resolved conflict (that usually means a conflict that may be resolved in more than one way, as timestamps are equal) is a fatal problem for your system design, or not. The probable options are:
If the 99.99% of conflicts are resolved in the same way on all the nodes, you do not care about the remaining 0.01%, and they do not break data integrity. In that case you may safely continue using something like UTC.
If strict resolving of all the conflicts is a must for you, you have to design your own timestamping system. Timestamps may include time (maybe not system time, but some higher resolution timer), sequence number (to allow producing unique timestamps even if time resolution is not enough for that) and node identifier (to allow different nodes of your system to generate completely unique timestamps).
Finally, what you need may not be timestamps based on time at all. Do you really need to be able to calculate the time difference between a pair of timestamps? Isn't it enough to be able to order timestamps, without connecting them to real moments in time? If you only need comparisons, not time calculations, timestamps based on sequential counters rather than real time are a good choice (see Lamport time for more details).
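The Lamport scheme is tiny to sketch:

```python
class LamportClock(object):
    """Logical clock: orders events without reference to wall time."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        # Called on every local event.
        self.counter += 1
        return self.counter

    def receive(self, remote_counter):
        # Called on message receipt: jump past the sender's timestamp.
        self.counter = max(self.counter, remote_counter) + 1
        return self.counter
```

Appending a node identifier to the counter makes timestamps globally unique, which resolves the remaining ties.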
If you need strict conflict resolving, or if you need very high time resolution, you will probably have to write your own timestamp service.
Many ideas and clues may be borrowed from a book by A. Tanenbaum, "Distributed systems: Principles and paradigms". When I faced such problems, it helped me a lot, and there is a separate chapter dedicated to timestamps generation in it.
I think the best approach is to store all timestamp data as UTC. When you read it in, immediately convert to UTC; right before display, convert from UTC to your local time zone.
You might even want to have your code print all timestamps twice, once in local time and the second time in UTC time... it depends on how much data you need to fit on a screen at once.
I am a big fan of the RFC 3339 timestamp format. It is unambiguous to both humans and machines. What is best about it is that almost nothing is optional, so it always looks the same:
2013-01-29T19:46:00.00-08:00
I prefer to convert timestamps to single float values for storage and computations, and then convert back to the datetime format for display. I wouldn't keep money in floats, but timestamp values are well within the precision of float values!
Working with time floats makes a lot of code very easy:
if time_now() >= last_time + interval:
print("interval has elapsed")
It looks like you are already doing it pretty much this way, so I can't suggest any dramatic improvements.
I wrote some library functions to parse timestamps into Python time float values, and convert time float values back to timestamp strings. Maybe something in here will be useful to you:
http://home.blarg.net/~steveha/pyfeed.html
I suggest you look at feed.date.rfc3339. BSD license, so you can just use the code if you like.
EDIT: Question: How does this help with timezones?
Answer: If every timestamp you store is stored in UTC time as a Python time float value (number of seconds since the epoch, with optional fractional part), you can directly compare them; subtract one from another to find out the interval between them; etc. If you use RFC 3339 timestamps, then every timestamp string has the timezone right there in the timestamp string, and it can be correctly converted to UTC time by your code. If you convert from a float value to a timestamp string value right before displaying, the timezone will be correct for local time.
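For illustration, the round trip described here can be done with the standard library alone (strptime's %z accepts a colon in the offset as of Python 3.7):

```python
import datetime

stamp = '2013-01-29T19:46:00.00-08:00'

# RFC 3339 string -> aware datetime -> UTC epoch float
dt = datetime.datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%S.%f%z')
epoch = dt.timestamp()

# epoch float -> localized datetime for display
local = datetime.datetime.fromtimestamp(epoch, dt.tzinfo)
print(local.isoformat())
```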
Also, as I said, it looks like he is already pretty much doing this, so I don't think I can give any amazing advice.
Personally I'm using the Unix-time standard; it's very convenient for storage due to its simple representation, being merely a sequence of numbers. Since it internally represents UTC, you have to make sure to generate it properly (converting from other timestamps) before storing, and format it according to whatever time zone you want on output.
Once you have a common, tz-aware timestamp format in the backend data, plotting the data is very easy, as it is just a matter of setting the destination TZ.
As an example:
import time
import datetime
import pytz
# print pre encoded date in your local time from unix epoch
example = {"id1": {
"interval": 60,
"last": 1359521160.62
}
}
# format using your system's local timezone
print time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(example['id1']['last']))
# use an ISO country code to localize the timestamp
countrytz = pytz.country_timezones['BR'][0]
it = pytz.timezone(countrytz)
# attach UTC first, then convert; localizing the naive UTC value directly would mislabel it
print pytz.utc.localize(datetime.datetime.utcfromtimestamp(example['id1']['last'])).astimezone(it)

General Python regex to extract dates (d,m,y) in different formats

I am looking for a way to extract dates (day, month, year) from a text. That is, I want to find all dates (or rather - as many as possible) in a human-written string.
Is there a Python regular expression covering as many possible formats as possible?
Comment:
from dateutil.parser import parse
parse(s, fuzzy = True)
works fine, but it is constrained to one date per string.
Example:
A program is taking place at sth from 21 January 2013 to 15th of February 2013.
Applications for funding will be accepted until April 15, 2012. Notification of acceptance : 1st Aug. or later. Early payment due: 15.10.12. etc. Late: 11/20/12.
Usually (but not always) the convention is more or less consistent within a single entry.
It is easy to create a regex for a few cases; I can do that. The question is whether there is already one collecting many different formats.
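I'm not aware of a canonical one. As an illustrative sketch, a coarse pattern can pull out candidates like those in the example above, which you could then hand to dateutil.parser.parse one at a time to validate and normalize:

```python
import re

MONTH = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*'
DATE_RE = re.compile(
    r'\d{1,2}[./-]\d{1,2}[./-]\d{2,4}'                                      # 15.10.12, 11/20/12
    r'|\d{1,2}(?:st|nd|rd|th)?\s+(?:of\s+)?' + MONTH + r'\.?(?:\s+\d{4})?'  # 21 January 2013, 1st Aug.
    r'|' + MONTH + r'\.?\s+\d{1,2}(?:st|nd|rd|th)?(?:,\s*\d{4})?',          # April 15, 2012
    re.IGNORECASE)

text = ('A program is taking place from 21 January 2013 to 15th of February 2013. '
        'Applications accepted until April 15, 2012. Early payment due: 15.10.12. '
        'Late: 11/20/12.')
print(DATE_RE.findall(text))
```

This deliberately over-matches; letting a real parser reject each candidate is what keeps the regex simple.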
If you want to roll your own, you can take inspiration from the Regexp::Common's time module, and the patterns there for time and dates.
Be warned: the code (direct link to it) is not trivial.
I've had good luck with the module parsedatetime:
from parsedatetime import parsedatetime, parsedatetime_consts
pdt = parsedatetime.Calendar(parsedatetime_consts.Constants())
parsed, code = pdt.parse('''Your string''')

Is a day always 86,400 epoch seconds long?

While reviewing my past answers, I noticed I'd proposed code such as this:
import time
def dates_between(start, end):
    # muck around between the 9k+ time representation systems in Python
    # now start and end are seconds since epoch
    # return [start, start + 86400, start + 86400*2, ...]
    return range(start, end + 1, 86400)
When rereading this piece of code, I couldn't help but feel the ghastly touch of Tony the Pony on my spine, gently murmuring "leap seconds" to my ears and other such terrible, terrible things.
When does the "a day is 86,400 seconds long" assumption break, for epoch definitions of 'second', if ever? (I assume functions such as Python's time.mktime already return DST-adjusted values, so the above snippet should also work on DST switching days... I hope?)
Whenever doing calendrical calculations, it is almost always better to use whatever API the platform provides, such as Python's datetime and calendar modules, or a mature high-quality library, than it is to write "simpler" code yourself. Date and calendar APIs are ugly and complicated, but that's because real-world calendars have a lot of weird behavior.
For example, if it is "10:00:00 AM" right now, then the number of seconds to "10:00:00 AM tomorrow" could be a few different things, depending on what timezone(s) you are using, whether DST is starting or ending tonight, and so on.
Any time the constant 86400 appears in your code, there is a good chance you're doing something that's not quite right.
And things get even more complicated when you need to determine the number of seconds in a week, a month, a year, a quarter, and so on. Learn to use those calendar libraries.
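For iterating days specifically, a sketch using datetime.date sidesteps the issue entirely, since date arithmetic counts calendar days rather than seconds:

```python
import datetime

def dates_between(start, end):
    # start, end: datetime.date; yields every calendar day, inclusive.
    # timedelta(days=1) on a date is always one calendar day, so DST
    # transitions and leap seconds cannot skip or repeat a day.
    day = start
    while day <= end:
        yield day
        day += datetime.timedelta(days=1)

print(list(dates_between(datetime.date(2013, 3, 9), datetime.date(2013, 3, 11))))
```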
The number of seconds in a day depends on the time system that you use; e.g., in POSIX, a day is exactly 86400 seconds by definition:
As represented in seconds since the Epoch, each and every day shall be
accounted for by exactly 86400 seconds.
In UTC, there could be a leap second included, i.e., a day can be 86401 SI seconds (and theoretically 86399 SI seconds). As of Jun 30, 2015, this has happened 26 times.
If we measure days by the apparent motion of the Sun, then the length of a (solar) day varies through the year by ~16 minutes from the mean.
That in turn differs from UT1, which is also based on the rotation of the Earth (mean solar time). An apparent solar day can be 20 seconds shorter or 30 seconds longer than a mean solar day. UTC is kept within 0.9 seconds of UT1 by the introduction of occasional intercalary leap seconds.
If you define a day by the local clock, it may be very chaotic due to bizarre political timezone changes. It is not correct to assume that a day may change only by an hour due to DST.
According to Wikipedia,
UTC days are almost always 86 400 s long, but due to "leap seconds"
are occasionally 86 401 s and could be 86 399 s long (though the
latter option has never been used as of December 2010); this keeps the
days synchronized with the rotation of the Earth (or Universal Time).
I expect that a double leap second could in fact make the day 86402s long, if that were to ever be used.
EDIT again: I second-guessed myself due to confusing Python documentation; time.mktime always returns UTC epoch seconds. There, done. :)
In all time zones that "support" daylight saving time, you'll get two days a year that don't have 24 hours: they'll have 25 or 23 hours respectively. And don't even think of hardcoding those dates; they change every year, and between time zones.
Oh, and here's a list of 34 other reasons that you hadn't thought about, and why you shouldn't do what you're doing.
