Python strptime or alternative for complex date string parsing - python

I have been given a large list of date-time representations that need to be read into a database. I am using Python (because it rocks). The strings are in a terrible, terrible format where they are not precise to seconds, no timezone is stated, and the hours do not have a leading 0. So they look more like this:
April 29, 2013, 7:52 p.m.
April 30, 2013, 4 p.m.
You'll notice that if something happens between 4:00 and 4:01 it drops the minutes, too (ugh). Anyway, I am trying to parse these with time.strptime, but the docs state that hours must be decimal numbers [01,12] (or [00,23]). Since nothing is padded with 0's, I'm wondering if there is something else I can pass to strptime to accept hours without a leading 0; or if I should try splitting, then padding the strings; or use some other method of constructing the datetime object.
Also, it does not look like strptime accepts AM/PM as "A.M." or "P.M.", so I'll have to correct that as well. . .
Note, I am not able to just handle these strings in a batch. I receive them one-at-a-time from a foreign application which sometimes uses nicely formatted Unix epoch timestamps, but occasionally uses this format. Processing them on the fly is the only option.
I am using Python 2.7 with some Python 3 features imported.
from __future__ import (print_function, unicode_literals)

The most flexible parser is part of the dateutil package; it eats your input for breakfast:
>>> from dateutil import parser
>>> parser.parse('April 29, 2013, 7:52 p.m.')
datetime.datetime(2013, 4, 29, 19, 52)
>>> parser.parse('April 30, 2013, 4 p.m.')
datetime.datetime(2013, 4, 30, 16, 0)
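If adding a dependency is not an option, the same two sample strings can also be handled with strptime alone. The following is only a minimal sketch: the helper name parse_loose and the two format strings are assumptions made for illustration, and it relies on an English locale for %B and %p. In CPython, strptime accepts hours and days without a leading zero, so only the "a.m."/"p.m." spelling needs normalizing.

```python
import re
from datetime import datetime

# Formats tried in order: with minutes first, then the hour-only variant.
# Both are assumptions based on the two sample strings in the question.
FORMATS = ('%B %d, %Y, %I:%M %p', '%B %d, %Y, %I %p')


def parse_loose(text):
    """Parse strings such as 'April 29, 2013, 7:52 p.m.' (hypothetical helper)."""
    # Turn "a.m." / "p.m." into "AM" / "PM" so that %p can match them.
    cleaned = re.sub(r'\b([ap])\.m\.',
                     lambda m: m.group(1).upper() + 'M',
                     text.strip(),
                     flags=re.IGNORECASE)
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # try the next candidate format
    raise ValueError('unrecognized date string: %r' % text)


print(parse_loose('April 29, 2013, 7:52 p.m.'))
print(parse_loose('April 30, 2013, 4 p.m.'))
```

The result is a naive datetime, since the input carries no timezone information.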

Related

python strptime vs dateutil - recommended use

I need to convert between strings and datetime objects quite often. Up until now I have always used strptime and strftime.
I started working with the Google Calendar API, where I receive strings like this: '2018-03-17T09:00:00+01:00'
It seems like I need to convert the +01:00 into 0100 for strptime, which is a little annoying.
While I don't have this issue with dateutil, there are a few other inconveniences. Also, I saw that the last update of dateutil was in 2016, which seems odd.
So my question is: which one would you recommend for adding and subtracting dates and datetimes and switching between string and datetime objects?
Also, is dateutil still maintained, or is it outdated?
Thanks a lot!
- Sally
It seems like I need to convert the +01:00 into 0100 for strptime
No you don't.
For one thing, the standard format is +0100, not 0100.
For another, strptime handles +01:00 just fine (on Python 3.7 and later; earlier versions accept only the +0100 form):
>>> import datetime
>>> datetime.datetime.strptime('2018-08-13T11:18:24+00:00',
...     '%Y-%m-%dT%H:%M:%S%z')
datetime.datetime(2018, 8, 13, 11, 18, 24, tzinfo=datetime.timezone.utc)
>>> datetime.datetime.strptime('2018-08-13T11:18:24+01:00',
... '%Y-%m-%dT%H:%M:%S%z')
datetime.datetime(2018, 8, 13, 11, 18, 24, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600)))
So, the problem you're trying to solve doesn't exist in the first place.
While I don't have this issue with dateutil, there are a few other inconveniences. Also, I saw that the last update of dateutil was in 2016, which seems odd.
As of 13 Aug 2018, the last update to dateutil was 2 days ago. And the last official release to PyPI, version 2.7.3, was 3 months ago.
So, your secondary problem doesn't exist either.
So my question is: which one would you recommend for adding and subtracting dates and datetimes and switching between string and datetime objects?
Since dateutil just gives you the same datetime objects that datetime gives you, neither one is better for adding and subtracting dates and datetimes.
For converting to and from string format, sometimes dateutil is more convenient, and it also supports a wider range of formats that you don't care about—but for what you're doing, they both work fine, so there's no difference. If you expect to need other formats in the future, it might be worth bringing in dateutil, but if not, you might as well stick with the standard library.
The latest dateutil release (2.7.3) came out in May 2018. It just says "copyright 2016" somewhere in the credits. Moreover, the documentation talks about the policy for future versions, so it seems to be quite active. I would suggest preferring it over strptime. However, be sure to get the latest version: previous versions had a bug with converting ISO dates, which is what you are working with.
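To illustrate the point that parsing, arithmetic and formatting all work on ordinary datetime objects, here is a small standard-library sketch. It assumes Python 3.7 or later, where datetime.fromisoformat is available; the variable names are only illustrative.

```python
from datetime import datetime, timedelta

# A string in the format the question describes (taken from the question).
raw = '2018-03-17T09:00:00+01:00'

# Parse into a timezone-aware datetime (requires Python 3.7+).
start = datetime.fromisoformat(raw)

# Date arithmetic is plain timedelta arithmetic, whichever parser produced the object.
later = start + timedelta(days=1, hours=2)

# Convert back to a string in the same ISO 8601 shape.
print(later.isoformat())
```

A datetime returned by dateutil's parser could be used in place of start here without changing the rest of the code.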

Python: How to parse Timex3 into datetime or equivalent

In my Python3 project, I use SUTime together with Stanford CoreNLP to retrieve normalized time expression in the Timex3 standard. I access CoreNLP using pycorenlp. How can I parse the resulting time expression in Timex3 (part of the TimeML standard) into a datetime or another temporal instance? I guess that datetime is probably not sufficient since it cannot represent dateranges
For instance, Timex3 expressions are:
2017-11 referring to the whole month November
2017-10-07 referring to the whole seventh day of October
I've already tried parsing them with the Python library dateparser, which is able to read almost any input containing a temporal expression, but the library is obviously not suited for the Timex3 format, i.e., it converts 2017-11 to a datetime on the current day of the month (assuming it's November 2017):
>>> dateparser.parse('2017-11')
datetime.datetime(2017, 11, 27, 0, 0)
Isn't there a Python library for converting Timex3 expressions into datetime or something equivalent? Someone commented that 2017-11 cannot be converted into a datetime (as each datetime only reflects a single point in time). But there is, for example, the pandas library, which supports date ranges as well.
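One dependency-free way to represent such expressions is a half-open pair of datetimes (start inclusive, end exclusive). The sketch below is only an illustration: the helper name timex_to_range is hypothetical, and it assumes just the plain YYYY, YYYY-MM and YYYY-MM-DD forms shown above, not the full Timex3 grammar (weeks, seasons, durations and so on are not handled).

```python
import re
from datetime import datetime, timedelta

# Matches YYYY, YYYY-MM or YYYY-MM-DD; a day is only allowed after a month.
_PATTERN = re.compile(r'(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?')


def timex_to_range(value):
    """Return (start, end) datetimes, end exclusive (hypothetical helper)."""
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError('unsupported expression: %r' % value)
    year, month, day = (int(g) if g else None for g in match.groups())
    if month is None:
        # Whole year.
        return datetime(year, 1, 1), datetime(year + 1, 1, 1)
    if day is None:
        # Whole month; December rolls over into the next year.
        start = datetime(year, month, 1)
        if month == 12:
            return start, datetime(year + 1, 1, 1)
        return start, datetime(year, month + 1, 1)
    # Whole day.
    start = datetime(year, month, day)
    return start, start + timedelta(days=1)


print(timex_to_range('2017-11'))
print(timex_to_range('2017-10-07'))
```

If pandas is already in use, the resulting pair could be turned into whatever range or period type the project prefers.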

convert unicode to datetime strptime python [duplicate]

This question already has answers here:
Convert timestamps with offset to datetime obj using strptime
The format returned by the Google Calendar API looks like '2017-04-26T13:00:00+02:00'.
I tried to convert to datetime without success:
datetime.datetime.strptime('2017-04-26T13:00:00+02:00', '%Y-%m-%dT%H:%M:%S+%z')
The error I get:
ValueError: time data '2017-04-26T13:00:00+02:00' does not match format '%Y-%m-%dT%H:%M:%S+%z'
How do I get this code to work?
Take a look at dateutil, the parse function will help you out.
>>> from dateutil.parser import parse
>>> dt = parse('2017-04-26T13:00:00+02:00')
>>> dt
datetime.datetime(2017, 4, 26, 13, 0, tzinfo=tzoffset(None, 7200))
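This is because, before Python 3.7, strptime expects the %z directive to match a string that looks like +0200, not +02:00. The literal + in front of %z in your format also has to go, because %z matches the sign itself. For example: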
>>> datetime.datetime.strptime('2017-04-26T13:00:00+0200', '%Y-%m-%dT%H:%M:%S%z')
datetime.datetime(2017, 4, 26, 13, 0, tzinfo=datetime.timezone(datetime.timedelta(0, 7200)))
So, depending on your context, either you modify the string to fit strptime()'s expectations, or you use a higher-level library such as maya or pendulum, which offer very flexible facilities for format parsing and relative date management.
>>> import pendulum, maya # you need to pip install them
>>> pendulum.parse('2017-04-26T13:00:00+02:00')
<Pendulum [2017-04-26T13:00:00+02:00]>
>>> maya.parse('2017-04-26T13:00:00+02:00')
<MayaDT epoch=1493204400.0>
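The first option, modifying the string to fit strptime()'s expectations, can be sketched as follows. This is an illustrative helper (the name parse_offset_timestamp is hypothetical), and it assumes the input always ends in a +HH:MM or -HH:MM offset, as in the example above.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_offset_timestamp(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS+HH:MM' with strptime (hypothetical helper)."""
    # Drop the colon inside the trailing UTC offset: "+02:00" becomes "+0200".
    normalized = re.sub(r'([+-]\d{2}):(\d{2})$', r'\1\2', text)
    # No literal "+" before %z: the directive consumes the sign itself.
    return datetime.strptime(normalized, '%Y-%m-%dT%H:%M:%S%z')


dt = parse_offset_timestamp('2017-04-26T13:00:00+02:00')
print(dt.isoformat())
```

Because the colon is removed before parsing, this works whether or not the running Python version accepts a colon in %z.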
You may want to take a look at the documentation for pytz and Arrow, two external modules that add functionality related to dates and times. For instance, with Arrow you can do it pretty easily like this:
import arrow
my_arrow_object = arrow.get('2017-04-26T13:00:00+02:00')
my_datetime_object = my_arrow_object.datetime
print(my_datetime_object)
print(type(my_datetime_object))
This code prints the obtained timestamp and the type of my_datetime_object object, which is a datetime.datetime:
2017-04-26 13:00:00+02:00
<class 'datetime.datetime'>
Arrow also makes it easy to generate "humanized" time strings (like "an hour ago" or "two months ago"), which can be handy.

Parse timezone from a human time input

I'm having trouble parsing out timezone information from a string that looks like:
8pm PST on sunday
So far, using parsedatetime and dateutil allows me to parse the date out, but the timezone usually causes issues.
Anyone know of a library that handles this sort of thing? My fallback is to naively parse out the timezones via a regex or a simple "PST" in datestring.
The abbreviations you're using are not unique; you will therefore need to interpret the time zones somehow (e.g. assume United States) and specify what each abbreviation means for your application:
from dateutil import parser
# map time zones to seconds from GMT
zones = {
    'EST': -5 * 3600,
    'PST': -8 * 3600,
}
parser.parse('8 PM on Sunday PST', tzinfos=zones)
# datetime.datetime(2016, 4, 24, 20, 0, tzinfo=tzoffset('PST', -28800))
You can install dateutil with pip: pip install python-dateutil.
See this similar question for more information.
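The regex fallback mentioned in the question can be sketched with the standard library alone. Everything here is an assumption made for illustration: the helper name split_timezone is hypothetical, the abbreviations are interpreted as United States zones with fixed offsets (so daylight saving time is ignored), and only uppercase abbreviations are recognized to avoid matching ordinary words.

```python
import re
from datetime import timedelta, timezone

# Application-specific meaning of each abbreviation (assumed: US zones).
ZONES = {
    'EST': timezone(timedelta(hours=-5), 'EST'),
    'PST': timezone(timedelta(hours=-8), 'PST'),
}


def split_timezone(text):
    """Return (text without the abbreviation, tzinfo or None)."""
    for match in re.finditer(r'\b[A-Z]{2,5}\b', text):
        zone = ZONES.get(match.group(0))
        if zone is not None:
            # Cut the abbreviation out and collapse the leftover whitespace.
            rest = text[:match.start()] + text[match.end():]
            return ' '.join(rest.split()), zone
    return text, None


rest, zone = split_timezone('8pm PST on sunday')
print(rest, zone)
```

The remaining text can then be handed to whichever date parser is already in use, and the returned tzinfo attached to its result with datetime.replace(tzinfo=zone).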

How do I get the correct date format string for a given locale without setting that locale program-wide in Python?

I'm trying to generate dictionaries that include dates which can be asked to be given in specific locales. For example, I might want my function to return this when called with en_US as an argument:
{'date': 'May 12, 2014', ...}
And this, when called with hu_HU:
{'date': '2014. május 12.', ...}
Now, based on what I've found so far, I should be using locale.setlocale() to set the locale I want to use, and locale.nl_langinfo(locale.D_FMT) to get the appropriate date format. I could call locale.resetlocale() after this to go back to the previously used one, but my program uses multiple threads, and I assume the other ones will be affected by this temporary locale change as well.
There is the non-standard babel module which offers this and a lot more:
>>> import babel.dates
>>> babel.dates.format_datetime(locale='ru_RU')
'12 мая 2014 г., 8:24:08'
>>> babel.dates.format_datetime(locale='de_DE')
'12.05.2014 08:24:14'
>>> babel.dates.format_datetime(locale='en_GB')
'12 May 2014 08:24:16'
>>> from datetime import datetime, timedelta
>>> babel.dates.format_datetime(datetime(2014, 4, 1), locale='en_GB')
'1 Apr 2014 00:00:00'
>>> babel.dates.format_timedelta(datetime.now() - datetime(2014, 4, 1),
...     locale='en_GB')
'1 month'
It might be expensive to do, but how about building a dictionary of locale code: date format entries before starting your threads?
import locale
time_format = {}
saved = locale.setlocale(locale.LC_TIME)  # remember the current setting
for lang in lang_codes:
    locale.setlocale(locale.LC_TIME, lang)  # setlocale() takes the category first
    time_format[lang] = locale.nl_langinfo(locale.D_FMT)
locale.setlocale(locale.LC_TIME, saved)  # restore the previous locale
You can then use the format strings whenever you need the date in a specific format: time.strftime(time_format[lang], t). Of course, this doesn't take care of GMT shifts.
The better way would be to find where locale gets its mapping from but I don't know that and don't have the time to investigate right now.