How do I parse an HTTP date-string in Python? - python

Is there an easy way to parse HTTP date-strings in Python? According to the standard, there are several ways to format HTTP date strings; the method should be able to handle this.
In other words, I want to convert a string like "Wed, 23 Sep 2009 22:15:29 GMT" to a python time-structure.

>>> import email.utils, datetime
>>> email.utils.parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, -1)
If you want a datetime.datetime object, you can do:
# Python <3.3
def my_parsedate(text):
return datetime.datetime(*eut.parsedate(text)[:6])
# Python ≥3.3
def my_parsedate(text):
return email.utils.parsedate_to_datetime(text)
email.utils.parsedate
Attempts to parse a date according to the rules in RFC 2822. however, some mailers don’t follow that format as specified, so parsedate() tries to guess correctly in such cases. date is a string containing an RFC 2822 date, such as "Mon, 20 Nov 1995 19:12:08 -0500". If it succeeds in parsing the date, parsedate() returns a 9-tuple that can be passed directly to time.mktime(); otherwise None will be returned. Note that indexes 6, 7, and 8 of the result tuple are not usable.
email.utils.parsedate_to_datetime
The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime; otherwise ValueError is raised if date contains an invalid value such as an hour greater than 23 or a timezone offset not between -24 and 24 hours. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding a timezone tzinfo.

Since Python 3.3 there's email.utils.parsedate_to_datetime which can parse RFC 5322 timestamps (aka IMF-fixdate, Internet Message Format fixed length format, a subset of HTTP-date of RFC 7231).
>>> from email.utils import parsedate_to_datetime
...
... s = 'Sun, 06 Nov 1994 08:49:37 GMT'
... parsedate_to_datetime(s)
0: datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
There's also undocumented http.cookiejar.http2time which can achieve the same as follows:
>>> from datetime import datetime, timezone
... from http.cookiejar import http2time
...
... s = 'Sun, 06 Nov 1994 08:49:37 GMT'
... datetime.utcfromtimestamp(http2time(s)).replace(tzinfo=timezone.utc)
1: datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
It was introduced in Python 2.4 as cookielib.http2time for dealing with Cookie Expires directive which is expressed in the same format.

>>> import datetime
>>> datetime.datetime.strptime('Wed, 23 Sep 2009 22:15:29 GMT', '%a, %d %b %Y %H:%M:%S GMT')
datetime.datetime(2009, 9, 23, 22, 15, 29)

httplib.HTTPMessage(filehandle).getdate(headername)
httplib.HTTPMessage(filehandle).getdate_tz(headername)
mimetools.Message(filehandle).getdate()
rfc822.parsedate(datestr)
rfc822.parsedate_tz(datestr)
if you have a raw data stream, you can build an HTTPMessage or a mimetools.Message from it. it may offer additional help while querying the response object for infos
if you are using urllib2, you already have an HTTPMessage object hidden in the filehandler returned by urlopen
it can probably parse many date formats
httplib is in the core
NOTE:
had a look at implementation, HTTPMessage inherits from mimetools.Message which inherits from rfc822.Message. two floating defs are of your interest maybe, parsedate and parsedate_tz (in the latter)
parsedate(_tz) from email.utils has a different implementation, although it looks kind of the same.
you can do this, if you only have that piece of string and you want to parse it:
>>> from rfc822 import parsedate, parsedate_tz
>>> parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
>>>
but let me exemplify through mime messages:
import mimetools
import StringIO
message = mimetools.Message(
StringIO.StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> m
<mimetools.Message instance at 0x7fc259146710>
>>> m.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
or via http messages (responses)
>>> from httplib import HTTPMessage
>>> from StringIO import StringIO
>>> http_response = HTTPMessage(StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> #http_response can be grabbed via urllib2.urlopen(url).info(), right?
>>> http_response.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
right?
>>> import urllib2
>>> urllib2.urlopen('https://fw.io/').info().getdate('Date')
(2014, 2, 19, 18, 53, 26, 0, 1, 0)
there, now we now more about date formats, mime messages, mime tools and their pythonic implementation ;-)
whatever the case, looks better than using email.utils for parsing http headers.

Related

Python convert date string to timestamp

I need to convert string type Wed, 18 May 2016 11:21:35 GMT to timestamp, in Python. I'm using:
datetime.datetime.strptime(string, format)
But I don't want to specify the format for the date type.
But I don't want to specify the format for the date type.
Then, let the dateutil parser figure that out:
>>> from dateutil.parser import parse
>>> parse("Wed, 18 May 2016 11:21:35 GMT")
datetime.datetime(2016, 5, 18, 11, 21, 35, tzinfo=tzutc())
To parse rfc 822 time string that is used in email, http, and other internet protocols, you could use email stdlib module:
#!/usr/bin/env python
from email.utils import parsedate_tz, mktime_tz
timestamp = mktime_tz(parsedate_tz("Wed, 18 May 2016 11:21:35 GMT"))
See Converting string to datetime object.

Parsing datetime in Python..?

I have a system (developed in Python) that accepts datetime as string in VARIOUS formats and i have to parse them..Currently datetime string formats are :
Fri Sep 25 18:09:49 -0500 2009
2008-06-29T00:42:18.000Z
2011-07-16T21:46:39Z
1294989360
Now i want a generic parser that can convert any of these datetime formats in appropriate datetime object...
Otherwise, i have to go with parsing them individually. So please also provide method for parsing them individually (if there is no generic parser)..!!
As #TimPietzcker suggested, the dateutil package is the way to go, it handles the first 3 formats correctly and automatically:
>>> from dateutil.parser import parse
>>> parse("Fri Sep 25 18:09:49 -0500 2009")
datetime.datetime(2009, 9, 25, 18, 9, 49, tzinfo=tzoffset(None, -18000))
>>> parse("2008-06-29T00:42:18.000Z")
datetime.datetime(2008, 6, 29, 0, 42, 18, tzinfo=tzutc())
>>> parse("2011-07-16T21:46:39Z")
datetime.datetime(2011, 7, 16, 21, 46, 39, tzinfo=tzutc())
The unixtime format it seems to hiccough on, but luckily the standard datetime.datetime is up for the task:
>>> from datetime import datetime
>>> datetime.utcfromtimestamp(float("1294989360"))
datetime.datetime(2011, 1, 14, 7, 16)
It is rather easy to make a function out of this that handles all 4 formats:
from dateutil.parser import parse
from datetime import datetime
def parse_time(s):
try:
ret = parse(s)
except ValueError:
ret = datetime.utcfromtimestamp(s)
return ret
You should look into the dateutil package.

Text to datetime in Python [duplicate]

This question already has answers here:
Python time to age
(6 answers)
Closed 8 years ago.
Following on from my previous question, Python time to age, I have now come across a problem regarding the timezone, and it turns out that it's not always going to be "+0200". So when strptime tries to parse it as such, it throws up an exception.
I thought about just chopping off the +0200 with [:-6] or whatever, but is there a real way to do this with strptime?
I am using Python 2.5.2 if it matters.
>>> from datetime import datetime
>>> fmt = "%a, %d %b %Y %H:%M:%S +0200"
>>> datetime.strptime("Tue, 22 Jul 2008 08:17:41 +0200", fmt)
datetime.datetime(2008, 7, 22, 8, 17, 41)
>>> datetime.strptime("Tue, 22 Jul 2008 08:17:41 +0300", fmt)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.5/_strptime.py", line 330, in strptime
(data_string, format))
ValueError: time data did not match format: data=Tue, 22 Jul 2008 08:17:41 +0300 fmt=%a, %d %b %Y %H:%M:%S +0200
is there a real way to do this with strptime?
No, but since your format appears to be an RFC822-family date, you can read it much more easily using the email library instead:
>>> import email.utils
>>> email.utils.parsedate_tz('Tue, 22 Jul 2008 08:17:41 +0200')
(2008, 7, 22, 8, 17, 41, 0, 1, 0, 7200)
(7200 = timezone offset from UTC in seconds)
New in version 2.6.
For a naive object, the %z and %Z
format codes are replaced by empty
strings.
It looks like this is implemented only in >= 2.6, and I think you have to manually parse it.
I can't see another solution than to remove the time zone data:
from datetime import timedelta,datetime
try:
offset = int("Tue, 22 Jul 2008 08:17:41 +0300"[-5:])
except:
print "Error"
delta = timedelta(hours = offset / 100)
fmt = "%a, %d %b %Y %H:%M:%S"
time = datetime.strptime("Tue, 22 Jul 2008 08:17:41 +0200"[:-6], fmt)
time -= delta
You can use the dateutil library which is very useful:
from datetime import datetime
from dateutil.parser import parse
dt = parse("Tue, 22 Jul 2008 08:17:41 +0200")
## datetime.datetime(2008, 7, 22, 8, 17, 41, tzinfo=tzoffset(None, 7200)) <- dt
print dt
2008-07-22 08:17:41+02:00
As far as I know, strptime() doesn't recognize numeric time zone codes. If you know that the string is always going to end with a time zone specification of that form (+ or - followed by 4 digits), just chopping it off and parsing it manually seems like a perfectly reasonable thing to do.
It seems that %Z corresponds to time zone names, not offsets.
For example, given:
>>> format = '%a, %d %b %Y %H:%M:%S %Z'
I can parse:
>>> datetime.datetime.strptime('Tue, 22 Jul 2008 08:17:41 GMT', format)
datetime.datetime(2008, 7, 22, 8, 17, 41)
Although it seems that it doesn't do anything with the time zone, merely observing that it exists and is valid:
>>> datetime.datetime.strptime('Tue, 22 Jul 2008 08:17:41 NZDT', format)
datetime.datetime(2008, 7, 22, 8, 17, 41)
I suppose if you wished, you could locate a mapping of offsets to names, convert your input, and then parse it. It might be simpler to just truncate your input, though.

How to preserve timezone when parsing date/time strings with strptime()?

I have a CSV dumpfile from a Blackberry IPD backup, created using IPDDump.
The date/time strings in here look something like this
(where EST is an Australian time-zone):
Tue Jun 22 07:46:22 EST 2010
I need to be able to parse this date in Python. At first, I tried to use the strptime() function from datettime.
>>> datetime.datetime.strptime('Tue Jun 22 12:10:20 2010 EST', '%a %b %d %H:%M:%S %Y %Z')
However, for some reason, the datetime object that comes back doesn't seem to have any tzinfo associated with it.
I did read on this page that apparently datetime.strptime silently discards tzinfo, however, I checked the documentation, and I can't find anything to that effect documented here.
Is there any way to get strptime() to play nicely with timezones?
I recommend using python-dateutil. Its parser has been able to parse every date format I've thrown at it so far.
>>> from dateutil import parser
>>> parser.parse("Tue Jun 22 07:46:22 EST 2010")
datetime.datetime(2010, 6, 22, 7, 46, 22, tzinfo=tzlocal())
>>> parser.parse("Fri, 11 Nov 2011 03:18:09 -0400")
datetime.datetime(2011, 11, 11, 3, 18, 9, tzinfo=tzoffset(None, -14400))
>>> parser.parse("Sun")
datetime.datetime(2011, 12, 18, 0, 0)
>>> parser.parse("10-11-08")
datetime.datetime(2008, 10, 11, 0, 0)
and so on. No dealing with strptime() format nonsense... just throw a date at it and it Does The Right Thing.
The datetime module documentation says:
Return a datetime corresponding to date_string, parsed according to format. This is equivalent to datetime(*(time.strptime(date_string, format)[0:6])).
See that [0:6]? That gets you (year, month, day, hour, minute, second). Nothing else. No mention of timezones.
Interestingly, [Win XP SP2, Python 2.6, 2.7] passing your example to time.strptime doesn't work but if you strip off the " %Z" and the " EST" it does work. Also using "UTC" or "GMT" instead of "EST" works. "PST" and "MEZ" don't work. Puzzling.
It's worth noting this has been updated as of version 3.2 and the same documentation now also states the following:
When the %z directive is provided to the strptime() method, an aware datetime object will be produced. The tzinfo of the result will be set to a timezone instance.
Note that this doesn't work with %Z, so the case is important. See the following example:
In [1]: from datetime import datetime
In [2]: start_time = datetime.strptime('2018-04-18-17-04-30-AEST','%Y-%m-%d-%H-%M-%S-%Z')
In [3]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: None
In [4]: start_time = datetime.strptime('2018-04-18-17-04-30-+1000','%Y-%m-%d-%H-%M-%S-%z')
In [5]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: UTC+10:00
Since strptime returns a datetime object which has tzinfo attribute, We can simply replace it with desired timezone.
>>> import datetime
>>> date_time_str = '2018-06-29 08:15:27.243860'
>>> date_time_obj = datetime.datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S.%f').replace(tzinfo=datetime.timezone.utc)
>>> date_time_obj.tzname()
'UTC'
Your time string is similar to the time format in rfc 2822 (date format in email, http headers). You could parse it using only stdlib:
>>> from email.utils import parsedate_tz
>>> parsedate_tz('Tue Jun 22 07:46:22 EST 2010')
(2010, 6, 22, 7, 46, 22, 0, 1, -1, -18000)
See solutions that yield timezone-aware datetime objects for various Python versions: parsing date with timezone from an email.
In this format, EST is semantically equivalent to -0500. Though, in general, a timezone abbreviation is not enough, to identify a timezone uniquely.
Ran into this exact problem.
What I ended up doing:
# starting with date string
sdt = "20190901"
std_format = '%Y%m%d'
# create naive datetime object
from datetime import datetime
dt = datetime.strptime(sdt, sdt_format)
# extract the relevant date time items
dt_formatters = ['%Y','%m','%d']
dt_vals = tuple(map(lambda formatter: int(datetime.strftime(dt,formatter)), dt_formatters))
# set timezone
import pendulum
tz = pendulum.timezone('utc')
dt_tz = datetime(*dt_vals,tzinfo=tz)

Python time to age, part 2: timezones [duplicate]

This question already has answers here:
Python time to age
(6 answers)
Closed 8 years ago.
Following on from my previous question, Python time to age, I have now come across a problem regarding the timezone, and it turns out that it's not always going to be "+0200". So when strptime tries to parse it as such, it throws up an exception.
I thought about just chopping off the +0200 with [:-6] or whatever, but is there a real way to do this with strptime?
I am using Python 2.5.2 if it matters.
>>> from datetime import datetime
>>> fmt = "%a, %d %b %Y %H:%M:%S +0200"
>>> datetime.strptime("Tue, 22 Jul 2008 08:17:41 +0200", fmt)
datetime.datetime(2008, 7, 22, 8, 17, 41)
>>> datetime.strptime("Tue, 22 Jul 2008 08:17:41 +0300", fmt)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.5/_strptime.py", line 330, in strptime
(data_string, format))
ValueError: time data did not match format: data=Tue, 22 Jul 2008 08:17:41 +0300 fmt=%a, %d %b %Y %H:%M:%S +0200
is there a real way to do this with strptime?
No, but since your format appears to be an RFC822-family date, you can read it much more easily using the email library instead:
>>> import email.utils
>>> email.utils.parsedate_tz('Tue, 22 Jul 2008 08:17:41 +0200')
(2008, 7, 22, 8, 17, 41, 0, 1, 0, 7200)
(7200 = timezone offset from UTC in seconds)
New in version 2.6.
For a naive object, the %z and %Z
format codes are replaced by empty
strings.
It looks like this is implemented only in >= 2.6, and I think you have to manually parse it.
I can't see another solution than to remove the time zone data:
from datetime import timedelta,datetime
try:
offset = int("Tue, 22 Jul 2008 08:17:41 +0300"[-5:])
except:
print "Error"
delta = timedelta(hours = offset / 100)
fmt = "%a, %d %b %Y %H:%M:%S"
time = datetime.strptime("Tue, 22 Jul 2008 08:17:41 +0200"[:-6], fmt)
time -= delta
You can use the dateutil library which is very useful:
from datetime import datetime
from dateutil.parser import parse
dt = parse("Tue, 22 Jul 2008 08:17:41 +0200")
## datetime.datetime(2008, 7, 22, 8, 17, 41, tzinfo=tzoffset(None, 7200)) <- dt
print dt
2008-07-22 08:17:41+02:00
As far as I know, strptime() doesn't recognize numeric time zone codes. If you know that the string is always going to end with a time zone specification of that form (+ or - followed by 4 digits), just chopping it off and parsing it manually seems like a perfectly reasonable thing to do.
It seems that %Z corresponds to time zone names, not offsets.
For example, given:
>>> format = '%a, %d %b %Y %H:%M:%S %Z'
I can parse:
>>> datetime.datetime.strptime('Tue, 22 Jul 2008 08:17:41 GMT', format)
datetime.datetime(2008, 7, 22, 8, 17, 41)
Although it seems that it doesn't do anything with the time zone, merely observing that it exists and is valid:
>>> datetime.datetime.strptime('Tue, 22 Jul 2008 08:17:41 NZDT', format)
datetime.datetime(2008, 7, 22, 8, 17, 41)
I suppose if you wished, you could locate a mapping of offsets to names, convert your input, and then parse it. It might be simpler to just truncate your input, though.

Categories

Resources