Python convert date string to timestamp - python

I need to convert string type Wed, 18 May 2016 11:21:35 GMT to timestamp, in Python. I'm using:
datetime.datetime.strptime(string, format)
But I don't want to specify the format for the date type.

But I don't want to specify the format for the date type.
Then, let the dateutil parser figure that out:
>>> from dateutil.parser import parse
>>> parse("Wed, 18 May 2016 11:21:35 GMT")
datetime.datetime(2016, 5, 18, 11, 21, 35, tzinfo=tzutc())

To parse an RFC 822 time string, as used in email, HTTP, and other internet protocols, you could use the email module from the standard library:
#!/usr/bin/env python
from email.utils import parsedate_tz, mktime_tz
timestamp = mktime_tz(parsedate_tz("Wed, 18 May 2016 11:21:35 GMT"))
See Converting string to datetime object.
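Since the question asks for a timestamp, here is a minimal sketch that goes from the string to Unix epoch seconds using only the standard library. It assumes Python 3.3+ (for parsedate_to_datetime); the helper name http_date_to_timestamp is made up for illustration.

```python
from email.utils import parsedate_to_datetime


def http_date_to_timestamp(text):
    """Return Unix epoch seconds for an RFC 822 style date string."""
    # A string ending in "GMT" yields an aware datetime, so timestamp()
    # does not depend on the machine's local timezone. A string without
    # zone information would yield a naive datetime, which timestamp()
    # interprets as local time.
    return parsedate_to_datetime(text).timestamp()


timestamp = http_date_to_timestamp("Wed, 18 May 2016 11:21:35 GMT")
```

The result is a float; wrap it in int() if whole seconds are wanted.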

Related

Why does Python's datetime strptime() not set timezone when %Z is specified in a string? [duplicate]

I have a CSV dumpfile from a Blackberry IPD backup, created using IPDDump.
The date/time strings in here look something like this
(where EST is an Australian time-zone):
Tue Jun 22 07:46:22 EST 2010
I need to be able to parse this date in Python. At first, I tried to use the strptime() function from datetime.
>>> datetime.datetime.strptime('Tue Jun 22 12:10:20 2010 EST', '%a %b %d %H:%M:%S %Y %Z')
However, for some reason, the datetime object that comes back doesn't seem to have any tzinfo associated with it.
I did read on this page that datetime.strptime apparently discards tzinfo silently; however, I checked the documentation and can't find anything to that effect documented here.
Is there any way to get strptime() to play nicely with timezones?
I recommend using python-dateutil. Its parser has been able to parse every date format I've thrown at it so far.
>>> from dateutil import parser
>>> parser.parse("Tue Jun 22 07:46:22 EST 2010")
datetime.datetime(2010, 6, 22, 7, 46, 22, tzinfo=tzlocal())
>>> parser.parse("Fri, 11 Nov 2011 03:18:09 -0400")
datetime.datetime(2011, 11, 11, 3, 18, 9, tzinfo=tzoffset(None, -14400))
>>> parser.parse("Sun")
datetime.datetime(2011, 12, 18, 0, 0)
>>> parser.parse("10-11-08")
datetime.datetime(2008, 10, 11, 0, 0)
and so on. No dealing with strptime() format nonsense... just throw a date at it and it Does The Right Thing.
The datetime module documentation says:
Return a datetime corresponding to date_string, parsed according to format. This is equivalent to datetime(*(time.strptime(date_string, format)[0:6])).
See that [0:6]? That gets you (year, month, day, hour, minute, second). Nothing else. No mention of timezones.
Interestingly, [Win XP SP2, Python 2.6, 2.7] passing your example to time.strptime doesn't work but if you strip off the " %Z" and the " EST" it does work. Also using "UTC" or "GMT" instead of "EST" works. "PST" and "MEZ" don't work. Puzzling.
It's worth noting this has been updated as of version 3.2 and the same documentation now also states the following:
When the %z directive is provided to the strptime() method, an aware datetime object will be produced. The tzinfo of the result will be set to a timezone instance.
Note that this doesn't work with %Z, so the case is important. See the following example:
In [1]: from datetime import datetime
In [2]: start_time = datetime.strptime('2018-04-18-17-04-30-AEST','%Y-%m-%d-%H-%M-%S-%Z')
In [3]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: None
In [4]: start_time = datetime.strptime('2018-04-18-17-04-30-+1000','%Y-%m-%d-%H-%M-%S-%z')
In [5]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: UTC+10:00
Since strptime returns a datetime object, which has a tzinfo attribute, we can simply replace it with the desired timezone.
>>> import datetime
>>> date_time_str = '2018-06-29 08:15:27.243860'
>>> date_time_obj = datetime.datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S.%f').replace(tzinfo=datetime.timezone.utc)
>>> date_time_obj.tzname()
'UTC'
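One caveat worth spelling out: replace(tzinfo=...) only labels the parsed wall-clock time, it does not shift it. The sketch below contrasts it with astimezone(), which converts an already aware datetime. The +10:00 offset is an arbitrary choice for illustration.

```python
from datetime import datetime, timedelta, timezone

naive = datetime.strptime("2018-06-29 08:15:27.243860", "%Y-%m-%d %H:%M:%S.%f")

# replace() attaches UTC to the existing wall-clock time; the hour stays 8.
as_utc = naive.replace(tzinfo=timezone.utc)

# astimezone() converts an aware datetime to another zone:
# same instant, different wall-clock reading.
aest = timezone(timedelta(hours=10))  # fixed +10:00 offset, illustrative only
in_aest = as_utc.astimezone(aest)
```

So replace() is only appropriate when you already know which zone the string was written in.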
Your time string is similar to the time format in RFC 2822 (the date format in email and HTTP headers). You could parse it using only the stdlib:
>>> from email.utils import parsedate_tz
>>> parsedate_tz('Tue Jun 22 07:46:22 EST 2010')
(2010, 6, 22, 7, 46, 22, 0, 1, -1, -18000)
See solutions that yield timezone-aware datetime objects for various Python versions: parsing date with timezone from an email.
In this format, EST is semantically equivalent to -0500. In general, though, a timezone abbreviation is not enough to identify a timezone uniquely.
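As a rough sketch of how the tuple above could be turned into a timezone-aware datetime on Python 3: the last element is the UTC offset in seconds, which can be wrapped in a fixed-offset timezone. The function name parse_with_offset is hypothetical, and the EST to -0500 mapping is the one email.utils applies, not the Australian meaning of EST.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_tz


def parse_with_offset(text):
    """Parse an RFC 2822-like date into a datetime, keeping its UTC offset."""
    parts = parsedate_tz(text)
    if parts is None:
        raise ValueError("unparseable date: %r" % (text,))
    naive = datetime(*parts[:6])
    offset_seconds = parts[9]  # None when the string carries no usable zone
    if offset_seconds is None:
        return naive
    return naive.replace(tzinfo=timezone(timedelta(seconds=offset_seconds)))


dt = parse_with_offset("Tue Jun 22 07:46:22 EST 2010")
```

If the abbreviation means something else in your data, you would need your own abbreviation-to-offset mapping rather than the built-in one.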
Ran into this exact problem.
What I ended up doing:
# starting with date string
sdt = "20190901"
sdt_format = '%Y%m%d'
# create naive datetime object
from datetime import datetime
dt = datetime.strptime(sdt, sdt_format)
# extract the relevant date time items
dt_formatters = ['%Y','%m','%d']
dt_vals = tuple(map(lambda formatter: int(datetime.strftime(dt,formatter)), dt_formatters))
# set timezone
import pendulum
tz = pendulum.timezone('utc')
dt_tz = datetime(*dt_vals,tzinfo=tz)

Turn a String into a Python Date object

I have a String, Sent: Fri Sep 18 00:30:12 2009 that I want to turn into a Python date object.
I know there's a strptime() function that can be used like so:
>>> dt_str = '9/24/2010 5:03:29 PM'
>>> dt_obj = datetime.strptime(dt_str, '%m/%d/%Y %I:%M:%S %p')
>>> dt_obj
datetime.datetime(2010, 9, 24, 17, 3, 29)
Can anybody think of an easier way to accomplish this than going through a bunch of conditionals to parse out if Sep, month = 9?
To parse an RFC 822-like date-time string, you could use the email package from the standard library:
>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('Fri Sep 18 00:30:12 2009')
datetime.datetime(2009, 9, 18, 0, 30, 12)
This is Python 3 code, see Python 2.6+ compatible code.
You could also provide the explicit format string:
>>> from datetime import datetime
>>> datetime.strptime('Fri Sep 18 00:30:12 2009', '%a %b %d %H:%M:%S %Y')
datetime.datetime(2009, 9, 18, 0, 30, 12)
See the table with the format codes.
Use the python-dateutil library!
First, pip install python-dateutil (into your virtualenv, if you have one); then you can run the following code:
from dateutil import parser
s = u'Sent: Fri Sep 18 00:30:12 2009'
date = parser.parse(s.split(':', 1)[-1])
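If installing a third-party package is not an option, the same prefix-stripping idea works with strptime. This is only a sketch: parse_sent_line and its prefix parameter are names invented here, and %a / %b match English names only under the default C locale.

```python
from datetime import datetime


def parse_sent_line(line, prefix="Sent:"):
    """Strip an optional leading prefix, then parse the remaining date text."""
    text = line.strip()
    if text.startswith(prefix):
        text = text[len(prefix):].strip()
    # Format for strings such as "Fri Sep 18 00:30:12 2009".
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


sent = parse_sent_line("Sent: Fri Sep 18 00:30:12 2009")
```

Call sent.date() if only the date part is needed.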

Reformatting Dates Python

I have a file with dates in a few different formats and am trying to get them all into YYYYMMDD format in Python. Most of the dates are in the below format:
Mon, 01 Jul 2013 16:33:59 GMT
and I have no idea how to get them into
20130701
I apologize if this is a pretty simple question---I am sort of new to python
EDIT: I am trying to do this for ANY given date. I used the 01 July as an example and in retrospect made it seem like I was asking a different question. So I guess I am looking for something that can both find dates and reformat them
Use the python-dateutil library:
from dateutil import parser
dtobject = parser.parse(datestring)
The dateutil.parser.parse() function recognises a wide variety of date formats and returns a datetime.datetime object.
Use the datetime.strftime() method if you want to format the result as a (uniform) string again:
dtobject.strftime('%Y%m%d')
Demo:
>>> from dateutil import parser
>>> parser.parse('Mon, 01 Jul 2013 16:33:59 GMT')
datetime.datetime(2013, 7, 1, 16, 33, 59, tzinfo=tzlocal())
>>> parser.parse('Mon, 01 Jul 2013 16:33:59 GMT').strftime('%Y%m%d')
'20130701'
This can also be achieved the following way:
import datetime
x = 'Mon, 01 Jul 2013 16:33:59 GMT'
''.join(str(datetime.datetime.strptime(x, '%a, %d %b %Y %H:%M:%S %Z').date()).split('-'))
If any other element is introduced in your date string, you can include the corresponding directive. For example, %p is the locale's equivalent of either AM or PM.
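For the "find dates and reformat them" part of the question, one possible approach is a regular expression that locates dates of the shape shown above and rewrites each match in place. This is a sketch under a narrow assumption: only the "Mon, 01 Jul 2013 16:33:59 GMT" shape (or a numeric offset instead of GMT) is recognised. The pattern and the name reformat_dates are illustrative, not from any library.

```python
import re
from email.utils import parsedate_to_datetime

# Matches e.g. "Mon, 01 Jul 2013 16:33:59 GMT" or "... +0200".
RFC822_DATE = re.compile(
    r"[A-Z][a-z]{2}, \d{1,2} [A-Z][a-z]{2} \d{4} "
    r"\d{2}:\d{2}:\d{2} (?:GMT|[+-]\d{4})"
)


def reformat_dates(text):
    """Replace every recognised date in text with its YYYYMMDD form."""
    def to_compact(match):
        return parsedate_to_datetime(match.group(0)).strftime("%Y%m%d")
    return RFC822_DATE.sub(to_compact, text)


result = reformat_dates("fetched Mon, 01 Jul 2013 16:33:59 GMT from cache")
```

Other formats in the file would each need their own pattern, or a general parser such as dateutil.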

How do I parse an HTTP date-string in Python?

Is there an easy way to parse HTTP date-strings in Python? According to the standard, there are several ways to format HTTP date strings; the method should be able to handle this.
In other words, I want to convert a string like "Wed, 23 Sep 2009 22:15:29 GMT" to a python time-structure.
>>> import email.utils, datetime
>>> email.utils.parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, -1)
If you want a datetime.datetime object, you can do:
# Python <3.3
def my_parsedate(text):
    return datetime.datetime(*email.utils.parsedate(text)[:6])
# Python ≥3.3
def my_parsedate(text):
    return email.utils.parsedate_to_datetime(text)
email.utils.parsedate
Attempts to parse a date according to the rules in RFC 2822. However, some mailers don’t follow that format as specified, so parsedate() tries to guess correctly in such cases. date is a string containing an RFC 2822 date, such as "Mon, 20 Nov 1995 19:12:08 -0500". If it succeeds in parsing the date, parsedate() returns a 9-tuple that can be passed directly to time.mktime(); otherwise None will be returned. Note that indexes 6, 7, and 8 of the result tuple are not usable.
email.utils.parsedate_to_datetime
The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime; otherwise ValueError is raised if date contains an invalid value such as an hour greater than 23 or a timezone offset not between -24 and 24 hours. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding timezone tzinfo.
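The two failure modes described above (None from parsedate(), an exception from parsedate_to_datetime()) can be folded into one helper. A minimal sketch, assuming Python 3.3+; the name safe_parse_http_date is invented here. TypeError is caught as well because older Python 3 releases raised it for input that could not be parsed at all.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def safe_parse_http_date(text):
    """Return a datetime for text, or None if it cannot be parsed."""
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return None


good = safe_parse_http_date("Wed, 23 Sep 2009 22:15:29 GMT")
bad = safe_parse_http_date("not a date")
# A -0000 zone gives a naive datetime, as the documentation above describes.
naive = safe_parse_http_date("Mon, 20 Nov 1995 19:12:08 -0000")
```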
Since Python 3.3 there's email.utils.parsedate_to_datetime which can parse RFC 5322 timestamps (aka IMF-fixdate, Internet Message Format fixed length format, a subset of HTTP-date of RFC 7231).
>>> from email.utils import parsedate_to_datetime
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> parsedate_to_datetime(s)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
There's also undocumented http.cookiejar.http2time which can achieve the same as follows:
>>> from datetime import datetime, timezone
>>> from http.cookiejar import http2time
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> datetime.fromtimestamp(http2time(s), tz=timezone.utc)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
It was introduced in Python 2.4 as cookielib.http2time for dealing with Cookie Expires directive which is expressed in the same format.
>>> import datetime
>>> datetime.datetime.strptime('Wed, 23 Sep 2009 22:15:29 GMT', '%a, %d %b %Y %H:%M:%S GMT')
datetime.datetime(2009, 9, 23, 22, 15, 29)
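The single format string above covers only the preferred form. RFC 7231 lists three HTTP-date forms (IMF-fixdate, the obsolete RFC 850 form, and the asctime form), so a sketch that tries each in turn might look like this. The names HTTP_DATE_FORMATS and parse_http_date are made up for illustration, and the day and month directives assume the default C locale.

```python
from datetime import datetime, timezone

HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # IMF-fixdate: Sun, 06 Nov 1994 08:49:37 GMT
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850:     Sunday, 06-Nov-94 08:49:37 GMT
    "%a %b %d %H:%M:%S %Y",       # asctime:     Sun Nov  6 08:49:37 1994
)


def parse_http_date(text):
    """Try each HTTP-date form in turn; HTTP dates are always in UTC."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("not a recognised HTTP date: %r" % (text,))


fixdate = parse_http_date("Sun, 06 Nov 1994 08:49:37 GMT")
```

Attaching UTC with replace() is safe here because all three forms are defined to be in GMT.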
httplib.HTTPMessage(filehandle).getdate(headername)
httplib.HTTPMessage(filehandle).getdate_tz(headername)
mimetools.Message(filehandle).getdate()
rfc822.parsedate(datestr)
rfc822.parsedate_tz(datestr)
If you have a raw data stream, you can build an HTTPMessage or a mimetools.Message from it. It may offer additional help while querying the response object for information.
If you are using urllib2, you already have an HTTPMessage object hidden in the file handle returned by urlopen.
It can probably parse many date formats.
httplib is in the core.
NOTE:
I had a look at the implementation: HTTPMessage inherits from mimetools.Message, which inherits from rfc822.Message. Two module-level functions may be of interest to you, parsedate and parsedate_tz (both in rfc822).
parsedate(_tz) from email.utils has a different implementation, although it looks much the same.
You can do this if you only have that piece of string and want to parse it:
>>> from rfc822 import parsedate, parsedate_tz
>>> parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
>>>
but let me exemplify through mime messages:
>>> import mimetools
>>> import StringIO
>>> m = mimetools.Message(
...     StringIO.StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> m
<mimetools.Message instance at 0x7fc259146710>
>>> m.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
or via http messages (responses)
>>> from httplib import HTTPMessage
>>> from StringIO import StringIO
>>> http_response = HTTPMessage(StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> #http_response can be grabbed via urllib2.urlopen(url).info(), right?
>>> http_response.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
right?
>>> import urllib2
>>> urllib2.urlopen('https://fw.io/').info().getdate('Date')
(2014, 2, 19, 18, 53, 26, 0, 1, 0)
There, now we know more about date formats, MIME messages, mimetools and their Pythonic implementation ;-)
Whatever the case, this looks better than using email.utils for parsing HTTP headers.
