Parsing datetime in Python..? - python

I have a system (developed in Python) that accepts datetimes as strings in various formats, and I have to parse them. Currently the datetime string formats are:
Fri Sep 25 18:09:49 -0500 2009
2008-06-29T00:42:18.000Z
2011-07-16T21:46:39Z
1294989360
Now I want a generic parser that can convert any of these formats into an appropriate datetime object.
Otherwise, I have to parse them individually, so please also provide a method for parsing each format individually if there is no generic parser.

As @TimPietzcker suggested, the dateutil package is the way to go. It handles the first three formats correctly and automatically:
>>> from dateutil.parser import parse
>>> parse("Fri Sep 25 18:09:49 -0500 2009")
datetime.datetime(2009, 9, 25, 18, 9, 49, tzinfo=tzoffset(None, -18000))
>>> parse("2008-06-29T00:42:18.000Z")
datetime.datetime(2008, 6, 29, 0, 42, 18, tzinfo=tzutc())
>>> parse("2011-07-16T21:46:39Z")
datetime.datetime(2011, 7, 16, 21, 46, 39, tzinfo=tzutc())
It seems to trip over the Unix timestamp format, but luckily the standard datetime.datetime is up to the task:
>>> from datetime import datetime
>>> datetime.utcfromtimestamp(float("1294989360"))
datetime.datetime(2011, 1, 14, 7, 16)
It is rather easy to make a function out of this that handles all 4 formats:
from dateutil.parser import parse
from datetime import datetime
def parse_time(s):
    try:
        ret = parse(s)
    except (ValueError, OverflowError):
        # not a date string dateutil understands; treat it as a Unix timestamp
        ret = datetime.utcfromtimestamp(float(s))
    return ret
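If dateutil is not available, the four formats from the question can also be parsed individually with the standard library. The following is only a sketch under some assumptions: KNOWN_FORMATS and parse_known are hypothetical names, Python 3.7+ is assumed (so that %z accepts a literal "Z"), and the default C locale is assumed for the English day and month names.

```python
from datetime import datetime, timezone

# Hypothetical list covering the string formats named in the question.
KNOWN_FORMATS = (
    "%a %b %d %H:%M:%S %z %Y",  # Fri Sep 25 18:09:49 -0500 2009
    "%Y-%m-%dT%H:%M:%S.%f%z",   # 2008-06-29T00:42:18.000Z
    "%Y-%m-%dT%H:%M:%S%z",      # 2011-07-16T21:46:39Z
)

def parse_known(s):
    """Return a timezone-aware datetime, or raise ValueError if nothing matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    # Last resort: interpret the string as seconds since the Unix epoch (UTC).
    return datetime.fromtimestamp(float(s), tz=timezone.utc)
```

Unlike the dateutil version above, this returns an aware datetime for the Unix timestamp as well, so all four results can be compared with each other directly.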

You should look into the dateutil package.

Related

Convert string to datetime python with milli-seconds

Problem: I have the following string '2021-03-10T09:58:17.027323+00:00' which I want to convert to datetime. I have difficulties with the format. This is what I tried so far:
datetime.strptime('2021-03-10T09:58:17.027323+00:00', "%Y-%m-%dT%H:%M:%S.z")
Any help is highly appreciated!
The correct format string is "%Y-%m-%dT%H:%M:%S.%f%z"
>>> from datetime import datetime
>>> datetime.strptime('2021-03-10T09:58:17.027323+00:00', "%Y-%m-%dT%H:%M:%S.%f%z")
datetime.datetime(2021, 3, 10, 9, 58, 17, 27323, tzinfo=datetime.timezone.utc)
>>> datetime.fromisoformat('2021-03-10T09:58:17.027323+00:00')
datetime.datetime(2021, 3, 10, 9, 58, 17, 27323, tzinfo=datetime.timezone.utc)
But as mentioned in the comments, it is better to use fromisoformat() (available since Python 3.7).
Given that your string format is known in advance and you are not working with the current time, I think you can use the following code:
import datetime
date_time_str = '2018-06-29 08:15:27.243860'
date_time_obj = datetime.datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S.%f')
print('Date:', date_time_obj.date())
print('Time:', date_time_obj.time())
print('Date-time:', date_time_obj)

Changing string to UTC to dateTime object

I have time = '2020-06-24T13:30:00-04:00'. How can I change it to a datetime object in UTC? I would prefer not to use pd.Timestamp(time).tz_convert("UTC").to_pydatetime() because it returns an odd-looking result: datetime.datetime(2020, 6, 24, 17, 30, tzinfo=<UTC>). As a result, when I check for equality with datetime.datetime(2020, 6, 24, 17, 30), it returns False.
Edit:
import datetime
import pytz
time = '2020-06-24T13:30:00-04:00'
dt = datetime.datetime(2020, 6, 24, 17, 30)
print("dt: ",dt)
so = datetime.datetime.strptime(time, '%Y-%m-%dT%H:%M:%S%z').astimezone(pytz.utc)
print("so:",so)
print(dt == so)
outputs
dt: 2020-06-24 17:30:00
so: 2020-06-24 17:30:00+00:00
False
How can I get it to properly evaluate to True?
#1 Since your string is ISO 8601 compatible, use fromisoformat() on Python 3.7+:
from datetime import datetime, timezone
s = '2020-06-24T13:30:00-04:00'
dtobj = datetime.fromisoformat(s)
# dtobj
# datetime.datetime(2020, 6, 24, 13, 30, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000)))
Note that this will give you a timezone-aware datetime object; the tzinfo property is a UTC offset. You can easily convert that to UTC using astimezone():
dtobj_utc = dtobj.astimezone(timezone.utc)
# dtobj_utc
# datetime.datetime(2020, 6, 24, 17, 30, tzinfo=datetime.timezone.utc)
#2 You can achieve the same with strptime (also Python 3.7+, where %z accepts an offset written with a colon):
dtobj = datetime.strptime(s, '%Y-%m-%dT%H:%M:%S%z')
dtobj_utc = dtobj.astimezone(timezone.utc)
# dtobj_utc
# datetime.datetime(2020, 6, 24, 17, 30, tzinfo=datetime.timezone.utc)
#3 If you want to turn the result into a naive datetime object, i.e. remove the tzinfo property, replace with None:
dtobj_utc_naive = dtobj_utc.replace(tzinfo=None)
# dtobj_utc_naive
# datetime.datetime(2020, 6, 24, 17, 30)
#4 For older Python versions, you should be able to use dateutil's parser:
from dateutil import parser
dtobj = parser.parse(s)
dtobj_utc = dtobj.astimezone(timezone.utc)
dtobj_utc_naive = dtobj_utc.replace(tzinfo=None)
# dtobj_utc_naive
# datetime.datetime(2020, 6, 24, 17, 30)
Alright, my previous answer missed the point because I did not fully understand your issue, so I am rewriting it. Your problem is that you are constructing a datetime object from a string, and that object is timezone-aware (UTC). However, when you create a datetime object directly in Python, e.g. dt = datetime.datetime(2020, 6, 24, 17, 30), it carries no timezone information (you can check this with .tzinfo). All you need to do is make dt timezone-aware when you first create it. See my code snippet below.
import datetime
time = '2020-06-24T13:30:00-04:00'
dt = datetime.datetime(2020, 6, 24, 17, 30, tzinfo=datetime.timezone.utc)
print("dt: ",dt.tzinfo)
so = datetime.datetime.strptime(time, '%Y-%m-%dT%H:%M:%S%z')
print("so:",so.tzinfo)
print(dt == so)
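To make the comparison rules explicit, here is a small standard-library sketch (variable names are illustrative only). It shows that two aware datetimes compare by the instant they represent, regardless of offset, while an aware and a naive datetime never compare equal.

```python
from datetime import datetime, timezone

s = "2020-06-24T13:30:00-04:00"
parsed = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S%z")  # aware, offset -04:00

expected_aware = datetime(2020, 6, 24, 17, 30, tzinfo=timezone.utc)
expected_naive = datetime(2020, 6, 24, 17, 30)

# Aware vs. aware: compared as instants, so different offsets are fine.
same_instant = parsed == expected_aware

# Aware vs. naive: never equal, even if the wall-clock fields match.
naive_vs_aware = parsed == expected_naive

# Convert to UTC and drop tzinfo to compare against a naive UTC value.
normalized = parsed.astimezone(timezone.utc).replace(tzinfo=None) == expected_naive

print(same_instant, naive_vs_aware, normalized)
```

Either make both sides aware, or convert to UTC and strip tzinfo on both sides; mixing the two is what produces the unexpected False.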

Parsing date, time and zone to UTC datetime object

As part of a logging system, I would like to parse a string timestamp coming from a Cisco device, which has the following format:
# show clock
16:26:19.990 GMT+1 Wed Sep 11 2013
The parsing result should be a UTC datetime instance which will be stored in a SQLite database, thus the need for a timezone conversion.
Using just datetime.strptime is not enough, because the %Z directive only recognises local timezones (i.e. those related to the current $LANG or $LC_* environment). Therefore, I need to make use of the pytz package.
Because the format is always the same, I can do something like the following:
import pytz
from datetime import datetime
s = '16:26:19.990 CEST Wed Sep 11 2013'
tm, tz, dt = s.split(" ", 2)
naive = datetime.strptime("%s %s" % (tm, dt), "%H:%M:%S.%f %a %b %d %Y")
aware = naive.replace(tzinfo=pytz.timezone(tz))
universal = aware.astimezone(pytz.UTC)
This, however, does not work without some modifications. The value of tz must be corrected to a name that is recognized by pytz. In the example, pytz.timezone('CEST') raises an UnknownTimezoneError because the real timezone is CET. The problem is that the daylight savings correction is not applied then:
>>> from datetime import datetime
>>> from pytz import UTC, timezone
>>> a = datetime.strptime('16:18:57.925 Wed Sep 11 2013', '%H:%M:%S.%f %a %b %d %Y')
>>> b = a.replace(tzinfo=timezone('CET'))
>>> a
datetime.datetime(2013, 9, 11, 16, 18, 57, 925000)
>>> b
datetime.datetime(2013, 9, 11, 16, 18, 57, 925000, tzinfo=<DstTzInfo 'CET' CET+1:00:00 STD>)
>>> b.astimezone(UTC)
datetime.datetime(2013, 9, 11, 15, 18, 57, 925000, tzinfo=<UTC>)
Using normalize does not seem to help:
>>> timezone('CET').normalize(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/etanol/virtualenvs/plexus/local/lib/python2.7/site-packages/pytz/tzinfo.py", line 235, in normalize
raise ValueError('Naive time - no tzinfo set')
ValueError: Naive time - no tzinfo set
>>> timezone('CET').normalize(b)
datetime.datetime(2013, 9, 11, 17, 18, 57, 925000, tzinfo=<DstTzInfo 'CET' CEST+2:00:00 DST>)
I don't really know what I am missing, but the desired result is:
datetime.datetime(2013, 9, 11, 14, 18, 57, 925000, tzinfo=<UTC>)
Thanks in advance.
Use timezone.localize instead of replace, so that pytz picks the correct DST offset for the given date:
>>> from datetime import datetime
>>> from pytz import UTC, timezone
>>>
>>> CET = timezone('CET')
>>>
>>> a = datetime.strptime('16:18:57.925 Wed Sep 11 2013', '%H:%M:%S.%f %a %b %d %Y')
>>> print(CET.localize(a).astimezone(UTC))
2013-09-11 14:18:57.925000+00:00

How to convert the following string to python date?

How can I convert: u'2012-11-07T13:25:10.703Z' to Python datetime?
EDIT
I intend to use something like this:
>>> from datetime import datetime
>>> datetime.strptime('2011-03-07','%Y-%m-%d')
datetime.datetime(2011, 3, 7, 0, 0)
but how can I change the second argument to accommodate my date format?
Use datetime.datetime.strptime:
datetime.datetime.strptime(u'2012-11-07T13:25:10.703Z', '%Y-%m-%dT%H:%M:%S.%fZ')
Result:
datetime.datetime(2012, 11, 7, 13, 25, 10, 703000)
See the description of the strptime behaviour.
Use strptime from the datetime module:
>>> from datetime import datetime
>>> datetime.strptime(u'2012-11-07T13:25:10.703Z', '%Y-%m-%dT%H:%M:%S.%fZ')
datetime.datetime(2012, 11, 7, 13, 25, 10, 703000)
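Note that matching the trailing "Z" as a literal character yields a naive datetime, even though "Z" means UTC. A minimal sketch of two ways to get an aware result instead, assuming Python 3.7+ for the %z variant (variable names are illustrative):

```python
from datetime import datetime, timezone

s = "2012-11-07T13:25:10.703Z"

# Literal "Z" in the format: parses fine, but the result has no tzinfo.
naive = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ")

# Option 1: attach UTC explicitly, since "Z" denotes UTC.
aware = naive.replace(tzinfo=timezone.utc)

# Option 2 (Python 3.7+): let %z interpret "Z" as +00:00.
aware_z = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z")

print(naive.tzinfo, aware.isoformat(), aware == aware_z)
```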

How do I parse an HTTP date-string in Python?

Is there an easy way to parse HTTP date-strings in Python? According to the standard, there are several ways to format HTTP date strings; the method should be able to handle this.
In other words, I want to convert a string like "Wed, 23 Sep 2009 22:15:29 GMT" to a python time-structure.
>>> import email.utils, datetime
>>> email.utils.parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, -1)
If you want a datetime.datetime object, you can do:
# Python <3.3
def my_parsedate(text):
    return datetime.datetime(*email.utils.parsedate(text)[:6])
# Python ≥3.3
def my_parsedate(text):
    return email.utils.parsedate_to_datetime(text)
email.utils.parsedate
Attempts to parse a date according to the rules in RFC 2822. However, some mailers don’t follow that format as specified, so parsedate() tries to guess correctly in such cases. date is a string containing an RFC 2822 date, such as "Mon, 20 Nov 1995 19:12:08 -0500". If it succeeds in parsing the date, parsedate() returns a 9-tuple that can be passed directly to time.mktime(); otherwise None will be returned. Note that indexes 6, 7, and 8 of the result tuple are not usable.
email.utils.parsedate_to_datetime
The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime; otherwise ValueError is raised if date contains an invalid value such as an hour greater than 23 or a timezone offset not between -24 and 24 hours. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding timezone tzinfo.
Since Python 3.3 there's email.utils.parsedate_to_datetime which can parse RFC 5322 timestamps (aka IMF-fixdate, Internet Message Format fixed length format, a subset of HTTP-date of RFC 7231).
>>> from email.utils import parsedate_to_datetime
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> parsedate_to_datetime(s)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
There's also the undocumented http.cookiejar.http2time, which can achieve the same as follows:
>>> from datetime import datetime, timezone
>>> from http.cookiejar import http2time
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> datetime.utcfromtimestamp(http2time(s)).replace(tzinfo=timezone.utc)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
It was introduced in Python 2.4 as cookielib.http2time for dealing with the Cookie Expires directive, which is expressed in the same format.
>>> import datetime
>>> datetime.datetime.strptime('Wed, 23 Sep 2009 22:15:29 GMT', '%a, %d %b %Y %H:%M:%S GMT')
datetime.datetime(2009, 9, 23, 22, 15, 29)
httplib.HTTPMessage(filehandle).getdate(headername)
httplib.HTTPMessage(filehandle).getdate_tz(headername)
mimetools.Message(filehandle).getdate()
rfc822.parsedate(datestr)
rfc822.parsedate_tz(datestr)
If you have a raw data stream, you can build an HTTPMessage or a mimetools.Message from it. It may offer additional help while querying the response object for information.
If you are using urllib2, you already have an HTTPMessage object hidden in the file handle returned by urlopen.
It can probably parse many date formats.
httplib is in the core (standard library).
NOTE:
I had a look at the implementation: HTTPMessage inherits from mimetools.Message, which inherits from rfc822.Message. Two module-level functions in rfc822 may be of interest: parsedate and parsedate_tz.
parsedate(_tz) from email.utils has a different implementation, although it looks much the same.
You can do this (Python 2 only; rfc822, mimetools, httplib and urllib2 no longer exist under those names in Python 3) if you only have the date string and want to parse it:
>>> from rfc822 import parsedate, parsedate_tz
>>> parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
>>>
But let me illustrate with MIME messages:
>>> import mimetools
>>> import StringIO
>>> m = mimetools.Message(
...     StringIO.StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> m
<mimetools.Message instance at 0x7fc259146710>
>>> m.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
Or via HTTP messages (responses):
>>> from httplib import HTTPMessage
>>> from StringIO import StringIO
>>> http_response = HTTPMessage(StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> #http_response can be grabbed via urllib2.urlopen(url).info(), right?
>>> http_response.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
Right:
>>> import urllib2
>>> urllib2.urlopen('https://fw.io/').info().getdate('Date')
(2014, 2, 19, 18, 53, 26, 0, 1, 0)
There, now we know more about date formats, MIME messages, mimetools and their Pythonic implementation ;-)
Whatever the case, this looks better than using email.utils for parsing HTTP headers.
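For readers on Python 3, where the modules used in this answer are no longer available, a rough equivalent of the header-block example can be sketched with the email package. This is only an illustration under the assumption that the headers are already available as a string; raw_headers, msg and when are made-up names.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A minimal header block, as in the mimetools example above.
raw_headers = "Date: Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n"

# Parse the header block into a message object and read the Date header.
msg = message_from_string(raw_headers)

# Convert the header value into an aware datetime (GMT maps to UTC).
when = parsedate_to_datetime(msg["Date"])

print(when.isoformat())
```

This goes through email.utils after all, so whether it "looks better" is a matter of taste; it is simply what remains in the standard library.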
