Python time to age, part 2: timezones [duplicate] - python

Following on from my previous question, Python time to age, I have now come across a problem regarding the timezone, and it turns out that it's not always going to be "+0200". So when strptime tries to parse it as such, it throws up an exception.
I thought about just chopping off the +0200 with [:-6] or whatever, but is there a real way to do this with strptime?
I am using Python 2.5.2 if it matters.
>>> from datetime import datetime
>>> fmt = "%a, %d %b %Y %H:%M:%S +0200"
>>> datetime.strptime("Tue, 22 Jul 2008 08:17:41 +0200", fmt)
datetime.datetime(2008, 7, 22, 8, 17, 41)
>>> datetime.strptime("Tue, 22 Jul 2008 08:17:41 +0300", fmt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/_strptime.py", line 330, in strptime
    (data_string, format))
ValueError: time data did not match format: data=Tue, 22 Jul 2008 08:17:41 +0300 fmt=%a, %d %b %Y %H:%M:%S +0200

is there a real way to do this with strptime?
No, but since your format appears to be an RFC822-family date, you can read it much more easily using the email library instead:
>>> import email.utils
>>> email.utils.parsedate_tz('Tue, 22 Jul 2008 08:17:41 +0200')
(2008, 7, 22, 8, 17, 41, 0, 1, 0, 7200)
(7200 = timezone offset from UTC in seconds)
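To go from that tuple to a datetime, one option is email.utils.mktime_tz, which applies the offset and returns a UTC timestamp. The following is a minimal sketch written for Python 3 (not the Python 2.5 from the question); the variable names are only illustrative.

```python
from datetime import datetime, timezone
from email.utils import mktime_tz, parsedate_tz

raw = "Tue, 22 Jul 2008 08:17:41 +0300"

parsed = parsedate_tz(raw)          # 10-tuple; the last item is the offset in seconds
offset_seconds = parsed[9]          # 10800 for "+0300"
utc_timestamp = mktime_tz(parsed)   # seconds since the epoch, offset already applied
utc_dt = datetime.fromtimestamp(utc_timestamp, tz=timezone.utc)

print(offset_seconds)  # 10800
print(utc_dt)          # 2008-07-22 05:17:41+00:00
```

Because the offset is read from the string itself, the same code handles +0200, +0300, or any other numeric offset.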

New in version 2.6. For a naive object, the %z and %Z format codes are replaced by empty strings.
It looks like this is implemented only in >= 2.6, and I think you have to manually parse it.
I can't see another solution than to remove the time zone data:
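from datetime import timedelta, datetime

s = "Tue, 22 Jul 2008 08:17:41 +0300"
fmt = "%a, %d %b %Y %H:%M:%S"
try:
    # The last five characters are the offset, e.g. "+0300"
    sign = -1 if s[-5] == "-" else 1
    delta = sign * timedelta(hours=int(s[-4:-2]), minutes=int(s[-2:]))
except ValueError:
    raise ValueError("unrecognised UTC offset in %r" % s)
# Parse the part before the offset, then shift the result to UTC
time = datetime.strptime(s[:-6], fmt)
time -= delta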

You can use the dateutil library which is very useful:
from datetime import datetime
from dateutil.parser import parse
dt = parse("Tue, 22 Jul 2008 08:17:41 +0200")
## datetime.datetime(2008, 7, 22, 8, 17, 41, tzinfo=tzoffset(None, 7200)) <- dt
print dt
2008-07-22 08:17:41+02:00

As far as I know, strptime() doesn't recognize numeric time zone codes. If you know that the string is always going to end with a time zone specification of that form (+ or - followed by 4 digits), just chopping it off and parsing it manually seems like a perfectly reasonable thing to do.

It seems that %Z corresponds to time zone names, not offsets.
For example, given:
>>> format = '%a, %d %b %Y %H:%M:%S %Z'
I can parse:
>>> datetime.datetime.strptime('Tue, 22 Jul 2008 08:17:41 GMT', format)
datetime.datetime(2008, 7, 22, 8, 17, 41)
Although it seems that it doesn't do anything with the time zone, merely observing that it exists and is valid:
>>> datetime.datetime.strptime('Tue, 22 Jul 2008 08:17:41 NZDT', format)
datetime.datetime(2008, 7, 22, 8, 17, 41)
I suppose if you wished, you could locate a mapping of offsets to names, convert your input, and then parse it. It might be simpler to just truncate your input, though.

Related

Turn a String into a Python Date object

I have a String, Sent: Fri Sep 18 00:30:12 2009 that I want to turn into a Python date object.
I know there's a strptime() function that can be used like so:
>>> dt_str = '9/24/2010 5:03:29 PM'
>>> dt_obj = datetime.strptime(dt_str, '%m/%d/%Y %I:%M:%S %p')
>>> dt_obj
datetime.datetime(2010, 9, 24, 17, 3, 29)
Can anybody think of an easier way to accomplish this than going through a bunch of conditionals to parse out if Sep, month = 9?
To parse an RFC 822-like date-time string, you could use the email package from the standard library:
>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('Fri Sep 18 00:30:12 2009')
datetime.datetime(2009, 9, 18, 0, 30, 12)
This is Python 3 code, see Python 2.6+ compatible code.
You could also provide the explicit format string:
>>> from datetime import datetime
>>> datetime.strptime('Fri Sep 18 00:30:12 2009', '%a %b %d %H:%M:%S %Y')
datetime.datetime(2009, 9, 18, 0, 30, 12)
See the table with the format codes.
Use the python-dateutil library!
First, run pip install python-dateutil (inside your virtualenv, if you have one); then you can run the following code:
from dateutil import parser
s = u'Sent: Fri Sep 18 00:30:12 2009'
date = parser.parse(s.split(':', 1)[-1])

Python Error Handling concerning datetime and time

I have this variable called pubdate which is derived from rss feeds. Most of the time it's a time tuple which is what I want it to be, so there are no errors.
Sometimes it's a unicode string, that's where it gets annoying.
So far, I have this following code concerning pubdate when it is a unicode string:
if isinstance(pubdate, unicode):
    try:
        pubdate = time.mktime(datetime.strptime(pubdate, '%d/%m/%Y %H:%M:%S').timetuple()) # turn the string into a unix timestamp
    except ValueError:
        pubdate = re.sub(r'\w+,\s*', '', pubdate) # removes day words from string, i.e 'Mon', 'Tue', etc.
        pubdate = time.mktime(datetime.strptime(pubdate, '%d %b %Y %H:%M:%S').timetuple()) # turn the string into a unix timestamp
But my problem is that if the unicode string pubdate is in a different format from the one in the except ValueError clause, it will raise another ValueError. What's the pythonic way to deal with multiple ValueError cases?
Since you are parsing date strings from an RSS feed, you may need to guess the format. I recommend using dateutil instead of the datetime module.
dateutil.parser offers a generic date/time string parser which is able to parse most known formats to represent a date and/or time.
The prototype of this function is parse(timestr); you don't have to specify the format yourself.
DEMO (assuming from dateutil.parser import parse, and a DEFAULT such as datetime.datetime(2003, 9, 25) to supply missing fields):
>>> parse("2003-09-25T10:49:41")
datetime.datetime(2003, 9, 25, 10, 49, 41)
>>> parse("2003-09-25T10:49")
datetime.datetime(2003, 9, 25, 10, 49)
>>> parse("2003-09-25T10")
datetime.datetime(2003, 9, 25, 10, 0)
>>> parse("2003-09-25")
datetime.datetime(2003, 9, 25, 0, 0)
>>> parse("Sep 03", default=DEFAULT)
datetime.datetime(2003, 9, 3, 0, 0)
Fuzzy parsing:
>>> s = "Today is 25 of September of 2003, exactly " \
... "at 10:49:41 with timezone -03:00."
>>> parse(s, fuzzy=True)
datetime.datetime(2003, 9, 25, 10, 49, 41,
tzinfo=tzoffset(None, -10800))
You could take the following approach:
from datetime import datetime
import time
pub_dates = ['2/5/2013 12:23:34', 'Monday 2 Jan 2013 12:23:34', 'mon 2 Jan 2013 12:23:34', '10/14/2015 11:11', '10 2015']
for pub_date in pub_dates:
    pubdate = 0 # value if all conversion attempts fail
    for format in ['%d/%m/%Y %H:%M:%S', '%d %b %Y %H:%M:%S', '%a %d %b %Y %H:%M:%S', '%A %d %b %Y %H:%M:%S', '%m/%d/%Y %H:%M']:
        try:
            pubdate = time.mktime(datetime.strptime(pub_date, format).timetuple()) # turn the string into a unix timestamp
            break
        except ValueError as e:
            pass
    print '{:<12} {}'.format(pubdate, pub_date)
Giving output as:
1367493814.0 2/5/2013 12:23:34
1357129414.0 Monday 2 Jan 2013 12:23:34
1357129414.0 mon 2 Jan 2013 12:23:34
1444817460.0 10/14/2015 11:11
0 10 2015
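The same try-each-format idea can be wrapped in a small function. This is only a sketch for Python 3: the name parse_first_match and the FORMATS list are made up for illustration, and it returns a naive datetime (or None) rather than a timestamp, so the result does not depend on the machine's local timezone.

```python
from datetime import datetime

# Candidate formats, tried in order; this list is only an example.
FORMATS = (
    "%d/%m/%Y %H:%M:%S",
    "%d %b %Y %H:%M:%S",
    "%a %d %b %Y %H:%M:%S",
    "%m/%d/%Y %H:%M",
)

def parse_first_match(text, formats=FORMATS):
    """Return a naive datetime for the first format that fits, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not fit; try the next one
    return None

print(parse_first_match("2/5/2013 12:23:34"))   # 2013-05-02 12:23:34
print(parse_first_match("10/14/2015 11:11"))    # 2015-10-14 11:11:00
print(parse_first_match("10 2015"))             # None
```

The order of the formats matters: an ambiguous string such as 2/5/2013 is read by whichever matching format comes first.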

How to convert date like "Apr 15 2014 16:21:16 UTC" to UTC time using python

I have dates in the following format that are used to name zip files:
Apr 15 2014 16:21:16 UTC
I would like to convert that to UTC numbers using Python. Does python recognize the 3-character month?
Use:
import datetime
datetime.datetime.strptime(yourstring, '%b %d %Y %H:%M:%S UTC')
%b is the abbreviated month name. By default, Python uses the C (English) locale, regardless of environment variables used.
Demo:
>>> import datetime
>>> yourstring = 'Apr 15 2014 16:21:16 UTC'
>>> datetime.datetime.strptime(yourstring, '%b %d %Y %H:%M:%S UTC')
datetime.datetime(2014, 4, 15, 16, 21, 16)
The resulting value is a naive datetime (it carries no tzinfo), which is fine for UTC timestamps provided you don't mix in local-time objects (e.g. stick to datetime.datetime.utcnow() and similar methods).
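If "UTC numbers" means seconds since the epoch, one possible follow-up on Python 3 is to label the parsed value as UTC explicitly and then ask for its timestamp. This is a sketch under that assumption; the variable names are illustrative.

```python
from datetime import datetime, timezone

stamp = "Apr 15 2014 16:21:16 UTC"

naive = datetime.strptime(stamp, "%b %d %Y %H:%M:%S UTC")
aware = naive.replace(tzinfo=timezone.utc)   # mark the value as UTC explicitly
epoch_seconds = int(aware.timestamp())       # seconds since 1970-01-01 00:00:00 UTC

print(aware.isoformat())   # 2014-04-15T16:21:16+00:00
print(epoch_seconds)
```

Calling timestamp() on the naive value instead would interpret it as local time, which is why the tzinfo is attached first.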
An easier way is to use dateutil:
>>> from dateutil import parser
>>> parser.parse("Apr 15 2014 16:21:16 UTC")
datetime.datetime(2014, 4, 15, 16, 21, 16, tzinfo=tzutc())
Timezone is handled, and it supports other common datetime formats as well.

Parsing date with timezone from an email?

I am trying to retrieve date from an email. At first it's easy:
message = email.parser.Parser().parse(file)
date = message['Date']
print date
and I receive:
'Mon, 16 Nov 2009 13:32:02 +0100'
But I need a nice datetime object, so I use:
datetime.strptime('Mon, 16 Nov 2009 13:32:02 +0100', '%a, %d %b %Y %H:%M:%S %Z')
which raises ValueError, since %Z isn't the format for +0100. But I can't find the proper format for the timezone in the documentation; there is only this %Z for zone. Can someone help me with that?
email.utils has a parsedate() function for the RFC 2822 format, which as far as I know is not deprecated.
>>> import email.utils
>>> import time
>>> import datetime
>>> email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')
(2009, 11, 16, 13, 32, 2, 0, 1, -1)
>>> time.mktime((2009, 11, 16, 13, 32, 2, 0, 1, -1))
1258378322.0
>>> datetime.datetime.fromtimestamp(1258378322.0)
datetime.datetime(2009, 11, 16, 13, 32, 2)
Please note, however, that the parsedate method does not take into account the time zone and time.mktime always expects a local time tuple.
>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) ==
... time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')))
True
So you'll still need to parse out the time zone and take into account the local time difference, too:
>>> REMOTE_TIME_ZONE_OFFSET = +9 * 60 * 60
>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) +
... time.timezone - REMOTE_TIME_ZONE_OFFSET)
1258410122.0
Use email.utils.parsedate_tz(date):
msg = email.message_from_file(open(file_name))
date = None
date_str = msg.get('date')
if date_str:
    date_tuple = email.utils.parsedate_tz(date_str)
    if date_tuple:
        date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
if date:
    ... # valid date found
For Python 3.3+ you can use the parsedate_to_datetime function:
>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 +0100')
...
datetime.datetime(2009, 11, 16, 13, 32, 2, tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
Official documentation:
The inverse of format_datetime(). Performs the same function as
parsedate(), but on success returns a datetime. If the input date has
a timezone of -0000, the datetime will be a naive datetime, and if the
date is conforming to the RFCs it will represent a time in UTC but
with no indication of the actual source timezone of the message the
date comes from. If the input date has any other valid timezone
offset, the datetime will be an aware datetime with the corresponding
a timezone tzinfo. New in version 3.3.
In Python 3.3+, the email package can parse the headers for you:
import email
import email.policy
headers = email.message_from_file(file, policy=email.policy.default)
print(headers.get('date').datetime)
# -> 2009-11-16 13:32:02+01:00
In Python 3.2+, it works if you replace %Z with %z:
>>> from datetime import datetime
>>> datetime.strptime("Mon, 16 Nov 2009 13:32:02 +0100",
... "%a, %d %b %Y %H:%M:%S %z")
datetime.datetime(2009, 11, 16, 13, 32, 2,
tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
Or, using the email package (Python 3.3+):
>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 +0100")
datetime.datetime(2009, 11, 16, 13, 32, 2,
tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
If the UTC offset is specified as -0000, it returns a naive datetime object that represents time in UTC; otherwise it returns an aware datetime object with the corresponding tzinfo set.
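Since the result can be either naive (the -0000 case) or aware, code that consumes it may want to normalize both to one form. The helper below is a hypothetical sketch for Python 3.3+; the name to_utc is not part of the email package.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def to_utc(date_header):
    """Parse an RFC 5322 date string and return an aware datetime in UTC."""
    dt = parsedate_to_datetime(date_header)
    if dt.tzinfo is None:
        # "-0000" case: the value is already UTC, just unlabeled
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

print(to_utc("Mon, 16 Nov 2009 13:32:02 +0100"))  # 2009-11-16 12:32:02+00:00
print(to_utc("Mon, 16 Nov 2009 13:32:02 -0000"))  # 2009-11-16 13:32:02+00:00
```

With every value aware and in UTC, comparisons and subtraction between parsed dates no longer raise the naive-versus-aware TypeError.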
To parse an RFC 5322 date-time string on earlier Python versions (2.6+):
from calendar import timegm
from datetime import datetime, timedelta, tzinfo
from email.utils import parsedate_tz
ZERO = timedelta(0)
time_string = 'Mon, 16 Nov 2009 13:32:02 +0100'
tt = parsedate_tz(time_string)
#NOTE: mktime_tz is broken on Python < 2.7.4,
# see https://bugs.python.org/issue21267
timestamp = timegm(tt) - tt[9] # local time - utc offset == utc time
naive_utc_dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
aware_utc_dt = naive_utc_dt.replace(tzinfo=FixedOffset(ZERO, 'UTC'))
aware_dt = aware_utc_dt.astimezone(FixedOffset(timedelta(seconds=tt[9])))
print(aware_utc_dt)
print(aware_dt)
# -> 2009-11-16 12:32:02+00:00
# -> 2009-11-16 13:32:02+01:00
where FixedOffset is based on the tzinfo subclass from the datetime documentation:
class FixedOffset(tzinfo):
    """Fixed UTC offset: `time = utc_time + utc_offset`."""
    def __init__(self, offset, name=None):
        self.__offset = offset
        if name is None:
            seconds = abs(offset).seconds
            assert abs(offset).days == 0
            hours, seconds = divmod(seconds, 3600)
            if offset < ZERO:
                hours = -hours
            minutes, seconds = divmod(seconds, 60)
            assert seconds == 0
            #NOTE: the last part is to remind about deprecated POSIX
            #  GMT+h timezones that have the opposite sign in the
            #  name; the corresponding numeric value is not used e.g.,
            #  no minutes
            self.__name = '<%+03d%02d>GMT%+d' % (hours, minutes, -hours)
        else:
            self.__name = name
    def utcoffset(self, dt=None):
        return self.__offset
    def tzname(self, dt=None):
        return self.__name
    def dst(self, dt=None):
        return ZERO
    def __repr__(self):
        return 'FixedOffset(%r, %r)' % (self.utcoffset(), self.tzname())
Have you tried
rfc822.parsedate_tz(date) # ?
More on RFC822, http://docs.python.org/library/rfc822.html
It's deprecated (parsedate_tz is now in email.utils.parsedate_tz), though.
But maybe these answers help:
How to parse dates with -0400 timezone string in python?
python time to age part 2, timezones
import datetime

# Parses Nginx' format of "01/Jan/1999:13:59:59 +0400" and returns a naive
# datetime in UTC. On Python 2, strptime doesn't support %z for the UTC offset
# (despite what the docs actually say), hence the need for this function.
def parseDate(dateStr):
    date = datetime.datetime.strptime(dateStr[:-6], "%d/%b/%Y:%H:%M:%S")
    offsetDir = dateStr[-5]
    offsetHours = int(dateStr[-4:-2])
    offsetMins = int(dateStr[-2:])
    if offsetDir == "-":
        offsetHours = -offsetHours
        offsetMins = -offsetMins
    # Local time minus the UTC offset gives UTC
    return date - datetime.timedelta(hours=offsetHours, minutes=offsetMins)
For those who want to get the correct local time, here is what I did:
from datetime import datetime
from email.utils import parsedate_to_datetime
mail_time_str = 'Mon, 16 Nov 2009 13:32:02 +0100'
local_time_str = datetime.fromtimestamp(parsedate_to_datetime(mail_time_str).timestamp()).strftime('%Y-%m-%d %H:%M:%S')
print(local_time_str)
ValueError: 'z' is a bad directive in format...
(note: I have to stick to python 2.7 in my case)
I have had a similar problem parsing commit dates from the output of git log --date=iso8601 which actually isn't the ISO8601 format (hence the addition of --date=iso8601-strict in a later version).
Since I am using django I can leverage the utilities there.
https://github.com/django/django/blob/master/django/utils/dateparse.py
>>> from django.utils.dateparse import parse_datetime
>>> parse_datetime('2013-07-23T15:10:59.342107+01:00')
datetime.datetime(2013, 7, 23, 15, 10, 59, 342107, tzinfo=+0100)
Instead of strptime you could use your own regular expression.
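A regular-expression approach might look like the sketch below. It is written for Python 3, where datetime.timezone is available; on Python 2.7 a tzinfo subclass would be needed instead. The pattern, the function name parse_with_offset, and the default format are all assumptions chosen for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for strings such as "Mon, 16 Nov 2009 13:32:02 +0100"
PATTERN = re.compile(
    r"(?P<stamp>.+)\s(?P<sign>[+-])(?P<hours>\d{2})(?P<minutes>\d{2})$"
)

def parse_with_offset(text, fmt="%a, %d %b %Y %H:%M:%S"):
    """Split off a trailing +HHMM/-HHMM offset and return an aware datetime."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError("no trailing UTC offset in %r" % text)
    offset = timedelta(hours=int(match.group("hours")),
                       minutes=int(match.group("minutes")))
    if match.group("sign") == "-":
        offset = -offset
    # Parse the part before the offset, then attach the offset as tzinfo
    naive = datetime.strptime(match.group("stamp"), fmt)
    return naive.replace(tzinfo=timezone(offset))

dt = parse_with_offset("Mon, 16 Nov 2009 13:32:02 +0100")
print(dt)                           # 2009-11-16 13:32:02+01:00
print(dt.astimezone(timezone.utc))  # 2009-11-16 12:32:02+00:00
```

The regular expression only isolates the offset; strptime still validates the rest of the string, so malformed dates raise ValueError as usual.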
