I have a variable called pubdate which is derived from RSS feeds. Most of the time it's a time tuple, which is what I want it to be, so there are no errors.
Sometimes it's a unicode string, and that's where it gets annoying.
So far, I have the following code for the case where pubdate is a unicode string:
if isinstance(pubdate, unicode):
    try:
        pubdate = time.mktime(datetime.strptime(pubdate, '%d/%m/%Y %H:%M:%S').timetuple())  # turn the string into a unix timestamp
    except ValueError:
        pubdate = re.sub(r'\w+,\s*', '', pubdate)  # removes day words from string, i.e. 'Mon', 'Tue', etc.
        pubdate = time.mktime(datetime.strptime(pubdate, '%d %b %Y %H:%M:%S').timetuple())  # turn the string into a unix timestamp
My problem is that if the unicode string pubdate is in a format different from the one handled in the except ValueError clause, it raises another ValueError. What's the pythonic way to deal with multiple ValueError cases?
Since you are parsing date strings from RSS, you probably need some guessing when parsing them. I recommend using dateutil instead of the datetime module.
dateutil.parser offers a generic date/time string parser which is able to parse most known formats used to represent a date and/or time.
The prototype of this function is parse(timestr); you don't have to specify the format yourself.
DEMO
>>> import datetime
>>> from dateutil.parser import parse
>>> DEFAULT = datetime.datetime(2003, 9, 25)  # assumed value: supplies the fields missing from the string
>>> parse("2003-09-25T10:49:41")
datetime.datetime(2003, 9, 25, 10, 49, 41)
>>> parse("2003-09-25T10:49")
datetime.datetime(2003, 9, 25, 10, 49)
>>> parse("2003-09-25T10")
datetime.datetime(2003, 9, 25, 10, 0)
>>> parse("2003-09-25")
datetime.datetime(2003, 9, 25, 0, 0)
>>> parse("Sep 03", default=DEFAULT)
datetime.datetime(2003, 9, 3, 0, 0)
Fuzzy parsing:
>>> s = "Today is 25 of September of 2003, exactly " \
... "at 10:49:41 with timezone -03:00."
>>> parse(s, fuzzy=True)
datetime.datetime(2003, 9, 25, 10, 49, 41, tzinfo=tzoffset(None, -10800))
You could take the following approach:
from datetime import datetime
import time

pub_dates = ['2/5/2013 12:23:34', 'Monday 2 Jan 2013 12:23:34', 'mon 2 Jan 2013 12:23:34', '10/14/2015 11:11', '10 2015']
formats = ['%d/%m/%Y %H:%M:%S', '%d %b %Y %H:%M:%S', '%a %d %b %Y %H:%M:%S', '%A %d %b %Y %H:%M:%S', '%m/%d/%Y %H:%M']

for pub_date in pub_dates:
    pubdate = 0  # value if all conversion attempts fail
    for fmt in formats:  # try each known format until one matches
        try:
            pubdate = time.mktime(datetime.strptime(pub_date, fmt).timetuple())  # turn the string into a unix timestamp
            break
        except ValueError:
            pass
    print('{:<12} {}'.format(pubdate, pub_date))
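Giving output like the following (the exact timestamps depend on the local timezone, because time.mktime interprets the time tuple as local time):
1367493814.0 2/5/2013 12:23:34
1357129414.0 Monday 2 Jan 2013 12:23:34
1357129414.0 mon 2 Jan 2013 12:23:34
1444817460.0 10/14/2015 11:11
0            10 2015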
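The loop above can also be wrapped in a small helper that covers both cases from the question (a time tuple or a string). This is only a sketch: the name to_timestamp, the FORMATS list, and the choice of returning None on failure are assumptions made for illustration, and it is written for Python 3, where the separate unicode check is not needed.

```python
import time
from datetime import datetime

# Hypothetical format list; extend it with whatever the feeds actually emit.
FORMATS = (
    '%d/%m/%Y %H:%M:%S',
    '%a, %d %b %Y %H:%M:%S',
    '%d %b %Y %H:%M:%S',
)


def to_timestamp(pubdate, formats=FORMATS):
    """Return a Unix timestamp (interpreting the date as local time), or None."""
    # time.struct_time is a tuple subclass, so this covers plain tuples too.
    if isinstance(pubdate, tuple):
        return time.mktime(pubdate)
    for fmt in formats:
        try:
            parsed = datetime.strptime(pubdate, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        return time.mktime(parsed.timetuple())
    return None  # no format matched


print(to_timestamp('2/5/2013 12:23:34'))
print(to_timestamp('10 2015'))  # None, since no format matches
```

Returning None instead of 0 keeps a failed parse distinguishable from a real timestamp of 0; whether that matters depends on how the caller uses the value.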
I have a string like this
dateStr = "Wed Mar 15 12:50:52 GMT+05:30 2017"
which is IST.
Is there any way to parse dateStr using the timezone given inside the string itself (GMT+05:30), so that I can build a datetime object directly?
I have tried to parse it using this format:
format = "%a %b %d %H:%M:%S %Z%z %Y"
But it gives me an error saying that the format does not match.
Can you try this?
>>> dateStr = "Wed Mar 15 12:50:52 GMT+05:30 2017"
>>> from dateutil.parser import parse
>>> parse(dateStr)
datetime.datetime(2017, 3, 15, 12, 50, 52, tzinfo=tzoffset(None, -19800))
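Note that tzoffset(None, -19800) is UTC-05:30, not IST: dateutil reads "GMT+05:30" in the POSIX sense ("my time plus 05:30 is GMT") and flips the sign. If +05:30 is what the string means, one alternative is to rewrite the zone as a plain numeric offset and let strptime's %z handle it. A minimal sketch, assuming Python 3 (where strptime accepts %z) and assuming the zone always has the form GMT+HH:MM or GMT-HH:MM; the variable names are illustrative:

```python
import re
from datetime import datetime

date_str = "Wed Mar 15 12:50:52 GMT+05:30 2017"

# Turn "GMT+05:30" into "+0530", the form that %z understands.
normalized = re.sub(r"GMT([+-]\d{2}):(\d{2})", r"\1\2", date_str)

dt = datetime.strptime(normalized, "%a %b %d %H:%M:%S %z %Y")
print(dt.isoformat())  # 2017-03-15T12:50:52+05:30
```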
I have dates in the following format that are used to name zip files:
Apr 15 2014 16:21:16 UTC
I would like to convert that to UTC numbers using Python. Does Python recognize the 3-character month?
Use:
import datetime
datetime.datetime.strptime(yourstring, '%b %d %Y %H:%M:%S UTC')
%b is the abbreviated month name. By default, Python uses the C (English) locale, regardless of environment variables used.
Demo:
>>> import datetime
>>> yourstring = 'Apr 15 2014 16:21:16 UTC'
>>> datetime.datetime.strptime(yourstring, '%b %d %Y %H:%M:%S UTC')
datetime.datetime(2014, 4, 15, 16, 21, 16)
The resulting value is naive (it carries no tzinfo), which is fine for UTC timestamps provided you don't mix in local-time objects (e.g. stick to datetime.datetime.utcnow() and similar methods).
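If "UTC numbers" means seconds since the Unix epoch, the parsed value can be converted with calendar.timegm, which treats the time tuple as UTC (unlike time.mktime, which assumes local time). A small sketch under that assumption:

```python
import calendar
import datetime

yourstring = 'Apr 15 2014 16:21:16 UTC'
parsed = datetime.datetime.strptime(yourstring, '%b %d %Y %H:%M:%S UTC')

# timegm() interprets the tuple as UTC, so the result does not
# depend on the timezone of the machine running the code.
timestamp = calendar.timegm(parsed.timetuple())
print(timestamp)  # 1397578876
```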
An easier way is to use dateutil:
>>> from dateutil import parser
>>> parser.parse("Apr 15 2014 16:21:16 UTC")
datetime.datetime(2014, 4, 15, 16, 21, 16, tzinfo=tzutc())
Timezone is handled, and it supports other common datetime formats as well.
In standard Python, I can convert a string representation of a time into a datetime like this:
date_string = u'Tue, 13 Sep 2011 02:38:59 GMT'
date_object = datetime.strptime(date_string, '%a, %d %b %Y %H:%M:%S %Z')
This works fine until I run the same code on App Engine, where I get the error:
time data did not match format: data=2011-09-13 02:38:59 fmt=%a, %d %b %Y %H:%M:%S %Z
How would I convert this date string correctly so I can get a datetime representation?
Your error message indicates that you're not really passing Tue, 13 Sep 2011 02:38:59 GMT, but 2011-09-13 02:38:59. Are you sure you pass the correct parameters to strptime?
My python works just fine for the following:
datetime.strptime(u'Tue, 13 Sep 2011 02:38:59 GMT', "%a, %d %b %Y %H:%M:%S %Z")
# returns datetime.datetime(2011, 9, 13, 2, 38, 59)
This also works fine for me:
from dateutil import parser as dparser
dparser.parse("Tue, 13 Sep 2011 02:38:59 GMT")
# returns datetime.datetime(2011, 9, 13, 2, 38, 59, tzinfo=tzutc())
Following on from my previous question, Python time to age, I have now come across a problem regarding the timezone, and it turns out that it's not always going to be "+0200". So when strptime tries to parse it as such, it throws up an exception.
I thought about just chopping off the +0200 with [:-6] or whatever, but is there a real way to do this with strptime?
I am using Python 2.5.2 if it matters.
>>> from datetime import datetime
>>> fmt = "%a, %d %b %Y %H:%M:%S +0200"
>>> datetime.strptime("Tue, 22 Jul 2008 08:17:41 +0200", fmt)
datetime.datetime(2008, 7, 22, 8, 17, 41)
>>> datetime.strptime("Tue, 22 Jul 2008 08:17:41 +0300", fmt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/_strptime.py", line 330, in strptime
    (data_string, format))
ValueError: time data did not match format: data=Tue, 22 Jul 2008 08:17:41 +0300 fmt=%a, %d %b %Y %H:%M:%S +0200
is there a real way to do this with strptime?
No, but since your format appears to be an RFC822-family date, you can read it much more easily using the email library instead:
>>> import email.utils
>>> email.utils.parsedate_tz('Tue, 22 Jul 2008 08:17:41 +0200')
(2008, 7, 22, 8, 17, 41, 0, 1, 0, 7200)
(7200 = timezone offset from UTC in seconds)
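The 10-tuple can be turned into a Unix timestamp with email.utils.mktime_tz, which applies the offset for you. A short sketch, written for Python 3; the helper name rfc822_to_timestamp is made up for illustration:

```python
import email.utils


def rfc822_to_timestamp(text):
    """Return seconds since the epoch (UTC) for an RFC 822 style date, or None."""
    parsed = email.utils.parsedate_tz(text)
    if parsed is None:  # parsedate_tz returns None when it cannot parse the input
        return None
    return email.utils.mktime_tz(parsed)


print(rfc822_to_timestamp('Tue, 22 Jul 2008 08:17:41 +0200'))  # 1216707461
print(rfc822_to_timestamp('Tue, 22 Jul 2008 08:17:41 +0300'))  # 1216703861
```

The two results differ by 3600 seconds, which shows that the offset in the string is taken into account rather than discarded.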
New in version 2.6.
For a naive object, the %z and %Z format codes are replaced by empty strings.
It looks like this is implemented only in >= 2.6, and I think you have to manually parse it.
I can't see another solution than to remove the time zone data:
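from datetime import timedelta, datetime

date_string = "Tue, 22 Jul 2008 08:17:41 +0300"
try:
    offset = int(date_string[-5:])  # "+0300" -> 300
except ValueError:
    raise ValueError("no numeric time zone at the end of %r" % date_string)
# split the +HHMM value into hours and minutes, keeping the sign
sign = 1 if offset >= 0 else -1
delta = sign * timedelta(hours=abs(offset) // 100, minutes=abs(offset) % 100)
fmt = "%a, %d %b %Y %H:%M:%S"
utc_time = datetime.strptime(date_string[:-6], fmt) - delta  # naive datetime, now in UTC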
You can use the dateutil library which is very useful:
from dateutil.parser import parse
dt = parse("Tue, 22 Jul 2008 08:17:41 +0200")
# dt is datetime.datetime(2008, 7, 22, 8, 17, 41, tzinfo=tzoffset(None, 7200))
print(dt)
# prints: 2008-07-22 08:17:41+02:00
As far as I know, strptime() doesn't recognize numeric time zone codes. If you know that the string is always going to end with a time zone specification of that form (+ or - followed by 4 digits), just chopping it off and parsing it manually seems like a perfectly reasonable thing to do.
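That limitation applies to the Python 2 releases discussed here. On Python 3 (3.2 and later), strptime does understand the %z directive for numeric offsets and returns an aware datetime, so no truncation is needed there. A brief sketch, assuming Python 3:

```python
from datetime import datetime

fmt = "%a, %d %b %Y %H:%M:%S %z"

for text in ("Tue, 22 Jul 2008 08:17:41 +0200",
             "Tue, 22 Jul 2008 08:17:41 +0300"):
    dt = datetime.strptime(text, fmt)
    # utcoffset() reflects the numeric zone taken from the string
    print(dt.isoformat(), dt.utcoffset())
```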
It seems that %Z corresponds to time zone names, not offsets.
For example, given:
>>> format = '%a, %d %b %Y %H:%M:%S %Z'
I can parse:
>>> datetime.datetime.strptime('Tue, 22 Jul 2008 08:17:41 GMT', format)
datetime.datetime(2008, 7, 22, 8, 17, 41)
Although it seems that it doesn't do anything with the time zone, merely checking that the name is one it recognizes (UTC, GMT, or one of the machine's local zone names):
>>> datetime.datetime.strptime('Tue, 22 Jul 2008 08:17:41 NZDT', format)
datetime.datetime(2008, 7, 22, 8, 17, 41)
I suppose if you wished, you could locate a mapping of offsets to names, convert your input, and then parse it. It might be simpler to just truncate your input, though.