Convert date formats to another with Python

I download RSS content from different countries with Python, but each of them uses its own datetime format or time zone. For instance:
Wed, 23 Oct 2013 17:44:13 GMT
23 Oct 2013 18:21:04 +0100
23 Oct 2013 13:12:41 EDT
10-23-2013 00:12:24
At the moment, my solution is to create a different function for each RSS source and change the date to a format I will decide. But is there any way to do this automatically?

Not really. But take a look at the feedparser lib.
Different feed types and versions use wildly different date formats.
Universal Feed Parser will attempt to auto-detect the date format used
in any date element, and parse it into a standard Python 9-tuple, as
documented in the Python time module.
From the list of Recognized Date Formats it seems to me, that the library could help you out some of the way :)
Best of luck
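Once you have such a 9-tuple, turning it into a datetime takes only the standard library. Here is a minimal sketch, assuming the tuple is expressed in UTC (which, as far as I know, is how feedparser normalizes its parsed dates). time.strptime stands in for the feed parser, and struct_time_to_datetime is a made-up helper name:

```python
import calendar
import time
from datetime import datetime, timezone

def struct_time_to_datetime(parsed):
    """Convert a UTC 9-tuple (time.struct_time) to an aware datetime."""
    # calendar.timegm treats the tuple as UTC, unlike time.mktime (local time)
    timestamp = calendar.timegm(parsed)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)

# Stand-in for a parsed feed date (assumed to be UTC already)
parsed = time.strptime("23 Oct 2013 17:44:13", "%d %b %Y %H:%M:%S")
dt = struct_time_to_datetime(parsed)
print(dt.isoformat())
```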

You can try using the dateutil module to parse the datetime.
It provides the functionality to parse most of the known datetime formats. Here is an example from the docs:
>>> from dateutil.parser import *
>>> parse("Thu Sep 25 10:36:28 2003")
datetime.datetime(2003, 9, 25, 10, 36, 28)
It returns a datetime object which can be directly used for manipulation. You can then also use strftime to convert it to the required format string.
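If you would rather stay in the standard library, a rough sketch of the same idea is to try the RFC 2822 parser first and then fall back to a list of strptime formats. The names normalize and FALLBACK_FORMATS are mine, and treating a date without an offset as UTC is an assumption you should check against each feed:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Fallback formats for strings the RFC 2822 parser rejects (assumed, extend as needed)
FALLBACK_FORMATS = ["%m-%d-%Y %H:%M:%S", "%Y-%m-%d %H:%M:%S"]

def normalize(date_string, default_tz=timezone.utc):
    """Parse a feed date and return it as an ISO 8601 string in UTC."""
    try:
        dt = parsedate_to_datetime(date_string)
    except (TypeError, ValueError):
        dt = None
        for fmt in FALLBACK_FORMATS:
            try:
                dt = datetime.strptime(date_string, fmt)
                break
            except ValueError:
                continue
        if dt is None:
            raise ValueError("unrecognized date format: %r" % date_string)
    if dt.tzinfo is None:
        # No offset in the input: assume default_tz (a guess the caller should check)
        dt = dt.replace(tzinfo=default_tz)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

for sample in ["Wed, 23 Oct 2013 17:44:13 GMT", "23 Oct 2013 18:21:04 +0100",
               "23 Oct 2013 13:12:41 EDT", "10-23-2013 00:12:24"]:
    print(normalize(sample))
```

All four samples from the question end up in one format, so you no longer need one function per RSS source, only one more entry in the format list when a new feed shows up.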

Related

Parsing time to Datetime with Timezone

I have a time like this:
Fri Dec 04 14:51:22 CST 2020
I want to parse it as Timestamp with Timezone in Python, I am not able to achieve this with regular methods.
Could someone please help me in doing this efficiently.
Use the parse function from the dateparser package.
from dateparser import parse
dt = parse("Fri Dec 04 14:51:22 CST 2020")
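If you cannot install a third-party package, a standard-library sketch is also possible. Time zone abbreviations are ambiguous, so this version needs an explicit table; here CST is assumed to mean US Central Standard Time (UTC-6), and TZ_OFFSETS and parse_with_abbreviation are illustrative names, not part of any library:

```python
from datetime import datetime, timedelta, timezone

# Abbreviations are ambiguous, so the mapping is an explicit assumption:
# "CST" is taken to mean US Central Standard Time (UTC-6).
TZ_OFFSETS = {
    "CST": timezone(timedelta(hours=-6), "CST"),
    "UTC": timezone.utc,
    "GMT": timezone.utc,
}

def parse_with_abbreviation(text):
    """Parse 'Fri Dec 04 14:51:22 CST 2020' into an aware datetime."""
    parts = text.split()
    abbreviation = parts.pop(4)  # the time zone is the fifth token
    naive = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=TZ_OFFSETS[abbreviation])

dt = parse_with_abbreviation("Fri Dec 04 14:51:22 CST 2020")
print(dt.isoformat())
```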

Python date comparison with string output/format

I'm currently trying to writing a script to automate a function at work, but I'm not intimately familiar with Python. I'm trying to take a XML dump and compare a specific entry's date to see if the time has passed or not.
The date is in a particular format, given:
<3-letter Month> <DD> <HH:MM:SS> <YYYY> <3-letter Timezone>
For example:
May 14 20:11:20 2014 GMT
I've parsed out a string in that raw form, and need to somehow compare it with the current time to find out if the time has passed or not. That said, I'm having a bit of trouble figuring out how I should go about either formatting my text, or choosing the right mask/time format in Python.
I've been messing around with different variations of the same basic format:
if(trimmed < time.strftime("%x") ):
Trimmed is the clean date/time string. Time is derived from import time.
Is there a simple way to fix this or will I have to dig into converting the format etc.? I know the above attempt is simplistic, but I'm still very new to Python. Thanks for your time and patience!
You should use a combination of gmtime (for GMT time), mktime and datetime.
from time import gmtime,mktime
from datetime import datetime
s = "May 14 20:11:20 2014 GMT"
f = "%b %d %H:%M:%S %Y GMT"
dt = datetime.strptime(s, f)
gmt = datetime.fromtimestamp(mktime(gmtime()))
if dt<gmt:
    print(dt)
else:
    print(gmt)
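One caveat with the mktime round trip: mktime reads the tuple as local time, so the "current GMT" value can come out an hour off while daylight saving time is in effect. A sketch that avoids this by comparing aware datetimes (Python 3.2+) could look like the following. has_passed is a hypothetical helper, and its now parameter exists only to make testing easier:

```python
from datetime import datetime, timezone

def has_passed(text, now=None):
    """Return True if a date like 'May 14 20:11:20 2014 GMT' is in the past."""
    # The format only matches a literal "GMT" suffix, so mark the result as UTC
    parsed = datetime.strptime(text, "%b %d %H:%M:%S %Y GMT").replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return parsed < now

print(has_passed("May 14 20:11:20 2014 GMT"))
```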

Python email.utils.parsedate() alternative for french date

I'd like to use mailman archive scraper
I'm facing some bugs due to the mailgun archive I'm interested in which is set with french language.
At the moment the parser uses the following code:
message_time = time.mktime(email.utils.parsedate(soup.h1.findNextSibling('i').string))
I obtain a
TypeError: argument must be 9-item sequence, not None
I think this is due to email.utils.parsedate() function and the date in french which has this format : Lun 17 Mar 19:30:40 CET 2014
I'm looking for an alternative way to obtain the same parse result of email.utils.parsedate() with this date format.
My Python knowledge is limited and so far I haven't found one.
Any ideas or pointers?
You could use the dateparser module:
>>> import dateparser
>>> dateparser.parse('Lun 17 Mar 19:30:40 CET 2014')
datetime.datetime(2014, 3, 17, 19, 30, 40)
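For comparison, here is a dependency-free sketch that splits the string by hand and looks the month up in a small table. The table of French abbreviations is my assumption (extend it if your archive uses other spellings), parse_french_date is a hypothetical helper, and the time zone token is ignored, as in the output above:

```python
from datetime import datetime

# Assumed French month abbreviations (accents and trailing dots are folded below)
FRENCH_MONTHS = {
    "jan": 1, "janv": 1, "fev": 2, "fevr": 2, "mar": 3, "mars": 3,
    "avr": 4, "mai": 5, "juin": 6, "juil": 7, "aou": 8, "aout": 8,
    "sep": 9, "sept": 9, "oct": 10, "nov": 11, "dec": 12,
}

def parse_french_date(text):
    """Parse 'Lun 17 Mar 19:30:40 CET 2014' into a naive datetime."""
    _weekday, day, month, clock, _zone, year = text.split()
    key = month.lower().rstrip(".")
    # Fold the accented letters used in French month names to plain ASCII
    key = key.replace("é", "e").replace("û", "u")
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(int(year), FRENCH_MONTHS[key], int(day), hour, minute, second)

dt = parse_french_date("Lun 17 Mar 19:30:40 CET 2014")
print(dt)
```

To get the float the scraper expects, pass the result on as time.mktime(dt.timetuple()).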

Using dateutil.parser to parse a date in another language

Dateutil is a great tool for parsing dates in string format. For example,
from dateutil.parser import parse
parse("Tue, 01 Oct 2013 14:26:00 -0300")
returns
datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))
however,
parse("Ter, 01 Out 2013 14:26:00 -0300") # In portuguese
yields this error:
ValueError: unknown string format
Does anybody know how to make dateutil aware of the locale?
As far as I can see, dateutil is not locale aware (yet!).
I can think of three alternative suggestions:
1. The day and month names are hardcoded in dateutil.parser (as part of the parserinfo class). You could subclass parserinfo, and replace these names with the appropriate names for Portuguese.
2. Modify dateutil to get day and month names based on the user’s locale. So you could do something like
import locale
locale.setlocale(locale.LC_ALL, "pt_PT")
from dateutil.parser import parse
parse("Ter, 01 Out 2013 14:26:00 -0300")
I’ve started a fork which gets the names from the calendar module (which is locale-aware) to work on this: https://github.com/alexwlchan/dateutil
Right now it works for Portuguese (or seems to), but I want to think about it a bit more before I submit a patch to the main branch. In particular, weirdness may happen if it faces characters which aren’t used in Western European languages. I haven’t tested this yet. (See https://stackoverflow.com/a/8917539/1558022)
3. If you’re not tied to the dateutil module, you could use datetime instead, which is already locale-aware:
from datetime import datetime, date
import locale
locale.setlocale(locale.LC_ALL, "pt_PT")
datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                  "%a, %d %b %Y %H:%M:%S %z")
(Note that the %z token is not consistently supported in datetime.)
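A fourth route, not one of the three above, avoids both the locale and dateutil: translate the Portuguese names to English and hand the result to the standard RFC 2822 parser (Python 3.3+). This is only a sketch; the PT_TO_EN table is my assumption about the abbreviations in use, and parse_portuguese_rfc2822 is a made-up name:

```python
import re
from email.utils import parsedate_to_datetime

# Assumed Portuguese abbreviations mapped to the English ones the parser expects
PT_TO_EN = {
    "seg": "Mon", "ter": "Tue", "qua": "Wed", "qui": "Thu", "sex": "Fri",
    "sab": "Sat", "sáb": "Sat", "dom": "Sun",
    "jan": "Jan", "fev": "Feb", "mar": "Mar", "abr": "Apr", "mai": "May",
    "jun": "Jun", "jul": "Jul", "ago": "Aug", "set": "Sep", "out": "Oct",
    "nov": "Nov", "dez": "Dec",
}

def parse_portuguese_rfc2822(text):
    """Translate day and month names, then parse as an RFC 2822 date."""
    def translate(match):
        word = match.group(0)
        return PT_TO_EN.get(word.lower(), word)
    # [^\W\d_]+ matches runs of letters, including accented ones
    english = re.sub(r"[^\W\d_]+", translate, text)
    return parsedate_to_datetime(english)

dt = parse_portuguese_rfc2822("Ter, 01 Out 2013 14:26:00 -0300")
print(dt.isoformat())
```

Because no global locale is touched, this is safe to call from threaded code, at the cost of maintaining the table yourself.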
You could use PyICU to parse a localized date/time string in a given format:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from datetime import datetime
import icu # PyICU
df = icu.SimpleDateFormat(
    'EEE, dd MMM yyyy HH:mm:ss zzz', icu.Locale('pt_BR'))
ts = df.parse(u'Ter, 01 Out 2013 14:26:00 -0300')
print(datetime.utcfromtimestamp(ts))
# -> 2013-10-01 17:26:00 (UTC)
It works on Python 2/3. It does not modify global state (locale).
If your actual input time string does not contain the explicit utc offset then you should specify a timezone to be used by ICU explicitly otherwise you can get a wrong result (ICU and datetime may use different timezone definitions).
If you only need to support Python 3 and you don't mind setting the locale then you could use datetime.strptime() as @alexwlchan suggested:
#!/usr/bin/env python3
import locale
from datetime import datetime
locale.setlocale(locale.LC_TIME, "pt_PT.UTF-8")
print(datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                        "%a, %d %b %Y %H:%M:%S %z")) # works on Python 3.2+
# -> 2013-10-01 14:26:00-03:00
The calendar module already has constants for a lot of languages. I think the best solution is to customize the parser from dateutil using these constants. This is a simple solution and will work for a lot of languages. I didn't test it a lot, so use with caution.
Create a module localeparserinfo.py and subclass parser.parserinfo:
import calendar
from dateutil import parser
class LocaleParserInfo(parser.parserinfo):
    # list() so the pairs survive more than one instantiation on Python 3
    WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
    MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]
Now you can use your new parserinfo object as a parameter to dateutil.parser.
In [1]: import locale;locale.setlocale(locale.LC_ALL, "pt_BR.utf8")
In [2]: from localeparserinfo import LocaleParserInfo
In [3]: from dateutil.parser import parse
In [4]: parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=LocaleParserInfo())
Out[4]: datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))
It solved my problem, but note that this is an incomplete solution for all possible dates and times. Take a look at dateutil's parser.py, especially the parserinfo class variables such as HMS. You'll probably be able to use other constants from the calendar module.
You can even pass the locale string as an argument to your parserinfo class.
from dateutil.parser import parse
parse("Ter, 01 Out 2013 14:26:00 -0300",fuzzy=True)
Result:
datetime.datetime(2013, 1, 28, 14, 26, tzinfo=tzoffset(None, -10800))
One could use a context manager to temporarily set the locale and return a custom parserinfo object.
Context manager definition:
import calendar
import contextlib
import locale
from dateutil import parser
@contextlib.contextmanager
def locale_parser_info(localename):
    old_locale = locale.getlocale(locale.LC_TIME)
    locale.setlocale(locale.LC_TIME, localename)
    class InnerParserInfo(parser.parserinfo):
        WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
        # dots in abbreviation make dateutil raise a Parser Error exception
        MONTHS = list(zip([abr.replace(".", "") for abr in calendar.month_abbr], calendar.month_name))[1:]
    try:
        yield InnerParserInfo()
    finally:
        # Restore original locale
        locale.setlocale(locale.LC_TIME, old_locale)
The actual function just wraps the call to dateutil.parser.parse in the context manager we just defined, and uses the returned parserinfo object.
def parse_localized(datestr, date_locale="pt_PT"):
    with locale_parser_info(date_locale) as parserinfo:
        return parser.parse(datestr, parserinfo=parserinfo)

How to parse a RFC 2822 date/time into a Python datetime?

I have a date of the form specified by RFC 2822 -- say Fri, 15 May 2009 17:58:28 +0000, as a string. Is there a quick and/or standard way to get it as a datetime object in Python 2.5? I tried to produce a strptime format string, but the +0000 timezone specifier confuses the parser.
The problem is that parsedate will ignore the offset.
Do this instead:
from email.utils import parsedate_tz
print(parsedate_tz('Fri, 15 May 2009 17:58:28 +0700'))
I'd like to elaborate on previous answers. email.utils.parsedate and email.utils.parsedate_tz both return tuples. Since the OP needs a datetime.datetime object, I'm adding these examples for completeness:
from email.utils import parsedate
from datetime import datetime
import time
t = parsedate('Sun, 14 Jul 2013 20:14:30 -0000')
d1 = datetime.fromtimestamp(time.mktime(t))
Or:
d2 = datetime(*t[:6])
Note that d1 and d2 are both naive datetime objects; no timezone information is stored. If you need aware datetime objects, see the tzinfo argument of datetime().
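To make the tzinfo remark concrete, here is a small sketch for Python 3.2+ that builds an aware datetime from the tuple parsedate_tz returns. parse_aware is a hypothetical helper, and treating a missing offset as UTC is my assumption:

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_tz

def parse_aware(text):
    """Parse an RFC 2822 date into a datetime that keeps the original offset."""
    t = parsedate_tz(text)
    if t is None:
        raise ValueError("not an RFC 2822 date: %r" % text)
    # t[9] is the UTC offset in seconds; fall back to 0 (UTC) if it is missing
    offset = timedelta(seconds=t[9] or 0)
    return datetime(*t[:6], tzinfo=timezone(offset))

d3 = parse_aware("Fri, 15 May 2009 17:58:28 +0700")
print(d3.isoformat())
```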
Alternatively you could use the dateutil module
from email.utils import parsedate
print(parsedate('Fri, 15 May 2009 17:58:28 +0000'))
Documentation.
It looks like Python 3.3 going forward has a new method parsedate_to_datetime in email.utils that takes care of the intermediate steps:
email.utils.parsedate_to_datetime(date)
The inverse of format_datetime(). Performs the same function as parsedate(), but on
success returns a datetime. If the input date has a timezone of -0000,
the datetime will be a naive datetime, and if the date is conforming
to the RFCs it will represent a time in UTC but with no indication of
the actual source timezone of the message the date comes from. If the
input date has any other valid timezone offset, the datetime will be
an aware datetime with the corresponding a timezone tzinfo.
New in version 3.3.
http://python.readthedocs.org/en/latest/library/email.util.html#email.utils.parsedate_to_datetime
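A quick way to see the naive versus aware behaviour described in that excerpt (Python 3.3+; the variable names are only for illustration):

```python
from email.utils import parsedate_to_datetime

aware = parsedate_to_datetime("Fri, 15 May 2009 17:58:28 +0000")
naive = parsedate_to_datetime("Fri, 15 May 2009 17:58:28 -0000")

print(aware.isoformat())  # the +0000 offset is kept
print(naive.tzinfo)       # None: -0000 means "UTC, source zone unknown"
```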
There is a parsedate function in email.utils.
It parses all valid RFC 2822 dates and some special cases.
email.utils.parsedate_tz(date) is the function to use. Following are some variations.
Email date/time string (RFC 5322, RFC 2822, RFC 1123) to unix timestamp in float seconds:
import email.utils
import calendar
def email_time_to_timestamp(s):
    tt = email.utils.parsedate_tz(s)
    if tt is None: return None
    return calendar.timegm(tt) - tt[9]
import time
print(time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(email_time_to_timestamp("Wed, 04 Jan 2017 09:55:45 -0800"))))
# 2017-01-04T17:55:45Z
Make sure you do not use mktime (which interprets the struct_time in your computer’s local time, not UTC); use timegm or mktime_tz instead (but beware the caveat for mktime_tz in the next paragraph).
If you are sure that you have python version 2.7.4, 3.2.4, 3.3, or newer, then you can use email.utils.mktime_tz(tt) instead of calendar.timegm(tt) - tt[9]. Before that, mktime_tz gave incorrect times when invoked during the local time zone’s fall daylight savings transition (bug 14653).
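On a fixed Python version you can check for yourself that the two computations agree. A small sketch, where manual and builtin are just illustrative names:

```python
import calendar
import email.utils

tt = email.utils.parsedate_tz("Wed, 04 Jan 2017 09:55:45 -0800")

manual = calendar.timegm(tt) - tt[9]  # read the fields as UTC, then remove the offset
builtin = email.utils.mktime_tz(tt)   # same result on Python versions with the fix

print(manual, builtin)
```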
Thanks to @j-f-sebastian for caveats about mktime and mktime_tz.
Email date/time string (RFC 5322, RFC 2822, RFC 1123) to “aware” datetime on python 3.3:
On python 3.3 and above, use email.utils.parsedate_to_datetime, which returns an aware datetime with the original zone offset:
import email.utils
s = "Wed, 04 Jan 2017 09:55:45 -0800"
print(email.utils.parsedate_to_datetime(s).isoformat())
# 2017-01-04T09:55:45-08:00
Caveat: this will throw ValueError if the time falls on a leap second e.g. email.utils.parsedate_to_datetime("Sat, 31 Dec 2016 15:59:60 -0800").
Email date/time string (RFC 5322, RFC 2822, RFC 1123) to a naive datetime holding the UTC time:
This just converts to timestamp and then to UTC datetime:
import email.utils
import calendar
import datetime
def email_time_to_utc_datetime(s):
    tt = email.utils.parsedate_tz(s)
    if tt is None: return None
    timestamp = calendar.timegm(tt) - tt[9]
    return datetime.datetime.utcfromtimestamp(timestamp)
print(email_time_to_utc_datetime("Wed, 04 Jan 2017 09:55:45 -0800").isoformat())
# 2017-01-04T17:55:45
Email date/time string (RFC 5322, RFC 2822, RFC 1123) to python “aware” datetime with original offset:
Prior to Python 3.2, Python did not ship a concrete tzinfo implementation, so here is an example using dateutil.tz.tzoffset (pip install python-dateutil):
import email.utils
import datetime
import dateutil.tz
def email_time_to_datetime(s):
    tt = email.utils.parsedate_tz(s)
    if tt is None: return None
    tz = dateutil.tz.tzoffset("UTC%+02d%02d"%(tt[9]//60//60, tt[9]//60%60), tt[9])
    return datetime.datetime(*tt[:5]+(min(tt[5], 59),), tzinfo=tz)
print(email_time_to_datetime("Wed, 04 Jan 2017 09:55:45 -0800").isoformat())
# 2017-01-04T09:55:45-08:00
If you are using Python 3.2 or newer, you can use the builtin tzinfo implementation datetime.timezone: tz = datetime.timezone(datetime.timedelta(seconds=tt[9])) instead of the third-party dateutil.tz.tzoffset.
Thanks to @j-f-sebastian again for the note on clamping the leap second.
