Python email.utils.parsedate() alternative for french date - python

I'd like to use the mailman archive scraper.
I'm running into a bug because the Mailman archive I'm interested in is configured in French.
At the moment the parser uses the following code:
message_time = time.mktime(email.utils.parsedate(soup.h1.findNextSibling('i').string))
I obtain a
TypeError: argument must be 9-item sequence, not None
I think this is because email.utils.parsedate() cannot parse the French date, which has this format: Lun 17 Mar 19:30:40 CET 2014. When it cannot parse its input, parsedate() returns None, and time.mktime() then raises the TypeError above.
I'm looking for an alternative that gives the same kind of result as email.utils.parsedate() for this date format.
My Python knowledge is limited and so far I have not found one.
Any ideas or pointers?

You could use the dateparser module:
>>> import dateparser
>>> dateparser.parse('Lun 17 Mar 19:30:40 CET 2014')
datetime.datetime(2014, 3, 17, 19, 30, 40)
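If adding a third-party module is not an option, a similar 9-item result can be approximated with the standard library alone. The following is only a sketch: the helper name parse_french_date and the month table are assumptions (the abbreviations the archive actually emits may differ), and the timezone token is simply ignored.

```python
import re
import time
from datetime import datetime

# Assumed French month abbreviations (with and without accents); adjust to
# whatever the archive actually emits.
FRENCH_MONTHS = {
    'jan': 1, 'fév': 2, 'fev': 2, 'mar': 3, 'avr': 4, 'mai': 5,
    'juin': 6, 'jun': 6, 'juil': 7, 'jul': 7, 'aoû': 8, 'aou': 8,
    'sep': 9, 'oct': 10, 'nov': 11, 'déc': 12, 'dec': 12,
}

# Layout: day name, day, month, HH:MM:SS, optional timezone token, year.
DATE_RE = re.compile(
    r'^\S+\s+(\d{1,2})\s+(\S+)\s+(\d{2}):(\d{2}):(\d{2})\s+(?:\S+\s+)?(\d{4})$'
)


def parse_french_date(text):
    """Return a 9-item time tuple, or None if the text does not match."""
    match = DATE_RE.match(text.strip())
    if match is None:
        return None
    day, month_name, hour, minute, second, year = match.groups()
    month = FRENCH_MONTHS.get(month_name.lower().rstrip('.'))
    if month is None:
        return None
    try:
        parsed = datetime(int(year), month, int(day),
                          int(hour), int(minute), int(second))
    except ValueError:  # e.g. day 31 in a 30-day month
        return None
    return parsed.timetuple()


parsed = parse_french_date('Lun 17 Mar 19:30:40 CET 2014')
# mktime interprets the tuple in the machine's local zone, not in CET.
message_time = time.mktime(parsed)
```

Like email.utils.parsedate(), the helper returns None for input it does not understand, so the caller should check for None before calling time.mktime().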

Related

python strptime vs dateutil - recommended use

I need to convert between strings and datetime objects quite often. Up until now I have always used strptime and strftime.
I started working with the Google Calendar API, where I receive strings like this: '2018-03-17T09:00:00+01:00'
It seems like I need to convert the +01:00 into 0100 for strptime, which is a little annoying.
While I don't have this issue with dateutil, there are a few other inconveniences. Also, I saw that the last update of dateutil was in 2016, which seems odd.
So my question is: which one would you recommend for adding and subtracting dates and datetimes and for switching between strings and datetime objects?
Also, is dateutil still maintained, or is it outdated?
Thanks a lot!
- Sally
It seems like I need to convert the +01:00 into 0100 for strptime
No you don't.
For one thing, the standard format is +0100, not 0100.
For another, strptime handles +01:00 just fine (as of Python 3.7, where %z accepts a colon in the offset):
>>> datetime.datetime.strptime('2018-08-13T11:18:24+00:00',
... '%Y-%m-%dT%H:%M:%S%z')
datetime.datetime(2018, 8, 13, 11, 18, 24, tzinfo=datetime.timezone.utc)
>>> datetime.datetime.strptime('2018-08-13T11:18:24+01:00',
... '%Y-%m-%dT%H:%M:%S%z')
datetime.datetime(2018, 8, 13, 11, 18, 24, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600)))
So, the problem you're trying to solve doesn't exist in the first place.
While I don't have this issue with dateutil, there are a few other inconveniences. Also, I saw that the last update of dateutil was in 2016, which seems odd.
As of 13 Aug 2018, the last update to dateutil was 2 days ago. And the last official release to PyPI, version 2.7.3, was 3 months ago.
So, your secondary problem doesn't exist either.
So my question is: which one would you recommend for adding and subtracting dates and datetimes and for switching between strings and datetime objects?
Since dateutil just gives you the same datetime objects that datetime gives you, neither one is better for adding and subtracting dates and datetimes.
For converting to and from string format, sometimes dateutil is more convenient, and it also supports a wider range of formats that you don't care about—but for what you're doing, they both work fine, so there's no difference. If you expect to need other formats in the future, it might be worth bringing in dateutil, but if not, you might as well stick with the standard library.
The latest dateutil release (2.7.3) came out in May 2018. It just says "copyright 2016" somewhere in the credits. Moreover, the documentation describes a policy for future versions, so the project seems to be quite active. I would suggest preferring it over strptime. However, be sure to get the latest version: previous versions had a bug when converting ISO dates, which is the format you are working with.
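To make the standard-library route concrete, here is a small sketch using the string from the question. It assumes Python 3.7 or later, where %z accepts the colon and datetime.fromisoformat is available; the variable names are only illustrative.

```python
from datetime import datetime, timedelta

raw = '2018-03-17T09:00:00+01:00'

# Python 3.7+: %z accepts the colon in the UTC offset.
via_strptime = datetime.strptime(raw, '%Y-%m-%dT%H:%M:%S%z')

# Python 3.7+: fromisoformat parses this ISO 8601 layout directly.
via_iso = datetime.fromisoformat(raw)

# Arithmetic is the same no matter how the object was created.
later = via_strptime + timedelta(days=1, hours=2)

# Back to a string, keeping the original offset.
as_text = later.isoformat()
```

Both calls return ordinary timezone-aware datetime objects, so adding and subtracting timedelta values works identically whichever parser produced them.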

parsedatetime returns wrong year for dates occurring earlier than current month

import parsedatetime as pdt
c = pdt.Constants()
p = pdt.Calendar(c)
p.parseDateText('28 Feb 17') #Current date at runtime is Mar 7 2017
RETURNS:
(2018, 2, 17, 21, 51, 22, 1, 66, 0)
So I built a webscraper to go get calendar events posted to several websites of interest and they have their dates in various non-standard formats. I am using parsedatetime to convert the free text dates into something more usable in a calendar. The problem I just realized I am facing is for events that I scrape that have already occurred. I narrowed the problem down to starting when the month of the date to be parsed is at least one month prior to the current date, as demonstrated above in the code.
What can I do to either correctly parse these dates (they aren't all in the format as depicted above) or otherwise catch them so they don't get added erroneously to my google calendar?
I would recommend using the dateparser library:
$ pip install dateparser
example:
>>> import dateparser
>>> dateparser.parse('28 Feb 17')
datetime.datetime(2017, 2, 28, 0, 0)
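If the scraped sites only use a handful of layouts, an explicit list of strptime patterns avoids any guessing about the year and also gives a way to catch strings that do not match. This is a minimal sketch: the helper name parse_event_date and the pattern list are hypothetical, and %b / %B match English month names only under the default C or an English locale.

```python
from datetime import datetime

# Hypothetical list of layouts seen on the scraped sites; extend as needed.
KNOWN_FORMATS = ('%d %b %y', '%d %b %Y', '%B %d, %Y')


def parse_event_date(text, formats=KNOWN_FORMATS):
    """Return a datetime for the first layout that matches, or None."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this layout did not match, try the next one
    return None


event = parse_event_date('28 Feb 17')
if event is None:
    # Unrecognised layout: skip it instead of adding it to the calendar.
    pass
```

Returning None for unknown layouts lets the scraper skip or log those events rather than insert them with a wrong date.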

How do I get the correct date format string for a given locale without setting that locale program-wide in Python?

I'm trying to generate dictionaries that include dates which can be asked to be given in specific locales. For example, I might want my function to return this when called with en_US as an argument:
{'date': 'May 12, 2014', ...}
And this, when called with hu_HU:
{'date': '2014. május 12.', ...}
Now, based on what I've found so far, I should be using locale.setlocale() to set the locale I want to use, and locale.nl_langinfo(locale.D_FMT) to get the appropriate date format. I could call locale.resetlocale() after this to go back to the previously used one, but my program uses multiple threads, and I assume the other ones will be affected by this temporary locale change as well.
There is the non-standard babel module which offers this and a lot more:
>>> import babel.dates
>>> babel.dates.format_datetime(locale='ru_RU')
'12 мая 2014 г., 8:24:08'
>>> babel.dates.format_datetime(locale='de_DE')
'12.05.2014 08:24:14'
>>> babel.dates.format_datetime(locale='en_GB')
'12 May 2014 08:24:16'
>>> from datetime import datetime, timedelta
>>> babel.dates.format_datetime(datetime(2014, 4, 1), locale='en_GB')
'1 Apr 2014 00:00:00'
>>> babel.dates.format_timedelta(datetime.now() - datetime(2014, 4, 1),
...                              locale='en_GB')
'1 month'
It might be expensive to do, but how about building a dictionary mapping each locale code to its date format before you start your threads?
import locale
time_format = {}
saved = locale.setlocale(locale.LC_TIME)  # remember the current setting
for lang in lang_codes:
    locale.setlocale(locale.LC_TIME, lang)  # setlocale takes the category first
    time_format[lang] = locale.nl_langinfo(locale.D_FMT)
locale.setlocale(locale.LC_TIME, saved)  # restore the previous locale
You can then use the format strings whenever you need the date in a specific format: time.strftime(time_format[lang], t). Of course, this doesn't take care of GMT shifts.
A better way would be to find out where locale gets its mapping from, but I don't know that and don't have the time to investigate right now.
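If the locale does have to be switched at run time, the switch can at least be serialised behind a lock. The sketch below is an assumption-laden illustration, not a full fix: temporary_locale and format_date_in are hypothetical helpers, and the lock only protects code that goes through them. A thread that formats dates directly while the locale is switched will still see the temporary setting.

```python
import locale
import threading
from contextlib import contextmanager
from datetime import date

# One process-wide lock; re-entrant so the helper can be nested in a thread.
_locale_lock = threading.RLock()


@contextmanager
def temporary_locale(name, category=locale.LC_TIME):
    """Switch the given locale category for the duration of the block."""
    with _locale_lock:
        saved = locale.setlocale(category)  # query the current setting
        try:
            yield locale.setlocale(category, name)
        finally:
            locale.setlocale(category, saved)  # always restore


def format_date_in(name, value, fmt='%x'):
    """Format a date under the named locale ('%x' is the locale's date format)."""
    with temporary_locale(name):
        return value.strftime(fmt)


# 'C' is always available; names such as 'hu_HU.UTF-8' must be installed.
text = format_date_in('C', date(2014, 5, 12), '%B %d, %Y')
```

locale.setlocale raises locale.Error when the requested locale is not installed on the machine, so callers may want to catch that.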

Convert date formats to another with Python

I download RSS content from different countries with Python, but each of them use their own datetime format or time zone. For instance,
Wed, 23 Oct 2013 17:44:13 GMT
23 Oct 2013 18:21:04 +0100
23 Oct 2013 13:12:41 EDT
10-23-2013 00:12:24
At the moment, my solution is to create a different function for each RSS source and change the date to a format I will decide. But is there any way to do this automatically?
Not really. But take a look at the feedparser lib.
Different feed types and versions use wildly different date formats.
Universal Feed Parser will attempt to auto-detect the date format used
in any date element, and parse it into a standard Python 9-tuple, as
documented in the Python time module.
Judging from its list of Recognized Date Formats, the library could get you part of the way there :)
Best of luck
You can try using the dateutil module to parse the datetime.
It provides functionality to parse most of the known datetime formats. Here is an example from the docs:
>>> from dateutil.parser import *
>>> parse("Thu Sep 25 10:36:28 2003")
datetime.datetime(2003, 9, 25, 10, 36, 28)
It returns a datetime object which can be directly used for manipulation. You can then also use strftime to convert it to the required format string.
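For the four sample strings in the question, the standard library already goes a long way. The sketch below tries email.utils.parsedate_to_datetime first (it understands the RFC 2822 style dates, including GMT, EDT and numeric offsets) and then falls back to an explicit strptime pattern. The helper name parse_feed_date and the fallback list are assumptions, as is reading 10-23-2013 as month-day-year.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed extra layouts for feeds that do not follow RFC 2822.
FALLBACK_FORMATS = ('%m-%d-%Y %H:%M:%S',)


def parse_feed_date(text):
    """Return a datetime (aware when the feed gave a zone), or None."""
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        pass
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.strptime(text, fmt)  # naive: no zone was given
        except ValueError:
            continue
    return None


samples = [
    'Wed, 23 Oct 2013 17:44:13 GMT',
    '23 Oct 2013 18:21:04 +0100',
    '23 Oct 2013 13:12:41 EDT',
    '10-23-2013 00:12:24',
]
parsed = [parse_feed_date(s) for s in samples]

# Aware results can be normalised to UTC before storing or comparing them.
in_utc = parsed[1].astimezone(timezone.utc)
```

The last sample carries no timezone, so it comes back as a naive datetime; which zone to assume for it has to be decided per feed.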

Python strptime or alternative for complex date string parsing

I have been given a large list of date-time representations that need to be read into a database. I am using Python (because it rocks). The strings are in a terrible, terrible format where they are not precise to seconds, no timezone is stated, and the hours do not have a leading 0. So they look more like this:
April 29, 2013, 7:52 p.m.
April 30, 2013, 4 p.m.
You'll notice that if something happens between 4:00 and 4:01 it drops the minutes, too (ugh). Anyway, I'm trying to parse these with time.strptime, but the docs state that hours must be decimal numbers in [01,12] (or [00,23]). Since nothing is zero-padded, I'm wondering if there is something else I can pass to strptime so that it accepts hours without a leading 0, or if I should try splitting and then padding the strings, or use some other method of constructing the datetime object.
Also, it does not look like strptime accepts AM/PM as "A.M." or "P.M.", so I'll have to correct that as well. . .
Note, I am not able to just handle these strings in a batch. I receive them one-at-a-time from a foreign application which sometimes uses nicely formatted Unix epoch timestamps, but occasionally uses this format. Processing them on the fly is the only option.
I am using Python 2.7 with some Python 3 features imported.
from __future__ import (print_function, unicode_literals)
The most flexible parser is part of the dateutil package; it eats your input for breakfast:
>>> from dateutil import parser
>>> parser.parse('April 29, 2013, 7:52 p.m.')
datetime.datetime(2013, 4, 29, 19, 52)
>>> parser.parse('April 30, 2013, 4 p.m.')
datetime.datetime(2013, 4, 30, 16, 0)
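If an extra dependency is unwanted, the two layouts from the question can also be handled with strptime after a small clean-up step. This is a sketch under assumptions: parse_loose_timestamp is a hypothetical helper, only the two layouts shown are covered, and %B / %p rely on the default C or an English locale. In practice strptime accepts single-digit hours for %I, so no padding is needed.

```python
import re
from datetime import datetime

# With minutes, then without (the source drops ":00").
FORMATS = ('%B %d, %Y, %I:%M %p', '%B %d, %Y, %I %p')

# Matches "a.m." / "p.m." in any letter case.
MERIDIEM_RE = re.compile(r'\b([ap])\.m\.', flags=re.IGNORECASE)


def parse_loose_timestamp(text):
    """Return a naive datetime, or None if no known layout matches."""
    # Turn "p.m." into "PM" so that %p can match it.
    cleaned = MERIDIEM_RE.sub(lambda m: m.group(1).upper() + 'M', text.strip())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # layout did not match, try the next one
    return None


first = parse_loose_timestamp('April 29, 2013, 7:52 p.m.')
second = parse_loose_timestamp('April 30, 2013, 4 p.m.')
```

Because each string is handled on its own, the helper fits the one-at-a-time processing described in the question; a None result signals that the value is in some other format, such as an epoch timestamp.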
