Options for converting between localized strings and datetime objects - python

I need to (a) parse a localized string into a datetime object and (b) use a datetime object to generate a localized string.
The problem is that the default locale ("de_DE") does not match the localized string ("en_US").
What options are there to implement the described conversions?

Please note that I have not studied all available libraries and just want to provide a starting point for others to solve their problems.
I will use the following localized string in all examples:
dt_str = "Thu 3 Apr 2014 13:19:52" # en_US
1. Methods of 'datetime.datetime'
This is the simplest approach, but it becomes unwieldy if you use locales other than the default locale.
Information about the syntax can be found in the documentation of the strftime() and strptime() format codes.
import datetime
dt = datetime.datetime.strptime(dt_str,
"%a %d %b %Y %H:%M:%S")
# datetime.datetime(2014, 4, 3, 13, 19, 52)
s = dt.strftime("%a %-d %b %Y %H:%M:%S") # %-d (day without zero padding) is a platform extension; on Windows use %#d
# 'Thu 3 Apr 2014 13:19:52'
If you want to parse or format strings in other locales, you have to change the global locale, which can have undesired side effects. (I do not recommend this approach.)
import datetime
import locale
locale.setlocale(locale.LC_TIME, "de_DE.UTF-8")
dt = datetime.datetime.strptime("Do 3 Apr 2014 13:19:52",
"%a %d %b %Y %H:%M:%S")
# datetime.datetime(2014, 4, 3, 13, 19, 52)
s = dt.strftime("%a %-d %b %Y %H:%M:%S")
# 'Do 3 Apr 2014 13:19:52'
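If you do have to switch the global locale, the side effects can at least be contained by restoring the previous setting afterwards. The following is only a sketch: the helper name temporary_time_locale and the lock are my own assumptions, not part of the standard library, and the example uses the "C" locale because "de_DE.UTF-8" only works where that locale is installed.

```python
import contextlib
import datetime
import locale
import threading

# setlocale() changes process-wide state, so serialize access (assumed helper lock).
_LOCALE_LOCK = threading.Lock()


@contextlib.contextmanager
def temporary_time_locale(name):
    """Hypothetical helper: set LC_TIME to `name` inside the block, then restore it."""
    with _LOCALE_LOCK:
        saved = locale.setlocale(locale.LC_TIME)  # no second argument: query only
        try:
            yield locale.setlocale(locale.LC_TIME, name)
        finally:
            locale.setlocale(locale.LC_TIME, saved)


before = locale.setlocale(locale.LC_TIME)

# Replace "C" with e.g. "de_DE.UTF-8" if that locale exists on your system.
with temporary_time_locale("C"):
    dt = datetime.datetime.strptime("Thu 3 Apr 2014 13:19:52",
                                    "%a %d %b %Y %H:%M:%S")
    s = dt.strftime("%a %d %b %Y %H:%M:%S")

after = locale.setlocale(locale.LC_TIME)  # same value as `before`
```

Other threads that format dates while the block is active still see the switched locale, which is why the libraries below are usually the better choice.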
2. The python library 'arrow'
arrow allows you to pass a locale (otherwise it uses "en_US").
import arrow # installed via pip
import datetime
### localized string -> datetime
a_dt = arrow.get(dt_str,
"ddd D MMM YYYY H:mm:ss",
locale="en_US") # "en_US" is also the default, so this is just for clarification
dt = a_dt.datetime
# datetime.datetime(2014, 4, 3, 13, 19, 52, tzinfo=tzutc())
### datetime -> localized string
a_s = arrow.get(datetime.datetime(2014, 5, 17, 14, 0, 0))
s = a_s.format("ddd D MMM YYYY H:mm:ss",
locale="en_US") # "en_US" is also the default, so this is just for clarification
# 'Sat 17 May 2014 14:00:00'
This is especially useful if the desired locale differs from the default locale.
3. The python library 'babel' (mainly for formatting)
The strength of babel is formatting. (Unfortunately, it does not seem able to parse arbitrary patterns; only the "short" format seems reliable.)
import babel.dates # 'babel' installed via pip
import datetime
dt = datetime.datetime(2014, 4, 3, 13, 19, 52)
# parsing is the problem with babel, therefore I created the datetime object directly.
s = babel.dates.format_datetime(dt,
"EEE d MMM yyyy H:mm:ss",
locale="en_US")
# 'Thu 3 Apr 2014 13:19:52'
4. The python library 'dateparser' (only parsing)
dateparser is very powerful. It can look up dates in longer texts and supports non-Gregorian calendar systems, to name just a few features.
import dateparser # installed via pip
dt = dateparser.parse(dt_str,
date_formats=["%a %d %b %Y %H:%M:%S"],
languages=["en"])
# datetime.datetime(2014, 4, 3, 13, 19, 52)
5. Last but not least
The following noteworthy python libraries have great features, but unfortunately I could not use them for this specific problem (or did not know how to use them properly).
maya
delorean
pendulum

Related

Why does Python's datetime strptime() not set timezone when %Z is specified in a string? [duplicate]

I have a CSV dumpfile from a Blackberry IPD backup, created using IPDDump.
The date/time strings in here look something like this
(where EST is an Australian time-zone):
Tue Jun 22 07:46:22 EST 2010
I need to be able to parse this date in Python. At first, I tried to use the strptime() function from datetime.
>>> datetime.datetime.strptime('Tue Jun 22 12:10:20 2010 EST', '%a %b %d %H:%M:%S %Y %Z')
However, for some reason, the datetime object that comes back doesn't seem to have any tzinfo associated with it.
I did read on this page that apparently datetime.strptime silently discards tzinfo, however, I checked the documentation, and I can't find anything to that effect documented here.
Is there any way to get strptime() to play nicely with timezones?
I recommend using python-dateutil. Its parser has been able to parse every date format I've thrown at it so far.
>>> from dateutil import parser
>>> parser.parse("Tue Jun 22 07:46:22 EST 2010")
datetime.datetime(2010, 6, 22, 7, 46, 22, tzinfo=tzlocal())
>>> parser.parse("Fri, 11 Nov 2011 03:18:09 -0400")
datetime.datetime(2011, 11, 11, 3, 18, 9, tzinfo=tzoffset(None, -14400))
>>> parser.parse("Sun")
datetime.datetime(2011, 12, 18, 0, 0)
>>> parser.parse("10-11-08")
datetime.datetime(2008, 10, 11, 0, 0)
and so on. No dealing with strptime() format nonsense... just throw a date at it and it Does The Right Thing.
The datetime module documentation says:
Return a datetime corresponding to date_string, parsed according to format. This is equivalent to datetime(*(time.strptime(date_string, format)[0:6])).
See that [0:6]? That gets you (year, month, day, hour, minute, second). Nothing else. No mention of timezones.
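The equivalence quoted above can be checked directly. This is a small sketch that leaves the timezone token out of the string so that it parses on any platform; the variable names are only for illustration.

```python
import time
from datetime import datetime

st = time.strptime("Tue Jun 22 12:10:20 2010", "%a %b %d %H:%M:%S %Y")

fields = st[0:6]        # (year, month, day, hour, minute, second)
dt = datetime(*fields)  # no tzinfo argument is passed, so the result is naive

print(fields)     # (2010, 6, 22, 12, 10, 20)
print(dt.tzinfo)  # None
```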
Interestingly, [Win XP SP2, Python 2.6, 2.7] passing your example to time.strptime doesn't work but if you strip off the " %Z" and the " EST" it does work. Also using "UTC" or "GMT" instead of "EST" works. "PST" and "MEZ" don't work. Puzzling.
It's worth noting this has been updated as of version 3.2 and the same documentation now also states the following:
When the %z directive is provided to the strptime() method, an aware datetime object will be produced. The tzinfo of the result will be set to a timezone instance.
Note that this doesn't work with %Z, so the case is important. See the following example:
In [1]: from datetime import datetime
In [2]: start_time = datetime.strptime('2018-04-18-17-04-30-AEST','%Y-%m-%d-%H-%M-%S-%Z')
In [3]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: None
In [4]: start_time = datetime.strptime('2018-04-18-17-04-30-+1000','%Y-%m-%d-%H-%M-%S-%z')
In [5]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: UTC+10:00
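A short sketch of the same contrast that does not depend on the local timezone of the machine: it assumes Python 3.2+ and uses the name "UTC", which %Z accepts everywhere, instead of "AEST". Once %z has produced an aware object, converting it to UTC is a single call.

```python
from datetime import datetime, timezone

fmt_date = "%Y-%m-%d-%H-%M-%S"

# %Z matches the name, but the result stays naive.
named = datetime.strptime("2018-04-18-17-04-30-UTC", fmt_date + "-%Z")
print(named.tzinfo)  # None

# %z parses the numeric offset and yields an aware datetime.
offset = datetime.strptime("2018-04-18-17-04-30-+1000", fmt_date + "-%z")
print(offset.utcoffset())  # 10:00:00

as_utc = offset.astimezone(timezone.utc)
print(as_utc.isoformat())  # 2018-04-18T07:04:30+00:00
```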
Since strptime returns a datetime object, which has a tzinfo attribute, we can simply replace it with the desired timezone.
>>> import datetime
>>> date_time_str = '2018-06-29 08:15:27.243860'
>>> date_time_obj = datetime.datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S.%f').replace(tzinfo=datetime.timezone.utc)
>>> date_time_obj.tzname()
'UTC'
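Note that replace(tzinfo=...) only labels the existing wall-clock time; it does not convert anything. The sketch below contrasts it with astimezone(), which keeps the instant and shifts the wall clock. The +02:00 offset is an arbitrary value chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

naive = datetime.strptime("2018-06-29 08:15:27.243860", "%Y-%m-%d %H:%M:%S.%f")

# Same wall-clock time, now declared to be UTC.
labeled = naive.replace(tzinfo=timezone.utc)

# Same instant, expressed in a zone two hours ahead of UTC.
shifted = labeled.astimezone(timezone(timedelta(hours=2)))

print(labeled.isoformat())  # 2018-06-29T08:15:27.243860+00:00
print(shifted.isoformat())  # 2018-06-29T10:15:27.243860+02:00
print(labeled == shifted)   # True
```

So replace() is only correct when you already know which timezone the naive value was recorded in.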
Your time string is similar to the time format in RFC 2822 (the date format used in email and HTTP headers). You could parse it using only the stdlib:
>>> from email.utils import parsedate_tz
>>> parsedate_tz('Tue Jun 22 07:46:22 EST 2010')
(2010, 6, 22, 7, 46, 22, 0, 1, -1, -18000)
See solutions that yield timezone-aware datetime objects for various Python versions: parsing date with timezone from an email.
In this format, EST is semantically equivalent to -0500. In general, though, a timezone abbreviation is not enough to identify a timezone uniquely.
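Because an abbreviation is ambiguous, one way out is to state the intended meaning yourself. The following is a sketch with stdlib only; the lookup table and the function name parse_with_abbreviation are hypothetical, and the +10:00 value assumes the Australian meaning of "EST" given in the question.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table, not part of any library. You decide what each
# abbreviation means for your data; +10:00 is the Australian "EST" (assumption).
ABBREVIATION_TO_TZ = {
    "EST": timezone(timedelta(hours=10), "EST"),
    "UTC": timezone.utc,
}


def parse_with_abbreviation(text):
    """Parse strings like 'Tue Jun 22 07:46:22 EST 2010' into an aware datetime."""
    weekday, month, day, clock, abbreviation, year = text.split()
    naive = datetime.strptime(
        " ".join([weekday, month, day, clock, year]), "%a %b %d %H:%M:%S %Y"
    )
    # An unknown abbreviation raises KeyError instead of being guessed.
    return naive.replace(tzinfo=ABBREVIATION_TO_TZ[abbreviation])


dt = parse_with_abbreviation("Tue Jun 22 07:46:22 EST 2010")
print(dt.isoformat())  # 2010-06-22T07:46:22+10:00
```

A fixed offset ignores daylight saving time, so this is only adequate when the abbreviation itself already pins down the offset.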
Ran into this exact problem.
What I ended up doing:
# starting with date string
sdt = "20190901"
sdt_format = '%Y%m%d'
# create naive datetime object
from datetime import datetime
dt = datetime.strptime(sdt, sdt_format)
# extract the relevant date time items
dt_formatters = ['%Y','%m','%d']
dt_vals = tuple(map(lambda formatter: int(datetime.strftime(dt,formatter)), dt_formatters))
# set timezone
import pendulum
tz = pendulum.timezone('utc')
dt_tz = datetime(*dt_vals,tzinfo=tz)

Turn a String into a Python Date object

I have a String, Sent: Fri Sep 18 00:30:12 2009 that I want to turn into a Python date object.
I know there's a strptime() function that can be used like so:
>>> dt_str = '9/24/2010 5:03:29 PM'
>>> dt_obj = datetime.strptime(dt_str, '%m/%d/%Y %I:%M:%S %p')
>>> dt_obj
datetime.datetime(2010, 9, 24, 17, 3, 29)
Can anybody think of an easier way to accomplish this than going through a bunch of conditionals to parse out if Sep, month = 9?
To parse an RFC 822-like date-time string, you could use the email package from the stdlib:
>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('Fri Sep 18 00:30:12 2009')
datetime.datetime(2009, 9, 18, 0, 30, 12)
This is Python 3 code, see Python 2.6+ compatible code.
You could also provide the explicit format string:
>>> from datetime import datetime
>>> datetime.strptime('Fri Sep 18 00:30:12 2009', '%a %b %d %H:%M:%S %Y')
datetime.datetime(2009, 9, 18, 0, 30, 12)
See the table with the format codes.
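Applied to the string from the question, the "Sent: " prefix has to be removed before the format string matches. A minimal sketch, assuming the prefix is always the literal text "Sent: ":

```python
from datetime import datetime

raw = "Sent: Fri Sep 18 00:30:12 2009"
prefix = "Sent: "  # assumed to be constant in the input data

# Drop the prefix if present, then parse the remainder with an explicit format.
stamp = raw[len(prefix):] if raw.startswith(prefix) else raw
dt = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")

print(dt)         # 2009-09-18 00:30:12
print(dt.date())  # 2009-09-18
```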
Use the python-dateutil library!
First, pip install python-dateutil (into your virtualenv, if you have one). Then you can run the following code:
from dateutil import parser
s = u'Sent: Fri Sep 18 00:30:12 2009'
date = parser.parse(s.split(':', 1)[-1])

How to convert date like "Apr 15 2014 16:21:16 UTC" to UTC time using python

I have dates in the following format that are used to name zip files:
Apr 15 2014 16:21:16 UTC
I would like to convert that to UTC numbers using Python. Does python recognize the 3-character month?
Use:
import datetime
datetime.datetime.strptime(yourstring, '%b %d %Y %H:%M:%S UTC')
%b is the abbreviated month name. By default, Python uses the C (English) locale, regardless of environment variables used.
Demo:
>>> import datetime
>>> yourstring = 'Apr 15 2014 16:21:16 UTC'
>>> datetime.datetime.strptime(yourstring, '%b %d %Y %H:%M:%S UTC')
datetime.datetime(2014, 4, 15, 16, 21, 16)
The resulting value is timezone-naive, which is fine for UTC timestamps provided you don't mix it with local-time objects (e.g. stick to datetime.datetime.utcnow() and similar methods).
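If "UTC numbers" means seconds since the epoch, one option on Python 3 is to attach timezone.utc explicitly before calling timestamp(), so that the naive value is not interpreted as local time. This is a sketch of that reading of the question, not the only possible one.

```python
from datetime import datetime, timezone

yourstring = "Apr 15 2014 16:21:16 UTC"

# Parse the literal "UTC" suffix, then mark the result as UTC explicitly.
dt = datetime.strptime(yourstring, "%b %d %Y %H:%M:%S UTC").replace(tzinfo=timezone.utc)

epoch = int(dt.timestamp())
print(dt.isoformat())  # 2014-04-15T16:21:16+00:00
print(epoch)           # 1397578876
```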
An easier way is to use dateutil:
>>> from dateutil import parser
>>> parser.parse("Apr 15 2014 16:21:16 UTC")
datetime.datetime(2014, 4, 15, 16, 21, 16, tzinfo=tzutc())
Timezone is handled, and it supports other common datetime formats as well.

How can I convert a timestamp string with timezone offset to local time?

I am trying to convert a string timestamp into a proper datetime object. The problem I am having is that there is a timezone offset and everything I am doing doesn't seem to work.
Ultimately I want to convert the string timestamp into a datetime object in my machine's timezone.
# string timestamp
date = "Fri, 16 Jul 2010 07:08:23 -0700"
The dateutil package is handy for parsing date/times:
In [10]: date = u"Fri, 16 Jul 2010 07:08:23 -0700"
In [11]: from dateutil.parser import parse
In [12]: parse(date)
Out[12]: datetime.datetime(2010, 7, 16, 7, 8, 23, tzinfo=tzoffset(None, -25200))
Finally, to convert into your local timezone,
In [13]: parse(date).astimezone(YOUR_LOCAL_TIMEZONE)
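On Python 3 the same steps also work without third-party packages. In this sketch, email.utils.parsedate_to_datetime() does the parsing, and astimezone() called without an argument converts to the system's local timezone, so no YOUR_LOCAL_TIMEZONE placeholder is needed.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

aware = parsedate_to_datetime("Fri, 16 Jul 2010 07:08:23 -0700")
print(aware.isoformat())   # 2010-07-16T07:08:23-07:00

# No argument: convert to the local timezone of the machine running the code.
local = aware.astimezone()

# For comparison, the same instant in UTC.
as_utc = aware.astimezone(timezone.utc)
print(as_utc.isoformat())  # 2010-07-16T14:08:23+00:00
```

The printed value of local depends on the machine, but it always compares equal to aware because both describe the same instant.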
It looks like datetime.datetime.strptime(d, '%a, %d %b %Y %H:%M:%S %z') should work, but according to this bug report there are issues with the %z processing (strptime() only supports %z as of Python 3.2). So on older versions you'll probably have to handle the timezone on your own:
import datetime
d = u"Fri, 16 Jul 2010 07:08:23 -0700"
# split off the trailing "+HHMM" / "-HHMM" offset
d, tz_info = d[:-5], d[-5:]
neg, hours, minutes = tz_info[0], int(tz_info[1:3]), int(tz_info[3:])
if neg == '-':
    hours, minutes = hours * -1, minutes * -1
d = datetime.datetime.strptime(d, '%a, %d %b %Y %H:%M:%S ')
print(d)  # naive wall-clock time exactly as written in the string
# UTC = wall-clock time minus the UTC offset (here: 14:08:23)
print(d - datetime.timedelta(hours=hours, minutes=minutes))
Here's a stdlib solution:
>>> from datetime import datetime
>>> from email.utils import mktime_tz, parsedate_tz
>>> datetime.fromtimestamp(mktime_tz(parsedate_tz(u"Fri, 16 Jul 2010 07:08:23 -0700")))
datetime.datetime(2010, 7, 16, 16, 8, 23) # your local time may be different
See also, Python: parsing date with timezone from an email.
Note: fromtimestamp() may fail if the local timezone had different UTC offset in the past (2010) and if it does not use a historical timezone database on the given platform. To fix it, you could use tzlocal.get_localzone(), to get a pytz tzinfo object representing your local timezone. pytz provides access to the tz database in a portable manner:
>>> timestamp = mktime_tz(parsedate_tz(u"Fri, 16 Jul 2010 07:08:23 -0700"))
>>> import tzlocal # $ pip install tzlocal
>>> str(datetime.fromtimestamp(timestamp, tzlocal.get_localzone()))
'2010-07-16 16:08:23+02:00'
