Format error when trying to parse datetime string using strptime - python

Trying to parse a datetime string to unix:
from calendar import timegm
from datetime import datetime
print(timegm(datetime.strptime(('2021-07-21 00:00:07.223977216+00:00'), '%Y-%m-%d %H:%M:%S.%f')))
Results in
ValueError: time data '2021-07-21 00:00:07.223977216+00:00' does not match format '%Y-%m-%d %H:%M:%S.%f+00:00'
Tried a lot, can't get anywhere so far ...

Your date is in ISO format, so you can use datetime.fromisoformat:
>>> from datetime import datetime
>>> datetime.fromisoformat("2021-07-21 00:00:07+00:00")
datetime.datetime(2021, 7, 21, 0, 0, 7, tzinfo=datetime.timezone.utc)

So if you look at the format of the date time you are providing, it does not have the fractional seconds the formatter is looking for:
# '2021-07-21 00:00:07+00:00' <- this date time
# '%Y-%m-%d %H:%M:%S.%f' <- in this format, is parsing like below
# 'YYYY-MM-DD HH:MM:SS.ff' (note the ff part is missing, then there's a +00:00 part leftover so the format is breaking)
If you don't need the microseconds, remove the '.%f' part from your format string. Otherwise, if you're parsing a series of values where some have the fractional part, you're going to need to give both options:
try:
    timestamp = datetime.strptime(your_string_here, '%Y-%m-%d %H:%M:%S.%f')
except ValueError:
    timestamp = datetime.strptime(your_string_here, '%Y-%m-%d %H:%M:%S')
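For the string as first posted in the question (nine fractional digits plus a +00:00 offset), one possible approach is to trim the fraction to the six digits that %f accepts and let %z consume the offset. This is only a sketch: the helper name to_unix is made up for illustration, and it assumes Python 3.7 or later, where %z accepts an offset written with a colon.

```python
import re
from datetime import datetime

def to_unix(s):
    """Hypothetical helper: parse 'YYYY-MM-DD HH:MM:SS.fffffffff+HH:MM' to a Unix timestamp."""
    # %f accepts at most 6 digits, so drop any digits beyond microseconds.
    trimmed = re.sub(r'(\.\d{6})\d+', r'\1', s)
    # Assumes Python 3.7+, where %z also matches offsets such as +00:00.
    parsed = datetime.strptime(trimmed, '%Y-%m-%d %H:%M:%S.%f%z')
    # timestamp() on an aware datetime returns seconds since the Unix epoch.
    return parsed.timestamp()

print(to_unix('2021-07-21 00:00:07.223977216+00:00'))
```

Digits past the sixth are discarded rather than rounded, so the result is accurate to the microsecond at best.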

Related

parsing datetime beyond microseconds in Python in unknown ISO format

I'm trying to parse this datetime:
t = '2021-08-21 11:23:45.180999936'
using datetime strptime function:
from datetime import datetime
datetime.strptime(t, '%Y-%m-%d %H:%M:%S.%f').time()
I'm struggling with the last element of the datetime, which I assume to be microseconds (%f), but get this error:
ValueError: unconverted data remains: 936
So if I understood the value error says the datetime is three digits too long for the last part to be a microsecond. What is the right way of parsing this datetime if not with microseconds? What is the ISO format of this datetime?
My question is related to this (unanswered) question with a different (related?) format (with Z-suffix).
In python, strftime and strptime allow only up to 6 decimal places. They aren't fully ISO 8601.
%f Microsecond as a decimal number, zero-padded to 6 digits.
Taken from the datetime.datetime.fromisoformat documentation:
Caution This does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of datetime.isoformat(). A more full-featured ISO 8601 parser, dateutil.parser.isoparse is available in the third-party package dateutil.
There are 2 ways to parse the string to datetime or Timestamp objects
import pandas as pd
t = '2021-08-21 11:23:45.180999936'
t1 = pd.Timestamp(t)
t2 = pd.to_datetime(t)
The output is Timestamp object
Timestamp('2021-08-21 11:23:45.180999936')
Another way is to use the standard library's datetime module, dropping the fractional part:
from datetime import datetime
t = '2021-08-21 11:23:45.180999936'
t3 = datetime.fromisoformat(t.split('.')[0])
t4 = datetime.strptime(t.split('.')[0], '%Y-%m-%d %H:%M:%S')
The output is datetime object
datetime.datetime(2021, 8, 21, 11, 23, 45)
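If the microseconds matter, a variation on the idea above is to keep the first six fractional digits instead of dropping the fraction entirely. A minimal sketch, assuming the string always contains a fractional part:

```python
from datetime import datetime

t = '2021-08-21 11:23:45.180999936'

# Split at the decimal point and keep at most 6 fractional digits (the %f limit).
head, dot, frac = t.partition('.')
truncated = head + dot + frac[:6]

parsed = datetime.strptime(truncated, '%Y-%m-%d %H:%M:%S.%f')
print(parsed)  # 2021-08-21 11:23:45.180999
```

The last three digits (nanoseconds) are truncated, not rounded.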

Convert an unusual/custom time format to datetime object

I have an unusual datetime format in my dataset, which I need to convert to usable datetime object.
An example looks like: '1/3/2018 1:29:35 PM(UTC+0)'
I have tried to parse it with:
from dateutil.parser import parse
parse('1/3/2018 1:29:35 PM(UTC+0)')
but it doesn't recognize the format.
My current workaround is to parse the datetime column (the data is in pandas dataframe) using regex into two columns, like so:
and then depending on the value of the 'utc' column apply custom convert_to_eastern function.
I wonder if there is an easier way to accomplish it using datetime.datetime.strptime() ?
Following didn't work:
import datetime as dt
my_time='1/3/2018 1:29:35 PM(UTC+0)'
dt.datetime.strptime(my_time, '%m/%d/%Y %I:%M:%S %p(%z)')
Addition:
This is not a question: "How to convert UTC timezone into local timezone" My dataset has rows with UTC as well as Eastern time zone rows. The problem I have is that the format is not an ISO format, but some human-readable custom format.
Question: an easier way to accomplish it using datetime.datetime.strptime()
Split the datestring into parts: utc:[('1/3/2018 1:29:35 PM', '(UTC+0)', 'UTC', '+', '0')]
Rebuild the datestring, fixing the hour part padding with 0 to 2 digits.
I assume, there are no minutes in the UTC part, therefore defaults to 00.
If the datestring has more than 2 UTC digits, returns the unchanged datestring.
Note: The strptime format has to be %Z%z!
Documentation: strftime-and-strptime-behavior
from datetime import datetime
import re
def fix_UTC(s):
    utc = re.findall(r'(.+?)(\((\w{3})(\+|\-)(\d{1,2})\))', s)
    if utc:
        utc = utc[0]
        return '{}({}{}{})'.format(utc[0], utc[2], utc[3], '{:02}00'.format(int(utc[4])))
    else:
        return s
my_time = fix_UTC('1/3/2018 1:29:35 PM(UTC+0)')
date = datetime.strptime(my_time, '%m/%d/%Y %I:%M:%S %p(%Z%z)')
print("{} {}".format(date, date.tzinfo))
Output:
2018-01-03 13:29:35+00:00 UTC
Tested with Python: 3.4.2
The problem is with '+0' for your timezone 'UTC+0'. datetime only takes utc offset in the form of HHMM. Possible workaround:
import datetime as dt
my_time = '1/3/2018 1:29:35 PM(UTC+0)'
my_time=my_time.replace('+0','+0000')
dt.datetime.strptime(my_time, '%m/%d/%Y %I:%M:%S %p(%Z%z)')
It should be something like that:
import datetime as dt
my_time='1/3/2018 1:29:35 PM(UTC+0000)'
tmp = dt.datetime.strptime(my_time, '%m/%d/%Y %I:%M:%S %p(%Z%z)')
print(tmp)
Big "Z" for timezone (UTC, GMT etc), small "z" for delta. Also you should add more zeros to delta.
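An alternative that avoids %Z and %z altogether is to pull the offset out with a regular expression and build the tzinfo by hand. This is a sketch under assumptions: the name parse_custom is hypothetical, and it assumes the suffix is always of the form (UTC+H) or (UTC-HH) with whole hours, as in the question's example.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout: '<date and time>(UTC<sign><1-2 digit hours>)'
PATTERN = re.compile(r'^(.+?)\(UTC([+-])(\d{1,2})\)$')

def parse_custom(s):
    """Hypothetical helper: parse e.g. '1/3/2018 1:29:35 PM(UTC+0)' to an aware datetime."""
    match = PATTERN.match(s)
    if match is None:
        raise ValueError('unrecognised timestamp: {!r}'.format(s))
    stamp, sign, hours = match.groups()
    offset = timedelta(hours=int(hours))
    if sign == '-':
        offset = -offset
    # Parse the date and time without any timezone directive.
    naive = datetime.strptime(stamp.strip(), '%m/%d/%Y %I:%M:%S %p')
    # Attach the fixed offset taken from the suffix.
    return naive.replace(tzinfo=timezone(offset))

print(parse_custom('1/3/2018 1:29:35 PM(UTC+0)').isoformat())  # 2018-01-03T13:29:35+00:00
print(parse_custom('1/3/2018 1:29:35 PM(UTC-5)').isoformat())  # 2018-01-03T13:29:35-05:00
```

Because the result is timezone-aware, converting every row to a single zone afterwards can be done with astimezone().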

Parse and extract values from time format in python

I am trying to parse and extract values from my time data 2018-03-11 13:15:31.734874+01:00.
I'm using strptime() to do this with the %Y %m %d %H:%M:%S.%f %Z format but I am getting this error:
ValueError: time data '2018-03-11 13:15:31.734874+01:00' does not match format '%Y %m %d %H:%M:%S.%f %Z'
Also, I don't know how to handle the +1:00 in my time data. Can anyone help?
There are two problems here to solve.
First is the format string. It should be %Y-%m-%d %H:%M:%S.%f%z to match exact date separators and timezone sequence (without space).
From strftime and strptime Behavior:
%z (lower case) UTC offset in the form +HHMM or -HHMM (empty string if the object is naive). (empty), +0000, -0400, +1030
Second is the colon (:) in timezone offset '+01:00'. That can be left out using substring: s[:-3]+s[-2:] or string substitute.
So the final answer is as below.
from datetime import datetime
s = '2018-03-11 13:15:31.734874+01:00'
datetime.strptime(s[:-3]+s[-2:], '%Y-%m-%d %H:%M:%S.%f%z')
%Y %m %d  should be changed to %Y-%m-%d to match with the time string. Also, you need to remove the last : from the input to use with %z.
Here is how you should do:
import datetime
s = '2018-03-11 13:15:31.734874+01:00'
print(datetime.datetime.strptime(''.join(s.rsplit(':', 1)), '%Y-%m-%d %H:%M:%S.%f%z'))
# 2018-03-11 13:15:31.734874+01:00
At first:
%Y %m %d will not match 2018-03-11. You need to adapt it to the time string! %Y-%m-%d instead should work.
Secondly:
If you are on python3, %z is available for parsing UTC offsets. However, the offset has to be written without the colon, e.g. +0100 instead of +01:00. Therefore, if you use python3 this works:
>>> time_string = '2018-03-11 13:15:31.734874+01:00'
>>> time_string = ''.join(time_string.rsplit(':', 1))
>>> datetime.datetime.strptime(time_string, '%Y-%m-%d %H:%M:%S.%f%z')
datetime.datetime(2018, 3, 11, 13, 15, 31, 734874, tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
Btw the time_string after the editing looks like that:
>>> time_string
'2018-03-11 13:15:31.734874+0100'
If you are on python2, %z won't work; there you have to use the parse function of the dateutil module, which is straightforward.
>>> from dateutil.parser import parse
>>> parse('2018-03-11 13:15:31.734874+01:00')
datetime.datetime(2018, 3, 11, 13, 15, 31, 734874, tzinfo=tzoffset(None, 3600))
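On newer interpreters the colon no longer needs to be stripped. The sketch below assumes Python 3.7 or later, where datetime.fromisoformat exists and %z also accepts offsets written as +01:00. It also shows how the individual values can be read off the result.

```python
from datetime import datetime

s = '2018-03-11 13:15:31.734874+01:00'

# Assumes Python 3.7+: both calls accept the '+01:00' form directly.
a = datetime.fromisoformat(s)
b = datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f%z')
assert a == b

# Extract the individual values from the parsed datetime.
print(a.year, a.hour, a.microsecond, a.utcoffset())  # 2018 13 734874 1:00:00
```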

Create a datetime from a string representation in a CSV file

I have a CSV file with recorded datetimes with a particular format:
%Y-%m-%d %H:%M:%S %Z
Example:
2017-02-11 14:11:42 PST
I am trying to format the datetime to a friendlier value to use later on.
However, I have been unable to create a datetime object with my code so far.
Here is my code:
for r in row:
    purchase_date.append(
        datetime.strptime(row['purchase-date'], "%Y/%m/%d %H:%M:%S %Z")
    )
This is the error received:
ValueError: time data '2017-02-11 14:11:42 PST' does not match format '%Y/%m/%d %H:%M:%S %Z'
Timezones are often rather wonky when trying to convert from a string. It is often best to deal with the timezone string yourself. Here is a bit of code which separates the timezone from the timestamp, and then converts them separately.
Code:
import datetime as dt
import pytz
my_timezones = dict(
    PST='US/Pacific',
)
def convert_my_datetime_str(dt_str):
    # split into time and timezone
    timestamp, tz_str = dt_str.rsplit(' ', 1)
    # convert the date string to datetime
    time = dt.datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    # get a timezone name
    tz = pytz.timezone(my_timezones[tz_str])
    # return a timezone aware datetime
    return tz.localize(time)
Test Code:
print(convert_my_datetime_str('2017-02-11 14:11:42 PST'))
Results:
2017-02-11 14:11:42-08:00
You should be able to just change the format to match your date strings. In the error, your date string has dashes instead of slashes, so make the format string match:
for r in row:
    purchase_date.append(
        datetime.strptime(row['purchase-date'], "%Y-%m-%d %H:%M:%S %Z")
    )
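A standard-library-only sketch of the "handle the timezone string yourself" idea, reading the column from a CSV. The in-memory sample file, the helper name parse_purchase_date and the abbreviation table are assumptions made for illustration. The fixed offsets ignore daylight-saving rules, so they are only appropriate when the abbreviation itself (PST vs. PDT) already says which offset applies.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Assumed mapping from abbreviation to a fixed UTC offset; extend as needed.
TZ_OFFSETS = {
    'PST': timezone(timedelta(hours=-8), 'PST'),
    'PDT': timezone(timedelta(hours=-7), 'PDT'),
}

def parse_purchase_date(text):
    """Hypothetical helper: parse e.g. '2017-02-11 14:11:42 PST' to an aware datetime."""
    # Split the trailing abbreviation off and parse the rest without %Z.
    stamp, abbrev = text.rsplit(' ', 1)
    naive = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S')
    return naive.replace(tzinfo=TZ_OFFSETS[abbrev])

# Stand-in for the real CSV file, with the column name from the question.
sample = io.StringIO('purchase-date\n2017-02-11 14:11:42 PST\n')
purchase_dates = [parse_purchase_date(row['purchase-date'])
                  for row in csv.DictReader(sample)]

print(purchase_dates[0].isoformat())  # 2017-02-11T14:11:42-08:00
```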

Datetime format does not match

I have a datetime object with integer number of seconds (ex: 2010-04-16 16:51:23). I am using the following command to extract exact time
dt = datetime.datetime.strptime(time, '%Y-%m-%d %H:%M:%S.%f')
Generally, I have decimals (ex: 2010-04-16 16:51:23.1456), but sometimes I don't. So when I run this command, I get an error message:
ValueError: time data '2010-04-16 16:51:23' does not match format '%Y-%m-%d %H:%M:%S.%f'
How do I go about resolving this?
It's because you don't have the format you specified. You have the format:
'%Y-%m-%d %H:%M:%S'
There are multiple solutions. First, always generate the data in the same format (adding .00 if you need to).
A second solution is that you try to decode in one format and if you fail, you decode using the other format:
try:
    dt = datetime.datetime.strptime(time, '%Y-%m-%d %H:%M:%S.%f')
except ValueError:
    dt = datetime.datetime.strptime(time, '%Y-%m-%d %H:%M:%S')
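The same try-then-fall-back idea generalises to any number of candidate formats. A minimal sketch follows; the function name parse_flexible and the FORMATS tuple are made up for illustration.

```python
from datetime import datetime

# Candidate formats, tried in order; the most specific one comes first.
FORMATS = (
    '%Y-%m-%d %H:%M:%S.%f',
    '%Y-%m-%d %H:%M:%S',
)

def parse_flexible(text, formats=FORMATS):
    """Hypothetical helper: return the first successful strptime result."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError('no known format matches {!r}'.format(text))

print(parse_flexible('2010-04-16 16:51:23.1456'))  # 2010-04-16 16:51:23.145600
print(parse_flexible('2010-04-16 16:51:23'))       # 2010-04-16 16:51:23
```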
Another way avoiding using the exception handling mechanism is to default the field if not present and just try processing with the one format string:
from datetime import datetime
s = '2010-04-16 16:51:23.123'
dt, secs = s.partition('.')[::2]
print(datetime.strptime('{}.{}'.format(dt, secs or '0'), '%Y-%m-%d %H:%M:%S.%f'))
if you're using the latest python (3.2+) simple-date will do this kind of thing for you:
>>> from simpledate import *
>>> SimpleDate('2010-04-16 16:51:23.1456')
SimpleDate('2010-04-16 16:51:23.145600', tz='America/Santiago')
>>> SimpleDate('2010-04-16 16:51:23')
SimpleDate('2010-04-16 16:51:23', tz='America/Santiago')
it works by extending the python template format. so you could also write (it's not needed because ISO8601-like formats are handled by default):
>>> SimpleDate('2010-04-16 16:51:23', format='Y-m-d H:M:S(.f)?')
SimpleDate('2010-04-16 16:51:23', tz='America/Santiago')
see how the fractional seconds are (.f)? like a regexp - means it's optional (also, it will add % signs if there are none).
PS and you can access the datetime via an attribute. if you wanted to discard the tzinfo (which is taken from the locale by default - i live in chile, hence America/Santiago above) to get a naive datetime:
>>> SimpleDate('2010-04-16 16:51:23').naive.datetime
datetime.datetime(2010, 4, 16, 16, 51, 23)
