Python - Convert February 29 dates to date time objects - python

I need to convert dates in string format to datetime objects, but I keep getting a ValueError for 29th February dates. Here is my code.
from datetime import datetime
def try_parsing_date(text):
    for fmt in ('%Y-%m-%d', '%d.%m.%Y', '%m/%d/%Y', '%d/%m/%Y', '%d-%b-%y', '%d/%m/%y', '%m/%d/%y', '%m/%d/%Y'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    raise ValueError(text)
df['Dateofbirth'] = df.apply(lambda row: try_parsing_date(row['Dateofbirth']), axis=1)
The error I get is ValueError: ('2/29/57', 'occurred at index 82445').
What is the best way to resolve this issue?

This is not a Python problem. 1957 wasn't a leap year, so 2/29/57 never existed. If someone claims that as their date of birth, they're lying. So you could just as well put any date into your list, or NaN.
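A minimal sketch of that idea, assuming you would rather get a placeholder than an exception for impossible dates. The `default` parameter and the `FORMATS` name are illustrative additions, not part of the original code. Note that `%y` maps 57 to 2057 (two-digit years 00 to 68 are read as 2000 to 2068), which is not a leap year either.

```python
from datetime import datetime

# Same formats as in the question, with the duplicate '%m/%d/%Y' removed.
FORMATS = ('%Y-%m-%d', '%d.%m.%Y', '%m/%d/%Y', '%d/%m/%Y',
           '%d-%b-%y', '%d/%m/%y', '%m/%d/%y')

def try_parsing_date(text, default=None):
    """Return the first successful parse, or `default` if no format fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong format or impossible date: try the next one
    return default

print(try_parsing_date('2/29/57'))     # impossible date, so the default is returned
print(try_parsing_date('1996-02-29'))  # a real leap day parses normally
```

In a DataFrame, the rows that come back as `None` can then be inspected or dropped instead of stopping the whole conversion.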

The error message I got was more specific and useful:
from datetime import datetime
datetime.strptime('1957-02-29', '%Y-%m-%d')
ValueError: day is out of range for month
BTW, there are a lot of misleading history websites reporting 'what happened' on 29 Feb '57 :-)

Related

How to resolve VariantTimeToSystemTime error while reading/converting dates?

def count_messages_in_folder(folder, message_count, sent_dates):
    for item in folder.Items:
        message_count += 1
        try:
            sent_date = item.SentOn
            if sent_date is not None and sent_date != 0:
                sent_date = sent_date.strftime('%Y-%m-%d %H:%M:%S')
                try:
                    date = datetime.datetime.strptime(sent_date, '%Y-%m-%d %H:%M:%S')
                    if (date.year >= 1998 or date.year <= datetime.datetime.now().year) and (date.month >= 1 or date.month <= 12) and (date.day >= 1 or date.day <= 31):
                        sent_dates.append(date)
                except ValueError:
                    pass
        except ValueError:
            pass
    for subfolder in folder.Folders:
        message_count, sent_dates = count_messages_in_folder(subfolder, message_count, sent_dates)
    return message_count, sent_dates
This is a snippet of my script where I read a PST file (an Outlook container that holds messages) using the win32com library. I have no issues calculating the total message count per PST file. However, I run into the following error while parsing the sent date of each message and appending it to a list. My goal is to take the sent dates from all messages in a PST file, add them to a list (sent_dates), and then find the min and max of the datetime objects in that list in another function.
error: (0, 'VariantTimeToSystemTime', 'No error message is available')
item.SentOn when initially read has the following type and format:
<class 'pywintypes.datetime'> 2001-08-30 03:48:50+00:00
I then convert it to a string in '%Y-%m-%d %H:%M:%S' format using strftime.
Finally, I convert the date string to datetime object using strptime (so I can find the min and max dates later):
<class 'datetime.datetime'> 2001-08-30 03:48:50
I noticed that some dates in the PST had wrong year/month/day (e.g., "3547" or "1980" as year). I added the if condition to check for year, month, day, but I still get the error.
I'd really appreciate any help in figuring this out.
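One thing visible in the snippet itself: each range test joins its two comparisons with `or`, so it is true for every date (3547 >= 1998 already satisfies the year test). A minimal sketch of a range check that actually filters, using a hypothetical helper `is_plausible` and the 1998 lower bound from the question. The month and day tests are dropped because a `datetime` object cannot hold out-of-range values for them.

```python
from datetime import datetime

def is_plausible(sent, min_year=1998, now=None):
    """True only if the year lies between min_year and the current year."""
    now = now or datetime.now()
    # A chained comparison requires BOTH bounds to hold, unlike `a or b`.
    return min_year <= sent.year <= now.year

reference = datetime(2024, 1, 1)  # fixed "now" so the example is repeatable
for sample in (datetime(2001, 8, 30, 3, 48, 50),
               datetime(1980, 1, 1),
               datetime(3547, 1, 1)):
    print(sample, is_plausible(sample, now=reference))
```

This does not by itself explain the VariantTimeToSystemTime error. It may also be worth checking which exception type is raised when `item.SentOn` is read, since the `except` clauses above only catch `ValueError`; that is a guess from the error text, not something the post confirms.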

Convert DataFrame column from string to datetime for format "January 1, 2001 Monday"

I am trying to convert a dataframe column "date" from string to datetime. I have this format: "January 1, 2001 Monday".
I tried to use the following:
from dateutil import parser
for index, v in df['date'].items():
    df['date'][index] = parser.parse(df['date'][index])
But it gives me the following error:
ValueError: Cannot set non-string value '2001-01-01 00:00:00' into a StringArray.
I checked the datatype of the column "date" and it tells me string type.
This is the snippet of the dataframe:
Any help would be most appreciated!
Why not try this instead of dateutil? pandas offers much simpler tools, such as the pd.to_datetime function:
df['date'] = pd.to_datetime(df['date'], format='%B %d, %Y %A')
You need to specify the format of the date string in order for it to be parsed correctly. The documentation helps with this:
%A is for Weekday as locale’s full name, e.g., Monday
%B is for Month as locale’s full name, e.g., January
%d is for Day of the month as a zero-padded decimal number.
%Y is for Year with century as a decimal number, e.g., 2021.
Combining all of them we have the following function:
from datetime import datetime
def mdy_to_ymd(d):
    return datetime.strptime(d, '%B %d, %Y %A').strftime('%Y-%m-%d')
print(mdy_to_ymd('January 1, 2001 Monday'))
> 2001-01-01
One more thing: in your case you can also apply this function to the whole column with .apply(). Note that it produces 'YYYY-MM-DD' strings rather than datetime values:
df['date'] = df['date'].apply(mdy_to_ymd)
Feel free to add Hour-Minute-Second if needed.

How to format date to 1900's?

I'm preprocessing data and one column represents dates such as '6/1/51'
I'm trying to convert the string to a date object and so far what I have is:
date = row[2].strip()
format = "%m/%d/%y"
datetime_object = datetime.strptime(date, format)
date_object = datetime_object.date()
print(date_object)
print(type(date_object))
The problem I'm facing is changing 2051 to 1951.
I tried writing
format = "%m/%d/19%y"
But it gives me a ValueError.
ValueError: time data '6/1/51' does not match format '%m/%d/19%y'
I couldn't easily find the answer online so I'm asking here. Can anyone please help me with this?
Thanks.
Parse the date without the century using '%m/%d/%y', then:
year_1900 = datetime_object.year - 100
datetime_object = datetime_object.replace(year=year_1900)
You should put a conditional around that so you only do it for dates that really belong in the 1900s, for example anything that parses to a date later than today.
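A minimal sketch of that conditional, assuming that any date parsed into the future really belongs to the previous century. The function name `parse_two_digit_year` and the `today` parameter (added so the behaviour can be checked against a fixed date) are illustrative.

```python
from datetime import date, datetime

def parse_two_digit_year(text, fmt="%m/%d/%y", today=None):
    """Parse a date with a two-digit year, shifting future dates back 100 years."""
    today = today or date.today()
    parsed = datetime.strptime(text, fmt).date()
    if parsed > today:
        # %y read the year as 20xx; assume 19xx was meant.
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed

print(parse_two_digit_year('6/1/51', today=date(2024, 1, 1)))  # shifted to 1951
print(parse_two_digit_year('6/1/10', today=date(2024, 1, 1)))  # stays in 2010
```

Whether "later than today" is the right cut-off depends on the data; for birth dates it is usually a safe rule, while for something like expiry dates it would not be.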

How to convert a string into a date object for date comparison?

I am having trouble converting a string into a date object in Python.
I want to convert the string '10 JAN 2016' to a date object so that I can compare it to the present date and get the time difference.
I tried, but I am getting a formatting error. What is the solution to this problem?
Use the datetime module.
import datetime
date = datetime.datetime.strptime('10 jan 2016', '%d %b %Y').date()
difference_in_dates = date - datetime.date.today() #this returns a timedelta object
Use datetime.timedelta objects for comparison.
You can look up the format codes (%d, %b, etc.) in the strftime/strptime section of the datetime documentation.
You need to use a valid format.
Try looking here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
dt = '10 JAN 2016'
dtime = datetime.datetime.strptime(dt, "theformat")
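For the string in the question, a format that fits is '%d %b %Y' (day, abbreviated month name, four-digit year). A minimal sketch, with a hypothetical helper `days_since` and a `today` parameter added so the result can be checked against a fixed date; month-name matching in strptime is case-insensitive, so 'JAN' works.

```python
from datetime import date, datetime

def days_since(text, fmt="%d %b %Y", today=None):
    """Return the number of days from the parsed date up to today."""
    today = today or date.today()
    parsed = datetime.strptime(text, fmt).date()
    # Subtracting two date objects gives a timedelta; .days is its day count.
    return (today - parsed).days

print(days_since('10 JAN 2016', today=date(2016, 1, 20)))  # 10
print(days_since('10 JAN 2016'))  # days elapsed up to the current date
```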

How to remove unconverted data from a Python datetime object

I have a database of mostly correct datetimes, but a few are broken like so: Sat Dec 22 12:34:08 PST 20102015
Without the invalid year, this was working for me:
end_date = soup('tr')[4].contents[1].renderContents()
end_date = time.strptime(end_date,"%a %b %d %H:%M:%S %Z %Y")
end_date = datetime.fromtimestamp(time.mktime(end_date))
But once I hit an object with an invalid year I get ValueError: unconverted data remains: 2, which is great, but I'm not sure how best to strip the bad characters out of the year. They range from 2 to 6 unconverted characters.
Any pointers? I would just slice end_date, but I'm hoping there is a datetime-safe strategy.
Unless you want to rewrite strptime (a very bad idea), the only real option you have is to slice end_date and chop off the extra characters at the end, assuming that this will give you the correct result you intend.
For example, you can catch the ValueError, slice, and try again:
def parse_prefix(line, fmt):
    try:
        t = time.strptime(line, fmt)
    except ValueError as v:
        if len(v.args) > 0 and v.args[0].startswith('unconverted data remains: '):
            line = line[:-(len(v.args[0]) - 26)]
            t = time.strptime(line, fmt)
        else:
            raise
    return t
For example:
parse_prefix(
    '2015-10-15 11:33:20.738 45162 INFO core.api.wsgi yadda yadda.',
    '%Y-%m-%d %H:%M:%S'
) # -> time.struct_time(tm_year=2015, tm_mon=10, tm_mday=15, tm_hour=11, tm_min=33, ...
Yeah, I'd just chop off the extra numbers. Assuming they are always appended to the datestring, then something like this would work:
end_date = end_date.split(" ")
end_date[-1] = end_date[-1][:4]
end_date = " ".join(end_date)
I was going to try to get the number of excess digits from the exception, but on my installed versions of Python (2.6.6 and 3.1.2) that information isn't actually there; it just says that the data does not match the format. Of course, you could just continue lopping off digits one at a time and re-parsing until you don't get an exception.
You could also write a regex that will match only valid dates, including the right number of digits in the year, but that seems like overkill.
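The "lop off one character and re-parse" idea can be sketched as follows. This is only an illustration under stated assumptions: the name `parse_trimming_tail` and the `max_trim` limit are made up here, and the sample string omits the timezone name because `%Z` only matches names known to the local machine.

```python
from datetime import datetime

def parse_trimming_tail(text, fmt, max_trim=6):
    """Parse text with fmt, dropping up to max_trim trailing characters."""
    for trimmed in range(max_trim + 1):
        candidate = text[:max(0, len(text) - trimmed)]
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            continue  # still too long (or simply invalid): trim one more
    raise ValueError(
        f"could not parse {text!r} after trimming {max_trim} characters")

fmt = '%a %b %d %H:%M:%S %Y'
print(parse_trimming_tail('Wed Dec 22 12:34:08 20102015', fmt))
```

This relies on the garbage always being appended after a complete four-digit year. With a format that ends in a field of variable width, trimming could produce a valid but wrong value, so the limit on trimmed characters is worth keeping small.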
Here's an even simpler one-liner I use:
end_date = end_date[:-4]
Improving (I hope) on Adam Rosenfield's code:
import time
for end_date in ('Fri Feb 18 20:41:47 Paris, Madrid 2011',
                 'Fri Feb 18 20:41:47 Paris, Madrid 20112015'):
    print(end_date)
    fmt = "%a %b %d %H:%M:%S %Z %Y"
    try:
        end_date = time.strptime(end_date, fmt)
    except ValueError as v:
        # Length of whatever follows 'unconverted data remains: ' in the message
        ulr = len(v.args[0].partition('unconverted data remains: ')[2])
        if ulr:
            end_date = time.strptime(end_date[:-ulr], fmt)
        else:
            raise
    print(end_date, '\n')
strptime() really expects to see a correctly formatted date, so you probably need to do some munging on the end_date string before you call it.
This is one way to chop the last item in end_date down to 4 characters:
chop = len(end_date.split()[-1]) - 4
if chop > 0:  # end_date[:-0] would be an empty string
    end_date = end_date[:-chop]
