Today one of my scripts failed with an error about an invalid datetime format in its input. The script expects dates as '%m/%d/%Y', but it received an entirely different format. For example, the date should have been 5/2/2022 but it was May 2, 2022. For clarity, the input comes from a Google Sheet and the entire date is in a single cell (rather than separate cells for month, day and year).
Is there a way to convert this kind of worded format to the desired datetime format before the script starts any kind of processing?
If the dates use the full month name, try this:
>>> pd.to_datetime(df["Date"], format="%B %d, %Y")
0 2022-05-02
Name: Date, dtype: datetime64[ns]
According to the Python docs:
%B: "Month as locale’s full name".
%d: "Day of the month as a zero-padded decimal number". (When parsing, the leading zero is optional, which is why "2" is accepted in this case.)
%Y: "Year with century as a decimal number."
Now, if you want to transform this date to the format you initially expected, just transform the series using .dt.strftime:
>>> pd.to_datetime(df["Date"], format="%B %d, %Y").dt.strftime("%m/%d/%Y")
0 05/02/2022
Name: Date, dtype: object
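If the sheet may contain a mix of formats, the same idea can be applied before any other processing by trying each candidate format in turn. The following is a minimal standard-library sketch, not a definitive implementation: the names `KNOWN_FORMATS` and `normalize_date` are hypothetical, and it assumes English month names (the `%B` and `%b` directives depend on the locale).

```python
from datetime import datetime

# Hypothetical list of formats the sheet might contain; extend as needed.
KNOWN_FORMATS = ("%m/%d/%Y", "%B %d, %Y", "%b %d, %Y")

def normalize_date(text, formats=KNOWN_FORMATS):
    """Return `text` re-formatted as mm/dd/YYYY, trying each known format in turn."""
    cleaned = text.strip()
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"Unrecognized date format: {text!r}")

print(normalize_date("May 2, 2022"))
print(normalize_date("5/2/2022"))
```

Raising on unknown input, rather than guessing, keeps unexpected formats visible instead of silently producing wrong dates.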
I am trying to convert a dataframe column "date" from string to datetime. I have this format: "January 1, 2001 Monday".
I tried to use the following:
from dateutil import parser
for index, v in df['date'].items():
    df['date'][index] = parser.parse(df['date'][index])
But it gives me the following error:
ValueError: Cannot set non-string value '2001-01-01 00:00:00' into a StringArray.
I checked the datatype of the column "date" and it tells me string type.
This is the snippet of the dataframe:
Any help would be most appreciated!
Instead of dateutil, try pandas, which offers a simpler tool for this, the pd.to_datetime function:
df['date'] = pd.to_datetime(df['date'], format='%B %d, %Y %A')
You need to specify the format of the date string for it to be parsed correctly. The documentation helps with this:
%A is for Weekday as locale’s full name, e.g., Monday
%B is for Month as locale’s full name, e.g., January
%d is for Day of the month as a zero-padded decimal number.
%Y is for Year with century as a decimal number, e.g., 2021.
Combining all of them we have the following function:
from datetime import datetime
def mdy_to_ymd(d):
    return datetime.strptime(d, '%B %d, %Y %A').strftime('%Y-%m-%d')
print(mdy_to_ymd('January 1, 2021 Monday'))
> 2021-01-01
One more thing: in your case, .apply() will be faster than the explicit loop, so the code is:
df['date'] = df['date'].apply(mdy_to_ymd)
Feel free to add Hour-Minute-Second if needed.
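As a rough illustration of adding a time component, the sketch below extends the same format with `%H:%M:%S`. The helper name `mdy_time_to_iso` and the input layout (a 24-hour time appended after the weekday, separated by a space) are assumptions made for the example, not something taken from the question's data.

```python
from datetime import datetime

def mdy_time_to_iso(text):
    # Same date directives as before, plus %H:%M:%S for a 24-hour time.
    parsed = datetime.strptime(text, "%B %d, %Y %A %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")

print(mdy_time_to_iso("January 1, 2001 Monday 13:45:30"))
```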
I have a series of dates in a format like "1OCT20" or "30MAR19". How can I convert them into datetime?
Thanks in advance.
Use pd.to_datetime with the format argument set to %d%b%y:
%d Day of the month as a zero-padded decimal number.
%b Month as locale’s abbreviated name.
%y Year without century as a zero-padded decimal number.
I usually use this https://strftime.org/ website when looking for specific datetime formats.
pd.to_datetime('1OCT20',format='%d%b%y')
Timestamp('2020-10-01 00:00:00')
pd.to_datetime('30MAR19',format='%d%b%y')
Timestamp('2019-03-30 00:00:00')
On your dataset, you can apply it directly to the column:
df['trgdate'] = pd.to_datetime(df['srcdate'],format='%d%b%y')
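The same format string also works with the standard library's `datetime.strptime`, which can help when checking a few values by hand. This is a small sketch assuming an English locale; `parse_compact` is a hypothetical helper name. It also shows how `%y` expands two-digit years: per the Python documentation, 69 to 99 map to 1969 to 1999 and 0 to 68 map to 2000 to 2068, which matters if the data contains older dates.

```python
from datetime import datetime

def parse_compact(text):
    # %d accepts "1" as well as "01"; %b matches "OCT" case-insensitively.
    return datetime.strptime(text, "%d%b%y")

print(parse_compact("1OCT20").date())
print(parse_compact("30MAR19").date())

# Two-digit years pivot between 68 and 69.
print(parse_compact("1JAN68").year, parse_compact("1JAN70").year)
```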
I am trying to get user input for two different dates, which I will pass on to another function.
def twodifferentdates():
    print("Data between 2 different dates")
    start_date = datetime.strptime(input('Enter Start Date in m/d/y format'), '%m&d&Y')
    end_date = datetime.strptime(input('Enter end date in m/d/y format'), '%m&d&Y')
    print(start_date)
twodifferentdates()
I have tried a lot of different ways to enter the dates but I keep getting
ValueError: time data '01/11/1987' does not match format '%m&d&Y'
I have used the same code which was discussed in:
how do I take input in the date time format?
Any help here would be appreciated.
Replace %m&d&Y with %m/%d/%Y as described in the referenced post.
datetime.strptime() requires you to specify the format, on a character-by-character basis, of the date you want to input. For the string '01/11/1987' you'd do
datetime.strptime(..., '%m/%d/%Y')
where %m is "two-digit month", %d is "two-digit day" and %Y is "four-digit year" (as opposed to two-digit year %y). These values are separated by slashes.
See also the datetime documentation which describes how to use strptime and strftime.
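Since the dates come from `input()`, it may also be worth handling bad input instead of letting the `ValueError` propagate. Below is a minimal sketch; `DATE_FORMAT`, `parse_user_date` and `prompt_for_date` are hypothetical names introduced for illustration.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"

def parse_user_date(text):
    """Return a datetime for valid input, or None if it does not match DATE_FORMAT."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT)
    except ValueError:
        return None

def prompt_for_date(prompt):
    # Keep asking until the input parses.
    while True:
        parsed = parse_user_date(input(prompt))
        if parsed is not None:
            return parsed
        print("Please use the m/d/Y format, for example 01/11/1987.")
```

With this in place, the function in the question could call `prompt_for_date('Enter Start Date in m/d/y format: ')` for each of the two dates.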
I'm not very experienced with the datetime module, but the error seems to come from the format string rather than from the way you're taking input. Every directive must start with %, not &. If the user types slashes, use:
start_date = datetime.strptime(input('Enter Start Date in m/d/y format'), '%m/%d/%Y')
or, if the user separates the values with spaces:
start_date = datetime.strptime(input('Enter Start Date in m d y format'), '%m %d %Y')
I have the following python snippet:
from datetime import datetime
timestamp = '05/Jan/2015:17:47:59:000-0800'
datetime_object = datetime.strptime(timestamp, '%d/%m/%y:%H:%M:%S:%f-%Z')
print datetime_object
However when I execute the code, I'm getting the following error:
ValueError: time data '05/Jan/2015:17:47:59:000-0800' does not match format '%d/%m/%y:%H:%M:%S:%f-%Z'
what's wrong with my matching expression?
EDIT 2: According to this post, strptime in Python 2 doesn't support %z (despite what the documentation suggests); Python 3.2 and later do support it. To get around this on Python 2, you can just ignore the timezone offset:
from datetime import datetime
timestamp = '05/Jan/2015:17:47:59:000-0800'
# only take the first 24 characters of `timestamp` by using [:24]
dt_object = datetime.strptime(timestamp[:24], '%d/%b/%Y:%H:%M:%S:%f')
print(dt_object)
Gives the following output:
$ python date.py
2015-01-05 17:47:59
EDIT: Your datetime.strptime argument should be '%d/%b/%Y:%H:%M:%S:%f%z' (the sign of the offset is part of %z, so there is no literal hyphen before it)
With strptime(), %y refers to
Year without century as a zero-padded decimal number
I.e. 01, 99, etc.
If you want to use the full 4-digit year, you need to use %Y
Similarly, if you want to use the 3-letter month, you need to use %b, not %m
I haven't looked at the rest of the string, but there are possibly more mismatches. You can find out how each section can be defined in the table at https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
And the UTC offset directive is lowercase %z (uppercase %Z is the time zone name).
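On Python 3.2 and later, `strptime` does accept `%z` and returns a timezone-aware datetime. The following is a minimal sketch, assuming Python 3 and an English locale for `%b`. Note that the sign is consumed by `%z` itself, so the format has no literal hyphen in front of it.

```python
from datetime import datetime, timezone

timestamp = "05/Jan/2015:17:47:59:000-0800"

# The sign belongs to %z, so no literal "-" is placed before it.
parsed = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S:%f%z")

# Convert to UTC to make the effect of the -08:00 offset visible.
as_utc = parsed.astimezone(timezone.utc)

print(parsed)
print(as_utc)
```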
I am trying to process data with a timestamp field. The timestamp looks like this:
'20151229180504511' (year, month, day, hour, minute, second, millisecond)
and is a python string. I am attempting to convert it to a python datetime object. Here is what I have tried (using pandas):
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda x:datetime.strptime(x,"%Y%b%d%H%M%S"))
# returns error time data '20151229180504511' does not match format '%Y%b%d%H%M%S'
So I add milliseconds:
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda x:datetime.strptime(x,"%Y%b%d%H%M%S%f"))
# also tried with .%f all result in a format error
So tried using the dateutil.parser:
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda s: dateutil.parser.parse(s).strftime(DateFormat))
# results in OverflowError: 'signed integer is greater than maximum'
Also tried converting these entries using the pandas function:
data['TIMESTAMP'] = pd.to_datetime(data['TIMESTAMP'], unit='ms', errors='coerce')
# coerce does not show entries as NaT
I've made sure that whitespace is gone. Converting to Strings, to integers and floats. No luck so far - pretty stuck.
Any ideas?
P.S. Background info: the data is generated in an Android app using the java.util.Calendar class, then converted to a string in Java, written to a CSV and sent to the Python server, where I read it in using pandas read_csv.
Just try:
datetime.strptime(x,"%Y%m%d%H%M%S%f")
You mixed up these two directives:
%b : Month as locale’s abbreviated name.
%m : Month as a zero-padded decimal number.
%b is for locale-based month name abbreviations like Jan, Feb, etc.
Use %m for 2-digit months:
In [36]: df = pd.DataFrame({'Timestamp':['20151229180504511','20151229180504511']})
In [37]: df
Out[37]:
Timestamp
0 20151229180504511
1 20151229180504511
In [38]: pd.to_datetime(df['Timestamp'], format='%Y%m%d%H%M%S%f')
Out[38]:
0 2015-12-29 18:05:04.511
1 2015-12-29 18:05:04.511
Name: Timestamp, dtype: datetime64[ns]
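For a single value outside pandas, the same format string can be checked with the standard library. This is a small sketch; `parse_compact_timestamp` is a hypothetical helper name. It relies on `%f` accepting one to six digits, so the three trailing digits are read as milliseconds.

```python
from datetime import datetime

def parse_compact_timestamp(text):
    # %f reads the trailing 1-6 digits as a fraction of a second,
    # so "511" becomes 511000 microseconds (511 ms).
    return datetime.strptime(text, "%Y%m%d%H%M%S%f")

ts = parse_compact_timestamp("20151229180504511")
print(ts)
print(ts.microsecond)
```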