Convert data type series to datetime - python

I have a column that is a Series and I want to convert to a datetime format. This particular column of my data frame looks like below:
x
Aug 1, 2019 7:20:04 AM
Aug 1, 2019 7:20:14 AM
Aug 1, 2019 7:20:24 AM
Aug 1, 2019 7:20:34 AM
I've seen some answers here and I tried to adapt my code accordingly.
datetime.datetime.strptime(df["x"],'%b %d, %Y %H:%M:%S %a').strftime('%d/%m/%Y %H:%M:%S')
But I get the following error:
strptime() argument 1 must be str, not Series
For this reason, I tried to convert to string using the following:
df['x'] = df['x'].apply(str)
df['x'] = df['x'].to_string()
df['x'] = df['x'].astype(str)
But it does not work.

I am assuming you are using pandas. You can use the pandas to_datetime() function instead of datetime's functions, which can only convert a single value per call. Also, for AM/PM you need %p instead of %a, and %p only affects the hour when it is paired with the 12-hour directive %I rather than %H.
df['x'] = pd.to_datetime(df['x'], format="%b %d, %Y %I:%M:%S %p")
Edit
Check to make sure your data is exactly how you posted it. I copied and pasted your data, created a data frame from it, and it works without an error.
df = pd.DataFrame({'x':['Aug 1, 2019 7:20:04 AM','Aug 1, 2019 7:20:14 AM','Aug 1, 2019 7:20:24 AM','Aug 1, 2019 7:20:34 AM']})
Output:
x
0 Aug 1, 2019 7:20:04 AM
1 Aug 1, 2019 7:20:14 AM
2 Aug 1, 2019 7:20:24 AM
3 Aug 1, 2019 7:20:34 AM
df['x'] = pd.to_datetime(df['x'], format='%b %d, %Y %I:%M:%S %p')
Output:
x
0 2019-08-01 07:20:04
1 2019-08-01 07:20:14
2 2019-08-01 07:20:24
3 2019-08-01 07:20:34
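The sample data only contains AM times, so it does not show how the hour directive interacts with %p. A minimal sketch, assuming a made-up PM row is added to the column, of parsing with %I and then producing the day/month/year strings the question was after:

```python
import pandas as pd

# The second row is a hypothetical PM value, added only for illustration.
df = pd.DataFrame({"x": ["Aug 1, 2019 7:20:04 AM", "Aug 1, 2019 7:20:14 PM"]})

# %I reads the hour on a 12-hour clock, so %p can shift PM times by 12 hours.
parsed = pd.to_datetime(df["x"], format="%b %d, %Y %I:%M:%S %p")

# Render the datetimes back to strings in day/month/year order.
formatted = parsed.dt.strftime("%d/%m/%Y %H:%M:%S")
print(formatted.tolist())
```

Note that the strftime step turns the column back into strings, so it is usually best kept for display and done last.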

Related

"Bad directive" value error when converting to Pandas datetime

I have a Pandas dataframe df that looks as follows:
df = pd.DataFrame({'timestamp' : ['Wednesday, Apr 4/04/22 at 17:02',
'Saturday, Apr 4/23/22 at 15:45'],
'foo' : [1, 2]
})
df
timestamp foo
0 Wednesday, Apr 4/04/22 at 17:02 1
1 Saturday, Apr 4/23/22 at 15:45 2
I'm trying to convert the timestamp column to a datetime object so that I can add a day_of_week column.
My attempt:
df['timestamp'] = pd.to_datetime(df['timestamp'],
format='%A, %b %-m/%-d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()
The error is:
ValueError: '-' is a bad directive in format '%A, %b %-m/%-d/%y at %H:%M'
Any assistance would be greatly appreciated. Thanks!
Just use the format without the -:
df['timestamp'] = pd.to_datetime(df['timestamp'],
format='%A, %b %m/%d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()
NB. to_datetime is quite flexible on the provided data, note how the incorrect day of week was just ignored.
output:
timestamp foo day_of_week
0 2022-04-04 17:02:00 1 Monday
1 2022-04-23 15:45:00 2 Saturday
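The reason dropping the - is enough is that, when parsing, %m and %d accept values with or without a leading zero. A small sketch to check this, using one row from the question and one made-up row with an unpadded day (English day and month names are assumed):

```python
import pandas as pd

# The second string is a hypothetical example with an unpadded day ("4/4/22").
stamps = pd.Series([
    "Saturday, Apr 4/23/22 at 15:45",
    "Monday, Apr 4/4/22 at 09:05",
])

# Plain %m and %d match both "04" and "4" when parsing.
parsed = pd.to_datetime(stamps, format="%A, %b %m/%d/%y at %H:%M")

# The weekday name is derived from the parsed date, not from the input text.
day_names = parsed.dt.day_name()
print(day_names.tolist())
```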

How do I convert date from alphabetical to numeric format?

I want to convert date from 'Sep 17, 2021' format to '17.09.2021'. I made a function, but I can't apply it to the series. What am I doing wrong?
def to_normal_date(bad_date):
    s = datetime.strptime(bad_date, '%b %d, %Y')
    return s.strftime('%Y-%m-%d')
df['normal_date'] = df['date'].apply(to_normal_date)
I receive a ValueError when I'm trying to apply it to series. But it works fine with this:
to_normal_date('Sep 16, 2021')
Use pd.to_datetime to convert the "date" column to datetime format. Specifying errors="coerce" will convert dates that are not in the correct format to NaT (missing) values instead of raising errors.
Convert to the required format using .strftime with the .dt accessor.
df["normal_date"] = pd.to_datetime(df["date"], format="%b %d, %Y", errors="coerce").dt.strftime("%d.%m.%Y")
>>> df
date normal_date
0 Sep 17, 2021 17.09.2021
1 Oct 31, 2021 31.10.2021
2 Nov 19, 2021 19.11.2021
3 Dec 25, 2021 25.12.2021
Try:
pd.to_datetime(df['date'], format='%b %d, %Y').dt.strftime('%Y-%m-%d')
It should work provided all df['date'] entries match the date pattern of 'Sep 17, 2021'.
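To see what errors="coerce" does when an entry does not match the pattern, here is a rough sketch with a made-up malformed row (the "not a date" value is an assumption, not from the question):

```python
import pandas as pd

# The last entry is a hypothetical malformed value.
df = pd.DataFrame({"date": ["Sep 17, 2021", "Oct 31, 2021", "not a date"]})

# Entries that do not match the format become NaT instead of raising.
parsed = pd.to_datetime(df["date"], format="%b %d, %Y", errors="coerce")

# Formatting keeps the missing entry missing.
df["normal_date"] = parsed.dt.strftime("%d.%m.%Y")

# Rows that failed to parse can be inspected separately.
bad_rows = df[parsed.isna()]
print(df)
print(bad_rows)
```

Checking bad_rows before relying on the result is a cheap way to find out why the original apply() raised a ValueError.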

Convert string to datetime pandas

I have a DataFrame that contains strings which should be converted to datetime in order to sort the DataFrame. The strings are received from Syslogs.
The strings look like the ones below:
date
Mar 16 03:40:24.411
Mar 16 03:40:25.415
Mar 16 03:40:28.532
Mar 16 03:40:30.539
Mar 14 03:20:30.337
Mar 14 03:20:31.340
Mar 14 03:20:37.415
I tried to convert it with pandas.to_datetime(), but I received the following error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-03-16 03:40:24
I may need the nanoseconds as well.
It is necessary to specify the format of the strings explicitly.
There is no year in the strings, so the output uses the default year, 1900:
df['date'] = pd.to_datetime(df['date'], format='%b %d %H:%M:%S.%f')
print (df)
date
0 1900-03-16 03:40:24.411
1 1900-03-16 03:40:25.415
2 1900-03-16 03:40:28.532
3 1900-03-16 03:40:30.539
4 1900-03-14 03:20:30.337
5 1900-03-14 03:20:31.340
6 1900-03-14 03:20:37.415
You can prepend a year to the column and then parse it like this:
df['date'] = pd.to_datetime('2020 ' + df['date'], format='%Y %b %d %H:%M:%S.%f')
print (df)
date
0 2020-03-16 03:40:24.411
1 2020-03-16 03:40:25.415
2 2020-03-16 03:40:28.532
3 2020-03-16 03:40:30.539
4 2020-03-14 03:20:30.337
5 2020-03-14 03:20:31.340
6 2020-03-14 03:20:37.415
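Since the goal in the question was to sort the DataFrame, here is a small sketch that puts the two steps together. The year 2020 is an assumption, because the syslog strings carry no year; logs that span a New Year would need extra handling.

```python
import pandas as pd

df = pd.DataFrame({"date": [
    "Mar 16 03:40:24.411",
    "Mar 14 03:20:31.340",
    "Mar 14 03:20:30.337",
]})

# Assumed year: the source strings do not contain one.
assumed_year = "2020"

# Prepend the year so the result does not default to 1900.
df["date"] = pd.to_datetime(
    assumed_year + " " + df["date"], format="%Y %b %d %H:%M:%S.%f"
)

# Sorting now follows chronological order rather than string order.
df = df.sort_values("date").reset_index(drop=True)
print(df)
```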
The best way is to use pandas.to_datetime as mentioned above. If you are not familiar with date format strings, you can get away with using a date parser library. An example with the dateutil library:
# python -m pip install --user python-dateutil
from dateutil import parser
import pandas as pd
df = pd.DataFrame({'dates': ['Mar 16 03:40:24.411',' Mar 16 03:40:25.415','Mar 16 03:40:28.532']})
# parse it
df['dates'] = df['dates'].apply(parser.parse)
print(df)
The dateutil parser will add the current year to your dates.
Vectorizing:
# using numpy.vectorize
import numpy as np
df['dates'] = np.vectorize(parser.parse)(df['dates'])
Note:
This is not optimal for large datasets and should be used only when pd.to_datetime is not able to parse the dates.
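One way to follow that advice is to let pd.to_datetime handle the bulk of the rows and fall back to dateutil only for the ones it could not parse. A rough sketch, assuming made-up input strings that include a year so the result does not depend on today's date:

```python
import pandas as pd
from dateutil import parser

# Hypothetical mixed-format input; the second row does not match the format.
raw = pd.Series([
    "2020-03-16 03:40:24",
    "March 16, 2020 3:40 AM",
    "2020-03-14 03:20:30",
])

# Fast path: vectorized parsing, with failures turned into NaT.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Slow path: run dateutil only on the rows that failed.
failed = parsed.isna()
fallback = pd.to_datetime(raw[failed].apply(parser.parse))

# fillna aligns on the index, so only the failed rows are replaced.
parsed = parsed.fillna(fallback)
print(parsed)
```

This keeps the per-row Python call limited to the rows that actually need it.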

error reading date time from csv using pandas

I am using Pandas to read and process a csv file. My csv file has a date/time column that looks like:
11:59:50:322 02 10 2015 -0400 EDT
11:11:55:051 16 10 2015 -0400 EDT
00:38:37:106 02 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
13:43:28:540 28 11 2015 -0500 EST
09:24:12:723 14 12 2015 -0500 EST
13:28:12:346 28 12 2015 -0500 EST
How can I read this using python/pandas? So far what I have is this:
pd.to_datetime(pd.Series(df['senseStartTime']),format='%H:%M:%S:%f %d %m %Y %z %Z')
But this is not working, though previously I was able to use the same code for another format (with a different format specifier). Any suggestions?
The issue you're having is likely because versions of Python before 3.2 (I think?) had a lot of trouble with time zones, so your format string might be screwing up on the %z and %Z parts. For example, in Python 2.7:
In [187]: import datetime
In [188]: datetime.datetime.strptime('11:59:50:322 02 10 2015 -0400 EDT', '%H:%M:%S:%f %d %m %Y %z %Z')
ValueError: 'z' is a bad directive in format '%H:%M:%S:%f %d %m %Y %z %Z'
You're using pd.to_datetime instead of datetime.datetime.strptime, but the underlying issues are the same; you can refer to this thread for help. What I would suggest is, instead of using pd.to_datetime, to do something like
In [191]: import dateutil.parser
In [192]: dateutil.parser.parse('11:59:50.322 02 10 2015 -0400', dayfirst=True)
Out[192]: datetime.datetime(2015, 10, 2, 11, 59, 50, 322000, tzinfo=tzoffset(None, -14400))
It should be pretty simple to chop off the timezone at the end (which is redundant since you have the offset), and change the ":" to "." between the seconds and microseconds.
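The chopping step could look something like the sketch below. Note that it swaps dateutil for pd.to_datetime with an explicit format, which means the ":" before the milliseconds can stay as it is. Passing utc=True is an assumption made here so that rows with different offsets (-0400 and -0500) end up in one timezone-aware column.

```python
import pandas as pd

raw = pd.Series([
    "11:59:50:322 02 10 2015 -0400 EDT",
    "00:38:37:106 02 11 2015 -0500 EST",
])

# Drop the trailing timezone abbreviation; the numeric offset is kept.
no_abbrev = raw.str.replace(r"\s+[A-Z]+$", "", regex=True)

# Parse with the numeric offset and normalize everything to UTC.
parsed = pd.to_datetime(
    no_abbrev, format="%H:%M:%S:%f %d %m %Y %z", utc=True
)
print(parsed)
```

If local wall-clock times are needed afterwards, the UTC column can be converted with .dt.tz_convert and a named timezone.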
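Since datetime.timezone became available in Python 3.2, you can use %z with .strptime() (see docs). Note that pd.datetime has been removed from recent pandas versions, so this uses the standard library's datetime class, and that %Z only matches UTC, GMT, and the local machine's own time zone names. Starting with:
from datetime import datetime
dateparse = lambda x: datetime.strptime(x, '%H:%M:%S:%f %d %m %Y %z %Z')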
df = pd.read_csv(path, parse_dates=['time_col'], date_parser=dateparse)
to get:
time_col
0 2015-10-02 11:59:50.322000-04:00
1 2015-10-16 11:11:55.051000-04:00
2 2015-11-02 00:38:37.106000-05:00
3 2015-11-14 04:15:51.600000-05:00
4 2015-11-14 04:15:51.600000-05:00
5 2015-11-28 13:43:28.540000-05:00
6 2015-12-14 09:24:12.723000-05:00
7 2015-12-28 13:28:12.346000-05:00

how to convert a string type to date format

My source data has a column including the date information but it is a string type.
Typical lines are like this:
04 13, 2013
07 1, 2012
I am trying to convert it to a date format, so I used the pandas to_datetime function:
df['ReviewDate_formated'] = pd.to_datetime(df['ReviewDate'],format='%mm%d, %yyyy')
But I got this error message:
ValueError: time data '04 13, 2013' does not match format '%mm%d, %yyyy' (match)
My questions are:
How do I convert to a date format?
I also want to extract to Month and Year and Day columns because I need to do some month over month comparison? But the problem here is the length of the string varies.
Your format string is incorrect; you want '%m %d, %Y'. There is a reference that shows what the valid format identifiers are:
In [30]:
import io
import pandas as pd
t="""ReviewDate
04 13, 2013
07 1, 2012"""
df = pd.read_csv(io.StringIO(t), sep=';')
df
Out[30]:
ReviewDate
0 04 13, 2013
1 07 1, 2012
In [31]:
pd.to_datetime(df['ReviewDate'], format='%m %d, %Y')
Out[31]:
0 2013-04-13
1 2012-07-01
Name: ReviewDate, dtype: datetime64[ns]
To answer the second part, once the dtype is a datetime64 then you can call the vectorised dt accessor methods to get just the day, month, and year portions:
In [33]:
df['Date'] = pd.to_datetime(df['ReviewDate'], format='%m %d, %Y')
df['day'],df['month'],df['year'] = df['Date'].dt.day, df['Date'].dt.month, df['Date'].dt.year
df
Out[33]:
ReviewDate Date day month year
0 04 13, 2013 2013-04-13 13 4 2013
1 07 1, 2012 2012-07-01 1 7 2012
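For the month-over-month comparison mentioned in the question, a combined year-month key is often easier to group on than separate month and year columns. A minimal sketch, assuming one extra made-up review row and that counting reviews per month is the comparison wanted:

```python
import pandas as pd

# The middle row is a hypothetical extra review in the same month.
df = pd.DataFrame({"ReviewDate": ["04 13, 2013", "04 2, 2013", "07 1, 2012"]})
df["Date"] = pd.to_datetime(df["ReviewDate"], format="%m %d, %Y")

# A monthly period such as 2013-04 keeps year and month together.
df["year_month"] = df["Date"].dt.to_period("M")

# Count rows per month; the result is sorted chronologically.
monthly_counts = df.groupby("year_month").size()
print(monthly_counts)
```

The varying string length is not a problem here, because the parsing is done by the format string rather than by slicing fixed positions.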
