How to format datetime in a dataframe the way I want? - python

I cannot find the correct format for this datetime. I have tried several formats; %Y/%m/%d%I:%M:%S%p is the closest I can find for the example below.
df['datetime'] = '2019-11-13 16:28:05.779'
df['datetime'] = pd.to_datetime(df['datetime'], format="%Y/%m/%d%I:%M:%S%p")
Result:
ValueError: time data '2019-11-13 16:28:05.779' does not match format '%Y/%m/%d%I:%M:%S%p' (match)

Before guessing the format yourself, let pandas make the first guess:
df['datetime'] = pd.to_datetime(df['datetime'], infer_datetime_format=True)
0 2019-11-13 16:28:05.779
Name: datetime, dtype: datetime64[ns]

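You can probably solve this by using the parameter infer_datetime_format=True (deprecated since pandas 2.0, where this inference is the default behaviour). Here's an example:
import pandas as pd

df = {}  # a plain dict here, so the result below is a single Timestamp
df['datetime'] = '2019-11-13 16:28:05.779'
df['datetime'] = pd.to_datetime(df['datetime'], infer_datetime_format=True)
print(df['datetime'])
print(type(df['datetime']))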
Output:
2019-11-13 16:28:05.779000
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

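Here is the pandas.to_datetime() call with the correct format string: pd.to_datetime(df['datetime'], format="%Y-%m-%d %H:%M:%S.%f")
Your format was missing the space between date and time and used slashes where the data has dashes. %I is for 12-hour time (the example time you gave is 16:28, so you need %H), and %p is, to quote the docs, the "Locale’s equivalent of either AM or PM", which this string does not contain. The fractional seconds (.779) need .%f.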
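A format that mirrors the string character for character uses dashes, a space, %H for the 24-hour clock, and .%f for the fractional seconds. A minimal sketch, assuming the column holds strings shaped like the one in the question (the Series s is a made-up stand-in for df['datetime']):

```python
import pandas as pd

# Hypothetical stand-in for df['datetime'] from the question.
s = pd.Series(["2019-11-13 16:28:05.779"])

# %H = 24-hour clock, .%f = fractional seconds.
# The dashes and the space in the format must match the input exactly.
parsed = pd.to_datetime(s, format="%Y-%m-%d %H:%M:%S.%f")
print(parsed.iloc[0])
```

If any row deviates from this layout, the call raises a ValueError, which is usually what you want while you are still finding out what the data looks like.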

Related

Convert String "YYYY-MM-DD hh:mm:ss Etc/GMT" to timestamp in UTC pandas

I have a pandas column of datetime-like string values like this example:
exammple_value = "2022-06-24 16:57:33 Etc/GMT"
Expected output
Timestamp('2022-06-24 16:57:33+0000', tz='UTC')
Etc/GMT is the timezone; you can look it up in Python with:
import pytz
list(filter(lambda x: 'GMT' in x, pytz.all_timezones))[0]
----
OUT:
'Etc/GMT'
Use to_datetime with %Z to parse the timezone name, then convert to UTC with Timestamp.tz_convert:
exammple_value = "2022-06-24 16:57:33 Etc/GMT"
print (pd.to_datetime(exammple_value, format='%Y-%m-%d %H:%M:%S %Z').tz_convert('UTC'))
2022-06-24 16:57:33+00:00
Another idea is to remove the timezone by splitting it off:
print (pd.to_datetime(exammple_value.rsplit(maxsplit=1)[0]).tz_localize('UTC'))
2022-06-24 16:57:33+00:00
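The question asks about a whole column, so here is a sketch of the split approach applied to a Series. It assumes every value ends in the zone name Etc/GMT (which is equivalent to UTC); the Series s and its second value are made up for illustration:

```python
import pandas as pd

# Hypothetical column; the second value is invented for the example.
s = pd.Series(["2022-06-24 16:57:33 Etc/GMT", "2022-06-25 08:00:00 Etc/GMT"])

# Split once from the right on whitespace and keep the part before the zone name.
naive_text = s.str.rsplit(n=1).str[0]

# Parse the remaining text and mark it as UTC in one step.
utc = pd.to_datetime(naive_text, format="%Y-%m-%d %H:%M:%S", utc=True)
print(utc.iloc[0])
```

If the column could contain other zone names, dropping the suffix like this would silently mislabel those rows, so check the distinct suffixes first.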

Timestamp datetime64 to datetime in dataframe

I am confused by datetime64 and am trying to convert it to a normal time format.
I have a column with timestamp format: 2022.01.02D23:10:12.197164900.
Output expected is: 2022-01-02 23:10:12
I'm trying with:
df['executionTime'] = pd.to_datetime(df['executionTime'], format='%Y-%m-%d %H:%M:%S.%f', errors='coerce')
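The format string has to mirror the input's separators: dots in the date and a literal D before the time. Try this: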
df['executionTime'] = pd.to_datetime(df['executionTime'], format='%Y.%m.%dD%H:%M:%S.%f', errors='coerce')
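The expected output in the question has no fractional seconds, so after parsing you may also want to drop them. A minimal sketch, assuming a string column like the one described (the Series s and the invalid second value are made up to show what errors='coerce' does):

```python
import pandas as pd

# Hypothetical stand-in for df['executionTime']; the second value is deliberately invalid.
s = pd.Series(["2022.01.02D23:10:12.197164900", "not a timestamp"])

# pandas' %f accepts up to nine fractional digits (nanoseconds).
parsed = pd.to_datetime(s, format="%Y.%m.%dD%H:%M:%S.%f", errors="coerce")

# Round down to whole seconds; the column stays a datetime column.
trimmed = parsed.dt.floor("s")

# Or, if a plain string is wanted instead of a datetime:
as_text = trimmed.dt.strftime("%Y-%m-%d %H:%M:%S")
print(as_text.iloc[0])
```

Rows that do not match the format become NaT rather than raising, so it is worth counting the NaT values afterwards.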

time data not matching format

time data '07/10/2019:08:00:00 PM' does not match format '%m/%d/%Y:%H:%M:%S.%f'
I am not sure what is wrong. Here is the code that I have been using:
import datetime as dt
df['DATE'] = df['DATE'].apply(lambda x: dt.datetime.strptime(x,'%m/%d/%Y:%H:%M:%S.%f'))
Here's a sample of the column:
Transaction_date
07/10/2019:08:00:00 PM
07/23/2019:08:00:00 PM
3/15/2021
8/15/2021
8/26/2021
Your format is incorrect. Try:
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y:%I:%M:%S %p")
You should be using %I to specify a 12-hour format and %p for AM/PM.
Separately, just use pd.to_datetime instead of importing datetime and using apply.
Example:
>>> pd.to_datetime('07/10/2019:08:00:00 PM', format="%m/%d/%Y:%I:%M:%S %p")
Timestamp('2019-07-10 20:00:00')
Edit:
To handle multiple formats, you can use pd.to_datetime with fillna:
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y:%I:%M:%S %p", errors="coerce").fillna(pd.to_datetime(df["DATE"], format="%m/%d/%Y", errors="coerce"))
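To see how the two-pass approach behaves, here is a small sketch on a few values taken from the sample column. Each pass turns the rows it cannot parse into NaT, and fillna takes those rows from the other pass:

```python
import pandas as pd

# A few values from the sample column in the question.
df = pd.DataFrame({"DATE": ["07/10/2019:08:00:00 PM",
                            "07/23/2019:08:00:00 PM",
                            "3/15/2021"]})

# Pass 1: rows with a 12-hour time and AM/PM; date-only rows become NaT.
with_time = pd.to_datetime(df["DATE"], format="%m/%d/%Y:%I:%M:%S %p", errors="coerce")

# Pass 2: date-only rows; rows with a time part become NaT.
date_only = pd.to_datetime(df["DATE"], format="%m/%d/%Y", errors="coerce")

# Combine: keep pass 1 where it worked, otherwise use pass 2.
df["DATE"] = with_time.fillna(date_only)
print(df)
```

Any row that matches neither format stays NaT, so a final check with df["DATE"].isna() shows whether a third format is hiding in the data.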

How to correctly format datetime

We are struggling with formatting datetime in Python 3, and we can't seem to figure it out on our own. So far, we have converted our dataframe column to datetime, so that it should be '%Y-%m-%d %H:%M:%S':
before
02-01-2011 22:00:00
after
2011-01-02 22:00:00
For some very odd reason, when datetime is
13-01-2011 00:00:00
it is changed to this
2011-13-01 00:00:00
And from there it's mixing months with days and is therefore counting months instead of days.
This is all of our code for this datetime formatting:
df['local_date']=df['local_date'] + ':00'
df['local_date'] = pd.to_datetime(df.local_date)
df['local_date']=df['local_date'].dt.strftime('%Y-%m-%d %H:%M:%S')
UPDATED CODE WHICH WORKS:
df['local_date']=df['local_date'] + ':00'
df['local_date'] = pd.to_datetime(df.local_date.str.strip(), format='%d-%m-%Y %H:%M:%S')
df['local_date']=df['local_date'].dt.strftime('%Y-%m-%d %H:%M:%S')
Can't say for sure, but I believe this has to do with the warning mentioned in the documentation of to_datetime:
dayfirst : boolean, default False
Specify a date parse order if arg is str or its list-likes. If True, parses dates with the day first, eg 10/11/12 is parsed as 2012-11-10. Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).
I think the way to get around this is by explicitly passing a format string to to_datetime:
df['local_date'] = pd.to_datetime(df.local_date, format='%d-%m-%Y %H:%M:%S')
This way it won't accidentally mix months and days (but it will raise an error if any line has a different format).
import pandas as pd
local_date = "13-01-2011 00:00"
local_date = local_date + ":00"
local_date = pd.to_datetime(local_date, format='%d-%m-%Y %H:%M:%S')
local_date = local_date.strftime('%Y-%m-%d %H:%M:%S')
print(local_date)
The output is:
2011-01-13 00:00:00
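The difference between letting pandas guess and passing the format can be seen side by side. A minimal sketch using the two values from the question (the variable names are made up):

```python
import pandas as pd

# Without a format, an ambiguous string is read month-first (dayfirst defaults to False).
guessed = pd.to_datetime("02-01-2011 22:00:00")

# With an explicit day-first format, both values are read the same way.
s = pd.Series(["02-01-2011 22:00:00", "13-01-2011 00:00:00"])
explicit = pd.to_datetime(s, format="%d-%m-%Y %H:%M:%S")

print(guessed.month, explicit.iloc[0].month)
print(explicit.dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
```

The first value is the dangerous one: it parses without any error in both cases, just to a different date.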

pandas.to_datetime with different length date strings

I have a column of timestamps that I would like to convert to datetime in my pandas dataframe. The format of the dates is %Y-%m-%d-%H-%M-%S which pd.to_datetime does not recognize. I have manually entered the format as below:
df['TIME'] = pd.to_datetime(df['TIME'], format = '%Y-%m-%d-%H-%M-%S')
My problem is some of the times do not have seconds so they are shorter
(format = %Y-%m-%d-%H-%M).
How can I get all of these strings to datetimes?
I was thinking I could add zero seconds (-0) to the end of my shorter dates but I don't know how to do that.
Use strftime to bring every value into the same format first; if pandas can't recognize your custom datetime format, you should provide it explicitly:
from functools import partial
df1 = pd.DataFrame({'Date': ['2018-07-02-06-05-23','2018-07-02-06-05']})
newdatetime_fmt = partial(pd.to_datetime, format='%Y-%m-%d-%H-%M-%S')
df1['Clean_Date'] = (df1.Date.str.replace('-','').apply(lambda x: pd.to_datetime(x).strftime('%Y-%m-%d-%H-%M-%S'))
.apply(newdatetime_fmt))
print(df1,df1.dtypes)
output:
                  Date          Clean_Date
0  2018-07-02-06-05-23 2018-07-02 06:05:23
1     2018-07-02-06-05 2018-07-02 06:05:00
Date                  object
Clean_Date    datetime64[ns]
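The idea from the question, appending zero seconds to the shorter strings, also works and avoids the round trip through strftime. A minimal sketch, assuming the only two layouts are the ones described (with and without seconds); the helper names has_seconds and padded are made up:

```python
import pandas as pd

df1 = pd.DataFrame({"Date": ["2018-07-02-06-05-23", "2018-07-02-06-05"]})

# A full timestamp has five dashes; strings with only four lack the seconds field.
has_seconds = df1["Date"].str.count("-") == 5

# Keep complete strings as they are and append "-00" to the short ones.
padded = df1["Date"].where(has_seconds, df1["Date"] + "-00")

# Now every value has the same layout, so one explicit format is enough.
df1["Clean_Date"] = pd.to_datetime(padded, format="%Y-%m-%d-%H-%M-%S")
print(df1)
```

Because the format is explicit, a value in some third layout raises an error instead of being guessed at.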
