Change date from specific format in pandas? - python

I currently have a pandas DataFrame with a date column in the following format:
JUN 05, 2028
Expected Output:
2028-06-05.
Most of the examples I see online do not use the format I have posted. Is there an existing solution for this?

Use to_datetime with custom format from python's strftime directives:
df = pd.DataFrame({'dates':['JUN 05, 2028','JUN 06, 2028']})
df['dates'] = pd.to_datetime(df['dates'], format='%b %d, %Y')
print(df)
       dates
0 2028-06-05
1 2028-06-06
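If the goal is a column of strings in exactly the 2028-06-05 form, rather than a datetime column, a minimal sketch could follow the parse with .dt.strftime. The column name dates_str is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'dates': ['JUN 05, 2028', 'JUN 06, 2028']})

# Parse the text into real datetime values first.
parsed = pd.to_datetime(df['dates'], format='%b %d, %Y')

# Then render them as ISO-style strings.
# 'dates_str' is a hypothetical column name, not from the question.
df['dates_str'] = parsed.dt.strftime('%Y-%m-%d')
print(df['dates_str'].tolist())
```

Keeping the parsed datetime column is usually more useful for sorting and filtering; the string column is only needed for display or export.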

Related

Convert DataFrame column from string to datetime for format "January 1, 2001 Monday"

I am trying to convert a dataframe column "date" from string to datetime. I have this format: "January 1, 2001 Monday".
I tried to use the following:
from dateutil import parser
for index, v in df['date'].items():
    df['date'][index] = parser.parse(df['date'][index])
But it gives me the following error:
ValueError: Cannot set non-string value '2001-01-01 00:00:00' into a StringArray.
I checked the datatype of the column "date" and it tells me string type.
This is the snippet of the dataframe:
Any help would be most appreciated!
Instead of dateutil, try pandas' own tools, which are much simpler here, such as the pd.to_datetime function:
df['date'] = pd.to_datetime(df['date'], format='%B %d, %Y %A')
You need to specify the format of the date string for it to be parsed correctly. The documentation helps with this:
%A is for Weekday as locale’s full name, e.g., Monday
%B is for Month as locale’s full name, e.g., January
%d is for Day of the month as a zero-padded decimal number.
%Y is for Year with century as a decimal number, e.g., 2021.
Combining all of them we have the following function:
from datetime import datetime
def mdy_to_ymd(d):
    return datetime.strptime(d, '%B %d, %Y %A').strftime('%Y-%m-%d')
print(mdy_to_ymd('January 1, 2021 Monday'))
> 2021-01-01
One more thing: in your case, .apply() should work faster than the item-by-item loop, so the code is:
df['date'] = df['date'].apply(mdy_to_ymd)
Feel free to add Hour-Minute-Second if needed.
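As a sketch of the vectorized route, assuming the column holds plain strings like the sample in the question (the second row is an invented value), assigning the result of pd.to_datetime replaces the whole column at once, so nothing is written item by item into the string array:

```python
import pandas as pd

# The second row is made up for illustration.
df = pd.DataFrame({'date': ['January 1, 2001 Monday', 'January 2, 2001 Tuesday']})

# Convert the whole column in one assignment; the column becomes datetime.
df['date'] = pd.to_datetime(df['date'], format='%B %d, %Y %A')

# Optional: render as year-month-day strings for display.
iso = df['date'].dt.strftime('%Y-%m-%d').tolist()
print(iso)
```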

Date and Month formatting issue into Excel from DF

On Python 3.9 and Pandas 1.3.4.
So I'm trying to format 2 columns of my df and export it to excel. These 2 columns are date and month. Date is supposed to be formatted as %m/%d/%y and Month is supposed to be formatted as %B %Y.
When I do print(df['Date']) and print(df['Month']) it prints 01/04/22 and January 2022 respectively. However when I do df.to_csv(file.csv) it shows in excel as 1/4/2022 and Jan-22. I would like it to be formatted as 01/04/22 and January 2022 respectively. How can I solve this?
This is my current code:
import pandas as pd
df = pd.read_csv('file.csv', dtype=str)
df["Date"] = pd.Timestamp("today").strftime("%m/%d/%y")
df["Month"] = pd.Timestamp("today").strftime("%B %Y")
print(df["Date"])
print(df["Month"])
df.to_excel('file.xlsx', index=False)
NOTE: to_excel fixed this issue

time data not matching format

time data '07/10/2019:08:00:00 PM' does not match format '%m/%d/%Y:%H:%M:%S.%f'
I am not sure what is wrong. Here is the code that I have been using:
import datetime as dt
df['DATE'] = df['DATE'].apply(lambda x: dt.datetime.strptime(x,'%m/%d/%Y:%H:%M:%S.%f'))
Here's a sample of the column:
Transaction_date
07/10/2019:08:00:00 PM
07/23/2019:08:00:00 PM
3/15/2021
8/15/2021
8/26/2021
Your format is incorrect. Try:
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y:%I:%M:%S %p")
You should be using %I to specify a 12-hour format and %p for AM/PM.
Separately, just use pd.to_datetime instead of importing datetime and using apply.
Example:
>>> pd.to_datetime('07/10/2019:08:00:00 PM', format="%m/%d/%Y:%I:%M:%S %p")
Timestamp('2019-07-10 20:00:00')
Edit:
To handle multiple formats, you can use pd.to_datetime with fillna:
df["DATE"] = pd.to_datetime(
    df["DATE"], format="%m/%d/%Y:%I:%M:%S %p", errors="coerce"
).fillna(pd.to_datetime(df["DATE"], format="%m/%d/%Y", errors="coerce"))
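The fallback can be sketched step by step on two values taken from the sample column. The intermediate names with_time and date_only are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"DATE": ["07/10/2019:08:00:00 PM", "3/15/2021"]})

# First pass: rows with a 12-hour time part. Rows that do not match become NaT.
with_time = pd.to_datetime(df["DATE"], format="%m/%d/%Y:%I:%M:%S %p", errors="coerce")

# Second pass: date-only rows. Rows with a time part become NaT here.
date_only = pd.to_datetime(df["DATE"], format="%m/%d/%Y", errors="coerce")

# Fill the gaps of the first pass with the results of the second.
df["DATE"] = with_time.fillna(date_only)
print(df)
```

Any value that matches neither format stays NaT, so checking df["DATE"].isna() afterwards shows which rows still need attention.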

Need to convert word month into number from a table

I would like to convert the column df['Date'] to a numeric date format, i.e. Oct 9, 2019 --> 10-09-2019.
Here is my code; I did not get an error until printing the result. Thanks for your support!
from time import strptime
strptime('Feb','%b').tm_mon
Date_list = df['Date'].tolist()
Date_num = []
for i in Date_list:
    num_i=strptime('[i[0:3]]', '%b').tm_mon
    Date_num.append(num_i)
df['Date'] = Date_num
print(df['Date'])
I got the error message as follows:
KeyError
ValueError: time data '[i[0:3]]' does not match format '%b'
Date
Oct 09, 2019
Oct 08, 2019
Oct 07, 2019
Oct 04, 2019
Oct 03, 2019
Assuming the Date column in df is of str/object type (this can be checked by running df.dtypes), you can convert the column directly to a datetime type with
df['Date'] = df['Date'].astype('datetime64[ns]')
which will show the dates in the default format, e.g. 2019-10-09. From there you can convert to any other date format very easily, for example the month-day-year form from the question:
df['Date'].dt.strftime("%m-%d-%Y")
Please go through https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html for more info on pandas datetime functions/operations.
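As for the ValueError in the question: the quotes around '[i[0:3]]' make it a literal string, so strptime never sees the month abbreviation. A minimal sketch of the same loop with the slice applied to the variable itself (the list below is a stand-in for df['Date'].tolist(), and the 'Feb' row is invented):

```python
from time import strptime

# Stand-in for df['Date'].tolist(); the last value is made up for illustration.
date_list = ['Oct 09, 2019', 'Oct 08, 2019', 'Feb 03, 2019']

# Slice the variable, not a quoted string, to get the three-letter month.
month_numbers = [strptime(text[0:3], '%b').tm_mon for text in date_list]
print(month_numbers)
```

This only yields the month number; to get the full 10-09-2019 form, parsing the whole string as a date is the simpler route.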

Cannot convert dataframe column to 24-H format datetime

I have a timestamp column in my dataframe which is originally a str type. Some sample values:
'6/13/2015 6:45:58 AM'
'6/13/2015 7:00:37 PM'
I use the following code to convert these values into datetime with 24H format:
df['timestampx'] = pd.to_datetime(df['timestamp'], format='%m/%d/%Y %H:%M:%S %p')
And, I obtain this result:
2015-06-13 06:45:58
2015-06-13 07:00:37
That means the dates are NOT converted to 24H format, and I am also losing the AM/PM info. Any help?
You're reading it in as a 24 hour time, but really the current format isn't 24 hour time, it's 12 hour time. Read it in as 12 hour with the suffix (AM/PM), then you'll be OK to output in 24 hour time later if need be.
df = pd.DataFrame(['6/13/2015 6:45:58 AM','6/13/2015 7:00:37 PM'], columns = ['timestamp'])
df['timestampx'] = pd.to_datetime(df['timestamp'], format='%m/%d/%Y %I:%M:%S %p')
print(df)
              timestamp          timestampx
0  6/13/2015 6:45:58 AM 2015-06-13 06:45:58
1  6/13/2015 7:00:37 PM 2015-06-13 19:00:37
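The difference between %H and %I can be seen with the standard library alone. In this small sketch (the variable names are illustrative) the same string is parsed both ways; with strptime, %p only affects the hour when the hour is parsed with %I:

```python
from datetime import datetime

raw = '6/13/2015 7:00:37 PM'

# %H reads the hour as 24-hour time, so the trailing PM has no effect.
as_24h = datetime.strptime(raw, '%m/%d/%Y %H:%M:%S %p')

# %I reads the hour as 12-hour time, so PM shifts 7 to 19.
as_12h = datetime.strptime(raw, '%m/%d/%Y %I:%M:%S %p')

print(as_24h.hour, as_12h.hour)
print(as_12h.strftime('%Y-%m-%d %H:%M:%S'))
```

Once parsed correctly, the AM/PM form can be produced again on output with strftime and the %I and %p directives.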
