Convert Pandas column into same date time format - python

I have a file where the date and time are in mixed formats as per below:
Ref_ID Date_Time
5.645217e 2020-12-02 16:23:15
5.587422e 2019-02-25 18:33:24
What I'm trying to do is convert the dates into a standard format so that I can further analyse my dataset.
Expected Outcome:
Ref_ID Date_Time
5.645217e 2020-02-12 16:23:15
5.587422e 2019-02-25 18:33:24
So far I've tried a few things like Pandas to_datetime conversion and converting the date using strptime but none has worked so far.
# Did not work
data["Date_Time"] = pd.to_datetime(data["Date_Time"], errors="coerce")
# Also Did not work
data["Date_Time"] = data["Date_Time"].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%y'))
I've also searched this site for a solution but haven't found one yet.

You could try using str.split to extract the day and month, then use some boolean testing to decide which is which.
This may look confusing with all the variables, but all we are doing is creating new series and dataframes to manipulate the day and month of your original date-time column.
# split on whitespace so the date and the time land in separate columns
s = df['Date_Time'].str.split(r'\s', expand=True)
# split the date into year (0), middle field (1) and last field (2)
m = s[0].str.split('-', expand=True).astype(int)
# if the middle field is 12 or more, treat it as the day and swap it with the last field
# (note that a genuine December date is swapped by this rule as well)
m['possible_month'] = np.where(m[1].ge(12), m[2], m[1])
m['possible_day'] = np.where(m[1].ge(12), m[1], m[2])
# rebuild a zero-padded date string and join it back to the time to re-create a proper datetime
s[0] = m[0].astype(str).str.cat([m['possible_month'].astype(str).str.zfill(2),
m['possible_day'].astype(str).str.zfill(2)], sep='-')
df['fixed_date'] = pd.to_datetime(s[0].str.cat(s[1].astype(str), sep=' '),
format='%Y-%m-%d %H:%M:%S')
print(df)
Ref_ID Date_Time fixed_date
0 5.645217e 2020-12-02 16:23:15 2020-02-12 16:23:15
1 5.587422e 2019-02-25 18:33:24 2019-02-25 18:33:24
print(df.dtypes)
Ref_ID object
Date_Time object
fixed_date datetime64[ns]
dtype: object
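An alternative is to parse twice and let the first valid result win. This is only a minimal sketch, assuming the ambiguous rows are really year-day-month whenever that reading gives a valid date. That assumption is mine and is slightly different from the ge(12) rule above, so check it against your own data. The dataframe just recreates the two sample rows from the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Ref_ID": ["5.645217e", "5.587422e"],
    "Date_Time": ["2020-12-02 16:23:15", "2019-02-25 18:33:24"],
})

# First pass: read the string as year-day-month; impossible dates become NaT.
day_first = pd.to_datetime(df["Date_Time"], format="%Y-%d-%m %H:%M:%S", errors="coerce")

# Second pass: read the same string as year-month-day.
month_first = pd.to_datetime(df["Date_Time"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Keep the day-first reading where it is valid, otherwise fall back to month-first.
df["fixed_date"] = day_first.fillna(month_first)
print(df)
```

Rows where both readings are valid (for example 2020-03-05) will always take the day-first reading here, so this only suits data where that is what you want.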

Related

convert yyyy-mm-dd to mmm-yy in dataframe python

I am trying to convert the way month and year is presented.
I have dataframe as below
Date
2020-01-31
2020-04-30
2021-05-05
and I want to convert it to a month-and-year format.
The output that I am expecting is
Date
Jan-20
Apr-20
May-21
I tried to do it with datetime but it doesn't work.
pd.to_datetime(pd.Series(df['Date'),format='%mmm-%yy')
Use .dt.strftime() to change the display format. %b-%y is the format string for Mmm-YY:
df.Date = pd.to_datetime(df.Date).dt.strftime('%b-%y')
# Date
# 0 Jan-20
# 1 Apr-20
# 2 May-21
Or if Date is the index:
df.index = pd.to_datetime(df.index).strftime('%b-%y')
import pandas as pd
date_sr = pd.to_datetime(pd.Series("2020-12-08"))
change_format = date_sr.dt.strftime('%b-%y')
print(change_format)
reference https://docs.python.org/3/library/datetime.html
%Y-%m-%d changed to ('%b-%y')
import datetime
df['Date'] = df['Date'].apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d').strftime('%b-%y'))
# reference https://docs.python.org/3/library/datetime.html
# %Y-%m-%d changed to ('%b-%y')
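To put the column and index cases side by side: a Series needs the .dt accessor, while a DatetimeIndex has strftime directly. This is a minimal sketch using the sample dates from the question; the variable names are only for illustration. Note that the result is plain strings, so it will no longer sort chronologically.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2020-01-31", "2020-04-30", "2021-05-05"]})

# Column case: parse to datetime, then format through the .dt accessor.
as_column = pd.to_datetime(df["Date"]).dt.strftime("%b-%y")

# Index case: a DatetimeIndex exposes strftime itself, without .dt.
as_index = pd.to_datetime(pd.Index(df["Date"])).strftime("%b-%y")

print(as_column.tolist())
print(list(as_index))
```

The month abbreviation from %b follows the current locale, so the exact text can differ on a non-English system.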

pandas.to_datetime with different length date strings

I have a column of timestamps that I would like to convert to datetime in my pandas dataframe. The format of the dates is %Y-%m-%d-%H-%M-%S which pd.to_datetime does not recognize. I have manually entered the format as below:
df['TIME'] = pd.to_datetime(df['TIME'], format = '%Y-%m-%d-%H-%M-%S')
My problem is some of the times do not have seconds so they are shorter
(format = %Y-%m-%d-%H-%M).
How can I get all of these strings to datetimes?
I was thinking I could add zero seconds (-0) to the end of my shorter dates but I don't know how to do that.
Try stripping the dashes so pandas can infer the format, then use strftime to write every value out in the full format. If pandas can't recognize your custom datetime format, you should provide it explicitly:
from functools import partial
import pandas as pd
df1 = pd.DataFrame({'Date': ['2018-07-02-06-05-23','2018-07-02-06-05']})
newdatetime_fmt = partial(pd.to_datetime, format='%Y-%m-%d-%H-%M-%S')
df1['Clean_Date'] = (df1.Date.str.replace('-','').apply(lambda x: pd.to_datetime(x).strftime('%Y-%m-%d-%H-%M-%S'))
.apply(newdatetime_fmt))
print(df1,df1.dtypes)
output:
Date Clean_Date
0 2018-07-02-06-05-23 2018-07-02 06:05:23
1 2018-07-02-06-05 2018-07-02 06:05:00
Date object
Clean_Date datetime64[ns]
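The idea from the question, adding zero seconds to the shorter strings, also works and keeps a single explicit format. A minimal sketch, assuming every value has either five or six dash-separated fields; the TIME_parsed column name is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"TIME": ["2018-07-02-06-05-23", "2018-07-02-06-05"]})

# Four dashes means five fields, i.e. the seconds are missing.
missing_seconds = df["TIME"].str.count("-") == 4

# Keep complete strings as they are and append "-00" to the short ones.
padded = df["TIME"].where(~missing_seconds, df["TIME"] + "-00")

# Every string now matches the same explicit format.
df["TIME_parsed"] = pd.to_datetime(padded, format="%Y-%m-%d-%H-%M-%S")
print(df)
```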

KeyError: Timestamp when converting date in column to date

Trying to convert the date (type=datetime) of a complete column into a date to use in a condition later on. The following error keeps showing up:
KeyError: Timestamp('2010-05-04 10:15:55')
Tried multiple things but I'm currently stuck with the code below.
for d in df.column:
pd.to_datetime(df.column[d]).apply(lambda x: x.date())
Also, how do I format the column so I can use it in a statement as follows:
df = df[df.column > 2015-05-28]
Just adding an answer in case anyone else ends up here:
First, let's create a dataframe with some dates, change the dtype to string and convert it back. With errors='ignore', to_datetime returns the input unchanged when it hits a value it cannot parse, so if you had John Smith in row x it would remain. In the same vein, with errors='coerce' John Smith would become NaT (not a time).
# Create date range with frequency of a day
rng = pd.date_range(start='01/01/18', end ='01/01/19',freq='D')
#pass this into a dataframe
df = pd.DataFrame({'Date' : rng})
print(df.dtypes)
Date datetime64[ns]
# okay, let's cast this to str so we can convert it back
df['Date'] = df['Date'].astype(str)
print(df.dtypes)
Date object
# now lets convert it back #
df['Date'] = pd.to_datetime(df.Date,errors='ignore')
print(df.dtypes)
Date datetime64[ns]
# Okay, let's slice the dataframe for your desired date
print(df.loc[df.Date > '2018-12-29'])
Date
363 2018-12-30
364 2018-12-31
365 2019-01-01
The answer as provided by @Datanovice:
df['your column'] = pd.to_datetime(df['your column'], errors='ignore')
Then inspect the dtype; it should be a datetime. If so, just do
df.loc[df['your column'] > 'your-date']
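As for the original error: the KeyError comes from df.column[d]. The loop variable d is already a value from the column, so using it as a label looks up an index entry that does not exist. No loop is needed. A minimal sketch, assuming a column literally named column and some made-up sample values:

```python
import pandas as pd

# Made-up sample data, including one value that is not a date.
df = pd.DataFrame({
    "column": ["2010-05-04 10:15:55", "2016-01-02 08:00:00", "not a date"],
})

# Convert the whole column at once; unparseable values become NaT.
df["column"] = pd.to_datetime(df["column"], errors="coerce")

# Compare against a Timestamp (a quoted date string works too),
# not the bare expression 2015-05-28, which is integer subtraction.
recent = df[df["column"] > pd.Timestamp("2015-05-28")]

# Drop the time of day but keep the datetime64 dtype.
dates_only = df["column"].dt.normalize()

print(recent)
print(dates_only)
```

Rows holding NaT never satisfy the comparison, so they drop out of the filtered result.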

How to add last day of the month for each month in python

I have my data in the following format:
final.head(5)
(Head of the data, displaying sales for each month from May 2015)
I want to add the last day of the month for each record and want an output like this
transactionDate sale_price_after_promo
05/30/2015 30393.8
06/31/2015 24345.68
07/30/2015 26688.91
08/31/2015 46626.1
09/30/2015 27933.84
10/31/2015 76087.55
I tried this
pd.Series(pd.DatetimeIndex(start=final.start_time, end=final.end_time, freq='M')).to_frame('transactionDate')
But getting an error
'DataFrame' object has no attribute 'start_time'
Create PeriodIndex and then convert it to_timestamp:
df = pd.DataFrame({'transactionDate':['2015-05','2015-06','2015-07']})
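# normalize() drops the 23:59:59.999999999 time that newer pandas versions attach with how='end'
df['date'] = pd.PeriodIndex(df['transactionDate'], freq='M').to_timestamp(how='end').normalize()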
print (df)
transactionDate date
0 2015-05 2015-05-31
1 2015-06 2015-06-30
2 2015-07 2015-07-31
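Another way to get the month end, shown here as a minimal sketch on the same three sample values, is to parse the year-month string (which gives the first day of the month) and roll it forward with pd.offsets.MonthEnd(0):

```python
import pandas as pd

df = pd.DataFrame({"transactionDate": ["2015-05", "2015-06", "2015-07"]})

# Parsing "YYYY-MM" yields the first day of each month.
first_of_month = pd.to_datetime(df["transactionDate"], format="%Y-%m")

# MonthEnd(0) rolls a date forward to the end of its own month
# and leaves dates that are already a month end untouched.
df["date"] = first_of_month + pd.offsets.MonthEnd(0)
print(df)
```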
I am attempting to dynamically convert all date columns to YYYY-MM-DD format in a dataframe that comes from read_csv. The columns are below.
input
empno,ename,hiredate,report_date,end_date
1,sreenu,17-Jun-2021,18/06/2021,May-22
output
empno,ename,hiredate,report_date,end_date
1,sreenu,2021-06-17,2021-06-18,2022-05-31
The rules are:
1. If the date is MMM-YY or MM-YYYY (May-22 or 05-2022), then use the last day of the month in YYYY-MM-DD format (2022-05-31).
2. Anything other than point 1 should be converted to YYYY-MM-DD.
Now I want to create a method/function that identifies all date columns in the dataframe and then converts them to YYYY-MM-DD or a user-specified format.
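One possible shape for such a function is to try a list of known formats per column and accept the first one that parses every value. This is only a sketch under stated assumptions: the two format lists and the name normalize_date_column are hypothetical, covering just the formats in the sample row, and any column that fits none of them is left alone.

```python
import pandas as pd

# Formats without a day: the result should be the month end (assumed list).
MONTH_ONLY_FORMATS = ["%b-%y", "%m-%Y"]
# Formats with a full date (assumed list, taken from the sample row).
FULL_DATE_FORMATS = ["%d-%b-%Y", "%d/%m/%Y"]


def normalize_date_column(col):
    """Return col as YYYY-MM-DD strings, or None if no known format fits every value."""
    for fmt in MONTH_ONLY_FORMATS:
        parsed = pd.to_datetime(col, format=fmt, errors="coerce")
        if parsed.notna().all():
            # Rule 1: month-only values move to the last day of the month.
            return (parsed + pd.offsets.MonthEnd(0)).dt.strftime("%Y-%m-%d")
    for fmt in FULL_DATE_FORMATS:
        parsed = pd.to_datetime(col, format=fmt, errors="coerce")
        if parsed.notna().all():
            # Rule 2: full dates are only reformatted.
            return parsed.dt.strftime("%Y-%m-%d")
    return None


# The sample row from the question.
df = pd.DataFrame({
    "empno": [1],
    "ename": ["sreenu"],
    "hiredate": ["17-Jun-2021"],
    "report_date": ["18/06/2021"],
    "end_date": ["May-22"],
})

for name in df.columns:
    # Numeric columns such as empno cannot hold date strings, so skip them.
    if pd.api.types.is_numeric_dtype(df[name]):
        continue
    converted = normalize_date_column(df[name])
    if converted is not None:
        df[name] = converted

print(df)
```

Requiring every value in a column to match keeps text columns such as ename from being converted by accident, at the cost of skipping date columns that contain missing or malformed entries.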

How to change format of data to '%Y%m%d' in Pandas?

I have a DF with first column showing as e.g. 2018-01-31 00:00:00.
I want to convert whole column (or during printing / saving to other variable) that date to 20180131 format.
NOT looking to do that during saving to a CSV file.
Tried this but it did not work:
df['mydate'] = pd.to_datetime(df['mydate'], format='%Y%m%d')
pd.to_datetime is used to convert your series to datetime:
s = pd.Series(['2018-01-31 00:00:00'])
s = pd.to_datetime(s)
print(s)
0 2018-01-31
dtype: datetime64[ns]
pd.Series.dt.strftime converts your datetime series to a string in your desired format:
s = s.dt.strftime('%Y%m%d')
print(s)
0 20180131
dtype: object
pd.to_datetime will convert a string to a date. You want to convert a date to a string:
df['mydate'].dt.strftime('%Y%m%d')
Note that it's possible your date is already a string, but in the wrong format in which case you might have to convert it to a date first:
pd.to_datetime(df['mydate'], format='%Y-%m-%d %H:%M:%S').dt.strftime('%Y%m%d')
Convert the string column holding values such as 2018-01-31 00:00:00 to a datetime:
df['mydate'] = pd.to_datetime(df['mydate'])
#Get your preferred strings based on format:
df['mydate'].dt.strftime('%Y-%m-%d')
#Output: '2018-01-31'
df['mydate'].dt.strftime('%Y%m%d')
#output:'20180131'
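Putting the pieces together: the format argument of pd.to_datetime describes how the input string looks, which is why the attempt in the question did not change the output, while strftime controls how the result is written out. A minimal sketch with made-up sample values; the extra integer column is my assumption, in case 20180131 is wanted as a number.

```python
import pandas as pd

df = pd.DataFrame({"mydate": ["2018-01-31 00:00:00", "2018-02-28 00:00:00"]})

# Step 1: parse the strings; format= describes the INPUT layout.
df["mydate"] = pd.to_datetime(df["mydate"], format="%Y-%m-%d %H:%M:%S")

# Step 2: write the dates out in the desired layout (this gives strings).
df["mydate_str"] = df["mydate"].dt.strftime("%Y%m%d")

# Optional: integers instead of strings.
df["mydate_int"] = df["mydate_str"].astype(int)

print(df)
```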
