Problem converting column with date info as object to datetime - python

I have a column of birth dates stored as object. When I try to convert it to datetime, I always get the following error:
time data '27126' does not match format '%d/%m/%Y' (match)
date
0 05/06/1980
1 31/07/1947
2 07/01/1963
3 26/03/1973
4 30/01/1991
5 12/12/1991
6 13/08/1987
7 10/01/1944
8 23/06/1965
9 08/10/1995
So far I have tried the following:
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df['date'] = df['date'].apply(lambda x: datetime.datetime.strptime(x, "%d/%m/%Y").strftime("%Y-%m-%d"))
df['date'] = pd.to_datetime(df['date'].str.strip(), format='%d/%m/%Y')

Add the parameter errors='coerce' to convert values that do not match the format into missing values (NaT for datetimes):
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
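Before discarding the bad values, it can be useful to see which rows they are. One option is to coerce first and then inspect the entries that became NaT. This is only a minimal sketch: the frame below is made up, combining a few rows from the question with the offending value '27126' from the error message, and bad_rows is a hypothetical name.

```python
import pandas as pd

# Made-up sample: three dates from the question plus the value from the error message
df = pd.DataFrame({"date": ["05/06/1980", "31/07/1947", "27126", "07/01/1963"]})

# Values that do not match the format become NaT instead of raising
parsed = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")

# Show the original strings that failed to parse
bad_rows = df.loc[parsed.isna(), "date"]
print(bad_rows.tolist())  # ['27126']

# Keep the parsed column once the failures have been reviewed
df["date"] = parsed
```

Whether those rows should be dropped, corrected by hand, or parsed with a different rule depends on where the data came from.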

Related

Date and time conversion in python pandas

A .csv file has a date column. When read into a pandas DataFrame and displayed, the date and time are displayed as:
2021-06-30 19:39:25
The correct date is 30-06-2021 19:39:25
How can this be changed?
Using pandas.to_datetime with an explicit format is more reliable (this assumes the raw strings in the CSV are day-first, e.g. 30-06-2021 19:39:25):
df['Date'] = pd.to_datetime(df['Date'] , format = '%d-%m-%Y %H:%M:%S')
Try strftime:
>>> date.strftime('%d-%m-%Y %H:%M:%S')
'30-06-2021 19:39:25'
Try the following:
df = pd.DataFrame({'Date':['2021-06-30 19:39:25', '2021-07-22 19:39:25', '2021-08-18 19:39:25']})
# convert `Date` column to datetime
df['Date'] = pd.to_datetime(df['Date'])
Solution:
df['Date'] = pd.to_datetime(df['Date'] , format = '%d-%m-%Y %H:%M:%S')
If the above doesn't work, use the following:
# Now convert to the desired display format (the result is strings, dtype object)
df['Date'] = df['Date'].dt.strftime('%d-%m-%Y %H:%M:%S')
print(df)
0 30-06-2021 19:39:25
1 22-07-2021 19:39:25
2 18-08-2021 19:39:25
Name: Date, dtype: object
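It may help to separate two steps that the answers above mix together: format= in pd.to_datetime describes how the input strings look, while dt.strftime controls how the parsed values are rendered as text. The sketch below assumes the CSV strings are year-first, exactly as displayed in the question; Date_display is a hypothetical column name.

```python
import pandas as pd

# Assumed input: strings exactly as they were displayed in the question
df = pd.DataFrame({"Date": ["2021-06-30 19:39:25", "2021-07-22 19:39:25"]})

# Step 1: parse. The format must describe the INPUT strings.
parsed = pd.to_datetime(df["Date"], format="%Y-%m-%d %H:%M:%S")

# Step 2: render. strftime returns strings in the requested layout.
df["Date_display"] = parsed.dt.strftime("%d-%m-%Y %H:%M:%S")

print(df["Date_display"].tolist())
# ['30-06-2021 19:39:25', '22-07-2021 19:39:25']
```

Keeping the parsed datetime column around for sorting and arithmetic, and formatting only for display or export, avoids sorting day-first strings alphabetically.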

How to extract multiple parts of values of a single column?

I have a date column in the format YYYY-MM-DD. I want to extract only the year and month from it, without the "-", because I later have to convert it to an integer to feed into my linear regression model.
Its current datatype is "object".
Dataframe:
date open close high low
0 2019-10-08 56.46 56.10 57.02 56.08
1 2019-10-09 56.76 56.76 56.95 56.41
2 2019-10-10 56.98 57.52 57.61 56.83
3 2019-10-11 58.24 59.05 59.41 58.08
4 2019-10-14 58.73 58.97 59.53 58.67
You can use pd.to_datetime to convert the date column to datetime, then use pd.Series.dt.strftime.
s = pd.to_datetime(df['date'])
df['date'] = s.dt.strftime("%Y%m") # gives '201910' for the sample data
# or
# df['date'] = s.dt.strftime("%y%m") # gives '1910' for the sample data
Here 'date' is the name of your date column:
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')  # '%Y-%m' would keep the hyphen, e.g. '2019-10'
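Since the end goal is an integer feature for a regression model, the string step can also be skipped entirely. A sketch using three of the sample dates from the question; year_month is a hypothetical column name.

```python
import pandas as pd

# Three of the dates from the question's sample frame
df = pd.DataFrame({"date": ["2019-10-08", "2019-10-09", "2019-10-14"]})

s = pd.to_datetime(df["date"])

# Build the YYYYMM integer arithmetically instead of formatting and casting
df["year_month"] = s.dt.year * 100 + s.dt.month

print(df["year_month"].tolist())  # [201910, 201910, 201910]
```

The same value can be obtained with s.dt.strftime("%Y%m").astype(int); the arithmetic version just avoids the round trip through strings.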

Cannot remove timestamp in datetime

I have a date column with dtype object and values formatted like 31-Mar-20. I tried to convert it with datetime.strptime into datetime64[D], formatted as 2020-03-31, but nothing I have tried works, including several methods from other posts. Some of them do turn my column into datetime64, but the values then carry a time component that I don't want. I need a date without the time part, formatted as 2020-03-31. This is my code:
dates = [datetime.datetime.strptime(ts, '%d-%b-%y').strftime('%Y-%m-%d')
         for ts in df['date']]
df['date']= pd.DataFrame({'date': dates})
df = df.sort_values(by=['date'])
This approach might work:
import pandas as pd
df = pd.DataFrame({'dates': ['20-Mar-2020', '21-Mar-2020', '22-Mar-2020']})
df
dates
0 20-Mar-2020
1 21-Mar-2020
2 22-Mar-2020
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%Y').dt.date
df
dates
0 2020-03-20
1 2020-03-21
2 2020-03-22
df['date'] = pd.to_datetime(df['date'], format="%d-%b-%y")
This converts the column to datetime. When you print df, the values display as 2020-03-31 like you want, but they are still pandas Timestamp objects: if you extract one value with df['date'][0] you see Timestamp('2020-03-31 00:00:00').
If you want plain date objects instead, you can do:
df['date'] = [ts.date() for ts in df['date']]
The vectorized equivalent of this step is df['date'].dt.date.
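The difference between the two results is easiest to see side by side. In the sketch below only '31-Mar-20' comes from the question; the other two values are made up, and as_date is a hypothetical name. It also assumes an English locale, because %b matches abbreviated month names.

```python
import datetime

import pandas as pd

# '31-Mar-20' is from the question; the other rows are made-up samples
df = pd.DataFrame({"date": ["31-Mar-20", "01-Feb-20", "15-Jan-20"]})

# datetime64 column: prints as 2020-03-31, individual values are Timestamps
parsed = pd.to_datetime(df["date"], format="%d-%b-%y")
print(type(parsed.iloc[0]).__name__)  # Timestamp

# object column holding plain datetime.date values, no time component
as_date = parsed.dt.date
print(as_date.iloc[0])  # 2020-03-31

# Both variants sort chronologically
df["date"] = as_date
df = df.sort_values(by="date")
print(df["date"].tolist()[0])  # 2020-01-15
```

The datetime64 column is usually the better choice for further pandas work (resampling, the .dt accessor), while the date objects are convenient when the values are handed to code that expects datetime.date.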

TypeError: Passing PeriodDtype data is invalid. Use `data.to_timestamp()` instead

How can I convert a date column in the format 2014-09 to the format 2014-09-01 00:00:00.000? The current format was produced by df['date'] = pd.to_datetime(df['date']).dt.to_period('M').
I used df['date'] = pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d %H:%M:%S.000'), but it raises an error: TypeError: Passing PeriodDtype data is invalid. Use data.to_timestamp() instead. I also tried pd.to_datetime(df['date']).dt.strftime('%Y-%m'), which raises the same error.
The first idea is to convert the periods to timestamps with Series.dt.to_timestamp and then use Series.dt.strftime:
print (df)
date
0 2014-09
print (df.dtypes)
date period[M]
dtype: object
df['date'] = df['date'].dt.to_timestamp('s').dt.strftime('%Y-%m-%d %H:%M:%S.000')
print (df)
date
0 2014-09-01 00:00:00.000
Or simply append the suffix as a constant, since it is the same for every value:
df['date'] = df['date'].dt.to_timestamp('s').dt.strftime('%Y-%m-%d %H:%M:%S').add('.000')
print (df)
date
0 2014-09-01 00:00:00.000
Or:
df['date'] = df['date'].dt.strftime('%Y-%m').add('-01 00:00:00.000')
print (df)
date
0 2014-09-01 00:00:00.000
Use %f for the fractional seconds (it represents microseconds and accepts one to six digits when parsing):
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S.%f')
Sample code:
df = pd.DataFrame({
'Date': ['2014-09-01 00:00:00.000']
})
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S.%f')
df
which gives you the following output
Date
0 2014-09-01
To convert a Period such as 2014-09 to 2014-09-01 00:00:00.000, we can do the following:
df = pd.DataFrame({
'date': ['2014-09-05']
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['date'] = pd.to_datetime(df['date']).dt.to_period("M")
df['date'] = df['date'].dt.strftime('%Y-%m-01 00:00:00.000')
df
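Try formatting with %f (six-digit microseconds) and stripping the last 3 digits. Note that pd.to_datetime raises the same TypeError on a period column, so this applies once the column holds timestamps: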
print(pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d %H:%M:%S.%f')[0][:-3])
Output:
2014-09-01 00:00:00.000
In the event the other answers don't work, you could try
df.index = pd.DatetimeIndex(df.date).to_period('s')
df.index
This should show a PeriodIndex with the frequency set to 's'.
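For reference, here is the whole round trip in one runnable piece: build the period column the way the question describes, convert it back to timestamps, and only then format. The input dates are made up, and ts and text are hypothetical names; treat it as a sketch rather than the one correct approach.

```python
import pandas as pd

# Made-up input; the question only shows the resulting period 2014-09
df = pd.DataFrame({"date": ["2014-09-05", "2015-01-20"]})

# Reproduce the question's starting point: a period[M] column
df["date"] = pd.to_datetime(df["date"]).dt.to_period("M")

# Periods -> timestamps at the start of each period
ts = df["date"].dt.to_timestamp()

# %f yields six digits (microseconds); drop the last three to get milliseconds
text = ts.dt.strftime("%Y-%m-%d %H:%M:%S.%f").str[:-3]

print(text.tolist())
# ['2014-09-01 00:00:00.000', '2015-01-01 00:00:00.000']
```

If later steps need real datetimes rather than strings, ts can be kept as the column and the formatting applied only on output.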

Pandas - Different time formats in the same column

I have a Dataframe that has dates stored in different formats in the same column as shown below:
date
1-10-2018
2-10-2018
3-Oct-2018
4-10-2018
Is there any way I could make all of them have the same format?
Use to_datetime once per format, with errors='coerce' to replace values that do not match with NaT. Then use combine_first to fill the missing values in date1 from the date2 Series.
date1 = pd.to_datetime(df['date'], format='%d-%m-%Y', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce')
df['date'] = date1.combine_first(date2)
print (df)
date
0 2018-10-01
1 2018-10-02
2 2018-10-03
3 2018-10-04
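If more than two formats turn up, the same idea extends to a loop. The helper below is a hypothetical sketch (parse_mixed is not a pandas function); it tries each format in order and keeps the first successful parse for every row. It assumes an English locale for %b.

```python
import pandas as pd


def parse_mixed(series, formats):
    """Hypothetical helper: try each format in order, keep the first match per row."""
    attempts = [pd.to_datetime(series, format=fmt, errors="coerce") for fmt in formats]
    result = attempts[0]
    for attempt in attempts[1:]:
        # Fill rows that are still NaT with values from the next format
        result = result.combine_first(attempt)
    return result


# Sample values from the question
df = pd.DataFrame({"date": ["1-10-2018", "2-10-2018", "3-Oct-2018", "4-10-2018"]})
df["date"] = parse_mixed(df["date"], ["%d-%m-%Y", "%d-%b-%Y"])

print(df["date"].dt.strftime("%Y-%m-%d").tolist())
# ['2018-10-01', '2018-10-02', '2018-10-03', '2018-10-04']
```

Rows that match none of the formats stay NaT, so df['date'].isna() shows what still needs attention. The order of the formats matters when a string could match more than one of them.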
