Hello there stackoverflow community,
I would like to change the datetime format of a column, but it doesn't work and I don't know what I'm doing wrong.
After executing the following code:
df6['beginn'] = pd.to_datetime(df6['beginn'], unit='s', errors='ignore')
I got this output, and that's fine, but I would like to remove the time so that only %m/%d/%Y is left.
ID DATE
91060 2017-11-10 00:00:00
91061 2022-05-01 00:00:00
91062 2022-04-01 00:00:00
Name: beginn, Length: 91063, dtype: object
I've tried this, among many other things:
df6['beginn'] = df6['beginn'].dt.strftime('%m/%d/%Y')
and get the following output:
AttributeError: Can only use .dt accessor with datetimelike values.
But I don't understand why. Haven't I already converted the data with pd.to_datetime?
Appreciate any hint you can give me! Thanks a lot!
You probably had to use errors="ignore" because not all of the dates you are parsing are in a valid format. With errors="ignore", if any value fails to parse, pd.to_datetime returns the input unchanged, so the column keeps dtype object and the .dt accessor is not available. If you use errors="coerce", as @phi mentioned, any dates that cannot be converted are set to NaT. The column's dtype is then converted to datetime64, and you can format it as you like and deal with the NaT values as needed.
Example
A DataFrame where one Date value is not a valid Year/Month/Day date (there is no 25th month):
>>> df = pd.DataFrame({'ID': [91060, 91061, 91062, 91063], 'Date': ['2017/11/10', '2022/05/01', '2022/04/01', '2055/25/25']})
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Using errors="ignore":
>>> df['Date'] = pd.to_datetime(df['Date'], errors='ignore')
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Column Date still has dtype object because not all the values could be converted, so running df['Date'] = df['Date'].dt.strftime("%m/%d/%Y") raises the AttributeError.
Using errors="coerce":
>>> df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
>>> df
ID Date
0 91060 2017-11-10
1 91061 2022-05-01
2 91062 2022-04-01
3 91063 NaT
>>> df.dtypes
ID int64
Date datetime64[ns]
dtype: object
Invalid dates are set to NaT, and the column now has dtype datetime64, so you can format it:
>>> df['Date'] = df['Date'].dt.strftime("%m/%d/%Y")
>>> df
ID Date
0 91060 11/10/2017
1 91061 05/01/2022
2 91062 04/01/2022
3 91063 NaN
Note: strftime returns strings, so the formatted column has dtype object again and the NaT values become NaN. The issue you are having is caused by some dirty data that is not in the expected format.
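Applied to the original question, the same idea could look like the minimal sketch below. It assumes (this is not confirmed by the question) that df6['beginn'] holds Unix timestamps in seconds, possibly stored as text, plus some dirty entries; the sample values are hypothetical. It also lists the raw values that could not be converted, so you can decide how to deal with them.

```python
import pandas as pd

# Hypothetical stand-in for the question's df6: epoch seconds stored as text,
# plus one dirty value that cannot be converted.
df6 = pd.DataFrame({"beginn": ["1510272000", "1651363200", "n/a"]})

# Step 1: turn the text into numbers; unparseable entries become NaN.
seconds = pd.to_numeric(df6["beginn"], errors="coerce")

# Step 2: interpret the numbers as seconds since the Unix epoch; NaN becomes NaT.
parsed = pd.to_datetime(seconds, unit="s")

# Optional: look at the raw values that could not be converted.
bad_rows = df6.loc[parsed.isna(), "beginn"]
print(bad_rows)

# Step 3: the values are datetime64 now, so .dt works; NaT turns into NaN.
df6["beginn"] = parsed.dt.strftime("%m/%d/%Y")
print(df6)
```

Splitting the numeric conversion from the date conversion keeps errors="coerce" in one place and avoids relying on errors="ignore", which silently leaves the column as object.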
Related
I have a question similar to "Convert date column (string) to datetime and match the format": I want to convert a string like '12/7/21' to '2021-07-12' as a date object. I believe the answer given there is wrong, and here is why:
# The suggested solution on Stackoverflow
>>> import pandas as pd
>>> df = pd.DataFrame({'Date':['15/7/21']})
>>> df['Date']
0 15/7/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0 2021-07-15
Name: Date, dtype: datetime64[ns]
Python seems to ignore the specified format in the code above! If you simply change 15 to 12 and input '12/7/21', it treats 12 as the month instead of the day:
>>> df = pd.DataFrame({'Date':['12/7/21']})
>>> df['Date']
0 12/7/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0 2021-12-07
Name: Date, dtype: datetime64[ns]
Does anyone know what's the best solution to this problem?
(In R you simply use lubridate::dmy(df$Date) and it works perfectly)
.astype('datetime64') attempts to parse the string as MM/DD/YY; if it can't (when the first field is greater than 12), it falls back to parsing it as DD/MM/YY. This is why you see seemingly inconsistent behaviour:
>>> import pandas as pd
>>> pd.Series('15/7/21').astype('datetime64')
0 2021-07-15
dtype: datetime64[ns]
>>> pd.Series('14/7/21').astype('datetime64')
0 2021-07-14
dtype: datetime64[ns]
>>> pd.Series('13/7/21').astype('datetime64')
0 2021-07-13
dtype: datetime64[ns]
>>> pd.Series('12/7/21').astype('datetime64')
0 2021-12-07
dtype: datetime64[ns]
The way to solve this is to pass the Series of strings directly to pd.to_datetime instead of converting it to datetime64 first. The format argument only matters when strings are being parsed, so without the .astype cast it is respected:
pd.to_datetime(df['Date'], format='%d/%m/%y')
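A self-contained sketch of that fix, using '12/7/21' (the ambiguous case) and '15/7/21' as hypothetical sample values. The optional .dt.date step is an assumption about what "as a date object" means here: it yields plain datetime.date values, roughly what lubridate::dmy returns in R.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["12/7/21", "15/7/21"]})

# Parse the strings directly; the format makes the order explicit:
# %d = day, %m = month, %y = two-digit year.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%y")
print(df["Date"])  # 2021-07-12 and 2021-07-15, regardless of whether day <= 12

# Optional: plain datetime.date objects instead of pandas Timestamps.
dates = df["Date"].dt.date
print(dates)
```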
This is the plain column
0 06:55:22
1 06:55:23
2 06:55:24
3 06:55:25
4 06:55:26
I would like to set that column as the index, but when I try to use the resample() method I always get the same error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
I've been using this to convert the Time column to datetime:
dt['Time'] = pd.to_datetime(dt['Time'],format).apply(lambda x: x.time())
You can use set_index to set the Time column as your index of the dataframe.
In [1954]: df.set_index('Time')
Out[1954]:
a
Time
06:55:23 1
06:55:24 2
06:55:25 3
06:55:26 4
Update after OP's comment
If you don't have a date column, pandas will attach the default date 1900-01-01 when you convert the times to datetime, like this:
In [1985]: pd.to_datetime(df['Time'], format='%H:%M:%S')
Out[1985]:
0 1900-01-01 06:55:23
1 1900-01-01 06:55:24
2 1900-01-01 06:55:25
3 1900-01-01 06:55:26
Name: Time, dtype: datetime64[ns]
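The resample() error itself most likely comes from .apply(lambda x: x.time()): that turns every value into a datetime.time object, so the column has dtype object and set_index produces a plain Index instead of a DatetimeIndex. A minimal sketch that keeps the datetime64 dtype (with the 1900-01-01 date attached) and then resamples; the column a and its values, and the 2-second bin size, are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["06:55:22", "06:55:23", "06:55:24", "06:55:25", "06:55:26"],
    "a": [0, 1, 2, 3, 4],
})

# Keep the datetime64 dtype: do not convert the values to datetime.time objects.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S")

# The index is now a DatetimeIndex, which resample() accepts.
df = df.set_index("Time")
out = df.resample("2s").sum()
print(out)
```

pd.to_timedelta(df["Time"]) is another option if you would rather not carry a dummy date; it gives a TimedeltaIndex, which the error message also lists as valid for resample().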
I have a column of strings that I need to convert to datetime (Spanish date format, dd/mm/yyyy):
>>> df['Date'].head()
0 31/10/2019
1 31/10/2019
2 30/10/2019
3 30/10/2019
4 29/10/2019
Name: Date, dtype: object
Convert
>>>pd.to_datetime(df['Date'], dayfirst = True)
>>>df['Date'].head()
0 2019-10-31
1 2019-10-31
2 2019-10-30
3 2019-10-30
4 2019-10-29
Name: Date, dtype: datetime64[ns]
Now I want to sort it by date, but the output changes strangely to:
>>>df['Date'] =df.sort_values(by=['Date'], ascending = True)
>>>df['Date'].head()
0 9443248_19
1 9443205_19
2 9441864_19
3 9441809_19
4 9440310_19
Name: Date, dtype: object
Any clue what happened here? Why does the type convert back to object?
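Make sure your 'Date' column really is converted to datetime first (assign the result of pd.to_datetime back to the column), and sort the DataFrame itself instead of assigning the result of sort_values, which is the whole DataFrame, to df['Date']. Then the sorting works fine: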
import pandas as pd
df = pd.DataFrame({'Date': ['31/10/2019', '31/10/2019', '30/10/2019', '30/10/2019', '29/10/2019']})
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df.sort_values(by='Date', ascending=True, inplace=True)
# df['Date']
# 4 2019-10-29
# 2 2019-10-30
# 3 2019-10-30
# 0 2019-10-31
# 1 2019-10-31
# Name: Date, dtype: datetime64[ns]
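If you want to keep the original dd/mm/yyyy strings in the column and only sort chronologically, sort_values also accepts a key function that is applied to the column just for the comparison. This is an alternative to the answer above, not part of it, and it assumes pandas 1.1 or later (where key was added); the sample dates are hypothetical and chosen so that plain string sorting gives the wrong order.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["31/10/2019", "01/11/2019", "29/10/2019"]})

# Sorting the raw strings compares them character by character,
# so "01/11/2019" ends up first.
as_text = df.sort_values(by="Date")

# key= parses the values only for the comparison; the stored strings stay unchanged.
by_date = df.sort_values(
    by="Date",
    key=lambda s: pd.to_datetime(s, format="%d/%m/%Y"),
)
print(as_text["Date"].tolist())
print(by_date["Date"].tolist())
```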
I want to convert a string from a dataframe to datetime.
dfx = df.ix[:,'a']
dfx = pd.to_datetime(dfx)
But it gives the following error:
ValueError: day is out of range for month
Can anyone help?
If your dates look like 30-01-2016 (day first), adding the parameter dayfirst=True to to_datetime may help:
dfx = df.loc[:, 'a']
dfx = pd.to_datetime(dfx, dayfirst=True)
A more general approach is to pass the format parameter together with errors='coerce', so values that do not match the format are replaced with NaT:
dfx = '30-01-2016'
dfx = pd.to_datetime(dfx, format='%d-%m-%Y', errors='coerce')
print (dfx)
2016-01-30 00:00:00
Sample:
dfx = pd.Series(['30-01-2016', '15-09-2015', '40-09-2016'])
print (dfx)
0 30-01-2016
1 15-09-2015
2 40-09-2016
dtype: object
dfx = pd.to_datetime(dfx, format='%d-%m-%Y', errors='coerce')
print (dfx)
0 2016-01-30
1 2015-09-15
2 NaT
dtype: datetime64[ns]
If the dates are month first (e.g. 01-30-2016), which the parser assumes by default, adding only errors='coerce' is enough:
dfx = pd.Series(['01-30-2016', '09-15-2015', '09-40-2016'])
print (dfx)
0 01-30-2016
1 09-15-2015
2 09-40-2016
dtype: object
dfx = pd.to_datetime(dfx, errors='coerce')
print (dfx)
0 2016-01-30
1 2015-09-15
2 NaT
dtype: datetime64[ns]
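If a column mixes several known formats, one way to build on the format plus errors='coerce' approach (an assumption about your data, not something the answer above covers) is to parse once per format and fill the gaps from the next attempt. Whatever is still NaT afterwards is genuinely invalid and can be inspected. The sample values are hypothetical.

```python
import pandas as pd

dfx = pd.Series(["30-01-2016", "2015/09/15", "40-09-2016"])

# First pass: day-month-year with dashes; anything else becomes NaT.
parsed = pd.to_datetime(dfx, format="%d-%m-%Y", errors="coerce")

# Second pass: year/month/day with slashes, used only where the first pass failed.
parsed = parsed.fillna(pd.to_datetime(dfx, format="%Y/%m/%d", errors="coerce"))

# Raw strings that matched neither format (here the invalid day 40).
bad = dfx[parsed.isna()]
print(parsed)
print(bad)
```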
Well, in my case this code:
import datetime
year = 2023
month = 2
date = datetime.date(year, month, 30)
raised this error because February has only 28 days (29 in a leap year). Maybe that helps someone.
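If the day comes from computed values like this, one option is to check it against the real length of the month first. calendar.monthrange(year, month) returns the weekday of the first day and the number of days in that month. The helper safe_date below is hypothetical, and clamping to the last day is only one possible policy; raising a clearer error may suit your data better.

```python
import calendar
import datetime


def safe_date(year: int, month: int, day: int) -> datetime.date:
    """Hypothetical helper: clamp day to the last valid day of the month."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return datetime.date(year, month, min(day, last_day))


# datetime.date(2023, 2, 30) raises ValueError: day is out of range for month
print(safe_date(2023, 2, 30))  # 2023-02-28
print(safe_date(2024, 2, 30))  # 2024-02-29 (leap year)
```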
I'm learning Python (3.6 with Anaconda) for my studies.
I'm using pandas to import an .xls file with two columns: Date (dd-mm-yyyy) and price.
But pandas changes the date format:
xls_file = pd.read_excel('myfile.xls')
print(xls_file.iloc[0, 0])
I'm getting:
2010-01-04 00:00:00
instead of:
04-01-2010 or at least : 2010-01-04
I don't know why hh:mm:ss is added; I get the same result for every row of the Date column. I also tried different things using to_datetime, but that didn't fix it.
Any idea ?
Thanks
What you need is to define the format in which the datetime values are printed. There might be a more elegant way to do it, but something like this will work:
In [11]: df
Out[11]:
id date
0 1 2017-09-12
1 2 2017-10-20
# Specifying the format
In [16]: print(df.iloc[0,1].strftime("%Y-%m-%d"))
2017-09-12
If you want to store the dates as strings in your specific format, you can apply .dt.strftime to the whole column:
In [17]: df["datestr"] = df["date"].dt.strftime("%Y-%m-%d")
In [18]: df
Out[18]:
id date datestr
0 1 2017-09-12 2017-09-12
1 2 2017-10-20 2017-10-20
In [19]: df.dtypes
Out[19]:
id int64
date datetime64[ns]
datestr object
dtype: object
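For the original question, a minimal sketch of both options on a whole column. A small DataFrame stands in for pd.read_excel('myfile.xls') because the file is not available here; the assumption is that read_excel already returned the Date column as datetime64 (which is why the time shows up when printing a single value). The prices are hypothetical.

```python
import pandas as pd

# Stand-in for pd.read_excel('myfile.xls'): Date already parsed as datetime64.
xls_file = pd.DataFrame({
    "Date": pd.to_datetime(["04-01-2010", "05-01-2010"], format="%d-%m-%Y"),
    "price": [10.5, 11.0],
})

# Option 1: text in dd-mm-yyyy (the column becomes strings, dtype object).
xls_file["Date_str"] = xls_file["Date"].dt.strftime("%d-%m-%Y")

# Option 2: keep real dates but drop the time part (datetime.date objects).
xls_file["Date_only"] = xls_file["Date"].dt.date

print(xls_file.iloc[0, 0])            # 2010-01-04 00:00:00 (a Timestamp prints its time)
print(xls_file.loc[0, "Date_str"])    # 04-01-2010
print(xls_file.loc[0, "Date_only"])   # 2010-01-04
```

Option 1 is convenient for display or export, but the strings no longer sort or compare as dates; option 2 keeps date semantics while hiding the time.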