I have a DataFrame that has dates stored in different formats in the same column, as shown below:
date
1-10-2018
2-10-2018
3-Oct-2018
4-10-2018
Is there any way I could make all of them have the same format?
Use to_datetime once per format, passing errors='coerce' so that values which do not match the format become NaT. Then use combine_first to fill the missing values in date1 from the date2 Series.
date1 = pd.to_datetime(df['date'], format='%d-%m-%Y', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce')
df['date'] = date1.combine_first(date2)
print (df)
date
0 2018-10-01
1 2018-10-02
2 2018-10-03
3 2018-10-04
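If the column mixes more than two formats, the same idea can be wrapped in a loop. The following is a minimal sketch, not part of the original answer: the helper name parse_mixed_dates is hypothetical, and it uses fillna instead of combine_first to fill the gaps left by earlier formats.

```python
import pandas as pd

def parse_mixed_dates(series, formats):
    """Try each format in order; a value keeps the first format that parses it."""
    result = pd.to_datetime(series, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        # Only positions that are still NaT are filled by the next format.
        result = result.fillna(pd.to_datetime(series, format=fmt, errors="coerce"))
    return result

df = pd.DataFrame({"date": ["1-10-2018", "2-10-2018", "3-Oct-2018", "4-10-2018"]})
df["date"] = parse_mixed_dates(df["date"], ["%d-%m-%Y", "%d-%b-%Y"])
print(df)
```

Values that match none of the formats stay NaT, so they can be inspected afterwards.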
Related
I have a date column in the format YYYY-MM-DD. I want to extract only the year and month from it, without the "-", because I later have to convert it into an integer to feed into my linear regression model.
Its current datatype is "object".
Dataframe :-
date open close high low
0 2019-10-08 56.46 56.10 57.02 56.08
1 2019-10-09 56.76 56.76 56.95 56.41
2 2019-10-10 56.98 57.52 57.61 56.83
3 2019-10-11 58.24 59.05 59.41 58.08
4 2019-10-14 58.73 58.97 59.53 58.67
You can use pd.to_datetime to convert date column to datetime then use pd.Series.dt.strftime.
s = pd.to_datetime(df['date'])
df['date'] = s.dt.strftime("%Y%m") # would give 201910
# or
# df['date'] = s.dt.strftime("%y%m") # would give 1910
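Since the goal is an integer feature, the string step can also be skipped. This is a small sketch under the assumption that a YYYYMM integer is what the model needs; the column name yearmonth is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2019-10-08", "2019-10-09", "2019-10-14"]})
s = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Build the YYYYMM integer directly from the year and month components.
df["yearmonth"] = s.dt.year * 100 + s.dt.month
print(df["yearmonth"].tolist())  # [201910, 201910, 201910]
```

The strftime route gives the same digits as a string, so int() or astype(int) would still be needed there.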
Here date is your date column. Note that '%Y-%m' keeps the hyphen; use '%Y%m' if it has to be dropped.
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].apply(lambda x: x.strftime('%Y-%m'))
I have a date column whose dtype is object and whose format is 31-Mar-20. I tried to convert it with datetime.strptime into datetime64[D] in the format 2020-03-31, but whatever I try does not work. I have tried some methods from this and this. They do turn my column into datetime64, but the values include a timestamp, which I don't want. I need a datetime without a timestamp, in the format 2020-03-31. This is my code:
dates = [datetime.datetime.strptime(ts,'%d-%b-%y').strftime('%Y-%m-%d')
for ts in df['date']]
df['date']= pd.DataFrame({'date': dates})
df = df.sort_values(by=['date'])
This approach might work -
import pandas as pd
df = pd.DataFrame({'dates': ['20-Mar-2020', '21-Mar-2020', '22-Mar-2020']})
df
dates
0 20-Mar-2020
1 21-Mar-2020
2 22-Mar-2020
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%Y').dt.date
df
dates
0 2020-03-20
1 2020-03-21
2 2020-03-22
df['date'] = pd.to_datetime(df['date'], format="%d-%b-%y")
This converts the column to datetime. When you look at df, it displays the values as 2020-03-31, as you want. However, these are all datetime objects, so if you extract one value with df['date'][0] you see Timestamp('2020-03-31 00:00:00').
If you want to convert them into dates, you can do:
df['date'] = [df_datetime.date() for df_datetime in df['date'] ]
There is probably a better way of doing this step.
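One possible alternative to the list comprehension is the .dt.date accessor. The sketch below uses made-up sample data and a hypothetical date_only column; it sorts while the column is still datetime64 and only then derives the plain dates.

```python
import pandas as pd

df = pd.DataFrame({"date": ["31-Mar-20", "01-Feb-20", "15-Jan-20"]})

# Parse once; the column dtype is a datetime64 type and sorts chronologically.
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%y")
df = df.sort_values(by="date").reset_index(drop=True)

# .dt.date yields Python datetime.date objects (the column becomes object dtype).
df["date_only"] = df["date"].dt.date
print(df)
```

Keeping the datetime64 column around is usually convenient, because the object column of date values loses the .dt accessor.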
I have a column of birth dates stored as object. The problem is that when I try to convert it to datetime, I always get the following error:
time data '27126' does not match format '%d/%m/%Y' (match)
date
0 05/06/1980
1 31/07/1947
2 07/01/1963
3 26/03/1973
4 30/01/1991
5 12/12/1991
6 13/08/1987
7 10/01/1944
8 23/06/1965
9 08/10/1995
So far I've tried the following:
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df['date'] = df['date'].apply(lambda x: datetime.datetime.strptime(x, "%d/%m/%Y").strftime("%Y-%m-%d"))
df['date'] = pd.to_datetime(df['date'].str.strip(), format='%d/%m/%Y')
Add the parameter errors='coerce' to convert values that do not match the format into missing values, here NaT:
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
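Before overwriting the column, it can help to look at which raw values failed to parse. The following is a minimal sketch with made-up sample data that includes the '27126' value from the error message; the variable names parsed and bad are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": ["05/06/1980", "27126", "31/07/1947"]})

parsed = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")

# Rows that were not missing originally but became NaT did not match the format.
bad = df.loc[parsed.isna() & df["date"].notna(), "date"]
print(bad.tolist())  # ['27126']
```

Once the offending values are known, you can decide whether to drop them, fix them at the source, or parse them separately.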
For a date column I have data like this: 19.01.01, which means 2019-01-01. Is there a method to change the format from the former to the latter?
My idea is to add 20 to the start of date and replace . with -. Are there better ways to do that?
Thanks.
If the format is YY.DD.MM, use %y.%d.%m; if the format is YY.MM.DD, use %y.%m.%d in to_datetime:
df = pd.DataFrame({'date':['19.01.01','19.01.02']})
#YY.DD.MM
df['date'] = pd.to_datetime(df['date'], format='%y.%d.%m')
print (df)
date
0 2019-01-01
1 2019-02-01
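#YY.MM.DD (recreate df first, because the column was already converted above)
df = pd.DataFrame({'date':['19.01.01','19.01.02']})
df['date'] = pd.to_datetime(df['date'], format='%y.%m.%d')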
print (df)
date
0 2019-01-01
1 2019-01-02
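One thing to keep in mind with %y is how the century is chosen: two-digit years 00 to 68 are read as 2000 to 2068, and 69 to 99 as 1969 to 1999. The sketch below uses made-up values, assumes the YY.MM.DD layout, and formats the result back to strings with strftime.

```python
import pandas as pd

s = pd.Series(["19.01.01", "68.12.31", "69.01.01"])
parsed = pd.to_datetime(s, format="%y.%m.%d")

# Render as YYYY-MM-DD strings to make the inferred century visible.
print(parsed.dt.strftime("%Y-%m-%d").tolist())
# ['2019-01-01', '2068-12-31', '1969-01-01']
```

If the data can contain years on both sides of that pivot, prefixing the century by hand, as suggested in the question, may be the safer choice.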
I get the value a1523245800 in the date field of my incoming data feed. How can I convert this value to a date dtype? I have tried pandas.to_datetime, but that does not seem to work. Thank you.
Here is my code:
pd.to_datetime(['a1523245800'], errors='coerce')
And here is the output of the above:
DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)
Remove the leading a with str[1:] (which drops the first character), or use str.extract to get the numeric part. Then convert the result to integers and call to_datetime with the unit parameter:
df = pd.DataFrame({'date':['a1523245800','a1523245800']})
df['date1'] = pd.to_datetime(df['date'].str[1:].astype('int64'), unit='s')
Or:
df['date1'] = pd.to_datetime(df['date'].str.extract(r'(\d+)', expand=False).astype('int64'), unit='s')
print (df)
date date1
0 a1523245800 2018-04-09 03:50:00
1 a1523245800 2018-04-09 03:50:00