I have a Pandas dataframe df that looks as follows:
df = pd.DataFrame({'timestamp' : ['Wednesday, Apr 4/04/22 at 17:02',
'Saturday, Apr 4/23/22 at 15:45'],
'foo' : [1, 2]
})
df
timestamp foo
0 Wednesday, Apr 4/04/22 at 17:02 1
1 Saturday, Apr 4/23/22 at 15:45 2
I'm trying to convert the timestamp column to a datetime object so that I can add a day_of_week column.
My attempt:
df['timestamp'] = pd.to_datetime(df['timestamp'],
format='%A, %b %-m/%-d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()
The error is:
ValueError: '-' is a bad directive in format '%A, %b %-m/%-d/%y at %H:%M'
Any assistance would be greatly appreciated. Thanks!
Just use the format without the -; when parsing, %m and %d also match values without a leading zero:
df['timestamp'] = pd.to_datetime(df['timestamp'],
format='%A, %b %m/%d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()
NB. to_datetime is quite flexible with the provided data: note how the incorrect day of the week in the first row (Wednesday) was simply ignored.
output:
timestamp foo day_of_week
0 2022-04-04 17:02:00 1 Monday
1 2022-04-23 15:45:00 2 Saturday
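Since the weekday name in the string is not validated, it can be useful to flag rows where it disagrees with the parsed date. The following is only a minimal sketch: the names stated and mismatch are made up for illustration, and it assumes an English locale for day_name().

```python
import pandas as pd

df = pd.DataFrame({
    'timestamp': ['Wednesday, Apr 4/04/22 at 17:02',
                  'Saturday, Apr 4/23/22 at 15:45'],
    'foo': [1, 2],
})

# %m and %d accept both zero-padded and unpadded values when parsing
parsed = pd.to_datetime(df['timestamp'], format='%A, %b %m/%d/%y at %H:%M')

# weekday name as written in the raw string (text before the first comma)
stated = df['timestamp'].str.split(',').str[0]

# True where the written weekday disagrees with the actual calendar date
mismatch = stated != parsed.dt.day_name()
print(df.assign(parsed=parsed, mismatch=mismatch))
```

Here the first row is flagged, because 4 April 2022 was a Monday rather than a Wednesday.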
I am trying to convert a column with a real mix of date formats. I have tried a few things from SO but still have not got a working solution. I have tried changing the column to 'string', and also tried converting the floats to int.
data
date
1 43076.0
2 43077
3 07 Dec 2017
4 2021-12-22 00:00:00
code to try and fix the Excel dates and '07 Dec 2017' style
d = ['43076.0', '43077', '07 Dec 2017', '2021-12-22 00:00:00']
df = pd.DataFrame(d, columns=['date'])
date1 = pd.to_datetime(df['date'], errors='coerce', format='%d %a %Y')
date2 = pd.to_datetime(df['date'], errors='coerce', unit='D', origin='1899-12-30')
frame_clean[col] = date2.fillna(date1)
error
Name: StartDate, Length: 16189, dtype: object' is not compatible with origin='1899-12-30'; it must be numeric with a unit specified
I prefer this approach over using apply, which is too slow, but I am struggling to get it working.
Edit
Breaking down @FObersteiner's solution for better understanding.
convert the simple dates
df['datetime'] = pd.to_datetime(df['date'], errors='coerce')
0 NaT
1 NaT
2 2017-12-07
3 2021-12-22
isolate the numeric rows
m = pd.to_numeric(df['date'], errors='coerce').notna()
m
0 True
1 True
2 False
3 False
convert numeric rows to floats
df['date'][m].astype(float)
0 43076.0
1 43077.0
convert numeric rows to floats and then dt objects
pd.to_datetime(df['date'][m].astype(float), errors='coerce', unit='D', origin='1899-12-30')
0 2017-12-07
1 2017-12-08
pull it all together and bring back the simple date rows
df.loc[m, 'datetime'] = pd.to_datetime(df['date'][m].astype(float), errors='coerce', unit='D', origin='1899-12-30')
print(df)
For the given example, use a mask to convert numeric and non-numeric data separately:
import pandas as pd
df = pd.DataFrame({'date':['43076.0', '43077', '07 Dec 2017', '2021-12-22 00:00:00']})
df['datetime'] = pd.to_datetime(df['date'], errors='coerce')
m = pd.to_numeric(df['date'], errors='coerce').notna()
df.loc[m, 'datetime'] = pd.to_datetime(df['date'][m].astype(float), errors='coerce', unit='D', origin='1899-12-30')
print(df)
date datetime
0 43076.0 2017-12-07
1 43077 2017-12-08
2 07 Dec 2017 2017-12-07
3 2021-12-22 00:00:00 2021-12-22
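The same idea can be written so that it does not depend on how to_datetime infers formats, which has changed between pandas versions. The sketch below is an assumption-laden variant, not the answer's exact method: parse_mixed_dates is a hypothetical helper, and the list of string formats is assumed to be known in advance.

```python
import pandas as pd

def parse_mixed_dates(s, formats=('%d %b %Y', '%Y-%m-%d %H:%M:%S')):
    """Hypothetical helper: Excel serial numbers plus known string formats."""
    # numeric-looking strings become floats, everything else becomes NaN
    numeric = pd.to_numeric(s, errors='coerce')
    # Excel serials are days since 1899-12-30; NaN rows become NaT
    result = pd.to_datetime(numeric, unit='D', origin='1899-12-30',
                            errors='coerce')
    # fill the remaining NaT rows one explicit format at a time
    for fmt in formats:
        result = result.fillna(pd.to_datetime(s, format=fmt, errors='coerce'))
    return result

df = pd.DataFrame({'date': ['43076.0', '43077', '07 Dec 2017',
                            '2021-12-22 00:00:00']})
df['datetime'] = parse_mixed_dates(df['date'])
print(df)
```

Each to_datetime call is vectorised over the whole column, so this avoids apply. Rows that match none of the formats stay NaT.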
I have a column in the following format
Date
June 22
June 23
June 24
June 25
I am trying to convert this column to datetime within a pandas df with the format YYYY-mm-dd
How can I accomplish this? I was able to parse the month and day, but I am not sure how to add the current year, since it is not present in my Date column.
df['Date'] = pd.to_datetime(df['Date'], format='%B %d')
Results:
Date
1900-07-22
1900-07-21
1900-07-20
1900-07-19
Desired results:
Date
2021-07-22
2021-07-21
2021-07-20
2021-07-19
Try:
>>> pd.to_datetime(df['Date'].add(' 2021'), format="%B %d %Y")
0 2021-06-22
1 2021-06-23
2 2021-06-24
3 2021-06-25
Name: Date, dtype: datetime64[ns]
Suggested by @HenryEcker, to add the current year instead of specifying 2021:
pd.to_datetime(df['Date'].add(f' {pd.Timestamp.now().year}'), format="%B %d %Y")
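Without a year in the string, the parser falls back to 1900, which is why the year has to be appended before parsing. A minimal end-to-end sketch follows; the year variable and the Date_str column are names chosen here for illustration, and the year is fixed to 2021 so that the result is reproducible.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['June 22', 'June 23', 'June 24', 'June 25']})

year = 2021  # or pd.Timestamp.now().year for the current year

# append the year to every string, then parse with a matching format
df['Date'] = pd.to_datetime(df['Date'] + f' {year}', format='%B %d %Y')

# a datetime column already displays as YYYY-mm-dd;
# strftime is only needed if plain strings are required
df['Date_str'] = df['Date'].dt.strftime('%Y-%m-%d')
print(df)
```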
My data has a date variable with two different date formats
Date
01 Jan 2019
02 Feb 2019
01-12-2019
23-01-2019
11-04-2019
22-05-2019
I want to convert these strings into dates (YYYY-mm-dd)
Date
2019-01-01
2019-02-02
2019-12-01
2019-01-23
2019-04-11
2019-05-22
I have tried the following, but I am looking for a better approach
df['Date'] = np.where(df['Date'].str.contains('-'), pd.to_datetime(df['Date'], format='%d-%m-%Y'), pd.to_datetime(df['Date'], format='%d %b %Y'))
Working solution for me
df['Date_1']= np.where(df['Date'].str.contains('-'),df['Date'],np.nan)
df['Date_2']= np.where(df['Date'].str.contains('-'),np.nan,df['Date'])
df['Date_new'] = np.where(df['Date'].str.contains('-'),pd.to_datetime(df['Date_1'], format = '%d-%m-%Y'),pd.to_datetime(df['Date_2'], format = '%d %b %Y'))
Just use the option dayfirst=True
pd.to_datetime(df.Date, dayfirst=True)
Out[353]:
0 2019-01-01
1 2019-02-02
2 2019-12-01
3 2019-01-23
4 2019-04-11
5 2019-05-22
Name: Date, dtype: datetime64[ns]
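The np.where attempt in the question fails because both to_datetime calls are evaluated on the whole column before np.where picks between them, so each call meets rows in the other format. Depending on the pandas version, a single call may also apply one inferred format to every row (newer releases offer format='mixed' for this case). A version-independent sketch, with dashed and named as illustrative names, parses each format with errors='coerce' and combines the results:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['01 Jan 2019', '02 Feb 2019', '01-12-2019',
                            '23-01-2019', '11-04-2019', '22-05-2019']})

# rows that do not match the given format become NaT instead of raising
dashed = pd.to_datetime(df['Date'], format='%d-%m-%Y', errors='coerce')
named = pd.to_datetime(df['Date'], format='%d %b %Y', errors='coerce')

# take the dashed result where available, otherwise the month-name result
df['Date_new'] = dashed.fillna(named)
print(df)
```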
My suggestion:
Define a conversion function as follows:
import datetime as dt
import pandas as pd

def conv_date(x):
    # try the '01 Jan 2019' style first, then fall back to '01-12-2019'
    try:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d %b %Y"))
    except ValueError:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d-%m-%Y"))
    return res
Now get the new date column as follows:
df['Date_new'] = df['Date'].apply(conv_date)
You can get the desired result with apply and the pandas to_datetime function, as shown below:
import pandas as pd
def change(value):
    return pd.to_datetime(value)
df = pd.DataFrame(data = {'date':['01 jan 2019']})
df['date'] = df['date'].apply(change)
df
I hope it may help you.
This simply works as expected:
import pandas as pd
a = pd.DataFrame({
'Date' : ['01 Jan 2019',
'02 Feb 2019',
'01-12-2019',
'23-01-2019',
'11-04-2019',
'22-05-2019']
})
a['Date'] = a['Date'].apply(lambda date: pd.to_datetime(date, dayfirst=True))
print(a)
I am trying to read a date-time value with microsecond precision:
1500909283.955000
The expected output should be something like
July 24, 2017 3:14:43.955 PM
But when I use the pandas to_datetime function I get
1970-01-01 00:00:01.500909283
I tried all possible formats with no success.
Any hints?
You need the unit='s' parameter for to_datetime:
In[4]:
pd.to_datetime(1500909283.955000, unit='s')
Out[4]: Timestamp('2017-07-24 15:14:43.955000')
The timestamp is in seconds since the epoch.
The default unit is nanoseconds:
In[5]:
pd.to_datetime(1500909283.955000, unit='ns')
Out[5]: Timestamp('1970-01-01 00:00:01.500909283')
which is what you observed
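To make the effect of unit concrete, the same integer can be passed with different units. This is a small sketch using the whole-second part of the value from the question; the variable names are illustrative only.

```python
import pandas as pd

value = 1500909283  # whole seconds since 1970-01-01 (fraction dropped)

as_ns = pd.to_datetime(value)                    # default: nanoseconds
as_s = pd.to_datetime(value, unit='s')           # seconds since the epoch
as_ms = pd.to_datetime(value * 1000, unit='ms')  # same instant in milliseconds

print(as_ns)
print(as_s)
print(as_ms)
```

The first call lands about 1.5 seconds after the epoch, while the other two agree on the same instant in July 2017.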
You need to_datetime with the unit parameter; the data are in seconds, not milliseconds:
df = pd.DataFrame({'col':['1500909283.955000','1500909283.955000']})
df['col'] = pd.to_datetime(df['col'], unit='s')
print (df)
col
0 2017-07-24 15:14:43.955
1 2017-07-24 15:14:43.955
For milliseconds:
df['col'] = pd.to_datetime(df['col'], unit='ms')
print (df)
col
0 1970-01-18 08:55:09.283955
1 1970-01-18 08:55:09.283955
If you need another format, use Series.dt.strftime:
df['col1'] = df['col'].dt.strftime('%b %d, %Y %I:%M:%S.%f %p')
print (df)
col col1
0 2017-07-24 15:14:43.955 Jul 24, 2017 03:14:43.955000 PM
1 2017-07-24 15:14:43.955 Jul 24, 2017 03:14:43.955000 PM
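strftime has no portable directive for milliseconds or for an hour without a leading zero, so matching the exact text requested in the question takes a little manual formatting. The helper below, format_ms_12h, is a hypothetical name and only a sketch; it assumes an English locale for the month name and the AM/PM marker.

```python
import pandas as pd

def format_ms_12h(ts):
    """Hypothetical helper: text like 'July 24, 2017 3:14:43.955 PM'."""
    hour12 = ts.hour % 12 or 12      # 0 -> 12, 15 -> 3
    millis = ts.microsecond // 1000  # truncate microseconds to milliseconds
    return (f"{ts.strftime('%B')} {ts.day}, {ts.year} "
            f"{hour12}:{ts.strftime('%M:%S')}.{millis:03d} {ts.strftime('%p')}")

ts = pd.to_datetime(1500909283955, unit='ms')
print(format_ms_12h(ts))
```

For a whole column, the same function could be applied with df['col'].apply(format_ms_12h), at the cost of a Python-level loop.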