I have a column in the following format
Date
June 22
June 23
June 24
June 25
I am trying to convert this column to datetime within a pandas df, in the format YYYY-mm-dd.
How can I accomplish this? I was able to parse the month and day, but I'm not sure how to add the current year, since it's not present in my Date column.
df['Date'] = pd.to_datetime(df['Date'], format='%B %d')
Results:
Date
1900-07-22
1900-07-21
1900-07-20
1900-07-19
Desired results:
Date
2021-07-22
2021-07-21
2021-07-20
2021-07-19
Try:
>>> pd.to_datetime(df['Date'].add(' 2021'), format="%B %d %Y")
0 2021-06-22
1 2021-06-23
2 2021-06-24
3 2021-06-25
Name: Date, dtype: datetime64[ns]
As suggested by @HenryEcker, to add the current year instead of hard-coding 2021:
pd.to_datetime(df['Date'].add(f' {pd.Timestamp.now().year}'), format="%B %d %Y")
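For reference, here is a minimal self-contained sketch of the same idea. The helper name with_year and the sample frame are illustrative assumptions, not part of the answer above. Appending the year before parsing also sidesteps the 1900 default, under which "February 29" cannot be parsed at all because 1900 is not a leap year.

```python
import pandas as pd

def with_year(dates, year=None):
    """Parse 'Month day' strings after appending a year (hypothetical helper).

    If no year is given, the current year is used.
    """
    if year is None:
        year = pd.Timestamp.now().year
    return pd.to_datetime(dates + f" {year}", format="%B %d %Y")

df = pd.DataFrame({"Date": ["June 22", "June 23", "June 24", "June 25"]})
# The year is pinned to 2021 here so the result is reproducible.
df["Date"] = with_year(df["Date"], year=2021)
print(df)
```

Calling with_year(df["Date"]) without the year argument uses the current year instead.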
I have a Pandas dataframe df that looks as follows:
df = pd.DataFrame({'timestamp' : ['Wednesday, Apr 4/04/22 at 17:02',
'Saturday, Apr 4/23/22 at 15:45'],
'foo' : [1, 2]
})
df
timestamp foo
0 Wednesday, Apr 4/04/22 at 17:02 1
1 Saturday, Apr 4/23/22 at 15:45 2
I'm trying to convert the timestamp column to a datetime object so that I can add a day_of_week column.
My attempt:
df['timestamp'] = pd.to_datetime(df['timestamp'],
format='%A, %b %-m/%-d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()
The error is:
ValueError: '-' is a bad directive in format '%A, %b %-m/%-d/%y at %H:%M'
Any assistance would be greatly appreciated. Thanks!
Just use the format without the -. When parsing, %m and %d also match values without a leading zero:
df['timestamp'] = pd.to_datetime(df['timestamp'],
format='%A, %b %m/%d/%y at %H:%M')
df['day_of_week'] = df['timestamp'].dt.day_name()
NB. to_datetime is quite flexible about the data it is given: note how the incorrect day of the week in the first row (April 4, 2022 was a Monday, not a Wednesday) was simply ignored.
output:
timestamp foo day_of_week
0 2022-04-04 17:02:00 1 Monday
1 2022-04-23 15:45:00 2 Saturday
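Since the weekday name in the text is not validated, it may be worth flagging rows where it disagrees with the parsed date. This is only a sketch: the variable names are illustrative, and it assumes the weekday is always the text before the first comma.

```python
import pandas as pd

raw = pd.Series(["Wednesday, Apr 4/04/22 at 17:02",
                 "Saturday, Apr 4/23/22 at 15:45"])
parsed = pd.to_datetime(raw, format="%A, %b %m/%d/%y at %H:%M")

# Weekday name as written in the text (everything before the first comma).
stated = raw.str.split(",").str[0]
# Weekday name implied by the parsed calendar date.
actual = parsed.dt.day_name()

# True where the text and the calendar disagree.
mismatch = stated != actual
print(pd.DataFrame({"stated": stated, "actual": actual, "mismatch": mismatch}))
```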
I have a df_mixed column containing data in yyyyww format, e.g. 201501, 201502, etc.
I have to extract the last date of each week and put it in a ds column.
For example, for 201501 the last day of week 1 is 4-1-2015.
For 201502, the last day is 11-1-2015.
I have to follow the ISO format.
According to the ISO format the 1st week of 2015 starts from 29th December 2014 and ends on 4th January 2015
Any idea how to go about it using python, pandas and datetime library?
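IIUC, use pd.to_datetime to construct the datetime with the format %Y%W%w. %W numbers weeks from the first Monday of the year, which for 2015 is one less than the ISO week number, hence the -1 (this offset depends on the year, so check it for other years). I appended 0 as the weekday (%w) since you want Sundays, which are the last day of a Monday-based week: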
df = pd.DataFrame({"Date":[201501, 201502]})
df["Date"] = pd.to_datetime((df["Date"]-1).astype(str)+"0", format="%Y%W%w")
print (df)
Date
0 2015-01-04
1 2015-01-11
Assuming this input:
df = pd.DataFrame({'date': ['201501', '201502']})
If you choose Sunday as the last day of week:
df['date2'] = pd.to_datetime(df['date']+'Sun', format='%Y%W%a')
df
output:
date date2
0 201501 2015-01-11
1 201502 2015-01-18
NB. if you want the American week format (weeks starting on Sunday), use %U in place of %W and Sat as the last day of the week. See the datetime documentation for more details.
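Because %W and %U are not ISO week numbers, another option is to build the date from the ISO calendar directly. The sketch below uses datetime.date.fromisocalendar (available from Python 3.8); the helper name iso_week_end and the column name week are assumptions made for illustration.

```python
import datetime as dt
import pandas as pd

def iso_week_end(yyyyww):
    """Return the Sunday that closes ISO week `yyyyww`, e.g. 201501 (hypothetical helper)."""
    year, week = divmod(int(yyyyww), 100)
    # ISO weekday 7 is Sunday, the last day of an ISO week.
    return dt.date.fromisocalendar(year, week, 7)

df_mixed = pd.DataFrame({"week": [201501, 201502]})
df_mixed["ds"] = pd.to_datetime(df_mixed["week"].map(iso_week_end))
print(df_mixed)
```

For 201501 this gives 2015-01-04 and for 201502 it gives 2015-01-11, matching the ISO definition quoted in the question.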
I have a CSV with some data that looks like this:
I have many of these files, and I want to read them into DataFrame:
df = pd.read_csv(filepath, engine='c')
df['closingDate'] = pd.to_datetime(df['closingDate'], format='%dd-%mmm-%yy')
df['Fut Expiration Date'] = pd.to_datetime(df['Fut Expiration Date'], format='%d-%m-%yy')
I've tried a multitude of formats, but none seem to work. Is there an alternative?
Actually, you do not need to specify the format here. The format is unambiguous, so if we convert the column without specifying one, we get:
>>> df
Date
0 1-Dec-99
1 1-Jul-99
2 1-Jun-99
3 1-Nov-99
4 1-Oct-99
5 1-Sep-99
6 2-Aug-99
7 2-Dec-99
>>> pd.to_datetime(df['Date'])
0 1999-12-01
1 1999-07-01
2 1999-06-01
3 1999-11-01
4 1999-10-01
5 1999-09-01
6 1999-08-02
7 1999-12-02
Name: Date, dtype: datetime64[ns]
Alternatively, we can look up the format in the documentation of the datetime module. We see there that:
%d    Day of the month as a zero-padded decimal number.        01, 02, …, 31
%b    Month as locale’s abbreviated name.                      Jan, Feb, …, Dec (en_US); Jan, Feb, …, Dez (de_DE)
%y    Year without century as a zero-padded decimal number.    00, 01, …, 99
So we can specify the format as:
>>> pd.to_datetime(df['Date'], format='%d-%b-%y')
0 1999-12-01
1 1999-07-01
2 1999-06-01
3 1999-11-01
4 1999-10-01
5 1999-09-01
6 1999-08-02
7 1999-12-02
Name: Date, dtype: datetime64[ns]
Check out the directives for datetimes here. The following should work, using 3 letter months and 2 digit years:
df['Fut Expiration Date'] = pd.to_datetime(df['Fut Expiration Date'], format='%d-%b-%y')
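As a quick check, the same format can be applied to both columns from the question. This is a sketch on inline sample data rather than the real CSV: the closingDate values are taken from the answer above, while the 'Fut Expiration Date' values are made up for illustration.

```python
import pandas as pd

# Sample rows; the 'Fut Expiration Date' values are hypothetical.
df = pd.DataFrame({
    "closingDate": ["1-Dec-99", "2-Aug-99"],
    "Fut Expiration Date": ["15-Dec-99", "20-Aug-99"],
})

# %d also matches days without a leading zero when parsing.
for col in ["closingDate", "Fut Expiration Date"]:
    df[col] = pd.to_datetime(df[col], format="%d-%b-%y")

print(df)
```

Note that %b matches month abbreviations in the current locale, so this assumes English month names.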
Use %b for a three letter month. Please see the Python strftime reference: http://strftime.org/
I think you want: %d for the day, %b for the month, and %y for the year.
When parsing, %d accepts days both with and without zero padding, so the same format works either way.
One of my columns in a pandas dataframe has dates formatted like so:
Saturday, April 29th, 2017
How would I change this to a pandas readable date type so that I can sort by date?
(python 3)
Use to_datetime. See the example below:
import pandas as pd
df = pd.DataFrame({'date': ["Saturday, April 29th, 2017", "Wednesday, March 22nd, 2017"]})
print(df.head())
# conversion to pandas datetime
df['date'] = pd.to_datetime(df['date'])
print(df.head())
# sorting by date
print("sorted by Date")
print(df.sort_values(['date']).head())
results in
date
0 Saturday, April 29th, 2017
1 Wednesday, March 22nd, 2017
date
0 2017-04-29
1 2017-03-22
sorted by Date
date
1 2017-03-22
0 2017-04-29
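If you would rather not rely on automatic format detection, one alternative is to strip the ordinal suffix and pass an explicit format. This is a sketch under the assumption that every value follows the "Weekday, Month day(st|nd|rd|th), Year" pattern shown in the question.

```python
import pandas as pd

df = pd.DataFrame({"date": ["Saturday, April 29th, 2017",
                            "Wednesday, March 22nd, 2017"]})

# Drop the ordinal suffix that follows the day number: "29th" -> "29".
# The digits in front of the suffix keep words like "August" untouched.
cleaned = df["date"].str.replace(r"(\d+)(?:st|nd|rd|th)", r"\1", regex=True)
df["date"] = pd.to_datetime(cleaned, format="%A, %B %d, %Y")

df = df.sort_values("date")
print(df)
```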