pandas - converting d-mmm-yy to datetime object - python

I have a CSV with some data that looks like this:
I have many of these files, and I want to read them into a DataFrame:
df = pd.read_csv(filepath, engine='c')
df['closingDate'] = pd.to_datetime(df['closingDate'], format='%dd-%mmm-%yy')
df['Fut Expiration Date'] = pd.to_datetime(df['Fut Expiration Date'], format='%d-%m-%yy')
I've tried a multitude of formats, but none seem to work. Is there an alternative?

Actually, you do not need to specify the format here. The format is unambiguous, so if we convert the column without specifying a format, we get:
>>> df
Date
0 1-Dec-99
1 1-Jul-99
2 1-Jun-99
3 1-Nov-99
4 1-Oct-99
5 1-Sep-99
6 2-Aug-99
7 2-Dec-99
>>> pd.to_datetime(df['Date'])
0 1999-12-01
1 1999-07-01
2 1999-06-01
3 1999-11-01
4 1999-10-01
5 1999-09-01
6 1999-08-02
7 1999-12-02
Name: Date, dtype: datetime64[ns]
Alternatively, we can look up the format in the documentation of the datetime module [Python-doc]. There we see that:
%d    Day of the month as a zero-padded decimal number.       01, 02, …, 31
%b    Month as locale’s abbreviated name.                     Jan, Feb, …, Dec (en_US); Jan, Feb, …, Dez (de_DE)
%y    Year without century as a zero-padded decimal number.   00, 01, …, 99
So we can specify the format as:
>>> pd.to_datetime(df['Date'], format='%d-%b-%y')
0 1999-12-01
1 1999-07-01
2 1999-06-01
3 1999-11-01
4 1999-10-01
5 1999-09-01
6 1999-08-02
7 1999-12-02
Name: Date, dtype: datetime64[ns]

Check out the directives for datetimes here. The following should work, using 3 letter months and 2 digit years:
df['Fut Expiration Date'] = pd.to_datetime(df['Fut Expiration Date'], format='%d-%b-%y')

Use %b for a three-letter month. Please see the Python strftime reference: http://strftime.org/
I think you want: %d for the day, %b for the month, and %y for the two-digit year.
When parsing, %d accepts days with or without zero padding, so both 1-Dec-99 and 01-Dec-99 match. Note that %w is the weekday number, not the day of the month.
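Putting the answers above together, here is a minimal sketch that parses both columns from the question. The inline CSV text and its values are made up for illustration, and errors="coerce" is an optional extra that is not part of the original answers: it turns unparseable entries into NaT instead of raising.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for one of the CSV files.
csv_text = """closingDate,Fut Expiration Date
1-Dec-99,15-Mar-99
2-Aug-99,not a date
"""

df = pd.read_csv(io.StringIO(csv_text))

# %d = day of month (padding optional when parsing),
# %b = abbreviated month name, %y = two-digit year.
for col in ["closingDate", "Fut Expiration Date"]:
    df[col] = pd.to_datetime(df[col], format="%d-%b-%y", errors="coerce")

print(df.dtypes)
print(df)
```

Keep in mind that %b depends on the locale, so this assumes English month abbreviations.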

Related

Parsing dates in pandas.to_datetime when date is 'DD-MMM' [duplicate]

I have a column in the following format
Date
June 22
June 23
June 24
June 25
I am trying to convert this column to datetime within a pandas df with the format YYYY-mm-dd
How can I accomplish this? I was able to format the date and convert to mm-dd, but I'm not sure how to add the current year since it's not present in my Date column.
df['Date'] = pd.to_datetime(df['Date'], format='%B %d')
Results:
Date
1900-07-22
1900-07-21
1900-07-20
1900-07-19
Desired results:
Date
2021-07-22
2021-07-21
2021-07-20
2021-07-19
Try:
>>> pd.to_datetime(df['Date'].add(' 2021'), format="%B %d %Y")
0 2021-06-22
1 2021-06-23
2 2021-06-24
3 2021-06-25
Name: Date, dtype: datetime64[ns]
As suggested by @HenryEcker, to add the current year instead of hard-coding 2021:
pd.to_datetime(df['Date'].add(f' {pd.Timestamp.now().year}'), format="%B %d %Y")
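A self-contained sketch of why the year comes out as 1900 and how appending a year fixes it. The sample frame mirrors the question's column; the year 2021 is hard-coded here only so that the result is reproducible.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["June 22", "June 23", "June 24", "June 25"]})

# Without a year in the string, the parser falls back to its default year, 1900.
no_year = pd.to_datetime(df["Date"], format="%B %d")

# Appending the year to each string before parsing gives the intended dates.
with_year = pd.to_datetime(df["Date"] + " 2021", format="%B %d %Y")

print(no_year.dt.year.unique())
print(with_year.dt.strftime("%Y-%m-%d").tolist())
```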

How do I extract last day of week from yyyyww column in python

I have a df_mixed column containing data in yyyyww format, eg: 201501, 201502…etc
I have to extract the last date of the week and put it in ds column.
For example, for 201501 the last day of week 1 is 4-1-2015.
For 201502, the last day is 11-1-2015.
I have to follow the ISO format.
According to the ISO format the 1st week of 2015 starts from 29th December 2014 and ends on 4th January 2015
Any idea how to go about it using python, pandas and datetime library?
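IIUC, use pd.to_datetime to construct the datetime with the format %Y%W%w. I appended 0 as the weekday since you want Sundays (0 is Sunday for %w), and subtracted 1 from the week number because %W counts the days before the first Monday of the year as week 0. This lines up with the ISO weeks of 2015, but not of every year: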
df = pd.DataFrame({"Date":[201501, 201502]})
df["Date"] = pd.to_datetime((df["Date"]-1).astype(str)+"0", format="%Y%W%w")
print (df)
Date
0 2015-01-04
1 2015-01-11
Assuming this input:
df = pd.DataFrame({'date': ['201501', '201502']})
If you choose Sunday as the last day of week:
df['date2'] = pd.to_datetime(df['date']+'Sun', format='%Y%W%a')
df
output:
date date2
0 201501 2015-01-11
1 201502 2015-01-18
NB. If you want the American week format (weeks starting on Sunday), use %U in place of %W and Sat as the last day of the week. See the datetime documentation for more details.
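Both answers above rely on %W, which numbers weeks from the first Monday of the year and so does not follow ISO 8601 in general. As an alternative sketch (not taken from the answers), the standard library's date.fromisocalendar, available from Python 3.8, builds a date directly from an ISO year, ISO week and ISO weekday. The helper name iso_week_end and the column names are assumptions for illustration.

```python
from datetime import date

import pandas as pd


def iso_week_end(yyyyww):
    """Return the Sunday (ISO weekday 7) that closes the given ISO week."""
    year, week = divmod(int(yyyyww), 100)  # 201501 -> (2015, 1)
    return pd.Timestamp(date.fromisocalendar(year, week, 7))


df = pd.DataFrame({"df_mixed": [201501, 201502]})
df["ds"] = df["df_mixed"].map(iso_week_end)
print(df)
```

For the two sample values this yields 2015-01-04 and 2015-01-11, the dates asked for in the question.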

Convert column with month and year ("August 2020"...) to datetime

I have a dataframe which contains one column with the month and year as a string:
time          index  value
January 2021  y      5
January 2021  v      8
May 2020      y      25
June 2020     Y      13
June 2020     x      11
June 2020     v      10
...
I would like to convert the "time" column to datetime format so I can sort the table in chronological order.
Is there any way to do this when the time is a string with a month name and a year?
Edit:
When I do:
result_Table['time']=pd.to_datetime(result_Table['time'],format='%Y-%m-%d')
I receive this error:
ValueError: time data January 2021 doesn't match format specified
Sample dataframe:
df=pd.DataFrame({'time':['January 2021','May 2020','June 2020']})
If you want to specify the format parameter then that should be '%B %Y' instead of '%Y-%m-%d':
df['time']=pd.to_datetime(df['time'],format='%B %Y')
#OR
#you can also simply use:
#df['time']=pd.to_datetime(df['time'])
output of df:
time
0 2021-01-01
1 2020-05-01
2 2020-06-01
For more info regarding format codes visit here
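Since the goal in the question is chronological sorting, here is a minimal sketch of the full round trip. The value column and its numbers are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["January 2021", "May 2020", "June 2020"],
    "value": [5, 25, 13],  # hypothetical values
})

# %B = full month name, %Y = four-digit year; the day defaults to 1.
df["time"] = pd.to_datetime(df["time"], format="%B %Y")

# Once the column is datetime64, sorting is chronological rather than alphabetical.
df_sorted = df.sort_values("time").reset_index(drop=True)
print(df_sorted)
```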

Pandas date range returns "could not convert string to Timestamp" for yyyy-ww

I have a dataframe with two columns; Sales and Date.
dataset.head(10)
Date Sales
0 2015-01-02 34988.0
1 2015-01-03 32809.0
2 2015-01-05 9802.0
3 2015-01-06 15124.0
4 2015-01-07 13553.0
5 2015-01-08 14574.0
6 2015-01-09 20836.0
7 2015-01-10 28825.0
8 2015-01-12 6938.0
9 2015-01-13 11790.0
I want to convert the Date column from yyyy-mm-dd (e.g. 2015-06-01) to yyyy-ww (e.g. 2015-23), so I run the following piece of code:
dataset["Date"] = pd.to_datetime(dataset["Date"]).dt.strftime('%Y-%V')
Then I group by my Sales based on weeks, i.e.
data = dataset.groupby(['Date'])["Sales"].sum().reset_index()
data.head(10)
Date Sales
0 2015-01 67797.0
1 2015-02 102714.0
2 2015-03 107011.0
3 2015-04 121480.0
4 2015-05 148098.0
5 2015-06 132152.0
6 2015-07 133914.0
7 2015-08 136160.0
8 2015-09 185471.0
9 2015-10 190793.0
Now I want to create a date range based on the Date column, since I'm predicting sales based on weeks:
ds = data.Date.values
ds_pred = pd.date_range(start=ds.min(), periods=len(ds) + num_pred_weeks,
freq="W")
However, I'm getting the following error, which I'm not sure how to fix: could not convert string to Timestamp. If I use 2015-01-01 as the start date instead, I get no error, which makes me think I'm using the functions incorrectly, but I'm not sure how.
I would like to basically have a date range that spans weekly from the current week and then 52 weeks into the future.
I think the problem is that you take the minimum of the dataset["Date"] column, which is now filled with strings in the format YYYY-VV, whereas date_range needs a start value in YYYY-MM-DD format or a datetime object.
I found this:
Several additional directives not required by the C89 standard are included for convenience. These parameters all correspond to ISO 8601 date values. These may not be available on all platforms when used with the strftime() method. The ISO 8601 year and ISO 8601 week directives are not interchangeable with the year and week number directives above. Calling strptime() with incomplete or ambiguous ISO 8601 directives will raise a ValueError.
%V ISO 8601 week as a decimal number with Monday as the first day of the week. Week 01 is the week containing Jan 4.
Pandas 0.24.2 bug with YYYY-VV format:
dataset = pd.DataFrame({'Date':['2015-06-01','2015-06-02']})
dataset["Date"] = pd.to_datetime(dataset["Date"]).dt.strftime('%Y-%V')
print (dataset)
Date
0 2015-23
1 2015-23
ds = pd.to_datetime(dataset['Date'], format='%Y-%V')
print (ds)
ValueError: 'V' is a bad directive in format '%Y-%V'
A possible solution is to use %U or %W instead:
%U Week number of the year (Sunday as the first day of the week) as a zero padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0.
%W Week number of the year (Monday as the first day of the week) as a decimal number. All days in a new year preceding the first Monday are considered to be in week 0.
dataset = pd.DataFrame({'Date':['2015-06-01','2015-06-02']})
dataset["Date"] = pd.to_datetime(dataset["Date"]).dt.strftime('%Y-%U')
print (dataset)
Date
0 2015-22
1 2015-22
ds = pd.to_datetime(dataset['Date'] + '-1', format='%Y-%U-%w')
print (ds)
0 2015-06-01
1 2015-06-01
Name: Date, dtype: datetime64[ns]
Or keep the Date column of the original DataFrame as datetimes and format it as a string only for grouping:
dataset = pd.DataFrame({'Date':['2015-06-01','2015-06-02'],
'Sales':[10,20]})
dataset["Date"] = pd.to_datetime(dataset["Date"])
print (dataset)
Date Sales
0 2015-06-01 10
1 2015-06-02 20
data = dataset.groupby(dataset['Date'].dt.strftime('%Y-%V'))["Sales"].sum().reset_index()
print (data)
Date Sales
0 2015-23 30
num_pred_weeks = 5
ds = data.Date.values
ds_pred = pd.date_range(start=dataset["Date"].min(), periods=len(ds) + num_pred_weeks, freq="W")
print (ds_pred)
DatetimeIndex(['2015-06-07', '2015-06-14', '2015-06-21', '2015-06-28',
               '2015-07-05', '2015-07-12'],
              dtype='datetime64[ns]', freq='W-SUN')
If ds contains dates as strings formatted like '2015-01', which corresponds to '%Y-%W' (or '%G-%V' in the datetime library), you have to add a day number to obtain an actual date. Here, assuming that you want the Monday, you should do:
ds_pred = pd.date_range(start=pd.to_datetime(ds.min() + '-1', format='%Y-%W-%w'),
                        periods=len(ds) + num_pred_weeks, freq="W")
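If the week labels really are ISO weeks (as produced by strftime('%G-%V')), one way to get the start date without relying on %V parsing support is date.fromisocalendar from the standard library (Python 3.8+). This is a sketch under that assumption; the helper name iso_week_start and the sample labels are hypothetical. It also uses freq="W-MON" so that the generated dates stay on Mondays, whereas plain "W" anchors on Sundays.

```python
from datetime import date

import pandas as pd


def iso_week_start(label):
    """Return the Monday (ISO weekday 1) of an ISO 'YYYY-WW' label."""
    year, week = (int(part) for part in label.split("-"))
    return pd.Timestamp(date.fromisocalendar(year, week, 1))


weeks = ["2015-01", "2015-02", "2015-03"]  # hypothetical grouped week labels
num_pred_weeks = 2

# Zero-padded labels sort correctly as plain strings, so min() finds the first week.
ds_pred = pd.date_range(
    start=iso_week_start(min(weeks)),
    periods=len(weeks) + num_pred_weeks,
    freq="W-MON",
)
print(ds_pred)
```

ISO week 1 of 2015 begins on 2014-12-29, so the range starts there and advances one week per period.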
