Pandas Manipulating Freq for Business Day DateRange - python

I am trying to add a set of common date related columns to my data frame and my approach to building these date columns is off the .date_range() pandas method that will have the date range for my dataframe.
While I can use methods like .index.day or .index.weekday_name for general date columns, I would like to set a business day column based on date_range I constructed, but not sure if I can use the freq attribute nickname 'B' or if I need to create a new date range.
Further, I am hoping to not count those business days based on a list of holiday dates that I have.
Here is my setup:
Holiday table
holiday_table = holiday_table.set_index('date')
holiday_table_dates = holiday_table.index.to_list() # ['2019-12-31', etc..]
Base Date Table
data_date_range = pd.date_range(start=date_range_start, end=date_range_end)
df = pd.DataFrame({'date': data_date_range}).set_index('date')
df['day_index'] = df.index.day
# Weekday Name
df['weekday_name'] = df.index.weekday_name
# Business day
df['business_day'] = data_date_range.freq("B")
Error at df['business_day'] = data_date_range.freq("B"):
---> 13 df['business_day'] = data_date_range.freq("B")
ApplyTypeError: Unhandled type: str

OK, I think I understand your question now. You are looking to create a a new column of working business days (excluding your custom holidays). In my example i just used the regular US holidays from pandas but you already have your holidays as a list in holiday_table_dates but you should still be able to follow the general layout of my example for your specific use. I also used the assumption that you are OK with boolean values for your business_day column:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as h_cal
# sample data
data_date_range = pd.date_range(start='1/1/2019', end='12/31/2019')
df = pd.DataFrame({'date': data_date_range}).set_index('date')
df['day_index'] = df.index.day
# Weekday Name
df['weekday_name'] = df.index.weekday_name
# this is just a sample using US holidays
hday = h_cal().holidays(df.index.min(), df.index.max())
# b is the same date range as bove just with the freq set to business days
b = pd.date_range(start='1/1/2019', end='12/31/2019', freq='B')
# find all the working business day where b is not a holiday
bday = b[~b.isin(hday)]
# create a boolean col where the date index is in your custom business day we just created
df['bday'] = df.index.isin(bday)
day_index weekday_name bday
date
2019-01-01 1 Tuesday False
2019-01-02 2 Wednesday True
2019-01-03 3 Thursday True
2019-01-04 4 Friday True
2019-01-05 5 Saturday False

Related

How to get the number of business days between two dates in pandas

I have the following column in a dataframe, I would like to add a column to the end of this dataframe, where the column has the business days from today (6/24) to the previous day.
Bday() function does not seem to have this capability.
Date
2019-6-21
2019-6-20
2019-6-14
I am looking for a result that looks like following:
Date Business days
2019-6-21 1
2019-6-20 2
2019-6-14 6
Is there an easy way to do this, other than doing individual manipulations or using datetime library
Use np.busday_count:
# df['Date'] = pd.to_datetime(df['Date']) # if needed
np.busday_count(df['Date'].dt.date, np.datetime64('today'))
# array([1, 2, 6])
df['bdays'] = np.busday_count(df['Date'].dt.date, np.datetime64('today'))
df
Date bdays
0 2019-06-21 1
1 2019-06-20 2
2 2019-06-14 6

Is there some Python function like .to_period that could help me extract a fiscal year's week number based on a date?

Essentially, I want to apply some lambda function(?) of some sort to apply to a column in my dataframe that contains dates. Originally, I used dt.week to extract the week number but the calendar dates don't match up with the fiscal year I'm using (Apr 2019 - Mar 2020).
I have tried using pandas' function to_period('Q-MAR) but that seems to be a little bit off. I have been researching other ways but nothing seems to work properly.
Apr 1 2019 -> Week 1
Apr 3 2019 -> Week 1
Apr 30 2019 -> Week 5
May 1 2019 -> Week 5
May 15 2019 -> Week 6
Thank you for any advice or tips in advance!
You can create a DataFrame which contains the dates with a frequency of weeks:
date_rng = pd.date_range(start='01/04/2019',end='31/03/2020', freq='W')
df = pd.DataFrame(date_rng, columns=['date'])
You can then query df for which index the date is smaller than or equal to the value:
df.index[df.date <= query_date][-1]
This will output the largest index which is smaller than or equal to the date you want to examine. I imagine you can pour this into a lambda yourself?
NOTE
This solution has limitations, the biggest one being you have to manually define the datetime dataframe.
I did create a fiscal calendar that can be later improvised to create function in spark
from fiscalyear import *
beginDate = '2016-01-01'
endDate = '2021-12-31'
#create empty dataframe
df = spark.createDataFrame([()])
#create date from given date range
df1 = df.withColumn("date",explode(expr(f"sequence(to_date('{beginDate}'), to_date('{endDate}'), interval 1 day)")))
# get week
df1 = df1.withColumn('week',weekofyear(col("date"))).withColumn('year',year(col("date")))
#translate to use pandas in python
df1 = df1.toPandas()
#get fiscal year
df1['financial_year'] = df1['date'].map(lambda x: x.year if x.month > 3 else x.year-1)
df1['date'] = pd.to_datetime(df1['date'])
#get calendar qtr
df1['quarter_old'] = df1['date'].dt.quarter
#get fiscal calendar
df1['quarter'] = np.where(df1['financial_year']< (df1['year']),df1['quarter_old']+3,df1['quarter_old'])
df1['quarter'] = np.where(df1['financial_year'] == (df1['year']),df1['quarter_old']-1,df1['quarter'])
#get fiscal week by shiftin gas per number of months different from usual calendar
df1["fiscal_week"] = df1.week.shift(91)
df1 = df1.loc[(df1['date'] >= '2020-01-01')]
df1.display()

Slice, combine, and map fiscal year dates to calendar year dates to new column

I have the following pandas data frame:
Shortcut_Dimension_4_Code Stage_Code
10225003 2
8225003 1
8225004 3
8225005 4
It is part of a much larger dataset that I need to be able to filter by month and year. I need to pull the fiscal year from the first two digits for values larger than 9999999 in the Shortcut_Dimension_4_Code column, and the first digit for values less than or equal to 9999999. That value needs to be added to "20" to produce a year i.e. "20" + "8" = 2008 | "20" + "10" = 2010.
That year "2008, 2010" needs to be combined with the stage code value (1-12) to produce a month/year, i.e. 02/2010.
The date 02/2010 then needs to converted from fiscal year date to calendar year date, i.e. Fiscal Year Date : 02/2010 = Calendar Year date: 08/2009. The resulting date needs to be presented in a new column. The resulting df would end up looking like this:
Shortcut_Dimension_4_Code Stage_Code Date
10225003 2 08/2009
8225003 1 07/2007
8225004 3 09/2007
8225005 4 10/2007
I am new to pandas and python and could use some help. I am beginning with this:
Shortcut_Dimension_4_Code Stage_Code CY_Month Fiscal_Year
0 10225003 2 8.0 10
1 8225003 1 7.0 82
2 8225003 1 7.0 82
3 8225003 1 7.0 82
4 8225003 1 7.0 82
I used .map and .str methods to produce this df, but have not been able to figure out how to get the FY's right, for fy 2008-2009.
In below code, I'll assume Shortcut_Dimension_4_Code is an integer. If it's a string you can convert it or slice it like this: df['Shortcut_Dimension_4_Code'].str[:-6]. More explanations in comments alongside the code.
That should work as long as you don't have to deal with empty values.
import pandas as pd
import numpy as np
from datetime import date
from dateutil.relativedelta import relativedelta
fiscal_month_offset = 6
input_df = pd.DataFrame(
[[10225003, 2],
[8225003, 1],
[8225004, 3],
[8225005, 4]],
columns=['Shortcut_Dimension_4_Code', 'Stage_Code'])
# make a copy of input dataframe to avoid modifying it
df = input_df.copy()
# numpy will help us with numeric operations on large collections
df['fiscal_year'] = 2000 + np.floor_divide(df['Shortcut_Dimension_4_Code'], 1000000)
# loop with `apply` to create `date` objects from available columns
# day is a required field in date, so we'll just use 1
df['fiscal_date'] = df.apply(lambda row: date(row['fiscal_year'], row['Stage_Code'], 1), axis=1)
df['calendar_date'] = df['fiscal_date'] - relativedelta(months=fiscal_month_offset)
# by default python dates will be saved as Object type in pandas. You can verify with `df.info()`
# to use clever things pandas can do with dates we need co convert it
df['calendar_date'] = pd.to_datetime(df['calendar_date'])
# I would just keep date as datetime type so I could access year and month
# but to create same representation as in question, let's format it as string
df['Date'] = df['calendar_date'].dt.strftime('%m/%Y')
# copy important columns into output dataframe
output_df = df[['Shortcut_Dimension_4_Code', 'Stage_Code', 'Date']].copy()
print(output_df)

pandas - generating next closest year off of last digit

When computing the expiration date of some financial instruments, usually all we have to go off of is the following:
Trade Date RIC
5/22/1989 SPH0
5/23/1989 SPH0
5/24/1989 SPH0
5/25/1989 SPH0
5/26/1989 SPH0
Where the trade date is a day the instrument is being traded, and the RIC is a 4 letter string composed of:
First 2 characters = an asset class
3rd character = expiration month
last character = last digit of expiration year
Expiration months explained:
month_codes_to_int = {'F':'1', 'G':'2', 'H':'3', 'J':'4', 'K':'5', 'M':'6',
'N':'7', 'Q':'8', 'U':'9', 'V':'10', 'X':'11', 'Z':'12'}
I am generating an expiration year for every row by using the trade date and the RIC... I am doing so by iterating over the DF and filling an empty cell in a column every time with a function that will correctly compute the expiration date.
for index, row in df.iterrows():
row['Trade Date'] = pd.to_datetime(row['Trade Date'])
print(row['Trade Date'], row['RIC'])
current_year = row['Trade Date'].year
asset_class = row['RIC'].split[0:3]
expiration_month = row['RIC'][2]
expiration_year_last_digit = row['RIC'][3]
expiration_year =
My methodology is to have the expiration date be the closest date with the month and year to the current date, so for 5/22/1989 for example it would be 3/15/1990 (day doesn't matter, 1990 is the closest year ending with a 0 to 1989).
Is there a way to automate this using pandas datetime features?
First, create series containing your month, year, as well as the last digit of your TradeDate column.
m = df.RIC.str[2].map(month_codes_to_int)
y = df.RIC.str[3].astype(int)
s = df.TradeDate.dt.year.mod(10)
Then calculate your offset:
offset = np.where(y==s, 0, 10+y-s)
Finally, create your new column:
pd.to_datetime((df.TradeDate.dt.year + offset).astype(str) + m, format='%Y%m')
Output:
0 1990-03-01
1 1990-03-01
2 1990-03-01
3 1990-03-01
4 1990-03-01
dtype: datetime64[ns]

Propagate dates pandas and interpolate

We have some ready available sales data for certain periods, like 1week, 1month...1year:
time_pillars = pd.Series(['1W', '1M', '3M', '1Y'])
sales = pd.Series([4.75, 5.00, 5.10, 5.75])
data = {'time_pillar': time_pillars, 'sales': sales}
df = pd.DataFrame(data)
I would like to do two operations.
Firstly, create a new column of date type, df['date'], that corresponds to the actual date of 1week, 1month..1year from now.
Then, I'd like to create another column df['days_from_now'], taking how many days are on these pillars (1week would be 7days, 1month would be around 30days..1year around 365days).
The goal of this is then to use any day as input for a a simple linear_interpolation_method() to obtain sales data for any given day (eg, what are sales for 4Octobober2018? ---> We would interpolate between 3months and 1year).
Many thanks.
I'm not exactly sure what you mean regarding your interpolation, but here is a way to make your dataframe in pandas (starting from your original df you provided in your post):
from datetime import datetime
from dateutil.relativedelta import relativedelta
def create_dates(df):
df['date'] = [i.date() for i in
[d+delt for d,delt in zip([datetime.now()] * 4 ,
[relativedelta(weeks=1), relativedelta(months=1),
relativedelta(months=3), relativedelta(years=1)])]]
df['days_from_now'] = df['date'] - datetime.now().date()
return df
create_dates(df)
sales time_pillar date days_from_now
0 4.75 1W 2018-04-11 7 days
1 5.00 1M 2018-05-04 30 days
2 5.10 3M 2018-07-04 91 days
3 5.75 1Y 2019-04-04 365 days
I wrapped it in a function, so that you can call it on any given day and get your results for 1 week, 3 weeks, etc. from that exact day.
Note: if you want your days_from_now to simply be an integer of the number of days, use df['days_from_now'] = [i.days for i in df['date'] - datetime.now().date()] in the function, instead of df['days_from_now'] = df['date'] - datetime.now().date()
Explanation:
df['date'] = [i.date() for i in
[d+delt for d,delt in zip([datetime.now()] * 4 ,
[relativedelta(weeks=1), relativedelta(months=1),
relativedelta(months=3), relativedelta(years=1)])]]
Takes a list of the date today (datetime.now()) repeated 4 times, and adds a relativedelta (a time difference) of 1 week, 1 month, 3 months, and 1 year, respectively, extracts the date (i.date() for ...), finally creating a new column using the resulting list.
df['days_from_now'] = df['date'] - datetime.now().date()
is much more straightforward, it simply subtracts those new dates that you got above from the date today. The result is a timedelta object, which pandas conveniently formats as "n days".

Categories

Resources