Compute salary days in Pandas - python

I've created a custom Calendar:
holidays_list = [...] # list of all weekends and holidays for needed time period
class MyBusinessCalendar(AbstractHolidayCalendar):
    start_date = datetime(2011, 1, 1)
    end_date = datetime(2017, 12, 31)
    rules = [
        Holiday(name='Day Off', year=d.year, month=d.month, day=d.day) for d in holidays_list
    ]
cal = MyBusinessCalendar()
I know that salary days are the 5th and the 20th days of each month or the previous business days if these ones are days off.
Therefore I take
bus_day = CustomBusinessDay(calendar=cal)
r = pd.date_range('2011-01-01', '2017-12-31', freq=bus_day)
and I'd like to compute for each day from r if it's a salary day. How can I get this?

The list of salary days (paydays in American English) is defined by you as:
the 5th and the 20th days of each month or the previous business days if these ones are days off
To generate the list of paydays programmatically using a holiday calendar, you can generate the list of every 6th of the month and every 21st of the month:
dates = ([date(year, month, 6) for month in range(1, 13)] +
         [date(year, month, 21) for month in range(1, 13)])
Then get the previous working day, i.e. offset=-1. I'd use this:
np.busday_offset(dates, -1, roll='forward', holidays=my_holidays)
The reason I use numpy.busday_offset instead of the Pandas stuff for doing the offsets is that it is vectorized and runs very fast, whereas the Pandas busday offset logic is very slow. If the number of dates is small, it won't matter. You can still use Pandas to generate the list of holidays if you want.
Note that roll='forward' is because you want the logic to be that if the 6th is on a weekend or holiday, you roll forward to the 7th or 8th, then from there you offset -1 working day to get the payday.
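A minimal sketch of the approach above for a single year. The two entries in my_holidays are made up for illustration; the real list would come from your own calendar. Weekends do not need to be listed, because NumPy's default weekmask already treats Saturday and Sunday as non-business days. The last three lines flag each business day of the year, which is what the question asks for.

```python
from datetime import date

import numpy as np

# Hypothetical extra days off, for illustration only. Weekends need not be
# listed: NumPy's default weekmask already excludes Saturday and Sunday.
my_holidays = np.array(['2011-01-05', '2011-05-20'], dtype='datetime64[D]')

year = 2011
# The day after each nominal payday: the 6th and the 21st of every month.
dates = np.array(
    [date(year, month, day) for month in range(1, 13) for day in (6, 21)],
    dtype='datetime64[D]',
)

# Roll forward to a business day if needed, then step back one business day.
paydays = np.busday_offset(dates, -1, roll='forward', holidays=my_holidays)

# Flag every business day of the year that is a payday.
all_days = np.arange(f'{year}-01-01', f'{year + 1}-01-01', dtype='datetime64[D]')
business_days = all_days[np.is_busday(all_days, holidays=my_holidays)]
is_payday = np.isin(business_days, paydays)
```

With these assumed days off, the 5th of January 2011 is not a working day, so that payday moves back to the 4th.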

Related

How to get date range for a specific month starting with a previous month using Pandas

I'm trying to build a list of "pay days" for a given month in the future knowing only when the pay days started months ago. For example:
Starting date - When the paychecks started: 1/6/2023
Frequency is every two weeks
So if I want to know which dates are pay days in March, I have to start at 1/6/2023 and add two weeks until I get to March to know that the first pay day in March is 3/3/2023.
Then I want my final list of dates to be only those March dates of:
(3/3/2023, 3/17/2023, 3/31/2023)
I know I can use pandas to do something like:
pd.date_range(starting_date, starting_date+relativedelta(months=1), freq='14d')
but it would include every date back to 1/6/2023.
The easiest thing to do here would be to just update the starting_date parameter to be the first pay day in the month you're interested in.
To do this, you can use this function that finds the first pay day in a given month by first finding the difference between your start date and the desired month.
import datetime

# month is the number of the month (1-12)
def get_first_pay_day_in_month(month=datetime.datetime.now().month,
                               year=datetime.datetime.now().year,
                               start_date=datetime.datetime(2023, 1, 6),
                               ):
    diff = datetime.datetime(year, month, 1) - start_date
    freq = 14  # length of a pay period in days
    if diff.days % freq == 0:
        # The 1st of the month is itself a pay day.
        print(f'Difference: {diff.days // freq} pay periods')
        return datetime.datetime(year, month, 1)
    else:
        print(f'Difference: {diff.days} days')
        print(f'Days: {diff.days % freq} extra')
        # Move forward to the next whole multiple of freq days after start_date.
        return datetime.datetime(year, month, 1 + freq - (diff.days % freq))
Then you can use this function to get the first pay day of a specific month and plug it into the date_range method.
import pandas as pd
from dateutil import relativedelta

# Pass the year explicitly; it defaults to the current year otherwise.
starting_date = get_first_pay_day_in_month(month=3, year=2023)
pay_days = pd.date_range(starting_date, starting_date+relativedelta.relativedelta(months=1), freq='14d')
print(pay_days)
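One caveat with the range above: date_range includes its end point, so "first pay day plus one month" can pick up a date in the following month (for February 2023 the range from 2/3/2023 to 3/3/2023 would include 3/3/2023). A possible alternative, sketched below with a hypothetical helper name, is to generate every pay day up to the end of the target month and keep only those inside it.

```python
import pandas as pd


def pay_days_in_month(first_pay_day, year, month, freq_days=14):
    """Return the pay days that fall inside the given month (hypothetical helper)."""
    month_start = pd.Timestamp(year=year, month=month, day=1)
    month_end = month_start + pd.offsets.MonthEnd(0)  # last day of that month
    # Every pay day from the very first one up to the end of the target month...
    all_pay_days = pd.date_range(first_pay_day, month_end, freq=f'{freq_days}D')
    # ...then keep only those on or after the 1st of the target month.
    return all_pay_days[all_pay_days >= month_start]


march = pay_days_in_month('2023-01-06', 2023, 3)
print(march.strftime('%m/%d/%Y').tolist())
# ['03/03/2023', '03/17/2023', '03/31/2023']
```

This trades a little extra work (generating the earlier pay days) for not having to compute the first pay day of the month by hand.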

Python - Skip day if day is in a list of off days

I am working on something to automatically get the last day of class, depending on the first day and days of school.
I am struggling to find a way to skip off days and automatically add +1 day to have the correct end date. Also of course, if last day falls on a week-end, it should continue until it falls on a weekday.
Here's my code so far :
start_date = datetime.date(2019, 9, 30)
number_of_days = 5
date_list = []
for day in range(number_of_days):
    a_date = (start_date + datetime.timedelta(days = day)).isoformat()
    date_list.append(a_date)
print(date_list[-1])
I was thinking of putting all of my off days in a separate list and iterating over it, but I can't find a way to add +1 to a datetime. Also, creating a list of dates seems difficult, as you can't iterate on dates?
You could simply iterate through the days one at a time and discard those that fall on weekends or are in the "off-days" list, until you have enough class days:
import datetime
start_date = datetime.date(2019, 9, 30)
number_of_days = 5
off_days = (
    datetime.date(2019, 10, 1),
)
days_of_class = [start_date]
day = start_date
while len(days_of_class) < number_of_days:
    day = day + datetime.timedelta(days=1)
    if day.isoweekday() in (6, 7):
        # saturday, sunday
        continue
    if day in off_days:
        continue
    days_of_class.append(day)
print(days_of_class)
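If only the last day is needed, the same result can probably be obtained without a loop using numpy.busday_offset, which skips weekends by default and accepts a list of holidays. This is a sketch under the same assumptions as the example above (one off day on 2019-10-01); the first class day counts as day 1, hence the offset of number_of_days - 1.

```python
import datetime

import numpy as np

start_date = datetime.date(2019, 9, 30)
number_of_days = 5
off_days = np.array([datetime.date(2019, 10, 1)], dtype='datetime64[D]')

# roll='forward' moves start_date to the next class day if it is not one itself;
# the offset then counts the remaining class days, skipping weekends and off days.
last_day = np.busday_offset(
    np.datetime64(start_date), number_of_days - 1,
    roll='forward', holidays=off_days,
)
print(last_day.item())  # .item() converts the NumPy value back to datetime.date
```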

Get week of UK fiscal year

I want to get the week number corresponding to the UK fiscal year (which runs 6th April to 5th April). report_date.strftime('%V') will give me the week number corresponding to the calendar year (1st January to 31st December).
For example, today is 2nd February which is UK fiscal week 44, but %V would return 05.
I've seen the https://pypi.org/project/fiscalyear/ library but it doesn't seem to offer a way to do this. I know that I can work out the number of days since April 6th and divide by 7, but just curious if there's a better way.
This does the job in Python. It counts the number of days since April 6th of the given year (formatted_report_date), and if the answer is negative (because April 6th hasn't passed yet), it counts from April 6th of the previous year instead. It then integer-divides by 7 and adds 1 (for 1-indexing). The answer will be between 1 and 53.
def get_fiscal_week(formatted_report_date):
    """
    Given a date, returns the week number (from 1-53) since the last April 6th.
    :param formatted_report_date: the date (a datetime.date or
        datetime.datetime) to be converted into a fiscal week.
    """
    fiscal_start = formatted_report_date.replace(month=4, day=6)
    days_since_fiscal_start = (formatted_report_date - fiscal_start).days
    if days_since_fiscal_start < 0:
        # April 6th of this year has not happened yet, so use last year's.
        fiscal_start = fiscal_start.replace(year=fiscal_start.year - 1)
        days_since_fiscal_start = (formatted_report_date - fiscal_start).days
    # Floor division keeps the result an integer week number.
    return (days_since_fiscal_start // 7) + 1

Pandas fill a DataFrame from another by DatetimeIndex

I have a DataFrame of sales numbers with a DatetimeIndex, for data that extends over a couple of years at the minute level, and I want to first calculate totals (of sales) per year, month, day, hour and location, then average over years and month.
Then, with that data, I want to extrapolate to a new month, per day, hour and location. To do that, I calculate the sales numbers per hour for each day of the week (expecting that weekend days will behave differently from work week days). I then create a new DataFrame for the month I want to extrapolate to and, for each day in that month, compute (day of week, hour, POS) and use the past data for the corresponding (day of week, hour, POS) as my "prediction" for what will be sold at that POS at the given hour and day in the given month.
The reason I'm doing it this way is that once I calculate a mean per day of the week in the past, when I populate the DataFrame for the month of June, the 1st of June could be any day of the week, and that is important as weekdays/weekend days behave differently. I want the past sales number for a Friday, if the 1st is a Friday.
I have the following, that is unfortunately too slow - or maybe wrong, in any case, there is no error message but it doesn't complete (on the real data):
import numpy as np
import pandas as pd
# Setup some sales data for the past 2 years for some stores
hours = pd.date_range('2018-01-01', '2019-12-31', freq='h')
sales = pd.DataFrame(index = hours, columns=['Store', 'Count'])
sales['Store'] = np.random.randint(0,10, sales.shape[0])
sales['Count'] = np.random.randint(0,100, sales.shape[0])
# Calculate the average of sales over these 2 years for each hour in
# each day of the week and each store
sales.groupby([sales.index.year, sales.index.month, sales.index.dayofweek, sales.index.hour, 'Store'])['Count'] \
    .sum() \
    .rename_axis(index=['Year', 'Month', 'DayOfWeek', 'Hour', 'Store']) \
    .reset_index() \
    .groupby(['DayOfWeek', 'Hour', 'Store'])['Count'] \
    .mean() \
    .rename_axis(index=['DayOfWeek', 'Hour', 'Store'])
# Setup a DataFrame to predict May sales per store/day/hour
may_hours = pd.date_range('2020-05-01', '2020-05-31', freq='h')
predicted = pd.DataFrame(index = pd.MultiIndex.from_product([may_hours, range(0,11)]), columns = ['Count']) \
    .rename_axis(index=['Datetime', 'Store'])
# "Predict" sales for each (day, hour, store) in May 2020
# by retrieving the average sales for the corresponding
# (day of week, hour store)
for idx in predicted.index:
    qidx = (idx[0].dayofweek, idx[0].hour, idx[1])
    predicted.loc[idx] = sales[qidx] if qidx in sales.index else 0
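Two things stand out in the code above: the result of the groupby chain is never assigned, so the lookup in the loop runs against the raw hourly sales frame, and the loop fills one cell at a time. One possible vectorized sketch is shown below. It assumes the same random stand-in data (seeded here for repeatability), stores numbered 0 to 9, and a fallback of 0 for combinations that never occurred; the names totals and averages are introduced for illustration.

```python
import numpy as np
import pandas as pd

# Random stand-in data, as in the question (seeded here for repeatability).
rng = np.random.default_rng(0)
hours = pd.date_range('2018-01-01', '2019-12-31', freq='h')
sales = pd.DataFrame(
    {'Store': rng.integers(0, 10, len(hours)),
     'Count': rng.integers(0, 100, len(hours))},
    index=hours,
)

# Totals per (year, month, day of week, hour, store), then the mean of those
# totals per (day of week, hour, store). The result is kept in `averages`.
totals = (
    sales.groupby([sales.index.year, sales.index.month,
                   sales.index.dayofweek, sales.index.hour, 'Store'])['Count']
    .sum()
    .rename_axis(index=['Year', 'Month', 'DayOfWeek', 'Hour', 'Store'])
)
averages = totals.groupby(level=['DayOfWeek', 'Hour', 'Store']).mean().reset_index()

# One row per (hour of May 2020, store), with the lookup keys as columns.
may_hours = pd.date_range('2020-05-01', '2020-05-31 23:00', freq='h')
predicted = pd.MultiIndex.from_product(
    [may_hours, range(10)], names=['Datetime', 'Store']
).to_frame(index=False)
predicted['DayOfWeek'] = predicted['Datetime'].dt.dayofweek
predicted['Hour'] = predicted['Datetime'].dt.hour

# A single left merge replaces the per-row loop; unseen combinations become 0.
predicted = predicted.merge(averages, on=['DayOfWeek', 'Hour', 'Store'], how='left')
predicted['Count'] = predicted['Count'].fillna(0)
```

The merge matches every (day of week, hour, store) key in one pass, so the 1st of the month automatically picks up the averages for whatever weekday it falls on.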

Pandas generating first and last business day of each week between two dates

I am trying to figure out a way to generate first and last business day for each week between two dates for example 2016-01-01 and 2017-02-28.
Since many weeks in the US have a long weekend that starts on Friday or extends to Monday, simply finding all Mondays and Fridays does not work. For weeks where Monday is a holiday, Tuesday would be the first business day, and if Friday is a holiday, Thursday would be the last business day.
I can use pandas date_range function to generate all days between two dates but beyond that I can't seem to figure.
Hello this code should solve the issue:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
dr = pd.date_range(start='2016-01-01', end='2017-02-28')
cal = USFederalHolidayCalendar()
holidays = cal.holidays(start=dr.min(), end=dr.max())
A = dr[~dr.isin(holidays)]  # make sure it's not a US holiday
B = A[A.weekday != 5]  # make sure it's not Saturday
B = B[B.weekday != 6]  # make sure it's not Sunday
for year in sorted(set(B.year)):  # for every year in your range
    tmp = B[B.year == year]  # slice it year-wise
    # ISO week numbers; DatetimeIndex.week was removed in pandas 2.0
    weeks = tmp.isocalendar().week.to_numpy(dtype=int)
    for week in sorted(set(weeks)):  # for each week in the slice
        temp = tmp[weeks == week]
        print(temp[temp.weekday == temp.weekday.min()])  # beginning of week
        print(temp[temp.weekday == temp.weekday.max()])  # end of week
So basically you import a calendar with US holidays, create the desired date range, filter out the holidays, get rid of Saturdays and Sundays, and then loop through what is left and return the start and end of each week in the date range.
Hope it helps!
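As an alternative sketch, the nested loops can likely be replaced by a single groupby: build the business days directly with a CustomBusinessDay offset, group them by Monday-to-Sunday week, and take the minimum and maximum of each group. The column names first_bday and last_bday are made up for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Business days only: weekends and US federal holidays are skipped.
us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())
bdays = pd.date_range('2016-01-01', '2017-02-28', freq=us_bday)

# Group the business days by Monday-to-Sunday week and take the first and last.
week_bounds = (
    bdays.to_series()
    .groupby(bdays.to_period('W'))
    .agg(['min', 'max'])
    .rename(columns={'min': 'first_bday', 'max': 'last_bday'})
)
print(week_bounds.head())
```

Grouping by weekly period rather than by (year, week number) also avoids mixing the first days of January with the last week of December when an ISO week straddles the year boundary.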
