Group DataFrame by Business Day of Month - python

I am trying to group a pandas DataFrame that is indexed by date by the business day of the month (approximately 22 per month).
I would like to return a result that contains 22 rows with the mean of some value in the DataFrame.
I can group by calendar day of the month, but I can't seem to figure out how to group by business day.
Is there a function that will return the business day of the month for a date?
If someone could provide a simple example, that would be most appreciated.

Assuming your dates are in the index (if not, use set_index):
df.groupby(pd.Grouper(freq='B'))
(In older pandas versions this was pd.TimeGrouper('B').) See the time series functionality documentation.
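For illustration, a minimal sketch (assuming a DataFrame df with a DatetimeIndex and a numeric column 'value'); note that this groups on each individual business day in the index, not on the business day of the month:
import pandas as pd
import numpy as np
df = pd.DataFrame({'value': np.random.randn(100)},
                  index=pd.bdate_range(start='2018-01-01', periods=100))
# one group per business day in the index (~100 groups here, not 22)
df.groupby(pd.Grouper(freq='B'))['value'].mean()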

I think what the question is asking is to group by business day of the month - the other answer just seems to resample the data to the nearest business day (at least for me).
This code groups by business day of the month and returns the mean for each, giving roughly 22 rows:
from datetime import date
import pandas as pd
import numpy as np
d = pd.Series(np.random.randn(1000), index=pd.bdate_range(start='01 Jan 2018', periods=1000))
def to_bday_of_month(dt):
    # number of business days from the first of the month up to dt
    month_start = date(dt.year, dt.month, 1)
    return np.busday_count(month_start, dt)
day_of_month = [to_bday_of_month(dt) for dt in d.index.date]
d.groupby(day_of_month).mean()
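Applied to the original setup, a quick usage sketch (assuming a DataFrame df indexed by date with a numeric column 'value'; the column name is just for illustration):
import numpy as np
import pandas as pd
from datetime import date
df = pd.DataFrame({'value': np.random.randn(1000)},
                  index=pd.bdate_range(start='01 Jan 2018', periods=1000))
bday_of_month = [np.busday_count(date(dt.year, dt.month, 1), dt) for dt in df.index.date]
# roughly 22 rows: mean of 'value' for each business day of the month
df.groupby(bday_of_month)['value'].mean()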

Related

Changing timeseries column into a date

I have a timeseries with 2 columns, the first being hours after 1 Jan 1970. In this column, a year is only 360 days, with 12 months of 30 days. I need to convert this column into a usable date so that I can analyse the other column by month, year, etc. (e.g. 1997-Jan-1-1 being year-month-day-hour).
I need to make an array with modulo arithmetic to convert each row of the hours column into hour_of_day, day_of_month, year, etc., so that the column is instead a year, month, day and hour. But I don't know how to do this. I appreciate it might be confusing. Any help on doing this would be very helpful.
Input: 233280.5 (in hours)
Output: 1997-01-01-01 (year-month-day-hour)
You can calculate the number of whole years and the fractional remainder, then add them to the reference date, e.g.:
import pandas as pd
import numpy as np
from pandas.tseries.offsets import DateOffset
refdate = pd.Timestamp('1970-01-01')
df = pd.DataFrame({'360d_year_hours': [233280.5]})
# we calculate the number of years and fractional years as helper Series
y_frac, y = np.modf(df['360d_year_hours'] / (24*360))
# now we can calculate the new date's year:
df['datetime'] = pd.Series([refdate + DateOffset(years=int(i)) for i in y])
# we need the days in the given year to be able to use y_frac
daysinyear = np.where(df['datetime'].dt.is_leap_year, 366, 365)
# ...so we can update the datetime and round to the hour:
df['datetime'] = (df['datetime'] + pd.to_timedelta(y_frac*daysinyear, unit='d')).dt.round('h')
# df['datetime']
# 0 1997-01-01 01:00:00
# Name: datetime, dtype: datetime64[ns]

Iterating a groupby datetime over several weeks

I'm trying to group my data by a week that I predefined using to_datetime and timedelta. However, after copying my script a few times, I was hoping there was a way to iterate this process over multiple weeks. Is this something that can be done?
The data set that I'm working with lists sales revenue and spending by day for each data source and its corresponding id.
Below is what I have so far but my knowledge of loops is pretty limited due to being self-taught.
Let me know if what I'm asking is feasible or if I have to continue to copy my code every week.
Code
import pandas as pd
from datetime import datetime, timedelta,date
startdate = '2021-09-26'
enddate = pd.to_datetime(startdate) + timedelta(days=6)
last7 = (df.date >= startdate) & (df.date <= enddate)
df = df.loc[last7, ['datasource_name', 'id', 'revenue', 'spend']]
df = df.groupby(by=['datasource_name', 'id'], as_index=False).sum()
df['start_date'] = startdate
df['end_date'] = enddate
df
If I have understood your issue correctly, you are basically trying to aggregate daily data into weekly data. You can try the following code:
import datetime as dt
import pandas as pd
# Get the week-ending date for each date
df['week_end_date']=df['date'].apply(lambda x: pd.Period(x,freq='W').end_time.date().strftime('%Y-%m-%d'))
#Aggregate sales and revenue at weekly level
df_agg = df.groupby(['datasource_name','id','week_end_date']).agg({'revenue':'sum','spend':'sum'}).reset_index()
df_agg will have all your revenue and spend numbers aggregated by the week-ending date corresponding to each date.
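If you specifically want to keep the week-by-week loop structure from the question, a rough sketch (assuming df has a datetime 'date' column plus 'datasource_name', 'id', 'revenue' and 'spend' columns, and an arbitrary number of weeks) could look like this:
import pandas as pd
from datetime import timedelta
weekly_frames = []
startdate = pd.to_datetime('2021-09-26')
for week in range(4):  # however many weeks you need
    week_start = startdate + timedelta(days=7 * week)
    week_end = week_start + timedelta(days=6)
    mask = (df['date'] >= week_start) & (df['date'] <= week_end)
    agg = (df.loc[mask, ['datasource_name', 'id', 'revenue', 'spend']]
             .groupby(['datasource_name', 'id'], as_index=False).sum())
    agg['start_date'] = week_start
    agg['end_date'] = week_end
    weekly_frames.append(agg)
result = pd.concat(weekly_frames, ignore_index=True)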

How to filter a dataframe for a given time range?

I have a dataframe that updates every week and I want to drop the data older than 6 months.
For example:
I have a dataframe from January until now.
Now it's September 14 and I want to drop the old data, in this case from January until March 14.
If we are in December, it'll have the data from June until December, and so on.
Thank you
Months are an arbitrary time period, since their length changes.
Use Boolean Indexing and filter against the current date minus 182 days.
Alternatively, use relativedelta from the Python dateutil module, which can handle months directly.
from datetime import datetime
import pandas as pd
from dateutil.relativedelta import relativedelta as rd
# This line is just for creating test data
df = pd.DataFrame({'datetime': pd.date_range(start='2020-01-01', end=datetime.today(), freq='1d').to_pydatetime().tolist()})
# keep only the rows from the last 182 days
df_updated = df[df.datetime > datetime.today() - pd.Timedelta(days=182)]
# alternatively, use the relativedelta
df_updated = df[df.datetime > datetime.today() - rd(months=6)]
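One small caveat: datetime.today() includes the current time of day, so rows on the cutoff date itself may be excluded depending on when the script runs. A sketch of a midnight-based cutoff, under the same assumptions about df:
import pandas as pd
cutoff = pd.Timestamp.today().normalize() - pd.DateOffset(months=6)
df_updated = df[df.datetime > cutoff]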

Calculate Last Friday of Month in Pandas

I've written this function to get the last Thursday of the month
def last_thurs_date(date):
    month = date.dt.month
    year = date.dt.year
    cal = calendar.monthcalendar(year, month)
    last_thurs_date = cal[4][4]
    if month < 10:
        thurday_date = str(year) + '-0' + str(month) + '-' + str(last_thurs_date)
    else:
        thurday_date = str(year) + '-' + str(month) + '-' + str(last_thurs_date)
    return thurday_date
But it's not working with the lambda function.
datelist['Date'].map(lambda x: last_thurs_date(x))
Where datelist is
datelist = pd.DataFrame(pd.date_range(start = pd.to_datetime('01-01-2014',format='%d-%m-%Y')
, end = pd.to_datetime('06-03-2019',format='%d-%m-%Y'),freq='D').tolist()).rename(columns={0:'Date'})
datelist['Date']=pd.to_datetime(datelist['Date'])
Jpp already added the solution, but just to add a slightly more readable formatted string - see this awesome website.
import calendar

def last_thurs_date(date):
    year, month = date.year, date.month
    cal = calendar.monthcalendar(year, month)
    # the last (4th week -> row) thursday (4th day -> column) of the calendar
    # except when 0, then take the 3rd week (February exception)
    last_thurs_date = cal[4][4] if cal[4][4] > 0 else cal[3][4]
    return f'{year}-{month:02d}-{last_thurs_date}'
Also added a bit of logic - without it you'd get 2019-02-0, because the last calendar week of February 2019 has no value in that slot.
Scalar datetime objects don't have a dt accessor, series do: see pd.Series.dt. If you remove this, your function works fine. The key is understanding that pd.Series.apply passes scalars to your custom function via a loop, not an entire series.
def last_thurs_date(date):
    month = date.month
    year = date.year
    cal = calendar.monthcalendar(year, month)
    last_thurs_date = cal[4][4]
    if month < 10:
        thurday_date = str(year) + '-0' + str(month) + '-' + str(last_thurs_date)
    else:
        thurday_date = str(year) + '-' + str(month) + '-' + str(last_thurs_date)
    return thurday_date
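For instance, applied to the datelist built in the question, the map call then runs without the AttributeError (a sketch; the new column name is just for illustration):
import calendar
import pandas as pd
datelist = pd.DataFrame({'Date': pd.date_range('2014-01-01', '2019-03-06', freq='D')})
datelist['last_thursday'] = datelist['Date'].map(lambda x: last_thurs_date(x))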
You can rewrite your logic more succinctly via f-strings (Python 3.6+) and a ternary statement:
def last_thurs_date(date):
    month = date.month
    year = date.year
    last_thurs_date = calendar.monthcalendar(year, month)[4][4]
    return f'{year}{"-0" if month < 10 else "-"}{month}-{last_thurs_date}'
I know that a lot of time has passed since this post, but I think it's worth adding another option in case someone comes across this thread.
Even though I use pandas every day at work, in this case my suggestion would be to just use the dateutil library. The solution is a simple one-liner, without unnecessary complications.
from dateutil.rrule import rrule, MONTHLY, FR, SA
from datetime import datetime as dt
import pandas as pd
# monthly options expiration dates calculated for 2022
monthly_options = list(rrule(MONTHLY, count=12, byweekday=FR, bysetpos=3, dtstart=dt(2022,1,1)))
# last Saturday of the month
last_saturday = list(rrule(MONTHLY, count=12, byweekday=SA, bysetpos=-1, dtstart=dt(2022,1,1)))
and then of course:
pd.DataFrame({'LAST_ST': last_saturday})  # or whatever you need
This answers the question Calculate Last Friday of Month in Pandas; it can be modified to any day of the week by selecting the appropriate frequency, here freq='W-FRI'.
I think the easiest way is to create a pandas.DataFrame using pandas.date_range and specifying freq='W-FRI'.
W-FRI is Weekly Fridays.
pd.date_range(df.Date.min(), df.Date.max(), freq='W-FRI')
Creates all the Fridays in the date range between the min and max of the dates in df
Use a .groupby on year and month, and select .last(), to get the last Friday of every month for every year in the date range.
Because this method finds all the Fridays for every month in the range and then chooses .last() for each month, there's not an issue with trying to figure out which week of the month has the last Friday.
With this, use pandas: Boolean Indexing to find values in the Date column of the dataframe that are in last_fridays_in_daterange.
Use the .isin method to determine containment.
pandas: DateOffset objects
import pandas as pd
# test data: given a dataframe with a datetime column
df = pd.DataFrame({'Date': pd.date_range(start=pd.to_datetime('2014-01-01'), end=pd.to_datetime('2020-08-31'), freq='D')})
# create a dataframe with all Fridays in the daterange for min and max of df.Date
fridays = pd.DataFrame({'datetime': pd.date_range(df.Date.min(), df.Date.max(), freq='W-FRI')})
# use groupby and last, to get the last Friday of each month into a list
last_fridays_in_daterange = fridays.groupby([fridays.datetime.dt.year, fridays.datetime.dt.month]).last()['datetime'].tolist()
# find the data for the last Friday of the month
df[df.Date.isin(last_fridays_in_daterange)]

How can I change a month in a DateTime, using for loop (or better method )?

Revised question with appropriate MCVE:
As part of a script I'm writing, I need a loop that contains a different pair of dates during each iteration; these dates are the first and last available stock trading dates of each month. I have managed to find a calendar with the available dates in an index, but despite my research I am not sure how to select the correct dates from this index so that they can be used in the datetime variables start and end.
Here is as far as my research has got me and I will continue to search for and build my own solution which I will post if I manage to find one:
from __future__ import division
import numpy as np
import pandas as pd
import datetime
import pandas_market_calendars as mcal
from pandas_datareader import data as web
from datetime import date
'''
Full date range:
'''
startrange = datetime.date(2016, 1, 1)
endrange = datetime.date(2016, 12, 31)
'''
Tradable dates in the year:
'''
nyse = mcal.get_calendar('NYSE')
available = nyse.valid_days(start_date='2016-01-01', end_date='2016-12-31')
'''
The loop that needs to take first and last trading date of each month:
'''
dict1 = {}
for i in available:
    start = datetime.date('''first available trade day of the month''')
    end = datetime.date('''last available trade day of the month''')
    diffdays = ((end - start).days) / 365
    dict1[i] = diffdays
print(dict1)
That is probably because 1 January 2016 was not a trading day. To check if I am right, try giving it the date 4 January 2016, which was the following Monday. If that works, then you will have to be more sophisticated about the dates you ask for.
Look in the documentation for dm.BbgDataManager(). It is possible that you can ask it what dates are available.
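For the loop itself, one possible sketch (building on the nyse calendar from the question; exact behaviour may vary by pandas_market_calendars version) is to group the valid trading days by calendar month and take the first and last day of each group:
import pandas as pd
import pandas_market_calendars as mcal
nyse = mcal.get_calendar('NYSE')
available = nyse.valid_days(start_date='2016-01-01', end_date='2016-12-31')
# one group per calendar month of valid trading days
days = pd.Series(available, index=available)
dict1 = {}
for month, month_days in days.groupby(days.index.to_period('M')):
    start = month_days.min()  # first available trade day of the month
    end = month_days.max()    # last available trade day of the month
    dict1[str(month)] = (end - start).days / 365
print(dict1)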
