I've written this function to get the last Thursday of the month:
import calendar
def last_thurs_date(date):
    month = date.dt.month
    year = date.dt.year
    cal = calendar.monthcalendar(year, month)
    last_thurs_date = cal[4][4]
    if month < 10:
        thurday_date = str(year) + '-0' + str(month) + '-' + str(last_thurs_date)
    else:
        thurday_date = str(year) + '-' + str(month) + '-' + str(last_thurs_date)
    return thurday_date
But it's not working when I map it over the column with a lambda:
datelist['Date'].map(lambda x: last_thurs_date(x))
where datelist is:
import pandas as pd
datelist = pd.DataFrame(
    pd.date_range(start=pd.to_datetime('01-01-2014', format='%d-%m-%Y'),
                  end=pd.to_datetime('06-03-2019', format='%d-%m-%Y'),
                  freq='D').tolist()
).rename(columns={0: 'Date'})
datelist['Date'] = pd.to_datetime(datelist['Date'])
Jpp has already posted the solution, but this adds a slightly more readable format string (see this awesome website).
import calendar
def last_thurs_date(date):
    year, month = date.year, date.month
    cal = calendar.monthcalendar(year, month)
    # cal[4][4] is the 5th calendar row (week) and 5th column (weekday);
    # note: with calendar's default Monday-first weeks, column 4 is Friday (Thursday is column 3)
    # when that cell is 0 (e.g. February 2019), fall back to the 4th row, cal[3][4]
    last_thurs_date = cal[4][4] if cal[4][4] > 0 else cal[3][4]
    return f'{year}-{month:02d}-{last_thurs_date}'
I also added a fallback: your version returned 2019-02-0, because in February 2019 the fifth calendar row has no date in that column, so cal[4][4] is 0.
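One more caveat on the indexing, offered as a sketch rather than a drop-in fix: calendar.monthcalendar() builds weeks starting on Monday by default, so column 3 is Thursday and column 4 (the one used above) is Friday. A month can also have only four rows (February 2021 starts on a Monday), and then cal[4] raises IndexError. The hypothetical helper last_weekday_of_month below instead takes the largest non-zero value in the weekday's column, using an explicitly Monday-first calendar.Calendar so the layout does not depend on calendar.setfirstweekday().

```python
import calendar
import datetime

def last_weekday_of_month(year: int, month: int, weekday: int = calendar.THURSDAY) -> datetime.date:
    # Explicit Monday-first layout: column 0 = Monday ... 3 = Thursday ... 6 = Sunday
    weeks = calendar.Calendar(firstweekday=calendar.MONDAY).monthdayscalendar(year, month)
    # Days outside the month are 0, so the largest value in the column is the last one
    day = max(week[weekday] for week in weeks)
    return datetime.date(year, month, day)

print(last_weekday_of_month(2019, 2))  # 2019-02-28
print(last_weekday_of_month(2021, 2))  # 2021-02-25 (a four-row month)
```

For a column of timestamps you could call it per element, e.g. datelist['Date'].map(lambda x: last_weekday_of_month(x.year, x.month)), which returns datetime.date objects rather than strings.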
Scalar datetime objects (pd.Timestamp) don't have a dt accessor; Series do (see pd.Series.dt). If you remove .dt, your function works fine. The key is understanding that pd.Series.map, like pd.Series.apply, passes scalars to your custom function one at a time via a loop, not the entire series.
import calendar
def last_thurs_date(date):
    month = date.month
    year = date.year
    cal = calendar.monthcalendar(year, month)
    last_thurs_date = cal[4][4]
    if month < 10:
        thurday_date = str(year) + '-0' + str(month) + '-' + str(last_thurs_date)
    else:
        thurday_date = str(year) + '-' + str(month) + '-' + str(last_thurs_date)
    return thurday_date
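To see the scalar-versus-Series distinction directly, here is a small check on made-up sample data (the Series s is only for illustration): every element that Series.map hands to the function is a pd.Timestamp, which has .month but no .dt, while the Series itself has .dt.

```python
import pandas as pd

s = pd.Series(pd.date_range('2019-02-01', periods=2, freq='D'))
# Series.map calls the function once per element, passing a scalar pd.Timestamp
kinds = s.map(lambda x: type(x).__name__)
has_dt = s.map(lambda x: hasattr(x, 'dt'))
months = s.map(lambda x: x.month)  # scalars expose .month directly
# the .dt accessor exists on the Series, not on its elements
series_months = s.dt.month
print(kinds.tolist(), has_dt.tolist(), months.tolist(), series_months.tolist())
```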
You can rewrite your logic more succinctly via f-strings (Python 3.6+) and a conditional (ternary) expression:
import calendar
def last_thurs_date(date):
    month = date.month
    year = date.year
    last_thurs_date = calendar.monthcalendar(year, month)[4][4]
    return f'{year}{"-0" if month < 10 else "-"}{month}-{last_thurs_date}'
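If you don't need a per-element function at all, here is a vectorized sketch (my own alternative, not from the answers above; the name last_thursday is made up). It rolls each date to its month end with pd.offsets.MonthEnd(0) and steps back to the nearest Thursday using dayofweek (Monday=0, Thursday=3). It returns timestamps rather than strings; chain .dt.strftime('%Y-%m-%d') if you need the original string format.

```python
import pandas as pd

def last_thursday(dates: pd.Series) -> pd.Series:
    # roll each date forward to the last day of its month (MonthEnd(0) leaves month ends unchanged)
    month_end = dates + pd.offsets.MonthEnd(0)
    # dayofweek: Monday=0 ... Thursday=3; number of days to step back to reach a Thursday
    days_back = (month_end.dt.dayofweek - 3) % 7
    return month_end - pd.to_timedelta(days_back, unit='D')

dates = pd.Series(pd.to_datetime(['2014-01-15', '2019-02-01', '2021-02-28']))
print(last_thursday(dates).dt.strftime('%Y-%m-%d').tolist())  # ['2014-01-30', '2019-02-28', '2021-02-25']
```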
I know a lot of time has passed since this question was posted, but another option may be worth adding for anyone who comes across this thread.
Even though I use pandas every day at work, in this case I would suggest just using the dateutil library. Each date sequence is a simple one-liner, without unnecessary combinations.
from dateutil.rrule import rrule, MONTHLY, FR, SA
from datetime import datetime as dt
import pandas as pd
# monthly options expiration dates for 2022 (third Friday of each month)
monthly_options = list(rrule(MONTHLY, count=12, byweekday=FR, bysetpos=3, dtstart=dt(2022, 1, 1)))
# last Saturday of each month
last_saturdays = list(rrule(MONTHLY, count=12, byweekday=SA, bysetpos=-1, dtstart=dt(2022, 1, 1)))
and then of course:
pd.DataFrame({'LAST_ST':last_saturdays}) #or whatever you need
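For the original question, the same rrule pattern with byweekday=TH and bysetpos=-1 should give the last Thursday of each month. A minimal sketch for the first three months of 2019 (dates chosen only for illustration):

```python
from datetime import datetime
from dateutil.rrule import rrule, MONTHLY, TH

# for each month, collect its Thursdays and keep only the last one (bysetpos=-1)
last_thursdays = list(rrule(MONTHLY, count=3, byweekday=TH, bysetpos=-1, dtstart=datetime(2019, 1, 1)))
print([d.strftime('%Y-%m-%d') for d in last_thursdays])  # ['2019-01-31', '2019-02-28', '2019-03-28']
```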
The following answer to Calculate Last Friday of Month in Pandas covers this.
It can be adapted by selecting the appropriate day of the week; here it uses freq='W-FRI' (freq='W-THU' would select Thursdays).
I think the easiest way is to create a pandas.DataFrame using pandas.date_range and specifying freq='W-FRI'.
W-FRI is a weekly frequency anchored on Fridays.
pd.date_range(df.Date.min(), df.Date.max(), freq='W-FRI')
This creates all the Fridays in the date range between the min and max of the dates in df.
Use a .groupby on year and month, and select .last(), to get the last Friday of every month for every year in the date range.
Because this method finds all the Fridays for every month in the range and then chooses .last() for each month, there's not an issue with trying to figure out which week of the month has the last Friday.
With this, use pandas: Boolean Indexing to find values in the Date column of the dataframe that are in last_fridays_in_daterange.
Use the .isin method to determine containment.
pandas: DateOffset objects
import pandas as pd
# test data: a dataframe with a datetime column
df = pd.DataFrame({'Date': pd.date_range(start=pd.to_datetime('2014-01-01'), end=pd.to_datetime('2020-08-31'), freq='D')})
# create a dataframe with all Fridays in the date range between the min and max of df.Date
fridays = pd.DataFrame({'datetime': pd.date_range(df.Date.min(), df.Date.max(), freq='W-FRI')})
# use groupby and last to get the last Friday of each month into a list
last_fridays_in_daterange = fridays.groupby([fridays.datetime.dt.year, fridays.datetime.dt.month]).last()['datetime'].tolist()
# select the rows whose date is the last Friday of the month
df[df.Date.isin(last_fridays_in_daterange)]
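Adapted to the original question, here is a sketch with freq='W-THU' on made-up sample data. It groups on .dt.to_period('M') instead of a year/month pair, which should be equivalent. One caveat applies to this approach in general: if the data end before a month's last Thursday, that month's "last" Thursday is only the last one inside the range.

```python
import pandas as pd

# made-up sample data: one row per day
df = pd.DataFrame({'Date': pd.date_range('2019-01-01', '2019-03-31', freq='D')})
# every Thursday between the first and last date ('W-THU' = weekly, anchored on Thursday)
thursdays = pd.Series(pd.date_range(df.Date.min(), df.Date.max(), freq='W-THU'))
# latest Thursday within each calendar month
last_thursdays = thursdays.groupby(thursdays.dt.to_period('M')).max()
result = df[df.Date.isin(last_thursdays)]
print(result)
```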
I'm trying to build a list of "pay days" for a given month in the future knowing only when the pay days started months ago. For example:
Starting date - When the paychecks started: 1/6/2023
Frequency is every two weeks
So if I want to know which dates are pay days in March, I have to start at 1/6/2023 and add two weeks until I reach March, which tells me that the first pay day in March is 3/3/2023.
Then I want my final list of dates to be only those March dates of:
(3/3/2023, 3/17/2023, 3/31/2023)
I know I can use pandas to do something like:
pd.date_range(starting_date, starting_date+relativedelta(months=1), freq='14d')
but it would include every date back to 1/6/2023.
The easiest thing to do here would be to just update the starting_date parameter to be the first pay day in the month you're interested in.
To do this, you can use this function, which finds the first pay day in a given month from the difference between your start date and the first day of the desired month.
import datetime
# month is the number of the month (1-12)
# note: the defaults are evaluated once, when the function is defined
def get_first_pay_day_in_month(month=datetime.datetime.now().month,
                               year=datetime.datetime.now().year,
                               start_date=datetime.datetime(2023, 1, 6),
                               ):
    diff = datetime.datetime(year, month, 1) - start_date
    freq = 14  # days between pay days
    if diff.days % freq == 0:
        # the 1st of the month is itself a pay day
        print(f'Difference: {diff.days // 7} weeks')
        return datetime.datetime(year, month, 1)
    else:
        print(f'Difference: {diff.days} days')
        print(f'Days: {diff.days % freq} extra')
        # move forward to the next pay day after the 1st of the month
        return datetime.datetime(year, month, 1 + freq - (diff.days % freq))
Then you can use this function to get the first pay day of a specific month and plug it into the date_range method.
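import pandas as pd
from dateutil import relativedelta
starting_date = get_first_pay_day_in_month(month=3, year=2023)
pay_days = pd.date_range(starting_date, starting_date + relativedelta.relativedelta(months=1), freq='14d')
# keep only dates in the requested month (the range can spill into the next month)
pay_days = pay_days[pay_days.month == starting_date.month]
print(pay_days)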
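Alternatively, as a sketch that skips the offset arithmetic: generate every pay day from the original start date up to the end of the target month, then keep only the dates inside that month. pay_days_in_month is a hypothetical helper name; the 14-day frequency and the 1/6/2023 start come from the question.

```python
import pandas as pd

def pay_days_in_month(start, year, month, freq='14D'):
    month_start = pd.Timestamp(year=year, month=month, day=1)
    # MonthEnd(0) rolls the first of the month to the last day of the same month
    month_end = month_start + pd.offsets.MonthEnd(0)
    # every pay day from the first paycheck up to the end of the target month
    all_pay_days = pd.date_range(start, month_end, freq=freq)
    # keep only the ones that fall inside the target month
    return all_pay_days[all_pay_days >= month_start]

print(pay_days_in_month('2023-01-06', 2023, 3))
```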
I'm trying to group an xarray.Dataset object into a custom four-month period spanning October-January with an annual frequency. This is complicated because the period crosses New Year.
I've been trying this approach:
wb_start = temperature.sel(time=temperature.time.dt.month.isin([10,11,12,1]))
wb_start1 = wb_start.groupby('time.year')
But, predictably, this groups each January with October-December of the same calendar year, instead of with the October-December that precede it. Any help would be appreciated!
I fixed this in a somewhat clunky but effective way by adding a year to the months before January. My method moves the October, November and December data forward one year while leaving the January data in place, and then groups by the year of the relabeled time coordinate.
import pandas as pd
from dateutil.relativedelta import relativedelta
wb_start = temperature.sel(time=temperature.time.dt.month.isin([10, 11, 12, 1]))
# convert cftime to datetime
datetimeindex = wb_start.indexes['time'].to_datetimeindex()
wb_start['time'] = pd.to_datetime(datetimeindex)
# extract the year of each time step (its 'time' coordinate is reused below)
custom_year = wb_start['time'].dt.year
# convert time values to pd.Timestamp
time1 = [pd.Timestamp(i) for i in custom_year['time'].values]
# add a year to Timestamps in October-December (relativedelta does not work with np.datetime64)
time2 = [i + relativedelta(years=1) if i.month >= 10 else i for i in time1]
wb_start['time'] = time2
# group by year using the new time index
wb_start1 = wb_start.groupby('time.year')
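A variant that leaves the time coordinate untouched is to compute a separate "season year" label (year + 1 for October-December) and group by that label. The sketch below uses pandas with made-up daily data so it runs on its own; season_year is a hypothetical name. In xarray the same label could presumably be built from temperature.time.dt.year and .dt.month and passed, with a name (e.g. via .rename()), to groupby, but that part is an untested assumption.

```python
import pandas as pd

def season_year(times: pd.DatetimeIndex) -> pd.Index:
    # October-December count toward the following year's season, so each
    # January is grouped with the October-December that precede it
    return times.year + (times.month >= 10).astype(int)

# made-up daily data spanning two October-January seasons
times = pd.date_range('2000-10-01', '2002-01-31', freq='D')
temps = pd.Series(range(len(times)), index=times, dtype=float)
# keep October-January only, as in the question
sel = temps[temps.index.month.isin([10, 11, 12, 1])]
# group by the season label without modifying the time index itself
counts = sel.groupby(season_year(sel.index)).size()
print(counts)
```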
# first and last day of every month
s_january, e_january = ("1/1/2017"), ("1/31/2017")
s_february, e_february = ("2/1/2017"), ("2/28/2017")
s_march, e_march = ("3/1/2017"), ("3/31/2017")
s_april, e_april = ("4/1/2017"), ("4/30/2017")
s_may, e_may = ("5/1/2017"), ("5/31/2017")
s_june, e_june = ("6/1/2017"), ("6/30/2017")
s_july, e_july = ("7/1/2017"), ("7/31/2017")
s_august, e_august = ("8/1/2017"), ("8/31/2017")
s_september, e_september = ("9/1/2017"), ("9/30/2017")
s_october, e_october = ("10/1/2017"), ("10/31/2017")
s_november, e_november = ("11/1/2017"), ("11/30/2017")
s_december, e_december = ("12/1/2017"), ("12/31/2017")
def foo(s_date, e_date):
    pass  # does stuff
foo(s_january, e_january)
foo(s_february, e_february)
foo(s_march, e_march)
foo(s_april, e_april)
foo(s_may, e_may)
foo(s_june, e_june)
foo(s_july, e_july)
foo(s_august, e_august)
foo(s_september, e_september)
foo(s_october, e_october)
foo(s_november, e_november)
foo(s_december, e_december)
I have a function that does stuff for a given date range, but I have to call it once for every month; if I pass the whole year as the range, I don't get the result I want.
Is there any better way to avoid running it 12 times?
Set up your dates in a dictionary rather than 24 variables, and make life easier for yourself by computing the first and last day of each month. It would also be useful to represent your dates as datetimes rather than strings, since it's clear from your question header that you want to do computation on them.
import datetime
from dateutil import relativedelta
year = 2017
dates = {}
for month in range(1, 13):
    dates[(year, month)] = (
        datetime.date(year, month, 1),
        datetime.date(year, month, 1)
        + relativedelta.relativedelta(months=1)
        - relativedelta.relativedelta(days=1))
The first element in each tuple is computed straightforwardly as the first day of the month. The second date is the same date, but with one month added (first day of the next month) and then one day subtracted, to get the last day of the month.
Then you can do:
for (year, month), (start, end) in dates.items():
    print(year, month, foo(start, end))
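If pandas is available anyway, the month boundaries can also be generated without building them by hand. A sketch, where foo is only a placeholder for the asker's function (here it just counts the days in each range):

```python
import pandas as pd

year = 2017
# 'MS' = month start; MonthEnd(0) rolls each start date to the end of its own month
starts = pd.date_range(f'{year}-01-01', periods=12, freq='MS')
ends = starts + pd.offsets.MonthEnd(0)
# pairs of datetime.date objects: (first day, last day) for each month
month_ranges = list(zip(starts.date, ends.date))

def foo(s_date, e_date):
    # placeholder for the real work: count the days in the range
    return (e_date - s_date).days + 1

days_per_month = [foo(start, end) for start, end in month_ranges]
print(days_per_month)  # [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
```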
You could use a dictionary to keep all start and end dates:
import calendar
import datetime as dt
def foo(s_date, e_date):
    print("Doing something between {} and {}".format(s_date.strftime('%d/%m/%Y'), e_date.strftime('%d/%m/%Y')))
def getMonths(year):
    result = {}
    for month in range(1, 13):
        lastDayOfMonth = calendar.monthrange(year, month)[1]
        result[month] = (dt.datetime(year, month, 1), dt.datetime(year, month, lastDayOfMonth))
    return result
for month, start_end_dates in getMonths(2018).items():
    foo(*start_end_dates)
Prints:
Doing something between 01/01/2018 and 31/01/2018
Doing something between 01/02/2018 and 28/02/2018
Doing something between 01/03/2018 and 31/03/2018
...
What do you mean by putting the range for year?
You could consider putting your dates into a dictionary or nested lists.
If I want to add a loop over days as well, what is the easiest way to do it, considering different month lengths, leap years, etc.?
This is the script with years and months:
import calendar
yearStart = 2010
yearEnd = 2017
monthStart = 1
monthEnd = 12
for year in range(yearStart, yearEnd + 1):
    for month in range(monthStart, monthEnd + 1):
        startDate = '%04d%02d%02d' % (year, month, 1)
        numberOfDays = calendar.monthrange(year, month)[1]
        lastDate = '%04d%02d%02d' % (year, month, numberOfDays)
If you want only the days then this code, using the pendulum library, is probably the easiest.
>>> import pendulum
>>> first_date = pendulum.Pendulum(2010, 1, 1)
>>> end_date = pendulum.Pendulum(2018, 1, 1)
>>> for day in pendulum.period(first_date, end_date).range('days'):
... print (day)
... break
...
2010-01-01T00:00:00+00:00
pendulum has many other nice features. For one thing, it's a drop-in replacement for datetime. Therefore, many of the properties and methods that you are familiar with using for that class will also be available to you.
You may want to use datetime in addition to the calendar library. I am not exactly sure of the requirements, but it appears you want the first and last date of a given month and year, and then to loop through the dates between them. The following function gives you the first and last day of a month; you can then loop between those two dates however you want.
import datetime
import calendar
def get_first_last_day(month, year):
    date = datetime.datetime(year=year, month=month, day=1)
    first_day = date.replace(day=1)
    last_day = date.replace(day=calendar.monthrange(date.year, date.month)[1])
    return first_day, last_day
Here is the logic for looping through the dates between the two as well:
# hypothetical example month: February 2020
first_day, last_day = get_first_last_day(2, 2020)
d = first_day
delta = datetime.timedelta(days=1)
while d <= last_day:
    print(d.strftime("%Y-%m-%d"))
    d += delta
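For the follow-up about looping over days, one way (a sketch; iter_days is a made-up name) is to add an innermost loop to the yearStart/yearEnd script, taking each month's length from calendar.monthrange(), which already accounts for leap years:

```python
import calendar
import datetime

def iter_days(year_start, year_end):
    """Yield every date from January 1 of year_start to December 31 of year_end."""
    for year in range(year_start, year_end + 1):
        for month in range(1, 13):
            # monthrange() returns (weekday of the 1st, number of days in the month)
            _, number_of_days = calendar.monthrange(year, month)
            for day in range(1, number_of_days + 1):
                yield datetime.date(year, month, day)

days = list(iter_days(2016, 2017))
print(days[0], days[-1], len(days))  # 2016-01-01 2017-12-31 731
```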
I have a string that is the full year followed by the ISO week of the year (so some years have 53 weeks, because ISO week 1 is the week containing the year's first Thursday). I want to convert it to a datetime object using pandas.to_datetime(). So I do:
pandas.to_datetime('201145', format='%Y%W')
and it returns:
Timestamp('2011-01-01 00:00:00')
which is not right. Or if I try:
pandas.to_datetime('201145', format='%Y%V')
it tells me that %V is a bad directive.
What am I doing wrong?
I think that the following question would be useful to you: Reversing date.isocalendar()
Using the functions provided in that question this is how I would proceed:
import datetime
import pandas as pd
def iso_year_start(iso_year):
    "The gregorian calendar date of the first day of the given ISO year"
    fourth_jan = datetime.date(iso_year, 1, 4)
    delta = datetime.timedelta(fourth_jan.isoweekday() - 1)
    return fourth_jan - delta
def iso_to_gregorian(iso_year, iso_week, iso_day):
    "Gregorian calendar date for the given ISO year, week and day"
    year_start = iso_year_start(iso_year)
    return year_start + datetime.timedelta(days=iso_day - 1, weeks=iso_week - 1)
def time_stamp(yourString):
    # split 'YYYYWW' into ISO year and week; use Monday (ISO day 1) as the day
    year = int(yourString[0:4])
    week = int(yourString[-2:])
    day = 1
    return year, week, day
yourTimeStamp = iso_to_gregorian(*time_stamp('201145'))
print(yourTimeStamp)
Then run that function for your values and append the results to the dataframe as datetime objects.
The result I got from your specified string was:
2011-11-07
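On the original problem: Python's strptime documentation notes that %U and %W are only used in calculations when the day of the week and the year are specified, which would explain getting January 1 back. On Python 3.6+, the ISO directives %G (ISO year), %V (ISO week) and %u (ISO weekday, 1 = Monday) work with the standard library's datetime.strptime once a weekday is appended. A sketch (iso_week_to_date is a made-up name):

```python
import datetime

def iso_week_to_date(year_week: str, iso_day: int = 1) -> datetime.date:
    # year_week is 'YYYYWW' (ISO year + ISO week); append the ISO weekday (1 = Monday)
    return datetime.datetime.strptime(f'{year_week}{iso_day}', '%G%V%u').date()

print(iso_week_to_date('201145'))  # 2011-11-07
# Python 3.8+ also has a direct constructor for the same conversion
print(datetime.date.fromisocalendar(2011, 45, 1))  # 2011-11-07
```

For a DataFrame column of such strings, df['col'].map(iso_week_to_date) applies it element by element (df and 'col' are placeholders).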