I'm trying to specify business days in a foreign country, but I can't get the pandas function pd.bdate_range() to recognize holidays. My code is as follows:
import pandas as pd
import datetime
weekmask = "Mon Tue Wed Thu Fri"
holidays = [datetime.datetime(2017, 1, 9), datetime.datetime(2017, 3, 20),
            datetime.datetime(2017, 4, 13)]
BdaysCol2017 = pd.bdate_range(start=datetime.datetime(2017, 1, 1),
                              end=datetime.datetime(2017, 12, 31),
                              weekmask=weekmask,
                              holidays=holidays)
But I get the following error on the holidays parameter:
ValueError: a custom frequency string is required when holidays or weekmask are passed, got frequency B
Why is this? How can I specify custom holidays? Is there a better way to do this?
Thank you
As the docs say about weekmask and holidays:
only used when custom frequency strings are passed
so you need to pass a custom frequency such as freq='C' (custom business day):
BdaysCol2017 = pd.bdate_range(start=datetime.datetime(2017, 1, 1),
                              end=datetime.datetime(2017, 12, 31),
                              freq='C',
                              weekmask=weekmask,
                              holidays=holidays)
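Alternatively, you can build the custom business-day offset explicitly and pass it to pd.date_range. This is a sketch assuming a pandas version that provides pd.offsets.CustomBusinessDay (the offset that freq='C' refers to); the weekmask and holiday list are the ones from the question.

```python
import datetime

import pandas as pd

weekmask = "Mon Tue Wed Thu Fri"
holidays = [datetime.datetime(2017, 1, 9), datetime.datetime(2017, 3, 20),
            datetime.datetime(2017, 4, 13)]

# Explicit custom business-day offset: weekmask sets the working weekdays,
# holidays lists extra dates to skip
cbd = pd.offsets.CustomBusinessDay(weekmask=weekmask, holidays=holidays)

# Start/end falling on non-business days are rolled to the nearest business day
BdaysCol2017 = pd.date_range(start="2017-01-01", end="2017-12-31", freq=cbd)
print(len(BdaysCol2017))
```

Keeping the offset in a variable lets you reuse it elsewhere, e.g. `pd.Timestamp("2017-01-06") + cbd` to step one business day.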
I would do it like this:
import pandas as pd
from datetime import datetime
# Sunday-to-Thursday working week
weekmask = 'Sun Mon Tue Wed Thu'
exclude = [datetime(2017, 3, 20),
           datetime(2017, 4, 13),
           datetime(2017, 5, 3)]
pd.bdate_range('2017/1/1', '2017/12/31',
               freq='C',
               weekmask=weekmask,
               holidays=exclude)
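If the same holidays recur every year, another option is to describe them once in a holiday calendar and attach it to the offset. A minimal sketch using the pandas.tseries.holiday module; the class name ExampleCalendar and its two fixed-date rules are hypothetical placeholders (real holidays often need observance rules or moving dates).

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday
from pandas.tseries.offsets import CustomBusinessDay


class ExampleCalendar(AbstractHolidayCalendar):
    # Hypothetical fixed-date holidays that repeat every year
    rules = [
        Holiday("Example holiday A", month=3, day=20),
        Holiday("Example holiday B", month=5, day=3),
    ]


# Sunday-to-Thursday working week, skipping the calendar's holidays
bday = CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu", calendar=ExampleCalendar())
days_2017 = pd.date_range("2017-01-01", "2017-12-31", freq=bday)
days_2018 = pd.date_range("2018-01-01", "2018-12-31", freq=bday)
```

The same offset then works for any year without listing the dates again.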
I have data stored in an S3 bucket that uses a "yyyy/MM/dd" layout to store the files per date, as in this sample S3a path: s3a://mybucket/data/2018/07/03. The files are in json.gz format, and I would like to load each day's files into a Spark DataFrame. After that I want to pass each of these DataFrames to some existing code via a for loop:
for date in date_range:
    s3a = 's3a://mybucket/data/{}/{}/{}/*.json.gz'.format(date.year, date.month, date.day)
    df = spark.read.format('json').option("header", "true").load(s3a)
    # Execute code here
In order to read the data, I tried to format the date_range like below:
from datetime import datetime
import pandas as pd
def return_date_range(start_date, end_date):
    return pd.date_range(start=start_date, end=end_date).to_pydatetime().tolist()
date_range = return_date_range(start_date='2018-03-06', end_date='2018-03-12')
date_range
[datetime.datetime(2018, 3, 6, 0, 0),
datetime.datetime(2018, 3, 7, 0, 0),
datetime.datetime(2018, 3, 8, 0, 0),
datetime.datetime(2018, 3, 9, 0, 0),
datetime.datetime(2018, 3, 10, 0, 0),
datetime.datetime(2018, 3, 11, 0, 0),
datetime.datetime(2018, 3, 12, 0, 0)]
The problem is that to_pydatetime() returns the days and months without a leading '0'. How do I make sure that my code returns a list of values with leading zeros, like below:
[datetime.datetime(2018, 03, 06, 0, 0),
datetime.datetime(2018, 03, 07, 0, 0),
datetime.datetime(2018, 03, 08, 0, 0),
datetime.datetime(2018, 03, 09, 0, 0),
datetime.datetime(2018, 03, 10, 0, 0),
datetime.datetime(2018, 03, 11, 0, 0),
datetime.datetime(2018, 03, 12, 0, 0)]
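A datetime object stores the month and day as integers, so zero-padding only exists once you format it as a string. This is one approach using .strftime("%Y/%m/%d"), where %m and %d are always zero-padded.
Example: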
from datetime import datetime
import pandas as pd
def return_date_range(start_date, end_date):
    return pd.date_range(start=start_date, end=end_date).strftime("%Y/%m/%d").tolist()
date_range = return_date_range(start_date='2018-03-06', end_date='2018-03-12')
print(date_range)
Output:
['2018/03/06',
'2018/03/07',
'2018/03/08',
'2018/03/09',
'2018/03/10',
'2018/03/11',
'2018/03/12']
for date in date_range:
    s3a = 's3a://mybucket/data/{}/*.json.gz'.format(date)
    print(s3a)
s3a://mybucket/data/2018/03/06/*.json.gz
s3a://mybucket/data/2018/03/07/*.json.gz
s3a://mybucket/data/2018/03/08/*.json.gz
s3a://mybucket/data/2018/03/09/*.json.gz
s3a://mybucket/data/2018/03/10/*.json.gz
s3a://mybucket/data/2018/03/11/*.json.gz
s3a://mybucket/data/2018/03/12/*.json.gz
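If you don't need pandas for this step, the standard library can produce the same zero-padded paths, because date objects accept strftime codes as an f-string format spec. A minimal sketch; the helper name s3_paths is hypothetical and the bucket prefix is the example path from the question.

```python
from datetime import date, timedelta


def s3_paths(start, end, prefix="s3a://mybucket/data"):
    """Return one glob path per day from start to end, inclusive."""
    days = (end - start).days + 1
    # %Y/%m/%d zero-pads month and day, e.g. 2018/03/06
    return [f"{prefix}/{start + timedelta(days=i):%Y/%m/%d}/*.json.gz"
            for i in range(days)]


for path in s3_paths(date(2018, 3, 6), date(2018, 3, 12)):
    print(path)
```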
I want to write a Python program that lists 7 consecutive days from last month. For example, if today is 2019-5-9, the output should be the dates from 2019-4-8 back to 2019-4-2. I used the datetime module and wrote the program below, but I want the output as plain dates like 2019-4-8 rather than in datetime.date() format. Can you please tell me how to do this? Suggestions for simpler or more efficient code are also welcome.
from datetime import date, timedelta
current_date = date.today()
current_year = date.today().year
current_day = date.today().day - 1
month_before = date.today().month - 1
date_before = current_date.replace(current_year, month_before, current_day)
month_list = [date_before]
print(month_list)
for i in range(1, 7):
    month_list.append(date_before - timedelta(days=i))
print(month_list)
The output is:
[datetime.date(2019, 4, 8)]
[datetime.date(2019, 4, 8), datetime.date(2019, 4, 7), datetime.date(2019, 4, 6), datetime.date(2019, 4, 5), datetime.date(2019, 4, 4), datetime.date(2019, 4, 3), datetime.date(2019, 4, 2)]
You can format the dates using strftime:
from datetime import date, timedelta
current_date = date.today()
current_year = current_date.year
current_day = current_date.day - 1
month_before = current_date.month - 1
# Previous month, previous day; raises ValueError in January, on the 1st,
# or when that day does not exist in the previous month
date_before = current_date.replace(current_year, month_before, current_day)
# Format each of the 7 dates as a zero-padded 'YYYY-MM-DD' string
month_list = [(date_before - timedelta(days=i)).strftime("%Y-%m-%d") for i in range(7)]
print(month_list)
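The replace() call above fails in January, on the 1st of a month, and when the previous month is shorter (e.g. on March 30). A sketch that avoids this by subtracting a dateutil relativedelta instead; the function name last_month_week and its optional today parameter are hypothetical, added so the function can be tested with a fixed date.

```python
from datetime import date, timedelta

from dateutil.relativedelta import relativedelta


def last_month_week(today=None):
    """Seven consecutive dates, newest first, ending the day before 'one month ago'."""
    today = today or date.today()
    # relativedelta rolls back the year in January and clips to the month end,
    # e.g. 2019-03-31 minus one month is 2019-02-28
    anchor = today - relativedelta(months=1) - timedelta(days=1)
    return [(anchor - timedelta(days=i)).strftime("%Y-%m-%d") for i in range(7)]


print(last_month_week(date(2019, 5, 9)))
```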
OR you can use pandas date_range and DateOffset.
import pandas as pd
# One month and seven days before today; pd.Timestamp replaces the removed pd.datetime
start = pd.Timestamp.today() - pd.DateOffset(months=1) - pd.DateOffset(days=7)
date_list = sorted(pd.date_range(start, periods=7).tolist(), reverse=True)
print([dt.strftime("%Y-%m-%d") for dt in date_list])
Result:
['2019-04-08', '2019-04-07', '2019-04-06', '2019-04-05', '2019-04-04', '2019-04-03', '2019-04-02']
I'm using rrule from python dateutil and don't know how to create an rruleset for the following example:
Every Monday for three weeks in a row, then skip one week, then three weeks in a row again, skip one, and so on.
Any advice on creating an rrule(set) for this?
One way to do this is to use an rruleset with a WEEKLY rrule and a corresponding exrule for every 4th week. Start from a Monday so that the weekly rule produces Mondays:
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
from dateutil.rrule import rrule, rruleset, WEEKLY
dtstart = datetime(2011, 1, 3)  # a Monday
rrset = rruleset()
# One occurrence per week, on the same weekday as dtstart
weekly_rule = rrule(freq=WEEKLY, dtstart=dtstart)
# The 4th week of each 4-week cycle: 3 weeks after dtstart, then every 4 weeks
every_4_weeks = rrule(freq=WEEKLY, interval=4,
                      dtstart=dtstart + relativedelta(weeks=3))
rrset.rrule(weekly_rule)
rrset.exrule(every_4_weeks)
# inc=True keeps dtstart itself in the result
rrset.between(dtstart, dtstart + timedelta(days=65), inc=True)
The result:
[datetime.datetime(2011, 1, 3, 0, 0),
 datetime.datetime(2011, 1, 10, 0, 0),
 datetime.datetime(2011, 1, 17, 0, 0),
 datetime.datetime(2011, 1, 31, 0, 0),
 datetime.datetime(2011, 2, 7, 0, 0),
 datetime.datetime(2011, 2, 14, 0, 0),
 datetime.datetime(2011, 2, 28, 0, 0),
 datetime.datetime(2011, 3, 7, 0, 0)]
The way it works is that weekly_rule generates one date per week, and every_4_weeks generates every 4th week, starting with the 4th occurrence (three weeks after dtstart). Excluding those dates gives you a 3-on, 1-off schedule: Jan 3, 10 and 17 on, Jan 24 off, Jan 31, Feb 7 and 14 on, Feb 21 off, and so on.
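If you don't need the rest of dateutil, a plain generator can express the same 3-on, 1-off cycle with modular arithmetic on the week number. This is an alternative sketch, not part of dateutil's API; the generator name three_on_one_off is hypothetical, and start is assumed to already be a Monday.

```python
from datetime import date, timedelta
from itertools import islice


def three_on_one_off(start):
    """Yield start and every following week, skipping each 4th week."""
    week = 0
    while True:
        # Weeks 0, 1, 2 of each 4-week cycle are "on"; week 3 is skipped
        if week % 4 != 3:
            yield start + timedelta(weeks=week)
        week += 1


# The generator is infinite, so take only the first six dates
mondays = list(islice(three_on_one_off(date(2011, 1, 3)), 6))
print(mondays)
```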
I am trying to generate a list of months between two dates.
For example:
startDate = '2016-1-31'
endDate = '2017-3-26'
It should result in:
datetime.date(2016, 1, 31)
datetime.date(2016, 2, 28)
and so on....
This is what I tried:
import datetime
from datetime import date
from pprint import pprint
startDate = '2016-1-31'
endDate = '2017-3-26'
start = date(*map(int, startDate.split('-')))
end = date(*map(int, endDate.split('-')))
week = start
dateData = []
while week <= end:
    dateData.append(week)
    week = week + datetime.timedelta(weeks=4)
pprint(dateData)
This gives the following result:
[datetime.date(2016, 1, 31),
datetime.date(2016, 2, 28),
datetime.date(2016, 3, 27),
datetime.date(2016, 4, 24),
datetime.date(2016, 5, 22),
datetime.date(2016, 6, 19),
datetime.date(2016, 7, 17),
datetime.date(2016, 8, 14),
datetime.date(2016, 9, 11),
datetime.date(2016, 10, 9),
datetime.date(2016, 11, 6),
datetime.date(2016, 12, 4),
datetime.date(2017, 1, 1),
datetime.date(2017, 1, 29),
datetime.date(2017, 2, 26),
datetime.date(2017, 3, 26)]
Here January 2017 appears twice (2017-01-01 and 2017-01-29). Can anybody help me solve this problem?
You could use relativedelta from the dateutil package, like this:
from datetime import datetime
from dateutil.relativedelta import relativedelta
startDate = '2016-1-28'
endDate = '2017-3-26'
cur_date = start = datetime.strptime(startDate, '%Y-%m-%d').date()
end = datetime.strptime(endDate, '%Y-%m-%d').date()
while cur_date < end:
    print(cur_date)
    cur_date += relativedelta(months=1)
The output is:
2016-01-28
2016-02-28
2016-03-28
2016-04-28
2016-05-28
2016-06-28
2016-07-28
2016-08-28
2016-09-28
2016-10-28
2016-11-28
2016-12-28
2017-01-28
2017-02-28
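Adding relativedelta(months=1) to the running date works for the 28th, but starting on the 31st it drifts: 2016-01-31 becomes 2016-02-29, then 2016-03-29, and so on. A sketch that offsets from the original start each time instead, so each month is clipped independently; the function name monthly_dates is hypothetical, and unlike the loop above it includes end itself if it falls on an occurrence.

```python
from datetime import date

from dateutil.relativedelta import relativedelta


def monthly_dates(start, end):
    """One date per month from start up to and including end."""
    dates = []
    i = 0
    while True:
        # Offset from the original start, so the 31st is clipped per month
        # instead of sticking to the 29th/28th after February
        current = start + relativedelta(months=i)
        if current > end:
            break
        dates.append(current)
        i += 1
    return dates


print(monthly_dates(date(2016, 1, 31), date(2016, 5, 1)))
```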
You can also do it in one line with pandas alone (freq='M' means month end; pandas 2.2 and later deprecate this alias in favor of 'ME'):
import pandas as pd
pd.date_range('2018-01', '2020-05', freq='M')
The output will be:
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
'2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31',
'2018-09-30', '2018-10-31', '2018-11-30', '2018-12-31',
'2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
'2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',
'2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31',
'2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30'],
dtype='datetime64[ns]', freq='M')
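If you only need the months themselves rather than particular days, a PeriodIndex may be a closer fit. A sketch using pd.period_range; here 'M' is the monthly period frequency, which is separate from the month-end offset alias used by date_range.

```python
import pandas as pd

# One Period per calendar month, both endpoints included
months = pd.period_range(start="2016-01", end="2017-03", freq="M")
print(list(months.strftime("%Y-%m")))
```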
I'm not privileged enough to comment yet, but your program is doing exactly what it is told. Four weeks is 28 days, and 1 January and 29 January 2017 are exactly 28 days apart, so you get January twice.
You might want to redefine what you are trying to solve. If, however, you just want one date per month between the two dates, your code will need some extending.
You need to create a loop to iterate over the years and months.
You will need a condition for the start year and end year to ensure you begin at the start month and finish at the end month.
Here is a working example; it also includes the start and end dates in the list. I hope it helps:
import datetime
startDate = '2016-1-28'
endDate = '2017-3-26'
start = datetime.date(*map(int, startDate.split('-')))
end = datetime.date(*map(int, endDate.split('-')))
dateData = [start]
rangeYear = end.year - start.year
for i in range(rangeYear + 1):
    year = start.year + i
    # The start year begins after the start month; later years begin in January
    first_month = start.month + 1 if i == 0 else 1
    # The end year stops before the end month; earlier years run through December
    last_month = end.month - 1 if i == rangeYear else 12
    for month in range(first_month, last_month + 1):
        dateData.append(datetime.date(year, month, 1))
dateData.append(end)
How would I create an rrule that excludes only the end timestamp using dateutil? Would I have to create a custom function or is there a way to do it natively?
Here is an example:
rule = rrule.rrule(rrule.HOURLY, dtstart=somedate, until=somedate_four_hours_later)
I want the output to EXCLUDE somedate_four_hours_later and only generate 4 timestamps: somedate, somedate + 1 hour, and so on.
One way to do this is to use an rruleset, which allows you to combine recurrence rules and specific dates as required. In this case, what you'd do is set the until date as an exdate (excluded date):
from dateutil import rrule
from datetime import datetime, timedelta
dtstart = datetime(2015, 1, 3, 12)
dtuntil = datetime(2015, 1, 3, 16)
rr = rrule.rrule(freq=rrule.HOURLY, dtstart=dtstart, until=dtuntil)
# Add your rrule to the ruleset, then exclude the until date from the rule set
rrset = rrule.rruleset()
rrset.rrule(rr)
l1 = list(rrset)
rrset.exdate(dtuntil)
l2 = list(rrset)
print(l1[-1]) # 2015-01-03 16:00:00
print(l2[-1]) # 2015-01-03 15:00:00
The rrule itself will include the until date, but the exdate will exclude it from the rruleset.
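Another option, without a ruleset, is to leave until off and stop the iteration yourself at the end bound. A sketch using itertools.takewhile; it relies on the rrule being iterated lazily and in chronological order, and the variable names are illustrative.

```python
from datetime import datetime
from itertools import takewhile

from dateutil import rrule

dtstart = datetime(2015, 1, 3, 12)
dtend = datetime(2015, 1, 3, 16)

# No until: the rule is unbounded, so takewhile stops at the first
# occurrence that is not strictly before dtend (a half-open interval)
hourly = rrule.rrule(freq=rrule.HOURLY, dtstart=dtstart)
timestamps = list(takewhile(lambda dt: dt < dtend, hourly))
print(timestamps)
```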
You can set the count:
In [1]: from dateutil import rrule
In [2]: from datetime import datetime, timedelta
In [3]: st = datetime(2016, 7, 5, 23, 30)
In [4]: rule = rrule.rrule(rrule.HOURLY, st, count=4)
In [5]: print(list(rule))
[datetime.datetime(2016, 7, 5, 23, 30), datetime.datetime(2016, 7, 6, 0, 30), datetime.datetime(2016, 7, 6, 1, 30), datetime.datetime(2016, 7, 6, 2, 30)]
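Or set until to three hours after the start using timedelta:
from datetime import datetime, timedelta
from dateutil import rrule
st = datetime(2016, 7, 5, 23, 30)
# until is inclusive, so this yields four timestamps: st, st + 1h, st + 2h, st + 3h
rule = rrule.rrule(rrule.HOURLY, st, until=(st + timedelta(hours=3)))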