dateutil rruleset: How to combine EXRULE and RDATE correctly? - python

I have a rruleset with a daily recurrence rule and now I am trying to combine an RDATE with an EXRULE.
from datetime import datetime
from dateutil.rrule import rruleset, rrule, DAILY, FR
rules = rruleset()
daily = rrule(freq=DAILY, dtstart=datetime(2022, 10, 12))
rules.rrule(daily)
not_on_friday = rrule(freq=DAILY, byweekday=FR, dtstart=datetime(2022, 10, 12))
but_on_friday_21th = datetime(2022, 10, 21)
rules.exrule(not_on_friday)
rules.rdate(but_on_friday_21th)
rules.between(datetime(2022,10,12), datetime(2022,10,24))
>>
[datetime.datetime(2022, 10, 13, 0, 0),
datetime.datetime(2022, 10, 15, 0, 0), # the 14th is excluded as expected
datetime.datetime(2022, 10, 16, 0, 0),
datetime.datetime(2022, 10, 17, 0, 0),
datetime.datetime(2022, 10, 18, 0, 0),
datetime.datetime(2022, 10, 19, 0, 0),
datetime.datetime(2022, 10, 20, 0, 0),
datetime.datetime(2022, 10, 22, 0, 0), # but the 21st is also excluded
datetime.datetime(2022, 10, 23, 0, 0)]
Now, confusingly, when I combine my EXRULE with an EXDATE it works:
rules = rruleset()
daily = rrule(freq=DAILY, dtstart=datetime(2022, 10, 12))
rules.rrule(daily)
not_on_friday = rrule(freq=DAILY, byweekday=FR, dtstart=datetime(2022, 10, 12))
but_also_not_on_the_22th_a_saturday = datetime(2022, 10, 22)
rules.exrule(not_on_friday)
rules.exdate(but_also_not_on_the_22th_a_saturday)
rules.between(datetime(2022,10,12), datetime(2022,10,24))
>>
[datetime.datetime(2022, 10, 13, 0, 0),
datetime.datetime(2022, 10, 15, 0, 0), # the 14th still excluded
datetime.datetime(2022, 10, 16, 0, 0),
datetime.datetime(2022, 10, 17, 0, 0),
datetime.datetime(2022, 10, 18, 0, 0),
datetime.datetime(2022, 10, 19, 0, 0),
datetime.datetime(2022, 10, 20, 0, 0), # the 22nd also excluded as expected
datetime.datetime(2022, 10, 23, 0, 0)]
So, if possible at all, how to combine RDATE and EXRULE in my rruleset?

In your answer you note that exrule is applied last, after all inclusive rules, which does appear to match the RFC. However, at least in dateutil, you can pass an rruleset as the argument to exrule. So to accomplish what you want, you can filter the date that you want included out of the rule set that gets passed to exrule, like so:
from datetime import datetime
from dateutil.rrule import rruleset, rrule, DAILY, WEEKLY, FR
# Create an rruleset that defaults to every day
rules = rruleset()
daily = rrule(freq=DAILY, dtstart=datetime(2022, 10, 12))
rules.rrule(daily)
# Create an rruleset corresponding to the days we want to *exclude*: every
# Friday, except 2022-10-21
ex_set = rruleset()
ex_set.rrule(rrule(freq=WEEKLY, byweekday=FR, dtstart=datetime(2022, 10, 14)))
ex_set.exdate(datetime(2022, 10, 21))
# Use our second rule set as an exrule
rules.exrule(ex_set)
rules.between(datetime(2022,10,12), datetime(2022,10,24))
Since the date you want to include never appears in the exrule, it is not filtered out:
>>> print("\n".join(map(str,
... map(datetime.date,
... rules.between(datetime(2022, 10, 12),
... datetime(2022, 10, 24))))))
2022-10-13
2022-10-15
2022-10-16
2022-10-17
2022-10-18
2022-10-19
2022-10-20
2022-10-21
2022-10-22
2022-10-23

So apparently EXRULE is no longer part of the current iCalendar spec: RFC 5545 deprecated it (it was defined in the older RFC 2445), so only RRULEs remain. And the docstring of dateutil's exrule function states:
def exrule(self, exrule):
    """ Include the given rrule instance in the recurrence set exclusion
        list. Dates which are part of the given recurrence rules will not
        be generated, even if some inclusive rrule or rdate matches them.
    """
So, even if I add an RDATE, if it is excluded by a rule added via exrule it will not show up in my occurrences. The same goes for the exdate function, hence my working second example.
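If nesting a second rruleset inside exrule is not an option, the forced dates can also be merged in after the set has been expanded. This is only a sketch of that workaround, not something dateutil provides: the helper name `between_with_forced` is made up for illustration, and it assumes the same exclusive window as `rruleset.between()` with its default `inc=False`.

```python
from datetime import datetime
from dateutil.rrule import rruleset, rrule, DAILY, WEEKLY, FR


def between_with_forced(rules, forced, start, end):
    """Hypothetical helper: occurrences of `rules` strictly between
    `start` and `end`, plus any `forced` dates inside that window."""
    hits = set(rules.between(start, end))
    # Forced dates bypass the exclusion rules entirely
    hits.update(d for d in forced if start < d < end)
    return sorted(hits)


# Every day, except Fridays
rules = rruleset()
rules.rrule(rrule(freq=DAILY, dtstart=datetime(2022, 10, 12)))
rules.exrule(rrule(freq=WEEKLY, byweekday=FR, dtstart=datetime(2022, 10, 12)))

# ...but Friday 2022-10-21 should be kept anyway
result = between_with_forced(
    rules,
    [datetime(2022, 10, 21)],
    datetime(2022, 10, 12),
    datetime(2022, 10, 24),
)
for occurrence in result:
    print(occurrence.date())
```

The trade-off is that the forced dates live outside the rruleset, so every caller has to go through the helper instead of iterating the set directly.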

Related

Add '0' before days and months using to_pydatetime()

I have data stored in a S3 bucket which uses "yyyy/MM/dd" format to store the files per date, like in this sample S3a path: s3a://mybucket/data/2018/07/03. The files in these buckets are in json.gz format and I would like to import all these files to a spark dataframe per day. After that I want to feed these spark dfs to some written code via a for loop:
for date in date_range:
    s3a = 's3a://mybucket/data/{}/{}/{}/*.json.gz'.format(date.year, date.month, date.day)
    df = spark.read.format('json').option("header", "true").load(s3a)
    # Execute code here
In order to read the data, I tried to format the date_range like below:
from datetime import datetime
import pandas as pd
def return_date_range(start_date, end_date):
    return pd.date_range(start=start_date, end=end_date).to_pydatetime().tolist()
date_range = return_date_range(start_date='2018-03-06', end_date='2018-03-12')
date_range
[datetime.datetime(2018, 3, 6, 0, 0),
datetime.datetime(2018, 3, 7, 0, 0),
datetime.datetime(2018, 3, 8, 0, 0),
datetime.datetime(2018, 3, 9, 0, 0),
datetime.datetime(2018, 3, 10, 0, 0),
datetime.datetime(2018, 3, 11, 0, 0),
datetime.datetime(2018, 3, 12, 0, 0)]
The problem is that to_pydatetime() returns the days and months without a leading '0'. How do I make sure that my code returns a list of values with leading '0's, like below:
[datetime.datetime(2018, 03, 06, 0, 0),
datetime.datetime(2018, 03, 07, 0, 0),
datetime.datetime(2018, 03, 08, 0, 0),
datetime.datetime(2018, 03, 09, 0, 0),
datetime.datetime(2018, 03, 10, 0, 0),
datetime.datetime(2018, 03, 11, 0, 0),
datetime.datetime(2018, 03, 12, 0, 0)]
This is one approach using .strftime("%Y/%m/%d")
Ex:
from datetime import datetime
import pandas as pd
def return_date_range(start_date, end_date):
    return pd.date_range(start=start_date, end=end_date).strftime("%Y/%m/%d").tolist()
date_range = return_date_range(start_date='2018-03-06', end_date='2018-03-12')
print(date_range)
Output:
['2018/03/06',
'2018/03/07',
'2018/03/08',
'2018/03/09',
'2018/03/10',
'2018/03/11',
'2018/03/12']
for date in date_range:
    s3a = 's3a://mybucket/data/{}/*.json.gz'.format(date)
    print(s3a)
s3a://mybucket/data/2018/03/06/*.json.gz
s3a://mybucket/data/2018/03/07/*.json.gz
s3a://mybucket/data/2018/03/08/*.json.gz
s3a://mybucket/data/2018/03/09/*.json.gz
s3a://mybucket/data/2018/03/10/*.json.gz
s3a://mybucket/data/2018/03/11/*.json.gz
s3a://mybucket/data/2018/03/12/*.json.gz
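If the datetime objects themselves are still needed later in the loop, the zero-padding can instead be applied only when the path is built. A minimal sketch, using the same bucket layout as the question; the date list is built with plain timedelta here just to keep the example free of pandas:

```python
from datetime import datetime, timedelta

# Seven consecutive days starting 2018-03-06, kept as datetime objects
start = datetime(2018, 3, 6)
date_range = [start + timedelta(days=i) for i in range(7)]

# A datetime accepts strftime codes inside a format spec, so %m and %d
# are zero-padded at the moment the path is built
paths = ['s3a://mybucket/data/{:%Y/%m/%d}/*.json.gz'.format(d) for d in date_range]

for path in paths:
    print(path)
```

The datetime values are never changed by this; a datetime has no notion of a leading zero, only its string representation does.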

DateLocator in matplotlib to show the first days of both the week and the month

I would like to create a DateLocator in matplotlib that selects all Mondays and the first days of the month. As matplotlib uses the dateutil library I read the docs of how to use RRuleLocator with rrule objects. With the rruleset object from dateutil I can achieve the required functionality:
>>> rrset = rruleset()
>>> rrset.rrule(rrule(DAILY, byweekday=MO, count=5))
>>> rrset.rrule(rrule(DAILY, bymonthday=1, count=5))
>>> list(rrset)
[datetime.datetime(2020, 11, 30, 16, 10, 2),
datetime.datetime(2020, 12, 1, 16, 10, 2),
datetime.datetime(2020, 12, 7, 16, 10, 2),
datetime.datetime(2020, 12, 14, 16, 10, 2),
datetime.datetime(2020, 12, 21, 16, 10, 2),
datetime.datetime(2020, 12, 28, 16, 10, 2),
datetime.datetime(2021, 1, 1, 16, 10, 2),
datetime.datetime(2021, 2, 1, 16, 10, 2),
datetime.datetime(2021, 3, 1, 16, 10, 2),
datetime.datetime(2021, 4, 1, 16, 10, 2)]
But unfortunately I did not manage to find out how to use rruleset with matplotlib. RRuleLocator expects a rrulewrapper object (defined in matplotlib) that hides away the rrule instance and I can not use it with rruleset. Any other way to do this?
If I understood you correctly, calling .set_xticks(list(rrset)) might be enough. For example:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import dateutil
from dateutil.rrule import *
import datetime
import numpy as np
rrset = rruleset()
rrset.rrule(rrule(DAILY, byweekday=MO, count=5))
rrset.rrule(rrule(DAILY, bymonthday=1, count=5))
print(list(rrset))
## generate dates 90 days into the future
base = datetime.datetime.today()
dates = [base + datetime.timedelta(days=3*x) for x in range(30)]
fig = plt.figure(figsize=(10,5))
ax = plt.subplot(111)
ax.set_autoscale_on(True)
## simply plot dates over dates
ax.plot(dates,dates,marker='s')
ax.set_xticks(list(rrset))
formatter = mdates.DateFormatter('%m/%d/%y')
ax.xaxis.set_major_formatter(formatter)
ax.xaxis.set_tick_params(rotation=30, labelsize=10)
ax.autoscale_view()
ax.grid()
plt.show()
This yields a plot with the ticks placed on those dates (generated on 11/26/20, when 11/30/2020 is the next Monday, hence its tick label overlaps with that of the first of the month).
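Because both rules in the code above use count=5 and the default dtstart (the current time), the ticks do not necessarily cover the plotted range. One possible variation, sketched under the assumption that the first and last plotted dates are known, bounds both rules with dtstart and until instead. The helper name `monday_and_month_start_ticks` is hypothetical:

```python
from datetime import datetime, timedelta
from dateutil.rrule import rruleset, rrule, DAILY, MO


def monday_and_month_start_ticks(first, last):
    """Hypothetical helper: every Monday and every first of the month
    between `first` and `last` (both inclusive)."""
    ticks = rruleset()
    ticks.rrule(rrule(DAILY, byweekday=MO, dtstart=first, until=last))
    ticks.rrule(rrule(DAILY, bymonthday=1, dtstart=first, until=last))
    return list(ticks)


# Same kind of data as above: one point every 3 days, 30 points in total
base = datetime(2020, 11, 26)
dates = [base + timedelta(days=3 * x) for x in range(30)]

ticks = monday_and_month_start_ticks(dates[0], dates[-1])
print(ticks[:3])
```

The returned list can be passed to ax.set_xticks() in the same way as list(rrset).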

I tried to calculate the time interval between two datetime in python, but the result is not correct

This is the list that I am using
dates = [
datetime.datetime(2020, 8, 24, 8, 23),
datetime.datetime(2020, 8, 24, 12, 21),
datetime.datetime(2020, 8, 23, 17, 13),
datetime.datetime(2020, 8, 22, 4, 12),
datetime.datetime(2020, 8, 21, 13, 42),
datetime.datetime(2020, 8, 21, 12, 34),
datetime.datetime(2020, 8, 19, 5, 32),
datetime.datetime(2020, 8, 12, 2, 55),
datetime.datetime(2020, 8, 11, 10, 10),
datetime.datetime(2020, 8, 11, 13, 55),
datetime.datetime(2020, 8, 11, 13, 7)
]
And for calculating the time interval I used this:
dates[0]- dates[1]
and I got this:
datetime.timedelta(days=-1, seconds=72120)
It doesn't make sense at all!! Since datetime(year, month, day, hour, minute, second), the first element in the list is almost 4 hours ahead from the second one. But the result says something completely different.
Actually the result is correct.
"days=-1, seconds=72120" comes from the normalization of the delta value.
To calculate the time in hours:
72120 s / 60 / 60 ≈ 20 h
20 h - 24 h = -4 h
which means dates[0] is about 4 hours earlier than dates[1].
Dates and numbers work the same way in that if you want to get a positive result, you need to subtract the smaller value from the bigger value, not the other way around.
>>> datetime.datetime(2020, 8, 24, 12, 21) - datetime.datetime(2020, 8, 24, 8, 23)
datetime.timedelta(seconds=14280)
14280 seconds ~= 4 hours.
If you subtract them the other way (like in your code, subtracting larger from smaller), you get a value that equates to negative 4 hours:
>>> datetime.datetime(2020, 8, 24, 8, 23) - datetime.datetime(2020, 8, 24, 12, 21)
datetime.timedelta(days=-1, seconds=72120)
>>> (datetime.datetime(2020, 8, 24, 8, 23) - datetime.datetime(2020, 8, 24, 12, 21)).total_seconds()
-14280.0
Note that the default way of expressing this negative delta for formatting purposes is "minus one day plus 72120 seconds"; using total_seconds() converts it to simply a number of seconds which makes it a little easier to reason about IMO.
It makes absolute sense. It does exactly what you asked it to do. You asked to subtract a later date from an earlier date, so the result is negative (1 day = 24 hours; 72120 seconds equals roughly 20 hours), so you have -24h+20h=-4h. If you turn it around you get the positive result:
print(dates[1]-dates[0])
Out:
3:58:00
You will have to check if value 'a' is higher than 'b'
# Simulating the list
>>> case = []
>>> case.append(datetime.datetime(2020,8,24,8,23))
>>> case
[datetime.datetime(2020, 8, 24, 8, 23)]
>>> case.append(datetime.datetime(2020,8,24,12,21))
>>> case
[datetime.datetime(2020, 8, 24, 8, 23), datetime.datetime(2020, 8, 24, 12, 21)]
# Negative value outcome
>>> case[0] - case[1]
datetime.timedelta(days=-1, seconds=72120)
# Positive value outcome
>>> case[1] - case[0]
datetime.timedelta(seconds=14280)
>>> 14280/60  # minutes
238.0
>>> 14280/60/60  # hours
3.966666666666667
This post shows the use of (abs):
why I have negative date by subtraction of two column?
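To make the normalization concrete, the sketch below checks that days=-1, seconds=72120 is the same quantity as -14280 seconds, and uses abs() to get the size of the gap regardless of operand order. The variable names are illustrative only:

```python
from datetime import datetime

earlier = datetime(2020, 8, 24, 8, 23)
later = datetime(2020, 8, 24, 12, 21)

delta = earlier - later
# timedelta always stores 0 <= seconds < 86400, so a negative value is
# expressed as a negative day count plus a positive number of seconds
print(delta.days, delta.seconds)   # -1 72120
print(delta.total_seconds())       # -14280.0

# abs() gives the magnitude of the gap regardless of subtraction order
gap = abs(delta)
hours, remainder = divmod(int(gap.total_seconds()), 3600)
minutes = remainder // 60
print(f"{hours}h {minutes}m")      # 3h 58m
```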

Why is datetime being truncated in Pandas?

I would like to filter a pandas DataFrame by its timestamp column. This works fine for all hours except 0. If I filter for dt.hour == 0, only the date is displayed and not the time. How can I have the time displayed too?
import datetime
import pandas as pd
df = pd.DataFrame({'datetime': [datetime.datetime(2005, 7, 14, 12, 30),
                                datetime.datetime(2005, 7, 14, 0, 0),
                                datetime.datetime(2005, 7, 14, 10, 30),
                                datetime.datetime(2005, 7, 14, 15, 30)]})
print(df[df['datetime'].dt.hour == 10])
print(df[df['datetime'].dt.hour == 0])
use strftime:
print(df[df['datetime'].dt.hour == 0].datetime.dt.strftime("%Y-%m-%d %H:%M:%S"))
The result is:
1 2005-07-14 00:00:00
Name: datetime, dtype: object
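The time is not actually lost. As far as I understand pandas' display logic, the time part is left out of the printed output when every value being shown falls exactly on midnight, while the stored values are unchanged. A small sketch to check this, reusing the data from the question:

```python
import datetime
import pandas as pd

df = pd.DataFrame({'datetime': [datetime.datetime(2005, 7, 14, 12, 30),
                                datetime.datetime(2005, 7, 14, 0, 0),
                                datetime.datetime(2005, 7, 14, 10, 30),
                                datetime.datetime(2005, 7, 14, 15, 30)]})

midnight = df[df['datetime'].dt.hour == 0]

# The column is still a datetime column and the value still carries its time
value = midnight['datetime'].iloc[0]
print(midnight['datetime'].dtype)
print(value.hour, value.minute)

# Formatting to strings only changes how the value is displayed
formatted = midnight['datetime'].dt.strftime("%Y-%m-%d %H:%M:%S")
print(formatted.tolist())
```

Note that strftime turns the column into plain strings, so it is best applied only at the point of printing or exporting.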

Datetime object from a string for 15 days frequency

I am trying to code a function called days15(). The function will be passed an argument called 'myDateStr', a string representation of a date in the form 20170817 (that is, YearMonthDay). The code in the function will create a datetime object from the string, then create a timedelta object with a length of 1 day. It will then use a list comprehension to produce a list of 15 datetime objects, starting with the date that is passed to the function.
The function should return the following list:
[datetime.datetime(2017, 8, 17, 0, 0), datetime.datetime(2017, 8, 18, 0, 0), datetime.datetime(2017, 8, 19, 0, 0), datetime.datetime(2017, 8, 20, 0, 0), datetime.datetime(2017, 8, 21, 0, 0), datetime.datetime(2017, 8, 22, 0, 0), datetime.datetime(2017, 8, 23, 0, 0), datetime.datetime(2017, 8, 24, 0, 0), datetime.datetime(2017, 8, 25, 0, 0), datetime.datetime(2017, 8, 26, 0, 0), datetime.datetime(2017, 8, 27, 0, 0), datetime.datetime(2017, 8, 28, 0, 0), datetime.datetime(2017, 8, 29, 0, 0), datetime.datetime(2017, 8, 30, 0, 0), datetime.datetime(2017, 8, 31, 0, 0)]
I am stuck on the code. I have started with the below. Please help. Thanks
from datetime import datetime, timedelta
myDateStr = '20170817'
def days15(myDateStr):
Pandas will help you in converting strings to datetime, so first you need to import it:
from datetime import datetime, timedelta
import pandas as pd
myDateStr = '20170817'
Then you can initialize an empty list that you'll later append:
datelist = []
And then you write a function:
def days15(myDateStr):
    # converting to datetime
    date = pd.to_datetime(myDateStr)
    # loop to create 15 datetimes
    for i in range(15):
        newdate = date + timedelta(days=i)
        # adding new dates to the list
        datelist.append(newdate)
    # return the filled list so the caller gets the 15 datetimes
    return datelist
and then you can call your function and get a list of 15 datetimes:
days15(myDateStr)
As you said, there will be two steps to implement: firstly, convert the string date to a datetime object and secondly, iterate over the next 15 days using timedelta, with a list comprehension or a simple loop.
from datetime import datetime, timedelta
myDateStr = '20170817'
# Parse the string and return a datetime object
def getDateTime(date):
    return datetime(int(date[:4]),int(date[4:6]),int(date[6:]))
# Iterate over the timedelta added to the starting date
def days15(myDateStr):
    return [getDateTime(myDateStr) + timedelta(days=x) for x in range(15)]
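The manual slicing in getDateTime can also be replaced by datetime.strptime from the standard library, and the one-day timedelta from the question can be made explicit. A minimal sketch of that variant:

```python
from datetime import datetime, timedelta


def days15(myDateStr):
    # "%Y%m%d" matches the YearMonthDay form, e.g. '20170817'
    start = datetime.strptime(myDateStr, "%Y%m%d")
    one_day = timedelta(days=1)
    # 15 consecutive days, starting with the date that was passed in
    return [start + i * one_day for i in range(15)]


result = days15('20170817')
print(result[0], result[-1])
```

Unlike the slicing approach, strptime raises a ValueError when the string does not match the expected format.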
