I'm using rrule from python dateutil and don't know how to create an rruleset for the following example:
Monday, three weeks in a row. Then a week not, then again three weeks in a row, one not, and so on.
Any advice on creating an rrule(set) for this?
One way to do this is to use an rruleset with a WEEKLY rrule and a corresponding exrule for every 4th week:
from dateutil.rrule import rrule, rruleset
from dateutil.rrule import WEEKLY
from dateutil.relativedelta import relativedelta
from datetime import datetime, timedelta
dtstart = datetime(2011, 1, 1)
rrset = rruleset()
weekly_rule = rrule(freq=WEEKLY, dtstart=dtstart)
every_4_weeks = rrule(freq=WEEKLY, interval=4,
                      dtstart=dtstart + relativedelta(weeks=4))
rrset.rrule(weekly_rule)
rrset.exrule(every_4_weeks)
rrset.between(dtstart, dtstart + timedelta(days=65))
The result:
[datetime.datetime(2011, 1, 8, 0, 0),
datetime.datetime(2011, 1, 15, 0, 0),
datetime.datetime(2011, 1, 22, 0, 0),
datetime.datetime(2011, 2, 5, 0, 0),
datetime.datetime(2011, 2, 12, 0, 0),
datetime.datetime(2011, 2, 19, 0, 0),
datetime.datetime(2011, 3, 5, 0, 0)]
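The way it works is that weekly_rule generates one date per week, and every_4_weeks generates every 4th week, starting with the 4th week after dtstart; the rruleset removes those dates from the weekly ones. That gives you a 3-on, 1-off schedule. Note that between() excludes its endpoints by default, which is why dtstart itself (January 1) does not appear in the result. If dtstart should count as the first of the three occurrences, start the exrule three weeks after dtstart instead of four.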
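As a cross-check that does not rely on dateutil, the same 3-on, 1-off pattern can be sketched with plain datetime arithmetic. This is only an illustration under assumptions: the helper name three_on_one_off and the Monday start date 2011-01-03 are made up for the example, and the start date itself is treated as the first "on" week.

```python
from datetime import datetime, timedelta

def three_on_one_off(start, weeks):
    """Yield one date per week from `start`, skipping every 4th week.

    Hypothetical helper for illustration; `start` counts as the first "on" week.
    """
    for i in range(weeks):
        # Weeks 0, 1, 2 of each 4-week cycle are "on"; week 3 is "off".
        if i % 4 != 3:
            yield start + timedelta(weeks=i)

# 2011-01-03 is a Monday (start date chosen for this example only).
mondays = list(three_on_one_off(datetime(2011, 1, 3), 8))
for d in mondays:
    print(d.date(), d.strftime("%A"))
```

Because every result is a whole number of weeks after the start, choosing a Monday as the start date is enough to keep every occurrence on a Monday.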
I have this code and I want to count the days between 2 dates.
from datetime import date, datetime
checkin= datetime(2022, 1, 30, 1, 15, 00)
checkout= datetime(2022, 1, 31, 0, 0, 00)
count_days= (checkout - checkin).days
In this case, the result of count_days is 0, because subtracting two datetimes takes hours, minutes and seconds into account.
I want the result to be 1, because the dates are one calendar day apart. The variables must remain datetimes. Thanks!
Convert them to dates first, with the date method.
from datetime import date, datetime
checkin = datetime(2022, 1, 30, 1, 15, 00)
checkout = datetime(2022, 1, 31, 0, 0, 00)
count_days = (checkout.date() - checkin.date()).days
Could you do something like this?
(This assumes you want to count a minimum of one day, since the code you already have is otherwise similar.)
from datetime import date, datetime
check_in= datetime(2022, 1, 30, 1, 15, 00)
check_out= datetime(2022, 1, 31, 0, 0, 00)
# Number of days between two dates (min 1 day)
days = (check_out - check_in).days + 1
print(days)
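The two suggestions above agree on the dates in the question but not in general, so a small side-by-side sketch may help when choosing between them. The function names and the second pair of dates are made up for illustration.

```python
from datetime import datetime

def elapsed_days_plus_one(check_in, check_out):
    # Whole 24-hour periods elapsed, plus one.
    return (check_out - check_in).days + 1

def calendar_days(check_in, check_out):
    # Difference between the calendar dates, ignoring the time of day.
    return (check_out.date() - check_in.date()).days

# The dates from the question: 22 h 45 min apart, on consecutive days.
same = (datetime(2022, 1, 30, 1, 15), datetime(2022, 1, 31, 0, 0))
# Hypothetical stay of 2 days 45 min, spanning two date changes.
differ = (datetime(2022, 1, 30, 1, 15), datetime(2022, 2, 1, 2, 0))

print(elapsed_days_plus_one(*same), calendar_days(*same))      # 1 1
print(elapsed_days_plus_one(*differ), calendar_days(*differ))  # 3 2
```

If the goal is the number of date changes between check-in and check-out, comparing the .date() values is the approach that matches that definition.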
I've been reading the pytz and datetime module documentation but I can't figure out why one date is under DST and the other is not.
import pytz
import datetime
mytz = pytz.timezone('America/New_York')
od = datetime.datetime(2021, 7, 1, 4, 0)
mytz.localize(od)
# Out: datetime.datetime(2021, 7, 1, 4, 0, tzinfo=<DstTzInfo 'America/New_York' EDT-1 day, 20:00:00 DST>)
mytz.localize(od).dst()
# Out: datetime.timedelta(0, 3600)
dt = datetime.datetime(2089, 7, 1, 4, 0)
mytz.localize(dt)
# Out: datetime.datetime(2089, 7, 1, 4, 0, tzinfo=<DstTzInfo 'America/New_York' EST-1 day, 19:00:00 STD>)
mytz.localize(dt).dst()
# Out: datetime.timedelta(0)
If you look at the source of the time zone rules, you find that they can have a keyword "max" specified that "is used to extend a rule’s application into the indefinite future". The rules for the US are specified this way. Unless otherwise specified, DST just continues to be applied during the specified period of the year. But keep in mind that this does not mean that it will actually be the case in the future, since time zones are subject to political decisions.
As an addition to @balmy's comment suggesting this is a deficiency of pytz: Python 3.9's zoneinfo gives the result you would expect for the example above:
import datetime
from zoneinfo import ZoneInfo
od = datetime.datetime(2021, 7, 1, 4, 0, tzinfo=ZoneInfo('America/New_York'))
print(od.dst())
# 1:00:00
dt = datetime.datetime(2089, 7, 1, 4, 0, tzinfo=ZoneInfo('America/New_York'))
print(dt.dst())
# 1:00:00
I have data stored in a S3 bucket which uses "yyyy/MM/dd" format to store the files per date, like in this sample S3a path: s3a://mybucket/data/2018/07/03. The files in these buckets are in json.gz format and I would like to import all these files to a spark dataframe per day. After that I want to feed these spark dfs to some written code via a for loop:
for date in date_range:
    s3a = 's3a://mybucket/data/{}/{}/{}/*.json.gz'.format(date.year, date.month, date.day)
    df = spark.read.format('json').option("header", "true").load(s3a)
    # Execute code here
In order to read the data, I tried to format the date_range like below:
from datetime import datetime
import pandas as pd
def return_date_range(start_date, end_date):
    return pd.date_range(start=start_date, end=end_date).to_pydatetime().tolist()
date_range = return_date_range(start_date='2018-03-06', end_date='2018-03-12')
date_range
[datetime.datetime(2018, 3, 6, 0, 0),
datetime.datetime(2018, 3, 7, 0, 0),
datetime.datetime(2018, 3, 8, 0, 0),
datetime.datetime(2018, 3, 9, 0, 0),
datetime.datetime(2018, 3, 10, 0, 0),
datetime.datetime(2018, 3, 11, 0, 0),
datetime.datetime(2018, 3, 12, 0, 0)]
The problem is that to_pydatetime() returns the days and months without a leading '0'. How do I make sure that my code returns a list of values with leading '0's, like below:
[datetime.datetime(2018, 03, 06, 0, 0),
datetime.datetime(2018, 03, 07, 0, 0),
datetime.datetime(2018, 03, 08, 0, 0),
datetime.datetime(2018, 03, 09, 0, 0),
datetime.datetime(2018, 03, 10, 0, 0),
datetime.datetime(2018, 03, 11, 0, 0),
datetime.datetime(2018, 03, 12, 0, 0)]
This is one approach using .strftime("%Y/%m/%d")
Ex:
from datetime import datetime
import pandas as pd
def return_date_range(start_date, end_date):
    return pd.date_range(start=start_date, end=end_date).strftime("%Y/%m/%d").tolist()
date_range = return_date_range(start_date='2018-03-06', end_date='2018-03-12')
print(date_range)
Output:
['2018/03/06',
'2018/03/07',
'2018/03/08',
'2018/03/09',
'2018/03/10',
'2018/03/11',
'2018/03/12']
for date in date_range:
    s3a = 's3a://mybucket/data/{}/*.json.gz'.format(date)
    print(s3a)
s3a://mybucket/data/2018/03/06/*.json.gz
s3a://mybucket/data/2018/03/07/*.json.gz
s3a://mybucket/data/2018/03/08/*.json.gz
s3a://mybucket/data/2018/03/09/*.json.gz
s3a://mybucket/data/2018/03/10/*.json.gz
s3a://mybucket/data/2018/03/11/*.json.gz
s3a://mybucket/data/2018/03/12/*.json.gz
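A datetime object stores the month and day as plain integers, so the zero padding only exists once the value is formatted as text. As a sketch of the same idea without pandas, a date can be formatted directly inside the path template; the helper name s3_paths and its template parameter are made up for this example, and the bucket path is the one from the question.

```python
from datetime import date, timedelta

def s3_paths(start, end, template="s3a://mybucket/data/{:%Y/%m/%d}/*.json.gz"):
    """Return one path per day from start to end, inclusive (hypothetical helper)."""
    n_days = (end - start).days + 1
    # The format spec after the colon is passed to date.strftime(),
    # which zero-pads the month and day.
    return [template.format(start + timedelta(days=i)) for i in range(n_days)]

paths = s3_paths(date(2018, 3, 6), date(2018, 3, 12))
for path in paths:
    print(path)
```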
I am trying to code a function called days15(). The function will be passed an argument called ‘myDateStr’. myDateStr is a string representation of a date in the form 20170817 (that is, YearMonthDay). The code in the function will create a datetime object from the string, then create a timedelta object with a length of 1 day. Then it will use a list comprehension to produce a list of 15 datetime objects, starting with the date that is passed to the function.
The function should return the following list:
[datetime.datetime(2017, 8, 17, 0, 0), datetime.datetime(2017, 8, 18, 0, 0), datetime.datetime(2017, 8, 19, 0, 0), datetime.datetime(2017, 8, 20, 0, 0), datetime.datetime(2017, 8, 21, 0, 0), datetime.datetime(2017, 8, 22, 0, 0), datetime.datetime(2017, 8, 23, 0, 0), datetime.datetime(2017, 8, 24, 0, 0), datetime.datetime(2017, 8, 25, 0, 0), datetime.datetime(2017, 8, 26, 0, 0), datetime.datetime(2017, 8, 27, 0, 0), datetime.datetime(2017, 8, 28, 0, 0), datetime.datetime(2017, 8, 29, 0, 0), datetime.datetime(2017, 8, 30, 0, 0), datetime.datetime(2017, 8, 31, 0, 0)]
I am stuck on the code. I have started with the below. Please help. Thanks
from datetime import datetime, timedelta
myDateStr = '20170817'
def days15(myDateStr):
Pandas will help you in converting strings to datetime, so first you need to import it:
from datetime import datetime, timedelta
import pandas as pd
myDateStr = '20170817'
Then you can initialize an empty list that you'll later append:
datelist = []
And then you write a function:
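def days15(myDateStr):
    # converting to datetime (to_pydatetime() turns the pandas Timestamp into a plain datetime)
    date = pd.to_datetime(myDateStr).to_pydatetime()
    # loop to create 15 datetimes
    for i in range(15):
        newdate = date + timedelta(days=i)
        # adding new dates to the list
        datelist.append(newdate)
    # return the list so the caller gets the result
    return datelist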
and then you can call your function and get a list of 15 datetimes:
days15(myDateStr)
As you said, there will be two steps to implement: firstly, convert the string date to a datetime object and secondly, iterate over the next 15 days using timedelta, with a list comprehension or a simple loop.
from datetime import datetime, timedelta
myDateStr = '20170817'
# Parse the string and return a datetime object
def getDateTime(date):
    return datetime(int(date[:4]),int(date[4:6]),int(date[6:]))
# Iterate over the timedelta added to the starting date
def days15(myDateStr):
    return [getDateTime(myDateStr) + timedelta(days=x) for x in range(15)]
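As a variation on the same two steps, the string can also be parsed with datetime.strptime instead of slicing it by hand. The sketch below follows the wording of the question (a one-day timedelta and a list comprehension); the variable names start and one_day are illustrative.

```python
from datetime import datetime, timedelta

def days15(myDateStr):
    # "%Y%m%d" matches strings such as "20170817" (YearMonthDay).
    start = datetime.strptime(myDateStr, "%Y%m%d")
    one_day = timedelta(days=1)
    # Multiplying the timedelta by 0..14 gives the 15 consecutive days.
    return [start + i * one_day for i in range(15)]

result = days15("20170817")
print(result[0], result[-1])
```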
I am working in python pandas and I am doing the following:
StDt = datetime(2018, 1, 1, 1, 0)
EnDt = datetime(2020, 1, 1, 1, 0)
allHours = pd.date_range(StDt, EnDt, freq='H').to_pydatetime()
The midnight hours are represented as:
datetime(2018, 1, 3, 0, 0)
datetime(2018, 1, 5, 0, 0)
Is it possible to create the series in such a way that midnight is represented as hour 24 of the previous day?
i.e. the above two cases will look as:
datetime(2018, 1, 2, 24, 0)
datetime(2018, 1, 4, 24, 0)
i.e. I am looking for following:
datetime(2018, 1, 3, 0, 0) = datetime(2018, 1, 2, 24, 0)
datetime(2018, 1, 5, 0, 0) = datetime(2018, 1, 4, 24, 0)
Edit:
My particular situation requires working with an hour-ending convention, which is the standard in the field I am working in.
Using datetimes, this is not possible. Python simply doesn't accept datetime(2018, 1, 2, 24, 0) as a valid time.
There was a request in 2010 to allow for this time to be accepted
Issue 10427: 24:00 Hour in DateTime
which was rejected.
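The restriction is easy to confirm: constructing the datetime from the question with hour 24 raises a ValueError. A minimal check, where the variable names accepted and message are only for illustration:

```python
from datetime import datetime

try:
    datetime(2018, 1, 2, 24, 0)  # hour must be in the range 0..23
    accepted = True
except ValueError as exc:
    accepted = False
    message = str(exc)

print(accepted)
```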
My only suggestion would be to consider whether you really need this time depicted as you outlined. For actual data manipulation, it should not make any difference as any operations you'd like to do in Pandas with datetimes will conform to this same restriction anyways.
I was working with similar data, and found it useful to consider that Hour Ending data labeled 1-24 is the equivalent of Hour Beginning data labeled 0-23.
So you'll have to change your rule set notation, but it should be a straightforward change.
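If the hour-24 notation is only needed for display or export, one option is to keep ordinary datetimes for all calculations and produce the hour-ending label as a string at the very end. The sketch below assumes on-the-hour timestamps; the function name hour_ending_label and the output format are made up for this example.

```python
from datetime import datetime, timedelta

def hour_ending_label(ts):
    """Format an on-the-hour datetime with midnight shown as 24:00 of the previous day.

    Hypothetical helper; assumes ts has zero minutes and seconds.
    """
    if ts.hour == 0:
        previous_day = ts.date() - timedelta(days=1)
        return f"{previous_day:%Y-%m-%d} 24:00"
    return f"{ts:%Y-%m-%d %H}:00"

print(hour_ending_label(datetime(2018, 1, 3, 0, 0)))  # 2018-01-02 24:00
print(hour_ending_label(datetime(2018, 1, 3, 1, 0)))  # 2018-01-03 01:00
```

The labels are plain strings, so they cannot be parsed back with the datetime constructor; sorting and arithmetic should stay on the original datetime values.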