python-dateutil - RRule - Different times for different weekdays - python

I'm using rrule as shown here:
https://labix.org/python-dateutil#head-470fa22b2db72000d7abe698a5783a46b0731b57
I'm wondering if it is somehow possible to create a rule where different times are specified for different weekdays
e.g. WEEKLY Thursday 6pm and Saturday 10am
Hope someone can help :)

A single rrule cannot specify separate pairs of days and hours, but you can use an rrule.rruleset to combine several rrules:
import datetime as DT
import dateutil.rrule as RR
today = DT.date.today()
aset = RR.rruleset()
aset.rrule(RR.rrule(RR.WEEKLY, byweekday=RR.TH, byhour=18, count=3, dtstart=today))
aset.rrule(RR.rrule(RR.WEEKLY, byweekday=RR.SA, byhour=10, count=3, dtstart=today))
for date in aset:
    print(date)
yields
2015-03-26 18:00:00
2015-03-28 10:00:00
2015-04-02 18:00:00
2015-04-04 10:00:00
2015-04-09 18:00:00
2015-04-11 10:00:00
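
To see why one rule is not enough, the sketch below passes both weekdays and both hours to a single rrule. The fixed dtstart of 2015-03-25 is an assumption chosen only to make the result reproducible. dateutil pairs every listed weekday with every listed hour, so the unwanted Thursday 10am and Saturday 6pm occurrences show up as well:

```python
import datetime as DT
import dateutil.rrule as RR

# Illustrative fixed start (a Wednesday) so the result is reproducible.
start = DT.datetime(2015, 3, 25)

# One rule with both weekdays and both hours: each weekday is combined
# with each hour, not only Thursday-with-18 and Saturday-with-10.
combined = list(RR.rrule(RR.WEEKLY, byweekday=(RR.TH, RR.SA),
                         byhour=(10, 18), count=4, dtstart=start))
for occurrence in combined:
    print(occurrence)
```

This is the reason the rruleset approach keeps one rule per weekday/hour pair.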

Related

How can I parse this date format into datetime? Python/Pandas

The starting date format I currently have is 2019-09-04 16:00 UTC+3 and I'm trying to convert it into a datetime format of 2019-09-04 16:00:00+0300.
The format I thought would work was format='%Y-%m-%d %H:%M %Z%z', but when I run it I get the error message ValueError: Cannot parse both %Z and %z.
Does anyone know the correct format to use, or should I be trying a different method altogether? Thanks.
Edit
Sorry, I had a hard time putting into words what it is I am looking to do, hopefully I can clarify.
I'm looking to change all the date and times in a dataframe into the datetime format.
This is the method I was trying to use which presented me with an error
df['datepicker'] = pd.to_datetime(df['datepicker'], format='%Y-%m-%d %H:%M %Z%z')
And here is a sample of the data I currently have.
datepicker
2019-09-07 16:00 UTC+2
2019-09-04 18:30 UTC+4
2019-09-06 17:00 UTC±0
2019-09-10 16:00 UTC+1
2019-09-04 18:00 UTC+3
And this is what I'm looking to convert them into, a timestamp format.
datepicker
2019-09-07 16:00:00+0200
2019-09-04 18:30:00+0400
2019-09-06 17:00:00+0000
2019-09-10 16:00:00+0100
2019-09-04 18:00:00+0300
pandas.to_datetime should parse this happily if you tweak the strings slightly:
import pandas as pd
df = pd.DataFrame({"datepicker":[ "2019-09-07 16:00 UTC+2", "2019-09-04 18:30 UTC+4",
"2019-09-06 17:00 UTC±0", "2019-09-10 16:00 UTC+1",
"2019-09-04 18:00 UTC+3"]})
df['datetime'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'))
# df['datetime']
# 0 2019-09-07 16:00:00-02:00
# 1 2019-09-04 18:30:00-04:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 16:00:00-01:00
# 4 2019-09-04 18:00:00-03:00
# Name: datetime, dtype: object
Note that due to the mixed UTC offsets, the column's data type is 'object' (datetime objects). If you wish, you can also convert to UTC straight away, to get a column of dtype datetime64[ns, UTC]:
df['UTC'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'), utc=True)
# df['UTC']
# 0 2019-09-07 18:00:00+00:00
# 1 2019-09-04 22:30:00+00:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 17:00:00+00:00
# 4 2019-09-04 21:00:00+00:00
# Name: UTC, dtype: datetime64[ns, UTC]
When I define the timezone as below, the format works as you expect.
from datetime import datetime, timedelta, timezone
UTC = timezone(timedelta(hours=+3))
dt = datetime(2019, 1, 1, 12, 0, 0, tzinfo=UTC)
timestampStr = dt.strftime("%Y-%m-%d %H:%M %Z%z")
print(timestampStr)
With the output of:
2019-01-01 12:00 UTC+03:00+0300
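
If the offsets in the question are meant literally (UTC+2 meaning two hours ahead of UTC), the strings can also be parsed by hand with the standard library. This is only a sketch under that assumption; the helper name parse_utc_offset and the regular expression are illustrative and assume whole-hour offsets in exactly the format shown in the sample data:

```python
import re
from datetime import datetime, timedelta, timezone

# Matches strings such as "2019-09-07 16:00 UTC+2" or "2019-09-06 17:00 UTC±0".
_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}) UTC([+\-±])(\d{1,2})$")

def parse_utc_offset(text):
    """Parse 'YYYY-MM-DD HH:MM UTC+N' into an aware datetime (hypothetical helper)."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    stamp, sign, hours = match.groups()
    offset = timedelta(hours=int(hours))
    if sign == "-":
        offset = -offset  # "+" and "±" are both treated as non-negative
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=timezone(offset))

samples = ["2019-09-07 16:00 UTC+2", "2019-09-06 17:00 UTC±0"]
formatted = [parse_utc_offset(s).strftime("%Y-%m-%d %H:%M:%S%z") for s in samples]
print(formatted)
```

In a dataframe, a function like this could be applied to the column with df['datepicker'].apply(parse_utc_offset).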

sort dates in python according to hour, day, month [duplicate]

I have a string array of dates in this format:
array(['2018-01-01 02:00:00 +01:00', '2018-01-01 04:00:00 +01:00',
'2018-01-01 05:00:00 +01:00', ..., '2018-12-31 21:00:00 +01:00',
'2018-12-31 22:00:00 +01:00', '2018-12-31 23:00:00 +01:00'],
dtype='<U26')
I want to sort the dates by month and by hour, e.g:
2018-01-01 00:00:00 +01:00
2018-01-01 01:00:00 +01:00
2018-01-01 02:00:00 +01:00
and so on.
I am using this mini code:
time1=time.sort()
but the sorting gives the month and all the values of an hour before moving to the next hour.
Is there a way to sort these dates by hour according to each day of every month?
import datetime
# timestamps is the array of date strings from the question
dates = [datetime.datetime.strptime(ts, "%Y-%m-%d %H:%M:%S %z") for ts in timestamps]
dates.sort()
sorteddates = [datetime.datetime.strftime(ts, "%Y-%m-%d %H:%M:%S %z") for ts in dates]
Double-check that the format string matches your data. https://www.w3schools.com/python/python_datetime.asp
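
If the original strings should be kept unchanged, a variant is to sort them with the parsed datetime as the sort key. A minimal sketch, with a short made-up sample list standing in for the array from the question:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S %z"

# Illustrative sample; the real data would be the array from the question.
timestamps = [
    "2018-01-01 04:00:00 +01:00",
    "2018-12-31 23:00:00 +01:00",
    "2018-01-01 02:00:00 +01:00",
]

# Sort the strings chronologically without rewriting them.
sorted_timestamps = sorted(timestamps, key=lambda ts: datetime.strptime(ts, FMT))
print(sorted_timestamps)
```

Parsing "+01:00" (with a colon) through %z needs Python 3.7 or later.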

Time difference in pandas (from string format to datetime)

I have the following column
Time
2:00
00:13
1:00
00:24
in object format (strings). This time refers to hours and minutes ago, counted back from a start time that I need to use: 8:00 (it might change; in this example it is 8:00).
Since the times in the column Time refer to hours/minutes ago, the result I expect is
Time
6:00
07:47
7:00
07:36
calculated as time difference (e.g. 8:00 - 2:00).
However, I am having difficulty doing this calculation and transforming the result into a datetime (keeping only hours and minutes).
I hope you can help me.
Since the Time column contains only Hour:Minute, I suggest using timedelta instead of datetime:
df['Time'] = pd.to_timedelta(df.Time+':00')
df['Start_Time'] = pd.to_timedelta('8:00:00') - df['Time']
Output:
Time Start_Time
0 02:00:00 06:00:00
1 00:13:00 07:47:00
2 01:00:00 07:00:00
3 00:24:00 07:36:00
You can also do it using pd.to_datetime.
ref = pd.to_datetime('08:00') #here define the hour of reference
s = ref-pd.to_datetime(df['Time'])
print (s)
0 06:00:00
1 07:47:00
2 07:00:00
3 07:36:00
Name: Time, dtype: timedelta64[ns]
This returns a Series, which can be changed to a dataframe with s.to_frame(), for example.
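
For comparison, the same subtraction can be sketched without pandas, using datetime.timedelta. The helper names parse_hm and format_hm are made up for this example, and the sketch assumes every offset is no larger than the reference time:

```python
from datetime import timedelta

def parse_hm(text):
    """Turn 'H:MM' or 'HH:MM' into a timedelta (hypothetical helper)."""
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))

def format_hm(delta):
    """Format a non-negative timedelta as 'H:MM' (hypothetical helper)."""
    total_minutes = int(delta.total_seconds()) // 60
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"

reference = parse_hm("8:00")  # start time from the question
offsets = ["2:00", "00:13", "1:00", "00:24"]
results = [format_hm(reference - parse_hm(t)) for t in offsets]
print(results)
```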

Python - Local Time

I have a dataframe that has entries like this, where the times are in UTC:
start_date_time timezone
1 2017-01-01 14:00:00 America/Los_Angeles
2 2017-01-01 14:00:00 America/Denver
3 2017-01-01 14:00:00 America/Phoenix
4 2017-01-01 14:30:00 America/Los_Angeles
5 2017-01-01 14:30:00 America/Los_Angeles
I need to be able to group by date (local date, not UTC date) and I need to be able to create indicators for whether the event happened between certain times (local times, not UTC times).
I have successfully done the above in R by:
Creating a time variable in each of the timezones
Converting those to strings
Pulling each of the string date/time variables into one column, which one I pull depends on the appropriate timezone
Then, splitting that column to get a string date column and a string time column
I can then convert everything back to datetime objects for comparisons. e.g. now I can say if something happened between 2 and 3pm and it will correctly identify everything that happened between 2 and 3pm locally.
I have tried a bunch in python and have the dates as
2017-01-02 04:30:00-08:00
but I can't figure out how to go from there to
2017-01-01 20:30:00
Thanks!
Your example is incorrect. Your timezone is eight hours behind UTC, which means you need to add eight hours to 4:30 AM, giving 12:30 PM UTC.
The datetime method astimezone(...) will do the conversion for you. For ease of use, I recommend pytz.
However, in pure Python:
import datetime as dt
local_tz = dt.timezone(dt.timedelta(hours=-8))
utc = dt.timezone.utc
d = dt.datetime(2017, 1, 2, 4, 30, 0, 0, local_tz)
print(d, d.astimezone(utc))
Will print:
2017-01-02 04:30:00-08:00 2017-01-02 12:30:00+00:00
Here's an example using pytz to lookup time zones:
import datetime as dt
import pytz
dates = [("2017-01-01 14:00:00", "America/Los_Angeles"),
("2017-01-01 14:00:00", "America/Denver"),
("2017-01-01 14:00:00", "America/Phoenix"),
("2017-01-01 14:30:00", "America/Los_Angeles"),
]
for d, tz_str in dates:
    start = dt.datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    start = start.replace(tzinfo=pytz.utc)
    local_tz = pytz.timezone(tz_str)  # convert to desired timezone
    print(start, local_tz.zone, "\t", start.astimezone(local_tz))
This produces:
2017-01-01 14:00:00+00:00 America/Los_Angeles 2017-01-01 06:00:00-08:00
2017-01-01 14:00:00+00:00 America/Denver 2017-01-01 07:00:00-07:00
2017-01-01 14:00:00+00:00 America/Phoenix 2017-01-01 07:00:00-07:00
2017-01-01 14:30:00+00:00 America/Los_Angeles 2017-01-01 06:30:00-08:00
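
The question's 04:30 to 20:30 example is what you get when the stored value is treated as UTC first and then converted, instead of stamping the local offset onto it. A minimal standard-library sketch of that step, assuming a fixed -08:00 offset in place of a real timezone lookup (so no daylight-saving handling); the variable names are illustrative:

```python
from datetime import datetime, timedelta, timezone

# Value as stored in the dataframe: naive, but known to be UTC.
utc_naive = datetime(2017, 1, 2, 4, 30)

# Assumption: a fixed -08:00 offset stands in for America/Los_Angeles in winter.
pacific_standard = timezone(timedelta(hours=-8))

# Mark the value as UTC, convert it, then drop tzinfo to get local wall time.
local = utc_naive.replace(tzinfo=timezone.utc).astimezone(pacific_standard)
local_naive = local.replace(tzinfo=None)

local_date = local_naive.date()                 # local date for grouping
happened_in_window = 20 <= local_naive.hour < 21  # e.g. between 8pm and 9pm local
print(local_naive, local_date, happened_in_window)
```

With pytz, the fixed offset would be replaced by pytz.timezone(tz_str) from each row, as in the loop above.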

start end date each month in the past 12 months

This is my current code
class TimeSeries():
    def year(year):
        today = datetime.now()
        start_date = today+relativedelta(years=-1)
        mint, maxt = datetime.min.time(), datetime.max.time()
        for st in rrule(MONTHLY, count=24, bymonthday=(1,-1,), dtstart=start_date):
            yield st.combine(st, mint)
And this is output from this:
for y in TimeSeries().year():
    print(y)
2013-01-31 00:00:00
2013-02-01 00:00:00
2013-02-28 00:00:00
2013-03-01 00:00:00
2013-03-31 00:00:00
2013-04-01 00:00:00
2013-04-30 00:00:00
2013-05-01 00:00:00
2013-05-31 00:00:00
2013-06-01 00:00:00
2013-06-30 00:00:00
2013-07-01 00:00:00
2013-07-31 00:00:00
2013-08-01 00:00:00
2013-08-31 00:00:00
2013-09-01 00:00:00
2013-09-30 00:00:00
2013-10-01 00:00:00
2013-10-31 00:00:00
2013-11-01 00:00:00
2013-11-30 00:00:00
2013-12-01 00:00:00
2013-12-31 00:00:00
2014-01-01 00:00:00
The question is how I can force the counting to start from 2013-01-01 00:00:00, with each month ending like 2013-01-31 23:59:59 and so on,
and have the loop end on 2014-01-31 23:59:59 instead of 2014-01-01 00:00:00.
Also, I would like to put the start date and end date on one line:
2013-03-01 00:00:00 2013-03-31 23:59:59
2013-04-01 00:00:00 2013-04-30 23:59:59
...
...
2014-01-01 00:00:00 2014-01-31 23:59:59
Any suggestion?
First, are you really sure that you want 2013-03-31 23:59:59? Date intervals are traditionally specified as half-open intervals, just like ranges in Python. The reason is that 23:59:59 is not actually the end of a day.
Most obviously, 23:59:59.001 is later than that but on the same day. Python datetime objects include microseconds, so this isn't just a "meh, whatever" problem—if you, e.g., call now(), you can get a time that's incorrectly later than your "end of the day" on the same day.
Less obviously, on a day with a leap second, 23:59:60 is also later but on the same day.
But if you really want this, there are two obvious ways to get it:
You're already iterating dates instead of datetimes and combining the times in manually. And it's obvious when you're dealing with a day 1 vs. day -1, because the date's day member will be 1 or it won't be. So:
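from datetime import datetime
from dateutil.relativedelta import relativedelta
from dateutil.rrule import rrule, MONTHLY

class TimeSeries():
    def year(year):
        today = datetime.now()
        start_date = today+relativedelta(years=-1)
        mint, maxt = datetime.min.time(), datetime.max.time()
        for st in rrule(MONTHLY, count=24, bymonthday=(1, -1,), dtstart=start_date):
            # first of the month gets 00:00:00, last day gets the latest time
            yield st.combine(st, mint if st.day == 1 else maxt)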
Alternatively, instead of iterating both first and last days, just iterate first days, and subtract a second to get the last second of the previous month:
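from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
from dateutil.rrule import rrule, MONTHLY

class TimeSeries():
    def year(year):
        today = datetime.now()
        start_date = today+relativedelta(years=-1)
        mint, maxt = datetime.min.time(), datetime.max.time()
        for st in rrule(MONTHLY, count=24, bymonthday=(1,), dtstart=start_date):
            dt = st.combine(st, mint)
            # last second of the previous month, then the start of this one
            yield dt - timedelta(seconds=1)
            yield dt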
As far as printing these in pairs… well, as written, that's an underspecified problem. The first value in your list is the second value in a pair—except when you run this on the 1st of a month. And likewise, the last date is the first value in a pair, except when you run this on the 31st. So, what do you want to do with them?
If this isn't obvious, look at your example. Your first value is 2013-01-31 00:00:00, but your first pair doesn't start with 2013-01-31.
There are many things you could want here:
Start with the first of the month a year ago, rather than the first first-or-last of the month that happened within the last year. And likewise for the end. So you would have 2013-01-01 in your list, and there would always be pairs.
Start with the first month that started within the last year, and likewise for the end. So you wouldn't get 2013-01-31 in your list, and there would always be pairs.
Use your current rule, and where there's not a pair, use None for the missing value.
etc.
Whatever rule you actually want can be coded up pretty easily. And then you'll probably want to yield in (start, end) tuples, so the print loop can just do this:
for start, end in TimeSeries().year():
    print(start, end)
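
As one possible reading of the first option (always whole months, always pairs), here is a minimal sketch that yields (start, end) tuples using only the standard library. The function name month_ranges and its parameters are made up for illustration, and it uses the inclusive 23:59:59 end the question asks for; yielding next_start instead would give the half-open form:

```python
from datetime import datetime, timedelta

def month_ranges(first_year, first_month, count):
    """Yield (start, end) for `count` consecutive months (hypothetical helper)."""
    year, month = first_year, first_month
    for _ in range(count):
        start = datetime(year, month, 1)
        # Roll over to January of the next year after December.
        if month == 12:
            next_year, next_month = year + 1, 1
        else:
            next_year, next_month = year, month + 1
        next_start = datetime(next_year, next_month, 1)
        # Inclusive end: one second before the next month begins.
        yield start, next_start - timedelta(seconds=1)
        year, month = next_year, next_month

pairs = list(month_ranges(2013, 12, 2))
for start, end in pairs:
    print(start, end)
```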
