Really struggling with this one so any help would be much appreciated.
GOAL - work out the hours between two datetime columns, excluding weekends and counting only the hours between the working times of 9 and 17.
Now I have reused a function that I use for network days but the output is wrong and I can't seem to figure out how to get it working.
As an example, my data has a start date and end date as follows:
Start_Date = 2017-07-11 19:33:00
End_Date = 2017/07/12 12:01:00
and the output I'm after is
3.02
However the function I do have is returning 16!
Function below -
start = pd.Series(start)
end = pd.Series(end)
mask = (pd.notnull(start) & pd.notnull(end)) & (start.dt.hour >= 9) & (end.dt.hour <= 17) & (start.dt.weekday < 5) & (end.dt.weekday < 5)
result = np.empty(len(start), dtype=float)
result.fill(np.nan)
result[mask] = np.where((start[mask].dt.hour >= 9) & (end[mask].dt.hour <= 17), (end[mask] - start[mask]).astype('timedelta64[h]').astype(float), 0)
return result
It looks like what you need is businesstimedelta
import datetime
import businesstimedelta
start = datetime.datetime.strptime("2017-07-11 19:33:00", "%Y-%m-%d %H:%M:%S")
end = datetime.datetime.strptime("2017-07-12 12:01:00", "%Y-%m-%d %H:%M:%S")
# Define a working day rule
workday = businesstimedelta.WorkDayRule(
    start_time=datetime.time(9),
    end_time=datetime.time(17),
    working_days=[0, 1, 2, 3, 4])
businesshours = businesstimedelta.Rules([workday])
# Calculate the difference
diff = businesshours.difference(start, end)
print(diff)
Output:
<BusinessTimeDelta 3 hours 60 seconds>
https://pypi.org/project/businesstimedelta/
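If adding a dependency is not an option, the same idea can be sketched with the standard library alone: walk through each calendar day in the range, skip weekends, and clip the interval to the 9-17 window. This is only a rough sketch, not how businesstimedelta works internally; the function name business_hours_between and its parameters are made up for illustration, and it assumes naive datetimes with no public holidays.

```python
from datetime import datetime, time, timedelta

def business_hours_between(start, end, open_at=time(9), close_at=time(17)):
    """Hours between start and end that fall inside Mon-Fri working hours.

    Hypothetical helper: assumes naive datetimes and ignores public holidays.
    """
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            # Clip the requested interval to this day's working window
            window_start = max(start, datetime.combine(day, open_at))
            window_end = min(end, datetime.combine(day, close_at))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600

hours = business_hours_between(datetime(2017, 7, 11, 19, 33),
                               datetime(2017, 7, 12, 12, 1))
print(round(hours, 2))  # 3.02
```

For the dates in the question, nothing is counted on the 11th (19:33 is after closing time) and 09:00 to 12:01 is counted on the 12th, which gives the 3.02 the question asks for.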
I really struggled to work out how to apply the above in a function, but after much banging of the head I came up with the code below. Sharing it for the next person in my situation. I wanted the result in minutes, so if you don't need that, just remove the * 60 from the return.
import datetime
import businesstimedelta
import numpy as np
import pandas as pd
# Define a working day
workday = businesstimedelta.WorkDayRule(
    start_time=datetime.time(9),
    end_time=datetime.time(17),
    working_days=[0, 1, 2, 3, 4])
# Wrap the rule in a Rules object
businesshrs = businesstimedelta.Rules([workday])
def business_Mins(df, start, end):
    try:
        # Only compute rows where both timestamps are present
        mask = pd.notnull(df[start]) & pd.notnull(df[end])
        result = np.empty(len(df), dtype=object)
        # Note: as far as I can tell, .hours holds whole hours only, so leftover minutes are dropped here
        result[mask] = df.loc[mask].apply(lambda x: businesshrs.difference(x[start], x[end]).hours, axis=1)
        result[~mask] = np.nan
        return result * 60
    except KeyError as e:
        print(f"Error: One or more columns not found in the dataframe - {e}")
        return None
df['Contact_SLA'] = business_Mins(df, 'Date and Time of Instruction', 'Date and Time of Attempted Contact')
I have a dataframe and a function to get random dates..
from datetime import date, timedelta
import pandas as pd
import random
def dates(start_date, end_date):
    start_date = date(start_date[0], start_date[1], start_date[2])
    end_date = date(end_date[0], end_date[1], end_date[2])
    days_delta = (end_date - start_date).days
    return start_date + timedelta(days=random.randrange(days_delta))
df = pd.DataFrame(index=range(100))
df['MOVE_OUT_DATE'] = date(9999, 12, 31)
df['MOVE_IN_DATE'] = [dates((2021, 1, 1), (2021, 6, 30)) for _ in range(df.shape[0])]
To get the difference in days I do this,
df['days_diff'] = df['MOVE_OUT_DATE'] - df['MOVE_IN_DATE']
and this works fine in VS Code, but it throws a "Python int too large to convert to C long" error in Databricks. A screenshot of the error is attached below.
Any help or suggestion is appreciated. Thank you.
I was able to get everything to work, and I believe this is what you are trying to accomplish with your code:
df = pd.DataFrame(pd.date_range('2021-01-01', '2021-06-01', freq = 'D'), columns = ['START_DATE'])
df['MOVE_OUT_DATE'] = '2260-12-31'
df['START_DATE'] = pd.to_datetime(df['START_DATE'])
df['MOVE_OUT_DATE'] = pd.to_datetime(df['MOVE_OUT_DATE'])
df['DAYS_DIFF'] = df['MOVE_OUT_DATE'] - df['START_DATE']
df
However, notice that 'MOVE_OUT_DATE' is only set to 2260, as anything later than that produced an error for being too large. Could you take this and generate the results you want (if you converted it into a function)?
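For context, pandas' default nanosecond timestamps only reach as far as April 2262, which is why a 9999-12-31 placeholder overflows once it is converted. If the placeholder has to stay, one option is to do the subtraction on plain Python date objects instead. The following is a minimal sketch under that assumption; the sample move-in dates are made up for illustration.

```python
from datetime import date

# Placeholder "open-ended" move-out date, as in the question
move_out = date(9999, 12, 31)

# Hypothetical sample of move-in dates (the question generates these randomly)
move_in = [date(2021, 1, 1), date(2021, 6, 29)]

# Plain date arithmetic is not limited to the nanosecond timestamp range
days_diff = [(move_out - d).days for d in move_in]
print(days_diff)
```

The resulting list of integers can then be assigned back to the dataframe as an ordinary column, e.g. df['days_diff'] = days_diff.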
For the function below, I am inputting strings like "6/29/2020" and "8/10/2010" and I want to get the number of days after Jan. 1, 2010. For example, if I input "1/29/2010", I want the integer 29 to be returned.
Currently, I have gotten "6/29/2020" to a string "2020-06-29". Now I just need help with converting that string into the days after Jan. 1, 2010.
I feel like I have posted everything needed for you to help, but if you need more information, let me know. Thank You for helping me with this problem.
def day_conversion(dates):
    import datetime
    i = 0
    for day in dates:
        day = day.split('/')
        if len(day[0]) == 1:
            day[0] = f"0{day[0]}"
        if len(day[1]) == 1:
            day[1] = f"0{day[1]}"
        day = f"{day[2]}-{day[0]}-{day[1]}"
        # day = date.format(day)
        # from datetime import date
        # day0 = date(2000, 1, 1)
        # day = day - day0
        dates[i] = day
        i += 1
    return dates
datetime has a function for parsing dates, and subtracting two datetime objects gives a timedelta object with a .days attribute:
from datetime import datetime
def days_since_jan1_2010(date):
    dt = datetime.strptime(date, '%m/%d/%Y')
    diff = dt - datetime(2010, 1, 1)
    return diff.days
def day_conversion(dates):
    return [days_since_jan1_2010(d) for d in dates]
print(day_conversion(['6/29/2020', '8/10/2010', '1/1/2010', '1/2/2010']))
Output:
[3832, 221, 0, 1]
Everything in the previous answer is correct, but just thought I'd point out that you were very nearly there if you include the commented out part in your code above except for the following points:
from datetime import date needs to come before you try to use date.
You want date.fromisoformat, not date.format.
Your code has Jan 1 2000 but you state in your question that you want the number of days from Jan 1 2010.
If you substitute the commented part of your original code for the following four lines you should get the result you are after.
from datetime import date
day = date.fromisoformat(day)
day0 = date(2010, 1, 1)
day = (day - day0).days
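Putting those points together, the original function might end up looking something like the sketch below. It keeps the split-and-pad approach from the question and uses date.fromisoformat (available from Python 3.7); returning a new list instead of modifying the input in place is my own choice, not something the question requires.

```python
from datetime import date

def day_conversion(dates):
    """Convert 'M/D/YYYY' strings to days elapsed since Jan 1, 2010."""
    day0 = date(2010, 1, 1)
    result = []
    for raw in dates:
        month, day, year = raw.split('/')
        # Zero-pad month and day to build an ISO 'YYYY-MM-DD' string
        iso = f"{year}-{month.zfill(2)}-{day.zfill(2)}"
        result.append((date.fromisoformat(iso) - day0).days)
    return result

print(day_conversion(['6/29/2020', '1/29/2010']))  # [3832, 28]
```

Note that "1/29/2010" comes out as 28, because Jan 1 itself counts as day 0; if 29 is wanted, as in the question's example, add 1 to each result.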
I'm new to Python. After a couple days researching and trying things out, I've landed on a decent solution for creating a list of timestamps, for each hour, between two dates.
Example:
import datetime
from datetime import datetime, timedelta
timestamp_format = '%Y-%m-%dT%H:%M:%S%z'
earliest_ts_str = '2020-10-01T00:00:00Z'
earliest_ts_obj = datetime.strptime(earliest_ts_str, timestamp_format)
latest_ts_str = '2020-10-02T00:00:00Z'
latest_ts_obj = datetime.strptime(latest_ts_str, timestamp_format)
num_days = latest_ts_obj - earliest_ts_obj
num_hours = int(round(num_days.total_seconds() / 3600,0))
ts_raw = []
for ts in range(num_hours):
    ts_raw.append(latest_ts_obj - timedelta(hours = ts + 1))
dates_formatted = [d.strftime('%Y-%m-%dT%H:%M:%SZ') for d in ts_raw]
# Need timestamps in ascending order
dates_formatted.reverse()
dates_formatted
Which results in:
['2020-10-01T00:00:00Z',
'2020-10-01T01:00:00Z',
'2020-10-01T02:00:00Z',
'2020-10-01T03:00:00Z',
'2020-10-01T04:00:00Z',
'2020-10-01T05:00:00Z',
'2020-10-01T06:00:00Z',
'2020-10-01T07:00:00Z',
'2020-10-01T08:00:00Z',
'2020-10-01T09:00:00Z',
'2020-10-01T10:00:00Z',
'2020-10-01T11:00:00Z',
'2020-10-01T12:00:00Z',
'2020-10-01T13:00:00Z',
'2020-10-01T14:00:00Z',
'2020-10-01T15:00:00Z',
'2020-10-01T16:00:00Z',
'2020-10-01T17:00:00Z',
'2020-10-01T18:00:00Z',
'2020-10-01T19:00:00Z',
'2020-10-01T20:00:00Z',
'2020-10-01T21:00:00Z',
'2020-10-01T22:00:00Z',
'2020-10-01T23:00:00Z']
Problem:
If I change earliest_ts_str to include minutes, say earliest_ts_str = '2020-10-01T19:45:00Z', the resulting list does not increment the minute intervals accordingly.
Results:
['2020-10-01T20:00:00Z',
'2020-10-01T21:00:00Z',
'2020-10-01T22:00:00Z',
'2020-10-01T23:00:00Z']
I need it to be:
['2020-10-01T20:45:00Z',
'2020-10-01T21:45:00Z',
'2020-10-01T22:45:00Z',
'2020-10-01T23:45:00Z']
Feels like the problem is in the num_days and num_hours calculation, but I can't see how to fix it.
Ideas?
If you don't mind using a third-party package, have a look at pandas.date_range:
import pandas as pd
earliest, latest = '2020-10-01T15:45:00Z', '2020-10-02T00:00:00Z'
dti = pd.date_range(earliest, latest, freq='H') # just specify hourly frequency (spelled 'h' from pandas 2.2 on, where 'H' is deprecated)
l = dti.strftime('%Y-%m-%dT%H:%M:%SZ').to_list()
print(l)
# ['2020-10-01T15:45:00Z', '2020-10-01T16:45:00Z', '2020-10-01T17:45:00Z', '2020-10-01T18:45:00Z', '2020-10-01T19:45:00Z', '2020-10-01T20:45:00Z', '2020-10-01T21:45:00Z', '2020-10-01T22:45:00Z', '2020-10-01T23:45:00Z']
from datetime import datetime, timedelta
timestamp_format = '%Y-%m-%dT%H:%M:%S%z'
earliest_ts_str = '2020-10-01T00:00:00Z'
ts_obj = datetime.strptime(earliest_ts_str, timestamp_format)
latest_ts_str = '2020-10-02T00:00:00Z'
latest_ts_obj = datetime.strptime(latest_ts_str, timestamp_format)
ts_raw = []
# Step forward one hour at a time, including the end timestamp
while ts_obj <= latest_ts_obj:
    ts_raw.append(ts_obj)
    ts_obj += timedelta(hours=1)
dates_formatted = [d.strftime('%Y-%m-%dT%H:%M:%SZ') for d in ts_raw]
print(dates_formatted)
EDIT:
Here is an example with Maya:
import maya
earliest_ts_str = '2020-10-01T00:00:00Z'
latest_ts_str = '2020-10-02T00:00:00Z'
start = maya.MayaDT.from_iso8601(earliest_ts_str)
end = maya.MayaDT.from_iso8601(latest_ts_str)
# end is not included, so we add 1 second
my_range = maya.intervals(start=start, end=end.add(seconds=1), interval=60*60)
dates_formatted = [d.iso8601() for d in my_range]
print(dates_formatted)
Both output:
['2020-10-01T00:00:00Z',
'2020-10-01T01:00:00Z',
... some left out ...
'2020-10-01T23:00:00Z',
'2020-10-02T00:00:00Z']
Just change the num_hours calculation to:
num_hours = num_days.days*24 + num_days.seconds//3600
The problem is that num_days.days only holds whole days, so if the difference is not a multiple of 24 hours you get the floor value (for your example, 0). To compute the hours you therefore need both days and seconds.
Also, you can build the list directly in ascending order by stepping forward from the earliest timestamp; I am not sure whether you are building it backwards for some reason.
ts_raw.append(earliest_ts_obj + timedelta(hours = ts + 1))
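As a rough illustration of stepping forward from the earliest timestamp, here is a small standard-library sketch. The helper name hourly_after is made up, and it assumes the first wanted entry is one hour after the earliest timestamp and that the latest timestamp itself is excluded, which is what the desired output in the question suggests.

```python
from datetime import datetime, timedelta

TS_FORMAT = '%Y-%m-%dT%H:%M:%S%z'

def hourly_after(earliest_str, latest_str):
    """Hourly timestamps after earliest and before latest, keeping the minutes."""
    earliest = datetime.strptime(earliest_str, TS_FORMAT)
    latest = datetime.strptime(latest_str, TS_FORMAT)
    stamps = []
    ts = earliest + timedelta(hours=1)
    while ts < latest:
        stamps.append(ts.strftime('%Y-%m-%dT%H:%M:%SZ'))
        ts += timedelta(hours=1)
    return stamps

print(hourly_after('2020-10-01T19:45:00Z', '2020-10-02T00:00:00Z'))
# ['2020-10-01T20:45:00Z', '2020-10-01T21:45:00Z', '2020-10-01T22:45:00Z', '2020-10-01T23:45:00Z']
```

Because every value is derived by adding to the earliest timestamp, the :45 offset carries through and the list comes out already in ascending order.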
I'm using Pandas to generate a list of dates and times within a specified range in order to query an API. My aim is to query weeks or months on a per-hour basis.
time_range = pd.date_range('20180601T07:00:0000', '20180701T07:00:0000', freq='H')
time_range = time_range.strftime("%Y%m%d"+'T%H:00-0000')
yields a list of times in the desired list format. Where I'm encountering difficulty is that the URL is formatted...
startdatetime=20180601T07:00-0000&enddatetime=20180601T08:00-0000
I understand I need to start with values 0 & 1 from the Pandas list, but I don't know how to cycle through. Should I be thinking of a dictionary like...
{date1:[hour1, hour2, etc...], date2:[hour1, hour2, etc...], ...}
and use a .format where startdatetime={1}&enddatetime={2} ?
or should it be more like a for loop...
for date in date_range:
    url = 'http://somename?startdatetime={date}&enddatetime{date2}'
    urldate = url.format(date=date)
    urldate2 = url.format(date2=date + 1)
Any help is appreciated!
If I understand correctly, you want to iterate from a starting date/time (2018-06-01 07:00) to an ending date/time (2018-07-01 07:00) in steps of one hour, and produce a URL for each one-hour interval.
I don't know why you use pandas for that when you can do it with the standard library, like this:
import datetime
start = datetime.datetime(2018, 6, 1, 7)
end = datetime.datetime(2018, 7, 1, 7)
delta = datetime.timedelta(hours=1)
# The format includes the 'T' separator and the '=' after enddatetime, as in the URL shown in the question
fmt = 'http://somename?startdatetime={date1:%Y%m%dT%H:00-0000}&enddatetime={date2:%Y%m%dT%H:00-0000}'
while start < end:
    date1 = start
    date2 = start + delta
    url = fmt.format(date1=date1, date2=date2)
    print(url)
    start = date2
You get:
http://somename?startdatetime=20180601T07:00-0000&enddatetime=20180601T08:00-0000
http://somename?startdatetime=20180601T08:00-0000&enddatetime=20180601T09:00-0000
http://somename?startdatetime=20180601T09:00-0000&enddatetime=20180601T10:00-0000
http://somename?startdatetime=20180601T10:00-0000&enddatetime=20180601T11:00-0000
...
In the loop, I work with datetime instances. I use a format string, like “{date2:%Y%m%dT%H:00-0000}”, to format the date and time in the required format.
Notice that the date_range() function is easy to implement with the standard library:
def date_range(start, end, delta):
    while start < end:
        yield start
        start = start + delta
To get the list of dates with an interval of one hour, you can do:
dates = list(date_range(
    datetime.datetime(2018, 6, 1, 7),
    datetime.datetime(2018, 7, 1, 7),
    datetime.timedelta(hours=1)))
Then, the solution becomes:
fmt = 'http://somename?startdatetime={date1:%Y%m%dT%H:00-0000}&enddatetime={date2:%Y%m%dT%H:00-0000}'
for date1, date2 in zip(dates[:-1], dates[1:]):
    url = fmt.format(date1=date1, date2=date2)
    print(url)
The trick is to zip() the list of dates with a copy of itself shifted by one item, which yields the pairs of consecutive dates.
When I input a value like '2015-08', my date_range works as intended. If I use the startdate variable instead, it no longer works, and I cannot figure out why.
The error I get is "Cannot convert input to Timestamp".
Not for points. I'm a bit confused, isn't what you're doing just basically the following?
Code:
from datetime import datetime, timedelta
now = datetime.now()
print(now.strftime("%Y-%m"))
# The day before the 1st of this month is the last day of the previous month
month_ago = now.replace(day=1) - timedelta(days = 1)
print(month_ago.strftime("%Y-%m"))
months_ago = month_ago.replace(day=1) - timedelta(days = 1)
print(months_ago.strftime("%Y-%m"))
Output:
2015-11
2015-10
2015-09
The above might not be the perfect answer, but you can substitute any datetime for now and it will give you basically the current and last two months. Adjust as needed, of course.
EDIT:
You can even take it a step further and just create a function that allows you to specify the numbers of months back or use a custom date.
from datetime import datetime, timedelta
def last_n_months(num_of_months, start_date=None, include_curr=True):
    # Resolve "now" at call time; a datetime.now() default would be frozen at definition time
    if start_date is None:
        start_date = datetime.now()
    f = "%Y-%m"
    curr = start_date
    if include_curr:
        yield curr.strftime(f)
    for num in range(num_of_months):
        # Step back to the last day of the previous month
        curr = curr.replace(day=1) - timedelta(days=1)
        yield curr.strftime(f)
# This month and last 12 months.
print(list(last_n_months(12)))
# ['2015-11', '2015-10', '2015-09', '2015-08', '2015-07', '2015-06', '2015-05', '2015-04', '2015-03', '2015-02', '2015-01', '2014-12', '2014-11']
# Last 12 months only.
print(list(last_n_months(12, include_curr=False)))
# ['2015-10', '2015-09', '2015-08', '2015-07', '2015-06', '2015-05', '2015-04', '2015-03', '2015-02', '2015-01', '2014-12', '2014-11']
# Last 12 months from custom date, exclude custom date.
d = datetime(2012, 6, 1)
print(list(last_n_months(12, d, False)))
# ['2012-05', '2012-04', '2012-03', '2012-02', '2012-01', '2011-12', '2011-11', '2011-10', '2011-09', '2011-08', '2011-07', '2011-06']
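For comparison, the same labels can also be produced with plain integer arithmetic on a "months since year 0" index rather than by subtracting days. This is a different technique from the one above, shown only as a sketch; the name month_labels and its parameters are made up for illustration.

```python
from datetime import date

def month_labels(start, count, include_current=True):
    """Return 'YYYY-MM' labels for the month of start and/or the count months before it."""
    # Months elapsed since year 0, so stepping back a month is just subtracting 1
    index = start.year * 12 + (start.month - 1)
    first = 0 if include_current else 1
    labels = []
    for back in range(first, count + 1):
        year, month0 = divmod(index - back, 12)
        labels.append(f"{year:04d}-{month0 + 1:02d}")
    return labels

print(month_labels(date(2015, 11, 20), 2))  # ['2015-11', '2015-10', '2015-09']
```

Because it never touches the day of the month, this version has no month-length edge cases to think about, at the cost of being a little less obvious at first glance.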