Add time in pandas beyond year 2262 - python

I would like to add months to a date in pandas. The result may go beyond the year 2262. There is a solution for a relatively small number of months:
import numpy as np
import pandas as pd
pd.Timestamp('2018-01-22 00:00:00') + np.timedelta64(12, 'M')
which results in
Timestamp('2019-01-22 05:49:12')
However, when I add a larger number (so that the result exceeds the year 2262):
pd.Timestamp('2018-01-22 00:00:00') + np.timedelta64(3650, 'M')
Python does return a result:
Timestamp('1737-09-02 14:40:26.290448384')
How to cope with this?

datetime
pandas.Timestamp aims to handle much finer time resolution, down to the nanosecond. This precision takes up enough of the 64 bits allocated to it that it can only go up to the year 2262. However, datetime.datetime does not have this limitation and can go up to the year 9999. If you start working with datetime objects instead of Timestamp objects, you'll lose some functionality, but you will be able to go beyond 2262.
Also, your number of months went beyond the maximum number of days for a Timedelta.
Let's begin by picking a more reasonably sized number of months, such as 48 (four years).
d = pd.Timedelta(48, 'M')
And our date
t = pd.Timestamp('2018-01-22')
And a multiplier that represents how many times our 48 months fit into the desired 3650 months.
m = 3650 / 48
Then we can use to_pydatetime and to_pytimedelta
t.to_pydatetime() + d.to_pytimedelta() * m
datetime.datetime(2322, 3, 24, 14, 15, 0, 1)
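As far as I know, recent pandas releases no longer accept 'M' as a Timedelta unit, and the approach above works with average-length months anyway, which is why the result drifts away from the 22nd. If whole calendar months are what you need, a minimal standard-library sketch could look like this. The helper name add_months is made up for illustration, and clamping the day to the end of a shorter month is an assumption about the desired behaviour.

```python
import calendar
from datetime import datetime


def add_months(start, months):
    """Return start shifted by a whole number of calendar months (hypothetical helper)."""
    # Count months from year 0 so that divmod handles the carry into years.
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day, e.g. Jan 31 + 1 month becomes the last day of February.
    last_day = calendar.monthrange(year, month)[1]
    return start.replace(year=year, month=month, day=min(start.day, last_day))


print(add_months(datetime(2018, 1, 22), 3650))  # 2322-03-22 00:00:00
```

This stays within datetime.datetime, so it works up to the year 9999 and raises an error beyond that.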

Related

Listing the previous N business days

I've seen about 30 similar posts to this, but none doing exactly what I'm looking for, and some which just don't work.
I'm trying to return a list of N business dates, to then iterate through a dictionary and pull data out according to the corresponding dates.
Assuming the current date is:
refreshed = str(data['Meta Data']['3. Last Refreshed'])
For completeness, the value of the above right now is: 2020-1-30
I want to be able to calculate n days prior to this date.
I don't really want to import a bunch of funky modules, and have tried a function using a loop and datetime.date.isoweekday() - but I always come across an issue when passing refreshed in.
One of the main issues I'm seeing with some of the examples elsewhere is that they calculate the dates from datetime.date.today(). Seemingly it's fine to pass that to isoweekday(), but I can't pass refreshed to isoweekday() to calculate its day-of-week number. I've tried using strftime() to reformat the date into a suitable format for isoweekday(), but to no avail.
Subtracting days from a date
You can subtract 30 days from a datetime.datetime object by subtracting a datetime.timedelta object:
>>> import datetime
>>> datetime.datetime.today()
datetime.datetime(2020, 10, 31, 10, 20, 0, 704133)
>>> datetime.datetime.today() - datetime.timedelta(30)
datetime.datetime(2020, 10, 1, 10, 19, 49, 680385)
>>> datetime.datetime.strptime('2020-01-30', '%Y-%m-%d') - datetime.timedelta(30)
datetime.datetime(2019, 12, 31, 0, 0)
Skipping week-ends by subtracting 7 days instead of 5
We are starting from date d and want to subtract N=30 non-week-end days. A general way could be:
1. Figure out which day of the week d is;
2. Figure out how many week-ends there are between d and d-N;
3. Remove the appropriate number of days.
However, you want to subtract 30 days, and 30 is a multiple of 5. This makes things particularly easy: when you subtract 5 business days from a date that is itself a weekday, you are guaranteed to cross exactly one week-end. So you can immediately remove 7 days instead of 5.
Removing 30 days is the same as removing 6 times 5 days. So you can remove 6 times 7 days instead, which is achieved by subtracting datetime.timedelta(42) from your date.
Note: this accounts for week-ends, but not for special holidays.
Skipping week-ends iteratively
You can test for days of the week using .weekday(). This is already answered on this other question: Loop through dates except for week-ends
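The iterative idea can be sketched with the standard library only. This is a rough illustration, not the linked answer's code: the function name previous_business_days is hypothetical, and like the shortcut above it ignores public holidays. It also shows that the non-zero-padded string from the question can be parsed with strptime before calling date methods on it.

```python
import datetime


def previous_business_days(start, n):
    """Return the n weekdays (Mon-Fri) before start, most recent first."""
    days = []
    current = start
    while len(days) < n:
        current -= datetime.timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days.append(current)
    return days


# Parse the string first; a str has no weekday()/isoweekday() method.
refreshed = datetime.datetime.strptime('2020-1-30', '%Y-%m-%d').date()
print(previous_business_days(refreshed, 3))
```

For a weekday start date, the 30th entry of the returned list is the same date as start minus 42 calendar days, which matches the shortcut described earlier.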
You can add N days using a timedelta:
data['Meta Data']['3. Last Refreshed'] = pd.to_datetime(data['Meta Data']['3. Last Refreshed']) + pd.to_timedelta(4, unit="D")
Replace 4 with your n days.

Convert hours from January 1st midnight to actual date

I'm working with some telemetry that uses timestamps measured in hours since January 1st at midnight of the current year.
So I get value 1 at time 8668.12034
I'd like to convert it to a more useful date format and of course I've been doing so with hardcoded math dividing into days, remainder hours, minutes, etc accounting for leap years... and it works but I'm sure there's a simple way using the datetime library or something right?
I'm thinking timedelta is the way to go since it's giving me a delta since the beginning of the year but does that account for leap years?
Curious how others would approach this issue, thanks for any advice.
# import packages we need
import datetime
From elapsed hours to datetime.datetime object
You can for example do:
hours_elapsed = 1000
your_date = datetime.datetime(2020,1,1,0,0)+datetime.timedelta(hours=hours_elapsed)
(Of course change hours_elapsed to whatever hours elapsed in your case.)
your_date will be: datetime.datetime(2020, 2, 11, 16, 0)
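Yes, this accounts for leap years: a timedelta is just a fixed duration, but adding it to a datetime follows the real calendar, including February 29.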
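To see the leap-year handling in action, here is a small sketch using the value from the question. The function name hours_to_datetime is invented for this example, and it assumes you know which year the telemetry belongs to.

```python
from datetime import datetime, timedelta


def hours_to_datetime(year, hours_elapsed):
    """Convert hours since January 1st 00:00 of `year` into a datetime."""
    return datetime(year, 1, 1) + timedelta(hours=hours_elapsed)


# The same offset lands one day earlier in a leap year,
# because February 29 takes up one of the elapsed days.
print(hours_to_datetime(2019, 8668.12034).date())  # 2019-12-28
print(hours_to_datetime(2020, 8668.12034).date())  # 2020-12-27
```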
Further processing
If you want to process this further, you can use getattr():
timeunits = ['year', 'month', 'day', 'hour', 'minute', 'second']
[getattr(your_date,timeunit) for timeunit in timeunits]
Resulting in:
[2020, 2, 11, 16, 0, 0]

dateutil.relativedelta - How to get duration in days?

I wish to get the total duration of a relativedelta in terms of days.
Expected:
relativedelta(months=1, days=24) -> 55 days
What I tried:
relativedelta(months=1, days=24).days -> 24 (WRONG)
Is there a simple way to do this? Thanks!
This one bothered me as well. There isn't a very clean way to get the span of time in a particular unit. This is partly because of the date-range dependency on units.
relativedelta() takes an argument for months. But when you think about how long a month is, the answer is "it depends". With that said, it's technically impossible to convert a relativedelta() directly to days, without knowing which days the delta lands on.
Here is what I ended up doing.
from datetime import datetime
from dateutil.relativedelta import relativedelta
rd = relativedelta(years=3, months=7, days=19)
# I use 'now', but you may want to adjust your start and end range to a specific set of dates.
now = datetime.now()
# apply the relativedelta to a concrete date to get the other end of the span
then = now - rd
# a relativedelta only has a fixed length once it is applied to a date; 'then' is a datetime
# subtracting two datetimes gives a timedelta, which contains the value you're looking for
diff = now - then
print(diff.days)
Simple date diff does it actually.
>>> from datetime import datetime
>>> (datetime(2017, 12, 1) - datetime(2018, 1, 1)).days
-31
To get a positive number, you can swap the dates or use abs:
>>> abs((datetime(2017, 12, 1) - datetime(2018, 1, 1)).days)
31
In many situations you have a much more restricted relativedelta; in my case, my relativedelta had only relative fields set (years, months, weeks, and days) and no other field. You may be able to get away with the simple method.
This is definitely off by a few days, but it may be all you need:
(365 * duration.years) + (30 * duration.months) + (duration.days)

Converting Matlab's datenum format to Python

I just started moving from Matlab to Python 2.7 and I have some trouble reading my .mat-files. Time information is stored in Matlab's datenum format. For those who are not familiar with it:
A serial date number represents a calendar date as the number of days that has passed since a fixed base date. In MATLAB, serial date number 1 is January 1, 0000.
MATLAB also uses serial time to represent fractions of days beginning at midnight; for example, 6 p.m. equals 0.75 serial days. So the string '31-Oct-2003, 6:00 PM' in MATLAB is date number 731885.75.
(taken from the Matlab documentation)
I would like to convert this to Pythons time format and I found this tutorial. In short, the author states that
If you parse this using python's datetime.fromordinal(731965.04835648148) then the result might look reasonable [...]
(before any further conversions), which doesn't work for me, since datetime.fromordinal expects an integer:
>>> datetime.fromordinal(731965.04835648148)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: integer argument expected, got float
While I could just round them down for daily data, I actually need to import minutely time series. Does anyone have a solution for this problem? I would like to avoid reformatting my .mat files since there's a lot of them and my colleagues need to work with them as well.
If it helps, someone else asked for the other way round. Sadly, I'm too new to Python to really understand what is happening there.
/edit (2012-11-01): This has been fixed in the tutorial posted above.
The solution you link to has a small issue. It should be this:
from datetime import datetime, timedelta
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
A longer explanation can be found here.
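As a quick sanity check of the 366-day offset, the example from the Matlab documentation quoted in the question (date number 731885.75 for 31-Oct-2003, 6:00 PM) can be round-tripped. The function and constant names below are made up for this sketch; only the standard library is used.

```python
from datetime import datetime, timedelta

MATLAB_OFFSET_DAYS = 366  # MATLAB counts from year 0000, Python from year 0001


def matlab_to_datetime(datenum):
    """Convert a MATLAB serial date number to a Python datetime."""
    whole_days = int(datenum)
    fraction = datenum - whole_days
    return (datetime.fromordinal(whole_days - MATLAB_OFFSET_DAYS)
            + timedelta(days=fraction))


def datetime_to_matlab(moment):
    """Convert a Python datetime to a MATLAB serial date number."""
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    fraction = (moment - midnight).total_seconds() / 86400
    return moment.toordinal() + MATLAB_OFFSET_DAYS + fraction


print(matlab_to_datetime(731885.75))  # 2003-10-31 18:00:00
```

For arbitrary fractions, expect small floating-point errors in the sub-second part, since a double cannot represent most fractions of a day exactly.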
Using pandas, you can convert a whole array of datenum values with fractional parts:
import numpy as np
import pandas as pd
datenums = np.array([737125, 737124.8, 737124.6, 737124.4, 737124.2, 737124])
timestamps = pd.to_datetime(datenums-719529, unit='D')
The value 719529 is the datenum value of the Unix epoch start (1970-01-01), which is the default origin for pd.to_datetime().
I used the following Matlab code to set this up:
datenum('1970-01-01') % gives 719529
datenums = datenum('06-Mar-2018') - linspace(0,1,6) % test data
datestr(datenums) % human readable format
Just in case it's useful to others, here is a full example of loading time series data from a Matlab mat file, converting a vector of Matlab datenums to a list of datetime objects using carlosdc's answer (defined as a function), and then plotting as time series with Pandas:
from scipy.io import loadmat
import pandas as pd
import datetime as dt
import urllib.request
# In Matlab, I created this sample 20-day time series:
# t = datenum(2013,8,15,17,11,31) + [0:0.1:20];
# x = sin(t)
# y = cos(t)
# plot(t,x)
# datetick
# save sine.mat
urllib.request.urlretrieve('http://geoport.whoi.edu/data/sine.mat', 'sine.mat')
# If you don't use squeeze_me=True, then Pandas doesn't like
# the arrays in the dictionary, because they look like arrays
# of 1-element arrays. squeeze_me=True fixes that.
mat_dict = loadmat('sine.mat',squeeze_me=True)
# make a new dictionary with just dependent variables we want
# (we handle the time variable separately, below)
my_dict = { k: mat_dict[k] for k in ['x','y']}
def matlab2datetime(matlab_datenum):
    day = dt.datetime.fromordinal(int(matlab_datenum))
    dayfrac = dt.timedelta(days=matlab_datenum%1) - dt.timedelta(days = 366)
    return day + dayfrac
# convert Matlab variable "t" into list of python datetime objects
my_dict['date_time'] = [matlab2datetime(tval) for tval in mat_dict['t']]
# build the DataFrame and index it by time
df = pd.DataFrame(my_dict)
df = df.set_index('date_time')
# print(df) gave the following summary:
# <class 'pandas.core.frame.DataFrame'>
# DatetimeIndex: 201 entries, 2013-08-15 17:11:30.999997 to 2013-09-04 17:11:30.999997
# Data columns (total 2 columns):
# x 201 non-null values
# y 201 non-null values
# dtypes: float64(2)
# plot with Pandas
df.plot()
Here's a way to convert these using numpy.datetime64, rather than datetime.
origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
date = serdate * np.timedelta64(1, 'D') + origin
This works when serdate is either a single integer or an integer array.
Just building on and adding to previous comments. The key is in the day counting as carried out by the method toordinal and the constructor fromordinal of the class datetime and related subclasses. For example, the Python Library Reference for 2.7 says of fromordinal:
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= date.max.toordinal().
However, Matlab also counts year 0, which was a leap year in the proleptic calendar, so there are 366 extra days to take into account. (It was a leap year just like 2016, which is exactly 504 four-year cycles later.)
These are two functions that I have been using for similar purposes:
import datetime
def datetime_pytom(d,t):
    '''
    Input
        d   Date as an instance of type datetime.date
        t   Time as an instance of type datetime.time
    Output
        The fractional day count since 0-Jan-0000 (proleptic ISO calendar)
        This is the 'datenum' datatype in matlab
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence an increase of 366 days, for year 0 AD was a leap year
    '''
    dd = d.toordinal() + 366
    tt = datetime.timedelta(hours=t.hour,minutes=t.minute,
                            seconds=t.second)
    tt = tt.total_seconds() / 86400
    return dd + tt
def datetime_mtopy(datenum):
    '''
    Input
        The fractional day count according to datenum datatype in matlab
    Output
        The date and time as an instance of type datetime in python
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence a reduction of 366 days, for year 0 AD was a leap year
    '''
    ii = datetime.datetime.fromordinal(int(datenum) - 366)
    ff = datetime.timedelta(days=datenum%1)
    return ii + ff
Hope this helps and happy to be corrected.

"Passing Go" in a (python) date range

Updated to remove extraneous text and ambiguity.
The Rules:
An employee accrues 8 hours of Paid Time Off on the day after each quarter. Quarters, specifically being:
Jan 1 - Mar 31
Apr 1 - Jun 30
Jul 1 - Sep 30
Oct 1 - Dec 31
The Problem
Using python, I need to define the guts of the following function:
def accrued_hours_between(start_date, end_date):
    # stuff
    return integer
I'm currently using Python, and wondering what the correct approach to something like this would be.
I'm assuming that using DateTime objects, and possibly the dateutil module, would help here, but my brain isn't wrapping around this problem for some reason.
Update
I imagine the calculation being somewhat simple, as the problem is:
"How many hours of Paid Time Off are accrued from start_date to end_date?" given the above "rules".
The OP's edit mentions the real underlying problem is:
"How many hours of Paid Time Off are accrued from X-date to Y-date?"
I agree, and I'd compute that in the most direct and straightforward way, e.g.:
import datetime
import itertools
accrual_months_days = (1,1), (4,1), (7,1), (10,1)
def accruals(begin_date, end_date, hours_per=8):
    """Vacation accrued between begin_date and end_date included."""
    cur_year = begin_date.year - 1
    result = 0
    # walk the quarter starts in order, year after year
    for m, d in itertools.cycle(accrual_months_days):
        if m == 1: cur_year += 1
        accrual_date = datetime.date(cur_year, m, d)
        if accrual_date < begin_date: continue
        if accrual_date > end_date: return result
        result += hours_per
if __name__ == '__main__': # examples
    print(accruals(datetime.date(2010, 1, 12), datetime.date(2010, 9, 20)))
    print(accruals(datetime.date(2010, 4, 20), datetime.date(2012, 12, 21)))
    print(accruals(datetime.date(2010, 12, 21), datetime.date(2012, 4, 20)))
A direct formula would of course be faster, but could be tricky to do it without bugs -- if nothing else, this "correct by inspection" example can serve to calibrate the faster one automatically, by checking that they agree over a large sample of date pairs (be sure to include in the latter all corner cases such as first and last days of quarters of course).
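The calibration idea could be sketched roughly as follows. Everything here is illustrative rather than taken from the answers: the function names are hypothetical, the brute-force counter is a simplified stand-in for the loop above, and the formula assumes that an accrual falling exactly on begin_date or end_date counts.

```python
import datetime

ACCRUAL_MONTHS = (1, 4, 7, 10)  # first month of each quarter


def accruals_by_iteration(begin_date, end_date, hours_per=8):
    """Reference version: test every quarter start in the covered years."""
    result = 0
    for year in range(begin_date.year, end_date.year + 1):
        for month in ACCRUAL_MONTHS:
            if begin_date <= datetime.date(year, month, 1) <= end_date:
                result += hours_per
    return result


def quarter_index(day):
    """Number the quarters consecutively across years."""
    return 4 * day.year + (day.month - 1) // 3


def accruals_by_formula(begin_date, end_date, hours_per=8):
    """Direct version: quarter starts in [begin_date, end_date]."""
    # Stepping back one day makes an accrual on begin_date itself count.
    previous_day = begin_date - datetime.timedelta(days=1)
    crossed = quarter_index(end_date) - quarter_index(previous_day)
    return max(0, crossed) * hours_per


def disagreements(dates):
    """Return every (begin, end) pair on which the two versions differ."""
    return [(b, e) for b in dates for e in dates
            if accruals_by_iteration(b, e) != accruals_by_formula(b, e)]


# Sample includes the first and last days of quarters as corner cases.
corner_cases = [datetime.date(2010, 3, 31), datetime.date(2010, 4, 1),
                datetime.date(2010, 12, 31), datetime.date(2011, 1, 1)]
print(disagreements(corner_cases))  # []
```

An empty list means the two versions agree on every pair in the sample; a larger sample of dates can be passed in the same way.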
I would sort all the events for a particular employee in time order and simulate the events in that order checking that the available days of paid time off never falls below zero. A paid time off request is an event with a value -(number of hours). Jan 1st has an event with value +8 hours.
Every time a modification is made to the data, run the simulation again from the start.
The advantage of this method is that it will detect situations in which a new event is valid at that time but causes the number of free days to drop such that a later event which previously was valid now becomes invalid.
This could be optimized by storing intermediate results in a cache but since you will likely only have a few hundred events per employee this optimization probably won't be necessary.
This can be done with plain old integer math:
from datetime import date
def hours_accrued(start, end):
    '''hours_accrued(date, date) -> int
    Answers the question "How many hours of Paid Time Off
    are accrued from X-date to Y-date?"
    >>> hours_accrued(date(2010, 4, 20), date(2012, 12, 21))
    80
    >>> hours_accrued(date(2010, 12, 21), date(2012, 4, 20))
    48
    '''
    # floor division keeps the quarter numbers integral on Python 3 as well
    return ( 4*(end.year - start.year)
             + ((end.month-1)//3 - (start.month-1)//3) ) * 8
I would count all free days before the date in question, then subtract the number of used days before then in order to come to a value for the maximum number of allowable days.
Set up a tuple for each date range (we'll call them quarters). In the tuple store the quarter (as a cardinal index, or as a begin date), the maximum accrued hours for a quarter, and the number of used hours in a quarter. You'll want to have a set of tuples that are sorted for this to work, so a plain list probably isn't your best option. A dictionary might be a better way to approach this, with the quarter as the key and the max/used entries returned in the tuple, as it can be "sorted".
(Note: I looked at the original explanation and rewrote my answer)
Get a copy of the set of all quarters for a given employee, sorted by the quarter's date. Iterate over each quarter summing the difference between the maximum per-quarter allotment of vacation time and the time "spent" on that quarter until you reach the quarter that the request date falls into. This gives accumulated time.
If the accumulated time plus the time allotted for the requested quarter is less than the requested hours, fail immediately and reject the request. Otherwise, continue iterating up to the quarter of your request.
If there is sufficient accumulated time, continue iterating over the copied set, computing the new available times on a per-quarter basis, starting with the left-over time from your initial calculation.
If any quarter has a computed time falling below zero, fail immediately and reject the request. Otherwise, continue until you run out of quarters.
If all quarters are computed, update the original set of data with the copy and grant the request.
