can a pandas.Period represent an arbitrary time span? - python

Edit: the (puzzling) behavior below was for pandas 0.17.1. It appears fixed in 0.18.1.
Is there a way to represent an arbitrary time span with a pandas.Period?
Specifically, I was trying to contrive a pandas.Period() to represent an arbitrary n-day span (with the goal of making a multi-year Period).
I tried a few things, and it seems that playing with the freq argument gets me more or less what I want. However, I was surprised by the unexpected end_time of the period in the case of the freq argument having a multiplier (as in freq='2D').
import pandas as pd
p = pd.Period(1970, freq='2D')
p # Period('1970-01-01', '2D')
p.start_time # Timestamp('1970-01-01 00:00:00')
p.end_time # Timestamp('1970-01-04 23:59:59.999999999')
p.end_time - p.start_time
# Timedelta('3 days 23:59:59.999999')
Why? That's 4 days, not 2.
However:
p+1 # Period('1970-01-03', '2D')
(p+1).start_time # Timestamp('1970-01-03 00:00:00')
So, (p+1) gives me the expected (a period starting 2 days after p's start).
But what's the deal with end_time? What's the relationship between freq='nD' to actual duration in days?
def actual_span(n, unit='D'):
    p = pd.Period(1970, freq='{}{}'.format(n, unit))
    return p.end_time + pd.Timedelta(1) - p.start_time
x = pd.DataFrame({'n': range(1, 10)})
x['span'] = x.n.apply(actual_span)
print(x.set_index('n'))
# span
# n
# 1 1 days
# 2 4 days
# 3 9 days
# 4 16 days
# 5 25 days
# 6 36 days
# 7 49 days
# 8 64 days
# 9 81 days
Why is it the square of the requested number of days?
Note that (p+1).start_time is correct (gives us n days).
Small print: Python 3.5.1, pandas 0.17.1 (initially misreported here as 0.18.1).

pd.Period(1970, freq='2D') has the expected start_time and end_time for me, also using Pandas 0.18.1. Maybe try restarting your interpreter, and run the first bit of code you posted again to verify that you're still getting the unexpected output?
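One way to check this on whatever pandas version is installed is to rebuild the question's `actual_span` table. The sketch below assumes a recent pandas (0.18.1 or later, where this answer reports the expected behavior), in which an `nD` period should span n days rather than n².

```python
import pandas as pd

def actual_span(n, unit='D'):
    # Build an n-day Period starting on 1970-01-01 and measure its length;
    # end_time is the last nanosecond of the period, so add 1 ns back
    p = pd.Period('1970-01-01', freq='{}{}'.format(n, unit))
    return p.end_time + pd.Timedelta(1) - p.start_time

x = pd.DataFrame({'n': range(1, 10)})
x['span'] = x['n'].apply(actual_span)
print(x.set_index('n'))
```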

Related

Python datetime only returning negative numbers?

I have the following code:
commit_date = get_last_commit_date() # returns datetime obj: 2022-08-25 13:32:12
pr_date = datetime.fromisoformat(ISO_CODE) # returns 2022-08-24 19:15:07
Notice how commit_date is Aug. 25 and pr_date is Aug. 24. I want to find the difference in days which should be 1 day.
Now doing print((pr_date - commit_date).days) will give -1 days. So naturally I swap them and do print((commit_date - pr_date).days) but it reports 0 days? How can it be 0 days? Why isn't it reporting 1 day time difference?
This is effectively a rounding (flooring) effect: since the two timestamps are not at the same time of day, the whole-day part of the difference can land on a different day depending on the order in which you subtract them.
i.e.
2022-08-24 19:15:07 - 2022-08-25 13:32:12 = -1 day, 5:42:55
and
2022-08-25 13:32:12 - 2022-08-24 19:15:07 = 0 days, 18:17:05
The elapsed time is only 18 hours, 17 minutes and 5 seconds, which is less than one full day, so (commit_date - pr_date).days is 0. In the other order, timedelta stores the negative span as -1 day plus 5:42:55 (its seconds part is never negative), so (pr_date - commit_date).days is -1.
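In code, the asymmetry comes from how `timedelta` normalizes its fields: `seconds` always stays between 0 and 86399, and any negative part is pushed into `days`. This sketch hard-codes the two timestamps from the question in place of `get_last_commit_date()` and `ISO_CODE`.

```python
from datetime import datetime

# The two timestamps from the question, hard-coded for illustration
commit_date = datetime(2022, 8, 25, 13, 32, 12)
pr_date = datetime(2022, 8, 24, 19, 15, 7)

forward = commit_date - pr_date   # 18 h 17 min 5 s: less than one full day
backward = pr_date - commit_date  # the same span, negated

print(forward, forward.days, forward.seconds)     # 18:17:05 0 65825
print(backward, backward.days, backward.seconds)  # -1 day, 5:42:55 -1 20575
```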
If you're trying to find the difference in days, as opposed to the total difference in date and time, you want to subtract the "day" attributes of the individual datetimes rather than the entire date+time.
If you do
print((pr_date.day - commit_date.day))
>> -1
print((commit_date.day - pr_date.day))
>> 1
You get the right difference between days. If you take the absolute value, e.g.
print(abs(pr_date.day - commit_date.day))
>> 1
The order of the days doesn't matter, and you can find the difference like that.
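One caveat: subtracting `.day` only compares day-of-month numbers, so it breaks across month boundaries (August 31 to September 1 gives -30). Assuming what you want is the number of calendar days between the two dates regardless of time of day, subtracting the `.date()` parts is a more robust sketch (`calendar_day_diff` is an illustrative name):

```python
from datetime import datetime

def calendar_day_diff(later, earlier):
    # Drop the time of day and count whole calendar days between the dates
    return (later.date() - earlier.date()).days

commit_date = datetime(2022, 8, 25, 13, 32, 12)
pr_date = datetime(2022, 8, 24, 19, 15, 7)
print(calendar_day_diff(commit_date, pr_date))  # 1
print(calendar_day_diff(pr_date, commit_date))  # -1

# Across a month boundary, where subtracting .day would give -30
print(calendar_day_diff(datetime(2022, 9, 1, 8, 0), datetime(2022, 8, 31, 23, 0)))  # 1
```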

How to remove "0 days" and keep the time only in a dataframe in pandas

Hi, I'm having an issue when converting the time into timedelta. I managed to keep the time only, but it changed to an object type. Could you help me keep only the time while staying in timedelta type? Below is an example of my dataframe.
| Time | ST |S|
| ---------------| ---|-|
| 0 days 12:09:46| 33 |R|
| 0 days 12:09:51| 45 |H|
I tried the following:
n['Time'] = pd.to_timedelta(n['Time'])
# n['Time'] = pd.to_timedelta(n['Time'], unit='s') # error unit must not be specified if the input contains a str
n['Time'] = n['Time'].astype(str).str.split('0 days ').str[-1] # works, but can't resample the data
n= n.set_index('Time')#.resample('6s').mean()
It was working before; when I changed to a new laptop, "0 days" appeared.
Thanks for your time and help
You have to drop column S, which contains strings, before resampling. The "0 days" part won't affect resampling or plotting, since it is the same across all rows in the data.
n = n.drop(['S'], axis=1)
n['Time'] = pd.to_timedelta(n['Time'])
n = n.set_index('Time').resample('6s').mean()
This works fine
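A minimal, self-contained sketch of the same idea, using the two rows from the question as sample data: the `Time` column stays `timedelta64`, the string column is dropped before resampling, and "0 days" is removed only when formatting for display. The `display` variable and the HH:MM:SS formatting trick are illustrative choices, not part of the answer above.

```python
import pandas as pd

# Sample data from the question
n = pd.DataFrame({
    'Time': ['0 days 12:09:46', '0 days 12:09:51'],
    'ST': [33, 45],
    'S': ['R', 'H'],
})

n['Time'] = pd.to_timedelta(n['Time'])  # dtype: timedelta64[ns]
n = n.drop(columns=['S'])               # a string column would break .mean()
resampled = n.set_index('Time').resample('6s').mean()
print(resampled)

# For display only: render each timedelta as HH:MM:SS without "0 days"
# (assumes every value is shorter than 24 hours)
display = (n['Time'] + pd.Timestamp('1970-01-01')).dt.strftime('%H:%M:%S')
print(display.tolist())  # ['12:09:46', '12:09:51']
```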
Thanks

Formula that gives workdays given starting day of the week as number and number of days?

I'm trying to come up with a formula (no if/else, only addition, subtraction, division, multiplication and mod) to compute working days.
Workdays are days of the week 1-5
Given :
Day of the week as an integer between 1-7 (you can use 0-6 in your answer)
Number of days from day of the week between 1-n
Example input :
week_day = 1 //monday
days = 10
Should output :
workdays = 8
I believe the formula should be something around the mod operator, but not even sure where to start.
What I have so far only works if week_day < 5 :
week_day = 1
days = 16
saturday_day = 6
sunday_day = 7
saturdays = days/saturday_day
sundays = days/sunday_day
weekends = saturdays+sundays
workdays = days - weekends
I believe that to make it work, saturday_day and sunday_day need to shift forward (or backward?) based on week_day, but they both have to stay between 1 and 7; that's where mod would come in, I guess.
Here's my somewhat straightforward and rigorous solution. There could very well be an optimized way to do this:
# Determine the minimum of two integers without any branching (no `if`)
def branchless_min(x, y):
    # -(x < y) is -1 (all bits set) when x < y and 0 otherwise
    return y ^ ((x ^ y) & -(x < y))
# assuming that start is 1-7 with 1 being Monday
def compute_work_days(start, days):
    # work in a 0-based scale (0 == Monday)
    start -= 1
    # remember our original start
    orig_start = start
    # adjust count so that we assume we start on the earlier Monday and
    # end on the same day
    days += start
    # pull out full weeks, which provide 5 work days and otherwise leave the same problem
    full_weeks = days // 7  # compute number of full weeks
    days = days % 7  # take these even weeks out of the count
    work_days = full_weeks * 5  # we get 5 work days for each full week
    # what we have left is a value between 0 and 6, where the first 5 days
    # are work days, so add at most 5 days
    work_days += branchless_min(days, 5)
    # now take off the extra days we added to the count at the beginning, the
    # first 5 of which will be work days
    work_days -= branchless_min(orig_start, 5)
    return work_days
Let n denote the number of days and d the starting weekday in {1, 2, ..., 7}; then the number of workdays w may be computed as follows:
Hint: floor(x/d) is the number of multiples of d that are less than or equals to x.
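One formula consistent with this hint (not necessarily the one this answer had in mind) assumes 1 = Monday ... 7 = Sunday and that the n days include the starting day d. Among the days d, d+1, ..., d+n-1, day k is a Saturday when k+1 is a multiple of 7 and a Sunday when k is a multiple of 7, so w = n - (floor((d+n)/7) - floor(d/7)) - (floor((d+n-1)/7) - floor((d-1)/7)). The sketch below implements it with floor division and compares it with a day-by-day count:

```python
def workdays(d, n):
    # d: starting weekday, 1 = Monday ... 7 = Sunday
    # n: number of consecutive days counted, starting with day d itself
    saturdays = (d + n) // 7 - d // 7          # k in [d, d+n-1] with (k + 1) % 7 == 0
    sundays = (d + n - 1) // 7 - (d - 1) // 7  # k in [d, d+n-1] with k % 7 == 0
    return n - saturdays - sundays

def workdays_brute_force(d, n):
    # Reference implementation: walk the days one by one (0 = Mon ... 6 = Sun)
    return sum(1 for k in range(d, d + n) if (k - 1) % 7 < 5)

print(workdays(1, 10))  # 8, matching the example in the question
```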

How to get a time interval between two strings?

I have 2 times stored in separate strings in the form H:M. I need to get the difference between them and tell how many minutes it equals. I was trying datetime and timedelta, but I'm only a beginner and I don't really understand how they work. I'm getting attribute errors every time.
So I have times a and b, and I have to get their difference in minutes.
E.g. if a = 14:08 and b = 14:50, the difference should be 42.
How do I do that in python in the simplest way possible? also, in what formats do I need to use for each step?
I assume the difference is 42, not 4 (since there are 42 minutes between 14:08 and 14:50).
If the times are always 5-character strings, then it's reasonably easy.
time1 = '14:08'
time2 = '15:03'
hours = int(time2[:2]) - int(time1[:2])
mins = int(time2[3:]) - int(time1[3:])
print(hours)
print(mins)
print(hours * 60 + mins)
Prints:
1
-5
55
hours will be the integer value of the left two digits [:2] of the second time minus those of the first time
minutes will be the integer value of the right two digits [3:] of the second time minus those of the first time
This prints 55; with your values it prints 42 (the example above shows that it also works when crossing a full hour).
You can use datetime.strptime.
Also, the difference is 42, not 4 (50 - 8 == 42); I assume that was a typo.
from datetime import datetime
a,b = "14:08", "14:50"
#convert to datetime
time_a = datetime.strptime(a, "%H:%M")
time_b = datetime.strptime(b, "%H:%M")
#get timedelta from the difference of the two datetimes
delta = time_b - time_a
#get the minutes elapsed (total_seconds also counts whole hours)
minutes = int(delta.total_seconds() // 60)
print(minutes)
#42
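The same strptime approach can be wrapped in a small helper. As an assumption beyond the question, this sketch also treats a `b` earlier than `a` as crossing midnight (23:50 to 00:10 counts as 20 minutes); `minutes_between` is a hypothetical name.

```python
from datetime import datetime

def minutes_between(a, b):
    # Parse "H:M" strings; both get the same dummy date (1900-01-01)
    time_a = datetime.strptime(a, "%H:%M")
    time_b = datetime.strptime(b, "%H:%M")
    delta = time_b - time_a
    # Whole minutes elapsed; % (24 * 60) wraps negative spans past midnight
    return int(delta.total_seconds() // 60) % (24 * 60)

print(minutes_between("14:08", "14:50"))  # 42
print(minutes_between("14:08", "15:03"))  # 55
print(minutes_between("23:50", "00:10"))  # 20
```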
You can get the difference between the datetime.timedelta objects created from the given time strings a and b by subtracting the former from the latter, and use the total_seconds method to obtain the time interval in seconds, with which you can convert to minutes by dividing it by 60:
from datetime import timedelta
from operator import sub
sub(*(timedelta(**dict(zip(('hours', 'minutes'), map(int, t.split(':'))))) for t in (b, a))).total_seconds() // 60
So that given a = '29:50' and b = '30:08', this returns:
18.0

Converting Matlab's datenum format to Python

I just started moving from Matlab to Python 2.7 and I have some trouble reading my .mat-files. Time information is stored in Matlab's datenum format. For those who are not familiar with it:
A serial date number represents a calendar date as the number of days that has passed since a fixed base date. In MATLAB, serial date number 1 is January 1, 0000.
MATLAB also uses serial time to represent fractions of days beginning at midnight; for example, 6 p.m. equals 0.75 serial days. So the string '31-Oct-2003, 6:00 PM' in MATLAB is date number 731885.75.
(taken from the Matlab documentation)
I would like to convert this to Python's time format, and I found this tutorial. In short, the author states that
If you parse this using python's datetime.fromordinal(731965.04835648148) then the result might look reasonable [...]
(before any further conversions), which doesn't work for me, since datetime.fromordinal expects an integer:
>>> datetime.fromordinal(731965.04835648148)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: integer argument expected, got float
While I could just round them down for daily data, I actually need to import minutely time series. Does anyone have a solution for this problem? I would like to avoid reformatting my .mat files since there's a lot of them and my colleagues need to work with them as well.
If it helps, someone else asked for the other way round. Sadly, I'm too new to Python to really understand what is happening there.
/edit (2012-11-01): This has been fixed in the tutorial posted above.
The solution you link to has a small issue. The corrected conversion is:
from datetime import datetime, timedelta
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
a longer explanation can be found here
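A self-contained sketch of that conversion as a function (the name `matlab_to_datetime` is illustrative), checked against the example from the Matlab documentation quoted in the question, where '31-Oct-2003, 6:00 PM' is 731885.75:

```python
from datetime import datetime, timedelta

def matlab_to_datetime(matlab_datenum):
    # Whole days: Python's ordinal 1 is 1 Jan 0001, Matlab's datenum 1 is
    # 1 Jan 0000, so shift by the 366 days of (leap) year 0
    day = datetime.fromordinal(int(matlab_datenum) - 366)
    # The fractional part of the datenum is the time of day
    return day + timedelta(days=matlab_datenum % 1)

print(matlab_to_datetime(731885.75))  # 2003-10-31 18:00:00
```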
Using pandas, you can convert a whole array of datenum values with fractional parts:
import numpy as np
import pandas as pd
datenums = np.array([737125, 737124.8, 737124.6, 737124.4, 737124.2, 737124])
timestamps = pd.to_datetime(datenums-719529, unit='D')
The value 719529 is the datenum value of the Unix epoch start (1970-01-01), which is the default origin for pd.to_datetime().
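The value 719529 can also be derived in Python instead of Matlab: it is Python's ordinal for 1970-01-01 plus the 366-day offset for year 0. A sketch using the same test data, with an extra rounding step (an addition, not part of the answer) to absorb floating-point error in the fractional days:

```python
from datetime import date

import numpy as np
import pandas as pd

# datenum of 1970-01-01: Python ordinal (day 1 = 1 Jan 0001) + 366 days of year 0
UNIX_EPOCH_DATENUM = date(1970, 1, 1).toordinal() + 366
print(UNIX_EPOCH_DATENUM)  # 719529

datenums = np.array([737125, 737124.8, 737124.6, 737124.4, 737124.2, 737124])
# Fractional days carry small float errors, so round to whole seconds
timestamps = pd.to_datetime(datenums - UNIX_EPOCH_DATENUM, unit='D').round('s')
print(timestamps)
```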
I used the following Matlab code to set this up:
datenum('1970-01-01') % gives 719529
datenums = datenum('06-Mar-2018') - linspace(0,1,6) % test data
datestr(datenums) % human readable format
Just in case it's useful to others, here is a full example of loading time series data from a Matlab mat file, converting a vector of Matlab datenums to a list of datetime objects using carlosdc's answer (defined as a function), and then plotting as time series with Pandas:
from scipy.io import loadmat
import pandas as pd
import datetime as dt
import urllib
# In Matlab, I created this sample 20-day time series:
# t = datenum(2013,8,15,17,11,31) + [0:0.1:20];
# x = sin(t)
# y = cos(t)
# plot(t,x)
# datetick
# save sine.mat
urllib.urlretrieve('http://geoport.whoi.edu/data/sine.mat','sine.mat');
# If you don't use squeeze_me=True, then Pandas doesn't like
# the arrays in the dictionary, because they look like arrays
# of 1-element arrays. squeeze_me=True fixes that.
mat_dict = loadmat('sine.mat',squeeze_me=True)
# make a new dictionary with just dependent variables we want
# (we handle the time variable separately, below)
my_dict = { k: mat_dict[k] for k in ['x','y']}
def matlab2datetime(matlab_datenum):
    day = dt.datetime.fromordinal(int(matlab_datenum))
    dayfrac = dt.timedelta(days=matlab_datenum%1) - dt.timedelta(days = 366)
    return day + dayfrac
# convert Matlab variable "t" into list of python datetime objects
my_dict['date_time'] = [matlab2datetime(tval) for tval in mat_dict['t']]
# build a DataFrame indexed by the converted times
df = pd.DataFrame(my_dict)
df = df.set_index('date_time')
# print df
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 201 entries, 2013-08-15 17:11:30.999997 to 2013-09-04 17:11:30.999997
Data columns (total 2 columns):
x 201 non-null values
y 201 non-null values
dtypes: float64(2)
# plot with Pandas
df.plot()
Here's a way to convert these using numpy.datetime64, rather than datetime.
import numpy as np
# Matlab's datenum 0 is the day before 1 Jan 0000
origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
date = serdate * np.timedelta64(1, 'D') + origin
This works for serdate either a single integer or an integer array.
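For fractional datenums (e.g. minutely data), one possible extension, assuming whole-second resolution is enough, is to convert to integer seconds first and add them to an origin expressed in seconds. `matlab_to_datetime64` is an illustrative name:

```python
import numpy as np

# Day 0 in Matlab's count is the day before 1 Jan 0000
ORIGIN = np.datetime64('0000-01-01T00:00:00', 's') - np.timedelta64(1, 'D')

def matlab_to_datetime64(serdate):
    # Fractional days -> whole seconds (rounded), then offset from the origin
    seconds = np.round(np.asarray(serdate, dtype=float) * 86400).astype('int64')
    return ORIGIN + seconds * np.timedelta64(1, 's')

result = matlab_to_datetime64([731885.75, 719529])
print(result)
```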
Just building on and adding to previous comments. The key is in the day counting as carried out by the method toordinal and constructor fromordinal in the class datetime and related subclasses. For example, from the Python Library Reference for 2.7, one reads that fromordinal
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= date.max.toordinal().
However, Matlab's count still includes year 0, which Python's ordinals skip, and year 0 was a leap year, so 366 days have to be taken into account. (It is a leap year just like 2016, which is exactly 504 four-year cycles later.)
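The 366-day figure can be checked with numpy, whose `datetime64` (unlike Python's `datetime`) can represent year 0, and with `calendar.isleap`, which applies the Gregorian leap-year rule to year 0 as well:

```python
import calendar

import numpy as np

# Year 0 is divisible by 400, so it is a leap year under the Gregorian rule
print(calendar.isleap(0))  # True

# Days between Matlab's day one (1 Jan 0000) and Python's (1 Jan 0001)
offset = np.datetime64('0001-01-01', 'D') - np.datetime64('0000-01-01', 'D')
print(offset)  # 366 days
```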
These are two functions that I have been using for similar purposes:
import datetime
def datetime_pytom(d,t):
    '''
    Input
        d   Date as an instance of type datetime.date
        t   Time as an instance of type datetime.time
    Output
        The fractional day count since 0-Jan-0000 (proleptic ISO calendar)
        This is the 'datenum' datatype in matlab
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence an increase of 366 days, for year 0 AD was a leap year
    '''
    dd = d.toordinal() + 366
    tt = datetime.timedelta(hours=t.hour,minutes=t.minute,
                            seconds=t.second)
    tt = tt.total_seconds() / 86400
    return dd + tt
def datetime_mtopy(datenum):
    '''
    Input
        The fractional day count according to datenum datatype in matlab
    Output
        The date and time as a instance of type datetime in python
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence a reduction of 366 days, for year 0 AD was a leap year
    '''
    ii = datetime.datetime.fromordinal(int(datenum) - 366)
    ff = datetime.timedelta(days=datenum%1)
    return ii + ff
Hope this helps and happy to be corrected.
