Does Pandas account for leap years when calculating dates - python

I am trying to add precisely 148.328971 years to the date 01.01.2000 using pandas. I first converted this to days by multiplying it by 365. So here is my question, albeit probably a dumb one.
Does pandas consider leap years when calculating days? The obvious answer is yes, because it is calculating days, but I have to make sure: precision of dates and times is important in the analysis I am trying to do.
I get this result when calculating with pandas which I am not sure is entirely correct:
03.25.2148 1:47:09 AM
code being used:
import pandas as pd
start = "01/01/2000"
end = pd.to_datetime(start) + pd.DateOffset(days=54140.074415)
print(end)
Any help would be greatly appreciated! Sorry in advance if this seems to be basic knowledge, but I have to be certain.

Your question is flawed, as stated; a year is not a fixed length of time. So what does "precisely 148.328971 years" even mean? Do you mean 148 calendar years plus 0.328971 of the way through the next calendar year? And if you're counting to a precision of 0.000001 year (a millionth of a year, or about 30 seconds), then "01.01.2000" is a pretty imprecise starting point; from what time on January 1st, 2000 are you counting?
Let's assume you mean civil calendar years from midnight UTC. Then 148.0 years would get you to January 1, 2148, still at midnight UTC. Since 2148 is a leap year, 0.328971 of the way through it would add 0.328971 × 366 = 120.403 more days, which gets you to April 30, 2148 at 09:40 UTC.
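The calendar-year reading can be sketched in plain Python. This is a minimal illustration rather than pandas code: the helper `add_calendar_years` is a hypothetical name, and it assumes the start is midnight on January 1, so that the fractional part is a share of one whole calendar year.

```python
import calendar
from datetime import datetime, timedelta


def add_calendar_years(start: datetime, years: float) -> datetime:
    """Add whole calendar years, then the fraction as a share of that year's length.

    Assumes `start` is midnight on January 1, so the "next calendar year"
    is simply the year that the whole-year step lands in.
    """
    whole = int(years)
    fraction = years - whole
    anniversary = start.replace(year=start.year + whole)  # e.g. 2148-01-01
    days_in_year = 366 if calendar.isleap(anniversary.year) else 365
    return anniversary + timedelta(days=fraction * days_in_year)


result = add_calendar_years(datetime(2000, 1, 1), 148.328971)
print(result)  # lands on 2148-04-30 at about 09:40
```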
Maybe you mean to be counting some "year" value that really is fixed? We do that in other contexts; the light year is based on the mean Julian calendar year, and so is defined as the distance light travels in exactly 365.25 atomic days. If you mean 148.328971 of those years, that'd be 54,177.1567 days, which would get you to May 1, 2148 at 03:45 UTC instead.
But we don't use the Julian calendar anymore for civil purposes. Maybe you instead want the mean year of the Gregorian calendar, which replaced it in the West? That's exactly 365.2425 days; 148.328971 of those years is 54,176.0442 days, which, counting from 2000 January 1, brings you back to April 30, 2148, only now at 01:03 UTC.
Then again, in parts of the world where the Eastern Orthodox Church is dominant, they instead use the Revised Julian calendar, whose mean year is 365.242̅ days (exactly 365 days, 5 hours, 48 minutes, 48 seconds). 148.328971 of those years is only 54,176.0030 days, which still gets you to April 30th, but just barely over the line from the 29th, less than 5 minutes after midnight: 00:04 UTC.
So if you're counting calendar years of some description, you wind up somewhere on April 30th or May 1st, 2148. I trust this is helpful.
But maybe you mean to toss out calendars and go directly to the value they're trying to approximate: the mean tropical year! But then we have to ask where in the year you're measuring from, because December solstice to December solstice is a different length on average than March equinox to March equinox (because the length of the year itself is constantly changing). When we need a fixed value we tend to use the average of averages, as it were, taking the mean of the length values across the whole year. As of 2000 that value was about 365.24219 days. 148.328971 of those is only 54,175.9982 days, which leaves you even earlier: April 29th, 2148 at 23:57 UTC.
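The fixed-length readings above all reduce to multiplying by a different year length. Here is a short standard-library sketch, assuming (as the answer does) that counting starts at midnight UTC on 2000-01-01; it should reproduce the dates quoted for each year length.

```python
from datetime import datetime, timedelta

# Mean year lengths in days, as quoted in the answer
MEAN_YEAR_DAYS = {
    "Julian": 365.25,
    "Gregorian": 365.2425,
    "Revised Julian": 365 + 218 / 900,  # 365.2422... (218 leap days per 900 years)
    "mean tropical (c. 2000)": 365.24219,
}

start = datetime(2000, 1, 1)  # assumed: midnight UTC
years = 148.328971

for name, days_per_year in MEAN_YEAR_DAYS.items():
    total_days = years * days_per_year
    end = start + timedelta(days=total_days)  # datetime arithmetic counts leap days
    print(f"{name:>24}: {total_days:,.4f} days -> {end:%Y-%m-%d %H:%M}")
```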
Then there's the sidereal year, but even though it arguably has the greatest claim to being the "real" period of the Earth's orbit, it doesn't see much use outside astronomy; probably not that.
Anyway, the real question is: where does this "148.328971" figure come from, and what is the intent behind it? Once you know what the desired answer actually is, it will be easy enough to find its value.

Yes, it does. However, your conversion from years to days already ignores the leap years. You can multiply by 365.25 (or 365.242, as suggested in the comments), which gives better results.
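To see the leap-day handling directly, here is a minimal sketch using Python's standard `datetime` module instead of pandas (a convenience choice: both follow the proleptic Gregorian calendar, so day arithmetic should agree). It counts the leap years between 2000 and 2148 and reproduces the question's 365-day conversion.

```python
import calendar
from datetime import datetime, timedelta

start = datetime(2000, 1, 1)

# Leap years in [2000, 2148): 2000 counts (divisible by 400), 2100 does not
leap_years = sum(1 for year in range(2000, 2148) if calendar.isleap(year))
elapsed_days = (datetime(2148, 1, 1) - start).days
print(leap_years, elapsed_days, 148 * 365 + leap_years)  # 36 54056 54056

# The question's conversion (148.328971 * 365 = 54140.074415 days) stops in March
# because multiplying by 365 leaves out those 36 leap days
end = start + timedelta(days=54140.074415)
print(end)  # 2148-03-25 01:47:09.456000, matching the question's pandas result
```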
You can check the accuracy of the results on Wolfram Alpha: https://www.wolframalpha.com/input/?i=148.328971+years++from+01%2F01%2F2000
In addition, you can use pandas DateOffset with years. However, currently only integer values of years are supported.
import pandas as pd
start = "01/01/2000"
# 365.25 days = mean Julian year; 365.242 (from the comments) lands about 4 minutes earlier
end = pd.to_datetime(start) + pd.DateOffset(years=148, days=0.328971*365.25)
print(end)
# 2148-04-30 03:45:35.229600
It seems to work well, but it misses by a few hours.
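Since `DateOffset` only takes integer years, the approach above splits the figure into whole calendar years plus a fixed number of days. Below is a rough stand-in for that combination without pandas; `add_years_then_days` is a hypothetical helper, and the February 29 clipping is an assumption modeled on how `dateutil.relativedelta` behaves. With 365.25 it should match the output shown above; with 365.242 the time comes out at about 03:41 instead.

```python
import calendar
from datetime import datetime, timedelta


def add_years_then_days(start: datetime, years: float, days_per_year: float) -> datetime:
    """Add int(years) calendar years, then the remainder as fixed-length days."""
    whole = int(years)
    fraction = years - whole
    target_year = start.year + whole
    day = start.day
    # Assumption (relativedelta-style): clip Feb 29 to Feb 28 in non-leap target years
    if start.month == 2 and day == 29 and not calendar.isleap(target_year):
        day = 28
    anniversary = start.replace(year=target_year, day=day)
    return anniversary + timedelta(days=fraction * days_per_year)


print(add_years_then_days(datetime(2000, 1, 1), 148.328971, 365.25))
print(add_years_then_days(datetime(2000, 1, 1), 148.328971, 365.242))
```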

Related

Creating a day of year column overriding the leap day in a leap year

I have a large database of climate variables: daily values of temperature, humidity, etc. I have a timestamp column in %Y%m%d format. I have removed leap days, as I need a uniform 365 days in each year. I want to add a new column called 'day_of_year' running from 1 to 365 for each year, for as many years as I have in my database. How can I accomplish this in Python? Any pointers, please?
If I use the day-of-year function from pandas, I get 59 for Feb 28 and 61 for Mar 1 in leap years. Is there a way to override the leap year, since I have dropped the leap day and want 60 for Mar 1?
Use pandas' day-of-year function, but instead of giving it the real timestamp, e.g. "2022-11-27", give it "2021-" + timestamp[-5:]. This gives the day number as if the date fell in a non-leap year.
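A minimal sketch of that idea for the question's `%Y%m%d` strings (so the month and day are the last four characters rather than five). It uses only the standard library; `noleap_day_of_year` is a hypothetical name, and 2021 is just an arbitrary non-leap year, as in the answer.

```python
from datetime import datetime


def noleap_day_of_year(timestamp: str) -> int:
    """Day of year (1-365) as if every year had 365 days.

    Assumes `timestamp` is a 'YYYYMMDD' string and that Feb 29 rows have
    already been dropped (2021 has no Feb 29, so parsing one would fail).
    """
    month_day = timestamp[-4:]  # 'MMDD'
    return datetime.strptime("2021" + month_day, "%Y%m%d").timetuple().tm_yday


print(noleap_day_of_year("20200228"), noleap_day_of_year("20200301"))  # 59 60
```

With pandas this could be applied as `df["timestamp"].astype(str).map(noleap_day_of_year)` (the column name is a placeholder). A vectorized alternative that should give the same numbers is `ts.dt.dayofyear - (ts.dt.is_leap_year & (ts.dt.month > 2)).astype(int)`, where `ts` is the column parsed with `pd.to_datetime(..., format="%Y%m%d")`.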

Can the day of the month be encoded similarily to other cyclical variables?

I am working with time-series data in Python to see if variables like the time of day, the day of the month and the month of the year affect attendance at a gym. I have read up on encoding time-series data cyclically using sine and cosine. I was wondering if you can do the same thing for the day of the month. The reason I ask is that, unlike the number of months in a year or the number of days in a week, the number of days in a month is variable (for example, February has 28, whereas March has 31). Is there any way to deal with that?
Here is a link describing what I mean by cyclic encoding: https://ianlondon.github.io/blog/encoding-cyclical-features-24hour-time/
Essentially, what this is saying is that you can't just convert the hour into a series of values like 1, 2, 3, ..., 24 when you are doing machine learning, because that implies that the 24th hour is further away (from a Euclidean geometric perspective) from the 1st hour than the 1st hour is from the 2nd hour, which is not true. Cyclical encoding (assigning sine and cosine values to each hour) allows you to represent the fact that the 24th hour and the 2nd hour are equidistant from the 1st hour.
My question is that I do not know if this cyclical conversion will work for days in a month, seeing as different months can have different numbers of days.
You can implement this by mapping each month onto 2π radians; then in a 28-day month a day is about 0.2244 radians, while in a 31-day month a day is about 0.2027 radians.
This obviously introduces a skew where a shorter month appears to take up as much time as a longer one, but it satisfies your requirement. If you only use this metric to normalize a single feature, that should be inconsequential, and it lets you achieve the stated goal.
If you have points in time with a finer granularity than a day, you obviously can and probably should normalize those into the same projection.
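A sketch of the per-month scaling, assuming Python's standard library (`calendar.monthrange` supplies the month length); the helper names are hypothetical. The same `cyclical_encode` also covers the hour-of-day case from the question, with a period of 24.

```python
import calendar
import math
from typing import Tuple


def cyclical_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a position in a cycle of length `period` onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encode_day_of_month(year: int, month: int, day: int,
                        fraction_of_day: float = 0.0) -> Tuple[float, float]:
    """Encode the day of the month, scaling each month to one full 2*pi turn.

    `fraction_of_day` (0 <= f < 1) folds in finer-grained time, e.g. 0.5 for noon.
    """
    days_in_month = calendar.monthrange(year, month)[1]  # 28..31
    # Subtract 1 so the 1st of every month sits at angle 0
    return cyclical_encode(day - 1 + fraction_of_day, days_in_month)


# End of February lands right next to the start of March on the circle
feb_28 = encode_day_of_month(2021, 2, 28)
mar_1 = encode_day_of_month(2021, 3, 1)
print(math.dist(feb_28, mar_1))
```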

Pandas dataframe create Variable "Winter & rolling year" (interyears)

I am looking at a history of daily min/max temperatures for the past ~40 years of a specific city (the variable precipitation isn't needed).
I imported the CSV file with the aim of calculating an average low and high temperature for each winter (I consider November to March to be winter). So I suppose a solution could be to loop over the years and maybe create a column containing "Winter & year" (for instance, December 1, 2018 falls in winter 2018, and February 23, 2019 falls in winter 2018 too). I found plenty of examples that aggregate days/months into seasons, but none where the year changes within a season, and that is the part I struggle with.
The structure of the data is the following:
Could anyone point me in the right direction?
Many thanks
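No answer is shown here, but the labeling rule described in the question can be sketched as follows: November and December keep their own year, while January to March are assigned to the previous year. This is a plain-Python illustration with hypothetical helper names and an assumed `(date, t_min, t_max)` row layout, since the data structure was not included.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

WINTER_MONTHS = {11, 12, 1, 2, 3}  # November-March, as defined in the question


def winter_label(d: date) -> Optional[int]:
    """Return the year the winter started in, or None outside November-March."""
    if d.month not in WINTER_MONTHS:
        return None
    # January-March belong to the winter that began the previous November
    return d.year if d.month >= 11 else d.year - 1


def winter_averages(rows: Iterable[Tuple[date, float, float]]) -> Dict[int, Tuple[float, float]]:
    """Average (min, max) temperature per winter from (date, t_min, t_max) rows."""
    lows: Dict[int, List[float]] = defaultdict(list)
    highs: Dict[int, List[float]] = defaultdict(list)
    for d, t_min, t_max in rows:
        label = winter_label(d)
        if label is not None:
            lows[label].append(t_min)
            highs[label].append(t_max)
    return {w: (mean(lows[w]), mean(highs[w])) for w in sorted(lows)}


print(winter_label(date(2018, 12, 1)), winter_label(date(2019, 2, 23)))  # 2018 2018
```

In pandas the same rule could be vectorized as `df["date"].dt.year - (df["date"].dt.month <= 3).astype(int)` on the rows where `df["date"].dt.month.isin([11, 12, 1, 2, 3])`, followed by a `groupby` on that label and `.mean()`; the column names are placeholders.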

Get the dates by entering the number of days

I was experimenting with something in Python and came across an interesting problem. I would like to enter a number of days and get the earliest date and today's date of the range I specified. But I only want to count business days (excluding holidays and weekends). For example, I would like to do:
previous_days(10)
The output would give me the date that was 10 days ago and also today's date:
'2017-02-17' #10-days ago (because the 20th was President's day)
'2017-03-03' #Today's date
This is what I have been doing so far:
import time
from datetime import datetime, timedelta

todaysDate = time.strftime("%Y-%m-%d")  # outputs 2017-03-03
tenDaysAgo = datetime.strptime(todaysDate, "%Y-%m-%d").date() + timedelta(days=-12)
I had to use -12 to take the weekends into account, but the dates were still incorrect because of Presidents' Day. Is there a Python library where I can enter the number of days I want and get the dates excluding weekends and holidays? If not, is there a cleverer way to solve this?
Not enough reputation for a comment...
If DYZ's comment didn't help, take a look at dateutil.
More Dateutil examples
Other possibilities on SO here and here
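As a standard-library alternative to the libraries linked above, here is a minimal sketch that walks backward from a given date, skipping weekends and a caller-supplied holiday set. The signature differs from the question's `previous_days(10)`: `today` and `holidays` are added parameters (hypothetical), because the standard library ships no holiday calendar. The range is treated as inclusive, which matches the question's example.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def previous_days(n: int, today: date, holidays: Iterable[date] = ()) -> Tuple[date, date]:
    """Return (first_day, today) such that the inclusive range holds n business days.

    A business day is Monday-Friday and not in `holidays`; if `today` is not
    a business day itself, counting starts from the previous business day.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    holiday_set = set(holidays)
    current = today
    count = 0
    while True:
        if current.weekday() < 5 and current not in holiday_set:  # Mon=0 ... Fri=4
            count += 1
            if count == n:
                return current, today
        current -= timedelta(days=1)


# Presidents' Day 2017 supplied by hand (assumption: no holiday library in use)
first, last = previous_days(10, date(2017, 3, 3), holidays=[date(2017, 2, 20)])
print(first.isoformat(), last.isoformat())  # 2017-02-17 2017-03-03
```

`numpy.busday_offset` (with its `holidays` argument) and pandas' `CustomBusinessDay` offset provide vectorized versions of the same idea.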

Algorithm to get the current week number (not ISO)

I'm looking for an easy way to get the current week number of the year in Python. I'm well aware of the datetime.datetime.isocalendar() function in the standard library, but it follows ISO 8601, where weeks start on Monday and week 1 is the week containing the year's first Thursday. My dilemma is that I'm using Sunday as the first day of each week, and if Sunday is, for example, December 27 and January 1st falls at some point during that week, I need to represent that week as week 1 (and year 2015).
I thought of doing something like (pseudocode):
if (Jan 1) - (current_sunday) < 7 days:
    week_num = 1
And then storing that week number somewhere to iterate over next week. However, I feel that this is a very hackish method and would prefer something cleaner.
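To get the current week number (weeks starting on Sunday):
>>> import datetime
>>> import calendar
>>> today = datetime.date.today()
>>> # %U puts days before the first Sunday in week 0, so shift by 1 unless Jan 1 is a Sunday
>>> week = int(today.strftime('%U')) + (datetime.date(today.year, 1, 1).weekday() != calendar.SUNDAY)
>>> # The Sunday-to-Saturday week that contains next January 1 counts as week 1
>>> if (today + datetime.timedelta(days=(5 - today.weekday()) % 7)).year > today.year:
...     week = 1
...
>>> week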
From the documentation of strftime('%U'):
"Week number of the year (Sunday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Sunday are considered to be in week 0."
Hence the modified code for your specific requirements. There isn't really a non-hacky way to do what you want.
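An alternative that avoids `%U` altogether, sketched under the question's rules (Sunday-to-Saturday weeks, week 1 is the week containing January 1): label each week by the Saturday that ends it. The function name is hypothetical, and it also returns the week's year, since late-December days can belong to week 1 of the next year.

```python
from datetime import date, timedelta
from typing import Tuple


def sunday_week(d: date) -> Tuple[int, int]:
    """Return (year, week) for Sunday-to-Saturday weeks where week 1 contains Jan 1."""
    # weekday(): Monday=0 ... Saturday=5, Sunday=6
    saturday = d + timedelta(days=(5 - d.weekday()) % 7)
    # The week containing Jan 1 ends on Jan 1-7, so the closing Saturday's
    # year and day of year determine the label
    day_of_year = saturday.timetuple().tm_yday
    return saturday.year, (day_of_year - 1) // 7 + 1


print(sunday_week(date(2014, 12, 28)))  # (2015, 1)
print(sunday_week(date.today()))
```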
