Pandas week property result not as expected [duplicate] - python

The code below returns 52 52: how come?
import pandas as pd
ts = pd.Timestamp('01-01-2017 12:00:00')
print(ts.weekofyear, ts.week)

This is correct: pandas follows the ISO week date system, in which Sunday 1 January 2017 still belongs to week 52 of ISO year 2016.
Last week
The last week of the ISO week-numbering year, i.e. the 52nd or 53rd one, is the week before week 01. This week’s properties are:
It has the year's last Thursday in it.
It is the last week with a majority (4 or more) of its days in December.
Its middle day, Thursday, falls in the ending year.
Its last day is the Sunday nearest to 31 December.
It has 28 December in it. Hence the earliest possible last week extends from Monday 22 December to Sunday 28 December, and the latest possible last week extends from Monday 28 December to Sunday 3 January (of the next Gregorian year).
If 31 December is on a Monday, Tuesday or Wednesday, it is in week 01 of the next year. If it is on a Thursday, it is in week 53 of the year just ending; if on a Friday it is in week 52 (or 53 if the year just ending is a leap year); if on a Saturday or Sunday, it is in week 52 of the year just ending.
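As a quick check of the rules above, the standard library's date.isocalendar() reports the ISO year alongside the week number, which makes the "week 52 on 1 January" result easier to read. A minimal sketch using only datetime (pandas is not needed for the check):

```python
from datetime import date

# 1 January 2017 was a Sunday, so it closes the last ISO week of 2016.
iso_year, iso_week, iso_weekday = date(2017, 1, 1).isocalendar()
print(iso_year, iso_week, iso_weekday)  # 2016 52 7

# The first ISO week of 2017 starts on Monday 2 January.
first_monday = tuple(date(2017, 1, 2).isocalendar())
print(first_monday)  # (2017, 1, 1)
```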

Related

How to select a subset of rows in pandas with a certain starting value and certain ending value

In pandas, it's possible to return a subset of rows by slicing, like this:
df[:6]
which would with the dataset I'm using return:
weekday CO_level ...
0 Monday Very high
1 Tuesday Low
2 Wednesday Low
3 Saturday Medium
4 Sunday High
5 Thursday Low
I did a bit of data cleaning and removed all rows with null values, so some weekdays are now missing from the rows. However, I want to visualize the CO_level for one entire week, Monday to Sunday.
My question is: how can I go through the rows and return the first instance, or all instances (it doesn't really matter), of 7 consecutive rows with the values Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday?
So that it could look something like this:
weekday CO_level ...
345 Monday Very high
346 Tuesday Low
347 Wednesday Low
348 Thursday Medium
349 Friday High
350 Saturday Low
351 Sunday Low
#Yefet's answer looks good. Here's a different approach:
days = ['Monday',
        'Tuesday',
        'Wednesday',
        'Thursday',
        'Friday',
        'Saturday',
        'Sunday']
# slide a 7-row window over the frame and stop at the first full week
for i in range(len(df)):
    test_days = df['weekday'][i:i+7].to_list()
    if test_days == days:
        week_df = df.iloc[i:i+7, :]
        break
Another option: get the weekday offset for each day and extract seven days at a time. Calculate the span between the start and end dates in days, and keep the window if it covers seven days. Otherwise, advance the start date by one day and repeat the process.
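The sliding-window search can also be wrapped in a small function that returns every matching start position. This is only a sketch: it works on a plain list (for example df['weekday'].tolist()), and the names find_full_weeks and sample are made up for illustration.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def find_full_weeks(weekdays):
    """Return the start positions of every Monday-to-Sunday run in a list."""
    starts = []
    # Stop early enough that every window holds exactly 7 entries.
    for i in range(len(weekdays) - len(DAYS) + 1):
        if weekdays[i:i + len(DAYS)] == DAYS:
            starts.append(i)
    return starts

# Hypothetical sample: an incomplete week followed by a complete one.
sample = ["Monday", "Tuesday", "Saturday", "Sunday"] + DAYS + ["Monday"]
print(find_full_weeks(sample))  # [4]
```

Each returned position can then be used with df.iloc[start:start + 7] to pull the matching rows.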

Calculate date from given weekday and month, but variable year (e.g. 3rd Sunday in August in "x" year)

My pandas dataframe "MSYs" has a "start_yr" variable built from a datetime column "Start Date" showing the year of someone's start date (note that month and day of "Start Date" also vary).
start_yr = pd.DatetimeIndex(MSYs['Start Date']).year
I want to use start_yr to help me return a datetime date in another column "Grant Start" showing the third Sunday in August of that variable year. I am stumped.
This is an answer to a similar question which might help you:
Use the datetime library.
Loop through a subset of the days in August of that year.
Check whether each day is a Sunday.
Python: third Friday of a month
Here is a solution based on one of the answers in that thread. It is a generalized solution so you should be able to pick a month, a day of the week and the number in the month you want and get that date.
Note: weekdays are 0-indexed starting at Monday, so Monday's index is 0 and Sunday's index is 6. When you pass day_of_week into this function, make sure you choose a number between 0 and 6.
I have defaulted it to choose the 3rd Sunday of the month given the year.
import datetime as dt
def get_year_day(year, month=8, day_of_week=6, num_in_month=3):
    ## set up possible ranges
    range_1 = 7*(num_in_month-1)+1
    range_2 = 7*num_in_month+1
    ## loop through possible range in the year and the month
    for i in range(range_1, range_2):
        date = dt.datetime(year=year, month=month, day=i)
        ## if we have our weekday we can break
        if date.weekday() == day_of_week:
            break
    return date

for i in range(2015, 2021):
    print(i, get_year_day(i))
2015 2015-08-16 00:00:00
2016 2016-08-21 00:00:00
2017 2017-08-20 00:00:00
2018 2018-08-19 00:00:00
2019 2019-08-18 00:00:00
2020 2020-08-16 00:00:00
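The same date can be computed without a loop by measuring how far the first day of the month is from the requested weekday. The following is a sketch of that arithmetic, not the linked answer's method; the function name nth_weekday is made up for illustration.

```python
import datetime as dt

def nth_weekday(year, month=8, day_of_week=6, n=3):
    """Date of the n-th given weekday (Monday=0 ... Sunday=6) in a month."""
    first = dt.date(year, month, 1)
    # Days from the 1st of the month to the first requested weekday.
    offset = (day_of_week - first.weekday()) % 7
    # dt.date raises ValueError if the month has no n-th occurrence.
    return dt.date(year, month, 1 + offset + 7 * (n - 1))

print(nth_weekday(2015))  # 2015-08-16
print(nth_weekday(2016))  # 2016-08-21
```

Assuming start_yr holds plain integer years as in the question, something like [nth_weekday(y) for y in start_yr] would give the values for the "Grant Start" column.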

Getting conflicting information from datetime isocalendar

I have a list that gives me the weekday (1-7 as if Monday-Sunday) and the week number for a given year. I need to convert that info to a given date.
So, I wrote a simple script that takes three arguments:
1. year
2. weekday
3. weeknumber
and then finds out the date.
My script basically iterates over all days of the given year, and creates a datetime object where I then extract isocalendar()[1] to compare it to the weeknumber.
I found that if I give the input 2017 7 52 I get two outputs!
In its most basic essence this is what happens:
#!/Library/Frameworks/Python.framework/Versions/3.7/bin/python3
import datetime
def print_dt(year, month, day):
    dt = datetime.date(year, month, day)
    print("%d-%d-%d (%d) -> week# %d" % (year, month, day, dt.weekday(), dt.isocalendar()[1]))

print_dt(2017, 1, 1)
print_dt(2017, 12, 31)
And the output is the same:
Anibals-iMac:RPEG anibal$ ./findDate-fixed-date.py
2017-1-1 (6) -> week# 52
2017-12-31 (6) -> week# 52
How's that possible? That would mean that week 52 in year 2017 has two day #6, i.e., two different Sundays. This situation is causing problems for my script.
Any idea on how to get around this?
My original problem is that I have events given in YYYYMMDD and I need to group them by week# by year. So, that I can say that X number of events occurred on week#4 of year 2017. With the situation above it doesn't work when it comes down to week 52 since there's more than one solution to a YYYYMMDD.
You are mixing up ISO calendar years and Gregorian calendar years.
The date 2017-1-1 is day 7 of week 52 of year 2016 in the ISO calendar.
The ISO calendar defines the first week of an ISO calendar year to be the one containing the first Thursday of the corresponding Gregorian calendar year. This could be anywhere from 1st to 7th January. As ISO numbers days with Monday = 1 to Sunday = 7, this means that for up to three days around each New Year the Gregorian calendar year and ISO calendar year of a date do not agree.
January 1st 2015 fell on a Thursday, so the Monday, Tuesday and Wednesday before it have ISO calendar year 2015 despite being in December 2014. Similarly, January 7th 2016 fell on a Thursday, and Friday January 1st to Sunday January 3rd 2016 have ISO calendar year 2015 despite being in 2016.
Your script appears to be taking a year in the Gregorian calendar, iterating through all days of this Gregorian calendar year and looking for the day with the matching day of week and ISO week number. What you have found out is that the three values (Gregorian year, ISO week number, ISO day of week) do not uniquely identify a date. Your script needs to take into account the fact that ISO calendar years and Gregorian calendar years do not always agree, and match on the ISO calendar year instead of the Gregorian calendar year. One way to do this is to:
include the last three days of the previous Gregorian calendar year and the first three days of the next Gregorian calendar year in the range of dates you search through, and
as well as matching on ISO week number and ISO day of week, ensure that the ISO year matches too. The ISO year is in dt.isocalendar()[0].
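A sketch of matching on the ISO year, under two assumptions: Python 3.8 or later (for date.fromisocalendar) and events supplied as 'YYYYMMDD' strings. The helper iso_week_key and the sample events are made up for illustration.

```python
from collections import Counter
from datetime import date, datetime

# Python 3.8+: convert (ISO year, ISO week, ISO weekday) straight to a date.
print(date.fromisocalendar(2017, 52, 7))  # 2017-12-31
print(date.fromisocalendar(2016, 52, 7))  # 2017-01-01

def iso_week_key(yyyymmdd):
    """Map a 'YYYYMMDD' string to an unambiguous (ISO year, ISO week) key."""
    d = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    iso_year, iso_week, _ = d.isocalendar()
    return (iso_year, iso_week)

# Hypothetical event dates, for illustration only.
events = ["20170101", "20171231", "20171230"]
counts = Counter(iso_week_key(e) for e in events)
print(counts[(2016, 52)], counts[(2017, 52)])  # 1 2
```

Grouping on the (ISO year, ISO week) pair keeps the two "week 52" Sundays apart, because 1 January 2017 is counted under ISO year 2016.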
Or, as an alternative, you could avoid the ISO calendar system altogether and instead consider something like the following:
import datetime

def get_week_and_day(year, month, day):
    wday_of_jan_1 = datetime.date(year, 1, 1).timetuple().tm_wday
    daytuple = datetime.date(year, month, day).timetuple()
    wday = daytuple.tm_wday
    week = (daytuple.tm_yday - 1 + wday_of_jan_1) // 7 + 1
    return (week, wday)
Given a year, month and day this will return the week-of-the-year and day-of-week, with weeks starting on Monday (as per time.struct_time) and week 1 being the week that January 1 falls in. Weeks normally go up to 53 but if December 31 of a leap year falls on a Monday (as it did in 2012) this day will have week number 54.
This works by using the date.timetuple() method to get the weekday and day-of-year of a date, plus also the day-of-the-week of January 1 of that year. In the calculation of week, we:
Subtract 1 from the day-of-the-year of the given date (daytuple.tm_yday), so that January 1 is 0, January 2 is 1 and so on.
Add to this the day-of-the-week of January 1. We do this because the day-of-the-week of January 1 is also the number of days in the first week of the year that are 'lost' to the previous year. For example, if January 1 falls on a Wednesday, wday_of_jan_1 will be 2, the Monday and Tuesday before it will be missing from week 1 and hence the first week will only have 5 days in it.
The calculations so far give us number of days between the given date and the first Monday on or before January 1. We can then divide this by 7 to get the number of whole weeks since this Monday, and finally add 1 so that January 1 is in week 1 rather than week 0.
This approach also avoids looping over an entire year's worth of dates and performing calculations on them.
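To make the arithmetic concrete, here is the same function repeated with a few worked cases, so that the block runs on its own. The dates were picked to match the examples in the explanation: a year starting on a Wednesday, and the week-54 edge case.

```python
import datetime

def get_week_and_day(year, month, day):
    wday_of_jan_1 = datetime.date(year, 1, 1).timetuple().tm_wday
    daytuple = datetime.date(year, month, day).timetuple()
    wday = daytuple.tm_wday
    week = (daytuple.tm_yday - 1 + wday_of_jan_1) // 7 + 1
    return (week, wday)

# 1 January 2014 was a Wednesday: (1 - 1 + 2) // 7 + 1 = 1
print(get_week_and_day(2014, 1, 1))    # (1, 2)
# Monday 6 January 2014 starts week 2: (6 - 1 + 2) // 7 + 1 = 2
print(get_week_and_day(2014, 1, 6))    # (2, 0)
# Leap year 2012 began on a Sunday: (366 - 1 + 6) // 7 + 1 = 54
print(get_week_and_day(2012, 12, 31))  # (54, 0)
```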

The number of calendar weeks in a year?

In Python, how can we find out the number of calendar weeks in a year?
I didn't find a function in the standard library.
I then thought about date(year, 12, 31).isocalendar()[1], but
For example, 2004 begins on a Thursday, so the first week of ISO year 2004 begins on Monday, 29 Dec 2003 and ends on Sunday, 4 Jan 2004, so that date(2003, 12, 29).isocalendar() == (2004, 1, 1) and date(2004, 1, 4).isocalendar() == (2004, 1, 7).
According to the same ISO specification, January 4th is always going to be week 1 of a given year. By the same calculation, the 28th of December is then always in the last week of the year. You can use that to find the last week number of a given year:
from datetime import date

def weeks_for_year(year):
    last_week = date(year, 12, 28)
    return last_week.isocalendar()[1]
Also see Wikipedia, the ISO week article lists all properties of the last week:
It has the year's last Thursday in it.
It is the last week with a majority (4 or more) of its days in December.
Its middle day, Thursday, falls in the ending year.
Its last day is the Sunday nearest to 31 December.
It has 28 December in it. Hence the latest possible dates are 28 December through 3 January, the earliest 22 through 28 December.
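A short check of the 28 December rule, which also shows why 31 December is not a reliable probe. This is a sketch using only the standard library; the years were chosen as examples of a 52-week year and two 53-week years.

```python
from datetime import date

def weeks_for_year(year):
    """Number of ISO weeks in `year`: the ISO week number of 28 December."""
    return date(year, 12, 28).isocalendar()[1]

# 31 December can already belong to week 1 of the next ISO year.
print(tuple(date(2014, 12, 31).isocalendar()))  # (2015, 1, 3)

print(weeks_for_year(2014))  # 52
print(weeks_for_year(2015))  # 53
print(weeks_for_year(2020))  # 53
```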
For more comprehensive week calculations, you could use the isoweek module; it has a Week.last_week_of_year() class method:
>>> import isoweek
>>> isoweek.Week.last_week_of_year(2014)
isoweek.Week(2014, 52)
>>> isoweek.Week.last_week_of_year(2014).week
52
You're almost there: take the date of 28 December instead. If there is a Monday after that date in December, the week it starts has at most 3 days in the old year and hence is week 1 of the new year.
I think Martijn Pieters has a great solution. The only drawback is that you have to install the module, which is overkill if your ONLY use case is getting the week number. Here is something you can do without installing any modules. (FYI, this was tested with Python 3.8.)
#!/usr/bin/env python3.8
from datetime import timedelta, datetime
# change your_year to the year whose last week number you want
your_year = 2020
# 1 January of the following year
next_year_date = datetime(your_year + 1, 1, 1)
# subtracting 4 days gives 28 December of your_year, which per the
# ISO spec always falls in the last ISO week of that year
last_day = next_year_date - timedelta(days=4)
print(last_day)
# the ISO week number of that date is the number of weeks in the year
print(last_day.isocalendar()[1])
