I'm new to pandas and am trying to do an aggregation. I converted the DataFrame's date column to datetime format and added an index that changes every day.
model['time_only'] = [time.time() for time in model['date']]
model['date_only'] = [date.date() for date in model['date']]
model['cumsum'] = ((model['date_only'].diff() == datetime.timedelta(days=1))*1).cumsum()
def get_out_of_market_data(data):
    df = data.copy()
    start_market_time = datetime.time(hour=13,minute=30)
    end_market_time = datetime.time(hour=20,minute=0)
    df['time_only'] = [time.time() for time in df['date']]
    df['date_only'] = [date.date() for date in df['date']]
    cond = (start_market_time > df['time_only']) | (df['time_only'] >= end_market_time)
    return data[cond]
model['date'] = pd.to_datetime(model['date'])
new = model.drop(columns=['time_only', 'date_only'])
get_out_of_market_data(data=new).head(20)
What I get:
0 0 65.5000 65.50 65.5000 65.500 DD 1 125 65.500000 2016-01-04 13:15:00 0
26 26 62.7438 62.96 62.6600 62.956 DD 1639 174595 62.781548 2016-01-04 20:00:00 0
27 27 62.5900 62.79 62.5300 62.747 DD 2113 268680 62.650260 2016-01-04 20:15:00 0
28 28 62.7950 62.80 62.5400 62.590 DD 2652 340801 62.652640 2016-01-04 20:30:00 0
29 29 63.1000 63.12 62.7800 62.800 DD 6284 725952 62.963512 2016-01-04 20:45:00 0
30 30 63.2200 63.22 63.0700 63.080 DD 21 699881 63.070114 2016-01-04 21:00:00 0
31 31 63.2200 63.22 63.2200 63.220 DD 7 1973 63.220000 2016-01-04 22:00:00 0
32 32 63.4000 63.40 63.4000 63.400 DD 2 150 63.400000 2016-01-05 00:30:00 1
33 33 62.3700 62.37 62.3700 62.370 DD 3 350 62.370000 2016-01-05 11:00:00 1
34 34 62.1000 62.37 62.1000 62.370 DD 2 300 62.280000 2016-01-05 11:15:00 1
35 35 62.0800 62.08 62.0800 62.080 DD 1 100 62.080000 2016-01-05 11:45:00 1
The last two columns are the timestamp, restricted to the out-of-market interval from 20:00 to 13:30, and an index that increments each time the day changes.
I tried to group by the last column so that each interval from 20:00 one day to 13:30 the next gets its own index through groupby.
I do not fully understand the method, but for example:
new.groupby(pd.Grouper(freq='17hours'))
How can I move the indexing so that it follows this interval?
You could try creating a new column that represents the market day each row belongs to. If the time is earlier than 13:30:00, the row belongs to yesterday's market day; otherwise it belongs to today's market day. Then you can group by that column. The code will be:
def get_market_day(dt):
    if dt.time() < datetime.time(13, 30, 0):
        return dt.date() - datetime.timedelta(days=1)
    else:
        return dt.date()
df["market_day"] = df["dt"].map(get_market_day)
df.groupby("market_day").agg(...)
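As a rough, self-contained sketch of the idea above: the sample rows, the `volume` column and the choice of `sum`/`count` as aggregations are made up for illustration, and the timestamp column is assumed to be called `date` as in the question.

```python
import datetime

import pandas as pd

MARKET_OPEN = datetime.time(13, 30)  # market open time taken from the question


def get_market_day(ts):
    """Rows before the open are assigned to the previous calendar day."""
    if ts.time() < MARKET_OPEN:
        return ts.date() - datetime.timedelta(days=1)
    return ts.date()


# Hypothetical sample data, only to show the grouping.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-01-04 20:15:00", "2016-01-04 22:00:00",
        "2016-01-05 00:30:00", "2016-01-05 11:45:00",
        "2016-01-05 20:00:00",
    ]),
    "volume": [2113, 7, 2, 1, 10],
})

df["market_day"] = df["date"].map(get_market_day)
summary = df.groupby("market_day")["volume"].agg(["sum", "count"])
print(summary)
```

The first four rows end up in the 2016-01-04 group even though two of them are stamped 2016-01-05, because they fall before the 13:30 open.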
I have data spanning multiple years, like this:
time value
0 2015-01-01 0.982295
1 2015-02-01 3.283557
2 2015-03-01 2.665395
3 2015-04-01 3.124564
4 2015-05-01 4.747362
5 2015-06-01 4.436057
6 2015-07-01 3.925824
7 2015-08-01 4.772219
8 2015-09-01 5.313609
9 2015-10-01 6.213427
10 2015-11-01 6.870897
11 2015-12-01 8.130550
12 2016-01-01 1.984611
13 2016-02-01 1.782809
14 2016-03-01 2.904271
15 2016-04-01 3.029645
16 2016-05-01 3.810806
17 2016-06-01 4.365906
18 2016-07-01 3.922678
19 2016-08-01 4.354115
20 2016-09-01 5.376155
21 2016-10-01 7.290028
22 2016-11-01 6.504523
23 2016-12-01 7.689338
24 2017-01-01 2.158096
25 2017-02-01 1.983260
26 2017-03-01 3.774609
27 2017-04-01 3.570528
28 2017-05-01 3.283161
29 2017-06-01 3.834184
30 2017-07-01 4.388914
31 2017-08-01 5.035261
32 2017-09-01 4.844120
33 2017-10-01 6.206708
34 2017-11-01 6.198993
35 2017-12-01 7.220857
36 2018-01-01 1.346803
37 2018-02-01 2.361194
38 2018-03-01 3.478777
39 2018-04-01 4.093510
40 2018-05-01 3.730770
41 2018-06-01 3.612807
42 2018-07-01 5.524375
43 2018-08-01 5.604300
44 2018-09-01 6.412848
45 2018-10-01 5.463882
46 2018-11-01 6.224526
47 2018-12-01 7.082455
48 2019-01-01 0.893474
49 2019-02-01 1.393201
50 2019-03-01 3.163579
51 2019-04-01 3.506390
52 2019-05-01 3.564924
53 2019-06-01 4.852669
54 2019-07-01 4.087379
55 2019-08-01 4.800931
56 2019-09-01 4.907763
57 2019-10-01 7.235331
58 2019-11-01 6.841004
59 2019-12-01 7.854044
I want to plot each year's trend overlaid on the others.
I tried this code:
df['year'] = df['time'].dt.year
for year in df['year'].unique():
    plt.plot(df[df['year'] == year]['time'],
             df[df['year'] == year]['value'])
plt.show()
But I get this:
I want the lines to overlap, with the x axis running from Jan to Dec.
I already found this question, but I do not want to use either the groupby function on my dataframe or a MultiIndex.
You could extract:
month to be set as the x axis
year to be used as filter
from the time column and use them to plot your data, as in this code:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
df = pd.read_csv('data.csv')
df['time'] = pd.to_datetime(df['time'], format = '%Y-%m-%d')
df['month'] = pd.to_datetime(df['time'].dt.month, format = '%m')
df['year'] = df['time'].dt.year
fig, ax = plt.subplots(figsize = (16, 8))
for year in df['year'].unique():
    ax.plot(df[df['year'] == year]['month'],
            df[df['year'] == year]['value'],
            label = year)
ax.xaxis.set_major_locator(md.MonthLocator())
ax.xaxis.set_major_formatter(md.DateFormatter('%b'))
ax.set_xlim([df['month'].iloc[0], df['month'].iloc[-1]])
plt.legend()
plt.show()
which gives this plot:
Or, better, use sns.lineplot in order to avoid the for loop:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
import seaborn as sns
df = pd.read_csv('data.csv')
df['time'] = pd.to_datetime(df['time'], format = '%Y-%m-%d')
df['month'] = pd.to_datetime(df['time'].dt.month, format = '%m')
df['year'] = df['time'].dt.year
fig, ax = plt.subplots(figsize = (16, 8))
palette = sns.color_palette('Set1', n_colors = len(df['year'].unique()))
sns.lineplot(ax = ax,
             data = df,
             x = 'month',
             y = 'value',
             hue = 'year',
             palette = palette,
             ci = None)
ax.xaxis.set_major_locator(md.MonthLocator())
ax.xaxis.set_major_formatter(md.DateFormatter('%b'))
ax.set_xlim([df['month'].iloc[0], df['month'].iloc[-1]])
plt.legend()
plt.show()
which gives almost the same plot:
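For anyone without the `data.csv` file, here is a minimal variant of the loop-based approach that should run on its own. It is only a sketch: the values are random placeholders rather than the data from the question, it uses the integer month number (1 to 12) as the x coordinate instead of a datetime built with `format = '%m'`, and it selects the non-interactive `Agg` backend so it can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: one value per month from 2015 to 2019 (not the real values).
rng = np.random.default_rng(0)
time = pd.date_range("2015-01-01", "2019-12-01", freq="MS")
df = pd.DataFrame({"time": time, "value": rng.random(len(time))})

df["year"] = df["time"].dt.year
df["month"] = df["time"].dt.month  # 1..12, shared by every year

fig, ax = plt.subplots(figsize=(16, 8))
for year in df["year"].unique():
    one_year = df[df["year"] == year]
    ax.plot(one_year["month"], one_year["value"], label=str(year))

# Label the 12 tick positions with month abbreviations.
ax.set_xticks(range(1, 13))
ax.set_xticklabels(["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])
ax.legend()
fig.savefig("year_over_year.png")
```

Because every year shares the same 1 to 12 x values, the lines overlap without any groupby or MultiIndex.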
I have got a start date ('2019-11-18') and an end date ('2021-02-19'). I am trying to create a list of all the weeks of each month that exist between the start and end date. My expected result should be something like this:
list = ['2019.Nov.3','2019.Nov.4', '2019.Nov.5' .... '2021.Feb.2','2021.Feb.3']
If the first or last date of a month lands on a Wednesday, I will assume that the week belongs to that month (as 3 out of the 5 working days will belong to that month).
I was actually successful in creating a dataframe with all the weeks of the year that exist between the start and end date using the following code:
date_1 = '18-11-19'
first_date = datetime.strptime(date_1, '%d-%m-%y')
date_2 = '19-02-21'
last_date = datetime.strptime(date_2, '%d-%m-%y')
timeline = pd.DataFrame(columns=['Year', 'Week'])
def create_list(df):
    start_year = int(first_date.isocalendar()[0])
    start_week = int(first_date.isocalendar()[1])
    end_year = int(last_date.isocalendar()[0])
    end_week = int(last_date.isocalendar()[1])
    while start_year < (end_year + 1):
        if start_year == end_year:
            while start_week < (end_week + 1):
                if len(str(start_week)) == 1:
                    week = f'{start_year}' + '.0' + f'{start_week}'
                else:
                    week = f'{start_year}' + '.' + f'{start_week}'
                df = df.append(({'Year': start_year, 'Week': week}), ignore_index=True)
                start_week += 1
        else:
            while start_week < 53:
                if len(str(start_week)) == 1:
                    week = f'{start_year}' + '.0' + f'{start_week}'
                else:
                    week = f'{start_year}' + '.' + f'{start_week}'
                df = df.append(({'Year': start_year, 'Week': week}), ignore_index=True)
                start_week += 1
        start_year += 1
        start_week = 1
    return df
timeline = create_list(timeline)
I was able to use this as the x axis for my line graph. However, the axis is a bit hard to read, and it's very difficult to tell which week belongs to which month.
I would really appreciate it if someone could give me a hand with this!
Edit:
Here is the solution, written with the guidance of @Serge Ballesta. I hope it helps anyone who might need something similar in the future!
import pandas as pd
import dateutil.relativedelta
from datetime import datetime
def year_week(date):
    if len(str(date.isocalendar()[1])) == 1:
        return f'{date.isocalendar()[0]}' + '.0' + f'{date.isocalendar()[1]}'
    else:
        return f'{date.isocalendar()[0]}' + '.' + f'{date.isocalendar()[1]}'
date_1 = '18-11-19'
first_date = datetime.strptime(date_1, '%d-%m-%y')
date_2 = '19-02-21'
last_date = datetime.strptime(date_2, '%d-%m-%y')
set_first_date = str((first_date - dateutil.relativedelta.relativedelta(months=1)).date())
set_last_date = str((last_date + dateutil.relativedelta.relativedelta(months=1)).date())
s = pd.date_range(set_first_date, set_last_date, freq='W-WED'
).to_series(name='wed').reset_index(drop=True)
df = s.to_frame()
df['week'] = df.apply(lambda x: year_week(x['wed']), axis=1)
df = df.assign(week_of_month=s.groupby(s.dt.strftime('%Y%m')
).cumcount() + 1)
df = df[(s >= pd.Timestamp('2019-11-18'))
& (s <= pd.Timestamp('2021-02-19'))]
df['month_week'] = (df['wed'].dt.strftime('%Y.%b.') + df['week_of_month'].astype(str)).tolist()
df = df.drop(['wed', 'week_of_month'], axis = 1)
print (df)
Printed df:
week month_week
4 2019.47 2019.Nov.3
5 2019.48 2019.Nov.4
6 2019.49 2019.Dec.1
7 2019.50 2019.Dec.2
8 2019.51 2019.Dec.3
.. ... ...
65 2021.03 2021.Jan.3
66 2021.04 2021.Jan.4
67 2021.05 2021.Feb.1
68 2021.06 2021.Feb.2
69 2021.07 2021.Feb.3
I would build a Series of timestamps with a frequency of W-WED to have consistently Wednesday as day of week. That way, we immediately get the correct month for the week.
To have the number of the week in the month, I would start one month before the required start, and use a cumcount on year-month + 1. Then it would be enough to filter only the expected range and properly format the values:
# produce a series of wednesdays starting in 2019-10-01
s = pd.date_range('2019-10-01', '2021-03-31', freq='W-WED'
).to_series(name='wed').reset_index(drop=True)
# compute the week number in the month
df = s.to_frame().assign(week_of_month=s.groupby(s.dt.strftime('%Y%m')
).cumcount() + 1)
# filter the required range
df = df[(s >= pd.Timestamp('2019-11-18'))
& (s <= pd.Timestamp('2021-02-19'))]
# here is the expected list
lst = (df['wed'].dt.strftime('%Y.%b.')+df['week_of_month'].astype(str)).tolist()
lst is as expected:
['2019.Nov.3', '2019.Nov.4', '2019.Dec.1', '2019.Dec.2', '2019.Dec.3', '2019.Dec.4',
'2020.Jan.1', '2020.Jan.2', '2020.Jan.3', '2020.Jan.4', '2020.Jan.5', '2020.Feb.1',
'2020.Feb.2', '2020.Feb.3', '2020.Feb.4', '2020.Mar.1', '2020.Mar.2', '2020.Mar.3',
'2020.Mar.4', '2020.Apr.1', '2020.Apr.2', '2020.Apr.3', '2020.Apr.4', '2020.Apr.5',
'2020.May.1', '2020.May.2', '2020.May.3', '2020.May.4', '2020.Jun.1', '2020.Jun.2',
'2020.Jun.3', '2020.Jun.4', '2020.Jul.1', '2020.Jul.2', '2020.Jul.3', '2020.Jul.4',
'2020.Jul.5', '2020.Aug.1', '2020.Aug.2', '2020.Aug.3', '2020.Aug.4', '2020.Sep.1',
'2020.Sep.2', '2020.Sep.3', '2020.Sep.4', '2020.Sep.5', '2020.Oct.1', '2020.Oct.2',
'2020.Oct.3', '2020.Oct.4', '2020.Nov.1', '2020.Nov.2', '2020.Nov.3', '2020.Nov.4',
'2020.Dec.1', '2020.Dec.2', '2020.Dec.3', '2020.Dec.4', '2020.Dec.5', '2021.Jan.1',
'2021.Jan.2', '2021.Jan.3', '2021.Jan.4', '2021.Feb.1', '2021.Feb.2', '2021.Feb.3']
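The same steps can be wrapped in a small function so that the padding start date does not have to be hard-coded. This is a sketch only: the function name `month_week_labels` is made up here, and starting the range at the first day of the start date's month is an assumption that replaces the "one month before" padding used above.

```python
import pandas as pd


def month_week_labels(start, end):
    """Return 'YYYY.Mon.N' labels for every Wednesday between start and end."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Begin at the first day of the start month so the in-month count is complete.
    wed = pd.date_range(start.replace(day=1), end, freq="W-WED")
    wed = wed.to_series().reset_index(drop=True)
    # Number the Wednesdays within each year-month, starting at 1.
    week_of_month = wed.groupby(wed.dt.strftime("%Y%m")).cumcount() + 1
    keep = (wed >= start) & (wed <= end)
    labels = wed[keep].dt.strftime("%Y.%b.") + week_of_month[keep].astype(str)
    return labels.tolist()


lst = month_week_labels("2019-11-18", "2021-02-19")
print(lst[:3], lst[-3:])
```

Note that `%b` is locale dependent, so the month abbreviations are English only under an English or C locale.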
This may not give you exactly what you need (because of the "3 out of 5 days" condition for the last week), but maybe it will give you an idea of how to tweak it to get your desired result.
You can export the res column to a list with df['res'].to_list().
df = pd.DataFrame({'date': pd.date_range('2019-11-18','2021-02-19', freq=pd.offsets.Week(weekday=0))})
df['year_wk']= df.date.apply(lambda x: x.strftime("%W")).astype(int)
df['mon_beg_wk']= df.date.dt.to_period('M').dt.to_timestamp().dt.strftime("%W").astype(int)
df['mon_wk']= df['year_wk']-df['mon_beg_wk']
df['res']= df['date'].dt.strftime("%Y.%b")+'.'+df['mon_wk'].astype(str)
df
Output
date year_wk mon_beg_wk mon_wk res
0 2019-11-18 46 43 3 2019.Nov.3
1 2019-11-25 47 43 4 2019.Nov.4
2 2019-12-02 48 47 1 2019.Dec.1
3 2019-12-09 49 47 2 2019.Dec.2
4 2019-12-16 50 47 3 2019.Dec.3
5 2019-12-23 51 47 4 2019.Dec.4
6 2019-12-30 52 47 5 2019.Dec.5
7 2020-01-06 1 0 1 2020.Jan.1
8 2020-01-13 2 0 2 2020.Jan.2
9 2020-01-20 3 0 3 2020.Jan.3
10 2020-01-27 4 0 4 2020.Jan.4
11 2020-02-03 5 4 1 2020.Feb.1
12 2020-02-10 6 4 2 2020.Feb.2
13 2020-02-17 7 4 3 2020.Feb.3
14 2020-02-24 8 4 4 2020.Feb.4
15 2020-03-02 9 8 1 2020.Mar.1
16 2020-03-09 10 8 2 2020.Mar.2
17 2020-03-16 11 8 3 2020.Mar.3
18 2020-03-23 12 8 4 2020.Mar.4
19 2020-03-30 13 8 5 2020.Mar.5
20 2020-04-06 14 13 1 2020.Apr.1
21 2020-04-13 15 13 2 2020.Apr.2
22 2020-04-20 16 13 3 2020.Apr.3
23 2020-04-27 17 13 4 2020.Apr.4
24 2020-05-04 18 17 1 2020.May.1
25 2020-05-11 19 17 2 2020.May.2
26 2020-05-18 20 17 3 2020.May.3
27 2020-05-25 21 17 4 2020.May.4
28 2020-06-01 22 22 0 2020.Jun.0
29 2020-06-08 23 22 1 2020.Jun.1
... ... ... ... ... ...
36 2020-07-27 30 26 4 2020.Jul.4
37 2020-08-03 31 30 1 2020.Aug.1
38 2020-08-10 32 30 2 2020.Aug.2
39 2020-08-17 33 30 3 2020.Aug.3
40 2020-08-24 34 30 4 2020.Aug.4
41 2020-08-31 35 30 5 2020.Aug.5
42 2020-09-07 36 35 1 2020.Sep.1
43 2020-09-14 37 35 2 2020.Sep.2
44 2020-09-21 38 35 3 2020.Sep.3
45 2020-09-28 39 35 4 2020.Sep.4
46 2020-10-05 40 39 1 2020.Oct.1
47 2020-10-12 41 39 2 2020.Oct.2
48 2020-10-19 42 39 3 2020.Oct.3
49 2020-10-26 43 39 4 2020.Oct.4
50 2020-11-02 44 43 1 2020.Nov.1
51 2020-11-09 45 43 2 2020.Nov.2
52 2020-11-16 46 43 3 2020.Nov.3
53 2020-11-23 47 43 4 2020.Nov.4
54 2020-11-30 48 43 5 2020.Nov.5
55 2020-12-07 49 48 1 2020.Dec.1
56 2020-12-14 50 48 2 2020.Dec.2
57 2020-12-21 51 48 3 2020.Dec.3
58 2020-12-28 52 48 4 2020.Dec.4
59 2021-01-04 1 0 1 2021.Jan.1
60 2021-01-11 2 0 2 2021.Jan.2
61 2021-01-18 3 0 3 2021.Jan.3
62 2021-01-25 4 0 4 2021.Jan.4
63 2021-02-01 5 5 0 2021.Feb.0
64 2021-02-08 6 5 1 2021.Feb.1
65 2021-02-15 7 5 2 2021.Feb.2
I used datetime.timedelta to do this. It is supposed to work for all start and end dates.
import datetime
import math
date_1 = '18-11-19'
first_date = datetime.datetime.strptime(date_1, '%d-%m-%y')
date_2 = '19-02-21'
last_date = datetime.datetime.strptime(date_2, '%d-%m-%y')
start_week_m=math.ceil(int(first_date.strftime("%d"))/7)#Week number of first month
daysTill_nextWeek=7-int(first_date.strftime("%w"))#Number of days to next sunday.
date_template='%Y.%b.'
tempdate=first_date
weeks=['%s%d' % (tempdate.strftime(date_template),start_week_m)]
tempdate=tempdate+datetime.timedelta(days=daysTill_nextWeek)#tempdate becomes the next sunday
while tempdate < last_date:
    temp_year,temp_month=int(tempdate.strftime("%Y")),int(tempdate.strftime("%m"))
    print(start_week_m)
    weeks.append('%s%d' % (tempdate.strftime(date_template),start_week_m+1))
    start_week_m+=1
    tempdate=tempdate+datetime.timedelta(days=7)
    if temp_month != int(tempdate.strftime("%m")):
        print(temp_year,int(tempdate.strftime("%Y")))
        start_week_m=0
print(weeks)
prints
['2019.Nov.3', '2019.Nov.4', '2019.Dec.1', '2019.Dec.2', '2019.Dec.3', '2019.Dec.4', '2019.Dec.5', '2020.Jan.1', '2020.Jan.2', '2020.Jan.3', '2020.Jan.4', '2020.Feb.1', '2020.Feb.2', '2020.Feb.3', '2020.Feb.4', '2020.Mar.1', '2020.Mar.2', '2020.Mar.3', '2020.Mar.4', '2020.Mar.5', '2020.Apr.1', '2020.Apr.2', '2020.Apr.3', '2020.Apr.4', '2020.May.1', '2020.May.2', '2020.May.3', '2020.May.4', '2020.May.5', '2020.Jun.1', '2020.Jun.2', '2020.Jun.3', '2020.Jun.4', '2020.Jul.1', '2020.Jul.2', '2020.Jul.3', '2020.Jul.4', '2020.Aug.1', '2020.Aug.2', '2020.Aug.3', '2020.Aug.4', '2020.Aug.5', '2020.Sep.1', '2020.Sep.2', '2020.Sep.3', '2020.Sep.4', '2020.Oct.1', '2020.Oct.2', '2020.Oct.3', '2020.Oct.4', '2020.Nov.1', '2020.Nov.2', '2020.Nov.3', '2020.Nov.4', '2020.Nov.5', '2020.Dec.1', '2020.Dec.2', '2020.Dec.3', '2020.Dec.4', '2021.Jan.1', '2021.Jan.2', '2021.Jan.3', '2021.Jan.4', '2021.Jan.5', '2021.Feb.1', '2021.Feb.2']
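The list above assigns weeks by Sunday, so it differs from the Wednesday rule stated in the question (for example it contains '2019.Dec.5'). If the Wednesday rule is wanted with only the standard library, something along these lines might work. It is a sketch, and the function name `month_week_labels` is a hypothetical helper, not part of the answer above.

```python
import datetime


def month_week_labels(start, end):
    """Label each week by its Wednesday: 'YYYY.Mon.N', N = n-th Wednesday of the month."""
    # Move forward to the first Wednesday on or after start (Monday is 0, Wednesday is 2).
    day = start + datetime.timedelta(days=(2 - start.weekday()) % 7)
    labels = []
    while day <= end:
        week_of_month = (day.day - 1) // 7 + 1  # days 1-7 -> 1, 8-14 -> 2, ...
        labels.append(f"{day:%Y.%b.}{week_of_month}")
        day += datetime.timedelta(days=7)
    return labels


weeks = month_week_labels(datetime.date(2019, 11, 18), datetime.date(2021, 2, 19))
print(weeks[:3], weeks[-3:])
```

Since a week is attributed to the month that contains its Wednesday, a week whose Monday and Tuesday fall in the previous month still counts toward the new month, which matches the "3 out of 5 working days" condition.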
I have a dataframe where one of the columns ('ProcessingDATE') is in datetime format. I want to create another column ('Report Date'): if the processing date is a Monday, subtract 3 days from it, which gives the previous Friday; otherwise subtract 1 day from it.
I've only been using Python for a short time, so I don't have much idea how to write this. My thought was to write a for loop: if the cell is a Monday, use datetime.datetime.today() - datetime.timedelta(days=3); otherwise use datetime.datetime.today() - datetime.timedelta(days=1).
for j in range(len(DDA_compamy['ProcessingDATE'])):
if pd.to_datetime(datetime(DDA_company.ProcessingDATE[j])).weekday() == 2
Hope this helps. Note that Monday is weekday 0, and that the check has to be applied element-wise through the .dt accessor rather than with a plain if on the whole column:
from datetime import timedelta
is_monday = DDA_compamy['ProcessingDATE'].dt.weekday == 0 # Monday is 0, Friday is 4
DDA_compamy['Report Date'] = DDA_compamy['ProcessingDATE'] - timedelta(days=1) # default: subtract one day
DDA_compamy.loc[is_monday, 'Report Date'] = DDA_compamy['ProcessingDATE'] - timedelta(days=3) # Monday: go back three days to Friday
The above can also be written as a single expression:
DDA_compamy['Report Date'] = (DDA_compamy['ProcessingDATE'] - timedelta(days=3)).where(DDA_compamy['ProcessingDATE'].dt.weekday == 0, DDA_compamy['ProcessingDATE'] - timedelta(days=1))
Use pandas.Series.dt.weekday and some logic:
import pandas as pd
df = pd.DataFrame({'ProcessingDATE':pd.date_range('2019-04-01', '2019-04-27')})
df1 = df.copy()
mask = df1['ProcessingDATE'].dt.weekday == 0
df.loc[mask, 'ProcessingDATE'] = df1['ProcessingDATE'] - pd.to_timedelta('3 days')
df.loc[~mask, 'ProcessingDATE'] = df1['ProcessingDATE'] - pd.to_timedelta('1 days')
Output:
ProcessingDATE
0 2019-03-29
1 2019-04-01
2 2019-04-02
3 2019-04-03
4 2019-04-04
5 2019-04-05
6 2019-04-06
7 2019-04-05
8 2019-04-08
9 2019-04-09
10 2019-04-10
11 2019-04-11
12 2019-04-12
13 2019-04-13
14 2019-04-12
15 2019-04-15
16 2019-04-16
17 2019-04-17
18 2019-04-18
19 2019-04-19
20 2019-04-20
21 2019-04-19
22 2019-04-22
23 2019-04-23
24 2019-04-24
25 2019-04-25
26 2019-04-26
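The snippet above overwrites ProcessingDATE in place. If the result should go into a separate 'Report Date' column, as the question asks, one possible sketch is to map each weekday to the number of days to subtract. The one-week date range below is made up for illustration.

```python
import pandas as pd

# One sample week, Monday 2019-04-01 through Sunday 2019-04-07.
df = pd.DataFrame({"ProcessingDATE": pd.date_range("2019-04-01", "2019-04-07")})

# Monday (weekday 0) goes back 3 days, every other day goes back 1 day.
days_back = df["ProcessingDATE"].dt.weekday.map(lambda d: 3 if d == 0 else 1)
df["Report Date"] = df["ProcessingDATE"] - pd.to_timedelta(days_back, unit="D")
print(df)
```

The original ProcessingDATE column is left untouched, so both dates can be compared side by side.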
I have a dataframe with columns for the date, the day and the week of the year.
I need to use the week of the year to create a new column with values from 1 to 5.
Let's say I'm on week 35: all the rows with week 35 should have 1, the rows with week 36 should have 2, and so on.
Once it reaches week 40 and number 5, the numbers in the new column need to start from 1 again at week 41 and continue in this pattern for however long the date range is.
def date_table(start='2019-08-26', end='2019-10-27'):
    df = pd.DataFrame({"Date": pd.date_range(start, end)})
    df["Day"] = df.Date.dt.day_name()  # weekday_name was removed in pandas 1.0
    df["Week"] = df.Date.dt.isocalendar().week.astype(int)  # weekofyear was removed in pandas 2.0
    return df
Calculate the index using modulo and the weeknumber:
import pandas as pd
start='2019-08-26'
end='2019-10-27'
df = pd.DataFrame({"Date": pd.date_range(start, end)})
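df["Day"] = df.Date.dt.day_name() # weekday_name was removed in pandas 1.0
df["Week"] = df.Date.dt.isocalendar().week.astype(int) # weekofyear was removed in pandas 2.0
df["idx"] = df["Week"] % 5 + 1 # n % 5 = 0..4, plus 1 gives 1..5; starts at 1 here only because 35 is divisible by 5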
print(df)
Output:
0 2019-08-26 Monday 35 1
[...]
6 2019-09-01 Sunday 35 1
7 2019-09-02 Monday 36 2
[...]
13 2019-09-08 Sunday 36 2
14 2019-09-09 Monday 37 3
[...]
20 2019-09-15 Sunday 37 3
21 2019-09-16 Monday 38 4
[...]
27 2019-09-22 Sunday 38 4
28 2019-09-23 Monday 39 5
[...]
34 2019-09-29 Sunday 39 5
35 2019-09-30 Monday 40 1
[...]
[63 rows x 4 columns]
If the first week number is not divisible by 5, you can still make the index start at 1 by subtracting the first week's modulo 5 value from all week numbers:
# startweeknumber:
startweekmod = df["Week"][0] % 5
# offset by inital weeks mod
df["idx"] = (df["Week"] - startweekmod) % 5 + 1
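Putting the offset version together as a small runnable sketch: the helper name `add_cycle_index` and its `cycle` parameter are made up for illustration, and it uses `dt.day_name()` and `dt.isocalendar().week`, which are the spellings available in current pandas.

```python
import pandas as pd


def add_cycle_index(df, cycle=5):
    """Add Day, Week and a repeating 1..cycle index that starts at 1 on the first week."""
    out = df.copy()
    out["Day"] = out["Date"].dt.day_name()
    # Cast to plain int so the subtraction below cannot wrap around on an unsigned dtype.
    out["Week"] = out["Date"].dt.isocalendar().week.astype(int)
    first_week = out["Week"].iloc[0]
    out["idx"] = (out["Week"] - first_week) % cycle + 1
    return out


table = add_cycle_index(pd.DataFrame({"Date": pd.date_range("2019-08-26", "2019-10-27")}))
print(table.drop_duplicates("Week")[["Week", "idx"]])
```

One caveat: ISO week numbers restart at the turn of the year, so for a range that crosses a year boundary the 1 to 5 pattern would be interrupted unless the weeks are counted from the start date instead.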