I'm trying to resample daily frequency data to business days using the Pandas resample function with an offset so the last day of the week becomes Thursday and the beginning Sunday.
This is the code so far:
import pandas as pd
resampled_data = df.resample('B', base=-1)
But it keeps resampling so Friday is being used in the resample and Sunday is excluded. I tried many different values for base and loffset but it's not affecting the resampling.
Please note: The raw data is using UTC timestamps. Timezone is Eastern Daylight Time. Sunday UTC 21:00 - Thursday UTC 21:00.
Use a CustomBusinessDay() offset with a weekmask that lists the days you want to keep. I've resampled the whole of January, which includes Fridays and Saturdays, and added day_name() and dayofweek columns to show that those days are dropped.
import datetime as dt
import pandas as pd
# one row per calendar day, 1 Jan 2020 to 1 Feb 2020
df = pd.DataFrame(index=pd.date_range(dt.datetime(2020,1,1), dt.datetime(2020,2,1)))
# business days are Sunday to Thursday
bd = pd.tseries.offsets.CustomBusinessDay(n=1,
                                          weekmask="Sun Mon Tue Wed Thu")
df = df.resample(rule=bd).first().assign(
    day=lambda dfa: dfa.index.day_name(),
    dn=lambda dfa: dfa.index.dayofweek
)
output
day dn
2020-01-01 Wednesday 2
2020-01-02 Thursday 3
2020-01-05 Sunday 6
2020-01-06 Monday 0
2020-01-07 Tuesday 1
2020-01-08 Wednesday 2
2020-01-09 Thursday 3
2020-01-12 Sunday 6
2020-01-13 Monday 0
2020-01-14 Tuesday 1
2020-01-15 Wednesday 2
2020-01-16 Thursday 3
2020-01-19 Sunday 6
2020-01-20 Monday 0
2020-01-21 Tuesday 1
2020-01-22 Wednesday 2
2020-01-23 Thursday 3
2020-01-26 Sunday 6
2020-01-27 Monday 0
2020-01-28 Tuesday 1
2020-01-29 Wednesday 2
2020-01-30 Thursday 3
I have financial data and am trying to calculate the percent change in value between two consecutive Thursdays. Sometimes Thursday is a holiday, so that week's Thursday row is absent from the dataframe. In that case I want to calculate that week's pct_change from the previous Thursday to Wednesday instead.
Reproducible code:
# !pip install investpy
import numpy as np
import pandas as pd
import investpy
from datetime import datetime
df = investpy.get_index_historical_data(index="Nifty 50",country="India",from_date=("23/03/2022"),to_date= "23/04/2022")
df['weekday'] = df.index.day_name()
df = df.loc[:, ['Close', 'weekday']]
df.tail(10)
Output-
Close weekday
Date
2022-04-07 17639.55 Thursday
2022-04-08 17784.35 Friday
2022-04-11 17674.95 Monday
2022-04-12 17530.30 Tuesday
2022-04-13 17475.65 Wednesday
2022-04-18 17173.65 Monday
2022-04-19 16958.65 Tuesday
2022-04-20 17136.55 Wednesday
2022-04-21 17392.60 Thursday
2022-04-22 17171.95 Friday
In df.tail(10), 14-Apr-2022 is missing because it was a holiday, so for that week I want to calculate pct_change from Thursday to Wednesday.
Code I used previously to calculate the percent returns:
weekly_pct_change = df.loc[df['weekday'] == 'Thursday']
weekly_pct_change['pct_change']= np.log(1+weekly_pct_change['Close'].pct_change())*100
weekly_pct_change
Output-
Close weekday pct_change
Date
2022-03-24 17222.75 Thursday NaN
2022-03-31 17464.75 Thursday 1.395338
2022-04-07 17639.55 Thursday 0.995898
2022-04-21 17392.60 Thursday -1.409871
One idea is to add the missing dates with DataFrame.asfreq and method='ffill', then reassign the day names with DatetimeIndex.day_name:
df1 = df.asfreq('B', method='ffill')
df1['weekday'] = df1.index.day_name()
print (df1.tail(10))
Close weekday
Date
2022-04-11 17674.95 Monday
2022-04-12 17530.30 Tuesday
2022-04-13 17475.65 Wednesday
2022-04-14 17475.65 Thursday
2022-04-15 17475.65 Friday
2022-04-18 17173.65 Monday
2022-04-19 16958.65 Tuesday
2022-04-20 17136.55 Wednesday
2022-04-21 17392.60 Thursday
2022-04-22 17171.95 Friday
weekly_pct_change = df1.loc[df1['weekday'] == 'Thursday'].copy()
weekly_pct_change['pct_change']= np.log(1+weekly_pct_change['Close'].pct_change())*100
print(weekly_pct_change)
Close weekday pct_change
Date
2022-03-24 17222.75 Thursday NaN
2022-03-31 17464.75 Thursday 1.395338
2022-04-07 17639.55 Thursday 0.995898
2022-04-14 17475.65 Thursday -0.933506
2022-04-21 17392.60 Thursday -0.476366
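As an alternative to forward-filling, the same "Thursday, or the last trading day before it" rule can be sketched with an anchored weekly resample. This is only a sketch: the prices are copied by hand from the df.tail(10) output in the question rather than downloaded with investpy, and the names close, weekly_close and weekly_log_return are illustrative.

```python
import numpy as np
import pandas as pd

# Closing prices copied from the question's df.tail(10); 2022-04-14 is absent (holiday).
close = pd.Series(
    [17639.55, 17784.35, 17674.95, 17530.30, 17475.65,
     17173.65, 16958.65, 17136.55, 17392.60, 17171.95],
    index=pd.to_datetime([
        "2022-04-07", "2022-04-08", "2022-04-11", "2022-04-12", "2022-04-13",
        "2022-04-18", "2022-04-19", "2022-04-20", "2022-04-21", "2022-04-22",
    ]),
    name="Close",
)

# 'W-THU' bins run from Friday to Thursday and are labelled by the Thursday,
# so .last() picks Thursday's close or, if it is missing, the latest earlier close.
weekly_close = close.resample("W-THU").last()

# Log return in percent, the same quantity as np.log(1 + pct_change()) * 100.
weekly_log_return = np.log(weekly_close).diff() * 100
print(weekly_log_return.round(6))
```

Note that each row is still labelled with the Thursday date (for example 2022-04-14) even when the value comes from Wednesday, and the final bin (labelled 2022-04-28) is a partial week that only contains Friday's close.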
I am looking to find the current week of the month. There are many answers on this already, but I have the following scenario:
Weeks run from Sunday to Saturday.
When a week spans two months, it belongs to the month that holds the majority of its dates. For example, 30th March 2020 is in Week 1 of April, because that week has 3 dates in March (29, 30, 31) but 4 dates in April (1, 2, 3, 4).
Sample start and end dates for each month are shown below:
Month  No.  Start date  End date
Jan    1    12/29/2019  2/1/2020
Feb    2    2/2/2020    2/29/2020
Mar    3    3/1/2020    3/28/2020
Apr    4    3/29/2020   5/2/2020
May    5    5/3/2020    5/30/2020
Jun    6    5/31/2020   6/27/2020
Jul    7    6/28/2020   8/1/2020
Aug    8    8/2/2020    8/29/2020
Sep    9    8/30/2020   9/26/2020
Oct    10   9/27/2020   10/31/2020
Nov    11   11/1/2020   11/28/2020
Dec    12   11/29/2020  12/26/2020
I am currently doing it via pd.merge against a complete table for one year, which I can use to look up a date, but I am looking for something automated that does not need to be updated every year.
I have created the table for the complete year as follows:
(I have truncated the dataframe below to save space, but it continues to the end of the year.)
df_week
Month Week 1 2 3 4 5 6 7
0 Jan W1 2019-12-29 2019-12-30 2019-12-31 2020-01-01 2020-01-02 2020-01-03 2020-01-04
1 Jan W2 2020-01-05 2020-01-06 2020-01-07 2020-01-08 2020-01-09 2020-01-10 2020-01-11
2 Jan W3 2020-01-12 2020-01-13 2020-01-14 2020-01-15 2020-01-16 2020-01-17 2020-01-18
3 Jan W4 2020-01-19 2020-01-20 2020-01-21 2020-01-22 2020-01-23 2020-01-24 2020-01-25
4 Jan W5 2020-01-26 2020-01-27 2020-01-28 2020-01-29 2020-01-30 2020-01-31 2020-02-01
5 Feb W1 2020-02-02 2020-02-03 2020-02-04 2020-02-05 2020-02-06 2020-02-07 2020-02-08
6 Feb W2 2020-02-09 2020-02-10 2020-02-11 2020-02-12 2020-02-13 2020-02-14 2020-02-15
Then i used the following:
import datetime as dt
# Today's date in YYYY-MM-DD format
d = dt.datetime.today().strftime("%Y-%m-%d")
# Find the row of the full-year table that contains today's date
cal_week = df_week[df_week.eq(d).any(axis=1)]["Week"].iat[0]
cal_week
W2
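The lookup table can probably be avoided. One way to read the "majority of dates" rule is that a Sunday-to-Saturday week belongs to the month of its Wednesday, since Wednesday is the 4th of the 7 days. The sketch below relies on that reading; week_of_month is a hypothetical helper name, not a pandas function, and it returns the month as a number rather than a name.

```python
import pandas as pd

def week_of_month(date):
    """Return (month_number, week_number) for Sunday-to-Saturday weeks.

    Assumption: a week belongs to the month that contains its Wednesday,
    which is the month holding at least 4 of the week's 7 days.
    """
    ts = pd.Timestamp(date).normalize()
    # pandas uses Monday=0 ... Sunday=6, so this is the number of days since Sunday.
    days_since_sunday = (ts.dayofweek + 1) % 7
    week_start = ts - pd.Timedelta(days=days_since_sunday)
    # Wednesday of the same Sunday-to-Saturday week.
    midweek = week_start + pd.Timedelta(days=3)
    # Count which Wednesday of the month this is.
    week_number = (midweek.day - 1) // 7 + 1
    return midweek.month, week_number

print(week_of_month("2020-03-30"))  # (4, 1): Week 1 of April, as in the question
print(week_of_month(pd.Timestamp.today()))
```

Against the truncated table above, 2020-02-01 falls in Jan W5 and 2020-02-12 in Feb W2, which this function reproduces as (1, 5) and (2, 2).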
I have a df of crypto data and am trying to see if there is a particular time of the day or week when prices move one way or the other. For each row I have the timestamp, the day of the week, and the return relative to the previous timestamp's close, as in the example data below.
Date Day Return
2019-06-22 01:00:00 Saturday -0.046910
2019-06-22 07:00:00 Saturday -0.018756
2019-06-22 13:00:00 Saturday 0.036842
2019-06-22 19:00:00 Saturday 0.000998
2019-06-23 01:00:00 Sunday 0.017672
2019-06-23 07:00:00 Sunday 0.021102
2019-06-23 13:00:00 Sunday -0.014737
2019-06-23 19:00:00 Sunday -0.039085
2019-06-24 01:00:00 Monday 0.009690
2019-06-24 07:00:00 Monday -0.004367
2019-06-24 13:00:00 Monday -0.005342
2019-06-24 19:00:00 Monday 0.001060
2019-06-25 01:00:00 Tuesday -0.027738
2019-06-25 07:00:00 Tuesday -0.001599
2019-06-25 13:00:00 Tuesday 0.006247
2019-06-25 19:00:00 Tuesday -0.036937
2019-06-26 01:00:00 Wednesday -0.064866
2019-06-26 07:00:00 Wednesday 0.012319
My first issue is that the timestamps are confusing. I get data from different exchanges and the timestamps differ across many of them, so I have abandoned the idea of standardising the Date column and would now just like a new column that numbers the period within each day. The first 6 hours of each Saturday would be Saturday_1, and so on, so in the end I would have 28 different categories (4 time periods x 7 days of the week).
I would then like to group by this new column and get the average return for each category.
Cheers
Assuming that your Day column is correct:
# ignore if already datetime
df.Date = pd.to_datetime(df.Date)
# hour block in the day
s = df.Date.dt.hour//6 + 1
# new column
df['group'] = df['Day'] + '_' + s.astype(str)
output:
0 Saturday_1
1 Saturday_2
2 Saturday_3
3 Saturday_4
4 Sunday_1
5 Sunday_2
6 Sunday_3
7 Sunday_4
8 Monday_1
9 Monday_2
10 Monday_3
11 Monday_4
12 Tuesday_1
13 Tuesday_2
14 Tuesday_3
15 Tuesday_4
16 Wednesday_1
17 Wednesday_2
Name: group, dtype: object
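The second half of the question, the average return per category, follows from a groupby on the new column. A minimal sketch, using only the Saturday and Sunday rows of the sample data from the question; avg_return is an illustrative name. With a single week of data each group holds one observation, so the means simply equal the individual returns.

```python
import pandas as pd

# First eight rows of the sample data from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-06-22 01:00:00", "2019-06-22 07:00:00",
        "2019-06-22 13:00:00", "2019-06-22 19:00:00",
        "2019-06-23 01:00:00", "2019-06-23 07:00:00",
        "2019-06-23 13:00:00", "2019-06-23 19:00:00",
    ]),
    "Return": [-0.046910, -0.018756, 0.036842, 0.000998,
               0.017672, 0.021102, -0.014737, -0.039085],
})
df["Day"] = df["Date"].dt.day_name()

# Same grouping key as above: day name plus the 6-hour block (1 to 4).
df["group"] = df["Day"] + "_" + (df["Date"].dt.hour // 6 + 1).astype(str)

# Average return per category; sort=False keeps the order of first appearance.
avg_return = df.groupby("group", sort=False)["Return"].mean()
print(avg_return)
```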
I am still very new to pandas and just figured out I have made a mistake in the process I was following earlier.
df_date
Date day
0 2016-05-26 Thursday
1 2016-05-27 Friday
2 2016-05-30 Monday
3 2016-05-31 Tuesday
4 2016-06-01 Wednesday
5 2016-06-02 Thursday
6 2016-06-03 Friday
7 2016-06-06 Monday
8 2016-06-07 Tuesday
9 2016-06-08 Wednesday
10 2016-06-09 Thursday
11 2016-06-10 Friday
12 2016-06-13 Monday
13 2016-06-14 Tuesday
14 2016-06-15 Wednesday
15 2016-06-16 Thursday
16 2016-06-17 Friday
17 2016-06-20 Monday
18 2016-06-21 Tuesday
19 2016-06-22 Wednesday
20 2016-06-24 Friday
21 2016-06-27 Monday
22 2016-06-28 Tuesday
23 2016-06-29 Wednesday
There are about 600+ rows.
What I want to do
Make a column 'Exit' that marks one row per week with E: Thursday if it is present, otherwise Wednesday, and if Wednesday is missing too, then Tuesday.
I tried a for loop and I just can't seem to get this right.
Expected Output:
df_date
Date day Exit
0 2016-05-26 Thursday E
1 2016-05-27 Friday
2 2016-05-30 Monday
3 2016-05-31 Tuesday
4 2016-06-01 Wednesday
5 2016-06-02 Thursday E
6 2016-06-03 Friday
7 2016-06-06 Monday
8 2016-06-07 Tuesday
9 2016-06-08 Wednesday
10 2016-06-09 Thursday E
11 2016-06-10 Friday
12 2016-06-13 Monday
13 2016-06-14 Tuesday
14 2016-06-15 Wednesday
15 2016-06-16 Thursday E
16 2016-06-17 Friday
17 2016-06-20 Monday
18 2016-06-21 Tuesday
19 2016-06-22 Wednesday E
20 2016-06-24 Friday
21 2016-06-27 Monday
22 2016-06-28 Tuesday
23 2016-06-29 Wednesday E
I added this in comments but should be here as well:
If Thursday is not present, then use the record just before it.
So if Wednesday is also not present in the week, then Tuesday.
If Tuesday is also missing then Monday, and if Monday is missing then Friday. Saturday and Sunday will never have a record.
Here's a solution:
# for each week (ending on Sunday), index of the last row that falls on Monday-Thursday
ix = (df.groupby(pd.Grouper(key='Date', freq='W')).Date
        .apply(lambda x: (x.dt.dayofweek <= 3)[::-1].idxmax()).values)
df.loc[ix,'Exit'] = 'E'
df.fillna('')
Date day Exit
0 2016-05-26 Thursday E
1 2016-05-27 Friday
2 2016-05-30 Monday
3 2016-05-31 Tuesday
4 2016-06-01 Wednesday
5 2016-06-02 Thursday E
6 2016-06-03 Friday
7 2016-06-06 Monday
8 2016-06-07 Tuesday
9 2016-06-08 Wednesday
10 2016-06-09 Thursday E
11 2016-06-10 Friday
12 2016-06-13 Monday
13 2016-06-14 Tuesday
14 2016-06-15 Wednesday
15 2016-06-16 Thursday E
16 2016-06-17 Friday
17 2016-06-20 Monday
18 2016-06-21 Tuesday
19 2016-06-22 Wednesday
20 2016-06-23 Thursday E
21 2016-06-24 Friday
22 2016-06-27 Monday
23 2016-06-28 Tuesday
24 2016-06-29 Wednesday E
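You can use the week number and dt.weekday of your datetime series. Older pandas exposed the week number as dt.week; that accessor was removed in pandas 2.0 in favour of dt.isocalendar().week, which is used here. Then use groupby + max for your required logic. This is likely to be more efficient than sequential equality checks.
import numpy as np
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
# add week and weekday series
df['Week'] = df['Date'].dt.isocalendar().week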
df['Weekday'] = df['Date'].dt.weekday.where(df['Date'].dt.weekday.isin([1, 2, 3]))
df['Exit'] = np.where(df['Weekday'] == df.groupby('Week')['Weekday'].transform('max'),
'E', '')
Result
I have left the helper columns so the way the solution works is clear. These can easily be removed.
print(df)
Date day Week Weekday Exit
0 2016-05-26 Thursday 21 3.0 E
1 2016-05-27 Friday 21 NaN
2 2016-05-30 Monday 22 NaN
3 2016-05-31 Tuesday 22 1.0
4 2016-06-01 Wednesday 22 2.0
5 2016-06-02 Thursday 22 3.0 E
6 2016-06-03 Friday 22 NaN
7 2016-06-06 Monday 23 NaN
8 2016-06-07 Tuesday 23 1.0
9 2016-06-08 Wednesday 23 2.0
10 2016-06-09 Thursday 23 3.0 E
11 2016-06-10 Friday 23 NaN
12 2016-06-13 Monday 24 NaN
13 2016-06-14 Tuesday 24 1.0
14 2016-06-15 Wednesday 24 2.0
15 2016-06-16 Thursday 24 3.0 E
16 2016-06-17 Friday 24 NaN
17 2016-06-20 Monday 25 NaN
18 2016-06-21 Tuesday 25 1.0
19 2016-06-22 Wednesday 25 2.0 E
20 2016-06-24 Friday 25 NaN
21 2016-06-27 Monday 26 NaN
22 2016-06-28 Tuesday 26 1.0
23 2016-06-29 Wednesday 26 2.0 E
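For comparison, here is a sketch that also lets Monday act as the fallback and groups by calendar week period, so it does not depend on the week number alone. The dates are rebuilt from the question's sample (business days from 2016-05-26 to 2016-06-29 with 2016-06-23 removed); candidates, week and exit_idx are illustrative names. The final "otherwise Friday" fallback from the question is not handled here.

```python
import pandas as pd

# Rebuild the 24 sample dates: business days with Thursday 2016-06-23 missing.
dates = pd.bdate_range("2016-05-26", "2016-06-29")
dates = dates[dates != pd.Timestamp("2016-06-23")]
df = pd.DataFrame({"Date": dates})
df["day"] = df["Date"].dt.day_name()

# Rows on Monday-Thursday (dayofweek 0-3) are the possible exit days.
candidates = df[df["Date"].dt.dayofweek <= 3]

# Label each candidate with its Monday-to-Sunday week, then take the
# row holding the latest date within each week.
week = candidates["Date"].dt.to_period("W-SUN")
exit_idx = candidates.groupby(week)["Date"].idxmax()

df["Exit"] = ""
df.loc[exit_idx, "Exit"] = "E"
print(df)
```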
I have the following pandas dataframe indexed to a Time_Stamp:
df = pd.DataFrame(index = pd.date_range('4/1/2017', freq='3D', periods=10))
df['weekday'] = df.index.day_name()  # weekday_name in older pandas
Data looks like this:
weekday
2017-04-01 Saturday
2017-04-04 Tuesday
2017-04-07 Friday
2017-04-10 Monday
2017-04-13 Thursday
2017-04-16 Sunday
2017-04-19 Wednesday
2017-04-22 Saturday
2017-04-25 Tuesday
2017-04-28 Friday
I want to create a new column 'week' that gives the week ordinal of the year, but with a custom first day of the week.
I know I can just do this:
df['week_sun'] = df.index.isocalendar().week  # df.index.week in older pandas
Except I want the first day of the week to be something besides Sunday. For this question, let's say I need it to be Wednesday, so that the resulting dataframe would look like this:
weekday week_sun week_wed
2017-04-01 Saturday 13 13
2017-04-04 Tuesday 14 13
2017-04-07 Friday 14 14
2017-04-10 Monday 15 14
2017-04-13 Thursday 15 15
2017-04-16 Sunday 15 15
2017-04-19 Wednesday 16 16
2017-04-22 Saturday 16 16
2017-04-25 Tuesday 17 16
2017-04-28 Friday 17 17
I'm at a loss to how to achieve this. Thanks!
Given your requirements, you only need to subtract 1 from the week number when the day of the week falls "before" the reference day (Wednesday in your example).
In [162]: df
Out[162]:
weekday week_sun
2017-04-01 Saturday 13
2017-04-04 Tuesday 14
2017-04-07 Friday 14
2017-04-10 Monday 15
2017-04-13 Thursday 15
2017-04-16 Sunday 15
2017-04-19 Wednesday 16
2017-04-22 Saturday 16
2017-04-25 Tuesday 17
2017-04-28 Friday 17
In [163]: df['week_wed'] = df['week_sun']
Let's now shift the value where needed, meaning when the weekday is before Wednesday, hence df.index.dayofweek < 2.
In [164]: df.loc[df.index.dayofweek < 2, 'week_wed'] = (df[df.index.dayofweek < 2]['week_sun'] - 2) % 52 + 1
In [165]: df
Out[165]:
weekday week_sun week_wed
2017-04-01 Saturday 13 13
2017-04-04 Tuesday 14 13
2017-04-07 Friday 14 14
2017-04-10 Monday 15 14
2017-04-13 Thursday 15 15
2017-04-16 Sunday 15 15
2017-04-19 Wednesday 16 16
2017-04-22 Saturday 16 16
2017-04-25 Tuesday 17 16
2017-04-28 Friday 17 17
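I didn't exactly subtract 1, but instead used a modulo operation, (X - 2) % 52 + 1, so that week 1 wraps around to week 52 of the previous year when needed. Note that this assumes the previous year has 52 weeks; years with 53 ISO weeks would need separate handling.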
weekday week_sun week_wed
2017-12-27 Wednesday 52 52
2017-12-30 Saturday 52 52
2018-01-02 Tuesday 1 52
2018-01-05 Friday 1 1
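Another way to get the same numbers, sketched here as an alternative rather than as the method used above, is to shift every date back by two days before asking for the ISO week. ISO weeks start on Monday, so after the shift each Wednesday-to-Tuesday span lands inside one Monday-to-Sunday ISO week, and the year boundary is handled by the calendar itself instead of a modulo. This assumes pandas 1.1 or later for isocalendar(); shifted is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(index=pd.date_range("4/1/2017", freq="3D", periods=10))
df["weekday"] = df.index.day_name()

# Move each date back two days so that Wednesday maps onto Monday,
# then read the ISO week number of the shifted date.
shifted = df.index - pd.Timedelta(days=2)
df["week_wed"] = shifted.isocalendar().week.astype("int64").to_numpy()
print(df)
```

For the sample dates this reproduces the week_wed column shown above.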