Average of consecutive days in dataframe - python

I have a pandas dataframe df as:
Date Val WD
1/3/2019 2.65 Thursday
1/4/2019 2.51 Friday
1/5/2019 2.95 Saturday
1/6/2019 3.39 Sunday
1/7/2019 3.39 Monday
1/12/2019 2.23 Saturday
1/13/2019 2.50 Sunday
1/14/2019 3.62 Monday
1/15/2019 3.81 Tuesday
1/16/2019 3.75 Wednesday
1/17/2019 3.69 Thursday
1/18/2019 3.47 Friday
I need to get the following df2 from above:
Date Val WD
1/3/2019 2.65 Thursday
1/4/2019 2.51 Friday
1/5/2019 3.24 Saturday
1/6/2019 3.24 Sunday
1/7/2019 3.24 Monday
1/12/2019 2.78 Saturday
1/13/2019 2.78 Sunday
1/14/2019 2.78 Monday
1/15/2019 3.81 Tuesday
1/16/2019 3.75 Wednesday
1/17/2019 3.69 Thursday
1/18/2019 3.47 Friday
Here the values in df2 are replaced by the average of each run of consecutive Saturday, Sunday and Monday values.
For example, the average of 2.95, 3.39 and 3.39 (dates 1/5/2019, 1/6/2019 and 1/7/2019 in df) is 3.24, so in df2 I have replaced the values for those three dates with 3.24.
The trick has been finding the consecutive Saturday, Sunday and Monday. Not sure how to approach this.

You can use CustomBusinessDay with pd.Grouper to create a group column:
# if you want to only find the mean if all three days are found
from pandas.tseries.offsets import CustomBusinessDay
days = CustomBusinessDay(weekmask='Tue Wed Thu Fri Sat')
df['group_col'] = df.groupby(pd.Grouper(key='Date', freq=days)).ngroup()
df.update(df[df.groupby('group_col')['Val'].transform('size').eq(3)].groupby('group_col').transform('mean'))
Date Val WD group_col
0 2019-01-03 2.650000 Thursday 0
1 2019-01-04 2.510000 Friday 1
2 2019-01-05 3.243333 Saturday 2
3 2019-01-06 3.243333 Sunday 2
4 2019-01-07 3.243333 Monday 2
5 2019-01-12 2.783333 Saturday 7
6 2019-01-13 2.783333 Sunday 7
7 2019-01-14 2.783333 Monday 7
8 2019-01-15 3.810000 Tuesday 8
9 2019-01-16 3.750000 Wednesday 9
10 2019-01-17 3.690000 Thursday 10
11 2019-01-18 3.470000 Friday 11
Or, if you want the mean of any combination of Sat, Sun and Mon in the same week:
days = CustomBusinessDay(weekmask='Tue Wed Thu Fri Sat')
df['group_col'] = df.groupby(pd.Grouper(key='Date', freq=days)).ngroup()
df['Val'] = df.groupby('group_col')['Val'].transform('mean')
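The grouping works because Sunday and Monday are not valid days under the week mask 'Tue Wed Thu Fri Sat', so they fall into the bin opened by the preceding Saturday. A minimal sketch that makes this explicit with the offset's rollback method, assuming Date is already a datetime column (the `anchor` column name is hypothetical, and this corresponds to the "any combination" variant, not the "all three days" one):

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2019-01-04", "2019-01-05", "2019-01-06", "2019-01-07", "2019-01-15"]
    ),
    "Val": [2.51, 2.95, 3.39, 3.39, 3.81],
})

# Only Tue..Sat count as valid days; Sun and Mon are "off" days.
days = CustomBusinessDay(weekmask="Tue Wed Thu Fri Sat")

# rollback leaves valid days untouched and moves Sun/Mon back to Saturday.
df["anchor"] = df["Date"].map(days.rollback)

# Rows sharing an anchor date are averaged together.
df["Val"] = df.groupby("anchor")["Val"].transform("mean")
print(df)
```

Rows on Tuesday to Friday keep their own date as the anchor, so their values are unchanged.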

This logic creates a Series that assigns a unique ID to groups of consecutive Sat/Sun/Mon rows in your DataFrame. Then ensure there are 3 of them (not just Sat/Sun or Sun/Mon), and transform those values with the mean:
import pandas as pd
#df['Date'] = pd.to_datetime(df.Date)
s = (~(df.Date.dt.dayofweek.isin([0,6])
& (df.Date - df.Date.shift(1)).dt.days.eq(1))).cumsum()
to_trans = s[s.groupby(s).transform('size').eq(3)]
df.loc[to_trans.index, 'Val'] = df.loc[to_trans.index].groupby(to_trans).Val.transform('mean')
Output:
Date Val WD
0 2019-01-03 2.650000 Thursday
1 2019-01-04 2.510000 Friday
2 2019-01-05 3.243333 Saturday
3 2019-01-06 3.243333 Sunday
4 2019-01-07 3.243333 Monday
5 2019-01-12 2.783333 Saturday
6 2019-01-13 2.783333 Sunday
7 2019-01-14 2.783333 Monday
8 2019-01-15 3.810000 Tuesday
9 2019-01-16 3.750000 Wednesday
10 2019-01-17 3.690000 Thursday
11 2019-01-18 3.470000 Friday
12 2019-01-19 3.250000 Saturday
13 2019-01-20 3.250000 Sunday
14 2019-01-21 3.250000 Monday
15 2019-01-22 5.000000 Tuesday
16 2019-01-27 2.000000 Sunday
17 2019-01-28 4.000000 Monday
18 2019-01-29 6.000000 Tuesday
19 2019-02-05 7.000000 Tuesday
20 2019-02-07 6.000000 Thursday
21 2019-02-12 9.000000 Tuesday
Extended Input Data
Date Val WD
1/3/2019 2.65 Thursday
1/4/2019 2.51 Friday
1/5/2019 2.95 Saturday
1/6/2019 3.39 Sunday
1/7/2019 3.39 Monday
1/12/2019 2.23 Saturday
1/13/2019 2.50 Sunday
1/14/2019 3.62 Monday
1/15/2019 3.81 Tuesday
1/16/2019 3.75 Wednesday
1/17/2019 3.69 Thursday
1/18/2019 3.47 Friday
1/19/2019 3.75 Saturday
1/20/2019 2.00 Sunday
1/21/2019 4.00 Monday
1/22/2019 5.00 Tuesday
1/27/2019 2.00 Sunday
1/28/2019 4.00 Monday
1/29/2019 6.00 Tuesday
2/5/2019 7.00 Tuesday
2/7/2019 6.00 Thursday
2/12/2019 9.00 Tuesday
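The run-labelling step in this answer can be unpacked into named intermediate Series. The sketch below uses a few rows from the extended data; the names `continues_run`, `run_id` and `Val_avg` are illustrative, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-04", "2019-01-05", "2019-01-06", "2019-01-07",
        "2019-01-27", "2019-01-28", "2019-01-29",
    ]),
    "Val": [2.51, 2.95, 3.39, 3.39, 2.00, 4.00, 6.00],
})

dow = df["Date"].dt.dayofweek      # Monday=0 ... Sunday=6
gap = df["Date"].diff().dt.days    # days since the previous row

# A Sunday or Monday that directly follows the previous calendar day
# continues the current run; every other row starts a new run.
continues_run = dow.isin([6, 0]) & gap.eq(1)
run_id = (~continues_run).cumsum()

# Only runs of exactly three rows (Sat, Sun, Mon) are averaged.
full = run_id.groupby(run_id).transform("size").eq(3)

df["Val_avg"] = df["Val"]
df.loc[full, "Val_avg"] = (
    df.loc[full].groupby(run_id[full])["Val"].transform("mean")
)
print(df.assign(run_id=run_id))
```

The Sunday/Monday pair on 1/27 and 1/28 forms a run of size 2, so it is left untouched.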

One approach is to calculate a week number, then use groupby to calculate means across specific days and map this back to your original dataframe.
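import numpy as np
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
# consider Monday to belong to previous week
# (dt.week was removed in pandas 2.0; dt.isocalendar().week replaces it)
week, weekday = df['Date'].dt.isocalendar().week.astype(int), df['Date'].dt.weekday
df['Week'] = np.where(weekday.eq(0), week - 1, week)
# take means of Sat, Sun, Mon, then map back
mask = weekday.isin([5, 6, 0])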
week_val_map = df[mask].groupby('Week')['Val'].mean()
df.loc[mask, 'Val'] = df['Week'].map(week_val_map)
print(df)
Date Val WD Week
0 2019-01-03 2.650000 Thursday 1
1 2019-01-04 2.510000 Friday 1
2 2019-01-05 3.243333 Saturday 1
3 2019-01-06 3.243333 Sunday 1
4 2019-01-07 3.243333 Monday 1
5 2019-01-12 2.783333 Saturday 2
6 2019-01-13 2.783333 Sunday 2
7 2019-01-14 2.783333 Monday 2
8 2019-01-15 3.810000 Tuesday 3
9 2019-01-16 3.750000 Wednesday 3
10 2019-01-17 3.690000 Thursday 3
11 2019-01-18 3.470000 Friday 3
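Week-number arithmetic can break at a year boundary: a Monday in ISO week 1 becomes week 0 and no longer matches the Saturday and Sunday in week 52. A possible alternative, sketched here on made-up values, groups by weekly periods that end on Monday ("W-MON"), so Saturday, Sunday and Monday always share a key:

```python
import pandas as pd

# Fri, Sat, Sun, Mon, Tue across the 2018/2019 boundary (illustrative values)
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2018-12-28", "2018-12-29", "2018-12-30", "2018-12-31", "2019-01-01"]
    ),
    "Val": [1.0, 2.0, 3.0, 4.0, 5.0],
})

weekday = df["Date"].dt.weekday
mask = weekday.isin([5, 6, 0])              # Sat, Sun, Mon

# Each "W-MON" period runs from Tuesday to the following Monday.
key = df["Date"].dt.to_period("W-MON")

df.loc[mask, "Val"] = (
    df.loc[mask].groupby(key[mask])["Val"].transform("mean")
)
print(df)
```

Like the week-number approach, this averages whichever of the three days are present; it does not require all three.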

Related

Business Days Resample with Offset

I'm trying to resample daily frequency data to business days using the Pandas resample function with an offset so the last day of the week becomes Thursday and the beginning Sunday.
This is the code so far:
import pandas as pd
resampled_data = df.resample('B', base=-1)
But it keeps resampling so Friday is being used in the resample and Sunday is excluded. I tried many different values for base and loffset but it's not affecting the resampling.
Please note: The raw data is using UTC timestamps. Timezone is Eastern Daylight Time. Sunday UTC 21:00 - Thursday UTC 21:00.
Use a CustomBusinessDay(). I've resampled the whole of Jan which includes Fri / Sat and also included day_name() and dayofweek to show it has worked.
import datetime as dt
df = pd.DataFrame(index=pd.date_range(dt.datetime(2020,1,1), dt.datetime(2020,2,1)))
bd = pd.tseries.offsets.CustomBusinessDay(n=1,
weekmask="Sun Mon Tue Wed Thu")
df = df.resample(rule=bd).first().assign(
day=lambda dfa: dfa.index.day_name(),
dn=lambda dfa: dfa.index.dayofweek
)
output
day dn
2020-01-01 Wednesday 2
2020-01-02 Thursday 3
2020-01-05 Sunday 6
2020-01-06 Monday 0
2020-01-07 Tuesday 1
2020-01-08 Wednesday 2
2020-01-09 Thursday 3
2020-01-12 Sunday 6
2020-01-13 Monday 0
2020-01-14 Tuesday 1
2020-01-15 Wednesday 2
2020-01-16 Thursday 3
2020-01-19 Sunday 6
2020-01-20 Monday 0
2020-01-21 Tuesday 1
2020-01-22 Wednesday 2
2020-01-23 Thursday 3
2020-01-26 Sunday 6
2020-01-27 Monday 0
2020-01-28 Tuesday 1
2020-01-29 Wednesday 2
2020-01-30 Thursday 3
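If only the Sunday to Thursday calendar is needed, without aggregation, the same week mask can be passed to pd.bdate_range and the daily data reindexed onto it. A small sketch with made-up values (the `daily` and `work_week` names are illustrative):

```python
import pandas as pd

# Daily series covering 1-12 January 2020 (illustrative values)
daily = pd.Series(
    range(12), index=pd.date_range("2020-01-01", "2020-01-12", freq="D")
)

# freq="C" enables custom business days; weekmask lists the working days.
idx = pd.bdate_range(
    "2020-01-01", "2020-01-12", freq="C", weekmask="Sun Mon Tue Wed Thu"
)

# Keep only the rows that fall on the custom working days.
work_week = daily.reindex(idx)
print(work_week.index.day_name().tolist())
```

Friday and Saturday are dropped and Sunday is kept, matching the resampled output above.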

Plot line plot per weekday and week number

I have the following data. This represents the number of occurrences in January:
date value WeekDay WeekNo Year Month
2018-01-01 214.0 Monday 1 2018 1
2018-01-02 232.0 Tuesday 1 2018 1
2018-01-03 147.0 Wed 1 2018 1
2018-01-04 257.0 Thursd 1 2018 1
2018-01-05 164.0 Friday 1 2018 1
2018-01-06 187.0 Saturd 1 2018 1
2018-01-07 201.0 Sunday 1 2018 1
2018-01-08 141.0 Monday 2 2018 1
2018-01-09 152.0 Tuesday 2 2018 1
2018-01-10 167.0 Wednesd 2 2018 1
2018-01-15 113.0 Monday 3 2018 1
2018-01-16 139.0 Tuesday 3 2018 1
2018-01-17 159.0 Wednesd 3 2018 1
2018-01-18 202.0 Thursd 3 2018 1
2018-01-19 207.0 Friday 3 2018 1
... ... ... ... ...
WeekNo is the number of the week in a year.
My goal is to have a line plot showing the evolution of occurrences, for this particular month, per week number. Therefore, I'd like to have the weekday in the x-axis, the occurrences on the y-axis and different lines, each with a different color, for each week (and a legend with the color that corresponds to each week).
Does anyone have any idea how this could be done? Thanks a lot!
You can first reshape your dataframe to a format where the columns are the week number and one row per weekday. Then, use the plot pandas method:
reshaped = (df
.assign(date=lambda f: pd.to_datetime(f.date))
.assign(dayofweek=lambda f: f.date.dt.dayofweek,
dayname=lambda f: f.date.dt.day_name())
.set_index(['dayofweek', 'dayname', 'WeekNo'])
.value
.unstack()
.reset_index(0, drop=True))
print(reshaped)
reshaped.plot(marker='x')
WeekNo 1 2 3
dayname
Monday 214.0 141.0 113.0
Tuesday 232.0 152.0 139.0
Wednesday 147.0 167.0 159.0
Thursday 257.0 NaN 202.0
Friday 164.0 NaN 207.0
Saturday 187.0 NaN NaN
Sunday 201.0 NaN NaN
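The same reshape can also be written with pivot plus an explicit weekday order. This is a sketch on the first ten rows of the question's data, not the answer's original code; the plot call is left commented out because it needs matplotlib:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=10, freq="D"),
    "value": [214.0, 232.0, 147.0, 257.0, 164.0,
              187.0, 201.0, 141.0, 152.0, 167.0],
    "WeekNo": [1] * 7 + [2] * 3,
})

order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]

reshaped = (
    df.assign(dayname=df["date"].dt.day_name())
      .pivot(index="dayname", columns="WeekNo", values="value")
      .reindex(order)          # rows in calendar order, not alphabetical
)
print(reshaped)
# reshaped.plot(marker="x")   # one line per week number
```

Weekdays missing from a week show up as NaN, which the line plot simply skips.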

Reorder day of week in pandas groupby plot bar

I have sorted df data like below:
day_name Day_id
time
2019-05-20 19:00:00 Monday 0
2018-12-31 15:00:00 Monday 0
2019-02-25 17:00:00 Monday 0
2019-05-06 20:00:00 Monday 0
2019-03-12 12:00:00 Tuesday 1
2019-04-16 15:00:00 Tuesday 1
2019-04-02 18:00:00 Tuesday 1
2019-02-05 09:00:00 Tuesday 1
2019-05-28 21:00:00 Tuesday 1
2019-01-15 12:00:00 Tuesday 1
2019-06-04 20:00:00 Tuesday 1
2018-12-04 07:00:00 Tuesday 1
2019-01-22 11:00:00 Tuesday 1
2019-01-09 07:00:00 Wednesday 2
2019-03-06 16:00:00 Wednesday 2
2019-06-19 17:00:00 Wednesday 2
2019-04-10 20:00:00 Wednesday 2
2019-04-24 15:00:00 Wednesday 2
2019-01-31 08:00:00 Thursday 3
2019-01-03 08:00:00 Thursday 3
2019-02-28 19:00:00 Thursday 3
2019-05-23 20:00:00 Thursday 3
2018-12-20 07:00:00 Thursday 3
2019-05-09 19:00:00 Thursday 3
2019-06-28 15:00:00 Friday 4
2019-03-22 12:00:00 Friday 4
2019-03-29 14:00:00 Friday 4
2018-12-15 08:00:00 Saturday 5
2019-02-17 11:00:00 Sunday 6
2019-06-16 19:00:00 Sunday 6
2018-12-02 08:00:00 Sunday 6
Currently, with the help of this post:
df = df.groupby(df.day_name).count().plot(kind="bar")
plt.show()
my output is:
How to plot histogram with days of week in proper order like: Monday, Tuesday ...?
I have found several approaches (1, 2, 3) to solve this, but can't find a way to use them in my case.
Thank You all for hard work.
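Your data is already sorted by weekday, so pass sort=False to groupby to keep the groups in order of first appearance instead of sorting them alphabetically: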
m = df.groupby(df.day_name,sort=False).count().plot(kind="bar")
plt.show()
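When the rows are not already sorted by weekday, sort=False is not enough. One option, sketched here with a tiny made-up frame, is to count the names and reindex onto an explicit weekday order:

```python
import pandas as pd

# Unsorted day names (illustrative data)
df = pd.DataFrame(
    {"day_name": ["Tuesday", "Monday", "Sunday", "Monday", "Friday"]}
)

order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]

# Count occurrences, then force calendar order; absent days get 0.
counts = df["day_name"].value_counts().reindex(order, fill_value=0)
print(counts)
# counts.plot(kind="bar")   # requires matplotlib
```

Days with no records still appear on the axis with a count of zero, which keeps the bars aligned across plots.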

Pandas: Check whether the particular day is in index at regular interval and if not mark the one before entry as something?

I am still very new to pandas and just figured out I have made a mistake in the process I was following earlier.
df_date
Date day
0 2016-05-26 Thursday
1 2016-05-27 Friday
2 2016-05-30 Monday
3 2016-05-31 Tuesday
4 2016-06-01 Wednesday
5 2016-06-02 Thursday
6 2016-06-03 Friday
7 2016-06-06 Monday
8 2016-06-07 Tuesday
9 2016-06-08 Wednesday
10 2016-06-09 Thursday
11 2016-06-10 Friday
12 2016-06-13 Monday
13 2016-06-14 Tuesday
14 2016-06-15 Wednesday
15 2016-06-16 Thursday
16 2016-06-17 Friday
17 2016-06-20 Monday
18 2016-06-21 Tuesday
19 2016-06-22 Wednesday
20 2016-06-24 Friday
21 2016-06-27 Monday
22 2016-06-28 Tuesday
23 2016-06-29 Wednesday
There are about 600+ rows.
What I want to do
Make a column 'Exit' where, if Thursday is not in the week, the Wednesday row gets E, and if Wednesday is not there either, then Tuesday.
I tried a for loop and I just can't seem to get this right.
Expected Output:
df_date
Date day Exit
0 2016-05-26 Thursday E
1 2016-05-27 Friday
2 2016-05-30 Monday
3 2016-05-31 Tuesday
4 2016-06-01 Wednesday
5 2016-06-02 Thursday E
6 2016-06-03 Friday
7 2016-06-06 Monday
8 2016-06-07 Tuesday
9 2016-06-08 Wednesday
10 2016-06-09 Thursday E
11 2016-06-10 Friday
12 2016-06-13 Monday
13 2016-06-14 Tuesday
14 2016-06-15 Wednesday
15 2016-06-16 Thursday E
16 2016-06-17 Friday
17 2016-06-20 Monday
18 2016-06-21 Tuesday
19 2016-06-22 Wednesday E
20 2016-06-24 Friday
21 2016-06-27 Monday
22 2016-06-28 Tuesday
23 2016-06-29 Wednesday E
I added this in comments but should be here as well:
If Thursday is not present then the record just before it.
So if Wednesday is also not present in the week, then Tuesday
If Tuesday is also missing then Monday; if Monday is missing then Friday. Saturday and Sunday will never have a record.
Here's a solution:
ix = (df.groupby(pd.Grouper(key='Date', freq='W')).Date
        .apply(lambda x: (x.dt.dayofweek <= 3)[::-1].idxmax()).values)
df.loc[ix,'Exit'] = 'E'
df.fillna('')
Date day Exit
0 2016-05-26 Thursday E
1 2016-05-27 Friday
2 2016-05-30 Monday
3 2016-05-31 Tuesday
4 2016-06-01 Wednesday
5 2016-06-02 Thursday E
6 2016-06-03 Friday
7 2016-06-06 Monday
8 2016-06-07 Tuesday
9 2016-06-08 Wednesday
10 2016-06-09 Thursday E
11 2016-06-10 Friday
12 2016-06-13 Monday
13 2016-06-14 Tuesday
14 2016-06-15 Wednesday
15 2016-06-16 Thursday E
16 2016-06-17 Friday
17 2016-06-20 Monday
18 2016-06-21 Tuesday
19 2016-06-22 Wednesday
20 2016-06-23 Thursday E
21 2016-06-24 Friday
22 2016-06-27 Monday
23 2016-06-28 Tuesday
24 2016-06-29 Wednesday E
You can use the ISO week number (dt.isocalendar().week; the older dt.week was removed in pandas 2.0) and the dt.weekday property of your datetime series. Then use groupby + max for your required logic. This is likely to be more efficient than sequential equality checks.
import numpy as np
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
# add week and weekday series
df['Week'] = df['Date'].dt.isocalendar().week
df['Weekday'] = df['Date'].dt.weekday.where(df['Date'].dt.weekday.isin([1, 2, 3]))
df['Exit'] = np.where(df['Weekday'] == df.groupby('Week')['Weekday'].transform('max'),
'E', '')
Result
I have left the helper columns so the way the solution works is clear. These can easily be removed.
print(df)
Date day Week Weekday Exit
0 2016-05-26 Thursday 21 3.0 E
1 2016-05-27 Friday 21 NaN
2 2016-05-30 Monday 22 NaN
3 2016-05-31 Tuesday 22 1.0
4 2016-06-01 Wednesday 22 2.0
5 2016-06-02 Thursday 22 3.0 E
6 2016-06-03 Friday 22 NaN
7 2016-06-06 Monday 23 NaN
8 2016-06-07 Tuesday 23 1.0
9 2016-06-08 Wednesday 23 2.0
10 2016-06-09 Thursday 23 3.0 E
11 2016-06-10 Friday 23 NaN
12 2016-06-13 Monday 24 NaN
13 2016-06-14 Tuesday 24 1.0
14 2016-06-15 Wednesday 24 2.0
15 2016-06-16 Thursday 24 3.0 E
16 2016-06-17 Friday 24 NaN
17 2016-06-20 Monday 25 NaN
18 2016-06-21 Tuesday 25 1.0
19 2016-06-22 Wednesday 25 2.0 E
20 2016-06-24 Friday 25 NaN
21 2016-06-27 Monday 26 NaN
22 2016-06-28 Tuesday 26 1.0
23 2016-06-29 Wednesday 26 2.0 E
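Both answers reduce to "take the last row on or before Thursday in each week". A compact sketch of that idea using weekly periods, assuming the frame is sorted by Date; it includes Monday as a fallback but does not cover the "only Friday is present" case from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-06-20", "2016-06-21", "2016-06-22", "2016-06-24",
        "2016-06-27", "2016-06-28", "2016-06-29",
    ])
})

# Monday-to-Sunday weekly periods, used purely as a grouping key
week = df["Date"].dt.to_period("W").rename("week")

# Candidate rows: Monday (0) through Thursday (3)
cand = df[df["Date"].dt.weekday <= 3]

# Last candidate row of each week
last_ix = cand.groupby(week[cand.index]).tail(1).index

df["Exit"] = ""
df.loc[last_ix, "Exit"] = "E"
print(df)
```

Grouping by period rather than by week number avoids clashes when the data spans more than one year.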

Creating week variable in pandas with custom start day for week

I have the following pandas dataframe indexed to a Time_Stamp:
df = pd.DataFrame(index = pd.date_range('4/1/2017', freq='3D', periods=10))
df['weekday'] = df.index.day_name()
Data looks like this:
weekday
2017-04-01 Saturday
2017-04-04 Tuesday
2017-04-07 Friday
2017-04-10 Monday
2017-04-13 Thursday
2017-04-16 Sunday
2017-04-19 Wednesday
2017-04-22 Saturday
2017-04-25 Tuesday
2017-04-28 Friday
I want to create a new column 'week' that will give the week ordinal of the year, but with a custom first day of the week.
I know I can just do this:
df['week_sun'] = df.index.isocalendar().week
Except I want the first day of the week to be something besides Sunday. For this question, lets say I need it to be Wednesday so that the resulting dataframe would be like so:
weekday week_sun week_wed
2017-04-01 Saturday 13 13
2017-04-04 Tuesday 14 13
2017-04-07 Friday 14 14
2017-04-10 Monday 15 14
2017-04-13 Thursday 15 15
2017-04-16 Sunday 15 15
2017-04-19 Wednesday 16 16
2017-04-22 Saturday 16 16
2017-04-25 Tuesday 17 16
2017-04-28 Friday 17 17
I'm at a loss to how to achieve this. Thanks!
Given your requirements, you only need to subtract 1 from the week number when the day of the week is "before" the reference day (Wednesday in your example).
In [162]: df
Out[162]:
weekday week_sun
2017-04-01 Saturday 13
2017-04-04 Tuesday 14
2017-04-07 Friday 14
2017-04-10 Monday 15
2017-04-13 Thursday 15
2017-04-16 Sunday 15
2017-04-19 Wednesday 16
2017-04-22 Saturday 16
2017-04-25 Tuesday 17
2017-04-28 Friday 17
In [163]: df['week_wed'] = df['week_sun']
Let's now shift the value where needed, meaning when the weekday is before Wednesday, hence df.index.dayofweek < 2.
In [164]: df.loc[df.index.dayofweek < 2, 'week_wed'] = (df[df.index.dayofweek < 2]['week_sun'] - 2) % 52 + 1
In [165]: df
Out[165]:
weekday week_sun week_wed
2017-04-01 Saturday 13 13
2017-04-04 Tuesday 14 13
2017-04-07 Friday 14 14
2017-04-10 Monday 15 14
2017-04-13 Thursday 15 15
2017-04-16 Sunday 15 15
2017-04-19 Wednesday 16 16
2017-04-22 Saturday 16 16
2017-04-25 Tuesday 17 16
2017-04-28 Friday 17 17
I didn't exactly subtract 1, but instead used a modulo operation ((X - 2) % 52 + 1) so that week 1 is converted to week 52 of the previous year when needed:
weekday week_sun week_wed
2017-12-27 Wednesday 52 52
2017-12-30 Saturday 52 52
2018-01-02 Tuesday 1 52
2018-01-05 Friday 1 1
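An equivalent way to get a Wednesday-based week, sketched here as an alternative to the modulo adjustment, is to shift every date back by two days and take the ISO week of the shifted date. ISO weeks start on Monday, so Wednesday minus two days lands on the first day of an ISO week:

```python
import pandas as pd

df = pd.DataFrame(index=pd.date_range("4/1/2017", freq="3D", periods=10))
df["weekday"] = df.index.day_name()

# Wednesday - 2 days = Monday, so the ISO week of the shifted date
# changes exactly when a new Wednesday-based week begins.
shifted = df.index - pd.Timedelta(days=2)

# isocalendar() is indexed by the shifted dates, so assign by position.
df["week_wed"] = shifted.isocalendar().week.astype("int64").to_numpy()
print(df)

# Year boundary: Tuesday 2018-01-02 shifts to Sunday 2017-12-31 (ISO week 52)
edge = (pd.Timestamp("2018-01-02") - pd.Timedelta(days=2)).isocalendar()[1]
print(edge)
```

For a different start day, the shift would be that day's dayofweek value (Monday=0), under the same assumption that ISO week numbering is the desired baseline.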
