I have the following data. This represents the number of occurrences in January:
date value WeekDay WeekNo Year Month
2018-01-01 214.0 Monday 1 2018 1
2018-01-02 232.0 Tuesday 1 2018 1
2018-01-03 147.0 Wednesday 1 2018 1
2018-01-04 257.0 Thursday 1 2018 1
2018-01-05 164.0 Friday 1 2018 1
2018-01-06 187.0 Saturday 1 2018 1
2018-01-07 201.0 Sunday 1 2018 1
2018-01-08 141.0 Monday 2 2018 1
2018-01-09 152.0 Tuesday 2 2018 1
2018-01-10 167.0 Wednesday 2 2018 1
2018-01-15 113.0 Monday 3 2018 1
2018-01-16 139.0 Tuesday 3 2018 1
2018-01-17 159.0 Wednesday 3 2018 1
2018-01-18 202.0 Thursday 3 2018 1
2018-01-19 207.0 Friday 3 2018 1
... ... ... ... ...
WeekNo is the number of the week in a year.
My goal is to have a line plot showing the evolution of occurrences, for this particular month, per week number. Therefore, I'd like to have the weekday on the x-axis, the occurrences on the y-axis and different lines, each with a different color, for each week (and a legend with the color that corresponds to each week).
Does anyone have any idea how this could be done? Thanks a lot!
You can first reshape your dataframe to a format where the columns are the week numbers, with one row per weekday. Then use the pandas plot method:
reshaped = (df
            .assign(date=lambda f: pd.to_datetime(f.date))
            .assign(dayofweek=lambda f: f.date.dt.dayofweek,
                    dayname=lambda f: f.date.dt.day_name())  # dt.weekday_name was removed in pandas 1.0
            .set_index(['dayofweek', 'dayname', 'WeekNo'])
            .value
            .unstack()
            .reset_index(0, drop=True))
print(reshaped)
reshaped.plot(marker='x')
WeekNo 1 2 3
dayname
Monday 214.0 141.0 113.0
Tuesday 232.0 152.0 139.0
Wednesday 147.0 167.0 159.0
Thursday 257.0 NaN 202.0
Friday 164.0 NaN 207.0
Saturday 187.0 NaN NaN
Sunday 201.0 NaN NaN
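The plot call above already draws one colored line per WeekNo column and builds the matching legend. If you also want explicit axis labels and a legend title, a small optional refinement using the matplotlib Axes that pandas returns:
import matplotlib.pyplot as plt

ax = reshaped.plot(marker='x')
ax.set_xlabel('Weekday')
ax.set_ylabel('Occurrences')
ax.legend(title='Week')
plt.show()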
I have a time-series dataset with the amounts of released gases at each time step, as follows. The data is monitored day by day: the Date column holds the sampling time, and the other columns hold the amounts of released gas.
import pandas as pd
from statistics import mean
import numpy as np
Data = pd.read_csv('PTR 69.csv')
Data.columns = ['Date', 'H2', 'CH4', 'C2H6', 'C2H4', 'C2H2', 'CO', 'CO2', 'O2']
Data.dropna(how='all', axis=1, inplace=True)
Data.head()
It looks like this:
Date H2 CH4 C2H6 C2H4 C2H2 CO CO2 O2
0 2021-04-14 2:00 8.301259 10.889560 7.205929 3.485577 0.108262 318.616211 1659.179688 866.826721
1 2021-04-13 3:00 8.190150 10.224614 7.369829 3.561115 0.130052 318.895599 1641.014526 883.500305
2 2021-04-12 4:00 8.223248 10.297009 7.571199 3.479434 0.113566 315.364594 1636.670776 896.083679
3 2021-04-11 5:00 8.342580 10.233653 7.726023 3.474085 0.234786 316.315277 1641.205078 875.664856
4 2021-04-10 6:00 8.365788 9.825816 7.640978 3.621368 0.320388 320.200409 1658.575806 880.871399
5 2021-04-09 7:00 8.113251 11.198173 7.588203 3.561790 0.200721 318.738922 1651.639038 886.923401
6 2021-04-08 8:00 7.881397 7.967482 7.382273 3.528960 0.180016 315.252838 1625.236328 878.604309
7 2021-04-07 9:00 7.833044 6.773924 7.292545 3.475330 0.401435 317.085449 1628.325562 893.305664
8 2021-04-06 10:00 7.908926 9.419571 7.018494 3.347562 0.406113 317.643768 1620.742554 912.732422
9 2021-04-05 11:00 8.192807 9.262563 7.227449 3.275920 0.133978 312.931152 1601.240845 932.079102
10 2021-04-04 12:00 8.086914 9.480316 6.515196 3.312712 0.000000 315.486816 1609.530884 928.141907
11 2021-04-03 13:00 7.984566 9.406860 6.712120 3.476949 0.336859 312.862793 1596.182495 938.904724
12 2021-04-02 14:00 8.077889 8.335327 7.443592 3.605910 0.416443 315.546539 1605.549438 928.619568
13 2021-04-01 15:00 7.996786 9.087573 7.950811 3.626776 0.745824 311.601471 1608.987183 897.747498
14 2021-03-31 16:00 8.433417 10.078784 6.567528 3.646854 0.682301 313.811615 1619.164673 825.123596
15 2021-03-30 17:00 8.445275 9.768773 7.460344 3.712297 0.353539 314.944672 1606.494751 811.027161
16 2021-03-29 18:00 8.398427 9.607062 7.446943 3.674934 0.287205 314.554596 1599.793823 828.780090
17 2021-03-28 19:00 8.272332 9.678397 7.303371 3.617573 0.430137 311.486664 1590.122192 828.557312
18 2021-03-27 20:00 8.478241 9.364383 7.153194 3.616118 0.548547 314.538849 1578.516235 821.565125
19 2021-03-26 21:00 8.452413 10.828227 6.825691 3.260484 0.642971 314.990082 1561.811890 826.468079
First I used [pd.to_datetime] and separated the data frame based on the month and year, as you can see:
Data['Date'] = pd.to_datetime(Data['Date'])
# How long is the dataset?
Data['Date'].max() - Data['Date'].min()
Results:
Timedelta('1364 days 12:49:00')
Data['Month'] = Data['Date'].dt.month
Data['Year'] = Data['Date'].dt.year
Data.head()
The result looks like this:
Date H2 CH4 C2H6 C2H4 C2H2 CO CO2 O2 Month Year
0 2021-04-14 02:00:00 8.301259 10.889560 7.205929 3.485577 0.108262 318.616211 1659.179688 866.826721 4 2021
1 2021-04-13 03:00:00 8.190150 10.224614 7.369829 3.561115 0.130052 318.895599 1641.014526 883.500305 4 2021
2 2021-04-12 04:00:00 8.223248 10.297009 7.571199 3.479434 0.113566 315.364594 1636.670776 896.083679 4 2021
3 2021-04-11 05:00:00 8.342580 10.233653 7.726023 3.474085 0.234786 316.315277 1641.205078 875.664856 4 2021
4 2021-04-10 06:00:00 8.365788 9.825816 7.640978 3.621368 0.320388 320.200409 1658.575806 880.871399 4 2021
5 2021-04-09 07:00:00 8.113251 11.198173 7.588203 3.561790 0.200721 318.738922 1651.639038 886.923401 4 2021
6 2021-04-08 08:00:00 7.881397 7.967482 7.382273 3.528960 0.180016 315.252838 1625.236328 878.604309 4 2021
7 2021-04-07 09:00:00 7.833044 6.773924 7.292545 3.475330 0.401435 317.085449 1628.325562 893.305664 4 2021
8 2021-04-06 10:00:00 7.908926 9.419571 7.018494 3.347562 0.406113 317.643768 1620.742554 912.732422 4 2021
9 2021-04-05 11:00:00 8.192807 9.262563 7.227449 3.275920 0.133978 312.931152 1601.240845 932.079102 4 2021
So, two other columns [Month] and [Year] are added to the data frame.
My question: how can I calculate the rate of H2 changes over a month?
I know that first I should calculate the mean of H2 in each month of each year, as my data is a time series.
Mean_month = Data.set_index('Date').groupby(pd.Grouper(freq='M'))['H2'].mean().reset_index()
I repeated the previous steps, converting the date with [pd.to_datetime]:
Mean_month['Date'] = pd.to_datetime(Mean_month['Date'])
Mean_month['Month_mean'] = Mean_month['Date'].dt.month
Mean_month['Year_mean'] = Mean_month['Date'].dt.year
Mean_month.head()
It looks like this:
Date H2 CH4 C2H2 C2H4 C2H6 CO CO2 O2 Month_mean Year_mean
0 2017-07-31 0.892207 0.797776 0.572518 0.119328 0.203212 23.137884 230.986328 1756.658813 7 2017
1 2017-08-31 NaN NaN NaN NaN NaN NaN NaN NaN 8 2017
2 2017-09-30 NaN NaN NaN NaN NaN NaN NaN NaN 9 2017
3 2017-10-31 NaN NaN NaN NaN NaN NaN NaN NaN 10 2017
4 2017-11-30 NaN NaN NaN NaN NaN NaN NaN NaN 11 2017
5 2017-12-31 NaN NaN NaN NaN NaN NaN NaN NaN 12 2017
6 2018-01-31 NaN NaN NaN NaN NaN NaN NaN NaN 1 2018
7 2018-02-28 NaN NaN NaN NaN NaN NaN NaN NaN 2 2018
8 2018-03-31 NaN NaN NaN NaN NaN NaN NaN NaN 3 2018
9 2018-04-30 NaN NaN NaN NaN NaN NaN NaN NaN 4 2018
10 2018-05-31 NaN NaN NaN NaN NaN NaN NaN NaN 5 2018
11 2018-06-30 3.376091 1.780959 0.488345 0.431397 1.777461 59.424690 246.135108 2927.244192 6 2018
12 2018-07-31 3.785872 1.710799 0.479277 0.405084 2.416031 63.220747 256.035651 2971.905932 7 2018
13 2018-08-31 3.789915 1.874313 0.444453 0.339609 2.516580 67.629768 264.437564 3016.440033 8 2018
14 2018-09-30 3.882403 1.842717 0.443967 0.342131 2.848867 71.592693 271.972792 3073.598901 9 2018
15 2018-10-31 3.858354 2.037401 0.364234 0.358209 2.651448 75.036622 274.889362 3150.082060 10 2018
16 2018-11-30 3.861638 1.854492 0.276273 0.289241 2.813399 78.563868 289.631986 3176.243186 11 2018
17 2018-12-31 5.029865 2.526096 0.232814 0.510899 3.423260 95.641880 409.359902 2831.721010 12 2018
18 2019-01-31 6.103601 2.528294 0.177558 0.612607 4.039948 116.639744 516.362618 2423.434258 1 2019
19 2019-02-28 7.480646 3.316433 0.239254 0.959470 5.319684 142.571229 662.409360 1877.447767 2 2019
20 2019-03-31 8.363644 3.779225 0.213011 1.171834 6.179431 167.295488 815.904473 1415.431158 3 2019
21 2019-04-30 9.523452 4.620810 0.233048 1.703750 8.359211 195.914846 1044.554593 898.940531 4 2019
22 2019-05-31 10.118435 5.524447 0.311802 1.904199 9.275237 213.531002 1178.495602 657.617859 5 2019
23 2019-06-30 10.283766 6.186843 0.377420 2.165453 10.729356 226.061226 1226.489872 589.417023 6 2019
24 2019-07-31 9.943331 6.648062 0.492584 2.326774 11.791042 234.309877 1257.822071 572.162592 7 2019
25 2019-08-31 9.812387 6.681962 0.510871 2.483979 13.067311 243.440762 1302.643938 568.994610 8 2019
26 2019-09-30 9.661653 7.323367 0.420726 2.628199 13.308826 252.133648 1383.259943 550.533951 9 2019
27 2019-10-31 9.246261 7.644706 0.372446 2.673924 13.880747 257.093790 1407.996110 565.502500 10 2019
28 2019-11-30 8.226894 6.606762 0.411812 2.290050 12.958136 257.590110 1306.817593 654.086494 11 2019
29 2019-12-31 7.985734 7.461197 0.314830 2.417687 13.255049 259.519881 1309.507549 684.085808 12 2019
30 2020-01-31 7.754674 7.804206 0.336518 2.506526 13.554615 262.188585 1312.052006 700.065050 1 2020
31 2020-02-29 7.662918 7.607357 0.283796 2.483387 13.803671 264.348120 1300.252926 710.281917 2 2020
32 2020-03-31 7.602619 8.326974 0.278294 2.629290 13.983202 268.429411 1351.023144 698.012543 3 2020
33 2020-04-30 7.585870 8.028798 0.389348 2.856049 15.635886 273.859451 1426.279447 703.866225 4 2020
34 2020-05-31 7.752543 8.622809 0.329810 2.974434 16.470193 279.636700 1484.100789 685.164897 5 2020
35 2020-06-30 7.935418 8.632543 0.408732 3.410121 18.330232 287.545439 1593.554077 653.294214 6 2020
36 2020-07-31 8.226212 9.180892 0.474289 3.646311 19.746735 295.059049 1688.793476 613.164837 7 2020
37 2020-08-31 8.535027 9.583940 0.517722 3.860195 20.853958 303.025472 1759.655769 597.264223 8 2020
38 2020-09-30 8.782468 9.318198 0.447619 3.780273 21.613501 309.644693 1790.096266 594.891798 9 2020
39 2020-10-31 8.766880 17.531840 0.436720 3.671641 21.794714 312.511920 1783.446248 622.681765 10 2020
40 2020-11-30 8.535022 9.695740 0.427224 3.352291 11.561881 311.624202 1676.413354 713.680609 11 2020
41 2020-12-31 8.374398 9.114723 0.340198 3.351321 6.768138 312.902290 1642.077442 766.767532 12 2020
42 2021-01-31 8.238818 9.373566 0.344173 3.372903 6.670032 313.475182 1604.747685 788.205679 1 2021
43 2021-02-28 8.191080 9.900578 0.334562 3.352319 6.802692 314.076140 1572.294619 815.143081 2 2021
44 2021-03-31 8.317389 9.627182 0.385551 3.209554 5.862067 312.134351 1484.145511 867.169165 3 2021
45 2021-04-30 8.107043 9.457317 0.266317 3.488106 7.331760 316.181560 1627.434300 900.000397 4 2021
As the [Mean_month] data frame is sorted ascending, I re-sorted it descending with:
Srt_Mean = Mean_month.sort_values(['Date'], ascending=False)
Srt_Mean
the results are:
Date H2 CH4 C2H2 C2H4 C2H6 CO CO2 O2 Month_mean Year_mean
45 2021-04-30 8.107043 9.457317 0.266317 3.488106 7.331760 316.181560 1627.434300 900.000397 4 2021
44 2021-03-31 8.317389 9.627182 0.385551 3.209554 5.862067 312.134351 1484.145511 867.169165 3 2021
43 2021-02-28 8.191080 9.900578 0.334562 3.352319 6.802692 314.076140 1572.294619 815.143081 2 2021
42 2021-01-31 8.238818 9.373566 0.344173 3.372903 6.670032 313.475182 1604.747685 788.205679 1 2021
41 2020-12-31 8.374398 9.114723 0.340198 3.351321 6.768138 312.902290 1642.077442 766.767532 12 2020
40 2020-11-30 8.535022 9.695740 0.427224 3.352291 11.561881 311.624202 1676.413354 713.680609 11 2020
39 2020-10-31 8.766880 17.531840 0.436720 3.671641 21.794714 312.511920 1783.446248 622.681765 10 2020
38 2020-09-30 8.782468 9.318198 0.447619 3.780273 21.613501 309.644693 1790.096266 594.891798 9 2020
37 2020-08-31 8.535027 9.583940 0.517722 3.860195 20.853958 303.025472 1759.655769 597.264223 8 2020
36 2020-07-31 8.226212 9.180892 0.474289 3.646311 19.746735 295.059049 1688.793476 613.164837 7 2020
35 2020-06-30 7.935418 8.632543 0.408732 3.410121 18.330232 287.545439 1593.554077 653.294214 6 2020
34 2020-05-31 7.752543 8.622809 0.329810 2.974434 16.470193 279.636700 1484.100789 685.164897 5 2020
33 2020-04-30 7.585870 8.028798 0.389348 2.856049 15.635886 273.859451 1426.279447 703.866225 4 2020
32 2020-03-31 7.602619 8.326974 0.278294 2.629290 13.983202 268.429411 1351.023144 698.012543 3 2020
31 2020-02-29 7.662918 7.607357 0.283796 2.483387 13.803671 264.348120 1300.252926 710.281917 2 2020
30 2020-01-31 7.754674 7.804206 0.336518 2.506526 13.554615 262.188585 1312.052006 700.065050 1 2020
29 2019-12-31 7.985734 7.461197 0.314830 2.417687 13.255049 259.519881 1309.507549 684.085808 12 2019
28 2019-11-30 8.226894 6.606762 0.411812 2.290050 12.958136 257.590110 1306.817593 654.086494 11 2019
27 2019-10-31 9.246261 7.644706 0.372446 2.673924 13.880747 257.093790 1407.996110 565.502500 10 2019
26 2019-09-30 9.661653 7.323367 0.420726 2.628199 13.308826 252.133648 1383.259943 550.533951 9 2019
25 2019-08-31 9.812387 6.681962 0.510871 2.483979 13.067311 243.440762 1302.643938 568.994610 8 2019
24 2019-07-31 9.943331 6.648062 0.492584 2.326774 11.791042 234.309877 1257.822071 572.162592 7 2019
23 2019-06-30 10.283766 6.186843 0.377420 2.165453 10.729356 226.061226 1226.489872 589.417023 6 2019
22 2019-05-31 10.118435 5.524447 0.311802 1.904199 9.275237 213.531002 1178.495602 657.617859 5 2019
21 2019-04-30 9.523452 4.620810 0.233048 1.703750 8.359211 195.914846 1044.554593 898.940531 4 2019
20 2019-03-31 8.363644 3.779225 0.213011 1.171834 6.179431 167.295488 815.904473 1415.431158 3 2019
19 2019-02-28 7.480646 3.316433 0.239254 0.959470 5.319684 142.571229 662.409360 1877.447767 2 2019
18 2019-01-31 6.103601 2.528294 0.177558 0.612607 4.039948 116.639744 516.362618 2423.434258 1 2019
17 2018-12-31 5.029865 2.526096 0.232814 0.510899 3.423260 95.641880 409.359902 2831.721010 12 2018
16 2018-11-30 3.861638 1.854492 0.276273 0.289241 2.813399 78.563868 289.631986 3176.243186 11 2018
15 2018-10-31 3.858354 2.037401 0.364234 0.358209 2.651448 75.036622 274.889362 3150.082060 10 2018
14 2018-09-30 3.882403 1.842717 0.443967 0.342131 2.848867 71.592693 271.972792 3073.598901 9 2018
13 2018-08-31 3.789915 1.874313 0.444453 0.339609 2.516580 67.629768 264.437564 3016.440033 8 2018
12 2018-07-31 3.785872 1.710799 0.479277 0.405084 2.416031 63.220747 256.035651 2971.905932 7 2018
11 2018-06-30 3.376091 1.780959 0.488345 0.431397 1.777461 59.424690 246.135108 2927.244192 6 2018
10 2018-05-31 NaN NaN NaN NaN NaN NaN NaN NaN 5 2018
9 2018-04-30 NaN NaN NaN NaN NaN NaN NaN NaN 4 2018
8 2018-03-31 NaN NaN NaN NaN NaN NaN NaN NaN 3 2018
7 2018-02-28 NaN NaN NaN NaN NaN NaN NaN NaN 2 2018
6 2018-01-31 NaN NaN NaN NaN NaN NaN NaN NaN 1 2018
5 2017-12-31 NaN NaN NaN NaN NaN NaN NaN NaN 12 2017
4 2017-11-30 NaN NaN NaN NaN NaN NaN NaN NaN 11 2017
3 2017-10-31 NaN NaN NaN NaN NaN NaN NaN NaN 10 2017
2 2017-09-30 NaN NaN NaN NaN NaN NaN NaN NaN 9 2017
1 2017-08-31 NaN NaN NaN NaN NaN NaN NaN NaN 8 2017
0 2017-07-31 0.892207 0.797776 0.572518 0.119328 0.203212 23.137884 230.986328 1756.658813 7 2017
I also defined the index for both data frames because, finally, I want to divide the column of [H2] in the first data frame by the column of [H2] in the second data frame:
df_Data = Data.set_index(['Month', 'Year'])
df_Data.head(50)
df_Srt_Mean = Srt_Mean.set_index(['Month_mean', 'Year_mean'])
df_Srt_Mean.head(50)
Date H2 CH4 C2H6 C2H4 C2H2 CO CO2 O2
Month Year
4 2021 2021-04-14 02:00:00 8.301259 10.889560 7.205929 3.485577 0.108262 318.616211 1659.179688 866.826721
2021 2021-04-13 03:00:00 8.190150 10.224614 7.369829 3.561115 0.130052 318.895599 1641.014526 883.500305
2021 2021-04-12 04:00:00 8.223248 10.297009 7.571199 3.479434 0.113566 315.364594 1636.670776 896.083679
2021 2021-04-11 05:00:00 8.342580 10.233653 7.726023 3.474085 0.234786 316.315277 1641.205078 875.664856
2021 2021-04-10 06:00:00 8.365788 9.825816 7.640978 3.621368 0.320388 320.200409 1658.575806 880.871399
2021 2021-04-09 07:00:00 8.113251 11.198173 7.588203 3.561790 0.200721 318.738922 1651.639038 886.923401
2021 2021-04-08 08:00:00 7.881397 7.967482 7.382273 3.528960 0.180016 315.252838 1625.236328 878.604309
2021 2021-04-07 09:00:00 7.833044 6.773924 7.292545 3.475330 0.401435 317.085449 1628.325562 893.305664
2021 2021-04-06 10:00:00 7.908926 9.419571 7.018494 3.347562 0.406113 317.643768 1620.742554 912.732422
2021 2021-04-05 11:00:00 8.192807 9.262563 7.227449 3.275920 0.133978 312.931152 1601.240845 932.079102
2021 2021-04-04 12:00:00 8.086914 9.480316 6.515196 3.312712 0.000000 315.486816 1609.530884 928.141907
2021 2021-04-03 13:00:00 7.984566 9.406860 6.712120 3.476949 0.336859 312.862793 1596.182495 938.904724
2021 2021-04-02 14:00:00 8.077889 8.335327 7.443592 3.605910 0.416443 315.546539 1605.549438 928.619568
2021 2021-04-01 15:00:00 7.996786 9.087573 7.950811 3.626776 0.745824 311.601471 1608.987183 897.747498
3 2021 2021-03-31 16:00:00 8.433417 10.078784 6.567528 3.646854 0.682301 313.811615 1619.164673 825.123596
2021 2021-03-30 17:00:00 8.445275 9.768773 7.460344 3.712297 0.353539 314.944672 1606.494751 811.027161
2021 2021-03-29 18:00:00 8.398427 9.607062 7.446943 3.674934 0.287205 314.554596 1599.793823 828.780090
2021 2021-03-28 19:00:00 8.272332 9.678397 7.303371 3.617573 0.430137 311.486664 1590.122192 828.557312
2021 2021-03-27 20:00:00 8.478241 9.364383 7.153194 3.616118 0.548547 314.538849 1578.516235 821.565125
2021 2021-03-26 21:00:00 8.452413 10.828227 6.825691 3.260484 0.642971 314.990082 1561.811890 826.468079
2021 2021-03-25 22:00:00 8.420037 10.468951 6.614395 3.279383 0.442519 314.821197 1538.289673 835.261902
2021 2021-03-24 23:00:00 8.290853 9.943011 5.952219 3.263231 0.077059 313.060883 1498.917969 859.999023
2021 2021-03-24 00:00:00 8.053485 9.717534 5.773523 3.210894 0.477235 309.256561 1461.547974 867.371643
2021 2021-03-23 01:00:00 8.813514 10.700623 5.444063 2.965948 0.421797 312.926971 1437.077026 867.363709
2021 2021-03-22 02:00:00 8.149124 9.727563 4.518490 2.958276 0.368664 311.796661 1420.417358 916.602539
2021 2021-03-21 03:00:00 8.169525 8.859634 5.212233 3.129839 0.416121 312.702301 1419.987427 904.523865
2021 2021-03-20 04:00:00 7.999515 8.994797 5.137753 3.148643 0.475540 307.183685 1420.932739 913.971130
2021 2021-03-19 05:00:00 8.183563 10.373088 4.949068 3.037351 0.584536 312.275482 1440.424683 895.362122
2021 2021-03-18 06:00:00 9.914630 10.722699 4.891720 3.121366 0.364292 312.476959 1446.715210 889.638367
2021 2021-03-17 07:00:00 8.063797 9.449814 4.965353 3.158536 0.332817 307.930389 1443.011108 883.420349
2021 2021-03-16 08:00:00 8.858215 9.454753 5.053194 3.093672 0.249709 313.467071 1456.114624 902.091492
2021 2021-03-15 09:00:00 8.146770 8.423282 5.213614 3.038460 0.228652 312.719238 1443.799438 900.013672
2021 2021-03-14 10:00:00 8.160034 14.032947 5.426914 2.981697 0.391028 313.519440 1459.276245 891.870300
2021 2021-03-13 11:00:00 7.876873 5.985085 5.602545 2.998276 0.607312 311.964203 1447.259399 886.466492
2021 2021-03-12 12:00:00 8.299830 9.434842 5.768423 2.931913 0.374833 312.165375 1450.703979 893.731873
2021 2021-03-11 13:00:00 8.258931 9.164996 5.773973 2.917338 0.367790 312.416412 1447.783203 884.459534
2021 2021-03-10 14:00:00 8.285775 9.396652 5.687450 3.018778 0.367582 312.764160 1452.421875 883.869568
2021 2021-03-09 15:00:00 8.069007 9.174088 5.641685 3.134619 0.282684 307.792206 1445.247192 887.044922
2021 2021-03-08 16:00:00 8.150889 8.341151 5.952223 3.310198 0.276260 310.551758 1453.108765 881.680664
2021 2021-03-07 17:00:00 8.148776 8.571256 5.962189 3.365770 0.321035 311.439789 1450.016235 881.019348
2021 2021-03-06 18:00:00 8.235992 9.840173 5.190016 3.325249 0.390993 313.732513 1476.067505 880.206055
2021 2021-03-05 19:00:00 8.041183 8.705338 6.181820 3.528234 0.299884 308.838959 1456.264038 857.722656
2021 2021-03-04 20:00:00 8.286016 8.883926 5.667931 3.196103 0.350631 314.590729 1479.576538 861.197266
2021 2021-03-03 21:00:00 8.245660 9.066014 5.785030 3.191303 0.378657 313.044281 1479.022095 850.414856
2021 2021-03-02 22:00:00 8.386712 9.401718 6.162895 3.043518 0.363813 312.941315 1493.645142 840.161438
2021 2021-03-01 23:00:00 8.231705 10.864131 6.184435 3.010111 0.217610 309.424164 1501.307983 834.103943
2021 2021-03-01 00:00:00 8.253326 10.673305 5.977970 3.028328 0.349412 310.304413 1501.962891 825.492371
2 2021 2021-02-28 01:00:00 8.313703 10.718976 5.379131 3.017091 0.303016 313.576935 1511.731079 837.980774
2021 2021-02-27 02:00:00 8.315781 10.122794 5.632700 3.183661 0.419333 309.140228 1502.215210 855.478516
2021 2021-02-26 03:00:00 7.974852 10.396459 6.063492 3.239314 0.497979 314.248688 1523.176880 852.766907
Date H2 CH4 C2H2 C2H4 C2H6 CO CO2 O2
Month_mean Year_mean
4 2021 2021-04-30 8.107043 9.457317 0.266317 3.488106 7.331760 316.181560 1627.434300 900.000397
3 2021 2021-03-31 8.317389 9.627182 0.385551 3.209554 5.862067 312.134351 1484.145511 867.169165
2 2021 2021-02-28 8.191080 9.900578 0.334562 3.352319 6.802692 314.076140 1572.294619 815.143081
1 2021 2021-01-31 8.238818 9.373566 0.344173 3.372903 6.670032 313.475182 1604.747685 788.205679
12 2020 2020-12-31 8.374398 9.114723 0.340198 3.351321 6.768138 312.902290 1642.077442 766.767532
11 2020 2020-11-30 8.535022 9.695740 0.427224 3.352291 11.561881 311.624202 1676.413354 713.680609
10 2020 2020-10-31 8.766880 17.531840 0.436720 3.671641 21.794714 312.511920 1783.446248 622.681765
9 2020 2020-09-30 8.782468 9.318198 0.447619 3.780273 21.613501 309.644693 1790.096266 594.891798
8 2020 2020-08-31 8.535027 9.583940 0.517722 3.860195 20.853958 303.025472 1759.655769 597.264223
7 2020 2020-07-31 8.226212 9.180892 0.474289 3.646311 19.746735 295.059049 1688.793476 613.164837
6 2020 2020-06-30 7.935418 8.632543 0.408732 3.410121 18.330232 287.545439 1593.554077 653.294214
5 2020 2020-05-31 7.752543 8.622809 0.329810 2.974434 16.470193 279.636700 1484.100789 685.164897
4 2020 2020-04-30 7.585870 8.028798 0.389348 2.856049 15.635886 273.859451 1426.279447 703.866225
3 2020 2020-03-31 7.602619 8.326974 0.278294 2.629290 13.983202 268.429411 1351.023144 698.012543
2 2020 2020-02-29 7.662918 7.607357 0.283796 2.483387 13.803671 264.348120 1300.252926 710.281917
1 2020 2020-01-31 7.754674 7.804206 0.336518 2.506526 13.554615 262.188585 1312.052006 700.065050
12 2019 2019-12-31 7.985734 7.461197 0.314830 2.417687 13.255049 259.519881 1309.507549 684.085808
11 2019 2019-11-30 8.226894 6.606762 0.411812 2.290050 12.958136 257.590110 1306.817593 654.086494
10 2019 2019-10-31 9.246261 7.644706 0.372446 2.673924 13.880747 257.093790 1407.996110 565.502500
9 2019 2019-09-30 9.661653 7.323367 0.420726 2.628199 13.308826 252.133648 1383.259943 550.533951
8 2019 2019-08-31 9.812387 6.681962 0.510871 2.483979 13.067311 243.440762 1302.643938 568.994610
7 2019 2019-07-31 9.943331 6.648062 0.492584 2.326774 11.791042 234.309877 1257.822071 572.162592
6 2019 2019-06-30 10.283766 6.186843 0.377420 2.165453 10.729356 226.061226 1226.489872 589.417023
5 2019 2019-05-31 10.118435 5.524447 0.311802 1.904199 9.275237 213.531002 1178.495602 657.617859
4 2019 2019-04-30 9.523452 4.620810 0.233048 1.703750 8.359211 195.914846 1044.554593 898.940531
3 2019 2019-03-31 8.363644 3.779225 0.213011 1.171834 6.179431 167.295488 815.904473 1415.431158
2 2019 2019-02-28 7.480646 3.316433 0.239254 0.959470 5.319684 142.571229 662.409360 1877.447767
1 2019 2019-01-31 6.103601 2.528294 0.177558 0.612607 4.039948 116.639744 516.362618 2423.434258
12 2018 2018-12-31 5.029865 2.526096 0.232814 0.510899 3.423260 95.641880 409.359902 2831.721010
11 2018 2018-11-30 3.861638 1.854492 0.276273 0.289241 2.813399 78.563868 289.631986 3176.243186
10 2018 2018-10-31 3.858354 2.037401 0.364234 0.358209 2.651448 75.036622 274.889362 3150.082060
9 2018 2018-09-30 3.882403 1.842717 0.443967 0.342131 2.848867 71.592693 271.972792 3073.598901
8 2018 2018-08-31 3.789915 1.874313 0.444453 0.339609 2.516580 67.629768 264.437564 3016.440033
7 2018 2018-07-31 3.785872 1.710799 0.479277 0.405084 2.416031 63.220747 256.035651 2971.905932
6 2018 2018-06-30 3.376091 1.780959 0.488345 0.431397 1.777461 59.424690 246.135108 2927.244192
5 2018 2018-05-31 NaN NaN NaN NaN NaN NaN NaN NaN
4 2018 2018-04-30 NaN NaN NaN NaN NaN NaN NaN NaN
3 2018 2018-03-31 NaN NaN NaN NaN NaN NaN NaN NaN
2 2018 2018-02-28 NaN NaN NaN NaN NaN NaN NaN NaN
1 2018 2018-01-31 NaN NaN NaN NaN NaN NaN NaN NaN
12 2017 2017-12-31 NaN NaN NaN NaN NaN NaN NaN NaN
11 2017 2017-11-30 NaN NaN NaN NaN NaN NaN NaN NaN
10 2017 2017-10-31 NaN NaN NaN NaN NaN NaN NaN NaN
9 2017 2017-09-30 NaN NaN NaN NaN NaN NaN NaN NaN
8 2017 2017-08-31 NaN NaN NaN NaN NaN NaN NaN NaN
7 2017 2017-07-31 0.892207 0.797776 0.572518 0.119328 0.203212 23.137884 230.986328 1756.658813
Now, for each month of each year, I have one mean. How can I divide the H2 column of the first data frame by this column, which holds one number per month? For example:
April 2021: we have 30 days and one mean,
May 2021: we have 31 days and one mean.
Based on the index of these two data frames, this division should be performed.
I really appreciate it if you can help me find a solution.
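A minimal sketch of that division, assuming the two frames are indexed by (month, year) exactly as shown above; the names monthly, ratio and H2_ratio below are hypothetical:
# give the monthly means the same index names as df_Data, then divide;
# pandas aligns the two Series on the shared (Month, Year) index, so each
# daily H2 reading is divided by the matching monthly mean
monthly = df_Srt_Mean['H2']
monthly.index = monthly.index.set_names(['Month', 'Year'])
ratio = df_Data['H2'] / monthly
An equivalent route that avoids the second frame entirely is to divide by a per-month mean computed with groupby/transform:
Data['H2_ratio'] = Data['H2'] / Data.groupby(['Year', 'Month'])['H2'].transform('mean')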
Here's a quick peek at my dataframe:
local_date amount
0 2017-08-16 10.00
1 2017-10-26 21.70
2 2017-11-04 5.00
3 2017-11-12 37.20
4 2017-11-13 10.00
5 2017-11-18 31.00
6 2017-11-27 14.00
7 2017-11-29 10.00
8 2017-11-30 37.20
9 2017-12-16 8.00
10 2017-12-17 43.20
11 2017-12-17 49.60
12 2017-12-19 102.50
13 2017-12-19 28.80
14 2017-12-22 72.55
15 2017-12-23 24.80
16 2017-12-24 62.00
17 2017-12-26 12.40
18 2017-12-26 15.50
19 2017-12-26 40.00
20 2017-12-28 57.60
21 2017-12-31 37.20
22 2018-01-01 18.60
23 2018-01-02 12.40
24 2018-01-04 32.40
25 2018-01-05 17.00
26 2018-01-06 28.80
27 2018-01-11 20.80
28 2018-01-12 10.00
29 2018-01-12 26.00
I am trying to plot the monthly sum of transactions, which is fine, except for the ugly x-ticks.
I would like to change them to the name of the month and year (e.g. Jan 2019). So I sort the dates, change them using strftime and plot again, but the order of the dates is completely messed up.
The code I used to sort the dates and convert them is:
transactions = transactions.sort_values(by='local_date')
transactions['month_year'] = transactions['local_date'].dt.strftime('%B %Y')
# and then group by that column:
transactions.groupby('month_year').amount.sum().plot(kind='bar')
When doing this, the month_year values are grouped alphabetically: January 2019 comes right after January 2018, and so on.
I thought sorting by date would fix this, but it doesn't. What's the best way to approach this?
You can convert the column to month periods with Series.dt.to_period and then render the PeriodIndex in a custom format with rename:
transactions = transactions.sort_values(by='local_date')
(transactions.groupby(transactions['local_date'].dt.to_period('m'))
.amount.sum()
.rename(lambda x: x.strftime('%B %Y'))
.plot(kind='bar'))
Alternative solution:
transactions = transactions.sort_values(by='local_date')
s = transactions.groupby(transactions['local_date'].dt.to_period('m')).amount.sum()
s.index = s.index.strftime('%B %Y')
s.plot(kind='bar')
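Both variants keep the bars in chronological order because the grouping key is a PeriodIndex, which sorts chronologically; the human-readable labels are applied only after the aggregation, so nothing is ever sorted as strings.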
I tried to ask this question previously, but it was too ambiguous so here goes again. I am new to programming, so I am still learning how to ask questions in a useful way.
In summary, I have a pandas dataframe that resembles "INPUT DATA" that I would like to convert to "DESIRED OUTPUT", as shown below.
Each row contains an ID, a DateTime, and a Value. For each unique ID, the first row corresponds to timepoint 'zero', and each subsequent row contains a value 5 minutes following the previous row and so on.
I would like to calculate the mean of all the IDs for every 'time elapsed' timepoint. For example, in "DESIRED OUTPUT" Time Elapsed=0.0 would have the value 128.3 ((100+105+180)/3); Time Elapsed=5.0 would have the value 150.0 ((150+110+190)/3); Time Elapsed=10.0 would have the value 133.3 ((125+90+185)/3), and so on for Time Elapsed=15, 20, 25, etc.
I'm not sure how to create a new column which has the value for the time elapsed for each ID (e.g. 0.0, 5.0, 10.0 etc). I think that once I know how to do that, then I can use the groupby function to calculate the means for each time elapsed.
INPUT DATA
ID DateTime Value
1 2018-01-01 15:00:00 100
1 2018-01-01 15:05:00 150
1 2018-01-01 15:10:00 125
2 2018-02-02 13:15:00 105
2 2018-02-02 13:20:00 110
2 2018-02-02 13:25:00 90
3 2019-03-03 05:05:00 180
3 2019-03-03 05:10:00 190
3 2019-03-03 05:15:00 185
DESIRED OUTPUT
Time Elapsed Mean Value
0.0 128.3
5.0 150.0
10.0 133.3
Here is one way: using transform with groupby, get the group key 'Time Elapsed', then just group by it to get the mean:
df['Time Elapsed'] = df.DateTime - df.groupby('ID').DateTime.transform('first')
df.groupby('Time Elapsed').Value.mean()
Out[998]:
Time Elapsed
00:00:00 128.333333
00:05:00 150.000000
00:10:00 133.333333
Name: Value, dtype: float64
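If you want the index expressed in minutes (0.0, 5.0, 10.0) as in the desired output, one possible follow-up, assuming the result is stored in a variable (here called out):
out = df.groupby('Time Elapsed').Value.mean()
out.index = out.index.total_seconds() / 60  # TimedeltaIndex -> minutes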
You can do this explicitly by taking advantage of the datetime attributes of the DateTime column in your DataFrame.
First get the year, month and day for each DateTime, since they are all changing in your data:
df['month'] = df['DateTime'].dt.month
df['day'] = df['DateTime'].dt.day
df['year'] = df['DateTime'].dt.year
print(df)
ID DateTime Value month day year
1 1 2018-01-01 15:00:00 100 1 1 2018
1 1 2018-01-01 15:05:00 150 1 1 2018
1 1 2018-01-01 15:10:00 125 1 1 2018
2 2 2018-02-02 13:15:00 105 2 2 2018
2 2 2018-02-02 13:20:00 110 2 2 2018
2 2 2018-02-02 13:25:00 90 2 2 2018
3 3 2019-03-03 05:05:00 180 3 3 2019
3 3 2019-03-03 05:10:00 190 3 3 2019
3 3 2019-03-03 05:15:00 185 3 3 2019
Then append a sequential DateTime counter column (per this SO post)
the counter is computed within (1) each year, (2) then each month and then (3) each day
since the data are in multiples of 5 minutes, use this to scale the counter values (i.e. the counter will be in multiples of 5 minutes, rather than a sequence of increasing integers)
df['Time Elapsed'] = df.groupby(['year', 'month', 'day']).cumcount()
df['Time Elapsed'] *= 5
print(df)
ID DateTime Value month day year Time Elapsed
1 1 2018-01-01 15:00:00 100 1 1 2018 0
1 1 2018-01-01 15:05:00 150 1 1 2018 5
1 1 2018-01-01 15:10:00 125 1 1 2018 10
2 2 2018-02-02 13:15:00 105 2 2 2018 0
2 2 2018-02-02 13:20:00 110 2 2 2018 5
2 2 2018-02-02 13:25:00 90 2 2 2018 10
3 3 2019-03-03 05:05:00 180 3 3 2019 0
3 3 2019-03-03 05:10:00 190 3 3 2019 5
3 3 2019-03-03 05:15:00 185 3 3 2019 10
Perform the groupby over the newly appended counter column
dfg = df.groupby('Time Elapsed')['Value'].mean()
print(dfg)
Time Elapsed
0 128.333333
5 150.000000
10 133.333333
Name: Value, dtype: float64
I have a pandas dataframe df as:
Date Val WD
1/3/2019 2.65 Thursday
1/4/2019 2.51 Friday
1/5/2019 2.95 Saturday
1/6/2019 3.39 Sunday
1/7/2019 3.39 Monday
1/12/2019 2.23 Saturday
1/13/2019 2.50 Sunday
1/14/2019 3.62 Monday
1/15/2019 3.81 Tuesday
1/16/2019 3.75 Wednesday
1/17/2019 3.69 Thursday
1/18/2019 3.47 Friday
I need to get the following df2 from above:
Date Val WD
1/3/2019 2.65 Thursday
1/4/2019 2.51 Friday
1/5/2019 3.24 Saturday
1/6/2019 3.24 Sunday
1/7/2019 3.24 Monday
1/12/2019 2.78 Saturday
1/13/2019 2.78 Sunday
1/14/2019 2.78 Monday
1/15/2019 3.81 Tuesday
1/16/2019 3.75 Wednesday
1/17/2019 3.69 Thursday
1/18/2019 3.47 Friday
Here the df2 values are updated to the average of consecutive Sat, Sun and Mon values.
I.e., the average of 2.95, 3.39 and 3.39 (dates 1/5/2019, 1/6/2019 and 1/7/2019 in df) is 3.24, hence in df2 I have replaced the 1/5/2019, 1/6/2019 and 1/7/2019 values with 3.24.
The tricky part is finding the consecutive Saturday, Sunday and Monday. I'm not sure how to approach this.
You can use CustomBusinessDay with pd.Grouper to create a group column:
# if you want to only find the mean when all three days are found
from pandas.tseries.offsets import CustomBusinessDay

# a custom "business week" of Tue-Sat, so each Sat/Sun/Mon run lands in one bin
days = CustomBusinessDay(weekmask='Tue Wed Thu Fri Sat')
df['group_col'] = df.groupby(pd.Grouper(key='Date', freq=days)).ngroup()
# update only the groups of size 3 (a full Sat/Sun/Mon run) with their group mean
df.update(df[df.groupby('group_col')['Val'].transform('size').eq(3)]
          .groupby('group_col').transform('mean'))
Date Val WD group_col
0 2019-01-03 2.650000 Thursday 0
1 2019-01-04 2.510000 Friday 1
2 2019-01-05 3.243333 Saturday 2
3 2019-01-06 3.243333 Sunday 2
4 2019-01-07 3.243333 Monday 2
5 2019-01-12 2.783333 Saturday 7
6 2019-01-13 2.783333 Sunday 7
7 2019-01-14 2.783333 Monday 7
8 2019-01-15 3.810000 Tuesday 8
9 2019-01-16 3.750000 Wednesday 9
10 2019-01-17 3.690000 Thursday 10
11 2019-01-18 3.470000 Friday 11
Or, if you want to find the mean of any combination of Sat/Sun/Mon in the same week:
days = CustomBusinessDay(weekmask='Tue Wed Thu Fri Sat')
df['group_col'] = df.groupby(pd.Grouper(key='Date', freq=days)).ngroup()
df['Val'] = df.groupby('group_col')['Val'].transform('mean')
This logic creates a Series that assigns a unique ID to groups of consecutive Sat/Sun/Mon rows in your DataFrame. Then ensure there are 3 of them (not just Sat/Sun or Sun/Mon), and transform those values with the mean:
import pandas as pd
#df['Date'] = pd.to_datetime(df.Date)
# a new group starts whenever a row is NOT a Sun/Mon that directly follows
# the previous row (dayofweek: Monday=0, Sunday=6); Saturdays always start one
s = (~(df.Date.dt.dayofweek.isin([0, 6])
       & (df.Date - df.Date.shift(1)).dt.days.eq(1))).cumsum()
# keep only the groups of size 3, i.e. full Sat/Sun/Mon runs
to_trans = s[s.groupby(s).transform('size').eq(3)]
df.loc[to_trans.index, 'Val'] = df.loc[to_trans.index].groupby(to_trans).Val.transform('mean')
Output (for the extended input data listed below):
Date Val WD
0 2019-01-03 2.650000 Thursday
1 2019-01-04 2.510000 Friday
2 2019-01-05 3.243333 Saturday
3 2019-01-06 3.243333 Sunday
4 2019-01-07 3.243333 Monday
5 2019-01-12 2.783333 Saturday
6 2019-01-13 2.783333 Sunday
7 2019-01-14 2.783333 Monday
8 2019-01-15 3.810000 Tuesday
9 2019-01-16 3.750000 Wednesday
10 2019-01-17 3.690000 Thursday
11 2019-01-18 3.470000 Friday
12 2019-01-19 3.250000 Saturday
13 2019-01-20 3.250000 Sunday
14 2019-01-21 3.250000 Monday
15 2019-01-22 5.000000 Tuesday
16 2019-01-27 2.000000 Sunday
17 2019-01-28 4.000000 Monday
18 2019-01-29 6.000000 Tuesday
19 2019-02-05 7.000000 Tuesday
20 2019-02-07 6.000000 Thursday
21 2019-02-12 9.000000 Tuesday
Extended Input Data
Date Val WD
1/3/2019 2.65 Thursday
1/4/2019 2.51 Friday
1/5/2019 2.95 Saturday
1/6/2019 3.39 Sunday
1/7/2019 3.39 Monday
1/12/2019 2.23 Saturday
1/13/2019 2.50 Sunday
1/14/2019 3.62 Monday
1/15/2019 3.81 Tuesday
1/16/2019 3.75 Wednesday
1/17/2019 3.69 Thursday
1/18/2019 3.47 Friday
1/19/2019 3.75 Saturday
1/20/2019 2.00 Sunday
1/21/2019 4.00 Monday
1/22/2019 5.00 Tuesday
1/27/2019 2.00 Sunday
1/28/2019 4.00 Monday
1/29/2019 6.00 Tuesday
2/5/2019 7.00 Tuesday
2/7/2019 6.00 Thursday
2/12/2019 9.00 Tuesday
One approach is to calculate a week number, then use groupby to calculate means across specific days and map this back to your original dataframe.
import numpy as np

df['Date'] = pd.to_datetime(df['Date'])
# dt.week was removed in pandas 2.0; isocalendar().week is the replacement
week, weekday = df['Date'].dt.isocalendar().week, df['Date'].dt.weekday
# consider Monday to belong to the previous week
df['Week'] = np.where(weekday.eq(0), week - 1, week)
# take means of Sat, Sun, Mon, then map back
mask = weekday.isin([5, 6, 0])
week_val_map = df[mask].groupby('Week')['Val'].mean()
df.loc[mask, 'Val'] = df['Week'].map(week_val_map)
print(df)
Date Val WD Week
0 2019-01-03 2.650000 Thursday 1
1 2019-01-04 2.510000 Friday 1
2 2019-01-05 3.243333 Saturday 1
3 2019-01-06 3.243333 Sunday 1
4 2019-01-07 3.243333 Monday 1
5 2019-01-12 2.783333 Saturday 2
6 2019-01-13 2.783333 Sunday 2
7 2019-01-14 2.783333 Monday 2
8 2019-01-15 3.810000 Tuesday 3
9 2019-01-16 3.750000 Wednesday 3
10 2019-01-17 3.690000 Thursday 3
11 2019-01-18 3.470000 Friday 3
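If you don't want to keep the helper column afterwards, it can simply be dropped:
df = df.drop(columns='Week')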
I am still very new to pandas and just figured out I have made a mistake in the process I was following earlier.
df_date
Date day
0 2016-05-26 Thursday
1 2016-05-27 Friday
2 2016-05-30 Monday
3 2016-05-31 Tuesday
4 2016-06-01 Wednesday
5 2016-06-02 Thursday
6 2016-06-03 Friday
7 2016-06-06 Monday
8 2016-06-07 Tuesday
9 2016-06-08 Wednesday
10 2016-06-09 Thursday
11 2016-06-10 Friday
12 2016-06-13 Monday
13 2016-06-14 Tuesday
14 2016-06-15 Wednesday
15 2016-06-16 Thursday
16 2016-06-17 Friday
17 2016-06-20 Monday
18 2016-06-21 Tuesday
19 2016-06-22 Wednesday
20 2016-06-24 Friday
21 2016-06-27 Monday
22 2016-06-28 Tuesday
23 2016-06-29 Wednesday
There are about 600+ rows.
What I want to do
Make a column 'Exit' where the Thursday of each week gets 'E'; if Thursday is not in the week, then Wednesday gets it, and if Wednesday is not there, then Tuesday.
I tried a for loop and I just can't seem to get this right.
Expected Output:
df_date
Date day Exit
0 2016-05-26 Thursday E
1 2016-05-27 Friday
2 2016-05-30 Monday
3 2016-05-31 Tuesday
4 2016-06-01 Wednesday
5 2016-06-02 Thursday E
6 2016-06-03 Friday
7 2016-06-06 Monday
8 2016-06-07 Tuesday
9 2016-06-08 Wednesday
10 2016-06-09 Thursday E
11 2016-06-10 Friday
12 2016-06-13 Monday
13 2016-06-14 Tuesday
14 2016-06-15 Wednesday
15 2016-06-16 Thursday E
16 2016-06-17 Friday
17 2016-06-20 Monday
18 2016-06-21 Tuesday
19 2016-06-22 Wednesday E
20 2016-06-24 Friday
21 2016-06-27 Monday
22 2016-06-28 Tuesday
23 2016-06-29 Wednesday E
I added this in the comments, but it should be here as well:
If Thursday is not present, then the record just before it.
So if Wednesday is also not present in the week, then Tuesday.
If Tuesday is also not present, then Monday; if Monday is not, then Friday. Saturday and Sunday will never have a record.
Here's a solution:
# dayofweek <= 3 keeps Mon-Thu; reversing before idxmax picks the latest
# such day present in each weekly group
ix = (df.groupby(pd.Grouper(key='Date', freq='W')).Date
        .apply(lambda x: (x.dt.dayofweek <= 3)[::-1].idxmax()).values)
df.loc[ix, 'Exit'] = 'E'
df.fillna('')
Date day Exit
0 2016-05-26 Thursday E
1 2016-05-27 Friday
2 2016-05-30 Monday
3 2016-05-31 Tuesday
4 2016-06-01 Wednesday
5 2016-06-02 Thursday E
6 2016-06-03 Friday
7 2016-06-06 Monday
8 2016-06-07 Tuesday
9 2016-06-08 Wednesday
10 2016-06-09 Thursday E
11 2016-06-10 Friday
12 2016-06-13 Monday
13 2016-06-14 Tuesday
14 2016-06-15 Wednesday
15 2016-06-16 Thursday E
16 2016-06-17 Friday
17 2016-06-20 Monday
18 2016-06-21 Tuesday
19 2016-06-22 Wednesday E
20 2016-06-24 Friday
21 2016-06-27 Monday
22 2016-06-28 Tuesday
23 2016-06-29 Wednesday E
You can use the dt.isocalendar().week and dt.weekday properties of your datetime series (dt.week was removed in pandas 2.0). Then use groupby + max for your required logic. This is likely to be more efficient than sequential equality checks.
import numpy as np

df['Date'] = pd.to_datetime(df['Date'])
# add week and weekday series (dt.week was removed in pandas 2.0)
df['Week'] = df['Date'].dt.isocalendar().week
df['Weekday'] = df['Date'].dt.weekday.where(df['Date'].dt.weekday.isin([1, 2, 3]))
df['Exit'] = np.where(df['Weekday'] == df.groupby('Week')['Weekday'].transform('max'),
                      'E', '')
Result
I have left the helper columns so the way the solution works is clear. These can easily be removed.
print(df)
Date day Week Weekday Exit
0 2016-05-26 Thursday 21 3.0 E
1 2016-05-27 Friday 21 NaN
2 2016-05-30 Monday 22 NaN
3 2016-05-31 Tuesday 22 1.0
4 2016-06-01 Wednesday 22 2.0
5 2016-06-02 Thursday 22 3.0 E
6 2016-06-03 Friday 22 NaN
7 2016-06-06 Monday 23 NaN
8 2016-06-07 Tuesday 23 1.0
9 2016-06-08 Wednesday 23 2.0
10 2016-06-09 Thursday 23 3.0 E
11 2016-06-10 Friday 23 NaN
12 2016-06-13 Monday 24 NaN
13 2016-06-14 Tuesday 24 1.0
14 2016-06-15 Wednesday 24 2.0
15 2016-06-16 Thursday 24 3.0 E
16 2016-06-17 Friday 24 NaN
17 2016-06-20 Monday 25 NaN
18 2016-06-21 Tuesday 25 1.0
19 2016-06-22 Wednesday 25 2.0 E
20 2016-06-24 Friday 25 NaN
21 2016-06-27 Monday 26 NaN
22 2016-06-28 Tuesday 26 1.0
23 2016-06-29 Wednesday 26 2.0 E
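The clarification in the question also allows Monday and then Friday as fallbacks. A hedged sketch extending the same idea with a hypothetical priority map (not part of the original answer), ranking Thu > Wed > Tue > Mon > Fri:
# hypothetical extension: rank every weekday by exit priority
# (weekday numbering: Mon=0, Tue=1, Wed=2, Thu=3, Fri=4)
priority = {3: 5, 2: 4, 1: 3, 0: 2, 4: 1}  # Thu > Wed > Tue > Mon > Fri
df['Rank'] = df['Date'].dt.weekday.map(priority)
df['Exit'] = np.where(df['Rank'] == df.groupby('Week')['Rank'].transform('max'),
                      'E', '')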