I have a dataframe that contains a time series with hourly data from 2015 to 2020. I want to create a new dataframe that has one column with the values of the time series for each year (or for each month of each year) to perform a separate analysis. Since one of the years is a leap year, I want the columns to share an index but have a NaN value at the 29 Feb positions on the years that are not leap. I tried using merge after creating two new columns called month and day_of_month, but the index goes crazy and ends up having millions of entries instead of the ~40,000 it should have; in the end it grows to more than 20 GB in RAM and breaks:
years = pd.DataFrame(index=pd.date_range('2016-01-01', '2017-01-01', freq='1H'))
years['month'] = years.index.month
years['day_of_month'] = years.index.day
gp = data_md[['value', 'month', 'day_of_month']].groupby(pd.Grouper(freq='1Y'))
for name, group in gp:
    years = years.merge(group, right_on=['month', 'day_of_month'], left_on=['month', 'day_of_month'])
RESULT:
month day_of_month value
0 1 1 0
1 1 1 6
2 1 1 2
3 1 1 0
4 1 1 1
... ... ... ...
210259 12 31 6
210260 12 31 2
210261 12 31 4
210262 12 31 5
210263 12 31 1
How can I construct the frame so that it has one value column per year (or per month)?
Here is the original frame from which I want to build the new one; the only column needed for now is value:
value month day_of_month week day_name year hour season dailyp day_of_week ... hourly_no_noise daily_trend daily_seasonal daily_residuals daily_no_noise daily_trend_h daily_seasonal_h daily_residuals_h daily_no_noise_h Total
date
2015-01-01 00:00:00 0 1 1 1 Thursday 2015 0 Invierno 165.0 3 ... NaN NaN -9.053524 NaN NaN NaN -3.456929 NaN NaN 6436996.0
2015-01-01 01:00:00 6 1 1 1 Thursday 2015 1 Invierno NaN 3 ... NaN NaN -9.053524 NaN NaN NaN -4.879983 NaN NaN NaN
2015-01-01 02:00:00 2 1 1 1 Thursday 2015 2 Invierno NaN 3 ... NaN NaN -9.053524 NaN NaN NaN -5.895367 NaN NaN NaN
2015-01-01 03:00:00 0 1 1 1 Thursday 2015 3 Invierno NaN 3 ... NaN NaN -9.053524 NaN NaN NaN -6.468616 NaN NaN NaN
2015-01-01 04:00:00 1 1 1 1 Thursday 2015 4 Invierno NaN 3 ... NaN NaN -9.053524 NaN NaN NaN -6.441830 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-12-31 19:00:00 6 12 31 1 Tuesday 2019 19 Invierno NaN 1 ... 11.529465 230.571429 -4.997480 -11.299166 237.299166 9.613095 2.805720 1.176491 17.823509 NaN
2019-12-31 20:00:00 3 12 31 1 Tuesday 2019 20 Invierno NaN 1 ... 11.314857 230.571429 -4.997480 -11.299166 237.299166 9.613095 2.928751 1.176491 17.823509 NaN
2019-12-31 21:00:00 3 12 31 1 Tuesday 2019 21 Invierno NaN 1 ... 10.141139 230.571429 -4.997480 -11.299166 237.299166 9.613095 1.774848 1.176491 17.823509 NaN
2019-12-31 22:00:00 3 12 31 1 Tuesday 2019 22 Invierno NaN 1 ... 8.823152 230.571429 -4.997480 -11.299166 237.299166 9.613095 0.663344 1.176491 17.823509 NaN
2019-12-31 23:00:00 6 12 31 1 Tuesday 2019 23 Invierno NaN 1 ... 6.884636 230.571429 -4.997480 -11.299166 237.299166 9.613095 -1.624980 1.176491 17.823509 NaN
I would like to end up with a dataframe like this:
2015 2016 2017 2018 2019
2016-01-01 00:00:00 0.074053 0.218161 0.606810 0.687365 0.352672
2016-01-01 01:00:00 0.465167 0.210297 0.722825 0.683341 0.885175
2016-01-01 02:00:00 0.175964 0.610560 0.722479 0.016842 0.205916
2016-01-01 03:00:00 0.945955 0.807490 0.627525 0.187677 0.535116
2016-01-01 04:00:00 0.757608 0.797835 0.639215 0.455989 0.042285
... ... ... ... ... ...
2016-12-30 20:00:00 0.046138 0.139100 0.397547 0.738687 0.335306
2016-12-30 21:00:00 0.672800 0.802090 0.617625 0.787601 0.007535
2016-12-30 22:00:00 0.698141 0.776686 0.423712 0.667808 0.298338
2016-12-30 23:00:00 0.198089 0.642073 0.586527 0.106567 0.514569
2016-12-31 00:00:00 0.367572 0.390791 0.105193 0.592167 0.007365
where 29 Feb is NaN on non-leap years:
df['2016-02']
2015 2016 2017 2018 2019
2016-02-01 00:00:00 0.656703 0.348784 0.383639 0.208786 0.183642
2016-02-01 01:00:00 0.488729 0.909498 0.873642 0.122028 0.547563
2016-02-01 02:00:00 0.210427 0.912393 0.505873 0.085149 0.358841
2016-02-01 03:00:00 0.281107 0.534750 0.622473 0.643611 0.258437
2016-02-01 04:00:00 0.187434 0.327459 0.701008 0.887041 0.385816
... ... ... ... ... ...
2016-02-29 19:00:00 NaN 0.742402 NaN NaN NaN
2016-02-29 20:00:00 NaN 0.013419 NaN NaN NaN
2016-02-29 21:00:00 NaN 0.517194 NaN NaN NaN
2016-02-29 22:00:00 NaN 0.003136 NaN NaN NaN
2016-02-29 23:00:00 NaN 0.128406 NaN NaN NaN
IIUC, you just need the original DataFrame:
origin = 2016  # or whatever year of your choosing
newidx = pd.to_datetime(df.index.strftime(f'{origin}-%m-%d %H:%M:%S'))
newdf = (
    df[['value']]
    .assign(year=df.index.year)
    .set_axis(newidx, axis=0)
    .pivot(columns='year', values='value')
)
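(One caveat worth knowing: pivot requires the (index, column) pairs to be unique, otherwise it raises a ValueError. That holds here because each year contributes at most one row per remapped hourly timestamp.)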
Using the small sample data you provided for that "original frame" df, we get:
>>> newdf
year 2015 2019
date
2016-01-01 00:00:00 0.0 NaN
2016-01-01 01:00:00 6.0 NaN
2016-01-01 02:00:00 2.0 NaN
... ... ...
2016-12-31 21:00:00 NaN 3.0
2016-12-31 22:00:00 NaN 3.0
2016-12-31 23:00:00 NaN 6.0
On a larger (made-up) DataFrame:
np.random.seed(0)
ix = pd.date_range('2015', '2020', freq='H', inclusive='left')
df = pd.DataFrame({'value': np.random.randint(0, 100, len(ix))}, index=ix)
# (code above)
>>> newdf
year 2015 2016 2017 2018 2019
2016-01-01 00:00:00 44.0 82.0 96.0 68.0 71.0
2016-01-01 01:00:00 47.0 99.0 54.0 44.0 71.0
2016-01-01 02:00:00 64.0 28.0 11.0 10.0 55.0
... ... ... ... ... ...
2016-12-31 21:00:00 0.0 30.0 28.0 53.0 14.0
2016-12-31 22:00:00 47.0 82.0 19.0 6.0 64.0
2016-12-31 23:00:00 22.0 75.0 13.0 37.0 35.0
and, as expected, only 2016 has values for 02/29:
>>> newdf[:'2016-02-29 02:00:00'].tail()
year 2015 2016 2017 2018 2019
2016-02-28 22:00:00 74.0 54.0 22.0 17.0 39.0
2016-02-28 23:00:00 37.0 61.0 31.0 8.0 62.0
2016-02-29 00:00:00 NaN 34.0 NaN NaN NaN
2016-02-29 01:00:00 NaN 82.0 NaN NaN NaN
2016-02-29 02:00:00 NaN 67.0 NaN NaN NaN
Addendum: by months
The code above can easily be adapted for month columns:
Either using MultiIndex columns:
origin = 2016
newidx = pd.to_datetime(df.index.strftime(f'{origin}-01-%d %H:%M:%S'))
newdf = (
    df[['value']]
    .assign(year=df.index.year, month=df.index.month)
    .set_axis(newidx, axis=0)
    .pivot(columns=['year', 'month'], values='value')
)
>>> newdf
year 2015 ... 2019
month 1 2 3 4 5 6 7 8 9 10 ... 3 4 5 6 7 8 9 10 11 12
2016-01-01 00:00:00 44.0 49.0 40.0 60.0 71.0 67.0 63.0 16.0 71.0 78.0 ... 32.0 35.0 51.0 35.0 68.0 43.0 4.0 23.0 65.0 19.0
2016-01-01 01:00:00 47.0 71.0 27.0 88.0 68.0 58.0 74.0 67.0 98.0 49.0 ... 85.0 27.0 70.0 8.0 9.0 29.0 78.0 29.0 21.0 68.0
2016-01-01 02:00:00 64.0 90.0 4.0 61.0 95.0 3.0 57.0 41.0 28.0 24.0 ... 7.0 93.0 21.0 10.0 72.0 79.0 46.0 45.0 25.0 99.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2016-01-31 21:00:00 48.0 NaN 24.0 NaN 79.0 NaN 55.0 47.0 NaN 20.0 ... 87.0 NaN 19.0 NaN 56.0 76.0 NaN 91.0 NaN 14.0
2016-01-31 22:00:00 82.0 NaN 6.0 NaN 46.0 NaN 9.0 57.0 NaN 21.0 ... 69.0 NaN 67.0 NaN 85.0 38.0 NaN 34.0 NaN 64.0
2016-01-31 23:00:00 51.0 NaN 97.0 NaN 45.0 NaN 55.0 41.0 NaN 87.0 ... 94.0 NaN 80.0 NaN 37.0 81.0 NaN 98.0 NaN 35.0
or a simple string column made of %Y-%m to indicate year/month:
origin = 2016
newidx = pd.to_datetime(df.index.strftime(f'{origin}-01-%d %H:%M:%S'))
newdf = (
    df[['value']]
    .assign(ym=df.index.strftime('%Y-%m'))
    .set_axis(newidx, axis=0)
    .pivot(columns='ym', values='value')
)
>>> newdf
ym 2015-01 2015-02 2015-03 2015-04 2015-05 2015-06 2015-07 2015-08 2015-09 2015-10 ... 2019-03 2019-04 2019-05 2019-06 2019-07 2019-08 2019-09 \
2016-01-01 00:00:00 44.0 49.0 40.0 60.0 71.0 67.0 63.0 16.0 71.0 78.0 ... 32.0 35.0 51.0 35.0 68.0 43.0 4.0
2016-01-01 01:00:00 47.0 71.0 27.0 88.0 68.0 58.0 74.0 67.0 98.0 49.0 ... 85.0 27.0 70.0 8.0 9.0 29.0 78.0
2016-01-01 02:00:00 64.0 90.0 4.0 61.0 95.0 3.0 57.0 41.0 28.0 24.0 ... 7.0 93.0 21.0 10.0 72.0 79.0 46.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2016-01-31 21:00:00 48.0 NaN 24.0 NaN 79.0 NaN 55.0 47.0 NaN 20.0 ... 87.0 NaN 19.0 NaN 56.0 76.0 NaN
2016-01-31 22:00:00 82.0 NaN 6.0 NaN 46.0 NaN 9.0 57.0 NaN 21.0 ... 69.0 NaN 67.0 NaN 85.0 38.0 NaN
2016-01-31 23:00:00 51.0 NaN 97.0 NaN 45.0 NaN 55.0 41.0 NaN 87.0 ... 94.0 NaN 80.0 NaN 37.0 81.0 NaN
ym 2019-10 2019-11 2019-12
2016-01-01 00:00:00 23.0 65.0 19.0
2016-01-01 01:00:00 29.0 21.0 68.0
2016-01-01 02:00:00 45.0 25.0 99.0
... ... ... ...
2016-01-31 21:00:00 91.0 NaN 14.0
2016-01-31 22:00:00 34.0 NaN 64.0
2016-01-31 23:00:00 98.0 NaN 35.0
The former gives you more flexibility to index sub-parts. For example, here is a selection of rows for "all February months":
>>> newdf.loc[:'2016-01-29 02:00:00', (slice(None), 2)].tail()
year 2015 2016 2017 2018 2019
month 2 2 2 2 2
2016-01-28 22:00:00 74.0 54.0 22.0 17.0 39.0
2016-01-28 23:00:00 37.0 61.0 31.0 8.0 62.0
2016-01-29 00:00:00 NaN 34.0 NaN NaN NaN
2016-01-29 01:00:00 NaN 82.0 NaN NaN NaN
2016-01-29 02:00:00 NaN 67.0 NaN NaN NaN
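With the latter (%Y-%m string columns), a similar "all February months" selection can be done with filter; a small sketch:

# select all "-02" (February) columns from the %Y-%m string variant
feb = newdf.filter(regex='-02$', axis=1)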
So let's assume we have the following dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame(pd.date_range('2015-01-01', '2020-01-01', freq='1H'),
                  columns=['Date and Time'])
df['str'] = df['Date and Time'].dt.strftime('%Y-%m-%d')
df[['Year', 'Month','Day']] = df['str'].apply(lambda x: pd.Series(str(x).split("-")))
df['Values'] = np.random.rand(len(df))
print(df)
Output:
Date and Time str Year Month Day Values
0 2015-01-01 00:00:00 2015-01-01 2015 01 01 0.153948
1 2015-01-01 01:00:00 2015-01-01 2015 01 01 0.663132
2 2015-01-01 02:00:00 2015-01-01 2015 01 01 0.141534
3 2015-01-01 03:00:00 2015-01-01 2015 01 01 0.263551
4 2015-01-01 04:00:00 2015-01-01 2015 01 01 0.094391
... ... ... ... ... .. ...
43820 2019-12-31 20:00:00 2019-12-31 2019 12 31 0.055802
43821 2019-12-31 21:00:00 2019-12-31 2019 12 31 0.952963
43822 2019-12-31 22:00:00 2019-12-31 2019 12 31 0.106768
43823 2019-12-31 23:00:00 2019-12-31 2019 12 31 0.834583
43824 2020-01-01 00:00:00 2020-01-01 2020 01 01 0.325849
[43825 rows x 6 columns]
Now we separate the dataframe by year and store each piece in a dict:
d = {}
for i in range(2015, 2020):
    d[i] = pd.DataFrame(df[df['Year'] == str(i)])
    d[i].sort_values(by='Date and Time', inplace=True, ignore_index=True)
for i in range(2015, 2020):
    print('Feb', i, ':', (d[i][d[i]['Month'] == '02']).shape)
    print((d[i][d[i]['Month'] == '02']).tail(3))
    print('-----------------------------------------------------------------')
Output:
Feb 2015 : (672, 6)
Date and Time str Year Month Day Values
1413 2015-02-28 21:00:00 2015-02-28 2015 02 28 0.517525
1414 2015-02-28 22:00:00 2015-02-28 2015 02 28 0.404741
1415 2015-02-28 23:00:00 2015-02-28 2015 02 28 0.299090
-----------------------------------------------------------------
Feb 2016 : (696, 6)
Date and Time str Year Month Day Values
1437 2016-02-29 21:00:00 2016-02-29 2016 02 29 0.854047
1438 2016-02-29 22:00:00 2016-02-29 2016 02 29 0.035787
1439 2016-02-29 23:00:00 2016-02-29 2016 02 29 0.955364
-----------------------------------------------------------------
Feb 2017 : (672, 6)
Date and Time str Year Month Day Values
1413 2017-02-28 21:00:00 2017-02-28 2017 02 28 0.936354
1414 2017-02-28 22:00:00 2017-02-28 2017 02 28 0.954680
1415 2017-02-28 23:00:00 2017-02-28 2017 02 28 0.625131
-----------------------------------------------------------------
Feb 2018 : (672, 6)
Date and Time str Year Month Day Values
1413 2018-02-28 21:00:00 2018-02-28 2018 02 28 0.965274
1414 2018-02-28 22:00:00 2018-02-28 2018 02 28 0.848050
1415 2018-02-28 23:00:00 2018-02-28 2018 02 28 0.238984
-----------------------------------------------------------------
Feb 2019 : (672, 6)
Date and Time str Year Month Day Values
1413 2019-02-28 21:00:00 2019-02-28 2019 02 28 0.476142
1414 2019-02-28 22:00:00 2019-02-28 2019 02 28 0.498278
1415 2019-02-28 23:00:00 2019-02-28 2019 02 28 0.127525
-----------------------------------------------------------------
To fix the leap year problem:
There is definitely a better way, but the only approach I can think of is to create the missing rows as NaN, add them, and then join the dataframes.
indexs = list(range(1416, 1440))
lines = pd.DataFrame(np.nan, columns=df.columns.values, index=indexs)
print(lines.head())
Output:
Date and Time str Year Month Day Values
1416 NaN NaN NaN NaN NaN NaN
1417 NaN NaN NaN NaN NaN NaN
1418 NaN NaN NaN NaN NaN NaN
1419 NaN NaN NaN NaN NaN NaN
1420 NaN NaN NaN NaN NaN NaN
Then I add the NaN rows to the data frame with the following code:
b = {}
for i in range(2015, 2020):
    if list(d[i][d[i]['Month'] == '02'].tail(1)['Day'])[0] == '28':
        bi = pd.concat([d[i].iloc[0:1416], lines]).reset_index(drop=True)
        b[i] = pd.concat([bi, d[i].iloc[1416:8783]]).reset_index(drop=True)
    else:
        b[i] = d[i].copy()
for i in range(2015, 2020):
    print(i, ':', b[i].shape)
    print(b[i].iloc[1438:1441])
    print('-----------------------------------------------------------------')
Output:
2015 : (8784, 6)
Date and Time str Year Month Day Values
1438 NaT NaN NaN NaN NaN NaN
1439 NaT NaN NaN NaN NaN NaN
1440 2015-03-01 2015-03-01 2015 03 01 0.676486
-----------------------------------------------------------------
2016 : (8784, 6)
Date and Time str Year Month Day Values
1438 2016-02-29 22:00:00 2016-02-29 2016 02 29 0.035787
1439 2016-02-29 23:00:00 2016-02-29 2016 02 29 0.955364
1440 2016-03-01 00:00:00 2016-03-01 2016 03 01 0.014158
-----------------------------------------------------------------
2017 : (8784, 6)
Date and Time str Year Month Day Values
1438 NaT NaN NaN NaN NaN NaN
1439 NaT NaN NaN NaN NaN NaN
1440 2017-03-01 2017-03-01 2017 03 01 0.035952
-----------------------------------------------------------------
2018 : (8784, 6)
Date and Time str Year Month Day Values
1438 NaT NaN NaN NaN NaN NaN
1439 NaT NaN NaN NaN NaN NaN
1440 2018-03-01 2018-03-01 2018 03 01 0.44876
-----------------------------------------------------------------
2019 : (8784, 6)
Date and Time str Year Month Day Values
1438 NaT NaN NaN NaN NaN NaN
1439 NaT NaN NaN NaN NaN NaN
1440 2019-03-01 2019-03-01 2019 03 01 0.096433
-----------------------------------------------------------------
And finally, if we want to create the dataframe you want:
final_df = pd.DataFrame(index=b[2016]['Date and Time'])
for i in range(2015, 2020):
    final_df[i] = np.array(b[i]['Values'])
Output:
2015 2016 2017 2018 2019
Date and Time
2016-01-01 00:00:00 0.153948 0.145602 0.957265 0.427620 0.868948
2016-01-01 01:00:00 0.663132 0.318746 0.013658 0.380105 0.442332
2016-01-01 02:00:00 0.141534 0.483471 0.048050 0.139065 0.702211
2016-01-01 03:00:00 0.263551 0.737948 0.528827 0.472889 0.165095
2016-01-01 04:00:00 0.094391 0.939737 0.120343 0.134011 0.297611
... ... ... ... ... ...
2016-02-28 22:00:00 0.404741 0.864423 0.954680 0.848050 0.498278
2016-02-28 23:00:00 0.299090 0.348466 0.625131 0.238984 0.127525
2016-02-29 00:00:00 NaN 0.375469 NaN NaN NaN
2016-02-29 01:00:00 NaN 0.186092 NaN NaN NaN
... ... ... ... ... ...
2016-02-29 22:00:00 NaN 0.035787 NaN NaN NaN
2016-02-29 23:00:00 NaN 0.955364 NaN NaN NaN
2016-03-01 00:00:00 0.676486 0.014158 0.035952 0.448760 0.096433
2016-03-01 01:00:00 0.792168 0.520436 0.138874 0.229396 0.913848
... ... ... ... ... ...
2016-12-31 19:00:00 0.517459 0.956219 0.116335 0.736170 0.739740
2016-12-31 20:00:00 0.814362 0.324332 0.324911 0.485508 0.055802
2016-12-31 21:00:00 0.870459 0.809150 0.335461 0.124459 0.952963
2016-12-31 22:00:00 0.549891 0.043623 0.997053 0.144286 0.106768
2016-12-31 23:00:00 0.047090 0.730074 0.698159 0.235253 0.834583
[8784 rows x 5 columns]
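As an aside, the manual row insertion can be avoided by reindexing: remap each year's timestamps into the origin leap year and reindex against its full hourly index, so the 29 Feb gaps become NaN automatically. A sketch, reusing the per-year dict d built above:

# reindex-based alternative: missing hours (29 Feb on non-leap years)
# turn into NaN without constructing the rows by hand
final_df = pd.DataFrame(index=pd.date_range('2016-01-01', '2016-12-31 23:00', freq='1H'))
for year in range(2015, 2020):
    s = d[year].set_index('Date and Time')['Values']
    # remap every timestamp into the origin year 2016
    s.index = pd.to_datetime(s.index.strftime('2016-%m-%d %H:%M:%S'))
    final_df[year] = s.reindex(final_df.index)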
Related
I have a time-series dataset with the amount of each released gas at every time step, as follows. The data is monitored day to day; the Date column holds the sampling time and the other columns the amount of released gas.
import pandas as pd
from statistics import mean
import numpy as np
Data = pd.read_csv('PTR 69.csv')
Data.columns = ['Date', 'H2', 'CH4', 'C2H6', 'C2H4', 'C2H2', 'CO', 'CO2', 'O2']
Data.dropna(how='all', axis=1, inplace=True)
Data.head()
It looks like this:
Date H2 CH4 C2H6 C2H4 C2H2 CO CO2 O2
0 2021-04-14 2:00 8.301259 10.889560 7.205929 3.485577 0.108262 318.616211 1659.179688 866.826721
1 2021-04-13 3:00 8.190150 10.224614 7.369829 3.561115 0.130052 318.895599 1641.014526 883.500305
2 2021-04-12 4:00 8.223248 10.297009 7.571199 3.479434 0.113566 315.364594 1636.670776 896.083679
3 2021-04-11 5:00 8.342580 10.233653 7.726023 3.474085 0.234786 316.315277 1641.205078 875.664856
4 2021-04-10 6:00 8.365788 9.825816 7.640978 3.621368 0.320388 320.200409 1658.575806 880.871399
5 2021-04-09 7:00 8.113251 11.198173 7.588203 3.561790 0.200721 318.738922 1651.639038 886.923401
6 2021-04-08 8:00 7.881397 7.967482 7.382273 3.528960 0.180016 315.252838 1625.236328 878.604309
7 2021-04-07 9:00 7.833044 6.773924 7.292545 3.475330 0.401435 317.085449 1628.325562 893.305664
8 2021-04-06 10:00 7.908926 9.419571 7.018494 3.347562 0.406113 317.643768 1620.742554 912.732422
9 2021-04-05 11:00 8.192807 9.262563 7.227449 3.275920 0.133978 312.931152 1601.240845 932.079102
10 2021-04-04 12:00 8.086914 9.480316 6.515196 3.312712 0.000000 315.486816 1609.530884 928.141907
11 2021-04-03 13:00 7.984566 9.406860 6.712120 3.476949 0.336859 312.862793 1596.182495 938.904724
12 2021-04-02 14:00 8.077889 8.335327 7.443592 3.605910 0.416443 315.546539 1605.549438 928.619568
13 2021-04-01 15:00 7.996786 9.087573 7.950811 3.626776 0.745824 311.601471 1608.987183 897.747498
14 2021-03-31 16:00 8.433417 10.078784 6.567528 3.646854 0.682301 313.811615 1619.164673 825.123596
15 2021-03-30 17:00 8.445275 9.768773 7.460344 3.712297 0.353539 314.944672 1606.494751 811.027161
16 2021-03-29 18:00 8.398427 9.607062 7.446943 3.674934 0.287205 314.554596 1599.793823 828.780090
17 2021-03-28 19:00 8.272332 9.678397 7.303371 3.617573 0.430137 311.486664 1590.122192 828.557312
18 2021-03-27 20:00 8.478241 9.364383 7.153194 3.616118 0.548547 314.538849 1578.516235 821.565125
19 2021-03-26 21:00 8.452413 10.828227 6.825691 3.260484 0.642971 314.990082 1561.811890 826.468079
First I used pd.to_datetime and separated the data frame based on the month and year, as you can see:
Data['Date'] = pd.to_datetime(Data['Date'])
# How long is the dataset
Data['Date'].max() - Data['Date'].min()
Result:
Timedelta('1364 days 12:49:00')
Data['Month'] = Data['Date'].dt.month
Data['Year'] = Data['Date'].dt.year
Data.head()
Then the data frame looks like this:
Date H2 CH4 C2H6 C2H4 C2H2 CO CO2 O2 Month Year
0 2021-04-14 02:00:00 8.301259 10.889560 7.205929 3.485577 0.108262 318.616211 1659.179688 866.826721 4 2021
1 2021-04-13 03:00:00 8.190150 10.224614 7.369829 3.561115 0.130052 318.895599 1641.014526 883.500305 4 2021
2 2021-04-12 04:00:00 8.223248 10.297009 7.571199 3.479434 0.113566 315.364594 1636.670776 896.083679 4 2021
3 2021-04-11 05:00:00 8.342580 10.233653 7.726023 3.474085 0.234786 316.315277 1641.205078 875.664856 4 2021
4 2021-04-10 06:00:00 8.365788 9.825816 7.640978 3.621368 0.320388 320.200409 1658.575806 880.871399 4 2021
5 2021-04-09 07:00:00 8.113251 11.198173 7.588203 3.561790 0.200721 318.738922 1651.639038 886.923401 4 2021
6 2021-04-08 08:00:00 7.881397 7.967482 7.382273 3.528960 0.180016 315.252838 1625.236328 878.604309 4 2021
7 2021-04-07 09:00:00 7.833044 6.773924 7.292545 3.475330 0.401435 317.085449 1628.325562 893.305664 4 2021
8 2021-04-06 10:00:00 7.908926 9.419571 7.018494 3.347562 0.406113 317.643768 1620.742554 912.732422 4 2021
9 2021-04-05 11:00:00 8.192807 9.262563 7.227449 3.275920 0.133978 312.931152 1601.240845 932.079102 4 2021
So, two other columns, Month and Year, are added to the data frame.
My question: how can I calculate the rate of H2 changes over a month?
I know that first I should calculate the mean of H2 in each month of each year, as my data is a time series.
Mean_month = Data.set_index('Date').groupby(pd.Grouper(freq='M'))['H2'].mean().reset_index()
I used the previous steps again, converting the Date column with pd.to_datetime:
Mean_month['Date'] = pd.to_datetime(Mean_month['Date'])
Mean_month['Month_mean'] = Mean_month['Date'].dt.month
Mean_month['Year_mean'] = Mean_month['Date'].dt.year
Mean_month.head()
looks like this one:
Date H2 CH4 C2H2 C2H4 C2H6 CO CO2 O2 Month_mean Year_mean
0 2017-07-31 0.892207 0.797776 0.572518 0.119328 0.203212 23.137884 230.986328 1756.658813 7 2017
1 2017-08-31 NaN NaN NaN NaN NaN NaN NaN NaN 8 2017
2 2017-09-30 NaN NaN NaN NaN NaN NaN NaN NaN 9 2017
3 2017-10-31 NaN NaN NaN NaN NaN NaN NaN NaN 10 2017
4 2017-11-30 NaN NaN NaN NaN NaN NaN NaN NaN 11 2017
5 2017-12-31 NaN NaN NaN NaN NaN NaN NaN NaN 12 2017
6 2018-01-31 NaN NaN NaN NaN NaN NaN NaN NaN 1 2018
7 2018-02-28 NaN NaN NaN NaN NaN NaN NaN NaN 2 2018
8 2018-03-31 NaN NaN NaN NaN NaN NaN NaN NaN 3 2018
9 2018-04-30 NaN NaN NaN NaN NaN NaN NaN NaN 4 2018
10 2018-05-31 NaN NaN NaN NaN NaN NaN NaN NaN 5 2018
11 2018-06-30 3.376091 1.780959 0.488345 0.431397 1.777461 59.424690 246.135108 2927.244192 6 2018
12 2018-07-31 3.785872 1.710799 0.479277 0.405084 2.416031 63.220747 256.035651 2971.905932 7 2018
13 2018-08-31 3.789915 1.874313 0.444453 0.339609 2.516580 67.629768 264.437564 3016.440033 8 2018
14 2018-09-30 3.882403 1.842717 0.443967 0.342131 2.848867 71.592693 271.972792 3073.598901 9 2018
15 2018-10-31 3.858354 2.037401 0.364234 0.358209 2.651448 75.036622 274.889362 3150.082060 10 2018
16 2018-11-30 3.861638 1.854492 0.276273 0.289241 2.813399 78.563868 289.631986 3176.243186 11 2018
17 2018-12-31 5.029865 2.526096 0.232814 0.510899 3.423260 95.641880 409.359902 2831.721010 12 2018
18 2019-01-31 6.103601 2.528294 0.177558 0.612607 4.039948 116.639744 516.362618 2423.434258 1 2019
19 2019-02-28 7.480646 3.316433 0.239254 0.959470 5.319684 142.571229 662.409360 1877.447767 2 2019
20 2019-03-31 8.363644 3.779225 0.213011 1.171834 6.179431 167.295488 815.904473 1415.431158 3 2019
21 2019-04-30 9.523452 4.620810 0.233048 1.703750 8.359211 195.914846 1044.554593 898.940531 4 2019
22 2019-05-31 10.118435 5.524447 0.311802 1.904199 9.275237 213.531002 1178.495602 657.617859 5 2019
23 2019-06-30 10.283766 6.186843 0.377420 2.165453 10.729356 226.061226 1226.489872 589.417023 6 2019
24 2019-07-31 9.943331 6.648062 0.492584 2.326774 11.791042 234.309877 1257.822071 572.162592 7 2019
25 2019-08-31 9.812387 6.681962 0.510871 2.483979 13.067311 243.440762 1302.643938 568.994610 8 2019
26 2019-09-30 9.661653 7.323367 0.420726 2.628199 13.308826 252.133648 1383.259943 550.533951 9 2019
27 2019-10-31 9.246261 7.644706 0.372446 2.673924 13.880747 257.093790 1407.996110 565.502500 10 2019
28 2019-11-30 8.226894 6.606762 0.411812 2.290050 12.958136 257.590110 1306.817593 654.086494 11 2019
29 2019-12-31 7.985734 7.461197 0.314830 2.417687 13.255049 259.519881 1309.507549 684.085808 12 2019
30 2020-01-31 7.754674 7.804206 0.336518 2.506526 13.554615 262.188585 1312.052006 700.065050 1 2020
31 2020-02-29 7.662918 7.607357 0.283796 2.483387 13.803671 264.348120 1300.252926 710.281917 2 2020
32 2020-03-31 7.602619 8.326974 0.278294 2.629290 13.983202 268.429411 1351.023144 698.012543 3 2020
33 2020-04-30 7.585870 8.028798 0.389348 2.856049 15.635886 273.859451 1426.279447 703.866225 4 2020
34 2020-05-31 7.752543 8.622809 0.329810 2.974434 16.470193 279.636700 1484.100789 685.164897 5 2020
35 2020-06-30 7.935418 8.632543 0.408732 3.410121 18.330232 287.545439 1593.554077 653.294214 6 2020
36 2020-07-31 8.226212 9.180892 0.474289 3.646311 19.746735 295.059049 1688.793476 613.164837 7 2020
37 2020-08-31 8.535027 9.583940 0.517722 3.860195 20.853958 303.025472 1759.655769 597.264223 8 2020
38 2020-09-30 8.782468 9.318198 0.447619 3.780273 21.613501 309.644693 1790.096266 594.891798 9 2020
39 2020-10-31 8.766880 17.531840 0.436720 3.671641 21.794714 312.511920 1783.446248 622.681765 10 2020
40 2020-11-30 8.535022 9.695740 0.427224 3.352291 11.561881 311.624202 1676.413354 713.680609 11 2020
41 2020-12-31 8.374398 9.114723 0.340198 3.351321 6.768138 312.902290 1642.077442 766.767532 12 2020
42 2021-01-31 8.238818 9.373566 0.344173 3.372903 6.670032 313.475182 1604.747685 788.205679 1 2021
43 2021-02-28 8.191080 9.900578 0.334562 3.352319 6.802692 314.076140 1572.294619 815.143081 2 2021
44 2021-03-31 8.317389 9.627182 0.385551 3.209554 5.862067 312.134351 1484.145511 867.169165 3 2021
45 2021-04-30 8.107043 9.457317 0.266317 3.488106 7.331760 316.181560 1627.434300 900.000397 4 2021
As the Mean_month data frame is sorted ascending, I re-sorted it in descending order with:
Srt_Mean = Mean_month.sort_values(['Date'],ascending=False)
Srt_Mean
the results are:
Date H2 CH4 C2H2 C2H4 C2H6 CO CO2 O2 Month_mean Year_mean
45 2021-04-30 8.107043 9.457317 0.266317 3.488106 7.331760 316.181560 1627.434300 900.000397 4 2021
44 2021-03-31 8.317389 9.627182 0.385551 3.209554 5.862067 312.134351 1484.145511 867.169165 3 2021
43 2021-02-28 8.191080 9.900578 0.334562 3.352319 6.802692 314.076140 1572.294619 815.143081 2 2021
42 2021-01-31 8.238818 9.373566 0.344173 3.372903 6.670032 313.475182 1604.747685 788.205679 1 2021
41 2020-12-31 8.374398 9.114723 0.340198 3.351321 6.768138 312.902290 1642.077442 766.767532 12 2020
40 2020-11-30 8.535022 9.695740 0.427224 3.352291 11.561881 311.624202 1676.413354 713.680609 11 2020
39 2020-10-31 8.766880 17.531840 0.436720 3.671641 21.794714 312.511920 1783.446248 622.681765 10 2020
38 2020-09-30 8.782468 9.318198 0.447619 3.780273 21.613501 309.644693 1790.096266 594.891798 9 2020
37 2020-08-31 8.535027 9.583940 0.517722 3.860195 20.853958 303.025472 1759.655769 597.264223 8 2020
36 2020-07-31 8.226212 9.180892 0.474289 3.646311 19.746735 295.059049 1688.793476 613.164837 7 2020
35 2020-06-30 7.935418 8.632543 0.408732 3.410121 18.330232 287.545439 1593.554077 653.294214 6 2020
34 2020-05-31 7.752543 8.622809 0.329810 2.974434 16.470193 279.636700 1484.100789 685.164897 5 2020
33 2020-04-30 7.585870 8.028798 0.389348 2.856049 15.635886 273.859451 1426.279447 703.866225 4 2020
32 2020-03-31 7.602619 8.326974 0.278294 2.629290 13.983202 268.429411 1351.023144 698.012543 3 2020
31 2020-02-29 7.662918 7.607357 0.283796 2.483387 13.803671 264.348120 1300.252926 710.281917 2 2020
30 2020-01-31 7.754674 7.804206 0.336518 2.506526 13.554615 262.188585 1312.052006 700.065050 1 2020
29 2019-12-31 7.985734 7.461197 0.314830 2.417687 13.255049 259.519881 1309.507549 684.085808 12 2019
28 2019-11-30 8.226894 6.606762 0.411812 2.290050 12.958136 257.590110 1306.817593 654.086494 11 2019
27 2019-10-31 9.246261 7.644706 0.372446 2.673924 13.880747 257.093790 1407.996110 565.502500 10 2019
26 2019-09-30 9.661653 7.323367 0.420726 2.628199 13.308826 252.133648 1383.259943 550.533951 9 2019
25 2019-08-31 9.812387 6.681962 0.510871 2.483979 13.067311 243.440762 1302.643938 568.994610 8 2019
24 2019-07-31 9.943331 6.648062 0.492584 2.326774 11.791042 234.309877 1257.822071 572.162592 7 2019
23 2019-06-30 10.283766 6.186843 0.377420 2.165453 10.729356 226.061226 1226.489872 589.417023 6 2019
22 2019-05-31 10.118435 5.524447 0.311802 1.904199 9.275237 213.531002 1178.495602 657.617859 5 2019
21 2019-04-30 9.523452 4.620810 0.233048 1.703750 8.359211 195.914846 1044.554593 898.940531 4 2019
20 2019-03-31 8.363644 3.779225 0.213011 1.171834 6.179431 167.295488 815.904473 1415.431158 3 2019
19 2019-02-28 7.480646 3.316433 0.239254 0.959470 5.319684 142.571229 662.409360 1877.447767 2 2019
18 2019-01-31 6.103601 2.528294 0.177558 0.612607 4.039948 116.639744 516.362618 2423.434258 1 2019
17 2018-12-31 5.029865 2.526096 0.232814 0.510899 3.423260 95.641880 409.359902 2831.721010 12 2018
16 2018-11-30 3.861638 1.854492 0.276273 0.289241 2.813399 78.563868 289.631986 3176.243186 11 2018
15 2018-10-31 3.858354 2.037401 0.364234 0.358209 2.651448 75.036622 274.889362 3150.082060 10 2018
14 2018-09-30 3.882403 1.842717 0.443967 0.342131 2.848867 71.592693 271.972792 3073.598901 9 2018
13 2018-08-31 3.789915 1.874313 0.444453 0.339609 2.516580 67.629768 264.437564 3016.440033 8 2018
12 2018-07-31 3.785872 1.710799 0.479277 0.405084 2.416031 63.220747 256.035651 2971.905932 7 2018
11 2018-06-30 3.376091 1.780959 0.488345 0.431397 1.777461 59.424690 246.135108 2927.244192 6 2018
10 2018-05-31 NaN NaN NaN NaN NaN NaN NaN NaN 5 2018
9 2018-04-30 NaN NaN NaN NaN NaN NaN NaN NaN 4 2018
8 2018-03-31 NaN NaN NaN NaN NaN NaN NaN NaN 3 2018
7 2018-02-28 NaN NaN NaN NaN NaN NaN NaN NaN 2 2018
6 2018-01-31 NaN NaN NaN NaN NaN NaN NaN NaN 1 2018
5 2017-12-31 NaN NaN NaN NaN NaN NaN NaN NaN 12 2017
4 2017-11-30 NaN NaN NaN NaN NaN NaN NaN NaN 11 2017
3 2017-10-31 NaN NaN NaN NaN NaN NaN NaN NaN 10 2017
2 2017-09-30 NaN NaN NaN NaN NaN NaN NaN NaN 9 2017
1 2017-08-31 NaN NaN NaN NaN NaN NaN NaN NaN 8 2017
0 2017-07-31 0.892207 0.797776 0.572518 0.119328 0.203212 23.137884 230.986328 1756.658813 7 2017
I also defined the index for both data frames, as finally I want to divide the H2 column of the first data frame by the H2 column of the second data frame:
df_Data = Data.set_index(['Month', 'Year'])
df_Data.head(50)
df_Srt_Mean = Srt_Mean.set_index(['Month_mean', 'Year_mean'])
df_Srt_Mean.head(50)
Date H2 CH4 C2H6 C2H4 C2H2 CO CO2 O2
Month Year
4 2021 2021-04-14 02:00:00 8.301259 10.889560 7.205929 3.485577 0.108262 318.616211 1659.179688 866.826721
2021 2021-04-13 03:00:00 8.190150 10.224614 7.369829 3.561115 0.130052 318.895599 1641.014526 883.500305
2021 2021-04-12 04:00:00 8.223248 10.297009 7.571199 3.479434 0.113566 315.364594 1636.670776 896.083679
2021 2021-04-11 05:00:00 8.342580 10.233653 7.726023 3.474085 0.234786 316.315277 1641.205078 875.664856
2021 2021-04-10 06:00:00 8.365788 9.825816 7.640978 3.621368 0.320388 320.200409 1658.575806 880.871399
2021 2021-04-09 07:00:00 8.113251 11.198173 7.588203 3.561790 0.200721 318.738922 1651.639038 886.923401
2021 2021-04-08 08:00:00 7.881397 7.967482 7.382273 3.528960 0.180016 315.252838 1625.236328 878.604309
2021 2021-04-07 09:00:00 7.833044 6.773924 7.292545 3.475330 0.401435 317.085449 1628.325562 893.305664
2021 2021-04-06 10:00:00 7.908926 9.419571 7.018494 3.347562 0.406113 317.643768 1620.742554 912.732422
2021 2021-04-05 11:00:00 8.192807 9.262563 7.227449 3.275920 0.133978 312.931152 1601.240845 932.079102
2021 2021-04-04 12:00:00 8.086914 9.480316 6.515196 3.312712 0.000000 315.486816 1609.530884 928.141907
2021 2021-04-03 13:00:00 7.984566 9.406860 6.712120 3.476949 0.336859 312.862793 1596.182495 938.904724
2021 2021-04-02 14:00:00 8.077889 8.335327 7.443592 3.605910 0.416443 315.546539 1605.549438 928.619568
2021 2021-04-01 15:00:00 7.996786 9.087573 7.950811 3.626776 0.745824 311.601471 1608.987183 897.747498
3 2021 2021-03-31 16:00:00 8.433417 10.078784 6.567528 3.646854 0.682301 313.811615 1619.164673 825.123596
2021 2021-03-30 17:00:00 8.445275 9.768773 7.460344 3.712297 0.353539 314.944672 1606.494751 811.027161
2021 2021-03-29 18:00:00 8.398427 9.607062 7.446943 3.674934 0.287205 314.554596 1599.793823 828.780090
2021 2021-03-28 19:00:00 8.272332 9.678397 7.303371 3.617573 0.430137 311.486664 1590.122192 828.557312
2021 2021-03-27 20:00:00 8.478241 9.364383 7.153194 3.616118 0.548547 314.538849 1578.516235 821.565125
2021 2021-03-26 21:00:00 8.452413 10.828227 6.825691 3.260484 0.642971 314.990082 1561.811890 826.468079
2021 2021-03-25 22:00:00 8.420037 10.468951 6.614395 3.279383 0.442519 314.821197 1538.289673 835.261902
2021 2021-03-24 23:00:00 8.290853 9.943011 5.952219 3.263231 0.077059 313.060883 1498.917969 859.999023
2021 2021-03-24 00:00:00 8.053485 9.717534 5.773523 3.210894 0.477235 309.256561 1461.547974 867.371643
2021 2021-03-23 01:00:00 8.813514 10.700623 5.444063 2.965948 0.421797 312.926971 1437.077026 867.363709
2021 2021-03-22 02:00:00 8.149124 9.727563 4.518490 2.958276 0.368664 311.796661 1420.417358 916.602539
2021 2021-03-21 03:00:00 8.169525 8.859634 5.212233 3.129839 0.416121 312.702301 1419.987427 904.523865
2021 2021-03-20 04:00:00 7.999515 8.994797 5.137753 3.148643 0.475540 307.183685 1420.932739 913.971130
2021 2021-03-19 05:00:00 8.183563 10.373088 4.949068 3.037351 0.584536 312.275482 1440.424683 895.362122
2021 2021-03-18 06:00:00 9.914630 10.722699 4.891720 3.121366 0.364292 312.476959 1446.715210 889.638367
2021 2021-03-17 07:00:00 8.063797 9.449814 4.965353 3.158536 0.332817 307.930389 1443.011108 883.420349
2021 2021-03-16 08:00:00 8.858215 9.454753 5.053194 3.093672 0.249709 313.467071 1456.114624 902.091492
2021 2021-03-15 09:00:00 8.146770 8.423282 5.213614 3.038460 0.228652 312.719238 1443.799438 900.013672
2021 2021-03-14 10:00:00 8.160034 14.032947 5.426914 2.981697 0.391028 313.519440 1459.276245 891.870300
2021 2021-03-13 11:00:00 7.876873 5.985085 5.602545 2.998276 0.607312 311.964203 1447.259399 886.466492
2021 2021-03-12 12:00:00 8.299830 9.434842 5.768423 2.931913 0.374833 312.165375 1450.703979 893.731873
2021 2021-03-11 13:00:00 8.258931 9.164996 5.773973 2.917338 0.367790 312.416412 1447.783203 884.459534
2021 2021-03-10 14:00:00 8.285775 9.396652 5.687450 3.018778 0.367582 312.764160 1452.421875 883.869568
2021 2021-03-09 15:00:00 8.069007 9.174088 5.641685 3.134619 0.282684 307.792206 1445.247192 887.044922
2021 2021-03-08 16:00:00 8.150889 8.341151 5.952223 3.310198 0.276260 310.551758 1453.108765 881.680664
2021 2021-03-07 17:00:00 8.148776 8.571256 5.962189 3.365770 0.321035 311.439789 1450.016235 881.019348
2021 2021-03-06 18:00:00 8.235992 9.840173 5.190016 3.325249 0.390993 313.732513 1476.067505 880.206055
2021 2021-03-05 19:00:00 8.041183 8.705338 6.181820 3.528234 0.299884 308.838959 1456.264038 857.722656
2021 2021-03-04 20:00:00 8.286016 8.883926 5.667931 3.196103 0.350631 314.590729 1479.576538 861.197266
2021 2021-03-03 21:00:00 8.245660 9.066014 5.785030 3.191303 0.378657 313.044281 1479.022095 850.414856
2021 2021-03-02 22:00:00 8.386712 9.401718 6.162895 3.043518 0.363813 312.941315 1493.645142 840.161438
2021 2021-03-01 23:00:00 8.231705 10.864131 6.184435 3.010111 0.217610 309.424164 1501.307983 834.103943
2021 2021-03-01 00:00:00 8.253326 10.673305 5.977970 3.028328 0.349412 310.304413 1501.962891 825.492371
2 2021 2021-02-28 01:00:00 8.313703 10.718976 5.379131 3.017091 0.303016 313.576935 1511.731079 837.980774
2021 2021-02-27 02:00:00 8.315781 10.122794 5.632700 3.183661 0.419333 309.140228 1502.215210 855.478516
2021 2021-02-26 03:00:00 7.974852 10.396459 6.063492 3.239314 0.497979 314.248688 1523.176880 852.766907
Date H2 CH4 C2H2 C2H4 C2H6 CO CO2 O2
Month_mean Year_mean
4 2021 2021-04-30 8.107043 9.457317 0.266317 3.488106 7.331760 316.181560 1627.434300 900.000397
3 2021 2021-03-31 8.317389 9.627182 0.385551 3.209554 5.862067 312.134351 1484.145511 867.169165
2 2021 2021-02-28 8.191080 9.900578 0.334562 3.352319 6.802692 314.076140 1572.294619 815.143081
1 2021 2021-01-31 8.238818 9.373566 0.344173 3.372903 6.670032 313.475182 1604.747685 788.205679
12 2020 2020-12-31 8.374398 9.114723 0.340198 3.351321 6.768138 312.902290 1642.077442 766.767532
11 2020 2020-11-30 8.535022 9.695740 0.427224 3.352291 11.561881 311.624202 1676.413354 713.680609
10 2020 2020-10-31 8.766880 17.531840 0.436720 3.671641 21.794714 312.511920 1783.446248 622.681765
9 2020 2020-09-30 8.782468 9.318198 0.447619 3.780273 21.613501 309.644693 1790.096266 594.891798
8 2020 2020-08-31 8.535027 9.583940 0.517722 3.860195 20.853958 303.025472 1759.655769 597.264223
7 2020 2020-07-31 8.226212 9.180892 0.474289 3.646311 19.746735 295.059049 1688.793476 613.164837
6 2020 2020-06-30 7.935418 8.632543 0.408732 3.410121 18.330232 287.545439 1593.554077 653.294214
5 2020 2020-05-31 7.752543 8.622809 0.329810 2.974434 16.470193 279.636700 1484.100789 685.164897
4 2020 2020-04-30 7.585870 8.028798 0.389348 2.856049 15.635886 273.859451 1426.279447 703.866225
3 2020 2020-03-31 7.602619 8.326974 0.278294 2.629290 13.983202 268.429411 1351.023144 698.012543
2 2020 2020-02-29 7.662918 7.607357 0.283796 2.483387 13.803671 264.348120 1300.252926 710.281917
1 2020 2020-01-31 7.754674 7.804206 0.336518 2.506526 13.554615 262.188585 1312.052006 700.065050
12 2019 2019-12-31 7.985734 7.461197 0.314830 2.417687 13.255049 259.519881 1309.507549 684.085808
11 2019 2019-11-30 8.226894 6.606762 0.411812 2.290050 12.958136 257.590110 1306.817593 654.086494
10 2019 2019-10-31 9.246261 7.644706 0.372446 2.673924 13.880747 257.093790 1407.996110 565.502500
9 2019 2019-09-30 9.661653 7.323367 0.420726 2.628199 13.308826 252.133648 1383.259943 550.533951
8 2019 2019-08-31 9.812387 6.681962 0.510871 2.483979 13.067311 243.440762 1302.643938 568.994610
7 2019 2019-07-31 9.943331 6.648062 0.492584 2.326774 11.791042 234.309877 1257.822071 572.162592
6 2019 2019-06-30 10.283766 6.186843 0.377420 2.165453 10.729356 226.061226 1226.489872 589.417023
5 2019 2019-05-31 10.118435 5.524447 0.311802 1.904199 9.275237 213.531002 1178.495602 657.617859
4 2019 2019-04-30 9.523452 4.620810 0.233048 1.703750 8.359211 195.914846 1044.554593 898.940531
3 2019 2019-03-31 8.363644 3.779225 0.213011 1.171834 6.179431 167.295488 815.904473 1415.431158
2 2019 2019-02-28 7.480646 3.316433 0.239254 0.959470 5.319684 142.571229 662.409360 1877.447767
1 2019 2019-01-31 6.103601 2.528294 0.177558 0.612607 4.039948 116.639744 516.362618 2423.434258
12 2018 2018-12-31 5.029865 2.526096 0.232814 0.510899 3.423260 95.641880 409.359902 2831.721010
11 2018 2018-11-30 3.861638 1.854492 0.276273 0.289241 2.813399 78.563868 289.631986 3176.243186
10 2018 2018-10-31 3.858354 2.037401 0.364234 0.358209 2.651448 75.036622 274.889362 3150.082060
9 2018 2018-09-30 3.882403 1.842717 0.443967 0.342131 2.848867 71.592693 271.972792 3073.598901
8 2018 2018-08-31 3.789915 1.874313 0.444453 0.339609 2.516580 67.629768 264.437564 3016.440033
7 2018 2018-07-31 3.785872 1.710799 0.479277 0.405084 2.416031 63.220747 256.035651 2971.905932
6 2018 2018-06-30 3.376091 1.780959 0.488345 0.431397 1.777461 59.424690 246.135108 2927.244192
5 2018 2018-05-31 NaN NaN NaN NaN NaN NaN NaN NaN
4 2018 2018-04-30 NaN NaN NaN NaN NaN NaN NaN NaN
3 2018 2018-03-31 NaN NaN NaN NaN NaN NaN NaN NaN
2 2018 2018-02-28 NaN NaN NaN NaN NaN NaN NaN NaN
1 2018 2018-01-31 NaN NaN NaN NaN NaN NaN NaN NaN
12 2017 2017-12-31 NaN NaN NaN NaN NaN NaN NaN NaN
11 2017 2017-11-30 NaN NaN NaN NaN NaN NaN NaN NaN
10 2017 2017-10-31 NaN NaN NaN NaN NaN NaN NaN NaN
9 2017 2017-09-30 NaN NaN NaN NaN NaN NaN NaN NaN
8 2017 2017-08-31 NaN NaN NaN NaN NaN NaN NaN NaN
7 2017 2017-07-31 0.892207 0.797776 0.572518 0.119328 0.203212 23.137884 230.986328 1756.658813
Now, for each month of each year, I have one mean. How can I divide the H2 column of the first data frame by this column, which holds a single number per month? For example:
April 2021: 30 days and one mean,
May 2021: 31 days and one mean.
Based on the index of these two data frames, this division should be performed.
I really appreciate it if you can help me find a solution.
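Not from the original post, but one possible approach (a sketch, using only the Data frame built above): a groupby transform broadcasts each month's mean back to every hourly row, so the division needs no manual index alignment.

# a sketch: divide each hourly H2 reading by its month's mean;
# transform('mean') returns a series aligned with the original rows
s = Data.set_index('Date')['H2']
monthly_mean = s.groupby([s.index.year, s.index.month]).transform('mean')
h2_rate = s / monthly_mean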
I have rain and temp data sourced from Environment Canada but it contains some NaN values.
start_date = '2015-12-31'
end_date = '2021-05-26'
mask = (data['date'] > start_date) & (data['date'] <= end_date)
df = data.loc[mask]
print(df)
date time rain_gauge_value temperature
8760 2016-01-01 00:00:00 0.0 -2.9
8761 2016-01-01 01:00:00 0.0 -3.4
8762 2016-01-01 02:00:00 0.0 -3.6
8763 2016-01-01 03:00:00 0.0 -3.6
8764 2016-01-01 04:00:00 0.0 -4.0
... ... ... ... ...
56107 2021-05-26 19:00:00 0.0 22.0
56108 2021-05-26 20:00:00 0.0 21.5
56109 2021-05-26 21:00:00 0.0 21.1
56110 2021-05-26 22:00:00 0.0 19.5
56111 2021-05-26 23:00:00 0.0 18.5
[47352 rows x 4 columns]
Find the rows with a NaN value
null = df[df['rain_gauge_value'].isnull()]
print(null)
date time rain_gauge_value temperature
11028 2016-04-04 12:00:00 NaN -6.9
11986 2016-05-14 10:00:00 NaN NaN
11987 2016-05-14 11:00:00 NaN NaN
11988 2016-05-14 12:00:00 NaN NaN
11989 2016-05-14 13:00:00 NaN NaN
... ... ... ... ...
49024 2020-08-04 16:00:00 NaN NaN
49025 2020-08-04 17:00:00 NaN NaN
50505 2020-10-05 09:00:00 NaN 11.3
54083 2021-03-03 11:00:00 NaN -5.1
54084 2021-03-03 12:00:00 NaN -4.5
[6346 rows x 4 columns]
This is my dataframe I want to use to fill the NaN values
print(rain_df)
date time rain_gauge_value temperature
0 2015-12-28 00:00:00 0.1 -6.0
1 2015-12-28 01:00:00 0.0 -7.0
2 2015-12-28 02:00:00 0.0 -8.0
3 2015-12-28 03:00:00 0.0 -8.0
4 2015-12-28 04:00:00 0.0 -7.0
... ... ... ... ...
48043 2021-06-19 19:00:00 0.6 20.0
48044 2021-06-19 20:00:00 0.6 19.0
48045 2021-06-19 21:00:00 0.8 18.0
48046 2021-06-19 22:00:00 0.4 17.0
48047 2021-06-19 23:00:00 0.0 16.0
[48048 rows x 4 columns]
But when I use the fillna() method, some of the values don't get substituted.
null = null.fillna(rain_df)
null = null[null['rain_gauge_value'].isnull()]
print(null)
date time rain_gauge_value temperature
48057 2020-06-25 09:00:00 NaN NaN
48058 2020-06-25 10:00:00 NaN NaN
48059 2020-06-25 11:00:00 NaN NaN
48060 2020-06-25 12:00:00 NaN NaN
48586 2020-07-17 10:00:00 NaN NaN
48587 2020-07-17 11:00:00 NaN NaN
48588 2020-07-17 12:00:00 NaN NaN
49022 2020-08-04 14:00:00 NaN NaN
49023 2020-08-04 15:00:00 NaN NaN
49024 2020-08-04 16:00:00 NaN NaN
49025 2020-08-04 17:00:00 NaN NaN
50505 2020-10-05 09:00:00 NaN 11.3
54083 2021-03-03 11:00:00 NaN -5.1
54084 2021-03-03 12:00:00 NaN -4.5
How can I resolve this issue?
When using fillna, you usually want a method: fill with the previous/next value, the mean of the column, etc. What we can do is this:
nulls_index = df['rain_gauge_value'].isnull()
df = df.fillna(method='ffill') # use ffill as example
nulls_after_fill = df[nulls_index]
take a look at:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
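(Note: in recent pandas versions, fillna(method='ffill') is deprecated; the equivalent is df.ffill(), with df.bfill() for backward filling.)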
You need to inform pandas how you want to patch. It may be obvious to you that you want to use the "patch" dataframe's values when the dates and times line up, but it won't be obvious to pandas. See my dummy example:
import numpy as np
import pandas as pd
from datetime import date, time

raw = pd.DataFrame(dict(date=[date(2015,12,28), date(2015,12,28)], time=[time(0,0,0), time(0,0,1)], temp=[1., np.nan], rain=[4., np.nan]))
raw
date time temp rain
0 2015-12-28 00:00:00 1.0 4.0
1 2015-12-28 00:00:01 NaN NaN
patch = pd.DataFrame(dict(date=[date(2015,12,28), date(2015,12,28)], time=[time(0,0,0),time(0,0,1)],temp=[5.,5.],rain=[10.,10.]))
patch
date time temp rain
0 2015-12-28 00:00:00 5.0 10.0
1 2015-12-28 00:00:01 5.0 10.0
You need the indexes of raw and patch to correspond to how you want to patch the raw data (in this case, you want to patch based on date and time):
raw.set_index(['date','time']).fillna(patch.set_index(['date','time']))
returns
temp rain
date time
2015-12-28 00:00:00 1.0 4.0
00:00:01 5.0 10.0
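Applied to the question's frames, that would be something like this sketch (assuming date and time together uniquely identify each row in both df and rain_df):

# align both frames on (date, time), patch the NaNs, then restore the columns
filled = (df.set_index(['date', 'time'])
            .fillna(rain_df.set_index(['date', 'time']))
            .reset_index())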
I want to find the last valid index of the first Dataframe, and use it to index the second Dataframe.
So, suppose I have the following Dataframe (df1):
Site 1 Site 2 Site 3 Site 4 Site 5 Site 6
Date
2000-01-01 13.0 28.0 76.0 45 90.0 58.0
2001-01-01 77.0 75.0 57.0 3 41.0 24.0
2002-01-01 50.0 29.0 2.0 65 48.0 21.0
2003-01-01 7.0 48.0 14.0 63 12.0 66.0
2004-01-01 11.0 90.0 11.0 5 47.0 6.0
2005-01-01 50.0 4.0 31.0 1 40.0 79.0
2006-01-01 30.0 98.0 91.0 96 43.0 39.0
2007-01-01 50.0 20.0 54.0 65 NaN 47.0
2008-01-01 24.0 84.0 52.0 84 NaN 81.0
2009-01-01 56.0 61.0 57.0 25 NaN 36.0
2010-01-01 87.0 45.0 68.0 65 NaN 71.0
2011-01-01 22.0 50.0 92.0 91 NaN 48.0
2012-01-01 12.0 44.0 79.0 77 NaN 25.0
2013-01-01 1.0 22.0 34.0 57 NaN 25.0
2014-01-01 94.0 NaN 86.0 97 NaN 91.0
2015-01-01 2.0 NaN 98.0 44 NaN 79.0
2016-01-01 81.0 NaN 35.0 87 NaN 32.0
2017-01-01 59.0 NaN 95.0 32 NaN 58.0
2018-01-01 NaN NaN 3.0 14 NaN NaN
2019-01-01 NaN NaN 48.0 9 NaN NaN
2020-01-01 NaN NaN NaN 49 NaN NaN
Now I can use "last_valid_index()" to find the last valid index of each column:
lvi = df.apply(lambda series: series.last_valid_index())
Which yields:
Site 1 2017-01-01
Site 2 2013-01-01
Site 3 2019-01-01
Site 4 2020-01-01
Site 5 2006-01-01
Site 6 2017-01-01
How do I apply this to another Dataframe, using the index to slice that Dataframe's timeseries? Another example of a Dataframe could be created with:
import pandas as pd
import numpy as np
from numpy import random
random.seed(30)
df2 = pd.DataFrame({
    "Site 1": np.random.rand(21),
    "Site 2": np.random.rand(21),
    "Site 3": np.random.rand(21),
    "Site 4": np.random.rand(21),
    "Site 5": np.random.rand(21),
    "Site 6": np.random.rand(21)})
idx = pd.date_range(start='2000-01-01', end='2020-01-01', freq='AS')
df2 = df2.set_index(idx)
How do I use that "lvi" variable to index into df2?
To do this manually I could just use:
df_s1 = df['Site 1'].loc['2000-01-01':'2017-01-01']
To get something like:
2000-01-01 13.0
2001-01-01 77.0
2002-01-01 50.0
2003-01-01 7.0
2004-01-01 11.0
2005-01-01 50.0
2006-01-01 30.0
2007-01-01 50.0
2008-01-01 24.0
2009-01-01 56.0
2010-01-01 87.0
2011-01-01 22.0
2012-01-01 12.0
2013-01-01 1.0
2014-01-01 94.0
2015-01-01 2.0
2016-01-01 81.0
2017-01-01 59.0
Is there a better way to approach this? Also, will each column have to essentially be its own dataframe to work? Any help is greatly appreciated!
This might be a bit more idiomatic:
df2[df.notna()]
or even
df2.where(df.notna())
Note that in these cases (and df1*0 + df2), the operations are done for matching index values of df and df2. For example, df2[df.reset_index(drop=True).notna()] will return all nan because there are no common index values.
This seems to work just fine:
In [34]: d
Out[34]:
x y
Date
2020-01-01 1.0 2.0
2020-01-02 1.0 2.0
2020-01-03 1.0 2.0
2020-01-04 1.0 2.0
2020-01-05 1.0 2.0
2020-01-06 1.0 NaN
2020-01-07 1.0 NaN
2020-01-08 1.0 NaN
2020-01-09 1.0 NaN
2020-01-10 1.0 NaN
2020-01-11 NaN NaN
2020-01-12 NaN NaN
2020-01-13 NaN NaN
2020-01-14 NaN NaN
2020-01-15 NaN NaN
2020-01-16 NaN NaN
2020-01-17 NaN NaN
2020-01-18 NaN NaN
2020-01-19 NaN NaN
2020-01-20 NaN NaN
In [35]: d.apply(lambda col: col.last_valid_index())
Out[35]:
x 2020-01-10
y 2020-01-05
dtype: datetime64[ns]
And then:
In [15]: d.apply(lambda col: col.last_valid_index()).apply(lambda date: df2.loc[date])
Out[15]:
          z
x  0.940396
y  0.564007
Alright, after thinking about this for a while and trying to come up with a detailed procedure involving a for loop, I came to the conclusion that this simple math operation will do the trick. Basically I am taking advantage of how math is done between Dataframes in pandas.
output = df1*0 + df2
This gives the output on df2 that will take on the NaN values from df1 and look like this:
Site 1 Site 2 Site 3 Site 4 Site 5 Site 6
Date
2000-01-01 0.690597 0.443933 0.787931 0.659639 0.363606 0.922373
2001-01-01 0.388669 0.577734 0.450225 0.021592 0.554249 0.305546
2002-01-01 0.578212 0.927848 0.361426 0.840541 0.626881 0.545491
2003-01-01 0.431668 0.128282 0.893351 0.783488 0.122182 0.666194
2004-01-01 0.151491 0.928584 0.834474 0.945401 0.590830 0.802648
2005-01-01 0.113477 0.398326 0.649955 0.202538 0.485927 0.127925
2006-01-01 0.521906 0.458672 0.923632 0.948696 0.638754 0.552753
2007-01-01 0.266599 0.839047 0.099069 0.000928 NaN 0.018146
2008-01-01 0.819810 0.809779 0.706223 0.247780 NaN 0.759691
2009-01-01 0.441574 0.020291 0.702551 0.468862 NaN 0.341191
2010-01-01 0.277030 0.130573 0.906697 0.589474 NaN 0.819986
2011-01-01 0.795344 0.103121 0.846405 0.589916 NaN 0.564411
2012-01-01 0.697255 0.599767 0.206482 0.718980 NaN 0.731366
2013-01-01 0.891771 0.001944 0.703132 0.751986 NaN 0.845933
2014-01-01 0.672579 NaN 0.466981 0.466770 NaN 0.618069
2015-01-01 0.767219 NaN 0.702156 0.370905 NaN 0.481971
2016-01-01 0.315264 NaN 0.793531 0.754920 NaN 0.091432
2017-01-01 0.431651 NaN 0.974520 0.708074 NaN 0.870077
2018-01-01 NaN NaN 0.408743 0.430576 NaN NaN
2019-01-01 NaN NaN 0.751509 0.755521 NaN NaN
2020-01-01 NaN NaN NaN 0.518533 NaN NaN
I was basically wanting to imprint the NaN values from one Dataframe onto another. I cannot believe how difficult I was making this. As long as my Dataframes are the same size this should work fine for my needs.
Now I should be able to take it from here to calculate the percent change from each last valid datapoint. Thank you everyone for the input!
EDIT:
Just to show everyone what I was ultimately trying to accomplish, here is the final code I produced with everyone's help and suggestions!
The original df originally looked like:
Site 1 Site 2 Site 3 Site 4 Site 5 Site 6
Date
2000-01-01 13.0 28.0 76.0 45 90.0 58.0
2001-01-01 77.0 75.0 57.0 3 41.0 24.0
2002-01-01 50.0 29.0 2.0 65 48.0 21.0
2003-01-01 7.0 48.0 14.0 63 12.0 66.0
2004-01-01 11.0 90.0 11.0 5 47.0 6.0
2005-01-01 50.0 4.0 31.0 1 40.0 79.0
2006-01-01 30.0 98.0 91.0 96 43.0 39.0
2007-01-01 50.0 20.0 54.0 65 NaN 47.0
2008-01-01 24.0 84.0 52.0 84 NaN 81.0
2009-01-01 56.0 61.0 57.0 25 NaN 36.0
2010-01-01 87.0 45.0 68.0 65 NaN 71.0
2011-01-01 22.0 50.0 92.0 91 NaN 48.0
2012-01-01 12.0 44.0 79.0 77 NaN 25.0
2013-01-01 1.0 22.0 34.0 57 NaN 25.0
2014-01-01 94.0 NaN 86.0 97 NaN 91.0
2015-01-01 2.0 NaN 98.0 44 NaN 79.0
2016-01-01 81.0 NaN 35.0 87 NaN 32.0
2017-01-01 59.0 NaN 95.0 32 NaN 58.0
2018-01-01 NaN NaN 3.0 14 NaN NaN
2019-01-01 NaN NaN 48.0 9 NaN NaN
2020-01-01 NaN NaN NaN 49 NaN NaN
Then I came up with a second full dataframe (df2) with:
df2 = pd.DataFrame({
    "Site 1": np.random.rand(21),
    "Site 2": np.random.rand(21),
    "Site 3": np.random.rand(21),
    "Site 4": np.random.rand(21),
    "Site 5": np.random.rand(21),
    "Site 6": np.random.rand(21)})
idx = pd.date_range(start='2000-01-01', end='2020-01-01', freq='AS')
df2 = df2.set_index(idx)
Now I replace the nan values in df2 with the nan values from df:
dfr = df2[df.notna()]
Then I invert the dataframe:
dfr = dfr[::-1]
valid_first = dfr.apply(lambda col: col.first_valid_index())
valid_last = dfr.apply(lambda col: col.last_valid_index())
Now I want to calculate the percent change from my last valid data point, which is fixed for each column. This gives me the % change from the present to the past, with respect to the most recent (or last valid) data point.
new = []
for j in dfr:
    m = dfr[j].loc[valid_first[j]:valid_last[j]]
    pc = m / m.iloc[0] - 1
    new.append(pc)
final = pd.concat(new, axis=1)
print(final)
Which gave me:
Site 1 Site 2 Site 3 Site 4 Site 5 Site 6
2000-01-01 0.270209 -0.728445 -0.636105 0.380330 41.339081 -0.462147
2001-01-01 0.854952 -0.827804 -0.703568 -0.787391 40.588791 -0.884806
2002-01-01 -0.677757 -0.120482 -0.208255 -0.982097 54.348094 -0.483415
2003-01-01 -0.322010 -0.061277 -0.382602 1.025088 5.440808 -0.602661
2004-01-01 1.574451 -0.768251 -0.543260 1.210434 50.494788 -0.859331
2005-01-01 -0.412226 -0.866441 -0.055027 -0.168267 1.346869 -0.385080
2006-01-01 1.280867 -0.640899 0.354513 1.086703 0.000000 0.108504
2007-01-01 1.121585 -0.741675 -0.735990 -0.768578 NaN -0.119436
2008-01-01 -0.210467 -0.376884 -0.575106 -0.779147 NaN 0.055949
2009-01-01 1.864107 -0.966827 0.566590 1.003121 NaN -0.214482
2010-01-01 0.571762 -0.311459 -0.518113 1.036950 NaN -0.513911
2011-01-01 -0.122525 -0.178137 -0.641642 0.197481 NaN 0.033141
2012-01-01 0.403578 -0.829402 0.161753 -0.438578 NaN -0.996595
2013-01-01 0.383481 0.000000 -0.305824 0.602079 NaN -0.057711
2014-01-01 -0.699708 NaN -0.515074 -0.277157 NaN -0.840873
2015-01-01 0.422364 NaN -0.759708 1.230037 NaN -0.663253
2016-01-01 -0.418945 NaN 0.197396 -0.445260 NaN -0.299741
2017-01-01 0.000000 NaN -0.897428 0.669791 NaN 0.000000
2018-01-01 NaN NaN 0.138997 0.486961 NaN NaN
2019-01-01 NaN NaN 0.000000 0.200771 NaN NaN
2020-01-01 NaN NaN NaN 0.000000 NaN NaN
I know often times these questions don't have context, so here is the final output achieved thanks to your input. Again, thank you to everyone for the help!
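As an aside (not part of the original answers), the per-column loop can also be written without explicit iteration; a sketch, assuming dfr as built above with at least one valid value per column:

# divide each column by its first valid value (the most recent datapoint,
# since dfr is inverted); column alignment keeps the NaN pattern intact
ref = dfr.apply(lambda col: col.loc[col.first_valid_index()])
final = dfr.div(ref, axis=1) - 1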
I have multiple different series saved as a MultiIndex (2-level) pandas dataframe. I want to know how to reindex a MultiIndex dataframe so that I get indexes for all (hourly) timestamps between two existing indexes.
So this is an example of my dataframe:
A B C D
tick act
2019-01-10 2019-01-09 20:00:00 5.0 5.0 5.0 5.0
2019-01-10 00:00:00 52.0 34.0 1.0 9.0
2019-01-10 01:00:00 75.0 52.0 61.0 1.0
2019-01-10 02:00:00 28.0 29.0 46.0 61.0
2019-01-16 2019-01-09 22:00:00 91.0 42.0 3.0 34.0
2019-01-10 02:00:00 2.0 22.0 41.0 59.0
2019-01-10 03:00:00 16.0 9.0 92.0 53.0
And this is what I want to get:
A B C D
tick act
2019-01-10 2019-01-09 20:00:00 5.0 5.0 5.0 5.0
2019-01-09 21:00:00 NaT NaN NaN NaN NaN
2019-01-09 22:00:00 NaT NaN NaN NaN NaN
2019-01-09 23:00:00 NaT NaN NaN NaN NaN
2019-01-10 00:00:00 52.0 34.0 1.0 9.0
2019-01-10 01:00:00 75.0 52.0 61.0 1.0
2019-01-10 02:00:00 28.0 29.0 46.0 61.0
2019-01-16 2019-01-09 22:00:00 91.0 42.0 3.0 34.0
2019-01-09 23:00:00 NaT NaN NaN NaN NaN
2019-01-10 00:00:00 NaT NaN NaN NaN NaN
2019-01-10 01:00:00 NaT NaN NaN NaN NaN
2019-01-10 02:00:00 2.0 22.0 41.0 59.0
2019-01-10 03:00:00 16.0 9.0 92.0 53.0
The important thing to remember is that the act index level doesn't have the same date range in each group (for example, for 2019-01-10 it starts at 2019-01-09 20:00:00 and ends at 2019-01-10 02:00:00, while for 2019-01-16 it starts at 2019-01-09 22:00:00 and ends at 2019-01-10 03:00:00).
I am mainly interested in whether there is a solution using pandas methods, without unnecessary external loops.
First, reset_index your data:
d = df.reset_index()
d
tick act A B C D
0 2019-01-10 2019-01-09 20:00:00 5.0 5.0 5.0 5.0
1 2019-01-10 2019-01-10 00:00:00 52.0 34.0 1.0 9.0
2 2019-01-10 2019-01-10 01:00:00 75.0 52.0 61.0 1.0
3 2019-01-10 2019-01-10 02:00:00 28.0 29.0 46.0 61.0
4 2019-01-16 2019-01-09 22:00:00 91.0 42.0 3.0 34.0
5 2019-01-16 2019-01-10 02:00:00 2.0 22.0 41.0 59.0
6 2019-01-16 2019-01-10 03:00:00 16.0 9.0 92.0 53.0
Group your data by tick and apply the interpolate function to each group.
def interpolate(df):
    # generate a complete hourly index from the group's first to last timestamp
    new_index = pd.date_range(df.act.min(), df.act.max(), freq="h")
    # set `act` as the index and upsample to hourly frequency
    return df.set_index("act").reindex(new_index)
d.groupby("tick").apply(interpolate)
It gives:
tick A B C D
tick
2019-01-10 2019-01-09 20:00:00 2019-01-10 5.0 5.0 5.0 5.0
2019-01-09 21:00:00 NaN NaN NaN NaN NaN
2019-01-09 22:00:00 NaN NaN NaN NaN NaN
2019-01-09 23:00:00 NaN NaN NaN NaN NaN
2019-01-10 00:00:00 2019-01-10 52.0 34.0 1.0 9.0
2019-01-10 01:00:00 2019-01-10 75.0 52.0 61.0 1.0
2019-01-10 02:00:00 2019-01-10 28.0 29.0 46.0 61.0
2019-01-16 2019-01-09 22:00:00 2019-01-16 91.0 42.0 3.0 34.0
2019-01-09 23:00:00 NaN NaN NaN NaN NaN
2019-01-10 00:00:00 NaN NaN NaN NaN NaN
2019-01-10 01:00:00 NaN NaN NaN NaN NaN
2019-01-10 02:00:00 2019-01-16 2.0 22.0 41.0 59.0
2019-01-10 03:00:00 2019-01-16 16.0 9.0 92.0 53.0
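Note the leftover tick column inside the result (it travelled along from the reset_index). A small sketch of cleaning it up:

# drop the redundant tick column and name the index levels
result = d.groupby("tick").apply(interpolate).drop(columns="tick")
result.index.names = ["tick", "act"]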
This is part of my data:
Day_Data Hour_Data WIN_D WIN_S TEM RHU PRE_1h
1 0 58 1 22 78 0
1 3 32 1.9 24.6 65 0
1 6 41 3.2 25.6 59 0
1 9 20 0.8 24.8 64 0
1 12 44 1.7 22.7 76 0
1 15 118 0.7 20.2 92 0
1 18 70 2.6 20.2 94 0
1 21 76 3.4 19.9 66 0
2 0 76 3.8 19.4 58 0
2 3 75 5.8 19.4 47 0
2 6 81 5.1 19.5 42 0
2 9 61 3.6 17.4 48 0
2 12 50 0.9 15.8 46 0
2 15 348 1.1 14.5 52 0
2 18 357 1.9 13.5 60 0
2 21 333 1.2 12.4 74 0
And I want to generate extra hourly rows, where the fill values are the mean of the previous value and the next value.
How can I do that?
Thank you!
And @jdy, thanks for the reminder; this is what I have done:
data['time'] = '2017' + '-' + '10' + '-' + data['Day_Data'].map(int).map(str) + ' ' + data['Hour_Data'].map(int).map(str) + ':00:00'
from datetime import datetime
data.loc[:, 'Date'] = pd.to_datetime(data['time'])
data = data.drop(['Day_Data', 'Hour_Data', 'time'], axis=1)
index = data.set_index(data['Date'])
data = index.resample('1h').mean()
Output:
2017-10-01 00:00:00 58.0 1.0 22.0 78.0 0.0
2017-10-01 01:00:00 NaN NaN NaN NaN NaN
2017-10-01 02:00:00 NaN NaN NaN NaN NaN
2017-10-01 03:00:00 32.0 1.9 24.6 65.0 0.0
2017-10-01 04:00:00 NaN NaN NaN NaN NaN
2017-10-01 05:00:00 NaN NaN NaN NaN NaN
2017-10-01 06:00:00 41.0 3.2 25.6 59.0 0.0
2017-10-01 07:00:00 NaN NaN NaN NaN NaN
2017-10-01 08:00:00 NaN NaN NaN NaN NaN
2017-10-01 09:00:00 20.0 0.8 24.8 64.0 0.0
2017-10-01 10:00:00 NaN NaN NaN NaN NaN
2017-10-01 11:00:00 NaN NaN NaN NaN NaN
2017-10-01 12:00:00 44.0 1.7 22.7 76.0 0.0
2017-10-01 13:00:00 NaN NaN NaN NaN NaN
2017-10-01 14:00:00 NaN NaN NaN NaN NaN
2017-10-01 15:00:00 118.0 0.7 20.2 92.0 0.0
2017-10-01 16:00:00 NaN NaN NaN NaN NaN
2017-10-01 17:00:00 NaN NaN NaN NaN NaN
2017-10-01 18:00:00 70.0 2.6 20.2 94.0 0.0
2017-10-01 19:00:00 NaN NaN NaN NaN NaN
2017-10-01 20:00:00 NaN NaN NaN NaN NaN
2017-10-01 21:00:00 76.0 3.4 19.9 66.0 0.0
2017-10-01 22:00:00 NaN NaN NaN NaN NaN
2017-10-01 23:00:00 NaN NaN NaN NaN NaN
2017-10-02 00:00:00 76.0 3.8 19.4 58.0 0.0
2017-10-02 01:00:00 NaN NaN NaN NaN NaN
2017-10-02 02:00:00 NaN NaN NaN NaN NaN
2017-10-02 03:00:00 75.0 5.8 19.4 47.0 0.0
2017-10-02 04:00:00 NaN NaN NaN NaN NaN
2017-10-02 05:00:00 NaN NaN NaN NaN NaN
2017-10-02 06:00:00 81.0 5.1 19.5 42.0 0.0
but I have no idea how to fill the NaN values with the mean of the previous value and the next value.
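Not part of the original post, but a sketch of one way to do it: forward-fill and back-fill separately and average them, which sets every NaN in a gap to the mean of the surrounding valid values (assuming data is the resampled frame from above). If evenly spaced intermediate values are wanted instead, pandas' built-in data.interpolate(method='linear') is the alternative.

# mean of the previous valid value and the next valid value:
# every NaN in a gap gets the same (ffill + bfill) / 2 value
filled = (data.ffill() + data.bfill()) / 2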