Resampling Hourly Data into Half Hourly in Pandas


I have the following DataFrame called prices:
DateTime PriceAmountGBP
0 2022-03-27 23:00:00 202.807890
1 2022-03-28 00:00:00 197.724150
2 2022-03-28 01:00:00 191.615328
3 2022-03-28 02:00:00 188.798436
4 2022-03-28 03:00:00 187.706682
... ... ...
19 2023-01-24 18:00:00 216.915400
20 2023-01-24 19:00:00 197.050516
21 2023-01-24 20:00:00 168.227992
22 2023-01-24 21:00:00 158.954200
23 2023-01-24 22:00:00 149.039322
I'm trying to resample prices to show Half Hourly data instead of Hourly, with PriceAmountGBP repeating on the half hour, desired output below:
DateTime PriceAmountGBP
0 2022-03-27 23:00:00 202.807890
1 2022-03-27 23:30:00 202.807890
2 2022-03-28 00:00:00 197.724150
3 2022-03-28 00:30:00 197.724150
4 2022-03-28 01:00:00 191.615328
... ... ...
19 2023-01-24 18:00:00 216.915400
20 2023-01-24 18:30:00 216.915400
21 2023-01-24 19:00:00 197.050516
22 2023-01-24 19:30:00 197.050516
23 2023-01-24 20:00:00 168.227992
I've attempted the below which is incorrect:
prices.set_index('DateTime').resample('30T').interpolate()
Output:
PriceAmountGBP
DateTime
2022-03-27 23:00:00 202.807890
2022-03-27 23:30:00 200.266020
2022-03-28 00:00:00 197.724150
2022-03-28 00:30:00 194.669739
2022-03-28 01:00:00 191.615328
... ...
2023-01-24 20:00:00 168.227992
2023-01-24 20:30:00 163.591096
2023-01-24 21:00:00 158.954200
2023-01-24 21:30:00 153.996761
2023-01-24 22:00:00 149.039322
Any help appreciated!

You want to resample to the new frequency without aggregating or interpolating, and then "forward fill" the resulting null values.
That's:
result = (
    prices.set_index('DateTime')
    .resample('30min')  # '30T' is a deprecated alias for '30min' in recent pandas
    .asfreq()           # no transformation: the new half-hour rows are NaN
    .ffill()            # drag previous values down
)
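As a quick check of the answer above, here is a minimal sketch on a three-row sample built from the first rows of the question's data. The name `half_hourly` and the final `reset_index()` are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Three-row sample in the same shape as `prices` (values from the question).
prices = pd.DataFrame({
    "DateTime": pd.to_datetime(
        ["2022-03-27 23:00", "2022-03-28 00:00", "2022-03-28 01:00"]
    ),
    "PriceAmountGBP": [202.807890, 197.724150, 191.615328],
})

half_hourly = (
    prices.set_index("DateTime")
    .resample("30min")   # same frequency as "30T"
    .asfreq()            # add the :30 rows as NaN, leave existing values untouched
    .ffill()             # copy each hourly price onto the following half hour
    .reset_index()       # back to a DateTime column, as in the desired output
)
print(half_hourly)
```

Note that the resampled index stops at the last timestamp in the data, so the final hour (01:00 here) does not get a matching :30 row.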

Related

Plotting a graph of one day from a year's data

So I have a dataset that has electricity load over 24 hours:
Time_of_Day = loadData.groupby(loadData.index.hour).mean()
Time_of_Day
Time Load
2019-01-01 01:00:00 38.045
2019-01-01 02:00:00 30.675
2019-01-01 03:00:00 22.570
2019-01-01 04:00:00 22.153
2019-01-01 05:00:00 21.085
... ...
2019-12-31 20:00:00 65.565
2019-12-31 21:00:00 53.513
2019-12-31 22:00:00 49.096
2019-12-31 23:00:00 44.409
2020-01-01 00:00:00 45.744
How do I plot a random day (24 hours) from the 8760 hours, please?

With the following toy dataframe:
import pandas as pd
import random
df = pd.DataFrame({"Time": pd.date_range(start="1/1/2019", end="12/31/2019", freq="H")})
df["Load"] = [round(random.random() * 100, 2) for _ in range(df.shape[0])]
Time Load
0 2019-01-01 00:00:00 53.36
1 2019-01-01 01:00:00 34.20
2 2019-01-01 02:00:00 64.19
3 2019-01-01 03:00:00 89.18
4 2019-01-01 04:00:00 27.82
... ... ...
8732 2019-12-30 20:00:00 38.26
8733 2019-12-30 21:00:00 49.66
8734 2019-12-30 22:00:00 64.15
8735 2019-12-30 23:00:00 23.97
8736 2019-12-31 00:00:00 3.72
[8737 rows x 2 columns]
Here is one way to do it, using the choice function from the Python standard library random module to pick one date that exists in the data:
# In Jupyter cell
random_day = random.choice(df["Time"].dt.date.unique())
df[df["Time"].dt.date == random_day].plot(x="Time")
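The selection step can also be wrapped in a small helper and checked without plotting. This is only a sketch: `pick_random_day` is a hypothetical helper name, and the toy `Load` values are made up for illustration.

```python
import random

import pandas as pd

# Toy hourly data covering all of 2019 (made-up Load values).
times = pd.date_range("2019-01-01", "2019-12-31 23:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Time": times, "Load": [float(t.hour) for t in times]})


def pick_random_day(frame, rng=random):
    """Return the rows of one calendar day chosen at random from `frame`."""
    days = frame["Time"].dt.normalize()        # timestamps truncated to midnight
    chosen = rng.choice(list(days.unique()))   # a date that actually occurs in the data
    return frame[days == chosen]


one_day = pick_random_day(df, random.Random(0))
print(one_day)
# one_day.plot(x="Time", y="Load") would draw the day (requires matplotlib).
```

Choosing from the dates that occur in the data, rather than choosing a month and a day number separately, avoids combinations that do not exist (such as 30 February) and would give an empty selection.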

Add hours to year-month-day data in pandas data frame

I have the following data frame with hourly resolution
day_ahead_DK1
Out[27]:
DateStamp DK1
0 2017-01-01 20.96
1 2017-01-01 20.90
2 2017-01-01 18.13
3 2017-01-01 16.03
4 2017-01-01 16.43
... ...
8756 2017-12-31 25.56
8757 2017-12-31 11.02
8758 2017-12-31 7.32
8759 2017-12-31 1.86
type(day_ahead_DK1)
Out[28]: pandas.core.frame.DataFrame
But the DateStamp column is missing the hours. How can I add 00:00:00 to 2017-01-01 at index 0 (giving 2017-01-01 00:00:00), 01:00:00 at index 1 (giving 2017-01-01 01:00:00), and so on, so that every day has hours from 0 to 23? Thank you!
The expected output:
day_ahead_DK1
Out[27]:
DateStamp DK1
0 2017-01-01 00:00:00 20.96
1 2017-01-01 01:00:00 20.90
2 2017-01-01 02:00:00 18.13
3 2017-01-01 03:00:00 16.03
4 2017-01-01 04:00:00 16.43
... ...
8756 2017-12-31 20:00:00 25.56
8757 2017-12-31 21:00:00 11.02
8758 2017-12-31 22:00:00 7.32
8759 2017-12-31 23:00:00 1.86

Use GroupBy.cumcount as a per-day counter, convert it to hours with to_timedelta, and add it to the DateStamp column:
df['DateStamp'] = pd.to_datetime(df['DateStamp'])
df['DateStamp'] += pd.to_timedelta(df.groupby('DateStamp').cumcount(), unit='h')
print(df)
DateStamp DK1
0 2017-01-01 00:00:00 20.96
1 2017-01-01 01:00:00 20.90
2 2017-01-01 02:00:00 18.13
3 2017-01-01 03:00:00 16.03
4 2017-01-01 04:00:00 16.43
8756 2017-12-31 00:00:00 25.56
8757 2017-12-31 01:00:00 11.02
8758 2017-12-31 02:00:00 7.32
8759 2017-12-31 03:00:00 1.86
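To see what the counter does, here is a small self-contained sketch. The six-row sample and the intermediate name `hour_of_day` are hypothetical, and the approach assumes each date's rows are already in hour order with none missing.

```python
import pandas as pd

# Hypothetical sample: two days with three hourly readings each.
df = pd.DataFrame({
    "DateStamp": ["2017-01-01"] * 3 + ["2017-01-02"] * 3,
    "DK1": [20.96, 20.90, 18.13, 16.03, 16.43, 25.56],
})

df["DateStamp"] = pd.to_datetime(df["DateStamp"])
# cumcount numbers the rows within each date: 0, 1, 2, 0, 1, 2
hour_of_day = df.groupby("DateStamp").cumcount()
# Turn the counter into hours and add it to the midnight timestamps.
df["DateStamp"] = df["DateStamp"] + pd.to_timedelta(hour_of_day, unit="h")
print(df)
```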

How can I calculate relative time between two time stamps in hours in pandas?

In the example dataframe below, how can I convert t_relative into hours? For example, the relative time in the first row would be 49 hours.
tstart tend t_relative
0 2131-05-16 23:00:00 2131-05-19 00:00:00 2 days 01:00:00
1 2131-05-16 23:00:00 2131-05-19 00:15:00 2 days 01:15:00
2 2131-05-16 23:00:00 2131-05-19 00:45:00 2 days 01:45:00
3 2131-05-16 23:00:00 2131-05-19 01:00:00 2 days 02:00:00
4 2131-05-16 23:00:00 2131-05-19 01:15:00 2 days 02:15:00
t_relative was calculated with the operation, df['t_relative'] = df['tend']-df['tstart'].

You can divide the Timedelta column by a one-hour Timedelta:
df['t_relative'] / pd.Timedelta(hours=1)
Output:
0 49.00
1 49.25
2 49.75
3 50.00
4 50.25
Name: t_relative, dtype: float64
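A minimal sketch of the division on the first two rows of the question's data, next to an equivalent route through `dt.total_seconds()`. The names `hours_by_division` and `hours_by_seconds` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "tstart": pd.to_datetime(["2131-05-16 23:00:00"] * 2),
    "tend": pd.to_datetime(["2131-05-19 00:00:00", "2131-05-19 00:15:00"]),
})
df["t_relative"] = df["tend"] - df["tstart"]

# Dividing a timedelta column by a one-hour Timedelta gives float hours.
hours_by_division = df["t_relative"] / pd.Timedelta(hours=1)
# Equivalent: convert to seconds first, then to hours.
hours_by_seconds = df["t_relative"].dt.total_seconds() / 3600
print(hours_by_division.tolist())
```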

Select groups using slicing based on the group index in pandas DataFrame

I have a DataFrame with users identified by the column 'user_id'. Each of these users has several entries in the DataFrame, based on the date on which they did something, which is also a column. The DataFrame looks something like
df:
user_id date
0 2019-04-13 02:00:00
0 2019-04-13 03:00:00
3 2019-02-18 22:00:00
3 2019-02-18 23:00:00
3 2019-02-19 00:00:00
3 2019-02-19 02:00:00
3 2019-02-19 03:00:00
3 2019-02-19 04:00:00
8 2019-04-05 04:00:00
8 2019-04-05 05:00:00
8 2019-04-05 06:00:00
8 2019-04-05 15:00:00
15 2019-04-28 19:00:00
15 2019-04-28 20:00:00
15 2019-04-29 01:00:00
23 2019-06-24 02:00:00
23 2019-06-24 05:00:00
23 2019-06-24 06:00:00
24 2019-03-27 12:00:00
24 2019-03-27 13:00:00
What I want to do is, for example, select the first 3 users. I wanted to do this with a code like this:
df.groupby('user_id').iloc[:3]
I know that groupby doesn't have an iloc so how could I achieve the same thing like an iloc in the groups, so I am able to slice them?

I found a way based on crayxt's answer:
df[df['user_id'].isin(df['user_id'].unique()[:3])]
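The sketch below runs that selection on a small made-up frame and compares it with a second option based on `GroupBy.ngroup()`, which numbers the groups so they can be sliced like positions. The sample data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample with five users.
df = pd.DataFrame({
    "user_id": [0, 0, 3, 3, 3, 8, 8, 15, 15, 23],
    "date": pd.date_range("2019-04-13 02:00", periods=10, freq=pd.Timedelta(hours=1)),
})

# Option 1: keep rows whose user_id is among the first three distinct ids.
first_ids = df["user_id"].unique()[:3]          # order of first appearance
by_isin = df[df["user_id"].isin(first_ids)]

# Option 2: number the groups and slice on that number.
# sort=False numbers groups in order of first appearance, matching unique().
group_number = df.groupby("user_id", sort=False).ngroup()
by_ngroup = df[group_number < 3]
print(by_isin)
```

With the group number in hand, other slices work the same way, for example `df[group_number.between(1, 2)]` for the second and third users.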

How can I efficiently convert hourly data into dates and times for every day of the year using Python pandas?

I have a pandas DataFrame that represents a value for every hour of a day and I want to report each value of each day for a year. I have written the 'naive' way to do it. Is there a more efficient way?
Naive way (that works correctly, but takes a lot of time):
dfConsoFrigo = pd.read_csv("../assets/datas/refregirateur.csv", sep=';')
dataframe = pd.DataFrame(columns=['Puissance'])
iterator = 0
for day in pd.date_range("01 Jan 2017 00:00", "31 Dec 2017 23:00", freq='1H'):
    iterator = iterator % 24
    dataframe.loc[day] = dfConsoFrigo.iloc[iterator]['Puissance']
    iterator += 1
Input (time;value) 24 rows:
Heure;Puissance
00:00;48.0
01:00;47.0
02:00;46.0
03:00;46.0
04:00;45.0
05:00;46.0
...
19:00;55.0
20:00;53.0
21:00;51.0
22:00;50.0
23:00;49.0
Expected Output (8760 rows):
Puissance
2017-01-01 00:00:00 48
2017-01-01 01:00:00 47
2017-01-01 02:00:00 46
2017-01-01 03:00:00 46
2017-01-01 04:00:00 45
...
2017-12-31 20:00:00 53
2017-12-31 21:00:00 51
2017-12-31 22:00:00 50
2017-12-31 23:00:00 49

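I think you need numpy.tile:
import numpy as np
import pandas as pd
np.random.seed(10)
# stand-in for the 24 hourly values read from the CSV
df = pd.DataFrame({'Puissance':np.random.randint(100, size=24)})
rng = pd.date_range("01 Jan 2017 00:00", "31 Dec 2017 23:00", freq='1H')
# repeat the 24-value profile once per day of 2017 (365 days)
df = pd.DataFrame({'a':np.tile(df['Puissance'].values, 365)}, index=rng)
print (df.head(30))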
a
2017-01-01 00:00:00 9
2017-01-01 01:00:00 15
2017-01-01 02:00:00 64
2017-01-01 03:00:00 28
2017-01-01 04:00:00 89
2017-01-01 05:00:00 93
2017-01-01 06:00:00 29
2017-01-01 07:00:00 8
2017-01-01 08:00:00 73
2017-01-01 09:00:00 0
2017-01-01 10:00:00 40
2017-01-01 11:00:00 36
2017-01-01 12:00:00 16
2017-01-01 13:00:00 11
2017-01-01 14:00:00 54
2017-01-01 15:00:00 88
2017-01-01 16:00:00 62
2017-01-01 17:00:00 33
2017-01-01 18:00:00 72
2017-01-01 19:00:00 78
2017-01-01 20:00:00 49
2017-01-01 21:00:00 51
2017-01-01 22:00:00 54
2017-01-01 23:00:00 77
2017-01-02 00:00:00 9
2017-01-02 01:00:00 15
2017-01-02 02:00:00 64
2017-01-02 03:00:00 28
2017-01-02 04:00:00 89
2017-01-02 05:00:00 93
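A variation on the same idea, as a sketch only: the day count can be derived from the index instead of hardcoding 365, or each timestamp's hour can be used to look up the profile directly. The 24-value `profile` below is made up, and the names `tiled` and `by_hour` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical 24-value daily profile, one value per hour (positions 0..23).
profile = pd.DataFrame({"Puissance": np.arange(48.0, 72.0)})
values = profile["Puissance"].to_numpy()

rng = pd.date_range("2017-01-01 00:00", "2017-12-31 23:00", freq=pd.Timedelta(hours=1))

# Option 1: tile the profile once per day; the day count comes from the index.
n_days = len(rng) // 24
tiled = pd.DataFrame({"Puissance": np.tile(values, n_days)}, index=rng)

# Option 2: look up each timestamp's hour in the profile (no day count needed).
by_hour = pd.DataFrame({"Puissance": values[rng.hour.to_numpy()]}, index=rng)
print(tiled.head(3))
```

The second option also keeps working if the date range does not start at midnight or does not cover whole days.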
