I'm trying to round a datetime object DOWN in Python and am having a few problems. There is a lot on here about rounding datetimes, but I can't find anything specific to my needs.
I'm trying to get a date range of 15 minute intervals, with .now() as the end point. To get my end= I do:
pd.Timestamp.now().round('15min')
which returns:
2019-08-16 11:15:00, which is exactly what I want. However, if I run this at, say, 11:23, it returns 2019-08-16 11:30:00, and that's not what I want: it should round down to 2019-08-16 11:15:00 right up until the moment we strike 11:30.
Is there a simple way to get it to round down? I haven't had any luck finding the answer.
Cheers for any help.
Use Timestamp.floor:
print(pd.Timestamp('2019-08-16 11:15:00').floor('15min'))
2019-08-16 11:15:00
print(pd.Timestamp('2019-08-16 11:23:00').floor('15min'))
2019-08-16 11:15:00
print(pd.Timestamp('2019-08-16 11:30:00').floor('15min'))
2019-08-16 11:30:00
For testing:
df = pd.DataFrame({'dates': pd.date_range('2009-01-01', freq='T', periods=20)})
df['new'] = df['dates'].dt.floor('15min')
print(df)
                 dates                 new
0  2009-01-01 00:00:00 2009-01-01 00:00:00
1  2009-01-01 00:01:00 2009-01-01 00:00:00
2  2009-01-01 00:02:00 2009-01-01 00:00:00
3  2009-01-01 00:03:00 2009-01-01 00:00:00
4  2009-01-01 00:04:00 2009-01-01 00:00:00
5  2009-01-01 00:05:00 2009-01-01 00:00:00
6  2009-01-01 00:06:00 2009-01-01 00:00:00
7  2009-01-01 00:07:00 2009-01-01 00:00:00
8  2009-01-01 00:08:00 2009-01-01 00:00:00
9  2009-01-01 00:09:00 2009-01-01 00:00:00
10 2009-01-01 00:10:00 2009-01-01 00:00:00
11 2009-01-01 00:11:00 2009-01-01 00:00:00
12 2009-01-01 00:12:00 2009-01-01 00:00:00
13 2009-01-01 00:13:00 2009-01-01 00:00:00
14 2009-01-01 00:14:00 2009-01-01 00:00:00
15 2009-01-01 00:15:00 2009-01-01 00:15:00
16 2009-01-01 00:16:00 2009-01-01 00:15:00
17 2009-01-01 00:17:00 2009-01-01 00:15:00
18 2009-01-01 00:18:00 2009-01-01 00:15:00
19 2009-01-01 00:19:00 2009-01-01 00:15:00
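Back to the original goal: a minimal sketch that builds the 15-minute range ending at the floored now() (the choice of 8 periods is an arbitrary assumption):
import pandas as pd

# floor, don't round, so 11:23 maps to 11:15 until 11:30 strikes
end = pd.Timestamp.now().floor('15min')
rng = pd.date_range(end=end, periods=8, freq='15min')  # 8 periods is arbitrary
print(rng)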
Related
I have the following DataFrame called prices:
DateTime PriceAmountGBP
0 2022-03-27 23:00:00 202.807890
1 2022-03-28 00:00:00 197.724150
2 2022-03-28 01:00:00 191.615328
3 2022-03-28 02:00:00 188.798436
4 2022-03-28 03:00:00 187.706682
... ... ...
19 2023-01-24 18:00:00 216.915400
20 2023-01-24 19:00:00 197.050516
21 2023-01-24 20:00:00 168.227992
22 2023-01-24 21:00:00 158.954200
23 2023-01-24 22:00:00 149.039322
I'm trying to resample prices to show half-hourly data instead of hourly, with PriceAmountGBP repeating on the half hour; desired output below:
DateTime PriceAmountGBP
0 2022-03-27 23:00:00 202.807890
1 2022-03-27 23:30:00 202.807890
2 2022-03-28 00:00:00 197.724150
3 2022-03-28 00:30:00 197.724150
4 2022-03-28 01:00:00 191.615328
... ... ...
19 2023-01-24 18:00:00 216.915400
20 2023-01-24 18:30:00 216.915400
21 2023-01-24 19:00:00 197.050516
22 2023-01-24 19:30:00 197.050516
23 2023-01-24 20:00:00 168.227992
I've attempted the below, which is incorrect:
prices.set_index('DateTime').resample('30T').interpolate()
Output:
PriceAmountGBP
DateTime
2022-03-27 23:00:00 202.807890
2022-03-27 23:30:00 200.266020
2022-03-28 00:00:00 197.724150
2022-03-28 00:30:00 194.669739
2022-03-28 01:00:00 191.615328
... ...
2023-01-24 20:00:00 168.227992
2023-01-24 20:30:00 163.591096
2023-01-24 21:00:00 158.954200
2023-01-24 21:30:00 153.996761
2023-01-24 22:00:00 149.039322
Any help appreciated!
You want to resample without any transformation, and then do a so-called "forward fill" of the resulting null values.
That's:
result = (
    prices.set_index('DateTime')
          .resample('30T')
          .asfreq()  # no transformation
          .ffill()   # drag previous values down
)
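A quick sanity check on a cut-down version of prices (first three rows only):
import pandas as pd

toy = pd.DataFrame({
    'DateTime': pd.date_range('2022-03-27 23:00', periods=3, freq='H'),
    'PriceAmountGBP': [202.807890, 197.724150, 191.615328],
})

# asfreq() inserts the new half-hour rows as NaN; ffill() repeats the last price
print(toy.set_index('DateTime').resample('30T').asfreq().ffill())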
I have a dataframe that contains some NaT values.
Date Value
6312957 2012-01-01 23:58:00 -49
6312958 2012-01-01 23:59:00 -49
6312959 NaT -48
6312960 2012-01-02 00:01:00 -47
6312961 2012-01-02 00:02:00 -46
I'm trying to replace these NaT values by adding a minute to the previous entry.
indices_of_NAT = np.flatnonzero(pd.isna(df.loc[:, "Date"]))
df.loc[indices_of_NAT, "Date"] = df.loc[indices_of_NAT - 1, "Date"] + pd.Timedelta(minutes=1)
This produces the correct timestamps and indices, which I checked manually. The only problem is that they don't replace the NaT values for whatever reason. I wonder if something goes wrong with the indexing in my last line of code. Is there something obvious I am missing?
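The assignment does nothing because .loc aligns the right-hand Series on its index labels: the values taken from rows indices_of_NAT - 1 keep those labels, which never match the target rows indices_of_NAT, so pandas assigns nothing. A sketch of a fix that strips the labels before assigning (assuming, as in your data, that the first row is not NaT):
import numpy as np
import pandas as pd

indices_of_NAT = np.flatnonzero(pd.isna(df.loc[:, "Date"]))

# .to_numpy() drops the index labels, so the values are assigned by position
df.loc[df.index[indices_of_NAT], "Date"] = (
    df["Date"].iloc[indices_of_NAT - 1].to_numpy() + pd.Timedelta(minutes=1)
)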
You can fillna with the shifted values + 1 min:
df['Date'] = df['Date'].fillna(df['Date'].shift().add(pd.Timedelta('1min')))
Another method is to interpolate. For this you need to temporarily convert the dates to numbers. This way you can fill more than one gap, the increments are calculated automatically, and there are many nice interpolation methods (see the documentation):
df['Date'] = pd.to_datetime(
    pd.to_numeric(df['Date'])
    .mask(df['Date'].isna())
    .interpolate('linear')
)
Example:
Date Value shift interpolate
0 2012-01-01 23:58:00 -49 2012-01-01 23:58:00 2012-01-01 23:58:00
1 2012-01-01 23:59:00 -49 2012-01-01 23:59:00 2012-01-01 23:59:00
2 NaT -48 2012-01-02 00:00:00 2012-01-02 00:00:00
3 2012-01-02 00:01:00 -47 2012-01-02 00:01:00 2012-01-02 00:01:00
4 NaT -48 2012-01-02 00:02:00 2012-01-02 00:01:20
5 NaT -48 NaT 2012-01-02 00:01:40
6 2012-01-02 00:02:00 -46 2012-01-02 00:02:00 2012-01-02 00:02:00
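For reference, a sketch of the setup behind that table, with the data copied from it (the shift/interpolate column names are only there for the comparison):
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2012-01-01 23:58', '2012-01-01 23:59', None,
                            '2012-01-02 00:01', None, None,
                            '2012-01-02 00:02']),
    'Value': [-49, -49, -48, -47, -48, -48, -46],
})

df['shift'] = df['Date'].fillna(df['Date'].shift() + pd.Timedelta('1min'))
df['interpolate'] = pd.to_datetime(
    pd.to_numeric(df['Date']).mask(df['Date'].isna()).interpolate('linear')
)
print(df)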
Use Series.fillna with the shifted values plus 1 minute:
df['Date'] = df['Date'].fillna(df['Date'].shift() + pd.Timedelta(minutes=1))
Or forward fill the missing values and add 1 minute:
df['Date'] = df['Date'].fillna(df['Date'].ffill() + pd.Timedelta(minutes=1))
You can see the difference with different data:
df['Date'] = pd.to_datetime(df['Date'])
df['Date1'] = df['Date'].fillna(df['Date'].shift() + pd.Timedelta(minutes=1))
df['Date2'] = df['Date'].fillna(df['Date'].ffill() + pd.Timedelta(minutes=1))
print(df)
Date Value Date1 Date2
6312957 2012-01-01 23:58:00 -49 2012-01-01 23:58:00 2012-01-01 23:58:00
6312958 2012-01-01 23:59:00 -49 2012-01-01 23:59:00 2012-01-01 23:59:00
6312959 NaT -48 2012-01-02 00:00:00 2012-01-02 00:00:00
6312960 2012-01-02 00:01:00 -47 2012-01-02 00:01:00 2012-01-02 00:01:00
6312961 2012-01-02 00:02:00 -46 2012-01-02 00:02:00 2012-01-02 00:02:00
6312962 NaT -47 2012-01-02 00:03:00 2012-01-02 00:03:00
6312963 NaT -47 NaT 2012-01-02 00:03:00
6312967 2012-01-02 00:01:00 -47 2012-01-02 00:01:00 2012-01-02 00:01:00
In the example dataframe below, how can I convert t_relative into hours? For example, the relative time in the first row would be 49 hours.
tstart tend t_relative
0 2131-05-16 23:00:00 2131-05-19 00:00:00 2 days 01:00:00
1 2131-05-16 23:00:00 2131-05-19 00:15:00 2 days 01:15:00
2 2131-05-16 23:00:00 2131-05-19 00:45:00 2 days 01:45:00
3 2131-05-16 23:00:00 2131-05-19 01:00:00 2 days 02:00:00
4 2131-05-16 23:00:00 2131-05-19 01:15:00 2 days 02:15:00
t_relative was calculated with the operation df['t_relative'] = df['tend'] - df['tstart'].
You can divide by a Timedelta:
df['t_relative'] / pd.Timedelta('1H')
Output:
0 49.00
1 49.25
2 49.75
3 50.00
4 50.25
Name: t_relative, dtype: float64
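Equivalently, Series.dt.total_seconds() makes the unit conversion explicit; a one-liner sketch (the hours column name is just illustrative):
# timedelta -> seconds -> hours; the first row gives 49.0
df['hours'] = df['t_relative'].dt.total_seconds() / 3600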
I have the DataFrame below, and I want to make an hourly mean DataFrame, with the condition that for every hour the mean is calculated from just the 00:15:00~00:45:00 values.
date/time are a MultiIndex.
aaa
date time
2017-01-01 00:00:00 146.88
00:15:00 143.28
00:30:00 143.28
00:45:00 141.12
01:00:00 134.64
01:15:00 132.48
01:30:00 136.80
01:45:00 138.24
02:00:00 131.76
02:15:00 131.04
02:30:00 134.64
02:45:00 139.68
03:00:00 136.08
03:15:00 132.48
03:30:00 132.48
03:45:00 139.68
04:00:00 134.64
04:15:00 131.04
04:30:00 160.56
04:45:00 177.12
...
The results should be as below. How can I do it?
aaa
date time
2017-01-01 00:00:00 146.88
01:00:00 134.64
02:00:00 131.76
03:00:00 136.08
04:00:00 134.64
...
It seems you need only select the rows whose times end with 00:00:
df2 = df1[df1.index.get_level_values(1).astype(str).str.endswith('00:00')]
print(df2)
aaa
date time
2017-01-01 00:00:00 146.88
01:00:00 134.64
02:00:00 131.76
03:00:00 136.08
04:00:00 134.64
But if you need the mean of only the 00:15-00:45 values, it is more complicated:
lvl1 = pd.Series(df1.index.get_level_values(1))
m = ~lvl1.astype(str).str.endswith('00:00')
lvl1new = lvl1.mask(m).ffill()
df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0),
                                       lvl1new.where(m)],
                                      names=df1.index.names)
print(df1)
aaa
date time
2017-01-01 NaN 146.88
00:00:00 143.28
00:00:00 143.28
00:00:00 141.12
NaN 134.64
01:00:00 132.48
01:00:00 136.80
01:00:00 138.24
NaN 131.76
02:00:00 131.04
02:00:00 134.64
02:00:00 139.68
NaN 136.08
03:00:00 132.48
03:00:00 132.48
03:00:00 139.68
NaN 134.64
04:00:00 131.04
04:00:00 160.56
04:00:00 177.12
df = df1['aaa'].groupby(level=[0, 1]).mean()
print(df)
date time
2017-01-01 00:00:00 142.56
01:00:00 135.84
02:00:00 135.12
03:00:00 134.88
04:00:00 156.24
Name: aaa, dtype: float64
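A hedged alternative, starting from the original df1 and assuming the two index levels stringify to a date and a time: collapse the MultiIndex into a single DatetimeIndex, keep only the quarter-hour rows, and take the hourly mean. It should reproduce the result above:
import pandas as pd

# combine the date and time levels into one DatetimeIndex
idx = pd.to_datetime(df1.index.get_level_values(0).astype(str) + ' '
                     + df1.index.get_level_values(1).astype(str))
s = pd.Series(df1['aaa'].to_numpy(), index=idx)

# keep only the :15/:30/:45 rows, then average per hour
print(s[s.index.minute.isin([15, 30, 45])].resample('H').mean())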
I have a DataFrame with data similar to the following
import datetime
from datetime import timedelta

import numpy as np
import pandas as pd

df = pd.DataFrame(index=pd.date_range(start='20160102', end='20170301', freq='5min'))
df['value'] = np.random.randn(df.index.size)
df.index += pd.Series([timedelta(seconds=np.random.randint(-60, 60))
                       for _ in range(df.index.size)])
which looks like this
In[37]: df
Out[37]:
value
2016-01-02 00:00:33 0.546675
2016-01-02 00:04:52 1.080558
2016-01-02 00:10:46 -1.551206
2016-01-02 00:15:52 -1.278845
2016-01-02 00:19:04 -1.672387
2016-01-02 00:25:36 -0.786985
2016-01-02 00:29:35 1.067132
2016-01-02 00:34:36 -0.575365
2016-01-02 00:39:33 0.570341
2016-01-02 00:44:56 -0.636312
...
2017-02-28 23:14:57 -0.027981
2017-02-28 23:19:51 0.883150
2017-02-28 23:24:15 -0.706997
2017-02-28 23:30:09 -0.954630
2017-02-28 23:35:08 -1.184881
2017-02-28 23:40:20 0.104017
2017-02-28 23:44:10 -0.678742
2017-02-28 23:49:15 -0.959857
2017-02-28 23:54:36 -1.157165
2017-02-28 23:59:10 0.527642
Now, I'm aiming to get the mean per 5-minute period over the course of a 24-hour day, without considering which day the values actually come from.
How can I do this effectively? I would like to think I could somehow remove the actual dates from my index and then use something like pd.TimeGrouper, but I haven't figured out how to do so.
My not-so-great solution
My solution so far has been to use between_time in a loop like this, just using an arbitrary day.
aggregates = []
start_time = datetime.datetime(1990, 1, 1, 0, 0, 0)

while start_time < datetime.datetime(1990, 1, 1, 23, 59, 0):
    aggregates.append(
        (
            start_time,
            df.between_time(start_time.time(),
                            (start_time + timedelta(minutes=5)).time(),
                            include_end=False).value.mean()
        )
    )
    start_time += timedelta(minutes=5)

result = pd.DataFrame(aggregates, columns=['time', 'value'])
which works as expected
In[68]: result
Out[68]:
time value
0 1990-01-01 00:00:00 0.032667
1 1990-01-01 00:05:00 0.117288
2 1990-01-01 00:10:00 -0.052447
3 1990-01-01 00:15:00 -0.070428
4 1990-01-01 00:20:00 0.034584
5 1990-01-01 00:25:00 0.042414
6 1990-01-01 00:30:00 0.043388
7 1990-01-01 00:35:00 0.050371
8 1990-01-01 00:40:00 0.022209
9 1990-01-01 00:45:00 -0.035161
.. ... ...
278 1990-01-01 23:10:00 0.073753
279 1990-01-01 23:15:00 -0.005661
280 1990-01-01 23:20:00 -0.074529
281 1990-01-01 23:25:00 -0.083190
282 1990-01-01 23:30:00 -0.036636
283 1990-01-01 23:35:00 0.006767
284 1990-01-01 23:40:00 0.043436
285 1990-01-01 23:45:00 0.011117
286 1990-01-01 23:50:00 0.020737
287 1990-01-01 23:55:00 0.021030
[288 rows x 2 columns]
But this doesn't feel like a very Pandas-friendly solution.
IIUC then the following should work:
In [62]:
df.groupby(df.index.floor('5min').time).mean()
Out[62]:
value
00:00:00 -0.038002
00:05:00 -0.011646
00:10:00 0.010701
00:15:00 0.034699
00:20:00 0.041164
00:25:00 0.151187
00:30:00 -0.006149
00:35:00 -0.008256
00:40:00 0.021389
00:45:00 0.016851
00:50:00 -0.074825
00:55:00 0.012861
01:00:00 0.054048
01:05:00 0.041907
01:10:00 -0.004457
01:15:00 0.052428
01:20:00 -0.021518
01:25:00 -0.019010
01:30:00 0.030887
01:35:00 -0.085415
01:40:00 0.002386
01:45:00 -0.002189
01:50:00 0.049720
01:55:00 0.032292
02:00:00 -0.043642
02:05:00 0.067132
02:10:00 -0.029628
02:15:00 0.064098
02:20:00 0.042731
02:25:00 -0.031113
... ...
21:30:00 -0.018391
21:35:00 0.032155
21:40:00 0.035014
21:45:00 -0.016979
21:50:00 -0.025248
21:55:00 0.027896
22:00:00 -0.117036
22:05:00 -0.017970
22:10:00 -0.008494
22:15:00 -0.065303
22:20:00 -0.014623
22:25:00 0.076994
22:30:00 -0.030935
22:35:00 0.030308
22:40:00 -0.124668
22:45:00 0.064853
22:50:00 0.057913
22:55:00 0.002309
23:00:00 0.083586
23:05:00 -0.031043
23:10:00 -0.049510
23:15:00 0.003520
23:20:00 0.037135
23:25:00 -0.002231
23:30:00 -0.029592
23:35:00 0.040335
23:40:00 -0.021513
23:45:00 0.104421
23:50:00 -0.022280
23:55:00 -0.021283
[288 rows x 1 columns]
Here I floor the index to 5-minute intervals, then group on the time attribute and aggregate the mean.
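If you afterwards want a plottable DatetimeIndex rather than plain time objects, one option is to reuse the arbitrary 1990-01-01 day from the question:
import pandas as pd

means = df.groupby(df.index.floor('5min').time).mean()

# re-attach an arbitrary day; the date itself carries no meaning
means.index = pd.to_datetime(['1990-01-01 ' + str(t) for t in means.index])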