Selecting Data between Specific hours in a pandas dataframe - python

My pandas DataFrame looks something like this:
1. 2013-10-09 09:00:05
2. 2013-10-09 09:05:00
3. 2013-10-09 10:00:00
4. ............
5. ............
6. ............
7. 2013-10-10 09:00:05
8. 2013-10-10 09:05:00
9. 2013-10-10 10:00:00
I want the rows that fall between 9:00 and 10:00 on each day. If anyone has worked on something like this, any help would be appreciated.

In [7]: index = date_range('20131009 08:30','20131010 10:05',freq='5T')
In [8]: df = DataFrame(randn(len(index),2),columns=list('AB'),index=index)
In [9]: df
Out[9]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 308 entries, 2013-10-09 08:30:00 to 2013-10-10 10:05:00
Freq: 5T
Data columns (total 2 columns):
A 308 non-null values
B 308 non-null values
dtypes: float64(2)
In [10]: df.between_time('9:00','10:00')
Out[10]:
A B
2013-10-09 09:00:00 -0.664639 1.597453
2013-10-09 09:05:00 1.197290 -0.500621
2013-10-09 09:10:00 1.470186 -0.963553
2013-10-09 09:15:00 0.181314 -0.242415
2013-10-09 09:20:00 0.969427 -1.156609
2013-10-09 09:25:00 0.261473 0.413926
2013-10-09 09:30:00 -0.003698 0.054953
2013-10-09 09:35:00 0.418147 -0.417291
2013-10-09 09:40:00 0.413565 -1.096234
2013-10-09 09:45:00 0.460293 1.200277
2013-10-09 09:50:00 -0.702444 -0.041597
2013-10-09 09:55:00 0.548385 -0.832382
2013-10-09 10:00:00 -0.526582 0.758378
2013-10-10 09:00:00 0.926738 0.178204
2013-10-10 09:05:00 -1.178534 0.184205
2013-10-10 09:10:00 1.408258 0.948526
2013-10-10 09:15:00 0.523318 0.327390
2013-10-10 09:20:00 -0.193174 0.863294
2013-10-10 09:25:00 1.355610 -2.160864
2013-10-10 09:30:00 1.930622 0.174683
2013-10-10 09:35:00 0.273551 0.870682
2013-10-10 09:40:00 0.974756 -0.327763
2013-10-10 09:45:00 1.808285 0.080267
2013-10-10 09:50:00 0.842119 0.368689
2013-10-10 09:55:00 1.065585 0.802003
2013-10-10 10:00:00 -0.324894 0.781885
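For reference, here is a self-contained sketch of the same idea with explicit imports. It assumes the timestamps might live in an ordinary column rather than in the index; the column name `timestamp` is only an illustration. `between_time` needs a DatetimeIndex, so in that case the column is moved into the index first.

```python
import numpy as np
import pandas as pd

# Same 5-minute grid as in the session above.
index = pd.date_range("2013-10-09 08:30", "2013-10-10 10:05", freq="5min")
df = pd.DataFrame(np.random.randn(len(index), 2), columns=list("AB"), index=index)

# Rows whose time of day lies in 09:00-10:00; both end points are
# included by default.
morning = df.between_time("09:00", "10:00")

# Assumption: the timestamps are stored in a column called "timestamp"
# (hypothetical name). Move it into the index before filtering.
frame = df.rename_axis("timestamp").reset_index()
morning_from_column = frame.set_index("timestamp").between_time("09:00", "10:00")

print(len(morning))
```

Both variants select the same rows; only the location of the timestamps differs.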

Make new columns for the time components by splitting your original column. The code below splits the time into hours, minutes, and seconds:
df[['h','m','s']] = df['Time'].astype(str).str.split(':', expand=True).astype(int)
Once that is done, filter on the hour column:
df9to10 = df[df['h'].between(9, 10)]
between is inclusive on both ends by default, so this keeps every row whose hour is 9 or 10 (that is, times up to 10:59:59). The bounds can be changed to select any other period.

Another method uses query. Tested with Python 3.9.
import pandas as pd
from pandas import Timestamp
from datetime import time
df = pd.DataFrame({"timestamp":
[Timestamp("2017-01-03 09:30:00.049"), Timestamp("2017-01-03 09:30:00.049"),
Timestamp("2017-12-29 16:12:34.214"), Timestamp("2017-12-29 16:17:19.006")]})
df["time"] = df.timestamp.dt.time
start_time = time(9,20,0)
end_time = time(10,0,0)
df_times = df.query("time >= @start_time and time <= @end_time")
In:
timestamp
2017-01-03 09:30:00.049
2017-01-03 09:30:00.049
2017-12-29 16:12:34.214
2017-12-29 16:17:19.006
Out:
timestamp time
2017-01-03 09:30:00.049 09:30:00.049000
2017-01-03 09:30:00.049 09:30:00.049000
As a bonus, arbitrarily complex expressions can be used within a query, e.g. selecting everything within two separate time ranges (this is impossible with between_time).
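As a rough sketch of the "two separate time ranges" case mentioned above: the second window (16:15 to 17:00) and the variable names are made up for illustration. `engine="python"` is passed explicitly here as a cautious choice, since the `time` column holds plain `datetime.time` objects.

```python
from datetime import time

import pandas as pd

df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2017-01-03 09:30:00.049", "2017-01-03 09:30:00.049",
    "2017-12-29 16:12:34.214", "2017-12-29 16:17:19.006",
])})
df["time"] = df["timestamp"].dt.time

# First window as in the answer above; the second one is hypothetical.
morning_start, morning_end = time(9, 20), time(10, 0)
late_start, late_end = time(16, 15), time(17, 0)

# Local variables are referenced with "@" inside the query string.
two_ranges = df.query(
    "(time >= @morning_start and time <= @morning_end)"
    " or (time >= @late_start and time <= @late_end)",
    engine="python",
)
print(two_ranges)
```

With this sample data the two 09:30 rows and the 16:17 row are kept, while the 16:12 row falls outside both windows.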

Assuming your original dataframe is called "df" and your time column is called "time" this would work: (where start_time and end_time correspond to the time interval that you'd like)
>>> df_new = df[(df['time'] > start_time) & (df['time'] < end_time)]

Related

How do I subtract two time columns from each other in Python?

I have two columns, Start and HT, both of object dtype.
The output I need is (HT - Start) in minutes.
I tried to convert them to datetime with pd.to_datetime, but it raises an error:
TypeError: <class 'datetime.time'> is not convertible to datetime
Start HT
09:30:00 09:40:00
09:30:00 09:36:00
09:30:00 09:50:00
09:30:00 10:36:00
Expected output:
Start HT diff(in minutes)
09:30:00 09:40:00 10
09:30:00 09:36:00 6
09:30:00 09:50:00 20
09:30:00 10:36:00 66
Please help.
You should first convert the values using pd.to_datetime():
df['Start'] = pd.to_datetime(df['Start'], format='%H:%M:%S').dt.time.apply(str)
df['HT'] = pd.to_datetime(df['HT'], format='%H:%M:%S').dt.time.apply(str)
df['diff(in minutes)'] = (pd.to_timedelta(df['HT']) - pd.to_timedelta(df['Start'])).dt.total_seconds() / 60
print(df)
You can simplify the above code using pd.to_timedelta()
df['Start'] = pd.to_timedelta(df['Start'])
df['HT'] = pd.to_timedelta(df['HT'])
df['diff(in minutes)'] = (df['HT'] - df['Start']).dt.total_seconds() / 60
print(df)
Start HT diff(in minutes)
0 09:30:00 09:40:00 10.0
1 09:30:00 09:36:00 6.0
2 09:30:00 09:50:00 20.0
3 09:30:00 10:36:00 66.0
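The TypeError in the question suggests the columns hold `datetime.time` objects rather than strings. A minimal sketch under that assumption: converting the values to strings first gives `pd.to_timedelta` something it can parse. The sample frame below is rebuilt from the question's table.

```python
from datetime import time

import pandas as pd

# Assumption: both columns contain datetime.time objects (object dtype).
df = pd.DataFrame({
    "Start": [time(9, 30)] * 4,
    "HT": [time(9, 40), time(9, 36), time(9, 50), time(10, 36)],
})

# str(time(9, 30)) is "09:30:00", which to_timedelta can parse.
start = pd.to_timedelta(df["Start"].astype(str))
ht = pd.to_timedelta(df["HT"].astype(str))

df["diff(in minutes)"] = (ht - start).dt.total_seconds() / 60
print(df)
```

The original columns are left untouched here, so they keep displaying as plain times.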

Compare two dataframes and keep a specific datetime range of another

I have two dataframes with timestamps. I want to select the timestamps from df1 that equal the timestamps 'start_show' of df2 but also keep all the timestamps of df1 2 hours before and 2 hours after (of df1) where the timestamps are equal.
df1:
van_timestamp weekdag
2880 2016-11-19 00:00:00 6
2881 2016-11-19 00:15:00 6
2882 2016-11-19 00:30:00 6
... ... ...
822349 2019-11-06 22:45:00 3
822350 2019-11-06 23:00:00 3
822351 2019-11-06 23:15:00 3
df2:
einde_show start_show
255 2016-01-16 22:00:00 2016-01-16 20:00:00
256 2016-01-23 21:30:00 2016-01-23 19:45:00
257 2016-01-26 21:30:00 2016-01-26 19:45:00
... ... ...
1111 2019-12-29 18:30:00 2019-12-29 17:00:00
1112 2019-12-30 15:00:00 2019-12-30 13:30:00
1113 2019-12-30 18:30:00 2019-12-30 17:00:00
df1 contains a timestamp every 15 minutes of every day whereas df2['start_show'] contains just a single timestamp per day.
So ultimately what I want to achieve is that for every timestamp of df2 I have the corresponding timestamp of df1 +- 2 hours.
So far I've tried:
df1['van_timestamp'][df1['van_timestamp'].isin(df2['start_show'])]
This selects the right timestamps. Now I want to select everything from df1 in the range of
+ pd.Timedelta(2, unit='h')
- pd.Timedelta(2, unit='h')
But I'm not sure how to go about this. Help would be much appreciated!
Thanks!
I got it working (ugly fix). I created a datetime range
dates = [pd.date_range(start = df2['start_show'].iloc[i] - pd.Timedelta(2, unit='h'), end = df2['start_show'].iloc[i], freq = '15T') for i in range(len(df2))]
Which I then unlisted:
dates = [i for sublist in dates for i in sublist]
Afterwards I compared the dataframe with this list.
relevant_timestamps = df1[df1['van_timestamp'].isin(dates)]
If anyone else has a better solution, please let me know!
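One possible alternative, sketched on a tiny made-up sample: build a boolean mask by checking each `start_show` against a window of two hours on both sides. The sample frames below are invented for illustration; only the column names come from the question. This loops over the shows, which should be fine for roughly one show per day but is not tuned for very large `df2`.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
df1 = pd.DataFrame({
    "van_timestamp": pd.date_range("2016-11-19 00:00", "2016-11-19 23:45", freq="15min")
})
df2 = pd.DataFrame({"start_show": pd.to_datetime(["2016-11-19 20:00:00"])})

window = pd.Timedelta(hours=2)

# Start with "keep nothing", then switch on every row that lies within
# +/- 2 hours of any show start (between() includes both end points).
mask = pd.Series(False, index=df1.index)
for start in df2["start_show"]:
    mask |= df1["van_timestamp"].between(start - window, start + window)

relevant_timestamps = df1[mask]
print(relevant_timestamps)
```

Unlike the isin approach, this does not require the timestamps in df1 to sit exactly on a 15-minute grid relative to the show start.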

How to handle end of time series in pandas resample when upsampling?

I want to resample from hours to half-hours. I use .ffill() in the example, but I've tested .asfreq() as an intermediate step too.
The goal is to get intervals of half hours where the hourly values are spread among the upsampled intervals, and I'm trying to find a general solution for any ranges with the same problem.
import pandas as pd
index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='H')
hourly = pd.Series(range(10, len(index)+10), index=index)
half_hourly = hourly.resample('30min').ffill() / 2
The hourly series looks like:
2018-10-10 00:00:00 10
2018-10-10 01:00:00 11
2018-10-10 02:00:00 12
Freq: H, dtype: int64
And the half_hourly:
2018-10-10 00:00:00 5.0
2018-10-10 00:30:00 5.0
2018-10-10 01:00:00 5.5
2018-10-10 01:30:00 5.5
2018-10-10 02:00:00 6.0
Freq: 30T, dtype: float64
The problem with the last one is that there is no row for representing 02:30:00
I want to achieve something that is:
2018-10-10 00:00:00 5.0
2018-10-10 00:30:00 5.0
2018-10-10 01:00:00 5.5
2018-10-10 01:30:00 5.5
2018-10-10 02:00:00 6.0
2018-10-10 02:30:00 6.0
Freq: 30T, dtype: float64
I understand that the hourly series ends at 02:00, so there is no reason to expect pandas to insert the last half hour by default. However, after reading a lot of deprecated/old posts, some newer ones, the documentation, and the cookbook, I still wasn't able to find a straightforward solution.
Lastly, I've also tested the use of .mean(), but that didn't fill the NaNs. And interpolate() didn't average by hour as I wanted it to.
My .ffill() / 2 almost works as a way to spread hour to half hours in this case, but it seems like a hack to a problem that I expect pandas already provides a better solution to.
Thanks in advance.
Your precise issue can be solved like this
>>> import pandas as pd
>>> index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='H')
>>> hourly = pd.Series(range(10, len(index)+10), index=index)
>>> hourly.reindex(index.union(index.shift(freq='30min'))).ffill() / 2
2018-10-10 00:00:00 5.0
2018-10-10 00:30:00 5.0
2018-10-10 01:00:00 5.5
2018-10-10 01:30:00 5.5
2018-10-10 02:00:00 6.0
2018-10-10 02:30:00 6.0
Freq: 30T, dtype: float64
I suspect that this is a minimal example, so I will try to solve it more generally as well. Let's say you have multiple points to fill in each day:
>>> import pandas as pd
>>> x = pd.Series([1.5, 2.5], pd.DatetimeIndex(['2018-09-21', '2018-09-22']))
>>> x.resample('6h').ffill()
2018-09-21 00:00:00 1.5
2018-09-21 06:00:00 1.5
2018-09-21 12:00:00 1.5
2018-09-21 18:00:00 1.5
2018-09-22 00:00:00 2.5
Freq: 6H, dtype: float64
A similar trick includes 6am, 12pm, and 6pm on 2018-09-22 as well.
Re-index with a shift equal to the span that should be covered after the last point. In this case the shift is one extra day:
>>> import pandas as pd
>>> x = pd.Series([1.5, 2.5], pd.DatetimeIndex(['2018-09-21', '2018-09-22']))
>>> res = x.reindex(x.index.union(x.index.shift(freq='1D'))).resample('6h').ffill()
>>> res[:res.last_valid_index()] # drop the start of next day
2018-09-21 00:00:00 1.5
2018-09-21 06:00:00 1.5
2018-09-21 12:00:00 1.5
2018-09-21 18:00:00 1.5
2018-09-22 00:00:00 2.5
2018-09-22 06:00:00 2.5
2018-09-22 12:00:00 2.5
2018-09-22 18:00:00 2.5
Freq: 6H, dtype: float64
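A different way to get the same result, sketched here as an alternative rather than as the method used above: build the target index explicitly with `pd.date_range`, extending it to just before the end of the last original interval, and forward-fill while reindexing. The names `old_step`, `new_step`, and `target` are illustrative.

```python
import pandas as pd

old_step = pd.Timedelta(hours=1)     # spacing of the original series
new_step = pd.Timedelta(minutes=30)  # spacing wanted after upsampling

index = pd.date_range("2018-10-10 00:00", "2018-10-10 02:00", freq=old_step)
hourly = pd.Series(range(10, len(index) + 10), index=index)

# The last hourly value covers 02:00-03:00, so the last half-hour label
# that still belongs to it is 02:30.
target = pd.date_range(
    hourly.index[0], hourly.index[-1] + old_step - new_step, freq=new_step
)

# Forward-fill onto the finer grid, then split each hourly value evenly.
half_hourly = hourly.reindex(target, method="ffill") / (old_step / new_step)
print(half_hourly)
```

Because the steps are passed as Timedelta objects, the same lines work for other regular spacings by changing the two step values.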

How to resample a df with datetime index to exactly n equally sized periods?

I've got a large dataframe with a datetime index and need to resample data to exactly 10 equally sized periods.
So far, I've tried finding the first and last dates to determine the total number of days in the data, divide that by 10 to determine the size of each period, then resample using that number of days. eg:
first = df.reset_index().timesubmit.min()
last = df.reset_index().timesubmit.max()
periodsize = str((last-first).days/10) + 'D'
df.resample(periodsize,how='sum')
This doesn't guarantee exactly 10 periods in the df after resampling since the periodsize is a rounded down int. Using a float doesn't work in the resampling. Seems that either there's something simple that I'm missing here, or I'm attacking the problem all wrong.
import numpy as np
import pandas as pd
n = 10
nrows = 33
index = pd.date_range('2000-1-1', periods=nrows, freq='D')
df = pd.DataFrame(np.ones(nrows), index=index)
print(df)
# 0
# 2000-01-01 1
# 2000-01-02 1
# ...
# 2000-02-01 1
# 2000-02-02 1
first = df.index.min()
last = df.index.max() + pd.Timedelta('1D')
secs = int((last-first).total_seconds()//n)
periodsize = '{:d}S'.format(secs)
result = df.resample(periodsize).sum()
print('\n{}'.format(result))
assert len(result) == n
yields
0
2000-01-01 00:00:00 4
2000-01-04 07:12:00 3
2000-01-07 14:24:00 3
2000-01-10 21:36:00 4
2000-01-14 04:48:00 3
2000-01-17 12:00:00 3
2000-01-20 19:12:00 4
2000-01-24 02:24:00 3
2000-01-27 09:36:00 3
2000-01-30 16:48:00 3
The values in the 0-column indicate the number of rows that were aggregated, since the original DataFrame was filled with values of 1. The pattern of 4's and 3's is about as even as you can get since 33 rows can not be evenly grouped into 10 groups.
Explanation: Consider this simpler DataFrame:
n = 2
nrows = 5
index = pd.date_range('2000-1-1', periods=nrows, freq='D')
df = pd.DataFrame(np.ones(nrows), index=index)
# 0
# 2000-01-01 1
# 2000-01-02 1
# 2000-01-03 1
# 2000-01-04 1
# 2000-01-05 1
Using df.resample('2D').sum() gives the wrong number of groups (three instead of two):
In [366]: df.resample('2D').sum()
Out[366]:
0
2000-01-01 2
2000-01-03 2
2000-01-05 1
Using df.resample('3D').sum() gives the right number of groups, but the
second group starts at 2000-01-04, which does not divide the DataFrame
into two equally spaced groups:
In [367]: df.resample('3D').sum()
Out[367]:
0
2000-01-01 3
2000-01-04 2
To do better, we need to work at a finer time resolution than in days. Since Timedeltas have a total_seconds method, let's work in seconds. So for the example above, the desired frequency string would be
In [374]: df.resample('216000S').sum()
Out[374]:
0
2000-01-01 00:00:00 3
2000-01-03 12:00:00 2
since there are 216000*2 seconds in 5 days:
In [373]: (pd.Timedelta(days=5) / pd.Timedelta('1S'))/2
Out[373]: 216000.0
Okay, so now all we need is a way to generalize this. We'll want the minimum and maximum dates in the index:
first = df.index.min()
last = df.index.max() + pd.Timedelta('1D')
We add an extra day because it makes the difference in days come out right. In
the example above, there are only 4 days between the Timestamps for 2000-01-05
and 2000-01-01,
In [377]: (pd.Timestamp('2000-01-05')-pd.Timestamp('2000-01-01')).days
Out[378]: 4
But as we can see in the worked example, the DataFrame has 5 rows representing 5
days. So it makes sense that we need to add an extra day.
Now we can compute the correct number of seconds in each equally-spaced group with:
secs = int((last-first).total_seconds()//n)
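The steps above can be collected into a small helper. This is only a sketch: the function name `resample_sum_n` and the `step` parameter are invented here, the bin width is passed as a `pd.Timedelta` instead of a frequency string, and the aggregation is called as a method. It assumes a regularly spaced index whose first timestamp falls on midnight, as in the examples; with other start times the bin edges may be anchored differently.

```python
import numpy as np
import pandas as pd


def resample_sum_n(df, n, step=pd.Timedelta(days=1)):
    """Sum df into n equally long periods (hypothetical helper).

    step is the spacing between rows; it is added to the last timestamp
    so that the final row's own interval is counted.
    """
    first = df.index.min()
    last = df.index.max() + step
    secs = int((last - first).total_seconds() // n)
    return df.resample(pd.Timedelta(seconds=secs)).sum()


nrows = 33
index = pd.date_range("2000-01-01", periods=nrows, freq="D")
df = pd.DataFrame(np.ones(nrows), index=index)

result = resample_sum_n(df, 10)
print(result)
```

For the 33-row example this reproduces the pattern of 4's and 3's shown earlier.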
Here is one way to ensure equal-size sub-periods by using np.linspace() on pd.Timedelta and then classifying each obs into different bins using pd.cut.
import pandas as pd
import numpy as np
# generate artificial data
np.random.seed(0)
df = pd.DataFrame(np.random.randn(100, 2), columns=['A', 'B'], index=pd.date_range('2015-01-01 00:00:00', periods=100, freq='8H'))
Out[87]:
A B
2015-01-01 00:00:00 1.7641 0.4002
2015-01-01 08:00:00 0.9787 2.2409
2015-01-01 16:00:00 1.8676 -0.9773
2015-01-02 00:00:00 0.9501 -0.1514
2015-01-02 08:00:00 -0.1032 0.4106
2015-01-02 16:00:00 0.1440 1.4543
2015-01-03 00:00:00 0.7610 0.1217
2015-01-03 08:00:00 0.4439 0.3337
2015-01-03 16:00:00 1.4941 -0.2052
2015-01-04 00:00:00 0.3131 -0.8541
2015-01-04 08:00:00 -2.5530 0.6536
2015-01-04 16:00:00 0.8644 -0.7422
2015-01-05 00:00:00 2.2698 -1.4544
2015-01-05 08:00:00 0.0458 -0.1872
2015-01-05 16:00:00 1.5328 1.4694
... ... ...
2015-01-29 08:00:00 0.9209 0.3187
2015-01-29 16:00:00 0.8568 -0.6510
2015-01-30 00:00:00 -1.0342 0.6816
2015-01-30 08:00:00 -0.8034 -0.6895
2015-01-30 16:00:00 -0.4555 0.0175
2015-01-31 00:00:00 -0.3540 -1.3750
2015-01-31 08:00:00 -0.6436 -2.2234
2015-01-31 16:00:00 0.6252 -1.6021
2015-02-01 00:00:00 -1.1044 0.0522
2015-02-01 08:00:00 -0.7396 1.5430
2015-02-01 16:00:00 -1.2929 0.2671
2015-02-02 00:00:00 -0.0393 -1.1681
2015-02-02 08:00:00 0.5233 -0.1715
2015-02-02 16:00:00 0.7718 0.8235
2015-02-03 00:00:00 2.1632 1.3365
[100 rows x 2 columns]
# cutoff points, 10 equal-size group requires 11 points
# measured by timedelta 1 hour
time_delta_in_hours = (df.index - df.index[0]) / pd.Timedelta('1h')
n = 10
ts_cutoff = np.linspace(0, time_delta_in_hours[-1], n+1)
# labels, time index
time_index = df.index[0] + np.array([pd.Timedelta(str(time_delta)+'h') for time_delta in ts_cutoff])
# create a categorical reference variables
df['start_time_index'] = pd.cut(time_delta_in_hours, bins=10, labels=time_index[:-1])
# for clarity, reassign labels using end-period index
df['end_time_index'] = pd.cut(time_delta_in_hours, bins=10, labels=time_index[1:])
Out[89]:
A B start_time_index end_time_index
2015-01-01 00:00:00 1.7641 0.4002 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-01 08:00:00 0.9787 2.2409 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-01 16:00:00 1.8676 -0.9773 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-02 00:00:00 0.9501 -0.1514 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-02 08:00:00 -0.1032 0.4106 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-02 16:00:00 0.1440 1.4543 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-03 00:00:00 0.7610 0.1217 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-03 08:00:00 0.4439 0.3337 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-03 16:00:00 1.4941 -0.2052 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-04 00:00:00 0.3131 -0.8541 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-04 08:00:00 -2.5530 0.6536 2015-01-04 07:12:00 2015-01-07 14:24:00
2015-01-04 16:00:00 0.8644 -0.7422 2015-01-04 07:12:00 2015-01-07 14:24:00
2015-01-05 00:00:00 2.2698 -1.4544 2015-01-04 07:12:00 2015-01-07 14:24:00
2015-01-05 08:00:00 0.0458 -0.1872 2015-01-04 07:12:00 2015-01-07 14:24:00
2015-01-05 16:00:00 1.5328 1.4694 2015-01-04 07:12:00 2015-01-07 14:24:00
... ... ... ... ...
2015-01-29 08:00:00 0.9209 0.3187 2015-01-27 09:36:00 2015-01-30 16:48:00
2015-01-29 16:00:00 0.8568 -0.6510 2015-01-27 09:36:00 2015-01-30 16:48:00
2015-01-30 00:00:00 -1.0342 0.6816 2015-01-27 09:36:00 2015-01-30 16:48:00
2015-01-30 08:00:00 -0.8034 -0.6895 2015-01-27 09:36:00 2015-01-30 16:48:00
2015-01-30 16:00:00 -0.4555 0.0175 2015-01-27 09:36:00 2015-01-30 16:48:00
2015-01-31 00:00:00 -0.3540 -1.3750 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-01-31 08:00:00 -0.6436 -2.2234 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-01-31 16:00:00 0.6252 -1.6021 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-01 00:00:00 -1.1044 0.0522 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-01 08:00:00 -0.7396 1.5430 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-01 16:00:00 -1.2929 0.2671 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-02 00:00:00 -0.0393 -1.1681 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-02 08:00:00 0.5233 -0.1715 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-02 16:00:00 0.7718 0.8235 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-03 00:00:00 2.1632 1.3365 2015-01-30 16:48:00 2015-02-03 00:00:00
[100 rows x 4 columns]
df.groupby('start_time_index').agg('sum')
Out[90]:
A B
start_time_index
2015-01-01 00:00:00 8.6133 2.7734
2015-01-04 07:12:00 1.9220 -0.8069
2015-01-07 14:24:00 -8.1334 0.2318
2015-01-10 21:36:00 -2.7572 -4.2862
2015-01-14 04:48:00 1.1957 7.2285
2015-01-17 12:00:00 3.2485 6.6841
2015-01-20 19:12:00 -0.8903 2.2802
2015-01-24 02:24:00 -2.1025 1.3800
2015-01-27 09:36:00 -1.1017 1.3108
2015-01-30 16:48:00 -0.0902 -2.5178
Another, potentially shorter, way to do this is to pass the time delta directly as the resampling frequency. The problem, as shown below, is that it delivers 11 sub-samples instead of 10. I believe the reason is that resample uses a left-inclusive/right-exclusive (or left-exclusive/right-inclusive) binning scheme, so the very last observation at '2015-02-03 00:00:00' ends up in a group of its own. If we use pd.cut to do it ourselves, we can specify include_lowest=True so that it gives us exactly 10 sub-samples rather than 11.
n = 10
time_delta_str = str((df.index[-1] - df.index[0]) / (pd.Timedelta('1s') * n)) + 's'
df.resample(pd.Timedelta(time_delta_str)).sum()
Out[114]:
A B
2015-01-01 00:00:00 8.6133 2.7734
2015-01-04 07:12:00 1.9220 -0.8069
2015-01-07 14:24:00 -8.1334 0.2318
2015-01-10 21:36:00 -2.7572 -4.2862
2015-01-14 04:48:00 1.1957 7.2285
2015-01-17 12:00:00 3.2485 6.6841
2015-01-20 19:12:00 -0.8903 2.2802
2015-01-24 02:24:00 -2.1025 1.3800
2015-01-27 09:36:00 -1.1017 1.3108
2015-01-30 16:48:00 -2.2534 -3.8543
2015-02-03 00:00:00 2.1632 1.3365

averaging every five minutes data as one datapoint in pandas dataframe

I have a Dataframe in Pandas like this
1. 2013-10-09 09:00:05
2. 2013-10-09 09:01:00
3. 2013-10-09 09:02:00
4. ............
5. ............
6. ............
7. 2013-10-10 09:15:05
8. 2013-10-10 09:16:00
9. 2013-10-10 09:17:00
I would like to reduce the size of the DataFrame by averaging every 5 minutes of data into one data point, like this:
1. 2013-10-09 09:05:00
2. 2013-10-09 09:10:00
3. 2013-10-09 09:15:00
Can someone help me with this ??
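You may want to look at resample:
df['Data'].resample('5Min').mean()
In old pandas versions the aggregation was passed as how='mean', which was also the default, so a bare resample call returned the means. Current versions require calling the aggregation method explicitly:
df['Data'].resample('5Min').mean()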
For example:
>>> rng = pd.date_range('1/1/2012', periods=10, freq='Min')
>>> df = pd.DataFrame({'Data':np.random.randint(0, 500, len(rng))}, index=rng)
>>> df
Data
2012-01-01 00:00:00 488
2012-01-01 00:01:00 172
2012-01-01 00:02:00 276
2012-01-01 00:03:00 5
2012-01-01 00:04:00 233
2012-01-01 00:05:00 266
2012-01-01 00:06:00 103
2012-01-01 00:07:00 40
2012-01-01 00:08:00 274
2012-01-01 00:09:00 494
>>>
>>> df['Data'].resample('5Min').mean()
2012-01-01 00:00:00 234.8
2012-01-01 00:05:00 235.4
You can find more examples here.
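A small deterministic sketch of the same call, using made-up values 0 to 9 so the means are easy to check by hand. The question labels each averaged point with the end of its 5-minute window (09:05, 09:10, ...); passing `label="right"` is one way to get that labelling, shown here as an assumption about what the asker wants.

```python
import pandas as pd

rng = pd.date_range("2012-01-01", periods=10, freq="min")
df = pd.DataFrame({"Data": range(10)}, index=rng)

# Default: each 5-minute bin is labelled with its start time.
five_min_mean = df["Data"].resample("5min").mean()

# Same bins, but labelled with the end of each interval instead.
labelled_by_end = df["Data"].resample("5min", label="right").mean()

print(five_min_mean)
print(labelled_by_end)
```

The averaged values are identical in both results; only the index labels differ.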
