How to restrict time difference to same day?

How to restrict time difference to same day? - python

I have a dataframe like as shown below
df1 = pd.DataFrame({
'subject_id':[1,1,1,1,1,1,1,1,1,1,1],
'time_1' :['2173-04-03 12:35:00','2173-04-03 12:50:00','2173-04-03
12:59:00','2173-04-03 13:14:00','2173-04-03 13:37:00','2173-04-04
11:30:00','2173-04-05 16:00:00','2173-04-05 22:00:00','2173-04-06
04:00:00','2173-04-06 04:30:00','2173-04-06 08:00:00']
})
I would like to create another column called tdiff to calculate the time difference
This is what I tried
df1['time_1'] = pd.to_datetime(df1['time_1'])
df['time_2'] = df['time_1'].shift(-1)
df['tdiff'] = (df['time_2'] - df['time_1']).dt.total_seconds() / 3600
But this produces an output like as shown below. As you can see, it subtracts from the next date. Instead I would like to restrict the time difference only to the same day. Ex: if Jan 15th 20:00:00 PM is the last record for that day, then I expect the tdiff to be 4:00:00 (24:00:00: - 20:00:00)
I understand it is happening because I am shifting the values of time to subtract and it's obvious that the highlighted rows are picking records from next date. But is there a way to avoid this but calculate the time difference between records in a same day?
I expect my output to be like this. Here NaN should be replaced by the current date (23:59:00). if you check the difference, you will get an idea
Is there any existing method or pandas function that can help us do this datewise timedelta? How can I shift the values datewise?

IIUC, you can use:
s=pd.to_timedelta(24,unit='h')-(df1.time_1-df1.time_1.dt.normalize())
df1['tdiff']=df1.groupby(df1.time_1.dt.date).time_1.diff().shift(-1).fillna(s)
#df1.groupby(df1.time_1.dt.date).time_1.diff().shift(-1).fillna(s).dt.total_seconds()/3600
subject_id time_1 tdiff
0 1 2173-04-03 12:35:00 00:15:00
1 1 2173-04-03 12:50:00 00:09:00
2 1 2173-04-03 12:59:00 00:15:00
3 1 2173-04-03 13:14:00 00:23:00
4 1 2173-04-03 13:37:00 10:23:00
5 1 2173-04-04 11:30:00 12:30:00
6 1 2173-04-05 16:00:00 06:00:00
7 1 2173-04-05 22:00:00 02:00:00
8 1 2173-04-06 04:00:00 00:30:00
9 1 2173-04-06 04:30:00 03:30:00
10 1 2173-04-06 08:00:00 16:00:00

you could use df.where and df.dt.ceil to decide if to subtract from time_2 or from midnight of time_1:
sameDayOrMidnight = df.time_2.where(df.time_1.dt.date==df.time_2.dt.date, df.time_1.dt.ceil(freq='1d'))
df['tdiff'] = (sameDayOrMidnight - df.time_1).dt.total_seconds() / 3600
result:
subject_id time_1 time_2 tdiff
0 1 2173-04-03 12:35:00 2173-04-03 12:50:00 0.250000
1 1 2173-04-03 12:50:00 2173-04-03 12:59:00 0.150000
2 1 2173-04-03 12:59:00 2173-04-03 13:14:00 0.250000
3 1 2173-04-03 13:14:00 2173-04-03 13:37:00 0.383333
4 1 2173-04-03 13:37:00 2173-04-04 11:30:00 10.383333
5 1 2173-04-04 11:30:00 2173-04-05 16:00:00 12.500000
6 1 2173-04-05 16:00:00 2173-04-05 22:00:00 6.000000
7 1 2173-04-05 22:00:00 2173-04-06 04:00:00 2.000000
8 1 2173-04-06 04:00:00 2173-04-06 04:30:00 0.500000
9 1 2173-04-06 04:30:00 2173-04-06 08:00:00 3.500000
10 1 2173-04-06 08:00:00 NaT 16.000000

Related

From hours to String

I have this df:
Index Dates
0 2017-01-01 23:30:00
1 2017-01-12 22:30:00
2 2017-01-20 13:35:00
3 2017-01-21 14:25:00
4 2017-01-28 22:30:00
5 2017-08-01 13:00:00
6 2017-09-26 09:39:00
7 2017-10-08 06:40:00
8 2017-10-04 07:30:00
9 2017-12-13 07:40:00
10 2017-12-31 14:55:00
The purpose was that between the time ranges 5:00 to 11:59 a new df would be created with data that would say: morning. To achieve this I converted those hours to booleans:
hour_morning=(pd.to_datetime(df['Dates']).dt.strftime('%H:%M:%S').between('05:00:00','11:59:00'))
and then passed them to a list with "morning" str
text_morning=[str('morning') for x in hour_morning if x==True]
I have the error in the last line because it only returns ´morning´ string values, it is as if the 'X' ignored the 'if' condition. Why is this happening and how do i fix it?

Do
text_morning=[str('morning') if x==True else 'not_morning' for x in hour_morning ]
You can also use np.where:
text_morning = np.where(hour_morning, 'morning', 'not morning')

Given:
Dates values
0 2017-01-01 23:30:00 0
1 2017-01-12 22:30:00 1
2 2017-01-20 13:35:00 2
3 2017-01-21 14:25:00 3
4 2017-01-28 22:30:00 4
5 2017-08-01 13:00:00 5
6 2017-09-26 09:39:00 6
7 2017-10-08 06:40:00 7
8 2017-10-04 07:30:00 8
9 2017-12-13 07:40:00 9
10 2017-12-31 14:55:00 10
Doing:
# df.Dates = pd.to_datetime(df.Dates)
df = df.set_index("Dates")
Now we can use pd.DataFrame.between_time:
new_df = df.between_time('05:00:00','11:59:00')
print(new_df)
Output:
values
Dates
2017-09-26 09:39:00 6
2017-10-08 06:40:00 7
2017-10-04 07:30:00 8
2017-12-13 07:40:00 9
Or use it to update the original dataframe:
df.loc[df.between_time('05:00:00','11:59:00').index, 'morning'] = 'morning'
# Output:
values morning
Dates
2017-01-01 23:30:00 0 NaN
2017-01-12 22:30:00 1 NaN
2017-01-20 13:35:00 2 NaN
2017-01-21 14:25:00 3 NaN
2017-01-28 22:30:00 4 NaN
2017-08-01 13:00:00 5 NaN
2017-09-26 09:39:00 6 morning
2017-10-08 06:40:00 7 morning
2017-10-04 07:30:00 8 morning
2017-12-13 07:40:00 9 morning
2017-12-31 14:55:00 10 NaN

How to create a set rows for every group depending on the result after the operation in the groupby() Pandas?

I have a situation where my pandas dataframe df looks like below:
stay_id starttime charttime dd day delta total cum_uo
0 30578301 2154-03-14 00:30:00 2154-03-13 13:00:00 0 days 11:30:00 0 0 0 90.0
1 30578301 2154-03-14 00:30:00 2154-03-13 14:00:00 0 days 10:30:00 0 1 1 215.0
2 30578301 2154-03-14 00:30:00 2154-03-13 15:00:00 0 days 09:30:00 0 1 2 325.0
3 30578301 2154-03-14 00:30:00 2154-03-13 16:00:00 0 days 08:30:00 0 1 3 370.0
4 30578301 2154-03-14 00:30:00 2154-03-13 17:00:00 0 days 07:30:00 0 1 4 425.0
5 30578301 2154-03-14 00:30:00 2154-03-13 18:00:00 0 days 06:30:00 0 1 5 490.0
6 30578301 2154-03-14 00:30:00 2154-03-13 19:00:00 0 days 05:30:00 0 1 6 540.0
7 30578301 2154-03-14 00:30:00 2154-03-13 20:00:00 0 days 04:30:00 0 1 7 615.0
8 30578301 2154-03-14 00:30:00 2154-03-13 21:00:00 0 days 03:30:00 0 1 8 660.0
9 30578301 2154-03-14 00:30:00 2154-03-13 22:00:00 0 days 02:30:00 0 1 9 710.0
10 30578301 2154-03-14 00:30:00 2154-03-13 23:00:00 0 days 01:30:00 0 1 10 740.0
11 30578301 2154-03-14 00:30:00 2154-03-14 00:00:00 0 days 00:30:00 0 1 11 780.0
12 30578301 2154-03-14 00:30:00 2154-03-14 01:00:00 -1 days+23:30:00 -1 1 12 905.0
13 30578301 2154-03-14 00:30:00 2154-03-14 02:00:00 -1 days+22:30:00 -1 1 13 1255.0
The frame I am showing above is grouped by stay_id, starttime.
I want to add the hour from "total" column as a new column for minimum value of delta=0 days 00:30:00, for df["day"]==0".
What I tried:
def helper(rows):
val = rows[rows["day"]==0]["dd"].min()
d_hour_array =rows[rows["dd"]==val]["total"].values[0]
return d_hour_array
df.groupby(['stay_id', 'starttime']).apply(helper)
The result I am getting
stay_id starttime
30578301 2154-03-14 00:30:00 11
2154-03-14 05:11:00 16
2154-03-14 09:41:00 20
2154-03-14 19:05:00 29
2154-03-15 09:59:00 44
2154-03-15 20:58:00 55
dtype: int64
How to add the values as new column 11, 16, 20, 29, 44, 45 from the result to each group after `groupby(['stay_id', 'starttime'])?
(like creating df["d_hour"] for each group).
Expected Output:
stay_id starttime charttime dd day delta total cum_uo d_hour
30578301 2154-03-14 00:30:00 2154-03-13 13:00:00 0 days 11:30:00 0 0 0 90.0 11
2154-03-14 00:30:00 2154-03-13 14:00:00 0 days 10:30:00 0 1 1 215.0 11
30578301 2154-03-14 05:11:00 2154-03-13 14:00:00 0 days 13:11:00 0 1 3 370.0 16
2154-03-14 05:11:00 2154-03-13 17:00:00 0 days 12:11:00 0 1 4 425.0 16
Any help is much appreciated please...

How to find occurrence of consecutive events in python timeseries data frame?

I have got a time series of meteorological observations with date and value columns:
df = pd.DataFrame({'date':['11/10/2017 0:00','11/10/2017 03:00','11/10/2017 06:00','11/10/2017 09:00','11/10/2017 12:00',
'11/11/2017 0:00','11/11/2017 03:00','11/11/2017 06:00','11/11/2017 09:00','11/11/2017 12:00',
'11/12/2017 00:00','11/12/2017 03:00','11/12/2017 06:00','11/12/2017 09:00','11/12/2017 12:00'],
'value':[850,np.nan,np.nan,np.nan,np.nan,500,650,780,np.nan,800,350,690,780,np.nan,np.nan],
'consecutive_hour': [ 3,0,0,0,0,3,6,9,0,3,3,6,9,0,0]})
With this DataFrame, I want a third column of consecutive_hours such that if the value in a particular timestamp is less than 1000, we give corresponding value in "consecutive-hours" of "3:00" hours and find consecutive such occurrence like 6:00 9:00 as above.
Lastly, I want to summarize the table counting consecutive hours occurrence and number of days such that the summary table looks like:
df_summary = pd.DataFrame({'consecutive_hours':[3,6,9,12],
'number_of_day':[2,0,2,0]})
I tried several online solutions and methods like shift(), diff() etc. as mentioned in:How to groupby consecutive values in pandas DataFrame
and more, spent several days but no luck yet.
I would highly appreciate help on this issue.
Thanks!

Input data:
>>> df
date value
0 2017-11-10 00:00:00 850.0
1 2017-11-10 03:00:00 NaN
2 2017-11-10 06:00:00 NaN
3 2017-11-10 09:00:00 NaN
4 2017-11-10 12:00:00 NaN
5 2017-11-11 00:00:00 500.0
6 2017-11-11 03:00:00 650.0
7 2017-11-11 06:00:00 780.0
8 2017-11-11 09:00:00 NaN
9 2017-11-11 12:00:00 800.0
10 2017-11-12 00:00:00 350.0
11 2017-11-12 03:00:00 690.0
12 2017-11-12 06:00:00 780.0
13 2017-11-12 09:00:00 NaN
14 2017-11-12 12:00:00 NaN
The cumcount_reset function is adapted from this answer of #jezrael:
Python pandas cumsum with reset everytime there is a 0
cumcount_reset = \
lambda b: b.cumsum().sub(b.cumsum().where(~b).ffill().fillna(0)).astype(int)
df["consecutive_hour"] = (df.set_index("date")["value"] < 1000) \
.groupby(pd.Grouper(freq="D")) \
.apply(lambda b: cumcount_reset(b)).mul(3) \
.reset_index(drop=True)
Output result:
>>> df
date value consecutive_hour
0 2017-11-10 00:00:00 850.0 3
1 2017-11-10 03:00:00 NaN 0
2 2017-11-10 06:00:00 NaN 0
3 2017-11-10 09:00:00 NaN 0
4 2017-11-10 12:00:00 NaN 0
5 2017-11-11 00:00:00 500.0 3
6 2017-11-11 03:00:00 650.0 6
7 2017-11-11 06:00:00 780.0 9
8 2017-11-11 09:00:00 NaN 0
9 2017-11-11 12:00:00 800.0 3
10 2017-11-12 00:00:00 350.0 3
11 2017-11-12 03:00:00 690.0 6
12 2017-11-12 06:00:00 780.0 9
13 2017-11-12 09:00:00 NaN 0
14 2017-11-12 12:00:00 NaN 0
Summary table
df_summary = df.loc[df.groupby(pd.Grouper(key="date", freq="D"))["consecutive_hour"] \
.apply(lambda h: (h - h.shift(-1).fillna(0)) > 0),
"consecutive_hour"] \
.value_counts().reindex([3, 6, 9, 12], fill_value=0) \
.rename("number_of_day") \
.rename_axis("consecutive_hour") \
.reset_index()
>>> df_summary
consecutive_hour number_of_day
0 3 2
1 6 0
2 9 2
3 12 0

How can I calculate relative time between two time stamps in hours in pandas?

In the example dataframe below, how can I convert t_relative into hours? For example, the relative time in the first row would be 49 hours.
tstart tend t_relative
0 2131-05-16 23:00:00 2131-05-19 00:00:00 2 days 01:00:00
1 2131-05-16 23:00:00 2131-05-19 00:15:00 2 days 01:15:00
2 2131-05-16 23:00:00 2131-05-19 00:45:00 2 days 01:45:00
3 2131-05-16 23:00:00 2131-05-19 01:00:00 2 days 02:00:00
4 2131-05-16 23:00:00 2131-05-19 01:15:00 2 days 02:15:00
t_relative was calculated with the operation, df['t_relative'] = df['tend']-df['tstart'].

You can divide Timedelta:
df['t_relative']/pd.Timedelta('1H')
Output:
0 49.00
1 49.25
2 49.75
3 50.00
4 50.25
Name: t_relative, dtype: float64

How to resample a df with datetime index to exactly n equally sized periods?

I've got a large dataframe with a datetime index and need to resample data to exactly 10 equally sized periods.
So far, I've tried finding the first and last dates to determine the total number of days in the data, divide that by 10 to determine the size of each period, then resample using that number of days. eg:
first = df.reset_index().timesubmit.min()
last = df.reset_index().timesubmit.max()
periodsize = str((last-first).days/10) + 'D'
df.resample(periodsize,how='sum')
This doesn't guarantee exactly 10 periods in the df after resampling since the periodsize is a rounded down int. Using a float doesn't work in the resampling. Seems that either there's something simple that I'm missing here, or I'm attacking the problem all wrong.

import numpy as np
import pandas as pd
n = 10
nrows = 33
index = pd.date_range('2000-1-1', periods=nrows, freq='D')
df = pd.DataFrame(np.ones(nrows), index=index)
print(df)
# 0
# 2000-01-01 1
# 2000-01-02 1
# ...
# 2000-02-01 1
# 2000-02-02 1
first = df.index.min()
last = df.index.max() + pd.Timedelta('1D')
secs = int((last-first).total_seconds()//n)
periodsize = '{:d}S'.format(secs)
result = df.resample(periodsize, how='sum')
print('\n{}'.format(result))
assert len(result) == n
yields
0
2000-01-01 00:00:00 4
2000-01-04 07:12:00 3
2000-01-07 14:24:00 3
2000-01-10 21:36:00 4
2000-01-14 04:48:00 3
2000-01-17 12:00:00 3
2000-01-20 19:12:00 4
2000-01-24 02:24:00 3
2000-01-27 09:36:00 3
2000-01-30 16:48:00 3
The values in the 0-column indicate the number of rows that were aggregated, since the original DataFrame was filled with values of 1. The pattern of 4's and 3's is about as even as you can get since 33 rows can not be evenly grouped into 10 groups.
Explanation: Consider this simpler DataFrame:
n = 2
nrows = 5
index = pd.date_range('2000-1-1', periods=nrows, freq='D')
df = pd.DataFrame(np.ones(nrows), index=index)
# 0
# 2000-01-01 1
# 2000-01-02 1
# 2000-01-03 1
# 2000-01-04 1
# 2000-01-05 1
Using df.resample('2D', how='sum') gives the wrong number of groups
In [366]: df.resample('2D', how='sum')
Out[366]:
0
2000-01-01 2
2000-01-03 2
2000-01-05 1
Using df.resample('3D', how='sum') gives the right number of groups, but the
second group starts at 2000-01-04 which does not evenly divide the DataFrame
into two equally-spaced groups:
In [367]: df.resample('3D', how='sum')
Out[367]:
0
2000-01-01 3
2000-01-04 2
To do better, we need to work at a finer time resolution than in days. Since Timedeltas have a total_seconds method, let's work in seconds. So for the example above, the desired frequency string would be
In [374]: df.resample('216000S', how='sum')
Out[374]:
0
2000-01-01 00:00:00 3
2000-01-03 12:00:00 2
since there are 216000*2 seconds in 5 days:
In [373]: (pd.Timedelta(days=5) / pd.Timedelta('1S'))/2
Out[373]: 216000.0
Okay, so now all we need is a way to generalize this. We'll want the minimum and maximum dates in the index:
first = df.index.min()
last = df.index.max() + pd.Timedelta('1D')
We add an extra day because it makes the difference in days come out right. In
the example above, There are only 4 days between the Timestamps for 2000-01-05
and 2000-01-01,
In [377]: (pd.Timestamp('2000-01-05')-pd.Timestamp('2000-01-01')).days
Out[378]: 4
But as we can see in the worked example, the DataFrame has 5 rows representing 5
days. So it makes sense that we need to add an extra day.
Now we can compute the correct number of seconds in each equally-spaced group with:
secs = int((last-first).total_seconds()//n)

Here is one way to ensure equal-size sub-periods by using np.linspace() on pd.Timedelta and then classifying each obs into different bins using pd.cut.
import pandas as pd
import numpy as np
# generate artificial data
np.random.seed(0)
df = pd.DataFrame(np.random.randn(100, 2), columns=['A', 'B'], index=pd.date_range('2015-01-01 00:00:00', periods=100, freq='8H'))
Out[87]:
A B
2015-01-01 00:00:00 1.7641 0.4002
2015-01-01 08:00:00 0.9787 2.2409
2015-01-01 16:00:00 1.8676 -0.9773
2015-01-02 00:00:00 0.9501 -0.1514
2015-01-02 08:00:00 -0.1032 0.4106
2015-01-02 16:00:00 0.1440 1.4543
2015-01-03 00:00:00 0.7610 0.1217
2015-01-03 08:00:00 0.4439 0.3337
2015-01-03 16:00:00 1.4941 -0.2052
2015-01-04 00:00:00 0.3131 -0.8541
2015-01-04 08:00:00 -2.5530 0.6536
2015-01-04 16:00:00 0.8644 -0.7422
2015-01-05 00:00:00 2.2698 -1.4544
2015-01-05 08:00:00 0.0458 -0.1872
2015-01-05 16:00:00 1.5328 1.4694
... ... ...
2015-01-29 08:00:00 0.9209 0.3187
2015-01-29 16:00:00 0.8568 -0.6510
2015-01-30 00:00:00 -1.0342 0.6816
2015-01-30 08:00:00 -0.8034 -0.6895
2015-01-30 16:00:00 -0.4555 0.0175
2015-01-31 00:00:00 -0.3540 -1.3750
2015-01-31 08:00:00 -0.6436 -2.2234
2015-01-31 16:00:00 0.6252 -1.6021
2015-02-01 00:00:00 -1.1044 0.0522
2015-02-01 08:00:00 -0.7396 1.5430
2015-02-01 16:00:00 -1.2929 0.2671
2015-02-02 00:00:00 -0.0393 -1.1681
2015-02-02 08:00:00 0.5233 -0.1715
2015-02-02 16:00:00 0.7718 0.8235
2015-02-03 00:00:00 2.1632 1.3365
[100 rows x 2 columns]
# cutoff points, 10 equal-size group requires 11 points
# measured by timedelta 1 hour
time_delta_in_hours = (df.index - df.index[0]) / pd.Timedelta('1h')
n = 10
ts_cutoff = np.linspace(0, time_delta_in_hours[-1], n+1)
# labels, time index
time_index = df.index[0] + np.array([pd.Timedelta(str(time_delta)+'h') for time_delta in ts_cutoff])
# create a categorical reference variables
df['start_time_index'] = pd.cut(time_delta_in_hours, bins=10, labels=time_index[:-1])
# for clarity, reassign labels using end-period index
df['end_time_index'] = pd.cut(time_delta_in_hours, bins=10, labels=time_index[1:])
Out[89]:
A B start_time_index end_time_index
2015-01-01 00:00:00 1.7641 0.4002 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-01 08:00:00 0.9787 2.2409 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-01 16:00:00 1.8676 -0.9773 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-02 00:00:00 0.9501 -0.1514 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-02 08:00:00 -0.1032 0.4106 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-02 16:00:00 0.1440 1.4543 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-03 00:00:00 0.7610 0.1217 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-03 08:00:00 0.4439 0.3337 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-03 16:00:00 1.4941 -0.2052 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-04 00:00:00 0.3131 -0.8541 2015-01-01 00:00:00 2015-01-04 07:12:00
2015-01-04 08:00:00 -2.5530 0.6536 2015-01-04 07:12:00 2015-01-07 14:24:00
2015-01-04 16:00:00 0.8644 -0.7422 2015-01-04 07:12:00 2015-01-07 14:24:00
2015-01-05 00:00:00 2.2698 -1.4544 2015-01-04 07:12:00 2015-01-07 14:24:00
2015-01-05 08:00:00 0.0458 -0.1872 2015-01-04 07:12:00 2015-01-07 14:24:00
2015-01-05 16:00:00 1.5328 1.4694 2015-01-04 07:12:00 2015-01-07 14:24:00
... ... ... ... ...
2015-01-29 08:00:00 0.9209 0.3187 2015-01-27 09:36:00 2015-01-30 16:48:00
2015-01-29 16:00:00 0.8568 -0.6510 2015-01-27 09:36:00 2015-01-30 16:48:00
2015-01-30 00:00:00 -1.0342 0.6816 2015-01-27 09:36:00 2015-01-30 16:48:00
2015-01-30 08:00:00 -0.8034 -0.6895 2015-01-27 09:36:00 2015-01-30 16:48:00
2015-01-30 16:00:00 -0.4555 0.0175 2015-01-27 09:36:00 2015-01-30 16:48:00
2015-01-31 00:00:00 -0.3540 -1.3750 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-01-31 08:00:00 -0.6436 -2.2234 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-01-31 16:00:00 0.6252 -1.6021 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-01 00:00:00 -1.1044 0.0522 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-01 08:00:00 -0.7396 1.5430 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-01 16:00:00 -1.2929 0.2671 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-02 00:00:00 -0.0393 -1.1681 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-02 08:00:00 0.5233 -0.1715 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-02 16:00:00 0.7718 0.8235 2015-01-30 16:48:00 2015-02-03 00:00:00
2015-02-03 00:00:00 2.1632 1.3365 2015-01-30 16:48:00 2015-02-03 00:00:00
[100 rows x 4 columns]
df.groupby('start_time_index').agg('sum')
Out[90]:
A B
start_time_index
2015-01-01 00:00:00 8.6133 2.7734
2015-01-04 07:12:00 1.9220 -0.8069
2015-01-07 14:24:00 -8.1334 0.2318
2015-01-10 21:36:00 -2.7572 -4.2862
2015-01-14 04:48:00 1.1957 7.2285
2015-01-17 12:00:00 3.2485 6.6841
2015-01-20 19:12:00 -0.8903 2.2802
2015-01-24 02:24:00 -2.1025 1.3800
2015-01-27 09:36:00 -1.1017 1.3108
2015-01-30 16:48:00 -0.0902 -2.5178
Another potential shorter way to do this is to specify your sampling freq as the time delta. But the problem, as shown in below, is that it delivers 11 sub-samples instead of 10. I believe the reason is that the resample implements a left-inclusive/right-exclusive (or left-exclusive/right-inclusive) sub-sampling scheme so that the very last obs at '2015-02-03 00:00:00' is considered as a separate group. If we use pd.cut to do it ourself, we can specify include_lowest=True so that it gives us exactly 10 sub-samples rather than 11.
n = 10
time_delta_str = str((df.index[-1] - df.index[0]) / (pd.Timedelta('1s') * n)) + 's'
df.resample(pd.Timedelta(time_delta_str), how='sum')
Out[114]:
A B
2015-01-01 00:00:00 8.6133 2.7734
2015-01-04 07:12:00 1.9220 -0.8069
2015-01-07 14:24:00 -8.1334 0.2318
2015-01-10 21:36:00 -2.7572 -4.2862
2015-01-14 04:48:00 1.1957 7.2285
2015-01-17 12:00:00 3.2485 6.6841
2015-01-20 19:12:00 -0.8903 2.2802
2015-01-24 02:24:00 -2.1025 1.3800
2015-01-27 09:36:00 -1.1017 1.3108
2015-01-30 16:48:00 -2.2534 -3.8543
2015-02-03 00:00:00 2.1632 1.3365

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to restrict time difference to same day? - python

Related

From hours to String

How to create a set rows for every group depending on the result after the operation in the groupby() Pandas?

How to find occurrence of consecutive events in python timeseries data frame?

How can I calculate relative time between two time stamps in hours in pandas?

How to resample a df with datetime index to exactly n equally sized periods?

Categories

Resources