I have a range of timestamps with start time and end time. I would like to generate the number of minutes per hour between the two timestamps:
import pandas as pd
start_time = pd.to_datetime('2013-03-26 21:49:08',infer_datetime_format=True)
end_time = pd.to_datetime('2013-03-27 05:21:00', infer_datetime_format=True)
pd.date_range(start_time, end_time, freq='h')
which gives:
DatetimeIndex(['2013-03-26 21:49:08', '2013-03-26 22:49:08',
'2013-03-26 23:49:08', '2013-03-27 00:49:08',
'2013-03-27 01:49:08', '2013-03-27 02:49:08',
'2013-03-27 03:49:08', '2013-03-27 04:49:08'],
dtype='datetime64[ns]', freq='H')
Sample result: I would like to compute the number of minutes bounded by the hour between the start and end times, like below:
2013-03-26 21:00:00 - 10m 52secs
2013-03-26 22:00:00 - 60m
2013-03-26 23:00:00 - 60m
...
2013-03-27 05:00:00 - 21m
I have looked at pandas resample, but not exactly sure how to achieve this. Any direction is appreciated.
Construct two Series corresponding to the start and end time of each hour. Use clip to restrict them to your desired timespan (older pandas versions had clip_lower and clip_upper, which were removed in pandas 1.0), then subtract:
# hourly range, floored to the start of each hour
rng = pd.date_range(start_time.floor('h'), end_time.floor('h'), freq='h')
# get the left and right endpoints for each hour,
# clipped so they stay within [start_time, end_time]
left = pd.Series(rng, index=rng).clip(lower=start_time)
right = pd.Series(rng + pd.Timedelta(hours=1), index=rng).clip(upper=end_time)
# construct a series of the lengths
s = right - left
The resulting output:
2013-03-26 21:00:00 00:10:52
2013-03-26 22:00:00 01:00:00
2013-03-26 23:00:00 01:00:00
2013-03-27 00:00:00 01:00:00
2013-03-27 01:00:00 01:00:00
2013-03-27 02:00:00 01:00:00
2013-03-27 03:00:00 01:00:00
2013-03-27 04:00:00 01:00:00
2013-03-27 05:00:00 00:21:00
Freq: H, dtype: timedelta64[ns]
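Since the question asks for minutes rather than timedeltas, the same clipping idea can be wrapped in a small function that returns fractional minutes per hour. This is a sketch assuming pandas 1.x or later (for Series.clip with timestamp bounds); the function name minutes_per_hour is hypothetical.

```python
import pandas as pd


def minutes_per_hour(start_time, end_time):
    """Minutes of [start_time, end_time] falling in each clock hour (hypothetical helper)."""
    # One entry per clock hour touched by the interval
    hours = pd.date_range(start_time.floor("h"), end_time.floor("h"), freq="h")
    # Clip each hour's edges [hour, hour + 1h) to the overall interval
    left = pd.Series(hours, index=hours).clip(lower=start_time)
    right = pd.Series(hours + pd.Timedelta(hours=1), index=hours).clip(upper=end_time)
    # Convert the overlap lengths from timedeltas to fractional minutes
    return (right - left).dt.total_seconds() / 60


minutes = minutes_per_hour(pd.Timestamp("2013-03-26 21:49:08"),
                           pd.Timestamp("2013-03-27 05:21:00"))
print(minutes.round(2))
```

The first hour comes out as 652 seconds (about 10.87 minutes), the hours in between as 60, and the last as 21.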
Using datetime.timedelta() in some sort of for loop seems to be what you're looking for.
https://docs.python.org/2/library/datetime.html#datetime.timedelta
It seems like this might be a viable solution:
import pandas as pd
import datetime as dt
def bounded_min(t, range_time):
    """ For a given timestamp t and considered time interval range_time,
    return the desired bounded value in minutes and seconds"""
    # min() takes care of the end of the time interval,
    # max() takes care of the beginning of the interval
    s = (min(t + dt.timedelta(hours=1), range_time.max()) -
         max(t, range_time.min())).total_seconds()
    if s%60:
        return "%dm %dsecs" % (s/60, s%60)
    else:
        return "%dm" % (s/60)
start_time = pd.to_datetime('2013-03-26 21:49:08',infer_datetime_format=True)
end_time = pd.to_datetime('2013-03-27 05:21:00', infer_datetime_format=True)
range_time = pd.date_range(start_time, end_time, freq='h')
# Include the end of the time range using the union() trick, as described at:
# https://stackoverflow.com/questions/37890391/how-to-include-end-date-in-pandas-date-range-method
range_time = range_time.union([end_time])
# This is essentially timestamps for beginnings of hours
index_time = pd.Series(range_time).apply(lambda x: dt.datetime(year=x.year,
month=x.month,
day=x.day,
hour=x.hour,
minute=0,
second=0))
bounded_mins = index_time.apply(lambda x: bounded_min(x, range_time))
# Put timestamps and values together
bounded_df = pd.DataFrame(bounded_mins, columns=["Bounded Mins"]).set_index(index_time)
print(bounded_df)
Gotta love the powerful lambdas:). Maybe there is a simpler way to do it though.
Output:
Bounded Mins
2013-03-26 21:00:00 10m 52secs
2013-03-26 22:00:00 60m
2013-03-26 23:00:00 60m
2013-03-27 00:00:00 60m
2013-03-27 01:00:00 60m
2013-03-27 02:00:00 60m
2013-03-27 03:00:00 60m
2013-03-27 04:00:00 60m
2013-03-27 05:00:00 21m
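As the author notes, there may be a simpler way. One possible simplification, sketched here under the assumption of a reasonably recent pandas, replaces the per-timestamp datetime(...) lambda with the vectorized Series.dt.floor('h') and formats the durations with a small helper (format_minutes is a hypothetical name, not part of pandas).

```python
import datetime as dt

import pandas as pd

start_time = pd.Timestamp("2013-03-26 21:49:08")
end_time = pd.Timestamp("2013-03-27 05:21:00")

# Same hourly points as in the answer, with the end time appended
range_time = pd.date_range(start_time, end_time, freq="h").union([end_time])

# Floor each timestamp to the start of its hour in one vectorized call
index_time = pd.Series(range_time).dt.floor("h")


def format_minutes(seconds):
    """Format a duration in seconds as '60m' or '10m 52secs' (hypothetical helper)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}secs" if secs else f"{minutes}m"


# Overlap of each hour [t, t + 1h) with [start_time, end_time], in seconds
seconds = [
    (min(t + dt.timedelta(hours=1), end_time) - max(t, start_time)).total_seconds()
    for t in index_time
]
bounded_df = pd.DataFrame({"Bounded Mins": [format_minutes(s) for s in seconds]},
                          index=index_time)
print(bounded_df)
```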
Related
I would like to subtract datetimes in a pandas DataFrame, but over pairs of rows (row 1 minus row 0, row 3 minus row 2, and so on), and I don't know which function to use.
Timestamp
2020-11-26 20:00:00
2020-11-26 21:00:00
2020-11-26 22:00:00
2020-11-26 23:30:00
Explanation:
(2020-11-26 21:00:00) - (2020-11-26 20:00:00)
(2020-11-26 23:30:00) - (2020-11-26 22:00:00)
The result must be:
01:00:00
01:30:00
First, check that the column has a datetime dtype.
If not, convert it with pd.to_datetime():
import pandas as pd
demo = pd.DataFrame(columns=['Timestamps'])
demotime = ['20:00:00','21:00:00','22:00:00','23:30:00']
demo['Timestamps'] = demotime
demo['Timestamps'] = pd.to_datetime(demo['Timestamps'])
Your dataframe would look like:
Timestamps
0 2020-11-29 20:00:00
1 2020-11-29 21:00:00
2 2020-11-29 22:00:00
3 2020-11-29 23:30:00
After that, you can use a for or while loop that steps through the rows two at a time (for example i in range(0, len(demo) - 1, 2)) and in each iteration compute:
demo.iloc[i+1,0]-demo.iloc[i,0]
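Put together, the loop might look like the sketch below. It assumes the timestamps are already datetimes and that the pairs are rows (0, 1), (2, 3), and so on; an unpaired last row is skipped.

```python
import pandas as pd

demo = pd.DataFrame({"Timestamps": pd.to_datetime([
    "2020-11-26 20:00:00", "2020-11-26 21:00:00",
    "2020-11-26 22:00:00", "2020-11-26 23:30:00",
])})

# Step through the rows two at a time: row 1 - row 0, row 3 - row 2, ...
# Using len(demo) - 1 as the stop value skips a trailing unpaired row
diffs = []
for i in range(0, len(demo) - 1, 2):
    diffs.append(demo.iloc[i + 1, 0] - demo.iloc[i, 0])

print(diffs)
```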
IIUC, you want to split the rows into chunks of two and find the difference within each chunk. One approach is:
import numpy as np
res = df.groupby(np.arange(len(df)) // 2).diff().dropna()
print(res)
Output
Timestamp
1 0 days 01:00:00
3 0 days 01:30:00
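Since the question mentions a shift, here is an equivalent sketch without groupby: diff() subtracts the previous row (the same as s - s.shift(1)), and keeping every second row starting at position 1 leaves only the within-pair differences. It assumes the pairs start at the first row.

```python
import pandas as pd

df = pd.DataFrame({"Timestamp": pd.to_datetime([
    "2020-11-26 20:00:00", "2020-11-26 21:00:00",
    "2020-11-26 22:00:00", "2020-11-26 23:30:00",
])})

# Difference with the previous row, equivalent to s - s.shift(1)
step = df["Timestamp"].diff()
# Keep rows 1, 3, 5, ...: the second row of each pair
res = step.iloc[1::2]
print(res)
```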
Good evening,
is it possible to do a calculation with, say, two columns of a DataFrame and add a third column with the result?
Dataframe (original):
name time_a time_b
name_a 08:00:00 09:00:00
name_b 07:45:00 08:15:00
name_c 07:00:00 08:10:00
name_d 06:00:00 10:00:00
Or, to be specific: is it possible to obtain the difference of two times (time_b - time_a) and put it in a new column (time_c) at the end of the DataFrame?
Dataframe (new):
name time_a time_b time_c
name_a 08:00:00 09:00:00 01:00:00
name_b 07:45:00 08:15:00 00:30:00
name_c 07:00:00 08:10:00 01:10:00
name_d 06:00:00 10:00:00 04:00:00
Thanks and a good night!
If your columns are in datetime or timedelta format:
# New column is a timedelta object
df["time_c"] = (df["time_b"] - df["time_a"])
If your columns are in datetime.time format (which it appears they are):
import datetime

def time_diff(time_1, time_2):
    """returns the difference between time 1 and time 2 (time_2-time_1)"""
    # combine() needs a date; any fixed date works since only the difference matters
    today = datetime.date.today()
    time_1 = datetime.datetime.combine(today, time_1)
    time_2 = datetime.datetime.combine(today, time_2)
    return time_2 - time_1

# Apply the function row by row
df["time_c"] = df[["time_a","time_b"]].apply(lambda arr: time_diff(*arr), axis=1)
Alternatively, you can convert to a timedelta by first converting to a string:
df["time_a"]=pd.to_timedelta(df["time_a"].astype(str))
df["time_b"]=pd.to_timedelta(df["time_b"].astype(str))
df["time_c"] = df["time_b"] - df["time_a"]
(Not duplicate / my question is entirely different)
My dataframe looks like this:
# [df2] is day based
time time2
2017-01-01, 2017-01-01 00:12:00
2017-01-02, 2017-01-02 03:15:00
2017-01-03, 2017-01-03 01:25:00
2017-01-04, 2017-01-04 04:12:00
2017-01-05, 2017-01-05 00:45:00
....
# [df] is minute based
time value
2017-01-01 00:01:00, 0.1232
2017-01-01 00:02:00, 0.1232
2017-01-01 00:03:00, 0.1232
2017-01-01 00:04:00, 0.1232
2017-01-01 00:05:00, 0.1232
....
I want to create a new column called time_val_min in [df2] that holds the minimum value from [df] within the range given by df2['time'] and df2['time2'].
What did I do?
I did df2['time_val_min'] = df[df['time'].dt.hour.between(df2['time'], df2['time'])].min() but it does not work.
Could you please let me know how to fix it?
You can merge the two data frames on the date, then filter on the time:
# create the date from the time column
df['date'] = df['time'].dt.normalize()
# merge
new_df = (df.merge(df2, left_on='date',   # left on date
                   right_on='time',       # right on time, if time is purely beginning of days
                   how='right',
                   suffixes=('', '_y'))
            .query('time < time2')
            .groupby('date')
            ['time'].min()
            .to_frame(name='time_val_min')
            .merge(df2, right_on='time', left_index=True)
         )
Output:
time_val_min time time2
0 2017-01-01 00:01:00 2017-01-01 2017-01-01 00:12:00
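Note that the answer above takes the minimum of time. If, as the question's wording suggests, you want the minimum of value within each [time, time2] window, a variant could look like the sketch below. The sample rows are made up for illustration, and it assumes df2['time'] is always midnight of the day it covers.

```python
import pandas as pd

# Made-up sample data in the shape described in the question
df2 = pd.DataFrame({
    "time": pd.to_datetime(["2017-01-01", "2017-01-02"]),
    "time2": pd.to_datetime(["2017-01-01 00:03:00", "2017-01-02 00:02:00"]),
})
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-01-01 00:01:00", "2017-01-01 00:02:00", "2017-01-01 00:03:00",
        "2017-01-01 00:04:00", "2017-01-02 00:01:00", "2017-01-02 00:02:00",
        "2017-01-02 00:03:00",
    ]),
    "value": [0.5, 0.2, 0.3, 0.1, 0.9, 0.8, 0.7],
})

# Pair every minute row with the df2 row for the same day
df["date"] = df["time"].dt.normalize()
pairs = df.merge(df2, left_on="date", right_on="time", suffixes=("", "_day"))

# Keep only rows whose timestamp falls within [time, time2] (inclusive)
in_range = pairs[pairs["time"].between(pairs["time_day"], pairs["time2"])]

# Minimum value per day, mapped back onto df2
mins = in_range.groupby("date")["value"].min()
df2["time_val_min"] = df2["time"].map(mins)
print(df2)
```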
I have a pandas DataFrame with two timestamp columns, start and end:
start end
2014-08-28 17:00:00 | 2014-08-29 22:00:00
2014-08-29 10:45:00 | 2014-09-01 17:00:00
2014-09-01 15:00:00 | 2014-09-01 19:00:00
The goal is to aggregate the number of hours logged on each date. For my example,
I would create a date range and aggregate the hours over multiple entries:
2014-08-28 -> 7 hrs
2014-08-29 -> 22 hrs + 13 hrs 15 mins => 35 hrs 15 mins
2014-08-30 -> 24 hrs
2014-08-31 -> 24 hrs
2014-09-01 -> 17 hrs + 4 hrs => 21 hrs
I've tried using timedelta, but it only gives the total hours, not the hours per day.
I've also tried to explode the rows (i.e. split each row by day), but I could only get that to work at the date level, not at the timestamp level.
Any suggestions are greatly appreciated.
You can use pd.date_range to create minute-by-minute timestamps for each interval, then count the minutes that fall on each day and convert the counts to timedeltas:
start end
0 2014-08-28 17:00:00 2014-08-29 22:00:00
1 2014-08-29 10:45:00 2014-09-01 17:00:00
2 2014-09-01 15:00:00 2014-09-01 19:00:00
# Create minute-by-minute timestamps from start (inclusive) to end (exclusive) for each row,
# flattened into one Series of dates (inclusive='left' needs pandas >= 1.4; older versions used closed='left')
a = pd.Series(sum(df.apply(lambda x: pd.date_range(x['start'], x['end'], freq='min', inclusive='left').tolist(), axis=1).tolist(), [])).dt.date
# Count the minutes on each date and convert the counts to timedeltas
a.value_counts().apply(lambda x: pd.to_timedelta(x, 'min'))
Out:
2014-08-29 1 days 11:15:00
2014-08-30 1 days 00:00:00
2014-08-31 1 days 00:00:00
2014-09-01 0 days 21:00:00
2014-08-28 0 days 07:00:00
dtype: timedelta64[ns]
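Expanding every interval into minutes gets large for long spans. An alternative sketch, assuming pandas 1.x or later, clips each interval to the calendar days it touches and sums the overlaps per day, which gives exact durations rather than minute counts:

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["2014-08-28 17:00:00", "2014-08-29 10:45:00",
                             "2014-09-01 15:00:00"]),
    "end": pd.to_datetime(["2014-08-29 22:00:00", "2014-09-01 17:00:00",
                           "2014-09-01 19:00:00"]),
})

pieces = []
for start, end in zip(df["start"], df["end"]):
    # Every calendar day touched by this interval
    days = pd.date_range(start.normalize(), end.normalize(), freq="D")
    # Overlap of [start, end] with each day [day, day + 1 day)
    left = pd.Series(days, index=days).clip(lower=start)
    right = pd.Series(days + pd.Timedelta(days=1), index=days).clip(upper=end)
    pieces.append(right - left)

# Sum the per-interval overlaps for each date
per_day = pd.concat(pieces).groupby(level=0).sum()
print(per_day)
```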
Hope this is useful; you should be able to adapt it to your purpose. The idea is to store each day and its accumulated time in a dict. If an interval starts and ends on the same day, just add the difference. Otherwise, add the time until the first midnight, add 24 hours for every full day in between, and add the time from the last midnight until the end. FYI, I believe the result for 2014-09-01 should be 21 hrs.
from datetime import datetime, timedelta
from collections import defaultdict
s = [('2014-08-28 17:00:00', '2014-08-29 22:00:00'),
('2014-08-29 10:45:00', '2014-09-01 17:00:00'),
('2014-09-01 15:00:00', '2014-09-01 19:00:00') ]
def aggregate(intervals):
    store = defaultdict(timedelta)
    for interval in intervals:
        start = datetime.strptime(interval[0], "%Y-%m-%d %H:%M:%S")
        end = datetime.strptime(interval[1], "%Y-%m-%d %H:%M:%S")
        start_date = start.date()
        end_date = end.date()
        if start_date == end_date:
            # Same day: the whole interval belongs to that date
            store[start_date] += end - start
        else:
            # Time from start until the following midnight
            # (built with timedelta so it also works at month and year ends)
            midnight = datetime.combine(start_date + timedelta(days=1), datetime.min.time())
            store[start_date] += midnight - start
            # Full days strictly between start_date and end_date
            for i in range(1, (end_date - start_date).days):
                next_date = start_date + timedelta(days=i)
                store[next_date] += timedelta(hours=24)
            # Time from the last midnight until end
            last_midnight = datetime.combine(end_date, datetime.min.time())
            store[end_date] += end - last_midnight
    return store

r = aggregate(s)
for i in r:
    print(i, r[i])
2014-08-28 7:00:00
2014-08-29 1 day, 11:15:00
2014-08-30 1 day, 0:00:00
2014-08-31 1 day, 0:00:00
2014-09-01 21:00:00
I have a dataset with measurements taken roughly every 2 hours over a week. I would like to calculate the mean of measurements taken at the same time of day on different days. For example, I want the mean of every measurement taken between 12:00 and 13:59.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
#generating test dataframe
date_today = datetime.now()
time_of_taken_measurment = pd.date_range(date_today, date_today +
                                         timedelta(72), freq='2h20min')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100,
size=len(time_of_taken_measurment))
df = pd.DataFrame({'measurementTimestamp': time_of_taken_measurment, 'measurment': data})
df = df.set_index('measurementTimestamp')
# Calculate the mean of measurements taken in the same hour
hourly_average = df.groupby([df.index.hour]).mean()
hourly_average
The code above gives me this output:
0 47.967742
1 43.354839
2 46.935484
.....
22 42.833333
23 52.741935
I would like to have a result like this:
0 mean0
2 mean1
4 mean2
.....
20 mean10
22 mean11
I was trying to solve my problem using rolling_mean function, but I could not find a way to apply it to my static case.
Use the floor method of DatetimeIndex, which makes it easy to create 2-hour time bins, then group by the resulting time of day:
df.groupby(df.index.floor('2h').time).mean()
Output:
measurment
00:00:00 51.516129
02:00:00 54.868852
04:00:00 52.935484
06:00:00 43.177419
08:00:00 43.903226
10:00:00 55.048387
12:00:00 50.639344
14:00:00 48.870968
16:00:00 43.967742
18:00:00 49.225806
20:00:00 43.774194
22:00:00 50.590164
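If you'd rather have integer labels like 0, 2, ..., 22 (as in the desired output) instead of times, another option is to bin the hour with integer division. A sketch with a few fixed, made-up timestamps so the result is reproducible:

```python
import pandas as pd

# Made-up fixed timestamps instead of datetime.now(), so the result is reproducible
idx = pd.to_datetime(["2024-01-01 00:10", "2024-01-01 01:50", "2024-01-01 12:30",
                      "2024-01-02 13:15", "2024-01-02 00:40"])
df = pd.DataFrame({"measurment": [10, 20, 30, 50, 60]}, index=idx)

# Integer division maps hours 0-1 -> 0, 2-3 -> 2, ..., 22-23 -> 22
two_hour_bin = (df.index.hour // 2) * 2
result = df.groupby(two_hour_bin).mean()
print(result)
```

Here hours 0 and 1 on both days fall into bin 0 (mean of 10, 20, 60), and hours 12 and 13 into bin 12 (mean of 30, 50).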