Subtract time only from two datetime columns in Pandas - python

I am looking to do something like in this thread. However, I only want to subtract the time component of the two datetime columns.
For example, given this dataframe:
ts1 ts2
0 2018-07-25 11:14:00 2018-07-27 12:14:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00
The expected output for ts2 - ts1 (time component only) is:
ts1 ts2 ts_delta
0 2018-07-25 11:14:00 2018-07-27 12:14:00 1:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 -1:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 -0:17:00
So, for row 0: the time for ts2 is 12:14:00 and the time for ts1 is 11:14:00. The expected output is simply the difference between these two times (the dates are ignored). In this case:
12:14:00 - 11:14:00 = 1:00:00.
How would I do this in one single line?

Since you only want the time difference and you're not working with timezone-aware datetimes, the date does not matter. Therefore you don't have to change any dates or set an arbitrary reference date. Just work with what you have.
Subtract ts1's time component (as a timedelta) from ts2, then convert the resulting datetime to a timedelta by subtracting ts2's date (midnight):
df["delta_time"] = (df["ts2"] - pd.to_timedelta(df["ts1"].dt.time.astype(str))) - df["ts2"].dt.floor("d")
df
ts1 ts2 delta_time
0 2018-07-25 11:14:00 2018-07-27 12:14:00 0 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 -1 days +23:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 -1 days +23:43:00
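A self-contained sketch of the same floor-based approach, rebuilding the question's sample frame so it can be run as-is (the frame construction is added for illustration; the arithmetic is the answer's):

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    "ts1": pd.to_datetime(["2018-07-25 11:14:00", "2018-08-26 11:15:00", "2018-07-29 11:17:00"]),
    "ts2": pd.to_datetime(["2018-07-27 12:14:00", "2018-09-24 10:15:00", "2018-07-22 11:00:00"]),
})

# Time of day of ts1, expressed as a timedelta since midnight
ts1_time = pd.to_timedelta(df["ts1"].dt.time.astype(str))

# Shift ts2 back by ts1's time of day, then subtract ts2's own midnight
df["delta_time"] = (df["ts2"] - ts1_time) - df["ts2"].dt.floor("d")
print(df)
```

Because both operands of the final subtraction sit on ts2's calendar, only the time-of-day difference survives.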

You need to set both datetimes to a common date first.
One way is to use pandas.DateOffset:
o = pd.DateOffset(day=1, month=1, year=2022) # the exact numbers don't matter
# reset dates
ts1 = df['ts1'].add(o)
ts2 = df['ts2'].add(o)
# subtract
df['ts_delta'] = ts2.sub(ts1)
As a one-liner (the := walrus operator requires Python 3.8+):
df['ts_delta'] = df['ts2'].add((o:=pd.DateOffset(day=1, month=1, year=2022))).sub(df['ts1'].add(o))
Another way is to take the difference between ts2 - ts1 (full datetimes) and ts2 - ts1 (dates only):
df['ts_delta'] = (df['ts2'].sub(df['ts1'])
-df['ts2'].dt.normalize().sub(df['ts1'].dt.normalize())
)
output:
ts1 ts2 ts_delta
0 2018-07-25 11:14:00 2018-07-27 12:14:00 0 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 -1 days +23:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 -1 days +23:43:00
NB. Don't get confused by -1 days +23:00:00; this is how pandas represents -1 hour.
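If you want the signed H:MM:SS display from the question instead of pandas' "-1 days +23:00:00" form, a small formatting step can be applied afterwards. A minimal sketch, where `format_signed_hms` is a hypothetical helper name and the input is assumed to contain no NaT values:

```python
import pandas as pd

def format_signed_hms(td: pd.Timedelta) -> str:
    """Hypothetical helper: format a Timedelta as [-]H:MM:SS (assumes no NaT)."""
    # Work in whole seconds and keep the sign separately
    total = int(td.total_seconds())
    sign = "-" if total < 0 else ""
    hours, rem = divmod(abs(total), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{sign}{hours}:{minutes:02d}:{seconds:02d}"

# The three deltas from the output above: +1 h, -1 h, -17 min
deltas = pd.Series(pd.to_timedelta([3600, -3600, -1020], unit="s"))
print(deltas.map(format_signed_hms).tolist())
```

The result is an object (string) column, so keep the timedelta column around if you still need to do arithmetic.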

I've tried to simulate your problem in my local environment. Apparently pandas datetime64 types support add/subtract operations, so you don't actually need to access the datetime objects to execute these operations.
I did my experiments as below:
import pandas as pd
df = pd.DataFrame({'a' : ['2018-07-25 11:14:00', '2018-08-26 11:15:00', '2018-07-29 11:17:00'],
'b' : ['2018-07-27 12:14:00', '2018-09-24 10:15:00', '2018-07-22 11:00:00'] })
df['a'] = pd.to_datetime(df['a'])
df['b'] = pd.to_datetime(df['b'])
df['d'] = df['b'] - df['a']
and df looks like this:
a b d
0 2018-07-25 11:14:00 2018-07-27 12:14:00 2 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 28 days 23:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 -8 days +23:43:00

Try this: first strip the time, then put both times on the same day, subtract, and take the absolute value (note that this drops the sign of the difference).
l = lambda x: pd.to_datetime("01-01-1900 " + x)
df["ts_delta"] = (
df["ts2"].dt.time.astype(str).apply(l) - df["ts1"].dt.time.astype(str).apply(l)
).abs()
df
Output:
ts1 ts2 ts_delta
0 2018-07-25 11:14:00 2018-07-27 12:14:00 0 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 0 days 01:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 0 days 00:17:00
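The same absolute-value result can be sketched without the string round-trip through `apply`, by measuring each timestamp's time since midnight with `dt.normalize()`. `time_of_day` is a hypothetical helper name introduced here:

```python
import pandas as pd

df = pd.DataFrame({
    "ts1": pd.to_datetime(["2018-07-25 11:14:00", "2018-08-26 11:15:00", "2018-07-29 11:17:00"]),
    "ts2": pd.to_datetime(["2018-07-27 12:14:00", "2018-09-24 10:15:00", "2018-07-22 11:00:00"]),
})

def time_of_day(s: pd.Series) -> pd.Series:
    """Hypothetical helper: time since midnight as a timedelta64 Series."""
    return s - s.dt.normalize()

# Vectorized: no per-row parsing, and .abs() discards the sign as in the answer
df["ts_delta"] = (time_of_day(df["ts2"]) - time_of_day(df["ts1"])).abs()
print(df)
```

Drop the `.abs()` if you want to keep negative differences.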

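Use df['col_name'].dt.time to get the time from a datetime column. Let's assume your dataframe is named df. Since datetime.time objects cannot be subtracted directly, convert them to timedeltas first:
t1 = pd.to_timedelta(df['ts1'].dt.time.astype(str))
t2 = pd.to_timedelta(df['ts2'].dt.time.astype(str))
df['ts_delta'] = t2 - t1
For a single line:
df['ts_delta'] = pd.to_timedelta(df['ts2'].dt.time.astype(str)) - pd.to_timedelta(df['ts1'].dt.time.astype(str))
I hope this resolves your issue. Happy coding!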
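Note that `.dt.time` yields plain Python `datetime.time` objects (object dtype), and Python does not define subtraction between `time` objects. A small sketch showing the failure and a timedelta-based workaround, using two rows of the question's data:

```python
import pandas as pd

s1 = pd.Series(pd.to_datetime(["2018-07-25 11:14:00", "2018-08-26 11:15:00"]))
s2 = pd.Series(pd.to_datetime(["2018-07-27 12:14:00", "2018-09-24 10:15:00"]))

# Subtracting datetime.time objects directly raises TypeError
try:
    s2.dt.time - s1.dt.time
    subtraction_failed = False
except TypeError:
    subtraction_failed = True

# Converting the times to timedeltas (time since midnight) makes them subtractable
delta = pd.to_timedelta(s2.dt.time.astype(str)) - pd.to_timedelta(s1.dt.time.astype(str))
print(subtraction_failed, delta.tolist())
```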

Related

Pandas datetime - keep time only as dtype datetime

I want the time without the date in Pandas.
I want to keep the time as dtype datetime64[ns] and not as an object so that I can determine periods between times.
The closest I have gotten is as follows, but it gives back the date in a new column not the time as needed as dtype datetime.
df_pres_mf['time'] = pd.to_datetime(df_pres_mf['time'], format ='%H:%M', errors = 'coerce') # returns date (1900-01-01) and actual time as a dtype datetime64[ns] format
df_pres_mf['just_time'] = df_pres_mf['time'].dt.date
df_pres_mf['normalised_time'] = df_pres_mf['time'].dt.normalize()
df_pres_mf.head()
Returns the date as 1900-01-01 and not the time that is needed.
Edit: Data
time
1900-01-01 11:16:00
1900-01-01 15:20:00
1900-01-01 09:55:00
1900-01-01 12:01:00
You could do it as Vishnudev suggested, but then you would have dtype: object (or even strings, after using dt.strftime), which you said you didn't want.
A time-only datetime64 dtype doesn't exist in pandas, but the closest thing I can offer is converting to timedeltas. This may not seem like a solution at first, but it is actually very useful.
Convert it like this:
# sample df
df
>>
time
0 2021-02-07 09:22:00
1 2021-05-10 19:45:00
2 2021-01-14 06:53:00
3 2021-05-27 13:42:00
4 2021-01-18 17:28:00
df["timed"] = df.time - df.time.dt.normalize()
df
>>
time timed
0 2021-02-07 09:22:00 0 days 09:22:00 # this is just the time difference
1 2021-05-10 19:45:00 0 days 19:45:00 # since midnight, which is essentially the
2 2021-01-14 06:53:00 0 days 06:53:00 # same thing as regular time, except
3 2021-05-27 13:42:00 0 days 13:42:00 # that you can go over 24 hours
4 2021-01-18 17:28:00 0 days 17:28:00
This allows you to calculate periods between times like this:
# subtract the last time from the current
df["difference"] = df.timed - df.timed.shift()
df
Out[48]:
time timed difference
0 2021-02-07 09:22:00 0 days 09:22:00 NaT
1 2021-05-10 19:45:00 0 days 19:45:00 0 days 10:23:00
2 2021-01-14 06:53:00 0 days 06:53:00 -1 days +11:08:00 # <-- this is because the last
3 2021-05-27 13:42:00 0 days 13:42:00 0 days 06:49:00 # time was later than the current
4 2021-01-18 17:28:00 0 days 17:28:00 0 days 03:46:00 # (see below)
To get rid of the odd negative differences, take the absolute value:
df["abs_difference"] = df.difference.abs()
df
>>
time timed difference abs_difference
0 2021-02-07 09:22:00 0 days 09:22:00 NaT NaT
1 2021-05-10 19:45:00 0 days 19:45:00 0 days 10:23:00 0 days 10:23:00
2 2021-01-14 06:53:00 0 days 06:53:00 -1 days +11:08:00 0 days 12:52:00 ### <<--
3 2021-05-27 13:42:00 0 days 13:42:00 0 days 06:49:00 0 days 06:49:00
4 2021-01-18 17:28:00 0 days 17:28:00 0 days 03:46:00 0 days 03:46:00
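A self-contained sketch of the whole timedelta workflow above, rebuilding the sample timestamps. `Series.diff()` is used here as shorthand for `df.timed - df.timed.shift()`; it computes the same row-to-row difference:

```python
import pandas as pd

df = pd.DataFrame({"time": pd.to_datetime([
    "2021-02-07 09:22:00", "2021-05-10 19:45:00", "2021-01-14 06:53:00",
    "2021-05-27 13:42:00", "2021-01-18 17:28:00",
])})

# Time since midnight, stored as timedelta64 instead of a time-only dtype
df["timed"] = df["time"] - df["time"].dt.normalize()

# Difference to the previous row (first row has no predecessor -> NaT)
df["difference"] = df["timed"].diff()

# Absolute value turns e.g. -1 days +11:08:00 into 0 days 12:52:00
df["abs_difference"] = df["difference"].abs()
print(df)
```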
Use the proper format string for your date format and convert to datetime:
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
Then format it as preferred:
df['time'].dt.strftime('%H:%M')
Output
0 11:16
1 15:20
2 09:55
3 12:01
Name: time, dtype: object

Pandas: How to set hour of a datetime from another column?

I have a dataframe with a datetime column for the date and a column for the hour, like this:
min hour date
0 0 2020-12-01
1 5 2020-12-02
2 6 2020-12-01
I need a datetime column that combines the date and the hour, like this:
min hour date datetime
0 0 2020-12-01 2020-12-01 00:00:00
0 5 2020-12-02 2020-12-02 05:00:00
0 6 2020-12-01 2020-12-01 06:00:00
How can I do it?
Use pd.to_datetime and pd.to_timedelta:
In [393]: df['date'] = pd.to_datetime(df['date'])
In [396]: df['datetime'] = df['date'] + pd.to_timedelta(df['hour'], unit='h')
In [405]: df
Out[405]:
min hour date datetime
0 0 0 2020-12-01 2020-12-01 00:00:00
1 1 5 2020-12-02 2020-12-02 05:00:00
2 2 6 2020-12-01 2020-12-01 06:00:00
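A self-contained version of the `pd.to_timedelta` approach, rebuilding the sample frame from the answer's output (the `min` values are taken from that output):

```python
import pandas as pd

df = pd.DataFrame({
    "min": [0, 1, 2],
    "hour": [0, 5, 6],
    "date": ["2020-12-01", "2020-12-02", "2020-12-01"],
})

# Parse the date strings, then add the hour column as a timedelta in hours
df["date"] = pd.to_datetime(df["date"])
df["datetime"] = df["date"] + pd.to_timedelta(df["hour"], unit="h")
print(df)
```

This stays vectorized, so it avoids the per-row `apply` used in the alternatives below.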
You could also try using apply and np.timedelta64:
import numpy as np
df['datetime'] = df['date'] + df['hour'].apply(lambda x: np.timedelta64(x, 'h'))
print(df)
Output:
min hour date datetime
0 0 0 2020-12-01 2020-12-01 00:00:00
1 1 5 2020-12-02 2020-12-02 05:00:00
2 2 6 2020-12-01 2020-12-01 06:00:00
In the question, the data types of the columns are not clear, so I assumed they are Python date objects (not pandas datetimes) and that the datetime version is wanted.
If that is the case, the solution is similar to the previous one, but uses a different constructor:
from datetime import datetime
df['datetime'] = df.apply(lambda x: datetime(x['date'].year, x['date'].month, x['date'].day, int(x['hour']), int(x['min'])), axis=1)

How to find the datetime difference between rows in a column, based on the condition?

I have the following pandas DataFrame df:
date time val1
2018-12-31 09:00:00 15
2018-12-31 10:00:00 22
2018-12-31 11:00:00 19
2018-12-31 11:30:00 10
2018-12-31 11:45:00 5
2018-12-31 12:00:00 1
2018-12-31 12:05:00 6
I want to find how many minutes are between the val1 value that is greater than 20 and the val1 value that is lower than or equal to 5?
In this example, the answer is 1 hour and 45 minutes = 105 minutes.
I know how to check the difference between two datetime values:
(df.from_datetime-df.to_datetime).astype('timedelta64[m]')
But how to slice it over the DataFrame, detecting the proper rows?
UPDATE: Taking into consideration that date might be different
Convert the date column to a datetime object and the time column to a timedelta object, then combine them to get another datetime object:
df.time = pd.to_timedelta(df.time)
df.date = pd.to_datetime(df.date)
df['date_time'] = df['date'] + df['time']
df
date time val1 date_time
0 2018-12-31 09:00:00 15 2018-12-31 09:00:00
1 2018-12-31 10:00:00 22 2018-12-31 10:00:00
2 2018-12-31 11:00:00 19 2018-12-31 11:00:00
3 2018-12-31 11:30:00 10 2018-12-31 11:30:00
4 2018-12-31 11:45:00 5 2018-12-31 11:45:00
5 2018-12-31 12:00:00 1 2018-12-31 12:00:00
6 2018-12-31 12:05:00 6 2018-12-31 12:05:00
Now you could use either of these two methods:
1) I love lambdas, and this works with Series objects:
import numpy as np
subtr = lambda d1, d2: abs(d1 - d2)/np.timedelta64(1, 'm')
d20 = df[df.val1 > 20].date_time.iloc[0]
d5 = df[df.val1 <= 5].date_time.iloc[0]
subtr(d20, d5)
105.0
2) This needs DataFrame objects instead of Series objects, which offends my aesthetics:
d20 = df[df.val1 > 20][['date_time']].iloc[0]
d5 = df[df.val1 <= 5][['date_time']].iloc[0]
(abs(d5 - d20) / np.timedelta64(1, 'm')).iloc[0]
105.0
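The question's `astype('timedelta64[m]')` relies on older pandas behavior; as far as I know, pandas 2.x no longer converts a timedelta to whole minutes that way. A self-contained sketch of method 1 that divides by `pd.Timedelta(minutes=1)` instead, rebuilding the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2018-12-31"] * 7,
    "time": ["09:00:00", "10:00:00", "11:00:00", "11:30:00", "11:45:00", "12:00:00", "12:05:00"],
    "val1": [15, 22, 19, 10, 5, 1, 6],
})

# Combine the date and time columns into one datetime column
df["date_time"] = pd.to_datetime(df["date"]) + pd.to_timedelta(df["time"])

# First timestamp with val1 > 20 and first timestamp with val1 <= 5
d20 = df.loc[df["val1"] > 20, "date_time"].iloc[0]
d5 = df.loc[df["val1"] <= 5, "date_time"].iloc[0]

# Dividing a Timedelta by a one-minute Timedelta yields a float number of minutes
minutes = abs(d5 - d20) / pd.Timedelta(minutes=1)
print(minutes)
```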
So this is my approach:
1) Filter out any val1 that is not >= 20 or <= 5
import numpy as np
import pandas as pd

df = pd.DataFrame({'date':['2018-12-31','2018-12-31','2018-12-31','2018-12-31','2018-12-31','2018-12-31','2018-12-31'],
'time':['09:00:00', '10:00:00', '11:00:00', '11:30:00', '11:45:00', '12:00:00', '12:05:00'],
'val1': [15,22,19,10,5,1,6]})
df2 = df[(df['val1'] >= 20)|(df['val1'] <= 5)].copy()
Then we will do the following code:
df2['TimeDiff'] = np.where(df2['val1'] - df2['val1'].shift(-1) >= 15,
df2['time'].astype('datetime64[ns]').shift(-1) - df2['time'].astype('datetime64[ns]'),
np.timedelta64('NaT'))  # missing timedelta; np.NaN no longer exists in NumPy 2.x
Let me go through this.
np.where works like a vectorized if statement: where the first argument (the condition) is true it takes the value from the second argument, otherwise from the third.
df2['val1'] - df2['val1'].shift(-1) >= 15: since we filtered the df, a drop from a value >= 20 to a value <= 5 must be at least 15.
If it is true:
df2['time'].astype('datetime64[ns]').shift(-1) - df2['time'].astype('datetime64[ns]'): we subtract the beginning time from the later time.
If not true, we return a missing value, which shows up as NaT.
We get a df that looks like the following:
date time val1 TimeDiff
1 2018-12-31 10:00:00 22 01:45:00
4 2018-12-31 11:45:00 5 NaT
5 2018-12-31 12:00:00 1 NaT
If you want to put the TimeDiff on the end time you can do the following:
df2['TimeDiff'] = np.where(df2['val1'] - df2['val1'].shift(1) <= -15,
df2['time'].astype('datetime64[ns]') - df2['time'].astype('datetime64[ns]').shift(),
np.timedelta64('NaT'))
and you will get:
date time val1 TimeDiff
1 2018-12-31 10:00:00 22 NaT
4 2018-12-31 11:45:00 5 01:45:00
5 2018-12-31 12:00:00 1 NaT

Reorder timestamps pandas

I have a pandas column that contain timestamps that are unordered. When I sort them it works fine except for the values H:MM:SS.
import pandas as pd

d = ({
'A' : ['8:00:00','9:00:00','10:00:00','20:00:00','24:00:00','26:20:00'],
})
df = pd.DataFrame(data=d)
df = df.sort_values(by='A',ascending=True)
Out:
A
2 10:00:00
3 20:00:00
4 24:00:00
5 26:20:00
0 8:00:00
1 9:00:00
Ideally, I'd like to add a leading zero to the 7-character strings (H:MM:SS). If I convert them all to timedelta, it converts the times after midnight into 1 day plus n hours, e.g.
df['A'] = pd.to_timedelta(df['A'])
A
0 0 days 08:00:00
1 0 days 09:00:00
2 0 days 10:00:00
3 0 days 20:00:00
4 1 days 00:00:00
5 1 days 02:20:00
Intended Output:
A
0 08:00:00
1 09:00:00
2 10:00:00
3 20:00:00
4 24:00:00
5 26:20:00
If you only need to sort by the column as timedelta, you can convert the column to timedelta and use argsort on it to create the sorting order to sort the data frame:
df.iloc[pd.to_timedelta(df.A).argsort()]
# A
#0 8:00:00
#1 9:00:00
#2 10:00:00
#3 20:00:00
#4 24:00:00
#5 26:20:00
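The question's own idea of adding a leading zero also works for producing the intended string output: once every value has the same width, plain string sorting matches chronological order. A sketch using `Series.str.zfill`, with the question's values shuffled for illustration and assuming hours never exceed two digits:

```python
import pandas as pd

# The question's values, shuffled so the sort has something to do
df = pd.DataFrame({"A": ["10:00:00", "8:00:00", "26:20:00", "9:00:00", "24:00:00", "20:00:00"]})

# Left-pad to a fixed width of 8 characters (HH:MM:SS)
df["A"] = df["A"].str.zfill(8)

# Fixed-width strings sort lexicographically in time order
df = df.sort_values(by="A").reset_index(drop=True)
print(df)
```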

Convert and order time in a pandas df

I am trying to order timestamps in a pandas df. The times begin around 08:00:00 am and finish around 3:00:00 am. I'd like to add 24 hours to times after midnight, so that the times read 08:00:00 to 27:00:00. The problem is that the times aren't ordered.
Example:
import pandas as pd
d = ({
'time' : ['08:00:00 am','12:00:00 pm','16:00:00 pm','20:00:00 pm','2:00:00 am','13:00:00 pm','3:00:00 am'],
})
df = pd.DataFrame(data=d)
If I try to order the times via
df = pd.DataFrame(data=d)
df['time'] = pd.to_timedelta(df['time'])
df = df.sort_values(by='time',ascending=True)
Out:
time
4 02:00:00
6 03:00:00
0 08:00:00
1 12:00:00
5 13:00:00
2 16:00:00
3 20:00:00
Whereas I'm hoping the output is:
time
0 08:00:00
1 12:00:00
2 13:00:00
3 16:00:00
4 20:00:00
5 26:00:00
6 27:00:00
I'm not sure whether this can be done, though. Specifically, whether I can differentiate between 8:00:00 am and the times after midnight (1am-3am).
Add a day offset to times after midnight but before the point where a new "day" is supposed to begin (pick some time after 3 am and before 7 am), then sort the values:
cutoff, day = pd.to_timedelta(['3.5H', '24H'])
df.time.apply(lambda x: x if x > cutoff else x + day).sort_values().reset_index(drop=True)
# Out:
0 0 days 08:00:00
1 0 days 12:00:00
2 0 days 13:00:00
3 0 days 16:00:00
4 0 days 20:00:00
5 1 days 02:00:00
6 1 days 03:00:00
The last two values are numerically equal to 26 hours & 27 hours, just displayed differently.
If you need them in HH:MM:SS format, use string formatting with the appropriate timedelta components.
For example:
x = df.time.apply(lambda x: x if x > cutoff else x + day).sort_values().reset_index(drop=True).dt.components
x.apply(lambda x: '{:02d}:{:02d}:{:02d}'.format(x.days*24+x.hours, x.minutes, x.seconds), axis=1)
#Out:
0 08:00:00
1 12:00:00
2 13:00:00
3 16:00:00
4 20:00:00
5 26:00:00
6 27:00:00
dtype: object
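A self-contained sketch of the cutoff approach. It assumes the times are 24-hour strings without the am/pm suffixes (hypothetical clean-up of the question's data), and swaps the per-element `apply` for the vectorized `Series.where`, which keeps values where the condition holds and takes the alternative elsewhere:

```python
import pandas as pd

# Hypothetical sample: the question's times as 24-hour strings, no am/pm suffix
times = pd.to_timedelta(pd.Series(
    ["08:00:00", "12:00:00", "16:00:00", "20:00:00", "02:00:00", "13:00:00", "03:00:00"]
))

cutoff = pd.Timedelta(hours=3.5)  # assumed boundary where a new "day" begins
day = pd.Timedelta(days=1)

# Times at or before the cutoff belong to the previous day: push them past 24 h
shifted = times.where(times > cutoff, times + day)
ordered = shifted.sort_values().reset_index(drop=True)

# Format as HH:MM:SS, letting the hour count exceed 24
secs = ordered.dt.total_seconds().astype(int).tolist()
labels = [f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}" for s in secs]
print(labels)
```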
