How can I detect whether the time difference is negative in the code below? My data is a pandas DataFrame.
data['starttime'] = pd.to_datetime(data.starttime, format = '%H:%M:%S.%f') #2014-10-28 21:39:52.654394
data['endtime'] = pd.to_datetime(data.endtime, format = '%H:%M:%S.%f') #2014-10-28 21:37:18.793405
if (data.endtime - data.starttime) < 0: #-1 days +23:57:26.139011
    data['timediff'] = (data.endtime - data.starttime)
The code above does not detect whether the time difference is negative; instead it throws this error:
TypeError: Invalid comparison between dtype=timedelta64[ns] and int
data.endtime - data.starttime
gives you a timedelta object. You can't compare that directly to an integer, but you should be able to do:
duration = data.endtime - data.starttime
if duration.total_seconds() < 0:
    data['timediff'] = duration
Or compare the two datetime objects directly with something like
if data.endtime < data.starttime:  # end before start means a negative difference
    data['timediff'] = data.endtime - data.starttime
Note: the logic here assumes that data.endtime and data.starttime are single datetime objects. If they represent an array or array-like of datetime objects (e.g. a DataFrame with more than one row), you will need to iterate over them instead.
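For a DataFrame with more than one row, pandas can also do the check column-wise without a Python loop. The sketch below is one way to do it, assuming the same starttime/endtime columns; the first row reuses the two timestamps from the question's comments and the second row is made-up sample data. It compares the timedelta column against pd.Timedelta(0) rather than the integer 0, and uses Series.where to keep only the negative differences (other rows become NaT).

```python
import pandas as pd

# Sample data: row 0 uses the timestamps from the question's comments,
# row 1 is hypothetical and has a positive difference.
data = pd.DataFrame({
    "starttime": ["21:39:52.654394", "10:00:00.000000"],
    "endtime": ["21:37:18.793405", "10:05:30.500000"],
})
data["starttime"] = pd.to_datetime(data["starttime"], format="%H:%M:%S.%f")
data["endtime"] = pd.to_datetime(data["endtime"], format="%H:%M:%S.%f")

# Element-wise difference: a Series of timedelta64[ns] values
duration = data["endtime"] - data["starttime"]

# Compare against a zero Timedelta instead of the integer 0
is_negative = duration < pd.Timedelta(0)

# Keep the difference only where it is negative; other rows get NaT
data["timediff"] = duration.where(is_negative)

print(data[["timediff"]])
print("any negative:", bool(is_negative.any()))
```

An equivalent mask is duration.dt.total_seconds() < 0, which mirrors the single-value answer above.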
How do I convert a string to a timedelta in order to create a new column in my DataFrame?
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
pricetime = pd.DataFrame({'price1':[22.34, 44.68, 52.98], 'time1':['9:48:14', '15:54:33', '13:13:22'],'price2':[28.88, 47.68, 22.32], 'time2':['10:52:44', '15:59:59', '10:12:22']})
pricetime['price_change'] = np.where(pricetime['time1'] < pricetime['time2'], (pricetime['price1'] - pricetime['price2'])/pricetime['price2'], np.nan)
pricetime['time_diff'] = np.where(pricetime['time1'] < pricetime['time2'], pricetime['time2'] - pricetime['time1'], np.nan)
When I do this, I get an error on the line where I subtract the two times.
I also tried this, but it gave me an error too:
pricetime['price_change'] = np.where((datetime.strptime(pricetime['time1'], '%H:%M:%S') < datetime.strptime(pricetime['time2'], '%H:%M:%S')), (pricetime['price1'] - pricetime['price2'])/pricetime['price2'], np.nan)
pricetime['time_diff'] = np.where((datetime.strptime(pricetime['time1'], '%H:%M:%S') < datetime.strptime(pricetime['time2'], '%H:%M:%S')), datetime.strptime(pricetime['time2'], '%H:%M:%S') - datetime.strptime(pricetime['time1'], '%H:%M:%S'), np.nan)
The error it gave is:
TypeError: strptime() argument 1 must be str, not Series
After a discussion with @Marc_Law, the answer he was looking for is:
pricetime['time_diff'] = pd.to_datetime(pricetime['time2']) - pd.to_datetime(pricetime['time1'])
pricetime.loc[pd.to_datetime(pricetime['time1']) >= pd.to_datetime(pricetime['time2']),'time_diff'] = np.nan
pricetime['time_diff'] = pricetime['time_diff'].apply(lambda x: str(x).split(' ')[-1])
What he needed was the difference only when the value in the time1 column is smaller than the value in the time2 column, and np.nan otherwise, and then to convert the result back to a string without the "X days" prefix.
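Since the question asks how to turn the strings into timedeltas, a possible variant is to parse the time-of-day strings directly with pd.to_timedelta, parse each column only once, and let Series.where apply the time1 < time2 condition. This is a sketch of an alternative, not the answer that was accepted; note that rows failing the condition end up as NaT (pandas' missing value for timedeltas) rather than np.nan.

```python
import numpy as np
import pandas as pd

pricetime = pd.DataFrame({
    'price1': [22.34, 44.68, 52.98], 'time1': ['9:48:14', '15:54:33', '13:13:22'],
    'price2': [28.88, 47.68, 22.32], 'time2': ['10:52:44', '15:59:59', '10:12:22'],
})

# Parse "H:MM:SS" strings into timedeltas (durations since midnight), once per column
t1 = pd.to_timedelta(pricetime['time1'])
t2 = pd.to_timedelta(pricetime['time2'])
later = t1 < t2  # boolean mask: time1 is earlier than time2

pricetime['price_change'] = np.where(
    later, (pricetime['price1'] - pricetime['price2']) / pricetime['price2'], np.nan)

# Difference only where time1 < time2; NaT elsewhere
pricetime['time_diff'] = (t2 - t1).where(later)
print(pricetime)
```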
If you only want to find the difference between two times, you can follow this example:
from datetime import datetime
foo = '9:48:14'
bar = '15:54:33'
foo = datetime.strptime(foo, '%H:%M:%S')
bar = datetime.strptime(bar, '%H:%M:%S')
print(bar - foo)
Output
6:06:19
I have a list of dates which are mostly consecutive, for example:
['01-Jan-10', '02-Jan-10', '03-Jan-10', '04-Jan-10', '08-Jan-10', '09-Jan-10', '10-Jan-10', '11-Jan-10', '13-Jan-10']
This is just an illustration as the full list contains thousands of dates.
This list can have a couple of spots where the consecutive run breaks. In the example above, the gaps are 05-Jan-10 to 07-Jan-10, and 12-Jan-10. I am looking for the minimum and maximum day of each gap. Is there an efficient way to do this in Python?
The datetime package from the standard library can be useful.
Determine the right date format and parse every item in the list with strptime, then loop through consecutive pairs and check the difference between them (in days) using timedelta arithmetic. To keep the same (non-standard) output format, apply strftime.
from datetime import datetime, timedelta
dates = ['01-Jan-10', '02-Jan-10', '03-Jan-10', '04-Jan-10', '08-Jan-10', '09-Jan-10', '10-Jan-10', '11-Jan-10', '13-Jan-10']
# date format code
date_format = '%d-%b-%y'
# cast to datetime objects
days = list(map(lambda d: datetime.strptime(d, date_format).date(), dates))
# check consecutive days
for d1, d2 in zip(days, days[1:]):
    date_gap = (d2 - d1).days
    # check consecutiveness
    if date_gap > 1:
        # compute day boundary of the gap
        min_day_gap, max_day_gap = d1 + timedelta(days=1), d2 - timedelta(days=1)
        # apply format
        min_day_gap = min_day_gap.strftime(date_format)
        max_day_gap = max_day_gap.strftime(date_format)
        # check
        print(min_day_gap, max_day_gap)
#05-Jan-10 07-Jan-10
#12-Jan-10 12-Jan-10
Remark: it is not clear what should happen when the gap is 2 days; in that case the minimum and maximum days of the gap are identical. If that case needs different handling, add a conditional check on date_gap == 2 and adjust the behavior:
if date_gap == 2:
    ...  # exactly one missing day
elif date_gap > 1:
    ...  # several missing days
or add a comment or edit the question with a clearer description.
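If a one-day gap should be reported differently, one option is to collect the gaps instead of printing them and treat date_gap == 2 as its own case. The helper name find_gaps and the choice to return a single string for a one-day gap are illustrative assumptions, not part of the answer above. After parsing, the loop visits each adjacent pair once, so it runs in linear time, assuming the input list is already sorted as in the question.

```python
from datetime import datetime, timedelta

DATE_FORMAT = '%d-%b-%y'


def find_gaps(dates, date_format=DATE_FORMAT):
    """Hypothetical helper: return the missing stretches in a sorted list of date strings.

    A multi-day gap is returned as a (first_missing, last_missing) tuple;
    a single missing day is returned as a plain string.
    """
    days = [datetime.strptime(d, date_format).date() for d in dates]
    gaps = []
    for d1, d2 in zip(days, days[1:]):
        date_gap = (d2 - d1).days
        if date_gap == 2:
            # exactly one day is missing
            gaps.append((d1 + timedelta(days=1)).strftime(date_format))
        elif date_gap > 2:
            first = (d1 + timedelta(days=1)).strftime(date_format)
            last = (d2 - timedelta(days=1)).strftime(date_format)
            gaps.append((first, last))
    return gaps


dates = ['01-Jan-10', '02-Jan-10', '03-Jan-10', '04-Jan-10', '08-Jan-10',
         '09-Jan-10', '10-Jan-10', '11-Jan-10', '13-Jan-10']
print(find_gaps(dates))
```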
I am using the awful datetime library and I am trying to do something that should be very easy. I have a collection of timestamps in my video file, and I simply want to subtract start_time from end_time for each entry, then sum them all and output the total time of the video file. The data in my video file looks like this:
<p begin="00:02:42.400" end="00:02:43.080" style="s2">product_1</p>
So my code is:
start_time = dt.strptime(begin, '%H:%M:%S.%f')
endie_time = dt.strptime(end, '%H:%M:%S.%f')
diff += endie_time-start_time
What I am trying to do is keep adding to diff.
I get this error:
UnboundLocalError: local variable 'diff' referenced before assignment
I think the error is because diff is a datetime object and not an integer, but when I do int(diff), nothing works.
How can I do this simple task? I appreciate any help I can get on this annoying problem.
Thanks
The fundamental issue here is that the datetime module deals with real-world wall clock times, whereas you're trying to deal with durations. The only class in the datetime module that really fits your problem is therefore timedelta, which expresses durations. To parse your strings into a timedelta, you'll need to do it somewhat manually:
>>> from datetime import timedelta
>>> h, m, s = '00:02:43.080'.split(':')
>>> timedelta(hours=int(h), minutes=int(m), seconds=float(s))
datetime.timedelta(seconds=163, microseconds=80000)
If you now have two such timedeltas, you can subtract them:
>>> end - start
datetime.timedelta(microseconds=680000)
And you can add them to an existing timedelta:
diff = timedelta()
diff += end - start
Complete example:
from datetime import timedelta
diff = timedelta()
def parse_ts(ts: str) -> timedelta:
    # "HH:MM:SS.fff" -> timedelta (a duration, not a wall-clock time)
    h, m, s = ts.split(':')
    return timedelta(hours=int(h), minutes=int(m), seconds=float(s))
timestamps = [('00:02:42.400', '00:02:43.080')]  # ... add the remaining (begin, end) pairs here
for start, end in timestamps:
    diff += parse_ts(end) - parse_ts(start)
print(diff)
As the comments on the original question say, using
X += Y
requires that X has already been defined.
A possible fix would be:
import datetime as dt
diff = dt.timedelta(0) # Initialize the diff with 0
start_time = dt.datetime.strptime(begin, '%H:%M:%S.%f')
endie_time = dt.datetime.strptime(end, '%H:%M:%S.%f')
diff += endie_time-start_time # Accumulate the time difference in diff
Since it seems that you want to iterate over multiple start/end timestamps:
import datetime as dt
diff = dt.timedelta(0) # Initialize the diff with 0
for begin_i, end_i in zip(begin, end):
    start_time = dt.datetime.strptime(begin_i, '%H:%M:%S.%f')
    endie_time = dt.datetime.strptime(end_i, '%H:%M:%S.%f')
    diff += endie_time - start_time # Accumulate the time difference in diff
In both cases above, diff will be of the dt.timedelta type.
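Putting the pieces together for the data shown in the question: the sketch below pulls the begin/end attributes out of the <p ...> lines with a regular expression and sums the durations with timedelta, starting from timedelta() so nothing is referenced before assignment. It assumes every subtitle line has begin before end, as in the sample line; the second line and the variable names are made up for illustration. For real subtitle files, an XML parser such as xml.etree.ElementTree would be more robust than a regex.

```python
import re
from datetime import timedelta


def parse_ts(ts: str) -> timedelta:
    # "HH:MM:SS.fff" -> timedelta (duration, not wall-clock time)
    h, m, s = ts.split(':')
    return timedelta(hours=int(h), minutes=int(m), seconds=float(s))


# Sample input: the first line is from the question, the second is hypothetical.
subtitles = '''
<p begin="00:02:42.400" end="00:02:43.080" style="s2">product_1</p>
<p begin="00:02:44.000" end="00:02:46.500" style="s2">product_2</p>
'''

# Assumes the begin attribute appears before end in every <p> tag
pairs = re.findall(r'begin="([^"]+)"\s+end="([^"]+)"', subtitles)

# sum() needs a timedelta start value; by default it would start from the int 0
total = sum((parse_ts(end) - parse_ts(begin) for begin, end in pairs), timedelta())
print(total)
```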
I am looking to compare two DataFrames, end1 and tt1. What I want to do is see whether an event in tt1 influences an event in end1 at roughly the same time.
When I try to create a simple loop to look for events at roughly the same time, I get the error message:
ValueError: Can only compare identically-labeled Series objects
end1['end_date'] = pd.to_datetime(end1['end_date'], format = '%Y/%m/%d %H:%M')
tt1['Minstart'] = pd.to_datetime(tt1['Minstart'], format = '%Y/%m/%d %H:%M')
tt1['Maxstart'] = pd.to_datetime(tt1['Maxstart'], format = '%Y/%m/%d %H:%M')
for index, row in end1.iterrows():
    if end1['end_date'] > tt1['Minstart']:
        if end1['end_date'] < tt1['Maxstart']:
            d = end1.count(end1.end_date)
            print(d)
Both columns are of type:
pandas.core.series.Series
Thank you
It seems the int_overlaps function from the lubridate package makes it possible to know whether two date intervals overlap, and then I am able to choose the events where int_overlaps is TRUE.
However, not all overlaps appear as TRUE, and I don't understand why.
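The ValueError in the pandas loop comes from comparing the whole end1['end_date'] column against whole tt1 columns: pandas only compares two Series element-wise when their index labels are identical, and the loop never uses row. One way to check every end_date against every [Minstart, Maxstart] window at once is NumPy broadcasting. The sketch below uses the question's column names with made-up timestamps, keeps the question's strict > and < comparisons, and the matches column name is hypothetical. It builds a len(end1) × len(tt1) boolean matrix, so it suits moderately sized tables.

```python
import pandas as pd

fmt = '%Y/%m/%d %H:%M'
# Hypothetical sample data using the question's column names
end1 = pd.DataFrame({'end_date': pd.to_datetime(
    ['2020/01/01 10:05', '2020/01/01 12:00', '2020/01/01 14:15'], format=fmt)})
tt1 = pd.DataFrame({
    'Minstart': pd.to_datetime(['2020/01/01 10:00', '2020/01/01 14:00'], format=fmt),
    'Maxstart': pd.to_datetime(['2020/01/01 10:10', '2020/01/01 14:30'], format=fmt),
})

# Column vector of end dates vs. row vectors of window bounds -> 2-D comparison
ends = end1['end_date'].to_numpy()[:, None]   # shape (len(end1), 1)
lo = tt1['Minstart'].to_numpy()[None, :]      # shape (1, len(tt1))
hi = tt1['Maxstart'].to_numpy()[None, :]
inside = (ends > lo) & (ends < hi)            # True where end_date falls strictly inside a window

# Number of tt1 windows each end1 event falls into
end1['matches'] = inside.sum(axis=1)
print(end1)
print('events with at least one match:', int(inside.any(axis=1).sum()))
```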
I have data from multiple sets of instrumentation, with slight variability in timing, in CSV files. I want to standardise the timestamps to the nearest 15-minute period so that I can create a DataFrame with data from all instruments aligned to the same time intervals.
I do my modulus arithmetic with:
def timeround(dt, multiple):
    import datetime as dt2
    # set the interval required, e.g.
    # multiple = 15
    a, b = divmod(round(dt.minute, -0), multiple)
    a = a * multiple
    if b >= 8:
        a = a + multiple
    outputdelta = dt2.timedelta(hours=dt.hour, minutes=a, seconds=0)
    # output the rounded time of day as a timedelta (not a datetime)
    return outputdelta
This works fine with dt entered as
dt2 = dt.datetime.now()
but with a value of dtype object it raises ValueError: invalid literal for int() with base 10: '02-06-2012 18:15:50'
Is there a method to convert that object to a datetime object, and can I do this through a call to read_csv?
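One possible approach: the values read from the CSV are plain strings, so parse them with pd.to_datetime and let pandas do the rounding with Series.dt.round('15min'). A minimal sketch follows; the column names timestamp and value, the inline CSV text, and the day-month-year reading of '02-06-2012' are assumptions. Note that dt.round also takes seconds into account, so results can differ from timeround's minute-only rule near the 7.5-minute boundary (e.g. 18:07:40 rounds up to 18:15 here, but down to 18:00 with timeround).

```python
import io

import pandas as pd

# Hypothetical CSV contents; real files would be passed to read_csv by path
csv_text = """timestamp,value
02-06-2012 18:07:29,1.0
02-06-2012 18:15:50,2.0
02-06-2012 18:23:10,3.0
"""

df = pd.read_csv(io.StringIO(csv_text))
# read_csv leaves the column as strings (dtype object); parse it explicitly.
# Assumes day-month-year order.
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d-%m-%Y %H:%M:%S')

# Round each timestamp to the nearest 15-minute boundary
df['slot'] = df['timestamp'].dt.round('15min')
print(df)
```

read_csv's parse_dates option can also parse the column during loading; the explicit to_datetime call with a format string just makes the day/month order unambiguous.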