Python: converting time from string to timedelta to find the time difference

How do I convert a string to a timedelta in order to create a new column in my DataFrame?
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
pricetime = pd.DataFrame({'price1':[22.34, 44.68, 52.98], 'time1':['9:48:14', '15:54:33', '13:13:22'],'price2':[28.88, 47.68, 22.32], 'time2':['10:52:44', '15:59:59', '10:12:22']})
pricetime['price_change'] = np.where(pricetime['time1'] < pricetime['time2'], (pricetime['price1'] - pricetime['price2'])/pricetime['price2'], np.nan)
pricetime['time_diff'] = np.where(pricetime['time1'] < pricetime['time2'], pricetime['time2'] - pricetime['time1'], np.nan)
When I do this, I get an error on the line where I subtract the two times.
I also tried the following, but it gave me an error:
pricetime['price_change'] = np.where((datetime.strptime(pricetime['time1'], '%H:%M:%S') < datetime.strptime(pricetime['time2'], '%H:%M:%S')), (pricetime['price1'] - pricetime['price2'])/pricetime['price2'], np.nan)
pricetime['time_diff'] = np.where((datetime.strptime(pricetime['time1'], '%H:%M:%S') < datetime.strptime(pricetime['time2'], '%H:%M:%S'), datetime.strptime(pricetime['time2'], '%H:%M:%S') - datetime.strptime(pricetime['time1'], '%H:%M:%S'), np.nan)
The error it gave is:
TypeError: strptime() argument 1 must be str, not Series

After a discussion with @Marc_Law, the answer he was looking for is:
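t1 = pd.to_datetime(pricetime['time1'], format='%H:%M:%S')  # the date part defaults to 1900-01-01
t2 = pd.to_datetime(pricetime['time2'], format='%H:%M:%S')
pricetime['time_diff'] = t2 - t1
pricetime.loc[t1 >= t2, 'time_diff'] = np.nan  # keep the difference only where time1 < time2
# Strip the "X days" prefix; leave NaN (instead of the string 'NaT') where there is no difference
pricetime['time_diff'] = pricetime['time_diff'].apply(lambda x: str(x).split(' ')[-1] if pd.notna(x) else np.nan)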
What he needed was the difference only where the value in the time1 column is smaller than the value in the time2 column (otherwise np.nan), converted back to a string without the "X days" prefix.
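A minimal alternative sketch (not the answer the asker settled on): `pd.to_timedelta` parses "H:MM:SS" strings directly into timedeltas, which avoids the dummy 1900-01-01 date and lets the times be compared as durations. This matters because comparing the raw strings, as the question's `np.where` does, is lexicographic, so '9:48:14' < '10:52:44' is False. The mask name `earlier` is just an illustrative choice.

```python
import pandas as pd

pricetime = pd.DataFrame({'price1': [22.34, 44.68, 52.98],
                          'time1': ['9:48:14', '15:54:33', '13:13:22'],
                          'price2': [28.88, 47.68, 22.32],
                          'time2': ['10:52:44', '15:59:59', '10:12:22']})

# Parse the "H:MM:SS" strings straight into timedelta64 values
t1 = pd.to_timedelta(pricetime['time1'])
t2 = pd.to_timedelta(pricetime['time2'])

# Boolean mask: only keep results where time1 is earlier than time2
earlier = t1 < t2
pricetime['price_change'] = ((pricetime['price1'] - pricetime['price2'])
                             / pricetime['price2']).where(earlier)  # NaN where False
pricetime['time_diff'] = (t2 - t1).where(earlier)  # NaT where False
print(pricetime[['time_diff', 'price_change']])
```

`Series.where` replaces the `np.where` calls from the question and keeps the timedelta dtype, so the result can still be formatted as a string afterwards if needed.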

If you only want to find the difference between two times, you can follow this example:
from datetime import datetime
foo = '9:48:14'
bar = '15:54:33'
foo = datetime.strptime(foo, '%H:%M:%S')
bar = datetime.strptime(bar, '%H:%M:%S')
print(bar - foo)
Output
6:06:19
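Subtraction order matters. A small sketch reusing the same two times: reversing the operands gives a negative timedelta, which Python normalizes to a negative day count plus a positive remainder. That is where a "-1 day" style prefix comes from when the first time is later than the second; `total_seconds()` gives a signed number that is easier to test.

```python
from datetime import datetime

fmt = '%H:%M:%S'
foo = datetime.strptime('9:48:14', fmt)
bar = datetime.strptime('15:54:33', fmt)

forward = bar - foo   # positive duration
backward = foo - bar  # negative: stored as days=-1 plus a positive remainder
print(forward)
print(backward)
print(backward.total_seconds())  # signed seconds, convenient for "< 0" checks
```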

Related

Subtract 2 datetime lists dd/mm/YYYY in pandas

So basically, I have these 2 DataFrame columns with date strings. The initial content is in dd/mm/YYYY format, and I want to subtract them. I can't subtract strings, so I converted them to datetime, but for some reason the day and month get swapped (05/09 is read as May 9 instead of 5 September), so when I subtract them I get a wrong result. For example:
Initial Content:
a: 05/09/2021
b: 30/09/2021
result expected: 25 days.
Converted to DateTime:
a: 2021-05-09
b: 2021-09-30 (for some reason this date stays the same)
result: 144 days.
I'm using pandas and datetime for this project.
So I'd like to know how I can subtract these 2 columns and get the proper result.
--- Answer
When I used
pd.to_datetime(date, format="%d/%m/%Y")
It worked. Thank you all for your time. This is my first project in pandas. :)
import pandas as pd
df = pd.DataFrame({'Date1': ['05/09/2021'], 'Date2': ['30/09/2021']})
df = df.apply(lambda x:pd.to_datetime(x,format=r'%d/%m/%Y')).assign(Delta=lambda x: (x.Date2-x.Date1).dt.days)
print(df)
Date1 Date2 Delta
0 2021-09-05 2021-09-30 25
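The same fix works without pandas. A minimal sketch with `datetime.strptime`, using the 2021 dates from the question: the explicit `%d/%m/%Y` format tells the parser that the day comes first, so nothing is swapped.

```python
from datetime import datetime

fmt = '%d/%m/%Y'  # day, then month, then year
a = datetime.strptime('05/09/2021', fmt)  # 5 September 2021
b = datetime.strptime('30/09/2021', fmt)  # 30 September 2021
delta = b - a  # a timedelta
print(delta.days)
```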
I just answered a similar question here: subtracting dates in python
from datetime import datetime
date_format_str = '%Y-%m-%d %H:%M:%S.%f'
date_1 = '2016-09-24 17:42:27.839496'
date_2 = '2017-01-18 10:24:08.629327'
start = datetime.strptime(date_1, date_format_str)
end = datetime.strptime(date_2, date_format_str)
diff = end - start  # interval between the two timestamps, as a timedelta object
# Convert the interval to (fractional) hours
diff_in_hours = diff.total_seconds() / 3600
print(diff_in_hours)
# get the difference between two dates as timedelta object
diff = end.date() - start.date()
print(diff.days)
Pandas
import pandas as pd
date_1 = '2016-09-24 17:42:27.839496'
date_2 = '2017-01-18 10:24:08.629327'
start = pd.to_datetime(date_1, format='%Y-%m-%d %H:%M:%S.%f')
end = pd.to_datetime(date_2, format='%Y-%m-%d %H:%M:%S.%f')
# get the difference between two datetimes as timedelta object
diff = end - start
print(diff.days)
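Note that `.days` on a full timedelta counts only complete 24-hour periods, so it can be one less than the difference between the calendar dates. A quick check with the same two timestamps:

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S.%f'
start = datetime.strptime('2016-09-24 17:42:27.839496', fmt)
end = datetime.strptime('2017-01-18 10:24:08.629327', fmt)

whole_days = (end - start).days                   # complete 24-hour periods only
calendar_days = (end.date() - start.date()).days  # difference between calendar dates
print(whole_days, calendar_days)
```

Here the end time of day (10:24) is earlier than the start time of day (17:42), so the full difference falls short of a whole extra day.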

How to sum timestamps Python

I am using the awful datetime library and I am trying to do what should be very easy. I have a collection of timestamps in my video file, and I simply want to subtract start_time from end_time for each entry, then sum the results and output the total time of the video file. The data in my video file looks like this:
<p begin="00:02:42.400" end="00:02:43.080" style="s2">product_1</p>
So my code is:
start_time = dt.strptime(begin, '%H:%M:%S.%f')
endie_time = dt.strptime(end, '%H:%M:%S.%f')
diff += endie_time-start_time
What I am trying to do is keep adding up 'diff'.
I get this error:
UnboundLocalError: local variable 'diff' referenced before assignment
I think the error is because diff is a datetime object and not an integer. But when I try int(diff), nothing works.
How can I do this simple task? I appreciate any help I can get on this annoying problem.
Thanks
The fundamental issue here is that the datetime module deals with real-world wall clock times, whereas you're trying to deal with durations. The class in the datetime module that really fits your problem is therefore timedelta, which expresses durations. To parse your strings into a timedelta, you'll need to do so slightly manually:
>>> from datetime import timedelta
>>> h, m, s = '00:02:43.080'.split(':')
>>> timedelta(hours=int(h), minutes=int(m), seconds=float(s))
datetime.timedelta(seconds=163, microseconds=80000)
If you now have two such timedeltas, you can subtract them:
>>> end - start
datetime.timedelta(microseconds=680000)
And you can add them to an existing timedelta:
diff = timedelta()
diff += end - start
Complete example:
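from datetime import timedelta
def parse_ts(ts: str) -> timedelta:
    # Split "HH:MM:SS.fff"; the seconds part may be fractional
    h, m, s = ts.split(':')
    return timedelta(hours=int(h), minutes=int(m), seconds=float(s))
timestamps = [('00:02:42.400', '00:02:43.080')]  # ..., plus the remaining (begin, end) pairs
diff = timedelta()  # start from a zero duration so += works on the first pass
for start, end in timestamps:
    diff += parse_ts(end) - parse_ts(start)
print(diff)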
As the comments to the original question say, using
X += Y
requires that you have already defined X.
A possible fix would be:
import datetime as dt
diff = dt.timedelta(0) # Initialize the diff with 0
start_time = dt.datetime.strptime(begin, '%H:%M:%S.%f')
endie_time = dt.datetime.strptime(end, '%H:%M:%S.%f')
diff += endie_time-start_time # Accumulate the time difference in diff
Since it seems that you want to iterate over multiple start/end times:
import datetime as dt
diff = dt.timedelta(0) # Initialize the diff with 0
for begin_i, end_i in zip(begin, end):
    start_time = dt.datetime.strptime(begin_i, '%H:%M:%S.%f')
    endie_time = dt.datetime.strptime(end_i, '%H:%M:%S.%f')
    diff += endie_time-start_time # Accumulate the time difference in diff
In both cases above, diff will be of the dt.timedelta type.
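To tie this back to the sample line from the question, here is a minimal sketch that reads the begin/end attributes of every `<p>` element and sums the durations. It assumes the subtitle data is well-formed XML (as TTML files are); `total_duration` and the second `<p>` line are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta

def parse_ts(ts: str) -> timedelta:
    # Split "HH:MM:SS.fff"; the seconds part may be fractional
    h, m, s = ts.split(':')
    return timedelta(hours=int(h), minutes=int(m), seconds=float(s))

def total_duration(xml_text: str) -> timedelta:
    total = timedelta()  # zero duration, so += works on the first element
    for el in ET.fromstring(xml_text).iter():
        # Drop any "{namespace}" prefix so namespaced TTML <p> tags also match
        if el.tag.split('}')[-1] == 'p' and 'begin' in el.attrib and 'end' in el.attrib:
            total += parse_ts(el.get('end')) - parse_ts(el.get('begin'))
    return total

# Hypothetical input: the first <p> is from the question, the second is made up
sample = """<div>
<p begin="00:02:42.400" end="00:02:43.080" style="s2">product_1</p>
<p begin="00:02:45.000" end="00:02:46.500" style="s2">product_2</p>
</div>"""
print(total_duration(sample))
```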

Detect if time difference is negative in pandas dataframe column

How can I detect whether the time difference in the code below is negative? My data is a pandas DataFrame.
data['starttime'] = pd.to_datetime(data.starttime, format = '%H:%M:%S.%f') #2014-10-28 21:39:52.654394
data['endtime'] = pd.to_datetime(data.endtime, format = '%H:%M:%S.%f') #2014-10-28 21:37:18.793405
if (data.endtime- data.starttime) < 0: #-1 days +23:57:26.139011
    data['timediff'] = (data.endtime- data.starttime)
The code above does not detect whether the time difference is negative; it throws this error:
TypeError: Invalid comparison between dtype=timedelta64[ns] and int
data.endtime - data.starttime
gives you a timedelta object, which you can't compare directly to an integer, but you should be able to do:
duration = data.endtime - data.starttime
if duration.total_seconds() < 0:
    data['timediff'] = duration
Or compare the two datetime objects directly with something like:
if data.endtime > data.starttime:
    data['timediff'] = (data.endtime - data.starttime)
Note: the logic here assumes that data.endtime and data.starttime are single datetime objects. If they represent an array or array-like of datetime objects (e.g. a DataFrame with more than one row), an if statement on the whole column is ambiguous, so you will need to iterate over them or use element-wise comparisons instead.

How to convert datetime to timestamp and calculate difference between dates using lambda function

I need to convert a variable I created from a datetime into a timestamp.
I need it in timestamp format to apply a lambda function to my pandas Series, which is stored as datetime64.
The lambda function should find the difference in months between startDate and each date in the pandas Series. Can anyone help?
I've tried using relativedelta to calculate the difference in months but I'm not sure how to implement it with a pandas series.
from datetime import datetime
import pandas as pd
from dateutil.relativedelta import relativedelta as rd
# Open the data set and store it in a DataFrame ('df')
file = pd.read_csv("test_data.csv")
df = pd.DataFrame(file)
# Extract the "AccountOpenedDate" column as a datetime Series
open_date_data = pd.to_datetime(df['AccountOpenedDate'], format = '%Y/%m/%d')
#set the variable startDate
dateformat = '%Y/%m/%d %H:%M:%S'
set_date = datetime.strptime('2017/07/01 00:00:00',dateformat)
startDate = datetime.timestamp(set_date)
#This function calculates the difference in months between two dates: ignore
def month_delta(start_date, end_date):
    delta = rd(end_date, start_date)
    # >>> relativedelta(years=+2, months=+3, days=+28)
    return 12 * delta.years + delta.months
d1 = datetime(2017, 7, 1)
d2 = datetime(2019, 10, 29)
total_months = month_delta(d1, d2)
# Apply a lambda function to each date in the Series (this attempt adds startDate to each date)
dfobj = open_date_data.apply(lambda x: x + startDate)
print(dfobj)
I'm only using a single column from the loaded data set. It's a date column in the following format ("%Y/%m/%d %H:%M:%S"). I want to find the difference in months between startDate and all the dates in the series.
As I don't have your original csv, I've made up some sample data and hopefully managed to shorten your code quite a bit:
open_date_data = pd.Series(pd.date_range('2017/07/01', periods=10, freq='M'))
startDate = pd.Timestamp("2017/07/01")
Then, with help from this answer to get the appropriate month_diff formula:
def month_diff(a, b):
    return 12 * (a.year - b.year) + (a.month - b.month)
open_date_data.apply(lambda x: month_diff(x, startDate))
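The same month_diff formula can also be applied without `apply`, since the `.dt` accessor exposes the year and month of the whole Series at once. A sketch with made-up dates: like month_diff, it counts calendar-month boundaries crossed, which can differ from relativedelta's count of complete months (2017-07-31 to 2017-08-01 gives 1 here but 0 months from relativedelta).

```python
import pandas as pd

# Made-up sample dates; the last one matches d2 from the question
open_date_data = pd.Series(pd.to_datetime(['2017/07/31', '2018/01/15', '2019/10/29']))
startDate = pd.Timestamp('2017/07/01')

# Vectorized month difference: no Python-level loop or lambda
months = (12 * (open_date_data.dt.year - startDate.year)
          + (open_date_data.dt.month - startDate.month))
print(months.tolist())
```

For 2019-10-29 this agrees with the question's month_delta(d1, d2) result of 27 months.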

Create a date range in julian date python

I have to create a range of Julian dates between two dates, spaced some minutes apart. I wrote some code, but it takes a long time (about 15 minutes for this example).
My code is:
from astropy.time import Time
import pandas as pd
timedelta = "600s"
start = "2018-01-01"
end = "2018-06-30"
dateslist = pd.date_range(start,end, freq =timedelta ).tolist()
dates = pd.DataFrame({'col':dateslist})
dates["col2"] =""
for i in range(len(dateslist)):
    #print(i," / ", len(dateslist))
    dates["col2"][i] = (Time(str(dateslist[i]).replace(" ", "T"), format="fits").jd)
I tried using Time without the for loop, but I get an error:
time = str(list(dates['col'])).replace("[Timestamp('","").replace(" Timestamp('","").replace("')","").replace(" ","T").split(",")
time
Time(time, format="fits")
ValueError: Input values did not match the format class fits
Is there some way of doing this quickly?
Thanks for now,
Use DatetimeIndex.to_julian_date:
dates["col2"] = pd.date_range(start,end, freq = timedelta).to_julian_date()
The equivalent way in astropy would be:
from astropy.time import Time
import astropy.units as u
import numpy as np
timedelta = 600 * u.s
start = "2018-01-01"
end = "2018-06-30"
dates["col2"] = np.arange(Time(start).jd, Time(end).jd, timedelta.to_value('day'))
An alternative (perhaps more idiomatic) way in astropy is:
start = Time("2018-01-01")
end = Time("2018-06-30")
timedelta = 600 * u.s
dates = start + timedelta * np.arange((end - start) / timedelta)
This gives you a vector Time object, which you could convert to JD via the jd attribute.
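A self-contained version of the pandas approach, using the same range and 600-second step as the question: `DatetimeIndex.to_julian_date()` converts the whole index in one vectorized call, which replaces the per-row astropy loop. The variable name `rng` is illustrative.

```python
import pandas as pd

# Same range as the question: 600-second steps from 2018-01-01 to 2018-06-30 (inclusive)
rng = pd.date_range('2018-01-01', '2018-06-30', freq='600s')
dates = pd.DataFrame({'col': rng})

# Convert every timestamp to a Julian date at once, no Python loop
dates['col2'] = rng.to_julian_date()
print(dates.head())
```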
