Date time: ValueError: Can only compare identically-labeled Series objects

I want to compare two dataframes, end1 and tt1, to see whether an event in tt1 influences an event in end1 at roughly the same time.
When I try a simple loop to look for events at roughly the same time, I get the error message:
ValueError: Can only compare identically-labeled Series objects
end1['end_date'] = pd.to_datetime(end1['end_date'], format = '%Y/%m/%d %H:%M')
tt1['Minstart'] = pd.to_datetime(tt1['Minstart'], format = '%Y/%m/%d %H:%M')
tt1['Maxstart'] = pd.to_datetime(tt1['Maxstart'], format = '%Y/%m/%d %H:%M')
for index, row in end1.iterrows():
    if end1['end_date'] > tt1['Minstart']:
        if end1['end_date'] < tt1['Maxstart']:
            d = end1.count(end1.end_date)
            print(d)
Both columns are of type:
pandas.core.series.Series
Thank you

It seems the int_overlaps function from the lubridate package makes it possible to know whether two date intervals overlap, so I can then select the events where int_overlaps is TRUE.
However, not all overlaps appear as TRUE, and I don't understand why.
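The ValueError in the question comes from comparing two whole Series (end1['end_date'] and tt1['Minstart']) whose indexes do not match; the loop variable row is never used. One possible way to test every end_date against every Minstart/Maxstart window is NumPy broadcasting. This is only a sketch: the sample rows and the n_matches column are made up for illustration, and it assumes both frames are small enough for an all-pairs comparison to fit in memory.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
end1 = pd.DataFrame({
    "end_date": pd.to_datetime(
        ["2021-01-01 10:05", "2021-01-01 12:00", "2021-01-02 09:30"]
    )
})
tt1 = pd.DataFrame({
    "Minstart": pd.to_datetime(["2021-01-01 10:00", "2021-01-02 09:00"]),
    "Maxstart": pd.to_datetime(["2021-01-01 10:30", "2021-01-02 10:00"]),
})

# Shape (n_end, 1) against shape (n_tt,) broadcasts to an (n_end, n_tt) grid:
# inside[i, j] is True when end_date i lies strictly inside window j.
ends = end1["end_date"].to_numpy()[:, None]
inside = (ends > tt1["Minstart"].to_numpy()) & (ends < tt1["Maxstart"].to_numpy())

# Number of tt1 windows containing each end_date (illustrative column name).
end1["n_matches"] = inside.sum(axis=1)

# Rows of end1 that fall inside at least one window.
matched = end1[inside.any(axis=1)]
print(matched)
```

Working on plain NumPy arrays sidesteps index alignment entirely, which is what triggered the original error.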

Related

Python find when list of dates becomes non-consecutive

I have a list of dates which are mostly consecutive, for example:
['01-Jan-10', '02-Jan-10', '03-Jan-10', '04-Jan-10', '08-Jan-10', '09-Jan-10', '10-Jan-10', '11-Jan-10', '13-Jan-10']
This is just an illustration as the full list contains thousands of dates.
This list can have a couple of spots where the consecutiveness breaks. In the example above, the gaps run from 05-Jan-10 to 07-Jan-10, and then 12-Jan-10. I am looking for the minimal and maximal day of each gap. Is there any way to do this efficiently in Python?
The datetime package from the standard library can be useful.
Find the right date format and apply it with strptime to every item in the list, then loop through consecutive pairs and check the difference between them (in days) using timedelta arithmetic. To keep the same (non-standard) format in the output, apply strftime.
from datetime import datetime, timedelta

dates = ['01-Jan-10', '02-Jan-10', '03-Jan-10', '04-Jan-10', '08-Jan-10', '09-Jan-10', '10-Jan-10', '11-Jan-10', '13-Jan-10']
# date format code
date_format = '%d-%b-%y'
# cast to date objects
days = list(map(lambda d: datetime.strptime(d, date_format).date(), dates))
# check consecutive days
for d1, d2 in zip(days, days[1:]):
    date_gap = (d2 - d1).days
    # check consecutiveness
    if date_gap > 1:
        # compute day boundary of the gap
        min_day_gap, max_day_gap = d1 + timedelta(days=1), d2 - timedelta(days=1)
        # apply format
        min_day_gap = min_day_gap.strftime(date_format)
        max_day_gap = max_day_gap.strftime(date_format)
        # check
        print(min_day_gap, max_day_gap)
# 05-Jan-10 07-Jan-10
# 12-Jan-10 12-Jan-10
Remark: when two listed dates are 2 days apart, only one day is missing, so the min and max day of the gap are identical. If that case needs different handling, add a conditional check on date_gap == 2 and adjust the behavior:
if date_gap == 2: ... elif date_gap > 1: ...
or add a comment/edit the question with a proper description.
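Since the question mentions thousands of dates, a vectorized variant may also be worth a look. The sketch below swaps the standard-library loop for pandas (not part of the original answer) and assumes the list is already sorted; the names one_day, is_break, gap_start and gap_end are illustrative.

```python
import pandas as pd

dates = ['01-Jan-10', '02-Jan-10', '03-Jan-10', '04-Jan-10', '08-Jan-10',
         '09-Jan-10', '10-Jan-10', '11-Jan-10', '13-Jan-10']

# Parse once into a datetime Series (assumes the list is sorted ascending).
s = pd.Series(pd.to_datetime(dates, format='%d-%b-%y'))
one_day = pd.Timedelta(days=1)

# True where the step from the previous date is larger than one day.
# The first element has no predecessor (NaT), which compares as False.
is_break = s.diff() > one_day

# First and last missing day of each gap.
gaps = pd.DataFrame({
    'gap_start': s.shift(1)[is_break] + one_day,
    'gap_end': s[is_break] - one_day,
})
print(gaps)
```

A single missing day shows up as a row where gap_start equals gap_end, which matches the behavior of the loop version.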

Using pd.apply() to turn each element in a column to a list, grab the 1st element, and turn it into datetime

I have a dataframe with only one row (in the future, it will be more, but I am just using this as an example). I am trying to grab the "SETTLEMENT_DATE" column.
print(previous["SETTLEMENT_DATE"])
0 2021-06-22 00:00:00.0
Name: SETTLEMENT_DATE, dtype: object.
In order to get the date, I did list(previous["SETTLEMENT_DATE"])[0], which yields:
'2021-06-22 00:00:00.0'. Now I turn this into a date format using:
def create_datetime_object(pd_object):
    date_time_str = list(pd_object)[0]
    return datetime.strptime(date_time_str, "%Y-%m-%d %H:%M:%S.%f")
This code: create_datetime_object(previous["SETTLEMENT_DATE"])
yields: datetime.datetime(2021, 6, 22, 0, 0)
In the future, I will have multiple rows of data, so I wanted to use pd.apply() to apply this function to the entire column. But when I do that, I get:
previous["SETTLEMENT_DATE"] = previous["SETTLEMENT_DATE"].apply(create_datetime_object)
ValueError: time data '2' does not match format '%Y-%m-%d %H:%M:%S.%f'
Does anyone know why I am getting this error and how to solve it?
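Series.apply passes each element of the column (here a string) to the function, so list(pd_object)[0] returns only the first character of that string, '2'. Parsing each element directly should work:
previous["SETTLEMENT_DATE"] = previous["SETTLEMENT_DATE"].apply(lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f"))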
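For a whole column, pd.to_datetime is usually simpler than applying a Python function row by row. A minimal sketch, assuming every value uses the same format as the one shown in the question (the second row is made up for illustration):

```python
import pandas as pd

# Hypothetical two-row frame; the first value is the one from the question.
previous = pd.DataFrame(
    {"SETTLEMENT_DATE": ["2021-06-22 00:00:00.0", "2021-06-23 00:00:00.0"]}
)

# Convert the entire string column to datetime64 in one call.
previous["SETTLEMENT_DATE"] = pd.to_datetime(
    previous["SETTLEMENT_DATE"], format="%Y-%m-%d %H:%M:%S.%f"
)
print(previous.dtypes)
```

The column then holds pandas Timestamp values, so the .dt accessor and direct comparisons are available without further conversion.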

Detect if time difference is negative in pandas dataframe column

How can I detect whether the time difference is negative in the code below? My data is a pandas dataframe.
data['starttime'] = pd.to_datetime(data.starttime, format = '%H:%M:%S.%f') #2014-10-28 21:39:52.654394
data['endtime'] = pd.to_datetime(data.endtime, format = '%H:%M:%S.%f') #2014-10-28 21:37:18.793405
if (data.endtime - data.starttime) < 0: #-1 days +23:57:26.139011
    data['timediff'] = (data.endtime - data.starttime)
The code above does not detect whether the time difference is negative. It throws this error:
TypeError: Invalid comparison between dtype=timedelta64[ns] and int
data.endtime - data.starttime gives you a timedelta object. You can't compare that directly to an integer, but you should be able to do
duration = data.endtime - data.starttime
if duration.total_seconds() < 0:
    data['timediff'] = duration
Or compare the two datetime objects directly with something like
if data.endtime > data.starttime:
    data['timediff'] = (data.endtime - data.starttime)
Note: the logic here assumes that data.endtime and data.starttime are single datetime objects. If they are array-like (e.g. DataFrame columns with more than one row), the if test raises an error, and you will need to iterate over them or use an element-wise comparison instead.
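For DataFrame columns, an element-wise alternative is to compare the timedelta column against pd.Timedelta(0) rather than the integer 0. The sketch below uses the two timestamps from the question plus one made-up row; the is_negative column name is an assumption for illustration.

```python
import pandas as pd

# First row: the values from the question; second row: hypothetical.
data = pd.DataFrame({
    "starttime": pd.to_datetime(
        ["2014-10-28 21:39:52.654394", "2014-10-28 08:00:00.000000"]
    ),
    "endtime": pd.to_datetime(
        ["2014-10-28 21:37:18.793405", "2014-10-28 09:30:00.000000"]
    ),
})

# Subtracting two datetime columns yields a timedelta64 column.
data["timediff"] = data["endtime"] - data["starttime"]

# Compare against a Timedelta, not an int, to get one boolean per row.
data["is_negative"] = data["timediff"] < pd.Timedelta(0)

# Boolean indexing selects the offending rows without an explicit loop.
print(data[data["is_negative"]])
```

Equivalently, data["timediff"].dt.total_seconds() < 0 gives the same mask if a numeric comparison is preferred.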

Converting an array of dates to datetime format and comparing them

I would like to convert an array of date and time strings to an array of datetime objects so I can compare them and find the newest one (the one furthest in the future).
First I convert and then combine the dates and time.
I'm struggling to create an array of datetimes and after some research I'm not sure if it is possible.
dates = ['2019-02-18','2019-02-18','2019-02-18','2019-02-18','2019-02-18','2019-02-18','2019-02-18','2019-02-18','2019-02-19','2019-02-19']
times = ['06:15', '18:30', '19:45', '14:20', '16:10','06:10', '18:35', '19:40', '14:25', '16:15' ]
dates_count = len(dates)
dates_obj = []
times_obj = []
for i in range(dates_count):
    dates_obj.append(datetime.strptime(dates[i], '%Y-%m-%d'))
    times_obj.append(datetime.strptime(times[i], '%H:%M'))
    dates_times_obj = datetime.combine(datetime.date(dates_obj[i]), datetime.time(times[i]))
    print(dates_times_obj)
Output error
dates_times_obj = datetime.combine(datetime.date(dates_obj[i]), datetime.time(times[i]))
TypeError: descriptor 'time' requires a 'datetime.datetime' object but received a 'str'
Can you try parsing the concatenated date and time strings in one step:
datetime.strptime(dates[i] + times[i], '%Y-%m-%d%H:%M')
So in your code it will be the following:
dates_times_obj = datetime.strptime(dates[i] + times[i], '%Y-%m-%d%H:%M')
It appears to be a typo, since you referred to times[i] instead of times_obj[i]. Additionally, if you don't need the lists of date and time objects, I would suggest using some of the nice features of the Python language, like zip:
dates = ['2019-02-18','2019-02-18','2019-02-18','2019-02-18','2019-02-18','2019-02-18','2019-02-18','2019-02-18','2019-02-19','2019-02-19']
times = ['06:15', '18:30', '19:45', '14:20', '16:10','06:10', '18:35', '19:40', '14:25', '16:15' ]
for date_str, time_str in zip(dates, times):
    date_obj = datetime.strptime(date_str, '%Y-%m-%d')
    time_obj = datetime.strptime(time_str, '%H:%M').time()
    dates_times_obj = datetime.combine(date_obj, time_obj)
    print(dates_times_obj)
You can use the builtin function max as follows (with from datetime import datetime, as in the question):
newest_date = max(
    datetime.strptime(d + " " + t, "%Y-%m-%d %H:%M")
    for d, t in zip(dates, times)
)

PYTHON Numpy where time condition

I have the following target: I need to compare two date columns in the same table and create a 3rd column based on the result of the comparison. I do not know how to compare dates in a np.where statement.
This is my current code:
now = datetime.datetime.now() #set the date to compare
delta = datetime.timedelta(days=7) #set delta
time_delta = now+delta #now+7 days
And here is the np.where statement:
DB['s_date'] = np.where((DB['Start Date']<=time_delta | DB['Start Date'] = (None,"")),DB['Start Date'],RW['date'])
There is an OR condition to take into account the possibility that Start Date column might be empty
Would apply with a lambda work for you, Filippo? It walks through a Series element by element and applies a function of your choice to each value; whatever the function returns fills the resulting Series.
def compare(date):
    # check for a missing value first, so the comparison never sees None/NaT
    if pd.isna(date) or date <= time_delta:
        pass  # return something
    else:
        pass  # return something else
DB['s_date'] = DB['Start Date'].apply(lambda x: compare(x))
EDIT: This will work as well (thanks EyuelDK)
DB['s_date'] = DB['Start Date'].apply(compare)
Thank you for the insights. I updated the code (and adjusted it for my purposes) as follows, and it works:
now = datetime.datetime.now() #set the date to compare
delta = datetime.timedelta(days=7) #set delta
time_delta = now+delta #now+7 days
DB['Start'] = np.where(((DB['Start Date']<=time_delta) | (DB['Start Date'].isnull()) | (DB['Start Date'] == "")),DB['Start'],DB['Start Date'])
The key was to wrap each condition separated by | in parentheses. Otherwise | is evaluated before the comparisons, and the code raised an error about comparing two different data types.
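The precedence issue can be reproduced on a small frame: in Python, | binds more tightly than <= and ==, so a <= b | c is read as a <= (b | c). The sketch below is illustrative only. It uses a fixed timestamp instead of datetime.datetime.now() so the result is reproducible, and the Fallback column and cutoff name are hypothetical stand-ins for the second date source.

```python
import numpy as np
import pandas as pd

# Fixed "now" for a reproducible example (the question uses datetime.now()).
now = pd.Timestamp("2024-01-01")
cutoff = now + pd.Timedelta(days=7)  # now + 7 days

# Hypothetical data: one early date, one missing date, one late date.
DB = pd.DataFrame({
    "Start Date": pd.to_datetime(["2024-01-03", None, "2024-02-01"]),
    "Fallback": pd.to_datetime(["2024-01-10", "2024-01-11", "2024-01-12"]),
})

# Each comparison is wrapped in its own parentheses before combining with |.
keep_start = (DB["Start Date"] <= cutoff) | (DB["Start Date"].isna())

# Keep "Start Date" where the condition holds, otherwise take "Fallback".
DB["s_date"] = np.where(keep_start, DB["Start Date"], DB["Fallback"])
print(DB)
```

With a real datetime column, missing entries are NaT, so isna() covers the "empty" case and no comparison with an empty string is needed.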
