Python nested list: time intervals, intersection and difference

I have a problem with a nested list whose elements are time intervals:
time = [('2017-01-01T00:00:00.000000Z', '2017-01-01T00:00:39.820000Z'),
        ('2017-01-01T00:00:38.840000Z', '2017-01-01T01:36:33.260000Z'),
        ('2017-01-01T01:36:45.960000Z', '2017-01-01T03:06:15.340000Z'),
        ('2017-01-01T03:06:24.320000Z', '2017-01-01T03:31:00.420000Z'),
        ('2017-01-01T03:31:22.880000Z', '2017-01-01T03:48:43.500000Z'),
        ('2017-01-01T03:48:53.280000Z', '2017-01-01T04:14:53.660000Z'),
        ('2017-01-01T04:15:15.160000Z', '2017-01-01T04:34:44.060000Z'),
        ('2017-01-01T04:34:57.440000Z', '2017-01-01T04:46:31.100000Z'),
        ('2017-01-01T04:46:53.320000Z', '2017-01-01T05:22:20.340000Z'),
        ('2017-01-01T05:22:24.920000Z', '2017-01-01T06:17:30.900000Z'),
        ('2017-01-01T06:18:02.280000Z', '2017-01-01T07:01:45.740000Z'),
        ('2017-01-01T07:02:04.640000Z', '2017-01-01T07:39:48.780000Z'),
        ('2017-01-01T07:40:12.400000Z', '2017-01-01T08:19:46.140000Z'),
        ('2017-01-01T08:20:13.520000Z', '2017-01-01T10:17:45.380000Z'),
        ('2017-01-01T10:17:59.880000Z', '2017-01-01T15:01:29.100000Z'),
        ('2017-01-01T15:01:55.840000Z', '2017-01-01T15:08:45.460000Z'),
        ('2017-01-01T15:09:04.000000Z', '2017-01-01T15:42:13.180000Z'),
        ('2017-01-01T15:42:30.360000Z', '2017-01-01T16:14:07.340000Z'),
        ('2017-01-01T16:14:24.560000Z', '2017-01-01T17:11:28.420000Z'),
        ('2017-01-01T17:11:32.960000Z', '2017-01-01T17:46:07.660000Z'),
        ('2017-01-01T17:46:30.280000Z', '2017-01-01T18:02:17.860000Z'),
        ('2017-01-01T18:02:35.240000Z', '2017-01-01T18:16:17.740000Z'),
        ('2017-01-01T18:16:26.720000Z', '2017-01-01T18:39:10.540000Z'),
        ('2017-01-01T18:39:19.360000Z', '2017-01-01T19:45:25.860000Z'),
        ('2017-01-01T19:45:34.720000Z', '2017-01-01T20:41:00.220000Z'),
        ('2017-01-01T20:41:21.520000Z', '2017-01-01T21:13:51.660000Z'),
        ('2017-01-01T21:14:13.360000Z', '2017-01-01T21:41:16.220000Z'),
        ('2017-01-01T21:41:28.640000Z', '2017-01-01T22:03:03.820000Z'),
        ('2017-01-01T22:03:29.400000Z', '2017-01-01T23:14:13.500000Z'),
        ('2017-01-01T23:14:35.200000Z', '2017-01-01T23:59:59.980000Z')]
As you can see, all the elements belong to the same day, 2017-01-01. I want the difference (in seconds or ms) between the entire day (86400 s) and the total time covered by the intervals in the list. But some intervals overlap, so I think I first have to do some kind of "intersection check". Once the overlaps are resolved, I can subtract the total of all the intervals from 86400. How can I do that intersection check? Any suggestion would be highly appreciated. Thanks in advance!
Desired output:
86400 (day) - 85000 (possible time in seconds after merging the overlapping intervals) = 1400

The problem is twofold:
to replace overlapping intervals with their unions;
to sum the resulting non-overlapping intervals.
The first part can be done like this:
time.sort()
new_time = [list(time[0])]
for t in time[1:]:
    if t[0] <= new_time[-1][1]:
        if t[1] > new_time[-1][1]:
            new_time[-1][1] = t[1]
    else:
        new_time.append(list(t))
The second part is best done using the datetime module:
import datetime
total = sum([ ( datetime.datetime.strptime(t[1], '%Y-%m-%dT%H:%M:%S.%fZ') -
datetime.datetime.strptime(t[0], '%Y-%m-%dT%H:%M:%S.%fZ') ).total_seconds()
for t in new_time ])
print(86400 - total)

After converting the strings to numbers, you could use the top answer to the question "Python find continuous intersections of intervals".

You can sort and then merge any overlaps:
time.sort()
noOverlapList = []
start = time[0][0]  # start of first interval
end = time[0][1]    # end of first interval
for interval in time[1:]:
    # if interval overlaps (or touches) the current merged interval, extend it
    if interval[0] <= end:
        end = max(end, interval[1])
    else:
        noOverlapList.append((start, end))  # merged non-overlapping interval
        start = interval[0]
        end = interval[1]
noOverlapList.append((start, end))  # don't forget the last merged interval
Then just sum the intervals contained in noOverlapList and subtract the total from 86400.
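Below is a self-contained sketch of the sort-and-merge approach. It parses the strings with datetime.strptime first, assuming every timestamp has exactly the question's format with a literal trailing Z. That way every length is an exact timedelta rather than a float. The names uncovered_time, FMT and sample are illustrative; the sample is the first three intervals from the question, and the first two of them overlap.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%dT%H:%M:%S.%fZ'  # literal trailing 'Z', as in the question's timestamps


def uncovered_time(intervals, day=timedelta(days=1)):
    """Return how much of `day` is not covered by the (start, end) string intervals."""
    if not intervals:
        return day
    # Parse to datetime so every length below is an exact timedelta, not a float
    parsed = sorted((datetime.strptime(s, FMT), datetime.strptime(e, FMT))
                    for s, e in intervals)
    merged = [list(parsed[0])]
    for start, end in parsed[1:]:
        if start <= merged[-1][1]:      # overlaps (or touches) the last merged interval
            merged[-1][1] = max(merged[-1][1], end)
        else:                           # gap: begin a new merged interval
            merged.append([start, end])
    covered = sum((end - start for start, end in merged), timedelta())
    return day - covered


sample = [
    ('2017-01-01T00:00:00.000000Z', '2017-01-01T00:00:39.820000Z'),
    ('2017-01-01T00:00:38.840000Z', '2017-01-01T01:36:33.260000Z'),
    ('2017-01-01T01:36:45.960000Z', '2017-01-01T03:06:15.340000Z'),
]
gap = uncovered_time(sample)
print(gap.total_seconds())  # 75237.36
```

Passing the full list from the question gives the answer to the original problem. Keeping the arithmetic in timedelta until the end avoids accumulating floating-point error over many intervals.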

Related

python 3 datetime difference in microseconds giving wrong answer for a long operation

I'm deleting 3000 elements from a binary search tree of size 6000 (built from sorted input, therefore a one-sided tree). I need to measure the time taken to complete all the deletes.
I did this:
from datetime import datetime

bst2 = foo.BinarySearchTree()  # init
insert_all_to_tree(bst2, insert_lines)  # insert 6000 elements
start = datetime.now()  # start time
for idx, line in enumerate(lines):
    bst2.delete(line)  # deleting
    if idx % 10 == 0:
        print("deleted ", (idx + 1), "th element - ", line)
end = datetime.now()  # completion time
duration = end - start
print(duration.microseconds)  # duration in microseconds
I got 761716 microseconds, which is less than a second, when my code actually ran for about 5 hours. I expected something in the range of 10^9 to 10^10. I even checked the maximum integer allowed in Python to see if that was related, but apparently that's not the problem.
Why am I getting a wrong answer for the duration?
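datetime.now() returns a datetime, and end - start is a timedelta, so the subtraction itself is fine. The problem is duration.microseconds. A timedelta stores days, seconds and microseconds separately, and .microseconds is only the sub-second component (always between 0 and 999999), not the whole duration expressed in microseconds. Use duration.total_seconds() instead, and multiply by 10**6 if you want microseconds. For timing code you can also use time.time() (any Python version), time.perf_counter() (Python 3.3+) or time.perf_counter_ns() (Python 3.7+).
time.time() and time.perf_counter() both return a float, and time.perf_counter_ns() returns an int.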
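A short sketch of why the printed number is so small. The duration here is made up to have the same shape as the question's: 5 hours plus a sub-second remainder of 761716 µs. The second half shows the time.perf_counter_ns() alternative with a stand-in workload in place of the real deletes.

```python
import time
from datetime import timedelta

# Illustrative duration of the same shape as the question's: about 5 hours
duration = timedelta(hours=5, microseconds=761716)

print(duration.microseconds)                  # 761716: only the sub-second part
print(duration.total_seconds())               # 18000.761716: the whole duration
print(duration // timedelta(microseconds=1))  # 18000761716: whole duration as int µs

# Timing code with a high-resolution performance counter (Python 3.7+)
t0 = time.perf_counter_ns()
total = sum(range(100_000))   # stand-in for the real work, e.g. the 3000 deletes
elapsed_ns = time.perf_counter_ns() - t0
print(elapsed_ns / 1000, "microseconds")
```

Dividing one timedelta by another with // yields an int, which is a convenient way to express a duration in any unit without going through floats.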

Check overlapping reservation [duplicate]

In MySQL, if I have a list of date ranges (range-start and range-end), e.g.:
10/06/1983 to 14/06/1983
15/07/1983 to 16/07/1983
18/07/1983 to 18/07/1983
And I want to check if another date range contains ANY of the ranges already in the list, how would I do that?
e.g.
06/06/1983 to 18/06/1983 = IN LIST
10/06/1983 to 11/06/1983 = IN LIST
14/07/1983 to 14/07/1983 = NOT IN LIST
This is a classical problem, and it's actually easier if you reverse the logic.
Let me give you an example.
I'll post one period of time here, and all the different variations of other periods that overlap in some way.
      |-------------------|             compare to this one
           |---------|                  contained within
      |----------|                      contained within, equal start
              |-----------|             contained within, equal end
      |-------------------|             contained within, equal start+end
|------------|                          not fully contained, overlaps start
                  |---------------|     not fully contained, overlaps end
|-------------------------|             overlaps start, bigger
      |-----------------------|         overlaps end, bigger
|------------------------------|        overlaps entire period
On the other hand, here are all those that don't overlap:
      |-------------------|             compare to this one
|---|                                   ends before
                            |---|       starts after
So if you simply reduce the comparison to:
starts after end
ends before start
then you'll find all those that don't overlap, i.e. all the non-matching periods.
For your final NOT IN LIST example, you can see that it matches one of those two rules for every range in the list.
You will need to decide whether the following periods are IN or OUTSIDE your ranges:
        |-------------|
|-------|                       equal end with start of comparison period
                      |-----|   equal start with end of comparison period
If your table has columns called range_end and range_start, here's some simple SQL to retrieve all the matching rows:
SELECT *
FROM periods
WHERE NOT (range_start > @check_period_end
           OR range_end < @check_period_start)
Note the NOT in there. Since the two simple rules find all the non-matching rows, a simple NOT reverses it to say: if it's not one of the non-matching rows, it has to be one of the matching ones.
Apply simple reversal logic to get rid of the NOT, and you'll end up with:
SELECT *
FROM periods
WHERE range_start <= @check_period_end
  AND range_end >= @check_period_start
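The same two rules can be checked outside the database. Below is a small Python sketch using the three ranges from the question. The names overlaps, periods and in_list are illustrative, and both ends of every range are treated as inclusive, matching the <= and >= in the SQL above.

```python
from datetime import date


def overlaps(start1, end1, start2, end2):
    """Inclusive ranges overlap unless one starts after the other ends."""
    # Equivalent to: NOT (start1 > end2 OR end1 < start2)
    return start1 <= end2 and end1 >= start2


periods = [
    (date(1983, 6, 10), date(1983, 6, 14)),
    (date(1983, 7, 15), date(1983, 7, 16)),
    (date(1983, 7, 18), date(1983, 7, 18)),
]


def in_list(check_start, check_end):
    return any(overlaps(s, e, check_start, check_end) for s, e in periods)


print(in_list(date(1983, 6, 6), date(1983, 6, 18)))    # True
print(in_list(date(1983, 6, 10), date(1983, 6, 11)))   # True
print(in_list(date(1983, 7, 14), date(1983, 7, 14)))   # False
```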
Taking your example range of 06/06/1983 to 18/06/1983 and assuming you have columns called start and end for your ranges, you could use a clause like this:
where ('1983-06-06' <= end) and ('1983-06-18' >= start)
i.e. check that the start of your test range is on or before the end of the database range, and that the end of your test range is on or after the start of the database range.
If your RDBMS supports the SQL-standard OVERLAPS predicate, this becomes trivial and there is no need for homegrown solutions. (In Oracle it apparently works but is undocumented.)
In your expected results you say:
06/06/1983 to 18/06/1983 = IN LIST
However, this period neither contains nor is contained by any of the periods in your table (not list!) of periods. It does, however, overlap the period 10/06/1983 to 14/06/1983.
You may find the Snodgrass book (http://www.cs.arizona.edu/people/rts/tdbbook.pdf) useful: it pre-dates MySQL, but the concept of time hasn't changed ;-)
I created a function to deal with this problem in MySQL. Just convert the dates to seconds before using it.
DELIMITER ;;
CREATE FUNCTION overlap_interval(x INT, y INT, a INT, b INT)
RETURNS INTEGER DETERMINISTIC
BEGIN
    DECLARE overlap_amount INTEGER;
    IF (((x <= a) AND (a < y)) OR ((x < b) AND (b <= y)) OR (a < x AND y < b)) THEN
        IF (x < a) THEN
            IF (y < b) THEN
                SET overlap_amount = y - a;
            ELSE
                SET overlap_amount = b - a;
            END IF;
        ELSE
            IF (y < b) THEN
                SET overlap_amount = y - x;
            ELSE
                SET overlap_amount = b - x;
            END IF;
        END IF;
    ELSE
        SET overlap_amount = 0;
    END IF;
    RETURN overlap_amount;
END ;;
DELIMITER ;
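The branches in overlap_interval compute the length of the intersection of [x, y) and [a, b). Assuming well-formed intervals (x <= y and a <= b), the same number is max(0, min(y, b) - max(x, a)). In MySQL that would presumably be written with GREATEST and LEAST. Here is a Python sketch of that closed form, handy for checking values by hand:

```python
def overlap_interval(x, y, a, b):
    """Length of the overlap of [x, y) and [a, b); 0 if they do not overlap.

    Assumes x <= y and a <= b (e.g. timestamps already converted to seconds).
    """
    return max(0, min(y, b) - max(x, a))


print(overlap_interval(0, 10, 5, 15))   # 5
print(overlap_interval(0, 10, 10, 20))  # 0 (touching ends do not overlap)
print(overlap_interval(2, 4, 0, 10))    # 2 (first interval inside the second)
```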
Look at the following example; it may be helpful.
SELECT DISTINCT RelatedTo,
       CAST(NotificationContent as nvarchar(max)) as NotificationContent,
       ID,
       Url,
       NotificationPrefix,
       NotificationDate
FROM NotificationMaster as nfm
inner join NotificationSettingsSubscriptionLog as nfl
    on nfm.NotificationDate between nfl.LastSubscribedDate and isnull(nfl.LastUnSubscribedDate, GETDATE())
where ID not in (SELECT NotificationID from removednotificationsmaster where Userid = @userid)
  and nfl.UserId = @userid
  and nfl.RelatedSettingColumn = RelatedTo
Try this on MS SQL Server:
WITH date_range (calc_date) AS (
    SELECT DATEADD(DAY, DATEDIFF(DAY, 0, [ending date]) - DATEDIFF(DAY, [start date], [ending date]), 0)
    UNION ALL SELECT DATEADD(DAY, 1, calc_date)
    FROM date_range
    WHERE DATEADD(DAY, 1, calc_date) <= [ending date])
SELECT P.[fieldstartdate], P.[fieldenddate]
FROM date_range R JOIN [yourBaseTable] P
    ON Convert(date, R.calc_date) BETWEEN convert(date, P.[fieldstartdate]) AND convert(date, P.[fieldenddate])
GROUP BY P.[fieldstartdate], P.[fieldenddate];
CREATE FUNCTION overlap_date(s DATE, e DATE, a DATE, b DATE)
RETURNS BOOLEAN DETERMINISTIC
RETURN s BETWEEN a AND b or e BETWEEN a and b or a BETWEEN s and e;
Another method, using the SQL BETWEEN operator:
Periods included (the check period lies entirely within a stored range):
SELECT *
FROM periods
WHERE @check_period_start BETWEEN range_start AND range_end
  AND @check_period_end BETWEEN range_start AND range_end
Periods excluded:
SELECT *
FROM periods
WHERE (@check_period_start NOT BETWEEN range_start AND range_end
       OR @check_period_end NOT BETWEEN range_start AND range_end)
SELECT *
FROM tabla a
WHERE ( @Fini <= a.dFechaFin AND @Ffin >= a.dFechaIni )
  AND ( (@Fini >= a.dFechaIni AND @Ffin <= a.dFechaFin)
     OR (@Fini >= a.dFechaIni AND @Ffin >= a.dFechaFin)
     OR (a.dFechaIni >= @Fini AND a.dFechaFin <= @Ffin)
     OR (a.dFechaIni >= @Fini AND a.dFechaFin >= @Ffin) )

Segmenting a list of data in Python

I am trying to write code that takes a list, flow_rate, and splits it into a segmented list, list_segmented, whose segments have length segment_len. Then, for each index of that segmented list, I want to take the segment as a list called data_segment.
I am getting stuck trying to figure out how to make each list_segmented[i] = data_segment. The last part of the code calls another function on each data_segment; I wrote that function previously and can import it.
Appreciate your help.
def flow_rate_to_disorder_status(flow_rate,segment_len,interval,threshold):
    inlist = flow_rate[:]
    list_segmented = []
    disorder_status = []
    while inlist:
        list_segmented.append(inlist[0 : segment_len])
        inlist[0 : segment_len] = []
    for i in range(0, len(list_segmented)):
        data_segment = list_segmented[i]
    condition = sym.has_symptom(data_segment, interval, threshold)
    disorder_status.append(condition)
Initial function:
def has_symptom(data_segment,interval,threshold):
    max_ratio = 1  # maximum ratio allowed when dividing
                   # data points in interval by len data_segment
                   # for our example it is 1
                   # NOTE: max_ratio can NOT be less than threshold
    # to define the range of the given interval:
    min_interval = interval[0]
    max_interval = interval[1]
    # create an empty list to collect the data points that fall in the interval
    symptom_yes = []
    # create a loop to read every point in data_segment
    # and check whether or not it falls in the interval
    for i in range(0, len(data_segment)):
        if min_interval <= data_segment[i] <= max_interval:
            # if the data falls in the interval, add it to list symptom_yes
            symptom_yes.append(data_segment[i])
    # to get the fraction ratio between interval points and total data points
    fraction_ratio = len(symptom_yes) / len(data_segment)
    # if the ratio of data points that fall in the interval to total points in
    # the data segment is more than or equal to threshold and less than or equal
    # to max_ratio (1 in our case) then apply condition
    if threshold <= fraction_ratio <= max_ratio:
        condition = True  # entire segment has the symptom
    else:
        condition = False  # entire segment does NOT have the symptom
    return condition
You nearly did it:
for i in range(0, len(data_segment)):  # <-- looping through data_segment
    # data_segment = list_segmented[i]  <-- this was back to front.
    list_segmented[i] = data_segment  # <-- this will work
Note: there are cleaner ways of doing this in Python (like a list comprehension).
Anyway, good question. Hope that helps.
It looks like the lines
condition = sym.has_symptom(data_segment, interval, threshold)
disorder_status.append(condition)
should each be indented by one more level to be inside the for loop, so that they are executed for each data segment.
You presumably also want to return disorder_status at the end of the function.
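Putting those two fixes together, here is a sketch of the whole function using list comprehensions. It assumes has_symptom behaves like the question's version with max_ratio = 1. A condensed local copy is defined below so the block runs on its own; in the question the function is imported as sym.has_symptom. The test data is made up.

```python
def has_symptom(data_segment, interval, threshold):
    """Condensed stand-in for the question's has_symptom (max_ratio = 1)."""
    low, high = interval
    in_range = sum(1 for value in data_segment if low <= value <= high)
    return threshold <= in_range / len(data_segment) <= 1


def flow_rate_to_disorder_status(flow_rate, segment_len, interval, threshold):
    # Split flow_rate into consecutive chunks of segment_len (the last may be shorter)
    list_segmented = [flow_rate[i:i + segment_len]
                      for i in range(0, len(flow_rate), segment_len)]
    # Evaluate every segment (not only the last one) and return the results
    return [has_symptom(segment, interval, threshold) for segment in list_segmented]


print(flow_rate_to_disorder_status([1, 2, 8, 9, 9, 1], 2, (5, 10), 0.5))
# [False, True, True]
```

Slicing with a step of segment_len in range() avoids mutating a copy of the input list, which the original while loop does.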

ValueError with recursive function in Python

def recursive(start, end, datelist):
    results = ga.GAnalytics().create_query(profile_id,
                                           metrics,
                                           start,
                                           end,
                                           dimensions).execute()
    if results.get("containsSampledData") is True:
        x = len(datelist) / 2
        recursive(datelist[0],datelist[:x][-1],datelist[:x])
        recursive(datelist[x:][0],datelist[-1],datelist[x:])
    else:
        unsampled_date_ranges = []
        for x, y in start, end:
            unsampled_date_ranges.append((x, y))

recursive(start_date, end_date, date_list)
The function above takes a start date, an end date and an inclusive list of the dates between them. It first checks whether the data returned for the initial date range is sampled. If it is, the date range is split in half, each half is checked, and so on.
My issue is with the else statement. To make sure the function worked, I tried print start + " - " + end, which returned the expected date ranges. Ideally I would like the data to be returned as a list of tuples, so I tried the code above. Unfortunately, I am getting ValueError: too many values to unpack on the line for x, y in start, end:
What is the issue with my code in my else statement and how can I get it to return a list of tuples?
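The ValueError most likely comes from for x, y in start, end:. Here start, end builds a two-element tuple, and the loop tries to unpack each item into x and y. Each item is a date string (the question concatenates them with +), and a string of more than two characters cannot be unpacked into two names. The recursive calls also discard their results, so nothing is ever returned.

Below is a minimal sketch that returns a list of (start, end) tuples. It assumes a hypothetical is_sampled(start, end) callback standing in for the ga.GAnalytics() query and its containsSampledData check. It uses date objects, and datelist must be non-empty.

```python
from datetime import date, timedelta


def unsampled_ranges(datelist, is_sampled):
    """Halve datelist until each piece is unsampled; return (start, end) tuples.

    is_sampled(start, end) is a hypothetical callback wrapping the real query.
    """
    start, end = datelist[0], datelist[-1]
    # len > 1 guard: a single day cannot be split further, so stop there
    if len(datelist) > 1 and is_sampled(start, end):
        mid = len(datelist) // 2   # integer division; '/' gives a float in Python 3
        # Return (and concatenate) the recursive results instead of discarding them
        return (unsampled_ranges(datelist[:mid], is_sampled)
                + unsampled_ranges(datelist[mid:], is_sampled))
    # Base case: one tuple, wrapped in a list so results can be concatenated
    return [(start, end)]


# Made-up stand-in for the query: treat ranges spanning 3+ days as "sampled"
days = [date(2017, 1, 1) + timedelta(d) for d in range(8)]
result = unsampled_ranges(days, lambda s, e: (e - s).days >= 3)
print(result)
```

With these stand-ins, the eight days are split into four two-day ranges.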

python, datetime.date: difference between two days

I'm playing around with two datetime.date objects (http://docs.python.org/library/datetime.html#datetime.date).
I would like to calculate all the days between them, assuming that date 1 >= date 2, and print them out. Here is an example of what I would like to achieve, but I don't think it is efficient at all. Is there a better way to do this?
# i think +2 because this calc gives only days between the two days,
# i would like to include them
daysDiff = (dateTo - dateFrom).days + 2
while (daysDiff > 0):
    rptDate = dateFrom.today() - timedelta(days=daysDiff)
    print(rptDate.strftime('%Y-%m-%d'))
    daysDiff -= 1
I don't see this as particularly inefficient, but you could make it slightly cleaner without the while loop:
delta = dateTo - dateFrom
for delta_day in range(0, delta.days+1):  # Or use xrange in Python 2.x
    print(dateFrom + datetime.timedelta(delta_day))
(Also, notice how printing or using str on a date produces that '%Y-%m-%d' format for you for free)
It might be inefficient, however, to do it this way if you were creating a long list of days in one go instead of just printing, for example:
[dateFrom + datetime.timedelta(delta_day) for delta_day in range(0, delta.days+1)]
This could easily be rectified by creating a generator instead of a list. Either replace [...] with (...) in the above example, or:
def gen_days_inclusive(start_date, end_date):
    delta_days = (end_date - start_date).days
    for day in range(delta_days + 1):  # use xrange in Python 2.x
        yield start_date + datetime.timedelta(day)
Whichever suits your syntax palate better.
