De-overlapping intervals - python

I'm trying to write some code to de-overlap (open) intervals. I found inspiration in Algorithm to shift overlapping intervals until no overlap is left and Possible Interview Question: How to Find All Overlapping Intervals.
Intervals represent physical entities. Their position was estimated by some means, but this imperfect estimate results in overlapping positions of these physical entities. However, in reality these physical entities cannot occupy the same space, so I'm using this code to readjust their positions. This adjustment should move these physical entities as little as possible and maintain their estimated relative positions as much as possible. Since these are physical entities, the length of each interval cannot change.
Code works well in most cases but hangs in some cases like this:
intervals = [(0, 8), (9, 13), (11, 14), (15, 21)]
Here's my python code. Any suggestions?
def intervalLength(interval):
    '''
    Finds the length of an interval, supplied as a tuple
    '''
    return interval[1]-interval[0]
def findOverlappingIntervals(intervals):
    # https://stackoverflow.com/questions/4542892/possible-interview-question-how-to-find-all-overlapping-intervals?rq=1
    # Throw the endpoints of the intervals into an array, marking them as either start- or end-points.
    # Sort them by breaking ties by placing end-points before start-points if the intervals are closed, or the other way around if they're half-open.
    '''
    Takes a list of intervals and returns the intervals that overlap.
    List returned has nested list composed of the intervals that overlap with each other
    Assumes list is ordered by the start position of the intervals
    '''
    end_points = []
    for n,i in enumerate(intervals):
        end_points.append((i[0],'b',n)) #'b' = beginning of interval
        end_points.append((i[1],'e',n)) #'e' = end of interval
    end_points.sort()
    b = 0
    e = 0
    overlapping = [set()]
    open_intervals = set()
    for ep,sORe,i in end_points:
        if sORe == 'b':
            b += 1
            open_intervals.add(i)
            if b-e > 1 and i in open_intervals:
                overlapping[-1].update(open_intervals)
            elif len(overlapping[-1]) > 0:
                overlapping.append(set())
        else:
            e += 1
            open_intervals.remove(i)
    overlapping = [o for o in overlapping if len(o) > 0]
    overlapping = [[intervals[i] for i in o] for o in overlapping]
    return overlapping
def deOverlap(intervals):
    '''
    Takes a list of overlapping intervals and returns a new list with updated positions
    without overlap and intervals separated by 1, maintaining the previous center
    '''
    # Find center of overlapping intervals
    avg = (sum(a+b for a,b in intervals)+len(intervals)-1)/(len(intervals)*2.0)
    # Find the total length of the overlapping intervals
    tot_length = sum(intervalLength(i) for i in intervals) + len(intervals) - 1
    # Find new start position for the overlapping intervals
    new_start = int(round(avg-(tot_length/2.0)))
    # Place first interval in new position
    non_over_intervals = [(new_start, new_start+intervalLength(intervals[0]))]
    # Place rest of intervals in new positions
    for i in intervals[1:]:
        start = non_over_intervals[-1][1]+1
        non_over_intervals.append((start,start+intervalLength(i)))
    return non_over_intervals
def deOverlapIntervals(intervals):
    '''
    Takes a list of intervals and returns a list with the same intervals with no overlap and
    located as close to the original locations as possible
    '''
    non_over_intervals = intervals
    i = 0
    while len(findOverlappingIntervals(non_over_intervals)) > 0:
        if i >= 10000:
            print('Tried 10,000 times and did not finish de-overlapping. Returning best I could do')
            return non_over_intervals
        overlapping_intervals = findOverlappingIntervals(non_over_intervals)
        non_over_intervals = set(non_over_intervals) - set([oi for group in overlapping_intervals for oi in group])
        for oi in overlapping_intervals:
            non_overlapping = deOverlap(oi)
            non_over_intervals.update(non_overlapping)
        non_over_intervals = list(non_over_intervals)
        non_over_intervals.sort()
        i += 1
    return non_over_intervals
These intervals run fine
intervals = [(-5,-1), (0,6), (7,11), (9,14), (12,17), (21,24), (27,32), (32,36), (39,41)]
These don't. They hang because intervals keep shifting and overlapping with the left or the right end
intervals = [(0, 8), (9, 13), (11, 14), (15, 21)]
non_over_intervals = deOverlapIntervals(intervals)

Can't you do it a lot more simply? Like this:
result = intervals[:1]
for begin, end in intervals[1:]:
    if begin <= result[-1][1]:
        result[-1] = (result[-1][0], end)
    else:
        result.append((begin, end))

Like this?
intervals = [(0, 8), (9, 13), (11, 14), (15, 21)]
intervals = [list(t) for t in intervals]
for i in range(len(intervals) - 1):
    offset = intervals[i + 1][0] - intervals[i][1]
    if offset < 1:
        diff = intervals[i + 1][1] - intervals[i + 1][0]
        intervals[i + 1][0] = intervals[i][1] + 1
        intervals[i + 1][1] = intervals[i + 1][0] + diff
print(intervals)

I found that the hang happens when shifting a group of overlapping intervals makes it overlap a neighboring interval. Including both the original overlapping intervals and the intervals that the shift would collide with in a single deOverlapIntervals run fixes the problem.
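One way to sketch that idea without repeated passes is to keep a stack of clusters and merge a cluster with its left neighbor whenever its re-centered placement would collide with it. The names place_cluster, de_overlap and gap below are hypothetical and not part of the original code. The sketch assumes integer endpoints, a required separation of 1 as in deOverlap, and it centers each cluster on the mean of its members' midpoints, which is a simplification of the original centering formula.

```python
def place_cluster(cluster, gap):
    """Pack the intervals of one cluster side by side, centered on the
    mean of their original midpoints (assumption: integer endpoints)."""
    n = len(cluster)
    total = sum(b - a for a, b in cluster) + gap * (n - 1)
    center = sum(a + b for a, b in cluster) / (2 * n)
    start = round(center - total / 2)
    placed = []
    for a, b in cluster:
        placed.append((start, start + (b - a)))  # length is preserved
        start = placed[-1][1] + gap
    return placed


def de_overlap(intervals, gap=1):
    """Return shifted copies of the intervals, in start order, such that
    consecutive intervals are at least `gap` apart."""
    clusters = []  # each entry: (original members, placed intervals)
    for interval in sorted(intervals):
        members = [interval]
        placed = place_cluster(members, gap)
        # Merge with the previous cluster while the placement collides with it.
        while clusters and placed[0][0] < clusters[-1][1][-1][1] + gap:
            prev_members, _ = clusters.pop()
            members = prev_members + members
            placed = place_cluster(members, gap)
        clusters.append((members, placed))
    return [iv for _, placed in clusters for iv in placed]


print(de_overlap([(0, 8), (9, 13), (11, 14), (15, 21)]))
# [(-1, 7), (8, 12), (13, 16), (17, 23)]
```

Each merge removes one cluster from the stack, so the loop always terminates, which is the property the iterative version lacks.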

Related

Merge overlapping sessions, how do I find the end value of the session? [Leetcode Similar to Car Pooling]

I have the following code block which figures out the number of overlapping sessions. Given different intervals, the task is to print the maximum number of overlap among these intervals at any time and also to find the overlapped interval.
def overlap(v):
    # variable to store the maximum
    # count
    ans = 0
    count = 0
    data = []
    # storing the x and y
    # coordinates in data vector
    for i in range(len(v)):
        # pushing the x coordinate
        data.append([v[i][0], 'x'])
        # pushing the y coordinate
        data.append([v[i][1], 'y'])
    # sorting of ranges
    data = sorted(data)
    # Traverse the data vector to
    # count number of overlaps
    for i in range(len(data)):
        # if x occur it means a new range
        # is added so we increase count
        if (data[i][1] == 'x'):
            count += 1
        # if y occur it means a range
        # is ended so we decrease count
        if (data[i][1] == 'y'):
            count -= 1
        # updating the value of ans
        # after every traversal
        ans = max(ans, count)
    # printing the maximum value
    print(ans)
# Driver code
v = [[ 1, 2 ], [ 2, 4 ], [ 3, 6 ],[3,8]]
overlap(v)
This returns 3.
But what would be the best way to also return the maximum overlapping interval by modifying my existing approach? In this case it should be [3,4].
You could use the counter object (from collections) to create a list of intersecting sub intervals and count the number of original intervals that intersect with them. Each interval in your list would be intersected with all the sub-intervals found so far in order to accumulate the counts:
v = [[ 1, 2 ], [ 2, 4 ], [ 3, 6 ],[3,8]]
from collections import Counter
overCounts = Counter()
for vStart,vEnd in v:
    overlaps = [(max(s,vStart),min(e,vEnd)) for s,e in overCounts
                if s<=vEnd and e>=vStart]
    overCounts += Counter(overlaps + [(vStart,vEnd)])
interval,count = overCounts.most_common(1)[0]
print(interval,count) # (3,4) 3
The overlaps list detects intersections with the sub-intervals found so far. s<=vEnd and e>=vStart will return True when interval (s,e) intersects with interval (vStart,vEnd). For those intervals that do intersect we want the start and end of the intersection (sub-interval). The intersection will start at the largest beginning and end at the smallest end. So we take the max() of the start positions with the min() of the end positions to form the sub-interval: (max(s,vStart),min(e,vEnd))
             vStart            vEnd
             [--------------------]
[--------------------------]
s                          e
             [-------------]
      --max->               <----min-----
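As a small illustration of that max/min rule, here is a sketch of a standalone helper. The name intersect is hypothetical, and the intervals are assumed to be closed (start, end) pairs, as in the code above.

```python
def intersect(a, b):
    """Return the closed intersection of intervals a and b, or None if
    they do not intersect."""
    start = max(a[0], b[0])  # the intersection starts at the largest beginning
    end = min(a[1], b[1])    # and ends at the smallest end
    return (start, end) if start <= end else None


print(intersect((2, 4), (3, 6)))  # (3, 4)
print(intersect((1, 2), (3, 6)))  # None
```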
[EDIT]
To be honest, I like your original approach better than mine. It will respond in O(NLogN) time whereas mine could go up to O(N^2) depending on the data.
In order to capture the sub-interval corresponding to the result in your original approach, you would need to add a variable to keep track of the last starting position encountered and move detection of a higher count inside the 'y' condition.
For example:
lastStart = maxStart = maxEnd = None
# ...
    if (data[i][1] == 'x'):
        lastStart = data[i][0] # last start of sub-interval
        count += 1
    if (data[i][1] == 'y'):
        if count > ans: # detect a greater overlap
            maxStart = lastStart # start of corresponding sub-interval
            maxEnd = data[i][0]
            ans = count
        count -= 1
    # ans = max(ans, count) <-- removed
# ...
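Put together as a complete function, the modified sweep might look like the sketch below. The name max_overlap and the 0/1 event markers are choices made for this illustration (they play the role of 'x' and 'y' above), and intervals are treated as closed, so a start sorts before an end at the same position.

```python
def max_overlap(intervals):
    """Return (maximum overlap count, (start, end) of one sub-interval
    where that count is reached)."""
    events = []
    for start, end in intervals:
        events.append((start, 0))  # 0 marks a start, sorts before an end
        events.append((end, 1))    # 1 marks an end
    events.sort()

    best = count = 0
    best_interval = None
    last_start = None
    for point, kind in events:
        if kind == 0:
            last_start = point     # last start of a sub-interval
            count += 1
        else:
            if count > best:       # detect a greater overlap before closing
                best = count
                best_interval = (last_start, point)
            count -= 1
    return best, best_interval


print(max_overlap([[1, 2], [2, 4], [3, 6], [3, 8]]))  # (3, (3, 4))
```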
You could also implement it using accumulate:
v = [[ 1, 2 ], [ 2, 4 ], [ 3, 6 ],[3,8]]
from itertools import accumulate
edges = sorted((p,e) for i in v for p,e in zip(i,(-1,1)))
counts = accumulate(-e for _,e in edges)
starts = accumulate((p*(e<0) for p,e in edges),max)
count,start,end = max((c+1,s,p) for c,s,(p,e) in zip(counts,starts,edges) if e>0)
print(count,[start,end]) # 3 [3, 4]

Excluding intervals from a real number line

A long time ago I asked the following question:
Python: delete substring by indices
Last week I was asked a very similar question, but with continuous real-number line.
Imagine you are given an interval (X, Y), and a bunch of sub-intervals blocks=[(x1, y1), (x2, y2), ...]. Your goal is to find a list of intervals remaining=[(a1, b1), (a2, b2), ..] that is (1) in (X, Y), but (2) not in any of the sub-intervals in blocks. The intervals in blocks can overlap.
In other words, the function signature looks something like:
def delete_blocks_from_interval(X, Y, blocks):
    """
    X: start of the given interval
    Y: end of the given interval
    blocks: list of intervals (x, y) to be removed, can overlap
    returns remaining = [(a, b), ...] intervals remaining after removal of blocks
    """
    pass
I can construct a graph of connected intervals in blocks, and find both the minimum of the start and the maximum of end for each connected component in the graph. But this is quadratic in length of blocks. I wonder if there is a better-runtime algorithm.
Please also discuss what code routine you think is more efficient for the algorithm if you will.
Many many thanks.
As requested, please consider the following illustrative inputs:
X = -1
Y = 20
blocks = [(1, 10), (4, 5), (9, 11), (16, 17.5)]
the expected output is remaining = [(-1, 1), (11, 16), (17.5, 20)]
Once the blocks are sorted, this is linear run-time in the number of blocks (the sort itself is O(n log n)):
def delete_blocks_from_interval(range_start, range_end, blocks):
    # keep only the blocks that intersect the interval,
    # clipped to its end points and sorted by start
    blocks = sorted((max(s, range_start), min(e, range_end))
                    for s, e in blocks
                    if s < range_end and e > range_start)
    # no block touches the interval: everything remains
    if not blocks:
        yield (range_start, range_end)
        return
    # emit the first gap, if any
    if range_start < blocks[0][0]:
        yield (range_start, blocks[0][0])
    # loop through till the end of the blocks
    end = blocks[0][1]
    for block in blocks[1:]:
        if end < block[0]:
            yield (end, block[0])
            end = block[1]
        elif end < block[1]:
            end = block[1]
    # emit the last gap, if any (end is the furthest point covered so far)
    if range_end > end:
        yield (end, range_end)
blocks = [(1, 10), (4, 5), (9, 11), (16, 17.5)]
list(delete_blocks_from_interval(-1, 20, blocks))
python-ranges is a library I wrote that excels at this particular use case. It isn't the most efficient code you could possibly write (it essentially uses @wwii's algorithm below, in fact) but it is nice and terse.
from ranges import Range, RangeSet
...
def delete_blocks_from_interval(X, Y, blocks):
    # make a Range
    orig = Range(X, Y)
    # make a RangeSet out of the 2-tuple blocks
    # (using the unpacking operator to interpret 2-tuples as positional args for Range())
    # and then find the difference from the original set
    # (like with sets, the - operator is a shorthand for .difference())
    remaining = orig - RangeSet(
        Range(*block) for block in blocks
    )
    # return each range in the RangeSet as a tuple
    return [(rng.start, rng.end) for rng in remaining.ranges()]
Would something like this work?
def delete_blocks_from_interval(a, b, blocks):
    sorted_blocks = sorted(blocks)
    for i, (c, d) in enumerate(sorted_blocks):
        if a <= c <= b:
            yield (a, c)
        if d > a:
            a = d
        for (e, f) in sorted_blocks[i + 1:]:
            if e <= a <= f:
                a = f
            elif e > a:
                break
    if a <= d <= b:
        yield (d, b)
blocks = ((1, 10), (4, 5), (9, 11), (16, 17.5))
print(list(delete_blocks_from_interval(-1, 20, blocks)))
# (-1, 1), (11, 16), (17.5, 20)
Sort the blocks;
check if there is a gap between X and item 0 of the first block; save if there is;
iterate over blocks in pairs,
    get the first and second item
    if item 1 of the first block is less than item 0 of the second block,
        save this interval (first[1],second[0]) ;
    or if they overlap,
        compare the last items,
        if the last item of the second tuple is bigger than the last item of the first - expand the first tuple range,
    get the third/next item, repeat the comparison(s)
repeat until you get to the end of the (X,Y) interval or you run out of blocks.

how to group near points in list in python

I am getting a list using a list comprehension. Let's say I am getting this list using the code below:
quality, angle, distance = measurements[i]
new_data = [each_value for each_value in measurements[i:i + 20] if angle <= each_value[1] <= angle + 30 and
distance - 150 <= each_value[2] <= distance + 150]
where measurements is a big data set that contains (quality, angle, distance) tuples. From that, I am getting these values:
desired_list = [(1,2,3), (1,5,3), (1,8,3), (1,10,3), (1,16,3), (1,17,3)]
Now, how can I add a new condition to my list comprehension so that I only get a value if its angle is within some offset of the previous one? Let's say that if the difference between two consecutive angles is less than or equal to 5, they go into desired_list.
With this condition my list should look like this:
desired_list = [(1,2,3), (1,5,3), (1,8,3), (1,10,3)]
because from 2 to 5, 5 to 8, and 8 to 10 the difference is less than or equal to 5.
The last two points are not included because they break the condition after (1,10,3), so nothing after that point needs to be checked.
How can I achieve this? Please help me.
Note: it doesn't need to be in the same list comprehension.
You mention the data set is large. Depending on how large, you may wish to avoid building a new list from scratch and just search for the relevant index.
data = [(1,2,3), (1,5,3), (1,8,3), (1,10,3), (1,16,3), (1,17,3)]
MAXIMUM_ANGLE = 5
def angles_within_range(x, y):
    return abs(x[1] - y[1]) <= MAXIMUM_ANGLE
def first_angle_break_index():
    for i in range(len(data) - 1):
        if not angles_within_range(data[i], data[i+1]):
            return i+1
def valid_angles_list():
    return data[:first_angle_break_index()]
print(valid_angles_list())
If you mean traversing from start to end and breaking out as soon as a neighboring pair breaks the rule,
here is a way without a list comprehension:
desired_list = [(1, 2, 3), (1, 5, 3), (1, 8, 3), (1, 10, 3), (1, 16, 3), (1, 17, 3)]
res = [desired_list[0]]
for a, b in zip(desired_list[:-1], desired_list[1:]):
    if abs(a[1] - b[1]) > 5:
        break
    res += [b]
print(res)
output:
[(1, 2, 3), (1, 5, 3), (1, 8, 3), (1, 10, 3)]
If you insist on using a list comprehension with a break, here is a solution that records the last accepted item:
res = [last.pop() and last.append(b) or b for last in [[desired_list[0]]] for a, b in
zip([desired_list[0]] + desired_list, desired_list) if abs(a[1] - b[1]) <= 5 and a == last[0]]
Another version uses an end flag:
res = [b for end in [[]] for a, b in zip([desired_list[0]] + desired_list, desired_list) if
(False if end or abs(a[1] - b[1]) <= 5 else end.append(42)) or not end and abs(a[1] - b[1]) <= 5]
Note: This is a bad idea. (just for fun : ))

Python removing intersection from list 2 out of list 1 [duplicate]

My problem is as follows:
having file with list of intervals:
1 5
2 8
9 12
20 30
And a range of
0 200
I would like to report the positions [start end] of the gaps between my intervals inside the given range.
For example:
8 9
12 20
30 200
Besides ideas on how to tackle this, it would also be nice to read some thoughts on optimization, since, as always, the input files are going to be huge.
This solution works as long as the intervals are ordered by start point, and it does not require creating a list as big as the total range.
code
with open("0.txt") as f:
    t=[x.rstrip("\n").split("\t") for x in f.readlines()]
    intervals=[(int(x[0]),int(x[1])) for x in t]
def find_ints(intervals, mn, mx):
    next_start = mn
    for x in intervals:
        if next_start < x[0]:
            yield next_start,x[0]
            next_start = x[1]
        elif next_start < x[1]:
            next_start = x[1]
    if next_start < mx:
        yield next_start, mx
print(list(find_ints(intervals, 0, 200)))
output:
(in the case of the example you gave)
[(0, 1), (8, 9), (12, 20), (30, 200)]
Rough algorithm:
create an array of booleans, all set to false seen = [False]*200
Iterate over the input file, for each line start end set seen[start] .. seen[end] to be True
Once done, then you can trivially walk the array to find the unused intervals.
In terms of optimisations, if the list of input ranges is sorted on start number, then you can track the highest seen number and use that to filter ranges as they are processed -
e.g. something like
for (start,end) in input:
    if end<=lowest_unseen:
        continue
    if start<lowest_unseen:
        start=lowest_unseen
    ...
which (ignoring the cost of the original sort) should make the whole thing O(n) - you go through the array once to tag seen/unseen and once to output unseens.
Seems I'm feeling nice. Here is the (unoptimised) code, assuming your input file is called input
seen = [False]*200
file = open('input','r')
rows = file.readlines()
for row in rows:
    (start,end) = row.split(' ')
    print("%s %s" % (start,end))
    for x in range( int(start)-1, int(end)-1 ):
        seen[x] = True
print(seen[0:10])
in_unseen_block=False
start=1
for x in range(1,200):
    val=seen[x-1]
    if val and not in_unseen_block:
        continue
    if not val and in_unseen_block:
        continue
    # Must be at a change point.
    if val:
        # we have reached the end of the block
        print("%s %s" % (start,x))
        in_unseen_block = False
    else:
        # start of new block
        start = x
        in_unseen_block = True
# Handle end block
if in_unseen_block:
    print("%s %s" % (start, 200))
I'm leaving the optimizations as an exercise for the reader.
If you make a note every time that one of your input intervals either opens or closes, you can do what you want by putting together the keys of opens and closes, sort into an ordered set, and you'll be able to essentially think, "okay, let's say that each adjacent pair of numbers forms an interval. Then I can focus all of my logic on these intervals as discrete chunks."
myRange = range(201)
intervals = [(1,5), (2,8), (9,12), (20,30)]
opens = {}
closes = {}
def open(index):
    if index not in opens:
        opens[index] = 0
    opens[index] += 1
def close(index):
    if index not in closes:
        closes[index] = 0
    closes[index] += 1
for start, end in intervals:
    if end > start: # Making sure to exclude empty intervals, which can be problematic later
        open(start)
        close(end)
# Sort all the interval-endpoints that we really need to look at
oset = {0:None, 200:None}
for k in opens.keys():
    oset[k] = None
for k in closes.keys():
    oset[k] = None
relevant_indices = sorted(oset.keys())
# Find the clear ranges
state = 0
results = []
for i in range(len(relevant_indices) - 1):
    start = relevant_indices[i]
    end = relevant_indices[i+1]
    start_state = state
    if start in opens:
        start_state += opens[start]
    if start in closes:
        start_state -= closes[start]
    end_state = start_state
    if end in opens:
        end_state += opens[end]
    if end in closes:
        end_state -= closes[end]
    state = end_state
    if start_state == 0:
        result_start = start
        result_end = end
        results.append((result_start, result_end))
for start, end in results:
    print(str(start) + " " + str(end))
This outputs:
0 1
8 9
12 20
30 200
The intervals don't need to be sorted.
This question seems to be a duplicate of Merging intervals in Python.
If I understood well the problem, you have a list of intervals (1 5; 2 8; 9 12; 20 30) and a range (0 200), and you want to get the positions outside your intervals, but inside given range. Right?
There's a Python library that can help you on that: python-intervals (also available from PyPI using pip). Disclaimer: I'm the maintainer of that library.
Assuming you import this library as follows:
import intervals as I
It's quite easy to get your answer. Basically, you first want to create a disjunction of intervals based on the ones you provide:
inters = I.closed(1, 5) | I.closed(2, 8) | I.closed(9, 12) | I.closed(20, 30)
Then you compute the complement of these intervals, to get everything that is "outside":
compl = ~inters
Then you intersect the result with [0, 200], as you want to restrict the points to that interval:
print(compl & I.closed(0, 200))
This results in:
[0,1) | (8,9) | (12,20) | (30,200]

Merging Overlapping Intervals in Python [duplicate]

This question already has answers here:
Merging Overlapping Intervals
(4 answers)
Closed last year.
I am trying to solve a problem in which overlapping intervals need to be merged.
The question is:
Given a collection of intervals, merge all overlapping intervals.
For example, Given [1,3],[2,6],[8,10],[15,18], return [1,6],[8,10],[15,18].
I tried my solution:
# Definition for an interval.
# class Interval:
#     def __init__(self, s=0, e=0):
#         self.start = s
#         self.end = e
class Solution:
    def merge(self, intervals):
        """
        :type intervals: List[Interval]
        :rtype: List[Interval]
        """
        start = sorted([x.start for x in intervals])
        end = sorted([x.end for x in intervals])
        merged = []
        j = 0
        new_start = 0
        for i in range(len(start)):
            if start[i]<end[j]:
                continue
            else:
                j = j + 1
                merged.append([start[new_start], end[j]])
                new_start = i
        return merged
However it is clearly missing the last interval as:
Input : [[1,3],[2,6],[8,10],[15,18]]
Answer :[[1,6],[8,10]]
Expected answer: [[1,6],[8,10],[15,18]]
Not sure how to include the last interval as overlap can only be checked in forward mode.
How to fix my algorithm so that it works till the last slot?
Your code implicitly already assumes the starts and ends to be sorted, so that sort could be left out. To see this, try the following intervals:
intervals = [[3,9],[2,6],[8,10],[15,18]]
start = sorted([x[0] for x in intervals])
end = sorted([x[1] for x in intervals]) #mimicking your start/end lists
merged = []
j = 0
new_start = 0
for i in range(len(start)):
    if start[i]<end[j]:
        continue
    else:
        j = j + 1
        merged.append([start[new_start], end[j]])
        new_start = i
print(merged) #[[2, 9], [8, 10]]
Anyway, the best way to do this is probably recursion, here shown for a list of lists instead of Interval objects.
def recursive_merge(inter, start_index = 0):
    for i in range(start_index, len(inter) - 1):
        if inter[i][1] > inter[i+1][0]:
            new_start = inter[i][0]
            # take the larger end so an interval nested inside another is handled
            new_end = max(inter[i][1], inter[i+1][1])
            inter[i] = [new_start, new_end]
            del inter[i+1]
            return recursive_merge(inter.copy(), start_index=i)
    return inter
sorted_on_start = sorted(intervals)
merged = recursive_merge(sorted_on_start.copy())
print(merged) #[[2, 10], [15, 18]]
I know the question is old, but in case it might help, I wrote a Python library to deal with (set of) intervals. Its name is portion and makes it easy to merge intervals:
>>> import portion as P
>>> inputs = [[1,3],[2,6],[8,10],[15,18]]
>>> # Convert each input to an interval
>>> intervals = [P.closed(a, b) for a, b in inputs]
>>> # Merge these intervals
>>> merge = P.Interval(*intervals)
>>> merge
[1,6] | [8,10] | [15,18]
>>> # Output as a list of lists
>>> [[i.lower, i.upper] for i in merge]
[[1,6],[8,10],[15,18]]
Documentation can be found here: https://github.com/AlexandreDecan/portion
We can sort the intervals by their start value and build the merged list in place, in the same list, by checking the intervals one by one instead of appending to a second list. The loop variable i visits every interval, and interval_index points at the merged interval currently being extended.
x =[[1,3],[2,6],[8,10],[15,18]]
#y = [[1,3],[2,6],[8,10],[15,18],[19,25],[20,26],[25,30], [32,40]]
def merge_intervals(intervals):
    sorted_intervals = sorted(intervals, key=lambda x: x[0])
    interval_index = 0
    #print(sorted_intervals)
    for i in sorted_intervals:
        if i[0] > sorted_intervals[interval_index][1]:
            interval_index += 1
            sorted_intervals[interval_index] = i
        else:
            # keep the larger end so a nested interval cannot shrink the merged one
            sorted_intervals[interval_index] = [sorted_intervals[interval_index][0],
                                                max(sorted_intervals[interval_index][1], i[1])]
    #print(sorted_intervals)
    return sorted_intervals[:interval_index+1]
print(merge_intervals(x)) #-->[[1, 6], [8, 10], [15, 18]]
#print ("------------------------------")
#print(merge_intervals(y)) #-->[[1, 6], [8, 10], [15, 18], [19, 30], [32, 40]]
This is very old now, but in case anyone stumbles across this, I thought I'd throw in my two cents, since I wasn't completely happy with the answers above.
I'm going to preface my solution by saying that when I work with intervals, I prefer to convert them to python3 ranges (probably an elegant replacement for your Interval class) because I find them easy to work with. However, you need to remember that ranges are half-open like everything else in Python, so the stop coordinate is not "inside" of the interval. Doesn't matter for my solution, but something to keep in mind.
My own solution:
# Start by converting the intervals to ranges.
my_intervals = [[1, 3], [2, 6], [8, 10], [15, 18]]
my_ranges = [range(start, stop) for start, stop in my_intervals]
# Next, define a check which will return True if two ranges overlap.
# The double inequality approach means that your intervals don't
# need to be sorted to compare them.
def overlap(range1, range2):
    if range1.start <= range2.stop and range2.start <= range1.stop:
        return True
    return False
# Finally, the actual function that returns a list of merged ranges.
def merge_range_list(ranges):
    ranges_copy = sorted(ranges.copy(), key=lambda x: x.stop)
    ranges_copy = sorted(ranges_copy, key=lambda x: x.start)
    merged_ranges = []
    while ranges_copy:
        range1 = ranges_copy[0]
        del ranges_copy[0]
        merges = [] # This will store the position of ranges that get merged.
        for i, range2 in enumerate(ranges_copy):
            if overlap(range1, range2): # Use our premade check function.
                range1 = range(min([range1.start, range2.start]), # Overwrite with merged range.
                               max([range1.stop, range2.stop]))
                merges.append(i)
        merged_ranges.append(range1)
        # Time to delete the ranges that got merged so we don't use them again.
        # This needs to be done in reverse order so that the index doesn't move.
        for i in reversed(merges):
            del ranges_copy[i]
    return merged_ranges
print(merge_range_list(my_ranges)) # --> [range(1, 6), range(8, 10), range(15, 18)]
Make pairs for every endpoint: (value; kind = +1 for the start or -1 for the end of an interval).
Sort them by value. In case of a tie, put the pair with +1 (the start) first if you need to merge intervals with coinciding ends like 0-1 and 1-2; put the pair with -1 first if such intervals should stay separate.
Make CurrCount = 0, walk through the sorted list, adding kind to CurrCount.
Start a new resulting interval when CurrCount becomes nonzero, finish the interval when CurrCount becomes zero.
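A minimal sketch of this endpoint sweep follows. The function name merge_by_sweep and the merge_touching flag are hypothetical, introduced only for this illustration. It assumes every interval has start <= end (strictly less when merge_touching is False). With merge_touching=True a start is processed before an end at the same value, so the running count never drops to zero between touching intervals and they are joined.

```python
def merge_by_sweep(intervals, merge_touching=True):
    """Merge overlapping intervals by sweeping over their endpoints."""
    events = []
    for start, end in intervals:
        events.append((start, +1))  # +1 opens an interval
        events.append((end, -1))    # -1 closes an interval
    if merge_touching:
        # at equal values, starts (+1) come before ends (-1)
        events.sort(key=lambda ev: (ev[0], -ev[1]))
    else:
        # at equal values, ends (-1) come before starts (+1)
        events.sort(key=lambda ev: (ev[0], ev[1]))

    merged = []
    count = 0
    current_start = None
    for value, kind in events:
        if count == 0 and kind == +1:
            current_start = value   # count is about to become nonzero
        count += kind
        if count == 0:
            merged.append([current_start, value])  # count is back to zero
    return merged


print(merge_by_sweep([[1, 3], [2, 6], [8, 10], [15, 18]]))
# [[1, 6], [8, 10], [15, 18]]
print(merge_by_sweep([[0, 1], [1, 2]]))                        # [[0, 2]]
print(merge_by_sweep([[0, 1], [1, 2]], merge_touching=False))  # [[0, 1], [1, 2]]
```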
Late to the party, but here is my solution. I typically find recursion with an invariant easier to conceptualize. In this case, the invariant is that the head is always merged, and the tail is always waiting to be merged, and you compare the last element of head with the first element of tail.
One should definitely use sorted with the key argument rather than using a list comprehension.
Not sure how efficient this is with slicing and concatenating lists.
def _merge(head, tail):
    if tail == []:
        return head
    a, b = head[-1]
    x, y = tail[0]
    do_merge = b > x
    if do_merge:
        head_ = head[:-1] + [(a, max(b, y))]
        tail_ = tail[1:]
        return _merge(head_, tail_)
    else:
        head_ = head + tail[:1]
        tail_ = tail[1:]
        return _merge(head_, tail_)
def merge_intervals(lst):
    if len(lst) <= 1:
        return lst
    lst = sorted(lst, key=lambda x: x[0])
    return _merge(lst[:1], lst[1:])
