Intersections of intervals - python

I understand that start[i] (first list) & end[i](second list) are creating a new output as new list but how to iterate through the twolist with a for loop or comprehension list? Can someone provide a more efficient code?
Here is the scenario:
You are given two lists of closed intervals, firstList and secondList, where firstList[i] = [starti, endi] and secondList[j] = [startj, endj]. Each list of intervals is pairwise disjoint and in sorted order.
Return the intersection of these two interval lists.
A closed interval [a, b] (with a <= b) denotes the set of real numbers x with a <= x <= b.
The intersection of two closed intervals is a set of real numbers that are either empty or represented as a closed interval. For example, the intersection of [1, 3] and [2, 4] is [2, 3].
Example 1:
  Input:
firstList = [[0,2],[5,10],[13,23],[24,25]]
secondList = [[1,5],[8,12],[15,24],[25,26]]
  Output:
[[1,2],[5,5],[8,10],[15,23],[24,24],[25,25]]
Example 2:
  Input:
firstList = [[1,3],[5,9]]
secondList = []
  Output:
[]
This is what I have but I want less lines of code:
l = intervals[0][0]
r = intervals[0][1]
for i in range(1,N):
# If no intersection exists
if (intervals[i][0] > r or intervals[i][1] < l):
print(-1)
# Else update the intersection
else:
l = max(l, intervals[i][0])
r = min(r, intervals[i][1])

ended up with this:
what do you think?
def intervalIntersection(self, A: List[List[int]], B: List[List[int]]) -> List[List[int]]:
ans = []
i = j = 0
while i < len(A) and j < len(B):
# Let's check if A[i] intersects B[j].
# lo - the startpoint of the intersection
# hi - the endpoint of the intersection
lo = max(A[i][0], B[j][0])
hi = min(A[i][1], B[j][1])
if lo <= hi:
ans.append([lo, hi])
# Remove the interval with the smallest endpoint
if A[i][1] < B[j][1]:
i += 1
else:
j += 1
return ans

intervals1 = [[0,2],[5,10],[13,23],[24,25]]
intervals2 = [[1,5],[8,12],[15,24],[25,26]]
def get_intersection(interval1, interval2):
new_min = max(interval1[0], interval2[0])
new_max = min(interval1[1], interval2[1])
return [new_min, new_max] if new_min <= new_max else None
interval_intersection = [x for x in (get_intersection(y, z) for y in intervals1 for z in intervals2) if x is not None]
print(interval_intersection)
Try it at www.mycompiler.io
This assumes that the intervals are disjoint but would work regardless of ordering. It is very slightly inefficient in that once you have found one matching intersection you don't need to keep checking all the others.

You could merge the values of the two list with a flag identifying starts (1) and ends (3) of ranges. Sort the values and then go through them accumulating the flags (+1 for starts, -1 for ends) to determine if both lists have a range covering each value:
firstList = [[0,2],[5,10],[13,23],[24,25]]
secondList = [[1,5],[8,12],[15,24],[25,26]]
breaks = sorted((n,f) for a,b in firstList+secondList for n,f in [(a,1),(b,3)])
merged,s = [[]],0
for n,flag in breaks:
s += 2-flag
if s>=2 or s==1 and flag>1: merged[-1].append(n)
if s<2 and merged[-1]: merged.append([])
merged = merged[:-1]
print(merged)
[[1, 2], [5, 5], [8, 10], [15, 23], [24, 24], [25, 25]]
Note that I'm assuming that the ranges never overlap within a given list

Related

Implementing Merge Sort algorithm

def merge(arr,l,m,h):
lis = []
l1 = arr[l:m]
l2 = arr[m+1:h]
while((len(l1) and len(l2)) is not 0):
if l1[0]<=l2[0]:
x = l1.pop(0)
else:
x = l2.pop(0)
lis.append(x)
return lis
def merge_sort(arr,l,h): generating them
if l<h:
mid = (l+h)//2
merge_sort(arr,l,mid)
merge_sort(arr,mid+1,h)
arr = merge(arr,l,mid,h)
return arr
arr = [9,3,7,5,6,4,8,2]
print(merge_sort(arr,0,7))
Can anyone please enlighten where my approach is going wrong ?
I get only [6,4,8] as the answer. I'm trying to understand the algo and implement the logic my own way. Please help.
Several issues:
As you consider h to be the last index of the sublist, then realise that when slicing a list, the second index is the one following the intended range. So change this:
Wrong
Right
l1 = arr[l:m]
l1 = arr[l:m+1]
l2 = arr[m+1:h]
l2 = arr[m+1:h+1]
As merge returns the result for a sub list, you should not assign it to arr. arr is supposed to be the total list, so you should only replace a part of it:
arr[l:h+1] = merge(arr,l,mid,h)
As the while loop requires that both lists are not empty, you should still consider the case where after the loop one of the lists is still not empty: its elements should be added to the merged result. So replace the return statement to this:
return lis + l1 + l2
It is not advised to compare integers with is or is not, which you do in the while condition. In fact that condition can be simplified to this:
while l1 and l2:
With these changes (and correct indentation) it will work.
Further remarks:
This implementation is not efficient. pop(0) has a O(n) time complexity. Use indexes that you update during the loop, instead of really extracting the values out the lists.
It is more pythonic to let h and m be the indices after the range that they close, instead of them being the indices of the last elements within the range they close. So if you go that way, then some of the above points will be resolved differently.
Corrected implementation
Here is your code adapted using all of the above remarks:
def merge(arr, l, m, h):
lis = []
i = l
j = m
while i < m and j < h:
if arr[i] <= arr[j]:
x = arr[i]
i += 1
else:
x = arr[j]
j += 1
lis.append(x)
return lis + arr[i:m] + arr[j:h]
def merge_sort(arr, l, h):
if l < h - 1:
mid = (l + h) // 2
merge_sort(arr, l, mid)
merge_sort(arr, mid, h)
arr[l:h] = merge(arr, l, mid, h)
return arr
arr = [9, 3, 7, 5, 6, 4, 8, 2]
print(merge_sort(arr,0,len(arr)))

What is the most efficient way of getting the intersection of k sorted arrays?

Given k sorted arrays what is the most efficient way of getting the intersection of these lists
Example
INPUT:
[[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]
Output:
[1,7]
There is a way to get the union of k sorted arrays based on what I read in the Elements of programming interviews book in nlogk time. I was wondering if there is a way to do something similar for the intersection as well
## merge sorted arrays in nlogk time [ regular appending and merging is nlogn time ]
import heapq
def mergeArys(srtd_arys):
heap = []
srtd_iters = [iter(x) for x in srtd_arys]
# put the first element from each srtd array onto the heap
for idx, it in enumerate(srtd_iters):
elem = next(it, None)
if elem:
heapq.heappush(heap, (elem, idx))
res = []
# collect results in nlogK time
while heap:
elem, ary = heapq.heappop(heap)
it = srtd_iters[ary]
res.append(elem)
nxt = next(it, None)
if nxt:
heapq.heappush(heap, (nxt, ary))
EDIT: obviously this is an algorithm question that I am trying to solve so I cannot use any of the inbuilt functions like set intersection etc
Exploiting sort order
Here is a single pass O(n) approach that doesn't require any special data structures or auxiliary memory beyond the fundamental requirement of one iterator per input.
from itertools import cycle, islice
def intersection(inputs):
"Yield the intersection of elements from multiple sorted inputs."
# intersection(['ABBCD', 'BBDE', 'BBBDDE']) --> B B D
n = len(inputs)
iters = cycle(map(iter, inputs))
try:
candidate = next(next(iters))
while True:
for it in islice(iters, n-1):
while (value := next(it)) < candidate:
pass
if value != candidate:
candidate = value
break
else:
yield candidate
candidate = next(next(iters))
except StopIteration:
return
Here's a sample session:
>>> data = [[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]
>>> list(intersection(data))
[1, 7]
>>> data = [[1,1,2,3], [1,1,4,4]]
>>> list(intersection(data))
[1, 1]
Algorithm in words
The algorithm starts by selecting the next value from the next iterator to be a candidate.
The main loop assumes a candidate has been selected and it loops over the next n - 1 iterators. For each of those iterators, it consumes values until it finds a value that is a least as large as the candidate. If that value is larger than the candidate, that value becomes the new candidate and the main loop starts again. If all n - 1 values are equal to the candidate, then the candidate is emitted and a new candidate is fetched.
When any input iterator is exhausted, the algorithm is complete.
Doing it without libraries (core language only)
The same algorithm works fine (though less beautifully) without using itertools. Just replace cycle and islice with their list based equivalents:
def intersection(inputs):
"Yield the intersection of elements from multiple sorted inputs."
# intersection(['ABBCD', 'BBDE', 'BBBDDE']) --> B B D
n = len(inputs)
iters = list(map(iter, inputs))
curr_iter = 0
try:
it = iters[curr_iter]
curr_iter = (curr_iter + 1) % n
candidate = next(it)
while True:
for i in range(n - 1):
it = iters[curr_iter]
curr_iter = (curr_iter + 1) % n
while (value := next(it)) < candidate:
pass
if value != candidate:
candidate = value
break
else:
yield candidate
it = iters[curr_iter]
curr_iter = (curr_iter + 1) % n
candidate = next(it)
except StopIteration:
return
Yes, it is possible! I've modified your example code to do this.
My answer assumes that your question is about the algorithm - if you want the fastest-running code using sets, see other answers.
This maintains the O(n log(k)) time complexity: all the code between if lowest != elem or ary != times_seen: and unbench_all = False is O(log(k)). There is a nested loop inside the main loop (for unbenched in range(times_seen):) but this only runs times_seen times, and times_seen is initially 0 and is reset to 0 after every time this inner loop is run, and can only be incremented once per main loop iteration, so the inner loop cannot do more iterations in total than the main loop. Thus, since the code inside the inner loop is O(log(k)) and runs at most as many times as the outer loop, and the outer loop is O(log(k)) and runs n times, the algorithm is O(n log(k)).
This algorithm relies upon how tuples are compared in Python. It compares the first items of the tuples, and if they are equal it, compares the second items (i.e. (x, a) < (x, b) is true if and only if a < b).
In this algorithm, unlike in the example code in the question, when an item is popped from the heap, it is not necessarily pushed again in the same iteration. Since we need to check if all sub-lists contain the same number, after a number is popped from the heap, it's sublist is what I call "benched", meaning that it is not added back to the heap. This is because we need to check if other sub-lists contain the same item, so adding this sub-list's next item is not needed right now.
If a number is indeed in all sub-lists, then the heap will look something like [(2,0),(2,1),(2,2),(2,3)], with all the first elements of the tuples the same, so heappop will select the one with the lowest sub-list index. This means that first index 0 will be popped and times_seen will be incremented to 1, then index 1 will be popped and times_seen will be incremented to 2 - if ary is not equal to times_seen then the number is not in the intersection of all sub-lists. This leads to the condition if lowest != elem or ary != times_seen:, which decides when a number shouldn't be in the result. The else branch of this if statement is for when it still might be in the result.
The unbench_all boolean is for when all sub-lists need to be removed from the bench - this could be because:
The current number is known to not be in the intersection of the sub-lists
It is known to be in the intersection of the sub-lists
When unbench_all is True, all the sub-lists that were removed from the heap are re-added. It is known that these are the ones with indices in range(times_seen) since the algorithm removes items from the heap only if they have the same number, so they must have been removed in order of index, contiguously and starting from index 0, and there must be times_seen of them. This means that we don't need to store the indices of the benched sub-lists, only the number that have been benched.
import heapq
def mergeArys(srtd_arys):
heap = []
srtd_iters = [iter(x) for x in srtd_arys]
# put the first element from each srtd array onto the heap
for idx, it in enumerate(srtd_iters):
elem = next(it, None)
if elem:
heapq.heappush(heap, (elem, idx))
res = []
# the number of tims that the current number has been seen
times_seen = 0
# the lowest number from the heap - currently checking if the first numbers in all sub-lists are equal to this
lowest = heap[0][0] if heap else None
# collect results in nlogK time
while heap:
elem, ary = heap[0]
unbench_all = True
if lowest != elem or ary != times_seen:
if lowest == elem:
heapq.heappop(heap)
it = srtd_iters[ary]
nxt = next(it, None)
if nxt:
heapq.heappush(heap, (nxt, ary))
else:
heapq.heappop(heap)
times_seen += 1
if times_seen == len(srtd_arys):
res.append(elem)
else:
unbench_all = False
if unbench_all:
for unbenched in range(times_seen):
unbenched_it = srtd_iters[unbenched]
nxt = next(unbenched_it, None)
if nxt:
heapq.heappush(heap, (nxt, unbenched))
times_seen = 0
if heap:
lowest = heap[0][0]
return res
if __name__ == '__main__':
a1 = [[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]
a2 = [[1, 1], [1, 1, 2, 2, 3]]
for arys in [a1, a2]:
print(mergeArys(arys))
An equivalent algorithm can be written like this, if you prefer:
def mergeArys(srtd_arys):
heap = []
srtd_iters = [iter(x) for x in srtd_arys]
# put the first element from each srtd array onto the heap
for idx, it in enumerate(srtd_iters):
elem = next(it, None)
if elem:
heapq.heappush(heap, (elem, idx))
res = []
# collect results in nlogK time
while heap:
elem, ary = heap[0]
lowest = elem
keep_elem = True
for i in range(len(srtd_arys)):
elem, ary = heap[0]
if lowest != elem or ary != i:
if ary != i:
heapq.heappop(heap)
it = srtd_iters[ary]
nxt = next(it, None)
if nxt:
heapq.heappush(heap, (nxt, ary))
keep_elem = False
i -= 1
break
heapq.heappop(heap)
if keep_elem:
res.append(elem)
for unbenched in range(i+1):
unbenched_it = srtd_iters[unbenched]
nxt = next(unbenched_it, None)
if nxt:
heapq.heappush(heap, (nxt, unbenched))
if len(heap) < len(srtd_arys):
heap = []
return res
You can use builtin sets and sets intersections :
d = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
result = set(d[0]).intersection(*d[1:])
{1, 7}
You can use reduce:
from functools import reduce
a = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
reduce(lambda x, y: x & set(y), a[1:], set(a[0]))
{1, 7}
I've come up with this algorithm. It doesn't exceed O(nk) I don't know if it's good enough for you. the point of this algorithm is that you can have k indexes for each array and each iteration you find the indexes of the next element in the intersection and increase every index until you exceed the bounds of an array and there are no more items in the intersection. the trick is since the arrays are sorted you can look at two elements in two different arrays and if one is bigger than the other you can instantly throw away the other because you know you cant have a smaller number than the one you are looking at. the worst case of this algorithm is that every index will be increased to the bound which takes kn time since an index cannot decrease its value.
inter = []
for n in range(len(arrays[0])):
if indexes[0] >= len(arrays[0]):
return inter
for i in range(1,k):
if indexes[i] >= len(arrays[i]):
return inter
while indexes[i] < len(arrays[i]) and arrays[i][indexes[i]] < arrays[0][indexes[0]]:
indexes[i] += 1
while indexes[i] < len(arrays[i]) and indexes[0] < len(arrays[0]) and arrays[i][indexes[i]] > arrays[0][indexes[0]]:
indexes[0] += 1
if indexes[0] < len(arrays[0]):
inter.append(arrays[0][indexes[0]])
indexes = [idx+1 for idx in indexes]
return inter
You said we can't use sets but how about dicts / hash tables? (yes I know they're basically the same thing) :D
If so, here's a fairly simple approach (please excuse the py2 syntax):
arrays = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
counts = {}
for ar in arrays:
last = None
for i in ar:
if (i != last):
counts[i] = counts.get(i, 0) + 1
last = i
N = len(arrays)
intersection = [i for i, n in counts.iteritems() if n == N]
print intersection
Same as Raymond Hettinger's solution but with more basic python code:
def intersection(arrays, unique: bool=False):
result = []
if not len(arrays) or any(not len(array) for array in arrays):
return result
pointers = [0] * len(arrays)
target = arrays[0][0]
start_step = 0
current_step = 1
while True:
idx = current_step % len(arrays)
array = arrays[idx]
while pointers[idx] < len(array) and array[pointers[idx]] < target:
pointers[idx] += 1
if pointers[idx] < len(array) and array[pointers[idx]] > target:
target = array[pointers[idx]]
start_step = current_step
current_step += 1
continue
if unique:
while (
pointers[idx] + 1 < len(array)
and array[pointers[idx]] == array[pointers[idx] + 1]
):
pointers[idx] += 1
if (current_step - start_step) == len(arrays):
result.append(target)
for other_idx, other_array in enumerate(arrays):
pointers[other_idx] += 1
if pointers[idx] < len(array):
target = array[pointers[idx]]
start_step = current_step
if pointers[idx] == len(array):
return result
current_step += 1
Here's an O(n) answer (where n = sum(len(sublist) for sublist in data)).
from itertools import cycle
def intersection(data):
result = []
maxval = float("-inf")
consecutive = 0
try:
for sublist in cycle(iter(sublist) for sublist in data):
value = next(sublist)
while value < maxval:
value = next(sublist)
if value > maxval:
maxval = value
consecutive = 0
continue
consecutive += 1
if consecutive >= len(data)-1:
result.append(maxval)
consecutive = 0
except StopIteration:
return result
print(intersection([[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]))
[1, 7]
Some of the above methods are not covering the examples when there are duplicates in every subset of the list. The Below code implements this intersection and it will be more efficient if there are lots of duplicates in the subset of the list :) If not sure about duplicates it is recommended to use Counter from collections from collections import Counter. The custom counter function is made for increasing the efficiency of handling large duplicates. But still can not beat Raymond Hettinger's implementation.
def counter(my_list):
my_list = sorted(my_list)
first_val, *all_val = my_list
p_index = my_list.index(first_val)
my_counter = {}
for item in all_val:
c_index = my_list.index(item)
diff = abs(c_index-p_index)
p_index = c_index
my_counter[first_val] = diff
first_val = item
c_index = my_list.index(item)
diff = len(my_list) - c_index
my_counter[first_val] = diff
return my_counter
def my_func(data):
if not data or not isinstance(data, list):
return
# get the first value
first_val, *all_val = data
if not isinstance(first_val, list):
return
# count items in first value
p = counter(first_val) # counter({1: 2, 3: 1, 5: 1, 7: 1})
# collect all common items and calculate the minimum occurance in intersection
for val in all_val:
# collecting common items
c = counter(val)
# calculate the minimum occurance in intersection
inner_dict = {}
for inner_val in set(c).intersection(set(p)):
inner_dict[inner_val] = min(p[inner_val], c[inner_val])
p = inner_dict
# >>>p
# {1: 2, 7: 1}
# Sort by keys of counter
sorted_items = sorted(p.items(), key=lambda x:x[0]) # [(1, 2), (7, 1)]
result=[i[0] for i in sorted_items for _ in range(i[1])] # [1, 1, 7]
return result
Here are the sample Examples
>>> data = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
>>> my_func(data=data)
[1, 7]
>>> data = [[1,1,3,5,7],[1,1,3,5,7],[1,1,4,7,9]]
>>> my_func(data=data)
[1, 1, 7]
You can do the following using the functions heapq.merge, chain.from_iterable and groupby
from heapq import merge
from itertools import groupby, chain
ls = [[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]
def index_groups(lst):
"""[1, 1, 3, 5, 7] -> [(1, 0), (1, 1), (3, 0), (5, 0), (7, 0)]"""
return chain.from_iterable(((e, i) for i, e in enumerate(group)) for k, group in groupby(lst))
iterables = (index_groups(li) for li in ls)
flat = merge(*iterables)
res = [k for (k, _), g in groupby(flat) if sum(1 for _ in g) == len(ls)]
print(res)
Output
[1, 7]
The idea is to give an extra value (using enumerate) to differentiate between equal values within the same list (see the function index_groups).
The complexity of this algorithm is O(n) where n is the sum of the lengths of each list in the input.
Note that the output for (an extra 1 en each list):
ls = [[1, 1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 1, 4, 7, 9]]
is:
[1, 1, 7]
You can use bit-masking with one-hot encoding. The inner lists become maxterms. You and them together for the intersection and or them for the union. Then you have to convert back, for which I've used a bit hack.
problem = [[1,3,5,7],[1,1,3,5,8,7],[1,4,7,9]];
debruijn = [0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9];
u32 = accum = (1 << 32) - 1;
for vec in problem:
maxterm = 0;
for v in vec:
maxterm |= 1 << v;
accum &= maxterm;
# https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
result = [];
while accum:
power = accum;
accum &= accum - 1; # Peter Wegner CACM 3 (1960), 322
power &= ~accum;
result.append(debruijn[((power * 0x077CB531) & u32) >> 27]);
print result;
This uses (simulates) 32-bit integers, so you can only have [0, 31] in your sets.
*I am inexperienced at Python, so I timed it. One should definitely use set.intersection.
Here is the single-pass counting algorithm, a simplified version of what others have suggested.
def intersection(iterables):
target, count = None, 0
for it in itertools.cycle(map(iter, iterables)):
for value in it:
if count == 0 or value > target:
target, count = value, 1
break
if value == target:
count += 1
break
else: # exhausted iterator
return
if count >= len(iterables):
yield target
count = 0
Binary and exponential search haven't come up yet. They're easily recreated even with the "no builtins" constraint.
In practice, that would be much faster, and sub-linear. In the worst case - where the intersection isn't shrinking - the naive approach would repeat work. But there's a solution for that: integrate the binary search while splitting the arrays in half.
def intersection(seqs):
seq = min(seqs, key=len)
if not seq:
return
pivot = seq[len(seq) // 2]
lows, counts, highs = [], [], []
for seq in seqs:
start = bisect.bisect_left(seq, pivot)
stop = bisect.bisect_right(seq, pivot, start)
lows.append(seq[:start])
counts.append(stop - start)
highs.append(seq[stop:])
yield from intersection(lows)
yield from itertools.repeat(pivot, min(counts))
yield from intersection(highs)
Both handle duplicates. Both guarantee O(N) worst-case time (counting slicing as atomic). The latter will approach O(min_size) speed; by always splitting the smallest in half it essentially can't suffer from the bad luck of uneven splits.
I couldn't help but notice that this is seems to be a variation on the Welfare Crook problem; see David Gries's book, The Science of Programming. Edsger Dijkstra also wrote an EWD about this, see Ascending Functions and the Welfare Crook.
The Welfare Crook
Suppose we have three long magnetic tapes, each containing a list of names in alphabetical order:
all people working for IBM Yorktown
students at Columbia University
people on welfare in New York City
Practically speaking, all three lists are endless, so no upper bounds are given. It is know that at least one person is on all three lists. Write a program to locate the first such person.
Our intersection of the ordered lists problem is a generalization of the Welfare Crook problem.
Here's a (rather primitive?) Python solution to the Welfare Crook problem:
def find_welfare_crook(f, g, h, i, j, k):
"""f, g, and h are "ascending functions," i.e.,
i <= j implies f[i] <= f[j] or, equivalently,
f[i] < f[j] implies i < j, and the same goes for g and h.
i, j, k define where to start the search in each list.
"""
# This is an implementation of a solution to the Welfare Crook
# problems presented in David Gries's book, The Science of Programming.
# The surprising and beautiful thing is that the guard predicates are
# so few and so simple.
i , j , k = i , j , k
while True:
if f[i] < g[j]:
i += 1
elif g[j] < h[k]:
j += 1
elif h[k] < f[i]:
k += 1
else:
break
return (i,j,k)
# The other remarkable thing is how the negation of the guard
# predicates works out to be: f[i] == g[j] and g[j] == c[k].
Generalization to Intersection of K Lists
This generalizes to K lists, and here's what I devised; I don't know how Pythonic this is, but it pretty compact:
def findIntersectionLofL(lofl):
"""Generalized findIntersection function which operates on a "list of lists." """
K = len(lofl)
indices = [0 for i in range(K)]
result = []
#
try:
while True:
# idea is to maintain the indices via a construct like the following:
allEqual = True
for i in range(K):
if lofl[i][indices[i]] < lofl[(i+1)%K][indices[(i+1)%K]] :
indices[i] += 1
allEqual = False
# When the above iteration finishes, if all of the list
# items indexed by the indices are equal, then another
# item common to all of the lists must be added to the result.
if allEqual :
result.append(lofl[0][indices[0]])
while lofl[0][indices[0]] == lofl[1][indices[1]]:
indices[0] += 1
except IndexError as e:
# Eventually, the foregoing iteration will advance one of the
# indices past the end of one of the lists, and when that happens
# an IndexError exception will be raised. This means the algorithm
# is finished.
return result
This solution does not keep repeated items. Changing the program to include all of the repeated items by changing what the program does in the conditional at the end of the "while True" loop is an exercise left to the reader.
Improved Performance
Comments from #greybeard prompted refinements shown below, in the
pre-computation of the "array index moduli" (the "(i+1)%K" expressions) and further investigation also brought about changes to the inner iteration's structure, to further remove overhead:
def findIntersectionLofLunRolled(lofl):
"""Generalized findIntersection function which operates on a "list of lists."
Accepts a list-of-lists, lofl. Each of the lists must be ordered.
Returns the list of each element which appears in all of the lists at least once.
"""
K = len(lofl)
indices = [0] * K
result = []
lt = [ (i, (i+1) % K) for i in range(K) ] # avoids evaluation of index exprs inside the loop
#
try:
while True:
allUnEqual = True
while allUnEqual:
allUnEqual = False
for i,j in lt:
if lofl[i][indices[i]] < lofl[j][indices[j]]:
indices[i] += 1
allUnEqual = True
# Now all of the lofl[i][indices[i]], for all i, are the same value.
# Store that value in the result, and then advance all of the indices
# past that common value:
v = lofl[0][indices[0]]
result.append(v)
for i,j in lt:
while lofl[i][indices[i]] == v:
indices[i] += 1
except IndexError as e:
# Eventually, the foregoing iteration will advance one of the
# indices past the end of one of the lists, and when that happens
# an IndexError exception will be raised. This means the algorithm
# is finished.
return result

Merging Overlapping Intervals in Python [duplicate]

This question already has answers here:
Merging Overlapping Intervals
(4 answers)
Closed last year.
I am trying to solve a question where in overlapping intervals need to be merged.
The question is:
Given a collection of intervals, merge all overlapping intervals.
For example, Given [1,3],[2,6],[8,10],[15,18], return [1,6],[8,10],[15,18].
I tried my solution:
# Definition for an interval.
# class Interval:
# def __init__(self, s=0, e=0):
# self.start = s
# self.end = e
class Solution:
def merge(self, intervals):
"""
:type intervals: List[Interval]
:rtype: List[Interval]
"""
start = sorted([x.start for x in intervals])
end = sorted([x.end for x in intervals])
merged = []
j = 0
new_start = 0
for i in range(len(start)):
if start[i]<end[j]:
continue
else:
j = j + 1
merged.append([start[new_start], end[j]])
new_start = i
return merged
However it is clearly missing the last interval as:
Input : [[1,3],[2,6],[8,10],[15,18]]
Answer :[[1,6],[8,10]]
Expected answer: [[1,6],[8,10],[15,18]]
Not sure how to include the last interval as overlap can only be checked in forward mode.
How to fix my algorithm so that it works till the last slot?
Your code implicitly already assumes the starts and ends to be sorted, so that sort could be left out. To see this, try the following intervals:
intervals = [[3,9],[2,6],[8,10],[15,18]]
start = sorted([x[0] for x in intervals])
end = sorted([x[1] for x in intervals]) #mimicking your start/end lists
merged = []
j = 0
new_start = 0
for i in range(len(start)):
if start[i]<end[j]:
continue
else:
j = j + 1
merged.append([start[new_start], end[j]])
new_start = i
print(merged) #[[2, 9], [8, 10]]
Anyway, the best way to do this is probably recursion, here shown for a list of lists instead of Interval objects.
def recursive_merge(inter, start_index = 0):
for i in range(start_index, len(inter) - 1):
if inter[i][1] > inter[i+1][0]:
new_start = inter[i][0]
new_end = inter[i+1][1]
inter[i] = [new_start, new_end]
del inter[i+1]
return recursive_merge(inter.copy(), start_index=i)
return inter
sorted_on_start = sorted(intervals)
merged = recursive_merge(sorted_on_start.copy())
print(merged) #[[2, 10], [15, 18]]
I know the question is old, but in case it might help, I wrote a Python library to deal with (set of) intervals. Its name is portion and makes it easy to merge intervals:
>>> import portion as P
>>> inputs = [[1,3],[2,6],[8,10],[15,18]]
>>> # Convert each input to an interval
>>> intervals = [P.closed(a, b) for a, b in inputs]
>>> # Merge these intervals
>>> merge = P.Interval(*intervals)
>>> merge
[1,6] | [8,10] | [15,18]
>>> # Output as a list of lists
>>> [[i.lower, i.upper] for i in merge]
[[1,6],[8,10],[15,18]]
Documentation can be found here: https://github.com/AlexandreDecan/portion
We can have intervals sorted by the first interval and we can build the merged list in the same interval list by checking the intervals one by one not appending to another one so. we increment i for every interval and interval_index is current interval check
x =[[1,3],[2,6],[8,10],[15,18]]
#y = [[1,3],[2,6],[8,10],[15,18],[19,25],[20,26],[25,30], [32,40]]
def merge_intervals(intervals):
sorted_intervals = sorted(intervals, key=lambda x: x[0])
interval_index = 0
#print(sorted_intervals)
for i in sorted_intervals:
if i[0] > sorted_intervals[interval_index][1]:
interval_index += 1
sorted_intervals[interval_index] = i
else:
sorted_intervals[interval_index] = [sorted_intervals[interval_index][0], i[1]]
#print(sorted_intervals)
return sorted_intervals[:interval_index+1]
print(merge_intervals(x)) #-->[[1, 6], [8, 10], [15, 18]]
#print ("------------------------------")
#print(merge_intervals(y)) #-->[[1, 6], [8, 10], [15, 18], [19, 30], [32, 40]]
This is very old now, but in case anyone stumbles across this, I thought I'd throw in my two cents, since I wasn't completely happy with the answers above.
I'm going to preface my solution by saying that when I work with intervals, I prefer to convert them to python3 ranges (probably an elegant replacement for your Interval class) because I find them easy to work with. However, you need to remember that ranges are half-open like everything else in Python, so the stop coordinate is not "inside" of the interval. Doesn't matter for my solution, but something to keep in mind.
My own solution:
# Start by converting the intervals to ranges.
my_intervals = [[1, 3], [2, 6], [8, 10], [15, 18]]
my_ranges = [range(start, stop) for start, stop in my_intervals]
# Next, define a check which will return True if two ranges overlap.
# The double inequality approach means that your intervals don't
# need to be sorted to compare them.
def overlap(range1, range2):
if range1.start <= range2.stop and range2.start <= range1.stop:
return True
return False
# Finally, the actual function that returns a list of merged ranges.
def merge_range_list(ranges):
ranges_copy = sorted(ranges.copy(), key=lambda x: x.stop)
ranges_copy = sorted(ranges_copy, key=lambda x: x.start)
merged_ranges = []
while ranges_copy:
range1 = ranges_copy[0]
del ranges_copy[0]
merges = [] # This will store the position of ranges that get merged.
for i, range2 in enumerate(ranges_copy):
if overlap(range1, range2): # Use our premade check function.
range1 = range(min([range1.start, range2.start]), # Overwrite with merged range.
max([range1.stop, range2.stop]))
merges.append(i)
merged_ranges.append(range1)
# Time to delete the ranges that got merged so we don't use them again.
# This needs to be done in reverse order so that the index doesn't move.
for i in reversed(merges):
del ranges_copy[i]
return merged_ranges
print(merge_range_list(my_ranges)) # --> [range(1, 6), range(8, 10), range(15, 18)]
Make pairs for every endpoint: (value; kind = +/-1 for start or end of interval)
Sort them by value. In case of tie choose paie with -1 first if you need to merge intervals with coinciding ends like 0-1 and 1-2
Make CurrCount = 0, walk through sorted list, adding kind to CurrCount
Start new resulting interval when CurrCount becomes nonzero, finish interval when CurrCount becomes zero.
Late to the party, but here is my solution. I typically find recursion with an invariant easier to conceptualize. In this case, the invariant is that the head is always merged, and the tail is always waiting to be merged, and you compare the last element of head with the first element of tail.
One should definitely use sorted with the key argument rather than using a list comprehension.
Not sure how efficient this is with slicing and concatenating lists.
def _merge(head, tail):
if tail == []:
return head
a, b = head[-1]
x, y = tail[0]
do_merge = b > x
if do_merge:
head_ = head[:-1] + [(a, max(b, y))]
tail_ = tail[1:]
return _merge(head_, tail_)
else:
head_ = head + tail[:1]
tail_ = tail[1:]
return _merge(head_, tail_)
def merge_intervals(lst):
if len(lst) <= 1:
return lst
lst = sorted(lst, key=lambda x: x[0])
return _merge(lst[:1], lst[1:])

Finding if the next element is smaller than the one before it and deleting it from the list python

I am having trouble with my code, I am writing a method that will check if the next element is smaller than the previous element and if it is, it will delete it.
Example:
Input: [1, 20, 10, 30]
Desired output: [1,20,30]
Actual output: [30]
def findSmaller(s):
i = -1
y = []
while i <= len(s):
for j in range(len(s)):
if s[i+1] <= s[i]:
del s[i + 1]
y.append(s[i])
i += 1
return y
If you are uncertain about how your loops work I recommend adding in some print statements. That way you can see what your loop is actually doing, especially in more complicated problems this is useful.
Something like this would solve your problem.
a = [1,2,3,2,4]
for k in range(0,len(a)-2): #-2 so that one don't go past the loops length
#print(k)
y = a
if(a[k]>a[k+1]):
del y[k+1] #delete the k+1 element if it is
>>> s = [5, 20, 10, 15, 30]
>>> max_so_far = s[0]
>>> result = []
>>> for x in s:
if x >= max_so_far:
result.append(x)
max_so_far = x
>>> result
[5, 20, 30]
Depending whether you need to do some calculation later with the list you can use a generator
s = [1, 20, 10, 30]
def find_smaller_generator(l: list):
last_item = None
for item in l:
if last_item is None or item >= last_item:
last_item = item
yield item
def find_smaller_list(l: list):
return list(find_smaller_generator(l))
print(find_smaller_list(s))
for i in find_smaller_generator(s):
print(i)
print([i**2 for i in find_smaller_generator(s)])
this returns:
[1, 20, 30]
1
20
30
[1, 400, 900]
You can try something like this
def findSmaller(s):
# sets p (previous) as the first value in s
p = s[0]
# initializes y to be an array and sets the first value to p
y = [p]
# iterates over s, starting with the second element
for i in s[1::]:
# checks if i is greater than or equal to the previous element
if i >= p:
# if it is, i is appended to the list y
y.append(i)
# also set the previous value to i, so the next iteration can check against p
p = i
#returns the list
return y
What this does is iterate over s and checks if the current item in the list is greater than or equal to the previous element in the list. If it is then it appends it to y, and y is returned.
Try out the code here.

How to check if two permutations are symmetric?

Given two permutations A and B of L different elements, L is even, let's call these permutations "symmetric" (for a lack of a better term), if there exist n and m, m > n such as (in python notation):
- A[n:m] == B[L-m:L-n]
- B[n:m] == A[L-m:L-n]
- all other elements are in place
Informally, consider
A = 0 1 2 3 4 5 6 7
Take any slice of it, for example 1 2. It starts at the second index and its length is 2. Now take a slice symmetric to it: it ends at the penultimate index and is 2 chars long too, so it's 5 6. Swapping these slices gives
B = 0 5 6 3 4 1 2 7
Now, A and B are "symmetric" in the above sense (n=1, m=3). On the other hand
A = 0 1 2 3 4 5 6 7
B = 1 0 2 3 4 5 7 6
are not "symmetric" (no n,m with above properties exist).
How can I write an algorithm in python that finds if two given permutations (=lists) are "symmetric" and if yes, find the n and m? For simplicity, let's consider only even L (because the odd case can be trivially reduced to the even one by eliminating the middle fixed element) and assume correct inputs (set(A)==set(B), len(set(A))==len(A)).
(I have no problem bruteforcing all possible symmetries, but looking for something smarter and faster than that).
Fun fact: the number of symmetric permutations for the given L is a Triangular number.
I use this code to test out your answers.
Bounty update: many excellent answers here. #Jared Goguen's solution appears to be the fastest.
Final timings:
testing 0123456789 L= 10
test_alexis ok in 15.4252s
test_evgeny_kluev_A ok in 30.3875s
test_evgeny_kluev_B ok in 27.1382s
test_evgeny_kluev_C ok in 14.8131s
test_ian ok in 26.8318s
test_jared_goguen ok in 10.0999s
test_jason_herbburn ok in 21.3870s
test_tom_karzes ok in 27.9769s
Here is the working solution for the question:
def isSymmetric(A, B):
L = len(A) #assume equivalent to len(B), modifying this would be as simple as checking if len(A) != len(B), return []
la = L//2 # half-list length
Al = A[:la]
Ar = A[la:]
Bl = B[:la]
Br = B[la:]
for i in range(la):
lai = la - i #just to reduce the number of computation we need to perform
for j in range(1, lai + 1):
k = lai - j #same here, reduce computation
if Al[i] != Br[k] or Ar[k] != Bl[i]: #the key for efficient computation is here: do not proceed unnecessarily
continue
n = i #written only for the sake of clarity. i is n, and we can use i directly
m = i + j
if A[n:m] == B[L-m:L-n] and B[n:m] == A[L-m:L-n]: #possibly symmetric
if A[0:n] == B[0:n] and A[m:L-m] == B[m:L-m] and A[L-n:] == B[L-n:]:
return [n, m]
return []
As you have mentioned, though the idea looks simple, but it is actually quite a tricky one. Once we see the patterns, however, the implementation is straight-forward.
The central idea of the solution is this single line:
if Al[i] != Br[k] or Ar[k] != Bl[i]: #the key for efficient computation is here: do not proceed unnecessarily
All other lines are just either direct code translation from the problem statement or optimization made for more efficient computation.
There are few steps involved in order to find the solution:
Firstly, we need to split the each both list Aand list B into two half-lists (called Al, Ar, Bl, and Br). Each half-list would contain half of the members of the original lists:
Al = A[:la]
Ar = A[la:]
Bl = B[:la]
Br = B[la:]
Secondly, to make the evaluation efficient, the goal here is to find what I would call pivot index to decide whether a position in the list (index) is worth evaluated or not to check if the lists are symmetric. This pivot index is the central idea to find an efficient solution. So I would try to elaborate it quite a bit:
Consider the left half part of the A list, suppose you have a member like this:
Al = [al1, al2, al3, al4, al5, al6]
We can imagine that there is a corresponding index list for the mentioned list like this
Al = [al1, al2, al3, al4, al5, al6]
iAl = [0, 1, 2, 3, 4, 5 ] #corresponding index list, added for explanation purpose
(Note: the reason why I mention of imagining a corresponding index list is for ease of explanation purposes)
Likewise, we can imagine that the other three lists may have similar index lists. Let's name them iAr, iBl, and iBr respectively and they are all having identical members with iAl.
It is the index of the lists which would really matter for us to look into - in order to solve the problem.
Here is what I mean: suppose we have two parameters:
index (let's give a variable name i to it, and I would use symbol ^ for current i)
length (let's give a variable name j to it, and I would use symbol == to visually represent its length value)
for each evaluation of the index element in iAl - then each evaluation would mean:
Given an index value i and length value of j in iAl, do
something to determine if it is worth to check for symmetric
qualifications starting from that index and with that length
(Hence the name pivot index come).
Now, let's take example of one evaluation when i = 0 and j = 1. The evaluation can be illustrated as follow:
iAl = [0, 1, 2, 3, 4, 5]
^ <-- now evaluate this index (i) = 0
== <-- now this has length (j) of 1
In order for those index i and length j to be worth evaluated further, then the counterpart iBr must have the same item value with the same length but on different index (let's name it index k)
iBr = [0, 1, 2, 3, 4, 5]
^ <-- must compare the value in this index to what is pointed by iAl
== <-- must evaluate with the same length = 1
For example, for the above case, this is a possible "symmetric" permutation just for the two lists Al-Br (we will consider the other two lists Ar-Bl later):
Al = [0, x, x, x, x, x] #x means don't care for now
Br = [x, x, x, x, x, 0]
At this moment, it is good to note that
It won't worth evaluating further if even the above condition is not
true
And this is where you get the algorithm to be more efficient; that is, by selectively evaluating only the few possible cases among all possible cases. And how to find the few possible cases?
By trying to find relationship between indexes and lengths of the
four lists. That is, for a given index i and length j in a
list (say Al), what must be the index k in the counterpart
list (in the case is Br). Length for the counterpart list need not
be found because it is the same as in the list (that is j).
Having know that, let's now proceed further to see if we can see more patterns in the evaluation process.
Consider now the effect of length (j). For example, if we are to evaluate from index 0, but the length is 2 then the counterpart list would need to have different index k evaluated than when the length is 1
iAl = [0, 1, 2, 3, 4, 5]
^ <-- now evaluate this index (i) = 0
===== <-- now this has length (j) of 2
iBr = [0, 1, 2, 3, 4, 5]
^ <-- must compare the value in this index to what is pointed by iAl
===== <-- must evaluate with the same length = 2
Or, for the illustration above, what really matters fox i = 0 and y = 2 is something like this:
# when i = 0 and y = 2
Al = [0, y, x, x, x, x] #x means don't care for now
Br = [x, x, x, x, 0, y] #y means to be checked later
Take a look that the above pattern is a bit different from when i = 0 and y = 1 - the index position for 0 value in the example is shifted:
# when i = 0 and y = 1, k = 5
Al = [0, x, x, x, x, x] #x means don't care for now
Br = [x, x, x, x, x, 0]
# when i = 0 and y = 2, k = 4
Al = [0, y, x, x, x, x] #x means don't care for now
Br = [x, x, x, x, 0, y] #y means to be checked later
Thus, length shifts where the index of the counterpart list must be checked. In the first case, when i = 0 and y = 1, then the k = 5. But in the second case, when i = 0 and y = 1, then the k = 4. Thus we found the pivot indexes relationship when we change the length j for a fixed index i (in this case being 0) unto the counterpart list index k.
Now, consider the effects of index i with fixed length j for counterpart list index k. For example, let's fix the length as y = 4, then for index i = 0, we have:
iAl = [0, 1, 2, 3, 4, 5]
^ <-- now evaluate this index (i) = 0
========== <-- now this has length (j) of 4
iAl = [0, 1, 2, 3, 4, 5]
^ <-- now evaluate this index (i) = 1
========== <-- now this has length (j) of 4
iAl = [0, 1, 2, 3, 4, 5]
^ <-- now evaluate this index (i) = 2
========== <-- now this has length (j) of 4
#And no more needed
In the above example, it can be seen that we need to evaluate 3 possibilities for the given i and j, but if the index i is changed to 1 with the same length j = 4:
iAl = [0, 1, 2, 3, 4, 5]
^ <-- now evaluate this index (i) = 1
========== <-- now this has length (j) of 4
iAl = [0, 1, 2, 3, 4, 5]
^ <-- now evaluate this index (i) = 2
========== <-- now this has length (j) of 4
Note that we only need to evaluate 2 possibilities. Thus the increase of index i decreases the number of possible cases to be evaluated!
With all the above patterns found, we almost found all the basis we need to make the algorithm works. But to complete that, we need to find the relationship between indexes which appear in Al-Br pair for a given [i, j] => [k, j] with the indexes in Ar-Bl pair for the same [i, j].
Now, we can actually see that they are simply mirroring the relationship we found in Al-Br pair!
(IMHO, this is really beautiful! and thus I think term "symmetric" permutation is not far from truth)
For example, if we have the following Al-Br pair evaluated with i = 0 and y = 2
Al = [0, y, x, x, x, x] #x means don't care for now
Br = [x, x, x, x, 0, y] #y means to be checked later
Then, to make it symmetric, we must have the corresponding Ar-Bl:
Ar = [x, x, x, x, 3, y] #x means don't care for now
Bl = [3, y, x, x, x, x] #y means to be checked later
The indexing of Al-Br pair is mirroring (or, is symmetric to) the indexing of Ar-Bl pair!
Therefore, combining all the pattern we found above, we now could find the pivot indexes for evaluating Al, Ar, Bl, and Br.
We only need to check the values of the lists in the pivot index
first. If the values of the lists in the pivot indexes of Al, Ar, Bl, and Br
matches in the evaluation then and only then we need to check
for symmetric criteria (thus making the computation efficient!)
Putting up all the knowledge above into code, the following is the resulting for-loop Python code to check for symmetricity:
for i in range(len(Al)): #for every index in the list
lai = la - i #just simplification
for j in range(1, lai + 1): #get the length from 1 to la - i + 1
k = lai - j #get the mirror index
if Al[i] != Br[k] or Ar[k] != Bl[i]: #if the value in the pivot indexes do not match
continue #skip, no need to evaluate
#at this point onwards, then the values in the pivot indexes match
n = i #assign n
m = i + j #assign m
#test if the first two conditions for symmetric are passed
if A[n:m] == B[L-m:L-n] and B[n:m] == A[L-m:L-n]: #possibly symmetric
#if it passes, test the third condition for symmetric, the rests of the elements must stay in its place
if A[0:n] == B[0:n] and A[m:L-m] == B[m:L-m] and A[L-n:] == B[L-n:]:
return [n, m] #if all three conditions are passed, symmetric lists are found! return [n, m] immediately!
#passing this but not outside of the loop means
#any of the 3 conditions to find symmetry are failed
#though values in the pivot indexes match, simply continue
return [] #nothing can be found - asymmetric lists
And there go you with the symmetric test!
(OK, this is quite a challenge and it takes quite a while for me to figure out how.)
I rewrote the code without some of the complexity (and errors).
def test_o_o(a, b):
L = len(a)
H = L//2
n, m = 0, H-1
# find the first difference in the left-side
while n < H:
if a[n] != b[n]: break
n += 1
else: return
# find the last difference in the left-side
while m > -1:
if a[m] != b[m]: break
m -= 1
else: return
# for slicing, we want end_index+1
m += 1
# compare each slice for equality
# order: beginning, block 1, block 2, middle, end
if (a[0:n] == b[0:n] and \
a[n:m] == b[L-m:L-n] and \
b[n:m] == a[L-m:L-n] and \
a[m:L-m] == b[m:L-m] and \
a[L-n:L] == b[L-n:L]):
return n, m
The implementation is both elegant and efficient.
The break into else: return structures ensure that the function returns at the soonest possible point. They also validate that n and m have been set to valid values, but this does not appear to be necessary when explicitly checking the slices. These lines can be removed with no noticeable impact on the timing.
Explicitly comparing the slices will also short-circuit as soon as one evaluates to False.
Originally, I checked whether a permutation existed by transforming b into a:
b = b[:]
b[n:m], b[L-m:L-n] = b[L-m:L-n], b[n:m]
if a == b:
return n, m
But this is slower than explicitly comparing the slices. Let me know if the algorithm doesn't speak for itself and I can offer further explanation (maybe even proof) as to why it works and is minimal.
I tried to implement 3 different algorithms for this task. All of them have O(N) time complexity and require O(1) additional space. Interesting fact: all other answers (known so far) implement 2 of these algorithms (though they not always keep optimal asymptotic time/space complexity). Here is high-level description for each algorithm:
Algorithm A
Compare the lists, group "non-equal" intervals, make sure there are exactly two such intervals (with special case when intervals meet in the middle).
Check if "non-equal" intervals are positioned symmetrically, and their contents is also "symmetrical".
Algorithm B
Compare first halves of the lists to guess where are "intervals to be exchanged".
Check if contents of these intervals is "symmetrical". And make sure the lists are equal outside of these intervals.
Algorithm C
Compare first halves of the lists to find first mismatched element.
Find this mismatched element of first list in second one. This hints the position of "intervals to be exchanged".
Check if contents of these intervals is "symmetrical". And make sure the lists are equal outside of these intervals.
There are two alternative implementations for step 1 of each algorithm: (1) using itertools, and (2) using plain loops (or list comprehensions). itertools are efficient for long lists but relatively slow on short lists.
Here is algorithm C with first step implemented using itertools. It looks simpler than other two algorithms (at the end of this post). And it is pretty fast, even for short lists:
import itertools as it
import operator as op
def test_C(a, b):
length = len(a)
half = length // 2
mismatches = it.imap(op.ne, a, b[:half]) # compare half-lists
try:
n = next(it.compress(it.count(), mismatches))
nr = length - n
mr = a.index(b[n], half, nr)
m = length - mr
except StopIteration: return None
except ValueError: return None
if a[n:m] == b[mr:nr] and b[n:m] == a[mr:nr] \
and a[m:mr] == b[m:mr] and a[nr:] == b[nr:]:
return (n, m)
This could be done using mostly itertools:
def test_A(a, b):
equals = it.imap(op.eq, a, b) # compare lists
e1, e2 = it.tee(equals)
l = it.chain(e1, [True])
r = it.chain([True], e2)
borders = it.imap(op.ne, l, r) # delimit equal/non-equal intervals
ranges = list(it.islice(it.compress(it.count(), borders), 5))
if len(ranges) == 4:
n1, m1 = ranges[0], ranges[1]
n2, m2 = ranges[2], ranges[3]
elif len(ranges) == 2:
n1, m1 = ranges[0], len(a) // 2
n2, m2 = len(a) // 2, ranges[1]
else:
return None
if n1 == len(a) - m2 and m1 == len(a) - n2 \
and a[n1:m1] == b[n2:m2] and b[n1:m1] == a[n2:m2]:
return (n1, m1)
High-level description of this algorithm is already provided in OP comments by #j_random_hacker. Here are some details:
Start with comparing the lists:
A 0 1 2 3 4 5 6 7
B 0 5 6 3 4 1 2 7
= E N N E E N N E
Then find borders between equal/non-equal intervals:
= E N N E E N N E
B _ * _ * _ * _ *
Then determine ranges for non-equal elements:
B _ * _ * _ * _ *
[1 : 3] [5 : 7]
Then check if there are exactly 2 ranges (with special case when both ranges meet in the middle), the ranges themselves are symmetrical, and their contents too.
Other alternative is to use itertools to process only half of each list. This allows slightly simpler (and slightly faster) algorithm because there is no need to handle a special case:
def test_B(a, b):
equals = it.imap(op.eq, a, b[:len(a) // 2]) # compare half-lists
e1, e2 = it.tee(equals)
l = it.chain(e1, [True])
r = it.chain([True], e2)
borders = it.imap(op.ne, l, r) # delimit equal/non-equal intervals
ranges = list(it.islice(it.compress(it.count(), borders), 2))
if len(ranges) != 2:
return None
n, m = ranges[0], ranges[1]
nr, mr = len(a) - n, len(a) - m
if a[n:m] == b[mr:nr] and b[n:m] == a[mr:nr] \
and a[m:mr] == b[m:mr] and a[nr:] == b[nr:]:
return (n, m)
This does the right thing:
Br = B[L//2:]+B[:L//2]
same_full = [a==b for (a,b) in zip(A, Br)]
same_part = [a+b for (a,b) in zip(same_full[L//2:], same_full[:L//2])]
for n, vn in enumerate(same_part):
if vn != 2:
continue
m = n
for vm in same_part[n+1:]:
if vm != 2:
break
m+=1
if m>n:
print("n=", n, "m=", m+1)
I'm pretty sure you could do the counting a bit bettter, but... meh
I believe the following pseudocode should work:
Find the first element i for which A[i] != B[i], set n = i. If no such element, return success. If n >= L/2, return fail.
Find the first element i > n for which A[i] == B[i], set m = i. If no such element or m > L/2, set m = L/2.
Check so A[0:n] == B[0:n], A[n:m] == B[L-m:L-n], B[n:m] == A[L-m:L-n], A[m:L-m] == B[m:L-m] and A[L-n:L] == B[L-n:L]. If all are true, return success. Else, return fail.
Complexity is O(n) which should be the lowest possible as one always needs to compare all elements in the lists.
I build a map of where the characters are in list B, then use that to determine the implied subranges in list A. Once I have the subranges, I can sanity check some of the info, and compare the slices.
If A[i] == x, then where does x appear in B? Call that position p.
I know i, the start of the left subrange.
I know L (= len(A)), so I know L-i, the end of the right subrange.
If I know p, then I know the implied start of the right subrange, assuming that B[p] and A[i] are the start of a symmetric pair of ranges. Thus, the OP's L - m would be p if the lists were symmetric.
Setting L-m == p gives me m, so I have all four end points.
Sanity tests are:
n and m are in left half of list(s)
n <= m (note: OP did not prohibit n == m)
L-n is in right half of list (computed)
L-m is in right half (this is a good check for quick fail)
If all those check out, compare A[left] == B[right] and B[left] == A[right]. Return left if true.
def find_symmetry(a:list, b:list) -> slice or None:
assert len(a) == len(b)
assert set(a) == set(b)
assert len(set(a)) == len(a)
length = len(a)
assert length % 2 == 0
half = length // 2
b_loc = {bi:n for n,bi in enumerate(b)}
for n,ai in enumerate(a[:half]):
L_n = length - 1 - n # L - n
L_m = b_loc[ai] # L - m (speculative)
if L_m < half: # Sanity: bail if on wrong side
continue
m = b_loc[a[L_n]] # If A[n] starts range, A[m] ends it.
if m < n or m > half: # Sanity: bail if backwards or wrong side
continue
left = slice(n, m+1)
right = slice(L_m, L_n+1)
if a[left] == b[right] and \
b[left] == a[right]:
return left
return None
res = find_symmetry(
[ 10, 11, 12, 13, 14, 15, 16, 17, ],
[ 10, 15, 16, 13, 14, 11, 12, 17, ])
assert res == slice(1,3)
res = find_symmetry(
[ 0, 1, 2, 3, 4, 5, 6, 7, ],
[ 1, 0, 2, 3, 4, 5, 7, 6, ])
assert res is None
res = find_symmetry("abcdefghijklmn", "nbcdefghijklma")
assert res == slice(0,1)
res = find_symmetry("abcdefghijklmn", "abjklfghicdmen")
assert res == slice(3,4)
res = find_symmetry("abcdefghijklmn", "ancjkfghidelmb")
assert res == slice(3,5)
res = find_symmetry("abcdefghijklmn", "bcdefgaijklmnh")
assert res is None
res = find_symmetry("012345", "013245")
assert res == slice(2,3)
Here's an O(N) solution which passes the test code:
def sym_check(a, b):
cnt = len(a)
ml = [a[i] == b[i] for i in range(cnt)]
sl = [i for i in range(cnt) if (i == 0 or ml[i-1]) and not ml[i]]
el = [i+1 for i in range(cnt) if not ml[i] and (i == cnt-1 or ml[i+1])]
assert(len(sl) == len(el))
range_cnt = len(sl)
if range_cnt == 1:
start1 = sl[0]
end2 = el[0]
if (end2 - start1) % 2 != 0:
return None
end1 = (start1 + end2) // 2
start2 = end1
elif range_cnt == 2:
start1, start2 = sl
end1, end2 = el
else:
return None
if end1 - start1 != end2 - start2:
return None
if start1 != cnt - end2:
return None
if a[start1:end1] != b[start2:end2]:
return None
if b[start1:end1] != a[start2:end2]:
return None
return start1, end1
I only tested it with Python 2, but I believe it will also work with Python 3.
It identifies the ranges where the two lists differ. It looks for two such ranges (if there is only one such range, it tries to divide it in half). It then checks that both ranges are the same length and in the proper positions relative to each other. If so, then it checks that the elements in the ranges match.
Yet another version:
def compare(a, b):
i_zip = list(enumerate(zip(a, b)))
llen = len(a)
hp = llen // 2
def find_index(i_zip):
for i, (x, y) in i_zip:
if x != y:
return i
return i_zip[0][0]
# n and m are determined by the unmoved items:
n = find_index(i_zip[:hp])
p = find_index(i_zip[hp:])
m = llen - p
q = llen - n
# Symmetric?
if a[:n] + a[p:q] + a[m:p] + a[n:m] + a[q:] != b:
return None
return n, m
This solution is based on:
All validly permuted list pairs A, B adhering to the symmetry requirement will have the structure:
A = P1 + P2 + P3 + P4 + P5
B = P1 + P4 + P3 + P2 + P5
^n ^m ^hp ^p ^q <- indexes
,len(P1) == len(P5) and len(P2) == len(P4)
Therefore the 3 last lines of the above function will determine the correct solution provided the indexes n, m are correctly determined. (p & q are just mirror indexes of m & n)
Finding n is a matter of determining when items of A and B start to diverge. Next the same method is applied to finding p starting from midpoint hp. m is just mirror index of p. All involved indexes are found and the solution emerges.
Make a list (ds) of indices where the first halves of the two lists differ.
A possible n is the first such index, the last such index is m - 1.
Check if valid symmetry. len(ds) == m - n makes sure there aren't any gaps.
import itertools as it
import operator as op
def test(a, b):
sz = len(a)
ds = list(it.compress(it.count(), map(op.ne, a[:sz//2], b[:sz//2])))
n,m = ds[0], ds[-1]+1
if a[n:m] == b[sz-m:sz-n] and b[n:m] == a[sz-m:sz-n] and len(ds) == m - n:
return n,m
else:
return None
Here's a simple solution that passes my tests, and yours:
Compare the inputs, looking for a subsequence that does not match.
Transform A by transposing the mismatched subsequence according to the rules. Does the result match B?
The algorithm is O(N); there are no embedded loops, explicit or implicit.
In step 1, I need to detect the case where the swapped substrings are adjacent. This can only happen in the middle of the string, but I found it easier to just look out for the first element of the moved piece (firstval). Step 2 is simpler (and hence less error-prone) than explicitly checking all the constraints.
def compare(A, B):
same = True
for i, (a, b) in enumerate(zip(A,B)):
if same and a != b: # Found the start of a presumed transposition
same = False
n = i
firstval = a # First element of the transposed piece
elif (not same) and (a == b or b == firstval): # end of the transposition
m = i
break
# Construct the transposed string, compare it to B
origin = A[n:m]
if n == 0: # swap begins at the edge
dest = A[-m:]
B_expect = dest + A[m:-m] + origin
else:
dest = A[-m:-n]
B_expect = A[:n] + dest + A[m:-m] + origin + A[-n:]
return bool(B_expect == B)
Sample use:
>>> compare("01234567", "45670123")
True
Bonus: I believe the name for this relationship would be "symmetric block transposition". A block transposition swaps two subsequences, taking ABCDE to ADCBE. (See definition 4 here; I actually found this by googling "ADCBE"). I've added "symmetric" to the name to describe the length conditions.

Categories

Resources