Excluding intervals from a real number line - python

A long time ago I asked the following question:
Python: delete substring by indices
Last week I was asked a very similar question, but on a continuous real-number line.
Imagine you are given an interval (X, Y), and a bunch of sub-intervals blocks=[(x1, y1), (x2, y2), ...]. Your goal is to find a list of intervals remaining=[(a1, b1), (a2, b2), ..] that is (1) in (X, Y), but (2) not in any of the sub-intervals in blocks. The intervals in blocks can overlap.
In other words, the function signature looks something like:
def delete_blocks_from_interval(X, Y, blocks):
    """
    X: start of the given interval
    Y: end of the given interval
    blocks: list of intervals (x, y) to be removed, can overlap
    returns remaining = [(a, b), ...] intervals remaining after removal of blocks
    """
    pass
I can construct a graph of connected intervals in blocks, and find both the minimum of the start and the maximum of end for each connected component in the graph. But this is quadratic in length of blocks. I wonder if there is a better-runtime algorithm.
Please also discuss what code routine you think is more efficient for the algorithm if you will.
Many many thanks.
As requested, please consider the following illustrative inputs:
X = -1
Y = 20
blocks = [(1, 10), (4, 5), (9, 11), (16, 17.5)]
the expected output is remaining = [(-1, 1), (11, 16), (17.5, 20)]

After sorting the blocks (O(n log n)), a single linear pass over them yields the gaps:
def delete_blocks_from_interval(range_start, range_end, blocks):
    # keep only the blocks that overlap the interval, clipped to it,
    # and sort them by start point
    blocks = sorted(
        (max(start, range_start), min(end, range_end))
        for start, end in blocks
        if start < range_end and end > range_start
    )
    # no overlapping blocks: the whole interval remains
    if not blocks:
        yield (range_start, range_end)
        return
    # emit the first gap, if any
    if range_start < blocks[0][0]:
        yield (range_start, blocks[0][0])
    # loop through till the end of the blocks
    end = blocks[0][1]
    for block in blocks[1:]:
        if end < block[0]:
            yield (end, block[0])
            end = block[1]
        elif end < block[1]:
            end = block[1]
    # emit the last gap, if any
    if range_end > end:
        yield (end, range_end)
blocks = [(1, 10), (4, 5), (9, 11), (16, 17.5)]
list(delete_blocks_from_interval(-1, 20, blocks))
# [(-1, 1), (11, 16), (17.5, 20)]

python-ranges is a library I wrote that excels at this particular use case. It isn't the most efficient code you could possibly write (it essentially uses @wwii's algorithm below, in fact), but it is nice and terse.
from ranges import Range, RangeSet
...
def delete_blocks_from_interval(X, Y, blocks):
    # make a Range
    orig = Range(X, Y)
    # make a RangeSet out of the 2-tuple blocks
    # (using the unpacking operator to interpret 2-tuples as positional args for Range())
    # and then find the difference from the original set
    # (like with sets, the - operator is a shorthand for .difference())
    remaining = orig - RangeSet(
        Range(*block) for block in blocks
    )
    # return each range in the RangeSet as a tuple
    return [(rng.start, rng.end) for rng in remaining.ranges()]

Would something like this work?
def delete_blocks_from_interval(a, b, blocks):
    sorted_blocks = sorted(blocks)
    for i, (c, d) in enumerate(sorted_blocks):
        if a <= c <= b:
            yield (a, c)
            if d > a:
                a = d
            for (e, f) in sorted_blocks[i + 1:]:
                if e <= a <= f:
                    a = f
                elif e > a:
                    break
    if a <= d <= b:
        yield (d, b)
blocks = ((1, 10), (4, 5), (9, 11), (16, 17.5))
print(list(delete_blocks_from_interval(-1, 20, blocks)))
# [(-1, 1), (11, 16), (17.5, 20)]

Sort the blocks.
Check whether there is a gap between X and the start of the first block; save it if there is.
Iterate over the blocks in pairs, taking the first and second block:
if the end of the first block is less than the start of the second block, save the interval (first[1], second[0]);
if they overlap, compare their ends, and if the end of the second block is greater than the end of the first, extend the first block's range;
then take the next block and repeat the comparison(s).
Repeat until you reach the end of the (X, Y) interval or run out of blocks.
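The steps above can be sketched roughly as follows. This is one possible reading of the description, not the answerer's own code; the names remaining_intervals and cursor are made up for illustration, and the endpoints are treated loosely (no distinction between open and closed intervals).

```python
def remaining_intervals(X, Y, blocks):
    """Return the parts of (X, Y) not covered by any block."""
    remaining = []
    cursor = X  # right edge of everything covered so far
    for start, end in sorted(blocks):
        if start >= Y:
            break  # this block and all later ones lie beyond the interval
        if start > cursor:
            remaining.append((cursor, start))  # gap before this block
        cursor = max(cursor, end)  # extend coverage; handles overlaps and nesting
    if cursor < Y:
        remaining.append((cursor, Y))  # trailing gap after the last block
    return remaining


blocks = [(1, 10), (4, 5), (9, 11), (16, 17.5)]
print(remaining_intervals(-1, 20, blocks))
# [(-1, 1), (11, 16), (17.5, 20)]
```

Sorting costs O(n log n) and the sweep itself is a single pass, so this avoids the quadratic graph construction mentioned in the question.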

Related

how to group near points in list in python

I am getting a list using a list comprehension. Let's say I am getting this list using the line of code below:
quality, angle, distance = measurements[i]
new_data = [each_value for each_value in measurements[i:i + 20]
            if angle <= each_value[1] <= angle + 30
            and distance - 150 <= each_value[2] <= distance + 150]
where measurements is a big data set containing (quality, angle, distance) tuples. From that, I get values like these:
desired_list = [(1,2,3), (1,5,3), (1,8,3), (1,10,3), (1,16,3), (1,17,3)]
Now how can I add a new condition to my list comprehension so that I only get a value if its angle is within some offset of the previous one? Let's say that if the difference between two consecutive angles is less than or equal to 5, they go into desired_list.
With this condition my list should look like this:
desired_list = [(1,2,3), (1,5,3), (1,8,3), (1,10,3)]
because from 2 to 5, 5 to 8, and 8 to 10 the difference is less than or equal to 5.
The last two points are not included because the condition breaks after (1,10,3), and nothing after that point needs to be checked.
How can I achieve this? Please help me.
Note: it doesn't need to be in the same list comprehension.
You mention the data set is large. Depending on how large, you may wish to avoid creating a new list from scratch and just search for the relevant index.
data = [(1,2,3), (1,5,3), (1,8,3), (1,10,3), (1,16,3), (1,17,3)]
MAXIMUM_ANGLE = 5
def angles_within_range(x, y):
    return abs(x[1] - y[1]) <= MAXIMUM_ANGLE
def first_angle_break_index():
    for i in range(len(data) - 1):
        if not angles_within_range(data[i], data[i+1]):
            return i+1
def valid_angles_list():
    return data[:first_angle_break_index()]
print(valid_angles_list())
If you mean traversing from start to end and breaking out as soon as one pair of neighbors breaks the rule,
here is a way without a list comprehension:
desired_list = [(1, 2, 3), (1, 5, 3), (1, 8, 3), (1, 10, 3), (1, 16, 3), (1, 17, 3)]
res = [desired_list[0]]
for a, b in zip(desired_list[:-1], desired_list[1:]):
    if abs(a[1] - b[1]) > 5:
        break
    res += [b]
print(res)
output:
[(1, 2, 3), (1, 5, 3), (1, 8, 3), (1, 10, 3)]
If you insist on using a list comprehension with a break, here is a solution that records the last accepted item:
res = [last.pop() and last.append(b) or b for last in [[desired_list[0]]] for a, b in
       zip([desired_list[0]] + desired_list, desired_list) if abs(a[1] - b[1]) <= 5 and a == last[0]]
Another version uses an end flag:
res = [b for end in [[]] for a, b in zip([desired_list[0]] + desired_list, desired_list) if
       (False if end or abs(a[1] - b[1]) <= 5 else end.append(42)) or not end and abs(a[1] - b[1]) <= 5]
Note: this is a bad idea (just for fun).

Looking for a more efficient/pythonic way to sum tuples in a list, and compute an average

I am trying to do some basic computations with data from the web. For this cause, I have found some code that extracts begin and end years for Rembrandt works. It saves it in a list
date_list = [(work['datebegin'], work['dateend']) for work in rembrandt2_parsed['records']]
date_list is a list containing the tuples with begin and end years for some Rembrandt works in the Harvard Art Museum. For the sake of completeness, it looks like this:
[(0, 0), (1648, 1648), (1637, 1647), (1626, 1636), (0, 0), (1638, 1638), (1635, 1635), (1634, 1634), (0, 0), (0, 0)]
Now I want to do some basic computations, I want to sum over this list of tuples, and compute the average of the years when they are not null. I came up with a solution:
datebegin = 0
date_end = 0
count_begin = 0
count_end = 0
for x, y in date_list:
    if x != 0:
        datebegin += x
        count_begin += 1
    if y != 0:
        date_end += y
        count_end += 1
final_date_begin = datebegin / count_begin  # value = year 1636
final_date_end = date_end / count_end  # value = year 1639
But I think this can be done much more efficiently and pythonically: first, because I seem to need a lot of code for such a simple task, and second, because I need to initialize four global variables if I do it this way. Could someone enlighten me and show me a more efficient way to solve this?
Non-numpy solution:
lst = [(0, 0), (1648, 1648), (1637, 1647), (1626, 1636), (0, 0), (1638, 1638), (1635, 1635), (1634, 1634), (0, 0), (0, 0)]
print(sum(x[0] for x in lst) / sum(x[0] != 0 for x in lst))
# 1636.3333333333333
print(sum(x[1] for x in lst) / sum(x[1] != 0 for x in lst))
# 1639.6666666666667
Numpy and list comprehensions are your friend here.
import numpy as np
date_list = [(0, 0), (1648, 1648), (1637, 1647), (1626, 1636), (0, 0),
             (1638, 1638), (1635, 1635), (1634, 1634), (0, 0), (0, 0)]
final_date_begin = np.mean([x for x, y in date_list if x != 0])
final_date_end = np.mean([y for x, y in date_list if y != 0])
In pure Python:
starts = [s for s, e in date_list if s and e]
ends = [e for s, e in date_list if s and e]
start_avg = sum(starts) / len(starts)
end_avg = sum(ends) / len(ends)
You can use numpy to solve this:
import numpy as np
result = list(np.ma.masked_equal(date_list, 0).mean(axis=0))
Here we thus first store the date_list in an array, next we mask out the zero values, and then we calculate the average over the first axis.
For your sample data, we obtain:
>>> list(np.ma.masked_equal(date_list, 0).mean(axis=0))
[1636.3333333333333, 1639.6666666666667]
Performance: for a list containing 100'000 2-tuples, generated with:
from random import randint
date_list = [(randint(0, 10), randint(0, 10)) for _ in range(100000)]
we repeated this function 1'000 times, and obtain:
>>> timeit(f, number=1000)
51.31010195999988
so locally, this works for a 100'000×2 "matrix" in 51.3 ms per run.

Is it possible to write a combination function with recursion technique?

Yesterday, I encountered a problem that requires computing all combinations of length 5 from an iterable.
Instead of using itertools.combinations, I tried to write a primitive function of my own. It looks like:
def combine_5(elements):
    """Find all combinations in elements with range 5."""
    temp_list = []
    for i in elements:
        cur_index = elements.index(i)
        for j in elements[cur_index+1 : ]:
            cur_index = elements.index(j)
            for k in elements[cur_index+1 : ]:
                cur_index = elements.index(k)
                for n in elements[cur_index+1 : ]:
                    cur_index = elements.index(n)
                    for m in elements[cur_index+1 : ]:
                        temp_list.append((i,j,k,n,m))
    return temp_list
Then I thought maybe I can abstract it a bit, to make a combine_n function. And below is my initial blueprint:
# Unfinished version of combine_n
def combine_n(elements, r, cur_index=-1):
    """Find all combinations in elements with range n"""
    r -= 1
    target_list = elements[cur_index+1 : ]
    for i in target_list:
        cur_index = elements.index(i)
        if r > 0:
            combine_n(elements, r, cur_index)
            pass
        else:
            pass
I was then stuck there for a whole day. The major problem is that I can't pass a value properly through the recursive function. I added some code that fixed one problem, but since it runs on every recursive call, new problems arose. More fixes led to more bugs, a vicious cycle.
I then turned to the source code of itertools.combinations for help, and it turns out that it doesn't use recursion.
Do you think it is possible to generalize this combine_5 function into a combine_n function using recursion? Do you have any ideas about how to implement it?
FAILURE SAMPLE 1:
def combine_n(elements, r, cur_index=-1):
    """Find all combinations in elements with range n"""
    r -= 1
    target_list = elements[cur_index+1 : ]
    for i in target_list:
        cur_index = elements.index(i)
        if r > 0:
            combine_n(elements, r, cur_index)
            print(i)
        else:
            print(i)
This is my most recent attempt after a bunch of overcomplicated experiments.
The core idea is: if I can print them correctly, I can collect them into a container later.
But the problem is what happens in a nested for loop when an inner for loop is handed an empty list.
In combine_5, the temp_list.append((i,j,k,n,m)) clause simply never runs in that case.
But FAILURE SAMPLE 1 still prints the item from the outer for loop;
for example, combine_n([1, 2], 2) prints 2, 1, 2.
I need a way to pass this "empty" signal up to the outer for loop,
which I haven't figured out so far.
Yes, it's possible to do it with recursion. You can make combine_n return a list of tuples with all the combinations beginning at index cur_index, and starting with a partial combination of cur_combo, which you build up as you recurse:
def combine_n(elements, r, cur_index=0, cur_combo=()):
    r -= 1
    temp_list = []
    for elem_index in range(cur_index, len(elements)-r):
        i = elements[elem_index]
        if r > 0:
            # recurse to fill the remaining places of the tuple
            temp_list = temp_list + combine_n(elements, r, elem_index+1, cur_combo+(i,))
        else:
            # last level: the combination is complete
            temp_list.append(cur_combo+(i,))
    return temp_list
elements = list(range(1,6))
print(combine_n(elements, 3))
output:
[(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5)]
The for loop only goes up to len(elements)-r, because if you go further than that then there aren't enough remaining elements to fill the remaining places in the tuple. The tuples only get added to the list with append at the last level of recursion, then they get passed back up the call stack by returning the temp_lists and concatenating at each level back to the top.
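As a hedged variation on the same idea, the recursion can also be written as a generator, so that no intermediate lists are concatenated on the way back up the call stack. The names combine_n_iter, start and prefix are made up for this sketch; the comparison with itertools.combinations is only there as a sanity check.

```python
from itertools import combinations


def combine_n_iter(elements, r, start=0, prefix=()):
    """Lazily yield all r-length combinations of elements, in index order."""
    if r == 0:
        yield prefix  # the tuple is complete
        return
    # stop early enough that r - 1 further elements still remain
    for idx in range(start, len(elements) - r + 1):
        yield from combine_n_iter(elements, r - 1, idx + 1,
                                  prefix + (elements[idx],))


elements = list(range(1, 6))
result = list(combine_n_iter(elements, 3))
print(result == list(combinations(elements, 3)))  # True
```

The loop bound plays the same role as len(elements)-r in the answer above: it prevents descending into branches that cannot be completed.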

De-overlapping intervals

I'm trying to write some code to de-overlap (open) intervals. I found inspiration in Algorithm to shift overlapping intervals until no overlap is left and Possible Interview Question: How to Find All Overlapping Intervals.
Intervals represent physical entities. Their position was estimated by some means, but this imperfect estimate results in overlapping positions of these physical entities. However, in reality these physical entities cannot occupy the same space, so I'm using this code to readjust their positions. This adjustment should move these physical entities as little as possible and maintain their estimated relative positions as much as possible. Since these are physical entities, the length of each interval cannot change.
Code works well in most cases but hangs in some cases like this:
intervals = [(0, 8), (9, 13), (11, 14), (15, 21)]
Here's my python code. Any suggestions?
def intervalLength(interval):
    '''
    Finds the length of an interval, supplied as a tuple
    '''
    return interval[1]-interval[0]
def findOverlappingIntervals(intervals):
    # https://stackoverflow.com/questions/4542892/possible-interview-question-how-to-find-all-overlapping-intervals?rq=1
    # Throw the endpoints of the intervals into an array, marking them as either start- or end-points.
    # Sort them by breaking ties by placing end-points before start-points if the intervals are closed, or the other way around if they're half-open.
    '''
    Takes a list of intervals and returns the intervals that overlap.
    List returned has nested list composed of the intervals that overlap with each other
    Assumes list is ordered by the start position of the intervals
    '''
    end_points = []
    for n,i in enumerate(intervals):
        end_points.append((i[0],'b',n)) #'b' = beginning of interval
        end_points.append((i[1],'e',n)) #'e' = end of interval
    end_points.sort()
    b = 0
    e = 0
    overlapping = [set()]
    open_intervals = set()
    for ep,sORe,i in end_points:
        if sORe == 'b':
            b += 1
            open_intervals.add(i)
            if b-e > 1 and i in open_intervals:
                overlapping[-1].update(open_intervals)
            elif len(overlapping[-1]) > 0:
                overlapping.append(set())
        else:
            e += 1
            open_intervals.remove(i)
    overlapping = [o for o in overlapping if len(o) > 0]
    overlapping = [[intervals[i] for i in o] for o in overlapping]
    return overlapping
def deOverlap(intervals):
    '''
    Takes a list of overlapping intervals and returns a new list with updated positions
    without overlap and intervals separated by 1, maintaining the previous center
    '''
    # Find center of overlapping intervals
    avg = (sum(a+b for a,b in intervals)+len(intervals)-1)/(len(intervals)*2.0)
    # Find the total length of the overlapping intervals
    tot_length = sum(intervalLength(i) for i in intervals) + len(intervals) - 1
    # Find new start position for the overlapping intervals
    new_start = int(round(avg-(tot_length/2.0)))
    # Place first interval in new position
    non_over_intervals = [(new_start, new_start+intervalLength(intervals[0]))]
    # Place rest of intervals in new positions
    for i in intervals[1:]:
        start = non_over_intervals[-1][1]+1
        non_over_intervals.append((start,start+intervalLength(i)))
    return non_over_intervals
def deOverlapIntervals(intervals):
    '''
    Takes a list of intervals and returns a list with the same intervals with no overlap and
    located as close to the original locations as possible
    '''
    non_over_intervals = intervals
    i = 0
    while len(findOverlappingIntervals(non_over_intervals)) > 0:
        if i >= 10000:
            print('Tried 10,000 times and did not finish de-overlapping. Returning best I could do')
            return non_over_intervals
        overlapping_intervals = findOverlappingIntervals(non_over_intervals)
        non_over_intervals = set(non_over_intervals) - set([oi for group in overlapping_intervals for oi in group])
        for oi in overlapping_intervals:
            non_overlapping = deOverlap(oi)
            non_over_intervals.update(non_overlapping)
        non_over_intervals = list(non_over_intervals)
        non_over_intervals.sort()
        i += 1
    return non_over_intervals
These intervals run fine
intervals = [(-5,-1), (0,6), (7,11), (9,14), (12,17), (21,24), (27,32), (32,36), (39,41)]
These don't. They hang because intervals keep shifting and overlapping with the left or the right end
intervals = [(0, 8), (9, 13), (11, 14), (15, 21)]
non_over_intervals = deOverlapIntervals(intervals)
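Can't you do it a lot more simply? Like this:
result = intervals[:1]
for begin, end in intervals[1:]:
    if begin <= result[-1][1]:
        # overlapping: extend the previous interval (max() keeps nested intervals from shrinking it)
        result[-1] = (result[-1][0], max(result[-1][1], end))
    else:
        result.append((begin, end))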
Like this?
intervals = [(0, 8), (9, 13), (11, 14), (15, 21)]
intervals = [list(t) for t in intervals]
for i in range(len(intervals) - 1):
    offset = intervals[i + 1][0] - intervals[i][1]
    if offset < 1:
        # shift the next interval right, keeping its length
        diff = intervals[i + 1][1] - intervals[i + 1][0]
        intervals[i + 1][0] = intervals[i][1] + 1
        intervals[i + 1][1] = intervals[i + 1][0] + diff
print(intervals)
I found that when shifting an interval would make it overlap with another interval, the problem is fixed by including both the originally overlapping intervals and the intervals that the shift would overlap in a single deOverlapIntervals run.
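One way to read that idea is as a stack of clusters: whenever a laid-out cluster runs into its left neighbor, the two are merged and laid out again as one group. The sketch below is an illustrative variant rather than the original code. The names layout, de_overlap and gap are hypothetical, each cluster is centered on the mean of its members' midpoints, positions are left as floats instead of being rounded, and touching intervals are not counted as overlapping.

```python
def layout(cluster, gap=1):
    """Place a cluster's intervals side by side, centered on their mean midpoint."""
    total = sum(b - a for a, b in cluster) + gap * (len(cluster) - 1)
    center = sum(a + b for a, b in cluster) / (2 * len(cluster))
    start = center - total / 2
    placed = []
    for a, b in cluster:
        placed.append((start, start + (b - a)))  # length is preserved
        start += (b - a) + gap
    return placed


def de_overlap(intervals, gap=1):
    clusters = []  # stack of groups of original intervals, left to right
    for interval in sorted(intervals):
        cluster = [interval]
        # merge with the left neighbor for as long as the layouts collide
        while clusters and layout(clusters[-1], gap)[-1][1] > layout(cluster, gap)[0][0]:
            cluster = clusters.pop() + cluster
        clusters.append(cluster)
    return [placed for cluster in clusters for placed in layout(cluster, gap)]


print(de_overlap([(0, 8), (9, 13), (11, 14), (15, 21)]))
```

Every merge reduces the number of clusters by one, so the loop cannot cycle the way repeated independent shifts can.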

How to split a list into subsets with no repeating elements in python

I need code that takes a list (up to n=31) and returns all possible subsets of n=3 without any two elements repeating in the same subset twice (think of people who are teaming up in groups of 3 with new people every time):
list=[1,2,3,4,5,6,7,8,9]
and returns
[1,2,3][4,5,6][7,8,9]
[1,4,7][2,5,8][3,6,9]
[1,6,8][2,4,9][3,5,7]
but not:
[1,5,7][2,4,8][3,6,9]
because 1 and 7 have appeared together already (likewise, 3 and 9).
I would also like to do this for subsets of n=2.
Thank you!!
Here's what I came up with:
from itertools import combinations, chain
people = [1,2,3,4,5,6,7,8,9]
#get all combinations of 3 sets of 3 people
combos_combos = combinations(combinations(people,3), 3)
#filter out sets that don't contain all 9 people
valid_sets = filter(lambda combo:
                    len(set(chain.from_iterable(combo))) == 9,
                    combos_combos)
#a set of people that have already been paired
already_together = set()
for sets in valid_sets:
    #get all (sorted) combinations of pairings in this set
    pairings = list(chain.from_iterable(combinations(combo, 2) for combo in sets))
    pairings = set(map(tuple, map(sorted, pairings)))
    #if all of the pairings have never been paired before, we have a new one
    if len(pairings.intersection(already_together)) == 0:
        print(sets)
        already_together.update(pairings)
This prints:
~$ time python test_combos.py
((1, 2, 3), (4, 5, 6), (7, 8, 9))
((1, 4, 7), (2, 5, 8), (3, 6, 9))
((1, 5, 9), (2, 6, 7), (3, 4, 8))
((1, 6, 8), (2, 4, 9), (3, 5, 7))
real 0m0.182s
user 0m0.164s
sys 0m0.012s
Try this:
from itertools import permutations
lst = list(range(1, 10))
n = 3
triplets = list(permutations(lst, n))
triplets = [set(x) for x in triplets]
def array_unique(seq):
    checked = []
    for x in seq:
        if x not in checked:
            checked.append(x)
    return checked
triplets = array_unique(triplets)
result = []
m = n * 3
for x in triplets:
    for y in triplets:
        for z in triplets:
            if len(x.union(y.union(z))) == m:
                result += [[x, y, z]]
def groups(sets, i):
    result = [sets[i]]
    for x in sets:
        flag = True
        for y in result:
            for r in x:
                for p in y:
                    if len(r.intersection(p)) >= 2:
                        flag = False
                        break
                    else:
                        continue
                if flag == False:
                    break
        if flag == True:
            result.append(x)
    return result
for i in range(len(result)):
    print('%d:' % (i + 1))
    for x in groups(result, i):
        print(x)
Output for n = 10:
http://pastebin.com/Vm54HRq3
Here's my attempt of a fairly general solution to your problem.
from itertools import combinations
n = 3
l = range(1, 10)
def f(l, n, used, top):
    if len(l) == n:
        if all(set(x) not in used for x in combinations(l, 2)):
            yield [l]
    else:
        for group in combinations(l, n):
            if any(set(x) in used for x in combinations(group, 2)):
                continue
            for rest in f([i for i in l if i not in group], n, used, False):
                config = [list(group)] + rest
                if top:
                    # Running at top level, this is a valid
                    # configuration. Update used list.
                    for c in config:
                        used.extend(set(x) for x in combinations(c, 2))
                yield config
                break
for i in f(l, n, [], True):
    print(i)
However, it is very slow for high values of n, too slow for n=31. I don't have time right now to try to improve the speed, but I might try later. Suggestions are welcome!
My wife had this problem trying to arrange breakout groups for a meeting with nine people; she wanted no pairs of attendees to repeat.
I immediately busted out itertools and was stumped and came to StackOverflow. But in the meantime, my non-programmer wife solved it visually. The key insight is to create a tic-tac-toe grid:
1 2 3
4 5 6
7 8 9
And then simply take 3 groups going down, 3 groups going across, and 3 groups going diagonally wrapping around, and 3 groups going diagonally the other way, wrapping around.
You can do it just in your head then.
- : 123,456,789
| : 147,258,369
\ : 159,267,348
/ : 168,249,357
I suppose the next question is how far can you take a visual method like this? Does it rely on the coincidence that the desired subset size * the number of subsets = the number of total elements?
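The grid method can be sketched in code by reading the rows and then every "slope" of wrapped lines through the grid. This is only an illustration under assumptions: grid_rounds and slope are made-up names, and the wrap-around arithmetic is only claimed to give pair-free rounds when the grid side k is prime (as it is for 3); for a non-prime side the wrapped diagonals can repeat pairs.

```python
from itertools import combinations


def grid_rounds(k=3):
    """Return k + 1 rounds of k groups each, read off a k x k grid."""
    grid = [[r * k + c + 1 for c in range(k)] for r in range(k)]
    rounds = [[tuple(row) for row in grid]]  # the rows
    # slope 0 gives the columns; other slopes give wrapped diagonals
    for slope in range(k):
        rounds.append([
            tuple(grid[r][(c + slope * r) % k] for r in range(k))
            for c in range(k)
        ])
    return rounds


for groups in grid_rounds(3):
    print(groups)

# sanity check: no pair of people meets twice across all rounds
pairs = [frozenset(p) for groups in grid_rounds(3)
         for g in groups for p in combinations(g, 2)]
print(len(pairs) == len(set(pairs)))  # True
```

For nine people this reproduces the four rounds listed above, and since all 36 possible pairs are used exactly once, no fifth round is possible.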
