Finding intersection points that form rectangles - python

I have a bunch of lines, each described by its direction and a point that marks its origin. I have to combine these lines so that they form rectangles. The rectangles can lie within each other, but their edges cannot overlap. I also know that the origin of each line lies on an edge of a rectangle, but not necessarily in the middle of that edge. Basically the input I have could be something like this:
And what I'm trying to achieve looks something like this:
Where every line is now described by the points where it intersected the other lines to form the correct rectangles.
I'm looking for an algorithm that finds the relevant intersection points and links them to the lines that describe the rectangles.

First of all, the problem as stated may have multiple solutions. For example, I don't see any constraint that invalidates the following:
So, you need to define an objective, for example:
maximize total covered area
maximize number of rectangles
maximize number of used lines
Here I'm trying to maximize the number of rectangles using a greedy approach. Keep in mind that a greedy algorithm does not guarantee the optimal solution, but it usually finds a reasonable one in a reasonable time.
Now, there are two steps in my algorithm:
Find all possible rectangles
Select a set of rectangles that satisfy the constraints
Step 1: Find all possible rectangles
Two vertical lines (l & r) plus two horizontal lines (b & t) can form a valid rectangle if:
l.x < r.x and b.y < t.y
l.y and r.y are between b.y and t.y
b.x and t.x are between l.x and r.x
In the following pseudocode, Xs and Ys are sorted lists of vertical and horizontal lines respectively, and nx and ny are their lengths:
function findRectangles
    for i1 from 1 to (nx-1)
        for i2 from (i1+1) to nx
            for j1 from 1 to (ny-1)
                if (Ys[j1].x>=Xs[i1].x and
                    Ys[j1].x<=Xs[i2].x and
                    Ys[j1].y<=Xs[i1].y and
                    Ys[j1].y<=Xs[i2].y)
                    for j2 from (j1+1) to ny
                        if (Ys[j2].x>=Xs[i1].x and
                            Ys[j2].x<=Xs[i2].x and
                            Ys[j2].y>=Xs[i1].y and
                            Ys[j2].y>=Xs[i2].y)
                            add [i1 j1 i2 j2] to results
                        end if
                    end for
                end if
            end for
        end for
    end for
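As a rough Python sketch of Step 1, the three conditions listed above can be checked directly for every pair of vertical lines and every pair of horizontal lines. The `Line` tuple and the function name `find_rectangles` are illustrative assumptions, not part of the question; each line is represented only by its origin point.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical representation: a line is identified by its origin point only.
Line = namedtuple("Line", "x y")


def find_rectangles(verticals, horizontals):
    """Return (i1, j1, i2, j2) index tuples into the sorted line lists."""
    xs = sorted(verticals, key=lambda line: line.x)
    ys = sorted(horizontals, key=lambda line: line.y)
    results = []
    for (i1, l), (i2, r) in combinations(enumerate(xs), 2):
        if l.x >= r.x:  # left must be strictly left of right
            continue
        for (j1, b), (j2, t) in combinations(enumerate(ys), 2):
            if b.y >= t.y:  # bottom must be strictly below top
                continue
            origins_fit = (
                b.y <= l.y <= t.y and b.y <= r.y <= t.y      # vertical origins
                and l.x <= b.x <= r.x and l.x <= t.x <= r.x  # horizontal origins
            )
            if origins_fit:
                results.append((i1, j1, i2, j2))
    return results


verticals = [Line(0, 1), Line(4, 2), Line(10, 50)]
horizontals = [Line(1, 0), Line(3, 3)]
rects = find_rectangles(verticals, horizontals)
```

Like the pseudocode, this enumerates all candidate quadruples, so it is only practical for a moderate number of lines.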
Step 2: Select valid rectangles
Valid rectangles, as stated in the problem, cannot overlap partially and cannot share an edge. The previous step finds too many rectangles, and, as I said before, there may be more than one combination of them that satisfies the constraints. To maximize the number of rectangles I suggest the following algorithm, which tends to accept smaller rectangles first:
function selectRects( Xs, Ys, rects )
    sort rectangles by their area (ascending);
    for i from 1 to rects.count
        if (none of the edges of rects[i] are eliminated) and
           (rects[i] does not partially overlap any of the items in results)
            add rects[i] to results;
            Xs[rects[i].left].eliminated = true;
            Xs[rects[i].right].eliminated = true;
            Ys[rects[i].bottom].eliminated = true;
            Ys[rects[i].top].eliminated = true;
        end if
    end for
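A minimal Python sketch of this greedy selection might look like the following. The `Rect` tuple, its `lines` field (the ids of the four lines that form the rectangle) and the helper names are assumptions made for illustration. "Partially overlap" is interpreted here as: the rectangles touch or intersect, but neither lies strictly inside the other.

```python
from collections import namedtuple

# Hypothetical rectangle record: coordinates plus the ids of its four lines.
Rect = namedtuple("Rect", "left bottom right top lines")


def area(r):
    return (r.right - r.left) * (r.top - r.bottom)


def touches_or_intersects(a, b):
    return (a.left <= b.right and b.left <= a.right
            and a.bottom <= b.top and b.bottom <= a.top)


def strictly_contains(a, b):
    return (a.left < b.left and b.right < a.right
            and a.bottom < b.bottom and b.top < a.top)


def partially_overlaps(a, b):
    return (touches_or_intersects(a, b)
            and not strictly_contains(a, b)
            and not strictly_contains(b, a))


def select_rects(rects):
    used_lines = set()  # plays the role of the "eliminated" flags
    results = []
    for r in sorted(rects, key=area):  # smallest rectangles first
        if used_lines.intersection(r.lines):
            continue
        if any(partially_overlaps(r, kept) for kept in results):
            continue
        results.append(r)
        used_lines.update(r.lines)
    return results


a = Rect(0, 0, 2, 2, ("v0", "h0", "v1", "h1"))
b = Rect(0, 0, 4, 4, ("v0", "h0", "v2", "h2"))      # reuses lines of a
c = Rect(-1, -1, 5, 5, ("v3", "h3", "v4", "h4"))    # strictly contains a
d = Rect(1, 1, 3, 3, ("v5", "h5", "v6", "h6"))      # partially overlaps a
selected = select_rects([b, c, a, d])
```

Because the choice is greedy, a different tie-breaking order among equal-area rectangles can give a different (still valid) selection.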


how to order the regions in regionprops by area?

I am doing some OCR with Python. To get the coordinates of the letters in an image, I take the centroid of each region (returned by regionprops from skimage.measure), and if the distance between that centroid and any of the other centroids is less than some value, I drop that region. I thought this would solve the problem of several regions nested one inside another, but I missed that if a region with a smaller area is detected first (such as just a part of a letter), all the bigger regions (which may contain the whole letter) are ignored. Here is my code:
centroids = []
for region in regionprops(label_image):
    if len(centroids) == 0:
        # do some stuff...
    if len(centroids) != 0:
        distances = []
        for centroid in centroids:
            distance = abs(centroid - region.centroid[1])
            distances.append(distance)
        if all(i >= 0.5 * region_width for i in distances):
            # do some stuff...
Now the question is: is there a way to order the list returned by regionprops by area, and how do I do it? Or can you suggest a better way to avoid the problem of a region inside other regions? Thanks in advance.
The Python built-in sorted() takes a key= argument, a function by which to sort, and a reverse= argument to sort in decreasing order. So you can change your loop to:
for region in sorted(
        regionprops(label_image),
        key=lambda r: r.area,
        reverse=True,
):
To check whether one region is completely contained in another, you can use r.bbox, and check whether one box is inside another, or overlaps it.
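As a small sketch of the bounding-box idea, the snippet below sorts regions by decreasing area and drops any region whose box lies inside an already kept one. It uses plain stand-in objects with `area` and `bbox` fields instead of real regionprops results, and the helper names `bbox_contains` and `drop_nested` are made up for this example. The bbox layout (min_row, min_col, max_row, max_col) follows the skimage convention.

```python
from collections import namedtuple

# Stand-in for a regionprops entry; real regions expose .area and .bbox too.
Region = namedtuple("Region", "area bbox")


def bbox_contains(outer, inner):
    """True if bbox `inner` lies entirely within bbox `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def drop_nested(regions):
    """Keep the largest regions and discard regions nested inside them."""
    kept = []
    for region in sorted(regions, key=lambda r: r.area, reverse=True):
        if not any(bbox_contains(k.bbox, region.bbox) for k in kept):
            kept.append(region)
    return kept


regions = [
    Region(4, (2, 2, 4, 4)),        # part of a letter, inside the big box
    Region(100, (0, 0, 10, 10)),    # whole letter
    Region(9, (20, 20, 23, 23)),    # separate letter
]
kept = drop_nested(regions)
```

Processing the largest regions first means a fragment can never block the full letter that contains it.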
Finally, if you have a lot of regions, I recommend you build a scipy.spatial.cKDTree with all the centroids before running your loop, as this will make it much faster to check whether a region is close to existing ones.

Number of shortest paths

Here is the problem:
Given the input n = 4 x = 5, we must imagine a chessboard that is 4 squares across (x-axis) and 5 squares tall (y-axis). (This input changes, all the way up to n = 200 x = 200.)
Then, we are asked to determine the shortest path from the bottom left square on the board to the top right square on the board for the Knight (the Knight can move 2 spaces on one axis, then 1 space on the other axis).
My current ideas:
Use a 2d array to store all the possible moves, perform breadth-first search (BFS) on the 2d array to find the shortest path.
Floyd-Warshall shortest path algorithm.
Create an adjacency list and perform BFS on that (but I think this would be inefficient).
To be honest though I don't really have a solid grasp on the logic.
Can anyone help me with pseudocode, Python code, or even just a logical walk-through of the problem?
BFS is efficient enough for this problem, as its complexity is O(n*x) since you explore each cell only once. To keep the number of shortest paths, you just need an auxiliary array that stores them.
You can also use A* to solve this faster but it's not necessary in this case because it is a programming contest problem.
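from collections import deque

dist = {}  # square -> length of a shortest path from start
ways = {}  # square -> number of shortest paths from start

def bfs():
    start = 1, 1
    goal = 6, 6
    queue = deque([start])
    dist[start] = 0
    ways[start] = 1
    while queue:
        cur = queue.popleft()
        if cur == goal:
            print("reached goal in %d moves and %d ways" % (dist[cur], ways[cur]))
            return
        for move in [(1, 2), (2, 1), (-1, -2), (-2, -1), (1, -2), (-1, 2), (-2, 1), (2, -1)]:
            next_pos = cur[0] + move[0], cur[1] + move[1]
            # skip squares that fall off the board
            if next_pos[0] > goal[0] or next_pos[1] > goal[1] or next_pos[0] < 1 or next_pos[1] < 1:
                continue
            # another shortest route into an already discovered square
            if next_pos in dist and dist[next_pos] == dist[cur] + 1:
                ways[next_pos] += ways[cur]
            # first time this square is reached
            if next_pos not in dist:
                dist[next_pos] = dist[cur] + 1
                ways[next_pos] = ways[cur]
                queue.append(next_pos)

bfs()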
reached goal in 4 moves and 4 ways
Note that the number of ways to reach the goal can get exponentially big
I suggest:
Use BFS backwards from the target location to calculate (in just O(nx) total time) the minimum distance to the target (x, n) in knight's moves from each other square. For each starting square (i, j), store this distance in d[i][j].
Calculate c[i][j], the number of minimum-length paths starting at (i, j) and ending at the target (x, n), recursively as follows:
c[x][n] = 1
c[i][j] = the sum of c[p][q] over all (p, q) such that both
(p, q) is a knight's-move-neighbour of (i, j), and
d[p][q] = d[i][j]-1.
Use memoisation in step 2 to keep the recursion from taking exponential time. Alternatively, you can compute c[][] bottom-up with a slightly modified second BFS (also backwards) as follows:
c = x by n array with each entry initially 0;
seen = x by n array with each entry initially 0;
c[x][n] = 1;
seen[x][n] = 1;
s = createQueue();
push(s, (x, n));
while (notEmpty(s)) {
    (i, j) = pop(s);
    for (each location (p, q) that is a knight's-move-neighbour of (i, j)) {
        if (d[p][q] == d[i][j] + 1) {
            c[p][q] = c[p][q] + c[i][j];
            if (seen[p][q] == 0) {
                push(s, (p, q));
                seen[p][q] = 1;
            }
        }
    }
}
The idea here is to always compute c[][] values for all positions having some given distance from the target before computing any c[][] value for a position having a larger distance, as the latter depend on the former.
The length of a shortest path will be d[1][1], and the number of such shortest paths will be c[1][1]. Total computation time is O(nx), which is clearly best-possible in an asymptotic sense.
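For reference, here is one way the backward BFS could look in Python. It is a sketch that merges the distance pass and the counting pass into a single BFS from the target, which works because BFS pops squares in order of non-decreasing distance. The function name and the 0-based coordinates are my own choices, not part of the problem statement.

```python
from collections import deque

KNIGHT_MOVES = [(1, 2), (2, 1), (-1, -2), (-2, -1),
                (1, -2), (-1, 2), (-2, 1), (2, -1)]


def count_shortest_knight_paths(width, height):
    """Return (moves, number_of_shortest_paths) from the bottom-left to the
    top-right square, or None if the target cannot be reached."""
    target = (width - 1, height - 1)
    d = {target: 0}  # distance to the target in knight moves
    c = {target: 1}  # number of shortest paths to the target
    queue = deque([target])
    while queue:
        i, j = queue.popleft()
        for di, dj in KNIGHT_MOVES:
            p, q = i + di, j + dj
            if not (0 <= p < width and 0 <= q < height):
                continue
            if (p, q) not in d:  # first discovery fixes the distance
                d[(p, q)] = d[(i, j)] + 1
                c[(p, q)] = 0
                queue.append((p, q))
            if d[(p, q)] == d[(i, j)] + 1:  # (i, j) is one step closer
                c[(p, q)] += c[(i, j)]
    start = (0, 0)
    if start not in d:
        return None
    return d[start], c[start]


result_6x6 = count_shortest_knight_paths(6, 6)
```

When a square is popped, every square one step closer to the target has already been popped, so its count is final before it is propagated further.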
My approach to this question would be backtracking, as the number of squares along the x-axis and the y-axis are different.
Note: Backtracking algorithms can be slow for some cases and fast for others.
Create a 2-d array for the chessboard. You know the starting index and the final index. To reach the final index you need to keep close to the diagonal that joins the two indexes.
From the starting index, look at all the indexes that the knight can travel to, choose the one closest to the diagonal, and keep on traversing. If there is no way to travel any further, backtrack one step and move to the next location available from there.
PS: This is a bit similar to the well-known Knight's Tour problem, in which, for any chosen starting point, you have to find a path on which the knight covers all squares. I have coded this as a Java GUI application; I can send you the link if you want any help.
Hope this helps!!
Try something. Draw boards of the following sizes: 1x1, 2x2, 3x3, 4x4, and a few odd ones like 2x4 and 3x4. Starting with the smallest board and working to the largest, start at the bottom left corner and write a 0, then find all moves from zero and write a 1, find all moves from 1 and write a 2, etc. Do this until there are no more possible moves.
After doing this for all 6 boards, you should have noticed a pattern: some squares couldn't be moved to until you got a larger board, but once a square was "discovered" (i.e. could be reached), the number of minimum moves to that square was constant for all boards not smaller than the board on which it was first discovered. (Smaller means less than n OR less than x, not less than (n * x).)
This tells us something powerful, at least anecdotally. All squares have a number associated with them that must be discovered. This number is a property of the square, NOT the board, and is NOT dependent on the size or shape of the board. However, if the square cannot be reached, then obviously the number is not applicable.
So you need to find the number of every square on a 200x200 board, and you need a way to see if a board is a subset of another board to determine if a square is reachable.
Remember, in these programming challenges, some questions that are really hard can be solved in O(1) time by using lookup tables. I'm not saying this one can, but keep that trick in mind. For this one, pre-calculating the 200x200 board numbers and saving them in an array could save a lot of time, whether it is done only once on first run or run before submission and then the results are hard coded in.
If the problem needs move sequences rather than number of moves, the idea is the same: save move sequences with the numbers.

Vectorised Marching cubes (squares) - connecting the lines into curves

I am drawing a metaball with the marching cubes algorithm (marching squares, as it is 2D).
Everything is fine, but I want to get it as a vector object.
So far I've got a vector line or two from each active square, and I keep them in a list called lines. In other words, I have an array of small vector lines that together trace several isolines (curves). My aim is to rebuild those curves from the lines.
Now I am stuck on joining them together quickly: basically I need to connect all the lines one by one into several sequences (curves). I don't know how many curves (sequences) there will be, the lines can point in different directions, and I need to turn the lines into sequences of unique points.
So far I wrote something obviously ugly and half-working (here line is a class with a list of points as its attribute points, chP is a function checking whether two points are close enough, and t defines this 'enough'):
def countur2(lines):
    '''transform random list of lines into
    list of grouped sequences'''
    t = 2  # tolerance
    sqnss = [[lines[0]]]  # sequences
    kucha = [lines[0]]  # list of already used lines
    for l in lines:
        for i, el in enumerate(lines):
            print('working on el', i)
            ss = sqnss[-1][0]   # first line of the current sequence
            ee = sqnss[-1][-1]  # last line of the current sequence
            if el not in kucha:
                if chP(el.points[0], ee.points[1], t):
                    sqnss[-1].append(el)
                elif chP(el.points[1], ee.points[1], t):
                    sqnss[-1].append(el.rvrse())
                elif chP(el.points[1], ss.points[0], t):
                    sqnss[-1] = [el] + sqnss[-1]
                elif chP(el.points[0], ss.points[0], t):
                    sqnss[-1] = [el.rvrse()] + sqnss[-1]
                else:
                    sqnss.append([el])
                    print('new shape added, with el as start')
                kucha.append(el)
    # return sequences of points
    ps = []
    for x in sqnss:
        ps.append([el.points[0] for el in x])
    return ps
I know this is a big question, but please give me any clue in the right direction for handling this task.
A first option is to number all cell sides uniquely, and associate to every vector the pair of edges it joins.
Enter all pairs in a dictionary, both ways: (a,b) and (b,a). Then, starting from an arbitrary pair, say (a,b), you will find the next pair through b, say (b,c). You will remove both (b,c) and (c,b) from the dictionary, and continue from c, until the chain breaks on a side of the domain, or loops.
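A possible sketch of this first option in Python: each vector is given as a pair of cell-side ids (any hashable value, for instance a tuple identifying the grid edge), and the pairs are chained through a dictionary of neighbours. The function name and the input format are assumptions for illustration; it also assumes that every cell side is crossed by at most two vectors, which holds when no corner value equals the iso-level.

```python
from collections import defaultdict


def chain_segments(segments):
    """Join (a, b) pairs of cell-side ids into ordered chains of ids.

    A closed loop is returned with its first id repeated at the end.
    """
    neighbours = defaultdict(list)
    for a, b in segments:  # enter every pair both ways
        neighbours[a].append(b)
        neighbours[b].append(a)

    def walk(start):
        chain = [start]
        cur = start
        while neighbours[cur]:
            nxt = neighbours[cur].pop()   # remove (cur, nxt) ...
            neighbours[nxt].remove(cur)   # ... and (nxt, cur)
            chain.append(nxt)
            cur = nxt
        return chain

    chains = []
    # Open curves first: they start on an id used by a single vector.
    endpoints = [n for n, adj in neighbours.items() if len(adj) == 1]
    for node in endpoints:
        if neighbours[node]:
            chains.append(walk(node))
    # Whatever is left forms closed loops.
    for node in list(neighbours):
        if neighbours[node]:
            chains.append(walk(node))
    return chains


segments = [(1, 2), (3, 2), (3, 4), (10, 11), (11, 12), (12, 10)]
chains = chain_segments(segments)
```

Each pair is touched a constant number of times on average, so the joining step is roughly linear in the number of vectors, and the direction of the individual vectors does not matter.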
A second option is to scan the whole domain and, when you find a cell crossed by an isocurve, compute the vector, then move to the neighboring cell that shares an edge crossed by the isocurve, and so on. To avoid scanning forever, flag each cell as already visited.
By contrast with the first approach, no dictionary is required, as following the chain is purely based on the local geometry.
Beware that there are two traps:
Cells having one or more corner values equal to the iso-level create trouble. A possible cure is to slightly modify the corner values; this will create a few tiny vectors.
Cells can be crossed by two vectors instead of one, and need to be visited twice.

Performance of a "fuzzy" Jaccard index implementation

I'm trying to calculate a kind of fuzzy Jaccard index between two sets, with the following rationale: as with the Jaccard index, I want to calculate the ratio between the number of items that are common to both sets and the total number of different items in both sets. The problem is that I want to use a similarity function with a threshold to determine what counts as the "same" item being in both sets, so that items that are similar:
Aren't counted twice in the union
Are counted in the intersection.
I have a working implementation here (in python):
import itertools

def fuzzy_jaccard(set1, set2, similarity, threshold):
    intersection_size = union_size = len(set1 & set2)
    shorter_difference, longer_difference = sorted([set2 - set1, set1 - set2], key=len)
    while len(shorter_difference) > 0:
        # best remaining pair across the two differences
        item1, item2 = max(
            itertools.product(longer_difference, shorter_difference),
            key=lambda pair: similarity(*pair)
        )
        longer_difference.remove(item1)
        shorter_difference.remove(item2)
        if similarity(item1, item2) > threshold:
            # similar enough: count the pair as one shared item
            union_size += 1
            intersection_size += 1
        else:
            # two distinct items
            union_size += 2
    union_size = union_size + len(longer_difference)
    return intersection_size / union_size
The problem here is that this is quadratic in the size of the sets, because in itertools.product I iterate over all possible pairs of items taken one from each set(*). Now, I think I must do this because I want to match each item a from set1 with the best possible candidate b from set2 that isn't more similar to another item a' from set1.
I have a feeling that there should be a O(n) way of doing that I'm not grasping. Do you have any suggestions?
(*) There are other issues too, like recalculating the similarity for each pair once I get the best match, but I don't care too much about them.
I doubt there's any way that would be O(n) in the general case, but you can probably do a lot better than O(n^2) at least for most cases.
Does your similarity obey the triangle inequality? By this I mean: can you assume that distance(a, c) <= distance(a, b) + distance(b, c)? If not, this answer probably won't help. I'm treating similarities like distances.
Try clumping the data:
Pick a radius r. Based on intuition, I suggest setting r to one-third of the average of the first 5 similarities you calculate, or something.
The first point you pick in set1 becomes the centre of your first clump. Classify the points in set2 as being in the clump (similarity to the centre point <= r) or outside the clump. Also keep track of points that are within 2r of the clump centre.
You can require that clump centre points be at least a distance of 2r from each other; in that case some points may not be in any clump. I suggest making them at least r from each other. (Maybe less if you're dealing with a large number of dimensions.) You could treat every point as a clump centre but then you wouldn't save any processing time.
When you pick a new point, first compare it with the clump centre points (even though they're in the same set). Either it's in an already existing clump, or it becomes a new clump centre, (or perhaps neither if it's between r and 2r of a clump centre). If it's within r of a clump centre, then compare it with all points in the other set that are within 2r of that clump centre. You may be able to ignore points further than 2r from the clump centre. If you don't find a similar point within the clump (perhaps because the clump has no points left), then you may have to scan all the rest of the points for that case. Hopefully this would mostly happen only when there aren't many points left in the set. If this works well, then in most cases you'd find the most similar point within the clump and would know that it's the most similar point.
This idea may require some tweaking.
If there are a large number of dimensions involved, then you might find that for a given radius r, frustratingly many points are within 2r of each other while few are within r of each other.
Here's another algorithm. The more time-consuming it is to calculate your similarity function (as compared to the time it takes to maintain sorted lists of points) the more index points you might want to have. If you know the number of dimensions, it might make sense to use that number of index points. You might reject a point as a candidate index point if it's too similar to another index point.
For the first point you use, and for any others you decide to use as index points, generate a list of all the remaining points in the other set, sorted in order of distance from the index point.
When you're comparing a point P1 to points in the other set, I think you can skip over points for two possible reasons. Consider the most similar point P2 you've found to P1. If P2 is similar to an index point then you can skip all points which are sufficiently dissimilar from that index point. If P2 is dissimilar to an index point then you can skip over all points which are sufficiently similar to that index point. I think in some cases you can skip over some of both types of point for the same index point.
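One way to make the skipping rule concrete is the standard triangle-inequality bound: for an index point I, |distance(P1, I) - distance(X, I)| is a lower bound on distance(P1, X), so any X whose bound is no better than the best match found so far can be skipped without calling the (expensive) distance function. The sketch below uses a single index point and assumes the distance is a true metric; the function name and return format are made up for this example.

```python
def nearest_with_index_point(query, candidates, index_point, distance):
    """Return (best_candidate, best_distance, number_of_distance_calls).

    Assumes `distance` satisfies the triangle inequality.
    """
    # In real use this list would be computed once and reused for every query.
    to_index = [distance(c, index_point) for c in candidates]
    query_to_index = distance(query, index_point)

    best, best_dist, evaluated = None, float("inf"), 0
    for cand, cand_to_index in zip(candidates, to_index):
        # Lower bound on distance(query, cand); skip if it cannot win.
        if abs(query_to_index - cand_to_index) >= best_dist:
            continue
        evaluated += 1
        d = distance(query, cand)
        if d < best_dist:
            best, best_dist = cand, d
    return best, best_dist, evaluated


def abs_diff(a, b):
    return abs(a - b)


best, best_dist, evaluated = nearest_with_index_point(
    9.0, [0.0, 10.0, 20.0, 30.0, 9.5], 0.0, abs_diff
)
```

With several index points you would take the largest of the lower bounds, at the cost of storing one precomputed distance list per index point.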

how to find a selected vert's symmetrical pair on a maya mesh

What would be the best (fastest) method to find a given vertex's symmetrical pair (i.e. a vert on the left side of a mesh and its right-hand side equivalent)? Is it possible to use the OpenMaya API in Python for this, or is there a better way?
You don't want to check the position of every vert against every other over and over, that will be extremely slow.
A simple approach is to get a hash value - a simple comparison operation - for every vertex that is identical for two verts which are symmetrical. Luckily tuples - NOT lists!! - are hashable. So the algorithm would be:
get a position tuple for each vert
make a dictionary (position: vertex) for all the verts on the 'left' (or 'up' or 'back' etc) side.
make a second dictionary (position: vertex) for the verts on the opposite side, with the symmetry axis flipped: if you were doing left/right axis, the left dictionary would be
{ (x, y, z) : "pCube1.vtx[0]" } #etc
and the right dictionary would be
{ (-1 * x, y, z) : "pCube1.vtx[99]" } # and so on
Any key which is duplicated in both dictionaries is symmetrical, so you need to collect the keys which show up in both and then get the verts from each side they represent:
duplicates = [j for j in leftDictionary if j in rightDictionary]
pairs = [(leftDictionary[k], rightDictionary[k]) for k in duplicates]
This won't be super fast on big meshes but it should do the trick.
The one place it may have issues is if there are floating point discrepancies between the sides, so that items which are visually symmetrical may not be mathematically symmetrical. You could modify it to match numbers within a given tolerance by, say, quantizing the values you put into the tuples: the easy way would be to blow them up by some factor, say 1000, and then turn them into integers. The raw method errs on the side of missing some things which look symmetrical but aren't exactly; the quantized method may collapse multiple matches if they are too close. I'd stick with the raw data unless it's failing to do what you need.
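Here is a small, Maya-independent sketch of the quantized variant. It assumes you have already collected the world-space positions into a plain dict mapping vertex names to (x, y, z) tuples, and that the symmetry plane is x = 0. The function name and the `scale` parameter are illustrative choices, not Maya API.

```python
def find_symmetric_pairs(positions, scale=1000):
    """Pair up vertices mirrored across the x = 0 plane.

    positions: dict of vertex name -> (x, y, z).
    scale: quantization factor; 1000 keeps three decimal places.
    """
    def key(x, y, z):
        # Quantize to integers so tiny float differences hash identically.
        return (round(x * scale), round(y * scale), round(z * scale))

    positive_side = {}
    negative_side = {}
    for name, (x, y, z) in positions.items():
        if x > 0:
            positive_side[key(x, y, z)] = name
        elif x < 0:
            negative_side[key(-x, y, z)] = name  # flip the symmetry axis
        # verts lying exactly on the plane have no partner

    shared = [k for k in positive_side if k in negative_side]
    return sorted((positive_side[k], negative_side[k]) for k in shared)


positions = {
    "pCube1.vtx[0]": (1.0, 2.0, 0.5),
    "pCube1.vtx[1]": (-1.0000001, 2.0, 0.5),  # mirror of vtx[0], with noise
    "pCube1.vtx[2]": (0.0, 1.0, 0.0),         # on the symmetry plane
    "pCube1.vtx[3]": (3.0, 0.0, 0.0),         # no partner
}
pairs = find_symmetric_pairs(positions)
```

Building both dictionaries is a single pass over the vertices, so the whole lookup stays roughly linear in the vertex count rather than comparing every vert with every other.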

