I need to join two convex, non-intersecting polygons into one joined convex polygon in a way that minimises the resulting area, like in the picture below: I'm looking for an algorithm that does this. I would also appreciate it if someone could provide me with a corresponding Python implementation.
If there are two non-intersecting polygons having say, m and n vertices respectively, then your problem can be thought of in this way:
Finding the convex polygon of least area containing all of the m+n points; that is, their convex hull. Having said this, check out the QuickHull algorithm here: http://www.geeksforgeeks.org/quickhull-algorithm-convex-hull/
You can also check out these algorithms.
Jarvis's Algorithm: http://www.geeksforgeeks.org/convex-hull-set-1-jarviss-algorithm-or-wrapping/
And, Graham's Scan: http://www.geeksforgeeks.org/convex-hull-set-2-graham-scan/
Hope this helps.
P.S. I think you can find Python implementations of these algorithms anywhere on the internet. :)
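For instance, a minimal sketch using scipy.spatial.ConvexHull (the two polygons here are made-up example data):

import numpy as np
from scipy.spatial import ConvexHull

# Two convex, non-intersecting polygons as vertex arrays (example data)
poly1 = np.array([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
poly2 = np.array([(4.0, 1.0), (6.0, 1.0), (5.0, 3.0)])

# The minimum-area convex polygon covering both polygons is the convex
# hull of the combined vertex set.
points = np.vstack((poly1, poly2))
hull = ConvexHull(points)
joined = points[hull.vertices]  # hull vertices in counter-clockwise order
print(joined)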
For an efficient solution, you can adapt the Monotone Chain method (https://en.wikibooks.org/wiki/Algorithm_Implementation/Geometry/Convex_hull/Monotone_chain) as follows:
for both polygons, find the leftmost and rightmost sites (in case of ties, use the highest/lowest respectively);
these sites split the polygons in two chains, that are ordered on X;
merge the two upper and two lower chains with comparisons on X (this is a pass of mergesort);
reject the reflex sites from the upper and lower chains, using the same procedure as in the monotone chain method (a variant of Graham's walk).
The total running time will be governed by
n + m comparisons to find the extreme sites;
n + m comparisons for the merge;
n + m + 2h LeftOf tests (signed area tests; h is the number of vertices of the result).
Thus the complexity is O(n + m), which is not optimal but quite probably good enough for your purposes (a more sophisticated O(log(n + m)) solution is possible when the polygons do not overlap, but it is not worth the fuss for small polygon sizes).
In the example, the result of each merge is just the concatenation of the two chains, but more complex cases can arise.
Final remark: if you keep all polygons as the concatenation of two monotone chains, you can spare the first step of the above procedure.
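For reference, a minimal sketch of the underlying hull machinery, with the polygons given as lists of (x, y) tuples. For simplicity it sorts the combined vertex set instead of merging the two already-ordered chains, so this version is O((n + m) log(n + m)) rather than O(n + m); the reflex-rejection walk is the same:

def cross(o, a, b):
    # Signed area test: > 0 if o -> a -> b turns anticlockwise
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def join_convex(p, q):
    # p and q are the vertex lists of the two convex polygons
    pts = sorted(set(p) | set(q))       # stands in for the O(n + m) merge pass
    lower, upper = [], []
    for pt in pts:                      # lower chain, rejecting reflex sites
        while len(lower) >= 2 and cross(lower[-2], lower[-1], pt) <= 0:
            lower.pop()
        lower.append(pt)
    for pt in reversed(pts):            # upper chain, same walk
        while len(upper) >= 2 and cross(upper[-2], upper[-1], pt) <= 0:
            upper.pop()
        upper.append(pt)
    return lower[:-1] + upper[:-1]      # concatenated chains, anticlockwise

# e.g. join_convex([(0, 0), (2, 0), (1, 2)], [(4, 0), (6, 0), (5, 2)])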
Finding the convex hull of both vertex sets would work, but the following approach is probably faster, as it only needs to visit the polygons' vertices in order:
Given polygons P and Q, pick from each one a vertex, p1 and q1 respectively.
Search in Q for the vertex q2 contiguous to q1 such that the rotation from p1-q1 to p1-q2 is clockwise (this can be checked easily using the vector cross product).
Repeat until you reach a point qk whose two contiguous vertices in Q both generate an anticlockwise rotation.
Now invert the process, traveling from p1 across contiguous vertices of P such that the rotation is anticlockwise, until an extreme vertex pl is found again.
Repeat from step 2 until no more advance is possible. You now have two points, pm and qn, which are the two vertices where one side of the red area meets the black polygons in your drawing above.
Now repeat the algorithm again, but with the directions changed from clockwise to anticlockwise and vice versa, in order to find the vertices for the other side of the red area.
The only remaining work is generating the final polygon from the two red-area sides already found and the segments of the original polygons.
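The rotation test in step 2 is just a sign check on the 2-D vector cross product; a minimal sketch:

def rotation(p, a, b):
    # 2-D cross product of the vectors p->a and p->b:
    # > 0 for an anticlockwise rotation, < 0 for clockwise, 0 if collinear
    return (a[0] - p[0]) * (b[1] - p[1]) - (a[1] - p[1]) * (b[0] - p[0])

# e.g. rotation(p1, q1, q2) < 0 means the turn from p1-q1 to p1-q2 is clockwise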
Given a polygon that may be concave and may have holes, how can I get the largest simple convex polygon composed of a subset of its vertices?
ie, given the simple concave polygon:
p = Polygon([(30, 2.01), (31.91, 0.62), (31.18, -1.63), (28.82, -1.63), (28.09, 0.62), (30, -0.001970703138552843)])
I want the largest simple convex polygon (perhaps the same polygon but without the leftmost point (28.09, 0.62), replacing (28.82, -1.63) with (30, -1.63)). Like this:
This is just an unmeasured example. It may be that in fact both (28.09, 0.62) and (30, 2.01) must be removed, if this produces a larger area, such as might result from the cut indicated by the green line here:
But, assuming the first cut was correct, if we added a hole on the "other side":
p = Polygon([(30, 2.01), (31.91, 0.62), (31.18, -1.63), (28.82, -1.63), (28.09, 0.62), (30, -0.001970703138552843)],
[[(30.1,0.62), (30.1,1.25), (31, 1.25), (31,0.62)]])
the largest simple convex polygon might in such cases rotate to the other half of the polygon, so instead of dropping the previous point, it would drop (30, 2.01) and replace (31.91, 0.62) with a point between that and (31, -1.63). Obviously, in this case it would throw out all of the vertices of the hole.
Commentary
Any hole left intact inside the polygon would, by definition, introduce a concave angle to the polygon. In the case that there is a hole in the input polygon, at most one edge from it can remain in the output polygon (and, by the definition of "simple polygon", that edge would be a member of the exterior coordinates).
There's a little bit of sloppiness in this definition so I should try to be more clear. All interior and exterior vertices are members of the set of possible points in the output simple polygon. So are all points that intersect the interior and exterior bounds (so the line segments between them). The selection of points should result in a simple, convex polygon that is inscribed in the source polygon. In the case that the source polygon is a simple, convex polygon, it should return the same polygon as output. It is quite possible to have whole families of candidate solutions with equal area. If they are maximal, any one of them will do.
Sketch approach: if you throw out cuts like the one in the sample with the green line, then all that remains is removal of points, plus projections from segments. So you could take all interior and exterior points as a set, exclude subsets of 0 or more of them, and find the largest convex polygon. That is: either just exclude the point, or, when excluding a point produces a new concave angle, project from the line segment on one side of the angle onto the line segment on the other side of the polygon (this is the approach used to produce the first sample solution image). Revisiting the green-line cuts: these are lines that bisect the polygon and are tangent to the centre point of the concave angle. If this bisection must run perpendicular to the line from the centre point to the centroid of the remaining polygon, then this is not much more complex. But I'm not sure that that is true. And in any case, that is a lot of polygons to consider.
note: at first I marked this as a duplicate, thinking it is essentially a more complicated version of another question (Finding largest subset of points forming a convex polygon, but with holes). However, that approach does not allow for the addition of new vertices in the solution. For example, Delaunay triangulation of the first shape above produces no new points:
[ 'POLYGON ((28.09 0.62, 28.82 -1.63, 30 -0.001970703138552843, 28.09 0.62))',
'POLYGON ((28.09 0.62, 30 -0.001970703138552843, 30 2.01, 28.09 0.62))',
'POLYGON ((30 2.01, 30 -0.001970703138552843, 31.91 0.62, 30 2.01))',
'POLYGON ((31.91 0.62, 30 -0.001970703138552843, 31.18 -1.63, 31.91 0.62))',
'POLYGON ((28.82 -1.63, 31.18 -1.63, 30 -0.001970703138552843, 28.82 -1.63))']
The question suggested as a possible duplicate only counts subsets of the points to find the maximal convex polygon; ie, it does not introduce new points on the lines.
I am not really sure your problem is well specified (or rather, the way you describe it, it reduces to a well-known, simpler problem).
First, let me introduce you to the idea of the Convex Hull:
Given a list of points, the convex hull is the smallest convex (simple) polygon that contains all points.
The shape of the CH is essentially what you would get if you were to "place a rubber band" around the points so that it touches the outer ones.
Now, there is a straightforward property of the CH:
Given a set of points, the area of their CH is larger than (or equal to) the area of any other (simple) polygon those points may form.
This is true because
i) If they form a convex polygon, then that polygon is the CH by definition.
ii) Otherwise, they form some non-convex shape. Visually, you can get from the CH to that non-convex shape by "removing triangles" comprised of two points on the CH and one inner point. Since you are only removing area, the CH has the largest area.
So, the largest convex polygon comprised of all the vertices is the CH.
Now, about selecting a subset of the original vertices: this will obviously give you a smaller (or equal) sized shape, so there is no point in selecting any subset, really.
Also, holes don't really impact this argument. Covering the hole is obviously to your benefit, since you can add the area around the hole.
So, the final answer (unless I missed something), is that all you need is the Convex Hull.
Fortunately, there are some good Python libraries for computing, plotting and messing around with convex hulls.
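For example, a minimal sketch with shapely, reusing the polygon-with-a-hole from the question:

from shapely.geometry import Polygon

p = Polygon([(30, 2.01), (31.91, 0.62), (31.18, -1.63), (28.82, -1.63),
             (28.09, 0.62), (30, -0.001970703138552843)],
            [[(30.1, 0.62), (30.1, 1.25), (31, 1.25), (31, 0.62)]])

hull = p.convex_hull               # the hole and the concave vertex both vanish
print(list(hull.exterior.coords))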
I generated a path between locations A and B with the constraint of locations that I have to pass through, or close to, so the route looks like A -> c1 -> c2 -> B, even though it is not the shortest path.
I used: for path in nx.all_shortest_paths(UG, source=l1_node_id, target=l2_node_id, weight='wgt'):
where 'wgt' is the distance of the edge divided by the driving speed on this road.
I generated a list of lists where each inner list is the node_id for example:
l_list = [
[n11,n12,n13,n14....]
[n21,n22,n23,n24....]
..
]
and on the map it looks like this (the markers are the beginning of each route, and I also colored each route differently):
I want to merge them into one route, but as you can see there are some splits, like the green and the red ones, and some common sequences (which I can handle); the second problem is the beginning of the blue route / end of the black one, which is unimportant.
I can't just remove the red route, because this is supposed to be a generic algorithm and I don't even know where this will happen again along the route.
I do have timestamps for each marker, but they only say that I have been close to this area (these are the locations of cellular antennas).
First, you're going to need to define "almost parallel" more precisely; formally, you need to define a similarity function.
Choosing a similarity/distance function
There are plenty of ways to define a similarity function; here is one of them.
Resample
Assuming each node n_i has x and y coordinates (n_i_x, n_i_y):
You can resample the points on the x-axis, such that the new points are sampled every 1 km.
Then, for each pair of routes, you can sum the differences in the y-axis.
Use this distance to cluster the routes.
Other ideas
Earth mover distance
Jaccard (~ % of common nodes)
Clustering
Once you have defined a similarity function, you can use a distance-based clustering algorithm; I recommend sklearn's agglomerative clustering.
After the clustering is done, all you have left to do is to choose one route from each cluster.
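A minimal sketch of this pipeline, assuming each route is a list of (x, y) node coordinates; the grid spacing and the distance threshold are made-up parameters:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

def resample(route, grid):
    # route: list of (x, y) node coordinates; np.interp needs increasing x
    r = np.asarray(sorted(route))
    return np.interp(grid, r[:, 0], r[:, 1])

def route_distance(a, b, grid):
    # Sum of y differences at the shared, resampled x positions
    return np.abs(resample(a, grid) - resample(b, grid)).sum()

def cluster_routes(routes, grid, threshold):
    # Pairwise distance matrix between all routes
    n = len(routes)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = route_distance(routes[i], routes[j], grid)
    model = AgglomerativeClustering(n_clusters=None,
                                    distance_threshold=threshold,
                                    metric="precomputed",  # 'affinity' in older sklearn
                                    linkage="average")
    return model.fit_predict(d)  # one cluster label per route

# grid = np.arange(x_min, x_max, 1.0), i.e. one sample per unit of x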
What is the most efficient way to compute the (Euclidean) distance to the nearest neighbor for each point in an array?
I have a list of 100k (X,Y,Z) points and I would like to compute a list of nearest neighbor distances. The index of the distance would correspond to the index of the point.
I've looked into PyOD and sklearn.neighbors, but those seem to require "teaching". I think my problem is simpler than that. For each point: find the nearest neighbor, compute the distance.
Example data:
points = [
    (0, 0, 1322.1695),
    (0.006711111, 0, 1322.1696),
    (0.026844444, 0, 1322.1697),
    (0.0604, 0, 1322.1649),
    (0.107377778, 0, 1322.1651),
    (0.167777778, 0, 1322.1634),
    (0.2416, 0, 1322.1629),
    (0.328844444, 0, 1322.1631),
    (0.429511111, 0, 1322.1627),
    ...
]
compute k = 1 nearest neighbor distances
result format:
results = [nearest neighbor distance]
example results:
results = [
    0.005939372,
    0.005939372,
    0.017815632,
    0.030118587,
    0.041569616,
    0.053475883,
    0.065324964,
    0.077200014,
    0.089077602,
]
UPDATE:
I've implemented two of the approaches suggested.
Use scipy.spatial.cdist to compute the full distance matrix.
Use a nearest-X-neighbors-in-radius-R search to find the subset of neighbor distances for every point and return the smallest.
Results are that Method 2 is faster than Method 1, but took a lot more effort to implement (which makes sense).
It seems the limiting factor for Method 1 is the memory needed to run the full computation, especially when my data set approaches 10^5 (x, y, z) points. For my data set of 23k points, it takes ~100 seconds to capture the minimum distances.
For Method 2, the speed scales as n_radius^2, that is, with the neighbor radius squared, which really means that the algorithm scales ~linearly with the number of included neighbors. Using a radius of ~5 (more than enough given the application), it took 5 seconds, for the set of 23k points, to produce the list of minimums in the same order as the point list itself. The difference matrix between the "exact solution" and Method 2 is essentially zero.
Thanks for everyone's help!
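For reference, a KD-tree gives you something like Method 2 almost for free; a minimal sketch with scipy.spatial.cKDTree (the random array is a stand-in for the real point list):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(23000, 3)  # stand-in for the real (x, y, z) points

tree = cKDTree(points)
# k=2: the nearest neighbor of each point is itself at distance 0,
# so take the second column.
dists, idx = tree.query(points, k=2)
results = dists[:, 1]  # nearest-neighbor distance per point, in input order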
Similar to Caleb's answer, but you could stop the iterative loop if you get a distance greater than some previous minimum distance (sorry, no code).
I used to program video games. It would take too much CPU to calculate the actual distance between two points. What we did was divide the "screen" into larger Cartesian squares and avoid the actual distance calculation if the Delta-X or Delta-Y was "too far away". That's just subtraction, so maybe something like that could qualify which pairs need the actual Euclidean distance calculation (extend to n dimensions as needed)?
EDIT - expanding "too far away" candidate pair selection comments.
For brevity, I'll assume a 2-D landscape.
Take the point of interest (X0,Y0) and "draw" an nxn square around that point, with (X0,Y0) at the origin.
Go through the initial list of points and form a list of candidate points that are within that square. While doing that, if the DeltaX [ABS(Xi-X0)] is outside of the square, there is no need to calculate the DeltaY.
If there are no candidate points, make the square larger and iterate.
If there is exactly one candidate point and it is within the radius of the circle inscribed in the square, that is your minimum.
If there are "too many" candidates, make the square smaller, but you only need to re-examine the candidate list from this iteration, not all the points.
If there are not "too many" candidates, then calculate the distance for that list. When doing so, first calculate DeltaX^2 + DeltaY^2 for the first candidate. If for subsequent candidates the DeltaX^2 is greater than the minimum so far, there is no need to calculate the DeltaY^2.
The minimum from that calculation is the minimum if it is within the radius of the circle inscribed in the square.
If not, you need to go back to a previous candidate list that includes points within the circle that has the radius of that minimum. For example, if you ended with one candidate in a 2x2 square that happened to be on the vertex X=1, Y=1, its distance/radius would be SQRT(2). So go back to a previous candidate list that has a square greater than or equal to 2xSQRT(2).
If warranted, generate a new candidate list that only includes points within the +/- SQRT(2) square.
Calculate the distance for those candidate points as described above, omitting any that exceed the minimum calculated so far.
No need to do the square root of the sum of the Delta^2 until you have only one candidate.
How to size the initial square, or if it should be a rectangle, and how to increase or decrease the size of the square/rectangle could be influenced by application knowledge of the data distribution.
I would consider recursive algorithms for some of this if the language you are using supports that.
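A minimal sketch of the Delta-X early-rejection idea in 2-D (the square size is a made-up parameter, and the grow/shrink iteration described above is left out):

def candidates_in_square(points, p0, half_side):
    # Keep only points inside the square of side 2*half_side around p0,
    # rejecting on |dx| alone before even computing |dy| (subtraction only).
    x0, y0 = p0
    out = []
    for x, y in points:
        if (x, y) == p0:
            continue
        if abs(x - x0) > half_side:
            continue                      # too far away on x alone
        if abs(y - y0) > half_side:
            continue
        out.append((x, y))
    return out

def min_distance_squared(candidates, p0):
    # Compare squared distances; take the square root once, at the end.
    x0, y0 = p0
    best = None
    for x, y in candidates:
        dx2 = (x - x0) ** 2
        if best is not None and dx2 > best:
            continue                      # DeltaX^2 already beats the minimum
        d2 = dx2 + (y - y0) ** 2
        if best is None or d2 < best:
            best = d2
    return best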
How about this?
from scipy.spatial import distance

A = (0.003467119, 0.01422762, 0.0101960126)
B = (0.007279433, 0.01651597, 0.0045558849)
C = (0.005392258, 0.02149997, 0.0177409387)
D = (0.017898802, 0.02790659, 0.0006487222)
E = (0.013564214, 0.01835688, 0.0008102952)
F = (0.013375397, 0.02210725, 0.0286032185)

points = [A, B, C, D, E, F]
results = []
for point in points:
    distances = [{'point': point, 'neighbor': p, 'd': distance.euclidean(point, p)}
                 for p in points if p != point]
    results.append(min(distances, key=lambda k: k['d']))
results will be a list of objects, like this:
results = [
{'point':(x1, y1, z1), 'neighbor':(x2, y2, z2), 'd':"distance from point to neighbor"},
...]
Where point is the reference point and neighbor is point's closest neighbor.
The fastest option available to you may be scipy.spatial.distance.cdist, which finds the pairwise distances between all of the points in its input. While finding all of those distances may not be the fastest algorithm for finding the nearest neighbors, cdist is implemented in C, so it is likely to run faster than anything you try in Python.
import numpy as np
from scipy.spatial.distance import cdist

points = np.array(...)  # the (n, 3) array of points

# Pairwise distances between every pair of points
distances = cdist(points, points)

# An element is not its own nearest neighbor
np.fill_diagonal(distances, np.inf)

# Find the index of each element's nearest neighbor
mins = distances.argmin(0)

# Extract the nearest neighbors from the data by row indexing
nearest_neighbors = points[mins, :]

# Put the arrays in the specified shape
results = np.stack((points, nearest_neighbors), 1)
You could theoretically make this run faster (mostly by combining all of the steps into one algorithm), but unless you're writing in C, you won't be able to compete with SciPy/NumPy.
(cdist runs in Θ(n²) time (if the size of each point is fixed), and every other part of the algorithm runs in O(n) time, so even if you tried to optimize the code in Python, you wouldn't notice the change for small amounts of data, and the improvements would be overshadowed by cdist for larger data.)
In Python, how would one find all integer points common to two circles?
For example, imagine a Venn diagram-like intersection of two (equally sized) circles, with center-points (x1,y1) and (x2,y2) and radii r1=r2. Additionally, we already know the two points of intersection of the circles are (xi1,yi1) and (xi2,yi2).
How would one generate a list of all points (x,y) contained in both circles in an efficient manner? That is, it would be simple to draw a box containing the intersections and iterate through it, checking if a given point is within both circles, but is there a better way?
Keep in mind that there are four cases here.
The circles do not intersect, meaning the "common area" is empty.
One circle resides entirely within the other, meaning the "common area" is the smaller/interior circle. Note that, given your criterion that the circles have equal radii, the only way this can happen is the degenerate case where the two circles coincide.
The two circles touch at a single intersection point.
The "general" case, where there are two intersection points. From there, you have two arcs that define the enclosed area. In that case, the box-drawing method could work for now; I'm not sure there's a more efficient method for determining what is contained by the intersection. Do note, however, that if you're just interested in the area, there is a formula for that.
You may also want to look into the various clipping algorithms used in graphics development. I have used clipping algorithms to solve a lot of problems similar to what you are asking here.
If the locations and radii of your circles can vary with a granularity less than your grid, then you'll be checking a bunch of points anyway.
You can minimize the number of points you check by defining the search area appropriately. It has a width equal to the distance between the points of intersection, and a height equal to
r1 + r2 - D
with D being the separation of the two centers. Note that this rectangle in general is not aligned with the X and Y axes. (This also gives you a test as to whether the two circles intersect!)
Actually, you'd only need to check half of these points. If the radii are the same, you'd only need to check a quarter of them. The symmetry of the problem helps you there.
You're almost there.
Iterating over the points in the box should be fairly good, but you can do better if for the second coordinate you iterate directly between the limits.
Say you iterate along the x axis first. Then, for the y axis, instead of iterating between the bounding-box coordinates, figure out where each circle intersects the vertical line at that x; more specifically, you are interested in the y coordinates of the intersection points, and you iterate between those (paying attention to rounding).
When you do this, because you already know you are inside the circles you can skip the checks entirely.
If you have a lot of points then you skip a lot of checks and you might get some performance improvements.
As an additional improvement you can pick the x axis or the y axis to minimize the number of times you need to compute intersection points.
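A minimal sketch of this, iterating over integer x and intersecting the two circles' y-ranges directly (the center/radius arguments are assumptions about how the circles are represented):

import math

def lattice_points_in_both(c1, r1, c2, r2):
    # Integer points inside both circles; c1, c2 are (x, y) centers.
    (x1, y1), (x2, y2) = c1, c2
    # x range where the two circles can possibly overlap
    x_lo = math.ceil(max(x1 - r1, x2 - r2))
    x_hi = math.floor(min(x1 + r1, x2 + r2))
    points = []
    for x in range(x_lo, x_hi + 1):
        d1 = r1 * r1 - (x - x1) ** 2
        d2 = r2 * r2 - (x - x2) ** 2
        if d1 < 0 or d2 < 0:
            continue                      # this x misses one of the circles
        h1, h2 = math.sqrt(d1), math.sqrt(d2)
        # Intersect the two y intervals; no per-point membership checks needed
        y_lo = math.ceil(max(y1 - h1, y2 - h2))
        y_hi = math.floor(min(y1 + h1, y2 + h2))
        points.extend((x, y) for y in range(y_lo, y_hi + 1))
    return points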
So you want to find the lattice points that are inside both circles?
The method you suggested of drawing a box and iterating through all the points in the box seems the simplest to me. It will probably be efficient, as long as the number of points in the box is comparable to the number of points in the intersection.
And even if it isn't as efficient as possible, you shouldn't try to optimize it until you have a good reason to believe it's a real bottleneck.
I assume by "all points" you mean "all pixels". Suppose your display is NX by NY pixels. Have two arrays
int x0[NY], x1[NY]; initially full of -1.
The intersection is lozenge-shaped, between two curves.
Iterate x,y values along each curve. At each y value (that is, where the curve crosses y + 0.5), store the x value in the array. If x0[y] is -1, store it in x0, else store it in x1.
Also keep track of the lowest and highest values of y.
When you are done, just iterate over the y values, and at each y, iterate over the x values between x0 and x1, that is, for (ix = x0[iy]; ix < x1[iy]; ix++) (or the reverse).
It's important to understand that pixels are not the points where x and y are integers. Rather pixels are the little squares between the grid lines. This will prevent you from having edge-case problems.
I have two dimensional discrete spatial data. I would like to make an approximation of the spatial boundaries of this data so that I can produce a plot with another dataset on top of it.
Ideally, this would be an ordered set of (x,y) points that matplotlib can plot with the plt.Polygon() patch.
My initial attempt is very inelegant: I place a fine grid over the data and, where data is found in a cell, a square matplotlib patch is created for that cell. The resolution of the boundary thus depends on the sampling frequency of the grid. Here is an example, where the grey region is the cells containing data and black is where no data exists.
1st attempt http://astro.dur.ac.uk/~dmurphy/data_limits.png
OK, problem solved, so why am I still here? Well.... I'd like a more "elegant" solution, or at least one that is faster (ie. I don't want to get on with "real" work, I'd like to have some fun with this!). The best way I can think of is a ray-tracing approach, eg:
1. from xmin to xmax, at y=ymin, check if the data boundary is crossed in intervals of dx;
2. at y=ymin+dy, do step 1;
3. do steps 1-2, but now sample in y.
An alternative is defining a centre, and sampling in r-theta space - ie radial spokes in dtheta increments.
Both would produce a set of (x,y) points, but then how do I order/link neighbouring points to create the boundary?
A nearest neighbour approach is not appropriate as, for example (to borrow from Geography), an isthmus (think of Panama connecting N&S America) could then close off and isolate regions. This also might not deal very well with the holes seen in the data, which I would like to represent as a different plt.Polygon.
The solution perhaps comes from solving an area-maximisation problem. For a set of points defining the data limits, what is the maximum contiguous area contained within those points? To form the enclosed area, what are the neighbouring points for the nth point? How will the holes be treated in this scheme; is this erring into topology now?
Apologies, much of this is me thinking out loud. I'd be grateful for some hints, suggestions or solutions. I suspect this is an oft-studied problem with many solution techniques, but I'm looking for something simple to code and quick to run... I guess everyone is, really!
~~~~~~~~~~~~~~~~~~~~~~~~~
OK, here's attempt #2 using Mark's idea of convex hulls:
alt text http://astro.dur.ac.uk/~dmurphy/data_limitsv2.png
For this I used qconvex from the qhull package, getting it to return the extreme vertices. For those interested:
cat [data] | qconvex Fx > out
The sampling of the perimeter seems quite low, and although I haven't played much with the settings, I'm not convinced I can improve the fidelity.
I think what you are looking for is the convex hull of the data. That will give a set of points such that, if you connect them, all your points will be on or inside the resulting polygon.
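A minimal sketch with scipy's ConvexHull and the plt.Polygon patch mentioned in the question (the random array is a stand-in for the real data):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from scipy.spatial import ConvexHull

data = np.random.rand(500, 2)    # stand-in for the 2-D spatial data

hull = ConvexHull(data)
boundary = data[hull.vertices]   # ordered (x, y) boundary points

fig, ax = plt.subplots()
ax.plot(data[:, 0], data[:, 1], '.', color='grey')
ax.add_patch(Polygon(boundary, closed=True, fill=False, edgecolor='black'))
plt.show()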
I may have missed something, but what's the motivation for not simply determining the maximum and minimum x and y values? Unless you have an enormous amount of data, you could simply iterate through your points determining the minimum and maximum values fairly quickly.
This isn't the most efficient example, but if your data set is small this won't be particularly slow:
import random
data = [(random.randint(-100, 100), random.randint(-100, 100)) for i in range(1000)]
x_min = min([point[0] for point in data])
x_max = max([point[0] for point in data])
y_min = min([point[1] for point in data])
y_max = max([point[1] for point in data])