I'm interested in calculating the Hausdorff Distance between 2 polygons (specifically quadrilaterals which are almost rectangles) defined by their vertices. They may overlap.
Recall $d_H(A,B) = \max(d(A,B), d(B,A))$ where $d$ is the Hausdorff semi-metric
$d(A,B) = \sup_{a\in A}\inf_{b\in B}d(a,b)$.
Is it true that, given a finite disjoint covering $\{A_i\}$ of $A$, $d(A,B)=\max_i d(A_i,B)$? A corollary of which is that $d(A,B)=d(A\setminus B,B)$.
I have found a paper by Atallah 1 (PDF). I'm interested in working in Python and would be open to any preprogrammed solutions.
In the case of convex polygons, d(A, B) is the maximum, over the vertices of A, of the distance from that vertex to the nearest point of B. Therefore the Hausdorff distance is not too hard to calculate if you can calculate the distance from an arbitrary point to a convex polygon.
To calculate the distance from a point to a convex polygon you first have to test whether the point is inside the polygon (if so the distance is 0), and then if it is not find the minimum distance to any of the bounding line segments.
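A minimal sketch of the two steps just described, assuming both polygons are convex and given as ordered lists of (x, y) vertices. The function names are made up for illustration and do not come from any library.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the closed segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Parameter of the projection of p onto the line, clamped to the segment
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def inside_convex(p, poly):
    """True if p is inside or on the convex polygon (either orientation)."""
    has_pos = has_neg = False
    for i in range(len(poly)):
        a, b = poly[i], poly[(i + 1) % len(poly)]
        cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
        has_pos = has_pos or cross > 0
        has_neg = has_neg or cross < 0
    return not (has_pos and has_neg)

def point_polygon_distance(p, poly):
    """0 inside the polygon, otherwise the distance to the nearest edge."""
    if inside_convex(p, poly):
        return 0.0
    return min(point_segment_distance(p, poly[i], poly[(i + 1) % len(poly)])
               for i in range(len(poly)))

def directed_hausdorff(A, B):
    """d(A, B): only the vertices of A need checking when B is convex."""
    return max(point_polygon_distance(a, B) for a in A)

def hausdorff(A, B):
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

For example, `hausdorff([(0, 0), (1, 0), (1, 1), (0, 1)], [(2, 0), (3, 0), (3, 1), (2, 1)])` compares a unit square with a copy shifted two units to the right.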
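The answer to your other query is no, at least for the symmetric Hausdorff distance d_H. As an example let A and B both be the same 2x2 square centered at the origin. Partition A into 4 1x1 squares. The Hausdorff distance d_H(Ai, B) from each Ai to B is sqrt(2), but the distance from A to B is 0. (For the one-sided d in the question the identity does hold, because a supremum over a union is the maximum of the suprema over its pieces.)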
UPDATE: The claim about the vertices is not immediately obvious, therefore I'll sketch a proof that is good in any finite number of dimensions. The result I am trying to prove is that in calculating d(A, B) with both polygons and B convex, it suffices to find the distances from the vertices of A to B. (The closest point in B might not be a vertex, but one of the farthest points in A must be a vertex.)
Since both are finite closed shapes, they are compact. From compactness, there must exist a point p in A that is as far as possible from B. From compactness, there must exist a point q in B that is as close as possible to p.
This distance is 0 only if A is contained in B, in which case every point of A, vertices included, achieves it. So without loss of generality we may assume that there is a positive distance from p to q.
Draw the plane (in higher dimensions, some sort of hyperplane) touching q that is perpendicular to the line from p to q. Can any point in B cross this plane? Well if there was one, say r, then every point on the line segment from q to r must be in B because B is convex. But it is easy to show that there must be a point on this line segment that is closer to p than q is, contradicting the definition of q. Therefore B cannot cross this plane.
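The "easy to show" step can be written out. Suppose r is a point of B strictly on p's side of the plane, so that $(p-q)\cdot(r-q) > 0$, and let $x_t = q + t(r-q)$ for $0 \le t \le 1$ be the points of the segment from q to r (the symbol $x_t$ is introduced here only for this derivation). Then:

```latex
\|p - x_t\|^2
  = \|(p - q) - t\,(r - q)\|^2
  = \|p - q\|^2 \;-\; 2t\,(p - q)\cdot(r - q) \;+\; t^2\,\|r - q\|^2 .
```

For $0 < t < 2\,(p-q)\cdot(r-q)/\|r-q\|^2$ the last two terms sum to a negative number, so $x_t$ lies in B and is strictly closer to p than q is.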
Clearly p cannot be an interior point, because if it was, then continue along the ray from q to p and you find points in A that are farther from the plane that B is bounded behind, contradicting the definition of p. If p is a vertex of A, then the result is trivially true. Therefore the only interesting case is if p is on a boundary of A but is not a vertex.
If so, then p is on a surface. If that surface were not parallel to the plane we constructed, it would be easy to travel along that surface, away from the plane we have bounded B behind, and find points farther away from B than p. Therefore that surface must be parallel to that plane. Since A is finite, that surface must terminate in vertices somewhere. Those vertices are the same distance from that plane as p, and therefore are at least as far from B as p. Therefore there exists at least one vertex of A that is as far as possible from B.
That is why it sufficed to find the maximum of the distances from the vertices of the polygons to the other polygon.
(I leave constructing a pair of polygons with q not a vertex as a fun exercise for the reader.)
The Wikipedia article about Delaunay triangulations in d dimensions states, as a prerequisite for uniqueness of a triangulation:
It is known that there exists a unique Delaunay triangulation for P if P is a set of points in general position; that is, the affine hull of P is d-dimensional and no set of d + 2 points in P lie on the boundary of a ball whose interior does not intersect P.
Now that I've written my own Delaunay library, I want to be able to verify the uniqueness of the triangulation given its points. Checking the dimensionality of the affine hull can be done easily by calculating the rank of the set. The second part however is way more difficult.
How can I check if d+2 points lie on the boundary of a ball not intersecting with the set and without some gigantic loop over each point? Or is there maybe an alternative way of checking the uniqueness?
I'm using Python with NumPy; however, this is more of a theoretical question, so the language doesn't matter.
Thanks!
Each simplex of maximal dimension in your triangulation has d+1 points and up to d+1 neighbor simplices. Two neighbor simplices share d points (forming a simplex themselves). As I do not know whether you are familiar with working in dimension d, let's have a look at dimensions 2 and 3...
In dimension 2, a simplex of maximal dimension is a triangle (simplex of dimension 2), and two neighbor triangles share an edge (simplex of dimension 1).
In dimension 3, a simplex of maximal dimension is a tetrahedron (simplex of dimension 3), and two neighbor tetrahedra share a face that is a triangle (simplex of dimension 2).
Anyway, given a Delaunay triangulation in dimension d, if you want to check whether it is unique, iterate over the simplices of dimension d-1 (triangles in 3D, or edges in 2D), and for each take the two incident simplices (two neighbor tetrahedra in 3D, or two neighbor triangles in 2D). That gives you d+2 points. As the triangulation is Delaunay, you know that the in_sphere predicate for those d+2 points is positive or zero. If, over the whole iteration on simplices of dimension d-1, the in_sphere predicate is strictly positive for every such set of d+2 points, then your triangulation is unique; otherwise it is degenerate and non-unique.
Have a look at the paper Efficient Exact Geometric Predicates for Delaunay Triangulations by Olivier Devillers and Sylvain Pion: the introduction gives the basics of the theory about Delaunay triangulations, and the in_sphere predicate. It also gives leads to implement the orientation and in_sphere predicates in an exact but efficient way. That might help you in your implementation of the Delaunay triangulations.
You already have a Delaunay triangulation, i.e., there are no points inside any circumsphere. Now, e.g., for the 2D case, simply check whether you can flip any edge such that the result is still a Delaunay triangulation. If not, you have a unique solution. Be aware of the rounding errors of double-precision arithmetic.
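Both answers come down to the same 2D test: for every interior edge, check whether the vertex opposite it in one triangle lies exactly on the circumcircle of the other. Below is a rough sketch under stated assumptions: the input is already a valid 2D Delaunay triangulation given as a point list plus index triples, the function names are hypothetical, and the tolerance `eps` is a plain floating-point stand-in for the exact predicates discussed above.

```python
from collections import defaultdict

def in_circle(a, b, c, d):
    """In-circle determinant; zero when a, b, c and d are cocircular."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    return (ax * (by * c2 - b2 * cy)
            - ay * (bx * c2 - b2 * cx)
            + a2 * (bx * cy - by * cx))

def is_unique_delaunay_2d(points, triangles, eps=1e-12):
    """False if some pair of neighbor triangles has four cocircular points."""
    # Map each edge to the triangles containing it and their opposite vertex
    incident = defaultdict(list)
    for tri in triangles:
        for k in range(3):
            i, j, opp = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            incident[frozenset((i, j))].append((tri, opp))
    for pair in incident.values():
        if len(pair) != 2:  # hull edge: only one incident triangle
            continue
        (tri, _), (_, far) = pair
        a, b, c = (points[v] for v in tri)
        if abs(in_circle(a, b, c, points[far])) <= eps:
            return False  # the shared edge could be flipped
    return True
```

A unit square split along a diagonal is the classic degenerate case: `is_unique_delaunay_2d([(0, 0), (1, 0), (1, 1), (0, 1)], [(0, 1, 2), (0, 2, 3)])` reports a non-unique triangulation.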
I would like to write code in Python that finds the smallest angle (θ) that includes all the points in a 2D plane, given any number of points. The vertex of the angle is always at the origin (0,0). The points are defined in the Cartesian coordinate system by an (x,y) value. A figure below is shown for visualization. Any thoughts on how I should approach this problem?
Convert each of the Cartesian representations into polar coordinates.
Sort by reference angle.
Subtract adjacent reference angles to get the angles between adjacent vectors. Make sure to include the last and first points as one more angle to compute.
Identify the largest angle between adjacent vectors. The rest of the full turn (2π minus this largest angle) is the smallest angle that includes all of the points.
For instance, using the canonical representation -- counter-clockwise from the x-positive ray -- you would find the reference angle for each polar vector. Sorted, you would have the list [d, c, b, a, e, f].
Next, you compute the angles dOc, cOb, bOa, ... and fOd.
You note that aOe is the largest angle of them all being somewhat in excess of a whole radian.
Therefore, eOa is the desired angle.
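The steps above can be sketched in a few lines of standard-library Python. The function name is illustrative, angles are in radians, and points exactly at the origin are skipped on the assumption that they carry no direction.

```python
import math

def smallest_enclosing_angle(points):
    """Smallest angle at the origin, in radians, that contains every point."""
    # Steps 1-2: polar angle of each point in [0, 2*pi), sorted
    angles = sorted(math.atan2(y, x) % (2 * math.pi)
                    for x, y in points if (x, y) != (0, 0))
    if len(angles) < 2:
        return 0.0
    # Step 3: gaps between neighbors, plus the wrap-around gap last -> first
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])
    # Step 4: everything except the largest empty gap
    return 2 * math.pi - max(gaps)
```

For instance, `smallest_enclosing_angle([(1, 1), (1, -1)])` spans the quarter turn between the two diagonals rather than the three-quarter turn the other way round.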
Given a set of n points, I take k points at random. I need to compute, as efficiently as possible, the maximum distance of the k points from the n points within a factor-2 approximation (exploiting the triangle inequality in some way).
A first idea I had was to use the Manhattan distance instead of the Euclidean Distance, but this does not reduce complexity as it is still O(n*k).
What could be some ideas?
EDIT: what if I first compute the 2 farthest points among the k points and then calculate the distance of those 2 points from all the n points?
Technically, if you are only looking for the points with maximum distance, you can build the convex hull of the points; the points at maximum distance must lie on its boundary.
You can calculate the convex hull in O(k log k):
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.ConvexHull.html
After that, you need to just test points on the border.
This is the deterministic approach; you can apply heuristic or randomized search to do it faster, but those are not guaranteed to provide the correct solution.
Here's a paper which discusses the topic with another algorithm: https://arxiv.org/ftp/arxiv/papers/1708/1708.02758.pdf
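Here is one possible reading of the hull-based approach, sketched with the standard library only. It assumes 2D points stored as tuples and interprets the question as "the largest distance between any of the k sampled points and any of the n points". The hull is built with Andrew's monotone chain instead of scipy.spatial.ConvexHull so the example is self-contained, and the function names are illustrative.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def max_distance(sample, everything):
    """Largest distance between a point of `sample` and a point of `everything`.

    The maximum is attained at hull vertices of both sets, so only those
    vertices are compared (brute force over the two hulls).
    """
    return max(math.dist(a, b)
               for a in convex_hull(sample)
               for b in convex_hull(everything))
```

This gives the exact maximum, not a 2-approximation; the saving comes from the hulls usually having far fewer vertices than the original sets.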
What is the most efficient way to compute the (Euclidean) distance to the nearest neighbor for each point in an array?
I have a list of 100k (X,Y,Z) points and I would like to compute a list of nearest neighbor distances. The index of the distance would correspond to the index of the point.
I've looked into PYOD and sklearn neighbors, but those seem to require "teaching". I think my problem is simpler than that. For each point: find nearest neighbor, compute distance.
Example data:
points = [
    (0, 0, 1322.1695),
    (0.006711111, 0, 1322.1696),
    (0.026844444, 0, 1322.1697),
    (0.0604, 0, 1322.1649),
    (0.107377778, 0, 1322.1651),
    (0.167777778, 0, 1322.1634),
    (0.2416, 0, 1322.1629),
    (0.328844444, 0, 1322.1631),
    (0.429511111, 0, 1322.1627),
    ...
]
compute k = 1 nearest neighbor distances
result format:
results = [nearest neighbor distance]
example results:
results = [
    0.005939372,
    0.005939372,
    0.017815632,
    0.030118587,
    0.041569616,
    0.053475883,
    0.065324964,
    0.077200014,
    0.089077602,
]
UPDATE:
I've implemented two of the approaches suggested.
1. Use scipy.spatial.distance.cdist to compute the full distance matrix.
2. Use a nearest-X-neighbors-within-radius-R search to find a subset of neighbor distances for every point and return the smallest.
Results are that Method 2 is faster than Method 1 but took a lot more effort to implement (makes sense).
It seems the limiting factor for Method 1 is the memory needed to run the full computation, especially when my data set is approaching 10^5 (x, y, z) points. For my data set of 23k points, it takes ~ 100 seconds to capture the minimum distances.
For method 2, the speed scales as n_radius^2. That is, "neighbor radius squared", which really means that the algorithm scales ~ linearly with number of included neighbors. Using a Radius of ~ 5 (more than enough given application) it took 5 seconds, for the set of 23k points, to provide a list of mins in the same order as the point_list themselves. The difference matrix between the "exact solution" and Method 2 is basically zero.
Thanks for everyone's help!
Similar to Caleb's answer, but you could stop the iterative loop if you get a distance greater than some previous minimum distance (sorry - no code).
I used to program video games. It would take too much CPU to calculate the actual distance between two points. What we did was divide the "screen" into larger Cartesian squares and avoid the actual distance calculation if the Delta-X or Delta-Y was "too far away". That's just subtraction, so maybe something like that to decide where the actual Euclidean distance calculation is needed (extend to n dimensions as needed)?
EDIT - expanding "too far away" candidate pair selection comments.
For brevity, I'll assume a 2-D landscape.
Take the point of interest (X0,Y0) and "draw" an nxn square around that point, with (X0,Y0) at the origin.
Go through the initial list of points and form a list of candidate points that are within that square. While doing that, if the DeltaX [ABS(Xi-X0)] is outside of the square, there is no need to calculate the DeltaY.
If there are no candidate points, make the square larger and iterate.
If there is exactly one candidate point and it is within the radius of the circle inscribed in the square, that is your minimum.
If there are "too many" candidates, make the square smaller, but you only need to reexamine the candidate list from this iteration, not all the points.
If there are not "too many" candidates, then calculate the distance for that list. When doing so, first calculate DeltaX^2 + DeltaY^2 for the first candidate. If for subsequent candidates the DeltaX^2 is greater than the minimum so far, there is no need to calculate the DeltaY^2.
The minimum from that calculation is the minimum if it is within the radius of the circle inscribed by the square.
If not, you need to go back to a previous candidate list that includes points within the circle that has the radius of that minimum. For example, if you ended with one candidate in a 2x2 square that happened to be on the vertex X=1, Y=1, the distance/radius would be SQRT(2). So go back to a previous candidate list that has a square greater than or equal to 2xSQRT(2).
If warranted, generate a new candidate list that only includes points within the +/- SQRT(2) square.
Calculate distance for those candidate points as described above, omitting any that exceed the minimum calculated so far.
No need to do the square root of the sum of the Delta^2 until you have only one candidate.
How to size the initial square, or if it should be a rectangle, and how to increase or decrease the size of the square/rectangle could be influenced by application knowledge of the data distribution.
I would consider recursive algorithms for some of this if the language you are using supports that.
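A compact variant of the grid idea above, sketched with the standard library. It does not follow the steps literally: instead of resizing a square, it buckets the points into fixed cells once and searches outward ring by ring. The function name and the `cell_size` parameter are hypothetical and would need tuning to the data. It works on squared distances and takes a single square root at the end, as suggested above.

```python
import math
from collections import defaultdict
from itertools import product

def nearest_neighbor_distances(points, cell_size):
    """Nearest-neighbor distance for each point, using a uniform grid."""
    dim = len(points[0])

    def cell_of(p):
        return tuple(math.floor(c / cell_size) for c in p)

    grid = defaultdict(list)
    for index, p in enumerate(points):
        grid[cell_of(p)].append(index)
    # No occupied cell is more than this many rings away from any other
    max_ring = max(max(c[d] for c in grid) - min(c[d] for c in grid)
                   for d in range(dim))
    results = []
    for i, p in enumerate(points):
        home = cell_of(p)
        best_sq = math.inf
        for ring in range(max_ring + 1):
            # Visit only the cells on the shell at this ring distance
            # (scanning the full cube and filtering keeps the sketch short)
            for offset in product(range(-ring, ring + 1), repeat=dim):
                if max(map(abs, offset)) != ring:
                    continue
                cell = tuple(h + o for h, o in zip(home, offset))
                for j in grid.get(cell, ()):
                    if j != i:
                        d_sq = sum((a - b) ** 2 for a, b in zip(p, points[j]))
                        best_sq = min(best_sq, d_sq)
            # Unvisited cells are more than ring * cell_size away, so stop
            if best_sq <= (ring * cell_size) ** 2:
                break
        results.append(math.sqrt(best_sq))
    return results
```

With a cell size close to the typical point spacing, most points finish after one or two rings; a cell size that is much too small makes the ring search long, and one that is much too large degenerates toward brute force.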
How about this?
from scipy.spatial import distance
A = (0.003467119 ,0.01422762 ,0.0101960126)
B = (0.007279433 ,0.01651597 ,0.0045558849)
C = (0.005392258 ,0.02149997 ,0.0177409387)
D = (0.017898802 ,0.02790659 ,0.0006487222)
E = (0.013564214 ,0.01835688 ,0.0008102952)
F = (0.013375397 ,0.02210725 ,0.0286032185)
points = [A, B, C, D, E, F]
results = []
for point in points:
    distances = [{'point': point, 'neighbor': p, 'd': distance.euclidean(point, p)} for p in points if p != point]
    results.append(min(distances, key=lambda k: k['d']))
results will be a list of objects, like this:
results = [
{'point':(x1, y1, z1), 'neighbor':(x2, y2, z2), 'd':"distance from point to neighbor"},
...]
Where point is the reference point and neighbor is point's closest neighbor.
The fastest option available to you may be scipy.spatial.distance.cdist, which finds the pairwise distances between all of the points in its inputs. While finding all of those distances may not be the fastest algorithm to find the nearest neighbors, cdist is implemented in C, so it is likely to run faster than anything you try in pure Python.
import numpy as np
from scipy.spatial.distance import cdist
# Placeholder: an (n, 3) array of your points
points = np.array(...)
# cdist takes two collections; pass the same one twice for all pairs
distances = cdist(points, points)
# An element is not its own nearest neighbor
np.fill_diagonal(distances, np.inf)
# Find the index of each element's nearest neighbor
mins = distances.argmin(axis=0)
# Extract the nearest neighbors from the data by row indexing
nearest_neighbors = points[mins, :]
# Put the arrays in the specified shape
results = np.stack((points, nearest_neighbors), 1)
You could theoretically make this run faster (mostly by combining all of the steps into one algorithm), but unless you're writing in C, you won't be able to compete with SciPy/NumPy.
(cdist runs in Θ(n²) time (if the size of each point is fixed), and every other part of the algorithm in O(n) time, so even if you did try to optimize the code in Python, you wouldn't notice the change for small amounts of data, and the improvements would be overshadowed by cdist for more data.)
I have plotted n random points (the black points) and used delaunay triangulation, now I want to interpolate m random evaluation points (the red points) so I need to calculate which triangle the evaluation point is inside.
What is the approach for calculating the vertices of the triangle for each point?
For a given triangle, ABC, a point is inside the triangle if it is on the same side of line AB as point C is, on the same side of line BC as point A is, and on the same side of line AC as point B is. You can pre-optimize this check for each triangle and check them all until you find the triangle(s) it is in. See this page for more details.
To save computation, you can compute the minimum and maximum X and Y coordinates of the points for each triangle. If the X and Y coordinates of a point are not within the minimum and maximum values, you can immediately skip checking that triangle. The point cannot be inside it if it isn't inside the rectangle that bounds the triangle.
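The same-side test and the bounding-box shortcut could look roughly like this. It is a plain-Python sketch with illustrative names; `vertices` is assumed to be a list of (x, y) pairs and `triangles` a list of index triples into it.

```python
def side(p, a, b):
    """Signed area test: >0 if p is left of the line a->b, <0 if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def in_bounding_box(p, a, b, c):
    """Cheap rejection test against the triangle's axis-aligned bounding box."""
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    return min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)

def point_in_triangle(p, a, b, c):
    """p is inside if, for each edge, it is on the same side as the third vertex."""
    if not in_bounding_box(p, a, b, c):
        return False
    return (side(p, a, b) * side(c, a, b) >= 0
            and side(p, b, c) * side(a, b, c) >= 0
            and side(p, a, c) * side(b, a, c) >= 0)

def find_triangle(p, vertices, triangles):
    """Index of the first triangle containing p, or None."""
    for index, (i, j, k) in enumerate(triangles):
        if point_in_triangle(p, vertices[i], vertices[j], vertices[k]):
            return index
    return None
```

Points exactly on a shared edge are reported for whichever triangle comes first in the list, which is usually acceptable for interpolation.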
I'll assume that triangles do not intersect except along shared edges.
You don't want to check every triangle (or subset of them) independently. The main reason is computation errors - due to them you may get answer "inside" for more than one triangle (or zero) which may break logic of your program.
A more robust way is:
1. Find the closest edge to the point
2. Select one of the triangles touching this edge
3. Make one check for that triangle (the point lies on the same side as the third triangle vertex)
4. If "inside" - return this triangle
5. If "outside" - return the other triangle on this edge (or "nothing" if there is no other triangle)
Even if you return the wrong triangle because of computation error, there will still be exactly one triangle, and the point will lie close enough to it for such mistakes to be acceptable.
For #1 you can use something like quad-tree as Michael Wild suggests.
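The five steps can be sketched as follows. Step 1 is done here by brute force over all edges rather than with a quad-tree, and the sketch assumes the query point lies inside the triangulated region or clearly beyond one of its boundary edges. The function and variable names are made up for illustration.

```python
import math
from collections import defaultdict

def cross(a, b, p):
    """>0 if p is left of the line a->b, <0 if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def segment_distance(p, a, b):
    """Distance from p to the closed segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def locate(p, vertices, triangles):
    """Index of the triangle chosen by the closest-edge rule, or None."""
    # Edge (i, j) with i < j -> list of (triangle index, opposite vertex index)
    edges = defaultdict(list)
    for t, tri in enumerate(triangles):
        for k in range(3):
            i, j, opp = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            edges[tuple(sorted((i, j)))].append((t, opp))
    # Step 1: closest edge (brute force; a spatial index would replace this)
    (i, j), incident = min(
        edges.items(),
        key=lambda item: segment_distance(p, vertices[item[0][0]], vertices[item[0][1]]))
    # Steps 2-3: one side check against the first incident triangle
    t, opp = incident[0]
    a, b = vertices[i], vertices[j]
    if cross(a, b, p) * cross(a, b, vertices[opp]) >= 0:
        return t  # step 4
    # Step 5: the other triangle on this edge, if any
    return incident[1][0] if len(incident) > 1 else None
```

Because exactly one side check decides the result, a point near a shared edge is always assigned to exactly one of the two triangles, even when rounding makes the sign unreliable.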
This simple example triangulates 10 random points, a further 3 random points are generated and if they fall inside a triangle the vertices are given:
import numpy as np
from pyhull.delaunay import DelaunayTri
def sign(a, b, c):
    # Signed area test: which side of the line b-c the point a lies on
    return (a[0]-c[0])*(b[1]-c[1]) - (b[0]-c[0])*(a[1]-c[1])
def findfacet(p, simplice):
    c, b, a = simplice.coords
    b1 = sign(p, a, b) < 0.0
    b2 = sign(p, b, c) < 0.0
    b3 = sign(p, c, a) < 0.0
    # Inside when p is on the same side of all three edges
    return b1 == b2 == b3
data = np.random.randn(10, 2)
dtri = DelaunayTri(data)
interpolate = np.random.randn(3, 2)
for point in interpolate:
    for triangle in dtri.simplices:
        if findfacet(point, triangle):
            print("Point", point, "inside", triangle.coords)
            break
Using matplotlib to visualize (code omitted):
The dotted cyan lines now connect the points to interpolate with the vertices of the triangle each lies within. The black lines are the convex hull, and the solid cyan lines are the Delaunay triangulation.
A Delaunay triangulation is in itself a search data structure. Your Delaunay triangulation implementation probably has location functions. How have you computed the Delaunay triangulation of your points?
CGAL has an implementation of 2D and 3D triangulations. The resulting data structure is able to locate any point using a walk from a given point. See for example that chapter of the manual. CGAL is a C++ library, but it has Python bindings.