Efficiently find all points within sorted 2D Numpy Array - python

How do I efficiently find the set of points within a circle of a given radius and centre from a sorted numpy array of equally spaced points?
For example, this is my code and how I currently extract those points within the radius.
import numpy as np
n_points = 10000
x_lim = [0, 100]
y_lim = [0, 100]
x, y = np.meshgrid(np.linspace(*x_lim, n_points), np.linspace(*y_lim, n_points))
xy = np.vstack((x.flatten(), y.flatten())).T 
# Current approach
radius = 5
point = np.array([50, 35], dtype=float) 
# Indexes of those points within a circle of radius centered at point
idxs = np.linalg.norm(point - xy, axis=-1) < radius
points_within_circle = xy[idxs]
How do I calculate these indices more efficiently? I imagine that because the array is structured, with a set distance between each point, I should be able to exploit this to eliminate most of the checks.

One of the most important tricks that people forget is that it is a lot faster to compute distance**2 and compare it to radius**2 than to compute the actual distance and compare it to the radius, because it avoids the square root. So instead of taking the norm, compute the squared differences from the centre, sum them, and compare the result to radius**2 (25 here).
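For example, a minimal sketch building on the arrays from the question (the bounding-square prefilter is an extra step beyond this answer, exploiting the regular spacing the question mentions):
box = np.all(np.abs(xy - point) <= radius, axis=-1)  # coarse cut: the circle's bounding square
candidates = xy[box]
d2 = np.sum((candidates - point)**2, axis=-1)  # squared distances, no square root
points_within_circle = candidates[d2 < radius**2]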

Related

Sort 4 3D coordinates in a winding order in any given direction

I need to sort a selection of 3D coordinates in a winding order as seen in the image below. The bottom-right vertex should be the first element of the array and the bottom-left vertex should be the last element of the array. This needs to work given any direction that the camera is facing the points and at any orientation of those points. Since "top-left","bottom-right", etc is relative, I assume I can use the camera as a reference point? We can also assume all 4 points will be coplanar.
I am using the Blender API (writing a Blender plugin) and have access to the camera's view matrix if that is even necessary. Mathematically speaking is this even possible if so how? Maybe I am overcomplicating things?
Since the Blender API is in Python I tagged this as Python, but I am fine with pseudo-code or no code at all. I'm mainly concerned with how to approach this mathematically as I have no idea where to start.
Since you assume the four points are coplanar, all you need to do is find the centroid, calculate the vector from the centroid to each point, and sort the points by the angle of the vector.
import numpy as np
def sort_points(pts):
    centroid = np.sum(pts, axis=0) / pts.shape[0]
    vector_from_centroid = pts - centroid
    vector_angle = np.arctan2(vector_from_centroid[:, 1], vector_from_centroid[:, 0])
    sort_order = np.argsort(vector_angle)  # Find the indices that give a sorted vector_angle array
    # Apply sort_order to original pts array.
    # Also returning centroid and angles so I can plot it for illustration.
    return (pts[sort_order, :], centroid, vector_angle[sort_order])
This function calculates the angle assuming that the points are two-dimensional, but if you have coplanar points then it should be easy enough to find the coordinates in the common plane and eliminate the third coordinate.
Let's write a quick plot function to plot our points:
from matplotlib import pyplot as plt
def plot_points(pts, centroid=None, angles=None, fignum=None):
    fig = plt.figure(fignum)
    plt.plot(pts[:, 0], pts[:, 1], 'or')
    if centroid is not None:
        plt.plot(centroid[0], centroid[1], 'ok')
    for i in range(pts.shape[0]):
        lstr = f"pt{i}"
        if angles is not None:
            lstr += f" ang: {angles[i]:.3f}"
        plt.text(pts[i, 0], pts[i, 1], lstr)
    return fig
And now let's test this:
With random points:
pts = np.random.random((4, 2))
spts, centroid, angles = sort_points(pts)
plot_points(spts, centroid, angles)
With points in a rectangle:
pts = np.array([[0, 0],   # pt0
                [10, 5],  # pt2
                [10, 0],  # pt1
                [0, 5]])  # pt3
spts, centroid, angles = sort_points(pts)
plot_points(spts, centroid, angles)
It's easy enough to find the normal vector of the plane containing our points: it's simply the (normalized) cross product of the vectors joining two pairs of points:
plane_normal = np.cross(pts[1, :] - pts[0, :], pts[2, :] - pts[0, :])
plane_normal = plane_normal / np.linalg.norm(plane_normal)
Now, to find the projections of all points in this plane, we need to know the "origin" and basis of the new coordinate system in this plane. Let's assume that the first point is the origin, the x axis joins the first point to the second, and since we know the z axis (plane normal) and x axis, we can calculate the y axis.
new_origin = pts[0, :]
new_x = pts[1, :] - pts[0, :]
new_x = new_x / np.linalg.norm(new_x)
new_y = np.cross(plane_normal, new_x)
Now, the projections of the points onto the new plane are given by this answer:
proj_x = np.dot(pts - new_origin, new_x)
proj_y = np.dot(pts - new_origin, new_y)
Now you have two-dimensional points. Run the code above to sort them.
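For instance, a minimal sketch tying the two steps together (assuming pts is the (4, 3) array of coplanar 3D points used above):
pts_2d = np.stack((proj_x, proj_y), axis=-1)  # shape (4, 2): in-plane coordinates
spts, centroid, angles = sort_points(pts_2d)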
After many hours, I finally found a solution. @Pranav Hosangadi's solution worked for the 2D side of things. However, I was having trouble projecting the 3D coordinates to 2D coordinates using the second part of his solution. I also tried projecting the coordinates as described in this answer, but it did not work as intended. I then discovered an API function called location_3d_to_region_2d() (see docs) which, as the name implies, gets the 2D screen coordinates in pixels of a given 3D coordinate. I didn't need to "project" anything myself in the first place; getting the screen coordinates worked perfectly fine and is much simpler. From there, I could sort the coordinates using Pranav's function, with slight adjustments to produce the order illustrated in the screenshot of my first post and to return the result as a list instead of a NumPy array.
import bpy
from bpy_extras.view3d_utils import location_3d_to_region_2d
import numpy
def sort_points(pts):
    """Sort 4 points in a winding order"""
    pts = numpy.array(pts)
    centroid = numpy.sum(pts, axis=0) / pts.shape[0]
    vector_from_centroid = pts - centroid
    vector_angle = numpy.arctan2(
        vector_from_centroid[:, 1], vector_from_centroid[:, 0])
    # Find the indices that give a sorted vector_angle array
    sort_order = numpy.argsort(-vector_angle)
    # Return the sort order itself so it can be applied to the 3D vertices
    return list(sort_order)
# Get 2D screen coords of selected vertices
# (selected_verts is the list of selected 3D vertex coordinates, gathered elsewhere)
region = bpy.context.region
region_3d = bpy.context.space_data.region_3d
corners2d = []
for corner in selected_verts:
    corners2d.append(location_3d_to_region_2d(
        region, region_3d, corner))
# Sort the 2d points in a winding order
sort_order = sort_points(corners2d)
sorted_corners = [selected_verts[i] for i in sort_order]
Thanks, Pranav, for your time and patience in helping me solve this problem!
There is a simpler and faster solution for the Blender case:
1.) The following code sorts 4 planar points in 2D (vertices of the plane object in Blender) very efficiently:
import numpy as np
def sort_clockwise(pts):
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]  # smallest coordinate sum
    rect[2] = pts[np.argmax(s)]  # largest coordinate sum
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]  # smallest y - x difference
    rect[3] = pts[np.argmax(diff)]  # largest y - x difference
    return rect
2.) Blender keeps vertex-related data such as translation, rotation and scale in the object's world matrix. If you query vertices.co(ordinates) only, you get the original coordinates, without translation, rotation and scaling, but that does not affect the order of the vertices. This simplifies the problem, because what you get is effectively 2D mesh data (with all z's = 0). If you sort that 2D data (excluding the z's), you obtain the sort indices for the 3D data as well. You can modify the code above to return the indices from that 2D array. For Blender's plane object, for some reason the vertex order is always [0, 1, 3, 2], not [0, 1, 2, 3]. The following modified code gives the sorted indices for the vertex data in 2D.
def sorted_ix_clockwise(pts):
    # Same selection logic as sort_clockwise above, but returning indices
    ix = np.zeros(4, dtype=int)
    s = pts.sum(axis=1)
    ix[0] = np.argmin(s)
    ix[2] = np.argmax(s)
    dif = np.diff(pts, axis=1)
    ix[1] = np.argmin(dif)
    ix[3] = np.argmax(dif)
    return ix
You can use these indices to get the actual 3D sorted data, which you can obtain by multiplying vertices coordinates with the world matrix to include any translation, rotation and scaling.
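For instance, a minimal sketch of that last step (my illustration; assumes obj is the mesh object and Blender 2.8+, where @ is matrix multiplication):
import bpy
import numpy as np
obj = bpy.context.object
local = np.array([v.co for v in obj.data.vertices])  # untransformed coordinates, z's all 0
order = sorted_ix_clockwise(local[:, :2])  # sort indices from the 2D data
world = np.array([obj.matrix_world @ v.co for v in obj.data.vertices])  # translation, rotation, scale applied
sorted_verts = world[order]  # sorted 3D world-space data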

Get the distance of each point to every other, and find where the curve approaches itself

I am programming a randomly generated spline curve by first generating control points and then interpolating with scipy.interpolate.splev. Here is an example.
Points are given like this:
np.array =[[ 1.00000000e+01 -4.65000000e+02]
[ 1.78319153e+01 -4.60252579e+02]
...]
I now want to get the distance of every point with every other of the spline to see if at one point the spline comes too close to itself which includes self-collision.
Before and after every point there should be an interval of points that are ignored, as these are always the closest to each point:
def collision(splinePoints, interval):
    length = len(splinePoints)
    mylist = []
    i = -1
    for item in splinePoints:
        i += 1
        first = item
        lowerLimit = i - interval
        upperLimit = i + interval
        if lowerLimit >= 0:
            for other in splinePoints[:lowerLimit]:
                mylist.append(first)
                mylist.append(other)
        if upperLimit <= length:
            for other in splinePoints[upperLimit:]:
                mylist.append(first)
                mylist.append(other)
    return np.amin(lengthOfLines(np.array(mylist)))
The lengths of the lines are checked with this:
def lengthOfLines(points):
    return np.sqrt(np.sum(np.diff(points.T)**2, axis=0))
It somehow works, but not always. I am also struggling with debugging, as the generated data is big and hard to read, check or compare. Any idea how to do it better?
All pairwise distances can be obtained with the pdist function of the scipy.spatial.distance module. It returns a flat array of distances, with redundancies eliminated. The utility function squareform unpacks them to a symmetric square matrix, which is often more convenient.
You also want to find the nearest point that is not directly before or after the given point on the curve. In the example below, I penalize the distances between neighbors (within 20 index values) by setting those distances to infinity. Then argmin finds the nearest point for each one, and I visualize it by drawing a red line to that nearest point.
import numpy as np
from scipy.spatial.distance import pdist, squareform
import matplotlib.pyplot as plt
t = np.linspace(0, 10, 50)
points = np.stack(((t+5)*np.cos(t), (t+5)*np.sin(t)), axis=-1)  # for example
distances = squareform(pdist(points))  # distance matrix
i, j = np.meshgrid(np.arange(t.size), np.arange(t.size))
distances[np.abs(i-j) <= 20] = np.inf  # don't count neighbors
nearest = np.argmin(distances, axis=0)  # nearest to each
plt.plot(points[:, 0], points[:, 1])
for k in range(len(t)):
    npoint = points[nearest[k]]
    plt.plot([points[k, 0], npoint[0]], [points[k, 1], npoint[1]], 'r')
plt.gca().set_aspect('equal', 'datalim')
plt.show()

python - Finding the vertices of the cuboid surrounding a coordinate in a cuboidal 3-d grid with non-regular spacings

I will have a 3-d grid of points (defined by Cartesian vectors). For any given coordinate within the grid, I wish to find the 8 grid points making the cuboid which surrounds the given coordinate. I also need the distances between the vertices of the cuboid and the given coordinate. I have found a way of doing this for a meshgrid with regular spacings, but not for irregular spacings. I do not yet have an example of the irregularly spaced grid data; I just know that the algorithm will have to deal with it eventually. My solution for the regularly spaced points is based on this post, Finding index of nearest point in numpy arrays of x and y coordinates, and is as follows:
import numpy as np
from scipy import spatial
x, y, z = np.mgrid[0:5, 0:10, 0:20]
# Example 3-d grid of points.
b = np.dstack((x.ravel(), y.ravel(), z.ravel()))[0]
tree = spatial.cKDTree(b)
example_coord = np.array([1.5, 3.5, 5.5])
d, i = tree.query(example_coord, 8)
# i being the indices of the closest grid points, d being their distances from
# the given coordinate, example_coord
b[i[0]], d[0]
# This gives one of the points of the surrounding cuboid and its distance from
# example_coord
I am looking to make this algorithm run as efficiently as possible as it will need to be run a lot. Thanks in advance for your help.
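The thread has no answer for the irregular case, but since the grid is still axis-aligned, one possible sketch (my own, with illustrative axis arrays) is to keep the three sorted 1-D coordinate arrays and bracket the query coordinate with np.searchsorted on each axis, building the 8 corners from the bracketing values:
import numpy as np
import itertools
# Hypothetical irregular (but sorted) axis coordinates
xs = np.array([0.0, 0.7, 2.0, 3.1, 5.0])
ys = np.array([0.0, 1.5, 3.0, 6.0])
zs = np.array([0.0, 0.5, 2.5, 4.0, 8.0])
def surrounding_cuboid(coord, xs, ys, zs):
    """Return the 8 grid vertices around coord and their distances to it.
    Assumes coord lies strictly inside the grid on every axis."""
    brackets = []
    for axis_vals, c in zip((xs, ys, zs), coord):
        hi = np.searchsorted(axis_vals, c)  # first grid value >= c
        brackets.append((axis_vals[hi - 1], axis_vals[hi]))
    verts = np.array(list(itertools.product(*brackets)))  # all 8 corner combinations
    dists = np.linalg.norm(verts - coord, axis=1)
    return verts, dists
verts, dists = surrounding_cuboid(np.array([1.5, 3.5, 5.5]), xs, ys, zs)
Each query is then just a binary search per axis, with no k-d tree needed.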

How can I set a minimum distance constraint for generating points with numpy.random.rand?

I am trying to write efficient code for generating a number of random position vectors, which I then use to calculate a pair correlation function. I am wondering if there is a straightforward way to set a constraint on the minimum distance allowed between any two points placed in my box.
My code currently is as follows:
import numpy as np
import scipy.spatial.distance
def pointRun(number, dr):
    """
    Compute the 3D pair correlation function
    for a random distribution of 'number' particles
    placed into a 1.0x1.0x1.0 box.
    """
    ## Create array of distances over which to calculate.
    r = np.arange(0., 1.0+dr, dr)
    ## Generate list of arrays to define the positions of all points,
    ## and calculate number density.
    a = np.random.rand(number, 3)
    numberDensity = len(a)/1.0**3
    ## Find reference points within desired region to avoid edge effects.
    b = [s for s in a if all(s > 0.4) and all(s < 0.6)]
    ## Compute pairwise correlation for each reference particle.
    dist = scipy.spatial.distance.cdist(a, b, 'euclidean')
    allDists = dist[(dist < np.sqrt(3))]
    ## Create histogram to generate radial distribution function, (RDF) or R(r)
    Rr, bins = np.histogram(allDists, bins=r, density=False)
    ## Make empty containers to hold radii and pair density values.
    radii = []
    rhor = []
    ## Normalize RDF values by distance and shell volume to get pair density.
    for i in range(len(Rr)):
        y = (r[i] + r[i+1])/2.
        radii.append(y)
        x = np.average(Rr[i])/(4./3.*np.pi*(r[i+1]**3 - r[i]**3))
        rhor.append(x)
    ## Generate normalized pair density function, by total number density.
    gr = np.divide(rhor, numberDensity)
    return radii, gr
I have previously tried using a loop that calculated all distances for each point as it was placed, and then accepted or rejected it. This method was very slow when I used a lot of points.
Here is a scalable O(n) solution using numpy. It works by initially specifying an equidistant grid of points and then perturbing each point by some amount, keeping the distance between any two points at least min_dist.
You'll want to tweak the number of points, box shape and perturbation sensitivity to get the min_dist you want.
Note: If you fix the size of a box and specify a minimum distance between every point, it makes sense that there will be a limit to the number of points you can draw satisfying the minimum distance.
import numpy as np
import matplotlib.pyplot as plt
# specify params
n = 500
shape = np.array([64, 64])
sensitivity = 0.8 # 0 means no movement, 1 means max distance is init_dist
# compute grid shape based on number of points
width_ratio = shape[1] / shape[0]
num_y = np.int32(np.sqrt(n / width_ratio)) + 1
num_x = np.int32(n / num_y) + 1
# create regularly spaced neurons
x = np.linspace(0., shape[1]-1, num_x, dtype=np.float32)
y = np.linspace(0., shape[0]-1, num_y, dtype=np.float32)
coords = np.stack(np.meshgrid(x, y), -1).reshape(-1,2)
# compute spacing
init_dist = np.min((x[1]-x[0], y[1]-y[0]))
min_dist = init_dist * (1 - sensitivity)
assert init_dist >= min_dist
print(min_dist)
# perturb points
max_movement = (init_dist - min_dist)/2
noise = np.random.uniform(
    low=-max_movement,
    high=max_movement,
    size=(len(coords), 2))
coords += noise
# plot
plt.figure(figsize=(10*width_ratio,10))
plt.scatter(coords[:,0], coords[:,1], s=3)
plt.show()
Based on @Samir's answer, here it is made into a callable function for your convenience :)
import numpy as np
import matplotlib.pyplot as plt
def generate_points_with_min_distance(n, shape, min_dist):
    # compute grid shape based on number of points
    width_ratio = shape[1] / shape[0]
    num_y = np.int32(np.sqrt(n / width_ratio)) + 1
    num_x = np.int32(n / num_y) + 1
    # create regularly spaced neurons
    x = np.linspace(0., shape[1]-1, num_x, dtype=np.float32)
    y = np.linspace(0., shape[0]-1, num_y, dtype=np.float32)
    coords = np.stack(np.meshgrid(x, y), -1).reshape(-1, 2)
    # compute spacing
    init_dist = np.min((x[1]-x[0], y[1]-y[0]))
    # perturb points
    max_movement = (init_dist - min_dist)/2
    noise = np.random.uniform(low=-max_movement,
                              high=max_movement,
                              size=(len(coords), 2))
    coords += noise
    return coords
coords = generate_points_with_min_distance(n=8, shape=(2448,2448), min_dist=256)
# plot
plt.figure(figsize=(10,10))
plt.scatter(coords[:,0], coords[:,1], s=3)
plt.show()
As I understand it, you're looking for an algorithm to create many random points in a box such that no two points are closer than some minimum distance. If this is your problem, then you can take advantage of statistical physics and solve it using molecular dynamics software. Moreover, you do need molecular dynamics or Monte Carlo to obtain an exact solution to this problem.
You place N atoms in a rectangular box, create a repulsive interaction of a fixed radius between them (such as a shifted Lennard-Jones interaction), and run the simulation for some time (until you see that the points spread out uniformly throughout the box). By the laws of statistical physics you can show that the positions of the points will be maximally random given the constraint that no two points can be closer than some distance. This would not be true of an iterative algorithm, such as placing points one by one and rejecting them if they overlap.
I would estimate a runtime of several seconds for 10000 points, and several minutes for 100k. I use OpenMM for all my molecular dynamics simulations.
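The answer above doesn't include a script, but here is a toy Monte Carlo sketch in plain numpy (not OpenMM, and only illustrative): starting from a grid that already satisfies the constraint, random single-point moves are proposed and rejected whenever they would bring two points closer than min_dist.
import numpy as np
rng = np.random.default_rng(0)
n, box, min_dist, steps = 200, 1.0, 0.05, 20000
# Start from a cubic grid so the initial state already satisfies the constraint.
side = int(np.ceil(n ** (1 / 3)))
g = (np.arange(side) + 0.5) * box / side
pts = np.stack(np.meshgrid(g, g, g), -1).reshape(-1, 3)[:n]
for _ in range(steps):
    k = rng.integers(n)  # pick a random point
    trial = pts[k] + rng.normal(scale=0.02, size=3)
    if trial.min() < 0 or trial.max() > box:
        continue  # keep points inside the box
    d = np.linalg.norm(pts - trial, axis=1)
    d[k] = np.inf  # ignore the point's own old position
    if d.min() >= min_dist:  # reject moves that violate the minimum distance
        pts[k] = trial
This is only a sketch of the idea; for large point counts the dedicated simulation packages the answer mentions will be far faster.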
# example of generating 50 points in a square of 4000x4000 with a minimum distance of 400
import numpy as np
import random as rnd
n_points = 50
x, y = np.zeros(n_points), np.zeros(n_points)
x[0], y[0] = np.round(rnd.uniform(0, 4000)), np.round(rnd.uniform(0, 4000))
min_distances = []
i = 1
while i < n_points:
    x_temp, y_temp = np.round(rnd.uniform(0, 4000)), np.round(rnd.uniform(0, 4000))
    distances = []
    for j in range(0, i):
        distances.append(np.sqrt((x_temp - x[j])**2 + (y_temp - y[j])**2))
    min_distance = np.min(distances)
    if min_distance > 400:
        min_distances.append(min_distance)
        x[i] = x_temp
        y[i] = y_temp
        i = i + 1
print(x, y)

Parse list of x,y coordinates and detect continuous areas

I have a list of x, y coordinates.
What I need to do is separate those into groups of continuous areas.
All the x, y coordinates in the list will end up belonging to a particular group.
I currently have a simple algorithm that just goes through each point and finds all the adjacent points (points with coordinates of +-1 on x and +-1 on y).
However, it is much too slow when it comes to using large x, y lists.
PS Keep in mind that there could be holes in the middle of groups.
One simple method that you could use is k-means clustering. k-means partitions a list of observations into k clusters, where each point belongs to the cluster with the nearest mean. If you know that there are k=2 groups of points, then this method should work very well, assuming your clusters of points are reasonably well separated (and even if they have holes). SciPy has an implementation of k-means that should be easy to apply.
Here's an example of the type of analysis you can perform.
# import required modules
import numpy as np
from scipy.cluster.vq import kmeans2
# generate clouds of 2D normally distributed points
N = 6000000 # number of points in each cluster
# cloud 1: mean (0, 0)
mean1 = [0, 0]
cov1 = [[1, 0], [0, 1]]
x1,y1 = np.random.multivariate_normal(mean1, cov1, N).T
# cloud 2: mean (5, 5)
mean2 = [5, 5]
cov2 = [[1, 0], [0, 1]]
x2,y2 = np.random.multivariate_normal(mean2, cov2, N).T
# merge the clouds and arrange into data points
xs, ys = np.concatenate( (x1, x2) ), np.concatenate( (y1, y2) )
points = np.array([xs, ys]).T
# cluster the points using k-means
centroids, clusters = kmeans2(points, k=2)
Running this on my 2012 MBA with 12 million data points is pretty fast:
>>> time python test.py
real 0m20.957s
user 0m18.128s
sys 0m2.732s
It is also 100% accurate (not surprising given that the point clouds don't overlap at all). Here's some quick code for computing the accuracy of the cluster assignments. The only tricky part is that I first use Euclidean distance to identify which cluster's centroid matches up with the mean of the original data cloud.
# determine which centroid belongs to which cluster
# using Euclidean distance
dist1 = np.linalg.norm(centroids[0] - mean1)
dist2 = np.linalg.norm(centroids[1] - mean1)
if dist1 <= dist2:
    FIRST, SECOND = 0, 1
else:
    FIRST, SECOND = 1, 0
# compute accuracy by iterating through all 2N points
# note: first N points are from cloud1, second N points are from cloud2
correct = 0
for i in range(len(clusters)):
    if clusters[i] == FIRST and i < N:
        correct += 1
    elif clusters[i] == SECOND and i >= N:
        correct += 1
# output accuracy
print('Accuracy: %.2f' % (correct * 100. / len(clusters)))
What you want to do is called finding connected components in image processing. You have a binary image in which all the (x, y) pixels that are in your list are 1, and pixels that aren't are 0.
You can use numpy/scipy to turn your data into a 2D binary image, and then call ndimage.label to find the connected components.
Supposing all x and y are >= 0, that you know max_x and max_y, and that the resulting image fits into memory, then something like the following should work:
import numpy as np
from scipy import ndimage
image = np.zeros((max_x + 1, max_y + 1))  # shape must be a tuple
for x, y in huge_list_of_xy_points:
    image[x, y] = 1
labelled, num_features = ndimage.label(image)
This should give you an array in which all pixels in group 1 have the value 1, all pixels in group 2 have the value 2, et cetera. Not tested.
First of all, you can model the problem as a corresponding graph G(V, E):
Points are vertices, and there is an edge e between point A and point B if and only if A is "close" to B, where you can define "close" on your own.
Since each point belongs to exactly one group, groups form disjoint sets and you can use a simple DFS to assign points to groups. In graph theory the underlying problem is called Connected Components.
The complexity of DFS is linear, i.e. O(V + E). A sketch of this approach follows below.
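A minimal sketch of that approach (my illustration, not from the original answer), using 8-neighbour adjacency and an iterative DFS over a set of points:
def find_groups(points):
    """Group (x, y) tuples into connected components of adjacent points."""
    remaining = set(points)
    groups = []
    while remaining:
        stack = [remaining.pop()]
        group = []
        while stack:  # iterative DFS
            x, y = stack.pop()
            group.append((x, y))
            for dx in (-1, 0, 1):  # check the 8 neighbours
                for dy in (-1, 0, 1):
                    nb = (x + dx, y + dy)
                    if nb in remaining:
                        remaining.remove(nb)
                        stack.append(nb)
        groups.append(group)
    return groups
print(find_groups([(0, 0), (1, 1), (5, 5), (5, 6)]))  # -> two groups
Because neighbours are looked up in a set, this never scans the full list per point, unlike the quadratic approach described in the question.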
