I have a problem where in a grid of x*y size I am provided a single dot, and I need to find the nearest neighbour. In practice, I am trying to find the closest dot to the cursor in pygame that crosses a color distance threshold that is calculated as following:
sqrt(((rgb1[0]-rgb2[0])**2)+((rgb1[1]-rgb2[1])**2)+((rgb1[2]-rgb2[2])**2))
So far I have a function that calculates the different resolutions for the grid and reduces it by a factor of two while always maintaining the darkest pixel. It looks as following:
from PIL import Image
from typing import Dict
import numpy as np
#we input a pillow image object and retrieve a dictionary with every grid version of the 3 dimensional array:
def calculate_resolutions(image: Image) -> Dict[int, np.ndarray]:
resolutions = {}
#we start with the highest resolution image, the size of which we initially divide by 1, then 2, then 4 etc.:
divisor = 1
#reduce the grid by 5 iterations
resolution_iterations = 5
for i in range(resolution_iterations):
pixel_lookup = image.load() #convert image to PixelValues object, which allows for pixellookup via [x,y] index
#calculate the resolution of the new grid, round upwards:
resolution = (int((image.size[0] - 1) // divisor + 1), int((image.size[1] - 1) // divisor + 1))
#generate 3d array with new grid resolution, fill in values that are darker than white:
new_grid = np.full((resolution[0],resolution[1],3),np.array([255,255,255]))
for x in range(image.size[0]):
for y in range(image.size[1]):
if not x%divisor and not y%divisor:
darkest_pixel = (255,255,255)
x_range = divisor if x+divisor<image.size[0] else (0 if image.size[0]-x<0 else image.size[0]-x)
y_range = divisor if y+divisor<image.size[1] else (0 if image.size[1]-y<0 else image.size[1]-y)
for x_ in range(x,x+x_range):
for y_ in range(y,y+y_range):
if pixel_lookup[x_,y_][0]+pixel_lookup[x_,y_][1]+pixel_lookup[x_,y_][2] < darkest_pixel[0]+darkest_pixel[1]+darkest_pixel[2]:
darkest_pixel = pixel_lookup[x_,y_]
if darkest_pixel != (255,255,255):
new_grid[int(x/divisor)][int(y/divisor)] = np.array(darkest_pixel)
resolutions[i] = new_grid
divisor = divisor*2
return resolutions
This is the most performance efficient solution I was able to come up with. If this function is run on a grid that continually changes, like a video with x fps, it will be very performance intensive. I also considered using a kd-tree algorithm that simply adds and removes any dots that happen to change on the grid, but when it comes to finding individual nearest neighbours on a static grid this solution has the potential to be more resource efficient. I am open to any kinds of suggestions in terms of how this function could be improved in terms of performance.
Now, I am in a position where for example, I try to find the nearest neighbour of the current cursor position in a 100x100 grid. The resulting reduced grids are 50^2, 25^2, 13^2, and 7^2. In a situation where a part of the grid looks as following:
And I am on the aggregation step where a part of the grid consisting of six large squares, the black one being the current cursor position and the orange dots being dots where the color distance threshold is crossed, I would not know which diagonally located closest neighbour I would want to pick to search next. In this case, going one aggregation step down shows that the lower left would be the right choice. Depending on how many grid layers I have this could result in a very large error in terms of the nearest neighbour search. Is there a good way how I can solve this problem? If there are multiple squares that show they have a relevant location, do I have to search them all in the next step to be sure? And if that is the case, the further away I get the more I would need to make use of math functions such as the pythagorean theorem to assert whether the two positive squares I find are overlapping in terms of distance and could potentially contain the closest neighbour, which would start to be performance intensive again if the function is called frequently. Would it still make sense to pursue this solution over a regular kd tree? For now the grid size is still fairly small (~800-600) but if the grid gets larger the performance may start suffering again. Is there a good scalable solution to this problem that could be applied here?
Related
I am trying to pack hard-spheres in a unit cubical box, such that these spheres cannot overlap on each other. This is being done in Python.
I am given some packing fraction f, and the number of spheres in the system is N.
So, I say that the diameter of each sphere will be
d = (p*6/(math.pi*N)**)1/3).
My box has periodic boundary conditions - which means that there is a recurring image of my box in all direction. If there is a particle who is at the edge of the box and has a portion of it going beyond the wall, it will stick out at the other side.
My attempt:
Create a numpy N-by-3 array box which holds the position vector of each particle [x,y,z]
The first particle is fine as it is.
The next particle in the array is checked with all the previous particles. If the distance between them is more than d, move on to the next particle. If they overlap, randomly change the position vector of the particle in question. If the new position does not overlap with the previous atoms, accept it.
Repeat steps 2-3 for the next particle.
I am trying to populate my box with these hard spheres, in the following manner:
for i in range(1,N):
mybool=True
print("particles in box: " + str(i))
while (mybool): #the deal with this while loop is that if we place a bad particle, we need to change its position, and restart the process of checking
for j in range(0,i):
displacement=box[j,:]-box[i,:]
for k in range(3):
if abs(displacement[k])>L/2:
displacement[k] -= L*np.sign(displacement[k])
distance = np.linalg.norm(displacement,2) #check distance between ith particle and the trailing j particles
if distance<diameter:
box[i,:] = np.random.uniform(0,1,(1,3)) #change the position of the ith particle randomly, restart the process
break
if j==i-1 and distance>diameter:
mybool = False
break
The problem with this code is that if p=0.45, it is taking a really, really long time to converge. Is there a better method to solve this problem, more efficiently?
I think what you are looking for is either the hexagonal closed-packed (HCP or sometime called face-centered cubic, FCC) lattice or the cubic closed-packed one (CCP). See e.g. Wikipedia on Close-packing of equal spheres.
Since your space has periodic conditions, I believe it doesn't matter which one you chose (hcp or ccp), and they both achieve the same density of ~74.04%, which was proved by Gauss to be the highest density by lattice packing.
Update:
For the follow-up question on how to generate efficiently one such lattice, let's take as an example the HCP lattice. First, let's create a bunch of (i, j, k) indices [(0,0,0), (1,0,0), (2,0,0), ..., (0,1,0), ...]. Then, get xyz coordinates from those indices and return a DataFrame with them:
def hcp(n):
dim = 3
k, j, i = [v.flatten()
for v in np.meshgrid(*([range(n)] * dim), indexing='ij')]
df = pd.DataFrame({
'x': 2 * i + (j + k) % 2,
'y': np.sqrt(3) * (j + 1/3 * (k % 2)),
'z': 2 * np.sqrt(6) / 3 * k,
})
return df
We can plot the result as scatter3d using plotly for interactive exploration:
import plotly.graph_objects as go
df = hcp(12)
fig = go.Figure(data=go.Scatter3d(
x=df.x, y=df.y, z=df.z, mode='markers',
marker=dict(size=df.x*0 + 30, symbol="circle", color=-df.z, opacity=1),
))
fig.show()
Note: plotly's scatter3d is not a very good rendering of spheres: the marker sizes are constant (so when you zoom in and out, the "spheres" will appear to change relative size), and there is no shading, limited z-ordering faithfulness, etc., but it's convenient to interact with the plot.
Resize and clip to the unit box:
Here, a strict clipping (each sphere needs to be completely inside the unit box). Your "periodic boundary condition" is something you will need to address separately (see further below for ideas).
def hcp_unitbox(r):
n = int(np.ceil(1 / (np.sqrt(3) * r)))
df = hcp(n) * r
df += r
df = df[(df <= 1 - r).all(axis=1)]
return df
With this, you find that a radius of 0.06 gives you 608 fully enclosed spheres:
hcp_unitbox(.06).shape # (608, 3)
Where you would go next:
You may dig deeper into the effect of your so-called "periodic boundary conditions", and perhaps play with some rotations (and small translations).
To do so, you may try to generate an HCP-lattice that is large enough that any rotation will still fully enclose your unit cube. For example:
r = 0.2 # example
n = int(np.ceil(2 / r))
df = hcp(n) * r - 1
Then rotate it (by any amount) and translate it (by up to 1 radius in any direction) as you wish for your research, and clip. The "periodic boundary conditions", as you call them, present a bit of extra challenge, as the clipping becomes trickier. First, clip any sphere whose center is outside your box. Then select spheres close enough to the boundaries, or even partition the regions of interest into overlapping regions along the walls of your cube, then check for collisions among the spheres (as per your periodic boundary conditions) that fall in each such region.
I have a set of approximately 10,000 vectors max (random directions) in 3d space and I'm looking for a new direction v_dev (vector) which deviates from all other directions in the set by e.g. a minimum of 5 degrees. My naive initial try is the following, which has of course bad runtime complexity but succeeds for some cases.
#!/usr/bin/env python
import numpy as np
numVecs = 10000
vecs = np.random.rand(numVecs, 3)
randVec = np.random.rand(1, 3)
notFound=True
foundVec=randVec
below=False
iter = 1
for vec in vecs:
angle = np.rad2deg(np.arccos(np.vdot(vec, foundVec)/(np.linalg.norm(vec) * np.linalg.norm(foundVec))))
print("angle: %f\n" % angle)
while notFound:
for vec in vecs:
angle = np.rad2deg(np.arccos(np.vdot(vec, randVec)/(np.linalg.norm(vec) * np.linalg.norm(randVec))))
if angle < 5:
below=True
if below:
randVec = np.random.rand(1, 3)
else:
notFound=False
print("iteration no. %i" % iter)
iter = iter + 1
Any hints how to approach this problem (language agnostic) would be appreciate.
Consider the vectors in a spherical coordinate system (u,w,r), where r is always 1 because vector length doesn't matter here. Any vector can be expressed as (u,w) and the "deadzone" around each vector x, in which the target vector t cannot fall, can be expressed as dist((u_x, w_x, 1), (u_x-u_t, w_x-w_t, 1)) < 5°. However calculating this distance can be a bit tricky, so converting back into cartesian coordinates might be easier. These deadzones are circular on the spherical shell around the origin and you're looking for a t that doesn't hit any on them.
For any fixed u_t you can iterate over all x and using the distance function can find the start and end point of a range of w_t, that are blocked because they fall into the deadzone of the vector x. The union of all 10000 ranges build the possible values of w_t for that given u_t. The same can be done for any fixed w_t, looking for a u_t.
Now comes the part that I'm not entirely sure of: Given that you have two unknows u_t and w_t and 20000 knowns, the system is just a tad overdetermined and if there's a solution, it should be possible to find it.
My suggestion: Set u_t fixed to a random value and check which w_t are possible. If you find a non-empty range, great, you're done. If all w_t are blocked, select a different u_t and try again. Now, selecting u_t at random will work eventually, yet a smarter iteration should be possible. Maybe u_t(n) = u_t(n-1)*phi % 360°, where phi is the golden ratio. That way the u_t never repeat and will cover the whole space with finer and finer granularity instead of starting from one end and going slowly to the other.
Edit: You might also have more luck on the mathematics stackexchange since this isn't so much a code question as it is a mathematics question. For example I'm not sure what I wrote is all that rigorous, so I don't even know it works.
One way would be two build a 2d manifold (area on the sphere) of forbidden areas. You start by adding a point, then, the forbidden area is a circle on the sphere surface.
While true, pick a point on the boundary of the area. If this is not close (within 5 degrees) to any other vector, then, you're done, return it. If not, you just found a new circle of forbidden area. Add it to your manifold of forbidden area. You'll need to chop the circle in line or arc segments and build the boundary as a list.
If the set of vector has no solution, you boundary will collapse to an empty point. Then you return failure.
It's not the easiest approach, and you'll have to deal with the boundaries of a complex shape over a sphere. But it's guaranteed to work and should have reasonable complexity.
What is the most efficient way compute (euclidean) distance of the nearest neighbor for each point in an array?
I have a list of 100k (X,Y,Z) points and I would like to compute a list of nearest neighbor distances. The index of the distance would correspond to the index of the point.
I've looked into PYOD and sklearn neighbors, but those seem to require "teaching". I think my problem is simpler than that. For each point: find nearest neighbor, compute distance.
Example data:
points = [
(0 0 1322.1695
0.006711111 0 1322.1696
0.026844444 0 1322.1697
0.0604 0 1322.1649
0.107377778 0 1322.1651
0.167777778 0 1322.1634
0.2416 0 1322.1629
0.328844444 0 1322.1631
0.429511111 0 1322.1627...)]
compute k = 1 nearest neighbor distances
result format:
results = [nearest neighbor distance]
example results:
results = [
0.005939372
0.005939372
0.017815632
0.030118587
0.041569616
0.053475883
0.065324964
0.077200014
0.089077602)
]
UPDATE:
I've implemented two of the approaches suggested.
Use the scipy.spatial.cdist to compute the full distances matrices
Use a nearest X neighbors in radius R to find subset of neighbor distances for every point and return the smallest.
Results are that Method 2 is faster than Method 1 but took a lot more effort to implement (makes sense).
It seems the limiting factor for Method 1 is the memory needed to run the full computation, especially when my data set is approaching 10^5 (x, y, z) points. For my data set of 23k points, it takes ~ 100 seconds to capture the minimum distances.
For method 2, the speed scales as n_radius^2. That is, "neighbor radius squared", which really means that the algorithm scales ~ linearly with number of included neighbors. Using a Radius of ~ 5 (more than enough given application) it took 5 seconds, for the set of 23k points, to provide a list of mins in the same order as the point_list themselves. The difference matrix between the "exact solution" and Method 2 is basically zero.
Thanks for everyones' help!
Similar to Caleb's answer, but you could stop the iterative loop if you get a distance greater than some previous minimum distance (sorry - no code).
I used to program video games. It would take too much CPU to calculate the actual distance between two points. What we did was divide the "screen" into larger Cartesian squares and avoid the actual distance calculation if the Delta-X or Delta-Y was "too far away" - That's just subtraction, so maybe something like that to qualify where the actual Eucledian distance metric calculation is needed (extend to n-dimensions as needed)?
EDIT - expanding "too far away" candidate pair selection comments.
For brevity, I'll assume a 2-D landscape.
Take the point of interest (X0,Y0) and "draw" an nxn square around that point, with (X0,Y0) at the origin.
Go through the initial list of points and form a list of candidate points that are within that square. While doing that, if the DeltaX [ABS(Xi-X0)] is outside of the square, there is no need to calculate the DeltaY.
If there are no candidate points, make the square larger and iterate.
If there is exactly one candidate point and it is within the radius of the circle incribed by the square, that is your minimum.
If there are "too many" candidates, make the square smaller, but you only need to reexamine the candidate list from this iteration, not all the points.
If there are not "too many" candidates, then calculate the distance for that list. When doing so, first calculate DeltaX^2 + DeltaY^2 for the first candidate. If for subsequent candidates the DetlaX^2 is greater than the minumin so far, no need to calculate the DeltaY^2.
The minimum from that calculation is the minimum if it is within the radius of the circle inscribed by the square.
If not, you need to go back to a previous candidate list that includes points within the circle that has the radius of that minimum. For example, if you ended with one candidate in a 2x2 square that happened to be on the vertex X=1, Y=1, distance/radius would be SQRT(2). So go back to a previous candidate list that has a square greated or equal to 2xSQRT(2).
If warranted, generate a new candidate list that only includes points withing the +/- SQRT(2) square.
Calculate distance for those candidate points as described above - omitting any that exceed the minimum calcluated so far.
No need to do the square root of the sum of the Delta^2 until you have only one candidate.
How to size the initial square, or if it should be a rectangle, and how to increase or decrease the size of the square/rectangle could be influenced by application knowledge of the data distribution.
I would consider recursive algorithms for some of this if the language you are using supports that.
How about this?
from scipy.spatial import distance
A = (0.003467119 ,0.01422762 ,0.0101960126)
B = (0.007279433 ,0.01651597 ,0.0045558849)
C = (0.005392258 ,0.02149997 ,0.0177409387)
D = (0.017898802 ,0.02790659 ,0.0006487222)
E = (0.013564214 ,0.01835688 ,0.0008102952)
F = (0.013375397 ,0.02210725 ,0.0286032185)
points = [A, B, C, D, E, F]
results = []
for point in points:
distances = [{'point':point, 'neighbor':p, 'd':distance.euclidean(point, p)} for p in points if p != point]
results.append(min(distances, key=lambda k:k['d']))
results will be a list of objects, like this:
results = [
{'point':(x1, y1, z1), 'neighbor':(x2, y2, z2), 'd':"distance from point to neighbor"},
...]
Where point is the reference point and neighbor is point's closest neighbor.
The fastest option available to you may be scipy.spatial.distance.cdist, which finds the pairwise distances between all of the points in its input. While finding all of those distances may not be the fastest algorithm to find the nearest neighbors, cdist is implemented in C, so it is likely run faster than anything you try in Python.
import scipy as sp
import scipy.spatial
from scipy.spatial.distance import cdist
points = sp.array(...)
distances = sp.spatial.distance.cdist(points)
# An element is not its own nearest neighbor
sp.fill_diagonal(distances, sp.inf)
# Find the index of each element's nearest neighbor
mins = distances.argmin(0)
# Extract the nearest neighbors from the data by row indexing
nearest_neighbors = points[mins, :]
# Put the arrays in the specified shape
results = np.stack((points, nearest_neighbors), 1)
You could theoretically make this run faster (mostly by combining all of the steps into one algorithm), but unless you're writing in C, you won't be able to compete with SciPy/NumPy.
(cdist runs in Θ(n2) time (if the size of each point is fixed), and every other part of the algorithm in O(n) time, so even if you did try to optimize the code in Python, you wouldn't notice the change for small amounts of data, and the improvements would be overshadowed by cdist for more data.)
Given a binary square array of the fixed size like on the image below.
It is assumed in advance that the array contains an image of a circle or part of it. It's important that this circle is always centred on the image.
Example
It is necessary to find an effective way to supplement the arc to the full circle, if it's possible.
I've tried to statistically calculate the average distance from the centre to the white points and complete the circle. And it works. I've also tried the Hough Transform to fit the ellipse and determine its size. But both methods are very resource intensive.
1 method sketch:
points = np.transpose(np.array(np.nonzero(array))).tolist() # array of one-value points
random.shuffle(points)
points = np.array(points[:500]).astype('uint8') # take into account only 500 random points
distances = np.zeros(points.shape[0], dtype='int32') # array of distances from the centre of image (40, 40) to some point
for i in xrange(points.shape[0]):
distances[i] = int(np.sqrt((points[i][0] - 40) ** 2 + (points[i][1] - 40) ** 2))
u, indices = np.unique(distances, return_inverse=True)
mean_dist = u[np.argmax(np.bincount(indices))] # most probable distance
# use this mean_dist in order to draw a complete circle
1 method result
2 method sketch:
from skimage.transform import hough_ellipse
result = hough_ellipse(array, min_size=..., max_size=...)
result.sort(order='accumulator')
# ... extract the necessary info from result variable if it's not empty
Could someone suggest another and effective solution? Thank you!
I've tried to statistically calculate the average distance from the centre to the white points and complete the circle.
Well this seems to be a good start. Given an image with n pixels, this Algorithm is O(n) which is already pretty efficient.
If you want a faster implementation, try using randomization:
Take m random sample points from the image and use those to calculate the average radius of the white points. Then complete the circle using this radius.
This algorithm would then have O(m) which means that it's faster for all m < n. Choosing a good value for m might be tricky because you have to compromise between runtime and output quality.
I am trying to estimate the value of pi using a monte carlo simulation. I need to use two unit circles that are a user input distance from the origin. I understand how this problem works with a single circle, I just don't understand how I am meant to use two circles. Here is what I have got so far (this is the modified code I used for a previous problem the used one circle with radius 2.
import random
import math
import sys
def main():
numDarts=int(sys.argv[1])
distance=float(sys.argv[2])
print(montePi(numDarts,distance))
def montePi(numDarts,distance):
if distance>=1:
return(0)
inCircle=0
for I in range(numDarts):
x=(2*(random.random()))-2
y=random.random()
d=math.sqrt(x**2+y**2)
if d<=2 and d>=-2:
inCircle=inCircle+1
pi=inCircle/numDarts*4
return pi
main()
I need to change this code to work with 2 unit circles, but I do not understand how to use trigonometry to do this, or am I overthinking the problem? Either way help will be appreciated as I continue trying to figure this out.
What I do know is that I need to change the X coordinate, as well as the equation that determines "d" (d=math.sqrt(x*2+y*2)), im just not sure how.
These are my instructions-
Write a program called mcintersection.py that uses the Monte Carlo method to
estimate the area of this shape (and prints the result). Your program should take
two command-line parameters: distance and numDarts. The distance parameter
specifies how far away the circles are from the origin on the x-axis. So if distance
is 0, then both circles are centered on the origin, and completely overlap. If
distance is 0.5 then one circle is centered at (-0.5, 0) and the other at (0.5, 0). If
distance is 1 or greater, then the circles do not overlap at all! In that last case, your
program can simply output 0. The numDarts parameter should specify the number
of random points to pick in the Monte Carlo process.
In this case, the rectangle should be 2 units tall (with the top at y = 1 and the
bottom at y = -1). You could also safely make the rectangle 2 units wide, but this
will generally be much bigger than necessary. Instead, you should figure out
exactly how wide the shape is, based on the distance parameter. That way you can
use as skinny a rectangle as possible.
If I understand the problem correctly, you have two unit circles centered at (distance, 0) and (-distance, 0) (that is, one is slightly to the right of the origin and one is slightly to the left). You're trying to determine if a given point, (x, y) is within both circles.
The simplest approach might be to simply compute the distance between the point and the center of each of the circles. You've already done this in your previous code, just repeat the computation twice, once with the offset distance inverted, then use and to see if your point is in both circles.
But a more elegant solution would be to notice how your two circles intersect each other exactly on the y-axis. To the right of the axis, the left circle is completely contained within the right one. To the left of the y-axis, the right circle is entirely within the left circle. And since the shape is symmetrical, the two halves are of exactly equal size.
This means you can limit your darts to only hitting on one side of the axis, and then get away with just a single distance test:
def circle_intersection_area(num_darts, distance):
if distance >= 1:
return 0
in_circle = 0
width = 1-distance # this is enough to cover half of the target
for i in range(num_darts):
x = random.random()*width # random value from 0 to 1-distance
y = random.random()*2 - 1 # random value from -1 to 1
d = math.sqrt((x+distance)**2 + y**2) # distance from (-distance, 0)
if d <= 1:
in_circle += 1
sample_area = width * 2
target_area = sample_area * (in_circle / num_darts)
return target_area * 2 # double, since we were only testing half the target