Given a binary square array of fixed size, like the one in the image below.
It is known in advance that the array contains the image of a circle, or part of one, and that the circle is always centred in the image.
Example
I need an efficient way to complete the arc into a full circle, if that is possible.
I've tried to statistically calculate the average distance from the centre to the white points and complete the circle, and it works. I've also tried the Hough transform to fit an ellipse and determine its size. But both methods are very resource-intensive.
Method 1 sketch:
import random
import numpy as np

points = np.transpose(np.array(np.nonzero(array))).tolist()  # coordinates of all one-valued points
random.shuffle(points)
points = np.array(points[:500]).astype('uint8')  # take into account only 500 random points
distances = np.zeros(points.shape[0], dtype='int32')  # distances from the image centre (40, 40) to each point
for i in range(points.shape[0]):
    distances[i] = int(np.sqrt((points[i][0] - 40) ** 2 + (points[i][1] - 40) ** 2))
u, indices = np.unique(distances, return_inverse=True)
mean_dist = u[np.argmax(np.bincount(indices))]  # most probable (modal) distance
# use this mean_dist in order to draw a complete circle
Method 1 result
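For the drawing step mentioned at the end of the method 1 sketch, a minimal sketch could look like this; it continues from the array and mean_dist variables of the sketch above, and the use of skimage.draw.circle_perimeter plus the 80x80 grid centred at (40, 40) are assumptions, not part of the original code:
import numpy as np
from skimage.draw import circle_perimeter

# assumes `array` and `mean_dist` from the sketch above and a centre at (40, 40)
completed = array.copy()
rr, cc = circle_perimeter(40, 40, int(mean_dist), shape=completed.shape)
completed[rr, cc] = 1   # rasterize the completed circle on top of the existing arc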
Method 2 sketch:
from skimage.transform import hough_ellipse
result = hough_ellipse(array, min_size=..., max_size=...)
result.sort(order='accumulator')
# ... extract the necessary info from result variable if it's not empty
Could someone suggest another, more efficient solution? Thank you!
I've tried to statistically calculate the average distance from the centre to the white points and complete the circle.
Well, this seems to be a good start. Given an image with n pixels, this algorithm is O(n), which is already pretty efficient.
If you want a faster implementation, try using randomization:
Take m random sample points from the image and use those to calculate the average radius of the white points. Then complete the circle using this radius.
This algorithm is then O(m), which means it's faster for all m < n. Choosing a good value for m might be tricky because you have to trade off runtime against output quality.
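As a hedged illustration of that sampling idea (not the question's original code), a vectorized NumPy version might look as follows; m, the centre coordinates and the use of the median rather than the modal distance are assumptions:
import numpy as np

def estimate_radius(array, m=500, centre=(40, 40)):
    # coordinates of all white (non-zero) pixels
    ys, xs = np.nonzero(array)
    # sample at most m of them without replacement
    idx = np.random.choice(len(ys), size=min(m, len(ys)), replace=False)
    # vectorized distances from the centre to the sampled pixels
    d = np.hypot(ys[idx] - centre[0], xs[idx] - centre[1])
    # the median is robust to stray pixels; the question's sketch uses the mode instead
    return int(round(np.median(d)))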
Related
I have a problem where, in a grid of size x*y, I am given a single dot and need to find its nearest neighbour. In practice, I am trying to find the closest dot to the cursor in pygame that crosses a color distance threshold, which is calculated as follows:
sqrt(((rgb1[0]-rgb2[0])**2)+((rgb1[1]-rgb2[1])**2)+((rgb1[2]-rgb2[2])**2))
So far I have a function that calculates different resolutions for the grid, reducing it by a factor of two each time while always keeping the darkest pixel. It looks as follows:
from PIL import Image
from typing import Dict
import numpy as np
#we input a pillow image object and retrieve a dictionary with every grid version of the 3 dimensional array:
def calculate_resolutions(image: Image) -> Dict[int, np.ndarray]:
    resolutions = {}
    #we start with the highest resolution image, the size of which we initially divide by 1, then 2, then 4 etc.:
    divisor = 1
    #reduce the grid by 5 iterations
    resolution_iterations = 5
    for i in range(resolution_iterations):
        pixel_lookup = image.load() #convert image to PixelValues object, which allows for pixel lookup via [x,y] index
        #calculate the resolution of the new grid, round upwards:
        resolution = (int((image.size[0] - 1) // divisor + 1), int((image.size[1] - 1) // divisor + 1))
        #generate 3d array with new grid resolution, fill in values that are darker than white:
        new_grid = np.full((resolution[0], resolution[1], 3), np.array([255, 255, 255]))
        for x in range(image.size[0]):
            for y in range(image.size[1]):
                if not x % divisor and not y % divisor:
                    darkest_pixel = (255, 255, 255)
                    x_range = divisor if x + divisor < image.size[0] else (0 if image.size[0] - x < 0 else image.size[0] - x)
                    y_range = divisor if y + divisor < image.size[1] else (0 if image.size[1] - y < 0 else image.size[1] - y)
                    for x_ in range(x, x + x_range):
                        for y_ in range(y, y + y_range):
                            if pixel_lookup[x_, y_][0] + pixel_lookup[x_, y_][1] + pixel_lookup[x_, y_][2] < darkest_pixel[0] + darkest_pixel[1] + darkest_pixel[2]:
                                darkest_pixel = pixel_lookup[x_, y_]
                    if darkest_pixel != (255, 255, 255):
                        new_grid[int(x / divisor)][int(y / divisor)] = np.array(darkest_pixel)
        resolutions[i] = new_grid
        divisor = divisor * 2
    return resolutions
This is the most performance-efficient solution I was able to come up with. If this function is run on a grid that changes continually, like a video at x fps, it will be very expensive. I also considered a kd-tree that simply adds and removes any dots that change on the grid, but for finding individual nearest neighbours on a static grid the multi-resolution approach has the potential to be more resource-efficient. I am open to any suggestions on how this function could be improved performance-wise.
Now suppose, for example, that I try to find the nearest neighbour of the current cursor position in a 100x100 grid. The resulting reduced grids are 50^2, 25^2, 13^2, and 7^2. Consider a situation where part of the grid looks as follows:
Suppose I am at the aggregation step where a part of the grid consists of six large squares, the black one being the current cursor position and the orange dots being dots where the color distance threshold is crossed. I would not know which diagonally located neighbour to pick to search next; in this case, going one aggregation step down shows that the lower left would be the right choice. Depending on how many grid layers I have, this could result in a very large error in the nearest-neighbour search. Is there a good way to solve this problem? If multiple squares report a relevant location, do I have to search them all in the next step to be sure? And if so, the further away I get, the more I would need math such as the Pythagorean theorem to decide whether the two positive squares I find overlap in distance and could both contain the closest neighbour, which would start to be performance-intensive again if the function is called frequently. Would it still make sense to pursue this solution over a regular kd tree? For now the grid size is still fairly small (~800x600), but if the grid gets larger, performance may start suffering again. Is there a good, scalable solution to this problem that could be applied here?
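For reference, the "regular kd tree" alternative mentioned above could be sketched with scipy.spatial.cKDTree; the function below and its names are illustrative assumptions only: it pre-filters the pixels whose color distance to a reference color crosses the threshold, builds the tree on those, and queries it with the cursor position:
import numpy as np
from scipy.spatial import cKDTree

def nearest_matching_dot(grid, cursor_xy, reference_rgb=(255, 255, 255), threshold=100.0):
    # grid: H x W x 3 array of RGB values, cursor_xy: (x, y) cursor position
    # color distance of every pixel to the reference color (here: distance from white)
    dist = np.sqrt(np.sum((grid.astype(float) - np.asarray(reference_rgb, dtype=float)) ** 2, axis=2))
    ys, xs = np.nonzero(dist >= threshold)     # pixels that cross the color distance threshold
    if len(xs) == 0:
        return None
    tree = cKDTree(np.column_stack([xs, ys]))  # rebuild only when the grid changes
    _, i = tree.query(cursor_xy)               # nearest matching dot to the cursor
    return int(xs[i]), int(ys[i])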
After edge detection of an image, I have a list of points that make an arbitrary shape, but I want to eliminate those that don't contribute to a rectangle shape. In the following example picture, the two points in the bottom left (E, F) should be removed, so that the shape of the remaining points is almost a rectangle (since D is a little above, it gives a trapezoid shape, but that is not significant).
I thought of brute-forcing all points and comparing their areas, but that does not guarantee a rectangle, and I don't know how to implement this in Python.
If someone has a better approach, I'd like to hear it, please.
Thanks in advance.
p = [ (8,133), (78,13), (242,89), (183,217), (217,235), (213,240) ]
The best approach you can follow is to use slopes.
The mathematical formula for the slope of a given line is shown below.
m = (y2-y1)/(x2-x1)
So, for example: if the slope of the line through (8,133),(78,13) is equal to the slope of the line through (242,89),(183,217), then those two sides are parallel. That by itself doesn't mean the 4 points sit on the 4 corners of a rectangle; it only tells you they lie on two parallel sides.
If you want to make sure they are the 4 corners, also compute the slope of (8,133),(183,217) and the slope of (78,13),(242,89) and compare them. If both pairs of opposite sides are parallel you have a parallelogram, and if in addition two adjacent sides are perpendicular (the product of their slopes is -1), the 4 points are the corner points of a rectangle.
Back to the code:
First, you are going to need all possible combinations of length 4 of all the points; to accomplish this, use combinations from itertools:
from itertools import combinations
p = [(8,133), (78,13), (242,89), (183,217), (217,235), (213,240)]
possible_combinations = []
for comb in combinations(p, 4):
    possible_combinations.append(comb)
After that, apply the above checks to each combination to get your rectangle. A rough sketch of the whole procedure follows.
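Here is one possible sketch of that procedure, not a definitive implementation: it uses direction vectors (cross product for "parallel", dot product for "perpendicular") instead of raw slopes only to avoid dividing by zero for vertical sides, and the tolerance tol is an assumption you would tune to decide how "almost" a rectangle may be:
from itertools import combinations
import math

p = [(8, 133), (78, 13), (242, 89), (183, 217), (217, 235), (213, 240)]

def edge(a, b):
    # direction vector of the side a -> b
    return (b[0] - a[0], b[1] - a[1])

def is_near_rectangle(a, b, c, d, tol=0.15):
    # quadrilateral a-b-c-d with sides ab, bc, cd, da
    ab, bc, cd, da = edge(a, b), edge(b, c), edge(c, d), edge(d, a)
    def norm(u):
        return math.hypot(u[0], u[1]) + 1e-12
    def sin_angle(u, v):   # ~0 when u and v are parallel
        return (u[0] * v[1] - u[1] * v[0]) / (norm(u) * norm(v))
    def cos_angle(u, v):   # ~0 when u and v are perpendicular
        return (u[0] * v[0] + u[1] * v[1]) / (norm(u) * norm(v))
    opposite_parallel = abs(sin_angle(ab, cd)) < tol and abs(sin_angle(bc, da)) < tol
    adjacent_perpendicular = abs(cos_angle(ab, bc)) < tol
    return opposite_parallel and adjacent_perpendicular

rectangles = []
for comb in combinations(p, 4):
    # the 4 chosen points are unordered, so try the 3 distinct vertex orderings
    for order in ((0, 1, 2, 3), (0, 1, 3, 2), (0, 2, 1, 3)):
        quad = tuple(comb[i] for i in order)
        if is_near_rectangle(*quad):
            rectangles.append(quad)
            break

print(rectangles)   # for the example points this keeps only the near-rectangle A, B, C, D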
I have a set of approximately 10,000 vectors max (random directions) in 3d space and I'm looking for a new direction v_dev (vector) which deviates from all other directions in the set by e.g. a minimum of 5 degrees. My naive initial try is the following, which has of course bad runtime complexity but succeeds for some cases.
#!/usr/bin/env python
import numpy as np

numVecs = 10000
vecs = np.random.rand(numVecs, 3)
randVec = np.random.rand(1, 3)
notFound = True
foundVec = randVec
iter = 1

# print the angle between the initial candidate and every vector (diagnostic only)
for vec in vecs:
    angle = np.rad2deg(np.arccos(np.vdot(vec, foundVec) / (np.linalg.norm(vec) * np.linalg.norm(foundVec))))
    print("angle: %f\n" % angle)

while notFound:
    below = False  # reset for every candidate, otherwise one bad candidate blocks all later ones
    for vec in vecs:
        angle = np.rad2deg(np.arccos(np.vdot(vec, randVec) / (np.linalg.norm(vec) * np.linalg.norm(randVec))))
        if angle < 5:
            below = True
    if below:
        randVec = np.random.rand(1, 3)
    else:
        notFound = False
    print("iteration no. %i" % iter)
    iter = iter + 1
Any hints on how to approach this problem (language agnostic) would be appreciated.
Consider the vectors in a spherical coordinate system (u,w,r), where r is always 1 because vector length doesn't matter here. Any vector can then be expressed as (u,w), and the "deadzone" around each vector x, in which the target vector t cannot fall, can be expressed as dist((u_x, w_x, 1), (u_t, w_t, 1)) < 5°. However, calculating this distance can be a bit tricky, so converting back into Cartesian coordinates might be easier. These deadzones are circles on the spherical shell around the origin, and you're looking for a t that doesn't hit any of them.
For any fixed u_t you can iterate over all x and, using the distance function, find the start and end points of the range of w_t that is blocked because it falls into the deadzone of the vector x. The union of all 10,000 blocked ranges gives the forbidden values of w_t for that given u_t; any w_t outside that union is valid. The same can be done for any fixed w_t, looking for a u_t.
Now comes the part that I'm not entirely sure of: given that you have two unknowns u_t and w_t and 20,000 knowns, the system is just a tad overdetermined, and if there's a solution, it should be possible to find it.
My suggestion: Set u_t to a fixed random value and check which w_t are possible. If you find a non-empty range, great, you're done. If all w_t are blocked, select a different u_t and try again. Now, selecting u_t at random will work eventually, yet a smarter iteration should be possible. Maybe u_t(n) = (u_t(n-1) + 360°/phi) mod 360°, where phi is the golden ratio (the golden-angle increment). That way the u_t never repeat and cover the whole space with finer and finer granularity instead of starting from one end and moving slowly to the other.
Edit: You might also have more luck on the Mathematics Stack Exchange, since this isn't so much a code question as it is a mathematics question. For example, I'm not sure what I wrote is all that rigorous, so I don't even know whether it works.
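Whichever search strategy you pick, you still need a cheap way to test one candidate direction against all the deadzones at once. A minimal NumPy sketch of that test follows, vectorized as a single matrix-vector product and combined with plain rejection sampling purely for illustration; the reduced vector count is an assumption so the demo has a realistic chance of finding a free direction:
import numpy as np

rng = np.random.default_rng(0)
num_vecs = 100   # reduced from 10,000 so this rejection demo can actually succeed;
                 # with 10,000 vectors and a 5 degree margin the deadzones may cover the whole sphere
vecs = rng.standard_normal((num_vecs, 3))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # unit direction vectors

cos_limit = np.cos(np.deg2rad(5.0))

def far_enough(candidate):
    # True if the candidate is at least 5 degrees away from every vector;
    # one matrix-vector product gives all the cosines at once
    c = candidate / np.linalg.norm(candidate)
    return np.all(vecs @ c <= cos_limit)

for attempt in range(100000):
    cand = rng.standard_normal(3)
    if far_enough(cand):
        print("found after", attempt + 1, "attempts:", cand / np.linalg.norm(cand))
        break
else:
    print("no free direction found")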
One way would be to build a 2d manifold (an area on the sphere) of forbidden areas. You start by adding a point; then the forbidden area is a circle on the sphere's surface.
While true: pick a point on the boundary of the area. If it is not close (within 5 degrees) to any other vector, you're done, return it. If it is, you have just found a new circle of forbidden area; add it to your manifold of forbidden area. You'll need to chop the circle into line or arc segments and keep the boundary as a list.
If the set of vectors has no solution, your boundary will collapse to an empty point. Then you return failure.
It's not the easiest approach, and you'll have to deal with the boundaries of a complex shape over a sphere. But it's guaranteed to work and should have reasonable complexity.
I'm using griddata() to interpolate my (irregular) 2-dimensional depth measurements: x, y, depth. The method does a great job, but it interpolates over the entire grid wherever it can find two opposing points. I don't want that behaviour. I'd like to have interpolation only around the existing measurements, say up to the extent of a certain radius.
Is it possible to tell numpy/scipy: don't interpolate if you're too far from an existing measurement, and give a NODATA value instead? Ideally something like ideal = griddata(.., .., .., radius=5.0).
Edit, an example:
In the image below, the black dots are the measurements. Shades of blue are the cells interpolated by numpy. The area marked in green is in fact part of the picture but is considered NODATA by numpy (because there are no points in between). Now, the red areas are interpolated, but I want to get rid of them. Any ideas?
Ok cool. I don't think there is a built-in option for griddata() that does what you want, so you will need to write it yourself.
This comes down to calculating the distances between N input data points and M interpolation points. This is simple enough to do, but if you have a lot of points it can be slow at ~O(M*N). Here's an example that calculates the distances to all N data points for each interpolation point. If the number of data points within the radius is at least neighbors, it keeps the value; otherwise it writes NODATA.
neighbors is 4 because griddata() will use bilinear interpolation, which needs points bounding the interpolant in each dimension (2*2 = 4).
import numpy as np

# invec - input points, Nx2 numpy array
# mvec  - interpolation points, Mx2 numpy array
# just some random points for the example
N = 100
invec = 10 * np.random.random([N, 2])
M = 50
mvec = 10 * np.random.random([M, 2])
# --- here you would put your griddata() call, returning interpolated_values
interpolated_values = np.zeros(M)
NODATA = np.nan
radius = 5.0
neighbors = 4
for m in range(M):
    data_in_radius = np.sqrt(np.sum((invec - mvec[m])**2, axis=1)) <= radius
    if np.sum(data_in_radius) < neighbors:
        interpolated_values[m] = NODATA
Edit:
Ok re-read and noticed the input is really 2D. Example modified.
Just as an additional comment, this could be greatly accelerated if you first build a coarse mapping from each point mvec[m] to a subset of the relevant data points.
The costliest step in the loop would change from
np.sqrt(np.sum( (invec - mvec[m])**2, axis=1))
to something like
np.sqrt(np.sum( (invec[subset[m]] - mvec[m])**2, axis=1))
There are plenty of ways to do this, for example using a quadtree, a hashing function, or a 2D index. But whether this gives a performance advantage depends on the application, how your data is structured, etc.
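As a hedged illustration of that idea (not part of the answer above), one possible "2D index" is scipy's cKDTree: query_ball_point returns, for every interpolation point, the indices of the data points within the radius, so the NODATA mask can be built without forming the full M*N distance matrix. The variable names follow the earlier example and the random data is only a stand-in:
import numpy as np
from scipy.spatial import cKDTree

N, M = 100, 50
invec = 10 * np.random.random([N, 2])     # input data points
mvec = 10 * np.random.random([M, 2])      # interpolation points
interpolated_values = np.zeros(M)         # stand-in for the griddata() output
NODATA = np.nan
radius = 5.0
neighbors = 4

tree = cKDTree(invec)                                  # build the index once
in_radius = tree.query_ball_point(mvec, r=radius)      # lists of nearby data points, one per mvec[m]
counts = np.array([len(idx) for idx in in_radius])
interpolated_values[counts < neighbors] = NODATA       # too few supporting measurements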
I have some points that are located in the same place, with WGS84 latlngs, and I want to 'jitter' them randomly so that they don't overlap.
Right now I'm using this crude method, which jitters them within a square:
r['latitude'] = float(r['latitude']) + random.uniform(-0.0005, 0.0005)
r['longitude'] = float(r['longitude']) + random.uniform(-0.0005, 0.0005)
How could I adapt this to jitter them randomly within a circle?
I guess I want a product x*y = 0.001 where x and y are random values. But I have absolutely no idea how to generate this!
(I realise that really I should use something like this to account for the curvature of the earth's surface, but in practice a simple circle is probably fine :) )
One simple way to generate random samples within a circle is to just generate square samples as you are, and then reject the ones that fall outside the circle.
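A minimal sketch of that rejection idea applied to the jitter in the question; the function name is made up here, and treating the 0.0005 degree radius as flat 2D coordinates is the same simplification the question allows:
import random

def jitter_in_circle(lat, lon, radius=0.0005):
    # keep drawing square samples until one falls inside the circle,
    # then apply it as the jitter; on average ~1.27 draws are needed
    while True:
        dlat = random.uniform(-radius, radius)
        dlon = random.uniform(-radius, radius)
        if dlat * dlat + dlon * dlon <= radius * radius:
            return lat + dlat, lon + dlon

# usage with the question's record:
# r['latitude'], r['longitude'] = jitter_in_circle(float(r['latitude']), float(r['longitude']))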
The basic idea is: you generate a vector with x = radius of circle, y = 0.
You then rotate the vector by a random angle between 0 and 360, or 0 to 2 pi radians.
You then apply this displacement vector and you have your random jitter in a circle.
An example from one of my scripts:
from random import random
from math import pi, sin, cos

def get_randrad(pos, radius):
    radius = random() * radius
    angle = random() * 2 * pi
    return (int(pos[0] + radius * cos(angle)),
            int(pos[1] + radius * sin(angle)))
pos being the target location and radius being the "jitter" range.
As pjs pointed out, for a uniform distribution over the circle, replace radius = random() * radius in the function above with
radius = math.sqrt(random()) * radius
Merely culling results that fall outside your circle will be sufficient.
If you don't want to throw out some percentage of random results, you could choose a random angle and distance to ensure all your values fall within the radius of your circle. It's important to note, with this solution, that picking the distance uniformly will skew your distribution to be more concentrated in the center.
If you make a vector out of your x,y values and then do something like randomizing the length of that vector to fall within your circle, your distribution will no longer be uniform, so I would steer clear of that approach if uniformity is your biggest concern.
The culling approach is the most evenly distributed of the three I mentioned, although the random angle/length approach is usually fine, except in cases involving very fine precision and granularity.