How to optimize the performance of geometry operations in Python

I am looking for an approach to optimize the performance of geometry operations. My goal is to count how many points (205,779) fall within each polygon in a series of polygons (21,562). Python and R are preferable, as is GIS software like ArcGIS or QGIS.
Here are the solutions I have found and tried.
Using ArcGIS: one example is at http://support.esri.com/cn/knowledgebase/techarticles/detail/30779 -> although I have not tried it, spatial joins have always taken a large amount of time in my previous experience.
Using GDAL/OGR: here is an example: http://geoexamples.blogspot.tw/2012/06/density-maps-using-gdalogr-python.html -> it takes 5 to 9 seconds for every polygon.
Using Shapely prepared geometry operations with a loop: here is my example, and it takes 2.7 to 3.0 seconds for every polygon. (Note that points is a list of Point objects.)
from shapely.prepared import prep

prep_poly = []
for i in polygons:
    prepared = prep(i)  # prepare each polygon once, not once per point
    mycount = 0
    for j in points:
        if prepared.contains(j):
            mycount += 1  # count how many points fall within this polygon
    prep_poly.append(mycount)
Using Shapely prepared geometry operations with a filter: here is my example, and it takes about 3.3 to 3.9 seconds for every polygon. (Note that points is a MultiPoint object.)
prep_poly = []
for i in polygons:
    # list() is needed on Python 3, where filter() returns an iterator;
    # on Shapely 2.x, iterate points.geoms instead of the MultiPoint itself
    prep_poly.append(len(list(filter(prep(i).contains, points))))
Though prepared geometry operations did improve performance, processing this many polygons is still time-consuming. Any suggestions? Thanks!
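One further avenue worth trying is a spatial index, which prunes the candidate points for each polygon before running the exact test. A minimal sketch with Shapely's STRtree, reusing the polygons and points lists from above and assuming Shapely 2.x (where query() returns integer indices):

from shapely.strtree import STRtree
from shapely.prepared import prep

tree = STRtree(points)  # build the index over the points once
counts = []
for poly in polygons:
    candidate_idx = tree.query(poly)  # points whose bounding boxes intersect the polygon
    prepared = prep(poly)
    counts.append(sum(1 for k in candidate_idx if prepared.contains(points[k])))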

Rather than looking through every pixel on the screen for every polygon, you can do the following (Python code):
first_pixel = ...  # any pixel known to lie inside the polygon (the seed)
px_list = [first_pixel]  # list of pixels left to check
visited = {first_pixel}  # remember pixels we have already queued
count = 0
while px_list:  # pixels left in the pixel list
    curr_pixel = px_list.pop()
    count += 1  # count this pixel as inside the polygon
    for pixel in get_adjacent_pixels(curr_pixel):  # find adjacent pixels
        # i.e. (vertical, horizontal, diagonal)
        if pixel in shape and pixel not in visited:
            visited.add(pixel)  # avoid queuing the same pixel twice
            px_list.append(pixel)
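The get_adjacent_pixels helper is left undefined above; a minimal 8-connectivity version (an illustration, not part of the original answer) could be:

def get_adjacent_pixels(p):
    # all 8 neighbours: vertical, horizontal, and diagonal
    x, y = p
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]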
Essentially, this is the same way that pathfinding works. Check this wiki article for a visual representation of the above algorithm:
http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Algorithm
If you have no easy way to find starting points, you could loop through all of the points once, check for each point whether it is contained by a shape, store that point together with the shape in a separate list, and delete the shape from the original shapes-for-which-we-have-no-point-yet list.

Related

Calculating the nearest neighbour in a 2D grid using a multilevel solution

I have a problem where, in a grid of x*y size, I am given a single dot and need to find the nearest neighbour. In practice, I am trying to find the dot closest to the cursor in pygame that crosses a colour distance threshold, calculated as follows:
sqrt(((rgb1[0]-rgb2[0])**2)+((rgb1[1]-rgb2[1])**2)+((rgb1[2]-rgb2[2])**2))
So far I have a function that calculates the different resolutions for the grid, reducing it by a factor of two at each step while always keeping the darkest pixel. It looks as follows:
from PIL import Image
from typing import Dict
import numpy as np

# we input a Pillow image object and retrieve a dictionary with every grid version of the 3-dimensional array:
def calculate_resolutions(image: Image) -> Dict[int, np.ndarray]:
    resolutions = {}
    # we start with the highest-resolution image, the size of which we initially divide by 1, then 2, then 4, etc.:
    divisor = 1
    # reduce the grid over 5 iterations
    resolution_iterations = 5
    for i in range(resolution_iterations):
        pixel_lookup = image.load()  # PixelAccess object, which allows pixel lookup via an [x, y] index
        # calculate the resolution of the new grid, rounding upwards:
        resolution = (int((image.size[0] - 1) // divisor + 1), int((image.size[1] - 1) // divisor + 1))
        # generate a 3d array with the new grid resolution, fill in values that are darker than white:
        new_grid = np.full((resolution[0], resolution[1], 3), np.array([255, 255, 255]))
        for x in range(image.size[0]):
            for y in range(image.size[1]):
                if not x % divisor and not y % divisor:
                    darkest_pixel = (255, 255, 255)
                    x_range = divisor if x + divisor < image.size[0] else (0 if image.size[0] - x < 0 else image.size[0] - x)
                    y_range = divisor if y + divisor < image.size[1] else (0 if image.size[1] - y < 0 else image.size[1] - y)
                    for x_ in range(x, x + x_range):
                        for y_ in range(y, y + y_range):
                            if pixel_lookup[x_, y_][0] + pixel_lookup[x_, y_][1] + pixel_lookup[x_, y_][2] < darkest_pixel[0] + darkest_pixel[1] + darkest_pixel[2]:
                                darkest_pixel = pixel_lookup[x_, y_]
                    if darkest_pixel != (255, 255, 255):
                        new_grid[int(x / divisor)][int(y / divisor)] = np.array(darkest_pixel)
        resolutions[i] = new_grid
        divisor = divisor * 2
    return resolutions
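For reference, a hypothetical invocation (the file name is a placeholder):

image = Image.open("grid.png")
resolutions = calculate_resolutions(image)  # dict of 5 grids, from full size down to 1/16 scale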
This is the most performance-efficient solution I was able to come up with. If this function is run on a grid that continually changes, like a video at x fps, it will be very performance-intensive. I also considered a k-d tree that simply adds and removes any dots that change on the grid, but for finding individual nearest neighbours on a static grid the multilevel approach has the potential to be more resource-efficient. I am open to any suggestions on how this function could be improved performance-wise.
Now I am in a position where, for example, I try to find the nearest neighbour of the current cursor position in a 100x100 grid. The resulting reduced grids are 50^2, 25^2, 13^2, and 7^2. Consider a situation where a part of the grid looks as follows:
Suppose I am on the aggregation step where a part of the grid consists of six large squares, the black one being the current cursor position and the orange dots being dots where the colour distance threshold is crossed. I would not know which diagonally located closest neighbour to pick to search next; in this case, going one aggregation step down shows that the lower left would be the right choice. Depending on how many grid layers I have, this could result in a very large error in the nearest-neighbour search. Is there a good way to solve this problem? If multiple squares report a relevant location, do I have to search them all in the next step to be sure? And if so, the further away I get, the more I would need math such as the Pythagorean theorem to check whether the two positive squares overlap in terms of distance and could potentially contain the closest neighbour, which would again become performance-intensive if the function is called frequently. Would it still make sense to pursue this solution over a regular k-d tree? For now the grid size is still fairly small (~800x600), but if the grid gets larger the performance may start suffering again. Is there a good, scalable solution that could be applied here?
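For comparison, the k-d tree route mentioned above is short to sketch with SciPy (an illustration only; it assumes the dots crossing the colour threshold have already been extracted as coordinates, and the data here is made up):

import numpy as np
from scipy.spatial import cKDTree

dots = np.array([[12, 40], [57, 3], [80, 80]])  # placeholder coordinates of thresholded dots
tree = cKDTree(dots)              # rebuild once per frame, O(n log n)
dist, idx = tree.query((50, 50))  # nearest neighbour to the cursor, O(log n) on average
print(dots[idx], dist)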

Any faster way to determine how far points are from a point, line, or polygon?

I'm trying to write code that calculates the distance from a set of points to a set of points / lines / polygons.
The code below, with the sample data, gives me results, but it takes forever to go through all the points (around an hour or so).
I am using Shapely because it can also compute the distance between:
point - point
point - line segment
point - polygon
Line segments and polygons are not included in the code below.
Is it because I am using a for loop?
Is there a more efficient way of achieving this?
from timeit import default_timer as timer
import numpy as np
import progressbar
from shapely.geometry import Point

start = timer()

# Create 10k random X and Y coordinates
x_coordinates = np.random.rand(10000)
y_coordinates = np.random.rand(10000)

# Create ~40k centre points of circles (201*201 = 40401)
Circles = np.ones((201 * 201, 2), dtype=float)
linspace = np.linspace(-1, 1, num=201)  # spacing between circles for the sample data;
                                        # the actual data are placed more randomly and change from design to design
temp = 0
# Fill the array of circle centres
for x in linspace:
    Circles[temp:temp + 201, 0] = x
    Circles[temp:temp + 201, 1] = linspace
    temp = temp + 201

# Create an empty array for saving the result:
# it should record which circle each point belongs to
result = np.empty([10000, 2], dtype=object)
for x in progressbar.progressbar(range(10000)):
    defect = Point(x_coordinates[x], y_coordinates[x])  # go through 10000 points
    for j in range(201 * 201):
        if defect.distance(Point(Circles[j, :])) < 0.005:  # against 40401 circles
            result[x] = Circles[j, :]
            break  # stop once a match is found

end = timer()
print(end - start)
I'm not familiar with numpy (or shapely), but based on your code you are looking for circles that are close to your points. I am slightly confused about why you have circles at all, given the title of the question. (Is there any need for circles in your code, as they seem to be used only as points anyway?)
calculates the distance from a set of points to a set of points
Do you need the distance for each point to each of your circles? What is it that you are specifically looking for?
Your algorithm is slow for two reasons:
Calculating the distance between two points. This is a straightforward calculation, but it involves taking a square root, which is slow compared to other operations.
Instead, use the square of the distance: apply the distance formula but skip the square root. (Maybe this isn't too slow in Shapely.)
Comparing each point with potentially EVERY circle by calculating the distance. This is most likely the main cause of your code being slow.
If you don't need the distance from each point to all the circles, then you need a fast way to find the circle you are looking for. You could order your circles by x-coordinate to achieve this. Since your code looks for circles very close (0.005) to your point, you can eliminate all circles whose x-coordinate is further than that from your point and skip calculating the distance between the two entirely. (Then you could do the same for the y-coordinates.) This way you wouldn't need to look at all the circles, but could jump out of the loop early because you know the rest are further away on that axis, as in the sketch below.
If you provide a better description of what exactly you are looking for, someone can probably give a more targeted implementation.
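For illustration, a minimal sketch of the sorted-x elimination (reusing Circles, x_coordinates, y_coordinates and result from the question, and comparing squared distances to avoid the square root):

import numpy as np

threshold = 0.005

# sort the circle centres by x once, up front
order = np.argsort(Circles[:, 0])
sorted_circles = Circles[order]
xs = sorted_circles[:, 0]

for i in range(10000):
    px, py = x_coordinates[i], y_coordinates[i]
    # binary search: only circles with |x - px| <= threshold can possibly match
    lo = np.searchsorted(xs, px - threshold, side="left")
    hi = np.searchsorted(xs, px + threshold, side="right")
    candidates = sorted_circles[lo:hi]
    # squared distances avoid the square root entirely
    d2 = (candidates[:, 0] - px) ** 2 + (candidates[:, 1] - py) ** 2
    hits = np.nonzero(d2 < threshold ** 2)[0]
    if hits.size:
        result[i] = candidates[hits[0]]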

Creating and offsetting points outside a polygon on a discrete grid

I am working on a discrete 2D grid of points in which there are "shapes" that I would like to create points outside of. I have been able to identify the vertices of these shapes and take convex hulls. So far, this leads to the result shown, and all is well. The purple here is the shape in question and the red line is the convex contour I have computed.
What I would like to do now is create two neighbourhoods of points outside this shape. The first is a set of points directly outside (as close as the grid size will allow); the second is another set of points offset some distance away (the distance is not fixed, but an input).
I have attempted to write this in Python and get okay results. Here is an example of my current output. The problem is that the offsets are not perfect; for example, look at the bottommost point in the attached image. It kinks downwards, whereas the original shape does not. It's not too bad in this example, but in other cases, where the shape is smaller or I take a smaller offset, it gets worse. I also have an issue where the offsets sometimes overlap, even when they are supposed to be some distance apart. I would also like there to be one line in each section of the contour, not two (for example, in the top left).
My current attempt uses the Shapely package to handle most of the computational geometry. Once I have found the vertices of the convex contour, I offset these vertices by some amount and interpolate along each pair of vertices to obtain many points along those lines. Afterwards I use a coordinate transform to snap all points to the nearest grid point. This is how I obtain my final set of points. Below is the actual code I have written.
How can I improve this so I don't run into the issues I described?
Function #1 - Computes the offset points
import numpy as np
import shapely.geometry as sg
from shapely.geometry import LinearRing

def OutsidePoints(vertices, dist):
    poly_line = LinearRing(vertices)
    poly_line_offset = poly_line.buffer(dist, resolution=1, join_style=2, mitre_limit=1).exterior
    new_vertices = list(poly_line_offset.coords)
    new_vertices = np.asarray(new_vertices)
    shape = sg.Polygon(new_vertices)
    points = []
    for t in np.arange(0, shape.length, step_size):  # step_size: a global grid-spacing parameter
        temp_points = np.transpose(shape.exterior.interpolate(t).xy)
        points.append(temp_points[0])
    points = np.array(points)
    points = np.unique(points, axis=0)
    return points
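(One parameter worth experimenting with, though untested here: the kinks may come from the mitred joins and resolution=1 in the buffer call; round joins with a higher arc resolution produce a smoother offset.)

# variant of the offset step with round joins and a higher arc resolution
poly_line_offset = poly_line.buffer(dist, resolution=16, join_style=1).exterior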
Function #2 - Transforming these points into points that are on my grid
import math

def IndexFinder(points):
    index_points = invCoordinateTransform(points)  # user-defined transform into grid coordinates
    for i in range(len(index_points)):
        for j in range(2):
            index_points[i][j] = math.floor(index_points[i][j])
    index_points = np.unique(index_points, axis=0)
    return index_points
Many thanks!

Lucas-Kanade: How to calculate the distance between tracked points

I'm using the Lucas-Kanade OpenCV implementation to track objects between frames. I want to be able to do the following two things:
Calculate the distance moved by each point between frames
Track bounding boxes for each object across frames
I have obtained the features to track using cv2.goodFeaturesToTrack(). I also add the bounding boxes of objects to the features to be tracked. Right now I am using the following to calculate the distance between the points:
np.sqrt(np.square(new_pts - old_pts).sum(axis=1).sum(axis=1)). I am not quite sure this is the correct way to do it, because the indices of the points might be different in new_pts.
Is the assumption that every index in old_pts corresponds to the same feature in the new_pts array correct?
Secondly, is there a way to track bounding boxes across frames using Lucas-Kanade?
In new_pts, the points keep the same indices. However, a point may not be found; check the status array: if status[i] == 1, then new_pts[i] contains the new coordinates of old_pts[i].
For more robustness, you can compute the forward flow (goodFeaturesToTrack(frame1) -> LK flow) and the backward flow (goodFeaturesToTrack(frame2) -> LK flow) and keep only the points whose coordinates agree in both directions.
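A minimal sketch of the status-filtered distance calculation (frame1_gray and frame2_gray are assumed to be consecutive grayscale frames; the parameter values are only examples):

import cv2
import numpy as np

old_pts = cv2.goodFeaturesToTrack(frame1_gray, maxCorners=200, qualityLevel=0.01, minDistance=7)
new_pts, status, err = cv2.calcOpticalFlowPyrLK(frame1_gray, frame2_gray, old_pts, None)

# keep only the points that were successfully tracked (status == 1)
good_old = old_pts[status.ravel() == 1]
good_new = new_pts[status.ravel() == 1]

# Euclidean distance moved by each successfully tracked point
distances = np.linalg.norm((good_new - good_old).reshape(-1, 2), axis=1)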

Numpy griddata interpolation up to a certain radius

I'm using griddata() to interpolate my (irregular) 2-dimensional depth measurements: x, y, depth. The method does a great job, but it interpolates over the entire grid wherever it can find two opposing points. I don't want that behaviour. I'd like the interpolation to stay around the existing measurements, say up to the extent of a certain radius.
Is it possible to tell numpy/scipy: don't interpolate if you're too far from an existing measurement, and return a NODATA value instead? Something like: ideal = griddata(.., .., .., radius=5.0)
Edit, an example:
In the image below, black dots are the measurements. Shades of blue are the cells interpolated by numpy. The area marked in green is in fact part of the picture but is considered NODATA by numpy (because there are no points in between). Now, the red areas are interpolated, but I want to get rid of them. Any ideas?
OK, cool. I don't think there is a built-in option for griddata() that does what you want, so you will need to write it yourself.
This comes down to calculating the distances between N input data points and M interpolation points. This is simple enough to do, but if you have a lot of points it can be slow at ~O(M*N). Here's an example that calculates the distances to all N data points for each interpolation point. If the number of data points within the radius is at least neighbors, it keeps the value; otherwise it writes NODATA.
neighbors is 4 because griddata() will use bilinear interpolation, which needs points bounding the interpolants in each dimension (2*2 = 4).
import numpy as np

# invec - input data points, an Nx2 numpy array
# mvec  - interpolation points, an Mx2 numpy array
# just some random points for this example
N = 100
invec = 10 * np.random.random([N, 2])
M = 50
mvec = 10 * np.random.random([M, 2])

# --- here you would put your griddata() call, returning interpolated_values
interpolated_values = np.zeros(M)

NODATA = np.nan
radius = 5.0
neighbors = 4
for m in range(M):
    # boolean mask: which data points lie within `radius` of interpolation point m?
    data_in_radius = np.sqrt(np.sum((invec - mvec[m])**2, axis=1)) <= radius
    if np.sum(data_in_radius) < neighbors:
        interpolated_values[m] = NODATA
Edit:
OK, I re-read the question and noticed the input really is 2D; the example has been modified accordingly.
Just as an additional comment: this could be greatly accelerated if you first build a coarse mapping from each point mvec[m] to a subset of the relevant data points.
The costliest step in the loop would change from
np.sqrt(np.sum( (invec - mvec[m])**2, axis=1))
to something like
np.sqrt(np.sum( (invec[subset[m]] - mvec[m])**2, axis=1))
There are plenty of ways to build such a mapping, for example a quadtree, a hashing function, or a 2D index. Whether this gives a performance advantage depends on the application, how your data is structured, etc. A sketch with a k-d tree follows.
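For illustration, SciPy's cKDTree can serve as that index (a sketch reusing invec, mvec, M, radius, neighbors, NODATA and interpolated_values from above):

from scipy.spatial import cKDTree

tree = cKDTree(invec)  # build the index over the data points once
for m in range(M):
    subset = tree.query_ball_point(mvec[m], r=radius)  # indices of data points within the radius
    if len(subset) < neighbors:
        interpolated_values[m] = NODATA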
