I am trying to compute the minimum distance between a set of points and a set of polygons.
My code looks like this:
polys = sf.geometry.tolist()
cities = sf.CITY_LABEL.tolist()
min_dist = np.empty(len(points))
min_city = ['NA'] * len(points)
min_coord = ['NA'] * len(points)
inside = ['NA'] * len(points)
for i, point in enumerate(points):
    aux = sf.boundary.distance(point).tolist()
    idx = aux.index(min(aux))
    min_dist[i] = aux[idx]
    min_city[i] = cities[idx]
    min_coord[i] = polys[idx].boundary.interpolate(polys[idx].boundary.project(point)).wkt
    inside[i] = polys[idx].contains(point)
where the variable points contains the points and the variable sf is my shapefile with the polygons.
I then save to a file the min distance in degrees, km (deg*111), closest polygon name (a city), the closest polygon point, and whether the point is inside the polygon. So, I have something like this:
But the minimum distance I computed (column D) is larger than the distance between the point (column A) and the polygon closest point (column F).
Any idea what I am doing wrong? Why is the distance function returning a different distance than the distance between columns A and F?
(I also replicated the code in R, and indeed the minimum distance I get is the one computed between column A and F of the file above, not the one in column D)
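For reference, the two quantities being compared can be checked in isolation with a small helper (hypothetical, plain shapely; not part of the original code). For the same polygon, both numbers should agree:

def check_min_distance(point, poly):
    # Distance reported directly by shapely
    d_direct = poly.boundary.distance(point)
    # Distance to the explicitly interpolated nearest boundary point (column F above)
    nearest_on_boundary = poly.boundary.interpolate(poly.boundary.project(point))
    d_interp = point.distance(nearest_on_boundary)
    return d_direct, d_interp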
I am using shapely in python and trying to generate evenly spaced points in a grid that fall within a shape in the fastest O(n) time. The shape may be any closed polygon, not just a square or circle. My current approach is:
Find min/max y & x to build a rectangle.
Build a grid of points given a spacing parameter (resolution)
Verify one-by-one if the points fall within the shape.
Is there a faster way to do this?
# determine maximum edges
polygon = shape(geojson['features'][i]['geometry'])
latmin, lonmin, latmax, lonmax = polygon.bounds
# construct a rectangular mesh
points = []
for lat in np.arange(latmin, latmax, resolution):
    for lon in np.arange(lonmin, lonmax, resolution):
        points.append(Point((round(lat, 4), round(lon, 4))))
# validate if each point falls inside shape
valid_points.extend([i for i in points if polygon.contains(i)])
I saw that you answered your own question (and seem to be happy with using intersection), but also note that shapely (and the underlying geos library) has prepared geometries for more efficient batch operations on some predicates (contains, contains_properly, covers, and intersects).
See Prepared geometry operations.
Adapted from the code in your question, it could be used like so:
from shapely.prepared import prep
# determine maximum edges
polygon = shape(geojson['features'][i]['geometry'])
latmin, lonmin, latmax, lonmax = polygon.bounds
# create prepared polygon
prep_polygon = prep(polygon)
# construct a rectangular mesh
points = []
for lat in np.arange(latmin, latmax, resolution):
    for lon in np.arange(lonmin, lonmax, resolution):
        points.append(Point((round(lat, 4), round(lon, 4))))
# validate if each point falls inside shape using
# the prepared polygon
valid_points.extend(filter(prep_polygon.contains, points))
The best I can think of is to do this:
X, Y = np.meshgrid(np.arange(latmin, latmax, resolution),
                   np.arange(lonmin, lonmax, resolution))
# create an iterable of Point geometries from the (x, y) coordinates
points = [Point(x, y) for x, y in zip(X.flatten(), Y.flatten())]
valid_points.extend([p for p in points if polygon.contains(p)])
If you want to generate n points in a shapely.geometry.Polygon, there is a simple iterative function to do it. Adjust the tol (tolerance) argument to speed up the point generation.
import numpy as np
from shapely.geometry import Point, Polygon
def gen_n_point_in_polygon(n_point, polygon, tol=0.1):
    """
    -----------
    Description
    -----------
    Generate n regularly spaced points within a shapely Polygon geometry
    -----------
    Parameters
    -----------
    - n_point (int) : number of points required
    - polygon (shapely.geometry.polygon.Polygon) : Polygon geometry
    - tol (float) : spacing tolerance (Default is 0.1)
    -----------
    Returns
    -----------
    - points (list) : generated point geometries
    -----------
    Examples
    -----------
    >>> geom_pts = gen_n_point_in_polygon(200, polygon)
    >>> points_gs = gpd.GeoSeries(geom_pts)
    >>> points_gs.plot()
    """
    # Get the bounds of the polygon
    minx, miny, maxx, maxy = polygon.bounds
    # ---- Initialize spacing and point counter
    spacing = polygon.area / n_point
    point_counter = 0
    # Start while loop to find the better spacing according to the tolerance increment
    while point_counter <= n_point:
        # ---- Generate grid point coordinates
        x = np.arange(np.floor(minx), int(np.ceil(maxx)), spacing)
        y = np.arange(np.floor(miny), int(np.ceil(maxy)), spacing)
        xx, yy = np.meshgrid(x, y)
        # ---- Build point geometries
        pts = [Point(X, Y) for X, Y in zip(xx.ravel(), yy.ravel())]
        # ---- Keep only points in polygon
        points = [pt for pt in pts if pt.within(polygon)]
        # ---- Verify number of points generated
        point_counter = len(points)
        spacing -= tol
    # ---- Return
    return points
Oh why hell yes. Use the intersection method of shapely.
polygon = shape(geojson['features'][i]['geometry'])
lonmin, latmin, lonmax, latmax = polygon.bounds
# construct rectangle of points
x, y = np.round(np.meshgrid(np.arange(lonmin, lonmax, resolution), np.arange(latmin, latmax, resolution)),4)
points = MultiPoint(list(zip(x.flatten(),y.flatten())))
# validate each point falls inside shapes
valid_points.extend(list(points.intersection(polygon)))
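One caveat (an assumption about newer library versions): in Shapely 2.x, multi-part geometries are no longer directly iterable, and the intersection may collapse to a single Point, so the last line would need something like:

result = points.intersection(polygon)
valid_points.extend(result.geoms if hasattr(result, "geoms") else [result])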
GeoPandas uses shapely under the hood. To get the nearest neighbor I saw the use of nearest_points from shapely. However, this approach does not include k-nearest points.
I needed to compute distances to nearest points between two GeoDataFrames and insert the distance into the GeoDataFrame containing the "from this point" data.
This is my approach using GeoSeries.distance() without using another package or library. Note that when k == 1 the returned value essentially shows the distance to the nearest point. There is also a GeoPandas-only solution for nearest point by #cd98 which inspired my approach.
This works well for my data, but I wonder if there is a better or faster approach or another benefit to use shapely or sklearn.neighbors?
import pandas as pd
import geopandas as gp
gdf1 > GeoDataFrame with point type geometry column - distance from this point
gdf2 > GeoDataFrame with point type geometry column - distance to this point
def knearest(from_points, to_points, k):
    distlist = to_points.distance(from_points)
    distlist.sort_values(ascending=True, inplace=True)  # To have the closest ones first
    return distlist[:k].mean()

# looping through a list of nearest points
for Ks in [1, 2, 3, 4, 5, 10]:
    name = 'dist_to_closest_' + str(Ks)  # to set column name
    gdf1[name] = gdf1.geometry.apply(knearest, args=(gdf2, Ks))
Yes there is, but first I must credit the University of Helsinki's Automating GIS Processes course; here's the source code. Here's how:
First, read the data; in this example, we find the nearest bus stop for each building.
# Filepaths
stops = gpd.read_file('data/pt_stops_helsinki.gpkg')
buildings = read_gdf_from_zip('data/building_points_helsinki.zip')
define the function, here, you can adjust the k_neighbors
from sklearn.neighbors import BallTree
import numpy as np
def get_nearest(src_points, candidates, k_neighbors=1):
    """Find nearest neighbors for all source points from a set of candidate points"""
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='haversine')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]
    # Return indices and distances
    return (closest, closest_dist)
def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
    """
    For each point in left_gdf, find closest point in right GeoDataFrame and return them.
    NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
    """
    left_geom_col = left_gdf.geometry.name
    right_geom_col = right_gdf.geometry.name
    # Ensure that index in right gdf is formed of sequential numbers
    right = right_gdf.copy().reset_index(drop=True)
    # Parse coordinates from points and insert them into a numpy array as RADIANS
    left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    # Find the nearest points
    # -----------------------
    # closest ==> index in right_gdf that corresponds to the closest point
    # dist ==> distance between the nearest neighbors (in meters)
    closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)
    # Return points from right GeoDataFrame that are closest to points in left GeoDataFrame
    closest_points = right.loc[closest]
    # Ensure that the index corresponds the one in left_gdf
    closest_points = closest_points.reset_index(drop=True)
    # Add distance if requested
    if return_dist:
        # Convert to meters from radians
        earth_radius = 6371000  # meters
        closest_points['distance'] = dist * earth_radius
    return closest_points
Do the nearest neighbours analysis
# Find closest public transport stop for each building and get also the distance based on haversine distance
# Note: haversine distance which is implemented here is a bit slower than using e.g. 'euclidean' metric
# but useful as we get the distance between points in meters
closest_stops = nearest_neighbor(buildings, stops, return_dist=True)
Now join the from and to data frames:
# Rename the geometry of closest stops gdf so that we can easily identify it
closest_stops = closest_stops.rename(columns={'geometry': 'closest_stop_geom'})
# Merge the datasets by index (for this, it is good to use '.join()' -function)
buildings = buildings.join(closest_stops)
The answer above using Automating GIS-processes is really nice, but there is an error when converting the points to a numpy array as RADIANS: the latitude and longitude are reversed.
left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.y * np.pi / 180, geom.x * np.pi / 180)).to_list())
Indeed, Points are given as (lat, lon), but the longitude corresponds to the x-axis of a plane or a sphere and the latitude to the y-axis.
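The same swap applies to the right-hand array (mirroring the corrected line above):
right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.y * np.pi / 180, geom.x * np.pi / 180)).to_list())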
If your data are in grid coordinates, then the approach is a bit leaner, but with one key gotcha.
Building on sutan's answer and streamlining the block from the Uni Helsinki...
To get multiple neighbors, you edit the k_neighbors argument... and must ALSO hard-code variables within the body of the function (see my additions below 'closest' and 'closest_dist') AND add them to the return statement.
Thus, if you want the 2 closest points, it looks like:
from sklearn.neighbors import BallTree
import numpy as np
def get_nearest(src_points, candidates, k_neighbors=2):
    """
    Find nearest neighbors for all source points from a set of candidate points
    modified from: https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html
    """
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='euclidean')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]
    closest_second = indices[1]         # *manually add per comment above*
    closest_second_dist = distances[1]  # *manually add per comment above*
    # Return indices and distances
    return (closest, closest_dist, closest_second, closest_second_dist)
The inputs are lists of (x,y) tuples. Thus, since (by question title) your data is in a GeoDataframe:
# easier to read
in_pts = [(row.geometry.x, row.geometry.y) for idx, row in gdf1.iterrows()]
qry_pts = [(row.geometry.x, row.geometry.y) for idx, row in gdf2.iterrows()]
# faster (by about 7X)
in_pts = [(x,y) for x,y in zip(gdf1.geometry.x , gdf1.geometry.y)]
qry_pts = [(x,y) for x,y in zip(gdf2.geometry.x , gdf2.geometry.y)]
I'm not interested in distances, so instead of commenting them out of the function, I run:
idx_nearest, _, idx_2ndnearest, _ = get_nearest(in_pts, qry_pts)
and get two arrays of the same length of in_pts that, respectively, contain index values of the closest and second closest points from the original geodataframe for qry_pts.
Great solution! If you are using the Automating GIS-processes solution, make sure to reset the index of the buildings GeoDataFrame before the join (only if you are using a subset of left_gdf).
buildings.insert(0, 'Number', range(0,len(buildings)))
buildings.set_index('Number' , inplace = True)
Based on the answers above, here is an all-in-one solution which takes two geopandas.GeoDataFrames as input and searches for the k nearest neighbors.
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

def get_nearest_neighbors(gdf1, gdf2, k_neighbors=2):
    '''
    Find k nearest neighbors for all source points from a set of candidate points
    modified from: https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html

    Parameters
    ----------
    gdf1 : geopandas.GeoDataFrame
        Geometries to search from.
    gdf2 : geopandas.GeoDataFrame
        Geometries to be searched.
    k_neighbors : int, optional
        Number of nearest neighbors. The default is 2.

    Returns
    -------
    gdf_final : geopandas.GeoDataFrame
        gdf1 with distance, index and all other columns from gdf2.
    '''
    src_points = [(x, y) for x, y in zip(gdf1.geometry.x, gdf1.geometry.y)]
    candidates = [(x, y) for x, y in zip(gdf2.geometry.x, gdf2.geometry.y)]
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='euclidean')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    closest_gdfs = []
    for k in np.arange(k_neighbors):
        gdf_new = gdf2.iloc[indices[k]].reset_index()
        gdf_new['distance'] = distances[k]
        gdf_new = gdf_new.add_suffix(f'_{k+1}')
        closest_gdfs.append(gdf_new)
    closest_gdfs.insert(0, gdf1)
    gdf_final = pd.concat(closest_gdfs, axis=1)
    return gdf_final
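A minimal usage sketch (assuming gdf1 and gdf2 are point GeoDataFrames in the same projected CRS):
nearest = get_nearest_neighbors(gdf1, gdf2, k_neighbors=3)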
I have a set of points in an example ASCII file showing a 2D image.
I would like to estimate the total area that these points are filling. There are some places inside this plane that are not filled by any point because these regions have been masked out. What I guess might be practical for estimating the area would be applying a concave hull or alpha shapes.
I tried this approach to find an appropriate alpha value, and consequently estimate the area.
from shapely.ops import cascaded_union, polygonize
import shapely.geometry as geometry
from scipy.spatial import Delaunay
import numpy as np
import pylab as pl
from descartes import PolygonPatch
from matplotlib.collections import LineCollection
def plot_polygon(polygon):
    fig = pl.figure(figsize=(10,10))
    ax = fig.add_subplot(111)
    margin = .3
    x_min, y_min, x_max, y_max = polygon.bounds
    ax.set_xlim([x_min-margin, x_max+margin])
    ax.set_ylim([y_min-margin, y_max+margin])
    patch = PolygonPatch(polygon, fc='#999999',
                         ec='#000000', fill=True,
                         zorder=-1)
    ax.add_patch(patch)
    return fig
def alpha_shape(points, alpha):
    if len(points) < 4:
        # When you have a triangle, there is no sense
        # in computing an alpha shape.
        return geometry.MultiPoint(list(points)).convex_hull

    def add_edge(edges, edge_points, coords, i, j):
        """
        Add a line between the i-th and j-th points,
        if not in the list already
        """
        if (i, j) in edges or (j, i) in edges:
            # already added
            return
        edges.add((i, j))
        edge_points.append(coords[[i, j]])

    coords = np.array([point.coords[0] for point in points])
    tri = Delaunay(coords)
    edges = set()
    edge_points = []
    # loop over triangles:
    # ia, ib, ic = indices of corner points of the triangle
    for ia, ib, ic in tri.vertices:
        pa = coords[ia]
        pb = coords[ib]
        pc = coords[ic]
        # Lengths of sides of triangle
        a = np.sqrt((pa[0]-pb[0])**2 + (pa[1]-pb[1])**2)
        b = np.sqrt((pb[0]-pc[0])**2 + (pb[1]-pc[1])**2)
        c = np.sqrt((pc[0]-pa[0])**2 + (pc[1]-pa[1])**2)
        # Semiperimeter of triangle
        s = (a + b + c) / 2.0
        # Area of triangle by Heron's formula
        area = np.sqrt(s*(s-a)*(s-b)*(s-c))
        circum_r = a*b*c/(4.0*area)
        # Here's the radius filter.
        # print(circum_r)
        if circum_r < 1.0/alpha:
            add_edge(edges, edge_points, coords, ia, ib)
            add_edge(edges, edge_points, coords, ib, ic)
            add_edge(edges, edge_points, coords, ic, ia)
    m = geometry.MultiLineString(edge_points)
    triangles = list(polygonize(m))
    return cascaded_union(triangles), edge_points
points = []
with open("test.asc") as f:
    for line in f:
        coords = list(map(float, line.split(" ")))
        points.append(geometry.shape(geometry.Point(coords[0], coords[1])))
        print(geometry.Point(coords[0], coords[1]))
x = [p.x for p in points]
y = [p.y for p in points]
pl.figure(figsize=(10,10))
point_collection = geometry.MultiPoint(list(points))
point_collection.envelope
convex_hull_polygon = point_collection.convex_hull
_ = plot_polygon(convex_hull_polygon)
_ = pl.plot(x,y,'o', color='#f16824')
concave_hull, edge_points = alpha_shape(points, alpha=0.001)
lines = LineCollection(edge_points)
_ = plot_polygon(concave_hull)
_ = pl.plot(x,y,'o', color='#f16824')
I get this result, but I would like this method to detect the hole in the middle.
Update
This is what my real data looks like:
My question is: what is the best way to estimate the area of the aforementioned shape? I cannot figure out why this code doesn't work properly. Any help will be appreciated.
Okay, here's the idea. A Delaunay triangulation is going to generate triangles which are indiscriminately large. It's also going to be problematic because only triangles will be generated.
Therefore, we'll generate what you might call a "fuzzy Delaunay triangulation". We'll put all the points into a kd-tree and, for each point p, look at its k nearest neighbors. The kd-tree makes this fast.
For each of those k neighbors, find the distance to the focal point p. Use this distance to generate a weighting. We want nearby points to be favored over more distant points, so an exponential function exp(-alpha*dist) is appropriate here. Use the weighted distances to build a probability density function describing the probability of drawing each point.
Now, draw from that distribution a large number of times. Nearby points will be chosen often while farther away points will be chosen less often. For each point drawn, make a note of how many times it was drawn for the focal point. The result is a weighted graph where each edge in the graph connects nearby points and is weighted by how often the pairs were chosen.
Now, cull all edges from the graph whose weights are too small. These are the points which are probably not connected. The result looks like this:
Now, let's throw all of the remaining edges into shapely. We can then convert the edges into very small polygons by buffering them. Like so:
Differencing the polygons with a large polygon covering the entire region will yield polygons for the triangulation. THIS MAY TAKE A WHILE. The result looks like this:
Finally, cull off all of the polygons which are too large:
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
import random
import scipy
import scipy.spatial
import networkx as nx
import shapely
import shapely.geometry
import matplotlib
dat = np.loadtxt('test.asc')
xycoors = dat[:,0:2]
xcoors = xycoors[:,0] #Convenience alias
ycoors = xycoors[:,1] #Convenience alias
npts = len(dat[:,0]) #Number of points
dist = scipy.spatial.distance.euclidean
def GetGraph(xycoors, alpha=0.0035):
    kdt = scipy.spatial.KDTree(xycoors)  # Build kd-tree for quick neighbor lookups
    G = nx.Graph()
    npts = np.max(xycoors.shape)
    for x in range(npts):
        G.add_node(x)
        dist, idx = kdt.query(xycoors[x,:], k=10)  # Get distances to the 10 nearest neighbours (the first is the point itself)
        dist = dist[1:]  # Drop central point
        idx = idx[1:]    # Drop central point
        pq = np.exp(-alpha*dist)  # Exponential weighting of nearby points
        pq = pq/np.sum(pq)        # Convert to a PDF
        choices = np.random.choice(idx, p=pq, size=50)  # Choose neighbors based on PDF
        for c in choices:  # Insert neighbors into graph
            if G.has_edge(x, c):        # Already seen neighbor
                G[x][c]['weight'] += 1  # Strengthen connection
            else:
                G.add_edge(x, c, weight=1)  # New neighbor; build connection
    return G
def PruneGraph(G, cutoff):
    newg = G.copy()
    bad_edges = set()
    for x in newg:
        for k, v in newg[x].items():
            if v['weight'] < cutoff:
                bad_edges.add((x, k))
    for b in bad_edges:
        try:
            newg.remove_edge(*b)
        except nx.exception.NetworkXError:
            pass
    return newg
def PlotGraph(xycoors, G, cutoff=6):
    xcoors = xycoors[:,0]
    ycoors = xycoors[:,1]
    G = PruneGraph(G, cutoff)
    plt.plot(xcoors, ycoors, "o")
    for x in range(npts):
        for k, v in G[x].items():
            plt.plot((xcoors[x],xcoors[k]), (ycoors[x],ycoors[k]), 'k-', lw=1)
    plt.show()
def GetPolys(xycoors, G):
    # Get lines connecting all points in the graph
    xcoors = xycoors[:,0]
    ycoors = xycoors[:,1]
    lines = []
    for x in range(npts):
        for k, v in G[x].items():
            lines.append(((xcoors[x],ycoors[x]), (xcoors[k],ycoors[k])))
    # Get bounds of region
    xmin = np.min(xycoors[:,0])
    xmax = np.max(xycoors[:,0])
    ymin = np.min(xycoors[:,1])
    ymax = np.max(xycoors[:,1])
    mls = shapely.geometry.MultiLineString(lines)      # Bundle the lines
    mlsb = mls.buffer(2)                                # Turn lines into narrow polygons
    bbox = shapely.geometry.box(xmin,ymin,xmax,ymax)    # Generate background polygon
    polys = bbox.difference(mlsb)                       # Subtract to generate polygons
    return polys
def PlotPolys(polys, area_cutoff):
    fig, ax = plt.subplots(figsize=(8, 8))
    for polygon in polys:
        if polygon.area < area_cutoff:
            mpl_poly = matplotlib.patches.Polygon(np.array(polygon.exterior.coords), alpha=0.4, facecolor=np.random.rand(3))
            ax.add_patch(mpl_poly)
    ax.autoscale()
    fig.show()
#Functional stuff starts here
G = GetGraph(xycoors, alpha=0.0035)
#Choose a value that rips off an appropriate amount of the left side of this histogram
weights = sorted([v['weight'] for x in G for k,v in G[x].items()])
plt.hist(weights, bins=20);plt.show()
PlotGraph(xycoors,G,cutoff=6) #Plot the graph to ensure our cut-offs were okay. May take a while
prunedg = PruneGraph(G,cutoff=6) #Prune the graph
polys = GetPolys(xycoors,prunedg) #Get polygons from graph
areas = sorted(p.area for p in polys)
plt.plot(areas)
plt.hist(areas,bins=20);plt.show()
area_cutoff = 150000
PlotPolys(polys,area_cutoff=area_cutoff)
good_polys = ([p for p in polys if p.area<area_cutoff])
total_area = sum([p.area for p in good_polys])
Here's a thought: use k-means clustering.
You can accomplish this in Python as follows:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
dat = np.loadtxt('test.asc')
xycoors = dat[:,0:2]
fit = KMeans(n_clusters=2).fit(xycoors)
plt.scatter(dat[:,0],dat[:,1], c=fit.labels_)
plt.axes().set_aspect('equal', 'datalim')
plt.gray()
plt.show()
Using your data, this gives the following result:
Now, you can take the convex hull of the top cluster and the bottom cluster and calculate the areas of each separately. Adding the areas then becomes an estimator of the area of their union, but, cunningly, avoids the hole in the middle.
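A minimal sketch of that hull-area estimate (illustrative only; it reuses the fit and xycoors from the code above):

from shapely.geometry import MultiPoint

# Sum the convex hull areas of the clusters; the hole between them is not counted
hull_area = sum(MultiPoint(xycoors[fit.labels_ == k].tolist()).convex_hull.area
                for k in np.unique(fit.labels_))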
To fine-tune your results, you can play with the number of clusters and the number of different starts to the algorithm (the algorithm is randomized and is typically run more than once).
You asked, for instance, if two clusters will always leave the hole in the middle. I've used the following code to experiment with that. I generate a uniform distribution of points and then chop out a randomly sized and orientated ellipse to simulate a hole.
#!/usr/bin/env python3
import sklearn
import sklearn.cluster
import numpy as np
import matplotlib.pyplot as plt
PWIDTH = 6
PHEIGHT = 6
def GetPoints(num):
    return np.random.rand(num,2)*300-150  # Centered about zero

def MakeHole(pts):  # Chop out a randomly orientated and sized ellipse
    a = np.random.uniform(10,150)     # Semi-major axis
    b = np.random.uniform(10,150)     # Semi-minor axis
    h = np.random.uniform(-150,150)   # X-center
    k = np.random.uniform(-150,150)   # Y-center
    A = np.random.uniform(0,2*np.pi)  # Angle of rotation
    surviving_points = []
    for pt in range(pts.shape[0]):
        x = pts[pt,0]
        y = pts[pt,1]
        if ((x-h)*np.cos(A)+(y-k)*np.sin(A))**2/a/a+((x-h)*np.sin(A)-(y-k)*np.cos(A))**2/b/b>1:
            surviving_points.append(pt)
    return pts[surviving_points,:]
def ShowManyClusters(pts, fitter, clusters, title):
    colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
    fig, axs = plt.subplots(PWIDTH, PHEIGHT)
    axs = axs.ravel()
    for i in range(PWIDTH*PHEIGHT):
        lbls = fitter(pts[i], clusters)
        axs[i].scatter(pts[i][:,0], pts[i][:,1], c=colors[lbls])
        axs[i].get_xaxis().set_ticks([])
        axs[i].get_yaxis().set_ticks([])
    plt.suptitle(title)
    #plt.show()
    plt.savefig('/z/'+title+'.png')
fitters = {
    'SpectralClustering': lambda x,clusters: sklearn.cluster.SpectralClustering(n_clusters=clusters,affinity='nearest_neighbors').fit(x).labels_,
    'KMeans': lambda x,clusters: sklearn.cluster.KMeans(n_clusters=clusters).fit(x).labels_,
    'AffinityPropagation': lambda x,clusters: sklearn.cluster.AffinityPropagation().fit(x).labels_,
}
np.random.seed(1)
pts = []
for i in range(PWIDTH*PHEIGHT):
    temp = GetPoints(300)
    temp = MakeHole(temp)
    pts.append(temp)

for name, fitter in fitters.items():
    for clusters in [2,3]:
        np.random.seed(1)
        ShowManyClusters(pts, fitter, clusters, "{0}: {1} clusters".format(name, clusters))
Consider the results for K-Means:
At least to my eye, it seems as though using two clusters performs worst when the "hole" separates the data into two separate blobs. (In this case that occurs when the ellipse is orientated such that it overlaps two edges of the rectangular region containing the sample points.) Using three clusters resolves most of these difficulties.
You'll also notice that K-means produces some counter-intuitive results on the 1st Column, 3rd Row as well as on the 3rd Column, 4th Row. Reviewing sklearn's menagerie of clustering methods here shows the following comparison image:
From this image, it seems as though SpectralClustering produces results that align with what we want. Trying this on the same data above fixes the problems mentioned (see 1st Column, 3rd Row and 3rd Column, 4th Row).
The foregoing suggests that Spectral clustering with three clusters should be adequate for most situations of this sort.
Although you seem intent on doing a concave shape, here is an alternate route that is hella fast and I think would give you a pretty stable reading:
Create a function which takes as an argument (int radiusOfInfluence). Inside the function run a voxel filter with that as the radius. Then simply multiply the area of that circle (pi*AOI^2) by the number of remaining points in the cloud. This should give you a relatively robust estimation of area and would be very resilient to holes and weird edges.
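A rough numpy sketch of that idea (the grid-based "voxel" filter and the function name are mine, purely illustrative):

import numpy as np

def voxel_area_estimate(points_xy, radius_of_influence):
    pts = np.asarray(points_xy)
    # Snap each point to a grid cell of size radius_of_influence and keep one point per cell
    occupied_cells = np.unique(np.floor(pts / radius_of_influence).astype(int), axis=0)
    # Area of one circle of influence times the number of remaining points
    return len(occupied_cells) * np.pi * radius_of_influence ** 2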
Some things to consider:
-This will give you a positive overshoot of area due to over-reaching edges by exactly one radius. A modification to adjust for this could be to run a statistical outlier removal filter (in inverse mode) to acquire statistical edge points. Then an assumption can be made that approximately half of each edge point is lying outside the shape, subtract half the number of points found from your total count prior to multiplying into area.
-The radius of influence largely determines this function's hole detection as a larger one will allow single points to cover larger areas, but also by tuning the std cutoff on the stat outlier filter, you can more aggressively detect interior holes and adjust your area that way.
It really begs the question of what you are after, as this is more of a shot-accuracy / shot-grouping type assessment, assuming a reasonably distributed set of samples. Your method is kind of making the assumption that your outer edge points are the absolute limits of what is possible (which may be a fair assumption depending on the situation).
EDIT-----------------------
I do not have time to write out example code, but I can further explain to aid in understanding.
At the core of this is the voxel filter. Very simply, it sets a seed point in x,y coordinates and then creates a grid over the whole space which has units (grid spacing) on both axes of a user specified filter radius. Inside each grid box, it will average all points to a single point. This is very important for this concept because it almost entirely eliminates the issue of overlap.
The second part (the inverse stat outlier removal) is just a bit of cleverness to tighten your edge fit. Basically, stat outlier is built to remove noise by looking at the distance from each point to its (k) nearest neighbors. After generating the average distance to the k nearest neighbors for each point, it sets up a histogram, and a user-defined parameter acts as a binary threshold for keeping or removing points. When inverted and set to a reasonable cut-off (~0.75 std should work), it will instead delete all the points that are in the bulk of the object (i.e. only leaving edge points). The reason this is important is that technically these points over-reach the boundary of your object by 1 radius. Although some will be on acute and some on obtuse edge angles (i.e. more than or less than half a circle of overfill), taking off 1/2 of a circle area per point should, over the whole object, give you a pretty sound improvement on edge fit.
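For the inverse statistical-outlier step, a hedged sketch of the k-nearest-neighbour distance threshold (the helper name and scipy usage are my assumptions, not part of the original description):

import numpy as np
from scipy.spatial import cKDTree

def edge_points_by_knn(points_xy, k=8, std_cutoff=0.75):
    pts = np.asarray(points_xy)
    dists, _ = cKDTree(pts).query(pts, k=k + 1)  # first column is the point itself
    mean_knn = dists[:, 1:].mean(axis=1)
    threshold = mean_knn.mean() + std_cutoff * mean_knn.std()
    # "Inverse" mode: keep points whose neighbourhoods are unusually sparse, i.e. likely edge points
    return pts[mean_knn > threshold]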
Keep in mind though that at the end of the day, this is just going to give you a number. As far as stress testing, I suggest creating contrived point clouds of known area and or creating a graphical output that shows where you are dropping circles and half circles (oriented towards the interior of the object if you are fancy).
The knobs you will want to turn to improve this method are:
Voxel filter radius, area of influence per point (could actually be controlled separately from the voxel filter radius, though they should remain pretty close to one another), and the std cut-off.
Hope this helped to clarify, cheers!
Edit:
I have noticed that you have your own code to compute the alpha shape,
and the areas of Delaunay triangles are just there, so computing the area of the shape is even easier...
Just add up the areas of the triangles that get added to the alpha-shape polygon.
If you want to detect holes... add a secondary threshold to avoid adding triangles with an area greater than the threshold. For this example, a value of max_area = 99999 will remove the hole.
The only problem is the way you create the graphic output, because you will not see the hole.
def alpha_shape(points, alpha, max_area):
    if len(points) < 4:
        # When you have a triangle, there is no sense
        # in computing an alpha shape.
        return geometry.MultiPoint(list(points)).convex_hull, 0

    def add_edge(edges, edge_points, coords, i, j):
        """
        Add a line between the i-th and j-th points,
        if not in the list already
        """
        if (i, j) in edges or (j, i) in edges:
            # already added
            return
        edges.add((i, j))
        edge_points.append(coords[[i, j]])

    coords = np.array([point.coords[0] for point in points])
    tri = Delaunay(coords)
    total_area = 0
    edges = set()
    edge_points = []
    # loop over triangles:
    # ia, ib, ic = indices of corner points of the triangle
    for ia, ib, ic in tri.vertices:
        pa = coords[ia]
        pb = coords[ib]
        pc = coords[ic]
        # Lengths of sides of triangle
        a = np.sqrt((pa[0]-pb[0])**2 + (pa[1]-pb[1])**2)
        b = np.sqrt((pb[0]-pc[0])**2 + (pb[1]-pc[1])**2)
        c = np.sqrt((pc[0]-pa[0])**2 + (pc[1]-pa[1])**2)
        # Semiperimeter of triangle
        s = (a + b + c) / 2.0
        # Area of triangle by Heron's formula
        area = np.sqrt(s*(s-a)*(s-b)*(s-c))
        circum_r = a*b*c/(4.0*area)
        # Here's the radius filter.
        # print("radius", circum_r)
        if circum_r < 1.0/alpha and area < max_area:
            add_edge(edges, edge_points, coords, ia, ib)
            add_edge(edges, edge_points, coords, ib, ic)
            add_edge(edges, edge_points, coords, ic, ia)
            total_area += area
    m = geometry.MultiLineString(edge_points)
    triangles = list(polygonize(m))
    return cascaded_union(triangles), edge_points, total_area
Old answer:
To compute the area of an irregular simple polygon, you can use the Shoelace formula, and the CCW coordinates of the boundary as input.
If you want to detect holes inside of your cloud, you have to remove the Delaunay triangles with a circumradius greater than a secondary threshold.
The idea is: compute the Delaunay triangulation and filter with your current alpha shape. Then, compute the circumradius of every triangle and remove those triangles whose circumradius is much bigger than the average circumradius.
To compute the area of an irregular polygon with holes, use the Shoelace formula for each hole boundary. Input the external boundary in CCW (positive) order to obtain the area. Then input the boundary of each hole in CW (negative) order, to obtain a (negative) value for area.
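For reference, a minimal numpy sketch of the Shoelace formula (signed area: positive for CCW input, negative for CW), so summing the exterior ring with the hole rings gives the net area described above:

import numpy as np

def shoelace_area(coords):
    # coords is a sequence of (x, y) vertices of one ring
    x, y = np.asarray(coords, dtype=float).T
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)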
This question is related to Transformation between two set of points. However, this one is better specified, and some assumptions have been added.
I have an element image and a model.
I've detected contours on both
contoursModel0, hierarchyModel = cv2.findContours(model.copy(), cv2.RETR_LIST,
                                                  cv2.CHAIN_APPROX_SIMPLE)
contoursModel = [cv2.approxPolyDP(cnt, 2, True) for cnt in contoursModel0]
contours0, hierarchy = cv2.findContours(canny.copy(), cv2.RETR_LIST,
                                        cv2.CHAIN_APPROX_SIMPLE)
contours = [cv2.approxPolyDP(cnt, 2, True) for cnt in contours0]
Then I matched each contour against each other:
modelMassCenters = []
imageMassCenters = []
for cnt in contours:
    for cntModel in contoursModel:
        result = cv2.matchShapes(cnt, cntModel, cv2.cv.CV_CONTOURS_MATCH_I1, 0)
        if result != 0:
            if result < 0.05:
                # Here are matched contours
                momentsModel = cv2.moments(cntModel)
                momentsImage = cv2.moments(cnt)
                massCenterModel = (momentsModel['m10'] / momentsModel['m00'],
                                   momentsModel['m01'] / momentsModel['m00'])
                massCenterImage = (momentsImage['m10'] / momentsImage['m00'],
                                   momentsImage['m01'] / momentsImage['m00'])
                modelMassCenters.append(massCenterModel)
                imageMassCenters.append(massCenterImage)
Matched contours are something like features.
Now I want to detect the transformation between these two sets of points.
Assumptions: the element is a rigid body; only rotation, displacement and scale change.
Some features may be misdetected; how do I eliminate them? I've used cv2.findHomography before, and it takes two vectors and calculates a homography between them even when there are some mismatches.
cv2.getAffineTransform takes only three points (it can't cope with mismatches), and here I have multiple features.
The answer to my previous question says how to calculate this transformation but does not handle mismatches. I also think it should be possible to return some quality level from the algorithm (by checking how many points are mismatched after computing the transformation from the rest).
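For reference, a minimal sketch of the RANSAC-based cv2.findHomography call mentioned above (it needs at least four matched pairs; the array names come from the matching code above, and the reprojection threshold is illustrative):

import cv2
import numpy as np

src = np.float32(modelMassCenters).reshape(-1, 1, 2)
dst = np.float32(imageMassCenters).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
inlier_ratio = mask.ravel().mean()  # rough quality measure: share of matches kept as inliers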
And the last question: should I take all of the contour points to compute the transformation, or treat only the mass centers of these shapes as features?
To illustrate, I've added a simple image. Features in green are good matches; those in red are bad matches. Here the match should be computed from the 3 green features, and the red mismatches should affect the match quality.
I'm adding fragments of the solution I've figured out so far (but I think it could be done much better):
for i in range(0, len(modelMassCenters) - 1):
    for j in range(i + 1, len(modelMassCenters)):
        x1, y1 = modelMassCenters[i]
        x2, y2 = modelMassCenters[j]
        modelVec = (x2 - x1, y2 - y1)
        x1, y1 = imageMassCenters[i]
        x2, y2 = imageMassCenters[j]
        imageVec = (x2 - x1, y2 - y1)
        rotation = angle(modelVec, imageVec)
        rotations.append((i, j, rotation))
        scale = length(modelVec) / length(imageVec)
        scales.append((i, j, scale))
After computing the scale and rotation given by each pair of corresponding lines, I'm going to find the median value and then average only the rotations which do not differ from the median by more than some delta. The same goes for scale. Then the points behind the values kept in this computation will be used to compute the displacement.
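Roughly, that median-filtering step could look like this (illustrative helper only, not part of the code above):

import numpy as np

def filter_by_median(values, delta):
    values = np.asarray(values)
    median = np.median(values)
    keep = np.abs(values - median) <= delta
    return values[keep].mean(), keep  # averaged estimate and mask of pairs to reuse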
Your second step (match contours to each other by doing a pairwise shape comparison) sounds very vulnerable to errors if features have a similar shape, e.g., you have several similar-sized circular contours. Yet if you have a rigid body with 5 circular features in one quadrant only, you could get a very robust estimate of the affine transform if you consider the body and its features as a whole. So don't discard information like a feature's range and direction from the center of the whole body when matching features. Those are at least as important in correlating features as size and shape of the individual contour.
I'd try something like (untested pseudocode):
"""
Convert from rectangular (x,y) to polar (r,w)
r = sqrt(x^2 + y^2)
w = arctan(y/x) = [-\pi,\pi]
"""
def polar(x, y):  # w in radians
    from math import hypot, atan2, pi
    return hypot(x, y), atan2(y, x)

model_features = []
model = params(model_body_contour)  # return tuple (center_x, center_y, area)
for contour in model_feature_contours:
    f = params(contour)
    range, angle = polar(f[0]-model[0], f[1]-model[1])
    model_features.append((angle, range, f[2]))

image_features = []
image = params(image_body_contour)
for contour in image_feature_contours:
    f = params(contour)
    range, angle = polar(f[0]-image[0], f[1]-image[1])
    image_features.append((angle, range, f[2]))
# sort image_features and model_features by angle, range
#
# correlate image_features against model_features across angle offsets
# rotation = angle offset of max correlation
# scale = average(model areas and ranges) / average(image areas and ranges)
If you have very challenging images, such as a ring of 6 equally-spaced similar-sized features, 5 of which have the same shape and one is different (e.g. 5 circles and a star), you could add extra parameters such as eccentricity and sharpness to the list of feature parameters, and include them in the correlation when searching for the rotation angle.