Fix up shapely polygon object when discontinuous after map projection

This demo program (intended to be run in an IPython notebook; you need matplotlib, mpl_toolkits.basemap, pyproj, and shapely) is supposed to plot increasingly large circles on the surface of the Earth. It works correctly as long as the circle does not cross over one of the poles. If that happens, the result is complete nonsense when plotted on a map (see cell 2 below).
If I plot them "in a void" instead of on a map (see below cell 3) the results are correct in the sense that, if you removed the horizontal line going from +180 to -180 longitude, the rest of the curve would indeed delimit the boundary between the interior and exterior of the desired circle. However, they are wrong in that the polygon is invalid (.is_valid is False), and much more importantly, the nonzero-winding-number interior of the polygon does not enclose the correct region of the map.
I believe this is happening because shapely.ops.transform is blind to the coordinate singularity at +180==-180 longitude. The question is, how do I detect the problem and repair the polygon, so that it does enclose the correct region of the map? In this case, an appropriate fixup would be to replace the horizontal segment from (X,+180) -- (X,-180) with three lines, (X,+180) -- (+90,+180) -- (+90,-180) -- (X,-180); but note that if the circle had gone over the south pole, the fixup lines would need to go south instead. And if the circle had gone over both poles, we'd have a valid polygon again but its interior would be the complement of what it should be. I need to detect all of these cases and handle them correctly. Also, I do not know how to "edit" a shapely geometry object.
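One way the three cases (no pole, one pole, both poles) might be told apart before looking at the projected polygon is to compare the disk radius with the great-circle distance from the centre to each pole. The sketch below is only an approximation: it assumes a spherical Earth with a mean radius, not the WGS84 ellipsoid used in the notebook, and poles_enclosed and EARTH_RADIUS_M are names made up for this illustration.

```python
import math

# Assumption: spherical Earth with the mean radius; the notebook uses WGS84.
EARTH_RADIUS_M = 6371008.8

def poles_enclosed(lat_deg, radius_m):
    """Return (north, south): whether a disk of radius_m metres centred at
    latitude lat_deg covers the north and the south pole respectively."""
    lat = math.radians(lat_deg)
    dist_north = EARTH_RADIUS_M * (math.pi / 2 - lat)  # arc length to the north pole
    dist_south = EARTH_RADIUS_M * (math.pi / 2 + lat)  # arc length to the south pole
    return radius_m > dist_north, radius_m > dist_south

# Same centre latitude as the demo; radii in kilometres.
for rad_km in (5000, 6000, 15000):
    print(rad_km, poles_enclosed(40.439, rad_km * 1000))
```

Longitude does not enter, because the distance from a point to a pole depends on latitude only. Radii very close to the threshold would need the ellipsoidal distance to be classified reliably.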
Downloadable notebook: https://gist.github.com/zackw/e48cb1580ff37acfee4d0a7b1d43a037
## cell 1
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import pyproj
from shapely.geometry import Point, Polygon, MultiPolygon
from shapely.ops import transform as sh_transform
from functools import partial
wgs84_globe = pyproj.Proj(proj='latlong', ellps='WGS84')
def disk_on_globe(lat, lon, radius):
    aeqd = pyproj.Proj(proj='aeqd', ellps='WGS84', datum='WGS84',
                       lat_0=lat, lon_0=lon)
    return sh_transform(
        partial(pyproj.transform, aeqd, wgs84_globe),
        Point(0, 0).buffer(radius)
    )
## cell 2
def plot_poly_on_map(map_, pol):
    if isinstance(pol, Polygon):
        map_.plot(*(pol.exterior.xy), '-', latlon=True)
    else:
        assert isinstance(pol, MultiPolygon)
        for p in pol:
            map_.plot(*(p.exterior.xy), '-', latlon=True)
plt.figure(figsize=(14, 12))
map_ = Basemap(projection='cyl', resolution='c')
map_.drawcoastlines(linewidth=0.25)
for rad in range(1,10):
    plot_poly_on_map(
        map_,
        disk_on_globe(40.439, -79.976, rad * 1000 * 1000)
    )
plt.show()
## cell 3
def plot_poly_in_void(pol):
    if isinstance(pol, Polygon):
        plt.plot(*(pol.exterior.xy), '-')
    else:
        assert isinstance(pol, MultiPolygon)
        for p in pol:
            # latlon= is a Basemap.plot keyword; plain plt.plot does not accept it
            plt.plot(*(p.exterior.xy), '-')
plt.figure()
for rad in range(1,10):
    plot_poly_in_void(
        disk_on_globe(40.439, -79.976, rad * 1000 * 1000)
    )
plt.show()
(The sunlit region shown at http://www.die.net/earth/rectangular.html is an example of what a circle that crosses a pole should look like when projected onto an equirectangular map, as long as it's not an equinox today.)

Manually fixing up the projected polygon turns out not to be that bad.
There are two steps: first, find all segments of the polygon that cross the coordinate singularity at longitude ±180, and replace them with excursions to either the north or south pole, whichever is nearest; second, if the resulting polygon doesn't contain the origin point, invert it. Note that both steps must be carried out whether or not shapely thinks the projected polygon is "invalid"; depending on where the starting point is, it may cross one or both poles without being invalid.
This probably isn't the most efficient way to do it, but it works.
import numpy as np
import pyproj
from shapely.geometry import Point, Polygon, box as Box
from shapely.ops import transform as sh_transform
from functools import partial
wgs84_globe = pyproj.Proj(proj='latlong', ellps='WGS84')
def disk_on_globe(lat, lon, radius):
    """Generate a shapely.Polygon object representing a disk on the
    surface of the Earth, containing all points within RADIUS meters
    of latitude/longitude LAT/LON."""
    aeqd = pyproj.Proj(proj='aeqd', ellps='WGS84', datum='WGS84',
                       lat_0=lat, lon_0=lon)
    disk = sh_transform(
        partial(pyproj.transform, aeqd, wgs84_globe),
        Point(0, 0).buffer(radius)
    )
    # Fix up segments that cross the coordinate singularity at longitude ±180.
    # We do this unconditionally because it may or may not create a non-simple
    # polygon, depending on where the initial point was.
    boundary = np.array(disk.exterior.coords)
    i = 0
    while i < boundary.shape[0] - 1:
        if abs(boundary[i+1,0] - boundary[i,0]) > 180:
            # Both ends of a crossing segment should be in the same hemisphere.
            assert (boundary[i,1] > 0) == (boundary[i+1,1] > 0)
            vsign = -1 if boundary[i,1] < 0 else 1
            hsign = -1 if boundary[i,0] < 0 else 1
            boundary = np.insert(boundary, i+1, [
                [hsign*179, boundary[i,1]],
                [hsign*179, vsign*89],
                [-hsign*179, vsign*89],
                [-hsign*179, boundary[i+1,1]]
            ], axis=0)
            # Skip the four inserted points and land on the old boundary[i+1].
            i += 5
        else:
            i += 1
    disk = Polygon(boundary)
    # If the fixed-up polygon doesn't contain the origin point, invert it.
    if not disk.contains(Point(lon, lat)):
        disk = Box(-180, -90, 180, 90).difference(disk)
    assert disk.is_valid
    assert disk.boundary.is_simple
    assert disk.contains(Point(lon, lat))
    return disk
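The first step (replacing each segment that jumps across longitude ±180 with an excursion towards a pole) can be looked at in isolation. The following is a minimal pure-Python sketch of that step only, working on a plain list of (lon, lat) tuples; patch_antimeridian and the sample ring are made up for illustration, and the inversion step is not included.

```python
def patch_antimeridian(ring, edge_lon=179.0, edge_lat=89.0):
    """Return a copy of ring (a list of (lon, lat) tuples) in which every
    segment that jumps across longitude +/-180 is replaced by a detour
    along the map edge via the pole of the hemisphere the segment is in."""
    out = [ring[0]]
    for (lon0, lat0), (lon1, lat1) in zip(ring, ring[1:]):
        if abs(lon1 - lon0) > 180:          # the segment wraps around the map
            vsign = -1 if lat0 < 0 else 1   # detour north or south
            hsign = -1 if lon0 < 0 else 1   # which map edge we leave from
            out.extend([
                (hsign * edge_lon, lat0),
                (hsign * edge_lon, vsign * edge_lat),
                (-hsign * edge_lon, vsign * edge_lat),
                (-hsign * edge_lon, lat1),
            ])
        out.append((lon1, lat1))
    return out

# A made-up closed ring that circles the north pole once.
ring = [(170.0, 60.0), (-170.0, 60.0), (-90.0, 30.0),
        (0.0, 30.0), (90.0, 30.0), (170.0, 60.0)]
patched = patch_antimeridian(ring)
for vertex in patched:
    print(vertex)
```

Building a new list avoids the index bookkeeping that np.insert needs, at the cost of copying the ring once.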
The other problem -- mpl_toolkits.basemap.Basemap.plot producing garbage -- is not corrected by fixing up the polygon as above. However, if you manually project the polygon into map coordinates and then draw it using a descartes.PolygonPatch, that works, as long as the projection has a rectangular boundary, and that's enough of a workaround for me. (I think it would work for any projection if one added a lot of extra points along all straight lines at the map boundary.)
%matplotlib inline
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
from descartes import PolygonPatch
plt.figure(figsize=(14, 12))
map_ = Basemap(projection='cea', resolution='c')
map_.drawcoastlines(linewidth=0.25)
for rad in range(3,19,2):
    plt.gca().add_patch(PolygonPatch(
        sh_transform(map_,
                     disk_on_globe(40.439, -79.976, rad * 1000 * 1000)),
        alpha=0.1))
plt.show()

Related

geopandas not recognizing point in polygon

I have two data frames. One has polygons of buildings (around 70K) and the other has points that may or may not be inside the polygons (around 100K). I need to identify whether each point is inside a polygon or not.
When I plot both dataframes (example below), the plot shows that some points are inside the polygons and others are not. However, when I use .within(), the outcome says none of the points are inside polygons.
I recreated the example creating one polygon and one point "by hand" rather than importing the data and in this case .within() does recognize that the point is in the polygon. Therefore, I assume I'm making a mistake but I don't know where.
Example: (I'll just post the part that corresponds to one point and one polygon for simplicity. In this case, each data frame contains either a single point or a single polygon)
1) Using the imported data. The data frame dmR has the points and the data frame dmf has the polygon
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely import wkt
from shapely.geometry import Point, Polygon
plt.style.use("seaborn")
# I'm skipping the data manipulation stage and
# going to the point where the data are used.
print(dmR)
geometry
35 POINT (-95.75207 29.76047)
print(dmf)
geometry
41964 POLYGON ((-95.75233 29.76061, -95.75194 29.760...
# Plot
fig, ax = plt.subplots(figsize=(5,5))
minx, miny, maxx, maxy = ([-95.7525, 29.7603, -95.7515, 29.761])
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)
dmR.plot(ax=ax, c='Red')
dmf.plot(ax=ax, alpha=0.5)
plt.savefig('imported_data.png')
The outcome
shows that the point is inside the polygon. However,
print(dmR.within(dmf))
35 False
41964 False
dtype: bool
2) If I try to recreate this by hand, it would be as follows (there may be a better way to do this but I couldn't figure it out):
# Get the vertices of the polygon to create it by hand
poly1 = dmf['geometry']
g = [i for i in poly1]
x,y = g[0].exterior.coords.xy
x,y
(array('d', [-95.752332508564, -95.75193554162979, -95.75193151831627, -95.75232848525047, -95.752332508564]),
array('d', [29.760606530637265, 29.760607694859385, 29.76044470363038, 29.76044237518235, 29.760606530637265]))
# Create the polygon by hand using the corresponding vertices
coords = [(-95.752332508564, 29.760606530637265),
(-95.75193554162979, 29.760607694859385),
(-95.75193151831627, 29.7604447036303),
(-95.75232848525047, 29.76044237518235),
(-95.752332508564, 29.760606530637265)]
poly = Polygon(coords)
# Create point by hand (just copy the point from 1) above
p1 = Point(-95.75207, 29.76047)
# Create the GeoPandas data frames from the point and polygon
ex = gpd.GeoDataFrame()
ex['geometry']=[poly]
ex = ex.set_geometry('geometry')
ex_p = gpd.GeoDataFrame()
ex_p['geometry'] = [p1]
ex_p = ex_p.set_geometry('geometry')
# Plot and print
fig, ax = plt.subplots(figsize=(5,5))
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)
ex_p.plot(ax=ax, c='Red')
ex.plot(ax = ax, alpha=0.5)
plt.savefig('by_hand.png')
In this case, the plot also shows the point inside the polygon, and this time .within() agrees:
ex_p.within(ex)
0 True
dtype: bool
which recognizes that the point is in the polygon. All suggestions on what to do are appreciated! Thanks.

I don't know if this is the most efficient way to do it but I was able to do what I needed within Python and using Geopandas.
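Instead of using the point.within(polygon) approach, I did a spatial join (geopandas.sjoin(df_1, df_2, how = 'inner', op = 'contains')), with the polygons as df_1 and the points as df_2, since op = 'contains' tests whether the left geometry contains the right one. This results in a new data frame that contains the points that are within polygons and excludes the ones that are not. (In newer GeoPandas releases the op argument is called predicate.) More information on how to do this can be found here.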
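A possible reason the spatial join succeeds where .within() did not: the printed result in the question lists both index labels (35 and 41964), which is what an element-wise comparison paired by index label produces, whereas a spatial join tests every point against every polygon. The pure-Python model below only illustrates that difference in pairing; it is not GeoPandas code, and point_in_polygon, points and polygons are names made up here, filled with the coordinates from the question.

```python
def point_in_polygon(pt, ring):
    """Even-odd ray-casting test; ring is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > y) != (y1 > y):                      # edge straddles the ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

# Keys play the role of the data frame index labels.
points = {35: (-95.75207, 29.76047)}
polygons = {41964: [(-95.752332508564, 29.760606530637265),
                    (-95.75193554162979, 29.760607694859385),
                    (-95.75193151831627, 29.76044470363038),
                    (-95.75232848525047, 29.76044237518235)]}

# Pairing by label: a label present on one side only has no partner.
elementwise = {
    label: (label in points and label in polygons
            and point_in_polygon(points[label], polygons[label]))
    for label in sorted(set(points) | set(polygons))
}

# Pairing every point with every polygon, as a spatial join does.
joined = [(p, g) for p, pt in points.items()
          for g, ring in polygons.items() if point_in_polygon(pt, ring)]

print(elementwise)
print(joined)
```

In the hand-made example from the question both frames have the default index 0, so the labels happen to match and the element-wise test returns True.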

I assume something is fishy about your coordinate reference system (crs). I cannot tell about dmR as it is not provided, but ex_p is a naive geometry, since you generated it from points without specifying the crs. You can check the crs using:
dmR.crs
Let's assume it's in 4326, then it will return:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In this case you would need to set a CRS for ex_p first using:
ex_p = ex_p.set_crs(epsg=4326)
If you want to inherit the crs of dmR dynamically you can also use:
ex_p = ex_p.set_crs(dmR.crs)
After you set a crs, you can re-project from one crs to another using:
ex_p = ex_p.to_crs(epsg=3395)
More on that topic:
https://geopandas.org/projections.html

Estimating an area of an image generated by a set of points (Alpha shapes??)

I have a set of points in an example ASCII file showing a 2D image.
I would like to estimate the total area that these points are filling. There are some places inside this plane that are not filled by any point because these regions have been masked out. What I guess might be practical for estimating the area would be applying a concave hull or alpha shapes.
I tried this approach to find an appropriate alpha value, and consequently estimate the area.
from shapely.ops import cascaded_union, polygonize
import shapely.geometry as geometry
from scipy.spatial import Delaunay
import numpy as np
import pylab as pl
from descartes import PolygonPatch
from matplotlib.collections import LineCollection
def plot_polygon(polygon):
    fig = pl.figure(figsize=(10,10))
    ax = fig.add_subplot(111)
    margin = .3
    x_min, y_min, x_max, y_max = polygon.bounds
    ax.set_xlim([x_min-margin, x_max+margin])
    ax.set_ylim([y_min-margin, y_max+margin])
    patch = PolygonPatch(polygon, fc='#999999',
                         ec='#000000', fill=True,
                         zorder=-1)
    ax.add_patch(patch)
    return fig
def alpha_shape(points, alpha):
    if len(points) < 4:
        # When you have a triangle, there is no sense
        # in computing an alpha shape.
        # Return a (hull, edge_points) pair, like the main path below.
        return geometry.MultiPoint(list(points)).convex_hull, []
    def add_edge(edges, edge_points, coords, i, j):
        """
        Add a line between the i-th and j-th points,
        if not in the list already
        """
        if (i, j) in edges or (j, i) in edges:
            # already added
            return
        edges.add( (i, j) )
        edge_points.append(coords[ [i, j] ])
    coords = np.array([point.coords[0]
                       for point in points])
    tri = Delaunay(coords)
    edges = set()
    edge_points = []
    # loop over triangles:
    # ia, ib, ic = indices of corner points of the
    # triangle
    # (tri.simplices was called tri.vertices in old SciPy releases)
    for ia, ib, ic in tri.simplices:
        pa = coords[ia]
        pb = coords[ib]
        pc = coords[ic]
        # Lengths of sides of triangle
        a = np.sqrt((pa[0]-pb[0])**2 + (pa[1]-pb[1])**2)
        b = np.sqrt((pb[0]-pc[0])**2 + (pb[1]-pc[1])**2)
        c = np.sqrt((pc[0]-pa[0])**2 + (pc[1]-pa[1])**2)
        # Semiperimeter of triangle
        s = (a + b + c)/2.0
        # Area of triangle by Heron's formula
        area = np.sqrt(s*(s-a)*(s-b)*(s-c))
        circum_r = a*b*c/(4.0*area)
        # Here's the radius filter.
        #print circum_r
        if circum_r < 1.0/alpha:
            add_edge(edges, edge_points, coords, ia, ib)
            add_edge(edges, edge_points, coords, ib, ic)
            add_edge(edges, edge_points, coords, ic, ia)
    m = geometry.MultiLineString(edge_points)
    triangles = list(polygonize(m))
    return cascaded_union(triangles), edge_points
points=[]
with open("test.asc") as f:
    for line in f:
        # list(...) so that indexing also works on Python 3
        coords=list(map(float,line.split(" ")))
        points.append(geometry.shape(geometry.Point(coords[0],coords[1])))
        print(geometry.Point(coords[0],coords[1]))
x = [p.x for p in points]
y = [p.y for p in points]
pl.figure(figsize=(10,10))
point_collection = geometry.MultiPoint(list(points))
point_collection.envelope
convex_hull_polygon = point_collection.convex_hull
_ = plot_polygon(convex_hull_polygon)
_ = pl.plot(x,y,'o', color='#f16824')
concave_hull, edge_points = alpha_shape(points, alpha=0.001)
lines = LineCollection(edge_points)
_ = plot_polygon(concave_hull)
_ = pl.plot(x,y,'o', color='#f16824')
I get this result, but I would like this method to detect the hole in the middle.
Update
This is what my real data looks like:
My question is: what is the best way to estimate the area of the aforementioned shape? I cannot figure out what has gone wrong that keeps this code from working properly. Any help will be appreciated.

Okay, here's the idea. A Delaunay triangulation is going to generate triangles which are indiscriminately large. It's also going to be problematic because only triangles will be generated.
Therefore, we'll generate what you might call a "fuzzy Delaunay triangulation". We'll put all the points into a kd-tree and, for each point p, look at its k nearest neighbors. The kd-tree makes this fast.
For each of those k neighbors, find the distance to the focal point p. Use this distance to generate a weighting. We want nearby points to be favored over more distant points, so an exponential function exp(-alpha*dist) is appropriate here. Use the weighted distances to build a probability density function describing the probability of drawing each point.
Now, draw from that distribution a large number of times. Nearby points will be chosen often while farther away points will be chosen less often. For each point drawn, make a note of how many times it was drawn for the focal point. The result is a weighted graph where each edge in the graph connects nearby points and is weighted by how often the pairs were chosen.
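To make the weighting and drawing steps concrete for a single focal point, here is a small standard-library sketch. The neighbour ids and distances are made-up values, neighbor_pdf and draw_neighbors are names invented for this illustration, and alpha=0.0035 and 50 draws simply mirror the defaults used in the full script.

```python
import math
import random
from collections import Counter

def neighbor_pdf(distances, alpha):
    """Turn neighbour distances into probabilities proportional to exp(-alpha*d)."""
    weights = [math.exp(-alpha * d) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

def draw_neighbors(neighbor_ids, distances, alpha, draws, rng):
    """Draw neighbours according to the PDF and count how often each is picked."""
    pdf = neighbor_pdf(distances, alpha)
    return Counter(rng.choices(neighbor_ids, weights=pdf, k=draws))

rng = random.Random(0)            # seeded so the run is repeatable
ids = [11, 12, 13]                # made-up neighbour ids of one focal point
dists = [100.0, 400.0, 1600.0]    # made-up distances to those neighbours
pdf = neighbor_pdf(dists, alpha=0.0035)
counts = draw_neighbors(ids, dists, alpha=0.0035, draws=50, rng=rng)
print(pdf)
print(counts)   # these counts become the edge weights for this focal point
```

With these numbers the nearest neighbour gets roughly three quarters of the probability mass and the farthest well under one percent, so a weight cutoff of a few draws removes the long edge.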
Now, cull all edges from the graph whose weights are too small. These are the points which are probably not connected. The result looks like this:
Now, let's throw all of the remaining edges into shapely. We can then convert the edges into very small polygons by buffering them. Like so:
Differencing the polygons with a large polygon covering the entire region will yield polygons for the triangulation. THIS MAY TAKE A WHILE. The result looks like this:
Finally, cull off all of the polygons which are too large:
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
import random
import scipy
import scipy.spatial
import networkx as nx
import shapely
import shapely.geometry
import matplotlib
dat = np.loadtxt('test.asc')
xycoors = dat[:,0:2]
xcoors = xycoors[:,0] #Convenience alias
ycoors = xycoors[:,1] #Convenience alias
npts = len(dat[:,0]) #Number of points
dist = scipy.spatial.distance.euclidean
def GetGraph(xycoors, alpha=0.0035):
    kdt = scipy.spatial.KDTree(xycoors) #Build kd-tree for quick neighbor lookups
    G = nx.Graph()
    npts = np.max(xycoors.shape)
    for x in range(npts):
        G.add_node(x)
        dist, idx = kdt.query(xycoors[x,:], k=10) #Get distances to the nearest neighbours (the first hit is the central point itself)
        dist = dist[1:] #Drop central point
        idx = idx[1:] #Drop central point
        pq = np.exp(-alpha*dist) #Exponential weighting of nearby points
        pq = pq/np.sum(pq) #Convert to a PDF
        choices = np.random.choice(idx, p=pq, size=50) #Choose neighbors based on PDF
        for c in choices: #Insert neighbors into graph
            if G.has_edge(x, c): #Already seen neighbor
                G[x][c]['weight'] += 1 #Strengthen connection
            else:
                G.add_edge(x, c, weight=1) #New neighbor; build connection
    return G
def PruneGraph(G,cutoff):
    newg = G.copy()
    bad_edges = set()
    for x in newg:
        for k,v in newg[x].items():
            if v['weight']<cutoff:
                bad_edges.add((x,k))
    for b in bad_edges:
        try:
            newg.remove_edge(*b)
        except nx.exception.NetworkXError:
            pass
    return newg
def PlotGraph(xycoors,G,cutoff=6):
    xcoors = xycoors[:,0]
    ycoors = xycoors[:,1]
    G = PruneGraph(G,cutoff)
    plt.plot(xcoors, ycoors, "o")
    for x in range(npts):
        for k,v in G[x].items():
            plt.plot((xcoors[x],xcoors[k]),(ycoors[x],ycoors[k]), 'k-', lw=1)
    plt.show()
def GetPolys(xycoors,G):
    #Get lines connecting all points in the graph
    xcoors = xycoors[:,0]
    ycoors = xycoors[:,1]
    lines = []
    for x in range(npts):
        for k,v in G[x].items():
            lines.append(((xcoors[x],ycoors[x]),(xcoors[k],ycoors[k])))
    #Get bounds of region
    xmin = np.min(xycoors[:,0])
    xmax = np.max(xycoors[:,0])
    ymin = np.min(xycoors[:,1])
    ymax = np.max(xycoors[:,1])
    mls = shapely.geometry.MultiLineString(lines) #Bundle the lines
    mlsb = mls.buffer(2) #Turn lines into narrow polygons
    bbox = shapely.geometry.box(xmin,ymin,xmax,ymax) #Generate background polygon
    polys = bbox.difference(mlsb) #Subtract to generate polygons
    return polys
def PlotPolys(polys,area_cutoff):
    fig, ax = plt.subplots(figsize=(8, 8))
    for polygon in polys:
        if polygon.area<area_cutoff:
            #One random RGB triple per polygon; exterior.coords gives the vertex array
            mpl_poly = matplotlib.patches.Polygon(np.array(polygon.exterior.coords), alpha=0.4, facecolor=np.random.rand(3))
            ax.add_patch(mpl_poly)
    ax.autoscale()
    fig.show()
#Functional stuff starts here
G = GetGraph(xycoors, alpha=0.0035)
#Choose a value that rips off an appropriate amount of the left side of this histogram
weights = sorted([v['weight'] for x in G for k,v in G[x].items()])
plt.hist(weights, bins=20);plt.show()
PlotGraph(xycoors,G,cutoff=6) #Plot the graph to ensure our cut-offs were okay. May take a while
prunedg = PruneGraph(G,cutoff=6) #Prune the graph
polys = GetPolys(xycoors,prunedg) #Get polygons from graph
areas = sorted(p.area for p in polys)
plt.plot(areas)
plt.hist(areas,bins=20);plt.show()
area_cutoff = 150000
PlotPolys(polys,area_cutoff=area_cutoff)
good_polys = ([p for p in polys if p.area<area_cutoff])
total_area = sum([p.area for p in good_polys])

Here's a thought: use k-means clustering.
You can accomplish this in Python as follows:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
dat = np.loadtxt('test.asc')
xycoors = dat[:,0:2]
fit = KMeans(n_clusters=2).fit(xycoors)
plt.scatter(dat[:,0],dat[:,1], c=fit.labels_)
plt.gca().set_aspect('equal', 'datalim')
plt.gray()
plt.show()
Using your data, this gives the following result:
Now, you can take the convex hull of the top cluster and the bottom cluster and calculate the areas of each separately. Adding the areas then becomes an estimator of the area of their union, but, cunningly, avoids the hole in the middle.
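As a rough illustration of the "hull per cluster, then add the areas" step, here is a pure-Python sketch that takes points together with cluster labels (for example the labels_ from the fit above) and sums the convex hull areas. The hull uses the standard monotone-chain method and the area uses the Shoelace formula; convex_hull, polygon_area, clustered_hull_area and the toy data are made up for this example.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Monotone-chain convex hull; returns the vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(ring):
    """Unsigned Shoelace area of a ring given as a list of (x, y) vertices."""
    twice = sum(x0 * y1 - x1 * y0
                for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]))
    return abs(twice) / 2.0

def clustered_hull_area(points, labels):
    """Sum of the convex hull areas of each cluster."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return sum(polygon_area(convex_hull(c)) for c in clusters.values())

# Toy data: two unit squares separated by an empty band.
pts = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5),
       (0, 3), (1, 3), (1, 4), (0, 4)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
print(clustered_hull_area(pts, labels))   # area of the two squares
print(polygon_area(convex_hull(pts)))     # one hull also covers the gap
```

The difference between the two printed values is the empty band that a single convex hull would wrongly count as filled.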
To fine-tune your results, you can play with the number of clusters and the number of different starts to the algorithm (the algorithm is randomized and is typically run more than once).
You asked, for instance, if two clusters will always leave the hole in the middle. I've used the following code to experiment with that. I generate a uniform distribution of points and then chop out a randomly sized and orientated ellipse to simulate a hole.
#!/usr/bin/env python3
import sklearn
import sklearn.cluster
import numpy as np
import matplotlib.pyplot as plt
PWIDTH = 6
PHEIGHT = 6
def GetPoints(num):
    return np.random.rand(num,2)*300-150 #Centered about zero
def MakeHole(pts): #Chop out a randomly orientated and sized ellipse
    a = np.random.uniform(10,150) #Semi-major axis
    b = np.random.uniform(10,150) #Semi-minor axis
    h = np.random.uniform(-150,150) #X-center
    k = np.random.uniform(-150,150) #Y-center
    A = np.random.uniform(0,2*np.pi) #Angle of rotation
    surviving_points = []
    for pt in range(pts.shape[0]):
        x = pts[pt,0]
        y = pts[pt,1]
        if ((x-h)*np.cos(A)+(y-k)*np.sin(A))**2/a/a+((x-h)*np.sin(A)-(y-k)*np.cos(A))**2/b/b>1:
            surviving_points.append(pt)
    return pts[surviving_points,:]
def ShowManyClusters(pts,fitter,clusters,title):
    colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
    fig,axs = plt.subplots(PWIDTH,PHEIGHT)
    axs = axs.ravel()
    for i in range(PWIDTH*PHEIGHT):
        lbls = fitter(pts[i],clusters)
        axs[i].scatter(pts[i][:,0],pts[i][:,1], c=colors[lbls])
        axs[i].get_xaxis().set_ticks([])
        axs[i].get_yaxis().set_ticks([])
    plt.suptitle(title)
    #plt.show()
    plt.savefig('/z/'+title+'.png')
fitters = {
'SpectralClustering': lambda x,clusters: sklearn.cluster.SpectralClustering(n_clusters=clusters,affinity='nearest_neighbors').fit(x).labels_,
'KMeans': lambda x,clusters: sklearn.cluster.KMeans(n_clusters=clusters).fit(x).labels_,
'AffinityPropagation': lambda x,clusters: sklearn.cluster.AffinityPropagation().fit(x).labels_,
}
np.random.seed(1)
pts = []
for i in range(PWIDTH*PHEIGHT):
    temp = GetPoints(300)
    temp = MakeHole(temp)
    pts.append(temp)
for name,fitter in fitters.items():
    for clusters in [2,3]:
        np.random.seed(1)
        ShowManyClusters(pts,fitter,clusters,"{0}: {1} clusters".format(name,clusters))
Consider the results for K-Means:
At least to my eye, it seems as though using two clusters performs worst when the "hole" separates the data into two separate blobs. (In this case that occurs when the ellipse is orientated such that it overlaps two edges of the rectangular region containing the sample points.) Using three clusters resolves most of these difficulties.
You'll also notice that K-means produces some counter-intuitive results on the 1st Column, 3rd Row as well as on the 3rd Column, 4th Row. Reviewing sklearn's menagerie of clustering methods here shows the following comparison image:
From this image, it seems as though SpectralClustering produces results that align with what we want. Trying this on the same data above fixes the problems mentioned (see 1st Column, 3rd Row and 3rd Column, 4th Row).
The foregoing suggests that Spectral clustering with three clusters should be adequate for most situations of this sort.

Although you seem intent on doing a concave shape, here is an alternate route that is hella fast and I think would give you a pretty stable reading:
Create a function which takes as an argument (int radiusOfInfluence). Inside the function run a voxel filter with that as the radius. Then simply multiply the area of that circle (pi*AOI^2) by the number of remaining points in the cloud. This should give you a relatively robust estimation of area and would be very resilient to holes and weird edges.
Some things to consider:
-This will give you a positive overshoot of area due to over-reaching edges by exactly one radius. A modification to adjust for this could be to run a statistical outlier removal filter (in inverse mode) to acquire statistical edge points. Then an assumption can be made that approximately half of each edge point is lying outside the shape, subtract half the number of points found from your total count prior to multiplying into area.
-The radius of influence largely determines this function's hole detection as a larger one will allow single points to cover larger areas, but also by tuning the std cutoff on the stat outlier filter, you can more aggressively detect interior holes and adjust your area that way.
It really begs the question of what you are after, as this is more of a shot accuracy/ shot grouping type assessment assuming a reasonably distributed set of samples. Your method kinda is making the assumption that your outer edge points are the absolute limits of what is possible (which may be a fair assumption depending on the situation)
EDIT-----------------------
I do not have time to write out example code, but I can further explain to aid in understanding.
At the core of this is the voxel filter. Very simply, it sets a seed point in x,y coordinates and then creates a grid over the whole space which has units (grid spacing) on both axes of a user specified filter radius. Inside each grid box, it will average all points to a single point. This is very important for this concept because it almost entirely eliminates the issue of overlap.
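A minimal 2D sketch of that idea, as one literal reading of the description above: bucket the points into square grid cells, average each occupied cell to one representative point, and multiply the number of representatives by the area of one circle. The function names and the separate influence_radius parameter are made up for this illustration, and the edge correction with the inverted outlier filter is not included.

```python
import math
from collections import defaultdict

def voxel_filter(points, spacing):
    """Average all points that fall into the same square grid cell."""
    cells = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / spacing), math.floor(y / spacing))
        cells[key].append((x, y))
    return [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in cells.values()]

def estimate_area(points, radius, influence_radius=None):
    """Number of surviving points times the area of one circle of influence."""
    if influence_radius is None:
        influence_radius = radius   # by default both knobs share one value
    kept = voxel_filter(points, radius)
    return len(kept) * math.pi * influence_radius ** 2

# Made-up sample: four points that collapse into two grid cells.
pts = [(0.1, 0.1), (0.3, 0.5), (1.2, 0.4), (1.8, 0.6)]
print(voxel_filter(pts, 1.0))
print(estimate_area(pts, 1.0))
```

Keeping influence_radius separate from the grid spacing matches the remark that the two could be tuned independently; testing against contrived point sets of known area is the way to pick them.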
The second part (the inverse stat outlier removal) is just a bit of cleverness to tighten your edge fit. Basically, stat outlier is built to remove noise by looking at the distance from each point to its (k) nearest neighbors. After generating the average distance to k nearest neighbors for each point, it sets up a histogram and a user defined parameter acts as a binary threshold for keeping or removing points. When inverted and set to a reasonable cutoff (~0.75 std should work), it will instead delete all the points that are in the bulk of the object (i.e. only leaving edge points). The reason this is important is that technically these points are over-reaching the boundary of your object by 1 radius. Although some will be on acute and some on obtuse edge angles (i.e. more than or less than half a circle of overfill), taking off 1/2 of a circle area per point should, over the whole object, give you a pretty sound improvement on edge fit.
Keep in mind though that at the end of the day, this is just going to give you a number. As far as stress testing, I suggest creating contrived point clouds of known area and or creating a graphical output that shows where you are dropping circles and half circles (oriented towards the interior of the object if you are fancy).
The knobs you will want to turn to improve this method are:
Voxel filter radius, area of influence per point (could actually be controlled separately from vox filter radius, though they should remain pretty close to one another), std cutoff.
Hope this helped to clarify, cheers!

Edit:
I have noticed that you have your own code to compute the alpha shape,
and the areas of Delaunay triangles are just there, so computing the area of the shape is even easier...
Just add the areas of triangles, if triangle is going to be added to the alpha-shape polygon.
If you want to detect holes... add a secondary threshold to avoid adding triangles with an area greater than the threshold. For this example, a value of max_area = 99999 will remove the hole.
The only problem is the way you create the graphic output, because you will not see the hole.
def alpha_shape(points, alpha, max_area):
    if len(points) < 4:
        # When you have a triangle, there is no sense
        # in computing an alpha shape.
        # Return the same 3-tuple shape as the main path below.
        return geometry.MultiPoint(list(points)).convex_hull, [], 0
    def add_edge(edges, edge_points, coords, i, j):
        """
        Add a line between the i-th and j-th points,
        if not in the list already
        """
        if (i, j) in edges or (j, i) in edges:
            # already added
            return
        edges.add( (i, j) )
        edge_points.append(coords[ [i, j] ])
    coords = np.array([point.coords[0]
                       for point in points])
    tri = Delaunay(coords)
    total_area = 0
    edges = set()
    edge_points = []
    # loop over triangles:
    # ia, ib, ic = indices of corner points of the
    # triangle
    # (tri.simplices was called tri.vertices in old SciPy releases)
    for ia, ib, ic in tri.simplices:
        pa = coords[ia]
        pb = coords[ib]
        pc = coords[ic]
        # Lengths of sides of triangle
        a = np.sqrt((pa[0]-pb[0])**2 + (pa[1]-pb[1])**2)
        b = np.sqrt((pb[0]-pc[0])**2 + (pb[1]-pc[1])**2)
        c = np.sqrt((pc[0]-pa[0])**2 + (pc[1]-pa[1])**2)
        # Semiperimeter of triangle
        s = (a + b + c)/2.0
        # Area of triangle by Heron's formula
        area = np.sqrt(s*(s-a)*(s-b)*(s-c))
        circum_r = a*b*c/(4.0*area)
        # Here's the radius filter.
        # print("radius", circum_r)
        if circum_r < 1.0/alpha and area < max_area:
            add_edge(edges, edge_points, coords, ia, ib)
            add_edge(edges, edge_points, coords, ib, ic)
            add_edge(edges, edge_points, coords, ic, ia)
            total_area += area
    m = geometry.MultiLineString(edge_points)
    triangles = list(polygonize(m))
    return cascaded_union(triangles), edge_points, total_area
Old answer:
To compute the area of an irregular simple polygon, you can use the Shoelace formula, and the CCW coordinates of the boundary as input.
If you want to detect holes inside of your cloud, you have to remove the Delaunay triangles with a circumradius greater than a secondary threshold.
The idea is: compute the Delaunay triangulation and filter with your current alpha shape. Then, compute the circumradius of every triangle and remove those triangles whose circumradius is much bigger than the average circumradius.
To compute the area of an irregular polygon with holes, use the Shoelace formula for each hole boundary. Input the external boundary in CCW (positive) order to obtain the area. Then input the boundary of each hole in CW (negative) order, to obtain a (negative) value for area.
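A small worked example of the Shoelace formula with the sign convention described above. The square and its hole are made-up data and signed_area is a name chosen for this sketch.

```python
def signed_area(ring):
    """Shoelace formula: positive for CCW vertex order, negative for CW."""
    twice = sum(x0 * y1 - x1 * y0
                for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]))
    return twice / 2.0

outer = [(0, 0), (4, 0), (4, 4), (0, 4)]   # exterior boundary, CCW
hole = [(1, 1), (1, 2), (2, 2), (2, 1)]    # hole boundary, CW

print(signed_area(outer))                      # 16.0
print(signed_area(hole))                       # -1.0
print(signed_area(outer) + signed_area(hole))  # 15.0
```

Because the hole contributes a negative value, simply adding the signed areas of all rings gives the area of the polygon with its holes removed.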

Find the intersection between two geographical data points

I have two pairs of lat/lon (expressed in decimal degrees) along with their radius (expressed in meters). What I am trying to achieve is to find out whether the two circles around these points intersect (obviously they do not here, but the plan is to apply this algorithm to many other data points). In order to check this I am using Shapely's intersects() function. My question, however, is how I should deal with the different units. Should I apply some sort of transformation / projection first (same units for both lat/lon and radius)?
48.180759,11.518950,19.0
47.180759,10.518950,10.0
EDIT:
I found this library here (https://pypi.python.org/pypi/utm) which seems helpful. However, I am not 100% sure whether I am applying it correctly. Any ideas?
X = utm.from_latlon(38.636782, 21.414384)
A = geometry.Point(X[0], X[1]).buffer(30.777)
Y = utm.from_latlon(38.636800, 21.414488)
B = geometry.Point(Y[0], Y[1]).buffer(23.417)
A.intersects(B)
SOLUTION:
So, I finally managed to solve my problem. Here are two different implementations that both solve the same problem:
X = from_latlon(48.180759, 11.518950)
Y = from_latlon(47.180759, 10.518950)
print(latlonbuffer(48.180759, 11.518950, 19.0).intersects(latlonbuffer(47.180759, 10.518950, 19.0)))
print(latlonbuffer(48.180759, 11.518950, 100000.0).intersects(latlonbuffer(47.180759, 10.518950, 100000.0)))
X = from_latlon(48.180759, 11.518950)
Y = from_latlon(47.180759, 10.518950)
print(geometry.Point(X[0], X[1]).buffer(19.0).intersects(geometry.Point(Y[0], Y[1]).buffer(19.0)))
print(geometry.Point(X[0], X[1]).buffer(100000.0).intersects(geometry.Point(Y[0], Y[1]).buffer(100000.0)))

Shapely only uses the Cartesian coordinate system, so in order to make sense of metric distances, you would need to either:
1) project the coordinates into a local projection system that uses distance units in metres, such as a UTM zone, or
2) buffer a point from (0,0), and use a dynamic azimuthal equidistant projection centered on the lat/lon point to project to geographic coords.
Here's how to do #2, using shapely.ops.transform and pyproj
import pyproj
from shapely.geometry import Point
from shapely.ops import transform
from functools import partial
WGS84 = pyproj.Proj(init='epsg:4326')
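def latlonbuffer(lat, lon, radius_m):
    # azimuthal equidistant projection centered on the given point
    proj4str = '+proj=aeqd +lat_0=%s +lon_0=%s +x_0=0 +y_0=0' % (lat, lon)
    AEQD = pyproj.Proj(proj4str)
    project = partial(pyproj.transform, AEQD, WGS84)
    # buffer in metres around the projection origin, then convert to lon/lat
    return transform(project, Point(0, 0).buffer(radius_m))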
A = latlonbuffer(48.180759, 11.518950, 19.0)
B = latlonbuffer(47.180759, 10.518950, 10.0)
print(A.intersects(B)) # False
Your two buffered points don't intersect. But these do:
A = latlonbuffer(48.180759, 11.518950, 100000.0)
B = latlonbuffer(47.180759, 10.518950, 100000.0)
print(A.intersects(B)) # True
As shown by plotting the lon/lat coords (which distorts the circles):
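A quick cross-check of both results that avoids projections altogether: two circles on a sphere intersect when the great-circle distance between their centers is at most the sum of their radii. The sketch below uses the haversine formula with only the standard library. The spherical Earth model, the mean radius value, and the helper names haversine_m and circles_intersect are assumptions made for illustration, so it approximates what the buffered polygons test rather than reproducing it.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; the spherical model is an assumption


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # guard against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def circles_intersect(center_a, radius_a_m, center_b, radius_b_m):
    """True when the two circles (centers as (lat, lon), radii in metres) overlap or touch."""
    d = haversine_m(center_a[0], center_a[1], center_b[0], center_b[1])
    return d <= radius_a_m + radius_b_m


p1 = (48.180759, 11.518950)
p2 = (47.180759, 10.518950)
print(haversine_m(p1[0], p1[1], p2[0], p2[1]))
print(circles_intersect(p1, 19.0, p2, 10.0))
print(circles_intersect(p1, 100000.0, p2, 100000.0))
```

The centers are roughly 134 km apart on this model, which is consistent with the metre-scale buffers missing each other and the 100 km buffers overlapping.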

Ordering polygon coordinates for plotting

I have a model grid composed of many cells for which I would like to plot a shaded polygon on a matplotlib basemap.
Using pyproj, I first projected the points, then created a polygon with shapely.geometry's Polygon class so that I could extract the grid's exterior coordinates from it. I then converted them back to WGS84 for passing to my plotting function:
grid_x_mesh, grid_y_mesh = pyproj.transform(wgs84, nplaea, grid_lons, grid_lats)
grid_x = grid_x_mesh.ravel()
grid_y = grid_y_mesh.ravel()
grid_poly = Polygon(zip(grid_x, grid_y))
grid_x, grid_y = grid_poly.exterior.coords.xy
grid_plons, grid_plats = pyproj.transform(nplaea, wgs84, grid_x, grid_y)
Then, using the matplotlib Basemap instance (m), I projected the WGS84 coordinates to the map projection (nplaea in this case) and added the resulting polygon to the plot as a patch:
grid_poly_x, grid_poly_y = m(grid_plons, grid_plats)
grid_poly_xy = list(zip(grid_poly_x, grid_poly_y))
# this Polygon is matplotlib.patches.Polygon, not shapely's
grid_poly = Polygon(grid_poly_xy, facecolor='red', alpha=0.4)
plt.gca().add_patch(grid_poly)
When I do this, I get a criss-cross pattern, which I assume has to do with the ordering of the coordinates that I supplied to the polygon function.
I would think this has to do with either how I extracted the exterior coordinates, or just the ordering of the coordinate lists when I created the final polygon to plot.
Is there a clever way of ordering these properly if that is the problem?
Plotted polygon
Close-up

I agree that the grid coordinates are out of order somewhere. How was grid_lons created? A possibly cleaner way to use pyproj with Shapely geometries is the relatively new function shapely.ops.transform. For example:
import pyproj
from shapely.geometry import Polygon, Point
from shapely.ops import transform
from functools import partial
project = partial(
    pyproj.transform,
    pyproj.Proj(init='epsg:4326'),  # WGS84 geographic
    pyproj.Proj(init='epsg:3575'))  # North Pole LAEA Europe
# Example grid cell, in long/lat
poly_g = Polygon(((5, 52), (5, 60), (15, 60), (15, 52), (5, 52)))
# Transform to projected system
poly_p = transform(project, poly_g)
The sanity of the coordinates should be preserved through the transformation (assuming that they were sane to begin with).

Soooo... apparently shapely.geometry.Polygon was building the polygon from every grid coordinate, interior ones included. I realized this because grid_plons and grid_plats had the same length as the np.ravel()'ed mesh coordinate arrays.
I ended up just doing a manual extraction of the external coordinates from the mesh coordinate arrays before passing them to the Polygon method (see below). Though, I imagine there may be a prettier and more general way of doing this.
Manual Extraction method:
grid_x_mesh, grid_y_mesh = pyproj.transform(wgs84, nplaea, grid_lons, grid_lats)
grid_x, grid_y = [], []
# The coordinates must be ordered in the order they are to be drawn
grid_x.extend(grid_x_mesh[0, :])
grid_x.extend(grid_x_mesh[1:-1, -1])
# Note that these two sides of the polygon are appended in reverse
grid_x.extend(grid_x_mesh[-1, :][::-1])
grid_x.extend(grid_x_mesh[1:-1, 0][::-1])
grid_y.extend(grid_y_mesh[0, :])
grid_y.extend(grid_y_mesh[1:-1, -1])
grid_y.extend(grid_y_mesh[-1, :][::-1])
grid_y.extend(grid_y_mesh[1:-1, 0][::-1])
grid_poly = Polygon(zip(grid_x, grid_y))
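A somewhat more general take on the manual extraction above: walk the top row, the right column, the bottom row reversed, and the left column reversed, so the boundary comes out in drawing order. This is only a sketch. mesh_perimeter is a hypothetical helper name, it is written against nested lists, and it should also accept 2-D NumPy arrays since it relies only on len(), indexing and slicing.

```python
def mesh_perimeter(mesh):
    """Return the boundary values of a 2-D mesh in drawing order.

    Order: top row left-to-right, right column downwards, bottom row
    right-to-left, left column upwards. Corners appear exactly once.
    """
    n_rows = len(mesh)
    n_cols = len(mesh[0]) if n_rows else 0
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least a 2x2 mesh")
    top = list(mesh[0])
    right = [row[-1] for row in mesh[1:-1]]    # interior rows only
    bottom = list(mesh[-1])[::-1]
    left = [row[0] for row in mesh[-2:0:-1]]   # interior rows, bottom to top
    return top + right + bottom + left


# Small illustrative mesh (made-up values, not the grid from the question)
x_mesh = [[0, 1, 2],
          [0, 1, 2],
          [0, 1, 2]]
y_mesh = [[0, 0, 0],
          [1, 1, 1],
          [2, 2, 2]]
ring = list(zip(mesh_perimeter(x_mesh), mesh_perimeter(y_mesh)))
print(ring)
```

Applying the same function to the x and y meshes keeps the two coordinate lists aligned, so the zipped pairs can be passed straight to a polygon constructor.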

Automatically center matplotlib basemap onto data

I would like a solution to automatically center a basemap plot on my coordinate data.
I've got things to automatically center, but the resulting area is much larger than the area actually used by my data. I would like the plot to be bounded by the plot coordinates, rather than an area drawn from the lat/lon boundaries.
I am using John Cook's code for calculating the distance between two points on (an assumed perfect) sphere.
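The spheredistance module used in the scripts here is not shown in the post. As a stand-in, here is a minimal sketch of a unit-sphere distance based on the spherical law of cosines, which is my understanding of the approach John Cook's function takes; treat the exact formulation, and the clamping step, as assumptions. The result is an arc length in radians, so multiplying by a radius in meters gives a distance.

```python
import math


def distance_on_unit_sphere(lat1, lon1, lat2, lon2):
    """Arc length (radians) between two lat/lon points given in degrees."""
    # convert latitude to colatitude (angle from the north pole), in radians
    phi1 = math.radians(90.0 - lat1)
    phi2 = math.radians(90.0 - lat2)
    theta1 = math.radians(lon1)
    theta2 = math.radians(lon2)
    # spherical law of cosines
    cos_arc = (math.sin(phi1) * math.sin(phi2) * math.cos(theta1 - theta2)
               + math.cos(phi1) * math.cos(phi2))
    # clamp to [-1, 1] so rounding error cannot break acos
    cos_arc = max(-1.0, min(1.0, cos_arc))
    return math.acos(cos_arc)


earth_radius_m = 6378100  # same value as the script in the post
print(distance_on_unit_sphere(0.0, 0.0, 0.0, 90.0) * earth_radius_m)
```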
First Try
Here is the script I started with. It made the width and height too small for the data area, and put the center latitude (lat0) too far south.
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
import sys
import csv
import spheredistance as sd
print('\n')
if len(sys.argv) < 3:
    print('Usage:', sys.argv[0], '<datafile> <#rows to skip>', file=sys.stderr)
    sys.exit(1)
print('\n')
dataFile = sys.argv[1]
dataStream = open(dataFile, 'r', newline='')
dataReader = csv.reader(dataStream, delimiter='\t')
numRows = sys.argv[2]
dataValues = []
dataLat = []
dataLon = []
print('Plotting Data From: ' + dataFile)
next(dataReader)  # skip the first row
for row in dataReader:
    dataValues.append(row[0])
    dataLat.append(float(row[1]))
    dataLon.append(float(row[2]))
# center and set extent of map
earthRadius = 6378100 #meters
factor = 1.00
lat0new = ((max(dataLat)-min(dataLat))/2)+min(dataLat)
lon0new = ((max(dataLon)-min(dataLon))/2)+min(dataLon)
mapH = sd.distance_on_unit_sphere(max(dataLat), lon0new,
                                  min(dataLat), lon0new) * earthRadius * factor
mapW = sd.distance_on_unit_sphere(lat0new, max(dataLon),
                                  lat0new, min(dataLon)) * earthRadius * factor
# setup stereographic basemap.
# lat_ts is latitude of true scale.
# lon_0,lat_0 is central point.
m = Basemap(width=mapW, height=mapH,
            resolution='l', projection='stere',
            lat_0=lat0new, lon_0=lon0new)
#m.shadedrelief()
m.drawcoastlines(linewidth=0.2)
m.fillcontinents(color='white', lake_color='aqua')
#plot data points (omitted due to ownership)
#x, y = m(dataLon,dataLat)
#m.scatter(x,y,2,marker='o',color='k')
# draw parallels and meridians.
m.drawparallels(np.arange(-80.,81.,20.), labels=[1,0,0,0], fontsize=10)
m.drawmeridians(np.arange(-180.,181.,20.), labels=[0,0,0,1], fontsize=10)
m.drawmapboundary(fill_color='aqua')
plt.title("Example")
plt.show()

After generating some random data, it was obvious that the bounds that I chose did not work with this projection (red lines). Using map.drawgreatcircle(), I first visualized where I wanted the bounds while zoomed over the projection of random data.
I corrected the longitudinal extent by using the distance between the longitude bounds at the southernmost latitude (blue horizontal line).
I determined the latitudinal range using the Pythagorean theorem to solve for the vertical distance, knowing the distance between the northernmost longitudinal bounds and the distance from a northern corner to the central southernmost point (blue triangle).
def centerMap(lats, lons, scale):
    # Assumes -90 < lat < 90 and -180 < lon < 180, and that
    # latitude and longitude are in decimal degrees
    earthRadius = 6378100.0  # earth's radius in meters
    northLat = max(lats)
    southLat = min(lats)
    eastLon = max(lons)
    westLon = min(lons)
    # average between max and min longitude
    lon0 = ((eastLon - westLon) / 2.0) + westLon
    # b = half the distance across the northern edge
    # c = distance from a northern corner to the southern midpoint
    b = sd.spheredist(northLat, eastLon, northLat, westLon) * earthRadius / 2
    c = sd.spheredist(northLat, eastLon, southLat, lon0) * earthRadius
    # use the Pythagorean theorem to determine the height of the plot
    mapH = pow(pow(c, 2) - pow(b, 2), 1. / 2)
    arcCenter = (mapH / 2) / earthRadius
    lat0 = sd.secondlat(southLat, arcCenter)
    # distance between max E and W longitude at the southernmost latitude
    mapW = sd.spheredist(southLat, eastLon, southLat, westLon) * earthRadius
    return lat0, lon0, mapW * scale, mapH * scale
lat0center, lon0center, mapWidth, mapHeight = centerMap(dataLat, dataLon, 1.1)
The lat0 (or latitudinal center) in this case is therefore the point halfway up the height of this triangle. I found it using John Cook's method, rearranged to solve for an unknown coordinate given the first coordinate (the median longitude at the southern boundary) and the arc length (half of the total height).
import math
def secondlat(lat1, arc):
    # lat1 in degrees, arc in radians; returns the latitude reached in degrees
    degrees_to_radians = math.pi/180.0
    lat2 = (arc-((90-lat1)*degrees_to_radians))*(1./degrees_to_radians)+90
    return lat2
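Expanding the expression in secondlat shows that the 90-degree terms cancel, leaving lat1 plus the arc converted to degrees. The sketch below states that simplified form and checks it against the original expression. secondlat_original is a name introduced here only for the comparison, and both versions assume the result stays short of the pole.

```python
import math


def secondlat(lat1, arc):
    """Latitude (degrees) reached by moving `arc` radians due north of lat1."""
    return lat1 + math.degrees(arc)


def secondlat_original(lat1, arc):
    # the expression from the post, kept only for comparison
    degrees_to_radians = math.pi / 180.0
    return (arc - ((90 - lat1) * degrees_to_radians)) * (1. / degrees_to_radians) + 90


# spot-check that the two forms agree on a few sample inputs
for lat1, arc in [(10.0, 0.1), (-45.0, 0.25), (60.0, 0.0)]:
    assert math.isclose(secondlat(lat1, arc), secondlat_original(lat1, arc), abs_tol=1e-9)
```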
Update:
The above function, as well as the distance between two coordinates can be achieved with higher accuracy using the pyproj Geod class methods geod.fwd() and geod.inv(). I found this in Erik Westra's Python for Geospatial Development, which is an excellent resource.
Update:
I have now verified that this also works for Lambert Conformal Conic (lcc) projections.
