Related
I have several dataframes which each contain two columns of x and y values, so each row represents a point on a curve. The different dataframes then represent contours on a map. I have another series of data points (fewer in number), and I'd like to see which contour they are closest to on average.
I would like to establish the distance from each datapoint to each point on the curve, with sqrt(x^2+y^2) - sqrt(x_1^2 + y_1^2), add them up for each point on the curve. The trouble is that there are several thousand points on the curve, and there are only a few dozen datapoints to assess, so I can't simply put these in columns next to each other.
I think I need to cycle through the datapoints, checking the sqdistance between them and each point in the curve.
I don't know whether there is an easy function or module that can do this.
Thanks in advance!
Edit: Thanks for the comments. #Alexander: I've tried the vectorize function, as follows, with a sample dataset. I'm actually using contours which comprise several thousand datapoints, and the dataset to compare against are 100+, so I'd like to be able to automate as much as possible. I'm currently able to create a distance measurement from the first datapoint against my contour, but I would ideally like to cycle through j as well. When I try it, it comes up with an error:
import numpy as np
from numpy import vectorize
import pandas as pd
from pandas import DataFrame
df1 = {'X1':['1', '2', '2', '3'], 'Y1':['2', '5', '7', '9']}
df1 = DataFrame(df1, columns=['X1', 'Y1'])
df2 = {'X2':['3', '5', '6'], 'Y2':['10', '15', '16']}
df2 = DataFrame(df2, columns=['X2', 'Y2'])
df1=df1.astype(float)
df2=df2.astype(float)
Distance=pd.DataFrame()
i = range(0, len(df1))
j = range(0, len(df2))
def myfunc(x1, y1, x2, y2):
return np.sqrt((x2-x1)**2+np.sqrt(y2-y1)**2)
vfunc=np.vectorize(myfunc)
Distance['Distance of Datapoint j to Contour']=vfunc(df1.iloc[i] ['X1'], df1.iloc[i]['Y1'], df2.iloc[0]['X2'], df2.iloc[0]['Y2'])
Distance['Distance of Datapoint j to Contour']=vfunc(df1.iloc[i] ['X1'], df1.iloc[i]['Y1'], df2.iloc[1]['X2'], df2.iloc[1]['Y2'])
Distance
General idea
The "curve" is actually a polygon with a lot's of points. There definetly some libraries to calculate the distance between the polygon and the point. But generally it will be something like:
Calculate "approximate distance" to whole polygon, e.g. to the bounding box of a polygon (from point to 4 line segments), or to the center of bounding box
calculate distances to the lines of a polygon. If you have too many points then as an extra step "resolution" of a polygon might be reduced.
Smallest found distance is the distance from point to the polygon.
repeat for each point and each polygon
Existing solutions
Some libraries already can do that:
shapely question, shapely Geo-Python docs
Using shapely in geopandas to calculate distance
scipy.spatial.distance: scipy can be used to calculate distance between arbitrary number of points
numpy.linalg.norm(point1-point2): some answers propose different ways to calculate distance using numpy. Some even show performance benchmarks
sklearn.neighbors: not really about curves and distances to them, but can be used if you want to check "to which area point is most likely related"
And you can always calculate distances yourself using D(x1, y1, x2, y2) = sqrt((x₂-x₁)² + (y₂-y₁)²) and search for best combination of points that gives minimal distance
Example:
# get distance from points of 1 dataset to all the points of another dataset
from scipy.spatial import distance
d = distance.cdist(df1.to_numpy(), df2.to_numpy(), 'euclidean')
print(d)
# Results will be a matrix of all possible distances:
# [[ D(Point_df1_0, Point_df2_0), D(Point_df1_0, Point_df2_1), D(Point_df1_0, Point_df2_2)]
# [ D(Point_df1_1, Point_df2_0), D(Point_df1_1, Point_df2_1), D(Point_df1_1, Point_df2_2)]
# [ D(Point_df1_3, Point_df2_0), D(Point_df1_2, Point_df2_1), D(Point_df1_2, Point_df2_2)]
# [ D(Point_df1_3, Point_df2_0), D(Point_df1_3, Point_df2_1), D(Point_df1_3, Point_df2_2)]]
[[ 8.24621125 13.60147051 14.86606875]
[ 5.09901951 10.44030651 11.70469991]
[ 3.16227766 8.54400375 9.8488578 ]
[ 1. 6.32455532 7.61577311]]
What to do next is up to you. For example as a metric of "general distance between curves" you can:
Pick smallest values in each row and each column (if you skip some columns/rows, then you might end up with candidate that "matches only a part of contour), and calculate their median: np.median(np.hstack([np.amin(d, axis) for axis in range(len(d.shape))])).
Or you can calculate mean value of:
all the distances: np.median(d)
of "smallest 2/3 of distances": np.median(d[d<np.percentile(d, 66, interpolation='higher')])
of "smallest distances that cover at least each rows and each columns":
for min_value in np.sort(d, None):
chosen_indices = d<=min_value
if np.all(np.hstack([np.amax(chosen_indices, axis) for axis in range(len(chosen_indices.shape))])):
break
similarity = np.median(d[chosen_indices])
Or maybe you can use different type of distance from the begining (e.g. "correlation distance" looks promising to your task)
Maybe use "Procrustes analysis, a similarity test for two data sets" together with distances.
Maybe you can use minkowski distance as a similarity metric.
Alternative approach
Alternative approach would be to use some "geometry" library to compare areas of concave hulls:
Build concave hulls for contours and for "candidate datapoints" (not easy, but possible: using shapely , using concaveman). But if you are sure that your contours are already ordered and without overlapping segments, then you can directly build polygons from those points without need for concave hull.
Use "intersection area" minus "non-common area" as a metric of similarity (shapely can be used for that):
Non-common area is: union - intersection or simply "symmetric difference"
Final metric: intersection.area - symmetric_difference.area (intersection, area)
This approach might be better than processing distances in some situations, for example:
You want to prefer "fewer points covering whole area" over "huge amount of very close points that cover only half of the area"
It's more obvious way to compare candidates with different number of points
But it has it's disadvantages too (just draw some examples on paper and experiment to find them)
Other ideas:
instead of using polygons or concave hull you can:
build a linear ring from your points and then use contour.buffer(some_distance). This way you ignore "internal area" of the contour and only compare contour itself (with tolerance of some_distance). Distance between centroids (or double of that) may be used as value for some_distance
You can build polygons/lines from segments using ops.polygonize
instead of using intersection.area - symmetric_difference.area you can:
Snap one object to another, and then compare snapped object to original
Before comparing real objects you can compare "simpler" versions of the objects to filter out obvious mismatches:
For example you can check if boundaries of objects intersect
Or you can simplify geometries before comparing them
For the distance, you need to change your formula to
def getDistance(x, y, x_i, y_i):
return sqrt((x_i -x)^2 + (y_i - y)^2)
with (x,y) being your datapoint and (x_i, y_i) being a point from the curve.
Consider using NumPy for vectorization. Explicitly looping through your data points will most likely be less efficient, depending on your use case, it might however be quick enough. (If you need to run it on a regular basis, I think vectorization will easily outspeed the explicit way) This could look something like this:
import numpy as np # Universal abbreviation for the module
datapoints = np.random.rand(3,2) # Returns a vector with randomized entries of size 3x2 (Imagine it as 3 sets of x- and y-values
contour1 = np.random.rand(1000, 2) # Other than the size (which is 1000x2) no different than datapoints
contour2 = np.random.rand(1000, 2)
contour3 = np.random.rand(1000, 2)
def squareDistanceUnvectorized(datapoint, contour):
retVal = 0.
print("Using datapoint with values x:{}, y:{}".format(datapoint[0], datapoint[1]))
lengthOfContour = np.size(contour, 0) # This gets you the number of lines in the vector
for pointID in range(lengthOfContour):
squaredXDiff = np.square(contour[pointID,0] - datapoint[0])
squaredYDiff = np.square(contour[pointID,1] - datapoint[1])
retVal += np.sqrt(squaredXDiff + squaredYDiff)
retVal = retVal / lengthOfContour # As we want the average, we are dividing the sum by the element count
return retVal
if __name__ == "__main__":
noOfDatapoints = np.size(datapoints,0)
contID = 0
for currentDPID in range(noOfDatapoints):
dist1 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour1)
dist2 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour2)
dist3 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour3)
if dist1 > dist2 and dist1 > dist3:
contID = 1
elif dist2 > dist1 and dist2 > dist3:
contID = 2
elif dist3 > dist1 and dist3 > dist2:
contID = 3
else:
contID = 0
if contID == 0:
print("Datapoint {} is inbetween two contours".format(currentDPID))
else:
print("Datapoint {} is closest to contour {}".format(currentDPID, contID))
Okay, now moving on to vector-land.
I have taken the liberty to adjust this part to what I think is your dataset. Try it and let me know if it works.
import numpy as np
import pandas as pd
# Generate 1000 points (2-dim Vector) with random values between 0 and 1. Make them strings afterwards.
# This is the first contour
random2Ddata1 = np.random.rand(1000,2)
listOfX1 = [str(x) for x in random2Ddata1[:,0]]
listOfY1 = [str(y) for y in random2Ddata1[:,1]]
# Do the same for a second contour, except that we de-center this 255 units into the first dimension
random2Ddata2 = np.random.rand(1000,2)+[255,0]
listOfX2 = [str(x) for x in random2Ddata2[:,0]]
listOfY2 = [str(y) for y in random2Ddata2[:,1]]
# After this step, our 'contours' are basically two blobs of datapoints whose centers are approx. 255 units apart.
# Generate a set of 4 datapoints and make them a Pandas-DataFrame
datapoints = {'X': ['0.5', '0', '255.5', '0'], 'Y': ['0.5', '0', '0.5', '-254.5']}
datapoints = pd.DataFrame(datapoints, columns=['X', 'Y'])
# Do the same for the two contours
contour1 = {'Xf': listOfX1, 'Yf': listOfY1}
contour1 = pd.DataFrame(contour1, columns=['Xf', 'Yf'])
contour2 = {'Xf': listOfX2, 'Yf': listOfY2}
contour2 = pd.DataFrame(contour2, columns=['Xf', 'Yf'])
# We do now have 4 datapoints.
# - The first datapoint is basically where we expect the mean of the first contour to be.
# Contour 1 consists of 1000 points with x, y- values between 0 and 1
# - The second datapoint is at the origin. Its distances should be similar to the once of the first datapoint
# - The third datapoint would be the result of shifting the first datapoint 255 units into the positive first dimension
# - The fourth datapoint would be the result of shifting the first datapoint 255 units into the negative second dimension
# Transformation into numpy array
# First the x and y values of the data points
dpArray = ((datapoints.values).T).astype(np.float)
c1Array = ((contour1.values).T).astype(np.float)
c2Array = ((contour2.values).T).astype(np.float)
# This did the following:
# - Transform the datapoints and contours into numpy arrays
# - Transpose them afterwards so that if we want all x values, we can write var[0,:] instead of var[:,0].
# A personal preference, maybe
# - Convert all the values into floats.
# Now, we iterate through the contours. If you have a lot of them, putting them into a list beforehand would do the job
for contourid, contour in enumerate([c1Array, c2Array]):
# Now for the datapoints
for _index, _value in enumerate(dpArray[0,:]):
# The next two lines do vectorization magic.
# First, we square the difference between one dpArray entry and the contour x values.
# You might notice that contour[0,:] returns an 1x1000 vector while dpArray[0,_index] is an 1x1 float value.
# This works because dpArray[0,_index] is broadcasted to fit the size of contour[0,:].
dx = np.square(dpArray[0,_index] - contour[0,:])
# The same happens for dpArray[1,_index] and contour[1,:]
dy = np.square(dpArray[1,_index] - contour[1,:])
# Now, we take (for one datapoint and one contour) the mean value and print it.
# You could write it into an array or do basically anything with it that you can imagine
distance = np.mean(np.sqrt(dx+dy))
print("Mean distance between contour {} and datapoint {}: {}".format(contourid+1, _index+1, distance))
# But you want to be able to call this... so here we go, generating a function out of it!
def getDistanceFromDatapointsToListOfContoursFindBetterName(datapoints, listOfContourDataFrames):
""" Takes a DataFrame with points and a list of different contours to return the average distance for each combination"""
dpArray = ((datapoints.values).T).astype(np.float)
listOfContours = []
for item in listOfContourDataFrames:
listOfContours.append(((item.values).T).astype(np.float))
retVal = np.zeros((np.size(dpArray,1), len(listOfContours)))
for contourid, contour in enumerate(listOfContours):
for _index, _value in enumerate(dpArray[0,:]):
dx = np.square(dpArray[0,_index] - contour[0,:])
dy = np.square(dpArray[1,_index] - contour[1,:])
distance = np.mean(np.sqrt(dx+dy))
print("Mean distance between contour {} and datapoint {}: {}".format(contourid+1, _index+1, distance))
retVal[_index, contourid] = distance
return retVal
# And just to see that it is, indeed, returning the same results, run it once
getDistanceFromDatapointsToListOfContoursFindBetterName(datapoints, [contour1, contour2])
I am rather new to Python and I think I'm trying to attempt something quite complicated? I have repeatedly tried to search for this and think I'm missing some base knowledge as I truly don't understand what I've read.
Effectively I have 3 arrays, which contain x points, y points, and z points. They create an approximately hemispherical shape. I have these points from taking a contour of a 2D axisymmetrical semicircular shape, extracting the x points and y points from it into separate arrays, and creating a "z points" array of zeros of the same length as the previous two arrays.
From this, I then rotate the y points into the z-domain using angles to create a 3D approximation of the 2D contour.
This 3D shape is completely bounded (in that it creates both the bottom and the top), as linked below:
3d Approximation
I've butchered the following code from my much longer program, it's the only part that is pertinent to my question however:
c, contours, _ = cv2.findContours(WB, mode = cv2.RETR_EXTERNAL, method = cv2.CHAIN_APPROX_NONE) # Find new contours in recently bisected mask
contours = sorted (contours, key = cv2.contourArea, reverse = True) # # Sort contours (there is only 2 contours left, plinth/background, and droplet)
contour = contours[1] # Second longest contour is the desired contour
xpointsList = [xpoints[0][0] for xpoints in contour] # Create new array of all x co-ordinates
ypointsList = [ypoints[0][1] for ypoints in contour] # Create new array of all y co-ordinates
zpointsList = np.zeros(len(xpointsList)) # # Create an array of 0 values to represent the z domain. True contour sits at 0 on all points in the z-domain
angles = np.arange(0,180,5, dtype = int) # # Creating an array of angles between to form a full drop in degrees of 5
i = 0
b = 0
d = 0
qx = np.zeros(len(xpointsList)*len(angles)) # Setting variable to equal the length of x points times the length of iterative angle calculations
qy = np.zeros(len(ypointsList)*len(angles)) # Setting variable to equal the length of x points times the length of iterative angle calculations
qz = np.zeros(len(zpointsList)*len(angles)) # Setting variable to equal the length of x points times the length of iterative angle calculations
ax = plt.axes(projection='3d')
for b in range(0,len(angles)):
angle = angles[b]
for i in range(0,len(ypointsList)):
qx[i+d] = xpointsList[i]-origin[0] # Setting the x-axis to equal it's current values around the origin
qz[i+d] = origin[0] + math.cos(math.radians(angle)) * (zpointsList[i]) - math.sin(math.radians(angle)) * (ypointsList[i] - origin[1]) # creating a z value based on 10 degree rotations of the y values around the x axis
qy[i+d] = origin[1] + math.sin(math.radians(angle)) * (zpointsList[i]) + math.cos(math.radians(angle)) * (ypointsList[i] - origin[1]) # adapting the y value based on the creation of the z values.
i = i + 1
b = b + 1
d = d + len (xpointsList)
ax.plot3D (qy, qz, qx, color = 'blue', linewidth = 0.1)
Effectively my question is, how do I use this structure to somehow find the volume using the co-ordinate arrays? I think I need to use some kind of scipy.spatial ConvexHull to fill in the areas between my rotations so that it has a surface and work from there?
I don't understand your approach (I'm not that good in Python) so can't help you fix it, but there was a similar question with much simpler solution here:
https://stackoverflow.com/a/48114604/1479630
Your situation is even easier because what you basically have is a distorted sphere, and you have a regular grid on its surface. You can iterate over all surface triangles and for each, compute volume of tetrahedron made of this triangle and center of object. If you sum up those volumes you'll get volume of entire object.
I've interpolated a spline to fit pixel data from an image with a curve that I would like to straighten. I'm not sure what tools are appropriate to solve this problem. Can someone recommend an approach?
Here's how I'm getting my spline:
import numpy as np
from skimage import io
from scipy import interpolate
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
import networkx as nx
# Read a skeletonized image, return an array of points on the skeleton, and divide them into x and y coordinates
skeleton = io.imread('skeleton.png')
curvepoints = np.where(skeleton==False)
xpoints = curvepoints[1]
ypoints = -curvepoints[0]
# reformats x and y coordinates into a 2-dimensional array
inputarray = np.c_[xpoints, ypoints]
# runs a nearest neighbors algorithm on the coordinate array
clf = NearestNeighbors(2).fit(inputarray)
G = clf.kneighbors_graph()
T = nx.from_scipy_sparse_matrix(G)
# sorts coordinates according to their nearest neighbors order
order = list(nx.dfs_preorder_nodes(T, 0))
xx = xpoints[order]
yy = ypoints[order]
# Loops over all points in the coordinate array as origin, determining which results in the shortest path
paths = [list(nx.dfs_preorder_nodes(T, i)) for i in range(len(inputarray))]
mindist = np.inf
minidx = 0
for i in range(len(inputarray)):
p = paths[i] # order of nodes
ordered = inputarray[p] # ordered nodes
# find cost of that order by the sum of euclidean distances between points (i) and (i+1)
cost = (((ordered[:-1] - ordered[1:])**2).sum(1)).sum()
if cost < mindist:
mindist = cost
minidx = i
opt_order = paths[minidx]
xxx = xpoints[opt_order]
yyy = ypoints[opt_order]
# fits a spline to the ordered coordinates
tckp, u = interpolate.splprep([xxx, yyy], s=3, k=2, nest=-1)
xpointsnew, ypointsnew = interpolate.splev(np.linspace(0,1,270), tckp)
# prints spline variables
print(tckp)
# plots the spline
plt.plot(xpointsnew, ypointsnew, 'r-')
plt.show()
My broader project is to follow the approach outlined in A novel method for straightening curved text-lines in stylistic documents. That article is reasonably detailed in finding the line that describes curved text, but much less so where straightening the curve is concerned. I have trouble visualizing the only reference to straightening that I see is in the abstract:
find the angle between the normal at a point on the curve and the vertical line, and finally visit each point on the text and rotate by their corresponding angles.
I also found Geometric warp of image in python, which seems promising. If I could rectify the spline, I think that would allow me to set a range of target points for the affine transform to map to. Unfortunately, I haven't found an approach to rectify my spline and test it.
Finally, this program implements an algorithm to straighten splines, but the paper on the algorithm is behind a pay wall and I can't make sense of the javascript.
Basically, I'm lost and in need of pointers.
Update
The affine transformation was the only approach I had any idea how to start exploring, so I've been working on that since I posted. I generated a set of destination coordinates by performing an approximate rectification of the curve based on the euclidean distance between points on my b-spline.
From where the last code block left off:
# calculate euclidian distances between adjacent points on the curve
newcoordinates = np.c_[xpointsnew, ypointsnew]
l = len(newcoordinates) - 1
pointsteps = []
for index, obj in enumerate(newcoordinates):
if index < l:
ord1 = np.c_[newcoordinates[index][0], newcoordinates[index][1]]
ord2 = np.c_[newcoordinates[index + 1][0], newcoordinates[index + 1][1]]
length = spatial.distance.cdist(ord1, ord2)
pointsteps.append(length)
# calculate euclidian distance between first point and each consecutive point
xpositions = np.asarray(pointsteps).cumsum()
# compose target coordinates for the line after the transform
targetcoordinates = [(0,0),]
for element in xpositions:
targetcoordinates.append((element, 0))
# perform affine transformation with newcoordinates as control points and targetcoordinates as target coordinates
tform = PiecewiseAffineTransform()
tform.estimate(newcoordinates, targetcoordinates)
I'm presently hung up on errors with the affine transform (scipy.spatial.qhull.QhullError: QH6154 Qhull precision error: Initial simplex is flat (facet 1 is coplanar with the interior point)
), but I'm not sure whether it's because of a problem with how I'm feeding the data in, or because I'm abusing the transform to do my projection.
I got the same error with you when using scipy.spatial.ConvexHull.
First, let me explain my project: what i wanted to do is to segment the people from its background(image matting). In my code, first I read an image and a trimap, then according to the trimap, I segment the original image to foreground, bakground and unknown pixels. Here is part of the coed:
img = scipy.misc.imread('sweater_black.png') #color_image
trimap = scipy.misc.imread('sw_trimap.png', flatten='True') #trimap
bg = trimap == 0 #background
fg = trimap == 255 #foreground
unknown = True ^ np.logical_or(fg,bg) #unknown pixels
fg_px = img[fg] #here i got the rgb value of the foreground pixels,then send them to the ConvexHull
fg_hull = scipy.spatial.ConvexHull(fg_px)
But i got an error here.So I check the Array of fg_px and then I found this array is n*4. which means every scalar i send to ConvexHull has four values. Howerver, the input of ConvexHUll should be 3 dimension.
I source my error and found that the input color image is 32bits(rgb channel and alpha channel) which means it has an alpha channel. After transferring the image to 24 bit (which means only rgb channels), the code works.
In one sentence, the input of ConvexHull should be b*4, so check your input data! Hope this works for you~
I have a set of points in an example ASCII file showing a 2D image.
I would like to estimate the total area that these points are filling. There are some places inside this plane that are not filled by any point because these regions have been masked out. What I guess might be practical for estimating the area would be applying a concave hull or alpha shapes.
I tried this approach to find an appropriate alpha value, and consequently estimate the area.
from shapely.ops import cascaded_union, polygonize
import shapely.geometry as geometry
from scipy.spatial import Delaunay
import numpy as np
import pylab as pl
from descartes import PolygonPatch
from matplotlib.collections import LineCollection
def plot_polygon(polygon):
fig = pl.figure(figsize=(10,10))
ax = fig.add_subplot(111)
margin = .3
x_min, y_min, x_max, y_max = polygon.bounds
ax.set_xlim([x_min-margin, x_max+margin])
ax.set_ylim([y_min-margin, y_max+margin])
patch = PolygonPatch(polygon, fc='#999999',
ec='#000000', fill=True,
zorder=-1)
ax.add_patch(patch)
return fig
def alpha_shape(points, alpha):
if len(points) < 4:
# When you have a triangle, there is no sense
# in computing an alpha shape.
return geometry.MultiPoint(list(points)).convex_hull
def add_edge(edges, edge_points, coords, i, j):
"""
Add a line between the i-th and j-th points,
if not in the list already
"""
if (i, j) in edges or (j, i) in edges:
# already added
return
edges.add( (i, j) )
edge_points.append(coords[ [i, j] ])
coords = np.array([point.coords[0]
for point in points])
tri = Delaunay(coords)
edges = set()
edge_points = []
# loop over triangles:
# ia, ib, ic = indices of corner points of the
# triangle
for ia, ib, ic in tri.vertices:
pa = coords[ia]
pb = coords[ib]
pc = coords[ic]
# Lengths of sides of triangle
a = np.sqrt((pa[0]-pb[0])**2 + (pa[1]-pb[1])**2)
b = np.sqrt((pb[0]-pc[0])**2 + (pb[1]-pc[1])**2)
c = np.sqrt((pc[0]-pa[0])**2 + (pc[1]-pa[1])**2)
# Semiperimeter of triangle
s = (a + b + c)/2.0
# Area of triangle by Heron's formula
area = np.sqrt(s*(s-a)*(s-b)*(s-c))
circum_r = a*b*c/(4.0*area)
# Here's the radius filter.
#print circum_r
if circum_r < 1.0/alpha:
add_edge(edges, edge_points, coords, ia, ib)
add_edge(edges, edge_points, coords, ib, ic)
add_edge(edges, edge_points, coords, ic, ia)
m = geometry.MultiLineString(edge_points)
triangles = list(polygonize(m))
return cascaded_union(triangles), edge_points
points=[]
with open("test.asc") as f:
for line in f:
coords=map(float,line.split(" "))
points.append(geometry.shape(geometry.Point(coords[0],coords[1])))
print geometry.Point(coords[0],coords[1])
x = [p.x for p in points]
y = [p.y for p in points]
pl.figure(figsize=(10,10))
point_collection = geometry.MultiPoint(list(points))
point_collection.envelope
convex_hull_polygon = point_collection.convex_hull
_ = plot_polygon(convex_hull_polygon)
_ = pl.plot(x,y,'o', color='#f16824')
concave_hull, edge_points = alpha_shape(points, alpha=0.001)
lines = LineCollection(edge_points)
_ = plot_polygon(concave_hull)
_ = pl.plot(x,y,'o', color='#f16824')
I get this result but I would like that this method could detect the hole in the middle.
Update
This is how my real data looks like:
My question is what is the best way to estimate an area of the aforementioned shape? I can not figure out what has gone wrong that this code doesn't work properly?!! Any help will be appreciated.
Okay, here's the idea. A Delaunay triangulation is going to generate triangles which are indiscriminately large. It's also going to be problematic because only triangles will be generated.
Therefore, we'll generate what you might call a "fuzzy Delaunay triangulation". We'll put all the points into a kd-tree and, for each point p, look at its k nearest neighbors. The kd-tree makes this fast.
For each of those k neighbors, find the distance to the focal point p. Use this distance to generate a weighting. We want nearby points to be favored over more distant points, so an exponential function exp(-alpha*dist) is appropriate here. Use the weighted distances to build a probability density function describing the probability of drawing each point.
Now, draw from that distribution a large number of times. Nearby points will be chosen often while farther away points will be chosen less often. For point drawn, make a note of how many times it was drawn for the focal point. The result is a weighted graph where each edge in the graph connects nearby points and is weighted by how often the pairs were chosen.
Now, cull all edges from the graph whose weights are too small. These are the points which are probably not connected. The result looks like this:
Now, let's throw all of the remaining edges into shapely. We can then convert the edges into very small polygons by buffering them. Like so:
Differencing the polygons with a large polygon covering the entire region will yield polygons for the triangulation. THIS MAY TAKE A WHILE. The result looks like this:
Finally, cull off all of the polygons which are too large:
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
import random
import scipy
import scipy.spatial
import networkx as nx
import shapely
import shapely.geometry
import matplotlib
dat = np.loadtxt('test.asc')
xycoors = dat[:,0:2]
xcoors = xycoors[:,0] #Convenience alias
ycoors = xycoors[:,1] #Convenience alias
npts = len(dat[:,0]) #Number of points
dist = scipy.spatial.distance.euclidean
def GetGraph(xycoors, alpha=0.0035):
kdt = scipy.spatial.KDTree(xycoors) #Build kd-tree for quick neighbor lookups
G = nx.Graph()
npts = np.max(xycoors.shape)
for x in range(npts):
G.add_node(x)
dist, idx = kdt.query(xycoors[x,:], k=10) #Get distances to neighbours, excluding the cenral point
dist = dist[1:] #Drop central point
idx = idx[1:] #Drop central point
pq = np.exp(-alpha*dist) #Exponential weighting of nearby points
pq = pq/np.sum(pq) #Convert to a PDF
choices = np.random.choice(idx, p=pq, size=50) #Choose neighbors based on PDF
for c in choices: #Insert neighbors into graph
if G.has_edge(x, c): #Already seen neighbor
G[x][c]['weight'] += 1 #Strengthen connection
else:
G.add_edge(x, c, weight=1) #New neighbor; build connection
return G
def PruneGraph(G,cutoff):
newg = G.copy()
bad_edges = set()
for x in newg:
for k,v in newg[x].items():
if v['weight']<cutoff:
bad_edges.add((x,k))
for b in bad_edges:
try:
newg.remove_edge(*b)
except nx.exception.NetworkXError:
pass
return newg
def PlotGraph(xycoors,G,cutoff=6):
xcoors = xycoors[:,0]
ycoors = xycoors[:,1]
G = PruneGraph(G,cutoff)
plt.plot(xcoors, ycoors, "o")
for x in range(npts):
for k,v in G[x].items():
plt.plot((xcoors[x],xcoors[k]),(ycoors[x],ycoors[k]), 'k-', lw=1)
plt.show()
def GetPolys(xycoors,G):
#Get lines connecting all points in the graph
xcoors = xycoors[:,0]
ycoors = xycoors[:,1]
lines = []
for x in range(npts):
for k,v in G[x].items():
lines.append(((xcoors[x],ycoors[x]),(xcoors[k],ycoors[k])))
#Get bounds of region
xmin = np.min(xycoors[:,0])
xmax = np.max(xycoors[:,0])
ymin = np.min(xycoors[:,1])
ymax = np.max(xycoors[:,1])
mls = shapely.geometry.MultiLineString(lines) #Bundle the lines
mlsb = mls.buffer(2) #Turn lines into narrow polygons
bbox = shapely.geometry.box(xmin,ymin,xmax,ymax) #Generate background polygon
polys = bbox.difference(mlsb) #Subtract to generate polygons
return polys
def PlotPolys(polys,area_cutoff):
fig, ax = plt.subplots(figsize=(8, 8))
for polygon in polys:
if polygon.area<area_cutoff:
mpl_poly = matplotlib.patches.Polygon(np.array(polygon.exterior), alpha=0.4, facecolor=np.random.rand(3,1))
ax.add_patch(mpl_poly)
ax.autoscale()
fig.show()
#Functional stuff starts here
G = GetGraph(xycoors, alpha=0.0035)
#Choose a value that rips off an appropriate amount of the left side of this histogram
weights = sorted([v['weight'] for x in G for k,v in G[x].items()])
plt.hist(weights, bins=20);plt.show()
PlotGraph(xycoors,G,cutoff=6) #Plot the graph to ensure our cut-offs were okay. May take a while
prunedg = PruneGraph(G,cutoff=6) #Prune the graph
polys = GetPolys(xycoors,prunedg) #Get polygons from graph
areas = sorted(p.area for p in polys)
plt.plot(areas)
plt.hist(areas,bins=20);plt.show()
area_cutoff = 150000
PlotPolys(polys,area_cutoff=area_cutoff)
good_polys = ([p for p in polys if p.area<area_cutoff])
total_area = sum([p.area for p in good_polys])
Here's a thought: use k-means clustering.
You can accomplish this in Python as follows:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
dat = np.loadtxt('test.asc')
xycoors = dat[:,0:2]
fit = KMeans(n_clusters=2).fit(xycoors)
plt.scatter(dat[:,0],dat[:,1], c=fit.labels_)
plt.axes().set_aspect('equal', 'datalim')
plt.gray()
plt.show()
Using your data, this gives the following result:
Now, you can take the convex hull of the top cluster and the bottom cluster and calculate the areas of each separately. Adding the areas then becomes an estimator of the area of their union, but, cunningly, avoids the hole in the middle.
To fine-tune your results, you can play with the number of clusters and the number of different starts to the algorithm (the algorithm is randomized and is typically run more than once).
You asked, for instance, if two clusters will always leave the hole in the middle. I've used the following code to experiment with that. I generate a uniform distribution of points and then chop out a randomly sized and orientated ellipse to simulate a hole.
#!/usr/bin/env python3
import sklearn
import sklearn.cluster
import numpy as np
import matplotlib.pyplot as plt
PWIDTH = 6
PHEIGHT = 6
def GetPoints(num):
return np.random.rand(num,2)*300-150 #Centered about zero
def MakeHole(pts): #Chop out a randomly orientated and sized ellipse
a = np.random.uniform(10,150) #Semi-major axis
b = np.random.uniform(10,150) #Semi-minor axis
h = np.random.uniform(-150,150) #X-center
k = np.random.uniform(-150,150) #Y-center
A = np.random.uniform(0,2*np.pi) #Angle of rotation
surviving_points = []
for pt in range(pts.shape[0]):
x = pts[pt,0]
y = pts[pt,1]
if ((x-h)*np.cos(A)+(y-k)*np.sin(A))**2/a/a+((x-h)*np.sin(A)-(y-k)*np.cos(A))**2/b/b>1:
surviving_points.append(pt)
return pts[surviving_points,:]
def ShowManyClusters(pts,fitter,clusters,title):
colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
fig,axs = plt.subplots(PWIDTH,PHEIGHT)
axs = axs.ravel()
for i in range(PWIDTH*PHEIGHT):
lbls = fitter(pts[i],clusters)
axs[i].scatter(pts[i][:,0],pts[i][:,1], c=colors[lbls])
axs[i].get_xaxis().set_ticks([])
axs[i].get_yaxis().set_ticks([])
plt.suptitle(title)
#plt.show()
plt.savefig('/z/'+title+'.png')
fitters = {
'SpectralClustering': lambda x,clusters: sklearn.cluster.SpectralClustering(n_clusters=clusters,affinity='nearest_neighbors').fit(x).labels_,
'KMeans': lambda x,clusters: sklearn.cluster.KMeans(n_clusters=clusters).fit(x).labels_,
'AffinityPropagation': lambda x,clusters: sklearn.cluster.AffinityPropagation().fit(x).labels_,
}
np.random.seed(1)
pts = []
for i in range(PWIDTH*PHEIGHT):
temp = GetPoints(300)
temp = MakeHole(temp)
pts.append(temp)
for name,fitter in fitters.items():
for clusters in [2,3]:
np.random.seed(1)
ShowManyClusters(pts,fitter,clusters,"{0}: {1} clusters".format(name,clusters))
Consider the results for K-Means:
At least to my eye, it seems as though using two clusters performs worst when the "hole" separates the data into two separate blobs. (In this case that occurs when the ellipse is orientated such that it overlaps two edges of the rectangular region containing the sample points.) Using three clusters resolves most of these difficulties.
You'll also notice that K-means produces some counter-intuitive results on the 1st Column, 3rd Row as well as on the 3rd Column, 4th Row. Reviewing sklearn's menagerie of clustering methods here shows the following comparison image:
From this, image it seems as though SpectralClustering produces results that align with what we want. Trying this on the same data above fixes the problems mentioned (see 1st Column, 3rd Row and 3rd Column, 4th Row).
The foregoing suggests that Spectral clustering with three clusters should be adequate for most situations of this sort.
Although you seem intent on doing a concave shape, here is an alternate route that is hella fast and I think would give you very a pretty stable reading:
Create a function which takes as an argument (int radiusOfInfluence). Inside the function run a voxel filter with that as the radius. Then simply multiply the area of that circle (pi*AOI^2) by the number of remaining points in the cloud. This should give you a relatively robust estimation of area and would be very resilient to holes and weird edges.
Some things to consider:
-This will give you a positive overshoot of area due to over-reaching edges by exactly one radius. A modification to adjust for this could be to run a statistical outlier removal filter (in inverse mode) to acquire statistical edge points. Then an assumption can be made that approximately half of each edge point is lying outside the shape, subtract half the number of points found from your total count prior to multiplying into area.
-The radius of influence largely determines this function's hole detection as a larger one will allow single points to cover larger areas, but also by tuning the std cutoff on the stat outlier filter, you can more aggressively detect interior holes and adjust your area that way.
It really begs the question of what you are after, as this is more of a shot accuracy/ shot grouping type assessment assuming a reasonably distributed set of samples. Your method kinda is making the assumption that your outer edge points are the absolute limits of what is possible (which may be a fair assumption depending on the situation)
EDIT-----------------------
I do not have time to write out example code, but I can further explain to aid in understanding.
At the core of this is the voxel filter. Very simply, it sets a seed point in x,y coordinates and then creates a grid over the whole space which has units (grid spacing) on both axes of a user specified filter radius. Inside each grid box, it will average all points to a single point. This is very important for this concept because it almost entirely eliminates the issue of overlap.
The second part (the inverse stat outlier removal) is just a bit of cleverness to tighten your edge fit. Basically, stat outlier is built to remove noise by looking at the distance from each point to its (k) nearest neighbors. After generating the average distance to k nearest neighbors for each point, it sets up a histogram and a user defined parameter acts as a binary threshold for keeping or removing points. When inverted and set to a reasonable cutt-off (~0.75 std should work), instead it will delete all the points that are in the bulk of the object (ie only leaving edge points). The reason this is important is that technically these points are over-reaching the boundary of your object by 1 radius. Although some will be on acute and some on obtuse edge angles (ie more than or less than half a circle of overfill) taking off 1/2 of a circle area per point should over the whole object give you a pretty sound improvement on edge fit.
Keep in mind though that at the end of the day, this is just going to give you a number. As far as stress testing, I suggest creating contrived point clouds of known area and or creating a graphical output that shows where you are dropping circles and half circles (oriented towards the interior of the object if you are fancy).
The knobs you will want to turn to improve this method are:
Voxel filter radius, area of influence per point (could actually be controlled separately from vox filter radius, though they should remain pretty close to one another), std cutt-off.
Hope this helped to clarify, cheers!
Edit:
I have noticed that you have your own code to compute the alpha shape,
and the areas of Delaunay triangles are just there, so computing the area of the shape is even easier...
Just add the areas of triangles, if triangle is going to be added to the alpha-shape polygon.
If you want to detect holes... add a secondary threshold to avoid adding triangles with an area greater than the threshold. For this example, a value of max_area = 99999 will remove the hole.
The only problem is the way you create the graphic output, because you will not see the hole.
def alpha_shape(points, alpha, max_area):
if len(points) < 4:
# When you have a triangle, there is no sense
# in computing an alpha shape.
return geometry.MultiPoint(list(points)).convex_hull , 0
def add_edge(edges, edge_points, coords, i, j):
"""
Add a line between the i-th and j-th points,
if not in the list already
"""
if (i, j) in edges or (j, i) in edges:
# already added
return
edges.add( (i, j) )
edge_points.append(coords[ [i, j] ])
coords = np.array([point.coords[0]
for point in points])
tri = Delaunay(coords)
total_area = 0
edges = set()
edge_points = []
# loop over triangles:
# ia, ib, ic = indices of corner points of the
# triangle
for ia, ib, ic in tri.vertices:
pa = coords[ia]
pb = coords[ib]
pc = coords[ic]
# Lengths of sides of triangle
a = np.sqrt((pa[0]-pb[0])**2 + (pa[1]-pb[1])**2)
b = np.sqrt((pb[0]-pc[0])**2 + (pb[1]-pc[1])**2)
c = np.sqrt((pc[0]-pa[0])**2 + (pc[1]-pa[1])**2)
# Semiperimeter of triangle
s = (a + b + c)/2.0
# Area of triangle by Heron's formula
area = np.sqrt(s*(s-a)*(s-b)*(s-c))
circum_r = a*b*c/(4.0*area)
# Here's the radius filter.
# print("radius", circum_r)
if circum_r < 1.0/alpha and area < max_area:
add_edge(edges, edge_points, coords, ia, ib)
add_edge(edges, edge_points, coords, ib, ic)
add_edge(edges, edge_points, coords, ic, ia)
total_area += area
m = geometry.MultiLineString(edge_points)
triangles = list(polygonize(m))
return cascaded_union(triangles), edge_points, total_area
The
Old answer:
To compute the area of an irregular simple polygon, you can use the Shoelace formula, and the CCW coordinates of the boundary as input.
If you want to detect holes inside of your cloud, you have to remove the Delaunay triangles with a circumradius greater that a secondary threshold.
The ideal is: Compute the Delaunay triangulation and filter with your current alpha shape. Then, compute the circumradius of every triangle and remove those triangles with circumradius much bigger than average circumradius.
To compute the area of an irregular polygon with holes, use the Shoelace formula for each hole boundary. Input the external boundary in CCW (positive) order to obtain the area. Then input the boundary of each hole in CW (negative) order, to obtain a (negative) value for area.
I need to fit several thousand 2D gaussian functions to star profiles (14x14 pixel blocks) on CCD image and get centroid coordinates, FWHM over long and short axes and angle of rotation of the long axis. The problem is that my current code takes too long to execute. Several 10s of seconds on i7 processor and I need it to make much faster. Preferably as fast as possible. I tested several gaussian fitting functions and it appears that the one used in AsPyLib is the fastest http://www.aspylib.com/doc/aspylib_fitting.html
Below is the code that I'm trying to make run faster. Profiling showed that most of the time is spent inside mplfit function.So my question is if this can be accelerated? I tried cythonizing the code but it provided really minor boost. Calculating moments (10x faster) is not suitable for many images that I have due to noise issues which make estimates unreliable. And probably due to the fact that star profiles are often far from being gaussian due to aberrations.
Multiprocessing wasn't a solution either as new process creation overhead was too high for fitting just 1 star profile.
Any ideas where to look further?
import numpy as np
from scipy.optimize import leastsq
def fit_gauss_elliptical(xy, data):
"""
---------------------
Purpose
Fitting a star with a 2D elliptical gaussian PSF.
---------------------
Inputs
* xy (list) = list with the form [x,y] where x and y are the integer positions in the complete image of the first pixel (the one with x=0 and y=0) of the small subimage that is used for fitting.
* data (2D Numpy array) = small subimage, obtained from the full FITS image by slicing. It must contain a single object : the star to be fitted, placed approximately at the center.
---------------------
Output (list) = list with 8 elements, in the form [maxi, floor, height, mean_x, mean_y, fwhm_small, fwhm_large, angle]. The list elements are respectively:
- maxi is the value of the star maximum signal,
- floor is the level of the sky background (fit result),
- height is the PSF amplitude (fit result),
- mean_x and mean_y are the star centroid x and y positions, on the full image (fit results),
- fwhm_small is the smallest full width half maximum of the elliptical gaussian PSF (fit result) in pixels
- fwhm_large is the largest full width half maximum of the elliptical gaussian PSF (fit result) in pixels
- angle is the angular direction of the largest fwhm, measured clockwise starting from the vertical direction (fit result) and expressed in degrees. The direction of the smallest fwhm is obtained by adding 90 deg to angle.
---------------------
"""
#find starting values
dat=data.flatten()
maxi = data.max()
floor = np.ma.median(dat)
height = maxi - floor
if height==0.0: #if star is saturated it could be that median value is 32767 or 65535 --> height=0
floor = np.mean(dat)
height = maxi - floor
mean_x = (np.shape(data)[0]-1)/2
mean_y = (np.shape(data)[1]-1)/2
fwhm = np.sqrt(np.sum((data>floor+height/2.).flatten()))
fwhm_1 = fwhm
fwhm_2 = fwhm
sig_1 = fwhm_1 / (2.*np.sqrt(2.*np.log(2.)))
sig_2 = fwhm_2 / (2.*np.sqrt(2.*np.log(2.)))
angle = 0.
p0 = floor, height, mean_x, mean_y, sig_1, sig_2, angle
#---------------------------------------------------------------------------------
#fitting gaussian
def gauss(floor, height, mean_x, mean_y, sig_1, sig_2, angle):
A = (np.cos(angle)/sig_1)**2. + (np.sin(angle)/sig_2)**2.
B = (np.sin(angle)/sig_1)**2. + (np.cos(angle)/sig_2)**2.
C = 2.0*np.sin(angle)*np.cos(angle)*(1./(sig_1**2.)-1./(sig_2**2.))
#do not forget factor 0.5 in exp(-0.5*r**2./sig**2.)
return lambda x,y: floor + height*np.exp(-0.5*(A*((x-mean_x)**2)+B*((y-mean_y)**2)+C*(x-mean_x)*(y-mean_y)))
def err(p,data):
return np.ravel(gauss(*p)(*np.indices(data.shape))-data)
p = leastsq(err, p0, args=(data), maxfev=200)
p = p[0]
#---------------------------------------------------------------------------------
#formatting results
floor = p[0]
height = p[1]
mean_x = p[2] + xy[0]
mean_y = p[3] + xy[1]
#angle gives the direction of the p[4]=sig_1 axis, starting from x (vertical) axis, clockwise in direction of y (horizontal) axis
if np.abs(p[4])>np.abs(p[5]):
fwhm_large = np.abs(p[4]) * (2.*np.sqrt(2.*np.log(2.)))
fwhm_small = np.abs(p[5]) * (2.*np.sqrt(2.*np.log(2.)))
angle = np.arctan(np.tan(p[6]))
else: #then sig_1 is the smallest : we want angle to point to sig_y, the largest
fwhm_large = np.abs(p[5]) * (2.*np.sqrt(2.*np.log(2.)))
fwhm_small = np.abs(p[4]) * (2.*np.sqrt(2.*np.log(2.)))
angle = np.arctan(np.tan(p[6]+np.pi/2.))
output = [maxi, floor, height, mean_x, mean_y, fwhm_small, fwhm_large, angle]
return output
I have come across this method before but not implemented it:
http://www.nature.com/nmeth/journal/v9/n7/full/nmeth.2071.html
Might be worth a look as a short-cut method. There is even code to copy on his website http://physics-server.uoregon.edu/~raghu/particle_tracking.html