Fit a line segment to a set of points

I'm trying to fit a line segment to a set of points, but I'm having trouble finding an algorithm for it. I have a 2D line segment L and a set of 2D points C. L can be represented in any suitable way (I don't care): support and direction vector, two points, a linear equation with left and right bounds, ... The only important thing is that the line has a beginning and an end, so it's not infinite.
I want to fit L to C so that the sum of the distances from each point c in C to L is minimized. This is a least squares problem, but I think I cannot use polynomial fitting because L is only a segment. My mathematical knowledge in this area is a bit lacking, so any hints on further reading would be appreciated as well.
Here is an illustration of my problem:
The orange line should be fitted to the blue points so that the sum of squares of distances of each point to the line is minimal. I don't mind if the solution is in a different language or not code at all, as long as I can extract an algorithm from it.
Since this is more of a mathematical question I'm not sure if it's ok for SO or should be moved to cross validated or math exchange.

This solution is relatively similar to one already posted here, but I think is slightly more efficient, elegant and understandable, which is why I posted it despite the similarity.
As was already written, the min(max(...)) formulation makes it hard to solve this problem analytically, which is why scipy.optimize fits well.
The solution is based on the mathematical formulation for distance between a point and a finite line segment outlined in https://math.stackexchange.com/questions/330269/the-distance-from-a-point-to-a-line-segment
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize, NonlinearConstraint

def calc_distance_from_point_set(v_):
    # v_ is accepted as a 1D array to make it easier to use with scipy.optimize
    # Reshape into two points (P is the global point array defined below)
    v = (v_[:2].reshape(2, 1), v_[2:].reshape(2, 1))
    # Calculate t* for s(t*) = v_0 + t*(v_1-v_0), for the line segment w.r.t. each point
    t_star_matrix = np.minimum(np.maximum(np.matmul(P-v[0].T, v[1]-v[0]) / np.linalg.norm(v[1]-v[0])**2, 0), 1)
    # Calculate s(t*)
    s_t_star_matrix = v[0]+((t_star_matrix.ravel())*(v[1]-v[0]))
    # Take distance between all points and respective point on segment
    distance_from_every_point = np.linalg.norm(P.T -s_t_star_matrix, axis=0)
    return np.sum(distance_from_every_point)

if __name__ == '__main__':
    # Random points from bounding box
    box_1 = np.random.uniform(-5, 5, 20)
    box_2 = np.random.uniform(-5, 5, 20)
    P = np.stack([box_1, box_2], axis=1)
    segment_length = 3
    # Equality constraint: the distance between the two end points equals segment_length
    segment_length_constraint = NonlinearConstraint(fun=lambda x: np.linalg.norm(np.array([x[0], x[1]]) - np.array([x[2] ,x[3]])), lb=[segment_length], ub=[segment_length])
    point = minimize(calc_distance_from_point_set, (0.0, 0.0, 1.0, 1.0), options={'maxiter': 100, 'disp': True}, constraints=segment_length_constraint).x
    plt.scatter(box_1, box_2)
    plt.plot([point[0], point[2]], [point[1], point[3]])
    plt.show()
Example result:

Here is a proposal in Python. The distance between the points and the line is computed using the approach proposed here: Fit a line segment to a set of points
Because the segment has a finite length, we have to use min and max functions, or if-tests, to decide whether to use the perpendicular distance or the distance to one of the end points. This makes it really difficult (impossible?) to get an analytic solution.
The proposed solution therefore uses an optimization algorithm to approximate the best solution. It uses scipy.optimize.minimize, see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
Since the segment length is fixed, we have only three degrees of freedom. In the proposed solution I use the x and y coordinates of the segment's starting point and the segment slope as free parameters. I use the getCoordinates function to get the starting and ending points of the segment from these 3 parameters and the length.
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt
import math as m
from scipy.spatial import distance
# Plot the points and the segment
def plotFunction(points,x1,x2):
    'Plotting function for plane and iterations'
    plt.plot(points[:,0],points[:,1],'ro')
    plt.plot([x1[0],x2[0]],[x1[1],x2[1]])
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    plt.show()
# Get the sum of the distances between all the points and the segment
# The segment is defined by guess and length, where:
# guess[0]=x coordinate of the starting point
# guess[1]=y coordinate of the starting point
# guess[2]=slope
# Since each distance is always >=0, there is no need to use root mean square values
def getDist(guess,points,length):
    start_pt=np.array([guess[0],guess[1]])
    slope=guess[2]
    [x1,x2]=getCoordinates(start_pt,slope,length)
    total_dist=0
    # Loop over each point to get the distance between the point and the segment
    for pt in points:
        total_dist+=minimum_distance(x1,x2,pt,length)
    return total_dist
# Return minimum distance between line segment x1-x2 and point pt
# Adapted from https://stackoverflow.com/questions/849211/shortest-distance-between-a-point-and-a-line-segment
def minimum_distance(x1, x2, pt, length):
    length2 = length**2  # i.e. |x1-x2|^2; we reuse the length we already know to avoid re-computation
    if length2 == 0.0:
        # Degenerate segment (x1 == x2): plain point-to-point distance
        return distance.euclidean(pt, x1)
    # Consider the line extending the segment, parameterized as x1 + t (x2 - x1).
    # We find the projection of point pt onto the line.
    # It falls where t = [(pt-x1) . (x2-x1)] / |x2-x1|^2
    # We clamp t to [0,1] to handle points outside the segment x1-x2.
    t = max(0, min(1, np.dot(pt - x1, x2 - x1) / length2))
    projection = x1 + t * (x2 - x1)  # Projection falls on the segment
    return distance.euclidean(pt, projection)
# Get coordinates of start and end point of the segment from start_pt,
# slope and length, obtained by solving slope=dy/dx, dx^2+dy^2=length^2
def getCoordinates(start_pt,slope,length):
    x1=start_pt
    dx=length/m.sqrt(slope**2+1)
    dy=slope*dx
    x2=start_pt+np.array([dx,dy])
    return [x1,x2]
if __name__ == '__main__':
    # Generate random points
    num_points=20
    points=np.random.rand(num_points,2)
    # Starting position
    length=0.5
    start_pt=np.array([0.25,0.5])
    slope=0
    # Use scipy.optimize.minimize to find the best start_pt and slope combination
    res = minimize(getDist, x0=[start_pt[0],start_pt[1],slope], args=(points,length), method="Nelder-Mead")
    # Retrieve best parameters
    start_pt=np.array([res.x[0],res.x[1]])
    slope=res.x[2]
    [x1,x2]=getCoordinates(start_pt,slope,length)
    print("\n** The best segment found is defined by:")
    print("\t** start_pt:\t",x1)
    print("\t** end_pt:\t",x2)
    print("\t** slope:\t",slope)
    # Note: the guess is built from the start point (x1[0], x1[1]) and the slope
    print("** The total distance is:",getDist([x1[0],x1[1],slope],points,length),"\n")
    # Plot results
    plotFunction(points,x1,x2)

Related

Underestimation of f(x) by using a piecewise linear function

I am trying to find out whether there is a Matlab/Python procedure to underestimate f(x) with a piecewise linear function g(x). That is, g(x) needs to be less than or equal to f(x). See the picture and code below. Could you please help me modify this code so that it underestimates the function?
x = 0.000000001:0.001:1;
y = abs(f(x));
%# Find section sizes, by using an inverse of the approximation of the derivative
numOfSections = 5;
totalRange = max(x(:))-min(x(:));
%# The relevant nodes
xNodes = x(1) + [ 0 cumsum(sectionSize)];
yNodes = abs(f(xNodes));
figure;plot(x,y);
hold on;
plot (xNodes,yNodes,'r');
scatter (xNodes,yNodes,'r');
legend('abs(f(x))','adaptive linear interpolation');

This approach is based on Luis Mendo's comment. The idea is the following:
Select a number of points from the original curve, your final piecewise linear curve will pass through these points
For each point calculate the equation of the tangent to the original curve. Because your graph is convex, the tangents of consecutive points in your sample will intersect below the curve
Calculate, for each set of consecutive tangents, the x-coordinate of the point of intersection. Use the equation of the tangent to calculate the corresponding y-coordinate
Now, after reordering the points, this gives you a piecewise linear approximation with the constraints you want.
h = 0.001;
x = 0.000000001:h:1;
y = abs(log2(x));
% Derivative of function on all the points
der = diff(y)/h;
NPts = 10; % Number of sample points
% Draw the index of the points by which the output will pass at random
% Still make sure you got first and last point
idx = randperm(length(x)-3,NPts-2);
idx = [1 idx+1 length(x)-1];
idx = sort(idx);
x_pckd = x(idx);
y_pckd = y(idx);
der_pckd = der(idx);
% Use obscure math to calculate the points of intersection
xder = der_pckd.*x_pckd;
x_2add = -(diff(y_pckd)-(diff(xder)))./diff(der_pckd);
y_2add = der_pckd(1:(end-1)).*(x_2add-(x_pckd(1:(end-1))))+y_pckd(1:(end-1));
% Calculate the error as the sum of the errors made at the middle points
Err_add = sum(abs(y_2add-interp1(x,y,x_2add)));
% Get final x and y coordinates of interpolant
x_final = [reshape([x_pckd(1:end-1);x_2add],1,[]) x_pckd(end)];
y_final = [reshape([y_pckd(1:end-1);y_2add],1,[]) y_pckd(end)];
figure;
plot(x,y,'-k');
hold on
plot(x_final,y_final,'-or')
You can see in my code that the points are drawn at random. If you want to do some sort of optimization (e.g. finding the set of points that minimizes the error), you can just run this many times and keep track of the best contender. For example, 10000 random draws produced this result:
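The repeated-draw idea can be sketched in Python. This is a minimal sketch, assuming f(x) = |log2(x)| on [lo, 1] as in the code above and using the analytic derivative instead of a finite difference. The function names, the lower bound lo and the error measure (summed gap on a grid) are illustrative choices, not part of the original answer:

```python
import numpy as np

def f(x):
    return np.abs(np.log2(x))

def df(x):
    # Analytic derivative of |log2(x)| = -log2(x) on (0, 1]
    return -1.0 / (x * np.log(2.0))

def tangent_underestimator(xs):
    """Nodes of the piecewise linear curve made of tangents at the points xs."""
    xs = np.sort(np.asarray(xs, dtype=float))
    ys, ds = f(xs), df(xs)
    c = ys - ds * xs                              # tangent intercepts: y = ds*x + c
    xi = (c[1:] - c[:-1]) / (ds[:-1] - ds[1:])    # intersections of consecutive tangents
    yi = ds[:-1] * xi + c[:-1]
    x_nodes = np.empty(2 * len(xs) - 1)
    y_nodes = np.empty_like(x_nodes)
    x_nodes[0::2], x_nodes[1::2] = xs, xi         # interleave samples and intersections
    y_nodes[0::2], y_nodes[1::2] = ys, yi
    return x_nodes, y_nodes

def best_of_random_draws(n_points=10, n_draws=200, lo=1e-3, hi=1.0, seed=0):
    """Draw sample points at random several times and keep the smallest total gap."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(lo, hi, 2000)
    best = None
    for _ in range(n_draws):
        inner = rng.uniform(lo, hi, n_points - 2)
        xn, yn = tangent_underestimator(np.concatenate(([lo], inner, [hi])))
        err = np.sum(f(grid) - np.interp(grid, xn, yn))
        if best is None or err < best[0]:
            best = (err, xn, yn)
    return best

err, x_nodes, y_nodes = best_of_random_draws()
print(err)
```

Because the function is convex, every tangent lies below it, so the interpolant through these nodes never exceeds f(x); the random search only decides how tight the underestimate is.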

Inverse FFT returns negative values when it should not

I have several points (x, y, z coordinates) in a 3D box with associated masses. I want to draw a histogram of the mass density found in spheres of a given radius R.
I have written code that, provided I did not make any errors (which I think I may have), works in the following way:
1. My "real" data is huge, so I wrote a little code to randomly generate non-overlapping points with arbitrary masses in a box.
2. I compute a 3D histogram (weighted by mass) with a bin size about 10 times smaller than the radius of my spheres.
3. I take the FFT of my histogram, compute the wave-modes (kx, ky and kz) and use them to multiply my histogram in Fourier space by the analytic expression of the 3D top-hat window (sphere filtering) function in Fourier space.
4. I inverse FFT my newly computed grid.
5. Drawing a 1D histogram of the values in each bin then gives me what I want.
My issue is the following: given what I do, there should not be any negative values in my inverted FFT grid (step 4), but I get some, with magnitudes much larger than the numerical error.
If I run my code on a small box (300x300x300 cm3, with the points separated by at least 1 cm) I do not get the issue. I do get it for 600x600x600 cm3 though.
If I set all the masses to 0, thus working on an empty grid, I do get back my 0 without any noted issues.
I here give my code in a full block so that it is easily copied.
import numpy as np
import matplotlib.pyplot as plt
import random
from numba import njit
# 1. Generate a bunch of points with masses from 1 to 3 separated by a radius of 1 cm
radius = 1
rangeX = (0, 100)
rangeY = (0, 100)
rangeZ = (0, 100)
rangem = (1,3)
qty = 20000 # or however many points you want
# Generate a set of all points within 1 of the origin, to be used as offsets later
deltas = set()
for x in range(-radius, radius+1):
    for y in range(-radius, radius+1):
        for z in range(-radius, radius+1):
            if x*x + y*y + z*z<= radius*radius:
                deltas.add((x,y,z))
X = []
Y = []
Z = []
M = []
excluded = set()
for i in range(qty):
    x = random.randrange(*rangeX)
    y = random.randrange(*rangeY)
    z = random.randrange(*rangeZ)
    m = random.uniform(*rangem)
    if (x,y,z) in excluded: continue
    X.append(x)
    Y.append(y)
    Z.append(z)
    M.append(m)
    excluded.update((x+dx, y+dy, z+dz) for (dx,dy,dz) in deltas)
print("There is ",len(X)," points in the box")
# Compute the 3D histogram
a = np.vstack((X, Y, Z)).T
b = 200
H, edges = np.histogramdd(a, weights=M, bins = b)
# Compute the FFT of the grid
Fh = np.fft.fftn(H, axes=(-3,-2, -1))
# Compute the different wave-modes
kx = 2*np.pi*np.fft.fftfreq(len(edges[0][:-1]))*len(edges[0][:-1])/(np.amax(X)-np.amin(X))
ky = 2*np.pi*np.fft.fftfreq(len(edges[1][:-1]))*len(edges[1][:-1])/(np.amax(Y)-np.amin(Y))
kz = 2*np.pi*np.fft.fftfreq(len(edges[2][:-1]))*len(edges[2][:-1])/(np.amax(Z)-np.amin(Z))
# I create a matrix containing the values of the filter in each point of the grid in Fourier space
R = 5
Kh = np.empty((len(kx),len(ky),len(kz)))
#njit(parallel=True)
def func_njit(kx, ky, kz, Kh):
    for i in range(len(kx)):
        for j in range(len(ky)):
            for k in range(len(kz)):
                if np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2) != 0:
                    Kh[i][j][k] = (np.sin((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)-(np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R*np.cos((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R))*3/((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)**3
                else:
                    Kh[i][j][k] = 1
    return Kh
Kh = func_njit(kx, ky, kz, Kh)
# I multiply each point of my grid by the associated value of the filter (multiplication in Fourier space = convolution in real space)
Gh = np.multiply(Fh, Kh)
# I take the inverse FFT of my filtered grid. I take the real part to get back floats but there should only be zeros for the imaginary part.
Density = np.real(np.fft.ifftn(Gh,axes=(-3,-2, -1)))
# Here it shows if there are negative values the magnitude of the error
print(np.min(Density))
D = Density.flatten()
N = np.mean(D)
# I then compute the histogram I want
hist, bins = np.histogram(D/N, bins='auto', density=True)
bin_centers = (bins[1:]+bins[:-1])*0.5
plt.plot(bin_centers, hist)
plt.xlabel('rho/rhom')
plt.ylabel('P(rho)')
plt.show()
Do you know why I'm getting these negative values? Do you think there is a simpler way to proceed?
Sorry if this is a very long post, I tried to make it very clear and will edit it with your comments, thanks a lot!
-EDIT-
A follow-up question on the issue can be found here.

The filter you create in the frequency domain is only an approximation to the filter you want to create. The problem is that we are dealing with the DFT here, not the continuous-domain FT (with its infinite frequencies). The Fourier transform of a ball is indeed the function you describe; however, this function has infinite extent in frequency: it is not band-limited!
By sampling this function only within a window, you are effectively multiplying it with an ideal low-pass filter (the rectangle of the domain). This low-pass filter, in the spatial domain, has negative values. Therefore, the filter you create also has negative values in the spatial domain.
This is a slice through the origin of the inverse transform of Kh (after I applied fftshift to move the origin to the middle of the image, for better display):
As you can tell here, there is some ringing that leads to negative values.
One way to overcome this ringing is to apply a windowing function in the frequency domain. Another option is to generate a ball in the spatial domain, and compute its Fourier transform. This second option would be the simplest to achieve. Do remember that the kernel in the spatial domain must also have the origin at the top-left pixel to obtain a correct FFT.
A windowing function is typically applied in the spatial domain to avoid issues with the image border when computing the FFT. Here, I propose to apply such a window in the frequency domain to avoid similar issues when computing the IFFT. Note, however, that this will always further reduce the bandwidth of the kernel (the windowing function would work as a low-pass filter after all), and therefore yield a smoother transition of foreground to background in the spatial domain (i.e. the spatial domain kernel will not have as sharp a transition as you might like). The best known windowing functions are Hamming and Hann windows, but there are many others worth trying out.
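The second option (build the ball in the spatial domain, then take its FFT) could look roughly like the sketch below. It assumes the radius is given in grid cells, normalises the kernel to unit sum, and uses the periodic wrap-around implied by the FFT. The function names ball_kernel and smooth_with_ball are made up for this illustration:

```python
import numpy as np

def ball_kernel(shape, radius_px):
    """Binary ball of radius radius_px (in cells) with its centre at index [0, 0, 0]."""
    # fftfreq(n, d=1/n) yields the offsets 0, 1, ..., -2, -1, i.e. wrap-around coordinates
    axes = [np.fft.fftfreq(n, d=1.0 / n) for n in shape]
    a, b, c = np.meshgrid(*axes, indexing="ij")
    ball = (a**2 + b**2 + c**2 <= radius_px**2).astype(float)
    return ball / ball.sum()

def smooth_with_ball(H, radius_px):
    """Circular convolution of the 3D grid H with the ball kernel, via the FFT."""
    K = np.fft.fftn(ball_kernel(H.shape, radius_px))
    return np.real(np.fft.ifftn(np.fft.fftn(H) * K))

# Tiny example: a single unit mass spread over a ball of radius 3 cells
H = np.zeros((16, 16, 16))
H[2, 3, 4] = 1.0
D = smooth_with_ball(H, 3)
print(D.min(), D.sum())
```

Since the spatial kernel is non-negative everywhere, the filtered grid can only go negative by floating-point rounding, not by ringing.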
Unsolicited advice:
I simplified your code to compute Kh to the following:
kr = np.sqrt(kx[:,None,None]**2 + ky[None,:,None]**2 + kz[None,None,:]**2)
kr *= R
Kh = (np.sin(kr)-kr*np.cos(kr))*3/(kr)**3
Kh[0,0,0] = 1
I find this easier to read than the nested loops. It should also be significantly faster, and avoid the need for njit. Note that you were computing the same distance (what I call kr here) 5 times. Factoring out such computation is not only faster, but yields more readable code.

Just a guess:
Where do you get the idea that the imaginary part MUST be zero? Have you ever tried to take the absolute values (sqrt(re^2 + im^2)) and forget about the phase instead of just taking the real part? Just something that came to my mind.

Straighten B-Spline

I've interpolated a spline to fit pixel data from an image with a curve that I would like to straighten. I'm not sure what tools are appropriate to solve this problem. Can someone recommend an approach?
Here's how I'm getting my spline:
import numpy as np
from skimage import io
from scipy import interpolate
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
import networkx as nx
# Read a skeletonized image, return an array of points on the skeleton, and divide them into x and y coordinates
skeleton = io.imread('skeleton.png')
curvepoints = np.where(skeleton==False)
xpoints = curvepoints[1]
ypoints = -curvepoints[0]
# reformats x and y coordinates into a 2-dimensional array
inputarray = np.c_[xpoints, ypoints]
# runs a nearest neighbors algorithm on the coordinate array
clf = NearestNeighbors(2).fit(inputarray)
G = clf.kneighbors_graph()
T = nx.from_scipy_sparse_matrix(G)
# sorts coordinates according to their nearest neighbors order
order = list(nx.dfs_preorder_nodes(T, 0))
xx = xpoints[order]
yy = ypoints[order]
# Loops over all points in the coordinate array as origin, determining which results in the shortest path
paths = [list(nx.dfs_preorder_nodes(T, i)) for i in range(len(inputarray))]
mindist = np.inf
minidx = 0
for i in range(len(inputarray)):
    p = paths[i] # order of nodes
    ordered = inputarray[p] # ordered nodes
    # find cost of that order by the sum of euclidean distances between points (i) and (i+1)
    cost = (((ordered[:-1] - ordered[1:])**2).sum(1)).sum()
    if cost < mindist:
        mindist = cost
        minidx = i
opt_order = paths[minidx]
xxx = xpoints[opt_order]
yyy = ypoints[opt_order]
# fits a spline to the ordered coordinates
tckp, u = interpolate.splprep([xxx, yyy], s=3, k=2, nest=-1)
xpointsnew, ypointsnew = interpolate.splev(np.linspace(0,1,270), tckp)
# prints spline variables
print(tckp)
# plots the spline
plt.plot(xpointsnew, ypointsnew, 'r-')
plt.show()
My broader project is to follow the approach outlined in A novel method for straightening curved text-lines in stylistic documents. That article is reasonably detailed about finding the line that describes curved text, but much less so where straightening the curve is concerned. I have trouble visualizing the procedure; the only reference to straightening that I see is in the abstract:
find the angle between the normal at a point on the curve and the vertical line, and finally visit each point on the text and rotate by their corresponding angles.
I also found Geometric warp of image in python, which seems promising. If I could rectify the spline, I think that would allow me to set a range of target points for the affine transform to map to. Unfortunately, I haven't found an approach to rectify my spline and test it.
Finally, this program implements an algorithm to straighten splines, but the paper on the algorithm is behind a pay wall and I can't make sense of the javascript.
Basically, I'm lost and in need of pointers.
Update
The affine transformation was the only approach I had any idea how to start exploring, so I've been working on that since I posted. I generated a set of destination coordinates by performing an approximate rectification of the curve based on the euclidean distance between points on my b-spline.
From where the last code block left off:
# imports needed in addition to the ones in the first code block
from scipy import spatial
from skimage.transform import PiecewiseAffineTransform
# calculate euclidean distances between adjacent points on the curve
newcoordinates = np.c_[xpointsnew, ypointsnew]
l = len(newcoordinates) - 1
pointsteps = []
for index, obj in enumerate(newcoordinates):
    if index < l:
        ord1 = np.c_[newcoordinates[index][0], newcoordinates[index][1]]
        ord2 = np.c_[newcoordinates[index + 1][0], newcoordinates[index + 1][1]]
        length = spatial.distance.cdist(ord1, ord2)
        pointsteps.append(length)
# calculate euclidean distance along the curve between the first point and each consecutive point
xpositions = np.asarray(pointsteps).cumsum()
# compose target coordinates for the line after the transform
targetcoordinates = [(0,0),]
for element in xpositions:
    targetcoordinates.append((element, 0))
# perform affine transformation with newcoordinates as control points and targetcoordinates as target coordinates
tform = PiecewiseAffineTransform()
tform.estimate(newcoordinates, targetcoordinates)
I'm presently hung up on errors with the affine transform (scipy.spatial.qhull.QhullError: QH6154 Qhull precision error: Initial simplex is flat (facet 1 is coplanar with the interior point)), but I'm not sure whether it's because of a problem with how I'm feeding the data in, or because I'm abusing the transform to do my projection.

I got the same error as you when using scipy.spatial.ConvexHull.
First, let me explain my project: I wanted to segment a person from the background (image matting). In my code, I first read an image and a trimap, and then, according to the trimap, I split the original image into foreground, background and unknown pixels. Here is part of the code:
img = scipy.misc.imread('sweater_black.png') #color_image
trimap = scipy.misc.imread('sw_trimap.png', flatten='True') #trimap
bg = trimap == 0 #background
fg = trimap == 255 #foreground
unknown = True ^ np.logical_or(fg,bg) #unknown pixels
fg_px = img[fg] #here i got the rgb value of the foreground pixels,then send them to the ConvexHull
fg_hull = scipy.spatial.ConvexHull(fg_px)
But I got an error here. So I checked the fg_px array and found that it is n*4, which means every sample I sent to ConvexHull had four values. However, the input of ConvexHull should have 3 dimensions here (n*3).
I traced the error and found that the input color image is 32-bit (RGB channels plus an alpha channel). After converting the image to 24-bit (RGB channels only), the code works.
In one sentence: the input of ConvexHull should be n*3, not n*4, so check your input data! Hope this works for you~

Distance between point and arc in 3D

I want to compute the distance between an arc and a point in 3D space. All I found is the distance between a circle and a point (link), which is either wrong or which I implemented incorrectly, as I get wrong values:
P = np.array([1,0,1])
center = np.array([0,0,0])
radius = 1
n2 = np.array([0,0,1])
Delta = P-center
dist_tmp = np.sqrt( (n2*Delta)**2 + (np.abs(np.cross(n2, Delta))-radius)**2 )
dist = np.linalg.norm(dist_tmp)
I have a circle in the x-y-plane with its center at x-y-z = 0 and radius = 1. The point of interest is at distance 1 above the circle. The distance computed by the code is 1.73..., not 1.
What is the right equation for point-circle distance?
How can I extend it to point-arc distance?

You have several errors in your code. Here is the answer to your first question.
First, you try to implement the dot product of n2 and Delta as n2*Delta, but that is not what the multiplication of two NumPy arrays does: it multiplies elementwise. Use np.dot() instead. Next, you try to take the "absolute value" (magnitude) of a vector with np.abs, but that function works elementwise and returns another vector, not a magnitude. One way to get the vector magnitude is np.linalg.norm(). Changing those gives you the proper answer, and you don't need the calculation you used for the variable dist. So use
Delta = P-center
dist = np.sqrt(np.dot(n2, Delta)**2 + (np.linalg.norm(np.cross(n2, Delta))- radius)**2)
That gives the proper answer for dist, 1.0.
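Wrapped as a reusable function, the corrected formula might look like the sketch below. The function name point_circle_distance and the normalisation of the normal vector are additions for illustration; the formula itself is the one given above:

```python
import numpy as np

def point_circle_distance(P, center, normal, radius):
    """Distance from point P to a circle in 3D given by center, plane normal and radius."""
    P = np.asarray(P, dtype=float)
    center = np.asarray(center, dtype=float)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)                 # assumption: accept non-unit normals
    delta = P - center
    height = np.dot(n, delta)                 # offset from the circle's plane
    radial = np.linalg.norm(np.cross(n, delta))  # distance from the circle's axis
    return np.hypot(height, radial - radius)

print(point_circle_distance([1, 0, 1], [0, 0, 0], [0, 0, 1], 1))
```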

Approximating data with a multi segment cubic bezier curve and a distance as well as a curvature constraint

I have some geo data (the image below shows the path of a river as red dots) which I want to approximate using a multi segment cubic bezier curve. Through other questions on stackoverflow here and here I found the algorithm by Philip J. Schneider from "Graphics Gems". I successfully implemented it and can report that even with thousands of points it is very fast. Unfortunately that speed comes with some disadvantages, namely that the fitting is done quite sloppily. Consider the following graphic:
The red dots are my original data and the blue line is the multi segment bezier created by the algorithm by Schneider. As you can see, the input to the algorithm was a tolerance which is at least as high as the green line indicates. Nevertheless, the algorithm creates a bezier curve which has too many sharp turns. You see two of these unnecessary sharp turns in the image. It is easy to imagine a bezier curve with less sharp turns for the shown data while still maintaining the maximum tolerance condition (just push the bezier curve a bit into the direction of the magenta arrows). The problem seems to be that the algorithm picks data points from my original data as end points of the individual bezier curves (the magenta arrows indicate some suspects). With the endpoints of the bezier curves restricted like that, it is clear that the algorithm will sometimes produce rather sharp curvatures.
What I am looking for is an algorithm which approximates my data with a multi segment bezier curve with two constraints:
the multi segment bezier curve must never be more than a certain distance away from the data points (this is provided by the algorithm by Schneider)
the multi segment bezier curve must never create curvatures that are too sharp. One way to check for this criterion would be to roll a circle with the minimum curvature radius along the multi segment bezier curve and check whether it touches all parts of the curve along its path. Though it seems there is a better method involving the cross product of the first and second derivative
The solutions I found which create better fits sadly either work only for single bezier curves (and omit the question of how to find good start and end points for each bezier curve in the multi segment bezier curve) or do not allow a minimum curvature constraint. I feel that the minimum curvature constraint is the tricky condition here.
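The cross-product check mentioned in the second constraint can be sketched for a single planar cubic bezier segment: the curvature is |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2), and the constraint is violated wherever it exceeds 1/r_min. The function names, the sample count n and the choice to sample the curve (rather than find the exact maximum) are assumptions for illustration:

```python
import numpy as np

def bezier_curvature(ctrl, t):
    """Unsigned curvature of a planar cubic bezier at the parameter values t."""
    P0, P1, P2, P3 = np.asarray(ctrl, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    # First and second derivative of the cubic bezier
    d1 = 3 * ((1 - t)**2 * (P1 - P0) + 2 * (1 - t) * t * (P2 - P1) + t**2 * (P3 - P2))
    d2 = 6 * ((1 - t) * (P2 - 2 * P1 + P0) + t * (P3 - 2 * P2 + P1))
    cross = d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]
    speed = np.hypot(d1[:, 0], d1[:, 1])   # assumed non-zero (no cusps)
    return np.abs(cross) / speed**3

def violates_min_radius(ctrl, r_min, n=200):
    """True if the sampled curvature exceeds 1/r_min anywhere on the segment."""
    kappa = bezier_curvature(ctrl, np.linspace(0.0, 1.0, n))
    return bool(np.any(kappa > 1.0 / r_min))

# Usual cubic approximation of a unit quarter circle: curvature stays close to 1
k = 0.5522847498
quarter = [(1, 0), (1, k), (k, 1), (0, 1)]
print(violates_min_radius(quarter, 0.5), violates_min_radius(quarter, 2.0))
```

For a multi segment curve the same check would be run on every segment.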
Here is another example (this is hand drawn and not 100% precise):
Let's suppose that figure one shows both the curvature constraint (the circle must fit along the whole curve) and the maximum distance of any data point from the curve (which happens to be the radius of the circle in green). A successful approximation of the red path in figure two is shown in blue. That approximation honors the curvature condition (the circle can roll inside the whole curve and touches it everywhere) as well as the distance condition (shown in green). Figure three shows a different approximation to the path. While it honors the distance condition, it is clear that the circle does not fit into the curvature any more. Figure four shows a path which is impossible to approximate with the given constraints because it is too pointy. This example is supposed to illustrate that to properly approximate some pointy turns in the path, it is necessary that the algorithm chooses control points which are not part of the path. Figure three shows that if control points along the path were chosen, the curvature constraint cannot be fulfilled anymore. This example also shows that the algorithm must quit on some inputs, as it is not possible to approximate them with the given constraints.
Does there exist a solution to this problem? The solution does not have to be fast. If it takes a day to process 1000 points, then that's fine. The solution does also not have to be optimal in the sense that it must result in a least squares fit.
In the end I will implement this in C and Python but I can read most other languages too.

I found a solution that fulfills my criteria. The solution is to first find a B-spline that approximates the points in the least squares sense and then convert that spline into a multi segment bezier curve. B-splines have the advantage that, in contrast to bezier curves, they will not pass through the control points, and they provide a way to specify a desired "smoothness" of the approximating curve. The needed functionality to generate such a spline is implemented in the FITPACK library, to which scipy offers a Python binding. Let's suppose I read my data into the lists x and y, then I can do:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
tck,u = interpolate.splprep([x,y],s=3)
unew = np.arange(0,1.01,0.01)
out = interpolate.splev(unew,tck)
plt.figure()
plt.plot(x,y,out[0],out[1])
plt.show()
The result then looks like this:
If I want the curve to be smoother, I can increase the s parameter of splprep. If I want the approximation closer to the data, I can decrease the s parameter for less smoothness. By going through multiple s parameters programmatically I can find a good parameter that fits the given requirements.
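A minimal sketch of such a scan, assuming the only requirement checked is the maximum distance between the data points and the curve. The helper names, the candidate list s_values and the sampled distance estimate are illustrative assumptions, not part of the original solution:

```python
import numpy as np
from scipy import interpolate

def max_distance_to_curve(x, y, tck, n_samples=2000):
    """Largest distance from any data point to the nearest sampled curve point."""
    cx, cy = interpolate.splev(np.linspace(0.0, 1.0, n_samples), tck)
    dx = np.asarray(x, dtype=float)[:, None] - cx[None, :]
    dy = np.asarray(y, dtype=float)[:, None] - cy[None, :]
    return np.sqrt(dx**2 + dy**2).min(axis=1).max()

def smoothest_spline_within(x, y, tol, s_values):
    """Return (s, tck, distance) for the largest candidate s that stays within tol."""
    best = None
    for s in sorted(s_values):
        tck, _ = interpolate.splprep([x, y], s=s)
        d = max_distance_to_curve(x, y, tck)
        if d <= tol:
            best = (s, tck, d)
    return best

# Synthetic data standing in for the lists x and y
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 60)
x = t
y = np.sin(t) + 0.02 * rng.standard_normal(t.size)
result = smoothest_spline_within(x, y, tol=0.1, s_values=[0.0, 0.01, 0.1, 1.0, 10.0])
print(result[0], result[2])
```

A curvature requirement could be checked in the same loop once the spline has been converted to bezier segments.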
The question though is how to convert that result into a bezier curve. The answer is in this email by Zachary Pincus. I will replicate his solution here to give a complete answer to my question:
def b_spline_to_bezier_series(tck, per = False):
    """Convert a parametric b-spline into a sequence of Bezier curves of the same degree.
    Inputs:
    tck : (t,c,k) tuple of b-spline knots, coefficients, and degree returned by splprep.
    per : if tck was created as a periodic spline, per *must* be true, else per *must* be false.
    Output:
    A list of Bezier curves of degree k that is equivalent to the input spline.
    Each Bezier curve is an array of shape (k+1,d) where d is the dimension of the
    space; thus the curve includes the starting point, the k-1 internal control
    points, and the endpoint, where each point is of d dimensions.
    """
    # The original email imported insert from the fitpack module inside scipy;
    # from user code it is available as scipy.interpolate.insert
    from scipy.interpolate import insert
    from numpy import asarray, unique, split, sum, transpose
    t,c,k = tck
    t = asarray(t)
    try:
        c[0][0]
    except:
        # I can't figure out a simple way to convert nonparametric splines to
        # parametric splines. Oh well.
        raise TypeError("Only parametric b-splines are supported.")
    new_tck = tck
    if per:
        # ignore the leading and trailing k knots that exist to enforce periodicity
        knots_to_consider = unique(t[k:-k])
    else:
        # the first and last k+1 knots are identical in the non-periodic case, so
        # no need to consider them when increasing the knot multiplicities below
        knots_to_consider = unique(t[k+1:-k-1])
    # For each unique knot, bring its multiplicity up to the next multiple of k+1
    # This removes all continuity constraints between each of the original knots,
    # creating a set of independent Bezier curves.
    desired_multiplicity = k+1
    for x in knots_to_consider:
        current_multiplicity = sum(t == x)
        remainder = current_multiplicity%desired_multiplicity
        if remainder != 0:
            # add enough knots to bring the current multiplicity up to the desired multiplicity
            number_to_insert = desired_multiplicity - remainder
            new_tck = insert(x, new_tck, number_to_insert, per)
    tt,cc,kk = new_tck
    # strip off the last k+1 knots, as they are redundant after knot insertion
    bezier_points = transpose(cc)[:-desired_multiplicity]
    if per:
        # again, ignore the leading and trailing k knots
        bezier_points = bezier_points[k:-k]
    # group the points into the desired bezier curves
    # (integer division: split needs an integer number of sections in Python 3)
    return split(bezier_points, len(bezier_points) // desired_multiplicity, axis = 0)
So B-Splines, FITPACK, numpy and scipy saved my day :)

1. polygonize data
   find the order of the points: look for the points closest to each other and try to connect them 'by lines'. Avoid looping back to the origin point
2. compute the derivative along the path
   it is the change of direction of the 'lines'; where you hit a local min or max, there is your control point ... Do this to reduce your input data (leave just the control points).
3. curve
   now use these points as control points. I strongly recommend an interpolation polynomial for x and y separately, for example something like this:
x=a0+a1*t+a2*t*t+a3*t*t*t
y=b0+b1*t+b2*t*t+b3*t*t*t
where a0..a3 are computed like this:
d1=0.5*(p2.x-p0.x);
d2=0.5*(p3.x-p1.x);
a0=p1.x;
a1=d1;
a2=(3.0*(p2.x-p1.x))-(2.0*d1)-d2;
a3=d1+d2+(2.0*(-p2.x+p1.x));
b0..b3 are computed in the same way, but using the y coordinates of course
p0..p3 are the control points of the cubic interpolation curve
t = <0.0,1.0> is the curve parameter going from p1 to p2
This ensures that the position and the first derivative are continuous (C1). You can also use BEZIER, but it will not match as well as this.
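For reference, the coefficient formulas above translate directly to Python. This is only a sketch: the function names and the sample control points are made up, and points are assumed to be (x, y) pairs so that both coordinates are handled at once.

```python
import numpy as np

def cubic_coefficients(p0, p1, p2, p3):
    """Coefficients a0..a3 of the cubic running from p1 to p2."""
    d1 = 0.5 * (p2 - p0)   # tangent at p1
    d2 = 0.5 * (p3 - p1)   # tangent at p2
    a0 = p1
    a1 = d1
    a2 = 3.0 * (p2 - p1) - 2.0 * d1 - d2
    a3 = d1 + d2 + 2.0 * (p1 - p2)
    return a0, a1, a2, a3

def evaluate_segment(p0, p1, p2, p3, t):
    """Evaluate the cubic at t in [0, 1]; t may be a scalar or a 1D array."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    a0, a1, a2, a3 = cubic_coefficients(p0, p1, p2, p3)
    t = np.asarray(t, dtype=float)[..., None]
    return a0 + a1 * t + a2 * t**2 + a3 * t**3

# made-up control points
ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
start = evaluate_segment(*ctrl, 0.0)   # coincides with p1
end = evaluate_segment(*ctrl, 1.0)     # coincides with p2
curve = evaluate_segment(*ctrl, np.linspace(0.0, 1.0, 20))
```

To draw a whole path, slide the window p0..p3 one control point at a time and evaluate each segment in turn.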
[edit1] too sharp edges are a BIG problem
To solve it you can remove points from your dataset before obtaining the control points. I can think of two ways to do it right now ... choose whichever is better for you
remove points from the dataset with too high a first derivative
dx/dl or dy/dl, where x,y are coordinates and l is the curve length (along its path). The exact computation of the curvature radius from the curve derivatives is tricky
remove points from the dataset that lead to too small a curvature radius
Take the midpoints of neighboring line segments (black lines) and compute the intersection of their perpendicular axes (red lines in the image). The distance between that intersection and the join point (blue line) is your curvature radius. When the curvature radius is smaller than your limit, remove that point ...
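The second option can be sketched in Python as follows. The perpendicular bisectors of two neighboring segments meet at the center of the circle through the three points involved, so the radius can be computed with the circumradius formula instead of intersecting lines explicitly. The function names, the single-pass strategy and the sample data are assumptions made for illustration, not part of the original method.

```python
import numpy as np

def circumradius(a, b, c):
    """Radius of the circle through three 2D points (inf if collinear)."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    ab, bc, ca = b - a, c - b, a - c
    # cross product of the two segments = twice the signed triangle area
    cross = ab[0] * bc[1] - ab[1] * bc[0]
    if cross == 0.0:
        return np.inf
    sides = np.linalg.norm(ab) * np.linalg.norm(bc) * np.linalg.norm(ca)
    return sides / (2.0 * abs(cross))

def drop_sharp_points(points, min_radius):
    """Single pass: drop interior points whose local radius is below min_radius."""
    points = np.asarray(points, dtype=float)
    keep = [points[0]]
    for i in range(1, len(points) - 1):
        # radius measured against the last kept point and the next point
        if circumradius(keep[-1], points[i], points[i + 1]) >= min_radius:
            keep.append(points[i])
    keep.append(points[-1])
    return np.array(keep)

# made-up polyline with one spike at (2.5, 3)
polyline = [(0, 0), (1, 0), (2, 0), (2.5, 3), (3, 0), (4, 0), (5, 0)]
filtered = drop_sharp_points(polyline, min_radius=1.6)
```

A single pass may not be enough on noisy data; repeating the filter until nothing is removed is one possible refinement.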
Now if you really need only BEZIER cubics, then you can convert my interpolation cubic to a BEZIER cubic like this:
// ---------------------------------------------------------------------------
// x=cx[0]+(t*cx[1])+(tt*cx[2])+(ttt*cx[3]); // cubic x=f(t), t = <0,1>
// ---------------------------------------------------------------------------
// cubic matrix bz4 = it4
// ---------------------------------------------------------------------------
// cx[0]= ( x0) = ( X1)
// cx[1]= (3.0*x1)-(3.0*x0) = (0.5*X2) -(0.5*X0)
// cx[2]= (3.0*x2)-(6.0*x1)+(3.0*x0) = -(0.5*X3)+(2.0*X2)-(2.5*X1)+( X0)
// cx[3]= ( x3)-(3.0*x2)+(3.0*x1)-( x0) = (0.5*X3)-(1.5*X2)+(1.5*X1)-(0.5*X0)
// ---------------------------------------------------------------------------
const double m=1.0/6.0;
double x0,y0,x1,y1,x2,y2,x3,y3;
x0 = X1; y0 = Y1;
x1 = X1-(X0-X2)*m; y1 = Y1-(Y0-Y2)*m;
x2 = X2+(X1-X3)*m; y2 = Y2+(Y1-Y3)*m;
x3 = X2; y3 = Y2;
In case you need the reverse conversion see:
Bezier curve with control points within the curve
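The same interpolation-to-BEZIER conversion can be written in Python and checked numerically. This is a sketch with made-up function names and sample points; it evaluates both forms of the cubic and compares them over the whole parameter range.

```python
import numpy as np

def interpolation_to_bezier(P0, P1, P2, P3):
    """Bezier control points of the interpolation cubic running from P1 to P2."""
    P0, P1, P2, P3 = (np.asarray(p, dtype=float) for p in (P0, P1, P2, P3))
    m = 1.0 / 6.0
    return P1, P1 - (P0 - P2) * m, P2 + (P1 - P3) * m, P2

def bezier_point(b0, b1, b2, b3, t):
    """Evaluate a cubic Bezier curve at t in [0, 1] (Bernstein form)."""
    s = 1.0 - t
    return s**3 * b0 + 3.0 * s**2 * t * b1 + 3.0 * s * t**2 * b2 + t**3 * b3

def interpolation_point(P0, P1, P2, P3, t):
    """Evaluate the interpolation cubic from its a0..a3 coefficients."""
    P0, P1, P2, P3 = (np.asarray(p, dtype=float) for p in (P0, P1, P2, P3))
    d1 = 0.5 * (P2 - P0)
    d2 = 0.5 * (P3 - P1)
    a2 = 3.0 * (P2 - P1) - 2.0 * d1 - d2
    a3 = d1 + d2 + 2.0 * (P1 - P2)
    return P1 + d1 * t + a2 * t**2 + a3 * t**3

# made-up control points
ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
bez = interpolation_to_bezier(*ctrl)
same = all(
    np.allclose(bezier_point(*bez, t), interpolation_point(*ctrl, t))
    for t in np.linspace(0.0, 1.0, 11)
)
print(same)
```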

The question was posted long ago, but here is a simple solution based on splprep that finds the minimal value of s satisfying a minimum curvature radius criterion.
route is the set of input points, with the first dimension being the number of points.
import numpy as np
from scipy.interpolate import splprep, splev
#The minimum curvature radius we want to enforce
minCurvatureConstraint = 2000
#Relative tolerance on the radius
relTol = 1.e-6
#Initial values for bisection search, should bound the solution
s_0 = 0
minCurvature_0 = 0
s_1 = 100000000 #Should be high enough to produce curvature radius larger than constraint
s_1 *= 2
minCurvature_1 = float('inf') #np.float was removed in NumPy 1.24
while np.abs(minCurvature_0 - minCurvature_1)>minCurvatureConstraint*relTol:
    s = 0.5 * (s_0 + s_1)
    tck, u = splprep(np.transpose(route), s=s)
    smoothed_route = splev(u, tck)
    #Compute radius of curvature (despite its name, "curvature" holds the radius)
    derivative1 = splev(u, tck, der=1)
    derivative2 = splev(u, tck, der=2)
    xprim = derivative1[0]
    xprimprim = derivative2[0]
    yprim = derivative1[1]
    yprimprim = derivative2[1]
    curvature = 1.0 / np.abs((xprim*yprimprim - yprim* xprimprim) / np.power(xprim*xprim + yprim*yprim, 1.5))
    minCurvature = np.min(curvature)
    print("s is %g => Minimum curvature radius is %g"%(s,np.min(curvature)))
    #Perform bisection
    if minCurvature > minCurvatureConstraint:
        s_1 = s
        minCurvature_1 = minCurvature
    else:
        s_0 = s
        minCurvature_0 = minCurvature
It may require some refinements, such as iterating to find a suitable s_1, but it works.
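One possible way to find a suitable s_1 is to start small and keep doubling until the radius constraint holds. The sketch below assumes exactly that strategy; the helper names, the starting value, the iteration cap and the sample route are all made up for illustration.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def min_curvature_radius(route, s):
    """Smallest radius of curvature of the spline fitted with smoothing s."""
    tck, u = splprep(np.transpose(route), s=s)
    xp, yp = splev(u, tck, der=1)
    xpp, ypp = splev(u, tck, der=2)
    curvature = np.abs(xp * ypp - yp * xpp) / np.power(xp * xp + yp * yp, 1.5)
    with np.errstate(divide="ignore"):
        # zero curvature maps to an infinite radius
        return np.min(1.0 / curvature)

def find_upper_bound(route, min_radius, s_start=1.0, max_doublings=60):
    """Double s until the constraint holds; returns (s, found)."""
    s = s_start
    for _ in range(max_doublings):
        if min_curvature_radius(route, s) >= min_radius:
            return s, True
        s *= 2.0
    return s, False

# made-up route: a nearly straight line with a little noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
route = np.column_stack([x, 0.01 * rng.standard_normal(50)])
s_1, found = find_upper_bound(route, min_radius=0.5, s_start=1e-3)
```

If `found` is False, the constraint may simply be unreachable for the given data, since a very large s only reduces the fit to a single polynomial piece.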
