Parabolic fit with fixed peak - python

I have a set of data and want to put a parabolic fit over it. This already works with the polyfit function from numpy like this:
fit = np.polyfit(X, y, 2)
formula = np.poly1d(fit)
Now I want the parabola to have its peak at a fixed x value, with the fit still carried out as well as possible under this constraint. Is there a way to accomplish that?
From my data I know that the parabola will always be open downwards.

I think this is a bit tricky, since the x coordinate of the peak of a second-order polynomial ax^2 + bx + c always lies at x = -b/(2a).
One thing you can do is drop the b term and shift x by the desired peak position when fitting, so the model becomes y = a*(x - x_peak)**2 + c, as in the code below. Note that I used scipy.optimize.curve_fit to fit the custom function func.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# generating a parabola with noise
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 -(x-2)**2 + np.random.normal(0, 5, x.shape)
# function to fit
def func(x, a, c):
    # parabola with its vertex at 0 in the shifted coordinate x - x_peak
    return a*x**2 + c
# desired x peak value
x_peak = 2
popt, pcov = curve_fit(func, x - x_peak, y)
y_fit = func(x - x_peak, *popt)
# plotting
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
plt.axvline(x_peak)
plt.show()
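Since the question says the parabola always opens downward, one option (an addition, not part of the original answer) is to pass bounds to curve_fit so that the curvature a can never become positive. This is a sketch reusing the same synthetic data; the bound values and the starting guess p0 are assumptions chosen for this example:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, c):
    # parabola with its vertex at 0 in the shifted coordinate x - x_peak
    return a * x**2 + c

np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 - (x - 2)**2 + np.random.normal(0, 5, x.shape)

x_peak = 2
# bounds=(lower, upper) per parameter: a in (-inf, 0], c unbounded.
# p0 is chosen strictly inside the bounds so the bounded solver starts feasible.
popt, pcov = curve_fit(func, x - x_peak, y, p0=[-1.0, 0.0],
                       bounds=([-np.inf, -np.inf], [0.0, np.inf]))
a, c = popt
print(f"a = {a:.3f}, c = {c:.3f}, peak at ({x_peak}, {c:.3f})")
```

With this data the unconstrained estimate of a is already negative, so the bound is inactive here; it only matters for data where a free fit would flip the parabola upward.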
Output: [figure: the noisy data points, the fitted parabola, and a vertical line at x_peak]

Fixing a point on your parabola simplifies the problem, since you can rewrite your equation slightly in terms of a constant now:
y = A(x - B)**2 + C
Given the coefficients a, b, c in your original unconstrained fit, you have the relationships
a = A
b = -2AB
c = AB**2 + C
The catch is that, since B is a constant and the model has no linear (x - B) term, np.polyfit cannot express it directly, so you need to set up the least-squares problem yourself. Given arrays x, y and a constant B, the problem looks like this:
m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
You can then recover the usual coefficients a, b, c from the formulas above.
Here is a complete example, just like the one in the other answer:
import numpy as np
import matplotlib.pyplot as plt
B = 2
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 -(x - B)**2 + np.random.normal(0, 5, x.shape)
m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
a = A
b = -2 * A * B
c = A * B**2 + C
y_fit = a * x**2 + b * x + c
You can drop a, b, c entirely and do
y_fit = A * (x - B)**2 + C
The result will be identical.
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
plt.show()

Without a constraint on the location of the peak, the function to fit would be:
y = a x^2 + b x + c
With the peak constrained to lie at a given x = p:
-b/(2a) = p
b = -2 a p
y = a x^2 - 2 a p x + c
y = a (x^2 - 2 p x) + c
Since p is known, make the change of variable:
X = x^2 - 2 p x
So from the data (x, y), first compute the new data (X, y).
Then a and c are obtained by linear regression of
y = a X + c
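The change of variable above can be sketched with NumPy as follows. The function name fit_parabola_fixed_peak is hypothetical, and np.polyfit with degree 1 is used here as one convenient way to do the linear regression of y on X:

```python
import numpy as np

def fit_parabola_fixed_peak(x, y, p):
    """Fit y = a*(x**2 - 2*p*x) + c, i.e. a parabola whose vertex is at x = p."""
    x = np.asarray(x, dtype=float)
    X = x**2 - 2 * p * x          # change of variable
    a, c = np.polyfit(X, y, 1)    # linear regression y = a*X + c
    b = -2 * a * p                # recover b of y = a x^2 + b x + c
    return a, b, c

# Noiseless check: -3*(x - 2)**2 + 5 expands to -3x^2 + 12x - 7
x = np.linspace(-5, 5, 21)
y = -3 * (x - 2)**2 + 5
a, b, c = fit_parabola_fixed_peak(x, y, p=2)
print(a, b, c)
```

Because X has no constant term, the intercept c of the regression is also the c of the original form a x^2 + b x + c.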

Cubic spline for non-monotonic data (not a 1d function)

I have a curve as shown below:
The x coordinates and the y coordinates for this plot are:
path_x= (4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0)
path_y =(0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668)
And I obtained the above picture by:
import matplotlib.pyplot as plt
x_min = min(path_x) - 1
x_max = max(path_x) + 1
y_min = min(path_y) - 1
y_max = max(path_y) + 1
num_pts = len(path_x)
fig = plt.figure(figsize=(8, 8))
#fig = plt.figure()
plt.suptitle("Curve and the boundary")
ax = fig.add_subplot(1, 1, 1)
ax.set_xlim([min(x_min, y_min), max(x_max, y_max)])
ax.set_ylim([min(x_min, y_min), max(x_max, y_max)])
ax.plot(path_x, path_y)
plt.show()
Now I want to draw a smooth curve using cubic splines. But it looks like cubic splines require the x coordinates to be in ascending order, whereas in this case neither the x values nor the y values are in ascending order.
Also, this is not a function: an x value is mapped to more than one element of the range.
I also went over this post. But I couldn't figure out a proper method to solve my problem.
I really appreciate your help in this regard
As suggested in the comments, you can always parameterize any curve/surface with an arbitrary (and linear!) parameter.
For example, define t as a parameter such that you get x=x(t) and y=y(t). Since t is arbitrary, you can define it such that at t=0, you get your first path_x[0],path_y[0], and at t=1, you get your last pair of coordinates, path_x[-1],path_y[-1].
Here is code using scipy.interpolate:
import numpy
import scipy.interpolate
import matplotlib.pyplot as plt
path_x = numpy.asarray((4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0),dtype=float)
path_y = numpy.asarray((0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668),dtype=float)
# defining arbitrary parameter to parameterize the curve
path_t = numpy.linspace(0,1,path_x.size)
# this is the position vector with
# x coord (1st row) given by path_x, and
# y coord (2nd row) given by path_y
r = numpy.vstack((path_x.reshape((1,path_x.size)),path_y.reshape((1,path_y.size))))
# creating the spline object
spline = scipy.interpolate.interp1d(path_t,r,kind='cubic')
# defining values of the arbitrary parameter over which
# you want to interpolate x and y
# it MUST be within 0 and 1, since you defined
# the spline between path_t=0 and path_t=1
t = numpy.linspace(numpy.min(path_t),numpy.max(path_t),100)
# interpolating along t
# r[0,:] -> interpolated x coordinates
# r[1,:] -> interpolated y coordinates
r = spline(t)
plt.plot(path_x,path_y,'or')
plt.plot(r[0,:],r[1,:],'-k')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
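With output: [figure: the five original points as red circles and the interpolated curve through them in black]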
For non-ascending x, splines can easily be computed if you make both x and y functions of another parameter t: x(t), y(t).
In your case you have 5 points, so t can simply enumerate these points, i.e. t = 0, 1, 2, 3, 4.
So if x = [5, 2, 7, 3, 6], then x(0) = 5, x(1) = 2, x(2) = 7, x(3) = 3, x(4) = 6. The same goes for y.
Then compute a spline function for both x(t) and y(t). Afterwards, evaluate the splines at many intermediate t points. Finally, plot the computed y(t) values against the computed x(t) values to get the curve.
I previously implemented cubic spline computation from scratch using NumPy, so I use that code in my example below, if you don't mind (it could be useful for learning about spline math); replace it with your library functions if you prefer. The code also contains commented-out Numba lines; if you want, you can enable these Numba annotations to speed up the computation.
Look at the main() function at the bottom of the code: it shows how to compute and use x(t) and y(t).
import numpy as np, matplotlib.pyplot as plt
# Solves linear system given by Tridiagonal Matrix
# Helper for calculating cubic splines
##numba.njit(cache = True, fastmath = True, inline = 'always')
def tri_diag_solve(A, B, C, F):
    n = B.size
    assert A.ndim == B.ndim == C.ndim == F.ndim == 1 and (
        A.size == B.size == C.size == F.size == n
    ) #, (A.shape, B.shape, C.shape, F.shape)
    Bs, Fs = np.zeros_like(B), np.zeros_like(F)
    Bs[0], Fs[0] = B[0], F[0]
    for i in range(1, n):
        Bs[i] = B[i] - A[i] / Bs[i - 1] * C[i - 1]
        Fs[i] = F[i] - A[i] / Bs[i - 1] * Fs[i - 1]
    x = np.zeros_like(B)
    x[-1] = Fs[-1] / Bs[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (Fs[i] - C[i] * x[i + 1]) / Bs[i]
    return x
# Calculate cubic spline params
##numba.njit(cache = True, fastmath = True, inline = 'always')
def calc_spline_params(x, y):
    a = y
    h = np.diff(x)
    c = np.concatenate((np.zeros((1,), dtype = y.dtype),
        np.append(tri_diag_solve(h[:-1], (h[:-1] + h[1:]) * 2, h[1:],
            ((a[2:] - a[1:-1]) / h[1:] - (a[1:-1] - a[:-2]) / h[:-1]) * 3), 0)))
    d = np.diff(c) / (3 * h)
    b = (a[1:] - a[:-1]) / h + (2 * c[1:] + c[:-1]) / 3 * h
    return a[1:], b, c[1:], d
# Spline value calculating function, given params and "x"
##numba.njit(cache = True, fastmath = True, inline = 'always')
def func_spline(x, ix, x0, a, b, c, d):
    dx = x - x0[1:][ix]
    return a[ix] + (b[ix] + (c[ix] + d[ix] * dx) * dx) * dx
# Compute piece-wise spline function for "x" out of sorted "x0" points
##numba.njit([f'f{ii}[:](f{ii}[:], f{ii}[:], f{ii}[:], f{ii}[:], f{ii}[:], f{ii}[:])' for ii in (4, 8)],
#    cache = True, fastmath = True, inline = 'always')
def piece_wise_spline(x, x0, a, b, c, d):
    xsh = x.shape
    x = x.ravel()
    ix = np.searchsorted(x0[1 : -1], x)
    y = func_spline(x, ix, x0, a, b, c, d)
    y = y.reshape(xsh)
    return y
def main():
    x0 = np.array([4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0])
    y0 = np.array([0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668])
    t0 = np.arange(len(x0)).astype(np.float64)
    plt.plot(x0, y0)
    vs = []
    for e in (x0, y0):
        a, b, c, d = calc_spline_params(t0, e)
        x = np.linspace(0, t0[-1], 100)
        vs.append(piece_wise_spline(x, t0, a, b, c, d))
    plt.plot(vs[0], vs[1])
    plt.show()
if __name__ == '__main__':
    main()
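Output: [figure: the original polyline through the five points and the smooth spline curve through the same points]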
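For comparison, the same x(t), y(t) parameterization can also be done with SciPy's scipy.interpolate.CubicSpline, which accepts vector-valued data, so a single object interpolates x and y together. This is a sketch, not part of either answer; bc_type='natural' is an assumption chosen because the from-scratch code above also sets the second-derivative coefficient to zero at both ends:

```python
import numpy as np
from scipy.interpolate import CubicSpline

path_x = np.array([4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0])
path_y = np.array([0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668])

# Parameter t = 0, 1, 2, ... simply enumerates the points
t = np.arange(len(path_x), dtype=float)

# One spline for the (x, y) pairs: data of shape (n, 2), interpolated along axis 0.
# bc_type='natural' makes the second derivative zero at both ends.
spline = CubicSpline(t, np.column_stack((path_x, path_y)), bc_type='natural')

t_fine = np.linspace(t[0], t[-1], 100)
xy = spline(t_fine)   # shape (100, 2): column 0 is x(t), column 1 is y(t)
```

To draw the curve, plot xy[:, 0] against xy[:, 1]; the spline passes exactly through the original points at t = 0, 1, 2, 3, 4.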

How to subtract a "y" value from the trendline (numpy and pylab)

I have a 3-point graph with a trendline, but I need to find the "x" value for a specific y value. Here is what I have:
import numpy as np
import matplotlib.pyplot as plt
x = ng        # DNA concentrations (my data)
y = density   # measured densities (my data)
plt.scatter(x, y)
z = np.polyfit(x, y, 1)   # linear trendline: z = [slope, intercept]
p = np.poly1d(z)
plt.plot(x, p(x), "r--")
The x value is basically a DNA concentration, whereas the y value is a densitometric value that I calculated. I need to find the DNA concentration for a density of 19159.8.
Can somebody help me, please?
The inverse function of y = a*x + b is simply x = (1/a)*y + (-b/a):
a, b = z
z_inv = np.array([1 / a, -b / a])
p_inv = np.poly1d(z_inv)
print(np.allclose(p_inv(p(x)), x))
# True
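Applied to the specific density from the question, the inverse polynomial gives the concentration directly. The asker's ng and density arrays are not shown, so the calibration values below are hypothetical placeholders; the second print is an equivalent check using the roots of p(x) - target:

```python
import numpy as np

# Hypothetical calibration data (the real ng/density values are not in the question)
x = np.array([10.0, 20.0, 40.0])           # DNA concentration
y = np.array([8000.0, 15500.0, 31000.0])   # measured density

a, b = np.polyfit(x, y, 1)                 # trendline y = a*x + b
p_inv = np.poly1d([1 / a, -b / a])         # inverse: x = (y - b) / a

target = 19159.8
x_at_target = p_inv(target)
print(x_at_target)

# Same value from the root of (a*x + b) - target
print((np.poly1d([a, b]) - target).roots)
```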

Best fit plane to 3D data: different results for two different methods

(Although there are a number of questions regarding how to best fit a plane to some 3D data on SO, I couldn't find an answer for this issue.)
Given N (x, y, z) points, I need the best fit plane
a*x + b*y + c*z + d = 0
defined through the a, b, c, d coefficients that minimize the mean of the orthogonal distances from the points to the plane. The point-plane orthogonal distance (for a given (x0, y0, z0) point) is defined as:
dist = |a*x0 + b*y0 + c*z0 + d| / sqrt(a^2 + b^2 + c^2)
I set up two methods (code below):
Singular-Value Decomposition (source)
Basin-Hopping minimization of the mean orthogonal distances
As I understand it, the SVD method should produce the exact best fit plane by minimizing the orthogonal distances analytically. What I find instead is that the BH method gives better results than the supposedly exact SVD method, even for a low number of BH runs.
By "better" I mean that the final mean orthogonal distance value is smaller with the BH method, than with the SVD method.
What am I missing here?
import numpy as np
import scipy.optimize as optimize
def perp_error(params, xyz):
    """
    Mean of the absolute values for the perpendicular distance of the
    'xyz' points, to the plane defined by the coefficients 'a,b,c,d' in
    'params'.
    """
    a, b, c, d = params
    x, y, z = xyz
    length = np.sqrt(a**2 + b**2 + c**2)
    return (np.abs(a * x + b * y + c * z + d) / length).mean()
def minPerpDist(x, y, z, N_min):
    """
    Basin-Hopping method, minimize mean absolute values of the
    orthogonal distances.
    """
    def unit_length(params):
        """
        Constrain the vector perpendicular to the plane to be of unit length.
        """
        a, b, c, d = params
        return a**2 + b**2 + c**2 - 1
    # Random initial guess for the a,b,c,d plane coefficients.
    initial_guess = np.random.uniform(-10., 10., 4)
    # Constrain the vector perpendicular to the plane to be of unit length.
    cons = ({'type': 'eq', 'fun': unit_length})
    min_kwargs = {"constraints": cons, "args": [x, y, z]}
    # Use Basin-Hopping to obtain the best fit coefficients.
    sol = optimize.basinhopping(
        perp_error, initial_guess, minimizer_kwargs=min_kwargs, niter=N_min)
    abcd = list(sol.x)
    return abcd
def SVD(X):
    """
    Singular value decomposition method.
    Source: https://gist.github.com/lambdalisue/7201028
    """
    # Find the average of points (centroid) along the columns
    C = np.average(X, axis=0)
    # Create CX vector (centroid to point) matrix
    CX = X - C
    # Singular value decomposition
    U, S, V = np.linalg.svd(CX)
    # The last row of V matrix indicate the eigenvectors of
    # smallest eigenvalues (singular values).
    N = V[-1]
    # Extract a, b, c, d coefficients.
    x0, y0, z0 = C
    a, b, c = N
    d = -(a * x0 + b * y0 + c * z0)
    return a, b, c, d
# Generate a random plane.
seed = np.random.randint(100000)
print("Seed: {}".format(seed))
np.random.seed(seed)
a, b, c, d = np.random.uniform(-10., 10., 4)
print("Orig abc(d=1): {:.3f} {:.3f} {:.3f}\n".format(a / d, b / d, c / d))
# Generate random (x, y, z) points.
N = 200
x, y = np.random.uniform(-5., 5., (2, N))
z = -(a * x + b * y + d) / c
# Add scatter in z.
z = z + np.random.uniform(-.2, .2, N)
# Solve using SVD method.
a, b, c, d = SVD(np.array([x, y, z]).T)
print("SVD abc(d=1): {:.3f} {:.3f} {:.3f}".format(a / d, b / d, c / d))
# Orthogonal mean distance
print("Perp err: {:.5f}\n".format(perp_error((a, b, c, d), (x, y, z))))
# Solve using Basin-Hopping.
abcd = minPerpDist(x, y, z, 500)
a, b, c, d = abcd
print("BH abc(d=1): {:.3f} {:.3f} {:.3f}".format(a / d, b / d, c / d))
print("Perp err: {:.5f}".format(perp_error(abcd, (x, y, z))))
I believe I found the reason for the discrepancy.
When I minimize the perpendicular distance of the points to a plane using Basin-Hopping, I am using the absolute-valued point-plane distance:
d_abs = |a*x0 + b*y0 + c*z0 + d| / sqrt(a^2 + b^2 + c^2)
The SVD method, on the other hand, minimizes the sum of the squared point-plane distances:
d_sqr = (a*x0 + b*y0 + c*z0 + d)^2 / (a^2 + b^2 + c^2)
If, in the code shared in the question, I use the squared distance in the perp_error() function instead of the absolute-valued distance, both methods give the same answer.
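A quick way to see this numerically: for the centered points, the sum of squared orthogonal distances to the SVD plane equals the square of the smallest singular value, and any other plane does no better under that squared criterion. This is a sketch with its own synthetic plane (the coefficients 0.3, -0.7, 2 and the seed are arbitrary choices), and the helper names mean_sq_dist and svd_plane are hypothetical:

```python
import numpy as np

def mean_sq_dist(params, xyz):
    # Mean of the squared orthogonal point-plane distances
    a, b, c, d = params
    x, y, z = xyz
    return ((a * x + b * y + c * z + d) ** 2 / (a**2 + b**2 + c**2)).mean()

def svd_plane(points):
    # points: (N, 3). Plane through the centroid; normal = last right-singular vector
    centroid = points.mean(axis=0)
    _, s, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    d = -normal @ centroid
    return (*normal, d), s

rng = np.random.default_rng(0)
N = 200
x, y = rng.uniform(-5, 5, (2, N))
z = 0.3 * x - 0.7 * y + 2 + rng.uniform(-0.2, 0.2, N)
pts = np.column_stack((x, y, z))

plane, s = svd_plane(pts)
err_svd = mean_sq_dist(plane, (x, y, z))
print(err_svd, s[-1] ** 2 / N)   # the two values agree

# A perturbed plane is no better under the squared criterion
a, b, c, d = plane
perturbed = (a + 0.01, b, c, d)
print(mean_sq_dist(perturbed, (x, y, z)) >= err_svd)
```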

How to find the equation for an ellipse

I am looking to find the equation for an ellipse given five or six points using the general equation for a conic:
A x^2 + B xy + C y^2 + D x + E y + F = 0.
At first I tried using six points. Here is my python code:
import numpy as np
def conic_section(p1, p2, p3, p4, p5, p6):
    def row(point):
        return [point[0]*point[0], point[0]*point[1], point[1]*point[1],
                point[0], point[1], 1]
    matrix = np.array([row(p1), row(p2), row(p3), row(p4), row(p5), row(p6)])
    b = [0, 0, 0, 0, 0, 0]
    return np.linalg.solve(matrix, b)
print(conic_section(np.array([6, 5]), np.array([2, 9]), np.array([0, 0]),
                    np.array([11, 5.5]), np.array([6, 7]), np.array([-1, -1])))
The problem is that this will return the solution [0,0,0,0,0,0] because the right hand side of my equation is the zero vector.
I then tried to rearrange the conic by moving F to the right-hand side and dividing through by F:
A x^2 + B xy + C y^2 + D x + E y + F = 0
A x^2 + B xy + C y^2 + D x + E y = -F
A' x^2 + B' xy + C' y^2 + D' x + E' y = -1.
The reason this doesn't work is that if one of my points is (0,0), the matrix gets a row of zeros, while the right-hand side of the equation still has -1 in every entry. In other words, if one of my points is (0,0), then F must be 0, so I can't divide by it.
Any help would be appreciated.
Thank You.
It seems that you have exact points on the ellipse, so you don't need an approximation, and that you are applying the Braikenridge-Maclaurin construction for conic sections in a somewhat unusual way.
Five points (x[i], y[i]) determine the ellipse through this explicit equation (mathworld page, eq. 8): the 6x6 determinant whose first row is [x^2, xy, y^2, x, y, 1] and whose remaining rows are [xi^2, xi*yi, yi^2, xi, yi, 1] for i = 1..5 equals zero.
So to find the ellipse equation, you can compute the cofactor expansion of the determinant along the first row. For example, coefficient A is the determinant of the submatrix from x1y1 to the bottom-right corner, coefficient B is the negated determinant of the submatrix without the xiyi column, and so on.
Ellipse equation (without translation and rotation): x^2/a^2 + y^2/b^2 = 1
The goal is to solve the linear equation A x^2 + B xy + C y^2 + D x + E y + F = 0 for the unknowns A through F.
Use:
from math import sin, cos, pi, sqrt
import matplotlib.pyplot as plt
import numpy as np
from numpy.linalg import eig, inv
# basis parameters of the ellipse
a = 7
b = 4
def ellipse(t, a, b):
    return a*cos(t), b*sin(t)
points = [ellipse(t, a, b) for t in np.linspace(0, 2*pi, 100)]
x, y = [np.array(v) for v in list(zip(*points))]
fig = plt.figure()
plt.scatter(x, y)
plt.show()
def fit_ellipse(x, y):
    x = x[:, np.newaxis]
    y = y[:, np.newaxis]
    D = np.column_stack((x**2, x*y, y**2, x, y, np.ones_like(x)))
    S = np.dot(D.T, D)
    C = np.zeros([6,6])
    C[0, 2] = C[2, 0] = 2
    C[1, 1] = -1
    E, V = eig(np.dot(inv(S), C))
    n = np.argmax(np.abs(E))
    return V[:, n]
A, B, C, D, E, F = fit_ellipse(x, y)
K = D**2/(4*A) + E**2/(4*C) - F
# a, b
print('a:', sqrt(K/A), 'b:', sqrt(K/C))
Output:
a: 6.999999999999998 b: 4.0
See:
http://mathworld.wolfram.com/ConicSection.html
https://fr.wikipedia.org/wiki/Ellipse_(math%C3%A9matiques)#Forme_matricielle
http://nicky.vanforeest.com/misc/fitEllipse/fitEllipse.html

Line fitting below points

I have a set of x, y points and I'd like to find the line of best fit such that the line is below all points using SciPy. I'm trying to use leastsq for this, but I'm unsure how to adjust the line to be below all points instead of the line of best fit. The coefficients for the line of best fit can be produced via:
import numpy as np
from scipy.optimize import leastsq
def linreg(x, y):
    fit = lambda params, x: params[0] * x - params[1]
    # note: leastsq squares the residuals returned here itself
    err = lambda p, x, y: (y - fit(p, x))**2
    # initial slope/intercept
    init_p = np.array((1, 0))
    p, _ = leastsq(err, init_p.copy(), args=(x, y))
    return p
xs = np.array([1, 2, 3, 4, 5])
ys = np.array([10, 20, 30, 40, 50])
print(linreg(xs, ys))
The output is the coefficients for the line of best fit:
array([ 9.99999997e+00, -1.68071668e-15])
How can I get the coefficients of the line of best fit that is below all points?
A possible algorithm is as follows:
1. Shift the axes so that all the data lie on the positive half of the x axis.
2. If the fit is of the form y = a * x + b, then for a given b the best fit for a is the minimum of the slopes of the lines joining the point (0, b) to each of the (x, y) points.
3. You can then calculate a fit error, which is a function of b only, and use scipy.optimize.minimize to find the best value for b.
4. All that's left is to compute a for that b and to convert b back to the original position of the axes.
The following does that most of the time, except when the minimization fails with some mysterious error:
from __future__ import division
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
def fit_below(x, y) :
    idx = np.argsort(x)
    x = x[idx]
    y = y[idx]
    x0, y0 = x[0] - 1, y[0]
    x -= x0
    y -= y0
    def error_function_2(b, x, y) :
        a = np.min((y - b) / x)
        return np.sum((y - a * x - b)**2)
    b = scipy.optimize.minimize(error_function_2, [0], args=(x, y)).x[0]
    a = np.min((y - b) / x)
    return a, b - a * x0 + y0
x = np.arange(10).astype(float)
y = x * 2 + 3 + 3 * np.random.rand(len(x))
a, b = fit_below(x, y)
plt.plot(x, y, 'o')
plt.plot(x, a*x + b, '-')
plt.show()
And as TheodrosZelleke wisely predicted, it goes through two points that are part of the convex hull:
