Line fitting below points - python

I have a set of x, y points and I'd like to find the line of best fit such that the line is below all points using SciPy. I'm trying to use leastsq for this, but I'm unsure how to adjust the line to be below all points instead of the line of best fit. The coefficients for the line of best fit can be produced via:
def linreg(x, y):
fit = lambda params, x: params[0] * x - params[1]
err = lambda p, x, y: (y - fit(p, x))**2
# initial slope/intercept
init_p = np.array((1, 0))
p, _ = leastsq(err, init_p.copy(), args=(x, y))
return p
xs = sp.array([1, 2, 3, 4, 5])
ys = sp.array([10, 20, 30, 40, 50])
print linreg(xs, ys)
The output is the coefficients for the line of best fit:
array([ 9.99999997e+00, -1.68071668e-15])
How can I get the coefficients of the line of best fit that is below all points?

A possible algorithm is as follows:
Move the axes to have all the data on the positive half of the x axis.
If the fit is of the form y = a * x + b, then for a given b the best fit for a will be the minimum of the slopes joining the point (0, b) with each of the (x, y) points.
You can then calculate a fit error, which is a function of only b, and use scipy.optimize.minimize to find the best value for b.
All that's left is computing a for that b and calculating b for the original position of the axes.
The following does that most of the time, except when the minimization fails with some mysterious error:
from __future__ import division
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
def fit_below(x, y) :
idx = np.argsort(x)
x = x[idx]
y = y[idx]
x0, y0 = x[0] - 1, y[0]
x -= x0
y -= y0
def error_function_2(b, x, y) :
a = np.min((y - b) / x)
return np.sum((y - a * x - b)**2)
b = scipy.optimize.minimize(error_function_2, [0], args=(x, y)).x[0]
a = np.min((y - b) / x)
return a, b - a * x0 + y0
x = np.arange(10).astype(float)
y = x * 2 + 3 + 3 * np.random.rand(len(x))
a, b = fit_below(x, y)
plt.plot(x, y, 'o')
plt.plot(x, a*x + b, '-')
plt.show()
And as TheodrosZelleke wisely predicted, it goes through two points that are part of the convex hull:

Related

How to make a piecewise linear fit in Python with some constant pieces?

I'm trying to make a piecewise linear fit consisting of 3 pieces whereof the first and last pieces are constant. As you can see in this figure
don't get the expected fit, since the fit doesn't capture the 3 linear pieces clearly visual from the original data points.
I've tried following this question and expanded it to the case of 3 pieces with the two constant pieces, but I must have done something wrong.
Here is my code:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
plt.rcParams['figure.figsize'] = [16, 6]
x = np.arange(0, 50, dtype=float)
y = np.array([50 for i in range(10)]
+ [50 - (50-5)/31 * i for i in range(1, 31)]
+ [5 for i in range(10)],
dtype=float)
def piecewise_linear(x, x0, y0, x1, y1):
return np.piecewise(x,
[x < x0, (x >= x0) & (x < x1), x >= x1],
[lambda x:y0, lambda x:(y1-y0)/(x1-x0)*(x-x0)+y0, lambda x:y1])
p , e = optimize.curve_fit(piecewise_linear, x, y)
xd = np.linspace(0, 50, 101)
plt.plot(x, y, "o", label='original data')
plt.plot(xd, piecewise_linear(xd, *p), label='piecewise linear fit')
plt.legend()
The accepted answer to the previous mentioned question suggest looking at segments_fit.ipynb for the case of N parts, but following that it doesn't seem that I can specify, that the first and last pieces should be constant.
Furthermore I do get the following warning:
OptimizeWarning: Covariance of the parameters could not be estimated
What do I do wrong?
You could directly copy the segments_fit implementation
from scipy import optimize
def segments_fit(X, Y, count):
xmin = X.min()
xmax = X.max()
seg = np.full(count - 1, (xmax - xmin) / count)
px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.01].mean() for x in px_init])
def func(p):
seg = p[:count - 1]
py = p[count - 1:]
px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
return px, py
def err(p):
px, py = func(p)
Y2 = np.interp(X, px, py)
return np.mean((Y - Y2)**2)
r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')
return func(r.x)
Then you apply it as follows
import numpy as np;
# mimic your data
x = np.linspace(0, 50)
y = 50 - np.clip(x, 10, 40)
# apply the segment fit
fx, fy = segments_fit(x, y, 3)
This will give you (fx,fy) the corners your piecewise fit, let's plot it
import matplotlib.pyplot as plt
# show the results
plt.figure(figsize=(8, 3))
plt.plot(fx, fy, 'o-')
plt.plot(x, y, '.')
plt.legend(['fitted line', 'given points'])
EDIT: Introducing constant segments
As mentioned in the comments the above example doesn't guarantee that the output will be constant in the end segments.
Based on this implementation the easier way I can think is to restrict func(p) to do that, a simple way to ensure a segment is constant, is to set y[i+1]==y[i]. Thus I added xanchor and yanchor. If you give an array with repeated numbers you can bind multiple points to the same value.
from scipy import optimize
def segments_fit(X, Y, count, xanchors=slice(None), yanchors=slice(None)):
xmin = X.min()
xmax = X.max()
seg = np.full(count - 1, (xmax - xmin) / count)
px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.01].mean() for x in px_init])
def func(p):
seg = p[:count - 1]
py = p[count - 1:]
px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
py = py[yanchors]
px = px[xanchors]
return px, py
def err(p):
px, py = func(p)
Y2 = np.interp(X, px, py)
return np.mean((Y - Y2)**2)
r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')
return func(r.x)
I modified a little the data generation to make it more clear the effect of the change
import matplotlib.pyplot as plt
import numpy as np;
# mimic your data
x = np.linspace(0, 50)
y = 50 - np.clip(x, 10, 40) + np.random.randn(len(x)) + 0.25 * x
# apply the segment fit
fx, fy = segments_fit(x, y, 3)
plt.plot(fx, fy, 'o-')
plt.plot(x, y, '.k')
# apply the segment fit with some consecutive points having the
# same anchor
fx, fy = segments_fit(x, y, 3, yanchors=[1,1,2,2])
plt.plot(fx, fy, 'o--r')
plt.legend(['fitted line', 'given points', 'with const segments'])
You can get a one line solution (not counting the import) using univariate splines of degree one. Like this
from scipy.interpolate import UnivariateSpline
f = UnivariateSpline(x,y,k=1,s=0)
Here k=1 means we interpolate using polynomials of degree one aka lines. s is the smoothing parameter. It decides how much you want to compromise on the fit to avoid using too many segments. Setting it to zero means no compromises i.e. the line HAS to go threw all points. See the documentation.
Then
plt.plot(x, y, "o", label='original data')
plt.plot(x, f(x), label='linear interpolation')
plt.legend()
plt.savefig("out.png", dpi=300)
gives
This i consider a funny non-linear approach that works quite well.
Note that even though this is highly non-linear it approximates the linear behavior very well. Moreover, the fit parameters provide the linear results. Only for the offset b a little transformation and according error propagation is required. (Also, I don't care about the value of p as long as it is somewhat larger than 5)
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
np.set_printoptions( linewidth=250, precision=4)
np.set_printoptions( linewidth=250, precision=4)
### piecewise linear function for data generation
def pwl( x, m, b, a1, a2 ):
if x < a1:
out = pwl( a1, m, b, a1, a2 )
elif x > a2:
out = pwl( a2, m, b, a1, a2 )
else:
out = m * x + b
return out
### non-linear approximation
def func( x, m, b, a1, a2, p ):
out = b + np.log(
1 / ( 1 + np.exp( -m *( x - a1 ) )**p )
) / p - np.log(
1 / ( 1 + np.exp( -m * ( x - a2 ) )**p )
) / p
return out
### some data
nn = 36
xdata = np.linspace( -5, 19, nn )
ydata = np.fromiter( (pwl( x, -2.1, 11.6, -1.1, 12.7 ) for x in xdata ), float)
ydata += np.random.normal( size=nn, scale=0.2)
### dense grid for printing
xth = np.linspace( -5, 19, 150 )
###fitting
popt, cov = curve_fit( func, xdata, ydata, p0=[-2, 11, -1, 10, 1])
mF, betaF, a1F, a2F, pF = popt
bF = betaF - mF * a1F
sol=( mF, bF, a1F, a2F, pF )
### transforming the covariance due to the b' -> b mapping
J1 = np.identity(5)
J1[1,0] = -popt[2]
J1[1,2] = -popt[0]
cov2 = np.dot( J1, np.dot( cov, np.transpose( J1 ) ) )
### results
print( cov2 )
for i, v in enumerate( ("m", "b", "a1", "a2", "p" ) ):
print( "{:>2} = {:+2.4e} ± {:0.4e}".format( v, sol[i], np.sqrt( cov2[i,i] ) ) )
### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( xdata, ydata, ls='', marker='+' )
ax.plot( xth, func( xth, -2, 11, -1, 10, 1 ) )
ax.plot( xth, func( xth, *popt ) )
plt.show()
Providing
[[ 1.3553e-04 -7.6291e-04 -4.3488e-04 4.5624e-04 1.2619e-01]
[-7.6291e-04 6.4126e-03 3.4560e-03 -1.5573e-03 -7.4983e-01]
[-4.3488e-04 3.4560e-03 3.4741e-03 -9.8284e-04 -4.2344e-01]
[ 4.5624e-04 -1.5573e-03 -9.8284e-04 3.0842e-03 -5.2739e+00]
[ 1.2619e-01 -7.4983e-01 -4.2344e-01 -5.2739e+00 3.1583e+05]]
m = -2.0810e+00 ± 9.7718e-03
b = +1.1463e+01 ± 6.7217e-02
a1 = -1.2545e+00 ± 5.0384e-02
a2 = +1.2739e+01 ± 4.7176e-02
p = +1.6840e+01 ± 2.9872e+02
and

Parabolic fit with fixed peak

I have a set of data and want to put a parabolic fit over it. This already works with the polyfit function from numpy like this:
fit = np.polyfit(X, y, 2)
formula = np.poly1d(fit)
Now I want the parabula to have its peak value at a fixed x value and that the fit is still carried out as best as possible with this fixed peak. Is there a way to accomplish that?
From my data I know that the parabola will always be open downwards.
I think this is quite a difficult problem since the x coordinate of the peak of a second-order polynomial (ax^2 + bx + c) always lies in x = -b/2a.
A thing you could do is to drop the b term and offset it by the desired peak x value in fitting the polynomial like the code below. Note that I used scipy.optimize.curve_fit to fit for the custom function func.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# generating a parabola with noise
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 -(x-2)**2 + np.random.normal(0, 5, x.shape)
# function to fit
def func(x, a, c):
return a*x**2 + c
# desired x peak value
x_peak = 2
popt, pcov = curve_fit(func, x - x_peak, y)
y_fit = func(x - x_peak, *popt)
# plotting
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
plt.axvline(x_peak)
plt.show()
Outputs the image:
Fixing a point on your parabola simplifies the problem, since you can rewrite your equation slightly in terms of a constant now:
y = A(x - B)**2 + C
Given the coefficients a, b, c in your original unconstrained fit, you have the relationships
a = A
b = -2AB
c = AB**2 + C
The only difference is that since B is a constant and you don't have an x - B term in the equation, you need to set up the least-squares problem yourself. Given arrays x, y and constant B, the problem looks like this:
m = np.stack((x - B, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
You can then extract the normal coefficient from the formulas for a, b, c above.
Here is a complete example, just like the one in the other answer:
B = 2
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 -(x - B)**2 + np.random.normal(0, 5, x.shape)
m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
a = A
b = -2 * A * B
c = A * B**2 + C
y_fit = a * x**2 + b * x + c
You can drop a, b, c entirely and do
y_fit = A * (x - B)**2 + C
The result will be identical.
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
Without the condition of location of the peak the function to be fitted would be :
y = a x^2 + b x + c
With condition of location of the peak at x=p , given p :
-b/(2a)=p
b=-2 a p
y = a x^2 -2 a p x + c
y = a (x^2 - 2 p x) +c
Knowing p , one change of variable :
X = x^2 -2 p x
So, from the data (x,y) one first compute the new data (X,y)
Then a and c are computed thanks to linear regression
y = a X + c

Cubic spline for non-monotonic data (not a 1d function)

I have a curve as shown below:
The x coordinates and the y coordinates for this plot are:
path_x= (4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0)
path_y =(0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668)
And I obtained the above picture by:
x_min =min(path_x)-1
x_max =max(path_x)+1
y_min =min(path_y)-1
y_max =max(path_y)+1
num_pts = len(path_x)
fig = plt.figure(figsize=(8,8))
#fig = plt.figure()
plt.suptitle("Curve and the boundary")
ax = fig.add_subplot(1,1,1)
ax.set_xlim([min(x_min,y_min),max(x_max,y_max)])
ax.set_ylim([min(x_min,y_min),max(x_max,y_max)])
ax.plot(path_x,path_y)
Now my intention is to draw a smooth curve using cubic splines. But looks like for cubic splines you need the x coordinates to be on ascending order. whereas in this case, neither x values nor y values are in the ascending order.
Also this is not a function. That is an x value is mapped with more than one element in the range.
I also went over this post. But I couldn't figure out a proper method to solve my problem.
I really appreciate your help in this regard
As suggested in the comments, you can always parameterize any curve/surface with an arbitrary (and linear!) parameter.
For example, define t as a parameter such that you get x=x(t) and y=y(t). Since t is arbitrary, you can define it such that at t=0, you get your first path_x[0],path_y[0], and at t=1, you get your last pair of coordinates, path_x[-1],path_y[-1].
Here is a code using scipy.interpolate
import numpy
import scipy.interpolate
import matplotlib.pyplot as plt
path_x = numpy.asarray((4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0),dtype=float)
path_y = numpy.asarray((0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668),dtype=float)
# defining arbitrary parameter to parameterize the curve
path_t = numpy.linspace(0,1,path_x.size)
# this is the position vector with
# x coord (1st row) given by path_x, and
# y coord (2nd row) given by path_y
r = numpy.vstack((path_x.reshape((1,path_x.size)),path_y.reshape((1,path_y.size))))
# creating the spline object
spline = scipy.interpolate.interp1d(path_t,r,kind='cubic')
# defining values of the arbitrary parameter over which
# you want to interpolate x and y
# it MUST be within 0 and 1, since you defined
# the spline between path_t=0 and path_t=1
t = numpy.linspace(numpy.min(path_t),numpy.max(path_t),100)
# interpolating along t
# r[0,:] -> interpolated x coordinates
# r[1,:] -> interpolated y coordinates
r = spline(t)
plt.plot(path_x,path_y,'or')
plt.plot(r[0,:],r[1,:],'-k')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
With output
For non-ascending x splines can be easily computed if you make both x and y functions of another parameter t: x(t), y(t).
In your case you have 5 points so t should be just enumeration of these points, i.e. t = 0, 1, 2, 3, 4 for 5 points.
So if x = [5, 2, 7, 3, 6] then x(t) = x(0) = 5, x(1) = 2, x(2) = 7, x(3) = 3, x(4) = 6. Same for y.
Then compute spline function for both x(t) and y(t). Afterwards compute values of splines in all many intermediate t points. Lastly just use all calculated values x(t) and y(t) as a function y(x).
Once before I implemented cubic spline computation from scratch using Numpy, so I use this code in my example below if you don't mind (it could be useful for you to learn about spline math), replace with your library functions. Also in my code you can see numba lines commented out, if you want you can use these Numba annotations to speed up computation.
You have to look at main() function at the bottom of code, it shows how to compute and use x(t) and y(t).
Try it online!
import numpy as np, matplotlib.pyplot as plt
# Solves linear system given by Tridiagonal Matrix
# Helper for calculating cubic splines
##numba.njit(cache = True, fastmath = True, inline = 'always')
def tri_diag_solve(A, B, C, F):
n = B.size
assert A.ndim == B.ndim == C.ndim == F.ndim == 1 and (
A.size == B.size == C.size == F.size == n
) #, (A.shape, B.shape, C.shape, F.shape)
Bs, Fs = np.zeros_like(B), np.zeros_like(F)
Bs[0], Fs[0] = B[0], F[0]
for i in range(1, n):
Bs[i] = B[i] - A[i] / Bs[i - 1] * C[i - 1]
Fs[i] = F[i] - A[i] / Bs[i - 1] * Fs[i - 1]
x = np.zeros_like(B)
x[-1] = Fs[-1] / Bs[-1]
for i in range(n - 2, -1, -1):
x[i] = (Fs[i] - C[i] * x[i + 1]) / Bs[i]
return x
# Calculate cubic spline params
##numba.njit(cache = True, fastmath = True, inline = 'always')
def calc_spline_params(x, y):
a = y
h = np.diff(x)
c = np.concatenate((np.zeros((1,), dtype = y.dtype),
np.append(tri_diag_solve(h[:-1], (h[:-1] + h[1:]) * 2, h[1:],
((a[2:] - a[1:-1]) / h[1:] - (a[1:-1] - a[:-2]) / h[:-1]) * 3), 0)))
d = np.diff(c) / (3 * h)
b = (a[1:] - a[:-1]) / h + (2 * c[1:] + c[:-1]) / 3 * h
return a[1:], b, c[1:], d
# Spline value calculating function, given params and "x"
##numba.njit(cache = True, fastmath = True, inline = 'always')
def func_spline(x, ix, x0, a, b, c, d):
dx = x - x0[1:][ix]
return a[ix] + (b[ix] + (c[ix] + d[ix] * dx) * dx) * dx
# Compute piece-wise spline function for "x" out of sorted "x0" points
##numba.njit([f'f{ii}[:](f{ii}[:], f{ii}[:], f{ii}[:], f{ii}[:], f{ii}[:], f{ii}[:])' for ii in (4, 8)],
# cache = True, fastmath = True, inline = 'always')
def piece_wise_spline(x, x0, a, b, c, d):
xsh = x.shape
x = x.ravel()
ix = np.searchsorted(x0[1 : -1], x)
y = func_spline(x, ix, x0, a, b, c, d)
y = y.reshape(xsh)
return y
def main():
x0 = np.array([4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0])
y0 = np.array([0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668])
t0 = np.arange(len(x0)).astype(np.float64)
plt.plot(x0, y0)
vs = []
for e in (x0, y0):
a, b, c, d = calc_spline_params(t0, e)
x = np.linspace(0, t0[-1], 100)
vs.append(piece_wise_spline(x, t0, a, b, c, d))
plt.plot(vs[0], vs[1])
plt.show()
if __name__ == '__main__':
main()
Output:

Returning best support vectors from support vector machine

I'm using a support vector classifier from sklearn (in Python) to find the optimal boundary between a set of "0" and "1" labelled data.
See: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
However I want to perform some analysis after rotating the data about the boundary line and therefore I need to return the properties which will allow me to define the line to start with.
I carry out the SVC as follows:
Relevant imports:
from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt
I define the classifier as:
clf = svm.SVC(kernel='linear',C = 1e-3 ,class_weight='balanced')
Which is then fit to the training data:
clf.fit(f_train, labels_train)
Whereupon the linear class boundary can be viewed using:
plt.figure()
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
linestyles=['--', '-', '--'])
As seen in: https://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane.html
But when calling:
clf.support_vectors_.shape
I'm not sure how to interpret the output as being relevant if trying to describe the linear boundary as the output has shape (4485, 2)
Any help with returning something that will allow me to define the boundary line will be greatly appreciated!
Based on Plotting 3D Decision Boundary From Linear SVM you can get a boundary line using clf.intercept_ and clf.coef_ attributes:
def decision_hyperplane(clf, x, y=None, dimension=2):
"""
Return a decision line (dimension 2, return y based on x) or a
decision plane (dimension 3, return z based on x and y).
Decision plane equation is wx + b = 0, so in 2d case:
w.dot(x) + b = w_x * x + w_y * y + b = 0
y = (-w_x * x - b) / w_y
In 3d:
w_x * x + w_y * y + w_z * z + b = 0
z = (-w_x * x - w_y * y - b) / w_z
"""
if dimension == 2:
return (-clf.intercept_[0] - clf.coef_[0][0] * x) / clf.coef_[0][1]
elif dimension == 3:
return (-clf.intercept_[0] - clf.coef_[0][0] * x - clf.coef_[0][1] * y) / clf.coef_[0][2]
If you use it with your code like this
ax.plot(xx, decision_hyperplane(clf, xx), color='red')
the result will be

Equivalent of `polyfit` for a 2D polynomial in Python

I'd like to find a least-squares solution for the a coefficients in
z = (a0 + a1*x + a2*y + a3*x**2 + a4*x**2*y + a5*x**2*y**2 + a6*y**2 +
a7*x*y**2 + a8*x*y)
given arrays x, y, and z of length 20. Basically I'm looking for the equivalent of numpy.polyfit but for a 2D polynomial.
This question is similar, but the solution is provided via MATLAB.
Here is an example showing how you can use numpy.linalg.lstsq for this task:
import numpy as np
x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, y, copy=False)
Z = X**2 + Y**2 + np.random.rand(*X.shape)*0.01
X = X.flatten()
Y = Y.flatten()
A = np.array([X*0+1, X, Y, X**2, X**2*Y, X**2*Y**2, Y**2, X*Y**2, X*Y]).T
B = Z.flatten()
coeff, r, rank, s = np.linalg.lstsq(A, B)
the adjusting coefficients coeff are:
array([ 0.00423365, 0.00224748, 0.00193344, 0.9982576 , -0.00594063,
0.00834339, 0.99803901, -0.00536561, 0.00286598])
Note that coeff[3] and coeff[6] respectively correspond to X**2 and Y**2, and they are close to 1. because the example data was created with Z = X**2 + Y**2 + small_random_component.
Based on the answers from #Saullo and #Francisco I have made a function which I have found helpful:
def polyfit2d(x, y, z, kx=3, ky=3, order=None):
'''
Two dimensional polynomial fitting by least squares.
Fits the functional form f(x,y) = z.
Notes
-----
Resultant fit can be plotted with:
np.polynomial.polynomial.polygrid2d(x, y, soln.reshape((kx+1, ky+1)))
Parameters
----------
x, y: array-like, 1d
x and y coordinates.
z: np.ndarray, 2d
Surface to fit.
kx, ky: int, default is 3
Polynomial order in x and y, respectively.
order: int or None, default is None
If None, all coefficients up to maxiumum kx, ky, ie. up to and including x^kx*y^ky, are considered.
If int, coefficients up to a maximum of kx+ky <= order are considered.
Returns
-------
Return paramters from np.linalg.lstsq.
soln: np.ndarray
Array of polynomial coefficients.
residuals: np.ndarray
rank: int
s: np.ndarray
'''
# grid coords
x, y = np.meshgrid(x, y)
# coefficient array, up to x^kx, y^ky
coeffs = np.ones((kx+1, ky+1))
# solve array
a = np.zeros((coeffs.size, x.size))
# for each coefficient produce array x^i, y^j
for index, (j, i) in enumerate(np.ndindex(coeffs.shape)):
# do not include powers greater than order
if order is not None and i + j > order:
arr = np.zeros_like(x)
else:
arr = coeffs[i, j] * x**i * y**j
a[index] = arr.ravel()
# do leastsq fitting and return leastsq result
return np.linalg.lstsq(a.T, np.ravel(z), rcond=None)
And the resultant fit can be visualised with:
fitted_surf = np.polynomial.polynomial.polyval2d(x, y, soln.reshape((kx+1,ky+1)))
plt.matshow(fitted_surf)
Excellent answer by Saullo Castro. Just to add the code to reconstruct the function using the least-squares solution for the a coefficients,
def poly2Dreco(X, Y, c):
return (c[0] + X*c[1] + Y*c[2] + X**2*c[3] + X**2*Y*c[4] + X**2*Y**2*c[5] +
Y**2*c[6] + X*Y**2*c[7] + X*Y*c[8])
You can also use scikit-learn for this.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, y, copy=False)
X = X.flatten()
Y = Y.flatten()
# Generate noisy data
np.random.seed(0)
Z = X**2 + Y**2 + np.random.randn(*X.shape)*0.01
# Process 2D inputs
poly = PolynomialFeatures(degree=2)
input_pts = np.stack([X, Y]).T
assert(input_pts.shape == (400, 2))
in_features = poly.fit_transform(input_pts)
# Linear regression
model = LinearRegression()
model.fit(in_features, Z)
# Display coefficients
print(dict(zip(poly.get_feature_names_out(), model.coef_.round(4))))
# Check fit
print(f"R-squared: {model.score(poly.transform(input_pts), Z):.3f}")
# Make predictions
Z_predicted = model.predict(poly.transform(input_pts))
Out:
{'1': 0.0, 'x0': 0.003, 'x1': -0.0074, 'x0^2': 0.9974, 'x0 x1': 0.0047, 'x1^2': 1.0014}
R-squared: 1.000
Note that if kx != ky the code will fail because the j and i indices are inverted in the loop.
You get (j,i) from enumerate(np.ndindex(coeffs.shape)), but then you address elements in coeffs as coeffs[i,j]. Since the shape of the coefficient matrix is given by the maximum polynomial order that you are asking to use, the matrix will be rectangular if kx != ky and you will exceed one of its dimensions.

Categories

Resources