I'd like to find the best fit (least-squares solution) for the a coefficients in an equation similar to this one:
b = f(x,y,z) = (a0 + a1*x + a2*y + a3*z + a4*x*y + a5*x*z + a6*y*z + a7*x*y*z)
x, y, and z are small arrays with a length of about 20. The example shown is for x**k with k=1. I'm looking for a solution up to k=3.
I have found this solution for a 2D fit: Equivalent of `polyfit` for a 2D polynomial in Python.
Now I'm looking for a similar solution but in 3d.
You're right, a similar technique works:
import numpy as np
x, y, z = np.random.randn(3, 20)
grid = np.meshgrid(x, y, z, indexing='ij')
x, y, z = np.stack(grid).reshape(3, -1)
b = np.random.randn(*x.shape).reshape(-1)
A = np.stack([np.ones_like(x, dtype=x.dtype), x, y, z, x * y, x * z, y * z, x * y * z], axis=1)
coeff, r, rank, s = np.linalg.lstsq(A, b, rcond=None)
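The same idea can be extended to k=3 by generating the monomials programmatically. The following is a minimal sketch, assuming "up to k=3" means every power of x, y, and z from 0 to k independently (so 64 coefficients for k=3); the helper name `polyfit3d` and the synthetic test data are made up for illustration. Note that with only about 20 scattered points and 64 unknowns the system is underdetermined, in which case `np.linalg.lstsq` returns the minimum-norm solution.

```python
import itertools
import numpy as np

def polyfit3d(x, y, z, b, k=3):
    """Least-squares fit of b ~ sum a[i, j, l] * x**i * y**j * z**l, 0 <= i, j, l <= k.

    Hypothetical helper; returns the coefficients as a (k+1, k+1, k+1) array.
    """
    # All exponent triples (i, j, l), in C order so that reshape() below lines up.
    exponents = list(itertools.product(range(k + 1), repeat=3))
    # One column of the design matrix per monomial.
    A = np.stack([x**i * y**j * z**l for i, j, l in exponents], axis=1)
    coeff, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
    return coeff.reshape(k + 1, k + 1, k + 1)

# Synthetic, noise-free data with known coefficients (illustration only).
rng = np.random.default_rng(0)
x, y, z = rng.uniform(-1, 1, size=(3, 500))
b = 1.0 + 2.0 * x - 3.0 * y * z**2 + 0.5 * x**3 * y * z

a = polyfit3d(x, y, z, b, k=3)
print(a[0, 0, 0], a[1, 0, 0], a[0, 1, 2], a[3, 1, 1])
```

Because `a[i, j, l]` multiplies `x**i * y**j * z**l`, the fitted polynomial can be evaluated with `np.polynomial.polynomial.polyval3d(x, y, z, a)`.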
I have a set of data and want to put a parabolic fit over it. This already works with the polyfit function from numpy like this:
fit = np.polyfit(X, y, 2)
formula = np.poly1d(fit)
Now I want the parabola to have its peak at a fixed x value, with the fit still carried out as well as possible under that constraint. Is there a way to accomplish that?
From my data I know that the parabola will always be open downwards.
I think this is quite a difficult problem, since the x coordinate of the peak of a second-order polynomial (ax^2 + bx + c) always lies at x = -b/(2a).
One thing you could do is drop the b term and shift x by the desired peak value when fitting the polynomial, as in the code below. Note that I used scipy.optimize.curve_fit to fit the custom function func.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# generating a parabola with noise
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 -(x-2)**2 + np.random.normal(0, 5, x.shape)
# function to fit
def func(x, a, c):
    return a*x**2 + c
# desired x peak value
x_peak = 2
popt, pcov = curve_fit(func, x - x_peak, y)
y_fit = func(x - x_peak, *popt)
# plotting
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
plt.axvline(x_peak)
plt.show()
Outputs the image:
Fixing a point on your parabola simplifies the problem, since you can rewrite your equation slightly in terms of a constant now:
y = A(x - B)**2 + C
Given the coefficients a, b, c in your original unconstrained fit, you have the relationships
a = A
b = -2AB
c = AB**2 + C
The only difference is that since B is a constant and you don't have an x - B term in the equation, you need to set up the least-squares problem yourself. Given arrays x, y and constant B, the problem looks like this:
m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
You can then recover the usual coefficients a, b, c from the formulas above.
Here is a complete example, just like the one in the other answer:
B = 2
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 -(x - B)**2 + np.random.normal(0, 5, x.shape)
m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
a = A
b = -2 * A * B
c = A * B**2 + C
y_fit = a * x**2 + b * x + c
You can drop a, b, c entirely and do
y_fit = A * (x - B)**2 + C
The result will be identical.
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
Without the constraint on the peak location, the function to be fitted would be:
y = a x^2 + b x + c
With the peak constrained to lie at x = p, for a given p:
-b/(2a) = p
b = -2 a p
y = a x^2 - 2 a p x + c
y = a (x^2 - 2 p x) + c
Since p is known, apply the change of variable:
X = x^2 - 2 p x
So, from the data (x, y), first compute the new data (X, y).
Then a and c are obtained by linear regression:
y = a X + c
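The change of variable above can be sketched in a few lines. This is only an illustration under assumed data: the peak location `p = 2.0` and the noisy samples are invented, and `np.polyfit` with degree 1 stands in for the linear regression step.

```python
import numpy as np

p = 2.0  # assumed, prescribed x position of the peak

# Invented noisy parabola whose true peak is at x = p.
rng = np.random.default_rng(42)
x = np.linspace(-10, 10, 100)
y = 10 - (x - p)**2 + rng.normal(0, 5, x.shape)

# Change of variable: X = x^2 - 2 p x, so that y = a X + c is linear in (a, c).
X = x**2 - 2 * p * x
a, c = np.polyfit(X, y, 1)   # returns slope first, then intercept

# Recover b from the constraint and confirm where the vertex ends up.
b = -2 * a * p
x_vertex = -b / (2 * a)
print(a, b, c, x_vertex)
```

By construction the vertex sits at `p` regardless of the noise; only `a` and `c` are estimated from the data.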
So I'm new to Python and I'd like to convert a 3D-array containing cartesian coordinates to spherical coordinates. I have done this function that calculates the conversion:
def cart2sph(x, y, z):
    xy = np.sqrt(x**2 + y**2) # sqrt(x² + y²)
    x_2 = x**2
    y_2 = y**2
    z_2 = z**2
    r = np.sqrt(x_2 + y_2 + z_2) # r = sqrt(x² + y² + z²)
    theta = np.arctan2(y, x)
    phi = np.arctan2(xy, z)
    return r, theta, phi
However, if I have a random array (N,N,N), such as
N = 3
array_np = np.random.rand(N, N, N).astype(dtype=np.float16)
And pass the x, y and z coordinates to my function to convert from cartesian to spherical
x = np.asarray(array_np)[:,0].astype(dtype=np.float16)
y = np.asarray(array_np)[:,1].astype(dtype=np.float16)
z = np.asarray(array_np)[:,2].astype(dtype=np.float16)
sphere_coord = cart2sph(x,y,z)
I keep getting wrong conversion results. I've tried different approaches but still couldn't figure out what I am doing wrong.
I have checked the function with a unique (x, y, z) and it seems to be converting to (r, theta, phi) just fine.
I think your problem is in how you are getting the random (x, y, z). Maybe try something like this:
import numpy as np
def cart2sph(x, y, z):
    xy = np.sqrt(x**2 + y**2) # sqrt(x² + y²)
    x_2 = x**2
    y_2 = y**2
    z_2 = z**2
    r = np.sqrt(x_2 + y_2 + z_2) # r = sqrt(x² + y² + z²)
    theta = np.arctan2(y, x)
    phi = np.arctan2(xy, z)
    return r, theta, phi
N = 3
array_np = np.random.rand(N).astype(dtype=np.float16)
print('array_np:')
print(array_np)
x = np.asarray(array_np)[0].astype(dtype=np.float16)
y = np.asarray(array_np)[1].astype(dtype=np.float16)
z = np.asarray(array_np)[2].astype(dtype=np.float16)
sphere_coord = cart2sph(x,y,z)
print('\nCartesian:')
print('x',x,'\ny',y,'\nz',z)
print('\nSpherical:')
print(sphere_coord)
Output:
array_np:
[0.2864 0.938 0.9243]
Cartesian:
x 0.2864
y 0.938
z 0.9243
Spherical:
(1.3476626409849026, 1.274, 0.8150028593437515)
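To convert many points at once, one option is to store them as an (M, 3) array with one point per row and unpack the columns. Indexing an (N, N, N) array with `[:, 0]`, as in the question, yields (N, N) slices rather than coordinate columns. The sketch below assumes that one-point-per-row layout was the intent; the array name `points` and the round-trip check are additions for illustration.

```python
import numpy as np

def cart2sph(x, y, z):
    """Cartesian -> spherical, element-wise on arrays of equal shape."""
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arctan2(y, x)                     # azimuth in the x-y plane
    phi = np.arctan2(np.sqrt(x**2 + y**2), z)    # polar angle measured from +z
    return r, theta, phi

# Assumed layout: M points, one (x, y, z) triple per row.
rng = np.random.default_rng(0)
points = rng.random((5, 3))

x, y, z = points.T                # three 1-D arrays of length M
r, theta, phi = cart2sph(x, y, z)

# Round trip back to Cartesian as a sanity check.
back = np.column_stack((r * np.sin(phi) * np.cos(theta),
                        r * np.sin(phi) * np.sin(theta),
                        r * np.cos(phi)))
print(np.allclose(back, points))
```

Keeping the data in float64 rather than float16 also avoids the mixed precision visible in the output above.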
Given a set of points in (X, Y, Z) coordinates that are points on a surface, I would like to be able to interpolate Z-values at arbitrary (X, Y) coordinates. I've found some success using mlab.griddata for interpolating values on a grid, but I want to be able to call a general use function for any (X, Y) coordinate.
The set of points form a roughly hemispherical surface. To simplify the problem, I am trying to write a method that interpolates values between known points of the hemisphere defined by the x, y, and z coordinates below. Although there is an analytical solution to find z = f(x, y) for a perfect sphere, such that you don't have to interpolate, the actual set of points will not be a perfect sphere, so we should assume that we need to interpolate values at unknown (X, Y) coordinates. Link to IPython notebook with point data
resolution = 10
u = np.linspace(-np.pi / 2, np.pi / 2, resolution)
v = np.linspace(0, np.pi, resolution)
U, V = np.meshgrid(u, v)
xs = np.sin(U) * np.cos(V)
ys = np.sin(U) * np.sin(V)
zs = np.cos(U)
I have been using scipy.interpolate.interp2d, which "returns a function whose call method uses spline interpolation to find the value of new points."
def polar(xs, ys, zs, resolution=10):
    rs = np.sqrt(np.multiply(xs, xs) + np.multiply(ys, ys))
    ts = np.arctan2(ys, xs)
    func = interp2d(rs, ts, zs, kind='cubic')
    vectorized = np.vectorize(func)
    # Guesses
    ri = np.linspace(0, rs.max(), resolution)
    ti = np.linspace(0, np.pi * 2, resolution)
    R, T = np.meshgrid(ri, ti)
    Z = vectorized(R, T)
    return R * np.cos(T), R * np.sin(T), Z
Unfortunately I get pretty weird results, similarly to another StackOverflow user who tried to use interp2d.
The most success I have found thus far is using inverse squares to estimate values of Z at (X, Y). But the function is not perfect for estimating values of Z near Z=0.
What can I do to get a function z = f(x, y) given a set of points in (x, y, z)? Am I missing something here... do I need more than a point cloud to reliably estimate a value on a surface?
EDIT:
This is the function that I ended up writing. The function takes input arrays of xs, ys, zs and interpolates at x, y using scipy.interpolate.griddata, which does not require a regular grid. I'm sure there is a smarter way to do this and would appreciate any updates, but it works and I'm not concerned with performance. Including a snippet in case it helps anyone in the future.
def interpolate(x, y, xs, ys, zs):
    r = np.sqrt(x*x + y*y)
    t = np.arctan2(y, x)
    rs = np.sqrt(np.multiply(xs, xs) + np.multiply(ys, ys))
    ts = np.arctan2(ys, xs)
    rs = rs.ravel()
    ts = ts.ravel()
    zs = zs.ravel()
    ts = np.concatenate((ts - np.pi * 2, ts, ts + np.pi * 2))
    rs = np.concatenate((rs, rs, rs))
    zs = np.concatenate((zs, zs, zs))
    Z = scipy.interpolate.griddata((rs, ts), zs, (r, t))
    Z = Z.ravel()
    R, T = np.meshgrid(r, t)
    return Z
You're saying that you've tried using griddata. So why was that not working? griddata also works if the new points are not regularly spaced. For example,
# Definitions of xs, ys and zs
nx, ny = 20, 30
x = np.linspace(0, np.pi, nx)
y = np.linspace(0, 2*np.pi, ny)
X,Y = np.meshgrid(x, y)
xs = X.reshape((nx*ny, 1))
ys = Y.reshape((nx*ny, 1))
## Arbitrary definition of zs
zs = np.cos(3*xs/2.)+np.sin(5*ys/3.)**2
## new points where I want the interpolations
points = np.random.rand(1000, 2)
import scipy.interpolate
zs2 = scipy.interpolate.griddata(np.hstack((xs, ys)), zs, points)
Is this not what you are after?
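If a reusable callable z = f(x, y) is wanted rather than a one-off call, `scipy.interpolate.LinearNDInterpolator` builds the triangulation once and can then be evaluated at arbitrary points. The sketch below is an illustration only: the scattered hemisphere samples are invented, and piecewise-linear interpolation is just one possible choice.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Invented scattered samples of the upper unit hemisphere (illustration only).
rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(4000, 2))
xy = xy[np.sum(xy**2, axis=1) <= 1.0]        # keep points inside the unit disk
z = np.sqrt(1.0 - np.sum(xy**2, axis=1))

# Build once, call many times. Points outside the convex hull of the
# samples evaluate to NaN by default.
f = LinearNDInterpolator(xy, z)

xq = np.array([0.3, 0.0, 2.0])
yq = np.array([0.4, 0.0, 2.0])
zq = f(xq, yq)
print(zq)
```

The last query point lies outside the sampled region, so its result is NaN; a different `fill_value` can be passed to the constructor if another placeholder is preferred.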
If I understand your question, you have points xs, ys, zs that are defined by
xs = np.sin(U) * np.cos(V)
ys = np.sin(U) * np.sin(V)
zs = np.cos(U)
What you want is to be able to interpolate and find a z-value for a given x and y? Why do you need interpolation? The above equations represent a sphere, they can be rewritten as xs*xs + ys*ys + zs*zs = 1, so there is an easy analytical solution to this problem:
def Z(X, Y):
    return np.sqrt(1-X**2-Y**2)
    ## or return -np.sqrt(1-X**2-Y**2) since this equation has two solutions
unless I misunderstood the question.
I'd like to find a least-squares solution for the a coefficients in
z = (a0 + a1*x + a2*y + a3*x**2 + a4*x**2*y + a5*x**2*y**2 + a6*y**2 +
a7*x*y**2 + a8*x*y)
given arrays x, y, and z of length 20. Basically I'm looking for the equivalent of numpy.polyfit but for a 2D polynomial.
This question is similar, but the solution is provided via MATLAB.
Here is an example showing how you can use numpy.linalg.lstsq for this task:
import numpy as np
x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, y, copy=False)
Z = X**2 + Y**2 + np.random.rand(*X.shape)*0.01
X = X.flatten()
Y = Y.flatten()
A = np.array([X*0+1, X, Y, X**2, X**2*Y, X**2*Y**2, Y**2, X*Y**2, X*Y]).T
B = Z.flatten()
coeff, r, rank, s = np.linalg.lstsq(A, B, rcond=None)
The fitted coefficients coeff are:
array([ 0.00423365, 0.00224748, 0.00193344, 0.9982576 , -0.00594063,
0.00834339, 0.99803901, -0.00536561, 0.00286598])
Note that coeff[3] and coeff[6] correspond to X**2 and Y**2 respectively, and they are close to 1 because the example data was created with Z = X**2 + Y**2 + small_random_component.
Based on the answers from @Saullo and @Francisco I have made a function which I have found helpful:
def polyfit2d(x, y, z, kx=3, ky=3, order=None):
    '''
    Two dimensional polynomial fitting by least squares.
    Fits the functional form f(x,y) = z.
    Notes
    -----
    Resultant fit can be plotted with:
    np.polynomial.polynomial.polygrid2d(x, y, soln.reshape((kx+1, ky+1)))
    Parameters
    ----------
    x, y: array-like, 1d
        x and y coordinates.
    z: np.ndarray, 2d
        Surface to fit.
    kx, ky: int, default is 3
        Polynomial order in x and y, respectively.
    order: int or None, default is None
        If None, all coefficients up to maximum kx, ky, i.e. up to and including x^kx*y^ky, are considered.
        If int, coefficients up to a maximum of kx+ky <= order are considered.
    Returns
    -------
    Return parameters from np.linalg.lstsq.
    soln: np.ndarray
        Array of polynomial coefficients.
    residuals: np.ndarray
    rank: int
    s: np.ndarray
    '''
    # grid coords
    x, y = np.meshgrid(x, y)
    # coefficient array, up to x^kx, y^ky
    coeffs = np.ones((kx+1, ky+1))
    # solve array
    a = np.zeros((coeffs.size, x.size))
    # for each coefficient produce array x^i, y^j
    for index, (j, i) in enumerate(np.ndindex(coeffs.shape)):
        # do not include powers greater than order
        if order is not None and i + j > order:
            arr = np.zeros_like(x)
        else:
            arr = coeffs[i, j] * x**i * y**j
        a[index] = arr.ravel()
    # do leastsq fitting and return leastsq result
    return np.linalg.lstsq(a.T, np.ravel(z), rcond=None)
And the resultant fit can be visualised with:
fitted_surf = np.polynomial.polynomial.polyval2d(x, y, soln.reshape((kx+1,ky+1)))
plt.matshow(fitted_surf)
Excellent answer by Saullo Castro. Just to add the code to reconstruct the function using the least-squares solution for the a coefficients,
def poly2Dreco(X, Y, c):
    return (c[0] + X*c[1] + Y*c[2] + X**2*c[3] + X**2*Y*c[4] + X**2*Y**2*c[5] +
            Y**2*c[6] + X*Y**2*c[7] + X*Y*c[8])
You can also use scikit-learn for this.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, y, copy=False)
X = X.flatten()
Y = Y.flatten()
# Generate noisy data
np.random.seed(0)
Z = X**2 + Y**2 + np.random.randn(*X.shape)*0.01
# Process 2D inputs
poly = PolynomialFeatures(degree=2)
input_pts = np.stack([X, Y]).T
assert(input_pts.shape == (400, 2))
in_features = poly.fit_transform(input_pts)
# Linear regression
model = LinearRegression()
model.fit(in_features, Z)
# Display coefficients
print(dict(zip(poly.get_feature_names_out(), model.coef_.round(4))))
# Check fit
print(f"R-squared: {model.score(poly.transform(input_pts), Z):.3f}")
# Make predictions
Z_predicted = model.predict(poly.transform(input_pts))
Out:
{'1': 0.0, 'x0': 0.003, 'x1': -0.0074, 'x0^2': 0.9974, 'x0 x1': 0.0047, 'x1^2': 1.0014}
R-squared: 1.000
Note that if kx != ky the polyfit2d code above will fail, because the j and i indices are inverted in the loop.
You get (j,i) from enumerate(np.ndindex(coeffs.shape)), but then you address elements in coeffs as coeffs[i,j]. Since the shape of the coefficient matrix is given by the maximum polynomial order that you are asking to use, the matrix will be rectangular if kx != ky and you will exceed one of its dimensions.
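One way the loop could be rewritten so that kx != ky works is sketched below. It assumes the intended convention is that coefficient [i, j] multiplies x**i * y**j, which is the convention `np.polynomial.polynomial.polyval2d` uses; the name `polyfit2d_fixed` and the test surface are made up for illustration, and this is not the original author's code.

```python
import numpy as np

def polyfit2d_fixed(x, y, z, kx=3, ky=3, order=None):
    """Sketch of a 2D least-squares polynomial fit that allows kx != ky.

    Hypothetical variant: the unpacked indices (i, j) follow the shape
    (kx+1, ky+1), so i is the power of x and j is the power of y.
    """
    x, y = np.meshgrid(x, y)          # z is expected with shape (len(y), len(x))
    a = np.zeros(((kx + 1) * (ky + 1), x.size))
    for index, (i, j) in enumerate(np.ndindex((kx + 1, ky + 1))):
        if order is not None and i + j > order:
            continue                  # leave the row as zeros: term excluded
        a[index] = (x**i * y**j).ravel()
    return np.linalg.lstsq(a.T, np.ravel(z), rcond=None)

# Illustration with different orders in x and y.
x = np.linspace(0, 1, 15)
y = np.linspace(0, 1, 12)
Xg, Yg = np.meshgrid(x, y)
z = 1 + 2 * Xg + 3 * Yg**2

soln, *_ = polyfit2d_fixed(x, y, z, kx=1, ky=2)
c = soln.reshape((2, 3))              # c[i, j] multiplies x**i * y**j
print(c.round(6))
```

With this ordering, `np.polynomial.polynomial.polyval2d(Xg, Yg, c)` evaluates the fitted surface on the same grid.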
I have a set of x, y points and I'd like to find the line of best fit such that the line is below all points using SciPy. I'm trying to use leastsq for this, but I'm unsure how to adjust the line to be below all points instead of the line of best fit. The coefficients for the line of best fit can be produced via:
def linreg(x, y):
    fit = lambda params, x: params[0] * x - params[1]
    err = lambda p, x, y: (y - fit(p, x))**2
    # initial slope/intercept
    init_p = np.array((1, 0))
    p, _ = leastsq(err, init_p.copy(), args=(x, y))
    return p
xs = np.array([1, 2, 3, 4, 5])
ys = np.array([10, 20, 30, 40, 50])
print(linreg(xs, ys))
The output is the coefficients for the line of best fit:
array([ 9.99999997e+00, -1.68071668e-15])
How can I get the coefficients of the line of best fit that is below all points?
A possible algorithm is as follows:
Move the axes to have all the data on the positive half of the x axis.
If the fit is of the form y = a * x + b, then for a given b the best fit for a will be the minimum of the slopes joining the point (0, b) with each of the (x, y) points.
You can then calculate a fit error, which is a function of only b, and use scipy.optimize.minimize to find the best value for b.
All that's left is computing a for that b and calculating b for the original position of the axes.
The following does that most of the time, except when the minimization fails with some mysterious error:
from __future__ import division
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
def fit_below(x, y):
    idx = np.argsort(x)
    x = x[idx]
    y = y[idx]
    x0, y0 = x[0] - 1, y[0]
    x -= x0
    y -= y0
    def error_function_2(b, x, y):
        a = np.min((y - b) / x)
        return np.sum((y - a * x - b)**2)
    b = scipy.optimize.minimize(error_function_2, [0], args=(x, y)).x[0]
    a = np.min((y - b) / x)
    return a, b - a * x0 + y0
x = np.arange(10).astype(float)
y = x * 2 + 3 + 3 * np.random.rand(len(x))
a, b = fit_below(x, y)
plt.plot(x, y, 'o')
plt.plot(x, a*x + b, '-')
plt.show()
And as TheodrosZelleke wisely predicted, it goes through two points that are part of the convex hull:
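A different route to the same goal, not the algorithm described in this answer, is to treat it as least squares with inequality constraints and hand it to a constrained solver. The sketch below uses `scipy.optimize.minimize` with `method='SLSQP'`, requiring y_i - (a*x_i + b) >= 0 for every point; the function name `fit_below_slsqp` and the test data are invented for illustration, and the constraints are only satisfied up to the solver's tolerance.

```python
import numpy as np
from scipy.optimize import minimize

def fit_below_slsqp(x, y):
    """Least-squares line a*x + b constrained to lie on or below every point.

    Hypothetical helper using SLSQP; a sketch, not a tuned implementation.
    """
    def sse(p):
        return np.sum((y - (p[0] * x + p[1]))**2)

    # 'ineq' constraints must be non-negative at a feasible point.
    below_all = {'type': 'ineq', 'fun': lambda p: y - (p[0] * x + p[1])}

    # Start from the ordinary fit, shifted down until it is feasible.
    a0, b0 = np.polyfit(x, y, 1)
    b0 += np.min(y - (a0 * x + b0))

    res = minimize(sse, [a0, b0], method='SLSQP', constraints=[below_all])
    return res.x

rng = np.random.default_rng(0)
x = np.arange(10).astype(float)
y = x * 2 + 3 + 3 * rng.random(len(x))

a, b = fit_below_slsqp(x, y)
print(a, b, np.min(y - (a * x + b)))
```

The smallest residual printed at the end should be approximately zero, meaning the line touches at least one data point while staying below the rest.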