How to subtract a "y" value from the trendline **numpy and pylab**

I have a 3-point graph with a trendline, but I need to find the "x" value for a specific y value. Here is what I have:
from numpy import *
from pylab import *
x = ng
y = density
plt.scatter(x, y)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
plt.plot(x,p(x),"r--")
The x value is a DNA concentration, whereas the y value is a densitometric value that I calculated. I need to find the DNA concentration for a density of 19159.8.
Can somebody help me, please?

The inverse function of y = a*x + b is simply x = (1/a)*y + (-b/a):
a, b = z
z_inv = np.array([1 / a, -b / a])
p_inv = np.poly1d(z_inv)
print(np.allclose(p_inv(p(x)), x))
# True
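
To get the concentration for one specific density, the inverted line can be evaluated directly at that value. The following is a minimal sketch, assuming hypothetical calibration data: the ng and density arrays are invented for illustration, and only the target value 19159.8 comes from the question.

```python
import numpy as np

# Hypothetical calibration data (not from the question):
# DNA amount (x) and measured densitometric value (y).
ng = np.array([10.0, 20.0, 40.0])
density = np.array([5000.0, 10000.0, 20000.0])

# Fit the trendline y = a*x + b.
a, b = np.polyfit(ng, density, 1)

# Invert the line to get x for the target density.
y_target = 19159.8
x_target = (y_target - b) / a
print(x_target)
```

This only makes sense when the slope a is non-zero, and the result is an extrapolation if y_target lies outside the range of the measured densities.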

Related

Parabolic fit with fixed peak

I have a set of data and want to put a parabolic fit over it. This already works with the polyfit function from numpy like this:
fit = np.polyfit(X, y, 2)
formula = np.poly1d(fit)
Now I want the parabola to have its peak at a fixed x value, with the fit still being as good as possible given this fixed peak. Is there a way to accomplish that?
From my data I know that the parabola will always be open downwards.

I think this is quite a difficult problem, since the x coordinate of the peak of a second-order polynomial (ax^2 + bx + c) always lies at x = -b/(2a).
One thing you could do is drop the b term and shift x by the desired peak value when fitting the polynomial, as in the code below. Note that I used scipy.optimize.curve_fit to fit the custom function func.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# generating a parabola with noise
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 -(x-2)**2 + np.random.normal(0, 5, x.shape)
# function to fit
def func(x, a, c):
    return a*x**2 + c
# desired x peak value
x_peak = 2
popt, pcov = curve_fit(func, x - x_peak, y)
y_fit = func(x - x_peak, *popt)
# plotting
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
plt.axvline(x_peak)
plt.show()
Outputs the image:

Fixing a point on your parabola simplifies the problem, since you can rewrite your equation slightly in terms of a constant now:
y = A(x - B)**2 + C
Given the coefficients a, b, c in your original unconstrained fit, you have the relationships
a = A
b = -2AB
c = AB**2 + C
The only difference is that, since B is a known constant, the model has just two free parameters, one for the (x - B)**2 term and one for the constant term, so you need to set up the least-squares problem yourself. Given arrays x, y and constant B, the problem looks like this:
m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
You can then recover the standard coefficients from the formulas for a, b, c above.
Here is a complete example, just like the one in the other answer:
B = 2
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 -(x - B)**2 + np.random.normal(0, 5, x.shape)
m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
a = A
b = -2 * A * B
c = A * B**2 + C
y_fit = a * x**2 + b * x + c
You can drop a, b, c entirely and do
y_fit = A * (x - B)**2 + C
The result will be identical.
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)

Without the constraint on the location of the peak, the function to be fitted would be:
y = a x^2 + b x + c
With the peak constrained to lie at x = p, for a given p:
-b/(2a) = p
b = -2 a p
y = a x^2 - 2 a p x + c
y = a (x^2 - 2 p x) + c
Since p is known, apply the change of variable:
X = x^2 - 2 p x
So, from the data (x, y), first compute the new data (X, y).
Then a and c are obtained by linear regression of:
y = a X + c
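
The change of variable above can be sketched in a few lines. This is only an illustration on synthetic data: the noisy parabola, the seed, and the choice of np.polyfit for the linear regression are assumptions, not part of the answer.

```python
import numpy as np

# Synthetic data (assumed for illustration): a downward parabola peaking at x = 2.
rng = np.random.default_rng(42)
p = 2.0
x = np.linspace(-10, 10, 100)
y = 10 - (x - p)**2 + rng.normal(0, 5, x.shape)

# Change of variable: X = x^2 - 2*p*x turns the model into y = a*X + c.
X = x**2 - 2 * p * x

# Ordinary linear regression of y on X gives a and c.
a, c = np.polyfit(X, y, 1)

# Recover b from the peak condition b = -2*a*p and evaluate the fit.
b = -2 * a * p
y_fit = a * x**2 + b * x + c
peak_x = -b / (2 * a)
```

By construction peak_x equals p, so only a and c are actually estimated from the data.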

Cubic spline for non-monotonic data (not a 1d function)

I have a curve as shown below:
The x coordinates and the y coordinates for this plot are:
path_x= (4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0)
path_y =(0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668)
And I obtained the above picture by:
x_min =min(path_x)-1
x_max =max(path_x)+1
y_min =min(path_y)-1
y_max =max(path_y)+1
num_pts = len(path_x)
fig = plt.figure(figsize=(8,8))
#fig = plt.figure()
plt.suptitle("Curve and the boundary")
ax = fig.add_subplot(1,1,1)
ax.set_xlim([min(x_min,y_min),max(x_max,y_max)])
ax.set_ylim([min(x_min,y_min),max(x_max,y_max)])
ax.plot(path_x,path_y)
Now I want to draw a smooth curve through these points using cubic splines. It looks like cubic splines need the x coordinates to be in ascending order, whereas in this case neither the x values nor the y values are ascending.
Also, this is not a function: an x value can map to more than one y value.
I also went over this post, but I couldn't figure out a proper method to solve my problem.
I would really appreciate your help in this regard.

As suggested in the comments, you can always parameterize any curve/surface with an arbitrary (and linear!) parameter.
For example, define t as a parameter such that you get x=x(t) and y=y(t). Since t is arbitrary, you can define it such that at t=0, you get your first path_x[0],path_y[0], and at t=1, you get your last pair of coordinates, path_x[-1],path_y[-1].
Here is an example using scipy.interpolate:
import numpy
import scipy.interpolate
import matplotlib.pyplot as plt
path_x = numpy.asarray((4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0),dtype=float)
path_y = numpy.asarray((0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668),dtype=float)
# defining arbitrary parameter to parameterize the curve
path_t = numpy.linspace(0,1,path_x.size)
# this is the position vector with
# x coord (1st row) given by path_x, and
# y coord (2nd row) given by path_y
r = numpy.vstack((path_x.reshape((1,path_x.size)),path_y.reshape((1,path_y.size))))
# creating the spline object
spline = scipy.interpolate.interp1d(path_t,r,kind='cubic')
# defining values of the arbitrary parameter over which
# you want to interpolate x and y
# it MUST be within 0 and 1, since you defined
# the spline between path_t=0 and path_t=1
t = numpy.linspace(numpy.min(path_t),numpy.max(path_t),100)
# interpolating along t
# r[0,:] -> interpolated x coordinates
# r[1,:] -> interpolated y coordinates
r = spline(t)
plt.plot(path_x,path_y,'or')
plt.plot(r[0,:],r[1,:],'-k')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
With output

For non-ascending x, splines can be computed easily if you make both x and y functions of another parameter t: x(t), y(t).
In your case you have 5 points, so t can simply enumerate them, i.e. t = 0, 1, 2, 3, 4.
So if x = [5, 2, 7, 3, 6], then x(0) = 5, x(1) = 2, x(2) = 7, x(3) = 3, x(4) = 6. The same goes for y.
Then compute a spline for each of x(t) and y(t), evaluate both splines at many intermediate t values, and plot the resulting x(t) against y(t) to obtain the curve.
I once implemented cubic spline computation from scratch with NumPy, so I use that code in the example below (it may be useful for learning the spline math); you can replace it with library functions. The commented-out Numba annotations can be enabled to speed up the computation.
Look at the main() function at the bottom of the code; it shows how to compute and use x(t) and y(t).
import numpy as np, matplotlib.pyplot as plt

# Solves linear system given by Tridiagonal Matrix
# Helper for calculating cubic splines
#@numba.njit(cache = True, fastmath = True, inline = 'always')
def tri_diag_solve(A, B, C, F):
    n = B.size
    assert A.ndim == B.ndim == C.ndim == F.ndim == 1 and (
        A.size == B.size == C.size == F.size == n
    ) #, (A.shape, B.shape, C.shape, F.shape)
    Bs, Fs = np.zeros_like(B), np.zeros_like(F)
    Bs[0], Fs[0] = B[0], F[0]
    for i in range(1, n):
        Bs[i] = B[i] - A[i] / Bs[i - 1] * C[i - 1]
        Fs[i] = F[i] - A[i] / Bs[i - 1] * Fs[i - 1]
    x = np.zeros_like(B)
    x[-1] = Fs[-1] / Bs[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (Fs[i] - C[i] * x[i + 1]) / Bs[i]
    return x

# Calculate cubic spline params
#@numba.njit(cache = True, fastmath = True, inline = 'always')
def calc_spline_params(x, y):
    a = y
    h = np.diff(x)
    c = np.concatenate((np.zeros((1,), dtype = y.dtype),
        np.append(tri_diag_solve(h[:-1], (h[:-1] + h[1:]) * 2, h[1:],
        ((a[2:] - a[1:-1]) / h[1:] - (a[1:-1] - a[:-2]) / h[:-1]) * 3), 0)))
    d = np.diff(c) / (3 * h)
    b = (a[1:] - a[:-1]) / h + (2 * c[1:] + c[:-1]) / 3 * h
    return a[1:], b, c[1:], d

# Spline value calculating function, given params and "x"
#@numba.njit(cache = True, fastmath = True, inline = 'always')
def func_spline(x, ix, x0, a, b, c, d):
    dx = x - x0[1:][ix]
    return a[ix] + (b[ix] + (c[ix] + d[ix] * dx) * dx) * dx

# Compute piece-wise spline function for "x" out of sorted "x0" points
#@numba.njit([f'f{ii}[:](f{ii}[:], f{ii}[:], f{ii}[:], f{ii}[:], f{ii}[:], f{ii}[:])' for ii in (4, 8)],
#    cache = True, fastmath = True, inline = 'always')
def piece_wise_spline(x, x0, a, b, c, d):
    xsh = x.shape
    x = x.ravel()
    ix = np.searchsorted(x0[1 : -1], x)
    y = func_spline(x, ix, x0, a, b, c, d)
    y = y.reshape(xsh)
    return y

def main():
    x0 = np.array([4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0])
    y0 = np.array([0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668])
    t0 = np.arange(len(x0)).astype(np.float64)
    plt.plot(x0, y0)
    vs = []
    for e in (x0, y0):
        a, b, c, d = calc_spline_params(t0, e)
        x = np.linspace(0, t0[-1], 100)
        vs.append(piece_wise_spline(x, t0, a, b, c, d))
    plt.plot(vs[0], vs[1])
    plt.show()

if __name__ == '__main__':
    main()
Output:
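
The same parametric idea can also be written with a library spline. The sketch below uses scipy.interpolate.CubicSpline and, as an assumption that differs from both answers, takes t to be the cumulative chord length between consecutive points rather than an evenly spaced index; either choice gives a strictly increasing parameter.

```python
import numpy as np
from scipy.interpolate import CubicSpline

path_x = np.array([4.0, 5.638304088577984, 6.785456961280076, 5.638304088577984, 4.0])
path_y = np.array([0.0, 1.147152872702092, 2.7854569612800755, 4.423761049858059, 3.2766081771559668])

# Parameter t: cumulative chord length along the polyline (strictly increasing
# as long as no two consecutive points coincide).
seg = np.hypot(np.diff(path_x), np.diff(path_y))
t = np.concatenate(([0.0], np.cumsum(seg)))

# One spline object interpolating both coordinates as functions of t.
points = np.column_stack((path_x, path_y))
spline = CubicSpline(t, points)

# Evaluate on a dense grid of t values.
t_fine = np.linspace(t[0], t[-1], 200)
curve = spline(t_fine)  # shape (200, 2): columns are x(t) and y(t)
```

Plotting curve[:, 0] against curve[:, 1] with matplotlib then draws the smooth path through the original points.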

Equivalent of `polyfit` for a 2D polynomial in Python

I'd like to find a least-squares solution for the a coefficients in
z = (a0 + a1*x + a2*y + a3*x**2 + a4*x**2*y + a5*x**2*y**2 + a6*y**2 +
a7*x*y**2 + a8*x*y)
given arrays x, y, and z of length 20. Basically I'm looking for the equivalent of numpy.polyfit but for a 2D polynomial.
This question is similar, but the solution is provided via MATLAB.

Here is an example showing how you can use numpy.linalg.lstsq for this task:
import numpy as np
x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, y, copy=False)
Z = X**2 + Y**2 + np.random.rand(*X.shape)*0.01
X = X.flatten()
Y = Y.flatten()
A = np.array([X*0+1, X, Y, X**2, X**2*Y, X**2*Y**2, Y**2, X*Y**2, X*Y]).T
B = Z.flatten()
coeff, r, rank, s = np.linalg.lstsq(A, B, rcond=None)
The fitted coefficients coeff are:
array([ 0.00423365, 0.00224748, 0.00193344, 0.9982576 , -0.00594063,
0.00834339, 0.99803901, -0.00536561, 0.00286598])
Note that coeff[3] and coeff[6] correspond to X**2 and Y**2 respectively, and they are close to 1 because the example data was created with Z = X**2 + Y**2 + small_random_component.

Based on the answers from @Saullo and @Francisco, I have made a function which I have found helpful:
def polyfit2d(x, y, z, kx=3, ky=3, order=None):
    '''
    Two dimensional polynomial fitting by least squares.
    Fits the functional form f(x,y) = z.

    Notes
    -----
    Resultant fit can be plotted with:
    np.polynomial.polynomial.polygrid2d(x, y, soln.reshape((kx+1, ky+1)))

    Parameters
    ----------
    x, y: array-like, 1d
        x and y coordinates.
    z: np.ndarray, 2d
        Surface to fit.
    kx, ky: int, default is 3
        Polynomial order in x and y, respectively.
    order: int or None, default is None
        If None, all coefficients up to maximum kx, ky, i.e. up to and including x^kx*y^ky, are considered.
        If int, coefficients up to a maximum of kx+ky <= order are considered.

    Returns
    -------
    Return parameters from np.linalg.lstsq.
    soln: np.ndarray
        Array of polynomial coefficients.
    residuals: np.ndarray
    rank: int
    s: np.ndarray
    '''
    # grid coords
    x, y = np.meshgrid(x, y)
    # coefficient array, up to x^kx, y^ky
    coeffs = np.ones((kx+1, ky+1))
    # solve array
    a = np.zeros((coeffs.size, x.size))
    # for each coefficient produce array x^i, y^j
    for index, (j, i) in enumerate(np.ndindex(coeffs.shape)):
        # do not include powers greater than order
        if order is not None and i + j > order:
            arr = np.zeros_like(x)
        else:
            arr = coeffs[i, j] * x**i * y**j
        a[index] = arr.ravel()
    # do leastsq fitting and return leastsq result
    return np.linalg.lstsq(a.T, np.ravel(z), rcond=None)
And the resultant fit can be visualised with:
fitted_surf = np.polynomial.polynomial.polyval2d(x, y, soln.reshape((kx+1,ky+1)))
plt.matshow(fitted_surf)

Excellent answer by Saullo Castro. Just to add the code to reconstruct the function using the least-squares solution for the a coefficients,
def poly2Dreco(X, Y, c):
    return (c[0] + X*c[1] + Y*c[2] + X**2*c[3] + X**2*Y*c[4] + X**2*Y**2*c[5] +
            Y**2*c[6] + X*Y**2*c[7] + X*Y*c[8])

You can also use scikit-learn for this.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, y, copy=False)
X = X.flatten()
Y = Y.flatten()
# Generate noisy data
np.random.seed(0)
Z = X**2 + Y**2 + np.random.randn(*X.shape)*0.01
# Process 2D inputs
poly = PolynomialFeatures(degree=2)
input_pts = np.stack([X, Y]).T
assert(input_pts.shape == (400, 2))
in_features = poly.fit_transform(input_pts)
# Linear regression
model = LinearRegression()
model.fit(in_features, Z)
# Display coefficients
print(dict(zip(poly.get_feature_names_out(), model.coef_.round(4))))
# Check fit
print(f"R-squared: {model.score(poly.transform(input_pts), Z):.3f}")
# Make predictions
Z_predicted = model.predict(poly.transform(input_pts))
Out:
{'1': 0.0, 'x0': 0.003, 'x1': -0.0074, 'x0^2': 0.9974, 'x0 x1': 0.0047, 'x1^2': 1.0014}
R-squared: 1.000

Note that if kx != ky the code will fail because the j and i indices are inverted in the loop.
You get (j,i) from enumerate(np.ndindex(coeffs.shape)), but then you address elements in coeffs as coeffs[i,j]. Since the shape of the coefficient matrix is given by the maximum polynomial order that you are asking to use, the matrix will be rectangular if kx != ky and you will exceed one of its dimensions.
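
One way to sidestep the index ordering altogether is to let NumPy build the design matrix. The sketch below is an assumption-laden alternative, not the function from the earlier answer: the name polyfit2d_vander is hypothetical, it takes per-point coordinates (for example the arrays returned by meshgrid) instead of 1D axes, and it has no order argument. It relies on numpy.polynomial.polynomial.polyvander2d, whose columns are x**i * y**j with i varying slowest.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def polyfit2d_vander(x, y, z, kx=3, ky=3):
    """Least-squares fit of z ~ sum_ij c[i, j] * x**i * y**j (illustrative sketch)."""
    x = np.ravel(x)
    y = np.ravel(y)
    # Columns are x**i * y**j for i in 0..kx and j in 0..ky (i varies slowest).
    V = P.polyvander2d(x, y, [kx, ky])
    c, *_ = np.linalg.lstsq(V, np.ravel(z), rcond=None)
    # Shape (kx+1, ky+1), the layout expected by polyval2d.
    return c.reshape(kx + 1, ky + 1)

# Usage with unequal orders on synthetic, noise-free data.
xg, yg = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 15))
zg = 1 + 2 * xg + 3 * xg**2 * yg
coef = polyfit2d_vander(xg, yg, zg, kx=2, ky=1)
z_fit = P.polyval2d(xg, yg, coef)
```

Because the coefficient array is shaped (kx+1, ky+1) and indexed as coef[i, j] for x**i * y**j, it works for kx != ky and can be passed straight to polyval2d.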

Python: heat density plot in a disk

My goal is to make a density heat map of a sphere in 2D. The plotting code further below works when I use rectangular domains, but I am trying to use it for a circular domain. The radius of the sphere is 1. The code I have so far is:
from pylab import *
import numpy as np
from matplotlib.colors import LightSource
from numpy.polynomial.legendre import leggauss, legval
xi = 0.0
xf = 1.0
numx = 500
yi = 0.0
yf = 1.0
numy = 500
def f(x):
    if 0 <= x <= 1:
        return 100
    if -1 <= x <= 0:
        return 0
deg = 1000
xx, w = leggauss(deg)
L = np.polynomial.legendre.legval(xx, np.identity(deg))
integral = (L * (f(x) * w)[None,:]).sum(axis = 1)
c = (np.arange(1, 500) + 0.5) * integral[1:500]
def r(x, y):
    return np.sqrt(x ** 2 + y ** 2)
theta = np.arctan2(y, x)
x, y = np.linspace(0, 1, 500000)
def T(x, y):
    return (sum(r(x, y) ** l * c[:,None] *
                np.polynomial.legendre.legval(xx, identity(deg)) for l in range(1, 500)))
T(x, y) should equal the sum over l of the coefficient c times the radius (as a function of x and y) to the power l times the Legendre polynomial of degree l, where the argument of the Legendre polynomial is cos(theta).
In "python: integrating a piecewise function", I learned how to use the Legendre polynomials in a summation, but that method is slightly different, and for the plotting I need a function T(x, y).
This is the plotting code.
densityinterpolation = 'bilinear'
densitycolormap = cm.jet
densityshadedflag = False
densitybarflag = True
gridflag = True
plotfilename = 'laplacesphere.eps'
x = arange(xi, xf, (xf - xi) / (numx - 1))
y = arange(yi, yf, (yf - yi) / (numy - 1))
X, Y = meshgrid(x, y)
z = T(X, Y)
if densityshadedflag:
    ls = LightSource(azdeg = 120, altdeg = 65)
    rgb = ls.shade(z, densitycolormap)
    im = imshow(rgb, extent = [xi, xf, yi, yf], cmap = densitycolormap)
else:
    im = imshow(z, extent = [xi, xf, yi, yf], cmap = densitycolormap)
    im.set_interpolation(densityinterpolation)
    if densitybarflag:
        colorbar(im)
grid(gridflag)
show()
I made the plot in Mathematica for reference of what my end goal is

If you set the values outside of the disk domain (or whichever domain you want) to float('nan'), those points will be ignored when plotting (leaving them in white color).
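
A minimal sketch of that masking step, assuming a unit disk centred at the origin and a placeholder field Z (the expression used for Z is invented for illustration and is not the Legendre series from the question):

```python
import numpy as np

# Square grid that encloses the unit disk.
x = np.linspace(-1.0, 1.0, 201)
y = np.linspace(-1.0, 1.0, 201)
X, Y = np.meshgrid(x, y)

# Placeholder field (an assumption): any function of X and Y would do here.
Z = X**2 - Y**2

# Blank out everything outside the disk of radius 1.
outside = X**2 + Y**2 > 1.0
Z_masked = np.where(outside, np.nan, Z)
```

Passing Z_masked to imshow, for example imshow(Z_masked, extent=[-1, 1, -1, 1], origin='lower'), leaves the NaN cells undrawn so that only the disk is coloured.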

Line fitting below points

I have a set of x, y points and I'd like to find the line of best fit such that the line is below all points using SciPy. I'm trying to use leastsq for this, but I'm unsure how to adjust the line to be below all points instead of the line of best fit. The coefficients for the line of best fit can be produced via:
import numpy as np
from scipy.optimize import leastsq

def linreg(x, y):
    fit = lambda params, x: params[0] * x - params[1]
    err = lambda p, x, y: (y - fit(p, x))**2
    # initial slope/intercept
    init_p = np.array((1, 0))
    p, _ = leastsq(err, init_p.copy(), args=(x, y))
    return p

xs = np.array([1, 2, 3, 4, 5])
ys = np.array([10, 20, 30, 40, 50])
print(linreg(xs, ys))
The output is the coefficients for the line of best fit:
array([ 9.99999997e+00, -1.68071668e-15])
How can I get the coefficients of the line of best fit that is below all points?

A possible algorithm is as follows:
Move the axes to have all the data on the positive half of the x axis.
If the fit is of the form y = a * x + b, then for a given b the best fit for a will be the minimum of the slopes joining the point (0, b) with each of the (x, y) points.
You can then calculate a fit error, which is a function of only b, and use scipy.optimize.minimize to find the best value for b.
All that's left is computing a for that b and calculating b for the original position of the axes.
The following does that most of the time, except when the minimization fails with some mysterious error:
from __future__ import division
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
def fit_below(x, y) :
    idx = np.argsort(x)
    x = x[idx]
    y = y[idx]
    x0, y0 = x[0] - 1, y[0]
    x -= x0
    y -= y0
    def error_function_2(b, x, y) :
        a = np.min((y - b) / x)
        return np.sum((y - a * x - b)**2)
    b = scipy.optimize.minimize(error_function_2, [0], args=(x, y)).x[0]
    a = np.min((y - b) / x)
    return a, b - a * x0 + y0
x = np.arange(10).astype(float)
y = x * 2 + 3 + 3 * np.random.rand(len(x))
a, b = fit_below(x, y)
plt.plot(x, y, 'o')
plt.plot(x, a*x + b, '-')
plt.show()
And as TheodrosZelleke wisely predicted, it goes through two points that are part of the convex hull:
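
A different way to pose the same problem, swapped in here as an alternative to the algorithm above, is constrained least squares: minimise the sum of squared residuals subject to the line not exceeding any data point. The sketch below uses scipy.optimize.minimize with the SLSQP method; the synthetic data, the seed, and the helper name sse are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2 * x + 3 + 3 * rng.random(x.size)

def sse(params):
    """Sum of squared residuals of the line y = a*x + b."""
    a, b = params
    return np.sum((y - a * x - b) ** 2)

# Inequality constraints: y_i - (a*x_i + b) >= 0 for every point.
constraints = {"type": "ineq", "fun": lambda p: y - (p[0] * x + p[1])}

# Start from the ordinary least-squares line, shifted down until it is feasible.
a0, b0 = np.polyfit(x, y, 1)
b0 -= np.max((a0 * x + b0) - y)

res = minimize(sse, [a0, b0], method="SLSQP", constraints=constraints)
a, b = res.x
```

The solver satisfies the constraints only up to a numerical tolerance, so the fitted line may sit above a point by a tiny amount; subtracting a small margin from b afterwards is one way to guarantee strict feasibility.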
