Use UnivariateSpline to fit data tightly - python

I have a bunch of x, y points that represent a sigmoidal function:
x=[ 1.00094909 1.08787635 1.17481363 1.2617564 1.34867881 1.43562284
1.52259341 1.609522 1.69631283 1.78276102 1.86426648 1.92896789
1.9464453 1.94941586 2.00062852 2.073691 2.14982808 2.22808316
2.30634034 2.38456905 2.46280126 2.54106611 2.6193345 2.69748825]
y=[-0.10057627 -0.10172142 -0.10320428 -0.10378959 -0.10348456 -0.10312503
-0.10276956 -0.10170055 -0.09778279 -0.08608644 -0.05797392 0.00063599
0.08732999 0.16429878 0.2223306 0.25368884 0.26830932 0.27313931
0.27308756 0.27048902 0.26626313 0.26139534 0.25634544 0.2509893 ]
I use scipy.interpolate.UnivariateSpline() to fit to some cubic spline as follows:
from scipy.interpolate import UnivariateSpline
s = UnivariateSpline(x, y, k=3, s=0)
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x,y)
plt.plot(xfit, s(xfit))
plt.show()
This is what I get:
Since I specify s=0, the spline adheres completely to the data, but there are too many wiggles. Using a higher k value leads to even more wiggles.
So my questions are --
How should I correctly use scipy.interpolate.UnivariateSpline() to fit my data? More precisely, how do I make the spline minimise its wiggling?
Is this even the correct choice for this kind of a sigmoidal function? Should I be using something like scipy.optimize.curve_fit() with a trial tanh(x) function instead?

There are several options, I list a few below. The last one seems to give the best output. Whether you should use a spline or an actual function depends on what you want to do with the output; I list two analytical functions below that could be used but I don't know in which context the data were derived so it is hard to find the best one for you.
You can play with s, e.g. for s=0.005, the plot looks like this (still not extremely pretty but you could further adjust):
But I would indeed use a "proper" function and fit using e.g. curve_fit. The function below is still not ideal as it is monotonically increasing, so we miss the decrease at the end; the plot looks as follows:
This is the entire code, for both the spline and the actual fit:
from scipy.interpolate import UnivariateSpline
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def func(x, ymax, n, k, c):
return ymax * x ** n / (k ** n + x ** n) + c
x=np.array([ 1.00094909, 1.08787635, 1.17481363, 1.2617564, 1.34867881, 1.43562284,
1.52259341, 1.609522, 1.69631283, 1.78276102, 1.86426648, 1.92896789,
1.9464453, 1.94941586, 2.00062852, 2.073691, 2.14982808, 2.22808316,
2.30634034, 2.38456905, 2.46280126, 2.54106611, 2.6193345, 2.69748825])
y=np.array([-0.10057627, -0.10172142, -0.10320428, -0.10378959, -0.10348456, -0.10312503,
-0.10276956, -0.10170055, -0.09778279, -0.08608644, -0.05797392, 0.00063599,
0.08732999, 0.16429878, 0.2223306, 0.25368884, 0.26830932, 0.27313931,
0.27308756, 0.27048902, 0.26626313, 0.26139534, 0.25634544, 0.2509893 ])
popt, pcov = curve_fit(func, x, y, p0=[y.max(), 2, 2, -0.1], bounds=([0, 0, 0, -0.2], [0.4, 45, 2000, 10]))
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y)
plt.plot(xfit, func(xfit, *popt))
plt.show()
s = UnivariateSpline(x, y, k=3, s=0.005)
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y)
plt.plot(xfit, s(xfit))
plt.show()
A third option is to use a more advanced function that can also reproduce the decrease at the end and differential_evolution for the fit; that seems to give the best fit:
The code is as follows (using the same data as above):
from scipy.optimize import curve_fit, differential_evolution
def sigmoid_with_decay(x, a, b, c, d, e, f):
return a * (1. / (1. + np.exp(-b * (x - c)))) * (1. / (1. + np.exp(d * (x - e)))) + f
def error_sigmoid_with_decay(parameters, x_data, y_data):
return np.sum((y_data - sigmoid_with_decay(x_data, *parameters)) ** 2)
res = differential_evolution(error_sigmoid_with_decay,
bounds=[(0, 10), (0, 25), (0, 10), (0, 10), (0, 10), (-1, 0.1)],
args=(x, y),
seed=42)
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y)
plt.plot(xfit, sigmoid_with_decay(xfit, *res.x))
plt.show()
The fit is quite sensitive regarding the bounds, so be careful when you play with that...

This illustrates the result of fitting two halves of the data to different functions, the lower half to all data with X < 2.0 and the upper half to all data with X >= 1.9, so that there is overlap in the data for the fitted curves. The code switches from one equation to another at the center of the overlap region, X = 1.95.
import numpy, matplotlib
import matplotlib.pyplot as plt
xData=numpy.array([ 1.00094909, 1.08787635, 1.17481363, 1.2617564, 1.34867881, 1.43562284,
1.52259341, 1.609522, 1.69631283, 1.78276102, 1.86426648, 1.92896789,
1.9464453, 1.94941586, 2.00062852, 2.073691, 2.14982808, 2.22808316,
2.30634034, 2.38456905, 2.46280126, 2.54106611, 2.6193345, 2.69748825])
yData=numpy.array([-0.10057627, -0.10172142, -0.10320428, -0.10378959, -0.10348456, -0.10312503,
-0.10276956, -0.10170055, -0.09778279, -0.08608644, -0.05797392, 0.00063599,
0.08732999, 0.16429878, 0.2223306, 0.25368884, 0.26830932, 0.27313931,
0.27308756, 0.27048902, 0.26626313, 0.26139534, 0.25634544, 0.2509893 ])
# function for x < 1.95 (fitted up to 2.0 for overlap)
def lowerFunc(x_in): # Bleasdale-Nelder Power With Offset
# coefficients
a = -1.1431476643503597E+03
b = 3.3819340844164983E+21
c = -6.3633178925040745E+01
d = 3.1481973843740194E+00
Offset = -1.0300724909782859E-01
temp = numpy.power(a + b * numpy.power(x_in, c), -1.0 / d)
temp += Offset
return temp
# function for x >= 1.95 (fitted down to 1.9 for overlap)
def upperFunc(x_in): # rational equation with Offset
# coefficients
a = -2.5294212380048242E-01
b = 1.4262697377369586E+00
c = -2.6141935706529118E-01
d = -8.8730045918252121E-02
Offset = -4.8283287597672708E-01
temp = (a * numpy.power(x_in, 2) + b * numpy.log(x_in)) # numerator
temp /= (1.0 + c * numpy.power(numpy.log(x_in), -1) + d * numpy.exp(x_in)) # denominator
temp += Offset
return temp
def combinedFunc(x_in):
returnVal = []
for x in x_in:
if x < 1.95:
returnVal.append(lowerFunc(x))
else:
returnVal.append(upperFunc(x))
return returnVal
modelPredictions = combinedFunc(xData)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
axes = f.add_subplot(111)
# first the raw data as a scatter plot
axes.plot(xData, yData, 'D')
# create data for the fitted equation plot
xModel = numpy.linspace(min(xData), max(xData))
yModel = combinedFunc(xModel)
# now the model as a line plot
axes.plot(xModel, yModel)
axes.set_xlabel('X Data') # X axis data label
axes.set_ylabel('Y Data') # Y axis data label
plt.show()
plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

Related

Interpolating non-uniformly distributed points on a 3D sphere

I have several points on the unit sphere that are distributed according to the algorithm described in https://www.cmu.edu/biolphys/deserno/pdf/sphere_equi.pdf (and implemented in the code below). On each of these points, I have a value that in my particular case represents 1 minus a small error. The errors are in [0, 0.1] if this is important, so my values are in [0.9, 1].
Sadly, computing the errors is a costly process and I cannot do this for as many points as I want. Still, I want my plots to look like I am plotting something "continuous".
So I want to fit an interpolation function to my data, to be able to sample as many points as I want.
After a little bit of research I found scipy.interpolate.SmoothSphereBivariateSpline which seems to do exactly what I want. But I cannot make it work properly.
Question: what can I use to interpolate (spline, linear interpolation, anything would be fine for the moment) my data on the unit sphere? An answer can be either "you misused scipy.interpolation, here is the correct way to do this" or "this other function is better suited to your problem".
Sample code that should be executable with numpy and scipy installed:
import typing as ty
import numpy
import scipy.interpolate
def get_equidistant_points(N: int) -> ty.List[numpy.ndarray]:
"""Generate approximately n points evenly distributed accros the 3-d sphere.
This function tries to find approximately n points (might be a little less
or more) that are evenly distributed accros the 3-dimensional unit sphere.
The algorithm used is described in
https://www.cmu.edu/biolphys/deserno/pdf/sphere_equi.pdf.
"""
# Unit sphere
r = 1
points: ty.List[numpy.ndarray] = list()
a = 4 * numpy.pi * r ** 2 / N
d = numpy.sqrt(a)
m_v = int(numpy.round(numpy.pi / d))
d_v = numpy.pi / m_v
d_phi = a / d_v
for m in range(m_v):
v = numpy.pi * (m + 0.5) / m_v
m_phi = int(numpy.round(2 * numpy.pi * numpy.sin(v) / d_phi))
for n in range(m_phi):
phi = 2 * numpy.pi * n / m_phi
points.append(
numpy.array(
[
numpy.sin(v) * numpy.cos(phi),
numpy.sin(v) * numpy.sin(phi),
numpy.cos(v),
]
)
)
return points
def cartesian2spherical(x: float, y: float, z: float) -> numpy.ndarray:
r = numpy.linalg.norm([x, y, z])
theta = numpy.arccos(z / r)
phi = numpy.arctan2(y, x)
return numpy.array([r, theta, phi])
n = 100
points = get_equidistant_points(n)
# Random here, but costly in real life.
errors = numpy.random.rand(len(points)) / 10
# Change everything to spherical to use the interpolator from scipy.
ideal_spherical_points = numpy.array([cartesian2spherical(*point) for point in points])
r_interp = 1 - errors
theta_interp = ideal_spherical_points[:, 1]
phi_interp = ideal_spherical_points[:, 2]
# Change phi coordinate from [-pi, pi] to [0, 2pi] to please scipy.
phi_interp[phi_interp < 0] += 2 * numpy.pi
# Create the interpolator.
interpolator = scipy.interpolate.SmoothSphereBivariateSpline(
theta_interp, phi_interp, r_interp
)
# Creating the finer theta and phi values for the final plot
theta = numpy.linspace(0, numpy.pi, 100, endpoint=True)
phi = numpy.linspace(0, numpy.pi * 2, 100, endpoint=True)
# Creating the coordinate grid for the unit sphere.
X = numpy.outer(numpy.sin(theta), numpy.cos(phi))
Y = numpy.outer(numpy.sin(theta), numpy.sin(phi))
Z = numpy.outer(numpy.cos(theta), numpy.ones(100))
thetas, phis = numpy.meshgrid(theta, phi)
heatmap = interpolator(thetas, phis)
Issue with the code above:
With the code as-is, I have a
ValueError: The required storage space exceeds the available storage space: nxest or nyest too small, or s too small. The weighted least-squares spline corresponds to the current set of knots.
that is raised when initialising the interpolator instance.
The issue above seems to say that I should change the value of s that is one on the parameters of scipy.interpolate.SmoothSphereBivariateSpline. I tested different values of s ranging from 0.0001 to 100000, the code above always raise, either the exception described above or:
ValueError: Error code returned by bispev: 10
Edit: I am including my findings here. They can't really be considered as a solution, that is why I am editing and not posting as an answer.
With more research I found this question Using Radial Basis Functions to Interpolate a Function on a Sphere. The author has exactly the same problem as me and use a different interpolator: scipy.interpolate.Rbf. I changed the above code by replacing the interpolator and plotting:
# Create the interpolator.
interpolator = scipy.interpolate.Rbf(theta_interp, phi_interp, r_interp)
# Creating the finer theta and phi values for the final plot
plot_points = 100
theta = numpy.linspace(0, numpy.pi, plot_points, endpoint=True)
phi = numpy.linspace(0, numpy.pi * 2, plot_points, endpoint=True)
# Creating the coordinate grid for the unit sphere.
X = numpy.outer(numpy.sin(theta), numpy.cos(phi))
Y = numpy.outer(numpy.sin(theta), numpy.sin(phi))
Z = numpy.outer(numpy.cos(theta), numpy.ones(plot_points))
thetas, phis = numpy.meshgrid(theta, phi)
heatmap = interpolator(thetas, phis)
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import cm
colormap = cm.inferno
normaliser = mpl.colors.Normalize(vmin=numpy.min(heatmap), vmax=1)
scalar_mappable = cm.ScalarMappable(cmap=colormap, norm=normaliser)
scalar_mappable.set_array([])
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(
X,
Y,
Z,
facecolors=colormap(normaliser(heatmap)),
alpha=0.7,
cmap=colormap,
)
plt.colorbar(scalar_mappable)
plt.show()
This code runs smoothly and gives the following result:
The interpolation seems OK except on one line that is discontinuous, just like in the question that led me to this class. One of the answer give the idea of using a different distance, more adapted the the spherical coordinates: the Haversine distance.
def haversine(x1, x2):
theta1, phi1 = x1
theta2, phi2 = x2
return 2 * numpy.arcsin(
numpy.sqrt(
numpy.sin((theta2 - theta1) / 2) ** 2
+ numpy.cos(theta1) * numpy.cos(theta2) * numpy.sin((phi2 - phi1) / 2) ** 2
)
)
# Create the interpolator.
interpolator = scipy.interpolate.Rbf(theta_interp, phi_interp, r_interp, norm=haversine)
which, when executed, gives a warning:
LinAlgWarning: Ill-conditioned matrix (rcond=1.33262e-19): result may not be accurate.
self.nodes = linalg.solve(self.A, self.di)
and a result that is not at all the one expected: the interpolated function have values that may go up to -1 which is clearly wrong.
You can use Cartesian coordinate instead of Spherical coordinate.
The default norm parameter ('euclidean') used by Rbf is sufficient
# interpolation
x, y, z = numpy.array(points).T
interpolator = scipy.interpolate.Rbf(x, y, z, r_interp)
# predict
heatmap = interpolator(X, Y, Z)
Here the result:
ax.plot_surface(
X, Y, Z,
rstride=1, cstride=1,
# or rcount=50, ccount=50,
facecolors=colormap(normaliser(heatmap)),
cmap=colormap,
alpha=0.7, shade=False
)
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
You can also use a cosine distance if you want (norm parameter):
def cosine(XA, XB):
if XA.ndim == 1:
XA = numpy.expand_dims(XA, axis=0)
if XB.ndim == 1:
XB = numpy.expand_dims(XB, axis=0)
return scipy.spatial.distance.cosine(XA, XB)
In order to better see the differences,
I stacked the two images, substracted them and inverted the layer.

Scipy curve_fit gives wrong answer

I have an oscillating data as shown in the below figure and want to fit a sine curve to it. However, my result is not correct.
The function that I want to fit to this curve is:
def radius (z,phi, a0, k0,):
Z = z.reshape(z.shape[0],1)
k = np.array([k0,])
a = np.array([a0,])
r0 = 110
rs = r0 + np.sum(a*np.sin(k*Z +phi), axis=1)
return rs
a correct solution could look like this:
r_fit = radius(z, phi=np.pi/.8, a0=10,k0=0.017)
plt.plot(z, r, label='data')
plt.plot(z, r_fit, label='fitted curve')
plt.legend()
My result however from fitting the curve looks:
from scipy.optimize import curve_fit
popt, pcov = curve_fit(radius, xdata=z, ydata=r)
r_fit = radius(z, *popt)
plt.plot(z, r, label='data')
plt.plot(z, r_fit, label='fitted curve')
plt.legend()
My data is also as follow:
r = np.array([100.09061214, 100.17932773, 100.45526772, 102.27891728,
113.12440802, 119.30644014, 119.86570527, 119.75184665,
117.12160143, 101.55081608, 100.07280857, 100.12880236,
100.39251753, 103.05404178, 117.15257288, 119.74048706,
119.86955437, 119.37452005, 112.83384329, 101.0507198 ,
100.05521567])
z = np.array([-407.90074345, -360.38004677, -312.99221012, -266.36934609,
-224.36240585, -188.55933945, -155.21242348, -122.02778866,
-87.84335638, -47.0274899 , 0. , 47.54559191,
94.97469981, 141.33801462, 181.59490575, 215.77219256,
248.95956379, 282.28027286, 318.16440024, 360.7246922 ,
407.940799 ])
since my function simply represents a Fourier series, I also tried scipy.fftpack.fft(r) but I couldn't reproduce a close signal to that of which I have calculated the fft.
Here is a graphical Python fitter with a sine equation and your data using the scipy.optimize Differential Evolution genetic algorithm module to determine initial parameter estimates for curve_fit's non-linear solver. That scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space requiring bounds within which to search. In this example those bounds are taken from the data maximum and minimum values.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
r = numpy.array([100.09061214, 100.17932773, 100.45526772, 102.27891728,
113.12440802, 119.30644014, 119.86570527, 119.75184665,
117.12160143, 101.55081608, 100.07280857, 100.12880236,
100.39251753, 103.05404178, 117.15257288, 119.74048706,
119.86955437, 119.37452005, 112.83384329, 101.0507198 ,
100.05521567])
z = numpy.array([-407.90074345, -360.38004677, -312.99221012, -266.36934609,
-224.36240585, -188.55933945, -155.21242348, -122.02778866,
-87.84335638, -47.0274899 , 0. , 47.54559191,
94.97469981, 141.33801462, 181.59490575, 215.77219256,
248.95956379, 282.28027286, 318.16440024, 360.7246922 ,
407.940799 ])
# rename data to match previous example code
xData = z
yData = r
def func (x, amplitude, center, width, offset): # equation sine[radians] + offset from zunzun.com
return amplitude * numpy.sin(numpy.pi * (x - center) / width) + offset
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
val = func(xData, *parameterTuple)
return numpy.sum((yData - val) ** 2.0)
def generate_Initial_Parameters():
# min and max used for bounds
maxX = max(xData)
minX = min(xData)
maxY = max(yData)
minY = min(yData)
diffY = maxY - minY
diffX = maxX - minX
parameterBounds = []
parameterBounds.append([0.0, diffY]) # search bounds for amplitude
parameterBounds.append([minX, maxX]) # search bounds for center
parameterBounds.append([0.0, diffX]) # search bounds for width
parameterBounds.append([minY, maxY]) # search bounds for offset
# "seed" the numpy random number generator for repeatable results
result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
return result.x
# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are aoutside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
axes = f.add_subplot(111)
# first the raw data as a scatter plot
axes.plot(xData, yData, 'D')
# create data for the fitted equation plot
xModel = numpy.linspace(min(xData), max(xData))
yModel = func(xModel, *fittedParameters)
# now the model as a line plot
axes.plot(xModel, yModel)
axes.set_xlabel('X Data') # X axis data label
axes.set_ylabel('Y Data') # Y axis data label
plt.show()
plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
The problem is that without providing an initial guess, the solution is not able to converge. Try adding a sensible initial guess:
p0 = [np.pi/.8, 10, 0.017]
popt, pcov = curve_fit(radius, xdata=z, ydata=r, p0=p0)
Note that if you were to use one of the other methods such as trf or dogbox then without the initial guess this would be more likely to return a RunTime error due to the parameters not being able to converge.

Gaussian fit for python with confidence interval

I'd like to make a Gaussian Fit for some data that has a rough gaussian fit. I'd like the information of data peak (A), center position (mu), and standard deviation (sigma), along with 95% confidence intervals for these values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import norm
# gaussian function
def gaussian_func(x, A, mu, sigma):
return A * np.exp( - (x - mu)**2 / (2 * sigma**2))
# generate toy data
x = np.arange(50)
y = [ 97.04421053, 96.53052632, 96.85684211, 96.33894737, 96.85052632,
96.30526316, 96.87789474, 96.75157895, 97.05052632, 96.73473684,
96.46736842, 96.23368421, 96.22526316, 96.11789474, 96.41263158,
96.32631579, 96.33684211, 96.44421053, 96.48421053, 96.49894737,
97.30105263, 98.58315789, 100.07368421, 101.43578947, 101.92210526,
102.26736842, 101.80421053, 101.91157895, 102.07368421, 102.02105263,
101.35578947, 99.83578947, 98.28, 96.98315789, 96.61473684,
96.82947368, 97.09263158, 96.82105263, 96.24210526, 95.95578947,
95.84210526, 95.67157895, 95.83157895, 95.37894737, 95.25473684,
95.32842105, 95.45684211, 95.31578947, 95.42526316, 95.30526316]
plt.scatter(x,y)
# initial_guess_of_parameters
# この値はソルバーとかで求めましょう.
parameter_initial = np.array([652, 2.9, 1.3])
# estimate optimal parameter & parameter covariance
popt, pcov = curve_fit(gaussian_func, x, y, p0=parameter_initial)
# plot result
xd = np.arange(x.min(), x.max(), 0.01)
estimated_curve = gaussian_func(xd, popt[0], popt[1], popt[2])
plt.plot(xd, estimated_curve, label="Estimated curve", color="r")
plt.legend()
plt.savefig("gaussian_fitting.png")
plt.show()
# estimate standard Error
StdE = np.sqrt(np.diag(pcov))
# estimate 95% confidence interval
alpha=0.025
lwCI = popt + norm.ppf(q=alpha)*StdE
upCI = popt + norm.ppf(q=1-alpha)*StdE
# print result
mat = np.vstack((popt,StdE, lwCI, upCI)).T
df=pd.DataFrame(mat,index=("A", "mu", "sigma"),
columns=("Estimate", "Std. Error", "lwCI", "upCI"))
print(df)
Data Plot with Fitted Curve
The data peak and center position seems correct, but the standard deviation is off. Any input is greatly appreciated.
Your scatter indeed looks similar to a gaussian distribution, but it is not centered around zero. Given the specifics of the Gaussian function it will therefor be hard to nicely fit a Gaussian distribution to the data the way you gave us. I would therefor propose by starting with demeaning the x series:
x = np.arange(0, 50) - 24.5
Next I would add one additional parameter to your gaussian function, the offset. Since the regular Gaussian function will always have its tails close to zero it is impossible to otherwise nicely fit your scatterplot:
def gaussian_function(x, A, mu, sigma, offset):
return A * np.exp(-np.power((x - mu)/sigma, 2.)/2.) + offset
Next you should define an error_loss_function to minimise:
def error_loss_function(params):
gaussian = gaussian_function(x, params[0], params[1], params[2], params[3])
errors = gaussian - y
return sum(np.power(errors, 2)) # You can also pick a different error loss function!
All that remains is fitting our curve now:
fit = scipy.optimize.minimize(fun=error_loss_function, x0=[2, 0, 0.2, 97])
params = fit.x # A: 6.57592661, mu: 1.95248855, sigma: 3.93230503, offset: 96.12570778
xd = np.arange(x.min(), x.max(), 0.01)
estimated_curve = gaussian_function(xd, params[0], params[1], params[2], params[3])
plt.plot(xd, estimated_curve, label="Estimated curve", color="b")
plt.legend()
plt.show(block=False)
Hopefully this helps. Looks like a fun project, let me know if my answer is not clear.

Why does Python's curve_fit not finish the optimization?

I need to find two parameters of an equation that best fit the given values of x and y.
I'm using Python 3, with Numpy and Scipy.
from scipy.optimize import curve_fit
def func(dx, d50, p):
return (1 / (1 + ((d50 / dx) ** p)))
xdata = [280, 150, 75, 45, 38, 20, 10, 5.1, 2.6]
ydata = [99.57592773, 95.53773499, 81.14313507, 67.08183289, 62.93716431, 49.961483, 37.80876923, 24.53152657, 13.2219696]
# curve fit:
popt, pcov = curve_fit(func, xdata, ydata)
print(popt)
I expect a d50 ~ 20 and a p > 0.
But Python send to me:
[0.00221498 1.60291553]
> /usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:4:
> RuntimeWarning: invalid value encountered in power
after removing the cwd from sys.path.
I was unable to obtain a good fit to your data using the equation in your post. My equation search found that a standard Weibull peak equation, "a * exp(-0.5 * pow(log(x/b) / c, 2.0))", gave RMSE= 1.619 and R-squared = 0.997 for parameters a = 103.1533969, b = 498.93546398 and c = 2.67321918 as shown below. I have included a Python graphical fitter using this equation and the standard scipy differential_evolution genetic algorithm module to find initial parameter estimates for curve_fit(), this scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space and that algorithm requires bounds within which to search. In this example, the search bounds are derived from the data. It is much easier to determine ranges for the initial parameter estimates than to find specific values.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
xData = [280, 150, 75, 45, 38, 20, 10, 5.1, 2.6]
yData = [99.57592773, 95.53773499, 81.14313507, 67.08183289, 62.93716431, 49.961483, 37.80876923, 24.53152657, 13.2219696]
def func(x, a, b, c): # Peak_WeibullPeak_model from zunzun.com
return a * numpy.exp(-0.5 * numpy.power(numpy.log(x/b) / c, 2.0))
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
val = func(xData, *parameterTuple)
return numpy.sum((yData - val) ** 2.0)
def generate_Initial_Parameters():
# min and max used for bounds
maxX = max(xData)
minX = min(xData)
maxY = max(yData)
minY = min(yData)
minData = min(minX, minY)
maxData = max(maxY, maxX)
parameterBounds = []
parameterBounds.append([minData, maxData]) # search bounds for a
parameterBounds.append([minData, maxData]) # search bounds for b
parameterBounds.append([minData, maxData]) # search bounds for c
# "seed" the numpy random number generator for repeatable results
result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
return result.x
# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are aoutside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
axes = f.add_subplot(111)
# first the raw data as a scatter plot
axes.plot(xData, yData, 'D')
# create data for the fitted equation plot
xModel = numpy.linspace(min(xData), max(xData))
yModel = func(xModel, *fittedParameters)
# now the model as a line plot
axes.plot(xModel, yModel)
axes.set_xlabel('X Data') # X axis data label
axes.set_ylabel('Y Data') # Y axis data label
plt.show()
plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

python curve_fitting with bad results

the link of data from dropboxbadfittingI tried use the curve_fit to fit the data with my pre_defined function in python, but the result was far to perfect. The code is simple and shown as below. I have no idea what's wrong.
Since I am new to python, are there any other optimization or fitting methods which are suitable for my case with predefined function?
Thanks in advance!
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, r1, r2, r3,l,c):
w=2*math.pi*x
m=r1+(r2*l*w)/(r2**2+l**2*w**2)+r3/(1+r3*c**2*w**2)
n=(r2**2*l*w)/(r2**2+l**2*w**2)-r3**3*c*w/(1+r3*c**2*w**2)
y= (m**2+n**2)**.5
return y
def readdata(filename):
x = filename.readlines()
x = list(map(lambda s: s.strip(), x))
x = list(map(float, x))
return x
# test data
f_x= open(r'C:\Users\adm\Desktop\simpletry\fre.txt')
xdata = readdata(f_x)
f_y= open(r'C:\Users\adm\Desktop\simpletry\impedance.txt')
ydata = readdata(f_y)
xdata = np.array(xdata)
ydata = np.array(ydata)
plt.semilogx(xdata, ydata, 'b-', label='data')
popt, pcov = curve_fit(func, xdata, ydata, bounds=((0, 0, 0, 0, 0), (np.inf, np.inf, np.inf, np.inf, np.inf)))
plt.semilogx(xdata, func(xdata, *popt), 'r-', label='fitted curve')
print(popt)
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
as you guessed, this is a LCR circuit model. now I am trying to fit two curves with the same parameters like
def func1(x, r1, r2, r3,l,c):
w=2*math.pi*x
m=r1+(r2*l*w)/(r2**2+l**2*w**2)+r3/(1+r3*c**2*w**2)
return m
def func2(x, r1, r2, r3,l,c):
w=2*math.pi*x
n=(r2**2*l*w)/(r2**2+l**2*w**2)-r3**3*c*w/(1+r3*c**2*w**2)
return n
is it possible to use the curve_fitting to optimize the parameters?
Here are my results using scipy's differential_evolution genetic algorithm module to generate the initial parameter estimates for curve_fit, along with a simple "brick wall" in the function to ensure all parameters are positive. Scipy's implementation of Differential Evolution uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, which requires bounds within which to search - in this example, those bounds are taken from the data maximum and minimum values. My results:
RMSE: 7.415
R-squared: 0.999995
r1 = 1.16614005e+00
r2 = 2.00000664e+05
r3 = 1.54718886e+01
l = 1.94473531e+04
c = 4.32515535e+05
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
def func(x, r1, r2, r3,l,c):
# "brick wall" ensuring all parameters are positive
if r1 < 0.0 or r2 < 0.0 or r3 < 0.0 or l < 0.0 or c < 0.0:
return 1.0E10 # large value gives large error, curve_fit hits a brick wall
w=2*numpy.pi*x
m=r1+(r2*l*w)/(r2**2+l**2*w**2)+r3/(1+r3*c**2*w**2)
n=(r2**2*l*w)/(r2**2+l**2*w**2)-r3**3*c*w/(1+r3*c**2*w**2)
y= (m**2+n**2)**.5
return y
def readdata(filename):
x = filename.readlines()
x = list(map(lambda s: s.strip(), x))
x = list(map(float, x))
return x
# test data
f_x= open('/home/zunzun/temp/data/fre.txt')
xData = readdata(f_x)
f_y= open('/home/zunzun/temp/data/impedance.txt')
yData = readdata(f_y)
xData = numpy.array(xData)
yData = numpy.array(yData)
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
val = func(xData, *parameterTuple)
return numpy.sum((yData - val) ** 2.0)
def generate_Initial_Parameters():
# min and max used for bounds
maxX = max(xData)
minX = min(xData)
maxY = max(yData)
minY = min(yData)
minBound = min(minX, minY)
maxBound = max(maxX, maxY)
parameterBounds = []
parameterBounds.append([minBound, maxBound]) # search bounds for r1
parameterBounds.append([minBound, maxBound]) # search bounds for r2
parameterBounds.append([minBound, maxBound]) # search bounds for r3
parameterBounds.append([minBound, maxBound]) # search bounds for l
parameterBounds.append([minBound, maxBound]) # search bounds for c
# "seed" the numpy random number generator for repeatable results
result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
return result.x
# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are aoutside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
axes = f.add_subplot(111)
# first the raw data as a scatter plot
plt.semilogx(xData, yData, 'D')
# create data for the fitted equation plot
yModel = func(xData, *fittedParameters)
# now the model as a line plot
plt.semilogx(xData, yModel)
axes.set_xlabel('X Data') # X axis data label
axes.set_ylabel('Y Data') # Y axis data label
plt.show()
plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
To have your Least Squares regression make sense, you'll have to at least supply initial parameters which make sense.
As all parameters are by default initiated to the value 1, the biggest influence on the initial regression will be resistor r1 which adds a constant to the mix.
Most probably you'll end up in something like the following configuration:
popt
Out[241]:
array([1.66581563e+03, 2.43663552e+02, 1.13019744e+00, 1.20233767e+00,
5.04984535e-04])
Which will output a neat-looking flat line, due to m = something big + ~0 + ~0 ; n=~0 - ~0, so y = r1.
However, if you initialize your parameters somewhat differently,
popt, pcov = curve_fit(func, xdata.flatten(), ydata.flatten(), p0=[0.1,1e5,1000,1000,0.2],
bounds=((0, 0, 0, 0, 0), (np.inf, np.inf, np.inf, np.inf, np.inf)))
You will get a better looking fit,
popt
Out[244]:
array([1.14947146e+00, 4.12512324e+05, 1.36182466e+02, 8.29771756e+04,
1.77593448e+03])
((fitted-ydata.flatten())**2).mean()
Out[257]: 0.6099524982664816
#RMSE hence 0.78
P.s. My data starts at the second data point, due to a conversion error with pd.read_clipboard where the first row became headers instead of data. Shouldn't change the overall picture though.

Categories

Resources