Sine curve fitting in Python

I want to fit one bump of a sine curve to this set of data:
xData = np.array([1.7, 8.8, 15, 25, 35, 45, 54.8, 60, 64.7, 70])
yData = np.array([30, 20, 13.2, 6.2, 3.9, 5.2, 10, 14.8, 20, 27.5])
I have successfully fitted a parabola using the scipy.optimize.curve_fit function, but I don't know how to fit a sine curve to the data.
Here's what I did so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.interpolate as inp
xData = np.array([1.7, 8.8, 15, 25, 35, 45, 54.8, 60, 64.7, 70])
yData = np.array([30, 20, 13.2, 6.2, 3.9, 5.2, 10, 14.8, 20, 27.5])
def model_parabola(x, a, b, c):
    return a * (x - b) ** 2 + c

def model_sine(x, amp, omega, phase, c, z):
    return amp * np.sin(omega * (x - z) + phase) + c

poptsin, pcovsine = curve_fit(model_sine, xData, yData,
                              p0=[np.std(yData) * 2 ** 0.5, 2 * np.pi, 0, np.mean(yData), 0])
popt, pcov = curve_fit(model_parabola, xData, yData, p0=[2, 3, 4])
# for parabola
aopt, bopt, copt = popt
xmodel = np.linspace(min(xData), max(xData), 100)
ymodel = model_parabola(xmodel, aopt, bopt, copt)
print(poptsin)
# for sine curve
ampopt, omegaopt, phaseopt, ccopt, zopt = poptsin
xSinModel = np.linspace(min(xData), max(xData), 100)
ySinModel = model_sine(xSinModel, ampopt, omegaopt, phaseopt, ccopt, zopt)
y_fit = model_sine(xSinModel, *poptsin)
plt.scatter(xData, yData)
plt.plot(xmodel, ymodel, 'r-')
plt.plot(xSinModel, ySinModel, 'g-')
plt.show()
And this is the result:

Try the following:
def model_sine(x, amp, omega, phase, offset):
    return amp * np.sin(omega * x + phase) + offset

poptsin, pcovsine = curve_fit(model_sine, xData, yData,
                              p0=[np.max(yData) - np.min(yData), np.pi/70, 3, np.max(yData)],
                              maxfev=5000)
You don't need both phase and z; one should be enough.
I needed to increase the number of allowed function evaluations (maxfev); this would probably not be necessary if the data were fully normalised, although it is already reasonably close to order 1.
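For reference, here is the complete script with the suggested model applied to the data from the question (a minimal sketch; the model and starting values are the ones proposed above):

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

xData = np.array([1.7, 8.8, 15, 25, 35, 45, 54.8, 60, 64.7, 70])
yData = np.array([30, 20, 13.2, 6.2, 3.9, 5.2, 10, 14.8, 20, 27.5])

def model_sine(x, amp, omega, phase, offset):
    return amp * np.sin(omega * x + phase) + offset

# initial guesses: amplitude from the data range, roughly half a period over the x range
poptsin, pcovsine = curve_fit(
    model_sine, xData, yData,
    p0=[np.max(yData) - np.min(yData), np.pi / 70, 3, np.max(yData)],
    maxfev=5000)

xSinModel = np.linspace(min(xData), max(xData), 100)
plt.scatter(xData, yData)
plt.plot(xSinModel, model_sine(xSinModel, *poptsin), 'g-')
plt.show()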

The usual fitting method (such as the one used in Python) involves an iterative process starting from "guessed" values of the parameters, which must not be too far from the unknown exact values.
A less usual method exists that is not iterative and does not require initial values to start the calculation. Its application to the sine function is shown below with a numerical example (computation with MathCad).
The method consists of a linear fitting of an integral equation of which the sine function is a solution. For a general explanation see https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales .
Note: if a particular fitting criterion is specified (MLSE, MLSAE, MLSRE or other), nonlinear regression cannot be avoided. In that case, the approximate parameter values found above are very good starting values for the iterative process, which can then be refined with the usual nonlinear regression software.
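For illustration only, a rough Python transcription of the idea (not the author's MathCad worksheet) could look like the sketch below: the data are integrated twice numerically, a linear least-squares fit of y against its own double integral yields omega, and a second linear fit then gives amplitude, phase and offset. With only ten coarse samples the result is approximate, but it can serve as a starting point for curve_fit.

import numpy as np

xData = np.array([1.7, 8.8, 15, 25, 35, 45, 54.8, 60, 64.7, 70])
yData = np.array([30, 20, 13.2, 6.2, 3.9, 5.2, 10, 14.8, 20, 27.5])

def cumtrapz0(y, x):
    # cumulative trapezoidal integral of y(x), starting at 0
    out = np.zeros_like(y, dtype=float)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return out

# A sinusoid obeys y'' = -omega^2 * (y - offset), so after integrating twice
# y is linear in its own double integral: y ~ A*SS + B*x^2 + C*x + D with A = -omega^2.
S = cumtrapz0(yData, xData)
SS = cumtrapz0(S, xData)
M = np.column_stack([SS, xData**2, xData, np.ones_like(xData)])
A = np.linalg.lstsq(M, yData, rcond=None)[0][0]
omega = np.sqrt(-A)          # A should come out negative for sine-like data

# With omega fixed, offset + b*sin(omega*x) + c*cos(omega*x) is linear in (offset, b, c).
N = np.column_stack([np.ones_like(xData), np.sin(omega * xData), np.cos(omega * xData)])
offset, b, c = np.linalg.lstsq(N, yData, rcond=None)[0]
amp, phase = np.hypot(b, c), np.arctan2(c, b)
print(amp, omega, phase, offset)   # y ~ amp*sin(omega*x + phase) + offset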

Related

scipy curve_fit give the initial guess values as optimal results when I set the bounds parameter

I'm trying to fit a curve with a Gaussian plus a Lorentzian function, using the curve_fit function from scipy.
def gaussian(x, a, x0, sig):
    return a * np.exp(-1/2 * (x - x0)**2 / sig**2)

def lorentzian(x, a, b, c):
    return a*c**2/((x-b)**2+c**2)

def decompose(x, z, n, b, *par):
    hb_n = gaussian(x, par[0], 4861.3*(1+z), n)
    hb_b = lorentzian(x, par[1], 4861.3*(1+z), b)
    return hb_b + hb_n
And when I set the p0 parameter, I can get a reasonable result, which fits the curve well.
guess = [0.0001, 2, 10, 3e-16, 3e-16]
p, c = curve_fit(decompose, wave, residual, guess)
[figures: fitted parameters, and the fitted model with the data when only p0 is set]
But if I set the p0 and bounds parameters simultaneously, the curve_fit function gives the initial guess as the final fitting result, which deviates considerably from the data.
guess = [0.0001, 2, 10, 3e-16, 3e-16]
p, c = curve_fit(decompose, wave, residual, guess, bounds=([-0.001, 0, 0, 0, 0], [0.001, 10, 100, 1e-15, 1e-15]))
[figures: fitted parameters, and the fitted model with the data when p0 and bounds are set simultaneously]
I have tried many different combinations of boundaries for the parameters, but the fitting results invariably return the initial guess values. I've been stuck on this problem for a long time. I would be very grateful if anyone could give me some advice on solving it.
This happens due to a combination of the optimization algorithm and its parameters.
From the official documentation:
method{‘lm’, ‘trf’, ‘dogbox’}, optional
Method to use for optimization. See least_squares for more details.
Default is ‘lm’ for unconstrained problems and ‘trf’ if bounds are
provided. The method ‘lm’ won’t work when the number of observations
is less than the number of variables, use ‘trf’ or ‘dogbox’ in this
case.
So when you add bound constraints, curve_fit will use a different optimization algorithm (trust region instead of Levenberg-Marquardt).
To debug the problem you can try to set full_output=True as Warren Weckesser noted in the comments.
In the case of the fit with bounds you will see something similar to:
'nfev': 1
`gtol` termination condition is satisfied.
So the optimization stopped after the first iteration; that's why the found parameters are so similar to the initial guess.
To fix this you can specify a lower gtol parameter. The full list of available parameters can be found here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares
Example:
Code:
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
def gaussian(x, a, x0, sig):
    return a * np.exp(-1 / 2 * (x - x0) ** 2 / sig**2)

def lorentzian(x, a, b, c):
    return a * c**2 / ((x - b) ** 2 + c**2)

def decompose(x, z, n, b, *par):
    hb_n = gaussian(x, par[0], 4861.3 * (1 + z), n)
    hb_b = lorentzian(x, par[1], 4861.3 * (1 + z), b)
    return hb_b + hb_n

gt_parameters = [-2.42688295e-4, 2.3477827, 1.56977708e1, 4.47455820e-16, 2.2193466e-16]
wave = np.linspace(4750, 5000, num=400)
gt_curve = decompose(wave, *gt_parameters)
noisy_curve = gt_curve + np.random.normal(0, 2e-17, size=len(wave))
guess = [0.0001, 2, 10, 3e-16, 3e-16]
bounds = ([-0.001, 0, 0, 0, 0], [0.001, 10, 100, 1e-15, 1e-15])
options = [
    ("Levenberg-Marquardt without bounds", dict(method="lm")),
    ("Trust Region without bounds", dict(method="trf")),
    ("Trust Region with bounds", dict(method="trf", bounds=bounds)),
    (
        "Trust Region with bounds + fixed tolerance",
        dict(method="trf", bounds=bounds, gtol=1e-36),
    ),
]
fig, axs = plt.subplots(len(options))
for (title, fit_params), ax in zip(options, axs):
    ax.set_title(title)
    p, c = curve_fit(decompose, wave, noisy_curve, guess, **fit_params)
    fitted_curve = decompose(wave, *p)
    ax.plot(wave, gt_curve, label="gt_curve")
    ax.plot(wave, noisy_curve, label="noisy")
    ax.plot(wave, fitted_curve, label="fitted_curve")

handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels)
plt.show()
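Continuing from the script above, the full_output debugging mentioned earlier might look like this (a sketch; passing full_output together with bounds requires a reasonably recent SciPy version):

p, c, infodict, mesg, ier = curve_fit(
    decompose, wave, noisy_curve, guess, bounds=bounds, full_output=True)
print(infodict["nfev"])   # 1 means the solver gave up after a single evaluation
print(mesg)               # e.g. "`gtol` termination condition is satisfied."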

Python: Estimate the standard deviation after data fitting

I am trying to fit a data set to the hyperbolic equation using ipython --pylab:
y = ax / (b + x)
Here is my python code:
from scipy import optimize as opti
import numpy as np
from pandas import DataFrame
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8])
y = np.array([0.375, 0.466, 0.509, 0.520, 0.525, 0.536, 0.541])
y_stdev = np.array([0.025, 0.016, 0.009, 0.009, 0.025, 0.019])
def func(x, a, b):
    return a*x / (b + x)

popt, pcov = opti.curve_fit(func, x, y)
print(popt)
print("a = ", popt[0])
print("b = ", popt[1])
The values of a and b should be inside the popt parameter. What I would like to ask is: the values of a and b are inferred when fitting the data set with func(x, a, b), so how can we estimate the standard deviations of a and b?
Thank you.
The answer is in the docs:
pcov : 2d array
The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov))...
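Applied to the fit from the question, that looks roughly like this (a sketch reusing the question's data and model):

import numpy as np
from scipy import optimize as opti

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8])
y = np.array([0.375, 0.466, 0.509, 0.520, 0.525, 0.536, 0.541])

def func(x, a, b):
    return a * x / (b + x)

popt, pcov = opti.curve_fit(func, x, y)
perr = np.sqrt(np.diag(pcov))   # one-standard-deviation errors on (a, b)
print("a =", popt[0], "+/-", perr[0])
print("b =", popt[1], "+/-", perr[1])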

How to fit an int list to a desired function

I have an int list x, like [43, 43, 46, ....., 487, 496, 502] (just for example).
x is a list of word counts; I want to change it into a list of penalty scores for training a text classification model.
I'd like to use a curve function (maybe something like math.log?) to map values from x to y. I need the min value in x (43) to map to y = 0.8 and the max value in x (502) to map to y = 0.08; the other values in x should map to y following the function.
For example:
x = [43, 43, 46, ....., 487, 496, 502]
y_bounds = [0.8, 0.08]
def creat_curve_func(x, y_bounds, curve_shape='log'):
    ...
func = creat_curve_func(x, y)
assert func(43) == 0.8
assert func(502) == 0.08
func(46)
>>> 0.78652 (just a fake result for example)
func(479)
>>> 0.097 (just a fake result for example)
I quickly found that I had to try parameters by myself, again and again, to get a curve function that fits my purpose.
Then I tried to find a library to do this work, and scipy.optimize.curve_fit turned up. But it needs at least three parameters: f (the function I want to generate), xdata, and ydata; I only have xdata and the y bounds (0.8, 0.08).
Is there any good solution?
Update
I thought this was easy to understand, so I didn't include the failing curve_fit code. Is this the reason for the downvote?
The reason why I can't just use curve_fit directly:
x = sorted([43, 43, 46, ....., 487, 496, 502])
y = np.linspace(0.8, 0.08, len(x))  # cannot set y this way; it leads to the wrong result
def func(x, a, b):
    return a * x + b  # I actually want a curve function; linear is just simple to understand here
popt, pcov = curve_fit(func, x, y)
func(42, *popt)
0.47056348146450089  # I want 0.8 here
How about this way?
EDIT: added weights. If you don't need to put your end points exactly on the curve you could use weights:
import scipy.optimize as opti
import numpy as np
xdata = np.array([43, 56, 234, 502], float)
ydata = np.linspace(0.8, 0.08, len(xdata))
weights = np.ones_like(xdata, float)
weights[0] = 0.001
weights[-1] = 0.001
def fun(x, a, b, z):
    return np.log(z/x + a) + b

popt, pcov = opti.curve_fit(fun, xdata, ydata, sigma=weights)
print(fun(xdata, *popt))
>>> [ 0.79999994 ... 0.08000009]
EDIT:
You can also play with these parameters, of course:
import scipy.optimize as opti
import numpy as np
xdata = np.array([43, 56, 234, 502], float)
xdata = np.round(np.sort(np.random.rand(100) * (502-43) + 43))
ydata = np.linspace(0.8, 0.08, len(xdata))
weights = np.ones_like(xdata, float)
weights[0] = 0.00001
weights[-1] = 0.00001
def fun(x, a, b, z):
    return np.log(z/x + a) + b

popt, pcov = opti.curve_fit(fun, xdata, ydata, sigma=weights)
print(fun(xdata, *popt))
>>>[ 0.8 ... 0.08 ]

How to use the "Least square method" in Python

I need to determine the values of the coefficients in my equation. For that I decided to use the least squares method. The equation is presented below:
logt = logt_a + (T - T_a) * (a_0 + a_1*logS + a_2*logS**2)
The equation describes a connection between stress and time to failure of a tested product at different temperature levels. The data I've used is made up, but it has the structure of the actual data that I will use later on.
For better understanding I also included a graphical correlation:
I am fairly new to Python and didn't know there are so many ways/functions available for this method, so I decided to try out a few:
Input data
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from lmfit import minimize, Parameters, fit_report
# data
temp =np.array([650, 700, 750, 720, 680]) # temperature
xdata = np.array([500, 525, 540, 534, 490]) # time
ydata = np.array([330, 332, 315, 325, 335]) # stress
T = temp[0]
plt.plot(xdata,ydata,'*')
plt.xlabel('xdata')
plt.ylabel('ydata')
1. Using the curve_fit function
def func(logS, a_0, a_1, a_2, T_a, logt_a):
    return logt_a + (T - T_a) * (a_0 + a_1 * logS + a_2 * logS**2)
popt, pcov = curve_fit(func, xdata, ydata, p0=(1, 1, 1, 1, 1))
popt
zapis = 'a_0: {0:1.5e}\na_1: {1:1.5e}\na_2: {2:1.5e}\nT_a: {3:1.5e}\nlogt_a: {4:1.5e}'.format(popt[0], popt[1], popt[2], popt[3], popt[4])
print(zapis)
a_0 = popt[0]
a_1 = popt[1]
a_2 = popt[2]
T_a = popt[3]
logt_a = popt[4]
residuals = ydata - func(xdata, a_0, a_1, a_2, T_a, logt_a)
fres = sum(residuals**2)
print(fres)
curvex = np.linspace(np.min(xdata) - np.min(xdata)/10, np.max(xdata) + 50, int(np.max(xdata)/10))
curvey = func(curvex, a_0, a_1, a_2, T_a, logt_a)
plt.plot(xdata,ydata,'*')
plt.plot(curvex,curvey, 'r')
plt.xlabel('xdata')
plt.ylabel('ydata')
2. Using the leastsq function
from scipy.optimize import leastsq
def function(parameters, logS):
    a_0, a_1, a_2, T_a, logt_a = parameters
    model = logt_a + (T - T_a) * (a_0 + a_1 * logS + a_2 * logS**2)
    return model

def objective(pars, t_r, logS):
    err = t_r - function(pars, logS)
    return err

x0 = [1.0, 1.0, 1.0, 1.0, 1.0]  # initial guess of parameters
plsq = leastsq(objective, x0, args=(ydata, xdata))
print('Fitted parameters = {0}'.format(plsq[0]))
plt.plot(xdata, ydata, 'ro')
#plot the fitted curve on top
x = np.linspace(min(xdata), max(xdata), 50)
y = function(plsq[0], x)
plt.plot(x, y, 'k-')
plt.xlabel('x')
plt.ylabel('y')
In both cases I got these results:
a_0: -5.95683e+02
a_1: 2.65405e-02
a_2: -2.63017e-05
T_a: 1.21502e+02
logt_a: 3.11614e+05
Question 1: What is the best way of determining the initial values of the coefficients being sought?
Question 2: Which of the Python methods based on least squares is best for equations like mine?
Question 3: Is there a way to make the process of determining the coefficients more automated? I will also have to try higher-order polynomials, which will lead to more coefficients (a_3, a_4, a_5, ...). The idea would be to specify the order of the polynomial and have everything else calculated and assembled automatically (see the sketch below).
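Regarding Question 3, one possible way to automate the polynomial order (a sketch, not from the original post; make_model is a made-up helper name) is to exploit the fact that curve_fit accepts any callable and infers the number of parameters from p0, so the model can be built for an arbitrary order with np.polyval. It reuses xdata, ydata and T = 650 from the input data above.

import numpy as np
from scipy.optimize import curve_fit

def make_model(T):
    # model: logt = logt_a + (T - T_a) * polynomial(logS); the order is set by the length of p0
    def model(logS, T_a, logt_a, *a):
        return logt_a + (T - T_a) * np.polyval(a[::-1], logS)  # polyval wants the highest power first
    return model

order = 2                          # order 2 reproduces the five-parameter model from the question
p0 = np.ones(2 + order + 1)        # T_a, logt_a, a_0 ... a_order
popt, pcov = curve_fit(make_model(T=650), xdata, ydata, p0=p0)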

how to find 50% point after curve fitting using numpy

I have used numpy in Python to fit my data to a sigmoidal curve. How can I find the value of x at the y = 50% point on the curve after the data has been fitted?
import numpy as np
import pylab
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    y = 1 / (1 + np.exp(-k*(x-x0)))
    return y
xdata = np.array([0.0, 1.0, 3.0, 4.3, 7.0, 8.0, 8.5, 10.0, 12.0])
ydata = np.array([0.01, 0.02, 0.04, 0.11, 0.43, 0.7, 0.89, 0.95, 0.99])
popt, pcov = curve_fit(sigmoid, xdata, ydata)
print(popt)
x = np.linspace(-1, 15, 50)
y = sigmoid(x, *popt)
pylab.plot(xdata, ydata, 'o', label='data')
pylab.plot(x,y, label='fit')
pylab.ylim(0, 1.05)
pylab.legend(loc='best')
pylab.show()
You just need to solve the function you found for y(x) = 0.50. You can use one of the root finding tools of scipy, though these only solve for zero, so you need to give your function an offset:
def sigmoid(x, x0, k, y0=0):
    y = 1 / (1 + np.exp(-k*(x-x0))) + y0
    return y
Then it's just a matter of calling the root finding method of choice:
from scipy.optimize import brentq
a = np.min(xdata)
b = np.max(xdata)
x0, k = popt
y0 = -0.50
solution = brentq(sigmoid, a, b, args=(x0, k, y0)) # = 7.142
In addition to your comment:
My code above uses the original popt that was calculated with your code. If you do the curve fitting with the updated sigmoid function (with the offset), popt will also contain a fitted parameter for y0.
You probably don't want this; you'll want the curve fitted with y0 = 0. This can be done by supplying curve_fit with a guess (p0) of only two values, so that the default value of y0 in the sigmoid function is used:
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0 = (1,1))
Alternatively, just declare two separate sigmoid functions, one with the offset and one without it.
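As a side note (not part of the original answer): this particular sigmoid can also be inverted analytically, so any level can be read straight off the fitted parameters; at y = 0.5 the answer is simply x0. A minimal sketch:

import numpy as np

def sigmoid_inverse(y, x0, k):
    # x such that 1 / (1 + exp(-k*(x - x0))) == y, for 0 < y < 1
    return x0 - np.log(1.0 / y - 1.0) / k

x0, k = popt                           # popt from the two-parameter fit above
print(sigmoid_inverse(0.50, x0, k))    # equals x0 at the 50% level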
