How to fit an int list to a desired function - python

I have an int list x, like [43, 43, 46, ....., 487, 496, 502] (just for example).
x is a list of word counts; I want to turn it into a list of penalty scores for training a text classification model.
I'd like to use a curve function (maybe something like math.log?) to map values from x to y, such that the min value in x (43) maps to y = 0.8, the max value in x (502) maps to y = 0.08, and the other values in x map to y along the curve.
For example:
x = [43, 43, 46, ....., 487, 496, 502]
y_bounds = [0.8, 0.08]
def creat_curve_func(x, y_bounds, curve_shape='log'):
    ...
func = creat_curve_func(x, y_bounds)
assert func(43) == 0.8
assert func(502) == 0.08
func(46)
>>> 0.78652 (just a fake result for example)
func(479)
>>> 0.097 (just a fake result for example)
I quickly found that I had to try parameters by hand, again and again, to get a curve that fit my purpose.
Then I tried to find a library to do this, and scipy.optimize.curve_fit turned up. But it needs at least three arguments: f (the function I want to generate), xdata, and ydata. I only have xdata and the two y bounds (0.8 and 0.08).
Is there any good solution?
update
I thought this was easy to understand, so I didn't include the failing curve_fit code. Is that the reason for the downvote?
The reason why I can't just use curve_fit:
x = sorted([43, 43, 46, ....., 487, 496, 502])
y = np.linspace(0.8, 0.08, len(x))  # setting y this way leads to the wrong result
def func(x, a, b):
    return a * x + b  # I actually want a curve; linear is just simple to show here
popt, pcov = curve_fit(func, x, y)
func(42, *popt)
0.47056348146450089  # I want 0.8 here

How about this way?
EDIT: added weights. curve_fit can't pin the curve to your end points exactly, but if "very close" is good enough, you can weight the end points heavily (a small sigma means a high weight):
import scipy.optimize as opti
import numpy as np
xdata = np.array([43, 56, 234, 502], float)
ydata = np.linspace(0.8, 0.08, len(xdata))
weights = np.ones_like(xdata, float)
weights[0] = 0.001   # small sigma = high weight: pins down the end points
weights[-1] = 0.001
def fun(x, a, b, z):
    return np.log(z/x + a) + b
popt, pcov = opti.curve_fit(fun, xdata, ydata, sigma=weights)
print(fun(xdata, *popt))
>>> [ 0.79999994 ... 0.08000009]
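To get the single-argument function the question asks for, a minimal sketch is to close over the fitted parameters (using fun and popt from above):
penalty = lambda v: fun(v, *popt)   # maps a word count to its penalty score
print(penalty(43), penalty(502))    # should be very close to 0.8 and 0.08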
EDIT:
You can also play with these parameters, of course:
import scipy.optimize as opti
import numpy as np
xdata = np.array([43, 56, 234, 502], float)
xdata = np.round(np.sort(np.random.rand(100) * (502 - 43) + 43))  # overrides the line above: 100 random x values in [43, 502]
ydata = np.linspace(0.8, 0.08, len(xdata))
weights = np.ones_like(xdata, float)
weights[0] = 0.00001
weights[-1] = 0.00001
def fun(x, a, b, z):
    return np.log(z/x + a) + b
popt, pcov = opti.curve_fit(fun, xdata, ydata, sigma=weights)
print(fun(xdata, *popt))
>>>[ 0.8 ... 0.08 ]
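If the end points must land exactly, a two-parameter curve can be solved in closed form from the two boundary conditions instead of fitted. A minimal sketch, assuming the shape y = a*log(x) + b (the log shape is my assumption, matching the question's "maybe like math.log"):
import math
def creat_curve_func(x, y_bounds, curve_shape='log'):
    # solve a*log(x) + b through (min(x), y_bounds[0]) and (max(x), y_bounds[1])
    lo, hi = min(x), max(x)
    a = (y_bounds[1] - y_bounds[0]) / (math.log(hi) - math.log(lo))
    b = y_bounds[0] - a * math.log(lo)
    return lambda v: a * math.log(v) + b
func = creat_curve_func([43, 43, 46, 487, 496, 502], [0.8, 0.08])
print(func(43), func(502))  # 0.8 and 0.08 (up to float rounding)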

Related

Sine Curve fitting in Python

I want to fit one bump of a sine curve to this set of data:
xData = np.array([1.7, 8.8, 15, 25, 35, 45, 54.8, 60, 64.7, 70])
yData = np.array([30, 20, 13.2, 6.2, 3.9, 5.2, 10, 14.8, 20, 27.5])
I have successfully fitted a parabola using the scipy.optimize.curve_fit function, but I don't know how to fit a sine curve to the data.
Here's what I did so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.interpolate as inp
xData = np.array([1.7, 8.8, 15, 25, 35, 45, 54.8, 60, 64.7, 70])
yData = np.array([30, 20, 13.2, 6.2, 3.9, 5.2, 10, 14.8, 20, 27.5])
def model_parabola(x, a, b, c):
    return a * (x - b) ** 2 + c
def model_sine(x, amp, omega, phase, c, z):
    return amp * np.sin(omega * (x - z) + phase) + c
poptsin, pcovsine = curve_fit(model_sine, xData, yData,
                              p0=[np.std(yData) * 2 ** 0.5, 2 * np.pi, 0, np.mean(yData), 0])
popt, pcov = curve_fit(model_parabola, xData, yData, p0=[2, 3, 4])
# for parabola
aopt, bopt, copt = popt
xmodel = np.linspace(min(xData), max(xData), 100)
ymodel = model_parabola(xmodel, aopt, bopt, copt)
print(poptsin)
# for sine curve
ampopt, omegaopt, phaseopt, ccopt, zopt = poptsin
xSinModel = np.linspace(min(xData), max(xData), 100)
ySinModel = model_sine(xSinModel, ampopt, omegaopt, phaseopt, ccopt, zopt)
y_fit = model_sine(xSinModel, *poptsin)
plt.scatter(xData, yData)
plt.plot(xmodel, ymodel, 'r-')
plt.plot(xSinModel, ySinModel, 'g-')
plt.show()
And this is the result (plot not shown):
Try with the following:
def model_sine(x, amp, omega, phase, offset):
    return amp * np.sin(omega * x + phase) + offset
poptsin, pcovsine = curve_fit(model_sine, xData, yData,
                              p0=[np.max(yData) - np.min(yData), np.pi/70, 3, np.max(yData)],
                              maxfev=5000)
You don't need both phase and z; one should be enough.
I needed to increase the number of allowed function evaluations (maxfev); this would probably not be necessary if the data were completely normalised, although it is already close enough to order 1.
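Putting it together, a minimal sketch using the xData/yData from the question and the revised model above:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

xData = np.array([1.7, 8.8, 15, 25, 35, 45, 54.8, 60, 64.7, 70])
yData = np.array([30, 20, 13.2, 6.2, 3.9, 5.2, 10, 14.8, 20, 27.5])

def model_sine(x, amp, omega, phase, offset):
    return amp * np.sin(omega * x + phase) + offset

poptsin, _ = curve_fit(model_sine, xData, yData,
                       p0=[np.max(yData) - np.min(yData), np.pi/70, 3, np.max(yData)],
                       maxfev=5000)
xs = np.linspace(min(xData), max(xData), 100)
plt.scatter(xData, yData)
plt.plot(xs, model_sine(xs, *poptsin), 'g-')  # fitted sine bump
plt.show()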
The usual method of fitting (such as in Python) involves an iterative process starting from "guessed" parameter values, which must not be too far from the unknown exact values.
An unusual method exists that is neither iterative nor requires initial values to start the calculation. Its application to the sine function is shown below with a numerical example (computation done in MathCad; figure not shown).
The method consists of a linear fit of an integral equation of which the sine function is a solution. For a general explanation see https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales .
Note: if a particular fitting criterion is specified (MLSE, MLSAE, MLSRE or another), nonlinear regression cannot be avoided. The approximate parameter values found above are then very good starting values for the iterative process, giving better reliability with the usual nonlinear regression software.
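(A different, simpler guess-free trick than the integral-equation method above, sketched here as one common alternative: estimate the frequency from the dominant FFT bin and use it to seed curve_fit. This assumes roughly evenly spaced x values.)
import numpy as np
from scipy.optimize import curve_fit

def model_sine(x, amp, omega, phase, offset):
    return amp * np.sin(omega * x + phase) + offset

def fft_seeded_fit(x, y):
    # dominant nonzero FFT frequency as the initial guess for omega
    dx = (x[-1] - x[0]) / (len(x) - 1)              # mean sample spacing
    spectrum = np.abs(np.fft.rfft(y - np.mean(y)))  # mean removed so bin 0 is ~0
    freqs = np.fft.rfftfreq(len(x), d=dx)
    omega0 = 2 * np.pi * freqs[np.argmax(spectrum)]
    p0 = [np.std(y) * 2 ** 0.5, omega0, 0, np.mean(y)]
    return curve_fit(model_sine, x, y, p0=p0, maxfev=5000)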

How to show standard error with curve_fit from scipy in python?

I am trying to fit several curve equations to my data to determine what kind of decay curve best represents it. I am using the curve_fit function from scipy in Python.
Here is my example data:
data = {'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'Y': [55, 55, 55, 54, 54, 54, 54, 53, 53, 50, 45, 37, 27, 16, 0]}
df = pd.DataFrame(data)
I then want to try fitting a linear, logarithmic, second-degree polynomial, and exponential decay curve to my data points and then plotting. I am trying out the following code:
import pandas as pd
import numpy as np
from numpy import array, exp
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
# load the dataset
data = df.values
# choose the input and output variables
x, y = data[:, 0], data[:, 1]
def func1(x, a, b):
    return a * x + b
def func2(x, a, b):
    return a * np.log(x) + b
def func3(x, a, b, c):
    return a * x**2 + b * x + c
def func4(x, a, b, c):
    return a * exp(b * x) + c
params, _ = curve_fit(func1, x, y)
a, b = params[0], params[1]
yfit1 = a * x + b
print('Linear decay fit:')
print('y = %.5f * x + %.5f' % (a, b))
params, _ = curve_fit(func2, x, y)
a, b = params[0], params[1]
yfit2 = a * np.log(x) + b
print('Logarithmic decay fit:')
print('y = %.5f * ln(x)+ %.5f' % (a, b))
params, _ = curve_fit(func3, x, y)
a, b, c = params[0], params[1], params[2]
yfit3 = a*x**2+b*x+c
print('Polynomial decay fit:')
print('y = %.5f * x^2 + %.5f * x + %.5f' % (a, b, c))
params, _ = curve_fit(func4, x, y)
a, b, c = params[0], params[1], params[2]
yfit4 = a*exp(x*b)+c
print('Exponential decay fit:')
print('y = %.5f * exp(x*%.5f)+%.5f' % (a, b, c))
plt.plot(x, y, 'bo', label="y-original")
plt.plot(x, yfit1, label="y=a * x + b")
plt.plot(x, yfit2, label="y=a * np.log(x) + b")
plt.plot(x, yfit3, label="y=a*x^2 + bx + c")
plt.plot(x, yfit4, label="y=a*exp(x*b)+c")
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='best', fancybox=True, shadow=True)
plt.grid(True)
plt.show()
This produces the following text and plot output (not shown here).
I had originally tried to find a way to show the r-squared value for each curve fit, but I found out that for non-linear curves R-squared is not suitable, and that I should instead identify the standard error of each curve fit to judge which equation best describes the decay I am seeing in my data points.
My question is: how can I take my code and the fits of each equation and output a standard error reading for each curve fit, so that I can judge which equation most accurately represents my data? I am ultimately trying to make the judgement call "I have discovered that as the x-axis value increases, the y-axis value decays ...", filling in that blank with "linearly", "logarithmically", "second-degree quadratically", or "exponentially". Visually I see that the exponential decay equation fits my data best, but I have no quantitative basis for that call other than eyeballing it. That is why I want to print the standard error, so that I can judge the best fit as the equation with the lowest standard error. I have been unable to locate how to derive this from the documentation.
You can use Mean Absolute Scaled Error (MASE) for comparing the goodness of fit. Define a function for MASE as follows:
import numpy as np
def mase(actual: np.ndarray, predicted: np.ndarray):
    forecast_error = np.mean(np.abs(actual - predicted))
    naive_forecast = np.mean(np.abs(np.diff(actual)))
    return forecast_error / naive_forecast
Then simply compare all of the predicted values with the actual values (both must be NumPy arrays to use the function above). You can then select the model/equation with the lowest MASE value. For example:
>>> actual = np.array([1,2,3])
>>> predicted1 = np.array([1.1, 2.2, 3.3])
>>> mase(actual, predicted1)
0.20000000000000004
>>>
>>> predicted2 = np.array([1.1, 2.1, 3.1])
>>> mase(actual, predicted2)
0.10000000000000009
Here we can safely say that predicted2 has the lower MASE value and is thus preferable.
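Since the question asks specifically for a standard error, a minimal sketch of the residual standard error s = sqrt(SSR / (n - p)) may also help (this uses y and the yfit1..yfit4 arrays from the question's code; p is the number of fitted parameters in each model):
import numpy as np

def residual_standard_error(y, yfit, n_params):
    resid = np.asarray(y) - np.asarray(yfit)  # residuals
    return np.sqrt(np.sum(resid**2) / (len(resid) - n_params))

for name, yfit, p in [('linear', yfit1, 2), ('logarithmic', yfit2, 2),
                      ('polynomial', yfit3, 3), ('exponential', yfit4, 3)]:
    print(name, residual_standard_error(y, yfit, p))  # lower is better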

curve fitting with a numerical model in python

I have some data points to fit with a model. My model is not defined as an equation but as a numerical solution of 3 equations.
My model is defined as follows:
def eq(q):
    z1 = q[0]
    z2 = q[1]
    H = q[2]
    F = empty((3))
    F[0] = ((J*(1-(D*(1-(1-8*a*T/D**3)**(1/3)))**(b)/Lx))*sin(z1-z2))+(H*sin(z1-pi/4))+(((3.6*10**5)/2)*sin(2*z1))
    F[1] = ((-J*(1-(D*(1-(1-8*a*T/D**3)**(1/3)))**(b)/Lx))*sin(z1-z2))+(H*sin(z2-pi/4))+(((3.6*10**5)/2)*sin(2*z2))
    F[2] = cos(z1-pi/4)
    return F
guess1 = array([2.35, 0.2, 125000])
z = fsolve(eq, guess1)
Hc = z[2]*(1-(T/Tb)**(1/2))
where
D = 10**(-8)
a = 2.2*10**(-28)
Lx = 4.28*10**(-9)
and J, b, Tb are parameters, while z1, z2, H are variables.
My data points are:
T=[10, 60, 110, 160, 210, 260, 300]
Hc=[0.58933, 0.5783, 0.57938, 0.58884, 0.60588, 0.62788, 0.6474]
How can I find J, b, and Tb by fitting the model to the data points?
You can use scipy.optimize.curve_fit. You need to understand that curve_fit only cares about the input and output of your function, so you need to define it this way:
def func(x, *params):
    ....
    return y
Then you can apply curve_fit (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html):
popt, pcov = curve_fit(func, x, y_data)
I wrote an example based on your case. There are a hundred ways to write it more efficiently; however, I tried to make it clear so you can see the link between what you gave and what I propose.
import numpy as np
from scipy.optimize import fsolve, curve_fit
def eq(q, T, J, b):
    z1 = q[0]
    z2 = q[1]
    H = q[2]
    D = 10**(-8)
    a = 2.2*10**(-28)
    Lx = 4.28*10**(-9)
    F = np.empty((3))
    F[0] = ((J*(1-(D*(1-(1-8*a*T/D**3)**(1/3)))**(b)/Lx))*np.sin(z1-z2))+(H*np.sin(z1-np.pi/4))+(((3.6*10**5)/2)*np.sin(2*z1))
    F[1] = ((-J*(1-(D*(1-(1-8*a*T/D**3)**(1/3)))**(b)/Lx))*np.sin(z1-z2))+(H*np.sin(z2-np.pi/4))+(((3.6*10**5)/2)*np.sin(2*z2))
    F[2] = np.cos(z1-np.pi/4)
    return F

def func(T, J, b, Tb):
    Hc = []
    guess1 = np.array([2.35, 0.2, 125000])
    for t in T:
        z = fsolve(eq, guess1, args=(t, J, b))
        Hc.append(z[2]*(1-(t/Tb)**(1/2)))
    return Hc

T = [10, 60, 110, 160, 210, 260, 300]
Hc_exp = [0.58933, 0.5783, 0.57938, 0.58884, 0.60588, 0.62788, 0.6474]
p0 = (1, 1, 100)
popt, pcov = curve_fit(func, T, Hc_exp, p0)
J = popt[0]
b = popt[1]
Tb = popt[2]
It seems the fit is hard to do. To improve it, you can refine the initial guess passed through p0, or add bounds on the parameters (see the curve_fit documentation). Once it converges, you can get the error on the estimated parameters.
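For example (a short sketch; the bounds below are illustrative placeholders, not physically motivated values):
perr = np.sqrt(np.diag(pcov))   # one-sigma errors on the estimated parameters

# bounded fit: curve_fit accepts a (lower_bounds, upper_bounds) tuple
popt, pcov = curve_fit(func, T, Hc_exp, p0=p0,
                       bounds=((0, 0, 1), (1e7, 10, 1e6)))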

fit multiple gaussians to the data in python

I am just wondering if there is an easy way to fit Gaussians/Lorentzians to 10 peaks, extract the FWHM, and determine the positions of the FWHM on the x-values. The complicated way would be to separate the peaks, fit each one, and extract the FWHM.
Data is at https://drive.google.com/file/d/0B6sUnnbyNGuOT2RZb2UwYXU4dlE/view?usp=sharing .
Any advice greatly appreciated. Thanks.
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('data.txt', delimiter=',')
x, y = data
plt.plot(x,y)
plt.show()
def func(x, *params):
    y = np.zeros_like(x)
    print(len(params))
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp(-((x - ctr)/wid)**2)
    # note: no return statement, so func returns None
    # (this is what causes the TypeError in the traceback below)
guess = [0, 60000, 80, 1000, 60000, 80]
for i in range(12):
    guess += [60 + 80*i, 46000, 25]
popt, pcov = curve_fit(func, x, y, p0=guess)
print(popt)
fit = func(x, *popt)
plt.plot(x, y)
plt.plot(x, fit, 'r-')
plt.show()
Traceback (most recent call last):
  File "C:\Users\test.py", line 33, in <module>
    popt, pcov = curve_fit(func, x, y, p0=guess)
  File "C:\Python27\lib\site-packages\scipy\optimize\minpack.py", line 533, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "C:\Python27\lib\site-packages\scipy\optimize\minpack.py", line 368, in leastsq
    shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
  File "C:\Python27\lib\site-packages\scipy\optimize\minpack.py", line 19, in _check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
  File "C:\Python27\lib\site-packages\scipy\optimize\minpack.py", line 444, in _general_function
    return function(xdata, *params) - ydata
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
This requires a non-linear fit. A good tool for this is scipy's curve_fit function.
To use curve_fit, we need a model function, call it func, that takes x and our (guessed) parameters as arguments and returns the corresponding values for y. As our model, we use a sum of gaussians:
from scipy.optimize import curve_fit
import numpy as np
def func(x, *params):
    y = np.zeros_like(x)
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp(-((x - ctr)/wid)**2)
    return y
Now, let's create an initial guess for our parameters. This guess starts with peaks at x=0 and x=1,000 with amplitude 60,000 and e-folding widths of 80. Then, we add candidate peaks at x=60, 140, 220, ... with amplitude 46,000 and width of 25:
guess = [0, 60000, 80, 1000, 60000, 80]
for i in range(12):
    guess += [60 + 80*i, 46000, 25]
Now, we are ready to perform the fit:
popt, pcov = curve_fit(func, x, y, p0=guess)
fit = func(x, *popt)
To see how well we did, let's plot the actual y values (solid black curve) and the fit (dashed red curve) against x (plot not shown):
As you can see, the fit is fairly good.
Complete working code
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt('data.txt', delimiter=',')
x, y = data
plt.plot(x, y)
plt.show()

def func(x, *params):
    y = np.zeros_like(x)
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp(-((x - ctr)/wid)**2)
    return y

guess = [0, 60000, 80, 1000, 60000, 80]
for i in range(12):
    guess += [60 + 80*i, 46000, 25]

popt, pcov = curve_fit(func, x, y, p0=guess)
print(popt)
fit = func(x, *popt)
plt.plot(x, y)
plt.plot(x, fit, 'r-')
plt.show()
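The question also asks for the FWHM. For a Gaussian written as amp * exp(-((x - ctr)/wid)**2), the half-maximum points sit at ctr ± wid*sqrt(ln 2), so the FWHM is 2*wid*sqrt(ln 2). A small sketch using the popt from above:
import numpy as np

for i in range(0, len(popt), 3):
    ctr, amp, wid = popt[i], popt[i+1], popt[i+2]
    fwhm = 2 * np.sqrt(np.log(2)) * abs(wid)
    left, right = ctr - fwhm / 2, ctr + fwhm / 2   # half-maximum crossings on x
    print(ctr, fwhm, left, right)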
#john1024's answer is good, but requires a manual process to generate the initial guess. Here's an easy way to automate the starting guess; replace the relevant 3 lines of john1024's code with the following:
import scipy.signal
Npks = 14  # expected number of peaks (not defined in the original snippet; 14 matches the components in john1024's guess)
i_pk = scipy.signal.find_peaks_cwt(y, widths=range(3, len(x)//Npks))
DX = (np.max(x) - np.min(x)) / float(Npks)           # starting guess for component width
guess = np.ravel([[x[i], y[i], DX] for i in i_pk])   # starting guess for (ctr, amp, width) of each component
IMHO it is always advisable to plot the residual (data - model) in problems such as this. You will also want to look at the ChiSq of the fit.
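A minimal sketch of that check, assuming the x, y, fit, and popt arrays from the answer above (the noise estimate sigma here is a crude placeholder taken from the residuals themselves, which pushes chisq/dof toward 1 by construction; use your real measurement uncertainty instead if you have it):
import numpy as np
import matplotlib.pyplot as plt

resid = y - fit
plt.plot(x, resid)            # residual should look like structureless noise
plt.show()

sigma = np.std(resid)         # placeholder per-point uncertainty
chisq = np.sum((resid / sigma)**2)
dof = len(y) - len(popt)      # degrees of freedom
print(chisq / dof)            # reduced chi-square, ideally near 1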

Getting standard error associated with parameter estimates from scipy.optimize.curve_fit

I am using scipy.optimize.curve_fit to fit a curve to some data I have. The curves, for the most part, seem to fit very well. For some reason, though, pcov = inf when I print it.
What I really need is to calculate the error associated with the parameters I'm fitting, and I am not sure how to do this even when it does give me the covariance matrix.
The model being fit to is:
def intensity(x, R_out, R_in, K_in, K_out, a, b, c):
    K_in, K_out = abs(0.0), abs(K_out)
    if x <= R_in:
        return 2*R_out*(K_out*np.sqrt(1 - x**2/R_out**2) -
                        (K_out - 0.0)*np.sqrt(R_in**2/R_out**2 - x**2/R_out**2)) + c
    elif x >= R_in and x <= R_out:
        return K_out*2*R_out*np.sqrt(1 - x**2/R_out**2) + c
    elif x > R_out:
        return c

intensity_vec = np.vectorize(intensity)

def intensity_vec_self(x, R_out, R_in, K_in, K_out, a, b, c):
    y = np.zeros(x.shape)
    for i in range(len(y)):
        y[i] = intensity_vec(x[i], R_out, R_in, K_in, K_out, a, b, c)
    return y
There are 400 data points; I can post them here if you think it will help.
To summarize: I can't get curve_fit to print my pcov and need help figuring out why, and whether I can get it to do so.
Also, if it is a quick explanation, I would like to know how to use the pcov array to obtain the errors associated with my fit.
Thanks
The variances of the parameters are the diagonal elements of the variance-covariance matrix, and the standard errors are their square roots: np.sqrt(np.diag(pcov)).
Regarding getting inf, see and compare these two examples:
In [129]:
import numpy as np
import scipy.optimize as so

def func(x, a, b, c, d):
    return a * np.exp(-b * x) + c
xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5, 1)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
[ inf  inf  inf  inf]
And:
In [130]:
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
[ 0.11097646  0.11849107  0.05230711]
In this extreme example, d has no effect on the function func, so it is associated with a variance of +inf; in other words, it could be just about any value. Removing d from func gives results that make sense.
In reality, if parameters are of very different scales, say:
def func(x, a, b, c, d):
    # return a * np.exp(-b * x) + c
    return a * np.exp(-b * x) + c + d*1e-10
you will also get inf due to floating-point overflow/underflow.
In your case, I think you never use a and b, so it is just like the first example here.
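A sketch of the trimmed model, assuming a, b, and K_in really are unused (K_in is overwritten with 0.0 in the original), as the answer suggests:
import numpy as np

def intensity(x, R_out, R_in, K_out, c):
    K_out = abs(K_out)
    if x <= R_in:
        return 2*R_out*(K_out*np.sqrt(1 - x**2/R_out**2) -
                        K_out*np.sqrt(R_in**2/R_out**2 - x**2/R_out**2)) + c
    elif x <= R_out:
        return K_out*2*R_out*np.sqrt(1 - x**2/R_out**2) + c
    return c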
