Python Estimate the standard deviation after data fitting - python

I am trying to fit a data set to the hyperbolic equation below, using ipython --pylab:
y = ax / (b + x)
Here is my python code:
from scipy import optimize as opti
import numpy as np
from pandas import DataFrame
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8])
y = np.array([0.375, 0.466, 0.509, 0.520, 0.525, 0.536, 0.541])
y_stdev = np.array([0.025, 0.016, 0.009, 0.009, 0.025, 0.019])
def func(x, a, b):
    return a*x / (b + x)
popt, pcov = opti.curve_fit(func, x, y)
print(popt)
# popt is a NumPy array, so index it directly (.ix is a pandas accessor)
print("a = ", popt[0])
print("b = ", popt[1])
The fitted values of a and b are returned in popt. My question is: since a and b are inferred by fitting the data set to func(x, a, b), how can I estimate the standard deviations of a and b?
Thank you.

The answer is in the docs:
pcov : 2d array
The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov))...
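As a minimal sketch of the documented recipe, applied to the data from the question: the starting guess p0 is an assumption added here for robustness, and y_stdev is left out because it lists six values for seven data points. If matching measurement errors are available, curve_fit also accepts them through its sigma and absolute_sigma arguments.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8])
y = np.array([0.375, 0.466, 0.509, 0.520, 0.525, 0.536, 0.541])

def func(x, a, b):
    # Hyperbolic model y = a*x / (b + x)
    return a * x / (b + x)

# p0 is an assumed starting guess (not part of the original post)
popt, pcov = curve_fit(func, x, y, p0=[0.5, 0.1])

# One-standard-deviation errors: square roots of the covariance diagonal
perr = np.sqrt(np.diag(pcov))

print("a = %.4f +/- %.4f" % (popt[0], perr[0]))
print("b = %.4f +/- %.4f" % (popt[1], perr[1]))
```

The off-diagonal entries of pcov describe how the two estimates co-vary, so perr alone does not capture the correlation between a and b.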

Related

exponential curve fit parameters in python do not make sense--fit itself looks great

I'm doing a curve fit in Python using scipy.optimize.curve_fit. The fit itself looks great, but the parameters it produces don't make sense.
The equation is (ax)^b + cx, but with the parameters Python finds, a = -c and b = 1, so the whole expression equals 0 for every value of x.
Here is the plot: https://i.stack.imgur.com/fBfg7.png
Here is the experimental raw data I used: https://pastebin.com/CR2BCJji
xdata = cfu_u
ydata = OD_u
min_cfu = 0.1
max_cfu = 9.1
x_vec = pow(10,np.arange(min_cfu,max_cfu,0.1))
def func(x,a, b, c):
    return (a*x)**b + c*x
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(x_vec, func(x_vec, *popt), label = 'curve fit',color='slateblue',linewidth = 2.2)
plt.plot(cfu_u,OD_u,'-',label = 'experimental data',marker='.',markersize=8,color='deepskyblue',linewidth = 1.4)
plt.legend(loc='upper left',fontsize=12)
plt.ylabel("Y",fontsize=12)
plt.xlabel("X",fontsize=12)
plt.xscale("log")
plt.gcf().set_size_inches(7, 5)
plt.show()
print(popt)
[ 1.44930871e+03 1.00000000e+00 -1.44930871e+03]
I used the curve_fit function from scipy to fit an exponential curve to some data. The fit looks very good, so that part was a success.
However, the parameters output by the curve_fit function do not make sense, and solving f(x) with them results in f(x)=0 for every value of x, which is clearly not what is happening in the curve.
Modify your model to show what's actually happening:
def func(x: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    return (a*x)**(1 - b) + (c - a)*x
producing optimized parameters
[3.49003332e-04 6.60420171e-06 3.13366557e-08]
This is likely to be numerically unstable. Try optimizing in the log domain instead.
When I run your example (after adding imports, etc.), I get NaNs for popt, and I eventually realized you were allowing general, real b with negative x. If I fit to the positive x only, I get a popt of [1.89176133e+01 5.66689997e+00 1.29380532e+08]. The fit isn't too bad (see below), but perhaps you need to restrict b to be an integer to fit the whole set. I'm not sure how to do that in Scipy (I assume you need mixed integer-real optimization, and I haven't investigated if Scipy supports that.)
Code:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
cfu_u, OD_u = np.loadtxt('data.txt', skiprows=1).T
# fit to positive x only
posmask = cfu_u > 0
xdata = cfu_u[posmask]
ydata = OD_u[posmask]
def func(x, a, b, c):
    return (a*x)**b + c*x
popt, pcov = curve_fit(func, xdata, ydata, p0=[1000,2,1])
x_vec = np.geomspace(xdata.min(), xdata.max())
plt.plot(x_vec, func(x_vec, *popt), label = 'curve fit',color='slateblue',linewidth = 2.2)
plt.plot(cfu_u,OD_u,'-',label = 'experimental data', marker='x',markersize=8,color='deepskyblue',linewidth = 1.4)
plt.legend(loc='upper left',fontsize=12)
plt.ylabel("Y",fontsize=12)
plt.xlabel("X",fontsize=12)
plt.yscale("log")
plt.xscale("symlog")
plt.show()
print(popt)
#[1.89176133e+01 5.66689997e+00 1.29380532e+08]
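The NaNs mentioned in this answer follow from how NumPy evaluates real powers: a negative base raised to a non-integer exponent has no real value, so the result is nan. A small sketch with made-up inputs (the values of x, a, b and c below are illustrative, not taken from the data set):

```python
import numpy as np

def func(x, a, b, c):
    return (a * x)**b + c * x

x = np.array([-2.0, 2.0])

with np.errstate(invalid="ignore"):       # silence the expected RuntimeWarning
    non_integer = func(x, 1.0, 0.5, 0.0)  # b = 0.5: undefined for negative x
    integer = func(x, 1.0, 2.0, 0.0)      # b = 2: defined for every x

print(non_integer)  # first entry is nan
print(integer)
```

Once a single residual is nan, the sum of squares is nan as well, which is why masking out the non-positive x values lets the optimizer proceed.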

Limit index values in scipy fitting

I'm trying to fit the following data
tau = [0.0001, 0.0004, 0.0006, 0.0008, 0.001, 0.0015, 0.002, 0.004, 0.006, 0.008, 0.01, 0.05, 0.1, 0.2, 0.5, 0.6, 0.8, 1.0, 1.5, 2.0, 4.0, 6.0, 8.0, 10.0]
tet = [1.000000000, 0.993790739, 0.965602604, 0.924802378, 0.88010508, 0.778684048, 0.702773729, 0.569882533, 0.544103907, 0.54709633, 0.547347558, 0.543859156, 0.504348651, 0.691909732, 0.351717086, 0.405861814, 0.340536768, 0.301032851, 0.192656835, 0.188915355, 0.100207658, 0.059809495, 0.035968302, 0.024147687]
using a summation with the general formula
$f(x) = \sum_{i=1}^{n} a_i e^{-x/t_i}$
I'm fitting each number of terms separately. I'm sure this could be done with a loop or a general function, but I don't know how, so here it goes:
def fitfunc_1(x, a, t1):
    return a * np.exp(- x / t1)
popt_tet_1, pcov = curve_fit(fitfunc_1, data['tau'], data['tet'], maxfev=10000, bounds = (0.0, np.inf))
def fitfunc_2(x, a, t1, b, t2):
    return a * np.exp(- x / t1) + b * np.exp(- x / t2)
popt_tet_2, pcov = curve_fit(fitfunc_2, data['tau'], data['tet'], maxfev=10000, bounds = (0.0, np.inf))
def fitfunc_3(x, a, t1, b, t2, c, t3):
    return a * np.exp(- x / t1) + b * np.exp(- x / t2) + c * np.exp(- x / t3)
popt_tet_3, pcov = curve_fit(fitfunc_3, data['tau'], data['tet'], maxfev=10000, bounds = (0.0, np.inf))
However, I need to make sure that the a_i coefficients (a, b and c) sum to around 1, meaning a ~ 1, a + b ~ 1 and a + b + c ~ 1 for the one-, two- and three-term fits respectively.
Is there a way to constrain scipy's fitting function this way?
Sorry if this is a noob question.
I tried to fit your data to a sum of two exponentials and also to a sum of three exponentials. In both cases the fit is good only on part of the range, never on the whole range. The difficulty becomes clear when the experimental points are plotted with a logarithmic scale on the abscissa.
The shape of the pattern looks more like a sum of logistic-type functions than a sum of exponential-type functions.
This suggests that each term of the sum might be of this form:
Thus the whole function to be fitted is:
NOTE: The above is a preliminary study to find a convenient kind of function to fit. The numerical values of the parameters above are only empirical approximations. To get a better fit, one still has to compute the parameters by non-linear regression, which is an iterative process. The values above can serve as the initial values for that iteration.
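Returning to the question as asked, one possible way to cover both the general n-term sum and the constraint on the amplitudes is to let the last amplitude be implied by the others, so that the sum is exactly 1 by construction. This is a sketch under assumptions: make_sum_exp is a hypothetical helper, the constraint is taken as exact rather than "around 1", and the data below are synthetic rather than the tau/tet values from the question.

```python
import numpy as np
from scipy.optimize import curve_fit

def make_sum_exp(n):
    """Build a model with n exponential terms whose amplitudes sum to exactly 1."""
    def model(x, *params):
        # params = (a_1, ..., a_{n-1}, t_1, ..., t_n); a_n is implied
        amps = np.asarray(params[:n - 1], dtype=float)
        taus = np.asarray(params[n - 1:], dtype=float)
        amps = np.append(amps, 1.0 - amps.sum())
        x = np.asarray(x, dtype=float)
        return (amps[:, None] * np.exp(-x[None, :] / taus[:, None])).sum(axis=0)
    return model

# Synthetic, noise-free data (hypothetical values, not the data from the question)
x = np.linspace(0.0, 5.0, 200)
y = 0.7 * np.exp(-x / 0.5) + 0.3 * np.exp(-x / 3.0)

n = 2
model = make_sum_exp(n)
p0 = [0.5, 1.0, 2.0]                    # a_1, t_1, t_2
lower = [0.0] * (n - 1) + [1e-6] * n    # amplitudes >= 0, time constants > 0
upper = [1.0] * (n - 1) + [np.inf] * n  # each free amplitude <= 1
popt, pcov = curve_fit(model, x, y, p0=p0, bounds=(lower, upper))

amps = np.append(popt[:n - 1], 1.0 - popt[:n - 1].sum())
taus = popt[n - 1:]
print("amplitudes:", amps, "sum:", amps.sum())
print("time constants:", taus)
```

Because the model takes *params, curve_fit needs p0 to know how many parameters there are. For n of 3 or more, the box bounds alone do not stop the implied last amplitude from going negative; that would need a general constrained optimizer instead of curve_fit.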

Curve Fitting scipy

Why is this fit so bad?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def fit(x, a, b, c, d):
    return a * np.sin(b * x + c) + d
xdata = np.linspace(0, 360, 1000)
ydata = 89.9535 + 60.9535 * np.sin(0.0174 * xdata - 1.5708)
popt, pcov = curve_fit(fit, xdata, ydata)
plt.plot(xdata, 89.9535 + 60.9535 * np.sin(0.0174 * xdata - 1.5708))
plt.plot(xdata, fit(xdata, popt[0], popt[1], popt[2], popt[3]))
plt.show()
The fitted curve looks very strange, or maybe I am misusing curve_fit. Thanks for any help.
This is the result:
curve_fit finds a local minimum for the least-squares problem. In this case, there are many local minima.
One way around this is to use as good an initial guess as possible. For problems with multiple local minima, curve_fit's default of all ones for the initial guess can be pretty bad. For your function, the crucial parameter is b, the frequency. If you know that value will be small, i.e. on the order of 0.01, use 0.01 as the initial guess:
In [77]: (a, b, c, d), pcov = curve_fit(fit, xdata, ydata, p0=[1, .01, 1, 1])
In [78]: a
Out[78]: 60.953499999999998
In [79]: b
Out[79]: 0.017399999999999999
In [80]: c
Out[80]: -102.10176491487339
In [81]: ((c + np.pi) % (2*np.pi)) - np.pi
Out[81]: -1.570800000000002
In [82]: d
Out[82]: 89.953500000000005
As an alternative, plot the original data alone and use it to make initial guesses of the parameters. For a periodic function it can be easy to estimate the period and the amplitude. In this case the guesses need not be too close.
Then I used these in curve_fit:
popt, pcov = curve_fit(fit, xdata, ydata, [ 80., np.pi/330, 1., 1. ])
The results it returned are essentially the original values:
array([ 6.09535000e+01, 1.74000000e-02, -1.57080000e+00,
8.99535000e+01])
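The initial guesses can also be read off the data programmatically instead of from a plot. The sketch below is one possible approach, not what either answer did: it takes the offset from the mean, the amplitude from the half-range, and the frequency from the strongest non-zero FFT bin. It assumes evenly spaced x values and at least roughly one full period in the window.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit(x, a, b, c, d):
    return a * np.sin(b * x + c) + d

xdata = np.linspace(0, 360, 1000)
ydata = 89.9535 + 60.9535 * np.sin(0.0174 * xdata - 1.5708)

# Offset and amplitude guesses read directly from the data
d0 = ydata.mean()
a0 = (ydata.max() - ydata.min()) / 2

# Frequency guess: strongest non-zero FFT bin, converted to angular frequency
dx = xdata[1] - xdata[0]
spectrum = np.abs(np.fft.rfft(ydata - d0))
freqs = np.fft.rfftfreq(len(xdata), d=dx)
b0 = 2 * np.pi * freqs[np.argmax(spectrum[1:]) + 1]

popt, pcov = curve_fit(fit, xdata, ydata, p0=[a0, b0, 0.0, d0])
print(popt)
```

The fitted phase may come back shifted by a multiple of 2*pi, or the amplitude may come back negative with the phase shifted by pi; both describe the same curve.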

Getting standard error associated with parameter estimates from scipy.optimize.curve_fit

I am using scipy.optimize.curve_fit to fit a curve to some data I have. The curves, for the most part, seem to fit very well. For some reason, pcov = inf when I print it.
What I really need is to calculate the error associated with the parameters I'm fitting, and I am not sure exactly how to do this even if it does give me the covariance matrix.
The model being fit to is:
def intensity(x,R_out,R_in,K_in,K_out,a,b,c):
    K_in,K_out = abs(0.0),abs(K_out)
    if x<=R_in:
        return 2*R_out*(K_out*np.sqrt(1-x**2/R_out**2)-
                        (K_out-0.0)*np.sqrt(R_in**2/R_out**2-x**2/R_out**2)) + c
    elif x>=R_in and x<=R_out:
        return K_out*2*R_out*np.sqrt(1-x**2/R_out**2) + c
    elif x>R_out:
        return c
intensity_vec = np.vectorize(intensity)
def intensity_vec_self(x,R_out,R_in,K_in,K_out,a,b,c):
    y = np.zeros(x.shape)
    for i in range(len(y)):
        y[i]=intensity_vec(x[i],R_out,R_in,K_in,K_out,a,b,c)
    return y
There are 400 data points; I can post them here if you think it will help.
To summarize, I can't get curve_fit to return a finite pcov, and I need help figuring out why and whether I can get it to do so.
Also, if there is a quick explanation, I would like to know how to use the pcov array to obtain the errors associated with my fit.
Thanks
The variances of the parameters are the diagonal elements of the variance-covariance matrix, and the standard errors are their square roots: np.sqrt(np.diag(pcov))
Regarding getting inf, see and compare these two examples:
In [129]:
import numpy as np
import scipy.optimize as so
def func(x, a, b, c, d):
    return a * np.exp(-b * x) + c
xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5, 1)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
[ inf inf inf inf]
And:
In [130]:
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
[ 0.11097646 0.11849107 0.05230711]
In this extreme example, d has no effect on the function func, so its variance comes out as +inf; in other words, d can take just about any value. Removing d from func gives sensible results.
In practice, the same thing can happen if the parameters are on very different scales, say:
def func(x, a, b, c, d):
    #return a * np.exp(-b * x) + c
    return a * np.exp(-b * x) + c + d*1e-10
You will also get inf due to floating-point overflow/underflow.
In your case, I think you never used a and b. So it is just like the first example here.
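To illustrate the point for the model in the question, here is a hedged sketch that drops the parameters that never influence the output (a, b, and K_in, which is overwritten on the first line) and writes the three branches in vectorized form. It assumes x >= 0 and R_in <= R_out. The "true" values, the noise level and p0 are hypothetical, since the real 400 data points were not posted.

```python
import numpy as np
from scipy.optimize import curve_fit

def intensity(x, R_out, R_in, K_out, c):
    # Same three branches as the question's model for x >= 0, written without
    # the unused a, b and K_in; clipping at zero switches each square-root
    # term off outside its range.
    x = np.asarray(x, dtype=float)
    K_out = abs(K_out)
    outer = np.sqrt(np.clip(1 - x**2 / R_out**2, 0, None))
    inner = np.sqrt(np.clip((R_in**2 - x**2) / R_out**2, 0, None))
    return 2 * R_out * K_out * (outer - inner) + c

# Synthetic data with hypothetical "true" values, for illustration only
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 400)
y = intensity(x, 2.0, 1.0, 0.5, 0.1) + 0.01 * rng.normal(size=x.size)

popt, pcov = curve_fit(intensity, x, y, p0=[2.1, 0.9, 0.45, 0.0])
perr = np.sqrt(np.diag(pcov))  # standard errors of R_out, R_in, K_out, c
print(popt)
print(perr)
```

With every remaining parameter affecting the model output, the covariance matrix has no reason to degenerate to inf, and the model no longer needs np.vectorize or an explicit loop.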

how to find 50% point after curve fitting using numpy

I have used NumPy and SciPy in Python to fit my data to a sigmoidal curve. How can I find the value of x at the y = 50% point of the curve after the data has been fitted?
import numpy as np
import pylab
from scipy.optimize import curve_fit
def sigmoid(x, x0, k):
    y = 1 / (1 + np.exp(-k*(x-x0)))
    return y
xdata = np.array([0.0, 1.0, 3.0, 4.3, 7.0, 8.0, 8.5, 10.0, 12.0])
ydata = np.array([0.01, 0.02, 0.04, 0.11, 0.43, 0.7, 0.89, 0.95, 0.99])
popt, pcov = curve_fit(sigmoid, xdata, ydata)
print(popt)
x = np.linspace(-1, 15, 50)
y = sigmoid(x, *popt)
pylab.plot(xdata, ydata, 'o', label='data')
pylab.plot(x,y, label='fit')
pylab.ylim(0, 1.05)
pylab.legend(loc='best')
pylab.show()
You just need to solve the function you found for y(x) = 0.50. You can use one of the root finding tools of scipy, though these only solve for zero, so you need to give your function an offset:
def sigmoid(x, x0, k, y0=0):
    y = 1 / (1 + np.exp(-k*(x-x0))) + y0
    return y
Then it's just a matter of calling the root finding method of choice:
from scipy.optimize import brentq
a = np.min(xdata)
b = np.max(xdata)
x0, k = popt
y0 = -0.50
solution = brentq(sigmoid, a, b, args=(x0, k, y0)) # = 7.142
In response to your comment:
My code above uses the original popt that was calculated with your code. If you do the curve fitting with the updated sigmoid function (with the offset), popt will also contain a fitted parameter for y0.
You probably don't want this; you'll want the curve fitted with y0=0. This can be done by supplying curve_fit with an initial guess of only two values. This way the default value of y0 in the sigmoid function will be used:
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0 = (1,1))
Alternatively, just declare two separate sigmoid functions, one with the offset and one without it.
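For this particular two-parameter sigmoid, the 50% point can also be obtained without a root finder, because the function can be inverted by hand: y = 1/(1 + exp(-k(x - x0))) gives x = x0 - ln(1/y - 1)/k, and at y = 0.5 the logarithm vanishes, so x = x0. The sketch below checks this against brentq; x_at is a hypothetical helper name, and the shortcut only holds while the curve runs from 0 to 1 with no extra offset or scale.

```python
import numpy as np
from scipy.optimize import curve_fit, brentq

def sigmoid(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))

xdata = np.array([0.0, 1.0, 3.0, 4.3, 7.0, 8.0, 8.5, 10.0, 12.0])
ydata = np.array([0.01, 0.02, 0.04, 0.11, 0.43, 0.7, 0.89, 0.95, 0.99])

popt, pcov = curve_fit(sigmoid, xdata, ydata)
x0, k = popt

def x_at(y_level, x0, k):
    # Inverse of the sigmoid: solve y = 1 / (1 + exp(-k*(x - x0))) for x
    return x0 - np.log(1 / y_level - 1) / k

x_half_closed = x_at(0.5, x0, k)  # log(1) = 0, so this equals x0
x_half_root = brentq(lambda x: sigmoid(x, x0, k) - 0.5, xdata.min(), xdata.max())
print(x_half_closed, x_half_root)
```

The same x_at helper gives any other level, for example x_at(0.9, x0, k) for the 90% point, and np.sqrt(pcov[0, 0]) then serves as the standard error of the 50% point itself.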
