I'm doing a curve fit in Python using scipy.optimize.curve_fit, and the fit itself looks great; however, the parameters it returns don't make sense.
The model is (ax)^b + cx, but curve_fit finds a = -c and b = 1, so the model reduces to (a + c)x, which is 0 for every value of x.
Here is the plot: https://i.stack.imgur.com/fBfg7.png
Here is the experimental raw data I used: https://pastebin.com/CR2BCJji
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

xdata = cfu_u
ydata = OD_u
min_cfu = 0.1
max_cfu = 9.1
x_vec = pow(10, np.arange(min_cfu, max_cfu, 0.1))

def func(x, a, b, c):
    return (a*x)**b + c*x

popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(x_vec, func(x_vec, *popt), label = 'curve fit',color='slateblue',linewidth = 2.2)
plt.plot(cfu_u,OD_u,'-',label = 'experimental data',marker='.',markersize=8,color='deepskyblue',linewidth = 1.4)
plt.legend(loc='upper left',fontsize=12)
plt.ylabel("Y",fontsize=12)
plt.xlabel("X",fontsize=12)
plt.xscale("log")
plt.gcf().set_size_inches(7, 5)
plt.show()
print(popt)
[ 1.44930871e+03 1.00000000e+00 -1.44930871e+03]
I used the curve_fit function from scipy to fit this curve to the data. The fit looks very good, so that part was a success.
However, the parameters output by curve_fit do not make sense: evaluating f(x) with them gives f(x) = 0 for every value of x, which is clearly not what is happening in the plotted curve.
Modify your model to show what's actually happening:
def func(x: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    return (a*x)**(1 - b) + (c - a)*x
producing optimized parameters
[3.49003332e-04 6.60420171e-06 3.13366557e-08]
This is likely to be numerically unstable. Try optimizing in the log domain instead.
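For example, a minimal sketch of a log-domain fit, assuming xdata and ydata from the question are positive over the fitted range (the starting values simply reuse the optimized parameters above):

import numpy as np
from scipy.optimize import curve_fit

def log_func(x, a, b, c):
    # same reparameterized model, compared to the data in log space
    return np.log((a*x)**(1 - b) + (c - a)*x)

# fit log(y) against the log of the model; requires positive x, y, and model values
mask = (xdata > 0) & (ydata > 0)
popt, pcov = curve_fit(log_func, xdata[mask], np.log(ydata[mask]),
                       p0=[3.49e-04, 6.60e-06, 3.13e-08], maxfev=10000)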
When I run your example (after adding imports, etc.), I get NaNs for popt, and I eventually realized you were allowing general, real b with negative x. If I fit to the positive x only, I get a popt of [1.89176133e+01 5.66689997e+00 1.29380532e+08]. The fit isn't too bad (see below), but perhaps you need to restrict b to be an integer to fit the whole set. I'm not sure how to do that in SciPy (I assume you need mixed integer-real optimization, and I haven't investigated whether SciPy supports that); a brute-force workaround is sketched after the code below.
Code:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
cfu_u, OD_u = np.loadtxt('data.txt', skiprows=1).T
# fit to positive x only
posmask = cfu_u > 0
xdata = cfu_u[posmask]
ydata = OD_u[posmask]
def func(x, a, b, c):
    return (a*x)**b + c*x
popt, pcov = curve_fit(func, xdata, ydata, p0=[1000,2,1])
x_vec = np.geomspace(xdata.min(), xdata.max())
plt.plot(x_vec, func(x_vec, *popt), label = 'curve fit',color='slateblue',linewidth = 2.2)
plt.plot(cfu_u,OD_u,'-',label = 'experimental data', marker='x',markersize=8,color='deepskyblue',linewidth = 1.4)
plt.legend(loc='upper left',fontsize=12)
plt.ylabel("Y",fontsize=12)
plt.xlabel("X",fontsize=12)
plt.yscale("log")
plt.xscale("symlog")
plt.show()
print(popt)
# [1.89176133e+01 5.66689997e+00 1.29380532e+08]
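If you do want an integer b, here is a brute-force sketch (not true mixed-integer optimization): enumerate a few candidate exponents and fit only a and c at each fixed b. The exponent range and starting values are assumptions; it continues from the variables above but fits the whole, unmasked dataset.

best = None
for b_int in range(1, 8):            # candidate integer exponents (assumed range)
    def model(x, a, c, b=b_int):     # b is frozen per iteration; only a and c are fitted
        return (a*x)**b + c*x
    try:
        p, _ = curve_fit(model, cfu_u, OD_u, p0=[1.0, 1.0], maxfev=10000)
    except RuntimeError:
        continue                     # skip exponents where the fit fails to converge
    sse = np.sum((model(cfu_u, *p) - OD_u)**2)
    if best is None or sse < best[0]:
        best = (sse, b_int, p)
print(best)                          # (sse, best integer b, array([a, c]))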
I am trying to draw a best-fit curve for my data. It is a terribly bad sample of data, but for simplicity's sake let's say I expect the best fit to be a straight line in log-log scale.
I think I have already done that with a regression, and it returns a reasonable fit line. But I want to double-check it with the curve_fit function in scipy, and I also want to extract the equation of the fit line.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.optimize import curve_fit
x = np.array([1.72724547e-08, 1.81960233e-08, 1.68093027e-08, 2.22839973e-08,
              2.23090589e-08, 4.28020801e-08, 2.30004711e-08, 2.48543008e-08,
              1.08633065e-07, 3.24417303e-08, 3.22946248e-08, 3.82328031e-08,
              3.97713860e-08, 3.44080732e-08, 3.81526816e-08, 3.30756706e-08])
y = np.array([4.18793565e+12, 4.40554864e+12, 4.48745390e+12, 4.50816705e+12,
              4.57088190e+12, 4.60256574e+12, 4.66659380e+12, 4.79733449e+12,
              7.31139083e+12, 7.53355564e+12, 8.03526122e+12, 8.14704284e+12,
              8.47227414e+12, 8.62978548e+12, 8.81048873e+12, 9.46237161e+12])
# Regression function
def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
    p = stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = np.polyval([b1, b0], x)
    return y_pred, p
# regression on the log-transformed data
allx, ally = x, y  # data, non-transformed
y_pred, _ = regress(np.log(allx), np.log(ally))  # transformed input
plt.loglog(allx, ally, marker='$\\star$',color ='g', markersize=5,linestyle='None')
plt.loglog(allx, np.exp(y_pred), "c--", label="regression") # transformed output
# Let's fit a power-law function, y = a*x**b.
# This looks like a straight line on a log-log plot.
def myExpFunc(x, a, b):
    return a * np.power(x, b)
popt, pcov = curve_fit(myExpFunc, x, y, maxfev=1000)
plt.plot(x, myExpFunc(x, *popt), 'r:',
label="({0:.3f}*x**{1:.3f})".format(*popt))
print "Exponential Fit: y = (a*(x**b))"
print "\ta = popt[0] = {0}\n\tb = popt[1] = {1}".format(*popt)
plt.show()
Again, I apologize for the bad dataset; your help will be much appreciated.
My plot looks like this:
Hello, I have a problem fitting some data with Python. I have only just started fitting data in Python, so I am running into some problems. This is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return a*np.power(x, b) + c
x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79])
y = np.array([7200,7925,8050,8200,8000,7550,7500,6800,6400,8150,6566,6280,6105,5963,5673,5495,5395,4800,4550,4558,4228,4087,3951,3817,3721,3612,3498,3416,3359,3269,3163,3241,2984,4475,2757,2644,2555,2600,3163,2720,2630,2543,2454,2441,2389,2339,2293,2261,2212,2180,2143,2450,2065,2032,1994,1960,1930,1897,1870,1838,1821,1785,1763,1741,1718,1689,1676,1662,1635,1635,1667,1633,1617,1615,1599,1581,1565,1547,1547])
params, extras = curve_fit(f, x, y)
plt.plot(x,y, 'o')
plt.plot(x, f(x, params[0], params[1], params[2]))
plt.title('Fit')
plt.legend(['data','fit'],loc='best')
plt.show()
I want to fit my data with a function f(x) = a*x^b + c, where I am looking for the best values of a, b, and c.
Do you know what is wrong?
Thank you for your help.
Three caveats:
your model is not very good;
it diverges at x = 0, so don't use the first points;
you must give initial parameter estimates.
An example:
p0 = [50000, -1, 0]
x = x[10:]
y = y[10:]
params, cov = curve_fit(f, x, y, p0)  # params = [ 3.16e+04 -5.83e-01 -1.00e+03]
plt.plot(x,y, 'o')
plt.plot(x, f(x, *params))
plt.title('Fit')
plt.legend(['data','fit'],loc='best')
plt.show()
You can estimate the quality of the fit from the covariance matrix:

In [178]: np.sqrt(np.diag(cov))/params
Out[178]: array([ 0.12066005, -0.12537714, -0.53450057])

np.sqrt(np.diag(cov)) gives the one-standard-deviation uncertainty of each parameter, so these ratios are relative errors: more than 10% on a and b, and more than 50% on c.
The problem is the function you use for fitting. Consider using something like
def f(x, a, b, c):
    return a*x + b*np.power(x, 2) + c
EDIT: accidentally posted the original function instead of the one I wanted to suggest.
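A minimal usage sketch with this model, reusing the x and y arrays from the question:

params, cov = curve_fit(f, x, y)
plt.plot(x, y, 'o', label='data')
plt.plot(x, f(x, *params), label='quadratic fit')
plt.legend(loc='best')
plt.show()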
I am trying to fit my data with a Gaussian curve. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# The independent variable where the data is measured
x_coord = np.array([-0.1216 , -0.11692308, -0.11224615, -0.10756923, -0.10289231,
-0.09821538, -0.09353846, -0.08886154, -0.08418462, -0.07950769,
-0.07483077, -0.07015385, -0.06547692, -0.0608 , -0.05612308,
-0.05144615, -0.04676923, -0.04209231, -0.03741538, -0.03273846,
-0.02806154, -0.02338462, -0.01870769, -0.01403077, -0.00935385,
-0.00467692, 0. , 0.00467692, 0.00935385, 0.01403077,
0.01870769, 0.02338462, 0.02806154, 0.03273846, 0.03741538,
0.04209231, 0.04676923, 0.05144615, 0.05612308, 0.0608 ,
0.06547692, 0.07015385, 0.07483077, 0.07950769, 0.08418462,
0.08886154, 0.09353846, 0.09821538, 0.10289231, 0.10756923,
0.11224615, 0.11692308])
# The dependent data, nominally f(x_coord)
y = np.array([-0.0221931 , -0.02323915, -0.02414913, -0.0255389 , -0.02652465,
-0.02888672, -0.03075954, -0.03355392, -0.03543005, -0.03839526,
-0.040933 , -0.0456585 , -0.04849097, -0.05038776, -0.0466699 ,
-0.04202133, -0.034239 , -0.02667525, -0.01404582, -0.00122683,
0.01703862, 0.03992694, 0.06704549, 0.11362071, 0.28149172,
0.6649422 , 1. , 0.6649422 , 0.28149172, 0.11362071,
0.06704549, 0.03992694, 0.01703862, -0.00122683, -0.01404582,
-0.02667525, -0.034239 , -0.04202133, -0.0466699 , -0.05038776,
-0.04849097, -0.0456585 , -0.040933 , -0.03839526, -0.03543005,
-0.03355392, -0.03075954, -0.02888672, -0.02652465, -0.0255389 ,
-0.02414913, -0.02323915])
# define a Gaussian function to fit the data
def gaussian(x, a, b, c):
    val = a * np.exp(-(x - b)**2 / c**2)
    return val
# fit the data
popt, pcov = optimize.curve_fit(gaussian, x_coord, y, sigma = np.array([0.01] * len(x_coord)))
# plot the data and the fitting curve
plt.plot(x_coord, y, 'b-', x_coord, gaussian(x_coord, popt[0], popt[1], popt[2]), 'r:')
The figure shows that the fitted curve is completely wrong.
What should I do to obtain a good fit?
This is actually a very nice question that illustrates that finding the right (local) optimum can be very difficult.
Via the p0 argument you can give the optimization routine a hint, where approximately you would expect the optimum.
If you start with the initial guess of [1,0,0.1]:
# fit the data
sigma = np.array([0.01] * len(x_coord))
popt, pcov = optimize.curve_fit(gaussian, x_coord, y, sigma=sigma, p0=[1,0,0.1])
You get the following result:
A couple of notes: You forced curve_fit to fit a bell curve without a constant term. This made things a little awkward.
If you allow an offset d, you get:
# define a Gaussian function with a constant offset d
def gaussian(x, a, b, c, d):
    val = a * np.exp(-(x - b)**2 / c**2) + d
    return val
And obtain the following result:
# fit the data
popt, pcov = optimize.curve_fit(gaussian, x_coord, y)
# plot the data and the fitting curve
plt.plot(x_coord, y, 'b-', x_coord, gaussian(x_coord, *popt), 'r:')
This looks much more like a reasonable fit, although the Gaussian still does not match the data very well.
The very peaked shape suggests that a Laplacian might fit better:
# define a Laplacian function to fit the data
def laplacian(x, a, b, c, d):
    val = a * np.exp(-np.abs(x - b) / c) + d
    return val
# fit the data
popt, pcov = optimize.curve_fit(laplacian, x_coord, y, p0=[1,0,0.01,-0.1])
# plot the data and the fitting curve
plt.plot(x_coord, y, 'b-', x_coord, laplacian(x_coord, *popt), 'r:')
This is the result:
I am trying to fit some data points with y uncertainties in Python. The data are labeled in Python as x, y, and yerr.
I need to do a linear fit to the data in log-log scale. To check whether the fit results are correct, I compare the Python results with the ones from SciDAVis.
I tried curve_fit with
def func(x, a, b):
    return np.exp(a*np.log(x) + np.log(b))
popt, pcov = curve_fit(func, x, y,sigma=yerr)
as well as kmpfit with
def funcL(p, x):
    a, b = p
    return np.exp(a*np.log(x) + np.log(b))

def residualsL(p, data):
    a, b = p
    x, y, errorfit = data
    return (y - funcL(p, x)) / errorfit

a0 = 1
b0 = 0.1
p0 = [a0, b0]
fitterL = kmpfit.Fitter(residuals=residualsL, data=(x, y, yerr))
fitterL.parinfo = [{}, {}]
fitterL.fit(params0=p0)
When I fit the data with either of these without uncertainties (i.e., setting yerr = 1), everything works fine and the results are identical to the ones from SciDAVis. But if I set yerr to the uncertainties from the data file, I get disturbing results:
in Python I get, e.g., a = 0.86, while in SciDAVis I get a = 0.14. I read that the errors are included as weights. Do I have to change anything to calculate the fit correctly, or what am I doing wrong?
Edit: here is an example of a data file (x, y, yerr):
3.942387e-02 1.987800e+00 5.513165e-01
6.623142e-02 7.126161e+00 1.425232e+00
9.348280e-02 1.238530e+01 1.536208e+00
1.353088e-01 1.090471e+01 7.829126e-01
2.028446e-01 1.023087e+01 3.839575e-01
3.058446e-01 8.403626e+00 1.756866e-01
4.584524e-01 7.345275e+00 8.442288e-02
6.879677e-01 6.128521e+00 3.847194e-02
1.032592e+00 5.359025e+00 1.837428e-02
1.549152e+00 5.380514e+00 1.007010e-02
2.323985e+00 6.404229e+00 6.534108e-03
3.355974e+00 9.489101e+00 6.342546e-03
4.384128e+00 1.497998e+01 2.273233e-02
And the results:

In Python:
without uncertainties: a = 0.06216 +/- 0.00650; b = 8.53594 +/- 1.13985
with uncertainties: a = 0.86051 +/- 0.01640; b = 3.38081 +/- 0.22667

In SciDAVis:
without uncertainties: a = 0.06216 +/- 0.08060; b = 8.53594 +/- 1.06763
with uncertainties: a = 0.14154 +/- 0.005731; b = 7.38213 +/- 2.13653
I must be misunderstanding something. Your posted data does not look anything like
f(x, a, b) = np.exp(a*np.log(x) + np.log(b))
The red line is the result of scipy.optimize.curve_fit, the green line is the result of SciDAVis.
My guess is that neither algorithm is converging toward a good fit, so it is not surprising that the results do not match.
I can't explain how SciDAVis finds its parameters, but according to the definitions as I understand them, scipy finds parameters with lower least-squares residuals than SciDAVis:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optimize
def func(x, a, b):
    return np.exp(a*np.log(x) + np.log(b))

def sum_square(residuals):
    return (residuals**2).sum()

def residuals(p, x, y, sigma):
    return 1.0/sigma*(y - func(x, *p))
data = np.loadtxt('test.dat').reshape((-1,3))
x, y, yerr = np.rollaxis(data, axis = 1)
sigma = yerr
popt, pcov = optimize.curve_fit(func, x, y, sigma = sigma, maxfev = 10000)
print('popt: {p}'.format(p = popt))
scidavis = (0.14154, 7.38213)
print('scidavis: {p}'.format(p = scidavis))
print('''\
sum of squares for scipy: {sp}
sum of squares for scidavis: {d}
'''.format(
sp = sum_square(residuals(popt, x = x, y = y, sigma = sigma)),
d = sum_square(residuals(scidavis, x = x, y = y, sigma = sigma))
))
plt.plot(x, y, 'bo', x, func(x,*popt), 'r-', x, func(x, *scidavis), 'g-')
plt.errorbar(x, y, yerr)
plt.show()
yields
popt: [ 0.86051258 3.38081125]
scidavis: (0.14154, 7.38213)
sum of squares for scipy: 53249.9915654
sum of squares for scidavis: 239654.84276