Optimal parameters not found for my curve fitting - python

Hello I have a problem to fit some data with Python. I just begin to fit my data with Python so I have some problems... This is my code :
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import *
from numpy import linalg as LA
def f(x,a,b,c):
return a*np.power(x,b)+c
x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79])
y = np.array([7200,7925,8050,8200,8000,7550,7500,6800,6400,8150,6566,6280,6105,5963,5673,5495,5395,4800,4550,4558,4228,4087,3951,3817,3721,3612,3498,3416,3359,3269,3163,3241,2984,4475,2757,2644,2555,2600,3163,2720,2630,2543,2454,2441,2389,2339,2293,2261,2212,2180,2143,2450,2065,2032,1994,1960,1930,1897,1870,1838,1821,1785,1763,1741,1718,1689,1676,1662,1635,1635,1667,1633,1617,1615,1599,1581,1565,1547,1547])
params, extras = curve_fit(f, x, y)
plt.plot(x,y, 'o')
plt.plot(x, f(x, params[0], params[1], params[2]))
plt.title('Fit')
plt.legend(['data','fit'],loc='best')
plt.show()
And actually I want to fit my data with a function f(x) = a*x^b + c where I am looking for the best values of a, b and c to fit my data.
Do you know where there is something which is wrong ?
Thank you for your help.

Three caveats :
your model is not very good.
it diverge in x=0 : don't take first points.
you must give initial parameter estimations.
An exemple:
p0=[50000,-1,0]
x=x[10:]
y=y[10:]
params, cov = curve_fit(f, x, y,p0) #params=[3.16e+04 -5.83e-01 -1.00e+03]
plt.plot(x,y, 'o')
plt.plot(x, f(x, *params))
plt.title('Fit')
plt.legend(['data','fit'],loc='best')
plt.show()
You can estimate the quality of the model by
In [178]: np.sqrt(np.diag(cov))/params
Out[178]: array([ 0.12066005, -0.12537714, -0.53450057])
which shows that the estimation of error on parameters is greater than 10%.

The problem is the function you use for fitting. Consider using something like
def f(x, a, b, c):
return a*x + b*np.power(x, 2) + c
EDIT: accidentally posted the original function instead of the one I wanted to suggest.

Related

exponential curve fit parameters in python do not make sense--fit itself looks great

I'm doing a curve fit in python using scipy.curve_fit, and the fit itself looks great, however the parameters that are generated don't make sense.
The equation is (ax)^b + cx, but with the params python finds a = -c and b = 1, so the whole equation just equals 0 for every value of x.
here is the plot
(https://i.stack.imgur.com/fBfg7.png)](https://i.stack.imgur.com/fBfg7.png)
here is the experimental raw data I used: https://pastebin.com/CR2BCJji
xdata = cfu_u
ydata = OD_u
min_cfu = 0.1
max_cfu = 9.1
x_vec = pow(10,np.arange(min_cfu,max_cfu,0.1))
def func(x,a, b, c):
return (a*x)**b + c*x
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(x_vec, func(x_vec, *popt), label = 'curve fit',color='slateblue',linewidth = 2.2)
plt.plot(cfu_u,OD_u,'-',label = 'experimental data',marker='.',markersize=8,color='deepskyblue',linewidth = 1.4)
plt.legend(loc='upper left',fontsize=12)
plt.ylabel("Y",fontsize=12)
plt.xlabel("X",fontsize=12)
plt.xscale("log")
plt.gcf().set_size_inches(7, 5)
plt.show()
print(popt)
[ 1.44930871e+03 1.00000000e+00 -1.44930871e+03]
I used the curve_fit function from scipy to fit an exponential curve to some data. The fit looks very good, so that part was a success.
However, the parameters output by the curve_fit function do not make sense, and solving f(x) with them results in f(x)=0 for every value of x, which is clearly not what is happening in the curve.
Modify your model to show what's actually happening:
def func(x: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
return (a*x)**(1 - b) + (c - a)*x
producing optimized parameters
[3.49003332e-04 6.60420171e-06 3.13366557e-08]
This is likely to be numerically unstable. Try optimizing in the log domain instead.
When I run your example (after adding imports, etc.), I get NaNs for popt, and I eventually realized you were allowing general, real b with negative x. If I fit to the positive x only, I get a popt of [1.89176133e+01 5.66689997e+00 1.29380532e+08]. The fit isn't too bad (see below), but perhaps you need to restrict b to be an integer to fit the whole set. I'm not sure how to do that in Scipy (I assume you need mixed integer-real optimization, and I haven't investigated if Scipy supports that.)
Code:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
cfu_u, OD_u = np.loadtxt('data.txt', skiprows=1).T
# fit to positive x only
posmask = cfu_u > 0
xdata = cfu_u[posmask]
ydata = OD_u[posmask]
def func(x, a, b, c):
return (a*x)**b + c*x
popt, pcov = curve_fit(func, xdata, ydata, p0=[1000,2,1])
x_vec = np.geomspace(xdata.min(), xdata.max())
plt.plot(x_vec, func(x_vec, *popt), label = 'curve fit',color='slateblue',linewidth = 2.2)
plt.plot(cfu_u,OD_u,'-',label = 'experimental data', marker='x',markersize=8,color='deepskyblue',linewidth = 1.4)
plt.legend(loc='upper left',fontsize=12)
plt.ylabel("Y",fontsize=12)
plt.xlabel("X",fontsize=12)
plt.yscale("log")
plt.xscale("symlog")
plt.show()
print(popt)
#[ 1.44930871e+03 1.00000000e+00 -1.44930871e+03]

Fitting the curve on the gaussian

What is the problem on my code for fitting the curve?
I've written some code for fitting my data based on Gaussian distribution. However, I got some wrong value of a, b, c defined at the beginning of the code. Could you give me some advice to fix that problem?
from numpy import *
from scipy.optimize import curve_fit
def func(x, a, b, c):
return a*exp(-(x-b)**2/(2*c**2))
file = loadtxt("angdist1.xvg", skiprows = 18, dtype = float)
x = []
y = []
for i in range(shape(file)[0]):
x.append(file[i,0])
y.append(file[i,1])
popt, pcov = curve_fit(func, x, y)
plt.plot(x, func(x, *popt), color = 'red', linewidth=2)
plt.legend(['Original','fitting'], loc=0)
plt.show()
You did not provide initial guesses for your variables a, b, and c. scipy.optimize.curve_fit() will make the indefensible choice of silently assuming that you wanted initial values of a=b=c=1. Depending on your data, that could be so far off as to prevent the method from finding any solution at all.
The solution is to give initial values for the variables that are close. They don't have to be perfect. For example,
ainit = y.sum() # amplitude is within 10x of integral
binit = x.mean() # centroid is near mean x value
cinit = x.std() # standard deviation is near range of data
popt, pcov = curve_fit(func, x, y, [ainit, binit, cinit])
might give you a better result.

How to fit a specific exponential function with numpy

I'm trying to fit a series of data to a exponential equation, I've found some great answer here: How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting But it didn't contain the step forward that I need for this question.
I'm trying to fit y and x against a equation: y = -AeBx + A. The final A has proven to be a big trouble and I don't know how to transform the equation like log(y) = log(A) + Bx as if the final A was not there.
Any help is appreciated.
You can always just use scipy.optimize.curve_fit as long as your equation isn't too crazy:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as sio
def f(x, A, B):
return -A*np.exp(B*x) + A
A = 2
B = 1
x = np.linspace(0,1)
y = f(x, A, B)
scale = (max(y) - min(y))*.10
noise = np.random.normal(size=x.size)*scale
y += noise
fit = sio.curve_fit(f, x, y)
plt.scatter(x, y)
plt.plot(x, f(x, *fit[0]))
plt.show()
This produces:

Curve fit in a log-log plot in matplotlib and getting the equation of line

I am trying to draw a best fit curve for my data. It is terribly bad sample of data, but for simplicity's sake let's say, I expect to draw a straight line as a best fit in log-log scale.
I think I already did that with regression and it returns me a reasonable fit line. But I want to double check it with curve fit function in scipy. And I also want to extract the equation of the fit line.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.optimize as optimization
x = np.array([ 1.72724547e-08, 1.81960233e-08, 1.68093027e-08, 2.22839973e-08,
2.23090589e-08, 4.28020801e-08, 2.30004711e-08, 2.48543008e-08,
1.08633065e-07, 3.24417303e-08, 3.22946248e-08, 3.82328031e-08,
3.97713860e-08, 3.44080732e-08, 3.81526816e-08, 3.30756706e-08
])
y = np.array([ 4.18793565e+12, 4.40554864e+12, 4.48745390e+12, 4.50816705e+12,
4.57088190e+12, 4.60256574e+12, 4.66659380e+12, 4.79733449e+12, 7.31139083e+12, 7.53355564e+12, 8.03526122e+12, 8.14704284e+12,
8.47227414e+12, 8.62978548e+12, 8.81048873e+12, 9.46237161e+12
])
# Regression Function
def regress(x, y):
"""Return a tuple of predicted y values and parameters for linear regression."""
p = sp.stats.linregress(x, y)
b1, b0, r, p_val, stderr = p
y_pred = sp.polyval([b1, b0], x)
return y_pred, p
# plotting z
allx, ally = x, y # data, non-transformed
y_pred, _ = regress(np.log(allx), np.log(ally)) # change here # transformed input
plt.loglog(allx, ally, marker='$\\star$',color ='g', markersize=5,linestyle='None')
plt.loglog(allx, np.exp(y_pred), "c--", label="regression") # transformed output
# Let's fit an exponential function.
# This looks like a line on a lof-log plot.
def myExpFunc(x, a, b):
return a * np.power(x, b)
popt, pcov = curve_fit(myExpFunc, x, y, maxfev=1000)
plt.plot(x, myExpFunc(x, *popt), 'r:',
label="({0:.3f}*x**{1:.3f})".format(*popt))
print "Exponential Fit: y = (a*(x**b))"
print "\ta = popt[0] = {0}\n\tb = popt[1] = {1}".format(*popt)
plt.show()
Again I apologize for a bad dataset. your help will be very appreciated.
My plot looks like this:
enter code here

Linear curve_fit always yields a slope and y-intercept of 1

I am trying to do a linear fit of some data, but I cannot get curve_fit in Python to give me anything but a slope and y-intercept of 1. Here is an example of my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b):
return a*x + b
# This is merely a sample of some of my actual data
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]
popt, pcov = curve_fit(func, x, y)
print popt
I have also tried giving curve_fit a "guess," but when I do that it gives me an overflow error, which I'm guessing is because the numbers are too large.
Another way of doing this without using curve_fit is to use numpy's polyfit.
import matplotlib.pyplot as plt
import numpy as np
# This is merely a sample of some of my actual data
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]
xp = np.linspace(290, 310, 100)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
print (z)
fig, ax = plt.subplots()
ax.plot(x, y, '.')
ax.plot(xp, p(xp), '-')
plt.show()
This prints the coefficients as [2.10000000e+19 -4.22333333e+21] and produces the following graph:
I got something in the ballpark as Excel's linear fit by using scipy basinhopping instead of curve_fit with a large number of iterations. It takes a bit to run the iterations and it also requires an error function, but it was done without scaling the original data. Basinhopping docs.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import basinhopping
def func( x0, x_data, y_data ):
error = 0
for x_val, y_val in zip(x_data, y_data):
error += (y_val - (x0[0]*x_val + x0[1]))**2
return error
x_data = [290., 300., 310.]
y_data = [1.87e+21, 2.07e+21, 2.29e+21]
a = 1
b = 1
x0 = [a, b]
minimizer_kwargs = { 'method': 'TNC', 'args': (x_data, y_data) }
res = basinhopping(func, x0, niter=1000000, minimizer_kwargs=minimizer_kwargs)
print res
This gives x: array([ 7.72723434e+18, -2.38554994e+20]) but if you try again, you'll see this has the problem of non-unique outcomes, although it will give similar ballpark values.
Here's a comparison of the fit with the Excel solution.
Confirmed correct results are returned using:
x = [290., 300., 310.]
y = [300., 301., 302.]
My guess is magnitudes ≅ 10²¹ are too large for the function to work well.
What you can try doing is taking the logarithm of both sides:
def func(x, a, b):
# might need to check if ≤ 0.0
return math.log(a*x + b)
# ... code omitted
y = [48.9802253837, 49.0818355602, 49.1828387704]
Then undo the transformation afterwards.
Also for simple linear approximation, there is an easy deterministic method.

Categories

Resources