I am trying to fit some data to a power law with an exponential cutoff. I generate the data with NumPy and try to fit it with scipy.optimize.
Here is my code:
import numpy as np
from scipy.optimize import curve_fit

def func(x, A, B, alpha):
    return A * x**alpha * np.exp(B * x)

xdata = np.linspace(1, 10**8, 1000)
ydata = func(xdata, 0.004, -2e-8, -0.75)

popt, pcov = curve_fit(func, xdata, ydata)
print(popt)
The result I am getting is [1. 1. 1.], which does not correspond to the data.
Am I doing something wrong?
Whilst xnx gave you the answer as to why curve_fit failed here, I thought I'd suggest a different way of approaching the problem of fitting your functional form, one which doesn't rely on gradient descent (and therefore on a reasonable initial guess).
Note that if you take the log of the function that you are fitting, you get the form

log y = log A + alpha * log x + B * x

which is linear in each of the unknown parameters (log A, alpha, B).
We can therefore use the machinery of linear algebra to solve this by writing the equation in the form of a matrix as
log y = M p
where log y is a column vector of the logs of your ydata points, p is the column vector of unknown parameters, and M is the matrix whose columns are [1, log x, x], or explicitly

[log y_1]   [1  log x_1  x_1]  [log A]
[log y_2] = [1  log x_2  x_2]  [alpha]
[  ...  ]   [      ...      ]  [  B  ]
The best fitting parameter vector can then be found efficiently by using np.linalg.lstsq
Your example problem in code could then be written as
import numpy as np

def func(x, A, B, alpha):
    return A * x**alpha * np.exp(B * x)

A_true = 0.004
alpha_true = -0.75
B_true = -2e-8

xdata = np.linspace(1, 10**8, 1000)
ydata = func(xdata, A_true, B_true, alpha_true)

M = np.vstack([np.ones(len(xdata)), np.log(xdata), xdata]).T
logA, alpha, B = np.linalg.lstsq(M, np.log(ydata), rcond=None)[0]

print("A =", np.exp(logA))
print("alpha =", alpha)
print("B =", B)
Which recovers the initial parameters nicely:
A = 0.00400000003736
alpha = -0.750000000928
B = -1.9999999934e-08
Also note that this method is around 25x faster than using curve_fit for the problem at hand:
In [8]: %timeit np.linalg.lstsq(np.vstack([np.ones(len(xdata)), np.log(xdata), xdata]).T, np.log(ydata))
10000 loops, best of 3: 169 µs per loop
In [2]: %timeit curve_fit(func, xdata, ydata, [0.01, -5e-7, -0.4])
100 loops, best of 3: 4.44 ms per loop
Apparently your initial guess (which defaults to [1, 1, 1], since you didn't give one -- see the docs) is too far from the actual parameters for the algorithm to converge. The main problem is probably B: if it is positive, the exponential sends your function to enormous values over the provided xdata.
Try providing something a little closer to the actual parameters and it works:
p0 = 0.01, -5e-7, -0.4  # initial guess for the parameters
popt, pcov = curve_fit(func, xdata, ydata, p0)
print(popt)
Output:
[ 4.00000000e-03 -2.00000000e-08 -7.50000000e-01]
Related
I'm doing a curve fit in Python using scipy.optimize.curve_fit, and the fit itself looks great; however, the parameters that are generated don't make sense.
The equation is (ax)^b + cx, but with the parameters Python finds, a = -c and b = 1, so the whole equation just equals 0 for every value of x.
Here is the plot: https://i.stack.imgur.com/fBfg7.png
Here is the experimental raw data I used: https://pastebin.com/CR2BCJji
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# cfu_u and OD_u hold the experimental data from the pastebin above
xdata = cfu_u
ydata = OD_u

min_cfu = 0.1
max_cfu = 9.1
x_vec = pow(10, np.arange(min_cfu, max_cfu, 0.1))

def func(x, a, b, c):
    return (a*x)**b + c*x

popt, pcov = curve_fit(func, xdata, ydata)

plt.plot(x_vec, func(x_vec, *popt), label='curve fit', color='slateblue', linewidth=2.2)
plt.plot(cfu_u, OD_u, '-', label='experimental data', marker='.', markersize=8, color='deepskyblue', linewidth=1.4)
plt.legend(loc='upper left', fontsize=12)
plt.ylabel("Y", fontsize=12)
plt.xlabel("X", fontsize=12)
plt.xscale("log")
plt.gcf().set_size_inches(7, 5)
plt.show()
print(popt)
[ 1.44930871e+03 1.00000000e+00 -1.44930871e+03]
I used the curve_fit function from scipy to fit an exponential curve to some data. The fit looks very good, so that part was a success.
However, the parameters output by curve_fit do not make sense: evaluating f(x) with them gives 0 for every value of x, which is clearly not what is happening in the curve.
Modify your model to show what's actually happening:
def func(x: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    return (a*x)**(1 - b) + (c - a)*x
producing optimized parameters
[3.49003332e-04 6.60420171e-06 3.13366557e-08]
This is likely to be numerically unstable. Try optimizing in the log domain instead.
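For illustration only, a log-domain fit could look something like this (a minimal sketch with made-up positive data; none of these parameter values come from the question):

import numpy as np
from scipy.optimize import curve_fit

def log_func(x, a, b, c):
    # fit log(y) instead of y; assumes (a*x)**b + c*x stays positive
    return np.log((a*x)**b + c*x)

# made-up positive data for illustration
x = np.geomspace(1e2, 1e9, 50)
y = (3e-4 * x)**0.9 + 1e-8 * x

popt, pcov = curve_fit(log_func, x, np.log(y), p0=[1e-4, 1.0, 1e-8], bounds=(0, np.inf))
print(popt)  # should land near [3e-4, 0.9, 1e-8]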
When I run your example (after adding imports, etc.), I get NaNs for popt. I eventually realized you were allowing a general real b with negative x, which makes (a*x)**b undefined. If I fit to the positive x only, I get a popt of [1.89176133e+01 5.66689997e+00 1.29380532e+08]. The fit isn't too bad (see below), but perhaps you need to restrict b to be an integer to fit the whole set. I'm not sure how to do that in SciPy (I assume you need mixed integer-real optimization, and I haven't investigated whether SciPy supports that).
Code:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
cfu_u, OD_u = np.loadtxt('data.txt', skiprows=1).T
# fit to positive x only
posmask = cfu_u > 0
xdata = cfu_u[posmask]
ydata = OD_u[posmask]
def func(x, a, b, c):
    return (a*x)**b + c*x
popt, pcov = curve_fit(func, xdata, ydata, p0=[1000,2,1])
x_vec = np.geomspace(xdata.min(), xdata.max())
plt.plot(x_vec, func(x_vec, *popt), label = 'curve fit',color='slateblue',linewidth = 2.2)
plt.plot(cfu_u,OD_u,'-',label = 'experimental data', marker='x',markersize=8,color='deepskyblue',linewidth = 1.4)
plt.legend(loc='upper left',fontsize=12)
plt.ylabel("Y",fontsize=12)
plt.xlabel("X",fontsize=12)
plt.yscale("log")
plt.xscale("symlog")
plt.show()
print(popt)
# [1.89176133e+01 5.66689997e+00 1.29380532e+08]
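As a rough illustration of the integer-b idea (my own sketch, not a proper mixed-integer method): fix b at each integer in a plausible range, fit only a and c, and keep the b with the smallest residual. Continuing from the code above:

# p0 for a and c is loosely based on the real-valued fit; failed fits are skipped
best = None
for b_int in range(1, 9):
    def func_fixed(x, a, c, b=b_int):
        return (a*x)**b + c*x
    try:
        popt_b, _ = curve_fit(func_fixed, xdata, ydata, p0=[19, 1.3e8])
    except RuntimeError:
        continue
    resid = np.sum((func_fixed(xdata, *popt_b) - ydata)**2)
    if best is None or resid < best[0]:
        best = (resid, b_int, popt_b)
print(best)  # (residual, integer b, [a, c])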
What is the problem with my code for fitting the curve?
I've written some code to fit my data with a Gaussian distribution. However, I get wrong values for a, b, and c, defined at the beginning of the code. Could you give me some advice to fix that problem?
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-(x - b)**2 / (2 * c**2))

data = np.loadtxt("angdist1.xvg", skiprows=18, dtype=float)
x = data[:, 0]
y = data[:, 1]

popt, pcov = curve_fit(func, x, y)

plt.plot(x, y, label='Original')
plt.plot(x, func(x, *popt), color='red', linewidth=2, label='fitting')
plt.legend(loc=0)
plt.show()
You did not provide initial guesses for your variables a, b, and c. scipy.optimize.curve_fit() will make the indefensible choice of silently assuming that you wanted initial values of a=b=c=1. Depending on your data, that could be so far off as to prevent the method from finding any solution at all.
The solution is to give initial values for the variables that are close. They don't have to be perfect. For example,
ainit = y.sum() # amplitude is within 10x of integral
binit = x.mean() # centroid is near mean x value
cinit = x.std() # standard deviation is near range of data
popt, pcov = curve_fit(func, x, y, [ainit, binit, cinit])
might give you a better result.
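As a self-contained check of that advice (with synthetic data, not the angdist1.xvg file; the "true" values here are made up):

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-(x - b)**2 / (2 * c**2))

rng = np.random.default_rng(0)
x = np.linspace(-50, 150, 400)
y = func(x, 120.0, 40.0, 12.0) + rng.normal(0.0, 2.0, x.size)

# data-driven guesses as suggested above: rough, but in the right ballpark
p0 = [y.sum(), x.mean(), x.std()]
popt, pcov = curve_fit(func, x, y, p0=p0)
print(popt)  # should come out close to the true [120, 40, 12]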
I'm stuck trying to fit a bipolar sigmoid curve. I'd like the standard bipolar sigmoid shape, but shifted and stretched. I have the following inputs:
x[0] = 8, x[48] = 2
So over 48 periods I need to drop from 8 to 2 using a bipolar sigmoid function to approximate a nice smooth dropoff. Any ideas how I could derive the curve that would fit those parameters?
Here's what I have so far, but I need to change the sigmoid function:
import math
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

plt.plot([sigmoid(float(z)) for z in range(1, 48)])
plt.show()
You could redefine the sigmoid function like so:

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x, a, b, c, d):
    """General sigmoid function
    a adjusts amplitude
    b adjusts y offset
    c adjusts x offset
    d adjusts slope"""
    y = ((a - b) / (1 + np.exp(x - (c / 2))**d)) + b
    return y

x = np.arange(49)
y = sigmoid(x, 8, 2, 48, 0.3)
plt.plot(x, y)
plt.show()
Severin's answer is likely more robust, but this should be fine if all you want is a quick and dirty solution.
In [2]: y[0]
Out[2]: 7.9955238269969806
In [3]: y[48]
Out[3]: 2.0044761730030203
From the generic bipolar sigmoid function:

f(x, m, b) = 2/(1 + exp(-b*(x - m))) - 1

there are two unknowns: the shift m and the scale b. You have two conditions: f(0) = 8, f(48) = 2.
Take the first condition, express b in terms of m, substitute it into the second condition to get a non-linear equation in one unknown, then use fsolve from SciPy to solve it numerically and recover b and m.
Here is a related question and answer solved by a similar method: How to random sample lognormal data in Python using the inverse CDF and specify target percentiles?
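A minimal sketch of that approach (note: the offset/amplitude rescaling to the range (2, 8) and the 0.1 endpoint tolerance are my own additions, since the bare bipolar sigmoid only spans (-1, 1) and can never reach 8 exactly):

import numpy as np
from scipy.optimize import fsolve

def f(x, m, b):
    # generic bipolar sigmoid, range (-1, 1)
    return 2.0 / (1.0 + np.exp(-b * (x - m))) - 1.0

# rescale to (2, 8) and ask for the endpoint values to within 0.1
def conditions(p):
    m, b = p
    return [5 + 3 * f(0.0, m, b) - 7.9,
            5 + 3 * f(48.0, m, b) - 2.1]

m, b = fsolve(conditions, [24.0, -0.1])
print(m, b)  # roughly m = 24, b = -0.17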
Alternatively, you could also use curve_fit, which might come in handy if you have more than just two data points. The resulting graph passes through the desired data points. I used @lanery's function for the fit; you can of course choose any function you like. This is the code with some inline comments:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def sigmoid(x, a, b, c, d):
    return ((a - b) / (1. + np.exp(x - (c / 2)) ** d)) + b
# one needs at least as many data points as parameters, so I just duplicate the data
xdata = [0., 48.] * 2
ydata = [8., 2.] * 2
# plot data
plt.plot(xdata, ydata, 'bo', label='data')
# fit the data
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0=[1., 1., 50., 0.5])
# plot the result
xdata_new = np.linspace(0, 50, 100)
plt.plot(xdata_new, sigmoid(xdata_new, *popt), 'r-', label='fit')
plt.legend(loc='best')
plt.show()
Why is this fit so bad?
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def fit(x, a, b, c, d):
    return a * np.sin(b * x + c) + d

xdata = np.linspace(0, 360, 1000)
ydata = 89.9535 + 60.9535 * np.sin(0.0174 * xdata - 1.5708)

popt, pcov = curve_fit(fit, xdata, ydata)

plt.plot(xdata, ydata)
plt.plot(xdata, fit(xdata, *popt))
plt.show()
The fitted curve seems very strange, or maybe I am misusing curve_fit. Thanks for any help.
This is the result:
curve_fit finds a local minimum for the least-squares problem. In this case, there are many local minima.
One way around this is to use as good an initial guess as possible. For problems with multiple local minima, curve_fit's default of all ones for the initial guess can be pretty bad. For your function, the crucial parameter is b, the frequency. If you know that value will be small, i.e. on the order of 0.01, use 0.01 as the initial guess:
In [77]: (a, b, c, d), pcov = curve_fit(fit, xdata, ydata, p0=[1, .01, 1, 1])
In [78]: a
Out[78]: 60.953499999999998
In [79]: b
Out[79]: 0.017399999999999999
In [80]: c
Out[80]: -102.10176491487339
In [81]: ((c + np.pi) % (2*np.pi)) - np.pi   # wrap c back into (-pi, pi]
Out[81]: -1.570800000000002
In [82]: d
Out[82]: 89.953500000000005
As an alternative, plot the original data alone and use it to make initial guesses of the parameters. For a periodic function it can be easy to estimate the period and the amplitude. In this case the guesses need not be too close.
Then I used these in curve_fit:
popt, pcov = curve_fit(fit, xdata, ydata, [ 80., np.pi/330, 1., 1. ])
The results it returned are essentially the original values:

array([ 6.09535000e+01,  1.74000000e-02, -1.57080000e+00,  8.99535000e+01])
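If you would rather not read the guesses off a plot, something like this could derive them from the data (my own sketch; the frequency guess assumes roughly one full period is in view, which happens to hold for this xdata):

import numpy as np
from scipy.optimize import curve_fit

def fit(x, a, b, c, d):
    return a * np.sin(b * x + c) + d

xdata = np.linspace(0, 360, 1000)
ydata = 89.9535 + 60.9535 * np.sin(0.0174 * xdata - 1.5708)

# offset from the mean, amplitude from the peak-to-peak range,
# frequency from one assumed period over the x span
d0 = ydata.mean()
a0 = (ydata.max() - ydata.min()) / 2
b0 = 2 * np.pi / (xdata.max() - xdata.min())
popt, pcov = curve_fit(fit, xdata, ydata, p0=[a0, b0, 0.0, d0])
print(popt)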
I am using scipy.optimize.curve_fit to fit a curve to some data I have. The curves, for the most part, seem to fit very well. For some reason, though, pcov = inf when I print it.
What I really need is to calculate the error associated with the parameters I'm fitting, and I am not sure how to do this even if curve_fit does give me the covariance matrix.
The model being fit to is:
def intensity(x, R_out, R_in, K_in, K_out, a, b, c):
    K_in, K_out = abs(0.0), abs(K_out)
    if x <= R_in:
        return 2*R_out*(K_out*np.sqrt(1 - x**2/R_out**2) -
                        (K_out - 0.0)*np.sqrt(R_in**2/R_out**2 - x**2/R_out**2)) + c
    elif x >= R_in and x <= R_out:
        return K_out*2*R_out*np.sqrt(1 - x**2/R_out**2) + c
    elif x > R_out:
        return c

intensity_vec = np.vectorize(intensity)

def intensity_vec_self(x, R_out, R_in, K_in, K_out, a, b, c):
    y = np.zeros(x.shape)
    for i in range(len(y)):
        y[i] = intensity_vec(x[i], R_out, R_in, K_in, K_out, a, b, c)
    return y
and there are 400 data points; I can put them on here if you think it will help.
To summarize, I can't get curve_fit to print my pcov and need help figuring out why, and whether I can get it to do so.
Also, if it is a quick explanation, I would like to know how to use the pcov array to obtain the errors associated with my fit.
Thanks
The variances of the parameters are the diagonal elements of the variance-covariance matrix, and the standard errors are their square roots: np.sqrt(np.diag(pcov)).
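For example (assuming popt and pcov come from your intensity fit; the parameter names just follow your function signature):

perr = np.sqrt(np.diag(pcov))  # one standard error per fitted parameter
for name, value, err in zip(['R_out', 'R_in', 'K_in', 'K_out', 'a', 'b', 'c'], popt, perr):
    print(name, '=', value, '+/-', err)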
Regarding getting inf, see and compare these two examples:
In [129]:

import numpy as np
import scipy.optimize as so

def func(x, a, b, c, d):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5, 1)
ydata = y + 0.2 * np.random.normal(size=len(xdata))

popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))

[ inf  inf  inf  inf]
And:
In [130]:

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5)
ydata = y + 0.2 * np.random.normal(size=len(xdata))

popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))

[ 0.11097646  0.11849107  0.05230711]
In this extreme example, d has no effect on the function func, hence it will be associated with a variance of +inf; in other words, it could take just about any value. Removing d from func gives results that make sense.
In reality, if parameters are of very different scale, say:
def func(x, a, b, c, d):
    # return a * np.exp(-b * x) + c
    return a * np.exp(-b * x) + c + d*1e-10

you will also get inf due to floating-point overflow/underflow.
In your case, I think you never used a and b, so it is just like the first example here.
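Applying that advice to your model might look like this (a sketch under my own assumptions, not a tested drop-in): drop the unused a, b, and K_in, and express the branches with clipping so the function works on arrays without np.vectorize.

import numpy as np

def intensity(x, R_out, R_in, K_out, c):
    x = np.asarray(x, dtype=float)
    # each sqrt term vanishes outside its valid range, reproducing the branches
    outer = np.sqrt(np.clip(1 - x**2 / R_out**2, 0, None))
    inner = np.sqrt(np.clip(R_in**2 / R_out**2 - x**2 / R_out**2, 0, None))
    return 2 * R_out * K_out * (outer - inner) + c

With only parameters that actually affect the output, pcov has a chance of coming back finite.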