Exponential curve fitting in SciPy - Python

I have two NumPy arrays, x and y. When I try to fit my data with an exponential function using curve_fit (SciPy) and this simple code:
#!/usr/bin/env python
from pylab import *
from scipy.optimize import curve_fit
x = np.array([399.75, 989.25, 1578.75, 2168.25, 2757.75, 3347.25, 3936.75, 4526.25, 5115.75, 5705.25])
y = np.array([109,62,39,13,10,4,2,0,1,2])
def func(x, a, b, c, d):
    return a*np.exp(b-c*x)+d
popt, pcov = curve_fit(func, x, y)
I get the wrong coefficients in popt:
[a,b,c,d] = [1., 1., 1., 24.19999988]
What is the problem?

First comment: since a*exp(b - c*x) = (a*exp(b))*exp(-c*x) = A*exp(-c*x), a or b is redundant. I'll drop b and use:
def func(x, a, c, d):
    return a*np.exp(-c*x)+d
That isn't the main issue. The problem is simply that curve_fit fails to converge to a solution to this problem when you use the default initial guess (which is all 1s). Check pcov; you'll see that it is inf. This is not surprising, because if c is 1, most of the values of exp(-c*x) underflow to 0:
In [32]: np.exp(-x)
Out[32]:
array([  2.45912644e-174,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000])
This suggests that c should be small. A better initial guess is, say, p0 = (1, 1e-6, 1). Then I get:
In [36]: popt, pcov = curve_fit(func, x, y, p0=(1, 1e-6, 1))
In [37]: popt
Out[37]: array([ 1.63561656e+02, 9.71142196e-04, -1.16854450e+00])
This looks reasonable:
In [42]: xx = np.linspace(300, 6000, 1000)
In [43]: yy = func(xx, *popt)
In [44]: plot(x, y, 'ko')
Out[44]: [<matplotlib.lines.Line2D at 0x41c5ad0>]
In [45]: plot(xx, yy)
Out[45]: [<matplotlib.lines.Line2D at 0x41c5c10>]
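If you'd rather not hand-tune p0, a rough guess can often be read straight off the data. Here is a minimal sketch (my own addition, not from the answer above): take the offset from the tail of y, the amplitude from its first point, and a decay rate that spans the x range.

import numpy as np
from scipy.optimize import curve_fit

x = np.array([399.75, 989.25, 1578.75, 2168.25, 2757.75,
              3347.25, 3936.75, 4526.25, 5115.75, 5705.25])
y = np.array([109, 62, 39, 13, 10, 4, 2, 0, 1, 2])

def func(x, a, c, d):
    return a*np.exp(-c*x) + d

d0 = y[-1]                 # offset: the curve has flattened out by the last point
a0 = y[0] - d0             # amplitude: height of the first point above the offset
c0 = 1.0 / (x[-1] - x[0])  # decay rate: lets the exponential vary across the x range
popt, pcov = curve_fit(func, x, y, p0=(a0, c0, d0))
print(popt)  # close to the result from the hand-picked p0 above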

Firstly, I would recommend modifying your equation to a*np.exp(-c*(x-b))+d; otherwise the exponential is always centered on x=0, which may not match your data.
You also need to specify reasonable initial conditions (the 4th argument to curve_fit specifies initial conditions for [a,b,c,d]).
This code fits nicely:
from pylab import *
from scipy.optimize import curve_fit

x = np.array([399.75, 989.25, 1578.75, 2168.25, 2757.75, 3347.25, 3936.75, 4526.25, 5115.75, 5705.25])
y = np.array([109, 62, 39, 13, 10, 4, 2, 0, 1, 2])

def func(x, a, b, c, d):
    return a*np.exp(-c*(x-b))+d

popt, pcov = curve_fit(func, x, y, [100, 400, 0.001, 0])
print(popt)

plot(x, y)
xfine = linspace(400, 6000, 10000)  # dense grid so the fitted curve plots smoothly
plot(xfine, func(xfine, *popt))
show()

Related

Linear curve_fit always yields a slope and y-intercept of 1

I am trying to do a linear fit of some data, but I cannot get curve_fit in Python to give me anything but a slope and y-intercept of 1. Here is an example of my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b):
    return a*x + b
# This is merely a sample of some of my actual data
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]
popt, pcov = curve_fit(func, x, y)
print(popt)
I have also tried giving curve_fit a "guess," but when I do that it gives me an overflow error, which I'm guessing is because the numbers are too large.
Another way of doing this without using curve_fit is to use numpy's polyfit.
import matplotlib.pyplot as plt
import numpy as np
# This is merely a sample of some of my actual data
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]
xp = np.linspace(290, 310, 100)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
print (z)
fig, ax = plt.subplots()
ax.plot(x, y, '.')
ax.plot(xp, p(xp), '-')
plt.show()
This prints the coefficients as [2.10000000e+19 -4.22333333e+21] and plots the data together with the fitted line.
I got something in the same ballpark as Excel's linear fit by using SciPy's basinhopping instead of curve_fit, with a large number of iterations. The iterations take a while to run, and basinhopping requires an error function, but it works without scaling the original data. See the basinhopping docs.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import basinhopping
def func(x0, x_data, y_data):
    # sum of squared residuals for the line x0[0]*x + x0[1]
    error = 0
    for x_val, y_val in zip(x_data, y_data):
        error += (y_val - (x0[0]*x_val + x0[1]))**2
    return error

x_data = [290., 300., 310.]
y_data = [1.87e+21, 2.07e+21, 2.29e+21]

a = 1
b = 1
x0 = [a, b]
minimizer_kwargs = {'method': 'TNC', 'args': (x_data, y_data)}
res = basinhopping(func, x0, niter=1000000, minimizer_kwargs=minimizer_kwargs)
print(res)
This gives x: array([ 7.72723434e+18, -2.38554994e+20]) but if you try again, you'll see this has the problem of non-unique outcomes, although it will give similar ballpark values.
Here's a comparison of the fit with the Excel solution.
Confirmed correct results are returned using:
x = [290., 300., 310.]
y = [300., 301., 302.]
My guess is that magnitudes around 10²¹ are too large for the fit to work well.
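If scale really is the culprit, one workaround (a sketch of my own, not part of the original answer) is to rescale y before fitting and undo the scaling afterwards; for a linear model both parameters scale straight back:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    return a*x + b

x = np.array([290., 300., 310.])
y = np.array([1.87e+21, 2.07e+21, 2.29e+21])

scale = 1e21                 # bring y down to order one
popt, pcov = curve_fit(func, x, y / scale)
a, b = popt * scale          # both linear parameters scale back by the same factor
print(a, b)                  # ~2.1e+19, ~-4.22e+21, matching polyfit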
What you can try doing is taking the logarithm of both sides:
def func(x, a, b):
    # use np.log so this works on arrays (math.log does not);
    # might need to guard against a*x + b <= 0
    return np.log(a*x + b)
# ... code omitted
y = [48.9802253837, 49.0818355602, 49.1828387704]
Then undo the transformation afterwards.
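Putting the pieces together, here is one way the omitted code might look (a sketch under my own assumptions: np.log in place of math.log, and a rough initial guess on the right order of magnitude, without which the optimizer can stall in log space):

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    # fit in log space; assumes a*x + b stays positive over the data
    return np.log(a*x + b)

x = np.array([290., 300., 310.])
y = np.array([1.87e+21, 2.07e+21, 2.29e+21])

p0 = [1e19, -1e21]  # assumed rough starting point, order-of-magnitude only
popt, pcov = curve_fit(func, x, np.log(y), p0=p0)
print(popt)  # slope and intercept, already on the original scale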
Also, for a simple linear approximation there is an easy deterministic method: ordinary least squares has a closed-form solution, so no iteration or initial guess is needed.
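A minimal sketch of that closed form (my own illustration): the least-squares slope is cov(x, y) / var(x), and the intercept follows from the means.

import numpy as np

x = np.array([290., 300., 310.])
y = np.array([1.87e+21, 2.07e+21, 2.29e+21])

# closed-form ordinary least squares: no iteration, no initial guess
slope = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
intercept = y.mean() - slope * x.mean()
print(slope, intercept)  # matches polyfit: ~2.1e+19, ~-4.2233e+21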

Fit bipolar sigmoid in Python

I'm stuck trying to fit a bipolar sigmoid curve. I'd like the standard bipolar sigmoid shape (an S-curve running from one plateau to another), but shifted and stretched. I have the following inputs:
x[0] = 8, x[48] = 2
So over 48 periods I need to drop from 8 to 2 using a bipolar sigmoid function to approximate a nice smooth dropoff. Any ideas how I could derive the curve that would fit those parameters?
Here's what I have so far, but I need to change the sigmoid function:
import math
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

plt.plot([sigmoid(float(z)) for z in range(1, 48)])
plt.show()
You could redefine the sigmoid function like so
def sigmoid(x, a, b, c, d):
    """General sigmoid function
    a adjusts amplitude
    b adjusts y offset
    c adjusts x offset
    d adjusts slope"""
    y = ((a-b) / (1 + np.exp(x-(c/2))**d)) + b
    return y
x = np.arange(49)
y = sigmoid(x, 8, 2, 48, 0.3)
plt.plot(x, y)
Severin's answer is likely more robust, but this should be fine if all you want is a quick and dirty solution.
In [2]: y[0]
Out[2]: 7.9955238269969806
In [3]: y[48]
Out[3]: 2.0044761730030203
From the generic bipolar sigmoid function:
f(x,m,b) = 2/(1+exp(-b*(x-m))) - 1
there are two parameters and two unknowns - the shift m and the scale b.
You have two conditions: f(0) = 8, f(48) = 2.
Take the first condition and express b in terms of m; substitute it into the second condition to get a single non-linear equation, solve that numerically with fsolve from SciPy, and recover b and m.
Here is a related question and answer that uses a similar method: How to random sample lognormal data in Python using the inverse CDF and specify target percentiles?
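A minimal sketch of that approach (with my own assumptions: the raw bipolar sigmoid only spans (-1, 1), so I rescale it to span (2, 8), and I relax the endpoint conditions slightly, since a sigmoid reaches its extremes only asymptotically):

import numpy as np
from scipy.optimize import fsolve

def f(x, m, b):
    # generic bipolar sigmoid, range (-1, 1)
    return 2.0 / (1.0 + np.exp(-b * (x - m))) - 1.0

def y(x, m, b):
    # rescaled to span (2, 8) and decrease with x
    return 5.0 - 3.0 * f(x, m, b)

def conditions(params):
    m, b = params
    # relaxed targets: exactly 8 and 2 are reached only asymptotically
    return [y(0.0, m, b) - 7.99, y(48.0, m, b) - 2.01]

m, b = fsolve(conditions, x0=[24.0, 0.1])
print(m, b)  # ~24.0, ~0.27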
Alternatively, you could also use curve_fit, which might come in handy if you have more than just two data points. The resulting plot passes through the desired data points. I used @lanery's function for the fit; you can of course choose any function you like. This is the code with some inline comments:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def sigmoid(x, a, b, c, d):
    return ((a - b) / (1. + np.exp(x - (c / 2)) ** d)) + b
# one needs at least as many data points as parameters, so I just duplicate the data
xdata = [0., 48.] * 2
ydata = [8., 2.] * 2
# plot data
plt.plot(xdata, ydata, 'bo', label='data')
# fit the data
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0=[1., 1., 50., 0.5])
# plot the result
xdata_new = np.linspace(0, 50, 100)
plt.plot(xdata_new, sigmoid(xdata_new, *popt), 'r-', label='fit')
plt.legend(loc='best')
plt.show()

Curve fitting with SciPy

Why is this fit so bad?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def fit(x, a, b, c, d):
    return a * np.sin(b * x + c) + d
xdata = np.linspace(0, 360, 1000)
ydata = 89.9535 + 60.9535 * np.sin(0.0174 * xdata - 1.5708)
popt, pcov = curve_fit(fit, xdata, ydata)
plt.plot(xdata, 89.9535 + 60.9535 * np.sin(0.0174 * xdata - 1.5708))
plt.plot(xdata, fit(xdata, popt[0], popt[1], popt[2], popt[3]))
plt.show()
The fitted curve seems very strange, or maybe I am misusing curve_fit; thanks for any help. A plot of the result was attached.
curve_fit finds a local minimum for the least-squares problem. In this case, there are many local minima.
One way around this is to use as good an initial guess as possible. For problems with multiple local minima, curve_fit's default of all ones for the initial guess can be pretty bad. For your function, the crucial parameter is b, the frequency. If you know that value will be small, i.e. on the order of 0.01, use 0.01 as the initial guess:
In [77]: (a, b, c, d), pcov = curve_fit(fit, xdata, ydata, p0=[1, .01, 1, 1])
In [78]: a
Out[78]: 60.953499999999998
In [79]: b
Out[79]: 0.017399999999999999
In [80]: c
Out[80]: -102.10176491487339
In [81]: ((c + np.pi) % (2*np.pi)) - np.pi   # wrap the phase into [-pi, pi)
Out[81]: -1.570800000000002
In [82]: d
Out[82]: 89.953500000000005
As an alternative, plot the original data alone and use it to make initial guesses of the parameters. For a periodic function it can be easy to estimate the period and the amplitude. In this case the guesses need not be too close.
Then I used these in curve_fit:
popt, pcov = curve_fit(fit, xdata, ydata, [ 80., np.pi/330, 1., 1. ])
The result it returned is essentially the original values:
array([ 6.09535000e+01, 1.74000000e-02, -1.57080000e+00,
8.99535000e+01])
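To automate those eyeball estimates, here is a sketch of one way to pull the guesses straight from the data (my own addition, not part of the original answer): amplitude and offset from the data's extremes and mean, and the frequency from the dominant FFT bin.

import numpy as np
from scipy.optimize import curve_fit

def fit(x, a, b, c, d):
    return a * np.sin(b * x + c) + d

xdata = np.linspace(0, 360, 1000)
ydata = 89.9535 + 60.9535 * np.sin(0.0174 * xdata - 1.5708)

a0 = (ydata.max() - ydata.min()) / 2   # amplitude guess
d0 = ydata.mean()                      # vertical-offset guess
# frequency of the dominant FFT bin, skipping the DC component
spectrum = np.abs(np.fft.rfft(ydata - d0))
freqs = np.fft.rfftfreq(len(xdata), d=xdata[1] - xdata[0])
b0 = 2 * np.pi * freqs[1 + spectrum[1:].argmax()]

popt, pcov = curve_fit(fit, xdata, ydata, p0=[a0, b0, 0.0, d0])
print(popt)  # ~[60.95, 0.0174, -1.5708 (mod 2*pi), 89.95]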

Wrong fit using scipy curve_fit

I am trying to fit some data to a power-law function with an exponential cutoff. I generate some data with NumPy and try to fit it with scipy.optimize.
Here is my code:
import numpy as np
from scipy.optimize import curve_fit
def func(x, A, B, alpha):
    return A * x**alpha * np.exp(B * x)
xdata = np.linspace(1, 10**8, 1000)
ydata = func(xdata, 0.004, -2*10**-8, -0.75)
popt, pcov = curve_fit(func, xdata, ydata)
print(popt)
The result I am getting is [1, 1, 1], which does not correspond to the data. Am I doing something wrong?
While xnx gave you the answer as to why curve_fit failed here, I thought I'd suggest a different way of approaching the problem of fitting your functional form, one that doesn't rely on gradient descent (and therefore doesn't need a reasonable initial guess).
Note that if you take the log of the function that you are fitting, you get the form

log(y) = log(A) + alpha*log(x) + B*x

which is linear in each of the unknown parameters (log A, alpha, B).
We can therefore use the machinery of linear algebra to solve this by writing the equation in matrix form as

log y = M p

where log y is a column vector of the logs of your ydata points, p is the column vector of the unknown parameters, and M is the matrix whose columns are [1, log x, x]. Or explicitly,

[log y_1]   [1  log x_1  x_1]
[log y_2]   [1  log x_2  x_2]   [log A]
[  ...  ] = [      ...      ] * [alpha]
[log y_n]   [1  log x_n  x_n]   [  B  ]
The best fitting parameter vector can then be found efficiently by using np.linalg.lstsq
Your example problem in code could then be written as
import numpy as np
def func(x, A, B, alpha):
    return A * x**alpha * np.exp(B * x)
A_true = 0.004
alpha_true = -0.75
B_true = -2*10**-8
xdata = np.linspace(1, 10**8, 1000)
ydata = func(xdata, A_true, B_true, alpha_true)
M = np.vstack([np.ones(len(xdata)), np.log(xdata), xdata]).T
logA, alpha, B = np.linalg.lstsq(M, np.log(ydata), rcond=None)[0]

print("A =", np.exp(logA))
print("alpha =", alpha)
print("B =", B)
Which recovers the initial parameters nicely:
A = 0.00400000003736
alpha = -0.750000000928
B = -1.9999999934e-08
Also note that this method is around 25x faster than using curve_fit for the problem at hand:
In [8]: %timeit np.linalg.lstsq(np.vstack([np.ones(len(xdata)), np.log(xdata), xdata]).T, np.log(ydata))
10000 loops, best of 3: 169 µs per loop
In [2]: %timeit curve_fit(func, xdata, ydata, [0.01, -5e-7, -0.4])
100 loops, best of 3: 4.44 ms per loop
Apparently your initial guess (which defaults to [1, 1, 1], since you didn't give one -- see the docs) is too far from the actual parameters for the algorithm to converge. The main problem is probably with B, which, if positive, sends your exponential function to very large values over your provided xdata.
Try providing something a little closer to the actual parameters and it works:
p0 = 0.01, -5e-7, -0.4 # Initial guess for the parameters
popt, pcov = curve_fit(func, xdata, ydata, p0)
print(popt)
Output:
[ 4.00000000e-03 -2.00000000e-08 -7.50000000e-01]

Getting standard error associated with parameter estimates from scipy.optimize.curve_fit

I am using scipy.optimize.curve_fit to fit a curve to some data I have. The curves, for the most part, seem to fit very well. For some reason, though, pcov = inf when I print it.
What I really need is to calculate the error associated with the parameters I'm fitting, and I am not sure how to do this even when it does give me the covariance matrix.
The model being fit to is:
def intensity(x, R_out, R_in, K_in, K_out, a, b, c):
    K_in, K_out = abs(0.0), abs(K_out)
    if x <= R_in:
        return 2*R_out*(K_out*np.sqrt(1 - x**2/R_out**2) -
                        (K_out - 0.0)*np.sqrt(R_in**2/R_out**2 - x**2/R_out**2)) + c
    elif x >= R_in and x <= R_out:
        return K_out*2*R_out*np.sqrt(1 - x**2/R_out**2) + c
    elif x > R_out:
        return c

intensity_vec = np.vectorize(intensity)

def intensity_vec_self(x, R_out, R_in, K_in, K_out, a, b, c):
    y = np.zeros(x.shape)
    for i in range(len(y)):
        y[i] = intensity_vec(x[i], R_out, R_in, K_in, K_out, a, b, c)
    return y
There are 400 data points; I can post them here if you think it will help.
To summarize, I can't get curve_fit to give me a finite pcov and need help figuring out why.
Also, if it has a quick explanation, I would like to know how to use the pcov array to obtain the errors associated with my fit.
Thanks
The variances of the parameters are the diagonal elements of the variance-covariance matrix, and the standard errors are their square roots: np.sqrt(np.diag(pcov)).
Regarding getting inf, see and compare these two examples:
In [129]:
import numpy as np
import scipy.optimize as so

def func(x, a, b, c, d):
    # note: d is never used in the model
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5, 1)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
[ inf  inf  inf  inf]
And:
In [130]:
def func(x, a, b, c):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
[ 0.11097646  0.11849107  0.05230711]
In this extreme example, d has no effect on the function func, hence it is associated with a variance of +inf; in other words, it could take just about any value. Removing d from func, as in the second example, gives results that make sense.
In reality, if the parameters are of very different scales, say:

def func(x, a, b, c, d):
    #return a * np.exp(-b * x) + c
    return a * np.exp(-b * x) + c + d*1e-10

you will also get inf due to floating-point overflow/underflow.
In your case, I think you never used a and b, so it is just like the first example here.
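Pulling it together, a minimal sketch of extracting the standard errors from the well-posed three-parameter example above:

import numpy as np
import scipy.optimize as so

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
ydata = func(xdata, 2.5, 1.3, 0.5) + 0.2 * np.random.normal(size=len(xdata))

popt, pcov = so.curve_fit(func, xdata, ydata)
perr = np.sqrt(np.diag(pcov))  # one-sigma standard errors of a, b, c
for name, value, err in zip('abc', popt, perr):
    print(name, '=', value, '+/-', err)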
