I'm trying to fit the following data
tau = [0.0001, 0.0004, 0.0006, 0.0008, 0.001, 0.0015, 0.002, 0.004, 0.006, 0.008, 0.01, 0.05, 0.1, 0.2, 0.5, 0.6, 0.8, 1.0, 1.5, 2.0, 4.0, 6.0, 8.0, 10.0]
tet = [1.000000000, 0.993790739, 0.965602604, 0.924802378, 0.88010508, 0.778684048, 0.702773729, 0.569882533, 0.544103907, 0.54709633, 0.547347558, 0.543859156, 0.504348651, 0.691909732, 0.351717086, 0.405861814, 0.340536768, 0.301032851, 0.192656835, 0.188915355, 0.100207658, 0.059809495, 0.035968302, 0.024147687]
using a summation with the general formula
$f(x) = \sum_{i=1}^{n} a_i \, e^{-x/t_i}$
I'm fitting each sum size separately; I'm sure this could be done with a for loop or a helper function (a sketch of that follows the fits below), but I don't know how to do it. So here it goes:
import numpy as np
from scipy.optimize import curve_fit

def fitfunc_1(x, a, t1):
    return a * np.exp(-x / t1)

popt_tet_1, pcov = curve_fit(fitfunc_1, data['tau'], data['tet'], maxfev=10000, bounds=(0.0, np.inf))

def fitfunc_2(x, a, t1, b, t2):
    return a * np.exp(-x / t1) + b * np.exp(-x / t2)

popt_tet_2, pcov = curve_fit(fitfunc_2, data['tau'], data['tet'], maxfev=10000, bounds=(0.0, np.inf))

def fitfunc_3(x, a, t1, b, t2, c, t3):
    return a * np.exp(-x / t1) + b * np.exp(-x / t2) + c * np.exp(-x / t3)

popt_tet_3, pcov = curve_fit(fitfunc_3, data['tau'], data['tet'], maxfev=10000, bounds=(0.0, np.inf))
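As for doing it in a loop: here is a minimal sketch (not from the original post) that builds the n-term model generically, reusing the imports and data['tau'] / data['tet'] from above:

def make_fitfunc(n):
    # returns a model with 2*n parameters: a_1, t_1, ..., a_n, t_n
    def fitfunc(x, *params):
        return sum(params[2*i] * np.exp(-x / params[2*i + 1]) for i in range(n))
    return fitfunc

n = 3
popt, pcov = curve_fit(make_fitfunc(n), data['tau'], data['tet'],
                       p0=np.ones(2 * n), maxfev=10000, bounds=(0.0, np.inf))

The explicit p0 is required here because curve_fit cannot infer the number of parameters from a *params signature.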
However, I need to make sure that the sum of the a_i coefficients (a, b, and c) is close to 1; that is, a ≈ 1 for one term, a + b ≈ 1 for two, and a + b + c ≈ 1 for three.
Is there a way to limit scipy's fitting function this way?
Sorry for the noob question I guess
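One possible way to get this (a sketch, not from the answer below): reparameterize so the constraint holds by construction, e.g. for the three-term model fix c = 1 - a - b:

def fitfunc_3_constrained(x, a, t1, b, t2, t3):
    c = 1.0 - a - b  # the third amplitude is no longer a free parameter
    return a * np.exp(-x / t1) + b * np.exp(-x / t2) + c * np.exp(-x / t3)

# bound a and b to [0, 1]; note c = 1 - a - b can still dip below 0 if a + b > 1
popt, pcov = curve_fit(fitfunc_3_constrained, data['tau'], data['tet'], maxfev=10000,
                       bounds=([0, 0, 0, 0, 0], [1, np.inf, 1, np.inf, np.inf]))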
I tried to fit your data to a sum of two exponentials and also to a sum of three exponentials. In both cases the fit is good only on part of the range, never on the whole range. The difficulty can be understood by plotting the experimental points with a logarithmic scale on the abscissa axis.
The shape of the pattern looks more like a sum of functions of the logistic kind than a sum of functions of the exponential kind.
This suggests that each term of the sum might be of a logistic form such as $\frac{a_i}{1 + (x/x_i)^{k_i}}$, i.e. a logistic function of $\log x$.
Thus the whole function to be fitted is $f(x) = \sum_{i=1}^{n} \frac{a_i}{1 + (x/x_i)^{k_i}}$.
NOTE: The above is a preliminary study, intended only to find a convenient kind of function to fit. Parameter values estimated empirically this way are only rough approximations. To obtain a better fit, one still has to compute the parameters by non-linear regression using an iterative method; the empirical values can serve as starting values for the iteration.
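A hedged sketch of that refinement step, assuming the two-term logistic form above and reusing the numpy/scipy imports and the tau/tet data from the question (the starting values in p0 are illustrative only, not fitted results):

def logistic_sum(x, a1, x1, k1, a2, x2, k2):
    # sum of two logistic functions of log(x)
    return a1 / (1.0 + (x / x1)**k1) + a2 / (1.0 + (x / x2)**k2)

p0 = (0.45, 0.001, 1.0, 0.55, 0.5, 1.0)  # illustrative starting guesses
popt, pcov = curve_fit(logistic_sum, np.array(tau), np.array(tet), p0=p0, maxfev=20000)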
I am currently using my beginner-level knowledge of python for some econometric problems I am facing. Until now, this worked perfectly fine. However, my current problem is finding a graph + function for a few interview answers, for example for the following 6 points:
xvalues = [0, 0.2, 0.4, 0.6, 0.8, 1]
yvalues = [0, 0.15, 0.6, 0.49, 0.51, 1]
I've used curve_fit with mixed results. I have no problem with sigmoid and logarithmic functions. But when it comes to polynomial functions, I need to limit the possible y-values the function can have. For 0 <= x <= 1 the following conditions have to apply (I don't care about x < 0 and x > 1):
0 <= y <= 1
Maxima and minima of the function have to be located at said points. This doesn't apply to inflection points, though. Edit for clarity: maxima and minima have to be located only at said points.
As a basis, let's take the following, very simple code that works:

from numpy import arange
from scipy.optimize import curve_fit

def poly6(x, a, b, c, d, e, f):
    return f * (x ** 6) + e * (x ** 5) + d * (x ** 4) + c * (x ** 3) + b * (x ** 2) + a * (x ** 1)

xvalues = [0, 0.2, 0.4, 0.6, 0.8, 1]
yvalues = [0, 0.15, 0.6, 0.49, 0.51, 1]

x = xvalues
y = yvalues
x_line = arange(min(x), max(x), 0.01)  # step 0.01; a step of 1 would yield a single point

popt, _ = curve_fit(poly6, x, y)
a, b, c, d, e, f = popt
print("Poly 6:")
print(popt)
How can I efficiently write these conditions down?
I've tried to find an answer, but with underwhelming success. I found it hard to narrow my problem down to a one-liner that other people have already asked about.
You can use scipy.optimize.minimize to enforce bounds on the possible y values of your function. I only implemented the limit of y being between 0 and 1. I didn't fully understand what you meant by the maxima/minima of the function having to be in the interval 0 <= x <= 1. Or do you mean the minimum has to be at x=0 and the maximum at x=1? If that's the case, it's fairly easy to add two new weights for those situations; a sketch follows the plot below.
from scipy.optimize import minimize, curve_fit
import numpy as np
import matplotlib.pyplot as plt

xvalues = np.array([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
yvalues = np.array([0, 0.075, 0.15, 0.375, 0.6, 0.545, 0.49, 0.5, 0.51, 0.755, 1])

def poly6(x, a, b, c, d, e, f):
    return f * (x ** 6) + e * (x ** 5) + d * (x ** 4) + c * (x ** 3) + b * (x ** 2) + a * (x ** 1)

def min_function(params, x, y):
    model = poly6(x, *params)
    residual = ((y - model) ** 2).sum()
    if np.any(model > 1):
        residual += 100  # just some large penalty value
    if np.any(model < 0):
        residual += 100
    return residual

res = minimize(min_function, x0=(1, 1, 1, 1, 1, 1), args=(xvalues, yvalues))

plt.plot(xvalues, yvalues, label='data')
plt.plot(xvalues, poly6(xvalues, *res.x), label='model')
plt.legend()
The code above produces a plot of the resulting fit.
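If the minimum is indeed required at x = 0 and the maximum at x = 1, the two extra weights mentioned above could look like this (a sketch reusing poly6 and the data from the code above; the penalty factors are arbitrary):

def min_function_extrema(params, x, y):
    model = poly6(x, *params)
    residual = ((y - model) ** 2).sum()
    # keep y inside [0, 1], as before
    if np.any(model > 1):
        residual += 100
    if np.any(model < 0):
        residual += 100
    # extra penalties: smallest value at x=0, largest value at x=1
    residual += 100 * (model[0] - model.min()) ** 2
    residual += 100 * (model.max() - model[-1]) ** 2
    return residual

res = minimize(min_function_extrema, x0=(1, 1, 1, 1, 1, 1), args=(xvalues, yvalues))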
First: your test dataset is very small compared to the number of variables you are trying to fit.
The constraints for 0 < x < 1 are:
local minima/maxima at the given points: y'(x) = 0, y''(x) != 0
0 <= y(x) <= 1
The first constraint cannot be fulfilled by an approximation approach like curve_fit, since the fitted curve will not pass exactly through the given points. You would have to use an interpolation function such as scipy.interpolate.UnivariateSpline; by definition, a spline is continuously differentiable and passes through all the points.
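For instance, a minimal interpolation sketch (s=0 forces the spline through every point):

from scipy.interpolate import UnivariateSpline
import numpy as np

xvalues = np.array([0, 0.2, 0.4, 0.6, 0.8, 1])
yvalues = np.array([0, 0.15, 0.6, 0.49, 0.51, 1])

spline = UnivariateSpline(xvalues, yvalues, k=3, s=0)  # s=0: interpolate, don't smooth
x_dense = np.linspace(0, 1, 200)
y_dense = spline(x_dense)  # evaluate the spline on a dense grid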
If you are looking for an approximate solution, you have to define:
how important each constraint is
how to measure the deviation from the expected result
and define yourself some kind of loss function punishing bad results, e.g. if y > 1, return 100000. Here you should use scipy.optimize.minimize, since it supports custom residual functions.
(Note: the following was based on a misreading of the OP's question.)
Second, have a look at the reference documentation of the module/function you are using. There you see:
bounds : 2-tuple of array_like, optional
Lower and upper bounds on parameters. Defaults to no bounds. Each element of the tuple must be either an array with the length equal to the number of parameters, or a scalar (in which case the bound is taken to be the same for all parameters). Use np.inf with an appropriate sign to disable bounds on all or some parameters.
Thus, the following change will limit all parameters to [0, 1]:
bound=(0,1)
popt, _ = curve_fit(poly6, x, y, bounds=bound)
I am trying to fit a data set to the hyperbolic equation using ipython --pylab:
y = ax / (b + x)
Here is my python code:
from scipy import optimize as opti
import numpy as np
from pandas import DataFrame
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8])
y = np.array([0.375, 0.466, 0.509, 0.520, 0.525, 0.536, 0.541])
y_stdev = np.array([0.025, 0.016, 0.009, 0.009, 0.025, 0.019])
def func(x, a, b):
    return a*x / (b + x)

popt, pcov = opti.curve_fit(func, x, y)
print(popt)
print("a = ", popt[0])  # popt is a NumPy array; index it with [], not pandas .ix
print("b = ", popt[1])
The values of a and b should be inside the popt parameter. What I would like to ask is: given that the values of a and b are inferred when fitting the data set to func(x, a, b), how can we estimate the standard deviations of a and b?
Thank you.
The answer is in the docs:
pcov : 2d array
The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov))...
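Applied to the fit above:

perr = np.sqrt(np.diag(pcov))  # one-standard-deviation errors on the parameters
print("a = ", popt[0], "+/-", perr[0])
print("b = ", popt[1], "+/-", perr[1])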
I am trying to fit a curve to X and Y data points using a rational function. It can be done in Matlab using the cftool (http://de.mathworks.com/help/curvefit/rational.html). However, I am looking to do the same in Python. I have tried to use scipy.optimize.curve_fit(), but it initially requires a function, which I don't have.
You do have the function: it is the rational function. So you need to set up the function and perform the fit. Since curve_fit requires that you supply your parameters as scalars rather than lists, I supplied an additional function that does the fit for the specific case of three coefficients in both the numerator and the denominator.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def rational(x, p, q):
    """
    The general rational function description.
    p is a list with the polynomial coefficients in the numerator.
    q is a list with the polynomial coefficients (except the first one)
    in the denominator.
    The zeroth-order coefficient of the denominator polynomial is fixed at 1.
    NumPy stores coefficients in decreasing order of power (as in
    x**2 + x + 1), so the fixed zeroth-order denominator coefficient
    must come last.
    """
    return np.polyval(p, x) / np.polyval(q + [1.0], x)
def rational3_3(x, p0, p1, p2, q1, q2):
    return rational(x, [p0, p1, p2], [q1, q2])

x = np.linspace(0, 10, 100)
y = rational(x, [-0.2, 0.3, 0.5], [-1.0, 2.0])
ynoise = y * (1.0 + np.random.normal(scale=0.1, size=x.shape))

popt, pcov = curve_fit(rational3_3, x, ynoise, p0=(0.2, 0.3, 0.5, -1.0, 2.0))
print(popt)

plt.plot(x, y, label='original')
plt.plot(x, ynoise, '.', label='data')
plt.plot(x, rational3_3(x, *popt), label='fit')
plt.legend()
plt.show()
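If you need other coefficient counts, one possible generalization (not part of the original answer; make_rational is an illustrative name) builds the wrapper dynamically:

def make_rational(n_num, n_den):
    # n_num / n_den: number of free coefficients in the numerator / denominator
    def model(x, *coeffs):
        p = list(coeffs[:n_num])
        q = list(coeffs[n_num:])
        return np.polyval(p, x) / np.polyval(q + [1.0], x)
    return model

# equivalent to rational3_3 above; p0 is required so curve_fit knows the parameter count
popt, pcov = curve_fit(make_rational(3, 2), x, ynoise, p0=(0.2, 0.3, 0.5, -1.0, 2.0))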
I am using scipy.optimize.curve_fit to fit a curve to some data I have. The curves, for the most part, seem to fit very well. For some reason, though, pcov = inf when I print it.
What I really need is to calculate the error associated with the parameters I'm fitting, and I am not sure how to do this even once I have the covariance matrix.
The model being fit to is:
def intensity(x, R_out, R_in, K_in, K_out, a, b, c):
    K_in, K_out = abs(0.0), abs(K_out)
    if x <= R_in:
        return 2*R_out*(K_out*np.sqrt(1 - x**2/R_out**2) -
                        (K_out - 0.0)*np.sqrt(R_in**2/R_out**2 - x**2/R_out**2)) + c
    elif x >= R_in and x <= R_out:
        return K_out*2*R_out*np.sqrt(1 - x**2/R_out**2) + c
    elif x > R_out:
        return c

intensity_vec = np.vectorize(intensity)

def intensity_vec_self(x, R_out, R_in, K_in, K_out, a, b, c):
    y = np.zeros(x.shape)
    for i in range(len(y)):
        y[i] = intensity_vec(x[i], R_out, R_in, K_in, K_out, a, b, c)
    return y
There are 400 data points; I can post them here if you think it will help.
To summarize, I can't get curve_fit to print my pcov and need help figuring out why, and whether I can get it to do so.
Also, if it's a quick explanation, I would like to know how to use the pcov array to obtain the errors associated with my fit.
Thanks
The variances of the parameters are the diagonal elements of the variance-covariance matrix, and the standard errors are their square roots: np.sqrt(np.diag(pcov)).
Regarding getting inf, see and compare these two examples:
In [129]:
import numpy as np
import scipy.optimize as so

def func(x, a, b, c, d):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5, 1)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))

[ inf  inf  inf  inf]
And:
In [130]:
def func(x, a, b, c):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))

[ 0.11097646  0.11849107  0.05230711]
In this extreme example, d has no effect on the function func, hence it is associated with a variance of +inf; in other words, it could take just about any value. Removing d from func gives estimates that make sense.
In reality, if parameters are of very different scale, say:
def func(x, a, b, c, d):
    # return a * np.exp(-b * x) + c
    return a * np.exp(-b * x) + c + d*1e-10

you will also get inf due to floating-point overflow/underflow.
In your case, I think you never used a and b, so it is just like the first example here.
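As a sketch, dropping the unused parameters from the signature should then give a finite pcov; the body is otherwise the same as in the question:

def intensity(x, R_out, R_in, K_out, c):
    K_out = abs(K_out)
    if x <= R_in:
        return 2*R_out*(K_out*np.sqrt(1 - x**2/R_out**2) -
                        K_out*np.sqrt(R_in**2/R_out**2 - x**2/R_out**2)) + c
    elif x <= R_out:
        return K_out*2*R_out*np.sqrt(1 - x**2/R_out**2) + c
    else:
        return c

intensity_vec = np.vectorize(intensity)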
I have used numpy in Python to fit my data to a sigmoidal curve. How can I find the value of x at the y = 50% point on the curve after the data has been fit?
import numpy as np
import pylab
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    y = 1 / (1 + np.exp(-k*(x-x0)))
    return y

xdata = np.array([0.0, 1.0, 3.0, 4.3, 7.0, 8.0, 8.5, 10.0, 12.0])
ydata = np.array([0.01, 0.02, 0.04, 0.11, 0.43, 0.7, 0.89, 0.95, 0.99])

popt, pcov = curve_fit(sigmoid, xdata, ydata)
print(popt)

x = np.linspace(-1, 15, 50)
y = sigmoid(x, *popt)

pylab.plot(xdata, ydata, 'o', label='data')
pylab.plot(x, y, label='fit')
pylab.ylim(0, 1.05)
pylab.legend(loc='best')
pylab.show()
You just need to solve the function you found for y(x) = 0.50. You can use one of the root finding tools of scipy, though these only solve for zero, so you need to give your function an offset:
def sigmoid(x, x0, k, y0=0):
    y = 1 / (1 + np.exp(-k*(x-x0))) + y0
    return y
Then it's just a matter of calling the root finding method of choice:
from scipy.optimize import brentq
a = np.min(xdata)
b = np.max(xdata)
x0, k = popt
y0 = -0.50
solution = brentq(sigmoid, a, b, args=(x0, k, y0)) # = 7.142
In addition to your comment:
My code above uses the original popt that was calculated with your code. If you do the curve fitting with the updated sigmoid function (with the offset), popt will also contain a fitted parameter for y0.
You probably don't want this; you'll want the curve fitted with y0 = 0. That can be done by supplying curve_fit with a guess of only two values. This way, the sigmoid function's default value for y0 is used:
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0 = (1,1))
Alternatively, just declare two separate sigmoid functions, one with the offset and one without it.
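For example, the two-function variant could look like this, reusing xdata and ydata from the question (fit with the plain sigmoid, root-find with the shifted one):

def sigmoid(x, x0, k):
    # plain sigmoid, used for fitting
    return 1 / (1 + np.exp(-k * (x - x0)))

def sigmoid_shifted(x, x0, k, y0):
    # offset version, used only for root finding
    return sigmoid(x, x0, k) + y0

popt, pcov = curve_fit(sigmoid, xdata, ydata)
solution = brentq(sigmoid_shifted, np.min(xdata), np.max(xdata),
                  args=(popt[0], popt[1], -0.5))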