I am trying to fit a curve to X and Y data points using a rational function. It can be done in Matlab using the cftool (http://de.mathworks.com/help/curvefit/rational.html). However, I am looking to do the same in Python. I have tried to use scipy.optimize.curve_fit(), but it initially requires a function, which I don't have.
You already have the function: it is the rational function itself. You only need to define it and run the fit. Because curve_fit expects the fit parameters as separate scalar arguments rather than lists, I added a wrapper function that performs the fit for the specific case of three coefficients in the numerator and three in the denominator (second-degree polynomials, with the constant term of the denominator fixed at 1).
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def rational(x, p, q):
    """
    The general rational function description.
    p is a list with the polynomial coefficients in the numerator
    q is a list with the polynomial coefficients (except the zeroth-order one)
    in the denominator
    The zeroth order coefficient of the denominator polynomial is fixed at 1.
    Numpy stores coefficients in [x**2 + x + 1] order, so the fixed
    zeroth order denominator coefficient must come last.
    """
    return np.polyval(p, x) / np.polyval(q + [1.0], x)
def rational3_3(x, p0, p1, p2, q1, q2):
    # Wrapper with scalar arguments, as required by curve_fit.
    return rational(x, [p0, p1, p2], [q1, q2])
x = np.linspace(0, 10, 100)
y = rational(x, [-0.2, 0.3, 0.5], [-1.0, 2.0])
ynoise = y * (1.0 + np.random.normal(scale=0.1, size=x.shape))
popt, pcov = curve_fit(rational3_3, x, ynoise, p0=(0.2, 0.3, 0.5, -1.0, 2.0))
print(popt)
plt.plot(x, y, label='original')
plt.plot(x, ynoise, '.', label='data')
plt.plot(x, rational3_3(x, *popt), label='fit')
plt.legend()
plt.show()
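The wrapper idea can be generalized to other degrees. The following is only a minimal sketch and not part of the original answer: make_rational is a hypothetical helper name and the data are synthetic. Because the model takes *coeffs, curve_fit cannot infer the number of parameters, so p0 has to be supplied.

```python
import numpy as np
from scipy.optimize import curve_fit

def make_rational(n_num, n_den):
    """Return a curve_fit-compatible model with n_num numerator coefficients
    and n_den free denominator coefficients (constant term fixed at 1).
    Hypothetical helper, for illustration only."""
    def model(x, *coeffs):
        p = list(coeffs[:n_num])
        q = list(coeffs[n_num:n_num + n_den])
        return np.polyval(p, x) / np.polyval(q + [1.0], x)
    return model

# Synthetic, pole-free data (assumed values, not taken from the question).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
true_coeffs = (-0.2, 0.3, 0.5, 0.5, 0.2)
model = make_rational(3, 2)
y = model(x, *true_coeffs)
ynoise = y + rng.normal(scale=0.01, size=x.shape)

# p0 is mandatory here: its length tells curve_fit how many parameters to fit.
p0 = (-0.1, 0.2, 0.4, 0.4, 0.3)
popt, pcov = curve_fit(model, x, ynoise, p0=p0)
print(popt)
```

Rational fits can be sensitive to the starting values, so a p0 whose denominator has no root inside the data range is usually a safer choice.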
I am currently using my beginner-level knowledge of python for some econometric problems I am facing. Until now, this worked perfectly fine. However, my current problem is finding a graph + function for a few interview answers, for example for the following 6 points:
xvalues = [0, 0.2, 0.4, 0.6, 0.8, 1]
yvalues = [0, 0.15, 0.6, 0.49, 0.51, 1]
I've used curve_fit with mixed results. I have no problem with sigmoid and logarithmic functions. But when it comes to polynomial functions, I need to limit the possible y-values the function can have. For 0 <= x <= 1 the following conditions have to apply (I don't care about x < 0 and x > 1):
0 <= y <= 1
Maxima and minima of the function have to be located at said points. This doesn't apply to inflection points, though. Edit for clarity: maxima and minima have to be located only at said points.
As a basis, let's take the following, very simple code that works:
from numpy import arange
from scipy.optimize import curve_fit
def poly6(x, a, b, c, d, e, f):
    return f * (x ** 6) + e * (x ** 5) + d * (x ** 4) + c * (x ** 3) + b * (x ** 2) + a * (x ** 1)
xvalues = [0, 0.2, 0.4, 0.6, 0.8, 1]
yvalues = [0, 0.15, 0.6, 0.49, 0.51, 1]
x = xvalues
y = yvalues
x_line = arange(min(x), max(x), 1)
popt, _ = curve_fit(poly6, x, y)
a, b, c, d, e, f = popt
print("Poly 6:")
print(popt)
How can I efficiently write these conditions down?
I've tried to find an answer, but with underwhelming success. I found it hard to narrow my problem down to a one-liner that other people have already asked.
You can use scipy.optimize.minimize with a penalty term to restrict the possible y values of your function. I only implemented the limits of y being between 0 and 1. I didn't fully understand what you meant by the maxima/minima of the function having to be in the interval 0 <= x <= 1. Or do you mean that the minimum has to be at x=0 and the maximum at x=1? If that's the case, it's fairly easy to add two more penalty terms for those situations.
from scipy.optimize import minimize, curve_fit
import numpy as np
import matplotlib.pyplot as plt
xvalues = np.array([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
yvalues = np.array([0, 0.075, 0.15, 0.375, 0.6, 0.545, 0.49, 0.5, 0.51, 0.755, 1])
def poly6(x, a, b, c, d, e, f):
    return f * (x ** 6) + e * (x ** 5) + d * (x ** 4) + c * (x ** 3) + b * (x ** 2) + a * (x ** 1)
def min_function(params, x, y):
    model = poly6(x, *params)
    residual = ((y - model) ** 2).sum()
    if np.any(model > 1):
        residual += 100 # Just some large value
    if np.any(model < 0):
        residual += 100
    return residual
res = minimize(min_function, x0=(1, 1, 1, 1, 1, 1), args=(xvalues, yvalues))
plt.plot(xvalues, yvalues, label='data')
plt.plot(xvalues, poly6(xvalues, *res.x), label='model')
plt.legend()
This is the resulting fit:
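The penalty above is only evaluated at the data points, so the polynomial can still leave [0, 1] between them. A possible variant, sketched here under my own assumptions (the dense grid x_grid, the quadratic penalty and the weight of 1e3 are arbitrary choices, not part of the answer), checks the bound on a fine grid and uses a smooth penalty so that the optimizer sees a gradient:

```python
import numpy as np
from scipy.optimize import minimize

xvalues = np.array([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
yvalues = np.array([0, 0.075, 0.15, 0.375, 0.6, 0.545, 0.49, 0.5, 0.51, 0.755, 1])

def poly6(x, a, b, c, d, e, f):
    return f * x**6 + e * x**5 + d * x**4 + c * x**3 + b * x**2 + a * x

def objective(params, x, y, x_grid, weight=1e3):
    # Ordinary sum of squared residuals at the data points.
    residual = np.sum((y - poly6(x, *params)) ** 2)
    # Squared amount by which the curve leaves [0, 1] anywhere on the grid.
    g = poly6(x_grid, *params)
    violation = np.clip(g - 1, 0, None) ** 2 + np.clip(-g, 0, None) ** 2
    return residual + weight * np.sum(violation)

x_grid = np.linspace(0, 1, 201)   # assumed resolution
x0 = np.zeros(6)
res = minimize(objective, x0, args=(xvalues, yvalues, x_grid))
print(res.x)
```

A larger weight enforces the bound more strictly at the cost of a harder optimization problem; this is a soft constraint, so small violations can remain.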
First, your test dataset is very small compared to the number of parameters you are trying to fit.
The constraints for 0 < x < 1 are:
local minima/maxima at the given points: y'(x) = 0, y''(x) != 0
0 <= y(x) <= 1
The first constraint cannot be fulfilled by an approximation approach like curve_fit, since the fitted curve will not pass exactly through the given points. You will therefore have to use an interpolation function such as scipy.interpolate.UnivariateSpline (with s=0, so that it interpolates rather than smooths); a spline is continuously differentiable and, when interpolating, passes through all points.
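If the curve must pass through the points and must not create extrema between them, a shape-preserving interpolant is one option. The sketch below swaps in scipy.interpolate.PchipInterpolator, which is not mentioned in the question or in the answers: it is monotone between neighbouring data points, so local extrema can only sit on the data points and the values stay within the range of the data. It is a piecewise cubic, so it does not give a single closed-form polynomial.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

xvalues = np.array([0, 0.2, 0.4, 0.6, 0.8, 1])
yvalues = np.array([0, 0.15, 0.6, 0.49, 0.51, 1])

# Shape-preserving piecewise-cubic interpolant through all six points.
curve = PchipInterpolator(xvalues, yvalues)

# Evaluate on a dense grid to inspect the range of the curve on [0, 1].
x_dense = np.linspace(0, 1, 1001)
y_dense = curve(x_dense)
print(y_dense.min(), y_dense.max())
```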
If you are looking for an approximate solution, you have to define
how important each constraint is
how to measure the deviation from the expected result
and define some kind of loss function that punishes bad results, e.g. if y > 1 return 100000. For this you should use scipy.optimize.minimize, since it lets you supply a custom objective function.
Misread the OP's question
Second, have a look at the reference of the module/function you are using. There you see:
bounds : 2-tuple of array_like, optional
Lower and upper bounds on
parameters. Defaults to no bounds. Each element of the tuple must be
either an array with the length equal to the number of parameters, or
a scalar (in which case the bound is taken to be the same for all
parameters). Use np.inf with an appropriate sign to disable bounds on
all or some parameters.
Thus, the following change will limit all parameters to [0, 1]:
bound=(0,1)
popt, _ = curve_fit(poly6, x, y, bounds=bound)
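Bounds can also be given per parameter, as the documentation excerpt describes. The sketch below is an illustration under assumptions: poly3 is a hypothetical, smaller model chosen to keep the example short, and the choice of which parameter to bound is arbitrary. Note that these bounds act on the fit parameters, not on the values of y.

```python
import numpy as np
from scipy.optimize import curve_fit

def poly3(x, a, b, c):
    # Hypothetical cubic without constant term, used only to keep the example small.
    return c * x**3 + b * x**2 + a * x

xvalues = np.array([0, 0.2, 0.4, 0.6, 0.8, 1])
yvalues = np.array([0, 0.15, 0.6, 0.49, 0.51, 1])

# One bound per parameter: a is restricted to [0, 1], b and c are left free
# by using np.inf with the appropriate sign.
lower = [0.0, -np.inf, -np.inf]
upper = [1.0, np.inf, np.inf]
popt, _ = curve_fit(poly3, xvalues, yvalues, bounds=(lower, upper))
print(popt)
```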
I'm trying to fit the following data
tau = [0.0001, 0.0004, 0.0006, 0.0008, 0.001, 0.0015, 0.002, 0.004, 0.006, 0.008, 0.01, 0.05, 0.1, 0.2, 0.5, 0.6, 0.8, 1.0, 1.5, 2.0, 4.0, 6.0, 8.0, 10.0]
tet = [1.000000000, 0.993790739, 0.965602604, 0.924802378, 0.88010508, 0.778684048, 0.702773729, 0.569882533, 0.544103907, 0.54709633, 0.547347558, 0.543859156, 0.504348651, 0.691909732, 0.351717086, 0.405861814, 0.340536768, 0.301032851, 0.192656835, 0.188915355, 0.100207658, 0.059809495, 0.035968302, 0.024147687]
using a summation with the general formula
$f(x) = \sum_{i=1}^{n} a_i \exp(-x/t_i)$
I'm fitting each number of terms separately. I'm sure I could do it with a for loop or something like that, but I do not know how. So here it goes:
import numpy as np
from scipy.optimize import curve_fit
# Assumption: data holds the tau and tet values listed above.
data = {'tau': np.array(tau), 'tet': np.array(tet)}
def fitfunc_1(x, a, t1):
    return a * np.exp(- x / t1)
popt_tet_1, pcov = curve_fit(fitfunc_1, data['tau'], data['tet'], maxfev=10000, bounds = (0.0, np.inf))
def fitfunc_2(x, a, t1, b, t2):
    return a * np.exp(- x / t1) + b * np.exp(- x / t2)
popt_tet_2, pcov = curve_fit(fitfunc_2, data['tau'], data['tet'], maxfev=10000, bounds = (0.0, np.inf))
def fitfunc_3(x, a, t1, b, t2, c, t3):
    return a * np.exp(- x / t1) + b * np.exp(- x / t2) + c * np.exp(- x / t3)
popt_tet_3, pcov = curve_fit(fitfunc_3, data['tau'], data['tet'], maxfev=10000, bounds = (0.0, np.inf))
However, I need to make sure that the sum of the a_i coefficients (a, b and c) is around 1, meaning a ~ 1, a + b ~ 1 and a + b + c ~ 1 for the one-, two- and three-term fits respectively.
Is there a way to limit scipy's fitting function this way?
Sorry for the noob question I guess
I tried to fit your data to a sum of two exponentials and also to a sum of three exponentials. In both cases the fit is correct only on part of the range, never on the whole range. The difficulty can be understood by plotting the experimental points with a logarithmic scale on the abscissa.
The shape of the pattern looks more like a sum of logistic-type functions than a sum of exponential-type functions.
This suggests that each term of the sum might be of this form:
Thus the whole function to be fitted is:
NOTE: The above is a preliminary study to find a convenient kind of function to fit. The numerical values of the parameters above are only empirical approximations. To get a better fit, one still has to compute the parameters by non-linear regression using an iterative method. The values above can serve as initial values for the iterative process.
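On the original question of keeping the amplitudes summing to 1: one possibility with curve_fit is to eliminate one amplitude so that the sum holds by construction. This is a minimal sketch under assumptions: the function name two_exp_unit_sum and the synthetic data are mine, and it enforces the sum exactly, which is stricter than the "around 1" asked for.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_exp_unit_sum(x, a, t1, t2):
    # The second amplitude is 1 - a, so a + b == 1 holds exactly.
    return a * np.exp(-x / t1) + (1.0 - a) * np.exp(-x / t2)

# Synthetic, noise-free data (assumed values, for illustration only).
x = np.linspace(0.0, 5.0, 50)
y = 0.7 * np.exp(-x / 0.5) + 0.3 * np.exp(-x / 3.0)

# Bounds keep a in [0, 1] and both time constants positive.
p0 = (0.5, 1.0, 2.0)
popt, pcov = curve_fit(
    two_exp_unit_sum, x, y,
    p0=p0,
    bounds=([0.0, 1e-6, 1e-6], [1.0, np.inf, np.inf]),
)
a, t1, t2 = popt
b = 1.0 - a
print(a, b, t1, t2)
```

The same idea extends to three terms by fitting a and b and using 1 - a - b as the third amplitude, although keeping that third amplitude non-negative then needs a constraint that simple box bounds cannot express.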
I am trying to define a function that fits input x and y data of the form:
def nlvh(x,y, xi, yi, H,C):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)
The x and y data are 1-D numpy arrays of the same length. I would like to slice the data so that I can select the first 5 points of x and y, fit those by optimizing C and H in the model, and then move one point ahead and repeat. I have some code that does this for a linear fit over the same data:
for i in np.arange(len(x)):
    xdata = x[i:i + window]
    ydata = y[i:i+window]
    a[i], b[i] = np.polyfit(xdata, ydata,1)
    xdata_avg[i] = np.mean(xdata)
    if i == (lenx - window):
        break
but doing the same thing with the equation defined above appears to be a bit more tricky. x and y appear as the independent and dependent variables, but there are also the parameters xi and yi, which are the first values of x and y in each window.
The end result I would like are two new arrays with H[i] and C[i], where i designates each subsequent window. Does anybody have some insight as to how I can get started?
Following your comment on my previous answer (where you said that you would like xi and yi to be the first values of each sliced x and y array), I am adding another answer. This answer changes the function nlvh and achieves what you want. As in my previous answer, we will use curve_fit from scipy.optimize.
In the code below, I use Python's globals() function to define xi and yi. For every sliced x and y array, xi and yi store the first value of the respective slice. This is the revamped code:
from __future__ import division #For decimal division.
import numpy as np
from scipy.optimize import curve_fit
def nlvh(x, H, C):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)
xdata = np.arange(1,21) #Choose an array for x.
#Choose an array for y.
ydata = np.array([-0.1404996, -0.04353953, 0.35002257, 0.12939468, -0.34259184, -0.2906065,
                  -0.37508709, -0.41583238, -0.511851, -0.39465581, -0.32631751, -0.34403938,
                  -0.592997, -0.34312689, -0.4838437, -0.19311436, -0.20962735, -0.31134191,
                  -0.09487793, -0.55578775])
H_lst, C_lst = [], []
for i in range( len(xdata)-5 ):
    #Select 5 consecutive points of xdata (from index i to i+4).
    xnew = xdata[i: i+5]
    globals()['xi'] = xnew[0]
    #Select 5 consecutive points of ydata (from index i to i+4).
    ynew = ydata[i: i+5]
    globals()['yi'] = ynew[0]
    #Fit function nlvh to data using scipy.optimize.curve_fit
    popt, pcov = curve_fit(nlvh, xnew, ynew, maxfev=100000)
    #Optimal values for H from minimization of sum of the squared residuals.
    H_lst += [popt[0]]
    #Optimal values for C from minimization of sum of the squared residuals.
    C_lst += [popt[1]]
H_arr, C_arr = np.asarray(H_lst), np.asarray(C_lst) #Convert list to numpy arrays.
Your output for H_arr and C_arr will now be the following:
print(H_arr)
>>>[1.0, 1.0, -23.041138662879327, -34.58915200575536, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(C_arr)
>>>[1.0, 1.0, -8.795855063863234, -9.271561975595562, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Following are the plots that you get for the data selected above (xdata, ydata).
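The globals() trick can be avoided by building the model function per window with a closure. The following is a sketch, not the answer's code: make_nlvh and sliding_fit are hypothetical names, the y data are synthetic and positive (np.log(yi) is undefined for yi <= 0), and the loop runs over len(x) - window + 1 windows so that the last full window is included.

```python
import numpy as np
from scipy.optimize import curve_fit

R = 8.314

def make_nlvh(xi, yi):
    """Return nlvh(x, H, C) with xi and yi fixed for one window."""
    def nlvh(x, H, C):
        return ((H - xi * C) / R) * ((1 / xi) - x) + (C / R) * np.log((1 / x) / xi) + np.log(yi)
    return nlvh

def sliding_fit(x, y, window=5):
    """Fit H and C on every window of `window` consecutive points."""
    H, C = [], []
    for i in range(len(x) - window + 1):
        xw, yw = x[i:i + window], y[i:i + window]
        model = make_nlvh(xw[0], yw[0])   # xi, yi = first point of the window
        popt, _ = curve_fit(model, xw, yw, maxfev=100000)
        H.append(popt[0])
        C.append(popt[1])
    return np.asarray(H), np.asarray(C)

# Synthetic positive y values (assumed, for illustration only).
x = np.arange(1, 21, dtype=float)
y = 1.0 + 0.05 * x
H_arr, C_arr = sliding_fit(x, y)
print(H_arr)
print(C_arr)
```

Each window gets its own function object, so no shared state has to be updated between fits.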
You can use curve_fit from scipy.optimize. It will use non-linear least squares to fit the parameters (H, C, xi, yi) of your function nlvh to given input data for x and y.
Try the following code. In the code below, H_arr and C_arr are numpy arrays that contain the fitted values of H and C, respectively, when the function nlvh is fitted to windows of 5 consecutive points of xdata and ydata (xdata and ydata are arrays that I have chosen for x and y; you can choose different arrays here).
from __future__ import division #For decimal division.
import numpy as np
from scipy.optimize import curve_fit
def nlvh(x, H, C, xi, yi):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)
xdata = np.arange(1,21) #Choose an array for x
#Find an array yy for chosen values of parameters (H, C, xi, yi)
yy = nlvh(xdata, H=1.0, C=1.0, xi=1.0, yi=1.0)
print(yy)
>>>[ 0. -0.08337108 -0.13214004 -0.16674217 -0.19358166 -0.21551112 -0.23405222 -0.25011325 -0.26428008 -0.27695274 -0.28841656 -0.2988822 -0.30850967 -0.3174233 -0.3257217 -0.33348433 -0.3407762 -0.34765116 -0.35415432 -0.36032382]
#Add noise to the initially chosen array yy.
y_noise = 0.2 * np.random.normal(size=xdata.size)
ydata = yy + y_noise
print(ydata)
>>>[-0.1404996 -0.04353953 0.35002257 0.12939468 -0.34259184 -0.2906065 -0.37508709 -0.41583238 -0.511851 -0.39465581 -0.32631751 -0.34403938 -0.592997 -0.34312689 -0.4838437 -0.19311436 -0.20962735 -0.31134191 -0.09487793 -0.55578775]
H_lst, C_lst = [], []
for i in range( len(xdata)-5 ):
    #Select 5 consecutive points of xdata (from index i to i+4).
    xnew = xdata[i: i+5]
    #Select 5 consecutive points of ydata (from index i to i+4).
    ynew = ydata[i: i+5]
    #Fit function nlvh to data using scipy.optimize.curve_fit
    popt, pcov = curve_fit(nlvh, xnew, ynew, maxfev=100000)
    #Optimal values for H from minimization of sum of the squared residuals.
    H_lst += [popt[0]]
    #Optimal values for C from minimization of sum of the squared residuals.
    C_lst += [popt[1]]
H_arr, C_arr = np.asarray(H_lst), np.asarray(C_lst) #Convert list to numpy arrays.
Following will be your output of H_arr and C_arr for the chosen values of xdata and ydata.
print(H_arr)
>>>[ -11.5317468 -18.44101926 20.30837781 31.47360697 -14.45018355 24.17226837 39.96761325 15.28776756 -113.15255865 15.71324201 51.56631241 159.38292301 -28.2429133 -60.97509922 -89.48216973]
print(C_arr)
>>>[0.70339652 0.34734507 0.2664654 0.2062776 0.30740565 0.19066498 0.1812445 0.30169133 0.11654544 0.21882872 0.11852967 0.09968506 0.2288574 0.128909 0.11658227]
Currently I'm attempting to use scipy's least squares, or any of its minimization functions, to minimize a function with 5 parameters.
What I would like scipy to do is minimize some function using a standard least squares.
My code is below:
fitfunc1 = lambda p, xx, yy, zz: -(50000*(xx + (p[0] + p[1])*yy +
p[3]))/(1.67*(-p[2]*yy + zz + p[4]))
errfunc1 = lambda p,x11, xx, yy, zz: fitfunc1(p, xx, yy, zz) - x11
x0 = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1], dtype = float)
res3 = leastsq(errfunc1, x0[:], args=(x1, x, y, z))
where x1, x, y, z are all column numpy arrays of the same length about 90x1
I'm currently getting an error that says 'Error: Result from function call is not a proper array of floats'. I've attempted many possibilities and tried to rewrite this the way it is described in examples, but it doesn't seem to work.
In addition: I actually would like to solve the problem:
min sum (f - x1)**2 + (g - y1)**2
where f = f(p, x, y, z) and g = g(p, x, y, z), and x, y, z, x1, y1 are all data; I am trying to find the parameters p (6 of them).
Is this currently possible with least squares? I have attempted using scipy.optimize.minimize, but with the Nelder-Mead method it doesn't seem to work either.
Here is my current code:
def f(phi, psi, theta, xnot, ynot, znot):
    return sum(abs( (-50000*(x[:]+ (psi + phi)*y[:] + xnot)/(1.67*(-
        theta*y[:] + z[:] + znot))) - x1[:]) //
        + abs( (-50000*(-x[:]*(psi + phi) + y[:] + theta*(z[:]) + ynot)/(1.67*(-
        theta*y[:] + z[:] + znot))) - y1[:]))
x0 = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1], dtype = float)
res3 = leastsq(f, x0[:], args=(x1, y1, x, y, z))
I feel as if I am making some mistake that may be obvious to someone more familiar, but this is my first time using scipy. All help would be much appreciated.
I believe your problem is with the shape of the variables:
where x1, x, y, z are all column numpy arrays of the same length about 90x1
It causes your fitfunc1 and errfunc1 functions to return 2D arrays (of shape (90, 1)), whereas the scipy optimization function expects a 1D array.
Try reshaping your arrays, e.g.
x1 = x1.reshape((90,))
and similarly for the rest of your input variables.
This should fix your problem.
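The reshaping, and the two-term objective from the question, can be sketched as follows. This is an illustration under assumptions: f and g below are hypothetical linear stand-ins rather than the model from the question, and the data are synthetic. The point is that leastsq minimizes the sum of squares of whatever 1D vector the residual function returns, so concatenating both residuals gives sum (f - x1)**2 + (g - y1)**2.

```python
import numpy as np
from scipy.optimize import leastsq

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, size=(90, 1))   # column arrays, as in the question
y = rng.uniform(1.0, 2.0, size=(90, 1))
true_p = np.array([0.5, -1.5, 2.0])

def f(p, x, y):
    # Hypothetical first model.
    return p[0] * x + p[1] * y

def g(p, x, y):
    # Hypothetical second model sharing a parameter with f.
    return p[1] * x + p[2] * y

x1 = f(true_p, x, y)   # synthetic "measured" data, shape (90, 1)
y1 = g(true_p, x, y)

def residuals(p, x1, y1, x, y):
    # Flatten the (90, 1) columns to 1D, then stack both residual vectors.
    x1, y1, x, y = (np.ravel(arr) for arr in (x1, y1, x, y))
    return np.concatenate([f(p, x, y) - x1, g(p, x, y) - y1])

p_fit, ier = leastsq(residuals, np.full(3, 0.1), args=(x1, y1, x, y))
print(p_fit, ier)
```

Using np.ravel inside the residual function avoids hard-coding the array length.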
I have used numpy in Python to fit my data to a sigmoidal curve. How can I find the value of x at the y = 50% point of the curve after the data has been fit?
import numpy as np
import pylab
from scipy.optimize import curve_fit
def sigmoid(x, x0, k):
    y = 1 / (1 + np.exp(-k*(x-x0)))
    return y
xdata = np.array([0.0, 1.0, 3.0, 4.3, 7.0, 8.0, 8.5, 10.0, 12.0])
ydata = np.array([0.01, 0.02, 0.04, 0.11, 0.43, 0.7, 0.89, 0.95, 0.99])
popt, pcov = curve_fit(sigmoid, xdata, ydata)
print(popt)
x = np.linspace(-1, 15, 50)
y = sigmoid(x, *popt)
pylab.plot(xdata, ydata, 'o', label='data')
pylab.plot(x,y, label='fit')
pylab.ylim(0, 1.05)
pylab.legend(loc='best')
pylab.show()
You just need to solve the function you found for y(x) = 0.50. You can use one of the root finding tools of scipy, though these only solve for zero, so you need to give your function an offset:
def sigmoid(x, x0, k, y0=0):
    y = 1 / (1 + np.exp(-k*(x-x0))) + y0
    return y
Then it's just a matter of calling the root finding method of choice:
from scipy.optimize import brentq
a = np.min(xdata)
b = np.max(xdata)
x0, k = popt
y0 = -0.50
solution = brentq(sigmoid, a, b, args=(x0, k, y0)) # = 7.142
In addition to your comment:
My code above uses the original popt that was calculated with your code. If you do the curve fitting with the updated sigmoid function (with the offset), popt will also contain a fitted parameter for y0.
You probably don't want this; you'll want the curve fitted with y0 = 0. This can be done by supplying curve_fit with an initial guess containing only two values. That way the default value of y0 in the sigmoid function is used:
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0 = (1,1))
Alternatively, just declare two separate sigmoid functions, one with the offset and one without it.
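For this particular sigmoid, a root finder is not strictly necessary, because the function can be inverted by hand: solving y = 1 / (1 + exp(-k (x - x0))) for x gives x = x0 - ln(1/y - 1) / k, so the 50% point is simply x0. The sketch below illustrates this; sigmoid_inverse is a hypothetical helper name and the starting guess p0 is my own assumption.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))

def sigmoid_inverse(y, x0, k):
    # Solve y = 1 / (1 + exp(-k (x - x0))) for x; valid for 0 < y < 1.
    return x0 - np.log(1 / y - 1) / k

xdata = np.array([0.0, 1.0, 3.0, 4.3, 7.0, 8.0, 8.5, 10.0, 12.0])
ydata = np.array([0.01, 0.02, 0.04, 0.11, 0.43, 0.7, 0.89, 0.95, 0.99])

# Starting guess (an assumption): midpoint near the median x, unit slope.
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0=(np.median(xdata), 1.0))

x_half = sigmoid_inverse(0.5, *popt)   # identical to the fitted x0
x_90 = sigmoid_inverse(0.9, *popt)     # any other level works the same way
print(x_half, x_90)
```

The brentq approach remains the more general one, since it also works for fitted functions that have no closed-form inverse.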