I have a function Imaginary which describes a physics process and I want to fit this to a dataset x_interpolate, y_interpolate. The function is a form of a Lorentzian peak function and I have some initial values that are user given, except for f_peak (the peak location) which I find using a peak finding algorithm. All of the fit parameters, except for the offset, are expected to be positive and thus I have set bounds_I accordingly.
def Imaginary(freq, alpha, res, Ms, off):
    numerator = (2*alpha*freq*res**2)
    denominator = (4*(alpha*res*freq)**2) + (res**2 - freq**2)**2
    Im = Ms*(numerator/denominator) + off
    return Im
pI = np.array([alpha_init, f_peak, Ms_init, 0])
bounds_I = ([0, 0, 0, -np.inf], [np.inf, np.inf, np.inf, np.inf])
poptI, pcovI = curve_fit(Imaginary, x_interpolate, y_interpolate, pI, bounds=bounds_I)
In some situations I want to keep the parameter f_peak fixed during the fitting process. I tried an easy solution by changing bounds_I to:
bounds_I = ([0, f_peak-0.001, 0, -np.inf], [np.inf, f_peak+0.001, np.inf, np.inf])
This is, for many reasons, not an optimal way of doing it, so I was wondering whether there is a more Pythonic way? Thank you for your help.
If a parameter is fixed, it is not really a parameter, so it should be removed from the list of parameters. Define a model that has that parameter replaced by a fixed value, and fit that. Example below, simplified for brevity and to be self-contained:
import numpy as np
from scipy.optimize import curve_fit

x = np.arange(10)
y = np.sqrt(x)

def parabola(x, a, b, c):
    return a*x**2 + b*x + c

fit1 = curve_fit(parabola, x, y)  # fit1[0] is [-0.02989396, 0.56204598, 0.25337086]
b_fixed = 0.5
fit2 = curve_fit(lambda x, a, c: parabola(x, a, b_fixed, c), x, y)
The second call returns fit2[0] = [-0.02350478, 0.35048631], the optimal values of a and c; b was fixed at 0.5.
Of course, the parameter should be removed from the initial vector pI and the bounds as well.
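Applied to the question's Imaginary function, a minimal sketch of the same idea (assuming f_peak, alpha_init, Ms_init, x_interpolate and y_interpolate are defined as in the question) might look like:

pI_fixed = np.array([alpha_init, Ms_init, 0])                 # res removed from the parameters
bounds_fixed = ([0, 0, -np.inf], [np.inf, np.inf, np.inf])    # bounds for alpha, Ms, off only
poptI, pcovI = curve_fit(
    lambda freq, alpha, Ms, off: Imaginary(freq, alpha, f_peak, Ms, off),
    x_interpolate, y_interpolate, pI_fixed, bounds=bounds_fixed)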
You might find lmfit (https://lmfit.github.io/lmfit-py/) helpful. This library adds a higher-level interface to the scipy optimization routines, aiming for a more Pythonic approach to optimization and curve fitting. For example, it uses Parameter objects to allow setting bounds and fixing parameters without having to modify the objective or model function. For curve fitting, it defines a high-level Model class that can be used.
For your example, you could use your Imaginary function as you've written it with
from lmfit import Model
lmodel = Model(Imaginary)
and then create Parameters (lmfit will name the Parameter objects according to your function signature), providing initial values:
params = lmodel.make_params(alpha=alpha_init, res=f_peak, Ms=Ms_init, off=0)
By default, all Parameters are unbounded and will vary in the fit, but you can modify these attributes (without rewriting the model function):
params['alpha'].min = 0
params['res'].min = 0
params['Ms'].min = 0
You can set one (or more) of the parameters to not vary in the fit with:
params['res'].vary = False
To be clear: this does not require altering the model function, making it much easier to change which parameters are fixed, what bounds are imposed, and so forth.
You would then perform the fit with the model and these parameters:
result = lmodel.fit(y_interpolate, params, freq=x_interpolate)
You can get a report of fit statistics, best-fit values, and uncertainties for the parameters with
print(result.fit_report())
The best fit Parameters will be held in result.params.
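Putting the pieces above together, a minimal end-to-end sketch (assuming Imaginary, alpha_init, f_peak, Ms_init, x_interpolate and y_interpolate from the question) might be:

from lmfit import Model

lmodel = Model(Imaginary)
params = lmodel.make_params(alpha=alpha_init, res=f_peak, Ms=Ms_init, off=0)
for name in ('alpha', 'res', 'Ms'):
    params[name].min = 0          # positivity bounds
params['res'].vary = False        # hold the peak position fixed

result = lmodel.fit(y_interpolate, params, freq=x_interpolate)
print(result.fit_report())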
FWIW, lmfit also has builtin Models for many common forms, including Lorentzian and a Constant offset. So, you could construct this model as
from lmfit.models import LorentzianModel, ConstantModel
mymodel = LorentzianModel(prefix='l_') + ConstantModel()
params = mymodel.make_params()
which will have Parameters named l_amplitude, l_center, l_sigma, and c (where c is the constant) and the model will use the name x for the independent variable (your freq). This approach can become very convenient when you may want to change the functional form of the peaks or background, or when fitting multiple peaks to a spectrum.
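A brief sketch of fitting with these builtin models (the initial guesses below are placeholders; substitute values appropriate to your data):

from lmfit.models import LorentzianModel, ConstantModel

mymodel = LorentzianModel(prefix='l_') + ConstantModel()
params = mymodel.make_params(l_amplitude=1, l_center=f_peak, l_sigma=1, c=0)
params['l_center'].vary = False    # optionally fix the peak position

result = mymodel.fit(y_interpolate, params, x=x_interpolate)
print(result.fit_report())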
I was able to solve this issue for an arbitrary number of parameters and an arbitrary positioning of the fixed parameters:
from numpy import asarray, append
from scipy.optimize import curve_fit

def d_fit(x, y, param, boundMi, boundMx, listparam):
    # split the parameters into fitted (S...) and fixed (N...) groups
    Sparam, SboundMi, SboundMx = asarray([]), asarray([]), asarray([])
    Nparam = asarray([])
    for i in range(len(param)):
        if listparam[i] == 1:
            Sparam = append(Sparam, asarray(param[i]))
            SboundMi = append(SboundMi, asarray(boundMi[i]))
            SboundMx = append(SboundMx, asarray(boundMx[i]))
        else:
            Nparam = append(Nparam, asarray(param[i]))

    def funF(x, Sparam):
        # rebuild the full parameter list from the fitted and fixed values
        j = 0
        for i in range(len(param)):
            if listparam[i] == 1:
                param[i] = Sparam[i - j]
            else:
                param[i] = Nparam[j]
                j = j + 1
        return fun(x, param)

    return curve_fit(lambda x, *Sparam: funF(x, Sparam), x, y,
                     p0=Sparam, bounds=(SboundMi, SboundMx))
In this case:
param = [a,b,c,...] # parameters array (any size)
boundMi = [min_a, min_b, min_c,...] # minimum allowable value of each parameter
boundMx = [max_a, max_b, max_c,...] # maximum allowable value of each parameter
listparam = [0,1,1,0,...] # 1 = fit and 0 = fix the corresponding parameter in the fit routine
and the root function is defined as

def fun(x, param):
    a, b, c, d, ... = param
    return a*b/c + ...   # any function of x and the parameters a, b, c, d, ...
This way, you can change the root function and the number of parameters without changing the fit routine.
And, at any time, you can fix or let fit any parameter by changing "listparam".
Use like this:

popt, pcov = d_fit(x, y, param, boundMi, boundMx, listparam)

"popt" is a 1D array whose length equals the number of 1s in "listparam", holding the best-fit values of the fitted parameters, and "pcov" is the corresponding covariance matrix.
"param" will remain a 1D array of the same size as the original (input) "param"; however, it is updated in place to the fitted values (the same as "popt") for the fitted entries, while the fixed entries keep their values according to "listparam".
Hope it can be useful!
Obs1: x is a 1D array of independent values and y is a 1D array of dependent values.
Obs2: This is my first post. Please let me know if I can improve it!
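For concreteness, a minimal hypothetical example of using d_fit (the root function and numbers below are made up purely for illustration) might look like:

import numpy as np

def fun(x, param):
    a, b, c = param
    return a*np.exp(-x/b) + c

x = np.linspace(0, 10, 50)
y = fun(x, [2.0, 3.0, 0.5]) + 0.01*np.random.randn(x.size)

param     = [1.0, 3.0, 0.0]       # initial values; b starts at its fixed value
boundMi   = [0.0, 0.1, -10.0]
boundMx   = [10.0, 10.0, 10.0]
listparam = [1, 0, 1]             # fit a and c, keep b fixed

popt, pcov = d_fit(x, y, param, boundMi, boundMx, listparam)
print(popt)    # best-fit values of a and c
print(param)   # full parameter list, updated in place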
Related
I have the following function, which I need to fit using the least-squares method (I am using lmfit):
y = a * exp(-x/b) + c
For example, I have the following data:
profitlist = [-10000, 100.00, 1000.00, 100000.00, 1000000.00]
utilitylist = [0, 0.2, 0.4, 0.6, 1]
The app returns the following error:
ValueError: NaN values detected in your input data or the output of your objective/model function - fitting algorithms cannot handle this! Please read https://lmfit.github.io/lmfit-py/faq.html#i-get-errors-from-nan-in-my-fit-what-can-i-do for more information.
The problem seems to be that exp(-x/b) returns inf if profitList contains a large negative number (-1000 worked, -100000 did not), so it probably overflows.
The values in profitList can be very large floats, and they are not always the same. So how can I optimize with these huge numbers? It seems that lmfit does not support Python's Decimal type, which might otherwise fix the issue... What can I do to make it work?
import numpy as np
from lmfit import Parameters, minimize, fit_report

class LeastSquares:
    def __init__(self, profitList, utilityList):
        self.profitList = np.asarray(profitList)
        self.utilityList = np.asanyarray(utilityList)

    def function(self, params, x):
        a = params["a"]
        b = params["b"]
        c = params["c"]
        return a * np.exp(-x/b) + c

    def residual(self, params, x, y):
        return (y - self.function(params, x))**2

    def setParameters(self, a_start, b_start, c_start):
        parameters = Parameters()
        parameters.add(name="a", value=a_start, min=None, max=0, vary=True)
        parameters.add(name="b", value=b_start, vary=True, min=0.1, max=None)
        parameters.add(name="c", value=c_start, vary=True)
        return parameters

    def startOptimalization(self):
        parameters = self.setParameters(-1, 1, 1)
        result = minimize(self.residual, parameters,
                          args=(self.profitList, self.utilityList), method="leastsq")
        result.params.pretty_print()
        print(fit_report(result))
        print("SSE")
        print(np.sum(result.residual))
Note that numpy.exp(arg) overflows to infinity for any argument greater than ~709, and you will need to avoid such extreme values; the underlying solvers simply cannot handle them. Since your argument is -x/b, you need to make sure that b is not so small as to blow up the argument to numpy.exp().
In fact, your code shows that you do set a lower bound on b of 0.1.
But with values of profitList extending to 1e6, that lower bound is far too small to prevent infinity: the lower limit on b would have to be around 1,400.
If your values for profitList change for each optimization run, you may need to do something like this (in your startOptimalization):
parameters = self.setParameters(-1, 1, 1)
parameters['b'].min = max(abs(self.profitList))/700.0
result = minimize(self.residual, parameters, args=(self.profitList, self.utilityList), method="leastsq")
result.params.pretty_print()
Also, when fitting exponential behavior, it is often helpful to compute your exponential model function and then take the residual as the difference between the logarithm of your data and the logarithm of your model, effectively doing the fit in log space, just as you would likely plot the data.
And, finally, don't take the square or the sum of squares of the difference yourself; just return the residual array with its sign intact. That is, you will probably be better off using something like:
def residual(self, params, x, y):
    return np.log(y) - np.log(self.function(params, x))
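Putting these suggestions together, a rough sketch of the adjusted methods (same class as above; note that the log-space residual requires strictly positive y values, so it may need adjusting for the zero in utilitylist) might be:

def residual(self, params, x, y):
    # plain signed residual in log space; leastsq squares and sums it internally
    return np.log(y) - np.log(self.function(params, x))

def startOptimalization(self):
    parameters = self.setParameters(-1, 1, 1)
    # keep -x/b small enough that np.exp() cannot overflow (|arg| < ~709)
    parameters['b'].min = max(abs(self.profitList)) / 700.0
    result = minimize(self.residual, parameters,
                      args=(self.profitList, self.utilityList), method="leastsq")
    result.params.pretty_print()
    print(fit_report(result))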
I am trying to fit some experimental data to a nonlinear function with one parameter that includes an arccosine, which is therefore limited in its domain of definition to [-1, 1]. I use scipy's curve_fit to find the parameter of the function, but it returns the following error:
RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 400.
The function I want to fit is this one:
def fitfunc(x, a):
    y = np.rad2deg(np.arccos(x*np.cos(np.deg2rad(a))))
    return y
For the fitting, I provide numpy arrays for x and y, respectively, which contain values in degrees (which is why the function converts to and from radians).
param, param_cov = curve_fit(fitfunc, xs, ys)
When I use other fit functions, for example a polynomial, curve_fit returns values; the error mentioned above occurs only with this function that includes an arccosine.
I suspect that it cannot fit the data points because, depending on the parameter of the arccosine, some data points do not lie inside its domain of definition. I have tried raising the number of iterations (maxfev), but without success.
Sample data:
ys = np.array([113.46125, 129.4225, 140.88125, 145.80375, 145.4425,
146.97125, 97.8025, 112.91125, 114.4325, 119.16125,
130.13875, 134.63125, 129.4375, 141.99, 139.86,
138.77875, 137.91875, 140.71375])
xs = np.array([2.786427013, 3.325624466, 3.473013087, 3.598247534, 4.304280248,
4.958273121, 2.679526725, 2.409388637, 2.606306639, 3.661558062,
4.569923009, 4.836843789, 3.377013596, 3.664550526, 4.335401233,
3.064199519, 3.97155254, 4.100567011])
As HS-nebula mentioned in the comments, you need to define an initial value a0 of a as a starting guess for the curve fitting. Moreover, you need to be careful when choosing a0, as np.arccos() is only defined on [-1, 1] and choosing the wrong a0 results in an error.
import numpy as np
from scipy.optimize import curve_fit
ys = np.array([113.46125, 129.4225, 140.88125, 145.80375, 145.4425, 146.97125,
97.8025, 112.91125, 114.4325, 119.16125, 130.13875, 134.63125,
129.4375, 141.99, 139.86, 138.77875, 137.91875, 140.71375])
xs = np.array([2.786427013, 3.325624466, 3.473013087, 3.598247534, 4.304280248, 4.958273121,
2.679526725, 2.409388637, 2.606306639, 3.661558062, 4.569923009, 4.836843789,
3.377013596, 3.664550526, 4.335401233, 3.064199519, 3.97155254, 4.100567011])
def fit_func(x, a):
    a_in_rad = np.deg2rad(a)
    cos_a_in_rad = np.cos(a_in_rad)
    arcos_xa_product = np.arccos(x * cos_a_in_rad)
    return np.rad2deg(arcos_xa_product)
a0 = 80
param, param_cov = curve_fit(fit_func, xs, ys, a0, bounds = (0, 360))
print('Using curve_fit we retrieve a value of a =', param[0])
Output:
Using curve_fit we retrieve a value of a = 100.05275506147824
However if you choose a0=60, you get the following error:
ValueError: Residuals are not finite in the initial point.
To be able to use the data with all possible values of a, a normalization as HS-nebula suggested is a good idea.
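A rough sketch of that normalization idea (this assumes that simply rescaling xs by its maximum is acceptable for your model; adjust to whatever normalization fits the physics):

# scale xs into (0, 1] so that x*cos(a) always stays within the arccos domain
xs_norm = xs / np.max(xs)
param, param_cov = curve_fit(fit_func, xs_norm, ys, p0=60, bounds=(0, 360))
print('Fitted a with normalized x:', param[0])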
I am trying to define a piecewise function to be fitted with the lmfit library in Python. The issue I am having is that a parameter I have defined for the function will not evaluate element-wise against the data I am submitting.
I have one example of a case somewhat similar to mine here. However, the vectorize function the answer describes wasn't producing the values I wanted, and when reading the documentation, it didn't seem to be the answer to my problem. I also used scipy.optimize.leastsq, but I got the same issue with lmfit described below.
I have my residual function defined as
from lmfit import minimize, Parameters, Model
def residual(params, y, x):
    param1 = params['one']
    param2 = params['two']
    if param2 < x:
        p = 1
    else:
        p = param1*x + param2
    return p - y

params = Parameters()
params.add('one', value=1)
params.add('two', value=2)

out = minimize(residual, params, args=(y, x))
I also tried defining the function such that
def f(param1, param2, x):
    if param2 < x:
        p = 1
    else:
        p = param1*x + param2
    return p

def residual(params, y, x):
    param1 = params['one']
    param2 = params['two']
    return f(param1, param2, x) - y
I have also tried using an inline lambda function.
I am getting the error 'The truth value of an array with more than one element is ambiguous.' When I got the error, it made sense why it happened, because (param2 < x) produces a logical array. However, I can't seem to find a way to define the function in a piecewise fashion for this case so that it can be fitted with lmfit.minimize(). I have seen this done in Matlab, whose nlinfit function seems to evaluate the data element-wise without issue (I tried searching whether Python has an equivalent explicit element-wise operator such as .* or .+, but it doesn't seem to).
lmfit also seems to operate a bit differently from nlinfit, because we always have to have our residual return (model - y), while nlinfit outputs the result once the function is given; I am not sure whether that could be another issue.
So to reiterate, my main question is if there is a method of defining the piecewise function such that it can compare the parameter to the data set.
Any help or explanation would be appreciated, thank you!
In place of (param2 < x) (where param2 is a float and x is a numpy array), you want to use numpy.where. You might try:
def residual(params, y, x):
    param1 = params['one']
    param2 = params['two']
    p = param1 * x + param2
    p[np.where(param2 < x)] = 1.0
    return p - y
I should also warn you about a potential problem with this approach to having a variable be a boundary for a piecewise function.
In non-linear fits, variables are always floating point (continuous, non-discrete) values. As the fit proceeds, it will make small adjustments in the values and see how that small change alters the result. In your approach, the parameter 'two' is used as both the transition between pieces and the offset for the line -- that is good.
If a parameter is used only as the transition, it may not work. Consider, say, x=np.array([0, 1., 2., 3., 4., ..., 20.0]). Having two = 10.5 and two=10.4 would then give the same result. In that case, the fit would not be able to alter the value of two: it would try a very small change, see no change in the result and give up.
So, either make sure that two is also used elsewhere in your real model (assuming your real model is more complicated than the example given), or consider using a more gentle transition rather than a hard change in pieces. I find an error-function of width ~spacing between x points often works. Depending on the nature of your problem, you might try something like this:
from scipy.special import erf, erfc

def residual(params, y, x):
    param1 = params['one']
    param2 = params['two']
    dx = (max(x) - min(x))/(len(x)-1)
    xhi = (erf((x-param2)/dx) + 1)/2.0   # ~1 for x > param2, ~0 for x < param2
    xlo = erfc((x-param2)/dx)/2.0        # the complement: ~1 for x < param2
    p = xhi*1.0 + xlo*(param1*x + param2)
    # note: did you really want?
    # p = xhi*param1 + xlo*(param1*x + param2)
    # p = param2 + xlo*param1*x
    return p - y
Hope that helps.
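A minimal usage sketch of the np.where version above (with made-up synthetic data, just to show the call pattern):

import numpy as np
from lmfit import minimize, Parameters

x = np.linspace(0, 10, 50)
y = np.where(4.0 < x, 1.0, 0.6*x + 4.0) + 0.05*np.random.randn(x.size)

params = Parameters()
params.add('one', value=1)
params.add('two', value=2)

out = minimize(residual, params, args=(y, x))   # residual defined as above
print(out.params['one'].value, out.params['two'].value)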
I have a set of data made up of (2-dimensional) observations of multiple objects. The observations can be described by a general function plus an offset that is unique to each object. I want to use curve_fit to simultaneously recover the general function and the offsets for each object (with associated errors). I do not know in advance how many objects the data-set will be made up of, only that there are likely to be multiple observations of each.
So a generalised data set of 7 observations might look like this:
[[x[0], y1[0], y2[0], lab='A'],
[x[1], y1[1], y2[1], lab='B'],
[x[2], y1[2], y2[2], lab='A'],
[x[3], y1[3], y2[3], lab='A'],
[x[4], y1[4], y2[4], lab='B'],
[x[5], y1[5], y2[5], lab='C'],
[x[6], y1[6], y2[6], lab='A']]
I could do the task by passing the parameters of the general function (say g = [g0, g1, g2]) and the object offsets offsets = n x [o1, o2] to fit_func and then using an object label to decide which of the n offsets needs to be added to the general function, except that I can't figure out how to pass the label.
def fit_func(x, g, offsets, lab):
    y1 = g[0] * cos(2*(x - g[1])) + offsets['lab', 0] + g[2]
    y2 = g[0] * sin(2*(x - g[1])) + offsets['lab', 1] + g[2]
    return [y1, y2]
The problem is that lab is not a float to be fit, so I can't figure out how to pass it. From reading some other threads I believe I will need a wrapper function, but I can't figure out what form it should take, and then how to call it in such a way that I can specify sigma and p0.
Can anyone point me in the right direction?
Edit: I managed to produce a function that I thought would work. It used a global flag to choose options within the function call: for example, I interleaved the y1 and y2 arrays and had the function evaluate the second equation on every second call, using global getEven() and setEven(bool) calls. However, curve_fit really didn't like that; the fit values were nonsensical.
At the moment I am fitting the equation for y1 and the equation for y2 separately and taking the rms to determine g0 and g1 (this also gives me offsets['A',0] and offsets['A',1], respectively). I could just do this multiple times with each different object in the set, but I can't fit the g2 parameter this way, since in any given call to the y1 or y2 function it is degenerate with the corresponding offset.
Here is example code that fits two different equations with a shared parameter, using 'A' or 'B' decoding. It appears to do what you need for decoding the lab type, but I personally have never done this before, and while it appears to work per your post, the "text-to-float" conversion inside the function seems clunky to me. But it works.
import numpy
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# single array with all "X" data to pass around
num = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
ids = numpy.array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])
xdata = numpy.array([num, ids])  # combine data, numpy auto-converts everything to 'text' type

# ydata is a numeric single array
ydata = [9.0, 8.0, 7.0, 6.0, 4.0, 3.0, 2.0, 1.0]

def fitFunction(data, commonParameter, pA, pB):
    numericDataAsText = data[0]
    textData = data[1]
    returnArray = []
    for i in range(len(textData)):
        x = float(numericDataAsText[i])
        if textData[i] == 'A':
            val = commonParameter + x * pA
        elif textData[i] == 'B':
            val = commonParameter + x * pB
        else:
            raise Exception('Error: must use A or B')
        returnArray.append(val)
    return returnArray

initialParameters = [1.0, 1.0, 1.0]

# curve fit both equations simultaneously to the combined data
params, pcov = curve_fit(fitFunction, xdata, ydata, initialParameters)

# values for display of the fitted function
commonParameter, pA, pB = params

# for plotting the fitting results
y_fit = fitFunction(xdata, commonParameter, pA, pB)
plt.plot(xdata[0], ydata, 'D')  # plot the raw data as a scatterplot
plt.plot(xdata[0][:4], y_fit[:4])
plt.plot(xdata[0][4:], y_fit[4:])
plt.show()

print('fitted parameters:', params)
It would be helpful to show a more complete example of what you are trying, including the call to scipy.optimize.curve_fit. But, if I understand the question correctly, you want to have an argument for your model function that is not treated as a variable in the fit. I believe that curve_fit cannot do this, and treats all arguments after the first as variables.
In fact, I think that your model function will not work for curve_fit because you expect g to be a sequence of values. With curve_fit, each argument after the first will get a single float value. So you probably want something like
def func(x, g0, g1, g2, offsets):
    y1 = g0 * cos(2*(x - g1)) + offsets['lab', 0] + g2
    ...
Anyway, I have two suggestions to work around this limitation of curve_fit:
First, you could overload x. Now, curve_fit will internally apply numpy.asarray() to the x you pass in, but it will otherwise just pass it along to your model function. So, if you turn x into a list containing your real x and your offsets, you should be able to unpack this in your model function, say like
xhack = [x, offsets]

def func(x, g0, g1, g2):
    x, offsets = x
    ...

out = curve_fit(func, xhack, ...)
Personally, I think that's kind of ugly, but it might work.
Second, you could use lmfit (https://lmfit.github.io/lmfit-py/), which provides a higher level interface to curve fitting and fixes many of the shortcomings of curve_fit. For your question in particular, lmfit's Model class for curve fitting examines the model function more carefully to turn function arguments into fitting parameters. Specifically:
keyword arguments with non-numerical defaults will not be turned into fit parameters.
you can specify more than 1 "independent variable", and they do not have to be the first argument of the function.
That is, you could either write:
from lmfit import Model

def func(x, g0, g1, g2, offsets=None):
    y1 = g0 * cos(2*(x - g1)) + offsets['lab', 0] + g2

mymodel = Model(func)
or explicitly tell Model what the independent variables are:
from lmfit import Model

def func(x, g0, g1, g2, offsets):
    y1 = g0 * cos(2*(x - g1)) + offsets['lab', 0] + g2

mymodel = Model(func, independent_vars=['x', 'offsets'])
Either way, offsets can be any complex objects, and you would use this mymodel for curve fitting with:
# create parameter objects for this model, with initial values:
params = mymodel.make_params(g0=0, g1=0.5, g2=2.0)
# run the fit
result = mymodel.fit(ydata, params, x=x, offsets=offsets)
There are lots of other conveniences we added to lmfit (I am one of the developers) for building curve fitting models and working with parameters as high-level objects, but this might be enough to get you started.
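To make the second option concrete, here is a small self-contained sketch (the data, labels, and offset values are invented purely for illustration) of passing per-observation offsets as an extra independent variable:

import numpy as np
from lmfit import Model

def func(x, g0, g1, g2, offsets=None):
    # offsets is a per-observation array built from the labels outside the model
    return g0 * np.cos(2*(x - g1)) + offsets + g2

x = np.linspace(0, np.pi, 40)
labels = np.where(np.arange(x.size) % 2, 'A', 'B')
offset_map = {'A': 0.3, 'B': 1.1}                 # hypothetical known offsets
offsets = np.array([offset_map[l] for l in labels])
ydata = func(x, 2.0, 0.4, 0.7, offsets=offsets) + 0.05*np.random.randn(x.size)

mymodel = Model(func, independent_vars=['x', 'offsets'])
params = mymodel.make_params(g0=1.0, g1=0.0, g2=0.0)
result = mymodel.fit(ydata, params, x=x, offsets=offsets)
print(result.fit_report())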
I want to fit a curve to my data:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.array([24, 25, 28, 37, 58, 104, 200, 235, 235])
y = np.array([340, 350, 370, 400, 430, 460, 490, 520, 550])
xerr = np.array([1.1, 1, 0.8, 1.4, 1.4, 2.6, 3.8, 2, 2])

def fit_fc(x, a, b, c):
    return a*x**b + c

popt, pcov = curve_fit(fit_fc, x, y, maxfev=5000)

plt.plot(x, fit_fc(x, popt[0], popt[1], popt[2]))
plt.errorbar(x, y, xerr=xerr, fmt='-o')
but I want to put some constraints on a, b, and c. For example, I want them to be in some range, let's say between 0 and 20. How can I achieve that? I'm new to Python, so any help would be appreciated.
You could use lmfit to constrain your parameters. For the following plot, I constrained your parameters a and b to the range [0, 20] (which you mentioned in your post) and c to the range [0, 400]. The parameters you get are:
a: 19.9999991
b: 0.46769173
c: 274.074071
The corresponding plot (not reproduced here) shows that the model reproduces the data reasonably well and the parameters are within the given ranges.
Here is the code that reproduces the results with additional comments:
from lmfit import minimize, Parameters, report_fit
import numpy as np

x = [24, 25, 28, 37, 58, 104, 200, 235, 235]
y = [340, 350, 370, 400, 430, 460, 490, 520, 550]

def fit_fc(params, x, data):
    a = params['a'].value
    b = params['b'].value
    c = params['c'].value
    model = np.power(x, b)*a + c
    return model - data  # that's what you want to minimize

# create a set of Parameters
# 'value' is the initial condition
# 'min' and 'max' define your boundaries
params = Parameters()
params.add('a', value=2, min=0, max=20)
params.add('b', value=0.5, min=0, max=20)
params.add('c', value=300.0, min=0, max=400)

# do the fit, here with the leastsq algorithm
result = minimize(fit_fc, params, args=(x, y))

# calculate the final result
final = y + result.residual

# write the error report
report_fit(result)

# plot results
try:
    import pylab
    pylab.plot(x, y, 'k+')
    pylab.plot(x, final, 'r')
    pylab.show()
except ImportError:
    pass
If you constrain all of your parameters to the range [0, 20], the resulting fit (plot not reproduced here) looks rather bad.
It depends on what you want to have happen if the variables are out of range. You can use a simple if statement (in this case the program exit()s):
x = 21
if not (0 <= x <= 20):
    print("var x is out of range")
    exit()
Another way is to assert that the variable must be in the range. In this case, it's wrapped in a try/except block that handles the problem gracefully, and also exit()s like above:
try:
    assert 0 <= x <= 20
except AssertionError:
    print("variable x is out of range")
    exit()
At the time, SciPy's curve_fit used unconstrained least squares to fit curve parameters, so it won't be that straightforward: https://github.com/scipy/scipy/blob/v0.16.0/scipy/optimize/minpack.py#L454 (note that SciPy 0.17 and later do accept a bounds argument in curve_fit).
What you'd probably like to do is solve a constrained (nonlinear, given what you're trying to fit) least-squares problem. For instance, take a look at these discussions:
Constrained least-squares estimation in Python ( leastsq_bounds: https://gist.github.com/denis-bz/65da931bdbf92c49e4d0 )
scipy.optimize.leastsq with bound constraints
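For reference, with SciPy 0.17 or newer the bounds can be passed to curve_fit directly; a brief sketch for the fit function from the question (the initial guesses here are arbitrary):

import numpy as np
from scipy.optimize import curve_fit

x = np.array([24, 25, 28, 37, 58, 104, 200, 235, 235], dtype=float)
y = np.array([340, 350, 370, 400, 430, 460, 490, 520, 550], dtype=float)

def fit_fc(x, a, b, c):
    return a*x**b + c

# constrain a and b to [0, 20] and c to [0, 400]
popt, pcov = curve_fit(fit_fc, x, y, p0=[2, 0.5, 300],
                       bounds=([0, 0, 0], [20, 20, 400]))
print(popt)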