Fit an arbitrary number of parameters when calling curve_fit - python

Closest I found to this question was here: Fitting only one parameter of a function with many parameters in python. I have a multi-parameter function that I want to be able to call with a different subset of parameters being optimised in different parts of the code (useful because for some datasets, I may be able to fix some parameters based on ancillary data). Simplified demonstration of the problem below.
from scipy.optimize import curve_fit
import numpy as np
def wrapper_func(**kwargs):
    a = kwargs['a'] if 'a' in kwargs else None
    b = kwargs['b'] if 'b' in kwargs else None
    c = kwargs['c'] if 'c' in kwargs else None
    return lambda x, a, c: func(x, a, b, c)

def func(x, a, b, c):
    return a * x**2 + b * x + c
# Set parameters
a = 0.3
b = 5
c = 17
# Make some fake data
x_vals = np.arange(100)
y_vals = a * x_vals**2 + b * x_vals + c
noise = np.random.randn(100) * 20
# Get fit
popt, pcov = curve_fit(lambda x, a_, c_: func(x, a_, b, c_),
                       x_vals, y_vals + noise)
# Get fit using separate function
alt_popt, alt_cov = curve_fit(wrapper_func(b=5), x_vals, y_vals + noise)
So this works, but I want to be able to pass any combination of parameters to be fixed. Here parameters a and c are optimised and b is fixed, but if I want to fix a and optimise b and c (or any other combination), is there a way to do this neatly? I made a start with wrapper_func() above, but the same problem arises: there seems to be no way to vary which parameters are optimised, except by writing multiple lambdas (conditional on which fixed parameter values are passed). This gets ugly quickly because the equations I am working with have 4-6 parameters. I can make a version work using eval, but I gather this is not recommended. As it stands, I have been trying to use *args with lambda, but haven't managed to get it to work.
Any tips greatly appreciated!

lmfit (https://lmfit.github.io/lmfit-py/) does exactly this. Instead of creating an array of floating point values for the parameters in the fit, one creates a Parameters object -- an ordered dictionary of Parameter objects that are used to parametrize the model for the data. Each Parameter can be fixed or varied in the fit, can have max/min bounds, or can be defined as a simple mathematical expression in terms of other Parameters in the fit.
That is, with lmfit (and its Model class that is especially useful for curve-fitting), one creates Parameters and can then decide which will be optimized and which will be held fixed.
As an example, here is a variation on the problem you pose:
import numpy as np
from lmfit import Model
import matplotlib.pyplot as plt
# starting parameters
a, b, c = 0.3, 5, 17
x_vals = np.arange(100)
noise = np.random.normal(size=100, scale=0.25)
y_vals = a * x_vals**2 + b * x_vals + c + noise
def func(x, a, b, c):
    return a * x**2 + b * x + c
# create a Model from this function
model = Model(func)
# create parameters with initial values. Model will know to
# turn function args `a`, `b`, and `c` into Parameters:
params = model.make_params(a=0.25, b=4, c=10)
# you can alter each parameter, for example, fix b or put bounds on a
params['b'].vary = False
params['b'].value = 5.3
params['a'].min = -1
params['a'].max = 1
# run fit
result = model.fit(y_vals, params, x=x_vals)
# print and plot results
print(result.fit_report())
result.plot(datafmt='--')
plt.show()
will print out:
[[Model]]
Model(func)
[[Fit Statistics]]
# function evals = 12
# data points = 100
# variables = 2
chi-square = 475.843
reduced chi-square = 4.856
Akaike info crit = 159.992
Bayesian info crit = 165.202
[[Variables]]
a: 0.29716481 +/- 7.46e-05 (0.03%) (init= 0.25)
b: 5.3 (fixed)
c: 11.4708897 +/- 0.329508 (2.87%) (init= 10)
[[Correlations]] (unreported correlations are < 0.100)
C(a, c) = -0.744
(You will find that b and c are highly and negatively correlated) and show a plot of the data and the best fit.
Furthermore, the fit results including the parameters are held in result, so if you want to change what parameters are fixed, you can simply change the starting values (which have not been updated by the fit):
params['b'].vary = True
params['a'].value = 0.285
params['a'].vary = False
newresult = model.fit(y_vals, params, x=x_vals)
and then compare/contrast the two results.

Here is my solution. I am not sure how to do it with curve_fit, but it works with leastsq. It uses a wrapper function that takes the free and fixed parameters as well as a list of the free parameter positions. Because leastsq calls the function with the free parameters first, the wrapper has to rearrange the order.
from matplotlib import pyplot as plt
import numpy as np
from scipy.optimize import leastsq

def func(x, a, b, c, d, e):
    return a + b*x + c*x**2 + d*x**3 + e*x**4

# takes x, the 5 parameters and a list
# the first n parameters are free
# the list of length n gives their positions, e.g. 2 parameters, 1st and 3rd order -> [1,3]
# the remaining parameters are in order, i.e. in this example it would be f(x,b,d,a,c,e)
def expand_parameters(*args):
    callArgs = args[1:6]
    freeList = list(args[-1])
    fixedList = list(range(5))  # range() is not a list in Python 3
    for item in freeList:
        fixedList.remove(item)
    callList = [0, 0, 0, 0, 0]
    for val, pos in zip(callArgs, freeList + fixedList):
        callList[pos] = val
    return func(args[0], *callList)

def residuals(parameters, dataPoint, fixedParameterValues=None, freeParametersPosition=None):
    if fixedParameterValues is None:
        a, b, c, d, e = parameters
        dist = [y - func(x, a, b, c, d, e) for x, y in dataPoint]
    else:
        assert len(fixedParameterValues) == 5 - len(freeParametersPosition)
        assert len(fixedParameterValues) > 0
        assert len(fixedParameterValues) < 5  # doesn't make sense to fix all
        extraIn = list(parameters) + list(fixedParameterValues) + [freeParametersPosition]
        dist = [y - expand_parameters(x, *extraIn) for x, y in dataPoint]
    return dist

if __name__ == "__main__":
    xList = np.linspace(-1, 3, 15)
    fList = np.fromiter((func(s, 1.1, -.9, -.7, .5, .1) for s in xList), float)
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    # materialise the pairs: a bare zip() can only be iterated once in Python 3
    dataTupel = list(zip(xList, fList))

    ### some test
    print(residuals([1.1, -.9, -.7, .5, .1], dataTupel))
    print(residuals([1.1, -.9, -.7, .5], dataTupel, fixedParameterValues=[.1], freeParametersPosition=[0, 1, 2, 3]))

    # exact fit
    bestFitValuesAll, ier = leastsq(residuals, [1, 1, 1, 1, 1], args=(dataTupel,))
    print(bestFitValuesAll)

    ### Only a constant
    guess = [1]
    bestFitValuesConstOnly, ier = leastsq(residuals, guess, args=(dataTupel, [0, 0, 0, 0], [0]))
    print(bestFitValuesConstOnly)
    fConstList = np.fromiter((func(x, *np.append(bestFitValuesConstOnly, [0, 0, 0, 0])) for x in xList), float)

    ### Only 2nd and 4th
    guess = [1, 1]
    bestFitValues_1_3, ier = leastsq(residuals, guess, args=(dataTupel, [0, 0, 0], [2, 4]))
    print(bestFitValues_1_3)
    f_1_3_List = np.fromiter((expand_parameters(x, *(list(bestFitValues_1_3) + [0, 0, 0] + [[2, 4]])) for x in xList), float)

    ### Only 2nd and 4th with closer values
    guess = [1, 1]
    bestFitValues_1_3_closer, ier = leastsq(residuals, guess, args=(dataTupel, [1.2, -.8, 0], [2, 4]))
    print(bestFitValues_1_3_closer)
    f_1_3_closer_List = np.fromiter((expand_parameters(x, *(list(bestFitValues_1_3_closer) + [1.2, -.8, 0] + [[2, 4]])) for x in xList), float)

    ax.plot(xList, fList, linestyle='', marker='o', label='orig')
    ax.plot(xList, fConstList, linestyle='', marker='o', label='0')
    ax.plot(xList, f_1_3_List, linestyle='', marker='o', label='1,3')
    ax.plot(xList, f_1_3_closer_List, linestyle='', marker='o', label='1,3 c')
    ax.legend(loc=0)
    plt.show()
Providing:
>>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
>>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
>>[ 1.1 -0.9 -0.7 0.5 0.1]
>>[ 2.64880466]
>>[-0.14065838 0.18305123]
>>[-0.31708629 0.2227272 ]
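For comparison, the same "rearrange the free and fixed parameters" idea can probably be applied to curve_fit directly. The sketch below is not taken from any of the answers above: fit_with_fixed and its guesses/fixed arguments are hypothetical names, and it assumes the model takes the independent variable first and named parameters after it. It relies on curve_fit accepting a *args function as long as p0 is supplied.

```python
import inspect

import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b, c):
    return a * x**2 + b * x + c


def fit_with_fixed(f, x, y, guesses, fixed=None):
    """Fit f(x, ...) while holding the parameters named in `fixed` constant.

    guesses: dict mapping each free parameter name to its starting value.
    fixed:   dict mapping each fixed parameter name to its value.
    (Hypothetical helper, not part of SciPy.)
    """
    fixed = dict(fixed or {})
    # Parameter names of f, skipping the independent variable.
    names = list(inspect.signature(f).parameters)[1:]
    free = [n for n in names if n not in fixed]

    def wrapped(x, *free_values):
        kwargs = dict(zip(free, free_values))
        kwargs.update(fixed)
        return f(x, **kwargs)

    # p0 is required: curve_fit cannot count the parameters of a *args function.
    p0 = [guesses[n] for n in free]
    popt, pcov = curve_fit(wrapped, x, y, p0=p0)
    return dict(zip(free, popt)), pcov


# Noise-free demo data so the result is easy to check.
x_vals = np.arange(100, dtype=float)
y_vals = func(x_vals, 0.3, 5, 17)

# Fix b, optimise a and c.
fit_ac, _ = fit_with_fixed(func, x_vals, y_vals, {"a": 1.0, "c": 1.0}, fixed={"b": 5})
# Fix a, optimise b and c.
fit_bc, _ = fit_with_fixed(func, x_vals, y_vals, {"b": 1.0, "c": 1.0}, fixed={"a": 0.3})
```

Returning the fitted values as a dict keyed by parameter name avoids having to remember the order in which the free parameters were passed.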

Related

nonlinear curve fitting in python with two variables

I am trying to define a function that fits input x and y data of the form:
def nlvh(x,y, xi, yi, H,C):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)
The x and y data are 1-D numpy arrays of the same length. I would like to slice the data so that I can select the first 5 points of x and y, fit those by optimizing C and H in the model, and then move one point ahead and repeat. I have some code that does this for a linear fit over the same data:
for i in np.arange(len(x)):
    xdata = x[i:i + window]
    ydata = y[i:i+window]
    a[i], b[i] = np.polyfit(xdata, ydata,1)
    xdata_avg[i] = np.mean(xdata)
    if i == (lenx - window):
        break
but doing the same thing with the equation defined above is trickier. x and y are the independent and dependent variables, but there are also the parameters xi and yi, which are the first values of x and y in each window.
The end result I would like are two new arrays with H[i] and C[i], where i designates each subsequent window. Does anybody have some insight as to how I can get started?
Following your comment on my previous answer (where you said you would like xi and yi to be the initial values in each "sliced" x and y array), I am adding another answer. This answer changes the function nlvh so that it does what you want. As in my previous answer, we will use curve_fit from scipy.optimize.
In the code below, I use Python's globals() function to define xi and yi. For every sliced x and y array, xi and yi store the first value of the respective slice. This is the revamped code:
from __future__ import division #For decimal division.
import numpy as np
from scipy.optimize import curve_fit

def nlvh(x, H, C):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)

xdata = np.arange(1,21) #Choose an array for x.
#Choose an array for y.
ydata = np.array([-0.1404996, -0.04353953, 0.35002257, 0.12939468, -0.34259184, -0.2906065,
                  -0.37508709, -0.41583238, -0.511851, -0.39465581, -0.32631751, -0.34403938,
                  -0.592997, -0.34312689, -0.4838437, -0.19311436, -0.20962735, -0.31134191,
                  -0.09487793, -0.55578775])
H_lst, C_lst = [], []
for i in range( len(xdata)-5 ):
    #Select 5 consecutive points of xdata (from index i to i+4).
    xnew = xdata[i: i+5]
    globals()['xi'] = xnew[0]
    #Select 5 consecutive points of ydata (from index i to i+4).
    ynew = ydata[i: i+5]
    globals()['yi'] = ynew[0]
    #Fit function nlvh to data using scipy.optimize.curve_fit
    popt, pcov = curve_fit(nlvh, xnew, ynew, maxfev=100000)
    #Optimal values for H from minimization of sum of the squared residuals.
    H_lst += [popt[0]]
    #Optimal values for C from minimization of sum of the squared residuals.
    C_lst += [popt[1]]
H_arr, C_arr = np.asarray(H_lst), np.asarray(C_lst) #Convert list to numpy arrays.
Your output for H_arr and C_arr will now be the following:
print(H_arr)
>>>[1.0, 1.0, -23.041138662879327, -34.58915200575536, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(C_arr)
>>>[1.0, 1.0, -8.795855063863234, -9.271561975595562, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Following are the plots that you get for the data selected above (xdata, ydata).
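One possible alternative to globals() is a closure that captures xi and yi for each window. This is only a sketch: make_nlvh and sliding_fit are hypothetical helper names, the demo data are synthetic, and it assumes yi is positive, since np.log(yi) is undefined otherwise. The loop also runs to len(x) - window + 1 so that the last full window is included.

```python
import numpy as np
from scipy.optimize import curve_fit

R = 8.314


def make_nlvh(xi, yi):
    """Return nlvh(x, H, C) with xi and yi frozen for one window (hypothetical helper)."""
    def nlvh(x, H, C):
        return ((H - xi * C) / R) * ((1 / xi) - x) + (C / R) * np.log((1 / x) / xi) + np.log(yi)
    return nlvh


def sliding_fit(x, y, window=5):
    """Fit H and C on every window of `window` consecutive points."""
    H_lst, C_lst = [], []
    for i in range(len(x) - window + 1):
        xw, yw = x[i:i + window], y[i:i + window]
        model = make_nlvh(xw[0], yw[0])  # first values of the window
        popt, _ = curve_fit(model, xw, yw, p0=[1.0, 1.0])
        H_lst.append(popt[0])
        C_lst.append(popt[1])
    return np.asarray(H_lst), np.asarray(C_lst)


# Single-window check on synthetic data generated from the model itself.
x_demo = np.linspace(1.0, 2.0, 5)
nlvh_demo = make_nlvh(xi=x_demo[0], yi=2.0)
y_demo = nlvh_demo(x_demo, 10.0, 3.0)
popt_demo, _ = curve_fit(nlvh_demo, x_demo, y_demo, p0=[1.0, 1.0])

# Sliding-window run on arbitrary positive data (12 points -> 8 windows).
x_all = np.linspace(1.0, 3.0, 12)
y_all = 1.0 + np.exp(-x_all)
H_arr, C_arr = sliding_fit(x_all, y_all)
```

Because each window gets its own model function, nothing is shared between iterations and the fit no longer depends on module-level state.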
You can use curve_fit from scipy.optimize. It will use non-linear least squares to fit the parameters (H, C, xi, yi) of your function nlvh to given input data for x and y.
Try the following code. In the below mentioned code, H_arr and C_arr are numpy arrays which contain fit parameters of H and C respectively when the function nlvh is fitted to windows of 5 consecutive points of xdata and ydata (xdata and ydata are arrays that I have chosen for x and y. You can choose different arrays here.)
from __future__ import division #For decimal division.
import numpy as np
from scipy.optimize import curve_fit

def nlvh(x, H, C, xi, yi):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)

xdata = np.arange(1,21) #Choose an array for x
#Find an array yy for chosen values of parameters (H, C, xi, yi)
yy = nlvh(xdata, H=1.0, C=1.0, xi=1.0, yi=1.0)
print(yy)
>>>[ 0. -0.08337108 -0.13214004 -0.16674217 -0.19358166 -0.21551112 -0.23405222 -0.25011325 -0.26428008 -0.27695274 -0.28841656 -0.2988822 -0.30850967 -0.3174233 -0.3257217 -0.33348433 -0.3407762 -0.34765116 -0.35415432 -0.36032382]
#Add noise to the initially chosen array yy.
y_noise = 0.2 * np.random.normal(size=xdata.size)
ydata = yy + y_noise
print(ydata)
>>>[-0.1404996 -0.04353953 0.35002257 0.12939468 -0.34259184 -0.2906065 -0.37508709 -0.41583238 -0.511851 -0.39465581 -0.32631751 -0.34403938 -0.592997 -0.34312689 -0.4838437 -0.19311436 -0.20962735 -0.31134191 -0.09487793 -0.55578775]
H_lst, C_lst = [], []
for i in range( len(xdata)-5 ):
    #Select 5 consecutive points of xdata (from index i to i+4).
    xnew = xdata[i: i+5]
    #Select 5 consecutive points of ydata (from index i to i+4).
    ynew = ydata[i: i+5]
    #Fit function nlvh to data using scipy.optimize.curve_fit
    popt, pcov = curve_fit(nlvh, xnew, ynew, maxfev=100000)
    #Optimal values for H from minimization of sum of the squared residuals.
    H_lst += [popt[0]]
    #Optimal values for C from minimization of sum of the squared residuals.
    C_lst += [popt[1]]
H_arr, C_arr = np.asarray(H_lst), np.asarray(C_lst) #Convert list to numpy arrays.
Following will be your output of H_arr and C_arr for the chosen values of xdata and ydata.
print(H_arr)
>>>[ -11.5317468 -18.44101926 20.30837781 31.47360697 -14.45018355 24.17226837 39.96761325 15.28776756 -113.15255865 15.71324201 51.56631241 159.38292301 -28.2429133 -60.97509922 -89.48216973]
print(C_arr)
>>>[0.70339652 0.34734507 0.2664654 0.2062776 0.30740565 0.19066498 0.1812445 0.30169133 0.11654544 0.21882872 0.11852967 0.09968506 0.2288574 0.128909 0.11658227]

Least Squares Method for a sum of functions

I would like to use the curve_fit function from the scipy.optimize module to determine amplitudes, frequencies, phases of sum of sine functions (and one y0). It's easy to do when I know a number of sines to use. For example when I know two frequencies from the DFT (Discrete Fourier Transform): 1.152 and 0.432 I can define a function:
def func(x, amp1, amp2, freq1 , freq2, phase1, phase2, y0):
    return amp1*np.sin(freq1*x + phase1) + amp2*np.sin(freq2*x + phase2) + y0
Then, using the curve_fit and constraining intervals of frequencies I can find a good fitting:
param, _ = curve_fit(func, t, data, bounds=([-np.inf, -np.inf, 1.14, 0.43, -np.inf, -np.inf, -np.inf], [np.inf, np.inf, 1.16, 0.44, np.inf, np.inf, np.inf]))
It looks great:
But in this case I prepared the data myself and knew the number of frequencies. Do you know how to define func only once and handle all cases (for example, five sine functions)? I have tried putting the parameters into lists, e.g. amp = [amp1, amp2, ... ], and iterating over their length, but then it is unclear how to define bounds for parameter lists. bounds is very important to keep the model realistic.
The solution does not have to be based on curve_fit.
Assuming you know the frequencies beforehand, the problem is simple. For each phase, you can set the lower bound to 0 and the upper bound to 2 * pi * freq. For the amplitudes, set any number (or np.inf if you want no bound).
You can formulate the function in the form lambda x, amp1, phase1, amp2, phase2... : y, curve_fit can accept a function of undefined number of arguments as long as you supply a proper initial guess.
A sample code for five frequencies:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.linspace(0,10,60)
w = [1,2,3,4,5]
a = [1,4,2,3,0.1]
x0 = [0,1,0,1,0.5]
# built-in sum: calling np.sum on a generator is deprecated
y = sum(a_i * np.sin(w_i * x - x0_i) for w_i, a_i, x0_i in zip(w,a, x0)) #base_data
yr = y + np.random.normal(0,0.5, size=x.size) #noisy data

def func(x, *args):
    """ function of the form lambda x, amp1, phase1, amp2, phase2...."""
    return sum(a_i * np.sin(w_i * (x-x0)) for w_i, a_i, x0
               in zip(w,args[::2], args[1::2]))

ubounds = np.zeros(len(w) * 2)
ubounds[::2] = 10 #setting amp max value to 10 (arbitrary)
ubounds[1::2] = np.asarray(w) * 2 * np.pi
p0 = [0] * 10 # note p0 size
popt, pcov = curve_fit(func, x, yr, p0, bounds=(0, ubounds))
amps, phases = popt[::2], popt[1::2]
plt.plot(x,func(x, *popt))
plt.plot(x,yr, 'go')
plt.show()
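If the frequencies themselves also need to be fitted within bounds, as in the question, one option is a single *params model whose length determines the number of sines. The following is a rough sketch under stated assumptions: sum_of_sines, fit_sines, the parameter layout [amps, freqs, phases, y0] and the freq_tol half-width are all illustrative choices, and the demo data are noise-free and synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit


def sum_of_sines(x, *params):
    """params = [amp_1..amp_n, freq_1..freq_n, phase_1..phase_n, y0]."""
    n = (len(params) - 1) // 3
    amps = np.asarray(params[:n])
    freqs = np.asarray(params[n:2 * n])
    phases = np.asarray(params[2 * n:3 * n])
    y0 = params[-1]
    x = np.asarray(x, dtype=float)
    # Broadcast to shape (n, len(x)) and sum over the n components.
    waves = amps[:, None] * np.sin(freqs[:, None] * x + phases[:, None])
    return waves.sum(axis=0) + y0


def fit_sines(x, y, freq_guesses, freq_tol):
    """Fit len(freq_guesses) sines, bounding each frequency to guess +/- freq_tol."""
    freq_guesses = np.asarray(freq_guesses, dtype=float)
    n = freq_guesses.size
    # Starting point: unit amplitudes, guessed frequencies, zero phases, mean offset.
    p0 = np.concatenate([np.ones(n), freq_guesses, np.zeros(n), [np.mean(y)]])
    lower = np.full(3 * n + 1, -np.inf)
    upper = np.full(3 * n + 1, np.inf)
    lower[n:2 * n] = freq_guesses - freq_tol
    upper[n:2 * n] = freq_guesses + freq_tol
    return curve_fit(sum_of_sines, x, y, p0=p0, bounds=(lower, upper))


# Synthetic two-sine signal using the frequencies quoted in the question.
t = np.linspace(0, 20, 400)
data = sum_of_sines(t, 2.0, 1.0, 1.152, 0.432, 0.3, 0.5, 0.5)
popt, pcov = fit_sines(t, data, freq_guesses=[1.15, 0.435], freq_tol=0.01)
fitted_freqs = popt[2:4]
```

Only the frequency entries are bounded here; amplitudes, phases and the offset are left free, so a fitted amplitude may come out negative with the phase shifted by pi.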

Python Estimate the standard deviation after data fitting

I am trying to fit a data set to the hyperbolic equation using ipython --pylab:
y = ax / (b + x)
Here is my python code:
from scipy import optimize as opti
import numpy as np
from pandas import DataFrame
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8])
y = np.array([0.375, 0.466, 0.509, 0.520, 0.525, 0.536, 0.541])
y_stdev = np.array([0.025, 0.016, 0.009, 0.009, 0.025, 0.019])
def func(x, a, b):
    return a*x / (b + x)

popt, pcov = opti.curve_fit(func, x, y)
print(popt)
# popt is a plain NumPy array, so index it directly (it has no .ix accessor)
print("a = ", popt[0])
print("b = ", popt[1])
The values of a and b should be inside popt. Given that a and b are inferred by fitting the data set to func(x, a, b), how can we estimate the standard deviations of a and b?
Thank you.
The answer is in the docs:
pcov : 2d array
The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov))...
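Applied to the data from the question, the recipe quoted from the docs might look like the sketch below. The starting values in p0 are an assumption read off the data by eye (a near the plateau of y, b near a small x), not something given in the question, and y_stdev is not used because it has one entry fewer than x.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8])
y = np.array([0.375, 0.466, 0.509, 0.520, 0.525, 0.536, 0.541])


def func(x, a, b):
    return a * x / (b + x)


# p0 is a rough guess (assumption): a near the plateau, b small.
popt, pcov = curve_fit(func, x, y, p0=[0.5, 0.1])

# One-standard-deviation errors: square roots of the diagonal of pcov.
perr = np.sqrt(np.diag(pcov))

for name, value, err in zip("ab", popt, perr):
    print(f"{name} = {value:.4f} +/- {err:.4f}")
```

The off-diagonal entries of pcov describe how the errors in a and b are correlated; they are ignored here.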

Getting standard error associated with parameter estimates from scipy.optimize.curve_fit

I am using scipy.optimize.curve_fit to fit a curve to some data I have. The curves, for the most part, seem to fit very well. For some reason, pcov = inf when I print it.
What I really need is to calculate the error associated with the parameters I'm fitting, and I am not sure how to do this even if it does give me the covariance matrix.
The model being fit to is:
def intensity(x,R_out,R_in,K_in,K_out,a,b,c):
    K_in,K_out = abs(0.0),abs(K_out)
    if x<=R_in:
        return 2*R_out*(K_out*np.sqrt(1-x**2/R_out**2)-
                        (K_out-0.0)*np.sqrt(R_in**2/R_out**2-x**2/R_out**2)) + c
    elif x>=R_in and x<=R_out:
        return K_out*2*R_out*np.sqrt(1-x**2/R_out**2) + c
    elif x>R_out:
        return c

intensity_vec = np.vectorize(intensity)

def intensity_vec_self(x,R_out,R_in,K_in,K_out,a,b,c):
    y = np.zeros(x.shape)
    for i in range(len(y)):
        y[i]=intensity_vec(x[i],R_out,R_in,K_in,K_out,a,b,c)
    return y
and there are 400 data points; I can post them here if you think it will help.
To summarize, I can't get curve_fit to print my pcov and need help figuring out why, and whether I can get it to do so.
Also, if there is a quick explanation, I would like to know how to use the pcov array to obtain the errors associated with my fit.
Thanks
The variances of the parameters are the diagonal elements of the variance-covariance matrix, and the standard errors are their square roots: np.sqrt(np.diag(pcov))
Regarding getting inf, see and compare these two examples:
In [129]:
import numpy as np
import scipy.optimize as so

def func(x, a, b, c, d):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5, 1)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
[ inf inf inf inf]
And:
In [130]:
def func(x, a, b, c):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
[ 0.11097646 0.11849107 0.05230711]
In this extreme example, d has no effect on the function func, hence it is associated with a variance of +inf; in other words, it can take just about any value. Removing d from func gives a result that makes sense.
In reality, if parameters are of very different scale, say:
def func(x, a, b, c, d):
    #return a * np.exp(-b * x) + c
    return a * np.exp(-b * x) + c + d*1e-10
You will also get inf, due to floating-point overflow/underflow.
In your case, I think you never used a and b. So it is just like the first example here.
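A quick way to check for parameters that the model never uses is to nudge each one and see whether the output changes at all. The helper below is a hypothetical sketch (unused_parameters is not a SciPy function) and only a coarse test: it evaluates the model at a single point in parameter space, so a parameter that merely happens to have no effect there would also be flagged.

```python
import numpy as np


def func(x, a, b, c, d):
    return a * np.exp(-b * x) + c  # d is deliberately unused


def unused_parameters(f, xdata, params, rel_step=1e-6):
    """Return indices of parameters whose perturbation leaves f(xdata, ...) unchanged."""
    params = np.asarray(params, dtype=float)
    base = np.asarray(f(xdata, *params))
    unused = []
    for i in range(params.size):
        shifted = params.copy()
        # Small relative nudge, with a floor so zero-valued parameters still move.
        shifted[i] += rel_step * max(1.0, abs(params[i]))
        if np.array_equal(np.asarray(f(xdata, *shifted)), base):
            unused.append(i)
    return unused


xdata = np.linspace(0, 4, 50)
unused = unused_parameters(func, xdata, [2.5, 1.3, 0.5, 1.0])
print(unused)  # [3], i.e. the parameter d
```

Running the same check on the model in the question, with its actual starting values, should show whether a and b are among the parameters that never reach the returned expression.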

passing arguments to a function for fitting

I am trying to fit a function which takes as input 2 independent variables x,y and 3 parameters to be found a,b,c. This is my test code:
import numpy as np
from scipy.optimize import curve_fit
def func(x,y, a, b, c):
    return a*np.exp(-b*(x+y)) + c
y= x = np.linspace(0,4,50)
z = func(x,y, 2.5, 1.3, 0.5) #works ok
#generate data to be fitted
zn = z + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x,y, zn) #<--------Problem here!!!!!
But I am getting the error: "func() takes exactly 5 arguments (51 given)". How can I pass my arguments x,y correctly?
A look at the documentation of scipy.optimize.curve_fit() is all it takes. The prototype is
scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw)
The documentation states that curve_fit() is called with the target function as the first argument, the independent variable(s) as the second argument, the dependent variable as the third argument and the start values for the parameters as the fourth argument. You tried to call the function in a completely different way, so it's not surprising it does not work. Specifically, you passed zn as the p0 parameter; this is why the function was called with so many parameters.
The documentation also describes how the target function is called:
f: callable
The model function, f(x, ...). It must take the independent variable as the first argument and the parameters to fit as separate remaining arguments.
xdata : An N-length sequence or an (k,N)-shaped array
for functions with k predictors. The independent variable where the data is measured.
You tried to use two separate arguments for the independent variables, while it should be a single array of arguments. Here's the code fixed:
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * (x[0] + x[1])) + c

N = 50
x = np.linspace(0, 4, N)
x = np.array([x, x])  # Combine your `x` and `y` to a single
                      # (2, N)-array
z = func(x, 2.5, 1.3, 0.5)
zn = z + 0.2 * np.random.normal(size=x.shape[1])
popt, pcov = curve_fit(func, x, zn)
Try to pass the first two array parameters to func as a tuple and modify func to accept a tuple of parameters.
Normally curve_fit expects the model function to take a single independent variable x as its first argument. Since in your example the independent variable is not a single value but two values (not sure why), you have to modify your function so that it accepts x as a single parameter and unpacks it inside.
Generally speaking, three dimensional curve fitting should be handled in a different manner from what you are trying to achieve. You can take a look into the following SO post which tried to fit a three dimensional scatter with a line.
>>> def func(xy, a, b, c):
...     x, y = xy  # tuple parameters in the signature are not valid in Python 3
...     return a*np.exp(-b*(x+y)) + c
>>> y= x = np.linspace(0,4,50)
>>> z = func((x,y), 2.5, 1.3, 0.5) #works ok
>>> zn = z + 0.2*np.random.normal(size=len(x))
>>> popt, pcov = curve_fit(func, (x,y), zn)
