I am trying to fit a function which takes as input 2 independent variables x,y and 3 parameters to be found a,b,c. This is my test code:
import numpy as np
from scipy.optimize import curve_fit
def func(x,y, a, b, c):
    return a*np.exp(-b*(x+y)) + c
y= x = np.linspace(0,4,50)
z = func(x,y, 2.5, 1.3, 0.5) #works ok
#generate data to be fitted
zn = z + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x,y, zn) #<--------Problem here!!!!!
But I am getting the error: "func() takes exactly 5 arguments (51 given)". How can I pass my arguments x,y correctly?
A look at the documentation of scipy.optimize.curve_fit() is all it takes. The prototype is
scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw)
The documentation states that curve_fit() is called with the target function as the first argument, the independent variable(s) as the second argument, the dependent variable as the third argument and the start values for the parameters as the fourth argument. You called the function in a different way, so it's not surprising that it does not work. Specifically, you passed zn as the p0 parameter, which is why the function was called with so many arguments.
The documentation also describes how the target function is called:
f: callable
The model function, f(x, ...). It must take the independent variable as the first argument and the parameters to fit as separate remaining arguments.
xdata : An N-length sequence or an (k,N)-shaped array
for functions with k predictors. The independent variable where the data is measured.
You tried to use two separate arguments for the independent variables, while it should be a single array of arguments. Here's the code fixed:
import numpy as np
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * (x[0] + x[1])) + c
N = 50
x = np.linspace(0, 4, N)
x = np.array([x, x])  # Combine your `x` and `y` into a single
                      # (2, N)-array
z = func(x, 2.5, 1.3, 0.5)
zn = z + 0.2 * np.random.normal(size=x.shape[1])
popt, pcov = curve_fit(func, x, zn)
Try passing the first two array arguments to func as a tuple, and modify func to accept a tuple.
Normally curve_fit expects a model func(x) with a single independent variable x, plus the measured y values to fit against. Since in your example the independent variable is not a single value but two, you have to modify your function so that it accepts x as a single argument and unpacks it internally.
Generally speaking, three dimensional curve fitting should be handled in a different manner from what you are trying to achieve. You can take a look into the following SO post which tried to fit a three dimensional scatter with a line.
>>> def func(X, a, b, c):
...     x, y = X  # tuple parameters in the signature are Python 2 only
...     return a*np.exp(-b*(x+y)) + c
>>> y = x = np.linspace(0,4,50)
>>> z = func((x,y), 2.5, 1.3, 0.5) #works ok
>>> zn = z + 0.2*np.random.normal(size=len(x))
>>> popt, pcov = curve_fit(func, (x,y), zn)
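The same packing idea carries over to the more common case where x and y are different and sampled on a grid. The following is only a minimal sketch under stated assumptions: the names `model`, `xy`, `gx` and `gy`, the grid size, the noise level and the starting values are all made up for illustration and are not part of the question.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(xy, a, b, c):
    # xy is a (2, N) array: row 0 holds x, row 1 holds y
    x, y = xy
    return a * np.exp(-b * (x + y)) + c

# Build a 2-D grid and flatten it so that every sample is one column of xy
gx, gy = np.meshgrid(np.linspace(0, 2, 15), np.linspace(0, 2, 15))
xy = np.vstack([gx.ravel(), gy.ravel()])

# Synthetic data with a small amount of noise (assumed values)
rng = np.random.default_rng(0)
z = model(xy, 2.5, 1.3, 0.5) + 0.01 * rng.normal(size=xy.shape[1])

popt, pcov = curve_fit(model, xy, z, p0=(1.0, 1.0, 0.0))
print(popt)
```

The dependent variable must be flattened in the same order as the grid, so that entry i of z belongs to column i of xy.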
I am trying to fit 3-dimensional data (that is, 2 independent and 1 dependent variable) using multivariate fitting in scipy curve_fit. I wish to do piecewise fitting for the same problem. I have tried to proceed on the basis of this without any success. The problem is defined below:
import numpy as np
from scipy.optimize import curve_fit
#..........................................................................................................
def F0(X, a, b, c, c0, y0):
    x, y = X
    value = []
    for i in range(0, len(x)):
        if y[i] < y0:
            lnZ = x[i] + c0*y[i]
        else:
            lnZ = x[i] + c*y[i]
        val = a + (b*lnZ)
        value.append(val)
    return value
#..........................................................................................................
def F1(X, a, b, c):
    x, y = X
    lnZ = x + c*y
    value = a + (b*lnZ)
    return value
#..........................................................................................................
x = [-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
0,
0,
0,
0,
0,
0,
0,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093
]
y = [7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04,
7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04,
7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04
]
z = [4.077424497,
4.358253892,
4.610475878,
4.881769469,
5.153063061,
5.323277142,
5.462023074,
4.610475878,
4.840765517,
5.04864602,
5.235070966,
5.351407761,
5.440090728,
5.540693448,
4.960439843,
5.118257381,
5.266539115,
5.370479367,
5.440090728,
5.528296904,
5.5816974,
]
popt, pcov = curve_fit(F0, (x, y), z, method = 'lm')
print(popt)
popt, pcov = curve_fit(F1, (x, y), z, method = 'lm')
print(popt)
The output is:
[1.34957781e+00 1.05456428e-01 1.00000000e+00 4.14879613e+04
1.00000000e+00]
[1.34957771e+00 1.05456434e-01 4.14879603e+04]
You can see that the values of parameters in the piecewise fitting remain as the initial values. I know I am not doing it in the correct way. Please correct me.
The main source of the problem is the insensitivity of this approach to the value of the variable that defines the switch from one function to another (see this response for a similar explanation). Moreover, the choice of starting parameters isn't good.
Since no starting values are provided, curve_fit chooses a value of 1 for all the fitting parameters (see here the default value for p0). Since the fitting algorithm works by making small variations on the parameters, y0 is varied in small steps around 1, which produces no changes in the output of the function (all y values are much smaller than 1). Because y[i] < y0 is always True, only the first branch is ever evaluated, and the output of the function does not depend on the value of c. That explains why y0 and c stay at their initial values.
One might expect that setting the initial value of y0 inside the range of the evaluated y values (i.e. around 8E-4) would solve the problem. Indeed, since the second branch is now evaluated, the value of c is optimized. Nevertheless, the value of y0 stays unchanged. Because the fitting algorithm works by testing very small changes to the parameter values, the changes are not large enough to move y0 from one interval between two experimental y values to another. In this particular case, if one chooses 8E-4, the small variations will never be enough to push it above 8.17E-04 or below 7.85E-4, the values that bracket the initial choice of y0.
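The insensitivity to y0 can be checked directly. The sketch below is illustrative only: `piecewise` is a hypothetical helper that keeps just the y-dependent branch of F0, the values of c and c0 are arbitrary, and the relative step of 1e-8 is an assumed order of magnitude for a finite-difference step, not a value taken from the curve_fit source.

```python
import numpy as np

def piecewise(y, c, c0, y0):
    # Same branching as F0, reduced to the y-dependent part
    return np.where(y < y0, c0 * y, c * y)

# The seven distinct y values from the question
y = np.array([7.55e-4, 7.85e-4, 8.17e-4, 8.52e-4, 8.90e-4, 9.32e-4, 9.77e-4])

base = piecewise(y, 2.0, 3.0, 8.0e-4)
# Nudge y0 by a tiny relative amount, as a numerical derivative would
nudged = piecewise(y, 2.0, 3.0, 8.0e-4 * (1 + 1e-8))

# No sample changes branch, so the output is identical and the
# estimated derivative with respect to y0 is exactly zero
print(np.array_equal(base, nudged))
```

With a zero derivative, the optimizer has no information about which direction to move y0, so it leaves the parameter where it started.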
One can usually circumvent this problem by making the function depend explicitly on the value of y0. A smart choice would be to redefine the function so that its value at y0 is the same no matter which branch is taken (i.e. ensure that the function is continuous). The original definition does not ensure this. A reasonable change would be:
def F2(X, a, b, c, c0, y0):
    x, y = X
    value = []
    for i in range(0, len(x)):
        lnZ = x[i] + c0 * y[i]
        if y[i] >= y0:
            lnZ += c * (y[i]-y0)
        val = a + (b*lnZ)
        value.append(val)
    return value
which changes the meaning of the parameter c, and limits the results to only continuous functions. In this case, the value of y0 is indeed the function turning point. Nevertheless, it yields the desired results:
popt2, pcov = curve_fit(F2, (x, y), z, p0=(1, 1, 1E4, 1E4, 9.1E-4), method = 'lm')
print(popt2)
results in:
[-1.93417968e-01 1.05456433e-01 -3.65740192e+04 5.97890809e+04
8.64354057e-04]
A better (pythonic) definition for the function avoids the for loop:
def F3(X, a, b, c, c0, y0):
    x, y = X
    lnZ = x + c0 * y
    idx = np.where(y>=y0)
    lnZ[idx] += c * (y[idx] - y0)
    rv = a + (b * lnZ)
    return rv
which will probably be much faster for larger datasets.
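The same continuous piecewise model can also be written without index assignment. This is only a sketch of one possible variant: the name `F4` is hypothetical, and the explicit conversion to float arrays is an added assumption so that the function also accepts plain lists when called directly.

```python
import numpy as np

def F4(X, a, b, c, c0, y0):
    # Hypothetical variant: np.maximum replaces the np.where index update
    x, y = (np.asarray(v, dtype=float) for v in X)
    # The extra slope c only contributes where y exceeds y0
    lnZ = x + c0 * y + c * np.maximum(y - y0, 0.0)
    return a + b * lnZ

# Direct call with lists, outside of curve_fit
print(F4(([0.0, 1.0, 2.0], [1e-4, 5e-4, 9e-4]), 1.0, 2.0, 3.0, 4.0, 5e-4))
```

Because the term c * max(y - y0, 0) is zero at y = y0, continuity at the turning point is built into the expression.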
My ODE is given as Mx''+Lx'+f(x)=0 where f(x) is a polynomial function. Please look at my FULL CODE, where I define the differential equation in a function named 'diff'. Then I use 'odeint', which calls 'diff' along with the necessary arguments to solve the differential equation.
Now I consider f(x)=ax. Here I have to pass three parameters in total (M,L,a) as the argument to the 'diff' function. As a matter of fact the code works if I write: (see full code)
sol = odeint(diff, y0, t, args=(M,L, a))
But when f(x) is a polynomial up to the 10th power of 'x', the parameter list becomes too long. Therefore I want to put all the parameters in an array and then pass that array as an argument. I tried it this way:
def diff(y, t, inertia):
    M=inertia[0]
    L=inertia[1]
    a=inertia[2]
    x,v = y
    dydt = [v, (-L*v - a*x)/M]
    return dydt
M=5
L = 0.5
a = 5.0
Inertia=(M,L,a)
sol = odeint(diff, y0, t, args=Inertia)
But this approach doesn't work. It says 'TypeError: diff() takes 3 positional arguments but 5 were given'.
How can I make this approach work, or how can I send a list of parameters as an argument?
Full Code:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
def diff(y, t, M, L, a):
    x,v = y
    dydt = [v, (-L*v - a*x)/M]
    return dydt
M=5
L = 0.5
a = 5.0
#Inertia=(M,L,a)
#But I cant pass the 'Inertia' as an argument
y0 = [- 0.1, 0.0]
t = np.linspace(0, 10, 101)
sol = odeint(diff, y0, t, args=(M,L, a))
plt.plot(t, sol[:, 0], 'b', label='x(t)')
plt.plot(t, sol[:, 1], 'g', label='v(t)')
plt.legend(loc='best')
plt.show()
Inertia in this case is a tuple. odeint expects a tuple of arguments as its args parameter, so Inertia gets unpacked and the arguments to diff become y0, t, M, L, a. To circumvent this, you should pack Inertia in another tuple to make Inertia a single argument, like so:
sol = odeint(diff, y0, t, args=(Inertia,))
Note the , after Inertia: it is what makes the expression a tuple ((a) is just a, whereas (a,) is a one-element tuple containing a).
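For the polynomial case mentioned in the question, the same trick lets all coefficients travel in one object. The following is a sketch under assumptions, not the asker's code: the `params` layout (M, L, coeffs) and the use of np.polyval to evaluate f(x) are choices made here for illustration.

```python
import numpy as np
from scipy.integrate import odeint

def diff(y, t, params):
    # params = (M, L, coeffs); coeffs holds the polynomial coefficients
    # of f(x), highest power first, as np.polyval expects
    M, L, coeffs = params
    x, v = y
    return [v, (-L * v - np.polyval(coeffs, x)) / M]

params = (5.0, 0.5, [5.0, 0.0])   # f(x) = 5*x + 0, i.e. a = 5
y0 = [-0.1, 0.0]
t = np.linspace(0, 10, 101)

# The trailing comma makes args a one-element tuple holding params
sol = odeint(diff, y0, t, args=(params,))
print(sol.shape)
```

A tenth-order polynomial then only needs a longer coeffs list; the signature of diff stays the same.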
Your approach doesn't work because you have assigned inertia as a tuple instead of an array. Correct is inertia=[a,b,c].
Since args is unpacked when the function is called, the elements of your "array" get appended to the other arguments, and so the function receives 5 arguments.
I am trying to define a function of n variables to fit to a data set. The function looks like this.
Kelly Function
I then want to find the optimal ai's and bj's to fit my data set using scipy.optimize.leastsq
Here's my code so far.
from scipy.optimize import leastsq
import numpy as np
def kellyFunc(a, b, x): #Function to fit.
    top = 0
    bot = 0
    a = [a]
    b = [b]
    for i in range(len(a)):
        top = top + a[i]*x**(2*i)
        bot = bot + b[i]*x**(2*i)
    return(top/bot)
def fitKelly(x, y, n):
    line = lambda params, x : kellyFunc(params[0,:], params[1,:], x) #Lambda Function to minimize
    error = lambda params, x, y : line(params, x) - y #Kelly - dataset
    paramsInit = [[1 for x in range(n)] for y in range(2)] #define all ai and bi = 1 for initial guess
    paramsFin, success = leastsq(error, paramsInit, args = (x,y)) #run leastsq optimization
    #line of best fit
    xx = np.linspace(x.min(), x.max(), 100)
    yy = line(paramsFin, xx)
    return(paramsFin, xx, yy)
At the moment it's giving me the error:
"IndexError: too many indices" because of the way I've defined my initial lambda function with params[0,:] and params[1,:].
There are a few problems with your approach that warrant a full answer.
As for your specific question: leastsq doesn't really expect multidimensional arrays as parameter input. The documentation doesn't make this clear, but parameter inputs are flattened when passed to the objective function. You can verify this by using full functions instead of lambdas:
from scipy.optimize import leastsq
import numpy as np
def kellyFunc(a, b, x): #Function to fit.
    top = 0
    bot = 0
    for i in range(len(a)):
        top = top + a[i]*x**(2*i)
        bot = bot + b[i]*x**(2*i)
    return(top/bot)
def line(params,x):
    print(repr(params)) # params is 1d!
    params = params.reshape(2,-1) # need to reshape back
    return kellyFunc(params[0,:], params[1,:], x)
def error(params,x,y):
    print(repr(params)) # params is 1d!
    return line(params, x) - y # pass it on, reshape in line()
def fitKelly(x, y, n):
    #paramsInit = [[1 for x in range(n)] for y in range(2)] #define all ai and bi = 1 for initial guess
    paramsInit = np.ones((2,n)) #better; same (2,n) layout that line() restores
    paramsFin, success = leastsq(error, paramsInit, args = (x,y)) #run leastsq optimization
    #line of best fit
    xx = np.linspace(x.min(), x.max(), 100)
    yy = line(paramsFin, xx)
    return(paramsFin, xx, yy)
Now, as you see, the shape of the params array is (2*n,) instead of (2,n). By doing the re-reshape ourselves, your code (almost) works. Of course the print calls are only there to show you this fact; they are not needed for the code to run (and will produce a bunch of needless output in each iteration).
See my other changes, related to other errors: you had a=[a] and b=[b] in your kellyFunc, for no good reason. This turned the input arrays into lists containing arrays, which made the next loop do something very different from what you intended.
Finally, the sneakiest error: you have input variables named x, y in fitKelly, then you use x and y as loop variables in a list comprehension. Please be aware that this only works as you expect it to in python 3; in python 2 the loop variables of list comprehensions actually leak into the enclosing scope, overwriting your input variables named x and y.
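The flattening can also be observed without printing on every iteration. The sketch below uses a made-up cubic model rather than the Kelly function, purely to keep the fit well behaved; the names `residuals` and `seen_shapes` are hypothetical.

```python
import numpy as np
from scipy.optimize import leastsq

seen_shapes = set()

def residuals(params, x, y):
    # Record the shape that leastsq actually hands to the objective
    seen_shapes.add(np.shape(params))
    a, b = np.reshape(params, (2, -1))   # restore the (2, n) layout
    # Illustrative model: a and b each hold two polynomial coefficients
    return a[0] + a[1] * x + b[0] * x**2 + b[1] * x**3 - y

x = np.linspace(0, 1, 20)
y = 1 + 2 * x + 3 * x**2 + 4 * x**3

p_init = np.ones((2, 2))                 # 2-D initial guess
p_fit, ier = leastsq(residuals, p_init, args=(x, y))

print(seen_shapes)   # shapes seen by the objective
print(p_fit.shape)   # shape of the returned parameters
```

Both the parameters seen inside the objective and the returned solution are 1-D here, so any 2-D layout has to be restored by the caller.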
I'm trying to design a fitting procedure for my purposes and want to use scipy's differential evolution algorithm as a general estimator of initial values, which will then be used in the LM algorithm for a better fit. The function I want to minimize with DE is the least squares between an analytically defined non-linear function and some experimental values. The point at which I am stuck is the function design. As stated in the scipy reference: "function must be in the form f(x, *args) , where x is the argument in the form of a 1-D array and args is a tuple of any additional fixed parameters needed to completely specify the function"
Here is an ugly example of code which I wrote just for illustrative purposes:
def func(x, *args):
    """args[0] = x
    args[1] = y"""
    result = 0
    for i in range(len(args[0][0])):
        result += (x[0]*(args[0][0][i]**2) + x[1]*(args[0][0][i]) + x[2] - args[0][1][i])**2
    return result**0.5
if __name__ == '__main__':
    bounds = [(1.5, 0.5), (-0.3, 0.3), (0.1, -0.1)]
    x = [0,1,2,3,4]
    y = [i**2 for i in x]
    args = (x, y)
    result = differential_evolution(func, bounds, args=args)
    print(func(bounds, args))
I wanted to supply the raw data as a tuple to the function, but it seems that this is not how it is supposed to be done, since the interpreter isn't happy with the function. The problem should be easily solvable, but I am really frustrated, so advice will be much appreciated.
This is a fairly straightforward solution which shows the idea; the code isn't very pythonic, but for simplicity I think it's good enough. As an example, we want to fit an equation of the kind y = ax^2 + bx + c to data obtained from the equation y = x^2. Obviously a = 1, and b and c should equal 0. Since the differential evolution algorithm finds the minimum of a function, we minimize the square root of the summed squared deviations (again, for simplicity) between the general equation (y = ax^2 + bx + c) evaluated with given parameters (constrained by some initial bounds) and the "experimental" data. So, to the code:
from scipy.optimize import differential_evolution
def func(parameters, *data):
    #we have 3 parameters which will be passed as parameters and
    #"experimental" x,y which will be passed as data
    a,b,c = parameters
    x,y = data
    result = 0
    for i in range(len(x)):
        result += (a*x[i]**2 + b*x[i]+ c - y[i])**2
    return result**0.5
if __name__ == '__main__':
    #search bounds for each parameter, given as (min, max)
    #            a            b            c
    bounds = [(0.5, 1.5), (-0.3, 0.3), (-0.1, 0.1)]
    #producing "experimental" data
    x = [i for i in range(6)]
    y = [xi**2 for xi in x]
    #packing "experimental" data into args
    args = (x,y)
    result = differential_evolution(func, bounds, args=args)
    print(result.x)
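For larger data sets the loop in the objective can be replaced by array arithmetic. This is a sketch only: the name `deviation` is hypothetical, the bounds are written as (min, max) pairs, and the objective is the same square root of the summed squared deviations as above.

```python
import numpy as np
from scipy.optimize import differential_evolution

def deviation(parameters, x, y):
    # parameters is the 1-D array proposed by the optimizer;
    # x and y arrive through args as fixed data
    a, b, c = parameters
    return np.sqrt(np.sum((a * x**2 + b * x + c - y) ** 2))

x = np.arange(6, dtype=float)
y = x**2                                  # "experimental" data
bounds = [(0.5, 1.5), (-0.3, 0.3), (-0.1, 0.1)]

result = differential_evolution(deviation, bounds, args=(x, y))
print(result.x)
```

Declaring the data as named arguments (x, y) instead of *data is equivalent here, because differential_evolution calls the objective as deviation(parameters, *args).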
I am trying to write a curve fitting function which returns the optimal parameters a, b and c, here is a simplified example:
import numpy
import scipy
from scipy.optimize import curve_fit
def f(x, a, b, c):
    return x * 2*a + 4*b - 5*c
xdata = numpy.array([1,3,6,8,10])
ydata = numpy.array([ 0.91589774, 4.91589774, 10.91589774, 14.91589774, 18.91589774])
popt, pcov = scipy.optimize.curve_fit(f, xdata, ydata)
This works fine, but I want to give the user a chance to supply some (or none) of the parameters a, b or c, in which case they should be treated as constants and not estimated. How can I write f so that it fits only the parameters not supplied by the user?
Basically, I need to define f dynamically with the correct arguments. For instance if a was known by the user, f becomes:
def f(x, b, c):
    a = global_version_of_a
    return x * 2*a + 4*b - 5*c
Taking a page from the collections.namedtuple playbook, you can use exec to "dynamically" define func:
import numpy as np
import scipy.optimize as optimize
import textwrap
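funcstr=textwrap.dedent('''\
def func(x, {p}):
    return x * 2*a + 4*b - 5*c
''')
def make_model(**kwargs):
    # sorted() gives the free parameters a deterministic order
    params=sorted(set(('a','b','c')).difference(kwargs.keys()))
    # exec is a function in Python 3; kwargs serves as func's global namespace
    exec(funcstr.format(p=','.join(params)), kwargs)
    return kwargs['func']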
func=make_model(a=3, b=1)
xdata = np.array([1,3,6,8,10])
ydata = np.array([ 0.91589774, 4.91589774, 10.91589774, 14.91589774, 18.91589774])
popt, pcov = optimize.curve_fit(func, xdata, ydata)
print(popt)
# [ 5.49682045]
Note the line
func=make_model(a=3, b=1)
You can pass whatever parameters you like to make_model. The parameters you pass to make_model become fixed constants in func. Whatever parameters remain become free parameters that optimize.curve_fit will try to fit.
For example, above, a=3 and b=1 become fixed constants in func. Actually, the exec statement places them in func's global namespace. func is thus defined as a function of x and the single parameter c. Note the return value for popt is an array of length 1 corresponding to the remaining free parameter c.
Regarding textwrap.dedent: In the above example, the call to textwrap.dedent is unnecessary. But in a "real-life" script, where funcstr is defined inside a function or at a deeper indentation level, textwrap.dedent allows you to write
def foo():
    funcstr=textwrap.dedent('''\
        def func(x, {p}):
            return x * 2*a + 4*b - 5*c
        ''')
instead of the visually unappealing
def foo():
    funcstr='''\
def func(x, {p}):
    return x * 2*a + 4*b - 5*c
'''
Some people prefer
def foo():
    funcstr=(
        'def func(x, {p}):\n'
        '    return x * 2*a + 4*b - 5*c'
        )
but I find quoting each line separately and adding explicit EOL characters a bit onerous. It does save you a function call however.
I usually use a lambda for this purpose.
user_b, user_c = get_user_vals()
opt_fun = lambda x, a: f(x, a, user_b, user_c)
popt, pcov = scipy.optimize.curve_fit(opt_fun, xdata, ydata)
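The lambda approach can be generalized to an arbitrary subset of fixed parameters with a closure. This is a sketch under assumptions: `fix_params` is a hypothetical helper, not part of scipy, and the data are synthetic values generated from f itself so that the expected result is known.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return x * 2*a + 4*b - 5*c

def fix_params(func, names, **fixed):
    # Hypothetical helper: returns a wrapper that takes only the free
    # parameters, plus the list of their names in call order
    free = [n for n in names if n not in fixed]
    def wrapper(x, *values):
        kwargs = dict(zip(free, values), **fixed)
        return func(x, **kwargs)
    return wrapper, free

xdata = np.array([1, 3, 6, 8, 10], dtype=float)
ydata = f(xdata, 1.0, 2.0, 0.5)           # synthetic, noise-free data

model, free = fix_params(f, ("a", "b", "c"), b=2.0)
# wrapper uses *values, so curve_fit cannot infer the number of
# parameters from the signature: p0 must be given explicitly
popt, pcov = curve_fit(model, xdata, ydata, p0=np.ones(len(free)))
print(dict(zip(free, popt)))
```

Note that in this particular f the parameters b and c only enter through the combination 4*b - 5*c, so leaving both of them free does not give a unique solution; fixing one of them, as done here, avoids that.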
If you want a simple solution based on curve_fit, I'd suggest that you wrap your function in a class. Minimal example:
import numpy
from scipy.optimize import curve_fit
class FitModel(object):
    def f(self, x, a, b, c):
        return x * 2*a + 4*b - 5*c
    def f_a(self, x, b, c):
        return self.f(x, self.a, b, c)
# user supplies a = 1.0
fitModel = FitModel()
fitModel.a = 1.0
xdata = numpy.array([1,3,6,8,10])
ydata = numpy.array([ 0.91589774, 4.91589774, 10.91589774, 14.91589774, 18.91589774])
initial = (1.0,2.0)
popt, pconv = curve_fit(fitModel.f_a, xdata, ydata, initial)
There is already a package that does this:
https://lmfit.github.io/lmfit-py/index.html
From the README:
"LMfit-py provides a Least-Squares Minimization routine and class
with a simple, flexible approach to parameterizing a model for
fitting to data. Named Parameters can be held fixed or freely
adjusted in the fit, or held between lower and upper bounds. In
addition, parameters can be constrained as a simple mathematical
expression of other Parameters."
def f(x, a = 10, b = 15, c = 25):
    return x * 2*a + 4*b - 5*c
If the user doesn't supply an argument for the parameter in question, whatever you specified on the right-hand side of the = sign will be used:
e.g.:
f(5, b = 20) will evaluate to return 5 * 2*10 + 4*20 - 5*25 and
f(7) will evaluate to return 7 * 2*10 + 4*15 - 5*25