Is it possible to concatenate scipy.optimize.curve_fit with scipy.optimize.bisect (or fsolve, or whatever) for implicit scalar functions?
In practice, have a look at this Python code where I try to define an implicit function and pass it to curve_fit to obtain the best fit for a parameter:
import numpy as np
import scipy.optimize as opt
import scipy.special as spc
# Estimate of initial parameter (not really important for this example)
fact, _, _, _ = spc.airy(-1.0188)
par0 = -np.log(2.0*fact*(18**(1.0/3.0))*np.pi*1e-6)
# Definition of an implicit parametric function f(c,t;b)=0
def func_impl(c, t, p) :
    return ( c - ((t**3)/9.0) / ( np.log(t*(c**(1.0/3.0))) + p ) )
# definition of the function I believe should be passed to curve_fit
def func_egg(t, p) :
    x_st, _ = opt.bisect( lambda x : func_impl(x, t, p), a=0.01, b=0.3 )
    return x_st
# Some data points
t_data = np.deg2rad(np.array([95.0, 69.1, 38.8, 14.7]))
c_data = np.array([0.25, 0.10, 0.05, 0.01])
# Call to curve_fit
popt, pcov = opt.curve_fit(func_egg, t_data, c_data, p0=par0)
b = popt[0]
Now, I am aware of all the things that may go wrong when trying to automatically find roots (although bisection should be stable, provided there's a root between a and b); however, the error I get seems to concern the dimensionality of the output of func_impl:
Traceback (most recent call last):
File "example_fit.py", line 23, in <module>
popt, pcov = opt.curve_fit(func_egg, t_data, c_data, p0=par0)
File "/usr/local/lib/python3.7/site-packages/scipy/optimize/minpack.py", line 752, in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File "/usr/local/lib/python3.7/site-packages/scipy/optimize/minpack.py", line 383, in leastsq
shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
File "/usr/local/lib/python3.7/site-packages/scipy/optimize/minpack.py", line 26, in _check_func
res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
File "/usr/local/lib/python3.7/site-packages/scipy/optimize/minpack.py", line 458, in func_wrapped
return func(xdata, *params) - ydata
File "example_fit.py", line 15, in func_egg
x_st, _ = opt.bisect( lambda x : func_impl(x, t, p), a=0.01, b=0.3 )
File "/usr/local/lib/python3.7/site-packages/scipy/optimize/zeros.py", line 550, in bisect
r = _zeros._bisect(f, a, b, xtol, rtol, maxiter, args, full_output, disp)
File "example_fit.py", line 15, in <lambda>
x_st, _ = opt.bisect( lambda x : func_impl(x, t, p), a=0.01, b=0.3 )
File "example_fit.py", line 11, in func_impl
return ( c - ((t**3)/9.0) / ( np.log(t*(c**(1.0/3.0))) + p ) )
TypeError: only size-1 arrays can be converted to Python scalars
My guess is that curve_fit basically treats the output of the input function as a vector having the same dimensionality of the input data; I thought I could easily work around this by 'vectorizing' the implicit function, or func_egg, although it does not seem as trivial as I thought.
Am I missing something?
Is there a simple workaround?
I guess I end up answering my own question. I hope this could be useful to others.
Let's first choose a simpler implicit function, in this case, f(c,t;b)=c-b*t^3 (the reason will be clarified later):
import numpy as np
import scipy.optimize as opt
import scipy.special as spc
import matplotlib.pyplot as plt
# Definition of an implicit parametric function f(c,t;b)=0
def func_impl(c, t, p) :
    return (c-p*t**3)
Let's vectorize it:
v_func_impl = np.vectorize(func_impl)
The same script as the one in the question, but now (1) func_egg is vectorized, and (2) I use newton instead of bisect (I found it easier to provide x0 instead of [a,b]):
# Definition of the function I believe should be passed to curve_fit
def func_egg(t, p) :
    x_st = opt.newton( lambda x : func_impl(x, t, p), x0=0.05 )
    return x_st
v_func_egg = np.vectorize(func_egg)
# Some data points
t_data = np.deg2rad(np.array([127.0, 95.0, 69.1, 38.8]))
c_data = np.array([0.6, 0.25, 0.10, 0.05])
# Call to curve_fit
par0 = 0.05
popt, pcov = opt.curve_fit(v_func_egg, t_data, c_data, p0=par0)
b = popt[0]
Now it works!
plt.plot(t_data, c_data)
plt.plot(np.linspace(0.5, 2.5), b*np.linspace(0.5, 2.5)**3)
plt.show()
So, in essence:
In order to concatenate scipy curve-fitting and root-finding one needs to ensure that each function is vectorized (or can deal with numpy arrays as input and output).
Make sure that your function is not 'too ugly', otherwise even if the concatenation works the root-finding procedure itself may not be able to find a result (this goes into numerical mathematics; I should have checked the regularity of my original function).
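For reference, here is a minimal sketch of the same idea with a bracketing solver instead of newton. It assumes the simple implicit function above, synthetic noise-free data generated with b = 0.05, and a bracket [-10, 10] picked by hand for this data; func_explicit, the bracket, and the synthetic data are illustrative choices and not part of the original script.

```python
import numpy as np
from scipy.optimize import brentq, curve_fit

# Implicit relation f(c, t; p) = 0, same simple form as above
def func_impl(c, t, p):
    return c - p * t**3

# Explicit model for curve_fit: solve f(c, t_i; p) = 0 for each t_i.
# The bracket [-10, 10] is an assumption that holds for this data only.
def func_explicit(t, p):
    t = np.atleast_1d(t)
    return np.array([brentq(func_impl, -10.0, 10.0, args=(ti, p)) for ti in t])

# Synthetic, noise-free data (hypothetical, generated with p = 0.05)
t_data = np.deg2rad(np.array([127.0, 95.0, 69.1, 38.8]))
c_data = 0.05 * t_data**3

popt, pcov = curve_fit(func_explicit, t_data, c_data, p0=[0.1])
print(popt[0])
```

The explicit loop does the same job as np.vectorize, and a bracketing solver fails with a clear error if the root leaves the bracket during the fit.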
I am trying to call scipy curve_fit(), with the proper:
model function
xdata (float numpy 1D Array)
ydata (float numpy 1D Array)
p (float numpy 1D Array, initial values)
However I am getting the error:
ValueError: Object too deep for desired Array
Result from function Call is not a proper array of floats.
The model function I am computing is:
[Image: the mathematical expression that model_f computes, from which we are trying to find the optimal alpha and gamma]
The function model_f computes the mathematical expression shown in the picture.
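import csv
import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
with open("Data_case_3.csv",'r') as i: #open a file in directory of this script for reading
    rawdata = list(csv.reader(i,delimiter=",")) #make a list of data in file
exampledata = np.array(rawdata[1:],dtype=float) #convert to data array (np.float was removed in NumPy 1.24)
xdata = exampledata[:,0]
ydata = exampledata[:,1]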
m = 0.5
omega0 = 34.15
k = np.square(omega0)*m
def model_f(x,a,g):
    zetaeq = (a*np.sqrt(np.pi)*(x**(g-1))*omega0*math.gamma(g/2))/(2*np.pi*k*math.gamma((3+g)/2))
    return zetaeq
#------------------------------------------------------------------------------
funcdata = model_f(xdata,0.3,0.1)
plt.plot(xdata,funcdata,label="Model")
plt.legend()
popt, pcov = curve_fit(model_f, xdata, ydata, p0=[0.3,0.1])
And I am attaching the data types of the variables mentioned:
[Image: variable types and shapes of the script]
Can you help me understand what I am doing wrong?
Compare these 2 calls to curve_fit:
In [217]: xdata, ydata = np.ones(5), np.ones(5)
In [218]: curve_fit(model_f, xdata, ydata, p0=[0.3, 0.1])
Out[218]:
(array([0.74436049, 0.02752099]),
array([[2.46401533e-16, 9.03501810e-18],
[9.03501810e-18, 3.31294823e-19]]))
and
In [219]: xdata, ydata = np.ones((5,1)), np.ones((5,1))
In [220]: curve_fit(model_f, xdata, ydata, p0=[0.3, 0.1])
ValueError: object too deep for desired array
Traceback (most recent call last):
Input In [220] in <module>
curve_fit(model_f, xdata, ydata, p0=[0.3, 0.1])
File /usr/local/lib/python3.8/dist-packages/scipy/optimize/_minpack_py.py:789 in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File /usr/local/lib/python3.8/dist-packages/scipy/optimize/_minpack_py.py:423 in leastsq
retval = _minpack._lmdif(func, x0, args, full_output, ftol, xtol,
error: Result from function call is not a proper array of floats.
Which is closer to your experience?
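If the second case matches what you see, one possible fix is to flatten the data to 1-D before calling curve_fit. A minimal sketch, assuming column-shaped (n, 1) inputs; the power-law model and the synthetic data are stand-ins invented for this example and are not the model_f from the question:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical stand-in model: y = a * x**b
def model(x, a, b):
    return a * x**b

# Column-shaped data, as obtained e.g. from a reshape or a matrix slice
x2d = np.linspace(1.0, 5.0, 20).reshape(-1, 1)
y2d = 2.0 * x2d**0.5

# np.ravel turns the (20, 1) columns into plain 1-D arrays of length 20
popt, pcov = curve_fit(model, x2d.ravel(), y2d.ravel(), p0=[1.0, 1.0])
print(popt)
```

Checking xdata.shape and ydata.shape right before the call is usually the quickest way to tell which case applies.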
I have the following code:
import numpy as np
import scipy.integrate as spi
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as mh
def GUFunction(z, Omega_Lambda):
    integral = spi.quad(lambda zvar: AuxIntegrandum(zvar, Omega_Lambda), 0.0, z)[0]
    DL = (1+z) * c/H0 * integral *1000000
    return (5*(mh.log(DL,10)-1))
def AuxIntegrandum(z, Omega_Lambda):
    Omega_m = 1 - Omega_Lambda
    return 1 / mh.sqrt(Omega_m*(1+z)**3 + Omega_Lambda)
def DataFit(filename):
    print curve_fit(GUFunction, ComputeData(filename)[0], ComputeData(filename)[1])
DataFit("data.dat")
data.dat has z values in the first column and GUF(z) values in the second column.
Upon executing this code, the interpreter tells me that comparing an array to a value (+inf or -inf) is ambiguous.
I think this refers to the integration boundaries, where it looks to see if I want to integrate to infinity. For some reason it apparently puts all z-values from the data file into the integration boundary.
Is there some trick I don't know about which allows you to fit a curve to a numerically integrated function?
Here's the exact error:
Traceback (most recent call last):
File "plot.py", line 83, in <module>
DataFit("data.dat")
File "plot.py", line 67, in DataFit
print curve_fit(GUFunction, ComputeData(filename)[0], ComputeData(filename)[1])
File "/home/joshua/anaconda2/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 736, in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File "/home/joshua/anaconda2/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 377, in leastsq
shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
File "/home/joshua/anaconda2/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 26, in _check_func
res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
File "/home/joshua/anaconda2/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 454, in func_wrapped
return func(xdata, *params) - ydata
File "plot.py", line 57, in GUFunction
integral = spi.quad(lambda zvar: AuxIntegrandum(zvar, Omega_Lambda), 0.0, z)[0]
File "/home/joshua/anaconda2/lib/python2.7/site-packages/scipy/integrate/quadpack.py", line 323, in quad
points)
File "/home/joshua/anaconda2/lib/python2.7/site-packages/scipy/integrate/quadpack.py", line 372, in _quad
if (b != Inf and a != -Inf):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Short answer: curve_fit tries to evaluate the target function on an array of xdata, but quad cannot accept a vector argument. You need to define your target function via e.g. a list comprehension over an input array.
Let's cook up a minimum reproducible example:
In [33]: xdata = np.linspace(0, 3, 11)
In [34]: ydata = xdata**3
In [35]: def integr(x):
    ...:     return quad(lambda t: t**2, 0, x)[0]
    ...:
In [36]: def func(x, a):
    ...:     return integr(x) * a
    ...:
In [37]: curve_fit(func, xdata, ydata)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-37-4660c65f85a2> in <module>()
----> 1 curve_fit(func, xdata, ydata)
[... removed for clarity ...]
~/virtualenvs/py35/lib/python3.5/site-packages/scipy/integrate/quadpack.py in _quad(func, a, b, args, full_output, epsabs, epsrel, limit, points)
370 def _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points):
371 infbounds = 0
--> 372 if (b != Inf and a != -Inf):
373 pass # standard integration
374 elif (b == Inf and a != -Inf):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Which is precisely the error you're seeing. OK, the error comes from quad: curve_fit evaluates func(xdata, a), which boils down to integr(xdata), and quad cannot take an array as an integration limit. (How did I find out? I put import pdb; pdb.set_trace() inside func and poked around in the debugger.)
Then, let's make the target function handle array arguments:
In [38]: def func2(x, a):
    ...:     return np.asarray([integr(xx) for xx in x]) * a
...:
In [39]: curve_fit(func2, xdata, ydata)
Out[39]: (array([ 3.]), array([[ 3.44663413e-32]]))
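An alternative sketch of the same fix, assuming you prefer np.vectorize over an explicit list comprehension; integr_vec and func3 are names made up for this illustration. Note that np.vectorize is still a Python-level loop, so it is a convenience and not a speed-up.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import curve_fit

# Scalar-only integral, same as in the example above
def integr(x):
    return quad(lambda t: t**2, 0.0, x)[0]

# Wrap the scalar function so that it accepts arrays elementwise
integr_vec = np.vectorize(integr)

def func3(x, a):
    return a * integr_vec(x)

xdata = np.linspace(0, 3, 11)
ydata = xdata**3

popt, pcov = curve_fit(func3, xdata, ydata)
print(popt)
```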
While trying to create an example with scipy.optimize curve_fit I found that scipy seems to be incompatible with Python's math module. While function f1 works fine, f2 throws an error message.
import numpy as np
from scipy.optimize import curve_fit
from math import sin, pi, log, exp, floor, fabs, pow
x_axis = np.asarray([pi * i / 6 for i in range(-6, 7)])
y_axis = np.asarray([sin(i) for i in x_axis])
def f1(x, m, n):
    return m * x + n
coeff1, mat = curve_fit(f1, x_axis, y_axis)
print(coeff1)
def f2(x, m, n):
    return m * sin(x) + n
coeff2, mat = curve_fit(f2, x_axis, y_axis)
print(coeff2)
The full traceback is
Traceback (most recent call last):
File "/Documents/Programming/Eclipse/PythonDevFiles/so_test.py", line 49, in <module>
coeff2, mat = curve_fit(f2, x_axis, y_axis)
File "/usr/local/lib/python3.5/dist-packages/scipy/optimize/minpack.py", line 742, in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/scipy/optimize/minpack.py", line 377, in leastsq
shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
File "/usr/local/lib/python3.5/dist-packages/scipy/optimize/minpack.py", line 26, in _check_func
res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
File "/usr/local/lib/python3.5/dist-packages/scipy/optimize/minpack.py", line 454, in func_wrapped
return func(xdata, *params) - ydata
File "/Documents/Programming/Eclipse/PythonDevFiles/so_test.py", line 47, in f2
return m * sin(x) + n
TypeError: only length-1 arrays can be converted to Python scalars
The error appears with lists and numpy arrays as input alike. It affects all the math functions I tested (see the imports above) and must have something to do with how the math module handles its input. This is most obvious with pow(): if I don't import it from math, curve_fit works properly with the built-in pow().
The obvious question: why does this happen, and how can math functions be used with curve_fit?
P.S.: Please don't point out that one shouldn't fit the sample data with a linear function. It was chosen only to illustrate the problem.
Be careful to distinguish numpy arrays, operations that work on arrays, and operations that work only on scalars!
Scipy's optimizers assume the input (the initial point) is a 1-D array, and things often go wrong otherwise (a list, for example, is converted to an array, and code that assumed list semantics then breaks; such problems are common here on StackOverflow and are hard to spot by eye, so stepping through the code interactively helps!).
import numpy as np
import math
x = np.ones(1)
np.sin(x)
> array([0.84147098])
math.sin(x)
> 0.8414709848078965 # this only works as numpy has dedicated support
# as indicated by the error-msg below!
x = np.ones(2)
np.sin(x)
> array([0.84147098, 0.84147098])
math.sin(x)
> TypeError: only size-1 arrays can be converted to Python scalars
To be honest: this is part of a very basic understanding of numpy and should be understood when using scipy's somewhat sensitive functions.
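Applied to the question's f2, the fix is a one-line change: use np.sin, which operates elementwise on arrays. A minimal sketch, assuming the same sample data as in the question (rebuilt here with numpy instead of a list comprehension):

```python
import numpy as np
from scipy.optimize import curve_fit

# Same sample points as in the question, built with numpy
x_axis = np.pi * np.arange(-6, 7) / 6
y_axis = np.sin(x_axis)

def f2(x, m, n):
    # np.sin accepts the whole array that curve_fit passes in
    return m * np.sin(x) + n

coeff2, mat = curve_fit(f2, x_axis, y_axis)
print(coeff2)
```

The same substitution works for the other functions in the import list, for example np.log, np.exp, np.floor, np.fabs and np.power.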
I am completely new to Python, and in fact to any general-purpose programming language; I use Mathematica for all my symbolic and numeric calculations. I am learning to work with Python and finding it really awesome! Here is a problem I am trying to solve, but I am stuck without a clue.
I have a data file for example
0. 1.
0.01 0.9998000066665778
0.02 0.9992001066609779
... ..
Which is just {t, Cos[2t]}.
I want to define a function out of this data and use it in solving an equation in python. My Mathematica intuition tells me that I should define the function like:
iFunc[x_] = Interpolation[iData, x]
and rest of the job is easy. for instance
NDSolve[{y''[x] + iFunc[x] y[x] == 0, y[0] == 1, y[1] == 0}, y, {x, 0, 1}]
Solves the equation easily. (I have not tried with more complicated cases though).
Now, how do I do the same job in Python? Accuracy is also an important issue for me. So I would like to ask two questions.
1. Is this the most accurate method in Mathematica?
2. What is the equivalent, or a more accurate, way to solve the problem in Python?
Here is my humble attempt to solve the problem (with a lot of input from StackOverflow) where the definition with cos(2t) works:
from scipy.integrate import odeint
import numpy as np
import matplotlib.pyplot as plt
from math import cos
from scipy import interpolate
data = np.genfromtxt('cos2t.dat')
T = data[:,0] #first column
phi = data[:,1] #second column
f = interpolate.interp1d(T, phi)
tmin = 0.0# There should be a better way to define from the data
dt = 0.01
tmax = 2*np.pi
t = np.arange(tmin, tmax, dt)
phinew = f(t) # use interpolation function returned by `interp1d`
"""
def fun(z, t):
    x, y = z
    return np.array([y, -(cos(2*t))*x ])
"""
def fun(z, t):
    x, y = z
    return np.array([y, -(phinew(t))*x ])
sol1 = odeint(fun, [1, 0], t)[..., 0]
# for checking the plots
plt.plot(t, sol1, label='sol')
plt.show()
When I run the code with the function interpolated from the cos(2t) data, it does not work. The error message is:
Traceback (most recent call last):
File "testde.py", line 30, in <module>
sol1 = odeint(fun, [1, 0], t)[..., 0]
File "/home/archimedes/anaconda3/lib/python3.6/site-packages/scipy/integrate/odepack.py", line 215, in odeint
ixpr, mxstep, mxhnil, mxordn, mxords)
File "testde.py", line 28, in fun
return np.array([y, -(phinew(t))*x ])
TypeError: 'numpy.ndarray' object is not callable
I really can't decipher it. Please help...
In Mathematica, the usual way is simply
iFunc = Interpolation[iData]
Interpolation[iData] already returns a function.
To sub-question 2
With
t = np.arange(tmin, tmax, dt)
phinew = f(t) # use interpolation function returned by `interp1d`
which is equivalent to
phinew = np.array([ f(s) for s in t])
you construct phinew not as a callable function but as an array of values, which closes the circle from array to interpolation function and back to array. Use f, which is a scalar function, directly in the derivatives function:
def fun(z, t):
    x, y = z
    return np.array([y, -f(t)*x ])
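Put together, a minimal self-contained sketch might look as follows. It makes a few assumptions that are not in the question: the data file is replaced by synthetic cos(2t) samples on [0, 7], cubic interpolation is used for accuracy, and fill_value="extrapolate" is set because odeint may evaluate the right-hand side slightly outside the output grid.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import interp1d

# Synthetic stand-in for the data file: samples of cos(2t) on [0, 7]
T = np.linspace(0.0, 7.0, 701)
phi = np.cos(2 * T)

# Callable interpolant; cubic is more accurate than the default linear kind
f = interp1d(T, phi, kind="cubic", fill_value="extrapolate")

def fun(z, t):
    x, y = z
    # f(t) returns a 0-d array for scalar t, so convert it to a float
    return [y, -float(f(t)) * x]

t = np.arange(0.0, 2 * np.pi, 0.01)
sol = odeint(fun, [1.0, 0.0], t)
sol1 = sol[:, 0]
```

With the real file, the sample range has to cover the whole integration interval, otherwise the interpolant is extrapolating rather than interpolating.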
I have a set of data that I am trying to fit to an ODE model using scipy's leastsq function. My ODE has parameters beta and gamma, so that it looks for example like this:
# dS/dt = -beta*S*I
# dI/dt = beta*S*I - gamma*I
# dR/dt = gamma*I
# with y0 = y(t=0) = (S(0),I(0),R(0))
The idea is to find beta and gamma so that the numerical integration of my system of ODE's best approximates the data. I am able to do this just fine using leastsq if I know all the points in my initial condition y0.
Now I am trying to do the same thing, but passing one of the entries of y0 as an extra parameter. Here is where Python and I stop communicating...
I wrote a function so that the first entry of the parameters that I pass to leastsq is the initial condition of my variable R.
I get the following message:
Traceback (most recent call last):
File "/Users/Laura/Dropbox/SHIV/shivmodels/test.py", line 73, in <module>
p1,success = optimize.leastsq(errfunc, initguess, args=(simpleSIR,[y0[0]],[Tx],[mydata]))
File "/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 283, in leastsq
gtol, maxfev, epsfcn, factor, diag)
TypeError: array cannot be safely cast to required type
Here is my code. It is a little more involved than it needs to be for this example, because in reality I want to fit another ODE with 7 parameters and fit to several data sets at once. But I wanted to post something simpler here... Any help will be very much appreciated! Thank you very much!
import numpy as np
from matplotlib import pyplot as plt
from scipy import optimize
from scipy.integrate import odeint
#define the time span for the ODE integration:
Tx = np.arange(0,50,1)
num_points = len(Tx)
#define a simple ODE to fit:
def simpleSIR(y,t,params):
    dydt0 = -params[0]*y[0]*y[1]
    dydt1 = params[0]*y[0]*y[1] - params[1]*y[1]
    dydt2 = params[1]*y[1]
    dydt = [dydt0,dydt1,dydt2]
    return dydt
#generate noisy data:
y0 = [1000.,1.,0.]
beta = 12*0.06/1000.0
gamma = 0.25
myparam = [beta,gamma]
sir = odeint(simpleSIR, y0, Tx, (myparam,))
mydata0 = sir[:,0] + 0.05*(-1)**(np.random.randint(num_points,size=num_points))*sir[:,0]
mydata1 = sir[:,1] + 0.05*(-1)**(np.random.randint(num_points,size=num_points))*sir[:,1]
mydata2 = sir[:,2] + 0.05*(-1)**(np.random.randint(num_points,size=num_points))*sir[:,2]
mydata = np.array([mydata0,mydata1,mydata2]).transpose()
#define a function that will run the ode and fit it, the reason I am doing this
#is because I will use several ODE's to see which one fits the data the best.
def fitfunc(myfun,y0,Tx,params):
    myfit = odeint(myfun, y0, Tx, args=(params,))
    return myfit
#define a function that will measure the error between the fit and the real data:
def errfunc(params,myfun,y0,Tx,y):
    """
    INPUTS:
    params are the parameters for the ODE
    myfun is the function to be integrated by odeint
    y0 vector of initial conditions, so that y(t0) = y0
    Tx is the vector over which integration occurs, since I have several data sets and each
    one has its own vector of time points, Tx is a list of arrays.
    y is the data, it is a list of arrays since I want to fit to multiple data sets at once
    """
    res = []
    for i in range(len(y)):
        V0 = params[0][i]
        myparams = params[1:]
        initCond = np.zeros([3,])
        initCond[:2] = y0[i]
        initCond[2] = V0
        myfit = fitfunc(myfun,initCond,Tx[i],myparams)
        res.append(myfit[:,0] - y[i][:,0])
        res.append(myfit[:,1] - y[i][:,1])
        res.append(myfit[1:,2] - y[i][1:,2])
    #end for
    all_residuals = np.hstack(res).ravel()
    return all_residuals
#end errfunc
#example of the problem:
V0 = [0]
params = [V0,beta,gamma]
y0 = [1000,1]
#this is just to test that my errfunc does work well.
errfunc(params,simpleSIR,[y0],[Tx],[mydata])
initguess = [V0,0.5,0.5]
p1,success = optimize.leastsq(errfunc, initguess, args=(simpleSIR,[y0[0]],[Tx],[mydata]))
The problem is with the variable initguess. The function optimize.leastsq has the following call signature:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html
Its second argument, x0, has to be an array. Your list
initguess = [V0,0.5,0.5]
won't be converted to a numeric array because V0 is a list instead of an int or float. So you get an error when leastsq tries to convert initguess from a list to an array.
I would adjust the variable params in
def errfunc(params,myfun,y0,Tx,y):
so that it is a 1-D array. Make the first few entries the values of V0, then append beta and gamma to that.
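A minimal sketch of that packing scheme, assuming one V0 entry per data set followed by beta and gamma; pack_params and unpack_params are hypothetical helper names introduced here, not scipy functions:

```python
import numpy as np

def pack_params(V0, beta, gamma):
    # Flatten the per-data-set initial conditions and the two rate
    # constants into a single 1-D float array, as leastsq expects.
    return np.concatenate([np.atleast_1d(V0), [beta, gamma]]).astype(float)

def unpack_params(params, n_sets):
    # Inverse of pack_params: the first n_sets entries are the V0 values,
    # the last two are beta and gamma.
    params = np.asarray(params, dtype=float)
    V0 = params[:n_sets]
    beta, gamma = params[n_sets:]
    return V0, beta, gamma

initguess = pack_params([0], 0.5, 0.5)
V0, beta, gamma = unpack_params(initguess, n_sets=1)
print(initguess.shape, V0, beta, gamma)
```

Inside errfunc you would then call something like unpack_params(params, len(y)) at the top and use V0[i] where the original code uses params[0][i].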