How to fit a function with an integral in Python

I have to fit a fairly complex function to data from an experiment, and so far I get values that make no sense. The model I need to fit is an integral over r multiplied by the exponential of a second integral over r (see the integrands in the code below). I have tried the following:
import numpy as np
from numpy import pi
import scipy.integrate
import scipy.optimize

R0 = 2.5e-9

def integrand1(r, args):
    t, W0, n = args
    return 4*pi*n*W0*np.exp(-2*r/R0)*np.exp(-W0*np.exp(-2*r/R0)*t)*r**2

def integrand2(r, args):
    t, W0, n = args
    return 4*pi*n*(np.exp(-W0*np.exp(-2*r/R0)*t)-1)*r**2

def fit(t, W0, n):
    res = scipy.integrate.quad(integrand1, 0.0, np.inf, [t, W0, n])*np.exp(scipy.integrate.quad(integrand2, 0, np.inf, [t, W0, n]))
    return res[0]

# t and y are the measured data arrays
vcurve = np.vectorize(fit, excluded=set([1]))

popt, pcov = scipy.optimize.curve_fit(vcurve, t, y, p0=[0, 0], bounds=((0, 0), (np.inf, 1)))
print(popt)
print(pcov)
I'm really unsure how to proceed: the code 'apparently' works, but the fit returns either infinity or 0 for the parameters, neither of which makes sense. I've never dealt with such a complicated function, so I assume I may be missing some steps that would help me prevent or fix this issue and make the fit work. Any help would be greatly appreciated!
Edit 1: This is the second attempt at the fit (using lmfit):
from numpy import exp, pi, inf
from scipy.integrate import quad
from lmfit import Model, Parameters

R0 = 2.5e-9

def W(W0, r):
    return W0*exp(-2*r/R0)

def I(t, W0, R0, n):
    out = []
    for i in t:
        t1 = 4*pi*n * quad(lambda r: W0 * exp(-2*r/R0 - W(W0, r)*i)*r**2, 0, inf)[0]
        t2 = exp(4*pi*n * quad(lambda r: (exp(-W(W0, r)*i) - 1) * r**2, 0, inf)[0])
        out.append(t1*t2/(pi*n*W0*R0**3))  # normalisation
    return out

pars = Parameters()
pars.add('W0', value=1, min=0)
pars.add('n', value=1, min=0)
pars.add('R0', value=2.5e-9, vary=False)

# y and t are the measured data arrays
mdl = Model(I)
result = mdl.fit(y, t, params=pars, nan_policy='propagate')
comps = result.eval_components()
print(result.fit_report())
Here I get the error message: fit() got multiple values for argument 'params'. Any kind of help would be greatly appreciated!
For extra information: I(t) and t each consist of 20000 rows. t goes from 0 to 3.2e-5 s, and here is a sample of I(t): 14, 19, 18, 10, 10, 15, 15, 23, 16, 74, 54, 44, 31, 31, 26, 39, 31, 46, 31, 23.
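In case it is useful: lmfit's Model.fit takes the data as its first positional argument and the Parameters object as the second, while every independent variable (here t) has to be passed as a keyword argument. Passing t positionally therefore lands it in the params slot, and the later params=pars then triggers "fit() got multiple values for argument 'params'". A minimal sketch of the call written that way, assuming y, t, pars and I are defined as above:

mdl = Model(I, independent_vars=['t'])
result = mdl.fit(y, params=pars, t=t, nan_policy='propagate')  # t passed by keyword, not positionally
print(result.fit_report())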

Related

ValueError: setting an array element with a sequence. in scipy.optimize.minimize + curve_fit

I am a newbie at Python, and I am writing code to compute and then fit magnetization data.
First, I write the function for the energy to be minimized with respect to the parameter theta:
def E_uniaxial(H, phi, theta, Keff, Ms):
    e = Keff*(np.cos(theta))**2 - ((4*np.pi)**2*mu0)*Ms*H*np.cos(theta - phi)
    return e
Then, since the magnetization depends strongly on the previous equilibrium position of the system, I write a function for the "next equilibrium position"; the parameter H is the one that changes between the previous and the new equilibrium position.
def next_theta(Ms, phi, Keff, H, lasttheta, fctE):
    E = lambda x: fctE(H, phi, x, Keff, Ms)[0]
    result = scipy.optimize.minimize(E, lasttheta)
    return result.x
After this, I write a function that computes a whole hysteresis cycle. Given a known starting point, the function increases H and computes all the equilibrium positions, each depending on the previous one (then H is decreased and the same process is repeated).
def cycle_theta(Ms, desfield, Keff, Hmax, theta_init_1, theta_init_2, fctE):
    # forward sweep
    H1 = np.linspace(-Hmax, Hmax, 2000)
    sol1 = np.zeros(np.shape(H1))
    sol1[0] = theta_init_1
    for i in range(len(H1)-1):
        sol1[i+1] = next_theta(Ms, desfield, Keff, H1[i+1], sol1[i], fctE)
    # return sweep
    H2 = np.linspace(Hmax, -Hmax, 2000)
    sol2 = np.zeros(np.shape(H2))
    sol2[0] = theta_init_2
    for i in range(len(H2)-1):
        sol2[i+1] = next_theta(Ms, desfield, Keff, H2[i+1], sol2[i], fctE)
    return H1, sol1, np.flip(sol2)
Then, I have to fit data in order to find the Ms and Keff parameters. I defined this function :
def test_fit(H, Ms, Keff):
    a = cycle_theta(Ms, 1., Keff, 20, np.pi, 0., E_uniaxial)[1]
    idx = 0
    if isinstance(H, float):
        idx = find_nearest(a, H)
        print('float')
        return np.sin(a[idx])
    if isinstance(H, np.ndarray):
        c = np.zeros(np.shape(H))
        for i in range(len(H)):
            idx = find_nearest(a, H[i])
            c[i] = a[idx]
        print('array')
        return np.sin(c)
The condition on the type seemed to be required for the function to work with curve_fit.
I finally call popt = curve_fit(test_fit, b, sig) where "b" and "sig" are my experimental data.
But I get this error repeatedly, and it comes from scipy.optimize.minimize, not from curve_fit:
ValueError: setting an array element with a sequence.
I read that this message can come from my energy function E_uniaxial returning an array rather than a scalar, but it is actually quite a regular function: if you pass in a scalar you get a scalar, and if you pass in an array you get an array.
So I really don't understand. Am I not supposed to nest scipy.optimize.minimize inside scipy.optimize.curve_fit?
Thank you a lot for your help !!
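One debugging step that often localizes this ValueError when minimize is nested inside curve_fit: check the shapes of what the objective receives and returns. scipy.optimize.minimize always hands the objective a 1-D ndarray (even for a single parameter) and returns result.x as a 1-D array, so it is worth confirming that E evaluates to a scalar and that result.x is unpacked before being stored in sol1/sol2. A sketch of next_theta instrumented that way, assuming E_uniaxial and the globals from above (the prints are only for diagnosis):

def next_theta(Ms, phi, Keff, H, lasttheta, fctE):
    def E(x):
        # minimize passes x as a 1-D ndarray, e.g. array([theta])
        e = fctE(H, phi, x[0], Keff, Ms)  # index so fctE sees a scalar theta
        print('x shape:', np.shape(x), '-> E(x) shape:', np.shape(e))
        return e
    result = scipy.optimize.minimize(E, lasttheta)
    return result.x[0]  # return a scalar, not the length-1 array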

Passing arguments correctly to the scipy minimizer

I am trying to minimize a log-likelihood with respect to Fsc, Qsc and Rsc:
def llik_scalars(Fsc, Qsc, Rsc, pred_state, pred_P, y):
    T = len(pred_P)
    #pred_state = np.array([pred_state[t].item() for t in range(len(pred_state))])
    #pred_P = np.array([pred_P[t].item() for t in range(len(pred_P))])
    Sigmat = np.array(pred_P) + Rsc
    Mut = pred_state
    for t in range(T):
        exponent = -0.5 * (y[t]-Mut[t])**2 / Sigmat[t]
        cc = 1 / math.sqrt(2*math.pi*Sigmat[t])
        LL -= math.log(cc*math.exp(exponent))
    return LL
At first I tried to pass pred_state and pred_P as lists of matrices. These matrices are of size 1x1, so the commented-out code retrieved the list of numbers contained in those matrices.
However, since I was not sure the arguments could be passed in that form, and I read that arrays can be passed, the commented-out conversion is now performed before I pass pred_state and pred_P as arguments, so they are passed as numpy arrays.
I tried to do the minimization with the scipy minimizer:
x0 = [0.5, np.var(y)/3, np.var(y) *2/3]
minimize(llik_scalars, x0, method = 'nelder-mead', args=(pred_state, pred_P, y))
I get this error:
llik_scalars() missing 2 required positional arguments: 'pred_P' and 'y'
Following another topic on Stack Overflow, I adapted my code to the following, hoping to solve the problem:
def llik_scalars(Fsc, Qsc, Rsc, *args):
    pred_state = args[0]
    pred_P = args[1]
    y = args[2]
    T = len(pred_P)
    #pred_state = np.array([pred_state[t].item() for t in range(len(pred_state))])
    #pred_P = np.array([pred_P[t].item() for t in range(len(pred_P))])
    Sigmat = np.array(pred_P) + Rsc
    Mut = pred_state
    for t in range(T):
        exponent = -0.5 * (y[t]-Mut[t])**2 / Sigmat[t]
        cc = 1 / math.sqrt(2*math.pi*Sigmat[t])
        LL -= math.log(cc*math.exp(exponent))
    return LL
This however, results in the following error:
pred_P = args[1]
IndexError: tuple index out of range
I don't see how this is not working. Please help me out :)
-- EDIT:--
Here are the first few entries of pred_state, pred_P and y, as I pass them into llik_scalars. Note that the initial guess for the state is 0, and I use a sort of diffuse prior by setting the initial variance (pred_P) to a million. I obtained pred_state and pred_P using a Kalman filter with initial guesses for F, Q and R:
pred_state[:5]
Out[121]: array([ 0. , 0.6097107 , 0.29789331, 0.30998801, -0.33307371])
pred_P[:5]
Out[122]:
array([1.00000000e+06, 1.24999975e+00, 1.13888888e+00, 1.13311688e+00,
       1.13280061e+00])
y[:5]
Out[123]: array([ 1.21942262, 0.58464737, 0.90278035, -1.52760793, -0.80572172])
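For what it is worth, scipy.optimize.minimize calls the objective as fun(x, *args), where x is a single 1-D array containing all parameters being optimized and args are appended after it. The three scalars therefore have to be packed into one vector x0 and unpacked inside the function; they cannot be separate positional arguments. A sketch of what that could look like for this function, using the pred_state, pred_P and y from the question (note that LL also needs to be initialized before the loop):

import math
import numpy as np
from scipy.optimize import minimize

def llik_scalars(params, pred_state, pred_P, y):
    Fsc, Qsc, Rsc = params                # minimize passes all parameters as one 1-D array
    T = len(pred_P)
    Sigmat = np.array(pred_P) + Rsc
    Mut = pred_state
    LL = 0.0                              # accumulate the negative log-likelihood
    for t in range(T):
        exponent = -0.5 * (y[t] - Mut[t])**2 / Sigmat[t]
        cc = 1 / math.sqrt(2*math.pi*Sigmat[t])
        LL -= math.log(cc*math.exp(exponent))
    return LL

x0 = [0.5, np.var(y)/3, np.var(y)*2/3]
result = minimize(llik_scalars, x0, method='nelder-mead', args=(pred_state, pred_P, y))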

Minimization: taking advantage of asymmetry in computation time for each parameter

I have a (relatively standard) minimization problem: I have a set of experimental data (xdata, ydata) and a model y=f(x, parameters), and I want to extract the parameters. scipy.optimize.curve_fit would be my go-to, but there is a trick in the structure of the function to minimize: it is actually a two-step calculation in which some parameters are involved in a much quicker calculation than others.
Example code with mockup functions and data follows (in the real thing, param_a and param_b include multiple parameters, if it matters). Here the structure of the model is not used at all:
'''Runs, but not optimal.'''
import numpy as np
import scipy.optimize
from time import sleep

def quick(array_in, param):
    return param * array_in

def slow(array_in, param):
    sleep(0.1)
    return np.exp(param*array_in)

def model(x, param_a, param_b):
    intermediary = slow(x, param_a)
    return quick(intermediary, param_b)

p_actual = [0.5, 2.0]
xdata = np.linspace(0.0, 10.0)
ydata = model(xdata, *p_actual) + np.random.randn(*xdata.shape)

# The following is inefficient because parameter asymmetry is hidden to curve_fit
# Change the 0.1s sleep time in slow() to experiment
popt, _ = scipy.optimize.curve_fit(model, xdata, ydata, p0=np.asarray([1.0, 1.0]))
print(popt)  # about [0.5, 2.0]
I would imagine the minimization algorithm could take advantage of the structure of the problem where changing param_b is really easy (if a previous call to model with the same param_a was made). In the real case, slow involves solving ODEs and takes several minutes, while quick is a weighted sum of numpy arrays which takes (at least) four orders of magnitude less time.
The following is an implementation of this idea, including a test comparing the two methods for a non-trivial fitting problem. It turns out that in ~50% of cases the "improved" version calls slow() *more* times (sometimes to converge to a better fit, but sometimes not); that might be due to the interaction of scipy.optimize routines and the energy landscape of the problem. I suspect a working solution requires more thinking about the math than I put in.
# -*- coding: utf-8 -*-
import numpy as np
from inspect import getfullargspec
import scipy.optimize
import random
from time import sleep
import matplotlib.pyplot as plt

def _number_of_arguments(f):
    '''
    f must be a function taking a fixed number >= 1 of non-keyword arguments,
    because that's what scipy.optimize.curve_fit operates on. If so, the number
    of those arguments is returned. Otherwise, an error is thrown.
    '''
    fas = getfullargspec(f)
    if fas.varargs is not None:
        raise ValueError('Function accepts an arbitrary number of positional arguments.', f)
    if fas.varkw is not None or fas.kwonlyargs:
        raise ValueError('Function accepts keyword arguments.', f)
    n = len(fas.args)
    if n == 0:
        raise ValueError('Function takes no arguments, it should take at least one (x).')
    return n

def _score(ydata, yest, sigma=None):
    sigma = sigma if sigma is not None else np.ones(yest.shape)
    return np.sum(((yest - ydata)/sigma)**2)

def _optimize_std(fslow, fquick, xdata, ydata, ps_guess=None, pq_guess=None, sigma=None):
    Nps = _number_of_arguments(fslow) - 1
    Npq = _number_of_arguments(fquick) - 1
    assert ps_guess is None or Nps == len(ps_guess)
    assert pq_guess is None or Npq == len(pq_guess)
    assert xdata.shape == ydata.shape
    assert sigma is None or sigma.shape == ydata.shape

    def f(x, *p):
        assert len(p) == Nps + Npq
        return fquick(fslow(x, *p[:Nps]), *p[Nps:])

    ps_guess = np.array(ps_guess) if ps_guess is not None else np.ones(Nps)
    pq_guess = np.array(pq_guess) if pq_guess is not None else np.ones(Npq)
    p_guess = np.concatenate((ps_guess, pq_guess))
    p_opt, _ = scipy.optimize.curve_fit(f, xdata, ydata, p0=p_guess, sigma=sigma)
    return p_opt[:Nps], p_opt[Nps:]
def optimize(fslow, fquick, xdata, ydata, ps_guess=None, pq_guess=None, sigma=None):
    '''Two-step curve fitting for a two-part function.

    Two functions `fquick: interm, *pq -> y` and `fslow: x, *ps -> interm`
    together define `f: x, *ps, *pq -> y = fquick(fslow(x, *ps), *pq)`.
    Assuming that `f: xdata -> ydata`, one can fit the ps, pq parameters; we
    return ps0, pq0 such that f(xdata, ps0, pq0) ~ ydata.

    This whole function should have an output equivalent to that of the
    standard scipy.optimize.curve_fit. However, internally, it is written so
    that the fewest possible calls to fslow() are made (at the expense of more
    calls to fquick). That is possible because the intermediary calculation
    `interm = fslow(x, *ps)` can be reused for multiple calls to `y =
    fquick(interm, *pq)`. If the latter is really quick, it can make sense to
    fully optimize the `fquick` part at each step of the optimization for
    `fslow`; this greatly increases the number of calls to `fquick`, but the
    optimization that is actually costly in function calls is made with fewer
    parameters.

    The first argument of fslow() and the output of fquick() must be 1d arrays
    with consistent size. The output of fslow() must be consumed as the first
    argument of fquick(), but can be any kind of object.

    Args:
        fslow (callable): fslow: x, *ps -> interm
        fquick (callable): fquick: interm, *pq -> y
        xdata (np.array): evaluation points.
        ydata (np.array): values to fit.

    Returns:
        ps_opt: optimal parameters for fslow
        pq_opt: optimal parameters for fquick
    '''
    assert xdata.shape == ydata.shape
    assert sigma is None or sigma.shape == ydata.shape
    Nps = _number_of_arguments(fslow) - 1
    Npq = _number_of_arguments(fquick) - 1
    assert ps_guess is None or Nps == len(ps_guess)
    assert pq_guess is None or Npq == len(pq_guess)

    cur_ps = np.array(ps_guess, copy=True) if ps_guess is not None else np.ones(Nps)
    cur_pq = np.array(pq_guess, copy=True) if pq_guess is not None else np.ones(Npq)

    def f_with_quickopt(x, *ps, sigma=None):
        '''
        This function keeps the state of pq between calls via the pq attribute.
        That attribute must hence be set before the first call to the function.
        For more details on the method, see
        https://python-forum.io/Thread-function-state-between-calls?pid=38969#pid38969
        '''
        try:
            f_with_quickopt.pq
        except AttributeError as exc:
            raise RuntimeError('You must define attribute pq.') from exc

        interm = fslow(x, *ps)

        def f_quick_score_to_minimize(pq):
            yest = fquick(interm, *pq)  # pq not long enough?
            return _score(ydata, yest, sigma=sigma)

        solve_quick = scipy.optimize.minimize(
            f_quick_score_to_minimize, f_with_quickopt.pq
        )
        f_with_quickopt.pq = solve_quick.x
        return fquick(interm, *f_with_quickopt.pq)

    f_with_quickopt.pq = cur_pq
    # Main optimization call - here's what should take most time
    ps_opt, _ = scipy.optimize.curve_fit(f_with_quickopt, xdata, ydata, p0=cur_ps, sigma=sigma)
    pq_opt = f_with_quickopt.pq
    return ps_opt, pq_opt
if __name__ == '__main__':

    def slow(x, a, b):
        slow.Ncalls += 1
        sleep(0.01)
        return a*x**2 + b*x

    def quick(x, a, b):
        quick.Ncalls += 1
        return a*x + b*np.cos(x)

    slow.Ncalls = 0
    quick.Ncalls = 0

    def model(x, ps, pq):
        return quick(slow(x, *ps), *pq)

    ps_actual = 1.0 + np.random.random((2, ))
    pq_actual = 1.0 + np.random.random((2, ))
    xdata = np.linspace(-0.0, 2.0, num=1000)
    y_theory = model(xdata, ps_actual, pq_actual)
    ydata = y_theory + np.random.randn(*xdata.shape)/5

    plt.figure()
    plt.plot(xdata, ydata, 'k.', label='data')
    plt.plot(xdata, y_theory, 'r-', label='actual')

    for (lab, opti) in [
            ('std: {q} quick(), {s} slow()', _optimize_std),
            ('asym: {q} quick(), {s} slow()', optimize)
            ]:
        slow.Ncalls = 0
        quick.Ncalls = 0
        ps, pq = opti(slow, quick, xdata, ydata)
        leg = lab.format(q=quick.Ncalls, s=slow.Ncalls)
        yplot = model(xdata, ps, pq)
        plt.plot(xdata, yplot, label=leg)

    plt.legend()
    plt.show()

Using Python to do 3-D surface fitting

I can use scipy.optimize.least_squares to do 1-D curve fitting (of course, I could also use curve_fit directly), like this:
import numpy as np
from scipy.optimize import least_squares

def f(par, data, obs):
    return par[0]*data + par[1] - obs

def get_f(x, a, b):
    return x*a + b

data = np.linspace(0, 50, 100)
obs = get_f(data, 3.2, 2.3)
par = np.array([1.0, 1.0])
res_lsq = least_squares(f, par, args=(data, obs))
print(res_lsq.x)
I get the right fitting parameters (3.2, 2.3), but when I generalize this method to multiple dimensions, like this:
def f(par, data, obs):
    return par[0]*data[0, :] + par[1]*data[1, :] - obs

def get_f(x, a, b):
    return x[0]*a + b*x[1]

data = np.asarray((np.linspace(0, 50, 100), np.linspace(0, 50, 100)))
obs = get_f(data, 1., 1.)
par = np.array([3.0, 5.0])
res_lsq = least_squares(f, par, args=(data, obs))
print(res_lsq.x)
I find I cannot get the right answer, i.e. (1., 1.), and I have no idea whether I have made a mistake.
The way you generate data and observations in the "multi-dimensional" case effectively results in get_f returning (a+b)*x[0] (the input values x[0] and x[1] are always identical) and, similarly, f returning (par[0]+par[1])*data[0]-obs. Of course, with a=1 and b=1, exactly the same obs would be generated by any other values a, b such that a+b=2. Scipy correctly returns one of the (infinitely many) parameter pairs satisfying this constraint, depending on the initial estimate.
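To see the fit recover both parameters, the two rows of data have to vary independently of each other, so that the two columns of the implied design matrix are not proportional. A minimal sketch of the same fit with a non-degenerate second coordinate (the random draw is just one arbitrary way to decouple the two inputs):

import numpy as np
from scipy.optimize import least_squares

def f(par, data, obs):
    return par[0]*data[0, :] + par[1]*data[1, :] - obs

def get_f(x, a, b):
    return x[0]*a + b*x[1]

rng = np.random.default_rng(0)
data = np.asarray((np.linspace(0, 50, 100), rng.uniform(0, 50, 100)))  # independent coordinates
obs = get_f(data, 1., 1.)
par = np.array([3.0, 5.0])
res_lsq = least_squares(f, par, args=(data, obs))
print(res_lsq.x)  # now converges to approximately [1., 1.]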

Optimize.fmin does not find minimum on well-behaved continuous function

I'm trying to find the minimum of a well-behaved, continuous function, residualLambdaMinimize.
Here's the call:
>>> optimize.fmin(residualLambdaMinimize, 0.01, args=(u, returnsMax, Param, residualLambdaExtended),
...               disp=False, full_output=True, xtol=0.00001, ftol=0.0001)
(array([ 0.0104]), 0.49331109755304359, 10, 23, 0)
>>> residualLambdaMinimize(0.015, u, returnsMax, Param, residualLambdaExtended)
0.46358005517761958
>>> residualLambdaMinimize(0.016, u, returnsMax, Param, residualLambdaExtended)
0.42610470795409616
As you can see, there are points in the immediate neighborhood that yield smaller values. Why doesn't the solver consider them?
Here is a suggestion which may help you debug the situation.
If you add something like data.append((x, result)) to residualLambdaMinimize, you can collect all the points where optimize.fmin evaluates residualLambdaMinimize:
data = []

def residualLambdaMinimize(x, u, returnsMax, Param, residualLambdaExtended):
    result = ...
    data.append((x, result))
    return result
Then we might be better able to understand what fmin is doing (and maybe reproduce the problem) if you post data without us having to see exactly how residualLambdaMinimize is defined.
Moreover, you can visualize the "path" fmin is taking as it tries to find the minimum:
import numpy as np
import scipy.optimize as optimize
import matplotlib.pyplot as plt

data = []

def residualLambdaMinimize(x, u, returnsMax, Param, residualLambdaExtended):
    result = (x-0.025)**2
    data.append((x, result))
    return result

u, returnsMax, Param, residualLambdaExtended = range(4)
retval = optimize.fmin(
    residualLambdaMinimize, 0.01,
    args=(u, returnsMax, Param, residualLambdaExtended),
    disp=False, full_output=True, xtol=0.00001, ftol=0.0001)

data = np.squeeze(data)
x, y = data.T
plt.plot(x, y)
plt.show()
