What is the easiest way to save intermediate variables during simulation with odeint in Numpy?
For example:
def dy(y, t):
    x = np.random.rand(3, 1)
    return y + x.sum()

sim = odeint(dy, 0, np.arange(0, 1, 0.1))
What would be the easiest way to save the data stored in x during simulation? Ideally at the points specified in the t argument passed to odeint.
A handy way to hack odeint, with some caveats, is to wrap your call to odeint in a method of a class, with dy as another method, and to pass self as an argument to your dy function. For example:
import numpy as np
from scipy.integrate import odeint

class WrapODE(object):
    def __init__(self):
        self.y_0 = 0.
        self.L_x = []
        self.timestep = 0
        self.times = np.arange(0., 1., 0.1)

    def run(self):
        self.L_y = odeint(
            self.dy,
            self.y_0, self.times,
            args=(self,))

    @staticmethod
    def dy(y, t, self):
        """
        Discretized application of dudt

        Watch out! Because this is a staticmethod, as required by odeint, self
        is the third argument
        """
        x = np.random.rand(3, 1)
        # guard against extra evaluations past the final requested time
        if self.timestep < len(self.times) and t >= self.times[self.timestep]:
            self.timestep += 1
            self.L_x.append(x)
        else:
            self.L_x[-1] = x
        return y + x.sum()
To be clear, this is a hack that is prone to pitfalls. For example, unless odeint is doing Euler stepping, dy is going to get called more times than the number of timesteps you specify. To make sure you get one x for each y, the monkey business in the if t >= self.times[self.timestep]: block picks a spot in an array for storing data for each time value from the times vector. Your particular application might lead to other crazy problems. Be sure to thoroughly validate this method for your application.
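For what it's worth, a minimal usage sketch (relying only on the class above):

w = WrapODE()
w.run()

# w.L_y holds the solution at the requested times;
# w.L_x should end up with one stored x per entry in w.times
print(len(w.times), len(w.L_x))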
I am trying to define a piecewise function to be fitted with the lmfit library in Python. The issue I am having is that a parameter I have defined for the function will not evaluate alongside the data I am submitting.
I have one example of a case somewhat similar to mine here. However, the vectorize function the answer describes wasn't producing the values I wanted, and when reading the documentation, it didn't seem to be the answer to my problem. I also used scipy.optimize.leastsq, but I got the same issue with lmfit described below.
I have my residual function defined as
from lmfit import minimize, Parameters, Model

def residual(params, y, x):
    param1 = params['one']
    param2 = params['two']
    if(param2 < x):
        p = 1
    else:
        p = param1*x + param2
    return p - y

params = Parameters()
params.add('one', value=1)
params.add('two', value=2)

out = minimize(residual, params, args=(y, x))
I also tried defining the function separately, like this:
def f(param1, param2, x):
    if(param2 < x):
        p = 1
    else:
        p = param1*x + param2
    return p

def residual(params, y, x):
    param1 = params['one']
    param2 = params['two']
    return f(param1, param2, x) - y
I have also tried inline using a lambda function.
I am getting the error 'The truth value of an array with more than one element is ambiguous.' The error makes sense, because (param2 < x) produces a boolean array. However, I can't seem to find a way to define the function piecewise in this case so that it can be fitted with the lmfit.minimize() function. I have seen this done in Matlab, whose nlinfit function seems to evaluate the data element-wise without issue (I searched for an equivalent explicit element-wise operator in Python, such as .* or .+, but that doesn't seem to exist).
lmfit also seems to operate a bit differently from nlinfit, because the residual function always has to return (model - y), while nlinfit is given the model function directly; I am not sure whether that could be another issue.
So, to reiterate, my main question is whether there is a way to define the piecewise function so that it can compare the parameter to the data set.
Any help or explanation would be appreciated, thank you!
In place of (param2 < x) (where param2 is a float and x is a numpy array), you want to use numpy.where. You might try:
def residual(params, y, x):
    param1 = params['one']
    param2 = params['two']
    p = param1 * x + param2
    p[np.where(param2 < x)] = 1.0
    return p - y
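For completeness, a rough usage sketch with synthetic data (reusing the residual defined just above; the data here is made up just to keep it self-contained, and with recent lmfit versions the fitted values live on the returned result object):

import numpy as np
from lmfit import minimize, Parameters

# synthetic data: flat at 1 above x = 2, linear below, plus a little noise
x = np.linspace(0, 5, 200)
y = np.where(x > 2.0, 1.0, 0.5*x + 2.0) + 0.05*np.random.randn(x.size)

params = Parameters()
params.add('one', value=1)
params.add('two', value=2)

out = minimize(residual, params, args=(y, x))
print(out.params['one'].value, out.params['two'].value)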
I should also warn you about a potential problem with this approach of having a variable act as the boundary for a piecewise function.
In non-linear fits, variables are always floating point (continuous, non-discrete) values. As the fit proceeds, it will make small adjustments in the values and see how that small change alters the result. In your approach, the parameter 'two' is used as both the transition between pieces and the offset for the line -- that is good.
If a parameter is used only as the transition, it may not work. Consider, say, x=np.array([0, 1., 2., 3., 4., ..., 20.0]). Having two = 10.5 and two=10.4 would then give the same result. In that case, the fit would not be able to alter the value of two: it would try a very small change, see no change in the result and give up.
So, either make sure that two is also used elsewhere in your real model (assuming your real model is more complicated than the example given), or consider using a more gentle transition rather than a hard change in pieces. I find an error-function of width ~spacing between x points often works. Depending on the nature of your problem, you might try something like this:
from scipy.special import erf, erfc

def residual(params, y, x):
    param1 = params['one']
    param2 = params['two']
    dx = (max(x) - min(x))/(len(x)-1)
    xhi = (erf((x-param2)/dx) + 1)/2.0
    xlo = (erfc((x-param2)/dx) + 1)/2.0
    p = xlo*1.0 + xhi*(param1*x + param2)
    # note: did you really want?
    # p = xlo*param + xhi*(param1*x + param2)
    # p = param2 + xhi*param1*x
    return p - y
Hope that helps.
I have a function Imaginary which describes a physics process and I want to fit this to a dataset x_interpolate, y_interpolate. The function is a form of a Lorentzian peak function and I have some initial values that are user given, except for f_peak (the peak location) which I find using a peak finding algorithm. All of the fit parameters, except for the offset, are expected to be positive and thus I have set bounds_I accordingly.
def Imaginary(freq, alpha, res, Ms, off):
    numerator = (2*alpha*freq*res**2)
    denominator = (4*(alpha*res*freq)**2) + (res**2 - freq**2)**2
    Im = Ms*(numerator/denominator) + off
    return Im
pI = np.array([alpha_init, f_peak, Ms_init, 0])
bounds_I = ([0, 0, 0, -np.inf], [np.inf, np.inf, np.inf, np.inf])
poptI, pcovI = curve_fit(Imaginary, x_interpolate, y_interpolate, pI, bounds=bounds_I)
In some situations I want to keep the parameter f_peak fixed during the fitting process. I tried an easy solution by changing bounds_I to:
bounds_I = ([0, f_peak-0.001, 0, -np.inf], [np.inf, f_peak+0.001, np.inf, np.inf])
This is, for many reasons, not an optimal way of doing it, so I was wondering whether there is a more Pythonic way. Thank you for your help!
If a parameter is fixed, it is not really a parameter, so it should be removed from the list of parameters. Define a model that has that parameter replaced by a fixed value, and fit that. Example below, simplified for brevity and to be self-contained:
import numpy as np
from scipy.optimize import curve_fit

x = np.arange(10)
y = np.sqrt(x)

def parabola(x, a, b, c):
    return a*x**2 + b*x + c

fit1 = curve_fit(parabola, x, y)  # optimal a, b, c: [-0.02989396, 0.56204598, 0.25337086]

b_fixed = 0.5
fit2 = curve_fit(lambda x, a, c: parabola(x, a, b_fixed, c), x, y)
The second call to curve_fit returns [-0.02350478, 0.35048631], which are the optimal values of a and c. The value of b was fixed at 0.5.
Of course, the parameter should be removed from the initial vector pI and the bounds as well.
You might find lmfit (https://lmfit.github.io/lmfit-py/) helpful. This library adds a higher-level interface to the scipy optimization routines, aiming for a more Pythonic approach to optimization and curve fitting. For example, it uses Parameter objects to allow setting bounds and fixing parameters without having to modify the objective or model function. For curve fitting, it defines a high-level Model class that can be used.
For your example, you could use your Imaginary function as you've written it with
from lmfit import Model
lmodel = Model(Imaginary)
and then create Parameters (lmfit will name the Parameter objects according to your function signature), providing initial values:
params = lmodel.make_params(alpha=alpha_init, res=f_peak, Ms=Ms_init, off=0)
By default all Parameters are unbound and will vary in the fit, but you can modify these attributes (without rewriting the model function):
params['alpha'].min = 0
params['res'].min = 0
params['Ms'].min = 0
You can set one (or more) of the parameters to not vary in the fit with:
params['res'].vary = False
To be clear: this does not require altering the model function, making it much easier to change what is fixed, what bounds might be imposed, and so forth.
You would then perform the fit with the model and these parameters:
result = lmodel.fit(y_interpolate, params, freq=x_interpolate)
You can get a report of fit statistics, best-fit values, and uncertainties for the parameters with
print(result.fit_report())
The best fit Parameters will be held in result.params.
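Pulling those pieces together, the whole fit might look something like this (a sketch that assumes Imaginary, the data arrays, and the initial values from your question are already defined):

from lmfit import Model

lmodel = Model(Imaginary)
params = lmodel.make_params(alpha=alpha_init, res=f_peak, Ms=Ms_init, off=0)
for name in ('alpha', 'res', 'Ms'):
    params[name].min = 0
params['res'].vary = False          # hold the peak position fixed

result = lmodel.fit(y_interpolate, params, freq=x_interpolate)
print(result.fit_report())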
FWIW, lmfit also has builtin Models for many common forms, including Lorentzian and a Constant offset. So, you could construct this model as
from lmfit.models import LorentzianModel, ConstantModel
mymodel = LorentzianModel(prefix='l_') + ConstantModel()
params = mymodel.make_params()
which will have Parameters named l_amplitude, l_center, l_sigma, and c (where c is the constant) and the model will use the name x for the independent variable (your freq). This approach can become very convenient when you may want to change the functional form of the peaks or background, or when fitting multiple peaks to a spectrum.
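A rough sketch with those builtin models (the initial values below are placeholders, not recommendations):

from lmfit.models import LorentzianModel, ConstantModel

mymodel = LorentzianModel(prefix='l_') + ConstantModel()
params = mymodel.make_params(l_amplitude=1, l_center=f_peak, l_sigma=0.1, c=0)
params['l_center'].vary = False     # fix the peak position, if desired

result = mymodel.fit(y_interpolate, params, x=x_interpolate)
print(result.fit_report())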
I was able to solve this issue for an arbitrary number of parameters and arbitrary positioning of the fixed parameters:
from numpy import asarray, append
from scipy.optimize import curve_fit

def d_fit(x, y, param, boundMi, boundMx, listparam):
    Sparam, SboundMi, SboundMx = asarray([]), asarray([]), asarray([])
    Nparam, NboundMi, NboundMx = asarray([]), asarray([]), asarray([])
    for i in range(len(param)):
        if(listparam[i] == 1):
            Sparam = append(Sparam, asarray(param[i]))
            SboundMi = append(SboundMi, asarray(boundMi[i]))
            SboundMx = append(SboundMx, asarray(boundMx[i]))
        else:
            Nparam = append(Nparam, asarray(param[i]))

    def funF(x, Sparam):
        j = 0
        for i in range(len(param)):
            if(listparam[i] == 1):
                param[i] = Sparam[i-j]
            else:
                param[i] = Nparam[j]
                j = j + 1
        return fun(x, param)

    return curve_fit(lambda x, *Sparam: funF(x, Sparam), x, y,
                     p0=Sparam, bounds=(SboundMi, SboundMx))
In this case:
param = [a,b,c,...] # parameters array (any size)
boundMi = [min_a, min_b, min_c,...] # minimum allowable value of each parameter
boundMx = [max_a, max_b, max_c,...] # maximum allowable value of each parameter
listparam = [0,1,1,0,...] # 1 = fit and 0 = fix the corresponding parameter in the fit routine
and the root function is defined as
def fun(x, param):
    a, b, c, d, ... = param
    return a*b/c...  # any function of the params a, b, c, d, ...
This way, you can change the root function and the number of parameters without changing the fit routine.
And, at any time, you can fix or free any parameter by changing "listparam".
Use like this:
popt, pcov = d_fit(x, y, param, boundMi, boundMx, listparam)
"popt" and "pcov" are 1D arrays of the size of the number of "1" in "listparam" bringing the results of the fitted parameters (best value and err matrix)
"param" will ramain an 1D array of the same size of the original (input) "param", HOWEVER IT WILL BE UPDATED AUTOMATICALLY TO THE FITTED VALUES (same as "popt") for the fitted values, keeping the fixed values according to "listparam"
Hope it can be useful!
Obs1: x is a 1D array of independent values and y is a 1D array of dependent values.
Obs2: This is my first post. Please let me know if I can improve it!
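For concreteness, a hypothetical end-to-end example of the routine above (the model and all numbers are illustrative only; d_fit is the function defined earlier):

import numpy as np

def fun(x, param):
    a, b, c = param
    return a*np.exp(-b*x) + c

x = np.linspace(0, 4, 50)
y = fun(x, [2.5, 1.3, 0.5]) + 0.05*np.random.randn(x.size)

param = [1.0, 1.3, 0.0]               # initial values; b = 1.3 will be held fixed
boundMi = [0.0, 0.0, -np.inf]
boundMx = [np.inf, np.inf, np.inf]
listparam = [1, 0, 1]                 # fit a and c, fix b

popt, pcov = d_fit(x, y, param, boundMi, boundMx, listparam)
print(popt)    # best-fit values of a and c
print(param)   # full parameter list, updated in place with the fitted values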
I'm a bit new to Python and PyMC, and making rapid progress. But I'm confused about setting deterministic values in a 2D matrix. I have a model below that I cannot get to parse correctly. The problem relates to setting the value of theta in the model.
import numpy as np
import pymc

# define known variables
N = 2
T = 10
tau = 1
Define the model... which I cannot get to parse correctly. It's the allocation of theta that I'm having trouble with. The aim is to get samples of D and x. Theta is just an intermediate variable, but I need to keep it as it's used in more complex variations of the model.
def NAFCgenerator():
    D = np.empty(T, dtype=object)
    theta = np.empty([N,T], dtype=object)
    x = np.empty([N,T], dtype=object)

    # true location of signal
    for t in range(T):
        D[t] = pymc.DiscreteUniform('D_%i' % t, lower=0, upper=N-1)

    for t in range(T):
        for n in range(N):
            @pymc.deterministic(plot=False)
            def temp_theta(dt=D[t], n=n):
                return dt == n
            theta[n,t] = temp_theta

            x[n,t] = pymc.Normal('x_%i,%i' % (n,t),
                                 mu=theta[n,t], tau=tau)
    return locals()
** EDIT **
Explicit indexing is useful for me as I'm learning both PyMC and Python. But it seems that extracting MCMC samples is a bit clunky, e.g.
D0values = pymc_generator.trace('D_0')[:]
But I am probably missing something. I did, however, manage to get a vectorised version working:
# Approach 1b - actually quite promising
def NAFCgenerator():
    # NOTE TO SELF. It's important to declare these as objects
    D = np.empty(T, dtype=object)
    theta = np.empty([N,T], dtype=object)
    x = np.empty([N,T], dtype=object)

    # true location of signal
    D = pymc.Categorical('D', spatial_prior, size=T)

    # displayed stimuli
    @pymc.deterministic(plot=False)
    def theta(D=D):
        theta = np.zeros([N,T])
        theta[0,D==0]=1
        theta[1,D==1]=1
        return theta

    #for n in range(N):
    x = pymc.Normal('x', mu=theta, tau=tau)
    return locals()
Getting at the MCMC samples seems easier this way, for example:
Dvalues = pymc_generator.trace('D')[:]
In PyMC2, when creating deterministic nodes with decorators, the default is to take the node name from the function name. The solution is simple: specify the node name as a parameter for the decorator.
@pymc.deterministic(name='temp_theta_%d_%d' % (t, n), plot=False)
def temp_theta(dt=D[t], n=n):
    return dt == n

theta[n,t] = temp_theta
Here is a notebook that puts this in context.
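Putting that fix back into the inner loop of your NAFCgenerator gives, roughly:

for t in range(T):
    for n in range(N):
        @pymc.deterministic(name='temp_theta_%d_%d' % (t, n), plot=False)
        def temp_theta(dt=D[t], n=n):
            return dt == n
        theta[n, t] = temp_theta

        x[n, t] = pymc.Normal('x_%i,%i' % (n, t),
                              mu=theta[n, t], tau=tau)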
I will explain the problem briefly. It is very similar to the example shown in the scipy documentation. The problem is the error that occurs: float argument required, not numpy.ndarray
What I have:
Function: y = s*z^t
Variable lengths/dimensions:
t - 1...m,
s - 1...m and 1...n (so m is the number of rows, n the number of columns),
z - 1...n,
y - y[1], y[2], y[3], ..., y[m],
T - the s[m,n] matrix
Like this:
y[1] = s[1][1]*z[1]^t[1] + s[1][2]*z[2]^t[1] + ... + s[1][n]*z[n]^t[1]
y[2] = s[2][1]*z[1]^t[2] + s[2][2]*z[2]^t[2] + ... + s[2][n]*z[n]^t[2]
...
y[m] = s[m][1]*z[1]^t[m] + s[m][2]*z[2]^t[m] + ... + s[m][n]*z[n]^t[m]
Problem: the following error occurs.
Optimization terminated successfully.
Traceback (most recent call last):
solution = optimize.fmin_cg(func, z, fprime=gradf, args=args)
File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 952, in fmin_cg
res = _minimize_cg(f, x0, args, fprime, callback=callback, **opts)
File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 1072, in _minimize_cg
print " Current function value: %f" % fval
TypeError: float argument required, not numpy.ndarray
Here is the code
import numpy as np
import scipy as sp
import scipy.optimize as optimize

def func(z, *args):
    y, T, t = args[0]
    return y - counter(T, z, t)

def counter(T, z, t):
    rows, cols = np.shape(T)
    res = np.zeros(rows)
    for i, row_val in enumerate(T):
        res[i] = np.dot(row_val, z**t[i])
    return res

def gradf(z, *args):
    y, T, t = args[0]
    return np.dot(t, counter(T, z, t-1))

def main():
    # Inputs
    N = 30
    M = 20
    z0 = np.zeros(N)  # initial guess
    y = 30*np.random.random(M)
    T = 10*np.random.random((M,N))
    t = 5*np.random.random(M)
    args = [y, T, t]

    solution = optimize.fmin_cg(func, z0, fprime=gradf, args=args)
    print 'solution: ', solution

if __name__ == '__main__':
    main()
I also tried to find similar examples but could not find anything very close. The code above is for your consideration. Thanks in advance.
The root of your problem is that fmin_cg expects the function to return a single scalar value for the misfit instead of an array.
Basically, you want something vaguely similar to:
def func(z, y, T, t):
    return np.linalg.norm(y - counter(T, z, t))
I'm using np.linalg.norm here because there's no builtin function in numpy for the root-mean-square. The actual RMS would be norm(x) / sqrt(x.size), but for minimization the constant multiplier doesn't make any difference.
There are also other minor problems in your code (e.g. args[0] is going to be a single item. You want y, T, t = args or better yet, just func(z, y, T, t)). Your gradient function doesn't make any sense to me, but it's optional regardless. Also, there's no way the solution can produce reasonable values at the moment, as you're testing it against pure noise. I assume those are just meant to be placeholder values, though.
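A minimal sketch of how the pieces fit together (counter is copied from your code; no fprime is given, so fmin_cg falls back to a numerical gradient, and with random placeholder data the result won't be meaningful):

import numpy as np
from scipy import optimize

def counter(T, z, t):
    # same as in the question: res[i] = sum_j T[i, j] * z[j]**t[i]
    res = np.zeros(T.shape[0])
    for i, row_val in enumerate(T):
        res[i] = np.dot(row_val, z**t[i])
    return res

def func(z, y, T, t):
    # scalar misfit, as suggested above
    return np.linalg.norm(y - counter(T, z, t))

N, M = 30, 20
z0 = np.zeros(N)
y = 30*np.random.random(M)
T = 10*np.random.random((M, N))
t = 5*np.random.random(M)

solution = optimize.fmin_cg(func, z0, args=(y, T, t))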
However, you have a larger problem. You're trying to minimize in 30-dimensional space. Most non-linear solvers aren't going to work well with that high of a dimensionality. It may work fine, but you're very likely to run into problems.
All that having been said, you may find it more intuitive to use the scipy.optimize.curve_fit interface rather than using the others, if you're okay with LM instead of CG (they're fairly similar methods).
One final thing: you're trying to solve for 30 model parameters with only 20 observations. This is an underdetermined problem and it doesn't have a unique solution, so you're going to need to apply some a priori knowledge to get a reasonable answer.
I have been playing around with a package that uses a linear scipy.interpolate.interp1d to create a history function for the ode solver in scipy, described here.
The relevant bit of code goes something like
def update(self, ti, Y):
    """ Add one new (ti, yi) to the interpolator """
    self.itpr.x = np.hstack([self.itpr.x, [ti]])
    yi = np.array([Y]).T
    self.itpr.y = np.hstack([self.itpr.y, yi])
    #self.itpr._y = np.hstack([self.itpr.y, yi])
    self.itpr.fill_value = Y
Where "self.itpr" is initialized in __init__:
def __init__(self, g, tc=0):
    """ g(t) = expression of Y(t) for t<tc """
    self.g = g
    self.tc = tc
    # We must fill the interpolator with 2 points minimum
    self.itpr = scipy.interpolate.interp1d(
        np.array([tc-1, tc]),                   # X
        np.array([self.g(tc), self.g(tc)]).T,   # Y
        kind='linear', bounds_error=False,
        fill_value=self.g(tc))
Where g is some function that returns an array of values that are solutions to a set of differential equations and tc is the current time.
This seems nice to me because a new interpolator object doesn't have to be re-created every time I want to update the range of values (which happens at each explicit time step during a simulation). This method of updating the interpolator works well under scipy v0.11.0. However, after updating to v0.12.0 I ran into issues: I see that the new interpolator now includes an array _y that seems to just be another copy of the original. Is it safe and/or sane to just update _y as outlined above as well? Is there a simpler, more Pythonic way to address this that would hopefully be more robust to future updates in scipy? Again, in v0.11 everything works and produces the expected results, while in v0.12 I get an IndexError when _y is referenced, since it isn't updated in my function while y itself is.
Any help/pointers would be appreciated!
It looks like _y is just a copy of y that has been reshaped by interp1d._reshape_yi(). It should therefore be safe to just update it using:
self.itpr._y = self.itpr._reshape_yi(self.itpr.y)
In fact, as far as I can tell it's only _y that gets used internally by the interpolator, so I think you could get away without actually updating y at all.
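In other words, the update method from the question might become something like this (a sketch that keeps the question's attribute names):

import numpy as np

def update(self, ti, Y):
    """ Add one new (ti, yi) to the interpolator """
    self.itpr.x = np.hstack([self.itpr.x, [ti]])
    yi = np.array([Y]).T
    self.itpr.y = np.hstack([self.itpr.y, yi])
    # keep the private reshaped copy in sync (needed for scipy >= 0.12)
    self.itpr._y = self.itpr._reshape_yi(self.itpr.y)
    self.itpr.fill_value = Y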
A much more elegant solution would be to make _y a property of the interpolator that returns a suitably reshaped copy of y. It's possible to achieve this by monkey-patching your specific instance of interp1d after it has been created (see Alex Martelli's answer here for more explanation):
x = np.arange(100)
y = np.random.randn(100)
itpr = interp1d(x, y)

# method to get self._y from self.y
def get_y(self):
    return self._reshape_yi(self.y)

meth = property(get_y, doc='reshaped version of self.y')

# make this a method of this interp1d instance only
basecls = type(itpr)
cls = type(basecls.__name__, (basecls,), {})
setattr(cls, '_y', meth)
itpr.__class__ = cls

# itpr._y is just a reshaped version of itpr.y
print itpr.y.shape, itpr._y.shape
>>> (100,) (100, 1)
Now itpr._y gets updated when you update itpr.y
itpr.x = np.arange(110)
itpr.y = np.random.randn(110)
print itpr.y.shape, itpr._y.shape
>>> (110,) (110, 1)
This is all quite fiddly and not very Pythonic - it's much easier to fix the scipy source code (scipy/interpolate/interpolate.py). All you'd need to do is remove the last line from interp1d.__init__() where it sets:
self._y = y
and add these lines:
@property
def _y(self):
    return self._reshape_yi(self.y)