Stepsize control of dopri5 integrator - python

I am trying to solve a simple example with the dopri5 integrator in scipy.integrate.ode. As the documentation states
This is an explicit runge-kutta method of order (4)5 due to Dormand & Prince (with stepsize control and dense output).
this should work. So here is my example:
import numpy as np
from scipy.integrate import ode
import matplotlib.pyplot as plt
def MassSpring_with_force(t, state):
""" Simple 1DOF dynamics model: m ddx(t) + k x(t) = f(t)"""
# unpack the state vector
x = state[0]
xd = state[1]
# these are our constants
k = 2.5 # Newtons per metre
m = 1.5 # Kilograms
# force
f = force(t)
# compute acceleration xdd
xdd = ( ( -k*x + f) / m )
# return the two state derivatives
return [xd, xdd]
def force(t):
""" Excitation force """
f0 = 1 # force amplitude [N]
freq = 20 # frequency[Hz]
omega = 2 * np.pi *freq # angular frequency [rad/s]
return f0 * np.sin(omega*t)
# Time range
t_start = 0
t_final = 1
# Main program
state_ode_f = ode(MassSpring_with_force)
state_ode_f.set_integrator('dopri5', rtol=1e-6, nsteps=500,
first_step=1e-6, max_step=1e-3)
state2 = [0.0, 0.0] # initial conditions
state_ode_f.set_initial_value(state2, 0)
sol = np.array([[t_start, state2[0], state2[1]]], dtype=float)
print("Time\t\t Timestep\t dx\t\t ddx\t\t state_ode_f.successful()")
while state_ode_f.t < (t_final):
state_ode_f.integrate(t_final, step=True)
sol = np.append(sol, [[state_ode_f.t, state_ode_f.y[0], state_ode_f.y[1]]], axis=0)
print("{0:0.8f}\t {1:0.4e} \t{2:10.3e}\t {3:0.3e}\t {4}".format(
state_ode_f.t, sol[-1, 0]- sol[-2, 0], state_ode_f.y[0], state_ode_f.y[1], state_ode_f.successful()))
The result I get is:
Time Timestep dx ddx state_ode_f.successful()
0.49763822 4.9764e-01 2.475e-03 -8.258e-04 False
0.99863822 5.0100e-01 3.955e-03 -3.754e-03 False
1.00000000 1.3618e-03 3.950e-03 -3.840e-03 False
with a warning:
c:\python34\lib\site-packages\scipy\integrate_ode.py:1018: UserWarning: dopri5: larger nmax is needed
self.messages.get(idid, 'Unexpected idid=%s' % idid))
The result is incorect. If I run the same code with vode integrator, I get the expected result.
Edit
A similar issue is described here:
Using adaptive step sizes with scipy.integrate.ode
The suggested solution recommends setting nsteps=1, which solves the ODE correctly and with step-size control. However the integrator returns state_ode_f.successful() as False.

No, there is nothing wrong. You are telling the integrator to perform an integration step to t_final and it performs that step. Internal steps of the integrator are not reported.
The sensible thing to do is to give the desired sampling points as input of the algorithm, set for example dt=0.1 and use
state_ode_f.integrate( min(state_ode_f.t+dt, t_final) )
There is no single-step method in dopri5, only vode has it defined, see the source code https://github.com/scipy/scipy/blob/v0.14.0/scipy/integrate/_ode.py#L376, this could account for the observed differences.
As you found in Using adaptive step sizes with scipy.integrate.ode, one can force single-step behavior by setting the iteration bound nsteps=1. This will produce a warning every time, so one has to suppress these specific warnings to see a sensible result.
You should not use a parameter (which is a constant for the integration interval) for a time-dependent force. Use inside MassSpring_with_force the evaluation f=force(t). Possibly you could pass the function handle of force as parameter.

Related

Solving coupled differential equations with sympy

I am trying to solve the following system of first order coupled differential equations:
- dR/dt=k2*Y(t)-k1*R(t)*L(t)+k4*X(t)-k3*R(t)*I(t)
- dL/dt=k2*Y(t)-k1*R(t)*L(t)
- dI/dt=k4*X(t)-k3*R(t)*I(t)
- dX/dt=k1*R(t)*L(t)-k2*Y(t)
- dY/dt=k3*R(t)*I(t)-k4*X(t)
The knowed initial conditions are: X(0)=0, Y(0)=0
The knowed constants values are: k1=6500, k2=0.9
This equations defines a kinetick model and I need to solve them to get the Y(t) function to fit my data and find k3 and k4 values. In order to that, I have tried to solve the system simbologically with sympy. There is my code:
import matplotlib.pyplot as plt
import numpy as np
import sympy
from sympy.solvers.ode.systems import dsolve_system
from scipy.integrate import solve_ivp
from scipy.integrate import odeint
k1 = sympy.Symbol('k1', real=True, positive=True)
k2 = sympy.Symbol('k2', real=True, positive=True)
k3 = sympy.Symbol('k3', real=True, positive=True)
k4 = sympy.Symbol('k4', real=True, positive=True)
t = sympy.Symbol('t',real=True, positive=True)
L = sympy.Function('L')
R = sympy.Function('R')
I = sympy.Function('I')
Y = sympy.Function('Y')
X = sympy.Function('X')
f1=k2*Y(t)-k1*R(t)*L(t)+k4*X(t)-k3*R(t)*I(t)
f2=k2*Y(t)-k1*R(t)*L(t)
f3=k4*X(t)-k3*R(t)*I(t)
f4=-f2
f5=-f3
eq1=sympy.Eq(sympy.Derivative(R(t),t),f1)
eq2=sympy.Eq(sympy.Derivative(L(t),t),f2)
eq3=sympy.Eq(sympy.Derivative(I(t),t),f3)
eq4=sympy.Eq(sympy.Derivative(Y(t),t),f4)
eq5=sympy.Eq(sympy.Derivative(X(t),t),f5)
Sys=(eq1,eq2,eq3,eq4,eq5])
solsys=dsolve_system(eqs=Sys,funcs=[X(t),Y(t),R(t),L(t),I(t)], t=t, ics={Y(0):0, X(0):0})
There is the answer:
NotImplementedError:
The system of ODEs passed cannot be solved by dsolve_system.
I have tried with dsolve too, but I get the same.
Is there any other solver I can use or some way of doing this that will allow me to get the function for the fitting? I'm using python 3.8 in Spider with Anaconda in windows64.
Thank you!
# Update
Following
You are saying "experiment". So you have data and want to fit the model to them, find appropriate values for k3 and k4 at least, and perhaps for all coefficients and the initial conditions (the first measured data point might not be the initial condition for the best fit)? See stackoverflow.com/questions/71722061/… for a recent attempt on such a task. –
Lutz Lehmann
23 hours
There is my new code:
t=[0,0.25,0.5,0.75,1.5,2.27,3.05,3.82,4.6,5.37,6.15,6.92,7.7,8.47,13.42,18.42,23.42,28.42,33.42,38.42,43.42,48.42,53.42,58.42,63.42,68.42,83.4,98.4,113.4,128.4,143.4,158.4,173.4,188.4,203.4,218.4,233.4,248.4]
yexp=[832.49,1028.01,1098.12,1190.08,1188.97,1377.09,1407.47,1529.35,1431.72,1556.66,1634.59,1679.09,1692.05,1681.89,1621.88,1716.77,1717.91,1686.7,1753.5,1722.98,1630.14,1724.16,1670.45,1677.16,1614.98,1671.16,1654.03,1661.84,1675.31,1626.76,1638.29,1614.41,1594.31,1584.73,1599.22,1587.85,1567.74,1602.69]
def derivative(S, t, k3, k4):
k1=1798931
k2=0.2629
x, y,r,l,i = S
doty = k1*r*l+k2*y
dotx = k3*r*i-k4*x
dotr = k2*y-k1*r*l+k4*x-k3*r*i
dotl = k2*y-k1*r*l
doti = k4*x-k3*r*i
return np.array([doty, dotx, dotr, dotl, doti])
def solver(XY,t,para):
return odeint(derivative, XY, t, args = para, atol=1e-8, rtol=1e-11)
def integration(XY_arr,*para):
XY0 = para[:5]
para = para[5:]
T = np.arange(len(XY_arr))
res0 = solver(XY0,T, para)
res1 = [ solver(XY0,[t,t+1],para)[-1]
for t,XY in enumerate(XY_arr[:-1]) ]
return np.concatenate([res0,res1]).flatten()
XData =yexp
YData = np.concatenate([ yexp,yexp,yexp,yexp,yexp,yexp[1:],yexp[1:],yexp[1:],yexp[1:],yexp[1:]]).flatten()
p0 =[0,0,100,10,10,1e8,0.01]
params, info = curve_fit(integration,XData,YData,p0=p0, maxfev=5000)
XY0, para = params[:5], params[5:]
print(XY0,tuple(para))
t_plot = np.linspace(0,len(t),500)
x_plot = solver(XY0, t_plot, tuple(para))
But the output are not correct, as are the same as initial condition p0:
[ 0. 0. 100. 10. 10.] (100000000.0, 0.01)
Graph
I understand that the function 'integration' gives packed values of y for each function at each instant of time, but I don't know how to unpack them to make the curve_fitt separately. Maybe I don't quite understand how it works.
Thank you!
As you observed, sympy is not able to solve this system. This might mean that
the procedure to classify ODE in sympy is not complete enough, or
some trick/method is needed above the standard set of methods that is implemented in sympy, or
that there just is no symbolic solution.
The last case is the generic one, take a symbolically solvable ODE, add some random term, and almost certainly the resulting ODE is no longer symbolically solvable.
As I understand with the comments, you have an model via ODE system with state space (cX,cY,cR,cL,cI) with equations with 4 parameters k1,k2,k3,k4 and, by the structure of a reaction system R+I <-> X, R+L <-> Y, the sums cR+cX+cY, cL+cY, cI+cX are all constant.
For some other process that is approximately represented by the model, you have time series data t[k],y[k] for the Y component. Also you have partial information on the initial state and the parameter set. If there are sufficiently many data points one could also forget about these, fit for all parameters, and compare how far away the given parameters are to the computed ones.
There are several modules and packages that solve this fitting task in a more or less abstract fashion. I think pyomo and gekko can both be used. More directly one can use the facilities of scipy.odr or scipy.optimize.
Define the forward function that transforms time and parameters
def model(t,u,k1,k2,k3,k4):
X,Y,R,L,I = u
dL = k2*Y - k1*R*L
dI = k4*X - k3*R*I
dR = dL+dI
dX = -dI
dY = -dL
return dX,dY,dR,dL,dI
def solver(t,u0,k):
res = solve_ivp(model, [0, t[-1]], u0, args=tuple(k), t_eval=t,
method="DOP853", atol=1e-7, rtol=1e-10)
return res.y
Prepare some data plus noise
k1o = 6.500; k2o=0.9
T = np.linspace(0,0.05,21)
U = solver(T, [0,0,50,40,25], [k1o, k2o, 5.400, 0.7])
Y = U[1] # equilibrium slightly above 30
Y += np.random.uniform(high=0.05, size=Y.shape)
Prepare the function that splits the combined parameter vector in initial state and coefficients, call the curve fitting function
from scipy.optimize import curve_fit
def partial(t,R,L,I,k3,k4):
print(R,L,I,k3,k4)
U = solver(t,[0,0,R,L,I],[k1o,k2o,k3,k4])
return U[1]
params, info = curve_fit(partial,T,Y, p0=[30,20,10, 0.3,3.000])
R,L,I, k3,k4 = params
print(R,L,I, k3,k4)
It turns out that curve_fit goes into strange regions with large negative values. A likely reason is that the Y component is, in the end, not coupled strongly enough to all the other components, meaning that large changes in some of the parameters have minimal influence on Y, so that minimal noise in Y can lead to large deviations in these parameters. Here this apparently happens (first) to k3.

Computing the confidence intervals for the lambda parameter of an exponential distribution in Python

Suppose I have a sample, which I have reason to believe follows an exponential distribution. I want to estimate the distributions parameter (lambda) and some indication of the confidence. Either the confidence interval or the standard error would be fine. Sadly, scipy.stats.expon.fit does not seem to permit that. Here's an example, I will use lambda=1/120=0.008333 for the test data:
""" Generate test data"""
import scipy.stats
test_data = scipy.stats.expon(scale=120).rvs(size=3000)
""" Scipy.stats fit"""
fit = scipy.stats.expon.fit(test_data)
print("\nExponential parameters:", fit, " Specifically lambda: ", 1/fit[1], "\n")
# Exponential parameters: (0.0066790678905608875, 116.8376079908356) Specifically lambda: 0.008558887991599736
The answer to a similar question about gamma-distributed data suggests using GenericLikelihoodModel from the statsmodels module. While I can confirm that this works nicely for gamma-distributed data, it does not for exponential distributions, because the optimization apparently results in a non-invertible Hessian matrix. This results either from non-finite elements in the Hessian or from np.linalg.eigh producing non-positive eigenvalues for the Hessian. (Source code here; HessianInversionWarning is raised in the fit method of the LikelihoodModel class.).
""" Statsmodel fit"""
from statsmodels.base.model import GenericLikelihoodModel
class Expon(GenericLikelihoodModel):
nparams = 2
def loglike(self, params):
return scipy.stats.expon.logpdf(self.endog, *params).sum()
res = Expon(test_data).fit(start_params=fit)
res.df_model = len(fit)
res.df_resid = len(test_data) - len(fit)
print(res.summary())
#Optimization terminated successfully.
# Current function value: 5.760785
# Iterations: 38
# Function evaluations: 76
#/usr/lib/python3.8/site-packages/statsmodels/tools/numdiff.py:352: RuntimeWarning: invalid value encountered in double_scalars
# hess[i, j] = (f(*((x + ee[i, :] + ee[j, :],) + args), **kwargs)
#/usr/lib/python3.8/site-packages/statsmodels/base/model.py:547: HessianInversionWarning: Inverting hessian failed, no bse or cov_params available
# warn('Inverting hessian failed, no bse or cov_params '
#/usr/lib/python3.8/site-packages/scipy/stats/_distn_infrastructure.py:903: RuntimeWarning: invalid value encountered in greater
# return (a < x) & (x < b)
#/usr/lib/python3.8/site-packages/scipy/stats/_distn_infrastructure.py:903: RuntimeWarning: invalid value encountered in less
# return (a < x) & (x < b)
#/usr/lib/python3.8/site-packages/scipy/stats/_distn_infrastructure.py:1912: RuntimeWarning: invalid value encountered in less_equal
# cond2 = cond0 & (x <= _a)
# Expon Results
#==============================================================================
#Dep. Variable: y Log-Likelihood: -17282.
#Model: Expon AIC: 3.457e+04
#Method: Maximum Likelihood BIC: 3.459e+04
#Date: Thu, 06 Aug 2020
#Time: 13:55:24
#No. Observations: 3000
#Df Residuals: 2998
#Df Model: 2
#==============================================================================
# coef std err z P>|z| [0.025 0.975]
#------------------------------------------------------------------------------
#par0 0.0067 nan nan nan nan nan
#par1 116.8376 nan nan nan nan nan
#==============================================================================
This seems to happen every single time, so it might be something related to exponentially-distributed data.
Is there some other possible approach? Or am I maybe missing something or doing something wrong here?
Edit: It turns out, I was doing something wrong, namely, I wrongly had
test_data = scipy.stats.expon(120).rvs(size=3000)
instead of
test_data = scipy.stats.expon(scale=120).rvs(size=3000)
and was correspondingly looking at the forst element of the fit tuple while I should have been looking at the second.
As a result, the two other options I considered (manually computing the fit and confidence intervals following the standard procedure described on wikipedia) and using scikits.bootstrap as suggested in this answer actually does work and is part of the solution I'll add in a minute and not of the problem.
As mentioned in the edited question, part of the problem was that I was looking at the wrong parameter when creating the sample and again in the fit.
What remains is that scipy.stats.expon.fit does not offer the possibility to compute confidence or errors and that using GenericLikelihoodModel from the statsmodels module as suggested here fails because of a malformed Hessian.
There are, however three approaches that do work:
1. Using the simple inference procedure for confidence intervals in exponential data as given in the wikipedia article
""" Maximum likelihood"""
import numpy as np
ML_lambda = 1 / np.mean(test_data)
print("\nML lambda: {0:8f}".format(ML_lambda))
#ML lambda: 0.008558
""" Bias corrected ML"""
ML_BC_lambda = ML_lambda - ML_lambda / (len(test_data) - 1)
print("\nML bias-corrected lambda: {0:8f}".format(ML_BC_lambda))
#ML bias-corrected lambda: 0.008556
Computation of the confidence intervals::
""" Maximum likelihood 95% confidence"""
CI_distance = ML_BC_lambda * 1.96/(len(test_data)**0.5)
print("\nLambda with confidence intervals: {0:8f} +/- {1:8f}".format(ML_BC_lambda, CI_distance))
print("Confidence intervals: ({0:8f}, {1:9f})".format(ML_BC_lambda - CI_distance, ML_BC_lambda + CI_distance))
#Lambda with confidence intervals: 0.008556 +/- 0.000306
#Confidence intervals: (0.008249, 0.008862)
Second option with this: In addition, the confidence interval equation should also be valid for a lambda estimate produced by a different such as the one from scipy.stats.expon.fit. (I thought that the fitting procedure in scipy.stats.expon.fit was more reliable, but it turns out it is actually the same, without the bias correction (see above).)
""" Maximum likelihood 95% confidence based on scipy.stats fit"""
scipy_stats_lambda = 1 / fit[1]
scipy_stats_CI_distance = scipy_stats_lambda * 1.96/(len(test_data)**0.5)
print("\nOr, based on scipy.stats fit:")
print("Lambda with confidence intervals: {0:8f} +/- {1:8f}".format(scipy_stats_lambda, scipy_stats_CI_distance))
print("Confidence intervals: ({0:8f}, {1:9f})".format(scipy_stats_lambda - scipy_stats_CI_distance,
scipy_stats_lambda + scipy_stats_CI_distance))
#Or, based on scipy.stats fit:
#Lambda with confidence intervals: 0.008559 +/- 0.000306
#Confidence intervals: (0.008253, 0.008865)
2. Bootstrapping with scikits.bootstrap following the suggestion in this answer
This yields an InstabilityWarning: Some values were NaN; results are probably unstable (all values were probably equal), so this should be treated a bit skeptically.
""" Bootstrapping with scikits"""
print("\n")
import scikits.bootstrap as boot
bootstrap_result = boot.ci(test_data, scipy.stats.expon.fit)
print(bootstrap_result)
#tmp/expon_fit_test.py:53: InstabilityWarning: Some values were NaN; results are probably unstable (all values were probably equal)
# bootstrap_result = boot.ci(test_data, scipy.stats.expon.fit)
#[[6.67906789e-03 1.12615588e+02]
# [6.67906789e-03 1.21127091e+02]]
3. Using rpy2
""" Using r modules with rpy2"""
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
MASS = importr('MASS')
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
rpy_fit = MASS.fitdistr(test_data, "exponential")
rpy_estimate = rpy_fit.rx("estimate")[0][0]
rpy_sd = rpy_fit.rx("sd")[0][0]
rpy_lower = rpy_estimate - 2*rpy_sd
rpy_upper = rpy_estimate + 2*rpy_sd
print("\nrpy2 fit: \nLambda={0:8f} +/- {1:8f}, CI: ({2:8f}, {3:8f})".format(rpy_estimate, rpy_sd, rpy_lower, rpy_upper))
#rpy2 fit:
#Lambda=0.008558 +/- 0.000156, CI: (0.008246, 0.008871)
You have already found a solution to your problem, but here is a solution based on OpenTURNS. I think it relies on bootstrapping under the hood.
OpenTURNS requires you to reshape the data so that it is clear we are dealing with 3000 one-dimensional points and not a single 3000-dimensional point.
test_data = test_data.reshape(-1, 1)
The rest is rather straightforward.
import openturns as ot
confidence_level = 0.9
params_ci = ot.ExponentialFactory().buildEstimator(test_data).getParameterDistribution().computeBilateralConfidenceInterval(confidence_level)
lambda_ci = [params_ci.getLowerBound()[0], params_ci.getUpperBound()[0]]
# the index 0 means we are interested in the CI on lambda
print(lambda_ci)
I got the following output (but it depends on the random seed):
[0.008076302149561718, 0.008688296487447742]

Implement early stopping in emcee given a convergence criterion

I am using emcee.EnsembleSampler to sample some likelihoods. I am doing this on millions of different data sets, and so performance is important.
It would be great to be able to get a chain to stop sampling once it has satisfied some convergence criterion on the autocorrelation time.
I can not find any way of doing this in the documentation, or through my searches, so was hoping someone has a recipe for it.
While I too couldn't find anything in the current emcee docs, the latest versions seems to have added a tutorial which happens to show the exact thing you are trying to do: link
In case the link breaks or things change, here is the example:
import emcee
import numpy as np
np.random.seed(42)
# The definition of the log probability function
# We'll also use the "blobs" feature to track the "log prior" for each step
def log_prob(theta):
log_prior = -0.5 * np.sum((theta-1.0)**2 / 100.0)
log_prob = -0.5 * np.sum(theta**2) + log_prior
return log_prob, log_prior
# Initialize the walkers
coords = np.random.randn(32, 5)
nwalkers, ndim = coords.shape
# Set up the backend
# Don't forget to clear it in case the file already exists
filename = "tutorial.h5"
backend = emcee.backends.HDFBackend(filename)
backend.reset(nwalkers, ndim)
# Initialize the sampler
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, backend=backend)
max_n = 100000
# We'll track how the average autocorrelation time estimate changes
index = 0
autocorr = np.empty(max_n)
# This will be useful to testing convergence
old_tau = np.inf
# Now we'll sample for up to max_n steps
for sample in sampler.sample(coords, iterations=max_n, progress=True):
# Only check convergence every 100 steps
if sampler.iteration % 100:
continue
# Compute the autocorrelation time so far
# Using tol=0 means that we'll always get an estimate even
# if it isn't trustworthy
tau = sampler.get_autocorr_time(tol=0)
autocorr[index] = np.mean(tau)
index += 1
# Check convergence
converged = np.all(tau * 100 < sampler.iteration)
converged &= np.all(np.abs(old_tau - tau) / tau < 0.01)
if converged:
break
old_tau = tau

Least Squares method in practice

Very simple regression task. I have three variables x1, x2, x3 with some random noise. And I know target equation: y = q1*x1 + q2*x2 + q3*x3. Now I want to find target coefs: q1, q2, q3 evaluate the
performance using the mean Relative Squared Error (RSE) (Prediction/Real - 1)^2 to evaluate the performance of our prediction methods.
In the research, I see that this is ordinary Least Squares Problem. But I can't get from examples on the internet how to solve this particular problem in Python. Let say I have data:
import numpy as np
sourceData = np.random.rand(1000, 3)
koefs = np.array([1, 2, 3])
target = np.dot(sourceData, koefs)
(In real life that data are noisy, with not normal distribution.) How to find this koefs using Least Squares approach in python? Any lib usage.
#ayhan made a valuable comment.
And there is a problem with your code: Actually there is no noise in the data you collect. The input data is noisy, but after the multiplication, you don't add any additional noise.
I've added some noise to your measurements and used the least squares formula to fit the parameters, here's my code:
data = np.random.rand(1000,3)
true_theta = np.array([1,2,3])
true_measurements = np.dot(data, true_theta)
noise = np.random.rand(1000) * 1
noisy_measurements = true_measurements + noise
estimated_theta = np.linalg.inv(data.T # data) # data.T # noisy_measurements
The estimated_theta will be close to true_theta. If you don't add noise to the measurements, they will be equal.
I've used the python3 matrix multiplication syntax.
You could use np.dot instead of #
That makes the code longer, so I've split the formula:
MTM_inv = np.linalg.inv(np.dot(data.T, data))
MTy = np.dot(data.T, noisy_measurements)
estimated_theta = np.dot(MTM_inv, MTy)
You can read up on least squares here: https://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)#The_general_problem
UPDATE:
Or you could just use the builtin least squares function:
np.linalg.lstsq(data, noisy_measurements)
In addition to the #lhk answer I have found great scipy Least Squares function. It is easy to get the requested behavior with it.
This way we can provide a custom function that returns residuals and form Relative Squared Error instead of absolute squared difference:
import numpy as np
from scipy.optimize import least_squares
data = np.random.rand(1000,3)
true_theta = np.array([1,2,3])
true_measurements = np.dot(data, true_theta)
noise = np.random.rand(1000) * 1
noisy_measurements = true_measurements + noise
#noisy_measurements[-1] = data[-1] # (1000 * true_theta) - uncoment this outliner to see how much Relative Squared Error esimator works better then default abs diff for this case.
def my_func(params, x, y):
res = (x # params) / y - 1 # If we change this line to: (x # params) - y - we will got the same result as np.linalg.lstsq
return res
res = least_squares(my_func, x0, args=(data, noisy_measurements) )
estimated_theta = res.x
Also, we can provide custom loss with loss argument function that will process the residuals and form final loss.

Using adaptive step sizes with scipy.integrate.ode

The (brief) documentation for scipy.integrate.ode says that two methods (dopri5 and dop853) have stepsize control and dense output. Looking at the examples and the code itself, I can only see a very simple way to get output from an integrator. Namely, it looks like you just step the integrator forward by some fixed dt, get the function value(s) at that time, and repeat.
My problem has pretty variable timescales, so I'd like to just get the values at whatever time steps it needs to evaluate to achieve the required tolerances. That is, early on, things are changing slowly, so the output time steps can be big. But as things get interesting, the output time steps have to be smaller. I don't actually want dense output at equal intervals, I just want the time steps the adaptive function uses.
EDIT: Dense output
A related notion (almost the opposite) is "dense output", whereby the steps taken are as large as the stepper cares to take, but the values of the function are interpolated (usually with accuracy comparable to the accuracy of the stepper) to whatever you want. The fortran underlying scipy.integrate.ode is apparently capable of this, but ode does not have the interface. odeint, on the other hand, is based on a different code, and does evidently do dense output. (You can output every time your right-hand-side is called to see when that happens, and see that it has nothing to do with the output times.)
So I could still take advantage of adaptivity, as long as I could decide on the output time steps I want ahead of time. Unfortunately, for my favorite system, I don't even know what the approximate timescales are as functions of time, until I run the integration. So I'll have to combine the idea of taking one integrator step with this notion of dense output.
EDIT 2: Dense output again
Apparently, scipy 1.0.0 introduced support for dense output through a new interface. In particular, they recommend moving away from scipy.integrate.odeint and towards scipy.integrate.solve_ivp, which as a keyword dense_output. If set to True, the returned object has an attribute sol that you can call with an array of times, which then returns the integrated functions values at those times. That still doesn't solve the problem for this question, but it is useful in many cases.
Since SciPy 0.13.0,
The intermediate results from the dopri family of ODE solvers can
now be accessed by a solout callback function.
import numpy as np
from scipy.integrate import ode
import matplotlib.pyplot as plt
def logistic(t, y, r):
return r * y * (1.0 - y)
r = .01
t0 = 0
y0 = 1e-5
t1 = 5000.0
backend = 'dopri5'
# backend = 'dop853'
solver = ode(logistic).set_integrator(backend)
sol = []
def solout(t, y):
sol.append([t, *y])
solver.set_solout(solout)
solver.set_initial_value(y0, t0).set_f_params(r)
solver.integrate(t1)
sol = np.array(sol)
plt.plot(sol[:,0], sol[:,1], 'b.-')
plt.show()
Result:
The result seems to be slightly different from Tim D's, although they both use the same backend. I suspect this having to do with FSAL property of dopri5. In Tim's approach, I think the result k7 from the seventh stage is discarded, so k1 is calculated afresh.
Note: There's a known bug with set_solout not working if you set it after setting initial values. It was fixed as of SciPy 0.17.0.
I've been looking at this to try to get the same result. It turns out you can use a hack to get the step-by-step results by setting nsteps=1 in the ode instantiation. It will generate a UserWarning at every step (this can be caught and suppressed).
import numpy as np
from scipy.integrate import ode
import matplotlib.pyplot as plt
import warnings
def logistic(t, y, r):
return r * y * (1.0 - y)
r = .01
t0 = 0
y0 = 1e-5
t1 = 5000.0
#backend = 'vode'
backend = 'dopri5'
#backend = 'dop853'
solver = ode(logistic).set_integrator(backend, nsteps=1)
solver.set_initial_value(y0, t0).set_f_params(r)
# suppress Fortran-printed warning
solver._integrator.iwork[2] = -1
sol = []
warnings.filterwarnings("ignore", category=UserWarning)
while solver.t < t1:
solver.integrate(t1, step=True)
sol.append([solver.t, solver.y])
warnings.resetwarnings()
sol = np.array(sol)
plt.plot(sol[:,0], sol[:,1], 'b.-')
plt.show()
result:
The integrate method accepts a boolean argument step that tells the method to return a single internal step. However, it appears that the 'dopri5' and 'dop853' solvers do not support it.
The following code shows how you can get the internal steps taken by the solver when the 'vode' solver is used:
import numpy as np
from scipy.integrate import ode
import matplotlib.pyplot as plt
def logistic(t, y, r):
return r * y * (1.0 - y)
r = .01
t0 = 0
y0 = 1e-5
t1 = 5000.0
backend = 'vode'
#backend = 'dopri5'
#backend = 'dop853'
solver = ode(logistic).set_integrator(backend)
solver.set_initial_value(y0, t0).set_f_params(r)
sol = []
while solver.successful() and solver.t < t1:
solver.integrate(t1, step=True)
sol.append([solver.t, solver.y])
sol = np.array(sol)
plt.plot(sol[:,0], sol[:,1], 'b.-')
plt.show()
Result:
FYI, although an answer has been accepted already, I should point out for the historical record that dense output and arbitrary sampling from anywhere along the computed trajectory is natively supported in PyDSTool. This also includes a record of all the adaptively-determined time steps used internally by the solver. This interfaces with both dopri853 and radau5 and auto-generates the C code necessary to interface with them rather than relying on (much slower) python function callbacks for the right-hand side definition. None of these features are natively or efficiently provided in any other python-focused solver, to my knowledge.
Here's another option that should also work with dopri5 and dop853. Basically, the solver will call the logistic() function as often as needed to calculate intermediate values so that's where we store the results:
import numpy as np
from scipy.integrate import ode
import matplotlib.pyplot as plt
sol = []
def logistic(t, y, r):
sol.append([t, y])
return r * y * (1.0 - y)
r = .01
t0 = 0
y0 = 1e-5
t1 = 5000.0
# Maximum number of steps that the integrator is allowed
# to do along the whole interval [t0, t1].
N = 10000
#backend = 'vode'
backend = 'dopri5'
#backend = 'dop853'
solver = ode(logistic).set_integrator(backend, nsteps=N)
solver.set_initial_value(y0, t0).set_f_params(r)
# Single call to solver.integrate()
solver.integrate(t1)
sol = np.array(sol)
plt.plot(sol[:,0], sol[:,1], 'b.-')
plt.show()

Categories

Resources