How does scipy.integrate.ode.integrate() work?

How does scipy.integrate.ode.integrate() work? - python

I have obviously read through the documentation, but I have not been able to find a more detailed description of what is happening under the covers. Specifically, there are a few behaviors that I am very confused about:
General setup
import numpy as np
from scipy.integrate import ode
#Constants in ODE
N = 30
K = 0.5
w = np.random.normal(np.pi, 0.1, N)
#Integration parameters
y0 = np.linspace(0, 2*np.pi, N, endpoint=False)
t0 = 0
#Set up the solver
solver = ode(lambda t,y: w + K/N*np.sum( np.sin( y - y.reshape(N,1) ), axis=1))
solver.set_integrator('vode', method='bdf')
solver.set_initial_value(y0, t0)
Problem 1: solver.integrate(t0) fails
Setting up the integrator, and asking for the value at t0 the first time returns a successful integration. Repeating this returns the correct number, but the solver.successful() method returns false:
solver.integrate(t0)
>>> array([ 0. , 0.20943951, 0.41887902, ..., 5.65486678,
5.86430629, 6.0737458 ])
solver.successful()
>>> True
solver.integrate(t0)
>>> array([ 0. , 0.20943951, 0.41887902, ..., 5.65486678,
5.86430629, 6.0737458 ])
solver.successful()
>>> False
My question is, what is happening in the solver.integrate(t) method that causes it to succeed the first time, and fail subsequently, and what does it mean to have an “unsuccessful” integration? Furthermore, why does the integrator fail silently, and continue to produce useful-looking outputs until I ask it explicitly whether it was successful?
Related, is there a way to reset the failed integration, or do I need to re-instantiate the solver from scratch?
Problem 2: solver.integrate(t) immediately returns an answer for almost any value of t
Even though my initial value of y0 is given at t0=0, I can request the value at t=10000 and get the answer immediately. I would expect that the numerical integration over such a large time span should take at least a few seconds (e.g. in Matlab, asking to integrate over 10000 time steps would take several minutes).
For example, re-run the setup from above and execute:
solver.integrate(10000)
>>> array([ 2153.90803383, 2153.63023706, 2153.60964064, ..., 2160.00982959,
2159.90446056, 2159.82900895])
Is Python really that fast, or is this output total nonsense?

Problem 0
Don’t ignore error messages. Yes, ode’s error messages can be cryptic at times, but you still want to avoid them.
Problem 1
As you already integrated up to t0 with the first call of solver.integrate(t0), you are integrating for a time step of 0 with the second call. This throws the cryptic error:
DVODE-- ISTATE (=I1) .gt. 1 but DVODE not initialized
In above message, I1 = 2
/usr/lib/python3/dist-packages/scipy/integrate/_ode.py:869: UserWarning: vode: Illegal input detected. (See printed message.)
'Unexpected istate=%s' % istate))
Problem 2.1
There is a maximum number of (internal) steps that a solver is going to take in one call without throwing an error. This can be set with the nsteps argument of set_integrator. If you integrate a large time at once, nsteps will be exceeded even if nothing is wrong, and the following error message is thrown:
/usr/lib/python3/dist-packages/scipy/integrate/_ode.py:869: UserWarning: vode: Excess work done on this call. (Perhaps wrong MF.)
'Unexpected istate=%s' % istate))
The integrator then stops at whenever this happens.
Problem 2.2
If you set nsteps=10**10, the integration runs without problems. It still is pretty fast though (roughly 1 s on my machine). The reason for this is as follows:
For a multi-dimensional system such as yours, there are two main runtime sinks when integrating:
Vector and matrix operations within the integrator. In scipy.ode, these are all realised with NumPy operations or ported Fortran or C code. Anyway, they are realised with compiled code without Python overhead and thus very efficient.
Evaluating the derivative (lambda t,y: w + K/N*np.sum( np.sin( y - y.reshape(N,1) ), axis=1) in your case). You realised this with NumPy operations, which again are realised with compiled code and very efficient. You may improve this a little bit with a purely compiled function, but that will grant you at most a small factor. If you used Python lists and loops instead, it would be horribly slow.
Therefore, for your problem, everything relevant is handled with compiled code under the hood and the integration is handled with an efficiency comparable to that of, e.g., a pure C program. I do not know how the two above aspects are handled in Matlab, but if either of the above challenges is handled with interpreted instead of compiled loops, this would explain the runtime discrepancy you observe.

To the second question, yes, the output might be nonsense. Local errors, be they from discretization or floating point operations, accumulate with a compounding factor which is about the Lipschitz constant of the ODE function. In a first estimate, the Lipschitz constant here is K=0.5. The magnification rate of early errors, that is, their coefficient as part of the global error, can thus be as large as exp(0.5*10000), which is a huge number.
On the other hand it is not surprising that the integration is fast. Most of the provided methods use step size adaptation, and with the standard error tolerances this might result in only some tens of internal steps. Reducing the error tolerances will increase the number of internal steps and may change the numerical result drastically.

Related

Running something in a method takes much more depending on the data types

Introduction
Today I found a weird behaviour in python while running some experiments with exponentiation and I was wondering if someone here knows what's happening. In my experiments, I was trying to check what is faster in python int**int or float**float. To check that I run some small snippets, and I found a really weird behaviour.
Weird results
My first approach was just to write some for loops and prints to check which one is faster. The snipper I used is this one
import time
# Run powers outside a method
ti = time.time()
for i in range(EXPERIMENTS):
x = 2**2
tf = time.time()
print(f"int**int took {tf-ti:.5f} seconds")
ti = time.time()
for i in range(EXPERIMENTS):
x = 2.**2.
tf = time.time()
print(f"float**float took {tf-ti:.5f} seconds")
After running it I got
int**int took 0.03004
float**float took 0.03070 seconds
Cool, it seems that data types do not affect the execution time. However, since I try to be a clean coder I refactored the repeated logic in a function power_time
import time
# Run powers in a method
def power_time(base, exponent):
ti = time.time()
for i in range(EXPERIMENTS):
x = base ** exponent
tf = time.time()
return tf-ti
print(f"int**int took {power_time(2, 2):.5f} seconds")
print(f"float**float took {power_time(2., 2.):5f} seconds")
And what a surprise of mine when I got these results
int**int took 0.20140 seconds
float**float took 0.05051 seconds
The refactor didn't affect a lot the float case, but it multiplied by ~7 the time required for the int case.
Conclusions and questions
Apparently, running something in a method can slow down your process depending on your data types, and that's really weird to me.
Also, if I run the same experiments but change ** by * or + the weird results disappear, and all the approaches give more or less the same results
Does someone know why is this happening? Am I missing something?

Apparently, running something in a method can slow down your process depending on your data types, and that's really weird to me.
It would be really weird if it was not like this! You can write your class that has it's own ** operator (through implementing the __pow__(self, other) method), and you could, for example, sleep 1s in there. Why should that take as long as taking a float to the power of another?
So, yeah, Python is a dynamically typed language. So, the operations done on data depend on the type of that data, and things can generally take different times.
In your first example, the difference never arises, because a) most probably the values get cached, because right after parsing it's clear that 2**2 is a constant and does not need to get evaluated every loop. Even if that's not the case, b) the time it costs to run a loop in python is hundreds of times that it takes to actually execute the math here – again, dynamically typed, dynamically named.
base**exponent is a whole different story. None about this is constant. So, there's actually going to be a calculation every iteration.
Now, the ** operator (__rpow__ in the Python data model) for Python's built-in float type is specified to do the float exponent (which is something implemented in highly optimized C and assembler), as exponentiation can elegantly be done on floating point numbers. Look for nb_power in cpython's floatobject.c. So, for the float case, the actual calculation is "free" for all that matters, again, because your loop is limited by how much effort it is to resolve all the names, types and functions to call in your loop. Not by doing the actual math, which is trivial.
The ** operator on Python's built-in int type is not as neatly optimized. It's a lot more complicated – it needs to do checks like "if the exponent is negative, return a float," and it does not do elementary math that your computer can do with a simple instruction, it handles arbitrary-length integers (remember, a python integer has as many bytes as it needs. You can save numbers that are larger than 64 bit in a Python integer!), which comes with allocation and deallocations. (I encourage you to read long_pow in CPython's longobject.c; it has 200 lines.)
All in all, integer exponentiation is expensive in python, because of python's type system.

Python Z3 solver not correctly reporting satisfiability for exponent constraints

I noticed the Z3 Solver library for python wasn't correctly reporting satisfiability for a problem involving exponents that I was working on. Specifically, it reported finding no solutions on cases where I knew a valid one -- unless I added constraints that effectively "told it the answer".
I simplified the problem to isolate it. In the code below, I'm asking it to find q and m such that q^m == 100. With the constraint 0 <= q < 100, you have, of course, q=10, m=2. But with the code below, it reports finding no solution (raise Z3Exception("model is not available")):
import z3.z3 as z
slv = z.Solver()
m = z.Int('m')
q = z.Int('q')
slv.add(100 == (q ** m))
slv.add(q >= 0)
slv.add(q < 100)
slv.add(m >= 0)
slv.add(m <= 100)
slv.check()
However, if you replace slv.add(m <= 100)) with slv.add(m <= 2) (or slv.add(m == 2)!), it has no problem finding the solution (of q=10, m=2).
Am I using Z3 wrong somehow?
I thought it would only report unsatisfiability ("model is not available") if it proved there was no solution and would otherwise hang while searching for a solution. Is that wrong? I didn't expect to be in a position where it only finds the solution if you shrink down the search space enough.
I haven't had this problem with any other operation besides exponentiation (e.g. addition, modulo, etc.).

You're misinterpreting what z3 is telling you. Change your line:
slv.check()
to:
print(slv.check())
print(slv.reason_unknown())
And you'll see it prints:
unknown
smt tactic failed to show goal to be sat/unsat (incomplete (theory arithmetic))
So, z3 doesn't know if your problem is sat or unsat; so you cannot ask for a model. The reason for this is the power operator: It introduces non-linearity, and the theory of non-linear integer equations is undecidable in general. That is, z3's solver is incomplete for this problem. In practice, this means z3 will apply a bunch of heuristics, and will hopefully solve the problem for you. But you can get unknown as well, as you observed.
It's not surprising that if you add extra constraints you're helping the solver and thus it finds an answer. You're just helping it further and those heuristics have an easier time. With different versions of z3, you can observe different behavior. (i.e., in the future, they might be able to solve this problem out-of-the-box, or maybe the heuristics will get worse and you helping it this way won't resolve the issue either.) Such is the nature of automatic-theorem proving with undecidable theories.
Bottom line: Any call to check can return sat, unsat, or unknown. Your program should check for all three possibilities and interpret the output accordingly.

Scipy minimize iterating past bounds

I am trying to minimize a function of 3 input variables using scipy. The function reads like so-
def myfunc(x):
x[0] = a
x[1] = b
x[2] = c
n = f(a,b,c)
return n
bound1 = (80,100)
bound2 = (10,20)
bound3 = (312,740)
guess = [a0,b0,c0]
bds = (bound1,bound2,bound3)
result = minimize(myfunc, guess,method='L-BFGS-B',bounds=bds)
The function I am trying to currently run reaches a minimum at a=100,b=10,c=740, which is at the end of the bounds.
The minimize function keeps trying to iterate past the end of bound 3 (gets to c0 value of 740.0000000149012 on its last iteration.
Is there any way to stop this from happening? i.e. stop the iteration at the actual end of my bound?

This happens due to numerical-differentiation, which itself is not only needed to infer the step-direction and size, but also to reason about termination.
In general you can't do much without being very careful in regards to whatever solver (and there are many backend-solvers) being used. The basic idea is to replace the automatic numerical-differentiation with one provided by you: this one then respects those bounds and must be careful about the solvers-internals, e.g. "how to reason about termination at this end".
Fix A:
Your problem should vanish automatically when using: Pull-request #10673, which touches your configuration: L-BFGS-B.
It seems, this PR is not part of the current release SciPy 1.4.1 (as this was 2 months before the PR).
See also #6026, where a milestone of 1.5.0 is mentioned in regards to some changes including respecting bounds in num-diff.
For above PR, you will need to install scipy from the sources, which is:
quite doable on linux (and maybe os x)
not something you should try on windows!
trust me...
See the documentation if needed.
Fix B:
Apart from that, as you are doing unconstrained-optimization (with variable-bounds) where more solver-backends are available (compared to constrained-optimization), you might try another solver, trust-constr, which has explicit support for this, see #9098.
Be careful to recognize, that you need to signal this explicitly when setting up the bounds!

High memory usage when doing direct transcription with sympy equations

I used sympy to derive, via lagrange, the equations of motion of my 3 link robot. The resultant equation of motion in the form (theta_dot_dot = f(theta, theta_dot)) turned out very complicated with A LOT of cos and sin. I then lambdified the functions to use with drake, replacing all the sympy.sin and sympy.cos with drake.sin, drake.cos.
The final function can be evaluated numerically (i.e. given theta, theta_dot, find theta_dot_dot) somewhat efficiently in the milliseconds range.
I then tried to use direct transcription to do trajectory optimization. Note I did not use the DirectTranscription library, instead manually added the necessary constraints.
The constraints are added roughly as follows:
for i in range(NUM_TIME_STEPS-1):
print("Adding constraints for t = " + str(i))
tau = mp.NewContinuousVariables(3, "tau_%d" % i)
next_state = mp.NewContinuousVariables(8, "state_%d" % (i+1))
for j in range(8):
mp.AddConstraint(next_state[j] <= (state_over_time[i] + TIME_INTERVAL*derivs(state_over_time[i], tau))[j])
mp.AddConstraint(next_state[j] >= (state_over_time[i] + TIME_INTERVAL*derivs(state_over_time[i], tau))[j])
state_over_time[i+1] = next_state
tau_over_time[i] = tau
The problem I'm facing right now is that on each iteration of adding constraints, I observe that my memory usage increases by around 70-100MB. This means that my number of time steps cannot go more than around 50 before the program crashes due to out of memory.
I'm wondering what I can do to make trajectory optimization work for my robot. Obviously I can try to simplify (by hand or otherwise) the equations of motions... but is there anything else I can try? Is it even normal that the constraints are taking up so much memory? Am I doing something very wrong here?

You're pushing drake's symbolic through your complex equations. Making that better is a good goal, but probably you want to avoid it by using the other overload for AddConstraint:
AddConstraint(your_method, lb, ub, vars)
https://drake.mit.edu/pydrake/pydrake.solvers.mathematicalprogram.html?highlight=addconstraint#pydrake.solvers.mathematicalprogram.MathematicalProgram.AddConstraint
That will use your python code as a function pointer, and should use autodiff instead of symbolic.

MCMC convergence in hierarchical model with (large) time^2 term in pymc3

I have a hierarchical logit that has observations over time. Following Carter 2010, I have included a time, time^2, and time^3 term. The model mixes using Metropolis or NUTS before I add the time variables. HamiltonianMC fails. NUTS and Metropolis also work with time. But NUTS and Metropolis fail with time^2 and time^3, but they fail differently and in a puzzling way. However, unlike in other models that fail for more obvious model specification reasons, ADVI still gives an estimate, (and ELBO is not inf).
NUTS either stalls early (last time it made it to 60 iterations), or progresses too quickly and returns an empty traceplot with an error about varnames.
Metropolis errors out immediately with a dimension mismatch error. It looks like the one in this github error, but I'm using a Bernoulli outcome, not a negative binomial. The end of the error looks like: ValueError: Input dimension mis-match. (input[0].shape[0] = 1, input[4].shape[0] = 18)
I get an empty trace when I try HamiltonianMC. It returns the starting values with no mixing
ADVI gives me a mean and a standard deviation.
I increased the ADVI iterations by a lot. It gave pretty close to the same starting points and NUTS still failed.
I double checked that the fix for the github issue is in place in the version of pymc3 I'm running. It is.
My intuition is that this has something to do with how huge the time^2 and time^3 variables get, since I'm looking over a large time-frame. Time^3 starts at 0 and goes to 64,000.
Here is what I've tried for sampling so far. Note that I have small sample sizes while testing, since it takes so long to run (if it finishes) and I'm just trying to get it to sample at all. Once I find one that works, I'll up the number of iterations
with my_model:
mu,sds,elbo = pm.variational.advi(n=500000,learning_rate=1e-1)
print(mu['mu_b'])
step = pm.NUTS(scaling=my_model.dict_to_array(sds)**2,
is_cov=True)
my_trace = pm.sample(500,
step=step,
start=mu,
tune=100)
I've also done the above with tune=1000
I've also tried a Metropolis and Hamiltonian.
with my_model:
my_trace = pm.sample(5000,step=pm.Metropolis())
with my_model:
my_trace = pm.sample(5000,step=pm.HamiltonianMC())
Questions:
Is my intuition about the size and spread of the time variables reasonable?
Are there ways to sample square and cubed terms more effectively? I've searched, but can you perhaps point me to a resource on this so I can learn more about it?
What's up with Metropolis and the dimension mismatch error?
What's up with the empty trace plots for NUTS? Usually when it stalls, the trace up until the stall works.
Are there alternative ways to handle time that might be easier to sample?
I haven't posted a toy model, because it's hard to replicate without the data. I'll add a toy model once I replicate with simulated data. But the actual model is below:
with pm.Model() as my_model:
mu_b = pm.Flat('mu_b')
sig_b = pm.HalfCauchy('sig_b',beta=2.5)
b_raw = pm.Normal('b_raw',mu=0,sd=1,shape=n_groups)
b = pm.Deterministic('b',mu_b + sig_b*b_raw)
t1 = pm.Normal('t1',mu=0,sd=100**2,shape=1)
t2 = pm.Normal('t2',mu=0,sd=100**2,shape=1)
t3 = pm.Normal('t3',mu=0,sd=100**2,shape=1)
est =(b[data.group.values]* data.x.values) +\
(t1*data.t.values)+\
(t2*data.t2.values)+\
(t3*data.t3.values)
y = pm.Bernoulli('y', p=tt.nnet.sigmoid(est), observed = data.y)
BREAKTHROUGH 1: Metropolis error
Weird syntax issue. Theano seemed to be confused about a model with both constant and random effects. I created a constant in data equal to 0, data['c']=0 and used it as an index for the time, time^2 and time^3 effects, as follows:
est =(b[data.group.values]* data.x.values) +\
(t1[data.c.values]*data.t.values)+\
(t2[data.c.values]*data.t2.values)+\
(t3[data.c.values]*data.t3.values)
I don't think this is the whole issue, but it's a step in the right direction. I bet this is why my asymmetric specification didn't work, and if so, suspect it may sample better.
UPDATE: It sampled! Will now try some of the suggestions for making it easier on the sampler, including using the specification suggested here. But at least it's working!

Without having the dataset to play with it is hard to give a definite answer, but here is my best guess:
To me, it is a bit unexpected to hear about the third order polynomial in there. I haven't read the paper, so I can't really comment on it, but I think this might be the reason for your problems. Even very small values for t3 will have a huge influence on the predictor. To keep this reasonable, I'd try to change the parametrization a bit: First make sure that your predictor is centered (something like data['t'] = data['t'] - data['t'].mean() and after that define data.t2 and data.t3). Then try to set a more reasonable prior on t2 and t3. They should be pretty small, so maybe try something like
t1 = pm.Normal('t1',mu=0,sd=1,shape=1)
t2 = pm.Normal('t2',mu=0,sd=1,shape=1)
t2 = t2 / 100
t3 = pm.Normal('t3',mu=0,sd=1,shape=1)
t3 = t3 / 1000
If you want to look at other models, you could try to model your predictor as a GaussianRandomWalk or a Gaussian Process.
Updating pymc3 to the latest release candidate should also help, the sampler was improved quit a bit.
Update I just noticed you don't have an intercept term in your model. Unless there is a good reason for that you probably want to add
intercept = pm.Flat('intercept')
est = (intercept
+ b[..] * data.x
+ ...)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.