Scipy fmin_bfgs Error: Divide-by-zero encountered: rhok assumed large

I'm getting the following error when using fmin_bfgs (in SciPy) to optimize an unregularized logistic cost function:
Divide-by-zero encountered: rhok assumed large
C:\Python27\lib\site-packages\scipy\optimize\optimize.py:828: RuntimeWarning: divide by zero encountered in double_scalars
  rhok = 1.0 / (numpy.dot(yk, sk))
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 0.693147
Iterations: 1
Function evaluations: 27
The algorithm (fmin_bfgs) stops after one iteration. What could I be doing wrong? Here's the Python code: https://gist.github.com/4223554
Here's the dataset: https://gist.github.com/4223566

Your objective and gradient functions have two bugs:
1. They use initial_theta instead of theta, so they return constant values. A constant function has no well-defined minimum, hence the optimization fails.
2. The gradient function assumes theta is a 2D array, whereas fmin_bfgs passes it as a 1D array.
Fix both and it works.
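For illustration, here is a minimal sketch of an objective and gradient that depend on theta and accept the 1D array that fmin_bfgs passes in. The synthetic data, the names cost and grad, and the small eps guard inside the logarithm are assumptions made for this example; they are not taken from the linked gist.

```python
import numpy as np
from scipy.optimize import fmin_bfgs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # theta is a 1D array of shape (n_features,)
    h = sigmoid(X @ theta)
    eps = 1e-12  # assumption: small guard so log never sees exactly 0
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def grad(theta, X, y):
    # Returns a 1D array with the same shape as theta
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Synthetic, non-separable data (hypothetical, for demonstration only)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_theta = np.array([0.5, -1.0, 2.0])
y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)

theta_opt = fmin_bfgs(cost, np.zeros(3), fprime=grad, args=(X, y), disp=False)
print(theta_opt, cost(theta_opt, X, y))
```

At theta = 0 the cost equals log(2), about 0.693147, which is the value reported in the question; a cost that stays there after optimization suggests the objective is not responding to theta.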

Folks,
Contrary to what 'pv' mentioned, the issue wasn't related to initial_theta. It had to do with the training data set. I've fixed the problem and here's the working code: https://github.com/dormantroot/machine-learning-experiment/blob/master/LogisticRegressionExamples/LogisticRegression.py

Related

Solving a nonlinear first-order differential equation and getting a break in the plot

I am trying to solve the elliptical differential equation using the fourth-order Runge-Kutta method in Python.
After execution, I get only a very small part of the plot that should be obtained, along with this warning:
"RuntimeWarning: invalid value encountered in double_scalars"
import numpy as np
import matplotlib.pyplot as plt
#Define constants
g=9.8
L=1.04
#Define the differential Function
def fun(y,x):
    return -(2*(g/L)*(np.cos(y)-np.cos(np.pi/6)))**(1/2)
#Define variable arrays
x=np.zeros(1000)
y=np.zeros(1000)
y[0]=np.pi/6
dx=0.5
#Runge-Kutta Method
for i in range(len(y)-1):
    k1=fun(x[i],y[i])
    k2=fun(x[i]+dx/2, y[i]+dx*k1/2)
    k3=fun(x[i]+dx/2, y[i]+dx*k2/2)
    k4=fun(x[i]+dx, y[i]+dx*k3)
    y[i+1]=y[i]+dx/6*(k1+2*k2+2*k3+k4)
    x[i+1]=x[i]+dx
#print(y)
#print(x)
plt.plot(x,y)
plt.xlabel('Time')
plt.ylabel('Theta')
plt.grid()
And the graph I obtain looks something like this (image not included).
My question is why am I getting the error message? Thanks for helping!

Several points lead to this behavior. First, you switched the order of the arguments in the ODE function, probably to make it compatible with odeint. Use the optional argument tfirst=True to avoid that and always have the independent variable first.
The actual source of the error is the term
(np.cos(y)-np.cos(np.pi/6)))**(1/2)
Remember that in your version y has the value x[i], so at some point the expression under the root becomes negative.
If you correct the first point, you will probably still encounter the second error, as the exact solution moves parabolically towards the fixed point, so that the stages of RK4 are likely to overshoot. One can fix that by providing a suitably safeguarded square root function,
def softroot(x): return x/max(1e-12,abs(x))**0.5
#Define the differential Function
def fun(x,y):
    # the constant factor sqrt(2*g/L) stays outside the safeguarded root
    return -(2*(g/L))**0.5*softroot(np.cos(y)-np.cos(np.pi/6))
#Define variable arrays
dx=0.01
x=np.arange(0,1,dx)
y=np.zeros(x.shape)
y[0]=np.pi/6
...
This results in a plot of a constant solution, as the solution already starts at the fixed point. Shifting the initial point a little down to y[0]=np.pi/6-1e-8 produces a jump to the fixed point below.
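Putting the pieces together, a self-contained sketch of the RK4 loop with the safeguarded root might look as follows. The helper name rk4 and the choice of 100 steps are assumptions for this example, and the constant factor is written as sqrt(2*g/L) to match the equation in the question.

```python
import numpy as np

g, L = 9.8, 1.04
theta0 = np.pi / 6

def softroot(x):
    # sign(x) * sqrt(|x|), with a floor that avoids dividing by zero at x == 0
    return x / max(1e-12, abs(x)) ** 0.5

def fun(x, y):
    # Independent variable first, dependent variable second
    return -np.sqrt(2 * g / L) * softroot(np.cos(y) - np.cos(theta0))

def rk4(f, x0, y0, dx, n):
    # Classical fourth-order Runge-Kutta with fixed step dx (hypothetical helper)
    xs = np.empty(n + 1)
    ys = np.empty(n + 1)
    xs[0], ys[0] = x0, y0
    for i in range(n):
        k1 = f(xs[i], ys[i])
        k2 = f(xs[i] + dx / 2, ys[i] + dx * k1 / 2)
        k3 = f(xs[i] + dx / 2, ys[i] + dx * k2 / 2)
        k4 = f(xs[i] + dx, ys[i] + dx * k3)
        ys[i + 1] = ys[i] + dx / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        xs[i + 1] = xs[i] + dx
    return xs, ys

# Starting exactly at the fixed point: the solution stays constant
xs_fixed, ys_fixed = rk4(fun, 0.0, theta0, 0.01, 100)

# Starting slightly below: the solution moves away, and no NaN appears
xs_shifted, ys_shifted = rk4(fun, 0.0, theta0 - 1e-8, 0.01, 100)
```

Because softroot returns a finite value for negative arguments as well, an RK4 stage that overshoots no longer produces the "invalid value" warning.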

How can I handle divergence failure manually when using optimize.newton in SciPy?

I'm using optimize.newton from SciPy to solve an equation. Depending on the initial guess, the iteration sometimes does not converge and the script crashes.
x = optimize.newton(fun,1/1000)
Would it be possible to print a message saying that convergence failed instead of the Python crash message, or to retry the optimization with different initial values?

From the documentation:
disp: bool, optional
If True, raise a RuntimeError if the algorithm didn’t converge, with the error message containing the number of iterations and current function value. Otherwise the convergence status is recorded in a RootResults return object. Ignored if x0 is not scalar. Note: this has little to do with displaying, however the disp keyword cannot be renamed for backwards compatibility.
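You should set disp to False, because it is enabled by default, and pass full_output=True so that the convergence status is returned as well:
root, result = optimize.newton(fun, 1/1000, full_output=True, disp=False)
The root is returned together with a RootResults object (result here), whose converged attribute tells you whether the iteration succeeded.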
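As a sketch of how the retry asked about in the question could be built on top of this, the helper below tries several initial guesses in turn. The function solve_with_retries, the example equation fun, the list of guesses, and maxiter=10 are all hypothetical choices for illustration.

```python
from scipy import optimize

def solve_with_retries(func, guesses, maxiter=50):
    """Try each initial guess in turn; return the first converged root, or None."""
    for x0 in guesses:
        root, result = optimize.newton(
            func, x0, maxiter=maxiter, full_output=True, disp=False
        )
        if result.converged:
            return root
        print(f"Convergence failed for x0={x0}; trying the next guess.")
    return None

def fun(x):
    # Hypothetical stand-in for the equation being solved
    return x**2 - 2.0

# The first guess is far from the root and runs out of iterations;
# the second one converges.
root = solve_with_retries(fun, [1000.0, 1.0], maxiter=10)
print(root)
```

Returning None when every guess fails leaves it to the caller to decide whether to print a message, widen the set of guesses, or stop.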

Python: scipy's optimize functions do not work/give dimension errors

I am implementing Andrew Ng's Machine Learning course in Python, but I got stuck because SciPy's optimize functions keep giving me a hard time by not working or giving me dimension errors.
The goal is to find the minimum of the cost function (a scalar function that takes theta (dimension (1,401)), X (dimension (5000,401)), and y (dimension (5000,1)) as inputs). I have defined this cost function and its gradient with respect to the parameters. When running one of the optimize functions (I have tried fmin_tnc, minimize, Nelder-Mead and others, none of which work), they either run for ages or keep giving me errors saying that the array dimension is wrong, or that they find a division by 0... errors that I am not able to spot.
The weirdest thing is that this problem first popped up when I was doing exercise 2 on logistic regression, and then magically disappeared without me changing anything. Now, implementing multi-class logistic regression, it has appeared again, and it won't go away even though I have literally copied and pasted the code of exercise 2!
The code is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat
import scipy.misc
import matplotlib.cm as cm
from scipy.optimize import minimize,fmin_tnc
import random
def sigmoid(z):
    return 1/(1+np.exp(-z))
def J(theta,X,y):
    theta_t=np.transpose(theta)
    prod=np.matmul(X,theta_t)
    sigm=sigmoid(prod)
    vec=y*np.log(sigm)+(1-y)*np.log(1-sigm)
    return -np.sum(vec)/len(y)
def grad(theta,X,y):
    theta_t=np.transpose(theta)
    prod=np.matmul(X,theta_t)
    sigm=sigmoid(prod)
    one=sigm-y
    return np.matmul(np.transpose(one),X)/len(y)
data=loadmat('/home/marco/Desktop/MLang/mlex3/ex3/ex3data1.mat')
X,y = data['X'],data['y']
X=np.column_stack((np.ones(len(X[:,0])),X))
initial_theta=np.zeros((1,len(X[0,:])))
res=fmin_tnc(func=J, x0=initial_theta.flatten(), args=(X,y.flatten()), fprime=grad)
theta_opt=res[0]
Instead of returning the value of theta that minimizes the function as theta_opt, it says:
/home/marco/anaconda3/lib/python3.6/site-packages ipykernel_launcher.py:8: RuntimeWarning: divide by zero encountered in log
I have no clue where this divide by zero occurs, given that there is literally no division in the whole code, except for the division by len(y), which is 5000, and the division in the sigmoid function (1/(1+exp(-z))), whose denominator can never be 0!
Any suggestions?
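One observation that may help: NumPy reports log(0) as "divide by zero encountered in log", so the warning can come from np.log itself rather than from an explicit division. In float64 the sigmoid saturates to exactly 1.0 for moderately large z, which makes 1-sigm exactly 0. The sketch below shows this and a cost written with np.logaddexp that avoids the problem; the names stable_cost and naive_cost and the tiny data set are assumptions for illustration, and the sketch assumes y contains only 0/1 labels.

```python
import numpy as np

# In float64 the sigmoid rounds to exactly 1.0 already at z = 40,
# so np.log(1 - sigm) would be np.log(0).
saturated = 1 / (1 + np.exp(-40.0))

def naive_cost(theta, X, y):
    sigm = 1 / (1 + np.exp(-X @ theta))
    return -np.mean(y * np.log(sigm) + (1 - y) * np.log(1 - sigm))

def stable_cost(theta, X, y):
    z = X @ theta
    # log(sigmoid(z)) = -logaddexp(0, -z) and log(1 - sigmoid(z)) = -logaddexp(0, z)
    return np.mean(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))

# Tiny hypothetical data set with 0/1 labels
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])

small_theta = np.array([0.1, 0.2])   # both versions agree here
large_theta = np.array([0.0, 400.0])  # only the stable version is safe here

print(naive_cost(small_theta, X, y), stable_cost(small_theta, X, y))
print(stable_cost(large_theta, X, y))
```

Both functions compute the same quantity for moderate theta; the logaddexp form simply never evaluates log at exactly 0.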

RuntimeWarning when using scipy.stats.beta.fit

If I run the following code in python
from scipy.stats import norm, beta
sample = beta.rvs(2,5,size=100)
beta_fit = beta.fit(sample)
I get the following error
/usr/lib/python3/dist-packages/scipy/stats/_continuous_distns.py:404: RuntimeWarning: invalid value encountered in sqrt
  sk = 2*(b-a)*sqrt(a + b + 1) / (a + b + 2) / sqrt(a*b)
and depending on the size of the sample, I sometimes also get this other error
/usr/lib/python3/dist-packages/scipy/optimize/minpack.py:161: RuntimeWarning: The iteration is not making good progress, as measured by the improvement from the last ten iterations.
  warnings.warn(msg, RuntimeWarning)
Does anyone know why this is happening and how to fix it?
Thanks!

In a comment you say that you want to keep the support fixed as [0, 1]. To do that with the fit() method, use the arguments floc=0 and fscale=1. Then only the shape parameters will be fit to the data.
from scipy.stats import beta
sample = beta.rvs(2, 5, size=100)
beta_fit = beta.fit(sample, floc=0, fscale=1)
This should also eliminate the warnings that you are seeing. Those warnings occur because when all four parameters are fit, the code uses a generic numerical optimization routine to find the parameters that maximize the likelihood, and something in that code is generating those warnings. (It might be a bug--the shape parameters are supposed to be positive, so neither of the calls to sqrt in the line that generates the warning should get a negative argument.) When you fix the location and scale, the fit() method solves a simpler numerical problem to find the maximum likelihood parameter estimates, so it avoids the code that generates the warnings.
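As a quick check of what the constrained fit returns, the sketch below unpacks the result. The sample size of 500 and random_state=0 are arbitrary choices made only so the example is reproducible.

```python
from scipy.stats import beta

# Reproducible sample from Beta(2, 5); size and seed are arbitrary
sample = beta.rvs(2, 5, size=500, random_state=0)

# With floc and fscale given, only the two shape parameters are estimated
a_hat, b_hat, loc, scale = beta.fit(sample, floc=0, fscale=1)
print(a_hat, b_hat, loc, scale)
```

The returned tuple still has four entries, but loc and scale come back as the fixed values 0 and 1.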

How does scipy.integrate.ode.integrate() work?

I have obviously read through the documentation, but I have not been able to find a more detailed description of what is happening under the covers. Specifically, there are a few behaviors that I am very confused about:
General setup
import numpy as np
from scipy.integrate import ode
#Constants in ODE
N = 30
K = 0.5
w = np.random.normal(np.pi, 0.1, N)
#Integration parameters
y0 = np.linspace(0, 2*np.pi, N, endpoint=False)
t0 = 0
#Set up the solver
solver = ode(lambda t,y: w + K/N*np.sum( np.sin( y - y.reshape(N,1) ), axis=1))
solver.set_integrator('vode', method='bdf')
solver.set_initial_value(y0, t0)
Problem 1: solver.integrate(t0) fails
Setting up the integrator and asking for the value at t0 the first time returns a successful integration. Repeating this returns the correct numbers, but the solver.successful() method returns False:
solver.integrate(t0)
>>> array([ 0. , 0.20943951, 0.41887902, ..., 5.65486678,
5.86430629, 6.0737458 ])
solver.successful()
>>> True
solver.integrate(t0)
>>> array([ 0. , 0.20943951, 0.41887902, ..., 5.65486678,
5.86430629, 6.0737458 ])
solver.successful()
>>> False
My question is, what is happening in the solver.integrate(t) method that causes it to succeed the first time, and fail subsequently, and what does it mean to have an “unsuccessful” integration? Furthermore, why does the integrator fail silently, and continue to produce useful-looking outputs until I ask it explicitly whether it was successful?
Related, is there a way to reset the failed integration, or do I need to re-instantiate the solver from scratch?
Problem 2: solver.integrate(t) immediately returns an answer for almost any value of t
Even though my initial value of y0 is given at t0=0, I can request the value at t=10000 and get the answer immediately. I would expect that the numerical integration over such a large time span should take at least a few seconds (e.g. in Matlab, asking to integrate over 10000 time steps would take several minutes).
For example, re-run the setup from above and execute:
solver.integrate(10000)
>>> array([ 2153.90803383, 2153.63023706, 2153.60964064, ..., 2160.00982959,
2159.90446056, 2159.82900895])
Is Python really that fast, or is this output total nonsense?

Problem 0
Don’t ignore error messages. Yes, ode’s error messages can be cryptic at times, but you still want to avoid them.
Problem 1
As you already integrated up to t0 with the first call of solver.integrate(t0), you are integrating for a time step of 0 with the second call. This throws the cryptic error:
DVODE-- ISTATE (=I1) .gt. 1 but DVODE not initialized
In above message, I1 = 2
/usr/lib/python3/dist-packages/scipy/integrate/_ode.py:869: UserWarning: vode: Illegal input detected. (See printed message.)
'Unexpected istate=%s' % istate))
Problem 2.1
There is a maximum number of (internal) steps that a solver is going to take in one call without throwing an error. This can be set with the nsteps argument of set_integrator. If you integrate a large time at once, nsteps will be exceeded even if nothing is wrong, and the following error message is thrown:
/usr/lib/python3/dist-packages/scipy/integrate/_ode.py:869: UserWarning: vode: Excess work done on this call. (Perhaps wrong MF.)
'Unexpected istate=%s' % istate))
The integrator then stops wherever this happens.
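One way to stay clear of both pitfalls is to raise nsteps and to advance in moderate chunks, checking solver.successful() after each call. The following is only a sketch: the value nsteps=10000, the chunk size of 1.0, the end time of 10.0, and the seeded random generator are assumptions made for this example.

```python
import numpy as np
from scipy.integrate import ode

N, K = 30, 0.5
rng = np.random.default_rng(0)  # seeded only for reproducibility
w = rng.normal(np.pi, 0.1, N)
y0 = np.linspace(0, 2 * np.pi, N, endpoint=False)

def rhs(t, y):
    # Same right-hand side as the lambda in the question
    return w + K / N * np.sum(np.sin(y - y.reshape(N, 1)), axis=1)

solver = ode(rhs)
solver.set_integrator('vode', method='bdf', nsteps=10000)
solver.set_initial_value(y0, 0.0)

dt, t_end = 1.0, 10.0
for t in np.arange(dt, t_end + dt / 2, dt):
    solver.integrate(t)
    if not solver.successful():
        print(f"Integration failed near t={solver.t}")
        break

print(solver.t, solver.successful())
```

Each call integrates over a nonzero interval, so the zero-length step from Problem 1 does not occur, and a failure is reported at the chunk where it happens rather than being discovered later.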
Problem 2.2
If you set nsteps=10**10, the integration runs without problems. It still is pretty fast though (roughly 1 s on my machine). The reason for this is as follows:
For a multi-dimensional system such as yours, there are two main runtime sinks when integrating:
1. Vector and matrix operations within the integrator. In scipy.ode, these are all realised with NumPy operations or ported Fortran or C code. Anyway, they are realised with compiled code without Python overhead and thus very efficient.
2. Evaluating the derivative (lambda t,y: w + K/N*np.sum( np.sin( y - y.reshape(N,1) ), axis=1) in your case). You realised this with NumPy operations, which again are realised with compiled code and very efficient. You may improve this a little bit with a purely compiled function, but that will grant you at most a small factor. If you used Python lists and loops instead, it would be horribly slow.
Therefore, for your problem, everything relevant is handled with compiled code under the hood and the integration is handled with an efficiency comparable to that of, e.g., a pure C program. I do not know how the two above aspects are handled in Matlab, but if either of the above challenges is handled with interpreted instead of compiled loops, this would explain the runtime discrepancy you observe.

To the second question, yes, the output might be nonsense. Local errors, be they from discretization or floating point operations, accumulate with a compounding factor which is about the Lipschitz constant of the ODE function. In a first estimate, the Lipschitz constant here is K=0.5. The magnification rate of early errors, that is, their coefficient as part of the global error, can thus be as large as exp(0.5*10000), which is a huge number.
On the other hand it is not surprising that the integration is fast. Most of the provided methods use step size adaptation, and with the standard error tolerances this might result in only some tens of internal steps. Reducing the error tolerances will increase the number of internal steps and may change the numerical result drastically.
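The effect of the tolerances on the number of internal steps can be observed directly. The sketch below uses scipy.integrate.solve_ivp rather than the ode class from the question, because it exposes the accepted step times; the time span of 100, the two tolerance settings, and the seeded random generator are assumptions chosen for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

N, K = 30, 0.5
rng = np.random.default_rng(0)  # seeded only for reproducibility
w = rng.normal(np.pi, 0.1, N)
y0 = np.linspace(0, 2 * np.pi, N, endpoint=False)

def rhs(t, y):
    return w + K / N * np.sum(np.sin(y - y.reshape(N, 1)), axis=1)

# Same problem, loose versus tight error tolerances
loose = solve_ivp(rhs, (0, 100), y0, rtol=1e-3, atol=1e-6)
tight = solve_ivp(rhs, (0, 100), y0, rtol=1e-10, atol=1e-12)

print("steps (loose, tight):", len(loose.t), len(tight.t))
print("max difference at t=100:", np.max(np.abs(loose.y[:, -1] - tight.y[:, -1])))
```

Comparing the two end states gives a rough feel for how much of the loose-tolerance result can be trusted over the chosen time span.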
