python divide by zero encountered in log - logistic regression

python divide by zero encountered in log - logistic regression - python

I'm trying to implement a multiclass logistic regression classifier that distinguishes between k different classes.
This is my code.
import numpy as np
from scipy.special import expit
def cost(X,y,theta,regTerm):
(m,n) = X.shape
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
return J
def gradient(X,y,theta,regTerm):
(m,n) = X.shape
grad = np.dot(((expit(np.dot(X,theta))).reshape(m,1) - y).T,X)/m + (np.concatenate(([0],theta[1:].T),axis=0)).reshape(1,n)
return np.asarray(grad)
def train(X,y,regTerm,learnRate,epsilon,k):
(m,n) = X.shape
theta = np.zeros((k,n))
for i in range(0,k):
previousCost = 0;
currentCost = cost(X,y,theta[i,:],regTerm)
while(np.abs(currentCost-previousCost) > epsilon):
print(theta[i,:])
theta[i,:] = theta[i,:] - learnRate*gradient(X,y,theta[i,:],regTerm)
print(theta[i,:])
previousCost = currentCost
currentCost = cost(X,y,theta[i,:],regTerm)
return theta
trX = np.load('trX.npy')
trY = np.load('trY.npy')
theta = train(trX,trY,2,0.1,0.1,4)
I can verify that cost and gradient are returning values that are in the right dimension (cost returns a scalar, and gradient returns a 1 by n row vector), but i get the error
RuntimeWarning: divide by zero encountered in log
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
why is this happening and how can i avoid this?

The proper solution here is to add some small epsilon to the argument of log function. What worked for me was
epsilon = 1e-5
def cost(X, y, theta):
m = X.shape[0]
yp = expit(X # theta)
cost = - np.average(y * np.log(yp + epsilon) + (1 - y) * np.log(1 - yp + epsilon))
return cost

You can clean up the formula by appropriately using broadcasting, the operator * for dot products of vectors, and the operator # for matrix multiplication — and breaking it up as suggested in the comments.
Here is your cost function:
def cost(X, y, theta, regTerm):
m = X.shape[0] # or y.shape, or even p.shape after the next line, number of training set
p = expit(X # theta)
log_loss = -np.average(y*np.log(p) + (1-y)*np.log(1-p))
J = log_loss + regTerm * np.linalg.norm(theta[1:]) / (2*m)
return J
You can clean up your gradient function along the same lines.
By the way, are you sure you want np.linalg.norm(theta[1:]). If you're trying to do L2-regularization, the term should be np.linalg.norm(theta[1:]) ** 2.

Cause:
This is happening because in some cases, whenever y[i] is equal to 1, the value of the Sigmoid function (theta) also becomes equal to 1.
Cost function:
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
Now, consider the following part in the above code snippet:
np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1)))
Here, you are performing (1 - theta) when the value of theta is 1. So, that will effectively become log (1 - 1) = log (0) which is undefined.

I'm guessing your data has negative values in it. You can't log a negative.
import numpy as np
np.log(2)
> 0.69314718055994529
np.log(-2)
> nan
There are a lot of different ways to transform your data that should help, if this is the case.

def cost(X, y, theta):
yp = expit(X # theta)
cost = - np.average(y * np.log(yp) + (1 - y) * np.log(1 - yp))
return cost
The warning originates from np.log(yp) when yp==0 and in np.log(1 - yp) when yp==1. One option is to filter out these values, and not to pass them into np.log. The other option is to add a small constant to prevent the value from being exactly 0 (as suggested in one of the comments above)

Add epsilon value[which is a miniature value] to the log value so that it won't be a problem at all.
But i am not sure if it will give accurate results or not .

Related

Minimizing this error function, using NumPy

Background
I've been working for some time on attempting to solve the (notoriously painful) Time Difference of Arrival (TDoA) multi-lateration problem, in 3-dimensions and using 4 nodes. If you're unfamiliar with the problem, it is to determine the coordinates of some signal source (X,Y,Z), given the coordinates of n nodes, the time of arrival of the signal at each node, and the velocity of the signal v.
My solution is as follows:
For each node, we write (X-x_i)**2 + (Y-y_i)**2 + (Z-z_i)**2 = (v(t_i - T)**2
Where (x_i, y_i, z_i) are the coordinates of the ith node, and T is the time of emission.
We have now 4 equations in 4 unknowns. Four nodes are obviously insufficient. We could try to solve this system directly, however that seems next to impossible given the highly nonlinear nature of the problem (and, indeed, I've tried many direct techniques... and failed). Instead, we simplify this to a linear problem by considering all i/j possibilities, subtracting equation i from equation j. We obtain (n(n-1))/2 =6 equations of the form:
2*(x_j - x_i)*X + 2*(y_j - y_i)*Y + 2*(z_j - z_i)*Z + 2 * v**2 * (t_i - t_j) = v**2 ( t_i**2 - t_j**2) + (x_j**2 + y_j**2 + z_j**2) - (x_i**2 + y_i**2 + z_i**2)
Which look like Xv_1 + Y_v2 + Z_v3 + T_v4 = b. We try now to apply standard linear least squares, where the solution is the matrix vector x in A^T Ax = A^T b. Unfortunately, if you were to try feeding this into any standard linear least squares algorithm, it'll choke up. So, what do we do now?
...
The time of arrival of the signal at node i is given (of course) by:
sqrt( (X-x_i)**2 + (Y-y_i)**2 + (Z-z_i)**2 ) / v
This equation implies that the time of arrival, T, is 0. If we have that T = 0, we can drop the T column in matrix A and the problem is greatly simplified. Indeed, NumPy's linalg.lstsq() gives a surprisingly accurate & precise result.
...
So, what I do is normalize the input times by subtracting from each equation the earliest time. All I have to do then is determine the dt that I can add to each time such that the residual of summed squared error for the point found by linear least squares is minimized.
I define the error for some dt to be the squared difference between the arrival time for the point predicted by feeding the input times + dt to the least squares algorithm, minus the input time (normalized), summed over all 4 nodes.
for node, time in nodes, times:
error += ( (sqrt( (X-x_i)**2 + (Y-y_i)**2 + (Z-z_i)**2 ) / v) - time) ** 2
My problem:
I was able to do this somewhat satisfactorily by using brute-force. I started at dt = 0, and moved by some step up to some maximum # of iterations OR until some minimum RSS error is reached, and that was the dt I added to the normalized times to obtain a solution. The resulting solutions were very accurate and precise, but quite slow.
In practice, I'd like to be able to solve this in real time, and therefore a far faster solution will be needed. I began with the assumption that the error function (that is, dt vs error as defined above) would be highly nonlinear-- offhand, this made sense to me.
Since I don't have an actual, mathematical function, I can automatically rule out methods that require differentiation (e.g. Newton-Raphson). The error function will always be positive, so I can rule out bisection, etc. Instead, I try a simple approximation search. Unfortunately, that failed miserably. I then tried Tabu search, followed by a genetic algorithm, and several others. They all failed horribly.
So, I decided to do some investigating. As it turns out the plot of the error function vs dt looks a bit like a square root, only shifted right depending upon the distance from the nodes that the signal source is:
Where dt is on horizontal axis, error on vertical axis
And, in hindsight, of course it does!. I defined the error function to involve square roots so, at least to me, this seems reasonable.
What to do?
So, my issue now is, how do I determine the dt corresponding to the minimum of the error function?
My first (very crude) attempt was to get some points on the error graph (as above), fit it using numpy.polyfit, then feed the results to numpy.root. That root corresponds to the dt. Unfortunately, this failed, too. I tried fitting with various degrees, and also with various points, up to a ridiculous number of points such that I may as well just use brute-force.
How can I determine the dt corresponding to the minimum of this error function?
Since we're dealing with high velocities (radio signals), it's important that the results be precise and accurate, as minor variances in dt can throw off the resulting point.
I'm sure that there's some infinitely simpler approach buried in what I'm doing here however, ignoring everything else, how do I find dt?
My requirements:
Speed is of utmost importance
I have access only to pure Python and NumPy in the environment where this will be run
EDIT:
Here's my code. Admittedly, a bit messy. Here, I'm using the polyfit technique. It will "simulate" a source for you, and compare results:
from numpy import poly1d, linspace, set_printoptions, array, linalg, triu_indices, roots, polyfit
from dataclasses import dataclass
from random import randrange
import math
#dataclass
class Vertexer:
receivers: list
# Defaults
c = 299792
# Receivers:
# [x_1, y_1, z_1]
# [x_2, y_2, z_2]
# [x_3, y_3, z_3]
# Solved:
# [x, y, z]
def error(self, dt, times):
solved = self.linear([time + dt for time in times])
error = 0
for time, receiver in zip(times, self.receivers):
error += ((math.sqrt( (solved[0] - receiver[0])**2 +
(solved[1] - receiver[1])**2 +
(solved[2] - receiver[2])**2 ) / c ) - time)**2
return error
def linear(self, times):
X = array(self.receivers)
t = array(times)
x, y, z = X.T
i, j = triu_indices(len(x), 1)
A = 2 * (X[i] - X[j])
b = self.c**2 * (t[j]**2 - t[i]**2) + (X[i]**2).sum(1) - (X[j]**2).sum(1)
solved, residuals, rank, s = linalg.lstsq(A, b, rcond=None)
return(solved)
def find(self, times):
# Normalize times
times = [time - min(times) for time in times]
# Fit the error function
y = []
x = []
dt = 1E-10
for i in range(50000):
x.append(self.error(dt * i, times))
y.append(dt * i)
p = polyfit(array(x), array(y), 2)
r = roots(p)
return(self.linear([time + r for time in times]))
# SIMPLE CODE FOR SIMULATING A SIGNAL
# Pick nodes to be at random locations
x_1 = randrange(10); y_1 = randrange(10); z_1 = randrange(10)
x_2 = randrange(10); y_2 = randrange(10); z_2 = randrange(10)
x_3 = randrange(10); y_3 = randrange(10); z_3 = randrange(10)
x_4 = randrange(10); y_4 = randrange(10); z_4 = randrange(10)
# Pick source to be at random location
x = randrange(1000); y = randrange(1000); z = randrange(1000)
# Set velocity
c = 299792 # km/ns
# Generate simulated source
t_1 = math.sqrt( (x - x_1)**2 + (y - y_1)**2 + (z - z_1)**2 ) / c
t_2 = math.sqrt( (x - x_2)**2 + (y - y_2)**2 + (z - z_2)**2 ) / c
t_3 = math.sqrt( (x - x_3)**2 + (y - y_3)**2 + (z - z_3)**2 ) / c
t_4 = math.sqrt( (x - x_4)**2 + (y - y_4)**2 + (z - z_4)**2 ) / c
print('Actual:', x, y, z)
myVertexer = Vertexer([[x_1, y_1, z_1],[x_2, y_2, z_2],[x_3, y_3, z_3],[x_4, y_4, z_4]])
solution = myVertexer.find([t_1, t_2, t_3, t_4])
print(solution)

It seems like the Bancroft method applies to this problem? Here's a pure NumPy implementation.
# Implementation of the Bancroft method, following
# https://gssc.esa.int/navipedia/index.php/Bancroft_Method
M = np.diag([1, 1, 1, -1])
def lorentz_inner(v, w):
return np.sum(v * (w # M), axis=-1)
B = np.array(
[
[x_1, y_1, z_1, c * t_1],
[x_2, y_2, z_2, c * t_2],
[x_3, y_3, z_3, c * t_3],
[x_4, y_4, z_4, c * t_4],
]
)
one = np.ones(4)
a = 0.5 * lorentz_inner(B, B)
B_inv_one = np.linalg.solve(B, one)
B_inv_a = np.linalg.solve(B, a)
for Lambda in np.roots(
[
lorentz_inner(B_inv_one, B_inv_one),
2 * (lorentz_inner(B_inv_one, B_inv_a) - 1),
lorentz_inner(B_inv_a, B_inv_a),
]
):
x, y, z, c_t = M # np.linalg.solve(B, Lambda * one + a)
print("Candidate:", x, y, z, c_t / c)

My answer might have mistakes (glaring) as I had not heard the TDOA term before this afternoon. Please double check if the method is right.
I could not find solution to your original problem of finding dt corresponding to the minimum error. My answer also deviates from the requirement that other than numpy no third party library had to be used (I used Sympy and largely used the code from here). However I am still posting this thinking that somebody someday might find it useful if all one is interested in ... is to find X,Y,Z of the source emitter. This method also does not take into account real-life situations where white noise or errors might be present or curvature of the earth and other complications.
Your initial test conditions are as below.
from random import randrange
import math
# SIMPLE CODE FOR SIMULATING A SIGNAL
# Pick nodes to be at random locations
x_1 = randrange(10); y_1 = randrange(10); z_1 = randrange(10)
x_2 = randrange(10); y_2 = randrange(10); z_2 = randrange(10)
x_3 = randrange(10); y_3 = randrange(10); z_3 = randrange(10)
x_4 = randrange(10); y_4 = randrange(10); z_4 = randrange(10)
# Pick source to be at random location
x = randrange(1000); y = randrange(1000); z = randrange(1000)
# Set velocity
c = 299792 # km/ns
# Generate simulated source
t_1 = math.sqrt( (x - x_1)**2 + (y - y_1)**2 + (z - z_1)**2 ) / c
t_2 = math.sqrt( (x - x_2)**2 + (y - y_2)**2 + (z - z_2)**2 ) / c
t_3 = math.sqrt( (x - x_3)**2 + (y - y_3)**2 + (z - z_3)**2 ) / c
t_4 = math.sqrt( (x - x_4)**2 + (y - y_4)**2 + (z - z_4)**2 ) / c
print('Actual:', x, y, z)
My solution is as below.
import sympy as sym
X,Y,Z = sym.symbols('X,Y,Z', real=True)
f = sym.Eq((x_1 - X)**2 +(y_1 - Y)**2 + (z_1 - Z)**2 , (c*t_1)**2)
g = sym.Eq((x_2 - X)**2 +(y_2 - Y)**2 + (z_2 - Z)**2 , (c*t_2)**2)
h = sym.Eq((x_3 - X)**2 +(y_3 - Y)**2 + (z_3 - Z)**2 , (c*t_3)**2)
i = sym.Eq((x_4 - X)**2 +(y_4 - Y)**2 + (z_4 - Z)**2 , (c*t_4)**2)
print("Solved coordinates are ", sym.solve([f,g,h,i],X,Y,Z))
print statement from your initial condition gave.
Actual: 111 553 110
and the solution that almost instantly came out was
Solved coordinates are [(111.000000000000, 553.000000000000, 110.000000000000)]
Sorry again if something is totally amiss.

Improve speed of gradient descent

I am trying to maximize a target function f(x) with function scipy.optimize.minimum. But it usually takes 4-5 hrs to run the code because the function f(x) involves a lot of computation of complex matrix. To improve its speed, I want to use gpu. And I've already tried tensorflow package. Since I use numpy to define f(x), I have to convert it into tensorflow's format. However, it doesn't support the computation of complex matrix. What else package or means I can use? Any suggestions?
To specific my problem, I will show calculate scheme below:
Calculate the expectation :
-where H=x*H_0, x is the parameter
Let \phi go through the dynamics of Schrödinger equation
-Different H is correspond to a different \phi_end. Thus, parameter x determines the expectation
Change x, calculate the corresponding expectation
Find a specific x that minimize the expectation
Here is a simple example of part of my code:
import numpy as np
import cmath
from scipy.linalg import expm
import scipy.optimize as opt
# create initial complex matrixes
N = 2 # Dimension of matrix
H = np.array([[1.0 + 1.0j] * N] * N) # a complex matrix with shape(N, N)
A = np.array([[0.0j] * N] * N)
A[0][0] = 1.0 + 1j
# calculate the expectation
def value(phi):
exp_H = expm(H) # put the matrix in the exp function
new_phi = np.linalg.linalg.matmul(exp_H, phi)
# calculate the expectation of the matrix
x = np.linalg.linalg.matmul(H, new_phi)
expectation = np.inner(np.conj(phi), x)
return expectation
# Contants
tmax = 1
dt = 0.1
nstep = int(tmax/dt)
phi_init = [1.0 + 1.0j] * N
# 1st derivative of Schrödinger equation
def dXdt(t, phi, H): # 1st derivative of the function
return -1j * np.linalg.linalg.matmul(H, phi)
def f(X):
phi = [[0j] * N] * nstep # store every time's phi
phi[0] = phi_init
# phi go through the dynamics of Schrödinger equation
for i in range(nstep - 1):
phi[i + 1] = phi[i] - dXdt(i * dt, X[i] * H, phi[i]) * dt
# calculate the corresponding value
f_result = value(phi[-1])
return f_result
# Initialize the parameter
X0 = np.array(np.ones(nstep))
results = opt.minimize(f, X0) # minimize the target function
opt_x = results.x
PS:
Python Version: 3.7
Operation System: Win 10

MLE Log-likelihood for logistic regression gives divide by zero error

I want to compute the log-likelihood of a logistic regression model.
def sigma(x):
return 1 / (1 + np.exp(-x))
def logll(y, X, w):
""""
Parameters
y : ndarray of shape (N,)
Binary labels (either 0 or 1).
X : ndarray of shape (N,D)
Design matrix.
w : ndarray of shape (D,)
Weight vector.
"""
p = sigma(X # w)
y_1 = y # np.log(p)
y_0 = (1 - y) # (1 - np.log(1 - p))
return y_1 + y_0
logll(y, Xz, np.linspace(-5,5,D))
Applying this function results in
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:16:
RuntimeWarning: divide by zero encountered in log
app.launch_new_instance()
I would expect y_0 to be a negative float. How can I avoid this error and is there a bug somewhere in the code?
Edit 1
X # w statistics:
Max: 550.775133944
Min: -141.972597608
Sigma(max): 1.0 => Throws error in y_0 in np.log(1 - 1.0)
Sigma(min): 2.19828642169e-62
Edit 2
I also have access to this logsigma function that computes sigma in log space:
def logsigma (x):
return np.vectorize(np.log)(sigma(x))
Unfortunately, I don't find a way to rewrite y_0 then. The following is my approach but obviously not correct.
def l(y, X, w):
y_1 = np.dot(y, logsigma(X # w))
y_0 = (1 - y) # (1 - np.log(1 - logsigma(X # w)))
return y_1 + y_0

First of all, I think you've made a mistake in your log-likelihood formula: it should be a plain sum of y_0 and y_1, not sum of exponentials:
Division by zero can be caused by large negative values (I mean large by abs value) in X # w, e.g. sigma(-800) is exactly 0.0 on my machine, so the log of it results in "RuntimeWarning: divide by zero encountered in log".
Make sure you initialize your network with small values near zero and you don't have exploding gradients after several iterations of backprop.
By the way, here's the code I use for cross-entropy loss, which works also in multi-class problems:
def softmax_loss(x, y):
"""
- x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
for the ith input.
- y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
0 <= y[i] < C
"""
probs = np.exp(x - np.max(x, axis=1, keepdims=True))
probs /= np.sum(probs, axis=1, keepdims=True)
N = x.shape[0]
return -np.sum(np.log(probs[np.arange(N), y])) / N
UPD: When nothing else helps, there is one more numerical trick (discussed in the comments): compute log(p+epsilon) and log(1-p+epsilon) with a small positive epsilon value. This ensures that log(0.0) never happens.

Multi variable gradient descent

I am learning gradient descent for calculating coefficients. Below is what I am doing:
#!/usr/bin/Python
import numpy as np
# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
xTrans = x.transpose()
for i in range(0, numIterations):
hypothesis = np.dot(x, theta)
loss = hypothesis - y
# avg cost per example (the 2 in 2*m doesn't really matter here.
# But to be consistent with the gradient, I include it)
cost = np.sum(loss ** 2) / (2 * m)
#print("Iteration %d | Cost: %f" % (i, cost))
# avg gradient per example
gradient = np.dot(xTrans, loss) / m
# update
theta = theta - alpha * gradient
return theta
X = np.array([41.9,43.4,43.9,44.5,47.3,47.5,47.9,50.2,52.8,53.2,56.7,57.0,63.5,65.3,71.1,77.0,77.8])
y = np.array([251.3,251.3,248.3,267.5,273.0,276.5,270.3,274.9,285.0,290.0,297.0,302.5,304.5,309.3,321.7,330.7,349.0])
n = np.max(X.shape)
x = np.vstack([np.ones(n), X]).T
m, n = np.shape(x)
numIterations= 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
Now my above code works fine. If I now try multiple variables and replace X with X1 like the following:
X1 = np.array([[41.9,43.4,43.9,44.5,47.3,47.5,47.9,50.2,52.8,53.2,56.7,57.0,63.5,65.3,71.1,77.0,77.8], [29.1,29.3,29.5,29.7,29.9,30.3,30.5,30.7,30.8,30.9,31.5,31.7,31.9,32.0,32.1,32.5,32.9]])
then my code fails and shows me the following error:
JustTestingSGD.py:14: RuntimeWarning: overflow encountered in square
cost = np.sum(loss ** 2) / (2 * m)
JustTestingSGD.py:19: RuntimeWarning: invalid value encountered in subtract
theta = theta - alpha * gradient
[ nan nan nan]
Can anybody tell me how can I do gradient descent using X1? My expected output using X1 is:
[-153.5 1.24 12.08]
I am open to other Python implementations also. I just want the coefficients (also called thetas) for X1 and y.

The problem is in your algorithm not converging. It diverges instead. The first error:
JustTestingSGD.py:14: RuntimeWarning: overflow encountered in square
cost = np.sum(loss ** 2) / (2 * m)
comes from the problem that at some point calculating the square of something is impossible, as the 64-bit floats cannot hold the number (i.e. it is > 10^309).
JustTestingSGD.py:19: RuntimeWarning: invalid value encountered in subtract
theta = theta - alpha * gradient
This is only a consequence of the error before. The numbers are not reasonable for calculations.
You can actually see the divergence by uncommenting your debug print line. The cost starts to grow, as there is no convergence.
If you try your function with X1 and a smaller value for alpha, it converges.

Euler method (explicit and implicit)

I'd like to implement Euler's method (the explicit and the implicit one)
(https://en.wikipedia.org/wiki/Euler_method) for the following model:
x(t)' = q(x_M -x(t))x(t)
x(0) = x_0
where q, x_M and x_0 are real numbers.
I know already the (theoretical) implementation of the method. But I couldn't figure out where I can insert / change the model.
Could anybody help?
EDIT: You were right. I didn't understand correctly the method. Now, after a few hours, I think that I really got it! With the explicit method, I'm pretty sure (nevertheless: could anybody please have a look at my code? )
With the implicit implementation, I'm not very sure if it's correct. Could please anyone have a look at the implementation of the implicit method and give me a feedback what's correct / not good?
def explizit_euler():
''' x(t)' = q(xM -x(t))x(t)
x(0) = x0'''
q = 2.
xM = 2
x0 = 0.5
T = 5
dt = 0.01
N = T / dt
x = x0
t = 0.
for i in range (0 , int(N)):
t = t + dt
x = x + dt * (q * (xM - x) * x)
print '%6.3f %6.3f' % (t, x)
def implizit_euler():
''' x(t)' = q(xM -x(t))x(t)
x(0) = x0'''
q = 2.
xM = 2
x0 = 0.5
T = 5
dt = 0.01
N = T / dt
x = x0
t = 0.
for i in range (0 , int(N)):
t = t + dt
x = (1.0 / (1.0 - q *(xM + x) * x))
print '%6.3f %6.3f' % (t, x)

Pre-emptive note: Although the general idea should be correct, I did all the algebra in place in the editor box so there might be mistakes there. Please, check it yourself before using for anything really important.
I'm not sure how you come to the "implicit" formula
x = (1.0 / (1.0 - q *(xM + x) * x))
but this is wrong and you can check it by comparing your "explicit" and "implicit" results: they should slightly diverge but with this formula they will diverge drastically.
To understand the implicit Euler method, you should first get the idea behind the explicit one. And the idea is really simple and is explained at the Derivation section in the wiki: since derivative y'(x) is a limit of (y(x+h) - y(x))/h, you can approximate y(x+h) as y(x) + h*y'(x) for small h, assuming our original differential equation is
y'(x) = F(x, y(x))
Note that the reason this is only an approximation rather than exact value is that even over small range [x, x+h] the derivative y'(x) changes slightly. It means that if you want to get a better approximation of y(x+h), you need a better approximation of "average" derivative y'(x) over the range [x, x+h]. Let's call that approximation just y'. One idea of such improvement is to find both y' and y(x+h) at the same time by saying that we want to find such y' and y(x+h) that y' would be actually y'(x+h) (i.e. the derivative at the end). This results in the following system of equations:
y'(x+h) = F(x+h, y(x+h))
y(x+h) = y(x) + h*y'(x+h)
which is equivalent to a single "implicit" equation:
y(x+h) - y(x) = h * F(x+h, y(x+h))
It is called "implicit" because here the target y(x+h) is also a part of F. And note that quite similar equation is mentioned in the Modifications and extensions section of the wiki article.
So now going to your case that equation becomes
x(t+dt) - x(t) = dt*q*(xM -x(t+dt))*x(t+dt)
or equivalently
dt*q*x(t+dt)^2 + (1 - dt*q*xM)*x(t+dt) - x(t) = 0
This is a quadratic equation with two solutions:
x(t+dt) = [(dt*q*xM - 1) ± sqrt((dt*q*xM - 1)^2 + 4*dt*q*x(t))]/(2*dt*q)
Obviously we want the solution that is "close" to the x(t) which is the + solution. So the code should be something like:
b = (q * xM * dt - 1)
x(t+h) = (b + (b ** 2 + 4 * q * x(t) * dt) ** 0.5) / 2 / q / dt
(editor note:) Applying the binomial complement, this formula has the numerically more stable form for small dt, where then b < 0,
x(t+h) = (2 * x(t)) / ((b ** 2 + 4 * q * x(t) * dt) ** 0.5 - b)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

python divide by zero encountered in log - logistic regression - python

The proper solution here is to add some small epsilon to the argument of log function. What worked for me was epsilon = 1e-5 def cost(X, y, theta): m = X.shape[0] yp = expit(X # theta) cost = - np.average(y * np.log(yp + epsilon) + (1 - y) * np.log(1 - yp + epsilon)) return cost

I'm guessing your data has negative values in it. You can't log a negative. import numpy as np np.log(2) > 0.69314718055994529 np.log(-2) > nan There are a lot of different ways to transform your data that should help, if this is the case.

Add epsilon value[which is a miniature value] to the log value so that it won't be a problem at all. But i am not sure if it will give accurate results or not .

Related

Minimizing this error function, using NumPy

Improve speed of gradient descent

MLE Log-likelihood for logistic regression gives divide by zero error

Multi variable gradient descent

Euler method (explicit and implicit)

Categories

Resources