Implementing Linear Regression using Gradient Descent

Implementing Linear Regression using Gradient Descent - python

I have just started in machine learning and currently taking the course by andrew Ng's Machine learning Course. I have implemented the linear regression algorithm in python but the result is not desirable. I code of python is as follows:
import numpy as np
x = [[1,1,1,1,1,1,1,1,1,1],[10,20,30,40,50,60,70,80,90,100]]
y = [10,16,20,23,29,30,35,40,45,50]
x = np.array(x)
y = np.array(y)
theta = np.zeros((2,1))
def Cost(x,y,theta):
m = len(y)
pred_ions = np.transpose(theta).dot(x)
J = 1/(2*m) * np.sum((pred_ions - y)*(pred_ions - y))
return J
def GradientDescent(x,y,theta,iteration,alpha):
m = len(y)
pred_ions = np.transpose(theta).dot(x)
i = 1
while i <= iteration:
theta[0] = theta[0] - alpha/m * np.sum(pred_ions - y)
theta[1] = theta[1] - alpha/m * np.sum((pred_ions - y)*x[1,:])
Cost_History = Cost(x,y,theta)
i = i + 1
return theta[0],theta[1]
itera = 1000
alpha = 0.01
a,b = GradientDescent(x,y,theta,itera, alpha)
print(a)
print(b)
I am not able to figure out what exactly is the problem. But, my results are something very strange. The value of parameter is, according to above code, are 298 and 19890. Any Help would be appreciated. Thanks.

Ah. I did this assignment too a while ago.
See this mentioned in Page 7 of the assignment PDF:
Octave/MATLAB array indices start from one, not zero. If you’re
storing θ0 and θ1 in a vector called theta, the values will be
theta(1) and theta(2).
So, in your while loop, change the theta[0] and theta[1] to theta[1] and theta[2]. It should work right.
Also, if you are storing the Cost in Cost_History, shouldn't it include the iteration variable like
Cost_History[i] = Cost(x,y,theta)
Just check that too! Hope this helped.
Edit 1: Okay, I have understood the issue now. In his video, Andrew Ng says that you need to update both the thetas simultaneously. To do that, store the theta matrix in a temp variable. And update theta[0] and theta[1] based on the temp values.
Currently in your code, during theta[1] = it has changed the theta[0] to the newer value already, so both are not being updated simultaneously.
So instead, do this:
while i <= iteration:
temp = theta
theta[0] = theta[0] - alpha/m * np.sum(np.transpose(temp).dot(x) - y)
theta[1] = theta[1] - alpha/m * np.sum((np.transpose(temp).dot(x) - y)*x[1,:])
Cost_History[i] = Cost(x,y,theta)
i = i + 1
It should work now, if not, let me know, I will debug on my side.

Related

Evaluating a function with a well-defined value at x,y=0

I am trying to write a program that uses an array in further calculations. I initialize a grid of equally spaced points with NumPy and assign a value at each point as per the code snippet provided below. The function I am trying to describe with this array gives me a division by 0 error at x=y and it generally blows up around it. I know that the real part of said function is bounded by band_D/(2*math.pi)
at x=y and I tried manually assigning this value on the diagonal, but it seems that points around it are still ill-behaved and so I am not getting any right values. Is there a way to remedy this? This is how the function looks like with matplotlib
gamma=5
band_D=100
Dt=1e-3
x = np.arange(0,1/gamma,Dt)
y = np.arange(0,1/gamma,Dt)
xx,yy= np.meshgrid(x,y)
N=x.shape[0]
di = np.diag_indices(N)
time_fourier=(1j/2*math.pi)*(1-np.exp(1j*band_D*(xx-yy)))/(xx-yy)
time_fourier[di]=band_D/(2*math.pi)

You have a classic 0 / 0 problem. It's not really Numpy's job to figure out to apply De L'Hospital and solve this for you... I see, as other have commented, that you had the right idea with trying to set the limit value at the diagonal (where x approx y), but by the time you'd hit that line, the warning had already been emitted (just a warning, BTW, not an exception).
For a quick fix (but a bit of a fudge), in this case, you can try to add a small value to the difference:
xy = xx - yy + 1e-100
num = (1j / 2*np.pi) * (1 - np.exp(1j * band_D * xy))
time_fourier = num / xy
This also reveals that there is something wrong with your limit calculation... (time_fourier[0,0] approx 157.0796..., not 15.91549...).
and not band_D / (2*math.pi).
For a correct calculation:
def f(xy):
mask = xy != 0
limit = band_D * np.pi/2
return np.where(mask, np.divide((1j/2 * np.pi) * (1 - np.exp(1j * band_D * xy)), xy, where=mask), limit)
time_fourier = f(xx - yy)

You are dividing by x-y, that will definitely throw an error when x = y. The function being well behaved here means that the Taylor series doesn't diverge. But python doesn't know or care about that, it just calculates one step at a time until it reaches division by 0.
You had the right idea by defining a different function when x = y (ie, the mathematically true answer) but your way of applying it doesn't work because the correction is AFTER the division by 0, so it never gets read. This, however, should work
def make_time_fourier(x, y):
if np.isclose(x, y):
return band_D/(2*math.pi)
else:
return (1j/2*math.pi)*(1-np.exp(1j*band_D*(x-y)))/(x-y)
time_fourier = np.vectorize(make_time_fourier)(xx, yy)
print(time_fourier)

You can use np.divide with where option.
import math
gamma=5
band_D=100
Dt=1e-3
x = np.arange(0,1/gamma,Dt)
y = np.arange(0,1/gamma,Dt)
xx,yy = np.meshgrid(x,y)
N = x.shape[0]
di = np.diag_indices(N)
time_fourier = (1j / 2 * np.pi) * (1 - np.exp(1j * band_D * (xx - yy)))
time_fourier = np.divide(time_fourier,
(xx - yy),
where=(xx - yy) != 0)
time_fourier[di] = band_D / (2 * np.pi)

You can reformulate your function so that the division is inside the (numpy) sinc function, which handles it correctly.
To save typing I'll use D for band_D and use a variable
z = D*(xx-yy)/2
Then
T = (1j/2*pi)*(1-np.exp(1j*band_D*(xx-yy)))/(xx-yy)
= (2/D)*(1j/2*pi)*( 1 - cos( 2*z) - 1j*sin( 2*z))/z
= (1j/D*pi)* (2*sin(z)*sin(z) - 2j*sin(z)*cos(z))/z
= (2j/D*pi) * sin(z)/z * (sin(z) - 1j*cos(z))
= (2j/D*pi) * sinc( z/pi) * (sin(z) - 1j*cos(z))
numpy defines
sinc(x) to be sin(pi*x)/(pi*x)
I can't run python do you should chrck my calculations
The steps are
Substitute the definition of z and expand the complex exp
Apply the double angle formulae for sin and cos
Factor out sin(z)
Substitute the definition of sinc

Minimizing this error function, using NumPy

Background
I've been working for some time on attempting to solve the (notoriously painful) Time Difference of Arrival (TDoA) multi-lateration problem, in 3-dimensions and using 4 nodes. If you're unfamiliar with the problem, it is to determine the coordinates of some signal source (X,Y,Z), given the coordinates of n nodes, the time of arrival of the signal at each node, and the velocity of the signal v.
My solution is as follows:
For each node, we write (X-x_i)**2 + (Y-y_i)**2 + (Z-z_i)**2 = (v(t_i - T)**2
Where (x_i, y_i, z_i) are the coordinates of the ith node, and T is the time of emission.
We have now 4 equations in 4 unknowns. Four nodes are obviously insufficient. We could try to solve this system directly, however that seems next to impossible given the highly nonlinear nature of the problem (and, indeed, I've tried many direct techniques... and failed). Instead, we simplify this to a linear problem by considering all i/j possibilities, subtracting equation i from equation j. We obtain (n(n-1))/2 =6 equations of the form:
2*(x_j - x_i)*X + 2*(y_j - y_i)*Y + 2*(z_j - z_i)*Z + 2 * v**2 * (t_i - t_j) = v**2 ( t_i**2 - t_j**2) + (x_j**2 + y_j**2 + z_j**2) - (x_i**2 + y_i**2 + z_i**2)
Which look like Xv_1 + Y_v2 + Z_v3 + T_v4 = b. We try now to apply standard linear least squares, where the solution is the matrix vector x in A^T Ax = A^T b. Unfortunately, if you were to try feeding this into any standard linear least squares algorithm, it'll choke up. So, what do we do now?
...
The time of arrival of the signal at node i is given (of course) by:
sqrt( (X-x_i)**2 + (Y-y_i)**2 + (Z-z_i)**2 ) / v
This equation implies that the time of arrival, T, is 0. If we have that T = 0, we can drop the T column in matrix A and the problem is greatly simplified. Indeed, NumPy's linalg.lstsq() gives a surprisingly accurate & precise result.
...
So, what I do is normalize the input times by subtracting from each equation the earliest time. All I have to do then is determine the dt that I can add to each time such that the residual of summed squared error for the point found by linear least squares is minimized.
I define the error for some dt to be the squared difference between the arrival time for the point predicted by feeding the input times + dt to the least squares algorithm, minus the input time (normalized), summed over all 4 nodes.
for node, time in nodes, times:
error += ( (sqrt( (X-x_i)**2 + (Y-y_i)**2 + (Z-z_i)**2 ) / v) - time) ** 2
My problem:
I was able to do this somewhat satisfactorily by using brute-force. I started at dt = 0, and moved by some step up to some maximum # of iterations OR until some minimum RSS error is reached, and that was the dt I added to the normalized times to obtain a solution. The resulting solutions were very accurate and precise, but quite slow.
In practice, I'd like to be able to solve this in real time, and therefore a far faster solution will be needed. I began with the assumption that the error function (that is, dt vs error as defined above) would be highly nonlinear-- offhand, this made sense to me.
Since I don't have an actual, mathematical function, I can automatically rule out methods that require differentiation (e.g. Newton-Raphson). The error function will always be positive, so I can rule out bisection, etc. Instead, I try a simple approximation search. Unfortunately, that failed miserably. I then tried Tabu search, followed by a genetic algorithm, and several others. They all failed horribly.
So, I decided to do some investigating. As it turns out the plot of the error function vs dt looks a bit like a square root, only shifted right depending upon the distance from the nodes that the signal source is:
Where dt is on horizontal axis, error on vertical axis
And, in hindsight, of course it does!. I defined the error function to involve square roots so, at least to me, this seems reasonable.
What to do?
So, my issue now is, how do I determine the dt corresponding to the minimum of the error function?
My first (very crude) attempt was to get some points on the error graph (as above), fit it using numpy.polyfit, then feed the results to numpy.root. That root corresponds to the dt. Unfortunately, this failed, too. I tried fitting with various degrees, and also with various points, up to a ridiculous number of points such that I may as well just use brute-force.
How can I determine the dt corresponding to the minimum of this error function?
Since we're dealing with high velocities (radio signals), it's important that the results be precise and accurate, as minor variances in dt can throw off the resulting point.
I'm sure that there's some infinitely simpler approach buried in what I'm doing here however, ignoring everything else, how do I find dt?
My requirements:
Speed is of utmost importance
I have access only to pure Python and NumPy in the environment where this will be run
EDIT:
Here's my code. Admittedly, a bit messy. Here, I'm using the polyfit technique. It will "simulate" a source for you, and compare results:
from numpy import poly1d, linspace, set_printoptions, array, linalg, triu_indices, roots, polyfit
from dataclasses import dataclass
from random import randrange
import math
#dataclass
class Vertexer:
receivers: list
# Defaults
c = 299792
# Receivers:
# [x_1, y_1, z_1]
# [x_2, y_2, z_2]
# [x_3, y_3, z_3]
# Solved:
# [x, y, z]
def error(self, dt, times):
solved = self.linear([time + dt for time in times])
error = 0
for time, receiver in zip(times, self.receivers):
error += ((math.sqrt( (solved[0] - receiver[0])**2 +
(solved[1] - receiver[1])**2 +
(solved[2] - receiver[2])**2 ) / c ) - time)**2
return error
def linear(self, times):
X = array(self.receivers)
t = array(times)
x, y, z = X.T
i, j = triu_indices(len(x), 1)
A = 2 * (X[i] - X[j])
b = self.c**2 * (t[j]**2 - t[i]**2) + (X[i]**2).sum(1) - (X[j]**2).sum(1)
solved, residuals, rank, s = linalg.lstsq(A, b, rcond=None)
return(solved)
def find(self, times):
# Normalize times
times = [time - min(times) for time in times]
# Fit the error function
y = []
x = []
dt = 1E-10
for i in range(50000):
x.append(self.error(dt * i, times))
y.append(dt * i)
p = polyfit(array(x), array(y), 2)
r = roots(p)
return(self.linear([time + r for time in times]))
# SIMPLE CODE FOR SIMULATING A SIGNAL
# Pick nodes to be at random locations
x_1 = randrange(10); y_1 = randrange(10); z_1 = randrange(10)
x_2 = randrange(10); y_2 = randrange(10); z_2 = randrange(10)
x_3 = randrange(10); y_3 = randrange(10); z_3 = randrange(10)
x_4 = randrange(10); y_4 = randrange(10); z_4 = randrange(10)
# Pick source to be at random location
x = randrange(1000); y = randrange(1000); z = randrange(1000)
# Set velocity
c = 299792 # km/ns
# Generate simulated source
t_1 = math.sqrt( (x - x_1)**2 + (y - y_1)**2 + (z - z_1)**2 ) / c
t_2 = math.sqrt( (x - x_2)**2 + (y - y_2)**2 + (z - z_2)**2 ) / c
t_3 = math.sqrt( (x - x_3)**2 + (y - y_3)**2 + (z - z_3)**2 ) / c
t_4 = math.sqrt( (x - x_4)**2 + (y - y_4)**2 + (z - z_4)**2 ) / c
print('Actual:', x, y, z)
myVertexer = Vertexer([[x_1, y_1, z_1],[x_2, y_2, z_2],[x_3, y_3, z_3],[x_4, y_4, z_4]])
solution = myVertexer.find([t_1, t_2, t_3, t_4])
print(solution)

It seems like the Bancroft method applies to this problem? Here's a pure NumPy implementation.
# Implementation of the Bancroft method, following
# https://gssc.esa.int/navipedia/index.php/Bancroft_Method
M = np.diag([1, 1, 1, -1])
def lorentz_inner(v, w):
return np.sum(v * (w # M), axis=-1)
B = np.array(
[
[x_1, y_1, z_1, c * t_1],
[x_2, y_2, z_2, c * t_2],
[x_3, y_3, z_3, c * t_3],
[x_4, y_4, z_4, c * t_4],
]
)
one = np.ones(4)
a = 0.5 * lorentz_inner(B, B)
B_inv_one = np.linalg.solve(B, one)
B_inv_a = np.linalg.solve(B, a)
for Lambda in np.roots(
[
lorentz_inner(B_inv_one, B_inv_one),
2 * (lorentz_inner(B_inv_one, B_inv_a) - 1),
lorentz_inner(B_inv_a, B_inv_a),
]
):
x, y, z, c_t = M # np.linalg.solve(B, Lambda * one + a)
print("Candidate:", x, y, z, c_t / c)

My answer might have mistakes (glaring) as I had not heard the TDOA term before this afternoon. Please double check if the method is right.
I could not find solution to your original problem of finding dt corresponding to the minimum error. My answer also deviates from the requirement that other than numpy no third party library had to be used (I used Sympy and largely used the code from here). However I am still posting this thinking that somebody someday might find it useful if all one is interested in ... is to find X,Y,Z of the source emitter. This method also does not take into account real-life situations where white noise or errors might be present or curvature of the earth and other complications.
Your initial test conditions are as below.
from random import randrange
import math
# SIMPLE CODE FOR SIMULATING A SIGNAL
# Pick nodes to be at random locations
x_1 = randrange(10); y_1 = randrange(10); z_1 = randrange(10)
x_2 = randrange(10); y_2 = randrange(10); z_2 = randrange(10)
x_3 = randrange(10); y_3 = randrange(10); z_3 = randrange(10)
x_4 = randrange(10); y_4 = randrange(10); z_4 = randrange(10)
# Pick source to be at random location
x = randrange(1000); y = randrange(1000); z = randrange(1000)
# Set velocity
c = 299792 # km/ns
# Generate simulated source
t_1 = math.sqrt( (x - x_1)**2 + (y - y_1)**2 + (z - z_1)**2 ) / c
t_2 = math.sqrt( (x - x_2)**2 + (y - y_2)**2 + (z - z_2)**2 ) / c
t_3 = math.sqrt( (x - x_3)**2 + (y - y_3)**2 + (z - z_3)**2 ) / c
t_4 = math.sqrt( (x - x_4)**2 + (y - y_4)**2 + (z - z_4)**2 ) / c
print('Actual:', x, y, z)
My solution is as below.
import sympy as sym
X,Y,Z = sym.symbols('X,Y,Z', real=True)
f = sym.Eq((x_1 - X)**2 +(y_1 - Y)**2 + (z_1 - Z)**2 , (c*t_1)**2)
g = sym.Eq((x_2 - X)**2 +(y_2 - Y)**2 + (z_2 - Z)**2 , (c*t_2)**2)
h = sym.Eq((x_3 - X)**2 +(y_3 - Y)**2 + (z_3 - Z)**2 , (c*t_3)**2)
i = sym.Eq((x_4 - X)**2 +(y_4 - Y)**2 + (z_4 - Z)**2 , (c*t_4)**2)
print("Solved coordinates are ", sym.solve([f,g,h,i],X,Y,Z))
print statement from your initial condition gave.
Actual: 111 553 110
and the solution that almost instantly came out was
Solved coordinates are [(111.000000000000, 553.000000000000, 110.000000000000)]
Sorry again if something is totally amiss.

Implementing stochastic gradient descent

I am trying to implement a basic way of the stochastic gradient desecent with multi linear regression and the L2 Norm as loss function.
The result can be seen in this picture:
Its pretty far of the ideal regression line, but I dont really understand why thats the case. I double checked all array dimensions and they all seem to fit.
Below is my source code. If anyone can see my error or give me a hint I would appreciate that.
def SGD(x,y,learning_rate):
theta = np.array([[0],[0]])
for i in range(N):
xi = x[i].reshape(1,-1)
y_pre = xi#theta
theta = theta + learning_rate*(y[i]-y_pre[0][0])*xi.T
print(theta)
return theta
N = 100
x = np.array(np.linspace(-2,2,N))
y = 4*x + 5 + np.random.uniform(-1,1,N)
X = np.array([x**0,x**1]).T
plt.scatter(x,y,s=6)
th = SGD(X,y,0.1)
y_reg = np.matmul(X,th)
print(y_reg)
print(x)
plt.plot(x,y_reg)
plt.show()
Edit: Another solution was to shuffle the measurements with x = np.random.permutation(x)

to illustrate my comment,
def SGD(x,y,n,learning_rate):
theta = np.array([[0],[0]])
# currently it does exactly one iteration. do more
for _ in range(n):
for i in range(len(x)):
xi = x[i].reshape(1,-1)
y_pre = xi#theta
theta = theta + learning_rate*(y[i]-y_pre[0][0])*xi.T
print(theta)
return theta
SGD(X,y,10,0.01) yields the correct result

Coupled map lattice in Python

I attempt to plot bifurcation diagram for following one-dimensional spatially extended system with boundary conditions
x[i,n+1] = (1-eps)*(r*x[i,n]*(1-x[i,n])) + 0.5*eps*( r*x[i-1,n]*(1-x[i-1,n]) + r*x[i+1,n]*(1-x[i+1,n])) + p
I am facing problem in getting desired output figure may be because of number of transients I am using. Can someone help me out by cross-checking my code, what values of nTransients should I choose or how many transients should I ignore ?
My Python code is as follows:
import numpy as np
from numpy import *
from pylab import *
L = 60 # no. of lattice sites
eps = 0.6 # diffusive coupling strength
r = 4.0 # control parameter r
np.random.seed(1010)
ic = np.random.uniform(0.1, 0.9, L) # random initial condition betn. (0,1)
nTransients = 900 # The iterates we'll throw away
nIterates = 1000 # This sets how much the attractor is filled in
nSteps = 400 # This sets how dense the bifurcation diagram will be
pLow = -0.4
pHigh = 0.0
pInc = (pHigh-pLow)/float(nSteps)
def LM(p, x):
x_new = []
for i in range(L):
if i==0:
x_new.append((1-eps)*(r*x[i]*(1-x[i])) + 0.5*eps*(r*x[L-1]*(1-x[L-1]) + r*x[i+1]*(1-x[i+1])) + p)
elif i==L-1:
x_new.append((1-eps)*(r*x[i]*(1-x[i])) + 0.5*eps*(r*x[i-1]*(1-x[i-1]) + r*x[0]*(1-x[0])) + p)
elif i>0 and i<L-1:
x_new.append((1-eps)*(r*x[i]*(1-x[i])) + 0.5*eps*(r*x[i-1]*(1-x[i-1]) + r*x[i+1]*(1-x[i+1])) + p)
return x_new
for p in arange(pLow, pHigh, pInc):
# set initial conditions
state = ic
# throw away the transient iterations
for i in range(nTransients):
state = LM(p, state)
# now stote the next batch of iterates
psweep = [] # store p values
x = [] # store iterates
for i in range(nIterates):
state = LM(p, state)
psweep.append(p)
x.append(state[L/2-1])
plot(psweep, x, 'k,') # Plot the list of (r,x) pairs as pixels
xlabel('Pinning Strength p')
ylabel('X(L/2)')
# Display plot in window
show()
Can someone also tell me figure displayed by pylab in the end has either dots or lines as a marker, if it is lines then how to get plot with dots.
This is my output image for reference, after using pixels:

It still isn't clear exactly what your desired output is, but I'm guessing you're aiming for something that looks like this image from Wikipedia:
Going with that assumption, I gave it my best shot, but I'm guessing your equations (with the boundary conditions and so on) give you something that simply doesn't look quite that pretty. Here's my result:
This plot by itself may not look like the best thing ever, however, if you zoom in, you can really see some beautiful detail (this is right from the center of the plot, where the two arms of the bifurcation come down, meet, and then branch away again):
Note that I have used horizontal lines, with alpha=0.1 (originally you were using solid, vertical lines, which was why the result didn't look good).
The code!
I essentially modified your program a little to make it vectorized: I removed the for loop over p, which made the whole thing run almost instantaneously. This enabled me to use a much denser sampling for p, and allowed me to plot horizontal lines.
from __future__ import print_function, division
import numpy as np
import matplotlib.pyplot as plt
L = 60 # no. of lattice sites
eps = 0.6 # diffusive coupling strength
r = 4.0 # control parameter r
np.random.seed(1010)
ic = np.random.uniform(0.1, 0.9, L) # random initial condition betn. (0,1)
nTransients = 100 # The iterates we'll throw away
nIterates = 100 # This sets how much the attractor is filled in
nSteps = 4000 # This sets how dense the bifurcation diagram will be
pLow = -0.4
pHigh = 0.0
pInc = (pHigh - pLow) / nSteps
def LM(p, x):
x_new = np.empty(x.shape)
for i in range(L):
if i == 0:
x_new[i] = ((1 - eps) * (r * x[i] * (1 - x[i])) + 0.5 * eps * (r * x[L - 1] * (1 - x[L - 1]) + r * x[i + 1] * (1 - x[i + 1])) + p)
elif i == L - 1:
x_new[i] = ((1 - eps) * (r * x[i] * (1 - x[i])) + 0.5 * eps * (r * x[i - 1] * (1 - x[i - 1]) + r * x[0] * (1 - x[0])) + p)
elif i > 0 and i < L - 1:
x_new[i] = ((1 - eps) * (r * x[i] * (1 - x[i])) + 0.5 * eps * (r * x[i - 1] * (1 - x[i - 1]) + r * x[i + 1] * (1 - x[i + 1])) + p)
return x_new
p = np.arange(pLow, pHigh, pInc)
state = np.tile(ic[:, np.newaxis], (1, p.size))
# set initial conditions
# throw away the transient iterations
for i in range(nTransients):
state = LM(p, state)
# now store the next batch of iterates
x = np.empty((p.size, nIterates)) # store iterates
for i in range(nIterates):
state = LM(p, state)
x[:, i] = state[L // 2 - 1]
# Plot the list of (r,x) pairs as pixels
plt.plot(p, x, c=(0, 0, 0, 0.1))
plt.xlabel('Pinning Strength p')
plt.ylabel('X(L/2)')
# Display plot in window
plt.show()
I don't want to try explaining the whole program to you: I've used a few standard numpy tricks, including broadcasting, but otherwise, I have not modified much. I've not modified your LM function at all.
Please don't hesitate to ask me in the comments if you have any questions! I'm happy to explain specifics that you want explained.
A note on transients and iterates: Hopefully, now that the program runs much faster, you can try playing with these elements yourself. To me, the number of transients seemed to decide for how long the plot remained "deterministic-looking". The number of iterates just increases the density of plot lines, so increasing this beyond a point didn't seem to make sense to me.
I tried increasing the number of transients all the way up till 10,000. Here's my result from that experiment, for your reference:

python divide by zero encountered in log - logistic regression

I'm trying to implement a multiclass logistic regression classifier that distinguishes between k different classes.
This is my code.
import numpy as np
from scipy.special import expit
def cost(X,y,theta,regTerm):
(m,n) = X.shape
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
return J
def gradient(X,y,theta,regTerm):
(m,n) = X.shape
grad = np.dot(((expit(np.dot(X,theta))).reshape(m,1) - y).T,X)/m + (np.concatenate(([0],theta[1:].T),axis=0)).reshape(1,n)
return np.asarray(grad)
def train(X,y,regTerm,learnRate,epsilon,k):
(m,n) = X.shape
theta = np.zeros((k,n))
for i in range(0,k):
previousCost = 0;
currentCost = cost(X,y,theta[i,:],regTerm)
while(np.abs(currentCost-previousCost) > epsilon):
print(theta[i,:])
theta[i,:] = theta[i,:] - learnRate*gradient(X,y,theta[i,:],regTerm)
print(theta[i,:])
previousCost = currentCost
currentCost = cost(X,y,theta[i,:],regTerm)
return theta
trX = np.load('trX.npy')
trY = np.load('trY.npy')
theta = train(trX,trY,2,0.1,0.1,4)
I can verify that cost and gradient are returning values that are in the right dimension (cost returns a scalar, and gradient returns a 1 by n row vector), but i get the error
RuntimeWarning: divide by zero encountered in log
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
why is this happening and how can i avoid this?

The proper solution here is to add some small epsilon to the argument of log function. What worked for me was
epsilon = 1e-5
def cost(X, y, theta):
m = X.shape[0]
yp = expit(X # theta)
cost = - np.average(y * np.log(yp + epsilon) + (1 - y) * np.log(1 - yp + epsilon))
return cost

You can clean up the formula by appropriately using broadcasting, the operator * for dot products of vectors, and the operator # for matrix multiplication — and breaking it up as suggested in the comments.
Here is your cost function:
def cost(X, y, theta, regTerm):
m = X.shape[0] # or y.shape, or even p.shape after the next line, number of training set
p = expit(X # theta)
log_loss = -np.average(y*np.log(p) + (1-y)*np.log(1-p))
J = log_loss + regTerm * np.linalg.norm(theta[1:]) / (2*m)
return J
You can clean up your gradient function along the same lines.
By the way, are you sure you want np.linalg.norm(theta[1:]). If you're trying to do L2-regularization, the term should be np.linalg.norm(theta[1:]) ** 2.

Cause:
This is happening because in some cases, whenever y[i] is equal to 1, the value of the Sigmoid function (theta) also becomes equal to 1.
Cost function:
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
Now, consider the following part in the above code snippet:
np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1)))
Here, you are performing (1 - theta) when the value of theta is 1. So, that will effectively become log (1 - 1) = log (0) which is undefined.

I'm guessing your data has negative values in it. You can't log a negative.
import numpy as np
np.log(2)
> 0.69314718055994529
np.log(-2)
> nan
There are a lot of different ways to transform your data that should help, if this is the case.

def cost(X, y, theta):
yp = expit(X # theta)
cost = - np.average(y * np.log(yp) + (1 - y) * np.log(1 - yp))
return cost
The warning originates from np.log(yp) when yp==0 and in np.log(1 - yp) when yp==1. One option is to filter out these values, and not to pass them into np.log. The other option is to add a small constant to prevent the value from being exactly 0 (as suggested in one of the comments above)

Add epsilon value[which is a miniature value] to the log value so that it won't be a problem at all.
But i am not sure if it will give accurate results or not .

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Implementing Linear Regression using Gradient Descent - python

Related

Evaluating a function with a well-defined value at x,y=0

Minimizing this error function, using NumPy

Implementing stochastic gradient descent

Coupled map lattice in Python

python divide by zero encountered in log - logistic regression

Categories

Resources