I am learning gradient descent for calculating coefficients. Below is what I am doing:
#!/usr/bin/Python
import numpy as np
# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
xTrans = x.transpose()
for i in range(0, numIterations):
hypothesis = np.dot(x, theta)
loss = hypothesis - y
# avg cost per example (the 2 in 2*m doesn't really matter here.
# But to be consistent with the gradient, I include it)
cost = np.sum(loss ** 2) / (2 * m)
#print("Iteration %d | Cost: %f" % (i, cost))
# avg gradient per example
gradient = np.dot(xTrans, loss) / m
# update
theta = theta - alpha * gradient
return theta
X = np.array([41.9,43.4,43.9,44.5,47.3,47.5,47.9,50.2,52.8,53.2,56.7,57.0,63.5,65.3,71.1,77.0,77.8])
y = np.array([251.3,251.3,248.3,267.5,273.0,276.5,270.3,274.9,285.0,290.0,297.0,302.5,304.5,309.3,321.7,330.7,349.0])
n = np.max(X.shape)
x = np.vstack([np.ones(n), X]).T
m, n = np.shape(x)
numIterations= 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
Now my above code works fine. If I now try multiple variables and replace X with X1 like the following:
X1 = np.array([[41.9,43.4,43.9,44.5,47.3,47.5,47.9,50.2,52.8,53.2,56.7,57.0,63.5,65.3,71.1,77.0,77.8], [29.1,29.3,29.5,29.7,29.9,30.3,30.5,30.7,30.8,30.9,31.5,31.7,31.9,32.0,32.1,32.5,32.9]])
then my code fails and shows me the following error:
JustTestingSGD.py:14: RuntimeWarning: overflow encountered in square
cost = np.sum(loss ** 2) / (2 * m)
JustTestingSGD.py:19: RuntimeWarning: invalid value encountered in subtract
theta = theta - alpha * gradient
[ nan nan nan]
Can anybody tell me how can I do gradient descent using X1? My expected output using X1 is:
[-153.5 1.24 12.08]
I am open to other Python implementations also. I just want the coefficients (also called thetas) for X1 and y.
The problem is in your algorithm not converging. It diverges instead. The first error:
JustTestingSGD.py:14: RuntimeWarning: overflow encountered in square
cost = np.sum(loss ** 2) / (2 * m)
comes from the problem that at some point calculating the square of something is impossible, as the 64-bit floats cannot hold the number (i.e. it is > 10^309).
JustTestingSGD.py:19: RuntimeWarning: invalid value encountered in subtract
theta = theta - alpha * gradient
This is only a consequence of the error before. The numbers are not reasonable for calculations.
You can actually see the divergence by uncommenting your debug print line. The cost starts to grow, as there is no convergence.
If you try your function with X1 and a smaller value for alpha, it converges.
Related
I'm facing some issues trying to find the linear regression line using Gradient Descent, getting to weird results.
Here is the function:
def gradient_descent(m_k, c_k, learning_rate, points):
n = len(points)
dm, dc = 0, 0
for i in range(n):
x = points.iloc[i]['alcohol']
y = points.iloc[i]['total']
dm += -(2/n) * x * (y - (m_k * x + c_k)) # Partial der in m
dc += -(2/n) * (y - (m_k * x + c_k)) # Partial der in c
m = m_k - dm * learning_rate
c = c_k - dc * learning_rate
return m, c
And combined with a for loop
l_rate = 0.0001
m, c = 0, 0
epochs = 1000
for _ in range(epochs):
m, c = gradient_descent(m, c, l_rate, dataset)
plt.scatter(dataset.alcohol, dataset.total)
plt.plot(list(range(2, 10)), [m * x + c for x in range(2,10)], color='red')
plt.show()
Gives this result:
Slope: 2.8061974241244196
Y intercept: 0.5712221080810446
The problem is though that taking advantage of sklearn to compute the slope and intercept, i.e.
model = LinearRegression(fit_intercept=True).fit(np.array(dataset['alcohol']).copy().reshape(-1, 1),
np.array(dataset['total']).copy())
I get something completely different:
Slope: 2.0325063
Intercept: 5.8577761548263005
Any idea why? Looking on SO I've found out that a possible problem could be a too high learning rate, but as stated above I'm currently using 0.0001
Sklearn's LinearRegression doesn't use gradient descent - it uses Ordinary Least Squares (OLS) Regression which is a non-iterative method.
For your model, you might consider randomly initialising m, c rather than starting with 0,0. You could also consider adjusting the learning rate or using an adaptive learning rate.
I am trying to plot gradient descent cost_list with respect to epoch, but when I am trying to do so, I am getting lost with basic python function structure. I am appending my code structure what I am trying to do.
def gradientDescent(x, y, theta, alpha, m, numIterations):
xTrans = x.T
cost_list=[]
for i in range(0, numIterations):
hypothesis = np.dot(x, theta)
loss = hypothesis - y
cost = np.sum(loss ** 2) / (2 * m)
cost_list.append(cost)
print("Iteration %d | Cost: %f" % (i, cost))
# avg gradient per example
gradient = np.dot(xTrans, loss) / m
# update
theta = theta - alpha * gradient
#a = plt.plot(i,theta)
return theta,cost_list
what I am trying to do is I am return the "cost_list" at each step and creating a list of cost and I am trying to plot now with the below Line of codes.
theta,cost_list=gradientDescent(x,y,bias,0.000001,len(my dataframe),100)
plt.plot(list(range(numIterations)), cost_list, '-r')
but it's giving me error with numIterations not defined.
what should be the possible edit to the code
I tried your code with sample data;
df = pd.DataFrame(np.random.randint(1,50, size=(50,2)), columns=list('AB'))
x=df.A
y=df.B
bias = np.random.randn(50,1)
numIterations = 100
theta,cost_list=gradientDescent(x,y,bias,0.000001,len(df),numIterations)
plt.plot(list(range(numIterations)), cost_list, '-r')
Writing this algorithm for my final year project. Used gradient descent to find the minimum, but instead getting the cost as high as infinity.
I have checked the gradientDescent function. I believe that's correct.
The csv I am importing and its formatting is causing some error.
The data in the CSV is of below format.
Each quad before '|' is a row.
First 3 columns are independent variables x.
4th column is dependent y.
600 20 0.5 0.63 | 600 20 1 1.5 | 800 20 0.5 0.9
import numpy as np
import random
import pandas as pd
def gradientDescent(x, y, theta, alpha, m, numIterations):
xTrans = x.transpose()
for i in range(0, numIterations):
hypothesis = np.dot(x, theta)
loss = hypothesis - y
# avg cost per example (the 2 in 2*m doesn't really matter here.
# But to be consistent with the gradient, I include it)
cost = np.sum(loss ** 2) / (2 * m)
print("Iteration %d | Cost: %f" % (i, cost))
# avg gradient per example
gradient = np.dot(xTrans, loss) / m
# update
theta = theta - alpha * gradient
return theta
df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', 'rU', delimiter=",",header=None)
x = df.loc[:,'0':'2'].as_matrix()
y = df[3].as_matrix()
print(x)
print(y)
m, n = np.shape(x)
numIterations= 100
alpha = 0.001
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
As forayer mentioned in the comments, the problem is in the line where you read the csv. You are setting delimiter=",", which means that python expects each column in your data to be separated by a comma. However, in your data, columns are apparently separated by a whitespace.
Just substitute the line with
df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', 'rU', delimiter=" ",header=None)
I want to compute the log-likelihood of a logistic regression model.
def sigma(x):
return 1 / (1 + np.exp(-x))
def logll(y, X, w):
""""
Parameters
y : ndarray of shape (N,)
Binary labels (either 0 or 1).
X : ndarray of shape (N,D)
Design matrix.
w : ndarray of shape (D,)
Weight vector.
"""
p = sigma(X # w)
y_1 = y # np.log(p)
y_0 = (1 - y) # (1 - np.log(1 - p))
return y_1 + y_0
logll(y, Xz, np.linspace(-5,5,D))
Applying this function results in
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:16:
RuntimeWarning: divide by zero encountered in log
app.launch_new_instance()
I would expect y_0 to be a negative float. How can I avoid this error and is there a bug somewhere in the code?
Edit 1
X # w statistics:
Max: 550.775133944
Min: -141.972597608
Sigma(max): 1.0 => Throws error in y_0 in np.log(1 - 1.0)
Sigma(min): 2.19828642169e-62
Edit 2
I also have access to this logsigma function that computes sigma in log space:
def logsigma (x):
return np.vectorize(np.log)(sigma(x))
Unfortunately, I don't find a way to rewrite y_0 then. The following is my approach but obviously not correct.
def l(y, X, w):
y_1 = np.dot(y, logsigma(X # w))
y_0 = (1 - y) # (1 - np.log(1 - logsigma(X # w)))
return y_1 + y_0
First of all, I think you've made a mistake in your log-likelihood formula: it should be a plain sum of y_0 and y_1, not sum of exponentials:
Division by zero can be caused by large negative values (I mean large by abs value) in X # w, e.g. sigma(-800) is exactly 0.0 on my machine, so the log of it results in "RuntimeWarning: divide by zero encountered in log".
Make sure you initialize your network with small values near zero and you don't have exploding gradients after several iterations of backprop.
By the way, here's the code I use for cross-entropy loss, which works also in multi-class problems:
def softmax_loss(x, y):
"""
- x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
for the ith input.
- y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
0 <= y[i] < C
"""
probs = np.exp(x - np.max(x, axis=1, keepdims=True))
probs /= np.sum(probs, axis=1, keepdims=True)
N = x.shape[0]
return -np.sum(np.log(probs[np.arange(N), y])) / N
UPD: When nothing else helps, there is one more numerical trick (discussed in the comments): compute log(p+epsilon) and log(1-p+epsilon) with a small positive epsilon value. This ensures that log(0.0) never happens.
I'm trying to implement a multiclass logistic regression classifier that distinguishes between k different classes.
This is my code.
import numpy as np
from scipy.special import expit
def cost(X,y,theta,regTerm):
(m,n) = X.shape
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
return J
def gradient(X,y,theta,regTerm):
(m,n) = X.shape
grad = np.dot(((expit(np.dot(X,theta))).reshape(m,1) - y).T,X)/m + (np.concatenate(([0],theta[1:].T),axis=0)).reshape(1,n)
return np.asarray(grad)
def train(X,y,regTerm,learnRate,epsilon,k):
(m,n) = X.shape
theta = np.zeros((k,n))
for i in range(0,k):
previousCost = 0;
currentCost = cost(X,y,theta[i,:],regTerm)
while(np.abs(currentCost-previousCost) > epsilon):
print(theta[i,:])
theta[i,:] = theta[i,:] - learnRate*gradient(X,y,theta[i,:],regTerm)
print(theta[i,:])
previousCost = currentCost
currentCost = cost(X,y,theta[i,:],regTerm)
return theta
trX = np.load('trX.npy')
trY = np.load('trY.npy')
theta = train(trX,trY,2,0.1,0.1,4)
I can verify that cost and gradient are returning values that are in the right dimension (cost returns a scalar, and gradient returns a 1 by n row vector), but i get the error
RuntimeWarning: divide by zero encountered in log
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
why is this happening and how can i avoid this?
The proper solution here is to add some small epsilon to the argument of log function. What worked for me was
epsilon = 1e-5
def cost(X, y, theta):
m = X.shape[0]
yp = expit(X # theta)
cost = - np.average(y * np.log(yp + epsilon) + (1 - y) * np.log(1 - yp + epsilon))
return cost
You can clean up the formula by appropriately using broadcasting, the operator * for dot products of vectors, and the operator # for matrix multiplication — and breaking it up as suggested in the comments.
Here is your cost function:
def cost(X, y, theta, regTerm):
m = X.shape[0] # or y.shape, or even p.shape after the next line, number of training set
p = expit(X # theta)
log_loss = -np.average(y*np.log(p) + (1-y)*np.log(1-p))
J = log_loss + regTerm * np.linalg.norm(theta[1:]) / (2*m)
return J
You can clean up your gradient function along the same lines.
By the way, are you sure you want np.linalg.norm(theta[1:]). If you're trying to do L2-regularization, the term should be np.linalg.norm(theta[1:]) ** 2.
Cause:
This is happening because in some cases, whenever y[i] is equal to 1, the value of the Sigmoid function (theta) also becomes equal to 1.
Cost function:
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
Now, consider the following part in the above code snippet:
np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1)))
Here, you are performing (1 - theta) when the value of theta is 1. So, that will effectively become log (1 - 1) = log (0) which is undefined.
I'm guessing your data has negative values in it. You can't log a negative.
import numpy as np
np.log(2)
> 0.69314718055994529
np.log(-2)
> nan
There are a lot of different ways to transform your data that should help, if this is the case.
def cost(X, y, theta):
yp = expit(X # theta)
cost = - np.average(y * np.log(yp) + (1 - y) * np.log(1 - yp))
return cost
The warning originates from np.log(yp) when yp==0 and in np.log(1 - yp) when yp==1. One option is to filter out these values, and not to pass them into np.log. The other option is to add a small constant to prevent the value from being exactly 0 (as suggested in one of the comments above)
Add epsilon value[which is a miniature value] to the log value so that it won't be a problem at all.
But i am not sure if it will give accurate results or not .