I am trying to plot the gradient descent cost_list against epoch, but in doing so I am getting lost with the basic Python function structure. Below is the code structure I am working with.
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.T
    cost_list = []
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        cost = np.sum(loss ** 2) / (2 * m)
        cost_list.append(cost)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
        #a = plt.plot(i,theta)
    return theta, cost_list
What I am trying to do is return cost_list from the function, building up a list of the cost at every step, and then plot it with the lines below.
theta, cost_list = gradientDescent(x, y, bias, 0.000001, len(my_dataframe), 100)
plt.plot(list(range(numIterations)), cost_list, '-r')
but it gives me an error saying numIterations is not defined.
What would be the right edit to the code?
I tried your code with sample data; the key is to define numIterations before calling the function, so the same value can be reused when plotting:
df = pd.DataFrame(np.random.randint(1, 50, size=(50, 2)), columns=list('AB'))
x = df.A
y = df.B
bias = np.random.randn(50, 1)
numIterations = 100
theta, cost_list = gradientDescent(x, y, bias, 0.000001, len(df), numIterations)
plt.plot(list(range(numIterations)), cost_list, '-r')
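Alternatively, since cost_list already has one entry per iteration, you can plot against its own length and avoid depending on a separately defined numIterations at all (a small sketch, assuming matplotlib is imported as plt):
plt.plot(range(len(cost_list)), cost_list, '-r')   # x-axis is simply 0..len(cost_list)-1
plt.xlabel('epoch')
plt.ylabel('cost')
plt.show()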
I'm facing some issues trying to find the linear regression line using gradient descent; I'm getting weird results.
Here is the function:
def gradient_descent(m_k, c_k, learning_rate, points):
    n = len(points)
    dm, dc = 0, 0
    for i in range(n):
        x = points.iloc[i]['alcohol']
        y = points.iloc[i]['total']
        dm += -(2/n) * x * (y - (m_k * x + c_k))  # Partial der in m
        dc += -(2/n) * (y - (m_k * x + c_k))      # Partial der in c
    m = m_k - dm * learning_rate
    c = c_k - dc * learning_rate
    return m, c
And it is combined with a for loop:
l_rate = 0.0001
m, c = 0, 0
epochs = 1000
for _ in range(epochs):
    m, c = gradient_descent(m, c, l_rate, dataset)

plt.scatter(dataset.alcohol, dataset.total)
plt.plot(list(range(2, 10)), [m * x + c for x in range(2, 10)], color='red')
plt.show()
Gives this result:
Slope: 2.8061974241244196
Y intercept: 0.5712221080810446
The problem, though, is that when I take advantage of sklearn to compute the slope and intercept, i.e.
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True).fit(
    np.array(dataset['alcohol']).copy().reshape(-1, 1),
    np.array(dataset['total']).copy())
I get something completely different:
Slope: 2.0325063
Intercept: 5.8577761548263005
Any idea why? Looking on SO I found that a possible problem could be a learning rate that is too high, but as stated above I'm currently using 0.0001.
Sklearn's LinearRegression doesn't use gradient descent; it uses Ordinary Least Squares (OLS) regression, which is a non-iterative method.
For your model, you might consider randomly initialising m, c rather than starting with 0,0. You could also consider adjusting the learning rate or using an adaptive learning rate.
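For example, here is a minimal sketch of those two suggestions (random initialisation plus letting the optimisation run longer), assuming gradient_descent and dataset are defined as in the question; the seed and epoch count are illustrative only:
import numpy as np

rng = np.random.default_rng(0)
m, c = rng.standard_normal(), rng.standard_normal()  # random initialisation instead of 0, 0

l_rate = 0.0001        # keep it small, or tune it while watching the fit
epochs = 100000        # many more epochs, since the intercept moves slowly at this rate
for _ in range(epochs):
    m, c = gradient_descent(m, c, l_rate, dataset)

print("Slope:", m)
print("Y intercept:", c)
With enough iterations (and a stable learning rate), plain gradient descent on this convex loss should approach the same slope and intercept that OLS computes in closed form.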
I'm just starting out learning machine learning and have been trying to fit a polynomial to data generated with a sine curve. I know how to do this in closed form, but I'm trying to get it to work with gradient descent too.
However, my weights explode to crazy heights, even with a very large penalty term. What am I doing wrong?
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from math import pi
N = 10
D = 5
X = np.linspace(0,100, N)
Y = np.sin(0.1*X)*50
X = X.reshape(N, 1)
Xb = np.array([[1]*N]).T
for i in range(1, D):
    Xb = np.concatenate((Xb, X**i), axis=1)

# Randomly initialize the weights
w = np.random.randn(D)/np.sqrt(D)

# Solving in closed form works
#w = np.linalg.solve((Xb.T.dot(Xb)),Xb.T.dot(Y))
#Yhat = Xb.dot(w)

# Gradient descent
learning_rate = 0.0001
for i in range(500):
    Yhat = Xb.dot(w)
    delta = Yhat - Y
    w = w - learning_rate*(Xb.T.dot(delta) + 100*w)

print('Final w: ', w)
plt.scatter(X, Y)
plt.plot(X, Yhat)
plt.show()
Thanks!
When updating theta, you have to take theta and subtract from it the learning rate times the gradient, divided by the training-set size. You also have to divide your penalty term by the training-set size. But the main problem is that your learning rate is too large. For future debugging, it is helpful to print the cost to see whether gradient descent is working and whether the learning rate is too small or just right.
Below is the code for a 2nd degree polynomial which found the optimum thetas (as you can see, the learning rate is really small). I've also added the cost function.
N = 2
D = 2

#Gradient descent
learning_rate = 0.000000000001
for i in range(200):
    Yhat = Xb.dot(w)
    delta = Yhat - Y
    print((1/N) * np.sum(np.dot(delta, np.transpose(delta))))
    w = w - learning_rate*(np.dot(delta, Xb)) * (1/N)
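Here is a minimal sketch of the same update rule with the penalty term from the question also divided by the training-set size, as described above. It assumes Xb, Y and w are built as in the question with the degree-2 design matrix (the original degree-5 features would need an even smaller rate):
n = len(Y)             # training-set size
penalty = 100          # L2 penalty strength from the question
learning_rate = 1e-12  # deliberately tiny, as in the snippet above

for i in range(200):
    Yhat = Xb.dot(w)
    delta = Yhat - Y
    cost = np.sum(delta ** 2) / n   # print or store this to monitor convergence
    w = w - learning_rate * (Xb.T.dot(delta) + penalty * w) / n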
I am writing this algorithm for my final year project. I used gradient descent to find the minimum, but instead I am getting a cost as high as infinity.
I have checked the gradientDescent function and I believe it is correct.
The CSV I am importing, or its formatting, is causing some error.
The data in the CSV is in the format below. Each quadruple before a '|' is a row; the first 3 columns are the independent variables x, and the 4th column is the dependent variable y.
600 20 0.5 0.63 | 600 20 1 1.5 | 800 20 0.5 0.9
import numpy as np
import random
import pandas as pd
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta
df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', 'rU', delimiter=",",header=None)
x = df.loc[:,'0':'2'].as_matrix()
y = df[3].as_matrix()
print(x)
print(y)
m, n = np.shape(x)
numIterations= 100
alpha = 0.001
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
As forayer mentioned in the comments, the problem is in the line where you read the CSV. You are setting delimiter=",", which means pandas expects each column in your data to be separated by a comma. However, in your data, columns are apparently separated by whitespace.
Just substitute the line with
df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', 'rU', delimiter=" ",header=None)
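As a quick sanity check, here is a simplified sketch of the read and the matrix extraction (the 'rU' argument is dropped, iloc/to_numpy stand in for the deprecated as_matrix, and it assumes each quadruple sits on its own line in the file):
df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv',
                 delimiter=" ", header=None)
print(df.shape)   # should be (number_of_rows, 4)
print(df.head())

x = df.iloc[:, 0:3].to_numpy()   # first three columns: independent variables
y = df.iloc[:, 3].to_numpy()     # fourth column: dependent variable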
I'm new to machine learning. I started with linear regression using gradient descent. I have Python code for this and I understand it that way. My question is: the gradient descent algorithm minimizes a function, so can I plot that function? I want to see what the function whose minimum is being searched for looks like. Is that possible?
My code:
import matplotlib.pyplot as plt
import numpy as np

def sigmoid_activation(x):
    return 1.0 / (1 + np.exp(-x))

X = np.array([
    [2.13, 5.49],
    [8.35, 6.74],
    [8.17, 5.79],
    [0.62, 8.54],
    [2.74, 6.92]])
y = [0, 1, 1, 0, 0]

xdata = [row[0] for row in X]
ydata = [row[1] for row in X]

X = np.c_[np.ones((X.shape[0])), X]
W = np.random.uniform(size=(X.shape[1], ))

lossHistory = []
for epoch in np.arange(0, 5):
    preds = sigmoid_activation(X.dot(W))
    error = preds - y
    loss = np.sum(error ** 2)
    lossHistory.append(loss)
    gradient = X.T.dot(error) / X.shape[0]
    W += - 0.44 * gradient

plt.scatter(xdata, ydata)
plt.show()

plt.plot(np.arange(0, 5), lossHistory)
plt.show()

for i in np.random.choice(5, 5):
    activation = sigmoid_activation(X[i].dot(W))
    label = 0 if activation < 0.5 else 1
    print("activation={:.4f}; predicted_label={}, true_label={}".format(
        activation, label, y[i]))

Y = (-W[0] - (W[1] * X)) / W[2]
plt.scatter(X[:, 1], X[:, 2], c=y)
plt.plot(X, Y, "r-")
plt.show()
At the risk of being obvious... you can simply plot lossHistory with matplotlib. Or am I missing something?
EDIT: apparently the OP asked what gradient descent (GD) is minimizing. I will try to answer that here, and I hope it addresses the original question.
The GD algorithm is a generic algorithm for finding the minimum of a function in parameter space. In your case (and that is how it is usually used with neural networks) you want to find the minimum of a loss function: the MSE (mean squared error). You implement the GD algorithm by updating the weights as you did with
gradient = X.T.dot(error) / X.shape[0]
W += - 0.44 * gradient
The gradient is just the partial derivative of your loss function (the MSE) with respect to the weights, so you are effectively minimizing the loss function (the MSE). You then update your weights with a learning rate of 0.44.
Then you simply save the value of your loss function in the array
loss = np.sum(error ** 2)
lossHistory.append(loss)
and therefore the lossHistory list contains the values of your cost (or loss) function, which you can plot to check your learning process. The plot should show something decreasing. Does this explanation help?
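If you want to actually see the surface that GD is descending, a rough sketch is to sweep two of the weights over a grid, hold the bias weight fixed, and evaluate the same loss at every grid point. This assumes X is the augmented matrix (with the column of ones) and that y, W and sigmoid_activation are defined as in your code; the grid range is arbitrary:
# Evaluate the loss over a grid of values for W[1] and W[2], with W[0] held fixed
w1 = np.linspace(-5, 5, 100)
w2 = np.linspace(-5, 5, 100)
W1, W2 = np.meshgrid(w1, w2)

loss_surface = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        w = np.array([W[0], W1[i, j], W2[i, j]])
        preds = sigmoid_activation(X.dot(w))
        loss_surface[i, j] = np.sum((preds - y) ** 2)

plt.contourf(W1, W2, loss_surface, levels=30)
plt.colorbar(label='loss')
plt.xlabel('W[1]')
plt.ylabel('W[2]')
plt.show()
The minimum of this surface (along the chosen slice) is what the weight updates are moving towards.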
Best,
Umberto
I am learning gradient descent for calculating coefficients. Below is what I am doing:
#!/usr/bin/Python
import numpy as np
# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        #print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta
X = np.array([41.9,43.4,43.9,44.5,47.3,47.5,47.9,50.2,52.8,53.2,56.7,57.0,63.5,65.3,71.1,77.0,77.8])
y = np.array([251.3,251.3,248.3,267.5,273.0,276.5,270.3,274.9,285.0,290.0,297.0,302.5,304.5,309.3,321.7,330.7,349.0])
n = np.max(X.shape)
x = np.vstack([np.ones(n), X]).T
m, n = np.shape(x)
numIterations= 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
Now my above code works fine. If I now try multiple variables and replace X with X1 like the following:
X1 = np.array([[41.9,43.4,43.9,44.5,47.3,47.5,47.9,50.2,52.8,53.2,56.7,57.0,63.5,65.3,71.1,77.0,77.8], [29.1,29.3,29.5,29.7,29.9,30.3,30.5,30.7,30.8,30.9,31.5,31.7,31.9,32.0,32.1,32.5,32.9]])
then my code fails and shows me the following error:
JustTestingSGD.py:14: RuntimeWarning: overflow encountered in square
cost = np.sum(loss ** 2) / (2 * m)
JustTestingSGD.py:19: RuntimeWarning: invalid value encountered in subtract
theta = theta - alpha * gradient
[ nan nan nan]
Can anybody tell me how I can do gradient descent using X1? My expected output using X1 is:
[-153.5 1.24 12.08]
I am also open to other Python implementations. I just want the coefficients (also called thetas) for X1 and y.
The problem is that your algorithm is not converging; it diverges instead. The first error:
JustTestingSGD.py:14: RuntimeWarning: overflow encountered in square
cost = np.sum(loss ** 2) / (2 * m)
comes from the fact that at some point calculating the square becomes impossible, because 64-bit floats cannot hold the number (i.e. it exceeds roughly 1.8 × 10^308).
JustTestingSGD.py:19: RuntimeWarning: invalid value encountered in subtract
theta = theta - alpha * gradient
This is only a consequence of the previous error: the numbers are no longer reasonable to calculate with.
You can actually see the divergence by uncommenting your debug print line. The cost starts to grow, as there is no convergence.
If you try your function with X1 and a smaller value for alpha, it converges.
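For example, here is a minimal sketch of the multi-variable setup with a smaller alpha, assuming gradientDescent, X1 and y are defined as above; the exact alpha and iteration count are illustrative:
# Build the design matrix from X1: a bias column plus the two feature rows.
x = np.vstack([np.ones(X1.shape[1]), X1]).T   # shape (17, 3)
m, n = x.shape

theta = np.ones(n)
alpha = 0.0001            # smaller than 0.0005, so the updates no longer blow up
numIterations = 1000000   # unscaled, correlated features converge slowly
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
With this, the cost decreases instead of overflowing; getting close to the expected coefficients can still take a very large number of iterations because the problem is poorly conditioned.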