I want to run gradient descent in Python to find the global minimum of f, starting from x = 10, with a learning rate of 0.01, an epsilon of 0.00001, and a maximum of 10000 iterations.
# parameters to set
x = 10 # Starting value of x
alpha = 0.01 # Set learning rate
epsilon = 0.00001 # Stop algorithm when absolute difference between 2 consecutive x-values is less than epsilon
max_iter = 10000 # set maximum number of iterations
# Define function and derivative of function
f = lambda x: x**4-3*x**3+15
fprime = lambda x: 4*x**3-9*x**2
# Initialising
diff = 1 # initialise difference between 2 consecutive x-values
iter = 1 # iterations counter
# Now Gradient Descent
while diff > epsilon and iter < max_iter: # 2 stopping criteria
    x_new = x - alpha * fprime(x) # update rule
    print("Iteration ", iter, ": x-value is:", x_new,", f(x) is: ", f(x_new) )
    diff = abs(x_new - x)
    iter = iter + 1
    x = x_new
print("The local minimum occurs at: ", x)
However, when I run the code, it prints only 5 iterations and then raises an OverflowError.
Your learning rate is too high, which causes the divergence you're observing. With alpha = 0.001 the iteration converges to a local minimum:
# parameters to set
x = 10 # Starting value of x
alpha = 0.001 # Set learning rate
epsilon = 0.00001 # Stop algorithm when absolute difference between 2 consecutive x-values is less than epsilon
max_iter = 10000 # set maximum number of iterations
# Define function and derivative of function
f = lambda x: x**4-3*x**3+15
fprime = lambda x: 4*x**3-9*x**2
# Initialising
diff = 1 # initialise difference between 2 consecutive x-values
iter = 1 # iterations counter
# Now Gradient Descent
while diff > epsilon and iter < max_iter: # 2 stopping criteria
    x_new = x - alpha * fprime(x) # update rule
    print("Iteration ", iter, ": x-value is:", x_new,", f(x) is: ", f(x_new) )
    diff = abs(x_new - x)
    iter = iter + 1
    x = x_new
print("The local minimum occurs at: ", x)
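To see why the step size matters here, the sketch below reruns the same update rule for both learning rates and reports whether the iteration converged or overflowed. The helper name `gradient_descent` and its return values are assumptions made for this illustration; they are not part of the original code.

```python
def fprime(x):
    # Derivative of f(x) = x**4 - 3*x**3 + 15
    return 4 * x**3 - 9 * x**2


def gradient_descent(alpha, x=10.0, epsilon=1e-5, max_iter=10000):
    """Return (last x, iterations used, converged flag)."""
    for n in range(1, max_iter + 1):
        try:
            x_new = x - alpha * fprime(x)
        except OverflowError:
            # x grew so large that x**3 no longer fits in a float
            return x, n, False
        if abs(x_new - x) < epsilon:
            return x_new, n, True
        x = x_new
    return x, max_iter, False


for alpha in (0.01, 0.001):
    x_final, n_iter, converged = gradient_descent(alpha)
    print("alpha =", alpha, "converged =", converged,
          "iterations =", n_iter, "x =", x_final)
```

Starting from x = 10 the derivative is 3100, so alpha = 0.01 moves x by 31 in a single step, to -21, and every later step overshoots further. With alpha = 0.001 the first step is 3.1, to 6.9, and the iterates then settle near x = 9/4 = 2.25, the nonzero root of fprime.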
I have the following code, in which I have implemented gradient descent for a function using PyTorch. How do I add noise to the code so that it identifies both local minima?
import torch
startVal = -5.0
alpha = 0.001
space = " "
progressionCheck = True
x = torch.tensor(startVal, requires_grad=True)
def function(a):
    f = a**4 - a**3 - a**2 + a - 1
    return f
for i in range(1000):
    function(x).backward()
    newVal = x - alpha * (x.grad)
    progressionCheck = function(newVal) < function(startVal)
    x = newVal.detach().clone().requires_grad_()
    print(x)
print("The minimum value occurs at" + space + str(float(x)))
print("The minimum value is" + space + str(function(float(x))))
I assume you intend to perturb the gradients with some noise. To do so, you could specify a distribution, e.g. as follows:
low, high = -0.1, 0.1
dist = torch.distributions.uniform.Uniform(low, high)
and then sample from it to update the gradients, i.e. adjust
newVal = x - alpha * (x.grad)
to
newVal = x - alpha * (x.grad) * dist.sample([1]).item()
Alternatively, sample the noise in advance
noise = dist.sample([1000])
and then index it
newVal = x - alpha * (x.grad) * noise[i]
However, I doubt this serves the purpose. I don't see how you could find multiple local minima without multiple runs from varying start values (or, less elegantly, with very large noise or a very large step size).
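A minimal sketch of the multiple-runs idea, under some assumptions: it uses plain Python with a hand-written derivative instead of PyTorch autograd, and the start values, the step size `alpha=0.01`, and the helper name `descend` are chosen for illustration only.

```python
def f(a):
    return a**4 - a**3 - a**2 + a - 1


def fprime(a):
    # Hand-written derivative of f; the original code gets it from autograd.
    return 4 * a**3 - 3 * a**2 - 2 * a + 1


def descend(start, alpha=0.01, steps=5000):
    """Plain gradient descent from a single start value."""
    x = start
    for _ in range(steps):
        x = x - alpha * fprime(x)
    return x


# Illustrative start values (an assumption), spread over both basins.
starts = [-2.0, -1.0, 0.0, 0.5, 2.0]

# Round the end points so that runs ending in the same basin collapse to one value.
minima = sorted({round(descend(s), 3) for s in starts})
print("local minima found:", minima)
for m in minima:
    print("f(", m, ") =", f(m))
```

Each run ends in whichever basin its start value belongs to, so collecting the distinct end points gives the set of local minima that the chosen starts can reach.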
I am writing code for successive over-relaxation.
When I run the code, I get the following error:
x[i] = (1-w)*xold[i] + w*(d[i] + sum(C[i,:]*x)) # estimate new values
OverflowError: Python int too large to convert to C long
# import libraries
import numpy as np
# define function
# M is the coeff matrix; b is RHS matrix, x is the initial guesses
# tol is acceptable tolerance and Nmax = max. iterations
def sor(M,b,x,w,tol,Nmax):
    N = len(M) # length of the coefficient matrix
    C = np.zeros((N,N)) # initialize iteration coeff matrix
    d = np.zeros(N) # initialize iteration RHS vector
    # Create iteration matrix
    for i in np.arange(0,N,1):
        pvt = M[i,i] # identify the pivot element
        C[i,:] = -M[i,:]/pvt # divide coefficient by pivot
        C[i,i] = 0 # eliminate the pivot element
        d[i] = b[i]/pvt # divide RHS by pivot element
    # Perform iterations
    res = 100 # create a high res so there is at least 1 iteration
    iter = 0 # initialize iteration counter
    xold = 1.0*x # initialize xold
    #res = np.linalg.norm(np.matmul(M,x) - b)
    # iterate while residual > tol and iter <= max iterations
    while(res > tol and iter <= Nmax):
        for i in np.arange(0,N,1): # loop through all unknowns
            x[i] = (1-w)*xold[i] + w*(d[i] + sum(C[i,:]*x)) # estimate new values
        res = np.sum(np.abs(np.matmul(M,x) - b)) # compute res
        iter = iter + 1 # update iteration counter
        xold = x
    return(x)
# Solve Example
Nmax = 100 # Max. Number of iteration
tol = 1e-03 # Absolute tolerance
M = [[1,1,1,0,0,0],
[1,1,1,1,0,0],
[1,1,1,1,1,0],
[0,1,1,1,1,1],
[0,0,1,1,1,1],
[0,0,0,1,1,1]]
M = np.array(M) # Coefficient Matrix
b = np.array([1,1,0.5,1,0.5,1])
y = [0,0,0,0,0,0]
y = np.array(y) # Initial Guesses
w = 1
X = sor(M,b,y,w,tol,Nmax) # Apply the function
print(X)
I am supposed to get the same answer as the built-in function:
[ 1. 0.5 -0.5 0. -0.5 1.5]
What am I missing that is causing this issue?
Thanks in advance!
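A hedged sketch of one way to write the iteration, not a verified diagnosis of the traceback above. Two things in the posted code are worth ruling out first: `y` is built from integer literals, so `x` is an integer array and every `x[i] = ...` assignment truncates the float result; and `xold = x` binds a second name to the same array rather than copying it. The version below works on float copies and is tried on a small diagonally dominant system (`A` and `rhs` are made up for this illustration), because the posted M is not diagonally dominant and convergence of SOR on it is not guaranteed.

```python
import numpy as np


def sor(M, b, x0, w, tol, n_max):
    """Successive over-relaxation on float copies of the inputs."""
    M = np.asarray(M, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.array(x0, dtype=float)  # float copy, so updates are not truncated
    n = len(b)
    for _ in range(n_max):
        for i in range(n):
            # Off-diagonal part of row i, using the most recent values of x
            sigma = M[i, :] @ x - M[i, i] * x[i]
            x[i] = (1 - w) * x[i] + w * (b[i] - sigma) / M[i, i]
        if np.sum(np.abs(M @ x - b)) < tol:
            break
    return x


# Illustrative, strictly diagonally dominant test system (an assumption).
A = np.array([[4, 1, 0],
              [1, 4, 1],
              [0, 1, 4]])
rhs = np.array([1, 2, 3])

x_sor = sor(A, rhs, [0, 0, 0], w=1.0, tol=1e-10, n_max=200)
x_ref = np.linalg.solve(A, rhs)
print("SOR:      ", x_sor)
print("reference:", x_ref)
```

With w = 1 this reduces to Gauss-Seidel. Comparing against `np.linalg.solve` on a system where convergence is guaranteed separates implementation bugs from convergence problems caused by the matrix itself.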
I'm trying to write code that returns the parameters for ridge regression using gradient descent. Ridge regression is defined as
where L is the loss (or cost) function, w are the parameters of the loss function (which absorb b), x are the data points, y are the labels for each vector x, lambda is a regularization constant, and b is the intercept parameter (which is absorbed into w). So L(w, b) is a single number.
The gradient descent algorithm that I should implement looks like this:
where ∇ is the gradient of L with respect to w, η is the step size, and t is the time or iteration counter.
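Reading the loss and the gradient off the code that follows, the two formulas referred to here are presumably the ones below. This is an inference from the code, not a quotation from the post; the code's `C` plays the role of λ, and the intercept b is folded into w through a constant feature of 1.

```latex
% Assumed form of the ridge loss, matching loss_stry + C * np.dot(w, w)
L(w) = \sum_{i=1}^{n} \left( y_i - w^\top x_i \right)^2 + \lambda \, w^\top w

% Its gradient, matching grad = -2 * summ + 2 * C * w
\nabla L(w) = -2 \sum_{i=1}^{n} \left( y_i - w^\top x_i \right) x_i + 2 \lambda w

% Gradient descent update with step size \eta and iteration counter t
w^{t+1} = w^{t} - \eta \, \nabla L(w^{t})
```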
My code:
def ridge_regression_GD(x,y,C):
    x=np.insert(x,0,1,axis=1) # adding a feature 1 to x at beginning nxd+1
    w=np.zeros(len(x[0,:])) # d+1
    t=0
    eta=1
    summ = np.zeros(1)
    grad = np.zeros(1)
    losses = np.array([0])
    loss_stry = 0
    while eta > 2**-30:
        for i in range(0,len(y)): # here we calculate the summation for all rows for loss and gradient
            summ=summ+((y[i,]-np.dot(w,x[i,]))*x[i,])
            loss_stry=loss_stry+((y[i,]-np.dot(w,x[i,]))**2)
        losses=np.insert(losses,len(losses),loss_stry+(C*np.dot(w,w)))
        grad=((-2)*summ)+(np.dot((2*C),w))
        eta=eta/2
        w=w-(eta*grad)
        t+=1
        summ = np.zeros(1)
        loss_stry = 0
    b=w[0]
    w=w[1:]
    return w,b,losses
The output should be the intercept parameter b, the vector w, and the loss at each iteration, losses.
My problem is that when I run the code, the values of w and of the losses keep increasing, both on the order of 10^13.
I would really appreciate your help. If you need any more information or clarification, just ask.
NOTE: This post was deleted from Cross Validated forum. If there's a better forum to post it please let me know.
I checked your code, and your implementation of ridge regression is correct. The growing values of w, and with them the growing losses, come from parameter updates that are too large and unstable (i.e. abs(eta*grad) is too big). I adjusted the learning rate and the weight decay constant to an appropriate range and changed the way the learning rate decays, and then everything works as expected:
import numpy as np
sample_num = 100
x_dim = 10
x = np.random.rand(sample_num, x_dim)
w_tar = np.random.rand(x_dim)
b_tar = np.random.rand(1)[0]
y = np.matmul(x, np.transpose([w_tar])) + b_tar
C = 1e-6
def ridge_regression_GD(x,y,C):
    x = np.insert(x,0,1,axis=1) # adding a feature 1 to x at beginning nxd+1
    x_len = len(x[0,:])
    w = np.zeros(x_len) # d+1
    t = 0
    eta = 3e-3
    summ = np.zeros(x_len)
    grad = np.zeros(x_len)
    losses = np.array([0])
    loss_stry = 0
    for epoch in range(50):
        for i in range(len(y)): # here we calculate the summation for all rows for loss and gradient
            summ = summ + (y[i,] - np.dot(w, x[i,])) * x[i,]
            loss_stry += (y[i,] - np.dot(w, x[i,]))**2
        losses = np.insert(losses, len(losses), loss_stry + C * np.dot(w, w))
        grad = -2 * summ + np.dot(2 * C,w)
        w -= eta * grad
        eta *= 0.9
        t += 1
        summ = np.zeros(x_len)
        loss_stry = 0
    return w[1:], w[0], losses
w, b, losses = ridge_regression_GD(x, y, C)
print("losses: ", losses)
print("b: ", b)
print("b_tar: ", b_tar)
print("w: ", w)
print("w_tar", w_tar)
x_pre = np.random.rand(3, x_dim)
y_tar = np.matmul(x_pre, np.transpose([w_tar])) + b_tar
y_pre = np.matmul(x_pre, np.transpose([w])) + b
print("y_pre: ", y_pre)
print("y_tar: ", y_tar)
Outputs:
losses: [ 0 1888 2450 2098 1128 354 59 5 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1]
b: 1.170527138363387
b_tar: 0.894306608050021
w: [0.7625987 0.6027163 0.58350218 0.49854847 0.52451963 0.59963663
0.65156702 0.61188389 0.74257133 0.67164963]
w_tar [0.82757802 0.76593551 0.74074476 0.37049698 0.40177269 0.60734677
0.72304859 0.65733725 0.91989305 0.79020028]
y_pre: [[3.44989377]
[4.77838804]
[3.53541958]]
y_tar: [[3.32865041]
[4.74528037]
[3.42093559]]
As the losses in the output show, the learning rate eta = 3e-3 is still a bit too large, so the loss goes up during the first few iterations, but it starts to drop once the learning rate has decayed to an appropriate value.
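For comparison, here is a vectorized sketch of the same loss and gradient with a smaller, fixed step size, so that no decay schedule is needed. The function name `ridge_gd`, the value `eta=1e-3`, the number of steps, and the seeded random data are assumptions for this illustration, not part of the code above.

```python
import numpy as np


def ridge_gd(x, y, C, eta=1e-3, n_steps=2000):
    """Gradient descent on sum((y - Xw)**2) + C * w.w with a fixed step size."""
    X = np.insert(x, 0, 1.0, axis=1)  # prepend a constant feature for the intercept
    w = np.zeros(X.shape[1])
    losses = []
    for _ in range(n_steps):
        residual = y - X @ w
        losses.append(residual @ residual + C * (w @ w))
        grad = -2 * X.T @ residual + 2 * C * w
        w = w - eta * grad
    return w[1:], w[0], np.array(losses)


# Synthetic data in the same spirit as above (seeded for repeatability).
rng = np.random.default_rng(0)
x = rng.random((100, 10))
w_tar = rng.random(10)
b_tar = 0.5
y = x @ w_tar + b_tar

w, b, losses = ridge_gd(x, y, C=1e-6)
print("first loss:", losses[0])
print("last loss: ", losses[-1])
```

For data of this size and scale the loss is a quadratic whose curvature makes a step of about 3e-3 borderline, which matches the initial rise in the losses above; a step of 1e-3 sits below that limit, so the loss falls from the first iteration on.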
I have a specific analytical gradient I am using to calculate my cost f(x,y), and gradients dx and dy. It runs, but I can't tell if my gradient descent is broken. Should I plot my partial derivatives x and y?
import math
import numpy as np
import matplotlib.pyplot as plt
gamma = 0.00001 # learning rate
iterations = 10000 #steps
theta = np.array([0,5]) #starting value
thetas = []
costs = []
# calculate cost of any point
def cost(theta):
    x = theta[0]
    y = theta[1]
    return 100*x*math.exp(-0.5*x*x+0.5*x-0.5*y*y-y+math.pi)
def gradient(theta):
    x = theta[0]
    y = theta[1]
    dx = 100*math.exp(-0.5*x*x+0.5*x-0.0035*y*y-y+math.pi)*(1+x*(-x + 0.5))
    dy = 100*x*math.exp(-0.5*x*x+0.5*x-0.05*y*y-y+math.pi)*(-y-1)
    gradients = np.array([dx,dy])
    return gradients
#for 2 features
for step in range(iterations):
    theta = theta - gamma*gradient(theta)
    value = cost(theta)
    thetas.append(theta)
    costs.append(value)
thetas = np.array(thetas)
X = thetas[:,0]
Y = thetas[:,1]
Z = np.array(costs)
iterations = [num for num in range(iterations)]
plt.plot(Z)
plt.xlabel("num. iteration")
plt.ylabel("cost")
I strongly recommend checking whether your analytic gradient is correct by first comparing it against a numerical gradient.
I.e., make sure that f'(x) ≈ (f(x+h) - f(x)) / h for some small h.
After that, make sure your updates actually go in the right direction: pick a point where you know x or y should decrease and check the sign of your gradient function's output.
Of course, also make sure your goal is actually minimization rather than maximization.
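A minimal sketch of such a check for the cost function in the question. It uses a central difference, (f(x+h) - f(x-h)) / (2h), a slightly more accurate variant of the formula above. The names `gradient_posted`, `gradient_derived`, and `numerical_gradient`, the test point, and the tolerances are assumptions for this illustration. `gradient_posted` copies the coefficients from the question, where the exponents use -0.0035*y*y and -0.05*y*y while `cost` uses -0.5*y*y; `gradient_derived` differentiates `cost` as written.

```python
import math
import numpy as np


def cost(theta):
    x, y = theta
    return 100 * x * math.exp(-0.5*x*x + 0.5*x - 0.5*y*y - y + math.pi)


def gradient_posted(theta):
    # Coefficients exactly as posted in the question
    x, y = theta
    dx = 100 * math.exp(-0.5*x*x + 0.5*x - 0.0035*y*y - y + math.pi) * (1 + x*(-x + 0.5))
    dy = 100 * x * math.exp(-0.5*x*x + 0.5*x - 0.05*y*y - y + math.pi) * (-y - 1)
    return np.array([dx, dy])


def gradient_derived(theta):
    # Derivative of cost() as written: the same exponent appears in both terms
    x, y = theta
    e = math.exp(-0.5*x*x + 0.5*x - 0.5*y*y - y + math.pi)
    return np.array([100 * e * (1 + x*(-x + 0.5)), 100 * x * e * (-y - 1)])


def numerical_gradient(f, theta, h=1e-6):
    """Central-difference approximation of the gradient of f at theta."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        grad[k] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad


point = np.array([0.5, -0.3])  # arbitrary test point (assumption)
numeric = numerical_gradient(cost, point)
print("numerical:", numeric)
print("posted:   ", gradient_posted(point))
print("derived:  ", gradient_derived(point))
print("posted matches: ", np.allclose(gradient_posted(point), numeric, rtol=1e-3))
print("derived matches:", np.allclose(gradient_derived(point), numeric, rtol=1e-5))
```

Running the comparison at a few different points is cheap and catches both sign errors and mistyped coefficients before any descent loop is involved.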
I am trying to implement gradient descent on a dataset. Whatever I tried, I couldn't make it work, so I created a test case: I run my code on random data and try to debug it.
More specifically, I generate random vectors with entries between 0 and 1 and random labels for these vectors, and then try to overfit the training data.
However, my weight vector gets bigger and bigger with each iteration until I end up with infinities, so I do not actually learn anything. Here is my code:
import numpy as np
import random
def getRandomVector(n):
    return np.random.uniform(0,1,n)
def getVectors(m, n):
    return [getRandomVector(n) for i in range(n)]
def getLabels(n):
    return [random.choice([-1,1]) for i in range(n)]
def GDLearn(vectors, labels):
    maxIterations = 100
    stepSize = 0.01
    w = np.zeros(len(vectors[0])+1)
    for i in range(maxIterations):
        deltaw = np.zeros(len(vectors[0])+1)
        for i in range(len(vectors)):
            temp = np.append(vectors[i], -1)
            deltaw += ( labels[i] - np.dot(w, temp) ) * temp
        w = w + ( stepSize * (-1 * deltaw) )
    return w
vectors = getVectors(100, 30)
labels = getLabels(100)
w = GDLearn(vectors, labels)
print(w)
I am using LMS as the loss function. So, in every iteration, my update is the following,
where w^i is the ith weight vector, R is the stepSize, and E(w^i) is the loss function.
Here is my loss function (LMS),
and here is how I took its derivative:
Now, my questions are:
Should I expect good results in this random scenario using gradient descent? (What are the theoretical bounds?)
If yes, what is the bug in my implementation?
PS: I tried several other maxIterations and stepSize parameters. It is still not working.
PS2: This is the best way I can ask the question here. Sorry if the question is too specific, but it is driving me crazy and I really want to understand the problem.
Your code has a couple of faults:
In the getVectors() method, you never actually use the input variable m.
In the GDLearn() method, you have a double loop but use the same variable i as the loop variable in both loops. (I guess the logic is still right, but it's confusing.)
The prediction error (labels[i] - np.dot(w, temp)) has the wrong sign.
The step size does matter. With a step size of 0.01, the cost increases in each iteration; changing it to 0.001 solved the problem.
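To spell out the sign issue, here is a short derivation. It assumes the LMS loss has the usual sum-of-squares form with a factor of 1/2; that form is an assumption inferred from the code, with x_k standing for the augmented vector `temp` and y_k for `labels[k]`.

```latex
% Assumed LMS loss over the training vectors x_k with labels y_k
E(w) = \frac{1}{2} \sum_{k} \left( y_k - w^\top x_k \right)^2

% Its gradient with respect to w
\nabla E(w) = -\sum_{k} \left( y_k - w^\top x_k \right) x_k

% The original code accumulates the negative of the gradient ...
\mathrm{deltaw} = \sum_{k} \left( y_k - w^\top x_k \right) x_k = -\nabla E(w)

% ... and then negates it again, so the update climbs the loss:
w \leftarrow w + R \, (-\mathrm{deltaw}) = w + R \, \nabla E(w)

% Gradient descent needs the opposite sign:
w \leftarrow w - R \, \nabla E(w) = w - R \sum_{k} \left( w^\top x_k - y_k \right) x_k
```

Under that assumption, the last line is the update that results from defining the prediction error as np.dot(w, temp) - labels[i] and subtracting stepSize * deltaw.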
Here is my revised code based on your original code.
import numpy as np
import random
def getRandomVector(n):
    return np.random.uniform(0,1,n)
def getVectors(m, n):
    return [getRandomVector(n) for i in range(m)]
def getLabels(n):
    return [random.choice([-1,1]) for i in range(n)]
def GDLearn(vectors, labels):
    maxIterations = 100
    stepSize = 0.001
    w = np.zeros(len(vectors[0])+1)
    for iter in range(maxIterations):
        cost = 0
        deltaw = np.zeros(len(vectors[0])+1)
        for i in range(len(vectors)):
            temp = np.append(vectors[i], -1)
            prediction_error = np.dot(w, temp) - labels[i]
            deltaw += prediction_error * temp
            cost += prediction_error**2
        w = w - stepSize * deltaw
        print('cost at', iter, '=', cost)
    return w
vectors = getVectors(100, 30)
labels = getLabels(100)
w = GDLearn(vectors, labels)
print(w)
Running result: you can see that the cost decreases with each iteration, but with diminishing returns.
cost at 0 = 100.0
cost at 1 = 99.4114482617
cost at 2 = 98.8476022685
cost at 3 = 98.2977744556
cost at 4 = 97.7612851154
cost at 5 = 97.2377571222
cost at 6 = 96.7268325883
cost at 7 = 96.2281642899
cost at 8 = 95.7414151147
cost at 9 = 95.2662577529
cost at 10 = 94.8023744037
......
cost at 90 = 77.367904046
cost at 91 = 77.2744249433
cost at 92 = 77.1823702888
cost at 93 = 77.0917090883
cost at 94 = 77.0024111475
cost at 95 = 76.9144470493
cost at 96 = 76.8277881325
cost at 97 = 76.7424064707
cost at 98 = 76.6582748518
cost at 99 = 76.5753667579
[ 0.16232142 -0.2425511 0.35740632 0.22548442 0.03963853 0.19595213
0.20080207 -0.3921798 -0.0238925 0.13097533 -0.1148932 -0.10077534
0.00307595 -0.30111942 -0.17924479 -0.03838637 -0.23938181 0.1384443
0.22929163 -0.0132466 0.03325976 -0.31489526 0.17468025 0.01351012
-0.25926117 0.09444201 0.07637793 -0.05940019 0.20961315 0.08491858
0.07438357]