Coordinate descent in Python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers. Closed 4 years ago.
I want to implement coordinate descent in Python and compare its result with that of gradient descent. I wrote the code below, but it does not work well: gradient descent seems to be OK, but coordinate descent is not.
This is the reference to Coordinate Descent. --- LINK
And I expected this result: (plot: Gradient Descent vs Coordinate Descent)
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(10, 100, 1000)
y = np.linspace(10, 100, 1000)

def func(x, y, param):
    return param[0] * x + param[1] * y

def costf(X, y, param):
    return np.sum((X.dot(param) - y) ** 2) / (2 * len(y))

z = func(x, y, [5, 8]) + np.random.normal(0., 10.)
z = z.reshape(-1, 1)

interc = np.ones(1000)
X = np.concatenate([interc.reshape(-1, 1), x.reshape(-1, 1), y.reshape(-1, 1)], axis=1)

#param = np.random.randn(3).T
param = np.array([2, 2, 2])

def gradient_descent(X, y, param, eta=0.01, iter=100):
    cost_history = [0] * iter
    for iteration in range(iter):
        h = X.dot(param)
        loss = h - y
        gradient = X.T.dot(loss) / (2 * len(y))
        param = param - eta * gradient
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration] = cost
    return param, cost_history

def coordinate_descent(X, y, param, iter=100):
    cost_history = [0] * iter
    for iteration in range(iter):
        for i in range(len(param)):
            dele = np.dot(np.delete(X, i, axis=1), np.delete(param, i, axis=0))
            param[i] = np.dot(X[:, i].T, (y - dele)) / np.sum(np.square(X[:, i]))
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration] = cost
    return param, cost_history

ret, xret = gradient_descent(X, z, param)
cret, cxret = coordinate_descent(X, z, param)

plt.plot(range(len(xret)), xret, label="GD")
plt.plot(range(len(cxret)), cxret, label="CD")
plt.legend()
plt.show()
Thank you in advance.

I'm not an expert in optimization. Here's just some advice.
Your code looks pretty good to me, except that:
1. The following two lines seem wrong:
loss = h - y
param[i] = np.dot(X[:,i].T, (y - dele))/np.sum(np.square(X[:,i]))
You reshaped y to (n_samples, 1) with z = z.reshape(-1, 1), while h and dele have shape (n_samples,). Subtracting them broadcasts into an (n_samples, n_samples) matrix, which I think is not what you wanted.
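To see the broadcasting issue in isolation (a minimal sketch with made-up shapes, not your data):

import numpy as np

h = np.zeros(5)                  # shape (n_samples,), like X.dot(param)
y = np.zeros((5, 1))             # shape (n_samples, 1), like z after reshape(-1, 1)
print((h - y).shape)             # (5, 5): broadcasting, not an elementwise residual
print((h - y.ravel()).shape)     # (5,): the loss you actually want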
2. Your data might not be that good for testing purposes. That doesn't mean it won't work, but you may need some extra effort to tune the learning rate / number of iterations.
I tried your code on the Boston house-prices dataset; the output looks like this: (plot)
The code:
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler

# boston house-prices dataset
data = load_boston()
X, y = data.data, data.target
X = StandardScaler().fit_transform(X)  # for easy convergence
X = np.hstack([X, np.ones((X.shape[0], 1))])
param = np.zeros(X.shape[1])

def gradient_descent(X, y, param, eta=0.01, iter=300):
    cost_history = [0] * (iter + 1)
    cost_history[0] = costf(X, y, param)  # you may want to save the initial cost
    for iteration in range(iter):
        h = X.dot(param)
        loss = h - y.ravel()
        gradient = X.T.dot(loss) / (2 * len(y))
        param = param - eta * gradient
        cost = costf(X, y, param)
        #print(cost)
        cost_history[iteration + 1] = cost
    return param, cost_history

def coordinate_descent(X, y, param, iter=300):
    cost_history = [0] * (iter + 1)
    cost_history[0] = costf(X, y, param)
    for iteration in range(iter):
        for i in range(len(param)):
            dele = np.dot(np.delete(X, i, axis=1), np.delete(param, i, axis=0))
            param[i] = np.dot(X[:, i].T, (y.ravel() - dele)) / np.sum(np.square(X[:, i]))
        cost = costf(X, y, param)
        cost_history[iteration + 1] = cost
    return param, cost_history

ret, xret = gradient_descent(X, y, param)
cret, cxret = coordinate_descent(X, y, param.copy())

plt.plot(range(len(xret)), xret, label="GD")
plt.plot(range(len(cxret)), cxret, label="CD")
plt.legend()
If this is what you want, you can then test on some other data. Just be careful: even if the implementation is right, you still need to tune the hyperparameters to get good convergence.
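For reference, the reason the coordinate-descent loop needs no learning rate is that each inner step minimizes the least-squares cost exactly over one coordinate, with the others held fixed. A sketch of the standard derivation, writing $X_{-i}$ and $\beta_{-i}$ for the design matrix and parameter vector with column/entry $i$ removed:

$$\beta_i \leftarrow \arg\min_{\beta_i} \frac{1}{2n}\,\lVert y - X_{-i}\beta_{-i} - x_i\beta_i\rVert^2 = \frac{x_i^\top \left(y - X_{-i}\beta_{-i}\right)}{x_i^\top x_i},$$

which is exactly what param[i] = np.dot(X[:,i].T, (y - dele)) / np.sum(np.square(X[:,i])) computes.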

Related

Implement gradient descent in python

I am trying to implement gradient descent in Python. My code does return a result, but I think the results I am getting are completely wrong.
Here is the code I have written:
import numpy as np
import pandas

dataset = pandas.read_csv('D:\ML Data\house-prices-advanced-regression-techniques\\train.csv')

X = np.empty((0, 1), int)
Y = np.empty((0, 1), int)
for i in range(dataset.shape[0]):
    X = np.append(X, dataset.at[i, 'LotArea'])
    Y = np.append(Y, dataset.at[i, 'SalePrice'])
X = np.c_[np.ones(len(X)), X]
Y = Y.reshape(len(Y), 1)

def gradient_descent(X, Y, theta, iterations=100, learningRate=0.000001):
    m = len(X)
    for i in range(iterations):
        prediction = np.dot(X, theta)
        theta = theta - (1/m) * learningRate * (X.T.dot(prediction - Y))
    return theta

theta = np.random.randn(2, 1)
theta = gradient_descent(X, Y, theta)
print('theta', theta)
The result I get after running this program is:
theta [[-5.23237458e+228]
[-1.04560188e+233]]
These are very large values. Can someone point out the mistake I have made in the implementation?
Also, a second problem is that I have to set the learning rate very low (in this case 0.000001) for it to work; otherwise the program throws an error.
Please help me diagnose the problem.
Try reducing the learning rate across iterations, otherwise it won't be able to reach the optimal minimum. Try this:
import numpy as np
import pandas

dataset = pandas.read_csv('start.csv')

X = np.empty((0, 1), int)
Y = np.empty((0, 1), int)
for i in range(dataset.shape[0]):
    X = np.append(X, dataset.at[i, 'R&D Spend'])
    Y = np.append(Y, dataset.at[i, 'Profit'])
X = np.c_[np.ones(len(X)), X]
Y = Y.reshape(len(Y), 1)

def gradient_descent(X, Y, theta, iterations=50, learningRate=0.01):
    m = len(X)
    for i in range(iterations):
        prediction = np.dot(X, theta)
        theta = theta - (1/m) * learningRate * (X.T.dot(prediction - Y))
        learningRate /= 10
    return theta

theta = np.random.randn(2, 1)
theta = gradient_descent(X, Y, theta)
print('theta', theta)
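Another common way to avoid having to shrink the learning rate this much is to standardize the feature before running gradient descent, as the Boston example above does with StandardScaler. A minimal sketch along those lines; the file path and column names simply mirror the question's setup and are illustrative only:

import numpy as np
import pandas

# Illustrative: same LotArea / SalePrice columns as in the question.
dataset = pandas.read_csv('train.csv')
X = dataset['LotArea'].to_numpy(dtype=float)
Y = dataset['SalePrice'].to_numpy(dtype=float).reshape(-1, 1)

# Standardize the feature so an ordinary learning rate (e.g. 0.01) converges.
X = (X - X.mean()) / X.std()
X = np.c_[np.ones(len(X)), X]

theta = np.zeros((2, 1))
for _ in range(1000):
    prediction = X.dot(theta)
    theta = theta - 0.01 / len(X) * X.T.dot(prediction - Y)
print('theta', theta)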

My vectorization implementation of gradient descent does not get me the right answer

I'm currently working on Andrew Ng's gradient descent exercise using Python, but I keep getting the wrong optimal theta. I followed this vectorization cheat sheet for gradient descent --- https://medium.com/ml-ai-study-group/vectorized-implementation-of-cost-functions-and-gradient-vectors-linear-regression-and-logistic-31c17bca9181.
Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def cost_func(X, Y, theta):
    m = len(X)
    H = X.dot(theta)
    J = 1/(2*m) * (H - Y).T.dot(H - Y)
    return J

def gradient_descent(X, Y, alpha=0.01, iterations=1500):
    # initializing theta as a zero vector
    theta = np.zeros(X.shape[1])
    # initializing the list of cost function values
    J_list = [cost_func(X, Y, theta)]
    m = len(X)
    while iterations > 0:
        H = X.dot(theta)
        delta = (1/m) * X.T.dot(H - Y)
        theta = theta - alpha * delta
        iterations -= 1
        J_list.append(cost_func(X, Y, theta))
    return theta, J_list

def check_convergence(J_list):
    plt.plot(range(len(J_list)), J_list)
    plt.xlabel('Iterations')
    plt.ylabel('Cost J')
    plt.show()

file_name_1 = 'https://raw.githubusercontent.com/kaleko/CourseraML/master/ex1/data/ex1data1.txt'
df1 = pd.read_csv(file_name_1, header=None)

X = df1.values[:, 0]
Y = df1.values[:, 1]
m = len(X)
X = np.column_stack((np.ones(m), X))

theta_optimal, J_list = gradient_descent(X, Y, 0.01, 1500)
print(theta_optimal)
check_convergence(J_list)
My theta output is [-3.63029144 1.16636235], which is incorrect.
Here is my cost function graph (plot). As you can see, it converges far too quickly.
The correct graph should look like this: (plot)
Thank you.

Shape mismatch error with scipy.optimize.minimize for logistic regression

I am going through Andrew Ng's ML course, and I am trying to implement the programs in Python. For the second exercise, on logistic regression, I am trying to use scipy.optimize.minimize to optimize the cost function. My code is as follows.
import os
import numpy as np
from scipy.special import expit
from scipy import optimize

datafile1 = os.path.join('data', 'ex2data1.txt')
data1 = np.loadtxt(datafile1, delimiter=',')
exam_scores, results = data1[:, :2], data1[:, 2]
m, n = exam_scores.shape
exam_scores = np.concatenate([np.ones([m, 1]), exam_scores], axis=1)

def cost_function(x, y, theta):
    m = len(y)
    hypothesis = expit(np.dot(x, theta))
    term1 = -np.dot(y.T, np.log(hypothesis)) / m
    term2 = -np.dot((1 - y).T, np.log(1 - hypothesis)) / m
    cost = term1 + term2
    return cost

def gradient(x, y, theta):
    m = len(y)
    hypothesis = expit(np.dot(x, theta))
    return np.dot(hypothesis - y, x) / m

def minimize_cost(x, y, theta):
    output = optimize.minimize(cost_function, theta, args=(x, y),
                               jac=gradient, options={'maxiter': 400})
    return output.fun, output.x

theta = np.zeros(n + 1)
theta, cost = minimize_cost(exam_scores, results, theta)
This gives me
<ipython-input-42-e2ba65cce1d8> in gradient(x, y, theta)
      9 def gradient(x, y, theta):
     10     m = len(y)
---> 11     hypothesis = expit(np.dot(x, theta))
     12     return np.dot(hypothesis - y, x) / m

ValueError: shapes (3,) and (100,) not aligned: 3 (dim 0) != 100 (dim 0).
However, the shape of theta and the shape of the gradient function's output are the same, i.e. theta.shape == gradient(exam_scores, results, theta).shape gives me True.
I do not understand why the gradient function raises a ValueError when called from minimize, since by itself it gives the expected output.
Any pointers would be appreciated.
P.S. Here is a part of the data.
exam_scores[:5, :]
array([[34.62365962, 78.02469282],
       [30.28671077, 43.89499752],
       [35.84740877, 72.90219803],
       [60.18259939, 86.3085521 ],
       [79.03273605, 75.34437644]])

results.reshape(m, 1)[:5, :]
array([[0.],
       [0.],
       [0.],
       [1.],
       [1.]])
Edit: Added part of the data.
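For context on why a shape mismatch can appear only when the functions are called through minimize: scipy.optimize.minimize(fun, x0, args=(...)) invokes fun(x, *args) with the current parameter vector x as the first positional argument, and it calls jac the same way. This is the documented SciPy calling convention, not a fix specific to this exercise. A minimal, self-contained sketch with the signatures written in that order (toy least-squares objective, random data):

from scipy import optimize
import numpy as np

def fun(theta, x, y):
    # theta is always the first argument minimize passes in
    return float(np.sum((x.dot(theta) - y) ** 2))

def jac(theta, x, y):
    return 2 * x.T.dot(x.dot(theta) - y)

x = np.random.rand(100, 3)
y = np.random.rand(100)
res = optimize.minimize(fun, np.zeros(3), args=(x, y), jac=jac)
print(res.x.shape)  # (3,)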

Simple regression works with randn but not with random

Last night I wrote some simple binary logistic regression Python code.
It seems to be working correctly (the likelihood increases with each iteration, and I get good classification results).
My problem is that I can only initialize my weights with W = np.random.randn(n+1, 1), i.e. a normal distribution.
But I don't want a normal distribution, I want a uniform distribution. When I do that, I get the error
"RuntimeWarning: divide by zero encountered in log
return np.dot(Y.T, np.log(predictions)) + np.dot((onesVector - Y).T, np.log(onesVector - predictions))"
This is my code:
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1/(1+np.exp(-x))

def predict(X, W):
    return sigmoid(np.dot(X, W))

def logLikelihood(X, Y, W):
    m = X.shape[0]
    predictions = predict(X, W)
    onesVector = np.ones((m, 1))
    return np.dot(Y.T, np.log(predictions)) + np.dot((onesVector - Y).T, np.log(onesVector - predictions))

def gradient(X, Y, W):
    return np.dot(X.T, Y - predict(X, W))

def successRate(X, Y, W):
    m = Y.shape[0]
    predictions = predict(X, W) > 0.5
    correct = (Y == predictions)
    return 100 * np.sum(correct)/float(correct.shape[0])

trX = np.load("binaryMnistTrainX.npy")
trY = np.load("binaryMnistTrainY.npy")
teX = np.load("binaryMnistTestX.npy")
teY = np.load("binaryMnistTestY.npy")

m, n = trX.shape
trX = np.concatenate((trX, np.ones((m, 1))), axis=1)
teX = np.concatenate((teX, np.ones((teX.shape[0], 1))), axis=1)

W = np.random.randn(n+1, 1)
learningRate = 0.00001
numIter = 500
likelihoodArray = np.zeros((numIter, 1))

for i in range(0, numIter):
    W = W + learningRate * gradient(trX, trY, W)
    likelihoodArray[i, 0] = logLikelihood(trX, trY, W)

print("train success rate is %lf" % (successRate(trX, trY, W)))
print("test success rate is %lf" % (successRate(teX, teY, W)))

plt.plot(likelihoodArray)
plt.show()
If I initialize W with zeros or with randn, then it works.
If I initialize it with random (uniform, not normal) or with ones, then I get the division-by-zero warning.
Why does this happen and how can I fix it?
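One general safeguard, independent of this particular dataset: if the sigmoid saturates to exactly 0.0 or 1.0 in floating point, np.log of 0 produces -inf and the divide-by-zero warning above. Clipping the predictions away from 0 and 1 keeps the log-likelihood finite. A minimal sketch of such a guarded log-likelihood, using the question's variable names:

import numpy as np

def logLikelihood(X, Y, W, eps=1e-12):
    # clip so that log(p) and log(1 - p) are always finite
    predictions = np.clip(1 / (1 + np.exp(-X.dot(W))), eps, 1 - eps)
    return np.dot(Y.T, np.log(predictions)) + np.dot((1 - Y).T, np.log(1 - predictions))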

Trying to plot a simple function - python

I implemented a simple linear regression and I want to try it out by fitting a non-linear model.
Specifically, I am trying to fit a model for the function y = x^3 + 5, for example.
This is my code:
import numpy as np
import numpy.matlib
import matplotlib.pyplot as plt

def predict(X, W):
    return np.dot(X, W)

def gradient(X, Y, W, regTerm=0):
    return (-np.dot(X.T, Y) + np.dot(np.dot(X.T, X), W))/(m*k) + regTerm * W/(n*k)

def cost(X, Y, W, regTerm=0):
    m, k = Y.shape
    n, k = W.shape
    Yhat = predict(X, W)
    return np.trace(np.dot(Y-Yhat, (Y-Yhat).T))/(2*m*k) + regTerm * np.trace(np.dot(W, W.T))/(2*n*k)

def Rsquared(X, Y, W):
    m, k = Y.shape
    SSres = cost(X, Y, W)
    Ybar = np.mean(Y, axis=0)
    Ybar = np.matlib.repmat(Ybar, m, 1)
    SStot = np.trace(np.dot(Y-Ybar, (Y-Ybar).T))
    return 1 - SSres/SStot

m = 10
n = 200
k = 1

trX = np.random.rand(m, n)
trX[:, 0] = 1
for i in range(2, n):
    trX[:, i] = trX[:, 1] ** i
trY = trX[:, 1] ** 3 + 5
trY = np.reshape(trY, (m, k))

W = np.random.rand(n, k)
numIter = 10000
learningRate = 0.5
for i in range(0, numIter):
    W = W - learningRate * gradient(trX, trY, W)

domain = np.linspace(0, 1, 100000)
powerDomain = np.copy(domain)
m = powerDomain.shape[0]
powerDomain = np.reshape(powerDomain, (m, 1))
powerDomain = np.matlib.repmat(powerDomain, 1, n)
for i in range(1, n):
    powerDomain[:, i] = powerDomain[:, 0] ** i

print(Rsquared(trX, trY, W))
plt.plot(trX[:, 1], trY, 'o', domain, predict(powerDomain, W), 'r')
plt.show()
The R^2 I'm getting is very close to 1, meaning I found a very good fit to the training data, but that is not what the plot shows. When I plot the data, it usually looks like this: (plot)
It looks as if I'm underfitting the data. But with such a complex hypothesis, 200 features (meaning I allow polynomials up to x^200) and only 10 training examples, I should very clearly be overfitting, so I expect the red line to pass through all the blue points and go wild between them.
This isn't what I'm getting, which is confusing to me.
What's wrong?
You forgot to set powerDomain[:, 0] = 1; that's why your plot goes wrong at 0. And yes, you are overfitting: look how quickly your plot fires up as soon as you get outside your training domain.
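In other words, a minimal sketch of the suggested fix, reusing the question's variable names: the intercept column of the plotting matrix has to be all ones, just like trX[:, 0] in the training matrix.

import numpy as np
import numpy.matlib

n = 200
domain = np.linspace(0, 1, 100000)
powerDomain = np.matlib.repmat(domain.reshape(-1, 1), 1, n)
for i in range(1, n):
    powerDomain[:, i] = domain ** i
powerDomain[:, 0] = 1  # intercept column, matching trX[:, 0] = 1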
