How to add L1 regularization in Python?

I am trying to code logistic regression from scratch. In the code I have, I thought my cost derivative was my regularization, but I've been tasked with adding L1-norm regularization. How do you add this in Python? Should it be added where I have defined the cost derivative? Any help in the right direction is appreciated.
import numpy as np

def Sigmoid(z):
    return 1 / (1 + np.exp(-z))

def Hypothesis(theta, X):
    return Sigmoid(X @ theta)

def Cost_Function(X, Y, theta, m):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/float(m) * np.sum(-_y * np.log(hi) - (1 - _y) * np.log(1 - hi))
    return J

def Cost_Function_Derivative(X, Y, theta, m, alpha):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = alpha/float(m) * X.T @ (hi - _y)
    return J

def Gradient_Descent(X, Y, theta, m, alpha):
    new_theta = theta - Cost_Function_Derivative(X, Y, theta, m, alpha)
    return new_theta

def Accuracy(theta):
    length = len(X_test)
    prediction = (Hypothesis(theta, X_test) > 0.5)
    _y = Y_test.reshape(-1, 1)
    correct = prediction == _y
    my_accuracy = (np.sum(correct) / length) * 100
    print('LR Accuracy: ', my_accuracy, "%")

def Logistic_Regression(X, Y, alpha, theta, num_iters):
    m = len(Y)
    for x in range(num_iters):
        new_theta = Gradient_Descent(X, Y, theta, m, alpha)
        theta = new_theta
        if x % 100 == 0:
            print('theta: ', theta)
            print('cost: ', Cost_Function(X, Y, theta, m))
    Accuracy(theta)

ep = .012
initial_theta = np.random.rand(X_train.shape[1], 1) * 2 * ep - ep
alpha = 0.5
iterations = 10000
Logistic_Regression(X_train, Y_train, alpha, initial_theta, iterations)

Regularization adds a term to the cost function so that there is a compromise between minimizing the cost and keeping the model parameters small, which reduces overfitting. You control how much of a compromise you want with a scalar factor e on the regularization term.
So just add the L1 norm of theta to the original cost function:
J = J + e * np.sum(abs(theta))
Since this term is added to the cost function, it must also be considered when computing the gradient of the cost function.
This is simple, since the derivative of a sum is the sum of the derivatives. So we just need to figure out the derivative of the term sum(abs(theta)). Since abs is piecewise linear, its derivative is piecewise constant: it is 1 if theta >= 0 and -1 if theta < 0 (mathematically the derivative is undefined at 0, but we don't care about that here).
So in the function Cost_Function_Derivative we add:
J = J + alpha * e * np.where(theta >= 0, 1.0, -1.0)
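Putting both pieces together, a minimal sketch of the regularized functions might look as follows (the _L1 names and the extra e parameter are mine; everything else keeps the question's conventions):
def Cost_Function_L1(X, Y, theta, m, e):
    # original cross-entropy cost plus the L1 penalty on theta
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/float(m) * np.sum(-_y * np.log(hi) - (1 - _y) * np.log(1 - hi))
    return J + e * np.sum(abs(theta))

def Cost_Function_Derivative_L1(X, Y, theta, m, alpha, e):
    # gradient of the cost plus the (sub)gradient of the L1 term,
    # both scaled by the learning rate alpha as in the question's code
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = alpha/float(m) * X.T @ (hi - _y)
    return J + alpha * e * np.where(theta >= 0, 1.0, -1.0)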

Related

Why does my gradient descent algorithm for linear regression diverge?

Gradient descent for linear regression diverges even if I decrease the value of alpha or remove it entirely!
def gradientDescent(sqft_living, price, theta, alpha, num_iters):
    m = price.size
    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()
    J_history = []  # use a python list to save cost in every iteration
    theta_history = []
    for z in range(num_iters):
        term = 0
        term0 = 0
        for i in range(m):
            h = np.dot(theta, sqft_living[i])
            term += (h - price[i])
            term0 += term * sqft_living[i][0]
        temp0 = theta[0] - alpha * (1/m) * term0
        term = 0
        term1 = 0
        for i in range(m):
            h = np.dot(theta, sqft_living[i])
            term += (h - price[i])
            term1 += term * sqft_living[i][1]
        temp1 = theta[1] - alpha * (1/m) * term1
        theta[0] = temp0
        theta[1] = temp1
        cost = computeCost(sqft_living, price, theta)
        print(cost)
        J_history.append(cost)
        theta_history.append(theta)
    return theta, J_history
This is how the algorithm is used:
# initialize fitting parameters
theta = np.zeros(2)
# some gradient descent settings
iterations = 100
alpha = 0.00001
theta, J_history = gradientDescent(sqft_living, price, theta, alpha, iterations)
print(J_history)
This is part of the output:
cost = 3.8144815615142405e+22
cost = 8.337226954930875e+33
cost = 1.8222489338606256e+45
cost = 3.982848487760674e+56
cost = 8.705222311669148e+67
cost = 1.9026808508648173e+79
cost = 4.158646718757122e+90
cost = 9.089460549082665e+101
cost = 1.9866629377457605e+113
cost = 4.342204476162054e+124
cost = 9.49065860875058e+135
cost = 2.074351894810886e+147
cost = 4.533864256311113e+158

Neural Networks Using Python and NumPy

I am a newbie to NNs and I am trying to implement a NN with Python/NumPy from the code I found in "Create a Simple Neural Network in Python from Scratch".
My input array is:
array([[5.71, 5.77, 5.94],
       [5.77, 5.94, 5.51],
       [5.94, 5.51, 5.88],
       [5.51, 5.88, 5.73]])
Output array is:
array([[5.51],
       [5.88],
       [5.73],
       [6.41]])
After running the code, I see the following results, which are not correct:
synaptic_weights after training
[[1.90625275]
[2.54867698]
[1.07698312]]
outputs after training
[[1.]
[1.]
[1.]
[1.]]
Here is the core of the code:
for iteration in range(1000):
    input_layer = tr_input
    outputs = sigmoid(np.dot(input_layer, synaptic_weights))
    error = tr_output - outputs
    adjustments = error * sigmoid_derivative(outputs)
    synaptic_weights += np.dot(input_layer.T, adjustments)
print('synaptic_weights after training')
print(synaptic_weights)
print('outputs after training')
print(outputs)
What should I change in this code so it works for my data? Or shall I take different method? Any help is highly appreciated.
That's because you are using the wrong activation function (i.e. sigmoid). The main reason we use the sigmoid function is that its output lies between 0 and 1, so it is used above all in models where we have to predict a probability as the output. Since a probability only exists in the range 0 to 1, sigmoid is the right choice there.
If you want to train a model to predict the values in your array, you should use a regression model. Otherwise, you can convert your output into labels (for example, map 5.x to 0 and 6.x to 1) and retrain your model.
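As a rough sketch of the regression route, keeping the question's training loop but dropping the sigmoid on the output (the learning rate lr is my addition; tr_input, tr_output, and synaptic_weights are assumed to be set up as in the original tutorial):
lr = 0.001  # small learning rate, needed once the targets are no longer squashed into (0, 1)
for iteration in range(1000):
    input_layer = tr_input
    outputs = np.dot(input_layer, synaptic_weights)        # linear output instead of sigmoid
    error = tr_output - outputs                            # plain regression residual
    synaptic_weights += lr * np.dot(input_layer.T, error)  # gradient step on the squared error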
These are the steps involved in my neural network implementation:
Randomly initialize the weights (θ, theta)
Implement forward propagation
Compute the cost function
Implement back propagation to compute the partial derivatives
Use gradient descent
def forward_prop(X, theta_list):
    m = X.shape[0]
    a_list = []
    z_list = []
    a_list.append(np.insert(X, 0, values=np.ones(m), axis=1))
    for idx, theta in enumerate(theta_list):
        z_list.append(a_list[idx] * (theta_list[idx].T))
        if idx != (len(theta_list) - 1):
            a_list.append(np.insert(sigmoid(z_list[idx]), 0, values=np.ones(m), axis=1))
        else:
            a_list.append(sigmoid(z_list[idx]))
    return a_list, z_list

def back_prop(params, input_size, hidden_layers, num_labels, X, y, regularization, regularize):
    m = X.shape[0]
    X = np.matrix(X)
    y = np.matrix(y)
    theta_list = []
    startCount = 0
    for idx, val in enumerate(hidden_layers):
        if idx == 0:
            startCount = val * (input_size + 1)
            theta_list.append(np.matrix(np.reshape(params[:startCount], (val, (input_size + 1)))))
        if idx != 0:
            tempCount = startCount
            startCount += (val * (hidden_layers[idx-1] + 1))
            theta_list.append(np.matrix(np.reshape(params[tempCount:startCount], (val, (hidden_layers[idx-1] + 1)))))
        if idx == (len(hidden_layers) - 1):
            theta_list.append(np.matrix(np.reshape(params[startCount:], (num_labels, (val + 1)))))
    a_list, z_list = forward_prop(X, theta_list)
    J = cost(X, y, a_list[len(a_list)-1], theta_list, regularization, regularize)
    d_list = []
    d_list.append(a_list[len(a_list)-1] - y)
    idx = 0
    while idx < (len(theta_list) - 1):
        d_temp = np.multiply(d_list[idx] * theta_list[len(a_list) - 2 - idx], sigmoid_gradient(a_list[len(a_list) - 2 - idx]))
        d_list.append(d_temp[:, 1:])
        idx += 1
    delta_list = []
    for theta in theta_list:
        delta_list.append(np.zeros(theta.shape))
    for idx, delta in enumerate(delta_list):
        delta_list[idx] = delta_list[idx] + ((d_list[len(d_list) - 1 - idx].T) * a_list[idx])
        delta_list[idx] = delta_list[idx] / m
    if regularize:
        for idx, delta in enumerate(delta_list):
            delta_list[idx][:, 1:] = delta_list[idx][:, 1:] + (theta_list[idx][:, 1:] * regularization)
    grad_list = np.ravel(delta_list[0])
    idx = 1
    while idx < (len(delta_list)):
        grad_list = np.concatenate((grad_list, np.ravel(delta_list[idx])), axis=None)
        idx += 1
    return J, grad_list

def cost(X, y, h, theta_list, regularization, regularize):
    m = X.shape[0]
    X = np.matrix(X)
    y = np.matrix(y)
    J = (np.multiply(-y, np.log(h)) - np.multiply((1 - y), np.log(1 - h))).sum() / m
    if regularize:
        regularization_value = 0.0
        for theta in theta_list:
            regularization_value += np.sum(np.power(theta[:, 1:], 2))
        J += (float(regularization) / (2 * m)) * regularization_value
    return J
Implementation
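The code above covers the middle three steps, so here is a hedged sketch of how the whole thing might be driven. The sigmoid helpers, the layer sizes, the training data X and y, and the use of scipy.optimize.minimize are my assumptions, not part of the original answer:
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    # assumed helper, referenced but not shown above
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(a):
    # back_prop calls this on activations, so express it as a * (1 - a)
    return np.multiply(a, (1 - a))

# hypothetical sizes: 400 input features, two hidden layers of 25 units, 10 classes
input_size, hidden_layers, num_labels = 400, [25, 25], 10
n_params = (hidden_layers[0] * (input_size + 1)
            + hidden_layers[1] * (hidden_layers[0] + 1)
            + num_labels * (hidden_layers[1] + 1))

# step 1: random initialization, breaking symmetry with small weights
eps = 0.12
params = np.random.rand(n_params) * 2 * eps - eps

# steps 2-5: back_prop returns (cost, gradient), so a gradient-based
# optimizer can consume it directly (jac=True means "fun returns the gradient too")
result = minimize(back_prop, params,
                  args=(input_size, hidden_layers, num_labels, X, y, 1.0, True),
                  method='TNC', jac=True, options={'maxiter': 250})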

How to vectorize Logistic Regression?

I'm trying to implement regularized logistic regression in Python for the Coursera ML class, but I'm having a lot of trouble vectorizing it. Using this repository:
I've tried many different ways but never get the correct gradient or cost. Here's my current implementation:
h = utils.sigmoid( np.dot(X, theta) )
J = (-1/m) * ( y.T.dot( np.log(h) ) + (1 - y.T).dot( np.log( 1 - h ) ) ) + ( lambda_/(2*m) ) * np.sum( np.square(theta[1:]) )
grad = ((1/m) * (h - y).T.dot( X )).T + grad_theta_reg
Here are the results:
Cost: 0.693147
Expected cost: 2.534819
Gradients:
[-0.100000, -0.030000, -0.080000, -0.130000]
Expected gradients:
[0.146561, -0.548558, 0.724722, 1.398003]
Any help from someone who knows what's going on would be much appreciated.
Below is a working snippet of a vectorized version of logistic regression. You can see more here: https://github.com/hzitoun/coursera_machine_learning_matlab_python
Main
theta_t = np.array([[-2], [-1], [1], [2]])
data = np.arange(1, 16).reshape(3, 5).T
X_t = np.c_[np.ones((5, 1)), data/10]
y_t = (np.array([[1], [0], [1], [0], [1]]) >= 0.5) * 1
lambda_t = 3
J, grad = lrCostFunction(theta_t, X_t, y_t, lambda_t), lrGradient(theta_t, X_t, y_t, lambda_t, flattenResult=False)
print('\nCost: %f\n' % J)
print('Expected cost: 2.534819\n')
print('Gradients:\n')
print(grad)
print('Expected gradients:\n')
print(' 0.146561\n -0.548558\n 0.724722\n 1.398003\n')
lrCostFunction
from sigmoid import sigmoid
import numpy as np

def lrCostFunction(theta, X, y, reg_lambda):
    """LRCOSTFUNCTION Computes the cost for logistic regression with
    regularization.

    J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
    theta as the parameter for regularized logistic regression.
    """
    m, n = X.shape  # number of training examples and features
    theta = theta.reshape((n, 1))
    prediction = sigmoid(X.dot(theta))
    cost_y_1 = (1 - y) * np.log(1 - prediction)
    cost_y_0 = -1 * y * np.log(prediction)
    J = (1.0/m) * np.sum(cost_y_0 - cost_y_1) + (reg_lambda/(2.0 * m)) * np.sum(np.power(theta[1:], 2))
    return J
lrGradient
from sigmoid import sigmoid
import numpy as np

def lrGradient(theta, X, y, reg_lambda, flattenResult=True):
    m, n = X.shape
    theta = theta.reshape((n, 1))
    prediction = sigmoid(np.dot(X, theta))
    errors = np.subtract(prediction, y)
    grad = (1.0/m) * np.dot(X.T, errors)
    grad_with_regul = grad[1:] + (reg_lambda/m) * theta[1:]
    firstRow = grad[0, :].reshape((1, 1))
    grad = np.r_[firstRow, grad_with_regul]
    if flattenResult:
        return grad.flatten()
    return grad
Hope that helped!

Multivariate Regression Numpy for Math Homework

I'm looking to use multivariate regression with least squares as my cost function to find a, b, c for ax^2 + bx + c that best fits cos(x) on (-2, 2). My cost won't decrease but is ridiculously high. What am I doing wrong?
x = np.linspace(-2, 2, 100)
y = np.cos(x)
theta = np.random.random((3, 1))
m = len(y)
for i in range(10000):
    # calculate my y_hat
    y_hat = np.array([(theta[0]*(a**2) + theta[1]*a + theta[2]) for a in x])
    # calculate my cost based off y_hat and y
    cost = np.sum((y_hat - y) ** 2) * (1/m)
    # calculate my derivatives based off y_hat and x
    da = (2 / m) * np.sum((y_hat - y) * (x**2))
    db = (2 / m) * np.sum((y_hat - y) * (x))
    dc = (2 / m) * np.sum((y_hat - y))
    # update step
    theta[0] = theta[0] - 0.0001*(da)
    theta[1] = theta[1] - 0.0001*(db)
    theta[2] = theta[2] - 0.0001*(dc)
    print("Epoch Num: {} Cost: {}".format(i, cost))
print(theta)
Your calculation of y_hat is slightly incorrect: it's currently a 2D array of shape (100, 1), so y_hat - y broadcasts to a (100, 100) array instead of an elementwise difference.
This should help. It pulls the zeroth element from each of the rows:
theta_ = [(theta[0]*(a**2) + theta[1]*a + theta[2]) for a in x]
y_hat = np.array([t[0] for t in theta_])
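Alternatively (my suggestion, not part of the original answer), the polynomial can be evaluated in a fully vectorized way, which sidesteps the shape problem entirely:
# theta has shape (3, 1); flattening it gives three scalars,
# so y_hat comes out with the same shape (100,) as y
a, b, c = theta.flatten()
y_hat = a * x**2 + b * x + c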

Implementing naive gradient descent in python

I'm trying to implement a very naive gradient descent in python. However, it looks like it goes into an infinite loop. Could you please help me debug it?
y = lambda x: x**2
dy_dx = lambda x: 2*x

def gradient_descent(function, derivative, initial_guess):
    optimum = initial_guess
    while derivative(optimum) != 0:
        optimum = optimum - derivative(optimum)
    else:
        return optimum

gradient_descent(y, dy_dx, 5)
Edit:
Now I have this code, and I really can't comprehend the output. P.S. It might freeze your CPU.
y = lambda x: x**2
dy_dx = lambda x: 2*x

def gradient_descent(function, derivative, initial_guess):
    optimum = initial_guess
    while abs(derivative(optimum)) > 0.01:
        optimum = optimum - 2*derivative(optimum)
        print((optimum, derivative(optimum)))
    else:
        return optimum

gradient_descent(y, dy_dx, 5)
Now I'm trying to apply it to a regression problem, however the output doesn't appear to be correct:
[Plot: output of the gradient descent code below]
import matplotlib.pyplot as plt

def stepGradient(x, y, step):
    b_current = 0
    m_current = 0
    b_gradient = 0
    m_gradient = 0
    N = int(len(x))
    for i in range(0, N):
        b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
        m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    while abs(b_gradient) > 0.01 and abs(m_gradient) > 0.01:
        b_current = b_current - (step * b_gradient)
        m_current = m_current - (step * m_gradient)
        for i in range(0, N):
            b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
            m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    return [b_current, m_current]

x = [1, 2, 2, 3, 4, 5, 7, 8]
y = [1.5, 3, 1, 3, 2, 5, 6, 7]
step = 0.00001
(b, m) = stepGradient(x, y, step)
plt.scatter(x, y)
abline_values = [m * i + b for i in x]
plt.plot(x, abline_values, 'b')
plt.show()
Fixed :D
import matplotlib.pyplot as plt

def stepGradient(x, y):
    step = 0.001
    b_current = 0
    m_current = 0
    b_gradient = 0
    m_gradient = 0
    N = int(len(x))
    for i in range(0, N):
        b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
        m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    while abs(b_gradient) > 0.01 or abs(m_gradient) > 0.01:
        b_current = b_current - (step * b_gradient)
        m_current = m_current - (step * m_gradient)
        b_gradient = 0
        m_gradient = 0
        for i in range(0, N):
            b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
            m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    return [b_current, m_current]

x = [1, 2, 2, 3, 4, 5, 7, 8, 10]
y = [1.5, 3, 1, 3, 2, 5, 6, 7, 20]
(b, m) = stepGradient(x, y)
plt.scatter(x, y)
abline_values = [m * i + b for i in x]
plt.plot(x, abline_values, 'b')
plt.show()
Your while loop stops only when a calculated floating-point value equals zero. This is naïve, since floating-point values are rarely calculated exactly. Instead, stop the loop when the calculated value is close enough to zero. Use something like
while abs(derivative(optimum)) > eps:
where eps is the desired precision of the calculated value. This could be made another parameter, perhaps with a default value of 1e-10 or some such.
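For instance, a sketch of that suggestion (note that the update step itself still has the cycling problem described next, so this alone does not fix the code):
def gradient_descent(function, derivative, initial_guess, eps=1e-10):
    optimum = initial_guess
    # stop once the derivative is close enough to zero, not exactly zero
    while abs(derivative(optimum)) > eps:
        optimum = optimum - derivative(optimum)
    return optimum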
That said, the problem in your case is worse. Your algorithm is far too naïve in assuming that the calculation
optimum = optimum - 2*derivative(optimum)
will move the value of optimum closer to the actual optimum value. With your original update, optimum = optimum - derivative(optimum), the variable optimum just cycles back and forth between 5 (your initial guess) and -5: for y = x**2 the update computes x - 2x = -x, which flips the sign every iteration. With the extra factor of 2 it is even worse, computing x - 4x = -3x, so the iterates 5, -15, 45, ... diverge outright.
So you need to avoid such cycling. You could multiply your delta 2*derivative(optimum) by something smaller than 1, which would work in your particular case y=x**2. But this will not work in general.
To be completely safe, 'bracket' your optimum point with a smaller value and a larger value, and use the derivative to find the next guess. But ensure that your next guess does not go outside the bracketed interval. If it does, or if the convergence of your guesses is too slow, use another method such as bisection or golden-section search.
Of course, this means your 'very naïve gradient descent' algorithm is too naïve to work in general. That's why real optimization routines are more complicated.
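For illustration, a minimal bisection sketch (my code, not the answer author's): the minimum of y = x**2 sits where the derivative changes sign, so bisecting on the sign of the derivative inside a bracket [lo, hi] can never escape the interval:
def bisect_minimum(derivative, lo, hi, eps=1e-10):
    # assumes derivative(lo) < 0 < derivative(hi), i.e. the bracket contains the minimum
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if derivative(mid) > 0:
            hi = mid   # minimum lies to the left of mid
        else:
            lo = mid   # minimum lies to the right of mid
    return (lo + hi) / 2

print(bisect_minimum(lambda x: 2*x, -5, 5))  # ~0.0 for y = x**2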
You also need to decrease your step size (gamma in the gradient descent formula):
y = lambda x: x**2
dy_dx = lambda x: 2*x

def gradient_descent(function, derivative, initial_guess):
    optimum = initial_guess
    while abs(derivative(optimum)) > 0.01:
        optimum = optimum - 0.01*derivative(optimum)
        print((optimum, derivative(optimum)))
    else:
        return optimum
