I'm looking to use multivariate regression with least squares as my cost function to find a, b, c such that ax^2 + bx + c best fits cos(x) on (-2, 2). My cost won't decrease and is ridiculously high. What am I doing wrong?
import numpy as np

x = np.linspace(-2,2,100)
y = np.cos(x)
theta = np.random.random((3,1))
m = len(y)
for i in range(10000):
    #Calculate my y_hat
    y_hat = np.array([(theta[0]*(a**2) + theta[1]*a + theta[2]) for a in x])
    #Calculate my cost based off y_hat and y
    cost = np.sum((y_hat - y) ** 2) * (1/m)
    #Calculate my derivatives based off y_hat and x
    da = (2 / m) * np.sum((y_hat - y) * (x**2))
    db = (2 / m) * np.sum((y_hat - y) * (x))
    dc = (2 / m) * np.sum((y_hat - y))
    #update step
    theta[0] = theta[0] - 0.0001*(da)
    theta[1] = theta[1] - 0.0001*(db)
    theta[2] = theta[2] - 0.0001*(dc)
    print("Epoch Num: {} Cost: {}".format(i, cost))
print(theta)
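Your calculation of y_hat is slightly incorrect. Because theta has shape (3,1), each theta[k] is a length-1 array, so y_hat ends up as a 2D array of shape (100,1). Subtracting y, which has shape (100,), then broadcasts to a (100,100) array, and that corrupts both the cost and the derivatives.
This should help. It pulls the zeroth element from each of the rows: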
theta_ = [(theta[0]*(a**2) + theta[1]*a + theta[2]) for a in x]
y_hat = np.array([t[0] for t in theta_])
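An alternative that avoids the shape problem altogether is to keep theta one-dimensional and compute everything with a design matrix. The following is only a sketch under my own assumptions: the function name fit_quadratic, the learning rate 0.05, the epoch count and the fixed seed are illustrative choices, not part of the original question.

```python
import numpy as np

def fit_quadratic(x, y, lr=0.05, epochs=5000, seed=0):
    """Fit y ~ a*x**2 + b*x + c by batch gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    theta = rng.random(3)                              # 1D parameter vector [a, b, c]
    A = np.column_stack([x**2, x, np.ones_like(x)])    # design matrix, shape (m, 3)
    m = len(y)
    costs = []
    for _ in range(epochs):
        residual = A @ theta - y                       # shape (m,), no broadcasting surprises
        costs.append(np.mean(residual**2))
        grad = (2 / m) * (A.T @ residual)              # gradient of the mean squared error
        theta -= lr * grad
    return theta, costs

x = np.linspace(-2, 2, 100)
y = np.cos(x)
theta, costs = fit_quadratic(x, y)
print("a, b, c:", theta)
print("first cost:", costs[0], "last cost:", costs[-1])
```

Comparing the result with np.polyfit(x, y, 2) is a quick sanity check, since both minimize the same squared error.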
I wrote the program below in Python hoping to perform a Helmholtz decomposition of a vector V(x,z)=[f(x,z),0,0], where f(x,z) is a function defined in the code. The aim of this program is to get the solenoidal and harmonic parts of V as S(x,z)=[S1(x,z),S2(x,z),S3(x,z)] and H(x,z)=[H1(x,z),H2(x,z),H3(x,z)], with S and H satisfying the condition V=S+H, which translates to (S1+H1=f, S2+H2=0, S3+H3=0).
Please help, I can't get anywhere with this problem. The output of the code below isn't what I wanted; it is the following:
Solenoidal:
[[-22.6179559436889 + 41.14742726254I, 33.243161684442 - 99.9416505604629I, -22.6179559436889 + 41.14742726254I], [0.000151144774536593 + 0.000222403457962539I, 0, -0.000151144774536593 - 0.000222403457962539I], [22.6210744289585 - 41.1540953247099I, -41.2442631673893 + 88.1909008014634I, 6.6295316668479 - 64.6849359328842I]]
Harmonic:
[[26.6155393446675 - 35.2651619174123I, -33.243161684442 + 99.9416505604629I, 18.6203725427103 - 47.0296926076676I], [-0.000151144774536593 - 0.000222403457962539I, 0, 0.000151144774536593 + 0.000222403457962539I], [-18.6231887384308 + 47.0368054767535I, 41.2442631673893 - 88.1909008014634I, -10.6274173573755 + 58.8022257808406I]]
import math
import numpy as np
from sympy import symbols, simplify, lambdify
# Define x and z as symbolic variables
x, z = symbols('x, z')
# Define the function f
def f(x, z):
    term1 = 171.05 * 10**(-18) * ((1.00 * x**4 + 2.00 * x**2 * z**2 + 1.00 * z**4) * math.atan(z*x) - 1.00 * x**3 * z - 1.00 * x * z**3)
    term2 = -3.17 * 10**6 * x**4 - 6.36 * 10**6 * x**2 * z**2 - 3.19 * 10**6 * z**4 + 1.00 * x**4 * z + 2.00 * x**2 * z**3 + 1.00 * z**5
    term3 = (z - 44.33 * 10**3)
    term4 = ((-2.00 * 10**3) / (576.30 * 10**3 + 13.00 * z))**2.69 * (x**2 + z**2)**7.00 / 2.00 * z
    return term1 * term2 * term3 / (term4 + 1e-15) # Add a small value to term4 to avoid division by zero
# Define a 2D array with 3 elements
vector = np.array([[f(x, z) for x in range(-1, 2)] for z in range(-1, 2)])
def helmholtz_hodge_decomposition(vector):
    # Compute the gradient of the vector field
    gradient = np.gradient(vector)
    # Compute the curl of the vector field
    curl = np.cross(gradient[0], gradient[1])
    # Compute the divergence of the vector field
    divergence = np.sum(gradient, axis=0)
    # Compute the harmonic part of the vector field
    harmonic = -curl - divergence
    # Compute the solenoidal part of the vector field
    solenoidal = vector - harmonic
    return solenoidal, harmonic
# Print the solenoidal and harmonic parts as functions of x and z
solenoidal, harmonic = helmholtz_hodge_decomposition(vector)
print("Solenoidal:")
print(simplify(solenoidal))
print("Harmonic:")
print(simplify(harmonic))
# Create functions from the solenoidal and harmonic parts
solenoidal_part = lambdify((x, z), simplify(solenoidal), 'numpy')
harmonic_part = lambdify((x, z), simplify(harmonic), 'numpy')
Expected: the solenoidal part S(x,z) and the harmonic part H(x,z) as described above, satisfying V=S+H (S1+H1=f, S2+H2=0, S3+H3=0).
I have tried to implement gradient descent myself in Python. I know there are similar topics on this, but in my attempt the guessed slope always gets really close to the real slope, while the guessed intercept never matches or even comes close to the real intercept. Does anyone know why that is happening?
Also, I have read a lot of gradient descent posts and formulas. They say that for each iteration I need to multiply the gradient by the negative learning rate and repeat until it converges. As you can see in my implementation below, my gradient descent only works when I multiply the gradient by the learning rate and not by -1. Why is that? Did I misunderstand gradient descent, or is my implementation wrong? (exam_m and exam_b quickly overflow if I multiply the learning rate and gradient by -1.)
intercept = -5
slope = -4
x = []
y = []
for i in range(0, 100):
    x.append(i/300)
    y.append((i * slope + intercept)/300)
learning_rate = 0.005
# y = mx + b
# m is slope, b is y-intercept
exam_m = 100
exam_b = 100
#iteration
#My error function is sum all (y - guess) ^2
for _ in range(20000):
    gradient_m = 0
    gradient_b = 0
    for i in range(len(x)):
        gradient_m += (y[i] - exam_m * x[i] - exam_b) * x[i]
        gradient_b += (y[i] - exam_m * x[i] - exam_b)
        #why not gradient_m -= (y[i] - exam_m * x[i] - exam_b) * x[i] like what it said in the gradient descent formula
    exam_m += learning_rate * gradient_m
    exam_b += learning_rate * gradient_b
print(exam_m, exam_b)
The reason for the overflow is the missing factor (2/n). I have written the negative signs out explicitly to make them clearer.
import numpy as np
import matplotlib.pyplot as plt
intercept = -5
slope = -4
# y = mx + b
x = []
y = []
for i in range(0, 100):
    x.append(i/300)
    y.append((i * slope + intercept)/300)
n = len(x)
x = np.array(x)
y = np.array(y)
learning_rate = 0.05
exam_m = 0
exam_b = 0
epochs = 1000
for _ in range(epochs):
    gradient_m = 0
    gradient_b = 0
    for i in range(n):
        gradient_m -= (y[i] - exam_m * x[i] - exam_b) * x[i]
        gradient_b -= (y[i] - exam_m * x[i] - exam_b)
    exam_m = exam_m - (2/n)*learning_rate * gradient_m
    exam_b = exam_b - (2/n)*learning_rate * gradient_b
print('Slope, Intercept: ', exam_m, exam_b)
y_pred = exam_m*x + exam_b
plt.xlabel('x')
plt.ylabel('y')
plt.plot(x, y_pred, '--', color='black', label='predicted_line')
plt.plot(x, y, '--', color='blue', label='original_line')
plt.legend()
plt.show()
Output:
Slope, Intercept: -2.421033215481844 -0.2795651072061604
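On the sign question: the loop in the question accumulates (y[i] - guess) * x[i], which is already the negative of the squared-error gradient (up to the factor 2). Adding learning_rate times that sum is therefore a descent step, and multiplying by -1 once more turns it into ascent. Note also that the data are generated as y = slope * x + intercept/300, so the intercept a fit can recover is -5/300 (about -0.0167), not -5. The sketch below is a vectorized variant under my own assumptions: the function name fit_line, the learning rate 0.5 and the 5000 epochs are illustrative choices, picked because the output above shows that 1000 epochs at 0.05 has not yet converged.

```python
import numpy as np

def fit_line(x, y, learning_rate=0.5, epochs=5000):
    """Fit y ~ m*x + b by batch gradient descent on the mean squared error."""
    m_hat, b_hat = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        residual = y - (m_hat * x + b_hat)
        # d/dm mean((y - m*x - b)**2) = -(2/n) * sum(residual * x)
        grad_m = -(2 / n) * np.sum(residual * x)
        # d/db mean((y - m*x - b)**2) = -(2/n) * sum(residual)
        grad_b = -(2 / n) * np.sum(residual)
        # Descent: step against the gradient.
        m_hat -= learning_rate * grad_m
        b_hat -= learning_rate * grad_b
    return m_hat, b_hat

i = np.arange(100)
x = i / 300
y = (i * -4 + -5) / 300          # same data as in the question
m_hat, b_hat = fit_line(x, y)
print("Slope, Intercept:", m_hat, b_hat)
print("Target intercept:", -5 / 300)
```

Because x only spans 0 to about 0.33, the problem is poorly conditioned and the slope converges slowly, which is why the learning rate and epoch count matter so much here.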
I'm trying to implement regularized logistic regression in Python for the Coursera ML class, but I'm having a lot of trouble vectorizing it. Using this repository:
I've tried many different ways but never get the correct gradient or cost. Here's my current implementation:
h = utils.sigmoid( np.dot(X, theta) )
J = (-1/m) * ( y.T.dot( np.log(h) ) + (1 - y.T).dot( np.log( 1 - h ) ) ) + ( lambda_/(2*m) ) * np.sum( np.square(theta[1:]) )
grad = ((1/m) * (h - y).T.dot( X )).T + grad_theta_reg
Here are the results:
Cost : 0.693147
Expected
cost: 2.534819
Gradients:
[-0.100000, -0.030000, -0.080000, -0.130000]
Expected gradients:
[0.146561, -0.548558, 0.724722, 1.398003]
Any help from someone who knows what's going on would be much appreciated.
Below is a working snippet of a vectorized version of logistic regression. You can see more here https://github.com/hzitoun/coursera_machine_learning_matlab_python
Main
theta_t = np.array([[-2], [-1], [1], [2]])
data = np.arange(1, 16).reshape(3, 5).T
X_t = np.c_[np.ones((5,1)), data/10]
y_t = (np.array([[1], [0], [1], [0], [1]]) >= 0.5) * 1
lambda_t = 3
J, grad = lrCostFunction(theta_t, X_t, y_t, lambda_t), lrGradient(theta_t, X_t, y_t, lambda_t, flattenResult=False)
print('\nCost:', J)
print('Expected cost: 2.534819\n')
print('Gradients:\n')
print(grad)
print('Expected gradients:\n')
print(' 0.146561\n -0.548558\n 0.724722\n 1.398003\n')
lrCostFunction
from sigmoid import sigmoid
import numpy as np
def lrCostFunction(theta, X, y, reg_lambda):
    """LRCOSTFUNCTION Compute cost and gradient for logistic regression with
    regularization
    J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
    theta as the parameter for regularized logistic regression and the
    gradient of the cost w.r.t. to the parameters.
    """
    m, n = X.shape #number of training examples
    theta = theta.reshape((n,1))
    prediction = sigmoid(X.dot(theta))
    cost_y_1 = (1 - y) * np.log(1 - prediction)
    cost_y_0 = -1 * y * np.log(prediction)
    J = (1.0/m) * np.sum(cost_y_0 - cost_y_1) + (reg_lambda/(2.0 * m)) * np.sum(np.power(theta[1:], 2))
    return J
lrGradient
from sigmoid import sigmoid
import numpy as np
def lrGradient(theta, X,y, reg_lambda, flattenResult=True):
    m,n = X.shape
    theta = theta.reshape((n,1))
    prediction = sigmoid(np.dot(X, theta))
    errors = np.subtract(prediction, y)
    grad = (1.0/m) * np.dot(X.T, errors)
    grad_with_regul = grad[1:] + (reg_lambda/m) * theta[1:]
    firstRow = grad[0, :].reshape((1,1))
    grad = np.r_[firstRow, grad_with_regul]
    if flattenResult:
        return grad.flatten()
    return grad
Hope that helped!
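For anyone who wants to check the numbers without the sigmoid module from the repository, here is a compact single-file sketch of the same computation. The function name lr_cost_and_grad and the local sigmoid helper are my own; the test data and the expected values are the ones quoted above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_cost_and_grad(theta, X, y, reg_lambda):
    """Regularized logistic regression cost and gradient (bias term not regularized)."""
    m = X.shape[0]
    theta = theta.reshape(-1, 1)
    y = y.reshape(-1, 1)
    h = sigmoid(X @ theta)
    reg_theta = theta.copy()
    reg_theta[0] = 0.0                       # exclude the intercept from the penalty
    J = (-1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) \
        + (reg_lambda / (2.0 * m)) * np.sum(reg_theta ** 2)
    grad = (1.0 / m) * (X.T @ (h - y)) + (reg_lambda / m) * reg_theta
    return J, grad.ravel()

theta_t = np.array([-2.0, -1.0, 1.0, 2.0])
data = np.arange(1, 16).reshape(3, 5).T
X_t = np.c_[np.ones((5, 1)), data / 10]
y_t = np.array([1, 0, 1, 0, 1])
J, grad = lr_cost_and_grad(theta_t, X_t, y_t, 3)
print("Cost:", J)            # expected cost quoted above: 2.534819
print("Gradients:", grad)    # expected: 0.146561, -0.548558, 0.724722, 1.398003
```

Zeroing the first entry of a copy of theta is one way to keep the cost and gradient each in a single expression; slicing theta[1:] as in the answer above is equivalent.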
I am trying to code logistic regression from scratch. In the code below, I thought my cost derivative was my regularization, but I've been tasked with adding L1-norm regularization. How do you add this in Python? Should it be added where I have defined the cost derivative? Any help in the right direction is appreciated.
import numpy as np

def Sigmoid(z):
    return 1/(1 + np.exp(-z))
def Hypothesis(theta, X):
    return Sigmoid(X @ theta)
def Cost_Function(X,Y,theta,m):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/float(m) * np.sum(-_y * np.log(hi) - (1-_y) * np.log(1-hi))
    return J
def Cost_Function_Derivative(X,Y,theta,m,alpha):
    hi = Hypothesis(theta,X)
    _y = Y.reshape(-1, 1)
    J = alpha/float(m) * X.T @ (hi - _y)
    return J
def Gradient_Descent(X,Y,theta,m,alpha):
    new_theta = theta - Cost_Function_Derivative(X,Y,theta,m,alpha)
    return new_theta
def Accuracy(theta):
    correct = 0
    length = len(X_test)
    prediction = (Hypothesis(theta, X_test) > 0.5)
    _y = Y_test.reshape(-1, 1)
    correct = prediction == _y
    my_accuracy = (np.sum(correct) / length)*100
    print ('LR Accuracy: ', my_accuracy, "%")
def Logistic_Regression(X,Y,alpha,theta,num_iters):
    m = len(Y)
    for x in range(num_iters):
        new_theta = Gradient_Descent(X,Y,theta,m,alpha)
        theta = new_theta
        if x % 100 == 0:
            print #('theta: ', theta)
            print #('cost: ', Cost_Function(X,Y,theta,m))
    Accuracy(theta)
ep = .012
initial_theta = np.random.rand(X_train.shape[1],1) * 2 * ep - ep
alpha = 0.5
iterations = 10000
Logistic_Regression(X_train,Y_train,alpha,initial_theta,iterations)
Regularization adds a term to the cost function so that there is a compromise between minimizing the cost and keeping the model parameters small, which reduces overfitting. You can control how much compromise you would like by multiplying the regularization term by a scalar e.
So just add the L1 norm of theta to the original cost function:
J = J + e * np.sum(np.abs(theta))
Since this term is added to the cost function, it has to be considered when computing the gradient of the cost function.
This is simple since the derivative of a sum is the sum of the derivatives. So we just need to figure out the derivative of the term sum(abs(theta)). Since it is piecewise linear, the derivative with respect to each component is constant on each side of zero: it is 1 if theta > 0 and -1 if theta < 0 (the derivative is undefined at exactly 0, but we don't care about that; np.sign returns 0 there, which is a valid subgradient).
So in the function Cost_Function_Derivative we add:
J = J + alpha * e * np.sign(theta)
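One way to convince yourself that the sign-based term is the right derivative is to compare it with a finite-difference estimate of the penalized cost. The sketch below assumes a 1D theta, synthetic random data, and a penalty applied to every component including any intercept; the names l1_cost and l1_grad are hypothetical and not part of the question's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l1_cost(X, y, theta, e):
    """Logistic loss plus e times the L1 norm of theta."""
    h = sigmoid(X @ theta)
    data_term = np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))
    return data_term + e * np.sum(np.abs(theta))

def l1_grad(X, y, theta, e):
    """(Sub)gradient of l1_cost: logistic gradient plus e * sign(theta)."""
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y) + e * np.sign(theta)

# Synthetic data, only for the check.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) > 0.5).astype(float)
theta = np.array([0.3, -0.2, 0.5])   # kept away from 0, where |theta| is not differentiable
e = 0.1

# Central finite differences along each coordinate direction.
eps = 1e-6
numeric = np.array([
    (l1_cost(X, y, theta + eps * d, e) - l1_cost(X, y, theta - eps * d, e)) / (2 * eps)
    for d in np.eye(3)
])
analytic = l1_grad(X, y, theta, e)
print("max abs difference:", np.max(np.abs(numeric - analytic)))
```

In the question's code the derivative is already multiplied by the learning rate alpha, which is why the added term there carries the extra factor alpha.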
I performed a regression with numpy and scipy using loss minimization with constraints. Here is an example:
y_values is a vector with numObservations values
x_matrix_trans is the x matrix
We want to solve y = Xc with a constraint such that some of the coefficients multiplied by some input weights have to sum to 0.
import numpy as np
from scipy.optimize import minimize

def constraint1(x):
    res = 0
    for i in range(0, NUM_WEIGHTS):
        res = res + x[CONST_VAL + i] * weights[CONST_VAL + i]
    return res
def loss(x):
    return np.sum(np.square((np.dot(x, x_matrix_trans) - y_values)))
cons = ({'type': 'eq',
         'fun' : constraint1})
x0 = np.zeros(x_matrix_trans.shape[0])
res = minimize(loss, x0, method='SLSQP',constraints=cons, options={'disp': True, 'maxiter' : 1000, 'ftol' : 1e-07})
print(res.x)
My regression produced the correct values in res.x, but I also needed to calculate the r_squared and the adjusted r_squared. I tried to calculate them, but the results turned out incorrect.
Here is how I attempted to calculate the r_squared:
ymeas = y_values
yfit = np.dot(res.x, x_matrix_trans)
ss_res = np.sum((ymeas - yfit) ** 2)
ss_tot = np.var(ymeas) * len(ymeas)
rsq = 1 - ss_res / ss_tot
Here is how I attempted to calculate the adjusted r_squared:
adjRsq = 0.0
if ((num_coefficients - num_observations - 1) != 0):
    adjRsq = rsq - (1-rsq)*(num_coefficients - 1)/( num_observations -num_coefficients - 1)
Thanks.
Well, it turns out my r_squared calculation was correct all along, with the exception of the adjusted r_squared, which according to Wikipedia should be
adjRsq = rsq - (1-rsq)*(num_coeff)/(num_observations-num_coeff-1)
and not
adjRsq = rsq - (1-rsq)*(num_coeff - 1)/(num_observations-num_coeff-1)
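For reference, the corrected expression is algebraically the same as the more commonly quoted form 1 - (1 - R^2)(n - 1)/(n - p - 1), where p counts the explanatory variables without the constant term. The sketch below checks this on synthetic data. It uses an ordinary unconstrained least-squares fit as a stand-in for the SLSQP fit above, and the helper names are my own, so treat it as an illustration rather than a drop-in replacement.

```python
import numpy as np

def r_squared(y_meas, y_fit):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_meas - y_fit) ** 2)
    ss_tot = np.sum((y_meas - np.mean(y_meas)) ** 2)   # equals np.var(y_meas) * len(y_meas)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(rsq, n_obs, n_coeff):
    """Adjusted R^2; n_coeff excludes the constant term."""
    return 1.0 - (1.0 - rsq) * (n_obs - 1) / (n_obs - n_coeff - 1)

# Synthetic example data (assumption, not from the original post).
rng = np.random.default_rng(0)
n_obs, n_coeff = 40, 3
X = rng.normal(size=(n_obs, n_coeff))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=n_obs)

A = np.column_stack([np.ones(n_obs), X])               # add the constant column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

rsq = r_squared(y, A @ coef)
adj = adjusted_r_squared(rsq, n_obs, n_coeff)
adj_alt = rsq - (1 - rsq) * n_coeff / (n_obs - n_coeff - 1)   # form used in the answer
print(rsq, adj, adj_alt)
```

Note that for a constrained fit, or a model without a constant term, R^2 computed this way is not guaranteed to lie between 0 and 1.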