Why my gradient descent algorithm for linear regression diverges

Why my gradient descent algorithm for linear regression diverges - python

Gradient descent for linear regression diverges even if I decrease the value for alpha or even delete it!
def gradientDescent(sqft_living, price, theta, alpha, num_iters):
m = price.size
# make a copy of theta, to avoid changing the original array, since numpy arrays
# are passed by reference to functions
theta = theta.copy()
J_history = [] # Use a python list to save cost in every iteration
theta_history = []
for z in range(num_iters):
term = 0
term0 = 0
for i in range(m):
h = np.dot(theta, sqft_living[i])
term += (h - price[i])
term0 += term * sqft_living[i][0]
temp0 = theta[0] - alpha * (1/m) * term0
term = 0
term1 = 0
for i in range(m):
h = np.dot(theta, sqft_living[i])
term += (h - price[i])
term1 += term * sqft_living[i][1]
temp1 = theta[1] - alpha * (1/m) * term1
theta[0] = temp0
theta[1] = temp1
cost = computeCost(sqft_living, price, theta)
print(cost)
J_history.append(cost)
theta_history.append(theta)
return theta, J_history
This is the using of the algorithm
# initialize fitting parameters
theta = np.zeros(2)
# some gradient descent settings
iterations = 100
alpha = 0.00001
theta, J_history = gradientDescent(sqft_living, price, theta, alpha, iterations)
print(J_history)
This is a part of the output :
cost = 3.8144815615142405e+22
cost = 8.337226954930875e+33
cost = 1.8222489338606256e+45
cost = 3.982848487760674e+56
cost = 8.705222311669148e+67
cost = 1.9026808508648173e+79
cost = 4.158646718757122e+90
cost = 9.089460549082665e+101
cost = 1.9866629377457605e+113
cost = 4.342204476162054e+124
cost = 9.49065860875058e+135
cost = 2.074351894810886e+147
cost = 4.533864256311113e+158

Related

CFD simulation (with multiple for loops and matrix operations) is very slow to run. Looking to replace with faster numpy functions (or alternative)

As mentioned above, the function below works, however its very slow. I am very interested in using faster/optimised numpy (or other) vectorized alternatives. I have not posted the entire script here due to it being too large.
My specific question is - are there suitable numpy (or other) functions that I can use to 1) reduce run time and 2) reduce code volume of this function, specifically the for loop?
Edit: mass, temp, U and dpdh are functions that carry out simple algebraic calculations and return constants
def my_system(t, y, n, hIn, min, mAlumina, cpAlumina, sa, V):
dydt = np.zeros(3 * n) #setting up zeros array for solution (solving for [H0,Ts0,m0,H1,Ts1,m1,H2,Ts2,m2,..Hn,Tsn,mn])
# y = [h_0, Ts_0, m_0, ... h_n, Ts_n, m_n]
# y[0] = hin
# y[1] = Ts0
# y[2] = minL
i=0
## Using thermo
T = temp(y[i],P) #initial T
m = mass(y[i],P) #initial m
#initial values
dydt[i] = (min * (hIn - y[i]) + (U(hIn,P,min) * sa * (y[i + 1] - T))) / m # dH/dt (eq. 2)
dydt[i + 1] = -(U(hIn,P,min) * sa * (y[i + 1] - T)) / (mAlumina * cpAlumina) # dTs/dt from eq.3
dmdt = dydt[i] * dpdh(y[i], P) * V # dm/dt (holdup variation) eq. 4b
dydt[i + 2] = min - dmdt # mass flow out (eq.4a)
for i in range(3, 3 * n, 3): #starting at index 3, and incrementing by 3 because we are solving for 'triplets' [h,Ts,m] in each loop
## Using thermo
T = temp(y[i],P)
m = mass(y[i],P)
# [h, TS, mdot]
dydt[i] = (dydt[i-1] * (y[i - 3] - y[i]) + (U(y[i-3], P, dydt[i-1]) * sa * (y[i + 1] - T))) / m # dH/dt (eq.2), dydt[i-1] is the mass of the previous tank
dydt[i + 1] = -(U(y[i-3], P, dydt[i-1]) * sa * (y[i + 1] - T)) / (mAlumina * cpAlumina) # dTs/dt eq. (3)
dmdt = dydt[i] * dpdh(y[i], P) * V # Equation 4b
dydt[i + 2] = dydt[i-1] - dmdt # Equation 4a
return dydt
The functions mass, temp, U, and dpdh used inside the my_system function all take numbers as input, perform some simple algebraic operation and return a number (no need to optimise these I am just providing them for further context)
def temp(H,P):
# returns temperature given enthalpy (after processing function)
T = flasher.flash(H=H, P=P, zs=zs, retry=True).T
return T
def mass(H, P):
# returns mass holdup in mol
m = flasher.flash(H=H, P=P, zs=zs, retry=True).rho()*V
return m
def dpdh(H, P):
res = flasher.flash(H=H, P=P, zs=zs, retry=True)
if res.phase_count == 1:
if res.phase == 'L':
drho_dTf = res.liquid0.drho_dT()
else:
drho_dTf = res.gas.drho_dT()
else:
drho_dTf = res.bulk._equilibrium_derivative(of='rho', wrt='T', const='P')
dpdh = drho_dTf/res.dH_dT_P()
return dpdh
def U(H,P,m):
# Given T, P, m
air = Mixture(['nitrogen', 'oxygen'], Vfgs=[0.79, 0.21], H=H, P=P)
mu = air.mu*1000/mWAir #mol/m.s
cp = air.Cpm #J/mol.K
kg = air.k #W/m.K
g0 = m/areaBed #mol/m2.s
a = sa*n/vTotal #m^2/m^3 #QUESTIONABLE
psi = 1
beta = 10
pr = (mu*cp)/kg
re = (6*g0)/(a*mu*psi)
hfs = ((2.19*(re**1/3)) + (0.78*(re**0.619)))*(pr**1/3)*(kg)/diameterParticle
h = 1/((1/hfs) + ((diameterParticle/beta)/kAlumina))
return h
Reference Image:
enter image description here

For improving the speed, you can see Numba, which is useable if you use NumPy a lot but not every code can be used with Numba. Apart from that, the formulation of the equation system is confusing. You are solving 3 equations and adding the result to a single dydt list by 3 elements each. You can simply create three lists, solve each equation and add them to their respective list. For this, you need to re-write my_system as:
import numpy as np
def my_system(t, RHS, hIn, Ts0, minL, mAlumina, cpAlumina, sa, V):
# get initial boundary condition values
y1 = RHS[0]
y2 = RHS[1]
y3 = RHS[2]
## Using thermo
T = # calculate T
m = # calculate m
# [h, TS, mdot] solve dy1dt for h, dy2dt for TS and dy3dt for mdot
dy1dt = # dH/dt (eq.2), y1 corresponds to initial or previous value of dy1dt
dy2dt = # dTs/dt eq. (3), y2 corresponds to initial or previous value of dy2dt
dmdt = # Equation 4b
dy3dt = # Equation 4a, y3 corresponds to initial or previous value of dy3dt
# Left-hand side of ODE
LHS = np.zeros([3,])
LHS[0] = dy1dt
LHS[1] = dy2dt
LHS[2] = dy3dt
return LHS
In this function, you can pass RHS as a list with initial values ([dy1dt, dy2dt, dy3dt]) which will be unpacked as y1, y2, and y3 respectively and use them for respective differential equations. The solved equations (next values) will be saved to dy1dt, dy2dt, and dy3dt which will be returned as a list LHS.
Now you can solve this using scipy.integrate.odeint. Therefore, you can leave the for loop structure and solve the equations by using this method as follows:
hIn = #some val
Ts0 = #some val
minL = #some val
mAlumina = #some vaL
cpAlumina = #some val
sa = #some val
V = #some val
P = #some val
## Using thermo
T = temp(hIn,P) #initial T
m = mass(hIn,P) #initial m
#initial values
y01 = # calculate dH/dt (eq. 2)
y02 = # calculate dTs/dt from eq.3
dmdt = # calculate dm/dt (holdup variation) eq. 4b
y03 = # calculatemass flow out (eq.4a)
n = # time till where you want to solve the equation system
y0 = [y01, y02, y03]
step_size = 1
t = np.linspace(0, n, int(n/step_size)) # use that start time to which initial values corresponds
res = odeint(my_sytem, y0, t, args=(hIn, Ts0, minL, mAlumina, cpAlumina, sa, V,), tfirst=True)
print(res[:,0]) # print results for dH/dt
print(res[:,1]) # print results for dTs/dt
print(res[:,2]) # print results for Equation 4a
Here, I have passed all the initial values as y0 and chosen a step size of 1 which you can change as per your need.

Why I'm getting a huge cost in Stochastic Gradient Descent Implementation?

I've run into some problems while trying to implement Stochastic Gradient Descent, and basically what is happening is that my cost is growing like crazy and I don't have a clue why.
MSE implementation:
def mse(x,y,w,b):
predictions = x # w
summed = (np.square(y - predictions - b)).mean(0)
cost = summed / 2
return cost
Gradients:
def grad_w(y,x,w,b,n_samples):
return -y # x / n_samples + x.T # x # w / n_samples + b * x.mean(0)
def grad_b(y,x,w,b,n_samples):
return -y.mean(0) + x.mean(0) # w + b
SGD Implementation:
def stochastic_gradient_descent(X,y,w,b,learning_rate=0.01,iterations=500,batch_size =100):
length = len(y)
cost_history = np.zeros(iterations)
n_batches = int(length/batch_size)
for it in range(iterations):
cost =0
indices = np.random.permutation(length)
X = X[indices]
y = y[indices]
for i in range(0,length,batch_size):
X_i = X[i:i+batch_size]
y_i = y[i:i+batch_size]
w -= learning_rate*grad_w(y_i,X_i,w,b,length)
b -= learning_rate*grad_b(y_i,X_i,w,b,length)
cost = mse(X_i,y_i,w,b)
cost_history[it] = cost
if cost_history[it] <= 0.0052: break
return w, cost_history[:it]
Random Variables:
w_true = np.array([0.2, 0.5,-0.2])
b_true = -1
first_feature = np.random.normal(0,1,1000)
second_feature = np.random.uniform(size=1000)
third_feature = np.random.normal(1,2,1000)
arrays = [first_feature,second_feature,third_feature]
x = np.stack(arrays,axis=1)
y = x # w_true + b_true + np.random.normal(0,0.1,1000)
w = np.asarray([0.0,0.0,0.0], dtype='float64')
b = 1.0
After running this:
theta,cost_history = stochastic_gradient_descent(x,y,w,b)
print('Final cost/MSE: {:0.3f}'.format(cost_history[-1]))
I Get that:
Final cost/MSE: 3005958172614261248.000
And here is the plot

Here are a few suggestions:
your learning rate is too big for the training: changing it to something like 1e-3 should be fine.
your update part could be slightly modified as follows:
def stochastic_gradient_descent(X,y,w,b,learning_rate=0.01,iterations=500,batch_size =100):
length = len(y)
cost_history = np.zeros(iterations)
n_batches = int(length/batch_size)
for it in range(iterations):
cost =0
indices = np.random.permutation(length)
X = X[indices]
y = y[indices]
for i in range(0,length,batch_size):
X_i = X[i:i+batch_size]
y_i = y[i:i+batch_size]
w -= learning_rate*grad_w(y_i,X_i,w,b,len(X_i)) # the denominator should be the actual batch size
b -= learning_rate*grad_b(y_i,X_i,w,b,len(X_i))
cost += mse(X_i,y_i,w,b)*len(X_i) # add batch loss
cost_history[it] = cost/length # this is a running average of your batch losses, which is statistically more stable
if cost_history[it] <= 0.0052: break
return w, b, cost_history[:it]
The final results:
w_true = np.array([0.2, 0.5, -0.2])
b_true = -1
first_feature = np.random.normal(0,1,1000)
second_feature = np.random.uniform(size=1000)
third_feature = np.random.normal(1,2,1000)
arrays = [first_feature,second_feature,third_feature]
x = np.stack(arrays,axis=1)
y = x # w_true + b_true + np.random.normal(0,0.1,1000)
w = np.asarray([0.0,0.0,0.0], dtype='float64')
b = 0.0
theta,bias,cost_history = stochastic_gradient_descent(x,y,w,b,learning_rate=1e-3,iterations=3000)
print("Final epoch cost/MSE: {:0.3f}".format(cost_history[-1]))
print("True final cost/MSE: {:0.3f}".format(mse(x,y,theta,bias)))
print(f"Final coefficients:\n{theta,bias}")

Hey #TQCH and thanks for that. I've come up with a different approach to implement SGD without an inner loop and the results were also pretty sweet.
def stochastic_gradient_descent(X,y,w,b,learning_rate=0.35,iterations=3000,batch_size =100):
length = len(y)
cost_history = np.zeros(iterations)
n_batches = int(length/batch_size)
marker = 0
cost = mse(X,y,w,b)
print(cost)
for it in range(iterations):
cost =0
indices = np.random.choice(length, batch_size)
X_i = X[indices]
y_i = y[indices]
w -= learning_rate*grad_w(y_i,X_i,w,b)
b -= learning_rate*grad_b(y_i,X_i,w,b)
cost = mse(X_i,y_i,w,b)
cost_history[it] = cost
if cost_history[it] <= 0.0075 and cost_history[it] > 0.0071: marker = it
if cost <= 0.0052: break
print(f'{w}, {b}')
return w, cost_history, marker, cost
w = np.asarray([0.0,0.0,0.0], dtype='float64')
b = 1.0
theta,cost_history, marker, cost = stochastic_gradient_descent(x,y,w,b)
print(f'Number of iterations: {marker}')
print('Final cost/MSE: {:0.3f}'.format(cost))
which gave me these results:
1.9443112664859845,
[ 0.19592532 0.31735225 -0.20044424], -0.9059800816290591
Number of iterations: 68
Final cost/MSE: 0.005
But you're right I missed that I was dividing by total length of vector y and not by batch size and forgot to add batch loss!
Thanks for that!

Python problems on Machine-Learning

import numpy as np
import pandas as pd
import numpy as np
from matplotlib import pyplot as pt
def computeCost(X,y,theta):
m=len(y)
predictions= X*theta-y
sqrerror=np.power(predictions,2)
return 1/(2*m)*np.sum(sqrerror)
def gradientDescent(X, y, theta, alpha, num_iters):
m = len(y)
jhistory = np.zeros((num_iters,1))
for i in range(num_iters):
h = X * theta
s = h - y
theta = theta - (alpha / m) * (s.T*X).T
jhistory_iter = computeCost(X, y, theta)
return theta,jhistory_iter
data = open(r'C:\Users\Coding\Desktop\machine-learning-ex1\ex1\ex1data1.txt')
data1=np.array(pd.read_csv(r'C:\Users\Coding\Desktop\machine-learning-ex1\ex1\ex1data1.txt',header=None))
y =np.array(data1[:,1])
m=len(y)
y=np.asmatrix(y.reshape(m,1))
X = np.array([data1[:,0]]).reshape(m,1)
X = np.asmatrix(np.insert(X,0,1,axis=1))
theta=np.zeros((2,1))
iterations = 1500
alpha = 0.01;
print('Testing the cost function ...')
J = computeCost(X, y, theta)
print('With theta = [0 , 0]\nCost computed = ', J)
print('Expected cost value (approx) 32.07')
theta=np.asmatrix([[-1,0],[1,2]])
J = computeCost(X, y, theta)
print('With theta = [-1 , 2]\nCost computed =', J)
print('Expected cost value (approx) 54.24')
theta,JJ = gradientDescent(X, y, theta, alpha, iterations)
print('Theta found by gradient descent:')
print(theta)
print('Expected theta values (approx)')
print(' -3.6303\n 1.1664\n')
predict1 = [1, 3.5] *theta
print(predict1*10000)
Result:
Testing the cost function ...
With theta = [0 , 0]
Cost computed = 32.072733877455676
Expected cost value (approx) 32.07
With theta = [-1 , 2]
Cost computed = 69.84811062494227
Expected cost value (approx) 54.24
Theta found by gradient descent:
[[-3.70304726 -3.64357517]
[ 1.17367146 1.16769684]]
Expected theta values (approx)
-3.6303
1.1664
[[4048.02858742 4433.63790186]]
There are two problems, the first Cost computed was right, but the second one was wrong. And there are 4 element in my gradient descent(suppose to be two)

When you mention "With theta = [-1 , 2]"
and you enter
theta=np.asmatrix([[-1,0],[1,2]])
I think this is incorrect. Assuming that you have single feature and you added a column of 1, and you are trying to do simple linear regression
The correct way should be
np.array([-1,2])
Also where have
predictions= X*theta-y
It would be better if you did
np.dot(X,theta)-y
When you multiply, it's not doing the same thing.

How to implement GMM Clustering EM algorighm(Expectation Maximisation algorithm) which work for N Dimension feature vector in python

I am trying to implement GMM Clustering for both 24 Dimension feature vector and 32 dimension feature vector, where assignment of initial parameters are done by Kmeans algorightm (K mean clustering is providing cluster centers - MU - only).
I am following this link, where it's implemented only for 2D feature vector and predefined Mu and sigma.
If anyone have the code for GMM clustering kindly post.
Predefined Lib for GMM is also there in sklearn, but it's not giving me likelyhood for each iteration. sklearn GMM

def kmeans(dataSet, k, c):
# 1. Randomly choose clusters
rng = np.random.RandomState(c)
p = rng.permutation(dataSet.shape[0])[:k]
centers = dataSet[p]
while True:
labels = pairwise_distances_argmin(dataSet, centers)
new_centers = np.array([dataSet[labels == i].mean(0) for i in range(k)]
if np.all(centers == new_centers):
break
centers = new_centers
cluster_data = [dataSet[labels == i] for i in range(k)]
l = []
covs = []
for i in range(k):
l.append(len(cluster_data[i]) * 1.0 / len(dataSet))
covs.append(np.cov(np.array(cluster_data[i]).T))
return centers, l, covs, cluster_data
return new_mu, new_covs, cluster_data
class gaussian_Mix_Model:
def __init__(self, k = 8, eps = 0.0000001):
self.k = k ## number of clusters
self.eps = eps ## threshold to stop `epsilon`
def calculate_Exp_Maxim(self, X, max_iters = 1000):
# n = number of data-points, d = dimension of data points
n, d = X.shape
mu, Cov = [], []
for i in range(1,k):
new_mu, new_covs, cluster_data = kmeans(dataSet, k, c)
# Initialize new
mu[k] = new_mu
Cov[k]= new_cov
# initialize the weights
w = [1./self.k] * self.k
R = np.zeros((n, self.k))
### LLhoods
LLhoods = []
P = lambda mu, s: np.linalg.det(s) ** -.5 ** (2 * np.pi) ** (-X.shape[1]/2.) \
* np.exp(-.5 * np.einsum('ij, ij -> i',\
X - mu, np.dot(np.linalg.inv(s) , (X - mu).T).T ) )
# Iterate till max_iters iterations
while len(LLhoods) < max_iters:
# Expectation Calcultion
## membership for each of K Clusters
for k in range(self.k):
R[:, k] = w[k] * P(mu[k], Cov[k])
# Finding the log likelihood
LLhood = np.sum(np.log(np.sum(R, axis = 1)))
# Now store the log likelihood to the list.
LLhoods.append(LLhood)
# Number of data points to each clusters
R = (R.T / np.sum(R, axis = 1)).T
N_ks = np.sum(R, axis = 0)
# Maximization and calculating the new parameters.
for k in range(self.k):
# Calculate the new means
mu[k] = 1. / N_ks[k] * np.sum(R[:, k] * X.T, axis = 1).T
x_mu = np.matrix(X - mu[k])
# Calculate new cov
Cov[k] = np.array(1 / N_ks[k] * np.dot(np.multiply(x_mu.T, R[:, k]), x_mu))
# Calculate new PiK
w[k] = 1. / n * N_ks[k]
# check for convergence
if (np.abs(LLhood - LLhoods[-2]) < self.eps) and (iteration < max_iters): break
else:
Continue
from collections import namedtuple
self.params = namedtuple('params', ['mu', 'Cov', 'w', 'LLhoods', 'num_iters'])
self.params.mu = mu
self.params.Cov = Cov
self.params.w = w
self.params.LLhoods = LLhoods
self.params.num_iters = len(LLhoods)
return self.params
# Call the GMM to find the model
gmm = gaussian_Mix_Model(3, 0.000001)
params = gmm.fit_EM(X, max_iters= 150)
# Plotting of Log-Likelihood VS Iterations.
plt.plot(LLhoods[0])
plt.savefig('Dataset_2A_GMM_Class_1_K_16.png')
plt.clf()
plt.plot(LLhoods[1])
plt.savefig('Dataset_2A_GMM_Class_2_K_16.png')
plt.clf()
plt.plot(LLhoods[2])
plt.savefig('Dataset_2A_GMM_Class_3_K_16.png')
plt.clf()

How to add L1 normalization in python?

I am trying to code logistic regression from scratch. In this code I have, I thought my cost derivative was my regularization, but I've been tasked with adding L1norm regularization. How do you add this in python? Should this be added where I have defined the cost derivative? Any help in the right direction is appreciated.
def Sigmoid(z):
return 1/(1 + np.exp(-z))
def Hypothesis(theta, X):
return Sigmoid(X # theta)
def Cost_Function(X,Y,theta,m):
hi = Hypothesis(theta, X)
_y = Y.reshape(-1, 1)
J = 1/float(m) * np.sum(-_y * np.log(hi) - (1-_y) * np.log(1-hi))
return J
def Cost_Function_Derivative(X,Y,theta,m,alpha):
hi = Hypothesis(theta,X)
_y = Y.reshape(-1, 1)
J = alpha/float(m) * X.T # (hi - _y)
return J
def Gradient_Descent(X,Y,theta,m,alpha):
new_theta = theta - Cost_Function_Derivative(X,Y,theta,m,alpha)
return new_theta
def Accuracy(theta):
correct = 0
length = len(X_test)
prediction = (Hypothesis(theta, X_test) > 0.5)
_y = Y_test.reshape(-1, 1)
correct = prediction == _y
my_accuracy = (np.sum(correct) / length)*100
print ('LR Accuracy: ', my_accuracy, "%")
def Logistic_Regression(X,Y,alpha,theta,num_iters):
m = len(Y)
for x in range(num_iters):
new_theta = Gradient_Descent(X,Y,theta,m,alpha)
theta = new_theta
if x % 100 == 0:
print #('theta: ', theta)
print #('cost: ', Cost_Function(X,Y,theta,m))
Accuracy(theta)
ep = .012
initial_theta = np.random.rand(X_train.shape[1],1) * 2 * ep - ep
alpha = 0.5
iterations = 10000
Logistic_Regression(X_train,Y_train,alpha,initial_theta,iterations)

Regularization adds a term to the cost function so that there is a compromise between minimize cost and minimizing the model parameters to reduce overfitting. You can control how much compromise you would like by adding a scalar e for the regularization term.
So just add the L1 norm of theta to the original cost function:
J = J + e * np.sum(abs(theta))
Since this term is added to the cost function, then it should be considered when computing the gradient of the cost function.
This is simple since the derivative of the sum is the sum of derivatives. So now just need to figure out what is the derivate of the term sum(abs(theta)). Since it is a linear term, then the derivative is constant. It is = 1 if theta >= 0, and -1 if theta < 0 (note there is a mathematical undeterminity at 0, but we don't care about it).
So in the function Cost_Function_Derivative we add:
J = J + alpha * e * (theta >= 0).astype(float)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Why my gradient descent algorithm for linear regression diverges - python

Related

CFD simulation (with multiple for loops and matrix operations) is very slow to run. Looking to replace with faster numpy functions (or alternative)

Why I'm getting a huge cost in Stochastic Gradient Descent Implementation?

Python problems on Machine-Learning

How to implement GMM Clustering EM algorighm(Expectation Maximisation algorithm) which work for N Dimension feature vector in python

How to add L1 normalization in python?

Categories

Resources