I recently started learning machine learning and wrote the code below. For some reason I am getting a zig-zag error curve instead of a steadily decreasing, logarithmic-looking curve. For now, form_binary_classes does nothing but take the start and end indices of two similar datasets with different labels. The error function returns the error at every iteration (this is most probably where the bug is), and get_acc returns the accuracy. gradient_descent returns the error together with the updated weights and bias term. I am looking only for the bug, not for a more efficient method.
def hypothesis(x, theta, b):
    h = np.dot(x, theta) + b
    return sigmoid(h)

def sigmoid(z):
    return 1.0/(1.0+np.exp(-1.0*z))
def error(y_true, x, w, b):
    m = x.shape[0]
    err = 0.0
    for i in range(m):
        hx = hypothesis(x[i], w, b)
        if(hx==0):
            err += (1-y_true[i])*np.log2(1-hx)
        elif(hx==1):
            err += y_true[i]*np.log2(hx)
        else:
            err += y_true[i]*np.log2(hx) + (1-y_true[i])*np.log2(1-hx)
    return -err/m
def get_gradient(y_true, x, w, b):
    grad_w = np.zeros(w.shape)
    grad_b = 0.0
    m = x.shape[0]
    for i in range(m):
        hx = hypothesis(x[i], w, b)
        grad_w += (y_true[i] - hx)*x[i]
        grad_b += (y_true[i] - hx)
    grad_w /= m
    grad_b /= m
    return [grad_w, grad_b]

def gradient_descent(y_true, x, w, b, learning_rate=0.1):
    err = error(y_true, x, w, b)
    grad_w, grad_b = get_gradient(y_true, x, w, b)
    w = w + learning_rate*grad_w
    b = b + learning_rate*grad_b
    return err, w, b
def predict(x,w,b):
    confidence = hypothesis(x,w,b)
    if confidence<0.5:
        return 0
    return 1

def get_acc(x_tst,y_tst,w,b):
    y_pred = []
    for i in range(y_tst.shape[0]):
        p = predict(x_tst[i],w,b)
        y_pred.append(p)  # collect each prediction so it can be compared with y_tst
    y_pred = np.array(y_pred)
    return float((y_pred==y_tst).sum())/y_tst.shape[0]
def form_binary_classes(a_start, a_end, b_start, b_end):
    x = np.vstack((X[a_start:a_end], X[b_start:b_end]))
    y = np.hstack((Y[a_start:a_end], Y[b_start:b_end]))
    print("{} {}".format(x.shape,y.shape[0]))
    loss = []
    acc = []
    w = 2*np.random.random((x.shape[1],))
    b = 5*np.random.random()
    for i in range(100):
        l, w, b = gradient_descent(y, x, w, b, learning_rate=0.5)
        loss.append(l)  # record the error for this iteration
    plt.ylabel("Negative of Log Likelihood")
What error plot looks like:
What it SHOULD look like:
There is a problem in how you compute the error, and it may well be the reason your model does not appear to converge.
Consider the corner cases in your code: if hx==0 or hx==1, the error you add is zero even when the prediction is wrong, for example hx==0 while y_true==1.
In that case we enter the first if, and the error term is
(1-1)*log2(1) = 0, which is not correct.
You can solve this issue by modifying your first two ifs in this way:
def error(y_true, x, w, b):
    m = x.shape[0]
    err = 0.0
    for i in range(m):
        hx = hypothesis(x[i], w, b)
        if(hx==y_true[i]): # Corner cases where we have zero error
            err += 0
        elif((hx==1 and y_true[i]==0) or (hx==0 and y_true[i]==1)): # Corner cases where we would take log2 of zero
            err += np.iinfo(np.int32).min # an approximation for log2(0); penalizes the model with the greatest error possible
        else:
            err += y_true[i]*np.log2(hx) + (1-y_true[i])*np.log2(1-hx)
    return -err/m
In this part of the code, I assumed you have binary labels (0 or 1).
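A common alternative to special-casing hx==0 and hx==1 is to clip the predicted probabilities slightly away from 0 and 1 before taking the log. The sketch below is not the code from the question: stable_error, eps and the demo arrays are hypothetical names, and it assumes y_true holds 0/1 labels and x is an (m, d) array.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stable_error(y_true, x, w, b, eps=1e-12):
    """Mean negative log-likelihood (base 2) with clipped probabilities."""
    hx = sigmoid(np.dot(x, w) + b)
    # Keep hx strictly inside (0, 1) so log2 never receives 0.
    hx = np.clip(hx, eps, 1.0 - eps)
    return -np.mean(y_true * np.log2(hx) + (1 - y_true) * np.log2(1 - hx))

# Made-up inputs: both samples are confidently misclassified,
# so the error should be large but still finite.
x_demo = np.array([[50.0], [-50.0]])
y_demo = np.array([0.0, 1.0])
w_demo = np.array([1.0])
err_demo = stable_error(y_demo, x_demo, w_demo, 0.0)
```

With clipping, a confidently wrong prediction contributes a large but finite penalty, so the curve stays on one scale instead of jumping between 0 and a huge sentinel value.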
I am writing a program that operates out of a main() function. I am unable to change the main function (this is part of a class). How would I go about carrying over a variable from one function to the next, without changing the main()?
def read_data(fname) :
    file = fname
    x = []
    y = []
    with open(file) as f:
        for line in f:
            xi, yi = [float(value) for value in line.split(",")]
            x.append(xi)
            y.append(yi)
    return x, y
def compute_m_and_b(x, y) :
    sx, sy, sx2, sxy, sy2 = 0, 0, 0, 0, 0
    for i in range(len(x)):
        sx += x[i]
        sy += y[i]
        sx2 += (x[i] ** 2)
        sy2 += (y[i] ** 2)
        sxy += (x[i] * y[i])
    m = (sxy * len(x) - sx * sy) / (sx2 * len(x) - sx**2)
    b = (sy - m * sx) / len(x)
    return m, b

def compute_fx_residual(x, y, m, b) :
    fx = []
    for xi in x:
        fx.append(m * xi + b)
    residual = []
    for i in range(len(y)):
        residual.append(y[i] - fx[i])
    return fx, residual
def compute_sum_of_squared_residuals(residual) :
    least_squares_r = 0
    for i in range(len(y)) :
        least_squares_r += (residual[i]) ** 2
    return least_squares_r

def compute_total_sum_of_squares(y) :
    sum_squares = 0
    ymean = sum(y) / len(y)
    for i in range(len(y)) :
        sum_squares += (yi - ymean) ** 2
    return sum_squares
As you can see, each function can only use the variables listed in the parentheses of its def line. This leads to variables calculated in earlier functions being undefined. How can I bring them in without needing to change main()?
Please let me know if I should be more clear. I can provide more examples, but I wanted to maintain some brevity.
EDIT: here is the main function:
def main():
    fname = input("Enter Input Filename: ")
    x, y = regress.read_data(fname)
    print("Input File: ", fname)
    print("Data points: ", len(x))
    #compute coefficients m and b
    m, b = regress.compute_m_and_b(x, y)
    #compute fx and residual
    fx, residual = regress.compute_fx_residual(x, y, m, b)
    #compute sum of squared residuals
    least_squares_r = regress.compute_sum_of_squared_residuals(residual)
    #compute sum of squares
    sum_squares = regress.compute_total_sum_of_squares(y)
    #compute coefficient of determination
    coeff_of_d = regress.compute_coeff_of_determination(least_squares_r, sum_squares)
    regress.print_least_squares(x, y, m, b, fx, residual, least_squares_r, sum_squares, coeff_of_d)
    #compute pearson coefficient
    pearson_r = regress.compute_pearson_coefficient(x, y)
You haven't provided the main function so it's unclear how you're currently using it.
Looks to me like you can just get the variable for each consecutive function and pass them into the next:
fname = "some/path/to/file.txt"
x, y = read_data(fname)
m, b = compute_m_and_b(x, y)
fx, residual = compute_fx_residual(x, y, m, b)
least_squares_r = compute_sum_of_squared_residuals(residual)
sum_squares = compute_total_sum_of_squares(y)
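Since main() already passes each result on to the next call, the remaining undefined names appear to come from inside two of the helper functions: compute_sum_of_squared_residuals loops over len(y) although it only receives residual, and compute_total_sum_of_squares refers to yi, which is never defined there. A minimal sketch of both functions that uses only their own parameters, assuming residual and y are plain lists of floats:

```python
def compute_sum_of_squared_residuals(residual):
    # Loop over the argument itself; y is not visible inside this function.
    least_squares_r = 0
    for r in residual:
        least_squares_r += r ** 2
    return least_squares_r

def compute_total_sum_of_squares(y):
    # Take each element from y directly instead of the undefined name yi.
    ymean = sum(y) / len(y)
    sum_squares = 0
    for yi in y:
        sum_squares += (yi - ymean) ** 2
    return sum_squares
```

Nothing in main() has to change for this: each function gets everything it needs through its parameter list.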
I'm trying to implement gradient descent for simple linear regression. Whenever I run the code, I get the runtime warnings shown after it.
The code I am using is this:
def get_data(df, feature, predict):
    X = df[feature]
    Y = df[predict]
    X = np.float64(X)
    Y = np.float64(Y)
    return X, Y

def average(X, Y, b, m, length):
    temp1 = 0
    temp2 = 0
    for i in range(length):
        temp1 += (b + m * X[i]) - Y[i]
        temp2 += ((b + m * X[i]) - Y[i]) * X[i]
    return temp1 / float(length), temp2 / float(length)

def gradient_descent(b, m, alpha, length, num_iterations, X, Y):
    for i in range(num_iterations):
        temp1, temp2 = average(X, Y, b, m, length)
        b_temp = b - alpha * temp1
        m_temp = m - alpha * temp2
        b = b_temp
        m = m_temp
    return b, m
def run(b, m, alpha, feature, predict, df, num_iterations):
    X, Y = get_data(df, feature, predict)
    length = np.alen(X)
    final_b, final_m = gradient_descent(b, m, alpha, length, num_iterations, X, Y)
    return final_b, final_m

b, m = run(0, 0, 0.05, 'sqft_living', 'price', df, 1000)
The error it gives me is this:
RuntimeWarning: overflow encountered in double_scalars
  from ipykernel import kernelapp as app
RuntimeWarning: invalid value encountered in double_scalars
I'm not able to identify which part of the code is causing the error. I tried converting the NumPy arrays to float64, and my code is not running into a divide-by-zero error. Can someone identify the cause? Also, how can it be fixed?
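One likely cause, offered as an assumption because the data is not shown: a feature such as sqft_living has values in the thousands, so with alpha = 0.05 every update overshoots, and b and m grow until they overflow. The sketch below runs the same update rule on synthetic data (the numbers are made up, not the housing dataset) after standardizing X; the names X_raw, X_scaled, m_fit and b_fit are hypothetical.

```python
import numpy as np

def gradient_descent(b, m, alpha, num_iterations, X, Y):
    # Same update rule as in the question, vectorized with NumPy.
    for _ in range(num_iterations):
        residual = (b + m * X) - Y
        b = b - alpha * residual.mean()
        m = m - alpha * (residual * X).mean()
    return b, m

rng = np.random.default_rng(0)
X_raw = rng.uniform(500.0, 5000.0, size=200)  # synthetic "square feet"
Y = 100.0 * X_raw + 50000.0                   # synthetic "price", noise-free

# Standardize the feature: zero mean, unit variance.
mu, sigma = X_raw.mean(), X_raw.std()
X_scaled = (X_raw - mu) / sigma

b_s, m_s = gradient_descent(0.0, 0.0, 0.05, 1000, X_scaled, Y)

# Convert the coefficients back to the original units of X.
m_fit = m_s / sigma
b_fit = b_s - m_s * mu / sigma
```

Lowering alpha by several orders of magnitude on the raw data can also stop the divergence, but scaling the feature usually lets a moderate learning rate converge much faster.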
Last night I wrote a simple binary logistic regression python code.
It seems to be working correctly (likelihood increases with each iteration, and I get good classification results).
My problem is that I can only initialize my weights with W = np.random.randn(n+1, 1) normal distribution.
But I don't want normal distribution, I want uniform distribution. But when I do that, I get the error
"RuntimeWarning: divide by zero encountered in log
return np.dot(Y.T, np.log(predictions)) + np.dot((onesVector - Y).T, np.log(onesVector - predictions))"
this is my code
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(x):
    return 1/(1+np.exp(-x))

def predict(X, W):
    return sigmoid(np.dot(X, W))

def logLikelihood(X, Y, W):
    m = X.shape[0]
    predictions = predict(X, W)
    onesVector = np.ones((m, 1))
    return np.dot(Y.T, np.log(predictions)) + np.dot((onesVector - Y).T, np.log(onesVector - predictions))

def gradient(X, Y, W):
    return np.dot(X.T, Y - predict(X, W))

def successRate(X, Y, W):
    m = Y.shape[0]
    predictions = predict(X, W) > 0.5
    correct = (Y == predictions)
    return 100 * np.sum(correct)/float(correct.shape[0])
trX = np.load("binaryMnistTrainX.npy")
trY = np.load("binaryMnistTrainY.npy")
teX = np.load("binaryMnistTestX.npy")
teY = np.load("binaryMnistTestY.npy")
m, n = trX.shape
trX = np.concatenate((trX, np.ones((m, 1))),axis=1)
teX = np.concatenate((teX, np.ones((teX.shape[0], 1))),axis=1)
W = np.random.randn(n+1, 1)
learningRate = 0.00001
numIter = 500
likelihoodArray = np.zeros((numIter, 1))
for i in range(0, numIter):
    W = W + learningRate * gradient(trX, trY, W)
    likelihoodArray[i, 0] = logLikelihood(trX, trY, W)
print("train success rate is %lf" %(successRate(trX, trY, W)))
print("test success rate is %lf" %(successRate(teX, teY, W)))
If I initialize my W to zeros or with randn, then it works.
If I initialize it with uniform random values (not normal) or with ones, then I get the division by zero thing.
Why does this happen and how can I fix it?
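A likely explanation, assuming the pixel values in trX are nonnegative: weights drawn from a uniform distribution on [0, 1) (or all ones) are all positive, so every entry of np.dot(X, W) is a large positive number, the sigmoid rounds to exactly 1.0, and log(1 - 1) is log(0). With randn or zeros the positive and negative terms roughly cancel. One way to avoid the problem regardless of initialization is to compute the log-likelihood from the raw scores with np.logaddexp, using log(sigmoid(z)) = -log(1 + exp(-z)). The function name and demo arrays below are hypothetical, and the function returns a scalar rather than a 1x1 array.

```python
import numpy as np

def stable_log_likelihood(X, Y, W):
    # z holds the raw scores; the sigmoid is never evaluated explicitly.
    z = np.dot(X, W)
    log_p = -np.logaddexp(0.0, -z)    # log(sigmoid(z))
    log_1mp = -np.logaddexp(0.0, z)   # log(1 - sigmoid(z))
    return np.sum(Y * log_p + (1 - Y) * log_1mp)

# Tiny made-up example with a score large enough to saturate the sigmoid.
X_demo = np.array([[255.0, 255.0], [0.0, 1.0]])
Y_demo = np.array([[0.0], [1.0]])
W_demo = np.ones((2, 1))
ll_demo = stable_log_likelihood(X_demo, Y_demo, W_demo)
```

Scaling the inputs (for example dividing pixel values by 255) or shrinking the initial weights also keeps the scores small enough that the sigmoid does not saturate at the start.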
I implemented a simple linear regression, and I want to try it out by fitting a nonlinear model.
Specifically, I am trying to fit a model for the function y = x^3 + 5, for example.
This is my code:
import numpy as np
import numpy.matlib
import matplotlib.pyplot as plt
def predict(X,W):
    return np.dot(X,W)

def gradient(X, Y, W, regTerm=0):
    return (-np.dot(X.T, Y) + np.dot(np.dot(X.T,X),W))/(m*k) + regTerm * W /(n*k)

def cost(X, Y, W, regTerm=0):
    m, k = Y.shape
    n, k = W.shape
    Yhat = predict(X, W)
    return np.trace(np.dot(Y-Yhat,(Y-Yhat).T))/(2*m*k) + regTerm * np.trace(np.dot(W,W.T)) / (2*n*k)

def Rsquared(X, Y, W):
    m, k = Y.shape
    SSres = cost(X, Y, W)
    Ybar = np.mean(Y,axis=0)
    Ybar = np.matlib.repmat(Ybar, m, 1)
    SStot = np.trace(np.dot(Y-Ybar,(Y-Ybar).T))
    return 1-SSres/SStot
m = 10
n = 200
k = 1
trX = np.random.rand(m, n)
trX[:, 0] = 1
for i in range(2, n):
    trX[:, i] = trX[:, 1] ** i
trY = trX[:, 1] ** 3 + 5
trY = np.reshape(trY, (m, k))
W = np.random.rand(n, k)
numIter = 10000
learningRate = 0.5
for i in range(0, numIter):
    W = W - learningRate * gradient(trX, trY, W)
domain = np.linspace(0,1,100000)
powerDomain = np.copy(domain)
m = powerDomain.shape[0]
powerDomain = np.reshape(powerDomain, (m, 1))
powerDomain = np.matlib.repmat(powerDomain, 1, n)
for i in range(1, n):
    powerDomain[:, i] = powerDomain[:, 0] ** i
print(Rsquared(trX, trY, W))
plt.plot(trX[:, 1],trY,'o', domain, predict(powerDomain, W),'r')
The R^2 I'm getting is very close to 1, meaning I found a very good fit to the training data, but the plot does not show it. When I plot the data, it usually looks like this:
It looks as if I'm underfitting the data, but with such a complex hypothesis, with 200 features (meaning I allow polynomials up to x^199) and only 10 training examples, I should very clearly be overfitting the data, so I expect the red line to pass through all the blue points and go wild between them.
This isn't what I'm getting, which is confusing to me.
What's wrong?
You forgot to set powerDomain[:,0]=1, which is why your plot goes wrong at 0. And yes, you are overfitting: look how quickly the curve shoots up as soon as you leave your training domain.
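To make the fix concrete: the training matrix has a column of ones at index 0 and x**i at index i, so the matrix used for plotting needs the same layout. A minimal sketch using np.vander with increasing=True, which builds the columns x**0, x**1, ..., x**(n-1) in one call (the small n and domain here are only for illustration, not the values from the question):

```python
import numpy as np

n = 5                                   # number of polynomial features
domain = np.linspace(0, 1, 11)          # points at which to evaluate the fit

# Column i holds domain**i, so column 0 is all ones (the bias term).
powerDomain = np.vander(domain, n, increasing=True)
```

The result can be passed to predict(powerDomain, W) in place of the matrix built with repmat and the loop.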
I've been reading Bishop's book on machine learning, and I'm trying to implement the backpropagation algorithm for a neural network, but it's not finding a solution. The code is below. I've broken it down into the network code and the testing code.
import numpy as np
from collections import namedtuple
import matplotlib.pyplot as plt
import scipy.optimize as opt
# Network code
def tanh(x):
    return np.tanh(x)

def dtanh(x):
    return 1 - np.tan(x)**2

def identity(x):
    return x

def unpack_weights(w, D, M, K):
    """
    len(w) = (D + 1)*M + (M + 1)*K, where
    D = number of inputs, excluding bias
    M = number of hidden units, excluding bias
    K = number of output units
    """
    UnpackedWeights = namedtuple("UnpackedWeights", ["wHidden", "wOutput"])
    cutoff = M*(D + 1)
    wHidden = w[:cutoff].reshape(M, D + 1)
    wOutput = w[cutoff:].reshape(K, M + 1)
    return UnpackedWeights(wHidden=wHidden, wOutput=wOutput)

def compute_output(x, weights, fcnHidden=tanh, fcnOutput=identity):
    NetworkResults = namedtuple("NetworkResults", ["hiddenAct", "hiddenOut", "outputAct", "outputOut"])
    xBias = np.vstack((1., x))
    hiddenAct = weights.wHidden.dot(xBias)
    hiddenOut = np.vstack((1., fcnHidden(hiddenAct)))
    outputAct = weights.wOutput.dot(hiddenOut)
    outputOut = fcnOutput(outputAct)
    return NetworkResults(hiddenAct=hiddenAct, hiddenOut=hiddenOut, outputAct=outputAct,
                          outputOut=outputOut)
def backprop(t, x, M, fcnHidden=tanh, fcnOutput=identity, dFcnHidden=dtanh):
    maxIter = 10000
    learningRate = 0.2
    N, K = t.shape
    N, D = x.shape
    nParams = (D + 1)*M + (M + 1)*K
    w0 = np.random.uniform(-0.1, 0.1, nParams)
    for _ in xrange(maxIter):
        sse = 0.
        for n in xrange(N):
            weights = unpack_weights(w0, D, M, K)
            # Compute net output
            netResults = compute_output(x=x[n].reshape(-1, 1), weights=weights,
                                        fcnHidden=fcnHidden, fcnOutput=fcnOutput)
            # Compute derivatives of error function wrt wOutput
            outputDelta = netResults.outputOut - t[n].reshape(K, 1)
            outputDerivs = outputDelta.dot(netResults.hiddenOut.T)
            # Compute derivatives of error function wrt wHidden
            hiddenDelta = dFcnHidden(netResults.hiddenAct)*(weights.wOutput[:, 1:].T.dot(outputDelta))
            xBias = np.vstack((1., x[n].reshape(-1, 1)))
            hiddenDerivs = hiddenDelta.dot(xBias.T)
            delErr = np.hstack((np.ravel(hiddenDerivs), np.ravel(outputDerivs)))
            w1 = w0 - learningRate*delErr
            w0 = w1
            sse += np.sum(outputDelta**2)
    return w0
# Testing code
def generate_test_data():
    D, M, K, N = 1, 3, 1, 25
    x = np.sort(np.random.uniform(-1., 1., (N, D)), axis=0)
    t = 1.0 + x**2
    return D, M, K, N, x, t

def test_backprop():
    D, M, K, N, x, t = generate_test_data()
    return backprop(t, x, M)

def scipy_solution(t, x, D, M, K, N, method="BFGS"):
    def obj_fn(w):
        weights = unpack_weights(w, D, M, K)
        err = 0
        for n in xrange(N):
            netOut = compute_output(x[n], weights=weights)
            err += (netOut.outputOut[0, 0] - t[n])**2
        return err
    w0 = np.random.uniform(-1, 1, (D + 1)*M + (M + 1)*K)
    return opt.minimize(obj_fn, w0, method=method)
When I use the optimize module in scipy (i.e., the scipy_solution() function) to find the network weights, the sum of squared errors gets very close to zero, and the output of the network looks like the data I generated. When I use my backpropagation function, the sum of squared errors gets stuck between 2.0 and 3.0, and the network output looks almost linear. Moreover, when I feed the scipy solution for the weights to my backprop function as the starting value, my backprop function still doesn't find the right solution.
I've been stuck on this for a couple of days, so I'd really appreciate any tips anyone has. Thanks.
def dtanh(x):
    return 1 - np.tan(x)**2
should be
def dtanh(x):
    return 1 - np.tanh(x)**2
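A quick way to catch this kind of slip is to compare the analytic derivative with a finite-difference estimate. The sketch below is only an illustration; numerical_derivative, x_check and max_gap are hypothetical names that do not appear in the question.

```python
import numpy as np

def dtanh(x):
    # Analytic derivative of tanh: 1 - tanh(x)^2.
    return 1 - np.tanh(x)**2

def numerical_derivative(f, x, h=1e-6):
    # Central difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

x_check = np.linspace(-2.0, 2.0, 9)
max_gap = np.max(np.abs(dtanh(x_check) - numerical_derivative(np.tanh, x_check)))
```

For the corrected dtanh, max_gap should be tiny; with np.tan in place of np.tanh the two curves disagree visibly, which would have pointed at the bug directly.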