Why is my neural network only getting to a certain accuracy? - python

Sorry for not being very specific in the question, but I've been trying to create a neural network all on my own for a couple of months now, and I could use some help. This is a basic one made to recognize numbers from the MNIST data set, and it's mostly based on code from here and here. After some experimenting with the number of iterations and the learning rate, I can get it to ~30% accuracy, which is a lot better than my failed experiments from before, but still far from as good as it can be (even if I do 40,000 iterations, it almost always ends up guessing 1 for some reason). Here's the code; it's got a couple of quirks and could be optimized a lot, but I just wanted to be able to see exactly what's happening and fully understand it.
#Importing some random libraries idk
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from mnist import MNIST

#Sigmoid is used as the activation function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

#Derivative of the sigmoid function
def dsigmoid(x):
    return sigmoid(x)*(1.0 - sigmoid(x))

#One-hot encodes the labels (a 1 in the position of each digit)
def convToBinary(ogArray, newArray):
    for i in range(len(ogArray)):
        newArray[i][ogArray[i]] = 1

print("Very beginning")

#size is 784-16-16-10
# 0 - 1 - 2 - 3  < layer names
#   0   1   2    < interlayer names
#this means that something like outputs[2] is actually the activation for layer 3
class NeuralNetwork(object):
    def __init__(self):
        #weights
        #dimensions are opposite the usual notation: (in, out), i.e. (layer size, next layer size)
        self.weights = [np.random.randn(784, 16), np.random.randn(16, 16), np.random.randn(16, 10)]
        self.wDerivs = [np.zeros((784, 16)), np.zeros((16, 16)), np.zeros((16, 10))]
        #biases
        self.biases = [np.ones((16, 1)), np.ones((16, 1)), np.ones((10, 1))]
        self.bDerivs = [np.zeros((16, 1)), np.zeros((16, 1)), np.zeros((10, 1))]
        #outputs
        self.zoutputs = [np.ones((16, 1)), np.ones((16, 1)), np.ones((10, 1))]
        self.outputs = [np.ones((16, 1)), np.ones((16, 1)), np.ones((10, 1))]
        self.aDerivs = [np.ones((16, 1)), np.ones((16, 1)), np.ones((10, 1))]
        self.cost = np.ones((10, 1))

    def forwardPropagate(self, input):
        last = input
        for i in range(3):
            self.zoutputs[i] = np.add(np.dot(np.transpose(self.weights[i]), last), self.biases[i])
            self.outputs[i] = sigmoid(self.zoutputs[i])
            last = self.outputs[i]

    def backPropagate(self, input, y, lr):
        #deltas, dC/da
        self.aDerivs[2] = (self.outputs[2] - y) * dsigmoid(self.zoutputs[2])
        self.aDerivs[1] = np.dot(self.weights[2], self.aDerivs[2]) * dsigmoid(self.zoutputs[1])
        self.aDerivs[0] = np.dot(self.weights[1], self.aDerivs[1]) * dsigmoid(self.zoutputs[0])
        #biases, dC/db
        self.bDerivs[2] = self.aDerivs[2]
        self.bDerivs[1] = self.aDerivs[1]
        self.bDerivs[0] = self.aDerivs[0]
        #weights, dC/dw
        self.wDerivs[2] = np.dot(self.outputs[1], np.transpose(self.aDerivs[2]))
        self.wDerivs[1] = np.dot(self.outputs[0], np.transpose(self.aDerivs[1]))
        self.wDerivs[0] = np.dot(input, np.transpose(self.aDerivs[0]))
        #doing the adjusting
        for i in range(len(self.biases)):
            self.biases[i] = self.biases[i] - (self.bDerivs[i] * lr)
        for i in range(len(self.weights)):
            self.weights[i] = self.weights[i] - (self.wDerivs[i] * lr)

    def findCost(self, y):
        for i in range(10):
            self.cost[i] = -(y[i]*np.log(self.outputs[2][i]) + (1-y[i])*np.log(1 - self.outputs[2][i]))
        aM = 0
        for i in self.cost:
            aM += i
        #find the average cost; idk if I'm using the right cost function here but it gets the job done
        return aM / 10

    def findAnswer(self):
        #finding the highest value in the output layer to guess the number
        bestNum = 0
        bestVal = 0
        for i in range(10):
            if self.outputs[2][i] > bestVal:
                bestNum = i
                bestVal = self.outputs[2][i]
        return bestNum

    def doTheThing(self, X, oldY, Y, iter, lr):
        #this function does all of the other functions
        n_c = 0
        for i in range(iter):
            x = X[i].reshape(784, 1)
            oldy = oldY[i]
            y = Y[i].reshape(10, 1)
            self.forwardPropagate(x)
            c = self.findCost(y)
            self.backPropagate(x, y, lr)
            print("It is " + str(oldy))
            print("It predicted " + str(self.findAnswer()))
            if oldy == self.findAnswer():
                if i > (iter * 4) / 5:
                    n_c += 1
            print("Iteration: " + str(i))
            print("Cost: " + str(c))
            #I didn't really separate training and testing, so I just found the percent right based on the last 1/5
            if (i - (iter*0.8) + 1) != 0:
                print("Right: " + str(n_c / (i - (iter*0.8) + 1)))

#import
mndata = MNIST('Number_Samples')
iTest, lTest = mndata.load_training()
newITest = np.array(iTest)
newLTest = np.zeros((len(lTest), 10))
#putting the expected result in a form that can be compared to the output layer
convToBinary(lTest, newLTest)
nn = NeuralNetwork()
nn.doTheThing(newITest, lTest, newLTest, 40000, 0.1)
I've tried debugging but I've had no luck; there's probably some major flaw that I'm just not seeing. I would greatly appreciate it if someone with a lot more experience than me were to at least point me in the right direction, because right now I have no clue what I'm doing wrong.
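One thing worth checking with code like this: raw MNIST pixel values run from 0 to 255, and feeding them unscaled into a sigmoid layer saturates the units (where dsigmoid is effectively zero), which commonly produces the "always guesses the same digit" behaviour described above. A minimal sketch of the usual scaling, reusing the question's variable names:

# Scale pixel intensities from 0-255 into [0, 1] before training, so the
# first-layer pre-activations stay in the sigmoid's responsive range.
newITest = np.array(iTest) / 255.0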

Related

My neural network has learned to ignore its input and always give the same answer. How can I fix this?

I'm trying to write a neural network in Python which is mostly from scratch to check my understanding of how they work. The task I've given it is to reverse an 8-term list of ones and zeros, e.g. "[0, 1, 0, 1, 0, 1, 0, 1]" as input should result in "[1, 0, 1, 0, 1, 0, 1, 0]" as output.
However, after the training process it always gives the same prediction regardless of which input I give it, which is obviously not ideal. I'm new to this, so I'm probably doing something silly, but I don't know what. Where does this go wrong? Here's my code:
#imports
import random as r
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import pandas as pd
from tqdm.notebook import tqdm

#State Constants
nNeurons = 8 #Number of neurons per layer
nLayers = 24 #Number of layers
nWeights = nNeurons + 1 #Weights per neuron, including the bias
trainingDataSize = 10000 #Training data size
nIterations = 250 #Number of iterations of each algorithm
testSize = 100 #Number of data points used for a loss function for the whole network
nAttempts = 5 #Number of models gradient-descent backpropagation generates

#Data
#Objective function
def f(x):
    output = []
    for i in range(len(x)):
        output.append(x[-i-1])
    return output

#Random 8-bit binary vector
def randomX():
    output = []
    for i in range(8):
        output.append(r.choice([0, 1]))
    return output

#Generating the training data
trainingData = []
for i in range(trainingDataSize):
    x = randomX()
    trainingData.append([x, f(x)])

#Functions
def sigmoid(x):
    if x < -10:
        return 0
    else:
        return 1/(1 + np.e**(-x))

def neuron(w, x):
    output = w[0]
    for i in range(len(x)):
        output += w[i + 1] * x[i]
    activation = np.tanh(output)
    return sigmoid(activation)

def generateWeights():
    return 100 * (np.random.rand(nLayers, nNeurons, nWeights) - 0.5)

def hiddenLayer(layerNum, inp, hiddenWeights):
    w = hiddenWeights[layerNum]
    output = []
    for i in range(nNeurons):
        output.append(neuron(w[i], inp))
    return output

def neuralNetwork(inp, weights):
    lay = inp
    for i in range(nLayers):
        lay = hiddenLayer(i, lay, weights)
    return lay

#Loss of a single data point based on the current weights
def lossOne(weights):
    dataPoint = r.choice(trainingData)
    output = 0
    prediction = neuralNetwork(dataPoint[0], weights)
    actual = dataPoint[1]
    for i in range(len(prediction)):
        output += 0.5 * (prediction[i] - actual[i]) ** 2
    return output

#Loss of several data points
def loss(weights):
    output = 0
    for i in range(testSize):
        output += lossOne(weights)
    return output

def layerLoss(layerNum, inp, weights, expectedOut):
    output = 0
    actualOut = hiddenLayer(layerNum, inp, weights)
    for i in range(len(expectedOut)):
        output += 0.5 * (actualOut[i] - expectedOut[i]) ** 2
    return output

scale = 100
dw = 0.01
bestY = 10 ** 10 #For tracking which weights are best
for h in tqdm(range(nAttempts)):
    currentW = generateWeights()
    #Using backpropagation to train the network with gradient descent
    for i in tqdm(range(nIterations)):
        dataPoint = r.choice(trainingData)
        lay = dataPoint[0]
        expectedOut = dataPoint[1]
        for j in range(nLayers):
            for k in range(nNeurons):
                #Gradient descent on the kth neuron in layer -j - 1
                for l in range(j): #Finding the input this layer receives
                    lay = hiddenLayer(l, lay, currentW)
                currentY = layerLoss(-j - 1, lay, currentW, expectedOut)
                nearbyW = currentW
                nearbyW[-j - 1][k] += dw
                nearbyY = layerLoss(-j - 1, lay, nearbyW, expectedOut)
                dY = nearbyY - currentY
                pert = r.uniform(-dw ** 2, dw ** 2) #Small perturbation to prevent getting stuck at maxima
                currentW[-j - 1][k] -= (scale * dY/dw) + pert
            expectedOut = lay
    #Keeping the best model
    newY = loss(currentW)
    if newY < bestY:
        bestW = currentW
        bestY = newY

#Testing the neural network
tstx = [1, 1, 0, 0, 1, 1, 0, 0]
tstGD = neuralNetwork(tstx, bestW)
I noticed that it usually predicts "0.2689..." and "0.7310..." as the probabilities, which is weird, because it seems unlikely that it would be equally sure of each prediction. Can anyone explain what I got wrong here that causes it to always predict the same value regardless of the input, and to always be either 26% or 73% sure?
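For what it's worth, 0.2689... and 0.7310... are exactly sigmoid(-1) and sigmoid(+1). Since each neuron applies sigmoid to a tanh activation, and tanh saturates to ±1 once its input is large (the weights here are drawn from 100 * (rand - 0.5), i.e. roughly ±50), every output collapses to one of those two values. A quick check:

import numpy as np

sigmoid = lambda x: 1 / (1 + np.exp(-x))
# tanh saturates to ±1 for large inputs, and sigmoid(±1) gives exactly
# the two values observed in the question.
print(sigmoid(np.tanh(-50.0)))  # ~0.26894, i.e. sigmoid(-1)
print(sigmoid(np.tanh(50.0)))   # ~0.73106, i.e. sigmoid(+1)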

Implementing simple probabilistic model with negative log likelihood loss

First, a quick disclaimer: I posted this question on Reddit, in Deep Learning and Learning Machine Learning first, but I thought I might also request your expertise here. Without further ado:
I am currently challenging myself on this year's Deep Unsupervised Learning course from Berkeley University, and although I just started the warmup exercise of week 1, I am already having 'technical' difficulties.
The exercise in question is the "1. Warmup" in the following document: Week 1 Exercises. (My apologies, as I am not familiar enough with Reddit formatting to seamlessly include images.)
In my understanding, we have a variable x which can take values from 1..100, each with a specific probability of being sampled (defined in the sample_data() function).
The task is therefore to fit a vector of parameters theta which is passed to a softmax function and is supposed to give the likelihood of a specific element x_i being sampled. Namely, theta_1 should be the parameter which "bumps up" the soft-max value corresponding to the variable x = 1, and so on.
Using TensorFlow, I think I was able to create such a model, but when it comes to training, I believe I am missing a crucial point, as the program cannot compute gradients with respect to the theta parameters.
I would like to know if I am misunderstanding the task, and whether there is a better method to achieve the result of the exercise.
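For concreteness, under that reading the model is just a categorical distribution over the 100 values, parameterised by a softmax over theta, and the loss for a sample x is the negative log of its softmax entry. A minimal NumPy sketch of that reading (variable names are illustrative):

import numpy as np

theta = np.zeros(100)                        # one parameter per value of x
probs = np.exp(theta) / np.exp(theta).sum()  # p(x = i) = e^theta_i / sum_j e^theta_j
x = 42                                       # a sampled value in 1..100
nll = -np.log(probs[x - 1])                  # negative log likelihood of that sample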
Here is the code, where the failing part is located after the # Computing gradients comment.
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

if __name__ == "__main__":
    # Sampling function of the x variable provided in the exercise
    def sample_data():
        count = 10000
        rand = np.random.RandomState(0)
        a = 0.3 + 0.1 * rand.randn(count)
        b = 0.8 + 0.05 * rand.randn(count)
        mask = rand.rand(count) < 0.5
        samples = np.clip(a * mask + b * (1 - mask), 0.0, 1.0)
        return np.digitize(samples, np.linspace(0.0, 1.0, 100))

    full_data = sample_data()
    train_ds = full_data[:int(.8 * len(full_data))]
    val_ds = full_data[int(.8 * len(full_data)):]

    # Declaring parameters theta
    w_init = tf.zeros_initializer()
    params = tf.Variable(
        initial_value=w_init(shape=(1, 100), dtype='float32'),
        trainable=True, name='params')

    softmax = tf.squeeze(tf.nn.softmax(params, axis=1))

    # Should materialize the loss of the model
    def get_neg_log_likelihood(inputs):
        return -tf.math.log(softmax)

    neg_log_likelihoods = get_neg_log_likelihood(softmax)
    dist = tfp.distributions.Categorical(probs=softmax, dtype=tf.int32)

    optimizer = tf.keras.optimizers.Adam()
    for epoch in range(100):
        minibatch_size = 200
        n_minibatches = len(train_ds) // minibatch_size
        # Running over minibatches of the data
        for minibatch in range(n_minibatches):
            # Minibatching
            start_index = minibatch * minibatch_size
            end_index = minibatch * minibatch_size + minibatch_size
            x = train_ds[start_index:end_index]
            with tf.GradientTape() as tape:
                tape.watch(params)
                loss = tf.reduce_mean(-dist.log_prob(x))
            # Computing gradients
            grads = tape.gradient(loss, params)
            print(grads)  # Result: None
            # input()
            optimizer.apply_gradients(zip(grads, params))
Thank you in advance for your time.
PS: I mainly have a background in deep reinforcement learning, so I can understand the various models used there (policy, value functions, ...), but I am trying to refine my grasp of the internals of the models themselves, namely generative probabilistic models (GAN, VAE) and other unsupervised learning models in general (RealNVP, normalizing flows, ...).
Pretty sure nobody is gonna see this, but I thought I might as well bring some closure to this.
First of all, I calculated the gradients by directly deriving their expression from the negative log likelihood of the soft-max value, thus dropping the TensorFlow framework in the process.
Although the results are a little below my expectations, the program was able to fit the model to a distribution somewhat similar to the empirical distribution of the sampled data. I guess this is due to the fact that a one-dimensional theta parameter vector is not enough to fully model the real data distribution, as well as the finite amount of sampled data.
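For reference, the expression used below follows from differentiating the negative log likelihood through the softmax: for a sample x = i,

d/dtheta_j [ -log softmax(theta)_i ] = softmax(theta)_j - 1{i = j},

which is exactly what each row of the jacobian list in the code below encodes.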
An updated version of the code:
import numpy as np
from matplotlib import pyplot as plt

np.random.seed(42)

def softmax(X, theta=1.0, axis=None):
    # Shameful copy-paste from SO
    y = np.atleast_2d(X)
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)
    y = y * float(theta)
    y = y - np.expand_dims(np.max(y, axis=axis), axis)
    y = np.exp(y)
    ax_sum = np.expand_dims(np.sum(y, axis=axis), axis)
    p = y / ax_sum
    if len(X.shape) == 1:
        p = p.flatten()
    return p

if __name__ == "__main__":
    def sample_data():
        count = 10000
        rand = np.random.RandomState(0)
        a = 0.3 + 0.1 * rand.randn(count)
        b = 0.8 + 0.05 * rand.randn(count)
        mask = rand.rand(count) < 0.5
        samples = np.clip(a * mask + b * (1 - mask), 0.0, 1.0)
        return np.digitize(samples, np.linspace(0.0, 1.0, 100))

    full_data = sample_data()
    train_ds = full_data[:int(.8 * len(full_data))]
    val_ds = full_data[int(.8 * len(full_data)):]

    # Declaring parameters
    params = np.zeros(100)

    # Used for loss computation
    def get_neg_log_likelihood(softmax):
        return -np.log(softmax)

    def get_loss(params, x):
        return np.mean([get_neg_log_likelihood(softmax(params))[i - 1] for i in x])

    lr = .0005
    for epoch in range(1000):
        # Shuffling training data
        np.random.shuffle(train_ds)
        minibatch_size = 100
        n_minibatches = len(train_ds) // minibatch_size
        # Running over minibatches of the data
        for minibatch in range(n_minibatches):
            smax = softmax(params)
            # Jacobian of the negative log likelihood
            jacobian = [[smax[j] - 1 if i == j else smax[j]
                         for j in range(100)] for i in range(100)]
            # Minibatching
            start_index = minibatch * minibatch_size
            end_index = minibatch * minibatch_size + minibatch_size
            x = train_ds[start_index:end_index]
            # Stack the gradient rows for each sample and sum over them
            grad_matrix = np.vstack([jacobian[i] for i in x])
            grads = np.sum(grad_matrix, axis=0)
            params -= lr * grads
        print("Epoch %d -- Train loss: %.4f , Val loss: %.4f" % (epoch, get_loss(params, train_ds), get_loss(params, val_ds)))
        # Plotting every ~100 epochs
        if epoch % 100 == 0:
            counters = {i + 1: 0 for i in range(100)}
            for x in full_data:
                counters[x] += 1
            histogram = np.array([counters[i + 1] / len(full_data) for i in range(100)])
            fsmax = softmax(params)
            fig, ax = plt.subplots()
            ax.set_title('Dist. Comp. after %d epochs of training (from scratch)' % epoch)
            x = np.arange(1, 101)
            width = 0.35
            rects1 = ax.bar(x - width / 2, fsmax, width, label='Model')
            rects2 = ax.bar(x + width / 2, histogram, width, label='Empirical')
            ax.set_ylabel('Likelihood')
            ax.set_xlabel("Variable x's values")
            ax.legend()

            def autolabel(rects):
                # Label each bar with its height
                for rect in rects:
                    height = rect.get_height()
                    ax.annotate('%.3f' % height,
                                xy=(rect.get_x() + rect.get_width() / 2, height),
                                ha='center', va='bottom')

            autolabel(rects1)
            autolabel(rects2)
            fig.tight_layout()
            plt.savefig('plots/results_after_%d_epochs.png' % epoch)
Picture of the final model distribution included for completeness: Modeled vs Empirical Distribution.
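As a side note on the original TensorFlow attempt: tape.gradient returned None because, in eager mode, the softmax and the Categorical distribution were built outside the GradientTape, so no operations involving params were ever recorded on the tape. A minimal sketch of that fix, assuming the same params variable and minibatch x as in the question:

import tensorflow as tf
import tensorflow_probability as tfp

optimizer = tf.keras.optimizers.Adam()
with tf.GradientTape() as tape:
    # Build the distribution *inside* the tape so ops on params are recorded
    softmax = tf.squeeze(tf.nn.softmax(params, axis=1))
    dist = tfp.distributions.Categorical(probs=softmax, dtype=tf.int32)
    # x comes from np.digitize and is 1-based; the categories are 0-based
    loss = tf.reduce_mean(-dist.log_prob(x - 1))
grads = tape.gradient(loss, params)           # no longer None
optimizer.apply_gradients([(grads, params)])  # a list of (gradient, variable) pairs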

Neural Network Cost Function in Coursera ML Exercise 4 (Week 5) Python

So I am doing ex 4 and I can't figure this out. I don't want to cheat, so can anyone guide me in the right direction?
def nnCostFunction(nn_params, input_layer_size, hidden_layer_size,
                   num_labels, X, y, lambda_=0.0):
    Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],
                        (hidden_layer_size, (input_layer_size + 1)))
    Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],
                        (num_labels, (hidden_layer_size + 1)))
    # Setup some useful variables
    m = y.size
    # You need to return the following variables correctly
    J = 0
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)
    # ====================== YOUR CODE HERE ======================
    x = utils.sigmoid(np.dot(X, Theta1.T))  # 5000*25
    x_C = np.concatenate([np.ones((m, 1)), x], axis=1)
    z = utils.sigmoid(np.dot(x_C, Theta2.T))  # 5000*10
    J = (1/m)*np.sum(-np.dot(y, np.log(z)) - np.dot((1-y), np.log(1-z)))
    # ============================================================
    # Unroll gradients
    # grad = np.concatenate([Theta1_grad.ravel(order=order), Theta2_grad.ravel(order=order)])
    grad = np.concatenate([Theta1_grad.ravel(), Theta2_grad.ravel()])
    return J, grad

lambda_ = 0
J, _ = nnCostFunction(nn_params, input_layer_size, hidden_layer_size,
                      num_labels, X, y, lambda_)
print('Cost at parameters (loaded from ex4weights): %.6f ' % J)
print('The cost should be about : 0.287629.')
>> Cost at parameters (loaded from ex4weights): 949.011852
The cost should be about : 0.287629.
In another cell I tried to output J (without summing it), and it was:
array([ 32.94277417, 31.60660549, 121.58989642, 110.33099785, 111.01961993, 105.33746192, 124.60468929, 117.79628872, 102.04080206, 91.74271593])
So why is my cost coming out wrong? Can someone guide me?
Here is the main source code for more information
Take a closer look at 1.2 Model representation.
You've forgotten to add a bias unit to the first-layer units. Besides that, the whole backpropagation part is missing.
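For reference, one way those fixes to the cost computation could look inside the YOUR CODE HERE block (a sketch only, reusing the question's names; the labels also need one-hot encoding, since np.dot(y, np.log(z)) with a plain label vector reduces over the wrong dimension, which is why J printed as a 10-element array above):

# Add the bias unit (a column of ones) before applying each Theta
a1 = np.concatenate([np.ones((m, 1)), X], axis=1)    # 5000 x 401
a2 = utils.sigmoid(np.dot(a1, Theta1.T))             # 5000 x 25
a2 = np.concatenate([np.ones((m, 1)), a2], axis=1)   # 5000 x 26
a3 = utils.sigmoid(np.dot(a2, Theta2.T))             # 5000 x 10
# One-hot encode the labels so they line up with the 10 output units
y_onehot = np.eye(num_labels)[y.astype(int)]         # 5000 x 10
J = (1 / m) * np.sum(-y_onehot * np.log(a3) - (1 - y_onehot) * np.log(1 - a3))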

How to calculate logistic regression accuracy

I am a complete beginner in machine learning and coding in Python, and I have been tasked with coding logistic regression from scratch to understand what happens under the hood. So far I have coded the hypothesis function, cost function and gradient descent, and then coded the logistic regression. However, on coding for printing the accuracy I get a low output (0.69) which doesn't change with increasing iterations or changing the learning rate. My question is: is there a problem with my accuracy code below? Any help pointing me in the right direction would be appreciated.
X = data[['radius_mean', 'texture_mean', 'perimeter_mean',
          'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean',
          'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean',
          'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
          'compactness_se', 'concavity_se', 'concave points_se', 'symmetry_se',
          'fractal_dimension_se', 'radius_worst', 'texture_worst',
          'perimeter_worst', 'area_worst', 'smoothness_worst',
          'compactness_worst', 'concavity_worst', 'concave points_worst',
          'symmetry_worst', 'fractal_dimension_worst']]
X = np.array(X)
X = min_max_scaler.fit_transform(X)
Y = data["diagnosis"].map({'M':1,'B':0})
Y = np.array(Y)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)
X = data["diagnosis"].map(lambda x: float(x))

def Sigmoid(z):
    if z < 0:
        return 1 - 1/(1 + math.exp(z))
    else:
        return 1/(1 + math.exp(-z))

def Hypothesis(theta, x):
    z = 0
    for i in range(len(theta)):
        z += x[i]*theta[i]
    return Sigmoid(z)

def Cost_Function(X, Y, theta, m):
    sumOfErrors = 0
    for i in range(m):
        xi = X[i]
        hi = Hypothesis(theta, xi)
        error = Y[i] * math.log(hi if hi > 0 else 1)
        if Y[i] == 1:
            error = Y[i] * math.log(hi if hi > 0 else 1)
        elif Y[i] == 0:
            error = (1-Y[i]) * math.log(1-hi if 1-hi > 0 else 1)
        sumOfErrors += error
    constant = -1/m
    J = constant * sumOfErrors
    #print('cost is: ', J)
    return J

def Cost_Function_Derivative(X, Y, theta, j, m, alpha):
    sumErrors = 0
    for i in range(m):
        xi = X[i]
        xij = xi[j]
        hi = Hypothesis(theta, X[i])
        error = (hi - Y[i])*xij
        sumErrors += error
    m = len(Y)
    constant = float(alpha)/float(m)
    J = constant * sumErrors
    return J

def Gradient_Descent(X, Y, theta, m, alpha):
    new_theta = []
    constant = alpha/m
    for j in range(len(theta)):
        CFDerivative = Cost_Function_Derivative(X, Y, theta, j, m, alpha)
        new_theta_value = theta[j] - CFDerivative
        new_theta.append(new_theta_value)
    return new_theta

def Accuracy(theta):
    correct = 0
    length = len(X_test, Hypothesis(X, theta))
    for i in range(length):
        prediction = round(Hypothesis(X[i], theta))
        answer = Y[i]
        if prediction == answer.all():
            correct += 1
    my_accuracy = (correct / length)*100
    print('LR Accuracy %: ', my_accuracy)

def Logistic_Regression(X, Y, alpha, theta, num_iters):
    theta = np.zeros(X.shape[1])
    m = len(Y)
    for x in range(num_iters):
        new_theta = Gradient_Descent(X, Y, theta, m, alpha)
        theta = new_theta
        if x % 100 == 0:
            Cost_Function(X, Y, theta, m)
            print('theta: ', theta)
            print('cost: ', Cost_Function(X, Y, theta, m))
    Accuracy(theta)

initial_theta = [0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
alpha = 0.0001
iterations = 1000
Logistic_Regression(X, Y, alpha, initial_theta, iterations)
This uses data from the Wisconsin breast cancer dataset (https://www.kaggle.com/uciml/breast-cancer-wisconsin-data), where I am weighing 30 features. Changing the features to ones which are known to correlate also doesn't change my accuracy.
Python gives us the scikit-learn library that makes our work easier; this worked for me:
from sklearn.metrics import accuracy_score
y_pred = log.predict(x_test)
score = accuracy_score(y_test, y_pred)
Accuracy is one of the most intuitive performance measures: it is simply the ratio of correctly predicted observations to total observations. Higher accuracy means the model is performing better.
Accuracy = (TP + TN) / (TP + FP + FN + TN)
TP = True positives
TN = True negatives
FP = False positives
FN = False negatives
Accuracy is only a good measure when your false positives and false negatives have similar cost. When they don't, a better metric is the F1-score, which is given by
F1-score = 2 * (Recall * Precision) / (Recall + Precision), where
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Read more here
https://en.wikipedia.org/wiki/Precision_and_recall
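A minimal sketch of these metrics with scikit-learn, reusing the y_test and y_pred arrays from the first answer above:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy = accuracy_score(y_test, y_pred)    # (TP + TN) / (TP + FP + FN + TN)
precision = precision_score(y_test, y_pred)  # TP / (TP + FP)
recall = recall_score(y_test, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_test, y_pred)                # 2 * precision * recall / (precision + recall)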
The beauty of machine learning in Python is that important modules like scikit-learn are open source, so you can always look at the actual code. Please use the link below to the scikit-learn metrics source code, which will give you an idea of how scikit-learn calculates the accuracy score when you do:
from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred)
https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/metrics
I'm not sure how you arrived at a value of 0.0001 for alpha, but I think it's too low. Using your code with the cancer data shows that the cost is decreasing with each iteration -- it's just moving glacially.
When I raise this to 0.5, I still get decreasing costs, but at a more reasonable level. After 1000 iterations it reports:
cost: 0.23668000993020666
And after fixing the Accuracy function I'm getting 92% on the test segment of the data.
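For reference, a minimal fix of that function in the question's own loop style (a sketch only: it evaluates on the held-out test split and passes the arguments in the order the question's Hypothesis(theta, x) expects):

def Accuracy(theta):
    correct = 0
    length = len(X_test)
    for i in range(length):
        # Round the predicted probability to get a 0/1 class label
        prediction = round(Hypothesis(theta, X_test[i]))
        if prediction == Y_test[i]:
            correct += 1
    print('LR Accuracy %: ', (correct / length) * 100)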
You have NumPy installed, as shown by X = np.array(X). You should really consider using it for your operations; it will be orders of magnitude faster for jobs like this. Here is a vectorized version that gives results instantly rather than keeping you waiting:
import math
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv("cancerdata.csv")
X = df.values[:,2:-1].astype('float64')
X = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
## Add a bias column to the data
X = np.hstack([np.ones((X.shape[0], 1)), X])
X = MinMaxScaler().fit_transform(X)
Y = df["diagnosis"].map({'M':1,'B':0})
Y = np.array(Y)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)

def Sigmoid(z):
    return 1/(1 + np.exp(-z))

def Hypothesis(theta, x):
    return Sigmoid(x @ theta)

def Cost_Function(X, Y, theta, m):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/float(m) * np.sum(-_y * np.log(hi) - (1-_y) * np.log(1-hi))
    return J

def Cost_Function_Derivative(X, Y, theta, m, alpha):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = alpha/float(m) * X.T @ (hi - _y)
    return J

def Gradient_Descent(X, Y, theta, m, alpha):
    new_theta = theta - Cost_Function_Derivative(X, Y, theta, m, alpha)
    return new_theta

def Accuracy(theta):
    length = len(X_test)
    prediction = (Hypothesis(theta, X_test) > 0.5)
    _y = Y_test.reshape(-1, 1)
    correct = prediction == _y
    my_accuracy = (np.sum(correct) / length)*100
    print('LR Accuracy %: ', my_accuracy)

def Logistic_Regression(X, Y, alpha, theta, num_iters):
    m = len(Y)
    for x in range(num_iters):
        new_theta = Gradient_Descent(X, Y, theta, m, alpha)
        theta = new_theta
        if x % 100 == 0:
            #print('theta: ', theta)
            print('cost: ', Cost_Function(X, Y, theta, m))
    Accuracy(theta)

ep = .012
initial_theta = np.random.rand(X_train.shape[1], 1) * 2 * ep - ep
alpha = 0.5
iterations = 2000
Logistic_Regression(X_train, Y_train, alpha, initial_theta, iterations)
I think I might have a different version of scikit-learn, because I had to change the MinMaxScaler line to make it work. The result is that I can run 10K iterations in the blink of an eye, and applying the model to the test set gives about 97% accuracy.
This also works, using vectorization to calculate the accuracy.
But accuracy is not a recommended metric, as the answer above noted: if the data is not well balanced, you should not use accuracy; use the F1-score instead.
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X.T, Y.T)
LR_predictions = clf.predict(X.T)
print('Accuracy of logistic regression: %d ' % float((np.dot(Y, LR_predictions) +
      np.dot(1 - Y, 1 - LR_predictions)) / float(Y.size) * 100) +
      '% ' + "(percentage of correctly labelled datapoints)")

How to get accurate predictions from Neural Network?

I'm doing a project on water quality prediction using an artificial neural network, implemented in Python. I have completed my prediction model, but the generated predictions are not very accurate.
What I'm doing: I have collected data from a river on a daily basis for the past four and a half years, and I'm predicting the pattern of a specific parameter by feeding in data from past records. Simply put, I need to predict the "turbidity level" of the water in 2015 by feeding in turbidity data from 2012-2014.
The model I have created is not very accurate when I compare its output to the real data I gathered for 2015. Please help me solve this. I have tried changing the hidden layer sizes and the Lambda value.
This is my code:
import xlrd
import numpy as np
from numpy import zeros
from scipy.optimize import minimize
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from scipy import optimize

#Neural Network
class Neural_Network(object):
    def __init__(self, Lambda):
        #Define hyperparameters
        self.inputLayerSize = 2
        self.outputLayerSize = 1
        self.hiddenLayerSize = 10
        #Weights (parameters)
        self.W1 = np.random.randn(self.inputLayerSize, self.hiddenLayerSize)
        self.W2 = np.random.randn(self.hiddenLayerSize, self.outputLayerSize)
        #Regularization parameter:
        self.Lambda = Lambda

    def forward(self, arrayInput):
        #Propagate inputs through the network
        self.z2 = np.dot(arrayInput, self.W1)
        self.a2 = self.sigmoid(self.z2)
        self.z3 = np.dot(self.a2, self.W2)
        yHat = self.sigmoid(self.z3)
        return yHat

    def sigmoid(self, z):
        #Apply sigmoid activation function to scalar, vector, or matrix
        return 1/(1+np.exp(-z))

    def sigmoidPrime(self, z):
        #Gradient of sigmoid
        return np.exp(-z)/((1+np.exp(-z))**2)

    def costFunction(self, arrayInput, arrayOutput):
        #Compute cost for given input and output, using weights already stored in the class.
        self.yHat = self.forward(arrayInput)
        #J = 0.5*sum((arrayOutput-self.yHat)**2)
        #J = 0.5*sum((arrayOutput-self.yHat)**2)/arrayInput.shape[0] + (self.Lambda/2)
        J = 0.5*sum((arrayOutput-self.yHat)**2)/arrayInput.shape[0] + (self.Lambda/2)*sum(sum(self.W1**2), sum(self.W2**2))
        #J = 0.5*sum((arrayOutput-self.yHat)**2)/arrayInput.shape[0] + (self.Lambda/2)*(sum(self.W1**2)+sum(self.W2**2))
        return J

    def costFunctionPrime(self, arrayInput, arrayOutput):
        #Compute derivative with respect to W1 and W2 for a given X and y:
        self.yHat = self.forward(arrayInput)
        delta3 = np.multiply(-(arrayOutput-self.yHat), self.sigmoidPrime(self.z3))
        #Add gradient of regularization term:
        #dJdW2 = np.dot(self.a2.T, delta3) + self.Lambda*self.W2
        dJdW2 = np.dot(self.a2.T, delta3)
        delta2 = np.dot(delta3, self.W2.T)*self.sigmoidPrime(self.z2)
        #Add gradient of regularization term:
        #dJdW1 = np.dot(arrayInput.T, delta2) + self.Lambda*self.W1
        dJdW1 = np.dot(arrayInput.T, delta2)
        return dJdW1, dJdW2

    #Helper functions for interacting with other classes:
    def getParams(self):
        #Get W1 and W2 unrolled into a vector:
        params = np.concatenate((self.W1.ravel(), self.W2.ravel()))
        return params

    def setParams(self, params):
        #Set W1 and W2 using a single parameter vector.
        W1_start = 0
        W1_end = self.hiddenLayerSize * self.inputLayerSize
        self.W1 = np.reshape(params[W1_start:W1_end], (self.inputLayerSize, self.hiddenLayerSize))
        W2_end = W1_end + self.hiddenLayerSize*self.outputLayerSize
        self.W2 = np.reshape(params[W1_end:W2_end], (self.hiddenLayerSize, self.outputLayerSize))

    def computeGradients(self, arrayInput, arrayOutput):
        dJdW1, dJdW2 = self.costFunctionPrime(arrayInput, arrayOutput)
        return np.concatenate((dJdW1.ravel(), dJdW2.ravel()))

    def computeNumericalGradient(self, N, X, y):
        paramsInitial = N.getParams()
        numgrad = np.zeros(paramsInitial.shape)
        perturb = np.zeros(paramsInitial.shape)
        e = 1e-4
        for p in range(len(paramsInitial)):
            #Set perturbation vector
            perturb[p] = e
            N.setParams(paramsInitial + perturb)
            loss2 = N.costFunction(X, y)
            N.setParams(paramsInitial - perturb)
            loss1 = N.costFunction(X, y)
            #Compute numerical gradient
            numgrad[p] = (loss2 - loss1) / (2*e)
            #Return the value we changed back to zero:
            perturb[p] = 0
        #Return params to original value:
        N.setParams(paramsInitial)
        return numgrad

#Trainer class
class trainer(object):
    def __init__(self, N):
        self.N = N

    def costFunctionWrapper(self, params, arrayInput, arrayOutput):
        self.N.setParams(params)
        cost = self.N.costFunction(arrayInput, arrayOutput)
        #grad = self.N.computeGradients(arrayInput, arrayOutput)
        grad = self.N.computeNumericalGradient(self.N, arrayInput, arrayOutput)
        return cost, grad

    def callbackF(self, params):
        self.N.setParams(params)
        self.J.append(self.N.costFunction(self.arrayInput, self.arrayOutput))
        self.testJ.append(self.N.costFunction(self.TestInput, self.TestOutput))

    def train(self, arrayInput, arrayOutput, TestInput, TestOutput):
        #Make internal variables for the callback function:
        self.arrayInput = arrayInput
        self.arrayOutput = arrayOutput
        self.TestInput = TestInput
        self.TestOutput = TestOutput
        #Make empty lists to store costs:
        self.J = []
        self.testJ = []
        params0 = self.N.getParams()
        options = {'maxiter': 200, 'disp': True}
        _res = optimize.minimize(self.costFunctionWrapper, params0, jac=True, method='BFGS',
                                 args=(arrayInput, arrayOutput), options=options, callback=self.callbackF)
        self.N.setParams(_res.x)
        self.optimizationResults = _res

#Main program
path = r"F:\prototype\newdata\tody\turbidity\c.xlsx"
book = xlrd.open_workbook(path)
input1 = []
output = []
testinput = []
testoutput = []
#training data set
first_sheet = book.sheet_by_index(1)
for row in range(first_sheet.ncols-1):
    input1.append(first_sheet.col_values(row))
for row in range((first_sheet.ncols-1), first_sheet.ncols):
    output.append(first_sheet.col_values(row))
arrayInput = np.asarray(input1)
arrayInput = arrayInput.T
arrayOutput = np.asarray(output)
arrayOutput = arrayOutput.T
#testing data set
first_sheet1 = book.sheet_by_index(0)
for row in range(first_sheet1.ncols-1):
    testinput.append(first_sheet1.col_values(row))
for row in range((first_sheet1.ncols-1), first_sheet1.ncols):
    testoutput.append(first_sheet1.col_values(row))
TestInput = np.asarray(testinput)
TestInput = TestInput.T
TestOutput = np.asarray(testoutput)
TestOutput = TestOutput.T
#2016
input2016 = []
first_sheet2 = book.sheet_by_index(2)
for row in range(first_sheet2.ncols):
    input2016.append(first_sheet2.col_values(row))
Input = np.asarray(input2016)
Input = Input.T
#Scaling
arrayInput = arrayInput / np.amax(arrayInput, axis=0)
arrayOutput = arrayOutput / np.amax(arrayOutput, axis=0)
TestInput = TestInput / np.amax(TestInput, axis=0)
Input = Input / np.amax(Input, axis=0)
TestOutput = TestOutput / np.amax(TestOutput, axis=0)

NN = Neural_Network(Lambda=0.00000000000001)
T = trainer(NN)
T.train(arrayInput, arrayOutput, TestInput, TestOutput)
print(NN.costFunctionPrime(arrayInput, arrayOutput))
Output = NN.forward(Input)
print(Output)
print('----------')
#print(TestOutput)
#plt.plot(T.J)
plt.plot(Output)
plt.grid(1)
plt.xlabel('Iterations')
plt.ylabel('cost')
plt.show()
(In the plot, "Turbidity" means the 2015 real data and "prediction" means the data predicted using this code.)
Some of the comments suggest scaling the output sigmoidal layer to match the correct data. If you look at your predictions, you will see that with some scaling they are pretty accurate. I advise against scaling a sigmoidal function, however.
A sigmoidal output is meant to be interpreted as a probability (given that certain constraints are followed), so scaling it would break that contract and could give undefined results. What happens if you scale from 0-100, but then start receiving training targets larger than 100? (This assumes you are training an online system; otherwise that example may not be relevant.)
I would change your code to use a linear output layer. This would not require any manipulation of the data after training the network. Also, given that your cost function is least squares, the linear output layer will be convex (which reduces the number of local optima that your algorithm can get stuck in).
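A sketch of what that change could look like in the question's Neural_Network class, assuming the squared-error cost is kept: only the output activation and its matching backpropagation term change (with an identity output, the delta at the output layer is simply the error):

def forward(self, arrayInput):
    #Hidden layer keeps the sigmoid; the output layer becomes linear (identity)
    self.z2 = np.dot(arrayInput, self.W1)
    self.a2 = self.sigmoid(self.z2)
    self.z3 = np.dot(self.a2, self.W2)
    yHat = self.z3  #linear output: no squashing, so the targets need no rescaling
    return yHat

def costFunctionPrime(self, arrayInput, arrayOutput):
    self.yHat = self.forward(arrayInput)
    #With a linear output and squared-error cost, the output delta is the raw error
    delta3 = -(arrayOutput - self.yHat)
    dJdW2 = np.dot(self.a2.T, delta3)
    delta2 = np.dot(delta3, self.W2.T) * self.sigmoidPrime(self.z2)
    dJdW1 = np.dot(arrayInput.T, delta2)
    return dJdW1, dJdW2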
