I'm trying to create a neural network for binary classification on the breast cancer dataset:
https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
My neural network consists of 3 layers (not including the input layer):
first layer: 6 neurons with tanh activation.
second layer: 6 neurons with tanh activation.
final layer: 1 neuron with sigmoid activation.
Unfortunately, I'm only getting around 44% accuracy on the training examples and around 23% accuracy on the test examples.
Here is my Python code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("data.csv")
data = data.drop(['id'], axis = 1)
data = data.drop(data.columns[31], axis = 1)
data = data.replace({'M': 1, 'B': 0})
X = data
X = X.drop(['diagnosis'], axis = 1)
X = np.array(X)
X_mean = np.mean(X, axis = 1, keepdims = True)
X_std = np.std(X, axis = 1, keepdims = True)
X_n = (X - X_mean) / X_std
y = np.array(data['diagnosis'])
y = y.reshape(569, 1)
m = 378
y_train = y[:m, :]
y_test = y[m:, :]
X_train = X_n[:m, :]
X_test = X_n[m:, :]
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
def dsigmoid(z):
    return np.multiply(z, (1 - z))
def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))
def dtanh(z):
    return 1 - np.square(tanh(z))
def cost(A, Y):
    m = Y.shape[0]
    return -(1.0/m) *np.sum( np.dot(Y.T, np.log(A)) + np.dot((1 - Y).T, np.log(1-A)))
def train(X, y ,model, epocs, a):
    W1 = model['W1']
    W2 = model['W2']
    W3 = model['W3']
    b1 = model['b1']
    b2 = model['b2']
    b3 = model['b3']
    costs = []
    for i in range(epocs):
        #forward propagation
        z1 = np.dot(X, W1) + b1
        a1 = tanh(z1)
        z2 = np.dot(a1, W2) + b2
        a2 = tanh(z2)
        z3 = np.dot(a2, W3) + b3
        a3 = sigmoid(z3)
        costs.append(cost(a3, y))
        #back propagation
        dz3 = z3 - y
        d3 = np.multiply(dz3, dsigmoid(z3))
        dW3 = np.dot(a2.T, d3)
        db3 = np.sum(d3, axis = 0, keepdims=True)
        d2 = np.multiply(np.dot(d3, W3.T), dtanh(z2))
        dW2 = np.dot(a1.T, d2)
        db2 = np.sum(d2, axis = 0, keepdims=True)
        d1 = np.multiply(np.dot(d2, W2.T), dtanh(z1))
        dW1 = np.dot(X.T, d1)
        db1 = np.sum(d1, axis = 0, keepdims=True)
        W1 -= (a / m) * dW1
        W2 -= (a / m) * dW2
        W3 -= (a / m) * dW3
        b1 -= (a / m) * db1
        b2 -= (a / m) * db2
        b3 -= (a / m) * db3
    cache = {'W1': W1, 'W2': W2, 'W3': W3, 'b1': b1, 'b2': b2, 'b3': b3}
    return cache, costs
np.random.seed(0)
model = {'W1': np.random.rand(30, 6) * 0.01, 'W2': np.random.rand(6, 6) * 0.01, 'W3': np.random.rand(6, 1) * 0.01, 'b1': np.random.rand(1, 6), 'b2': np.random.rand(1, 6), 'b3': np.random.rand(1, 1)}
model, costss = train(X_train, y_train, model, 1000, 0.1)
plt.plot([i for i in range(1000)], costss)
print(costss[999])
plt.show()
def predict(X,y ,model):
    W1 = model['W1']
    W2 = model['W2']
    W3 = model['W3']
    b1 = model['b1']
    b2 = model['b2']
    b3 = model['b3']
    z1 = np.dot(X, W1) + b1
    a1 = tanh(z1)
    z2 = np.dot(a1, W2) + b2
    a2 = tanh(z2)
    z3 = np.dot(a2, W3) + b3
    a3 = sigmoid(z3)
    m = a3.shape[0]
    y_predict = np.zeros((m, 1))
    for i in range(m):
        y_predict = 1 if a3[i, 0] > 0.5 else 0
    return y_predict
Thanks for helping :)
I think there is a problem with your backpropagation (I ran a quick test of your model in TensorFlow, and it achieves around 92% accuracy on both the training and test data).
I made the following modification to your code:
dz3 = a3 - y
d3 = np.multiply(dz3, dsigmoid(a3))
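Why this change matters: `dsigmoid` computes a·(1 − a), so it expects the sigmoid's output `a3`, not the pre-activation `z3`. The resulting delta (a3 − y)·a3·(1 − a3) is, however, the gradient of a squared-error loss. For the binary cross-entropy that the question's `cost` function plots, the sigmoid derivative cancels and the delta at `z3` reduces to `a3 - y`. The sketch below checks that claim numerically with central finite differences; the five random pre-activations and labels are arbitrary test data, and `bce` is a hypothetical helper that sums the per-example loss:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(a, y):
    # Binary cross-entropy summed over the examples in the batch
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

rng = np.random.default_rng(0)
z3 = rng.normal(size=(5, 1))                         # arbitrary pre-activations
y = rng.integers(0, 2, size=(5, 1)).astype(float)    # arbitrary 0/1 labels

a3 = sigmoid(z3)
delta_analytic = a3 - y   # claimed dLoss/dz3 for sigmoid + cross-entropy

# Central finite difference of the loss with respect to each z3[i]
eps = 1e-6
delta_numeric = np.zeros_like(z3)
for i in range(z3.shape[0]):
    zp, zm = z3.copy(), z3.copy()
    zp[i] += eps
    zm[i] -= eps
    delta_numeric[i] = (bce(sigmoid(zp), y) - bce(sigmoid(zm), y)) / (2 * eps)

# Maximum absolute difference; expected to be close to zero
print(np.max(np.abs(delta_analytic - delta_numeric)))
```

Under that loss, the two lines above would collapse to `d3 = a3 - y`. The answer's version corresponds to a squared-error loss; it is still a valid descent direction, but it does not match the cross-entropy cost being plotted.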
Also, your predict function returns only a single number, whereas it should return one prediction per example. Therefore, instead of
y_predict = np.zeros((m, 1))
for i in range(m):
    y_predict = 1 if a3[i, 0] > 0.5 else 0
return y_predict
I changed this part to
y_predict[a3[:,0] > 0.5] = 1
return y_predict
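The same thresholding can be done in one vectorized step. A minimal sketch, assuming `a3` has shape (m, 1) as in `predict`; `threshold` and `accuracy` are hypothetical helper names, and the four-element arrays are toy values:

```python
import numpy as np

def threshold(a3, cutoff=0.5):
    # One 0/1 label per example, shape (m, 1), like y_predict above
    return (a3 > cutoff).astype(int)

def accuracy(y_pred, y_true):
    # Fraction of examples whose predicted label matches the true label
    return float(np.mean(y_pred == y_true))

a3 = np.array([[0.9], [0.2], [0.6], [0.4]])
y_true = np.array([[1], [0], [0], [0]])
y_pred = threshold(a3)
print(y_pred.ravel())            # [1 0 1 0]
print(accuracy(y_pred, y_true))  # 0.75
```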
I ran the training for 2000 epochs and increased the learning rate to 1 (a=1).
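The question's preprocessing takes the mean and standard deviation with `axis = 1`, which standardizes each sample across its 30 features. Standardizing each feature column instead (`axis=0`) is the more common choice, often with the statistics taken from the training rows only. A sketch under those assumptions; `standardize_features` is a hypothetical helper, and the small matrix stands in for the real `X`:

```python
import numpy as np

def standardize_features(X_train, X_test):
    # Per-feature (column-wise) statistics, computed on the training rows only
    mean = X_train.mean(axis=0, keepdims=True)
    std = X_train.std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std

X = np.array([[10.0, 0.1], [20.0, 0.3], [30.0, 0.2], [40.0, 0.4]])
X_tr, X_te = standardize_features(X[:3], X[3:])
print(X_tr.mean(axis=0))  # close to 0 for every column
print(X_tr.std(axis=0))   # close to 1 for every column
```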
I have a neural network in Python, but it gives almost the exact same prediction for each data point and I can't work out why this is. I have tried altering the features I use to make the predictions but I get the same issue. Thanks for any help.
I have a data file which looks like this:
Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
(The data is from Kaggle.)
My neural network code is this:
import numpy as np
import pandas as pd
data = pd.read_csv("diabetes.csv", header=0)
print(data.head())
training_examples = data[["BloodPressure", "Glucose", "Outcome"]]
X = training_examples[["BloodPressure", "Glucose"]].to_numpy()
y = training_examples[["Outcome"]].to_numpy()
DIMENSIONS = 2
HIDDEN_LAYER = 20
# Set up the training data
# X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# y = np.array([[0], [1], [1], [0]])
# Set the number of epochs and the learning rate
num_epochs = 10
learning_rate = 0.1
# Initialize the weights and biases
w1 = np.random.randn(DIMENSIONS, HIDDEN_LAYER)
b1 = np.zeros((1, HIDDEN_LAYER))
w2 = np.random.randn(HIDDEN_LAYER, 1)
b2 = np.zeros((1, 1))
# Define the sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Define the derivative of the sigmoid function in terms of its output (x here is sigmoid(z), not z)
def sigmoid_derivative(x):
    return x * (1 - x)
# Train the network
for epoch in range(num_epochs):
    # Forward pass
    z1 = np.dot(X, w1) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(a1, w2) + b2
    a2 = sigmoid(z2)
    # Calculate the loss
    loss = np.mean((a2 - y)**2)
    # Print the loss every 100 epochs
    if epoch % 100 == 0:
        print(f'Epoch {epoch}: loss = {loss}')
    # Backpropagation
    dz2 = a2 - y
    dw2 = np.dot(a1.T, dz2)
    db2 = np.sum(dz2, axis=0)
    da1 = np.dot(dz2, w2.T)
    dz1 = da1 * sigmoid_derivative(a1)
    dw1 = np.dot(X.T, dz1)
    db1 = np.sum(dz1, axis=0)
    # Update the weights and biases
    w1 -= learning_rate * dw1
    b1 -= learning_rate * db1
    w2 -= learning_rate * dw2
    b2 -= learning_rate * db2
# Make predictions on the test data
predictions = a2
# Print the predictions
print(predictions)
So I made a simple neural network for MNIST (784 input neurons, 30 hidden neurons, and 10 output neurons), but the cost function (MSE) always increases to 4.5 and never decreases, and the output neurons eventually all just output 1. Here's the code:
np.set_printoptions(suppress=True)
epochs = 50
batch = 60000
learning_rate = 3
B1 = np.random.randn(30, 1)
B2 = np.random.randn(10, 1)
W1 = np.random.randn(784, 30)
W2 = np.random.randn(30, 10)
for i in range(epochs):
    X, Y = shuffle(X, Y)
    c_B1 = np.zeros(B1.shape)
    c_B2 = np.zeros(B2.shape)
    c_W1 = np.zeros(W1.shape)
    c_W2 = np.zeros(W2.shape)
    for b in range(0, np.size(X, 0), batch):
        inputs = X[b:b+batch]
        outputs = Y[b:b+batch]
        Z1 = nn_forward(inputs, W1.T, B1)
        A1 = sigmoid(Z1)
        Z2 = nn_forward(A1, W2.T, B2)
        A2 = sigmoid(Z2)
        e_L = (outputs - A2) * d_sig(Z2)
        e_1 = np.multiply(np.dot(e_L, W2.T), d_sig(Z1))
        d_B2 = np.sum(e_L, axis=0)
        d_B1 = np.sum(e_1, axis=0)
        d_W2 = np.dot(A1.T, e_L)
        d_W1 = np.dot(inputs.T, e_1)
        d_B2 = d_B2.reshape((np.size(B2, 0), 1))
        d_B1 = d_B1.reshape((np.size(B1, 0), 1))
        c_B1 = np.add(c_B1, d_B1)
        c_B2 = np.add(c_B2, d_B2)
        c_W1 = np.add(c_W1, d_W1)
        c_W2 = np.add(c_W2, d_W2)
    B1 = np.subtract(B1, (learning_rate/batch) * c_B1)
    B2 = np.subtract(B2, (learning_rate/batch) * c_B2)
    W1 = np.subtract(W1, (learning_rate/batch) * c_W1)
    W2 = np.subtract(W2, (learning_rate/batch) * c_W2)
    print(i, cost(outputs, A2))
What am I doing wrong?
Two things I notice right away:
Why do you use MSE as the loss function for a classification problem? MSE is usually used for regression problems. Try using cross-entropy.
You have a sigmoid as the output activation, which maps each input x to the interval (0,1). For classification, you should take the argmax of your output vector and use it as the predicted class label.
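A sketch of both suggestions for the 10-class case, using a softmax output with cross-entropy loss (a swap for the question's sigmoid + MSE) on row-wise one-hot targets, as in the question's `outputs` of shape (batch, 10). With this pairing the gradient at the output pre-activation is simply `A2 - outputs`, prediction minus target. That sign is worth checking in the question's code: `e_L = (outputs - A2) * d_sig(Z2)` is target minus prediction, i.e. the negative of the squared-error gradient, so the `np.subtract` updates appear to climb the cost rather than descend it. If `cost` is half the summed squared error, ten outputs all at 1 against a one-hot target give ½ · 9 = 4.5 per example, which matches the reported plateau. The toy logits below are arbitrary:

```python
import numpy as np

def softmax(Z):
    # Subtract the row max for numerical stability; each row sums to 1
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(A, Y):
    # Mean over the batch of -sum_k y_k * log(a_k), with one-hot Y
    return -np.mean(np.sum(Y * np.log(A + 1e-12), axis=1))

Z2 = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])  # toy logits, 3 classes
Y = np.array([[1, 0, 0], [0, 0, 1]])                 # one-hot targets
A2 = softmax(Z2)
e_L = A2 - Y                  # gradient of the summed loss w.r.t. Z2 (divide by batch size for the mean)
pred = np.argmax(A2, axis=1)  # predicted class label per row
print(pred)                   # [0 2]
print(cross_entropy(A2, Y))
```

Because `e_L` here is prediction minus target, subtracting it (scaled by the learning rate) in the parameter updates moves downhill.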
I am trying to implement a neural network which has around 2000 inputs.
I tested it on the Iris dataset to check it, and it seems to work, but when I run my own tests it gives wrong results: most of the time, in all the tests, I get the same output for every data point. I am afraid it is somehow related to the bias handling and the gradient update; maybe you can spot the error or give me some advice.
Here is part of the code for the backpropagation process.
def backward_propagation(parameters, cache, X, Y):
    #weights
    W1 = parameters['W1']
    W2 = parameters['W2']
    #Outputs after activation function
    A1 = cache['A1']
    A2 = cache['A2']
    dZ2= A2 - Y
    dW2 = np.dot(dZ2, A1.T)
    db2 = np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))
    dW1 = np.dot(dZ1, X.T)
    db1 = np.sum(dZ1, axis=1, keepdims=True)
    gradient = {"dW1": dW1,
                "db1": db1,
                "dW2": dW2,
                "db2": db2}
    return gradient
It is very hard to tell whether this works as it should without the prediction and forward functions; with them we could see exactly what is being computed and check whether the backpropagation is correct.
You are not correctly differentiating the sigmoid function, and I think you are not correctly applying the chain rule either. If the hidden layer uses a sigmoid, its derivative is A1*(1 - A1); 1 - np.power(A1, 2) is the derivative of tanh.
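From what I see, you are using this architecture: X → fc1 = X·W1 + b1 → A1 = sigmoid(fc1) → fc2 = A1·W2 + b2 → A2 = sigmoid(fc2), with the squared-error loss L = ½ Σ (A2 − Y)².
Applying the chain rule, the gradients would be:
$$\frac{\partial L}{\partial A_2} = A_2 - Y, \qquad \frac{\partial L}{\partial fc_2} = \frac{\partial L}{\partial A_2} \odot A_2 \odot (1 - A_2)$$
$$\frac{\partial L}{\partial W_2} = A_1^\top \frac{\partial L}{\partial fc_2}, \qquad \frac{\partial L}{\partial b_2} = \sum_{\text{rows}} \frac{\partial L}{\partial fc_2}, \qquad \frac{\partial L}{\partial A_1} = \frac{\partial L}{\partial fc_2} W_2^\top$$
$$\frac{\partial L}{\partial fc_1} = \frac{\partial L}{\partial A_1} \odot A_1 \odot (1 - A_1), \qquad \frac{\partial L}{\partial W_1} = X^\top \frac{\partial L}{\partial fc_1}, \qquad \frac{\partial L}{\partial b_1} = \sum_{\text{rows}} \frac{\partial L}{\partial fc_1}$$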
In your code it is translated in the following way:
W1 = parameters['W1']
W2 = parameters['W2']
#Outputs after activation function
A1 = cache['A1']
A2 = cache['A2']
# Row-major layout: X is (n_examples, n_features), A1 is (n_examples, n_hidden)
dA2 = A2 - Y
dfc2 = dA2*A2*(1 - A2)       # sigmoid'(fc2) = A2*(1 - A2)
dA1 = np.dot(dfc2, W2.T)
dW2 = np.dot(A1.T, dfc2)
db2 = np.sum(dfc2, axis=0)   # bias gradient: dfc2 summed over examples
dfc1 = dA1*A1*(1 - A1)       # sigmoid'(fc1) = A1*(1 - A1)
dW1 = np.dot(X.T, dfc1)
db1 = np.sum(dfc1, axis=0)   # bias gradient: dfc1 summed over examples
gradient = {
    "dW1": dW1,
    "db1": db1,
    "dW2": dW2,
    "db2": db2
}
I checked it with the following code:
import numpy as np
W1 = np.random.rand(30, 10)
b1 = np.random.rand(10)
W2 = np.random.rand(10, 1)
b2 = np.random.rand(1)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
X = np.random.rand(100, 30)
Y = np.ones(shape=(100, 1)) #...
for i in range(10000):
    fc1 = X.dot(W1) + b1
    A1 = sigmoid(fc1)
    fc2 = A1.dot(W2) + b2
    A2 = sigmoid(fc2)
    L = 0.5 * np.sum((A2 - Y)**2)   # dL/dA2 = A2 - Y for this loss
    if i % 1000 == 0:
        print(L)
    dA2 = A2 - Y
    dfc2 = dA2*A2*(1 - A2)
    dA1 = np.dot(dfc2, W2.T)
    dW2 = np.dot(A1.T, dfc2)
    db2 = np.sum(dfc2, axis=0)
    dfc1 = dA1*A1*(1 - A1)
    dW1 = np.dot(X.T, dfc1)
    db1 = np.sum(dfc1, axis=0)
    gradient = {
        "dW1": dW1,
        "db1": db1,
        "dW2": dW2,
        "db2": db2
    }
    # Gradients already have the same shapes as the parameters
    W1 -= 0.1*dW1
    W2 -= 0.1*dW2
    b1 -= 0.1*db1
    b2 -= 0.1*db2
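A more direct check of a backward pass than watching the loss is to compare the analytic gradients with central finite differences on a small random problem. A minimal sketch, assuming the same row-major layout (X is examples × features), sigmoid activations in both layers and the loss L = ½ Σ (A2 − Y)²; the array sizes and the `loss`/`grads` helper names are illustrative choices:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(params, X, Y):
    W1, b1, W2, b2 = params
    A2 = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return 0.5 * np.sum((A2 - Y) ** 2)

def grads(params, X, Y):
    # Analytic backward pass via the chain rule
    W1, b1, W2, b2 = params
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    dfc2 = (A2 - Y) * A2 * (1 - A2)
    dfc1 = (dfc2 @ W2.T) * A1 * (1 - A1)
    return [X.T @ dfc1, dfc1.sum(axis=0), A1.T @ dfc2, dfc2.sum(axis=0)]

rng = np.random.default_rng(0)
X = rng.random((8, 5))
Y = rng.random((8, 1))
params = [rng.normal(size=(5, 4)), rng.normal(size=4),
          rng.normal(size=(4, 1)), rng.normal(size=1)]

analytic = grads(params, X, Y)
eps = 1e-6
max_err = 0.0
for p, g in zip(params, analytic):
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + eps
        lp = loss(params, X, Y)
        p[idx] = old - eps
        lm = loss(params, X, Y)
        p[idx] = old  # restore the parameter
        max_err = max(max_err, abs((lp - lm) / (2 * eps) - g[idx]))
print(max_err)
```

If one of the formulas in `grads` is wrong, `max_err` will typically be of the same order as the gradients themselves rather than near zero.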
If your last activation is a sigmoid, its output lies between 0 and 1. Keep in mind that this is normally interpreted as a probability, and that cross-entropy is the usual loss in that case.
I started learning about neural networks and decided to follow this Google codelab on convolutional neural networks, but using the CIFAR-10 dataset for image classification. However, I get very low accuracy and high cross-entropy.
After training, the accuracy is around 0.1 (never more than 0.2) and the cross-entropy doesn't go below 230. I didn't use batch normalization or dropout, but I would still expect higher accuracy than this.
My code:
import tensorflow as tf
import numpy as np
import matplotlib as mpt
import math
# Just disables the warning, doesn't enable AVX/FMA
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
def makeMiniBatch(dictionary,start,number):
    matrix=np.zeros([number,3072],dtype=np.int)
    labels=np.zeros([number],dtype=np.int)
    for i in range(0,number):
        matrix[i]=dictionary[b'data'][i+start]
        labels[i]=dictionary[b'labels'][i+start]
    return matrix,labels
def formatLabels(labele):
    lab=np.zeros([100,10])
    for i in range(0,100):
        lab[i][labele[i]]=1
    return lab
def formatData(values):
    temp = np.zeros([100,32,32,3])
    for i in range(0,100):
        im_r = values[i][0:1024].reshape(32, 32)
        im_g = values[i][1024:2048].reshape(32, 32)
        im_b = values[i][2048:].reshape(32, 32)
        temp[i] = np.dstack((im_r, im_g, im_b))
    return temp
batch='D:/cifar-10-python/cifar-10-batches-py/data_batch_1'
data=unpickle(batch)
tf.set_random_seed(0)
K = 8
L = 16
M = 32
N = 200
X_=tf.placeholder(tf.float32,[None,32,32,3])
Y_=tf.placeholder(tf.float32,[None,10])
lr = tf.placeholder(tf.float32)
W1 = tf.Variable(tf.truncated_normal([5, 5, 3, K], stddev=0.1))
B1 = tf.Variable(tf.ones([K])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, K, L], stddev=0.1))
B2 = tf.Variable(tf.ones([L])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, L, M], stddev=0.1))
B3 = tf.Variable(tf.ones([M])/10)
W4 = tf.Variable(tf.truncated_normal([8 * 8 * M, N], stddev=0.1))
B4 = tf.Variable(tf.ones([N])/10)
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.ones([10])/10)
stride = 1
Y1_ = tf.nn.conv2d(X_, W1, strides=[1, stride, stride, 1], padding='SAME') + B1
Y1_max=tf.nn.max_pool(Y1_,ksize=[1,2,2,1],strides=[1,1,1,1],padding='SAME')
Y1 = tf.nn.relu(Y1_max)
Y2_ = tf.nn.conv2d(Y1, W2, strides=[1, stride, stride, 1], padding='SAME') + B2
Y2_max=tf.nn.max_pool(Y2_,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')
Y2 = tf.nn.relu(Y2_max)
Y3_ = tf.nn.conv2d(Y2, W3, strides=[1, stride, stride, 1], padding='SAME') + B3
Y3_max=tf.nn.max_pool(Y3_,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')
Y3 = tf.nn.relu(Y3_max)
YY = tf.reshape(Y3, shape=[-1, 8 * 8 * M])
Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Ylogits = tf.matmul(Y4, W5) + B5
Y = tf.nn.softmax(Ylogits)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits,
labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy)*100
correct_prediction=tf.equal(tf.argmax(Y,1),tf.argmax(Y_,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
train_step = tf.train.AdamOptimizer(lr).minimize(cross_entropy)
init=tf.global_variables_initializer()
sess=tf.Session()
sess.run(init)
def training_step(i):
    global data
    val,lab=makeMiniBatch(data,i * 100,100)
    Y_labels=formatLabels(lab)
    X_data=formatData(val)
    max_learning_rate = 0.003
    min_learning_rate = 0.0001
    decay_speed = 2000.0
    learning_rate = min_learning_rate + (max_learning_rate - min_learning_rate) * math.exp(-i/decay_speed)
    _,a,c = sess.run([train_step,accuracy, cross_entropy], feed_dict={X_: X_data, Y_: Y_labels, lr:learning_rate})
    print("Accuracy: ",a)
    print("Cross-Entropy",c)
for i in range (0,100):
    training_step(i%100)
Thanks Maxim, the normalization worked and after 30 seconds of training the network achieved an accuracy of 40%.
The changes I made to my code are the following:
def formatDatanew2(values):
    ret=values.reshape(100,3,32,32).transpose(0,2,3,1).astype("float32")
    ret/=255
    return ret
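One way to confirm that the one-line reshape gives the same image layout as the original `formatData` (three 1024-value colour planes stacked with `np.dstack`), apart from the scaling to [0, 1]. A sketch using random bytes in place of a CIFAR-10 batch; `format_original` and `format_new` restate the two functions above, generalized from the hard-coded 100 images to any batch size:

```python
import numpy as np

def format_original(values):
    # The question's formatData: split the R, G, B planes and stack them as channels
    temp = np.zeros([values.shape[0], 32, 32, 3])
    for i in range(values.shape[0]):
        im_r = values[i][0:1024].reshape(32, 32)
        im_g = values[i][1024:2048].reshape(32, 32)
        im_b = values[i][2048:].reshape(32, 32)
        temp[i] = np.dstack((im_r, im_g, im_b))
    return temp

def format_new(values):
    # (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3), then scale pixels to [0, 1]
    ret = values.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1).astype("float32")
    ret /= 255
    return ret

rng = np.random.default_rng(0)
values = rng.integers(0, 256, size=(4, 3072)).astype(np.uint8)  # stand-in for raw CIFAR-10 rows
same_layout = np.allclose(format_original(values) / 255, format_new(values))
print(same_layout)  # True
```

If the check holds, the original function's layout was already correct, which suggests the improvement comes from feeding float values in [0, 1] instead of raw 0 to 255 integers.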
Here I'm attempting to implement a two-layer neural network using NumPy alone. The code below just computes forward propagation.
The training data consists of two examples, with 5-dimensional inputs and 4-dimensional outputs. When I attempt to run my network:
# Two Layer Neural network
import numpy as np
M = 2
learning_rate = 0.0001
X_train = np.asarray([[1,1,1,1,1] , [1,1,1,1,1]])
Y_train = np.asarray([[0,0,0,0] , [1,0,0,0]])
X_trainT = X_train.T
Y_trainT = Y_train.T
def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s
w1=np.zeros((Y_trainT.shape[0], X_trainT.shape[0]))
b1=np.zeros((Y_trainT.shape[0], 1))
A1 = sigmoid(np.dot(w1 , X_trainT))
w2=np.zeros((A1.shape[0], w1.shape[0]))
b2=np.zeros((A1.shape[0], 1))
A2 = sigmoid(np.dot(w2 , A1))
# forward propagation
dw1 = ( 1 / M ) * np.dot((A1 - A2) , X_trainT.T / M)
db1 = (A1 - A2).mean(axis=1, keepdims=True)
w1 = w1 - learning_rate * dw1
b1 = b1 - learning_rate * db1
dw2 = ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M)
db2 = (A2 - Y_trainT).mean(axis=1, keepdims=True)
w2 = w2 - learning_rate * dw2
b2 = b2 - learning_rate * db2
Y_prediction_train = sigmoid(np.dot(w2 , X_train) +b2)
print(Y_prediction_train.T)
I receive this error:
ValueError Traceback (most recent call last)
<ipython-input-42-f0462b5940a4> in <module>()
36 b2 = b2 - learning_rate * db2
37
---> 38 Y_prediction_train = sigmoid(np.dot(w2 , X_train) +b2)
39 print(Y_prediction_train.T)
ValueError: shapes (4,4) and (2,5) not aligned: 4 (dim 1) != 2 (dim 0)
I seem to have gone astray in my linear algebra, but I'm not sure where.
Printing the weights and corresponding derivatives:
print(w1.shape)
print(w2.shape)
print(dw1.shape)
print(dw2.shape)
prints:
(4, 5)
(4, 4)
(4, 5)
(4, 4)
How to incorporate 5 dimensions of training examples into this network ?
Have I implemented forward propagation correctly ?
Based on Imran's answer, I'm now using this network:
# Two Layer Neural network
import numpy as np
M = 2
learning_rate = 0.0001
X_train = np.asarray([[1,0,1,1,1] , [1,1,1,1,1]])
Y_train = np.asarray([[0,1,0,0] , [1,0,0,0]])
X_trainT = X_train.T
Y_trainT = Y_train.T
def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s
w1=np.zeros((Y_trainT.shape[0], X_trainT.shape[0]))
b1=np.zeros((Y_trainT.shape[0], 1))
A1 = sigmoid(np.dot(w1 , X_trainT))
w2=np.zeros((A1.shape[0], w1.shape[0]))
b2=np.zeros((A1.shape[0], 1))
A2 = sigmoid(np.dot(w2 , A1))
# forward propagation
dw1 = ( 1 / M ) * np.dot((A1 - A2) , X_trainT.T / M)
db1 = (A1 - A2).mean(axis=1, keepdims=True)
w1 = w1 - learning_rate * dw1
b1 = b1 - learning_rate * db1
dw2 = ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M)
db2 = (A2 - Y_trainT).mean(axis=1, keepdims=True)
w2 = w2 - learning_rate * dw2
b2 = b2 - learning_rate * db2
Y_prediction_train = sigmoid(np.dot(w2 , A1) +b2)
print(Y_prediction_train.T)
which prints:
[[ 0.5 0.5 0.4999875 0.4999875]
[ 0.5 0.5 0.4999875 0.4999875]]
I think dw2 = ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M) should instead be
dw2 = ( 1 / M ) * np.dot((A2 - A1) , A1.T / M), in order to propagate differences from hidden layer 1 to hidden layer 2. Is this correct?
Y_prediction_train = sigmoid(np.dot(w2 , X_train) +b2)
w2 is the weight matrix for your second hidden layer. It should never be multiplied by your input, X_train.
To obtain a prediction, you need to factor forward propagation into its own function that takes an input X, first computes A1 = sigmoid(np.dot(w1 , X)), and then returns the result of A2 = sigmoid(np.dot(w2 , A1)).
UPDATE:
I think dw2 = ( 1 / M ) * np.dot((A2 - A1) , Y_trainT.T / M) should instead be dw2 = ( 1 / M ) * np.dot((A2 - A1) , A1.T / M), in order to propagate differences from hidden layer 1 to hidden layer 2. Is this correct?
Backpropagation propagates errors backwards. The first step is to calculate the gradient of the loss function with respect to your outputs, which will be A2 - Y if you are using mean squared error. This is then fed into the gradients of the loss with respect to the weights and biases of layer 2, and so on back to layer 1. You don't want to propagate anything from layer 1 to layer 2 during backprop.
It looks like you almost have it right in your updated question, but I think you want:
dW2 = ( 1 / M ) * np.dot((A2 - Y) , A1.T)
A couple more notes:
You are initializing your weights as zeros. This will not allow the neural network to break symmetry during training, and you will end up with the same weights at every neuron. You should try initializing with random weights in the range [-1,1].
You should put your forward- and backpropagation steps in a loop so you can run them for multiple epochs while your error is still improving.
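Putting those notes together, a minimal sketch of the question's network in its column layout (features × examples), using the question's updated `X_train`/`Y_train`, small random initial weights, a separate `forward` function, and the forward and backward passes inside a training loop. It assumes sigmoid on both layers and, so that the output delta is exactly `A2 - Y`, swaps in a binary cross-entropy loss rather than MSE; `bce`, the hidden size, the learning rate and the epoch count are arbitrary illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, w1, b1, w2, b2):
    # X has shape (n_features, n_examples); returns hidden and output activations
    A1 = sigmoid(w1 @ X + b1)
    A2 = sigmoid(w2 @ A1 + b2)
    return A1, A2

def bce(A2, Y):
    # Binary cross-entropy summed over the 4 outputs, averaged over examples
    return -np.mean(np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2), axis=0))

X_train = np.asarray([[1, 0, 1, 1, 1], [1, 1, 1, 1, 1]])
Y_train = np.asarray([[0, 1, 0, 0], [1, 0, 0, 0]])
X, Y = X_train.T, Y_train.T      # shapes (5, 2) and (4, 2)
M = X.shape[1]                   # number of examples
n_hidden = 4                     # arbitrary hidden-layer size
learning_rate = 1.0              # arbitrary; much larger than the question's 0.0001

rng = np.random.default_rng(0)
w1 = rng.uniform(-1, 1, size=(n_hidden, X.shape[0]))   # random init breaks symmetry
b1 = np.zeros((n_hidden, 1))
w2 = rng.uniform(-1, 1, size=(Y.shape[0], n_hidden))
b2 = np.zeros((Y.shape[0], 1))

losses = []
for epoch in range(2000):
    A1, A2 = forward(X, w1, b1, w2, b2)
    losses.append(bce(A2, Y))
    dZ2 = A2 - Y                          # output delta for sigmoid + cross-entropy
    dw2 = (1 / M) * dZ2 @ A1.T
    db2 = dZ2.mean(axis=1, keepdims=True)
    dZ1 = (w2.T @ dZ2) * A1 * (1 - A1)    # error propagated back to layer 1
    dw1 = (1 / M) * dZ1 @ X.T
    db1 = dZ1.mean(axis=1, keepdims=True)
    w1 -= learning_rate * dw1
    b1 -= learning_rate * db1
    w2 -= learning_rate * dw2
    b2 -= learning_rate * db2

_, Y_prediction_train = forward(X, w1, b1, w2, b2)
print(np.round(Y_prediction_train.T, 2))
```

Keeping `forward` separate means the same function produces the predictions after training, so `w2` is only ever multiplied by hidden activations, never by the raw input.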