So I made a simple neural network for MNIST (784 input neurons, 30 hidden neurons, and 10 output neurons), but the MSE cost only ever increases, plateauing around 4.5, and eventually the output neurons all just output 1. Here's the code:
import numpy as np
# shuffle, nn_forward, sigmoid, d_sig and cost are helper functions defined elsewhere (not shown)

np.set_printoptions(suppress=True)
epochs = 50
batch = 60000
learning_rate = 3
B1 = np.random.randn(30, 1)
B2 = np.random.randn(10, 1)
W1 = np.random.randn(784, 30)
W2 = np.random.randn(30, 10)
for i in range(epochs):
    X, Y = shuffle(X, Y)
    c_B1 = np.zeros(B1.shape)
    c_B2 = np.zeros(B2.shape)
    c_W1 = np.zeros(W1.shape)
    c_W2 = np.zeros(W2.shape)
    for b in range(0, np.size(X, 0), batch):
        inputs = X[b:b+batch]
        outputs = Y[b:b+batch]
        # forward pass
        Z1 = nn_forward(inputs, W1.T, B1)
        A1 = sigmoid(Z1)
        Z2 = nn_forward(A1, W2.T, B2)
        A2 = sigmoid(Z2)
        # backward pass
        e_L = (outputs - A2) * d_sig(Z2)
        e_1 = np.multiply(np.dot(e_L, W2.T), d_sig(Z1))
        d_B2 = np.sum(e_L, axis=0)
        d_B1 = np.sum(e_1, axis=0)
        d_W2 = np.dot(A1.T, e_L)
        d_W1 = np.dot(inputs.T, e_1)
        d_B2 = d_B2.reshape((np.size(B2, 0), 1))
        d_B1 = d_B1.reshape((np.size(B1, 0), 1))
        c_B1 = np.add(c_B1, d_B1)
        c_B2 = np.add(c_B2, d_B2)
        c_W1 = np.add(c_W1, d_W1)
        c_W2 = np.add(c_W2, d_W2)
    # parameter update once per epoch (batch covers the whole training set)
    B1 = np.subtract(B1, (learning_rate/batch) * c_B1)
    B2 = np.subtract(B2, (learning_rate/batch) * c_B2)
    W1 = np.subtract(W1, (learning_rate/batch) * c_W1)
    W2 = np.subtract(W2, (learning_rate/batch) * c_W2)
    print(i, cost(outputs, A2))
What am I doing wrong?
Two things I notice right away:
First: why do you use MSE as the loss function for a classification problem? MSE is usually used for regression; try cross-entropy instead.
Second: you have a sigmoid output activation, which maps each output to the interval (0, 1), so for classification you should take the argmax of the output vector and use that index as the predicted class label.
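As a rough sketch of both suggestions (assuming, as in your code, that A2 is the (batch, 10) matrix of sigmoid outputs and outputs holds the one-hot labels):
eps = 1e-12  # guard against log(0)
# binary cross-entropy summed over the 10 output neurons, averaged over the batch
ce_cost = -np.mean(np.sum(outputs * np.log(A2 + eps)
                          + (1 - outputs) * np.log(1 - A2 + eps), axis=1))
# predicted class label = index of the largest output neuron
predicted_labels = np.argmax(A2, axis=1)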
I have a neural network in Python, but it gives almost exactly the same prediction for each data point, and I can't work out why. I have tried altering the features I use for the predictions, but I get the same issue. Thanks for any help.
I have a data file which looks like this:
Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
from Kaggle.
My neural network code is this:
import numpy as np
import pandas as pd
data = pd.read_csv("diabetes.csv", header=0)
print(data.head())
training_examples = data[["BloodPressure", "Glucose", "Outcome"]]
X = training_examples[["BloodPressure", "Glucose"]].to_numpy()
y = training_examples[["Outcome"]].to_numpy()
DIMENSIONS = 2
HIDDEN_LAYER = 20
# Set up the training data
# X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# y = np.array([[0], [1], [1], [0]])
# Set the number of epochs and the learning rate
num_epochs = 10
learning_rate = 0.1
# Initialize the weights and biases
w1 = np.random.randn(DIMENSIONS, HIDDEN_LAYER)
b1 = np.zeros((1, HIDDEN_LAYER))
w2 = np.random.randn(HIDDEN_LAYER, 1)
b2 = np.zeros((1, 1))
# Define the sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Define the derivative of the sigmoid function
def sigmoid_derivative(x):
    return x * (1 - x)
# Train the network
for epoch in range(num_epochs):
    # Forward pass
    z1 = np.dot(X, w1) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(a1, w2) + b2
    a2 = sigmoid(z2)
    # Calculate the loss
    loss = np.mean((a2 - y)**2)
    # Print the loss every 100 epochs
    if epoch % 100 == 0:
        print(f'Epoch {epoch}: loss = {loss}')
    # Backpropagation
    dz2 = a2 - y
    dw2 = np.dot(a1.T, dz2)
    db2 = np.sum(dz2, axis=0)
    da1 = np.dot(dz2, w2.T)
    dz1 = da1 * sigmoid_derivative(a1)
    dw1 = np.dot(X.T, dz1)
    db1 = np.sum(dz1, axis=0)
    # Update the weights and biases
    w1 -= learning_rate * dw1
    b1 -= learning_rate * db1
    w2 -= learning_rate * dw2
    b2 -= learning_rate * db2
# Make predictions on the test data
predictions = a2
# Print the predictions
print(predictions)
This code is an MLP trained with the backpropagation learning algorithm. I want to implement the correlation rule in place of backpropagation. How can I do that?
def f_forward(x, w1, w2):
    Z_hidden = x.dot(w1)
    A_hidden = sigmoid(Z_hidden)
    Z_out = A_hidden.dot(w2)
    A_out = sigmoid(Z_out)
    return(A_out)
def learning_algorithm(x, Y, w1, w2, alpha):
    # hidden layer
    z1 = x.dot(w1)    # pre-activation of the hidden layer
    a1 = sigmoid(z1)  # output of the hidden layer
    # Output layer
    z2 = a1.dot(w2)   # pre-activation of the output layer
    a2 = sigmoid(z2)  # output of the output layer
    # error in output layer
    d2 = (a2 - Y)
    d1 = np.multiply((w2.dot((d2.transpose()))).transpose(),
                     (np.multiply(a1, 1 - a1)))
    # Gradient for w1 and w2
    w1_adj = x.transpose().dot(d1)
    w2_adj = a1.transpose().dot(d2)
    # Updating parameters
    w1 = w1 - (alpha * (w1_adj))
    w2 = w2 - (alpha * (w2_adj))
    return(w1, w2)
The code below implements the correlation rule and a perceptron rule, but only for a single-layer model; it is not implemented for an MLP with one hidden layer, and I don't know how to change it for my problem.
def correlation(w, x, y, lr):  # correlation (Hebbian) learning rule
    x1 = [1, x[0], x[1]]  # prepend the bias input
    w1 = [ww + lr*y*xx for ww, xx in zip(w, x1)]  # weight update: w += lr * y * x
    return w1
def perceptron(w, x, y, lr):
    x1 = [1, x[0], x[1]]
    wTx = sum([ww*xx for ww, xx in zip(w, x1)])
    o = 1 if wTx >= 0 else -1
    w1 = [ww + lr*(y-o)*xx for ww, xx in zip(w, x1)]
    return w1
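One possible way to carry the correlation (Hebbian) update over to the two-layer network in f_forward is sketched below. This is only an illustration of the pre-times-post update applied to each weight matrix, with the target Y used as the output layer's "post" signal as in the single-layer correlation rule above; it is not an established replacement for backpropagation:
def correlation_learning(x, Y, w1, w2, alpha):
    # forward pass through the hidden layer (same as in f_forward)
    a1 = sigmoid(x.dot(w1))
    # Hebbian / correlation-style updates: each weight changes in proportion to
    # the product of its presynaptic and postsynaptic activity
    w1 = w1 + alpha * x.transpose().dot(a1)   # input -> hidden (hidden activity as "post")
    w2 = w2 + alpha * a1.transpose().dot(Y)   # hidden -> output (target Y as "post")
    return (w1, w2)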
I have a vector x and want to compute a vector y such that y[j] = x[j]**2 using the neural network specified in TensorFlow below. It doesn't work very well; the error is high.
Am I doing something wrong?
Any help will be appreciated
The way it works is it first generates data in Xtrain, Ytrain, Xtest, and Ytest and then creates placeholder variables to get TensorFlow going.
Then it specifies three hidden layers and one output layer. Then it trains, and Ypred, the prediction for Ytest, is created using a feed dictionary.
import numpy as np
import tensorflow as tf
n = 10
k = 1000
n_hidden = 10
learning_rate = .01
training_epochs = 100000
Xtrain = []
Ytrain = []
Xtest = []
Ytest = []
for i in range(0, k, 1):
    X = np.random.randn(1, n)[0]
    Xtrain += [X]
    Ytrain += [Xtrain[-1]**2]
    X = np.random.randn(1, n)[0]
    Xtest += [X]
    Ytest += [Xtest[-1]**2]
x = tf.placeholder(tf.float64,shape = (k,n))
y = tf.placeholder(tf.float64,shape = (k,n))
W1 = tf.Variable(tf.random_normal((n,n_hidden),dtype = tf.float64))
b1 = tf.Variable(tf.random_normal((n_hidden,),dtype = tf.float64))
x_hidden1 = tf.nn.sigmoid(tf.matmul(x,W1) + b1)
W2 = tf.Variable(tf.random_normal((n,n_hidden),dtype = tf.float64))
b2 = tf.Variable(tf.random_normal((n_hidden,),dtype = tf.float64))
x_hidden2 = tf.nn.sigmoid(tf.matmul(x_hidden1,W2) + b2)
W3 = tf.Variable(tf.random_normal((n,n_hidden),dtype = tf.float64))
b3 = tf.Variable(tf.random_normal((n_hidden,),dtype = tf.float64))
x_hidden3 = tf.nn.sigmoid(tf.matmul(x_hidden1,W3) + b3)
W4 = tf.Variable(tf.random_normal((n,n_hidden),dtype = tf.float64))
b4 = tf.Variable(tf.random_normal((n_hidden,),dtype = tf.float64))
y_pred = tf.matmul(x_hidden3,W4) + b4
penalty = tf.reduce_sum(tf.abs((y - y_pred)))
train_op = tf.train.AdamOptimizer(learning_rate).minimize(penalty)
model = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(model)
    for i in range(0, training_epochs):
        sess.run(train_op, {x: Xtrain, y: Ytrain})
    Ypred = y_pred.eval(feed_dict={x: Xtest})
Here is a simple modification of your code:
import numpy as np
import tensorflow as tf
n = 10
k = 1000
learning_rate = 1e-3
training_epochs = 100000
# It will be better for you to use PEP8 style
# None here will allow you to feed data with ANY k size
x = tf.placeholder(tf.float64, shape=(None, n))
y = tf.placeholder(tf.float64, shape=(None, n))
# Use default layer constructors
# from your implementation it uses another random initializer
out = tf.layers.dense(x, 100)
out = tf.layers.batch_normalization(out)
# ReLU is better than sigmoid, there are a lot of articles about it
out = tf.nn.relu(out)
out = tf.layers.dense(out, 200)
out = tf.layers.batch_normalization(out)
out = tf.nn.relu(out)
out = tf.layers.dense(out, n)
# total loss = mean L1 for samples
# each sample is a vector of 10 values, so you need to calculate
# sum along the first axis, and then calculate the mean of the sums
l1 = tf.reduce_mean(tf.reduce_sum(tf.abs(y - out), axis=1))
train_op = tf.train.AdamOptimizer(learning_rate).minimize(l1)
model = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(model)
    for i in range(training_epochs):
        xs = np.random.randn(k, n)
        ys = xs ** 2
        _, l1_value = sess.run(
            [train_op, l1],
            feed_dict={x: xs, y: ys})
        if (i + 1) % 10000 == 0 or i == 0:
            print('Current l1({}/{}) = {}'.format(
                i + 1, training_epochs, l1_value))
    xs = np.random.randn(k, n)
    ys = xs ** 2
    test_l1 = sess.run(l1, feed_dict={x: xs, y: ys})
    print('Total l1 at test = {}'.format(test_l1))
Output:
Current l1(1/100000) = 11.0853215657
Current l1(10000/100000) = 0.126037403282
Current l1(20000/100000) = 0.096445475666
Current l1(30000/100000) = 0.0719392853473
Current l1(40000/100000) = 0.0690671103719
Current l1(50000/100000) = 0.07661241544
Current l1(60000/100000) = 0.0743827124406
Current l1(70000/100000) = 0.0656016587469
Current l1(80000/100000) = 0.0675546809828
Current l1(90000/100000) = 0.0649035400487
Current l1(100000/100000) = 0.0583308788607
Total l1 at test = 0.0613149096968
The total penalty could be reduced further with a different architecture, learning rate, batch size, number of epochs, loss function, etc.
It looks like the architecture could be made larger; training it for a longer period should then get the loss down to around 1e-3.
You can find more information about how this works and how to tune it in the CS231 course.
P.S. A caveat about data feeding: some of the data I tested on may also have appeared during training. Because the task is simple this is OK, but it is better to guarantee that no training sample ends up in the test set.
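One simple way to make that guarantee here (a minimal sketch, assuming the same graph, session and l1 loss as above) is to draw the test set once before the training loop and only ever evaluate on it:
# fixed held-out test set, created once before training starts
xs_test = np.random.randn(k, n)
ys_test = xs_test ** 2
# ... run the training loop exactly as above, feeding only freshly drawn
# training batches to train_op ...
test_l1 = sess.run(l1, feed_dict={x: xs_test, y: ys_test})
print('Total l1 at test = {}'.format(test_l1))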
This code does a lot better. Anyone want to make any further improvements?
import numpy as np
import tensorflow as tf
n = 10
k = 1000
n_hidden = 50
learning_rate = .001
training_epochs = 100000
Xtrain = []
Ytrain = []
Xtest = []
Ytest = []
for i in range(0, k, 1):
    X = np.random.randn(1, n)[0]
    Xtrain += [X]
    Ytrain += [Xtrain[-1]**2]
    X = np.random.randn(1, n)[0]
    Xtest += [X]
    Ytest += [Xtest[-1]**2]
x = tf.placeholder(tf.float64,shape = (k,n))
y = tf.placeholder(tf.float64,shape = (k,n))
W1 = tf.Variable(tf.random_normal((n,n_hidden),dtype = tf.float64))
b1 = tf.Variable(tf.random_normal((n_hidden,),dtype = tf.float64))
x_hidden1 = tf.nn.sigmoid(tf.matmul(x,W1) + b1)
W2 = tf.Variable(tf.random_normal((n_hidden,n_hidden),dtype = tf.float64))
b2 = tf.Variable(tf.random_normal((n_hidden,),dtype = tf.float64))
x_hidden2 = tf.nn.sigmoid(tf.matmul(x_hidden1,W2) + b2)
W3 = tf.Variable(tf.random_normal((n_hidden,n),dtype = tf.float64))
b3 = tf.Variable(tf.random_normal((n,),dtype = tf.float64))
y_pred = tf.matmul(x_hidden2,W3) + b3
penalty = tf.reduce_sum((y - y_pred)**2)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(penalty)
model = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(model)
    for i in range(0, training_epochs):
        sess.run(train_op, {x: Xtrain, y: Ytrain})
    Ypred = y_pred.eval(feed_dict={x: Xtest})
I had a curious experience with Keras.
Info: input dataset shapes
16 features, 5000 observations
target variable: 1 dimension
Problem: Regression
While writing code for students, I developed a toy network in TensorFlow using the following code (I know it is not a complete example, but I hope it gives you enough information):
n1 = 15 # Number of neurons in layer 1
n2 = 15 # Number of neurons in layer 2
n3 = 15
nx = number_of_x_points
n_dim = nx
n4 = 1
stddev_f = 0.1
tf.set_random_seed(5)
X = tf.placeholder(tf.float32, [n_dim, None])
Y = tf.placeholder(tf.float32, [10, None])
W1 = tf.Variable(tf.random_normal([n1, n_dim], stddev=stddev_f))
b1 = tf.Variable(tf.constant(0.0, shape = [n1,1]) )
W2 = tf.Variable(tf.random_normal([n2, n1], stddev=stddev_f))
b2 = tf.Variable(tf.constant(0.0, shape = [n2,1]))
W3 = tf.Variable(tf.random_normal([n3,n2], stddev = stddev_f))
b3 = tf.Variable(tf.constant(0.0, shape = [n3,1]))
W4 = tf.Variable(tf.random_normal([n4,n3], stddev = stddev_f))
b4 = tf.Variable(tf.constant(0.0, shape = [n4,1]))
X = tf.placeholder(tf.float32, [nx, None]) # Inputs
Y = tf.placeholder(tf.float32, [1, None]) # Labels
Z1 = tf.nn.sigmoid(tf.matmul(W1, X) + b1) # n1 x n_dim * n_dim x n_obs = n1 x n_obs
Z2 = tf.nn.sigmoid(tf.matmul(W2, Z1) + b2) # n2 x n1 * n1 * n_obs = n2 x n_obs
Z3 = tf.nn.sigmoid(tf.matmul(W3, Z2) + b3)
Z4 = tf.matmul(W4, Z3) + b4
y_ = tf.sigmoid(Z4)
cost = tf.reduce_mean(tf.square(y_-Y))
learning_rate = 0.005
training_step = tf.train.AdamOptimizer(learning_rate).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
training_epochs = 1000
cost_history = np.empty(shape=[1], dtype = float)
cost_meas_history = np.empty(shape=[1], dtype = float)
train_x = np.transpose(data)
train_y = np.transpose(targets)
cost_history = []
for epoch in range(training_epochs+1):
    for i in range(0, train_x.shape[0], batch_size):
        x_batch = train_x[i:i + batch_size, :]
        y_batch = train_y[i:i + batch_size, :]
        sess.run(training_step, feed_dict={X: x_batch, Y: y_batch})
    cost_ = sess.run(cost, feed_dict={X: train_x, Y: train_y})
    cost_history = np.append(cost_history, cost_)
    if (epoch % 5000 == 0):
        print("Reached epoch", epoch, "cost J =", cost_)
This code works quite well and takes about 5 seconds for 1000 epochs on my laptop. I then built the same network with Keras using the following code:
model = tf.keras.Sequential()
model.add(layers.Dense(15, input_dim=16, activation='sigmoid'))
model.add(layers.Dense(15, activation='sigmoid'))
model.add(layers.Dense(15, activation='sigmoid'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer=tf.train.AdamOptimizer(0.005),
              loss='mse',
              metrics=['mae'])
# Training Phase
model.fit(train_x.transpose(), train_y.transpose()/100.0, epochs=1000, batch_size=100,verbose = 0)
This code takes 43 seconds. Does anyone have any idea why this is the case? I expected Keras to be slower, but not that much slower. What am I missing?
Thanks, Umberto
OK, I found the reason... it was my mistake. Due to a series of errors caused by programming at night after midnight (...), I was actually comparing batch GD with mini-batch GD. My apologies to everyone, and thanks to today, who noticed my mistake...
If someone thinks this should be deleted, that is fine with me.
Now Keras and plain TF are taking exactly the same time. Thanks everyone for reading.
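For anyone who hits the same thing: the timing gap disappears once both versions make the same number of parameter updates per epoch. A minimal sketch (assuming the same model, train_x and train_y as above, with roughly 5000 observations):
# Full-batch GD in Keras: one gradient update per epoch, matching a plain-TF
# loop that feeds the entire training set at once. batch_size=100 instead
# gives ~50 mini-batch updates per epoch, which is where the extra time went.
x_k = train_x.transpose()
y_k = train_y.transpose() / 100.0
model.fit(x_k, y_k, epochs=1000, batch_size=x_k.shape[0], verbose=0)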
Best, Umberto
I'm trying to create a neural network for binary classification on the breast cancer dataset:
https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
My neural network consists of 3 layers (not including the input layer):
first layer: 6 neurons with tanh activation.
second layer: 6 neurons with tanh activation.
final layer: 1 neuron with sigmoid activation.
Unfortunately, I'm only getting around 44% accuracy on the training examples and around 23% accuracy on the test examples.
Here is my python code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("data.csv")
data = data.drop(['id'], axis = 1)
data = data.drop(data.columns[31], axis = 1)
data = data.replace({'M': 1, 'B': 0})
X = data
X = X.drop(['diagnosis'], axis = 1)
X = np.array(X)
X_mean = np.mean(X, axis = 1, keepdims = True)
X_std = np.std(X, axis = 1, keepdims = True)
X_n = (X - X_mean) / X_std
y = np.array(data['diagnosis'])
y = y.reshape(569, 1)
m = 378
y_train = y[:m, :]
y_test = y[m:, :]
X_train = X_n[:m, :]
X_test = X_n[m:, :]
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
def dsigmoid(z):
    return np.multiply(z, (1 - z))
def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))
def dtanh(z):
    return 1 - np.square(tanh(z))
def cost(A, Y):
    m = Y.shape[0]
    return -(1.0/m) * np.sum(np.dot(Y.T, np.log(A)) + np.dot((1 - Y).T, np.log(1 - A)))
def train(X, y, model, epocs, a):
    W1 = model['W1']
    W2 = model['W2']
    W3 = model['W3']
    b1 = model['b1']
    b2 = model['b2']
    b3 = model['b3']
    costs = []
    for i in range(epocs):
        #forward propagation
        z1 = np.dot(X, W1) + b1
        a1 = tanh(z1)
        z2 = np.dot(a1, W2) + b2
        a2 = tanh(z2)
        z3 = np.dot(a2, W3) + b3
        a3 = sigmoid(z3)
        costs.append(cost(a3, y))
        #back propagation
        dz3 = z3 - y
        d3 = np.multiply(dz3, dsigmoid(z3))
        dW3 = np.dot(a2.T, d3)
        db3 = np.sum(d3, axis=0, keepdims=True)
        d2 = np.multiply(np.dot(d3, W3.T), dtanh(z2))
        dW2 = np.dot(a1.T, d2)
        db2 = np.sum(d2, axis=0, keepdims=True)
        d1 = np.multiply(np.dot(d2, W2.T), dtanh(z1))
        dW1 = np.dot(X.T, d1)
        db1 = np.sum(d1, axis=0, keepdims=True)
        W1 -= (a / m) * dW1
        W2 -= (a / m) * dW2
        W3 -= (a / m) * dW3
        b1 -= (a / m) * db1
        b2 -= (a / m) * db2
        b3 -= (a / m) * db3
    cache = {'W1': W1, 'W2': W2, 'W3': W3, 'b1': b1, 'b2': b2, 'b3': b3}
    return cache, costs
np.random.seed(0)
model = {'W1': np.random.rand(30, 6) * 0.01, 'W2': np.random.rand(6, 6) * 0.01, 'W3': np.random.rand(6, 1) * 0.01, 'b1': np.random.rand(1, 6), 'b2': np.random.rand(1, 6), 'b3': np.random.rand(1, 1)}
model, costss = train(X_train, y_train, model, 1000, 0.1)
plt.plot([i for i in range(1000)], costss)
print(costss[999])
plt.show()
def predict(X, y, model):
    W1 = model['W1']
    W2 = model['W2']
    W3 = model['W3']
    b1 = model['b1']
    b2 = model['b2']
    b3 = model['b3']
    z1 = np.dot(X, W1) + b1
    a1 = tanh(z1)
    z2 = np.dot(a1, W2) + b2
    a2 = tanh(z2)
    z3 = np.dot(a2, W3) + b3
    a3 = sigmoid(z3)
    m = a3.shape[0]
    y_predict = np.zeros((m, 1))
    for i in range(m):
        y_predict = 1 if a3[i, 0] > 0.5 else 0
    return y_predict
Thanks for helping :)
I think there is a problem with your backpropagation (I made a quick test and tried your model in TensorFlow, and it achieves around 92% accuracy on both the train and test data).
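(A quick sanity check of that kind could look roughly like the sketch below, assuming the same 30 input features and the layer sizes described in the question; this is just an illustration, not necessarily the exact code used for the test.)
import tensorflow as tf
check = tf.keras.Sequential()
check.add(tf.keras.layers.Dense(6, input_dim=30, activation='tanh'))
check.add(tf.keras.layers.Dense(6, activation='tanh'))
check.add(tf.keras.layers.Dense(1, activation='sigmoid'))
check.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
check.fit(X_train, y_train, epochs=200, verbose=0)
print(check.evaluate(X_test, y_test, verbose=0))  # [loss, accuracy]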
I've made the following modification to your code:
dz3 = a3 - y
d3 = np.multiply(dz3, dsigmoid(a3))
Also, your predict function returns only a single number, whereas it should return as many numbers as there are examples. Therefore, instead of
y_predict = np.zeros((m, 1))
for i in range(m):
    y_predict = 1 if a3[i, 0] > 0.5 else 0
return y_predict
I changed this part to
y_predict = np.zeros((m, 1))
y_predict[a3[:, 0] > 0.5] = 1
return y_predict
I ran the training for 2000 epochs and increased the learning rate to 1 (a = 1).
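With those changes, accuracy can be checked with something like the following sketch (assuming the fixed predict function and the train/test split from the question):
y_pred_train = predict(X_train, y_train, model)
y_pred_test = predict(X_test, y_test, model)
# fraction of examples where the predicted label matches the true label
print("train accuracy:", np.mean(y_pred_train == y_train))
print("test accuracy:", np.mean(y_pred_test == y_test))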