Low accuracy with low loss with TensorFlow - python

I'm building a simple neural network that takes 3 values and gives 2 outputs.
I'm getting an accuracy of 67.5% and an average cost of 0.05
I have a training dataset of 1000 examples and 500 testing examples. I plan on making a larger dataset in the near future.
A little while ago I managed to get an accuracy of about 82% and sometimes a bit higher, but the cost was quite high.
I've been experimenting with adding another layer, which is currently in the model, and that's how I got the loss under 1.0.
I'm not sure what is going wrong, I'm new to Tensorflow and NNs in general.
Here is my code:
import tensorflow as tf
import numpy as np
import sys
sys.path.insert(0, '.../Dataset/Testing/')
sys.path.insert(0, '.../Dataset/Training/')
#other files
from TestDataNormaliser import *
from TrainDataNormaliser import *
learning_rate = 0.01
trainingIteration = 10
batchSize = 100
displayStep = 1
x = tf.placeholder("float", [None, 3])
y = tf.placeholder("float", [None, 2])
#layer 1
w1 = tf.Variable(tf.truncated_normal([3, 4], stddev=0.1))
b1 = tf.Variable(tf.zeros([4]))
y1 = tf.matmul(x, w1) + b1
#layer 2
w2 = tf.Variable(tf.truncated_normal([4, 4], stddev=0.1))
b2 = tf.Variable(tf.zeros([4]))
#y2 = tf.nn.sigmoid(tf.matmul(y1, w2) + b2)
y2 = tf.matmul(y1, w2) + b2
w3 = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1))
b3 = tf.Variable(tf.zeros([2]))
y3 = tf.nn.sigmoid(tf.matmul(y2, w3) + b3) #sigmoid
#output
#wO = tf.Variable(tf.truncated_normal([2, 2], stddev=0.1))
#bO = tf.Variable(tf.zeros([2]))
a = y3 #tf.nn.softmax(tf.matmul(y2, wO) + bO) #y2
a_ = tf.placeholder("float", [None, 2])
#cost function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(a)))
#cross_entropy = -tf.reduce_sum(y*tf.log(a))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
#training
init = tf.global_variables_initializer() #initialises tensorflow
with tf.Session() as sess:
    sess.run(init)  #runs the initialiser
    writer = tf.summary.FileWriter(".../Logs")
    writer.add_graph(sess.graph)
    merged_summary = tf.summary.merge_all()

    for iteration in range(trainingIteration):
        avg_cost = 0
        totalBatch = int(len(trainArrayValues)/batchSize)  #1000/100
        #totalBatch = 10
        for i in range(batchSize):
            start = i
            end = i + batchSize  #100
            xBatch = trainArrayValues[start:end]
            yBatch = trainArrayLabels[start:end]
            #feeding training data
            sess.run(optimizer, feed_dict={x: xBatch, y: yBatch})
            i += batchSize
            avg_cost += sess.run(cross_entropy, feed_dict={x: xBatch, y: yBatch})/totalBatch
        if iteration % displayStep == 0:
            print("Iteration:", '%04d' % (iteration + 1), "cost=", "{:.9f}".format(avg_cost))
    #
    print("Training complete")

    predictions = tf.equal(tf.argmax(a, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(predictions, "float"))
    print("Accuracy:", accuracy.eval({x: testArrayValues, y: testArrayLabels}))

A few important notes:
You don't have non-linearities between your layers. This means you're training a network which is equivalent to a single-layer network, just with a lot of wasted computation. This is easily solved by adding a simple non-linearity, e.g. tf.nn.relu after each matmul/+ bias line, e.g. y2 = tf.nn.relu(y2), for all but the last layer.
You are using a numerically unstable cross entropy implementation. I'd encourage you to use tf.nn.sigmoid_cross_entropy_with_logits and to remove your explicit sigmoid call (the input to your sigmoid function is what is generally referred to as the logits, or 'logistic units').
It seems you are not shuffling your dataset as you go. This could be particularly bad given your choice of optimizer, which leads us to...
Stochastic gradient descent is not great. For a boost without adding too much complication, consider using MomentumOptimizer instead. AdamOptimizer is my go-to, but play around with them.
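To make those first points concrete, here is a rough sketch of the minimal in-place changes to your existing graph (it reuses your w1/b1 ... w3/b3, x and y, and the shuffle uses numpy, which you already import; treat it as a sketch rather than a drop-in fix):
# Non-linearities between layers (all but the last):
y1 = tf.nn.relu(tf.matmul(x, w1) + b1)
y2 = tf.nn.relu(tf.matmul(y1, w2) + b2)
logits = tf.matmul(y2, w3) + b3  # no sigmoid here: these are the logits

# Numerically stable loss computed on the logits, and a stronger optimizer:
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
optimizer = tf.train.AdamOptimizer(1e-3).minimize(cross_entropy)

# Shuffle the training data at the start of each epoch (assumes numpy arrays):
perm = np.random.permutation(len(trainArrayValues))
trainArrayValues = trainArrayValues[perm]
trainArrayLabels = trainArrayLabels[perm]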
When it comes to writing clean, maintainable code, I'd also encourage you to consider the following:
Use higher level APIs, e.g. tf.layers. It's good you know what's going on at a variable level, but it's easy to make a mistake with all that replicated code, and the default values with the layer implementations are generally pretty good
Consider using the tf.data.Dataset API for your data input. It's a bit scary at first, but it handles a lot of things like batching, shuffling, repeating epochs etc. very nicely
Consider using something like the tf.estimator.Estimator API for handling session runs, summary writing and evaluation.
With all those changes, you might have something that looks like the following (I've left your code in so you can roughly see the equivalent lines).
For graph construction:
def get_logits(features):
    """tf.layers API is cleaner and has better default values."""
    # #layer 1
    # w1 = tf.Variable(tf.truncated_normal([3, 4], stddev=0.1))
    # b1 = tf.Variable(tf.zeros([4]))
    # y1 = tf.matmul(x, w1) + b1
    x = tf.layers.dense(features, 4, activation=tf.nn.relu)
    # #layer 2
    # w2 = tf.Variable(tf.truncated_normal([4, 4], stddev=0.1))
    # b2 = tf.Variable(tf.zeros([4]))
    # y2 = tf.matmul(y1, w2) + b2
    x = tf.layers.dense(x, 4, activation=tf.nn.relu)
    # w3 = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1))
    # b3 = tf.Variable(tf.zeros([2]))
    # y3 = tf.nn.sigmoid(tf.matmul(y2, w3) + b3) #sigmoid
    # N.B. Don't take a non-linearity here.
    logits = tf.layers.dense(x, 1, activation=None)
    # remove unnecessary final dimension, batch_size * 1 -> batch_size
    logits = tf.squeeze(logits, axis=-1)
    return logits


def get_loss(logits, labels):
    """tf.nn.sigmoid_cross_entropy_with_logits is numerically stable."""
    # #cost function
    # cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(a)))
    return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=logits, labels=labels))


def get_train_op(loss):
    """There are better options than standard SGD. Try the following."""
    learning_rate = 1e-3
    # optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    # optimizer = tf.train.AdamOptimizer(learning_rate)
    return optimizer.minimize(loss)
def get_inputs(feature_data, label_data, batch_size, n_epochs=None,
               shuffle=True):
    """
    Get features and labels for training/evaluation.

    Args:
        feature_data: numpy array of feature data.
        label_data: numpy array of label data
        batch_size: size of batch to be returned
        n_epochs: number of epochs to train for. None will result in repeating
            forever/until stopped
        shuffle: bool flag indicating whether or not to shuffle.
    """
    dataset = tf.data.Dataset.from_tensor_slices(
        (feature_data, label_data))
    dataset = dataset.repeat(n_epochs)
    if shuffle:
        dataset = dataset.shuffle(len(feature_data))
    dataset = dataset.batch(batch_size)
    features, labels = dataset.make_one_shot_iterator().get_next()
    return features, labels
For session running you could use this like you have (what I'd call 'the hard way')...
features, labels = get_inputs(
    trainArrayValues, trainArrayLabels, batchSize, n_epochs, shuffle=True)
logits = get_logits(features)
loss = get_loss(logits, labels)
train_op = get_train_op(loss)
init = tf.global_variables_initializer()

# monitored sessions have the `should_stop` method, which works with datasets
with tf.train.MonitoredSession() as sess:
    sess.run(init)
    while not sess.should_stop():
        # get both loss and optimizer step in the same session run
        loss_val, _ = sess.run([loss, train_op])
        print(loss_val)

# save variables etc, do evaluation in another graph with different inputs?
but I think you're better off using a tf.estimator.Estimator, though some people prefer tf.keras.Models.
def model_fn(features, labels, mode):
    logits = get_logits(features)
    loss = get_loss(logits, labels)
    train_op = get_train_op(loss)
    predictions = tf.greater(logits, 0)
    accuracy = tf.metrics.accuracy(labels, predictions)
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, train_op=train_op,
        eval_metric_ops={'accuracy': accuracy}, predictions=predictions)


def train_input_fn():
    return get_inputs(trainArrayValues, trainArrayLabels, batchSize)


def eval_input_fn():
    return get_inputs(
        testArrayValues, testArrayLabels, batchSize, n_epochs=1, shuffle=False)


# Where variables and summaries will be saved to
model_dir = './model'

estimator = tf.estimator.Estimator(model_fn, model_dir)
estimator.train(train_input_fn, max_steps=max_steps)
estimator.evaluate(eval_input_fn)
Note if you use estimators the variables will be saved after training, so you won't need to re-train each time. If you want to reset, just delete the model_dir.

I see that you are using a softmax-style loss with sigmoid activation functions in the last layer. Let me explain the difference between softmax and sigmoid activations.
You are now allowing the output of the network to be y=(0, 1), y=(1, 0), y=(0, 0) and y=(1, 1). This is because your sigmoidal activations "squish" each element in y between 0 and 1. Your loss function, however, assumes that your y vector sums to one.
What you need to do here is either add the missing penalty term to the sigmoid cross entropy, so that it looks like this:
-tf.reduce_sum(y*tf.log(a))-tf.reduce_sum((1-y)*tf.log(1-a))
Or, if you want a to sum to one, you need to use softmax activations in your final layer (to get your a's) instead of sigmoids, which is implemented like this (y3 here being the last layer's pre-activation, i.e. without the sigmoid):
exp_out = tf.exp(y3)
a = exp_out / tf.reduce_sum(exp_out, axis=1, keepdims=True)
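In practice, though, it's safer to let TensorFlow fuse the softmax and the log into a single numerically stable op instead of writing it out by hand. A sketch, again assuming y3 is the last layer's pre-activation:
a = tf.nn.softmax(y3)  # probabilities, only if you need them explicitly
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y3))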
Ps. I'm using my phone on a train so please excuse typos

Related

Tensorflow: Simple 3D Convnet not learning

I am trying to create a simple 3D U-net for image segmentation, just to learn how to use the layers. Therefore I do a 3D convolution with stride 2 and then a transpose deconvolution to get back the same image size. I am also overfitting to a small set (test set) just to see if my network is learning.
I created the same net in Keras and it works just fine. Now I want to create it in TensorFlow, but I've been having trouble with it.
The cost changes slightly, but no matter what I do (reduce the learning rate, add more epochs, add more layers, change the batch size...) the output is always the same. I believe the net is not updating the weights. I am sure I am doing something wrong but I can't find what it is. Any help would be greatly appreciated.
Here is my code:
def forward_propagation(X):
    if (mode == 'train'): print(" --------- Net --------- ")
    # Convolutional Layer 1
    with tf.variable_scope('CONV1'):
        Z1 = tf.layers.conv3d(X, filters=16, kernel_size=[3, 3, 3], strides=[2, 2, 2], padding='SAME', name='S2/conv3d')
        A1 = tf.nn.relu(Z1, name='S2/ReLU')
        if (mode == 'train'): print("Convolutional Layer 1 S2 " + str(A1.get_shape()))
    # DEConvolutional Layer 1
    with tf.variable_scope('DeCONV1'):
        output_deconv1 = tf.stack([X.get_shape()[0], X.get_shape()[1], X.get_shape()[2], X.get_shape()[3], 1])
        dZ1 = tf.layers.conv3d_transpose(A1, filters=1, kernel_size=[3, 3, 3], strides=[2, 2, 2], padding='SAME', name='S2/conv3d_transpose')
        dA1 = tf.nn.relu(dZ1, name='S2/ReLU')
        if (mode == 'train'): print("Deconvolutional Layer 1 S1 " + str(dA1.get_shape()))
    return dA1


def compute_cost(output, target, method='dice_hard_coe'):
    with tf.variable_scope('COST'):
        if (method == 'sigmoid_cross_entropy'):
            # Make them vectors
            output = tf.reshape(output, [-1, output.get_shape().as_list()[0]])
            target = tf.reshape(target, [-1, target.get_shape().as_list()[0]])
            loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=output, labels=target)
            cost = tf.reduce_mean(loss)
    return cost
and the main function for the model:
def model(X_h5, Y_h5, learning_rate=0.009,
          num_epochs=100, minibatch_size=64, print_cost=True):
    ops.reset_default_graph()  # to be able to rerun the model without overwriting tf variables
    #tf.set_random_seed(1)     # to keep results consistent (tensorflow seed)
    #seed = 3                  # to keep results consistent (numpy seed)
    (m, n_D, n_H, n_W, num_channels) = X_h5["test_data"].shape  #TTT
    num_labels = Y_h5["test_mask"].shape[4]  #TTT
    img_size = Y_h5["test_mask"].shape[1]    #TTT
    costs = []       # To keep track of the cost
    accuracies = []  # To keep track of the accuracy

    # Create Placeholders of the correct shape
    X, Y = create_placeholders(n_H, n_W, n_D, minibatch_size)

    # Forward propagation: Build the forward propagation in the tensorflow graph
    nn_output = forward_propagation(X)
    prediction = tf.nn.sigmoid(nn_output)

    # Cost function: Add cost function to tensorflow graph
    cost_method = 'sigmoid_cross_entropy'
    cost = compute_cost(nn_output, Y, cost_method)

    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # Initialize all the variables globally
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        print('------ Training ------')

        # Run the initialization
        tf.local_variables_initializer().run(session=sess)
        sess.run(init)

        # Do the training loop
        for i in range(num_epochs*m):
            # ----- TRAIN -------
            current_epoch = i//m
            patient_start = i-(current_epoch * m)
            patient_end = patient_start + minibatch_size

            current_X_train = np.zeros((minibatch_size, n_D, n_H, n_W, num_channels))
            current_X_train[:,:,:,:,:] = np.array(X_h5["test_data"][patient_start:patient_end,:,:,:,:])  #TTT
            current_X_train = np.nan_to_num(current_X_train)  # make nan zero

            current_Y_train = np.zeros((minibatch_size, n_D, n_H, n_W, num_labels))
            current_Y_train[:,:,:,:,:] = np.array(Y_h5["test_mask"][patient_start:patient_end,:,:,:,:])  #TTT
            current_Y_train = np.nan_to_num(current_Y_train)  # make nan zero

            feed_dict = {X: current_X_train, Y: current_Y_train}
            _, temp_cost = sess.run([optimizer, cost], feed_dict=feed_dict)

            # ----- TEST -------
            # Print the cost every 1/5 epoch
            if ((i % (num_epochs*m/5)) == 0):
                # Calculate the predictions
                test_predictions = np.zeros(Y_h5["test_mask"].shape)
                for j in range(0, X_h5["test_data"].shape[0], minibatch_size):
                    patient_start = j
                    patient_end = patient_start + minibatch_size
                    current_X_test = np.zeros((minibatch_size, n_D, n_H, n_W, num_channels))
                    current_X_test[:,:,:,:,:] = np.array(X_h5["test_data"][patient_start:patient_end,:,:,:,:])
                    current_X_test = np.nan_to_num(current_X_test)  # make nan zero
                    current_Y_test = np.zeros((minibatch_size, n_D, n_H, n_W, num_labels))
                    current_Y_test[:,:,:,:,:] = np.array(Y_h5["test_mask"][patient_start:patient_end,:,:,:,:])
                    current_Y_test = np.nan_to_num(current_Y_test)  # make nan zero
                    feed_dict = {X: current_X_test, Y: current_Y_test}
                    _, current_prediction = sess.run([cost, prediction], feed_dict=feed_dict)
                    test_predictions[j:j + minibatch_size,:,:,:,:] = current_prediction

                costs.append(temp_cost)
                print("[" + str(current_epoch) + "|" + str(num_epochs) + "] " + "Cost : " + str(costs[-1]))
                display_progress(X_h5["test_data"], Y_h5["test_mask"], test_predictions, 5, n_H, n_W)

        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs')
        plt.show()
    return
I call the model with:
model(hdf5_data_file, hdf5_mask_file, num_epochs = 500, minibatch_size = 1, learning_rate = 1e-3)
These are the results that I am currently getting:
Edit:
I have tried reducing the learning rate and it doesn't help. I also tried using the TensorBoard debugger, and the weights are not being updated.
I am not sure why this is happening.
I created the same simple model in Keras and it works fine. I am not sure what I am doing wrong in TensorFlow.
Not sure if you are still looking for help, as I am answering this question half a year after your posted date. :) I've listed my observations and also some suggestions for you to try below. If my primary observation is right... then you probably just need a coffee break / a night of good sleep.
primary observation:
tf.reshape( output, [-1, output.get_shape().as_list()[0]] ) seems wrong. If you prefer to flatten the vector, it should be something like tf.reshape(output,[-1,np.prod(image_shape_list)]).
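For concreteness, a minimal sketch of that flattening, assuming output and target have the same static spatial shape and that tf/np are imported as in your code:
shape_list = output.get_shape().as_list()        # e.g. [batch, n_D, n_H, n_W, 1]
flat_dim = int(np.prod(shape_list[1:]))          # flatten everything except the batch dimension
output_flat = tf.reshape(output, [-1, flat_dim])
target_flat = tf.reshape(target, [-1, flat_dim])
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=output_flat, labels=target_flat)
cost = tf.reduce_mean(loss)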
other observations:
With such a shallow network, I doubt the network has enough spatial resolution to differentiate tumor voxels from non-tumor voxels. Can you show the Keras implementation and its performance compared to the pure TF implementation? I would probably go with 2+ layers; say, with 3 layers, a stride of 2 per layer and an input image width of 256, you would end up with a width of 32 at your deepest encoder layer (a rough sketch follows this list). (If you have limited GPU memory, downsample the input image.)
if changing the loss computation does not work, then, as #bremen_matt mentioned, reduce the LR to, say, 1e-5.
after the basic architecture tweaks, when you "feel" that the network is sort of learning and not stuck, try augmenting the training data, adding dropout and batch norm during training, and then maybe fancying up your loss by adding a discriminator.
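To illustrate the "go deeper" point, here is a rough sketch of a 3-level encoder/decoder in the tf.layers style (filter counts and kernel sizes are my own illustrative choices, not taken from your setup):
def deeper_forward_propagation(X):
    # Encoder: each conv halves the spatial resolution.
    e1 = tf.layers.conv3d(X, 16, [3, 3, 3], strides=[2, 2, 2], padding='SAME', activation=tf.nn.relu)
    e2 = tf.layers.conv3d(e1, 32, [3, 3, 3], strides=[2, 2, 2], padding='SAME', activation=tf.nn.relu)
    e3 = tf.layers.conv3d(e2, 64, [3, 3, 3], strides=[2, 2, 2], padding='SAME', activation=tf.nn.relu)
    # Decoder: transpose convolutions bring the resolution back up.
    d2 = tf.layers.conv3d_transpose(e3, 32, [3, 3, 3], strides=[2, 2, 2], padding='SAME', activation=tf.nn.relu)
    d1 = tf.layers.conv3d_transpose(d2, 16, [3, 3, 3], strides=[2, 2, 2], padding='SAME', activation=tf.nn.relu)
    logits = tf.layers.conv3d_transpose(d1, 1, [3, 3, 3], strides=[2, 2, 2], padding='SAME', activation=None)
    return logits  # raw logits; apply tf.nn.sigmoid only for the predictions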

Generating MNIST numbers using LSTM-CGAN in TensorFlow

Inspired by this article, I'm trying to build a Conditional GAN which will use LSTM to generate MNIST numbers. I hope I'm using the same architecture as in the image below (except for the bidirectional RNN in discriminator, taken from this paper):
When I run this model, I've got very strange results. This image shows my model generating number 3 after each epoch. It should look more like this. It's really bad.
The loss of my discriminator network decreases really fast, down to close to zero. However, the loss of my generator network oscillates around some fixed point (maybe diverging slowly). I really don't know what's happening. Here is the most important part of my code (full code here):
timesteps = 28
X_dim = 28
Z_dim = 100
y_dim = 10

X = tf.placeholder(tf.float32, [None, timesteps, X_dim])  # reshaped MNIST image to 28x28
y = tf.placeholder(tf.float32, [None, y_dim])             # one-hot label
Z = tf.placeholder(tf.float32, [None, timesteps, Z_dim])  # numpy.random.uniform noise in range [-1; 1]

y_timesteps = tf.tile(tf.expand_dims(y, axis=1), [1, timesteps, 1])  # [None, timesteps, y_dim] - replicate y along axis=1

def discriminator(x, y):
    with tf.variable_scope('discriminator', reuse=tf.AUTO_REUSE) as vs:
        inputs = tf.concat([x, y], axis=2)
        D_cell = tf.contrib.rnn.LSTMCell(64)
        output, _ = tf.nn.dynamic_rnn(D_cell, inputs, dtype=tf.float32)
        last_output = output[:, -1, :]
        logit = tf.contrib.layers.fully_connected(last_output, 1, activation_fn=None)
        pred = tf.nn.sigmoid(logit)
        variables = [v for v in tf.all_variables() if v.name.startswith(vs.name)]
        return variables, pred, logit

def generator(z, y):
    with tf.variable_scope('generator', reuse=tf.AUTO_REUSE) as vs:
        inputs = tf.concat([z, y], axis=2)
        G_cell = tf.contrib.rnn.LSTMCell(64)
        output, _ = tf.nn.dynamic_rnn(G_cell, inputs, dtype=tf.float32)
        logit = tf.contrib.layers.fully_connected(output, X_dim, activation_fn=None)
        pred = tf.nn.sigmoid(logit)
        variables = [v for v in tf.all_variables() if v.name.startswith(vs.name)]
        return variables, pred

G_vars, G_sample = run_generator(Z, y_timesteps)
D_vars, D_real, D_logit_real = run_discriminator(X, y_timesteps)
_, D_fake, D_logit_fake = run_discriminator(G_sample, y_timesteps)

D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = -tf.reduce_mean(tf.log(D_fake))

D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=D_vars)
G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=G_vars)
There is most likely something wrong with my model. Anyone could help me make the generator network converge?
There are a few things you can do to improve your network architecture and training phase.
Remove the tf.nn.sigmoid(logit) from both the generator and the discriminator, and return the raw logits instead.
Use a numerically stable function to calculate your losses, and fix them:
D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = -tf.reduce_mean(tf.log(D_fake))
should be:
D_loss_real = tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_real,
    labels=tf.ones_like(D_logit_real))
D_loss_fake = tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_fake,
    labels=tf.zeros_like(D_logit_fake))
D_loss = tf.reduce_mean(D_loss_real + D_loss_fake)
G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_fake,
    labels=tf.ones_like(D_logit_fake)))
Once you've fixed the losses and are using a numerically stable function, things should go better. Also, as a rule of thumb: if there's too much noise in the loss, reduce the learning rate (the default learning rate of Adam is usually too high when training GANs).
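As a concrete starting point for that last remark (this is a common GAN convention, not something from your code), try something like:
D_solver = tf.train.AdamOptimizer(learning_rate=2e-4, beta1=0.5).minimize(D_loss, var_list=D_vars)
G_solver = tf.train.AdamOptimizer(learning_rate=2e-4, beta1=0.5).minimize(G_loss, var_list=G_vars)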
Hope it helps

Why does this neural network learn nothing?

I am learning TensorFlow and was implementing a simple neural network as explained in MNIST for Beginners in TensorFlow docs. Here is the link. The accuracy was about 80-90 %, as expected.
Then, following in the same article, came MNIST for Experts, using a ConvNet. Instead of implementing that, I decided to improve the beginner part. I know about neural nets and how they learn, and the fact that deep networks can perform better than shallow ones. I modified the original program in MNIST for Beginners to implement a neural network with 2 hidden layers of 16 neurons each.
It looks something like this :
Image of Network
Code for it
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

x = tf.placeholder(tf.float32, [None, 784], 'images')
y = tf.placeholder(tf.float32, [None, 10], 'labels')

# We are going to make 2 hidden layers with 16 neurons each
# All the weights in the network
W0 = tf.Variable(dtype=tf.float32, name='InputLayerWeights', initial_value=tf.zeros([784, 16]))
W1 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Weights', initial_value=tf.zeros([16, 16]))
W2 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Weights', initial_value=tf.zeros([16, 10]))

# All the biases for the network
B0 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Biases', initial_value=tf.zeros([16]))
B1 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Biases', initial_value=tf.zeros([16]))
B2 = tf.Variable(dtype=tf.float32, name='OutputLayerBiases', initial_value=tf.zeros([10]))

def build_graph():
    """This function wires up all the biases and weights of the network
    and returns the last layer connections

    :return: returns the activation in the last layer of the network/output layer without softmax
    """
    A1 = tf.nn.relu(tf.matmul(x, W0) + B0)
    A2 = tf.nn.relu(tf.matmul(A1, W1) + B1)
    return tf.matmul(A2, W2) + B2

def print_accuracy(sx, sy, tf_session):
    """This function prints the accuracy of a model at the time of invocation

    :return: None
    """
    correct_prediction = tf.equal(tf.argmax(y), tf.argmax(tf.nn.softmax(build_graph())))
    correct_prediction_float = tf.cast(correct_prediction, dtype=tf.float32)
    accuracy = tf.reduce_mean(correct_prediction_float)
    print(accuracy.eval(feed_dict={x: sx, y: sy}, session=tf_session))

y_predicted = build_graph()
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predicted))
model = tf.train.GradientDescentOptimizer(0.03).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_x, batch_y = mnist.train.next_batch(50)
        if _ % 100 == 0:
            print_accuracy(batch_x, batch_y, sess)
        sess.run(model, feed_dict={x: batch_x, y: batch_y})
The output was expected to be better than what could be achieved with just a single layer (assuming W0 has a shape of [784,10] and B0 has a shape of [10]):
def build_graph():
    return tf.matmul(x, W0) + B0
Instead, the output shows that the network was not training at all. The accuracy never crossed 20% in any iteration.
Output
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
My Question
What is wrong with the above program that it does not generalize at all? How can I improve it more without using convolutional neural networks?
Your main mistake is network symmetry, because you initialized all weights to zeros. As a result, the weights are never updated. Change it to small random numbers and it will start learning. It's ok to initialize biases with zeros.
Another issue is purely technical: the print_accuracy function creates new nodes in the computational graph, and since you're calling it in the loop, the graph gets bloated and will eventually use up all memory (see the small sketch below for one way to catch this early).
You might also want to play with hyper-parameters and make the network bigger to increase its capacity.
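One cheap way to catch that graph-bloat issue is to finalize the graph before the training loop; any accidental op creation inside the loop then raises an error immediately. A small sketch against your existing names:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.graph.finalize()  # adding new ops to the graph after this raises a RuntimeError
    for step in range(1000):
        batch_x, batch_y = mnist.train.next_batch(50)
        sess.run(model, feed_dict={x: batch_x, y: batch_y})  # fine: no new ops created here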
Edit: I also spotted a bug in your accuracy calculation. It should be
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_predicted, 1))
Here's a complete code:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
x = tf.placeholder(tf.float32, [None, 784], 'images')
y = tf.placeholder(tf.float32, [None, 10], 'labels')
W0 = tf.Variable(dtype=tf.float32, name='InputLayerWeights', initial_value=tf.truncated_normal([784, 16]) * 0.001)
W1 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Weights', initial_value=tf.truncated_normal([16, 16]) * 0.001)
W2 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Weights', initial_value=tf.truncated_normal([16, 10]) * 0.001)
B0 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Biases', initial_value=tf.ones([16]))
B1 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Biases', initial_value=tf.ones([16]))
B2 = tf.Variable(dtype=tf.float32, name='OutputLayerBiases', initial_value=tf.ones([10]))
A1 = tf.nn.relu(tf.matmul(x, W0) + B0)
A2 = tf.nn.relu(tf.matmul(A1, W1) + B1)
y_predicted = tf.matmul(A2, W2) + B2
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_predicted, 1))
correct_prediction_float = tf.cast(correct_prediction, dtype=tf.float32)
accuracy = tf.reduce_mean(correct_prediction_float)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predicted))
optimizer = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)
mnist = input_data.read_data_sets('mnist', one_hot=True)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for i in range(20000):
batch_x, batch_y = mnist.train.next_batch(64)
_, cost_val, acc_val = sess.run([optimizer, cross_entropy, accuracy], feed_dict={x: batch_x, y: batch_y})
if i % 100 == 0:
print('cost=%.3f accuracy=%.3f' % (cost_val, acc_val))

MLP(ReLu) stops learning after few iterations. Tensor Flow

2 layers MLP (Relu) + Softmax
After 20 iterations, TensorFlow just gives up and stops updating any weights or biases.
I initially thought that my ReLUs were dying, so I displayed histograms to make sure none of them were 0. And none of them are!
They just stop changing after a few iterations, and the cross entropy is still high. ReLU, sigmoid and tanh give the same results. Tweaking the GradientDescentOptimizer's learning rate from 0.01 to 0.5 also doesn't change much.
There has to be a bug somewhere. Like an actual bug in my code. I can't even overfit a small sample set!
Here are my histograms and here's my code; if anyone could check it out, that would be a major help.
We have 3000 scalars with 6 values between 0 and 255
to classify in two classes : [1,0] or [0,1]
(I made sure to randomise the order)
def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
    with tf.name_scope(layer_name):
        weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
        tf.summary.histogram('weights', weights)
        biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
        tf.summary.histogram('biases', biases)
        preactivate = tf.matmul(input_tensor, weights) + biases
        tf.summary.histogram('pre_activations', preactivate)
        #act=tf.nn.relu
        activations = act(preactivate, name='activation')
        tf.summary.histogram('activations', activations)
        return activations

#We have 3000 scalars with 6 values between 0 and 255 to classify in two classes
x = tf.placeholder(tf.float32, [None, 6])
y = tf.placeholder(tf.float32, [None, 2])

#After normalisation, input is between 0 and 1
normalised = tf.scalar_mul(1/255, x)

#Two layers
hidden1 = nn_layer(normalised, 6, 4, "hidden1")
hidden2 = nn_layer(hidden1, 4, 2, "hidden2")

#Finish by a softmax
softmax = tf.nn.softmax(hidden2)

#Defining loss, accuracy etc..
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=softmax))
tf.summary.scalar('cross_entropy', cross_entropy)

correct_prediction = tf.equal(tf.argmax(softmax, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar('accuracy', accuracy)

#Init session and writers and misc
session = tf.Session()

train_writer = tf.summary.FileWriter('log', session.graph)
train_writer.add_graph(session.graph)

init = tf.global_variables_initializer()
session.run(init)

merged = tf.summary.merge_all()

#Train
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
batch_x, batch_y = self.trainData
for _ in range(1000):
    session.run(train_step, {x: batch_x, y: batch_y})
    #Every 10 steps, add to the summary
    if _ % 10 == 0:
        s = session.run(merged, {x: batch_x, y: batch_y})
        train_writer.add_summary(s, _)

#Evaluate
evaluate_x, evaluate_y = self.evaluateData
print(session.run(accuracy, {x: batch_x, y: batch_y}))
print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))
Hidden layer 1: the output isn't zero, so that's not a dying ReLU problem. But still, the weights are constant! TF didn't even try to modify them.
Same for hidden layer 2: TF tried tweaking them a bit and gave up pretty fast.
Cross entropy does decrease, but stays staggeringly high.
EDIT:
LOTS of mistakes in my code.
The first one is that 1/255 = 0 in Python (integer division)... I changed it to 1.0/255.0 and my code came to life.
So basically, my input was multiplied by 0 and the neural network was purely blind. It tried to get the best result it could while being blind and then gave up, which totally explains its reaction.
I was also applying a softmax twice... fixing that helped as well.
And by trying different learning rates and different numbers of epochs, I finally found something good.
Here is the final working code :
def runModel(self):

    def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
        with tf.name_scope(layer_name):
            #This is a standard weight initialisation for neural networks with ReLU.
            #I divide by math.sqrt(float(6)) because my input has 6 values
            weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
            tf.summary.histogram('weights', weights)
            #I chose this bias myself. It works. Not sure why.
            biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
            tf.summary.histogram('biases', biases)
            preactivate = tf.matmul(input_tensor, weights) + biases
            tf.summary.histogram('pre_activations', preactivate)
            #Some neurons will have ReLU as activation function
            #Some won't have any activation function
            if act == "None":
                activations = preactivate
            else:
                activations = act(preactivate, name='activation')
            tf.summary.histogram('activations', activations)
            return activations

    #We have 3000 scalars with 6 values between 0 and 255 to classify in two classes
    x = tf.placeholder(tf.float32, [None, 6])
    y = tf.placeholder(tf.float32, [None, 2])

    #After normalisation, input is between 0 and 1
    #Normalising the input really helps. Nothing is doable without it
    #But my ERROR was to write 1/255. Because in Python
    #1/255 = 0 .... (integer division)
    #But 1.0/255.0 = 0.003921568 (float division)
    normalised = tf.scalar_mul(1.0/255.0, x)

    #Three layers total. The first one is just a matrix multiplication
    input = nn_layer(normalised, 6, 4, "input", act="None")
    #The second one has a ReLU after a matrix multiplication
    hidden1 = nn_layer(input, 4, 4, "hidden", act=tf.nn.relu)
    #The last one is also just a matrix multiplication
    #WARNING! No softmax here! Because later we call a function
    #that implicitly does a softmax
    #and it's bad practice to do two softmaxes one after the other
    output = nn_layer(hidden1, 4, 2, "output", act="None")

    #Tried different learning rates
    #A higher learning rate means we find a result faster
    #but it could be a local minimum
    #A lower learning rate means we need many more epochs
    learning_rate = 0.03

    with tf.name_scope('learning_rate_'+str(learning_rate)):
        #Defining loss, accuracy etc..
        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output))
        tf.summary.scalar('cross_entropy', cross_entropy)

        correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar('accuracy', accuracy)

    #Init session and writers and misc
    session = tf.Session()

    train_writer = tf.summary.FileWriter('log', session.graph)
    train_writer.add_graph(session.graph)

    init = tf.global_variables_initializer()
    session.run(init)

    merged = tf.summary.merge_all()

    #Train
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
    batch_x, batch_y = self.trainData
    for _ in range(1000):
        session.run(train_step, {x: batch_x, y: batch_y})
        #Every 10 steps, add to the summary
        if _ % 10 == 0:
            s = session.run(merged, {x: batch_x, y: batch_y})
            train_writer.add_summary(s, _)

    #Evaluate
    evaluate_x, evaluate_y = self.evaluateData
    print(session.run(accuracy, {x: batch_x, y: batch_y}))
    print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))
I'm afraid that you have to reduce your learning rate. It's too high. A high learning rate usually leads you to a local minimum, not the global one.
Try 0.001, 0.0001 or even 0.00001. Or make your learning rate flexible.
I did not check the code, so first try tuning the LR.
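For the "flexible learning rate" idea, one option is an exponentially decaying schedule; a sketch with illustrative numbers:
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(0.01, global_step,
                                           decay_steps=100, decay_rate=0.96, staircase=True)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    cross_entropy, global_step=global_step)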
Just in case someone needs it in the future:
I had initialized my two-layer network's layers with np.random.randn, but the network refused to learn. Using the He (for ReLU) and Xavier (for softmax) initializations totally worked.
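In TF 1.x terms that might look roughly like this (layer sizes are placeholders of my own, not from the original network):
he_init = tf.variance_scaling_initializer(scale=2.0)   # He-style init, good before ReLU
xavier_init = tf.glorot_uniform_initializer()          # Xavier/Glorot init, good before softmax
hidden = tf.layers.dense(x, 128, activation=tf.nn.relu, kernel_initializer=he_init)
logits = tf.layers.dense(hidden, 10, kernel_initializer=xavier_init)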

How does one use the official Batch Normalization layer in TensorFlow?

I was trying to use batch normalization to train my Neural Networks using TensorFlow but it was unclear to me how to use the official layer implementation of Batch Normalization (note this is different from the one from the API).
After some painful digging through their GitHub issues, it seems that one needs a tf.cond to use it properly, and also a 'reuse=True' flag so that the BN shift and scale variables are properly reused. After figuring that out, I provided a small description of what I believe is the right way to use it here.
Now I have written a short script to test it (only a single layer and a ReLU, hard to make it smaller than this). However, I am not 100% sure how to test it. Right now my code runs with no error messages but returns NaNs unexpectedly, which lowers my confidence that the code I gave in the other post is right. Or maybe the network I have is weird. Either way, does someone know what's wrong? Here is the code:
import tensorflow as tf
# download and install the MNIST data automatically
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

def batch_norm_layer(x, train_phase, scope_bn):
    bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
                          is_training=True,
                          reuse=None,  # is this right?
                          trainable=True,
                          scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
                              is_training=False,
                              reuse=True,  # is this right?
                              trainable=True,
                              scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z

def get_NN_layer(x, input_dim, output_dim, scope, train_phase):
    with tf.name_scope(scope+'vars'):
        W = tf.Variable(tf.truncated_normal(shape=[input_dim, output_dim], mean=0.0, stddev=0.1))
        b = tf.Variable(tf.constant(0.1, shape=[output_dim]))
    with tf.name_scope(scope+'Z'):
        z = tf.matmul(x, W) + b
    with tf.name_scope(scope+'BN'):
        if train_phase is not None:
            z = batch_norm_layer(z, train_phase, scope+'BN_unit')
    with tf.name_scope(scope+'A'):
        a = tf.nn.relu(z)  # (M x D1) = (M x D) * (D x D1)
    return a

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# placeholder for data
x = tf.placeholder(tf.float32, [None, 784])
# placeholder that turns BN on during training or off during inference
train_phase = tf.placeholder(tf.bool, name='phase_train')
# variables for parameters
hiden_units = 25
layer1 = get_NN_layer(x, input_dim=784, output_dim=hiden_units, scope='layer1', train_phase=train_phase)
# create model
W_final = tf.Variable(tf.truncated_normal(shape=[hiden_units, 10], mean=0.0, stddev=0.1))
b_final = tf.Variable(tf.constant(0.1, shape=[10]))
y = tf.nn.softmax(tf.matmul(layer1, W_final) + b_final)

### training
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    steps = 3000
    for iter_step in xrange(steps):
        #feed_dict_batch = get_batch_feed(X_train, Y_train, M, phase_train)
        batch_xs, batch_ys = mnist.train.next_batch(100)
        # Collect model statistics
        if iter_step % 1000 == 0:
            batch_xstrain, batch_xstrain = batch_xs, batch_ys  # simulates train data
            batch_xcv, batch_ycv = mnist.test.next_batch(5000)  # simulates CV data
            batch_xtest, batch_ytest = mnist.test.next_batch(5000)  # simulates test data
            # do inference
            train_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xs, y_: batch_ys, train_phase: False})
            cv_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xcv, y_: batch_ycv, train_phase: False})
            test_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xtest, y_: batch_ytest, train_phase: False})

            def do_stuff_with_errors(*args):
                print args

            do_stuff_with_errors(train_error, cv_error, test_error)
        # Run Train Step
        sess.run(fetches=train_step, feed_dict={x: batch_xs, y_: batch_ys, train_phase: True})

    # list of booleans indicating correct predictions
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    # accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels, train_phase: False}))
when I run it I get:
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(2.3474066, 2.3498712, 2.3461707)
(0.49414295, 0.88536006, 0.91152304)
(0.51632041, 0.393666, nan)
0.9296
It used to be that all the last ones were NaN, and now only a few of them are. Is everything fine or am I being paranoid?
I am not sure if this will solve your problem; the documentation for BatchNorm is not quite easy to use/informative, so here is a short recap on how to use simple BatchNorm:
First of all, you define your BatchNorm layer. If you want to use it after an affine/fully-connected layer, you do this (just an example, order can be different/as you desire):
...
inputs = tf.matmul(inputs, W) + b
inputs = tf.layers.batch_normalization(inputs, training=is_training)
inputs = tf.nn.relu(inputs)
...
The function tf.layers.batch_normalization creates update ops for its internal moving-average variables, and these are collected under tf.GraphKeys.UPDATE_OPS rather than run automatically. As such, you must wrap your optimizer call as follows (after all layers have been defined!):
...
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    trainer = tf.train.AdamOptimizer()
    updateModel = trainer.minimize(loss, global_step=global_step)
...
You can read more about it here. I know it's a little late to answer your question, but it might help other people coming across BatchNorm problems in tensorflow! :)
training = tf.placeholder(tf.bool, name='training')
lr_holder = tf.placeholder(tf.float32, [], name='learning_rate')
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=lr_holder).minimize(cost)
When defining the layers, you need to use the placeholder 'training':
batchNormal_layer = tf.layers.batch_normalization(pre_batchNormal_layer, training=training)
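At run time you then feed that flag explicitly; roughly like this (x, y, cost and accuracy here stand for whatever your own graph defines):
# Training step: updates the batch-norm moving statistics via the UPDATE_OPS dependency
sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, training: True, lr_holder: 1e-3})
# Evaluation/inference: uses the accumulated moving statistics instead of batch statistics
acc = sess.run(accuracy, feed_dict={x: test_x, y: test_y, training: False})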
