2 layers MLP (Relu) + Softmax
After 20 iterations, Tensor Flow just gives up and stops updating any weights or biases.
I initially thought that my ReLu where dying, so I displayed histograms to make sure none of them where 0. And none of them are !
They just stop changing after few iterations and cross entropy is still high. ReLu, Sigmoid and tanh gives the same results. Tweaking GradientDescentOptimizer from 0.01 to 0.5 also doesn't change much.
There has to be a bug somewhere. Like an actual bug in my code. I can't even overfit a small sample set !
Here are my histograms and here's my code, if anyone could check it out, that would be a major help.
We have 3000 scalars with 6 values between 0 and 255
to classify in two classes : [1,0] or [0,1]
(I made sure to randomise the order)
def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
with tf.name_scope(layer_name):
weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
tf.summary.histogram('weights', weights)
biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
tf.summary.histogram('biases', biases)
preactivate = tf.matmul(input_tensor, weights) + biases
tf.summary.histogram('pre_activations', preactivate)
#act=tf.nn.relu
activations = act(preactivate, name='activation')
tf.summary.histogram('activations', activations)
return activations
#We have 3000 scalars with 6 values between 0 and 255 to classify in two classes
x = tf.placeholder(tf.float32, [None, 6])
y = tf.placeholder(tf.float32, [None, 2])
#After normalisation, input is between 0 and 1
normalised = tf.scalar_mul(1/255,x)
#Two layers
hidden1 = nn_layer(normalised, 6, 4, "hidden1")
hidden2 = nn_layer(hidden1, 4, 2, "hidden2")
#Finish by a softmax
softmax = tf.nn.softmax(hidden2)
#Defining loss, accuracy etc..
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=softmax))
tf.summary.scalar('cross_entropy', cross_entropy)
correct_prediction = tf.equal(tf.argmax(softmax, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar('accuracy', accuracy)
#Init session and writers and misc
session = tf.Session()
train_writer = tf.summary.FileWriter('log', session.graph)
train_writer.add_graph(session.graph)
init= tf.global_variables_initializer()
session.run(init)
merged = tf.summary.merge_all()
#Train
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
batch_x, batch_y = self.trainData
for _ in range(1000):
session.run(train_step, {x: batch_x, y: batch_y})
#Every 10 steps, add to the summary
if _ % 10 == 0:
s = session.run(merged, {x: batch_x, y: batch_y})
train_writer.add_summary(s, _)
#Evaluate
evaluate_x, evaluate_y = self.evaluateData
print(session.run(accuracy, {x: batch_x, y: batch_y}))
print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))
Hidden Layer 1. Output isn't zero, so that's not a dying ReLu problem. but still, weights are constant! TF didn't even try to modify them
Same for Hidden Layer 2. TF tried tweaking them a bit and gave up pretty fast.
Cross entropy does decrease, but stays staggeringly high.
EDIT :
LOTS of mistakes in my code.
First one is 1/255 = 0 in python... Changed it to 1.0/255.0 and my code started to live.
So basically, my input was multiplied by 0 and the neural network just was purely blind. So he tried to get the best result he could while being blind and then gave up. Which explains totally it's reaction.
Now I was applying a softmax twice... Modifying it helped also.
And by strying different learning rates and different number of epoch I finally found something good.
Here is the final working code :
def runModel(self):
def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
with tf.name_scope(layer_name):
#This is standard weight for neural networks with ReLu.
#I divide by math.sqrt(float(6)) because my input has 6 values
weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
tf.summary.histogram('weights', weights)
#I chose this bias myself. It work. Not sure why.
biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
tf.summary.histogram('biases', biases)
preactivate = tf.matmul(input_tensor, weights) + biases
tf.summary.histogram('pre_activations', preactivate)
#Some neurons will have ReLu as activation function
#Some won't have any activation functions
if act == "None":
activations = preactivate
else :
activations = act(preactivate, name='activation')
tf.summary.histogram('activations', activations)
return activations
#We have 3000 scalars with 6 values between 0 and 255 to classify in two classes
x = tf.placeholder(tf.float32, [None, 6])
y = tf.placeholder(tf.float32, [None, 2])
#After normalisation, input is between 0 and 1
#Normalising input really helps. Nothing is doable without it
#But my ERROR was to write 1/255. Becase in python
#1/255 = 0 .... (integer division)
#But 1.0/255.0 = 0,003921568 (float division)
normalised = tf.scalar_mul(1.0/255.0,x)
#Three layers total. The first one is just a matrix multiplication
input = nn_layer(normalised, 6, 4, "input", act="None")
#The second one has a ReLu after a matrix multiplication
hidden1 = nn_layer(input, 4, 4, "hidden", act=tf.nn.relu)
#The last one is also jsut a matrix multiplcation
#WARNING ! No softmax here ! Because later we call a function
#That implicitly does a softmax
#And it's bad practice to do two softmax one after the other
output = nn_layer(hidden1, 4, 2, "output", act="None")
#Tried different learning rates
#Higher learning rate means find a result faster
#But could be a local minimum
#Lower learning rate means we need much more epochs
learning_rate = 0.03
with tf.name_scope('learning_rate_'+str(learning_rate)):
#Defining loss, accuracy etc..
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output))
tf.summary.scalar('cross_entropy', cross_entropy)
correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar('accuracy', accuracy)
#Init session and writers and misc
session = tf.Session()
train_writer = tf.summary.FileWriter('log', session.graph)
train_writer.add_graph(session.graph)
init= tf.global_variables_initializer()
session.run(init)
merged = tf.summary.merge_all()
#Train
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
batch_x, batch_y = self.trainData
for _ in range(1000):
session.run(train_step, {x: batch_x, y: batch_y})
#Every 10 steps, add to the summary
if _ % 10 == 0:
s = session.run(merged, {x: batch_x, y: batch_y})
train_writer.add_summary(s, _)
#Evaluate
evaluate_x, evaluate_y = self.evaluateData
print(session.run(accuracy, {x: batch_x, y: batch_y}))
print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))
I'm afraid that you have to reduce your learning rate. It's to high. High learning rate usually leads you to local minimum not global one.
Try 0.001, 0.0001 or even 0.00001. Or make your learning rate flexible.
I did not checked the code, so firstly try to tune LR.
Just incase someone needs it in the future:
I had initialized my dual layer network's layers with np.random.randn but the network refused to learn. Using the He (for ReLU) and Xavier(for softmax) initializations totally worked.
Related
I am trying to train an RNN without using the RNN API in tensorflow (2) in Python 3.7, so the code is very basic. Something is going really wrong, but I'm not sure what it is.
As a reference, I am using a dataset from this tensorflow tutorial so I know what the error should roughly converge to. My RNN code is the following. What it is trying to do is use the previous 20 timesteps to predict the value of a series at the 21st timestep. I am training in batches of size 256.
While there is a decrease in loss over time, the ceiling is approximately 10x what it is if I follow the tutorial approach. Could it be some problem with the backpropagation through time?
state_size = 20 #dimensionality of the network
BATCH_SIZE = 256
#define recurrent weights and biases. W has 1 more dimension that the state
#dimension as also processes the inputs
W = tf.Variable(np.random.rand(state_size+1, state_size), dtype=tf.float32)
b = tf.Variable(np.zeros((1,state_size)), dtype=tf.float32)
#weights and biases for the output
W2 = tf.Variable(np.random.rand(state_size, 1),dtype=tf.float32)
b2 = tf.Variable(np.zeros((1,1)), dtype=tf.float32)
init_state = tf.Variable(np.random.normal(size=[BATCH_SIZE,state_size]),dtype='float32')
optimizer = tf.keras.optimizers.Adam(1e-3)
losses = []
for epoch in range(20):
with tf.GradientTape() as tape:
loss = 0
for batch_idx in range(200):
current_state = init_state
batchx = x_train_uni[batch_idx*BATCH_SIZE:(batch_idx+1)*BATCH_SIZE].swapaxes(0,1)
batchy = y_train_uni[batch_idx*BATCH_SIZE:(batch_idx+1)*BATCH_SIZE]
#forward pass through the timesteps
for x in batchx:
inst = tf.concat([current_state,x],1) #concatenate state and inputs for that timepoint
current_state = tf.tanh(tf.matmul(inst, W) + b) #
#predict using the hidden state after the full forward pass
pred = tf.matmul(current_state,W2) + b2
loss += tf.reduce_mean(tf.abs(batchy-pred))
#get gradients with respect to parameters
gradients = tape.gradient(loss, [W,b,W2,b2])
#apply gradients
optimizer.apply_gradients(zip(gradients, [W,b,W2,b2]))
losses.append(loss)
print(loss)
I'm building a simple neural network that takes 3 values and gives 2 outputs.
I'm getting an accuracy of 67.5% and an average cost of 0.05
I have a training dataset of 1000 examples and 500 testing examples. I plan on making a larger dataset in the near future.
A little while ago I managed to get an accuracy of about 82% and sometimes a bit higher, but the cost was quite high.
I've been experimenting with adding another layer which is currently in the model and that is the reason I have got the loss under 1.0
I'm not sure what is going wrong, I'm new to Tensorflow and NNs in general.
Here is my code:
import tensorflow as tf
import numpy as np
import sys
sys.path.insert(0, '.../Dataset/Testing/')
sys.path.insert(0, '.../Dataset/Training/')
#other files
from TestDataNormaliser import *
from TrainDataNormaliser import *
learning_rate = 0.01
trainingIteration = 10
batchSize = 100
displayStep = 1
x = tf.placeholder("float", [None, 3])
y = tf.placeholder("float", [None, 2])
#layer 1
w1 = tf.Variable(tf.truncated_normal([3, 4], stddev=0.1))
b1 = tf.Variable(tf.zeros([4]))
y1 = tf.matmul(x, w1) + b1
#layer 2
w2 = tf.Variable(tf.truncated_normal([4, 4], stddev=0.1))
b2 = tf.Variable(tf.zeros([4]))
#y2 = tf.nn.sigmoid(tf.matmul(y1, w2) + b2)
y2 = tf.matmul(y1, w2) + b2
w3 = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1))
b3 = tf.Variable(tf.zeros([2]))
y3 = tf.nn.sigmoid(tf.matmul(y2, w3) + b3) #sigmoid
#output
#wO = tf.Variable(tf.truncated_normal([2, 2], stddev=0.1))
#bO = tf.Variable(tf.zeros([2]))
a = y3 #tf.nn.softmax(tf.matmul(y2, wO) + bO) #y2
a_ = tf.placeholder("float", [None, 2])
#cost function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(a)))
#cross_entropy = -tf.reduce_sum(y*tf.log(a))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
#training
init = tf.global_variables_initializer() #initialises tensorflow
with tf.Session() as sess:
sess.run(init) #runs the initialiser
writer = tf.summary.FileWriter(".../Logs")
writer.add_graph(sess.graph)
merged_summary = tf.summary.merge_all()
for iteration in range(trainingIteration):
avg_cost = 0
totalBatch = int(len(trainArrayValues)/batchSize) #1000/100
#totalBatch = 10
for i in range(batchSize):
start = i
end = i + batchSize #100
xBatch = trainArrayValues[start:end]
yBatch = trainArrayLabels[start:end]
#feeding training data
sess.run(optimizer, feed_dict={x: xBatch, y: yBatch})
i += batchSize
avg_cost += sess.run(cross_entropy, feed_dict={x: xBatch, y: yBatch})/totalBatch
if iteration % displayStep == 0:
print("Iteration:", '%04d' % (iteration + 1), "cost=", "{:.9f}".format(avg_cost))
#
print("Training complete")
predictions = tf.equal(tf.argmax(a, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(predictions, "float"))
print("Accuracy:", accuracy.eval({x: testArrayValues, y: testArrayLabels}))
A few important notes:
You don't have non-linearities between your layers. This means you're training a network which is equivalent to a single-layer network, just with a lot of wasted computation. This is easily solved by adding a simple non-linearity, e.g. tf.nn.relu after each matmul/+ bias line, e.g. y2 = tf.nn.relu(y2) for all bar the last layer.
You are using a numerically unstable cross entropy implementation. I'd encourage you to use tf.nn.sigmoid_cross_entropy_with_logits, and removing your explicit sigmoid call (the input to your sigmoid function is what is generally referred to as the logits, or 'logistic units').
It seems you are not shuffling your dataset as you go. This could be particularly bad given your choice of optimizer, which leads us to...
Stochastic gradient descent is not great. For a boost without adding too much complication, consider using MomentumOptimizer instead. AdamOptimizer is my go-to, but play around with them.
When it comes to writing clean, maintainable code, I'd also encourage you to consider the following:
Use higher level APIs, e.g. tf.layers. It's good you know what's going on at a variable level, but it's easy to make a mistake with all that replicated code, and the default values with the layer implementations are generally pretty good
Consider using the tf.data.Dataset API for your data input. It's a bit scary at first, but it handles a lot of things like batching, shuffling, repeating epochs etc. very nicely
Consider using something like the tf.estimator.Estimator API for handling session runs, summary writing and evaluation.
With all those changes, you might have something that looks like the following (I've left your code in so you can roughly see the equivalent lines).
For graph construction:
def get_logits(features):
"""tf.layers API is cleaner and has better default values."""
# #layer 1
# w1 = tf.Variable(tf.truncated_normal([3, 4], stddev=0.1))
# b1 = tf.Variable(tf.zeros([4]))
# y1 = tf.matmul(x, w1) + b1
x = tf.layers.dense(features, 4, activation=tf.nn.relu)
# #layer 2
# w2 = tf.Variable(tf.truncated_normal([4, 4], stddev=0.1))
# b2 = tf.Variable(tf.zeros([4]))
# y2 = tf.matmul(y1, w2) + b2
x = tf.layers.dense(x, 4, activation=tf.nn.relu)
# w3 = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1))
# b3 = tf.Variable(tf.zeros([2]))
# y3 = tf.nn.sigmoid(tf.matmul(y2, w3) + b3) #sigmoid
# N.B Don't take a non-linearity here.
logits = tf.layers.dense(x, 1, actiation=None)
# remove unnecessary final dimension, batch_size * 1 -> batch_size
logits = tf.squeeze(logits, axis=-1)
return logits
def get_loss(logits, labels):
"""tf.nn.sigmoid_cross_entropy_with_logits is numerically stable."""
# #cost function
# cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(a)))
return tf.nn.sigmoid_cross_entropy_with_logits(
logits=logits, labels=labels)
def get_train_op(loss):
"""There are better options than standard SGD. Try the following."""
learning_rate = 1e-3
# optimizer = tf.train.GradientDescentOptimizer(learning_rate)
optimizer = tf.train.MomentumOptimizer(learning_rate)
# optimizer = tf.train.AdamOptimizer(learning_rate)
return optimizer.minimize(loss)
def get_inputs(feature_data, label_data, batch_size, n_epochs=None,
shuffle=True):
"""
Get features and labels for training/evaluation.
Args:
feature_data: numpy array of feature data.
label_data: numpy array of label data
batch_size: size of batch to be returned
n_epochs: number of epochs to train for. None will result in repeating
forever/until stopped
shuffle: bool flag indicating whether or not to shuffle.
"""
dataset = tf.data.Dataset.from_tensor_slices(
(feature_data, label_data))
dataset = dataset.repeat(n_epochs)
if shuffle:
dataset = dataset.shuffle(len(feature_data))
dataset = dataset.batch(batch_size)
features, labels = dataset.make_one_shot_iterator().get_next()
return features, labels
For session running you could use this like you have (what I'd call 'the hard way')...
features, labels = get_inputs(
trainArrayValues, trainArrayLabels, batchSize, n_epochs, shuffle=True)
logits = get_logits(features)
loss = get_loss(logits, labels)
train_op = get_train_op(loss)
init = tf.global_variables_initializer()
# monitored sessions have the `should_stop` method, which works with datasets
with tf.train.MonitoredSession() as sess:
sess.run(init)
while not sess.should_stop():
# get both loss and optimizer step in the same session run
loss_val, _ = sess.run([loss, train_op])
print(loss_val)
# save variables etc, do evaluation in another graph with different inputs?
but I think you're better off using a tf.estimator.Estimator, though some people prefer tf.keras.Models.
def model_fn(features, labels, mode):
logits = get_logits(features)
loss = get_loss(logits, labels)
train_op = get_train_op(loss)
predictions = tf.greater(logits, 0)
accuracy = tf.metrics.accuracy(labels, predictions)
return tf.estimator.EstimatorSpec(
mode=mode, loss=loss, train_op=train_op,
eval_metric_ops={'accuracy': accuracy}, predictions=predictions)
def train_input_fn():
return get_inputs(trainArrayValues, trainArrayLabels, batchSize)
def eval_input_fn():
return get_inputs(
testArrayValues, testArrayLabels, batchSize, n_epochs=1, shuffle=False)
# Where variables and summaries will be saved to
model_dir = './model'
estimator = tf.estimator.Estimator(model_fn, model_dir)
estimator.train(train_input_fn, max_steps=max_steps)
estimator.evaluate(eval_input_fn)
Note if you use estimators the variables will be saved after training, so you won't need to re-train each time. If you want to reset, just delete the model_dir.
I see that you are using a softmax loss with sigmoidal activation functions in the last layer. Now let me explain the difference between softmax activations and sigmoidal.
You are now allowing the output of the network to be y=(0, 1), y=(1, 0), y=(0, 0) and y=(1, 1). This is because your sigmoidal activations "squish" each element in y between 0 and 1. Your loss function, however, assumes that your y vector sums to one.
What you need to do here is either to penalise the sigmoidal cross entropy function, which looks like this:
-tf.reduce_sum(y*tf.log(a))-tf.reduce_sum((1-y)*tf.log(1-a))
Or, if you want a to sum to one, you need to use softmax activations in your final layer (to get your a's) instead of sigmoids, which is implemented like this
exp_out = tf.exp(y3)
a = exp_out/tf reduce_sum(exp_out)
Ps. I'm using my phone on a train so please excuse typos
I'm having trouble making predictions with a trained neural model on Tensor. Here's my attempt:
import tensorflow as tf
import pandas, numpy as np
dataset=[[0.4,0.5,0.6,0],[0.6,0.7,0.8,1],[0.3,0.8,0.5,2],....]
X = tf.placeholder(tf.float32, [None, 3])
W = tf.Variable(tf.zeros([3,10]))
b = tf.Variable(tf.zeros([10]))
Y1 = tf.matmul(X, W) + b
W1 = tf.Variable(tf.zeros([10, 1]))
b1 = tf.Variable(tf.zeros([1]))
Y = tf.nn.sigmoid(tf.matmul(Y1, W1) + b1)
# placeholder for correct labels
Y_ = tf.placeholder(tf.float32, [None, 1])
init = tf.global_variables_initializer()
# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))
optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)
sess = tf.Session()
sess.run(init)
for i in range(1000):
# load batch of images and correct answers
batch_X, batch_Y = [x[:3] for x in dataset[:4000]],[x[-1:] for x in dataset[:4000]]
train_data={X: batch_X, Y_: batch_Y}
sess.run(train_step, feed_dict=train_data)
correct_prediction = tf.equal(tf.argmax(Y,1), tf.argmax(Y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
a,c = sess.run([accuracy, cross_entropy], feed_dict=train_data)
test, lebs=[x[:3] for x in dataset[4000:]],[x[-1:] for x in dataset[4000:]]
test_data={X: test, Y_: lebs}
a,c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
prediction=tf.argmax(Y,1)
print ("predictions", prediction.eval({X:test}, session=sess))
I got the following results when I ran the above code:
predictions [0 0 0 ..., 0 0 0]
My expected output should be the class labels:
predictions `[0,1,2....]`
I will appreciate your suggestions.
There are multiple problems with your code:
Initialisation: You are zero initializing your weight variable.
W = tf.Variable(tf.zeros([3,10]))
Your model will keep propogating same values at each layer for all types of inputs if you zero initialize it. Initialize it with random values. Ex:
W = tf.Variable(tf.truncated_normal((3,10)))
Loss function: I believe you are trying to replicate this familiar looking equation as your loss function:
y * log(prob) + (1 - y) * log(1 - prob). I believe you are having totally 10 classes. For each of the 10 classes, you will have to substitue the above equation and remember, you will use y value in above equation as either correct class or wrong class i.e 1 or 0 only for each class. Do not substitute y value as class label from 0 to 9.
To avoid all this calculations, I will suggest you to make use of Tensorflow's in-built functions like tf.nn.softmax_cross_entropy_with_logits. It will help you in a long way.
Sigmoid function: This is the prime culprit on why all your outputs are giving value as only 0. The output range of sigmoid is from 0 to 1. Think of replacing it with ReLU.
Output units: If you are doing classification, your number of neurons in final layers should be equal to number of classes. Each class denotes one output class. Replace it with 10 neurons.
Learning rate: Keep playing with your learning rate. I believe your learning rate is little high for such a small network.
Hope you understood the problems in your code. Please Google each of the above point I have mentioned for greater details but I have given you more than enough information to start solving the problem.
Given the DNN (simple case of multilayered perceptron) with 2 hidden layers of 5 and 3 dimensions respectively, I am training a model to recognize the OR gate.
Using tensorflow learn, it seems like it's giving me the reverse output and I have no idea why:
from tensorflow.contrib import learn
classifier = learn.DNNClassifier(hidden_units=[5, 3], n_classes=2)
or_input = np.array([[0.,0.], [0.,1.], [1.,0.]])
or_output = np.array([[0,1,1]]).T
classifier.fit(or_input, or_output, steps=0.05, batch_size=3)
classifier.predict(np.array([ [1., 1.], [1., 0.] , [0., 0.] , [0., 1.]]))
[out]:
array([0, 0, 1, 0])
If I'm doing it "old-school", without the tensorflow.learn as follows, I get the expected answer.
import tensorflow as tf
# Parameters
learning_rate = 1.0
num_epochs = 1000
# Network Parameters
input_dim = 2 # Input dimensions.
hidden_dim_1 = 5 # 1st layer number of features
hidden_dim_2 = 3 # 2nd layer number of features
output_dim = 1 # Output dimensions.
# tf Graph input
x = tf.placeholder("float", [None, input_dim])
y = tf.placeholder("float", [hidden_dim_2, output_dim])
# With biases.
weights = {
'syn0': tf.Variable(tf.random_normal([input_dim, hidden_dim_1])),
'syn1': tf.Variable(tf.random_normal([hidden_dim_1, hidden_dim_2])),
'syn2': tf.Variable(tf.random_normal([hidden_dim_2, output_dim]))
}
biases = {
'b0': tf.Variable(tf.random_normal([hidden_dim_1])),
'b1': tf.Variable(tf.random_normal([hidden_dim_2])),
'b2': tf.Variable(tf.random_normal([output_dim]))
}
# Create a model
def multilayer_perceptron(X, weights, biases):
# Hidden layer 1 + sigmoid activation function
layer_1 = tf.add(tf.matmul(X, weights['syn0']), biases['b0'])
layer_1 = tf.nn.sigmoid(layer_1)
# Hidden layer 2 + sigmoid activation function
layer_2 = tf.add(tf.matmul(layer_1, weights['syn1']), biases['b1'])
layer_2 = tf.nn.sigmoid(layer_2)
# Output layer
out_layer = tf.matmul(layer_2, weights['syn2']) + biases['b2']
out_layer = tf.nn.sigmoid(out_layer)
return out_layer
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
cost = tf.sub(y, pred)
# Or you can use fancy cost like:
##tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
init = tf.initialize_all_variables()
or_input = np.array([[0.,0.], [0.,1.], [1.,0.]])
or_output = np.array([[0.,1.,1.]]).T
# Launch the graph
with tf.Session() as sess:
sess.run(init)
# Training cycle
for epoch in range(num_epochs):
batch_x, batch_y = or_input, or_output # Loop over all data points.
# Run optimization op (backprop) and cost op (to get loss value)
_, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
#print (c)
# Now let's test it on the unknown dataset.
new_inputs = np.array([[1.,1.], [1.,0.]])
feed_dict = {x: new_inputs}
predictions = sess.run(pred, feed_dict)
print (predictions)
[out]:
[[ 0.99998868]
[ 0.99998868]]
Why is it that I am getting the reversed output using tensorflow.learn? Am I doing something wrongly using the tensorflow.learn?
How do I get the tensorflow.learn code to produce the same output as the "old-school" tensorflow framework?
If you specify the right argument for steps you get the good results:
classifier.fit(or_input, or_output, steps=1000, batch_size=3)
Result:
array([1, 1, 0, 1])
How does steps work
The steps argument specifies the number of times you run the training operation. Let me give you some examples:
with batch_size = 16 and steps = 10, you will see a total of 160 examples
in your example, batch_size = 3 and steps = 1000, the algorithm will see 3000 examples. In fact, it will see 1000 times the same 3 examples you provided
So, steps is not the number of epochs, it is the number of times you run the training op, or the number of times you see a new batch.
Why is steps = 0.05 allowed?
In the tf.learn code, they don't check if steps is an integer. They just run a while loop checking that (at this line):
last_step < max_steps
So if max_steps = 0.05, it will behave the same as if max_steps = 1 (last_step is incremented in the loop).
I was trying to use batch normalization to train my Neural Networks using TensorFlow but it was unclear to me how to use the official layer implementation of Batch Normalization (note this is different from the one from the API).
After some painful digging on the their github issues it seems that one needs a tf.cond to use it properly and also a 'resue=True' flag so that the BN shift and scale variables are properly reused. After figuring that out I provided a small description of how I believe is the right way to use it here.
Now I have written a short script to test it (only a single layer and a ReLu, hard to make it smaller than this). However, I am not 100% sure how to test it. Right now my code runs with no error messages but returns NaNs unexpectedly. Which lowers my confidence that the code I gave in the other post might be right. Or maybe the network I have is weird. Either way, does someone know whats wrong? Here is the code:
import tensorflow as tf
# download and install the MNIST data automatically
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm
def batch_norm_layer(x,train_phase,scope_bn):
bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
is_training=True,
reuse=None, # is this right?
trainable=True,
scope=scope_bn)
bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
is_training=False,
reuse=True, # is this right?
trainable=True,
scope=scope_bn)
z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
return z
def get_NN_layer(x, input_dim, output_dim, scope, train_phase):
with tf.name_scope(scope+'vars'):
W = tf.Variable(tf.truncated_normal(shape=[input_dim, output_dim], mean=0.0, stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[output_dim]))
with tf.name_scope(scope+'Z'):
z = tf.matmul(x,W) + b
with tf.name_scope(scope+'BN'):
if train_phase is not None:
z = batch_norm_layer(z,train_phase,scope+'BN_unit')
with tf.name_scope(scope+'A'):
a = tf.nn.relu(z) # (M x D1) = (M x D) * (D x D1)
return a
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# placeholder for data
x = tf.placeholder(tf.float32, [None, 784])
# placeholder that turns BN during training or off during inference
train_phase = tf.placeholder(tf.bool, name='phase_train')
# variables for parameters
hiden_units = 25
layer1 = get_NN_layer(x, input_dim=784, output_dim=hiden_units, scope='layer1', train_phase=train_phase)
# create model
W_final = tf.Variable(tf.truncated_normal(shape=[hiden_units, 10], mean=0.0, stddev=0.1))
b_final = tf.Variable(tf.constant(0.1, shape=[10]))
y = tf.nn.softmax(tf.matmul(layer1, W_final) + b_final)
### training
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean( -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]) )
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
steps = 3000
for iter_step in xrange(steps):
#feed_dict_batch = get_batch_feed(X_train, Y_train, M, phase_train)
batch_xs, batch_ys = mnist.train.next_batch(100)
# Collect model statistics
if iter_step%1000 == 0:
batch_xstrain, batch_xstrain = batch_xs, batch_ys #simualtes train data
batch_xcv, batch_ycv = mnist.test.next_batch(5000) #simualtes CV data
batch_xtest, batch_ytest = mnist.test.next_batch(5000) #simualtes test data
# do inference
train_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xs, y_:batch_ys, train_phase: False})
cv_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xcv, y_:batch_ycv, train_phase: False})
test_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xtest, y_:batch_ytest, train_phase: False})
def do_stuff_with_errors(*args):
print args
do_stuff_with_errors(train_error, cv_error, test_error)
# Run Train Step
sess.run(fetches=train_step, feed_dict={x: batch_xs, y_:batch_ys, train_phase: True})
# list of booleans indicating correct predictions
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
# accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels, train_phase: False}))
when I run it I get:
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(2.3474066, 2.3498712, 2.3461707)
(0.49414295, 0.88536006, 0.91152304)
(0.51632041, 0.393666, nan)
0.9296
it used to be all the last ones were nan and now only a few of them. Is everything fine or am I paranoic?
I am not sure if this will solve your problem, the documentation for BatchNorm is not quite easy-to-use/informative, so here is a short recap on how to use simple BatchNorm:
First of all, you define your BatchNorm layer. If you want to use it after an affine/fully-connected layer, you do this (just an example, order can be different/as you desire):
...
inputs = tf.matmul(inputs, W) + b
inputs = tf.layers.batch_normalization(inputs, training=is_training)
inputs = tf.nn.relu(inputs)
...
The function tf.layers.batch_normalization calls variable-initializers. These are internal-variables and need a special scope to be called, which is in the tf.GraphKeys.UPDATE_OPS. As such, you must call your optimizer function as follows (after all layers have been defined!):
...
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
trainer = tf.train.AdamOptimizer()
updateModel = trainer.minimize(loss, global_step=global_step)
...
You can read more about it here. I know it's a little late to answer your question, but it might help other people coming across BatchNorm problems in tensorflow! :)
training =tf.placeholder(tf.bool, name = 'training')
lr_holder = tf.placeholder(tf.float32, [], name='learning_rate')
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
optimizer = tf.train.AdamOptimizer(learning_rate = lr).minimize(cost)
when defining the layers, you need to use the placeholder 'training'
batchNormal_layer = tf.layers.batch_normalization(pre_batchNormal_layer, training=training)