I'm learning how to use TensorFlow with the MNIST tutorial, but I'm stuck on one point of the tutorial.
Here is the code provided:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
saver = tf.train.Saver()
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
But I don't understand at all how the variables "W" (the weights) and "b" (the bias) are changed during the computation.
On each batch, they are initialized to zero, but what happens afterwards?
I don't see where in the code they are changed.
Thank you very much in advance!
TensorFlow variables maintain their state from one run() call to the next. In your program they will be initialized to zero, and then progressively updated in the training loop.
The code that changes the values of the variables is created, implicitly, by this line:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
In TensorFlow, a tf.train.Optimizer is a class that creates operations for updating variables, typically based on the gradient of some tensor (e.g. a loss) with respect to those variables. By default, when you call Optimizer.minimize(), TensorFlow creates operations to update all variables on which the given tensor (in this case cross_entropy) depends.
When you call sess.run(train_step), this runs a graph that includes those update operations, and therefore instructs TensorFlow to update the values of the variables.
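To make the idea concrete without TensorFlow, here is a minimal pure-Python sketch of the same pattern: state that is initialized once and then modified as a side effect of running a "train step". The Variable class, make_train_step and the toy data are hypothetical names invented for this illustration; they only mimic what the optimizer's update operations do and are not TensorFlow's implementation.

```python
class Variable:
    """Toy stand-in for a TensorFlow variable: a value that persists between steps."""

    def __init__(self, value):
        self.value = value


def make_train_step(w, b, learning_rate):
    """Build a function that applies one gradient-descent update to w and b.

    The model is y = w * x + b and the loss is the mean squared error.
    """
    def train_step(xs, ys):
        n = len(xs)
        errors = [(w.value * x + b.value) - y for x, y in zip(xs, ys)]
        grad_w = sum(2.0 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2.0 * e for e in errors) / n
        # The variables are changed here, as a side effect of running the step.
        w.value -= learning_rate * grad_w
        b.value -= learning_rate * grad_b
        return sum(e * e for e in errors) / n  # loss before the update

    return train_step


# Initialized once, before the loop (like tf.Variable(tf.zeros(...)) plus the init op).
w = Variable(0.0)
b = Variable(0.0)
train_step = make_train_step(w, b, learning_rate=0.1)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1

# Each call starts from the values left behind by the previous call.
losses = [train_step(xs, ys) for _ in range(1000)]
print(w.value, b.value, losses[0], losses[-1])
```

The loop never mentions w or b, just as the tutorial loop never mentions W or b; the update is hidden inside the step that gets run.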
Related
The TensorFlow graph API separates graph construction from execution. Because of this, I can't tell on which line the neural network is actually executed.
"""
- model_fn: function that performs the forward pass of the model
- init_fn: function that initializes the parameters of the model.
- learning_rate: the learning rate to use for SGD.
"""
tf.reset_default_graph()
is_training = tf.placeholder(tf.bool, name='is_training')
with tf.device(device):
    x = tf.placeholder(tf.float32, [None, 32, 32, 3])
    y = tf.placeholder(tf.int32, [None])
    params = init_fn() # Initialize the model parameters
    scores = model_fn(x, params) # Forward pass of the model
    loss = training_step(scores, y, params, learning_rate) # SGD
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t, (x_np, y_np) in enumerate(train_dset):
        feed_dict = {x: x_np, y: y_np}
        loss_np = sess.run(loss, feed_dict=feed_dict)
As described in the TensorFlow documentation (https://www.tensorflow.org/api_docs/python/tf/Session#run):
This method runs one "step" of TensorFlow computation, by running the
necessary graph fragment to execute every Operation and evaluate every
Tensor
In your example, sess.run(tf.global_variables_initializer()) runs the initialization operation, which assigns the initial values to all the variables (the weights), and loss_np = sess.run(loss, feed_dict=feed_dict) executes all the operations that are needed to compute loss.
I hope this answers your question
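As a rough illustration of "construction first, execution later", here is a tiny deferred-computation graph in plain Python. Node, placeholder, add, mul and run are hypothetical helpers written for this sketch; they are not TensorFlow APIs, but run() plays the role of sess.run() by evaluating only the fragment of the graph that the requested node depends on.

```python
class Node:
    """A node in a tiny deferred-computation graph (not a TensorFlow class)."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # function computing this node's value, or None for inputs
        self.inputs = inputs  # nodes this node depends on
        self.name = name


def placeholder(name):
    return Node(op=None, name=name)


def add(a, b):
    return Node(lambda u, v: u + v, (a, b))


def mul(a, b):
    return Node(lambda u, v: u * v, (a, b))


evaluated = []  # records which nodes were actually computed


def run(node, feed_dict):
    """Evaluate `node` by recursively evaluating only the nodes it depends on."""
    if node.op is None:
        return feed_dict[node.name]
    args = [run(dep, feed_dict) for dep in node.inputs]
    evaluated.append(node)
    return node.op(*args)


# Graph construction: these lines only describe the computation.
x = placeholder("x")
y = placeholder("y")
scores = mul(x, x)
loss = add(scores, y)
unused = mul(loss, loss)  # defined, but never requested below

# Graph execution: this is the line where numbers are actually computed.
loss_value = run(loss, {"x": 3.0, "y": 1.0})
print(loss_value)  # 10.0
```

In the question's code, the lines inside "with tf.device(device):" correspond to the construction half, and each sess.run(...) call corresponds to run(...).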
I have a simple MNIST model which I've successfully saved; the code is the following:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
import tensorflow as tf
sess = tf.InteractiveSession()
tf_save_file = './mnist-to-save-saved'
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
saver = tf.train.Saver()
sess.run(tf.global_variables_initializer())
y = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = y_, logits = y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
saver.save(sess, tf_save_file)
for _ in range(1000):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver.save(sess, tf_save_file, global_step=1000)
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Then, the following files are generated:
checkpoint
mnist-to-save-saved-1000.data-00000-of-00001
mnist-to-save-saved-1000.index
mnist-to-save-saved-1000.meta
mnist-to-save-saved.data-00000-of-00001
mnist-to-save-saved.index
mnist-to-save-saved.meta
Now, in order to use it in production, I want to run the trained model on an arbitrary digit image and get a prediction. I don't mean deploying a server yet, just making the prediction "locally", with that "fixed" digit image in the same directory, so that using the model is like running an executable.
But, considering the (mid-low?) API level of my code, I'm confused about what the easiest correct next step would be (restoring the checkpoint, using an Estimator, etc.), and how to do it.
Although I've read the official documentation, there seem to be many ways to do this, and some are a bit complex and "noisy" for a simple model like this.
Edit:
I've edited and re-run the MNIST file; the code is the same as above except for these lines:
...
x = tf.placeholder(tf.float32, shape=[None, 784], name='input')
...
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1), name='result')
...
Then, I try to run another .py script (in the same directory as the code above) in order to pass it a local handwritten digit image ("mnist-input-image.png") located in the same directory:
import tensorflow as tf
from PIL import Image
import numpy as np
image_test = Image.open("mnist-input-image.png")
image = np.array(image_test)
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('/Users/username/.meta')
    new = saver.restore(sess, tf.train.latest_checkpoint('/Users/username/'))
    graph = tf.get_default_graph()
    input_x = graph.get_tensor_by_name("input:0")
    result = graph.get_tensor_by_name("result:0")
    feed_dict = {input_x: image}
    predictions = result.eval(feed_dict=feed_dict)
    print(predictions)
Now, if I understand correctly, I have to pass the image as a NumPy array. So my questions are:
1) Which exact files do these lines refer to (since I have no .meta folder in my user folder)?
saver = tf.train.import_meta_graph('/Users/username/.meta')
new = saver.restore(sess, tf.train.latest_checkpoint('/Users/username/'))
I mean, which of the generated files listed above do those lines refer to?
2) In my case, is this line the correct way to pass my NumPy array into the feed dict?
feed_dict = {input_x: image}
A simple solution is to use your session object. Once you have generated the checkpoint file, you can restore it with a Saver object.
By the way, do you know why most tutorials build their graph inside a function? One good reason is that you can rebuild the graph quickly with your own inputs.
A correct way to start a session is the following:
# Use your placeholders, variables, etc to create the entire graph.
# Usually you return the input placeholder,
# prediction and the loss/accuracy here.
# You don't need the accuracy.
x, y, _ = make_your_graph(test_X, test_y)
# This object is the interface for serialization in tf
saver = tf.train.Saver()
with tf.Session() as sess:
    # Restores your model's latest checkpoint. latest_checkpoint() expects the directory
    # that contains the checkpoint files; here that directory is assumed to be "./checkpoint".
    saver.restore(sess, tf.train.latest_checkpoint("./checkpoint"))
    prediction = sess.run(y)
Want to run more than one data point through your already-running session?
Then replace the last line with a loop that uses a feed dict:
while waiting_for_new_y():
    another_y = get_new_y()
    feed_dict = {x: [another_y]}
    another_prediction = sess.run(y, feed_dict)
First of all, give a value to the name parameter of each object you want to use later, so that you can retrieve it by its name.
Change this:
x = tf.placeholder(tf.float32, shape=[None, 784])
to
x = tf.placeholder(tf.float32, shape=[None, 784],name='input')
and
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
to
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1),name='result')
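Now run this small script to restore the model:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('/Users/dummy/.meta')
    saver.restore(sess, tf.train.latest_checkpoint('/Users/dummy/'))  # restore() returns None
    graph = tf.get_default_graph()
    input_x = graph.get_tensor_by_name("input:0")
    result = graph.get_tensor_by_name("result:0")
    # Feed your new data here; the MNIST test images are used as an example.
    # Note: "result" compares the prediction against the labels placeholder y_,
    # so that placeholder has to be fed as well for this op to evaluate.
    feed_dict = {input_x: mnist.test.images}
    predictions = result.eval(feed_dict=feed_dict)
    print(predictions)
And you will get the output.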
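On the second question (whether the image array can be fed as-is): the "input" placeholder has shape [None, 784], so a 28x28 image first has to be flattened into one row of 784 values. Below is a minimal NumPy sketch of that step. It assumes a single-channel 28x28 image with 8-bit pixels, and it assumes the model was trained on the tutorial's data, which holds floats scaled to [0, 1] with bright digits on a dark background. The synthetic array stands in for np.array(Image.open("mnist-input-image.png").convert("L")) so that the snippet runs on its own.

```python
import numpy as np

# Synthetic stand-in for a loaded 28x28 grayscale image (uint8, values 0..255).
image = np.zeros((28, 28), dtype=np.uint8)
image[4:24, 12:16] = 255  # a crude vertical stroke, roughly like the digit 1

# Scale to floats in [0, 1] and flatten to a batch containing one row of 784 values,
# which matches a placeholder of shape [None, 784].
batch = (image.astype(np.float32) / 255.0).reshape(1, 784)

print(batch.shape, batch.dtype, batch.min(), batch.max())
```

The resulting array is what would go into the feed dict, for example feed_dict = {input_x: batch}. If the picture is dark ink on a white background, it would probably need to be inverted (1.0 - value) first.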
Here is a basic TensorFlow network example (based on MNIST), with complete code, that gives roughly 0.92 accuracy:
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# On older TensorFlow versions, use this instead: tf.initialize_all_variables().run()
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Question: why does adding an extra layer, as in the code below, make it so much worse that it drops to about 0.11 accuracy?
W = tf.Variable(tf.zeros([784, 100]))
b = tf.Variable(tf.zeros([100]))
h0 = tf.nn.relu(tf.matmul(x, W) + b)
W2 = tf.Variable(tf.zeros([100, 10]))
b2 = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(h0, W2) + b2)
The example does not properly initialise the weights, but without a hidden layer it turns out that the linear softmax regression the demo performs is unaffected by that choice. Setting them all to zero is safe, but only for a single-layer network.
When you make a deeper network, though, this is a disastrous choice. You must use non-equal initialisation of neural network weights, and the usual quick way to do this is to initialise them randomly.
Try this:
W = tf.Variable(tf.random_uniform([784, 100], -0.01, 0.01))
b = tf.Variable(tf.zeros([100]))
h0 = tf.nn.relu(tf.matmul(x, W) + b)
W2 = tf.Variable(tf.random_uniform([100, 10], -0.01, 0.01))
b2 = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(h0, W2) + b2)
The reason you need these non-identical weights has to do with how back propagation works: the values of the weights in a layer determine how that layer calculates gradients. If all the weights are the same, then all the gradients will be the same, which in turn means that all weight updates are the same. Everything changes in lockstep, and the behaviour is similar to having a single neuron in the hidden layer (because you have multiple neurons all with identical parameters), which can effectively only choose one class.
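The lockstep effect can be checked numerically. The following NumPy sketch uses a deliberately tiny network (4 inputs, 6 hidden ReLU units, 3 classes, no biases) with made-up data; the sizes, the data and the helper hidden_layer_gradient are all assumptions chosen for illustration. It computes the gradient of the mean softmax cross-entropy with respect to the first-layer weights, once with every hidden unit given identical parameters and once with the random initialisation suggested above.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(5, 4)                    # 5 examples, 4 input features
labels = np.eye(3)[[0, 1, 2, 0, 1]]   # one-hot labels for 3 classes


def hidden_layer_gradient(W, W2):
    """Gradient of the mean softmax cross-entropy w.r.t. the first-layer weights W."""
    h_pre = x.dot(W)
    h = np.maximum(h_pre, 0.0)                                   # ReLU
    logits = h.dot(W2)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)                 # softmax
    d_logits = (probs - labels) / len(x)
    d_h = d_logits.dot(W2.T) * (h_pre > 0)                       # back through ReLU
    return x.T.dot(d_h)                                          # one column per hidden unit


def columns_identical(grad):
    return bool(np.allclose(grad, grad[:, :1]))


# Identical parameters for every hidden unit: same incoming and outgoing weights.
W_same = np.full((4, 6), 0.1)
W2_same = np.tile(np.array([0.1, -0.2, 0.3]), (6, 1))
grad_same = hidden_layer_gradient(W_same, W2_same)

# Small random weights, as in the suggested fix.
W_rand = rng.uniform(-0.01, 0.01, size=(4, 6))
W2_rand = rng.uniform(-0.01, 0.01, size=(6, 3))
grad_rand = hidden_layer_gradient(W_rand, W2_rand)

print(columns_identical(grad_same), columns_identical(grad_rand))
```

With identical units, every column of the gradient is the same, so a gradient step leaves the units identical; with random weights the columns differ and the units can specialise.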
Neil explained nicely how to fix your problem; I will add a little explanation of why this happens.
The problem is not so much that the gradients are all the same, but rather that all of them are 0. This happens because relu(Wx + b) = 0 when W = 0 and b = 0. There is even a name for this: a dead neuron.
The network does not progress at all, and it does not matter whether you train it for 1 step or for 1 million. The results will be no different from a random choice, and you can see this in your accuracy of 0.11 (if you select at random you get 0.10).
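Here is a small NumPy check of the all-zero case, again on a made-up toy problem (4 inputs, 5 hidden ReLU units, 3 classes; the sizes and data are assumptions for illustration). It runs one forward and backward pass by hand with every parameter set to zero.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(8, 4)                                  # 8 examples, 4 input features
labels = np.eye(3)[rng.randint(0, 3, size=8)]       # one-hot labels for 3 classes

# All parameters start at zero, as in the question.
W, b = np.zeros((4, 5)), np.zeros(5)
W2, b2 = np.zeros((5, 3)), np.zeros(3)

# Forward pass.
h_pre = x.dot(W) + b
h = np.maximum(h_pre, 0.0)                          # relu(0) = 0 for every unit
logits = h.dot(W2) + b2
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)        # uniform: 1/3 for every class

# Backward pass for the mean softmax cross-entropy.
d_logits = (probs - labels) / len(x)
grad_W2 = h.T.dot(d_logits)                         # zero, because h is zero
grad_b2 = d_logits.sum(axis=0)
d_h = d_logits.dot(W2.T) * (h_pre > 0)              # zero, because W2 is zero
grad_W = x.T.dot(d_h)
grad_b = d_h.sum(axis=0)

print(grad_W.any(), grad_b.any(), grad_W2.any(), grad_b2.any())
```

In this sketch the gradients of W, b and W2 are exactly zero, so those parameters never move. Only the output bias b2 receives a non-zero gradient, which means the model can at best learn the class frequencies; that would be consistent with an accuracy stuck near the share of the most common digit rather than with learning anything from the pixels.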
First of all, I'm very new to both Python and TensorFlow.
I'm trying the demo at this link: https://www.tensorflow.org/get_started/mnist/beginners
and it runs well.
However, I would like to debug (or log) the values of some placeholders and variables that change when I run Session.run().
Could you please show me how to "debug" or log them while the session is running in the loop?
Here is my code:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("mnist/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y1 = tf.add(tf.matmul(x,W),b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
cross_entropy1 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y1))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy1)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.argmax(y,1), feed_dict={x: mnist.test.images, y_: mnist.test.labels})
In this script, I would like to log the value of y and tf.argmax(y, 1) for each test image processed.
Mrry answered it best in this Stack Overflow answer: https://stackoverflow.com/a/33633839/6487788
Exactly what you are asking (printing during sess.run) would be this part of his answer:
To print the value of a tensor without returning it to your Python program, you can use the tf.Print() op, as And suggests in another answer. Note that you still need to run part of the graph to see the output of this op, which is printed to standard output. If you're running distributed TensorFlow, the tf.Print() op will print its output to the standard output of the task where that op runs.
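For argmax, the code would be:
argmaxy = tf.argmax(y, 1)
# tf.Print takes the tensor to pass through and a list of tensors to print
argmaxy = tf.Print(argmaxy, [argmaxy], message="argmax(y): ")
correct_prediction = tf.equal(argmaxy, tf.argmax(y_, 1))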
Good luck!
While @rmeerten's answer is correct, you can also consider using TensorBoard, which can be a useful tool for debugging your models and seeing what's happening. For background, you can also check out the TensorBoard session from the TensorFlow Dev Summit.
I was trying to use batch normalization to train my neural networks using TensorFlow, but it was unclear to me how to use the official layer implementation of batch normalization (note this is different from the one from the API).
After some painful digging through their GitHub issues, it seems that one needs a tf.cond to use it properly, and also a 'reuse=True' flag so that the BN shift and scale variables are properly reused. After figuring that out, I provided a small description of what I believe is the right way to use it here.
Now I have written a short script to test it (only a single layer and a ReLU; it is hard to make it smaller than this). However, I am not 100% sure how to test it. Right now my code runs with no error messages but unexpectedly returns NaNs, which lowers my confidence that the code I gave in the other post is right. Or maybe the network I have is weird. Either way, does someone know what's wrong? Here is the code:
import tensorflow as tf
# download and install the MNIST data automatically
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm
def batch_norm_layer(x, train_phase, scope_bn):
    bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
                          is_training=True,
                          reuse=None, # is this right?
                          trainable=True,
                          scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
                              is_training=False,
                              reuse=True, # is this right?
                              trainable=True,
                              scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z
def get_NN_layer(x, input_dim, output_dim, scope, train_phase):
    with tf.name_scope(scope+'vars'):
        W = tf.Variable(tf.truncated_normal(shape=[input_dim, output_dim], mean=0.0, stddev=0.1))
        b = tf.Variable(tf.constant(0.1, shape=[output_dim]))
    with tf.name_scope(scope+'Z'):
        z = tf.matmul(x,W) + b
    with tf.name_scope(scope+'BN'):
        if train_phase is not None:
            z = batch_norm_layer(z,train_phase,scope+'BN_unit')
    with tf.name_scope(scope+'A'):
        a = tf.nn.relu(z) # (M x D1) = (M x D) * (D x D1)
    return a
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# placeholder for data
x = tf.placeholder(tf.float32, [None, 784])
# placeholder that turns BN on during training or off during inference
train_phase = tf.placeholder(tf.bool, name='phase_train')
# variables for parameters
hiden_units = 25
layer1 = get_NN_layer(x, input_dim=784, output_dim=hiden_units, scope='layer1', train_phase=train_phase)
# create model
W_final = tf.Variable(tf.truncated_normal(shape=[hiden_units, 10], mean=0.0, stddev=0.1))
b_final = tf.Variable(tf.constant(0.1, shape=[10]))
y = tf.nn.softmax(tf.matmul(layer1, W_final) + b_final)
### training
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean( -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]) )
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    steps = 3000
    for iter_step in range(steps):
        #feed_dict_batch = get_batch_feed(X_train, Y_train, M, phase_train)
        batch_xs, batch_ys = mnist.train.next_batch(100)
        # Collect model statistics
        if iter_step%1000 == 0:
            batch_xstrain, batch_ystrain = batch_xs, batch_ys #simulates train data
            batch_xcv, batch_ycv = mnist.test.next_batch(5000) #simulates CV data
            batch_xtest, batch_ytest = mnist.test.next_batch(5000) #simulates test data
            # do inference
            train_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xs, y_:batch_ys, train_phase: False})
            cv_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xcv, y_:batch_ycv, train_phase: False})
            test_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xtest, y_:batch_ytest, train_phase: False})
            def do_stuff_with_errors(*args):
                print(args)
            do_stuff_with_errors(train_error, cv_error, test_error)
        # Run Train Step
        sess.run(fetches=train_step, feed_dict={x: batch_xs, y_:batch_ys, train_phase: True})
    # list of booleans indicating correct predictions
    correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    # accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels, train_phase: False}))
When I run it I get:
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(2.3474066, 2.3498712, 2.3461707)
(0.49414295, 0.88536006, 0.91152304)
(0.51632041, 0.393666, nan)
0.9296
It used to be that all the last values were NaN, and now only a few of them are. Is everything fine, or am I being paranoid?
I am not sure if this will solve your problem, but the documentation for BatchNorm is not very easy to use or informative, so here is a short recap of how to use simple BatchNorm:
First of all, you define your BatchNorm layer. If you want to use it after an affine/fully-connected layer, you do this (just an example; the order can be different, as you desire):
...
inputs = tf.matmul(inputs, W) + b
inputs = tf.layers.batch_normalization(inputs, training=is_training)
inputs = tf.nn.relu(inputs)
...
The function tf.layers.batch_normalization creates internal variables (the moving mean and moving variance) that have to be updated during training. The operations that update them are not run automatically; they are placed in the tf.GraphKeys.UPDATE_OPS collection. As such, you must make your training op depend on them as follows (after all layers have been defined!):
...
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    trainer = tf.train.AdamOptimizer()
    updateModel = trainer.minimize(loss, global_step=global_step)
...
You can read more about it here. I know it's a little late to answer your question, but it might help other people coming across BatchNorm problems in tensorflow! :)
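To show why those update ops matter, here is a toy, pure-Python batch normalization for a single feature. BatchNorm1D is a hypothetical class written for this sketch, not the TensorFlow layer, and decay=0.9 is chosen only so that the moving averages converge quickly in the demo. The two lines that update the moving averages play the role of the ops collected in UPDATE_OPS: if they never run, inference falls back on the initial statistics.

```python
import math


class BatchNorm1D:
    """Toy batch normalization for one feature (no learned scale or shift)."""

    def __init__(self, decay=0.999, epsilon=1e-3):
        self.decay = decay
        self.epsilon = epsilon
        self.moving_mean = 0.0   # statistics used at inference time
        self.moving_var = 1.0

    def __call__(self, batch, training):
        if training:
            # Training: normalize with the statistics of the current batch...
            mean = sum(batch) / len(batch)
            var = sum((v - mean) ** 2 for v in batch) / len(batch)
            # ...and update the moving averages (the "update ops").
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:
            # Inference: normalize with the accumulated moving averages.
            mean, var = self.moving_mean, self.moving_var
        return [(v - mean) / math.sqrt(var + self.epsilon) for v in batch]


batch = [10.0, 12.0, 14.0, 16.0]  # mean 13, variance 5

# Layer whose moving averages were updated during training.
bn = BatchNorm1D(decay=0.9)
for _ in range(200):
    train_out = bn(batch, training=True)
infer_out = bn([13.0], training=False)

# Layer whose moving averages were never updated (update ops skipped).
stale = BatchNorm1D(decay=0.9)
stale_out = stale([13.0], training=False)

print(bn.moving_mean, bn.moving_var, infer_out, stale_out)
```

With the updates applied, an input equal to the data mean is normalized to roughly zero at inference time; without them, the same input comes out close to 13, which is the kind of train/inference mismatch the control dependency is meant to prevent.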
training = tf.placeholder(tf.bool, name='training')
lr_holder = tf.placeholder(tf.float32, [], name='learning_rate')
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=lr_holder).minimize(cost)
When defining the layers, you need to pass the 'training' placeholder:
batchNormal_layer = tf.layers.batch_normalization(pre_batchNormal_layer, training=training)