computation of test data in tensorflow tutorial - python

I was going through the TensorFlow tutorial:
https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10])) #weights
b = tf.Variable(tf.zeros([10])) #bias
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Towards the very end, we pass the test data to the placeholders. y_ is the matrix containing the true values, and y is the matrix of predicted values. My question is: when is y computed for the test data? The W matrix has been trained by backpropagation, but this trained matrix must be multiplied with the new input x (the test data) to give the prediction y. Where does this happen?
Normally I have seen sequential execution of code, and in the last few lines y isn't called explicitly.

accuracy depends on correct_prediction which depends on y.
So when you call sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}), y is computed before accuracy is computed. All of this happens inside the TensorFlow graph.
The TensorFlow graph is the same for train and test. The only difference is the data you feed to the placeholders x and y_.
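The dependency-driven evaluation described above can be illustrated with a toy graph in plain Python. This is only a sketch of the idea, not TensorFlow's implementation: the Node class, the run function and the scalar "model" are hypothetical names and simplifications introduced for illustration.

```python
class Node:
    """A graph node: a function plus the nodes it depends on."""
    def __init__(self, name, fn=None, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = inputs


def run(node, feed_dict, trace):
    """Evaluate node, first evaluating everything it depends on."""
    if node in feed_dict:  # placeholder: use the fed value
        return feed_dict[node]
    args = [run(dep, feed_dict, trace) for dep in node.inputs]
    trace.append(node.name)  # record the order in which ops execute
    return node.fn(*args)


# Placeholders have no function; their values come from feed_dict.
x = Node("x")
y_ = Node("y_")
# Stand-in for an already trained weight.
W = Node("W", fn=lambda: 2.0)
# Toy prediction: class 1 if W * x is positive, else class 0.
y = Node("y", fn=lambda xs, w: [1 if w * v > 0 else 0 for v in xs], inputs=(x, W))
correct_prediction = Node(
    "correct_prediction",
    fn=lambda pred, true: [p == t for p, t in zip(pred, true)],
    inputs=(y, y_),
)
accuracy = Node("accuracy", fn=lambda c: sum(c) / len(c), inputs=(correct_prediction,))

trace = []
result = run(accuracy, {x: [1.0, -3.0, 0.5, -2.0], y_: [1, 0, 0, 0]}, trace)
print(trace)   # y appears in the trace although only accuracy was requested
print(result)
```

Only accuracy is requested, yet y is evaluated on the fed data because accuracy depends on it through correct_prediction.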

y is defined here:
y = tf.nn.softmax(tf.matmul(x, W) + b) # Line 7
Specifically, what you are looking for is within that line:
tf.matmul(x, W) + b
the output of which is put through the softmax function to identify the class.
This is computed in each of the 1000 training passes through the graph: each time, y is computed and compared against y_ to determine the loss, and the variables W and b are updated by gradient descent. The same expression is evaluated once more, with the trained W and b, when accuracy is run on the test data.

Related

tensorflow GradientDescentOptimizer not updating variables?

I'm new to machine learning. I started with the simplest example: classifying MNIST handwritten digit images with softmax and gradient descent. By referencing some other examples, I came up with my own logistic regression below:
import tensorflow as tf
import numpy as np
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = np.float32(x_train / 255.0)
x_test = np.float32(x_test / 255.0)
X = tf.placeholder(tf.float32, [None, 28, 28])
Y = tf.placeholder(tf.uint8, [100])
XX = tf.reshape(X, [-1, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
def err(x, y):
    predictions = tf.matmul(x, W) + b
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf.reshape(y, [-1, 1]), logits=predictions))
    # value = tf.reduce_mean(y * tf.log(predictions))
    # loss = -tf.reduce_mean(tf.one_hot(y, 10) * tf.log(predictions)) * 100.
    return loss
# cost = err(np.reshape(x_train[:100], (-1, 784)), y_train[:100])
cost = err(tf.reshape(X, (-1, 784)), Y)
optimizer = tf.train.GradientDescentOptimizer(0.005).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
# temp = sess.run(tf.matmul(XX, W) + b, feed_dict={X: x_train[:100]})
temp = sess.run(cost, feed_dict={X: x_train[:100], Y: y_train[:100]})
print(temp)
# print(temp.dtype)
# print(type(temp))
for i in range(100):
    sess.run(optimizer, feed_dict={X: x_train[i * 100: 100 * (i + 1)], Y: y_train[i * 100: 100 * (i + 1)]})
    # sess.run(optimizer, feed_dict={X: x_train[: 100], Y: y_train[:100]})
temp = sess.run(cost, feed_dict={X: x_train[:100], Y: y_train[:100]})
print(temp)
sess.close()
I tried to run the optimizer for some iterations, feeding it the training images and labels. In my understanding, the variables W and b should be updated during the optimizer run, so the model should produce a different result before and after training. But with this code, the printed cost of the model was the same before and after the optimizer run. What could be causing this?
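You are initializing the weight matrix W with zeros. In a network with hidden layers this makes all units receive the same gradient at every weight update, so for weight initialization prefer tf.truncated_normal(), tf.random_normal(), tf.contrib.layers.xavier_initializer() or a similar random initializer over zeros. Also check the labels: tf.nn.softmax_cross_entropy_with_logits expects labels with the same shape as the logits (here [batch, 10], for example tf.one_hot(y, 10)), while the code reshapes them to [-1, 1].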
This is a similar question.
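As a reference point for the printed cost, here is a minimal sketch in plain Python, assuming integer class labels like those in y_train. With zero-initialized W and b every logit is zero, so the softmax cross-entropy against a one-hot label should equal ln(10) before training. The helper names one_hot and softmax_cross_entropy are introduced for illustration and are not TensorFlow functions.

```python
import math


def one_hot(label, num_classes):
    """Turn an integer class label into a one-hot row."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def softmax_cross_entropy(labels, logits):
    """Cross-entropy between a label distribution and softmax(logits)."""
    m = max(logits)
    log_sum = math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - m - log_sum for v in logits]  # log-softmax
    return -sum(t * lp for t, lp in zip(labels, log_probs))


labels = one_hot(3, 10)
loss = softmax_cross_entropy(labels, [0.0] * 10)  # all-zero logits
print(loss)
```

A starting cost that differs from ln(10) is a hint that the labels are not in the form the loss expects.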

Arguments to tensorflow session.run() - do you pass operations?

I'm following this tutorial for tensorflow:
I'm trying to understand the arguments to tf.Session.run(). I understand that you have to run the operations of a graph in a session.
Is train_step passed in because it encapsulates all the operations of the network in this particular example? I'm trying to understand why I don't need to pass any other tensors, such as cross_entropy, to the session.
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
Here is the full code:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(10):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
In a TensorFlow session (tf.Session), you run (or execute) the optimizer operation, in this case train_step. The optimizer minimizes your loss function (in this case cross_entropy), which is computed from the model hypothesis y.
In this cascade, minimizing cross_entropy reduces the error between the prediction y and the true labels y_, so the optimizer finds the values of the weights W and bias b that, combined with x, make y approximate y_ as closely as possible.
So, using a tf.Session object as sess, we run the optimizer train_step, which in turn evaluates every part of the computational graph that it depends on.
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
Because this cascade ultimately evaluates cross_entropy, which makes use of the placeholders x and y_, you have to use feed_dict to pass data to those placeholders.
As you mentioned, TensorFlow is used to build a graph of operations. Your train_step operation (i.e. "minimize by gradient descent") depends on the result of cross_entropy. cross_entropy itself relies on the results of y (the softmax operation) and y_ (the fed labels), and so on.
When you call sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}), you are basically asking TensorFlow to "run all the operations leading to train_step, with x = batch_xs and y_ = batch_ys as input". So yes, TensorFlow itself walks backward through your graph to figure out the operation and input dependencies of train_step, then executes all of these operations forward to produce what you asked for.
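The "walk backward, then execute forward" idea can be sketched as a depth-first traversal over a dependency table. This is a simplified illustration in plain Python, not TensorFlow's scheduler: the deps dictionary is a hand-written approximation of the tutorial graph (the gradient ops that train_step really contains are omitted), and execution_order is a hypothetical helper.

```python
# Hand-written dependency table approximating the tutorial graph.
deps = {
    "train_step": ["cross_entropy"],
    "cross_entropy": ["y", "y_"],
    "y": ["x", "W", "b"],
    "accuracy": ["correct_prediction"],
    "correct_prediction": ["y", "y_"],
    "x": [], "y_": [], "W": [], "b": [],
}


def execution_order(target, deps):
    """Return the ops needed for target, each listed after its dependencies."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps[name]:  # walk backward through the dependencies
            visit(dep)
        order.append(name)      # emit only once all inputs are available

    visit(target)
    return order


order = execution_order("train_step", deps)
print(order)
```

Ops that train_step does not depend on, such as accuracy, never appear in the order, which is why they do not need to be passed to sess.run during training.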

Tensorflow Histogram error

I am new to TensorFlow and my code is below:
import tensorflow as tf
logdir="/tmp/mnist_tutorial5/"
mnist = tf.contrib.learn.datasets.mnist.read_data_sets(train_dir=logdir+"data",one_hot = True)
tf.reset_default_graph()
sess = tf.Session()
writer = tf.summary.FileWriter(logdir)
def model(input):
    w = tf.Variable(tf.truncated_normal([784,10], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[10]), name="B")
    act = tf.matmul(input,w) + b
    tf.summary.histogram("weights",w)
    tf.summary.histogram("biases",b)
    tf.summary.histogram("activations",act)
    return act
def train():
    x = tf.placeholder(tf.float32, shape=[None, 784], name="input_img")
    y = tf.placeholder(tf.float32, shape=[None, 10], name="labels")
    my_label = model(x)
    print("linear_regression is completed")
    mean_error = tf.reduce_mean(tf.reduce_sum(tf.square(my_label-y)))
    tf.summary.scalar("loss", mean_error)
    train_step=tf.train.GradientDescentOptimizer(0.0003).minimize(mean_error)
    correct_prediction = tf.equal(tf.argmax(my_label, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("accuracy", accuracy)
    sess.run(tf.global_variables_initializer())
    summ = tf.summary.merge_all()
    for i in range(2000):
        batch = mnist.train.next_batch(100)
        train_accuracy = sess.run(train_step, feed_dict={x: batch[0], y: batch[1]})
        print("%s th iteration"%i)
        if i%500==0:
            print("over 2")
            summarys = sess.run(summ, {x: batch[0], y: batch[1]})##i'm getting error here
            print("over 3")
            writer.add_summary(summarys,i)
        print("one over")
train()
writer.add_graph(sess.graph)
And this is the error I am getting:
InvalidArgumentError (see above for traceback): Nan in summary histogram for: weights
[[Node: weights = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](weights/tag, W/read)]]
First of all, this is just a one-layer network with no hidden layer.
You have not applied any activation function, so your outputs are not being squashed.
Your gradients are exploding, which makes the weights explode during the updates, and this can result in NaN.
You are training for 2000 iterations, so after enough updates the weights become NaN.
Try an activation function such as sigmoid and add at least one hidden layer, and you should be fine.
Also reduce the number of iterations; for MNIST classification you do not need to train this model for 2000 of them.
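How an unbounded squared-error loss can drive weights to NaN is easy to reproduce on a much smaller problem. The sketch below is not the asker's model: it assumes a hypothetical one-parameter least-squares fit in plain Python, and only illustrates that a step size that is too large for a summed squared error makes the weight overflow to infinity and then become NaN.

```python
import math


def train(learning_rate, steps=1000):
    """Fit y = w * x by gradient descent on a summed squared error."""
    xs = [float(v) for v in range(1, 11)]
    ys = [2.0 * v for v in xs]  # targets generated with w = 2
    w = 0.0
    for _ in range(steps):
        # Gradient of sum_i (w * x_i - y_i)^2 with respect to w.
        grad = sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys))
        w -= learning_rate * grad
    return w


w_small = train(0.001)  # small step: converges towards 2
w_large = train(0.01)   # large step: each update overshoots further
print(w_small, w_large)
print(math.isnan(w_large))
```

With the larger step the error grows at every update until the weight overflows, and the next update computes infinity minus infinity, which is NaN. Averaging the loss over the batch or lowering the learning rate are the usual ways to keep the updates bounded.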

modifying softmax function in tensorflow

I started using TensorFlow about a week ago, so I'm not sure which API I can use.
Currently I'm using the basic MNIST digit recognition code.
I want to test how the recognition accuracy of this code changes if I modify the softmax function from floating-point calculation to fixed-point calculation.
At first I tried to modify the library, but that was too complicated. So I think I have to read the tensors, modify (calculate) them as arrays, and convert the result back to a tensor using the tf.Session().eval() function.
Which function should I use?
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
#temp = tf.Variable(tf.zeros([784, 10]))
temp = tf.Variable(tf.matmul(x, W) + b)
#temp = tf.add(tf.matmul(x, W),b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
#print(temp[500])
for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
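One possible way to approach the experiment, sketched here in plain Python under stated assumptions: the logits are assumed to have been fetched from the session as ordinary numbers, and fixed-point arithmetic is only emulated by rounding to a grid of 2**-frac_bits. The functions to_fixed and fixed_point_softmax are hypothetical helpers, not TensorFlow APIs.

```python
import math


def to_fixed(value, frac_bits=8):
    """Round value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def softmax(logits):
    """Floating-point softmax, shifted by the maximum for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_point_softmax(logits, frac_bits=8):
    """Quantize the inputs and the outputs of softmax to the fixed-point grid."""
    q_logits = [to_fixed(v, frac_bits) for v in logits]
    return [to_fixed(p, frac_bits) for p in softmax(q_logits)]


logits = [2.0, 1.0, 0.1]
float_probs = softmax(logits)
fixed_probs = fixed_point_softmax(logits, frac_bits=8)
print(float_probs)
print(fixed_probs)
```

Comparing the argmax of both outputs over the test set would then show how much accuracy the quantization costs. A true fixed-point implementation would also quantize the intermediate exponentials and the division, which this sketch does not attempt.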

what's wrong with my tensorflow code

I have just begun to study TensorFlow and I want to create a DNN for MNIST. In the tutorial, there is a very simple neural network with 784 input nodes, 10 output nodes and no hidden nodes. I tried to modify that code to create a DNN. Here is my code. I think I just added a hidden layer with 500 nodes between the input and output layers, but the test accuracy is just 10%, which means it is not being trained. Do you know what's wrong with my code?
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
os.chdir('../')
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x=tf.placeholder(tf.float32,[None,784])
W_h1=tf.Variable(tf.zeros([784,500]))
B_h1=tf.Variable(tf.zeros([500]))
h1=tf.nn.relu(tf.matmul(x,W_h1)+B_h1)
'''
W_h2=tf.Variable(tf.zeros([5,5]))
B_h2=tf.Variable(tf.zeros([5]))
h2=tf.nn.relu(tf.matmul(h1,W_h2)+B_h2)
'''
B_o=tf.Variable(tf.zeros([10]))
W_o=tf.Variable(tf.zeros([500,10]))
y=tf.nn.relu(tf.matmul(h1,W_o)+B_o)
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
number_steps = 10000
batch_size = 100
for _ in range(number_steps):
    batch_xs, batch_ys = mnist.train.next_batch(batch_size)
    train=sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Print classifier's accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
OK, following @lejlot's suggestion, I changed my code as follows.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
os.chdir('../')
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x=tf.placeholder(tf.float32,[None,784])
W_h1=tf.Variable(tf.random_normal([784,500]))
B_h1=tf.Variable(tf.random_normal([500]))
h1=tf.nn.relu(tf.matmul(x,W_h1)+B_h1)
'''
W_h2=tf.Variable(tf.random_normal([500,500]))
B_h2=tf.Variable(tf.random_normal([500]))
h2=tf.nn.relu(tf.matmul(h1,W_h2)+B_h2)
'''
B_o=tf.Variable(tf.random_normal([10]))
W_o=tf.Variable(tf.random_normal([500,10]))
y= tf.matmul(h1,W_o)+B_o # notice no activation
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.nn.log_softmax(y), # notice log_softmax
                                              reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
number_steps = 10000
batch_size = 100
for i in range(number_steps):
    batch_xs, batch_ys = mnist.train.next_batch(batch_size)
    train=sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if i % 1000==0:
        acc=sess.run(accuracy,feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        print('Current loop %d, Accuracy: %g'%(i,acc))
# Print classifier's accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
There are two modifications:
change the initial values of W_h1 and B_h1 to tf.random_normal
change the definitions of y and cross_entropy
The modifications do work. But I still don't know what was wrong with my original code. I call tf.global_variables_initializer().run(), and I thought this function would randomize the values of W_h1 and B_h1. Besides, if I define y and cross_entropy as follows, it doesn't work.
y= tf.nn.softmax(tf.matmul(h1,W_o)+B_o)
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),reduction_indices=[1]))
First of all, this is not a valid classifier model.
y=tf.nn.relu(tf.matmul(h1,W_o)+B_o)
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
You are using the explicit equation for cross-entropy, which requires y to be a (row-wise) probability distribution, yet you produce y by applying relu, meaning that you are simply outputting some non-negative numbers. In fact, if you ever output zeros, your code will produce NaNs and fail (as the log of 0 is minus infinity).
You should use
y = tf.nn.softmax(tf.matmul(h1,W_o)+B_o)
instead. Or even better (for better numerical stability):
y= tf.matmul(h1,W_o)+B_o # notice no activation
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(y_ * tf.nn.log_softmax(y), # notice log_softmax
                   reduction_indices=[1]))
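The numerical-stability point can be illustrated in plain Python. This is a sketch of the general log-sum-exp idea, not TensorFlow's actual kernel: naive_log_softmax and stable_log_softmax are hypothetical helpers, and the extreme logits are chosen only to make the underflow visible.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def naive_log_softmax(logits):
    """log(softmax(x)): a probability that underflows to 0 gives -inf."""
    # math.log raises on 0, so the limit value -inf is returned explicitly.
    return [math.log(p) if p > 0.0 else float("-inf") for p in softmax(logits)]


def stable_log_softmax(logits):
    """x - max(x) - log(sum(exp(x - max(x)))), computed without forming softmax."""
    m = max(logits)
    log_sum = math.log(sum(math.exp(v - m) for v in logits))
    return [v - m - log_sum for v in logits]


logits = [0.0, -1000.0]
naive = naive_log_softmax(logits)
stable = stable_log_softmax(logits)
print(naive)
print(stable)
```

In the naive version the second probability underflows to zero, so its logarithm is minus infinity and a loss built on it is no longer finite. The fused form keeps the value finite.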
Update
The second issue is initialisation: you cannot initialise neural network weights to zeros; they have to be random numbers, typically sampled from low-variance zero-mean Gaussians. The global initialiser does not randomise weights; it simply runs all the initialisation ops. If the initialisation ops are constant ones (like zeros), it simply makes sure these zeros are assigned to the variables, nothing else (thus it can be used to reset the network, etc.). Zero initialisation works only for convex problems, such as logistic regression, but cannot work for a complex model like a neural network.
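Why zero initialisation stalls a network with a hidden layer can be checked by hand on a toy example. The sketch below assumes a hypothetical 3-2-2 network (ReLU hidden layer, softmax output, cross-entropy loss) written in plain Python; weight_gradients is an illustrative helper, and the small nonzero weights are arbitrary values standing in for a random initialisation.

```python
import math


def weight_gradients(x, target, W1, b1, W2, b2):
    """Gradients of softmax cross-entropy w.r.t. W1 and W2 for one example."""
    n_in, n_hid, n_out = len(x), len(b1), len(b2)
    # Forward pass: ReLU hidden layer, then softmax output.
    z1 = [sum(x[i] * W1[i][j] for i in range(n_in)) + b1[j] for j in range(n_hid)]
    h = [v if v > 0.0 else 0.0 for v in z1]
    z2 = [sum(h[j] * W2[j][k] for j in range(n_hid)) + b2[k] for k in range(n_out)]
    m = max(z2)
    exps = [math.exp(v - m) for v in z2]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Backward pass.
    dz2 = [probs[k] - target[k] for k in range(n_out)]
    dW2 = [[h[j] * dz2[k] for k in range(n_out)] for j in range(n_hid)]
    dh = [sum(W2[j][k] * dz2[k] for k in range(n_out)) for j in range(n_hid)]
    dz1 = [dh[j] if z1[j] > 0.0 else 0.0 for j in range(n_hid)]
    dW1 = [[x[i] * dz1[j] for j in range(n_hid)] for i in range(n_in)]
    return dW1, dW2


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


x, target = [1.0, 2.0, 3.0], [1.0, 0.0]

# All-zero initialisation: the hidden activations and W2 are zero.
dW1_zero, dW2_zero = weight_gradients(
    x, target, zeros(3, 2), [0.0, 0.0], zeros(2, 2), [0.0, 0.0])

# Small nonzero weights (arbitrary values, standing in for a random init).
dW1_nonzero, dW2_nonzero = weight_gradients(
    x, target,
    [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]], [0.0, 0.0],
    [[0.1, -0.1], [0.2, -0.3]], [0.0, 0.0])

print(dW1_zero, dW2_zero)
print(dW1_nonzero, dW2_nonzero)
```

With all-zero weights the hidden activations are zero, so the gradient for W2 vanishes, and because W2 is zero nothing flows back to W1 either. Only the output bias receives a gradient, which is consistent with a test accuracy stuck near chance level.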
