The TensorFlow graph API separates graph construction from execution. Because of this, I can't tell on which line the neural network is actually executed.
"""
- model_fn: function that performs the forward pass of the model
- init_fn: function that initializes the parameters of the model.
- learning_rate: the learning rate to use for SGD.
"""
tf.reset_default_graph()
is_training = tf.placeholder(tf.bool, name='is_training')
with tf.device(device):
    x = tf.placeholder(tf.float32, [None, 32, 32, 3])
    y = tf.placeholder(tf.int32, [None])
    params = init_fn()  # Initialize the model parameters
    scores = model_fn(x, params)  # Forward pass of the model
    loss = training_step(scores, y, params, learning_rate)  # SGD
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t, (x_np, y_np) in enumerate(train_dset):
        feed_dict = {x: x_np, y: y_np}
        loss_np = sess.run(loss, feed_dict=feed_dict)
As described in the TensorFlow documentation (https://www.tensorflow.org/api_docs/python/tf/Session#run):
This method runs one "step" of TensorFlow computation, by running the
necessary graph fragment to execute every Operation and evaluate every
Tensor
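In your example, sess.run(tf.global_variables_initializer()) runs the initialization operation, which assigns initial values to all the variables (the weights), and loss_np = sess.run(loss, feed_dict=feed_dict) executes every operation that loss depends on. So the network is actually executed on that second line, once per batch.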
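To make the construction/execution split concrete, here is a minimal pure-Python sketch. It is not TensorFlow: the Node class and the run function are hypothetical stand-ins that I made up to mimic how a fetch only triggers the part of the graph it depends on.

```python
class Node:
    """One operation in a toy graph (hypothetical, for illustration only)."""

    def __init__(self, name, fn=None, inputs=()):
        self.name = name
        self.fn = fn          # None marks a placeholder that must be fed
        self.inputs = inputs


def run(fetch, feed_dict, log):
    """Evaluate `fetch` by recursively evaluating only the nodes it depends on."""
    if fetch in feed_dict:
        return feed_dict[fetch]
    if fetch.fn is None:
        raise ValueError("placeholder %s was not fed" % fetch.name)
    args = [run(node, feed_dict, log) for node in fetch.inputs]
    log.append(fetch.name)    # record that this node really executed
    return fetch.fn(*args)


# Construction phase: these lines only describe the computation.
x = Node("x")
scores = Node("scores", lambda v: [2 * e for e in v], (x,))
loss = Node("loss", lambda s: sum(s) / len(s), (scores,))
unused = Node("unused", lambda v: max(v), (x,))

# Execution phase: the work happens inside run(), and only for what loss needs.
log = []
loss_value = run(loss, {x: [1.0, 2.0, 3.0]}, log)
print(loss_value, log)
```

The node named "unused" never runs, because nothing fetched depends on it. That mirrors why, in the question's code, all the work happens on the sess.run(loss, ...) line.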
I hope this answers your question
I am using TensorFlow and I have developed a deep multilayer feedforward model. To check the performance of the model, I decided to evaluate it with 10-fold cross-validation. In each fold I create a new instance of the neural network and call its train and predict functions.
In each fold I run the following code:
for each fold:
    nn = ffNN(hidden_nodes, epochs, learning_rate, saveFrequency, save_path, decay, decay_step, decay_factor, stop_loss, keep_probability, regularization_factor, minimum_cost, activation_function, batch_size, shuffle, stopping_iteration)
    nn.initialize(x_size)
    nn.train(X, y)
    nn.predict(X_test)
In the ffNN file I have the initialization, train, and predict functions, as follows:
nn.train:
    sess = tf.InteractiveSession()
    init = tf.global_variables_initializer()
    sess.run(init)
    saver = tf.train.Saver()
    for each epoch:
        for each batch:
            _, loss = sess.run([self.optimizer, self.loss], feed_dict={self.X: X1, self.y: y})
        if epoch % save_frequency == 0:
            saver.save(sess, save_path)
    sess.close()
The problem is in saver.save: in each fold it takes longer and longer to save. Although I create all of the variables from scratch, I don't know what makes the save time depend on the fold and grow with each one.
Thanks in advance.
Edit:
The code for building the model, nn.initialize, is as follows:
self.X = tf.placeholder("float", shape=[None, x_size], name='XValue')
self.y = tf.placeholder("float", shape=[None, y_size], name='yValue')
with tf.variable_scope("initialization", reuse=tf.AUTO_REUSE):
w_in, b_in = init_weights((x_size, self.hidden_nodes))
h_out = self.forwardprop(self.X, w_in, b_in, self.keep_prob,self.activation_function)
l2_norm = tf.add(tf.nn.l2_loss(w_in), tf.nn.l2_loss(b_in))
w_out, b_out = init_weights((self.hidden_nodes, y_size))
l2_norm = tf.add(tf.nn.l2_loss(w_out), l2_norm)
l2_norm = tf.add(tf.nn.l2_loss(b_out), l2_norm)
self.yhat = tf.add(tf.matmul(h_out, w_out), b_out)
self.mse = tf.losses.mean_squared_error(labels=self.y, predictions=self.yhat)
self.loss = tf.add(self.mse,self.regularization_factor * l2_norm)
self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
Based on what you describe, the problem is not in saver.save itself. The computational graph gets bigger with every fold, because each new ffNN instance adds its nodes to the same default graph, so every save has more to handle and takes longer. Make sure to structure the code in the following way:
for each fold:
    # Clear the previous computational graph
    tf.reset_default_graph()
    # Then build the graph
    nn = ffNN()
    # Create the saver
    saver = tf.train.Saver()
    # Create a session
    with tf.Session() as sess:
        # Initialize the variables in the graph
        sess.run(tf.global_variables_initializer())
        # Train the model
        for each epoch:
            for each batch:
                nn.train_on_batch()
            if epoch % save_frequency == 0:
                saver.save(sess, save_path)
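As a rough analogy for why the reset matters, here is a toy sketch in plain Python. It is not TensorFlow: the module-level list and the reset_default_graph and build_model functions below are hypothetical stand-ins for the default graph, tf.reset_default_graph, and the model constructor.

```python
# Toy analogy only: a module-level list plays the role of the default graph.
_default_graph = []


def reset_default_graph():
    """Drop every node registered so far (stand-in for tf.reset_default_graph)."""
    del _default_graph[:]


def build_model(n_layers=3):
    """Register a few nodes, as constructing a network would."""
    for i in range(n_layers):
        _default_graph.append("node_%d_layer_%d" % (len(_default_graph), i))


def nodes_per_fold(n_folds, reset):
    """Return the graph size seen at save time in each fold."""
    sizes = []
    for _ in range(n_folds):
        if reset:
            reset_default_graph()
        build_model()
        sizes.append(len(_default_graph))
    return sizes


reset_default_graph()
without_reset = nodes_per_fold(4, reset=False)
reset_default_graph()
with_reset = nodes_per_fold(4, reset=True)
print(without_reset)
print(with_reset)
```

Without the reset, the node count grows linearly with the fold index, which matches the symptom of saves getting slower fold after fold. With the reset, every fold starts from an empty graph.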
I'm using TensorBoard 1.5 and I would like to monitor how my gradients behave.
Here is an example of a layer I am using:
net = tf.layers.dense(features, 40, activation=tf.nn.relu, kernel_regularizer=regularizer,
kernel_initializer=tf.contrib.layers.xavier_initializer())
And here is my optimizer:
train_op = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(loss)
For my model parameters I create summaries this way:
for var in tf.trainable_variables():
    tf.summary.histogram(var.name, var)
Is there a similar way to get all the gradients in a for loop to create my summaries?
You should first get the gradients using compute_gradients of the optimizer and then pass them to summary:
opt = tf.train.AdamOptimizer(learning_rate=learning_rate)
# Calculate the gradients for the batch of data
grads = opt.compute_gradients(loss)
# Add histograms for gradients.
summaries = []
for grad, var in grads:
    if grad is not None:
        summaries.append(tf.summary.histogram(var.op.name + '/gradients', grad))
And then to perform the training, you can call the apply_gradients of optimizer:
# Apply the gradients to adjust the shared variables.
train_op = opt.apply_gradients(grads, global_step=global_step)
For more, see the TensorFlow CIFAR-10 tutorial.
I'm training a convolutional model in tensorflow. After training the model for about 70 epochs, which took almost 1.5 hrs, I couldn't save the model. It gave me ValueError: GraphDef cannot be larger than 2GB. I found that as the training proceeds the number of nodes in my graph keeps increasing.
At epochs 0,3,6,9, the number of nodes in the graph are 7214, 7238, 7262, 7286 respectively. When I use with tf.Session() as sess:, instead of passing the session as sess = tf.Session(), the number of nodes are 3982, 4006, 4030, 4054 at epochs 0,3,6,9 respectively.
In this answer, it is said that as nodes get added to the graph, it can exceed its maximum size. I need help understanding why the number of nodes keeps going up in my graph.
I train my model using the code below:
def runModel(data):
    '''
    Defines cost, optimizer functions, and runs the graph
    '''
    X, y,keep_prob = modelInputs((755, 567, 1),4)
    logits = cnnModel(X,keep_prob)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y), name="cost")
    optimizer = tf.train.AdamOptimizer(.0001).minimize(cost)
    correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1), name="correct_pred")
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    for e in range(12):
        batch_x, batch_y = data.next_batch(30)
        x = tf.reshape(batch_x, [30, 755, 567, 1]).eval(session=sess)
        batch_y = tf.one_hot(batch_y,4).eval(session=sess)
        sess.run(optimizer, feed_dict={X: x, y: batch_y,keep_prob:0.5})
        if e%3==0:
            n = len([n.name for n in tf.get_default_graph().as_graph_def().node])
            print("No.of nodes: ",n,"\n")
            current_cost = sess.run(cost, feed_dict={X: x, y: batch_y,keep_prob:1.0})
            acc = sess.run(accuracy, feed_dict={X: x, y: batch_y,keep_prob:1.0})
            print("At epoch {epoch:>3d}, cost is {a:>10.4f}, accuracy is {b:>8.5f}".format(epoch=e, a=current_cost, b=acc))
What causes an increase in the number of nodes?
You are creating new nodes within your training loop. In particular, you are calling tf.reshape and tf.one_hot, each of which creates one (or more) nodes on every iteration. You can either:
1. Create those nodes once, outside of the loop, using placeholders as inputs, and then only evaluate them in the loop.
2. Not use TensorFlow for those operations, and use NumPy or equivalent operations instead.
I would recommend the second one, since there does not seem to be any benefit in using TensorFlow for data preparation. You can have something like:
import numpy as np
# ...
x = np.reshape(batch_x, [30, 755, 567, 1])
# ...
# One way of doing one-hot encoding with NumPy
# (NumPy arrays expose the number of dimensions as .ndim, not .ndims)
classes_arr = np.arange(4).reshape([1] * batch_y.ndim + [-1])
batch_y = (np.expand_dims(batch_y, -1) == classes_arr).astype(batch_y.dtype)
# ...
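For what it's worth, another common NumPy idiom for one-hot encoding is to index an identity matrix with the labels. The sketch below uses made-up labels and 4 classes as assumptions, and checks that it agrees with the comparison-based approach.

```python
import numpy as np

# Made-up integer class labels, standing in for one batch of targets.
batch_y = np.array([0, 3, 1, 2, 3])
num_classes = 4

# Row i of the identity matrix is the one-hot vector for class i,
# so fancy-indexing with the labels picks one row per example.
one_hot = np.eye(num_classes, dtype=np.float32)[batch_y]

# Same result via broadcasting a comparison against the class indices.
by_comparison = (np.expand_dims(batch_y, -1) == np.arange(num_classes)).astype(np.float32)

print(one_hot.shape)
print(np.array_equal(one_hot, by_comparison))
```

Either way, no TensorFlow op is created, so the graph stays the same size no matter how many batches are processed.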
P.S.: I'd also recommend using tf.Session() as a context manager (in a with block) to make sure its close() method is called at the end (unless you want to keep using the same session later).
Another option, which solved a similar problem for me, is to call tf.reset_default_graph().
I'm learning how to use TensorFlow with the MNIST tutorial, but I'm stuck on one point of the tutorial.
Here is the code provided:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
saver = tf.train.Saver()
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
But I don't understand at all how the variables "W" (the weights) and "b" (the bias) are changed during the computation.
On each batch, they are initialized to zero, but what happens after that?
I don't see where in the code they are changed.
Thank you very much in advance!
TensorFlow variables maintain their state from one run() call to the next. In your program they will be initialized to zero, and then progressively updated in the training loop.
The code that changes the values of the variables is created, implicitly, by this line:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
In TensorFlow, a tf.train.Optimizer is a class that creates operations for updating variables, typically based on the gradient of some tensor (e.g. a loss) with respect to those variables. By default, when you call Optimizer.minimize(), TensorFlow creates operations to update all variables on which the given tensor (in this case cross_entropy) depends.
When you call sess.run(train_step), this runs a graph that includes those update operations, and therefore instructs TensorFlow to update the values of the variables.
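Here is a minimal pure-Python sketch of the same idea. It is not TensorFlow: the Variable class and make_train_step function are hypothetical names, and the loss (w - 3)^2 is made up for illustration. The point is that the variable is initialized once and then keeps its state between calls to the update step.

```python
class Variable:
    """Holds a value that persists between update steps (toy stand-in)."""

    def __init__(self, value):
        self.value = value


def make_train_step(var, grad_fn, learning_rate):
    """Build the update once; calling the result applies one gradient step."""
    def train_step():
        var.value -= learning_rate * grad_fn(var.value)
    return train_step


# Initialized to zero exactly once, before the loop.
w = Variable(0.0)

# Assumed loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
train_step = make_train_step(w, lambda v: 2.0 * (v - 3.0), learning_rate=0.1)

history = []
for _ in range(50):
    train_step()              # analogous to sess.run(train_step, ...)
    history.append(w.value)

print(history[0], w.value)
```

Each call moves w a little closer to the minimum at 3, starting from wherever the previous call left it. Nothing resets w to zero between iterations.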
I was trying to use batch normalization to train my neural networks in TensorFlow, but it was unclear to me how to use the official layer implementation of batch normalization (note this is different from the one in the API).
After some painful digging through their GitHub issues, it seems that one needs a tf.cond to use it properly and also a 'reuse=True' flag so that the BN shift and scale variables are properly reused. After figuring that out, I wrote up a short description of what I believe is the right way to use it here.
Now I have written a short script to test it (only a single layer and a ReLU; it is hard to make it smaller than this). However, I am not 100% sure how to test it. Right now my code runs with no error messages but unexpectedly returns NaNs, which lowers my confidence that the code I gave in the other post is right. Or maybe the network I have is weird. Either way, does someone know what's wrong? Here is the code:
import tensorflow as tf
# download and install the MNIST data automatically
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm
def batch_norm_layer(x,train_phase,scope_bn):
    bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
                          is_training=True,
                          reuse=None, # is this right?
                          trainable=True,
                          scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
                              is_training=False,
                              reuse=True, # is this right?
                              trainable=True,
                              scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z
def get_NN_layer(x, input_dim, output_dim, scope, train_phase):
    with tf.name_scope(scope+'vars'):
        W = tf.Variable(tf.truncated_normal(shape=[input_dim, output_dim], mean=0.0, stddev=0.1))
        b = tf.Variable(tf.constant(0.1, shape=[output_dim]))
    with tf.name_scope(scope+'Z'):
        z = tf.matmul(x,W) + b
    with tf.name_scope(scope+'BN'):
        if train_phase is not None:
            z = batch_norm_layer(z,train_phase,scope+'BN_unit')
    with tf.name_scope(scope+'A'):
        a = tf.nn.relu(z) # (M x D1) = (M x D) * (D x D1)
    return a
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# placeholder for data
x = tf.placeholder(tf.float32, [None, 784])
# placeholder that turns BN on during training or off during inference
train_phase = tf.placeholder(tf.bool, name='phase_train')
# variables for parameters
hiden_units = 25
layer1 = get_NN_layer(x, input_dim=784, output_dim=hiden_units, scope='layer1', train_phase=train_phase)
# create model
W_final = tf.Variable(tf.truncated_normal(shape=[hiden_units, 10], mean=0.0, stddev=0.1))
b_final = tf.Variable(tf.constant(0.1, shape=[10]))
y = tf.nn.softmax(tf.matmul(layer1, W_final) + b_final)
### training
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean( -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]) )
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    steps = 3000
    for iter_step in xrange(steps):
        #feed_dict_batch = get_batch_feed(X_train, Y_train, M, phase_train)
        batch_xs, batch_ys = mnist.train.next_batch(100)
        # Collect model statistics
        if iter_step%1000 == 0:
            batch_xstrain, batch_ystrain = batch_xs, batch_ys #simulates train data
            batch_xcv, batch_ycv = mnist.test.next_batch(5000) #simulates CV data
            batch_xtest, batch_ytest = mnist.test.next_batch(5000) #simulates test data
            # do inference
            train_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xs, y_:batch_ys, train_phase: False})
            cv_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xcv, y_:batch_ycv, train_phase: False})
            test_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xtest, y_:batch_ytest, train_phase: False})
            def do_stuff_with_errors(*args):
                print args
            do_stuff_with_errors(train_error, cv_error, test_error)
        # Run Train Step
        sess.run(fetches=train_step, feed_dict={x: batch_xs, y_:batch_ys, train_phase: True})
    # list of booleans indicating correct predictions
    correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    # accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels, train_phase: False}))
When I run it I get:
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(2.3474066, 2.3498712, 2.3461707)
(0.49414295, 0.88536006, 0.91152304)
(0.51632041, 0.393666, nan)
0.9296
It used to be that all of the last values were nan, and now only a few of them are. Is everything fine or am I being paranoid?
I am not sure if this will solve your problem, but the documentation for BatchNorm is not very easy to use or informative, so here is a short recap of how to use simple BatchNorm:
First of all, you define your BatchNorm layer. If you want to use it after an affine/fully-connected layer, you do this (just an example, order can be different/as you desire):
...
inputs = tf.matmul(inputs, W) + b
inputs = tf.layers.batch_normalization(inputs, training=is_training)
inputs = tf.nn.relu(inputs)
...
The function tf.layers.batch_normalization creates extra update operations for its internal variables, the moving mean and moving variance. These operations are not run automatically; they are placed in the tf.GraphKeys.UPDATE_OPS collection, and you have to make the training operation depend on them. As such, you must create your optimizer step as follows (after all layers have been defined!):
...
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    trainer = tf.train.AdamOptimizer()
    updateModel = trainer.minimize(loss, global_step=global_step)
...
You can read more about it here. I know it's a little late to answer your question, but it might help other people who come across BatchNorm problems in TensorFlow! :)
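To illustrate why the update operations matter, here is a toy pure-Python sketch of batch normalization's moving statistics. It is not the TensorFlow implementation: the ToyBatchNorm class, the momentum and epsilon values, and the sample batch are all assumptions chosen for illustration.

```python
def batch_mean_var(batch):
    """Population mean and variance of a list of floats."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return mean, var


class ToyBatchNorm:
    """Tracks moving statistics the way the update ops are meant to."""

    def __init__(self, momentum=0.99, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0   # initial values before any update runs
        self.moving_var = 1.0

    def update_moving_stats(self, batch):
        """The work that is skipped when the update ops never run."""
        mean, var = batch_mean_var(batch)
        m = self.momentum
        self.moving_mean = m * self.moving_mean + (1.0 - m) * mean
        self.moving_var = m * self.moving_var + (1.0 - m) * var

    def infer(self, batch):
        """Inference-mode normalization: uses the moving statistics only."""
        scale = (self.moving_var + self.epsilon) ** 0.5
        return [(v - self.moving_mean) / scale for v in batch]


batch = [10.0, 12.0, 14.0]

stale = ToyBatchNorm()      # update step never called
updated = ToyBatchNorm()    # update step called on every training step
for _ in range(1000):
    updated.update_moving_stats(batch)

print(stale.infer(batch))
print(updated.infer(batch))
```

With stale statistics, inference barely normalizes the inputs at all, so a model can look fine in training mode and behave very differently in inference mode. Wrapping the train op in tf.control_dependencies is what keeps the real moving averages up to date.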
training = tf.placeholder(tf.bool, name='training')
lr_holder = tf.placeholder(tf.float32, [], name='learning_rate')
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=lr_holder).minimize(cost)
When defining the layers, you need to use the placeholder 'training':
batchNormal_layer = tf.layers.batch_normalization(pre_batchNormal_layer, training=training)