I'm trying to implement an asynchronous parameter server, DistBelief style using TensorFlow. I found that minimize() is split into two functions, compute_gradients and apply_gradients, so my plan is to insert a network boundary between them. I have a question about how to evaluate all the gradients simultaneously and pull them out all at once. I understand that eval only evaluates the subgraph necessary, but it also only returns one tensor, not the chain of tensors required to compute that tensor.
How can I do this more efficiently? I took the Deep MNIST example as a starting point:
import tensorflow as tf
import download_mnist
def weight_variable(shape, name):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=name)

def bias_variable(shape, name):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial, name=name)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
mnist = download_mnist.read_data_sets('MNIST_data', one_hot=True)
session = tf.InteractiveSession()
x = tf.placeholder("float", shape=[None, 784], name='x')
x_image = tf.reshape(x, [-1,28,28,1], name='reshape')
y_ = tf.placeholder("float", shape=[None, 10], name='y_')
W_conv1 = weight_variable([5, 5, 1, 32], 'W_conv1')
b_conv1 = bias_variable([32], 'b_conv1')
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([5, 5, 32, 64], 'W_conv2')
b_conv2 = bias_variable([64], 'b_conv2')
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_fc1 = weight_variable([7 * 7 * 64, 1024], 'W_fc1')
b_fc1 = bias_variable([1024], 'b_fc1')
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder("float", name='keep_prob')
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = weight_variable([1024, 10], 'W_fc2')
b_fc2 = bias_variable([10], 'b_fc2')
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
loss = -tf.reduce_sum(y_ * tf.log(y_conv))
optimizer = tf.train.AdamOptimizer(1e-4)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
compute_gradients = optimizer.compute_gradients(loss)
session.run(tf.initialize_all_variables())
batch = mnist.train.next_batch(50)
feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}
gradients = []
for grad_var in compute_gradients:
    grad = grad_var[0].eval(feed_dict=feed_dict)
    var = grad_var[1]
    gradients.append((grad, var))
I think this last for loop actually recomputes the last gradient several times, whereas the first gradient is computed only once. Is that right? How can I grab all the gradients without recomputing them?
Here is a simple example. Once you understand it, try it on your specific task.
Initialize required symbols.
x = tf.Variable(0.5)
y = x*x
opt = tf.train.AdagradOptimizer(0.1)
grads = opt.compute_gradients(y)
grad_placeholder = [(tf.placeholder("float", shape=grad[1].get_shape()), grad[1]) for grad in grads]
apply_placeholder_op = opt.apply_gradients(grad_placeholder)
transform_grads = [(function1(grad[0]), grad[1]) for grad in grads]
apply_transform_op = opt.apply_gradients(transform_grads)
Initialize
sess = tf.Session()
sess.run(tf.initialize_all_variables())
Get all gradients
grad_vals = sess.run([grad[0] for grad in grads])
Apply gradients
feed_dict = {}
for i in xrange(len(grad_placeholder)):
    feed_dict[grad_placeholder[i][0]] = function2(grad_vals[i])
sess.run(apply_placeholder_op, feed_dict=feed_dict)
sess.run(apply_transform_op)
Note: I haven't tested this code myself, but it should be valid apart from minor errors.
Note: function1 and function2 can be any computation, such as 2*x, x^e or e^x, and so on.
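For instance (my own illustration, not part of the original answer), function1 could be a graph-side transform such as clipping the symbolic gradient, while function2 could be a numpy-side transform applied to the gradient values you already fetched:
# hypothetical examples of function1 / function2; any elementwise transform works
def function1(grad_tensor):
    # acts on the symbolic gradient tensor, before apply_gradients is built
    return tf.clip_by_value(grad_tensor, -1.0, 1.0)

def function2(grad_value):
    # acts on the evaluated (numpy) gradient, before it is fed back in
    return 0.5 * grad_value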
Refer: TensorFlow apply_gradients remotely
I coded up a very simple example with comments (inspired by the above answer) that you can run to see gradient descent in action:
import tensorflow as tf
# function to transform gradients
def T(g, decay=1.0):
    # return decayed gradient
    return decay*g
# x variable
x = tf.Variable(10.0,name='x')
# b placeholder (simulates the "data" part of the training)
b = tf.placeholder(tf.float32)
# make model (1/2)(x-b)^2
xx_b = 0.5*tf.pow(x-b,2)
y=xx_b
learning_rate = 1.0
opt = tf.train.GradientDescentOptimizer(learning_rate)
# gradient variable list = [ (gradient,variable) ]
gv = opt.compute_gradients(y,[x])
# transformed gradient variable list = [ (T(gradient),variable) ]
decay = 0.1 # decay the gradient for the sake of the example
tgv = [(T(g,decay=decay),v) for (g,v) in gv] #list [(grad,var)]
# apply transformed gradients (this case no transform)
apply_transform_op = opt.apply_gradients(tgv)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    epochs = 10
    for i in range(epochs):
        b_val = 1.0 #fake data (in SGD it would be different on every epoch)
        print '----'
        x_before_update = x.eval()
        print 'before update',x_before_update
        # compute gradients
        grad_vals = sess.run([g for (g,v) in gv], feed_dict={b: b_val})
        print 'grad_vals: ',grad_vals
        # applies the gradients
        result = sess.run(apply_transform_op, feed_dict={b: b_val})
        print 'value of x should be: ', x_before_update - T(grad_vals[0], decay=decay)
        x_after_update = x.eval()
        print 'after update', x_after_update
You can observe the change in the variable as it is trained, as well as the value of the gradient. Note that the only reason T decays the gradient is that otherwise x would reach the global minimum in one step: dy/dx = x - b = 9 at the start, so with a learning rate of 1.0 a full step lands x exactly on b.
As an extra bonus, if you want to see it work with tensorboard, here you go! :)
## run cmd to collect model: python quadratic_minimizer.py --logdir=/tmp/quaratic_temp
## show board on browser run cmd: tensorboard --logdir=/tmp/quaratic_temp
## browser: http://localhost:6006/
import tensorflow as tf
# function to transform gradients
def T(g, decay=1.0):
    # return decayed gradient
    return decay*g
# x variable
x = tf.Variable(10.0,name='x')
# b placeholder (simulates the "data" part of the training)
b = tf.placeholder(tf.float32)
# make model (1/2)(x-b)^2
xx_b = 0.5*tf.pow(x-b,2)
y=xx_b
learning_rate = 1.0
opt = tf.train.GradientDescentOptimizer(learning_rate)
# gradient variable list = [ (gradient,variable) ]
gv = opt.compute_gradients(y,[x])
# transformed gradient variable list = [ (T(gradient),variable) ]
decay = 0.9 # decay the gradient for the sake of the example
tgv = [ (T(g,decay=decay), v) for (g,v) in gv] #list [(grad,var)]
# apply transformed gradients (this case no transform)
apply_transform_op = opt.apply_gradients(tgv)
(dydx,_) = tgv[0]
x_scalar_summary = tf.scalar_summary("x", x)
grad_scalar_summary = tf.scalar_summary("dydx", dydx)
with tf.Session() as sess:
    merged = tf.merge_all_summaries()
    tensorboard_data_dump = '/tmp/quaratic_temp'
    writer = tf.train.SummaryWriter(tensorboard_data_dump, sess.graph)
    sess.run(tf.initialize_all_variables())
    epochs = 14
    for i in range(epochs):
        b_val = 1.0 #fake data (in SGD it would be different on every epoch)
        print '----'
        x_before_update = x.eval()
        print 'before update',x_before_update
        # get gradients
        #grad_list = [g for (g,v) in gv]
        (summary_str_grad,grad_val) = sess.run([merged] + [dydx], feed_dict={b: b_val})
        grad_vals = sess.run([g for (g,v) in gv], feed_dict={b: b_val})
        print 'grad_vals: ',grad_vals
        writer.add_summary(summary_str_grad, i)
        # applies the gradients
        [summary_str_apply_transform,_] = sess.run([merged,apply_transform_op], feed_dict={b: b_val})
        writer.add_summary(summary_str_apply_transform, i)
        print 'value of x after update should be: ', x_before_update - T(grad_vals[0], decay=decay)
        x_after_update = x.eval()
        print 'after update', x_after_update
Related
I'm trying to get started with TensorFlow in Python, building a simple CNN with batch normalization. But when I create a new graph to run, an exception happens in the batch norm code.
My key code is as follows:
# exception here
def batch_norm(x, beta, gamma, phase_train, scope='bn', decay=0.9, eps=1e-5):
    with tf.variable_scope(scope):
        batch_mean, batch_var = tf.nn.moments(x, [0], name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=decay)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        mean, var = tf.cond(phase_train, mean_var_with_update, lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, eps)
    return normed
training code:
# start training
output = conv2d_net()
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=output, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=0.002).minimize(loss)
predict = tf.reshape(output, [-1, MAX_CAPTCHA, CHAR_SET_LEN])
max_idx_p = tf.argmax(predict, 2)
max_idx_l = tf.argmax(tf.reshape(Y, [-1, MAX_CAPTCHA, CHAR_SET_LEN]), 2)
correct_pred = tf.equal(max_idx_p, max_idx_l)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 0
    while True:
        batch_x, batch_y = get_next_batch(64)
        _, loss_ = sess.run([optimizer, loss],
                            feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.75, train_phase: True})
        print(step, loss_)
        if step % 10 == 0 and step != 0:
            batch_x_test, batch_y_test = get_next_batch(100)
            acc = sess.run(accuracy,
                           feed_dict={X: batch_x_test, Y: batch_y_test, keep_prob: 1., train_phase: False})
            print("step %s,accuracy:%s" % (step, acc))
            if acc > 0.05:
                # stop training and save parameters in layer
                result_weights['wc1'] = weights['wc1'].eval(sess)
                ...
                break
        step += 1
Create new graph for exporting:
EXPORT_DIR = './model'
if os.path.exists(EXPORT_DIR):
    shutil.rmtree(EXPORT_DIR)
g = tf.Graph()
with g.as_default():
    x_2 = tf.placeholder(tf.float32, shape=[None, IMAGE_HEIGHT * IMAGE_WIDTH], name="input")
    x_image = tf.reshape(x_2, shape=[-1, IMAGE_HEIGHT, IMAGE_WIDTH, 1])
    # fill trained parameters and create new cnn layers
    WC1 = tf.constant(result_weights['wc1'], name="WC1")
    ...
    # crash here!!!
    CONV1 = conv2d(WC1, BC1, x_image, tf.constant(0.0, shape=[32]),
                   tf.random_normal(shape=[32], mean=1.0, stddev=0.02), scope='BN_1')
    OUTPUT = tf.add(tf.matmul(FULL1, W_OUT), B_OUT)
    OUTPUT = tf.nn.sigmoid(OUTPUT, name="output")
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    graph_def = g.as_graph_def()
    tf.train.write_graph(graph_def, EXPORT_DIR, 'phone_model_graph.pb', as_text=True)
I create the new graph at the end. The exception seems to mean that a parameter from the old training graph is being used. How can this be explained?
Thank you very much!
Log is:
I call batch_norm in the function conv2d. It seems no tensor is passed to the new graph.
def conv2d(w, b, x, tf_constant, tf_random_normal, scope, keep_p=1., phase=tf.constant(False)):
    out = tf.nn.bias_add(tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME'), b)
    out = batch_norm(out, tf_constant, tf_random_normal, phase, scope=scope)
    out = tf.nn.relu(out)
    out = tf.nn.max_pool(out, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    out = tf.nn.dropout(out, keep_p)
    return out
I create the new graph at the end.
That's the key statement here: once a new graph is created, one can't use any tensor from the old graph. See a detailed explanation in this question. According to the stack trace, at least one of the tensors that is passed to batch_norm is defined before g.as_default(), and that's why TensorFlow crashes. From your code snippets it's unclear how exactly batch_norm is called, so I can't say which one.
You can check this hypothesis by printing x.graph and g and checking whether these values are different. To avoid this problem you can either do all the work inside one graph (which is the recommended way) or define the two graphs in different Python scopes, making it impossible to accidentally reuse the same Python variable in both graphs.
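A minimal sketch of that check, reusing names from the snippets above (the helper build_export_graph below is hypothetical, just to show the scoping idea):
# inside "with g.as_default():", before the crashing call
print(x_image.graph is g)   # True: x_image was created inside g
# print t.graph for every tf.Tensor handed to conv2d / batch_norm; any tensor whose
# graph is not g was created in the old training graph and will trigger the crash

# hypothetical pattern to keep the graphs apart by construction
def build_export_graph(result_weights):
    with tf.Graph().as_default() as export_graph:
        # rebuild the network here, using only numpy arrays wrapped in tf.constant(...)
        return export_graph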
Given the following example code (to work with):
import numpy as np
import tensorflow as tf

# Generate random data
x_train = np.random.rand(64, 16, 16, 8)
y_train = np.random.randint(0, 5, 64)
one_hot = np.zeros((len(y_train), 5))
one_hot[list(np.indices((len(y_train),))) + [y_train]] = 1
y_train = one_hot
# Model definition
class FeedForward(object):
    def __init__(self):
        self.x = tf.placeholder(tf.float32, shape=[None, 16, 16, 8], name="input_x")
        self.y = tf.placeholder(tf.float32, [None, 5], name="input_y")
        with tf.name_scope("conv1"):
            kernel_shape=[3, 3, 8, 8]
            w = tf.Variable(tf.truncated_normal(kernel_shape, stddev=0.1), name="weight")
            conv1 = tf.nn.conv2d(self.x, w, strides=[1, 1, 1, 1], padding="SAME", name="conv")
        with tf.name_scope("conv2"):
            kernel_shape=[3, 3, 8, 4]
            w = tf.Variable(tf.truncated_normal(kernel_shape, stddev=0.1), name="weight")
            conv2 = tf.nn.conv2d(conv1, w, strides=[1, 1, 1, 1], padding="SAME", name="conv")
        out = tf.contrib.layers.flatten(conv2)
        with tf.name_scope("output"):
            kernel_shape=[out.get_shape()[1].value, 5]
            w = tf.Variable(tf.truncated_normal(kernel_shape, stddev=0.1), name="weight")
            self.scores = tf.matmul(out, w)
        predictions = tf.argmax(self.scores, axis=1, name="predictions")
        self.loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.y))
        correct_predictions = tf.equal(predictions, tf.argmax(self.y, axis=1))
        self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
I wish to perform a custom weight update step, i.e., aside from the gradient update on each iteration, I would like to subtract some fixed value from my weight parameters, as below:
with tf.Graph().as_default():
    session_conf = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
    sess = tf.Session(config=session_conf)
    with sess.as_default():
        ffn = FeedForward()
        global_step = tf.Variable(0, name="global_step", trainable=False)
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-2)
        grads_and_vars = optimizer.compute_gradients(ffn.loss)
        updated_gv = []
        for g, w in grads_and_vars:
            # perform update on weights aside from output weights
            if ("weight" in w.name) and ("output" not in w.name):
                # some weight update
                w_update = tf.Variable.assign(w, w - tf.constant(1.0, shape=w.get_shape()))
            updated_gv.append([g, w])
        # next two lines are not required here
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            train_op = optimizer.apply_gradients(updated_gv, global_step=global_step)
        sess.run(tf.global_variables_initializer())

        def train_step(x_batch, y_batch):
            feed_dict = {
            }
            _, step, _update, loss, accuracy = sess.run([train_op, global_step, w_update, ffn.loss, ffn.accuracy],
                                                        feed_dict={ffn.x: x_batch, ffn.y: y_batch})
            print("step {}, loss {:g}, acc {:g}".format(step, loss, accuracy))

        batch_size = 32
        s_idx = - batch_size
        for batch_index in range(2):
            s_idx += batch_size
            e_idx = s_idx + batch_size
            x_batch = x_train[s_idx:e_idx]
            y_batch = y_train[s_idx:e_idx]
            train_step(x_batch, y_batch)
            current_step = tf.train.global_step(sess, global_step)
However, the above code (and other similar variations of it) does not affect the actual weights. I assume that any assignments using tf.Variable.assign... or tf.assign... are made to some copy of the original variable.
What is the most meaningful way to perform the intended update?
The line update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) just returns an empty list in your code.
If you want to perform a (manual) weight change after calculating and applying the gradient, you can break it into two steps:
sess.run([your normal gradient step]) and then
sess.run([your weight update ops])
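A rough sketch of that split, using the names from the question (untested; treat it as an outline rather than a drop-in fix):
# build explicit assign ops once, outside the training loop
manual_updates = []
for g, w in grads_and_vars:
    if ("weight" in w.name) and ("output" not in w.name):
        # subtract a fixed value directly from the variable
        manual_updates.append(tf.assign(w, w - 1.0))
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

# step 1: the normal gradient step
sess.run(train_op, feed_dict={ffn.x: x_batch, ffn.y: y_batch})
# step 2: the manual weight change, guaranteed to happen after the gradient step
sess.run(manual_updates)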
In the following code, I have modified the Deep MNIST example from the tensorflow tutorials (official).
Modifications: I added weight decay to the loss function and am also modifying the weights. (If this is incorrect, please let me know.)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import sys
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
from hyperopt import STATUS_OK, STATUS_FAIL
Flags2=None
def build_and_optimize(hp_space):
    global Flags2
    Flags2 = {}
    Flags2['dp'] = hp_space['dropout_global']
    Flags2['wd'] = hp_space['wd']
    res = main(Flags2)
    results = {
        'loss': res,
        'status': STATUS_OK
    }
    return results
def deepnn(x):
    """deepnn builds the graph for a deep net for classifying digits.
    args:
        x: an input tensor with the dimensions (N_examples, 784), where 784 is the number of pixels in a standard MNIST image.
    returns:
        a tuple (y, keep_prob). y is a tensor of shape (N_examples, 10), with values equal to the logits of classifying the digit into one of 10 classes (the digits 0-9). keep_prob is a scalar placeholder for the probability of dropout.
    """
    # reshape to use within a convolutional neural net
    # last dimension is for "features" - there is only one here, since images are
    # grayscale -- it would be 3 for RGB, 4 for RGBA, etc.
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    wd = tf.placeholder(tf.float32)
    # first convolutional layer - maps one grayscale image to 32 feature maps
    W_conv1 = weight_variable([5, 5, 1, 32], wd)
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    # pooling layer - downsamples by 2X
    h_pool1 = max_pool_2X2(h_conv1)
    # second convolutional layer -- maps 32 feature maps to 64
    W_conv2 = weight_variable([5, 5, 32, 64], wd)
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    # second pooling layer - downsamples by 2X
    h_pool2 = max_pool_2X2(h_conv2)
    # fully connected layer 1 -- after 2 rounds of downsampling, our 28x28 image
    # is down to 7x7x64 feature maps -- maps this to 1024 features.
    W_fc1 = weight_variable([7*7*64, 1024], wd)
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    # dropout - controls the complexity of the model, prevents co-adaptation of features.
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    # map the 1024 features to 10 classes, one for each digit
    W_fc2 = weight_variable([1024, 10], wd)
    b_fc2 = bias_variable([10])
    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
    return y_conv, keep_prob, wd
def conv2d(x, W):
    """conv2d returns a 2d convolution layer with full stride."""
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2X2(x):
    """max_pool_2x2 downsamples a feature map by 2X."""
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
def weight_variable(shape, wd = None):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    # weight decay
    if wd is not None:
        weight_decay = tf.multiply(tf.nn.l2_loss(initial), wd, name = 'weight_loss')
        tf.add_to_collection('losses', weight_decay)
    return tf.Variable(initial)

def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
def main(_):
    global Flags2
    if Flags2 is None:
        Flags2 = {}
    if 'keep_prob' not in Flags2:
        Flags2 = {}
        Flags2['dp'] = 1.0
        Flags2['wd'] = 0.0
    print(Flags2)
    # import data
    mnist = input_data.read_data_sets('/tmp/tensorflow/mnist/input_data', one_hot=True)
    # create the model
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    # build the graph for the deep net
    y_conv, keep_prob, wd = deepnn(x)
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    # adding weight decay
    tf.add_to_collection('losses', cross_entropy)
    total_loss = tf.add_n(tf.get_collection('losses'), name='total_loss')
    train_step = tf.train.AdamOptimizer(1e-4).minimize(total_loss)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(1000):
            batch = mnist.train.next_batch(200)
            if i % 100 == 0:
                train_accuracy = accuracy.eval(feed_dict={
                    x: batch[0], y_: batch[1], keep_prob: Flags2['dp'], wd: Flags2['wd']})
                print('step %d, training accuracy %g' % (i, train_accuracy))
            train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: Flags2['dp'], wd: Flags2['wd']})
        test_accuracy = accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0, wd: Flags2['wd']})
        print('test accuracy %g' % test_accuracy)
    return test_accuracy
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--data_dir', type=str,
default='/tmp/tensorflow/mnist/input_data',
help='directory for storing input data')
FLAGS, unparsed = parser.parse_known_args()
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
Hyperopt is used to tune the hyper-parameters (weight decay factor and dropout probability).
from hyperopt import fmin, tpe, hp, Trials
import pickle
import traceback
from my_mnist_convnet import build_and_optimize
space = {
'dropout_global': hp.uniform('conv_dropout_prob', 0.4, 0.6),
'wd': hp.uniform('wd', 0.0, 0.01)
}
def run_a_trail():
    """Run one TPE meta optimisation step and save its results."""
    max_evals = nb_evals = 3
    print("Attempt to resume a past training if it exists:")
    try:
        trials = pickle.load(open("results.pkl", "rb"))
        print("Found saved Trials! Loading...")
        max_evals = len(trials.trials) + nb_evals
        print("Rerunning from {} trials to add another one.".format(
            len(trials.trials)))
    except:
        trials = Trials()
        print("Starting from scratch: new trials.")
    best = fmin(
        build_and_optimize,
        space,
        algo=tpe.suggest,
        trials=trials,
        max_evals=max_evals
    )
    pickle.dump(trials, open("results.pkl", "wb"))
    print(best)
    return
def plot_base_and_best_models():
    return

if __name__ == "__main__":
    """plot the model and run the optimisation forever (and save results)."""
    run_a_trail()
When the hyperopt code is used, everything runs fine for a single TPE run; however, if the number of trials is increased, it reports the following error.
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Shape [-1,784] has negative dimensions
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[?,784], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
This problem is most likely arising because each call to build_and_optimize() is adding nodes to the same TensorFlow graph, and the tf.train.AdamOptimizer is attempting to optimize variables from all of the previous graphs in addition to the current graph. To work around this problem, modify build_and_optimize() so that it runs main() in a different TensorFlow graph, using the following change:
def build_and_optimize(hp_space):
    global Flags2
    Flags2 = {}
    Flags2['dp'] = hp_space['dropout_global']
    Flags2['wd'] = hp_space['wd']
    # Create a new, empty graph for each trial to avoid interference from
    # previous trials.
    with tf.Graph().as_default():
        res = main(Flags2)
    results = {
        'loss': res,
        'status': STATUS_OK
    }
    return results
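Alternatively (a sketch I have not tested against your setup), you can reset the default graph at the start of every trial, which also gives each call to main() a fresh graph:
def build_and_optimize(hp_space):
    global Flags2
    Flags2 = {'dp': hp_space['dropout_global'], 'wd': hp_space['wd']}
    # drop every node left over from previous trials before building the next model
    tf.reset_default_graph()
    res = main(Flags2)
    return {'loss': res, 'status': STATUS_OK}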
I have made a convolutional neural network model using TensorFlow to recognize handwriting, referring to the TensorFlow tutorials [1]. The model uses convolutional filter1: [5,5,1,16], filter2: [5,5,16,32], fully connected layers [7*7*32, 1024] and [1024, 10], and then softmax to convert the output to probabilities. When I ran this model, training failed: the loss never decreased and all of the outputs were [0,0,1,0,0,0,0,0,0,0,0].
Then I reduced the number of filters and neurons, training succeeded, and the accuracy reached about 97%.
Why can't I train successfully when I build the model with the original (larger) number of filters and neurons?
Here is my failed model.(I used "mnist.csv")
x = tf.placeholder(tf.float32,[None,28*28])
t = tf.placeholder(tf.float32,[None,10])
def weight(shape):
    init = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(init)

def bias(shape):
    init = tf.constant(0.1, shape=shape)
    return tf.Variable(init)

def conv2d(x,W):
    return tf.nn.conv2d(x,W,strides=[1,1,1,1],padding="SAME")

def max_pool_22(x):
    return tf.nn.max_pool(x,ksize=[1,2,2,1],strides=[1,2,2,1],padding="SAME")
W_conv1 = weight([5,5,1,16])
b_conv1 = bias([16])
x_image = tf.reshape(x,[-1,28,28,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_22(h_conv1)
print(h_pool1.shape)
W_conv2 = weight([5,5,16,64])
b_conv2 = bias([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1,W_conv2) + b_conv2)
h_pool2 = max_pool_22(h_conv2)
W_fc1 = weight([7*7*64,1024])
b_fc1 = bias([1024])
h_pool2_flat = tf.reshape(h_pool2,[-1,7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat,W_fc1) + b_fc1)
W_fc2 = weight([1024,10])
b_fc2 = bias([10])
prediction = tf.nn.softmax(tf.matmul(h_fc1,W_fc2) + b_fc2)
cross_entropy=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=t,logits=prediction))
train_step = tf.train.AdamOptimizer().minimize(cross_entropy)
correct_prediction =tf.equal(tf.argmax(prediction,1),tf.argmax(t,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
for epoch in range(20):
    avg_loss = 0.
    avg_accuracy = 0.
    for i in range(1000):
        ind = np.random.choice(len(x_train),50)
        x_train_batch = x_train[ind]
        t_train_batch = t_train[ind]
        _, loss, a = sess.run([train_step,cross_entropy, accuracy],feed_dict={x:x_train_batch,t:t_train_batch})
        avg_loss += loss/1000
        avg_accuracy += a/1000
    if epoch % 1 == 0:
        print("Step:{0} Loss:{1} TrainAccuracy:{2}".format(epoch,avg_loss,avg_accuracy))
print("test_accuracy:{0}".format(accuracy.eval(feed_dict={x:x_test,t:t_test})))
[1]: https://www.tensorflow.org/get_started/mnist/pros
You are calling softmax_cross_entropy_with_logits on the output of softmax. This applies softmax twice, leading to wrong results. softmax_cross_entropy_with_logits should be called on the linear output of the last layer, before applying softmax:
y = tf.matmul(h_fc1,W_fc2) + b_fc2
cross_entropy=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=t, logits=y))
prediction_probabilities = tf.nn.softmax(y)
prediction_class = tf.argmax(y, 1)
The prediction_probabilities tensor above is only needed if you need the probabilities of each class. Otherwise, you can call argmax on y directly to get the predicted class.
I am currently working on creating visualizations for a maximal input image given the kernel/filters generated by a Convolutional Neural Network.
Keras had a blog post here that does something similar, but the results were questionable at best when using anything but the supplied dataset, so I thought I might give it a try with Tensorflow directly. [I will try and edit my post later with the images from it, not available on this computer].
Using the MNIST dataset along with the Tensorflow tutorial and Keras blog post as reference, I have generated the following code in attempts to create said visualizations. I am not sure if my methodology is correct, especially with how/when to normalize my results to visualize them.
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
import copy
from scipy.misc import imsave
#~~~~~~~~~~~~~~~~~~~~~~~~~ CNN ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#Most of the CNN section directly from the tutorial
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
img_width = 28
img_height = 28
n = 3
remove_negatives = False
normalize = True
use = 'layer'
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return(tf.Variable(initial))

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return(tf.Variable(initial))

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,28,28,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(5000):
        batch = mnist.train.next_batch(50)
        if i%100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x:batch[0], y_: batch[1], keep_prob: 1.0})
            print("step %d, training accuracy %g"%(i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    layer = sess.run(W_conv1[:,:,:,:])
    bias = sess.run(b_conv1)
    layer2 = sess.run(W_conv2[:,:,:,:])
    bias2 = sess.run(b_conv2)
#~~~~~~~~~~~~~~~ Begin Visualization Code ~~~~~~~~~~~~~~~~
kept_filters = []
layer_use = layer
bias_use = bias
k=1
#toggle between layer 1 and layer 2 based on variable defined at beginning
if use != 'layer':
    k = np.shape(layer2[:,:,:,:])[2]
    layer_use = layer2
    bias_use = bias2
#loop through kernels/feature maps and maximize each one's input image
for fmap in range(len(layer[0,0,0,:])):
    feat_map = fmap
    #randomized white-noise input image that will be max'ed
    noise_mat = weight_variable([1,28,28,k])
    #load kernel as a constant
    single_layer = tf.constant(layer_use[:,:,0:k,feat_map-1:feat_map] + bias_use[feat_map],dtype=tf.float32)
    conv = conv2d(noise_mat,single_layer)
    #Use mean of the image matrix as the "loss" - is this the proper way to do this?
    loss = -tf.reduce_mean(conv)
    train_step = tf.train.GradientDescentOptimizer(.5).minimize(loss,var_list=[noise_mat])
    #the training/maximizing
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        updatelist = [np.sum(sess.run(noise_mat)[0,:,:,0])]
        noise_mat_begin = sess.run(noise_mat[0,:,:,0])
        conv_saved = sess.run(conv)
        for __ in range(5000):
            train_step.run()
            if __%200 == 0:
                updatelist = updatelist + [np.sum(sess.run(noise_mat)[0,:,:,0])]
        noise_mat_end = sess.run(noise_mat)[0,:,:,0]
    noise_mat_normed = copy.deepcopy(noise_mat_end)
    #not sure the best way to normalize?
    if remove_negatives:
        noise_mat_normed[noise_mat_normed <= 0] = 0
    if normalize:
        std = np.std(noise_mat_normed)
        mean = np.mean(noise_mat_normed)
        def full_norm(val):
            return((val - mean)/std)
        vnew = np.vectorize(full_norm)
        noise_mat_normed = vnew(noise_mat_normed)
    else:
        oldmax = np.max(noise_mat_normed)
        oldmin = np.min(noise_mat_normed)
        def new_range(val,OldMax,OldMin):
            return((((val - OldMin) * 255) / (OldMax - OldMin)))
        vnew = np.vectorize(new_range)
        noise_mat_normed = vnew(noise_mat_normed,oldmax,oldmin)
    #negative sums generally imply a lack of convergence due to my loss metric, so remove them
    if np.sum(noise_mat_normed) > 0:
        kept_filters += [noise_mat_normed]
#visualize results in a grid format, similar to the blog post
kept_filters = kept_filters[:n * n]
margin = 5
width = n * img_width + (n - 1) * margin
height = n * img_height + (n - 1) * margin
stitched_filters = np.zeros((width, height))
for i in range(n):
    for j in range(n):
        img = kept_filters[i * n + j]
        stitched_filters[(img_width + margin) * i: (img_width + margin) * i + img_width,
                         (img_height + margin) * j: (img_height + margin) * j + img_height] = img
imsave('TF_vis_%dx%d.png' % (n, n), stitched_filters)
This produces results like so (from convolutional layer 1):
I'm not sure if this is at all correct, especially since layer 2 doesn't seem much different. Do my results and/or methodology seem reasonable? Has anyone else done this using the MNIST dataset? As an aside, validation accuracy was >95%.
EDIT: I must have been doing something wrong originally; I re-did/re-ran the code from the blog post and now the results from my own Tensorflow code look about the same as the blog post method's output, so that's good. However, the main concerns still stand:
Why am I not getting more obvious or distinct outputs? I know they won't be as specific as the filters themselves, but these images don't seem to portray anything, unlike their blog post counterparts. Is there just not enough variation in the original dataset?
Shouldn't I be getting at least SOME things that aren't just glorified bordered images, like diagonals or curves?
Shouldn't the second layer look like a more complex iteration of the first?