Tensorboard Graph with custom training loop does not include my Model - python

I have created my own loop as shown in the TF 2 migration guide here.
I am currently able to see the graph for only the --- VISIBLE --- section of the code below. How do I make my model (defined in the ---NOT VISIBLE--- section) visible in tensorboard?
If I were not using a custom training loop, I could have gone with the documented model.fit approach:
model.fit(..., callbacks=[keras.callbacks.TensorBoard(log_dir=logdir)])
In TF 1, the approach used to be quite straightforward:
tf.compat.v1.summary.FileWriter(LOGDIR, sess.graph)
The Tensorboard migration guide clearly states (here) that:
No direct writing of tf.compat.v1.Graph - instead use @tf.function and trace functions
configure_default_gpus()
tf.summary.trace_on(graph=True)
K = tf.keras
dataset = sanity_dataset(BATCH_SIZE)
#-------------------------- NOT VISIBLE -----------------------------------------
model = K.models.Sequential([
    K.layers.Flatten(input_shape=(IMG_WIDTH, IMG_HEIGHT, IMG_CHANNELS)),
    K.layers.Dense(10, activation=K.layers.LeakyReLU()),
    K.layers.Dense(IMG_WIDTH * IMG_HEIGHT * IMG_CHANNELS, activation=K.layers.LeakyReLU()),
    K.layers.Reshape((IMG_WIDTH, IMG_HEIGHT, IMG_CHANNELS)),
])
#--------------------------------------------------------------------------------
optimizer = tf.keras.optimizers.Adam()
loss_fn = K.losses.Huber()

@tf.function
def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        #-------------------------- VISIBLE ---------------------------------------------
        pred_loss = loss_fn(targets, predictions)
    gradients = tape.gradient(pred_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    #--------------------------------------------------------------------------------
    return pred_loss, predictions

with tf.summary.create_file_writer(LOG_DIR).as_default() as writer:
    for epoch in range(5):
        for step, (input_batch, target_batch) in enumerate(dataset):
            total_loss, predictions = train_step(input_batch, target_batch)
            if step == 0:
                tf.summary.trace_export(name="all", step=step, profiler_outdir=LOG_DIR)
            tf.summary.scalar('loss', total_loss, step=step)
        writer.flush()
writer.close()
There's a similar unanswered question where the OP was unable to view any graph.

I'm sure there's a better way, but I just realized that a simple workaround is to use the existing tensorboard callback logic:
tb_callback = tf.keras.callbacks.TensorBoard(LOG_DIR)
tb_callback.set_model(model) # Writes the graph to tensorboard summaries using an internal file writer
If you want, you could write your own summaries into the same directory it uses: tf.summary.create_file_writer(LOG_DIR + '/train').
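For example, a minimal sketch combining the two (assuming the model, dataset, and train_step from the question; the 'train' subdirectory matches where the callback writes by default):

tb_callback = tf.keras.callbacks.TensorBoard(LOG_DIR)
tb_callback.set_model(model)  # writes the model graph via the callback's internal writer

# Write custom scalars into the same 'train' subdirectory the callback uses
writer = tf.summary.create_file_writer(LOG_DIR + '/train')
with writer.as_default():
    for step, (input_batch, target_batch) in enumerate(dataset):
        total_loss, predictions = train_step(input_batch, target_batch)
        tf.summary.scalar('loss', total_loss, step=step)
    writer.flush()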

Related

Saving a GAN in keras using tf.train.Checkpoint

UPDATE: To solve this, I kept the checkpoint structure the same but wrote a custom train_step function, with the help of the repo linked in the accepted answer of the question linked below, which calculated the gradients and used apply_gradients rather than compiling the model and using train_on_batch. This lets the full GAN state be restored. Sadly, with this method I'm fairly sure the dropout layers no longer work: the discriminator performs perfectly very early in training, which prevents the model from training properly. Nevertheless, the original problem is solved.
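For reference, a minimal sketch of what such a train step could look like (my reconstruction, not the exact code; generator, discriminator, and the two Adam optimizers are the ones defined below, and the loss details are illustrative):

# Hedged sketch of a custom GAN train step using GradientTape and
# apply_gradients instead of compile()/train_on_batch
bce = tf.keras.losses.BinaryCrossentropy()

@tf.function
def gan_train_step(real_batch, noise):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_batch = generator(noise, training=True)
        real_pred = discriminator(real_batch, training=True)
        fake_pred = discriminator(fake_batch, training=True)
        # Discriminator: real -> 1, fake -> 0; generator: fool the discriminator
        d_loss = bce(tf.ones_like(real_pred), real_pred) + \
                 bce(tf.zeros_like(fake_pred), fake_pred)
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_optimizer.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    g_optimizer.apply_gradients(zip(g_grads, generator.trainable_variables))
    return g_loss, d_loss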
Original:
I am currently training a GAN in keras and trying to make it so that I can save the model and resume training later. Ordinarily in keras you'd simply use model.save(), however for a GAN if the discriminator and GAN (combined generator and discriminator, with discriminator weights not trainable) models are saved and loaded separately then the link between them is broken and the GAN will not function as expected. Someone asked a similar question here, How to save and resume training a GAN with multiple model parts with Tensorflow 2/ Keras, and was told to use tf.train.Checkpoint instead to save the full model at once as a checkpoint.
I've tried implementing this as follows:
def train(epochs, batch_size):
    checkpoint = tf.train.Checkpoint(g_optimizer=g_optimizer,
                                     d_optimizer=d_optimizer,
                                     generator=generator,
                                     discriminator=discriminator,
                                     gan=gan)
    ckpt_manager = tf.train.CheckpointManager(checkpoint, 'checkpoints', max_to_keep=3)
    if ckpt_manager.latest_checkpoint:
        checkpoint.restore(ckpt_manager.latest_checkpoint)
        discriminator.compile(loss='binary_crossentropy', optimizer=d_optimizer)
        i = Input(shape=(None, latent_dims))
        lcs = generator(i)
        discriminator.trainable = False
        valid = discriminator(lcs)
        gan = Model(i, valid)
        gan.compile(loss='binary_crossentropy', optimizer=g_optimizer)
    for epoch in range(epochs):
        # train discriminator...
        # train generator...
        ckpt_manager.save()
where g_optimizer, d_optimizer are just tf.keras.optimizers.Adam objects and generator, discriminator and gan are tf.keras.Model objects.
When I use this approach, the link between the gan model and the discriminator is preserved after loading in the checkpoint. The training works normally at first, but after I stop and then resume training using the checkpoint the discriminator loss starts massively increasing and the generated data becomes nonsensical.
Recompiling the models after loading the checkpoint like this was the only way I could think of that uses the last state of the optimizers, but clearly something isn't right: rather than resuming the training from where it was, this approach is massively disrupting the training.
Have I used tf.train.Checkpoint incorrectly for what I'm trying to do? Please let me know if there's any more information you need to be able to address the question.
Edit: I have added the full code by request.
Here is the code that creates the models in the first place and then trains them. In this setup the models are compiled initially when first created, and then compiled again if resuming from a checkpoint using the latest optimizer state. I appreciate it's weird to compile twice, but I couldn't think of another way to use the latest optimizer state from the checkpoint; if there's a better way I'm very happy to change it. Note, the unusual GRU-based GAN is because I'm testing out being able to generate variable-length time series. There's a lot of data-specific stuff in there, but hopefully on the whole it makes sense. train_df is just a pandas DataFrame containing all the training data.
def build_generator():
    input = Input(shape=(None, latent_dims))
    gru1 = GRU(100, activation='relu', return_sequences=True)(input)
    gru2 = GRU(100, activation='relu', return_sequences=True)(gru1)
    output = GRU(9, return_sequences=True, activation='sigmoid')(gru2)
    model = Model(input, output)
    return model

def build_discriminator():
    input = Input(shape=(None, 9))
    gru1 = GRU(100, return_sequences=True)(input)
    gru2 = GRU(100, return_sequences=True)(gru1)
    output = GRU(1, activation='sigmoid')(gru2)
    model = Model(input, output)
    return model

d_optimizer = opt.Adam(learning_rate=lr)
g_optimizer = opt.Adam(learning_rate=lr)

# Build discriminator
discriminator = build_discriminator()
discriminator.compile(loss='binary_crossentropy', optimizer=d_optimizer)

# Build generator
generator = build_generator()

# Build combined model
i = Input(shape=(None, latent_dims))
lcs = generator(i)
discriminator.trainable = False
valid = discriminator(lcs)
gan = Model(i, valid)
gan.compile(loss='binary_crossentropy', optimizer=g_optimizer)

def train(epochs, batch_size=1):  # Only works with batch size of 1 currently
    sne = train_df.sn.unique()
    n_batches = int(len(sne) / batch_size)
    rng = np.random.default_rng(123)
    checkpoint = tf.train.Checkpoint(g_optimizer=g_optimizer,
                                     d_optimizer=d_optimizer,
                                     generator=generator,
                                     discriminator=discriminator,
                                     gan=gan)
    ckpt_manager = tf.train.CheckpointManager(checkpoint, 'checkpoints', max_to_keep=3)
    if ckpt_manager.latest_checkpoint:
        checkpoint.restore(ckpt_manager.latest_checkpoint)
        discriminator.compile(loss='binary_crossentropy', optimizer=d_optimizer)
        i = Input(shape=(None, latent_dims))
        lcs = generator(i)
        discriminator.trainable = False
        valid = discriminator(lcs)
        gan = Model(i, valid)
        gan.compile(loss='binary_crossentropy', optimizer=g_optimizer)
    for epoch in range(epochs):
        rng.shuffle(sne)
        g_losses, d_losses = [], []
        for batch in range(n_batches):
            real = np.random.uniform(0.0, 0.1, (batch_size, 1))  # Used instead of np.zeros to avoid zero gradients
            fake = np.random.uniform(0.9, 1.0, (batch_size, 1))  # Used instead of np.ones to avoid zero gradients
            # Select real data
            sn = sne[batch]
            sndf = train_df[train_df.sn == sn]
            X = sndf[['g_t', 'r_t', 'i_t', 'z_t', 'g', 'r', 'i', 'z', 'g_err', 'r_err', 'i_err', 'z_err']].values
            X = X.reshape((1, *X.shape))
            noise = rand.normal(size=(batch_size, latent_dims))
            noise = np.reshape(noise, (batch_size, 1, latent_dims))
            noise = np.repeat(noise, X.shape[1], 1)
            gen_lcs = generator.predict(noise)
            # Train discriminator
            d_loss_real = discriminator.train_on_batch(X, real)
            d_loss_fake = discriminator.train_on_batch(gen_lcs, fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
            # Train generator
            noise = rand.normal(size=(2 * batch_size, latent_dims))
            noise = np.reshape(noise, (2 * batch_size, 1, latent_dims))
            noise = np.repeat(noise, X.shape[1], 1)
            gen_labels = np.zeros((2 * batch_size, 1))
            g_loss = gan.train_on_batch(noise, gen_labels)
            g_losses.append(g_loss)
            d_losses.append(d_loss)
        ckpt_manager.save()
        full_g_loss = np.mean(g_losses)
        full_d_loss = np.mean(d_losses)
        print(f'{epoch + 1}/{epochs} g_loss={full_g_loss}, d_loss={full_d_loss}')

train()
If you have the following checkpoint structure, your model should work properly:
checkpoint_dir = 'checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(generator_opt=generator_opt,
                                 discriminator_opt=discriminator_opt,
                                 gan_opt=gan_opt,
                                 generator=generator,
                                 discriminator=discriminator,
                                 GAN=GAN)
ckpt_manager = tf.train.CheckpointManager(checkpoint, checkpoint_dir, max_to_keep=3)
if ckpt_manager.latest_checkpoint:
    checkpoint.restore(ckpt_manager.latest_checkpoint)
    print('Latest checkpoint restored!!')
Note that the GAN model has its own optimizer. And then in your training loop, just save checkpoints at certain intervals, for example every 10 epochs.
for epoch in range(epochs):
    ...
    ...
    ...
    if epoch % 10 == 0:
        ckpt_manager.save()

Gradient Accumulation with Custom model.fit in TF.Keras?

Please add a minimal comment on your thoughts so that I can improve my query. Thank you. :-)
I'm trying to train a tf.keras model with Gradient Accumulation (GA). But I don't want to use it in a custom training loop (like this); instead I want to customize the .fit() method by overriding train_step. Is it possible? How do I accomplish this? The reason is that if we want the benefit of keras built-in functionality like fit and callbacks, we don't want to use a custom training loop, but at the same time, if we want to override train_step for some reason (like GA or anything else), we can customize the fit method and still leverage those built-in functions.
Also, I know the pros of using GA, but what are the major cons of using it? Why doesn't it come as a default, rather than an optional, feature of the framework?
# overriding train step
# my attempt
# it's not appropriately implemented
# and needs to be fixed
class CustomTrainStep(keras.Model):
    def __init__(self, n_gradients, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_gradients = n_gradients
        self.gradient_accumulation = [
            tf.zeros_like(this_var) for this_var in self.trainable_variables
        ]

    def train_step(self, data):
        x, y = data
        batch_size = tf.cast(tf.shape(x)[0], tf.float32)
        # Gradient Tape
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(
                y, y_pred, regularization_losses=self.losses
            )
        # Calculate batch gradients
        gradients = tape.gradient(loss, self.trainable_variables)
        # Accumulate batch gradients
        accum_gradient = [
            (accum_grad + grad) for accum_grad, grad in
            zip(self.gradient_accumulation, gradients)
        ]
        accum_gradient = [
            this_grad / batch_size for this_grad in accum_gradient
        ]
        # apply accumulated gradients
        self.optimizer.apply_gradients(
            zip(accum_gradient, self.trainable_variables)
        )
        # TODO: reset self.gradient_accumulation
        # update metrics
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
Please run and check with the following toy setup.
# Model
size = 32
input = keras.Input(shape=(size, size, 3))
efnet = keras.applications.DenseNet121(
    weights=None,
    include_top=False,
    input_tensor=input
)
base_maps = keras.layers.GlobalAveragePooling2D()(efnet.output)
base_maps = keras.layers.Dense(
    units=10, activation='softmax',
    name='primary'
)(base_maps)
custom_model = CustomTrainStep(
    n_gradients=10, inputs=[input], outputs=[base_maps]
)

# bind all
custom_model.compile(
    loss=keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy'],
    optimizer=keras.optimizers.Adam()
)

# data
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = tf.expand_dims(x_train, -1)
x_train = tf.repeat(x_train, 3, axis=-1)
x_train = tf.divide(x_train, 255)
x_train = tf.image.resize(x_train, [size, size])  # if we want to resize
y_train = tf.one_hot(y_train, depth=10)

# customized fit
custom_model.fit(x_train, y_train, batch_size=64, epochs=3, verbose=1)
Update
I've found that some others also tried to achieve this and ended up with the same issue. One person has got some workaround, here, but it's too messy and I think there should be some better approach.
Update 2
The accepted answer (by Mr.For Example) is fine and works well with a single strategy. Now I'd like to start a 2nd bounty to extend it to support multi-GPU, TPU, and mixed precision. There are some complications; see details.
Yes, it is possible to customize the .fit() method by overriding train_step without a custom training loop. The following simple example shows how to train a simple MNIST classifier with gradient accumulation:
import tensorflow as tf

class CustomTrainStep(tf.keras.Model):
    def __init__(self, n_gradients, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_gradients = tf.constant(n_gradients, dtype=tf.int32)
        self.n_acum_step = tf.Variable(0, dtype=tf.int32, trainable=False)
        self.gradient_accumulation = [
            tf.Variable(tf.zeros_like(v, dtype=tf.float32), trainable=False)
            for v in self.trainable_variables
        ]

    def train_step(self, data):
        self.n_acum_step.assign_add(1)
        x, y = data
        # Gradient Tape
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Calculate batch gradients
        gradients = tape.gradient(loss, self.trainable_variables)
        # Accumulate batch gradients
        for i in range(len(self.gradient_accumulation)):
            self.gradient_accumulation[i].assign_add(gradients[i])
        # If n_acum_step reaches n_gradients then we apply accumulated
        # gradients to update the variables, otherwise do nothing
        tf.cond(tf.equal(self.n_acum_step, self.n_gradients),
                self.apply_accu_gradients, lambda: None)
        # update metrics
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

    def apply_accu_gradients(self):
        # apply accumulated gradients
        self.optimizer.apply_gradients(
            zip(self.gradient_accumulation, self.trainable_variables))
        # reset
        self.n_acum_step.assign(0)
        for i in range(len(self.gradient_accumulation)):
            self.gradient_accumulation[i].assign(
                tf.zeros_like(self.trainable_variables[i], dtype=tf.float32))

# Model
input = tf.keras.Input(shape=(28, 28))
base_maps = tf.keras.layers.Flatten(input_shape=(28, 28))(input)
base_maps = tf.keras.layers.Dense(128, activation='relu')(base_maps)
base_maps = tf.keras.layers.Dense(units=10, activation='softmax', name='primary')(base_maps)
custom_model = CustomTrainStep(n_gradients=10, inputs=[input], outputs=[base_maps])

# bind all
custom_model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy'],
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))

# data
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = tf.divide(x_train, 255)
y_train = tf.one_hot(y_train, depth=10)

# customized fit
custom_model.fit(x_train, y_train, batch_size=6, epochs=3, verbose=1)
Outputs:
Epoch 1/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.5053 - accuracy: 0.8584
Epoch 2/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.1389 - accuracy: 0.9600
Epoch 3/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.0898 - accuracy: 0.9748
Pros:
Gradient accumulation is a mechanism to split the batch of samples —
used for training a neural network — into several mini-batches of
samples that will be run sequentially
Because GA calculates the loss and gradients after each mini-batch, but instead of updating the model parameters it waits and accumulates the gradients over consecutive batches, it can overcome memory constraints, i.e. use less memory to train the model than a large batch size would otherwise require.
Example: If you run a gradient accumulation with steps of 5 and batch
size of 4 images, it serves almost the same purpose of running with a
batch size of 20 images.
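A quick back-of-the-envelope check of that claim (illustrative numbers only):
n_gradients = 5      # accumulation steps
step_batch_size = 4  # images per forward pass
effective_batch_size = n_gradients * step_batch_size  # 20 images per weight update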
We can also parallelize the training when using GA, i.e. aggregate gradients from multiple machines.
Things to consider:
This technique works so well that it is widely used. There are a few things to consider before using it, but I don't think they should be called cons; after all, all GA does is turn 4 + 4 into 2 + 2 + 2 + 2.
If your machine has sufficient memory for a batch size that is already large enough, then there is no need to use it: it is well known that too large a batch size leads to poor generalization, and it will certainly run slower if you use GA to reach a batch size that your machine's memory can already handle.
Reference:
What is Gradient Accumulation in Deep Learning?
Thanks to @Mr.For Example for his convenient answer.
Usually, I have also observed that gradient accumulation won't speed up training, since we are doing n_gradients forward passes and computing all the gradients. But it does speed up the convergence of the model. And I found that using the mixed_precision technique can be really helpful here. Details here.
policy = tf.keras.mixed_precision.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)
Here is a complete gist.
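For reference, a hedged sketch of how the policy could slot into the toy setup from the accepted answer (my assumptions: CustomTrainStep is the class defined above, on TF >= 2.4 the non-experimental tf.keras.mixed_precision.set_global_policy replaces the experimental call, and the float32 output layer is a standard mixed-precision precaution, not something from the gist):

import tensorflow as tf

# Enable mixed precision before building the model so layers pick up
# float16 compute with float32 variables
policy = tf.keras.mixed_precision.Policy('mixed_float16')
tf.keras.mixed_precision.set_global_policy(policy)

input = tf.keras.Input(shape=(28, 28))
x = tf.keras.layers.Flatten()(input)
x = tf.keras.layers.Dense(128, activation='relu')(x)
# Keep the final softmax in float32 for numeric stability
output = tf.keras.layers.Dense(10, activation='softmax', dtype='float32', name='primary')(x)
custom_model = CustomTrainStep(n_gradients=10, inputs=[input], outputs=[output])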

How to get the class probabilities during the evaluation of CIFAR-10 in TensorFlow?

I am trying to modify the code from the Convolutional Neural Network TensorFlow Tutorial to get the individual probabilities for each class for each test image.
What alternative to tf.nn.in_top_k can I use? That method returns only one boolean tensor, but I want to preserve the individual values.
I use Tensorflow 1.4 and Python 3.5. I think lines 62-82 and 121-129 / 142 are probably the lines to be modified. Does somebody have a hint for me?
Lines 62-82:
def eval_once(saver, summary_writer, top_k_op, summary_op):
    """Run Eval once.

    Args:
        saver: Saver.
        summary_writer: Summary writer.
        top_k_op: Top K op.
        summary_op: Summary op.
    """
    with tf.Session() as sess:
        ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            # Restores from checkpoint
            saver.restore(sess, ckpt.model_checkpoint_path)
            # Assuming model_checkpoint_path looks something like:
            #   /my-favorite-path/cifar10_train/model.ckpt-0,
            # extract global_step from it.
            global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
        else:
            print('No checkpoint file found')
            return
Lines 121-129 + 142
[....]
images, labels = cifar10.inputs(eval_data=eval_data)
# Build a Graph that computes the logits predictions from the
# inference model.
logits = cifar10.inference(images)
# Calculate predictions.
top_k_op = tf.nn.in_top_k(logits, labels, 1)
[....]
You can compute the class probabilities from the raw logits:
# The vector of probabilities per each example in a batch
prediction = tf.nn.softmax(logits)
As a bonus, here's how to get the exact accuracy:
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
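To actually collect those values during evaluation, the softmax op can be fetched in the session just like top_k_op. A rough sketch (assuming prediction is passed into eval_once the same way top_k_op is):

# Where the graph is built, next to top_k_op:
prediction = tf.nn.softmax(logits)

# Inside eval_once, after the checkpoint is restored:
probs = sess.run(prediction)      # shape (batch_size, 10), one probability per class
top_class = probs.argmax(axis=1)  # most likely class per image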

How to load a trained tensorflow model

I am having all kinds of trouble loading a tensorflow model to test on some new data. When I trained the model, I used this:
save_model_file = 'my_saved_model'
saver = tf.train.Saver()
save_path = saver.save(sess, save_model_file)
This seems to result in the following files being created:
my_saved_model.meta
checkpoint
my_saved_model.index
my_saved_model.data-00000-of-00001
I have no idea which of these files I am supposed to pay attention to.
Now the model is trained, and I can't seem to load it or use it without throwing an exception. Here is what I am doing:
def neural_net_data_input(data_shape):
    theshape = (None,) + tuple(data_shape)
    return tf.placeholder(tf.float32, shape=theshape, name='x')

def neural_net_label_input(n_out):
    return tf.placeholder(tf.float32, shape=(None, n_out), name='one_hot_labels')

def neural_net_keep_prob_input():
    return tf.placeholder(tf.float32, name='keep_prob')

def do_generate_network(x):
    #
    # here is where I generate the network layer by layer.
    # this code works fine so I am not showing it here
    #
    pass

#
# Now I want to restore the model
#
tf.reset_default_graph()
input_data_shape = (32, 32, 1)
final_num_outputs = 43
graph1 = tf.Graph()
with graph1.as_default():
    x = neural_net_data_input(input_data_shape)
    one_hot_labels = neural_net_label_input(final_num_outputs)
    keep_prob = neural_net_keep_prob_input()
    logits = do_generate_network(x)
    # Name logits Tensor, so that it can be loaded from disk after training
    logits = tf.identity(logits, name='logits')
    #
    # accuracy: we use this for validation testing
    #
    correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')

################################
# Evaluate
################################
new_data = myutils.load_pickle_file(SOME_DATA_FILE_NAME)
new_features = new_data['features']
new_one_hot_labels = new_data['labels']
print('Evaluating on new data...')
with tf.Session(graph=graph1) as sess:
    # Initializing the variables
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, save_model_file)
    new_acc = sess.run(accuracy, feed_dict={x: new_features, one_hot_labels: new_one_hot_labels, keep_prob: 1.})
    print('Testing Accuracy For New Images: {}'.format(new_acc))
But when I do this, I get this:
TypeError: Cannot interpret feed_dict key as Tensor: The name 'save/Const:0' refers to a Tensor which does not exist. The operation, 'save/Const', does not exist in the graph.
So, I tried moving my graph inside the session like this:
################################
# Evaluate
################################
print('Evaluating on web data...')
with tf.Session() as sess:
    x = neural_net_data_input(input_data_shape)
    one_hot_labels = neural_net_label_input(final_num_outputs)
    keep_prob = neural_net_keep_prob_input()
    logits = do_generate_network(x)
    # Name logits Tensor, so that it can be loaded from disk after training
    logits = tf.identity(logits, name='logits')
    #
    # accuracy: we use this for validation testing
    #
    correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
    sess.run(tf.global_variables_initializer())
    my_save_dir = "/home/carnd/CarND-Traffic-Sign-Classifier-Project"
    load_model_meta_file = os.path.join(my_save_dir, "my_saved_model.meta")
    load_model_path = os.path.join(my_save_dir, "my_saved_model")
    new_saver = tf.train.import_meta_graph(load_model_meta_file)
    new_saver.restore(sess, load_model_path)
    web_acc = sess.run(accuracy, feed_dict={x: web_features, one_hot_labels: web_one_hot_labels, keep_prob: 1.})
    print('Testing Accuracy For Web Images: {}'.format(web_acc))
Now it runs without throwing an error, but the accuracy result it prints is 0.02! I am feeding in the very same data that during training I was getting 95% accuracy on. So it appears I am somehow loading my model incorrectly.
What am I doing wrong?
Steps for loading the trained model:
Load the graph:
You can load the graph using tf.train.import_meta_graph(). An example code would be:
model_path = "my_saved_model"
inference_graph = tf.Graph()
with tf.Session(graph=inference_graph) as sess:
    # Load the graph with the trained states
    loader = tf.train.import_meta_graph(model_path + '.meta')
    loader.restore(sess, model_path)
Get the tensors: Get the tensors needed for inference by using get_tensor_by_name(). So in your model, make sure you name the tensors, so that you can retrieve them during inference.
# Get the tensors by their variable name
_accuracy = inference_graph.get_tensor_by_name('accuracy:0')
_x = inference_graph.get_tensor_by_name('x:0')
_y = inference_graph.get_tensor_by_name('y:0')
Test: This can be done using the loaded tensors: sess.run(_accuracy, feed_dict={_x: ..., _y: ...})
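Putting the three steps together for the question's graph, a hedged sketch (tensor names follow the question's code, so the label placeholder is one_hot_labels rather than y; new_features and new_one_hot_labels are the arrays loaded from the pickle file):

model_path = "my_saved_model"
inference_graph = tf.Graph()
with tf.Session(graph=inference_graph) as sess:
    # Restore graph structure and trained weights; do NOT run
    # global_variables_initializer afterwards, or the restored weights
    # get overwritten with fresh random values
    loader = tf.train.import_meta_graph(model_path + '.meta')
    loader.restore(sess, model_path)
    # Fetch tensors by the names given when the graph was built
    _accuracy = inference_graph.get_tensor_by_name('accuracy:0')
    _x = inference_graph.get_tensor_by_name('x:0')
    _labels = inference_graph.get_tensor_by_name('one_hot_labels:0')
    _keep_prob = inference_graph.get_tensor_by_name('keep_prob:0')
    new_acc = sess.run(_accuracy, feed_dict={_x: new_features,
                                             _labels: new_one_hot_labels,
                                             _keep_prob: 1.0})
    print('Testing Accuracy For New Images: {}'.format(new_acc))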

Tensorflow slim train and validate inception model

I'm trying to fine-tune inception models and validate them with test data. But all the examples given on the tensorflow slim web page show only either fine-tuning or testing; there isn't any example that does both in the same graph and session.
Basically I want to do this:
with tf.Graph().as_default():
    image, image_raw, label, image_name, label_name = dut.distorted_inputs(params, is_training=is_training)
    test_image, test_image_raw, test_label, test_image_name, test_label_name = dut.distorted_inputs(params, is_training=False)
    # I'm creating it as suggested at the github slim page:
    logits, _ = inception.inception_v2(image, num_classes=N, is_training=True)
    tf.get_variable_scope().reuse_variables()
    test_logits, _ = inception.inception_v2(test_image, num_classes=N, is_training=False)
    err = tf.sub(logits, label)
    losses = tf.reduce_mean(tf.reduce_sum(tf.square(err)))
    # total_loss = model_loss + losses
    total_loss = losses + slim.losses.get_total_loss()
    test_err = tf.sub(test_logits, test_label)
    test_loss = tf.reduce_mean(tf.reduce_sum(tf.square(test_err)))
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
    train_op = slim.learning.create_train_op(total_loss, optimizer)
    final_loss = slim.learning.train(
        train_op,
        logdir=params["cp_file"],
        init_fn=ut.get_init_fn(slim, params),
        number_of_steps=2,
        summary_writer=summary_writer
    )
This code fails. As can be seen, I don't have a separate loop to call my test model; I want to test my model on my test data at every 10th batch.
Does calling train with number_of_steps=10 and then using the evaluation code work?
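If restarting slim.learning.train every 10 steps proves awkward, an alternative sketch (assuming the tf.contrib.slim API, which accepts a custom train_step_fn) runs the test loss inside the training loop itself; train_op, test_loss, params, and ut.get_init_fn come from the question:

def train_step_fn(session, train_op_, global_step, train_step_kwargs):
    # Run the normal slim training step first
    loss, should_stop = slim.learning.train_step(
        session, train_op_, global_step, train_step_kwargs)
    # Every 10th batch, evaluate the test loss in the same session
    step = session.run(global_step)
    if step % 10 == 0:
        print('step %d: test loss %f' % (step, session.run(test_loss)))
    return loss, should_stop

final_loss = slim.learning.train(
    train_op,
    logdir=params["cp_file"],
    init_fn=ut.get_init_fn(slim, params),
    number_of_steps=2000,  # illustrative total step count
    train_step_fn=train_step_fn)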
