Loading model weights into a new tensorflow graph - python

Goal
Using tensorflow, I'm trying to train a LSTM model for a certain number of iterations on data that's N timesteps per sample, then slowly increase the number of timesteps per sample as the model trains.
So maybe the RNN model is looking at 4 timesteps per training sample at first. After training for a while, performance levels out. I'd like to now continue training the model with 8 timesteps. This is basically a form of finetuning for RNNs.
Progress
The seemingly most straightforward way to do this would be to save the model after training it for a while, then rebuild a new graph with a new Variable X with more timesteps defined.
Unfortunately, I can't find a way to not hardcode the number of timesteps into my model. But that's ok, because if I recreate the model and fill it with saved weights, the shapes of the model shapes should be the same so it should work.
So I'm running the model a first time to generate a save file. Then I'm loading that save file and trying to populate a new graph with the weights from the old (almost identical) tensorflow graph.
This has been driving me crazy, so any help is much appreciated.
Code
Here's my code so far:
if MODEL_FILE is not None:
# load from saved model file
new_saver = tf.train.import_meta_graph(MODEL_FILE + '.meta')
weights = {
'out': tf.Variable(tf.random_uniform([LSTM_SIZE, n_outputs_sm]))
}
biases = {
'out': tf.Variable(tf.random_uniform([n_outputs_sm]))
}
# setup input X and output Y graph variables
x = tf.placeholder('float', [None, NUM_TIMESTEPS, n_input], name='input_x')
y = tf.placeholder('float', [None, n_outputs_sm], name='output_y')
# Feed forward function to get the RNN output. We're using a fancy type of LSTM cell.
def TFEncoderRNN(inp, weights, biases):
# current_input_shape: (batch_size, n_steps, n_input
# required shape: 'n_steps' tensors list of shape (batch_size, n_input)
inp = tf.unstack(inp, NUM_TIMESTEPS, 1)
lstm_cell = tf.contrib.rnn.LayerNormBasicLSTMCell(LSTM_SIZE, dropout_keep_prob=DROPOUT)
outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, inp, dtype=tf.float32)
return tf.matmul(outputs[-1], weights['out']) + biases['out']
# we'll be able to call this to get our model output
pred = TFEncoderRNN(x, weights, biases)
# define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
# I define some more stuff here I'll leave out for brevity
init = None
if new_saver:
new_saver.restore(sess, './' + MODEL_FILE)
init = tf.initialize_variables([global_step])
else:
init = tf.global_variables_initializer()
sess.run(init)
######
### TRAIN AND STUFF
######
print "Optimization finished!"
# save the current graph, you can just run this script again to
# continue training
if SAVE_MODEL:
print "Saving model"
saver = tf.train.Saver()
saver.save(sess, 'tf_model_001')
Any ideas on how to move my trained model weights into a newly created graph/model?

The seemingly most straightforward way to do this would be to save the model after training it for a while, then rebuild a new graph with a new Variable X with more timesteps defined.
Actually, this is what tf.nn.dynamic_rnn is for -- the same model works for any sequence length.

Related

How to generate predictions from new data using trained tensorflow network?

I want to train Googles VGGish network (Hershey et al 2017) from scratch to predict classes specific to my own audio files.
For this I am using the vggish_train_demo.py script available on their github repo which uses tensorflow. I've been able to modify the script to extract melspec features from my own audio by changing the _get_examples_batch() function, and, then train the model on the output of this function. This runs to completetion and prints the loss at each epoch.
However, I've been unable to figure out how to get this trained model to generate predictions from new data. Can this be done with changes to the vggish_train_demo.py script?
For anyone who stumbles across this in the future, I wrote this script which does the job. You must save logmel specs for train and test data in the arrays: X_train, y_train, X_test, y_test. The X_train/test are arrays of the (n, 96,64) features and the y_train/test are arrays of shape (n, _NUM_CLASSES) for two classes, where n = the number of 0.96s audio segments and _NUM_CLASSES = the number of classes used.
See the function definition statement for more info and the vggish github in my original post:
### Run the network and save the predictions and accuracy at each epoch
### Train NN, output results
r"""This uses the VGGish model definition within a larger model which adds two
layers on top, and then trains this larger model.
We input log-mel spectrograms (X_train) calculated above with associated labels
(y_train), and feed the batches into the model. Once the model is trained, it
is then executed on the test log-mel spectrograms (X_test), and the accuracy is
ouput, alongside a .csv file with the predictions for each 0.96s chunk and their
true class."""
def main(X):
with tf.Graph().as_default(), tf.Session() as sess:
# Define VGGish.
embeddings = vggish_slim.define_vggish_slim(training=FLAGS.train_vggish)
# Define a shallow classification model and associated training ops on top
# of VGGish.
with tf.variable_scope('mymodel'):
# Add a fully connected layer with 100 units. Add an activation function
# to the embeddings since they are pre-activation.
num_units = 100
fc = slim.fully_connected(tf.nn.relu(embeddings), num_units)
# Add a classifier layer at the end, consisting of parallel logistic
# classifiers, one per class. This allows for multi-class tasks.
logits = slim.fully_connected(
fc, _NUM_CLASSES, activation_fn=None, scope='logits')
tf.sigmoid(logits, name='prediction')
linear_out= slim.fully_connected(
fc, _NUM_CLASSES, activation_fn=None, scope='linear_out')
logits = tf.sigmoid(linear_out, name='logits')
# Add training ops.
with tf.variable_scope('train'):
global_step = tf.train.create_global_step()
# Labels are assumed to be fed as a batch multi-hot vectors, with
# a 1 in the position of each positive class label, and 0 elsewhere.
labels_input = tf.placeholder(
tf.float32, shape=(None, _NUM_CLASSES), name='labels')
# Cross-entropy label loss.
xent = tf.nn.sigmoid_cross_entropy_with_logits(
logits=logits, labels=labels_input, name='xent')
loss = tf.reduce_mean(xent, name='loss_op')
tf.summary.scalar('loss', loss)
# We use the same optimizer and hyperparameters as used to train VGGish.
optimizer = tf.train.AdamOptimizer(
learning_rate=vggish_params.LEARNING_RATE,
epsilon=vggish_params.ADAM_EPSILON)
train_op = optimizer.minimize(loss, global_step=global_step)
# Initialize all variables in the model, and then load the pre-trained
# VGGish checkpoint.
sess.run(tf.global_variables_initializer())
vggish_slim.load_vggish_slim_checkpoint(sess, FLAGS.checkpoint)
# The training loop.
features_input = sess.graph.get_tensor_by_name(
vggish_params.INPUT_TENSOR_NAME)
accuracy_scores = []
for epoch in range(num_epochs):#FLAGS.num_batches):
epoch_loss = 0
i=0
while i < len(X_train):
start = i
end = i+batch_size
batch_x = np.array(X_train[start:end])
batch_y = np.array(y_train[start:end])
_, c = sess.run([train_op, loss], feed_dict={features_input: batch_x, labels_input: batch_y})
epoch_loss += c
i+=batch_size
#print no. of epochs and loss
print('Epoch', epoch+1, 'completed out of', num_epochs,', loss:',epoch_loss) #FLAGS.num_batches,', loss:',epoch_loss)
#If these lines are left here, it will evaluate on the test data every iteration and print accuracy
#note this adds a small computational cost
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1)) #This line returns the max value of each array, which we want to be the same (think the prediction/logits is value given to each class with the highest value being the best match)
accuracy = tf.reduce_mean(tf.cast(correct, 'float')) #changes correct to type: float
accuracy1 = accuracy.eval({features_input:X_test, labels_input:y_test})
accuracy_scores.append(accuracy1)
print('Accuracy:', accuracy1)#TF is smart so just knows to feed it through the model without us seeming to tell it to.
#Save predictions for test data
predictions_sigm = logits.eval(feed_dict = {features_input:X_test}) #not really _sigm, change back later
#print(predictions_sigm) #shows table of predictions, meaningless if saving at each epoch
test_preds = pd.DataFrame(predictions_sigm, columns = col_names) #converts predictions to df
true_class = np.argmax(y_test, axis = 1) #This saves the true class
test_preds['True class'] = true_class #This adds true class to the df
#Saves csv file of table of predictions for test data. NB. header will not save when using np.text for some reason
np.savetxt("/content/drive/MyDrive/..."+"Epoch_"+str(epoch+1)+"_Accuracy_"+str(accuracy1), test_preds.values, delimiter=",")
if __name__ == '__main__':
tf.app.run()
#'An exception has occurred, use %tb to see the full traceback.' error will occur, fear not, this just means its finished (perhaps as its exited the tensorflow session?)

Tensorflow: How to get the correct output of a pretrained Keras model with inputs from another Tensorflow model

I have a Keras pre-trained model "model_keras" and I want to use it in a loss function. The input of model "model_keras" is an output of another Tensorflow model "model_tf" (a generative model). I'm trying to update the weights of "model_tf" by minimizing the loss. During the optimization, "model_kears" is only used for inference and will not get updated. My problem is that I'm not able to get the correct inference result from "model_keras", due to this issue, I'm not able to update the "model_tf" correctly. The code is shown below:
loss_func(input, target, model_keras): # the input is an output of another Tensorflow model.
inference_res = model_keras(input)
loss = tf.reduce_mean(inference_res-target)
return loss
train_phase = tf.placeholder(tf.bool)
z = tf.placeholder(tf.float32, [None, 128])
y = tf.placeholder(tf.int32, [None])
t = tf.placeholder(tf.float32, [None, 10])
model_tf = Generator("generator") # Building the Tensorflow model "model_tf"
fake_img = model_tf(z, train_phase, y, NUMS_CLASS) # fake_img is the output of "model_tf" and will be served as the input of "model_keras"
model_keras = MyKerasModel("Vgg19") # Loading the pretrained Keras model
G_loss = loss_func(fake_img, t, model_keras)
G_opt = tf.train.AdamOptimizer(4e-4, beta1=0., beta2=0.9).minimize(G_loss, var_list=model_tf.var_list())
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(G_opt, feed_dict={z: Z, train_phase: True, y: Y, t: target}) # Z, Y and target are numpy arrays.
I also tried to use model.predict(input) but got the ValueError: "When feeding symbolic tensors to a model, we expect the tensors to have a static batch size". Reason behind is that model.predict() expects the input to be real data tensor instead of a symbolic tensor. However, since I want to update the weights of "model_tf", I need to make the loss function differentiable and compute the gradients. Therefore, I can not just pass a numpy array to "model_keras".
How can I get the correct output(inference_res) of "model_keras" in this case? The Tensorflow and Keras version I'm using is 1.15 and 2.2.5, respectively.
If I understood your question, here is an idea. You can pass your input to model_keras and lets name the output keras_y. Then freeze the model_keras and add the model to the end of model_tf so you have a big model which is sequence of model_tf and then model_keras (which the second part has been freezed). Next give the inputs to your model and name the output as model_y. Now you can compute the loss as loss_func(keras_y, model_y)

Restore variables that are a subset of new model in Tensorflow?

I am doing an example for boosting(4 layers DNN to 5 layers DNN) via Tensorflow. I am making it with save session and restore in TF because there is a brief paragraph in TF tute:
'For example, you may have trained a neural net with 4 layers, and you now want to train a new model with 5 layers, restoring the parameters from the 4 layers of the previously trained model into the first 4 layers of the new model.', where tensorflow tute inspires in https://www.tensorflow.org/how_tos/variables/.
However, I found that nobody has asked how to use 'restore' when the checkpoint saves parameters of 4 layers but we need to put that into 5 layers, raising a red flag.
Making this in real code, I made
with tf.name_scope('fcl1'):
hidden_1 = fully_connected_layer(inputs, train_data.inputs.shape[1], num_hidden)
with tf.name_scope('fcl2'):
hidden_2 = fully_connected_layer(hidden_1, num_hidden, num_hidden)
with tf.name_scope('fclf'):
hidden_final = fully_connected_layer(hidden_2, num_hidden, num_hidden)
with tf.name_scope('outputl'):
outputs = fully_connected_layer(hidden_final, num_hidden, train_data.num_classes, tf.identity)
outputs = tf.nn.softmax(outputs)
with tf.name_scope('boosting'):
boosts = fully_connected_layer(outputs, train_data.num_classes, train_data.num_classes, tf.identity)
where variables inside(or called from) 'fcl1' - so that I have 'fcl1/Variable' and 'fcl1/Variable_1' for weight and bias - 'fcl2', 'fclf', and 'outputl' are stored by saver.save() in the script without 'boosting' layer. However, as we now have 'boosting' layer, saver.restore(sess, "saved_models/model_list.ckpt") does not work as
NotFoundError: Key boosting/Variable_1 not found in checkpoint
I really hope to hear about this problem. Thank you.
Below code is the main part of the code I am in trouble.
def fully_connected_layer(inputs, input_dim, output_dim, nonlinearity=tf.nn.relu):
weights = tf.Variable(
tf.truncated_normal(
[input_dim, output_dim], stddev=2. / (input_dim + output_dim)**0.5),
'weights')
biases = tf.Variable(tf.zeros([output_dim]), 'biases')
outputs = nonlinearity(tf.matmul(inputs, weights) + biases)
return outputs
inputs = tf.placeholder(tf.float32, [None, train_data.inputs.shape[1]], 'inputs')
targets = tf.placeholder(tf.float32, [None, train_data.num_classes], 'targets')
with tf.name_scope('fcl1'):
hidden_1 = fully_connected_layer(inputs, train_data.inputs.shape[1], num_hidden)
with tf.name_scope('fcl2'):
hidden_2 = fully_connected_layer(hidden_1, num_hidden, num_hidden)
with tf.name_scope('fclf'):
hidden_final = fully_connected_layer(hidden_2, num_hidden, num_hidden)
with tf.name_scope('outputl'):
outputs = fully_connected_layer(hidden_final, num_hidden, train_data.num_classes, tf.identity)
with tf.name_scope('error'):
error = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(outputs, targets))
with tf.name_scope('accuracy'):
accuracy = tf.reduce_mean(tf.cast(
tf.equal(tf.argmax(outputs, 1), tf.argmax(targets, 1)),
tf.float32))
with tf.name_scope('train'):
train_step = tf.train.AdamOptimizer().minimize(error)
init = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as sess:
sess.run(init)
saver.restore(sess, "saved_models/model.ckpt")
print("Model restored")
print("Optimization Starts!")
for e in range(training_epochs):
...
#Save model - save session
save_path = saver.save(sess, "saved_models/model.ckpt")
### I once saved the variables using var_list, but didn't work as well...
print("Model saved in file: %s" % save_path)
For clarity, the checkpoint file has
fcl1/Variable:0
fcl1/Variable_1:0
fcl2/Variable:0
fcl2/Variable_1:0
fclf/Variable:0
fclf/Variable_1:0
outputl/Variable:0
outputl/Variable_1:0
As the original 4 layers model does not have 'boosting' layer.
It doesn't look right to read values for boosting from the checkpoint in this case and I think that's not what you want to do. Obviously you're getting error, since while restoring the variables you are first catching the list of all of the variables in your model and then you look for corresponding variables in your checkpoint, which doesn't have them.
You can restore only part of your model by defining a subset of your model variables. For example you can do it using tf.slim library. Getting the list of variables in your models:
variables = slim.get_variables_to_restore()
Now variables is a list of tensors, but for each element you can access its name attribute. Using that you can specify that you only want to restore layers other than boosting, e.g.:
variables_to_restore = [v for v in variables if v.name.split('/')[0]!='boosting']
model_path = 'your/model/path'
saver = tf.train.Saver(variables_to_restore)
with tf.Session() as sess:
saver.restore(sess, model_path)
This way you will have your 4 layers restored. Theoretically you could try to catch values of one of your variables from checkpoint by creating another server that will only have boosting in variables list and renaming the chosen variable from the checkpoint, but I really don't think it's what you need here.
Since this is a custom layer for your model and you don't have this variable anywhere, just initialize it within your workflow instead of trying to import it. You can do for example by passing this argument while calling a function fully_connected:
weights_initializer = slim.variance_scaling_initializer()
You need to check details yourself though, since I'm not sure what your imports are and which function are you using here.
Generally I'd advice you to take a look at slim library, which will make it easier for you to define a model and scopes for layers (instead of defining it by with you can rather pass a scope argument while calling a function). It would look something like that with slim:
boost = slim.fully_connected(input, number_of_outputs, activation_fn=None, scope='boosting', weights_initializer=slim.variance_scaling_initializer())

Tensorflow: Feeding placeholder from variable

I'm feeding to tensorflow computation(train) graph using input queue and
tf.train.batch function that prepares huge tensor with data.
I have another queue with test data I would like to feed to graph every 50th step.
Question
Given the form of the input (tensors) do I have to define separate test graph for test data computation or I can somehow reuse train grap?
# Prepare data
batch = tf.train.batch([train_image, train_label], batch_size=200)
batchT = tf.train.batch([test_image, test_label], batch_size=200)
x = tf.reshape(batch[0], [-1, IMG_SIZE, IMG_SIZE, 3])
y_ = batch[1]
xT = tf.reshape(batchT[0], [-1, IMG_SIZE, IMG_SIZE, 3])
y_T = batchT[1]
# Graph definition
train_step = ... # train_step = g(x)
# Session
sess = tf.Session()
sess.run(tf.initialize_all_variables())
for i in range(1000):
if i%50 == 0:
# here i would like reuse train graph but with tensor x replaced by x_t
# train_accuracy = ?
# print("step %d, training accuracy %g"%(i, train_accuracy))
train_step.run(session=sess)
I would use placeholders but I can't feed tf.placeholder with tf.Tensors and this is the thing I'm getting from queues.
How is it supposed to be done?
I'm really just starting.
Take a look at how this is done in the MNIST example: You need to use a placeholder with an initializer of the none-tensor form of your data (like filenames, or CSV) and then inside the graph, use the slice_input_producer -> deocde_jpeg (or whatever...) -> tf.train.batch() to create batches and feed those to the computation graph.
So your graph looks like something like:
Placeholder initialized with big filenames list/CSV/range
tf.slice_input_producer
tf.image.decode_jpeg or tf.py_func - loading of the actual data
tf.train.batch - create mini batches for training
feed to your model

Train a tensorflow model minimizing the loss of several batches

I would like to train the weights of a model based on the sum of the loss value of several batches. However it seems that once you run the graph for each of the individual batches, the object that is returned is just a regular numpy array. So when you try and use an optimizer like GradientDescentOptimizer, it no longer has information about the variables that were used to calculate the sum of the losses, so it can't find the gradients of the weights that what help minimize the loss. Here's an example tensorflow script to illustrate what I'm talking about:
weights = tf.Variable(tf.ones([num_feature_values], tf.float32))
feature_values = tf.placeholder(tf.int32, shape=[num_feature_values])
labels = tf.placeholder(tf.int32, shape=[1])
loss_op = some_loss_function(weights, feature_values, labels)
with tf.Session() as sess:
for batch in batches:
feed_dict = fill_feature_values_and_labels(batch)
#Calculates loss for one batch
loss = sess.run(loss_op, feed_dict=feed_dict)
#Adds it to total loss
total_loss += loss
# Want to train weights to minimize total_loss, however this
# doesn't work because the graph has already been run.
optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(total_loss)
with tf.Session() as sess:
for step in xrange(num_steps):
sess.run(optimizer)
The total_loss is a numpy array and thus cannot be used in the optimizer. Does anyone know a way around the problem, where I want to use information across many batches but still need the graph intact in order to preserve the fact that the total_loss is a function of the weights?
The thing you optimize in any of the trainers must be a part of the graph, here what you train on is the actual realized result, so it won't work.
I think the way you should probably do this is to construct your input as a batch of batches e.g.
intput = tf.placeholder("float", (number_of_batches, batch_size, input_size)
Then have your target also be a 3d tensor which can be trained on.

Categories

Resources