I am training a Generative Adversarial Network (GAN) in TensorFlow, which consists of two different networks, each with its own optimizer.
self.G, self.layer = self.generator(self.inputCT,batch_size_tf)
self.D, self.D_logits = self.discriminator(self.GT_1hot)
...
self.g_optim = tf.train.MomentumOptimizer(self.learning_rate_tensor, 0.9).minimize(self.g_loss, global_step=self.global_step)
self.d_optim = tf.train.AdamOptimizer(self.learning_rate, beta1=0.5) \
.minimize(self.d_loss, var_list=self.d_vars)
The problem is that I train one of the networks (g) first, and then I want to train g and d together. However, when I call the load function:
self.sess.run(tf.initialize_all_variables())
self.sess.graph.finalize()
self.load(self.checkpoint_dir)
def load(self, checkpoint_dir):
    print(" [*] Reading checkpoints...")
    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
        ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
        self.saver.restore(self.sess, ckpt.model_checkpoint_path)
        return True
    else:
        return False
I have an error like this (with a lot more traceback):
Tensor name "beta2_power" not found in checkpoint files checkpoint/MR2CT.model-96000
I can restore the g network and keep training with that function, but when I want to start d from scratch and g from the stored model, I get that error.
To restore a subset of variables, you must create a new tf.train.Saver and pass it a specific list of variables to restore in the optional var_list argument.
By default, a tf.train.Saver will create ops that (i) save every variable in your graph when you call saver.save() and (ii) look up (by name) every variable in the given checkpoint when you call saver.restore(). While this works for most common scenarios, you have to provide more information to work with specific subsets of the variables:
If you only want to restore a subset of the variables, you can get a list of these variables by calling tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=G_NETWORK_PREFIX), assuming that you put the "g" network in a common with tf.name_scope(G_NETWORK_PREFIX): or tf.variable_scope(G_NETWORK_PREFIX): block (note that tf.name_scope does not prefix variables created with tf.get_variable, so use tf.variable_scope in that case). You can then pass this list to the tf.train.Saver constructor.
If you want to restore a subset of the variables and/or the variables in the checkpoint have different names, you can pass a dictionary as the var_list argument. By default, each variable in a checkpoint is associated with a key, which is the variable's op name (its tf.Variable.name without the ":0" suffix). If the name is different in the target graph (e.g. because you added a scope prefix), you can specify a dictionary that maps string keys (in the checkpoint file) to tf.Variable objects (in the target graph).
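The dictionary form can be sketched as follows. This is a minimal example assuming TensorFlow 1.13+ or 2.x (through tf.compat.v1 in graph mode); the scope names "g", "gan" and "d" and the values are illustrative only. A checkpoint is written with the key "g/w", then restored into a graph where the same weight is called "gan/g/w", while a new "d" variable keeps its fresh initial value.

```python
import os
import tempfile

import tensorflow as tf

ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# 1) Original graph: the generator weight is saved under the checkpoint key "g/w".
with tf.Graph().as_default():
    with tf.compat.v1.variable_scope("g"):
        tf.compat.v1.get_variable("w", initializer=tf.constant([0.0, 1.0, 2.0, 3.0]))
    saver = tf.compat.v1.train.Saver()
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        saver.save(sess, ckpt_prefix)

# 2) New graph: the same weight now lives under "gan/g/w", next to a new "gan/d" network.
with tf.Graph().as_default():
    with tf.compat.v1.variable_scope("gan"):
        with tf.compat.v1.variable_scope("g"):
            g_w = tf.compat.v1.get_variable("w", shape=[4], initializer=tf.compat.v1.zeros_initializer())
        with tf.compat.v1.variable_scope("d"):
            d_w = tf.compat.v1.get_variable("w", shape=[4], initializer=tf.compat.v1.ones_initializer())
    # Dict var_list: {key in the checkpoint: variable in this graph}. "gan/d/w" is not listed.
    restorer = tf.compat.v1.train.Saver(var_list={"g/w": g_w})
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())  # give every variable a value first
        restorer.restore(sess, ckpt_prefix)                      # then overwrite the mapped subset
        restored_g, fresh_d = sess.run([g_w, d_w])
```

Running the initializer before restore() matters: the restorer only touches the variables in its var_list, so everything else needs its normal initializer.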
I had a similar problem when restoring only part of my variables from a checkpoint and some of the saved variables did not exist in the new model.
Inspired by @Lidong's answer, I slightly modified the reading function:
import tensorflow as tf
from tensorflow.python import pywrap_tensorflow
def get_tensors_in_checkpoint_file(file_name, all_tensors=True, tensor_name=None):
    varlist = []
    var_value = []
    reader = pywrap_tensorflow.NewCheckpointReader(file_name)
    if all_tensors:
        var_to_shape_map = reader.get_variable_to_shape_map()
        for key in sorted(var_to_shape_map):
            varlist.append(key)
            var_value.append(reader.get_tensor(key))
    else:
        varlist.append(tensor_name)
        var_value.append(reader.get_tensor(tensor_name))
    return (varlist, var_value)
and added a loading function:
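def build_tensors_in_checkpoint_file(loaded_tensors):
    full_var_list = list()
    # Loop over all loaded tensors
    for i, tensor_name in enumerate(loaded_tensors[0]):
        # Look up the tensor with the same name in the current graph
        try:
            tensor_aux = tf.get_default_graph().get_tensor_by_name(tensor_name + ":0")
        except KeyError:
            # The checkpoint holds a variable the new model lacks: skip it
            # (otherwise a stale or undefined tensor_aux would be appended)
            print('Not found: ' + tensor_name)
            continue
        full_var_list.append(tensor_aux)
    return full_var_list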
Then you can simply load all common variables using:
CHECKPOINT_NAME = "path/to/checkpoint"  # placeholder: path to the saved checkpoint file
restored_vars = get_tensors_in_checkpoint_file(file_name=CHECKPOINT_NAME)
tensors_to_load = build_tensors_in_checkpoint_file(restored_vars)
loader = tf.train.Saver(tensors_to_load)
loader.restore(sess, CHECKPOINT_NAME)
Edit: I am using tensorflow 1.2
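A variant of this approach can also check shapes and skip anything the checkpoint lacks, so the Saver never asks for a missing key such as beta2_power. The sketch below assumes TensorFlow 1.13+ or 2.x (through tf.compat.v1), uses the public tf.train.list_variables instead of pywrap_tensorflow, and `restorable_variables` is a hypothetical helper name.

```python
import os
import tempfile

import tensorflow as tf

ckpt_prefix = os.path.join(tempfile.mkdtemp(), "old.ckpt")


def restorable_variables(ckpt_path, candidates):
    """Return the variables whose name AND shape both match an entry in the checkpoint."""
    ckpt_shapes = dict(tf.train.list_variables(ckpt_path))  # {name: shape as a list}
    matched = []
    for var in candidates:
        name = var.op.name  # checkpoint keys carry no ":0" suffix
        if name in ckpt_shapes and list(ckpt_shapes[name]) == var.shape.as_list():
            matched.append(var)
    return matched


# Old model: conv1/w has shape [2], fc/w has shape [3].
with tf.Graph().as_default():
    with tf.compat.v1.variable_scope("conv1"):
        tf.compat.v1.get_variable("w", initializer=tf.constant([1.0, 2.0]))
    with tf.compat.v1.variable_scope("fc"):
        tf.compat.v1.get_variable("w", initializer=tf.constant([3.0, 4.0, 5.0]))
    saver = tf.compat.v1.train.Saver()
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        saver.save(sess, ckpt_prefix)

# New model: conv1/w unchanged, fc/w resized, new/w added.
with tf.Graph().as_default():
    zeros = tf.compat.v1.zeros_initializer()
    with tf.compat.v1.variable_scope("conv1"):
        conv_w = tf.compat.v1.get_variable("w", shape=[2], initializer=zeros)
    with tf.compat.v1.variable_scope("fc"):
        fc_w = tf.compat.v1.get_variable("w", shape=[2], initializer=zeros)   # shape changed: skipped
    with tf.compat.v1.variable_scope("new"):
        new_w = tf.compat.v1.get_variable("w", shape=[2], initializer=zeros)  # not in checkpoint: skipped
    to_restore = restorable_variables(ckpt_prefix, tf.compat.v1.global_variables())
    restored_names = [v.op.name for v in to_restore]
    loader = tf.compat.v1.train.Saver(var_list=to_restore)
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        loader.restore(sess, ckpt_prefix)
        values = sess.run([conv_w, fc_w, new_w])
```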
Inspired by @mrry, I propose a solution for this problem.
To make it clear, I formulate the problem as restoring a subset of the variables from a checkpoint when the new model is built on top of a pre-trained model.
First, use the print_tensors_in_checkpoint_file function from the inspect_checkpoint library, or simply extract this function as follows:
from tensorflow.python import pywrap_tensorflow
def print_tensors_in_checkpoint_file(file_name, tensor_name, all_tensors):
    varlist = []
    reader = pywrap_tensorflow.NewCheckpointReader(file_name)
    if all_tensors:
        var_to_shape_map = reader.get_variable_to_shape_map()
        for key in sorted(var_to_shape_map):
            varlist.append(key)
    return varlist
varlist = print_tensors_in_checkpoint_file(file_name="path/of/the/ckpt/file",  # placeholder: the path of the ckpt file
                                           all_tensors=True, tensor_name=None)
Then use tf.get_collection(), just as @mrry suggested:
variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
Finally, we can initialize the saver by:
saver = tf.train.Saver(variables[:len(varlist)])
The complete version can be found at my github: https://github.com/pobingwanghai/tensorflow_trick/blob/master/restore_from_checkpoint.py
In my situation, the new variables are added at the end of the model, so I can simply use [:len(varlist)] to select the needed variables. In a more complex situation, you might have to align the variables by hand or write a simple string-matching function to determine the required variables.
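The "simple string matching function" mentioned above could look like the following sketch. It is plain Python so it can be tested without a graph; `match_by_name` and its `prefix` parameter are hypothetical. It pairs variables by name rather than by position, drops the ":0" output suffix that tf.Variable.name carries, and optionally strips a scope prefix that exists only in the new graph.

```python
def match_by_name(graph_var_names, checkpoint_keys, prefix=""):
    """Pair graph variables with checkpoint entries by name instead of by position.

    graph_var_names: names as reported by tf.Variable.name, e.g. "new_model/conv1/w:0"
    checkpoint_keys: keys as listed in the checkpoint, e.g. "conv1/w"
    prefix: optional scope prefix that exists in the new graph but not in the checkpoint
    Returns {checkpoint_key: graph_var_name}.
    """
    wanted = set(checkpoint_keys)
    mapping = {}
    for name in graph_var_names:
        key = name.rsplit(":", 1)[0]  # drop the ":0" output suffix
        if prefix and key.startswith(prefix):
            key = key[len(prefix):]   # drop the extra scope prefix
        if key in wanted:
            mapping[key] = name
    return mapping


graph_names = ["new_model/conv1/w:0", "new_model/conv1/b:0", "new_model/extra/w:0"]
ckpt_keys = ["conv1/b", "conv1/w", "global_step"]
mapping = match_by_name(graph_names, ckpt_keys, prefix="new_model/")
# mapping == {"conv1/w": "new_model/conv1/w:0", "conv1/b": "new_model/conv1/b:0"}
```

In a real graph you would feed it [v.name for v in tf.global_variables()] and the names from the checkpoint reader, then replace each graph name with its Variable object before passing the dict to tf.train.Saver(var_list=...).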
You can create a separate instance of tf.train.Saver() with the var_list argument set to the variables you want to restore, and a second instance to save the variables.
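Applied to the GAN question, the two-Saver pattern might look like this minimal sketch, assuming TensorFlow 1.13+ or 2.x (through tf.compat.v1). The toy `build` function, the scopes "g"/"d" and the numbers are hypothetical. The restorer only reads g/*, so the Adam variables of d (including beta2_power) are never looked up in the g-only checkpoint, while the default Saver writes all of them for the next run.

```python
import os
import tempfile

import tensorflow as tf

workdir = tempfile.mkdtemp()
pretrained = os.path.join(workdir, "g_only.ckpt")
joint = os.path.join(workdir, "g_and_d.ckpt")


def build(with_d):
    """Hypothetical toy model: one generator weight, optionally a discriminator trained with Adam."""
    with tf.compat.v1.variable_scope("g"):
        g_w = tf.compat.v1.get_variable("w", shape=[], initializer=tf.compat.v1.constant_initializer(2.0))
    if not with_d:
        return g_w, None
    with tf.compat.v1.variable_scope("d"):
        d_w = tf.compat.v1.get_variable("w", shape=[], initializer=tf.compat.v1.constant_initializer(0.5))
    d_loss = tf.square(d_w * g_w - 1.0)
    d_vars = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope="d/")
    d_optim = tf.compat.v1.train.AdamOptimizer(0.01).minimize(d_loss, var_list=d_vars)
    return g_w, d_optim


# Phase 1: a checkpoint that contains only the generator.
with tf.Graph().as_default():
    g_w, _ = build(with_d=False)
    saver = tf.compat.v1.train.Saver()
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        sess.run(g_w.assign(3.0))  # pretend g was trained
        saver.save(sess, pretrained)

# Phase 2: restore only g, train d from scratch, then save everything.
with tf.Graph().as_default():
    g_w, d_optim = build(with_d=True)
    g_vars = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.GLOBAL_VARIABLES, scope="g/")
    restorer = tf.compat.v1.train.Saver(var_list=g_vars)  # reads only g/* from the checkpoint
    saver = tf.compat.v1.train.Saver()                    # writes every variable, incl. Adam state
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        restorer.restore(sess, pretrained)
        sess.run(d_optim)
        restored_g = sess.run(g_w)
        saver.save(sess, joint)

saved_keys = [name for name, _ in tf.train.list_variables(joint)]
```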
I have the following class, and its load function returns the saved TensorFlow graph.
class StoredGraph():
    ...
    def build_meta_saver(self, meta_file=None):
        meta_file = self._get_latest_checkpoint() + '.meta' if not meta_file else meta_file
        meta_saver = tf.train.import_meta_graph(meta_file)
        return meta_saver
    def load(self, sess, saverObj):
        saverObj.restore(sess, self._get_latest_checkpoint())
        graph = tf.get_default_graph()
        return graph
I have another class; let's call it TrainNet().
class TrainNet():
    ...
    def train(self, dataset):
        self.train_graph = tf.Graph()
        meta_saver, saver = None, None
        GraphIO = StoredGraph(experiment_dir)
        latest_checkpoint = GraphIO._get_latest_checkpoint()
        with self.train_graph.as_default():
            tf.set_random_seed(42)
            if not latest_checkpoint:
                # build graph here
                self.build_graph()
            else:
                meta_saver = GraphIO.build_meta_saver()  # this loads the meta file
        with tf.Session(graph=self.train_graph) as sess:
            if not meta_saver:
                sess.run(tf.global_variables_initializer())
            if latest_checkpoint:
                self.train_graph = GraphIO.load(sess, meta_saver)  # load() returns only the graph
            # here access placeholders using self.train_graph.get_tensor_by_name()...
            # and feed the values
In my training class, I use the above class simply by loading the graph with the load function: self.train_graph = StoredGraphclass.load(sess, metasaver)
My question is: are all the variables restored by loading the saved graph? Normally everyone defines the restore operation in the same script, e.g. saver.restore(), which restores all the variables of the graph. But I am calling saver.restore() in a different class and using the returned graph to access placeholders.
I suspect that, this way, not all the variables are restored. Is the above approach wrong? This doubt arose when I checked the values of the weights in two different .meta files written at different training steps: the values were exactly the same, meaning either the variables weren't updated or the restoration method is faulty.
As long as you have created all the necessary variables in your file and given them the same "name" (and of course the shape needs to be correct as well), restore will load all the appropriate values into the appropriate variables. Here you can find a toy example showing you how this can be done.
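One point worth checking for the comparison described in the question: a .meta file holds the MetaGraphDef (graph structure, collections, saver configuration), not the trained values, which are written to the .index/.data files. Comparing two .meta files therefore shows no difference even when the weights have changed. The sketch below, assuming TensorFlow 1.13+ or 2.x (through tf.compat.v1), rebuilds a graph with import_meta_graph, restores it, and checks the restored value against what the checkpoint data files contain.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

prefix = os.path.join(tempfile.mkdtemp(), "model")

# Training run: update a variable, then save (writes model.meta, model.index, model.data-*).
with tf.Graph().as_default():
    w = tf.compat.v1.get_variable("w", shape=[3], initializer=tf.compat.v1.zeros_initializer())
    step = w.assign_add(tf.ones([3]))
    saver = tf.compat.v1.train.Saver()
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        sess.run(step)
        sess.run(step)
        saver.save(sess, prefix)

# Later run: rebuild the graph from the .meta file, then load the values from the data files.
with tf.Graph().as_default():
    meta_saver = tf.compat.v1.train.import_meta_graph(prefix + ".meta")
    w_restored = [v for v in tf.compat.v1.global_variables() if v.op.name == "w"][0]
    with tf.compat.v1.Session() as sess:
        meta_saver.restore(sess, prefix)
        restored = sess.run(w_restored)

# The values live in the .index/.data files, which a checkpoint reader can inspect directly.
stored = tf.train.load_checkpoint(prefix).get_tensor("w")
```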
I want to use pretrained weights for two parts of my model. I have two checkpoints from different models, but since I'm using the Estimator architecture, I can load only one of them into my main model with tf.estimator.WarmStartSettings:
tf.estimator.WarmStartSettings(ckpt_to_initialize_from=X)
from the doc:
Either the directory or a specific checkpoint can be provided (in the case of the former, the latest checkpoint will be used).
I can't see how to add an additional checkpoint. Maybe there is a way to merge the weights from both checkpoints into one and load that one?
You can use init_from_checkpoint.
First, define assignment map:
ckpt_dir = 'path_to_checkpoint_files'
vars_to_load = [i[0] for i in tf.train.list_variables(ckpt_dir)]
This creates a list of the names of all variables in the checkpoint.
assignment_map = {variable.op.name: variable for variable in tf.global_variables() if variable.op.name in vars_to_load}
This creates a dict whose keys are variable names in the checkpoint (here identical to the names in the current graph) and whose values are the matching variables of the current graph.
tf.train.init_from_checkpoint(ckpt_dir, assignment_map)
This function is placed inside estimator's model_fn. It will override standard variable initialization.
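For the two-checkpoint case, my understanding is that init_from_checkpoint can simply be called once per checkpoint in the same model_fn, each call with its own assignment map. The sketch below shows this outside an Estimator, assuming TensorFlow 1.13+ or 2.x (through tf.compat.v1); `save_pretrained` and the scopes "model_a", "model_b", "part1", "part2" are hypothetical stand-ins for the two pretrained models and the two parts of the main model.

```python
import os
import tempfile

import tensorflow as tf

workdir = tempfile.mkdtemp()


def save_pretrained(scope, value, path):
    """Hypothetical helper: write a toy checkpoint with one variable "<scope>/w"."""
    with tf.Graph().as_default():
        with tf.compat.v1.variable_scope(scope):
            tf.compat.v1.get_variable("w", initializer=tf.constant(value))
        saver = tf.compat.v1.train.Saver()
        with tf.compat.v1.Session() as sess:
            sess.run(tf.compat.v1.global_variables_initializer())
            return saver.save(sess, path)


ckpt_a = save_pretrained("model_a", [1.0, 1.0], os.path.join(workdir, "a.ckpt"))
ckpt_b = save_pretrained("model_b", [2.0, 2.0], os.path.join(workdir, "b.ckpt"))

# Main model; in an Estimator this code would sit inside model_fn.
with tf.Graph().as_default():
    with tf.compat.v1.variable_scope("part1"):
        part1_w = tf.compat.v1.get_variable("w", shape=[2])
    with tf.compat.v1.variable_scope("part2"):
        part2_w = tf.compat.v1.get_variable("w", shape=[2])
    # One call per checkpoint, each with its own {name in checkpoint: variable in graph} map.
    tf.compat.v1.train.init_from_checkpoint(ckpt_a, {"model_a/w": part1_w})
    tf.compat.v1.train.init_from_checkpoint(ckpt_b, {"model_b/w": part2_w})
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())  # runs the checkpoint-based initializers
        values = sess.run([part1_w, part2_w])
```

Because init_from_checkpoint only replaces the initializers of the mapped variables, the two calls do not interfere as long as their maps target different variables.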
I am trying to build a generic tensorflow infrastructure wrapped inside a simple one layer NN class (see code below).
I will be creating many NNets so I was wondering what was the best way to manage the sessions and the variables.
Typically, I'd like to get tf.trainable_variables() for only one network, not all of them (in the "show" function) so that I can print the network I want.
I also have to pass the session variable "sess" to every function, so that the variables are not re-initialized.
I think I am not doing everything properly... Can someone help?
class oneLayerNN:
    """
    Implements a 1 hidden-layer neural network: y = W2 * ([W1 * x + b1]+) + b2
    """
    def __init__(self, ...):
        ...
        self.initOp = tf.global_variables_initializer()
    def show(self, sess):
        tvars = tf.trainable_variables()
        tvals = sess.run(tvars)
        for var, val in zip(tvars, tvals):
            print(var.name, val)
        print()
    def initializeVariables(self, sess):
        sess.run(self.initOp)
    def forwardPropagation(self, sess, x):
        labels = sess.run(self.yHat, feed_dict={self.x: x})
        return labels
    def train(self, sess, dataset, epochs, batchSize, debug=False, verbose=False):
        dataset = dataset.batch(batchSize)
        iterator = dataset.make_initializable_iterator()
        next_element = iterator.get_next()
        for epoch in range(epochs):
            sess.run(iterator.initializer)
            while True:
                try:
                    batch_x, batch_y = sess.run(next_element)
                    _, c = sess.run([self.optimizer, self.loss], feed_dict={self.x: batch_x, self.y: batch_y})
                except tf.errors.OutOfRangeError:
                    break
with tf.Session() as sess:
    network.initializeVariables(sess)
    network.show(sess)
It is probably a matter of taste and of how you intend to use your objects.
If it is OK for you to limit your objects to a single tf.Session (as Keras does, which should cover basic needs and probably a bit beyond), then you could simply instantiate a single tf.Session via your preferred Singleton-like pattern (maybe just plain old functions, as in Keras).
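The "plain old functions" variant of that idea might look like this minimal sketch, assuming TensorFlow 1.13+ or 2.x (through tf.compat.v1). `get_graph` and `get_session` are hypothetical helpers written for this example, not a TensorFlow API: one module-level graph and one lazily created session that every network object shares, so no object has to carry its own session around.

```python
import tensorflow as tf

# Module-level holders: one graph and one session shared by every network object.
_GRAPH = tf.Graph()
_SESSION = None


def get_graph():
    """Return the single graph that all network objects build into."""
    return _GRAPH


def get_session():
    """Return the single tf.compat.v1.Session, creating it lazily on first use."""
    global _SESSION
    if _SESSION is None:
        _SESSION = tf.compat.v1.Session(graph=_GRAPH)
    return _SESSION


# Usage: build ops in the shared graph, run them with the shared session.
with get_graph().as_default():
    x = tf.compat.v1.placeholder(tf.float32, shape=[None])
    y = 2.0 * x
result = get_session().run(y, feed_dict={x: [1.0, 2.0]})
```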
Thanks for your answers.
However, I still have issues with the scopes of variables. How can I define variables as part of my object? I want to be able to do something like:
vars = network.getTrainableVariables()
And that should return only the variables defined in that object (not like tf.trainable_variables())
I can't find one example of a clean declaration of variables within a scope when using multiple networks at the same time (the scope being the name of the network for example).
At the moment when I run the code multiple times, it creates variables W,b, then W_1,b_1, then W_2,b_2 etc...
Also, I would like network.initialize() to initialize only the variables defined within this graph, not all variables in every network...
A solution would be to declare variables for network within scope 'name' and then be able to reset_default_graph within this 'name' scope but I am not able to do that.
I'd suggest using tf.keras.Model to manage state. Take a look at the subclassing section of the tf.keras documentation. There are training examples using Model.fit there, but you can also just call the object directly, and it will collect variables and losses for you in properties (variables, trainable_variables, losses, etc.).
Whatever you do, I'd separate the model definition (anything that manages Variable objects) from the training loop. And when defining the model, Variables should be attributes of the model definition object and be created once (not necessarily in __init__, but protected by a check such as if self.attribute is None: self.attribute = tf.Variable(...)).
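With tf.keras.Model subclassing, the build() hook plays the role of that create-once guard: Keras calls it a single time, on the first call, with the input shape. The sketch below assumes TensorFlow 2.x; the class name `OneLayerNN` and the layer sizes are illustrative. Each instance then exposes only its own variables through trainable_variables, which is what the follow-up question asks for.

```python
import tensorflow as tf


class OneLayerNN(tf.keras.Model):
    """One hidden layer: y = W2 * relu(W1 * x + b1) + b2."""

    def __init__(self, hidden_units, output_units, **kwargs):
        super().__init__(**kwargs)
        self.hidden_units = hidden_units
        self.output_units = output_units

    def build(self, input_shape):
        # Called once, on the first call, so the variables are created exactly once.
        in_dim = int(input_shape[-1])
        self.w1 = self.add_weight(name="W1", shape=(in_dim, self.hidden_units), initializer="glorot_uniform")
        self.b1 = self.add_weight(name="b1", shape=(self.hidden_units,), initializer="zeros")
        self.w2 = self.add_weight(name="W2", shape=(self.hidden_units, self.output_units), initializer="glorot_uniform")
        self.b2 = self.add_weight(name="b2", shape=(self.output_units,), initializer="zeros")

    def call(self, x):
        hidden = tf.nn.relu(tf.matmul(x, self.w1) + self.b1)
        return tf.matmul(hidden, self.w2) + self.b2


net_a = OneLayerNN(hidden_units=8, output_units=1, name="net_a")
net_b = OneLayerNN(hidden_units=4, output_units=1, name="net_b")
x = tf.ones([2, 3])
y_a = net_a(x)
y_b = net_b(x)
```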
I'm trying to train an LSTM in Tensorflow using minibatches, but after training is complete I would like to use the model by submitting one example at a time to it. I can set up the graph within Tensorflow to train my LSTM network, but I can't use the trained result afterward in the way I want.
The setup code looks something like this:
#Build the LSTM model.
cellRaw = rnn_cell.BasicLSTMCell(LAYER_SIZE)
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS)
cell = rnn_cell.DropoutWrapper(cellRaw, output_keep_prob = 0.25)
input_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None, 3])
target_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None])
initial_state = cell.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)
with tf.variable_scope('rnnlm'):
    output_w = tf.get_variable("output_w", [LAYER_SIZE, 6])
    output_b = tf.get_variable("output_b", [6])
outputs, final_state = seq2seq.rnn_decoder(input_list, initial_state, cell, loop_function=None, scope='rnnlm')
output = tf.reshape(tf.concat(1, outputs), [-1, LAYER_SIZE])
output = tf.nn.xw_plus_b(output, output_w, output_b)
...Note the two placeholders, input_data and target_data. I haven't bothered including the optimizer setup. After training is complete and the training session closed, I would like to set up a new session that uses the trained LSTM network whose input is provided by a completely different placeholder, something like:
with tf.Session() as sess:
    with tf.variable_scope("simulation", reuse=None):
        cellSim = cellRaw
        input_data_sim = tf.placeholder(dtype=tf.float32, shape=[1, 1, 3])
        initial_state_sim = cell.zero_state(batch_size=1, dtype=tf.float32)
        input_list_sim = tf.unpack(input_data_sim)
        outputsSim, final_state_sim = seq2seq.rnn_decoder(input_list_sim, initial_state_sim, cellSim, loop_function=None, scope='rnnlm')
        outputSim = tf.reshape(tf.concat(1, outputsSim), [-1, LAYER_SIZE])
        with tf.variable_scope('rnnlm'):
            output_w = tf.get_variable("output_w", [LAYER_SIZE, nOut])
            output_b = tf.get_variable("output_b", [nOut])
        outputSim = tf.nn.xw_plus_b(outputSim, output_w, output_b)
This second part returns the following error:
tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
...Presumably because the graph I'm using still has the old training placeholders attached to the trained LSTM nodes. What's the right way to 'extract' the trained LSTM and put it into a new, different graph that has a different style of inputs? The variable scoping features that Tensorflow has seem to address something like this, but the examples in the documentation all talk about using variable scope as a way of managing variable names so that the same piece of code will generate similar subgraphs within the same graph. The 'reuse' feature seems to be close to what I want, but I don't find the Tensorflow documentation linked above to be clear at all on what it does. The cells themselves cannot be given a name (in other words,
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS, name="multicell")
is not valid), and while I can give a name to a seq2seq.rnn_decoder(), I presumably wouldn't be able to remove the rnn_cell.DropoutWrapper() if I used that node unchanged.
Questions:
What is the proper way to move trained LSTM weights from one graph to another?
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
Thanks!
What is the proper way to move trained LSTM weights from one graph to another?
You can create your decoding graph first (with a saver object to save the parameters) and create a GraphDef object that you can import into your bigger training graph:
basegraph = tf.Graph()
with basegraph.as_default():
    ...  # ***your graph***
traingraph = tf.Graph()
with traingraph.as_default():
    tf.import_graph_def(basegraph.as_graph_def())
    ...  # ***your training graph***
Make sure you load your variables when you start a session for the new graph.
I don't have experience with this functionality, so you may have to look into it a bit more.
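A commonly used alternative to import_graph_def is to wrap the model construction in a function that creates its variables with tf.get_variable under a fixed scope, call it once in the training graph and again in a separate inference graph with a different placeholder (and no dropout), and restore the trained values by name with a Saver. The sketch below assumes TensorFlow 1.13+ or 2.x (through tf.compat.v1); the hypothetical `model` function uses a single dense layer as a stand-in for the LSTM, with scope and variable names borrowed from the question.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

ckpt = os.path.join(tempfile.mkdtemp(), "trained.ckpt")


def model(inputs, keep_prob):
    """Hypothetical stand-in for the LSTM: one dense layer under the "rnnlm" scope."""
    with tf.compat.v1.variable_scope("rnnlm"):
        w = tf.compat.v1.get_variable("output_w", shape=[3, 6])
        b = tf.compat.v1.get_variable("output_b", shape=[6],
                                      initializer=tf.compat.v1.zeros_initializer())
    if keep_prob < 1.0:  # dropout only in the training graph
        inputs = tf.nn.dropout(inputs, rate=1.0 - keep_prob)
    return tf.matmul(inputs, w) + b


# Training graph: batched placeholder, dropout on.
with tf.Graph().as_default():
    batch_in = tf.compat.v1.placeholder(tf.float32, shape=[None, 3])
    batch_out = model(batch_in, keep_prob=0.75)
    saver = tf.compat.v1.train.Saver()
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        # ... training steps would run here ...
        trained_w = sess.run(tf.compat.v1.trainable_variables("rnnlm/output_w")[0])
        saver.save(sess, ckpt)

# Inference graph: a different placeholder (one example at a time), dropout off.
with tf.Graph().as_default():
    single_in = tf.compat.v1.placeholder(tf.float32, shape=[1, 3])
    single_out = model(single_in, keep_prob=1.0)
    saver = tf.compat.v1.train.Saver()
    with tf.compat.v1.Session() as sess:
        saver.restore(sess, ckpt)  # every variable comes from the checkpoint, so no initializer
        restored_w = sess.run(tf.compat.v1.trainable_variables("rnnlm/output_w")[0])
        prediction = sess.run(single_out, feed_dict={single_in: np.ones([1, 3], np.float32)})
```

Since the inference graph never contains the training placeholder, nothing in it can ask for a feed of that placeholder.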
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
Yes, the graph object still holds it.
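A small sketch of that distinction, assuming TensorFlow 1.13+ or 2.x (through tf.compat.v1): after the session is closed, the graph still contains all its operations, but the variable values lived in the closed session, so a new session on the same graph must initialize or restore them again.

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    v = tf.compat.v1.get_variable("v", initializer=tf.constant(1.0))
    bump = v.assign_add(1.0)
    init = tf.compat.v1.global_variables_initializer()

with tf.compat.v1.Session(graph=g) as sess:
    sess.run(init)
    sess.run(bump)
    value_in_first_session = sess.run(v)  # 2.0
# The session is closed here, but the graph and its operations are untouched.
ops_after_close = len(g.get_operations())

with tf.compat.v1.Session(graph=g) as sess:
    try:
        sess.run(v)  # the value lived in the old session's resources
        reinit_needed = False
    except tf.errors.FailedPreconditionError:
        reinit_needed = True
```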
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
No, reuse determines the behaviour when you call tf.get_variable with an existing name: when it is True, get_variable returns the existing variable; otherwise it creates a new variable, and raises a ValueError if a variable with that full name already exists. Using the same name in two different scopes (e.g. conv1/sum1 and conv2/sum1) should not throw an error. Are you sure you're using tf.get_variable and not just tf.Variable?
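The three cases can be checked directly with this minimal sketch, assuming TensorFlow 1.13+ or 2.x (through tf.compat.v1 in graph mode); the scope and variable names mirror the question's conv1/sum1 example.

```python
import tensorflow as tf

with tf.Graph().as_default():
    # Same short name in two different scopes: two distinct variables, no error.
    with tf.compat.v1.variable_scope("conv1"):
        a = tf.compat.v1.get_variable("sum1", shape=[1])
    with tf.compat.v1.variable_scope("conv2"):
        b = tf.compat.v1.get_variable("sum1", shape=[1])

    # reuse=True: get_variable hands back the existing variable with that full name.
    with tf.compat.v1.variable_scope("conv1", reuse=True):
        a_again = tf.compat.v1.get_variable("sum1", shape=[1])

    # No reuse: asking for an existing full name raises ValueError.
    try:
        with tf.compat.v1.variable_scope("conv1"):
            tf.compat.v1.get_variable("sum1", shape=[1])
        duplicate_error = False
    except ValueError:
        duplicate_error = True
```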
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
I don't really see what you mean. Placeholders do not always have to be fed: if a placeholder is not required for running an operation, you don't have to feed it.
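Concretely, only the placeholders on the path to the fetched tensors need a feed. The error in the question most likely means that something being fetched still depends on the training placeholder. A minimal sketch, assuming TensorFlow 1.13+ or 2.x (through tf.compat.v1); the placeholder names are illustrative.

```python
import tensorflow as tf

with tf.Graph().as_default():
    train_input = tf.compat.v1.placeholder(tf.float32, shape=[None], name="train_input")
    sim_input = tf.compat.v1.placeholder(tf.float32, shape=[None], name="sim_input")
    train_out = train_input * 2.0
    sim_out = sim_input + 1.0

    with tf.compat.v1.Session() as sess:
        # sim_out does not depend on train_input, so feeding sim_input alone is enough.
        sim_value = sess.run(sim_out, feed_dict={sim_input: [1.0]})
        try:
            sess.run(train_out, feed_dict={sim_input: [1.0]})  # train_input is on this path
            missing_feed_error = False
        except tf.errors.InvalidArgumentError:
            missing_feed_error = True
```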
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
I think your understanding or usage of scopes is flawed; see above.
I have a setup where I need to initialize an LSTM after the main initialization which uses tf.initialize_all_variables(). I.e. I want to call tf.initialize_variables([var_list])
Is there way to collect all the internal trainable variables for both:
rnn_cell.BasicLSTM
rnn_cell.MultiRNNCell
so that I can initialize JUST these parameters?
The main reason I want this is because I do not want to re-initialize some trained values from earlier.
The easiest way to solve your problem is to use variable scope. The names of the variables within a scope will be prefixed with its name. Here is a short snippet:
cell = rnn_cell.BasicLSTMCell(num_nodes)
with tf.variable_scope("LSTM") as vs:
    # Execute the LSTM cell here in any way, for example:
    for i in range(num_steps):
        output[i], state = cell(input_data[i], state)
    # Retrieve just the LSTM variables.
    lstm_variables = [v for v in tf.all_variables()
                      if v.name.startswith(vs.name)]
# [..]
# Initialize the LSTM variables (the initializer op has to be run in a session).
sess.run(tf.initialize_variables(lstm_variables))
It would work the same way with MultiRNNCell.
EDIT: changed tf.trainable_variables to tf.all_variables()
You can also use tf.get_collection():
cell = rnn_cell.BasicLSTMCell(num_nodes)
with tf.variable_scope("LSTM") as vs:
    # Execute the LSTM cell here in any way, for example:
    for i in range(num_steps):
        output[i], state = cell(input_data[i], state)
    lstm_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope=vs.name)
(partly copied from Rafal's answer)
Note that the last line is equivalent to the list comprehension in Rafal's code.
Basically, tensorflow stores a global collection of variables, which can be fetched by either tf.all_variables() or tf.get_collection(tf.GraphKeys.VARIABLES). If you specify scope (scope name) in the tf.get_collection() function, then you only fetch tensors (variables in this case) in the collection whose scopes are under the specified scope.
EDIT:
You can also use tf.GraphKeys.TRAINABLE_VARIABLES to get trainable variables only. But since the vanilla BasicLSTMCell does not create any non-trainable variables, both will be functionally equivalent. For a complete list of default graph collections, check this out.
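The same technique with the later API names (tf.all_variables became tf.global_variables, tf.GraphKeys.VARIABLES became GLOBAL_VARIABLES, tf.initialize_variables became tf.variables_initializer) can be sketched as follows, assuming TensorFlow 1.13+ or 2.x (through tf.compat.v1). Plain get_variable calls stand in for the variables a real LSTM cell would create inside the scope. Because the scope filter is a prefix match, appending "/" keeps "LSTM" from also matching a sibling scope such as "LSTM_1".

```python
import tensorflow as tf

with tf.Graph().as_default():
    # A variable that has already been trained and must keep its value.
    pretrained = tf.compat.v1.get_variable("pretrained", initializer=tf.constant(0.0))
    # Stand-ins for the cell's internal variables (a real LSTM cell would create its own here).
    with tf.compat.v1.variable_scope("LSTM") as vs:
        lstm_w = tf.compat.v1.get_variable("w", shape=[2], initializer=tf.compat.v1.ones_initializer())
    with tf.compat.v1.variable_scope("LSTM_1"):
        tf.compat.v1.get_variable("w", shape=[2], initializer=tf.compat.v1.zeros_initializer())

    # The trailing "/" keeps the prefix match from also catching "LSTM_1/...".
    lstm_vars = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.GLOBAL_VARIABLES, scope=vs.name + "/")
    lstm_names = [v.op.name for v in lstm_vars]

    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        sess.run([pretrained.assign(42.0), lstm_w.assign([5.0, 5.0])])  # pretend both were trained
        sess.run(tf.compat.v1.variables_initializer(lstm_vars))        # reset only LSTM/*
        values = sess.run([pretrained, lstm_w])
```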