How to load a layer from checkpoint

I have this config:
network = {"source_embed_raw": {"class": "linear", ...}}
I want to load the parameters for the layer source_embed_raw from an existing checkpoint.
In that checkpoint, the parameter has a different name (output/rec/target_embed_raw/W).
I understand that I can load parameters with preload_from_files, but I am not sure how exactly to do that in my case: the layer names differ, so simply adding a prefix does not do the job.

This is currently not possible with preload_from_files in this way.
I currently see these options:
1. We could extend the logic of preload_from_files (and CustomCheckpointLoader) to allow for something like that (some generic variable/layer name mapping).
2. You could rename your layer from source_embed_raw to e.g. old_model__target_embed_raw and then use preload_from_files with the prefix option. If you do not want to rename it, you could still add a layer like old_model__target_embed_raw and then use parameter sharing in source_embed_raw.
   If the parameter in the checkpoint is actually called something like output/rec/target_embed_raw/..., you could create a SubnetworkLayer named old_model__output, in that another SubnetworkLayer named rec, and in that a layer named target_embed_raw.
3. You could write a script that simply loads the existing checkpoint and stores it as a new checkpoint with renamed variables (this is also totally independent of RETURNN).
4. LinearLayer (and most other layers) lets you specify exactly how the parameters are initialized (forward_weights_init and bias_init). The parameter initialization is quite flexible. E.g. there is something like load_txt_file_initializer which can be used. Currently there is no such function to load directly from an existing checkpoint, but we could add that. Or you could simply implement the logic inside your config (it would only be about 5 lines of code).
5. Instead of using preload_from_files, you could also use SubnetworkLayer and the load_on_init option, and then apply logic similar to option 2.
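The standalone renaming script mentioned above (load the checkpoint, store it again under new variable names) could look roughly like the sketch below. It is not part of RETURNN. The helper names rename_variables and convert_checkpoint are made up for illustration, the target name source_embed_raw/W is an assumption about how the new layer's weight is named, and it assumes the whole checkpoint fits in memory. TensorFlow is imported inside the function so that the pure name-mapping helper can be tried without it.

```python
def rename_variables(names, name_map):
    """Return {old_name: new_name}; names missing from name_map stay unchanged."""
    return {old: name_map.get(old, old) for old in names}


def convert_checkpoint(src_path, dst_path, name_map):
    """Copy every variable from src_path into a new checkpoint at dst_path."""
    import tensorflow as tf  # imported lazily; only needed for the conversion

    tf1 = tf.compat.v1
    reader = tf.train.load_checkpoint(src_path)
    old_names = sorted(reader.get_variable_to_shape_map())
    renamed = rename_variables(old_names, name_map)
    with tf.Graph().as_default():
        # One new variable per old one, initialized with the stored value.
        new_vars = [
            tf1.Variable(reader.get_tensor(old), name=new)
            for old, new in renamed.items()
        ]
        with tf1.Session() as sess:
            sess.run(tf1.global_variables_initializer())
            tf1.train.Saver(new_vars).save(sess, dst_path)


# Assumption: the new layer's weight is stored as "source_embed_raw/W".
NAME_MAP = {"output/rec/target_embed_raw/W": "source_embed_raw/W"}
print(rename_variables(["output/rec/target_embed_raw/W", "global_step"], NAME_MAP))
```

After something like convert_checkpoint("old/model.ckpt", "renamed/model.ckpt", NAME_MAP) (both paths hypothetical), the new checkpoint could be given to preload_from_files without any prefix tricks.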

Related

How to change layer parent in python in Gimp?

Simple problem. I want to change the parent of LayerA to GroupB.
The "parent" member of a layer is read-only, and I can't use pdb.gimp_image_insert_layer because the layer has already been added to the image. I also tried removing it first with gimp_image_remove_layer, but that doesn't work either.

I cannot find an API for this in Python. Using image.remove_layer() deletes the layer so it cannot be re-inserted, so the best I can think of is to copy the layer using something like this:
def moveLayer(image, layer, group, position):
    layerName = layer.name
    layerCopy = layer.copy()
    image.remove_layer(layer)
    layerCopy.name = layerName  # Can't have two layers with same name
    image.insert_layer(layerCopy, group, position)
    return layerCopy  # this one has a new ID
This said, I've written many Python scripts and never needed to change a layer parent, so maybe there is a way to avoid doing this...

Can I access the inner layer outputs of DeepLab in pytorch?

Using PyTorch, I am trying to implement a network that uses the pre-trained DeepLab ResNet-101.
I found two possible methods for using this network:
this one
or
torchvision.models.segmentation.deeplabv3_resnet101(
    pretrained=False, progress=True, num_classes=21, aux_loss=None, **kwargs)
However, I might not only need this network's output, but also several inside layers' outputs.
Is there a way to access the inner layer outputs using one of these methods?
If not - Is it possible to manually copy the trained resnet's parameters so I can manually recreate it and add those outputs myself? (Hopefully the first option is possible so I won't need to do this)
Thanks!

You can achieve this without too much trouble using forward hooks.
The idea is to loop over the modules of your model, find the layers you're interested in, and hook a callback function onto them. When called, those layers will trigger the hook. We will take advantage of this to save the intermediate outputs.
For example, let's say you want to get the outputs of layer classifier.0.convs.3.1:
layers = ['classifier.0.convs.3.1']
activations = {}
def forward_hook(name):
    def hook(module, x, y):
        activations[name] = y
    return hook
for name, module in model.named_modules():
    if name in layers:
        module.register_forward_hook(forward_hook(name))
*The closure around hook() made by forward_hook's scope is used to enclose the module's name which you wouldn't otherwise have access to at this point.
Everything is ready, so we can call the model (it has to be created before the hooks above are registered):
>>> model = torchvision.models.segmentation.deeplabv3_resnet101(
...     pretrained=True, progress=True, num_classes=21, aux_loss=None)
>>> model(torch.rand(16, 3, 100, 100))
And as expected, after inference, activations will have a new entry 'classifier.0.convs.3.1' which - in this case - will contain a tensor of shape (16, 256, 13, 13).
Not so long ago, I wrote an answer about a similar question which goes a little bit more in detail on how hooks can be used to inspect the intermediate output shapes.
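To illustrate the closure remark above without needing PyTorch, here is a small pure-Python sketch. The hooks are called by hand instead of by a real forward pass, the layer names are placeholders, and make_hook, collect_with_factory and collect_with_inline_lambda are names invented for this example. It compares the factory pattern from the answer with a hook written inline in the loop, which looks up the loop variable only when it runs.

```python
def make_hook(name, store):
    """Return a hook that files its output under `name` in `store`."""
    def hook(module, inputs, output):
        store[name] = output  # `name` was fixed when make_hook() was called
    return hook


def collect_with_factory(names):
    store = {}
    hooks = [make_hook(name, store) for name in names]
    for hook, layer in zip(hooks, names):
        hook(None, None, "output of " + layer)  # stand-in for a forward pass
    return store


def collect_with_inline_lambda(names):
    store = {}
    hooks = []
    for name in names:
        # `name` is looked up when the lambda runs, i.e. after the loop ended,
        # so every hook writes to the key of the last layer.
        hooks.append(lambda module, inputs, output: store.update({name: output}))
    for hook, layer in zip(hooks, names):
        hook(None, None, "output of " + layer)
    return store


names = ["backbone.layer1", "classifier.0"]  # placeholder layer names
print(collect_with_factory(names))
print(collect_with_inline_lambda(names))
```

With the factory, each layer gets its own entry; with the inline lambda, all hooks share one key and only the last output survives. That is why the answer wraps hook() in forward_hook(name).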

Load (or combine) several pretrained checkpoints with tf.estimator.WarmStartSettings

I want to use pretrained weights for 2 parts of my model. I have 2 checkpoints from different models, but I can load only one of them into my main model with tf.estimator.WarmStartSettings, as I'm using the estimator architecture.
tf.estimator.WarmStartSettings(ckpt_to_initialize_from=X)
from the doc:
Either the directory or a specific checkpoint can be provided (in the case of the former, the latest checkpoint will be used).
I can't see how I can add an additional checkpoint. Maybe there is a way to merge the weights from both checkpoints into one and load that one?

You can use init_from_checkpoint.
First, define the assignment map:
ckpt_dir = 'path_to_checkpoint_files'
vars_to_load = [i[0] for i in tf.train.list_variables(ckpt_dir)]
This creates a list of the names of all variables in the checkpoint.
assignment_map = {variable.op.name: variable for variable in tf.global_variables() if variable.op.name in vars_to_load}
This creates a dict whose keys are variable names in the checkpoint and whose values are the matching variables in the current graph.
tf.train.init_from_checkpoint(ckpt_dir, assignment_map)
Place this call inside the estimator's model_fn. It overrides the standard variable initialization.
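For the two-checkpoint case from the question, one way to apply the above is to call init_from_checkpoint once per checkpoint, each time with an assignment map restricted to one part of the model. The following is only a sketch under assumptions: the helper names, the scope prefixes "encoder/" and "decoder/" and the directory names are hypothetical, and it assumes the variables have the same names in the checkpoint and in the current graph. The map uses variable names as values, which init_from_checkpoint accepts in place of variable objects.

```python
def build_assignment_map(ckpt_names, graph_names, scope=""):
    """Map checkpoint names to same-named graph variables under `scope`."""
    graph_names = set(graph_names)
    return {
        name: name
        for name in ckpt_names
        if name in graph_names and name.startswith(scope)
    }


def warm_start_from(ckpt_dirs_and_scopes):
    """Call inside model_fn, after the model variables have been created."""
    import tensorflow as tf  # imported lazily; only needed inside model_fn

    tf1 = tf.compat.v1
    graph_names = [v.op.name for v in tf1.global_variables()]
    for ckpt_dir, scope in ckpt_dirs_and_scopes:
        ckpt_names = [name for name, _shape in tf.train.list_variables(ckpt_dir)]
        assignment_map = build_assignment_map(ckpt_names, graph_names, scope)
        tf1.train.init_from_checkpoint(ckpt_dir, assignment_map)


# Pure-Python check of the mapping logic with made-up variable names.
print(build_assignment_map(
    ["encoder/w", "decoder/w", "global_step"],
    ["encoder/w", "decoder/w"],
    scope="encoder/",
))
```

Inside model_fn this might be used as warm_start_from([("ckpt_a", "encoder/"), ("ckpt_b", "decoder/")]), so that each part of the model is initialized from its own checkpoint.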

Explicitly clear/reset a nested TensorFlow Graph scope

So, I'm using a bunch of functions from OpenAI baselines for Reinforcement Learning. In those functions, policy nets are initialised using statements like:
with tf.variable_scope('deepq', reuse=True):
    ...
    return output
The problem is that the pointer to the output of those networks gets returned while still inside the scope, which means that when accessing those functions from another .py file I am still inside those scopes.
Basically I want to run a first function train_policy(output_dir) that trains the net and dumps the checkpoint to disk using tf.train.Saver().
Next, I run a function run_policy(output_dir) that reinitializes the same tf Graph and loads its pretrained values using the checkpoint dir.
Right now, when I try this, I get a ValueError:
"Variable deepq/... already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?" because at the point of running the second function, I'm still in the scope defined by the first. I checked the code from OpenAI baselines (very nested code, hard to see everything that's going on), and reuse is already set to True.
So I tried doing something like:
tf.get_default_session().close() followed by:
tf.reset_default_graph()
after the first function call. (I don't need the session to remain active since I'm dumping everything to disk)
But this gives me errors because I'm still inside a nested graph scope and so I can't reset the default graph... (see eg here)
Alternatively I tried things like:
tf.get_default_graph().as_graph_def().__exit__()
or
tf.name_scope('deepq').__exit__()
but the exit() function needs a whole bunch of args I don't know how to get... (and I can't find good documentation on how to use this function).
My current solution is to run these functions in separate subprocesses in Python (and let the garbage collector do all the work), but this doesn't feel like a satisfactory solution.
Any ideas on how to deal with this? Ideally I'd need something like: tf.clear_all_graphs_and_sessions()

Alright, one solution is indeed to use a fresh default graph.
I simply wrap every function call in a new default graph object like this:
with tf.Graph().as_default():
    train_policy(output_dir)
with tf.Graph().as_default():
    run_policy(output_dir)
...
This way the default graph simply gets reinitialised empty and you can load whatever is in the checkpoint file. (Inside every function I also close the default session before returning).

You can try to do your work in another default graph:
with tf.get_default_graph().as_default():
    with tf.variable_scope('deepq', reuse=False):
        v = tf.get_variable('v', shape=[])
        print(v.name, v.graph)
with tf.Graph().as_default():
    v = tf.get_variable('v', shape=[])
    print(v.name, v.graph)
Output:
deepq/v:0 <tensorflow.python.framework.ops.Graph object at 0x7f61adaa6390>
v:0 <tensorflow.python.framework.ops.Graph object at 0x7f61460abbd0>

Tensorflow: Using weights trained in one model inside another, different model

I'm trying to train an LSTM in Tensorflow using minibatches, but after training is complete I would like to use the model by submitting one example at a time to it. I can set up the graph within Tensorflow to train my LSTM network, but I can't use the trained result afterward in the way I want.
The setup code looks something like this:
#Build the LSTM model.
cellRaw = rnn_cell.BasicLSTMCell(LAYER_SIZE)
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS)
cell = rnn_cell.DropoutWrapper(cellRaw, output_keep_prob = 0.25)
input_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None, 3])
target_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None])
initial_state = cell.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)
with tf.variable_scope('rnnlm'):
    output_w = tf.get_variable("output_w", [LAYER_SIZE, 6])
    output_b = tf.get_variable("output_b", [6])
outputs, final_state = seq2seq.rnn_decoder(input_list, initial_state, cell, loop_function=None, scope='rnnlm')
output = tf.reshape(tf.concat(1, outputs), [-1, LAYER_SIZE])
output = tf.nn.xw_plus_b(output, output_w, output_b)
...Note the two placeholders, input_data and target_data. I haven't bothered including the optimizer setup. After training is complete and the training session closed, I would like to set up a new session that uses the trained LSTM network whose input is provided by a completely different placeholder, something like:
with tf.Session() as sess:
    with tf.variable_scope("simulation", reuse=None):
        cellSim = cellRaw
        input_data_sim = tf.placeholder(dtype=tf.float32, shape=[1, 1, 3])
        initial_state_sim = cell.zero_state(batch_size=1, dtype=tf.float32)
        input_list_sim = tf.unpack(input_data_sim)
        outputsSim, final_state_sim = seq2seq.rnn_decoder(input_list_sim, initial_state_sim, cellSim, loop_function=None, scope='rnnlm')
        outputSim = tf.reshape(tf.concat(1, outputsSim), [-1, LAYER_SIZE])
        with tf.variable_scope('rnnlm'):
            output_w = tf.get_variable("output_w", [LAYER_SIZE, nOut])
            output_b = tf.get_variable("output_b", [nOut])
        outputSim = tf.nn.xw_plus_b(outputSim, output_w, output_b)
This second part returns the following error:
tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
...Presumably because the graph I'm using still has the old training placeholders attached to the trained LSTM nodes. What's the right way to 'extract' the trained LSTM and put it into a new, different graph that has a different style of inputs? The variable scoping features that Tensorflow has seem to address something like this, but the examples in the documentation all talk about using variable scope as a way of managing variable names so that the same piece of code will generate similar subgraphs within the same graph. The 'reuse' feature seems to be close to what I want, but I don't find the Tensorflow documentation linked above to be clear at all on what it does. The cells themselves cannot be given a name (in other words,
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS, name="multicell")
is not valid), and while I can give a name to a seq2seq.rnn_decoder(), I presumably wouldn't be able to remove the rnn_cell.DropoutWrapper() if I used that node unchanged.
Questions:
What is the proper way to move trained LSTM weights from one graph to another?
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
Thanks!

What is the proper way to move trained LSTM weights from one graph to another?
You can create your decoding graph first (with a saver object to save the parameters) and create a GraphDef object that you can import in your bigger training graph:
basegraph = tf.Graph()
with basegraph.as_default():
    ***your graph***
traingraph = tf.Graph()
with traingraph.as_default():
    tf.import_graph_def(basegraph.as_graph_def())
    ***your training graph***
Make sure you load your variables when you start a session for a new graph.
I don't have experience with this functionality, so you may have to look into it a bit more.
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
Yes, the graph object still holds it.
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
No. reuse determines what get_variable does when the name already exists in the current scope: when it is True, the existing variable is returned; otherwise get_variable creates a new variable, and raises a ValueError if that name is already taken. Normally Tensorflow should not throw an error for identical names in two different scopes. Are you sure you're using tf.get_variable and not just tf.Variable?
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
I don't really see what you mean. Placeholders do not always have to be used: if a placeholder is not required for running an operation, you don't have to feed it.
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
I think your understanding or usage of scopes is flawed; see above.
