Memory leak for custom TensorFlow training using @tf.function - python

I am trying to write my own training loop for TF2/Keras, following the official Keras walkthrough. The vanilla version works like a charm, but when I add the @tf.function decorator to my training step, a memory leak grabs all my memory and I lose control of my machine. Does anyone know what is going on?
The important parts of the code look like this:
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = siamese_network(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, siamese_network.trainable_weights)
    optimizer.apply_gradients(zip(grads, siamese_network.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value
@tf.function
def test_step(x, y):
    val_logits = siamese_network(x, training=False)
    val_acc_metric.update_state(y, val_logits)
    val_prec_metric.update_state(y, val_logits)
    val_rec_metric.update_state(y, val_logits)
for epoch in range(epochs):
    step_time = 0
    epoch_time = time.time()
    print("Start of {} epoch".format(epoch))
    for step, (x_batch_train, y_batch_train) in enumerate(train_ds):
        if step > steps_epoch:
            break
        loss_value = train_step(x_batch_train, y_batch_train)
    train_acc = train_acc_metric.result()
    train_acc_metric.reset_states()

    for val_step, (x_batch_val, y_batch_val) in enumerate(test_ds):
        if val_step > validation_steps:
            break
        test_step(x_batch_val, y_batch_val)
    val_acc = val_acc_metric.result()
    val_prec = val_prec_metric.result()
    val_rec = val_rec_metric.result()
    val_acc_metric.reset_states()
    val_prec_metric.reset_states()
    val_rec_metric.reset_states()
If I comment out the @tf.function lines, the memory leak doesn't occur, but each step is about three times slower. My guess is that the graph is somehow being created again within each epoch, or something like that, but I have no idea how to solve it.
This is the tutorial I am following: https://keras.io/guides/writing_a_training_loop_from_scratch/

tl;dr
TensorFlow may be generating a new graph for each unique set of argument values passed into the decorated functions. Make sure you are passing consistently shaped Tensor objects to test_step and train_step instead of Python objects.
Details
This is a stab in the dark. While I've never used @tf.function myself, I did find the following warnings in the documentation:
tf.function also treats any pure Python value as opaque objects, and builds a separate graph for each set of Python arguments that it encounters.
and
Caution: Passing python scalars or lists as arguments to tf.function will always build a new graph. To avoid this, pass numeric arguments as Tensors whenever possible
Finally:
A Function determines whether to reuse a traced ConcreteFunction by computing a cache key from an input's args and kwargs. A cache key is a key that identifies a ConcreteFunction based on the input args and kwargs of the Function call, according to the following rules (which may change):

- The key generated for a tf.Tensor is its shape and dtype.
- The key generated for a tf.Variable is a unique variable id.
- The key generated for a Python primitive (like int, float, str) is its value.
- The key generated for nested dicts, lists, tuples, namedtuples, and attrs is the flattened tuple of leaf-keys (see nest.flatten). (As a result of this flattening, calling a concrete function with a different nesting structure than the one used during tracing will result in a TypeError.)
- For all other Python types the key is unique to the object. This way a function or method is traced independently for each instance it is called with.
What I take from all this is that if you don't pass consistently shaped Tensor objects to your @tf.function-decorated functions (perhaps you use Python collections or primitives instead), you likely create a new graph version of your function for every distinct argument value you pass in. I'm guessing this could produce the memory-explosion behavior you're seeing. I can't tell how your test_ds and train_ds objects are being created, but you might want to make sure they are created such that enumerate(blah_ds) returns tensors as in the tutorial, or at least convert the values to tensors before passing them to your test_step and train_step functions.
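To see the retracing in action, here is a minimal sketch (the function and values are invented for illustration) showing that Tensor arguments of a fixed shape and dtype reuse one graph, while Python scalars trigger a fresh trace per value:

import tensorflow as tf

@tf.function
def square(x):
    print("Tracing!")  # only runs when TensorFlow traces a new graph
    return x * x

square(tf.constant(2.0))  # prints "Tracing!" once
square(tf.constant(3.0))  # same shape/dtype: the graph is reused, no print
square(2)                 # Python scalar: builds a new graph
square(3)                 # different Python value: builds yet another graph

A tf.data.Dataset, as used in the tutorial, yields tensors of a consistent dtype (and, with a fixed batch size, a consistent shape), which is why the tutorial's loop does not retrace.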

Related

optimizer.step() Not updating Model Weights/Parameters

I'm currently working on a solution via PyTorch. I'm not going to share the exact solution but I will provide code that reproduces the issue I'm having.
I have a model defined as follows:
import numpy as np
import torch
import torch.nn as nn
from torch.optim import Adam

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 4)

    def forward(self, x):
        return nn.functional.relu(self.fc1(x))
Then I create an instance: my_model = Net(). Next I create an Adam optimizer:
optim = Adam(my_model.parameters())
# create a random input
inputs = torch.tensor(np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2]), dtype=torch.float32, requires_grad=True)
# get the outputs
outputs = my_model(inputs)
# compute gradients / backprop via
outputs.backward(gradient=torch.tensor([1.,1.,1.,5.]))
# store parameters before optimizer step
before_step = list(my_model.parameters())[0].detach().numpy()
# update parameters via
optim.step()
# collect parameters again
after_step = list(my_model.parameters())[0].detach().numpy()
# Print if parameters are the same or not
print(np.array_equal(before_step,after_step)) # Prints True
I provided my model's parameters to the Adam optimizer, so I'm not exactly sure why the parameters aren't updating. I know that in most cases one uses a loss function; however, I cannot do that in my case, but I assumed that if I specified the model parameters to the optimizer, it would know to connect the two.
Anyone know why the parameters aren't getting updated?
The problem is with detach (docs).
As noted at the bottom:
Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks
So that is exactly what's happening here. To correctly compare the parameters, you need to clone (docs) them to get a real copy.
list(my_model.parameters())[0].clone().detach().numpy()
On a side note, it can be helpful to check the gradients after optim.step() with print(list(my_model.parameters())[0].grad) to verify that the graph is intact. Also, don't forget to call optim.zero_grad().
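For reference, a minimal sketch of the corrected comparison, reusing the my_model and optim objects from the question:

# clone() makes a real copy, so this snapshot is unaffected by optim.step()
before_step = list(my_model.parameters())[0].clone().detach().numpy()
optim.step()
after_step = list(my_model.parameters())[0].clone().detach().numpy()
print(np.array_equal(before_step, after_step))  # now prints False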

Custom reduction of losses within each batch in Keras

I am using Keras for TensorFlow in Python. I have a custom loss function that returns a single number for each sample in a batch (so a vector with length equal to the batch size). How can I also specify a custom reduction method to aggregate these sample losses into a single loss for the entire batch? Is it acceptable to include this reduction within the custom loss function and have it return just a single scalar rather than a vector of losses?
It really depends on your application and goal. A very common approach is to take reduce_mean over the losses generated for a batch. Some also use reduce_sum, which of course makes the loss value depend on the batch size. A general (and maybe unnecessarily complicated) approach is to store a function that reduces the batch losses to a single value; let's call it reducer. In your loss function, you can call it on the last line, right before return:
class my_loss(keras.losses.Loss):
    def __init__(self, inputs):
        super().__init__()
        # a bunch of assignments
        self.reducer = self._get_reducer_function(inputs)  # or a plain mean function

    def call(self, y_true, y_pred):
        y_batch = ...  # per-sample losses
        return self.reducer(y_batch)

    def get_config(self):
        return {'input': 1}
Of course you don't need to write it in such a complicated way, but it should give you an idea of how to do it. Also, you can simply add sample_weights if you need them.
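As a concrete sketch (the squared-error loss, the class name, and the default tf.reduce_mean reducer are all just placeholder choices):

import tensorflow as tf
from tensorflow import keras

class ReducedLoss(keras.losses.Loss):
    """Per-sample squared error, reduced to a single scalar per batch."""

    def __init__(self, reducer=tf.reduce_mean, **kwargs):
        super().__init__(**kwargs)
        self.reducer = reducer

    def call(self, y_true, y_pred):
        # one loss value per sample: shape (batch_size,)
        per_sample = tf.reduce_sum(tf.square(y_true - y_pred), axis=-1)
        # custom reduction of the whole batch to one scalar
        return self.reducer(per_sample)

And to answer the second part of the question: returning a single scalar from call() works in practice, since Keras's built-in reduction applied to a scalar leaves it unchanged.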

Accessing training data during tensorflow graph execution

I'd like to use pre-trained sentence embeddings in my TensorFlow graph-execution model. The embeddings are available dynamically from a function call, which takes in an array of sentences and outputs an array of sentence embeddings. This function uses a pre-trained PyTorch model, so it has to remain separate from the TensorFlow model I'm training:
def get_pretrained_embeddings(sentences):
    return pretrained_pytorch_model.encode(sentences)
My tensorflow model looks like this:
class SentenceModel(tf.keras.Model):
    def __init__(self):
        super().__init__()

    def call(self, sentences):
        embedding_layer = tf.keras.layers.Embedding(
            10_000,
            256,
            embeddings_initializer=tf.keras.initializers.Constant(get_pretrained_embeddings(sentences)),
            trainable=False,
        )
        sentence_text_embedding = tf.keras.Sequential([
            embedding_layer,
            tf.keras.layers.GlobalAveragePooling1D(),
        ])
        return sentence_text_embedding
But when I try to train this model using
cached_train = train.shuffle(100_000).batch(1024)
model.fit(cached_train)
my embeddings_initializer call gets the error:
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
I assume this is because tensorflow is trying to compile the graph using symbolic data. How can I get my external function, which relies on the current training data batch, to work with tensorflow's graph training?
TensorFlow compiles models to an execution graph before performing the actual training process. The obvious side effect that clues us into this is that a regular Python print() statement in, e.g., our call() method only gets executed once, while TensorFlow runs through our code to construct the execution graph that it will later convert to native code.
The other side effect is that you cannot use anything that isn't a tensor of some description when training. By 'tensor' here, all of the following can be considered a tensor:
The input value of your call() method (obviously)
A tf.Sequential
A tf.keras.Model/tf.keras.layers.Layer subclass
A SparseTensor
A tf.constant()
...and probably more that I haven't listed here.
To this end, you would need to convert your PyTorch model to a TensorFlow one to be able to reference it in a subclass of tf.keras.Model/tf.keras.layers.Layer.
As a side note, if you do find you need to iterate a tensor, you should just be able to iterate it on the 1st dimension (i.e. the batch size) like so:
for part in some_tensor:
    pass
If you want to iterate on some other dimension, I recommend doing a tf.unstack(some_tensor, axis=AXIS_NUMBER_HERE) first and iterate over the result thereof.
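For instance (the tensor and axis here are invented purely for illustration):

import tensorflow as tf

some_tensor = tf.zeros([4, 3, 2])  # hypothetical (batch, time, features) tensor

# iterate over the time dimension rather than the batch dimension
for time_slice in tf.unstack(some_tensor, axis=1):
    print(time_slice.shape)  # each of the 3 slices has shape (4, 2)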

TensorFlow & Keras: Different output shape depending on training or inferring

I am sub-classing tensorflow.keras.Model to implement a certain model. Expected behavior:
Training (fitting) time: returns a list of tensors including the final output and auxiliary output;
Inferring (predicting) time: returns a single output tensor.
And the code is:
class SomeModel(tensorflow.keras.Model):
# ......
def call(self, x, training=True):
# ......
return [aux1, aux2, net] if training else net
This is how I use it:
model = SomeModel(...)
model.compile(...,
              loss=keras.losses.SparseCategoricalCrossentropy(),
              loss_weights=[0.4, 0.4, 1], ...)
# ......
model.fit(data, [labels, labels, labels])
And got:
AssertionError: in converted code:
ipython-input-33-862e679ab098:140 call *
`return [aux1, aux2, net] if training else net`
...\tensorflow_core\python\autograph\operators\control_flow.py:918 if_stmt
The problem, then, is that the if statement gets converted into the computation graph, and this of course causes the error. The full stack trace is long and unhelpful, so it's not included here.
So, is there any way to make TensorFlow generate different graph based on training or not?
Which TensorFlow version are you using? In TensorFlow 2.2 you can override the behaviour of the .fit, .predict and .evaluate methods, which would generate different graphs for these methods (I assume) and potentially work for your use case.
The problem with earlier versions is that subclassed models get created by tracing the call method. This means Python conditionals become TensorFlow conditionals and face several limitations during graph creation and execution.
First, both branches (if-else) have to be defined, and regarding Python collections (e.g. lists), the branches have to have the same structure (e.g. number of elements). You can read about the limitations and effects of AutoGraph here and here. A sketch of the TF 2.2 approach follows below.
(Also, a conditional may not get evaluated at every run: if the condition is based on a Python variable rather than a tensor, it is resolved once at tracing time and the chosen branch is baked into the graph.)
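A hedged sketch of that TF 2.2 override (call_with_aux is a hypothetical helper name, and the 0.4/0.4/1 weights come from the question's compile call): train_step computes the weighted loss over all three outputs during fit(), while call() returns only the final output, so predict() never sees the auxiliary tensors.

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

class SomeModel(tf.keras.Model):
    def call(self, x):
        # ... compute aux1, aux2 and net as in the question ...
        return net  # inference/predict() only sees the final output

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            # call_with_aux is a hypothetical helper returning all three outputs
            aux1, aux2, net = self.call_with_aux(x)
            loss = (0.4 * loss_fn(y, aux1)
                    + 0.4 * loss_fn(y, aux2)
                    + 1.0 * loss_fn(y, net))
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}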

Proper use of tf.while_loop during inference?

Is it possible to use tf.TensorArrays for reading and writing inside the body of a tf.while_loop, but not pass them all through loop_vars?
I want to use tf.while_loop as part of a graph for WaveNet sound generation, which is a sequential generation mechanism that generates the next amplitude value based on a window of previously generated ones. But I want to use this just during inference, so there is no need for gradients, and I would call it with back_prop=False.
Besides this, the loop body function must read and write intermediate values that must be remembered across time steps.
It looks like tf.TensorArray is the only option for reading and writing values in this way, but I notice that tf.TensorArray.write() returns a new tf.TensorArray that is meant to be returned by the body and used in the loop_vars argument. Is this the best way to do this?
If I don't have a need for gradients, is there a simpler way to preserve state over the loop?
You can use tf.assign inside the tf.while_loop body to assign values to a global variable.
EDIT: You cannot use tf.assign to assign to sliced indices of a tf.Tensor; it is, however, allowed for a tf.Variable. The argument sent to the while body is of type tf.Tensor and not tf.Variable, so it won't work there.
Here is some sample code.
import tensorflow as tf

x = tf.Variable([0, 0])
assign_op = tf.assign(x[1], 42)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(x))  # [0, 0]
    sess.run(assign_op)
    print(sess.run(x))  # [0, 42]
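To answer the original question more directly, the idiomatic pattern is indeed to thread the tf.TensorArray through loop_vars: each write() returns a new handle that the body must return. A minimal TF1-style sketch (matching the session code above, with made-up values), using back_prop=False since no gradients are needed:

import tensorflow as tf

ta = tf.TensorArray(dtype=tf.float32, size=5)

def cond(i, ta):
    return i < 5

def body(i, ta):
    # write() returns a new TensorArray handle; return it through loop_vars
    ta = ta.write(i, tf.cast(i, tf.float32) * 2.0)
    return i + 1, ta

_, final_ta = tf.while_loop(cond, body, loop_vars=[0, ta], back_prop=False)
result = final_ta.stack()  # [0., 2., 4., 6., 8.]

with tf.Session() as sess:
    print(sess.run(result))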
