optimizer.step() Not updating Model Weights/Parameters

I'm currently working on a solution via PyTorch. I'm not going to share the exact solution but I will provide code that reproduces the issue I'm having.
I have a model defined as follows:
import numpy as np
import torch
from torch import nn
from torch.optim import Adam

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 4)
    def forward(self, x):
        return nn.functional.relu(self.fc1(x))
Then I create an instance: my_model = Net(). Next I create an Adam optimizer as such:
optim = Adam(my_model.parameters())
# create a random input
inputs = torch.tensor(np.array([1,1,1,1,1,2,2,2,2,2]),dtype=torch.float32,requires_grad=True)
# get the outputs
outputs = my_model(inputs)
# compute gradients / backprop via
outputs.backward(gradient=torch.tensor([1.,1.,1.,5.]))
# store parameters before optimizer step
before_step = list(my_model.parameters())[0].detach().numpy()
# update parameters via
optim.step()
# collect parameters again
after_step = list(my_model.parameters())[0].detach().numpy()
# Print if parameters are the same or not
print(np.array_equal(before_step,after_step)) # Prints True
I provided my model's parameters to the Adam optimizer, so I'm not sure why the parameters aren't updating. I know that in most cases one uses a loss function, but I cannot do that in my case. I assumed that if I passed the model parameters to the optimizer, it would know to connect the two.
Anyone know why the parameters aren't getting updated?

The problem is with detach (docs).
As noted at the bottom:
Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks
So that is exactly what's happening here. To correctly compare the parameters, you need to clone (docs) them to get a real copy.
list(my_model.parameters())[0].clone().detach().numpy()
On a side note, it can be helpful if you check the gradients after optim.step() with print(list(my_model.parameters())[0].grad) to check if the graph is intact. Also, don't forget to call optim.zero_grad().
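A minimal sketch of the difference, assuming PyTorch is installed. The single nn.Linear layer and the scalar sum used as a stand-in objective are illustrative only, not the asker's model:

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(10, 4)  # illustrative stand-in for my_model
optimizer = torch.optim.Adam(layer.parameters())

# Stand-in scalar objective so that backward() produces gradients.
layer(torch.ones(10)).sum().backward()

view_before = layer.weight.detach()          # shares storage with the parameter
copy_before = layer.weight.detach().clone()  # independent snapshot

optimizer.step()  # updates layer.weight in place

shares_storage = view_before.data_ptr() == layer.weight.data_ptr()
view_looks_unchanged = torch.equal(view_before, layer.weight)
copy_looks_unchanged = torch.equal(copy_before, layer.weight)
print(shares_storage, view_looks_unchanged, copy_looks_unchanged)
```

The detached view follows the parameter through the in-place update, so it always compares equal to the current weights. Only the cloned snapshot can reveal that the step changed them.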

Related

Steps of tf.summary.* operations in TensorBoard are always 0

When I'm training my model with TensorFlow 2.3, I want to visualize some intermediate tensors calculated using the weight in the computation graph of my customized tf.keras.layers.Layer.
So I use tf.summary.image() to record these tensors and visualize them as images like this:
class CustomizedLayer(tf.keras.layers.Layer):
    def call(self, inputs, training=None):
        # ... some code ...
        tf.summary.image(name="some_weight_map", data=some_weight_map)
        # ... some code ...
But in TensorBoard, no matter how many steps have passed, only one image is shown, at step 0.
I then tried setting the step parameter of tf.summary.image() to the value obtained from tf.summary.experimental.get_step():
tf.summary.image(name="weight_map", data=weight_map, step=tf.summary.experimental.get_step())
I also update the step by calling tf.summary.experimental.set_step() from a custom Callback that uses a tf.Variable, as shown below:
class SummaryCallback(tf.keras.callbacks.Callback):
    def __init__(self, step_per_epoch):
        super().__init__()
        self.global_step = tf.Variable(initial_value=0, trainable=False, name="global_step")
        self.global_epoch = 0
        self.step_per_epoch = step_per_epoch
        tf.summary.experimental.set_step(self.global_step)

    def on_batch_end(self, batch, logs=None):
        self.global_step = batch + self.step_per_epoch * self.global_epoch
        tf.summary.experimental.set_step(self.global_step)
        # Whether or not the line above is commented out, calling
        # tf.summary.experimental.get_step() in computation graph code always returns 0.
        # tf.print(self.global_step)

    def on_epoch_end(self, epoch, logs=None):
        self.global_epoch += 1
This Callback's instance is passed in the argument callbacks in model.fit() function.
But the value tf.summary.experimental.get_step() returned is still 0.
The TensorFlow document of "tf.summary.experimental.set_step()" says:
when using this with @tf.functions, the step value will be captured at the time the function is traced, so changes to the step outside the function will not be reflected inside the function unless using a tf.Variable step.
According to the documentation, I am already using a Variable to store the step, but its changes are still not reflected inside the function (or keras.Model).
Note: My code produced the expected results in TensorFlow 1.x with just a single tf.summary.image() line, before I migrated it to TensorFlow 2.
So I want to know whether my approach is wrong in TensorFlow 2.
In TF2, how can I get the training step inside the computation graph?
Or is there another way to summarize tensors (as scalars, images, etc.) inside a model in TensorFlow 2?

I found that this issue has been reported on the TensorFlow GitHub repository: https://github.com/tensorflow/tensorflow/issues/43568
It is caused by using tf.summary inside the model while the tf.keras.callbacks.TensorBoard callback is also enabled; the step is then always zero. The issue reporter gives a temporary workaround.
To fix it, subclass tf.keras.callbacks.TensorBoard and override the on_train_begin and on_test_begin methods like this:
class TensorBoardFix(tf.keras.callbacks.TensorBoard):
    """
    This fixes incorrect step values when using the TensorBoard callback with custom summary ops
    """
    def on_train_begin(self, *args, **kwargs):
        super(TensorBoardFix, self).on_train_begin(*args, **kwargs)
        tf.summary.experimental.set_step(self._train_step)

    def on_test_begin(self, *args, **kwargs):
        super(TensorBoardFix, self).on_test_begin(*args, **kwargs)
        tf.summary.experimental.set_step(self._val_step)
And use this fixed callback class in model.fit():
tensorboard_callback = TensorBoardFix(log_dir=log_dir, histogram_freq=1, write_graph=True, update_freq=1)
model.fit(dataset, epochs=200, callbacks=[tensorboard_callback])
This solved my problem, and now I can get the proper step inside my model by calling tf.summary.experimental.get_step().
(This issue may be fixed in later versions of TensorFlow.)
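The documentation note quoted in the question can also be seen in isolation. This is a minimal sketch, assuming TensorFlow 2.x; the function and variable names are made up for illustration. A plain Python integer is baked into the graph when the function is traced, while a tf.Variable updated with assign() is read each time the function runs:

```python
import tensorflow as tf

py_step = 0
var_step = tf.Variable(0, dtype=tf.int64, trainable=False)

@tf.function
def read_python_step():
    # py_step is converted to a constant when the function is traced.
    return tf.constant(py_step, dtype=tf.int64)

@tf.function
def read_variable_step():
    # The variable is read every time the function executes.
    return var_step.read_value()

first_py = int(read_python_step())
first_var = int(read_variable_step())

py_step = 5          # rebinding the Python name does not retrace the function
var_step.assign(5)   # in-place update, visible inside the traced function

second_py = int(read_python_step())
second_var = int(read_variable_step())
print(first_py, second_py, first_var, second_var)
```

It may also be worth checking the callback in the question: the line self.global_step = batch + ... rebinds the attribute to a plain Python integer, whereas self.global_step.assign(...) would keep it a tf.Variable.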

What does model.compile() do in keras tensorflow?

According to keras.io:
Once the model is created, you can config the model with losses and
metrics with model.compile().
But this explanation does not say enough about what exactly compiling a model does.

Configures the model for training. documentation
Personally, I wouldn't call it compile, because what it does has nothing to do with compilation in the computer science sense, and it is confusing to have to think about machine learning and compilation at the same time.
It's just a method that does configuration:
It just stores the arguments you pass it: optimizer, loss function, metrics, eager execution. You can run it multiple times; each call simply overwrites the settings from the previous one.
My suggestion to the developers of TensorFlow would be to rename it to configure in the short term, and perhaps in the future (not that important) move to having one setter (or use the factory/builder pattern) for each configuration argument.
Here's the code for it:
base_layer.keras_api_gauge.get_cell('compile').set(True)
with self.distribute_strategy.scope():
    if 'experimental_steps_per_execution' in kwargs:
        logging.warn('The argument `steps_per_execution` is no longer '
                     'experimental. Pass `steps_per_execution` instead of '
                     '`experimental_steps_per_execution`.')
        if not steps_per_execution:
            steps_per_execution = kwargs.pop('experimental_steps_per_execution')
    self._validate_compile(optimizer, metrics, **kwargs)
    self._run_eagerly = run_eagerly
    self.optimizer = self._get_optimizer(optimizer)
    self.compiled_loss = compile_utils.LossesContainer(
        loss, loss_weights, output_names=self.output_names)
    self.compiled_metrics = compile_utils.MetricsContainer(
        metrics, weighted_metrics, output_names=self.output_names)
    self._configure_steps_per_execution(steps_per_execution or 1)
    # Initializes attrs that are reset each time `compile` is called.
    self._reset_compile_cache()
    self._is_compiled = True
    self.loss = loss or {}  # Backwards compat.
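A small sketch of the "it just overwrites the settings" point, assuming TensorFlow 2.x with tf.keras; the tiny model is made up for illustration. Calling compile() a second time swaps the optimizer and loss and leaves the weights untouched:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(1),
])

model.compile(optimizer="sgd", loss="mse")
first_optimizer = type(model.optimizer).__name__
weights_before = [w.copy() for w in model.get_weights()]

model.compile(optimizer="adam", loss="mae")  # overwrites the previous settings
second_optimizer = type(model.optimizer).__name__
weights_after = model.get_weights()

weights_unchanged = all(
    np.array_equal(a, b) for a, b in zip(weights_before, weights_after)
)
print(first_optimizer, second_optimizer, weights_unchanged)
```

Nothing is trained here. The weights only change once fit() (or a custom training loop) runs with the configured optimizer.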

model.compile configures how your model will be trained. The weights need to be optimized so that accuracy improves; compile itself does not update them, it records which optimizer fit() should use to do so. The optimizer is just one of its arguments:
model.compile(
    optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['acc']
)
These are the main arguments. You can find more details in the TensorFlow documentation at the link below:
https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile

The result is different when I apply torch.manual_seed before loading cuda() after loading the model

I want my code to be reproducible (always produce the same results),
so I applied the settings below before the rest of my code.
os.environ['PYTHONHASHSEED'] = str(args.seed)
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)
torch.cuda.manual_seed(args.seed)
torch.cuda.manual_seed_all(args.seed) # if you are using multi-GPU.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
With these settings, I always got the same results with the same environment and GPU.
However, the results changed when I called torch.manual_seed() again after creating the model:
# Case 1: seed once, before creating the model
torch.manual_seed(args.seed)
model = Net()
model.cuda()
# Case 2: seed again after creating the model
torch.manual_seed(args.seed)
model = Net()
torch.manual_seed(args.seed)
model.cuda()
The two cases gave different results.
How should I understand this situation?
Is the seed re-initialized after the model is loaded?

Calling .cuda() on the model has no effect on the random number generator. Under the hood it just calls cuda() on each of the model's parameters, so it is basically multiple calls to Tensor.cuda().
https://github.com/pytorch/pytorch/blob/ecd3c252b4da3056797f8a505c9ebe8d68db55c4/torch/nn/modules/module.py#L293
We can test this by doing the following:
torch.random.manual_seed(42)
x = torch.rand(1)
x.cuda()
y = torch.rand(1)
y.cuda()
print(x, y)
# the above prints the same as below
torch.random.manual_seed(42)
print(torch.rand(1), torch.rand(1))
So that means Net() is using the random number generator to initialize the weights within its layers.
torch.manual_seed(args.seed)
model = Net()
print(torch.rand(1))
# the below will print a different result
torch.manual_seed(args.seed)
model = Net()
torch.manual_seed(args.seed)
print(torch.rand(1))
I would recommend narrowing the scope of how random numbers are managed within your Python source code, so that a global block of code outside the model isn't responsible for how internal values are generated.
Simply put, pass the seed as a parameter to the model's __init__.
model = Net(args.seed)
print(torch.rand(1))
This will force developers to always provide a seed for consistency when using the model, and you can make the parameter optional if seeding isn't always necessary.
I'd avoid using the same seed all the time, because you're going to learn to use parameters that work best with that seed.
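To see the effect in isolation, here is a minimal sketch assuming PyTorch on CPU. nn.Linear stands in for Net, and the helper name is made up. Building the layer consumes numbers from the global generator, so the next draw depends on whether the seed is set again afterwards. A separate torch.Generator is one way to keep such draws independent of model construction:

```python
import torch
from torch import nn

SEED = 0

def draw_after_building_model(reseed):
    torch.manual_seed(SEED)
    nn.Linear(10, 4)  # weight initialisation draws from the global generator
    if reseed:
        torch.manual_seed(SEED)
    return torch.rand(1)

torch.manual_seed(SEED)
fresh_draw = torch.rand(1)

same_when_reseeded = torch.equal(draw_after_building_model(True), fresh_draw)
same_without_reseed = torch.equal(draw_after_building_model(False), fresh_draw)

# A dedicated generator is unaffected by what the global one has produced.
local_a = torch.rand(1, generator=torch.Generator().manual_seed(SEED))
nn.Linear(10, 4)
local_b = torch.rand(1, generator=torch.Generator().manual_seed(SEED))
local_draws_match = torch.equal(local_a, local_b)

print(same_when_reseeded, same_without_reseed, local_draws_match)
```

Re-seeding after construction reproduces the fresh draw, while skipping the re-seed should not, which matches the two cases in the question.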

When should tf.losses.add_loss() be used in TensorFlow?

I cannot find an answer to this question in the TensorFlow documentation. I once read that one should add losses from tf.nn functions but it isn't necessary for functions from tf.losses. Therefore:
When should I use tf.losses.add_loss()?
Example:
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=ground_truth, logits=predictions))
tf.losses.add_loss(loss)  # <-- when is this required?
Thank you.

One would use this method to register a user-defined loss.
Namely, if you have created a tensor that defines your loss, for example my_loss = tf.reduce_mean(output), you can use this method to add it to the loss collection. You might want to do that if you are not tracking all your losses manually, for example if you are using a method like tf.losses.get_total_loss().
The implementation of tf.losses.add_loss is straightforward:
def add_loss(loss, loss_collection=ops.GraphKeys.LOSSES):
    if loss_collection and not context.executing_eagerly():
        ops.add_to_collection(loss_collection, loss)
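A minimal sketch of that registration, assuming TensorFlow 2.x, where the TF1 API lives under tf.compat.v1 (in TF 1.x the same calls are tf.losses.add_loss and tf.losses.get_total_loss). The constant tensor is a made-up stand-in for a model output. It runs in graph mode, because the collection is skipped when executing eagerly:

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    output = tf.constant([1.0, 3.0])  # stand-in for a model output
    my_loss = tf.reduce_mean(output)

    # Register the hand-built loss in the LOSSES collection.
    tf.compat.v1.losses.add_loss(my_loss)

    registered = tf.compat.v1.losses.get_losses()
    total = tf.compat.v1.losses.get_total_loss(add_regularization_losses=False)

with tf.compat.v1.Session(graph=graph) as sess:
    total_value = float(sess.run(total))

print(len(registered), total_value)
```

Without the add_loss call, get_losses() would return an empty list here, since my_loss was built by hand and nothing else placed it in the collection.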

How to apply Optimizer on Variable in Chainer?

Here is an example in PyTorch:
optimizer = optim.Adam([modifier_var], lr=0.0005)
And here in TensorFlow:
self.train = self.optimizer.minimize(self.loss, var_list=[self.modifier])
But Chainer's optimizers can only be used on a 'Link'. How can I apply an Optimizer to a Variable in Chainer?

In short, there is no way to directly assign a chainer.Variable (or even a chainer.Parameter) to a chainer.Optimizer.
The rest of this answer is a longer explanation.
First, let me define Variable and Parameter to avoid confusion.
Variable is (1) torch.Tensor in PyTorch v0.4, (2) torch.autograd.Variable in PyTorch v0.3, and (3) chainer.Variable in Chainer v4.
A Variable is an object that holds two tensors, .data and .grad. That is the necessary and sufficient condition, so a Variable is not necessarily a learnable parameter, which is what an optimizer targets.
In both libraries there is another class, Parameter, which is similar to Variable but not the same. Parameter is torch.nn.Parameter in PyTorch and chainer.Parameter in Chainer.
A Parameter must be a learnable parameter and should be optimized.
Therefore, there should be no need to register a Variable (as opposed to a Parameter) with an Optimizer (PyTorch does allow registering a Variable with an Optimizer, but only for backward compatibility).
Second, in PyTorch torch.optim.Optimizer directly optimizes Parameters, but in Chainer chainer.Optimizer DOES NOT optimize Parameters; chainer.UpdateRule does. The Optimizer just registers UpdateRules to the Parameters in a Link.
Therefore, it is only natural that chainer.Optimizer does not take Parameters as arguments, because it is just a "delivery man" for UpdateRules.
If you want to attach a different UpdateRule to each Parameter, you should directly create an instance of an UpdateRule subclass and attach it to the Parameter.

Below is an example of learning a regression task with the MyChain MLP model and the Adam optimizer in Chainer.
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import Chain

# Placeholder training data (an assumption; substitute your own float32 arrays)
x = np.random.rand(100, 1).astype(np.float32)
y = np.sin(x)

# Prepare your model (neural network) as `Link` or `Chain`
class MyChain(Chain):
    def __init__(self):
        super(MyChain, self).__init__(
            l1=L.Linear(None, 30),
            l2=L.Linear(None, 30),
            l3=L.Linear(None, 1)
        )

    def __call__(self, x):
        h = self.l1(x)
        h = self.l2(F.sigmoid(h))
        return self.l3(F.sigmoid(h))

model = MyChain()
# Then you can instantiate optimizer
optimizer = chainer.optimizers.Adam()
# Register model to optimizer (to indicate which parameter to update)
optimizer.setup(model)

# Calculate loss, and update parameter as follows.
def lossfun(x, y):
    loss = F.mean_squared_error(model(x), y)
    return loss

# this iteration is "training", to fit the model into desired function.
for i in range(300):
    optimizer.update(lossfun, x, y)
So in summary, you need to set up the model; after that you can use the update function to calculate the loss and update the model's parameters.
The above code comes from here
There is also another way to write training code, using the Trainer module. For more detailed Chainer tutorials, please refer to the following:
chainer-handson
deep-learning-tutorial-with-chainer
