I would like to use TensorFlow's eager execution functionality to optimize the components of a vector. In all the documented examples, each trainable variable is just a scalar, with collections represented by lists of these. However, the loss function I have in mind involves performing vector manipulations on those components, and so this is inconvenient.
For example, let us use the Adam optimizer to normalize a 3-component vector:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np
tf.enable_eager_execution()
def normalize(din=[2.0, 1.0, 0.0], lr=0.001,
              nsteps=100):
    d = tfe.Variable(din)
    def loss(dvec):
        return tf.sqrt((1.0 - tf.tensordot(dvec, dvec, 1))**2)
    def grad(dvec):
        with tf.GradientTape() as tape:
            loss_val = loss(dvec)
        return tape.gradient(loss_val, dvec)
    optimizer = tf.train.AdamOptimizer(learning_rate=lr)
    for i in range(nsteps):
        grads = grad(d)
        optimizer.apply_gradients(zip(grads, d))  # Throws error
    return d
This code correctly computes the required gradients. However, the optimizer.apply_gradients line throws an error no matter what I try, essentially because a tfe.Variable is not iterable.
In this specific example the error is "AttributeError: Tensor.name is meaningless when eager execution is enabled". We could also try, for example,
zip(grads, [d[i] for i in range(3)])
instead of d, but then the interpreter complains that d is not iterable.
What is the correct way to pair grads with d?
Optimizer.apply_gradients requires its first argument to be a list of (gradient, variable) pairs.
In the code above, neither grads nor d is a list (try print(type(grads)) for example), so the error is from the call to zip. I think what you want instead is:
optimizer.apply_gradients(zip([grads], [d]))
Or, more simply:
optimizer.apply_gradients([(grads, d)])
Also, FYI, as eager execution stabilizes, more things are moving out of the experimental "contrib" namespace, so you don't need the tfe module for your example (tf.Variable will work just fine) if you're using a recent version of TensorFlow (1.11, 1.12, etc.). That makes your whole program look like:
import tensorflow as tf
import numpy as np
tf.enable_eager_execution()
def normalize(din=[2.0, 1.0, 0.0], lr=0.001,
              nsteps=100):
    d = tf.Variable(din)
    def loss(dvec):
        return tf.sqrt((1.0 - tf.tensordot(dvec, dvec, 1))**2)
    def grad(dvec):
        with tf.GradientTape() as tape:
            loss_val = loss(dvec)
        return tape.gradient(loss_val, dvec)
    optimizer = tf.train.AdamOptimizer(learning_rate=lr)
    for i in range(nsteps):
        dd = grad(d)
        optimizer.apply_gradients([(dd, d)])
    return d
Hope that helps!
I am defining my own loss function, which has a hyper-parameter Lambda. For example, if the prediction is y, I define the loss as Loss = Lambda * y. I want to update Lambda at some iteration using the current round's Loss. For example, at a specific iteration I want Lambda to be updated as Lambda = Lambda + Loss, but then I get the following error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Specifically, my naive code is as follows:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
batch_size, input, output = 10, 3, 3
model = nn.Linear(input, output)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
lam = torch.from_numpy(np.array([0.1, 0.1, 0.1]))
lam.requires_grad = False
for i in range(10):
    x = torch.rand(batch_size, input)
    output = model(x)
    loss = torch.sum(lam * output)
    if i == 5:
        lam = lam + torch.clone(loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(loss)
I had the feeling that the error was caused by using Loss to update my Lambda, so I used torch.clone(loss), hoping not to affect loss, but it didn't help. Does anyone know how to fix the problem? An explanation of why this error occurs would also be great!
First, take a look at this link; it explains why the error occurs in your code:
To reduce memory usage, during the .backward() call, all the intermediary results are deleted when they are not needed anymore. Hence if you try to call .backward() again, the intermediary results don’t exist and the backward pass cannot be performed (and you get the error you see).
So if you want to fix the problem, you can either add a parameter:
loss.backward(retain_graph=True)
or
lam = lam + torch.Tensor([loss.item()])
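For context, here is a rough sketch of the training loop from the question with the second fix applied; loss.item() extracts a plain Python number, so lam no longer holds a reference to the computation graph that backward() has already freed:
for i in range(10):
    x = torch.rand(batch_size, input)
    output = model(x)
    loss = torch.sum(lam * output)
    if i == 5:
        # loss.item() is a plain number, detached from the autograd graph
        lam = lam + torch.Tensor([loss.item()])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()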
Hopefully it helps! ;)
I'm using TensorFlow 1.14.0 with Python 3.6.8. I'm trying to use tensorflow_probability's lbfgs optimizer implementation (documentation/example).
If I run the example code provided in the documentation it works fine. I tried to follow the same procedure for my own code which uses the tf.GradientTape() approach for computing the objective function. When doing it that way, the gradients come back as None type.
I'm not seeing why one is working, but the other is not.
Edit: I realized that running eager execution using the gradients wouldn't work, so I adjusted the example to be able to be run with eager execution.
Non-working example (using GradientTape) with eager execution
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tf.enable_eager_execution()
# A high-dimensional quadratic bowl.
ndims = 3
minimum = np.ones([ndims], dtype='float64')
scales = np.arange(ndims, dtype='float64') + 1.0
# The objective function and the gradient.
def quadratic(x):
    with tf.GradientTape() as g:
        value = tf.reduce_sum(scales * (x - minimum) ** 2)
    grads = g.gradient(value, x)
    print('Gradients: ')
    print(grads)
    return value, grads
start = np.arange(ndims, 0, -1, dtype='float64')
optim_results = tfp.optimizer.lbfgs_minimize(quadratic, initial_position=start, num_correction_pairs=10,tolerance=1e-8)
print('results')
print(optim_results)
# Check that the search converged
assert(optim_results.converged)
# Check that the argmin is close to the actual value.
np.testing.assert_allclose(optim_results.position, minimum)
You need to watch x. For gradients to be recorded for the operations inside this context manager, at least one of their inputs has to be watched, and a plain tensor like x is not watched automatically:
with tf.GradientTape() as g:
    g.watch(x)
    value = tf.reduce_sum(scales * (x - minimum) ** 2)
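Putting it together, a minimal sketch of the corrected objective (same names as in the example above):
def quadratic(x):
    with tf.GradientTape() as g:
        # x arrives as a plain tensor, not a variable, so it must be watched explicitly
        g.watch(x)
        value = tf.reduce_sum(scales * (x - minimum) ** 2)
    grads = g.gradient(value, x)
    return value, grads
(Recent versions of tensorflow_probability also ship tfp.math.value_and_gradient, which wraps this value-and-gradient pattern, though I have not checked it against 1.14.0 here.)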
I have written a rather complex loss function for a Keras model and it keeps returning nan while training. Therefore, I need to print the intermediate tensors while training. I understand that you cannot do K.eval in your loss function because the tensors are not initialized. However, I have tried both K.print_tensor() and tf.Print() and neither works.
Pretty much I want to do something like this:
def mean_squared_error(y_true, y_pred):
    print("mean_squared_error")
    loss = K.mean(K.square(y_pred - y_true), axis=-1)
    loss = tf.Print(loss, [loss])
    return loss

model.compile(optimizer=self.optimizer, loss=mean_squared_error)
In practice, I would replace mean_squared_error with my custom loss. "mean_squared_error" would get printed, but not the values I try to print using TensorFlow print (nor Keras print). I also tried the exact same code as in How do I print inside the loss function during training in Keras? and I still don't see anything getting printed in the console.
In addition, I have written a separate file to test something.
import tensorflow as tf
import keras.backend as K
input1 = K.constant(1)
input2 = K.constant(2)
input3 = K.constant(3)
node1 = tf.add(input1, input2)
print_output = K.print_tensor(node1)
output = tf.multiply(print_output, input3)
Nothing gets printed either.
Am I using TensorFlow's Print and Keras print_tensor wrongly? Or are the results printed elsewhere? I have tried to test for my console's stderr using print("test", file=sys.stderr) and got the correct output test.
For clarification, I know that you can use K.eval to make the test code print out values of the tensor, but since I cannot use K.eval in my loss function, I need to make tf.Print or K.print_tensor work.
The issue here is that the training code often does not actually depend on the value of the loss tensor! Usually you can compute the gradient of a loss without ever computing the actual value of the loss, and this means tensorflow's runtime is free to prune the actual execution of the loss from the graph.
You can wrap your loss function in a tf.contrib.eager.defun decorator, which has the side effect of guaranteeing that all stateful ops in your function run even if they are not needed by the backward pass.
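For reference, a rough sketch of what that might look like with the loss from the question (assuming TF 1.x with eager execution enabled and the contrib API still available):
import tensorflow as tf
import keras.backend as K

@tf.contrib.eager.defun
def mean_squared_error(y_true, y_pred):
    loss = K.mean(K.square(y_pred - y_true), axis=-1)
    # the print op is a side effect, which defun keeps alive
    loss = tf.Print(loss, [loss], message='mse: ')
    return loss

# model.compile(optimizer=..., loss=mean_squared_error) as before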
You will have to use tf.InteractiveSession if you want to run ops and print results without passing a session -- see details here
So your test code will print the value of node1 if changed as follows:
import tensorflow as tf
import keras.backend as K
input1 = K.constant(1)
input2 = K.constant(2)
input3 = K.constant(3)
node1 = tf.add(input1, input2)
print_output = K.print_tensor(node1)
output = tf.multiply(print_output, input3)
sess = tf.InteractiveSession()
print("node1: ", node1.eval())
sess.close()
Your code builds a graph:
import tensorflow as tf
import keras.backend as K
input1 = K.constant(1)
input2 = K.constant(2)
input3 = K.constant(3)
node1 = tf.add(input1, input2)
print_output = K.print_tensor(node1)
output = tf.multiply(print_output, input3)
In order to run the graph, you need to define a Session environment in which Operation objects are executed, and Tensor objects are evaluated:
sess = tf.Session()
To evaluate the tensor output:
sess.run(output)
Finally, release the resources:
sess.close()
Your code just defines the graph. There is no Session and no evaluation operation.
In TensorFlow 2, similarly to the solution suggested in this answer, you can decorate your loss function with @tf.function.
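A minimal TF 2.x sketch of that approach (assuming TF 2 and the Keras API bundled with it):
import tensorflow as tf

@tf.function
def mean_squared_error(y_true, y_pred):
    loss = tf.reduce_mean(tf.square(y_pred - y_true), axis=-1)
    tf.print("mse:", loss)  # executed during training, even in graph mode
    return loss

# model.compile(optimizer='adam', loss=mean_squared_error) as before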
I am trying to implement the Levenberg-Marquardt algorithm as a Keras optimizer, as described here, but I have several problems, the biggest one being this error:
TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn.
After a quick search I found out that this is connected to how TensorFlow runs programs with graphs, which I don't understand in detail. I found this SO answer useful, but it is about a loss function, not an optimizer.
So to the point.
My attempt looks like this:
from keras.optimizers import Optimizer
from keras.legacy import interfaces
from keras import backend as K
class Leveberg_Marquardt(Optimizer):
    def __init__(self, tau=1e-2, lambda_1=1e-5, lambda_2=1e+2, **kwargs):
        super(Leveberg_Marquardt, self).__init__(**kwargs)
        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.tau = K.variable(tau, name='tau')
            self.lambda_1 = K.variable(lambda_1, name='lambda_1')
            self.lambda_2 = K.variable(lambda_2, name='lambda_2')

    @interfaces.legacy_get_updates_support
    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]
        error = [K.int_shape(m) for m in loss]
        for p, g, err in zip(params, grads, error):
            H = K.dot(g, K.transpose(g)) + self.tau * K.eye(K.max(g))
            # ended at step 3 from http://mads.lanl.gov/presentations/Leif_LM_presentation_m.pdf
            w = p - K.pow(H, -1) * K.dot(K.transpose(g), err)
            if self.tau > self.lambda_2:
                w = w - 1 / self.tau * err
            if self.tau < self.lambda_1:
                w = w - K.pow(H, -1) * err
            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                w = p.constraint(w)
            self.updates.append(K.update_add(err, w))
        return self.updates

    def get_config(self):
        config = {'tau': float(K.get_value(self.tau)),
                  'lambda_1': float(K.get_value(self.lambda_1)),
                  'lambda_2': float(K.get_value(self.lambda_2))}
        base_config = super(Leveberg_Marquardt, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
Q1 Can I fix this error without going deep into TensorFlow (I wish I could do this by staying at the Keras level)?
Q2 Do I use the Keras backend in the correct way?
I mean, in this line
H = K.dot(g, K.transpose(g)) + self.tau * K.eye(K.max(g))
should I use Keras backend functions, or numpy, or pure Python, so that this code runs without problems given that the input data are numpy arrays?
Q3 This question is more about the algorithm itself.
Am I even implementing LMA correctly? I must say, I am not sure how to deal with the boundary conditions; the tau/lambda values I have guessed, maybe you know a better way?
I was trying to understand how the other optimizers in Keras work, but even the SGD code looks ambiguous to me.
Q4 Do I need to change the local file optimizers.py in any way?
In order to run it properly I was initializing my optimizer with:
myOpt = Leveberg_Marquardt()
and then simply pass it to the compile method. Yet after a quick look at the source code of optimizers.py I found that there are places in the code with explicitly written optimizer names (e.g. the deserialize function). Is it important to extend this for my custom optimizer, or can I leave it be?
I would really appreciate any help and direction of future actions.
Q1 Can I fix this error without going deep into TensorFlow (I wish I could do this by staying at the Keras level)?
A1 I believe even if this error is fixed there are still problems in the implementation of the algorithm that Keras does not support; for example, the error term f(x; w_0) - y from the document is not available to a Keras optimizer.
Q2 Do I use the Keras backend in the correct way?
A2 Yes, you must use the Keras backend for this calculation because g is a tensor object and not a numpy array. However, I believe the correct calculation for H should be H = K.dot(K.transpose(g), g) to take the Nx1 vector g and perform an outer product to produce an NxN matrix.
Q3 This question is more about the algorithm itself.
A3 As stated in A1, I am not sure that Keras supports the required inputs for this algorithm.
Q4 Do I need to change the local file optimizers.py in any way?
A4 The provided line of code would run the optimizer if supplied as the optimizer argument to the model compile function of Keras. The Keras library supports calling the built-in classes and functions by name for convenience.
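For illustration, a rough sketch of what A4 describes (the model here is just a stand-in, not from the question; only the compile call matters):
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(1, input_shape=(3,))])  # stand-in model
myOpt = Leveberg_Marquardt()                      # the custom optimizer defined above
model.compile(optimizer=myOpt, loss='mse')        # passing the instance needs no change to optimizers.py
Passing the optimizer by string name instead would require the kind of explicit registration in deserialize that the question mentions.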
I've been trying to gather the gradients at each step of the GradientDescentOptimizer within TensorFlow, however I keep running into a TypeError when I try to pass the result of compute_gradients() to sess.run(). The code I'm trying to run is:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# load the MNIST data as in the tensorflow.org tutorial
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
# note that up to this point, this example is identical to the tutorial on tensorflow.org
gradstep = tf.train.GradientDescentOptimizer(0.01).compute_gradients(cross_entropy)

sess = tf.Session()
sess.run(tf.initialize_all_variables())
batch_x, batch_y = mnist.train.next_batch(100)
print sess.run(gradstep, feed_dict={x: batch_x, y_: batch_y})
Note that if I replace the last line with print sess.run(train_step, feed_dict={x: batch_x, y_: batch_y}), where train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy), the error is not raised. My confusion arises from the fact that minimize calls compute_gradients with exactly the same arguments as its first step. Can someone explain why this behavior occurs?
The Optimizer.compute_gradients() method returns a list of (Tensor, Variable) pairs, where each tensor is the gradient with respect to the corresponding variable.
Session.run() expects a list of Tensor objects (or objects convertible to a Tensor) as its first argument. It does not understand how to handle a list of pairs, hence the TypeError when you try to run sess.run(gradstep, ...).
The correct solution depends on what you are trying to do. If you want to fetch all of the gradient values, you can do the following:
grad_vals = sess.run([grad for grad, _ in gradstep], feed_dict={x: batch_x, y_: batch_y})

# Then, e.g., build a variable name-to-gradient dictionary.
var_to_grad = {}
for grad_val, (_, var) in zip(grad_vals, gradstep):
    var_to_grad[var.name] = grad_val
If you also want to fetch the variables, you can execute the following statement separately:
sess.run([var for _, var in gradstep])
...though note that—without further modification to your program—this will just return the initial values for each variable.
You will have to run the optimizer's training step (or otherwise call Optimizer.apply_gradients()) to update the variables.
minimize calls compute_gradients followed by apply_gradients: it's possible you're missing the second step.
compute_gradients just returns the grads / variables, but doesn't apply the update rule to them.
Here is an example: https://github.com/tensorflow/tensorflow/blob/f2bd0fc399606d14b55f3f7d732d013f32b33dd5/tensorflow/python/training/optimizer.py#L69
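For completeness, a rough sketch of that two-step split using the names from the question (the gradients are fetched and the variables are updated in the same run call):
opt = tf.train.GradientDescentOptimizer(0.01)
grads_and_vars = opt.compute_gradients(cross_entropy)  # list of (gradient, variable) pairs
train_step = opt.apply_gradients(grads_and_vars)       # op that actually applies the update rule

grad_vals, _ = sess.run([[g for g, _ in grads_and_vars], train_step],
                        feed_dict={x: batch_x, y_: batch_y})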