Cannot gather gradients for GradientDescentOptimizer in TensorFlow - python

I've been trying to gather the gradient steps for each step of the GradientDescentOptimizer within TensorFlow; however, I keep running into a TypeError when I try to pass the result of compute_gradients() to sess.run(). The code I'm trying to run is:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32,[None,784])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x,W)+b)
y_ = tf.placeholder(tf.float32,[None,10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
# note that up to this point, this example is identical to the tutorial on tensorflow.org
gradstep = tf.train.GradientDescentOptimizer(0.01).compute_gradients(cross_entropy)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
batch_x,batch_y = mnist.train.next_batch(100)
print sess.run(gradstep, feed_dict={x:batch_x,y_:batch_y})
Note that if I replace the last line with print sess.run(train_step,feed_dict={x:batch_x,y_:batch_y}), where train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy), the error is not raised. My confusion arises from the fact that minimize calls compute_gradients with exactly the same arguments as its first step. Can someone explain why this behavior occurs?

The Optimizer.compute_gradients() method returns a list of (Tensor, Variable) pairs, where each tensor is the gradient with respect to the corresponding variable.
Session.run() expects a list of Tensor objects (or objects convertible to a Tensor) as its first argument. It does not understand how to handle a list of pairs, and hence you get a TypeError when you try to run sess.run(gradstep, ...).
The correct solution depends on what you are trying to do. If you want to fetch all of the gradient values, you can do the following:
grad_vals = sess.run([grad for grad, _ in gradstep], feed_dict={x: batch_x, y_: batch_y})
# Then, e.g., build a variable name-to-gradient dictionary.
var_to_grad = {}
for grad_val, (_, var) in zip(grad_vals, gradstep):
    var_to_grad[var.name] = grad_val
If you also want to fetch the variables, you can execute the following statement separately:
sess.run([var for _, var in gradstep])
...though note that—without further modification to your program—this will just return the initial values for each variable.
You will have to run the optimizer's training step (or otherwise call Optimizer.apply_gradients()) to update the variables.
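To try the fix without downloading MNIST, here is a minimal self-contained sketch. It assumes the TF1-style API through tensorflow.compat.v1 (so it also runs on TF2 with eager execution disabled) and replaces MNIST with a hypothetical batch of 2 examples, 4 features and 3 classes; the model and loss follow the question.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # TF1-style graph mode, as in the question
tf.reset_default_graph()

# Small stand-in for the MNIST model (sizes are hypothetical)
x = tf.placeholder(tf.float32, [None, 4])
y_ = tf.placeholder(tf.float32, [None, 3])
W = tf.Variable(tf.zeros([4, 3]), name="W")
b = tf.Variable(tf.zeros([3]), name="b")
y = tf.nn.softmax(tf.matmul(x, W) + b)
cross_entropy = -tf.reduce_sum(y_ * tf.math.log(y))

# A list of (gradient, variable) pairs
gradstep = tf.train.GradientDescentOptimizer(0.01).compute_gradients(cross_entropy)

batch_x = np.ones([2, 4], dtype=np.float32)
batch_y = np.array([[1, 0, 0], [0, 1, 0]], dtype=np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Fetch only the gradient tensors, then pair them back up with the variables
    grad_vals = sess.run([grad for grad, _ in gradstep],
                         feed_dict={x: batch_x, y_: batch_y})

var_to_grad = {var.name: g for g, (_, var) in zip(grad_vals, gradstep)}
print(var_to_grad["b:0"])
```

With all-zero weights the softmax is uniform, so each example contributes softmax minus label, and the gradient for b sums those over the batch.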

minimize calls compute_gradients followed by apply_gradients: it's possible you're missing the second step.
compute_gradients just returns the grads / variables, but doesn't apply the update rule to them.
Here is an example: https://github.com/tensorflow/tensorflow/blob/f2bd0fc399606d14b55f3f7d732d013f32b33dd5/tensorflow/python/training/optimizer.py#L69
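A minimal sketch of the two steps on a toy quadratic loss (the variable w and the loss are hypothetical; tensorflow.compat.v1 is assumed). Fetching the gradient leaves w unchanged; only running the apply_gradients op updates it, which is the step that minimize adds on top of compute_gradients.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
tf.reset_default_graph()

w = tf.Variable([1.0, -2.0], name="w")
loss = tf.reduce_sum(tf.square(w))  # d(loss)/dw = 2 * w

opt = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = opt.compute_gradients(loss)  # step 1: gradients only
train_step = opt.apply_gradients(grads_and_vars)  # step 2: w <- w - 0.1 * grad

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    grad_val = sess.run(grads_and_vars[0][0])  # computing the gradient...
    w_unchanged = sess.run(w)  # ...does not touch w
    sess.run(train_step)  # applying it does
    w_updated = sess.run(w)

print(grad_val, w_unchanged, w_updated)
```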

Related

Pytorch update hyper-parameter using current loss, RuntimeError: Trying to backward through the graph a second time

I am defining my own loss function, and my own loss function has a hyper-parameter Lambda. For example, if the prediction is y, then I define the loss function as Loss = Lambda * y. I want to update my Lambda at some iteration using the current round's Loss. For example, at some specific iteration, I want my Lambda to be updated as Lambda = Lambda + Loss, but this raises the error
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Specifically, my naive code is as follows:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
batch_size, input, output = 10, 3, 3
model = nn.Linear(input, output)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
lam = torch.from_numpy(np.array([0.1, 0.1, 0.1]))
lam.requires_grad = False
for i in range(10):
    x = torch.rand(batch_size, input)
    output = model(x)
    loss = torch.sum(lam*output)
    if i == 5:
        lam = lam + torch.clone(loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(loss)
I had the feeling that the error was caused by using Loss to update my Lambda. So I used torch.clone(loss), hoping not to affect loss, but it didn't help. Does anyone know how to fix the problem? Some explanation about why this error occurs would be great!
First, take a look at this link.
It explains why the error occurs in your code.
To reduce memory usage, during the .backward() call, all the intermediary results are deleted when they are not needed anymore. Hence if you try to call .backward() again, the intermediary results don’t exist and the backward pass cannot be performed (and you get the error you see).
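A minimal sketch of that mechanism, independent of the question's model (the tensor w is hypothetical). In the question, lam = lam + torch.clone(loss) at i == 5 makes every later loss depend on iteration 5's graph, which that iteration's backward() has already freed; clone() copies the value but keeps the autograd history, so it does not help.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)
loss = (w * w).sum()  # the multiplication saves w for the backward pass
loss.backward()  # first pass: the saved tensors are freed afterwards
try:
    loss.backward()  # second pass through the same, already freed graph
    second_backward_failed = False
except RuntimeError:
    second_backward_failed = True

# With retain_graph=True the saved tensors are kept, so a second pass works
w.grad = None
loss2 = (w * w).sum()
loss2.backward(retain_graph=True)
loss2.backward()  # gradients accumulate: 2 * w twice
print(second_backward_failed, w.grad)
```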
So if you want to fix the problem, you can easily solve it by adding a parameter:
loss.backward(retain_graph=True)
or
lam = lam + torch.Tensor([loss.item()])
Hopefully it helps! ;)
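An alternative to the loss.item() option, not taken from the answer above, is detach(), which returns the same value without any autograd history. A sketch of the question's loop with that change, assuming the goal is for lam to be a plain constant that never receives gradients (float32 here instead of the question's float64 NumPy array):

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
batch_size, n_in, n_out = 10, 3, 3
model = nn.Linear(n_in, n_out)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
lam = torch.full((n_out,), 0.1)  # no autograd history

for i in range(10):
    x = torch.rand(batch_size, n_in)
    output = model(x)
    loss = torch.sum(lam * output)
    if i == 5:
        # detach() keeps the value but drops the link to this iteration's graph
        lam = lam + loss.detach()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(lam)
```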

Assigning to a TensorFlow variable during a recursive loop

In Tensorflow 1.9, I want to create a network and then recursively feed the output (the prediction) of the network back into the input of the network. During this loop, I want to store the predictions made by the network in a list.
Here is my attempt:
import numpy as np
import tensorflow as tf
# Define the number of steps over which to loop the network
num_steps = 5
# Define the network weights
weights_1 = np.random.uniform(0, 1, [1, 10]).astype(np.float32)
weights_2 = np.random.uniform(0, 1, [10, 1]).astype(np.float32)
# Create a variable to store the predictions, one for each loop
predictions = tf.Variable(np.zeros([num_steps, 1]), dtype=np.float32)
# Define the initial prediction to feed into the loop
initial_prediction = np.array([[0.1]], dtype=np.float32)
x = initial_prediction
# Loop through the predictions
for step_num in range(num_steps):
    x = tf.matmul(x, weights_1)
    x = tf.matmul(x, weights_2)
    predictions[step_num-1].assign(x)
# Define the final prediction
final_prediction = x
# Start a session
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Make the predictions
last_pred, all_preds = sess.run([final_prediction, predictions])
print(last_pred)
print(all_preds)
And this prints out:
[[48.8769]]
[[0.]
[0.]
[0.]
[0.]
[0.]]
So whilst the value of final_prediction appears correct, the value of predictions is not what I would expect. It seems that predictions is never actually assigned to, despite the line predictions[step_num-1].assign(x).
Please can somebody explain to me why this isn't working, and what I should be doing instead? Thanks!
This happens because assign is just a TF op like any other, and as such is only executed if needed. Since nothing on the path to final_prediction relies on the assign op, and predictions is just a variable, the assignment is never executed.
I think the most straightforward solution would be to replace the line
predictions[step_num-1].assign(x)
by
x = predictions[step_num-1].assign(x)
This works because assign also returns the value it is assigning. Now, to compute final_prediction TF actually needs to "go through" the assign op so the assignments should be carried out.
Another option would be to use tf.control_dependencies which is a way to "force" TF to compute specific ops when it is computing other ones. However in this case it could be a bit icky because the op we want to force (assign) depends on values that are being computed within the loop and I'm not sure about the order in which TF does stuff in this case. The following should work:
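for step_num in range(num_steps):
    x = tf.matmul(x, weights_1)
    x = tf.matmul(x, weights_2)
    # Index with step_num (step_num-1 is -1 on the first pass) and assign x[0],
    # which has the same [1] shape as the slice predictions[step_num]
    with tf.control_dependencies([predictions[step_num].assign(x[0])]):
        x = tf.identity(x)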
We use tf.identity as a noop just to have something to wrap with control_dependencies. I think this is the more flexible option between the two. However it comes with some caveats discussed in the docs.
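A self-contained sketch of the control_dependencies version, assuming tensorflow.compat.v1 with eager execution disabled and seeded random weights (the seed is arbitrary). It fetches predictions in a separate sess.run after final_prediction: when both are fetched in one call, as in the question, nothing orders the read of the variable relative to the assigns.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # TF1-style graph mode
tf.reset_default_graph()

num_steps = 5
rng = np.random.RandomState(0)  # arbitrary seed, for reproducibility
weights_1 = rng.uniform(0, 1, [1, 10]).astype(np.float32)
weights_2 = rng.uniform(0, 1, [10, 1]).astype(np.float32)
predictions = tf.Variable(np.zeros([num_steps, 1], dtype=np.float32))

x = tf.constant([[0.1]], dtype=tf.float32)
for step_num in range(num_steps):
    x = tf.matmul(tf.matmul(x, weights_1), weights_2)
    # The next x depends on the assign, so the assign must run first
    with tf.control_dependencies([predictions[step_num].assign(x[0])]):
        x = tf.identity(x)
final_prediction = x

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    last_pred = sess.run(final_prediction)  # runs all the assigns
    all_preds = sess.run(predictions)  # read afterwards, in a separate call

print(last_pred)
print(all_preds)
```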

Optimizing a function involving tf.keras's "model.predict()" using TensorFlow optimizers?

I used tf.keras to build a fully-connected ANN, "my_model". Then, I'm trying to minimize a function f(x) = my_model.predict(x) - 0.5 + g(x) using Adam optimizer from TensorFlow. I tried the below code:
x = tf.get_variable('x', initializer = np.array([1.5, 2.6]))
f = my_model.predict(x) - 0.5 + g(x)
optimizer = tf.train.AdamOptimizer(learning_rate=.001).minimize(f)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(50):
        print(sess.run([x,f]))
        sess.run(optimizer)
However, I'm getting the following error when my_model.predict(x) is executed:
If your data is in the form of symbolic tensors, you should specify the steps argument (instead of the batch_size argument)
I understand what the error is but I'm unable to figure out how to make my_model.predict(x) work in the presence of symbolic tensors. If my_model.predict(x) is removed from the function f(x), the code runs without any error.
I checked this link, where TensorFlow optimizers are used to minimize an arbitrary function, but I think my problem is with the usage of the underlying Keras model.predict() function. I appreciate any help. Thanks in advance!
I found the answer!
Basically, I was trying to optimize a function involving a trained ANN w.r.t. the input variables to the ANN. So, all I wanted to know was how to call my_model and put it in f(x). Digging a bit into the Keras documentation here: https://keras.io/getting-started/functional-api-guide/, I found that all Keras models are callable, just like the layers of the models! Quoting the information from the link,
..you can treat any model as if it were a layer, by calling it on a
tensor. Note that by calling a model you aren't just reusing the
architecture of the model, you are also reusing its weights.
Meanwhile, the model.predict(x) part expects x to be numpy arrays or evaluated tensors and does not take tensorflow variables as inputs (https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict).
So the following code worked:
## initializations
sess = tf.InteractiveSession()
x_init_value = np.array([1.5, 2.6])
x_placeholder = tf.placeholder(tf.float32)
x_var = tf.Variable(x_init_value, dtype=tf.float32)
# Check calling my_model
assign_step = tf.assign(x_var, x_placeholder)
sess.run(assign_step, feed_dict={x_placeholder: x_init_value})
model_output = my_model(x_var) # This simple step is all I wanted!
sess.run(model_output) # This outputs my_model's predicted value for input x_init_value
# Now, define the objective function that has to be minimized
f = my_model(x_var) - 0.5 + g(x_var) # g(x_var) is some function of x_var
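# Define the optimizer; var_list=[x_var] restricts the updates to the input, so the trained weights stay fixed
adam = tf.train.AdamOptimizer(learning_rate=.001)
optimizer = adam.minimize(f, var_list=[x_var])
# Adam creates its own (slot) variables, which also need initializing
sess.run(tf.variables_initializer(adam.variables()))
# Run the optimization steps
for i in range(50): # for 50 steps
    _, loss = sess.run([optimizer, f])
    print("step: ", i+1, ", loss: ", loss, ", X: ", x_var.eval())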

TensorFlow: why not use a function instead of a placeholder?

I am starting to use TensorFlow (with Python) and was wondering: when using a placeholder in a function, why not have an argument in my function which would feed a TensorFlow constant rather than the placeholder?
Here is an example (the difference is in x):
def sigmoid(z):
    x = tf.constant(z, dtype=tf.float32, name = "x")
    sigmoid = tf.sigmoid(x)
    with tf.Session() as sess:
        result = sess.run(sigmoid)
    return result
instead of:
def sigmoid(z):
    x = tf.placeholder(tf.float32, name = "...")
    sigmoid = tf.sigmoid(x)
    with tf.Session() as sess:
        result = sess.run(sigmoid, feed_dict={x:z})
    return result
The idea with TensorFlow is that you will repeat the same calculation on lots of data. When you write the code, you are setting up a computational graph that you will later execute on the data. In your first example, you have hard-coded the data into a constant. This is not a typical TensorFlow use case. The second example is better because it allows you to reuse the same computational graph with different data.
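A small sketch of that point, assuming tensorflow.compat.v1 with eager execution disabled: the placeholder graph is built once and reused for inputs of different lengths, so running it adds no operations, whereas building the constant version for each input adds new ops to the graph every time.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
tf.reset_default_graph()
graph = tf.get_default_graph()

# Build the graph once, with a placeholder standing in for the data
x = tf.placeholder(tf.float32, shape=[None], name="x")
sigmoid = tf.sigmoid(x)
ops_after_build = len(graph.get_operations())

with tf.Session() as sess:
    # Same graph, different data: only the feed changes
    r1 = sess.run(sigmoid, feed_dict={x: np.array([0.0])})
    r2 = sess.run(sigmoid, feed_dict={x: np.array([-1.0, 1.0])})
ops_after_runs = len(graph.get_operations())  # running adds no ops

# The constant version bakes each input into new ops in the graph
for z in (np.array([0.0]), np.array([-1.0, 1.0])):
    tf.sigmoid(tf.constant(z, dtype=tf.float32))
ops_after_constants = len(graph.get_operations())

print(r1, r2, ops_after_build, ops_after_runs, ops_after_constants)
```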

TensorFlow: get_variable() but for placeholders?

There is a function tf.get_variable('name') which lets you "implicitly" pass parameters into a function, like:
def function(sess, feed):
    with tf.variable_scope('training', reuse=True):
        cost = tf.get_variable('cost')
    value = sess.run(cost, feed_dict=feed)
    # other statements
But what if one wants to pass a tf.placeholder into a function? Is there the same mechanism for placeholders, i.e. something like tf.get_placeholder():
def function(sess, cost, X_train, y_train):
    # Note this is NOT valid TF code
    with tf.variable_scope('training', reuse=True):
        features = tf.get_placeholder('features')
        labels = tf.get_placeholder('labels')
    feed = {features: X_train, labels: y_train}
    value = sess.run(cost, feed_dict=feed)
    print('Cost: %s' % value)
Or does it not make much sense to do this, and is it better to just construct the placeholders inside the function?
Placeholders are just... placeholders. It's pointless "getting" a placeholder as if it has some sort of state (that's what get variable does, returns a variable in its current state).
Just use the same python variable everywhere.
Also, if you don't want to pass a Python variable because your method signature becomes ugly, you can exploit the fact that you're building a graph and the graph itself contains the information about the declared placeholders.
You can do something like:
#define your placeholder
a = tf.placeholder(tf.float32, name="asd")
# then, when you need it, fetch it from the graph
graph = tf.get_default_graph()
placeholder = graph.get_tensor_by_name("asd:0")
Aside from the fact that you should not need this if you are working in the same script, you can do that by getting the tensor by name, as in Tensorflow: How to get a tensor by name?
For instance
p = tf.placeholder(tf.float32)
p2 = tf.get_default_graph().get_tensor_by_name(p.name)
assert p == p2
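Putting the two answers together, here is a sketch of the question's function that looks the placeholders up by name instead of receiving them as arguments. The tensor names (features, labels, cost), the toy cost and the helper report_cost are all hypothetical, and tensorflow.compat.v1 is assumed:

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
tf.reset_default_graph()

# Graph built elsewhere; the names are hypothetical but must match the lookups below
features = tf.placeholder(tf.float32, [None, 2], name="features")
labels = tf.placeholder(tf.float32, [None], name="labels")
pred = tf.reduce_sum(features, axis=1)
cost = tf.identity(tf.reduce_mean(tf.square(pred - labels)), name="cost")

def report_cost(sess, X_train, y_train):
    """Hypothetical helper: fetches the placeholders and cost from the graph by name."""
    graph = sess.graph
    feed = {graph.get_tensor_by_name("features:0"): X_train,
            graph.get_tensor_by_name("labels:0"): y_train}
    value = sess.run(graph.get_tensor_by_name("cost:0"), feed_dict=feed)
    print('Cost: %s' % value)
    return value

with tf.Session() as sess:
    value = report_cost(sess, np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([3.0, 5.0]))
```

The names act as the shared contract between the code that builds the graph and the code that runs it, so renaming a placeholder breaks the lookup silently until run time.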
