Tensorflow: Variable Reuse for tf.map_fn - python

I created a function func that contains some variables. I want to use this function both standalone and through tf.map_fn, and I want both cases to share the same set of variables. However, tf.map_fn apparently appends map to the current variable scope, so the variable scope of the standalone case no longer matches the scope used inside tf.map_fn. As a result, the following code throws an error, because the variable mul1/map/weights does not exist when it is requested with reuse=True.
import tensorflow as tf
D = 5
batch_size = 1
def func(x):
    W = tf.get_variable(initializer=tf.constant_initializer(1), shape=[D,1], dtype=tf.float32, trainable=True, name="weights")
    y = tf.matmul(x, W)
    return y
x = tf.placeholder(tf.float32, [batch_size, 5])
x_cat = tf.placeholder(tf.float32, [None, batch_size, 5])
with tf.variable_scope("mul1") as mul1_scope:
    y_sum = func(x)
with tf.variable_scope(mul1_scope, reuse=True):
    cost = tf.map_fn(lambda x: func(x), x_cat)
Here I want to run the gradient update only on the variables under the mul1/map scope. I could probably use tf.assign after every update to copy the values into the variables under the mul1 scope (which is used only for the feedforward step), but that is a rather painful way to share variables. Is there a better way to solve this? Any help would be much appreciated!

Related

Tensorflow: create a tf.constant from a tf.Variable

I want to optimize a cost function. This cost function contains variables and other parameters that are not variables. These non-variable parameters are obtained from the variables.
Here is a toy example that illustrates the point:
import numpy as np
import tensorflow as tf
r_init = np.array([5.0,6.0])
x = tf.get_variable("x_var", initializer = r_init[0], trainable = True)
y = tf.get_variable("y_var", initializer = r_init[1], trainable = True)
def cost(x,y):
    a = x
    return a*((x-1.0)**2+(y-1.0)**2)
train_op = tf.train.AdamOptimizer(learning_rate=0.05).minimize(cost(x,y))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        print(sess.run([cost(x,y), train_op]))
    print('x=', x.eval(session=sess))
    print('y=', y.eval(session=sess))
As you can see, the parameter a is defined from the variable x. However, a should not itself be a variable: I want the optimizer to see it as a constant. This constant should still be updated as the variable x is updated during the optimization.
How can I define a non-variable parameter a from the variable x? I am making this up, but intuitively, what comes to my mind is something like:
a = tf.to_constant(x)
Any ideas?
You are looking for tf.stop_gradient:
a = tf.stop_gradient(x)
Quoting the docs,
This is useful any time you want to compute a value with TensorFlow but need to pretend that the value was a constant.
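To make the effect concrete, here is a minimal framework-free sketch that computes the derivative of the toy cost with respect to x by hand, once differentiating through a = x and once treating a as a constant that only carries the current value of x. The function names grad_full and grad_a_constant are hypothetical and exist only for this illustration; the second one is meant to mirror what the optimizer would see with a = tf.stop_gradient(x).

```python
# Plain-Python sketch (no TensorFlow): cost = a * ((x-1)^2 + (y-1)^2) with a = x.

def grad_full(x, y):
    """d(cost)/dx when a = x is differentiated through (product rule)."""
    return ((x - 1.0) ** 2 + (y - 1.0) ** 2) + x * 2.0 * (x - 1.0)

def grad_a_constant(x, y):
    """d(cost)/dx when a takes the value of x but is treated as a constant."""
    a = x  # the value is copied; no derivative flows through a
    return a * 2.0 * (x - 1.0)

x0, y0 = 5.0, 6.0  # same starting point as r_init in the question
full = grad_full(x0, y0)
frozen = grad_a_constant(x0, y0)
print(full, frozen)
```

The two results differ by exactly the term (x-1)^2 + (y-1)^2, which is the contribution that disappears once a is treated as a constant.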

How to freeze specific nodes in a tensorflow variable while training?

I am having trouble making a few elements of a variable non-trainable. That is, given a variable such as x,
x= tf.Variable(tf.zeros([2,2]))
I wish to train only x[0,0] and x[1,1] while keeping x[0,1] and x[1,0] fixed during training.
TensorFlow currently provides the option to make a whole variable non-trainable by using trainable=False or tf.stop_gradient(). However, these methods make all elements of x non-trainable. How can I obtain this selectivity?
There is currently no way to selectively skip updates for part of a variable; however, you can achieve this effect indirectly by explicitly specifying which variables should be updated. Both .minimize and all the gradient functions accept a list of variables to optimize over, so just create a list that omits some of them, for example:
v1 = tf.Variable( ... ) # we want to freeze it in one op
v2 = tf.Variable( ... ) # we want to freeze it in another op
v3 = tf.Variable( ... ) # we always want to train this one
loss = ...
optimizer = tf.train.GradientDescentOptimizer(0.1)
# the collection key lives under tf.GraphKeys
op1 = optimizer.minimize(loss,
    var_list=[v for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) if v is not v1])
op2 = optimizer.minimize(loss,
    var_list=[v for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) if v is not v2])
and now you can run either op whenever you want to train with respect to a subset of the variables. Note that this might require two separate optimizers if you are using Adam or another method that gathers statistics (and you will end up with separate statistics per optimizer!). However, if there is just one set of frozen variables per training run, everything is straightforward with var_list.
However, there is no way to freeze a subset of the elements of a single variable. TensorFlow always treats a variable as a single unit. You have to express your computation differently to achieve this; one way is to:
create a binary mask M with 1's where you want to stop updates over X
create separate variable X', which is non-trainable, and tf.assign to it value of X
output X'*M + (1-M)*X
for example:
x = tf.Variable( ... )
xp = tf.Variable( ..., trainable=False)
m = tf.constant( ... ) # mask
cp = tf.assign(xp, x) # copy the current value of x into xp
with tf.control_dependencies([cp]):
    x_frozen = m*xp + (1-m)*x
and you just use x_frozen instead of x. Note that the control dependency is needed because tf.assign can execute asynchronously, and here we want to make sure xp always holds the most up-to-date value of x.
You can use the tf.stop_gradient trick to prevent masked elements of a tf.Variable from being trained. For example:
x = tf.Variable(tf.zeros([2, 2]))
mask = tf.constant([[1, 0], [0, 1]], dtype=x.dtype)
x = mask * x + tf.stop_gradient((1 - mask) * x)
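The expression above leaves the forward value unchanged (mask * x plus (1 - mask) * x is just x), but gradients only flow through the mask * x term, so each element's gradient is multiplied by its mask entry. The following framework-free sketch shows the resulting update on a 2x2 example. The helper masked_sgd_step and the gradient values are made up for illustration and are not part of TensorFlow.

```python
# Plain-Python sketch of a masked gradient step on a 2x2 "variable".
# mask[i][j] == 1 means the element is trainable; 0 means it stays frozen.

def masked_sgd_step(x, grad, mask, lr):
    """Return x - lr * (mask * grad), computed elementwise."""
    return [
        [x[i][j] - lr * mask[i][j] * grad[i][j] for j in range(len(x[i]))]
        for i in range(len(x))
    ]

x = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1, 0], [0, 1]]            # train only x[0][0] and x[1][1]
grad = [[1.0, 2.0], [3.0, 4.0]]    # made-up gradient values
x_new = masked_sgd_step(x, grad, mask, lr=0.5)
print(x_new)
```

Only the diagonal entries move; the off-diagonal entries keep their initial values no matter how large their gradients are.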

TensorFlow: why not use a function instead of a placeholder?

I am starting to use TensorFlow (with Python) and was wondering: when using a placeholder in a function, why not have an argument in my function which would feed a TensorFlow constant rather than the placeholder?
Here is an example (the difference is in x):
def sigmoid(z):
    x = tf.constant(z, dtype=tf.float32, name = "x")
    sigmoid = tf.sigmoid(x)
    with tf.Session() as sess:
        result = sess.run(sigmoid)
    return result
instead of:
def sigmoid(z):
    x = tf.placeholder(tf.float32, name = "...")
    sigmoid = tf.sigmoid(x)
    with tf.Session() as sess:
        result = sess.run(sigmoid, feed_dict={x:z})
    return result
The idea with TensorFlow is that you will repeat the same calculation on lots of data. When you write the code, you are setting up a computational graph that you will later execute on the data. In your first example, you have hard-coded the data as a constant. This is not a typical TensorFlow use case. The second example is better because it allows you to reuse the same computational graph with different data.
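As a rough analogy for "build once, run on many inputs", here is a small framework-free sketch. The Placeholder class and build_sigmoid function are hypothetical stand-ins written for this illustration; they only imitate the shape of the placeholder and feed_dict pattern and say nothing about how TensorFlow is implemented.

```python
import math

class Placeholder:
    """Stands in for a value that is supplied at run time (hypothetical helper)."""
    def __init__(self, name):
        self.name = name

def build_sigmoid(node):
    """Build the computation once; return a function that runs it on fed data."""
    def run(feed):
        z = feed[node]  # look up the fed value, similar in spirit to feed_dict
        return 1.0 / (1.0 + math.exp(-z))
    return run

x = Placeholder("x")
sigmoid = build_sigmoid(x)  # the "graph" is built a single time
results = [sigmoid({x: z}) for z in (0.0, 2.0, -2.0)]  # and reused per input
print(results)
```

With a constant baked in, the equivalent of build_sigmoid would have to be called again for every new input, which is the cost the answer is pointing at.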

variable scope issue in Tensorflow

def biLSTM(data, n_steps):
    n_hidden= 24
    data = tf.transpose(data, [1, 0, 2])
    # Reshape to (n_steps*batch_size, n_input)
    data = tf.reshape(data, [-1, 300])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    data = tf.split(0, n_steps, data)
    lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Backward direction cell
    lstm_bw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    outputs, _, _ = tf.nn.bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, data, dtype=tf.float32)
    return outputs, n_hidden
In my code I am calling this function twice to create 2 bidirectional LSTMs. Then I got the problem of reusing variables.
ValueError: Variable lstm/BiRNN_FW/BasicLSTMCell/Linear/Matrix
already exists, disallowed. Did you mean to set reuse=True in
VarScope?
To resolve this I added the LSTM definition in the function within with tf.variable_scope('lstm', reuse=True) as scope:
This led to a new issue
ValueError: Variable lstm/BiRNN_FW/BasicLSTMCell/Linear/Matrix does
not exist, disallowed. Did you mean to set reuse=None in VarScope?
Please help with a solution to this.
When you create a BasicLSTMCell(), it creates all the weights and biases required to implement an LSTM cell under the hood. All of these variables are assigned names automatically. If you call the function more than once within the same scope, the second call tries to create variables whose names already exist, which produces the error you are seeing. Since your question seems to state that you want to create two separate LSTM cells, you do not want to reuse the variables; you want to create them in separate scopes. You can do this in two different ways (I haven't actually tried to run this code, but it should work). You can call your function from within a unique scope:
def biLSTM(data, n_steps):
    ... blah ...
with tf.variable_scope('LSTM1'):
    outputs, hidden = biLSTM(data, steps)
with tf.variable_scope('LSTM2'):
    outputs, hidden = biLSTM(data, steps)
or you can pass a unique scope name to the function and use the scope inside
def biLSTM(data, n_steps, layer_name):
    ... blah...
    with tf.variable_scope(layer_name) as scope:
        lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
        lstm_bw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
        outputs, _, _ = tf.nn.bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, data, dtype=tf.float32)
    return outputs, n_hidden
l1 = biLSTM(data, steps, 'layer1')
l2 = biLSTM(data, steps, 'layer2')
Which approach to choose is up to your coding sensibilities; they are functionally pretty much the same.
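The two error messages from the question, and the reason separate scopes avoid them, can be imitated with a toy name registry. This is only a sketch: VariableStore and its get_variable method are hypothetical and merely mimic the "already exists" and "does not exist" checks, not TensorFlow's actual implementation.

```python
class VariableStore:
    """Toy name -> value registry that mimics get_variable's reuse checks."""
    def __init__(self):
        self.vars = {}

    def get_variable(self, scope, name, reuse=False):
        full_name = scope + "/" + name
        if reuse:
            # reuse=True only ever looks up existing variables
            if full_name not in self.vars:
                raise ValueError("Variable %s does not exist" % full_name)
            return self.vars[full_name]
        # reuse=False only ever creates new variables
        if full_name in self.vars:
            raise ValueError("Variable %s already exists" % full_name)
        self.vars[full_name] = object()
        return self.vars[full_name]

store = VariableStore()
w1 = store.get_variable("LSTM1", "Matrix")  # first layer, its own scope
w2 = store.get_variable("LSTM2", "Matrix")  # second layer, a different scope

errors = []
for scope, reuse in (("LSTM1", False), ("LSTM3", True)):
    try:
        store.get_variable(scope, "Matrix", reuse=reuse)
    except ValueError as exc:
        errors.append(str(exc))
print(errors)
```

Creating the same name twice in one scope fails, and asking to reuse a name that was never created also fails, while distinct scope names give two independent variables.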
I also had a similar problem, although I was using the Keras implementation with a pretrained ResNet50 model.
It worked for me after I updated TensorFlow using the following command:
conda update -f -c conda-forge tensorflow
and used
from keras import backend as K
K.clear_session()

Cannot gather gradients for GradientDescentOptimizer in TensorFlow

I've been trying to gather the gradients for each step of the GradientDescentOptimizer within TensorFlow, but I keep running into a TypeError when I try to pass the result of compute_gradients() to sess.run(). The code I'm trying to run is:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# Load MNIST as in the tutorial (missing from the original snippet; the path is an assumption).
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32,[None,784])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x,W)+b)
y_ = tf.placeholder(tf.float32,[None,10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
# note that up to this point, this example is identical to the tutorial on tensorflow.org
gradstep = tf.train.GradientDescentOptimizer(0.01).compute_gradients(cross_entropy)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
batch_x,batch_y = mnist.train.next_batch(100)
print(sess.run(gradstep, feed_dict={x:batch_x,y_:batch_y}))
Note that if I replace the last line with print(sess.run(train_step,feed_dict={x:batch_x,y_:batch_y})), where train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy), the error is not raised. My confusion arises from the fact that minimize calls compute_gradients with exactly the same arguments as its first step. Can someone explain why this behavior occurs?
The Optimizer.compute_gradients() method returns a list of (Tensor, Variable) pairs, where each tensor is the gradient with respect to the corresponding variable.
Session.run() expects a list of Tensor objects (or objects convertible to a Tensor) as its first argument. It does not understand how to handle a list of pairs, and hence you get a TypeError when you try to run sess.run(gradstep, ...).
The correct solution depends on what you are trying to do. If you want to fetch all of the gradient values, you can do the following:
grad_vals = sess.run([grad for grad, _ in gradstep], feed_dict={x: batch_x, y_: batch_y})
# Then, e.g., build a variable name-to-gradient dictionary.
var_to_grad = {}
for grad_val, (_, var) in zip(grad_vals, gradstep):
    var_to_grad[var.name] = grad_val
If you also want to fetch the variables, you can execute the following statement separately:
sess.run([var for _, var in gradstep])
...though note that—without further modification to your program—this will just return the initial values for each variable.
You will have to run the optimizer's training step (or otherwise call Optimizer.apply_gradients()) to update the variables.
minimize calls compute_gradients followed by apply_gradients: it's possible you're missing the second step.
compute_gradients just returns the grads / variables, but doesn't apply the update rule to them.
Here is an example: https://github.com/tensorflow/tensorflow/blob/f2bd0fc399606d14b55f3f7d732d013f32b33dd5/tensorflow/python/training/optimizer.py#L69
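The split between computing and applying gradients described in the answers above can be sketched without TensorFlow. ToySGD below is a hypothetical stand-in written for this illustration; its method names echo the optimizer methods discussed here, but the signatures are invented and much simpler than the real ones.

```python
class ToySGD:
    """Hypothetical stand-in for an optimizer whose minimize has two steps."""
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def compute_gradients(self, grad_fn, variables):
        # Returns (gradient, variable_name) pairs; it does not modify variables.
        return [(grad_fn(name, variables), name) for name in variables]

    def apply_gradients(self, grads_and_vars, variables):
        # Applies the gradient descent update rule in place.
        for grad, name in grads_and_vars:
            variables[name] -= self.learning_rate * grad

    def minimize(self, grad_fn, variables):
        # minimize = compute_gradients followed by apply_gradients
        self.apply_gradients(self.compute_gradients(grad_fn, variables), variables)

# loss = (w - 3)^2, so d(loss)/dw = 2 * (w - 3)
def grad_fn(name, variables):
    return 2.0 * (variables[name] - 3.0)

variables = {"w": 0.0}
opt = ToySGD(learning_rate=0.5)
pairs = opt.compute_gradients(grad_fn, variables)
w_after_compute = variables["w"]    # unchanged: nothing was applied yet
opt.minimize(grad_fn, variables)
w_after_minimize = variables["w"]   # updated by the apply step
print(pairs, w_after_compute, w_after_minimize)
```

Calling only the compute step yields the gradient pairs and leaves the variable at its initial value, which matches the note above that fetching the variables without applying the update just returns their initial values.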
