In TensorFlow, what is tf.identity used for? - python

I've seen tf.identity used in a few places, such as the official CIFAR-10 tutorial and the batch-normalization implementation on stackoverflow, but I don't see why it's necessary.
What's it used for? Can anyone give a use case or two?
One proposed answer is that it can be used for transfer between the CPU and GPU. This is not clear to me. Extension to the question, based on this: loss = tower_loss(scope) is under the GPU block, which suggests to me that all operators defined in tower_loss are mapped to the GPU. Then, at the end of tower_loss, we see total_loss = tf.identity(total_loss) before it's returned. Why? What would be the flaw with not using tf.identity here?

After some stumbling I think I've noticed a single use case that fits all the examples I've seen. If there are other use cases, please elaborate with an example.
Use case:
Suppose you'd like to run an operator every time a particular Variable is evaluated. For example, say you'd like to add one to x every time the variable y is evaluated. It might seem like this will work:
x = tf.Variable(0.0)
x_plus_1 = tf.assign_add(x, 1)
with tf.control_dependencies([x_plus_1]):
y = x
init = tf.initialize_all_variables()
with tf.Session() as session:
init.run()
for i in xrange(5):
print(y.eval())
It doesn't: it'll print 0, 0, 0, 0, 0. Instead, it seems that we need to add a new node to the graph within the control_dependencies block. So we use this trick:
x = tf.Variable(0.0)
x_plus_1 = tf.assign_add(x, 1)
with tf.control_dependencies([x_plus_1]):
y = tf.identity(x)
init = tf.initialize_all_variables()
with tf.Session() as session:
init.run()
for i in xrange(5):
print(y.eval())
This works: it prints 1, 2, 3, 4, 5.
If in the CIFAR-10 tutorial we dropped tf.identity, then loss_averages_op would never run.

tf.identity is useful when you want to explicitly transport tensor between devices (like, from GPU to a CPU).
The op adds send/recv nodes to the graph, which make a copy when the devices of the input and the output are different.
A default behavior is that the send/recv nodes are added implicitly when the operation happens on a different device but you can imagine some situations (especially in a multi-threaded/distributed settings) when it might be useful to fetch the value of the variable multiple times within a single execution of the session.run. tf.identity allows for more control with regard to when the value should be read from the source device. Possibly a more appropriate name for this op would be read.
Also, please note that in the implementation of tf.Variable link, the identity op is added in the constructor, which makes sure that all the accesses to the variable copy the data from the source only once. Multiple copies can be expensive in cases when the variable lives on a GPU but it is read by multiple CPU ops (or the other way around). Users can change the behavior with multiple calls to tf.identity when desired.
EDIT: Updated answer after the question was edited.
In addition, tf.identity can be used used as a dummy node to update a reference to the tensor. This is useful with various control flow ops. In the CIFAR case we want to enforce that the ExponentialMovingAverageOp will update relevant variables before retrieving the value of the loss. This can be implemented as:
with tf.control_dependencies([loss_averages_op]):
total_loss = tf.identity(total_loss)
Here, the tf.identity doesn't do anything useful aside of marking the total_loss tensor to be ran after evaluating loss_averages_op.

In addition to the above, I simply use it when I need to assign a name to ops that do not have a name argument, just like when initializing a state in RNN's:
rnn_cell = tf.contrib.rnn.MultiRNNCell([cells])
# no name arg
initial_state = rnn_cell.zero_state(batch_size,tf.float32)
# give it a name with tf.identity()
initial_state = tf.identity(input=initial_state,name="initial_state")

I came across another use case that is not completely covered by the other answers.
def conv_layer(input_tensor, kernel_shape, output_dim, layer_name, decay=None, act=tf.nn.relu):
"""Reusable code for making a simple convolutional layer.
"""
# Adding a name scope ensures logical grouping of the layers in the graph.
with tf.name_scope(layer_name):
# This Variable will hold the state of the weights for the layer
with tf.name_scope('weights'):
weights = weight_variable(kernel_shape, decay)
variable_summaries(weights, layer_name + '/weights')
with tf.name_scope('biases'):
biases = bias_variable([output_dim])
variable_summaries(biases, layer_name + '/biases')
with tf.name_scope('convolution'):
preactivate = tf.nn.conv2d(input_tensor, weights, strides=[1, 1, 1, 1], padding='SAME')
biased = tf.nn.bias_add(preactivate, biases)
tf.histogram_summary(layer_name + '/pre_activations', biased)
activations = act(biased, 'activation')
tf.histogram_summary(layer_name + '/activations', activations)
return activations
Most of the time when constructing a convolutional layer, you just want the activations returned so you can feed those into the next layer. Sometimes, however - for example when building an auto-encoder - you want the pre-activation values.
In this situation an elegant solution is to pass tf.identity as the activation function, effectively not activating the layer.

When our input data is serialized in bytes, and we want to extract features from this dataset. We can do so in key-value format and then get a placeholder for it. Its benefits are more realised when there are multiple features and each feature has to be read in different format.
#read the entire file in this placeholder
serialized_tf_example = tf.placeholder(tf.string, name='tf_example')
#Create a pattern in which data is to be extracted from input files
feature_configs = {'image': tf.FixedLenFeature(shape=[256], dtype=tf.float32),/
'text': tf.FixedLenFeature(shape=[128], dtype=tf.string),/
'label': tf.FixedLenFeature(shape=[128], dtype=tf.string),}
#parse the example in key: tensor dictionary
tf_example = tf.parse_example(serialized_tf_example, feature_configs)
#Create seperate placeholders operation and tensor for each feature
image = tf.identity(tf_example['image'], name='image')
text = tf.identity(tf_example['text'], name='text')
label = tf.identity(tf_example['text'], name='label')

I found another application of tf.identity in Tensorboard.
If you use tf.shuffle_batch, it returns multiple tensors at once, so you see messy picture when visualizing the graph, you can't split tensor creation pipeline from actiual input tensors: messy
But with tf.identity you can create duplicate nodes, which don't affect computation flow: nice

In distribution training, we should use tf.identity or the workers will hang at waiting for initialization of the chief worker:
vec = tf.identity(tf.nn.embedding_lookup(embedding_tbl, id)) * mask
with tf.variable_scope("BiRNN", reuse=None):
out, _ = tf.nn.bidirectional_dynamic_rnn(fw, bw, vec, sequence_length=id_sz, dtype=tf.float32)
For details, without identity, the chief worker would treat some variables as local variables inappropriately and the other workers wait for an initialization operation that can not end

I see this kind of hack to check assert:
assertion = tf.assert_equal(tf.shape(image)[-1], 3, message="image must have 3 color channels")
with tf.control_dependencies([assertion]):
image = tf.identity(image)
Also it's used just to give a name:
image = tf.identity(image, name='my_image')

Related

Unnamed Op not showing in list_tensors command during debug session

I have these two unnamed op tensors logits and outputs under a variable scope, but the lt command isn't listing these two tensors under the op 'MatMul' and 'Softmax' during the tfdbg session after a test run on a checkpoint. Here is a snapshot of the code:
with tf.variable_scope(scope):
d_inputs = dropout(inputs, keep_prob=keep_prob, is_train=is_train)
d_memory = dropout(memory, keep_prob=keep_prob, is_train=is_train)
JX = tf.shape(inputs)[1]
with tf.variable_scope("attention"):
inputs_ = tf.nn.relu(dense(d_inputs, hidden, use_bias=False, scope="inputs"))
memory_ = tf.nn.relu(dense(d_memory, hidden, use_bias=False, scope="memory"))
outputs = tf.matmul(inputs_, tf.transpose(memory_, [0, 2, 1])) / (hidden ** 0.5)
mask = tf.tile(tf.expand_dims(mask, axis=1), [1, JX, 1])
# The tensor down below
logits = tf.nn.softmax(softmax_mask(outputs, mask))
# And the tensor down below here as well
outputs = tf.matmul(logits, memory)
res = tf.concat([inputs, outputs], axis=2)
What can I do to retrieve these variables for testing purposes on tfdbg?
For alternative purposes, can I retrieve them using the normal tensorflow session by using tf.add_to_collection(op_name, tensor) as mentioned in this answer?
There is really no such thing as "unnamed tensor". If you don't provide a name for an op, it will use a default name. The output tensors will usually be named using the operation's OpDef spec. If there is a single output, it can be named using just the op name. If the name is already taken, it will be made unique by appending _N to it.
You should see all tensors with plain lt command. Are you giving some options to it?
For the second question, tensorflow collections are basically just dictionaries that can be saved and restored "automatically". If you want to get have a persistent fixed name for some tensor, you can certainly save it in a collection and retrieve it after restoring from a checkpoint.

Tensorflow softmax evaluation

Will both the snippets produce the same result in tensorflow?
graph =tf.get_default_graph()
imgs = graph.get_tensor_by_name('Input:0')
logits = graph.get_tensor_by_name('Output/Conv:0')
for i in range(0,100):
pred_val = sess.run(tf.contrib.layers.softmax(logits), feed_dict={imgs: arr})
preds.append(pred_val)
pred = np.concatenate(preds)
And inserting the softmax at a different position like below:
imgs = graph.get_tensor_by_name('Input:0')
logits = graph.get_tensor_by_name('Output/Conv:0')
logits_sf = tf.contrib.layers.softmax(logits)
for i in range(0,100):
pred_val = sess.run(logits_sf, feed_dict={imgs: arr})
preds.append(pred_val)
pred = np.concatenate(preds)
Will both the outputs of the pred be different or the same?
[Updated] The new versions of code are identical in the functional sense: you will get the same pred_val on each iteration and, as a result, the same preds. But there's a catch in the second snippet and it's important. Each consecutive call of tf.contrib.layers.softmax creates a new node in the graph, even though it's used only once. If you print the graph definition, you'll see 100 softmax ops, while the first snippet will have just one. Another fun side effect: this node will not be present in tensorboard. Try to avoid creation of ops while training, because it's a sure way to consume all of your RAM and crash the process. In other words, the first snippet is better.
[Original answer] No, just because you've defined logits_sf node in the graph doesn't mean it will be executed in session. Your second code snippet evaluates only logits, which doesn't depend on the softmax op.
So the results will be different: the first one will produce the probability distribution in each pred_val, while the second one will produce raw logits.

Computing the weight-update-ratio in Tensorflow

I'm searching for a way to compute the weight-update-ratio for optimizer steps in Tensorflow. The weight-update-ratio is defined as the update-scale divided by the variable scale in each step and can be used for inspecting network training.
Ideally I want a non-intrusive way to compute it in a single session run, but couldn't accomplish quite what I was looking for. Since the update-scale and parameter scale are independent of the train step, one needs to add explicit dependencies to the graph in order to graph variable-scale before and after the update step. Unfortunately, it seems that in TF dependencies can only be defined for new nodes, which further complicates the issue.
So far, the best I've come up with is a context manager for definining the necessary operations. Its used as follows
opt = tf.train.AdamOptimizer(1e0)
grads = tf.gradients(loss, tf.trainable_variables())
grads = list(zip(grads, tf.trainable_variables()))
with compute_weight_update_ratio('wur') as wur:
train = opt.apply_gradients(grads_and_vars=grads)
# ...
with tf.Session() as sess:
sess.run(wur.ratio)
The full code of compute_weight_update_ratio can be found below. What bugs me is that in the current state the weight-update-ratio (at least norm_before) is computed with every training step, but for performance reason I'd rather prefer to do it selectively (e.g only when summaries are computed).
Any ideas on how to improve?
#contextlib.contextmanager
def compute_weight_update_ratio(name, var_scope=None):
'''Injects training to compute weight-update-ratio.
The weight-update-ratio is computed as the update scale divided
by the variable scale before the update and should be somewhere in the
range 1e-2 or 1e-3.
Params
------
name : str
Operation name
Kwargs
------
var_scope : str, optional
Name selection of variables to compute weight-update-ration for. Defaults to all. Regex supported.
'''
class WeightUpdateRatio:
def __init__(self):
self.num_train = len(tf.get_collection(tf.GraphKeys.TRAIN_OP))
self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=var_scope)
self.norm_before = tf.norm(self.variables, name='norm_before')
def compute_ratio(self,):
train_ops = tf.get_collection(tf.GraphKeys.TRAIN_OP)
assert len(train_ops) > self.num_train, 'Missing training op'
with tf.control_dependencies(train_ops[self.num_train:]):
self.norm_after = tf.norm(self.variables, name='norm_after')
absdiff = tf.abs(tf.subtract(self.norm_after, self.norm_before), name='absdiff')
self.ratio = tf.divide(absdiff, self.norm_before, name=name)
with tf.name_scope(name) as scope:
try:
wur = WeightUpdateRatio()
with tf.control_dependencies([wur.norm_before]):
yield wur
finally:
wur.compute_ratio()
You don't need to worry about performance too much. Tensorflow only executes the subgraph necessary to produce the output.
So, in your training loop, if wur.ratio is not called during an iteration, none of the extra nodes created to compute it will be executed.

Can I get a pointer to part of the tensor in tensorflow?

Say in tensorflow, I created a variable by
C = tf.Variable(tf.random_uniform([n_sample, n_sample], -1, 1), name='C'),
now I want to get a pointer to the first column of the variable, is there anyway I could do that? Would tf.slice(C, [0,0], [n_sample,1]) give me what I want or it will just create another variable with value stored in C.
The reason that I want to do it is because my optimization function is dependent on both C and each columns of C.
As far as I know you can't really get access to the data itself (i.e. like a pointer). The reasoning being is that the code will be data agnostic so that it can pass around the data to different CPUs or GPUs without you worrying about that part (or you could specify device to use but that gets cumbersome).
So tf.slice would be the correct function to use.
you could do :
for i in range(n_sample):
curr_slice = tf.slice(C, [i,0], [n_sample,1])
do_something(curr_slice)
this isn't the most efficient version but it's what you asked for in the comments.
for i inVectorized range(n_sample):approach
curr_sliceloss = tf.slice(C, [i,0], [n_sample,1])
y.assign_add( tf.nn.l2_loss(tf.sub(curr_slice,X - tf.matmul(X,curr_slice)C)) + lambdalamb * tf.nn.l2_loss(curr_slice) C)
loss=tf.reduce_sum(y)
Vectorized approach much cleaner:
loss = tf.nn.l2_loss(X - tf.matmul(X,C)) + lamb * tf.nn.l2_loss(C)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
sess.run(train_step)
You might need to initialize the some of the values by creating placeholders.
Alternatively I couldn't find it in skflow yet but in scikit learn it's a simple 3 liner.
from sklearn.linear_model import Ridge
clf = Ridge(alpha=1.0)
clf.fit(X, W)

Feeding parameters into placeholders in tensorflow

I'm trying to get into tensorflow, setting up a network and then feeding data to it. For some reason I end up with the error message ValueError: setting an array element with a sequence. I made a minimal example of what I'm trying to do:
import tensorflow as tf
K = 10
lchild = tf.placeholder(tf.float32, shape=(K))
rchild = tf.placeholder(tf.float32, shape=(K))
parent = tf.nn.tanh(tf.add(lchild, rchild))
input = [ tf.Variable(tf.random_normal([K])),
tf.Variable(tf.random_normal([K])) ]
with tf.Session() as sess :
print(sess.run([parent], feed_dict={ lchild: input[0], rchild: input[1] }))
Basically, I'm setting up a network with place holders and a sequence of input embeddings that I want to learn, and then I try to run the network, feeding the input embeddings into it. From what I can tell by searching for the error message, there might be something wrong with my feed_dict, but I can't see any obvious mismatches in eg. dimensionality.
So, what did I miss, or how did I get this completely backwards?
EDIT: I've edited the above to clarify that the input represents embeddings that need to be learned. I guess the question can be asked more sharply as: Is it possible to use placeholders for parameters?
The inputs should be numpy arrays.
So, instead of tf.Variable(tf.random_normal([K])), simply write np.random.randn(K) and everything should work as expected.
EDIT (The question was clarified after my answer):
It is possible to use placeholders as parameters but in a slightly different way. For example:
lchild = tf.placeholder(tf.float32, shape=(K))
rchild = tf.placeholder(tf.float32, shape=(K))
parent = tf.nn.tanh(tf.add(lchild, rchild))
loss = <some loss that depends on the parent tensor or lchild/rchild>
# Compute gradients with respect to the input variables
grads = tf.gradients(loss, [lchild, rchild])
inputs = [np.random.randn(K), np.random.randn(K)]
for i in range(<number of iterations>):
np_grads = sess.run(grads, feed_dict={lchild:inputs[0], rchild:inputs[1])
inputs[0] -= 0.1 * np_grads[0]
inputs[1] -= 0.1 * np_grads[1]
It is not however the best or easiest way to do this. The main problem with it is that at every iteration you need to copy numpy arrays in and out of the session (which is running potentially on a different device like GPU).
Placeholders generally are used to feed the data external to the model (like texts or images). The way to solve it using tensorflow utilities would be something like:
lchild = tf.Variable(tf.random_normal([K])
rchild = tf.Variable(tf.random_normal([K])
parent = tf.nn.tanh(tf.add(lchild, rchild))
loss = <some loss that depends on the parent tensor or lchild/rchild>
train_op = tf.train.GradientDescentOptimizer(loss).minimize(0.1)
for i in range(<number of iterations>):
sess.run(train_op)
# Retrieve the weights back to numpy:
np_lchild = sess.run(lchild)

Categories

Resources