I am new to TensorFlow and I don't understand the difference between a variable and a constant. I get the idea that we use variables for equations and constants for direct values, but why does only code #1 work, and not code #2 and #3? Also, please explain in which cases we have to run our graph first (a) and then our variable (b), i.e.
(a) session.run(model)
(b) print(session.run(y))
and in which case I can directly execute this command
i.e
print(session.run(y))
Code #1 :
x = tf.constant(35, name='x')
y = tf.Variable(x + 5, name='y')
model = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(model)
    print(session.run(y))
Code #2 :
x = tf.Variable(35, name='x')
y = tf.Variable(x + 5, name='y')
model = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(model)
    print(session.run(y))
Code #3 :
x = tf.constant(35, name='x')
y = tf.constant(x + 5, name='y')
model = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(model)
    print(session.run(y))
In TensorFlow, the difference between constants and variables is that once you declare a constant, its value can't be changed afterwards (and it must be initialized with a value, not with an operation).
When you declare a Variable, on the other hand, you can change its value later with tf.assign() (and it can be initialized with either a value or an operation).
The function tf.global_variables_initializer() initializes all variables in your code with the values passed as their initial values, but it runs all the initializers together in one step with no guaranteed order, so it may not work properly when the initial value of one variable depends on another variable.
Your first code (#1) works properly because there are no dependencies between variable initializers and the constant is constructed with a value.
The second code (#2) doesn't work because of this unordered behavior of tf.global_variables_initializer(): y may be initialized before x has a value. You can fix it by using tf.variables_initializer() as follows:
x = tf.Variable(35, name='x')
model_x = tf.variables_initializer([x])
y = tf.Variable(x + 5, name='y')
model_y = tf.variables_initializer([y])
with tf.Session() as session:
    session.run(model_x)
    session.run(model_y)
    print(session.run(y))
The third code (#3) doesn't work because you are trying to initialize a constant with an operation (x + 5 is a tensor, not a value), which isn't possible. To solve it, an appropriate strategy is the one in (#1).
Regarding your last question: you need to run (a) session.run(model) before (b) print(session.run(y)) whenever there are variables in your computation graph. If the graph contains only constants, you can run (b) directly.
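To make the rule about (a) and (b) concrete, here is a minimal sketch. It assumes TensorFlow 2.x is installed and uses the tensorflow.compat.v1 module with disable_v2_behavior() to get the TF 1.x graph/session behaviour discussed in this answer; the names bump, const_value, needs_init, first and second are made up for the illustration.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF 2.x install, TF 1.x semantics wanted

with tf.Graph().as_default():
    x = tf.constant(35, name='x')
    y = tf.Variable(x + 5, name='y')
    bump = tf.assign(y, y + 1)  # variables can be changed, constants cannot

    with tf.Session() as session:
        # A constant can be evaluated right away, no initializer needed.
        const_value = session.run(x)

        # A variable cannot be read before its initializer has run.
        try:
            session.run(y)
            needs_init = False
        except tf.errors.FailedPreconditionError:
            needs_init = True

        session.run(tf.global_variables_initializer())
        first = session.run(y)   # value right after initialization
        session.run(bump)
        second = session.run(y)  # value after the assignment
```

Here const_value is 35, needs_init is True, first is 40 and second is 41.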
I will point out the difference when using eager execution.
As of TensorFlow 2.0.b1, Variables and constants trigger different behaviours when used with tf.GradientTape. Strangely, the official documentation does not say much about it.
Let's look at the example code in https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/GradientTape
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = x * x
    z = y * y
dz_dx = g.gradient(z, x) # 108.0 (4*x^3 at x = 3)
dy_dx = g.gradient(y, x) # 6.0
del g # Drop the reference to the tape
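You had to watch x, which is a constant: GradientTape does NOT automatically watch constants in the context. Every constant you want a gradient for has to be watched explicitly. A single tape can watch several tensors (call g.watch once per tensor, or pass a list of tensors), so nesting is not required; but if you use a separate GradientTape per constant, the tapes have to be nested. For example,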
x = tf.constant(3.0)
x2 = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    with tf.GradientTape(persistent=True) as g2:
        g2.watch(x2)
        y = x * x
        y2 = y * x2
dy_dx = g.gradient(y, x) # 6
dy2_dx2 = g2.gradient(y2, x2) # 9
del g, g2 # Drop the reference to the tape
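For comparison, the same two gradients can be obtained from one tape, because watch accepts a list of tensors. This is a minimal sketch assuming TensorFlow 2.x with eager execution enabled (the default).

```python
import tensorflow as tf

x = tf.constant(3.0)
x2 = tf.constant(3.0)

with tf.GradientTape(persistent=True) as g:
    # Constants are not watched automatically, so watch both explicitly.
    g.watch([x, x2])
    y = x * x
    y2 = y * x2

dy_dx = g.gradient(y, x)       # d(x^2)/dx at x = 3
dy2_dx2 = g.gradient(y2, x2)   # d(y * x2)/dx2 = y
del g  # Drop the reference to the persistent tape
```

With x = 3.0, dy_dx evaluates to 6.0 and dy2_dx2 to 9.0, the same values as in the nested version.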
On the other hand, Variables are automatically watched by GradientTape.
By default GradientTape will automatically watch any trainable variables that are accessed inside the context. Source: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/GradientTape
So the above will look like,
x = tf.Variable(3.0)
x2 = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as g:
    y = x * x
    y2 = y * x2
dy_dx = g.gradient(y, x) # 6
dy2_dx2 = g.gradient(y2, x2) # 9
del g # Drop the reference to the tape
print(dy_dx)
print(dy2_dx2)
Of course, you can turn off the automatic watching by passing watch_accessed_variables=False. The examples may not be so practical but I hope this clears someone's confusion.
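A small sketch of what turning off automatic watching looks like, again assuming TensorFlow 2.x in eager mode; the names v, unwatched and watched are only for this illustration.

```python
import tensorflow as tf

v = tf.Variable(3.0)

# With automatic watching disabled, the variable is not recorded...
with tf.GradientTape(watch_accessed_variables=False) as tape:
    y = v * v
unwatched = tape.gradient(y, v)  # no gradient is available

# ...unless it is watched explicitly, just like a constant.
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(v)
    y = v * v
watched = tape.gradient(y, v)
```

Here unwatched is None, while watched is 6.0.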
Another way to look at the differences is:
tf.constant: fixed values, and hence not trainable.
tf.Variable: tensors (arrays) that are initialized in a session and are trainable (by trainable I mean they can be optimized and can change over time).
I am forwarding and backpropagating tensor data X through two simple nn.Module PyTorch model instances, model1 and model2.
I can't get this process to work without using the deprecated Variable API.
So this works just fine:
y1 = model1(X)
v = Variable(y1.data, requires_grad=training) # It's all about this line!
y2 = model2(v)
criterion = nn.NLLLoss()
loss = criterion(y2, y)
loss.backward()
y1.backward(v.grad)
self.step()
But this will throw an error:
y1 = model1(X)
y2 = model2(y1)
criterion = nn.NLLLoss()
loss = criterion(y2, y)
loss.backward()
y1.backward(y1.grad) # it breaks here
self.step()
>>> RuntimeError: grad can be implicitly created only for scalar outputs
I just can't seem to find a relevant difference between v in the first implementation, and y1 in the second. In both cases requires_grad is set to True. The only thing I could find was that y1.grad_fn=<ThnnConv2DBackward> and v.grad_fn=<ThnnConv2DBackward>
What am I missing here? What (tensor attributes?) do I not know about, and if Variable is deprecated, what other implementation would work?
[UPDATED]
You are not correctly passing y1.grad into y1.backward in the second example. After the first backward, the gradients of intermediate (non-leaf) tensors such as y1 are not kept; you need a special hook to extract those gradients. In your case you are passing None. Here is a small example to reproduce your case:
Code:
import torch
import torch.nn as nn
torch.manual_seed(42)
class Model1(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, x):
        return x.pow(3)
class Model2(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, x):
        return x / 2
model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()
X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
# We are going to backprop 2 times, so we need to
# retain_graph=True while first backward
loss.backward(retain_graph=True)
try:
    y1.backward(y1.grad)
except RuntimeError as err:
    print(err)
print('y1.grad: ', y1.grad)
Output:
grad can be implicitly created only for scalar outputs
y1.grad: None
So you need to extract them correctly:
Code:
def extract(V):
    """Gradient extractor.
    """
    def hook(grad):
        V.grad = grad
    return hook
model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()
X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
y1.register_hook(extract(y1))
loss.backward(retain_graph=True)
print('y1.grad:', y1.grad)
y1.backward(y1.grad)
Output:
y1.grad: tensor([[-0.1763, -0.2114, -0.0266, -0.3293, 0.0534]])
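As an alternative to a hand-written hook, a non-leaf tensor can be asked to keep its gradient with retain_grad(). The sketch below is not the code from the question: it uses a fixed input and a plain sum as the loss (both assumptions made only for this illustration) so that the gradients are easy to check by hand.

```python
import torch

X = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y1 = X.pow(3)        # same operation as Model1 above
y1.retain_grad()     # keep the gradient of this intermediate tensor
y2 = y1 / 2          # same operation as Model2 above
loss = y2.sum()      # assumption: a simple scalar loss for illustration

loss.backward()

# d(loss)/d(y1) = 0.5 for every element
# d(loss)/d(X)  = 0.5 * 3 * X**2
y1_grad = y1.grad
x_grad = X.grad
```

After the backward pass, y1_grad is [0.5, 0.5, 0.5] and x_grad is [1.5, 6.0, 13.5].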
After some investigation I came to the following two solutions.
The solution provided elsewhere in this thread retained the computation graph manually, without an option to free it, thus running fine initially but causing OOM errors later on.
The first solution is to tie the models together using the built-in torch.nn.Sequential, like so:
model = torch.nn.Sequential(Model1(), Model2())
It's as easy as that. It looks clean and behaves exactly like an ordinary model would.
The alternative is to simply tie them together manually:
model1 = Model1()
model2 = Model2()
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
loss.backward()
My fear that this would only backpropagate through model2 turned out to be unfounded, since model1 is also stored in the computation graph that is backpropagated over.
This implementation increases the transparency of the interface between the two models, compared to the previous implementation.
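If the two backward passes really do need to stay separate, as in the original Variable-based version, the deprecated Variable(y1.data, requires_grad=...) call can be replaced with detach() followed by requires_grad_(). The sketch below is only an illustration under assumptions: the two nn.Linear layers and the MSE loss stand in for the real models and criterion, and the last part just checks that the split version yields the same gradient for model1 as a plain end-to-end backward.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model1 = nn.Linear(4, 3)   # hypothetical stand-in for model1
model2 = nn.Linear(3, 2)   # hypothetical stand-in for model2
criterion = nn.MSELoss()
X = torch.randn(5, 4)
y = torch.randn(5, 2)

# Split version: cut the graph between the two models.
y1 = model1(X)
v = y1.detach().requires_grad_(True)  # replaces Variable(y1.data, ...)
loss = criterion(model2(v), y)
loss.backward()        # fills v.grad and the gradients of model2
y1.backward(v.grad)    # continues the backward pass into model1
split_grad = model1.weight.grad.clone()

# End-to-end version for comparison.
model1.zero_grad()
model2.zero_grad()
loss = criterion(model2(model1(X)), y)
loss.backward()
joint_grad = model1.weight.grad.clone()

same = torch.allclose(split_grad, joint_grad, atol=1e-6)
```

Both routes give the same gradient for model1, so the split is only worth it when the interface between the models has to be explicit.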
Currently, I have the following code:
x = tf.placeholder(tf.int32, name = "x")
y = tf.Variable(0, name="y")
y = 2*x**2 + 5
for i in range(1,10):
    print("Value of y for x = ",i, " is: ",sess.run(y, feed_dict={x:i}))
However, when I try to display this on tensorboard, this gets messy.
Ideally I'd want to do y= tf.Variable(2*x**2 +5) but tensorflow throws an error telling me that x is uninitialized.
Or perhaps I shouldn't use tf.Variable and use something else?
If you really want to do that with a tf.Variable, you can do that in two ways. You can use the desired expression as the initialization value for the variable. Then, when you initialize the variable, you pass the x value in the feed_dict.
import tensorflow as tf
# Placeholder shape must be specified or use validate_shape=False in tf.Variable
x = tf.placeholder(tf.int32, (), name="x")
# Initialization value for variable is desired expression
y = tf.Variable(2 * x ** 2 + 5, name="y")
with tf.Session() as sess:
    for i in range(1,10):
        # Initialize variable on each iteration
        sess.run(y.initializer, feed_dict={x: i})
        # Show value
        print("Value of y for x =", i , "is:", sess.run(y))
Alternatively, you can do the same thing with a tf.assign operation. In this case, you pass the x value when you run the assignment.
import tensorflow as tf
# Here placeholder shape is not strictly required as tf.Variable already gives the shape
x = tf.placeholder(tf.int32, name="x")
# Specify some initialization value for variable
y = tf.Variable(0, name="y")
# Assign expression value to variable
y_assigned = tf.assign(y, 2 * x** 2 + 5)
# Initialization can be skipped in this case since we always assign new value
# (the session has to use the default graph, where x and y were created)
with tf.Session() as sess:
    for i in range(1,10):
        # Assign value to variable in each iteration (and get value after assignment)
        print("Value of y for x =", i , "is:", sess.run(y_assigned, feed_dict={x: i}))
However, as pointed out by Nakor, you may not need a variable if y is simply supposed to be the result of that expression for whatever value x takes. The purpose of a variable is to hold a value that will be maintained in future calls to run. So you would need it only if you want to set y to some value depending on x, and then maintain the same value even if x changes (or even if x is not provided at all).
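The point about a variable keeping its value between calls to run can be sketched as follows. This assumes TensorFlow 2.x with the tensorflow.compat.v1 module standing in for the TF 1.x API used in this thread; the names assigned and kept are made up for the example.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF 2.x install, TF 1.x semantics wanted

with tf.Graph().as_default():
    x = tf.placeholder(tf.int32, shape=(), name="x")
    y = tf.Variable(0, name="y")
    y_assigned = tf.assign(y, 2 * x ** 2 + 5)

    with tf.Session() as sess:
        sess.run(y.initializer)
        # Set y from x once...
        assigned = sess.run(y_assigned, feed_dict={x: 3})
        # ...then read it back later without feeding x at all.
        kept = sess.run(y)
```

Both assigned and kept are 23 (2 * 3**2 + 5); the second run does not need x because the value lives in the variable.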
I think you misunderstood what a tf.Variable is. To quote the TensorFlow documentation:
A tf.Variable represents a tensor whose value can be changed by
running ops on it. Unlike tf.Tensor objects, a tf.Variable exists
outside the context of a single session.run call.
So variables will be your biases and weights in your neural network. These will vary when training your network. In Tensorflow, if you want to use your variables, you need to initialize them (using a constant or random value). That's what your error is about: you're defining y as a tf.Variable so it needs to be initialized.
However, your y is deterministic, it's not a tf.Variable. You can just remove the line where you define y and it works fine:
import tensorflow as tf
x = tf.placeholder(tf.int32, name = "x")
y = 2*x**2 + 5
with tf.Session() as sess:
    for i in range(1,10):
        print("Value of y for x = ",i, " is: ",sess.run(y, feed_dict={x:i}))
It returns:
Value of y for x = 1 is: 7
Value of y for x = 2 is: 13
Value of y for x = 3 is: 23
Value of y for x = 4 is: 37
Value of y for x = 5 is: 55
Value of y for x = 6 is: 77
Value of y for x = 7 is: 103
Value of y for x = 8 is: 133
Value of y for x = 9 is: 167
See the code snippet:
import tensorflow as tf
x = tf.Variable(1)
op = tf.assign(x, x + 1)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run([x, op]))
There are two possible results:
x=1 and op=2
x=2 and op=2
The result depends on the order of evaluation: in the first case, x is evaluated before op, and in the second case, x is evaluated after op.
I have run the code many times, but the result is always x=2 and op=2. So I guess that TensorFlow guarantees that x is evaluated after op. Is that right? And how does TensorFlow guarantee the dependency?
Update
For the case above, the result is deterministic. But in the following case, the result is not deterministic.
import tensorflow as tf
x = tf.Variable(1)
op = tf.assign(x, x + 1)
x = x + 0 # add this line
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for i in range(5):
        print(sess.run([x, op]))
In the first code, x is a Variable and op depends on x, so x is always evaluated after op. But in the second case, x becomes a Tensor, while op depends on the Variable x (after x = x + 0, the name x is overwritten). So op doesn't depend on the Tensor x.
The order in which tensors are evaluated is undefined. See the API docs (towards the very bottom, in the "Returns" info on Session.run()). As such, you should not rely on them being executed in a particular order. If you need to guarantee an order, you should probably use separate run calls for the different tensors/ops.
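A minimal sketch of the separate-run-calls approach, assuming TensorFlow 2.x with the tensorflow.compat.v1 module standing in for the TF 1.x API; the names before, after and final are only for the illustration.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF 2.x install, TF 1.x semantics wanted

with tf.Graph().as_default():
    x = tf.Variable(1)
    op = tf.assign(x, x + 1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Each run call finishes before the next one starts,
        # so the order is fixed by the Python code itself.
        before = sess.run(x)
        after = sess.run(op)
        final = sess.run(x)
```

Here before is 1, and both after and final are 2, regardless of how the ops inside a single run call would have been scheduled.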
This works:
with tf.device('/cpu:0'):
    # x = tf.get_variable('x', shape=(), initializer=tf.constant_initializer(1), dtype=tf.int32)
    x = tf.Variable(1)
    op = tf.assign(x, x+1)
    with tf.control_dependencies([op]):
        x = x + 0
        # x = tf.multiply(x, 3)
        # x = tf.add(x, 0)
but not always:
with tf.device('/cpu:0'):
    # x = tf.get_variable('x', shape=(), initializer=tf.constant_initializer(1), dtype=tf.int32)
    x = tf.Variable(1)
with tf.device('/gpu:0'): # add this line.
    op = tf.assign(x, x+1)
    with tf.control_dependencies([op]):
        x = x + 0
        # x = tf.multiply(x, 3)
        # x = tf.add(x, 0)
I think the problem comes from two things:
Some operation (like x = x + 0, which reads the value of the Variable and then adds 0) depends on the value of the Variable, while the Variable is changed by some assign operation (like op = tf.assign(x, x+1)). If there is no dependency between them, these two operations run in parallel, so when the value of the Variable is read, it is not certain whether the assign has already been done.
Transferring data between different devices. Even if there are dependencies, the result is still indeterminate.
In brief, if all variables and operations are on the same device, then tf.control_dependencies should guarantee that op runs before the add operation.
Note that:
(these notes below are not important but might help you)
tf.assign operations only update the Variable, not Tensor.
When you do x=x+0, the new x becomes a Tensor; but the op = tf.assign(x, x+1) returns the ref of Variable. So op should always be determinate because it depends on the current value of Variable which will not be changed by other operations.
GPU does not support int32 variable. When I run your code snippet on my machine (tf-gpu1.12), variable is created on CPU but ops are done on GPU. You can check the variables and ops by config = tf.ConfigProto(log_device_placement=True).
I am trying to implement a little tweaked version of the Batch Normalization operation; in which I need to keep the moving average values like mean and variance explicitly. In order to do that, I am doing some experimentation with assignment and control dependency mechanisms in the Tensorflow and I run into a mysterious problem. I have the following toy code; in which I am trying to test whether the tf.control_dependencies work as intended:
dataset = MnistDataSet(validation_sample_count=10000,
                       load_validation_from="validation_indices")
samples, labels, indices_list, one_hot_labels = \
    dataset.get_next_batch(batch_size=GlobalConstants.BATCH_SIZE)
samples = np.expand_dims(samples, axis=3)
flat_data = tf.contrib.layers.flatten(GlobalConstants.TRAIN_DATA_TENSOR)
mean = tf.Variable(name="mean", initial_value=tf.constant(100.0, shape=[784], dtype=tf.float32),
                   trainable=False, dtype=tf.float32)
a = tf.Variable(name="a", initial_value=5.0, trainable=False)
b = tf.Variable(name="b", initial_value=4.0, trainable=False)
c = tf.Variable(name="c", initial_value=0.0, trainable=False)
batch_mean, batch_var = tf.nn.moments(flat_data, [0])
b_op = tf.assign(b, a)
mean_op = tf.assign(mean, batch_mean)
with tf.control_dependencies([b_op, mean_op]):
    c = a + b
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
results = sess.run([c, mean], feed_dict={GlobalConstants.TRAIN_DATA_TENSOR: samples})
I am simply loading a data batch with each entry having 784 dimensions, calculate the moments of it and try to store the batch_mean into the variable mean. I trivially store the variable a's value into b as well.
In the last line, when I run the graph for the values of c and mean, I see c as 10, which is the expected value. But mean is still a vector of 100's and does not contain the batch mean. It is like the mean_op = tf.assign(mean, batch_mean) has not been executed.
What can be the reason of this? As far as I know, all operations in the tf.control_dependencies call must be executed before any operation in the following context; I explicitly call c here, which is in the context. Am I missing something?
This is a known "feature" of tf.Session.run(). The c and mean ops are independent, hence mean may be evaluated before c (which would update mean).
Here's a shorter version of this effect:
a = tf.Variable(name="a", initial_value=1.0, trainable=False)
b = tf.Variable(name="b", initial_value=0.0, trainable=False)
dependent_op = tf.assign(b, a * 3)
with tf.control_dependencies([dependent_op]):
    c = a + 1
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([c, b]))
    print(sess.run([b]))
The second evaluation of b is guaranteed to return [3.0]. But the first run may return either [2.0 3.0] or [2.0 0.0].
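One way to get the updated value in the same run call is to fetch a read of b that is itself created under the control dependency, instead of fetching the variable directly. This is a sketch under assumptions: it uses TensorFlow 2.x with the tensorflow.compat.v1 module standing in for the TF 1.x API, and b_after, c_val and b_val are names made up for the illustration.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF 2.x install, TF 1.x semantics wanted

with tf.Graph().as_default():
    a = tf.Variable(name="a", initial_value=1.0, trainable=False)
    b = tf.Variable(name="b", initial_value=0.0, trainable=False)
    dependent_op = tf.assign(b, a * 3)

    with tf.control_dependencies([dependent_op]):
        c = a + 1
        # This read op is created inside the block, so it can only
        # run after the assignment has finished.
        b_after = b.read_value()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        c_val, b_val = sess.run([c, b_after])
```

With this change the single run call returns 2.0 for c_val and 3.0 for b_val.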
I'm fairly new to tensorflow, tried to calculate argmin of a quadratic function. I want to see the value of x and y after each iteration. Code:
import tensorflow as tf
x = tf.Variable(1.0,name="x")
y = x**2 - 4*x + 3
alpha = 0.05
optimizer = tf.train.AdamOptimizer(learning_rate = alpha).minimize(y)
num_epochs = 20
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(num_epochs):
        print("Epoch: %d" %epoch)
        opt,x,result = sess.run([optimizer,x,y])
        print(result)
The error I get is argument has invalid type , must be a string or Tensor.
It works if I don't try to get the value of x, just y and opt.
In your line
opt,x,result = sess.run([optimizer,x,y])
you assign the evaluated result of the x operation to variable x - thus, in the next iteration, x is no longer tf.Variable(1.0,name="x") but the result from the previous iteration. Just use another name for the variable and it should work.
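A sketch of the renamed version, assuming TensorFlow 2.x with the tensorflow.compat.v1 module standing in for the TF 1.x API from the question; train_op, x_val and y_val are names chosen for this illustration.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF 2.x install, TF 1.x semantics wanted

with tf.Graph().as_default():
    x = tf.Variable(1.0, name="x")
    y = x**2 - 4*x + 3
    train_op = tf.train.AdamOptimizer(learning_rate=0.05).minimize(y)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(20):
            # Fetched values go into new names, so x stays a tf.Variable.
            _, x_val, y_val = sess.run([train_op, x, y])
            print("Epoch: %d, x = %f, y = %f" % (epoch, x_val, y_val))
```

Because x is never rebound, the same fetch list stays valid in every iteration, and x_val moves from 1.0 towards the minimum at x = 2.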