I'm trying to use TensorFlow from an IPython notebook. I've created a function that defines a placeholder and a variable. Since I'm a TensorFlow newbie, I did not initialize the variable properly and got an error saying I did not initialize a placeholder.
I have two cells, one with the function and one with a function call. Even after I fix the bug in the function (and rerun both cells, of course), I keep getting initialization errors.
The only way to get it to work is to restart the kernel, which pretty much defeats the purpose of a notebook; I might as well write a Python script.
This is mostly speculation without seeing your code, but from what I read I believe I know what you are doing wrong.
When using TensorFlow inside a notebook you have to be especially careful not to confuse graph-building code with evaluation code. You should define the computational graph only once, at the beginning. Executing functions that define the graph again will just build another subgraph (this probably also goes for your function which defines the placeholder and variables). The tf.global_variables_initializer operation should also be executed only once.
It is crucial to understand that the TensorFlow graph cannot be handled dynamically by the notebook, because Python does not actually control TensorFlow variables. Python in this case is just a meta-language for defining the graph and initiating computations.
So in the notebook, after building the graph exactly once, you can only call functions that wrap TensorFlow graph evaluation code; you cannot rerun graph-building code without resetting the kernel. Examples of methods that only evaluate an existing graph are session.run, other tf.Session methods, and similar evaluation methods like tensor.eval.
So, to make it clear: there is no way to change an already built graph without rebuilding it. In a notebook that usually means resetting the kernel (or discarding the old graph, for example with tf.reset_default_graph(), before rebuilding). Otherwise you just build new subgraphs over and over again (and have to initialize the new variables), which will at some point use up all available memory.
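If restarting the kernel is too disruptive, one option that may help is to give each run of the cell its own tf.Graph, so stale placeholders and variables from earlier runs never end up in the graph you evaluate. The following is only a minimal sketch under assumptions: it uses the TF 1.x-style API through tensorflow.compat.v1, and the function build_and_run and the names x, w and y are hypothetical stand-ins for your own code.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # only matters on TensorFlow 2.x


def build_and_run(feed_value):
    """Build a tiny graph in its own tf.Graph and evaluate it once.

    Hypothetical helper: the placeholder/variable here stand in for
    whatever your own graph-building function defines.
    """
    graph = tf.Graph()  # fresh graph, independent of earlier cell runs
    with graph.as_default():
        # Graph-building code: only defines nodes, computes nothing yet.
        x = tf.placeholder(tf.float32, shape=(), name="x")
        w = tf.Variable(2.0, name="w")
        y = x * w
        init = tf.global_variables_initializer()

    # Evaluation code: runs nodes of the graph built above.
    with tf.Session(graph=graph) as sess:
        sess.run(init)
        value = sess.run(y, feed_dict={x: feed_value})
    return value, len(graph.get_operations())


# Calling the function twice does not accumulate nodes,
# because every call starts from an empty graph.
result, n_ops_first = build_and_run(3.0)
_, n_ops_second = build_and_run(3.0)
```

Rerunning the cell then rebuilds everything from scratch instead of appending to the default graph, at the cost of re-initializing the variables each time.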
Related
I'm trying to use tfd.TransformedDistribution to apply a chain of bijectors to modify a bivariate Gaussian distribution, and I'm getting the error noted above ("AttributeError: Tensor.name is meaningless when eager execution is enabled."). I'm using TensorFlow 2.0 (Python) and TensorFlow Probability 0.9.0 in a Jupyter Notebook hosted in a Chrome browser, version 94.0.4606.61. The call that appears to provoke the error is this:
x_dist = tfd.TransformedDistribution(z, chain_of_bijectors)
Some of the chained bijectors have been subclassed using naming conventions similar to what is shown below, but the error happens even when I use a single bijector (i.e., even one derived directly from TensorFlow's library of bijectors). The bijectors appear to work normally (with no errors) when used in a scrutinized sequence that resembles the chain.
Example code snippet of a typical subclassed bijector:
class MyBijector(tfb.Bijector):
    def __init__(self, validate_args=False, name='my_bijector'):
        super(MyBijector, self).__init__(
            validate_args=validate_args,
            forward_min_event_ndims=0,
            name=name
        )
To resolve the error, I have tried different variations of the subclass names (for the two __init__ calls), and removing the names altogether. (The fact that the same error occurs even when a single, non-subclassed bijector is used in the function call seems to suggest the issue is not really with the names of the bijectors.) I also tried disabling eager execution (which seems unnecessary). When eager execution was disabled, the code ran normally until the same call, and then it produced a different error related to the chain of bijectors: "ValueError: 'chain_of_[...string of mostly bijector names omitted here...]/forward/add:0' is not a valid scope name".
Can anyone explain the cause of the AttributeError and how to fix it? If eager execution must be disabled to run this code, how can I fix the ValueError? Thanks!
Never mind. I figured out the problem: in the function call listed above ("x_dist = tfd.TransformedDistribution(z, chain_of_bijectors)"), z was a sample from an underlying distribution rather than the distribution itself, which caused the error. The error went away once I passed z as an actual distribution object instead of a sample drawn from such an object.
I have to call a couple of numpy operations on TF-Tensors for which I didn't find any equivalent tensorflow functions. As far as I know the easiest (and only?) way to convert a tensor into a numpy array is to run a session. Unfortunately at this point the graph is not completely built yet, so I have to close this session to continue building up the graph and then later start a new session after the graph is built. When I start a new session my data loader chooses different samples (because of shuffling I guess) than it did in the first session so this doesn't work.
Is there another way to convert my tensors to np.arrays without running the graph?
Is it generally sensible to run more than one session if I want to use the same data samples to propagate through the graph?
Any ideas to tackle this problem?
I already tried to use tf.python_func() but this only seems to work with eager execution. I would prefer not to change my code base to use eager execution, so I'm looking for a different way.
Thanks in advance and sorry for being a tensorflow noob.
Description
I wrap a TensorFlow model with a loss function in a model_fn() for a tf.estimator.Estimator instantiation.
Various optimizers (e.g. tf.train.MomentumOptimizer, or tf.train.AdagradOptimizer) create Read(_x) operations for all trainable variables in the model (gamma, beta of batch normalization, kernels of convolutions, ...) when optimizer.apply_gradients() is called.
Specifically, they are named as Read_<num>/ReadVariableOp, and are of type tf.Operation.
The problem is that these operations are created outside of any tf.variable_scope or tf.name_scope, at the root of the tf.Graph.
This totally messes up readability in TensorBoard:
The red frame is the actual model. The blue frame encapsulates a tiny fraction of all 300+ read functions.
Question
Is there a way to wrap all Read_<num> operations in something like a tf.name_scope()? Or is there another way to programmatically (i.e. not by clicking with the mouse) remove them from TensorBoard?
What I have tried
Wrap the call to apply_gradients() like this:
with tf.name_scope('apply_gradients_to_ns'):
    with tf.variable_scope('apply_gradients_to_vs'):
        minimize_op = optimizer.apply_gradients(
            grads_and_vars=grads_and_vars,
            global_step=tf.train.get_or_create_global_step(),
            name='apply_gradients_to_name'
        )
with no effect. Still, the scope of all Read_<num> operations is not influenced.
Trace the creation of these operations:
In tensorflow.python.training.slot_creator.py, line 179, in create_zeros_slot(..., colocate_with_primary=True) triggers a call to tensorflow.python.ops.variable_scope.py, line 1298, get_variable(..., use_resource=None) with use_resource=True.
However, I don't want to mess around in the source code of TensorFlow.
Also, I conclude that this behavior is intended and that I am just using it wrong.
How should it be used?
Try different distribution strategies: OneDeviceStrategy, MirroredStrategy.
Both produce a similar effect.
The only difference is that the MirroredStrategy creates group_deps_<num> variables.
Code to reproduce this effect, a more detailed description, and two more screenshots can be found in this repository: https://github.com/patzm/tf-estimator-distribute-so
I have a surprisingly simple question: I have implemented a complex custom op and its gradient in TensorFlow. Assuming the forward pass is correct, is there an easy way to check whether finite differences approximate the custom gradient well at different points, without having to re-implement it in an ugly way? I saw the function tf.test.compute_gradient_error() in the official docs, but the source code is dense and hard to read, and I cannot seem to find any other related questions or examples.
However, I am sure there is a super simple self-contained example lying around that I missed?
EDIT:
For instance if I try:
import tensorflow as tf
import numpy as np
start=np.random.normal(size=(100,1)).astype("float32")
x=tf.Variable(start)
w=2*tf.ones((1,1),dtype="float32")
y=tf.matmul(x,w)
#I differentiate y wrt x, which is a variable
check=tf.test.compute_gradient_error(x,[100,1],y,[100,1],x_init_value=start)
sess=tf.Session()
sess.run(tf.initialize_all_variables())
sess.run(check)
It throws:
AttributeError: 'NoneType' object has no attribute 'run'
Looking into gradient_checker.py, what am I doing wrong?
So my problem was that gradient_checker.py calls get_default_session() to get the session it uses, which apparently does not work if the op is not explicitly connected to the session in use, which is done by scoping the op:
with sess.as_default():
    check=tf.test.compute_gradient_error(x,[100,1],y,[100,1],x_init_value=start)
    print(check)
It also needs to be said that the reason it has to be this way is that check is already the result of a sess.run() call (a plain number), not a node in the graph like the return value of most TensorFlow functions.
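For intuition about what such a check compares, here is a framework-agnostic sketch in plain Python. It is not the TensorFlow implementation: numeric_gradient, max_gradient_error, f and grad_f are hypothetical names, and the function being checked is a made-up quadratic. It estimates the gradient with central finite differences and reports the largest absolute difference from the hand-written gradient.

```python
def numeric_gradient(f, x, eps=1e-5):
    """Central finite-difference estimate of df/dx for a scalar f of a list x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps   # perturb only coordinate i upwards
        x_minus[i] -= eps  # and downwards
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * eps))
    return grad


def max_gradient_error(f, analytic_grad, x, eps=1e-5):
    """Largest absolute gap between the analytic and numeric gradients at x."""
    numeric = numeric_gradient(f, x, eps)
    analytic = analytic_grad(x)
    return max(abs(a - n) for a, n in zip(analytic, numeric))


# Hypothetical function under test: f(x) = sum(2 * x_i ** 2),
# whose exact gradient is df/dx_i = 4 * x_i.
def f(x):
    return sum(2.0 * xi ** 2 for xi in x)


def grad_f(x):
    return [4.0 * xi for xi in x]


point = [0.5, -1.0, 3.0]
error = max_gradient_error(f, grad_f, point)

# A deliberately wrong gradient (2 * x_i) should produce a large error.
wrong_error = max_gradient_error(f, lambda x: [2.0 * xi for xi in x], point)
```

A small error at several different points suggests the custom gradient is consistent with the forward computation; a large one points to a bug in either of them.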
I am trying to register a python function and its gradient as a tensorflow operation.
I found many useful examples e.g.:
Write Custom Python-Based Gradient Function for an Operation? (without C++ Implementation)
https://programtalk.com/python-examples/tensorflow.python.framework.function.Defun/
Nonetheless I would like to register attributes in the operation and use these attributes in the gradient definition by calling op.get_attr('attr_name').
Is this possible without going down to a C++ implementation?
Could you give me an example?
Unfortunately I don't believe it is possible to add attributes without using a C++ implementation of the operation. One feature that may help though is that you can define 'private' attributes by prepending an underscore to the start. I'm not sure if this is well documented or what the long-term guarantees are, but you can try setting '_my_attr_name' and you should be able to retrieve it later.