I have a seemingly simple question. I have implemented a complex custom op and its gradient in TensorFlow. Assuming the forward pass is correct, is there an easy way to check that finite differences approximate my custom gradient well at different points, without having to re-implement the check myself in an ugly way? I saw the function tf.test.compute_gradient_error() in the official docs, but its source code is dense and hard to read, and I cannot find any related questions or examples.
However, I am sure there is a simple, self-contained example lying around that I missed?
EDIT:
For instance if I try:
import tensorflow as tf
import numpy as np
start=np.random.normal(size=(100,1)).astype("float32")
x=tf.Variable(start)
w=2*tf.ones((1,1),dtype="float32")
y=tf.matmul(x,w)
#I differentiate y wrt x, which is a variable
check=tf.test.compute_gradient_error(x,[100,1],y,[100,1],x_init_value=start)
sess=tf.Session()
sess.run(tf.initialize_all_variables())
sess.run(check)
It throws:
AttributeError: 'NoneType' object has no attribute 'run'
I have looked into gradient_checker.py, but what am I doing wrong?
So my problem was that gradient_checker.py calls get_default_session() to get the session it uses, and that call returns None unless a session has been installed as the default one. The fix is to run the check inside a block that makes the session the default:
with sess.as_default():
    check=tf.test.compute_gradient_error(x,[100,1],y,[100,1],x_init_value=start)
    print(check)
It is also worth saying why it has to be this way: unlike most TensorFlow functions, compute_gradient_error() does not return a node in the graph. It runs the graph internally and returns the resulting number directly, so check is already a plain value and should not be passed to sess.run().
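For readers who want to see what such a check does, here is a minimal NumPy-only sketch of a finite-difference gradient check, applied to the same toy op as above (y = x @ w with w = 2). It is not TensorFlow's implementation: the helper names numeric_jacobian, forward and analytic_jacobian are made up for illustration, and the step size delta=1e-3 is an assumption. The idea, as far as I understand the checker, is the same: build a numeric Jacobian from central differences and report the largest absolute deviation from the analytic one.

```python
import numpy as np

# Hypothetical toy op from the question: y = x @ W, with W = [[2.]]
W = 2.0 * np.ones((1, 1))


def forward(x):
    """Forward pass of the toy op."""
    return x @ W


def analytic_jacobian(x):
    """Hand-derived Jacobian: dy_i/dx_j = 2 if i == j, else 0."""
    return 2.0 * np.eye(x.size)


def numeric_jacobian(f, x, delta=1e-3):
    """Central-difference Jacobian of f at x, shape (y.size, x.size)."""
    x = np.asarray(x, dtype=np.float64)
    jac = np.zeros((f(x).size, x.size))
    for i in range(x.size):
        # Perturb one input coordinate at a time.
        step = np.zeros(x.size)
        step[i] = delta
        step = step.reshape(x.shape)
        jac[:, i] = ((f(x + step) - f(x - step)) / (2.0 * delta)).ravel()
    return jac


rng = np.random.default_rng(0)
x0 = rng.normal(size=(100, 1))

# Largest absolute disagreement between the two Jacobians.
max_error = np.max(np.abs(numeric_jacobian(forward, x0) - analytic_jacobian(x0)))
print(max_error)
```

Because the toy op is linear, the central difference is exact up to rounding, so the printed error should be tiny. For a nonlinear op the error depends on delta, and it is worth repeating the check at several random points.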
I'm trying to use tfd.TransformedDistribution to apply a chain of bijectors to modify a bivariate Gaussian distribution, and I'm getting the error noted above ("AttributeError: Tensor.name is meaningless when eager execution is enabled."). I'm using TensorFlow 2.0 (Python) and TensorFlow Probability 0.9.0 in a Jupyter Notebook hosted in a Chrome browser, version 94.0.4606.61. The call that appears to provoke the error is this:
x_dist = tfd.TransformedDistribution(z, chain_of_bijectors)
Some of the chained bijectors have been subclassed using naming conventions similar to what is shown below, but the error happens even when I use a single bijector (i.e., even one taken directly from TensorFlow Probability's library of bijectors). The bijectors appear to work normally (with no errors) when I apply them manually, one after another, in a sequence that resembles the chain.
Example code snippet of a typical subclassed bijector:
class MyBijector(tfb.Bijector):
    def __init__(self, validate_args=False, name='my_bijector'):
        super(MyBijector, self).__init__(
            validate_args=validate_args,
            forward_min_event_ndims=0,
            name=name
        )
To resolve the error, I have tried different variations of the subclass names (in the two __init__ calls), and removing the names altogether. (The fact that the same error occurs even when a single, non-subclassed bijector is used in the function call suggests that the issue is not really with the names of the bijectors.) I also tried disabling eager execution (which seems like it should be unnecessary). With eager execution disabled, the code ran normally until the same call, and then it produced a different error related to the chain of bijectors: "ValueError: 'chain_of_[...string of mostly bijector names omitted here...]/forward/add:0' is not a valid scope name".
Can anyone explain the cause of the AttributeError and how to fix it? If eager execution must be disabled to run this code, how can I fix the ValueError? Thanks!
Never mind, I figured out the problem. In the function call listed above ("x_dist = tfd.TransformedDistribution(z, chain_of_bijectors)"), z was a sample from an underlying distribution rather than the distribution itself, and that caused the error. The error went away once I passed an actual distribution object as z instead of a sample drawn from such an object.
I'm looking for a function for linear interpolation in TensorFlow, similar to np.interp(..).
I'm aware that TensorFlow can wrap any NumPy function and apply it to tensors, but
np.interp operates on a single 1-D array at a time and, as far as I can tell, cannot be broadcast.
So is there an efficient way to do this using TensorFlow?
Thank you
I know this is a late answer, but Google brought me here, so my answer might be useful for others.
You can use interp_regular_1d_grid from TensorFlow Probability (tfp.math.interp_regular_1d_grid).
It works in a similar fashion to numpy.interp(), except that, as the name says, the reference points must lie on a regular grid. Consult the documentation for the exact functionality.
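To make the idea concrete, here is a minimal NumPy sketch of linear interpolation on a regular grid that accepts query points of any shape. It is only an illustration of the technique, not the TensorFlow Probability implementation: the function name interp_regular_grid and its argument names are made up here, and the choice to clamp out-of-range queries to the end values (which matches np.interp's default behaviour) is an assumption.

```python
import numpy as np


def interp_regular_grid(x, x_min, x_max, y_ref):
    """Linearly interpolate y_ref, sampled at evenly spaced points on
    [x_min, x_max], at the query points x (any shape).

    Queries outside [x_min, x_max] are clamped to the end values.
    y_ref must be 1-D with at least two entries.
    """
    x = np.asarray(x, dtype=np.float64)
    y_ref = np.asarray(y_ref, dtype=np.float64)
    n = y_ref.shape[0]

    # Fractional position of each query on the grid, in [0, n - 1].
    pos = (np.clip(x, x_min, x_max) - x_min) / (x_max - x_min) * (n - 1)

    # Index of the left neighbour, kept in range so lo + 1 is valid.
    lo = np.clip(np.floor(pos).astype(int), 0, n - 2)
    frac = pos - lo

    # Weighted average of the two neighbouring reference values.
    return (1.0 - frac) * y_ref[lo] + frac * y_ref[lo + 1]


# Usage: a whole batch of queries is interpolated in one vectorized call.
grid = np.linspace(0.0, 2.0, 5)
values = grid ** 2
queries = np.random.default_rng(0).uniform(-0.5, 2.5, size=(3, 4))
result = interp_regular_grid(queries, 0.0, 2.0, values)
print(result.shape)
```

Because the grid is regular, the neighbouring indices come from simple arithmetic rather than a search, which is what makes this formulation easy to vectorize and to port to tensor libraries.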
I have to call a couple of numpy operations on TF-Tensors for which I didn't find any equivalent tensorflow functions. As far as I know the easiest (and only?) way to convert a tensor into a numpy array is to run a session. Unfortunately at this point the graph is not completely built yet, so I have to close this session to continue building up the graph and then later start a new session after the graph is built. When I start a new session my data loader chooses different samples (because of shuffling I guess) than it did in the first session so this doesn't work.
Is there another way to convert my tensors to np.arrays without running the graph?
Is it generally sensible to run more than one session if I want to use the same data samples to propagate through the graph?
Any ideas to tackle this problem?
I already tried to use tf.python_func(), but this only seems to work with eager execution. I would prefer not to change my code base to use eager execution, so I'm looking for a different way.
Thanks in advance, and sorry for being a TensorFlow noob.
I'm trying to use TensorFlow from an IPython notebook. I've created a function that defines a placeholder and a variable. Since I'm a TensorFlow newbie, I did not initialize the variable properly and got an error saying I did not initialize a placeholder.
I have two cells, one with the function and one with a function call. No matter how much I fix the function (and rerun both cells, of course), I keep getting initialization errors even after I fix the bug.
The only way to get it to work is to restart the kernel, which pretty much defeats the purpose of a notebook: I might as well just write a Python script.
This is mostly speculation without seeing your code, but from what I read I believe I know what you are doing wrong.
When using TensorFlow inside a notebook, you have to be especially careful not to confuse graph-building code with evaluation code. You should define the computational graph only once, at the beginning. Executing functions that define the graph again will just build another subgraph (this probably also goes for your function that defines the placeholder and the variable). The tf.global_variables_initializer operation should also be executed only once.
It is crucial to understand that the TensorFlow graph cannot be handled dynamically by the notebook, because Python does not actually control TensorFlow variables. Python in this case is just a meta-language for defining the graph and initiating computations.
So in the notebook, after building the graph exactly once, you can only call functions that wrap TensorFlow graph evaluation code, not graph-building code, without resetting the kernel. Examples of methods that only evaluate an existing graph are session.run, other tf.Session methods, and similar evaluation methods like tensor.eval.
So, to make it clear: there is no way to change an already built graph without rebuilding it, which in this case requires resetting the kernel, unless you just build new subgraphs over and over again (and initialize the new variables), but that will at some point use up all available memory.
Theano has Ops and functions.
What is the difference?
Functions seem nice and easy to define,
eg:
x = T.dmatrix('x')
linmax = function([x], T.maximum(x,0))
Ops seem complex to define, with all the abstract classes and such,
but things like theano.tensor.tanh and theano.tensor.nnet.sigmoid are defined as Ops.
I'm not too sure about the difference.
How would I write the above linmax function as an Op?
theano.function() returns a Python object that is callable. So you can use it to do the computation you described when it is called.
Theano Ops are part of the symbolic graph that describes the computation you want. Keep in mind that Theano works in two steps, like C and many other languages: you first describe the computation you want, then you compile it. In C, you describe the computation in a text file. In Theano, you describe it with a Theano symbolic graph, and that graph includes Ops.
Then you compile, possibly with gcc for C, and with theano.function() in Theano.
So an Op is an element of the symbolic graph. It describes the computation done at one point in the graph. This page in the Theano tutorial describes the graph in more detail:
http://deeplearning.net/software/theano/tutorial/symbolic_graphs.html#theano-graphs
This page describes how to make an Op in Theano:
http://deeplearning.net/software/theano/tutorial/extending_theano.html
You can skip the sections marked as optional. If you don't plan to write an Op and just want to understand how Ops are used, you can skip most of that page.