What is the difference between an Op and a Function - python

Theano has Ops and functions.
What is the difference?
Functions seem nice and easy to define,
eg:
x = T.dmatrix('x')
linmax = function([x], T.maximum(x,0))
Ops seem complex to define, with all the abstract classes and such,
but things like theano.tensor.tanh and theano.tensor.nnet.sigmoid are defined as Ops.
I'm not too sure about the difference.
How would I write the above linmax function as an Op?

theano.function() returns a callable Python object. Calling it performs the computation you described.
Theano Ops are elements of the symbolic graph that describes the computation you want. Do not forget that Theano has two steps, like many other languages such as C: you first describe the computation you want, then you compile it. In C, you describe the computation in a text file; in Theano, you describe it with a symbolic graph, and that graph is made of Ops.
Then you compile: with gcc (for example) for C, and with theano.function() in Theano.
So an Op is an element of the symbolic graph. It describes the computation done at one point in the graph. This page of the Theano tutorial describes the graph in more detail:
http://deeplearning.net/software/theano/tutorial/symbolic_graphs.html#theano-graphs
This page describes how to make an Op in Theano:
http://deeplearning.net/software/theano/tutorial/extending_theano.html
You can skip the optional sections, so you can skip most of that page if you don't plan to write an Op and just want to understand the usage.

Related

Checking custom gradient easily with Tensorflow

I have a surprisingly simple question: I have implemented a complex custom op and its gradient in TensorFlow. Assuming the forward pass is correct, I was wondering if there is an easy way to check that finite differences approximate your custom gradient well at different points, without having to re-implement it in an ugly way. I saw the function tf.test.compute_gradient_error() in the official docs, but the source code is dense and hard to read, and I cannot seem to find any other related questions or examples.
Surely there is a super simple, self-contained example lying around that I missed?
EDIT:
For instance if I try:
import tensorflow as tf
import numpy as np
start = np.random.normal(size=(100, 1)).astype("float32")
x = tf.Variable(start)
w = 2 * tf.ones((1, 1), dtype="float32")
y = tf.matmul(x, w)
# I differentiate y wrt x, which is a variable
check = tf.test.compute_gradient_error(x, [100, 1], y, [100, 1], x_init_value=start)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
sess.run(check)
It throws:
AttributeError: 'NoneType' object has no attribute 'run'
Looking into gradient_checker.py, what am I doing wrong?
So my problem was that gradient_checker.py calls get_default_session() to get the session it uses, which does not work unless the op is explicitly attached to the session in use. That is done by scoping the call:
with sess.as_default():
    check = tf.test.compute_gradient_error(x, [100, 1], y, [100, 1], x_init_value=start)
    print check
It also needs to be said that the reason it has to be this way is that check is directly the result of a sess.run() on a tensor, not a node in the graph like most TensorFlow functions return.
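The finite-difference idea itself is framework-independent, so one way to sanity-check a custom gradient without digging into gradient_checker.py is a plain NumPy central-difference check. The helper below is my own sketch, not a TensorFlow API:

```python
import numpy as np

def finite_diff_grad(f, x, eps=1e-5):
    """Central-difference approximation of df/dx for a scalar-valued f."""
    x = x.astype(float).copy()
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        # Perturb one coordinate in each direction and difference the results.
        x[i] = old + eps
        f_plus = f(x)
        x[i] = old - eps
        f_minus = f(x)
        x[i] = old
        grad[i] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

# Compare against a hand-written gradient: f(x) = sum(x**2) has gradient 2*x.
x0 = np.random.randn(100, 1)
analytic = 2 * x0
numeric = finite_diff_grad(lambda v: np.sum(v ** 2), x0)
max_err = np.max(np.abs(analytic - numeric))
```

To check a TensorFlow op with this, f would wrap a sess.run of the forward pass with the perturbed value fed in, and the analytic gradient would come from tf.gradients.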

Defining tensorflow operations in python with attributes

I am trying to register a python function and its gradient as a tensorflow operation.
I found many useful examples e.g.:
Write Custom Python-Based Gradient Function for an Operation? (without C++ Implementation)
https://programtalk.com/python-examples/tensorflow.python.framework.function.Defun/
Nonetheless, I would like to register attributes in the operation and use these attributes in the gradient definition by calling op.get_attr('attr_name').
Is this possible without going down to a C implementation?
Could you give me an example?
Unfortunately, I don't believe it is possible to add attributes without using a C++ implementation of the operation. One feature that may help, though, is that you can define 'private' attributes by prepending an underscore to the name. I'm not sure if this is well documented or what the long-term guarantees are, but you can try setting '_my_attr_name' and you should be able to retrieve it later.

Which mathematical method is used by odeint?

I'm working with scipy.integrate.odeint and want to understand it better. For this I have two slightly related questions:
Which mathematical method is it using? Runge-Kutta? Adams-Bashforth? I found this site, but it seems to be for C++; as far as I know, the Python function uses the compiled version under the hood as well... It states that it switches automatically between implicit and explicit solvers; does anybody know how it does this?
To understand/reuse the information, I would like to know at which time points it evaluates the function and how exactly it computes the solution of the ODE, but full_output does not seem to help, and I wasn't able to find out how. To be more precise, an example with Runge-Kutta-Fehlberg: I want the different time points at which it evaluated f and the weights it used to multiply them.
Additional information (what for this Info is needed):
I want to reuse this information for automatic differentiation. I would call odeint as a black box, find out all the relevant steps it made, and reuse this info to calculate the derivative dx(T_end)/dx0.
If you know of any other method to solve my problem, please go ahead. The same goes if another ODE solver might be more appropriate to do this.
PS: I'm new here, so would it be better to split this question into two? I.e., separate 1. and 2.?
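For reference on the first point: scipy.integrate.odeint wraps LSODA from ODEPACK (a Fortran library), which switches automatically between an Adams method for non-stiff regions and BDF for stiff ones. full_output=True exposes some per-step information, such as the step sizes ('hu') and which method was used at each step ('mused'), though not the internal function-evaluation points and weights. A small sketch:

```python
import numpy as np
from scipy.integrate import odeint

def f(y, t):
    # Simple linear decay ODE: dy/dt = -0.5 * y, exact solution exp(-0.5 * t)
    return -0.5 * y

t = np.linspace(0, 10, 11)
y, info = odeint(f, 1.0, t, full_output=True)

# info['hu']    : step size successfully used up to each output time point
# info['mused'] : method used per step, 1 = Adams (non-stiff), 2 = BDF (stiff)
```

This still treats the solver as a black box between output points, which is why extracting exact stage points and weights for automatic differentiation is not really possible with odeint.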

How can I implement a recursive neural network in TensorFlow?

Is there some way of implementing a recursive neural network like the one in [Socher et al. 2011] using TensorFlow?
Note that this is different from recurrent neural networks, which are nicely supported by TensorFlow.
The difference is that the network is not replicated into a linear sequence of operations, but into a tree structure.
I imagine that I could use the While op to construct something like a breadth-first traversal of the tree data structure for each entry of my dataset.
Maybe it would be possible to implement tree traversal as a new C++ op in TensorFlow, similar to While (but more general)?
Your guess is correct, you can use tf.while_loop and tf.cond to represent the tree structure in a static graph. More info:
https://github.com/bogatyy/cs224d/tree/master/assignment3
In my evaluation, it makes training 16x faster compared to re-building the graph for every new tree.
Currently, these models are very hard to implement efficiently and cleanly in TensorFlow because the graph structure depends on the input. That also makes it very hard to do minibatching. It is possible using things like the while loop you mentioned, but doing it cleanly isn't easy.
You can build a new graph for each example, but this will be very annoying. If, for a given input size, you can enumerate a reasonably small number of possible graphs you can select between them and build them all at once, but this won't be possible for larger inputs.
You can also route examples through your graph with complicated tf.gather logic and masks, but this can also be a huge pain.
Ultimately, building the graph on the fly for each example is probably the easiest, and there is a chance that there will be alternatives in the future that support better imperative-style execution. But as of v0.8 I would expect this to be a bit annoying and to introduce some overhead, as Yaroslav mentions in his comment.
Edit: Since I answered, here is an example using a static graph with while loops: https://github.com/bogatyy/cs224d/tree/master/assignment3
I am not sure how performant it is compared to custom C++ code for models like this, although in principle it could be batched.
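The "static structure, dynamic data" idea behind both the while-loop and the gather approaches can be sketched framework-agnostically: store the tree as arrays of child indices, topologically ordered so children precede parents, and combine states in a single loop. A NumPy toy version (the names and the tanh combine rule are illustrative):

```python
import numpy as np

rng = np.random.RandomState(0)
d = 4  # embedding size

# Toy tree over 3 leaves: node 3 = combine(0, 1), node 4 = combine(3, 2).
# Children always have smaller indices than their parent (topological order),
# so one forward pass over the node arrays evaluates the whole tree.
left  = np.array([-1, -1, -1, 0, 3])
right = np.array([-1, -1, -1, 1, 2])
n_leaves, n_nodes = 3, 5

W = rng.randn(2 * d, d)                      # composition weights
states = np.zeros((n_nodes, d))
states[:n_leaves] = rng.randn(n_leaves, d)   # leaf embeddings

for i in range(n_leaves, n_nodes):
    # Gather the two child states and compose them into the parent state.
    children = np.concatenate([states[left[i]], states[right[i]]])
    states[i] = np.tanh(children @ W)

root = states[-1]  # representation of the whole tree
```

In TensorFlow, the Python loop becomes tf.while_loop and the indexing becomes tf.gather, so one static graph can serve every tree shape that fits the index arrays.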

Theano cost function

I am trying to learn how to use Theano. I work very frequently with survival analysis and I wanted therefore to try to implement a standard survival model using Theano's automatic differentiation and gradient descent. The model that I am trying to implement is called the Cox model and here is the wikipedia article: https://en.wikipedia.org/wiki/Proportional_hazards_model
Very helpfully, they have written there the partial likelihood function, which is what is maximized when estimating the parameters of a Cox model. I am quite new to Theano and as a result am having difficulties implementing this cost function and so I am looking for some guidance.
Here is the code I have written so far. My dataset has 137 records and hence the reason I hard-coded that value. T refers to the tensor module, and W refers to what the wikipedia article calls beta, and status is what wikipedia calls C. The remaining variables are identical to wikipedia's notation.
def negative_log_likelihood(self, y, status):
    v = 0
    for i in xrange(137):
        if T.eq(status[i], 1):
            v += T.dot(self.X[i], self.W)
            u = 0
            for j in xrange(137):
                if T.gt(y[j], y[i]):
                    u += T.exp(T.dot(self.X[j], self.W))
            v -= T.log(u)
    return T.sum(-v)
Unfortunately, when I run this code, I am unhappily met with an infinite recursion error, which I hoped would not happen. This makes me think that I have not implemented this cost function in the way that Theano would like and so I am hoping to get some guidance on how to improve this code so that it works.
You are mixing symbolic and non-symbolic operations, but this doesn't work.
For example, T.eq returns a non-executable symbolic expression representing the idea of comparing two things for equality; it doesn't actually do the comparison there and then. Since T.eq returns a Python object, and any non-None object reference is treated as truthy in Python, execution will always continue inside the if statement.
If you need to construct a Theano computation involving conditionals, you need to use one of its two symbolic conditional operations: T.switch or theano.ifelse.ifelse. See the documentation for examples and details.
You are also using Python loops, which is probably not what you need. To construct a Theano computation that explicitly loops, you need to use theano.scan. However, if you can express your computation in terms of matrix operations (dot products, reductions, etc.), it will run much, much faster than something using scan.
I suggest you work through some more Theano tutorials before trying to implement something complex from scratch.
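As an illustration of the matrix-operation route, here is a NumPy sketch of the same negative log partial likelihood with no Python-level conditionals: a risk-set matrix replaces the inner T.gt loop, and a boolean mask replaces the T.eq check. Translating it to Theano tensor operations is then mechanical. The function name is my own:

```python
import numpy as np

def cox_neg_log_partial_likelihood(X, W, y, status):
    # scores[i] = x_i . W, computed for every record at once
    scores = X @ W
    # at_risk[i, j] = 1 where y[j] > y[i], mirroring the inner T.gt loop
    at_risk = (y[None, :] > y[:, None]).astype(float)
    # events: records with status == 1, mirroring the T.eq check
    events = status > 0
    # per-event denominator: sum of exp(scores) over that event's risk set
    log_denom = np.log(at_risk[events] @ np.exp(scores))
    return -np.sum(scores[events] - log_denom)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
W = np.array([0.1, -0.2])
y = np.array([1.0, 2.0, 3.0, 4.0])
status = np.array([1, 1, 0, 0])
nll = cox_neg_log_partial_likelihood(X, W, y, status)
```

Written this way, the whole cost is a handful of dot products and reductions, which is exactly the form that differentiates cleanly and runs fast in Theano.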
