I want to implement the following sigmoid function with a custom slope parameter k.
y = f(x) = 1 / (1 + exp(-k * x))
gradient gy = k * f(x) * (1 - f(x))
I want to use this in my autoencoder. How do I implement this in Chainer?
If k is constant (i.e., a hyperparameter), F.sigmoid(k * x) should just work.
If k is a parameter that should be learned in the same way as other weights, you may want to write your own link modeled on L.PReLU, and use it just like other links, e.g. L.Linear and L.Convolution2D. You can still implement the forward method of the link with the simple expression above.
An activation function with its own backward pass should be a subclass of chainer.FunctionNode (FunctionNode docs). An example of this is the Swish function provided by the Chainer library. You can look at its source here and copy it (or any other function, such as tanh), then change its forward and backward definitions to fit your needs.
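As a framework-independent sketch of the forward and backward computations such a function would need, the following uses plain NumPy rather than the Chainer API. The function names are made up for illustration, and the gradient with respect to k is only relevant if k is learned. A finite-difference check is included to confirm the formula gy = k * f(x) * (1 - f(x)) from the question.

```python
import numpy as np

def scaled_sigmoid_forward(x, k):
    """y = 1 / (1 + exp(-k * x))."""
    return 1.0 / (1.0 + np.exp(-k * x))

def scaled_sigmoid_backward(x, k, gy):
    """Return (gx, gk) for an upstream gradient gy of the same shape as x."""
    y = scaled_sigmoid_forward(x, k)
    dy = y * (1.0 - y)          # derivative of the sigmoid w.r.t. its argument k * x
    gx = gy * k * dy            # chain rule: d(k * x)/dx = k
    gk = np.sum(gy * x * dy)    # chain rule: d(k * x)/dk = x, summed because k is a scalar
    return gx, gk

x = np.linspace(-3.0, 4.0, 8)
k = 2.5
gx, gk = scaled_sigmoid_backward(x, k, np.ones_like(x))

# Central finite differences as an independent check of both gradients.
eps = 1e-6
num_gx = (scaled_sigmoid_forward(x + eps, k) - scaled_sigmoid_forward(x - eps, k)) / (2 * eps)
num_gk = np.sum(scaled_sigmoid_forward(x, k + eps) - scaled_sigmoid_forward(x, k - eps)) / (2 * eps)

print(np.max(np.abs(gx - num_gx)), abs(gk - num_gk))
```

Both printed differences should be close to zero. In Chainer itself, composing F.sigmoid(k * x) gets these gradients from autograd, so a hand-written backward is only needed if you want to control it explicitly.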
For backpropagation in PyTorch, the gradients of many simple functions are of course already implemented.
But what if I want a function that evaluates the gradient of an existing primitive function directly, e.g. the derivative of torch.sigmoid(x) with respect to x? I'd also like to be able to backpropagate through this new function.
The goal would be something like the following, but by using only torch.sigmoid instead of a custom (re-)implementation.
import torch
import matplotlib.pyplot as plt
def dsigmoid_dx(x):
    return torch.sigmoid(x) * (1-torch.sigmoid(x))
xx = torch.linspace(-3.5, 3.5, 100)
yy = dsigmoid_dx(xx)
# ... do other stuff with yy
Of course, I could make x require gradients, pass it through the function, and then use autograd, e.g. as follows:
import torch
import matplotlib.pyplot as plt
xx = torch.linspace(-3.5, 3.5, 100, requires_grad=True)
yy = torch.sigmoid(xx)
grad = torch.autograd.grad(yy, [xx], grad_outputs=torch.ones_like(yy), create_graph=True)[0]
plt.plot(xx.detach(), grad.detach())
plt.plot(xx.detach(), yy.detach(), color='red')
plt.show();
Is it (for individual, primitive functions) possible to somehow directly access the implemented backward function?
The PyTorch docs show how to extend autograd, but I can't figure out how to directly access these functions for existing ones (again, e.g. torch.sigmoid).
To summarize, I want to avoid having to reimplement simple derivatives of functions, which are obviously already implemented in the framework (and presumably in a numerically stable way). Is this possible? Or do I always have to reimplement it myself?
Since computing yy involves only one native function, torch.sigmoid, calling autograd.grad (or, similarly, yy.backward) ultimately calls sigmoid's implemented backward function directly, which by the looks of it is what you are after in the first place. In other words, backpropagating through yy is exactly how you access (i.e. call) that backward function at a given point.
So one alternative interface you can use is backward:
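import torch
import matplotlib.pyplot as plt
xx = torch.linspace(-3.5, 3.5, 100, requires_grad=True)
yy = torch.sigmoid(xx)
# Summing gives a scalar, so backward() fills xx.grad with d(sigmoid)/dx at each point
yy.sum().backward()
plt.plot(xx.detach(), xx.grad)
plt.plot(xx.detach(), yy.detach(), color='red')
plt.show()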
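To check that autograd really reproduces the hand-written derivative, and that the result can itself be differentiated, here is a small sketch. It uses torch.autograd.grad with create_graph=True, as in the question, because a plain backward() call stores xx.grad without a graph to backpropagate through. The closed-form expressions are only there for comparison.

```python
import torch

xx = torch.linspace(-3.5, 3.5, 100, requires_grad=True)
yy = torch.sigmoid(xx)

# First derivative via the built-in backward of sigmoid.
# create_graph=True records the backward computation so it stays differentiable.
(grad,) = torch.autograd.grad(
    yy, xx, grad_outputs=torch.ones_like(yy), create_graph=True
)

# Hand-written derivative, used here only as a reference.
closed_form = yy * (1 - yy)
print(torch.allclose(grad, closed_form, atol=1e-6))

# Because grad is part of the graph, it can be differentiated again.
(grad2,) = torch.autograd.grad(grad.sum(), xx)
```

Here grad2 is the second derivative of the sigmoid, which should match closed_form * (1 - 2 * yy) up to floating-point error.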
I am working on a weighted version of SparseCategoricalCrossentropy. Right now my implementation converts y_true to one-hot form, calculates the cross entropy, and then multiplies it by a weight matrix. When the weights are all 1, I get the same output as SparseCategoricalCrossentropy. My problem is the one-hot encoding: I have a lot of classes (32 + background), and with one-hot encoding I run out of memory for large images or batch sizes, which does not happen with SparseCategoricalCrossentropy. I am trying to figure out how the built-in loss is implemented (is there a way to avoid one-hot encoding, etc.). Where is it implemented? Looking at [1], it is probably implemented on the native side, but I cannot find it.
[1] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/losses.py#L692
The SparseCategoricalCrossentropy documentation has a "View Source on GitHub" tab you can click on, which shows the implementation. That leads to line 666 of tensorflow.python.keras.losses. From the class definition we can see that it wraps a function, sparse_categorical_crossentropy, which is defined on line 4867 of tensorflow.keras.backend. At the bottom of that function definition we can see it is a wrapper around tf.nn.sparse_softmax_cross_entropy_with_logits, whose definition can be found in tensorflow.python.ops.nn_ops. At the bottom of that definition, in turn, we can see it is a wrapper around gen_nn_ops.sparse_softmax_cross_entropy_with_logits.
If you look for gen_nn_ops in the repository, you won't find it: it is generated when TensorFlow is built, and it is how Python reaches TensorFlow's compiled C++ op code. So what we are really looking for is the sparse softmax C++ kernel, which can be found in tensorflow/core/kernels/sparse_xent_op.cc. This op calls a functor, which calls a method SparseXentEigenImpl whose implementation can be found in the corresponding header file, sparse_xent_op.h. Starting on line 47 of that file you can see how the sparse loss is computed.
// Generator for calculation of the sparse Xent loss.
// This generator takes the logits, the sum of the exponentiated
// logits, and the label indices. For each minibatch entry, ignoring
// the batch index b, it calculates:
//
// loss[j] = (log(sum_exp_logits) - logits[j]) * 1{ j == label }
//
// for j = 0 .. num_classes. This value must be summed over all j for
// the final loss.
And on line 224 there is a comment outlining the loss calculation formula.
// sum(-labels *
// ((logits - max_logits) - log(sum(exp(logits - max_logits)))))
// along classes
Not sure if this helps you create your weighted op, but this is how sparse xent is calculated in tensorflow.
Edit:
There is also tf.nn.weighted_cross_entropy_with_logits. Note that it computes a sigmoid cross-entropy with a pos_weight on the positive targets, so I am not sure it fits your sparse multi-class labels. If it does, it will probably work better than trying to implement something yourself.
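To make the formula quoted from sparse_xent_op.h concrete, here is a NumPy sketch of a weighted sparse cross-entropy that never builds a one-hot tensor. It assumes the weights are a per-class vector (the question's "weight matrix" may be something else). The function name is made up, and this is not how TensorFlow implements the op internally. The one-hot version is computed only to show that both agree.

```python
import numpy as np

def weighted_sparse_xent(logits, labels, class_weights):
    """Per-example weighted cross-entropy from integer labels (no one-hot).

    logits: (N, C) floats, labels: (N,) ints, class_weights: (C,) floats.
    """
    # Subtract the row maximum before exponentiating, as in the quoted formula.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_sum_exp = np.log(np.exp(shifted).sum(axis=1))
    rows = np.arange(logits.shape[0])
    # Pick out only the logit of the true class instead of multiplying by a one-hot row.
    loss = log_sum_exp - shifted[rows, labels]
    # Look up each example's weight by its label.
    return class_weights[labels] * loss

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 4))
labels = np.array([0, 3, 1, 1, 2])
weights = np.array([1.0, 2.0, 0.5, 4.0])

sparse = weighted_sparse_xent(logits, labels, weights)

# Reference: the one-hot formulation described in the question.
one_hot = np.eye(4)[labels]
log_softmax = logits - logits.max(axis=1, keepdims=True)
log_softmax = log_softmax - np.log(np.exp(log_softmax).sum(axis=1, keepdims=True))
dense = -(one_hot * weights * log_softmax).sum(axis=1)

print(np.allclose(sparse, dense))
```

In TensorFlow, the analogous composition would presumably be the output of tf.nn.sparse_softmax_cross_entropy_with_logits multiplied by tf.gather(class_weights, labels). That keeps the labels as integers throughout.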
I am looking to use floor() in one of my models. I would like to understand what PyTorch does with gradient propagation here, since floor is a discontinuous function.
If there is no gradient defined, I could override the backward method to define my own gradient as necessary but I would like to understand what the default behavior is and the corresponding source code if possible.
import torch
x = torch.rand(20, requires_grad=True)
y = 20*x
z = y.floor().sum()
z.backward()
x.grad returns zeros.
z has a grad_fn=
So FloorBackward is the gradient method. But there is no reference to the source code of FloorBackward in pytorch repository.
The floor function is piecewise constant, which means its gradient must be zero almost everywhere.
While the code doesn't say anything about it, I expect that the gradient is set to a constant zero everywhere.
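A short sketch to confirm the zero default and to illustrate the override mentioned in the question. FloorSTE is a hypothetical name, and passing the incoming gradient straight through (a "straight-through" estimator) is just one possible choice of custom gradient, not something PyTorch does by default.

```python
import torch

class FloorSTE(torch.autograd.Function):
    """Hypothetical floor with a straight-through (identity) gradient."""

    @staticmethod
    def forward(ctx, x):
        return torch.floor(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Treat floor as the identity when backpropagating.
        return grad_output

x = torch.rand(20, requires_grad=True)

# Built-in floor: the gradient with respect to x is zero.
(default_grad,) = torch.autograd.grad((20 * x).floor().sum(), x)

# Custom floor: the gradient of sum(20 * x) flows through unchanged.
(ste_grad,) = torch.autograd.grad(FloorSTE.apply(20 * x).sum(), x)

print(default_grad.abs().max().item(), ste_grad[0].item())
```

With the built-in floor nothing upstream of the call receives a training signal, which is why a custom backward is needed if the floored value should influence learning.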
I learned that we need to use tf.OPERATIONS to define the computation graph, but I found that sometimes using + or = is just fine, without tf.add or tf.assign (see here).
My question is: which operations are allowed in a TensorFlow loss function definition without using "tf.OPERATIONS"? In other words, other than + and =, what else? Can we use, for example, * or ^2 on variables?
PS: I just do not understand why x*x is ok but x^2 is not ...
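The x*x versus x^2 difference comes from Python rather than TensorFlow: in Python, ** is exponentiation and ^ is bitwise XOR. A library can only give meaning to operators through special methods such as __add__, __mul__ and __pow__, while plain = just rebinds a Python name and cannot be overloaded at all. The sketch below uses a made-up Traced class (not a TensorFlow type) to show how overloaded operators can record graph operations.

```python
class Traced:
    """Hypothetical stand-in for a graph tensor: records operations instead of computing them."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Traced(f"add({self.expr}, {_expr(other)})")

    def __mul__(self, other):
        return Traced(f"mul({self.expr}, {_expr(other)})")

    def __pow__(self, other):
        return Traced(f"pow({self.expr}, {_expr(other)})")


def _expr(value):
    # Show nested Traced values by their expression, plain numbers by their repr.
    return value.expr if isinstance(value, Traced) else repr(value)


x = Traced("x")
print((x * x + 1).expr)   # add(mul(x, x), 1)
print((x ** 2).expr)      # pow(x, 2)
print(3 ** 2, 3 ^ 2)      # 9 1  (power vs. bitwise XOR on plain ints)
```

So for squaring, x * x or x ** 2 are the operator forms to reach for. Since this class defines no XOR method, x ^ 2 raises a TypeError here.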
Is there way to provide user defined activation function for layers in CNTK (Python API) instead of only primitive ones like tanh, relu etc.?
Something like this:
def f(x):
    return x * x
LSTM(number_of_cells, activation=f)
Yes, what you wrote should work as is.
This tutorial might be useful to you:
https://www.cntk.ai/Tutorials/CVPR2017/CVPR_2017_Tutorial_final.pdf
Also, CNTK has a number of tutorials and manuals:
https://github.com/Microsoft/CNTK/tree/master/Tutorials
https://github.com/Microsoft/CNTK/tree/master/Manual
What you wrote should work. You can use any CNTK expression to compose a more complex activation function.