Directly access derivative of primitive functions in PyTorch

For backpropagation in PyTorch, the gradients of many simple functions are of course already implemented.
But what if I want a function that evaluates the gradient of an existing primitive function directly, e.g. the derivative of torch.sigmoid(x) with respect to x? I'd also like to be able to backpropagate through this new function.
The goal would be something like the following, but using only torch.sigmoid instead of a custom (re-)implementation.
import torch
import matplotlib.pyplot as plt
def dsigmoid_dx(x):
    return torch.sigmoid(x) * (1 - torch.sigmoid(x))
xx = torch.linspace(-3.5, 3.5, 100)
yy = dsigmoid_dx(xx)
# ... do other stuff with yy
Of course, I could make x require gradients, pass it through the function, and then use autograd, e.g. as follows:
import torch
import matplotlib.pyplot as plt
xx = torch.linspace(-3.5, 3.5, 100, requires_grad=True)
yy = torch.sigmoid(xx)
grad = torch.autograd.grad(yy, [xx], grad_outputs=torch.ones_like(yy), create_graph=True)[0]
plt.plot(xx.detach(), grad.detach())
plt.plot(xx.detach(), yy.detach(), color='red')
plt.show();
Is it (for individual, primitive functions) possible to somehow directly access the implemented backward function?
The PyTorch docs show how to extend autograd, but I can't figure out how to directly access the backward functions of existing ops (again, e.g. torch.sigmoid).
To summarize, I want to avoid having to reimplement simple derivatives of functions, which are obviously already implemented in the framework (and presumably in a numerically stable way). Is this possible? Or do I always have to reimplement it myself?

Since the computation of yy only involves one (native) function, torch.sigmoid, calling autograd.grad (or, equivalently, yy.backward) will ultimately call the implemented backward function of sigmoid directly, which by the looks of it is exactly what you are looking for. In other words, backpropagating through yy is exactly what it means to access (i.e. call) sigmoid's built-in backward at the given points.
So one alternative interface you can use is backward:
xx = torch.linspace(-3.5, 3.5, 100, requires_grad=True)
yy = torch.sigmoid(xx)
yy.sum().backward()
plt.plot(xx.detach(), xx.grad)
plt.plot(xx.detach(), yy.detach(), color='red')
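If you want this packaged as a reusable, differentiable function, a minimal sketch along the same lines (derivative_of is just an illustrative helper name, not a PyTorch API) is to wrap torch.autograd.grad with create_graph=True, so that the returned derivative can itself be backpropagated through:
import torch
def derivative_of(fn, x):
    # illustrative helper: evaluates d fn(x)/dx via autograd, which in turn
    # invokes fn's built-in backward implementation (assumes fn acts elementwise)
    x = x.detach().requires_grad_(True)
    y = fn(x)
    # create_graph=True keeps the graph, so the derivative is itself differentiable
    return torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                               create_graph=True)[0]
xx = torch.linspace(-3.5, 3.5, 100)
dydx = derivative_of(torch.sigmoid, xx)  # same values as sigmoid(x)*(1-sigmoid(x))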

Related

Can I use scipy to check the jacobian of a function?

I have a function for which I know the explicit expression of the jacobian. I would like to check the correctness of this jacobian by comparing it against a finite-difference approximation. Scipy has a function that does a similar check on the gradient of a function, but I haven't found the equivalent for a jacobian (if it existed in scipy, I assume it would be in this listing). I would like a function that similarly takes two callables (the function and the jacobian) and an ndarray (the points at which to check the jacobian against its approximation) and returns the error between the two.
The jacobian of a function can be written in a form that uses the gradients of the components of the function, so the scipy.optimize.check_grad function might be usable to this end, but I don't know how that might be implemented in practice.
Say I have the function
def fun(x, y):
    return y, x
with the jacobian
from numpy import ndarray, zeros
def jac(x, y):
    result = zeros((2, 2))
    result[0, 1] = 1
    result[1, 0] = 1
    return result
How should I go about separating these variables in order to use the scipy function? The solution must be generalizable to n-dimensional functions. Or is there an existing function for this task?
If I were limited to 2-dimensional functions, I might do
from scipy.optimize import check_grad
def fun1(x, y):
    return fun(x, y)[0]
def grad1(x, y):
    return jac(x, y)[0]
check_grad(fun1, grad1, [1.5, -1.5])
...
but this solution isn't trivially extended to functions of higher dimensions.
SciPy is not the best tool for this. You should be using a numerical library that does automatic differentiation.
JAX has a close implementation of the NumPy API and adds autograd functionality.
Other deep learning frameworks such as PyTorch and TensorFlow can do the same, but without the simplicity of the NumPy interface.
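As a hedged sketch of what that check could look like in JAX (fun, jac and the test point are adapted from the question to take a single vector argument; jax.jacobian and jax.numpy are standard JAX APIs):
import jax
import jax.numpy as jnp
def fun(v):
    # v = [x, y]; the question's fun(x, y) = (y, x) written as a vector function
    return jnp.array([v[1], v[0]])
def jac(v):
    # the hand-written Jacobian to be checked
    return jnp.array([[0.0, 1.0],
                      [1.0, 0.0]])
x0 = jnp.array([1.5, -1.5])
auto_jac = jax.jacobian(fun)(x0)            # Jacobian via automatic differentiation
error = jnp.abs(auto_jac - jac(x0)).max()   # ~0 if the analytical Jacobian is correct
print(error)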

Vector to matrix function in NumPy without accessing elements of vector

I would like to create a NumPy function that computes the Jacobian of a function at a certain point - with the Jacobian hard coded into the function.
Say I have a vector containing two arbitrary scalars X = np.array([[x],[y]]), and a function f(X) = np.array([[2*x*y],[3*x*y]]).
This function has Jacobian J = np.array([[2*y, 2*x],[3*y, 3*x]]).
How can I write a function that takes in the array X and returns the Jacobian? Of course, I could do this using array indices (e.g. x = X[0,0]), but I am wondering if there is a way to do this directly without accessing the individual elements of X.
I am looking for something that works like this:
def foo(x, y):
    return np.array([[2*y, 2*x], [3*y, 3*x]])
X = np.array([[3],[7]])
J = foo(X)
Given that this is possible on 1-dimensional arrays, e.g. the following works:
def foo(x):
    return np.array([x, x, x])
X = np.array([1,2,3,4])
J = foo(X)
You want the Jacobian, which is the differential of the function. Is that correct? I'm afraid NumPy is not the right tool for that.
NumPy works with fixed numbers, not with variables. That is, given some numbers you can calculate the value of a function. The differential is a different function that has a special relationship to the original function but is not the same. You cannot just calculate the differential; you must deduce it from the functional form of the original function using differentiation rules. NumPy cannot do that.
As far as I know you have three options:
1. Use a numeric library to calculate the differential at a specific point. However, you will only get the Jacobian at a specific point (x, y) and no formula for it.
2. Take a look at a Python CAS library such as SymPy. There you can define expressions in terms of variables and compute the differential with respect to those variables (see the sketch after this list).
3. Use a library that performs automatic differentiation. Machine learning toolkits like PyTorch or TensorFlow have excellent support for automatic differentiation and good integration with NumPy arrays. They essentially calculate the differential by knowing the differentials of all basic operations, like multiplication or addition; for composed functions the chain rule is applied, so the differential can be calculated for arbitrarily complex functions.
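A minimal sketch of the SymPy route (option 2), using the f(X) = [2*x*y, 3*x*y] example from the question; Matrix.jacobian and lambdify are standard SymPy APIs:
import sympy as sp
x, y = sp.symbols('x y')
f = sp.Matrix([2*x*y, 3*x*y])         # the function from the question
J = f.jacobian([x, y])                # Matrix([[2*y, 2*x], [3*y, 3*x]])
# turn the symbolic Jacobian into a numerical function usable with NumPy arrays
jac_func = sp.lambdify((x, y), J, 'numpy')
print(jac_func(3, 7))                 # the Jacobian evaluated at x=3, y=7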

Why does the order of function arguments matter?

I am having a problem with a function I am trying to fit to some data. I have a model, given by the equation inside the function, which I am using to find a value for v. However, the order in which I write the variables in the function definition greatly affects the value the fit gives for v. If, as in the code block below, I have def MAR_fit(v, x) where x is the independent variable, the fit gives a value for v hugely different from the one I get with the definition def MAR_fit(x, v). I haven't had a huge amount of experience with the curve_fit function in the scipy package, and the docs still left me wondering.
Any help would be great!
def MAR_fit(v, x):
    return ((3.*((2.-1.)**2.)*0.05*v)/(2.*(2.-1.)*(60.415**2.))
            * (((3.*x*((2.-1.)**2.)*v)/(60.415**2.)) + 1.)**(-((5./2.)-1.)/(2.-1.)))
x = newCD10_AVB1_AMIN01['time_phys'][1:]
y = (newCD10_AVB1_AMIN01['MAR'][1:])
popt_tf, pcov = curve_fit(MAR_fit, x, y)
Have a look at the documentation again: it says that the callable you pass to curve_fit (the function you are trying to fit) must take the independent variable as its first argument; further arguments are the parameters you are trying to fit. You must use MAR_fit(x, v) because that is what curve_fit expects.
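As a hedged illustration of the fix (the data below are synthetic stand-ins for the question's newCD10_AVB1_AMIN01 series), swapping the argument order is all that is needed:
import numpy as np
from scipy.optimize import curve_fit
def MAR_fit(x, v):
    # same model as above, but with the independent variable x first
    return ((3.*((2.-1.)**2.)*0.05*v)/(2.*(2.-1.)*(60.415**2.))
            * (((3.*x*((2.-1.)**2.)*v)/(60.415**2.)) + 1.)**(-((5./2.)-1.)/(2.-1.)))
# synthetic data standing in for the question's measurements
np.random.seed(0)
x = np.linspace(1.0, 100.0, 200)
y = MAR_fit(x, 2.0) + np.random.normal(scale=1e-6, size=x.size)
popt, pcov = curve_fit(MAR_fit, x, y, p0=[1.0])
print(popt)   # popt[0] is the fitted value of v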

Python: Integral and function nested in another integral using scipy quad

I have managed to write a few lines of code using scipy.integrate.quad for my stochastic process class.
I have the Markov transition function for standard Brownian motion:
import numpy as np
def p(x, t):
    return (1/np.sqrt(2*np.pi*t))*np.exp(-x**2/(2*t))
But I want to compute the following, which I am going to write as code that would not work. I write it like this so we can understand the problem without the use of LaTeX.
from scipy.integrate import quad
integral = quad(quad(p(y-x),1,np.inf)*p(x,1),1,np.inf)
You probably noticed that the problem is the bivariate thing going on in the inner integral. I did the following but am unsure of it:
p_xy = lambda y,x: p(y-x,1)
inner = lambda x : quad(p_xy,1,np.inf,args = (x,))[0]
outer = lambda x: inner(x)*p(x,1)
integral = quad(outer,1,np.inf)[0]
I then get
0.10806767286289147
I love Python and its lambda functions, but I'm not sure about this. What are your thoughts? Thank you for your time.
For the type of integral you wish to perform, bivariate integrals, SciPy has dedicated routines.
The advantage is that these routines handle complex boundaries more easily (where the bounds depend on the other coordinate, for instance).
I rewrote your example as:
import numpy as np
from scipy.integrate import nquad
def p(x, t):
    return (1/np.sqrt(2*np.pi*t))*np.exp(-x**2/(2*t))

def integrand(x, y):
    return p(y-x, 1)*p(x, 1)
integral = nquad(integrand, ((1, np.inf), (1, np.inf)))
print(integral[0])
which prints out the same result. I believe that the code above is easier to read as the integrand is written explicitly as a function of the two variables.
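If you prefer a routine specific to two dimensions, scipy.integrate.dblquad does the same job; a minimal sketch reusing p and np from the snippet above (note that dblquad expects the integrand as func(y, x), with y the inner integration variable):
from scipy.integrate import dblquad
# inner variable y runs over (1, inf), outer variable x over (1, inf)
integral, error = dblquad(lambda y, x: p(y - x, 1)*p(x, 1),
                          1, np.inf,          # outer bounds for x
                          lambda x: 1,        # inner lower bound for y
                          lambda x: np.inf)   # inner upper bound for y
print(integral)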

Partial Derivative using Autograd

I have a function that takes in a multivariate argument x. Here x = [x1, x2, x3]. Let's say my function looks like:
f(x, T) = np.dot(x, T) + np.exp(np.dot(x, T)), where T is a constant.
I am interested in finding df/dx1, df/dx2 and df/dx3 functions.
I have achieved some success using scipy diff, but I am a bit skeptical because it uses numerical differences. Yesterday, my colleague pointed me to Autograd (github). Since it seems to be a popular package, I am hoping someone here knows how to get partial differentiation using this package. My initial tests with this library indicate that the grad function only takes the derivative with respect to the first argument. I am not sure how to extend it to other arguments. Any help would be greatly appreciated.
Thanks.
I found the following description of the grad function in the autograd source code:
def grad(fun, x):
    """Returns a function which computes the gradient of `fun` with
    respect to positional argument number `argnum`. The returned
    function takes the same arguments as `fun`, but returns the
    gradient instead. The function `fun` should be scalar-valued. The
    gradient has the same type as the argument."""
So
def h(x, t):
    return np.dot(x, t) + np.exp(np.dot(x, t))

h_x = grad(h, 0)  # derivative with respect to x
h_t = grad(h, 1)  # derivative with respect to t
Also make sure to use the numpy library that comes with autograd:
import autograd.numpy as np
instead of
import numpy as np
in order to make use of all numpy functions.
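For completeness, a small hedged usage sketch (the test point and constant are made-up values): evaluating h_x at a point returns the whole gradient with respect to x, whose components are exactly df/dx1, df/dx2 and df/dx3:
import autograd.numpy as np
from autograd import grad
def h(x, t):
    return np.dot(x, t) + np.exp(np.dot(x, t))
h_x = grad(h, 0)                 # gradient with respect to x
x0 = np.array([0.1, 0.2, 0.3])   # made-up test point
T = np.array([1.0, 2.0, 3.0])    # made-up constant
print(h_x(x0, T))                # [df/dx1, df/dx2, df/dx3] evaluated at (x0, T)
# analytically this equals T * (1 + exp(np.dot(x0, T))), which can serve as a check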
