Use pdist() in Python with a custom distance function

I have been using scipy.spatial.distance.pdist(...) in Python, and it has proven useful and fast for some of the applications I am working on.
I need to use a custom pairwise distance function rather than one of the standard metrics accepted by the metric argument. As a simple example, suppose I do not want to call the built-in Euclidean metric like this:
Y = pdist(X, 'euclidean')
Instead, I want to define the Euclidean function myself and pass it to pdist() as a callable. How can I pass my own implementation of the Euclidean distance to this function and get exactly the same results?
I know how to use pdist() in MATLAB, but not yet in Python. Thanks for any suggestions.

There is an example in the documentation for pdist:
import numpy as np
from scipy.spatial.distance import pdist
dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))
If you want to use a regular function instead of a lambda, the equivalent would be:
import numpy as np
from scipy.spatial.distance import pdist
def dfun(u, v):
    return np.sqrt(((u-v)**2).sum())

dm = pdist(X, dfun)
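If you want to sanity-check that the callable reproduces the built-in metric, a quick comparison on some sample data (X here is just a small random array for illustration) could look like:
import numpy as np
from scipy.spatial.distance import pdist

X = np.random.rand(5, 3)  # small sample data, only for the comparison

dm_builtin = pdist(X, 'euclidean')
dm_custom = pdist(X, lambda u, v: np.sqrt(((u - v)**2).sum()))

# the custom callable should reproduce the built-in metric up to floating-point noise
print(np.allclose(dm_builtin, dm_custom))  # True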

Related

Can I use scipy to check the jacobian of a function?

I have a function for which I know the explicit expression of the Jacobian. I would like to check the correctness of this Jacobian by comparing it against a finite-difference approximation. SciPy has a function that does a similar check on the gradient of a function, but I haven't found the equivalent for a Jacobian (if it existed in SciPy, I assume it would be in this listing). I would like a function that similarly takes two callables (the function and the Jacobian) and an ndarray (the points at which to check the Jacobian against its approximation) and returns the error between the two.
The Jacobian of a function can be written in terms of the gradients of the components of the function, so the scipy.optimize.check_grad function might be usable to that end, but I don't know how that would be implemented in practice.
Say I have the function
def fun(x, y):
    return y, x
with the Jacobian
from numpy import zeros

def jac(x, y):
    result = zeros((2, 2))
    result[0, 1] = 1
    result[1, 0] = 1
    return result
How should I go about separating these variables in order to use the SciPy function? The solution must be generalizable to n-dimensional functions. Or is there an existing function that fills this task?
If I were limited to 2-dimensional functions, I might do
from scipy.optimize import check_grad
def fun1(x, y):
    return fun(x, y)[0]

def grad1(x, y):
    return jac(x, y)[0]

check_grad(fun1, grad1, [1.5, -1.5])
...
but this solution isn't trivially extended to functions of higher dimensions.
SciPy is not the best tool for this. You should be using a numerical library that supports automatic differentiation.
JAX closely mirrors the NumPy API and adds autograd functionality.
Other deep learning frameworks such as PyTorch and TensorFlow can do the same, but without the simplicity of the NumPy interface.
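As a rough sketch of how such a check might look with JAX (restating the example above so that the function takes a single array argument; none of this is from the original answer):
import jax.numpy as jnp
from jax import jacobian

def fun(p):
    # f: R^2 -> R^2, written with a single array argument
    x, y = p
    return jnp.array([y, x])

def jac(p):
    # the hand-written Jacobian to be verified
    return jnp.array([[0.0, 1.0],
                      [1.0, 0.0]])

p0 = jnp.array([1.5, -1.5])
auto_jac = jacobian(fun)(p0)              # Jacobian computed by automatic differentiation
err = jnp.abs(auto_jac - jac(p0)).max()   # maximum elementwise discrepancy
print(err)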

Vector to matrix function in NumPy without accessing elements of vector

I would like to create a NumPy function that computes the Jacobian of a function at a certain point - with the Jacobian hard coded into the function.
Say I have a vector containing two arbitrary scalars X = np.array([[x],[y]]), and a function f(X) = np.array([[2xy],[3xy]]).
This function has Jacobian J = np.array([[2y, 2x],[3y, 3x]])
How can I write a function that takes in the array X and returns the Jacobian? Of course, I could do this using array indices (e.g. x = X[0,0]), but am wondering if there is a way to do this directly without accessing the individual elements of X.
I am looking for something that works like this:
def foo(x, y):
    return np.array([[2*y, 2*x], [3*y, 3*x]])
X = np.array([[3],[7]])
J = foo(X)
Given that this is possible on 1-dimensional arrays, e.g. the following works:
def foo(x):
    return np.array([x, x, x])
X = np.array([1,2,3,4])
J = foo(X)
You want the Jacobian, which is the differential of the function. Is that correct? I'm afraid NumPy is not the right tool for that.
NumPy works with concrete numbers, not with symbolic variables: given some numbers, you can evaluate a function. The differential is a different function that has a special relationship to the original one, but it is not the same; you cannot simply evaluate it, you must deduce it from the functional form of the original function using differentiation rules. NumPy cannot do that.
As far as I know, you have three options:
Use a numeric library to calculate the differential at a specific point. However, you will only get the Jacobian at that specific point (x, y), not a formula for it.
Take a look at a Python CAS library such as SymPy. There you can define expressions in terms of symbolic variables and compute the differential with respect to those variables (a minimal SymPy sketch follows below).
Use a library that performs automatic differentiation. Machine learning toolkits like PyTorch or TensorFlow have excellent support for automatic differentiation and good integration with NumPy arrays. They essentially calculate the differential by knowing the differential of every basic operation such as multiplication or addition; for composed functions the chain rule is applied, so the differential can be calculated for arbitrarily complex functions.
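As an illustration of the SymPy option, a minimal sketch for the example f(x, y) = (2xy, 3xy) from the question (symbol names are mine):
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Matrix([2*x*y, 3*x*y])

J = f.jacobian([x, y])                    # symbolic Jacobian: [[2*y, 2*x], [3*y, 3*x]]
J_func = sp.lambdify((x, y), J, 'numpy')  # turn it into a numerical function

print(J_func(3, 7))                       # Jacobian evaluated at x=3, y=7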

How to use numpy roots for trig function

I'm trying to write some code to find the roots of the function -1.5*sin(3x) on the domain [-2, 2]. Is this possible with the NumPy roots function?
Essentially the code will look something like this:
import numpy as np
def f(x):
    x = -1.5*np.sin(3*x)
    return x

print(np.roots())
I'm just not sure what to put in the parentheses since this function is not a polynomial.
numpy.roots needs the coefficients of a polynomial, and you do not have a polynomial here. numpy.roots cannot be used to find the roots of an arbitrary function.
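As one possible alternative (not part of the original answer), a scalar root finder such as scipy.optimize.brentq can be run on every sub-interval where the function changes sign; a minimal sketch:
import numpy as np
from scipy.optimize import brentq

def f(x):
    return -1.5*np.sin(3*x)

# scan the domain for sign changes, then refine each bracket with brentq
xs = np.linspace(-2, 2, 401)
roots = []
for a, b in zip(xs[:-1], xs[1:]):
    if f(a) == 0:
        roots.append(a)
    elif f(a)*f(b) < 0:
        roots.append(brentq(f, a, b))

print(roots)  # approximately [-pi/3, 0, pi/3]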

Python: Integral and function nested in another integral using scipy quad

I have managed to write a few lines of code using scipy.integrate.quad for my stochastic process class
I have the Markov transition function for standard Brownian motion
import numpy as np
def p(x, t):
    return (1/np.sqrt(2*np.pi*t))*np.exp(-x**2/(2*t))
But I want to compute the following, which I will write as code that does not actually work; I write it this way so the problem can be understood without the use of LaTeX:
from scipy.integrate import quad
integral = quad(quad(p(y-x),1,np.inf)*p(x,1),1,np.inf)
You probably noticed that the problem is the bivariate thing going on in the inner integral. I did the following but am unsure of it:
p_xy = lambda y,x: p(y-x,1)
inner = lambda x : quad(p_xy,1,np.inf,args = (x,))[0]
outer = lambda x: inner(x)*p(x,1)
integral = quad(outer,1,np.inf)[0]
I then get
0.10806767286289147
I love Python and its lambda functions, but I am not sure about this. What are your thoughts? Thank you for your time.
For the type of integral you wish to perform, bivariate integrals, SciPy has dedicated routines.
The advantage is that these routines handle complex boundaries more easily (where the bounds depend on the other coordinate, for instance).
I rewrote your example as:
import numpy as np
from scipy.integrate import nquad
def p(x, t):
    return (1/np.sqrt(2*np.pi*t))*np.exp(-x**2/(2*t))

def integrand(x, y):
    return p(y-x, 1)*p(x, 1)
integral = nquad(integrand, ((1, np.inf), (1, np.inf)))
print(integral[0])
which prints out the same result. I believe that the code above is easier to read as the integrand is written explicitly as a function of the two variables.
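To illustrate the coordinate-dependent bounds mentioned above, here is a variant (a different region than the original problem, purely to show the syntax) where the inner x range depends on y:
import numpy as np
from scipy.integrate import nquad

def p(x, t):
    return (1/np.sqrt(2*np.pi*t))*np.exp(-x**2/(2*t))

def integrand(x, y):
    return p(y - x, 1)*p(x, 1)

# the range of the first argument may be a callable of the remaining variables,
# here integrating x from 1 up to y
result, error = nquad(integrand, [lambda y: (1, y), (1, np.inf)])
print(result)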

Partial Derivative using Autograd

I have a function that takes in a multivariate argument x. Here x = [x1,x2,x3]. Let's say my function looks like:
f(x,T) = np.dot(x,T) + np.exp(np.dot(x,T)), where T is a constant.
I am interested in finding df/dx1, df/dx2 and df/dx3 functions.
I have achieved some success using scipy diff, but I am a bit skeptical because it uses numerical differences. Yesterday, my colleague pointed me to Autograd (github). Since it seems to be a popular package, I am hoping someone here knows how to get partial differentiation using it. My initial tests with this library indicate that the grad function only differentiates with respect to the first argument. I am not sure how to extend it to the other arguments. Any help would be greatly appreciated.
Thanks.
I found the following description of the grad function in the autograd source code:
def grad(fun, x):
    """Returns a function which computes the gradient of `fun` with
    respect to positional argument number `argnum`. The returned
    function takes the same arguments as `fun`, but returns the
    gradient instead. The function `fun` should be scalar-valued. The
    gradient has the same type as the argument."""
So
def h(x, t):
    return np.dot(x, t) + np.exp(np.dot(x, t))

h_x = grad(h, 0)  # derivative with respect to x
h_t = grad(h, 1)  # derivative with respect to t
Also make sure to use the NumPy library that comes with autograd
import autograd.numpy as np
instead of
import numpy as np
in order to make use of all numpy functions.
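A short usage sketch (the numbers below are arbitrary, just to show that h_x returns all partials df/dx1, df/dx2, df/dx3 at once):
import autograd.numpy as np
from autograd import grad

def h(x, t):
    return np.dot(x, t) + np.exp(np.dot(x, t))

h_x = grad(h, 0)  # gradient with respect to the vector x

x0 = np.array([0.1, 0.2, 0.3])
T = np.array([1.0, 2.0, 3.0])  # the constant T from the question, chosen arbitrarily here

print(h_x(x0, T))  # [df/dx1, df/dx2, df/dx3] evaluated at x0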
