About custom operations in Tensorflow and PyTorch

About custom operations in Tensorflow and PyTorch - python

I have to implement an energy function, termed Rigidity Energy, as in Eq 7 of this paper here.
The energy function takes as input two 3D object meshes, and returns the energy between them. The first mesh is the source mesh, and the second mesh is the deformed version of the source mesh. In rough psuedo-code, the computation would go like this:
Iterate over all the vertices in the source mesh.
For every vertex, compute its covariance matrix with its neighboring vertices.
Perform SVD on the computed covariance matrix and find the rotation matrix of the vertex.
Use the computed rotation matrix, the point coordinates in the original mesh and the corresponding coordinates in the deformed mesh, to compute the energy deviation of the vertex.
Thus this energy function requires me to iterate over each point in the mesh, and the mesh could have more than 2k such points. In Tensorflow, there are two ways to do this. I can have 2 tensors of shape (N,3), one representing the points of source and the other of the deformed mesh.
Do it purely using Tensorflow tensors. That is, iterate over elements of the above tensors using tf.gather and perform the computation on each point using only existing TF operations. This method, would be extremely slow. I've tried to define loss functions that iterate over 1000s of points before, and the graph construction itself takes too much time to be practical.
Add a new TF OP as explained in the TF documentation here . This involves writing the function in CPP (and Cuda, for GPU support), and registering the new OP with TF.
The first method is easy to write, but impractically slow. The second method is a pain to write.
I've used TF for 3 years, and have never used PyTorch before, but at this point I'm considering switching to it, if it offers a better alternative for such cases.
Does PyTorch have a way of implementing such loss functions both easily and performs as fast as it would on GPU. i.e, A pythonic way of writing my own loss functions that runs on GPU, without any C or Cuda code on my part?

As far as I understand, you are essentially asking if this operation can be vectorized. The answer is no, at least not fully, because svd implementation in PyTorch is not vectorized.
If you showed the tensorflow implementation, it would help in understanding your starting point. I don't know what you mean by finding the rotation matrix of the vertex, but I would guess this can be vectorized. This would mean that svd is the only non-vectorized operation and you could perhaps get away with writing just a single custom OP, that is the vectorized svd - which is likely quite easy, because it would amount to calling some library routines in a loop in C++.
Two possible sources of problems I see are
if the neighborhoods of N(i) in equation 7 can be of significantly different sizes (which would mean that the covariance matrices are of different sizes and vectorization would require some dirty tricks)
the general problem of dealing with meshes and neighborhoods could be difficult. This is an innate property of irregular meshes, but PyTorch has support for sparse matrices and a dedicated package torch_geometry, which at least helps.

Related

How to implement custom metric in UMAP?

I am looking to linearly combine features to be used by UMAP. Some of them are GCS coordinates and require a haversine treatment while others can be compared using their euclidean distance.
distance(v1, v2) = alpha * euclidean(f1_eucl, f2_eucl) + beta * haversine(f1_hav, f2_hav)
So far, I have tried:
Creating a custom distance matrix. The squared matrix takes 70GB using float64, 35GB with float32. Using fastdist, I get a computation time of 7min, which is quite slow compared to UMAP's 2-3min -- all included. This approach falls apart as soon as I try adding the euclidean and haversine matrices together (140GB which is massive compared to UMAP's 5GB). I also tried chunking the computation using dask. The result is memory-friendly but my session kept crashing so I couldn't even tell how long that would have taken.
Using a custom callable to be ingested by UMAP. Thanks to the jit compilation using numba, I get the results quite quickly and have no memory problem. The major problem here is it looks like UMAP ignores my callable when the dataset reaches 4096 in size. If I set the callable to return 0, UMAP still shows the patterns of the original dataset in the graphs. If somebody could explain me what this is due to, that'd be great.
In summary, how do you go about, practically speaking, implementing a custom metric in UMAP? And bonus question, do you think this is a sound approach? Thanks.

The custom metric in numba should work for more than 4096 points. That's a relevant number because that's the stage at which it cuts over to using approximate nearest neighbor search (which is passes of to pynndescent). Now pynndescent does support custom metrics compiled with numba, so if it is somehow going astray it is because that is not getting passed to pynndescent correctly. Still, I would have expected an error, not defaulting to euclidean distance.

Large sparse matrix inversion on Python

I'm currently working with a least-square algorithm on Python, regarding some geodetic calculations.
I chose Python (which is not the fastest) and it works pretty well. However, in my code, I have inversions of large sparse symmetric (non-positive definite, so can't use Cholesky) matrix to execute (image below). I currenty use np.linalg.inv() which is using the LU decomposition method.
I pretty sure there would be some optimization to do in terms of rapidity.
I thought about Cuthill-McKee algotihm to rearange the matrix and take its inverse. Do you have any ideas or advice ?
Thank you very much for your answers !

Good news is that if you're using any of the popular python libraries for linear algebra (namely, numpy), the speed of python really doesn't matter for the math – it's all done natively inside the library.
For example, when you write matrix_prod = matrix_a # matrix_b, that's not triggering a bunch of Python loops to multiply the two matrices, but using numpy's internal implementation (which I think uses the FORTRAN LAPACK library).
The scipy.sparse.linalg module has your back covered when it comes to solving sparsely stored matrices specifying sparse systems of equations. (which is what you do with the inverse of a matrix). If you want to use sparse matrices, that's your way to go – notice that there's matrices that are sparse in mathematical terms (i.e., most entries are 0), and matrices which are stored as sparse matrix, which means you avoid storing millions of zeros. Numpy itself doesn't have sparsely stored matrices, but scipy does.
If your matrix is densely stored, but mathematically sparse, i.e. you're using standard numpy ndarrays to store it, then you won't get any more rapid by implementing anything in Python. The theoretical complexity gains will be outweighed by the practical slowness of Python compared to highly optimized inversion.
Inverting a sparse matrix usually loses the sparsity. Also, you never invert a matrix if you can avoid it at all! For a sparse matrix, solving the linear equation system Ax = b, with A your matrix and b a known vector, for x, is so much faster done forward than computing A⁻¹! So,
I'm currently working with a least-square algorithm on Python, regarding some geodetic calculations.
since LS says you don't need the inverse matrix, simply don't calculate it, ever. The point of LS is finding a solution that's as close as it gets, even if your matrix isn't invertible. Which can very well be the case for sparse matrices!

Performing UMAP dimension reduction on inconsistently shaped data - python

first question, I will do my best to be as clear as possible.
If I can provide UMAP with a distance function that also outputs a gradient or some other relevant information, can I apply UMAP to non-traditional looking data? (I.e., a data set with points of inconsistent dimension, data points that are non-uniformly sized matrices, etc.) The closest I have gotten to finding something that looks vaguely close to my question is in the documentation here (https://umap-learn.readthedocs.io/en/latest/embedding_space.html), but this seems to be sort of the opposite process, and as far as I can tell still supposes you are starting with tuple-based data of uniform dimension.
I'm aware that one way around this is just to calculate a full pairwise distance matrix ahead of time and give that to UMAP, but from what I understand of the way UMAP is coded, it only performs a subset of all possible distance calculations, and is thus much faster for the same amount of data than if I were to take the full pre-calculation route.
I am working in python3, but if there is an implementation of UMAP dimension reduction in some other environment that permits this, I would be willing to make a detour in my workflow to obtain this greater flexibility with incoming data types.
Thank you.

Algorithmically this is quite possible, but in practice most implementations do not support anything other than fixed dimension vectors. If computing the all pairs distances is not tractable another option is to try to find a way to featurize or vectorize the data in a way that will allow for easy distance computations. This is, of course, not always possible. The final option is to implement things yourself, but this requires handling the nearest neighbour search, which is likely a non-trivial coding project in and of itself.

Generating reachability matrix from a given adjacency matrix

What is the best algorithm for generating a reachability matrix from a given adjacency matrix. There is warshall's algorithm but it is not the best method. There are some other methods but the procedures are more theoretical. Is there any module or with which I can create a reachability matrix with ease. I am working on python 2.7.

I don't think there is a way to do it faster than O(n³) in general case for directed graph.
That being said, you can try to use clever technics to reduce the constant.
For example, you can do the following:
Convert your graph into DAG by finding all strongly connected components and replacing them with a single vertex. This can be done in Θ(V + E) or Θ(V²)
On the new graph, run DFS to calculate reachability for all vertices, but, when updating reachability set for a vertex, do it in the fast vectorized way. This is technically Θ( (V + E) * V ) or Θ(V³), but the constant will be low (see below).
The proposed vectorized way is to have the reachability set for every vertex represented as the bit vector, residing on GPU. This way, the calculation of the union of two sets is performed in extremely fast parallelized manner on GPU. You can use any tensor library for GPU, for example, tf.bitwise.
After you calculated reachability bit vectors for every vertex on GPU, you can extract them into CPU memory in Θ(V²) time.

Clustering GPS points with a custom distance function in scipy

I'm curious if it is possible to specify your own distance function between two points for scipy clustering. I have datapoints with 3 values: GPS-lat, GPS-lon, and posix-time. I want to cluster these points using some algorithm: either agglomerative clustering, meanshift, or something else.
The problem is distance between GPS points needs to be calculated with the Haversine formula. And then that distance needs to be weighted appropriately so it is comparable with a distance in seconds for clustering purposes.
Looking at the documentation for scipy I don't see anything that jumps out as a way to specify a custom distance between two points.
Is there another way I should be going about this? I'm curious what the Pythonic thing to do is.

You asked for sklearn, but I don't have a good answer for you there. Basically, you could build a distance matrix the way you like, and many algorithms will process the distance matrix. The problem is that this needs O(n^2) memory.
For my attempts at clustering geodata, I have instead used ELKI (which is Java, not Python). First of all, it includes geodetic distance functions; but it also includes index acceleration for many algorithms and for this distance function.
I have not used an additional attribute such as time. As you already noticed you need to weight them appropriately, as 1 meter does not equal not 1 second. Weights will be very much use case dependant, and heuristic.
Why I'm suggesting ELKI is because they have a nice Tutorial on implementing custom distance functions that then can be used in most algorithms. They can't be used in every algorithm - some don't use distance at all, or are constrained to e.g. Minkowski metrics only. But a lot of algorithms can use arbitrary (even non-metric) distance functions.
There also is a follow-up tutorial on index accelerated distance functions. For my geodata, indexes were tremendously useful, speeding up by a factor of over 100x, and thus enabling be to process 10 times more data.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.