Obtaining all combinations of Euclidean distance in Tensorflow? - python

I would like to form a loss function in Tensorflow that relies on a matrix containing all combinations of (squared) Euclidean distances for a set of embeddings. In numpy, like this:
# E is (batch_size,N,32)
N=100
D = np.zeros((batch_size,N,N))
for x in range(N):
    for y in range(N):
        D[:,x,y] = np.sum(np.square(E[:,x,:]-E[:,y,:]),axis=1)
How can I code this in TensorFlow/Keras without the nested for loop, ideally without any for loops at all?

This should do:
D = tf.reduce_sum((E[:, None, :] - E[:, :, None])**2, axis=-1)
D will have shape (batch_size, N, N). This also works in NumPy (with np.sum in place of tf.reduce_sum), so you could use that to check equivalence with the loop version, just to be sure.
This solution works via broadcasting: None inserts axes such that a size-N axis is matched against a size-1 axis, and the latter is broadcast (repeated) to match the former. E[:, None, :] has shape (batch_size, 1, N, 32) and E[:, :, None] has shape (batch_size, N, 1, 32), so their difference has shape (batch_size, N, N, 32) and every element is compared to every other (per batch element); summing over the last axis then gives D. It's a little hard to describe in text and also difficult to visualize, since we are dealing with four-dimensional tensors here.
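Following the suggestion to check equivalence in NumPy, here is a minimal sketch that compares the question's double loop with the broadcast expression on random data. The sizes (batch_size=2, N=5, 32 features) and the helper function names are arbitrary choices for illustration.

```python
import numpy as np

def pairwise_sq_dists_loop(E):
    # Reference: the double loop from the question
    batch_size, N, _ = E.shape
    D = np.zeros((batch_size, N, N))
    for x in range(N):
        for y in range(N):
            D[:, x, y] = np.sum(np.square(E[:, x, :] - E[:, y, :]), axis=1)
    return D

def pairwise_sq_dists_broadcast(E):
    # (B, 1, N, F) - (B, N, 1, F) broadcasts to (B, N, N, F);
    # summing over the feature axis leaves (B, N, N)
    return np.sum((E[:, None, :] - E[:, :, None]) ** 2, axis=-1)

rng = np.random.default_rng(0)
E = rng.random((2, 5, 32))
D_loop = pairwise_sq_dists_loop(E)
D_fast = pairwise_sq_dists_broadcast(E)
print(D_fast.shape)                  # (2, 5, 5)
print(np.allclose(D_loop, D_fast))   # True
```

The intermediate difference tensor has batch_size·N·N·32 elements, so memory grows quadratically in N. If that becomes a problem, the expansion ‖a − b‖² = ‖a‖² + ‖b‖² − 2a·b avoids the 4D intermediate at the cost of some floating-point cancellation.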

Related

Writing Code using NumPy without any loops

I am writing a program that utilizes NumPy to calculate accuracy between testing and training points, but I am not sure how to utilize the vectorized functions as opposed to the for loops I have used in my code.
Here is my code (is there a way to simplify it so that I do not need any loops?):
# command to import NumPy package
import numpy as np
iris_train=np.genfromtxt("iris-train-data.csv",delimiter=',',usecols=(0,1,2,3),dtype=float)
iris_test=np.genfromtxt("iris-test-data.csv",delimiter=',',usecols=(0,1,2,3),dtype=float)
train_cat=np.genfromtxt("iris-training-data.csv",delimiter=',',usecols=(4),dtype=str)
test_cat=np.genfromtxt("iris-testing-data.csv",delimiter=',',usecols=(4),dtype=str)
correct = 0
for i in range(len(iris_test)):
    n = 0
    old_distance = float('inf')
    while n < len(iris_train):
        # finding the difference between test and train point
        iris_diff = (abs(iris_test[i] - iris_train[n])**2)
        # summing up the calculated differences
        iris_sum = sum(iris_diff)
        new_distance = float(np.sqrt(iris_sum))
        # if statement to update distance
        if new_distance < old_distance:
            index = n
            old_distance = new_distance
        n += 1
    print(i + 1, test_cat[i], train_cat[index])
    if test_cat[i] == train_cat[index]:
        correct += 1
accuracy = ((correct)/float((len(iris_test)))*100)
print(f"Accuracy:{accuracy: .2f}%")
The trick with computing the distances is to insert extra dimensions using numpy.newaxis and use broadcasting to compute a matrix with the distance from every testing sample to every training sample in one vectorized operation. Using numpy's broadcasting rules, diff has shape (num_test_samples, num_train_samples, num_features), and distance has shape (num_test_samples, num_train_samples) since we summed along the last axis in the call to numpy.sum.
Then you can use numpy.argmin to find the index of the closest training sample for every testing sample. index has shape (num_test_samples, ) since we did the reduction operation along the last axis of distance.
Finally, you can use index to select the training classification closest to each testing sample. The == operator gives a boolean array that is True wherever the testing classification equals the closest training classification. Since True is cast to 1 and False to 0, summing this boolean array gives the number of correct classifications.
# Compute the distance from every training sample to every testing sample
# Note that `np.sqrt` is not necessary since sqrt is a monotonically
# increasing function -- removing it doesn't change the answer
diff = iris_test[:, np.newaxis] - iris_train[np.newaxis, :]
distance = np.sqrt(np.sum(np.square(diff), axis=-1))
# Compute the index of the closest training sample to the testing sample
index = np.argmin(distance, axis=-1)
# Check if class of the closest training sample matches the class
# of the testing sample
correct = (test_cat == train_cat[index]).sum()
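A self-contained sketch of the same pipeline, checked against the question's loop. Since the iris CSV files are not available here, the arrays below are random stand-ins (hypothetical data with 4 features and three class labels); with the real files you would use iris_train, iris_test, train_cat and test_cat as loaded in the question.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-ins for the iris data: 4 features per sample
iris_train = rng.random((30, 4))
iris_test = rng.random((10, 4))
labels = np.array(["setosa", "versicolor", "virginica"])
train_cat = labels[rng.integers(0, 3, size=30)]
test_cat = labels[rng.integers(0, 3, size=10)]

# Vectorized nearest neighbour: (10, 1, 4) - (1, 30, 4) -> (10, 30, 4) -> (10, 30)
diff = iris_test[:, np.newaxis] - iris_train[np.newaxis, :]
distance = np.sum(np.square(diff), axis=-1)   # squared distances; sqrt dropped
index = np.argmin(distance, axis=-1)          # shape (10,)
correct = (test_cat == train_cat[index]).sum()
accuracy = correct / len(iris_test) * 100
print(f"Accuracy:{accuracy: .2f}%")

# Loop version in the spirit of the question, for comparison
loop_index = []
for i in range(len(iris_test)):
    best, best_d = 0, float("inf")
    for n in range(len(iris_train)):
        d = np.sqrt(np.sum((iris_test[i] - iris_train[n]) ** 2))
        if d < best_d:
            best, best_d = n, d
    loop_index.append(best)
```

Both versions pick the first minimum on ties (np.argmin returns the first occurrence, and the loop only updates on a strictly smaller distance).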
If I understand correctly what you are doing (though I don't really need to in order to answer the question), for each vector of iris_test you are searching for the closest one in iris_train, closest being in the sense of Euclidean distance.
So you have three nested loops (pseudo-Python):
for u in iris_test:
    for v in iris_train:
        s = 0
        for i in range(dimensionOfVectors):
            s += (u[i] - v[i])**2
        dist = sqrt(s)
You are right to try to get rid of Python loops, and the most important one to get rid of is the innermost one. You already got rid of that one: the inner loop of my pseudocode is, in your code, implicit in
iris_diff = (abs(iris_test[i] - iris_train[n])**2)
and
iris_sum = sum(iris_diff)
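Both of those lines iterate over all dimensions of your vectors, but the subtraction, abs and **2 do so in NumPy's internal code rather than in Python, so they are fast. (Strictly, the built-in sum iterates in Python; np.sum would keep that step in NumPy too, though with only 4 features it hardly matters.)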
One might point out that abs is unnecessary since you square the result anyway, and that you could call np.linalg.norm, which does all those operations in one call:
new_distance = np.linalg.norm(iris_test[i]-iris_train[n])
which is simpler and likely faster than your code. But either way, in your code the loop over all components of the vectors is already vectorized.
The next stage is to vectorize the middle loop, and that can be done too. Instead of computing the distances one by one,
new_distance = np.linalg.norm(iris_test[i]-iris_train[n])
you can compute all len(iris_train) distances between iris_test[i] and every iris_train[n] in one call:
new_distances = np.linalg.norm(iris_test[i]-iris_train, axis=1)
The trick here lies in NumPy broadcasting and the axis parameter.
Broadcasting means that you can compute the difference between a 1D vector of length W and a 2D n×W array (iris_test[0] is a 1D vector, and iris_train is a 2D array whose number of columns equals the length of iris_test[0]). In that case NumPy broadcasts the first operand and returns a 2D n×W array whose row k is iris_test[0] - iris_train[k].
Calling np.linalg.norm on that n×W matrix would return a single float (the norm of the whole matrix), unless you restrict the norm to the second axis (axis=1), in which case it returns n floats, each being the norm of one row.
In other words, after the previous line of code, new_distances[k] is the distance between iris_test[i] and iris_train[k].
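A tiny illustration of the broadcasting and axis behaviour described above, with made-up numbers (one 2-feature test point and three 2-feature training rows, chosen so the distances are easy to check by hand):

```python
import numpy as np

test_point = np.array([0.0, 0.0])        # shape (W,) with W = 2
train = np.array([[3.0, 4.0],
                  [1.0, 0.0],
                  [0.0, 2.0]])           # shape (n, W) with n = 3

diff = test_point - train                # broadcast to (3, 2); row k is test_point - train[k]
whole = np.linalg.norm(diff)             # no axis: a single float, the norm of the whole matrix
per_row = np.linalg.norm(diff, axis=1)   # axis=1: one Euclidean norm per row
print(diff.shape)          # (3, 2)
print(per_row)             # [5. 1. 2.]
print(np.argmin(per_row))  # 1
```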
Once that is done, you can easily find the k for which this distance is smallest, using np.argmin.
np.argmin(new_distances) is the index of the smallest of the distances.
So, all together, your code could be rewritten as:
correct = 0
for i in range(len(iris_test)):
    new_distances = np.linalg.norm(iris_test[i]-iris_train, axis=1)
    index = np.argmin(new_distances)
    # printing out classifications
    print(i + 1, test_cat[i], train_cat[index])
    if test_cat[i] == train_cat[index]:
        correct += 1

Pytorch: Efficiently compute unbiased estimator of mean to the power of four

Let w, x, y, z be torch tensors of shape (m, n). For every row r = 1, ..., m, I want to compute the following unbiased estimator efficiently (without for loops):
$$\frac{1}{n(n-1)(n-2)(n-3)} \sum_{\substack{i,j,k,l \\ \text{pairwise distinct}}} w_{ri}\, x_{rj}\, y_{rk}\, z_{rl}$$
For the simpler unbiased estimator of the square of the means, i.e., for $\frac{1}{n(n-1)} \sum_{i \neq j} x_{ri}\, y_{rj}$,
this is possible, e.g., using torch.einsum:
import torch
def unbiased_mean_product(x, y):  # wrapper name chosen for illustration
    n = x.shape[1]
    batch_outer = torch.einsum('bi, bj -> bij', x, y)
    zero_diag = 1-torch.eye(n)
    return (batch_outer * zero_diag).sum(dim=2).sum(dim=1) / (n * (n-1))
However, for the fourth-power case this is not so easily done, mostly because these are no longer square matrices, and in particular because zeroing out all the diagonals becomes very tedious.
My questions:
1.) How can this be implemented efficiently, omitting any for loops?
2.) Which time and memory complexity would that solution have in big O notation?
3.) Can this solution also be used to do it with four 3D tensors of shape (m, k, n), where again we only want to do the computations along the axes of length n (dim=2)?
4.) If I want to do it in log space for numerical stability, i.e., use logsumexp for summations and sums for multiplications (because log(xy) = log(x) + log(y)), any solution with einsum wouldn't work anymore. How could the computation be done in log space?
1. This implementation seems to work, provided I didn't mess up the diagonal dimensions.
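import math
import numpy as np
import torch as th
x = np.array([1,4,5,3])
y = np.array([5,2,4,5])[np.newaxis]                          # shape (1, 4)
z = np.array([5,7,4,5])[np.newaxis][np.newaxis]              # shape (1, 1, 4)
w = np.array([3,9,5,1])[np.newaxis][np.newaxis][np.newaxis]  # shape (1, 1, 1, 4)
xth = th.Tensor(x)
yth = th.Tensor(y)
zth = th.Tensor(z)
wth = th.Tensor(w)
# Broadcasting gives a (4, 4, 4, 4) tensor with tensor[l, k, j, i] = w[l]*z[k]*y[j]*x[i]
tensor = xth*th.transpose(yth, 0, 1)*th.transpose(zth,0,2)*th.transpose(wth,0,3)
# This diagonal covers only the terms whose last two indices coincide (j == i)
diag = th.diagonal(tensor, dim1 = -2, dim2 = -1)
result = th.sum(tensor) - th.sum(diag)
# np.math is deprecated in recent NumPy, so use the standard-library math module.
# factorial(len(x)) equals n(n-1)(n-2)(n-3) only for n = 4; math.perm(n, 4) gives it for any n
result /= math.perm(len(x), 4)
print(result)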
2. As written, the code builds the full n×n×n×n product tensor by broadcasting, so both time and memory are O(n^4) per row.
3. I don't see why not; just choose the dimensions to transpose and to take the diagonal over appropriately.
4. I don't see why this solution wouldn't work in log space.
P.S.: my knowledge of PyTorch is quite limited, but I'm sure you can define x, y, z, w in a more elegant way.
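The single th.diagonal call above only discards terms whose last two indices coincide. Assuming the estimator averages over ordered index tuples (i, j, k, l) that are pairwise distinct (the four-variable analogue of zero_diag in the question), every tuple with any repeated index has to be excluded. Below is a sketch of that, batched over the m rows, plus a log-space variant for question 4 that assumes all entries are strictly positive so their logs exist. The function names are hypothetical, and brute_force is only a loop-based reference for checking.

```python
import itertools
import math
import torch

def distinct_index_mask(n):
    # (n, n, n, n) boolean mask, True only where i, j, k, l are pairwise distinct
    idx = torch.arange(n)
    i, j = idx.view(n, 1, 1, 1), idx.view(1, n, 1, 1)
    k, l = idx.view(1, 1, n, 1), idx.view(1, 1, 1, n)
    return (i != j) & (i != k) & (i != l) & (j != k) & (j != l) & (k != l)

def unbiased_fourth(w, x, y, z):
    # w, x, y, z: (m, n) with n >= 4; returns one estimate per row, shape (m,)
    n = x.shape[1]
    prod = torch.einsum('ri,rj,rk,rl->rijkl', w, x, y, z)   # (m, n, n, n, n)
    mask = distinct_index_mask(n).to(prod.dtype)             # broadcast over the m rows
    total = (prod * mask).sum(dim=(1, 2, 3, 4))
    return total / (n * (n - 1) * (n - 2) * (n - 3))

def unbiased_fourth_log(log_w, log_x, log_y, log_z):
    # Same estimator in log space: products become sums, sums become logsumexp.
    # Inputs are logs of strictly positive (m, n) tensors; returns the log of the estimate.
    m, n = log_x.shape
    log_prod = (log_w[:, :, None, None, None] + log_x[:, None, :, None, None]
                + log_y[:, None, None, :, None] + log_z[:, None, None, None, :])
    # Excluded tuples get -inf, so exp(-inf) = 0 contributes nothing to the sum
    log_prod = log_prod.masked_fill(~distinct_index_mask(n), float("-inf"))
    log_count = math.log(n * (n - 1) * (n - 2) * (n - 3))
    return torch.logsumexp(log_prod.reshape(m, -1), dim=1) - log_count

def brute_force(w, x, y, z):
    # Reference with explicit loops over ordered tuples of distinct indices
    m, n = x.shape
    out = torch.zeros(m, dtype=x.dtype)
    for r in range(m):
        for i, j, k, l in itertools.permutations(range(n), 4):
            out[r] += w[r, i] * x[r, j] * y[r, k] * z[r, l]
    return out / (n * (n - 1) * (n - 2) * (n - 3))

torch.manual_seed(0)
w, x, y, z = (torch.rand(3, 6, dtype=torch.float64) + 0.1 for _ in range(4))
print(unbiased_fourth(w, x, y, z))
print(torch.exp(unbiased_fourth_log(w.log(), x.log(), y.log(), z.log())))
```

Both functions materialize an (m, n, n, n, n) tensor, so they are only practical for small n. For question 3, the same masking should carry over to (m, k, n) inputs by adding the k index to the einsum subscripts and broadcasting the mask over it.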

Central difference with Convolution

So basically I am trying to do finite differencing on a 2d array without doing too many for loops. I would like to have the Hessian matrix of the array, and the gradient. So I need both the first order and second order derivative of the array.
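This can be achieved by evaluating the following central-difference formula on the array:
$$\left.\frac{\partial a}{\partial x}\right|_{i,j} \approx \frac{a_{i+1,j} - a_{i-1,j}}{2\,\Delta x}$$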
To deal with the boundaries we only compute it for the interior points, so code for this derivative might look something like the following:
import numpy as np
dx = 1.0  # grid spacing (value assumed for illustration)
arr = np.random.rand(16).reshape(4,4)
result = np.zeros_like(arr)
w, h = arr.shape
for i in range(1, w-1):
    for j in range(1, h-1):
        result[i,j] = (arr[i+1, j] - arr[i-1, j]) / (2*dx)
This gives the correct answer but can be very slow compared to NumPy operations, so I thought to myself: this is basically just a convolution with a kernel that looks like this:
kernel = [1, 0 , -1]
So we execute the following code
from scipy.signal import convolve
result = np.pad((convolve(arr,kernel,mode='same',
method = 'direct')/(2*dx))[1:-1, 1:-1], 1).T
Since we are only dealing with the interior points, we cut the boundary off and pad with zeros afterwards, to mimic what would happen in the previous naive case.
This works! But with some arrays, the mean squared error between the naive case and the convolution case skyrockets, so it seems that the numerical error increases a lot in some cases.
I would like the speed gained by convolution with the stability of the naive case. Any help?
We can simply slice and operate. After initializing the output, do:
result[1:-1,1:-1] = (arr[2:,1:-1] - arr[:-2,1:-1])/(2*dx)
Convolution would IMHO be overkill when working with NumPy arrays, since slicing is virtually free in memory and performance (basic slices are views, not copies). If the computation becomes compute-heavy, one could also look into numexpr to leverage multiple cores.
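For comparison, a minimal sketch checking the loop, the slicing answer and a convolution against each other. The convolution uses a column-shaped (3, 1) kernel, which is an assumption and not the question's exact code, so that scipy.signal.convolve differences along axis 0 directly without any transpose; because convolution flips the kernel, [1, 0, -1] yields arr[i+1, j] - arr[i-1, j]. np.gradient is included because its interior values use the same central difference. The grid spacing dx = 0.1, the 6×5 array and the helper names are arbitrary.

```python
import numpy as np
from scipy.signal import convolve

def central_diff_loop(arr, dx):
    # The question's naive version: interior points only, zeros on the boundary
    result = np.zeros_like(arr)
    for i in range(1, arr.shape[0] - 1):
        for j in range(1, arr.shape[1] - 1):
            result[i, j] = (arr[i + 1, j] - arr[i - 1, j]) / (2 * dx)
    return result

def central_diff_slice(arr, dx):
    # The slicing answer: shifted views of arr, no Python loops
    result = np.zeros_like(arr)
    result[1:-1, 1:-1] = (arr[2:, 1:-1] - arr[:-2, 1:-1]) / (2 * dx)
    return result

def central_diff_conv(arr, dx):
    # (3, 1) column kernel: differences along axis 0 only, so no transpose is needed
    kernel = np.array([[1.0], [0.0], [-1.0]])
    full = convolve(arr, kernel, mode="same", method="direct") / (2 * dx)
    result = np.zeros_like(arr)
    result[1:-1, 1:-1] = full[1:-1, 1:-1]
    return result

rng = np.random.default_rng(0)
arr = rng.random((6, 5))
dx = 0.1
loop = central_diff_loop(arr, dx)
sliced = central_diff_slice(arr, dx)
conv = central_diff_conv(arr, dx)
grad = np.gradient(arr, dx, axis=0)  # central differences in the interior
print(np.allclose(loop, sliced), np.allclose(loop, conv))
print(np.allclose(loop[1:-1, 1:-1], grad[1:-1, 1:-1]))
```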

What does the MNIST tensorflow tutorial mean with matmul flipping trick?

The tutorial on MNIST for ML Beginners, in Implementing the Regression, shows how to make the regression on a single line, followed by an explanation that mentions the use of a trick (emphasis mine):
y = tf.nn.softmax(tf.matmul(x, W) + b)
First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had Wx, as a small trick to deal with x being a 2D tensor with multiple inputs.
What is the trick here, and why are we using it?
Well, there's no trick here. That line simply refers back to the multiplication order in the earlier equation:
# Here the order of W and x, this equation for single example
y = Wx +b
# if you want to use a batch of examples you need to change the order of multiplication, instead of using another transpose op
y = xW +b
# hence
y = tf.matmul(x, W)
OK, I think the main point is that if you train in batches (i.e., train with several instances of the training set at once), TensorFlow always assumes that the zeroth dimension of x indexes the events (instances) in the batch.
Suppose you want to map a training instance of dimension M to a target instance of dimension N. You would typically do this by multiplying x (a column vector) by an NxM matrix W and optionally adding a bias b of dimension N (also a column vector), i.e.
y = W*x + b, where y is also a column vector.
This is perfectly alright seen from the perspective of linear algebra. But now comes the point with the training in batches, i.e. training with several training instances at once.
To get to understand this, it might be helpful to not view x (and y) as vectors of dimension M (and N), but as matrices with the dimensions Mx1 (and Nx1 for y).
Since TensorFlow assumes that the different training instances constituting a batch are aligned along the zeroth dimension, we get into trouble here since the zeroth dimension is occupied by the different elements of one single instance.
The trick is then to transpose the above equation (remember that transposition of a product also switches the order of the two transposed objects):
y^T = x^T * W^T + b^T
This is pretty much what has been described in short within the tutorial.
Note that y^T is now a matrix of dimension 1xN (practically a row vector), while x^T is a matrix of dimension 1xM (also a row vector), and W^T is a matrix of dimension MxN. The tutorial does not write x^T or y^T; it simply defines the placeholders according to this transposed equation. One might wonder why they did not define b the "transposed way": b is a 1D tensor of length N, and + broadcasts it along the last dimension, so it is added to each row and no transposition is needed.
The rest is now pretty easy: if you have batches larger than one instance, you just "stack" multiple x^T (1xM) matrices into a matrix of dimensions AxM (where A is the batch size). b is automatically broadcast to this number of events (that is, to an AxN matrix). If you then use
y^T = x^T * W^T + b^T,
you will get a (AxN) matrix of the targets for each element of the batch.
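A quick NumPy check of the argument above, with arbitrary sizes (batch A=5, M=4 inputs, N=3 outputs) and random values: stacking the examples as rows and computing x W + b gives the same result as applying the column-vector form W^T x + b to each example separately, with the 1D bias broadcast over the batch.

```python
import numpy as np

rng = np.random.default_rng(0)
A, M, N = 5, 4, 3
X = rng.random((A, M))   # batch of A examples, one per row (zeroth dimension = batch)
W = rng.random((M, N))   # weights in the tutorial's (M, N) orientation
b = rng.random(N)        # 1D bias of length N

# Batched, "flipped" form: (A, M) @ (M, N) + (N,) -> (A, N); b is broadcast to every row
Y_batch = X @ W + b

# Column-vector form per example, y = W^T x + b, stacked back into rows
Y_loop = np.stack([W.T @ x + b for x in X])
print(Y_batch.shape)                  # (5, 3)
print(np.allclose(Y_batch, Y_loop))   # True
```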

Matrix vector multiplication along array axes

In a current project I have a large multidimensional array of shape (I,J,K,N) and a square matrix of dimension N.
I need to perform a matrix-vector multiplication of the last axis of the array with the square matrix.
So the obvious solution would be:
for i in range(I):
    for j in range(J):
        for k in range(K):
            arr[i,j,k] = mat.dot(arr[i,j,k])
but of course this is rather slow. So I also tried numpy's tensordot but had little success.
I would expect that something like:
arr = tensordot(mat,arr,axes=((0,1),(3)))
should do the trick but I get a shape mismatch error.
Does anyone have a better solution, or know how to use tensordot correctly?
Thank you!
This should do what your loops do, but with vectorized looping:
from numpy.core.umath_tests import matrix_multiply
arr[..., np.newaxis] = matrix_multiply(mat, arr[..., np.newaxis])
matrix_multiply and its sister inner1d are hidden, undocumented gems of numpy, although a full set of linear algebra gufuncs should see the light with numpy 1.8. matrix_multiply does matrix multiplication on the last two dimensions of its inputs and broadcasts over the rest. The only tricky part is adding an extra dimension so that it sees column vectors when multiplying, and adding it on assignment back into the array as well, so that there is no shape mismatch.
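matrix_multiply lives in numpy.core.umath_tests, which is an internal module and may not be available in current NumPy releases; np.matmul (the @ operator, added in NumPy 1.10) provides the same broadcasting behaviour. Below is a sketch, on random stand-in data, of several equivalent ways to do the question's loop, including the tensordot call the question was after: axes=((0,1),(3)) contracts two axes of mat against a single axis of arr, whereas what is needed is to contract axis 1 of mat with axis 3 of arr. Passing arr first keeps the resulting axis in the last position.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, N = 2, 3, 4, 5
arr = rng.random((I, J, K, N))
mat = rng.random((N, N))

# Reference: the triple loop from the question, written into a copy
expected = arr.copy()
for i in range(I):
    for j in range(J):
        for k in range(K):
            expected[i, j, k] = mat.dot(arr[i, j, k])

# Same pattern as the matrix_multiply answer, using the public np.matmul
in_place = arr.copy()
in_place[..., np.newaxis] = np.matmul(mat, in_place[..., np.newaxis])

via_einsum = np.einsum('ab,ijkb->ijka', mat, arr)          # sum over b of mat[a, b] * arr[..., b]
via_tensordot = np.tensordot(arr, mat, axes=([3], [1]))    # contract arr axis 3 with mat axis 1
via_transpose = arr @ mat.T                                 # row-vector form: v @ mat.T == mat @ v
```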
I think your for loop is wrong, and for this case dot seems to be enough:
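from numpy import dot
# a is your IJKN array
# b is your NN matrix
# note: dot(a, b) computes a[i,j,k] @ b for every (i,j,k); dot(a, b.T) gives b.dot(a[i,j,k])
c = dot(a, b)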
Here c will be an IJKN array. If you want to sum over the last dimension to get an IJK array:
arr = dot(a,b).sum(axis=3)
BUT I'm NOT SURE IF THIS IS WHAT YOU WANT...
