TL;DR: given two tensors t1 and t2 that each hold b samples of shape (c, h, w) (i.e., both tensors have shape (b, c, h, w)), I'm trying to compute the pairwise distance between t1[i] and t2[j] for all i, j efficiently.
Some more context: I've extracted ResNet18 activations for both my train and test data (CIFAR10) and I'm trying to implement k-nearest-neighbours. A possible pseudo-code might be:
for te in test_activations:
    distances = []
    for tr in train_activations:
        distances.append(||te - tr||)
    neighbors = k_smallest_elements(distances)
    prediction(te) = majority_vote(labels(neighbors))
I'm trying to vectorise this process given batches from the test and train activation datasets. I've tried iterating over the batches (rather than the samples) and using torch.cdist(train_batch, test_batch), but I'm not quite sure how this function handles multi-dimensional tensors, as the documentation states:
torch.cdist(x1, x2,...):
If x1 has shape BxPxM and x2 has shape BxRxM then the output will have shape BxPxR
Which doesn't seem to handle my case (see below)
A minimal example can be found here:
b, c, h, w = 1000, 128, 28, 28  # actual dimensions in my problem
train_batch = torch.randn(b, c, h, w)
test_batch = torch.randn(b, c, h, w)
d = torch.cdist(train_batch, test_batch)  # not the (b, b) distance matrix I'm after
You can think of test_batch and train_batch as the tensors you would get in the nested loop for test_batch in test: for train_batch in train: ...
EDIT: I'm adding another example:
Both t1[i] and t2[j] are tensors of shape (c, h, w), and the distance between them is a scalar d. So, for example, if we have
t1 = torch.randn(2,128,28,28)
t2 = torch.randn(2,128,28,28)
the distance matrix would look something like
[[d(t1[0],t2[0]), d(t1[0],t2[1])],
[d(t1[1],t2[0]), d(t1[1],t2[1])]]
and have a shape (2,2) (or (b,b) more generally)
where d is the scalar distance between the two tensors t1[i] and t2[j].
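For concreteness, a naive (unvectorised) way to build this matrix, assuming d is the Euclidean norm of the difference, would be something like:
import torch

t1 = torch.randn(2, 128, 28, 28)
t2 = torch.randn(2, 128, 28, 28)

b = t1.shape[0]
dist = torch.empty(b, b)
for i in range(b):
    for j in range(b):
        # scalar distance between two (c, h, w) samples
        dist[i, j] = torch.norm(t1[i] - t2[j])

print(dist.shape)  # torch.Size([2, 2])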
It is common to have to reshape your data before feeding it to a builtin PyTorch operator. As you've said, torch.cdist works with two inputs shaped (B, P, M) and (B, R, M) and returns a tensor shaped (B, P, R).
Instead, you have two tensors shaped the same way: (b, c, h, w). If we match those dimensions we have: B=b, M=c, while P=h*w (from the 1st tensor) and R=h*w (from the 2nd tensor). This requires flattening the spatial dimensions together and swapping the last two axes. Something like:
>>> x1 = train_batch.flatten(2).transpose(1, 2)  # (b, h*w, c)
>>> x2 = test_batch.flatten(2).transpose(1, 2)   # (b, h*w, c)
>>> d = torch.cdist(x1, x2)                      # (b, h*w, h*w)
Now d contains the distances between all possible pairs (train_batch[b, :, iy, ix], test_batch[b, :, jy, jx]) and is shaped (b, h*w, h*w).
You can then apply a kNN, using topk with largest=False (or argsort) on the distances, to retrieve the k closest neighbours of each element of the training batch within the test batch.
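For instance, a minimal sketch of that retrieval step (taking k=5 arbitrarily, and reusing d from above):
>>> k = 5
>>> values, indices = d.topk(k, dim=2, largest=False)  # k smallest distances per train location
>>> values.shape
torch.Size([1000, 784, 5])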
Related
I have a numpy array of 3D vectors of shape (n, 3), n being quite large, which can be illustrated as follows:
vects = [[x_1 y_1 z_1]
...
[x_i y_i z_i]
...
[x_n y_n z_n]]
I then use an existing function that computes the divergence of my vector array according to the spatial coordinates DataSet at which the vectors are specified:
vects_div = algs.divergence(vects, DataSet)
What is actually computed doesn't really matter; the important point is that my function needs as arguments:
a (n, 3)-shaped numpy array
an object describing the spatial coordinates
and it outputs a (n,) array of scalars.
My issue is that I want to apply this function to subsamples of an array of tensors.
Let me explain.
This time, I have a numpy array of 3D tensors of shape (n, 3, 3), which can be illustrated as follows:
tensors = [[[ xx_1 xy_1 xz_1]
[ yx_1 yy_1 yz_1]
[ zx_1 zy_1 zz_1]]
...
[[ xx_i xy_i xz_i]
[ yx_i yy_i yz_i]
[ zx_i zy_i zz_i]]
...
[[ xx_n xy_n xz_n]
[ yx_n yy_n yz_n]
[ zx_n zy_n zz_n]]]
I then use the previous function on three vector-array subsamples of this tensor array:
tensors_x = tensors[:,0,:]
tensors_x = [[xx_1 xy_1 xz_1]
...
[xx_i xy_i xz_i]
...
[xx_n xy_n xz_n]]
tensors_y = tensors[:,1,:]
tensors_y = [[yx_1 yy_1 yz_1]
...
[yx_i yy_i yz_i]
...
[yx_n yy_n yz_n]]
tensors_z = tensors[:,2,:]
tensors_z = [[zx_1 zy_1 zz_1]
...
[zx_i zy_i zz_i]
...
[zx_n zy_n zz_n]]
tensors_x_div = algs.divergence(tensors_x, DataSet)
tensors_y_div = algs.divergence(tensors_y, DataSet)
tensors_z_div = algs.divergence(tensors_z, DataSet)
This results in three separate arrays of scalars of shape (n,).
Finally I rebuild the vectors array of results as follows:
tensors_div = np.column_stack((tensors_x_div, tensors_y_div, tensors_z_div))
All of this can be summarized into a single and dirty line:
tensors_div = np.column_stack((algs.divergence(tensors[:,0,:], DataSet), algs.divergence(tensors[:,1,:], DataSet), algs.divergence(tensors[:,2,:], DataSet)))
This is all working fine, but I'm looking for a prettier and, if possible, more efficient way to do it.
I was thinking of the np.apply_along_axis function, with something like
np.apply_along_axis(algs.divergence, 1, tensors, DataSet)
But the slices sent to my function are of shape (3,) instead of (n,3).
Is there any numpy function which would allow me to do what I want?
Is there a better way than my dirty line?
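For what it's worth, one marginally cleaner equivalent of the one-liner above (the same three calls, just collected in a loop and stacked) could be written as follows; it only hides the repetition rather than removing it:
import numpy as np

# assuming algs.divergence and DataSet as described in the question
tensors_div = np.stack(
    [algs.divergence(tensors[:, i, :], DataSet) for i in range(3)],
    axis=1,
)  # shape (n, 3), same as the column_stack version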
I would like to figure out a way to apply a function which calculates pairwise distances, let's call it dists(A, B), row-wise for every input element in a batch, meaning:
(100, 16, 3) -- the input, where 100 is the batch size (100 instances), 16 is, say, the image size, and 3 is the number of filters (think Conv2D)
(5, 3) -- tensor for which I want to calculate the row-wise distance (assume it's A in dists(A, B) and is fixed)
Now, for every instance I am supposed to get back a matrix of shape (5, 16). Naturally, I could use a for loop over the batch and get my final (100, 5, 16) result. However, I would love to know if there is an easier way to apply my function row-wise, in parallel, using the GPU.
Thank you very much for your time.
Suppose we are using the L1 distance:
import torch

# data and target
a = torch.randn(100, 16, 3)
b = torch.randn(5, 3)

# Reshape the tensors so they broadcast against each other
a = a.unsqueeze(1)               # (100, 1, 16, 3)
b = b.unsqueeze(0).unsqueeze(2)  # (1, 5, 1, 3)
print(a.shape, b.shape)

# Compute the L1 distance over the last (feature) dimension
dist = (a - b).abs().sum(3)      # (100, 5, 16)
print(dist.shape)
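The same broadcasting pattern works for other metrics; for example, an L2 (Euclidean) distance only changes the reduction over the last axis (reusing the reshaped a and b from above):
dist_l2 = ((a - b) ** 2).sum(3).sqrt()  # (100, 5, 16)
print(dist_l2.shape)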
I have a numpy 2D array with values that range from 0 to 59.
For those who are familiar with DL and specifically image segmentation: I create the array (call it L) from a .png image, and the value of each pixel L[x, y] is the class that this pixel belongs to (out of the 60 classes).
I want to create a 1-hot tensor - Lhot, in which (Lhot[x,y,z] == 1) only if (L[x,y] == z), and 0 otherwise.
I want to create it with some kind of broadcasting/indexing (1-2 lines), without loops.
It should be functionally equivalent to this piece of code (Dtype corresponds to L's dtype):
Lhot = np.zeros((L.shape[0], L.shape[1], 60), dtype=Dtype)
for i in range(L.shape[0]):
    for j in range(L.shape[1]):
        Lhot[i, j, L[i, j]] = 1
Does anyone have an idea?
Thanks!
A much faster and cleaner way, using pure numpy:
Lhot = np.eye(60, dtype=Dtype)[L]  # shape (L.shape[0], L.shape[1], 60), as in the loop version
The problem you'll run into with multidimensional one-hots is that they get really big and really sparse, and there's no good way to handle sparse arrays with more than 2 dimensions in numpy/scipy (or sklearn or many other ML packages either, I think). Do you really need an n-d one-hot?
Since typical one-hot encoding is defined for 1D vectors, all you have to do is flatten your matrix, use the one-hot encoder from scikit-learn (or any other library with one-hot encoding) and reshape back.
from sklearn.preprocessing import OneHotEncoder

n, m = L.shape
k = 60
# note: n_values has been removed from newer scikit-learn releases; categories=[range(k)] is the modern equivalent
Lhot = np.array(OneHotEncoder(n_values=k).fit_transform(L.reshape(-1, 1)).todense()).reshape(n, m, k)
Of course, you can do it by hand too:
n, m = L.shape
k = 60
Lhot = np.zeros((n*m, k))              # flat array of zeros
Lhot[np.arange(n*m), L.flatten()] = 1  # one-hot encoding for 1D
Lhot = Lhot.reshape(n, m, k)           # reshape back to a 3D tensor
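A quick sanity check of this hand-rolled version against the loop-based definition from the question (using a small random L for illustration):
import numpy as np

L = np.random.randint(0, 60, size=(4, 5))  # small example label map
n, m = L.shape
k = 60

Lhot = np.zeros((n * m, k))
Lhot[np.arange(n * m), L.flatten()] = 1
Lhot = Lhot.reshape(n, m, k)

# every pixel's class index is recovered by argmax over the last axis
assert np.array_equal(Lhot.argmax(axis=2), L)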
I have two tensors.
A tensor of shape (1,N)
A tensor of shape (N,T)
What I want to calculate is the following scalar (with l the (1, N) tensor and z the (N, T) tensor):
sum over i from 1 to N of ( z[i, t] / l[i] + (1 - z[i, t]) / (1 - l[i]) )
tf.reduce_sum seemed helpful, but I couldn't get my head around combining the two tensors and the reduce functions to get what I want. Can someone help me write the above equation in tensorflow?
Does this work?
import tensorflow as tf
import numpy as np
N = 10
T = 20
l = tf.constant(np.random.randn(1, N), dtype=tf.float32)
z = tf.constant(np.random.randn(N, T), dtype=tf.float32)
with tf.Session() as sess:
    # swap axes so that broadcasting works
    l = tf.transpose(l, [1, 0])
    z_div_l = tf.divide(z, l)
    z_div_l_2 = tf.divide(1.0 - z, 1.0 - l)
    result = tf.reduce_sum(tf.add(z_div_l, z_div_l_2), axis=0)
    eval_result = sess.run(result)
    print('{}\n{}'.format(eval_result.shape, eval_result))
This calculates the above expression for every t from 0 to T-1, so it is not a scalar but a vector of size (T,). Your question mentions you want to compute just one scalar, but the sum is only over N and not over T, so I assumed you just want this expression to be evaluated for every t.
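For what it's worth, in TensorFlow 2.x (eager execution) the same computation can be written without a Session; a minimal sketch:
import tensorflow as tf
import numpy as np

N, T = 10, 20
l = tf.constant(np.random.randn(1, N), dtype=tf.float32)
z = tf.constant(np.random.randn(N, T), dtype=tf.float32)

l = tf.transpose(l, [1, 0])                                    # (N, 1), broadcasts against (N, T)
result = tf.reduce_sum(z / l + (1.0 - z) / (1.0 - l), axis=0)  # shape (T,)
print(result.shape)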
I have trained per-pixel models on many images and want to evaluate them on new images.
What I'd like to do is, for each image of shape (N, M, 3), apply a function in this fashion:
myfunc(array[i, j, :], i, j)

# Takes a (3,) slice and its indices
def myfunc(input, i, j):
    ret1, ret2 = model[i, j].predict(input)
    # returns a single float value
    return ret1[1]
where i, j are indices that myfunc will use to look up the correct model parameters to apply. If it helps, model can be a numpy array of objects with the same dimensions as the original input's first two dimensions (N, M).
I was looking at ufuncs and vectorize and wasn't really sure whether they do what I want. Is there a provided interface for doing this, or will I have to loop through the array myself (which is ugly and possibly slow, as it is in Python)?
Alternatively, what about applying the same function to each value?
e.g.
myfunc(array[i, j, :])

# Takes a (3,) slice
def myfunc(input):
    ret1, ret2 = model.predict(input)
    # returns a single float value
    return ret1[1]
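For reference, np.vectorize is documented as being essentially a Python-level for loop, so it would not give a real speed-up here. A minimal sketch of the plain looping approach, using a hypothetical DummyModel to stand in for the trained per-pixel models:
import numpy as np

class DummyModel:
    """Hypothetical stand-in for one trained per-pixel model."""
    def predict(self, x):
        # mimic "ret1, ret2 = model[i, j].predict(input)"
        return np.array([0.0, float(x.sum())]), None

N, M = 4, 5
array = np.random.rand(N, M, 3)

# (N, M) object array of models, as described in the question
model = np.empty((N, M), dtype=object)
for idx in np.ndindex(N, M):
    model[idx] = DummyModel()

result = np.empty((N, M))
for i, j in np.ndindex(N, M):
    ret1, _ = model[i, j].predict(array[i, j, :])
    result[i, j] = ret1[1]  # the single float value per pixel

print(result.shape)  # (4, 5)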