Reducing two tensors in Tensorflow - python

I have two tensors.
A tensor of shape (1,N)
A tensor of shape (N,T)
What I want to calculate is the following scalar:
tf.reduce_sum seemed helpful, but I couldn't get my head around combining the two tensors and the reduce functions to get what I want. Can someone help me write the above equation in TensorFlow?

Does this work?
import tensorflow as tf
import numpy as np
N = 10
T = 20
l = tf.constant(np.random.randn(1, N), dtype=tf.float32)
z = tf.constant(np.random.randn(N, T), dtype=tf.float32)
with tf.Session() as sess:  # TensorFlow 1.x graph-mode API
    # swap axes so that l has shape (N, 1) and broadcasts against z of shape (N, T)
    l = tf.transpose(l, [1, 0])
    z_div_l = tf.divide(z, l)
    z_div_l_2 = tf.divide(1.0 - z, 1.0 - l)
    # sum over N (axis 0), leaving one value per t
    result = tf.reduce_sum(tf.add(z_div_l, z_div_l_2), axis=0)
    eval_result = sess.run(result)
    print('{}\n{}'.format(eval_result.shape, eval_result))
This computes the above expression for every t from 0 to T-1, so the result is not a scalar but a vector of shape (T,). Your question says you want a single scalar, but the sum runs only over N and not over T, so I assumed you want the expression evaluated for every t.
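To check the TensorFlow result, here is a plain NumPy sketch of the same broadcasting computation. It assumes, as the code above does, that the expression is the sum over n of z[n, t]/l[n] + (1 - z[n, t])/(1 - l[n]); the uniform(0.1, 0.9) test data is an arbitrary choice that keeps both denominators away from zero. It also shows the single scalar you get if you sum over T as well.

```python
import numpy as np

N, T = 10, 20
rng = np.random.default_rng(0)
# values in (0.1, 0.9) so that neither l nor 1 - l is zero
l = rng.uniform(0.1, 0.9, size=(1, N))
z = rng.uniform(0.1, 0.9, size=(N, T))

l_col = l.T  # shape (N, 1): broadcasts against z's (N, T)
terms = z / l_col + (1.0 - z) / (1.0 - l_col)  # shape (N, T)

per_t = terms.sum(axis=0)  # sum over N only -> shape (T,)
total = terms.sum()        # sum over both N and T -> one scalar

print(per_t.shape, total)
```

In TensorFlow the fully reduced scalar is simply tf.reduce_sum called without the axis argument.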

Related

Vectorised pairwise distance

TL;DR: given two tensors t1 and t2 that each represent b samples of a tensor with shape (c, h, w) (i.e., each has shape (b, c, h, w)), I'm trying to efficiently calculate the pairwise distance between t1[i] and t2[j] for all i, j.
Some more context: I've extracted ResNet18 activations for both my train and test data (CIFAR10) and I'm trying to implement k-nearest neighbours. Possible pseudocode:
for te in test_activations:
    distances = []
    for tr in train_activations:
        distances.append(||te-tr||)
    neighbors = k_smallest_elements(distances)
    prediction(te) = majority_vote(labels(neighbors))
I'm trying to vectorise this process given batches from the test and train activation datasets. I've tried iterating over the batches (rather than the samples) and using torch.cdist(train_batch, test_batch), but I'm not sure how this function handles multi-dimensional tensors, since the documentation states:
torch.cdist(x1, x2,...):
If x1 has shape BxPxM and x2 has shape BxRxM then the output will have shape BxPxR
which doesn't seem to handle my case (see below).
A minimal example:
import torch
b, c, h, w = 1000, 128, 28, 28  # actual dimensions in my problem
train_batch = torch.randn(b, c, h, w)
test_batch = torch.randn(b, c, h, w)
d = torch.cdist(train_batch, test_batch)
You can think of test_batch and train_batch as the tensors in the loops for test_batch in test: for train_batch in train: ...
EDIT: I'm adding another example:
Both t1[i] and t2[j] are tensors shaped (c, h, w), and the distance between them is a scalar d. So, for example, if we have
t1 = torch.randn(2,128,28,28)
t2 = torch.randn(2,128,28,28)
the distance matrix would look something like
[[d(t1[0],t2[0]), d(t1[0],t2[1])],
[d(t1[1],t2[0]), d(t1[1],t2[1])]]
and have shape (2,2) (or (b,b) more generally),
where d is the scalar distance between the two tensors t1[i] and t2[j].
It is common to have to reshape your data before feeding it to a built-in PyTorch operator. As you've said, torch.cdist works with two inputs shaped (B, P, M) and (B, R, M) and returns a tensor shaped (B, P, R).
Instead, you have two tensors shaped the same way: (b, c, h, w). If we match those dimensions we have: B=b, M=c, while P=h*w (from the 1st tensor) and R=h*w (from the 2nd tensor). This requires flattening the spatial dimensions together and swapping the last two axes. Something like:
>>> x1 = train_batch.flatten(2).transpose(1,2)
>>> x2 = test_batch.flatten(2).transpose(1,2)
>>> d = torch.cdist(x1, x2)
Now d contains the distances between all possible pairs (train_batch[b, :, iy, ix], test_batch[b, :, jy, jx]) and is shaped (b, h*w, h*w).
You can then run a k-NN on it; since the nearest neighbours are the ones with the smallest distances, use torch.topk(d, k, largest=False) (or argsort) rather than argmax to retrieve the k closest neighbours.
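Note that the reshape above measures distances between spatial positions inside each pair of samples. If what you want is the (b, b) matrix from the EDIT, each sample can instead be flattened into one vector of length c*h*w; in PyTorch that would be torch.cdist(t1.flatten(1), t2.flatten(1)). Below is a NumPy sketch of that idea followed by a k-NN majority vote. The helper names pairwise_distances and knn_predict and the toy sizes are hypothetical.

```python
import numpy as np

def pairwise_distances(t1, t2):
    """Euclidean distance between every t1[i] and t2[j], each sample flattened.

    t1: (b1, c, h, w), t2: (b2, c, h, w) -> returns (b1, b2)
    """
    a = t1.reshape(len(t1), -1)  # (b1, c*h*w)
    b = t2.reshape(len(t2), -1)  # (b2, c*h*w)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, for all pairs at once
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative rounding errors

def knn_predict(test_x, train_x, train_labels, k):
    """Majority vote over the k nearest training samples for each test sample."""
    d = pairwise_distances(test_x, train_x)  # (n_test, n_train)
    nearest = np.argsort(d, axis=1)[:, :k]    # indices of the k smallest distances
    votes = train_labels[nearest]             # (n_test, k)
    # count labels per row; argmax picks the most frequent (ties -> smallest label)
    return np.array([np.bincount(row).argmax() for row in votes])

rng = np.random.default_rng(0)
t1 = rng.standard_normal((2, 4, 3, 3))
t2 = rng.standard_normal((2, 4, 3, 3))
print(pairwise_distances(t1, t2).shape)  # (2, 2)
```

For full CIFAR10 activation sets, feeding the test samples in chunks keeps the (n_test, n_train) distance matrix within memory.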

Correlating an array row-wise with a vector

I have an array X of shape (m, n), and for every one of its m rows I want the correlation with a vector y of length n.
In MATLAB this is possible with the corr function, corr(X,y). In Python, however, this does not seem possible with np.corrcoef:
import numpy as np
X = np.random.random([1000, 10])
y = np.random.random(10)
np.corrcoef(X,y).shape
This returns a matrix of shape (1001, 1001), which fails when X has many rows. In my case I get this error:
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 5.93 TiB for an array with shape (902630, 902630) and data type float64
because X.shape[0] is 902630.
My question is: how can I get only the row-wise correlations with the vector, i.e. an array of shape (1000,) with all the correlations?
Of course this could be done via a list comprehension:
np.array([np.corrcoef(X[i, :], y)[0,1] for i in range(X.shape[0])])
Currently I am therefore using numba with a for loop over the >900000 elements, but I suspect there is a much more efficient matrix operation for this problem.
EDIT:
Pandas also provides a method for this problem, DataFrame.corrwith:
import pandas as pd
X_df = pd.DataFrame(X)
y_s = pd.Series(y)
X_df.corrwith(y_s, axis=1)  # axis=1: correlate each row with y_s
The implementation supports different correlation types, but it does not seem to be implemented as a matrix operation and is therefore very slow. There is probably a more efficient implementation.
This should compute the correlation coefficient of each row with y in a vectorized manner:
import numpy as np
X = np.random.random([1000, 10])
y = np.random.random(10)
n = len(y)
# Pearson r = (n*sum(xy) - sum(x)*sum(y)) / sqrt((n*sum(x^2) - sum(x)^2) * (n*sum(y^2) - sum(y)^2)), per row of X
r = (n * np.sum(X * y[None, :], axis=-1) - np.sum(X, axis=-1) * np.sum(y)) / np.sqrt(
    (n * np.sum(X**2, axis=-1) - np.sum(X, axis=-1) ** 2) * (n * np.sum(y**2) - np.sum(y) ** 2))
print(r[0], np.corrcoef(X[0], y)[0, 1])
# e.g. 0.4243951 0.4243951 (the two values agree; they vary with the random data)
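An equivalent way to write the same row-wise Pearson correlation is to centre each row of X and the vector y first, which avoids subtracting large, nearly equal sums. This is a sketch; the helper name rowwise_corr is hypothetical. Memory use stays proportional to X itself, so a 902630 x 10 array (about 72 MB in float64) is fine, unlike the (m+1) x (m+1) matrix from np.corrcoef.

```python
import numpy as np

def rowwise_corr(X, y):
    """Pearson correlation of every row of X (m, n) with y (n,) -> shape (m,)."""
    Xc = X - X.mean(axis=1, keepdims=True)  # centre each row
    yc = y - y.mean()                       # centre the vector
    num = Xc @ yc                           # sum of centred products per row, (m,)
    den = np.sqrt((Xc ** 2).sum(axis=1) * (yc ** 2).sum())
    return num / den

rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = rng.random(10)
r = rowwise_corr(X, y)
print(r.shape)  # (1000,)
```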

Fancy indexing in tensorflow

I have implemented a 3D CNN with a custom loss function (Ax' - y)^2 where x' is a flattened and cropped vector of the 3D output from the CNN, y is the ground truth and A is a linear operator that takes an x and outputs a y. So I need a way to flatten the 3D output and crop it using fancy indexing before computing the loss.
Here is what I have tried:
This is the NumPy code I am trying to replicate:
import numpy as np
def flatten_crop(img_vol, indices, vol_shape, N):
    """
    :param img_vol: shape (145, 59, 82, N)
    :param indices: shape (396929,)
    """
    nVx, nVy, nVz = vol_shape
    # flatten each volume in Fortran (column-major) order, one column per sample
    voxels = np.reshape(img_vol, (nVx * nVy * nVz, N), order='F')
    voxels = voxels[indices, :]
    return voxels
I tried using tf.gather_nd to perform the same action, but I am unable to generalize it to an arbitrary batch size. Here is my TensorFlow code for a batch size of 1 (a single 3D output):
voxels = tf.transpose(tf.reshape(tf.transpose(y_pred), (1, 145 * 59 * 82))) # to flatten and reshape using Fortran-like index order
voxels = tf.gather_nd(voxels, tf.stack([indices, tf.zeros(len(indices), dtype=tf.dtypes.int32)], axis=1)) # indexing
voxels = tf.reshape(voxels, (voxels.shape[0], 1))
Currently this code is in my custom loss function, and I would like to generalize it to an arbitrary batch size. Also, if you have an alternative suggestion for implementing this (such as a custom layer instead of putting it in the loss function), I am all ears!
Thank you.
Try this code:
import tensorflow as tf
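y_pred = tf.random.uniform((10, 145, 59, 82))  # (batch, nVx, nVy, nVz)
indices = tf.random.uniform((396929,), 0, 145*59*82, dtype=tf.int32)
# tf.reshape flattens in row-major (C) order; reversing the spatial axes first
# makes it match NumPy's order='F' flattening of each volume
voxels = tf.transpose(y_pred, [0, 3, 2, 1])
voxels = tf.reshape(voxels, (-1, 145 * 59 * 82))  # (batch, nVx*nVy*nVz)
voxels = tf.gather(voxels, indices, axis=-1)      # (batch, len(indices))
voxels = tf.transpose(voxels)                     # (len(indices), batch), like flatten_crop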
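The detail that matters is that a row-major reshape of the volume with its spatial axes reversed gives the same ordering as a Fortran-order reshape of the original volume. That equivalence can be checked in NumPy alone on a small volume; the sizes below are arbitrary and hypothetical, and np.take plays the role of tf.gather.

```python
import numpy as np

batch, nx, ny, nz = 3, 4, 5, 6
rng = np.random.default_rng(0)
vols = rng.standard_normal((batch, nx, ny, nz))  # batch-first, like y_pred
indices = rng.integers(0, nx * ny * nz, size=7)

# Reference: per-sample Fortran-order flatten, then crop (as in flatten_crop)
ref = np.stack([vols[i].reshape(-1, order='F')[indices] for i in range(batch)], axis=1)

# Batched version: reverse the spatial axes, then a plain C-order reshape
flat = np.transpose(vols, (0, 3, 2, 1)).reshape(batch, -1)  # (batch, nx*ny*nz)
out = np.take(flat, indices, axis=-1).T                     # (len(indices), batch)

print(np.array_equal(ref, out))  # True
```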

Vector dot product along one dimension for multidimensional arrays

I want to compute the sum product along one dimension of two multidimensional arrays, using Theano.
I'll describe precisely what I want to do using numpy first. numpy.tensordot and numpy.dot seem to always do a matrix product, whereas I'm in essence looking for a batched equivalent of a vector product. Given x and y, I want to compute z like so:
import numpy as np
x = np.random.normal(size=(200, 2, 2, 1000))
y = np.random.normal(size=(200, 2, 2))
# this is how I now approach it:
z = np.sum(y[:,:,:,np.newaxis] * x, axis=1)
# z is of shape (200, 2, 1000)
Now I know that numpy.einsum would probably be able to help me here, but again, I want to do this particular computation in Theano, which does not have an einsum equivalent. I will need to use dot, tensordot, or Theano's specialized einsum subset functions batched_dot or batched_tensordot.
The reason I'm looking to change my approach to this is performance; I suspect that using builtin (CUDA) dot products will be faster than relying on broadcasting, element-wise product, and sum.
In Theano, none of the dimensions of three- and four-dimensional tensors (such as T.ftensor3 and T.ftensor4) are broadcastable by default; you have to set them explicitly. After that, the NumPy broadcasting rules work just fine. One way to do this is with T.patternbroadcast. To read more about broadcasting, see Theano's documentation on broadcasting.
One of your tensors has only three dimensions, so you first need to append a singleton dimension at the end and then make that dimension broadcastable. T.shape_padaxis does both in a single call. The entire code is as follows:
import theano
from theano import tensor as T
import numpy as np
X = T.ftensor4('X')
Y = T.ftensor3('Y')
Y_broadcast = T.shape_padaxis(Y, axis=-1)  # append an extra dimension and make it broadcastable
Z = T.sum((X*Y_broadcast), axis=1)  # element-wise multiplication, then sum over axis 1
f = theano.function([X, Y], Z, allow_input_downcast=True)
# Making sure that it works and gives correct results
x = np.random.normal(size=(3, 2, 2, 4))
y = np.random.normal(size=(3, 2, 2))
theano_result = f(x,y)
numpy_result = np.sum(y[:,:,:,np.newaxis] * x, axis=1)
print(np.amax(np.abs(theano_result - numpy_result)))  # a tiny value (around 2.7e-7 on my system), close enough!
I hope this helps.
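For comparison, the same contraction can also be written as an explicit batched vector-matrix product, which is the shape of computation the question hoped to hand to batched_dot: treat the (a, c) axes as the batch and contract over axis b. This is a NumPy sketch with smaller, arbitrary sizes; whether Theano's batched_dot is actually faster for this is not something checked here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2, 2, 100))  # axes (a, b, c, l)
y = rng.normal(size=(20, 2, 2))       # axes (a, b, c)

# Reference from the question: broadcast, multiply, sum over axis 1 (b)
z_ref = np.sum(y[:, :, :, np.newaxis] * x, axis=1)  # (a, c, l)

# Same contraction written with einsum
z_einsum = np.einsum('abc,abcl->acl', y, x)

# Batched product: merge (a, c) into one batch axis and contract over b
a, b, c, l = x.shape
yb = y.transpose(0, 2, 1).reshape(a * c, 1, b)     # (a*c, 1, b)
xb = x.transpose(0, 2, 1, 3).reshape(a * c, b, l)  # (a*c, b, l)
z_bmm = np.matmul(yb, xb).reshape(a, c, l)         # (a*c, 1, l) -> (a, c, l)

print(np.allclose(z_ref, z_einsum), np.allclose(z_ref, z_bmm))
```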

Hessian in theano with respect to matrix input

I'm trying to get a vectorized version of the Theano gradient and Hessian, i.e. I want to compute the gradient and Hessian at several points, given as the rows of a matrix as shown below:
I have a function:
f(x_1,x_2,..,x_n)=exp(x_1^2+x_2^2+...+x_n^2)
and I want to compute its gradient at multiple points with one command. I can do this like so:
import theano
from theano import tensor as T
x = T.matrix('x')
y = T.diag(T.exp(T.dot(x,x.T)))
J = theano.grad(cost = y.sum(), wrt = x)
f = theano.function(inputs = [x], outputs = J)
f([[1,2],[3,4]])
It returns a matrix whose rows are the gradients computed at the points (1,2) and (3,4). I want the same result for the Hessian (in this case it would be a 3-dimensional tensor as opposed to a matrix, but the same idea). The following code:
H = theano.gradient.hessian(cost = y.sum(), wrt = x)
returns an error:
AssertionError: tensor.hessian expects a (list of) 1 dimensional variable as `wrt`
I was able to get the appropriate result with the following code:
J = theano.grad(cost = y.sum(), wrt = x)
H = theano.gradient.jacobian(expression = J.flatten(), wrt = x)
g = theano.function(inputs = [x], outputs = H)
g([[1,2],[3,4]])
but it produces a lot of unnecessary zeros and seems like an inefficient and "ugly" way of getting the desired result. Has anyone had a similar problem, or can you suggest anything?
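Where the zeros come from: the jacobian of the flattened (m, n) gradient with respect to the (m, n) input has shape (m*n, m, n), and entry [i*n + p, k, q] is zero unless k == i, because each point's gradient depends only on its own row. Below is a NumPy sketch, with a hypothetical helper hessian_blocks, that pulls the m Hessians of shape (n, n) out of such an array. It only reshapes the result and does not avoid computing the zeros. The test data uses the closed form H_i = exp(|x_i|^2) (2I + 4 x_i x_i^T), which holds for this particular f.

```python
import numpy as np

def hessian_blocks(jac, m, n):
    """Extract per-point Hessians (m, n, n) from a jacobian of shape (m*n, m, n)."""
    j4 = jac.reshape(m, n, m, n)  # [i, p, k, q] = d grad[i, p] / d x[k, q]
    # keep only k == i: a repeated index within one einsum operand takes the diagonal
    return np.einsum('ipiq->ipq', j4)

def closed_form_hessian(x):
    """Hessians of exp(|x_i|^2) at every row x_i of x, shape (m, n, n)."""
    m, n = x.shape
    s = np.exp((x ** 2).sum(axis=1))[:, None, None]
    return s * (2.0 * np.eye(n) + 4.0 * x[:, :, None] * x[:, None, :])

x = np.array([[0.1, 0.2], [0.3, 0.4]])
m, n = x.shape
H = closed_form_hessian(x)

# Build the zero-padded array that jacobian(J.flatten(), x) would return
jac = np.zeros((m * n, m, n))
for i in range(m):
    jac[i * n:(i + 1) * n, i, :] = H[i]

print(hessian_blocks(jac, m, n).shape)  # (2, 2, 2)
```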