I have implemented a 3D CNN with a custom loss function (Ax' - y)^2 where x' is a flattened and cropped vector of the 3D output from the CNN, y is the ground truth and A is a linear operator that takes an x and outputs a y. So I need a way to flatten the 3D output and crop it using fancy indexing before computing the loss.
Here is what I have tried.
This is the NumPy code I am trying to replicate:
import numpy as np
def flatten_crop(img_vol, indices, vol_shape, N):
    """
    :param img_vol: shape (145, 59, 82, N)
    :param indices: shape (396929,)
    """
    nVx, nVy, nVz = vol_shape
    # Fortran order: the first spatial axis varies fastest in the flattened index
    voxels = np.reshape(img_vol, (nVx * nVy * nVz, N), order='F')
    voxels = voxels[indices, :]
    return voxels
I tried using tf.gather_nd to perform the same action, but I am unable to generalize it to an arbitrary batch size. Here is my TensorFlow code for a batch size of 1 (or a single 3D output):
voxels = tf.transpose(tf.reshape(tf.transpose(y_pred), (1, 145 * 59 * 82))) # to flatten and reshape using Fortran-like index order
voxels = tf.gather_nd(voxels, tf.stack([indices, tf.zeros(len(indices), dtype=tf.dtypes.int32)], axis=1)) # indexing
voxels = tf.reshape(voxels, (voxels.shape[0], 1))
Currently I have this piece of code in my custom loss function and I would like to be able to generalize to an arbitrary batch size. Also if you have an alternate suggestion to implement this (such as a custom layer instead of integrating with the loss function), I am all ears!
Thank you.
Try this code:
import tensorflow as tf
y_pred = tf.random.uniform((10, 145, 59, 82))
indices = tf.random.uniform((396929,), 0, 145*59*82, dtype=tf.int32)
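# reverse the spatial axes first so that the C-order reshape matches Fortran-like (order='F') indexing
voxels = tf.reshape(tf.transpose(y_pred, [0, 3, 2, 1]), (-1, 145 * 59 * 82))
voxels = tf.gather(voxels, indices, axis=-1)  # crop: shape (batch, 396929)
voxels = tf.transpose(voxels)  # shape (396929, batch), like the NumPy version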
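The Fortran-order point can be checked in plain NumPy. The following is a minimal sketch, assuming a batch-first layout (batch, nVx, nVy, nVz) and small made-up dimensions; flatten_crop_batch is a hypothetical helper name, not something from the question. The same axis permutation should carry over to tf.transpose, tf.reshape and tf.gather.

```python
import numpy as np

def flatten_crop_batch(vol_batch, indices):
    """vol_batch: (batch, nVx, nVy, nVz); returns (len(indices), batch)."""
    batch = vol_batch.shape[0]
    # Reverse the spatial axes so that a C-order reshape walks x fastest,
    # which is what order='F' does on a single volume.
    flat = vol_batch.transpose(0, 3, 2, 1).reshape(batch, -1)
    return flat[:, indices].T

rng = np.random.default_rng(0)
vol_batch = rng.random((3, 4, 5, 6))           # small stand-in for (batch, 145, 59, 82)
indices = rng.integers(0, 4 * 5 * 6, size=50)  # stand-in for the 396929 crop indices

# Reference: Fortran-order flatten of each sample, one sample at a time.
reference = np.stack(
    [v.reshape(-1, order='F')[indices] for v in vol_batch], axis=1
)
result = flatten_crop_batch(vol_batch, indices)
print(result.shape, np.allclose(result, reference))
```

A plain reshape without the transpose would flatten in C order and select different voxels for the same indices.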
TL;DR: given two tensors t1 and t2 that each represent b samples of a tensor with shape (c, h, w) (i.e., each tensor has shape (b, c, h, w)), I'm trying to efficiently calculate the pairwise distance between t1[i] and t2[j] for all i, j.
Some more context: I've extracted ResNet18 activations for both my train and test data (CIFAR10) and I'm trying to implement k-nearest-neighbours. A possible pseudo-code might be:
for te in test_activations:
    distances = []
    for tr in train_activations:
        distances.append(||te-tr||)
    neighbors = k_smallest_elements(distances)
    prediction(te) = majority_vote(labels(neighbors))
I'm trying to vectorise this process given batches from the test and train activations datasets. I've tried iterating the batches (and not the samples) and using torch.cdist(train_batch,test_batch), but I'm not quite sure how this function handles multi-dimensional tensors, as in the documentation it states
torch.cdist(x1, x2,...):
If x1 has shape BxPxM and x2 has shape BxRxM then the output will have shape BxPxR
Which doesn't seem to handle my case (see below)
A minimal example can be found here:
b,c,h,w = 1000,128,28,28 # actual dimensions in my problem
train_batch = torch.randn(b,c,h,w)
test_batch = torch.randn(b,c,h,w)
d = torch.cdist(train_batch,test_batch)
You can think of test_batch and train_batch as the tensors in the nested loop for test_batch in test: for train_batch in train: ...
EDIT: I'm adding another example:
Both t1[i] and t2[j] are tensors shaped (c, h, w), and the distance between them is a scalar d. So, for example, if we have
t1 = torch.randn(2,128,28,28)
t2 = torch.randn(2,128,28,28)
the distance matrix would look something like
[[d(t1[0],t2[0]), d(t1[0],t2[1])],
[d(t1[1],t2[0]), d(t1[1],t2[1])]]
and have a shape (2,2) (or (b,b) more generally)
where d is the scalar distance between the two tensors t1[i] and t2[j].
It is common to have to reshape your data before feeding it to a builtin PyTorch operator. As you've said torch.cdist works with two inputs shaped (B, P, M) and (B, R, M) and returns a tensor shaped (B, P, R).
Instead, you have two tensors shaped the same way: (b, c, h, w). If we match those dimensions we have: B=b, M=c, while P=h*w (from the 1st tensor) and R=h*w (from the 2nd tensor). This requires flattening the spatial dimensions together and swapping the last two axes. Something like:
>>> x1 = train_batch.flatten(2).transpose(1,2)
>>> x2 = test_batch.flatten(2).transpose(1,2)
>>> d = torch.cdist(x1, x2)
Now d contains distance between all possible pairs (train_batch[b, :, iy, ix], test_batch[b, :, jy, jx]) and is shaped (b, h*w, h*w).
You can then apply k-NN by taking the k smallest distances, for example with torch.topk(d, k, largest=False) (or argmin for k = 1), to retrieve the k closest neighbours of one element of the training batch in the test batch.
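If one scalar distance per pair of samples is wanted instead, as in the question's EDIT, each sample can be flattened to a single vector first. The following is a minimal NumPy sketch of that reading, with small made-up shapes; pairwise_sample_distances is a hypothetical helper name. With PyTorch, flattening each sample with flatten(1) before calling torch.cdist should play the same role, though that is not verified here.

```python
import numpy as np

def pairwise_sample_distances(t1, t2):
    """Euclidean distance between every t1[i] and t2[j], each sample treated as one flat vector."""
    a = t1.reshape(t1.shape[0], -1)  # (b1, c*h*w)
    b = t2.reshape(t2.shape[0], -1)  # (b2, c*h*w)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated for all pairs at once
    sq = (a * a).sum(axis=1)[:, None] + (b * b).sum(axis=1)[None, :] - 2.0 * (a @ b.T)
    return np.sqrt(np.maximum(sq, 0.0))  # clamp tiny negatives caused by rounding

rng = np.random.default_rng(0)
t1 = rng.standard_normal((2, 8, 4, 4))
t2 = rng.standard_normal((3, 8, 4, 4))

d = pairwise_sample_distances(t1, t2)   # shape (2, 3)
k = 2
nearest = np.argsort(d, axis=1)[:, :k]  # indices of the k smallest distances in each row
print(d.shape, nearest.shape)
```

Row i of nearest then holds the indices of the k samples in t2 closest to t1[i], which is the input a majority vote over labels would need.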
I am currently working on the mnist dataset to create a CNN.
My input is
X: Array of shape (batch_size, n_channels, image_height, image_width)
F: The filter to apply. Array of shape (n_channels, filter_height, filter_width)
I am able to compute the element-wise multiplication on a single filter as below:
index: tuple pointing to the top-left corner of where the kernel is to be placed
f_shape = np.shape(F)
np.multiply(X[:, :, index[0]:index[0] + f_shape[1], index[1]:index[1] + f_shape[2]], F)
But now, I want to compute the element-wise multiplication over multiple filters.
So my input will be:
X: Array of shape (batch_size, n_channels, image_height, image_width)
F: The filter to apply. Array of shape (n_filters, n_channels, filter_height, filter_width)
I am not able to figure out an efficient numpy operation using broadcasting to solve this.
You want skimage.util.view_as_windows for both cases. In addition, np.multiply does not do dot products; np.dot does, or in this case (when tracking many dimensions) np.einsum. Note that the window shape needs one entry per axis of X, hence the leading 1 for the batch axis.
from skimage.util import view_as_windows
# after squeeze (assuming batch_size > 1): (batch, out_h, out_w, n_channels, filter_height, filter_width)
x_window = view_as_windows(X, (1, n_channels, filter_height, filter_width)).squeeze()
single_filter_mult = np.einsum('ijkmnp, mnp -> ijkmnp', x_window, F)
single_filter_dot = np.einsum('ijkmnp, mpq -> ijkmnq', x_window, F)
multi_filter_mult = np.einsum('ijkmnp, lmnp -> ijklmnp', x_window, F_multi)
multi_filter_dot = np.einsum('ijkmnp, lmpq -> ijklmnq', x_window, F_multi)
Now *_filter_*[:, index[0], index[1]] gives the expected output for the window whose top-left corner is at index.
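For readers without scikit-image, a similar windowed view can be built with NumPy's own sliding_window_view (NumPy 1.20 or later). This is a sketch under assumed small shapes, not the answer's exact code; the axis letters in the einsum string are chosen for this example only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

rng = np.random.default_rng(0)
X = rng.random((2, 3, 6, 6))        # (batch_size, n_channels, image_height, image_width)
F_multi = rng.random((4, 3, 2, 2))  # (n_filters, n_channels, filter_height, filter_width)
fh, fw = F_multi.shape[2:]

# windows: (batch, n_channels, out_h, out_w, fh, fw)
windows = sliding_window_view(X, (fh, fw), axis=(2, 3))

# Element-wise products for every filter at every position:
# (batch, n_filters, out_h, out_w, n_channels, fh, fw)
products = np.einsum('bcijhw,fchw->bfijchw', windows, F_multi)

# Summing over the channel and kernel axes gives the usual cross-correlation output.
conv_out = products.sum(axis=(4, 5, 6))  # (batch, n_filters, out_h, out_w)

# The same products for one position, using plain broadcasting as in the question.
index = (1, 3)
patch = X[:, None, :, index[0]:index[0] + fh, index[1]:index[1] + fw]  # (batch, 1, c, fh, fw)
single = patch * F_multi[None]                                         # (batch, n_filters, c, fh, fw)
print(np.allclose(products[:, :, index[0], index[1]], single))
```

The broadcasting form at the end is enough when only one position is needed; the windowed view pays off when all positions are computed at once.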
I want to calculate the categorical crossentropy of two numpy arrays. Both arrays have the same length.
y_true contains around 10000 2D arrays, which are the labels
y_pred contains 10000 2D arrays, which are my predictions
The result should be a 1D numpy array which contains all the categorical crossentropy values for the arrays. The formula is: $-\sum_i x_{\text{true},i} \log(x_{\text{pred},i})$
Here x_true,i is the i-th element of one true vector and x_pred,i is the i-th element of the corresponding prediction vector.
My implementation looks like this, but it is very slow. The reshaping is done to convert the 2D arrays to 1D arrays to simply iterate over them.
import math
import numpy as np
def categorical_cross_entropy(y_true, y_pred):
    losses = np.zeros(len(y_true))
    for i in range(len(y_true)):
        # flatten each 2D array to 1D
        single_sequence = y_true[i].reshape(y_true.shape[1]*y_true.shape[2])
        single_pred = y_pred[i].reshape(y_pred.shape[1]*y_pred.shape[2])
        total = 0  # renamed from sum to avoid shadowing the builtin
        for j in range(len(single_sequence)):
            log_pred = math.log(single_pred[j])
            total = total + single_sequence[j] * log_pred
        total = total * (-1)
        losses[i] = total
    return losses
A conversion to tensors is not possible, since tf.constant(y_pred) fails with a MemoryError, because every 2D array in y_true and y_pred has roughly the dimensions 190 x 190. So any ideas?
You can use scipy.special.xlogy. For example,
In [10]: import numpy as np
In [11]: from scipy.special import xlogy
Create some data:
In [12]: y_true = np.random.randint(1, 10, size=(8, 200, 200))
In [13]: y_pred = np.random.randint(1, 10, size=(8, 200, 200))
Compute the result using xlogy:
In [14]: -xlogy(y_true, y_pred).sum(axis=(1, 2))
Out[14]:
array([-283574.67634307, -283388.18672431, -284720.65206688,
-285517.06983709, -286383.26148469, -282200.33634505,
-285781.78641736, -285862.91148953])
Verify the result by computing it with your function:
In [15]: categorical_cross_entropy(y_true, y_pred)
Out[15]:
array([-283574.67634309, -283388.18672432, -284720.65206689,
-285517.0698371 , -286383.2614847 , -282200.33634506,
-285781.78641737, -285862.91148954])
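If you don't want the dependence on scipy, you can do the same thing with np.log, but np.log emits a divide-by-zero warning if any value in y_pred is 0, and an entry where y_true is also 0 becomes nan (0 * -inf) rather than the 0 that xlogy returns: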
In [20]: -(y_true*np.log(y_pred)).sum(axis=(1, 2))
Out[20]:
array([-283574.67634307, -283388.18672431, -284720.65206688,
-285517.06983709, -286383.26148469, -282200.33634505,
-285781.78641736, -285862.91148953])
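As a self-contained check of the vectorized form, the sketch below compares it against an explicit loop on small random data. The chunked variant is an assumption added for the memory concern raised in the question; cce_chunked and its chunk parameter are hypothetical names, not part of the answer.

```python
import math
import numpy as np

def cce_loop(y_true, y_pred):
    """Reference implementation: one Python loop per element."""
    losses = np.zeros(len(y_true))
    for i in range(len(y_true)):
        total = 0.0
        for t, p in zip(y_true[i].ravel(), y_pred[i].ravel()):
            total += t * math.log(p)
        losses[i] = -total
    return losses

def cce_vectorized(y_true, y_pred):
    """Sum over the two trailing axes, leaving one loss per 2D array."""
    return -(y_true * np.log(y_pred)).sum(axis=(1, 2))

def cce_chunked(y_true, y_pred, chunk=1024):
    """Same result, but only `chunk` arrays are expanded into temporaries at a time."""
    parts = [
        cce_vectorized(y_true[s:s + chunk], y_pred[s:s + chunk])
        for s in range(0, len(y_true), chunk)
    ]
    return np.concatenate(parts)

rng = np.random.default_rng(0)
y_true = rng.random((6, 5, 5))
y_pred = rng.uniform(0.05, 1.0, size=(6, 5, 5))  # strictly positive, so log is finite

print(np.allclose(cce_loop(y_true, y_pred), cce_vectorized(y_true, y_pred)))
print(np.allclose(cce_vectorized(y_true, y_pred), cce_chunked(y_true, y_pred, chunk=4)))
```

Chunking bounds the size of the temporary product array, which may help when the full data set barely fits in memory.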
I have a numpy 2D array with values that range from 0 to 59.
For those who are familiar with DL and specifically image segmentation: I create the array (call it L) from a .png image, and the value of each pixel L[x,y] is the class that this pixel belongs to (out of the 60 classes).
I want to create a 1-hot tensor - Lhot, in which (Lhot[x,y,z] == 1) only if (L[x,y] == z), and 0 otherwise.
I want to create it with some kind of broadcasting/indexing (1 or 2 lines), without loops.
It should be functionally equal to this piece of code (Dtype corresponds to L):
Lhot = np.zeros((L.shape[0], L.shape[1], 60), dtype=Dtype)
for i in range(L.shape[0]):
    for j in range(L.shape[1]):
        Lhot[i,j,L[i,j]] = 1
Does anyone have an idea?
Thanks!
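A much faster and cleaner way using pure numpy. Indexing the identity matrix with L already yields shape (L.shape[0], L.shape[1], 60), so no transpose is needed:
Lhot = np.eye(60, dtype=Dtype)[L]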
The problem you'll run into with multidimensional one-hots is that they get really big and really sparse, and there's no good way to handle sparse arrays with more than 2 dimensions in numpy/scipy (or sklearn or many other ML packages either, I think). Do you really need an n-d one-hot?
Since typical one-hot encoding is defined for 1D vectors, all you have to do is flatten your matrix, use one hot encoder from scikit-learn (or any other library with one-hot encoding) and reshape back.
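import numpy as np
from sklearn.preprocessing import OneHotEncoder
n, m = L.shape
k = 60
# older scikit-learn releases took n_values=k here; newer ones use categories instead
Lhot = OneHotEncoder(categories=[np.arange(k)]).fit_transform(L.reshape(-1, 1)).toarray().reshape(n, m, k)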
Of course, you can do it by hand too:
n, m = L.shape
k = 60
Lhot = np.zeros((n*m, k)) # zero-filled, flat array
Lhot[np.arange(n*m), L.flatten()] = 1 # one-hot encoding for 1D
Lhot = Lhot.reshape(n, m, k) # reshaping back to 3D tensor
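The loop-free variants can be compared against the loop from the question on a small random label map. This is a sketch with an assumed uint8 dtype and made-up image size; the broadcast-comparison variant is an extra alternative, not one of the answers above.

```python
import numpy as np

n_classes = 60
rng = np.random.default_rng(0)
L = rng.integers(0, n_classes, size=(7, 9))  # stand-in for the label image

# Reference: the explicit double loop from the question.
ref = np.zeros(L.shape + (n_classes,), dtype=np.uint8)
for i in range(L.shape[0]):
    for j in range(L.shape[1]):
        ref[i, j, L[i, j]] = 1

# Row L[x, y] of the identity matrix is the one-hot vector for that pixel.
via_eye = np.eye(n_classes, dtype=np.uint8)[L]

# Flat fancy indexing, then reshape back to 3D.
n, m = L.shape
via_flat = np.zeros((n * m, n_classes), dtype=np.uint8)
via_flat[np.arange(n * m), L.ravel()] = 1
via_flat = via_flat.reshape(n, m, n_classes)

# Broadcast comparison against the class ids.
via_compare = (L[..., None] == np.arange(n_classes)).astype(np.uint8)

print(via_eye.shape, np.array_equal(via_eye, ref))
```

All three results have shape (7, 9, 60), with the class axis last as the question requires.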
I have two tensors.
A tensor of shape (1,N)
A tensor of shape (N,T)
What I want to calculate is the following scalar:
tf.reduce_sum seemed helpful, but I couldn't get my head around combining the two tensors and the reduce functions to get what I want. Can someone help me write the above equation in tensorflow?
Does this work?
import tensorflow as tf
import numpy as np
N = 10
T = 20
l = tf.constant(np.random.randn(1, N), dtype=tf.float32)
z = tf.constant(np.random.randn(N, T), dtype=tf.float32)
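# TF 1.x graph-mode API; TF 2.x runs eagerly, so the Session and sess.run are not needed there
with tf.Session() as sess:
    # swap axes so that l has shape (N, 1) and broadcasts against z of shape (N, T)
    l = tf.transpose(l, [1, 0])
    z_div_l = tf.divide(z, l)
    z_div_l_2 = tf.divide(1.0 - z, 1.0 - l)
    # sum over the N axis only, leaving one value per t
    result = tf.reduce_sum(tf.add(z_div_l, z_div_l_2), axis=0)
    eval_result = sess.run(result)
    print('{}\n{}'.format(eval_result.shape, eval_result))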
This calculates the sum over n of z[n, t] / l[n] + (1 - z[n, t]) / (1 - l[n]) for every t from 0 to T-1, so the result is not a scalar but a vector of shape (T,). Your question mentions you want to compute just one scalar, but the sum is only over N and not over T, so I assumed you just want this expression to be evaluated for every t.
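The broadcasting step can be seen in isolation with NumPy. The sketch below mirrors the computation in the answer's code, with values of l drawn away from 0 and 1 (an assumption made here to keep the divisions well behaved), and shows both the per-t vector and the single scalar obtained by also summing over T.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 10, 20
l = rng.uniform(0.1, 0.9, size=(1, N))  # shape (1, N)
z = rng.uniform(0.0, 1.0, size=(N, T))  # shape (N, T)

l_col = l.T                                    # (N, 1): broadcasts across the T columns of z
terms = z / l_col + (1.0 - z) / (1.0 - l_col)  # (N, T)

per_t = terms.sum(axis=0)  # one value per t, shape (T,)
scalar = terms.sum()       # a single scalar: sum over both n and t

# Spot-check one column against an explicit loop over n.
t = 3
expected = sum(z[n, t] / l[0, n] + (1.0 - z[n, t]) / (1.0 - l[0, n]) for n in range(N))
print(per_t.shape, np.isclose(per_t[t], expected))
```

In TensorFlow, calling tf.reduce_sum without an axis argument reduces over all dimensions and would give the scalar version.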