Given
batch_images: 4D tensor of shape (B, H, W, C)
x: 3D tensor of shape (B, H, W)
y: 3D tensor of shape (B, H, W)
Goal
How can I index into batch_images using the x and y coordinates to obtain a 4D tensor of shape (B, H, W, C)? That is, for each batch and for each pair (x, y), I want to obtain a tensor of shape (C,).
In NumPy this would be achieved with, for example, input_img[np.arange(B)[:, None, None], y, x], but I can't seem to make it work in TensorFlow.
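For reference, a minimal NumPy check of that indexing (with made-up sizes), just to pin down the expected behaviour:
import numpy as np

B, H, W, C = 2, 3, 4, 5
input_img = np.random.rand(B, H, W, C)
x = np.random.randint(0, W, size=(B, H, W))
y = np.random.randint(0, H, size=(B, H, W))

# The batch index of shape (B, 1, 1) broadcasts against y and x,
# so every (y, x) pair is looked up within its own batch element.
out = input_img[np.arange(B)[:, None, None], y, x]
print(out.shape)  # (2, 3, 4, 5), i.e. (B, H, W, C)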
My attempt so far
def get_pixel_value(img, x, y):
    """
    Utility function to get pixel value for
    coordinate vectors x and y from a 4D tensor image.
    """
    H = tf.shape(img)[1]
    W = tf.shape(img)[2]
    C = tf.shape(img)[3]
    # flatten image
    img_flat = tf.reshape(img, [-1, C])
    # flatten idx
    idx_flat = (x*W) + y
    return tf.gather(img_flat, idx_flat)
which is returning an incorrect result.
It should be possible to do it by flattening the tensor as you've done, but the batch dimension has to be taken into account in the index calculation.
In order to do this, you'll have to make an additional dummy batch index tensor with the same shape as x and y that always contains the index of the current batch.
This is basically the np.arange(B) from your numpy example, which is missing from your TensorFlow code.
You can also simplify things a bit by using tf.gather_nd, which does the index calculations for you.
Here's an example:
import numpy as np
import tensorflow as tf
# Example tensors
M = np.random.uniform(size=(3, 4, 5, 6))
x = np.random.randint(0, 5, size=(3, 4, 5))
y = np.random.randint(0, 4, size=(3, 4, 5))
def get_pixel_value(img, x, y):
    """
    Utility function that composes a new image, with pixels taken
    from the coordinates given in x and y.
    The shapes of x and y have to match.
    The batch order is preserved.
    """
    # We assume that x and y have the same shape.
    shape = tf.shape(x)
    batch_size = shape[0]
    height = shape[1]
    width = shape[2]
    # Create a tensor that indexes into the same batch.
    # This is needed for gather_nd to work.
    batch_idx = tf.range(0, batch_size)
    batch_idx = tf.reshape(batch_idx, (batch_size, 1, 1))
    b = tf.tile(batch_idx, (1, height, width))
    # Note: tf.pack was renamed to tf.stack in TensorFlow 1.0.
    indices = tf.stack([b, y, x], 3)
    return tf.gather_nd(img, indices)
s = tf.Session()
print(s.run(get_pixel_value(M, x, y)).shape)
# Should print (3, 4, 5, 6).
# We've composed a new image of the same size from randomly picked x and y
# coordinates of each original image.
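As a side note, on newer TensorFlow (2.x) you can skip the dummy batch index entirely, since tf.gather_nd accepts a batch_dims argument. A minimal sketch of that variant:
import tensorflow as tf

img = tf.random.uniform((3, 4, 5, 6))
x = tf.random.uniform((3, 4, 5), maxval=5, dtype=tf.int32)
y = tf.random.uniform((3, 4, 5), maxval=4, dtype=tf.int32)

# With batch_dims=1 the leading batch axis is matched implicitly,
# so the indices only need to carry the (y, x) pairs.
out = tf.gather_nd(img, tf.stack([y, x], axis=-1), batch_dims=1)
print(out.shape)  # (3, 4, 5, 6)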
Related
Hi, I'm new to PyTorch and torch tensors. I'm reading the yolo_v3 code and encountered this question. I think it relates to tensor indexing with ..., but it's difficult to search for ... on Google, so I figured I'd ask it here. The code is:
prediction = (
    x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size)
    .permute(0, 1, 3, 4, 2)
    .contiguous()
)
print(prediction.shape)
# Get outputs
x = torch.sigmoid(prediction[..., 0]) # Center x
y = torch.sigmoid(prediction[..., 1]) # Center y
w = prediction[..., 2] # Width
h = prediction[..., 3] # Height
pred_conf = torch.sigmoid(prediction[..., 4]) # Conf
pred_cls = torch.sigmoid(prediction[..., 5:]) # Cls pred.
My understanding is that prediction will be a tensor of shape [batch, anchor, x_grid, y_grid, class]. But what does prediction[..., x] do (x = 0, 1, 2, 3, 4, 5)? Is it similar to numpy indexing like [:, x]? If so, the calculations of x, y, w, h, pred_conf and pred_cls don't make sense.
It's called Ellipsis. It indicates the unspecified dimensions of an ndarray or tensor.
Here, if prediction has shape [batch, anchor, x_grid, y_grid, class], then
prediction[..., 0] # is equivalent to prediction[:,:,:,:,0]
prediction[..., 1] # is equivalent to prediction[:,:,:,:,1]
More
prediction[0, ..., 0] # equivalent to prediction[0,:,:,:,0]
You can also write ... as Ellipsis
prediction[Ellipsis, 0]
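A quick, self-contained way to verify the equivalence (shapes are made up):
import torch

prediction = torch.rand(2, 3, 4, 4, 85)

# ... expands to as many full slices (:) as needed to index every dimension.
print(torch.equal(prediction[..., 0], prediction[:, :, :, :, 0]))  # True
print(prediction[..., 5:].shape)  # torch.Size([2, 3, 4, 4, 80])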
First things first: I'm relatively new to TensorFlow.
I'm trying to implement a custom layer in tensorflow.keras and I'm having a relatively hard time when I try to achieve the following:
I've got 3 Tensors (x,y,z) of shape (?,49,3,3,32) [where ? is the batch size]
On each Tensor I compute the sum over the 3rd and 4th axes [thus I end up with 3 Tensors of shape (?,49,32)]
By doing an argmax (A) on the above 3 Tensors of shape (?,49,32), I get a single (?,49,32) Tensor
Now I want to use this tensor to select slices from the initial x,y,z Tensors in the following form:
Each element in the last dimension of A corresponds to the selected Tensor
(i.e.: 0 = x, 1 = y, 2 = z).
The index into the last dimension of A corresponds to the slice that I would like to extract from that Tensor's last dimension.
I've tried to achieve the above using tf.gather but I had no luck. Then I tried using a series of tf.map_fn, which is ugly and computationally costly.
To simplify the above:
let's say we've got an A array of shape (3,3,3,32). Then the numpy equivalent of what I'm trying to achieve is this:
import numpy as np
x = np.random.rand(3, 3, 32)
y = np.random.rand(3, 3, 32)
z = np.random.rand(3, 3, 32)
x_sums = np.sum(np.sum(x, axis=0), 0)
y_sums = np.sum(np.sum(y, axis=0), 0)
z_sums = np.sum(np.sum(z, axis=0), 0)
max_sums = np.argmax([x_sums, y_sums, z_sums], 0)
A = np.array([x, y, z])
tmp = []
for i in range(0, len(max_sums)):
    tmp.append(A[max_sums[i], :, :, i])
output = np.transpose(np.stack(tmp))
Any suggestions?
PS: I tried tf.gather_nd as well, but had no luck.
This is how you can do something like that with tf.gather_nd:
import tensorflow as tf
# Make example data
tf.random.set_seed(0)
b = 10 # Batch size
x = tf.random.uniform((b, 49, 3, 3, 32))
y = tf.random.uniform((b, 49, 3, 3, 32))
z = tf.random.uniform((b, 49, 3, 3, 32))
# Stack tensors together
data = tf.stack([x, y, z], axis=2)
# Put reduction axes last
data_t = tf.transpose(data, (0, 1, 5, 2, 3, 4))
# Reduce
s = tf.reduce_sum(data_t, axis=(4, 5))
# Find largest sums
idx = tf.argmax(s, 3)
# Make gather indices
data_shape = tf.shape(data_t, out_type=idx.dtype)
bb, ii, jj = tf.meshgrid(*(tf.range(data_shape[i]) for i in range(3)), indexing='ij')
# Gather result
output_t = tf.gather_nd(data_t, tf.stack([bb, ii, jj, idx], axis=-1))
# Reorder axes
output = tf.transpose(output_t, (0, 1, 3, 4, 2))
print(output.shape)
# TensorShape([10, 49, 3, 3, 32])
I want to filter a tensor by keeping the largest 10% of its entries. Is there a TensorFlow function to do that? What would a possible implementation look like? I am looking for something that can handle tensors of shape [N,W,H,C] and [N,W*H*C].
By filter I mean that the shape of the tensor remains the same but only the largest 10% are kept. Thus all entries become zero except the 10% largest.
Is that possible?
The correct way of doing this would be to compute the 90th percentile, for example with tf.contrib.distributions.percentile:
import tensorflow as tf
images = ... # [N, W, H, C]
n = tf.shape(images)[0]
images_flat = tf.reshape(images, [n, -1])
p = tf.contrib.distributions.percentile(images_flat, 90, axis=1, interpolation='higher')
images_top10 = tf.where(images >= tf.reshape(p, [n, 1, 1, 1]),
                        images, tf.zeros_like(images))
If you want to be ready for TensorFlow 2.x, where tf.contrib will be removed, you can instead use TensorFlow Probability, which is where the percentile function lives going forward (as tfp.stats.percentile).
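For example, the same filtering with TensorFlow Probability might look like this (a sketch, assuming the tensorflow_probability package is installed):
import tensorflow as tf
import tensorflow_probability as tfp

images = tf.random.uniform((2, 8, 8, 3))  # stand-in for [N, W, H, C]
n = tf.shape(images)[0]
images_flat = tf.reshape(images, [n, -1])
# tfp.stats.percentile mirrors tf.contrib.distributions.percentile.
p = tfp.stats.percentile(images_flat, 90., axis=1, interpolation='higher')
images_top10 = tf.where(images >= tf.reshape(p, [n, 1, 1, 1]),
                        images, tf.zeros_like(images))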
EDIT: If you want to do the filtering per channel, you can modify the code slightly like this:
import tensorflow as tf
images = ... # [N, W, H, C]
shape = tf.shape(images)
n, c = shape[0], shape[3]
images_flat = tf.reshape(images, [n, -1, c])
p = tf.contrib.distributions.percentile(images_flat, 90, axis=1, interpolation='higher')
images_top10 = tf.where(images >= tf.reshape(p, [n, 1, 1, c]),
                        images, tf.zeros_like(images))
I've not found any built-in method yet. Try this workaround:
import numpy as np
import tensorflow as tf
def filter(tensor, ratio):
    num_entries = tf.reduce_prod(tensor.shape)
    num_to_keep = tf.cast(tf.multiply(ratio, tf.cast(num_entries, tf.float32)), tf.int32)
    # Calculate threshold
    x = tf.contrib.framework.sort(tf.reshape(tensor, [num_entries]))
    threshold = x[-num_to_keep]
    # Filter the tensor
    mask = tf.cast(tf.greater_equal(tensor, threshold), tf.float32)
    return tf.multiply(tensor, mask)
tensor = tf.constant(np.arange(40).reshape(2, 4, 5), dtype=tf.float32)
filtered_tensor = filter(tensor, 0.1)
# Print result
tf.InteractiveSession()
print(tensor.eval())
print(filtered_tensor.eval())
I've got a NP array called X_train with the following properties:
X_train.shape = (139,)
X_train[0].shape = (210, 224, 3)
X_train[1].shape = (220, 180, 3)
In other words, there are 139 observations. Each image has a different width and height, but they all have 3 channels. So the dimension should be (139, None, None, 3) where None = variable.
Since you don't include the dimension for the number of observations in the layer, for the Conv2D layer I used input_shape=(None,None,3). But that gives me the error:
expected conv2d_1_input to have 4 dimensions, but got array with shape (139, 1)
My guess is that the problem is that the input shape is (139,) instead of (139, None, None, 3). I'm not sure how to convert to that however.
One possible solution to your problem is to pad the arrays with zeros so that they all have the same size. Afterwards, your input shape will be (139, max_x_dimension, max_y_dimension, 3).
The following functions will do the job:
import numpy as np
def fillwithzeros(inputarray, outputshape):
    """
    Fills input array with dtype 'object' so that all arrays have the same shape as 'outputshape'
    inputarray: input numpy array
    outputshape: max dimensions in inputarray (obtained with the function 'findmaxshape')
    output: inputarray filled with zeros
    """
    length = len(inputarray)
    output = np.zeros((length,) + outputshape, dtype=np.uint8)
    for i in range(length):
        output[i][:inputarray[i].shape[0], :inputarray[i].shape[1], :] = inputarray[i]
    return output

def findmaxshape(inputarray):
    """
    Finds maximum x and y in an inputarray with dtype 'object' and 3 dimensions
    inputarray: input numpy array
    output: detected maximum shape
    """
    max_x, max_y, max_z = 0, 0, 0
    for array in inputarray:
        x, y, z = array.shape
        if x > max_x:
            max_x = x
        if y > max_y:
            max_y = y
        if z > max_z:
            max_z = z
    return (max_x, max_y, max_z)
#Create random data similar to your data
random_data1 = np.random.randint(0,255, 210*224*3).reshape((210, 224, 3))
random_data2 = np.random.randint(0,255, 220*180*3).reshape((220, 180, 3))
X_train = np.array([random_data1, random_data2], dtype=object)  # dtype=object, since the shapes differ
#Convert X_train so that all images have the same shape
new_shape = findmaxshape(X_train)
new_X_train = fillwithzeros(X_train, new_shape)
In short: I am looking for a simple NumPy (maybe one-liner) implementation of Maxpool: the maximum over a window of a numpy.ndarray, for every location of the window across dimensions.
In more detail: I am implementing a convolutional neural network ("CNN"); one of the typical layers in such a network is a MaxPool layer (look for example here). Writing y = MaxPool(x, S), where x is an input ndarray and S is a parameter, the output of MaxPool is given in pseudocode by:
y[b,h,w,c] = max(x[b, h + i, w + j, c]) over i = 0, ..., S-1; j = 0, ..., S-1.
That is, y is an ndarray where the value at indices (b, h, w, c) equals the maximum taken over the window of size S x S along the second and third dimensions of the input x, with the window's corner placed at indices (b, h, w, c).
Some additional details: the network is implemented using NumPy. A CNN has many "layers", where the output of one layer is the input to the next layer. The inputs to the layers are numpy.ndarrays, called "tensors". In my case the tensors are 4-dimensional numpy.ndarrays x, i.e. x.shape is a tuple (B, H, W, C). The sizes of the dimensions change after a tensor is processed by a layer; for example, the input to layer i = 4 can have size B = 10, H = 24, W = 24, C = 3, while the output, aka the input to layer i+1, has B = 10, H = 12, W = 12, C = 5. As indicated in the comments, the size after application of MaxPool is (B, H - S + 1, W - S + 1, C).
For concreteness: if I use
import numpy as np
y = np.amax(x, axis = (1,2))
where x.shape is, say, (2,3,3,4), this gives me what I want, but only for the degenerate case where the window I am maximizing over is of size 3 x 3 (the full second and third dimensions of x), which is not exactly what I want.
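To make the target behaviour concrete, here is a naive, loop-based reference of the stride-1 pooling described above (slow, only meant to pin down the spec); any vectorized solution should reproduce it:
import numpy as np

def maxpool_reference(x, S):
    # Stride-1 max pooling with an S x S window over the H and W axes;
    # output shape is (B, H - S + 1, W - S + 1, C).
    B, H, W, C = x.shape
    y = np.empty((B, H - S + 1, W - S + 1, C), dtype=x.dtype)
    for h in range(H - S + 1):
        for w in range(W - S + 1):
            # Maximum over the window anchored at (h, w), per batch and channel.
            y[:, h, w, :] = x[:, h:h + S, w:w + S, :].max(axis=(1, 2))
    return y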
Here's a solution using np.lib.stride_tricks.as_strided to create sliding windows, resulting in a 6D array of shape (B, H-S+1, W-S+1, S, S, C), and then simply taking the max along the fourth and fifth axes, resulting in an output array of shape (B, H-S+1, W-S+1, C). The intermediate 6D array is a view into the input array and as such won't occupy any more memory. The subsequent max, being a reduction, efficiently utilizes the sliding views.
Thus, an implementation would be -
import numpy as np

# Based on http://stackoverflow.com/a/41850409/3293881
def patchify(img, patch_shape):
    a, X, Y, b = img.shape
    x, y = patch_shape
    shape = (a, X - x + 1, Y - y + 1, x, y, b)
    a_str, X_str, Y_str, b_str = img.strides
    strides = (a_str, X_str, Y_str, X_str, Y_str, b_str)
    return np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides)

out = patchify(x, (S, S)).max(axis=(3, 4))
Sample run -
In [224]: x = np.random.randint(0,9,(10,24,24,3))
In [225]: S = 5
In [226]: np.may_share_memory(patchify(x, (S,S)), x)
Out[226]: True
In [227]: patchify(x, (S,S)).shape
Out[227]: (10, 20, 20, 5, 5, 3)
In [228]: patchify(x, (S,S)).max(axis=(3,4)).shape
Out[228]: (10, 20, 20, 3)
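On NumPy 1.20 and later, np.lib.stride_tricks.sliding_window_view gives the same zero-copy windows with less room for stride mistakes; a rough equivalent:
import numpy as np

x = np.random.randint(0, 9, (10, 24, 24, 3))
S = 5

# Windows over the H and W axes; result shape is (10, 20, 20, 3, 5, 5).
windows = np.lib.stride_tricks.sliding_window_view(x, (S, S), axis=(1, 2))
out = windows.max(axis=(4, 5))
print(out.shape)  # (10, 20, 20, 3)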