Hi I'm new to Pytorch and torch tensors. I'm reading yolo_v3 code and encounter this question. I think it relates to tensor indexing with ..., but it's difficult to search ... by google, so I figure to ask it here. The code is:
prediction = (
x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size)
.permute(0, 1, 3, 4, 2)
.contiguous()
)
print (prediction.shape)
# Get outputs
x = torch.sigmoid(prediction[..., 0]) # Center x
y = torch.sigmoid(prediction[..., 1]) # Center y
w = prediction[..., 2] # Width
h = prediction[..., 3] # Height
pred_conf = torch.sigmoid(prediction[..., 4]) # Conf
pred_cls = torch.sigmoid(prediction[..., 5:]) # Cls pred.
My understanding is that the prediction will be a tensor with shape of [batch, anchor, x_grid, y_grid, class]. But what does prediction[..., x] do (x=0,1,2,3,4,5)? Is it similar as numpy indexing of [:, x]? If so the calculation of x, y, w, h, pred_conf and pred_cls don't make sense.
It's call Ellipsis. It indicate unspecified dimensions of ndarray or tensor.
Here, if prediction shape is [batch, anchor, x_grid, y_grid, class] then
prediction[..., 0] # is equivalent to prediction[:,:,:,:,0]
prediction[..., 1] # is equivalent to prediction[:,:,:,:,1]
More
prediction[0, ..., 0] # equivalent to prediction[0,:,:,:,0]
You can also write ... as Ellipsis
prediction[Ellipsis, 0]
Related
I want to create a new tensor z from two tensors, say x and y with dimensions [N_samples, S, N_feats] and [N_samples, T, N_feats] respectively. The aim is to combine both tensors on the 2nd dim by mixing the elements of the 2nd dim in a specific ordering, which is stored in a variable order with dim [N_samples, U].
The ordering is different for every sample and is basically which index to extract from which tensor. It looks like this for a given sample order[0] - [x_0, x_1, y_0, x_2, y_1, ... ], where the letter indicates the tensor and the number indicates the index of the 2nd dim. So z[0] would be
z[0] = [x[0, 0, :], x[0, 1, :], y[0, 0, :], x[0, 2, :], y[0, 1, :] ... ]
How would I achieve this? I've written something that uses torch.gather that tries to do this.
x = torch.rand((2, 4, 5))
y = torch.rand((2, 3, 5))
# new ordering of second dim
# positive means take (n-1)th element from x
# negative means take (n-1)th element from y
order = [[1, 2, -1, 3, -2, 4, 3],
[1, -1, -2, 2, 3, 4, -3]]
# simple concat for gather
combined = torch.cat([x, y], dim=1)
# add a zero padding on top of combined tensor to ease gather
zero = torch.zeros_like(x)[:, 1:2]
combined = torch.cat([zero, combined], dim=1)
def _create_index_for_gather(index, offset, n_feats):
new_index = [abs(i) + offset if i < 0 else i for i in index]
# need to repeat index for each dim for torch.gather
new_index = [[x] * n_feats for x in new_index]
return new_index
_, offset, n_feats = x.shape
index_for_gather = [_create_index_for_gather(i, offset, n_feats) for i in order]
z = combined.gather(dim=1, index=torch.tensor(index_for_gather))
Is there a more efficient way of doing this?
First things first: I'm relatively new to TensorFlow.
I'm trying to implement a custom layer in tensorflow.keras and I'm having relatively hard time when I try to achieve the following:
I've got 3 Tensors (x,y,z) of shape (?,49,3,3,32) [where ? is the batch size]
On each Tensor I compute the sum over the 3rd and 4th axes [thus I end up with 3 Tensors of shape (?,49,32)]
By doing an argmax (A)on the above 3 Tensors (?,49,32) I get a single (?,49,32) Tensor
Now I want to use this tensor to select slices from the initial x,y,z Tensors in the following form:
Each element in the last dimension of A corresponds to the selected Tensor.
(aka: 0 = X, 1 = Y, 2 = Z)
The index of the last dimension of A corresponds to the slice that I would like to extract from the Tensor last dimension.
I've tried to achieve the above using tf.gather but I had no luck. Then I tried using a series of tf.map_fn, which is ugly and computationally costly.
To simplify the above:
let's say we've got an A array of shape (3,3,3,32). Then the numpy equivalent of what I try to achieve is this:
import numpy as np
x = np.random.rand(3,3,32)
y = np.random.rand(3,3,32)
z = np.random.rand(3,3,32)
x_sums = np.sum(np.sum(x,axis=0),0);
y_sums = np.sum(np.sum(y,axis=0),0);
z_sums = np.sum(np.sum(z,axis=0),0);
max_sums = np.argmax([x_sums,y_sums,z_sums],0)
A = np.array([x,y,z])
tmp = []
for i in range(0,len(max_sums)):
tmp.append(A[max_sums[i],:,:,i)
output = np.transpose(np.stack(tmp))
Any suggestions?
ps: I tried tf.gather_nd but I had no luck
This is how you can do something like that with tf.gather_nd:
import tensorflow as tf
# Make example data
tf.random.set_seed(0)
b = 10 # Batch size
x = tf.random.uniform((b, 49, 3, 3, 32))
y = tf.random.uniform((b, 49, 3, 3, 32))
z = tf.random.uniform((b, 49, 3, 3, 32))
# Stack tensors together
data = tf.stack([x, y, z], axis=2)
# Put reduction axes last
data_t = tf.transpose(data, (0, 1, 5, 2, 3, 4))
# Reduce
s = tf.reduce_sum(data_t, axis=(4, 5))
# Find largest sums
idx = tf.argmax(s, 3)
# Make gather indices
data_shape = tf.shape(data_t, idx.dtype)
bb, ii, jj = tf.meshgrid(*(tf.range(data_shape[i]) for i in range(3)), indexing='ij')
# Gather result
output_t = tf.gather_nd(data_t, tf.stack([bb, ii, jj, idx], axis=-1))
# Reorder axes
output = tf.transpose(output_t, (0, 1, 3, 4, 2))
print(output.shape)
# TensorShape([10, 49, 3, 3, 32])
I have R as 2D rotation matrices of shape (N,2,2). Now I wish to extend each matrix to (3,3) 3D rotation matrices, i.e. to put zeros in each [:,:2,:2] and put 1 to [:,2,2].
How to do this in tensorflow?
UPDATE
I tried this way
R = tf.get_variable(name='R', shape=np.shape(R_value), dtype=tf.float64,
initializer=tf.constant_initializer(R_value))
eye = tf.eye(np.shape(R_value)[1]+1)
right_column = eye[:2,2]
bottom_row = eye[2,:]
R = tf.concat([R, right_column], 3)
R = tf.concat([R, bottom_row], 2)
but failed, because concat doesn't do broadcasting...
UPDATE 2
I made explicit broadcasting and also fixed wrong indices in concat calls:
R = tf.get_variable(name='R', shape=np.shape(R_value), dtype=tf.float64,
initializer=tf.constant_initializer(R_value))
eye = tf.eye(np.shape(R_value)[1]+1, dtype=tf.float64)
right_column = eye[:2,2]
right_column = tf.expand_dims(right_column, 0)
right_column = tf.expand_dims(right_column, 2)
right_column = tf.tile(right_column, (np.shape(R_value)[0], 1, 1))
bottom_row = eye[2,:]
bottom_row = tf.expand_dims(bottom_row, 0)
bottom_row = tf.expand_dims(bottom_row, 0)
bottom_row = tf.tile(bottom_row, (np.shape(R_value)[0], 1, 1))
R = tf.concat([R, right_column], 2)
R = tf.concat([R, bottom_row], 1)
The solutions looks rather complex. Are there any simpler ones?
first pad zeros to [N, 2, 2] to be [N, 3, 3] with padded = tf.pad(R, [[0, 0], [0, 1], [0, 1]])
then convert padded[N, 2, 2] to 1:
since tf.Tensor does not support assignment, you can do this with initializing a np.array, and then add them together.
arr = np.zeros((3, 3))
arr[2, 2] = 1
R = padded + arr # broadcast used here
now variable R is what you need
Given
batch_images: 4D tensor of shape (B, H, W, C)
x: 3D tensor of shape (B, H, W)
y: 3D tensor of shape (B, H, W)
Goal
How can I index into batch_images using the x and y coordinates to obtain a 4D tensor of shape B, H, W, C. That is, I want to obtain for each batch, and for each pair (x, y) a tensor of shape C.
In numpy, this would be achieved using input_img[np.arange(B)[:,None,None], y, x] for example but I can't seem to make it work in tensorflow.
My attempt so far
def get_pixel_value(img, x, y):
"""
Utility function to get pixel value for
coordinate vectors x and y from a 4D tensor image.
"""
H = tf.shape(img)[1]
W = tf.shape(img)[2]
C = tf.shape(img)[3]
# flatten image
img_flat = tf.reshape(img, [-1, C])
# flatten idx
idx_flat = (x*W) + y
return tf.gather(img_flat, idx_flat)
which is returning an incorrect tensor of shape (B, H, W).
It should be possible to do it by flattening the tensor as you've done, but the batch dimension has to be taken into account in the index calculation.
In order to do this, you'll have to make an additional dummy batch index tensor with the same shape as x and y that always contains the index of the current batch.
This is basically the np.arange(B) from your numpy example, which is missing from your TensorFlow code.
You can also simplify things a bit by using tf.gather_nd, which does the index calculations for you.
Here's an example:
import numpy as np
import tensorflow as tf
# Example tensors
M = np.random.uniform(size=(3, 4, 5, 6))
x = np.random.randint(0, 5, size=(3, 4, 5))
y = np.random.randint(0, 4, size=(3, 4, 5))
def get_pixel_value(img, x, y):
"""
Utility function that composes a new image, with pixels taken
from the coordinates given in x and y.
The shapes of x and y have to match.
The batch order is preserved.
"""
# We assume that x and y have the same shape.
shape = tf.shape(x)
batch_size = shape[0]
height = shape[1]
width = shape[2]
# Create a tensor that indexes into the same batch.
# This is needed for gather_nd to work.
batch_idx = tf.range(0, batch_size)
batch_idx = tf.reshape(batch_idx, (batch_size, 1, 1))
b = tf.tile(batch_idx, (1, height, width))
indices = tf.pack([b, y, x], 3)
return tf.gather_nd(img, indices)
s = tf.Session()
print(s.run(get_pixel_value(M, x, y)).shape)
# Should print (3, 4, 5, 6).
# We've composed a new image of the same size from randomly picked x and y
# coordinates of each original image.
Numpy's meshgrid is very useful for converting two vectors to a coordinate grid. What is the easiest way to extend this to three dimensions? So given three vectors x, y, and z, construct 3x3D arrays (instead of 2x2D arrays) which can be used as coordinates.
Numpy (as of 1.8 I think) now supports higher that 2D generation of position grids with meshgrid. One important addition which really helped me is the ability to chose the indexing order (either xy or ij for Cartesian or matrix indexing respectively), which I verified with the following example:
import numpy as np
x_ = np.linspace(0., 1., 10)
y_ = np.linspace(1., 2., 20)
z_ = np.linspace(3., 4., 30)
x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
assert np.all(x[:,0,0] == x_)
assert np.all(y[0,:,0] == y_)
assert np.all(z[0,0,:] == z_)
Here is the source code of meshgrid:
def meshgrid(x,y):
"""
Return coordinate matrices from two coordinate vectors.
Parameters
----------
x, y : ndarray
Two 1-D arrays representing the x and y coordinates of a grid.
Returns
-------
X, Y : ndarray
For vectors `x`, `y` with lengths ``Nx=len(x)`` and ``Ny=len(y)``,
return `X`, `Y` where `X` and `Y` are ``(Ny, Nx)`` shaped arrays
with the elements of `x` and y repeated to fill the matrix along
the first dimension for `x`, the second for `y`.
See Also
--------
index_tricks.mgrid : Construct a multi-dimensional "meshgrid"
using indexing notation.
index_tricks.ogrid : Construct an open multi-dimensional "meshgrid"
using indexing notation.
Examples
--------
>>> X, Y = np.meshgrid([1,2,3], [4,5,6,7])
>>> X
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
>>> Y
array([[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7]])
`meshgrid` is very useful to evaluate functions on a grid.
>>> x = np.arange(-5, 5, 0.1)
>>> y = np.arange(-5, 5, 0.1)
>>> xx, yy = np.meshgrid(x, y)
>>> z = np.sin(xx**2+yy**2)/(xx**2+yy**2)
"""
x = asarray(x)
y = asarray(y)
numRows, numCols = len(y), len(x) # yes, reversed
x = x.reshape(1,numCols)
X = x.repeat(numRows, axis=0)
y = y.reshape(numRows,1)
Y = y.repeat(numCols, axis=1)
return X, Y
It is fairly simple to understand. I extended the pattern to an arbitrary number of dimensions, but this code is by no means optimized (and not thoroughly error-checked either), but you get what you pay for. Hope it helps:
def meshgrid2(*arrs):
arrs = tuple(reversed(arrs)) #edit
lens = map(len, arrs)
dim = len(arrs)
sz = 1
for s in lens:
sz*=s
ans = []
for i, arr in enumerate(arrs):
slc = [1]*dim
slc[i] = lens[i]
arr2 = asarray(arr).reshape(slc)
for j, sz in enumerate(lens):
if j!=i:
arr2 = arr2.repeat(sz, axis=j)
ans.append(arr2)
return tuple(ans)
Can you show us how you are using np.meshgrid? There is a very good chance that you really don't need meshgrid because numpy broadcasting can do the same thing without generating a repetitive array.
For example,
import numpy as np
x=np.arange(2)
y=np.arange(3)
[X,Y] = np.meshgrid(x,y)
S=X+Y
print(S.shape)
# (3, 2)
# Note that meshgrid associates y with the 0-axis, and x with the 1-axis.
print(S)
# [[0 1]
# [1 2]
# [2 3]]
s=np.empty((3,2))
print(s.shape)
# (3, 2)
# x.shape is (2,).
# y.shape is (3,).
# x's shape is broadcasted to (3,2)
# y varies along the 0-axis, so to get its shape broadcasted, we first upgrade it to
# have shape (3,1), using np.newaxis. Arrays of shape (3,1) can be broadcasted to
# arrays of shape (3,2).
s=x+y[:,np.newaxis]
print(s)
# [[0 1]
# [1 2]
# [2 3]]
The point is that S=X+Y can and should be replaced by s=x+y[:,np.newaxis] because
the latter does not require (possibly large) repetitive arrays to be formed. It also generalizes to higher dimensions (more axes) easily. You just add np.newaxis where needed to effect broadcasting as necessary.
See http://www.scipy.org/EricsBroadcastingDoc for more on numpy broadcasting.
i think what you want is
X, Y, Z = numpy.mgrid[-10:10:100j, -10:10:100j, -10:10:100j]
for example.
Here is a multidimensional version of meshgrid that I wrote:
def ndmesh(*args):
args = map(np.asarray,args)
return np.broadcast_arrays(*[x[(slice(None),)+(None,)*i] for i, x in enumerate(args)])
Note that the returned arrays are views of the original array data, so changing the original arrays will affect the coordinate arrays.
Instead of writing a new function, numpy.ix_ should do what you want.
Here is an example from the documentation:
>>> ixgrid = np.ix_([0,1], [2,4])
>>> ixgrid
(array([[0],
[1]]), array([[2, 4]]))
>>> ixgrid[0].shape, ixgrid[1].shape
((2, 1), (1, 2))'
You can achieve that by changing the order:
import numpy as np
xx = np.array([1,2,3,4])
yy = np.array([5,6,7])
zz = np.array([9,10])
y, z, x = np.meshgrid(yy, zz, xx)