I want to create a new tensor z from two tensors, say x and y, with shapes [N_samples, S, N_feats] and [N_samples, T, N_feats] respectively. The aim is to combine the two tensors along the 2nd dim by interleaving their elements in a specific ordering, which is stored in a variable order of shape [N_samples, U].
The ordering is different for every sample and basically says which index to extract from which tensor. For a given sample it looks like order[0] = [x_0, x_1, y_0, x_2, y_1, ...], where the letter indicates the tensor and the number indicates the index along the 2nd dim. So z[0] would be
z[0] = [x[0, 0, :], x[0, 1, :], y[0, 0, :], x[0, 2, :], y[0, 1, :] ... ]
How would I achieve this? I've written something with torch.gather that tries to do this:
import torch

x = torch.rand((2, 4, 5))
y = torch.rand((2, 3, 5))

# new ordering of the second dim:
# a positive entry n means take element n-1 from x
# a negative entry -n means take element n-1 from y
order = [[1, 2, -1, 3, -2, 4, 3],
         [1, -1, -2, 2, 3, 4, -3]]

# simple concat for gather
combined = torch.cat([x, y], dim=1)
# add a zero padding on top of the combined tensor to ease gather
zero = torch.zeros_like(x)[:, 1:2]
combined = torch.cat([zero, combined], dim=1)

def _create_index_for_gather(index, offset, n_feats):
    # positive entries already point into the x block; shift negative entries into the y block
    new_index = [abs(i) + offset if i < 0 else i for i in index]
    # need to repeat each index across the feature dim for torch.gather
    new_index = [[idx] * n_feats for idx in new_index]
    return new_index

_, offset, n_feats = x.shape
index_for_gather = [_create_index_for_gather(i, offset, n_feats) for i in order]
z = combined.gather(dim=1, index=torch.tensor(index_for_gather))
Is there a more efficient way of doing this?
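For context, here is a rough vectorized sketch of one possible alternative (assuming order is available as a LongTensor of shape [N_samples, U]; names follow the snippet above). It maps the signed entries to indices into the plain concatenation and selects them with advanced indexing, so no zero-padding row is needed.

import torch

x = torch.rand((2, 4, 5))
y = torch.rand((2, 3, 5))
order = torch.tensor([[1, 2, -1, 3, -2, 4, 3],
                      [1, -1, -2, 2, 3, 4, -3]])

combined = torch.cat([x, y], dim=1)                            # (N, S + T, N_feats)
S = x.shape[1]
# positive n -> row n-1 of x, negative -n -> row n-1 of y (offset by S in combined)
idx = torch.where(order > 0, order - 1, order.abs() - 1 + S)   # (N, U)
batch = torch.arange(combined.shape[0]).unsqueeze(1)           # (N, 1), broadcasts against idx
z = combined[batch, idx]                                       # (N, U, N_feats)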
Hi, I'm new to PyTorch and torch tensors. I'm reading the yolo_v3 code and encountered this question. I think it relates to tensor indexing with ..., but it's difficult to search for ... on Google, so I figured I'd ask it here. The code is:
prediction = (
    x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size)
    .permute(0, 1, 3, 4, 2)
    .contiguous()
)
print (prediction.shape)
# Get outputs
x = torch.sigmoid(prediction[..., 0]) # Center x
y = torch.sigmoid(prediction[..., 1]) # Center y
w = prediction[..., 2] # Width
h = prediction[..., 3] # Height
pred_conf = torch.sigmoid(prediction[..., 4]) # Conf
pred_cls = torch.sigmoid(prediction[..., 5:]) # Cls pred.
My understanding is that prediction will be a tensor of shape [batch, anchor, x_grid, y_grid, class]. But what does prediction[..., x] do (x = 0, 1, 2, 3, 4, 5)? Is it similar to NumPy indexing like [:, x]? If so, the calculation of x, y, w, h, pred_conf and pred_cls doesn't make sense.
It's called Ellipsis. It stands in for the unspecified dimensions of an ndarray or tensor.
Here, if prediction has shape [batch, anchor, x_grid, y_grid, class], then
prediction[..., 0] # is equivalent to prediction[:,:,:,:,0]
prediction[..., 1] # is equivalent to prediction[:,:,:,:,1]
More
prediction[0, ..., 0] # equivalent to prediction[0,:,:,:,0]
You can also write ... as Ellipsis
prediction[Ellipsis, 0]
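A quick way to convince yourself (a minimal sketch with made-up shapes; 85 = 5 + 80 classes is just an assumption for illustration):

import torch

prediction = torch.randn(2, 3, 13, 13, 85)  # [batch, anchor, x_grid, y_grid, 5 + num_classes]
assert torch.equal(prediction[..., 0], prediction[:, :, :, :, 0])
print(prediction[..., 0].shape)   # torch.Size([2, 3, 13, 13]) -> one scalar per anchor/grid cell
print(prediction[..., 5:].shape)  # torch.Size([2, 3, 13, 13, 80]) -> the class scores

So prediction[..., 0] indexes the last axis, not the second one, which is why the x, y, w, h, pred_conf and pred_cls computations do make sense.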
First things first: I'm relatively new to TensorFlow.
I'm trying to implement a custom layer in tensorflow.keras and I'm having a relatively hard time trying to achieve the following:
I've got 3 Tensors (x,y,z) of shape (?,49,3,3,32) [where ? is the batch size]
On each Tensor I compute the sum over the 3rd and 4th axes [thus I end up with 3 Tensors of shape (?,49,32)]
By taking an argmax (A) over the above 3 tensors of shape (?,49,32), I get a single (?,49,32) tensor
Now I want to use this tensor to select slices from the initial x,y,z Tensors in the following form:
Each element of A indicates which tensor to select from (i.e. 0 = x, 1 = y, 2 = z), and its position along the last dimension of A indicates which slice of that tensor's last dimension to extract.
I've tried to achieve the above using tf.gather but I had no luck. Then I tried using a series of tf.map_fn, which is ugly and computationally costly.
To simplify the above:
let's say we've got an array A of shape (3,3,3,32). Then the NumPy equivalent of what I'm trying to achieve is this:
import numpy as np

x = np.random.rand(3, 3, 32)
y = np.random.rand(3, 3, 32)
z = np.random.rand(3, 3, 32)

x_sums = np.sum(np.sum(x, axis=0), 0)
y_sums = np.sum(np.sum(y, axis=0), 0)
z_sums = np.sum(np.sum(z, axis=0), 0)
max_sums = np.argmax([x_sums, y_sums, z_sums], 0)

A = np.array([x, y, z])
tmp = []
for i in range(0, len(max_sums)):
    tmp.append(A[max_sums[i], :, :, i])
output = np.transpose(np.stack(tmp))
Any suggestions?
PS: I tried tf.gather_nd but had no luck.
This is how you can do something like that with tf.gather_nd:
import tensorflow as tf
# Make example data
tf.random.set_seed(0)
b = 10 # Batch size
x = tf.random.uniform((b, 49, 3, 3, 32))
y = tf.random.uniform((b, 49, 3, 3, 32))
z = tf.random.uniform((b, 49, 3, 3, 32))
# Stack tensors together
data = tf.stack([x, y, z], axis=2)
# Put reduction axes last
data_t = tf.transpose(data, (0, 1, 5, 2, 3, 4))
# Reduce
s = tf.reduce_sum(data_t, axis=(4, 5))
# Find largest sums
idx = tf.argmax(s, 3)
# Make gather indices
data_shape = tf.shape(data_t, out_type=idx.dtype)
bb, ii, jj = tf.meshgrid(*(tf.range(data_shape[i]) for i in range(3)), indexing='ij')
# Gather result
output_t = tf.gather_nd(data_t, tf.stack([bb, ii, jj, idx], axis=-1))
# Reorder axes
output = tf.transpose(output_t, (0, 1, 3, 4, 2))
print(output.shape)
# TensorShape([10, 49, 3, 3, 32])
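As a possible alternative (a rough sketch, not part of the original answer): instead of building explicit gather indices, you could one-hot encode the argmax and reduce over the stacking axis, which selects the same slices.

# reuses data_t and idx from above
mask = tf.one_hot(idx, depth=3, dtype=data_t.dtype)                           # (b, 49, 32, 3)
picked_t = tf.reduce_sum(data_t * mask[..., tf.newaxis, tf.newaxis], axis=3)  # (b, 49, 32, 3, 3)
output_alt = tf.transpose(picked_t, (0, 1, 3, 4, 2))                          # (b, 49, 3, 3, 32)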
The Problem:
I want to calculate the dot product of a very large set of data. I am able to do this in a nested for-loop, but this is way too slow.
Here is a small example:
import numpy as np
points = np.array([[0.5, 2, 3, 5.5, 8, 11], [1, 2, -1.5, 0.5, 4, 5]])
lines = np.array([[0, 2, 4, 6, 10, 10, 0, 0], [0, 0, 0, 0, 0, 4, 4, 0]])
x1 = lines[0][0:-1]
y1 = lines[1][0:-1]
L1 = np.asarray([x1, y1])
# calculate the relative length of the projection
# of each point onto each line
a = np.diff(lines)
b = points[:,:,None] - L1[:,None,:]
print(a.shape)
print(b.shape)
[rows, cols, pages] = np.shape(b)
Z = np.zeros((cols, pages))
for k in range(cols):
    for l in range(pages):
        Z[k][l] = a[0][l]*b[0][k][l] + a[1][l]*b[1][k][l]
N = np.linalg.norm(a, axis=0)**2
relativeProjectionLength = np.squeeze(np.asarray(Z/N))
In this example, the first dimension of both a and b (of length 2) holds the x- and y-coordinates that I need for the dot product.
The shape of a is (2,7) and b is (2,6,7). Since the dot product reduces the first dimension, I would expect the result to have shape (6,7). How can I calculate this without the slow loops?
What I have tried:
I think that numpy.dot with correct broadcasting could do the job, however I have trouble setting up the dimensions correctly.
a = a[:, None, :]
Z = np.dot(a,b)
This one gives me the following error:
shapes (2,1,7) and (2,6,7) not aligned: 7 (dim 2) != 6 (dim 1)
You can use np.einsum -
np.einsum('ij,ikj->kj',a,b)
Explanation:
Keep the last axes aligned for the two inputs.
Sum-reduce over the first axis of each.
Let the rest stay, which is the second axis of b.
The usual rules on whether to use einsum or stick to a loop/dot-based method apply here.
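To make this concrete with the data from the question (a quick check; the arrays are the same as above), the einsum result matches the double loop:

import numpy as np

points = np.array([[0.5, 2, 3, 5.5, 8, 11], [1, 2, -1.5, 0.5, 4, 5]])
lines = np.array([[0, 2, 4, 6, 10, 10, 0, 0], [0, 0, 0, 0, 0, 4, 4, 0]])

a = np.diff(lines)                            # (2, 7)
b = points[:, :, None] - lines[:, None, :-1]  # (2, 6, 7)

Z = np.einsum('ij,ikj->kj', a, b)             # (6, 7), same values as the nested loop
N = np.linalg.norm(a, axis=0) ** 2
relativeProjectionLength = Z / N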
numpy.dot does not reduce the first dimension. From the docs:
For N dimensions it is a sum product over the last axis of a and the second-to-last of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
That is exactly what the error is telling you: it is attempting to match axis 2 in the first array to axis 1 in the second.
You can fix this using numpy.rollaxis or better yet numpy.moveaxis. Instead of a = a[:, None, :], do
a = np.moveaxis(a, 0, -1)
b = np.moveaxis(b, 0, -2)
Z = np.dot(a, b)
Better yet, you can construct your arrays to have the correct shape up front. For example, transpose lines and do a = np.diff(lines, axis=0).
I have R, a batch of 2D rotation matrices of shape (N,2,2). Now I wish to extend each matrix to a (3,3) 3D rotation matrix, i.e. keep each R in [:, :2, :2], fill the new third row and column with zeros, and put a 1 at [:, 2, 2].
How to do this in tensorflow?
UPDATE
I tried this way
R = tf.get_variable(name='R', shape=np.shape(R_value), dtype=tf.float64,
initializer=tf.constant_initializer(R_value))
eye = tf.eye(np.shape(R_value)[1]+1)
right_column = eye[:2,2]
bottom_row = eye[2,:]
R = tf.concat([R, right_column], 3)
R = tf.concat([R, bottom_row], 2)
but failed, because concat doesn't do broadcasting...
UPDATE 2
I made explicit broadcasting and also fixed wrong indices in concat calls:
R = tf.get_variable(name='R', shape=np.shape(R_value), dtype=tf.float64,
initializer=tf.constant_initializer(R_value))
eye = tf.eye(np.shape(R_value)[1]+1, dtype=tf.float64)
right_column = eye[:2,2]
right_column = tf.expand_dims(right_column, 0)
right_column = tf.expand_dims(right_column, 2)
right_column = tf.tile(right_column, (np.shape(R_value)[0], 1, 1))
bottom_row = eye[2,:]
bottom_row = tf.expand_dims(bottom_row, 0)
bottom_row = tf.expand_dims(bottom_row, 0)
bottom_row = tf.tile(bottom_row, (np.shape(R_value)[0], 1, 1))
R = tf.concat([R, right_column], 2)
R = tf.concat([R, bottom_row], 1)
This solution looks rather complex. Is there a simpler one?
First, pad the [N, 2, 2] tensor with zeros to get [N, 3, 3]: padded = tf.pad(R, [[0, 0], [0, 1], [0, 1]]).
Then set padded[:, 2, 2] to 1. Since a tf.Tensor does not support item assignment, you can do this by initializing a np.array and adding the two together:
arr = np.zeros((3, 3))
arr[2, 2] = 1
R = padded + arr # broadcast used here
Now the variable R is what you need.
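Putting it together as a self-contained sketch (eager TF 2.x here for brevity; R_value is just synthetic example data):

import numpy as np
import tensorflow as tf

# synthetic batch of N 2D rotation matrices (hypothetical example data)
N = 4
theta = np.random.uniform(0, 2 * np.pi, size=N)
R_value = np.stack([np.stack([np.cos(theta), -np.sin(theta)], axis=-1),
                    np.stack([np.sin(theta),  np.cos(theta)], axis=-1)], axis=1)  # (N, 2, 2)

R = tf.constant(R_value)                      # (N, 2, 2)
padded = tf.pad(R, [[0, 0], [0, 1], [0, 1]])  # (N, 3, 3), new row/column filled with zeros
corner = np.zeros((3, 3))
corner[2, 2] = 1
R3 = padded + corner                          # broadcasting adds the 1 at [:, 2, 2]
print(R3.shape)                               # (4, 3, 3)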
I want to compute the pairwise squared distance of a batch of features in TensorFlow. I have a simple implementation using + and * operations, by tiling the original tensor:
def pairwise_l2_norm2(x, y, scope=None):
    with tf.op_scope([x, y], scope, 'pairwise_l2_norm2'):
        size_x = tf.shape(x)[0]
        size_y = tf.shape(y)[0]
        xx = tf.expand_dims(x, -1)
        xx = tf.tile(xx, tf.pack([1, 1, size_y]))

        yy = tf.expand_dims(y, -1)
        yy = tf.tile(yy, tf.pack([1, 1, size_x]))
        yy = tf.transpose(yy, perm=[2, 1, 0])

        diff = tf.sub(xx, yy)
        square_diff = tf.square(diff)

        square_dist = tf.reduce_sum(square_diff, 1)
        return square_dist
This function takes as input two matrices of shape (m,d) and (n,d) and computes the squared distance between each pair of row vectors. The output is a matrix of shape (m,n) with elements d_ij = dist(x_i, y_j).
The problem is that I have a large batch and high-dimensional features; with large m, n, d, replicating the tensors consumes a lot of memory.
I'm looking for another way to implement this that doesn't increase the memory usage and only stores the final distance tensor, something like double-looping over the original tensor.
You can use some linear algebra to turn it into matrix ops. Note that what you need is the matrix D where, with a[i] the i-th row of your original matrix,
D[i,j] = (a[i]-a[j])(a[i]-a[j])'
You can rewrite that as
D[i,j] = r[i] - 2 a[i]a[j]' + r[j]
where r[i] is the squared norm of the i-th row of the original matrix.
In a system that supports standard broadcasting rules you can treat r as a column vector and write D as
D = r - 2 A A' + r'
In TensorFlow you could write this as
A = tf.constant([[1, 1], [2, 2], [3, 3]])
r = tf.reduce_sum(A*A, 1)
# turn r into column vector
r = tf.reshape(r, [-1, 1])
D = r - 2*tf.matmul(A, tf.transpose(A)) + tf.transpose(r)
sess = tf.Session()
sess.run(D)
Result:
array([[0, 2, 8],
       [2, 0, 2],
       [8, 2, 0]], dtype=int32)
Using squared_difference:
def squared_dist(A):
    expanded_a = tf.expand_dims(A, 1)
    expanded_b = tf.expand_dims(A, 0)
    distances = tf.reduce_sum(tf.squared_difference(expanded_a, expanded_b), 2)
    return distances
One thing I noticed is that this solution using tf.squared_difference gives me out-of-memory (OOM) errors for very large vectors, while the approach by @YaroslavBulatov doesn't. So decomposing the operation apparently yields a smaller memory footprint (whereas I had expected squared_difference to handle this better under the hood).
Here is a more general solution for two tensors of coordinates A and B:
def squared_dist(A, B):
    assert A.shape.as_list() == B.shape.as_list()
    row_norms_A = tf.reduce_sum(tf.square(A), axis=1)
    row_norms_A = tf.reshape(row_norms_A, [-1, 1])  # Column vector.
    row_norms_B = tf.reduce_sum(tf.square(B), axis=1)
    row_norms_B = tf.reshape(row_norms_B, [1, -1])  # Row vector.
    return row_norms_A - 2 * tf.matmul(A, tf.transpose(B)) + row_norms_B
Note that this is the squared distance. If you want the Euclidean distance, take tf.sqrt of the result; in that case, don't forget to add a small constant to compensate for floating-point instabilities: dist = tf.sqrt(squared_dist(A, B) + 1e-6).
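For example, a small usage sketch (random example data, assuming TF 2.x eager execution; note that the assert above requires A and B to have matching shapes):

import tensorflow as tf

A = tf.random.uniform((6, 8))   # 6 points with 8 features
B = tf.random.uniform((6, 8))   # 6 points with 8 features

sq = squared_dist(A, B)         # (6, 6) pairwise squared distances
dist = tf.sqrt(sq + 1e-6)       # Euclidean distances, with the stabilizing constant
print(sq.shape, dist.shape)     # (6, 6) (6, 6)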
If you want to compute it with another method, you can also rearrange the tf operations, for example:
def compute_euclidean_distance(x, y):
    size_x = int(x.shape[0])
    size_y = int(y.shape[0])
    for i in range(size_x):
        tile_one = tf.reshape(tf.tile(x[i], [size_y]), [size_y, -1])
        eu_one = tf.expand_dims(tf.sqrt(tf.reduce_sum(tf.pow(tf.subtract(tile_one, y), 2), axis=1)), axis=0)
        if i == 0:
            d = eu_one
        else:
            d = tf.concat([d, eu_one], axis=0)
    return d