Why transpose in Spatial Batch Normalization

Why transpose in Spatial Batch Normalization - python

I was trying to write a function called spatial_batchnorm_forward, used in Convolutional Neural Network. In this function, I wanted to reuse the batchnorm_foward function, which is implemented for a (N, D) shaped input in Fully Connected Network.
The following is a correct implementation.
def spatial_batchnorm_forward(x, gamma, beta, bn_param):
"""Computes the forward pass for spatial batch normalization.
"""
out, cache = None, None
N, C, H, W = x.shape
x_ = x.transpose(0,2,3,1).reshape(N*H*W, C)
out_, cache = batchnorm_forward(x_, gamma, beta, bn_param)
out = out_.reshape(N, H, W, C).transpose(0,3,1,2)
return out, cache
But at first, I wrote it as:
def spatial_batchnorm_forward(x, gamma, beta, bn_param):
"""Computes the forward pass for spatial batch normalization.
"""
out, cache = None, None
N, C, H, W = x.shape
x_ = x.reshape(-1, C)
out_, cache = batchnorm_forward(x_, gamma, beta, bn_param)
out = out_.reshape(N, C, H, W)
return out, cache
This code can run, which means those dimensions match. But the output is slightly different from the above one.
I was wondering what's going on here.
Really appreciate your patience and help!!!
I guess the problem occurs in the reshape function, so I read the document.
numpy.reshape(a, newshape, order='C')[source]
Gives a new shape to an array without changing its data.
Parameters
aarray_like
Array to be reshaped.
newshapeint or tuple of ints
The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.
order{‘C’, ‘F’, ‘A’}, optional
Read the elements of a using this index order, and place the elements into the reshaped array using this index order. ‘C’ means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. ‘F’ means to read / write the elements using Fortran-like index order, with the first index changing fastest, and the last index changing slowest. Note that the ‘C’ and ‘F’ options take no account of the memory layout of the underlying array, and only refer to the order of indexing. ‘A’ means to read / write the elements in Fortran-like index order if a is Fortran contiguous in memory, C-like order otherwise.
Returns
reshaped_arrayndarray
This will be a new view object if possible; otherwise, it will be a copy. Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array.
But I still can't figure out what's going on here.

Related

Usage of torch.scatter() for multi-dimensional value

i have a question regarding the usage of the torch.scatter() function.
I want to construct a weights matrix weights (# [B, N, V]. B is batch size, N is number of points and V is the number of features for each point. )
Let's say i have two tensors
a = # shape [B, N, k], where B is batch size, N is number of points, k is the index number within [0,V] to select feature.
b = # shape [B, N, k], where B is batch size, N is number of points, k stores here the weights for selected feature.
I tried to use function torch.scatter():
weights.scatter_(index=a, dim=2, value=some_fix_value). By this operation i can only set one fixed value, but not the whole value tensor b, which contains all information at those location.
Can someone gives me a hint on how to do this properly?

I believe what you are looking to do is:
weights.scatter_(dim=2, index=a, src=b)
In other words, a's last dimension is indexing b's last dimension. Which corresponds to the following operation in pseudo-code when torch.scatter's dim argument is set to 2:
out[i][j][a[i][j][k]] = b[i][j][k]

Obtaining all combinations of Euclidean distance in Tensorflow?

I would like to form a loss function in Tensorflow that relies on a matrix containing all combinations of (squared) Euclidean distances for a set of embeddings. In numpy, like this:
# E is (batch_size,N,32)
N=100
D = np.zeros((batch_size,N,N))
for x in range(N):
for y in range(N):
D[:,x,y] = np.sum(np.square(E[:,x,:]-E[:,y,:]),axis=1)
How can I code this in Tensorflow/Keras without using the nested for loop, or no for loops at all?

This should do:
D = tf.reduce_sum((E[:, None, :] - E[:, :, None])**2, axis=-1)
D will be (batch_size, N, N). This also works in numpy (obviously use np.sum), so you could use that to check equivalence to the loop version just to be sure.
This solution works via broacasting: None is used to insert axes such that a size-N axis is matched against a size-1 axis, and the latter is broadcast (repeated) to match the former. This results in all elements being compared to all others (per batch element). It's a little hard to describe in text and also difficult to visualize since we are dealing with four-dimensional tensors here...

What does the MNIST tensorflow tutorial mean with matmul flipping trick?

The tutorial on MNIST for ML Beginners, in Implementing the Regression, shows how to make the regression on a single line, followed by an explanation that mentions the use of a trick (emphasis mine):
y = tf.nn.softmax(tf.matmul(x, W) + b)
First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had Wx, as a small trick to deal with x being a 2D tensor with multiple inputs.
What is the trick here, and why are we using it?

Well, there's no trick here. That line basically points to one previous equation multiplication order
# Here the order of W and x, this equation for single example
y = Wx +b
# if you want to use batch of examples you need the change the order of multiplication; instead of using another transpose op
y = xW +b
# hence
y = tf.matmul(x, W)

Ok, I think the main point is that if you train in batches (i.e. train with several instances of the training set at once), TensorFlow always assumes that the zeroth dimension of x indicates the number of events per batch.
Suppose you want to map a training instance of dimension M to a target instance of dimension N. You would typically do this by multiplying x (a column vector) with a NxM matrix (and, optionally, add a bias with dimension N (also a column vector)), i.e.
y = W*x + b, where y is also a column vector.
This is perfectly alright seen from the perspective of linear algebra. But now comes the point with the training in batches, i.e. training with several training instances at once.
To get to understand this, it might be helpful to not view x (and y) as vectors of dimension M (and N), but as matrices with the dimensions Mx1 (and Nx1 for y).
Since TensorFlow assumes that the different training instances constituting a batch are aligned along the zeroth dimension, we get into trouble here since the zeroth dimension is occupied by the different elements of one single instance.
The trick is then to transpose the above equation (remember that transposition of a product also switches the order of the two transposed objects):
y^T = x^T * W^T + b^T
This is pretty much what has been described in short within the tutorial.
Note that y^T is now a matrix of dimension 1xN (practically a row vector), while x^T is a matrix of dimension 1xM (also a row vector). W^T is a matrix of dimension MxN. In the tutorial, they did not write x^T or y^T, but simply defined the placeholders according to this transposed equation. The only point that is not clear to me is why they did not define b the "transposed way". I assume that the + operator automatically transposes b if it is necessary in order to get the correct dimensions.
The rest is now pretty easy: if you have batches larger than 1 instance, you just "stack" multiple of the x (1xM) matrices, say to a matrix of dimensions (AxM) (where A is the batch size). b will hopefully automatically broadcasted to this number of events (that means to a matrix of dimension (AxN). If you then use
y^T = x^T * W^T + b^T,
you will get a (AxN) matrix of the targets for each element of the batch.

Tensorflow - matmul of input matrix with batch data

I have some data represented by input_x. It is a tensor of unknown size (should be inputted by batch) and each item there is of size n. input_x undergoes tf.nn.embedding_lookup, so that embed now has dimensions [?, n, m] where m is the embedding size and ? refers to the unknown batch size.
This is described here:
input_x = tf.placeholder(tf.int32, [None, n], name="input_x")
embed = tf.nn.embedding_lookup(W, input_x)
I'm now trying to multiply each sample in my input data (which is now expanded by embedding dimension) by a matrix variable, U, and I can't seem to get how to do that.
I first tried using tf.matmul but it gives an error due to mismatch in shapes. I then tried the following, by expanding the dimension of U and applying batch_matmul (I also tried the function from tf.nn.math_ops., the result was the same):
U = tf.Variable( ... )
U1 = tf.expand_dims(U,0)
h=tf.batch_matmul(embed, U1)
This passes the initial compilation, but then when actual data is applied, I get the following error:
In[0].dim(0) and In[1].dim(0) must be the same: [64,58,128] vs [1,128,128]
I also know why this is happening - I replicated the dimension of U and it is now 1, but the minibatch size, 64, doesn't fit.
How can I do that matrix multiplication on my tensor-matrix input correctly (for unknown batch size)?

Previous answers are obsolete. Currently tf.matmul() support tensors with rank > 2:
The inputs must be matrices (or tensors of rank > 2, representing
batches of matrices), with matching inner dimensions, possibly after
transposition.
Also tf.batch_matmul() was removed and tf.matmul() is the right way to do batch multiplication. The main idea can be understood from the following code:
import tensorflow as tf
batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
tf.matmul(A, B)
Now you will receive a tensor of the shape (batch_size, n, k). Here is what is going on here. Assume you have batch_size of matrices nxm and batch_size of matrices mxk. Now for each pair of them you calculate nxm X mxk which gives you an nxk matrix. You will have batch_size of them.
Notice that something like this is also valid:
A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)
and will give you a shape (a, b, n, k)

1. I want to multiply a batch of matrices with a batch of matrices of the same length, pairwise
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((batch_size, m, p))
# python >= 3.5
MN = M # N
# or the old way,
MN = tf.matmul(M, N)
# MN has shape (batch_size, n, p)
2. I want to multiply a batch of matrices with a batch of vectors of the same length, pairwise
We fall back to case 1 by adding and removing a dimension to v.
M = tf.random_normal((batch_size, n, m))
v = tf.random_normal((batch_size, m))
Mv = (M # v[..., None])[..., 0]
# Mv has shape (batch_size, n)
3. I want to multiply a single matrix with a batch of matrices
In this case, we cannot simply add a batch dimension of 1 to the single matrix, because tf.matmul does not broadcast in the batch dimension.
3.1. The single matrix is on the right side
In that case, we can treat the matrix batch as a single large matrix, using a simple reshape.
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((m, p))
MN = tf.reshape(tf.reshape(M, [-1, m]) # N, [-1, n, p])
# MN has shape (batch_size, n, p)
3.2. The single matrix is on the left side
This case is more complicated. We can fall back to case 3.1 by transposing the matrices.
MT = tf.matrix_transpose(M)
NT = tf.matrix_transpose(N)
NTMT = tf.reshape(tf.reshape(NT, [-1, m]) # MT, [-1, p, n])
MN = tf.matrix_transpose(NTMT)
However, transposition can be a costly operation, and here it is done twice on an entire batch of matrices. It may be better to simply duplicate M to match the batch dimension:
MN = tf.tile(M[None], [batch_size, 1, 1]) # N
Profiling will tell which option works better for a given problem/hardware combination.
4. I want to multiply a single matrix with a batch of vectors
This looks similar to case 3.2 since the single matrix is on the left, but it is actually simpler because transposing a vector is essentially a no-op. We end-up with
M = tf.random_normal((n, m))
v = tf.random_normal((batch_size, m))
MT = tf.matrix_transpose(M)
Mv = v # MT
What about einsum?
All of the previous multiplications could have been written with the tf.einsum swiss army knife. For example the first solution for 3.2 could be written simply as
MN = tf.einsum('nm,bmp->bnp', M, N)
However, note that einsum is ultimately relying on tranpose and matmul for the computation.
So even though einsum is a very convenient way to write matrix multiplications, it hides the complexity of the operations underneath — for example it is not straightforward to guess how many times an einsum expression will transpose your data, and therefore how costly the operation will be. Also, it may hide the fact that there could be several alternatives for the same operation (see case 3.2) and might not necessarily choose the better option.
For this reason, I would personally use explicit formulas like those above to better convey their respective complexity. Although if you know what you are doing and like the simplicity of the einsum syntax, then by all means go for it.

The matmul operation only works on matrices (2D tensors). Here are two main approaches to do this, both assume that U is a 2D tensor.
Slice embed into 2D tensors and multiply each of them with U individually. This is probably easiest to do using tf.scan() like this:
h = tf.scan(lambda a, x: tf.matmul(x, U), embed)
On the other hand if efficiency is important it may be better to reshape embed to be a 2D tensor so the multiplication can be done with a single matmul like this:
embed = tf.reshape(embed, [-1, m])
h = tf.matmul(embed, U)
h = tf.reshape(h, [-1, n, c])
where c is the number of columns in U. The last reshape will make sure that h is a 3D tensor where the 0th dimension corresponds to the batch just like the original x_input and embed.

As answered by #Stryke, there are two ways to achieve this: 1. Scanning, and 2. Reshaping
tf.scan requires lambda functions and is generally used for recursive operations. Some examples for the same are here: https://rdipietro.github.io/tensorflow-scan-examples/
I personally prefer reshaping, since it is more intuitive. If you are trying to matrix multiply each matrix in the 3D tensor by the matrix that is the 2D tensor, like Cijl = Aijk * Bkl, you can do it with a simple reshape.
A' = tf.reshape(Aijk,[i*j,k])
C' = tf.matmul(A',Bkl)
C = tf.reshape(C',[i,j,l])

It seems that in TensorFlow 1.11.0 the docs for tf.matmul incorrectly say that it works for rank >= 2.
Instead, the best clean alternative I've found is to use tf.tensordot(a, b, (-1, 0)) (docs).
This function gets the dot product of any axis of array a and any axis of array b in its general form tf.tensordot(a, b, axis). Providing axis as (-1, 0) gets the standard dot product of two arrays.

What does 'M' means on scipy document?

I love numpy and scipy and often read those documents. For instance, according to the page describing about numpy.linalg.inv, inv takes a parameter of which type is array_like.
But, I do not know what (..., M, M) means put before array_like. Please let me know what is this.

It means that the operations will be performed for the last two axes of the input array and that the last two axes must be of the same size M.
So if you have an input array with 3 axes of shape n, M, M, it will return the inverse of the n 2D arrays of shape M, M, taken from the last 2 axes.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Why transpose in Spatial Batch Normalization - python

Related

Usage of torch.scatter() for multi-dimensional value

Obtaining all combinations of Euclidean distance in Tensorflow?

What does the MNIST tensorflow tutorial mean with matmul flipping trick?

Tensorflow - matmul of input matrix with batch data

What does 'M' means on scipy document?

Categories

Resources