Tensor Reshape No-op in RNN example - python

What exactly is this code doing in the method zero_state for the RNNCell in the file rnn_cell.py? I'm not entirely sure what a shape of the form [-1, n] means...

The semantics of reshape are similar to those of numpy's reshape:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html
It changes the tensor to have two dimensions, with self.state_size elements in the second dimension; the -1 tells reshape to infer the first dimension from the total number of elements. E.g. if my tensor has 6 elements and I reshape it to [-1, 2], then the first dimension will have 6 / 2 = 3 elements.
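For instance, a minimal numpy sketch of the same rule (tf.reshape follows the same convention for -1):
import numpy as np
t = np.arange(6)          # 6 elements
t.reshape(-1, 2).shape    # (3, 2): the -1 is inferred as 6 / 2 = 3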

Rafal's example is great. The way I remember -1 is that it sets the size of that dimension to whatever's necessary to fit all of the data from the original tensor. You can only have one -1 in a reshape.
If the original tensor is of size a, b, c (total elements = a*b*c)
and you reshape it to x, y, -1,
then the effect will be that the -1 ends up being a*b*c / (x*y).
A 3,3,3 tensor (27 elements) reshaped to 9,3,-1 will actually have a size of 9,3,1 (27 elements)
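To make that concrete, a quick numpy check of the same arithmetic:
import numpy as np
t = np.zeros((3, 3, 3))      # 27 elements
t.reshape(9, 3, -1).shape    # (9, 3, 1), since 27 / (9 * 3) = 1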

The documentation only vaguely describes the behavior of -1:
If shape is the special value [-1], then tensor is flattened and the
operation outputs a 1-D tensor with all elements of tensor.
When the shape is an array where one of the elements is -1, that value is just a convenient placeholder for whatever integer produces a compatible shape. Note that there might not be such an integer (for example, if your starting matrix is 5x5 you cannot reshape it with [7, -1], since 25 is not divisible by 7).
Also, as you can see, there cannot be two -1s, because that would make the shape ambiguous. As noted, the behavior is similar to numpy's reshape.
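A short sketch of those failure cases in numpy (error messages paraphrased in the comments):
import numpy as np
m = np.zeros((5, 5))   # 25 elements
m.reshape(7, -1)       # ValueError: 25 is not divisible by 7, no integer fits the -1
m.reshape(-1, -1)      # ValueError: only one dimension may be -1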

Related

Select tensor slice along a dimension based on index

I have a PyTorch tensor of the following shape: (100, 5, 100). I need to convert it into a tensor of shape (100, 100) by selecting from each row only one item in the second dimension, meaning that of those 5 elements I only need one, with its corresponding 100 elements.
To do this operation I have a second tensor of shape (100,) with the indices that specify which of those 5 items should be selected in each row.
Is there a simple way to perform this selection without having to mess with the dimensions too much?
Suppose the tensor with indices is called idx and has shape (100,), and the tensor with values is called source. Then to select:
result = source[torch.arange(100), idx]
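A self-contained sketch with random data (the sizes follow the question; randn and randint are only used here to build example inputs):
import torch

source = torch.randn(100, 5, 100)         # values, shape (100, 5, 100)
idx = torch.randint(0, 5, (100,))         # which of the 5 items to keep in each row
result = source[torch.arange(100), idx]   # advanced indexing pairs row i with idx[i]
print(result.shape)                       # torch.Size([100, 100])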

Why can arrays of different shapes be used in the following calculation? [duplicate]

I don't understand broadcasting. The documentation explains the rules of broadcasting but doesn't seem to define it in English. My guess is that broadcasting is when NumPy fills a smaller dimensional array with dummy data in order to perform an operation. But this doesn't work:
>>> x = np.array([1,3,5])
>>> y = np.array([2,4])
>>> x+y
*** ValueError: operands could not be broadcast together with shapes (3,) (2,)
The error message hints that I'm on the right track, though. Can someone define broadcasting and then provide some simple examples of when it works and when it doesn't?
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations.
It's basically a way numpy can expand the domain of operations over arrays.
The only requirement for broadcasting is a way of aligning array dimensions such that either:
Aligned dimensions are equal.
One of the aligned dimensions is 1.
So, for example if:
x = np.ndarray(shape=(4,1,3))
y = np.ndarray(shape=(3,3))
You could not align x and y like so (aligning from the leading dimension):
4 x 1 x 3
3 x 3
But you could like so (aligning from the trailing dimension):
4 x 1 x 3
    3 x 3
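A quick shape check of that alignment (a sketch using zero-filled arrays, just to inspect the broadcast shape):
import numpy as np
x = np.zeros((4, 1, 3))
y = np.zeros((3, 3))
(x + y).shape   # (4, 3, 3): y is aligned against the trailing dimensions of x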
What would the result of such an operation look like?
Suppose we have:
x = np.ndarray(shape=(1,3), buffer=np.array([1,2,3]), dtype='int')
array([[1, 2, 3]])
y = np.ndarray(shape=(3,3), buffer=np.array([1,1,1,1,1,1,1,1,1]), dtype='int')
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
The operation x + y would result in:
array([[2, 3, 4],
       [2, 3, 4],
       [2, 3, 4]])
I hope you caught the drift. If you did not, you can always check the official documentation here.
Cheers!
1. What is Broadcasting?
Broadcasting is a tensor operation that is helpful in neural networks (ML, AI).
2. What is the use of Broadcasting?
Without broadcasting, only tensors of identical shape can be added.
Broadcasting provides the flexibility to add two tensors of different dimensions.
For example, adding a 2D tensor to a 1D tensor is not possible without broadcasting.
Run the Python example code to understand the concept:
import numpy as np

x = np.array([1,3,5,6,7,8])
y = np.array([2,4,5])
X = x.reshape(2,3)
print("X =", X)
print("\n y =", y)
z = X + y
print("X + y =", z)
x is reshaped to get a 2D tensor X of shape (2,3); adding this 2D tensor X to the 1D tensor y of shape (3,) gives a 2D tensor z of shape (2,3).
You are almost correct about the smaller tensor: there is no ambiguity, the smaller tensor is broadcast to match the shape of the larger tensor. (The small vector is repeated, not filled with dummy data or zeros, to match the shape of the larger one.)
3. How does broadcasting happen?
Broadcasting consists of two steps:
1. Broadcast axes are added to the smaller tensor to match the ndim of the larger tensor.
2. The smaller tensor is repeated alongside these new axes to match the full shape of the larger tensor.
4. Why is broadcasting not happening in your code?
Your code is valid, but broadcasting cannot happen here because the two tensors differ in shape while being identical in dimensionality (both 1D).
Broadcasting occurs when the dimensionalities are non-identical.
What you need to do is change the dimensionality of one of the tensors, and you will see broadcasting (a sketch follows at the end of this answer).
5. Going in depth.
Broadcasting (repetition of the smaller tensor) occurs along broadcast axes, but since both tensors are 1-dimensional there is no broadcast axis.
Don't confuse a tensor's dimensionality with its shape; tensor dimensions are not the same as matrix dimensions.
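To illustrate the point about changing the dimensionality of one tensor, a small sketch using the arrays from the question (reshape(2, 1) is just one of several ways to add the extra axis):
import numpy as np
x = np.array([1, 3, 5])
y = np.array([2, 4])
# x + y would fail: both are 1-D with different lengths, so there is no broadcast axis
z = x + y.reshape(2, 1)   # y becomes shape (2, 1) and is broadcast against x's shape (3,)
print(z)                  # [[3 5 7]
                          #  [5 7 9]]
print(z.shape)            # (2, 3)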
Broadcasting is numpy trying to be smart when you tell it to perform an operation on arrays that aren't the same dimension. For example:
2 + np.array([1,3,5]) == np.array([3, 5, 7])
Here it decided you wanted to apply the operation using the lower dimensional array (0-D) on each item in the higher-dimensional array (1-D).
You can also add a 0-D array (scalar) or 1-D array to a 2-D array. In the first case, you just add the scalar to all items in the 2-D array, as before. In the second case, numpy will add row-wise:
In [34]: np.array([1,2]) + np.array([[3,4],[5,6]])
Out[34]:
array([[4, 6],
       [6, 8]])
There are ways to tell numpy to apply the operation along a different axis as well. This can be taken even further with applying an operation between a 3-D array and a 1-D, 2-D, or 0-D array.
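For example, a sketch of the row-wise case next to a column-wise variant (np.newaxis inserts the extra axis that changes which axis the operation runs along):
import numpy as np
a = np.array([[3, 4], [5, 6]])
v = np.array([1, 2])
a + v                   # row-wise: v is added to each row    -> [[4, 6], [6, 8]]
a + v[:, np.newaxis]    # column-wise: v now has shape (2, 1) -> [[4, 5], [7, 8]]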
>>> x = np.array([1,3,5])
>>> y = np.array([2,4])
>>> x+y
*** ValueError: operands could not be broadcast together with shapes (3,) (2,)
Broadcasting is how numpy does math operations with arrays of different shapes. The shape is the format the array has; for example, the array you used, x, has 3 elements and 1 dimension, while y has 2 elements and 1 dimension.
To perform broadcasting there are 2 rules:
1) The arrays have the same dimensions (shape), or
2) The dimension that doesn't match equals one.
For example, x has shape (2,3) [or 2 rows and 3 columns]
and y has shape (2,1) [or 2 rows and 1 column].
Can you add them, x + y?
Answer: Yes, because the mismatched dimension is equal to 1 (the column in y). If y had shape (2,4), broadcasting would not be possible, because the mismatched dimension is not 1.
In the case you posted:
operands could not be broadcast together with shapes (3,) (2,)
it is because 3 and 2 mismatch, although both arrays have 1 dimension.
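To make the (2,3) + (2,1) example above concrete, a small sketch (the values are arbitrary):
import numpy as np
x = np.arange(6).reshape(2, 3)   # shape (2, 3): [[0, 1, 2], [3, 4, 5]]
y = np.array([[10], [20]])       # shape (2, 1)
x + y                            # the size-1 column of y is stretched across 3 columns
# array([[10, 11, 12],
#        [23, 24, 25]])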
I would suggest trying np.broadcast_arrays; running some demos may give intuitive ideas. The official documentation is also helpful. From my current understanding, numpy compares the dimensions from tail to head. If one dimension is 1, it will broadcast along that dimension; if one array has more axes, e.g. a (256,256,3) array multiplied by a (1,) array, you can view (1,) as (1,1,1), and broadcasting will produce (256,256,3).
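A small np.broadcast_arrays demo along those lines (shapes only; the array contents here are placeholders):
import numpy as np
a = np.ones((256, 256, 3))
b = np.array([2.0])                  # shape (1,), treated as (1, 1, 1)
ab, bb = np.broadcast_arrays(a, b)
ab.shape, bb.shape                   # both (256, 256, 3)
(a * b).shape                        # (256, 256, 3)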

What does layout = torch.strided mean?

As I was going through the pytorch documentation I came across the term layout = torch.strided in many of the functions. Can anyone help me understand where it is used and how? The description says it's the desired layout of the returned Tensor. What does layout mean, and how many types of layout are there?
torch.rand(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
The stride is the number of steps (or jumps) needed to go from one element to the next element, in a given dimension. In computer memory, the data is stored linearly in a contiguous block of memory. What we view is just a (re)presentation.
Let's take an example tensor for understanding this:
# a 2D tensor
In [62]: tensor = torch.arange(1, 16).reshape(3, 5)
In [63]: tensor
Out[63]:
tensor([[ 1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10],
        [11, 12, 13, 14, 15]])
With this tensor in place, the strides are:
# get the strides
In [64]: tensor.stride()
Out[64]: (5, 1)
What this resultant tuple (5, 1) says is:
to traverse along the 0th dimension/axis (Y-axis), let's say we want to jump from 1 to 6, we should take 5 steps (or jumps)
to traverse along the 1st dimension/axis (X-axis), let's say we want to jump from 7 to 8, we should take 1 step (or jump)
The order (or index) of 5 & 1 in the tuple represents the dimension/axis. You can also pass the dimension, for which you want the stride, as an argument:
# get stride for axis 0
In [65]: tensor.stride(0)
Out[65]: 5
# get stride for axis 1
In [66]: tensor.stride(1)
Out[66]: 1
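Those stride values map an index directly onto the underlying flat storage. Continuing the example above, a quick check (reusing the tensor from In [62]):
flat = tensor.flatten()
i, j = 1, 2
flat[i * tensor.stride(0) + j * tensor.stride(1)]   # tensor(8), the same value as tensor[1, 2]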
With that understanding, we might ask why this extra parameter is needed when we create tensors. The answer is efficiency (how can we store/read/access the elements of a (sparse) tensor most efficiently?).
With sparse tensors (tensors where most of the elements are just zeroes), we don't want to store the zero values; we only store the non-zero values and their indices. Given a desired shape, the rest of the values can then be filled with zeroes, yielding the desired sparse tensor.
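For comparison, a minimal sparse COO sketch (the indices and values are arbitrary; torch.sparse_coo_tensor stores only the listed coordinates):
import torch
indices = torch.tensor([[0, 2],    # row coordinates of the non-zero entries
                        [1, 0]])   # column coordinates
values = torch.tensor([3.0, 4.0])
s = torch.sparse_coo_tensor(indices, values, (3, 3))
s.layout        # torch.sparse_coo
s.to_dense()    # zeros everywhere except (0, 1) -> 3. and (2, 0) -> 4.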
For further reading on this, the following articles might be of help:
numpy.ndarray.strides
torch.layout
torch.sparse
P.S: I guess there's a typo in the torch.layout documentation which says
Strides are a list of integers ...
The composite data type returned by tensor.stride() is a tuple, not a list.
For quick understanding, layout=torch.strided corresponds to dense tensors while layout=torch.sparse_coo corresponds to sparse tensors.
From another perspective, we can understand it together with torch.tensor.view.
A tensor that can be viewed is contiguous. If we change the view of a tensor, the strides change accordingly, but the data stays the same. More specifically, view returns a new tensor with the same data but a different shape, and the strides are compatible with the view, indicating how to access the data in memory.
For example
In [1]: import torch
In [2]: a = torch.arange(15)
In [3]: a.data_ptr()
Out[3]: 94270437164688
In [4]: a.stride()
Out[4]: (1,)
In [5]: a = a.view(3, 5)
In [6]: a.data_ptr() # share the same data pointer
Out[6]: 94270437164688
In [7]: a.stride() # the stride changes as the view changes
Out[7]: (5, 1)
In addition, the idea of torch.strided is basically the same as strides in numpy.
View this question for more detailed understanding.
How to understand numpy strides for layman?
As per the official pytorch documentation here,
A torch.layout is an object that represents the memory layout of a
torch.Tensor. Currently, we support torch.strided (dense Tensors) and
have experimental support for torch.sparse_coo (sparse COO Tensors).
torch.strided represents dense Tensors and is the memory layout that
is most commonly used. Each strided tensor has an associated
torch.Storage, which holds its data. These tensors provide
multi-dimensional, strided view of a storage. Strides are a list of
integers: the k-th stride represents the jump in the memory necessary
to go from one element to the next one in the k-th dimension of the
Tensor. This concept makes it possible to perform many tensor
operations efficiently.
Example:
>>> x = torch.Tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
>>> x.stride()
(5, 1)
>>> x.t().stride()
(1, 5)
layout means the way memory organizes the elements of the tensor. I think there are currently 2 types of layout to store a tensor:
one is torch.strided and the other is torch.sparse_coo.
strided means the elements are arranged one by one in a very dense way; think of troops standing in a strided square formation, so each soldier actually has neighbours.
sparse_coo, I think, deals with sparse matrices; I am not sure of the exact storage structure, but I guess it just stores the non-zero elements' indices and values.
The two types need to be kept separate because for a sparse matrix there is no need to arrange the elements one by one in a dense form, since it might take maybe one hundred steps to get from one non-zero element to the next non-zero element.

Extracting specific columns from multi-dimensional array

Suppose we have a 3d numpy array in Python of shape (1, 22, 22) (random dimensions for illustration). If I want to extract the first 2 indices along the Y and Z dimensions, then I can do:
new_array = array[:, 0:2, 0:2]
new_array.shape
(1, 2, 2)
But when I try to do the same by explicitly specifying the first two dimensions, as:
new_array = array[:, [0,1], [0,1]]
new_array.shape
(1, 2)
I'm getting a different result. Why is that? How can I select specific indices rather than a range?
Passing a list to a numpy array's __getitem__ uses advanced indexing instead of slicing. See the documentation here.
Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.
In your case, you are using integer array indexing. The integer index arrays are broadcast and iterated together as a single unit. So using
array[:, [0,1], [0,1]]
selects elements (0,0) and (1,1) from the last two dimensions, not the zeroth and first subarrays from dimension 1 combined with the zeroth and first subarrays from dimension 2.
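If the goal is the (1, 2, 2) result from the question, one way (a sketch, assuming the original array is called array) is to shape the index arrays so that they broadcast against each other instead of pairing up:
import numpy as np
array = np.arange(1 * 22 * 22).reshape(1, 22, 22)
rows = np.array([0, 1])[:, None]   # shape (2, 1)
cols = np.array([0, 1])            # shape (2,)
array[:, rows, cols].shape         # (1, 2, 2), same selection as array[:, 0:2, 0:2]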
I read the documentation and played around with my code. The only thing that seemed to work (but doesn't, see the edit) with respect to my question is:
columns = np.array([[0, 1], [0, 1]], dtype=np.intp)
new_array = my_array[:, columns, 0]
I'm still not quite sure why it works though.
EDIT: doesn't work as expected

