What does layout = torch.strided mean?

As I was going through the PyTorch documentation I came across the term layout = torch.strided in many of the functions. Can anyone help me understand where it is used and how? The description says it's the desired layout of the returned Tensor. What does layout mean, and how many types of layout are there?
torch.rand(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Stride is the number of steps (or jumps) needed to go from one element to the next along a given dimension. In computer memory, the data is stored linearly in a contiguous block of memory. What we view is just a (re)presentation of that block.
Let's take an example tensor for understanding this:
# a 2D tensor
In [62]: tensor = torch.arange(1, 16).reshape(3, 5)
In [63]: tensor
Out[63]:
tensor([[ 1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10],
        [11, 12, 13, 14, 15]])
With this tensor in place, the strides are:
# get the strides
In [64]: tensor.stride()
Out[64]: (5, 1)
What this resultant tuple (5, 1) says is:
to traverse along the 0th dimension/axis (Y-axis), let's say we want to jump from 1 to 6, we should take 5 steps (or jumps)
to traverse along the 1st dimension/axis (X-axis), let's say we want to jump from 7 to 8, we should take 1 step (or jump)
The order (or index) of 5 & 1 in the tuple represents the dimension/axis. You can also pass the dimension, for which you want the stride, as an argument:
# get stride for axis 0
In [65]: tensor.stride(0)
Out[65]: 5
# get stride for axis 1
In [66]: tensor.stride(1)
Out[66]: 1
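To make the connection to memory explicit, here is a small sketch (using the same tensor as above) that recovers an element from the flat storage using only the strides: the offset of element [i, j] is i * stride(0) + j * stride(1).
# compute the flat offset of tensor[2, 3] from the strides
In [67]: i, j = 2, 3
In [68]: tensor.flatten()[i * tensor.stride(0) + j * tensor.stride(1)]
Out[68]: tensor(14)
In [69]: tensor[2, 3]
Out[69]: tensor(14)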
With that understanding, we might ask why this extra parameter is needed when we create tensors. The answer is efficiency: how can we store/read/access the elements of a (sparse) tensor most efficiently?
With sparse tensors (tensors where most of the elements are just zeroes), we don't want to store all those zero values; we only store the non-zero values and their indices. Given the desired shape, the rest of the values can then be filled in with zeroes, yielding the desired sparse tensor.
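For illustration, here is a minimal sketch of constructing such a sparse tensor directly with torch.sparse_coo_tensor (only the non-zero values and their indices are supplied; the shape tells PyTorch where the zeroes go):
# two non-zero values at positions (0, 1) and (2, 3) of a 3x5 tensor
indices = torch.tensor([[0, 2],    # row indices
                        [1, 3]])   # column indices
values = torch.tensor([7.0, 9.0])
sparse = torch.sparse_coo_tensor(indices, values, size=(3, 5))
print(sparse.layout)        # torch.sparse_coo
print(sparse.to_dense())    # zeros everywhere except [0, 1] = 7. and [2, 3] = 9.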
For further reading on this, the following articles might be of help:
numpy.ndarray.strides
torch.layout
torch.sparse
P.S: I guess there's a typo in the torch.layout documentation which says
Strides are a list of integers ...
The composite data type returned by tensor.stride() is a tuple, not a list.

For quick understanding, layout=torch.strided corresponds to dense tensors while layout=torch.sparse_coo corresponds to sparse tensors.
From another perspective, we can understand it together with torch.tensor.view.
A tensor can be viewed only if it is contiguous. If we change the view of a tensor, the strides change accordingly, but the data stays the same. More specifically, view returns a new tensor with the same data but a different shape, and the strides are made compatible with the new view to indicate how to access the data in memory.
For example:
In [1]: import torch
In [2]: a = torch.arange(15)
In [3]: a.data_ptr()
Out[3]: 94270437164688
In [4]: a.stride()
Out[4]: (1,)
In [5]: a = a.view(3, 5)
In [6]: a.data_ptr() # share the same data pointer
Out[6]: 94270437164688
In [7]: a.stride() # the stride changes as the view changes
Out[7]: (5, 1)
In addition, the idea of torch.strided is basically the same as strides in numpy.
View this question for more detailed understanding.
How to understand numpy strides for layman?
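One caveat when carrying the intuition over: torch reports strides in numbers of elements, whereas numpy reports them in bytes. A quick sketch:
import numpy as np
import torch

t = torch.arange(15).reshape(3, 5)
n = np.arange(15).reshape(3, 5)
print(t.stride())   # (5, 1)  -> counted in elements
print(n.strides)    # (40, 8) on a platform with 8-byte ints -> counted in bytes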

As per the official pytorch documentation here,
A torch.layout is an object that represents the memory layout of a
torch.Tensor. Currently, we support torch.strided (dense Tensors) and
have experimental support for torch.sparse_coo (sparse COO Tensors).
torch.strided represents dense Tensors and is the memory layout that
is most commonly used. Each strided tensor has an associated
torch.Storage, which holds its data. These tensors provide
multi-dimensional, strided view of a storage. Strides are a list of
integers: the k-th stride represents the jump in the memory necessary
to go from one element to the next one in the k-th dimension of the
Tensor. This concept makes it possible to perform many tensor
operations efficiently.
Example:
>>> x = torch.Tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
>>> x.stride()
(5, 1)
>>> x.t().stride()
(1, 5)
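A small follow-up sketch on the same x: after the transpose the strides no longer describe a C-contiguous layout, and .contiguous() materialises a copy with freshly computed strides.
>>> x.t().is_contiguous()
False
>>> x.t().contiguous().stride()
(2, 1)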

layout means the way memory organizes the elements of that tensor. I think there are currently 2 layout types for storing tensors:
one is torch.strided and the other is torch.sparse_coo.
strided means the elements are laid out one after another in a very dense way; think of troops standing in a square formation, so each soldier actually has neighbours.
sparse_coo, I think, is meant for sparse matrices. I'm not sure of the exact storage structure, but I guess it just stores the non-zero elements' indices and values.
These two types need to be kept separate because for a sparse matrix there is no need to arrange the elements one by one in a dense form; it might take, say, a hundred steps for one non-zero element to reach the next non-zero element.
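As a rough sketch of that difference, you can convert a mostly-zero dense tensor to COO form and look at what actually gets stored:
import torch

dense = torch.zeros(4, 4)
dense[0, 1] = 3.0
dense[2, 3] = 5.0
sparse = dense.to_sparse().coalesce()   # layout changes to torch.sparse_coo
print(sparse.indices())                 # tensor([[0, 2], [1, 3]])
print(sparse.values())                  # tensor([3., 5.])
print(dense.layout, sparse.layout)      # torch.strided torch.sparse_coo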

Related

pytorch view tensor and reduce one dimension

So I have a 4d tensor with shape [4,1,128,678] and I would like to view/reshape it as [4,678,128].
I have to do this for multiple tensors where the last shape value 678 is not always known and could be different, so [4,1,128,575] should also go to [4,575,128].
Any idea what the optimal operation is to transform the tensor? view/reshape? And how?
Thanks
You could also use (less to write and IMO cleaner):
# x.shape == (4, 1, 128, 678)
x.squeeze().permute(0, 2, 1)
If you were to use view you would lose dimension information (but maybe that is what you want), in this case it would be:
x.squeeze().view(4, -1, 128)
permute reorders the dimensions (so the elements end up in a different logical order), while view only reinterprets the shape without restructuring the underlying memory. You can see the difference between those two operations in this StackOverflow answer.
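For instance, a minimal sketch (assuming the [4, 1, 128, 678] shape from the question) showing that permute only rearranges the strides and copies no data, which is why the result is no longer contiguous:
import torch

x = torch.randn(4, 1, 128, 678)
y = x.squeeze().permute(0, 2, 1)
print(y.shape)                         # torch.Size([4, 678, 128])
print(y.is_contiguous())               # False - strides reordered, data untouched
print(y.data_ptr() == x.data_ptr())    # True  - still the same underlying storage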
Use einops instead; it can do all the operations in one go and verify the known dimensions:
from einops import rearrange
y = rearrange(x, 'x 1 y z -> x z y', x=4, y=128)

Numpy [...,None]

I have found myself needing to add features to existing numpy arrays which has led to a question around what the last portion of the following code is actually doing:
np.ones(shape=feature_set.shape)[...,None]
Set-up
As an example, let's say I wish to solve for linear regression parameter estimates using numpy, via the usual closed-form (normal equation) solution.
Assume I have a feature set of shape (50,1) and a target variable of shape (50,), and I wish to use the shape of my target variable to add a column for intercept values.
It would look something like this:
# Create random target & feature set
y_train = np.random.randint(0,100, size = (50,))
feature_set = np.random.randint(0,100,size=(50,1))
# Build a set of 1s after shape of target variable
int_train = np.ones(shape=y_train.shape)[...,None]
# Able to then add int_train to feature set
X = np.concatenate((int_train, feature_set),1)
What I Think I Know
I see the difference in output when I include [...,None] vs. when I leave it off: the second version returns an error about the input arrays needing the same number of dimensions, and eventually I stumbled on the solution of using [...,None].
Main Question
While I see the output of [...,None] gives me what I want, I am struggling to find any information on what it is actually supposed to do. Can anybody walk me through what this code actually means, what the None argument is doing, etc?
Thank you!
The slice of [..., None] consists of two "shortcuts":
The ellipsis literal component:
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is a rank 5 array (i.e., it has 5 axes), then
x[1,2,...] is equivalent to x[1,2,:,:,:],
x[...,3] to x[:,:,:,:,3] and
x[4,...,5,:] to x[4,:,:,5,:].
(Source)
The None component:
numpy.newaxis
The newaxis object can be used in all slicing operations to create an axis of length one. newaxis is an alias for ‘None’, and ‘None’ can be used in place of this with the same result.
(Source)
So, arr[..., None] takes an array of dimension N and "adds" a dimension "at the end" for a resulting array of dimension N+1.
Example:
import numpy as np
x = np.array([[1,2,3],[4,5,6]])
print(x.shape) # (2, 3)
y = x[...,None]
print(y.shape) # (2, 3, 1)
z = x[:,:,np.newaxis]
print(z.shape) # (2, 3, 1)
a = np.expand_dims(x, axis=-1)
print(a.shape) # (2, 3, 1)
print((y == z).all()) # True
print((y == a).all()) # True
Consider this code:
np.ones(shape=(2,3))[...,None].shape
As you see, the None changes the (2,3) matrix into a (2,3,1) tensor. In effect, it adds the new length-1 axis at the LAST position.
If you use
np.ones(shape=(2,3))[None, ...].shape
it adds the new length-1 axis at the FIRST position of the tensor (i.e. the shape becomes (1,2,3)).
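Applied to the regression set-up from the question above, [..., None] (equivalently [:, None] here, since y_train is 1-D) is exactly what turns the (50,) target shape into a (50, 1) column that can be concatenated with the feature set:
import numpy as np

y_train = np.random.randint(0, 100, size=(50,))
feature_set = np.random.randint(0, 100, size=(50, 1))

int_train = np.ones(shape=y_train.shape)[..., None]   # (50,) -> (50, 1)
X = np.concatenate((int_train, feature_set), axis=1)
print(int_train.shape, X.shape)                        # (50, 1) (50, 2)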

assigning different weights to every numpy column

I have the following numpy array:
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize
import numpy as np
# NumPy array comprising associate metrics
# i.e. Open TA's, Open SR's, Open SE's
associateMetrics = np.array([[11, 28, 21],
[27, 17, 20],
[19, 31, 3],
[17, 24, 17]]).astype(np.float64)
print("raw metrics=", associateMetrics)
Now, I want to assign different weights to every column in the above array and later normalize it. For example, let's say I want to assign a higher weight to the 1st column by multiplying it by 5, multiply column 2 by 3, and the last column by 2.
How do I do this in Python? Sorry, a bit new to Python and numpy.
I have tried this for just one column but it won't work:
# Assign weights to metrics
weightedMetrics = associateMetrics
np.multiply(2, weightedMetrics[:,0])
print("weighted metrics=", weightedMetrics)
You should make use of numpy's array broadcasting. This means that lower-dimensional arrays can be automatically expanded to perform a vectorized operation with an array of higher (but compatible) dimensions. In your specific case, you can multiply your (4,3)-shaped array with a 1d weight array of shape (3,) and obtain what you want:
weightedMetrics = associateMetrics * np.array([5,3,2])
The trick is that you can imagine numpy ndarrays to have leading singleton dimensions, along which broadcasting is automatic. By this I mean that your 1d numpy weight array of shape (3,) can be thought to have a leading singleton dimension (but only from the point of view of broadcasting!). And it's easy to see how the array of shape (4,3) and (1,3) should be multiplied: each element of the latter has to be used for full columns of the former.
In the very general case, you can even use arithmetic operations on, say, an array of shape (3,1,3,1,4) and one of shape (2,3,4,4). What's important is that dimensions that meet should either agree, or one of the arrays should have a singleton dimension at that place; one of the arrays is also allowed to be longer (in the front).
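A quick sketch of that general case, just checking the resulting shape with dummy arrays:
import numpy as np

a = np.ones((3, 1, 3, 1, 4))
b = np.ones((2, 3, 4, 4))
print((a * b).shape)   # (3, 2, 3, 4, 4): dims align from the right,
                       # singletons broadcast, the extra leading dim is kept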
I found my answer. This is what I used:
print("weighted metrics=", np.multiply([ 1, 2, 3], associateMetrics))

Tensor Reshape No-op in RNN example

What exactly is this code doing in the method zero_state for the RNNCell in the file rnn_cell.py? I'm not entirely sure what a shape of the form [-1, n] means...
The semantics of reshape are similar to those of numpy's reshape:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html
It changes the tensor to have 2 dimensions, with the second dimension holding self.state_size elements. E.g. if my tensor has 6 elements and I reshape it to [-1, 2], then the first dimension will have 6 / 2 = 3 elements.
Rafal's example is great. The way I remember -1 is that it sets the size of that dimension to whatever's necessary to fit all of the data from the original tensor. You can only have one -1 in a reshape.
If the original tensor is of size a, b, c (total elements = abc)
and you resize it to x, y, -1,
then the effect will be that the -1 will end up being abc/(x*y).
A 3,3,3 tensor (27 elements) reshaped to 9,3,-1 will actually have a size of 9,3,1 (27 elements)
The documentation only vaguely describes the behavior of -1:
If shape is the special value [-1], then tensor is flattened and the
operation outputs a 1-D tensor with all elements of tensor.
When the shape is an array where one of the elements is -1, that value is just a convenient placeholder for whatever integer gives the right total number of elements. Note that there might not be such an integer (for example, if your starting matrix is 5x5 you cannot reshape it with [7, -1]).
Also, as you can see, there cannot be two -1 values, because this would make the shape ambiguous. As noted, the behavior is similar to numpy's reshape.
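Since the semantics match numpy's reshape (as noted above), here is a small sketch of the -1 rule there:
import numpy as np

t = np.arange(6)
print(t.reshape(-1, 2).shape)   # (3, 2): the -1 is inferred as 6 / 2 = 3
print(t.reshape(-1).shape)      # (6,):   a lone -1 simply flattens
# t.reshape(7, -1) would raise an error: 6 is not divisible by 7,
# so no integer can fill in the -1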

What is the difference between contiguous and non-contiguous arrays?

In the numpy manual about the reshape() function, it says
>>> a = np.zeros((10, 2))
# A transpose make the array non-contiguous
>>> b = a.T
# Taking a view makes it possible to modify the shape without modifying the
# initial object.
>>> c = b.view()
>>> c.shape = (20)
AttributeError: incompatible shape for a non-contiguous array
My questions are:
What are contiguous and non-contiguous arrays? Is it similar to the contiguous memory block in C, as in What is a contiguous memory block?
Is there any performance difference between these two? When should we use one or the other?
Why does transpose make the array non-contiguous?
Why does c.shape = (20) throw an error incompatible shape for a non-contiguous array?
Thanks for your answer!
A contiguous array is just an array stored in an unbroken block of memory: to access the next value in the array, we just move to the next memory address.
Consider the 2D array arr = np.arange(12).reshape(3,4). It looks like this:
[[ 0,  1,  2,  3],
 [ 4,  5,  6,  7],
 [ 8,  9, 10, 11]]
In the computer's memory, the values of arr are stored one after the other in a single block: 0 1 2 3 4 5 6 7 8 9 10 11.
This means arr is a C contiguous array because the rows are stored as contiguous blocks of memory: the next memory address holds the next value in that row. If we want to move down a column, we need to jump over three values (e.g. to jump from 0 to 4 we skip over 1, 2 and 3).
Transposing the array with arr.T means that C contiguity is lost, because adjacent row entries are no longer at adjacent memory addresses. However, arr.T is Fortran contiguous, since the columns are in contiguous blocks of memory.
Performance-wise, accessing memory addresses which are next to each other is very often faster than accessing addresses which are more "spread out" (fetching a value from RAM could entail a number of neighbouring addresses being fetched and cached for the CPU.) This means that operations over contiguous arrays will often be quicker.
As a consequence of C contiguous memory layout, row-wise operations are usually faster than column-wise operations. For example, you'll typically find that
np.sum(arr, axis=1) # sum the rows
is slightly faster than:
np.sum(arr, axis=0) # sum the columns
Similarly, operations on columns will be slightly faster for Fortran contiguous arrays.
Finally, why can't we flatten the Fortran contiguous array by assigning a new shape?
>>> arr2 = arr.T
>>> arr2.shape = 12
AttributeError: incompatible shape for a non-contiguous array
In order for this to be possible, NumPy would have to put the rows of arr.T one after the other, i.e. store the values in the order 0 4 8 1 5 9 2 6 10 3 7 11.
(Setting the shape attribute directly assumes C order - i.e. NumPy tries to perform the operation row-wise.)
This is impossible to do. For any axis, NumPy needs to have a constant stride length (the number of bytes to move) to get to the next element of the array. Flattening arr.T in this way would require skipping forwards and backwards in memory to retrieve consecutive values of the array.
If we wrote arr2.reshape(12) instead, NumPy would copy the values of arr2 into a new block of memory (since it can't return a view on to the original data for this shape).
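A small sketch of that view-versus-copy distinction, using np.shares_memory:
import numpy as np

arr = np.arange(12).reshape(3, 4)
flat_view = arr.reshape(12)        # contiguous, so a view is possible
print(np.shares_memory(arr, flat_view))   # True

flat_copy = arr.T.reshape(12)      # non-contiguous, so NumPy copies the data
print(np.shares_memory(arr, flat_copy))   # False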
Maybe this example with 12 different array values will help:
In [207]: x=np.arange(12).reshape(3,4).copy()
In [208]: x.flags
Out[208]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
...
In [209]: x.T.flags
Out[209]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
...
The C order values are in the order in which they were generated. The transposed ones are not:
In [212]: x.reshape(12,) # same as x.ravel()
Out[212]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [213]: x.T.reshape(12,)
Out[213]: array([ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11])
You can get 1d views of both
In [214]: x1=x.T
In [217]: x.shape=(12,)
the shape of x can also be changed.
In [220]: x1.shape=(12,)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-220-cf2b1a308253> in <module>()
----> 1 x1.shape=(12,)
AttributeError: incompatible shape for a non-contiguous array
But the shape of the transpose cannot be changed. The data is still in the 0,1,2,3,4... order, which can't be accessed as 0,4,8,... in a 1d array.
But a copy of x1 can be changed:
In [227]: x2=x1.copy()
In [228]: x2.flags
Out[228]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
...
In [229]: x2.shape=(12,)
Looking at the strides might also help. A stride is how far (in bytes) NumPy has to step to get to the next value along a given axis. For a 2d array, there will be 2 stride values:
In [233]: x=np.arange(12).reshape(3,4).copy()
In [234]: x.strides
Out[234]: (16, 4)
To get to the next row, step 16 bytes, next column only 4.
In [235]: x1.strides
Out[235]: (4, 16)
Transpose just switches the order of the strides. Now the step to the next row is only 4 bytes, i.e. it moves to the very next number in the buffer.
In [236]: x.shape=(12,)
In [237]: x.strides
Out[237]: (4,)
Changing the shape also changes the strides - just step through the buffer 4 bytes at a time.
In [238]: x2=x1.copy()
In [239]: x2.strides
Out[239]: (12, 4)
Even though x2 looks just like x1, it has its own data buffer, with the values in a different order. The next column is now 4 bytes over, while the next row is 12 (3*4).
In [240]: x2.shape=(12,)
In [241]: x2.strides
Out[241]: (4,)
And as with x, changing the shape to 1d reduces the strides to (4,).
For x1, with data in the 0,1,2,... order, there isn't a 1d stride that would give 0,4,8....
__array_interface__ is another useful way of displaying array information:
In [242]: x1.__array_interface__
Out[242]:
{'strides': (4, 16),
'typestr': '<i4',
'shape': (4, 3),
'version': 3,
'data': (163336056, False),
'descr': [('', '<i4')]}
The x1 data buffer address will be the same as for x, with which it shares the data. x2 has a different buffer address.
You could also experiment with adding an order='F' parameter to the copy and reshape commands.
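For example, a quick sketch of that experiment: reading the transpose in Fortran order is a constant-stride walk through the original buffer, so no copy is needed.
In [243]: x1.reshape(12, order='F')   # column-major read gives 0,1,2,... back
Out[243]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
In [244]: np.shares_memory(x1, x1.reshape(12, order='F'))
Out[244]: True
In [245]: x2 = x1.copy(order='F')
In [246]: x2.flags['F_CONTIGUOUS'], x2.flags['C_CONTIGUOUS']
Out[246]: (True, False)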
