There are a few questions I've found that are close to what I am asking but they are different enough that they don't seem to solve my problem. I am trying to grab a 1d slice along one axis for an ndarray. As an example for a 3d array
[[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9,10,11],
[12,13,14],
[15,16,17]],
[[18,19,20],
[21,22,23],
[24,25,26]]]
I want the following 1d slices
[0,1,2]
...
[24,25,26]
[0,3,6]
...
[20,23,26]
[0,9,18]
...
[8,17,26]
which effectively equates to the following (for a 3D array):
ary[i,j,:]
ary[i,:,k]
ary[:,j,k]
I want this to generalize to an array of n dimensions
(for a 2d array we would get ary[i,:] and ary[:,j], etc.)
Is there a numpy function that lets me do this?
EDIT: Corrected the 2nd dimension indexing
We could permute the axes so that each axis in turn is pushed to the end, and then reshape to 2D. ndarray.ndim lets this generalize to ndarrays with any number of dimensions, np.transpose does the axis permutation, and np.roll produces the rolled axis orders. The implementation is quite simple and is listed below -
# a is input ndarray
R = np.arange(a.ndim)
# after np.roll(R,i) the last axis of the transposed array is axis -1-i of a
out = [np.transpose(a,np.roll(R,i)).reshape(-1,a.shape[-1-i]) for i in R]
Sample run -
In [403]: a = np.arange(27).reshape(3,3,3)
In [325]: R = np.arange(a.ndim)
In [326]: out = [np.transpose(a,np.roll(R,i)).reshape(-1,a.shape[-1-i]) for i in R]
In [327]: out[0]
Out[327]:
array([[ 0, 1, 2],
[ 3, 4, 5],
...
[24, 25, 26]])
In [328]: out[1]
Out[328]:
array([[ 0, 3, 6],
[ 9, 12, 15],
....
[20, 23, 26]])
In [329]: out[2]
Out[329]:
array([[ 0, 9, 18],
[ 1, 10, 19],
....
[ 8, 17, 26]])
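As a cross-check of the idea above, here is a minimal sketch that uses np.moveaxis instead of np.transpose with np.roll. The helper name slices_along_axis is my own (hypothetical), and the input is deliberately non-cubic so that a wrong row length would show up immediately. Note that here out[ax] holds the slices along axis ax, so the order of the list differs from the sample run above.

```python
import numpy as np

def slices_along_axis(a, axis):
    """Return a 2D array whose rows are the 1D slices of `a` along `axis`."""
    # Move the chosen axis to the end, then flatten all other axes into rows.
    return np.moveaxis(a, axis, -1).reshape(-1, a.shape[axis])

a = np.arange(24).reshape(2, 3, 4)   # deliberately non-cubic
out = [slices_along_axis(a, ax) for ax in range(a.ndim)]

print([o.shape for o in out])   # [(12, 2), (8, 3), (6, 4)]
print(out[0][0])                # a[:, 0, 0] -> [ 0 12]
print(out[1][0])                # a[0, :, 0] -> [0 4 8]
print(out[2][0])                # a[0, 0, :] -> [0 1 2 3]
```

Each row is one of the ary[i,j,:], ary[i,:,k] or ary[:,j,k] slices from the question, just grouped per axis.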
I am trying to index a PyTorch tensor with a matrix of indices, and I recently came across the piece of code below; I cannot work out why it does not work.
The code below is split into two parts. The first half proves to work, whilst the second trips an error. I fail to see the reason why. Could someone shed some light on this?
import torch
import numpy as np
a = torch.rand(32, 16)
m, n = a.shape
xx, yy = np.meshgrid(np.arange(m), np.arange(m))
result = a[xx] # WORKS for a torch.tensor of size M >= 32. It doesn't work otherwise.
a = torch.rand(16, 16)
m, n = a.shape
xx, yy = np.meshgrid(np.arange(m), np.arange(m))
result = a[xx] # IndexError: too many indices for tensor of dimension 2
and if I change a = np.random.rand(16, 16) it does work as well.
To whoever comes looking for an answer: it looks like it's a bug in PyTorch.
Indexing using numpy arrays is not well defined, and it works only if tensors are indexed using tensors. So, in my example code, this works flawlessly:
a = torch.rand(M, N)
m, n = a.shape
xx, yy = torch.meshgrid(torch.arange(m), torch.arange(m), indexing='xy')
result = a[xx] # WORKS
I made a gist to check it, and it's available here
First, let me give you a quick insight into the idea of indexing a tensor with a numpy array and another tensor.
Example: these are the two index objects and the target tensor to be indexed
numpy_indices = np.array([[0, 1, 2, 7],
                          [0, 1, 2, 3]]) # numpy array
tensor_indices = torch.tensor([[0, 1, 2, 7],
[0, 1, 2, 3]]) # 2D tensor
t = torch.tensor([[1, 2, 3, 4], # targeted tensor
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24],
[25, 26, 27, 28],
[29, 30, 31, 32]])
numpy_result = t[numpy_indices]
tensor_result = t[tensor_indices]
Indexing using a 2D numpy array: the two rows of the index are read as coordinate pairs (row, column), e.g. t[0,0], t[1,1], t[2,2], and t[7,3].
print(numpy_result) # tensor([ 1, 6, 11, 32])
Indexing using a 2D tensor: the index tensor is walked row by row, and each value selects a whole row of the targeted tensor,
e.g. [[t[0], t[1], t[2], t[7]], [t[0], t[1], t[2], t[3]]]. As the example below shows, the shape of tensor_result after indexing is (tensor_indices.shape[0], tensor_indices.shape[1], t.shape[1]) = (2,4,4).
print(tensor_result) # tensor([[[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12],
# [29, 30, 31, 32]],
# [[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12],
# [13, 14, 15, 16]]])
If you try to add a third row to numpy_indices, you will get the same error you had, because each index is then read as a 3D coordinate, e.g. (0,0,0)...(7,3,3), while t has only 2 dimensions.
numpy_indices = np.array([[0, 1, 2, 7],
                          [0, 1, 2, 3],
                          [0, 1, 2, 3]])
numpy_result = t[numpy_indices] # IndexError: too many indices for tensor of dimension 2
However, this is not the case with indexing by tensor and the shape will be bigger (3,4,4).
Finally, as you see the outputs of the two types of indexing are completely different. To solve your problem, you can use
xx = torch.tensor(xx).long() # convert a numpy array to a tensor
What happens with advanced indexing when numpy_indices has more than 3 rows, as in your situation, is still ambiguous and unresolved; you can check 1, 2, 3.
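The two readings of the index described above (coordinate pairs versus row selection) can be reproduced in plain NumPy, which I use here only as an illustration. This sketch assumes nothing about which reading a particular PyTorch version applies to a NumPy index array; it only shows what each reading produces for the same t and the same indices.

```python
import numpy as np

t = np.arange(1, 33).reshape(8, 4)        # same values as the tensor t above
idx = np.array([[0, 1, 2, 7],
                [0, 1, 2, 3]])

# Tuple of index arrays: rows and columns are paired element-wise,
# i.e. t[0, 0], t[1, 1], t[2, 2], t[7, 3].
paired = t[tuple(idx)]
print(paired)          # [ 1  6 11 32]

# A single 2D integer array: every entry selects a whole row of t.
gathered = t[idx]
print(gathered.shape)  # (2, 4, 4)
```

With a third row in idx, the tuple form would need a third dimension in t, which is the situation behind the "too many indices" error.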
I was going through the documentation of one of the NumPy functions and came across this: "If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b". I'm a beginner with NumPy and I thought there were only 2 axes, 0 (rows) and 1 (columns). Could someone please explain what this means? If I have an N-D array such as n=np.arange(16).reshape(4,4), which is the second-to-last axis?
When you first think of it as a simple data structure, you can think of a 2-dimensional array as rows and columns. But rows and columns are just the names of axis 0 and axis 1 in the 2-dimensional case; an N-dimensional array has N axes, one per entry of its shape.
In other words, count the axes from the shape: the last axis is axis -1 and the second-to-last axis is axis -2.
np.arange(16).reshape(4,4)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Here we get a 2-dimensional array of shape (4, 4), i.e. 4 rows and 4 columns, holding 16 elements.
Below, we obtain another 2-dimensional array, this time of shape (8, 2), also containing 16 elements.
np.arange(16).reshape(8,2)
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15]])
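As for your actual question:
a=np.arange(16).reshape(4,4)
print(a[:,-2])
[ 2  6 10 14]
The above expression returns the second-to-last column, i.e. index -2 along the last axis (axis 1), not the second-to-last axis. For this 2-dimensional array the second-to-last axis is axis 0 (the rows).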
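To make the quoted sentence from the documentation concrete, here is a small sketch with np.dot on a 2-D and a 3-D array. The shapes are my own choice for illustration; the einsum line only spells out which axes are summed.

```python
import numpy as np

n = np.arange(16).reshape(4, 4)
print(n.ndim, n.shape)   # 2 (4, 4): axis -1 is axis 1, axis -2 is axis 0

a = np.arange(6).reshape(2, 3)        # last axis has length 3
b = np.arange(24).reshape(2, 3, 4)    # second-to-last axis (axis 1) has length 3

c = np.dot(a, b)
print(c.shape)   # (2, 2, 4)

# Same result spelled out: sum over the last axis of a and axis -2 of b.
manual = np.einsum('ik,jkm->ijm', a, b)
print(np.array_equal(c, manual))   # True
```

The summed axes must have the same length (3 here), and they disappear from the result; all remaining axes of a and b are kept.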
z = np.arange(15).reshape(3,5)
indexx = [0,2]
indexy = [1,2,3,4]
zz = []
for i in indexx:
    for j in indexy:
        zz.append(z[i][j])
Output:
zz >> [1, 2, 3, 4, 11, 12, 13, 14]
This essentially flattens the array while keeping only the elements whose indices are present in the two index lists.
This works, but it is very slow for larger arrays or lists of indices. Is there a way to speed this up using numpy?
Thanks.
Edited to show desired output.
A list of integers can be used to access the entries of interest for numpy arrays.
z[indexx][:,indexy].flatten()
x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}
z = x.intersection(y)
print(z)
z => {'apple'}
If I understand you correctly, just use a Python set, and then cast the result to a list.
Indexing in several dimensions at once requires broadcasting the indices against each other. np.ix_ is a handy tool for doing this:
In [127]: z
Out[127]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [128]: z[np.ix_(indexx, indexy)]
Out[128]:
array([[ 1, 2, 3, 4],
[11, 12, 13, 14]])
Converting that to 1d is a trivial ravel() task.
You can construct such arrays 'from scratch' as well. Look at what ix_ produces; here it's a (2,1) and a (1,4) array:
In [129]: np.ix_(indexx, indexy)
Out[129]:
(array([[0],
[2]]),
array([[1, 2, 3, 4]]))
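A minimal sketch of building the same pair of index arrays by hand, assuming the z, indexx and indexy from the question. The names rows, cols and picked are mine.

```python
import numpy as np

z = np.arange(15).reshape(3, 5)
indexx = [0, 2]
indexy = [1, 2, 3, 4]

rows = np.array(indexx)[:, None]   # shape (2, 1)
cols = np.array(indexy)[None, :]   # shape (1, 4)

# The two index arrays broadcast against each other to shape (2, 4).
picked = z[rows, cols]
print(picked.ravel())   # [ 1  2  3  4 11 12 13 14]
```

This gives the same elements as z[np.ix_(indexx, indexy)]; ix_ just saves you from adding the extra axes yourself.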
I want to compute the element-wise tensor product of 2 tensors of shape (1144,3), meaning (if I understood it correctly) that I want to compute the tensordot along the second axis.
I'd expect my result to be of the shape (1144,3,3).
I am currently trying to achieve this using NumPy's tensordot() function, but I can't figure out the correct axes to use to get a shape of (1144,3,3).
You can use numpy.einsum for this.
In [30]: a
Out[30]:
array([[0, 1, 2],
[3, 4, 5]])
In [31]: np.einsum('ij,ik->ijk', a, a)
Out[31]:
array([[[ 0, 0, 0],
[ 0, 1, 2],
[ 0, 2, 4]],
[[ 9, 12, 15],
[12, 16, 20],
[15, 20, 25]]])
numpy.tensordot only takes the axes to sum over, and the shared first axis here must be kept rather than summed, so there is no way to imitate
the ->...-like output specification of einsum. So I don't see how this can be done with numpy.tensordot.
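If you would rather not use einsum, the same per-row outer product can also be written with plain broadcasting. This is a sketch with made-up input data of the shape from the question; the array names are my own.

```python
import numpy as np

a = np.arange(1144 * 3).reshape(1144, 3)
b = a + 1                                  # a second (1144, 3) array

# (1144, 3, 1) * (1144, 1, 3) broadcasts to (1144, 3, 3):
# outer[i, j, k] == a[i, j] * b[i, k]
outer = a[:, :, None] * b[:, None, :]
print(outer.shape)   # (1144, 3, 3)

print(np.array_equal(outer, np.einsum('ij,ik->ijk', a, b)))   # True
```

Both forms keep the first axis as a batch axis and form one 3x3 outer product per row.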
I have a 3D matrix X which contains vectors as rows into the 3rd dimension. I would like to extract each such vector X(:, x, y) and save it as a 2D matrix such that X(:, 0, 0) is the first row of the 2D matrix, X(:, 0, 1) the second, and so on. The following crude graphic might help illustrate this:
I know that I can create my new 2D matrix and then iterate over the original X to add the vectors, but does somebody have some input on how to do this quickly and efficiently?
Example: Given
>>> a = np.arange(9*3).reshape(3,3,3)
>>> a
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I would like to get the following as rows, though the order of the rows does not matter:
array([[ 0,  9, 18],
       [ 1, 10, 19],
       ...])
Use np.transpose and then reshape like so -
X.transpose(1,2,0).reshape(-1,X.shape[0])
Explanation -
1) You want to get rows formed from X[:, 0, 0], X[:, 0, 1], etc., i.e., we have to "push" the axis=0 elements to the last axis of the 2D output. Next, we have to decide the order of the rows, which are formed from axes 1 and 2 of X. Going back to the desired 2D output, between the first and second rows, i.e. between X[:, 0, 0] and X[:, 0, 1], the axis=1 index stays the same and only the axis=2 index changes. So, in the row order of the 2D output, the second axis (axis=1) has precedence over the third axis (axis=2). So, in X we push axis=1 to axis=0 and axis=2 to axis=1. Since, as stated earlier, axis=0 of X has to be moved to the last axis, it becomes axis=2. All of this can be done with X.transpose(1,2,0). Let's call it Y.
2) Finally, we have to reshape Y to a 2D array such that the number of elements in each row is same as X.shape[0], which is achieved through Y.reshape(-1,X.shape[0]). Thus, the final solution becomes -
X.transpose(1,2,0).reshape(-1,X.shape[0])
Sample run -
In [25]: X
Out[25]:
array([[[ 0.19508052, 0.02481975],
[ 0.88915956, 0.95974095]],
[[ 0.23271151, 0.14730822],
[ 0.56763563, 0.30607283]],
[[ 0.33259228, 0.42552102],
[ 0.28950926, 0.47782175]]])
In [26]: X[:, 0, 0]
Out[26]: array([ 0.19508052, 0.23271151, 0.33259228])
In [27]: X[:, 0, 1]
Out[27]: array([ 0.02481975, 0.14730822, 0.42552102])
In [28]: X[:, 1, 0]
Out[28]: array([ 0.88915956, 0.56763563, 0.28950926])
In [29]: X[:, 1, 1]
Out[29]: array([ 0.95974095, 0.30607283, 0.47782175])
In [30]: X.transpose(1,2,0).reshape(-1,X.shape[0])
Out[30]:
array([[ 0.19508052, 0.23271151, 0.33259228],
[ 0.02481975, 0.14730822, 0.42552102],
[ 0.88915956, 0.56763563, 0.28950926],
[ 0.95974095, 0.30607283, 0.47782175]])
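A quick way to convince yourself of the row order is to check every row against the corresponding X[:, j, k]. This is only a sketch on a small integer array of my own choosing; the np.moveaxis line is an alternative spelling of the same permutation, not part of the solution above.

```python
import numpy as np

X = np.arange(3 * 2 * 2).reshape(3, 2, 2)
rows = X.transpose(1, 2, 0).reshape(-1, X.shape[0])

# Row number j * X.shape[2] + k should hold the vector X[:, j, k].
for j, k in np.ndindex(X.shape[1], X.shape[2]):
    assert np.array_equal(rows[j * X.shape[2] + k], X[:, j, k])

# np.moveaxis expresses the same permutation by naming only the axis that moves.
same = np.moveaxis(X, 0, -1).reshape(-1, X.shape[0])
print(np.array_equal(rows, same))   # True
```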