Index Numpy tensor without having to reshape - python

I have a tensor with the shape (5,48,15). How can I access an element along the 0th axis and still maintain 3 dimensions without needing to reshape? For example:
x.shape # this is (5,48,15)
m = x[0,:,:]
m.shape # This is (48,15)
m_new = m.reshape(1,48,15)
m_new.shape # This is now (1,48,15)
Is this possible without needing to reshape?

When you index an axis with a single integer, as with x[0, :, :], the dimensionality of the returned array drops by one.
To keep three dimensions, you can either...
insert a new axis at the same time as indexing:
>>> x[None, 0, :, :].shape
(1, 48, 15)
or use slicing:
>>> x[:1, :, :].shape
(1, 48, 15)
or use fancy indexing:
>>> x[[0], :, :].shape
(1, 48, 15)
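One practical difference between these three forms, not spelled out above, is whether the result shares memory with x. A minimal sketch, assuming an example array built with np.arange: basic indexing (None and slices) returns a view, while a list index triggers fancy indexing and returns a copy.

```python
import numpy as np

x = np.arange(5 * 48 * 15).reshape(5, 48, 15)  # example data, not from the question

a = x[None, 0, :, :]   # new axis plus integer index
b = x[:1, :, :]        # slice of length one
c = x[[0], :, :]       # list index (fancy indexing)

shapes = [a.shape, b.shape, c.shape]   # all three are (1, 48, 15)

# Views share memory with x; the fancy-indexed result does not.
a_is_view = np.shares_memory(a, x)
b_is_view = np.shares_memory(b, x)
c_is_view = np.shares_memory(c, x)
```

This matters if you plan to write into the result: writing into a or b modifies x, writing into c does not.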

The selection index needs to be a slice or list (or array):
m = x[[0],:,:]
m = x[:1,:,:]
m = x[0:1,:,:]

Related

How to reshape a (x, y) numpy array into a (x, y, 1) array?

How do you reshape a (55, 11) numpy array to a (55, 11, 1) numpy array?
Attempts:
Simply doing numpy_array.reshape(-1, 1) without any loop produces a flat array that is not 3D.
The following for loop produces a "cannot broadcast" error:
for i in range(len(numpy_array)):
    numpy_array[i] = numpy_array[i].reshape(-1, 1)
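To see why both attempts fail, it can help to inspect the shapes involved. A small sketch, assuming an example array of the stated shape: reshape(-1, 1) gives a 2-D column of shape (605, 1), and the loop fails because each row slot numpy_array[i] has the fixed shape (11,), which cannot hold an (11, 1) array.

```python
import numpy as np

numpy_array = np.random.rand(55, 11)  # example data with the shape from the question

# -1 lets NumPy infer one dimension: 55 * 11 = 605 rows, 1 column.
column_shape = numpy_array.reshape(-1, 1).shape

# Assigning an (11, 1) array into a row of shape (11,) cannot broadcast.
try:
    numpy_array[0] = numpy_array[0].reshape(-1, 1)
    loop_error = None
except ValueError as exc:
    loop_error = exc
```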
Maybe you are looking for numpy.expand_dims (https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html)?
import numpy
a = numpy.random.rand(55,11)
print(a.shape) # 55,11
print(numpy.expand_dims(a, 2).shape) # 55, 11, 1
Add a newaxis to the array
import numpy as np
my_array = np.arange(55*11).reshape(55,11)
my_array.shape
# (55, 11)
# add new axis
new_array = my_array[...,None]
new_array.shape
# (55, 11, 1)
Can specify new shape in reshape too:
new_array = my_array.reshape(*my_array.shape, 1)
new_array.shape
# (55, 11, 1)
One of the answers recommends using expand_dims. That's a good answer, but if you look at its code, and strip off some generalities, all it is doing is:
In [409]: a = np.ones((2,3)); axis=(2,)
...: out_ndim = 2+1
...: shape_it = iter(a.shape)
...: shape = [1 if ax in axis else next(shape_it) for ax in range(out_ndim)]
In [410]: shape
Out[410]: [2, 3, 1]
followed by a return a.reshape(shape).
In other words, the function call is just hiding the obvious, expand a (x,y) to (x,y,1) with
a.reshape(x,y,1)
Are you seeking some 3d 'magic' akin to the -1 in numpy_array.reshape(-1, 1)?
Personally I like to use None to add dimensions, so prefer the other answer [...,None]. But functionally it's all the same.
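As a quick check of the claim that these are functionally the same, the following sketch (with an arbitrary example array) compares the variants mentioned in the answers. The name variants is only for illustration.

```python
import numpy as np

a = np.arange(55 * 11).reshape(55, 11)

variants = [
    np.expand_dims(a, 2),       # explicit axis position
    np.expand_dims(a, -1),      # same axis, counted from the end
    a[..., None],               # None is an alias for np.newaxis
    a[:, :, np.newaxis],
    a.reshape(*a.shape, 1),
    a.reshape(55, 11, 1),
]

all_same_shape = all(v.shape == (55, 11, 1) for v in variants)
all_equal = all(np.array_equal(v, variants[0]) for v in variants)
# For this contiguous array, every variant is a view onto a, not a copy.
all_views = all(np.shares_memory(v, a) for v in variants)
```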

Complex indexing of a multidimensional array with indices of lower dimensional arrays in Python

Problem:
I have a numpy array of 4 dimensions:
x = np.arange(1000).reshape(5, 10, 10, 2 )
If we print it:
I want to find the indices of the 6 largest values of the array in the 2nd axis but only for the 0th element in the last axis (red circles in the image):
indLargest2ndAxis = np.argpartition(x[...,0], 10-6, axis=2)[...,10-6:]
These indices have a shape of (5,10,6) as expected.
I want to obtain the values of the array for these indices in the 2nd axis but now for the 1st element in the last axis (yellow circles in the image). They should have a shape of (5,10,6). Without vectorizing, this could be done with:
np.array([ [ [ x[i, j, k, 1] for k in indLargest2ndAxis[i,j]] for j in range(10) ] for i in range(5) ])
However, I would like to achieve it vectorizing. I tried indexing with:
x[indLargest2ndAxis, 1]
But I get IndexError: index 5 is out of bounds for axis 0 with size 5. How can I manage this indexing combination in a vectorized way?
Ah, I think I now get what you are after. Fancy indexing is documented here in detail. Be warned though that - in its full generality - this is quite heavy stuff. In a nutshell, fancy indexing allows you to take elements from a source array (according to some idx) and place them into a new array (fancy indexing always returns a copy):
source = np.array([10.5, 21, 42])
idx = np.array([0, 1, 2, 1, 1, 1, 2, 1, 0])
# this is fancy indexing
target = source[idx]
expected = np.array([10.5, 21, 42, 21, 21, 21, 42, 21, 10.5])
assert np.allclose(target, expected)
What is nice about this is that you can control the shape of the resulting array using the shape of the index array:
source = np.array([10.5, 21, 42])
idx = np.array([[0, 1], [1, 2]])
target = source[idx]
expected = np.array([[10.5, 21], [21, 42]])
assert np.allclose(target, expected)
assert target.shape == (2,2)
Where things get a little more interesting is if source has more than one dimension. In this case, you need to specify the indices of each axis so that numpy knows which elements to take:
source = np.arange(4).reshape(2,2)
idxA = np.array([0, 1])
idxB = np.array([0, 1])
# this will take (0,0) and (1,1)
target = source[idxA, idxB]
expected = np.array([0, 3])
assert np.allclose(target, expected)
Observe that, again, the shape of target matches the shape of the index used. What is awesome about fancy indexing is that index shapes are broadcasted if necessary:
source = np.arange(4).reshape(2,2)
idxA = np.array([0, 0, 1, 1]).reshape((4,1))
idxB = np.array([0, 1]).reshape((1,2))
target = source[idxA, idxB]
expected = np.array([[0, 1],[0, 1],[2, 3],[2, 3]])
assert np.allclose(target, expected)
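If building the reshaped index arrays by hand feels error-prone, np.ix_ constructs the same kind of broadcastable index arrays from 1-D sequences. A short sketch reproducing the broadcasting example above:

```python
import numpy as np

source = np.arange(4).reshape(2, 2)

# np.ix_ returns index arrays of shape (4, 1) and (1, 2) that broadcast to (4, 2).
rows, cols = np.ix_([0, 0, 1, 1], [0, 1])
target = source[rows, cols]

expected = np.array([[0, 1], [0, 1], [2, 3], [2, 3]])
```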
At this point, you can understand where your exception comes from. Your source.ndim is 4; however, you try to index it with a 2-tuple (indLargest2ndAxis, 1). Numpy will interpret this as you trying to index the first axis using indLargest2ndAxis, the second axis using 1, and all other axes using :. Clearly, this doesn't work. All values of indLargest2ndAxis would have to be between 0 and 4 (inclusive), since they would have to refer to positions along the first axis of x.
What my suggestion of x[..., indLargest2ndAxis, 1] does is tell numpy that you wish to index the last two axes of x, i.e., you wish to index the third axis using indLargest2ndAxis, the fourth axis using 1, and : for anything else.
This will produce a result since all elements of indLargest2ndAxis are in [0, 10), but will produce a shape of (5, 10, 5, 10, 6) (which is not what you want). Being a bit hand-wavy, the first part of the shape (5, 10) comes from the ellipsis (...), aka. select everything, the middle part (5, 10, 6) comes from indLargest2ndAxis selecting elements along the third axis of x according to the shape of indLargest2ndAxis and the final part (which you don't see because it is squeezed) comes from selecting index 1 along the fourth axis.
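The two behaviours described above can be reproduced directly. A minimal sketch using the array and indices from the question:

```python
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
indLargest2ndAxis = np.argpartition(x[..., 0], 10 - 6, axis=2)[..., 10 - 6:]

# Indexing axis 0 with values up to 9 fails, because axis 0 has size 5.
try:
    x[indLargest2ndAxis, 1]
    first_axis_error = None
except IndexError as exc:
    first_axis_error = exc

# Indexing the last two axes works, but the ellipsis keeps axes 0 and 1
# and the index array contributes its own (5, 10, 6) shape on top.
wrong_shape = x[..., indLargest2ndAxis, 1].shape
```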
Moving on to your actual problem, you can entirely dodge the fancy indexing bullet and do the following:
x = np.arange(1000).reshape(5, 10, 10, 2)
order = x[..., 0]
values = x[..., 1]
idx = np.argpartition(order, 4)[..., 4:]
result = np.take_along_axis(values, idx, axis=-1)
Edit: Of course, you can also use fancy indexing; however, it is more cryptic and doesn't scale as nicely to different shapes:
x = np.arange(1000).reshape(5, 10, 10, 2)
indLargest2ndAxis = np.argpartition(x[..., 0], 4)[..., 4:]
result = x[np.arange(5)[:, None, None], np.arange(10)[None, :, None], indLargest2ndAxis, 1]
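To confirm that both vectorized versions match the nested list comprehension from the question, one can compare them on the same index array. A sketch:

```python
import numpy as np

x = np.arange(1000).reshape(5, 10, 10, 2)
idx = np.argpartition(x[..., 0], 4)[..., 4:]   # shape (5, 10, 6)

# Reference: explicit loops, as in the question.
reference = np.array([[[x[i, j, k, 1] for k in idx[i, j]]
                       for j in range(10)] for i in range(5)])

# take_along_axis on the "values" slice.
via_take = np.take_along_axis(x[..., 1], idx, axis=-1)

# Fancy indexing with broadcast index arrays for the first two axes.
i = np.arange(5)[:, None, None]
j = np.arange(10)[None, :, None]
via_fancy = x[i, j, idx, 1]
```

Note that argpartition does not guarantee any particular order within the selected six indices, so the comparison only makes sense when all variants use the same idx.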

NumPy: Concatenating 1D array to 3D array

Suppose I have a 5x10x3 array, which I interpret as 5 'sub-arrays', each consisting of 10 rows and 3 columns. I also have a separate 1D array of length 5, which I call b.
I am trying to insert a new column into each sub-array, where the column inserted into the ith (i=0,1,2,3,4) sub-array is a 10x1 vector where each element is equal to b[i].
For example:
import numpy as np
np.random.seed(777)
A = np.random.rand(5,10,3)
b = np.array([2,4,6,8,10])
A[0] should look like:
A[1] should look like:
And similarly for the other 'sub-arrays'.
(Notice b[0]=2 and b[1]=4)
What about this?
# Make an array B with the same number of dimensions as A
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0) # shape: (5, 10, 1)
# Concatenate both
np.concatenate([A, B], axis=-1) # shape: (5, 10, 4)
One method would be np.pad:
np.pad(A, ((0,0),(0,0),(0,1)), 'constant', constant_values=[[[],[]],[[],[]],[[],b[:, None,None]]])
# array([[[9.36513084e-01, 5.33199169e-01, 1.66763960e-02, 2.00000000e+00],
# [9.79060284e-02, 2.17614285e-02, 4.72452812e-01, 2.00000000e+00],
# etc.
Or (more typing but probably faster):
i,j,k = A.shape
res = np.empty((i,j,k+1), np.result_type(A, b))
res[...,:-1] = A
res[...,-1] = b[:, None]
Or dstack after broadcast_to:
np.dstack([A, np.broadcast_to(b[:, None], A.shape[:2])])
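The tile, preallocate and dstack answers above should all yield the same (5, 10, 4) array. A sketch checking that, plus one further variant that is an addition here rather than something from the answers: broadcasting b to shape (5, 10, 1) and concatenating.

```python
import numpy as np

np.random.seed(777)
A = np.random.rand(5, 10, 3)
b = np.array([2, 4, 6, 8, 10])

# tile + transpose
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0)          # (5, 10, 1)
via_tile = np.concatenate([A, B], axis=-1)

# preallocate and fill
res = np.empty(A.shape[:2] + (A.shape[2] + 1,), np.result_type(A, b))
res[..., :-1] = A
res[..., -1] = b[:, None]

# dstack after broadcast_to
via_dstack = np.dstack([A, np.broadcast_to(b[:, None], A.shape[:2])])

# broadcast_to + concatenate (not in the answers above)
col = np.broadcast_to(b[:, None, None], A.shape[:2] + (1,))
via_broadcast = np.concatenate([A, col], axis=-1)
```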

How to multiply a tensor row-wise by a vector in PyTorch?

When I have a tensor m of shape [12, 10] and a vector s of scalars with shape [12], how can I multiply each row of m with the corresponding scalar in s?
You need to add a corresponding singleton dimension:
m * s[:, None]
s[:, None] has size (12, 1). When multiplying a (12, 10) tensor by a (12, 1) tensor, PyTorch knows to broadcast s along the second (singleton) dimension and performs the "element-wise" product correctly.
You can broadcast a vector to a higher dimensional tensor like so:
def row_mult(input, vector):
    # reshape vector to (N, 1, ..., 1) so it broadcasts over the trailing dims of input
    extra_dims = (1,) * (input.dim() - 1)
    return input * vector.view(-1, *extra_dims)
A technique that is slightly hard to understand at first, but very powerful, is Einstein summation:
torch.einsum('i,ij->ij', s, m)
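The approaches so far can be compared on random data. A small sketch, assuming float tensors with the shapes from the question; row_mult below is a self-contained variant of the broadcasting helper, and unsqueeze(1) is shown as an equivalent spelling of s[:, None].

```python
import torch

m = torch.rand(12, 10)
s = torch.rand(12)

def row_mult(tensor, vector):
    # Reshape vector to (N, 1, ..., 1) so it broadcasts over all trailing axes.
    extra_dims = (1,) * (tensor.dim() - 1)
    return tensor * vector.view(-1, *extra_dims)

by_indexing = m * s[:, None]
by_unsqueeze = m * s.unsqueeze(1)
by_helper = row_mult(m, s)
by_einsum = torch.einsum('i,ij->ij', s, m)
```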
Shai's answer works if you know the number of dimensions in advance and can hardcode the correct number of None's. This can be extended to extra dimensions if required:
mask = (torch.rand(12) > 0.5).int()
data = (torch.rand(12, 2, 3, 4))
result = data * mask[:,None,None,None]
result.shape # torch.Size([12, 2, 3, 4])
mask[:,None,None,None].shape # torch.Size([12, 1, 1, 1])
If you are dealing with data of variable or unknown dimensions, then it may require manually extending mask to the correct shape:
mask = (torch.rand(12) > 0.5).int()
while mask.dim() < data.dim(): mask.unsqueeze_(1)
result = data * mask
result.shape # torch.Size([12, 2, 3, 4])
mask.shape # torch.Size([12, 1, 1, 1])
This is a bit of an ugly solution, but it does work. There is probably a much more elegant way to correctly reshape the mask tensor inline for a variable number of dimensions.
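One possible way to do this inline, offered as a sketch rather than an established idiom, is to compute the target shape from the two tensors. The helper name expand_trailing is hypothetical.

```python
import torch

def expand_trailing(mask, data):
    # Hypothetical helper: append singleton axes to mask until it has as
    # many dimensions as data, without modifying mask in place.
    missing = data.dim() - mask.dim()
    return mask.reshape(tuple(mask.shape) + (1,) * missing)

data = torch.rand(12, 2, 3, 4)
mask = (torch.rand(12) > 0.5).int()

expanded = expand_trailing(mask, data)
result = data * expanded
```

Unlike the unsqueeze_ loop, this leaves the original mask with shape (12,), which can matter if the mask is reused elsewhere.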

How to multiply a numpy array by a list to get a multidimensional array?

In Python, I have a list and a numpy array.
I would like to multiply the array by the list in such a way that I get an array where the 3rd dimension represents the input array multiplied by each element of the list. Therefore:
in_list = [2,4,6]
in_array = np.random.rand(5,5)
result = ...
np.shape(result) ---> (3,5,5)
where (0,:,:) is the input array multiplied by the first element of the list (2);
(1,:,:) is the input array multiplied by the second element of the list (4), etc.
I have a feeling this question will be answered by broadcasting, but I'm not sure how to go about doing this.
You want np.multiply.outer. The outer method is defined for any NumPy "ufunc", including multiplication. Here's a demonstration:
In [1]: import numpy as np
In [2]: in_list = [2, 4, 6]
In [3]: in_array = np.random.rand(5, 5)
In [4]: result = np.multiply.outer(in_list, in_array)
In [5]: result.shape
Out[5]: (3, 5, 5)
In [6]: (result[1, :, :] == in_list[1] * in_array).all()
Out[6]: True
As you suggest, broadcasting gives an alternative solution: if you convert in_list to a 1d NumPy array of length 3, you can then reshape to an array of shape (3, 1, 1), and then a multiplication with in_array will broadcast appropriately:
In [9]: result2 = np.array(in_list)[:, None, None] * in_array
In [10]: result2.shape
Out[10]: (3, 5, 5)
In [11]: (result2[1, :, :] == in_list[1] * in_array).all()
Out[11]: True
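The broadcasting version hardcodes two None's because in_array is 2-D. A sketch of how it could be written for an input array of any dimensionality, assuming a hypothetical helper scale_stack, and checked against np.multiply.outer:

```python
import numpy as np

def scale_stack(factors, array):
    # Hypothetical helper: result[i] == factors[i] * array for arrays of any ndim.
    factors = np.asarray(factors)
    array = np.asarray(array)
    # Append one singleton axis per dimension of array, then broadcast.
    return factors.reshape(factors.shape + (1,) * array.ndim) * array

in_list = [2, 4, 6]
in_array = np.random.rand(5, 5)

result = scale_stack(in_list, in_array)
outer = np.multiply.outer(in_list, in_array)
```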
