There is this great Question/Answer about slicing the last dimension:
Numpy slice of arbitrary dimensions: for slicing a numpy array to obtain the i-th index in the last dimension, one can use ... or Ellipsis,
slice = myarray[...,i]
What if the first N dimensions are needed?
For 3D myarray, N=2:
slice = myarray[:,:,0]
For 4D myarray, N=2:
slice = myarray[:,:,0,0]
Can this be generalized to an arbitrary number of dimensions?
I don't think there's any built-in syntactic sugar for that, but slices are just objects like anything else. The slice(None) object is what is created from :, and otherwise just picking the index 0 works fine.
myarray[(slice(None),)*N+(0,)*(myarray.ndim-N)]
Note the comma in (slice(None),). Parentheses alone don't create a tuple in Python unless they are empty; the comma does. Without it, (slice(None)) would just evaluate to whatever is inside, i.e. the slice object itself.
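As a minimal sketch, the expression above can be wrapped in a helper. The name first_n_dims and the sample array are made up for illustration and are not part of the original answer:

```python
import numpy as np

def first_n_dims(arr, n):
    """Keep the first n axes whole and take index 0 on every remaining axis."""
    # (slice(None),) * n is the same as writing ':' n times.
    index = (slice(None),) * n + (0,) * (arr.ndim - n)
    return arr[index]

myarray = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
result = first_n_dims(myarray, 2)

print(result.shape)                                  # (2, 3)
print(np.array_equal(result, myarray[:, :, 0, 0]))   # True
print(np.shares_memory(result, myarray))             # True: the result is a view
```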
Slices are nice because they give you a view into the object instead of a copy of the object. You can use the same idea to, e.g., step along the N-th dimension while taking everything in all the other dimensions. There have been some Stack Overflow questions about that, and they've almost unanimously resorted to rolling the indices and other things that I think are hard to reason about in high-dimensional spaces. Slice tuples are your friend.
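One way to read "the same idea" is to build the index tuple with a single integer placed on the axis of interest. The sketch below assumes that reading; the helper name take_along and the sample array are hypothetical:

```python
import numpy as np

def take_along(arr, axis, i):
    """Return arr[:, ..., i, ..., :] with the integer i placed on the given axis."""
    index = [slice(None)] * arr.ndim
    index[axis] = i
    return arr[tuple(index)]  # index with a tuple, not a list

a = np.arange(24).reshape(2, 3, 4)

# Step along axis 1; each piece is a (2, 4) view of a.
pieces = [take_along(a, 1, i) for i in range(a.shape[1])]

print(pieces[0].shape)                          # (2, 4)
print(np.array_equal(pieces[2], a[:, 2, :]))    # True
print(np.shares_memory(pieces[0], a))           # True
```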
From the comments, @PaulPanzer points out another technique that I rather like.
myarray.T[(myarray.ndim-N)*(0,)].T
First, transposes in numpy are view-operations instead of copy-operations. This isn't inefficient in the slightest. Here's how it works:
Start with myarray with dimensions (0,...,k)
The transpose myarray.T reorders those to (k,...,0)
The whole goal is to fix the last myarray.ndim-N dimensions of the original array, so we index with [(myarray.ndim-N)*(0,)], which takes index 0 along each of the first myarray.ndim-N dimensions of the transposed array.
The remaining dimensions are now in the wrong order: (N-1,...,0). Use another transpose with .T to get the ordering (0,...,N-1) instead.
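The steps above can be traced on a small array. This is only a sketch with an arbitrary 4-D test array and N=2; the intermediate names step1 to step3 are added for illustration:

```python
import numpy as np

myarray = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
N = 2

step1 = myarray.T                           # axes reversed
step2 = step1[(myarray.ndim - N) * (0,)]    # index 0 on the first ndim-N axes
step3 = step2.T                             # back to the original axis order

print(step1.shape)  # (5, 4, 3, 2)
print(step2.shape)  # (3, 2)
print(step3.shape)  # (2, 3)
print(np.array_equal(step3, myarray[:, :, 0, 0]))  # True
print(np.shares_memory(step3, myarray))            # True: no copy was made
```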
I'm a little bit new to Python & NumPy, but I've noticed that when you call operator [] on a NumPy array A with a single index (e.g., A[1]), the resulting sub-array has one dimension fewer. If you use a range of indices instead (e.g., A[1:]), the number of dimensions of the sub-array is unchanged, even if the range covers only a single index. For example, if A is 2x2, A[1:] effectively selects just one index, but the result does not have the same shape as A[1].
My question is: is this always true in that if you supply a range of indices when extracting a subarray, the dimension doesn't change, and that a single index always reduces the dimension by 1? Are there edge cases?
That is always the case. When you use one index value, e.g. A[1], you are effectively saying "give me the subarray A[1]", which, by definition, has one dimension fewer than A.
When you request a range of indices, e.g., A[1:], you are "cropping" A to get everything but the first slice (A[0]). The range of indices keeps the axis that you "lost" in the previous case (A[1]).
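A quick check on a 2x2 array, as in the question. This sketch only covers plain integers and slices; the empty-slice case is added here as an extra illustration:

```python
import numpy as np

A = np.arange(4).reshape(2, 2)

print(A[1].shape)     # (2,)   integer index: one axis removed
print(A[1:].shape)    # (1, 2) slice: axis kept, even with length 1
print(A[1:1].shape)   # (0, 2) an empty slice still keeps the axis
```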
The following docs should be helpful to understand numpy arrays (indexing):
Arrays: https://numpy.org/doc/stable/user/absolute_beginners.html#more-information-about-arrays
Indexing: https://numpy.org/doc/stable/user/basics.indexing.html
I have a 5 dimension array like this
a=np.random.randint(10,size=[2,3,4,5,600])
a.shape #(2,3,4,5,600)
I want to get the first element of the 2nd dimension, and several elements of the last dimension
b=a[:,0,:,:,[1,3,5,30,17,24,30,100,120]]
b.shape #(9,2,4,5)
As you can see, the last dimension was automatically moved to the first position.
Why, and how can I avoid that?
This behavior is described in the numpy documentation. In the expression
a[:,0,:,:,[1,3,5,30,17,24,30,100,120]]
both 0 and [1,3,5,30,17,24,30,100,120] act as advanced indexes (an integer is treated as one when it is combined with an index list), and they are separated by slices. As the documentation explains, in such a case the dimensions coming from the advanced indexes come first in the resulting array.
If we replace 0 by the slice 0:1 it will change this situation (since it will leave only one advanced index), and then the order of dimensions will be preserved. Thus one way to fix this issue is to use the 0:1 slice and then squeeze the appropriate axis:
a[:,0:1,:,:,[1,3,5,30,17,24,30,100,120]].squeeze(axis=1)
Alternatively, one can keep both advanced indexes, and then rearrange axes:
np.moveaxis(a[:,0,:,:,[1,3,5,30,17,24,30,100,120]], 0, -1)
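The three expressions can be compared side by side. This sketch reuses the array and index list from the question; the names idx, c and d are added for illustration:

```python
import numpy as np

a = np.random.randint(10, size=[2, 3, 4, 5, 600])
idx = [1, 3, 5, 30, 17, 24, 30, 100, 120]

b = a[:, 0, :, :, idx]                         # indexed axis ends up first
c = a[:, 0:1, :, :, idx].squeeze(axis=1)       # slice, then drop the length-1 axis
d = np.moveaxis(a[:, 0, :, :, idx], 0, -1)     # move the first axis back to the end

print(b.shape)  # (9, 2, 4, 5)
print(c.shape)  # (2, 4, 5, 9)
print(d.shape)  # (2, 4, 5, 9)
print(np.array_equal(c, d))  # True
```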
I am working with multiple multidimensional arrays. Let us consider a dummy example for simplicity:
array_list=[np.ones(3), np.ones((3,3,3)), np.ones((3,3)), np.ones(3)]
I need to index the outermost (last) dimension of each array in the list. For example, my goal is to set some of the elements to zero according to a specified range in that dimension:
array_list[0][0:2]=0
array_list[1][:,:,0:2]=0
array_list[2][:,0:2]=0
array_list[3][0:2]=0
In my real application I don't know how many arrays I have and how many dimensions are in there.
I would like to do the task in a for loop:
for array in array_list:
    array[???]=0
But I am struggling with how to implement this when I don't know the dimensionality of each array.
Use an Ellipsis (...) to stand for all dimensions except the last (for a 1-D array it stands for no dimensions at all).
for array in array_list:
    array[..., 0:2] = 0
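A runnable version of the loop on the dummy list from the question; the print calls are added here only to show the effect:

```python
import numpy as np

array_list = [np.ones(3), np.ones((3, 3, 3)), np.ones((3, 3)), np.ones(3)]

for array in array_list:
    # '...' stands for however many leading axes the array has.
    array[..., 0:2] = 0

print(array_list[0])                  # [0. 0. 1.]
print(array_list[2][0])               # [0. 0. 1.]
print(array_list[1][..., 2].sum())    # 9.0: index 2 on the last axis is untouched
print(array_list[1][..., 0:2].sum())  # 0.0
```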
I'm messing around with 2-dimensional slicing and don't understand why leaving out some defaults grabs the same values from the original array but produces different output. What's going on with the double brackets and shape changing?
import numpy as np
x = np.arange(9).reshape(3,3)
y = x[2]
z = x[2:,:]
print(y)
print(z)
print(y.shape)
print(z.shape)
[6 7 8]
[[6 7 8]]
(3,)
(1, 3)
x is a two dimensional array, an instance of NumPy's ndarray object. You can index/slice these objects in essentially two ways: basic and advanced.
x[2] fetches the row at index 2 of the array, returning the array [6 7 8]. You're doing basic slicing because you've specified only an integer. You can also specify a tuple of slice objects and integers for basic slicing, e.g. x[:,2] to select the right-hand column.
With an integer index, you also reduce the number of dimensions of the returned object (in this case from two to just one):
An integer, i, returns the same values as i:i+1 except the dimensionality of the returned object is reduced by 1.
So when you ask for the shape of y, this is why you only get back one dimension (from your two-dimensional x).
Advanced indexing occurs only when the index is an ndarray (of integer or boolean type), a non-tuple sequence such as a list, or a tuple containing at least one such object. That is not the case with x[2:,:]: both 2: and : are slice objects, so this is still basic slicing.
A slice never removes an axis, even when it selects a single element along it. Here 2: selects the one-row range starting at index 2, so z keeps both dimensions and has shape (1, 3). That is where the double brackets come from.
For genuine advanced indexing the rule is different. There the documentation says: The shape of the output (or the needed shape of the object to be used for setting) is the broadcasted shape.
In a nutshell, an integer index drops the axis it is applied to, while a slice keeps it, and neither of them is advanced indexing.
One brief point worth mentioning: basic slicing returns a view onto the original array (changes made to y or z will be reflected in x). Advanced indexing returns a brand new copy of the array.
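A short check, assuming Python 3 and a current NumPy, of how the two expressions from the question behave. The list index w is added here for comparison and does not appear in the question:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
y = x[2]        # integer index
z = x[2:, :]    # slices only
w = x[[2], :]   # list index, added for comparison

print(y.shape, z.shape, w.shape)   # (3,) (1, 3) (1, 3)
print(np.shares_memory(x, y))      # True
print(np.shares_memory(x, z))      # True
print(np.shares_memory(x, w))      # False

z[0, 0] = -1
print(x[2, 0])   # -1: writing through z changed x
print(w[0, 0])   # 6: w was copied before the write
```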
You can read about array indexing and slicing in much more detail here.
Is there a simple way to index arrays using lists or any other collection so that no copy is made (just a view of the array is taken)? Please do not try to answer the question in terms of the snippet of code below. The list I use to index the elements is not always short (i.e. thousands of elements, not 4), and the list is a product of an algorithm, hence the numbers are not necessarily ordered, etc.
For example in the code below columns 1,2 and 3 are selected in both cases but only in the first case a view of the data is returned:
>>> a[:,1:4]
>>> b = a[:,1:4]
>>> b.base is a
True
>>> c = a[:,[1,3,2]]
>>> c.base is a
False
Fancy indexing (using a list of indices to access elements of an array) always produces a copy, as there is no way for numpy to translate it into a new view of the same data, but with a different fixed stride and shape, starting from a particular element.
Under the hood, a numpy array is a pointer to the first element in memory of an array, a dtype, a shape, information about how far to move in memory to step along each of the dimensions (next row, column, etc.; the strides) and some flags. A view on some pre-existing memory just points to some element in that array and fiddles with the stride and shape. Fancy indexing generally specifies random access into that pre-existing memory and you can't force that data into the necessary form, so a copy has to be made.
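The difference can be seen by inspecting the strides and memory sharing. This is a sketch under assumptions: the question does not show how a was created, so the 4x5 int64 array here is made up, and .copy() is used so that a owns its buffer:

```python
import numpy as np

# Hypothetical stand-in for the array a from the question.
a = np.arange(20, dtype=np.int64).reshape(4, 5).copy()

b = a[:, 1:4]         # slice: same buffer, shifted start, new shape
c = a[:, [1, 3, 2]]   # list: columns 1, 3, 2 need no single fixed step

print(a.strides, b.strides)      # (40, 8) (40, 8)
print(b.base is a)               # True
print(np.shares_memory(a, b))    # True
print(np.shares_memory(a, c))    # False

b[0, 0] = 99
print(a[0, 1])   # 99: b writes into a's memory
print(c[0, 0])   # 1: c kept the old value
```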