Why is slicing using "colon and comma" different than using a collection of indexes?
Here is an example that I expected to yield the same result, but it does not:
import numpy as np
a = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
print(a[[0,1],[0,1]])
# Output
# [[ 1  2  3]
#  [10 11 12]]
print(a[:,[0,1]])
# Output
# [[[ 1  2  3]
#   [ 4  5  6]]
#
#  [[ 7  8  9]
#   [10 11 12]]]
Why are they not equivalent?
In the first case, you are indexing the array a with 2 lists of the same length, which would be equivalent to indexing with 2 arrays of the same shape (see numpy docs on arrays as indices).
Therefore, the output consists of a[0,0] (which is the same as a[0,0,:]) and a[1,1], i.e. the index arrays are combined elementwise. The result is expected to have shape (2,3): 2 because that is the length of the index arrays, and 3 because that is the size of the axis that is not indexed.
In the second case, however, the result consists of a[:,0] (equivalent to a[:,0,:]) and a[:,1]. Thus, the expected result here is an array whose first and third dimensions match the original array and whose second dimension has size 2, the length of the index array (which here happens to equal the original size of the second axis).
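To make the two cases concrete, here is a small sketch that rebuilds both results by hand. The names paired, manual_pairs, sliced and manual_slice are only illustrative and are not part of the question.

```python
import numpy as np

a = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# Two index lists of equal length are paired element by element:
# (0, 0) and (1, 1), so the result stacks a[0, 0] and a[1, 1].
paired = a[[0, 1], [0, 1]]
manual_pairs = np.stack([a[0, 0], a[1, 1]])

# A slice plus one index list keeps the whole first axis and picks
# entries 0 and 1 along the second axis for each entry of the first.
sliced = a[:, [0, 1]]
manual_slice = np.stack([a[:, 0], a[:, 1]], axis=1)

print(paired.shape)                             # (2, 3)
print(sliced.shape)                             # (2, 2, 3)
print(np.array_equal(paired, manual_pairs))     # True
print(np.array_equal(sliced, manual_slice))     # True
```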
To show that these two operations are not the same, we can try assuming that : is equivalent to a range covering the full length of the axis and apply that assumption to the third axis, which results in:
print(a[[0,1],[0,1],[0,1,2]])
IndexError Traceback (most recent call last)
<ipython-input-8-110de8f5f6d8> in <module>()
----> 1 print(a[[0,1],[0,1],[0,1,2]])
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,) (3,)
That is because the index arrays cannot be combined elementwise. In contrast, a[:,:,:] returns the whole array, and a[[0,1],[0,1],[0,2]] returns [ 1 12], which as expected is a one-dimensional array of length 2, like the index arrays.
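If what you actually want is every combination of the listed indices (the behaviour a slice gives you), one option is np.ix_, which reshapes the index lists so that they broadcast against each other instead of being paired. This is only a sketch; the variable names are illustrative.

```python
import numpy as np

a = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# np.ix_ turns the two lists into index arrays of shape (2, 1) and (1, 2),
# which broadcast to all (i, j) combinations.
outer = a[np.ix_([0, 1], [0, 1])]

# The same thing written out by hand with explicit broadcasting.
rows = np.array([0, 1])[:, None]
cols = np.array([0, 1])[None, :]
outer_manual = a[rows, cols]

# Three index lists of equal length are paired: a[0, 0, 0] and a[1, 1, 2].
picked = a[[0, 1], [0, 1], [0, 2]]

print(outer.shape)                              # (2, 2, 3)
print(np.array_equal(outer, a[:, [0, 1]]))      # True
print(picked)                                   # [ 1 12]
```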
I read the following in the numpy documentation for the function r_:
A string integer specifies which axis to stack multiple comma
separated arrays along. A string of two comma-separated integers
allows indication of the minimum number of dimensions to force each
entry into as the second integer (the axis to concatenate along is
still the first integer).
and they give this example:
>>> np.r_['0,2', [1,2,3], [4,5,6]] # concatenate along first axis, dim>=2
array([[1, 2, 3],
       [4, 5, 6]])
I don't follow. What exactly does the string '0,2' instruct numpy to do?
Other than the link above, is there another site with more documentation about this function?
'n,m' tells r_ to concatenate along axis=n, and produce a shape with at least m dimensions:
In [28]: np.r_['0,2', [1,2,3], [4,5,6]]
Out[28]:
array([[1, 2, 3],
       [4, 5, 6]])
So we are concatenating along axis=0 and would therefore normally expect the result to have shape (6,). But since m=2, we are telling r_ that the result must be at least 2-dimensional, so instead we get shape (2,3):
In [32]: np.r_['0,2', [1,2,3,], [4,5,6]].shape
Out[32]: (2, 3)
Look at what happens when we increase m:
In [36]: np.r_['0,3', [1,2,3,], [4,5,6]].shape
Out[36]: (2, 1, 3) # <- 3 dimensions
In [37]: np.r_['0,4', [1,2,3,], [4,5,6]].shape
Out[37]: (2, 1, 1, 3) # <- 4 dimensions
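One way to picture what 'n,m' does with its default placement is: promote every input to at least m dimensions by prepending axes of length 1, then concatenate along axis n. The helper r_like below is hypothetical (it is not a NumPy function) and only sketches that idea for the two-integer form.

```python
import numpy as np

def r_like(axis, ndmin, *arrays):
    """Hypothetical helper: mimic np.r_['axis,ndmin', ...] for the default placement."""
    # np.array(..., ndmin=k) prepends length-1 axes until the array has k dimensions.
    promoted = [np.array(x, ndmin=ndmin) for x in arrays]
    return np.concatenate(promoted, axis=axis)

results = {}
for m in (2, 3, 4):
    expected = np.r_['0,{}'.format(m), [1, 2, 3], [4, 5, 6]]
    result = r_like(0, m, [1, 2, 3], [4, 5, 6])
    results[m] = (result.shape, np.array_equal(result, expected))
    print(m, result.shape, np.array_equal(result, expected))
```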
Anything you can do with r_ can also be done with one of the more readable array-building functions such as np.concatenate, np.row_stack, np.column_stack, np.hstack, np.vstack or np.dstack, though it may also require a call to reshape. (np.row_stack is an alias for np.vstack and is deprecated as of NumPy 2.0.)
Even with the call to reshape, those other functions may be faster:
In [38]: %timeit np.r_['0,4', [1,2,3,], [4,5,6]]
10000 loops, best of 3: 38 us per loop
In [43]: %timeit np.concatenate(([1,2,3,], [4,5,6])).reshape(2,1,1,3)
100000 loops, best of 3: 10.2 us per loop
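As a quick check that the alternatives really produce the same arrays (timings will vary by machine and NumPy version, so none are claimed here), a sketch along these lines can be used. The variable names are illustrative.

```python
import numpy as np

rows = ([1, 2, 3], [4, 5, 6])

# 2-D case: r_ with '0,2' versus vstack and concatenate + reshape.
via_r = np.r_['0,2', rows[0], rows[1]]
via_vstack = np.vstack(rows)
via_reshape = np.concatenate(rows).reshape(2, 3)

# 4-D case from the timing comparison above.
via_r4 = np.r_['0,4', rows[0], rows[1]]
via_reshape4 = np.concatenate(rows).reshape(2, 1, 1, 3)

print(np.array_equal(via_r, via_vstack))        # True
print(np.array_equal(via_r, via_reshape))       # True
print(np.array_equal(via_r4, via_reshape4))     # True
```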
The paragraph that you've highlighted describes the two comma-separated integers syntax, which is a special case of the three comma-separated integers syntax. Once you understand the three-integer syntax, the two-integer syntax falls into place.
The equivalent three comma-separated integers syntax for your example would be:
np.r_['0,2,-1', [1,2,3], [4,5,6]]
In order to provide a better explanation I will change the above to:
np.r_['0,2,-1', [1,2,3], [[4,5,6]]]
The above has two parts:
A comma-separated integer string
Two comma-separated arrays
The comma-separated arrays have the following shapes:
np.array([1,2,3]).shape
(3,)
np.array([[4,5,6]]).shape
(1, 3)
In other words the first 'array' is '1-dimensional' while the second 'array' is '2-dimensional'.
First, the 2 in 0,2,-1 means that each array should be upgraded so that it is at least 2-dimensional. Since the second array is already 2-dimensional, it is not affected. The first array, however, is 1-dimensional, and to make it 2-dimensional np.r_ needs to add a 1 to its shape tuple, making it either (1,3) or (3,1). That is where the -1 in 0,2,-1 comes into play: it decides where the extra 1 is placed in the shape tuple of the array. -1 is the default and places the 1 (or 1s, if more dimensions are required) at the front of the shape tuple (I explain why further below). This turns the first array's shape tuple into (1,3), which is the same as the second array's shape tuple.
Finally, the 0 in 0,2,-1 means that the resulting arrays are concatenated along axis 0.
Since both arrays now have a shape tuple of (1,3), concatenation is possible: if you set aside the concatenation axis (dimension 0 in the above example, which has size 1 in both arrays), the remaining dimensions are equal (here the remaining dimension has size 3 in both arrays). If this were not the case, the following error would be produced:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
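To see that error in practice, here is a minimal sketch with inputs I made up for illustration: the two arrays are upgraded to shapes (1,3) and (1,2), which disagree in the non-concatenation dimension. The exact wording of the message may differ between NumPy versions.

```python
import numpy as np

error_message = None
try:
    # Upgraded shapes are (1, 3) and (1, 2); they differ along dimension 1,
    # so concatenating along axis 0 is not possible.
    np.r_['0,2', [1, 2, 3], [4, 5]]
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```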
Now if you concatenate two arrays having the shape (1,3) the resulting array will have shape (1+1,3) == (2,3) and therefore:
np.r_['0,2,-1', [1,2,3], [[4,5,6]]].shape
(2, 3)
When a 0 or a positive integer is used for the third integer in the comma-separated string, that integer determines the start of each array's shape tuple in the upgraded shape tuple (only for those arrays which need to have their dimensions upgraded). For example 0,2,0 means that for arrays requiring a shape upgrade the array's original shape tuple should start at dimension 0 of the upgraded shape tuple. For array [1,2,3] which has a shape tuple (3,) the 1 would be placed after the 3. This would result in a shape tuple equal to (3,1) and as you can see the original shape tuple (3,) starts at dimension 0 of the upgraded shape tuple. 0,2,1 would mean that for [1,2,3] the array's shape tuple (3,) should start at dimension 1 of the upgraded shape tuple. This means that the 1 needs to be placed at dimension 0. The resulting shape tuple would be (1,3).
When a negative number is used for the third integer in the comma-separated string, it determines where the original shape tuple should end, counted from the end of the upgraded shape tuple. When the original shape tuple is (3,), 0,2,-1 means that the original shape tuple should end at the last dimension of the upgraded shape tuple; therefore the 1 is placed at dimension 0 and the upgraded shape tuple is (1,3). Now (3,) ends at dimension 1 of the upgraded shape tuple, which is also its last dimension (the original array is [1,2,3] and the upgraded array is [[1,2,3]]).
np.r_['0,2', [1,2,3], [4,5,6]]
is the same as
np.r_['0,2,-1', [1,2,3], [4,5,6]]
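The effect of the third integer is easiest to see by comparing shapes. The following sketch uses the same two lists as above; the variable names are only illustrative.

```python
import numpy as np

x, y = [1, 2, 3], [4, 5, 6]

# -1 (the default): each (3,) becomes (1, 3); stacking along axis 0 gives (2, 3).
front = np.r_['0,2,-1', x, y]

# 0: each (3,) becomes (3, 1); stacking along axis 0 gives (6, 1).
back = np.r_['0,2,0', x, y]

# 0 again, but concatenating along axis 1: two (3, 1) columns side by side.
columns = np.r_['1,2,0', x, y]

print(front.shape)       # (2, 3)
print(back.shape)        # (6, 1)
print(columns.tolist())  # [[1, 4], [2, 5], [3, 6]]
print(np.array_equal(np.r_['0,2', x, y], front))  # True
```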
Finally here's an example with more dimensions:
np.r_['2,4,1',[[1,2],[4,5],[10,11]],[7,8,9]].shape
(1, 3, 3, 1)
The comma-separated arrays are:
[[1,2],[4,5],[10,11]] which has shape tuple (3,2)
[7,8,9] which has shape tuple (3,)
Both of the arrays need to be upgraded to 4-dimensional arrays. Each original array's shape tuple needs to start at dimension 1 of its upgraded shape tuple.
Therefore the first array's shape becomes (1,3,2,1): 3,2 starts at dimension 1, and since two 1s need to be added to make it 4-dimensional, one 1 is placed before the original shape tuple and one after it.
Using the same logic the second array's shape tuple becomes (1,3,1,1).
Now the two arrays need to be concatenated using dimension 2 as the concatenation axis. Eliminating dimension 2 from each array's upgraded shape tuple results in the tuple (1,3,1) for both arrays. As the resulting tuples are identical, the arrays can be concatenated, and their sizes along the concatenation axis are summed to produce (1, 3, 2+1, 1) == (1, 3, 3, 1).
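The same steps can be reproduced by hand with reshape and np.concatenate, which may make the shape bookkeeping easier to follow. This is a sketch of the walkthrough above, not of how np.r_ is implemented internally; the variable names are illustrative.

```python
import numpy as np

first = np.array([[1, 2], [4, 5], [10, 11]])   # shape (3, 2)
second = np.array([7, 8, 9])                   # shape (3,)

# Upgrade both to 4 dimensions with the original shape starting at dimension 1.
first_up = first.reshape(1, 3, 2, 1)
second_up = second.reshape(1, 3, 1, 1)

# Concatenate along dimension 2: sizes 2 and 1 add up to 3.
manual = np.concatenate([first_up, second_up], axis=2)
via_r = np.r_['2,4,1', first, second]

print(manual.shape)                    # (1, 3, 3, 1)
print(np.array_equal(manual, via_r))   # True
```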
The string '0,2' tells numpy to concatenate along axis 0 (the first axis) and to wrap the elements in enough brackets to ensure a two-dimensional array. Consider the following results:
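import numpy as np
# Note: the output below is from an older NumPy; recent versions raise an error
# for axis=1, minDim=1 because 1-D inputs have no axis 1.
for axis in (0, 1):
    for minDim in (1, 2, 3):
        print(np.r_['{},{}'.format(axis, minDim), [1,2,30, 31], [4,5,6, 61], [7,8,90, 91], [10,11, 12, 13]], 'axis={}, minDim={}\n'.format(axis, minDim))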
[ 1 2 30 31 4 5 6 61 7 8 90 91 10 11 12 13] axis=0, minDim=1
[[ 1 2 30 31]
[ 4 5 6 61]
[ 7 8 90 91]
[10 11 12 13]] axis=0, minDim=2
[[[ 1 2 30 31]]
[[ 4 5 6 61]]
[[ 7 8 90 91]]
[[10 11 12 13]]] axis=0, minDim=3
[ 1 2 30 31 4 5 6 61 7 8 90 91 10 11 12 13] axis=1, minDim=1
[[ 1 2 30 31 4 5 6 61 7 8 90 91 10 11 12 13]] axis=1, minDim=2
[[[ 1 2 30 31]
[ 4 5 6 61]
[ 7 8 90 91]
[10 11 12 13]]] axis=1, minDim=3
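Printing only the shapes can make the pattern easier to read than the nested brackets. The sketch below runs the same combinations and records each resulting shape; on recent NumPy releases the axis=1, minDim=1 combination raises an error (1-D inputs have no axis 1), so that case is caught and stored as None. The names rows and shapes are illustrative.

```python
import numpy as np

rows = ([1, 2, 30, 31], [4, 5, 6, 61], [7, 8, 90, 91], [10, 11, 12, 13])

shapes = {}
for axis in (0, 1):
    for min_dim in (1, 2, 3):
        spec = '{},{}'.format(axis, min_dim)
        try:
            # Indexing r_ with a tuple is the same as writing r_[spec, row0, row1, ...].
            shapes[(axis, min_dim)] = np.r_[(spec,) + rows].shape
        except ValueError:
            # The axis error raised by recent NumPy is a ValueError subclass.
            shapes[(axis, min_dim)] = None
        print('axis={}, minDim={}: {}'.format(axis, min_dim, shapes[(axis, min_dim)]))
```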