Why fancy indexing is not same as slicing in numpy?


I have been learning Fancy indexing but when I observed the behavior of the following code I got a couple of questions...
According to my understanding,
Fancy Indexing is:
ndArray[ [0,1,2] ] i.e. passing a list of rows / columns
and
Slicing is:
ndArray[ 0:3 ] i.e. giving a range of rows / columns
Now, the problem
A numpy array,
arr = [ [1,2,3],
[4,5,6],
[7,8,9] ]
When I try fancy indexing:
arr[ [0,1], [1,2] ]
>>> [2, 6]
And when I slice it,
arr[:2, 1:]
>>> [ [2, 3],
[5, 6] ]
Essentially both of them should return the two-dimension array as both of them mean the same, as they are used interchangeably!
:2 should be equivalent to [0,1] #For rows
1: should be equivalent to [1,2] #For cols
The question:
Why is fancy indexing not returning the same result as the slice notation? And how can I achieve that?
Please enlighten me.
Thanks

Fancy indexing and slicing behave differently by definition / by numpy specification.
So, instead of questioning why that is so, it is better to:
Be able to recognize / distinguish / tell them apart (i.e., have a clear understanding of when does the indexing become fancy indexing, and when is it slicing).
Be aware of the differences in their semantics (outcomes).
In your example:
In the case of fancy indexing, the indices generated for the two axes are combined "in tandem", similar to how the zip function combines two input sequences. (In the words of the official numpy documentation, the two index arrays are "iterated together".) We are passing the list [0, 1] for indexing the array on axis 0, and the list [1, 2] for indexing the array on axis 1. The index 0 from the index array [0, 1] is combined only with the corresponding index 1 of the index array [1, 2], giving element (0, 1). Similarly, the index 1 of the index array [0, 1] is combined only with the corresponding index 2 of the index array [1, 2], giving element (1, 2). In other words, the index arrays do not combine with each other in a many-to-many fashion.
In the case of slicing, the slice :2 that is specified for axis 0 conceptually generates indices 0 and 1 for axis 0, and the slice 1: specified for axis 1 conceptually generates indices 1 and 2 for axis 1. But these generated indices combine in a many-to-many fashion, unlike in the case of fancy indexing. So, they produce four combinations, (0, 1), (0, 2), (1, 1) and (1, 2), rather than just two.
So, the crucial difference in the defined semantics of fancy indexing and slicing is that in the case of fancy indexing, the fancy index arrays are iterated together.
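To make the contrast concrete, and to address the "how to achieve that" part of the question, here is a minimal sketch. Note that np.ix_ is not mentioned in the answer above; it is a NumPy helper brought in here as one possible way to get the many-to-many combination from index lists. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Fancy indexing: the two lists are paired up like zip(),
# so this picks elements (0, 1) and (1, 2).
paired = arr[[0, 1], [1, 2]]

# Slicing: every row in :2 is combined with every column in 1:.
block_from_slices = arr[:2, 1:]

# np.ix_ (an addition, not part of the answer above) reshapes the lists
# so that they broadcast against each other and combine many-to-many.
block_from_lists = arr[np.ix_([0, 1], [1, 2])]

print(paired)
print(block_from_slices)
print(block_from_lists)
```

With np.ix_, the row list becomes a column-shaped index and the column list a row-shaped index, so broadcasting yields all four combinations, the same block the slices produce.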

Related

get a vector from a matrix and a vector of index in numpy

I have a matrix m = [[1,2,3],[4,5,6],[7,8,9]] and a vector v=[1,2,0] that contains the indices of the rows I want to return for each column of my matrix.
the results I expect should be r=[4,8,3], but I can not find out how to get this result using numpy.
By applying the vector as an index, for each column I get this: m[v,[0,1,2]] = [4, 8, 3], which is roughly what I am after.
To avoid hardcoding the columns, I'm using np.arange(m.shape[1]), and my final formula looks like r=m[v,np.arange(m.shape[1])]
This sounds weird to me and a little complicated for something that should be quite common.
Is there a clean way to get such result ?

In [157]: m = np.array([[1,2,3],[4,5,6],[7,8,9]]);v=np.array([1,2,0])
In [158]: m
Out[158]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [159]: v
Out[159]: array([1, 2, 0])
In [160]: m[v,np.arange(3)]
Out[160]: array([4, 8, 3])
We are choosing 3 elements, with indices (1,0),(2,1),(0,2).
Closer to the MATLAB approach:
In [162]: np.ravel_multi_index((v,np.arange(3)),(3,3))
Out[162]: array([3, 7, 2])
In [163]: m.flat[_]
Out[163]: array([4, 8, 3])
Octave/MATLAB equivalent
>> m = [1 2 3;4 5 6;7 8 9];
>> v = [2 3 1]
v =
2 3 1
>> sub2ind([3,3],v,[1 2 3])
ans =
2 6 7
>> m(sub2ind([3,3],v,[1 2 3]))
ans =
4 8 3
The same broadcasting is used to access a block, as illustrated in this recent question:
Is there a way in Python to get a sub matrix as in Matlab?

Well, this 'weird/complicated' thing is actually mentioned as a "straight forward" scenario in the documentation of Integer array indexing, which is a sub-topic under the broader topic of "Advanced Indexing".
To quote some extract:
When the index consists of as many integer arrays as the array being
indexed has dimensions, the indexing is straight forward, but
different from slicing. Advanced indexes always are broadcast and iterated as one. Note that the result shape is identical to the (broadcast) indexing array shapes
If it makes it seem any less complicated/weird, you could use range(m.shape[1]) instead of np.arange(m.shape[1]). It just needs to be any array or array-like structure.
Visualization / Intuition:
When I was learning this (integer array indexing), it helped me to visualize things in the following way:
I visualized the indexing arrays standing side-by-side, all having exactly the same shape (perhaps as a consequence of getting broadcasted together). I also visualized the result array, which also has the same shape as the indexing arrays. In each of these indexing arrays and the result array, I visualized a monkey, capable of doing a walk-through of its own array, hopping to successive elements of its own array. Note that, in general, this identical shape of the indexing arrays and the result array, can be n-dimensional, and this identical shape can be very different from the shape of the source array whose values are actually being indexed.
In your own example, the source array m has shape (3,3), and the indexing arrays and the result array each have a shape of (3,).
In your example, there is a monkey in each of those three arrays (the two indexing arrays and the result array). We then visualize the monkeys doing a walk-through of their respective array elements in tandem. Here, "in tandem" means all three monkeys start at the first element of their respective arrays, and whenever a monkey hops to the next element of its own array, the other monkeys also hop to the next element in their respective arrays. As it hops to each successive element, the monkey in each indexing array calls out the value of the element it has just visited. The monkey in the result array also hops in tandem with the monkeys in the indexing arrays. It hears the values being called out by the monkeys in the indexing arrays, uses those values as indices into the source array m, and thus determines the value to be picked from the source array m. The monkey in the result array picks up this value from the source array m and stores it in the result array, at the location it has just hopped to. Thus, for example, when all three monkeys are at the second element of their respective arrays, the second position of the result array gets its value determined.
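The walk-through above can be written out as an explicit loop. This is only a sketch of the semantics for the 1-D case in the question, not how NumPy implements indexing internally; the names rows, cols and position are illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
rows = np.array([1, 2, 0])        # first indexing array (v in the question)
cols = np.arange(m.shape[1])      # second indexing array: 0, 1, 2

# The result array has the same shape as the indexing arrays.
result = np.empty(rows.shape, dtype=m.dtype)

# All three "monkeys" hop through their arrays in step.
for position in range(rows.shape[0]):
    r = rows[position]            # value called out by the first monkey
    c = cols[position]            # value called out by the second monkey
    result[position] = m[r, c]    # value stored by the result monkey

print(result)
print(m[rows, cols])
```

Both prints show the same values, since the loop visits the pairs (1, 0), (2, 1) and (0, 2) in the same order as the indexing expression.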

As stated by the numpy documentation, I think the way you mentioned is the standard way to do this task:
Example
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
x = np.array([[1, 2], [3, 4], [5, 6]])
x[[0, 1, 2], [0, 1, 0]]

numpy array indexing: list index and np.array index give different result

I am trying to index an np.array using list and np.array indexes. But they give different result.
Here is an illustration:
import numpy as np
x = np.arange(10)
idx = [[0, 1], [1, 2]]
x[np.array(idx)] # returns array([[0, 1], [1, 2]])
but applying the list directly gives an error
x[idx] # raises IndexError: too many indices for array
I expected the above to return the same result as using the np.array index.
Any ideas why?
I am using python 3.5 and numpy 1.13.1.

If it's an array, it's interpreted as the shape of the final array containing the indices; but if it's a list, it's interpreted as the indices along the "dimensions" (multi-dimensional array indices).
So the first example (with an array) is equivalent to:
[[x[0], x[1]],
 [x[1], x[2]]]
But the second example (list) is interpreted as:
[x[0, 1], x[1, 2]]
But x[0, 1] gives an IndexError: too many indices for array because your x has only one dimension.
That's because the list is interpreted as if it were a tuple, which is identical to passing its elements in "separately":
x[[0, 1], [1, 2]]
          ^^^^^^----- indices for the second dimension
  ^^^^^^------------- indices for the first dimension
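The two interpretations can be reproduced explicitly. How a bare nested list is treated may depend on the NumPy version, so this sketch avoids it and spells out each reading: np.array(idx) for the "single index array" reading and tuple(idx) for the "one index per dimension" reading. The name tuple_error is illustrative.

```python
import numpy as np

x = np.arange(10)
idx = [[0, 1], [1, 2]]

# One index array: the result takes the shape of the index array.
as_array = x[np.array(idx)]

# One index per dimension: same as x[[0, 1], [1, 2]],
# which needs two dimensions, but x has only one.
try:
    x[tuple(idx)]
    tuple_error = None
except IndexError as exc:
    tuple_error = exc

print(as_array)
print(repr(tuple_error))
```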

From numpy indexing documentation:
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection.
...
Basic slicing occurs when obj is a slice object (constructed by
start:stop:step notation inside of brackets), an integer, or a tuple
of slice objects and integers. Ellipsis and newaxis objects can be
interspersed with these as well. In order to remain backward
compatible with a common usage in Numeric, basic slicing is also
initiated if the selection object is any non-ndarray sequence (such as
a list) containing slice objects, the Ellipsis object, or the newaxis
object, but not for integer arrays or other embedded sequences. ...

appending numpy array with booleans

Can someone explain what this code is doing?
a = np.array([[1, 2], [3, 4]])
a[..., [True, False]]
What is the [True, False] doing there?

Ellipsis Notation and Booleans as Integers
From the numpy docs:
Ellipsis expand to the number of : objects needed to make a selection tuple of the same length as x.ndim. There may only be a single ellipsis present
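True and False are just obfuscated 1 and 0 here (this describes the NumPy behaviour at the time of writing; see the note on future behaviour quoted further down). Taking the example from the docs: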
x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
x[...,0]
# outputs: array([[1, 2, 3],
# [4, 5, 6]])
x[..., False] # same thing
The boolean values are specifying an index, just like the numbers 0 or 1 would.
In response to your question in the comments
It first seems magical that
a = np.array([[1, 2], [3, 4]])
a[..., [True, True]] # = [[2,2],[4,4]]
But when we consider it as
a[..., [1,1]] # = [[2,2],[4,4]]
It seems less impressive.
Similarly:
b = np.array([[1,2,3],[4,5,6]])
b[...,[2,2]] # = [[3,3],[5,5]]
After applying the ellipsis rules, the True and False grab column indices, just like 0, 1, or 17 would have.
Boolean Arrays for Complex Indexing
There are some subtle differences (bools have a different type than ints). A lot of the hairy details can be found here. These do not seem to have any role in your code, but they are interesting in figuring out how numpy indexing works.
In particular, this line is probably what you're looking for:
In the future Boolean array-likes (such as lists of python bools) will
always be treated as Boolean indexes
On this page, they talk about boolean arrays, which are quite complex as an indexing tool
Boolean arrays used as indices are treated in a different manner
entirely than index arrays. Boolean arrays must be of the same shape
as the initial dimensions of the array being indexed
Skipping down a bit
Unlike in the case of integer index arrays, in the boolean case, the
result is a 1-D array containing all the elements in the indexed array
corresponding to all the true elements in the boolean array. The
elements in the indexed array are always iterated and returned in
row-major (C-style) order. The result is also identical to
y[np.nonzero(b)]. As with index arrays, what is returned is a copy of
the data, not a view as one gets with slices.
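The quoted statements can be checked directly. This is a sketch that assumes a recent NumPy release, in which a list of Python bools is treated as a boolean mask, as the quoted "in the future" note anticipates. Under that assumption, the results for the boolean lists differ from the integer interpretation described earlier in this answer. The arrays y and b are illustrative stand-ins for the ones named in the quote.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Boolean list used as a mask over the last axis (assumes recent NumPy).
mask_first_col = a[..., [True, False]]   # keeps column 0 only
mask_both_cols = a[..., [True, True]]    # keeps both columns

# Integer list: picks column 1 twice.
int_index = a[..., [1, 1]]

# A boolean array with the same shape as y gives a 1-D result,
# identical to indexing with np.nonzero(b).
y = np.arange(6).reshape(2, 3)
b = y > 2
selected = y[b]
via_nonzero = y[np.nonzero(b)]

print(mask_first_col)
print(mask_both_cols)
print(int_index)
print(selected)
```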

Getting a slice of a numpy ndarray (for arbitrary dimensions)

I have a Numpy array of arbitrary dimensions, and an index vector containing one number for each dimension. I would like to get the slice of the array corresponding to the set of indices less than the value in the index array for all dimensions, e.g.
A = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9,10,11,12]])
index = [2,3]
result = [[1,2,3],
[5,6,7]]
The intuitive syntax for this would be something like A[:index], but this doesn't work for obvious reasons.
If the dimension of the array were fixed, I could write A[:index[0],:index[1],...:index[n]]; is there some kind of list comprehension I could use, like A[:i for i in index]?

You can slice multiple dimensions in one go:
result = A[:2,:3]
That slices dimension one up to the index 2 and dimension two up to the index 3.
If you have arbitrary dimensions you can also create a tuple of slices:
slicer = tuple(slice(0, i, 1) for i in index)
result = A[slicer]
A slice defines the start (0), stop (the index you specified) and step (1), basically like a range but usable for indexing. The i-th entry of the tuple slices the i-th dimension of your array.
If you only specify stop-indices you can use the shorthand:
slicer = tuple(slice(i) for i in index)
I would recommend the first option if you know the number of dimensions and the last one if you don't.
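Putting the pieces together with the array from the question, a short sketch of the tuple-of-slices approach follows. The 3-D array B and its index list are added here purely to illustrate that the same line works for another number of dimensions.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
index = [2, 3]

# One slice(stop) per dimension, packed into a tuple.
slicer = tuple(slice(i) for i in index)
result = A[slicer]                 # same as A[:2, :3]

# The same construction works for a different number of dimensions.
B = np.arange(24).reshape(2, 3, 4)
index_3d = [1, 2, 3]
result_3d = B[tuple(slice(i) for i in index_3d)]   # same as B[:1, :2, :3]

print(result)
print(result_3d.shape)
```

The tuple matters: indexing with a tuple applies one entry per dimension, which is exactly what the comma-separated form A[:2, :3] does.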

Numpy multidimensional array slicing

Suppose I have defined a 3x3x3 numpy array with
x = numpy.arange(27).reshape((3, 3, 3))
Now, I can get an array containing the (0,1) element of each 3x3 subarray with x[:, 0, 1], which returns array([ 1, 10, 19]). What if the indices (m,n) are stored in a tuple and I want to retrieve the (m,n) element of each subarray?
For example, suppose that I have t = (0, 1). I tried x[:, t], but it doesn't have the right behaviour - it returns rows 0 and 1 of each subarray. The simplest solution I have found is
x.transpose()[tuple(reversed(t))].transpose()
but I am sure there must be a better way. Of course, in this case, I could do x[:, t[0], t[1]], but that can't be generalised to the case where I don't know how many dimensions x and t have.

You can create the index tuple first:
index = (numpy.s_[:],)+t
x[index]

HYRY's solution is correct, but I have always found numpy's r_, c_ and s_ index tricks to be a bit strange looking. So here is the equivalent thing using a slice object:
x[(slice(None),) + t]
That single argument to slice is the stop position (i.e. None meaning all in the same way that x[:] is equivalent to x[None:None])
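For reference, a small runnable sketch comparing the two answers with the hard-coded form from the question. The variable names are illustrative.

```python
import numpy as np

x = np.arange(27).reshape((3, 3, 3))
t = (0, 1)

# Hard-coded form from the question.
expected = x[:, 0, 1]

# Index tuple built with np.s_ (first answer).
with_s = x[(np.s_[:],) + t]

# Index tuple built with an explicit slice object (second answer).
with_slice = x[(slice(None),) + t]

print(expected)
print(with_s)
print(with_slice)
```

Because the index is an ordinary tuple, t can have any length up to x.ndim - 1 without changing the expression.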
