I have been learning fancy indexing, but when I observed the behavior of the following code, I had a couple of questions...
According to my understanding,
Fancy Indexing is:
ndArray[ [0,1,2] ] i.e. passing a list of rows / columns
and
Slicing is:
ndArray[ 0:3 ] i.e. giving a range of rows / columns
Now, the problem
A numpy array,
arr = [ [1,2,3],
[4,5,6],
[7,8,9] ]
When I try fancy indexing:
arr[ [0,1], [1,2] ]
>>> [2, 6]
And when slice it,
arr[:2, 1:]
>>> [ [2, 3],
[5, 6] ]
Essentially, both of them should return the same two-dimensional array, since both seem to mean the same thing and they are used interchangeably!
:2 should be equivalent to [0,1] #For rows
1: should be equivalent to [1,2] #For cols
The question:
Why doesn't fancy indexing return the same result as the slice notation? And how can I achieve that?
Please enlighten me.
Thanks
Fancy indexing and slicing behave differently by definition / by numpy specification.
So, instead of questioning why that is so, it is better to:
Be able to recognize / distinguish / tell them apart (i.e., have a clear understanding of when indexing becomes fancy indexing, and when it is slicing).
Be aware of the differences in their semantics (outcomes).
In your example:
In the case of fancy indexing, the indices given for the two axes are combined "in tandem", similar to how the zip function combines two input sequences "in tandem" (in the words of the official numpy documentation, the two index arrays are "iterated together"). We are passing the list [0, 1] for indexing the array on axis 0, and the list [1, 2] for indexing the array on axis 1. The index 0 from the index array [0, 1] is combined only with the corresponding index 1 of the index array [1, 2]. Similarly, the index 1 of the index array [0, 1] is combined only with the corresponding index 2 of the index array [1, 2]. In other words, the index arrays do not combine with each other in a many-to-many fashion, so the result contains only arr[0, 1] and arr[1, 2], i.e. [2, 6].
In the case of slicing, the slice :2 specified for axis 0 conceptually generates the indices 0 and 1 for axis 0, and the slice 1: specified for axis 1 conceptually generates the indices 1 and 2 for axis 1. But these generated indices combine in a many-to-many fashion, unlike in the case of fancy indexing. So they produce four combinations, (0,1), (0,2), (1,1) and (1,2), rather than just two.
So, the crucial difference in the defined semantics of fancy indexing and slicing is that in the case of fancy indexing, the fancy index arrays are iterated together.
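The "in tandem" versus "many-to-many" distinction can be checked directly. The sketch below (variable names are illustrative) compares fancy indexing with an explicit zip loop, and then uses np.ix_, which reshapes the index lists so that they broadcast into a cross product, as one way to get the slice-like result from index lists:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

rows, cols = [0, 1], [1, 2]

# Fancy indexing pairs the index lists element by element, like zip().
fancy = arr[rows, cols]
paired = np.array([arr[r, c] for r, c in zip(rows, cols)])

# Slicing combines every selected row with every selected column.
sliced = arr[:2, 1:]

# np.ix_ turns the lists into shapes (2, 1) and (1, 2), which broadcast
# to a (2, 2) grid of index pairs: the same cross product as the slices.
crossed = arr[np.ix_(rows, cols)]

print(fancy)                            # [2 6]
print(sliced.tolist())                  # [[2, 3], [5, 6]]
print(np.array_equal(crossed, sliced))  # True
```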
I have a matrix m = [[1,2,3],[4,5,6],[7,8,9]] and a vector v=[1,2,0] that contains the indices of the rows I want to return for each column of my matrix.
The result I expect is r=[4,8,3], but I cannot figure out how to get it using numpy.
By using the vector as the row index, with one column index per entry, I get m[v,[0,1,2]] = [4, 8, 3], which is roughly what I want.
To avoid hardcoding the columns, I'm using np.arange(m.shape[1]), and my final formula looks like r=m[v,np.arange(m.shape[1])]
This seems weird to me, and a little complicated for something that should be quite common.
Is there a clean way to get such a result?
In [157]: m = np.array([[1,2,3],[4,5,6],[7,8,9]]);v=np.array([1,2,0])
In [158]: m
Out[158]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [159]: v
Out[159]: array([1, 2, 0])
In [160]: m[v,np.arange(3)]
Out[160]: array([4, 8, 3])
We are choosing 3 elements, with indices (1,0),(2,1),(0,2).
Closer to the MATLAB approach:
In [162]: np.ravel_multi_index((v,np.arange(3)),(3,3))
Out[162]: array([3, 7, 2])
In [163]: m.flat[_]
Out[163]: array([4, 8, 3])
Octave/MATLAB equivalent
>> m = [1 2 3;4 5 6;7 8 9];
>> v = [2 3 1];
>> sub2ind([3,3],v,[1 2 3])
ans =
2 6 7
>> m(sub2ind([3,3],v,[1 2 3]))
ans =
4 8 3
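The only differences between the numpy and Octave sessions are 0-based versus 1-based indices and row-major versus column-major flattening. A small sketch, assuming v is the 0-based translation of MATLAB's [2 3 1]: np.ravel_multi_index accepts order='F', which yields column-major flat indices that are exactly sub2ind's result minus 1.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([1, 2, 0])          # 0-based version of MATLAB's [2 3 1]
cols = np.arange(3)

# Row-major (C order) flat indices, as in the IPython session above.
c_idx = np.ravel_multi_index((v, cols), m.shape)             # [3 7 2]

# Column-major (Fortran order) flat indices, like MATLAB's sub2ind.
f_idx = np.ravel_multi_index((v, cols), m.shape, order='F')  # [1 5 6]

print(m.flat[c_idx])               # [4 8 3]
print(m.ravel(order='F')[f_idx])   # [4 8 3]
print(f_idx + 1)                   # [2 6 7], same numbers as sub2ind
```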
The same broadcasting is used to access a block, as illustrated in this recent question:
Is there a way in Python to get a sub matrix as in Matlab?
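A sketch of the block case with the same m. The row and column choices are arbitrary examples, not taken from the linked question. Turning the row index into a column vector makes the two index arrays broadcast into a grid instead of pairing up element by element:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = np.array([0, 2])
cols = np.array([1, 2])

# A (2, 1) row index broadcast against a (2,) column index gives a
# (2, 2) grid of index pairs, i.e. a sub-matrix rather than a diagonal.
block = m[rows[:, None], cols]
print(block.tolist())   # [[2, 3], [8, 9]]

# np.ix_ builds the same open mesh of index arrays for you.
print(np.array_equal(block, m[np.ix_(rows, cols)]))  # True
```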
Well, this 'weird/complicated' thing is actually described as the "straight forward" case in the documentation on integer array indexing, a sub-topic of the broader topic of "Advanced Indexing".
To quote an extract:
When the index consists of as many integer arrays as the array being
indexed has dimensions, the indexing is straight forward, but
different from slicing. Advanced indexes always are broadcast and iterated as one. Note that the result shape is identical to the (broadcast) indexing array shapes
If it makes it seem any less complicated/weird, you could use range(m.shape[1]) instead of np.arange(m.shape[1]). It just needs to be an array or an array-like structure.
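A quick check of both points: an ordinary range works as the column index, and, as the quoted documentation says, the result takes the broadcast shape of the index arrays. Reshaping v to a column (my own illustration, not from the question) therefore produces a 3x3 result instead of three elements:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([1, 2, 0])

# Any integer array-like works as the column index.
print(m[v, range(m.shape[1])])          # [4 8 3]
print(m[v, list(range(m.shape[1]))])    # [4 8 3]

# Result shape = broadcast shape of the index arrays:
# (3, 1) against (3,) broadcasts to (3, 3), so whole rows come back.
grid = m[v[:, None], np.arange(3)]
print(grid.shape)   # (3, 3)
```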
Visualization / Intuition:
When I was learning this (integer array indexing), it helped me to visualize things in the following way:
I visualized the indexing arrays standing side-by-side, all having exactly the same shape (perhaps as a consequence of getting broadcasted together). I also visualized the result array, which also has the same shape as the indexing arrays. In each of these indexing arrays and the result array, I visualized a monkey, capable of doing a walk-through of its own array, hopping to successive elements of its own array. Note that, in general, this identical shape of the indexing arrays and the result array, can be n-dimensional, and this identical shape can be very different from the shape of the source array whose values are actually being indexed.
In your own example, the source array m has shape (3,3), and the indexing arrays and the result array each have a shape of (3,).
In your example, there is a monkey in each of those three arrays (the two indexing arrays and the result array). We then visualize the monkeys doing a walk-through of their respective array elements in tandem. Here, "in tandem" means all three monkeys start at the first element of their respective arrays, and whenever a monkey hops to the next element of its own array, the other monkeys also hop to the next element in their respective arrays. As it hops to each successive element, the monkey in each indexing array calls out the value of the element it has just visited. The monkey in the result array, hopping in tandem with the others, hears the values called out by the monkeys in the indexing arrays, uses those values as indices into the source array m, and thus determines the value to be picked from m. It picks up this value from m and stores it in the result array, at the location it has just hopped to. Thus, for example, when all three monkeys are at the second element of their respective arrays, the indexing monkeys call out 2 and 1, and the second position of the result array gets the value m[2, 1], which is 8.
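The monkey walk can be written out as an explicit loop. This is a teaching sketch, not how numpy implements indexing internally, and gather is a hypothetical helper name. It broadcasts the index arrays to one shape, visits every position of that shape in tandem, and uses the values read at each position as a coordinate into the source array:

```python
import numpy as np

def gather(src, *index_arrays):
    """Hypothetical helper: integer array indexing spelled out as a loop.

    Expects one integer index array per axis of ``src``.
    """
    # The "side by side" index arrays, broadcast to one common shape.
    idx = np.broadcast_arrays(*(np.asarray(i) for i in index_arrays))
    out = np.empty(idx[0].shape, dtype=src.dtype)
    # Every monkey hops to the same position at the same time.
    for pos in np.ndindex(out.shape):
        # Values called out by the monkeys in the index arrays...
        coord = tuple(int(i[pos]) for i in idx)
        # ...select the element stored at this position of the result.
        out[pos] = src[coord]
    return out

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([1, 2, 0])
print(gather(m, v, np.arange(3)))   # [4 8 3]
```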
As stated by the numpy documentation, I think the way you mentioned is the standard way to do this task:
Example
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
x = np.array([[1, 2], [3, 4], [5, 6]])
x[[0, 1, 2], [0, 1, 0]]
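If pairing v with np.arange still feels clunky, np.take_along_axis (available since NumPy 1.15) is another option that the answers above do not mention. A sketch, assuming v holds one row index per column; the index passed to take_along_axis must have the same number of dimensions as m, hence the reshape to (1, 3):

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([1, 2, 0])   # row to pick for each column

# Index-pair form from the question and the documentation.
r1 = m[v, np.arange(m.shape[1])]

# Pick along axis 0: r2[j] = m[v[j], j].
r2 = np.take_along_axis(m, v[np.newaxis, :], axis=0)[0]

print(r1, r2)   # [4 8 3] [4 8 3]

# The documentation's example, one element per row.
x = np.array([[1, 2], [3, 4], [5, 6]])
print(x[[0, 1, 2], [0, 1, 0]])   # [1 4 5]
```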
I am trying to index a five-dimensional array using a list. However, in certain situations, the dimensions of the result are permuted.
Say, a has the shape of (3,4,5,6,7), i.e.,
>>> a = np.zeros((3,4,5,6,7))
>>> a.shape
(3, 4, 5, 6, 7)
Using a list to index this array on the third dimension, it looks normal:
>>> a[:,:,[0,3],:,:].shape
(3, 4, 2, 6, 7)
However, if the array is indexed as follows, the third dimension is moved to the leftmost position:
>>> a[0,:,[0,1],:,:].shape
(2, 4, 6, 7)
Can anyone shed some light on it?
Basic Slicing:-
Basic slicing occurs when a slice object is used. Usually a slice object is constructed with the syntax array[start:stop:step]. Ellipsis and newaxis also fall under basic slicing.
Example:- 1D array
>>>x=np.arange(10)
>>>x[2:10:3]
array([2, 5, 8])
Example:- 2D array
>>>x = np.array([[1,2,3], [4,5,6]])
>>>x[1:2]
array([[4, 5, 6]])
Example:- 3D array
>>>x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
>>> x[0:1]
array([[[1],
[2],
[3]]])
In the above example, the number of slice objects given is less than the total number of dimensions of the array. If the number of objects in the selection tuple is less than N (the number of dimensions), then : is assumed for any subsequent dimensions.
Advanced Indexing:-
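A quick check of the rule above with the same 3D array: leaving out the trailing indices, writing them as explicit full slices, and using an Ellipsis all select the same thing.

```python
import numpy as np

x = np.array([[[1], [2], [3]], [[4], [5], [6]]])   # shape (2, 3, 1)

# Omitted trailing axes behave like a full ':' slice.
print(np.array_equal(x[0:1], x[0:1, :, :]))   # True

# Ellipsis expands to as many ':' as needed.
print(np.array_equal(x[0:1], x[0:1, ...]))    # True

print(x[0:1].shape)   # (1, 3, 1)
```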
Advanced indexing is triggered when the selection object, obj,
is a non-tuple sequence object,
an ndarray (of data type integer or bool),
a tuple with at least one sequence object or ndarray (of data type integer or bool).
There are two types of advanced indexing: Integer and Boolean.
Integer Indexing:-
Integer array indexing allows selection of arbitrary items in the array based on their N-dimensional index. Each integer array represents a number of indexes into that dimension.
When the index consists of as many integer arrays as the array being indexed has dimensions, the indexing is straight forward, but different from slicing.
Example:-
>>>a = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>>a[[0,1,2],[0,1,1]]
array([1, 5, 8])
The above example selects:
a[0,0], a[1,1], a[2,1]
Remember:- Integer indexing pairs the index arrays element by element, so each (row, column) pair selects exactly one element.
Now to your question:-
>>>a=np.zeros((3,4,5))
>>>a[0,:,[0,1]]
First Case:-
This is of the form x[arr1,:,arr2].
arr1 and arr2 are advanced indexes.We consider 0 also to be an advanced index.
If the advanced indexes are separated by a slice, Ellipsis or newaxis then the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that.
This essentially means that the dimension of [0,1] comes first in the result array. I am leaving out 0 because, as a scalar, it contributes no dimension of its own.
>>>a[0,:,[0,1]].shape
(2,4)
Second case:-
This is of the form x[:,:,arr1]. Here only arr1 is an advanced index.
If the advanced indexes are all next to each other then the dimensions from the advanced indexing operations are inserted into the result array at the same spot as they were in the initial array.
This essentially means that the dimension of [0,1] comes at its respective position specified in the index of the array.
>>>a[0:1,:,[0,1]].shape
(1,4,2)
[0,1] has shape (2,), and since it occurs at the third index, its dimension is inserted at the third position of the result array.
Any suggestions and improvements are welcome.
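The same rules applied to the five-dimensional array from the question. The last line is one possible workaround, not taken from the answer: applying the integer index first, as a separate step, leaves the list as the only advanced index, so its axis stays in place.

```python
import numpy as np

a = np.zeros((3, 4, 5, 6, 7))

# Only one advanced index: its dimension stays where it was.
print(a[:, :, [0, 3], :, :].shape)    # (3, 4, 2, 6, 7)

# 0 and [0, 1] are both advanced indexes, separated by a slice, so their
# broadcast shape (2,) moves to the front of the result.
print(a[0, :, [0, 1], :, :].shape)    # (2, 4, 6, 7)

# 0:1 is a basic slice, so [0, 1] is again the only advanced index.
print(a[0:1, :, [0, 1], :, :].shape)  # (1, 4, 2, 6, 7)

# To get (4, 2, 6, 7), apply the integer index first, then the list.
print(a[0][:, [0, 1], :, :].shape)    # (4, 2, 6, 7)
```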
Reference:-
Numpy_Docs
Thanks #Hari_Sheldon for the reply. Now, I've seen what print has done to the array a, but I still do not understand why Python takes those columns specified by a list and puts them as rows at the leftmost position. Is there any reference out there to explain the reason?
And, under some situations, this dimension permutation does not occur, i.e.:
>>> a[0:1,:,[0,3]].shape
(1, 4, 2)
As you can see, instead of permuting it into (2, 4), the dimensional order remains!
I have a multidimensional array a:
a = np.random.uniform(1,10,(2,4,2,3,10,10))
For dimensions 4-6, I have 3 lists which contain the indexes for slicing that dimension of array 'a'
dim4 = [0,2]
dim5 = [3,5,9]
dim6 = [1,2,7,8]
How do I slice array 'a' such that I get:
b = a[0,:,0,dim4,dim5,dim6]
So b should be an array with shape (4,2,3,4), and containing elements from the corresponding dimensions of a. When I try the code above, I get an error saying that different shapes can't be broadcast together for axis 4-6, but if I were to do:
b = a[0,:,0:2,0:3,0:4]
then it does work, even though the slices all have different lengths. So how do you slice multidimensional arrays with non-adjacent indices?
You can use the numpy.ix_ function to construct complex indexing like this. It takes a sequence of array_like, and makes an "open mesh" from them. The example from the docstring is pretty clear:
Using ix_ one can quickly construct index arrays that will index
the cross product. a[np.ix_([1,3],[2,5])] returns the array
[[a[1,2] a[1,5]], [a[3,2] a[3,5]]].
So, for your data, you'd do:
>>> indices = np.ix_((0,), np.arange(a.shape[1]), (0,), dim4, dim5, dim6)
>>> a[indices].shape
(1, 4, 1, 2, 3, 4)
Get rid of the size-1 dimensions with np.squeeze:
>>> np.squeeze(a[indices]).shape
(4, 2, 3, 4)
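An alternative that avoids np.squeeze, sketched under the same data: apply the scalar indices first with basic indexing, then combine a full slice with an open mesh over the last three axes. Because those three advanced indexes are adjacent, their broadcast shape (2, 3, 4) stays in place after the sliced axis.

```python
import numpy as np

a = np.random.uniform(1, 10, (2, 4, 2, 3, 10, 10))
dim4 = [0, 2]
dim5 = [3, 5, 9]
dim6 = [1, 2, 7, 8]

# Version from the answer: open mesh over all six axes, then squeeze.
indices = np.ix_((0,), np.arange(a.shape[1]), (0,), dim4, dim5, dim6)
b1 = np.squeeze(a[indices])

# Alternative: a[0, :, 0] has shape (4, 3, 10, 10); then index it with
# (slice(None), mesh4, mesh5, mesh6).
b2 = a[0, :, 0][(slice(None),) + np.ix_(dim4, dim5, dim6)]

print(b1.shape, b2.shape)      # (4, 2, 3, 4) (4, 2, 3, 4)
print(np.array_equal(b1, b2))  # True
```

One design note: np.squeeze removes every size-1 axis, so if one of the index lists happened to contain a single element, its axis would silently disappear as well; the second form keeps it.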
I have defined two numpy arrays of shape (2,3) and concatenated them along axis 0:
a=numpy.array([[1,2,3],[4,5,6]])
b=numpy.array([[7,8,9],[10,11,12]])
C=numpy.concatenate((a,b),axis=0)
C becomes a 4x3 matrix.
Now I tried the same thing with 1x3 arrays:
a=numpy.array([1,2,3])
b=numpy.array([4,5,6])
c=numpy.concatenate((a,b),axis=0)
I was expecting a 2x3 matrix, but instead I got a 1x6 one. I understand that vstack etc. will work, but I am curious why this is happening. What am I doing wrong with numpy.concatenate?
Thanks for the reply. I can get the result as suggested by using 1x3 arrays and then concatenating them. But my logic requires adding rows to an empty matrix at each iteration. I tried append as suggested:
testing=[]
for i in range(3):
testing=testing.append([1,2,3])
It gave an error saying that testing has no attribute append, as it is of NoneType. Also, if I use 1x3 arrays via np.array([[1,2,3]]), how can I do this inside a for loop?
You didn't do anything wrong. numpy.concatenate joins a sequence of arrays along an existing axis; it does not create a new one. Along axis 0 of a 2D array the elements are rows (the nested lists), so concatenating stacks the rows; along axis 0 of a 1D array the elements are scalars, so you get a longer 1D array.
So this is not concatenate's job; as you said, you can use np.vstack:
>>> c=numpy.vstack((a,b))
>>> c
array([[1, 2, 3],
[4, 5, 6]])
Also, in your code: list.append appends an element to the list in place and returns None, so you should not assign its result to a variable (that is why testing became None). Instead, just call append on testing in each iteration:
testing=[]
for i in range(3):
testing.append([1,2,3])
Alternatively, as a more concise (and usually faster) way, you can create that list using a list comprehension:
testing=[[1,2,3] for _ in range(3)]
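For the follow-up about building a matrix row by row inside a loop, a common pattern (a sketch, not something the original answer shows) is to collect the rows in a plain list and convert once at the end, instead of growing a numpy array on every iteration:

```python
import numpy as np

rows = []
for i in range(3):
    # list.append mutates the list in place and returns None: no reassignment.
    rows.append([1, 2, 3])

# Convert once at the end instead of growing an array each iteration.
testing = np.array(rows)
print(testing.shape)   # (3, 3)

# If each row is already a 1-D numpy array, np.vstack stacks them as rows.
stacked = np.vstack([np.array([1, 2, 3]) for _ in range(3)])
print(stacked.shape)   # (3, 3)
```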
This is happening because you are concatenating along axis=0.
In your first example:
a=numpy.array([[1,2,3],[4,5,6]]) # 2 elements in 0th dimension
b=numpy.array([[7,8,9],[10,11,12]]) # 2 elements in 0th dimension
C=numpy.concatenate((a,b),axis=0) # 4 elements in 0th dimension
In your second example:
a=numpy.array([1,2,3]) # 3 elements in 0th dimension
b=numpy.array([4,5,6]) # 3 elements in 0th dimension
c=numpy.concatenate((a,b),axis=0) # 6 elements in 0th dimension
Edit:
Note that in your second example, you only have one-dimensional arrays.
In [35]: a=numpy.array([1,2,3])
In [36]: a.shape
Out[36]: (3,)
If the shape of the arrays were (1,3), you would get your expected result:
In [43]: a2=numpy.array([[1,2,3]])
In [44]: b2=numpy.array([[4,5,6]])
In [45]: numpy.concatenate((a2,b2), axis=0)
Out[45]:
array([[1, 2, 3],
[4, 5, 6]])
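If the inputs are already 1-D, you can give them the missing leading axis yourself, or use np.stack, which joins arrays along a new axis rather than an existing one. A short sketch with the arrays from the question:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Turn each (3,) array into shape (1, 3), then concatenate on axis 0.
c1 = np.concatenate((a[np.newaxis, :], b[np.newaxis, :]), axis=0)

# np.stack adds the new axis itself.
c2 = np.stack((a, b), axis=0)

print(c1.shape, c2.shape)      # (2, 3) (2, 3)
print(np.array_equal(c1, c2))  # True
```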