I have a multidimensional array a:
a = np.random.uniform(1,10,(2,4,2,3,10,10))
For dimensions 4-6, I have 3 lists which contain the indexes for slicing that dimension of array 'a'
dim4 = [0,2]
dim5 = [3,5,9]
dim6 = [1,2,7,8]
How do I slice array 'a' such that I get:
b = a[0,:,0,dim4,dim5,dim6]
So b should be an array with shape (4,2,3,4), and containing elements from the corresponding dimensions of a. When I try the code above, I get an error saying that different shapes can't be broadcast together for axis 4-6, but if I were to do:
b = a[0,:,0,0:2,0:3,0:4]
then it does work, even though the slicing lists all have different lengths. So how do you slice multidimensional arrays with non adjacent indexes?
You can use the numpy.ix_ function to construct complex indexing like this. It takes a sequence of array_like, and makes an "open mesh" from them. The example from the docstring is pretty clear:
Using ix_ one can quickly construct index arrays that will index
the cross product. a[np.ix_([1,3],[2,5])] returns the array
[[a[1,2] a[1,5]], [a[3,2] a[3,5]]].
So, for your data, you'd do:
>>> indices = np.ix_((0,), np.arange(a.shape[1]), (0,), dim4, dim5, dim6)
>>> a[indices].shape
(1, 4, 1, 2, 3, 4)
Get rid of the size-1 dimensions with np.squeeze:
>>> np.squeeze(a[indices]).shape
(4, 2, 3, 4)
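As a sanity check, here is a minimal sketch (the names b, b2, sub, j, k and l are only for illustration) that compares the squeezed result with the element-by-element definition from the question. It also shows a variant that applies the scalar indices first, which avoids np.squeeze; that may be safer if one of the index lists could itself have length 1, since np.squeeze would drop that axis too.

```python
import numpy as np

a = np.random.uniform(1, 10, (2, 4, 2, 3, 10, 10))
dim4 = [0, 2]
dim5 = [3, 5, 9]
dim6 = [1, 2, 7, 8]

# Open-mesh index over all six axes, then drop the two length-1 axes.
indices = np.ix_((0,), np.arange(a.shape[1]), (0,), dim4, dim5, dim6)
b = np.squeeze(a[indices])

# Variant: apply the scalar indices first, then mesh the remaining four axes.
sub = a[0, :, 0]                                   # shape (4, 3, 10, 10)
b2 = sub[np.ix_(np.arange(sub.shape[0]), dim4, dim5, dim6)]

# Spot-check one element against the definition in the question.
j, k, l = 1, 2, 3
assert b[2, j, k, l] == a[0, 2, 0, dim4[j], dim5[k], dim6[l]]
print(b.shape, b2.shape)                           # (4, 2, 3, 4) (4, 2, 3, 4)
```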
Related
I have a matrix m = [[1,2,3],[4,5,6],[7,8,9]] and a vector v=[1,2,0] that contains the indices of the rows I want to return for each column of my matrix.
the results I expect should be r=[4,8,3], but I can not find out how to get this result using numpy.
By applying the vector to the index, for each columns I get this : m[v,[0,1,2]] = [4, 8, 3], which is roughly my quest.
To prevent hardcoding the columns, I'm using np.arange(m.shape[1]), and my final formula looks like r=m[v,np.arange(m.shape[1])]
This sounds weird to me and a little complicated for something that should be quite common.
Is there a clean way to get such result ?
In [157]: m = np.array([[1,2,3],[4,5,6],[7,8,9]]);v=np.array([1,2,0])
In [158]: m
Out[158]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [159]: v
Out[159]: array([1, 2, 0])
In [160]: m[v,np.arange(3)]
Out[160]: array([4, 8, 3])
We are choosing 3 elements, with indices (1,0),(2,1),(0,2).
Closer to the MATLAB approach:
In [162]: np.ravel_multi_index((v,np.arange(3)),(3,3))
Out[162]: array([3, 7, 2])
In [163]: m.flat[_]
Out[163]: array([4, 8, 3])
Octave/MATLAB equivalent
>> m = [1 2 3;4 5 6;7 8 9];
>> v = [2 3 1]
v =
2 3 1
>> m = [1 2 3;4 5 6;7 8 9];
>> v = [2 3 1];
>> sub2ind([3,3],v,[1 2 3])
ans =
2 6 7
>> m(sub2ind([3,3],v,[1 2 3]))
ans =
4 8 3
The same broadcasting is used to access a block, as illustrated in this recent question:
Is there a way in Python to get a sub matrix as in Matlab?
Well, this 'weird/complicated' thing is actually mentioned as a "straight forward" scenario in the documentation of Integer array indexing, which is a sub-topic under the broader topic of "Advanced Indexing".
To quote some extract:
When the index consists of as many integer arrays as the array being
indexed has dimensions, the indexing is straight forward, but
different from slicing. Advanced indexes always are broadcast and iterated as one. Note that the result shape is identical to the (broadcast) indexing array shapes
If it makes it seem any less complicated/weird, you could use range(m.shape[1]) instead of np.arange(m.shape[1]). It just needs to be any array or array-like structure.
Visualization / Intuition:
When I was learning this (integer array indexing), it helped me to visualize things in the following way:
I visualized the indexing arrays standing side-by-side, all having exactly the same shape (perhaps as a consequence of getting broadcasted together). I also visualized the result array, which also has the same shape as the indexing arrays. In each of these indexing arrays and the result array, I visualized a monkey, capable of doing a walk-through of its own array, hopping to successive elements of its own array. Note that, in general, this identical shape of the indexing arrays and the result array, can be n-dimensional, and this identical shape can be very different from the shape of the source array whose values are actually being indexed.
In your own example, the source array m has shape (3,3), and the indexing arrays and the result array each have a shape of (3,).
In your example, there is a monkey in each of those three arrays (the two indexing arrays and the result array). We then visualize the monkeys doing a walk-through of their respective array elements in tandem. Here, "in tandem" means all three monkeys start at the first element of their respective arrays, and whenever a monkey hops to the next element of its own array, the other monkeys in the other arrays also hop to the next element in their respective arrays. As it hops to each successive element, the monkey in each indexing array calls out the value of the element it has just visited. So the two monkeys in the two indexing arrays read out the values they've just visited, in their respective indexing arrays. The monkey in the result array also hops in tandem with the monkeys in the indexing arrays. It hears the values being called out by the monkeys in the indexing arrays, uses those values as indices into the source array m, and thus determines the value to be picked from source array m. The monkey in the result array picks up this value from the source array m, and stores the value in the result array, at the location it has just hopped to. Thus, for example, when all three monkeys are at the second element of their respective arrays, the second position of the result array gets its value determined.
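The walk-through above can be written out as an explicit loop. This is only a sketch of the idea, not how NumPy implements it internally; the names rows, cols, rows_b, cols_b and result are made up for the illustration.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = np.array([1, 2, 0])            # v from the question
cols = np.arange(m.shape[1])          # [0, 1, 2]

# Give both index arrays the same (broadcast) shape, then walk them in tandem.
rows_b, cols_b = np.broadcast_arrays(rows, cols)
result = np.empty(rows_b.shape, dtype=m.dtype)
for pos in np.ndindex(rows_b.shape):
    # The two "index monkeys" call out a row and a column;
    # the "result monkey" fetches m[row, col] and stores it at its own position.
    result[pos] = m[rows_b[pos], cols_b[pos]]

print(result)                         # [4 8 3]
```

The loop produces the same array as m[rows, cols], and the result has the shape of the broadcast index arrays rather than the shape of m.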
As stated by the numpy documentation, I think the way you mentioned is the standard way to do this task:
Example
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
x = np.array([[1, 2], [3, 4], [5, 6]])
x[[0, 1, 2], [0, 1, 0]]
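If building the second index array by hand feels clumsy, np.take_along_axis (available in NumPy 1.15 and later) is another option. The sketch below is an alternative to the indexing shown above, not a replacement recommended by the documentation quote; the variable names are illustrative.

```python
import numpy as np

# One element per row: the column to take from each row.
x = np.array([[1, 2], [3, 4], [5, 6]])
cols = np.array([0, 1, 0])
per_row = np.take_along_axis(x, cols[:, np.newaxis], axis=1)[:, 0]
print(per_row)                        # [1 4 5]

# One element per column: the row to take from each column (the m, v case).
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([1, 2, 0])
per_col = np.take_along_axis(m, v[np.newaxis, :], axis=0)[0]
print(per_col)                        # [4 8 3]
```

The index array must have the same number of dimensions as the source array, which is why a new axis is added before the call and removed afterwards.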
I try to index an array (has five dimensions) using a list. However, under certain situation, the array is permuted.
Say, a has the shape of (3,4,5,6,7), i.e.,
>>> a = np.zeros((3,4,5,6,7))
>>> a.shape
(3, 4, 5, 6, 7)
Using a list to index this array on the third dimension, it looks normal:
>>> a[:,:,[0,3],:,:].shape
(3, 4, 2, 6, 7)
However, if the array were indexed under the following situation, the third dimension is permuted to the leftmost:
>>> a[0,:,[0,1],:,:].shape
(2, 4, 6, 7)
Can anyone shed some light on it?
Basic Slicing:-
Basic slicing occurs when a slice object is used. A slice is usually written as array[start:stop:step]. Ellipsis and newaxis also come under this.
Example:- 1D array
>>> x = np.arange(10)
>>> x[2:10:3]
array([2, 5, 8])
Example:- 2D array
>>>x = np.array([[1,2,3], [4,5,6]])
>>>x[1:2]
array([[4, 5, 6]])
Example:- 3D array
>>>x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
>>> x[0:1]
array([[[1],
[2],
[3]]])
In the above example, the number of slice objects given is less than the total number of dimensions of the array. If the number of objects in the selection tuple is less than N, then : is assumed for any subsequent dimensions.
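A small check of this rule for omitted trailing indices, reusing the 3D array from the example (a sketch; the variable names are illustrative):

```python
import numpy as np

x = np.array([[[1], [2], [3]], [[4], [5], [6]]])   # shape (2, 3, 1)

short = x[0:1]            # only one slice given for a 3D array
full = x[0:1, :, :]       # the missing slices written out explicitly
with_ellipsis = x[0:1, ...]

print(short.shape)        # (1, 3, 1)

# Basic slicing returns a view onto the same memory, not a copy.
is_view = np.shares_memory(x, short)
```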
Advanced Slicing:-
Advanced indexing is triggered when the selection object, obj,
is a non-tuple sequence object,
an ndarray (of data type integer or bool),
a tuple with at least one sequence object or ndarray (of data type integer or bool).
There are two types of advanced indexing: Integer and Boolean.
Integer Indexing:-
Integer array indexing allows selection of arbitrary items in the array based on their N-dimensional index. Each integer array represents a number of indexes into that dimension.
When the index consists of as many integer arrays as the array being indexed has dimensions, the indexing is straight forward, but different from slicing.
Example:-
>>> a = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a[[0,1,2],[0,1,1]]
array([1, 5, 8])
The above example selects:
a[0,0],a[1,1],a[2,1]
Remember:- Integer indexing pairs up the index arrays element by element.
Now to your question:-
>>> a = np.zeros((3,4,5))
>>> a[0,:,[0,1]]
First Case:-
This is of the form x[arr1,:,arr2].
arr1 and arr2 are advanced indexes. The integer 0 also counts as an advanced index here, because it appears together with an array index.
If the advanced indexes are separated by a slice, Ellipsis or newaxis then the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that.
This essentially means that the dimension of [0,1] comes first in the array. I am leaving off 0 as it has no dimension.
>>>a[0,:,[0,1]].shape
(2,4)
Second case:-
This is of the form x[:,:,arr1]. Here only arr1 is advanced index.
If the advanced indexes are all next to each other then the dimensions from the advanced indexing operations are inserted into the result array at the same spot as they were in the initial array.
This essentially means that the dimension of [0,1] comes at its respective position specified in the index of the array.
>>>a[0:1,:,[0,1]].shape
(1,4,2)
[0,1] has shape(2,) and since it occurs at third index it is inserted into 3rd index of the result array.
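The two rules quoted above can be checked on the five-dimensional array from the question. The last line is one possible workaround, assuming you want to keep the original axis order: index in two steps so that each step contains only one advanced index.

```python
import numpy as np

a = np.zeros((3, 4, 5, 6, 7))

# The integer 0 and the list are separated by a slice, so the broadcast
# advanced-index dimension (length 2) is moved to the front.
separated = a[0, :, [0, 1]]
print(separated.shape)                # (2, 4, 6, 7)

# Only one advanced index: its dimension stays where it was.
single = a[:, :, [0, 3]]
print(single.shape)                   # (3, 4, 2, 6, 7)
kept = a[0:1, :, [0, 1]]
print(kept.shape)                     # (1, 4, 2, 6, 7)

# Two-step indexing keeps the original axis order.
two_step = a[0][:, [0, 1]]
print(two_step.shape)                 # (4, 2, 6, 7)
```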
Any suggestions and improvements are Welcome.
Reference:-
Numpy_Docs
Thanks @Hari_Sheldon for the reply. Now, I've seen what print has done to the array a, but I still do not understand why Python takes those columns specified by a list and puts them as rows at the leftmost position. Is there any reference out there to explain the reason?
And, under some situations, this dimension permutation does not occur, i.e.:
>>> a[0:1,:,[0,3]].shape
(1, 4, 2)
As you can see, instead of permuting it into (2, 4), the dimensional order remains!
I have a Numpy array of arbitrary dimensions, and an index vector containing one number for each dimension. I would like to get the slice of the array corresponding to the set of indices less than the value in the index array for all dimensions, e.g.
A = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9,10,11,12]])
index = [2,3]
result = [[1,2,3],
[5,6,7]]
The intuitive syntax for this would be something like A[:index], but this doesn't work for obvious reasons.
If the dimension of the array were fixed, I could write A[:index[0],:index[1],...:index[n]]; is there some kind of list comprehension I could use, like A[:i for i in index]?
You can slice multiple dimensions in one go:
result = A[:2,:3]
that slices dimension one up to the index 2 and dimension two up to the index 3.
If you have an arbitrary number of dimensions you can also create a tuple of slices:
slicer = tuple(slice(0, i, 1) for i in index)
result = A[slicer]
A slice defines the start (0), stop (the index you specified) and step (1), basically like a range but usable for indexing. The i-th entry of the tuple slices the i-th dimension of your array.
If you only specify stop-indices you can use the shorthand:
slicer = tuple(slice(i) for i in index)
I would recommend the first option if you know the number of dimensions and the last one if you don't.
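Wrapped up as a small helper, the tuple-of-slices approach might look like this. The function name leading_block is made up for the example; the index has to be a tuple, not a list of slices, for NumPy to treat each entry as one axis.

```python
import numpy as np

def leading_block(arr, stops):
    """Return arr[:stops[0], :stops[1], ...] for any number of axes."""
    return arr[tuple(slice(stop) for stop in stops)]

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
index = [2, 3]

result = leading_block(A, index)
print(result)
# [[1 2 3]
#  [5 6 7]]
```

If stops is shorter than the number of dimensions, the remaining axes are kept whole, as with any basic slicing.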
I have piece of code that slices a 2D NumPy array and returns the resulting (sub-)array. In some cases, the slicing only indexes one element, in which case the result is a one-element array:
>>> sub_array = orig_array[indices_h, indices_w]
>>> sub_array.shape
(1,)
How can I force this array to be two-dimensional in a general way? I.e.:
>>> sub_array.shape
(1,1)
I know that sub_array.reshape(1,1) works, but I would like to be able to apply it to sub_array generally without worrying about the number of elements in it. To put it in another way, I would like to compose a (light-weight) operation that converts a shape-(1,) array to a shape-(1,1) array, a shape-(2,2) array to a shape-(2,2) array etc. I can make a function:
def twodimensionalise(input_array):
    if input_array.shape == (1,):
        return input_array.reshape(1,1)
    else:
        return input_array
Is this the best I am going to get or does NumPy have something more 'native'?
Addition:
As pointed out in https://stackoverflow.com/a/31698471/865169, I was doing the indexing wrong. I really wanted to do:
sub_array = orig_array[indices_h][:, indices_w]
This does not work when there is only one entry in indices_h, but combining it with np.atleast_2d suggested in another answer, I arrive at:
sub_array = np.atleast_2d(orig_array[indices_h])[:, indices_w]
It sounds like you might be looking for atleast_2d. This function returns a view of a 1D array as a 2D array:
>>> arr1 = np.array([1.7]) # shape (1,)
>>> np.atleast_2d(arr1)
array([[ 1.7]])
>>> _.shape
(1, 1)
Arrays that are already 2D (or have more dimensions) are unchanged:
>>> arr2 = np.arange(4).reshape(2,2) # shape (2, 2)
>>> np.atleast_2d(arr2)
array([[0, 1],
[2, 3]])
>>> _.shape
(2, 2)
When defining a numpy array you can use the keyword argument ndmin to specify that you want at least two dimensions.
e.g.
arr = np.array(item_list, ndmin=2)
arr.shape
>>> (1, 100) # if item_list is 100 elements long; ndmin prepends the extra axis
In the example in the question, just do
sub_array = np.array(orig_array[indices_h, indices_w], ndmin=2)
sub_array.shape
>>> (1,1)
This can be extended to higher dimensions too, unlike np.atleast_2d().
Are you sure you are indexing in the way you want to? In the case where indices_h and indices_w are broadcastable integer indexing arrays, the result will have the broadcasted shape of indices_h and indices_w. So if you want to make sure that the result is 2D, make the indices arrays 2D.
Otherwise, if you want all combinations of indices_h[i] and indices_w[j] (for all i, j), do e.g. a sequential indexing:
sub_array = orig_array[indices_h][:, indices_w]
Have a look at the documentation for details about advanced indexing.
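For the "all combinations" case, np.ix_ gives the same result as the sequential indexing in a single step, and the result is always 2D, even with one index per axis. A minimal sketch, where orig_array and the index values are invented for the example:

```python
import numpy as np

orig_array = np.arange(20).reshape(4, 5)

# A single row and a single column still give a 2D result.
indices_h = [2]
indices_w = [1]
sub_array = orig_array[np.ix_(indices_h, indices_w)]
print(sub_array.shape)                # (1, 1)

# Several rows and columns: every (row, column) combination is selected.
block = orig_array[np.ix_([0, 3], [1, 2, 4])]
print(block.shape)                    # (2, 3)

# Same values as indexing rows first and columns second.
sequential = orig_array[[0, 3]][:, [1, 2, 4]]
```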
I have a numpy array which looks like:
myArray = np.array([[1,2],[3]], dtype=object)  # recent NumPy versions require dtype=object for ragged input
But I can not flatten it,
In: myArray.flatten()
Out: array([[1, 2], [3]], dtype=object)
If I change the array to the same length in the second axis, then I can flatten it.
In: myArray2 = np.array([[1,2],[3,4]])
In: myArray2.flatten()
Out: array([1, 2, 3, 4])
My Question is:
Can I use some thing like myArray.flatten() regardless the dimension of the array and the length of its elements, and get the output: array([1,2,3])?
myArray is a 1-dimensional array of objects. Your list objects will simply remain in the same order with flatten() or ravel(). You can use hstack to stack the arrays in sequence horizontally:
>>> np.hstack(myArray)
array([1, 2, 3])
Note that this is basically equivalent to using concatenate along axis 0, because the list elements are one-dimensional:
>>> np.concatenate(myArray, axis=0)
array([1, 2, 3])
If you don't have this issue however and can merge the items, it is always preferable to use flatten() or ravel() for performance:
In [1]: u = timeit.Timer('np.hstack(np.array([[1,2],[3,4]]))'\
....: , setup = 'import numpy as np')
In [2]: print(u.timeit())
11.0124390125
In [3]: u = timeit.Timer('np.array([[1,2],[3,4]]).flatten()'\
....: , setup = 'import numpy as np')
In [4]: print(u.timeit())
3.05757689476
Iluengo's answer also has you covered for further information as to why you cannot use flatten() or ravel() given your array type.
Well, I agree with the other answers when they say that hstack or concatenate do the job in this case. However, I would like to point that even if it 'fixes' the problem, the problem is not addressed properly.
The problem is that even if it looks like the second axis has different length, this is not true in practice. If you try:
>>> myArray.shape
(2,)
>>> myArray.dtype
dtype('O') # stands for Object
>>> myArray[0]
[1, 2]
It shows you that your array is not a 2D array with variable size (as you might think), it is just a 1D array of objects. In your case, the elements are lists: the first element of your array is a 2-element list and the second is a 1-element list.
So, flatten and ravel won't work because transforming a 1D array to a 1D array results in exactly the same 1D array. If you have an object numpy array, it won't care about what you put inside; it will treat individual items as unknown items and can't decide how to merge them.
What you should consider is whether this is the behaviour you want for your application. Numpy arrays are especially efficient with fixed-size numeric matrices. If you are playing with arrays of objects, I don't see why you would use Numpy instead of regular Python lists.
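If you do stay with plain Python lists, one way to flatten a ragged list of lists is itertools.chain. This is a sketch of one option and only handles one level of nesting; the variable names are illustrative.

```python
from itertools import chain

import numpy as np

nested = [[1, 2], [3]]                      # ragged: the rows differ in length
flat = np.array(list(chain.from_iterable(nested)))
print(flat)                                 # [1 2 3]

# The same idea works when the lists are already stored in an object array.
obj = np.empty(2, dtype=object)
obj[0] = [1, 2]
obj[1] = [3]
flat_from_obj = np.fromiter(chain.from_iterable(obj), dtype=int)
print(flat_from_obj)                        # [1 2 3]
```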
np.hstack works in this case
In [69]: np.hstack(myArray)
Out[69]: array([1, 2, 3])