2darray indexing in numpy, python [duplicate] - python

This question already has answers here:
NumPy selecting specific column index per row by using a list of indexes
(7 answers)
Closed 2 years ago.
Is there a better way to get the "output_array" from the "input_array" and "select_id" ?
Can we get rid of range( input_array.shape[0] ) ?
>>> input_array = numpy.array( [ [3,14], [12, 5], [75, 50] ] )
>>> select_id = [0, 1, 1]
>>> print input_array
[[ 3 14]
[12 5]
[75 50]]
>>> output_array = input_array[ range( input_array.shape[0] ), select_id ]
>>> print output_array
[ 3 5 50]

You can choose from given array using numpy.choose which constructs an array from an index array (in your case select_id) and a set of arrays (in your case input_array) to choose from. However you may first need to transpose input_array to match dimensions. The following shows a small example:
In [101]: input_array
Out[101]:
array([[ 3, 14],
[12, 5],
[75, 50]])
In [102]: input_array.shape
Out[102]: (3, 2)
In [103]: select_id
Out[103]: [0, 1, 1]
In [104]: output_array = np.choose(select_id, input_array.T)
In [105]: output_array
Out[105]: array([ 3, 5, 50])

(because I can't post this as a comment on the accepted answer)
Note that numpy.choose only works if you have 32 or fewer choices (in this case, the dimension of your array along which you're indexing must be of size 32 or smaller). Additionally, the documentation for numpy.choose says
To reduce the chance of misinterpretation, even though the following "abuse" is nominally supported, choices should neither be, nor be thought of as, a single array, i.e., the outermost sequence-like container should be either a list or a tuple.
The OP asks:
Is there a better way to get the output_array from the input_array and select_id?
I would say, the way you originally suggested seems the best out of those presented here. It is easy to understand, scales to large arrays, and is efficient.
Can we get rid of range(input_array.shape[0])?
Yes, as shown by other answers, but the accepted one doesn't work in general so well as what the OP already suggests doing.

I think enumerate is handy.
[input_array[enum, item] for enum, item in enumerate(select_id)]

How about:
[input_array[x,y] for x,y in zip(range(len(input_array[:,0])),select_id)]

Related

Numpy: are bound checks necessary when slicing arrays

If you do e.g. the following:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
print(a[2:10])
Python won't complain and prints the array as in a[2:] which would be great in my usecase. I want to loop through a large array and slice it into equally sized chunks until the array is "used up". The last array can thus be smaller than the rest which doesn't matter to me.
However: I'm concerned about security leaks, performance leaks, the possibility for this behaviour to become deprecated in the near future, etc.. Is it safe and intended to use slicing like this or should it be avoided and I have to go the extra mile to make sure the last chunk is sliced as a[2:] or a[2:len(a)]?
There are related Answers like this but I haven't found anything addressing my concerns
Slice resolution is not done in numpy. slice objects have a convenience method called indices method, which is only documented in the C API under PySlice_GetIndices. In fact the python documentation states that they have no functionality besides storing indices.
When you run a[2:10], the slice object is slice(2, 10), and the length of the axis is a.shape[0] == 5:
>>> slice(2, 10).indices(5)
(2, 5, 1)
This is builtin python behavior, at a lower level than numpy. The linked question has an example of getting an error for the corresponding index:
>>> a[np.arange(2, 10)]
In this case, the passed object is not a slice, so it does get handled by numpy, and raises an error:
IndexError: index 5 is out of bounds for axis 0 with size 5
This is the same error that you would get if you tried accessing the invalid index individually:
>>> a[5]
...
IndexError: index 5 is out of bounds for axis 0 with size 5
Incidentally, python lists and tuples will check the bounds on a scalar index as well:
>>> a.tolist()[5]
...
IndexError: list index out of range
You can implement your own bounds checking, for example to create a fancy index using slice.indices:
>>> a[np.arange(*slice(2, 10).indices(a.shape[0]))]
array([[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])

Is there any way of getting multiple ranges of values in numpy array at once?

Let's say we have a simple 1D ndarray. That is:
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
I want to get the first 3 and the last 2 values, so that the output would be [ 1 2 3 9 10].
I have already solved this by merging and concatenating the merged variables as follows :
b= a[:2]
c= a[-2:]
a=np.concatenate([b,c])
However I would like to know if there is a more direct way to achieve this using slices, such as a[:2 and -2:] for instance. As an alternative I already tried this :
a = a[np.r_[:2, -2:]]
but it not seems to be working. It returns me only the first 2 values that is [1 2] ..
Thanks in advance!
Slicing a numpy array needs to be continuous AFAIK. The np.r_[-2:] does not work because it does not know how big the array a is. You could do np.r_[:2, len(a)-2:len(a)], but this will still copy the data since you are indexing with another array.
If you want to avoid copying data or doing any concatenation operation you could use np.lib.stride_tricks.as_strided:
ds = a.dtype.itemsize
np.lib.stride_tricks.as_strided(a, shape=(2,2), strides=(ds * 8, ds)).ravel()
Output:
array([ 1, 2, 9, 10])
But since you want the first 3 and last 2 values the stride for accessing the elements will not be equal. This is a bit trickier, but I suppose you could do:
np.lib.stride_tricks.as_strided(a, shape=(2,3), strides=(ds * 8, ds)).ravel()[:-1]
Output:
array([ 1, 2, 3, 9, 10])
Although, this is a potential dangerous operation because the last element is reading outside the allocated memory.
In afterthought, I cannot find out a way do this operation without copying the data somehow. The numpy ravel in the code snippets above is forced to make a copy of the data. If you can live with using the shapes (2,2) or (2,3) it might work in some cases, but you will only have reading permission to a strided view and this should be enforced by setting the keyword writeable=False.
You could try to access the elements with a list of indices.
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
b = a[[0,1,2,8,9]] # b should now be array([ 1, 2, 3, 9, 10])
Obviously, if your array is too long, you would not want to type out all the indices.
Thus, you could build the inner index list from for loops.
Something like that:
index_list = [i for i in range(3)] + [i for i in range(8, 10)]
b = a[index_list] # b should now be array([ 1, 2, 3, 9, 10])
Therefore, as long as you know where your desired elements are, you can access them individually.

pythonic way for axis-wise winner-take-all in numpy

I am wondering what the most concise and pythonic way to keep only the maximum element in each line of a 2D numpy array while setting all other elements to zeros. Example:
given the following numpy array:
a = [ [1, 8, 3 ,6],
[5, 5, 60, 1],
[63,9, 9, 23] ]
I want the answer to be:
b = [ [0, 8, 0, 0],
[0, 0, 60, 0],
[63,0, 0, 0 ] ]
I can think of several ways to solve that, but what interests me is whether there are python functions to so this just quickly
Thank you in advance
You can use np.max to take the maximum along one axis, then use np.where to zero out the non-maximal elements:
np.where(a == a.max(axis=1, keepdims=True), a, 0)
The keepdims=True argument keeps the singleton dimension after taking the max (i.e. so that a.max(1, keepdims=True).shape == (3, 1)), which simplifies broadcasting it against a.
Don't know what is pythonic, so I assume the way with most python specific grammar is pythonic.
It used two list comprehension, which is feature of python. but in this way it might not that concise.
b = [[y if y == max(x) else 0 for y in x] for x in a ]

Python numpy array index can be -1?

I'm trying out opencv samples from https://github.com/Itseez/opencv/blob/master/samples/python2/letter_recog.py and I need help deciphering this code..
new_samples = np.zeros((sample_n * self.class_n, var_n+1), np.float32)
new_samples[:,:-1] = np.repeat(samples, self.class_n, axis=0)
new_samples[:,-1] = np.tile(np.arange(self.class_n), sample_n)
I know what np.repeat and np.tile are, but I'm not sure what new_samples[:,:-1] or new_samples[:,-1] are supposed to do, with the -1 index. I know how numpy array indexing works, but have not seen this case. I could not find solutions from searching.
Python slicing and numpy slicing are slightly different. But in general -1 in arrays or lists means counting backwards (from last item). It is mentioned in the Information Introduction for strings as:
>>> word = 'Python'
>>> word[-1] #last character
'n'
And for lists as:
>>> squares = [1, 4, 9, 16, 25]
>>> squares
[1, 4, 9, 16, 25]
>>> squares[-1]
25
This can be also expanded to numpy array indexing as in your example.
new_samples[:,:-1] means all rows except the last columns
new_samples[:,-1] means all rows and last column only

Modifying block matrices in Python

I would like to take a matrix and modify blocks of it. For example, with a 4x4 matrix the {1,2},{1,2} block is to the top left quadrant ([0,1;4,5] below). The {4,1},{4,1} block is the top left quadrant if we rearrange the matrix so the 4th row/column is in position 1 and the 1st in position 2.
Let's made such a 4x4 matrix:
a = np.arange(16).reshape(4, 4)
print(a)
## [[ 0 1 2 3]
## [ 4 5 6 7]
## [ 8 9 10 11]
## [12 13 14 15]]
Now one way of selecting the block, where I specify which rows/columns I want beforehand, is as follows:
C=[3,0]
a[[[C[0],C[0]],[C[1],C[1]]],[[C[0],C[1]],[C[0],C[1]]]]
## array([[15, 12],
## [ 3, 0]])
Here's another way:
a[C,:][:,C]
## array([[15, 12],
## [ 3, 0]])
Yet, if I have a 2x2 array, call it b, setting
a[C,:][:,C]=b
doesn't work but
a[[[C[0],C[0]],[C[1],C[1]]],[[C[0],C[1]],[C[0],C[1]]]]=b
does.
Why is this? And is this second way the most efficient possible? Thanks!
The relevant section from the numpy docs is
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#purely-integer-array-indexing
Advanced array indexing.
Adapting that example to your case:
In [213]: rows=np.array([[C[0],C[0]],[C[1],C[1]]])
In [214]: cols=np.array([[C[0],C[1]],[C[0],C[1]]])
In [215]: rows
array([[3, 3],
[0, 0]])
In [216]: cols
array([[3, 0],
[3, 0]])
In [217]: a[rows,cols]
array([[15, 12],
[ 3, 0]])
due to broadcasting, you don't need to repeat duplicate indices, thus:
a[[[3],[0]],[3,0]]
does just fine. np.ix_ is a convenience function to produce just such a pair:
np.ix_(C,C)
(array([[3],
[0]]),
array([[3, 0]]))
thus a short answer is:
a[np.ix_(C,C)]
A related function is meshgrid, which constructs full indexing arrays:
a[np.meshgrid(C,C,indexing='ij')]
np.meshgrid(C,C,indexing='ij') is the same as your [rows, cols]. See the functions doc for the significance of the 'ij' parameter.
np.meshgrid(C,C,indexing='ij',sparse=True) produces the same pair of arrays as np.ix_.
I don't think there's a serious difference in computational speed. Obviously some require less typing on your part.
a[:,C][C,:] works for viewing values, but not for modifying them. The details have to do with which actions make views and which make copies. The simple answer is, use only one layer of indexing if you want to modify values.
The indexing documentation:
Thus, x[ind1,...,ind2,:] acts like x[ind1][...,ind2,:] under basic slicing.
Thus a[1][3] += 7 works. But the doc also warns
Warning
The above is not true for advanced indexing.

Categories

Resources