If you do e.g. the following:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
print(a[2:10])
Python won't complain and prints the array as in a[2:], which would be great in my use case. I want to loop through a large array and slice it into equally sized chunks until the array is "used up". The last chunk can thus be smaller than the rest, which doesn't matter to me.
However: I'm concerned about subtle bugs, performance costs, the possibility of this behaviour becoming deprecated in the near future, etc. Is it safe and intended to use slicing like this, or should it be avoided, meaning I have to go the extra mile to make sure the last chunk is sliced as a[2:] or a[2:len(a)]?
There are related answers like this one, but I haven't found anything addressing my concerns.
Slice resolution is not done by NumPy. slice objects have a convenience method called indices, which is documented mainly in the C API under PySlice_GetIndices. In fact, the Python documentation states that slice objects have no functionality besides storing indices.
When you run a[2:10], the slice object is slice(2, 10), and the length of the axis is a.shape[0] == 5:
>>> slice(2, 10).indices(5)
(2, 5, 1)
This is builtin python behavior, at a lower level than numpy. The linked question has an example of getting an error for the corresponding index:
>>> a[np.arange(2, 10)]
In this case, the passed object is not a slice, so it does get handled by numpy, and raises an error:
IndexError: index 5 is out of bounds for axis 0 with size 5
This is the same error that you would get if you tried accessing the invalid index individually:
>>> a[5]
...
IndexError: index 5 is out of bounds for axis 0 with size 5
Incidentally, python lists and tuples will check the bounds on a scalar index as well:
>>> a.tolist()[5]
...
IndexError: list index out of range
You can implement your own bounds checking, for example to create a fancy index using slice.indices:
>>> a[np.arange(*slice(2, 10).indices(a.shape[0]))]
array([[ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])
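For the chunking use case in the question, relying on this clamping is therefore safe; a minimal sketch (the chunks helper is just an illustration, not a library function):

```python
import numpy as np

def chunks(arr, size):
    """Yield successive row-chunks of at most `size` rows.
    The final slice may run past len(arr); Python clamps it,
    so no explicit bounds check is needed."""
    for start in range(0, len(arr), size):
        yield arr[start:start + size]

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
pieces = list(chunks(a, 2))
# shapes: (2, 3), (2, 3), (1, 3) -- the last chunk is simply smaller
```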
Related
I have a numpy array a in which I would like to replace some elements. I have the values of the new elements in a tuple/numpy array, and the indexes of the elements of a that need to be replaced in another tuple/numpy array. Below is an example of using plain Python to do what I want. How do I do this efficiently in NumPy?
Example script:
a = np.arange(10)
print( f'a = {a}' )
newvalues = (10, 20, 35)
indexes = (2, 4, 6)
for n, i in enumerate( indexes ):
    a[i] = newvalues[n]
print( f'a = {a}' )
Output:
a = [0 1 2 3 4 5 6 7 8 9]
a = [ 0  1 10  3 20  5 35  7  8  9]
I tried a[indexes]=newvalues but got IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
The list of indices indicating which elements you want to replace should be a Python list (or similar type), not a tuple. Different items in the selection tuple indicate that they should be selected from different axis dimensions.
Therefore, a[(2, 4, 6)] is the same as a[2, 4, 6], which is interpreted as the value at index 2 in the first dimension, index 4 in the second dimension, and index 6 in the third dimension.
The following code works correctly:
indexes = [2, 4, 6]
a[indexes] = newvalues
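Putting it together as a runnable sketch:

```python
import numpy as np

a = np.arange(10)
newvalues = (10, 20, 35)
indexes = [2, 4, 6]   # a list, so NumPy fancy-indexes along axis 0

a[indexes] = newvalues
# a is now [ 0  1 10  3 20  5 35  7  8  9]
```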
See also the page on Indexing from the numpy documentation, specifically the second 'Note' block in the introduction as well as the first 'Warning' under Advanced Indexing:
In Python, x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN]; the latter is just syntactic sugar for the former.
The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
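That distinction can be checked directly (a small sketch with an arbitrary 4×3×4 array):

```python
import numpy as np

x = np.arange(48).reshape(4, 3, 4)

# x[(1, 2, 3)] is syntactic sugar for x[1, 2, 3]: basic indexing
# with one index per axis, returning a single element.
basic = x[(1, 2, 3)]

# x[(1, 2, 3),] passes the tuple as a single index object for axis 0,
# which triggers advanced indexing and selects three subarrays.
fancy = x[(1, 2, 3),]
```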
Coming from Matlab, I am unable to even think of singular data points / variables. Anything I deal with is a matrix / array. After one week of searching and unsuccessful trial and error, I realise that I ABSOLUTELY do NOT get the concept of dealing with matrices in (plain) Python.
I created
In[]: A = [[1,2,3], [9,8,7], [5,5,5]]
In[]: A
Out[]: [[1, 2, 3], [9, 8, 7], [5, 5, 5]]
Trying to extract the vectors in the matrix along the two dimensions:
In[]: A[:][1]
Out[]: [9, 8, 7]
In[]: A[1][:]
Out[]: [9, 8, 7]
'surprisingly' gives the same! There is no way to get a specific column (except, of course, with one-by-one iteration).
Consequently, I am unable to manage merging matrix A with another vector, i.e. extending A with another column. The Matlab-style approach obviously produces something odd:
In[]: B = A, [4,6,8]
In[]: B
Out[]: ([[1, 2, 3], [9, 8, 7], [5, 5, 5]], [4, 6, 8])
Results in something nested, not an extension of A.
Same for
B = [A, [4,6,8]]
Ok, more Python-like:
A.append([11,12,13])
This easily adds a row. But is there a similar way to add a column??
(The frustrating thing is that Python doc gives all kinds of fancy examples but apparently these focus on demonstrating 'pythonic' solutions for one-dimensional lists.)
Coming from MATLAB myself, I understand your point.
The problem is that Python lists are not designed to serve as matrices. When indexing a list, you always work on the top level list elements, e.g. A[:][1] returns all the ([:]) three list elements, namely [1, 2, 3], [9, 8, 7] and [5, 5, 5]. Then you select the second ([1]) element from those, i.e. [9, 8, 7]. A[1][:] does the same, just the other way round.
This being said, you can still use nested lists for simple indexing tasks, as A[1][1] gives the expected result (8). However, if you are planning to migrate your whole MATLAB code to Python or work on non-trivial matrix problems, you should definitely consider using NumPy. There is even a NumPy guide for former MATLAB users.
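For example, the column access and the column append that are awkward with nested lists become one-liners in NumPy (a sketch using the A from the question; np.column_stack is one of several ways to append a column):

```python
import numpy as np

A = np.array([[1, 2, 3], [9, 8, 7], [5, 5, 5]])

col = A[:, 1]                        # second column: [2 8 5]
B = np.column_stack((A, [4, 6, 8]))  # A extended by a new column
```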
I am facing a situation where I have a VERY large numpy.ndarray (really, it's an hdf5 dataset) that I need to find a subset of quickly, because the entire array cannot be held in memory. However, I also do not want to iterate through such an array (even creating the built-in numpy iterator throws a MemoryError), because my script would take literally days to run.
As such, I'm faced with the situation of iterating through some dimensions of the array so that I can perform array-operations on pared down subsets of the full array. To do that, I need to be able to dynamically slice out a subset of the array. Dynamic slicing means constructing a tuple and passing it.
For example, instead of
my_array[0,0,0]
I might use
my_array[(0,0,0,)]
Here's the problem: if I want to slice out all values along a particular dimension/axis of the array manually, I could do something like
my_array[0,:,0]
> array([1, 4, 7])
However, this does not work if I use a tuple:
my_array[(0,:,0,)]
where I'll get a SyntaxError.
How can I do this when I have to construct the slice dynamically to put something in the brackets of the array?
You could slice dynamically using Python's slice:
>>> a = np.random.rand(3, 4, 5)
>>> a[0, :, 0]
array([ 0.48054702, 0.88728858, 0.83225113, 0.12491976])
>>> a[(0, slice(None), 0)]
array([ 0.48054702, 0.88728858, 0.83225113, 0.12491976])
The slice constructor reads as slice(start, stop[, step]). If only one argument is passed, it is interpreted as slice(stop), with start defaulting to None.
In the example above, : translates to slice(None, None), which is equivalent to slice(None).
Other slice examples:
:5 -> slice(5)
1:5 -> slice(1, 5)
1: -> slice(1, None)
1::2 -> slice(1, None, 2)
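These equivalences can be verified against a small array; NumPy's np.s_ helper also builds the same slice objects from the familiar colon syntax:

```python
import numpy as np

a = np.arange(10)
assert np.array_equal(a[:5],   a[slice(5)])
assert np.array_equal(a[1:5],  a[slice(1, 5)])
assert np.array_equal(a[1:],   a[slice(1, None)])
assert np.array_equal(a[1::2], a[slice(1, None, 2)])

# np.s_ turns slice syntax into slice objects / index tuples:
idx = np.s_[1::2]       # slice(1, None, 2)
```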
Okay, I finally found an answer just as someone else did.
Suppose I have array:
my_array[...]
>array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]])
I can use the slice object, which apparently is a thing:
sl1 = slice( None )
sl2 = slice( 1,2 )
sl3 = slice( None )
my_array[(sl1, sl2, sl3)]
>array([[[ 4,  5,  6]],

       [[13, 14, 15]]])
I'm trying out opencv samples from https://github.com/Itseez/opencv/blob/master/samples/python2/letter_recog.py and I need help deciphering this code:
new_samples = np.zeros((sample_n * self.class_n, var_n+1), np.float32)
new_samples[:,:-1] = np.repeat(samples, self.class_n, axis=0)
new_samples[:,-1] = np.tile(np.arange(self.class_n), sample_n)
I know what np.repeat and np.tile are, but I'm not sure what new_samples[:,:-1] or new_samples[:,-1] are supposed to do, with the -1 index. I know how numpy array indexing works, but have not seen this case. I could not find solutions from searching.
Python slicing and numpy slicing are slightly different, but in general -1 in arrays or lists means counting backwards from the last item. It is mentioned in the informal introduction of the Python tutorial for strings as:
>>> word = 'Python'
>>> word[-1] #last character
'n'
And for lists as:
>>> squares = [1, 4, 9, 16, 25]
>>> squares
[1, 4, 9, 16, 25]
>>> squares[-1]
25
This can be also expanded to numpy array indexing as in your example.
new_samples[:,:-1] means all rows, and all columns except the last
new_samples[:,-1] means all rows, and the last column only
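A quick sketch with a small array makes both spellings concrete:

```python
import numpy as np

new_samples = np.arange(12).reshape(3, 4)

left = new_samples[:, :-1]   # all rows, every column but the last
last = new_samples[:, -1]    # all rows, last column only
```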
Is there a better way to get the "output_array" from the "input_array" and "select_id" ?
Can we get rid of range( input_array.shape[0] ) ?
>>> input_array = numpy.array( [ [3,14], [12, 5], [75, 50] ] )
>>> select_id = [0, 1, 1]
>>> print input_array
[[ 3 14]
 [12  5]
 [75 50]]
>>> output_array = input_array[ range( input_array.shape[0] ), select_id ]
>>> print output_array
[ 3 5 50]
You can choose from a given array using numpy.choose, which constructs an array from an index array (in your case select_id) and a set of arrays (in your case input_array) to choose from. However, you may first need to transpose input_array to match dimensions. The following shows a small example:
In [101]: input_array
Out[101]:
array([[ 3, 14],
       [12,  5],
       [75, 50]])
In [102]: input_array.shape
Out[102]: (3, 2)
In [103]: select_id
Out[103]: [0, 1, 1]
In [104]: output_array = np.choose(select_id, input_array.T)
In [105]: output_array
Out[105]: array([ 3, 5, 50])
(because I can't post this as a comment on the accepted answer)
Note that numpy.choose only works if you have 32 or fewer choices (in this case, the dimension of your array along which you're indexing must be of size 32 or smaller). Additionally, the documentation for numpy.choose says
To reduce the chance of misinterpretation, even though the following "abuse" is nominally supported, choices should neither be, nor be thought of as, a single array, i.e., the outermost sequence-like container should be either a list or a tuple.
The OP asks:
Is there a better way to get the output_array from the input_array and select_id?
I would say, the way you originally suggested seems the best out of those presented here. It is easy to understand, scales to large arrays, and is efficient.
Can we get rid of range(input_array.shape[0])?
Yes, as shown by other answers, but the accepted one doesn't generalize as well as what the OP already suggests doing.
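For reference, the OP's approach spelled out with np.arange instead of range (a sketch):

```python
import numpy as np

input_array = np.array([[3, 14], [12, 5], [75, 50]])
select_id = [0, 1, 1]

# One row index per row, paired element-wise with the column index.
output_array = input_array[np.arange(input_array.shape[0]), select_id]
```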
I think enumerate is handy.
[input_array[enum, item] for enum, item in enumerate(select_id)]
How about:
[input_array[x,y] for x,y in zip(range(len(input_array[:,0])),select_id)]