NumPy Indexing - all pairwise elements of lists of indices - python

When using slicing in NumPy, you get all pair-wise elements, e.g.:
>>> im = np.arange(1,37).reshape((6, 6))
>>> im[1:6:2,1:6:2]
array([[ 8, 10, 12],
[20, 22, 24],
[32, 34, 36]])
However, when using lists or tuples of indices, this behavior does not seem to hold:
>>> im[(1,3,5),(1,3,5)]
array([ 8, 22, 36])
>>> im[[1,3,5],[1,3,5]]
array([ 8, 22, 36])
Instead, it gets just the diagonal (in this case). This is a problem if you cannot express the indices as slices, for example (1,3,4) and (1,3,6). For those two tuples I would expect to get the elements at (1,1), (1,3), (1,6), (3,1), ...
All the workarounds I can think of involve enumerating every pair of indices explicitly, which is very expensive when extracting large numbers of elements from massive images. In MATLAB, im([1,3,5],[1,3,5]) does what I want. I know there are many tricks in NumPy's indexing and I am probably just missing some subtlety.
For completeness, example workarounds (both build every index pair explicitly):
im[tuple(np.meshgrid([1,3,5], [1,3,5], indexing='ij'))]
im[tuple(zip(*itertools.product([1,3,5], [1,3,5])))].reshape((3,3))  # requires import itertools

Try numpy.ix_:
>>> im[np.ix_((1,3,5),(1,3,5))]
array([[ 8, 10, 12],
[20, 22, 24],
[32, 34, 36]])
Or you can build the broadcastable index arrays yourself:
>>> ix = np.array([1, 3, 5])
>>> iy = np.array([1, 3, 5])
>>> im[ix[:, np.newaxis], iy[np.newaxis, :]]
array([[ 8, 10, 12],
[20, 22, 24],
[32, 34, 36]])
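np.ix_ also handles index sets that cannot be written as slices, which is the case raised in the question. A minimal sketch; the index values (1,3,4) and (1,3,5) are chosen here for illustration and kept within the 6x6 bounds of im:

```python
import numpy as np

im = np.arange(1, 37).reshape((6, 6))
rows, cols = (1, 3, 4), (1, 3, 5)

# np.ix_ returns one index array per axis, shaped so they broadcast into a grid.
r, c = np.ix_(rows, cols)
print(r.shape, c.shape)  # (3, 1) (1, 3)

# Outer-product ("MATLAB-style") selection: every (row, col) pair is taken.
sub = im[r, c]
print(sub)
```

No (row, col) pairs are materialized in Python; the broadcasting happens inside NumPy.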

Is this what you need?
i1 = [1,3,5]
i2 = [1,3,5]
print(im[i1][:, i2].ravel())
Note that the first indexing step, im[i1], creates a temporary array (a copy of the selected rows). If your array is very big, this might be undesirable.

The other answers are correct; this one just explains why this happens.
From the NumPy documentation on indexing:
For an index expression x[obj], advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool).
Your case falls into the third category (im[(1,3,5),(1,3,5)] passes a tuple whose elements are sequences), so it triggers advanced indexing. Later on, the documentation on advanced indexing explains:
Advanced indexes always are broadcast and iterated as one:
result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M], ..., ind_N[i_1, ..., i_M]]
Note that the result shape is identical to the (broadcast) indexing array shapes ind_1, ..., ind_N.
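That is, with 1-D index arrays, result[i_1] == x[ind_1[i_1], ind_2[i_1], ..., ind_N[i_1]]: the k-th elements of all index arrays are paired with each other, which is why you get the diagonal.
The documentation suggests using np.ix_ to achieve behavior similar to basic slicing: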
To achieve a behaviour similar to the basic slicing above, broadcasting can be used. The function ix_ can help with this broadcasting. This is best understood with an example.
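A small check of both rules on the array from the question. This is a sketch; the assertions simply restate the documented semantics:

```python
import numpy as np

im = np.arange(1, 37).reshape((6, 6))
ind_1 = np.array([1, 3, 5])
ind_2 = np.array([1, 3, 5])

# Two 1-D index arrays broadcast to shape (3,), so they are paired element-wise.
diag = im[ind_1, ind_2]
for k in range(3):
    assert diag[k] == im[ind_1[k], ind_2[k]]

# np.ix_ reshapes them to (3, 1) and (1, 3), which broadcast to (3, 3),
# so grid[i, j] == im[ind_1[i], ind_2[j]].
grid = im[np.ix_(ind_1, ind_2)]
for i in range(3):
    for j in range(3):
        assert grid[i, j] == im[ind_1[i], ind_2[j]]

print(diag)        # [ 8 22 36]
print(grid.shape)  # (3, 3)
```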

Related

Delete multiple entries from a list given a list of indices

I have a list of indices and a list of data. The list of indices says which elements should be removed from the list of data. I would like to use the list of indices efficiently, i.e. without loops. Is there a faster way to remove these elements?
Assuming you use numpy, np.delete does exactly what you want:
>>> a = np.array([1, 4, 9, 16, 25, 36])
>>> np.delete(a, [1, 2, 5])
array([ 1, 16, 25])
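If you want a reusable mask, a boolean array gives the same result as np.delete; for a plain Python list without NumPy, a comprehension with a set of indices visits each element once. A sketch with illustrative variable names:

```python
import numpy as np

a = np.array([1, 4, 9, 16, 25, 36])
to_remove = [1, 2, 5]

# Boolean-mask equivalent of np.delete: mark the unwanted positions False.
keep = np.ones(a.shape[0], dtype=bool)
keep[to_remove] = False
print(a[keep])  # [ 1 16 25]

# Plain-list version (no NumPy): a set gives O(1) membership tests.
data = [1, 4, 9, 16, 25, 36]
drop = set(to_remove)
kept = [x for i, x in enumerate(data) if i not in drop]
print(kept)  # [1, 16, 25]
```

The list version still loops in Python, but only once over the data, rather than calling list.remove or del repeatedly.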

Numpy: are bound checks necessary when slicing arrays

If you do e.g. the following:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
print(a[2:10])
Python won't complain and prints the array as in a[2:], which would be great in my use case. I want to loop through a large array and slice it into equally sized chunks until the array is "used up". The last chunk can thus be smaller than the rest, which doesn't matter to me.
However, I'm concerned about security leaks, performance leaks, the possibility of this behaviour becoming deprecated in the near future, etc. Is it safe and intended to use slicing like this, or should it be avoided and I have to go the extra mile to make sure the last chunk is sliced as a[2:] or a[2:len(a)]?
There are related answers like this one, but I haven't found anything addressing my concerns.
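Slice bounds are not resolved by NumPy-specific rules; NumPy follows Python's standard slice semantics. slice objects have a convenience method, indices(length), which is documented in the Python data model reference and has C-API counterparts such as PySlice_GetIndices. The documentation of the built-in slice() even states that slice objects have no other explicit functionality besides storing start, stop and step.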
When you run a[2:10], the slice object is slice(2, 10), and the length of the axis is a.shape[0] == 5:
>>> slice(2, 10).indices(5)
(2, 5, 1)
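The clipping can be checked directly. A sketch using the array from the question, showing that NumPy arrays and Python lists resolve out-of-range slice bounds the same way:

```python
import numpy as np

a = np.arange(1, 16).reshape((5, 3))  # same values as the array in the question
s = slice(2, 10)

# slice.indices(n) clips start/stop to a sequence of length n.
print(s.indices(a.shape[0]))  # (2, 5, 1)

# Arrays and lists both clip the stop value instead of raising IndexError.
assert a[s].shape == (3, 3)
assert list(range(5))[s] == [2, 3, 4]

# Negative and oversized bounds are resolved too.
print(slice(-100, 100, 2).indices(5))  # (0, 5, 2)
```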
This is built-in Python behavior, at a lower level than NumPy. The linked question has an example of getting an error for the corresponding index array:
>>> a[np.arange(2, 10)]
In this case, the passed object is not a slice, so NumPy checks every index in it and raises an error:
IndexError: index 5 is out of bounds for axis 0 with size 5
This is the same error that you would get if you tried accessing the invalid index individually:
>>> a[5]
...
IndexError: index 5 is out of bounds for axis 0 with size 5
Incidentally, Python lists and tuples check the bounds of a scalar index as well:
>>> a.tolist()[5]
...
IndexError: list index out of range
You can implement your own bounds handling, for example by building a fancy index from slice.indices:
>>> a[np.arange(*slice(2, 10).indices(a.shape[0]))]
array([[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])
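For the chunking use case in the question, relying on slice clipping keeps the loop simple. A minimal sketch; iter_chunks is a hypothetical helper name, and each chunk is a basic slice, so it is a view into a rather than a copy:

```python
import numpy as np

def iter_chunks(arr, size):
    """Yield consecutive chunks of `arr` along axis 0; the last may be shorter."""
    for start in range(0, arr.shape[0], size):
        # A stop value past the end is clipped by slicing, so no bounds check is needed.
        yield arr[start:start + size]

a = np.arange(1, 16).reshape((5, 3))
for chunk in iter_chunks(a, 2):
    print(chunk.shape)
# (2, 3)
# (2, 3)
# (1, 3)
```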

Dynamic Python Array Slicing

I am facing a situation where I have a VERY large numpy.ndarray (really, it's an hdf5 dataset) from which I need to extract a subset quickly, because the entire array cannot be held in memory. However, I also do not want to iterate through such an array (even declaring the built-in numpy iterator throws a MemoryError) because my script would take literally days to run.
As such, I'm faced with the situation of iterating through some dimensions of the array so that I can perform array-operations on pared down subsets of the full array. To do that, I need to be able to dynamically slice out a subset of the array. Dynamic slicing means constructing a tuple and passing it.
For example, instead of
my_array[0,0,0]
I might use
my_array[(0,0,0,)]
Here's the problem: if I want to slice out all values along a particular dimension/axis of the array manually, I could do something like
my_array[0,:,0]
> array([1, 4, 7])
However, this does not work if I use a tuple:
my_array[(0,:,0,)]
where I'll get a SyntaxError.
How can I do this when I have to construct the slice dynamically to put something in the brackets of the array?
You can build slices programmatically with Python's built-in slice:
>>> a = np.random.rand(3, 4, 5)
>>> a[0, :, 0]
array([ 0.48054702, 0.88728858, 0.83225113, 0.12491976])
>>> a[(0, slice(None), 0)]
array([ 0.48054702, 0.88728858, 0.83225113, 0.12491976])
The constructor is called as slice(stop) or slice(start, stop[, step]). If only one argument is passed, it is interpreted as the stop value: slice(5) is slice(None, 5, None).
In the example above, : is translated to slice(None, None, None), which can be written as slice(None) and covers the whole axis.
Other slice examples:
:5 -> slice(5)
1:5 -> slice(1, 5)
1: -> slice(1, None)
1::2 -> slice(1, None, 2)
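NumPy also provides np.s_, which converts ordinary bracket syntax into the equivalent index tuple, and the same tuple can be assembled in a loop when the axis is only known at run time. A sketch; the choice of "index 0 on every other axis" is just for illustration:

```python
import numpy as np

a = np.random.rand(3, 4, 5)

# np.s_ turns bracket syntax into the index tuple Python would pass to __getitem__.
idx = np.s_[0, :, 0]
print(idx)  # (0, slice(None, None, None), 0)
assert np.array_equal(a[idx], a[0, :, 0])

# Building the same tuple dynamically: full slice on `axis`, index 0 elsewhere.
axis = 1
dyn = tuple(slice(None) if d == axis else 0 for d in range(a.ndim))
assert dyn == idx
assert a[dyn].shape == (4,)
```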
Okay, I finally found an answer just as someone else did.
Suppose I have array:
my_array[...]
>array(
[[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]]])
I can use the slice object, which apparently is a thing:
sl1 = slice(None)
sl2 = slice(1, 2)
sl3 = slice(None)
my_array[(sl1, sl2, sl3)]
>array(
[[[ 4, 5, 6]],
[[13, 14, 15]]])

2D array indexing in numpy, python [duplicate]

This question already has answers here: NumPy selecting specific column index per row by using a list of indexes
Is there a better way to get the "output_array" from the "input_array" and "select_id" ?
Can we get rid of range(input_array.shape[0])?
>>> input_array = numpy.array([[3, 14], [12, 5], [75, 50]])
>>> select_id = [0, 1, 1]
>>> print(input_array)
[[ 3 14]
 [12  5]
 [75 50]]
>>> output_array = input_array[range(input_array.shape[0]), select_id]
>>> print(output_array)
[ 3  5 50]
You can select from the given array using numpy.choose, which constructs an array from an index array (in your case select_id) and a set of arrays to choose from (in your case the columns of input_array). You first need to transpose input_array so that its columns become the choices. The following shows a small example:
In [101]: input_array
Out[101]:
array([[ 3, 14],
[12, 5],
[75, 50]])
In [102]: input_array.shape
Out[102]: (3, 2)
In [103]: select_id
Out[103]: [0, 1, 1]
In [104]: output_array = np.choose(select_id, input_array.T)
In [105]: output_array
Out[105]: array([ 3, 5, 50])
(because I can't post this as a comment on the accepted answer)
Note that numpy.choose only works if you have 32 or fewer choices (in this case, the dimension of your array along which you're indexing must be of size 32 or smaller). Additionally, the documentation for numpy.choose says
To reduce the chance of misinterpretation, even though the following "abuse" is nominally supported, choices should neither be, nor be thought of as, a single array, i.e., the outermost sequence-like container should be either a list or a tuple.
The OP asks:
Is there a better way to get the output_array from the input_array and select_id?
I would say, the way you originally suggested seems the best out of those presented here. It is easy to understand, scales to large arrays, and is efficient.
Can we get rid of range(input_array.shape[0])?
Yes, as shown by other answers, but the accepted one does not work as generally as what the OP already suggests.
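For comparison, the OP's indexing can be written with np.arange, and np.take_along_axis (available in NumPy 1.15 and later) is another vectorized option. A sketch using the data from the question:

```python
import numpy as np

input_array = np.array([[3, 14], [12, 5], [75, 50]])
select_id = np.array([0, 1, 1])

# The OP's approach: row k is paired with column select_id[k].
out1 = input_array[np.arange(input_array.shape[0]), select_id]

# take_along_axis needs an index array with the same ndim as the input,
# so select_id becomes a (3, 1) column and the (3, 1) result is flattened.
out2 = np.take_along_axis(input_array, select_id[:, None], axis=1).ravel()

print(out1)  # [ 3  5 50]
assert np.array_equal(out1, out2)
```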
I think enumerate is handy.
[input_array[enum, item] for enum, item in enumerate(select_id)]
How about:
[input_array[x,y] for x,y in zip(range(len(input_array[:,0])),select_id)]

Combining slicing and broadcasted indexing for multi-dimensional numpy arrays

I have an N-D numpy array (say, 3x3x3) from which I'd like to extract a sub-array by combining slices and index arrays. For instance:
import numpy as np
A = np.arange(3*3*3).reshape((3,3,3))
i0, i1, i2 = ([0,1], [0,1,2], [0,2])
ind1 = j0, j1, j2 = np.ix_(i0, i1, i2)
ind2 = (j0, slice(None), j2)
B1 = A[ind1]
B2 = A[ind2]
I would expect B1 == B2, but the shapes are actually different:
>>> B1.shape
(2, 3, 2)
>>> B2.shape
(2, 1, 2, 3)
>>> B1
array([[[ 0, 2],
[ 3, 5],
[ 6, 8]],
[[ 9, 11],
[12, 14],
[15, 17]]])
>>> B2
array([[[[ 0, 3, 6],
[ 2, 5, 8]]],
[[[ 9, 12, 15],
[11, 14, 17]]]])
Does anyone understand why? Any idea how I could get B1 by manipulating only the A and ind2 objects? The goal is for this to work for any N-D array, without having to look up the shape of the dimensions I want to keep entirely (I hope I'm clear enough :)). Thanks!!
---EDIT---
To be clearer, I would like to have a function 'fun' such that
A[fun(ind2)] == B1
This is the closest I can get to your specs. I haven't been able to devise a solution that can compute the correct indices without knowing A (or, more precisely, its shape).
import numpy as np

def index(A, s):
    # s is a ';'-separated list of per-axis specs, e.g. "0,1;:;0,2"
    ind = []
    groups = s.split(';')
    for i, group in enumerate(groups):
        if group == ":":
            # a bare ':' selects the whole axis
            ind.append(range(A.shape[i]))
        else:
            ind.append([int(n) for n in group.split(',')])
    return np.ix_(*ind)

A = np.arange(3*3*3).reshape((3,3,3))
ind2 = index(A, "0,1;:;0,2")
print(A[ind2])
A shorter version:
def index2(A,s):return np.ix_(*[range(A.shape[i])if g==":"else[int(n)for n in g.split(',')]for i,g in enumerate(s.split(';'))])
ind3 = index2(A,"0,1;:;0,2")
print(A[ind3])
The index arrays in ind1 have shapes (2,1,1), (1,3,1) and (1,1,2), which broadcast to (2,3,2), the shape of B1. This is a simple case of advanced indexing.
ind2 is a case of (advanced) partial indexing: there are 2 index arrays and 1 slice. The advanced indexing documentation states:
If the indexing subspaces are separated (by slice objects), then the broadcasted indexing space is first, followed by the sliced subspace of x.
In this case, advanced indexing broadcasts the 1st and 3rd indexes, j0 with shape (2,1,1) and j2 with shape (1,1,2), to a (2,1,2) array, and appends the slice dimension at the end, resulting in a (2,1,2,3) array, which is exactly B2.shape.
I explain the reasoning in more detail in https://stackoverflow.com/a/27097133/901925
A way to fix a tuple like ind2 is to expand each slice into an array. I recently saw this done in np.insert.
np.arange(*ind2[1].indices(3))
expands : to [0,1,2]. But the replacement has to have the right shape.
ind = list(ind2)
ind[1] = np.arange(*ind2[1].indices(3)).reshape(1,-1,1)
A[tuple(ind)]  # shape (2,3,2), same as B1
I'm leaving out the details of determining which term is a slice, its dimension, and the relevant reshape. The goal is to reproduce j1.
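Those details can be filled in for the ix_ case. A minimal sketch, assuming every non-slice entry of the index tuple came from np.ix_ (so it is N-D with length 1 on all other axes) and that the shape of A is available; expand_slices is a hypothetical helper, not a NumPy function:

```python
import numpy as np

def expand_slices(index, shape):
    """Replace each slice in an ix_-style index tuple with a broadcastable
    integer array (hypothetical helper; assumes len(index) == len(shape))."""
    ndim = len(index)
    out = []
    for axis, item in enumerate(index):
        if isinstance(item, slice):
            # Resolve start/stop/step against the axis length, as basic slicing would.
            values = np.arange(*item.indices(shape[axis]))
            # Shape it the way np.ix_ would: full length on `axis`, 1 elsewhere.
            new_shape = [1] * ndim
            new_shape[axis] = -1
            out.append(values.reshape(new_shape))
        else:
            out.append(item)
    return tuple(out)

A = np.arange(3 * 3 * 3).reshape((3, 3, 3))
ind1 = np.ix_([0, 1], [0, 1, 2], [0, 2])
j0, _, j2 = ind1
ind2 = (j0, slice(None), j2)

B1 = A[ind1]
B2 = A[expand_slices(ind2, A.shape)]
print(B2.shape)                # (2, 3, 2)
print(np.array_equal(B1, B2))  # True
```

The shape is still needed, because a bare slice(None) carries no information about the axis length.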
If indices were generated by something other than ix_, reshaping this slice could be more difficult. For example
A[np.array([0,1])[None,:,None], :, np.array([0,2])[None,None,:]]  # (1,2,2,3)
A[np.array([0,1])[None,:,None], np.array([0,1,2])[:,None,None], np.array([0,2])[None,None,:]]  # (3,2,2)
The expanded slice has to be compatible with the other arrays under broadcasting.
Swapping axes after indexing is another option. The logic, though, might be more complex.
But in some cases transposing might actually be simpler:
A[np.array([0,1])[:,None], :, np.array([0,2])[None,:]].transpose(2,0,1)  # (3,2,2)
A[np.array([0,1])[:,None], :, np.array([0,2])[None,:]].transpose(0,2,1)  # (2,3,2), same as B1
In restricted indexing cases like this using ix_, it is possible to do the indexing in successive steps.
A[ind1]
is the same as
A[i0][:,i1][:,:,i2]
and, since i1 is the full range,
A[i0][...,i2]
If you only have ind2 available:
A[ind2[0].flatten()][...,ind2[2].flatten()]
In more general contexts you have to know how j0, j1, j2 broadcast with each other, but when they are generated by ix_, the relationship is simple.
I can imagine circumstances in which it would be convenient to assign A1 = A[i0], followed by a variety of actions involving A1, including, but not limited to, A1[...,i2]. You have to be aware of when A1 is a view and when it is a copy.
Another indexing tool is take:
A.take(i0,axis=0).take(i2,axis=2)
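The equivalences described in this answer can be checked directly. A minimal sketch using the arrays from the question (i0, i1, i2 as defined there):

```python
import numpy as np

A = np.arange(3 * 3 * 3).reshape((3, 3, 3))
i0, i1, i2 = [0, 1], [0, 1, 2], [0, 2]
B1 = A[np.ix_(i0, i1, i2)]

# Successive indexing, one axis at a time (each list index makes a copy).
step = A[i0][:, i1][:, :, i2]
# i1 covers the whole middle axis, so that step can be dropped.
short = A[i0][..., i2]
# np.take along axes 0 and 2.
taken = A.take(i0, axis=0).take(i2, axis=2)

for candidate in (step, short, taken):
    assert np.array_equal(candidate, B1)
print(B1.shape)  # (2, 3, 2)
```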
