Advanced Integer slicing when slicing object is an ndarray tuple - python

I understand how
x = np.array([[1, 2], [3, 4], [5, 6]])
y = x[[0,1,2], [0,1,0]]
The output is y = [1 4 5]. This just takes the first list as rows and the second list as columns.
But how does the code below work?
x = np.array([[ 0, 1, 2],[ 3, 4, 5],[ 6, 7, 8],[ 9, 10, 11]])
rows = np.array([[0,0],[3,3]])
cols = np.array([[0,2],[0,2]])
y = x[rows,cols]
This gives the output of :
[[ 0 2]
[ 9 11]]
Can you please explain the logic when using ndarrays as the slicing object? Why does it use a 2D array for both rows and columns? How are the rules different when the slicing object is an ndarray as opposed to a Python list?

We have the following array x:
x = np.array([[1, 2], [3, 4], [5, 6]])
And the indices [0, 1, 2] and [0, 1, 0] which when indexed into x like
x[[0,1,2], [0,1,0]]
gives
[1, 4, 5]
The indices that we used basically translate to the following (row, column) pairs:
[0, 1, 2] & [0, 1, 0] --> [0,0], [1,1], [2,0]
Since we used 1D lists as indices, we get a 1D array as the result.
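The pairing described above can be checked with a small sketch. The names row_idx, col_idx, fancy and paired are only illustrative; the data is the array from the question.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])
row_idx = [0, 1, 2]
col_idx = [0, 1, 0]

# Integer-array indexing pairs the two lists element by element.
fancy = x[row_idx, col_idx]

# The same selection spelled out one (row, col) pair at a time.
paired = np.array([x[r, c] for r, c in zip(row_idx, col_idx)])

print(fancy)   # [1 4 5]
print(paired)  # [1 4 5]
```

Both results are 1D with three elements, one per (row, col) pair.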
With that knowledge, let's look at the next case. Now we have the array x:
x = np.array([[ 0, 1, 2],[ 3, 4, 5],[ 6, 7, 8],[ 9, 10, 11]])
Now the indices are 2D arrays.
rows = np.array([[0,0],[3,3]])
cols = np.array([[0,2],[0,2]])
This when indexed into the array x like:
x[rows,cols]
simply translates to:
[[0,0],[3,3]]
| | | | ====> [[0,0]], [[0,2]], [[3,0]], [[3,2]]
[[0,2],[0,2]]
Now it's easy to see how these four index pairs, when indexed into the array x, give the following result (here it simply returns the corner elements of our array x):
[[ 0, 2]
[ 9, 11]]
Note that in this case we get the result as a 2D array (as opposed to a 1D array in the first case) because our indices rows and cols were themselves 2D arrays (equivalently, lists of lists), whereas in the first case our indices were 1D arrays (equivalently, simple lists without any nesting).
So, if you need a 2D array as the result, give index arrays that are 2D (or that broadcast to a 2D shape).

The easiest way to wrap one's head around this is the following observation: The shape of the output is determined by the shape of the index array, or more precisely the shape resulting from broadcasting all the index arrays together.
Look at it like this: you have an array A of a given shape and another array V of some other shape, and you want to fill A with values from V. What do you need to specify? For each position in A, you need to specify the coordinates of some element in V. Therefore, if V is N-dimensional, you need N index arrays of the same shape as A, or at least broadcastable to that shape. Then you index V by putting these index arrays at their coordinate positions in the [] expression.
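As a rough illustration of the broadcasting rule, here is a sketch that reuses the 4x3 array from the question but gives the row indices as a (2, 1) column and the column indices as a plain length-2 vector. The shapes chosen here are an assumption made for the example, not something from the question.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

# A (2, 1) column of row indices and a (2,) vector of column indices.
rows = np.array([[0], [3]])
cols = np.array([0, 2])

# The two index arrays broadcast to a common (2, 2) shape ...
index_shape = np.broadcast(rows, cols).shape

# ... and that is also the shape of the result.
corners = x[rows, cols]

print(index_shape)  # (2, 2)
print(corners)
# [[ 0  2]
#  [ 9 11]]
```

The result matches the corner selection obtained with the two explicit 2D index arrays, because broadcasting expands rows and cols to exactly those arrays.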

To keep things simple, we'll stay 2D and assume rows.shape == cols.shape. (You can break this rule with broadcasting, but for now we won't.) We'll call this shape (I, J).
Then y = x[rows, cols] is the same as:
y = np.empty((I, J), dtype=x.dtype)
for i in range(I):
    for j in range(J):
        y[i, j] = x[rows[i, j], cols[i, j]]
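The loop above can be extended to the broadcasting case that was set aside. The following is only a sketch: index_by_loop is a hypothetical helper written for this illustration, not a NumPy function, and it handles just two index arrays on a 2D array.

```python
import numpy as np

def index_by_loop(x, rows, cols):
    """Hypothetical helper: emulate x[rows, cols] with an explicit loop."""
    # Expand both index arrays to their common broadcast shape first.
    rows_b, cols_b = np.broadcast_arrays(rows, cols)
    y = np.empty(rows_b.shape, dtype=x.dtype)
    # np.ndindex visits every position of the output, whatever its rank.
    for pos in np.ndindex(*rows_b.shape):
        y[pos] = x[rows_b[pos], cols_b[pos]]
    return y

x = np.arange(12).reshape(4, 3)
rows = np.array([[0], [3]])   # shape (2, 1)
cols = np.array([0, 2])       # shape (2,)

looped = index_by_loop(x, rows, cols)
print(np.array_equal(looped, x[rows, cols]))  # True
```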

Related

Numpy: Swap value in 2D numpy array

I have a 2D numpy array:
arr = np.array(([[6,1,2],
[3,4,5],
[0,7,8]]))
I use another 1D numpy array:
value = np.asarray([9,8,7,6,5,4,3,2,1])
I would like to replace each value of my 2D array with the value found at that index in my 1D array.
For example:
In my 2D array at position (0,0), I have the value 6. I must therefore replace the value at (0,0) with the value at index 6 of my 1D array, which is 3.
So far I have this code:
value = np.asarray([9,8,7,6,5,4,3,2,1])
arr = np.array(([[6,1,2],[3,4,5],[0,7,8]]))
print(arr)
#[[6 1 2]
#[3 4 5]
#[0 7 8]]
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        arr[i,j] = value[arr[i,j]]
print(arr)
#[[3 8 7]
#[6 5 4]
#[9 2 1]]
The problem is that this code takes time on large tables. (10 seconds for an array of size 4096²)
Is there an effective way to solve this problem?
This is very simple, you just need a single command. Numpy automatically takes care of the vectorization.
arr = value[arr]
Here is an example with the data you provided:
>>> value[arr]
array([[3, 8, 7],
[6, 5, 4],
[9, 2, 1]])
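To see that the one-liner matches the loop from the question, here is a small check. The names expected, looked_up and taken are illustrative; np.take is shown only as an equivalent spelling of the same lookup.

```python
import numpy as np

value = np.asarray([9, 8, 7, 6, 5, 4, 3, 2, 1])
arr = np.array([[6, 1, 2], [3, 4, 5], [0, 7, 8]])

# Reference result from the explicit double loop in the question.
expected = arr.copy()
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        expected[i, j] = value[arr[i, j]]

# Vectorized lookup: each entry of arr is used as an index into value.
looked_up = value[arr]

# np.take performs the same lookup.
taken = np.take(value, arr)

print(looked_up)
# [[3 8 7]
#  [6 5 4]
#  [9 2 1]]
```

The result has the shape of arr, since arr is the index array here.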

Python Numpy syntax: what does array index as two arrays separated by comma mean?

I don't understand array as index in Python Numpy.
For example, I have a 2d array A in Numpy
[[1,2,3]
[4,5,6]
[7,8,9]
[10,11,12]]
What does A[[1,3], [0,1]] mean?
Just test it for yourself!
A = np.arange(12).reshape(4,3)
print(A)
>>> array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
By indexing the array the way you did (see the docs on indexing), you get the element at row 1, column 0 and the element at row 3, column 1.
A[[1,3], [0,1]]
>>> array([ 3, 10])
I'd highly encourage you to play around with that a bit and have a look at the documentation and the examples.
You are creating a new array:
import numpy as np
A = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]]
A = np.array(A)
print(A[[1, 3], [0, 1]])
# [ 4 11]
See Indexing, Slicing and Iterating in the tutorial.
Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas
Quoting the doc:
def f(x,y):
    return 10*x+y
b = np.fromfunction(f, (5, 4), dtype=int)
print(b[2, 3])
# -> 23
You can also use a NumPy array as index of an array. See Index arrays in the doc.
NumPy arrays may be indexed with other arrays (or any other sequence-like object that can be converted to an array, such as lists, with the exception of tuples; see the end of this document for why this is). The use of index arrays ranges from simple, straightforward cases to complex, hard-to-understand cases. For all cases of index arrays, what is returned is a copy of the original data, not a view as one gets for slices.
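The copy-versus-view remark in the quoted documentation can be observed directly. This is a minimal sketch; the names sliced and picked are illustrative.

```python
import numpy as np

A = np.arange(12).reshape(4, 3)

sliced = A[1:3]       # basic slicing: a view onto A's data
picked = A[[1, 2]]    # index array: a fresh copy of the same rows

print(np.shares_memory(A, sliced))  # True
print(np.shares_memory(A, picked))  # False

# Writing through the view changes A; writing to the copy does not.
sliced[0, 0] = -1
picked[0, 1] = -2
print(A[1])  # [-1  4  5]
```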

How to take elements along a given axis, given by their indices?

I have a 3D array and I need to "squeeze" it over the last axis, so that I get a 2D array. I need to do it in the following way: for each pair of index values for the first two dimensions, I know the index along the 3rd dimension from which the value should be taken.
For example, I know that if i1 == 2 and i2 == 7 then i3 == 11. It means that out[2,7] = inp[2,7,11]. This mapping from first two dimensions into the third one is given in another 2D array. In other words, I have an array in which on the position 2,7 I have 11 as a value.
So, my question is how to combine these two arrays (3D and 2D) to get the output array (2D).
In [635]: arr = np.arange(24).reshape(2,3,4)
In [636]: idx = np.array([[1,2,3],[0,1,2]])
In [637]: I,J = np.ogrid[:2,:3]
In [638]: arr[I,J,idx]
Out[638]:
array([[ 1, 6, 11],
[12, 17, 22]])
In [639]: arr
Out[639]:
array([[[ 0, 1, 2, 3], # 1
[ 4, 5, 6, 7], # 6
[ 8, 9, 10, 11]], # 11
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
I,J broadcast together to select a (2,3) set of values, matching idx:
In [640]: I
Out[640]:
array([[0],
[1]])
In [641]: J
Out[641]: array([[0, 1, 2]])
This is a generalization to 3d of the easier 2d problem - selecting one item from each row:
In [649]: idx
Out[649]:
array([[1, 2, 3],
[0, 1, 2]])
In [650]: idx[np.arange(2), [0,1]]
Out[650]: array([1, 1])
In fact we could convert the 3d problem into a 2d one:
In [655]: arr.reshape(6,4)[np.arange(6), idx.ravel()]
Out[655]: array([ 1, 6, 11, 12, 17, 22])
Generalizing the original case:
In [55]: arr = np.arange(24).reshape(2,3,4)
In [56]: idx = np.array([[1,2,3],[0,1,2]])
In [57]: IJ = np.ogrid[[slice(i) for i in idx.shape]]
In [58]: IJ
Out[58]:
[array([[0],
[1]]), array([[0, 1, 2]])]
In [59]: (*IJ,idx)
Out[59]:
(array([[0],
[1]]), array([[0, 1, 2]]), array([[1, 2, 3],
[0, 1, 2]]))
In [60]: arr[_]
Out[60]:
array([[ 1, 6, 11],
[12, 17, 22]])
The key is in combining the IJ list of arrays with the idx to make a new indexing tuple. Constructing the tuple is a little messier if idx isn't the last index, but it's still possible. E.g.
In [61]: (*IJ[:-1],idx,IJ[-1])
Out[61]:
(array([[0],
[1]]), array([[1, 2, 3],
[0, 1, 2]]), array([[0, 1, 2]]))
In [62]: arr.transpose(0,2,1)[_]
Out[62]:
array([[ 1, 6, 11],
[12, 17, 22]])
Or, if it's easier, transpose arr so that the idx dimension is last. The key is that the index operation takes a tuple of index arrays, arrays which broadcast against each other to select specific items.
That's what ogrid is doing: creating the arrays that work with idx.
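A different technique from the one above, shown here only as a sketch: np.take_along_axis (available in NumPy 1.15 and later) builds the broadcasting index arrays internally. The variable name picked is illustrative.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)
idx = np.array([[1, 2, 3], [0, 1, 2]])

# take_along_axis needs idx to have the same number of dimensions as arr,
# so add a trailing length-1 axis, select along the last axis, then drop it.
picked = np.take_along_axis(arr, idx[..., np.newaxis], axis=-1)[..., 0]

print(picked)
# [[ 1  6 11]
#  [12 17 22]]
```

This gives the same values as the arr[I, J, idx] selection built with ogrid.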
inp = np.random.random((20, 10, 5)) # simulate some input
i1, i2 = np.indices(inp.shape[:2])
i3 = np.random.randint(0, 5, size=inp.shape[:2]) # or implement whatever mapping
# you want between (i1,i2) and i3
out = inp[(i1, i2, i3)]
See https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing for more details
Using numpy.einsum
This can be achieved by a combination of array indexing and usage of numpy.einsum:
>>> numpy.einsum('ijij->ij', inp[:, :, indices])
inp[:, :, indices] creates a four-dimensional array where, for each of the first two indices (the first two dimensions), all indices of the index array are applied to the third dimension. Because the index array is two-dimensional, this results in a 4D array. However, you only want those entries of the index array which correspond to the positions in the first two dimensions. This is achieved by using the string ijij->ij, which tells einsum to select only those elements where the indices of the 1st and 3rd axes are equal and the indices of the 2nd and 4th axes are equal. Because the last two dimensions (3rd and 4th) were added by the index array, this is the same as selecting only the index indices[i, j] for the third dimension of inp.
Note that this method can really blow up the memory consumption: inp[:, :, indices] has (inp.shape[0] * inp.shape[1]) ** 2 elements, so if inp.shape[:2] is much greater than inp.shape[2], the intermediate array is far larger than inp itself.
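A small sketch of the memory cost, using made-up shapes (6, 5, 3) and random data that are assumptions for this illustration. It compares the einsum route with direct integer-array indexing.

```python
import numpy as np

rng = np.random.default_rng(0)
inp = rng.random((6, 5, 3))
indices = rng.integers(0, inp.shape[2], size=inp.shape[:2])

# Intermediate array: every (i, j) is paired with every entry of indices.
intermediate = inp[:, :, indices]
print(intermediate.shape)  # (6, 5, 6, 5)

# einsum keeps only entries where axes 0 and 2 agree and axes 1 and 3 agree.
via_einsum = np.einsum('ijij->ij', intermediate)

# Direct integer-array indexing allocates only the (6, 5) result.
i1, i2 = np.indices(inp.shape[:2])
direct = inp[i1, i2, indices]

print(np.array_equal(via_einsum, direct))  # True
print(intermediate.size, direct.size)      # 900 30
```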
Building the indices manually
First we prepare the new index array:
>>> idx = numpy.array(list(
... numpy.ndindex(*inp.shape[:2], 1) # Python 3 syntax
... ))
Then we update the column which corresponds to the third axis:
>>> idx[:, 2] = indices[idx[:, 0], idx[:, 1]]
Now we can select the elements and simply reshape the result:
>>> inp[tuple(idx.T)].reshape(*inp.shape[:2])
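Put together on a small concrete input, the three steps look roughly like this. The arrays inp and indices below are sample data chosen for the sketch.

```python
import numpy as np

inp = np.arange(24).reshape(2, 3, 4)
indices = np.array([[1, 2, 3], [0, 1, 2]])

# One row per (i, j) position; the third column starts out as 0.
idx = np.array(list(np.ndindex(*inp.shape[:2], 1)))

# Fill the third column with the wanted index along the last axis.
idx[:, 2] = indices[idx[:, 0], idx[:, 1]]

# tuple(idx.T) yields one index array per axis; index, then restore the 2D shape.
out = inp[tuple(idx.T)].reshape(inp.shape[:2])

print(out)
# [[ 1  6 11]
#  [12 17 22]]
```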
Using numpy.choose
Note: numpy.choose allows a maximum size of 32 for the axis which is chosen from.
According to this answer and the documentation of numpy.choose we can also use the following:
# First we need to bring the last axis to the front because
# `numpy.choose` chooses from the first axis.
>>> new_inp = numpy.moveaxis(inp, -1, 0)
# Now we can select the elements.
>>> numpy.choose(indices, new_inp)
Although the documentation discourages the use of a single array for the 2nd argument (the choices)
To reduce the chance of misinterpretation, even though the following “abuse” is nominally supported, choices should neither be, nor be thought of as, a single array, i.e., the outermost sequence-like container should be either a list or a tuple.
this seems to be the case only for preventing misunderstandings:
choices : sequence of arrays
Choice arrays. a and all of the choices must be broadcastable to the same shape. If choices is itself an array (not recommended), then its outermost dimension (i.e., the one corresponding to choices.shape[0]) is taken as defining the “sequence”.
So from my point of view there's nothing wrong with using numpy.choose that way, as long as one is aware of what they're doing.
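For comparison, here is a sketch that passes the choices as a list of 2D arrays, which is the form the quoted documentation prefers. The sample data and the names choices and chosen are assumptions for this illustration.

```python
import numpy as np

inp = np.arange(24).reshape(2, 3, 4)
indices = np.array([[1, 2, 3], [0, 1, 2]])

# Split the last axis into a list of 2D choice arrays.
choices = [inp[:, :, k] for k in range(inp.shape[2])]

# For each (i, j), pick the value from choices[indices[i, j]].
chosen = np.choose(indices, choices)

print(chosen)
# [[ 1  6 11]
#  [12 17 22]]
```

The result is the same as with the moveaxis variant; only the container holding the choices differs.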
I believe this should do it:
for i in range(n):
    for j in range(m):
        k = index_mapper[i][j]
        value = input_3d[i][j][k]
        out_2d[i][j] = value

Numpy 3d array indexing

I have a 3d numpy array (n_samples x num_components x 2); in the example below, n_samples = 5 and num_components = 7.
I have another array (indices) which is the selected component for each sample which is of shape (n_samples,).
I want to select from the data array given the indices so that the resulting array is n_samples x 2.
The code is below:
import numpy as np
np.random.seed(77)
data=np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])
#how can I select indices from the data array?
For example for data 0, the selected component should be the 0th and for data 1 the selected component should be 1.
Note that I can't use any for loops because I'm using it in Theano and the solution should be solely based on numpy.
Is this what you are looking for?
In [36]: data[np.arange(data.shape[0]),indices,:]
Out[36]:
array([[7, 4],
[7, 3],
[4, 5],
[8, 2],
[5, 8]])
To get component #0, use
data[:, 0]
i.e. we get every entry on axis 0 (samples), and only entry #0 on axis 1 (components), and implicitly everything on the remaining axes.
This can be easily generalized to
data[:, indices]
to select all relevant components.
But what OP really wants is just the diagonal of this array, i.e. (data[0, indices[0]], data[1, indices[1]], ...). The diagonal of a high-dimensional array can be extracted using the diagonal function:
>>> np.diagonal(data[:, indices])
array([[7, 7, 4, 8, 5],
[4, 3, 5, 2, 8]])
(You may need to transpose the result.)
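A sketch of how the transposed diagonal relates to pairing each sample with its component directly. It uses np.random.default_rng instead of the question's np.random.seed, so the values differ from the outputs shown above; only the shapes and the equality matter here.

```python
import numpy as np

rng = np.random.default_rng(77)
data = rng.integers(0, 10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])

# Pair sample k with component indices[k]; the trailing axis is kept whole.
direct = data[np.arange(data.shape[0]), indices]

# np.diagonal puts the diagonal on the last axis, giving shape (2, 5),
# so a transpose is needed to recover (n_samples, 2).
via_diagonal = np.diagonal(data[:, indices]).T

print(direct.shape)                          # (5, 2)
print(np.array_equal(direct, via_diagonal))  # True
```

Note that data[:, indices] first builds a (5, 5, 2) intermediate, whereas the direct form allocates only the (5, 2) result.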
You have a variety of ways to do so, but this is my loop recommendation:
selection = np.array([ datum[indices[k]] for k,datum in enumerate(data)])
The resulting array, selection, has the desired shape.

How to index an np.array with a list of indices in Python

Suppose I have an N-dimensional np.array (or just a list) and a list of N indices. What is the preferred/efficient way to index the array without using loops?
# 4D array with shape of (2, 3, 4, 5)
arr = np.random.random((2, 3, 4, 5))
index = [0, 2, 1, 3]
result = ??? # Equivalent to arr[0, 2, 1, 3]
Additionally, supplying only a 3D index the result should be an array of the last dimension.
index = [0, 2, 1]
result2 = ??? # Equivalent to arr[0, 2, 1]
Please note that I am not able to just index with the usual syntax because the implementation has to handle arrays of different shapes.
I am aware that NumPy supports indexing by an array, but that behaves differently, as it cherry-picks values from the array rather than indexing by dimension (https://docs.scipy.org/doc/numpy/user/basics.indexing.html).
Per the docs:
If one supplies to the index a tuple, the tuple will be interpreted as a list of indices.
Therefore, change index to a tuple:
In [46]: np.allclose(arr[tuple([0,2,1])], arr[0,2,1])
Out[46]: True
In [47]: np.allclose(arr[tuple([0,2,1,3])], arr[0,2,1,3])
Out[47]: True
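Wrapped up for arrays of any dimensionality, the idea might look like the sketch below. index_by_list is a hypothetical helper name, and the array is filled with np.arange instead of random values so the result can be read off.

```python
import numpy as np

def index_by_list(arr, index):
    """Hypothetical helper: treat a list of ints as one index per axis."""
    # A tuple is read as one index per axis; a list would instead trigger
    # integer-array indexing along the first axis.
    return arr[tuple(index)]

arr = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

full = index_by_list(arr, [0, 2, 1, 3])   # a single element
partial = index_by_list(arr, [0, 2, 1])   # the remaining last axis

print(full)           # 48
print(partial.shape)  # (5,)
```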
