Numpy array indexing with partial indices - python

I am trying to pull out a particular slice of a numpy array but don't know how to express it with a tuple of indices. Using a tuple of indices works if its length is the same as the number of dimensions:
ind = (1,2,3)
# these two values are the same
foo[1,2,3]
foo[ind]
But if I want to get a slice along one dimension the same notation doesn't work:
ind = (2,3)
# these two values are not the same
foo[:,2,3]
foo[:,ind]
# and this isn't even proper syntax
foo[:,*ind]
So is there a way to use a named tuple of indices together with slices?

Instead of using the : syntax you can explicitly create the slice object and add that to the tuple:
ind = (2, 3)
s = slice(None) # equivalent to ':'
foo[(s,) + ind] # add s to tuples
In contrast to using foo[:, ind], the result of this should be the same as foo[:,2,3].
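As a quick check of the idea above, the sketch below builds the index tuple from slice(None) and compares it with the literal forms. The contents and shape of foo are assumptions made for illustration; the question does not say what foo holds.

```python
import numpy as np

# Hypothetical 3D array; the question does not specify foo's shape.
foo = np.arange(4 * 5 * 6).reshape(4, 5, 6)

ind = (2, 3)
s = slice(None)  # the object that ':' produces inside brackets

# Slice along the first axis, fixed indices on the other two.
first = foo[(s,) + ind]
assert np.array_equal(first, foo[:, 2, 3])

# The same tuple arithmetic can put the slice anywhere, e.g. in the middle.
middle = foo[(ind[0], s, ind[1])]
assert np.array_equal(middle, foo[2, :, 3])

# foo[:, ind] is a different operation: it picks entries 2 and 3 along axis 1.
print(foo[:, ind].shape)  # (4, 2, 6)
```

On Python 3.11 and later, I believe foo[:, *ind] is also accepted as syntax, but the tuple form works on every version.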

For accessing 2D arrays...
I believe what you are suggesting should work. Be mindful that numpy arrays index starting from 0. So to pull the first and third column from the following matrix I use column indices 0 and 2.
import numpy as np
foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
ind = (0,2)
foo[:,ind]
For accessing 3D arrays...
3D numpy arrays are accessed by 3 values x[i,j,k], where "i" selects one of the stacked matrices; i = 0 gives
[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]]
from my example below. "j" selects the rows of these matrices, and "k" selects their columns. Each of i, j and k can be :, an integer or a tuple. So we can access particular elements by using two named tuples as follows.
import numpy as np
foo2 = np.array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
ind1 = (1,2)
ind2 = (0,1)
foo2[:,ind1,ind2]
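To see what the 3D expression above actually selects, the sketch below rebuilds foo2 with np.arange (an equivalent construction, used here only for brevity) and inspects the result. The two tuples are paired element by element, so this picks positions (1, 0) and (2, 1) from every matrix rather than a rectangular block.

```python
import numpy as np

foo2 = np.arange(27).reshape(3, 3, 3)  # same values as the literal above
ind1 = (1, 2)
ind2 = (0, 1)

picked = foo2[:, ind1, ind2]
print(picked.shape)  # (3, 2)

# Column n of the result is foo2[:, ind1[n], ind2[n]].
assert np.array_equal(picked[:, 0], foo2[:, 1, 0])
assert np.array_equal(picked[:, 1], foo2[:, 2, 1])

# A single pair of integers, as in the question, yields one value per matrix.
single = foo2[(slice(None),) + (2, 0)]
assert np.array_equal(single, foo2[:, 2, 0])
```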

Related

Python: combining and rearranging arrays

I have two arrays:
array1=[[1,2],
[3,4],
[8,9]]
array2=[[11,12],
[13,14],
[19,20]]
How can I combine the arrays to an array which looks like:
array=[array([[1,11],
[3,13],
[8,19]]),
array([[2,12],
[4,14],
[9,20]])]
Thank you in advance!
You can use np.concatenate. The first output array is created by concatenating the first column of array1 and the first column of array2 along axis=1; the second output array is built the same way from the second columns.
Use:
new_arr1 = np.concatenate((array1[:, 0:1], array2[:, 0:1]), axis = 1)
new_arr2 = np.concatenate((array1[:, 1:2], array2[:, 1:2]), axis = 1)
Output:
>>> new_arr1
array([[ 1, 11],
[ 3, 13],
[ 8, 19]])
>>> new_arr2
array([[ 2, 12],
[ 4, 14],
[ 9, 20]])
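A self-contained version of the concatenate approach, assuming the two inputs are first converted to NumPy arrays (the question shows them as nested lists, which do not support [:, 0:1]). np.column_stack is included as an alternative that is not part of the answer above.

```python
import numpy as np

array1 = np.array([[1, 2], [3, 4], [8, 9]])
array2 = np.array([[11, 12], [13, 14], [19, 20]])

# 0:1 and 1:2 keep the column axis, so each piece has shape (3, 1)
# and the pieces can be joined along axis 1.
new_arr1 = np.concatenate((array1[:, 0:1], array2[:, 0:1]), axis=1)
new_arr2 = np.concatenate((array1[:, 1:2], array2[:, 1:2]), axis=1)

# column_stack accepts 1D columns directly.
alt1 = np.column_stack((array1[:, 0], array2[:, 0]))
alt2 = np.column_stack((array1[:, 1], array2[:, 1]))

assert np.array_equal(new_arr1, alt1)
assert np.array_equal(new_arr2, alt2)
print(new_arr1)
print(new_arr2)
```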
If you don't need to keep the original arrays, you can swap the columns in place, which only takes extra memory for one column.
temp = array1[:, 1].copy()
array1[:, 1] = array2[:, 0]
array2[:, 0] = temp
Output:
>>> array1
array([[ 1, 11],
[ 3, 13],
[ 8, 19]])
>>> array2
array([[ 2, 12],
[ 4, 14],
[ 9, 20]])
Use this:
import numpy as np
array_new1 = np.array([[a[0],b[0]] for a,b in zip(array1,array2)])
array_new2 = np.array([[a[1],b[1]] for a,b in zip(array1,array2)])
The zip function takes several iterables (lists) and iterates over their elements in parallel.
So here I iterate over both arrays in parallel and take the element at index 0 of each row to build new array 1. This is done with a list comprehension, which is just another way of writing a for loop.
For new array 2, I iterate over both arrays in parallel again, take the element at index 1 of each row, and combine them with a list comprehension.

Slicing arrays with lists

So, I create a numpy array:
a = np.arange(25).reshape(5,5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
A conventional slice a[1:3,1:3] returns
array([[ 6, 7],
[11, 12]])
as does using a list in the second position, a[1:3,[1,2]]
array([[ 6, 7],
[11, 12]])
However, a[[1,2],[1,2]] returns
array([ 6, 12])
Obviously I am not understanding something here. That said, slicing with a list might on occasion be very useful.
Cheers,
keng
What you observed is the effect of so-called Advanced Indexing. Let's consider the example from the link:
import numpy as np
x = np.array([[1, 2], [3, 4], [5, 6]])
print(x)
[[1 2]
[3 4]
[5 6]]
print(x[[0, 1, 2], [0, 1, 0]]) # [1 4 5]
You can think of this as providing lists of row and column coordinates on the grid, equivalent to
print(x[0,0]) # 1
print(x[1,1]) # 4
print(x[2,0]) # 5
In the last case, the two individual lists are treated as separate indexing operations (this is really awkward wording so please bear with me).
Numpy sees two lists of two integers and decides that you are therefore asking for two values. The row index of each value comes from the first list, while the column index of each value comes from the second list. Therefore, you get a[1,1] and a[2,2]. The : notation not only expands to the list you've accurately deduced, but also tells numpy that you want all the rows/columns in that range.
If you provide manually curated list indices, they have to be of the same size, because the size of each/any list is the number of elements you'll get back. For example, if you wanted the elements in columns 1 and 2 of rows 1,2,3:
>>> a[1:4,[1,2]]
array([[ 6, 7],
[11, 12],
[16, 17]])
But
>>> a[[1,2,3],[1,2]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
The former tells numpy that you want a range of rows and specific columns, while the latter says "get me the elements at (1,1), (2,2), and (3, hey! what the?! where's the other index?)"
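If a rectangular selection from explicit row and column lists is what you want, one option is to give the row list an extra axis so that the two index arrays broadcast to a 2D grid. A minimal sketch on the same 5x5 array (the names rows, cols and block are mine):

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

rows = np.array([1, 2, 3])[:, None]  # shape (3, 1)
cols = np.array([1, 2])              # shape (2,)

# (3, 1) and (2,) broadcast to (3, 2): every row is paired with every column.
block = a[rows, cols]
assert np.array_equal(block, a[1:4, [1, 2]])
print(block)
```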
NumPy reads a[[1,2],[1,2]] as "I want a[1,1] and a[2,2]". There are a few ways around this, and I likely don't even have the best ones, but you could try
a[[1,1,2,2],[1,2,1,2]]
This will give you a flattened version of the slice above.
a[[1,2]][:,[1,2]]
This will give you the correct slice. It works by taking rows [1,2] first and then columns [1,2].
It triggers advanced indexing so first slice is the row index, second is the column index. For each row, it selects the corresponding column.
a[[1,2], [1,2]] -> [a[1, 1], a[2, 2]] -> [6, 12]
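The workarounds above can be compared side by side. The sketch also uses np.ix_, which is not mentioned in the answers but is a NumPy helper that builds broadcastable index arrays from plain lists.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

pairs = a[[1, 2], [1, 2]]             # elements (1, 1) and (2, 2)
flat = a[[1, 1, 2, 2], [1, 2, 1, 2]]  # all four positions, flattened
chained = a[[1, 2]][:, [1, 2]]        # rows first, then columns
grid = a[np.ix_([1, 2], [1, 2])]      # open mesh of rows x columns

assert np.array_equal(chained, a[1:3, 1:3])
assert np.array_equal(grid, a[1:3, 1:3])
assert np.array_equal(flat.reshape(2, 2), grid)
print(pairs)
```

Note that the chained form makes an intermediate copy of the selected rows, so assigning to it does not modify a.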

Python Numpy syntax: what does array index as two arrays separated by comma mean?

I don't understand array as index in Python Numpy.
For example, I have a 2d array A in Numpy
[[1,2,3]
[4,5,6]
[7,8,9]
[10,11,12]]
What does A[[1,3], [0,1]] mean?
Just test it for yourself!
A = np.arange(12).reshape(4,3)
print(A)
>>> array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
By indexing the array the way you did (see the docs on indexing), you get the element at row 1, column 0 and the element at row 3, column 1 (counting from zero).
A[[1,3], [0,1]]
>>> array([ 3, 10])
I'd highly encourage you to play around with that a bit and have a look at the documentation and the examples.
You are creating a new array:
import numpy as np
A = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]]
A = np.array(A)
print(A[[1, 3], [0, 1]])
# [ 4 11]
See Indexing, Slicing and Iterating in the tutorial.
Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas
Quoting the doc:
def f(x,y):
    return 10*x+y
b = np.fromfunction(f, (5, 4), dtype=int)
print(b[2, 3])
# -> 23
You can also use a NumPy array as index of an array. See Index arrays in the doc.
NumPy arrays may be indexed with other arrays (or any other sequence-like object that can be converted to an array, such as lists, with the exception of tuples; see the end of this document for why this is). The use of index arrays ranges from simple, straightforward cases to complex, hard-to-understand cases. For all cases of index arrays, what is returned is a copy of the original data, not a view as one gets for slices.
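The copy-versus-view point in the quoted documentation can be checked directly. This is a small sketch using np.shares_memory on the array from the question; the variable names are mine.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

picked = A[[1, 3], [0, 1]]  # pairs (1, 0) and (3, 1)
assert picked.tolist() == [A[1, 0], A[3, 1]]

sliced = A[1:3]    # basic slicing: a view onto A
fancy = A[[1, 2]]  # index array: a copy

print(np.shares_memory(A, sliced))  # True
print(np.shares_memory(A, fancy))   # False

fancy[0, 0] = -1   # modifies the copy only
assert A[1, 0] == 4
```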

How to take elements along a given axis, given by their indices?

I have a 3D array and I need to "squeeze" it over the last axis, so that I get a 2D array. I need to do it in the following way. For each pair of indices in the first two dimensions, I know the index in the 3rd dimension from which the value should be taken.
For example, I know that if i1 == 2 and i2 == 7 then i3 == 11. It means that out[2,7] = inp[2,7,11]. This mapping from first two dimensions into the third one is given in another 2D array. In other words, I have an array in which on the position 2,7 I have 11 as a value.
So, my question is how to combine these two arrays (3D and 2D) to get the output array (2D).
In [635]: arr = np.arange(24).reshape(2,3,4)
In [636]: idx = np.array([[1,2,3],[0,1,2]])
In [637]: I,J = np.ogrid[:2,:3]
In [638]: arr[I,J,idx]
Out[638]:
array([[ 1, 6, 11],
[12, 17, 22]])
In [639]: arr
Out[639]:
array([[[ 0, 1, 2, 3], # 1
[ 4, 5, 6, 7], # 6
[ 8, 9, 10, 11]], # 11
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
I,J broadcast together to select a (2,3) set of values, matching idx:
In [640]: I
Out[640]:
array([[0],
[1]])
In [641]: J
Out[641]: array([[0, 1, 2]])
This is a generalization to 3d of the easier 2d problem - selecting one item from each row:
In [649]: idx
Out[649]:
array([[1, 2, 3],
[0, 1, 2]])
In [650]: idx[np.arange(2), [0,1]]
Out[650]: array([1, 1])
In fact we could convert the 3d problem into a 2d one:
In [655]: arr.reshape(6,4)[np.arange(6), idx.ravel()]
Out[655]: array([ 1, 6, 11, 12, 17, 22])
Generalizing the original case:
In [55]: arr = np.arange(24).reshape(2,3,4)
In [56]: idx = np.array([[1,2,3],[0,1,2]])
In [57]: IJ = np.ogrid[[slice(i) for i in idx.shape]]
In [58]: IJ
Out[58]:
[array([[0],
[1]]), array([[0, 1, 2]])]
In [59]: (*IJ,idx)
Out[59]:
(array([[0],
[1]]), array([[0, 1, 2]]), array([[1, 2, 3],
[0, 1, 2]]))
In [60]: arr[_]
Out[60]:
array([[ 1, 6, 11],
[12, 17, 22]])
The key is in combining the IJ list of arrays with the idx to make a new indexing tuple. Constructing the tuple is a little messier if idx isn't the last index, but it's still possible. E.g.
In [61]: (*IJ[:-1],idx,IJ[-1])
Out[61]:
(array([[0],
[1]]), array([[1, 2, 3],
[0, 1, 2]]), array([[0, 1, 2]]))
In [62]: arr.transpose(0,2,1)[_]
Out[62]:
array([[ 1, 6, 11],
[12, 17, 22]])
Or, if it's easier, transpose arr so that the idx dimension is last. The key is that the index operation takes a tuple of index arrays, arrays which broadcast against each other to select specific items.
That's what ogrid is doing: creating the arrays that work with idx.
import numpy as np
inp = np.random.random((20, 10, 5)) # simulate some input
i1, i2 = np.indices(inp.shape[:2])
# i3 must have the same 2D shape as i1 and i2; implement whatever mapping
# you want between (i1,i2) and i3
i3 = np.random.randint(0, 5, size=inp.shape[:2])
out = inp[(i1, i2, i3)]
See https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing for more details
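Newer NumPy versions (1.15 and later, as far as I know) also provide np.take_along_axis, which is not used in the answers here but appears to fit this problem. A minimal sketch on the arr and idx from the first answer:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)
idx = np.array([[1, 2, 3], [0, 1, 2]])

# idx needs the same number of dimensions as arr,
# so add a trailing axis of length 1 and drop it again afterwards.
out = np.take_along_axis(arr, idx[..., None], axis=-1)[..., 0]

# Same result as the ogrid-based indexing.
I, J = np.ogrid[:2, :3]
assert np.array_equal(out, arr[I, J, idx])
print(out)
```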
Using numpy.einsum
This can be achieved by a combination of array indexing and usage of numpy.einsum:
>>> numpy.einsum('ijij->ij', inp[:, :, indices])
inp[:, :, indices] creates a four-dimensional array in which, for each pair of indices in the first two dimensions, all entries of the index array are applied to the third dimension. Because the index array is two-dimensional, the result is 4D. However, you only want the entry of the index array that corresponds to the same position in the first two dimensions. This is what the string ijij->ij achieves: it tells einsum to select only those elements where the indices of the 1st and 3rd axes are equal and the indices of the 2nd and 4th axes are equal. Because the last two dimensions (3rd and 4th) were added by the index array, this amounts to selecting only indices[i, j] along the third dimension of inp.
Note that this method can really blow up the memory consumption: inp[:, :, indices] holds (inp.shape[0] * inp.shape[1]) ** 2 elements, so the intermediate array can be far larger than inp when inp.shape[0] * inp.shape[1] is much greater than inp.shape[2].
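A small runnable check of the einsum approach and of the size of the intermediate array. The shapes and the random test data are assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
inp = rng.random((4, 3, 5))                        # hypothetical input
indices = rng.integers(0, 5, size=inp.shape[:2])   # hypothetical mapping

big = inp[:, :, indices]
print(big.shape)  # (4, 3, 4, 3)

# Take the "diagonal" where axis 0 == axis 2 and axis 1 == axis 3.
out = np.einsum('ijij->ij', big)

# Reference result from direct integer-array indexing.
i, j = np.indices(inp.shape[:2])
assert np.array_equal(out, inp[i, j, indices])
assert big.size == (inp.shape[0] * inp.shape[1]) ** 2
```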
Building the indices manually
First we prepare the new index array:
>>> idx = numpy.array(list(
... numpy.ndindex(*inp.shape[:2], 1) # Python 3 syntax
... ))
Then we update the column which corresponds to the third axis:
>>> idx[:, 2] = indices[idx[:, 0], idx[:, 1]]
Now we can select the elements and simply reshape the result:
>>> inp[tuple(idx.T)].reshape(*inp.shape[:2])
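Put together, the three steps above might look like the following sketch. The input shape and the random mapping are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
inp = rng.random((4, 3, 5))                        # hypothetical input
indices = rng.integers(0, 5, size=inp.shape[:2])   # hypothetical mapping

# One row (i, j, 0) per position in the first two dimensions.
idx = np.array(list(np.ndindex(*inp.shape[:2], 1)))
# Replace the placeholder third column with the wanted index.
idx[:, 2] = indices[idx[:, 0], idx[:, 1]]

out = inp[tuple(idx.T)].reshape(inp.shape[:2])

# Reference result from direct integer-array indexing.
i, j = np.indices(inp.shape[:2])
assert np.array_equal(out, inp[i, j, indices])
```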
Using numpy.choose
Note: numpy.choose allows a maximum size of 32 for the axis which is chosen from.
According to this answer and the documentation of numpy.choose we can also use the following:
# First we need to bring the last axis to the front because
# `numpy.choose` chooses from the first axis.
>>> new_inp = numpy.moveaxis(inp, -1, 0)
# Now we can select the elements.
>>> numpy.choose(indices, new_inp)
Although the documentation discourages the use of a single array for the 2nd argument (the choices)
To reduce the chance of misinterpretation, even though the following “abuse” is nominally supported, choices should neither be, nor be thought of as, a single array, i.e., the outermost sequence-like container should be either a list or a tuple.
this seems to be the case only for preventing misunderstandings:
choices : sequence of arrays
Choice arrays. a and all of the choices must be broadcastable to the same shape. If choices is itself an array (not recommended), then its outermost dimension (i.e., the one corresponding to choices.shape[0]) is taken as defining the “sequence”.
So from my point of view there's nothing wrong with using numpy.choose that way, as long as one is aware of what they're doing.
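For completeness, a runnable sketch of the numpy.choose variant on small made-up data. It also passes the choices as a list of 2D arrays, the form the documentation prefers, to confirm that both give the same result here.

```python
import numpy as np

rng = np.random.default_rng(2)
inp = rng.random((4, 3, 5))                        # last axis has 5 entries
indices = rng.integers(0, 5, size=inp.shape[:2])   # hypothetical mapping

choices = np.moveaxis(inp, -1, 0)  # shape (5, 4, 3)
out = np.choose(indices, choices)

# Same call with an explicit list of 2D choice arrays.
out_list = np.choose(indices, list(choices))

# Reference result from direct integer-array indexing.
i, j = np.indices(inp.shape[:2])
assert np.array_equal(out, inp[i, j, indices])
assert np.array_equal(out, out_list)
```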
I believe this should do it:
for i in range(n):
    for j in range(m):
        k = index_mapper[i][j]
        value = input_3d[i][j][k]
        out_2d[i][j] = value
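The loop needs n, m and a preallocated out_2d, which the answer leaves undefined. A self-contained sketch, with made-up data and the assumption that index_mapper is a 2D integer array, also compares the loop against a vectorized equivalent:

```python
import numpy as np

rng = np.random.default_rng(3)
input_3d = rng.random((4, 3, 5))                # hypothetical input
index_mapper = rng.integers(0, 5, size=(4, 3))  # hypothetical mapping

n, m = index_mapper.shape
out_2d = np.empty((n, m), dtype=input_3d.dtype)
for i in range(n):
    for j in range(m):
        out_2d[i, j] = input_3d[i, j, index_mapper[i, j]]

# Vectorized form: row indices as a column, column indices as a row.
vectorized = input_3d[np.arange(n)[:, None], np.arange(m), index_mapper]
assert np.array_equal(out_2d, vectorized)
```

The loop is easy to read but slow for large arrays; the indexed form does the same selection in one call.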

Unpack NumPy array by column

If I have a NumPy array, for example 5x3, is there a way to unpack it column by column all at once to pass to a function rather than like this: my_func(arr[:, 0], arr[:, 1], arr[:, 2])?
Kind of like *args for list unpacking but by column.
You can unpack the transpose of the array in order to use the columns for your function arguments:
my_func(*arr.T)
Here's a simple example:
>>> x = np.arange(15).reshape(3, 5).T
>>> x
array([[ 0, 5, 10],
[ 1, 6, 11],
[ 2, 7, 12],
[ 3, 8, 13],
[ 4, 9, 14]])
Let's write a function to add the columns together (normally done with x.sum(axis=1) in NumPy):
def add_cols(a, b, c):
    return a+b+c
Then we have:
>>> add_cols(*x.T)
array([15, 18, 21, 24, 27])
NumPy arrays will be unpacked along the first dimension, hence the need to transpose the array.
numpy.split splits an array into multiple sub-arrays. In your case, indices_or_sections is 3 since you have 3 columns, and axis = 1 since we're splitting by column.
my_func(numpy.split(array, 3, 1))
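Note that numpy.split returns a list, so passing it as written hands my_func a single argument. To get one argument per column, the list has to be unpacked with *. The sketch below uses a hypothetical three-argument my_func and shows that each piece keeps shape (5, 1), unlike the 1D columns from the transpose approach.

```python
import numpy as np

def my_func(a, b, c):
    # Hypothetical stand-in for the function in the question.
    return a + b + c

arr = np.arange(15).reshape(5, 3)

parts = np.split(arr, 3, axis=1)
print(type(parts), parts[0].shape)  # <class 'list'> (5, 1)

# Unpack the list so each column becomes its own argument.
from_split = my_func(*parts)        # shape (5, 1)
from_transpose = my_func(*arr.T)    # shape (5,)

assert np.array_equal(from_split.ravel(), from_transpose)
```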
I guess numpy.split will not suffice in the future. Instead, it should be
my_func(tuple(numpy.split(array, 3, 1)))
Currently, NumPy prints the following warning:
FutureWarning: Using a non-tuple sequence for multidimensional
indexing is deprecated; use arr[tuple(seq)] instead of arr[seq].
In the future this will be interpreted as an array index,
arr[np.array(seq)], which will result either in an error or a
different result.
