Unpack NumPy array by column

If I have a NumPy array, for example 5x3, is there a way to unpack it column by column all at once to pass to a function rather than like this: my_func(arr[:, 0], arr[:, 1], arr[:, 2])?
Kind of like *args for list unpacking but by column.

You can unpack the transpose of the array in order to use the columns for your function arguments:
my_func(*arr.T)
Here's a simple example:
>>> x = np.arange(15).reshape(3, 5).T
>>> x
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
Let's write a function to add the columns together (normally done with x.sum(axis=1) in NumPy):
def add_cols(a, b, c):
    return a + b + c
Then we have:
>>> add_cols(*x.T)
array([15, 18, 21, 24, 27])
NumPy arrays will be unpacked along the first dimension, hence the need to transpose the array.
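To make the last point concrete, here is a minimal sketch (the names `arr`, `rows` and `cols` are illustrative only). Star-unpacking iterates over the first axis, so it yields rows unless the array is transposed first.

```python
import numpy as np

arr = np.arange(15).reshape(5, 3)

# Star-unpacking iterates over the first axis, so this yields the 5 rows.
rows = [*arr]
# After transposing, the columns lie along the first axis, so this yields the 3 columns.
cols = [*arr.T]

print(len(rows), rows[0].shape)  # 5 pieces, each of length 3
print(len(cols), cols[0].shape)  # 3 pieces, each of length 5
```

Each element of `cols` holds the same values as the matching `arr[:, i]`.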

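numpy.split splits an array into multiple sub-arrays. In your case, indices_or_sections is 3 since you have 3 columns, and axis = 1 since we're splitting by column. Unpack the resulting list with * so that each sub-array becomes a separate argument; note that each sub-array keeps the shape (5, 1) rather than becoming a 1-D column.
my_func(*numpy.split(array, 3, 1))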
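As a rough comparison with the transpose approach, the sketch below (using an illustrative array `arr`) shows what numpy.split actually returns: a list of 2-D sub-arrays, not 1-D columns.

```python
import numpy as np

arr = np.arange(15).reshape(5, 3)

# Splitting into 3 sections along axis 1 gives a list of three (5, 1) arrays.
parts = np.split(arr, 3, axis=1)
# Unpacking the transpose gives three 1-D arrays of length 5 instead.
cols = list(arr.T)

print([p.shape for p in parts])  # shapes of the split pieces
print([c.shape for c in cols])   # shapes of the transposed columns

# Dropping the trailing axis of each piece recovers the same values as arr.T.
flat_parts = [p[:, 0] for p in parts]
```

Whether the extra axis matters depends on what the receiving function does with its arguments.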

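If the list returned by numpy.split is then used directly as an index, a plain list will not suffice in the future. Instead, convert it to a tuple first:
my_func(tuple(numpy.split(array, 3, 1)))
When a list is used that way, NumPy currently prints the following warning: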
FutureWarning: Using a non-tuple sequence for multidimensional
indexing is deprecated; use arr[tuple(seq)] instead of arr[seq].
In the future this will be interpreted as an array index,
arr[np.array(seq)], which will result either in an error or a
different result.

Related

Slicing arrays with lists

So, I create a numpy array:
a = np.arange(25).reshape(5,5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
A conventional slice a[1:3,1:3] returns
array([[ 6, 7],
[11, 12]])
as does using a list in the second a[1:3,[1,2]]
array([[ 6, 7],
[11, 12]])
However, a[[1,2],[1,2]] returns
array([ 6, 12])
Obviously I am not understanding something here. That said, slicing with a list might on occasion be very useful.
Cheers,
keng

What you observed is the effect of so-called advanced indexing. Consider the example from the link:
import numpy as np
x = np.array([[1, 2], [3, 4], [5, 6]])
print(x)
[[1 2]
[3 4]
[5 6]]
print(x[[0, 1, 2], [0, 1, 0]]) # [1 4 5]
You can think of this as providing lists of coordinates into the grid, one list per axis, so it is equivalent to:
print(x[0,0]) # 1
print(x[1,1]) # 4
print(x[2,0]) # 5

In the last case, the two individual lists are treated as separate indexing operations (this is really awkward wording so please bear with me).
Numpy sees two lists of two integers and decides that you are therefore asking for two values. The row index of each value comes from the first list, while the column index of each value comes from the second list. Therefore, you get a[1,1] and a[2,2]. The : notation not only expands to the list you've accurately deduced, but also tells numpy that you want all the rows/columns in that range.
If you provide manually curated list indices, they have to be of the same size, because the size of each/any list is the number of elements you'll get back. For example, if you wanted the elements in columns 1 and 2 of rows 1,2,3:
>>> a[1:4,[1,2]]
array([[ 6, 7],
[11, 12],
[16, 17]])
But
>>> a[[1,2,3],[1,2]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
The former tells numpy that you want a range of rows and specific columns, while the latter says "get me the elements at (1,1), (2,2), and (3, hey! what the?! where's the other index?)"
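The "same size" rule is really NumPy's broadcasting rule applied to the index arrays. As a small sketch (the variable names are illustrative), reshaping the row indices into a column lets the two index arrays broadcast to a full grid instead of failing:

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

rows = np.array([1, 2, 3])
cols = np.array([1, 2])

# Shapes (3,) and (2,) cannot be broadcast together, so this raises IndexError.
try:
    a[rows, cols]
    mismatch_raised = False
except IndexError:
    mismatch_raised = True

# A (3, 1) column of row indices broadcasts against (2,) column indices
# to a (3, 2) grid of (row, column) pairs.
block = a[rows[:, None], cols]
print(block)
```

The result has the same values as a[1:4,[1,2]] above.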

a[[1,2],[1,2]] is reading this as, I want a[1,1] and a[2,2]. There are a few ways around this and I likely don't even have the best ways but you could try
a[[1,1,2,2],[1,2,1,2]]
This will give you a flattened version of the slice above.
a[[1,2]][:,[1,2]]
This will give you the correct slice; it works by taking the rows [1,2] and then the columns [1,2].
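Another option, offered here as a suggestion rather than as part of the answer above, is np.ix_, which builds index arrays that broadcast to the full rows-by-columns grid. The sketch below compares it with the two workarounds:

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# np.ix_ turns the two lists into index arrays covering every (row, column) pair.
block = a[np.ix_([1, 2], [1, 2])]

# The two workarounds from the answer, for comparison.
chained = a[[1, 2]][:, [1, 2]]
flat = a[[1, 1, 2, 2], [1, 2, 1, 2]]

print(block)
print(flat.reshape(2, 2))
```

All three hold the same values as the plain slice a[1:3,1:3]; only `flat` needs a reshape.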

It triggers advanced indexing, so the first list holds the row indices and the second holds the column indices. For each row index, it selects the element at the corresponding column index.
a[[1,2], [1,2]] -> [a[1, 1], a[2, 2]] -> [6, 12]

Python Numpy syntax: what does array index as two arrays separated by comma mean?

I don't understand array as index in Python Numpy.
For example, I have a 2d array A in Numpy
[[1,2,3]
[4,5,6]
[7,8,9]
[10,11,12]]
What does A[[1,3], [0,1]] mean?

Just test it for yourself!
A = np.arange(12).reshape(4,3)
print(A)
>>> array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
By indexing the array the way you did (see the docs on indexing), you get the element at row 1, column 0 and the element at row 3, column 1, counting from zero.
A[[1,3], [0,1]]
>>> array([ 3, 10])
I'd highly encourage you to play around with that a bit and have a look at the documentation and the examples.

You are creating a new array:
import numpy as np
A = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]]
A = np.array(A)
print(A[[1, 3], [0, 1]])
# [ 4 11]
See Indexing, Slicing and Iterating in the tutorial.
Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas
Quoting the doc:
def f(x, y):
    return 10 * x + y
b = np.fromfunction(f, (5, 4), dtype=int)
print(b[2, 3])
# -> 23
You can also use a NumPy array as index of an array. See Index arrays in the doc.
NumPy arrays may be indexed with other arrays (or any other sequence-like object that can be converted to an array, such as lists, with the exception of tuples; see the end of this document for why this is). The use of index arrays ranges from simple, straightforward cases to complex, hard-to-understand cases. For all cases of index arrays, what is returned is a copy of the original data, not a view as one gets for slices.
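The copy-versus-view distinction in that quote can be checked directly. A minimal sketch, with illustrative names `b`, `view` and `copy`:

```python
import numpy as np

b = np.arange(10)

view = b[2:5]        # basic slice: a view onto b's data
copy = b[[2, 3, 4]]  # index list: a fresh copy

view[0] = 100        # writes through to b
copy[1] = -1         # leaves b untouched

print(b[2], b[3])
print(np.shares_memory(b, view), np.shares_memory(b, copy))
```

Modifying the slice changes `b`, while modifying the index-array result does not.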

Behavior of np.c_ with list and tuple arguments

The output of np.c_ differs when its arguments are lists or tuples. Consider the output of the three following lines
np.c_[[1,2]]
np.c_[(1,2)]
np.c_[(1,2),]
With a list argument, np.c_ returns a column array, as expected. When the argument is a tuple instead (second line), it returns a 2D row. Adding a comma after the tuple (third line) returns a column array as for the first call.
Can somebody explain the rationale behind this behavior?

There are 2 common use cases for np.c_:
np.c_ can accept a sequence of 1D array-likes:
In [98]: np.c_[[1,2],[3,4]]
Out[98]:
array([[1, 3],
[2, 4]])
or, np.c_ can accept a sequence of 2D array-likes:
In [96]: np.c_[[[1,2],[3,4]], [[5,6],[7,8]]]
Out[96]:
array([[1, 2, 5, 6],
[3, 4, 7, 8]])
So np.c_ can be passed 1D array-likes or 2D array-likes.
But that raises the question how is np.c_ supposed to recognize if the input is a single 2D array-like (e.g. [[1,2],[3,4]]) or a sequence of 1D array-likes (e.g. [1,2], [3,4])?
The developers made a design decision: If np.c_ is passed a tuple, the argument will be treated as a sequence of separate array-likes. If it is passed a non-tuple (such as a list), then that object will be considered a single array-like.
Thus, np.c_[[1,2], [3,4]] (which is equivalent to np.c_[([1,2], [3,4])]) will treat ([1,2], [3,4]) as two separate 1D arrays.
In [99]: np.c_[[1,2], [3,4]]
Out[99]:
array([[1, 3],
[2, 4]])
In contrast, np.c_[[[1,2], [3,4]]] will treat [[1,2], [3,4]] as a single 2D array.
In [100]: np.c_[[[1,2], [3,4]]]
Out[100]:
array([[1, 2],
[3, 4]])
So, for the examples you posted:
np.c_[[1,2]] treats [1,2] as a single 1D array-like, so it makes [1,2] into a column of a 2D array:
In [101]: np.c_[[1,2]]
Out[101]:
array([[1],
[2]])
np.c_[(1,2)] treats (1,2) as 2 separate array-likes, so it places each value into its own column:
In [102]: np.c_[(1,2)]
Out[102]: array([[1, 2]])
np.c_[(1,2),] treats the tuple (1,2), (which is equivalent to ((1,2),)) as a sequence of one array-like, so that array-like is treated as a column:
In [103]: np.c_[(1,2),]
Out[103]:
array([[1],
[2]])
PS. Perhaps more than most packages, NumPy has a history of treating lists and tuples differently, for example when they are passed to np.array.
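As a compact recap of the three calls from the question, this sketch only records the resulting shapes. The comparison with np.column_stack is my own addition, not part of the answer.

```python
import numpy as np

col_from_list = np.c_[[1, 2]]     # list: one 1-D array-like, becomes a column
row_from_tuple = np.c_[(1, 2)]    # tuple: two scalars, one per column
col_from_nested = np.c_[(1, 2),]  # tuple containing one tuple: a column again

print(col_from_list.shape, row_from_tuple.shape, col_from_nested.shape)

# For several 1-D inputs, np.c_ gives the same result as np.column_stack.
stacked = np.c_[[1, 2], [3, 4]]
same_as_column_stack = np.array_equal(stacked, np.column_stack(([1, 2], [3, 4])))
```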

The first level of handling the argument comes from the Python interpreter, which translates a [...] into a call to __getitem__:
In [442]: class Foo():
     ...:     def __getitem__(self, args):
     ...:         print(args)
     ...:
In [443]: Foo()['str']
str
In [444]: Foo()[[1,2]]
[1, 2]
In [445]: Foo()[[1,2],]
([1, 2],)
In [446]: Foo()[(1,2)]
(1, 2)
In [447]: Foo()[(1,2),]
((1, 2),)
np.c_ is an instance of np.lib.index_tricks.AxisConcatenator. Its __getitem__:
# handle matrix builder syntax
if isinstance(key, str):
    ....
    mymat = matrixlib.bmat(...)
    return mymat
if not isinstance(key, tuple):
    key = (key,)
....
for k, item in enumerate(key):
    ....
So except for the np.bmat compatible string, it turns all inputs into a tuple, and then iterates over the elements.
Any of the variations containing [1,2] is the same as ([1,2],), a single element tuple. (1,2) is two elements that will be concatenated. So is ([1,2],[3,4]).
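The tuple-wrapping step can be imitated in a few lines. `KeyCounter` below is a hypothetical helper written for illustration; it copies only that one step and is not how np.c_ is actually implemented.

```python
class KeyCounter:
    """Hypothetical helper: reports how many items a key would be split into."""

    def __getitem__(self, key):
        # Same normalisation as in the excerpt: wrap any non-tuple key in a tuple.
        if not isinstance(key, tuple):
            key = (key,)
        return len(key)


counter = KeyCounter()

n_list = counter[[1, 2]]            # a list is one item
n_tuple = counter[(1, 2)]           # a bare tuple is two items
n_nested = counter[(1, 2),]         # a tuple holding one tuple is one item
n_two_lists = counter[[1, 2], [3, 4]]  # two comma-separated lists are two items

print(n_list, n_tuple, n_nested, n_two_lists)
```

The counts match the number of pieces that np.c_ concatenates in each of the cases discussed above.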
Note that numpy indexing also distinguishes between lists and tuples (though with a few inconsistencies).
In [455]: x=np.arange(24).reshape(2,3,4)
In [456]: x[0,1] # tuple - index for each dim
Out[456]: array([4, 5, 6, 7])
In [457]: x[(0,1)] # same tuple
Out[457]: array([4, 5, 6, 7])
In [458]: x[[0,1]] # list - index for one dim
Out[458]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [459]: x[([0,1],)] # same
....
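The same list-versus-tuple distinction is easiest to see from the result shapes. A small sketch using the array from the session above:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

from_tuple = x[(0, 1)]          # tuple: one index per dimension, same as x[0, 1]
from_list = x[[0, 1]]           # list: several indices along the first dimension
from_wrapped_list = x[([0, 1],)]  # a list inside a one-element tuple: same as the list

print(from_tuple.shape, from_list.shape, from_wrapped_list.shape)
```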

Unexpected behaviour when indexing a 2D np.array with two boolean arrays

two_d = np.array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
first = np.array((True, True, False, False, False))
second = np.array((False, False, False, True, True))
Now, when I enter:
two_d[first, second]
I get:
array([3,9])
which doesn't make a whole lot of sense to me. Can anybody explain that simply?

When given multiple boolean arrays to index with, NumPy pairs up the indices of the True values. The first true value in first is paired with the first true value in second, and so on. NumPy then fetches the elements at each of these (x, y) indices.
This means that two_d[first, second] is equivalent to:
two_d[[0, 1], [3, 4]]
In other words you're retrieving the values at index (0, 3) and index (1, 4); 3 and 9. Note that if the two arrays had different numbers of true values an error would be raised!
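A quick sketch of the error case mentioned above. The mask `third` is a made-up example with three True values, so it cannot be paired with the two True values in `first`.

```python
import numpy as np

two_d = np.arange(25).reshape(5, 5)
first = np.array([True, True, False, False, False])
second = np.array([False, False, False, True, True])
third = np.array([False, False, True, True, True])  # hypothetical mask, three True values

# Two True values each: positions (0, 3) and (1, 4) are paired up.
paired = two_d[first, second]

# Two True values against three: the index arrays cannot be paired.
try:
    two_d[first, third]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(paired)
print(error_message)
```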
The documents on advanced indexing mention this behaviour briefly and suggest np.ix_ as a 'less surprising' alternative:
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
Hence you may be looking for:
>>> two_d[np.ix_(first, second)]
array([[3, 4],
[8, 9]])
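For comparison, here is a sketch of two other ways to reach the same values. The chained form is an alternative I am adding for illustration; it is not taken from the documentation quoted above.

```python
import numpy as np

two_d = np.arange(25).reshape(5, 5)
first = np.array([True, True, False, False, False])
second = np.array([False, False, False, True, True])

# Rows where first is True, crossed with columns where second is True.
block = two_d[np.ix_(first, second)]

# Applying the masks one axis at a time gives the same block.
chained = two_d[first][:, second]

# Pairing the positions of the True values reproduces two_d[first, second].
rows = first.nonzero()[0]
cols = second.nonzero()[0]
paired = two_d[rows, cols]

print(block)
print(paired)
```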

Check the documentation on boolean indexing.
two_d[first, second] is the same as two_d[first.nonzero() + second.nonzero()] (the two one-element tuples concatenated into a single index tuple), where:
>>> first.nonzero()
(array([0, 1]),)
>>> second.nonzero()
(array([3, 4]),)
Used as indices, this will select 3 and 9 because
>>> two_d[0,3]
3
>>> two_d[1,4]
9
and
>>> two_d[[0,1],[3,4]]
array([3, 9])
Also mildly related: NumPy indexing using List?

Numpy array indexing with partial indices

I am trying to pull out a particular slice of a numpy array but don't know how to express it with a tuple of indices. Using a tuple of indices works if its length is the same as the number of dimensions:
ind = (1,2,3)
# these two values are the same
foo[1,2,3]
foo[ind]
But if I want to get a slice along one dimension the same notation doesn't work:
ind = (2,3)
# these two values are not the same
foo[:,2,3]
foo[:,ind]
# and this isn't even proper syntax
foo[:,*ind]
So is there a way to use a named tuple of indices together with slices?

Instead of using the : syntax you can explicitly create the slice object and add that to the tuple:
ind = (2, 3)
s = slice(None) # equivalent to ':'
foo[(s,) + ind] # add s to tuples
In contrast to using foo[:, ind], the result of this should be the same as foo[:,2,3].
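A runnable sketch of the difference, assuming a 3x4x5 array `foo` chosen only for illustration:

```python
import numpy as np

foo = np.arange(60).reshape(3, 4, 5)
ind = (2, 3)

# Build the full index tuple: (slice(None), 2, 3) means foo[:, 2, 3].
full_index = (slice(None),) + ind
picked = foo[full_index]

# foo[:, ind] instead treats ind as a list of positions along axis 1.
fancy = foo[:, ind]

print(picked.shape)  # one value per entry along axis 0
print(fancy.shape)   # positions 2 and 3 of axis 1, all of axis 2
```

`picked` matches foo[:, 2, 3], while `fancy` keeps all three dimensions.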

For accessing 2D arrays...
I believe what you are suggesting should work. Be mindful that numpy arrays index starting from 0. So to pull the first and third column from the following matrix I use column indices 0 and 2.
import numpy as np
foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
ind = (0,2)
foo[:,ind]
For accessing 3D arrays...
3D NumPy arrays are indexed with 3 values, x[i,j,k], where "i" selects one of the stacked 2D matrices; i = 0 gives
[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]]
in my example below. "j" selects a row within these matrices and "k" selects a column. i, j and k can each be :, an integer or a tuple. So we can access particular slices by using two named tuples as follows; note that the two tuples are paired element by element, so this picks positions (1,0) and (2,1) from every matrix.
import numpy as np
foo2 = np.array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
ind1 = (1,2)
ind2 = (0,1)
foo2[:,ind1,ind2]
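To show what that last expression returns, here is a short sketch that rebuilds foo2 with np.arange (same values as the literal above) and compares the result with explicit indexing:

```python
import numpy as np

foo2 = np.arange(27).reshape(3, 3, 3)
ind1 = (1, 2)
ind2 = (0, 1)

# The two tuples are paired element-wise: (row 1, col 0) and (row 2, col 1),
# taken from each of the 3 matrices.
result = foo2[:, ind1, ind2]

# The same values written out explicitly, one column per (row, col) pair.
explicit = np.stack([foo2[:, 1, 0], foo2[:, 2, 1]], axis=1)

print(result)
```

The result has shape (3, 2), not a 3x2x2 block. To get every combination of the rows in ind1 and the columns in ind2, np.ix_ or chained indexing would be needed.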
