Behavior of np.c_ with list and tuple arguments - python

The output of np.c_ differs depending on whether its argument is a list or a tuple. Consider the output of the following three lines:
np.c_[[1,2]]
np.c_[(1,2)]
np.c_[(1,2),]
With a list argument, np.c_ returns a column array, as expected. When the argument is a tuple instead (second line), it returns a 2D row. Adding a trailing comma after the tuple (third line) again returns a column array, as in the first call.
Can somebody explain the rationale behind this behavior?

There are 2 common use cases for np.c_:
np.c_ can accept a sequence of 1D array-likes:
In [98]: np.c_[[1,2],[3,4]]
Out[98]:
array([[1, 3],
[2, 4]])
or, np.c_ can accept a sequence of 2D array-likes:
In [96]: np.c_[[[1,2],[3,4]], [[5,6],[7,8]]]
Out[96]:
array([[1, 2, 5, 6],
[3, 4, 7, 8]])
So np.c_ can be passed 1D array-likes or 2D array-likes.
But that raises a question: how is np.c_ supposed to recognize whether the input is a single 2D array-like (e.g. [[1,2],[3,4]]) or a sequence of 1D array-likes (e.g. [1,2], [3,4])?
The developers made a design decision: if np.c_ is passed a tuple, the argument is treated as a sequence of separate array-likes. If it is passed a non-tuple (such as a list), that object is considered a single array-like.
Thus, np.c_[[1,2], [3,4]] (which is equivalent to np.c_[([1,2], [3,4])]) will treat ([1,2], [3,4]) as two separate 1D arrays.
In [99]: np.c_[[1,2], [3,4]]
Out[99]:
array([[1, 3],
[2, 4]])
In contrast, np.c_[[[1,2], [3,4]]] will treat [[1,2], [3,4]] as a single 2D array.
In [100]: np.c_[[[1,2], [3,4]]]
Out[100]:
array([[1, 2],
[3, 4]])
So, for the examples you posted:
np.c_[[1,2]] treats [1,2] as a single 1D array-like, so it makes [1,2] into a column of a 2D array:
In [101]: np.c_[[1,2]]
Out[101]:
array([[1],
[2]])
np.c_[(1,2)] treats (1,2) as 2 separate array-likes, so it places each value into its own column:
In [102]: np.c_[(1,2)]
Out[102]: array([[1, 2]])
np.c_[(1,2),] treats the tuple (1,2), (which is equivalent to ((1,2),)) as a sequence of one array-like, so that array-like is treated as a column:
In [103]: np.c_[(1,2),]
Out[103]:
array([[1],
[2]])
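The three cases can be checked side by side by comparing the shapes of the results. This is only a minimal sketch; the variable names are illustrative and not part of the original question.

```python
import numpy as np

# Key is a list: treated as a single 1D array-like, so it becomes a column.
from_list = np.c_[[1, 2]]

# Key is the tuple (1, 2): treated as two separate scalars, one per column.
from_tuple = np.c_[(1, 2)]

# Key is the one-element tuple ((1, 2),): a single 1D array-like again.
from_nested_tuple = np.c_[(1, 2),]

print(from_list.shape)          # (2, 1)
print(from_tuple.shape)         # (1, 2)
print(from_nested_tuple.shape)  # (2, 1)
```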
PS. Perhaps more than most packages, NumPy has a history of treating lists and tuples differently; for instance, lists and tuples are treated differently when passed to np.array.

The first level of argument handling comes from the Python interpreter, which translates a [...] into a call to __getitem__:
In [442]: class Foo():
     ...:     def __getitem__(self,args):
     ...:         print(args)
     ...:
In [443]: Foo()['str']
str
In [444]: Foo()[[1,2]]
[1, 2]
In [445]: Foo()[[1,2],]
([1, 2],)
In [446]: Foo()[(1,2)]
(1, 2)
In [447]: Foo()[(1,2),]
((1, 2),)
np.c_ is an instance of np.lib.index_tricks.AxisConcatenator. Its __getitem__ (abridged) is:
    # handle matrix builder syntax
    if isinstance(key, str):
        ....
        mymat = matrixlib.bmat(...)
        return mymat
    if not isinstance(key, tuple):
        key = (key,)
    ....
    for k, item in enumerate(key):
        ....
So except for the np.bmat compatible string, it turns all inputs into a tuple, and then iterates over the elements.
Any of the variations containing [1,2] is the same as ([1,2],), a single element tuple. (1,2) is two elements that will be concatenated. So is ([1,2],[3,4]).
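The tuple normalization described above can be imitated with a small toy class. This is a sketch under stated assumptions: ToyColumnStacker is a hypothetical name, and the body is a simplified imitation written for illustration, not NumPy's actual implementation (it ignores the string syntax, slices and dtype handling).

```python
import numpy as np

class ToyColumnStacker:
    """Hypothetical, simplified imitation of the key handling; not NumPy's code."""

    def __getitem__(self, key):
        # A non-tuple key is wrapped so that it counts as exactly one item.
        if not isinstance(key, tuple):
            key = (key,)
        pieces = []
        for item in key:
            arr = np.array(item, ndmin=2)   # scalars and 1D input become shape (1, n)
            if np.ndim(item) < 2:
                arr = arr.T                 # turn scalar / 1D input into a column
            pieces.append(arr)
        # Join all pieces along the last axis, i.e. as columns.
        return np.concatenate(pieces, axis=-1)

toy_c = ToyColumnStacker()
print(toy_c[[1, 2]].shape)    # (2, 1)
print(toy_c[(1, 2)].shape)    # (1, 2)
print(toy_c[(1, 2),].shape)   # (2, 1)
```

For these simple inputs the toy class gives the same results as np.c_, which makes the role of the tuple check easier to see.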
Note that numpy indexing also distinguishes between lists and tuples (though with a few inconsistencies).
In [455]: x=np.arange(24).reshape(2,3,4)
In [456]: x[0,1] # tuple - index for each dim
Out[456]: array([4, 5, 6, 7])
In [457]: x[(0,1)] # same tuple
Out[457]: array([4, 5, 6, 7])
In [458]: x[[0,1]] # list - index for one dim
Out[458]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [459]: x[([0,1],)] # same
....
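The list versus tuple distinction in ordinary indexing can be confirmed by looking at result shapes. A minimal sketch using the same x as above:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# A tuple supplies one index per dimension, so two dimensions are consumed.
print(x[(0, 1)].shape)   # (4,)  same as x[0, 1]

# A list is a single index array applied to the first dimension only.
print(x[[0, 1]].shape)   # (2, 3, 4)

# Wrapping the list in a one-element tuple changes nothing.
print(np.array_equal(x[[0, 1]], x[([0, 1],)]))  # True
```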

Related

Why can't numpy.sum sum all elements of arrays with different length?

I have been teaching myself numpy, and according to the numpy manual, numpy.sum sums all the elements of an array or array-like. However, I have noticed that if the arrays have different lengths, numpy.sum concatenates them instead of summing them.
For example:
array_a = [1,2,3,4,5,6] # Same length
array_b = [4,5,6,7,8,9]
np.sum([array_a, array_b])
60
array_a = [1,2,3,4,5] # Different length
array_b = [4,5,6,7,8,9]
np.sum([array_a, array_b])
[1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
Why in the latter, numpy.sum did not sum up all the elements as it is supposed to do?
In [128]: array_a = [1,2,3,4,5,6] # Same length
...: array_b = [4,5,6,7,8,9]
Here you give sum a list:
In [129]: np.sum([array_a, array_b])
Out[129]: 60
The first thing it does is make an array:
In [130]: np.array([array_a, array_b])
Out[130]:
array([[1, 2, 3, 4, 5, 6],
[4, 5, 6, 7, 8, 9]])
60 is the sum of all elements. You can also give sum an axis number:
In [131]: np.sum([array_a, array_b],axis=0)
Out[131]: array([ 5, 7, 9, 11, 13, 15])
In [132]: np.sum([array_a, array_b],axis=1)
Out[132]: array([21, 39])
That's the normal, documented behavior.
ragged
In [133]: array_a = [1,2,3,4,5] # Different length
...: array_b = [4,5,6,7,8,9]
In [135]: x = np.array([array_a, array_b])
<ipython-input-135-5379fc40e73f>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
x = np.array([array_a, array_b])
In [136]: x.shape
Out[136]: (2,)
In [137]: x.dtype
Out[137]: dtype('O')
In [138]: np.sum(x)
Out[138]: [1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
That is summing the lists - same as if we do:
In [139]: array_a + array_b
Out[139]: [1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
Despite the name, array_a is NOT an array.
With object dtype, numpy tries to apply the operator (here add) to the elements. Add for a list is concatenate.
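The statement above can be checked by comparing np.sum on an object array with a plain Python reduction using +. This is a sketch; the object array is built explicitly with np.empty so that it does not depend on how a given NumPy version treats ragged input to np.array, and the names list_a and list_b are illustrative.

```python
import functools
import operator

import numpy as np

list_a = [1, 2, 3, 4, 5]
list_b = [4, 5, 6, 7, 8, 9]

# Build a 1D object array whose two elements are the Python lists themselves.
x = np.empty(2, dtype=object)
x[0] = list_a
x[1] = list_b

numpy_result = np.sum(x)                                          # applies + to the elements
python_result = functools.reduce(operator.add, [list_a, list_b])  # list_a + list_b

print(numpy_result == python_result)  # True
```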
If instead we make a ragged array from arrays:
In [140]: y = np.array([np.array(array_a), np.array(array_b)])
...
In [142]: y
Out[142]: array([array([1, 2, 3, 4, 5]), array([4, 5, 6, 7, 8, 9])], dtype=object)
In [143]: np.sum(y)
Traceback ...
ValueError: operands could not be broadcast together with shapes (5,) (6,)
It's trying to do
In [144]: np.array(array_a) + np.array(array_b)
When learning numpy it's a good idea to focus on the numeric multidimensional arrays, and leave these ragged object dtype arrays to later. There are nuances that aren't obvious from the "normal" array operations. Ragged arrays are very much like lists, and often are the result of user errors. Intentionally making ragged arrays is usually not a useful approach.
Why in the latter, numpy.sum did not sum up all the elements as it is supposed to do?
I cannot explain why it didn't - because it did.
[array_a, array_b] contains two elements - one of them is the list array_a, and the other is the list array_b. The result in the latter case is the same as for array_a + array_b.
In the first case, Numpy detects that it can build a two-dimensional, two-by-six Numpy array from the input data. In the latter case, it cannot build a two-dimensional array - arrays, by their definition, must be rectangular (which is part of why we do not use that name for the built-in Python data structure, but instead call it a list). It can, however, build a one-dimensional array, where each element is a Python list. (Yes, Numpy arrays are allowed to store Python objects; the dtype will be object, and Numpy will not care about the exact type. In the case of numpy.sum, it will just create the appropriate requests to "add" the elements, and the rest is the responsibility of built-in Python definitions.)
Explicit is better than implicit. If you want to work with arrays, then create them in the first place:
array_a = np.array([1,2,3,4,5,6])
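If the goal really is the total of all numbers in lists of different lengths, one option is to flatten the data into a single numeric array first. This is a sketch of one possible approach, not something the answers above prescribe:

```python
import numpy as np

array_a = [1, 2, 3, 4, 5]
array_b = [4, 5, 6, 7, 8, 9]

# Join the ragged pieces into one 1D numeric array, then sum it.
flat = np.concatenate([array_a, array_b])
total = flat.sum()
print(total)  # 54

# Pure-Python equivalent that never builds an array.
python_total = sum(sum(row) for row in [array_a, array_b])
print(python_total)  # 54
```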

numpy array slicing index

import numpy as np
a=np.array([ [1,2,3],[4,5,6],[7,8,9]])
How can I get zeroth index column? Expecting output [[1],[2],[3]] a[...,0] gives 1D array. Maybe next question answers this question.
How to get last 2 columns of a? a[...,1:2] gives second column only, a[...,2:3] gives last 2 columns, but a[...,3] is invalid dimension. So, how does it work?
By the way, operator ... and : have same meaning? a[...,0] and a[:,0] give same output. Can someone comment here?
numpy indexing is built on python list conventions, but extended to multiple dimensions and multi-element indexing. It is powerful but complex; sooner or later you should read the full indexing documentation, which distinguishes between 'basic' and 'advanced' indexing.
Like range and arange, a slice index has an 'open' stop value
In [111]: a = np.arange(1,10).reshape(3,3)
In [112]: a
Out[112]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Indexing with a scalar reduces the dimension, regardless of where:
In [113]: a[1,:]
Out[113]: array([4, 5, 6])
In [114]: a[:,1]
Out[114]: array([2, 5, 8])
That also means a[1,1] returns 5, not np.array([[5]]).
Indexing with a slice preserves the dimension:
In [115]: a[1:2,:]
Out[115]: array([[4, 5, 6]])
so does indexing with a list or array (though this makes a copy, not a view):
In [116]: a[[1],:]
Out[116]: array([[4, 5, 6]])
... is a generalized : - use as many as needed.
In [117]: a[...,[1]]
Out[117]:
array([[2],
[5],
[8]])
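On a 2D array ... and : are easy to confuse because a[...,0] and a[:,0] agree. The difference shows up with more dimensions, as in this small sketch with a 3D array (x is an illustrative name):

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# ... expands to as many : as needed, so the 0 applies to the LAST axis.
print(x[..., 0].shape)   # (2, 3)

# A single : covers only the first axis, so the 0 applies to the SECOND axis.
print(x[:, 0].shape)     # (2, 4)

print(np.array_equal(x[..., 0], x[:, :, 0]))  # True
```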
You can adjust dimensions with newaxis or reshape:
In [118]: a[:,1,np.newaxis]
Out[118]:
array([[2],
[5],
[8]])
Note that trailing : are automatic. a[1] is the same as a[1,:]. But leading ones must be explicit.
List indexing also removes a 'dimension/nesting layer'
In [119]: alist = [[1,2,3],[4,5,6]]
In [120]: alist[0]
Out[120]: [1, 2, 3]
In [121]: alist[0][0]
Out[121]: 1
In [122]: [l[0] for l in alist] # a column equivalent
Out[122]: [1, 4]
import numpy as np
a=np.array([ [1,2,3],[4,5,6],[7,8,9]])
a[:,0] # first column
>>> array([1, 4, 7])
a[0,:] # first row
>>> array([1, 2, 3])
a[:,0:2] # first two columns
>>> array([[1, 2],
[4, 5],
[7, 8]])
a[0:2,:] # first two rows
>>> array([[1, 2, 3],
[4, 5, 6]])
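For the two specific requests in the question (a column that stays 2D, and the last two columns), a slice with an exclusive stop index is enough. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A slice keeps the dimension, so the first column stays a (3, 1) array.
first_col_2d = a[:, 0:1]

# The stop index is exclusive; -2: means "from the second-to-last column on".
last_two_cols = a[:, -2:]   # same as a[:, 1:3]

print(first_col_2d.shape)   # (3, 1)
print(last_two_cols)
# [[2 3]
#  [5 6]
#  [8 9]]
```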

Slicing arrays with lists

So, I create a numpy array:
a = np.arange(25).reshape(5,5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
A conventional slice a[1:3,1:3] returns
array([[ 6, 7],
[11, 12]])
as does using a list in the second position, a[1:3,[1,2]]
array([[ 6, 7],
[11, 12]])
However, a[[1,2],[1,2]] returns
array([ 6, 12])
Obviously I am not understanding something here. That said, slicing with a list might on occasion be very useful.
Cheers,
keng
You observed the effect of so-called advanced indexing. Consider the example from the link:
import numpy as np
x = np.array([[1, 2], [3, 4], [5, 6]])
print(x)
[[1 2]
[3 4]
[5 6]]
print(x[[0, 1, 2], [0, 1, 0]]) # [1 4 5]
You can think of this as providing lists of (Cartesian) grid coordinates, as in
print(x[0,0]) # 1
print(x[1,1]) # 4
print(x[2,0]) # 5
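The pairing of row and column indices can be spelled out with zip. This sketch compares the advanced-indexing result with an explicit element-by-element lookup; rows and cols are illustrative names:

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])
rows = [0, 1, 2]
cols = [0, 1, 0]

# Advanced indexing pairs the two lists element by element.
paired = x[rows, cols]

# The same lookup written as an explicit loop over (row, col) pairs.
looped = np.array([x[r, c] for r, c in zip(rows, cols)])

print(paired)                          # [1 4 5]
print(np.array_equal(paired, looped))  # True
```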
In the last case, the two individual lists are treated as separate indexing operations (this is really awkward wording so please bear with me).
Numpy sees two lists of two integers and decides that you are therefore asking for two values. The row index of each value comes from the first list, while the column index of each value comes from the second list. Therefore, you get a[1,1] and a[2,2]. The : notation not only expands to the list you've accurately deduced, but also tells numpy that you want all the rows/columns in that range.
If you provide manually curated list indices, they have to be of the same size, because the size of each/any list is the number of elements you'll get back. For example, if you wanted the elements in columns 1 and 2 of rows 1,2,3:
>>> a[1:4,[1,2]]
array([[ 6, 7],
[11, 12],
[16, 17]])
But
>>> a[[1,2,3],[1,2]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
The former tells numpy that you want a range of rows and specific columns, while the latter says "get me the elements at (1,1), (2,2), and (3, hey! what the?! where's the other index?)"
a[[1,2],[1,2]] is reading this as, I want a[1,1] and a[2,2]. There are a few ways around this and I likely don't even have the best ways but you could try
a[[1,1,2,2],[1,2,1,2]]
This will give you a flattened version of above
a[[1,2]][:,[1,2]]
This will give you the correct slice; it works by taking the rows [1,2] and then the columns [1,2].
It triggers advanced indexing, so the first list gives the row indices and the second list gives the column indices. Each row index is paired with the column index at the same position.
a[[1,2], [1,2]] -> [a[1, 1], a[2, 2]] -> [6, 12]
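To get the full 2x2 block with lists, the row and column indices have to broadcast against each other instead of pairing up. One way, not mentioned in the answers above, is np.ix_; another is to give the row index as a column-shaped nested list. A sketch:

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# np.ix_ builds index arrays of shapes (2, 1) and (1, 2) that broadcast to (2, 2).
block_ix = a[np.ix_([1, 2], [1, 2])]

# The same idea by hand: a (2, 1) row index broadcast against a (2,) column index.
block_bc = a[[[1], [2]], [1, 2]]

print(block_ix)
# [[ 6  7]
#  [11 12]]
print(np.array_equal(block_ix, block_bc))  # True
```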

Python Numpy syntax: what does array index as two arrays separated by comma mean?

I don't understand array as index in Python Numpy.
For example, I have a 2d array A in Numpy
[[1,2,3]
[4,5,6]
[7,8,9]
[10,11,12]]
What does A[[1,3], [0,1]] mean?
Just test it for yourself!
A = np.arange(12).reshape(4,3)
print(A)
>>> array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
By indexing the array the way you did, you'll get the element at row 1, column 0 and the element at row 3, column 1 (counting from zero).
A[[1,3], [0,1]]
>>> array([ 3, 10])
I'd highly encourage you to play around with that a bit and have a look at the documentation and the examples.
You are creating a new array:
import numpy as np
A = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]]
A = np.array(A)
print(A[[1, 3], [0, 1]])
# [ 4 11]
See Indexing, Slicing and Iterating in the tutorial.
Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas
Quoting the doc:
def f(x,y):
    return 10*x+y
b = np.fromfunction(f, (5, 4), dtype=int)
print(b[2, 3])
# -> 23
You can also use a NumPy array as index of an array. See Index arrays in the doc.
NumPy arrays may be indexed with other arrays (or any other sequence-like object that can be converted to an array, such as lists, with the exception of tuples; see the end of this document for why this is). The use of index arrays ranges from simple, straightforward cases to complex, hard-to-understand cases. For all cases of index arrays, what is returned is a copy of the original data, not a view as one gets for slices.
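The copy versus view point in the quoted passage can be checked with np.shares_memory. A small sketch (sliced and picked are illustrative names):

```python
import numpy as np

A = np.arange(12).reshape(4, 3)

sliced = A[1:3]       # basic slicing returns a view onto A
picked = A[[1, 2]]    # an index array returns a copy

print(np.shares_memory(A, sliced))  # True
print(np.shares_memory(A, picked))  # False

# Writing into the copy leaves the original untouched.
picked[0, 0] = -1
print(A[1, 0])  # 3
```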

How to take elements along a given axis, given by their indices?

I have a 3D array and I need to "squeeze" it over the last axis, so that I get a 2D array. I need to do it in the following way: for each pair of indices in the first two dimensions, I know the index in the 3rd dimension from which the value should be taken.
For example, I know that if i1 == 2 and i2 == 7 then i3 == 11. It means that out[2,7] = inp[2,7,11]. This mapping from first two dimensions into the third one is given in another 2D array. In other words, I have an array in which on the position 2,7 I have 11 as a value.
So, my question is how to combine these two arrays (3D and 2D) to get the output array (2D).
In [635]: arr = np.arange(24).reshape(2,3,4)
In [636]: idx = np.array([[1,2,3],[0,1,2]])
In [637]: I,J = np.ogrid[:2,:3]
In [638]: arr[I,J,idx]
Out[638]:
array([[ 1, 6, 11],
[12, 17, 22]])
In [639]: arr
Out[639]:
array([[[ 0, 1, 2, 3], # 1
[ 4, 5, 6, 7], # 6
[ 8, 9, 10, 11]], # 11
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
I,J broadcast together to select a (2,3) set of values, matching idx:
In [640]: I
Out[640]:
array([[0],
[1]])
In [641]: J
Out[641]: array([[0, 1, 2]])
This is a generalization to 3d of the easier 2d problem - selecting one item from each row:
In [649]: idx
Out[649]:
array([[1, 2, 3],
[0, 1, 2]])
In [650]: idx[np.arange(2), [0,1]]
Out[650]: array([1, 1])
In fact we could convert the 3d problem into a 2d one:
In [655]: arr.reshape(6,4)[np.arange(6), idx.ravel()]
Out[655]: array([ 1, 6, 11, 12, 17, 22])
Generalizing the original case:
In [55]: arr = np.arange(24).reshape(2,3,4)
In [56]: idx = np.array([[1,2,3],[0,1,2]])
In [57]: IJ = np.ogrid[[slice(i) for i in idx.shape]]
In [58]: IJ
Out[58]:
[array([[0],
[1]]), array([[0, 1, 2]])]
In [59]: (*IJ,idx)
Out[59]:
(array([[0],
[1]]), array([[0, 1, 2]]), array([[1, 2, 3],
[0, 1, 2]]))
In [60]: arr[_]
Out[60]:
array([[ 1, 6, 11],
[12, 17, 22]])
The key is in combining the IJ list of arrays with the idx to make a new indexing tuple. Constructing the tuple is a little messier if idx isn't the last index, but it's still possible. E.g.
In [61]: (*IJ[:-1],idx,IJ[-1])
Out[61]:
(array([[0],
[1]]), array([[1, 2, 3],
[0, 1, 2]]), array([[0, 1, 2]]))
In [62]: arr.transpose(0,2,1)[_]
Out[62]:
array([[ 1, 6, 11],
[12, 17, 22]])
Or, if it's easier, transpose arr so that the idx dimension is last. The key is that the index operation takes a tuple of index arrays, arrays which broadcast against each other to select specific items.
That's what ogrid is doing: creating the arrays that work with idx.
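As an alternative that the answer above does not use, np.take_along_axis performs the same kind of selection without building the ogrid arrays by hand. The sketch below assumes a NumPy version that provides np.take_along_axis and checks it against the ogrid approach:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)
idx = np.array([[1, 2, 3], [0, 1, 2]])

# The approach from the answer: open-grid index arrays that broadcast with idx.
I, J = np.ogrid[:2, :3]
via_ogrid = arr[I, J, idx]

# idx needs a trailing length-1 axis so it has as many dimensions as arr;
# the result keeps that axis, so it is dropped again with [..., 0].
via_take = np.take_along_axis(arr, idx[..., np.newaxis], axis=2)[..., 0]

print(via_take)
# [[ 1  6 11]
#  [12 17 22]]
print(np.array_equal(via_ogrid, via_take))  # True
```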
inp = np.random.random((20, 10, 5)) # simulate some input
i1, i2 = np.indices(inp.shape[:2])
i3 = np.random.randint(0, 5, size=inp.shape[:2]) # or implement whatever mapping
# you want between (i1,i2) and i3
out = inp[(i1, i2, i3)]
See https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing for more details
Using numpy.einsum
This can be achieved by a combination of array indexing and usage of numpy.einsum:
>>> numpy.einsum('ijij->ij', inp[:, :, indices])
inp[:, :, indices] creates a four-dimensional array: for each pair of indices in the first two dimensions, every entry of the index array is applied to the third dimension. Because the index array is two-dimensional, the result is 4D. However, you only want the entry of the index array that corresponds to the position in the first two dimensions. This is achieved by the string ijij->ij, which tells einsum to select only those elements where the indices of the 1st and 3rd axes are equal and the indices of the 2nd and 4th axes are equal. Because the last two dimensions (3rd and 4th) were added by the index array, this amounts to selecting only the index indices[i, j] for the third dimension of inp.
Note that this method can really blow up memory consumption: inp[:, :, indices].size is (inp.shape[0] * inp.shape[1]) ** 2, so when inp.shape[:2] is much larger than inp.shape[2] the intermediate array is on the order of inp.size ** 2 elements.
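A concrete run makes the intermediate 4D array and its size visible. This sketch uses small illustrative data (a 2x3x4 inp and a 2x3 indices array), not the data from the question:

```python
import numpy as np

inp = np.arange(24).reshape(2, 3, 4)
indices = np.array([[1, 2, 3], [0, 1, 2]])

# Every entry of indices is applied at every (i, j) position: a 4D intermediate.
expanded = inp[:, :, indices]
print(expanded.shape)  # (2, 3, 2, 3)
print(expanded.size)   # 36, i.e. (2 * 3) ** 2

# Keep only the entries where axis 0 == axis 2 and axis 1 == axis 3.
out = np.einsum('ijij->ij', expanded)
print(out)
# [[ 1  6 11]
#  [12 17 22]]
```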
Building the indices manually
First we prepare the new index array:
>>> idx = numpy.array(list(
... numpy.ndindex(*inp.shape[:2], 1) # Python 3 syntax
... ))
Then we update the column which corresponds to the third axis:
>>> idx[:, 2] = indices[idx[:, 0], idx[:, 1]]
Now we can select the elements and simply reshape the result:
>>> inp[tuple(idx.T)].reshape(*inp.shape[:2])
Using numpy.choose
Note: numpy.choose allows a maximum size of 32 for the axis which is chosen from.
According to this answer and the documentation of numpy.choose we can also use the following:
# First we need to bring the last axis to the front because
# `numpy.choose` chooses from the first axis.
>>> new_inp = numpy.moveaxis(inp, -1, 0)
# Now we can select the elements.
>>> numpy.choose(indices, new_inp)
Although the documentation discourages the use of a single array for the 2nd argument (the choices)
To reduce the chance of misinterpretation, even though the following “abuse” is nominally supported, choices should neither be, nor be thought of as, a single array, i.e., the outermost sequence-like container should be either a list or a tuple.
this seems to be the case only for preventing misunderstandings:
choices : sequence of arrays
Choice arrays. a and all of the choices must be broadcastable to the same shape. If choices is itself an array (not recommended), then its outermost dimension (i.e., the one corresponding to choices.shape[0]) is taken as defining the “sequence”.
So from my point of view there's nothing wrong with using numpy.choose that way, as long as one is aware of what they're doing.
I believe this should do it:
for i in range(n):
    for j in range(m):
        k = index_mapper[i][j]
        value = input_3d[i][j][k]
        out_2d[i][j] = value
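The loop above leaves n, m, index_mapper, input_3d and out_2d undefined. The sketch below fills them in with small hypothetical sample data and checks the loop against a vectorized lookup built with np.indices:

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
n, m, depth = 2, 3, 4
input_3d = np.arange(n * m * depth).reshape(n, m, depth)
index_mapper = np.array([[1, 2, 3], [0, 1, 2]])

# Explicit double loop, as in the answer.
out_2d = np.empty((n, m), dtype=input_3d.dtype)
for i in range(n):
    for j in range(m):
        out_2d[i, j] = input_3d[i, j, index_mapper[i, j]]

# Vectorized equivalent: row and column index grids paired with index_mapper.
rows, cols = np.indices((n, m))
vectorized = input_3d[rows, cols, index_mapper]

print(np.array_equal(out_2d, vectorized))  # True
```

The loop is easy to read and fine for small arrays; the vectorized form avoids the Python-level iteration.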
