Numpy 2D array indexing by other 2D along specific axis - python

I have a 2D array:
>>> in_arr = np.array([[1,2],[4,3]])
array([[1, 2],
[4, 3]])
and I find the sorted indices by columns to yield another 2D array:
>>> col_sort = np.argsort(in_arr, axis=1)
array([[0, 1],
[1, 0]])
I would like to know the efficient numpy slice to index the first by the second:
>>> reordered_in_arr = np.*SOME_SLICE_METHOD*(in_arr, col_sort, axis=1)
array([[1, 2],
[3, 4]])
The intention is to then perform a (more complicated) function on the array by column, e.g.:
>>> arr_with_function = reordered_in_arr ** np.array([1,2])
array([[1, 4],
[3, 16]])
and return the elements to their original position in the array
>>> return_order = np.argsort(col_sort, axis=1)
>>> reordered_in_arr = np.*SOME_SLICE_METHOD*(arr_with_function, return_order, axis=1)
array([[1, 4],
[16, 3]])
OK, thinking about it as I type, I might just use apply_over_axis, but I would still like to know how to do the above efficiently in case it is of value later.

If you want to do all those operations in-place then you don't need argsort(). Numpy supports in-place operations in such situations:
In [12]: in_arr = np.array([[1,2],[4,3]])
In [13]: in_arr.sort(axis=1)
In [14]: in_arr **= [1, 2]
In [15]: in_arr
Out[15]:
array([[ 1, 4],
[ 3, 16]])
But if you need the indices of the sorted items, you can get the expected result with simple integer-array indexing: a column of row indices broadcast against col_sort.
In [18]: in_arr[np.arange(2)[:,None], col_sort]
Out[18]:
array([[1, 2],
[3, 4]])
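For the full round trip asked about in the question (sort, apply the function, put the elements back), here is a minimal sketch. It assumes NumPy 1.15 or later so that np.take_along_axis is available; the names reordered, powered and restored are only illustrative.

```python
import numpy as np

in_arr = np.array([[1, 2], [4, 3]])

# Per-row sort order, and the inverse permutation that undoes it.
col_sort = np.argsort(in_arr, axis=1)
return_order = np.argsort(col_sort, axis=1)

# Reorder each row by its own row of indices
# (equivalent to in_arr[np.arange(2)[:, None], col_sort]).
reordered = np.take_along_axis(in_arr, col_sort, axis=1)

# Apply the column-wise function to the sorted values.
powered = reordered ** np.array([1, 2])

# Send every element back to its original position.
restored = np.take_along_axis(powered, return_order, axis=1)
print(restored)
```

The argsort of an argsort gives the inverse permutation, which is why return_order restores the original layout.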

Related

numpy array slicing index

import numpy as np
a=np.array([ [1,2,3],[4,5,6],[7,8,9]])
How can I get the zeroth column? I expect [[1],[4],[7]], but a[...,0] gives a 1D array. Maybe the next question answers this one.
How do I get the last 2 columns of a? a[...,1:2] gives the second column only and a[...,2:3] gives the last column only, but a[...,3] is an invalid index. So, how does it work?
By the way, do the operators ... and : have the same meaning? a[...,0] and a[:,0] give the same output. Can someone comment on this?
numpy indexing is built on Python list conventions, but extended to multiple dimensions and multi-element indexing. It is powerful but complex, so sooner or later you should read the full indexing documentation, which distinguishes between 'basic' and 'advanced' indexing.
Like range and arange, a slice index has an 'open' stop value: the stop itself is excluded.
In [111]: a = np.arange(1,10).reshape(3,3)
In [112]: a
Out[112]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
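To make the open stop value concrete for the columns asked about, here is a small sketch (the variable names are just for illustration):

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

# A one-element slice keeps both dimensions: shape (3, 1).
first_col = a[:, 0:1]

# Stop index 3 is excluded, so this selects columns 1 and 2.
last_two = a[:, 1:3]

# The same two columns, counted from the end.
last_two_neg = a[:, -2:]

print(first_col.shape, last_two.shape)
```

A slice stop may equal the axis length (3 here) because it is never actually read, whereas the scalar index 3 is out of bounds.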
Indexing with a scalar reduces the dimension, regardless of where:
In [113]: a[1,:]
Out[113]: array([4, 5, 6])
In [114]: a[:,1]
Out[114]: array([2, 5, 8])
That also means a[1,1] returns 5, not np.array([[5]]).
Indexing with a slice preserves the dimension:
In [115]: a[1:2,:]
Out[115]: array([[4, 5, 6]])
so does indexing with a list or array (though this makes a copy, not a view):
In [116]: a[[1],:]
Out[116]: array([[4, 5, 6]])
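The view/copy difference can be checked directly. This is a sketch using np.shares_memory; the names row_view and row_copy are made up for the example:

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

row_view = a[1:2, :]   # basic (slice) indexing
row_copy = a[[1], :]   # advanced (list) indexing

shares_view = np.shares_memory(a, row_view)
shares_copy = np.shares_memory(a, row_copy)

row_view[0, 0] = 40    # writes through to a
row_copy[0, 1] = 50    # leaves a untouched

print(shares_view, shares_copy)
print(a[1])
```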
... is a generalized : - use as many as needed.
In [117]: a[...,[1]]
Out[117]:
array([[2],
[5],
[8]])
You can adjust dimensions with newaxis or reshape:
In [118]: a[:,1,np.newaxis]
Out[118]:
array([[2],
[5],
[8]])
Note that trailing : are automatic. a[1] is the same as a[1,:]. But leading ones must be explicit.
List indexing also removes a 'dimension/nesting layer'
In [119]: alist = [[1,2,3],[4,5,6]]
In [120]: alist[0]
Out[120]: [1, 2, 3]
In [121]: alist[0][0]
Out[121]: 1
In [122]: [l[0] for l in alist] # a column equivalent
Out[122]: [1, 4]
>>> import numpy as np
>>> a = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a[:,0] # first column
array([1, 4, 7])
>>> a[0,:] # first row
array([1, 2, 3])
>>> a[:,0:2] # first two columns
array([[1, 2],
       [4, 5],
       [7, 8]])
>>> a[0:2,:] # first two rows
array([[1, 2, 3],
       [4, 5, 6]])

using boolean array for indexing in numpy for 2D arrays

I use boolean indexing to select elements from a numpy array as
x = y[t<tmax]
where t is a numpy array with as many elements as y. How can I do the same with 2D numpy arrays? I tried
x = y[t<tmax][t<tmax]
This does not work, however: it seems to select the rows first and then complains that the second selection has the wrong dimension.
IndexError: boolean index did not match indexed array along dimension 0; dimension is 50 but corresponding boolean dimension is 200
Here is an example
x1D = np.array([1,2,3], np.int32)
x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
print(x1D[x1D<3]) # prints [1 2]
print(x2D[x1D<3][x1D<3]) # raises IndexError
The second print statement produces an error similar to the one shown above. If I use
print(x2D[x1D<3])
I get
[[1 2 3]
[1 2 3]]
but I want
[[1 2]
[1 2]]
In [28]: x1D = np.array([1,2,3], np.int32)
...: x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
The 1d mask:
In [29]: x1D<3
Out[29]: array([ True, True, False])
applied to the 1d array (same size):
In [30]: x1D[_]
Out[30]: array([1, 2], dtype=int32)
applied to the 2d it selects 2 rows:
In [31]: x2D[_29]
Out[31]:
array([[1, 2, 3],
[1, 2, 3]], dtype=int32)
It can be used again to select columns - but note the : placeholder for the row index:
In [32]: _[:, _29]
Out[32]:
array([[1, 2],
[1, 2]], dtype=int32)
If we generate an indexing array from that mask, we can do the indexing with one step:
In [37]: idx = np.nonzero(x1D<3)
In [38]: idx
Out[38]: (array([0, 1]),)
In [39]: x2D[idx[0][:,None], idx[0]]
Out[39]:
array([[1, 2],
[1, 2]], dtype=int32)
An alternate way of writing this '2d' indexing:
In [41]: x2D[ [[0],[1]], [[0,1]] ]
Out[41]:
array([[1, 2],
[1, 2]], dtype=int32)
ix_ is a convenient tool for tweaking the indexing dimensions:
In [42]: x2D[np.ix_(idx[0], idx[0])]
Out[42]:
array([[1, 2],
[1, 2]], dtype=int32)
Or passing the boolean mask to ix_:
In [44]: np.ix_(_29, _29)
Out[44]:
(array([[0],
[1]]), array([[0, 1]]))
In [45]: x2D[np.ix_(_29, _29)]
Out[45]:
array([[1, 2],
[1, 2]], dtype=int32)
Writing In[32] so it's close to your attempt:
In [46]: x2D[x1D<3][:, x1D<3]
Out[46]:
array([[1, 2],
[1, 2]], dtype=int32)
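The error message in the question (50 versus 200) suggests the real array is not square, so rows and columns may need separate masks. Here is a sketch under that assumption; y, row_vals and col_vals are hypothetical stand-ins for the asker's data:

```python
import numpy as np

y = np.arange(12).reshape(3, 4)

# Hypothetical per-row and per-column values to threshold on.
row_vals = np.array([1, 2, 3])
col_vals = np.array([1, 2, 3, 4])

row_mask = row_vals < 3   # length must match y.shape[0]
col_mask = col_vals < 3   # length must match y.shape[1]

# One-step selection of the row/column cross product.
sub = y[np.ix_(row_mask, col_mask)]

# Equivalent two-step form: rows first, then columns.
sub_two_step = y[row_mask][:, col_mask]

print(sub)
```

Each mask has to match the length of the axis it is applied to, which is what the original IndexError was complaining about.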

Numpy sort two arrays together with one array as the keys in axis 1 [duplicate]

I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5, 4, 0],
       [1, 3, 9]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcast against i.
I think of
np.array([[0],
[1]])
as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
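The 'short hand' claim can be checked with np.broadcast_arrays, which expands the column index to the full j. This is only an illustration; rows, j_full and i_full are names chosen for the sketch:

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
i = np.argsort(a, axis=1)

# Column-shaped row index, shape (2, 1).
rows = np.arange(a.shape[0])[:, None]

# Broadcasting expands rows to the same shape as i.
j_full, i_full = np.broadcast_arrays(rows, i)

print(j_full)
print(np.array_equal(a[j_full, i_full], a[rows, i]))
```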
def foo(a):
    i = np.argsort(a, axis=1)
    return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1, 2, 3],
       [2, 8, 9]])
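The same index array can then reorder a second array of the same shape, which is what the question's edit asks for. A short sketch, again assuming NumPy 1.15 or later (order, a_sorted and b_sorted are illustrative names):

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])

# Sort order of a along the last axis.
order = np.argsort(a, axis=-1)

# Apply that order to both arrays.
a_sorted = np.take_along_axis(a, order, axis=-1)
b_sorted = np.take_along_axis(b, order, axis=-1)

print(b_sorted)
```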
I found the answer here, with someone having the same problem. The key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might perform better, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
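As a sanity check on where those linear indices come from: in a row-major (M, N) array, element (r, c) sits at flat position r*N + c. The sketch below compares the hand-built indices with np.ravel_multi_index; the names linear and linear_check are made up for the example.

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])
M, N = a.shape

order = a.argsort(axis=1)

# Hand-built flat indices: row offset r*N plus column index.
linear = order + np.arange(M)[:, None] * N

# The same indices computed by NumPy from explicit (row, column) pairs.
rows = np.broadcast_to(np.arange(M)[:, None], order.shape)
linear_check = np.ravel_multi_index((rows, order), a.shape)

print(linear)
print(b.ravel()[linear])
```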
Runtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop

How to return some column items in a NumPy array?

I want to print some items in a 2D NumPy array.
For example:
a = [[1, 2, 3, 4],
[5, 6, 7, 8]]
a = numpy.array(a)
My questions:
How can I return just (1 and 2)? As well as (5 and 6)?
And how can I keep the dimension as [2, 2]?
The following:
a[:, [0, 1]]
will select only the first two columns (with index 0 and 1). The result will be:
array([[1, 2],
[5, 6]])
You can use slicing to get necessary parts of the numpy array.
To get 1 and 2 you need to select row 0 and the first two columns, i.e.
>>> a[0, 0:2]
array([1, 2])
Similarly for 5 and 6
>>> a[1, 0:2]
array([5, 6])
You can also select a 2x2 subarray, e.g.
>>> a[:,0:2]
array([[1, 2],
[5, 6]])
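If the single-row selections should stay 2D as well, slicing the row axis instead of using a scalar index keeps that dimension. A small sketch (row0, row1 and block are just illustrative names):

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

# A one-row slice keeps the row axis: shape (1, 2) instead of (2,).
row0 = a[0:1, 0:2]
row1 = a[1:2, 0:2]

# Both rows, first two columns: shape (2, 2).
block = a[:, 0:2]

print(row0.shape, row1.shape, block.shape)
```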
You can do it like this:
In [44]: a[:, :2]
Out[44]:
array([[1, 2],
[5, 6]])

Reduce Dimensions when converting list to array

I want to reduce the dimensions of an array after converting it to a list
a = np.array([[1,2],[3,4]])
print(a.shape)
b = np.array([[1],[3,4]])
print(b.shape)
Output:
(2, 2)
(2,)
I want a to have the same shape as b i.e. (2,)
>>> a = np.array([[1,2],[3,4], None])[:2]
>>> a
array([[1, 2], [3, 4]], dtype=object)
>>> a.shape
(2,)
Works, though is probably the wrong way to do it (I'm a numpy newb).
Do you understand what b is?
b = np.array([[1],[3,4]])
print(repr(b))
array([[1], [3, 4]], dtype=object)
b is a 1d array with 2 elements, each a list. np.array does it this way because the 2 sublists have different lengths, so it can't create a 2d array.
a = np.array([[1,2],[3,4]])
print(repr(a))
array([[1, 2],
[3, 4]])
Here the 2 sublists have the same length, so it can create a 2d array. Each element is an integer. np.array tries to create the highest dimensional array that the input allows.
Probably the best way to create another array like b is to make a copy, and insert the desired lists.
a1 = b.copy()
a1[0] = [1,2]
# a1[1] = [3,4]
print(repr(a1))
array([[1, 2], [3, 4]], dtype=object)
You have to use this convoluted method because you are trying to do something 'unnatural'.
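A variant that does not depend on b already existing is to preallocate a 1d object array and assign the lists into it. This is a sketch, not the only way to do it. As far as I know, recent NumPy releases (1.24 and later) raise a ValueError for ragged input unless dtype=object is given, so it is spelled out here.

```python
import numpy as np

# Ragged sublists: request an object array explicitly.
b = np.array([[1], [3, 4]], dtype=object)

# Equal-length sublists would normally become a (2, 2) array,
# so build the (2,) object array by hand instead.
a1 = np.empty(2, dtype=object)
a1[0] = [1, 2]
a1[1] = [3, 4]

print(b.shape, a1.shape)
print(repr(a1))
```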
You commented about using vstack. Both work:
In [570]: np.vstack((a,b)) # (3,2) array
Out[570]:
array([[1, 2],
[3, 4],
[[1], [3, 4]]], dtype=object)
In [571]: np.vstack((a1,b)) # (2,2) array
Out[571]:
array([[[1, 2], [3, 4]],
[[1], [3, 4]]], dtype=object)
Your array b is little more than the original list in an array wrapper. Is that really what you need? The 2d a is a normal numpy array. b is an oddball construction.
