Slice of 2d numpy array with another array - python

I have quite a large 2D array, and I need to get both the index of the maximum value along axis 1 and the maximum value itself. I can retrieve these two values as follows:
import numpy as np
a = np.arange(27).reshape(9, 3)
idx = np.argmax(a, axis=1)
max_val = np.max(a, axis=1)
However, since I have already found the index of the maximum value, it feels like I should be able to construct the array of maximum values using idx without having to look up the value again.
I realise I can use np.choose(idx, a.T), but this involves transposing the matrix, which will be much more expensive than just using max. I can do something like np.array([a[i][idx[i]] for i in range(len(a))]), but this involves creating a list, which again seems more expensive than just calling np.max.
Is there any way to slice a with idx in numpy without restructuring the array?

Your a and argmax:
In [602]: a
Out[602]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23],
       [24, 25, 26]])
In [603]: idx
Out[603]: array([2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int64)
A common way of using that index array:
In [606]: a[np.arange(a.shape[0]),idx]
Out[606]: array([ 2, 5, 8, 11, 14, 17, 20, 23, 26])
A newer tool that may be easier to use (if you are not familiar with the first):
In [607]: np.take_along_axis(a,idx[:,None],1)
Out[607]:
array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17],
       [20],
       [23],
       [26]])
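(A quick check, not part of the original answer: take_along_axis keeps a length-1 axis, so you may want to drop it; both forms reproduce np.max. Variable names here are just for illustration.)
max_from_idx = np.take_along_axis(a, idx[:, None], 1)[:, 0]         # drop the trailing axis
np.array_equal(max_from_idx, np.max(a, axis=1))                     # True
np.array_equal(a[np.arange(a.shape[0]), idx], np.max(a, axis=1))    # True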

Related

numpy ndarrays: Is it possible to access a row based on a member element?

Say I have a 2x3 ndarray:
[[0,1,1],
[1,1,1]]
I want to replace {any row that has 0 in the first index} with [0,0,0]:
[[0,0,0],
[1,1,1]]
Is it possible to do this with np.where?
Here's my attempt:
import numpy as np
arr = np.array([[0,1,1],[1,1,1]])
replacement = np.full(arr.shape,[0,0,0])
new = np.where(arr[:,0]==0,replacement,arr)
I'm met with the following error at the last line:
ValueError: operands could not be broadcast together with shapes (2,) (2,3) (2,3)
The error makes sense, but I don't know how to fix the code to accomplish my goal. Any advice would be greatly appreciated!
Edit:
I was trying to simplify a higher-dimensional case, but it turns out it might not generalize.
If I have this ndarray:
[[[0,1,1],[1,1,1],[1,1,1]],
[[1,1,1],[1,1,1],[1,1,1]],
[[1,1,1],[1,1,1],[1,1,1]]]
how can I replace the first triplet with [0,0,0]?
Simple indexing/broadcasting will do:
a[a[:,0]==0] = [0,0,0]
output:
array([[0, 0, 0],
       [1, 1, 1]])
explanation:
# get first column
a[:,0]
# array([0, 1])
# compare to 0 creating a boolean array
a[:,0]==0
# array([ True, False])
# select rows where the boolean is True
a[a[:,0]==0]
# array([[0, 1, 1]])
# replace those rows with new array
a[a[:,0]==0] = [0,0,0]
using np.where
this is less elegant in my opinion:
a[np.where(a[:,0]==0)[0]] = [0,0,0]
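(A side note, not from the answer above: the questioner's original np.where attempt can also be fixed by giving the 1D condition a trailing axis so it broadcasts against the (2, 3) operands; a minimal sketch:)
import numpy as np

arr = np.array([[0, 1, 1], [1, 1, 1]])
# shape (2,) condition -> shape (2, 1), which broadcasts against (2, 3)
new = np.where((arr[:, 0] == 0)[:, None], 0, arr)
# array([[0, 0, 0],
#        [1, 1, 1]])
# the same idea extends to the 3D edit via (a[..., 0] == 0)[..., None]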
Edit: generalization
input:
a = np.arange(3**3).reshape((3,3,3))
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
transformation:
a[a[...,0]==0] = [0,0,0]
array([[[ 0,  0,  0],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

Double Indexing Assignment in NumPy

I would like to know how I can assign a matrix to a slice of a slice of a NumPy array. Below are toy data and a 1D mask that illustrate what I need to do:
data = np.array([[1, 2, 11, 21], [2, 4, 12, 23], [3, 6, 13, 25], [4, 8, 14, 27], [5, 10, 15, 29]])
m = np.array([False, False, True, True, False])
Consequently, I have:
data[m] -> array([[ 3,  6, 13, 25],
                  [ 4,  8, 14, 27]])
and
data[m][:, 2:] -> array([[13, 25],
                         [14, 27]])
I would like to assign a matrix to data[m][:, 2:]. Something like:
data[m][:, 2:] = np.array([[2,2], [2,2]])
and end up with the data like:
np.array([[1, 2, 11, 21], [2, 4, 12, 23], [3, 6, 2, 2], [4, 8, 2, 2], [5, 10, 15, 29]])
My use case is a huge dataset where I cannot go cell by cell assigning values. Also, I know I could tile the mask across the columns, set every value in those columns to False except the ones I want to assign, and use that final 2D mask over the data, but I am searching for a better solution.
Because data[m] uses boolean indexing (selecting rows), the result is a copy, not a view. The subsequent assignment modifies that copy, not data. You need to combine the indexing into one operation.
I suggest using the indices of the True values in m:
In [205]: data[[2,3],2:]
Out[205]:
array([[13, 25],
       [14, 27]])
In [208]: m.nonzero()
Out[208]: (array([2, 3]),)
In [209]: data[m.nonzero(),2:]
Out[209]:
array([[[13, 25],
        [14, 27]]])
But m can be used directly, since it is just selecting rows:
In [210]: data[m,2:]
Out[210]:
array([[13, 25],
       [14, 27]])
It's a little trickier to use boolean indexing together with other index types (lists or slices), which is why I started with the list [2,3].
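(To complete the original use case, a minimal sketch of the assignment itself, assuming data and m as defined in the question; the combined index writes into data in place.)
data[m, 2:] = np.array([[2, 2], [2, 2]])
data
# array([[ 1,  2, 11, 21],
#        [ 2,  4, 12, 23],
#        [ 3,  6,  2,  2],
#        [ 4,  8,  2,  2],
#        [ 5, 10, 15, 29]])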

sum groups of rows of numpy matrix using list of lists of indices

I want to slice a numpy array using lists of row indices and apply a function to each group. Is it possible to vectorize this (or is there a non-vectorized way to do it)? Vectorized would be ideal for large matrices.
import numpy as np
index = [[1,3], [2,4,5]]
a = np.array(
    [[ 3,  4,  6,  3],
     [ 0,  1,  2,  3],
     [ 4,  5,  6,  7],
     [ 8,  9, 10, 11],
     [12, 13, 14, 15],
     [ 1,  1,  4,  5]])
Summing by the groups of row indices in index should give:
np.array([[ 8, 10, 12, 14],
          [17, 19, 24, 27]])
Approach #1 : Here's an almost* vectorized approach -
def sumrowsby_index(a, index):
    index_arr = np.concatenate(index)
    lens = np.array([len(i) for i in index])
    cut_idx = np.concatenate(([0], lens[:-1].cumsum()))
    return np.add.reduceat(a[index_arr], cut_idx)
*Almost, because of the step that computes lens with a list comprehension; but since we are simply getting the lengths and no heavy computation is involved there, that step won't sway the timings in any big way.
Sample run -
In [716]: a
Out[716]:
array([[ 3,  4,  6,  3],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [ 1,  1,  4,  5]])
In [717]: index
Out[717]: [[1, 3], [2, 4, 5]]
In [718]: sumrowsby_index(a, index)
Out[718]:
array([[ 8, 10, 12, 14],
       [17, 19, 24, 27]])
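(Not part of the original answer, but the intermediate values may make the reduceat step clearer; this assumes the same a and index as above.)
index_arr = np.concatenate(index)                      # array([1, 3, 2, 4, 5])
lens = np.array([len(i) for i in index])               # array([2, 3])
cut_idx = np.concatenate(([0], lens[:-1].cumsum()))    # array([0, 2])
# np.add.reduceat sums a[index_arr] in segments starting at indices 0 and 2:
# rows 1,3 -> [ 8, 10, 12, 14]; rows 2,4,5 -> [17, 19, 24, 27]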
Approach #2 : We could leverage fast matrix-multiplication with numpy.dot to perform those sum-reductions, giving us another method as listed below -
def sumrowsby_index_v2(a, index):
    lens = np.array([len(i) for i in index])
    id_ar = np.zeros((len(lens), a.shape[0]))
    c = np.concatenate(index)
    r = np.repeat(np.arange(len(index)), lens)
    id_ar[r, c] = 1
    return id_ar.dot(a)
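(For completeness, a sample run of this second approach under the same a and index as above; the result comes back as floats because the indicator matrix id_ar is float.)
sumrowsby_index_v2(a, index)
# array([[ 8., 10., 12., 14.],
#        [17., 19., 24., 27.]])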
Using a list comprehension...
For each index list in index, create a new list containing the corresponding rows of a. From here, we have a list of numpy arrays that we can pass to the built-in sum(); summing numpy arrays adds them element-wise, which gives you what you want:
np.array([sum([a[r] for r in i]) for i in index])
giving:
array([[ 8, 10, 12, 14],
       [17, 19, 24, 27]])

numpy `take` along 2 axes

I have a 3D array a of data and a 2D array b of indices. I need to take a sub-array of a along the 3rd axis, using the indices from b. I can do it with take like this:
a = np.arange(24).reshape((2,3,4))
b = np.array([0,2,1,3]).reshape((2,2))
np.array([np.take(a_,b_,axis=1) for (a_,b_) in zip(a,b)])
Can I do it without list comprehension, using some fancy indexing? I am worried about efficiency, so if fancy indexing is not more efficient in this case, I would like to know it.
EDIT: The first thing I tried was a[[0,1],:,b], but it doesn't give the sub-array I need.
In [317]: a
Out[317]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
In [318]: a = np.arange(24).reshape((2,3,4))
...: b = np.array([0,2,1,3]).reshape((2,2))
...: np.array([np.take(a_,b_,axis=1) for (a_,b_) in zip(a,b)])
...:
Out[318]:
array([[[ 0,  2],
        [ 4,  6],
        [ 8, 10]],

       [[13, 15],
        [17, 19],
        [21, 23]]])
So you want the 0 & 2 columns from the 1st block, and 1 & 3 from the second.
Make a c that matches b in shape, and embodies this observation
In [319]: c=np.array([[0,0],[1,1]])
In [320]: c
Out[320]:
array([[0, 0],
       [1, 1]])
In [321]: b
Out[321]:
array([[0, 2],
       [1, 3]])
In [322]: a[c,:,b]
Out[322]:
array([[[ 0,  4,  8],
        [ 2,  6, 10]],

       [[13, 17, 21],
        [15, 19, 23]]])
Those are the right numbers, but not the right shape.
A column vector can be used instead of c.
In [323]: a[np.arange(2)[:,None],:,b] # or a[[[0],[1]],:,b]
Out[323]:
array([[[ 0,  4,  8],
        [ 2,  6, 10]],

       [[13, 17, 21],
        [15, 19, 23]]])
As for the shape, we can transpose the last two axes
In [324]: a[np.arange(2)[:,None],:,b].transpose(0,2,1)
Out[324]:
array([[[ 0,  2],
        [ 4,  6],
        [ 8, 10]],

       [[13, 15],
        [17, 19],
        [21, 23]]])
This transpose is required because we have a slice between two index arrays, a mix of basic and advanced indexing. It's documented, but nevertheless often puzzling. It puts the sliced dimension (size 3) last, and we have to transpose it back.
Nice little indexing puzzle!
The latest question and explanation of this advanced/basic transpose:
Indexing numpy multidimensional arrays depends on a slicing method
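(Not from the original answer, but worth noting as an alternative: np.take_along_axis, added in NumPy 1.15, avoids the transpose entirely, provided b is padded to the same number of dimensions as a.)
np.take_along_axis(a, b[:, None, :], axis=2)
# array([[[ 0,  2],
#         [ 4,  6],
#         [ 8, 10]],
#
#        [[13, 15],
#         [17, 19],
#         [21, 23]]])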
This is my first try. I will see if I can do better.
# concatenate the two per-block column selections with np.r_, then reshape
np.r_[a[0][:,b[0]],a[1][:,b[1]]].reshape(2,3,2)
Out[300]:
array([[[ 0,  2],
        [ 4,  6],
        [ 8, 10]],

       [[13, 15],
        [17, 19],
        [21, 23]]])
Second try:
#convert both a and b to a 2d array and then slice all rows and only columns determined by b.
a.reshape(6,4)[np.arange(6)[:,None],b.repeat(3,0)].reshape(2,3,2)
Out[429]:
array([[[ 0,  2],
        [ 4,  6],
        [ 8, 10]],

       [[13, 15],
        [17, 19],
        [21, 23]]])

compress numpy array(matrix) by removing columns using another numpy array as mask

I have a 2D numpy array (i.e. a matrix) A which contains useful data interspersed with garbage in the form of column vectors, as well as a 'selection' array B which contains 1 for those columns that are important and 0 for those that are not. Is there a way to select only those columns from A that correspond to ones in B? I.e. I have a matrix
A = array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14],
           [15, 16, 17, 18, 19],
           [20, 21, 22, 23, 24]])
and a vector
B = array([0, 1, 0, 1, 0])
and I want
array([[ 1,  3],
       [ 6,  8],
       [11, 13],
       [16, 18],
       [21, 23]])
Is there an elegant way to do so? Right now I just have a for loop that iterates through B.
NOTE: the matrices that I'm dealing with are large, so I don't want to use numpy masked arrays, as I simply don't want the masked data.
>>> A
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
>>> B = NP.array([ 0, 1, 0, 1, 0])
>>> # convert the indexing array to a boolean array
>>> B = NP.array(B, dtype=bool)
>>> # index A against B--indexing array is placed after the ',' because
>>> # you are selecting columns
>>> res = A[:,B]
>>> res
array([[ 1,  3],
       [ 6,  8],
       [11, 13],
       [16, 18],
       [21, 23]])
The syntax for index-based slicing in NumPy is elegant and simple. A couple of rules cover a majority of use cases:
- the form is [rows, columns]
- specify all rows or all columns using a colon ":", e.g., [:, 4] extracts the entire 5th column
Not sure if it's the most efficient way (because of the transposition), but it should be better than a for loop:
A.T[B == 1].T
I was interested in doing the same, but slicing both rows and columns using the boolean values of vector B; the solution was simple:
res = A[:,B][B,:]
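(One more option, not mentioned above: np.compress selects slices along an axis from a condition array, treating nonzero entries as True, so the 0/1 vector B from the question can be passed directly.)
np.compress(B, A, axis=1)   # keep only the columns where B is nonzero
# array([[ 1,  3],
#        [ 6,  8],
#        [11, 13],
#        [16, 18],
#        [21, 23]])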
