I would like to know how to assign a matrix to a slice of a slice of a NumPy array. Below is some toy data and a 1D mask to illustrate what I need to do:
data = np.array([[1, 2, 11, 21], [2, 4, 12, 23], [3, 6, 13, 25], [4, 8, 14, 27], [5, 10, 15, 29]])
m = np.array([False, False, True, True, False])
Consequently, I have:
data[m] -> array([[ 3, 6, 13, 25],
[ 4, 8, 14, 27]])
and
data[m][:, 2:] -> array([[13, 25],
[14, 27]])
I would like to assign a matrix to data[m][:, 2:]. Something like:
data[m][:, 2:] = np.array([[2,2], [2,2]])
and end up with the data like:
np.array([[1, 2, 11, 21], [2, 4, 12, 23], [3, 6, 2, 2], [4, 8, 2, 2], [5, 10, 15, 29]])
My use case involves a huge dataset, so I cannot assign values cell by cell. I know I could tile the mask across all columns, set every entry to False except those in the columns I want to assign, and apply that final mask to the data, but I am looking for a better solution.
Because data[m] uses boolean indexing (selecting rows), the result is a copy, not a view. The subsequent assignment modifies that copy, not data. You need to combine the indexing into a single expression.
I suggested using the indices of the True values in m:
In [205]: data[[2,3],2:]
Out[205]:
array([[13, 25],
[14, 27]])
In [208]: m.nonzero()
Out[208]: (array([2, 3]),)
In [209]: data[m.nonzero(),2:]
Out[209]:
array([[[13, 25],
[14, 27]]])
But m can be used directly, since it is just selecting rows:
In [210]: data[m,2:]
Out[210]:
array([[13, 25],
[14, 27]])
It's a little trickier to combine boolean indexing with other indices (lists or slices), which is why I started with the list [2,3].
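As a minimal sketch of the point above, using the toy data from the question: the chained form writes into a temporary copy, while the single-expression form writes into data itself. The variable names before and chained_changed_data are only illustrative.

```python
import numpy as np

data = np.array([[1, 2, 11, 21], [2, 4, 12, 23], [3, 6, 13, 25],
                 [4, 8, 14, 27], [5, 10, 15, 29]])
m = np.array([False, False, True, True, False])

# Chained indexing: data[m] is a copy, so this assignment never reaches data.
before = data.copy()
data[m][:, 2:] = 2
chained_changed_data = not np.array_equal(data, before)

# One indexing expression: rows come from the mask, columns from the slice.
data[m, 2:] = np.array([[2, 2], [2, 2]])
print(data)
```

The right-hand side only has to broadcast to the selected block, so a scalar (data[m, 2:] = 2) would work here as well.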
Related
I have a quite large 2d array, and I need to get both the index of the maximum value in axis 1, and the maximum value itself. I can retrieve these two values as follows:
import numpy as np
a = np.arange(27).reshape(9, 3)
idx = np.argmax(a, axis=1)
max_val = np.max(a, axis=1)
However, since I have already found the index of the maximum value, it feels like I should be able to construct the array of maximum values using idx without having to look up the value again.
I realise I can use np.choose(idx, a.T) but this involves transposing the matrix which will be much more expensive than just using max. I can do something like np.array([a[i][idx[i]] for i in range(len(a))]) but this involves creating a list which again seems more expensive than just calling np.max.
Is there any way to slice a with idx in numpy without restructuring the array?
Your a and argmax:
In [602]: a
Out[602]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26]])
In [603]: idx
Out[603]: array([2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int64)
A common way of using that index array:
In [606]: a[np.arange(a.shape[0]),idx]
Out[606]: array([ 2, 5, 8, 11, 14, 17, 20, 23, 26])
A newer tool, that may be easier to use (if not familiar with the first):
In [607]: np.take_along_axis(a,idx[:,None],1)
Out[607]:
array([[ 2],
[ 5],
[ 8],
[11],
[14],
[17],
[20],
[23],
[26]])
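A short sketch comparing the two approaches above. np.take_along_axis keeps the indexed axis, so dropping the trailing length-1 axis gives the same 1D result as the arange-based indexing. The names rows, by_pairs and by_take are only illustrative.

```python
import numpy as np

a = np.arange(27).reshape(9, 3)
idx = np.argmax(a, axis=1)

# Pair each row number with its argmax column.
rows = np.arange(a.shape[0])
by_pairs = a[rows, idx]

# take_along_axis returns shape (9, 1); [:, 0] drops the extra axis.
by_take = np.take_along_axis(a, idx[:, None], axis=1)[:, 0]

print(by_pairs)
print(by_take)
```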
Say I have a 2x3 ndarray:
[[0,1,1],
[1,1,1]]
I want to replace {any row that has 0 in the first index} with [0,0,0]:
[[0,0,0],
[1,1,1]]
Is it possible to do this with np.where?
Here's my attempt:
import numpy as np
arr = np.array([[0,1,1],[1,1,1]])
replacement = np.full(arr.shape,[0,0,0])
new = np.where(arr[:,0]==0,replacement,arr)
I'm met with the following error at the last line:
ValueError: operands could not be broadcast together with shapes (2,) (2,3) (2,3)
The error makes sense, but I don't know how to fix the code to accomplish my goal. Any advice would be greatly appreciated!
Edit:
I was trying to simplify a higher-dimensional case, but turns out it might not generalize.
If I have this ndarray:
[[[0,1,1],[1,1,1],[1,1,1]],
[[1,1,1],[1,1,1],[1,1,1]],
[[1,1,1],[1,1,1],[1,1,1]]]
how can I replace the first triplet with [0,0,0]?
Simple indexing/broadcasting will do:
a[a[:,0]==0] = [0,0,0]
output:
array([[0, 0, 0],
[1, 1, 1]])
explanation:
# get first column
a[:,0]
# array([0, 1])
# compare to 0 creating a boolean array
a[:,0]==0
# array([ True, False])
# select rows where the boolean is True
a[a[:,0]==0]
# array([[0, 1, 1]])
# replace those rows with new array
a[a[:,0]==0] = [0,0,0]
using np.where
this is less elegant in my opinion:
a[np.where(a[:,0]==0)[0]] = [0,0,0]
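If the three-argument form of np.where from the question is preferred, one way to make it work (a sketch, not the only option) is to give the condition a trailing axis so that it broadcasts across the columns. Unlike the in-place assignments above, this returns a new array and leaves arr untouched. The name cond is only illustrative.

```python
import numpy as np

arr = np.array([[0, 1, 1], [1, 1, 1]])

# Shape (2,) cannot broadcast against (2, 3); shape (2, 1) can.
cond = (arr[:, 0] == 0)[:, None]

# Where cond is True take the replacement row, otherwise keep arr.
new = np.where(cond, [0, 0, 0], arr)
print(new)
```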
Edit: generalization
input:
a = np.arange(3**3).reshape((3,3,3))
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
transformation:
a[a[...,0]==0] = [0,0,0]
array([[[ 0, 0, 0],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I have an ndarray of shape (10, 3) and an index list of length 10:
import numpy as np
arr = np.arange(10* 3).reshape((10, 3))
idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1 , 0])
I want to use numpy delete (or a numpy function that is suited better for the task) to delete the values in arr as indicated by idxs for each row. So in the zeroth row of arr I want to delete the 0th entry, in the first the first, in the second the first, and so on.
I tried something like
np.delete(arr, idxs, axis=1)
but it won't work. Then I tried building an index list like this:
idlist = [np.arange(len(idxs)), idxs]
np.delete(arr, idlist)
but this doesn't give me the results I want either.
@Quang's answer is good, but may benefit from some explanation.
np.delete works with whole rows or columns, not selected elements from each.
In [30]: arr = np.arange(10* 3).reshape((10, 3))
...: idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1 , 0])
Selecting items from the array is easy:
In [31]: arr[np.arange(10), idxs]
Out[31]: array([ 0, 4, 7, 10, 14, 15, 20, 23, 25, 27])
Selecting everything but these takes a bit more work. np.delete is complex general code that does different things depending on the delete specification. But one thing it can do is create a True mask, and set the items to be deleted to False.
For your 2d case we can:
In [33]: mask = np.ones(arr.shape, bool)
In [34]: mask[np.arange(10), idxs] = False
In [35]: arr[mask]
Out[35]:
array([ 1, 2, 3, 5, 6, 8, 9, 11, 12, 13, 16, 17, 18, 19, 21, 22, 24,
26, 28, 29])
boolean indexing produces a flat array, so we need to reshape to get 2d:
In [36]: arr[mask].reshape(10,2)
Out[36]:
array([[ 1, 2],
[ 3, 5],
[ 6, 8],
[ 9, 11],
[12, 13],
[16, 17],
[18, 19],
[21, 22],
[24, 26],
[28, 29]])
Quang's answer creates the mask in another way:
In [37]: arr[np.arange(arr.shape[1]) != idxs[:,None]]
Out[37]:
array([ 1, 2, 3, 5, 6, 8, 9, 11, 12, 13, 16, 17, 18, 19, 21, 22, 24,
26, 28, 29])
Let's try extracting the other items by masking, then reshape:
arr[np.arange(arr.shape[1]) != idxs[:,None]].reshape(len(arr),-1)
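The mask approach can be wrapped in a small helper. This is a sketch under the assumption that exactly one element is removed from every row, which is what makes the final reshape valid. The function name delete_one_per_row is hypothetical, not a NumPy API.

```python
import numpy as np

def delete_one_per_row(arr, idxs):
    """Drop arr[i, idxs[i]] from every row i of a 2D array (hypothetical helper)."""
    arr = np.asarray(arr)
    idxs = np.asarray(idxs)
    # Start from an all-True mask and switch off one entry per row.
    mask = np.ones(arr.shape, dtype=bool)
    mask[np.arange(arr.shape[0]), idxs] = False
    # Each row loses exactly one element, so the flat result reshapes cleanly.
    return arr[mask].reshape(arr.shape[0], arr.shape[1] - 1)

arr = np.arange(10 * 3).reshape((10, 3))
idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1, 0])
result = delete_one_per_row(arr, idxs)
print(result)
```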
Thanks for your question and the answers from Quang and hpaulj.
I just want to add a second scenario, where one wants to do the deletion along the other axis.
The index now has only 3 elements because there are only 3 columns in arr, for example:
idxs2 = np.array([1,2,3])
To delete the elements of each column according to the index in idxs2, one can do this
arr.T[np.array(np.arange(arr.shape[0]) != idxs2[:,None])].reshape(len(idxs2),-1).T
And the result becomes:
array([[ 0, 1, 2],
[ 6, 4, 5],
[ 9, 10, 8],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26],
[27, 28, 29]])
I have a 3D array a of data and a 2D array b of indices. I need to take a sub-array of a along the 3rd axis, using the indices from b. I can do it with take like this:
a = np.arange(24).reshape((2,3,4))
b = np.array([0,2,1,3]).reshape((2,2))
np.array([np.take(a_,b_,axis=1) for (a_,b_) in zip(a,b)])
Can I do it without list comprehension, using some fancy indexing? I am worried about efficiency, so if fancy indexing is not more efficient in this case, I would like to know it.
EDIT The 1st thing I've tried is a[[0,1],:,b] but it doesn't give the sub-array I need
In [317]: a
Out[317]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [318]: a = np.arange(24).reshape((2,3,4))
...: b = np.array([0,2,1,3]).reshape((2,2))
...: np.array([np.take(a_,b_,axis=1) for (a_,b_) in zip(a,b)])
...:
Out[318]:
array([[[ 0, 2],
[ 4, 6],
[ 8, 10]],
[[13, 15],
[17, 19],
[21, 23]]])
So you want the 0 & 2 columns from the 1st block, and 1 & 3 from the second.
Make a c that matches b in shape, and embodies this observation
In [319]: c=np.array([[0,0],[1,1]])
In [320]: c
Out[320]:
array([[0, 0],
[1, 1]])
In [321]: b
Out[321]:
array([[0, 2],
[1, 3]])
In [322]: a[c,:,b]
Out[322]:
array([[[ 0, 4, 8],
[ 2, 6, 10]],
[[13, 17, 21],
[15, 19, 23]]])
That's the right numbers, but not the right shape.
A column vector can be used instead of c.
In [323]: a[np.arange(2)[:,None],:,b] # or a[[[0],[1]],:,b]
Out[323]:
array([[[ 0, 4, 8],
[ 2, 6, 10]],
[[13, 17, 21],
[15, 19, 23]]])
As for the shape, we can transpose the last two axes
In [324]: a[np.arange(2)[:,None],:,b].transpose(0,2,1)
Out[324]:
array([[[ 0, 2],
[ 4, 6],
[ 8, 10]],
[[13, 15],
[17, 19],
[21, 23]]])
This transpose is required because we have a slice between two index arrays, a mix of basic and advanced indexing. It's documented, but nevertheless often puzzling. NumPy puts the slice dimension (3) last, and we have to transpose it back.
Nice little indexing puzzle!
The latest question and explanation of this advanced/basic transpose:
Indexing numpy multidimensional arrays depends on a slicing method
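As an alternative sketch that avoids the transpose, np.take_along_axis can do the same selection, assuming a NumPy version that provides it. The index array needs the same number of dimensions as a, so b gets a length-1 middle axis that broadcasts over the 3 rows. The names expected and result are only illustrative.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))
b = np.array([0, 2, 1, 3]).reshape((2, 2))

# Reference result from the list comprehension in the question.
expected = np.array([np.take(a_, b_, axis=1) for a_, b_ in zip(a, b)])

# b[:, None, :] has shape (2, 1, 2) and broadcasts against a along axis 1.
result = np.take_along_axis(a, b[:, None, :], axis=2)
print(result)
```

Whether this is faster than the list comprehension depends on the array sizes, so it is worth timing on real data.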
This is my first try. I will see if I can do better.
# select the columns of each block separately, stack them with np.r_, then reshape.
np.r_[a[0][:,b[0]],a[1][:,b[1]]].reshape(2,3,2)
Out[300]:
array([[[ 0, 2],
[ 4, 6],
[ 8, 10]],
[[13, 15],
[17, 19],
[21, 23]]])
Second try:
#convert both a and b to a 2d array and then slice all rows and only columns determined by b.
a.reshape(6,4)[np.arange(6)[:,None],b.repeat(3,0)].reshape(2,3,2)
Out[429]:
array([[[ 0, 2],
[ 4, 6],
[ 8, 10]],
[[13, 15],
[17, 19],
[21, 23]]])
I have a 2D numpy array (i.e. a matrix) A which contains useful data interspersed with garbage in the form of column vectors, as well as a 'selection' array B which contains 1 for those columns that are important and 0 for those that are not. Is there a way to select only those columns from A that correspond to ones in B? I.e., I have a matrix
A = array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
and a vector
B = array([ 0, 1, 0, 1, 0])
and I want
array([[1, 3],
[6, 8],
[11, 13],
[16, 18],
[21, 23]])
Is there an elegant way to do so? Right now I just have a for loop that iterates through B.
NOTE: the matrices that I'm dealing with are large, so I don't want to use numpy masked arrays, as I simply don't want the masked data.
>>> A
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
>>> B = NP.array([ 0, 1, 0, 1, 0])
>>> # convert the indexing array to a boolean array
>>> B = NP.array(B, dtype=bool)
>>> # index A against B--indexing array is placed after the ',' because
>>> # you are selecting columns
>>> res = A[:,B]
>>> res
array([[ 1, 3],
[ 6, 8],
[11, 13],
[16, 18],
[21, 23]])
The syntax for index-based slicing in NumPy is elegant and simple. A couple of rules cover a majority of use cases:
the form is [rows, columns]
specify all rows or all columns using a colon ":", e.g., [:, 4] (extracts the entire 5th column)
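The conversion to a boolean array in the answer above matters. A small sketch of what happens without it: an integer array of 0s and 1s is read as column numbers, not as a mask. The names as_ints and as_mask are only illustrative.

```python
import numpy as np

A = np.arange(25).reshape(5, 5)
B = np.array([0, 1, 0, 1, 0])

# Integer indexing: picks columns 0, 1, 0, 1, 0 (five columns, with repeats).
as_ints = A[:, B]

# Boolean indexing: keeps only the columns where the mask is True.
as_mask = A[:, B.astype(bool)]

print(as_ints.shape, as_mask.shape)
```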
Not sure if it's the most efficient way (because of the transposition), but it should be better than a for loop:
A.T[B == 1].T
I wanted to do the same, but slicing both rows and columns using the boolean values of vector B. The solution was simple:
res = A[:,B][B,:]
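A sketch of a single-step alternative, assuming B is already a boolean array: np.ix_ builds an open mesh from the two masks, so rows and columns are selected together without the intermediate copy that the chained form creates. Passing B twice directly pairs the indices instead. The names chained, crossed and paired are only illustrative.

```python
import numpy as np

A = np.arange(25).reshape(5, 5)
B = np.array([0, 1, 0, 1, 0], dtype=bool)

# Chained form from above: select columns, then rows of the temporary result.
chained = A[:, B][B, :]

# np.ix_ selects the cross product of the chosen rows and columns in one step.
crossed = A[np.ix_(B, B)]

# Passing B twice pairs the indices: only A[1, 1] and A[3, 3] are returned.
paired = A[B, B]

print(crossed)
print(paired)
```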