How can I delete multiple rows of NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax
There are several ways to delete rows from a NumPy array.
The easiest is to use basic slicing, as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
Yet another option is to use np.delete as in the question. To select the rows or columns to delete, it accepts a slice object, an int, or an array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
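One difference between the two approaches is worth a quick check: np.delete builds a new array and leaves x untouched, while basic slicing returns a view of x. A minimal sketch using the same x as above:

```python
import numpy as np

x = np.arange(35).reshape(7, 5)

# np.delete returns a new array; x still has all 7 rows afterwards.
deleted = np.delete(x, slice(0, 5), axis=0)

# Basic slicing picks the same rows but returns a view into x.
sliced = x[5:]

print(x.shape)                          # (7, 5)
print(np.array_equal(deleted, sliced))  # True
print(np.shares_memory(x, sliced))      # True
print(np.shares_memory(x, deleted))     # False
```

So if you want the result to be independent of x after slicing, call .copy() on the slice.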
In all the time I've been using NumPy, though, I have never needed np.delete, because boolean indexing is usually much more convenient.
For example, to keep only the rows whose first value is less than 12 (or, conversely, only the rows whose first value is larger), I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
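The same mask idea also covers deletion by row number. The sketch below builds a keep-mask by hand and compares it with np.delete; rows_to_drop is just an illustrative name, not something from the question:

```python
import numpy as np

x = np.arange(35).reshape(7, 5)
rows_to_drop = [0, 2, 3]  # illustrative choice of rows

# Start with "keep everything", then switch off the rows to drop.
keep = np.ones(len(x), dtype=bool)
keep[rows_to_drop] = False

result = x[keep]
print(result[:, 0])  # [ 5 20 25 30]
print(np.array_equal(result, np.delete(x, rows_to_drop, axis=0)))  # True
```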
For more information refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
To delete selected rows, you can write:
np.delete(x, (1,2,5), axis = 0)
This deletes rows 1, 2 and 5. To delete a range of rows, which is what (0:5) was meant to express, use np.s_:
np.delete(x, np.s_[0:5], axis = 0)
This deletes rows 0 to 4 of the array.
np.s_[0:5] --->> slice(0, 5, None)
Both are the same.
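A short check of that equivalence, as a sketch on a small 7x5 array (the array is only for illustration). It also shows that np.s_ accepts a step, like any slice:

```python
import numpy as np

x = np.arange(35).reshape(7, 5)

# np.s_ simply builds the slice object that the [0:5] syntax stands for.
print(np.s_[0:5] == slice(0, 5, None))  # True

a = np.delete(x, np.s_[0:5], axis=0)
b = np.delete(x, slice(0, 5), axis=0)
print(np.array_equal(a, b))  # True

# A stepped slice: delete rows 0, 2, 4 and 6.
every_other = np.delete(x, np.s_[::2], axis=0)
print(every_other[:, 0])  # [ 5 15 25]
```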
Pass the row numbers to delete as a list.
General syntax:
np.delete(array_name,[rownumber1,rownumber2,..,rownumber n],axis=0)
Example: delete the first three rows of an array:
np.delete(array_name,[0,1,2],axis=0)
I have an ndarray of shape (10, 3) and an index list of length 10:
import numpy as np
arr = np.arange(10* 3).reshape((10, 3))
idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1 , 0])
I want to use numpy delete (or a numpy function that is suited better for the task) to delete the values in arr as indicated by idxs for each row. So in the zeroth row of arr I want to delete the 0th entry, in the first the first, in the second the first, and so on.
I tried something like
np.delete(arr, idxs, axis=1)
but it won't work. Then I tried building an index list like this:
idlist = [np.arange(len(idxs)), idxs]
np.delete(arr, idlist)
but this doesn't give me the results I want either.
@Quang's answer is good, but may benefit from some explanation.
np.delete works with whole rows or columns, not selected elements from each.
In [30]: arr = np.arange(10* 3).reshape((10, 3))
...: idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1 , 0])
Selecting items from the array is easy:
In [31]: arr[np.arange(10), idxs]
Out[31]: array([ 0, 4, 7, 10, 14, 15, 20, 23, 25, 27])
Selecting everything but these takes a bit more work. np.delete is complex, general code that does different things depending on the delete specification. But one thing it can do is create a True mask and set the items to delete to False.
For your 2d case we can:
In [33]: mask = np.ones(arr.shape, bool)
In [34]: mask[np.arange(10), idxs] = False
In [35]: arr[mask]
Out[35]:
array([ 1, 2, 3, 5, 6, 8, 9, 11, 12, 13, 16, 17, 18, 19, 21, 22, 24,
26, 28, 29])
Boolean indexing produces a flat array, so we need to reshape to get 2d:
In [36]: arr[mask].reshape(10,2)
Out[36]:
array([[ 1, 2],
[ 3, 5],
[ 6, 8],
[ 9, 11],
[12, 13],
[16, 17],
[18, 19],
[21, 22],
[24, 26],
[28, 29]])
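The mask-and-reshape steps can be wrapped in a small function. This is only a sketch: delete_one_per_row is a made-up name, and it assumes a 2-D array with exactly one index per row.

```python
import numpy as np

def delete_one_per_row(arr, idxs):
    """Drop arr[i, idxs[i]] from every row i (hypothetical helper, 2-D input only)."""
    arr = np.asarray(arr)
    idxs = np.asarray(idxs)
    n_rows, n_cols = arr.shape
    mask = np.ones(arr.shape, dtype=bool)
    mask[np.arange(n_rows), idxs] = False
    # Each row loses exactly one item, so the flat result reshapes cleanly.
    return arr[mask].reshape(n_rows, n_cols - 1)

arr = np.arange(10 * 3).reshape(10, 3)
idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1, 0])
out = delete_one_per_row(arr, idxs)
print(out.shape)  # (10, 2)
print(out[:3])    # rows [1 2], [3 5], [6 8]
```

The reshape only works because every row loses the same number of items; with a varying count per row the result would be ragged.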
Quang's answer creates the mask in another way:
In [37]: arr[np.arange(arr.shape[1]) != idxs[:,None]]
Out[37]:
array([ 1, 2, 3, 5, 6, 8, 9, 11, 12, 13, 16, 17, 18, 19, 21, 22, 24,
26, 28, 29])
Let's try extracting the other items by masking, then reshape:
arr[np.arange(arr.shape[1]) != idxs[:,None]].reshape(len(arr),-1)
Thanks for the question and for the answers from Quang and hpaulj.
I just want to add a second scenario, where the deletion runs along the other axis.
The index array now has only 3 elements, because arr has only 3 columns. For example:
idxs2 = np.array([1,2,3])
To delete one element from each column according to the indices in idxs2, one can do this:
arr.T[np.array(np.arange(arr.shape[0]) != idxs2[:,None])].reshape(len(idxs2),-1).T
And the result becomes:
array([[ 0, 1, 2],
[ 6, 4, 5],
[ 9, 10, 8],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26],
[27, 28, 29]])
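For anyone who wants to convince themselves of the column-wise version, here is a sketch that builds the mask explicitly and cross-checks it against a plain per-column np.delete. The variable names mask, result and expected are mine:

```python
import numpy as np

arr = np.arange(10 * 3).reshape(10, 3)
idxs2 = np.array([1, 2, 3])  # row to drop from column 0, 1 and 2

mask = np.ones(arr.shape, dtype=bool)
mask[idxs2, np.arange(arr.shape[1])] = False

# Boolean indexing flattens in row-major order, so index the transpose
# to keep each column's survivors together, then transpose back.
result = arr.T[mask.T].reshape(arr.shape[1], -1).T

# Slow but obvious reference: delete from each column separately.
expected = np.column_stack(
    [np.delete(arr[:, j], idxs2[j]) for j in range(arr.shape[1])]
)
print(result.shape)                      # (9, 3)
print(np.array_equal(result, expected))  # True
```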
I have a matrix B with shape (6, 9). For every row of B, I want to add 1 at certain column indices. A column index may appear more than once, in which case I want to add m to that column if its index appears m times. Please see the following example code:
import numpy as np
B = np.arange(6*9).reshape(6, 9)
idx = np.array([[0, 1, 2],
[6, 7, 0],
[2, 3, 4],
[4, 5, 6]], dtype=int)  # the np.int alias was removed in NumPy 1.24
B[:, idx] += 1 # the result is not what I want.
Furthermore, np.add.at and np.bincount do not seem to work for the above case either.
I would appreciate your help. Thanks very much.
More Information:
In the idx array, the indices 0, 2, 4 and 6 appear twice, so I want
B[:, [0, 2, 4, 6]] += 2. The other indices appear only once, so just add 1 for them. So the final B should be
B = np.array([[ 2, 2, 4, 4, 6, 6, 8, 8, 8],
[11, 11, 13, 13, 15, 15, 17, 17, 17],
[20, 20, 22, 22, 24, 24, 26, 26, 26],
[29, 29, 31, 31, 33, 33, 35, 35, 35],
[38, 38, 40, 40, 42, 42, 44, 44, 44],
[47, 47, 49, 49, 51, 51, 53, 53, 53]])
I think you can use the np.add.at function to get what you want. Its syntax is
np.add.at('array', ('slice or array of indices for 1st dimension', 'slice or array of indices for 2nd dimension'), 'what to add')
So, in your case, if you want to add 1 in every row at every column specified in idx, you should use
>>> a = np.arange(6 * 9).reshape(6, 9)
>>> np.add.at(a, (np.s_[:], idx), 1)
np.s_[:] is a slice object that tells np.add.at to apply the operation to every row.
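To see the difference from the plain += attempt, here is a sketch that runs both on the question's data. The np.bincount variant at the end is an alternative I'm adding for comparison, not part of the answer above:

```python
import numpy as np

idx = np.array([[0, 1, 2],
                [6, 7, 0],
                [2, 3, 4],
                [4, 5, 6]])

# Fancy-index assignment is buffered: a repeated index is applied only once.
buffered = np.arange(6 * 9).reshape(6, 9)
buffered[:, idx] += 1

# np.add.at is unbuffered, so repeated indices accumulate.
a = np.arange(6 * 9).reshape(6, 9)
np.add.at(a, (np.s_[:], idx), 1)

# Alternative: count each column index once and broadcast over the rows.
counts = np.bincount(idx.ravel(), minlength=9)
b = np.arange(6 * 9).reshape(6, 9) + counts

print(buffered[0])  # [1 2 3 4 5 6 7 8 8]
print(a[0])         # [2 2 4 4 6 6 8 8 8]
print(np.array_equal(a, b))  # True
```

Since the same columns are updated in every row here, the bincount version does the counting once instead of once per row.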
If I slice a 2d array with a set of coordinates
>>> test = np.reshape(np.arange(40),(5,8))
>>> coords = np.array((1,3,4))
>>> slice = test[:, coords]
then my slice has the shape that I would expect
>>> slice.shape
(5, 3)
But if I repeat this with a 3d array
>>> test = np.reshape(np.arange(80),(2,5,8))
>>> slice = test[0, :, coords]
then the shape is now
>>> slice.shape
(3, 5)
Is there a reason that these are different? Separating the indices returns the shape that I would expect
>>> slice = test[0][:][coords]
>>> slice.shape
(5, 3)
Why would these views have different shapes?
slice = test[0, :, coords]
combines basic and advanced indexing, in effect saying "take the 0th element of the first coordinate, all of the second coordinate, and [1,3,4] of the third coordinate". Or more precisely, take coordinates (0,whatever,1) and make them the first row, (0,whatever,3) and make them the second row, and (0,whatever,4) and make them the third row. There are 5 whatevers, so you end up with (3,5).
The second example you gave is like this:
slice = test[0][:][coords]
In this case test[0] is a (5,8) array, [:] leaves it unchanged, and [coords] then takes its elements 1, 3 and 4, which are rows. That gives a (3,8) array rather than (5,3). To get (5,3), index the columns instead: test[0][:, coords].
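A quick shape check of each step in the chained form, as a sketch with the arrays from the question (the names plane, rows and cols are mine):

```python
import numpy as np

test = np.arange(80).reshape(2, 5, 8)
coords = np.array((1, 3, 4))

plane = test[0]            # first (5, 8) block
rows = test[0][:][coords]  # [:] changes nothing, then rows 1, 3, 4
cols = test[0][:, coords]  # columns 1, 3, 4 of the same block

print(plane.shape)  # (5, 8)
print(rows.shape)   # (3, 8)
print(cols.shape)   # (5, 3)
```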
Edit to discuss 2D case:
In the 2D case, where:
>>> test = np.reshape(np.arange(40),(5,8))
>>> test
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39]])
the behaviour is similar.
Case 1:
>>> test[:,[1,3,4]]
array([[ 1, 3, 4],
[ 9, 11, 12],
[17, 19, 20],
[25, 27, 28],
[33, 35, 36]])
is simply selecting columns 1,3, and 4.
Case 2:
>>> test[:][[1,3,4]]
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39]])
is taking the 1st, 3rd and 4th element of the array, which are the rows.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
The docs talk about the complexity of combining advanced and basic indexing.
test[0, :, coords]
The dimension produced by the advanced index coords comes first, and the dimension from the slice comes after it, producing the (3,5) shape.
The easiest way to understand the situation may be to think in terms of the result shape. There are two parts to the indexing operation, the subspace defined by the basic indexing (excluding integers) and the subspace from the advanced indexing part. [in the case where]
The advanced indexes are separated by a slice, ellipsis or newaxis. For example x[arr1, :, arr2].
.... the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that.
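The rule quoted above is easier to see by comparing shapes. A small sketch on the same (2,5,8) array; the names separated, adjacent and single are just labels for the three cases:

```python
import numpy as np

test = np.arange(80).reshape(2, 5, 8)
coords = np.array([1, 3, 4])

# Advanced indexes (0 and coords) separated by a slice:
# their broadcast dimension moves to the front.
separated = test[0, :, coords]

# Advanced indexes next to each other: their dimension stays in place.
adjacent = test[:, [0, 1, 2], coords]

# Only one advanced index: it replaces the axis it indexes.
single = test[0:1, :, coords]

print(separated.shape)  # (3, 5)
print(adjacent.shape)   # (2, 3)
print(single.shape)     # (1, 5, 3)
print(np.array_equal(separated, test[0][:, coords].T))  # True
```

The last line shows that the (3,5) result holds the same values as the (5,3) one, just transposed.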
I recall discussing this kind of indexing in a previous SO question, but it would take some digging to find it.
https://stackoverflow.com/a/28353446/901925 Why does the order of dimensions change with boolean indexing?
How does numpy order array slice indices?
The [:] in test[0][:][coords] does nothing. test[0][:,coords] produces the desired (5,3) result.
In [145]: test[0,:,[1,2,3]] # (3,5) array
Out[145]:
array([[ 1, 9, 17, 25, 33], # test[0,:,1]
[ 2, 10, 18, 26, 34],
[ 3, 11, 19, 27, 35]])
In [146]: test[0][:,[1,2,3]] # same values but (5,3)
Out[146]:
array([[ 1, 2, 3],
[ 9, 10, 11],
[17, 18, 19],
[25, 26, 27],
[33, 34, 35]])
In [147]: test[0][:][[1,2,3]] # [:] does nothing; select 3 from 2nd axis
Out[147]:
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31]])
In [148]: test[0][[1,2,3]] # same as test[0,[1,2,3],:]
Out[148]:
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31]])