How can I delete multiple rows of NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax
There are several ways to delete rows from NumPy array.
The easiest one is to use basic indexing as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
Yet another option is to use np.delete as in the question. For selecting the rows/columns for deletion it can accept slice objects, int, or array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
But all this time that I've been using NumPy I never needed this np.delete, as in this case it's much more convenient to use boolean indexing.
As an example, if I would want to remove/select those rows that start with a value greater than 12, I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
For more information refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
If you want to delete selected rows you can write like
np.delete(x, (1,2,5), axis = 0)
This will delete 1,2 and 5 th line, and if you want to delete like (1:5) try this one
np.delete(x, np.s_[0:5], axis = 0)
by this you can delete 0 to 4 lines from your array.
np.s_[0:5] --->> slice(0, 5, None)
both are same.
Pass the multiple row numbers to the list argument.
General Syntax:
np.delete(array_name,[rownumber1,rownumber2,..,rownumber n],axis=0)
Example: delete first three rows in an array:
np.delete(array_name,[0,1,2],axis=0)
Related
How can I delete multiple rows of NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax
There are several ways to delete rows from NumPy array.
The easiest one is to use basic indexing as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
Yet another option is to use np.delete as in the question. For selecting the rows/columns for deletion it can accept slice objects, int, or array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
But all this time that I've been using NumPy I never needed this np.delete, as in this case it's much more convenient to use boolean indexing.
As an example, if I would want to remove/select those rows that start with a value greater than 12, I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
For more information refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
If you want to delete selected rows you can write like
np.delete(x, (1,2,5), axis = 0)
This will delete 1,2 and 5 th line, and if you want to delete like (1:5) try this one
np.delete(x, np.s_[0:5], axis = 0)
by this you can delete 0 to 4 lines from your array.
np.s_[0:5] --->> slice(0, 5, None)
both are same.
Pass the multiple row numbers to the list argument.
General Syntax:
np.delete(array_name,[rownumber1,rownumber2,..,rownumber n],axis=0)
Example: delete first three rows in an array:
np.delete(array_name,[0,1,2],axis=0)
I have an ndarray of shape (10, 3) and an index list of length 10:
import numpy as np
arr = np.arange(10* 3).reshape((10, 3))
idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1 , 0])
I want to use numpy delete (or a numpy function that is suited better for the task) to delete the values in arr as indicated by idxs for each row. So in the zeroth row of arr I want to delete the 0th entry, in the first the first, in the second the first, and so on.
I tried something like
np.delete(arr, idxs, axis=1)
but it won't work. Then I tried building an index list like this:
idlist = [np.arange(len(idxs)), idxs]
np.delete(arr, idlist)
but this doesn't give me the results I want either.
#Quang's answer is good, but may benefit from some explanation.
np.delete works with whole rows or columns, not selected elements from each.
In [30]: arr = np.arange(10* 3).reshape((10, 3))
...: idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1 , 0])
Selecting items from the array is easy:
In [31]: arr[np.arange(10), idxs]
Out[31]: array([ 0, 4, 7, 10, 14, 15, 20, 23, 25, 27])
Selecting everything but these, takes a bit more work. np.delete is complex general code that does different things depending on the delete specification. But one thing it can do is create a True mask, and set the delete items to False.
For your 2d case we can:
In [33]: mask = np.ones(arr.shape, bool)
In [34]: mask[np.arange(10), idxs] = False
In [35]: arr[mask]
Out[35]:
array([ 1, 2, 3, 5, 6, 8, 9, 11, 12, 13, 16, 17, 18, 19, 21, 22, 24,
26, 28, 29])
boolean indexing produces a flat array, so we need to reshape to get 2d:
In [36]: arr[mask].reshape(10,2)
Out[36]:
array([[ 1, 2],
[ 3, 5],
[ 6, 8],
[ 9, 11],
[12, 13],
[16, 17],
[18, 19],
[21, 22],
[24, 26],
[28, 29]])
The Quand's answer creates the mask in another way:
In [37]: arr[np.arange(arr.shape[1]) != idxs[:,None]]
Out[37]:
array([ 1, 2, 3, 5, 6, 8, 9, 11, 12, 13, 16, 17, 18, 19, 21, 22, 24,
26, 28, 29])
Let's try extracting the other items by masking, then reshape:
arr[np.arange(arr.shape[1]) != idxs[:,None]].reshape(len(arr),-1)
Thanks for your question and the answers from Quang, and hpaulj.
I just want to add a second senario, where one wants to do the deletion from the other axis.
The index now has only 3 elements because there are only 3 columns in arr, for example:
idxs2 = np.array([1,2,3])
To delete the elements of each column according to the index in idxs2, one can do this
arr.T[np.array(np.arange(arr.shape[0]) != idxs2[:,None])].reshape(len(idxs2),-1).T
And the result becomes:
array([[ 0, 1, 2],
[ 6, 4, 5],
[ 9, 10, 8],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26],
[27, 28, 29]])
Say I have a 3D array such as:
>>> arr = numpy.arange(36).reshape(3, 4, 3)
>>> arr
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]],
[[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]],
[[24, 25, 26],
[27, 28, 29],
[30, 31, 32],
[33, 34, 35]]])
How do I extract the nth value from each innermost row?
If I were to take the values for index 1, how can pull out the following?
array([[ 1, 4, 7, 10],
[13, 16, 19, 22],
[25, 28, 31, 34]])
or
array([ 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34])
TLDR:
arr[:,n] # for 2d
arr[:,:,n] # for 3d
arr[:,:,:,n] # for 4d
arr[:,:,:,:,n] # for 5d
You can access using the last index:
import numpy
arr = numpy.arange(36).reshape(3, 4, 3)
print(arr[:, :, 1])
Output
[[ 1 4 7 10]
[13 16 19 22]
[25 28 31 34]]
Or,
print(arr[:, :, 1].flatten())
Output
[ 1 4 7 10 13 16 19 22 25 28 31 34]
You can find more information about numpy indexing here.
UPDATE
As mentioned by #MadPhysicist in the comments, you can use ravel instead of flatten, the main difference is that flatten returns a copy and ravel returns a view. Also you can use an arr[..., 1] to access the last index, known as an ellipsis. From the documentation:
Ellipsis expand to the number of : objects needed to make a selection
tuple of the same length as x.ndim
Further
Ravel vs Flatten
Ellipsis
To extract elements from such array you can do the following:
>>> import numpy as np
>>> arr = np.arange(36).reshape(3, 4, 3)
>>> arr[:,:,1]
array([[ 1, 4, 7, 10],
[13, 16, 19, 22],
[25, 28, 31, 34]])
and if you want the flattened array you can do:
>>> arr.flatten()
array([ 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34])
I have a 2d array like z and a 1d array denoting the "start column position" like starts. In addition I have a fixed row_length = 2
z = np.arange(35).reshape(5, -1)
# --> array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34]])
starts = np.array([1,5,3,3,2])
What I want is the outcome of this slow for-loop, just quicker if possible.
result = np.zeros(
(z.shape[0], row_length),
dtype=z.dtype
)
for i in range(z.shape[0]):
s = starts[i]
result[i] = z[i, s:s+row_length]
So result in this example should look like this in the end:
array([[ 1, 2],
[12, 13],
[17, 18],
[24, 25],
[30, 31]])
I can't seem to find a way using either fancy indexing or np.take to deliver this result.
One approach would be to get those indices using broadcasted additions with those starts and row_length and then use NumPy's advanced-indexing to extract out all of those elements off the data array, like so -
idx = starts[:,None] + np.arange(row_length)
out = z[np.arange(idx.shape[0])[:,None], idx]
Sample run -
In [197]: z
Out[197]:
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34]])
In [198]: starts = np.array([1,5,3,3,2])
In [199]: row_length = 2
In [200]: idx = starts[:,None] + np.arange(row_length)
In [202]: z[np.arange(idx.shape[0])[:,None], idx]
Out[202]:
array([[ 1, 2],
[12, 13],
[17, 18],
[24, 25],
[30, 31]])
I have the following slicing problem in numpy.
a = np.arange(36).reshape(-1,4)
a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])
In my problem always three rows represent one sample, in my case coordinates.
I want to access this matrix in a way that if I use a[0:2] to get the following:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]
These are the first two coordinate samples.
I have to extract a large amount of these coordinate sets from an array.
Thanks
Based on How do you split a list into evenly sized chunks?, I found the following solution, which gives me the desired result.
def chunks(l, n, indices):
return np.vstack([l[idx*n:idx*n+n] for idx in indices])
chunks(a,3,[0,2])
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])
Probably this solution could be improved and somebody won't need the stacking.
If three rows are a sample, you can reshape your array to reflect that, use fancy indexing to retrieve your samples, then undo the shape change:
>>> a = a.reshape(-1, 3, 4)
>>> a[[0, 2]].reshape(-1, 4)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])