Related
How can I delete multiple rows of NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax
There are several ways to delete rows from NumPy array.
The easiest one is to use basic indexing as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
Yet another option is to use np.delete as in the question. For selecting the rows/columns for deletion it can accept slice objects, int, or array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
But all this time that I've been using NumPy I never needed this np.delete, as in this case it's much more convenient to use boolean indexing.
As an example, if I would want to remove/select those rows that start with a value greater than 12, I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
For more information refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
If you want to delete selected rows you can write like
np.delete(x, (1,2,5), axis = 0)
This will delete 1,2 and 5 th line, and if you want to delete like (1:5) try this one
np.delete(x, np.s_[0:5], axis = 0)
by this you can delete 0 to 4 lines from your array.
np.s_[0:5] --->> slice(0, 5, None)
both are same.
Pass the multiple row numbers to the list argument.
General Syntax:
np.delete(array_name,[rownumber1,rownumber2,..,rownumber n],axis=0)
Example: delete first three rows in an array:
np.delete(array_name,[0,1,2],axis=0)
I have an ndarray of shape (10, 3) and an index list of length 10:
import numpy as np
arr = np.arange(10* 3).reshape((10, 3))
idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1 , 0])
I want to use numpy delete (or a numpy function that is suited better for the task) to delete the values in arr as indicated by idxs for each row. So in the zeroth row of arr I want to delete the 0th entry, in the first the first, in the second the first, and so on.
I tried something like
np.delete(arr, idxs, axis=1)
but it won't work. Then I tried building an index list like this:
idlist = [np.arange(len(idxs)), idxs]
np.delete(arr, idlist)
but this doesn't give me the results I want either.
#Quang's answer is good, but may benefit from some explanation.
np.delete works with whole rows or columns, not selected elements from each.
In [30]: arr = np.arange(10* 3).reshape((10, 3))
...: idxs = np.array([0, 1, 1, 1, 2, 0, 2, 2, 1 , 0])
Selecting items from the array is easy:
In [31]: arr[np.arange(10), idxs]
Out[31]: array([ 0, 4, 7, 10, 14, 15, 20, 23, 25, 27])
Selecting everything but these, takes a bit more work. np.delete is complex general code that does different things depending on the delete specification. But one thing it can do is create a True mask, and set the delete items to False.
For your 2d case we can:
In [33]: mask = np.ones(arr.shape, bool)
In [34]: mask[np.arange(10), idxs] = False
In [35]: arr[mask]
Out[35]:
array([ 1, 2, 3, 5, 6, 8, 9, 11, 12, 13, 16, 17, 18, 19, 21, 22, 24,
26, 28, 29])
boolean indexing produces a flat array, so we need to reshape to get 2d:
In [36]: arr[mask].reshape(10,2)
Out[36]:
array([[ 1, 2],
[ 3, 5],
[ 6, 8],
[ 9, 11],
[12, 13],
[16, 17],
[18, 19],
[21, 22],
[24, 26],
[28, 29]])
The Quand's answer creates the mask in another way:
In [37]: arr[np.arange(arr.shape[1]) != idxs[:,None]]
Out[37]:
array([ 1, 2, 3, 5, 6, 8, 9, 11, 12, 13, 16, 17, 18, 19, 21, 22, 24,
26, 28, 29])
Let's try extracting the other items by masking, then reshape:
arr[np.arange(arr.shape[1]) != idxs[:,None]].reshape(len(arr),-1)
Thanks for your question and the answers from Quang, and hpaulj.
I just want to add a second senario, where one wants to do the deletion from the other axis.
The index now has only 3 elements because there are only 3 columns in arr, for example:
idxs2 = np.array([1,2,3])
To delete the elements of each column according to the index in idxs2, one can do this
arr.T[np.array(np.arange(arr.shape[0]) != idxs2[:,None])].reshape(len(idxs2),-1).T
And the result becomes:
array([[ 0, 1, 2],
[ 6, 4, 5],
[ 9, 10, 8],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26],
[27, 28, 29]])
If I have a list:
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
I would like to cast the above list into an array with the following arrangements of the elements:
array([[ 1, 2, 3, 7, 8, 9]
[ 4, 5, 6, 10, 11, 12]
[13, 14, 15, 19, 20, 21]
[16, 17, 18, 22, 23, 24]])
How do I do this or what is the best way to do this? Many thanks.
I have done this in a crude way below where I will just get all the sub-matrix and then concatenate all of them at the end:
np.array(results[arr.shape[0]*arr.shape[1]*0:arr.shape[0]*arr.shape[1]*1]).reshape(arr.shape[0], arr.shape[1])
array([[1, 2, 3],
[4, 5, 6]])
np.array(results[arr.shape[0]*arr.shape[1]*1:arr.shape[0]*arr.shape[1]*2]).reshape(arr.shape[0], arr.shape[1])
array([[ 7, 8, 9],
[ 10, 11, 12]])
etc,
But I will need a more generalized way of doing this (if there is one) as I will need to do this for an array of any size.
You could use the reshape function from numpy, with a bit of indexing :
a = np.arange(24)
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23])
Using reshape and a bit of indexing :
a = a.reshape((8,3))
idx = np.arange(2)
idx = np.concatenate((idx,idx+4))
idx = np.ravel([idx,idx+2],'F')
b = a[idx,:].reshape((4,6))
Ouptut :
>>> b
array([[ 0, 1, 2, 6, 7, 8],
[ 3, 4, 5, 9, 10, 11],
[12, 13, 14, 18, 19, 20],
[15, 16, 17, 21, 22, 23]])
Here the tuple (4,6) passed to reshape indicates that you want your array to be 2 dimensional, and have 4 arrays of 6 elements. Those values can be computed.
Then we compute the index to set the correct order of the data. Obvisouly, this a complicated bit here. As I'm not sure what you mean by "any size of data", its difficult for me to give you a agnostic way to compute that index.
Obviously, if you are using a list and not an np.array, you might have to convert the list first, for example by using np.array(your_list).
Edit :
I'm not sure if this exactly what you are after, but this should work for any array evenly divisible by 6 :
def custom_order(size):
a = np.arange(size)
a = a.reshape((size//3,3))
idx = np.arange(2)
idx = np.concatenate([idx+4*i for i in range(0,size//(6*2))])
idx = np.ravel([idx,idx+2],'F')
b = a[idx,:].reshape((size//6,6))
return b
>>> custom_order(48)
array([[ 0, 1, 2, 6, 7, 8],
[ 3, 4, 5, 9, 10, 11],
[12, 13, 14, 18, 19, 20],
[15, 16, 17, 21, 22, 23],
[24, 25, 26, 30, 31, 32],
[27, 28, 29, 33, 34, 35],
[36, 37, 38, 42, 43, 44],
[39, 40, 41, 45, 46, 47]])
How can I delete multiple rows of NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax
There are several ways to delete rows from NumPy array.
The easiest one is to use basic indexing as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
Yet another option is to use np.delete as in the question. For selecting the rows/columns for deletion it can accept slice objects, int, or array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
But all this time that I've been using NumPy I never needed this np.delete, as in this case it's much more convenient to use boolean indexing.
As an example, if I would want to remove/select those rows that start with a value greater than 12, I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
For more information refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
If you want to delete selected rows you can write like
np.delete(x, (1,2,5), axis = 0)
This will delete 1,2 and 5 th line, and if you want to delete like (1:5) try this one
np.delete(x, np.s_[0:5], axis = 0)
by this you can delete 0 to 4 lines from your array.
np.s_[0:5] --->> slice(0, 5, None)
both are same.
Pass the multiple row numbers to the list argument.
General Syntax:
np.delete(array_name,[rownumber1,rownumber2,..,rownumber n],axis=0)
Example: delete first three rows in an array:
np.delete(array_name,[0,1,2],axis=0)
is there a way to do the following without an if clause?
I'm reading a set of netcdf files with pupynere and want to build an array with numpy append. Sometimes the input data is multi-dimensional (see variable "a" below), sometimes one dimensional ("b"), but the number of elements in the first dimension is always the same ("9" in the example below).
> import numpy as np
> a = np.arange(27).reshape(3,9)
> b = np.arange(9)
> a.shape
(3, 9)
> b.shape
(9,)
this works as expected:
> np.append(a,a, axis=0)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26]])
but, appending b does not work so elegantly:
> np.append(a,b, axis=0)
ValueError: arrays must have same number of dimensions
The problem with append is (from the numpy manual)
"When axis is specified, values must have the correct shape."
I'd have to cast first in order to get the right result.
> np.append(a,b.reshape(1,9), axis=0)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8]])
So, in my file reading loop, I'm currently using an if clause like this:
for i in [a, b]:
if np.size(i.shape) == 2:
result = np.append(result, i, axis=0)
else:
result = np.append(result, i.reshape(1,9), axis=0)
Is there a way to append "a" and "b" without the if statement?
EDIT: While #Sven answered the original question perfectly (using np.atleast_2d()), he (and others) pointed out that the code is inefficient. In an answer below, I combined their suggestions and replaces my original code. It should be much more efficient now. Thanks.
You can use numpy.atleast_2d():
result = np.append(result, np.atleast_2d(i), axis=0)
That said, note that the repeated use of numpy.append() is a very inefficient way to build a NumPy array -- it has to be reallocated in every step. If at all possible, preallocate the array with the desired final size and populate it afterwards using slicing.
You can just add all of the arrays to a list, then use np.vstack() to concatenate them all together at the end. This avoids constantly reallocating the growing array with every append.
|1> a = np.arange(27).reshape(3,9)
|2> b = np.arange(9)
|3> np.vstack([a,b])
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8]])
I'm going to improve my code with the help of #Sven, #Henry and #Robert. #Sven answered the question, so he earns the reputation for this question, but - as highlighted by him and others -there is a more efficient way of doing what I want.
This involves using a python list, which allows appending with a performance penalty of O(1) whereas numpy.append() has a performance penalty of O(N**2). Afterwards, the list is converted to a numpy array:
Suppose i is either of type a or b:
> a = np.arange(27).reshape(3,9)
> b = np.arange(9)
> a.shape
(3, 9)
> b.shape
(9,)
Initialise list and append all read data, e.g. if data appear in order 'aaba'.
> mList = []
> for i in [a,a,b,a]:
mList.append(i)
Your mList will look like this:
> mList
[array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26]]),
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26]]),
array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26]])]
finally, vstack the list to form a numpy array:
> result = np.vstack(mList[:])
> result.shape
(10, 9)
Thanks again for valuable help.
As pointed out, append needs to reallocate every numpy array. An alternative solution that allocates once would be something like this:
total_size = 0
for i in [a,b]:
total_size += i.size
result = numpy.empty(total_size, dtype=a.dtype)
offset = 0
for i in [a,b]:
# copy in the array
result[offset:offset+i.size] = i.ravel()
offset += i.size
# if you know its always divisible by 9:
result = result.reshape(result.size//9, 9)
If you can't precompute the array size, then perhaps you can put an upper bound on the size and then just preallocate a block that will always be big enough. Then you can just make the result a view into that block:
result = result[0:known_final_size]