How can I delete multiple rows of a NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax
There are several ways to delete rows from a NumPy array.
The easiest one is to use basic indexing as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
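Applied to the question's own array, a minimal sketch. It uses a seeded generator only so the example is reproducible (the seed is an arbitrary choice). Note that basic slicing returns a view that shares memory with `x`, whereas `np.delete` always builds a new array. Call `.copy()` if you need an independent result from slicing.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.random((10, 5))

# "Delete" the first five rows by keeping everything from row 5 onward.
kept = x[5:]                      # basic slicing: a view into x, no data copied
print(kept.shape)                 # (5, 5)
print(np.shares_memory(kept, x))  # True: writing into kept also changes x

# np.delete with a slice object gives the same values, but as a new array.
deleted = np.delete(x, slice(0, 5), axis=0)
print(np.array_equal(kept, deleted))  # True
print(np.shares_memory(deleted, x))   # False

# If you need an independent array from slicing, copy explicitly.
independent = x[5:].copy()
```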
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
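Fancy indexing can also express deletion: compute the indices of the rows you want to keep and index with those. A small sketch using `np.setdiff1d` to build the complement of the rows to delete (the row list `[0, 2, 3]` is just an example):

```python
import numpy as np

x = np.arange(35).reshape(7, 5)
to_delete = [0, 2, 3]

# Row indices that are NOT in to_delete (setdiff1d returns them sorted and unique).
keep = np.setdiff1d(np.arange(x.shape[0]), to_delete)
print(keep)  # [1 4 5 6]

result = x[keep]     # fancy indexing always returns a copy
print(result[:, 0])  # [ 5 20 25 30]
```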
Yet another option is to use np.delete as in the question. To select the rows/columns to delete, it accepts a slice object, an int, or an array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
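One thing that often trips people up: `np.delete` does not modify the array in place. It returns a new array, so you have to keep (or rebind) the result. A short sketch:

```python
import numpy as np

x = np.arange(35).reshape(7, 5)

np.delete(x, [0, 2, 3], axis=0)  # returns a new array; the result is discarded here
print(x.shape)                   # (7, 5): x itself is unchanged

x = np.delete(x, [0, 2, 3], axis=0)  # rebind the name to keep the result
print(x.shape)                       # (4, 5)
```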
But in all the time I've been using NumPy, I have never needed np.delete: in cases like this, boolean indexing is much more convenient.
For example, to split the rows by whether their first value is less than 12, I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
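A boolean mask can also reproduce `np.delete` for an explicit list of row numbers, and masks combine with `&` / `|`. A sketch under the same `x` as above (the row list and the 10..25 range are arbitrary examples). The parentheses around each comparison are required because `&` binds more tightly than `<` and `>=`.

```python
import numpy as np

x = np.arange(35).reshape(7, 5)
to_delete = [0, 2, 3]

# Start with "keep every row", then switch off the rows to delete.
mask = np.ones(x.shape[0], dtype=bool)
mask[to_delete] = False
result = x[mask]
print(result[:, 0])  # [ 5 20 25 30]

# Combine conditions: rows whose first value is in [10, 25).
between = x[(x[:, 0] >= 10) & (x[:, 0] < 25)]
print(between[:, 0])  # [10 15 20]
```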
For more information, refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
If you want to delete selected rows, you can write
np.delete(x, (1, 2, 5), axis=0)
This deletes rows 1, 2 and 5 (zero-based). To delete a range such as 0:5, use np.s_:
np.delete(x, np.s_[0:5], axis=0)
This deletes rows 0 to 4 of your array.
np.s_[0:5] evaluates to slice(0, 5, None), so both forms are equivalent.
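A quick check of both forms on a small array (`np.arange(35).reshape(7, 5)` is just test data). Because `np.s_` only converts slice syntax into a `slice` object, slices with a step work as well:

```python
import numpy as np

x = np.arange(35).reshape(7, 5)

# np.s_ just turns slice syntax into a slice object.
print(np.s_[0:5] == slice(0, 5))  # True

# Remove rows 1, 2 and 5 (zero-based).
print(np.delete(x, (1, 2, 5), axis=0)[:, 0])  # [ 0 15 20 30]

# Drop every other row, starting with row 0.
print(np.delete(x, np.s_[::2], axis=0)[:, 0])  # [ 5 15 25]
```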
Pass the row numbers as a list.
General syntax:
np.delete(array_name, [rownumber1, rownumber2, ..., rownumberN], axis=0)
Example: delete the first three rows of an array:
np.delete(array_name, [0, 1, 2], axis=0)
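The `axis` argument decides what the index list refers to. A sketch with a placeholder `array_name` (made-up 3x4 data): `axis=0` removes rows, `axis=1` removes columns, and with no `axis` the indices apply to the flattened array.

```python
import numpy as np

array_name = np.arange(12).reshape(3, 4)  # placeholder data for illustration

rows_removed = np.delete(array_name, [0, 1], axis=0)  # drop rows 0 and 1
cols_removed = np.delete(array_name, [0, 1], axis=1)  # drop columns 0 and 1
flat_removed = np.delete(array_name, [0, 1])          # no axis: flattened array
print(rows_removed.shape, cols_removed.shape, flat_removed.shape)  # (1, 4) (3, 2) (10,)
```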
I am writing a scheduling program for group play. I have a schedule that works for 32-4-8 (32 players, 4 players per group, 8 rounds) with no duplicate partners or opponents. However, due to space constraints, only 28 players / 7 groups can play in each round. So I have to modify the schedule so that every player gets 7 games, 1 bye, and as few repeat partners or opponents as possible.
import numpy as np
sched = np.array([
[[ 3, 28, 17, 14],
[23, 30, 22, 1],
[ 2, 5, 27, 25],
[20, 8, 10, 16],
[ 0, 24, 26, 11],
[ 4, 21, 31, 7],
[19, 6, 29, 15],
[13, 18, 12, 9]],
[[20, 15, 24, 31],
[ 3, 21, 16, 13],
[ 6, 30, 4, 5],
[28, 8, 0, 7],
[25, 29, 17, 23],
[14, 9, 2, 22],
[27, 12, 1, 11],
[26, 10, 19, 18]],
[[10, 4, 23, 12],
[ 9, 28, 25, 31],
[ 5, 13, 22, 8],
[15, 7, 30, 2],
[16, 19, 11, 14],
[18, 17, 24, 6],
[21, 0, 27, 20],
[ 3, 26, 29, 1]],
[[18, 20, 28, 1],
[ 8, 9, 3, 4],
[12, 17, 31, 5],
[13, 30, 27, 14],
[19, 25, 24, 7],
[ 2, 6, 21, 26],
[10, 11, 29, 22],
[15, 23, 0, 16]],
[[22, 21, 25, 15],
[26, 12, 20, 14],
[28, 5, 24, 10],
[11, 6, 31, 13],
[23, 27, 7, 3],
[ 0, 19, 9, 1],
[18, 30, 8, 29],
[16, 17, 2, 4]],
[[29, 28, 12, 21],
[ 9, 16, 27, 6],
[19, 17, 20, 30],
[ 2, 8, 24, 23],
[ 5, 11, 18, 7],
[26, 13, 25, 4],
[ 1, 10, 15, 14],
[ 0, 22, 31, 3]],
[[31, 19, 27, 8],
[20, 5, 29, 2],
[24, 16, 22, 12],
[25, 3, 10, 6],
[17, 1, 7, 13],
[ 4, 0, 14, 18],
[23, 28, 26, 15],
[11, 21, 9, 30]],
[[31, 18, 1, 16],
[23, 14, 21, 5],
[ 8, 3, 11, 15],
[26, 17, 9, 10],
[30, 12, 25, 0],
[22, 20, 7, 6],
[27, 4, 29, 24],
[13, 19, 28, 2]]
])
To determine the best bye selections, I randomly select one matchup from each round of play as the bye, and repeat this many times. I then score each bye selection by the number of players who have exactly 1 bye and keep the highest-scoring one, to minimize the necessary alterations to the schedule.
def bincount2d(arr, bins=None):
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = np.arange(len(arr))
    for col in arr.T:
        count[indexing, col] += 1
    return count
# randomly sample one game per round as byes
# repeat n times (here 10000)
times = 10000
idx1 = np.tile(np.arange(sched.shape[0]), times)
idx2 = np.random.randint(sched.shape[1], size=sched.shape[0] * times)
population_byes = sched[idx1, idx2].reshape(times, sched.shape[1], sched.shape[2])
# get player counts for byes
# can reshape because interested in # of byes for entire schedule
# so no need to segment players by rounds for these counts
count_shape = (population_byes.shape[0], population_byes.shape[1] * population_byes.shape[2])
counts = bincount2d(population_byes.reshape(count_shape))
# fitness is the number of players with one bye
# the higher the value, the less we need to do to mess with the schedule
fitness = np.apply_along_axis(lambda x: (x == 1).sum(), 1, counts)
byes = population_byes[np.argmax(fitness)]
My questions are as follows:
(1) is there an efficient way to account for the values for which there are no counts (I know the indices should be from 0 to 31)? The bincount2d does not have values for the missing values in that range.
(2) Is there a vectorized/more efficient way than the np.apply_along_axis line to get the count of elements equal to 1?
(3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?
> (1) is there an efficient way to account for the values for which there are no counts (I know the indices should be from 0 to 31)? The bincount2d does not have values for the missing values in that range.
bincount2d is slow because of its memory-access pattern. Iterating over arr.T is expensive: NumPy transposes lazily (arr.T is just a view), so each column is read with a large stride. Moreover, the loop updates a fairly big count array at scattered positions, and this random memory access is bad for CPU caches. NumPy is simply not great for this kind of computation. Numba can implement the operation efficiently:
import numpy as np
import numba as nb
# You may need to tune the types on your machine.
# Alternatively, drop the signature (optionally adding cache=True to cache the compiled code on disk)
# and let Numba infer the types on the first call, which makes that first call slower.
@nb.njit('int64[:,::1](int64[:,::1], optional(int64))')
def bincount2d_fast(arr, bins=None):
    if bins is None:
        nbins = np.max(arr) + 1
    else:
        nbins = np.int64(bins)
    count = np.zeros((arr.shape[0], nbins), dtype=np.int64)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            count[i, arr[i, j]] += 1
    return count
The above code is 10 times faster than the original bincount2d function on my machine.
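Regarding the missing values in (1): both bincount2d and bincount2d_fast already take a bins argument, so passing bins=32 gives every player 0..31 a column, with 0 for players who never appear. If you would rather stay in plain NumPy, a common trick (not what this answer uses, and not benchmarked here against the Numba version) is to shift row i by i * nbins and do a single np.bincount on the flattened array. The sketch below assumes all values lie in [0, nbins); a larger value would spill into the next row's bins.

```python
import numpy as np

def bincount2d_vectorized(arr, nbins):
    """Row-wise bincount of a 2D array of non-negative ints, with a fixed number of bins."""
    arr = np.asarray(arr)
    nrows = arr.shape[0]
    # Shift row i into its own block of bins [i * nbins, (i + 1) * nbins).
    offsets = np.arange(nrows)[:, None] * nbins
    flat = (arr + offsets).ravel()
    # minlength guarantees the full nrows * nbins length even if high bins are empty.
    return np.bincount(flat, minlength=nrows * nbins).reshape(nrows, nbins)

# Players are numbered 0..31, so use 32 bins: absent players get a count of 0.
byes = np.array([[0, 1, 1, 5],
                 [2, 2, 2, 3]])
counts = bincount2d_vectorized(byes, nbins=32)
print(counts.shape)   # (2, 32)
print(counts[0, :6])  # [1 2 0 0 0 1]
print(counts[1, 31])  # 0
```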
> (2) Is there a vectorized/more efficient way than the np.apply_along_axis line to get the count of elements equal to 1?
Yes. You can do the operation on the whole 2D array and perform the reduction along a given axis. Here is an example:
```python
fitness = (counts == 1).sum(axis=1)
byes = population_byes[np.argmax(fitness)]
```
This is roughly 30 times faster on my machine.
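To convince yourself that the vectorized reduction gives the same fitness values as the apply_along_axis line, here is a small check on random stand-in counts (made-up data, not the real schedule). np.count_nonzero with an axis argument is an equivalent spelling.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
counts = rng.integers(0, 3, size=(1000, 32))  # stand-in for the real bye counts

slow = np.apply_along_axis(lambda row: (row == 1).sum(), 1, counts)
fast = (counts == 1).sum(axis=1)
also_fast = np.count_nonzero(counts == 1, axis=1)

print(np.array_equal(slow, fast), np.array_equal(fast, also_fast))  # True True
```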
> (3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?
A straightforward solution is to use Numba again with plain loops. Another solution is to copy the values to swap into temporary arrays and write them back with indirect (fancy) indexing, adapted to your exact needs (as @WholeBrain proposed). Something like:
```python
# all_x1, all_y1, all_z1, etc. are 1D NumPy integer arrays holding the
# (round, group, slot) coordinates of the items to swap
arr[all_x2, all_y2, all_z2], arr[all_x1, all_y1, all_z1] = arr[all_x1, all_y1, all_z1], arr[all_x2, all_y2, all_z2]
```
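A runnable sketch of the fancy-indexing swap on a toy 3D schedule (2 rounds, 3 groups, 4 slots; the coordinates are arbitrary examples). The right-hand side uses fancy indexing, which returns copies, so the tuple assignment does not overwrite values before they are read. The sketch assumes no element appears more than once across the two coordinate lists.

```python
import numpy as np

sched = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # toy schedule: 2 rounds, 3 groups, 4 slots

# (round, group, slot) coordinates to swap pairwise:
# (0, 0, 1) <-> (0, 2, 3) and (1, 1, 0) <-> (1, 2, 2).
r1, g1, s1 = np.array([0, 1]), np.array([0, 1]), np.array([1, 0])
r2, g2, s2 = np.array([0, 1]), np.array([2, 2]), np.array([3, 2])

before = sched.copy()
sched[r1, g1, s1], sched[r2, g2, s2] = sched[r2, g2, s2], sched[r1, g1, s1]

print(before[0, 0, 1], before[0, 2, 3])  # 1 11
print(sched[0, 0, 1], sched[0, 2, 3])    # 11 1
```

Since only positions within a round are exchanged, each round still contains the same set of players.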
If I have a 3D array of shape (4, 3, 3) like this:
[[[ 0, 1, 2],
  [ 3, 4, 5],
  [ 6, 7, 8]],
 [[ 9,10,11],
  [12,13,14],
  [15,16,17]],
 [[18,19,20],
  [21,22,23],
  [24,25,26]],
 [[27,28,29],
  [30,31,32],
  [33,34,35]]]
How would I convert it to a 2D array of shape (6, 6) like the following, so that the first half of the blocks fills the top half of the result and the second half fills the bottom half (in my real case, the result is 160x160)?
[[0,1,2,9,10,11]
[3,4,5,12,13,14]
[6,7,8,15,16,17]
[18,19,20,27,28,29]
[21,22,23,30,31,32]
[24,25,26,33,34,35]]
My array creation:
qDCTReversed = np.zeros((400,8,8), dtype=int)
And I need a (160,160) array.
A very fast one-line solution that uses no for-loops is this:
# initialization
qDCTReversed = np.arange(4*3*3).reshape((4,3,3))
# calculation
qDCTReversed = qDCTReversed.reshape((2,2,3,3)).transpose((0,2,1,3)).reshape((6,6))
or for the (400,8,8) array:
qDCTReversed.reshape((20,20,8,8)).transpose((0,2,1,3)).reshape((160,160))
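The same reshape/transpose trick, wrapped in a small helper so the grid size is not hard-coded (the function names are made up for this sketch). Blocks are laid out row-major: block k goes to block-row k // n_block_cols. Since the blocks look like 8x8 DCT tiles, the inverse operation is included too.

```python
import numpy as np

def blocks_to_grid(blocks, n_block_rows, n_block_cols):
    """Tile an (n_block_rows * n_block_cols, h, w) stack of blocks into one 2D array."""
    n, h, w = blocks.shape
    if n != n_block_rows * n_block_cols:
        raise ValueError("number of blocks must exactly fill the grid")
    return (blocks.reshape(n_block_rows, n_block_cols, h, w)
                  .transpose(0, 2, 1, 3)  # -> (block_row, pixel_row, block_col, pixel_col)
                  .reshape(n_block_rows * h, n_block_cols * w))

def grid_to_blocks(grid, h, w):
    """Inverse of blocks_to_grid: cut a 2D array into row-major (h, w) blocks."""
    H, W = grid.shape
    return (grid.reshape(H // h, h, W // w, w)
                .transpose(0, 2, 1, 3)  # -> (block_row, block_col, pixel_row, pixel_col)
                .reshape(-1, h, w))

small = blocks_to_grid(np.arange(36).reshape(4, 3, 3), 2, 2)
print(small[0])  # [ 0  1  2  9 10 11]

big = blocks_to_grid(np.zeros((400, 8, 8), dtype=int), 20, 20)
print(big.shape)  # (160, 160)
```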
Speed comparison:
Mstaino's answer: 0.393 ms
yatu's answer: 0.138 ms
This answer: 0.016 ms
You can do this with a nested list comprehension over the list:
a = [[[ 0, 1, 2], [ 9,10,11]],
[[ 3, 4, 5], [12,13,14]],
[[ 6, 7, 8], [15,16,17]],
[[18,19,20], [27,28,29]],
[[21,22,23], [30,31,32]],
[[24,25,26], [33,34,35]]]
b = [[i for j in k for i in j ] for k in a]
print(b)
outputs (shown here with one row per line for readability):
[ 0, 1, 2, 9, 10, 11]
[ 3, 4, 5, 12, 13, 14]
[ 6, 7, 8, 15, 16, 17]
[18, 19, 20, 27, 28, 29]
[21, 22, 23, 30, 31, 32]
[24, 25, 26, 33, 34, 35]
The reshape you ask for can be done with:
import numpy as np
x = np.arange(36).reshape((4,3,3))
np.vstack([np.hstack(x[2*i:2*i+2]) for i in range(x.shape[0]//2)])
array([[ 0, 1, 2, 9, 10, 11],
[ 3, 4, 5, 12, 13, 14],
[ 6, 7, 8, 15, 16, 17],
[18, 19, 20, 27, 28, 29],
[21, 22, 23, 30, 31, 32],
[24, 25, 26, 33, 34, 35]])
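As another option, np.block assembles a nested list of arrays directly: inner lists are joined horizontally, the outer list vertically. A sketch on the same 4x3x3 example, plus a generic version that builds the nested list with a comprehension (pairs of blocks per row, as in the question):

```python
import numpy as np

x = np.arange(36).reshape(4, 3, 3)

grid = np.block([[x[0], x[1]],
                 [x[2], x[3]]])
print(grid[3])  # [18 19 20 27 28 29]

# Generic version: two blocks per block-row.
grid2 = np.block([list(x[i:i + 2]) for i in range(0, len(x), 2)])
print(np.array_equal(grid, grid2))  # True
```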
I have the following slicing problem in numpy.
a = np.arange(36).reshape(-1,4)
a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])
In my problem, every three rows represent one sample (in my case, coordinates).
I want to index this matrix so that a[0:2] returns the following:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
These are the first two coordinate samples.
I have to extract a large amount of these coordinate sets from an array.
Thanks
Based on "How do you split a list into evenly sized chunks?", I found the following solution, which gives me the desired result.
def chunks(l, n, indices):
    return np.vstack([l[idx*n:idx*n+n] for idx in indices])
chunks(a,3,[0,2])
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])
This solution could probably be improved, ideally so that the stacking isn't needed.
If three rows are a sample, you can reshape your array to reflect that, use fancy indexing to retrieve your samples, then undo the shape change:
>>> a = a.reshape(-1, 3, 4)
>>> a[[0, 2]].reshape(-1, 4)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])
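If you'd rather not reshape (for example, when the row count is not a multiple of three), you can also compute the row indices of the requested samples directly and use a single fancy-indexing call. This is an alternative sketch, not taken from the answers above; `rows_per_sample = 3` reflects the question's layout.

```python
import numpy as np

a = np.arange(36).reshape(-1, 4)
rows_per_sample = 3
samples = np.array([0, 2])

# Sample k covers rows 3k, 3k+1, 3k+2: broadcast the start rows against 0..2.
rows = (samples[:, None] * rows_per_sample + np.arange(rows_per_sample)).ravel()
print(rows)           # [0 1 2 6 7 8]
print(a[rows][:, 0])  # [ 0  4  8 24 28 32]
```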