How can I delete multiple rows of a NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax
There are several ways to delete rows from a NumPy array.
The easiest one is to use basic indexing as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
Yet another option is to use np.delete as in the question. For selecting the rows/columns for deletion it can accept slice objects, int, or array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
In all the time I've been using NumPy, though, I have never needed np.delete, because in cases like this boolean indexing is much more convenient.
As an example, to select the rows whose first value is less than 12 (or to remove them and keep the rest), I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
For more information refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
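To connect the two approaches: deleting a fixed list of rows can also be written as a boolean mask. The sketch below assumes the same 7x5 array x as above; rows_to_drop and keep are illustrative names, not part of the original answer.

```python
import numpy as np

x = np.arange(35).reshape(7, 5)
rows_to_drop = [0, 2, 3]  # hypothetical list of row indices to remove

# Start with "keep everything", then switch off the unwanted rows.
keep = np.ones(len(x), dtype=bool)
keep[rows_to_drop] = False

# Boolean indexing returns a new array containing only the kept rows.
result = x[keep]
```

This should give the same rows as np.delete(x, rows_to_drop, axis=0), and the mask can be combined with other conditions using & and |.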
If you want to delete selected rows, you can write
np.delete(x, (1,2,5), axis = 0)
This deletes rows 1, 2 and 5. To delete a range of rows such as 0:5, try
np.delete(x, np.s_[0:5], axis = 0)
This deletes rows 0 to 4 from your array.
np.s_[0:5] --->> slice(0, 5, None)
Both are the same.
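The SyntaxError in the question arises because the start:stop notation is only valid inside square brackets. A short sketch of the equivalent spellings follows, using the 10x5 random array from the question; the names a, b and c are only for illustration. Note that np.delete returns a new array and leaves x unchanged.

```python
import numpy as np

x = np.random.rand(10, 5)

# np.s_ turns bracket notation into a plain slice object.
assert np.s_[0:5] == slice(0, 5, None)

# All three calls drop rows 0..4 and return a new array.
a = np.delete(x, slice(0, 5), axis=0)
b = np.delete(x, np.s_[0:5], axis=0)
c = np.delete(x, [0, 1, 2, 3, 4], axis=0)

print(a.shape)  # (5, 5)
print(x.shape)  # (10, 5): the original array is untouched
```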
Pass the row numbers as a list in the second argument.
General Syntax:
np.delete(array_name,[rownumber1,rownumber2,..,rownumber n],axis=0)
Example: delete first three rows in an array:
np.delete(array_name,[0,1,2],axis=0)
I am writing a scheduling program for group play. I have a schedule that works for 32-4-8 (32 players, 4 players per group, 8 rounds) with no duplicate partners or opponents. However, due to space constraints, only 28 players / 7 groups can play in each round. So I have to modify the schedule so that every player gets 7 games, 1 bye, and as few repeat partners or opponents as possible.
import numpy as np
sched = np.array([
[[ 3, 28, 17, 14],
[23, 30, 22, 1],
[ 2, 5, 27, 25],
[20, 8, 10, 16],
[ 0, 24, 26, 11],
[ 4, 21, 31, 7],
[19, 6, 29, 15],
[13, 18, 12, 9]],
[[20, 15, 24, 31],
[ 3, 21, 16, 13],
[ 6, 30, 4, 5],
[28, 8, 0, 7],
[25, 29, 17, 23],
[14, 9, 2, 22],
[27, 12, 1, 11],
[26, 10, 19, 18]],
[[10, 4, 23, 12],
[ 9, 28, 25, 31],
[ 5, 13, 22, 8],
[15, 7, 30, 2],
[16, 19, 11, 14],
[18, 17, 24, 6],
[21, 0, 27, 20],
[ 3, 26, 29, 1]],
[[18, 20, 28, 1],
[ 8, 9, 3, 4],
[12, 17, 31, 5],
[13, 30, 27, 14],
[19, 25, 24, 7],
[ 2, 6, 21, 26],
[10, 11, 29, 22],
[15, 23, 0, 16]],
[[22, 21, 25, 15],
[26, 12, 20, 14],
[28, 5, 24, 10],
[11, 6, 31, 13],
[23, 27, 7, 3],
[ 0, 19, 9, 1],
[18, 30, 8, 29],
[16, 17, 2, 4]],
[[29, 28, 12, 21],
[ 9, 16, 27, 6],
[19, 17, 20, 30],
[ 2, 8, 24, 23],
[ 5, 11, 18, 7],
[26, 13, 25, 4],
[ 1, 10, 15, 14],
[ 0, 22, 31, 3]],
[[31, 19, 27, 8],
[20, 5, 29, 2],
[24, 16, 22, 12],
[25, 3, 10, 6],
[17, 1, 7, 13],
[ 4, 0, 14, 18],
[23, 28, 26, 15],
[11, 21, 9, 30]],
[[31, 18, 1, 16],
[23, 14, 21, 5],
[ 8, 3, 11, 15],
[26, 17, 9, 10],
[30, 12, 25, 0],
[22, 20, 7, 6],
[27, 4, 29, 24],
[13, 19, 28, 2]]
])
To determine the best bye selections, I randomly selected one matchup from each round of play as the bye. I then assign a score to each bye selection that maximizes the number of players that have only 1 bye, to minimize the necessary alterations to the schedule.
def bincount2d(arr, bins=None):
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = np.arange(len(arr))
    for col in arr.T:
        count[indexing, col] += 1
    return count
# randomly sample one game per round as byes
# repeat n times (here 10000)
times = 10000
idx1 = np.tile(np.arange(sched.shape[0]), times)
idx2 = np.random.randint(sched.shape[1], size=sched.shape[0] * times)
population_byes = sched[idx1, idx2].reshape(times, sched.shape[1], sched.shape[2])
# get player counts for byes
# can reshape because interested in # of byes for entire schedule
# so no need to segment players by rounds for these counts
count_shape = (population_byes.shape[0], population_byes.shape[1] * population_byes.shape[2])
counts = bincount2d(population_byes.reshape(count_shape))
# fitness is the number of players with one bye
# the higher the value, the less we need to do to mess with the schedule
fitness = np.apply_along_axis(lambda x: (x == 1).sum(), 1, counts)
byes = population_byes[np.argmax(fitness)]
My questions are as follows:
(1) is there an efficient way to account for the values for which there are no counts (I know the indices should be from 0 to 31)? The bincount2d does not have values for the missing values in that range.
(2) Is there a vectorized/more efficient way than the np.apply_along_axis line to get the count of elements equal to 1?
(3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?
> (1) is there an efficient way to account for the values for which there are no counts (I know the indices should be from 0 to 31)? The bincount2d does not have values for the missing values in that range.
bincount2d is slow because of its memory access pattern. Indeed, a transposition is expensive, especially when it is done lazily as in NumPy. Moreover, the loop is not efficient either, because it works on a fairly large array with random memory accesses, which is bad for CPU caches. That being said, NumPy is not great for this kind of computation. One can use Numba to implement the operation efficiently:
import numba as nb
# You may need to tune the types on your machine
# Alternatively, you can use cache=True instead and let Numba find the types (which is slower the first time)
@nb.njit('int64[:,::1](int64[:,::1], optional(int64))')
def bincount2d_fast(arr, bins=None):
    if bins is None:
        nbins = np.max(arr) + 1
    else:
        nbins = np.int64(bins)
    count = np.zeros((arr.shape[0], nbins), dtype=np.int64)
    # Plain nested loops: each row of arr is read contiguously
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            count[i, arr[i, j]] += 1
    return count
The above code is 10 times faster than the original bincount2d function on my machine.
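On the part of question (1) about values with no counts: passing bins=32 explicitly gives every row a column for all 32 players, with zeros for those who never appear. If Numba is not an option, one pure-NumPy sketch is to shift each row into its own range of bin numbers and call np.bincount once. The function name bincount2d_offsets is hypothetical, and the sketch assumes non-negative integers smaller than bins; no timing claim is made for it.

```python
import numpy as np

def bincount2d_offsets(arr, bins):
    """Row-wise bincount of a 2D array of ints in the range [0, bins)."""
    n_rows = arr.shape[0]
    # Shift row i into [i * bins, (i + 1) * bins) so rows do not collide.
    shifted = arr + np.arange(n_rows)[:, None] * bins
    flat_counts = np.bincount(shifted.ravel(), minlength=n_rows * bins)
    return flat_counts.reshape(n_rows, bins)

demo = np.array([[0, 2, 2],
                 [3, 3, 3]])
counts = bincount2d_offsets(demo, bins=5)
print(counts)
```

For the schedule in the question this would be called as bincount2d_offsets(population_byes.reshape(count_shape), bins=32).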
> (2) Is there a vectorized/more efficient way than the np.apply_along_axis line to get the count of elements equal to 1?
Yes. You can do the operation on the whole 2D array and perform the reduction on a given axis. Here is an example:
```python
fitness = (counts == 1).sum(axis=1)
byes = population_byes[np.argmax(fitness)]
```
This is roughly 30 times faster on my machine.
> (3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?
A straightforward solution is to use Numba again with plain loops. Another solution is to save the values to swap in a temporary array and use indirect (index-array) access, adapted to your exact needs (like what @WholeBrain proposed). Something like:
```python
# all_x1, all_y1, etc. are 1D Numpy arrays containing coordinates of the items to swap
arr[all_x2, all_y2], arr[all_x1, all_y1] = arr[all_x1, all_y1], arr[all_x2, all_y2]
```
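For a 3D array the same idea needs three index arrays per side. The following is a minimal sketch on a small stand-in array rather than the real schedule; src and dst are hypothetical names, and it assumes all listed positions are distinct.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

# Each tuple holds (round, group, seat) coordinates of the items to swap.
src = (np.array([0, 1]), np.array([0, 2]), np.array([1, 3]))
dst = (np.array([1, 0]), np.array([1, 0]), np.array([2, 0]))

# The right-hand side is evaluated first, and advanced indexing returns
# copies, so both sets of values are saved before anything is overwritten.
arr[src], arr[dst] = arr[dst], arr[src]

print(arr[0, 0, 1], arr[1, 1, 2])  # 18 1
```

Because every value is only moved, the multiset of entries in arr is unchanged, which is a cheap sanity check after swapping players.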
I have a matrix B with shape (6, 9). For every row of B, I want to add 1 at certain column indices. A column index may appear more than once, so if an index appears m times I want to add m to that column. Please see the following example code:
import numpy as np
B = np.arange(6*9).reshape(6, 9)
idx = np.array([[0, 1, 2],
[6, 7, 0],
[2, 3, 4],
[4, 5, 6]], dtype=int)  # np.int was removed in NumPy 1.24; the builtin int works
B[:, idx] += 1 # the result is not what I want.
Furthermore, np.add.at and np.bincount also do not seem to work for the above case.
Any help would be appreciated. Thanks very much.
More Information:
In the idx array, indices 0, 2, 4 and 6 appear twice, so I want
B[:, [0, 2, 4, 6]] += 2. For the other indices, which appear once, just add 1. So the final B should be
B = np.array([[ 2, 2, 4, 4, 6, 6, 8, 8, 8],
[11, 11, 13, 13, 15, 15, 17, 17, 17],
[20, 20, 22, 22, 24, 24, 26, 26, 26],
[29, 29, 31, 31, 33, 33, 35, 35, 35],
[38, 38, 40, 40, 42, 42, 44, 44, 44],
[47, 47, 49, 49, 51, 51, 53, 53, 53]])
I think you can use np.add.at function to get what you want. Its syntax is
np.add.at('array', ('slice or array of indices for 1st dimension', 'slice or array of indices for 2nd dimension'), 'what to add')
So, in your case, if you want to add 1 for every row for every column, specified in idx, you should use
>>> a = np.arange(6 * 9).reshape(6, 9)
>>> np.add.at(a, (np.s_[:], idx), 1)
np.s_[:] is a slice object that tells np.add.at to apply the addition to every row.
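np.bincount can also be made to work here, because the same increments apply to every row. The sketch below is an alternative to np.add.at rather than the answer's method; per_column is an illustrative name.

```python
import numpy as np

B = np.arange(6 * 9).reshape(6, 9)
idx = np.array([[0, 1, 2],
                [6, 7, 0],
                [2, 3, 4],
                [4, 5, 6]])

# How many times each column index occurs in idx (length 9, zeros included).
per_column = np.bincount(idx.ravel(), minlength=B.shape[1])

# Broadcasting adds the same per-column increments to every row.
B += per_column

print(B[0])  # [2 2 4 4 6 6 8 8 8]
```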
I know there's an easy way using transpose to rotate a square matrix 90 degrees, but I'm writing a solution as if I don't know that one (in other words, I want to do the swapping). The general idea below is to swap layer by layer. The offset represents which layer (outward to inward) is being swapped.
Here's the algorithm (note that it is wrapped in a class because that's a Leetcode thing):
class Solution:
    def rotate(self, matrix):
        for start in range(len(matrix)//2):
            end = len(matrix) - start - 1
            # swap all 4 coordinates:
            for offset in range(start, end):
                # swap top_left over top_right
                temp, matrix[start+offset][end] = matrix[start+offset][end], matrix[start][start+offset]
                # swap top_right -> bottom_right
                temp, matrix[end][end-offset] = matrix[end][end-offset], temp
                # swap bottom_right -> bottom_left
                temp, matrix[end-offset][start] = matrix[end-offset][start], temp
                # swap bottom_left -> top_left
                matrix[start][start+offset] = temp
This works for some hand tests with small matrices, as well as the smaller input test cases in the Leetcode submission. However, it fails on the following input:
[[ 2, 29, 20, 26, 16, 28],
[12, 27, 9, 25, 13, 21],
[32, 33, 32, 2, 28, 14],
[13, 14, 32, 27, 22, 26],
[33, 1, 20, 7, 21, 7],
[ 4, 24, 1, 6, 32, 34]]
Expected output:
[[ 4, 33, 13, 32, 12, 2],
[24, 1, 14, 33, 27, 29],
[ 1, 20, 32, 32, 9, 20],
[ 6, 7, 27, 2, 25, 26],
[32, 21, 22, 28, 13, 16],
[34, 7, 26, 14, 21, 28]]
My output:
[[ 4, 33, 13, 32, 12, 2],
[24, 1, 7, 33, 27, 29],
[ 1, 20, 32, 2, 14, 20],
[ 6, 28, 32, 27, 25, 26],
[32, 21, 22, 9, 13, 16],
[34, 7, 26, 14, 21, 28]]
This matrix is just big enough that it becomes tedious to walk through the algorithm by hand, as I did for the smaller input cases, to debug. Where is the bug in my implementation?
Your offset is off, try
for offset in range(0, end-start):
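The reason is that start is already added to offset inside the loop body, so offset has to count from 0 within each layer. A sketch of the question's code with only that range changed, written as a plain function instead of the Leetcode class for brevity:

```python
def rotate(matrix):
    """Rotate a square matrix 90 degrees clockwise, in place."""
    n = len(matrix)
    for start in range(n // 2):
        end = n - start - 1
        # offset is relative to the current layer, so it starts at 0
        for offset in range(end - start):
            # top -> right (remember the old right value in temp)
            temp, matrix[start + offset][end] = matrix[start + offset][end], matrix[start][start + offset]
            # right -> bottom
            temp, matrix[end][end - offset] = matrix[end][end - offset], temp
            # bottom -> left
            temp, matrix[end - offset][start] = matrix[end - offset][start], temp
            # left -> top
            matrix[start][start + offset] = temp

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
rotate(grid)
print(grid)  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```

For the outermost layer start is 0, so both ranges coincide, which is why small inputs passed.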
Your (random) test data is hard to follow and debug. It would be better to use a readable data set.
The implementation also seems to make excessive swaps.
It is enough to remember one cell value, shift the other cells cyclically, and then write back the remembered value.
Here is a simple implementation of the rotation, with mat filled with sequential numbers:
n = 5
nm = n - 1
mat = []
for i in range(n):
    a = [x for x in range(n*i, n*i+n)]
    mat.append(a)
for i in range(n):
    print(mat[i])
for row in range((n + 1)//2):
    for col in range(n//2):
        t = mat[row][col]
        mat[row][col] = mat[nm-col][row]
        mat[nm-col][row] = mat[nm-row][nm-col]
        mat[nm-row][nm-col] = mat[col][nm-row]
        mat[col][nm-row] = t
print("")
for i in range(n):
    print(mat[i])
[0, 1, 2, 3, 4]
[5, 6, 7, 8, 9]
[10, 11, 12, 13, 14]
[15, 16, 17, 18, 19]
[20, 21, 22, 23, 24]
[20, 15, 10, 5, 0]
[21, 16, 11, 6, 1]
[22, 17, 12, 7, 2]
[23, 18, 13, 8, 3]
[24, 19, 14, 9, 4]
To make that rotation, the value at row i, column j of the input matrix should become the value at row j, column i of the output matrix (a transpose), and in the next step the order of the columns should be reversed. If I were you, I would define an empty matrix (say TRANSPOSE, with the same size as the input matrix) and two nested for loops (i in range(len(matrix)) and j in range(len(matrix))). Inside the second loop I would write
TRANSPOSE[i][j] = matrix[j][i]
Now I have the matrix you want, but with its columns in reverse order. To reverse the columns, I would write another pair of for loops with the same variables and ranges. This needs either one more empty matrix of the same size or the input matrix itself; let's reuse the input matrix here:
matrix[i][j] = TRANSPOSE[i][len(matrix) - j - 1]
and return matrix.
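Put together, the two steps described above might look like the following sketch. The function name rotate_via_transpose is hypothetical, and unlike the swapping approach it uses an extra n x n list.

```python
def rotate_via_transpose(matrix):
    """Rotate a square matrix 90 degrees clockwise: transpose, then reverse columns."""
    n = len(matrix)
    # Step 1: transpose[i][j] = matrix[j][i]
    transpose = [[matrix[j][i] for j in range(n)] for i in range(n)]
    # Step 2: write back with the column order reversed
    for i in range(n):
        for j in range(n):
            matrix[i][j] = transpose[i][n - j - 1]
    return matrix

print(rotate_via_transpose([[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]]))
# [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```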
If I slice a 2d array with a set of coordinates
>>> test = np.reshape(np.arange(40),(5,8))
>>> coords = np.array((1,3,4))
>>> slice = test[:, coords]
then my slice has the shape that I would expect
>>> slice.shape
(5, 3)
But if I repeat this with a 3d array
>>> test = np.reshape(np.arange(80),(2,5,8))
>>> slice = test[0, :, coords]
then the shape is now
>>> slice.shape
(3, 5)
Is there a reason that these are different? Separating the indices returns the shape that I would expect
>>> slice = test[0][:][coords]
>>> slice.shape
(5, 3)
Why would these views have different shapes?
slice = test[0, :, coords]
is a single indexing operation, in effect saying "take the 0th element of the first coordinate, all of the second coordinate, and [1,3,4] of the third coordinate". Or more precisely, take coordinates (0,whatever,1) and make it our first row, (0,whatever,3) and make it our second row, and (0,whatever,4) and make it our third row. There are 5 whatevers, so you end up with (3,5).
The second example you gave is like this:
slice = test[0][:][coords]
In this case you first take test[0], a (5,8) array, and then select its elements at indices 1, 3 and 4 along the first axis, which are rows 1, 3 and 4. The [:] in the middle changes nothing, so the columns are never indexed and the result has shape (3,8).
Edit to discuss 2D case:
In the 2D case, where:
>>> test = np.reshape(np.arange(40),(5,8))
>>> test
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39]])
the behaviour is similar.
Case 1:
>>> test[:,[1,3,4]]
array([[ 1, 3, 4],
[ 9, 11, 12],
[17, 19, 20],
[25, 27, 28],
[33, 35, 36]])
is simply selecting columns 1,3, and 4.
Case 2:
>>> test[:][[1,3,4]]
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39]])
is taking the elements at indices 1, 3 and 4 of the array, which are rows.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
The docs talk about the complexity of combining advanced and basic indexing.
test[0, :, coords]
The dimension from indexing with coords comes first, with the [0,:] part after, producing the (3,5) shape.
The easiest way to understand the situation may be to think in terms of the result shape. There are two parts to the indexing operation, the subspace defined by the basic indexing (excluding integers) and the subspace from the advanced indexing part. [in the case where]
The advanced indexes are separated by a slice, ellipsis or newaxis. For example x[arr1, :, arr2].
.... the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that.
I recall discussing this kind of indexing in a previous SO question, but it would take some digging to find it.
https://stackoverflow.com/a/28353446/901925 Why does the order of dimensions change with boolean indexing?
How does numpy order array slice indices?
The [:] in test[0][:][coords] does nothing. test[0][:,coords] produces the desired (5,3) result.
In [145]: test[0,:,[1,2,3]] # (3,5) array
Out[145]:
array([[ 1, 9, 17, 25, 33], # test[0,:,1]
[ 2, 10, 18, 26, 34],
[ 3, 11, 19, 27, 35]])
In [146]: test[0][:,[1,2,3]] # same values but (5,3)
Out[146]:
array([[ 1, 2, 3],
[ 9, 10, 11],
[17, 18, 19],
[25, 26, 27],
[33, 34, 35]])
In [147]: test[0][:][[1,2,3]] # [:] does nothing; select 3 from 2nd axis
Out[147]:
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31]])
In [148]: test[0][[1,2,3]] # same as test[0,[1,2,3],:]
Out[148]:
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31]])
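The shapes discussed in this thread can be checked side by side. The sketch below uses the question's test array and coords; the names a, b, c and d are only for illustration.

```python
import numpy as np

test = np.arange(80).reshape(2, 5, 8)
coords = np.array([1, 3, 4])

a = test[0, :, coords]    # integer and array separated by a slice
b = test[0][:, coords]    # index the (5, 8) sub-array on its own
c = test[0:1, :, coords]  # only one advanced index, so it stays in place
d = test[0][:][coords]    # [:] is a no-op, so coords picks rows

print(a.shape)  # (3, 5)
print(b.shape)  # (5, 3)
print(c.shape)  # (1, 5, 3)
print(d.shape)  # (3, 8)
```

a and b hold the same values in transposed layout, so a.T is another way to get the (5, 3) arrangement.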