numpy indexing: fixed length parts of each row with varying starting column

I have a 2d array like z and a 1d array denoting the "start column position" like starts. In addition I have a fixed row_length = 2
z = np.arange(35).reshape(5, -1)
# --> array([[ 0,  1,  2,  3,  4,  5,  6],
#            [ 7,  8,  9, 10, 11, 12, 13],
#            [14, 15, 16, 17, 18, 19, 20],
#            [21, 22, 23, 24, 25, 26, 27],
#            [28, 29, 30, 31, 32, 33, 34]])
starts = np.array([1,5,3,3,2])
What I want is the outcome of this slow for-loop, just quicker if possible.
result = np.zeros(
    (z.shape[0], row_length),
    dtype=z.dtype
)
for i in range(z.shape[0]):
    s = starts[i]
    result[i] = z[i, s:s+row_length]
So result in this example should look like this in the end:
array([[ 1,  2],
       [12, 13],
       [17, 18],
       [24, 25],
       [30, 31]])
I can't seem to find a way using either fancy indexing or np.take to deliver this result.

One approach is to build the column indices by broadcasted addition of starts and np.arange(row_length), and then use NumPy's advanced indexing to extract all of those elements from the data array, like so -
idx = starts[:,None] + np.arange(row_length)
out = z[np.arange(idx.shape[0])[:,None], idx]
Sample run -
In [197]: z
Out[197]:
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34]])
In [198]: starts = np.array([1,5,3,3,2])
In [199]: row_length = 2
In [200]: idx = starts[:,None] + np.arange(row_length)
In [202]: z[np.arange(idx.shape[0])[:,None], idx]
Out[202]:
array([[ 1,  2],
       [12, 13],
       [17, 18],
       [24, 25],
       [30, 31]])
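
As a hedged variation on the same idea, the explicit row-index array can be avoided with np.take_along_axis, which pairs each row of the index array with the matching row of z. The helper name row_windows and its bounds check are assumptions added for illustration; they are not part of the answer above.

```python
import numpy as np

def row_windows(z, starts, row_length):
    """Return z[i, starts[i]:starts[i] + row_length] for every row i."""
    starts = np.asarray(starts)
    # Guard against windows that would run past the first or last column.
    if starts.min() < 0 or starts.max() + row_length > z.shape[1]:
        raise ValueError("window exceeds array bounds")
    # Broadcast (n_rows, 1) + (row_length,) -> (n_rows, row_length) column indices.
    idx = starts[:, None] + np.arange(row_length)
    # Gather column idx[i, j] from row i of z.
    return np.take_along_axis(z, idx, axis=1)

z = np.arange(35).reshape(5, -1)
starts = np.array([1, 5, 3, 3, 2])
out = row_windows(z, starts, 2)
print(out)
```

For the arrays from the question this gives the same five rows as the for-loop, without a Python-level loop.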

Related

np.delete() how to delete multiple rows in python [duplicate]

How can I delete multiple rows of NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax

There are several ways to delete rows from a NumPy array.
The easiest one is to use basic indexing as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
Yet another option is to use np.delete as in the question. For selecting the rows/columns for deletion it can accept slice objects, int, or array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
But in all the time that I've been using NumPy I have never needed np.delete, because in such cases it's much more convenient to use boolean indexing.
As an example, to select the rows that start with a value less than 12, or to remove them and keep the rest, I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
For more information refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
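
To make the link between the two styles concrete, here is a small sketch (variable names are illustrative) checking that dropping rows with an inverted boolean mask gives the same result as np.delete called with the integer positions of the masked rows:

```python
import numpy as np

x = np.arange(35).reshape(7, 5)

# Rows whose first element is below 12 (rows 0, 1 and 2).
mask = x[:, 0] < 12

# Dropping those rows with boolean indexing ...
kept_by_mask = x[~mask]

# ... matches np.delete given the integer positions of the masked rows.
kept_by_delete = np.delete(x, np.flatnonzero(mask), axis=0)

print(np.array_equal(kept_by_mask, kept_by_delete))  # True
```

Both calls return new arrays; x itself is left unchanged.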

If you want to delete selected rows, you can write:
np.delete(x, (1,2,5), axis = 0)
This returns a copy of x without rows 1, 2 and 5. To delete a range of rows such as 0:5, try this:
np.delete(x, np.s_[0:5], axis = 0)
This deletes rows 0 to 4 from your array.
np.s_[0:5] --->> slice(0, 5, None)
Both are the same.
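
A short check of that equivalence, as a sketch using the 7x5 example array from the earlier answer (the variable names are assumptions):

```python
import numpy as np

x = np.arange(35).reshape(7, 5)

# np.s_ builds the slice object that the [0:5] syntax would create.
same_slice = np.s_[0:5] == slice(0, 5, None)
print(same_slice)  # True

# Three spellings of "drop the first five rows".
a = np.delete(x, np.s_[0:5], axis=0)
b = np.delete(x, slice(0, 5), axis=0)
c = np.delete(x, [0, 1, 2, 3, 4], axis=0)
print(np.array_equal(a, b) and np.array_equal(b, c))  # True
```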

Pass the row numbers to delete as a list.
General syntax:
np.delete(array_name, [rownumber1, rownumber2, ..., rownumberN], axis=0)
Example: delete the first three rows of an array:
np.delete(array_name, [0, 1, 2], axis=0)

numpy - efficient value counts in 2D and 3D arrays

I am writing a scheduling program for group play. I have a schedule that works for 32-4-8 (32 players, 4 players per group, 8 rounds) with no duplicate partners or opponents. However, due to space constraints, only 28 players / 7 groups can play in each round. So I have to modify the schedule so that every player gets 7 games, 1 bye, and as few repeat partners or opponents as possible.
import numpy as np
sched = np.array([
[[ 3, 28, 17, 14],
[23, 30, 22, 1],
[ 2, 5, 27, 25],
[20, 8, 10, 16],
[ 0, 24, 26, 11],
[ 4, 21, 31, 7],
[19, 6, 29, 15],
[13, 18, 12, 9]],
[[20, 15, 24, 31],
[ 3, 21, 16, 13],
[ 6, 30, 4, 5],
[28, 8, 0, 7],
[25, 29, 17, 23],
[14, 9, 2, 22],
[27, 12, 1, 11],
[26, 10, 19, 18]],
[[10, 4, 23, 12],
[ 9, 28, 25, 31],
[ 5, 13, 22, 8],
[15, 7, 30, 2],
[16, 19, 11, 14],
[18, 17, 24, 6],
[21, 0, 27, 20],
[ 3, 26, 29, 1]],
[[18, 20, 28, 1],
[ 8, 9, 3, 4],
[12, 17, 31, 5],
[13, 30, 27, 14],
[19, 25, 24, 7],
[ 2, 6, 21, 26],
[10, 11, 29, 22],
[15, 23, 0, 16]],
[[22, 21, 25, 15],
[26, 12, 20, 14],
[28, 5, 24, 10],
[11, 6, 31, 13],
[23, 27, 7, 3],
[ 0, 19, 9, 1],
[18, 30, 8, 29],
[16, 17, 2, 4]],
[[29, 28, 12, 21],
[ 9, 16, 27, 6],
[19, 17, 20, 30],
[ 2, 8, 24, 23],
[ 5, 11, 18, 7],
[26, 13, 25, 4],
[ 1, 10, 15, 14],
[ 0, 22, 31, 3]],
[[31, 19, 27, 8],
[20, 5, 29, 2],
[24, 16, 22, 12],
[25, 3, 10, 6],
[17, 1, 7, 13],
[ 4, 0, 14, 18],
[23, 28, 26, 15],
[11, 21, 9, 30]],
[[31, 18, 1, 16],
[23, 14, 21, 5],
[ 8, 3, 11, 15],
[26, 17, 9, 10],
[30, 12, 25, 0],
[22, 20, 7, 6],
[27, 4, 29, 24],
[13, 19, 28, 2]]
])
To determine the best bye selections, I randomly selected one matchup from each round of play as the bye. I then assign a score to each bye selection that maximizes the number of players that have only 1 bye, to minimize the necessary alterations to the schedule.
def bincount2d(arr, bins=None):
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = np.arange(len(arr))
    for col in arr.T:
        count[indexing, col] += 1
    return count
# randomly sample one game per round as byes
# repeat n times (here 10000)
times = 10000
idx1 = np.tile(np.arange(sched.shape[0]), times)
idx2 = np.random.randint(sched.shape[1], size=sched.shape[0] * times)
# one sampled group per round: shape (times, rounds, players per group)
population_byes = sched[idx1, idx2].reshape(times, sched.shape[0], sched.shape[2])
# get player counts for byes
# can reshape because interested in # of byes for entire schedule
# so no need to segment players by rounds for these counts
count_shape = (population_byes.shape[0], population_byes.shape[1] * population_byes.shape[2])
counts = bincount2d(population_byes.reshape(count_shape))
# fitness is the number of players with one bye
# the higher the value, the less we need to do to mess with the schedule
fitness = np.apply_along_axis(lambda x: (x == 1).sum(), 1, counts)
byes = population_byes[np.argmax(fitness)]
My questions are as follows:
(1) is there an efficient way to account for the values for which there are no counts (I know the indices should be from 0 to 31)? The bincount2d does not have values for the missing values in that range.
(2) Is there a vectorized/more efficient way than the np.apply_along_axis line to get the count of elements equal to 1?
(3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?

> (1) is there an efficient way to account for the values for which there are no counts (I know the indices should be from 0 to 31)? The bincount2d does not have values for the missing values in that range.
bincount2d is inefficient because of its memory access pattern. A transposition is an expensive operation, especially when it is done lazily as NumPy does, and the loop works on a fairly large array with random memory accesses, which is bad for CPU caches. NumPy is not well suited to this kind of computation, but Numba can implement the operation efficiently:
import numba as nb
import numpy as np
# You may need to tune the types on your machine
# Alternatively, you can use cache=True instead and let Numba find the types (which is slower the first time)
@nb.njit('int64[:,::1](int64[:,::1], optional(int64))')
def bincount2d_fast(arr, bins=None):
    if bins is None:
        nbins = np.max(arr) + 1
    else:
        nbins = np.int64(bins)
    count = np.zeros((arr.shape[0], nbins), dtype=np.int64)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            count[i, arr[i, j]] += 1
    return count
The above code is 10 times faster than the original bincount2d function on my machine.
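
If Numba is not an option, a pure-NumPy alternative (a different technique from the one above, sketched here under assumptions) is to shift each row into its own range of bins and call np.bincount once with minlength. Passing bins=32 then yields a zero count for every player who never appears. The function name bincount2d_offsets is hypothetical, and the sketch assumes all values are non-negative integers smaller than bins.

```python
import numpy as np

def bincount2d_offsets(arr, bins=None):
    """Row-wise value counts via a single np.bincount call."""
    arr = np.asarray(arr)
    if bins is None:
        bins = int(arr.max()) + 1
    n_rows = arr.shape[0]
    # Shift row i by i * bins so every (row, value) pair maps to a unique flat bin.
    offsets = np.arange(n_rows)[:, None] * bins
    flat = (arr + offsets).ravel()
    # minlength guarantees a slot for every value, including those never seen.
    counts = np.bincount(flat, minlength=n_rows * bins)
    return counts.reshape(n_rows, bins)

rows = np.array([[0, 1, 1, 3],
                 [2, 2, 2, 0]])
counts = bincount2d_offsets(rows, bins=5)
print(counts)
# [[1 2 0 1 0]
#  [1 0 3 0 0]]
```

No timing is claimed for this variant; it is worth benchmarking against the Numba version on the real data.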
> (2) Is there a vectorized/more efficient way than the np.apply_along_axis line to get the count of elements equal to 1?
Yes. You can do the operation on the whole 2D array and perform the reduction on a given axis. Here is an example:
fitness = (counts == 1).sum(axis=1)
byes = population_byes[np.argmax(fitness)]
This is roughly 30 times faster on my machine.
> (3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?
A straightforward solution is to use Numba again with plain loops. Another solution could be to save the values to swap in a temporary array and use indirect (fancy) indexing, depending on your exact needs (like what @WholeBrain proposed). Something like:
```python
# all_x1, all_y1, etc. are 1D Numpy arrays containing coordinates of the items to swap
arr[all_x2, all_y2], arr[all_x1, all_y1] = arr[all_x1, all_y1], arr[all_x2, all_y2]
```
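
For a 3D schedule the same pattern works with one index array per axis. The following is a minimal sketch on a toy array; the shape and the coordinate names (round, group, seat) are assumptions made for illustration:

```python
import numpy as np

# Toy schedule: 2 rounds, 2 groups, 2 players per group.
sched = np.arange(8).reshape(2, 2, 2)

# Coordinates (round, group, seat) of the two sets of entries to exchange.
r1, g1, s1 = np.array([0, 1]), np.array([0, 0]), np.array([0, 1])
r2, g2, s2 = np.array([0, 1]), np.array([1, 1]), np.array([1, 0])

# Fancy indexing on the right-hand side returns copies, so both reads
# finish before either write happens.
sched[r1, g1, s1], sched[r2, g2, s2] = sched[r2, g2, s2], sched[r1, g1, s1]
print(sched)
# [[[3 1]
#   [2 0]]
#  [[4 6]
#   [5 7]]]
```

Here player 0 trades places with player 3 in the first round, and player 5 with player 6 in the second.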

Reshape 3D array to 2D array Python

If I have a 3D array of ([4,3,3]) like this:
[[0,1,2] [[9,10,11 ] [[18,19,20] [[27,28,29]
[3,4,5] [12,13,14] [21,22,23] [30,31,32]
[6,7,8]] , [15,16,17]] , [24,25,26]] , [33,34,35]]
How would I convert it to a 2D array of ([6,6]) like this, so that the first half of the arrays is in the top half of the result (160x160 for my real data) and the second half is in the bottom half:
[[0,1,2,9,10,11]
[3,4,5,12,13,14]
[6,7,8,15,16,17]
[18,19,20,27,28,29]
[21,22,23,30,31,32]
[24,25,26,33,34,35]]
My array creation:
qDCTReversed = np.zeros((400,8,8), dtype=int)
And I need a (160,160) array.

A very fast one line solution using no for-loops is this:
# initialization
qDCTReversed = np.arange(4*3*3).reshape((4,3,3))
# calculation
qDCTReversed = qDCTReversed.reshape((2,2,3,3)).transpose((0,2,1,3)).reshape((6,6))
or for the (400,8,8) array:
qDCTReversed.reshape((20,20,8,8)).transpose((0,2,1,3)).reshape((160,160))
Speed comparison:
Mstaino's answer: 0.393 ms
yatu's answer: 0.138 ms
This answer: 0.016 ms
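
The reshape/transpose/reshape pattern generalizes to any grid of equally sized blocks. Below is a sketch wrapping it in a helper; the function name blocks_to_grid and its arguments are hypothetical, and the comments track what each axis means at every step:

```python
import numpy as np

def blocks_to_grid(blocks, grid_rows, grid_cols):
    """Tile (grid_rows * grid_cols, h, w) blocks row-major into a 2D array."""
    n, h, w = blocks.shape
    if n != grid_rows * grid_cols:
        raise ValueError("number of blocks does not match the grid")
    return (blocks.reshape(grid_rows, grid_cols, h, w)  # (block row, block col, h, w)
                  .transpose(0, 2, 1, 3)                # (block row, h, block col, w)
                  .reshape(grid_rows * h, grid_cols * w))

small = blocks_to_grid(np.arange(36).reshape(4, 3, 3), 2, 2)
print(small)

big = blocks_to_grid(np.zeros((400, 8, 8), dtype=int), 20, 20)
print(big.shape)  # (160, 160)
```

The transpose puts the within-block row axis next to the block-row axis, so the final reshape can merge them into the rows of the output.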

You can do this by looping over the list as such:
a = [[[ 0, 1, 2], [ 9,10,11]],
[[ 3, 4, 5], [12,13,14]],
[[ 6, 7, 8], [15,16,17]],
[[18,19,20], [27,28,29]],
[[21,22,23], [30,31,32]],
[[24,25,26], [33,34,35]]]
b = [[i for j in k for i in j ] for k in a]
print(b)
outputs:
[ 0, 1, 2, 9, 10, 11]
[ 3, 4, 5, 12, 13, 14]
[ 6, 7, 8, 15, 16, 17]
[18, 19, 20, 27, 28, 29]
[21, 22, 23, 30, 31, 32]
[24, 25, 26, 33, 34, 35]

The reshape you ask for can be done with:
x = np.arange(36).reshape((4,3,3))
np.vstack([np.hstack(x[2*i:2+2*i]) for i in range(x.shape[0]//2)])
>>array([[ 0,  1,  2,  9, 10, 11],
         [ 3,  4,  5, 12, 13, 14],
         [ 6,  7,  8, 15, 16, 17],
         [18, 19, 20, 27, 28, 29],
         [21, 22, 23, 30, 31, 32],
         [24, 25, 26, 33, 34, 35]])

Slicing multiple rows by single index

I have the following slicing problem in numpy.
a = np.arange(36).reshape(-1,4)
a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])
In my problem always three rows represent one sample, in my case coordinates.
I want to access this matrix so that indexing with a[0:2] returns the following:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])
These are the first two coordinate samples.
I have to extract a large amount of these coordinate sets from an array.
Thanks
Based on How do you split a list into evenly sized chunks?, I found the following solution, which gives me the desired result.
def chunks(l, n, indices):
    return np.vstack([l[idx*n:idx*n+n] for idx in indices])
chunks(a,3,[0,2])
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])
Probably this solution could be improved and somebody won't need the stacking.

If three rows are a sample, you can reshape your array to reflect that, use fancy indexing to retrieve your samples, then undo the shape change:
>>> a = a.reshape(-1, 3, 4)
>>> a[[0, 2]].reshape(-1, 4)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])
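
The same idea can be packaged so that the caller keeps working with the 2D array. This is a sketch with a hypothetical helper name take_samples; it accepts either a slice such as 0:2 or a list of sample indices:

```python
import numpy as np

def take_samples(a, rows_per_sample, indices):
    """Return the rows belonging to the requested samples, stacked in 2D."""
    n_cols = a.shape[1]
    # Regroup the rows as (n_samples, rows_per_sample, n_cols).
    samples = a.reshape(-1, rows_per_sample, n_cols)
    # Indexing the first axis picks whole samples; the final reshape
    # flattens them back to one row per line.
    return samples[indices].reshape(-1, n_cols)

a = np.arange(36).reshape(-1, 4)
first_two = take_samples(a, 3, slice(0, 2))  # samples 0 and 1 -> rows 0..5
picked = take_samples(a, 3, [0, 2])          # samples 0 and 2 -> rows 0..2 and 6..8
print(first_two)
print(picked)
```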
