Convert numpy array to a range - python

I have a sorted list of numbers:
[ 1, 2, 3, 4, 6, 7, 8, 9, 14, 15, 16, 17, 18, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 45]
I want to know if numpy has a built-in feature to get something like this out of it:
[ [1, 4], [6, 9], [14, 18], [25,36], [38], [45] ]
Also, it would be great if it could ignore small holes, so that 2-3 missing numbers in between would still fall into the same range.
I am basically listing frame numbers of a video for processing -- so rather than list out all the frame numbers I would just put in ranges, and it's okay if 3-4 frames in between are missing.
I just want to know if there is something that already implements this logic, since it seems like a common thing people would want to do -- otherwise, I'll implement it myself.
Edit:
found a very close question that's answered already:
converting a list of integers into range in python

Since it's tagged as numpy, here is a numpy solution (sort of). There is no native numpy function but you could use diff + where + split + list comprehension:
>>> [[ary[0], ary[-1]] if len(ary)>1 else [ary[0]] for ary in np.split(arr, np.where(np.diff(arr) != 1)[0] + 1)]
[[1, 4], [6, 9], [14, 18], [25, 36], [38], [45]]
If the array is large, it's more efficient to use a loop rather than np.split, so you could use the function below which produces the same outcome as above:
def array_to_range(arr):
    idx = np.r_[0, np.where(np.diff(arr) != 1)[0] + 1, len(arr)]
    out = []
    for i, j in zip(idx, idx[1:]):
        ary = arr[i:j]
        if len(ary) > 1:
            out.append([ary[0], ary[-1]])
        else:
            out.append([ary[0]])
    return out
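The question also asks about ignoring small holes (a few missing frames). Neither snippet above does that, but a minimal sketch is to split only where the jump exceeds a threshold; the max_gap parameter and the function name here are my own additions, not something built into numpy:
import numpy as np

def array_to_range_with_gaps(arr, max_gap=1):
    # Split only where consecutive values differ by more than max_gap,
    # so holes of up to max_gap - 1 missing numbers stay inside one range.
    arr = np.asarray(arr)
    idx = np.r_[0, np.where(np.diff(arr) > max_gap)[0] + 1, len(arr)]
    return [[arr[i], arr[j - 1]] if j - i > 1 else [arr[i]]
            for i, j in zip(idx, idx[1:])]

# With max_gap=1 this matches array_to_range above;
# array_to_range_with_gaps([1, 2, 3, 6, 7], max_gap=3) gives [[1, 7]].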

numpy is not designed to work with arrays that differ in size. Internally, np.split uses list.append to collect the sub-arrays produced by slicing, which really slows things down, so it is recommended only when you have no other option.
Algorithm of np.split
To see how you could improve on this, let's take a look at the general pattern of finding the np.split-like indices at which the array is split:
arr = [1, 2, 3, 4, 6, 7, 8, 9, 14, 15, 16, 17, 18, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 45]
div_points = np.flatnonzero(np.diff(arr)!=1) + 1
start_points = np.r_[0, div_points]
end_points = np.r_[div_points, len(arr)]
So now you've got the start and end arguments of the slices that split the array into multiple sub-arrays:
>>> np.transpose([start_points, end_points])
array([[ 0,  4],
       [ 4,  8],
       [ 8, 13],
       [13, 25],
       [25, 26],
       [26, 27]])
And there is a mechanism that np.split uses internally to split an array:
container = []
for start, end in np.transpose([start_points, end_points]):
    container.append(arr[start:end])

>>> container
[[1, 2, 3, 4],
 [6, 7, 8, 9],
 [14, 15, 16, 17, 18],
 [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36],
 [38],
 [45]]
Variation of algorithm
To arrive at an output close to what you expect, you could modify the algorithm of np.split like so:
arr = np.asarray(arr)  # fancy indexing below needs an ndarray, not a plain list
div_points = np.flatnonzero(np.diff(arr) != 1) + 1
start_points = np.r_[0, div_points]
end_points = np.r_[div_points, len(arr)]
out = np.transpose([arr[start_points], arr[end_points - 1]])
>>> out
array([[ 1,  4],
       [ 6,  9],
       [14, 18],
       [25, 36],
       [38, 38],
       [45, 45]])
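If you need exactly the ragged output from the question (a single value for isolated numbers such as 38 and 45), one small follow-up step is enough; this is just one way to post-process out:
# Collapse [v, v] pairs into [v] to match the ragged output in the question.
result = [[a, b] if a != b else [a] for a, b in out.tolist()]
# result == [[1, 4], [6, 9], [14, 18], [25, 36], [38], [45]]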

Related

np.delete() how to delete multiple rows in python [duplicate]

How can I delete multiple rows of a NumPy array? For example, I want to delete the first five rows of x. I'm trying the following code:
import numpy as np
x = np.random.rand(10, 5)
np.delete(x, (0:5), axis=0)
but it doesn't work:
np.delete(x, (0:5), axis=0)
^
SyntaxError: invalid syntax
There are several ways to delete rows from a NumPy array.
The easiest one is to use basic indexing, as with standard Python lists:
>>> import numpy as np
>>> x = np.arange(35).reshape(7, 5)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> result = x[5:]
>>> result
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
You can select not only rows but columns as well:
>>> x[:2, 1:4]
array([[1, 2, 3],
[6, 7, 8]])
Another way is to use "fancy indexing" (indexing arrays using arrays):
>>> x[[0, 2, 6]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
You can achieve the same using np.take:
>>> np.take(x, [0, 2, 6], axis=0)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34]])
Yet another option is to use np.delete as in the question. For selecting the rows/columns to delete, it accepts slice objects, an int, or an array of ints:
>>> np.delete(x, slice(0, 5), axis=0)
array([[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> np.delete(x, [0, 2, 3], axis=0)
array([[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
But in all the time I've been using NumPy I have never needed np.delete, because in cases like this it's much more convenient to use boolean indexing.
As an example, if I wanted to remove/select the rows that start with a value greater than 12, I would do:
>>> mask_array = x[:, 0] < 12 # comparing values of the first column
>>> mask_array
array([ True, True, True, False, False, False, False])
>>> x[mask_array]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[~mask_array] # ~ is an element-wise inversion
array([[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
For more information refer to the documentation on indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
If you want to delete selected rows you can write:
np.delete(x, (1, 2, 5), axis=0)
This will delete rows 1, 2 and 5. If instead you want to delete a range like 0:5, try:
np.delete(x, np.s_[0:5], axis=0)
which deletes rows 0 to 4 from your array.
np.s_[0:5] is just slice(0, 5, None) -- the two are the same.
Pass the row numbers to delete as a list argument.
General syntax:
np.delete(array_name, [rownumber1, rownumber2, ..., rownumberN], axis=0)
Example: delete the first three rows of an array:
np.delete(array_name, [0, 1, 2], axis=0)
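For a quick sanity check, here is the same call on a small made-up 5x4 array (not from the answer above):
>>> import numpy as np
>>> x = np.arange(20).reshape(5, 4)
>>> np.delete(x, [0, 1, 2], axis=0)   # drop the first three rows
array([[12, 13, 14, 15],
       [16, 17, 18, 19]])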

numpy - efficient value counts in 2D and 3D arrays

I am writing a scheduling program for group play. I have a schedule that works for 32-4-8 (32 players, 4 players per group, 8 rounds) with no duplicate partners or opponents. However, due to space constraints, only 28 players / 7 groups can play in each round. So I have to modify the schedule so that every player gets 7 games, 1 bye, and as few repeat partners or opponents as possible.
import numpy as np
sched = np.array([
[[ 3, 28, 17, 14],
[23, 30, 22, 1],
[ 2, 5, 27, 25],
[20, 8, 10, 16],
[ 0, 24, 26, 11],
[ 4, 21, 31, 7],
[19, 6, 29, 15],
[13, 18, 12, 9]],
[[20, 15, 24, 31],
[ 3, 21, 16, 13],
[ 6, 30, 4, 5],
[28, 8, 0, 7],
[25, 29, 17, 23],
[14, 9, 2, 22],
[27, 12, 1, 11],
[26, 10, 19, 18]],
[[10, 4, 23, 12],
[ 9, 28, 25, 31],
[ 5, 13, 22, 8],
[15, 7, 30, 2],
[16, 19, 11, 14],
[18, 17, 24, 6],
[21, 0, 27, 20],
[ 3, 26, 29, 1]],
[[18, 20, 28, 1],
[ 8, 9, 3, 4],
[12, 17, 31, 5],
[13, 30, 27, 14],
[19, 25, 24, 7],
[ 2, 6, 21, 26],
[10, 11, 29, 22],
[15, 23, 0, 16]],
[[22, 21, 25, 15],
[26, 12, 20, 14],
[28, 5, 24, 10],
[11, 6, 31, 13],
[23, 27, 7, 3],
[ 0, 19, 9, 1],
[18, 30, 8, 29],
[16, 17, 2, 4]],
[[29, 28, 12, 21],
[ 9, 16, 27, 6],
[19, 17, 20, 30],
[ 2, 8, 24, 23],
[ 5, 11, 18, 7],
[26, 13, 25, 4],
[ 1, 10, 15, 14],
[ 0, 22, 31, 3]],
[[31, 19, 27, 8],
[20, 5, 29, 2],
[24, 16, 22, 12],
[25, 3, 10, 6],
[17, 1, 7, 13],
[ 4, 0, 14, 18],
[23, 28, 26, 15],
[11, 21, 9, 30]],
[[31, 18, 1, 16],
[23, 14, 21, 5],
[ 8, 3, 11, 15],
[26, 17, 9, 10],
[30, 12, 25, 0],
[22, 20, 7, 6],
[27, 4, 29, 24],
[13, 19, 28, 2]]
])
To determine the best bye selections, I randomly selected one matchup from each round of play as the bye. I then assign a score to each bye selection that maximizes the number of players that have only 1 bye, to minimize the necessary alterations to the schedule.
def bincount2d(arr, bins=None):
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = np.arange(len(arr))
    for col in arr.T:
        count[indexing, col] += 1
    return count
# randomly sample one game per round as byes
# repeat n times (here 10000)
times = 10000
idx1 = np.tile(np.arange(sched.shape[0]), times)
idx2 = np.random.randint(sched.shape[1], size=sched.shape[0] * times)
population_byes = sched[idx1, idx2].reshape(times, sched.shape[1], sched.shape[2])
# get player counts for byes
# can reshape because interested in # of byes for entire schedule
# so no need to segment players by rounds for these counts
count_shape = (population_byes.shape[0], population_byes.shape[1] * population_byes.shape[2])
counts = bincount2d(population_byes.reshape(count_shape))
# fitness is the number of players with one bye
# the higher the value, the less we need to do to mess with the schedule
fitness = np.apply_along_axis(lambda x: (x == 1).sum(), 1, counts)
byes = population_byes[np.argmax(fitness)]
My questions are as follows:
(1) is there an efficient way to account for the values for which there are no counts (I know the indices should be from 0 to 31)? The bincount2d does not have values for the missing values in that range.
(2) Is there a vectorized/more efficient way than the np.apply_along_axis line to get the count of elements equal to 1?
(3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?
> (1) is there an efficient way to account for the values for which there are no counts (I know the indices should be from 0 to 31)? The bincount2d does not have values for the missing values in that range.
bincount2d is inefficient because it performs inefficient memory accesses. Indeed, a transposition is an expensive operation, especially when it is done lazily as Numpy does. Moreover, the loop is not efficient either, because it works on a fairly big array with random memory accesses, which is bad for CPU caches. That being said, Numpy is not great for this kind of computation. One can use Numba to implement the operation efficiently:
import numba as nb

# You may need to tune the types for your machine.
# Alternatively, you can use cache=True instead and let Numba find the types (which is slower the first time).
@nb.njit('int64[:,::1](int64[:,::1], optional(int64))')
def bincount2d_fast(arr, bins=None):
    if bins is None:
        nbins = np.max(arr) + 1
    else:
        nbins = np.int64(bins)
    count = np.zeros((arr.shape[0], nbins), dtype=np.int64)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            count[i, arr[i, j]] += 1
    return count
The above code is 10 times faster than the original bincount2d function on my machine.
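As a usage note for question (1): passing bins explicitly (for example bins=32 here, since player indices run from 0 to 31) guarantees a column for every player, including those with zero byes. A hedged sketch using the names defined in the question:
# 32 columns, one per player index, even for players who never appear in the sampled byes
counts = bincount2d_fast(population_byes.reshape(count_shape), 32)
The original bincount2d accepts the same bins argument, so the same trick works there too.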
> (2) Is there a vectorized/more efficient way than the np.apply_along_axis line to get the count of elements equal to 1?
Yes. You can do the operation on the whole 2D array and perform the reduction on a given axis. Here is an example:
fitness = (counts == 1).sum(axis=1)
byes = population_byes[np.argmax(fitness)]
This is roughly 30 times faster on my machine.
> (3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?
A straightforward solution is to use Numba again with plain loops. Another solution could be to save the values to swap in a temporary array and use indirect accesses, depending on your exact needs (like what @WholeBrain proposed). Something like:
# all_x1, all_y1, etc. are 1D Numpy arrays containing the coordinates of the items to swap
arr[all_x2, all_y2], arr[all_x1, all_y1] = arr[all_x1, all_y1], arr[all_x2, all_y2]
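For illustration, here is a minimal self-contained sketch of that fancy-indexing swap; the array and coordinates below are made up, not taken from the schedule:
import numpy as np

a = np.arange(12).reshape(3, 4)
all_x1, all_y1 = np.array([0, 1]), np.array([0, 1])  # first set of cells: (0, 0) and (1, 1)
all_x2, all_y2 = np.array([2, 2]), np.array([3, 0])  # cells to swap them with: (2, 3) and (2, 0)

# The right-hand side is evaluated (and copied) before the assignments run,
# so the two sets of cells are exchanged in one statement.
a[all_x2, all_y2], a[all_x1, all_y1] = a[all_x1, all_y1], a[all_x2, all_y2]
print(a)  # (0, 0) now holds 11, (2, 3) now holds 0, and so on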

How to add values at repeat index locations on multidimensional arrays of Numpy?

I have a matrix B with shape (6, 9). For every row of B, I want to add 1 at some column indices. A column index may appear more than once, and I want to add m to a column whose index appears m times. Please see the following example code:
import numpy as np
B = np.arange(6*9).reshape(6, 9)
idx = np.array([[0, 1, 2],
[6, 7, 0],
[2, 3, 4],
[4, 5, 6]], dtype=int)
B[:, idx] += 1 # the result is not what I want.
Furthermore, np.add.at and np.bincount do not seem to work for the above case either.
I would appreciate your help. Thanks very much.
More Information:
In the idx array, the indices 0, 2, 4 and 6 appear twice, so I want
B[:, [0, 2, 4, 6]] += 2. For the other indices, which appear once, just add 1. So the final B should be
B = np.array([[ 2, 2, 4, 4, 6, 6, 8, 8, 8],
[11, 11, 13, 13, 15, 15, 17, 17, 17],
[20, 20, 22, 22, 24, 24, 26, 26, 26],
[29, 29, 31, 31, 33, 33, 35, 35, 35],
[38, 38, 40, 40, 42, 42, 44, 44, 44],
[47, 47, 49, 49, 51, 51, 53, 53, 53]])
I think you can use the np.add.at function to get what you want. Its syntax is:
np.add.at(array, (indices or slice for the 1st dimension, indices or slice for the 2nd dimension), what to add)
So, in your case, since you want to add 1 for every row, at every column listed in idx, you should use:
>>> a = np.arange(6 * 9).reshape(6, 9)
>>> np.add.at(a, (np.s_[:], idx), 1)
np.s_[:] is a slice object that tells us to perform it for each row
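Putting it together with the arrays from the question (just reproducing the setup above to check the result):
import numpy as np

B = np.arange(6 * 9).reshape(6, 9)
idx = np.array([[0, 1, 2], [6, 7, 0], [2, 3, 4], [4, 5, 6]])

# np.add.at buffers duplicate indices, so columns listed twice get +2.
np.add.at(B, (np.s_[:], idx), 1)
print(B[0])  # [2 2 4 4 6 6 8 8 8]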

Can't spot the bug in my rotate matrix code

I know there's an easy way to rotate a square matrix 90 degrees using transpose, but I'm writing a solution as if I didn't know that one (in other words, I want to do the swapping myself). The general idea below is to swap layer by layer. The offset represents which layer (outward to inward) is being swapped.
Here's the algorithm (note that it is wrapped in a class because that's a Leetcode thing):
class Solution:
    def rotate(self, matrix):
        for start in range(len(matrix)//2):
            end = len(matrix) - start - 1
            # swap all 4 coordinates:
            for offset in range(start, end):
                # swap top_left over top_right
                temp, matrix[start+offset][end] = matrix[start+offset][end], matrix[start][start+offset]
                # swap top_right -> bottom_right
                temp, matrix[end][end-offset] = matrix[end][end-offset], temp
                # swap bottom_right -> bottom_left
                temp, matrix[end-offset][start] = matrix[end-offset][start], temp
                # swap bottom_left -> top_left
                matrix[start][start+offset] = temp
This works for some hand tests with small matrices, as well as the smaller input test cases in the Leetcode submission. However, it fails on the following input:
[[ 2, 29, 20, 26, 16, 28],
[12, 27, 9, 25, 13, 21],
[32, 33, 32, 2, 28, 14],
[13, 14, 32, 27, 22, 26],
[33, 1, 20, 7, 21, 7],
[ 4, 24, 1, 6, 32, 34]]
Expected output:
[[ 4, 33, 13, 32, 12, 2],
[24, 1, 14, 33, 27, 29],
[ 1, 20, 32, 32, 9, 20],
[ 6, 7, 27, 2, 25, 26],
[32, 21, 22, 28, 13, 16],
[34, 7, 26, 14, 21, 28]]
My output:
[[ 4, 33, 13, 32, 12, 2],
[24, 1, 7, 33, 27, 29],
[ 1, 20, 32, 2, 14, 20],
[ 6, 28, 32, 27, 25, 26],
[32, 21, 22, 9, 13, 16],
[34, 7, 26, 14, 21, 28]]
This matrix is just big enough to where it becomes tedious to walk through the algorithm by hand like I did for the smaller input cases to debug. Where is the bug in my implementation?
Your offset is off, try
for offset in range(0, end-start):
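To make that concrete, here is the method with only the range changed (a sketch checked against a small 3x3 example, not against the full Leetcode test suite):
class Solution:
    def rotate(self, matrix):
        for start in range(len(matrix) // 2):
            end = len(matrix) - start - 1
            # offset now starts at 0 within every layer instead of at `start`
            for offset in range(0, end - start):
                temp, matrix[start+offset][end] = matrix[start+offset][end], matrix[start][start+offset]
                temp, matrix[end][end-offset] = matrix[end][end-offset], temp
                temp, matrix[end-offset][start] = matrix[end-offset][start], temp
                matrix[start][start+offset] = temp

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Solution().rotate(m)
print(m)  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]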
Your (random) test data is hard to follow and debug. It would be better to use a readable data set.
Also, it seems that the implementation makes excessive swaps.
It is enough to remember one cell value, shift the other cells in a cyclic manner, then write back the remembered value.
Here is a simple implementation of the rotation, with mat filled with sequential numbers:
n = 5
nm = n - 1
mat = []
for i in range(n):
    a = [x for x in range(n*i, n*i+n)]
    mat.append(a)
for i in range(n):
    print(mat[i])

for row in range((n + 1)//2):
    for col in range(n//2):
        t = mat[row][col]
        mat[row][col] = mat[nm-col][row]
        mat[nm-col][row] = mat[nm-row][nm-col]
        mat[nm-row][nm-col] = mat[col][nm-row]
        mat[col][nm-row] = t

print("")
for i in range(n):
    print(mat[i])
[0, 1, 2, 3, 4]
[5, 6, 7, 8, 9]
[10, 11, 12, 13, 14]
[15, 16, 17, 18, 19]
[20, 21, 22, 23, 24]
[20, 15, 10, 5, 0]
[21, 16, 11, 6, 1]
[22, 17, 12, 7, 2]
[23, 18, 13, 8, 3]
[24, 19, 14, 9, 4]
To make that rotation, the value at the i-th row and j-th column of the output matrix should equal the value at the j-th row and i-th column of the input matrix (the transpose), and then, as a next step, the columns should be reversed. If I were you, I would define an empty matrix (say TRANSPOSE) with the same size as the input matrix, and write two nested for loops (i in range(len(matrix)), j in range(len(matrix))). Inside the second loop I would write
TRANSPOSE[i][j] = matrix[j][i]
Now we have the transpose of the input matrix; to get what you want we still need to reverse its columns. Define another pair of for loops with the same variables and ranges. We need one more matrix of the same size, or we can reuse the input matrix, so let's use the input matrix in this case:
matrix[i][j] = TRANSPOSE[i][len(matrix) - j - 1]
and return matrix.
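Here is a sketch of that transpose-then-reverse-columns idea (the function name is mine, and it returns a new matrix instead of modifying the input in place):
def rotate_clockwise(matrix):
    n = len(matrix)
    # Step 1: transpose -- TRANSPOSE[i][j] = matrix[j][i]
    transpose = [[matrix[j][i] for j in range(n)] for i in range(n)]
    # Step 2: reverse the column order of the transpose
    return [[transpose[i][n - j - 1] for j in range(n)] for i in range(n)]

print(rotate_clockwise([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# [[7, 4, 1], [8, 5, 2], [9, 6, 3]]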

Slicing 3d numpy array returns strange shape

If I slice a 2d array with a set of coordinates
>>> test = np.reshape(np.arange(40),(5,8))
>>> coords = np.array((1,3,4))
>>> slice = test[:, coords]
then my slice has the shape that I would expect
>>> slice.shape
(5, 3)
But if I repeat this with a 3d array
>>> test = np.reshape(np.arange(80),(2,5,8))
>>> slice = test[0, :, coords]
then the shape is now
>>> slice.shape
(3, 5)
Is there a reason that these are different? Separating the indices returns the shape that I would expect
>>> slice = test[0][:][coords]
>>> slice.shape
(5, 3)
Why would these views have different shapes?
slice = test[0, :, coords]
is mixed basic and advanced indexing, in effect saying "take the 0th element of the first coordinate, all of the second coordinate, and [1, 3, 4] of the third coordinate". Or more precisely: take the coordinates (0, whatever, 1) and make them our first row, (0, whatever, 3) our second row, and (0, whatever, 4) our third row. There are 5 whatevers, so you end up with (3, 5).
The second example you gave is like this:
slice = test[0][:][coords]
Here test[0] is a (5, 8) array and [:] changes nothing, so [coords] takes the 1st, 3rd and 4th elements along the first axis, which are rows, giving a (3, 8) array. To select columns 1, 3 and 4 and get the (5, 3) shape you expected, write test[0][:, coords] instead.
Edit to discuss 2D case:
In the 2D case, where:
>>> test = np.reshape(np.arange(40),(5,8))
>>> test
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39]])
the behaviour is similar.
Case 1:
>>> test[:,[1,3,4]]
array([[ 1, 3, 4],
[ 9, 11, 12],
[17, 19, 20],
[25, 27, 28],
[33, 35, 36]])
is simply selecting columns 1,3, and 4.
Case 2:
>>> test[:][[1,3,4]]
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39]])
is taking the 1st, 3rd and 4th element of the array, which are the rows.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
The docs talk about the complexity of combining advanced and basic indexing.
test[0, :, coords]
The advanced index coords comes first, with the [0, :] subspace after, producing the (3, 5) shape.
The easiest way to understand the situation may be to think in terms of the result shape. There are two parts to the indexing operation: the subspace defined by the basic indexing (excluding integers) and the subspace from the advanced indexing part. In the case where the advanced indices are separated by a slice, ellipsis or newaxis (for example x[arr1, :, arr2]), the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that.
I recall discussing this kind of indexing in a previous SO question, but it would take some digging to find it.
https://stackoverflow.com/a/28353446/901925 Why does the order of dimensions change with boolean indexing?
How does numpy order array slice indices?
The [:] in test[0][:][coords] does nothing. test[0][:,coords] produces the desired (5,3) result.
In [145]: test[0,:,[1,2,3]] # (3,5) array
Out[145]:
array([[ 1, 9, 17, 25, 33], # test[0,:,1]
[ 2, 10, 18, 26, 34],
[ 3, 11, 19, 27, 35]])
In [146]: test[0][:,[1,2,3]] # same values but (5,3)
Out[146]:
array([[ 1, 2, 3],
[ 9, 10, 11],
[17, 18, 19],
[25, 26, 27],
[33, 34, 35]])
In [147]: test[0][:][[1,2,3]] # [:] does nothing; select 3 from 2nd axis
Out[147]:
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31]])
In [148]: test[0][[1,2,3]] # same as test[0,[1,2,3],:]
Out[148]:
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31]])
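If you want the (5, 3) layout while still indexing the 3D array, two options (just suggestions summarizing the answers above) are to transpose the mixed-indexing result or to index in two steps so only one advanced index is involved:
import numpy as np

test = np.reshape(np.arange(80), (2, 5, 8))
coords = np.array((1, 3, 4))

print(test[0, :, coords].T.shape)   # (5, 3): transpose the mixed-indexing result
print(test[0][:, coords].shape)     # (5, 3): two-step indexing, single advanced index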
