numpy - efficient value counts in 2D and 3D arrays - python

I am writing a scheduling program for group play. I have a schedule that works for 32-4-8 (32 players, 4 players per group, 8 rounds) with no duplicate partners or opponents. However, due to space constraints, only 28 players / 7 groups can play in each round. So I have to modify the schedule so that every player gets 7 games, 1 bye, and as few repeat partners or opponents as possible.
import numpy as np
sched = np.array([
[[ 3, 28, 17, 14],
[23, 30, 22, 1],
[ 2, 5, 27, 25],
[20, 8, 10, 16],
[ 0, 24, 26, 11],
[ 4, 21, 31, 7],
[19, 6, 29, 15],
[13, 18, 12, 9]],
[[20, 15, 24, 31],
[ 3, 21, 16, 13],
[ 6, 30, 4, 5],
[28, 8, 0, 7],
[25, 29, 17, 23],
[14, 9, 2, 22],
[27, 12, 1, 11],
[26, 10, 19, 18]],
[[10, 4, 23, 12],
[ 9, 28, 25, 31],
[ 5, 13, 22, 8],
[15, 7, 30, 2],
[16, 19, 11, 14],
[18, 17, 24, 6],
[21, 0, 27, 20],
[ 3, 26, 29, 1]],
[[18, 20, 28, 1],
[ 8, 9, 3, 4],
[12, 17, 31, 5],
[13, 30, 27, 14],
[19, 25, 24, 7],
[ 2, 6, 21, 26],
[10, 11, 29, 22],
[15, 23, 0, 16]],
[[22, 21, 25, 15],
[26, 12, 20, 14],
[28, 5, 24, 10],
[11, 6, 31, 13],
[23, 27, 7, 3],
[ 0, 19, 9, 1],
[18, 30, 8, 29],
[16, 17, 2, 4]],
[[29, 28, 12, 21],
[ 9, 16, 27, 6],
[19, 17, 20, 30],
[ 2, 8, 24, 23],
[ 5, 11, 18, 7],
[26, 13, 25, 4],
[ 1, 10, 15, 14],
[ 0, 22, 31, 3]],
[[31, 19, 27, 8],
[20, 5, 29, 2],
[24, 16, 22, 12],
[25, 3, 10, 6],
[17, 1, 7, 13],
[ 4, 0, 14, 18],
[23, 28, 26, 15],
[11, 21, 9, 30]],
[[31, 18, 1, 16],
[23, 14, 21, 5],
[ 8, 3, 11, 15],
[26, 17, 9, 10],
[30, 12, 25, 0],
[22, 20, 7, 6],
[27, 4, 29, 24],
[13, 19, 28, 2]]
])
To determine the best bye selections, I randomly selected one matchup from each round of play as the bye. I then assign a score to each bye selection that maximizes the number of players that have only 1 bye, to minimize the necessary alterations to the schedule.
def bincount2d(arr, bins=None):
if bins is None:
bins = np.max(arr) + 1
count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
indexing = np.arange(len(arr))
for col in arr.T:
count[indexing, col] += 1
return count
# randomly sample one game per round as byes
# repeat n times (here 10000)
times = 10000
idx1 = np.tile(np.arange(sched.shape[0]), times)
idx2 = np.random.randint(sched.shape[1], size=sched.shape[0] * times)
population_byes = sched[idx1, idx2].reshape(times, sched.shape[1], sched.shape[2])
# get player counts for byes
# can reshape because interested in # of byes for entire schedule
# so no need to segment players by rounds for these counts
count_shape = (population_byes.shape[0], population_byes.shape[1] * population_byes.shape[2])
counts = bincount2d(population_byes.reshape(count_shape))
# fitness is the number of players with one bye
# the higher the value, the less we need to do to mess with the schedule
fitness = np.apply_along_axis(lambda x: (x == 1).sum(), 1, counts)
byes = population_byes[np.argmax(fitness)]
My questions are as follows:
(1) is there an efficient way to account for the values for which there are no counts (I know the indices should be from 0 to 31)? The bincount2d does not have values for the missing values in that range.
(2) Is there a vectorized/more efficient way than the np.apply_along_axis line to get the count of elements equal to 1?
(3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?

(1) is there an efficient way to account for the values for which there are no counts (I know the indices should be from 0 to 31)? The bincount2d does not have values for the missing values in that range.
bincount2d is inefficient because it performs inefficient memory accesses. Indeed, a transposition is an expensive operation, especially when it is done lazily like what Numpy does. Moreover, the loop is not efficient too because it works on a quite big array with a random memory access which is bad for CPU caches. That being said, Numpy is not great for such a computation. One can use Numba to implement the operation efficiently:
import numba as nb
# You may need to tune the types on your machines
# Alternatively, you can use cache=True instead and let Numba find the types (which is slower the fist time)
#nb.njit('int64[:,::1](int64[:,::1], optional(int64))')
def bincount2d_fast(arr, bins=None):
if bins is None:
nbins = np.max(arr) + 1
else:
nbins = np.int64(bins)
count = np.zeros((arr.shape[0], nbins), dtype=np.int64)
for i in range(arr.shape[0]):
for j in range(arr.shape[1]):
count[i, arr[i, j]] += 1
return count
The above code is 10 times faster than the original bincount2d function on my machine.
(2) Is there a vectorized/more efficient way than the np.apply_along_axis line to get the count of elements equal to 1?
Yes. You can do the operation on the whole 2D array and perform the reduction on a given axis. Here is an example:
fitness = (counts == 1).sum(axis=1)
byes = population_byes[np.argmax(fitness)]
```
This is roughly 30 times faster on my machine.
> (3) Ultimately, what I would like to do is have the application change the schedule to give everyone a bye by swapping player assignments. How do you swap elements in a 3D array?
A straightforward solution is to use Numba again with plain loops. Another solution could be to save the value to swap in a temporary array and use an indirect access regarding your exact needs (like what #WholeBrain proposed). Something like:
```python
# all_x1, all_y1, etc. are 1D Numpy arrays containing coordinates of the items to swap
arr[all_x2, all_y2], arr[all_x1, all_y1] = arr[all_x1, all_y1], arr[all_x2, all_y2]
```

Related

Convert numpy array to a range

I have a sorted list of numbers:
[ 1, 2, 3, 4, 6, 7, 8, 9, 14, 15, 16, 17, 18, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 45]
I want to know if numpy has a built-in feature to get something like this out of it:
[ [1, 4], [6, 9], [14, 18], [25,36], [38], [45] ]
Also great if it can ignore holes of certain 2-3 numbers missing in between would still make the range.
I am basically listing frame numbers of a video for processing -- so rather than list out all the frame numbers I would jut put in ranges and its okay if a 3-4 frames in between are missing.
Just want to know if there something that already implements this logic as it seems like a common thing people would want to do - otherwise, I'll implement it myself.
Edit:
found a very close question that's answered already:
converting a list of integers into range in python
Since it's tagged as numpy, here is a numpy solution (sort of). There is no native numpy function but you could use diff + where + split + list comprehension:
>>> [[ary[0], ary[-1]] if len(ary)>1 else [ary[0]] for ary in np.split(arr, np.where(np.diff(arr) != 1)[0] + 1)]
[[1, 4], [6, 9], [14, 18], [25, 36], [38], [45]]
If the array is large, it's more efficient to use a loop rather than np.split, so you could use the function below which produces the same outcome as above:
def array_to_range(arr):
idx = np.r_[0, np.where(np.diff(arr) != 1)[0]+1, len(arr)]
out = []
for i,j in zip(idx, idx[1:]):
ary = arr[i:j]
if len(ary) > 1:
out.append([ary[0], ary[-1]])
else:
out.append([ary[0]])
return out
numpy is not designed to work with to work with arrays that differs in size. It uses internally list.append to append multiple list to empty array using slicing. It really slows the things down and it's recommended to use only in case user is forced to.
Algorithm of np.split
To better see, how could you improve, let's take a look at general pattern of finding np.split-like indices of array split:
arr = [1, 2, 3, 4, 6, 7, 8, 9, 14, 15, 16, 17, 18, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 45]
div_points = np.flatnonzero(np.diff(arr)!=1) + 1
start_points = np.r_[0, div_points]
end_points = np.r_[div_points, len(arr)]
So now you've got start and end arguments of slices that split an array into multiple sub-arrays:
np.transpose([start_points, end_points])
>>> array([[ 0, 4],
[ 4, 8],
[ 8, 13],
[13, 25],
[25, 26],
[26, 27]])
And there is a mechanism that np.split uses internally to split an array:
container = []
for start, end in np.transpose([start_points, end_points]):
container.append(arr[start:end])
>>> container
[[1, 2, 3, 4],
[6, 7, 8, 9],
[14, 15, 16, 17, 18],
[25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36],
[38],
[45]]
Variation of algorithm
To arrive at an output nearly similar to what you expect, you could modify algorithm of np.split like so:
div_points = np.flatnonzero(np.diff(arr)!=1) + 1
start_points = np.r_[0, div_points]
end_points = np.r_[div_points, len(arr)]
out = np.transpose([arr[start_points], arr[end_points - 1]])
out
>>> array([[ 1, 4],
[ 6, 9],
[14, 18],
[25, 36],
[38, 38],
[45, 45]])

what is the difference between two following reshape function in numpy?

I am building a neural network. where I have to flatten my training dataset.
I have two options.
1 is:
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T
and 2nd one is:
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[1]*train_x_orig.shape[2]*train_x_orig.shape[3], 209)
both gave the same shape but I found difference while computing cost?
why is that? thank you
Your original tensor is of at least rank 4 based on the second example. The first example pulls each element, ordered by increasing the right-most index, and inserts the elements into rows the length of the zeroth shape. Then transposes.
The second example again pull elements from by incrementing from the right-most index, i.e.:
element = train_x_orig[0, 0, 0, 0]
new_row.append(element)
element = train_x_orig[0, 0, 0, 1]
new_row.append(element)
but the size of the row is different. It is now the dimension of everything else in the tensor.
Here is an example to illustrate.
First we create an ordered array and reshape it to rank 4.
import numpy as np
x = np.arange(36).reshape(3,2,3,2)
x
# returns:
array([[[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]],
[[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]],
[[[24, 25],
[26, 27],
[28, 29]],
[[30, 31],
[32, 33],
[34, 35]]]])
Here is the output of the first example
x.reshape(x.shape[0], -1).T
# returns:
array([[ 0, 12, 24],
[ 1, 13, 25],
[ 2, 14, 26],
[ 3, 15, 27],
[ 4, 16, 28],
[ 5, 17, 29],
[ 6, 18, 30],
[ 7, 19, 31],
[ 8, 20, 32],
[ 9, 21, 33],
[10, 22, 34],
[11, 23, 35]])
And here is the second example
x.reshape(x.shape[1]*x.shape[2]*x.shape[3], -1)
# returns:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26],
[27, 28, 29],
[30, 31, 32],
[33, 34, 35]])
How the elements get reordered is fundamentally different.

Sum array elements vertically with iteration

Hi I have an array that I want to sum the elements vertically. Just wonder are there any functions can do this easily ?
a = [[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]]
I want to print the answers of 1+6+11+16+21 , 2+7+12+17, 3+8+13, 4+9, 5
As you can see, in each iteration, there is one element less.
This is one approach using zip and a simple iteration.
Ex:
a = [[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]]
print([sum(v[:-i]) if i else sum(v) for i, v in enumerate(zip(*a))])
Output:
[55, 38, 24, 13, 5]
Converting to a numpy array, and then using the following list comprehension
a = np.array(a)
[a[:5-i,i].sum() for i in range(5)]
yields the following:
[55, 38, 24, 13, 5]

numpy indexing: fixed length parts of each row with varying starting column

I have a 2d array like z and a 1d array denoting the "start column position" like starts. In addition I have a fixed row_length = 2
z = np.arange(35).reshape(5, -1)
# --> array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34]])
starts = np.array([1,5,3,3,2])
What I want is the outcome of this slow for-loop, just quicker if possible.
result = np.zeros(
(z.shape[0], row_length),
dtype=z.dtype
)
for i in range(z.shape[0]):
s = starts[i]
result[i] = z[i, s:s+row_length]
So result in this example should look like this in the end:
array([[ 1, 2],
[12, 13],
[17, 18],
[24, 25],
[30, 31]])
I can't seem to find a way using either fancy indexing or np.take to deliver this result.
One approach would be to get those indices using broadcasted additions with those starts and row_length and then use NumPy's advanced-indexing to extract out all of those elements off the data array, like so -
idx = starts[:,None] + np.arange(row_length)
out = z[np.arange(idx.shape[0])[:,None], idx]
Sample run -
In [197]: z
Out[197]:
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34]])
In [198]: starts = np.array([1,5,3,3,2])
In [199]: row_length = 2
In [200]: idx = starts[:,None] + np.arange(row_length)
In [202]: z[np.arange(idx.shape[0])[:,None], idx]
Out[202]:
array([[ 1, 2],
[12, 13],
[17, 18],
[24, 25],
[30, 31]])

Slicing multiple rows by single index

I have the following slicing problem in numpy.
a = np.arange(36).reshape(-1,4)
a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])
In my problem always three rows represent one sample, in my case coordinates.
I want to access this matrix in a way that if I use a[0:2] to get the following:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]
These are the first two coordinate samples.
I have to extract a large amount of these coordinate sets from an array.
Thanks
Based on How do you split a list into evenly sized chunks?, I found the following solution, which gives me the desired result.
def chunks(l, n, indices):
return np.vstack([l[idx*n:idx*n+n] for idx in indices])
chunks(a,3,[0,2])
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])
Probably this solution could be improved and somebody won't need the stacking.
If three rows are a sample, you can reshape your array to reflect that, use fancy indexing to retrieve your samples, then undo the shape change:
>>> a = a.reshape(-1, 3, 4)
>>> a[[0, 2]].reshape(-1, 4)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]])

Categories

Resources