I have a PyTorch sparse tensor that I need to slice row- and column-wise using the slice [idx][:,idx], where idx is a list of indices. Using this slice yields my desired result on an ordinary float tensor. Is it possible to apply the same slicing to a sparse tensor? Example here:
#constructing sparse matrix
i = np.array([[0,1,2,2],[0,1,2,1]])
v = np.ones(4)
i = torch.from_numpy(i.astype("int64"))
v = torch.from_numpy(v.astype("float32"))
test1 = torch.sparse.FloatTensor(i, v)
#constructing float tensor
test2 = np.array([[1,0,0],[0,1,0],[0,1,1]])
test2 = autograd.Variable(torch.cuda.FloatTensor(test2), requires_grad=False)
#slicing
idx = [1,2]
print(test2[idx][:,idx])
output:
Variable containing:
1 0
1 1
[torch.cuda.FloatTensor of size 2x2 (GPU 0)]
I am holding a 250,000 x 250,000 adjacency matrix, from which I need to slice n rows and n columns by simply sampling n random indices for idx. Since the dataset is so large, it is not realistic to convert it to a more convenient datatype.
Can I achieve the same slicing result on test1? Is it even possible? If not, are there any workarounds?
Right now I am running my model with the following "hack" of a solution:
idx = sorted(random.sample(range(0, np.shape(test1)[0]), 9000))
test1 = test1AsCsr[idx][:,idx].todense().astype("int32")
test1 = autograd.Variable(torch.cuda.FloatTensor(test1), requires_grad=False)
Where test1AsCsr is my test1 converted to a numpy CSR matrix. This solution works, it is however very slow, and makes my GPU utilization very low, since it needs to read/write from CPU memory, constantly.
Edit: It's fine if the result is a non-sparse tensor.
Well it's been a couple of years since there was activity on this question, but better late than never.
This is the function I use for slicing sparse tensors. (Helper functions are below)
def slice_torch_sparse_coo_tensor(t, slices):
"""
params:
-------
t: tensor to slice
slices: slice for each dimension
returns:
--------
t[slices[0], slices[1], ..., slices[n]]
"""
t = t.coalesce()
    assert len(slices) == len(t.size())
    for i in range(len(slices)):
        if type(slices[i]) is not torch.Tensor:
            slices[i] = torch.tensor(slices[i], dtype=torch.long)
indices = t.indices()
values = t.values()
    for dim, slice in enumerate(slices):
        invert = False
        if t.size(dim) * 0.6 < len(slice):
            invert = True
            all_nodes = torch.arange(t.size(dim))
unique, counts = torch.cat([all_nodes, slice]).unique(return_counts=True)
slice = unique[counts==1]
if slice.size(0) > 400:
mask = ainb_wrapper(indices[dim], slice)
else:
mask = ainb(indices[dim], slice)
if invert:
mask = ~mask
indices = indices[:, mask]
values = values[mask]
return torch.sparse_coo_tensor(indices, values, t.size()).coalesce()
Usage (took 2.4s on my machine):
indices = torch.randint(low= 0, high= 200000, size= (2, 1000000))
values = torch.rand(size=(1000000,))
t = torch.sparse_coo_tensor(indices, values, size=(200000, 200000))
idx = torch.arange(1000)
slice_torch_sparse_coo_tensor(t, [idx, idx])
out:
tensor(indices=tensor([[ 13, 62, 66, 78, 134, 226, 233, 266, 299, 344, 349,
349, 369, 396, 421, 531, 614, 619, 658, 687, 769, 792,
810, 840, 926, 979],
[255, 479, 305, 687, 672, 867, 444, 559, 772, 96, 788,
980, 423, 699, 911, 156, 267, 721, 381, 781, 97, 271,
840, 292, 487, 185]]),
values=tensor([0.4260, 0.4816, 0.8001, 0.8815, 0.3971, 0.4914, 0.7068,
0.2329, 0.4038, 0.1757, 0.7758, 0.3210, 0.2593, 0.8290,
0.1320, 0.4322, 0.7529, 0.8341, 0.8128, 0.4457, 0.4100,
0.1618, 0.4097, 0.3088, 0.6942, 0.5620]),
size=(200000, 200000), nnz=26, layout=torch.sparse_coo)
Timings for slice_torch_sparse_coo_tensor:
%timeit slice_torch_sparse_coo_tensor(t, [torch.randperm(200000)[:500], torch.arange(200000)])
output:
1.08 s ± 447 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
for the built-in torch.index_select (implemented here):
%timeit t.index_select(0, torch.arange(100))
output:
56.7 s ± 4.87 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
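For reference, in recent PyTorch versions the built-in can also be chained over both dimensions to reproduce the [idx][:,idx] slice from the question; a minimal sketch (the timing above shows why I avoid it for large index sets):
idx = torch.arange(1000)
sliced = t.index_select(0, idx).index_select(1, idx)  # still a sparse COO tensor
dense = sliced.to_dense()  # the question says a dense result is fine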
These are the helper functions I use for this purpose; the function "ainb" finds the elements of a that are in b. I found this function on the internet a while ago, but I can't find the post to link to.
import torch
def ainb(a,b):
"""gets mask for elements of a in b"""
size = (b.size(0), a.size(0))
if size[0] == 0: # Prevents error in torch.Tensor.max(dim=0)
return torch.tensor([False]*a.size(0), dtype= torch.bool)
a = a.expand((size[0], size[1]))
b = b.expand((size[1], size[0])).T
mask = a.eq(b).max(dim= 0).values
return mask
def ainb_wrapper(a, b, splits = .72):
inds = int(len(a)**splits)
tmp = [ainb(a[i*inds:(i+1)*inds], b) for i in list(range(inds))]
return torch.cat(tmp)
Since the function scales quadratically with the number of elements, I added a wrapper that splits the input into chunks and then concatenates the output. It's more efficient when using only the CPU, but I am not sure whether this holds when using a GPU; I would appreciate it if someone could test it :)
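A quick sanity check of the helpers (a throwaway sketch with made-up values):
a = torch.tensor([0, 3, 5, 7])
b = torch.tensor([5, 0])
print(ainb(a, b))          # tensor([ True, False,  True, False])
print(ainb_wrapper(a, b))  # same mask, computed in chunks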
It's my first time posting, so feedback on the quality of the post is also appreciated.
Possible answer for 2-dimensional sparse indices
Below is an answer playing with several pytorch methods (torch.eq(), torch.unique(), torch.sort(), etc.) in order to output a compact, sliced tensor of shape (len(idx), len(idx)).
I tested several edge cases (unordered idx, v with 0s, i with multiple identical index pairs, etc.), though I may have forgotten some. Performance should also be checked.
import torch
import numpy as np
def in1D(x, labels):
"""
    Sub-optimal equivalent to numpy.in1d().
Hopefully this feature will be properly covered soon
c.f. https://github.com/pytorch/pytorch/issues/3025
Snippet by Aron Barreira Bordin
Args:
x (Tensor): Tensor to search values in
labels (Tensor/list): 1D array of values to search for
Returns:
Tensor: Boolean tensor y of same shape as x, with y[ind] = True if x[ind] in labels
Example:
>>> in1D(torch.FloatTensor([1, 2, 0, 3]), [2, 3])
FloatTensor([False, True, False, True])
"""
mapping = torch.zeros(x.size()).byte()
for label in labels:
mapping = mapping | x.eq(label)
return mapping
def compact1D(x):
"""
"Compact" values 1D uint tensor, so that all values are in [0, max(unique(x))].
Args:
x (Tensor): uint Tensor
Returns:
Tensor: uint Tensor of same shape as x
Example:
        >>> compact1D(torch.ByteTensor([5, 8, 7, 3, 8, 42]))
ByteTensor([1, 3, 2, 0, 3, 4])
"""
x_sorted, x_sorted_ind = torch.sort(x, descending=True)
x_sorted_unique, x_sorted_unique_ind = torch.unique(x_sorted, return_inverse=True)
x[x_sorted_ind] = x_sorted_unique_ind
return x
# Input sparse tensor:
i = torch.from_numpy(np.array([[0,1,4,3,2,1],[0,1,3,1,4,1]]).astype("int64"))
v = torch.from_numpy(np.arange(1, 7).astype("float32"))
test1 = torch.sparse.FloatTensor(i, v)
print(test1.to_dense())
# tensor([[ 1., 0., 0., 0., 0.],
# [ 0., 8., 0., 0., 0.],
# [ 0., 0., 0., 0., 5.],
# [ 0., 4., 0., 0., 0.],
# [ 0., 0., 0., 3., 0.]])
# note: test1[1, 1] = v[1] + v[5] = 2 + 6 = 8
# since both i[:, 1] and i[:, 5] are [1, 1]
# Input slicing indices:
idx = [4,1,3]
# Getting the elements in `i` which correspond to `idx`:
v_idx = in1D(i, idx).byte()
v_idx = v_idx.sum(dim=0).squeeze() == i.size(0) # or `v_idx.all(dim=0)` for pytorch 0.5+
v_idx = v_idx.nonzero().squeeze()
# Slicing `v` and `i` accordingly:
v_sliced = v[v_idx]
i_sliced = i.index_select(dim=1, index=v_idx)
# Building sparse result tensor:
i_sliced[0] = compact1D(i_sliced[0])
i_sliced[1] = compact1D(i_sliced[1])
# To make sure to have a square dense representation:
size_sliced = torch.Size([len(idx), len(idx)])
res = torch.sparse.FloatTensor(i_sliced, v_sliced, size_sliced)
print(res)
# torch.sparse.FloatTensor of size (3,3) with indices:
# tensor([[ 0, 2, 1, 0],
# [ 0, 1, 0, 0]])
# and values:
# tensor([ 2., 3., 4., 6.])
print(res.to_dense())
# tensor([[ 8., 0., 0.],
# [ 4., 0., 0.],
# [ 0., 3., 0.]])
Previous answer for 1-dimensional sparse indices
Here is a (probably sub-optimal and not covering all edge cases) solution, following the intuitions shared in a related open issue (hopefully this feature will be properly covered soon):
# Constructing a sparse tensor a bit more complicated for the sake of demo:
i = torch.LongTensor([[0, 1, 5, 2]])
v = torch.FloatTensor([[1, 3, 0], [5, 7, 0], [9, 9, 9], [1,2,3]])
test1 = torch.sparse.FloatTensor(i, v)
# note: if you directly have sparse `test1`, you can get `i` and `v`:
# i, v = test1._indices(), test1._values()
# Getting the slicing indices:
idx = [1,2]
# Preparing to slice `v` according to `idx`.
# For that, we gather the list of indices `v_idx` such that i[v_idx[k]] == idx[k]:
i_squeeze = i.squeeze()
v_idx = [(i_squeeze == j).nonzero() for j in idx] # <- doesn't seem optimal...
v_idx = torch.cat(v_idx, dim=1)
# Slicing `v` accordingly:
v_sliced = v[v_idx.squeeze()][:,idx]
# Now defining your resulting sparse tensor.
# I'm not sure what kind of indexing you want, so here are 2 possibilities:
# 1) "Dense" indixing:
test1x = torch.sparse.FloatTensor(torch.arange(v_idx.size(1)).long().unsqueeze(0), v_sliced)
print(test1x)
# torch.sparse.FloatTensor of size (2,2) with indices:
#
# 0 1
# [torch.LongTensor of size (1,2)]
# and values:
#
# 7 0
# 2 3
# [torch.FloatTensor of size (2,2)]
# 2) "Sparse" indixing using the original `idx`:
test1x = torch.sparse.FloatTensor(autograd.Variable(torch.LongTensor(idx)).unsqueeze(0), v_sliced)
# note: this indexing would fail if elements of `idx` were not in `i`.
print(test1x)
# torch.sparse.FloatTensor of size (3,2) with indices:
#
# 1 2
# [torch.LongTensor of size (1,2)]
# and values:
#
# 7 0
# 2 3
# [torch.FloatTensor of size (2,2)]
Related
What is the fastest way to perform operations on adjacent elements of an m x n array within distance l (where m, n are large)? If this were an image, it would equate to an operation on the surrounding pixels. To make things clearer, I've created a new array with the neighbours of each corresponding source element.
Given some array like
x = [[1,2,3],
[4,5,6],
[7,8,9]]
If I were to take the [0,0] element and want the surrounding elements at l=1, I'd need the [0,1] and [1,0] elements (namely 2 and 4). The desired output would look something like this:
y = [[[2,4], [1,3,5], [2,6]],
[[1,5,7], [4,6,2,8], [3,9,5]],
[[4,8], [7,5,9], [8,6]]]
I've tried playing around with kdTree from scipy.spatial, and am aware of https://stackoverflow.com/a/45742628/20451990, but as far as I can tell this is actually finding the nearest data points, whereas I want to find the nearest array elements. I guess it could be naively done by iterating through, but that is very slow...
The end goal here is to generate combinations of nearby array elements which I will be taking the product of. For the example above this could be
[[1*2, 1*4], [2*1, 2*3, 2*5], [3*2, 3*6]],...]
Key takeaways
With numba, it is possible to get an algorithm roughly 690x faster than naïve Python code with for-loops and list appends.
With numba, functions have signatures; you declare explicitly what the datatypes are (a tiny sketch follows this list).
Avoid memory (re-)allocations. Try to allocate memory for any arrays in advance. Reuse the data containers whenever possible (See: cell_result in the numbafied process_cell())
Numba is not super handy with classes (at least, OOP style code), stuff which is dynamically typed, containers with mixed types or containers changing in size. Prefer simple functions and typed structures with defined size. See also: Supported Python features
Numba likes for-loops, and they're fast!
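A tiny illustration of a numba signature and a compiled for-loop (the function total is just a made-up example, not part of the solution below):
import numba
import numpy as np

@numba.njit("f4(f4[:])")   # takes a 1-D float32 array, returns a float32
def total(x):
    s = np.float32(0.0)
    for value in x:        # a plain for-loop, compiled to fast machine code
        s += value
    return s

total(np.arange(5, dtype=np.float32))  # -> 10.0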
Prewords
You asked for the fastest way to calculate this. I had no baseline, so I first created a pure Python for-loop solution as a baseline. Then I used numba to make the code run fast. It is most probably not the fastest implementation, but at least it is way faster than the naïve pure Python for-loop approach.
So, if you are not familiar with numba this is a good way to learn about it a bit :)
Used test data
I use two pieces of test data. First, the simple array given in the question. I call this myarr, and it is used for easy comparison of the output:
import numpy as np
myarr = np.array(
[
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
],
dtype=np.float32,
)
The second dataset is for benchmarking. You mentioned that the arrays will be of size 30 x 30 and the distance I will be less than 4.
arr_large = np.arange(1, 30 * 30 + 1, 1, dtype=np.float32).reshape(30, 30)
In other words, the arr_large is a 30 x 30 2d-array:
>>> arr_large
array([[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.,
12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22.,
23., 24., 25., 26., 27., 28., 29., 30.],
...
[871., 872., 873., 874., 875., 876., 877., 878., 879., 880., 881.,
882., 883., 884., 885., 886., 887., 888., 889., 890., 891., 892.,
893., 894., 895., 896., 897., 898., 899., 900.]], dtype=float32)
I specified the dtype because specifying datatype is needed at the optimization step. For the pure python solution this is of course not necessary at all.
Baseline solution: Pure python with for-loops
I implemented the baseline solution with a Python class and for-loops. The output from it looks like this (source for NeighbourProcessor below):
Example output with 3 x 3 input array (I=1)
n = NeighbourProcessor()
output = n.process(myarr, max_distance=1)
The output is then
>>> output
{(0, 0): [2, 4],
(0, 1): [2, 6, 10],
(0, 2): [6, 18],
(1, 0): [4, 20, 28],
(1, 1): [10, 20, 30, 40],
(1, 2): [18, 30, 54],
(2, 0): [28, 56],
(2, 1): [40, 56, 72],
(2, 2): [54, 72]}
which is the same as
{(0, 0): [1 * 2, 1 * 4],
(0, 1): [2 * 1, 2 * 3, 2 * 5],
(0, 2): [3 * 2, 3 * 6],
(1, 0): [4 * 1, 4 * 5, 4 * 7],
(1, 1): [5 * 2, 5 * 4, 5 * 6, 5 * 8],
(1, 2): [6 * 3, 6 * 5, 6 * 9],
(2, 0): [7 * 4, 7 * 8],
(2, 1): [8 * 5, 8 * 7, 8 * 9],
(2, 2): [9 * 6, 9 * 8]}
This is basically what was asked in the question; the target output was
[[1*2, 1*4], [2*1, 2*3, 2*5], [3*2, 3*6]],...]
Here I used a dictionary with (row, column) as the key because that way you can more easily find the output for each cell.
Baseline performance
For the largest input of 30 x 30, and largest distance (I=4), the calculation takes about 0.188 seconds on my laptop:
>>> %timeit n.process(arr_large, max_distance=4)
188 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Code for NeighbourProcessor
import math
import numpy as np
class NeighbourProcessor:
def __init__(self):
self.arr = None
def process(self, arr, max_distance=1):
self.arr = arr
output = dict()
rows, columns = self.arr.shape
for current_row in range(rows):
for current_col in range(columns):
cell_result = self.process_cell(current_row, current_col, max_distance)
output[(current_row, current_col)] = cell_result
return output
def row_col_is_within_array(self, row, col):
if row < 0 or col < 0:
return False
if row > self.arr.shape[0] - 1 or col > self.arr.shape[1] - 1:
return False
return True
def distance(self, row, col, current_row, current_col):
distance_squared = (current_row - row) ** 2 + (current_col - col) ** 2
return np.sqrt(distance_squared)
def are_neighbours(self, row, col, current_row, current_col, max_distance):
if row == current_row and col == current_col:
return False
if not self.row_col_is_within_array(row, col):
return False
return self.distance(row, col, current_row, current_col) <= max_distance
def neighbours(self, current_row, current_col, max_distance):
start_row = math.floor(current_row - max_distance)
start_col = math.floor(current_col - max_distance)
end_row = math.ceil(current_row + max_distance)
end_col = math.ceil(current_col + max_distance)
for row in range(start_row, end_row + 1):
for col in range(start_col, end_col + 1):
if self.are_neighbours(
row, col, current_row, current_col, max_distance
):
yield row, col
def process_cell(self, current_row, current_col, max_distance):
cell_output = []
current_cell_value = self.arr[current_row][current_col]
for row, col in self.neighbours(current_row, current_col, max_distance):
neighbour_cell_value = self.arr[row][col]
cell_output.append(current_cell_value * neighbour_cell_value)
return cell_output
Short explanation
NeighbourProcessor.process goes through the rows and columns of the input array, starting from (0,0), the top-left corner, and processing from left to right, top to bottom, until the bottom-right corner (n_rows, n_columns), each time marking the cell as the current cell (current_row, current_column).
Each current cell is processed in process_cell, which uses the neighbours() generator to iterate over all the neighbours within a maximum distance of I from the current cell. You can check how the logic goes in are_neighbours.
Faster solution: Using numba and memory pre-allocation
Now I will make a functions-only version with numba and try to make the processing as fast as possible. It is also possible to use classes in numba, but they are still a bit more experimental and complex, and this problem can be solved with functions only. The readability of the code suffers a bit, but that's the price we sometimes pay for speed optimization.
I'll start with the process function. Now it will have to create a three-dimensional array instead of a dict. The reason we want to create the array ahead of time is that memory allocation is a costly process, and we want to do it exactly once. So, instead of having this as output for myarr:
# output[(row,column)]
#
output[(0,0)] # [2,4]
output[(0,1)] # [2, 6, 10]
#..etc
I want constant-sized output:
# output[row][column]
#
output[0][0] # [2, 4, nan, nan]
output[0][1] # [2, 6, 10, nan]
#..etc
Notice that after all the "pairs", the output is padded with np.nan (not a number). Any postprocessing script then simply has to ignore the extra nans.
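For example, assuming output is the pre-allocated 3-D result array described above, the products for one cell can be recovered by dropping the nan padding:
vals = output[0][1]           # e.g. array([ 2.,  6., 10., nan])
vals = vals[~np.isnan(vals)]  # -> array([ 2.,  6., 10.])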
Solving for the required size for the pre-allocated array
How do I know the size of the third dimension, i.e. the number of neighbours for a given max distance I? Well, I don't. It seems this is quite a complicated problem (see, for example, the Gauss circle problem on Wikipedia). Nevertheless, I can quite easily calculate an upper bound for the number of neighbours. In the following I assume that a cell is a neighbour if and only if the distance between the middle points of the cells is less than or equal to I. If you create sketches with pen and paper, you will notice that as you increase the distance, the maximum number of neighbours grows as:
I = 1 -> max_number_neighbours = 4
I = 2 -> max_number_neighbours = 12
I = 3 -> max_number_neighbours = 28
Here is an example sketch with a 10 x 10 2d-array and distance I=3: when the current cell is (4,5), the number of neighbours must be less than or equal to 28.
This pattern is represented as a function of max distance (I): (2*I-1)**2 + 4 -1, or
n_third_dimension = max_number_neighbours = (2*I-1)**2 + 3
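The bound can be checked numerically with a small brute-force count (a throwaway sketch; count_neighbours is just a helper name I made up, not part of the solution):
import math

def count_neighbours(max_distance, size=50):
    # count the cells within max_distance of the center of a size x size grid
    center = size // 2
    count = 0
    for row in range(size):
        for col in range(size):
            if (row, col) == (center, center):
                continue
            if math.hypot(row - center, col - center) <= max_distance:
                count += 1
    return count

for I in (1, 2, 3, 4):
    print(I, count_neighbours(I), (2 * I - 1) ** 2 + 3)  # actual count vs. upper bound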
Refactoring the code to work with numba
We start with creating the function signature of the entry point. In this case, we create a function process with the function signature:
@numba.jit("f4[:,:,:](f4[:,:], f4)")
def process(arr, max_distance):
...
See the docs for the other available types. The f4[:,:] just means that the input is 2d-array of float32 and f4[:,:,:](....) means that the function output is 3d-array of float32. Next, we create the output with the formula we invented above. Here is one part of the magic: memory pre-allocation with np.empty:
n_third_dimension = (2 * math.ceil(max_distance) - 1) ** 2 + 3
output = np.empty((*arr.shape, n_third_dimension), dtype=np.float32)
cell_result = np.empty(n_third_dimension, dtype=np.float32)
Numbafied code
I will not walk through the rest of the code hand-in-hand, but you can see below that it is a slightly modified version of the pure Python for-loop baseline.
import math
import numba
import numpy as np
#numba.njit("f4(i4,i4,i4,i4)")
def distance(row, col, current_row, current_col):
distance_squared = (current_row - row) ** 2 + (current_col - col) ** 2
return np.sqrt(distance_squared)
#numba.njit("boolean(i4,i4, i4,i4)")
def row_col_is_within_array(
row,
col,
arr_rows,
arr_cols,
):
if row < 0 or col < 0:
return False
if row > arr_rows - 1 or col > arr_cols - 1:
return False
return True
#numba.njit("boolean(i4,i4,i4,i4,f4,i4,i4)")
def are_neighbours(
neighbour_row,
neighbour_col,
current_row,
current_col,
max_distance,
arr_rows,
arr_cols,
):
if neighbour_row == current_row and neighbour_col == current_col:
return False
if not row_col_is_within_array(
neighbour_row,
neighbour_col,
arr_rows,
arr_cols,
):
return False
return (
distance(neighbour_row, neighbour_col, current_row, current_col) <= max_distance
)
#numba.njit("f4[:](f4[:,:], f4[:], i4,i4,i4,f4)")
def process_cell(
arr, cell_result, current_row, current_col, n_third_dimension, max_distance
):
for i in range(n_third_dimension):
cell_result[i] = np.nan
current_cell_value = arr[current_row][current_col]
# Potential cell neighbour area
start_row = math.floor(current_row - max_distance)
start_col = math.floor(current_col - max_distance)
end_row = math.ceil(current_row + max_distance)
end_col = math.ceil(current_col + max_distance)
arr_rows, arr_cols = arr.shape
cell_pointer = 0
for neighbour_row in range(start_row, end_row + 1):
for neighbour_col in range(start_col, end_col + 1):
if are_neighbours(
neighbour_row,
neighbour_col,
current_row,
current_col,
max_distance,
arr_rows,
arr_cols,
):
neighbour_cell_value = arr[neighbour_row][neighbour_col]
cell_result[cell_pointer] = current_cell_value * neighbour_cell_value
cell_pointer += 1
return cell_result
#numba.njit("f4[:,:,:](f4[:,:], f4)")
def process(arr, max_distance):
n_third_dimension = (2 * math.ceil(max_distance) - 1) ** 2 + 3
output = np.empty((*arr.shape, n_third_dimension), dtype=np.float32)
cell_result = np.empty(n_third_dimension, dtype=np.float32)
rows, columns = arr.shape
for current_row in range(rows):
for current_col in range(columns):
cell_result = process_cell(
arr,
cell_result,
current_row,
current_col,
n_third_dimension,
max_distance,
)
output[current_row][current_col][:] = cell_result
return output
Example output
>>> output = process(myarr, max_distance=1.0)
>>> output
array([[[ 2., 4., nan, nan],
[ 2., 6., 10., nan],
[ 6., 18., nan, nan]],
[[ 4., 20., 28., nan],
[10., 20., 30., 40.],
[18., 30., 54., nan]],
       [[28., 56., nan, nan],
        [40., 56., 72., nan],
        [54., 72., nan, nan]]], dtype=float32)
>>> output[0]
array([[ 2., 4., nan, nan],
[ 2., 6., 10., nan],
[ 6., 18., nan, nan]], dtype=float32)
>>> output[0][1]
array([ 2., 6., 10., nan], dtype=float32)
# Above is the same as target: [2 * 1, 2 * 3, 2 * 5]
Speed of the numbafied code and closing words
The baseline approach's execution time was 188 ms. Now it is 271 µs, which is only about 0.00144 times what the original code took (a ~99.86% reduction in execution time; some would say ~693x faster).
>>> %timeit process(arr_large, max_distance=4.0)
271 µs ± 2.88 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Note that you might want to calculate the distance differently, or add weighting, or some more complex logic, aggregation functions, etc. This could still be further optimized a bit, for example by creating a better estimate for the maximum number of neighbours. Have fun with numba, and I hope you learned something! :)
Bonus tip: there is also ahead-of-time compilation in numba, which you can use to make even the first function call fast!
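For reference, a minimal sketch of numba's ahead-of-time compilation via numba.pycc (available in the numba versions this answer targets; the module name neighbours_aot and the exported function are illustrative, and the other jitted functions above would be exported the same way):
from numba.pycc import CC
import numpy as np

cc = CC('neighbours_aot')

@cc.export('distance', 'f4(i4, i4, i4, i4)')
def distance(row, col, current_row, current_col):
    return np.sqrt((current_row - row) ** 2 + (current_col - col) ** 2)

if __name__ == '__main__':
    cc.compile()  # writes an importable extension module, so there is no JIT warm-up at run time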
I am trying to use fancy indexing to modify a large sparse matrix. Suppose you have the following code:
import numpy as np
import scipy.sparse as sp
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
b = sp.lil_matrix(a)
c = sp.lil_matrix((3,4))
c[[1,2], 0] = b[[1,2], 0]
However, this code gives the following error:
ValueError: shape mismatch in assignment
I don't understand why this doesn't work. Both matrices have the same shape and this usually works if both matrices are numpy arrays. I would appreciate any help.
Yeah this is a bug with the sparse __setitem__. I've run into it before (but I just worked around it). Now I actually looked into it; first, you can fix this pretty easily:
import numpy as np
import scipy.sparse as sp
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
b = sp.lil_matrix(a)
c = sp.lil_matrix((3,4))
c[[1,2], 0] = b[[1,2], 0]
This raises the ValueError you saw. This doesn't and works as expected:
c[[1,2], 0] = b[[1,2], [0]]
>>> c.A
array([[0., 0., 0., 0.],
[5., 0., 0., 0.],
[9., 0., 0., 0.]])
Let's just walk through the offending __setitem__ (I'm going to omit a lot of code that doesn't get called):
row, col = self._validate_indices(key)
This is fine - row = [1, 2] and col = 0
col = np.atleast_1d(col)
i, j = _broadcast_arrays(row, col)
So far so good - i = [1, 2] and j = [0, 0]
if i.ndim == 1:
# Inner indexing, so treat them like row vectors.
i = i[None]
j = j[None]
broadcast_row = x.shape[0] == 1 and i.shape[0] != 1
broadcast_col = x.shape[1] == 1 and i.shape[1] != 1
Here's our problem - i and j both got turned into row vectors with shape (1, 2). x here is what you're trying to assign (b[[1,2], 0]), which is of shape (2, 1); the next step raises a ValueError because x and the indices don't align.
>>> c[[1,2], 0] = b[[1,2], 0].A
ValueError: cannot reshape array of size 4 into shape (2,)
Here's the same problem but __setitem__ broadcasts x into a (2,2) array, which then fails again because it's larger than the array you're assigning it to.
The workaround (b[[1,2], [0]]) has a shape of (1, 2) which is not correct, but that error ends up cancelling out the error in indexing c.
I'm not sure exactly what the logic is behind this indexing code so I'm not sure how to fix this without introducing other subtle bugs.
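If you need something that works without relying on that shape quirk, a row-by-row assignment sidesteps the fancy-index assignment path entirely (a simple sketch; slower if there are many rows):
for r in [1, 2]:
    c[r, 0] = b[r, 0]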
I want to create an array of a given shape based on another numpy array. The number of dimensions will match, but the sizes will differ from axis to axis. If the original size is too small, I want to pad it with zeros to fulfill the requirements. Example of expected behaviour to clarify:
embedding = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8]
])
resize_with_outer_zeros(embedding, (4, 3)) = np.array([
[1, 2, 3],
[5, 6, 7],
[0, 0, 0],
[0, 0, 0]
])
I think I achieved the desired behaviour with the function below.
from typing import Tuple

import numpy as np

def resize_with_outer_zeros(embedding: np.ndarray, target_shape: Tuple[int, ...]) -> np.ndarray:
    padding = tuple((0, max(0, target_size - size)) for target_size, size in zip(target_shape, embedding.shape))
    target_slice = tuple(slice(0, target_size) for target_size in target_shape)
    return np.pad(embedding, padding)[target_slice]
However, I have strong doubts about its efficiency and elegance, as it involves a lot of pure python tuple operations. Is there a better and more concise way to do it?
If you know that your array won't be bigger than some size (r, c), why not just:
def pad_with_zeros(A, r, c):
out = np.zeros((r, c))
r_, c_ = np.shape(A)
out[0:r_, 0:c_] = A
return out
If you want to support arbitrary dimensions (tensors) it gets a little uglier, but the principle remains the same:
def pad(A, shape):
out = np.zeros(shape)
out[tuple(slice(0, d) for d in np.shape(A))] = A
return out
And to support larger arrays (larger than what you would pad):
def pad(A, shape):
shape = np.max([np.shape(A), shape], axis=0)
out = np.zeros(shape)
out[tuple(slice(0, d) for d in np.shape(A))] = A
return out
I don't think you can do much better, but instead of using pad and then slicing, just do zeros at the right size and then an assignment - this cuts it to one list comprehension instead of two.
embedding = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8]
])
z = np.zeros((4,3))
s = tuple([slice(None, min(za,ea)) for za,ea in zip(z.shape, embedding.shape)])
z[s] = embedding[s]
z
# array([[1., 2., 3.],
# [5., 6., 7.],
# [0., 0., 0.],
# [0., 0., 0.]])
I'd just use a zero matrix and run a nested for-loop to set the values from the original array - the remaining places will automatically stay zero.
import numpy as np
def resize_array(array, new_size):
Z = np.zeros(new_size)
for i in range(len(Z)):
for j in range(len(Z[i])):
try:
Z[i][j] = array[i][j]
except IndexError: # just in case array[i][j] doesn't exist in the new size and should be truncated
pass
return Z
embedding = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(resize_array(embedding, (4, 3)))
I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them, such that corresponding elements continue to correspond -- i.e. shuffle them in unison with respect to their leading indices.
This code works, and illustrates my goals:
def shuffle_in_unison(a, b):
assert len(a) == len(b)
shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
permutation = numpy.random.permutation(len(a))
for old_index, new_index in enumerate(permutation):
shuffled_a[new_index] = a[old_index]
shuffled_b[new_index] = b[old_index]
return shuffled_a, shuffled_b
For example:
>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
[1, 1],
[3, 3]]), array([2, 1, 3]))
However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays -- I'd rather shuffle them in-place, since they'll be quite large.
Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.
One other thought I had was this:
def shuffle_in_unison_scary(a, b):
rng_state = numpy.random.get_state()
numpy.random.shuffle(a)
numpy.random.set_state(rng_state)
numpy.random.shuffle(b)
This works... but it's a little scary, as I see little guarantee it'll continue to work -- it doesn't look like the sort of thing that's guaranteed to survive across numpy versions, for example.
You can use NumPy's array indexing:
def unison_shuffled_copies(a, b):
assert len(a) == len(b)
p = numpy.random.permutation(len(a))
return a[p], b[p]
This will result in the creation of separate, unison-shuffled arrays.
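Usage sketch with the arrays from the question (rows of the outputs still correspond):
a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
b = numpy.asarray([1, 2, 3])
a2, b2 = unison_shuffled_copies(a, b)
# each row a2[k] is still [b2[k], b2[k]]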
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y, random_state=0)
To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html
Your "scary" solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.
If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.
Example: Let's assume the arrays a and b look like this:
a = numpy.array([[[ 0., 1., 2.],
[ 3., 4., 5.]],
[[ 6., 7., 8.],
[ 9., 10., 11.]],
[[ 12., 13., 14.],
[ 15., 16., 17.]]])
b = numpy.array([[ 0., 1.],
[ 2., 3.],
[ 4., 5.]])
We can now construct a single array containing all the data:
c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[ 0., 1., 2., 3., 4., 5., 0., 1.],
# [ 6., 7., 8., 9., 10., 11., 2., 3.],
# [ 12., 13., 14., 15., 16., 17., 4., 5.]])
Now we create views simulating the original a and b:
a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)
The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).
In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.
This solution could be adapted to the case that a and b have different dtypes.
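A minimal sketch of that mixed-dtype adaptation, using a structured array (the field names 'a' and 'b' are illustrative):
a = numpy.arange(18, dtype=numpy.float64).reshape(3, 2, 3)
b = numpy.arange(6, dtype=numpy.int32).reshape(3, 2)
c = numpy.empty(len(a), dtype=[('a', a.dtype, a.shape[1:]), ('b', b.dtype, b.shape[1:])])
c['a'] = a
c['b'] = b
numpy.random.shuffle(c)     # shuffles the records, keeping the fields together
a2, b2 = c['a'], c['b']     # views into c, like a2 and b2 above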
Very simple solution:
randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]
The two arrays x, y are now both randomly shuffled in the same way.
In 2015, James wrote an sklearn solution which is helpful. But he added a random state variable, which is not needed. In the code below, the random state from numpy is automatically assumed.
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y)
from numpy.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data #numpy array
y = iris.target #numpy array
# Data is currently unshuffled; we should shuffle
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]
Shuffle any number of arrays together, in-place, using only NumPy.
import numpy as np
def shuffle_arrays(arrays, set_seed=-1):
"""Shuffles arrays in-place, in the same order, along axis=0
Parameters:
-----------
arrays : List of NumPy arrays.
set_seed : Seed value if int >= 0, else seed is random.
"""
assert all(len(arr) == len(arrays[0]) for arr in arrays)
seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed
for arr in arrays:
rstate = np.random.RandomState(seed)
rstate.shuffle(arr)
And it can be used like this:
a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])
shuffle_arrays([a, b, c])
A few things to note:
The assert ensures that all input arrays have the same length along
their first dimension.
Arrays shuffled in-place by their first dimension - nothing returned.
Random seed within positive int32 range.
If a repeatable shuffle is needed, the seed value can be set (see the one-liner after this list).
After the shuffle, the data can be split using np.split or referenced using slices - depending on the application.
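For example, a repeatable run of the example above (sketch):
shuffle_arrays([a, b, c], set_seed=123)  # the same seed gives the same permutation every time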
You can make an array like:
s = np.arange(0, len(a), 1)
then shuffle it:
np.random.shuffle(s)
Now use this s as the index into your arrays. The same shuffled indices return correspondingly shuffled arrays.
x_data = x_data[s]
x_label = x_label[s]
There is a well-known function that can handle this:
from sklearn.model_selection import train_test_split
X, _, Y, _ = train_test_split(X,Y, test_size=0.0)
Just setting test_size to 0 will avoid splitting and give you shuffled data.
Though it is usually used to split train and test data, it does shuffle them too.
From documentation
Split arrays or matrices into random train and test subsets
Quick utility that wraps input validation and
next(ShuffleSplit().split(X, y)) and application to input data into a
single call for splitting (and optionally subsampling) data in a
oneliner.
This seems like a very simple solution:
import numpy as np
def shuffle_in_unison(a,b):
assert len(a)==len(b)
c = np.arange(len(a))
np.random.shuffle(c)
return a[c],b[c]
a = np.asarray([[1, 1], [2, 2], [3, 3]])
b = np.asarray([11, 22, 33])
shuffle_in_unison(a,b)
Out[94]:
(array([[3, 3],
[2, 2],
[1, 1]]),
array([33, 22, 11]))
One way in which in-place shuffling can be done for connected lists is using a seed (it could be random) and using numpy.random.shuffle to do the shuffling.
# Set seed to a random number if you want the shuffling to be non-deterministic.
def shuffle(a, b, seed):
np.random.seed(seed)
np.random.shuffle(a)
np.random.seed(seed)
np.random.shuffle(b)
That's it. This will shuffle both a and b in the exact same way. This is also done in-place which is always a plus.
EDIT: don't use np.random.seed(); use np.random.RandomState instead.
def shuffle(a, b, seed):
rand_state = np.random.RandomState(seed)
rand_state.shuffle(a)
rand_state.seed(seed)
rand_state.shuffle(b)
When calling it just pass in any seed to feed the random state:
a = [1,2,3,4]
b = [11, 22, 33, 44]
shuffle(a, b, 12345)
Output:
>>> a
[1, 4, 2, 3]
>>> b
[11, 44, 22, 33]
Edit: Fixed code to re-seed the random state
Say we have two arrays: a and b.
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,1,1],[6,6,6],[4,2,0]])
We can first obtain row indices by permuting the first dimension:
indices = np.random.permutation(a.shape[0])
[1 2 0]
Then use advanced indexing.
Here we are using the same indices to shuffle both arrays in unison.
a_shuffled = a[indices[:,np.newaxis], np.arange(a.shape[1])]
b_shuffled = b[indices[:,np.newaxis], np.arange(b.shape[1])]
This is equivalent to
np.take(a, indices, axis=0)
[[4 5 6]
[7 8 9]
[1 2 3]]
np.take(b, indices, axis=0)
[[6 6 6]
[4 2 0]
[9 1 1]]
If you want to avoid copying arrays, then I would suggest that instead of generating a permutation list, you go through every element in the array and randomly swap it with another position in the array:
for old_index in range(len(a)):
    new_index = numpy.random.randint(old_index + 1)
    a[old_index], a[new_index] = a[new_index], a[old_index]
    b[old_index], b[new_index] = b[new_index], b[old_index]
This implements the Knuth-Fisher-Yates shuffle algorithm.
Shortest and easiest way in my opinion, use seed:
import random

seed = 42  # any fixed value
random.seed(seed)
random.shuffle(x_data)
# reset the same seed to get the identical random sequence and shuffle the y
random.seed(seed)
random.shuffle(y_data)
Most solutions above work; however, if you have column vectors you have to transpose them first. Here is an example:
def shuffle(self) -> None:
"""
Shuffles X and Y
"""
x = self.X.T
y = self.Y.T
p = np.random.permutation(len(x))
self.X = x[p].T
self.Y = y[p].T
With an example, this is what I'm doing:
from random import shuffle
import numpy as np

combo = []
for i in range(60000):
combo.append((images[i], labels[i]))
shuffle(combo)
im = []
lab = []
for c in combo:
im.append(c[0])
lab.append(c[1])
images = np.asarray(im)
labels = np.asarray(lab)
I extended python's random.shuffle() to take a second arg:
import random

def shuffle_together(x, y):
    assert len(x) == len(y)
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random.random() * (i + 1))
        x[i], x[j] = x[j], x[i]
        y[i], y[j] = y[j], y[i]
That way I can be sure that the shuffling happens in-place, and the function is not all too long or complicated.
Just use numpy.
First merge the two input arrays (the 1D array is the labels y and the 2D array is the data x) and shuffle them with NumPy's shuffle method. Finally, split them and return.
import numpy as np
def shuffle_2d(a, b):
rows= a.shape[0]
if b.shape != (rows,1):
b = b.reshape((rows,1))
S = np.hstack((b,a))
np.random.shuffle(S)
b, a = S[:,0], S[:,1:]
return a,b
features, samples = 2, 5
x, y = np.random.random((samples, features)), np.arange(samples)
x, y = shuffle_2d(x, y)
Suppose I have an array b of shape (3, 10, 3) and another array v = [8, 9, 4] of shape (3,); see below. For each of the 3 arrays of shape (10, 3) in b, I need to sum a number of rows as determined by v, i.e. for i = 0, 1, 2 I need to get np.sum(b[i, 0:v[i]], axis=0). My solution (shown below) uses a for loop, which I guess is inefficient. I wonder if there is an efficient (vectorized) way to do what I have described above.
NB: my actual arrays have more dimensions; these arrays are for illustration.
v = np.array([8,9,4])
b = np.array([[[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.]],
[[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.]],
[[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.]]])
n = v.shape[0]
vv = np.zeros([n, b.shape[2]])
for i in range(n):
    vv[i] = np.sum(b[i, 0:v[i]], axis=0)
Output:
vv
array([[3., 1., 4.],
[4., 2., 3.],
[3., 0., 1.]])
Edit:
Below is a more realistic example of the arrays v and b.
v= np.random.randint(0,300, size=(32, 98,3))
b = np.zeros([98, 3, 300, 3])
for i in range(3):
for j in range(98):
b[j,i] = np.random.multinomial(1,[1./3, 1./3, 1./3], 300)
v.shape
Out[292]: (32, 98, 3)
b.shape
Out[293]: (98, 3, 300, 3)
I need to do the same thing as before, so the final result is an array of shape (32, 98, 3, 3). Note that I have to do the above at each iteration, which is why I'm looking for an efficient implementation.
This is a performance comparison of the different methods presented in the answers:
sliced_reduce
sliced_sum
sliced_sum_numba
reduce_cumulative (original idea here)
baseline - The "classic" Python for loop (see below).
Notes on performance
sliced_reduce reverses the order of index pairs from ascending to descending to turn the computation of superfluous elements into no-ops; this way, however, the array is not traversed in memory layout order, which seems to slow the method down by ~30%.
reduce_cumulative performs a number of unnecessary add operations which depends on the distribution of start and stop indices. For the OP example where start indices are all zero and stop indices are uniformly distributed this will be in average twice as many operations as strictly necessary. For other distributions (e.g. non-zero start indices) this fraction might very well change and hence degrade the performance as compared to other methods. Please check your own case.
[Disclaimer] As with all performance estimations, these are rough guidelines to give a broad overview but they don't save you from running the performance tests yourself for your specific use case on your specific machine to be absolutely certain to select the best option.
Using the example dimensions from the OP:
In [15]: np.random.seed(0)
In [16]: b = np.random.randint(0, 1000, size=(98, 3, 300, 3))
In [17]: v = np.random.randint(-299, 300, size=(32, 98, 3))
In [18]: %timeit sliced_reduce(b, np.zeros_like(v), v, np.add, axis=2)
11.3 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [19]: %timeit sliced_sum(b, np.zeros_like(v), v, axis=2)
54.9 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [20]: %timeit sliced_sum_numba(b, np.zeros_like(v), v, 2)
16.3 ms ± 609 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [21]: %timeit reduce_cumulative(b, np.zeros_like(v), v, np.add, axis=2)
2.05 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [22]: %timeit baseline(b, np.zeros_like(v), v, axis=2)
79 ms ± 625 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Baseline implementation:
def baseline(a, i, j, axis=None):
if axis is None:
axis = len(i.shape)
i = (a.shape[axis] + i) % a.shape[axis]
j = (a.shape[axis] + j) % a.shape[axis]
m = len(i.shape) - axis
result = np.empty(i.shape + a.shape[axis+1:], dtype=a.dtype)
for k in np.ndindex(i.shape):
result[k] = np.sum(a[k[m:] + (slice(i[k], j[k]),)], axis=0)
return result
Performance plots
Besides the timings for the OP's specific example case it is instructive to check how the algorithms scale with the size of the data and index arrays. Here we can separate the shapes into three different components:
Leading dimensions of the index array (those that are not present in the data array). In the OP example this is (32,).
Common dimensions of the index and the data array (the dimensions after the leading ones up to the reduced axis). In the OP example this is (98, 3).
The size of the axis to be reduced. In the OP example this is 300.
(The trailing dimensions of the data array are handled similarly by all algorithms and hence no particular scaling is to be expected.)
Hence we can create performance plots for three different cases: Varying the size of the leading dimension(s), the common dimensions and the size of the axis to be reduced. Boundaries are chosen from 1 to N where N is the largest power of 2 such that no involved array has more than 5,000,000 elements (input, index, output; intermediary arrays might be larger (such as for sliced_reduce)).
For the code see below.
Performance plots (images not reproduced here) cover the three cases: leading dimensions, common dimensions, and the reduced dimension.
Code
from string import ascii_lowercase as symbols
import numba
import numpy as np
import perfplot
np.random.seed(0)
def sliced_reduce(a, i, j, ufunc=np.add, axis=2):
indices = np.tile(
np.repeat(
np.arange(np.prod(a.shape[:axis])) * a.shape[axis],
2
),
np.prod(i.shape[:len(i.shape) - axis])
)
indices[::2] += (a.shape[axis] + i.ravel()) % a.shape[axis]
indices[1::2] += (a.shape[axis] + j.ravel()) % a.shape[axis]
indices = indices.reshape(-1, 2)[::-1].ravel() # This seems to be counter-effective, please check for your own case.
result = ufunc.reduceat(a.reshape(-1, *a.shape[axis+1:]), indices)[::2] # Select only even to odd.
result[indices[::2] == indices[1::2]] = ufunc.reduce([])
return result[::-1].reshape(*(i.shape + a.shape[axis+1:]))
def sliced_sum(a, i, j, axis=2):
l = len(i.shape) - axis
m = len(i.shape) - l
n = len(a.shape) - axis - 1
leading = symbols[:l]
common = symbols[l:l+m]
summation = symbols[l+m]
trailing = symbols[l+m+1:l+m+1+n]
i = (a.shape[axis] + i) % a.shape[axis]
j = (a.shape[axis] + j) % a.shape[axis]
indices, i, j = np.broadcast_arrays(np.arange(a.shape[axis]),
np.expand_dims(i, -1), np.expand_dims(j, -1))
active_elements = (i <= indices) & (indices < j)
return np.einsum(f'{leading + common + summation},{common + summation + trailing}->{leading + common + trailing}',
active_elements, a)
def sliced_sum_numba(a, i, j, axis=2):
i = (a.shape[axis] + i) % a.shape[axis]
j = (a.shape[axis] + j) % a.shape[axis]
m = np.prod(i.shape[:len(i.shape) - axis], dtype=int)
n = np.prod(i.shape[len(i.shape) - axis:], dtype=int)
a_flat = a.reshape(-1, *a.shape[axis:])
i_flat = i.ravel()
j_flat = j.ravel()
result = np.empty((m*n,) + a.shape[axis+1:], dtype=a.dtype)
numba_sum(a_flat, i_flat, j_flat, m, n, result)
return result.reshape(*(i.shape + a.shape[axis+1:]))
@numba.jit(parallel=True, nopython=True)
def numba_sum(a, i, j, m, n, out):
for index in numba.prange(m*n):
out[index] = np.sum(a[index % n, i[index]:j[index]], axis=0)
def reduce_cumulative(a, i, j, ufunc=np.add, axis=2):
i = (a.shape[axis] + i) % a.shape[axis]
j = (a.shape[axis] + j) % a.shape[axis]
a = np.insert(a, 0, 0, axis)
c = ufunc.accumulate(a, axis=axis)
pre = np.ix_(*(range(x) for x in i.shape))
l = len(i.shape) - axis
return c[pre[l:] + (j,)] - c[pre[l:] + (i,)]
def baseline(a, i, j, axis=2):
i = (a.shape[axis] + i) % a.shape[axis]
j = (a.shape[axis] + j) % a.shape[axis]
m = len(i.shape) - axis
result = np.empty(i.shape + a.shape[axis+1:], dtype=a.dtype)
for k in np.ndindex(i.shape):
result[k] = np.sum(a[k[m:] + (slice(i[k], j[k]),)], axis=0)
return result
a = np.random.randint(0, 1000, size=(98, 3, 300, 3))
j = np.random.randint(-299, 300, size=(32, 98, 3))
i = np.zeros_like(j)
check = [f(a, i, j) for f in [sliced_reduce, sliced_sum, sliced_sum_numba, reduce_cumulative, baseline]]
assert all(np.array_equal(check[0], x) for x in check[1:])
perfplot.show(
# Leading dimensions:
# setup = lambda n: (np.random.randint(0, 1000, size=(98, 3, 300, 3)),
# np.zeros((n, 98, 3), dtype=int),
# np.random.randint(-299, 300, size=(n, 98, 3))),
# Common dimensions:
# setup = lambda n: (np.random.randint(0, 1000, size=(n, 3, 300, 3)),
# np.zeros((32, n, 3), dtype=int),
# np.random.randint(-299, 300, size=(32, n, 3))),
# Reduced dimension:
setup = lambda n: (np.random.randint(0, 1000, size=(98, 3, n, 3)),
np.zeros((32, 98, 3), dtype=int),
np.random.randint(-n+1, n, size=(32, 98, 3))),
kernels=[
lambda a: sliced_reduce(*a),
lambda a: sliced_sum(*a),
lambda a: sliced_sum_numba(*a),
lambda a: reduce_cumulative(*a),
lambda a: baseline(*a),
],
labels=['sliced_reduce', 'sliced_sum', 'sliced_sum_numba', 'reduce_cumulative', 'baseline'],
# n_range=[2 ** k for k in range(13)], # Leading dimensions.
# n_range=[2 ** k for k in range(11)], # Common dimensions.
n_range=[2 ** k for k in range(2, 13)], # Reduced dimension.
# xlabel='Size of leading dimension',
# xlabel='Size of first common dimension (second is 3)',
xlabel='Size of reduced dimension',
)
The following function allows for summing a given axis with varying slices indicated by start and stop arrays. It uses np.einsum under the hood, together with an appropriately computed coefficient array that indicates which elements in the input array should participate in the sum (using coefficients 1 and 0). Relying on einsum makes the implementation compatible with other packages such as PyTorch or TensorFlow (with minor changes). It doubles the number of necessary computations, since each add operation comes with an additional multiply operation against the coefficient array.
from string import ascii_lowercase as symbols
import numpy as np
def sliced_sum(a, i, j, axis=None):
"""Sum an array along a given axis for varying slices `a[..., i:j, ...]` where `i` and `j` are arrays themselves.
Parameters
----------
a : array
The array to be summed over.
i : array
The start indices for the summation axis. Must have the same shape as `j`.
j : array
The stop indices for the summation axis. Must have the same shape as `i`.
axis : int, optional
Axis to be summed over. Defaults to `len(i.shape)`.
Returns
-------
array
Shape `i.shape + a.shape[axis+1:]`.
Notes
-----
The shapes of `a` and `i`, `j` must match up to the summation axis.
    That means `a.shape[:axis] == i.shape[len(i.shape) - axis:]`.
`i` and `j` can have additional leading dimensions and `a` can have additional trailing dimensions.
"""
if axis is None:
axis = len(i.shape)
# Compute number of leading, common and trailing dimensions.
l = len(i.shape) - axis # Number of leading dimensions.
m = len(i.shape) - l # Number of common dimensions.
n = len(a.shape) - axis - 1 # Number of trailing dimensions.
# Select the corresponding symbols for `np.einsum`.
leading = symbols[:l]
common = symbols[l:l+m]
summation = symbols[l+m]
trailing = symbols[l+m+1:l+m+1+n]
# Convert negative indices.
i = (a.shape[axis] + i) % a.shape[axis]
j = (a.shape[axis] + j) % a.shape[axis]
# Compute the "active" elements, i.e. the ones that should participate in the summation.
# "active" elements have a coefficient of 1 (True), others are 0 (False).
indices, i, j = np.broadcast_arrays(np.arange(a.shape[axis]),
np.expand_dims(i, -1), np.expand_dims(j, -1))
active_elements = (i <= indices) & (indices < j)
return np.einsum(f'{leading + common + summation},{common + summation + trailing}->{leading + common + trailing}',
active_elements, a)
For the examples in the OP it can be used in the following way:
# 1. example:
b = np.random.randint(0, 1000, size=(3, 10, 3))
v = np.random.randint(-9, 10, size=3) # Indexing into `b.shape[1]`.
result = sliced_sum(b, np.zeros_like(v), v)
# 2. example:
b = np.random.randint(0, 1000, size=(98, 3, 300, 3))
v = np.random.randint(-299, 300, size=(32, 98, 3)) # Indexing into `b.shape[2]`; one additional leading dimension for `v`.
result = sliced_sum(b, np.zeros_like(v), v, axis=2)
Another option is to use Numba to speed up the loop. This avoids unnecessary computations and memory allocation and is fully compatible with all numpy functions (i.e. also prod etc. work similarly).
import numba
import numpy as np
def sliced_sum_numba(a, i, j, axis=None):
"""Sum an array along a given axis for varying slices `a[..., i:j, ...]` where `i` and `j` are arrays themselves.
Parameters
----------
a : array
The array to be summed over.
i : array
The start indices for the summation axis. Must have the same shape as `j`.
j : array
The stop indices for the summation axis. Must have the same shape as `i`.
axis : int, optional
Axis to be summed over. Defaults to `len(i.shape)`.
Returns
-------
array
Shape `i.shape + a.shape[axis+1:]`.
Notes
-----
The shapes of `a` and `i`, `j` must match up to the summation axis.
    That means `a.shape[:axis] == i.shape[len(i.shape) - axis:]`.
`i` and `j` can have additional leading dimensions and `a` can have additional trailing dimensions.
"""
if axis is None:
axis = len(i.shape)
# Convert negative indices.
i = (a.shape[axis] + i) % a.shape[axis]
j = (a.shape[axis] + j) % a.shape[axis]
# Operate on a flattened version of the array (dimensions up to `axis` are flattened).
m = np.prod(i.shape[:len(i.shape) - axis], dtype=int) # Elements in leading dimensions.
n = np.prod(i.shape[len(i.shape) - axis:], dtype=int) # Elements in common dimensions.
a_flat = a.reshape(-1, *a.shape[axis:])
i_flat = i.ravel()
j_flat = j.ravel()
result = np.empty((m*n,) + a.shape[axis+1:], dtype=a.dtype)
numba_sum(a_flat, i_flat, j_flat, m, n, result)
return result.reshape(*(i.shape + a.shape[axis+1:]))
@numba.jit(parallel=True, nopython=True)
def numba_sum(a, i, j, m, n, out):
for index in numba.prange(m*n):
out[index] = np.sum(a[index % n, i[index]:j[index]], axis=0)
For the examples in the OP it can be used in the following way:
# 1. example:
b = np.random.randint(0, 1000, size=(3, 10, 3))
v = np.random.randint(-9, 10, size=3) # Indexing into `b.shape[1]`.
result = sliced_sum_numba(b, np.zeros_like(v), v)
# 2. example:
b = np.random.randint(0, 1000, size=(98, 3, 300, 3))
v = np.random.randint(-299, 300, size=(32, 98, 3)) # Indexing into `b.shape[2]`; one additional leading dimension for `v`.
result = sliced_sum_numba(b, np.zeros_like(v), v, axis=2)
Another idea, brought up by this answer (hence community wiki), is to use np.cumsum and then select the rows corresponding to the slice indices. One can deal with zero indices by inserting an additional zero-row at the beginning of the axis that is to be reduced. This approach performs unnecessary computations since it computes the full cumulative sum, beyond the final index. In case the stop indices are uniformly distributed along the axis (with median input_array.shape[axis]//2) this will in average perform twice as many add operations as necessary. Nevertheless this approach seems to perform quite well compared to other methods (at least for the dimensions indicated by the OP).
def reduce_cumulative(a, i, j, ufunc, axis=None):
if axis is None:
axis = len(i.shape)
i = (a.shape[axis] + i) % a.shape[axis]
j = (a.shape[axis] + j) % a.shape[axis]
a = np.insert(a, 0, 0, axis) # Insert zeros to account for zero indices.
c = ufunc.accumulate(a, axis=axis)
pre = np.ix_(*(range(x) for x in i.shape)) # Indices for dimensions prior to `axis`.
l = len(i.shape) - axis # Number of leading dimensions in `i` and `j`.
return c[pre[l:] + (j,)] - c[pre[l:] + (i,)]
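It can be called analogously to the other functions; for the OP's second example:
b = np.random.randint(0, 1000, size=(98, 3, 300, 3))
v = np.random.randint(-299, 300, size=(32, 98, 3))  # indexing into b.shape[2]
result = reduce_cumulative(b, np.zeros_like(v), v, np.add, axis=2)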
The following function allows for reducing a given axis with varying slices indicated by start and stop arrays. It uses np.ufunc.reduceat under the hood together with appropriately reshaped versions of the input array and the indices. It avoids unnecessary computations but allocates an intermediary array two times the size of the final output array (the computation of the discarded values are however no-ops).
def sliced_reduce(a, i, j, ufunc, axis=None):
"""Reduce an array along a given axis for varying slices `a[..., i:j, ...]` where `i` and `j` are arrays themselves.
Parameters
----------
a : array
The array to be reduced.
i : array
Start indices for the reduced axis. Must have the same shape as `j`.
j : array
Stop indices for the reduced axis. Must have the same shape as `i`.
ufunc : function
The function used for reducing the indicated axis.
axis : int, optional
Axis to be reduced. Defaults to `len(i.shape)`.
Returns
-------
array
Shape `i.shape + a.shape[axis+1:]`.
Notes
-----
The shapes of `a` and `i`, `j` must match up to the reduced axis.
    That means `a.shape[:axis] == i.shape[len(i.shape) - axis:]`.
`i` and `j` can have additional leading dimensions and `a` can have additional trailing dimensions.
"""
if axis is None:
axis = len(i.shape)
indices = np.tile(
np.repeat(
np.arange(np.prod(a.shape[:axis])) * a.shape[axis],
2 # Repeat two times to have start and stop indices next to each other.
),
np.prod(i.shape[:len(i.shape) - axis]) # Perform summation for each element of additional axes.
)
# Add `a.shape[axis]` to account for negative indices.
indices[::2] += (a.shape[axis] + i.ravel()) % a.shape[axis]
indices[1::2] += (a.shape[axis] + j.ravel()) % a.shape[axis]
# Now indices are sorted in ascending order but this will lead to unnecessary computation when reducing
# from odd to even indices (since we're only interested in even to odd indices).
# Hence we reverse the order of index pairs (need to reverse the result as well then).
indices = indices.reshape(-1, 2)[::-1].ravel()
result = ufunc.reduceat(a.reshape(-1, *a.shape[axis+1:]), indices)[::2] # Select only even to odd.
# In case start and stop index are equal (i.e. empty slice) `reduceat` will select the element
# corresponding to the start index. Need to supply the correct default value in this case.
result[indices[::2] == indices[1::2]] = ufunc.reduce([])
return result[::-1].reshape(*(i.shape + a.shape[axis+1:])) # Reverse order and reshape.
For the examples in the OP it can be used in the following way:
# 1. example:
b = np.random.randint(0, 1000, size=(3, 10, 3))
v = np.random.randint(-9, 10, size=3) # Indexing into `b.shape[1]`.
result = sliced_reduce(b, np.zeros_like(v), v, np.add)
# 2. example:
b = np.random.randint(0, 1000, size=(98, 3, 300, 3))
v = np.random.randint(-299, 300, size=(32, 98, 3)) # Indexing into `b.shape[2]`; one additional leading dimension for `v`.
result = sliced_reduce(b, np.zeros_like(v), v, np.add, axis=2)
Notes
Reversing the order of flat index pairs in order to have even < odd and thus shortcut every second computation with a no-op doesn't seem to be a good idea (probably because the flattened array is not traversed in memory layout order anymore). Removing this part and using the flat indices in ascending order gives a performance increase of about 30% (also for the perfplots, though not included there).
For what it's worth, here's a one-liner. No promises that this is the most efficient version, because it does a lot more addition than necessary:
In [25]: b.cumsum(axis=1)[np.arange(b.shape[0]), v-1]
Out[25]:
array([[3., 1., 4.],
[4., 2., 3.],
[3., 0., 1.]])
(Also be aware that it doesn't correctly handle a 0 in v.)
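A small sketch of how the zero case could be handled, along the lines of reduce_cumulative above: prepend a zero row before the cumulative sum, so that index v picks up the sum of the first v rows (and v == 0 yields a zero row):
padded = np.concatenate([np.zeros((b.shape[0], 1, b.shape[2])), b.cumsum(axis=1)], axis=1)
vv = padded[np.arange(b.shape[0]), v]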