Fast compose slices of 3D numpy array as 3D subarray

Fast compose slices of 3D numpy array as 3D subarray - python

I have a 3D numpy array x. I want to take a subset of each slice on axis 0 (each subset is the same shape, but with start and end indices that may be different for each slice) and compose these into a separate 3D numpy array. I can achieve this with
import numpy as np
x = np.arange(24).reshape((3, 4, 2))
starts = [0, 2, 1]
ends = [2, 4, 3]
np.stack([x[i, starts[i]:ends[i]] for i in range(3)])
but 1) is there any way to do this in a single operation using fancy indexing, and 2) will doing so speed things up?

We can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows to get sliding windows. More info on use of as_strided based view_as_windows.
from skimage.util.shape import view_as_windows
L = 2 # ends[0]-starts[0]
w = view_as_windows(x,(1,L,1))[...,0,:,0]
out = w[np.arange(len(starts)), starts].swapaxes(1,2)
Alternatively, a compact version leveraging broadcasting that generates all the required indices and then indexing into the input array, would be -
x[np.arange(len(starts))[:,None],np.asarray(starts)[:,None] + np.arange(L)]

Related

How to efficiently broadcast multiplication between arrays of shapes (n,m,k) and (n,m)

Let a be a numpy array of shape (n,m,k) and a_msk is an array of shape (n,m) containing that masks elements from a through multiplication.
Up to my knowledge, I had to create a new axis in a_msk in order to make it compatible with a for multiplication.
b = a * a_msk[:,:,np.newaxis]
Unfortunately, my Google Colab runtime is running out of memory at this very operation given the large size of the arrays.
My question is whether I can achieve the same thing without creating that new axis for the mask array.

As #hpaulj commented adding an axis to make the two arrays "compatible" for broadcasting is the most straightforward way to do your multiplication.
Alternatively, you can move the last axis of your array a to the front which would also make the two arrays compatible (I wonder though whether this would solve your memory issue):
a = np.moveaxis(a, -1, 0)
Then you can simply multiply:
b = a * a_msk
However, to get your result you have to move the axis back:
b = np.moveaxis(b, 0, -1)
Example: both solutions return the same answer:
import numpy as np
a = np.arange(24).reshape(2, 3, 4)
a_msk = np.arange(6).reshape(2, 3)
print(f'newaxis solution:\n {a * a_msk[..., np.newaxis]}')
print()
print(f'moveaxis solution:\n {np.moveaxis((np.moveaxis(a, -1, 0) * a_msk), 0, -1)}')

Setup sliding windows as columns (IM2COL from MATLAB) in multi-dimensional array - Python

Currently, I have a 4d array, say,
arr = np.arange(48).reshape((2,2,3,4))
I want to apply a function that takes a 2d array as input to each 2d array sliced from arr. I have searched and read this question, which is exactly what I want.
The function I'm using is im2col_sliding_broadcasting() which I get from here. It takes a 2d array and list of 2 elements as input and returns a 2d array. In my case: it takes 3x4 2d array and a list [2, 2] and returns 4x6 2d array.
I considered using apply_along_axis() but as said it only accepts 1d function as parameter. I can't apply im2col function this way.
I want an output that has the shape as 2x2x4x6. Surely I can achieve this with for loop, but I heard that it's too time expensive:
import numpy as np
def im2col_sliding_broadcasting(A, BSZ, stepsize=1):
# source: https://stackoverflow.com/a/30110497/10666066
# Parameters
M, N = A.shape
col_extent = N - BSZ[1] + 1
row_extent = M - BSZ[0] + 1
# Get Starting block indices
start_idx = np.arange(BSZ[0])[:, None]*N + np.arange(BSZ[1])
# Get offsetted indices across the height and width of input array
offset_idx = np.arange(row_extent)[:, None]*N + np.arange(col_extent)
# Get all actual indices & index into input array for final output
return np.take(A, start_idx.ravel()[:, None] + offset_idx.ravel()[::stepsize])
arr = np.arange(48).reshape((2,2,3,4))
output = np.empty([2,2,4,6])
for i in range(2):
for j in range(2):
temp = im2col_sliding_broadcasting(arr[i, j], [2,2])
output[i, j] = temp
Since my arr in fact is a 10000x3x64x64 array. So my question is: Is there another way to do this more efficiently ?

We can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows to get sliding windows. More info on use of as_strided based view_as_windows.
from skimage.util.shape import view_as_windows
W1,W2 = 2,2 # window size
# create sliding windows along last two axes1
w = view_as_windows(arr,(1,1,W1,W2))[...,0,0,:,:]
# Merge the window axes (tha last two axes) and
# merge the axes along which those windows were created (3rd and 4th axes)
outshp = arr.shape[:-2] + (W1*W2,) + ((arr.shape[-2]-W1+1)*(arr.shape[-1]-W2+1),)
out = w.transpose(0,1,4,5,2,3).reshape(outshp)
The last step forces a copy. So, skip it if possible.

How to get chunks of submatrices faster?

I have a really big matrix (nxn)for which I would to build the intersecting tiles (submatrices) with the dimensions mxm. There will be an offset of step bvetween each contiguous submatrices. Here is an example for n=8, m=4, step=2:
import numpy as np
matrix=np.random.randn(8,8)
n=matrix.shape[0]
m=4
step=2
This will store all the corner indices (x,y) from which we will take a 4x4 natrix: (x:x+4,x:x+4)
a={(i,j) for i in range(0,n-m+1,step) for j in range(0,n-m+1,step)}
The submatrices will be extracted like that
sub_matrices = np.zeros([m,m,len(a)])
for i,ind in enumerate(a):
x,y=ind
sub_matrices[:,:,i]=matrix[x:x+m, y:y+m]
Is there a faster way to do this submatrices initialization?

We can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows to get sliding windows. More info on use of as_strided based view_as_windows.
from skimage.util.shape import view_as_windows
# Get indices as array
ar = np.array(list(a))
# Get all sliding windows
w = view_as_windows(matrix,(m,m))
# Get selective ones by indexing with ar
selected_windows = np.moveaxis(w[ar[:,0],ar[:,1]],0,2)
Alternatively, we can extract the row and col indices with a list comprehension and then index with those, like so -
R = [i[0] for i in a]
C = [i[1] for i in a]
selected_windows = np.moveaxis(w[R,C],0,2)
Optimizing from the start, we can skip the creation of stepping array, a and simply use the step arg with view_as_windows, like so -
view_as_windows(matrix,(m,m),step=2)
This would give us a 4D array and indexing into the first two axes of it would have all the mxm shaped windows. These windows are simply views into input and hence no extra memory overhead plus virtually free runtime!

import numpy as np
a = np.random.randn(n, n)
b = a[0:m*step:step, 0:m*step:step]
If you have a one-dimension array, you can get it's submatrix by the following code:
c = a[start:end:step]
If the dimension is two or more, add comma between every dimension.
d = a[start1:end1:step1, start2:end3:step2]

how can I extract multiple random sub-sequences from a numpy array

say I have a sequence s and I'd like to select n random sub sequences from it each with length l and store in a matrix. Is there a more numpy way of doing that than
s = np.arange(0, 1000)
n = 5
l = 10
i = np.random.randint(0, len(s)-10, 5)
ss = np.array([s[x:x+l] for x in i])

We can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows for efficient patch extraction, like so -
from skimage.util.shape import view_as_windows
# Get sliding windows (these are simply views)
w = view_as_windows(s, l)
# Index with indices, i for desired output
out = w[i]
Related :
NumPy Fancy Indexing - Crop different ROIs from different channels
Take N first values from every row in NumPy matrix that fulfill condition
Selecting Random Windows from Multidimensional Numpy Array Rows

Apply bincount to each row of a 2D numpy array

Is there a way to apply bincount with "axis = 1"? The desired result would be the same as the list comprehension:
import numpy as np
A = np.array([[1,0],[0,0]])
np.array([np.bincount(r,minlength = np.max(A) + 1) for r in A])
#array([[1,1]
# [2,0]])

np.bincount doesn't work with a 2D array along a certain axis. To get the desired effect with a single vectorized call to np.bincount, one can create a 1D array of IDs such that different rows would have different IDs even if the elements are the same. This would keep elements from different rows not binning together when using a single call to np.bincount with those IDs. Thus, such an ID array could be created with an idea of linear indexing in mind, like so -
N = A.max()+1
id = A + (N*np.arange(A.shape[0]))[:,None]
Then, feed the IDs to np.bincount and finally reshape back to 2D -
np.bincount(id.ravel(),minlength=N*A.shape[0]).reshape(-1,N)

If the data is too large for this to be efficient, then the issue is more likely to be the memory usage of the dense matrix rather than the numerical operations themself. Here is an example of using a sklearn Hashing Vectorizer on a matrix which is too large to use the bincounts method (the results are a sparse matrix):
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
h = HashingVectorizer()
A = np.random.randint(100,size=(1000,100))*10000
A_str = [" ".join([str(v) for v in i]) for i in A]
%timeit h.fit_transform(A_str)
#10 loops, best of 3: 110 ms per loop

You can use apply_along_axis, Here is an example
import numpy as np
test_array = np.array([[0, 0, 1], [0, 0, 1]])
print(test_array)
np.apply_along_axis(np.bincount, axis=1, arr= test_array,
minlength = np.max(test_array) +1)
Note the final shape of this array depends on the number of bins, also you can specify other arguments along with apply_along_axis

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Fast compose slices of 3D numpy array as 3D subarray - python

Related

How to efficiently broadcast multiplication between arrays of shapes (n,m,k) and (n,m)

Setup sliding windows as columns (IM2COL from MATLAB) in multi-dimensional array - Python

How to get chunks of submatrices faster?

how can I extract multiple random sub-sequences from a numpy array

Apply bincount to each row of a 2D numpy array

Categories

Resources