Let's say I have a Python NumPy array a.
a = numpy.array([1,2,3,4,5,6,7,8,9,10,11])
I want to create a matrix of subsequences of length 5 from this array, with stride 3. The resulting matrix will then look as follows:
numpy.array([[1,2,3,4,5],[4,5,6,7,8],[7,8,9,10,11]])
One possible way of implementing this would be using a for-loop.
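result_matrix = np.zeros((3, 5))
# enumerate gives the row number; start is the offset of each window in a
for row, start in enumerate(range(0, len(a) - 5 + 1, 3)):
    result_matrix[row] = a[start:start+5]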
Is there a cleaner way to implement this in Numpy?
Approach #1 : Using broadcasting -
def broadcasting_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    return a[S*np.arange(nrows)[:,None] + np.arange(L)]
Approach #2 : Using more efficient NumPy strides -
def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))
Sample run -
In [143]: a
Out[143]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [144]: broadcasting_app(a, L = 5, S = 3)
Out[144]:
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
In [145]: strided_app(a, L = 5, S = 3)
Out[145]:
array([[ 1, 2, 3, 4, 5],
[ 4, 5, 6, 7, 8],
[ 7, 8, 9, 10, 11]])
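To see what the broadcasting approach builds internally, here is a minimal sketch that spells out the index grid for the question's array. The names starts, offsets and index_grid are only illustrative, they are not part of the answer's code.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
L, S = 5, 3                       # window length and stride from the question
nrows = (a.size - L) // S + 1     # number of complete windows

starts = S * np.arange(nrows)[:, None]   # window start offsets, shape (nrows, 1)
offsets = np.arange(L)                   # positions inside one window, shape (L,)
index_grid = starts + offsets            # broadcasts to shape (nrows, L)

windows = a[index_grid]                  # integer-array indexing returns a copy
print(index_grid)
print(windows)
```

Since indexing with an integer array copies the data, this version costs extra memory compared to the strided one, but it never reads outside the buffer.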
Starting in NumPy 1.20, we can use the new sliding_window_view to slide/roll over windows of elements.
Coupled with a [::3] step, it simply becomes:
from numpy.lib.stride_tricks import sliding_window_view
# values = np.array([1,2,3,4,5,6,7,8,9,10,11])
sliding_window_view(values, window_shape = 5)[::3]
# array([[ 1, 2, 3, 4, 5],
# [ 4, 5, 6, 7, 8],
# [ 7, 8, 9, 10, 11]])
where the intermediate result of the sliding is:
sliding_window_view(values, window_shape = 5)
# array([[ 1, 2, 3, 4, 5],
# [ 2, 3, 4, 5, 6],
# [ 3, 4, 5, 6, 7],
# [ 4, 5, 6, 7, 8],
# [ 5, 6, 7, 8, 9],
# [ 6, 7, 8, 9, 10],
# [ 7, 8, 9, 10, 11]])
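As a quick check (a sketch, assuming NumPy 1.20 or later), the stepped result should still be a view on the original data, and read-only, since sliding_window_view returns a non-writeable view by default:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

# Every window of length 5, then keep every 3rd window (stride 3)
strided = sliding_window_view(values, window_shape=5)[::3]

shares = np.shares_memory(values, strided)   # no data was copied
writeable = strided.flags.writeable          # default view is read-only
print(strided.shape, shares, writeable)
```

If you need to modify the windows, taking a copy with strided.copy() is probably the safest route.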
Modified version of @Divakar's code, with checks to ensure that memory is contiguous and that the returned array cannot be modified. (Variable names changed for my DSP application.)
def frame(a, framelen, frameadv):
    """frame - Frame a 1D array
    a - 1D array
    framelen - Samples per frame
    frameadv - Samples between starts of consecutive frames
       Set to framelen for non-overlapping consecutive frames
    Modified from Divakar's 10/17/16 11:20 solution:
    https://stackoverflow.com/questions/40084931/taking-subarrays-from-numpy-array-with-given-stride-stepsize
    CAVEATS:
    Assumes array is contiguous
    Output is not writable as there are multiple views on the same memory
    """
    if not isinstance(a, np.ndarray) or \
       not (a.flags['C_CONTIGUOUS'] or a.flags['F_CONTIGUOUS']):
        raise ValueError("Input array a must be a contiguous numpy array")
    # Output
    nrows = ((a.size-framelen)//frameadv)+1
    oshape = (nrows, framelen)
    # Size of each element in a
    n = a.strides[0]
    # Indexing in the new object will advance by frameadv * element size
    ostrides = (frameadv*n, n)
    return np.lib.stride_tricks.as_strided(a, shape=oshape,
                                           strides=ostrides, writeable=False)
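A small sketch of what writeable=False buys you. The helper read_only_frames below is a hypothetical, stripped-down stand-in for the function above (no contiguity check), used only to show that writing to the framed view is rejected and the source stays untouched:

```python
import numpy as np

def read_only_frames(a, framelen, frameadv):
    # Hypothetical helper: same striding idea, without the input checks
    nrows = (a.size - framelen) // frameadv + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(
        a, shape=(nrows, framelen), strides=(frameadv * n, n), writeable=False)

signal = np.arange(1, 12)
frames = read_only_frames(signal, framelen=5, frameadv=3)

try:
    frames[0, 0] = 99          # overlapping frames share memory, so this is blocked
    write_failed = False
except ValueError:
    write_failed = True

print(frames)
print(write_failed)
```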
Provided a numpy array:
arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])
I wonder how to access chunks of a chosen size with a chosen separation, both concatenated and in slices:
E.g.: obtain chunks of size 3 separated by two values:
arr_chunk_3_sep_2 = np.array([0,1,2,5,6,7,10,11,12])
arr_chunk_3_sep_2_in_slices = np.array([[0,1,2],[5,6,7],[10,11,12]])
What is the most efficient way to do it? If possible, I would like to avoid copying or creating new objects as much as possible. Maybe memoryviews could be of help here?
Approach #1
Here's one with masking -
def slice_grps(a, chunk, sep):
    N = chunk + sep
    return a[np.arange(len(a))%N < chunk]
Sample run -
In [223]: arr
Out[223]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
In [224]: slice_grps(arr, chunk=3, sep=2)
Out[224]: array([ 0, 1, 2, 5, 6, 7, 10, 11, 12])
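To make the masking idea concrete, here is a minimal sketch that shows the boolean mask itself for chunk=3, sep=2. The names period and mask are only illustrative:

```python
import numpy as np

arr = np.arange(13)
chunk, sep = 3, 2
period = chunk + sep                         # one chunk plus the gap after it

# Position inside each period; keep only the first `chunk` positions
mask = np.arange(arr.size) % period < chunk
selected = arr[mask]                         # boolean indexing returns a copy

print(mask.astype(int))
print(selected)
```

Note that boolean indexing always copies, so this gives the concatenated result but not a view.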
Approach #2
If the input array is such that the last chunk would have enough runway, we could leverage np.lib.stride_tricks.as_strided, inspired by this post, to select m elements off each block of n elements -
# https://stackoverflow.com/a/51640641/ @Divakar
def skipped_view(a, m, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    shp = ((a.size+n-1)//n,n)
    return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m]
out = skipped_view(arr,chunk,chunk+sep)
Note that the output is a view into the input array, so it has no extra memory overhead and is virtually free!
Sample run to make things clear -
In [255]: arr
Out[255]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
In [256]: chunk = 3
In [257]: sep = 2
In [258]: skipped_view(arr,chunk,chunk+sep)
Out[258]:
array([[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]])
# Let's prove that the output is a view indeed
In [259]: np.shares_memory(arr, skipped_view(arr,chunk,chunk+sep))
Out[259]: True
How about a reshape and slice?
In [444]: arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12])
In [445]: arr.reshape(-1,5)
...
ValueError: cannot reshape array of size 13 into shape (5)
Ah a problem - your array isn't big enough for this reshape - so we have to pad it:
In [446]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)
Out[446]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 0, 0]])
In [447]: np.concatenate((arr,np.zeros(2,int))).reshape(-1,5)[:,:-2]
Out[447]:
array([[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]])
as_strided can get away with this by including bytes outside the data buffer. Usually that's seen as a bug, though here it can be an asset - provided you really do throw that garbage away.
Or throwing away the last incomplete line:
In [452]: arr[:-3].reshape(-1,5)[:,:3]
Out[452]:
array([[0, 1, 2],
[5, 6, 7]])
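The pad-then-reshape steps can be wrapped up roughly like this. This is only a sketch: chunks_via_pad is a made-up name, and it uses np.pad instead of np.concatenate to compute the padding length for any array size. It copies the data, and if the last chunk is incomplete the fill value will show up in the output.

```python
import numpy as np

def chunks_via_pad(arr, chunk, sep, fill=0):
    # Hypothetical helper: pad to a multiple of (chunk + sep), reshape, slice
    period = chunk + sep
    pad = (-arr.size) % period               # elements missing in the last row
    padded = np.pad(arr, (0, pad), constant_values=fill)
    return padded.reshape(-1, period)[:, :chunk]

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
result = chunks_via_pad(arr, chunk=3, sep=2)
print(result)
```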
I have a numpy array of arbitrary shape, e.g.:
a = array([[[ 1, 2],
[ 3, 4],
[ 8, 6]],
[[ 7, 8],
[ 9, 8],
[ 3, 12]]])
a.shape = (2, 3, 2)
and a result of argmax over the last axis:
np.argmax(a, axis=-1) = array([[1, 1, 0],
[1, 0, 1]])
I'd like to get max:
np.max(a, axis=-1) = array([[ 2, 4, 8],
[ 8, 9, 12]])
But without recalculating everything. I've tried:
a[np.arange(len(a)), np.argmax(a, axis=-1)]
But got:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,3)
How to do it? Similar question for 2-d: numpy 2d array max/argmax
You can use advanced indexing -
In [17]: a
Out[17]:
array([[[ 1, 2],
[ 3, 4],
[ 8, 6]],
[[ 7, 8],
[ 9, 8],
[ 3, 12]]])
In [18]: idx = a.argmax(axis=-1)
In [19]: m,n = a.shape[:2]
In [20]: a[np.arange(m)[:,None],np.arange(n),idx]
Out[20]:
array([[ 2, 4, 8],
[ 8, 9, 12]])
For a generic ndarray case of any number of dimensions, as stated in the comments by @hpaulj, we could use np.ix_, like so -
shp = np.array(a.shape)
dim_idx = list(np.ix_(*[np.arange(i) for i in shp[:-1]]))
dim_idx.append(idx)
out = a[tuple(dim_idx)]  # a tuple is needed for multi-dimensional indexing
For an ndarray of arbitrary shape, you can flatten the argmax indices, then recover the correct shape, like so:
idx = np.argmax(a, axis=-1)
flat_idx = np.arange(a.size, step=a.shape[-1]) + idx.ravel()
maximum = a.ravel()[flat_idx].reshape(*a.shape[:-1])
For arbitrary-shape arrays, the following should work :)
a = np.arange(5 * 4 * 3).reshape((5,4,3))
# for last axis
argmax = a.argmax(axis=-1)
a[tuple(np.indices(a.shape[:-1])) + (argmax,)]
# for other axis (eg. axis=1)
argmax = a.argmax(axis=1)
idx = list(np.indices(a.shape[:1]+a.shape[2:]))
idx[1:1] = [argmax]
a[tuple(idx)]
or
a = np.arange(5 * 4 * 3).reshape((5,4,3))
argmax = a.argmax(axis=0)
np.choose(argmax, np.moveaxis(a, 0, 0))
argmax = a.argmax(axis=1)
np.choose(argmax, np.moveaxis(a, 1, 0))
argmax = a.argmax(axis=2)
np.choose(argmax, np.moveaxis(a, 2, 0))
argmax = a.argmax(axis=-1)
np.choose(argmax, np.moveaxis(a, -1, 0))
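Another option, not used in the answers above: np.take_along_axis (available since NumPy 1.15) is designed for pairing with argmax-style results. A minimal sketch, using the array from the question:

```python
import numpy as np

a = np.array([[[1, 2], [3, 4], [8, 6]],
              [[7, 8], [9, 8], [3, 12]]])

# Last axis: give idx a trailing length-1 axis so it lines up with a
idx = np.argmax(a, axis=-1)
maxima = np.take_along_axis(a, idx[..., None], axis=-1).squeeze(axis=-1)

# Any other axis works the same way, e.g. axis=1
idx1 = np.argmax(a, axis=1)
maxima1 = np.take_along_axis(a, np.expand_dims(idx1, axis=1), axis=1).squeeze(axis=1)

print(maxima)
print(maxima1)
```

This avoids building the index grids by hand, and unlike np.choose it is not limited in the number of choices along the axis.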
I have scripts with multi-dimensional arrays and instead of for-loops I would like to use a vectorized implementation for my problems (which sometimes contain column operations).
Let's consider a simple example with matrix arr:
> arr = np.arange(12).reshape(3, 4)
> arr
> ([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
> arr.shape
> (3, 4)
So we have a matrix arr with 3 rows and 4 columns.
The simplest case in my scripts is adding something to the values in the array. E.g. I'm doing this for single or multiple rows:
> someVector = np.array([1, 2, 3, 4])
> arr[0] += someVector
> arr
> array([[ 1, 3, 5, 7], <--- successfully added someVector
[ 4, 5, 6, 7], to one row
[ 8, 9, 10, 11]])
> arr[0:2] += someVector
> arr
> array([[ 2, 5, 8, 11], <--- added someVector to two
[ 5, 7, 9, 11], <--- rows at once
[ 8, 9, 10, 11]])
This works well. However, sometimes I need to manipulate one or several columns. One column at a time works:
> arr[:, 0] += [1, 2, 3]
> array([[ 3, 5, 8, 11],
[ 7, 7, 9, 11],
[11, 9, 10, 11]])
^
|___ added the values [1, 2, 3] successfully to
this column
But I am struggling to figure out why this does not work for multiple columns at once:
> arr[:, 0:2] += [1, 2, 3]
> ValueError
> Traceback (most recent call last)
> <ipython-input-16-5feef53e53af> in <module>()
> ----> 1 arr[:, 0:2] += [1, 2, 3]
> ValueError: operands could not be broadcast
> together with shapes (3,2) (3,) (3,2)
Isn't this the very same way it works with rows? What am I doing wrong here?
To add a 1D array to multiple columns you need to broadcast the values to a 2D array. Since broadcasting adds new axes on the left (of the shape) by default, broadcasting a row vector to multiple rows happens automatically:
arr[0:2] += someVector
someVector has shape (N,) and is automatically broadcast to shape (1, N). If arr[0:2] has shape (2, N), then the sum is performed element-wise as though both arr[0:2] and someVector were arrays of the same shape, (2, N).
But to broadcast a column vector to multiple columns requires hinting NumPy that you want broadcasting to occur with the axis on the right. In fact, you have to add the new axis on the right explicitly by using someVector[:, np.newaxis] or equivalently someVector[:, None]:
In [41]: arr = np.arange(12).reshape(3, 4)
In [42]: arr[:, 0:2] += np.array([1, 2, 3])[:, None]
In [43]: arr
Out[43]:
array([[ 1, 2, 2, 3],
[ 6, 7, 6, 7],
[11, 12, 10, 11]])
someVector (e.g. np.array([1, 2, 3])) has shape (N,) and someVector[:, None] has shape (N, 1) so now broadcasting happens on the right. If arr[:, 0:2] has shape (N, 2), then the sum is performed element-wise as though both arr[:, 0:2] and someVector[:, None] were arrays of the same shape, (N, 2).
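The shape rule can be checked without touching arr at all. A small sketch using np.broadcast on dummy arrays; the variable names are only for illustration:

```python
import numpy as np

block = np.empty((3, 2))        # same shape as arr[:, 0:2]
vec = np.array([1, 2, 3])       # shape (3,)

# (3,) is aligned from the right, i.e. treated as (1, 3): 3 vs 2 clashes
try:
    np.broadcast(block, vec)
    flat_ok = True
except ValueError:
    flat_ok = False

# (3, 1) lines up with (3, 2): the length-1 axis is stretched to 2
column_shape = np.broadcast(block, vec[:, None]).shape

print(flat_ok, column_shape)
```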
Very clear explanation by @unutbu.
As a complement, transposition (.T) can often simplify the task by letting you work along the first dimension:
In [273]: arr = np.arange(12).reshape(3, 4)
In [274]: arr.T[0:2] += [1, 2, 3]
In [275]: arr
Out[275]:
array([[ 1, 2, 2, 3],
[ 6, 7, 6, 7],
[11, 12, 10, 11]])