numpy apply along n-spaces - python

I have a 4d array, and I would like to apply a function to each 2d slice taken by iterating over the last two dimensions. Viz, apply f(2d_array) to (x,y,0,0), and f(2d_array) to (x,y,0,1), etc etc. My function operates on the array in place, so the dimensions would be the same, but a general solution would return an array of shape (x',y',w,z), where w and z are the last two dimensions of the original array.
This could obviously be generalized to mD slices over an nD array.
Is there any built-in functionality that does this thing?

The 'basic' apply-along-axis model is to iterate on one axis, and pass the other to your function:
In [197]: def foo(x):  # return same size
     ...:     return x*2
     ...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
     ...:
Out[197]:
array([[ 0,  2,  4,  6],
       [ 8, 10, 12, 14],
       [16, 18, 20, 22]])
In [198]: def foo(x):
     ...:     return x.sum()  # return one less dim
     ...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
     ...:
Out[198]: array([ 6, 22, 38])
In [199]: def foo(x):
     ...:     return x.sum(keepdims=True)  # condense the dim
     ...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
     ...:
Out[199]:
array([[ 6],
       [22],
       [38]])
Your 4d problem can be massaged to fit this.
In [200]: arr_4d = np.arange(24).reshape(2,3,2,2)
In [201]: arr_2d = arr_4d.reshape(6,4).T
In [202]: res = np.array([foo(x) for x in arr_2d])
In [203]: res
Out[203]:
array([[60],
       [66],
       [72],
       [78]])
In [204]: res.reshape(2,2)
Out[204]:
array([[60, 66],
       [72, 78]])
which is the equivalent of doing:
In [205]: arr_4d[:,:,0,0].sum()
Out[205]: 60
In [206]: foo(arr_4d[:,:,0,0].ravel())
Out[206]: array([60])
apply_along_axis requires a function that takes a 1d array, but can be applied thus:
In [209]: np.apply_along_axis(foo,0,arr_4d.reshape(6,2,2))
Out[209]:
array([[[60, 66],
        [72, 78]]])
foo could reshape its input to 2d, and pass it to a function that takes 2d. apply_along_axis uses np.ndindex to generate the indices for the iteration axes.
In [212]: list(np.ndindex(2,2))
Out[212]: [(0, 0), (0, 1), (1, 0), (1, 1)]
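For instance, a minimal sketch of such a reshaping wrapper (the name wrap_for_axis is mine):
def wrap_for_axis(f2d, shape2d):
    # wrap a function expecting a 2d block so apply_along_axis can feed it 1d slices
    def foo(x_1d):
        return f2d(x_1d.reshape(shape2d)).ravel()
    return foo

g = wrap_for_axis(lambda a: a*2, (2, 3))
np.apply_along_axis(g, 0, arr_4d.reshape(6, 2, 2))   # shape (6, 2, 2)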
np.vectorize normally works with a function that takes a scalar. But recent versions have a signature parameter, which I believe could be used to work with your case. It may require transposing the input so it iterates on the first two axes, passing the last two to function. See my answer at https://stackoverflow.com/a/46004266/901925.
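A hedged sketch of that vectorize approach (signature requires numpy 1.12+):
f2d = np.vectorize(lambda a: a*2, signature='(m,n)->(m,n)')
arr_t = arr_4d.transpose(2, 3, 0, 1)     # move the iteration axes to the front
out = f2d(arr_t).transpose(2, 3, 0, 1)   # restore the original axis order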
None of these approaches offers a speed advantage.
Without reshaping or swapping, I can iterate with the help of ndindex.
Define a function that expects a 2d input:
def foo2(x):
    return x.sum(axis=1, keepdims=True)  # 2d
Index iterator for the last 2 dim of arr_4d:
In [260]: idx = np.ndindex(arr_4d.shape[-2:])
Do a test calc to determine the shape of the return; vectorize and apply... do this sort of test.
In [261]: r1 = foo2(arr_4d[:,:,0,0]).shape
In [262]: r1
Out[262]: (2, 1)
The result array:
In [263]: res = np.zeros(r1+arr_4d.shape[-2:])
In [264]: res.shape
Out[264]: (2, 1, 2, 2)
Now iterate:
In [265]: for i,j in idx:
     ...:     res[...,i,j] = foo2(arr_4d[...,i,j])
     ...:
In [266]: res
Out[266]:
array([[[[ 12.,  15.],
         [ 18.,  21.]]],

       [[[ 48.,  51.],
         [ 54.,  57.]]]])
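The same loop pattern generalizes to mD slices over an nD array, as the question asks; a hedged sketch (apply_over_last is a made-up name):
def apply_over_last(f, arr, n_iter):
    # apply f to each slice obtained by iterating over the last n_iter axes
    probe = f(arr[(Ellipsis,) + (0,) * n_iter])    # trial call to learn the result shape
    res = np.zeros(probe.shape + arr.shape[-n_iter:], dtype=probe.dtype)
    for ijk in np.ndindex(arr.shape[-n_iter:]):
        res[(Ellipsis,) + ijk] = f(arr[(Ellipsis,) + ijk])
    return res

apply_over_last(foo2, arr_4d, 2)   # same values as res above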

I guess you're looking for something like numpy.apply_over_axes coupled with a for loop to iterate over the varying axes.

I rolled my own. I'd be interested to know if there are any performance differences between this and @hpaulj's method, and if there is reason to believe that writing a custom C module would offer significant improvement. Of course @hpaulj's method is more general, since this is specific to my needing to just perform an operation on the array in place.
import itertools

def apply_along_space(f, np_array, axes):
    # apply the function f on each subspace given by iterating over the axes listed in axes, e.g. axes=(0,2)
    for slic in itertools.product(*map(
            lambda ax: range(np_array.shape[ax]) if ax in axes else [slice(None)],
            range(np_array.ndim))):
        f(np_array[slic])
    return np_array
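For instance, a hypothetical in-place use, demeaning each 2d subspace selected by iterating over the last two axes:
def demean_inplace(sub):
    sub -= sub.mean()   # modifies the view, and hence the original array

a = np.arange(24, dtype=float).reshape(2, 3, 2, 2)
apply_along_space(demean_inplace, a, axes=(2, 3))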

Related

Efficient way to cast scalars to numpy arrays

When I write a function that accepts ndarray or scalar inputs
def foo(a):
    # does something to `a`
    #
    # a: `x` dimensional array or scalar
    # . . .
    cast(a, x)
    # deal with `a` as if it is an `x`-d array after this
Is there an efficient way to write that cast function? Basically what I'd want is a function that would cast:
a, a scalar, to an ndarray with shape (1,)*x
b, an ndarray with y<x dims, explicitly to shape (1,)*(x-y) + b.shape (same as broadcasting)
c, an ndarray with x dims, unaffected
d, an ndarray with y>x dims, raising an error
and do it all in place (at least when starting with an array), to prevent doubling memory.
It seems like this functionality is repeated so often in built-in functions that there should be some shortcut for it, but I'm not finding it.
I can do a_ = np.array(a, ndmin=x, copy=False) and then assert len(a_.shape) == x, but that still makes a copy of arrays (i.e. a_.base is a is False). Is there any way around this?
asarray returns the array itself (if starting with an array):
In [271]: x=np.arange(10)
In [272]: y = np.asarray(x)
In [273]: id(x)
Out[273]: 2812424128
In [274]: id(y)
Out[274]: 2812424128 # same id
ndmin produces a view:
In [276]: y = np.array(x, ndmin=2, copy=False)
In [277]: y
Out[277]: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [278]: id(x)
Out[278]: 2812424128
In [279]: id(y)
Out[279]: 2811135704 # different id
In [281]: x.__array_interface__['data']
Out[281]: (188551320, False)
In [282]: y.__array_interface__['data'] # same databuffer
Out[282]: (188551320, False)
ndmin on an array of the right dim already:
In [286]: x = np.arange(9).reshape(3,3)
In [287]: y = np.array(x, ndmin=2, copy=False)
In [288]: id(x)
Out[288]: 2810813120
In [289]: id(y)
Out[289]: 2810813120 # same id
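Putting those pieces together, a minimal sketch of the cast helper from the question (the error check and structure are mine):
def cast(a, x):
    a = np.asarray(a)              # no copy when `a` is already an ndarray
    if a.ndim > x:
        raise ValueError('too many dimensions')
    # prepending length-1 dims never forces a copy; reshape returns a view here
    return a.reshape((1,) * (x - a.ndim) + a.shape)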
Similar discussion with astype,
confused about the `copy` attribution of `numpy.astype`

Sort invariant for numpy.argsort with multiple dimensions

numpy.argsort docs state
Returns:
    index_array : ndarray, int
        Array of indices that sort a along the specified axis. If a is one-dimensional, a[index_array] yields a sorted a.
How can I apply the result of numpy.argsort for a multidimensional array to get back a sorted array? (NOT just a 1-D or 2-D array; it could be an N-dimensional array where N is known only at runtime)
>>> import numpy as np
>>> np.random.seed(123)
>>> A = np.random.randn(3,2)
>>> A
array([[-1.0856306 ,  0.99734545],
       [ 0.2829785 , -1.50629471],
       [-0.57860025,  1.65143654]])
>>> i = np.argsort(A, axis=-1)
>>> A[i]
array([[[-1.0856306 ,  0.99734545],
        [ 0.2829785 , -1.50629471]],

       [[ 0.2829785 , -1.50629471],
        [-1.0856306 ,  0.99734545]],

       [[-1.0856306 ,  0.99734545],
        [ 0.2829785 , -1.50629471]]])
For me it's not just a matter of using sort() instead; I have another array B and I want to order B using the results of np.argsort(A) along the appropriate axis. Consider the following example:
>>> A = np.array([[3,2,1],[4,0,6]])
>>> B = np.array([[3,1,4],[1,5,9]])
>>> i = np.argsort(A,axis=-1)
>>> BsortA = ???
# should result in [[4,1,3],[5,1,9]]
# so that corresponding elements of B and sort(A) stay together
It looks like this functionality is already an enhancement request in numpy.
The numpy issue #8708 has a sample implementation of take_along_axis that does what I need; I'm not sure if it's efficient for large arrays but it seems to work.
def take_along_axis(arr, ind, axis):
    """
    ... here means a "pack" of dimensions, possibly empty

    arr: array_like of shape (A..., M, B...)
        source array
    ind: array_like of shape (A..., K..., B...)
        indices to take along each 1d slice of `arr`
    axis: int
        index of the axis with dimension M

    out: array_like of shape (A..., K..., B...)
        out[a..., k..., b...] = arr[a..., inds[a..., k..., b...], b...]
    """
    if axis < 0:
        if axis >= -arr.ndim:
            axis += arr.ndim
        else:
            raise IndexError('axis out of range')
    ind_shape = (1,) * ind.ndim
    ins_ndim = ind.ndim - (arr.ndim - 1)   # inserted dimensions

    dest_dims = list(range(axis)) + [None] + list(range(axis + ins_ndim, ind.ndim))

    # could also call np.ix_ here with some dummy arguments, then throw those results away
    inds = []
    for dim, n in zip(dest_dims, arr.shape):
        if dim is None:
            inds.append(ind)
        else:
            ind_shape_dim = ind_shape[:dim] + (-1,) + ind_shape[dim+1:]
            inds.append(np.arange(n).reshape(ind_shape_dim))
    return arr[tuple(inds)]
which yields
>>> A = np.array([[3,2,1],[4,0,6]])
>>> B = np.array([[3,1,4],[1,5,9]])
>>> i = A.argsort(axis=-1)
>>> take_along_axis(A,i,axis=-1)
array([[1, 2, 3],
       [0, 4, 6]])
>>> take_along_axis(B,i,axis=-1)
array([[4, 1, 3],
       [5, 1, 9]])
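Note that np.take_along_axis has since been added to NumPy itself (1.15+), so on newer versions the helper above isn't needed:
>>> np.take_along_axis(B, i, axis=-1)
array([[4, 1, 3],
       [5, 1, 9]])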
This argsort produces a (3,2) array
In [453]: idx=np.argsort(A,axis=-1)
In [454]: idx
Out[454]:
array([[0, 1],
       [1, 0],
       [0, 1]], dtype=int32)
As you note, applying this to A to get the equivalent of np.sort(A, axis=-1) isn't obvious. The iterative solution is to sort each row (a 1d case) with:
In [459]: np.array([x[i] for i,x in zip(idx,A)])
Out[459]:
array([[-1.0856306 ,  0.99734545],
       [-1.50629471,  0.2829785 ],
       [-0.57860025,  1.65143654]])
While probably not the fastest, it is probably the clearest solution, and a good starting point for conceptualizing a better solution.
The tuple(inds) from the take solution is:
(array([[0],
        [1],
        [2]]),
 array([[0, 1],
        [1, 0],
        [0, 1]], dtype=int32))
In [470]: A[_]
Out[470]:
array([[-1.0856306 ,  0.99734545],
       [-1.50629471,  0.2829785 ],
       [-0.57860025,  1.65143654]])
In other words:
In [472]: A[np.arange(3)[:,None], idx]
Out[472]:
array([[-1.0856306 ,  0.99734545],
       [-1.50629471,  0.2829785 ],
       [-0.57860025,  1.65143654]])
The first part is what np.ix_ would construct, but it does not 'like' the 2d idx.
Looks like I explored this topic a couple of years ago
argsort for a multidimensional ndarray
a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
I tried to explain what is going on. The take function does the same sort of thing, but constructs the indexing tuple for a more general case (dimensions and axis). Generalizing to more dimensions, but still with axis=-1, should be easy.
For the first axis, A[np.argsort(A,axis=0),np.arange(2)] works.
We just need to use advanced indexing to index along all axes with those index arrays. We can use np.ogrid to create open grids of range arrays along all axes and then replace only the one for the input axis with the input indices. Finally, index into the data array with those indices for the desired output. Thus, essentially, we would have -
# Inputs : arr, ind, axis
idx = np.ogrid[tuple(map(slice, ind.shape))]
idx[axis] = ind
out = arr[tuple(idx)]
Just to make it functional and do error checks, let's create two functions - one to get those indices and a second one to feed in the data array and simply index. The idea with the first function is to get indices that could be re-used for indexing into any arbitrary array supporting the necessary number of dimensions and lengths along each axis.
Hence, the implementations would be -
def advindex_allaxes(ind, axis):
    axis = np.core.multiarray.normalize_axis_index(axis, ind.ndim)
    idx = np.ogrid[tuple(map(slice, ind.shape))]
    idx[axis] = ind
    return tuple(idx)

def take_along_axis(arr, ind, axis):
    return arr[advindex_allaxes(ind, axis)]
Sample runs -
In [161]: A = np.array([[3,2,1],[4,0,6]])
In [162]: B = np.array([[3,1,4],[1,5,9]])
In [163]: i = A.argsort(axis=-1)
In [164]: take_along_axis(A,i,axis=-1)
Out[164]:
array([[1, 2, 3],
       [0, 4, 6]])
In [165]: take_along_axis(B,i,axis=-1)
Out[165]:
array([[4, 1, 3],
       [5, 1, 9]])
Relevant one.

Batch call to array of functions in Python

I am using scipy.interpolate.Rbf, which returns a function, to fit a large number of RBFs to different sets of points, storing the output in a vector of functions, as follows:
import scipy.interpolate as interp

rbf_fit = []
for i in range(0, n):
    # Gets data points for this particular iteration
    data = get_data(i)
    # Fits RBF to data points
    zfun_smooth_rbf = interp.Rbf(data[:, 0], data[:, 1], data[:, 2],
                                 function='linear', smooth=0)
    # Appends RBF function
    rbf_fit.append(zfun_smooth_rbf)
And then I am interested in running all the functions to regress a value for each computed RBF. Currently I use a for loop strategy, similar to what was answered in this question, but this is far from ideal, because it basically runs sequentially:
c = [float(f(x, y)) for f in self.rbf_fit]
Is there any way to run this with a single call? In other words, I need to call all the functions stored in an array at the same time. Something like c = self.rbf_fit[:](x, y)?
I'm going to try to combine the __call__ of 2 rbfi into one call.
In [80]: from scipy.interpolate import Rbf
Make a sample as illustrated in the docs:
In [81]: x, y, z, d = np.random.rand(4, 50)
In [82]: rbfi0 = Rbf(x, y, z, d)
In [83]: xi = yi = zi = np.linspace(0, 1, 20)
In [84]: di0 = rbfi0(xi, yi, zi)
In [85]: di0
Out[85]:
array([ 0.26614249,  0.07429816, -0.01512205,  0.05134466,  0.24213774,
        0.41653342,  0.45280185,  0.34763177,  0.17681661,  0.07186139,
        0.16299749,  0.40416788,  0.641642  ,  0.78828711,  0.79709639,
        0.6530432 ,  0.42473033,  0.24155719,  0.17008326,  0.179932  ])
Make a second sample:
In [86]: x, y, z, d = np.random.rand(4, 50)
In [87]: rbfi1 = Rbf(x, y, z, d)
In [88]: di1 = rbfi1(xi, yi, zi)
In [89]: di1
Out[89]:
array([ 0.38975158,  0.39887118,  0.42430634,  0.48554998,  0.59403568,
        0.71745345,  0.77483525,  0.70657269,  0.53545478,  0.34931526,
        0.28960157,  0.45825098,  0.7538652 ,  0.99950089,  1.14749381,
        1.19019632,  1.12736371,  1.00558691,  0.87811695,  0.77231634])
Look at the key attributes of the rbfi:
In [90]: rbfi0.nodes
Out[90]:
array([ -13.02451018,   -3.05675802,    8.54073071,  -81.47163716,
         -5.74247623,  118.70153224,   -1.39117053,   -3.37170396,
       ....
        -10.08326243,    8.9995743 ,    3.83357612,   -4.59815344,
        -25.09981508,   -2.8753806 ,   -0.63932038,   76.59402274,
          0.26222997,  -30.35280108])
In [91]: rbfi0.nodes.shape
Out[91]: (50,)
In [92]: rbfi1.nodes.shape
Out[92]: (50,)
In [93]: rbfi0.xi.shape
Out[93]: (3, 50)
In [94]: rbfi1.xi.shape
Out[94]: (3, 50)
Construct the variables in the __call__:
In [95]: xa = np.asarray([a.flatten() for a in [xi,yi,zi]], dtype=np.float_)
In [96]: xa.shape
Out[96]: (3, 20)
In [97]: r0 = rbfi0._call_norm(xa, rbfi0.xi)
In [98]: r1 = rbfi1._call_norm(xa, rbfi1.xi)
In [99]: r0.shape
Out[99]: (20, 50)
In [100]: r1.shape
Out[100]: (20, 50)
Compute the norm for both rbfi with one call - by concatenating the xi arrays:
In [102]: r01 = rbfi0._call_norm(xa, np.concatenate((rbfi0.xi, rbfi1.xi),axis=1))
In [103]: r01.shape
Out[103]: (20, 100)
In [104]: np.allclose(r0, r01[:,:50])
Out[104]: True
In [105]: np.allclose(r1, r01[:,50:])
Now do the same for the `nodes` and `dot`:
In [110]: res01 = np.dot(rbfi0._function(r01), np.concatenate((rbfi0.nodes, rbfi1.nodes)))
In [111]: res01.shape
Out[111]: (20,)
Oops. We want two sets of 20; this fits it against all 100 nodes at once. I need to do some reshaping.
In [133]: r01.shape
Out[133]: (20, 100)
In [134]: r01 = r01.reshape(20,2,50)
In [135]: nodes01 = np.concatenate((rbfi0.nodes, rbfi1.nodes))
In [136]: nodes01.shape
Out[136]: (100,)
In [137]: nodes01 = nodes01.reshape(2,50) # should have just stacked them
The _function callables for the 2 rbfi differ, so I have to use them separately:
In [138]: fr01 = [rbfi0._function(r01[:,0,:]), rbfi1._function(r01[:,1,:])]
In [139]: fr01[0].shape
Out[139]: (20, 50)
With more samples this list would be constructed with a list comprehension.
In [140]: fr01 = np.stack(fr01, axis=1)
In [141]: fr01.shape
Out[141]: (20, 2, 50)
Now I can do the np.dot for the combined rbfi:
In [142]: res01 = np.einsum('ijk,jk->ij', fr01, nodes01)
In [143]: res01.shape
Out[143]: (20, 2)
In [144]: np.allclose(res0, res01[:,0])
Out[144]: True
In [145]: np.allclose(res1, res01[:,1])
Out[145]: True
In [149]: di01 = np.stack([rbfi0(xi, yi, zi), rbfi1(xi, yi, zi)],axis=1)
In [150]: di01.shape
Out[150]: (20, 2)
In [151]: np.allclose(di01, res01)
So I've managed to replace the In [149] iteration with the In [138] one. I don't know if that's a time savings or not. It may depend on how costly the _function call is compared to the rest of rbfi.__call__.
In my example
In [131]: rbfi0._function
Out[131]: <bound method Rbf._h_multiquadric of <scipy.interpolate.rbf.Rbf object at 0xab002fac>>
I don't know if your parameters, function='linear', smooth=0, make a difference. If the respective _function attributes are the same, then I could replace the iteration with:
rbfi0._function(r01).reshape(20,2,50)
That gives an idea of how you might speed up the iteration of the rbfi, and maybe even replace it with a 'vector' operation.
It looks like, for the default function, the difference is only in the epsilon value:
In [156]: rbfi0._function??
Signature: rbfi0._function(r)
Source:
def _h_multiquadric(self, r):
    return np.sqrt((1.0/self.epsilon*r)**2 + 1)
File: /usr/local/lib/python3.5/dist-packages/scipy/interpolate/rbf.py
Type: method
In [157]: rbfi0.epsilon
Out[157]: 0.25663331561494024
In [158]: rbfi1.epsilon
Out[158]: 0.26163317529091562
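Putting the pieces together, a hedged sketch of the combined evaluation. It assumes every Rbf shares the same kernel and the same epsilon (not true by default, since epsilon is data-derived), and it leans on private attributes (_call_norm, _function) that may change across scipy versions:
rbfis = [rbfi0, rbfi1]
xi_all = np.concatenate([r.xi for r in rbfis], axis=1)   # (3, 100)
nodes_all = np.stack([r.nodes for r in rbfis])           # (2, 50)
r_all = rbfis[0]._call_norm(xa, xi_all).reshape(20, len(rbfis), 50)
di_all = np.einsum('ijk,jk->ij', rbfis[0]._function(r_all), nodes_all)   # (20, 2)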

Inserting newaxis at variable position in NumPy arrays

Normally, when we know where we should insert the newaxis, we can do a[:, np.newaxis, ...]. Is there a good way to insert the newaxis at a given axis?
Here is how I do it now. I think there must be some much better ways than this:
def addNewAxisAt(x, axis):
    _s = list(x.shape)
    _s.insert(axis, 1)
    return x.reshape(tuple(_s))

def addNewAxisAt2(x, axis):
    ind = [slice(None)]*x.ndim
    ind.insert(axis, np.newaxis)
    return x[ind]
That singleton dimension (dim length = 1) could be inserted into the original array shape with np.insert, directly changing its shape, like so -
x.shape = np.insert(x.shape,axis,1)
Well, we might as well extend this to insert more than one new axis with a bit of an np.diff and np.cumsum trick, like so -
insert_idx = (np.diff(np.append(0,axis))-1).cumsum()+1
x.shape = np.insert(x.shape,insert_idx,1)
Sample runs -
In [151]: def addNewAxisAt(x, axis):
     ...:     insert_idx = (np.diff(np.append(0,axis))-1).cumsum()+1
     ...:     x.shape = np.insert(x.shape,insert_idx,1)
     ...:
In [152]: A = np.random.rand(4,5)
In [153]: addNewAxisAt(A, axis=1)
In [154]: A.shape
Out[154]: (4, 1, 5)
In [155]: A = np.random.rand(5,6,8,9,4,2)
In [156]: addNewAxisAt(A, axis=5)
In [157]: A.shape
Out[157]: (5, 6, 8, 9, 4, 1, 2)
In [158]: A = np.random.rand(5,6,8,9,4,2,6,7)
In [159]: addNewAxisAt(A, axis=(1,3,4,6))
In [160]: A.shape
Out[160]: (5, 1, 6, 1, 1, 8, 1, 9, 4, 2, 6, 7)
np.insert does
slobj = [slice(None)]*ndim
...
slobj[axis] = slice(None, index)
...
new[slobj] = arr[slobj2]
Like your code, it constructs a list of slices, and modifies one or more elements.
apply_along_axis constructs an array, and converts it to indexing tuple
outarr[tuple(i.tolist())] = res
Other numpy functions work this way as well.
My suggestion is to make the initial list large enough to hold the None. Then I don't need to use insert:
In [1076]: x=np.ones((3,2,4),int)
In [1077]: ind=[slice(None)]*(x.ndim+1)
In [1078]: ind[2]=None
In [1080]: x[ind].shape
Out[1080]: (3, 2, 1, 4)
In [1081]: x[tuple(ind)].shape # sometimes converting a list to tuple is wise
Out[1081]: (3, 2, 1, 4)
Turns out there is a np.expand_dims
In [1090]: np.expand_dims(x,2).shape
Out[1090]: (3, 2, 1, 4)
It uses reshape like you do, but creates the new shape with tuple concatenation.
def expand_dims(a, axis):
    a = asarray(a)
    shape = a.shape
    if axis < 0:
        axis = axis + len(shape) + 1
    return a.reshape(shape[:axis] + (1,) + shape[axis:])
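As an aside, newer NumPy (1.18+) lets np.expand_dims take a tuple of axes, which covers the multi-axis case from the earlier answer directly:
A = np.random.rand(5, 6, 8, 9, 4, 2, 6, 7)
np.expand_dims(A, axis=(1, 3, 4, 6)).shape
# (5, 1, 6, 1, 1, 8, 1, 9, 4, 2, 6, 7)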
Timings don't tell me much about which is better. They are in the 2 µs range, where simply wrapping the code in a function makes a difference.

Efficiently slice windows from a 1D numpy array, around indices given by second 2D array

I want to extract multiple slices from the same 1D numpy array, where the slice indices are drawn from a random distribution. Basically, I want to achieve the following:
import numpy as np
import numpy.random
# generate some 1D data
data = np.random.randn(500)
# window size (slices are 2*winsize long)
winsize = 60
# number of slices to take from the data
inds_size = (100, 200)
# get random integers that function as indices into the data
inds = np.random.randint(low=winsize, high=len(data)-winsize, size=inds_size)
# now I want to extract slices of data, running from inds[0,0]-60 to inds[0,0]+60
sliced_data = np.zeros( (winsize*2,) + inds_size )
for k in range(inds_size[0]):
    for l in range(inds_size[1]):
        sliced_data[:,k,l] = data[inds[k,l]-winsize:inds[k,l]+winsize]
# sliced_data.shape is now (120, 100, 200)
The above nested loop works fine, but is very slow. In my real code, I will need to do this thousands of times, for data arrays a lot bigger than these. Is there any way to do this more efficiently?
Note that inds will always be 2D in my case, but after getting the slices I will always be summing over one of these two dimensions, so an approach that only accumulates the sum across the one dimension would be fine.
I found this question and this answer which seem almost the same. However, the question is only about a 1D indexing vector (as opposed to my 2D). Also, the answer lacks a bit of context, as I don't really understand how the suggested as_strided works. Since my problem does not seem uncommon, I thought I'd ask again in the hope of a more explanatory answer rather than just code.
Using as_strided in this way appears to be somewhat faster than Divakar's approach (20 ms vs 35 ms here), although memory usage might be an issue.
data_wins = as_strided(data, shape=(data.size - 2*winsize + 1, 2*winsize), strides=(8, 8))
inds = np.random.randint(low=0, high=data.size - 2*winsize, size=inds_size)
sliced = data_wins[inds]
sliced = sliced.transpose((2, 0, 1)) # to use the same index order as before
Strides are the steps in bytes for the index in each dimension. For example, with an array of shape (x, y, z) and a data type of size d (8 for float64), the strides will ordinarily be (y*z*d, z*d, d), so that the second index steps over whole rows of z items. Setting both values to 8, data_wins[i, j] and data_wins[j, i] will refer to the same memory location.
>>> import numpy as np
>>> from numpy.lib.stride_tricks import as_strided
>>> a = np.arange(10, dtype=np.int8)
>>> as_strided(a, shape=(3, 10 - 2), strides=(1, 1))
array([[0, 1, 2, 3, 4, 5, 6, 7],
       [1, 2, 3, 4, 5, 6, 7, 8],
       [2, 3, 4, 5, 6, 7, 8, 9]], dtype=int8)
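On newer NumPy (1.20+), sliding_window_view wraps this pattern safely, computing the strides for you; a sketch using the question's inds:
from numpy.lib.stride_tricks import sliding_window_view

data_wins = sliding_window_view(data, 2*winsize)        # (len(data) - 2*winsize + 1, 2*winsize)
sliced = data_wins[inds - winsize].transpose(2, 0, 1)   # (120, 100, 200), as in the question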
Here's a vectorized approach using broadcasting -
# Get 3D offsetting array and add to inds for all indices
allinds = inds + np.arange(-60,60)[:,None,None]
# Index into data with all indices for desired output
sliced_dataout = data[allinds]
Runtime test -
In [20]: # generate some 1D data
...: data = np.random.randn(500)
...:
...: # window size (slices are 2*winsize long)
...: winsize = 60
...:
...: # number of slices to take from the data
...: inds_size = (100, 200)
...:
...: # get random integers that function as indices into the data
...: inds=np.random.randint(low=winsize,high=len(data)-winsize, size=inds_size)
...:
In [21]: %%timeit
    ...: sliced_data = np.zeros( (winsize*2,) + inds_size )
    ...: for k in range(inds_size[0]):
    ...:     for l in range(inds_size[1]):
    ...:         sliced_data[:,k,l] = data[inds[k,l]-winsize:inds[k,l]+winsize]
    ...:
10 loops, best of 3: 66.9 ms per loop
In [22]: %%timeit
...: allinds = inds + np.arange(-60,60)[:,None,None]
...: sliced_dataout = data[allinds]
...:
10 loops, best of 3: 24.1 ms per loop
Memory consumption : Compromise solution
If memory consumption is an issue, here's a compromise solution with one loop -
sliced_dataout = np.zeros( (winsize*2,) + inds_size )
for k in range(sliced_dataout.shape[0]):
    sliced_dataout[k] = data[inds-winsize+k]
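And since the question notes that one of the two inds dimensions will be summed over anyway, the full 3d array need never be materialized; a hedged sketch accumulating the sum over the second dimension:
summed = np.zeros((winsize*2, inds_size[0]))
for k in range(winsize*2):
    # data[inds - winsize + k] has shape inds_size; reduce over its second axis
    summed[k] = data[inds - winsize + k].sum(axis=1)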
