What is the most efficient way to reshape data to fencepost with numpy?
data = np.array([1, 2, 3, 4, 5])
fencepost = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
You can achieve the same result simply by looking at the same data differently:
>>> from numpy.lib.stride_tricks import as_strided
>>> fencepost = as_strided(data, shape=(data.shape[0]-1, 2),
...                        strides=(data.strides[0],)*2)
>>> fencepost
array([[1, 2],
       [2, 3],
       [3, 4],
       [4, 5]])
No data is being copied, so especially for very large arrays, this is going to be about as quick as it gets. And if you do need a separate copy, you can simply do fencepost = fencepost.copy() and let numpy handle everything internally for you:
In [11]: data = np.arange(10000000)
In [12]: %timeit as_strided(data, shape=(data.shape[0]-1, 2),
    ...:                    strides=(data.strides[0],)*2)
100000 loops, best of 3: 12.2 us per loop
In [13]: %timeit as_strided(data, shape=(data.shape[0]-1, 2),
    ...:                    strides=(data.strides[0],)*2).copy()
10 loops, best of 3: 183 ms per loop
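If your NumPy is recent enough (1.20+, which is an assumption about your environment), sliding_window_view wraps the same stride trick with shape checking and returns a read-only view, so you don't have to compute strides by hand:
from numpy.lib.stride_tricks import sliding_window_view

fencepost = sliding_window_view(data, 2)   # shape (len(data) - 1, 2), still no copy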
This isn't really reshaping, because the second array has a different number of elements. If the first array has N elements (in this case N=5) the second has 2N-2 (in this case 8).
So you will have to make a new array and populate it accordingly. There are two approaches to this: you can populate column by column, or row by row. Which is more efficient will depend on ... well, let's find out!
Here I use %timeit from IPython with three different array sizes:
import numpy as np

data = np.array([1, 2, 3, 4, 5])
#fencepost = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])

def fp1(data):
    # row by row: one numpy assignment per output row
    f = np.zeros((data.shape[0]-1, 2))
    for i in range(data.shape[0]-1):
        f[i] = data[i:i+2]
    return f

def fp2(data):
    # column by column: just two vectorised assignments in total
    f = np.zeros((data.shape[0]-1, 2))
    f[:,0] = data[:-1]
    f[:,1] = data[1:]
    return f

%timeit fp1(data)
%timeit fp2(data)

data2 = np.array(range(100000))
%timeit fp1(data2)
%timeit fp2(data2)

data3 = np.array(range(10000000))
%timeit fp1(data3)
%timeit fp2(data3)
On my computer, row by row is slightly faster for small arrays, but column by column quickly becomes much, much better as the array grows (hence fp2 is the efficient answer):
100000 loops, best of 3: 13 µs per loop
100000 loops, best of 3: 14.4 µs per loop
1 loops, best of 3: 203 ms per loop
1000 loops, best of 3: 1.09 ms per loop
1 loops, best of 3: 20.7 s per loop
1 loops, best of 3: 253 ms per loop
Essentially, fp2 is faster because it performs only 2 numpy operations, whereas fp1 is a Python loop that makes a separate numpy call for every row. For small arrays, the overhead of 5 calls to numpy is negligible, which is why fp1 holds its own there.
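Incidentally, the two column assignments in fp2 can also be written as a single stacking call; this is just an equivalent sketch, not necessarily faster:
fencepost = np.column_stack((data[:-1], data[1:]))   # same values as fp2(data)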
Is there a more numpythonic way to do this?
#example arrays
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
values = np.array([0.2, 3.0, 1.5])
#get the indices where each value falls between values in arr
between = [np.nonzero(i > arr)[0][-1] for i in values]
For sorted arr, we can use np.searchsorted for performance -
In [67]: np.searchsorted(arr,values)-1
Out[67]: array([0, 2, 1])
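For a self-contained version using the question's arrays (note one edge case: a value at or below arr[0] would map to -1 here, while the original list comprehension would fail on an empty nonzero result):
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
values = np.array([0.2, 3.0, 1.5])

# index of the last element of arr strictly less than each value
idx = np.searchsorted(arr, values) - 1   # array([0, 2, 1])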
Timings on large dataset -
In [81]: np.random.seed(0)
...: arr = np.unique(np.random.randint(0,10000, 10000))
...: values = np.random.randint(0,10000, 1000)
# @Andy L.'s soln
In [84]: %timeit np.argmin(values > arr[:,None], axis=0) - 1
10 loops, best of 3: 28.2 ms per loop
# Original soln
In [82]: %timeit [np.nonzero(i > arr)[0][-1] for i in values]
100 loops, best of 3: 8.68 ms per loop
# From this post
In [83]: %timeit np.searchsorted(arr,values)-1
10000 loops, best of 3: 57.8 µs per loop
Use broadcasting and argmin
np.argmin(values > arr[:,None], axis=0) - 1
Out[32]: array([0, 2, 1], dtype=int32)
Note: I assume arr is monotonically increasing, as in the sample
I've got a numpy array of strictly increasing "cutoff" values of length m, and a pandas series of values of length n (though the index isn't important and it could be cast to a numpy array).
I need to come up with an efficient way of spitting out a length m vector of counts of the number of elements in the pandas series less than the jth element of the "cutoff" array.
I could do this via a list iterator:
output = array([(pan_series < cutoff_val).sum() for cutoff_val in cutoff_ar])
but I was wondering if there were any way to do this that leveraged more of numpy's magic speed, as I have to do this quite a few times inside multiple loops and it keeps crashing my computer.
Thanks!
Is this what you are looking for?
In [36]: a = np.random.random(20)
In [37]: a
Out[37]:
array([ 0.68574307,  0.15743428,  0.68006876,  0.63572484,  0.26279663,
        0.14346269,  0.56267286,  0.47250091,  0.91168387,  0.98915746,
        0.22174062,  0.11930722,  0.30848231,  0.1550406 ,  0.60717858,
        0.23805205,  0.57718675,  0.78075297,  0.17083826,  0.87301963])
In [38]: b = np.array((0.3,0.7))
In [39]: np.sum(a[:,None]<b[None,:], axis=0)
Out[39]: array([ 8, 16])
In [40]: np.sum(a[:,None]<b, axis=0) # b's new axis above is unnecessary...
Out[40]: array([ 8, 16])
In [41]: (a[:,None]<b).sum(axis=0) # even simpler
Out[41]: array([ 8, 16])
Timings are always well received (for a longish array of 2E6 elements):
In [47]: a = np.random.random(2000000)
In [48]: %timeit (a[:,None]<b).sum(axis=0)
10 loops, best of 3: 78.2 ms per loop
In [49]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
1 loop, best of 3: 448 ms per loop
For a smaller array
In [50]: a = np.random.random(2000)
In [51]: %timeit (a[:,None]<b).sum(axis=0)
10000 loops, best of 3: 89 µs per loop
In [52]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 141 µs per loop
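Note that the searchsorted timings above include the cost of a.argsort() on every call; if a can be sorted once up front, each call reduces to a plain binary search. A quick sketch (using the same 'right' side as the timed call):
a_sorted = np.sort(a)                            # sort once
counts = np.searchsorted(a_sorted, b, 'right')   # then each query is just a binary search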
Edit
Divakar says that things may be different for lengthy bs; let's see:
In [71]: a = np.random.random(2000)
In [72]: b =np.random.random(200)
In [73]: %timeit (a[:,None]<b).sum(axis=0)
1000 loops, best of 3: 1.44 ms per loop
In [74]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
10000 loops, best of 3: 172 µs per loop
Quite different indeed! Thank you for prompting my curiosity.
The OP should probably test for their own use case: is the sample very long relative to the cutoff sequence or not, and where is the balance point?
Edit #2
I made a blooper in my timings: I forgot the axis=0 argument to .sum()...
I've edited the timings with the corrected statement and, of course, the corrected timing. My apologies.
You can use np.searchsorted for some NumPy magic -
# Convert to numpy array for some "magic"
pan_series_arr = np.array(pan_series)
# Let the magic begin!
sortidx = pan_series_arr.argsort()
out = np.searchsorted(pan_series_arr,cutoff_ar,'right',sorter=sortidx)
Explanation
You are performing [(pan_series < cutoff_val).sum() for cutoff_val in cutoff_ar], i.e. for each element in cutoff_ar you count the number of pan_series elements that are less than it. With np.searchsorted we instead ask where each element of cutoff_ar would be inserted into a sorted pan_series_arr, taking the right-most valid insertion point. Each such insertion index equals the number of pan_series elements at or below the corresponding cutoff element, which (when there are no ties, as in the sample) is exactly the desired count.
Sample run
In [302]: cutoff_ar
Out[302]: array([ 1, 3, 9, 44, 63, 90])
In [303]: pan_series_arr
Out[303]: array([ 2, 8, 69, 55, 97])
In [304]: [(pan_series_arr < cutoff_val).sum() for cutoff_val in cutoff_ar]
Out[304]: [0, 1, 2, 2, 3, 4]
In [305]: sortidx = pan_series_arr.argsort()
...: out = np.searchsorted(pan_series_arr,cutoff_ar,'right',sorter=sortidx)
...:
In [306]: out
Out[306]: array([0, 1, 2, 2, 3, 4])
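A minimal variation, if sorting a copy of the data up front is acceptable. Note that side='right' counts elements less than or equal to each cutoff, so for a strict less-than count like the original list comprehension you would use side='left' (the two agree on this sample because it has no ties):
sorted_vals = np.sort(pan_series_arr)                   # sort once
out = np.searchsorted(sorted_vals, cutoff_ar, 'left')   # strict '<' count per cutoff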
>>> import numpy as np
>>> a = np.arange(5)
>>> b = desired_function(a, 4)
array([[0, 3, 4, 1],
       [1, 2, 1, 3],
       [2, 4, 2, 4],
       [3, 1, 3, 0],
       [4, 0, 0, 2]])
What I've tried so far
def repeat_and_shuffle(a, ncols):
    nrows, = a.shape
    m = np.tile(a.reshape(nrows, 1), (1, ncols))
    return m
Somehow I have to shuffle m[:,1:ncols] efficiently by column.
Here is one way to create such an array:
>>> a = np.arange(5)
>>> perms = np.argsort(np.random.rand(a.shape[0], 3), axis=0) # 3 columns
>>> np.hstack((a[:,np.newaxis], a[perms]))
array([[0, 3, 1, 4],
       [1, 2, 3, 0],
       [2, 1, 4, 1],
       [3, 4, 0, 3],
       [4, 0, 2, 2]])
This creates an array of random values of the required shape and then sorts the indices in each column by their corresponding value. This array of indices is then used to index a.
(The idea of using np.argsort to create an array of columns of permuted indices came from @jme's answer here.)
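If you are on NumPy 1.20 or newer (an assumption about your environment), the Generator API can shuffle each column independently without the argsort trick; a rough sketch:
rng = np.random.default_rng()
out = np.tile(a[:, None], (1, 4))               # first column keeps the original order
out[:, 1:] = rng.permuted(out[:, 1:], axis=0)   # shuffle the remaining columns independently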
Build the new array using random permutations of the original.
>>> a = np.arange(5)
>>> n = 4
>>> z = np.array([a]+[np.random.permutation(a) for _ in xrange(n-1)])
>>> z.T
array([[0, 0, 4, 3],
       [1, 1, 3, 0],
       [2, 3, 2, 4],
       [3, 2, 0, 2],
       [4, 4, 1, 1]])
>>>
Duplicate columns are possible because of the randomness.
This is a version of Ashwini Chaudhary's solution:
>>> a = numpy.array(['a', 'b', 'c', 'd', 'e'])
>>> a = numpy.tile(a[:,None], 5)
>>> a[:,1:] = numpy.apply_along_axis(numpy.random.permutation, 0, a[:,1:])
>>> a
array([['a', 'c', 'a', 'd', 'c'],
       ['b', 'd', 'b', 'e', 'a'],
       ['c', 'e', 'd', 'a', 'e'],
       ['d', 'a', 'e', 'b', 'd'],
       ['e', 'b', 'c', 'c', 'b']],
      dtype='|S1')
I think it's well-conceived and pedagogically useful (and I hope he undeletes it). But somewhat surprisingly, it's consistently the slowest one in the tests I've performed. Definitions:
>>> def column_perms_along(a, cols):
...     a = numpy.tile(a[:,None], cols)
...     a[:,1:] = numpy.apply_along_axis(numpy.random.permutation, 0, a[:,1:])
...     return a
...
>>> def column_perms_argsort(a, cols):
...     perms = np.argsort(np.random.rand(a.shape[0], cols - 1), axis=0)
...     return np.hstack((a[:,None], a[perms]))
...
>>> def column_perms_lc(a, cols):
...     z = np.array([a] + [np.random.permutation(a) for _ in xrange(cols - 1)])
...     return z.T
...
For small arrays and few columns:
>>> %timeit column_perms_along(a, 5)
1000 loops, best of 3: 272 µs per loop
>>> %timeit column_perms_argsort(a, 5)
10000 loops, best of 3: 23.7 µs per loop
>>> %timeit column_perms_lc(a, 5)
1000 loops, best of 3: 165 µs per loop
For small arrays and many columns:
>>> %timeit column_perms_along(a, 500)
100 loops, best of 3: 29.8 ms per loop
>>> %timeit column_perms_argsort(a, 500)
10000 loops, best of 3: 185 µs per loop
>>> %timeit column_perms_lc(a, 500)
100 loops, best of 3: 11.7 ms per loop
For big arrays and few columns:
>>> A = numpy.arange(1000)
>>> %timeit column_perms_along(A, 5)
1000 loops, best of 3: 2.97 ms per loop
>>> %timeit column_perms_argsort(A, 5)
1000 loops, best of 3: 447 µs per loop
>>> %timeit column_perms_lc(A, 5)
100 loops, best of 3: 2.27 ms per loop
And for big arrays and many columns:
>>> %timeit column_perms_along(A, 500)
1 loops, best of 3: 281 ms per loop
>>> %timeit column_perms_argsort(A, 500)
10 loops, best of 3: 71.5 ms per loop
>>> %timeit column_perms_lc(A, 500)
1 loops, best of 3: 269 ms per loop
The moral of the story: always test! I imagine that for extremely large arrays, the disadvantage of an n log n solution like sorting might become apparent here. But numpy's implementation of sorting is extremely well-tuned in my experience. I bet you could go up several orders of magnitude before noticing an effect.
Assuming you are ultimately intending to loop over multiple 1D input arrays, you might be able to cache your permutation indices and then just take rather than permute at the point of use. This can work even if the length of the 1D arrays varies: you just need to discard the permutation indices that are too large.
Rough (partially tested) code for implementation:
def permute_multi(X, k, _cache={}):
    """For 1D input `X` of len `n`, it generates a `(k,n)` array
    giving `k` permutations of `X`."""
    n = len(X)
    cached_inds = _cache.get('inds', np.array([[]]))
    # make sure that cached_inds has shape >= (k,n)
    if cached_inds.shape[1] < n:
        _cache['inds'] = cached_inds = np.empty(shape=(k,n), dtype=int)
        for i in xrange(k):
            cached_inds[i,:] = np.random.permutation(n)
    elif cached_inds.shape[0] < k:
        pass # TODO: need to generate more rows
    inds = cached_inds[:k,:] # dispose of excess rows
    if n < cached_inds.shape[1]:
        # dispose of high indices
        inds = inds.compress(inds.ravel() < n).reshape((k,n))
    return X[inds]
Depending on your usage, you might want to provide some way of clearing the cache, or at least some heuristic that can spot when the cached n and k have grown much larger than most of the common inputs. Note that the above function gives (k,n), not (n,k); this is because numpy defaults to rows being contiguous and we want the n-dimension to be contiguous. You could force Fortran-style ordering if you wish, or just transpose the output (which flips a flag inside the array rather than really moving data).
In terms of whether this caching concept is statistically valid, I believe that in most cases it is probably fine, since it is roughly equivalent to resetting the seed at the start of the function to a fixed constant...but if you are doing anything particularly fancy with the returned array you might need to think carefully before using this approach.
A quick benchmark says that (once warmed up) for n=1000 and k=1000 this takes about 2.2 ms, compared to 150 ms for the full k-loop over np.random.permutation. Which is about 70 times faster...but that's in the simplest case where we don't call compress. For n=999 and k=1000, having warmed up with n=1000, it takes an extra few ms, giving 8ms total time, which is still about 19 times faster than the k-loop.
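A rough usage sketch of the function above, using the benchmark's sizes (the variable names here are just illustrative):
X = np.arange(1000)
perms = permute_multi(X, k=1000)   # shape (1000, 1000); each row is a permutation of X
cols = perms.T                     # transpose if you prefer the (n, k) layout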
Suppose I have an array
import numpy as np
x=np.array([5,7,2])
I want to create an array that contains a sequence of ranges stacked together, with the length of each range given by x:
y=np.hstack([np.arange(1,n+1) for n in x])
Is there some way to do this without the speed penalty of a list comprehension or looping? (x could be a very large array.)
The result should be
y == np.array([1,2,3,4,5,1,2,3,4,5,6,7,1,2])
You could use accumulation:
def my_sequences(x):
    x = x[x != 0]  # you can skip this if you do not have 0s in x.
    # Create result array, filled with ones:
    y = np.cumsum(x, dtype=np.intp)
    a = np.ones(y[-1], dtype=np.intp)
    # Set all beginnings to - previous length:
    a[y[:-1]] -= x[:-1]
    # and just add it all up (btw. np.add.accumulate is equivalent):
    return np.cumsum(a, out=a)  # here, in-place should be safe.
(One word of caution: if your result array would be larger than np.iinfo(np.intp).max, this might, with some bad luck, return wrong results instead of erroring out cleanly...)
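As a quick sanity check on the example from the question:
x = np.array([5, 7, 2])
my_sequences(x)   # array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])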
And because everyone always wants timings (compared to Ophion's method):
In [11]: x = np.random.randint(0, 20, 1000000)
In [12]: %timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
1 loops, best of 3: 753 ms per loop
In [13]: %timeit my_sequences(x)
1 loops, best of 3: 191 ms per loop
Of course, the my_sequences function will not perform badly when the values of x get large.
First idea: prevent multiple calls to np.arange; also, concatenate should be much faster than hstack:
import numpy as np
x=np.array([5,7,2])
>>> a = np.arange(1,x.max()+1)
>>> np.hstack([a[:k] for k in x])
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])
>>> np.concatenate([a[:k] for k in x])
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])
If there are many nonunique values this seems more efficient:
>>> ua, uind = np.unique(x, return_inverse=True)
>>> a = [np.arange(1,k+1) for k in ua]
>>> np.concatenate(np.take(a,uind))
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])
Some timings for your case:
x=np.random.randint(0,20,1000000)
Original code
#Using hstack
%timeit np.hstack([np.arange(1,n+1) for n in x])
1 loops, best of 3: 7.46 s per loop
#Using concatenate
%timeit np.concatenate([np.arange(1,n+1) for n in x])
1 loops, best of 3: 5.27 s per loop
First code:
#Using hstack
%timeit a=np.arange(1,x.max()+1);np.hstack([a[:k] for k in x])
1 loops, best of 3: 3.03 s per loop
#Using concatenate
%timeit a=np.arange(1,x.max()+1);np.concatenate([a[:k] for k in x])
10 loops, best of 3: 998 ms per loop
Second code:
%timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
10 loops, best of 3: 522 ms per loop
Looks like we gain a 14x speedup with the final code.
Small sanity check:
ua,uind=np.unique(x,return_inverse=True)
a=[np.arange(1,k+1) for k in ua]
out=np.concatenate(np.take(a,uind))
>>> out.shape
(9498409,)
>>> np.sum(x)
9498409
I want to broadcast an array b to the shape it would take if it were in an arithmetic operation with another array a.
For example, if a.shape = (3,3) and b was a scalar, I want to get an array whose shape is (3,3) and is filled with the scalar.
One way to do this is like this:
>>> import numpy as np
>>> a = np.arange(9).reshape((3,3))
>>> b = 1 + a*0
>>> b
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
Although this works practically, I can't help but feel it looks a bit weird, and wouldn't be obvious to someone else looking at the code what I was trying to do.
Is there any more elegant way to do this? I've looked at the documentation for np.broadcast, but it's orders of magnitude slower.
In [1]: a = np.arange(10000).reshape((100,100))
In [2]: %timeit 1 + a*0
10000 loops, best of 3: 31.9 us per loop
In [3]: %timeit bc = np.broadcast(a,1);np.fromiter((v for u, v in bc),float).reshape(bc.shape)
100 loops, best of 3: 5.2 ms per loop
In [4]: 5.2e-3/32e-6
Out[4]: 162.5
If you just want to fill an array with a scalar, fill is probably the best choice. But it sounds like you want something more generalized. Rather than using broadcast you can use broadcast_arrays to get the result that (I think) you want.
>>> a = numpy.arange(9).reshape(3, 3)
>>> numpy.broadcast_arrays(a, 1)[1]
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
This generalizes to any two broadcastable shapes:
>>> numpy.broadcast_arrays(a, [1, 2, 3])[1]
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
It's not quite as fast as your ufunc-based method, but it's still on the same order of magnitude:
>>> %timeit 1 + a * 0
10000 loops, best of 3: 23.2 us per loop
>>> %timeit numpy.broadcast_arrays(a, 1)[1]
10000 loops, best of 3: 52.3 us per loop
But for scalars, fill is still the clear front-runner:
>>> %timeit b = numpy.empty_like(a, dtype='i8'); b.fill(1)
100000 loops, best of 3: 6.59 us per loop
Finally, further testing shows that the fastest approach -- in at least some cases -- is to multiply by ones:
>>> %timeit numpy.broadcast_arrays(a, numpy.arange(100))[1]
10000 loops, best of 3: 53.4 us per loop
>>> %timeit (1 + a * 0) * numpy.arange(100)
10000 loops, best of 3: 45.9 us per loop
>>> %timeit b = numpy.ones_like(a, dtype='i8'); b * numpy.arange(100)
10000 loops, best of 3: 28.9 us per loop
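If a read-only result is acceptable and your NumPy has it (1.10+, an assumption), numpy.broadcast_to gives the broadcast shape directly without any copy; a minimal sketch:
import numpy

b_view = numpy.broadcast_to(1, a.shape)            # read-only view filled with the scalar
b_row  = numpy.broadcast_to([1, 2, 3], a.shape)    # works for any broadcastable operand
# call .copy() on either if you need a writable array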
The fastest and cleanest solution I know is:
b_arr = numpy.empty(a.shape) # Empty array
b_arr.fill(b) # Filling with one value
fill sounds like the simplest way:
>>> a = np.arange(9).reshape((3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a.fill(10)
>>> a
array([[10, 10, 10],
       [10, 10, 10],
       [10, 10, 10]])
EDIT: As @EOL points out, you don't need arange if you want to create a new array; np.empty((100,100)) (or whatever shape) is better for this.
Timings:
In [3]: a = np.arange(10000).reshape((100,100))
In [4]: %timeit 1 + a*0
100000 loops, best of 3: 19.9 us per loop
In [5]: a = np.arange(10000).reshape((100,100))
In [6]: %timeit a.fill(1)
100000 loops, best of 3: 3.73 us per loop
If you just need to broadcast a scalar to some arbitrary shape, you can do something like this:
a = b*np.ones(shape=(3,3))
Edit: np.tile is more general. You can use it to duplicate any scalar/vector in any number of dimensions:
b = 1
N = 100
a = np.tile(b, reps=(N, N))
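If all you need is a constant array of a given shape, np.full is a concise option as well (available since NumPy 1.8, which I'm assuming here):
import numpy as np

b = 1
a = np.full((3, 3), b)   # new 3x3 array filled with b
# or, to match an existing array's shape and dtype:
# a = np.full_like(existing, b)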