How to efficiently concatenate many arange calls in numpy?

I'd like to vectorize calls like numpy.arange(0, cnt_i) over a vector of cnt values and concatenate the results like this snippet:
import numpy
cnts = [1,2,3]
numpy.concatenate([numpy.arange(cnt) for cnt in cnts])
array([0, 0, 1, 0, 1, 2])
Unfortunately the code above is very memory inefficient due to the temporary arrays and list comprehension looping.
Is there a way to do this more efficiently in numpy?

Here's a completely vectorized function:
import numpy as np

def multirange(counts):
    counts = np.asarray(counts)
    # Remove the following line if counts is always strictly positive.
    counts = counts[counts != 0]

    counts1 = counts[:-1]
    reset_index = np.cumsum(counts1)

    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0
    incr[reset_index] = 1 - counts1

    # Reuse the incr array for the final result.
    incr.cumsum(out=incr)
    return incr
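To see why this works, it helps to trace the intermediate arrays for a small input (a sketch; the names match the function above):
import numpy as np

counts = np.array([1, 2, 3])            # output length = 6
# incr after np.ones:                 [1, 1, 1, 1, 1, 1]
# after incr[0] = 0:                  [0, 1, 1, 1, 1, 1]
# reset_index = cumsum([1, 2]) = [1, 3]; 1 - counts1 = [0, -1]
# after incr[reset_index] = ...:      [0, 0, 1, -1, 1, 1]
# the running cumsum then restarts each subrange at 0:
print(np.cumsum([0, 0, 1, -1, 1, 1]))   # [0 0 1 0 1 2]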
Here's a variation of @Developer's answer that only calls arange once:
def multirange_loop(counts):
    counts = np.asarray(counts)
    ranges = np.empty(counts.sum(), dtype=int)
    seq = np.arange(counts.max())
    starts = np.zeros(len(counts), dtype=int)
    starts[1:] = np.cumsum(counts[:-1])
    for start, count in zip(starts, counts):
        ranges[start:start + count] = seq[:count]
    return ranges
And here's the original version, written as a function:
def multirange_original(counts):
    ranges = np.concatenate([np.arange(count) for count in counts])
    return ranges
Demo:
In [296]: multirange_original([1,2,3])
Out[296]: array([0, 0, 1, 0, 1, 2])
In [297]: multirange_loop([1,2,3])
Out[297]: array([0, 0, 1, 0, 1, 2])
In [298]: multirange([1,2,3])
Out[298]: array([0, 0, 1, 0, 1, 2])
Compare timing using a larger array of counts:
In [299]: counts = np.random.randint(1, 50, size=50)
In [300]: %timeit multirange_original(counts)
10000 loops, best of 3: 114 µs per loop
In [301]: %timeit multirange_loop(counts)
10000 loops, best of 3: 76.2 µs per loop
In [302]: %timeit multirange(counts)
10000 loops, best of 3: 26.4 µs per loop

Try the following to solve the memory problem; efficiency is almost the same.
out = np.empty(sum(cnts), dtype=int)
k = 0
for cnt in cnts:
    out[k:k+cnt] = np.arange(cnt)
    k += cnt
so no concatenation is used.
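Wrapped as a function for direct comparison with the versions above (a sketch; multirange_preallocate is just an illustrative name):
import numpy as np

def multirange_preallocate(cnts):
    # Fill one preallocated buffer instead of concatenating temporaries.
    out = np.empty(sum(cnts), dtype=int)
    k = 0
    for cnt in cnts:
        out[k:k + cnt] = np.arange(cnt)
        k += cnt
    return out

assert np.array_equal(multirange_preallocate([1, 2, 3]),
                      np.array([0, 0, 1, 0, 1, 2]))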

np.tril_indices pretty much does this for you:
In [28]: def f(c):
   ....:     return np.tril_indices(c, -1)[1]
In [29]: f(10)
Out[29]:
array([0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0, 1,
       2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 8])
In [33]: %timeit multirange(range(10))
10000 loops, best of 3: 93.2 us per loop
In [34]: %timeit f(10)
10000 loops, best of 3: 68.5 us per loop
This is much faster than @Warren Weckesser's multirange when the dimension is small, but it becomes much slower when the dimension is larger (@hpaulj, you have a very good point):
In [36]: %timeit multirange(range(1000))
100 loops, best of 3: 5.62 ms per loop
In [37]: %timeit f(1000)
10 loops, best of 3: 68.6 ms per loop
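Note that np.tril_indices(c, -1)[1] walks the strictly lower triangle row by row, so row r contributes columns 0 through r-1; f(c) therefore only covers the special case where the counts are exactly 0, 1, ..., c-1 rather than arbitrary counts. A quick check of that equivalence (a sketch):
import numpy as np

c = 6
expected = np.concatenate([np.arange(r) for r in range(c)])
assert np.array_equal(np.tril_indices(c, -1)[1], expected)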


Efficient way of creating a matrix in numpy by duplicating a given array along a diagonal

I have some array that I already know, say:
a = np.array([1,2,3])
I also know that I want a matrix whose rows have total length len(a) + some amount n, with n + 1 rows, like so:
n = 4
length = len(a) + n
width = n + 1
I'm looking to create an array that looks like this:
array([[1, 2, 3, 0, 0, 0, 0],
       [0, 1, 2, 3, 0, 0, 0],
       [0, 0, 1, 2, 3, 0, 0],
       [0, 0, 0, 1, 2, 3, 0],
       [0, 0, 0, 0, 1, 2, 3]])
Unfortunately numpy.kron and in general block diagonals are not what I'm looking for, since that would cause the next row to increment by 3 instead of 1.
I have a way of doing it where I can create each row of the matrix using a for loop and stacking the resulting arrays on top of each other, as well as a method where I use scipy.sparse.diags to create the array with, again, a for loop, but I was wondering if there was a more efficient method.
Here's one with np.lib.stride_tricks.as_strided that gives us views into a zeros-padded array, and as such is very efficient, both memory-wise and performance-wise -
def sliding_windows(a, n=4):
    length = len(a) + n
    width = n + 1
    z_pad = np.zeros(n, dtype=a.dtype)
    ac = np.r_[z_pad, a, z_pad]
    s = ac.strides[0]
    strided = np.lib.stride_tricks.as_strided
    return strided(ac[n:], shape=(width, length), strides=(-s, s), writeable=False)
If you need a writable version, simply make a copy with sliding_windows(a, n=4).copy().
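Because as_strided hands back a view with a negative row stride (each row starts one element earlier in the padded buffer), no row data is ever copied. A quick standalone way to confirm that, assuming NumPy >= 1.11 for np.shares_memory:
import numpy as np

a = np.array([1, 2, 3])
n = 4
z_pad = np.zeros(n, dtype=a.dtype)
ac = np.r_[z_pad, a, z_pad]
s = ac.strides[0]
out = np.lib.stride_tricks.as_strided(
    ac[n:], shape=(n + 1, len(a) + n), strides=(-s, s), writeable=False)
# The rows are overlapping views into ac -- nothing was copied.
print(np.shares_memory(out, ac))  # True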
Sample runs -
In [42]: a
Out[42]: array([1, 2, 3])
In [43]: sliding_windows(a, n=4)
Out[43]:
array([[1, 2, 3, 0, 0, 0, 0],
       [0, 1, 2, 3, 0, 0, 0],
       [0, 0, 1, 2, 3, 0, 0],
       [0, 0, 0, 1, 2, 3, 0],
       [0, 0, 0, 0, 1, 2, 3]])
In [44]: sliding_windows(a, n=5)
Out[44]:
array([[1, 2, 3, 0, 0, 0, 0, 0],
       [0, 1, 2, 3, 0, 0, 0, 0],
       [0, 0, 1, 2, 3, 0, 0, 0],
       [0, 0, 0, 1, 2, 3, 0, 0],
       [0, 0, 0, 0, 1, 2, 3, 0],
       [0, 0, 0, 0, 0, 1, 2, 3]])
One more with array-assignment, which should be good if you need a writable version -
def sliding_windows_arrassign(a, n=4):
    pad_length = len(a) + n + 1
    width = n + 1
    p = np.zeros((width, pad_length), dtype=a.dtype)
    p[:, :len(a)] = a
    return p.ravel()[:-n-1].reshape(width, -1)
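The ravel/reshape step is what produces the one-column shift: each padded row is one element wider than the final row width len(a) + n, so dropping the trailing n + 1 elements from the flat buffer and re-wrapping it advances every subsequent row by exactly one slot. A small standalone trace:
import numpy as np

a = np.array([1, 2, 3])
n = 4
p = np.zeros((n + 1, len(a) + n + 1), dtype=a.dtype)
p[:, :len(a)] = a
# Each row of p is one element wider than the target width (len(a) + n),
# so reading the flat buffer in rows of the target width shifts each
# subsequent row one column to the right.
print(p.ravel()[:-n - 1].reshape(n + 1, -1))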
Benchmarking on larger arrays
1) 100 elements and similar n :
In [101]: a = np.arange(1,101)
In [102]: %timeit sliding_windows(a, n=len(a)+1)
100000 loops, best of 3: 17.6 µs per loop
In [103]: %timeit sliding_windows_arrassign(a, n=len(a)+1)
100000 loops, best of 3: 8.63 µs per loop
# @Julien's soln
In [104]: %%timeit
...: n = len(a)+1
...: m = np.tile(np.hstack((a,np.zeros(n+1))),n+1)[:(n+len(a))*(n+1)]
...: m.shape = (n+1, n+len(a))
100000 loops, best of 3: 15 µs per loop
2) ~5000 elements and similar n :
In [82]: a = np.arange(1,5000)
In [83]: %timeit sliding_windows(a, n=len(a)+1)
10000 loops, best of 3: 23.2 µs per loop
In [84]: %timeit sliding_windows_arrassign(a, n=len(a)+1)
10 loops, best of 3: 28.9 ms per loop
# @Julien's soln
In [91]: %%timeit
...: n = len(a)+1
...: m = np.tile(np.hstack((a,np.zeros(n+1))),n+1)[:(n+len(a))*(n+1)]
...: m.shape = (n+1, n+len(a))
10 loops, best of 3: 34.3 ms per loop
np.lib.stride_tricks.as_strided would have a constant runtime irrespective of the array length owing to the memory efficiency discussed earlier.
Here's another, more concise one; not sure how it compares for efficiency though:
a = np.array([1,2,3])
n = 4
m = np.tile(np.hstack((a,np.zeros(n+1))),n+1)[:(n+len(a))*(n+1)]
m.shape = (n+1, n+len(a))
Efficiency comparison (writable version):
import numpy as np
a = np.arange(100)
n = 100
def Julien(a, n=4):
    m = np.tile(np.hstack((a, np.zeros(n+1))), n+1)[:(n+len(a))*(n+1)]
    m.shape = (n+1, n+len(a))
    return m

def Divakar(a, n=4):
    length = len(a) + n
    width = n + 1
    z_pad = np.zeros(n, dtype=a.dtype)
    ac = np.r_[z_pad, a, z_pad]
    s = ac.strides[0]
    strided = np.lib.stride_tricks.as_strided
    return strided(ac[n:], shape=(width, length), strides=(-s, s))
%timeit Julien(a)
%timeit Divakar(a)
18.1 µs ± 333 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
23.4 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Here is an inbuilt (sort of) method. Not the fastest, though:
>>> import scipy.linalg as sl
>>> def f_pp(a, n=4):
...     pad = np.zeros((n,), a.dtype)
...     return sl.toeplitz(*map(np.concatenate, ((a[:1], pad), (a, pad))))
...
>>> f_pp(np.array([1,2,3]))
array([[1, 2, 3, 0, 0, 0, 0],
       [0, 1, 2, 3, 0, 0, 0],
       [0, 0, 1, 2, 3, 0, 0],
       [0, 0, 0, 1, 2, 3, 0],
       [0, 0, 0, 0, 1, 2, 3]])
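Unpacking the map/concatenate one-liner may make it clearer: scipy.linalg.toeplitz builds the matrix from its first column and first row, which here are [a[0], 0, ..., 0] and [a..., 0, ..., 0] respectively. A sketch:
import numpy as np
import scipy.linalg as sl

a = np.array([1, 2, 3])
n = 4
pad = np.zeros(n, a.dtype)
first_col = np.concatenate((a[:1], pad))  # [1, 0, 0, 0, 0]
first_row = np.concatenate((a, pad))      # [1, 2, 3, 0, 0, 0, 0]
print(sl.toeplitz(first_col, first_row))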

Number of unique elements per row in a NumPy array

For example, for
a = np.array([[1, 0, 0], [1, 0, 0], [2, 3, 4]])
I want to get
[2, 2, 3]
Is there a way to do this without for loops or using np.vectorize?
Edit: Actual data consists of 1000 rows of 100 elements each, with each element ranging from 1 to 365. The ultimate goal is to determine the percentage of rows that have duplicates. This was a homework problem which I already solved (with a for loop), but I was just wondering if there was a better way to do it with numpy.
Approach #1
One vectorized approach with sorting -
In [8]: b = np.sort(a,axis=1)
In [9]: (b[:,1:] != b[:,:-1]).sum(axis=1)+1
Out[9]: array([2, 2, 3])
Approach #2
Another method, for ints that aren't very large, is to offset each row by an amount that makes its elements distinct from those of every other row, then do a binned summation and count the number of non-zero bins per row -
n = a.max()+1
a_off = a+(np.arange(a.shape[0])[:,None])*n
M = a.shape[0]*n
out = (np.bincount(a_off.ravel(), minlength=M).reshape(-1,n)!=0).sum(1)
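A small worked example may make the offsetting clearer: with n = a.max() + 1, adding i*n to row i moves its values into a private bin range [i*n, (i+1)*n), so one global bincount can count every row at once (a sketch):
import numpy as np

a = np.array([[1, 0, 0],
              [1, 0, 0],
              [2, 3, 4]])
n = a.max() + 1                                   # 5 bins per row
a_off = a + np.arange(a.shape[0])[:, None] * n
# Row 0 now lives in bins 0-4, row 1 in 5-9, row 2 in 10-14:
# [[ 1,  0,  0], [ 6,  5,  5], [12, 13, 14]]
counts = np.bincount(a_off.ravel(), minlength=a.shape[0] * n)
print((counts.reshape(-1, n) != 0).sum(1))        # [2 2 3]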
Runtime test
Approaches as funcs -
def sorting(a):
    b = np.sort(a, axis=1)
    return (b[:,1:] != b[:,:-1]).sum(axis=1) + 1

def bincount(a):
    n = a.max() + 1
    a_off = a + (np.arange(a.shape[0])[:,None]) * n
    M = a.shape[0] * n
    return (np.bincount(a_off.ravel(), minlength=M).reshape(-1, n) != 0).sum(1)

# From @wim's post
def pandas(a):
    df = pd.DataFrame(a.T)
    return df.nunique()

# @jp_data_analysis's soln
def numpy_apply(a):
    return np.apply_along_axis(compose(len, np.unique), 1, a)
Case #1 : Square shaped one
In [164]: np.random.seed(0)
In [165]: a = np.random.randint(0,5,(10000,10000))
In [166]: %timeit numpy_apply(a)
...: %timeit sorting(a)
...: %timeit bincount(a)
...: %timeit pandas(a)
1 loop, best of 3: 1.82 s per loop
1 loop, best of 3: 1.93 s per loop
1 loop, best of 3: 354 ms per loop
1 loop, best of 3: 879 ms per loop
Case #2 : Large number of rows
In [167]: np.random.seed(0)
In [168]: a = np.random.randint(0,5,(1000000,10))
In [169]: %timeit numpy_apply(a)
...: %timeit sorting(a)
...: %timeit bincount(a)
...: %timeit pandas(a)
1 loop, best of 3: 8.42 s per loop
10 loops, best of 3: 153 ms per loop
10 loops, best of 3: 66.8 ms per loop
1 loop, best of 3: 53.6 s per loop
Extending to number of unique elements per column
To extend, we just need to do the slicing and ufunc operations along the other axis for the two proposed approaches, like so -
def nunique_percol_sort(a):
    b = np.sort(a, axis=0)
    return (b[1:] != b[:-1]).sum(axis=0) + 1

def nunique_percol_bincount(a):
    n = a.max() + 1
    a_off = a + (np.arange(a.shape[1])) * n
    M = a.shape[1] * n
    return (np.bincount(a_off.ravel(), minlength=M).reshape(-1, n) != 0).sum(1)
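A quick sanity check of the per-column sort version on the example array from the question (a minimal sketch; the function is restated so the snippet runs on its own):
import numpy as np

def nunique_percol_sort(a):
    b = np.sort(a, axis=0)
    return (b[1:] != b[:-1]).sum(axis=0) + 1

a = np.array([[1, 0, 0],
              [1, 0, 0],
              [2, 3, 4]])
# Columns are [1, 1, 2], [0, 0, 3] and [0, 0, 4] -> 2 unique values each.
print(nunique_percol_sort(a))  # [2 2 2]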
Generic ndarray with generic axis
Let's see how we can extend this to an ndarray of generic dimensionality and get the number of unique values along a generic axis. We'll make use of np.diff with its axis param to get the consecutive differences, like so -
def nunique(a, axis):
    return (np.diff(np.sort(a, axis=axis), axis=axis) != 0).sum(axis=axis) + 1
Sample runs -
In [77]: a
Out[77]:
array([[1, 0, 2, 2, 0],
       [1, 0, 1, 2, 0],
       [0, 0, 0, 0, 2],
       [1, 2, 1, 0, 1],
       [2, 0, 1, 0, 0]])
In [78]: nunique(a, axis=0)
Out[78]: array([3, 2, 3, 2, 3])
In [79]: nunique(a, axis=1)
Out[79]: array([3, 3, 2, 3, 3])
If you are working with floating-point numbers and want to base uniqueness on some tolerance value rather than an exact match, we can use np.isclose. Two such options would be -
(~np.isclose(np.diff(np.sort(a,axis=axis),axis=axis),0)).sum(axis)+1
a.shape[axis]-np.isclose(np.diff(np.sort(a,axis=axis),axis=axis),0).sum(axis)
For a custom tolerance, pass rtol/atol to np.isclose.
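Wrapped as a function with the tolerance exposed (a sketch; nunique_isclose is an illustrative name, and rtol/atol are np.isclose's standard arguments):
import numpy as np

def nunique_isclose(a, axis, rtol=1e-5, atol=1e-8):
    # Sort, then treat consecutive values within tolerance as duplicates.
    d = np.diff(np.sort(a, axis=axis), axis=axis)
    return (~np.isclose(d, 0, rtol=rtol, atol=atol)).sum(axis) + 1

a = np.array([[1.0, 1.0 + 1e-9, 2.0]])
print(nunique_isclose(a, axis=1))  # [2]: 1.0 and 1.0 + 1e-9 merge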
This solution via np.apply_along_axis isn't vectorised and involves a Python-level loop, but it is relatively intuitive, using just len and np.unique.
import numpy as np
from toolz import compose
a = np.array([[1, 0, 0], [1, 0, 0], [2, 3, 4]])
np.apply_along_axis(compose(len, np.unique), 1, a) # [2, 2, 3]
A one-liner using sort:
In [6]: np.count_nonzero(np.diff(np.sort(a)), axis=1)+1
Out[6]: array([2, 2, 3])
Are you open to considering pandas? Dataframes have a dedicated method for this
>>> a = np.array([[1, 0, 0], [1, 0, 0], [2, 3, 4]])
>>> df = pd.DataFrame(a.T)
>>> print(*df.nunique())
2 2 3
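If you'd rather skip the transpose, DataFrame.nunique also accepts an axis argument (available in reasonably recent pandas), so you can count unique values per row directly; a sketch:
import numpy as np
import pandas as pd

a = np.array([[1, 0, 0], [1, 0, 0], [2, 3, 4]])
# axis=1 counts unique values within each row, so no transpose is needed.
print(pd.DataFrame(a).nunique(axis=1).to_numpy())  # [2 2 3]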

Find the index of the n smallest values in a 3 dimensional numpy array

Given a 3-dimensional numpy array, how do you find the indices of the top n smallest values? The index of the minimum value can be found as:
i,j,k = np.where(my_array == my_array.min())
Here's one approach for generic n-dims and generic N smallest numbers -
def smallestN_indices(a, N):
    idx = a.ravel().argsort()[:N]
    return np.stack(np.unravel_index(idx, a.shape)).T
Each row of the 2D output array holds the indexing tuple that corresponds to one of the smallest array values.
We can also use argpartition, but that might not maintain the order. So, we need a bit of additional work with argsort there -
def smallestN_indices_argpartition(a, N, maintain_order=False):
    idx = np.argpartition(a.ravel(), N)[:N]
    if maintain_order:
        idx = idx[a.ravel()[idx].argsort()]
    return np.stack(np.unravel_index(idx, a.shape)).T
Sample run -
In [141]: np.random.seed(1234)
...: a = np.random.randint(111,999,(2,5,4,3))
...:
In [142]: smallestN_indices(a, N=3)
Out[142]:
array([[0, 3, 2, 0],
       [1, 2, 3, 0],
       [1, 2, 2, 1]])
In [143]: smallestN_indices_argpartition(a, N=3)
Out[143]:
array([[1, 2, 3, 0],
       [0, 3, 2, 0],
       [1, 2, 2, 1]])
In [144]: smallestN_indices_argpartition(a, N=3, maintain_order=True)
Out[144]:
array([[0, 3, 2, 0],
       [1, 2, 3, 0],
       [1, 2, 2, 1]])
Runtime test -
In [145]: a = np.random.randint(111,999,(20,50,40,30))
In [146]: %timeit smallestN_indices(a, N=3)
...: %timeit smallestN_indices_argpartition(a, N=3)
...: %timeit smallestN_indices_argpartition(a, N=3, maintain_order=True)
...:
10 loops, best of 3: 97.6 ms per loop
100 loops, best of 3: 8.32 ms per loop
100 loops, best of 3: 8.34 ms per loop

dot product of two 1D vectors in numpy

I'm working with numpy in python to calculate a vector multiplication.
I have a vector x of dimensions n x 1 and I want to calculate x*x_transpose.
This gives me problems because x.T or x.transpose() doesn't affect a 1 dimensional vector (numpy represents vertical and horizontal vectors the same way).
But how do I calculate a (n x 1) x (1 x n) vector multiplication in numpy?
numpy.dot(x,x.T) gives a scalar, not a 2D matrix as I want.
You are essentially computing an Outer Product.
You can use np.outer.
In [15]: a=[1,2,3]
In [16]: np.outer(a,a)
Out[16]:
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
While np.outer is the simplest way to do this, I thought I'd just mention how you might manipulate the (N,)-shaped array to do this:
In [17]: a = np.arange(4)
In [18]: np.dot(a[:,None], a[None,:])
Out[18]:
array([[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 4, 6],
       [0, 3, 6, 9]])
In [19]: np.outer(a,a)
Out[19]:
array([[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 4, 6],
       [0, 3, 6, 9]])
Where you could alternatively replace None with np.newaxis.
Another more exotic way to do this is with np.einsum:
In [20]: np.einsum('i,j', a, a)
Out[20]:
array([[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 4, 6],
       [0, 3, 6, 9]])
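In einsum notation, 'i,j' is the implicit form of 'i,j->ij': no index is repeated, so nothing gets summed and both axes survive, giving out[i, j] = a[i] * a[j]. A quick check:
import numpy as np

a = np.arange(4)
# Implicit 'i,j' and explicit 'i,j->ij' produce the same outer product.
print(np.array_equal(np.einsum('i,j', a, a),
                     np.einsum('i,j->ij', a, a)))  # True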
And just for fun, some timings, which are likely going to vary based on hardware and numpy version/compilation:
Small-ish vector
In [36]: a = np.arange(5, dtype=np.float64)
In [37]: %timeit np.outer(a,a)
100000 loops, best of 3: 17.7 µs per loop
In [38]: %timeit np.dot(a[:,None],a[None,:])
100000 loops, best of 3: 11 µs per loop
In [39]: %timeit np.einsum('i,j', a, a)
1 loops, best of 3: 11.9 µs per loop
In [40]: %timeit a[:, None] * a
100000 loops, best of 3: 9.68 µs per loop
And something a little larger
In [42]: a = np.arange(500, dtype=np.float64)
In [43]: %timeit np.outer(a,a)
1000 loops, best of 3: 605 µs per loop
In [44]: %timeit np.dot(a[:,None],a[None,:])
1000 loops, best of 3: 1.29 ms per loop
In [45]: %timeit np.einsum('i,j', a, a)
1000 loops, best of 3: 359 µs per loop
In [46]: %timeit a[:, None] * a
1000 loops, best of 3: 597 µs per loop
If you want an inner product, use numpy.dot(x, x); for an outer product, use numpy.outer(x, x).
Another alternative is to define the row / column vector with 2-dimensions, e.g.
a = np.array([1, 2, 3], ndmin=2)
np.dot(a.T, a)
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
Another alternative is to use numpy.matrix
>>> a = np.matrix([1,2,3])
>>> a
matrix([[1, 2, 3]])
>>> a.T * a
matrix([[1, 2, 3],
        [2, 4, 6],
        [3, 6, 9]])
Generally, use of numpy.array is preferred. However, numpy.matrix can be more readable for long expressions; note that numpy.matrix is no longer recommended in modern NumPy, which favors plain arrays with the @ operator.

Mask numpy array based on index

How do I mask an array based on the actual index values?
That is, if I have a 10 x 10 x 30 matrix and I want to mask the array when the first and second index equal each other.
For example, [1, 1 , :] should be masked because 1 and 1 equal each other but [1, 2, :] should not because they do not.
I'm only asking this with the third dimension because it resembles my current problem and may complicate things. But my main question is, how to mask arrays based on the value of the indices?
In general, to access the value of the indices, you can use np.meshgrid:
i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')
m.mask = (i == j)
The advantage of this method is that it works for arbitrary boolean functions on i, j, and k. It is a bit slower than the use of the identity special case.
In [56]: %%timeit
....: i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')
....: i == j
10000 loops, best of 3: 96.8 µs per loop
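For a self-contained run, m needs to be a masked array to begin with; a minimal setup with a random 10 x 10 x 30 array, as in the question (a sketch):
import numpy as np

m = np.ma.masked_array(np.random.rand(10, 10, 30))
i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')
m.mask = (i == j)
print(m[1, 1, :].count())  # 0: everything with equal first/second index is masked
print(m[1, 2, :].count())  # 30: nothing masked here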
As @Jaime points out, meshgrid supports a sparse option, which doesn't do so much duplication, but it requires a bit more care in some cases because the sparse grids don't expand to the full shape on their own. It will save memory and speed things up a little. For example,
In [77]: x = np.arange(5)
In [78]: np.meshgrid(x, x)
Out[78]:
[array([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]),
 array([[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4]])]
In [79]: np.meshgrid(x, x, sparse=True)
Out[79]:
[array([[0, 1, 2, 3, 4]]),
 array([[0],
        [1],
        [2],
        [3],
        [4]])]
So, you can use the sparse version as he says, but you must force the broadcasting as such:
i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij', sparse=True)
m.mask = np.repeat(i==j, k.size, axis=2)
And the speedup:
In [84]: %%timeit
....: i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij', sparse=True)
....: np.repeat(i==j, k.size, axis=2)
10000 loops, best of 3: 73.9 µs per loop
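An alternative to np.repeat for expanding the sparse comparison (my addition, not from the original answer) is np.broadcast_to, which produces a read-only broadcast view instead of copying:
import numpy as np

m = np.ma.masked_array(np.zeros((10, 10, 30)))
i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij', sparse=True)
# (i == j) has shape (10, 10, 1); broadcast_to expands it to (10, 10, 30)
# as a view. Assigning to m.mask copies it, so read-only is fine.
m.mask = np.broadcast_to(i == j, m.shape)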
In your special case of wanting to mask the diagonals, you can use the np.identity() function, which returns ones along the diagonal. Since you have the third dimension, we have to add that third dimension to the identity matrix:
m.mask = np.identity(10)[...,None]*np.ones((1,1,30))
There might be a better way of constructing that array, but it is basically stacking 30 of the np.identity(10) array. For example, this is equivalent:
np.dstack((np.identity(10),)*30)
but slower:
In [30]: timeit np.identity(10)[...,None]*np.ones((1,1,30))
10000 loops, best of 3: 40.7 µs per loop
In [31]: timeit np.dstack((np.identity(10),)*30)
1000 loops, best of 3: 219 µs per loop
And @Ophion's suggestions
In [33]: timeit np.tile(np.identity(10)[...,None], 30)
10000 loops, best of 3: 63.2 µs per loop
In [71]: timeit np.repeat(np.identity(10)[...,None], 30, axis=2)
10000 loops, best of 3: 45.3 µs per loop
