Python numpy nonzero cumsum

I want to do a nonzero cumsum with a numpy array: simply skip the zeros in the array and apply cumsum to the rest. Suppose I have the np.array
a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])
my result should be
[1,3,4,6,11,0,20,26,0,28,31,0]
I have tried this
a = np.cumsum(a[a!=0])
but the result is
[1,3,4,6,11,20,26,28,31]
Any ideas?

You need to mask the original array so that only the non-zero elements are overwritten:
In [9]:
a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])
a[a!=0] = np.cumsum(a[a!=0])
a
Out[9]:
array([ 1, 3, 4, 6, 11, 0, 20, 26, 0, 28, 31, 0])
Another method is to use np.where:
In [93]:
a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])
a = np.where(a!=0,np.cumsum(a),a)
a
Out[93]:
array([ 1, 3, 4, 6, 11, 0, 20, 26, 0, 28, 31, 0])
timings
In [91]:
%%timeit
a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])
a[a!=0] = np.cumsum(a[a!=0])
a
The slowest run took 4.93 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 12.6 µs per loop
In [94]:
%%timeit
a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])
a = np.where(a!=0,np.cumsum(a),a)
a
The slowest run took 6.00 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 10.5 µs per loop
The timings above show that np.where is marginally quicker than the masking method; note also that the masking version modifies a in place, while np.where returns a new array.

To my mind, jotasi's suggestion in a comment to the OP is the most idiomatic. Here are some timings, though note that Shawn. L's answer returns a Python list, not a NumPy array, so they are not strictly comparable.
import numpy as np
def jotasi(a):
    b = np.cumsum(a)
    b[a==0] = 0
    return b

def EdChum(a):
    a[a!=0] = np.cumsum(a[a!=0])
    return a

def ShawnL(a):
    b = np.cumsum(a)
    b = [b[i] if ((i > 0 and b[i] != b[i-1]) or i==0) else 0 for i in range(len(b))]
    return b

def Ed2(a):
    return np.where(a!=0,np.cumsum(a),a)
To test, I generated a NumPy array of 1E5 integers in [0,100]. Therefore about 1% are 0. These results are from NumPy 1.9.2, Python 2.7.12, and are presented from slowest to fastest:
import timeit
a = np.random.random_integers(0,100,100000)
len(a[a==0]) #verify there are some 0's
1003
timeit.timeit("ShawnL(a)", "from __main__ import a,EdChum,ShawnL,jotasi,Ed2", number=250)
11.743098020553589
timeit.timeit("EdChum(a)", "from __main__ import a,EdChum,ShawnL,jotasi,Ed2", number=250)
0.1794271469116211
timeit.timeit("Ed2(a)", "from __main__ import a,EdChum,ShawnL,jotasi,Ed2", number=250)
0.1282949447631836
timeit.timeit("jotasi(a)", "from __main__ import a,EdChum,ShawnL,jotasi,Ed2", number=250)
0.09286999702453613
I'm a little surprised there's such a big difference between jotasi's and EdChum's answers; minimizing boolean indexing operations is noticeable, I guess. No surprise that the list comprehension is slow.

Just trying to simplify it:)
b=np.cumsum(a)
[b[i] if ((i > 0 and b[i] != b[i-1]) or i==0) else 0 for i in range(len(b))]

Related

Efficiently adding numpy arrays with duplicate destination indices [duplicate]

Suppose I have 2 matrices M and N (both have > 1 columns). I also have an index matrix I with 2 columns -- 1 for M and one for N. The indices for N are unique, but the indices for M may appear more than once. The operation I would like to perform is,
for i,j in w:
    M[i] += N[j]
Is there a more efficient way to do this other than a for loop?
For completeness, in numpy >= 1.8 you can also use np.add's at method:
In [8]: m, n = np.random.rand(2, 10)
In [9]: m_idx, n_idx = np.random.randint(10, size=(2, 20))
In [10]: m0 = m.copy()
In [11]: np.add.at(m, m_idx, n[n_idx])
In [13]: m0 += np.bincount(m_idx, weights=n[n_idx], minlength=len(m))
In [14]: np.allclose(m, m0)
Out[14]: True
In [15]: %timeit np.add.at(m, m_idx, n[n_idx])
100000 loops, best of 3: 9.49 us per loop
In [16]: %timeit np.bincount(m_idx, weights=n[n_idx], minlength=len(m))
1000000 loops, best of 3: 1.54 us per loop
Aside from the obvious performance disadvantage, it has a couple of advantages:
np.bincount converts its weights to double precision floats, while .at will operate with your array's native type. This makes it the simplest option for dealing with, e.g., complex numbers.
np.bincount only adds weights together; there is an at method for every ufunc, so you can repeatedly multiply, or logical_and, or whatever you feel like.
But for your use case, np.bincount is probably the way to go.
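To make those two points concrete, here is a small sketch (the arrays and indices are made up for illustration) showing that .at keeps the array's native dtype and that every ufunc has an at method:
import numpy as np

counts = np.zeros(5, dtype=np.int64)                 # integer accumulator, dtype is preserved
idx = np.array([0, 0, 3, 3, 3])
np.add.at(counts, idx, 1)                            # unbuffered add: duplicate indices accumulate
print(counts)                                        # [2 0 0 3 0]

scale = np.ones(4)
np.multiply.at(scale, [1, 1, 2], [2.0, 5.0, 3.0])    # repeated multiply at index 1
print(scale)                                         # [ 1. 10.  3.  1.]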
Using also m_ind, n_ind = w.T, just do M += np.bincount(m_ind, weights=N[n_ind], minlength=len(M))
For clarity, let's define
>>> m_ind, n_ind = w.T
Then the for loop
for i, j in zip(m_ind, n_ind):
    M[i] += N[j]
updates the entries M[np.unique(m_ind)]. The values that get written to it are N[n_ind], which must be grouped by m_ind. (The fact that there's an n_ind in addition to m_ind is actually tangential to the question; you could just set N = N[n_ind].) There happens to be a SciPy class that does exactly this: scipy.sparse.csr_matrix.
Example data:
>>> m_ind, n_ind = np.array([[0, 0, 1, 1], [2, 3, 0, 1]])
>>> M = np.arange(2, 6)
>>> N = np.logspace(2, 5, 4)
The result of the for loop is that M becomes [110002 1103 4 5]. We get the same result with a csr_matrix as follows. As I said earlier, n_ind isn't relevant, so we get rid of that first.
>>> N = N[n_ind]
>>> from scipy.sparse import csr_matrix
>>> update = csr_matrix((N, m_ind, [0, len(N)])).toarray()
The CSR constructor builds a matrix with the required values at the required indices; the third part of its argument is the compressed row pointer (indptr), meaning that the values N[0:len(N)] all belong to row 0 and carry the column indices m_ind[0:len(N)]. Duplicates are summed:
>>> update
array([[ 110000., 1100.]])
This has shape (1, len(np.unique(m_ind))) and can be added in directly:
>>> M[np.unique(m_ind)] += update.ravel()
>>> M
array([110002, 1103, 4, 5])
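For comparison with the np.add.at approach from the answer above, the same worked example can be reproduced as a one-liner; this is just a sketch, with M made a float array so the float weights add without a casting error:
import numpy as np

m_ind, n_ind = np.array([[0, 0, 1, 1], [2, 3, 0, 1]])
M = np.arange(2.0, 6.0)            # float dtype so the logspace weights add cleanly
N = np.logspace(2, 5, 4)

np.add.at(M, m_ind, N[n_ind])      # unbuffered add: duplicate indices in m_ind are summed
print(M)                           # [1.10002e+05 1.10300e+03 4.00000e+00 5.00000e+00]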

Cannot understand numpy argpartition output

I am trying to use argpartition from numpy, but it seems something is going wrong and I cannot figure it out. Here is what's happening:
These are first 5 elements of the sorted array norms
np.sort(norms)[:5]
array([ 53.64759445, 54.91434479, 60.11617279, 64.09630585, 64.75318909], dtype=float32)
But when I use indices_sorted = np.argpartition(norms, 5)[:5]
norms[indices_sorted]
array([ 60.11617279, 64.09630585, 53.64759445, 54.91434479, 64.75318909], dtype=float32)
When I think I should get the same result as the sorted array?
It works just fine when I use 3 as the parameter indices_sorted = np.argpartition(norms, 3)[:3]
norms[indices_sorted]
array([ 53.64759445, 54.91434479, 60.11617279], dtype=float32)
This isn't making much sense to me, hoping someone can offer some insight?
EDIT: Rephrasing this question as whether argpartition preserves order of the k partitioned elements makes more sense.
We need to use a list of indices that are to be kept in sorted order, instead of feeding the kth param as a scalar. Thus, to maintain the sorted order across the first 5 elements, instead of np.argpartition(a,5)[:5], simply do -
np.argpartition(a,range(5))[:5]
Here's a sample run to make things clear -
In [84]: a = np.random.rand(10)
In [85]: a
Out[85]:
array([ 0.85017222, 0.19406266, 0.7879974 , 0.40444978, 0.46057793,
0.51428578, 0.03419694, 0.47708 , 0.73924536, 0.14437159])
In [86]: a[np.argpartition(a,5)[:5]]
Out[86]: array([ 0.19406266, 0.14437159, 0.03419694, 0.40444978, 0.46057793])
In [87]: a[np.argpartition(a,range(5))[:5]]
Out[87]: array([ 0.03419694, 0.14437159, 0.19406266, 0.40444978, 0.46057793])
Please note that argpartition only makes sense performance-wise if we are looking to get sorted indices for a small subset of elements, say k elements, where k is a small fraction of the total number of elements.
Let's use a bigger dataset and try to get sorted indices for all elements to make the above-mentioned point clear -
In [51]: a = np.random.rand(10000)*100
In [52]: %timeit np.argpartition(a,range(a.size-1))[:5]
10 loops, best of 3: 105 ms per loop
In [53]: %timeit a.argsort()
1000 loops, best of 3: 893 µs per loop
Thus, to sort all elements, np.argpartition isn't the way to go.
Now, let's say we want sorted indices for only the first 5 (smallest) elements of that big dataset and also keep their order -
In [68]: a = np.random.rand(10000)*100
In [69]: np.argpartition(a,range(5))[:5]
Out[69]: array([1647, 942, 2167, 1371, 2571])
In [70]: a.argsort()[:5]
Out[70]: array([1647, 942, 2167, 1371, 2571])
In [71]: %timeit np.argpartition(a,range(5))[:5]
10000 loops, best of 3: 112 µs per loop
In [72]: %timeit a.argsort()[:5]
1000 loops, best of 3: 888 µs per loop
Very useful here!
Given the task of indirectly sorting a subset (the top k, top meaning first in sort order), there are two builtin solutions: argsort and argpartition, cf. @Divakar's answer.
If, however, performance is a consideration then it may (depending on the sizes of the data and the subset of interest) be well worth resisting the "lure of the one-liner", investing one more line and applying argsort on the output of argpartition:
>>> def top_k_sort(a, k):
...     return np.argsort(a)[:k]
...
>>> def top_k_argp(a, k):
...     return np.argpartition(a, range(k))[:k]
...
>>> def top_k_hybrid(a, k):
...     b = np.argpartition(a, k)[:k]
...     return b[np.argsort(a[b])]
>>> k = 100
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_sort, 'rng': np.random.random, 'k': k})
8.348663672804832
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_argp, 'rng': np.random.random, 'k': k})
9.869098862167448
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_hybrid, 'rng': np.random.random, 'k': k})
1.2305558240041137
argsort is O(n log n), argpartition with a range argument appears to be O(nk) (?), and argpartition + argsort is O(n + k log k).
Therefore, in the interesting regime n >> k >> 1, the hybrid method is expected to be the fastest.
UPDATE: ND version:
import numpy as np
from timeit import timeit

def top_k_sort(A, k, axis=-1):
    return A.argsort(axis=axis)[(*axis%A.ndim*(slice(None),), slice(k))]

def top_k_partition(A, k, axis=-1):
    return A.argpartition(range(k), axis=axis)[(*axis%A.ndim*(slice(None),), slice(k))]

def top_k_hybrid(A, k, axis=-1):
    B = A.argpartition(k, axis=axis)[(*axis%A.ndim*(slice(None),), slice(k))]
    return np.take_along_axis(B, np.take_along_axis(A, B, axis).argsort(axis), axis)

A = np.random.random((100, 10000))
k = 100

for f in globals().copy():
    if f.startswith("top_"):
        print(f, timeit(f"{f}(A,k)", globals=globals(), number=10)*100)
Sample run:
top_k_sort 63.72379460372031
top_k_partition 99.30561298970133
top_k_hybrid 10.714635509066284
Let's describe the partition method in a simplified way, which helps a lot to understand argpartition.
Following the example in the picture, if we execute C = numpy.argpartition(A, 3), then C will be the resulting array giving the position of every element of B with respect to the array A, i.e.:
Idx(z) = index of element z in array A
then C would be
C = [ Idx(B[0]), Idx(B[1]), Idx(B[2]), Idx(X), Idx(B[4]), ..... Idx(B[N]) ]
As previously mentioned, this method is very helpful and comes in very handy when you have a huge array and you are only interested in a selected group of ordered elements, not the whole array.
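Since the picture referenced above did not survive here, the following minimal sketch (with a made-up array A) illustrates the same idea:
import numpy as np

A = np.array([9, 1, 7, 3, 8, 2, 5])   # hypothetical example array
C = np.argpartition(A, 3)             # indices that partition A around its 4th-smallest value
B = A[C]                              # partitioned array: B[3] equals np.sort(A)[3]; smaller values
                                      # sit to its left, larger to its right, in no particular order
print(B[3], np.sort(A)[3])            # both print 5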

Looping and Searching in Numpy Array

I need to loop over a numpy array and then do the following search. The following takes almost 60 s for arrays (npArray1 and npArray2 in the example below) with around 300K values.
In other words, I am looking for the index of the first occurrence in npArray2 of every value of npArray1.
for id in np.nditer(npArray1):
    newId = (np.where(npArray2==id))[0][0]
Is there any way I can make the above faster using numpy? I need to run the script above on much bigger arrays (50M values). Please note that my two numpy arrays in the lines above, npArray1 and npArray2, are not necessarily the same size, but they are both 1d.
Thanks a lot for your help,
The function np.unique will do much of the work for you:
npArray2 = np.random.randint(100, None, (1000,))  # 1000-long vector of ints in [0, 100), so lots of repeats
vals, idxs = np.unique(npArray2, return_index=True)  # each unique value AND the index of its first appearance
for val in npArray1:
    newId = idxs[vals==val][0]
vals is an array containing the unique values in npArray2, while idxs gives the index of the first appearance of each value in npArray2. Searching in vals should be much faster than in npArray2 because it's smaller.
You can speed up the search further by taking advantage of the fact that vals is sorted:
import bisect  # we can use binary search since vals is sorted
for val in npArray1:
    newId = idxs[bisect.bisect_left(vals, val)]
Assuming the input arrays contain unique values, you can use np.searchsorted with its optional sorter option for a vectorized solution, like so -
arr2_sortidx = npArray2.argsort()
idx = np.searchsorted(npArray2,npArray1,sorter=arr2_sortidx)
out1 = arr2_sortidx[idx]
Sample run to verify output -
In [154]: npArray1
Out[154]: array([77, 19, 0, 69])
In [155]: npArray2
Out[155]: array([ 8, 33, 12, 19, 77, 30, 81, 69, 20, 0])
In [156]: out = np.empty(npArray1.size,dtype=int)
     ...: for i,id in np.ndenumerate(npArray1):
     ...:     out[i] = (np.where(npArray2==id))[0][0]
...:
In [157]: arr2_sortidx = npArray2.argsort()
...: idx = np.searchsorted(npArray2,npArray1,sorter=arr2_sortidx)
...: out1 = arr2_sortidx[idx]
...:
In [158]: out
Out[158]: array([4, 3, 9, 7])
In [159]: out1
Out[159]: array([4, 3, 9, 7])
Runtime test -
In [175]: def original_app(npArray1,npArray2):
     ...:     out = np.empty(npArray1.size,dtype=int)
     ...:     for i,id in np.ndenumerate(npArray1):
     ...:         out[i] = (np.where(npArray2==id))[0][0]
     ...:     return out
     ...:
     ...: def searchsorted_app(npArray1,npArray2):
     ...:     arr2_sortidx = npArray2.argsort()
     ...:     idx = np.searchsorted(npArray2,npArray1,sorter=arr2_sortidx)
     ...:     return arr2_sortidx[idx]
     ...:
In [176]: # Setup inputs
...: M,N = 50000,40000 # npArray2 and npArray1 sizes respectively
...: maxn = 200000
...: npArray2 = np.unique(np.random.randint(0,maxn,(M)))
...: npArray2 = npArray2[np.random.permutation(npArray2.size)]
...: npArray1 = npArray2[np.random.permutation(npArray2.size)[:N]]
...:
In [177]: out1 = original_app(npArray1,npArray2)
In [178]: out2 = searchsorted_app(npArray1,npArray2)
In [179]: np.allclose(out1,out2)
Out[179]: True
In [180]: %timeit original_app(npArray1,npArray2)
1 loops, best of 3: 3.14 s per loop
In [181]: %timeit searchsorted_app(npArray1,npArray2)
100 loops, best of 3: 17.4 ms per loop
In the task you specified, you have to iterate over the array one way or another, so you can only hope for a considerable performance improvement without changing your algorithm too much. This is where numba might be of great help:
import numpy as np
from numba import jit
@jit
def numba_iter(npa1, npa2):
    for id in np.nditer(npa1):
        newId = (np.where(npa2==id))[0][0]
This simple approach might make your program much faster. Look at some examples and benchmarks here.
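As written, numba_iter discards newId. A minimal sketch of a jitted version that actually collects the first-occurrence indices (assuming numba is installed; the function name is illustrative) might look like this:
import numpy as np
from numba import njit

@njit
def first_occurrence_indices(npa1, npa2):
    # For each value in npa1, scan npa2 left to right and record the first match.
    out = np.empty(npa1.size, dtype=np.int64)
    for k in range(npa1.size):
        for j in range(npa2.size):
            if npa2[j] == npa1[k]:
                out[k] = j
                break
    return out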

How to find which numpy array contains the maximum value on an element by element basis?

Given a list of numpy arrays, each with the same dimensions, how can I find which array contains the maximum value on an element-by-element basis?
e.g.
import numpy as np
def find_index_where_max_occurs(my_list):
    # d = ... something goes here ...
    return d
a=np.array([1,1,3,1])
b=np.array([3,1,1,1])
c=np.array([1,3,1,1])
my_list=[a,b,c]
array_of_indices_where_max_occurs = find_index_where_max_occurs(my_list)
# This is what I want:
# >>> print array_of_indices_where_max_occurs
# array([1,2,0,0])
# i.e. for the first element, the maximum value occurs in array b which is at index 1 in my_list.
Any help would be much appreciated... thanks!
Another option if you want an array:
>>> np.array((a, b, c)).argmax(axis=0)
array([1, 2, 0, 0])
So:
def f(my_list):
    return np.array(my_list).argmax(axis=0)
This works with multidimensional arrays, too.
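For instance, a quick sketch with made-up 2-D inputs:
import numpy as np

a = np.array([[1, 5], [2, 2]])
b = np.array([[4, 0], [1, 3]])
print(np.array([a, b]).argmax(axis=0))
# [[1 0]
#  [0 1]]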
For the fun of it, I realised that @Lev's original answer was faster than his generalized edit, so here is a generalized stacking version which is much faster than the np.asarray version, but it is not very elegant.
np.concatenate((a[None,...], b[None,...], c[None,...]), axis=0).argmax(0)
That is:
def bystack(arrs):
    return np.concatenate([arr[None,...] for arr in arrs], axis=0).argmax(0)
Some explanation:
I've added a new axis to each array: arr[None,...] is equivalent to arr[np.newaxis,...], which is the same as arr[np.newaxis,:,:,:] where the ... expands to the appropriate number of dimensions. The reason for this is that np.concatenate will then stack along the new dimension, which is axis 0 since the None is at the front.
So, for example:
In [286]: a
Out[286]:
array([[0, 1],
[2, 3]])
In [287]: b
Out[287]:
array([[10, 11],
[12, 13]])
In [288]: np.concatenate((a[None,...],b[None,...]),axis=0)
Out[288]:
array([[[ 0, 1],
[ 2, 3]],
[[10, 11],
[12, 13]]])
In case it helps to understand, this would work too:
np.concatenate((a[...,None], b[...,None], c[...,None]), axis=a.ndim).argmax(a.ndim)
where the new axis is now added at the end, so we must stack and maximize along that last axis, which will be a.ndim. For a, b, and c being 2d, we could do this:
np.concatenate((a[:,:,None], b[:,:,None], c[:,:,None]), axis=2).argmax(2)
Which is equivalent to the dstack I mentioned in my comment above (dstack adds a third axis to stack along if it doesn't exist in the arrays).
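A quick sketch of that dstack equivalent for 2-D inputs (the random arrays are just for illustration):
import numpy as np

a = np.random.random((4, 4))
b = np.random.random((4, 4))
c = np.random.random((4, 4))

# dstack stacks 2-D arrays along a new third axis, so argmax over axis 2
# reports which of a, b, c holds the maximum at each position.
out = np.dstack((a, b, c)).argmax(axis=2)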
To test:
N = 10
M = 2
a = np.random.random((N,)*M)
b = np.random.random((N,)*M)
c = np.random.random((N,)*M)
def bystack(arrs):
    return np.concatenate([arr[None,...] for arr in arrs], axis=0).argmax(0)

def byarray(arrs):
    return np.array(arrs).argmax(axis=0)

def byasarray(arrs):
    return np.asarray(arrs).argmax(axis=0)

def bylist(arrs):
    assert arrs[0].ndim == 1, "ndim must be 1"
    return [np.argmax(x) for x in zip(*arrs)]
In [240]: timeit bystack((a,b,c))
100000 loops, best of 3: 18.3 us per loop
In [241]: timeit byarray((a,b,c))
10000 loops, best of 3: 89.7 us per loop
In [242]: timeit byasarray((a,b,c))
10000 loops, best of 3: 90.0 us per loop
In [259]: timeit bylist((a,b,c))
1000 loops, best of 3: 267 us per loop
[np.argmax(x) for x in zip(*my_list)]
Well, this is just a list, but you know how to make it an array if you want. :)
To explain what this does: zip(*my_list) is equivalent to zip(a,b,c), which gives you an iterable to loop over. Each step in the loop gives you a tuple like (a[i], b[i], c[i]), where i is the step in the loop. Then, np.argmax gives you the index within that tuple of the element with the largest value.
