numpy sum antidiagonals of array

numpy sum antidiagonals of array - python

Given a numpy ndarray, I would like to take the first two axes, and replace them with a new axis, which is the sum of their antidiagonals.
In particular, suppose I have variables x,y,z,..., and the entries of my array represent the probability
array[i,j,k,...] = P(x=i, y=j, z=k, ...)
I would like to obtain
new_array[l,k,...] = P(x+y=l, z=k, ...) = sum_i P(x=i, y=l-i, z=k, ...)
i.e., new_array[l,k,...] is the sum of all array[i,j,k,...] such that i+j=l.
What is the most efficient and/or cleanest way to do this in numpy?
EDIT to add:
On recommendation of #hpaulj, here is the obvious iterative solution:
array = numpy.arange(30).reshape((2,3,5))
array = array / float(array.sum()) # make it a probability
new_array = numpy.zeros([array.shape[0] + array.shape[1] - 1] + list(array.shape[2:]))
for i in range(array.shape[0]):
for j in range(array.shape[1]):
new_array[i+j,...] += array[i,j,...]
new_array.sum() # == 1

There is a trace function that gives the sum of a diagonal. You can specify the offset and 2 axes (0 and 1 are the defaults). And to get the antidiagonal, you just need to flip one dimension. np.flipud does that, though it's just [::-1,...] indexing.
Putting those together,
np.array([np.trace(np.flipud(array),offset=k) for k in range(-1,3)])
matches your new_array.
It still loops over the possible values of l (4 in this case). trace itself is compiled.
In this small case, it's actually slower than your double loop (2x3 steps). Even if I move the flipud out of the inner loop, it is still slower. I don't know how this scales for larger arrays.
Part of the problem with vectorizing this even further is that fact that each diagonal has a different length.
In [331]: %%timeit
array1 = array[::-1]
np.array([np.trace(array1,offset=k) for k in range(-1,3)])
.....:
10000 loops, best of 3: 87.4 µs per loop
In [332]: %%timeit
new_array = np.zeros([array.shape[0] + array.shape[1] - 1] + list(array.shape[2:]))
for i in range(2):
for j in range(3):
new_array[i+j] += array[i,j]
.....:
10000 loops, best of 3: 43.5 µs per loop
scipy.sparse has a dia format, which stores the values of nonzero diagonals. It stores a padded array of values, along with the offsets.
array([[12, 0, 0, 0],
[ 8, 13, 0, 0],
[ 4, 9, 14, 0],
[ 0, 5, 10, 15],
[ 0, 1, 6, 11],
[ 0, 0, 2, 7],
[ 0, 0, 0, 3]])
array([-3, -2, -1, 0, 1, 2, 3])
While that's a way of getting around the issue of variable diagonal lengths, I don't think it helps in this case where you just need their sums.

Related

How to efficiently calculate block mean (irregular blocks) for numpy 2D array?

My question is related to Block mean of numpy 2D array and block mean of 2D numpy array (in both dimensions) (in fact it is just more general case). I will explain this on simple example.
Let's assume we have 6x6 2D array:
array([[7, 1, 6, 6, 4, 2],
[8, 5, 5, 6, 3, 5],
[3, 1, 7, 1, 3, 4],
[6, 8, 3, 2, 3, 3],
[8, 6, 7, 1, 1, 3],
[8, 5, 4, 5, 1, 4]])
Now each row (and column) in this matrix is assigned to one of three communities (communities can be of different size) e.g. array([0, 0, 1, 1, 1, 2]) would represent this assignment. Now I need to split this matrix according this assignment and calculate mean over blocks (slices). This would produce 3x3 matrix of block means. For example block (or slice) for community pair (0,0) is an 2x2 array:
array([[7, 1],
[8, 5]])
that has mean of 5.25. Block for community pair (0, 1) is an 2x3 array:
array([[6, 6, 4],
[5, 6, 3]])
with mean 5, and so on..
Resulting array of block means should look like this:
array([[5.25 , 5. , 3.5 ],
[5.33333333, 3.11111111, 3.33333333],
[6.5 , 3.33333333, 4. ]])
My question is how to calculate this efficiently. For now I am using for loops – for each pair of communities I get proper slice, calculate mean over that slice and store this in separate matrix. However I need to perform this operation many times and it takes a lot of time.
I cannot really use (or I dont know how) approaches with reshape since it needs an assumption of equal block size.

If you are open to other packages, Pandas as a convenient groupby function:
out = (pd.Series(a.ravel(),
index = pd.MultiIndex.from_product((pairs,pairs)))
.groupby(level=(0,1)).mean()
.unstack().to_numpy()
)
Output:
array([[5.25 , 5. , 3.5 ],
[5.33333333, 3.11111111, 3.33333333],
[6.5 , 3.33333333, 4. ]])

The best I can imagine is to try to limit the number of loops. I will assume here that the 6x6 2D array is arr and that the communities definition is coms = np.array([0, 0, 1, 1, 1, 2]).
I would first compute slices per community:
dcoms = {k: slice(min(x), 1 + max(x)) for k in np.unique(coms)
for x in (np.where(coms==k)[0],)}
1 loop over coms
Then I can directly compute the resulting ndarray with 2 loops over dcoms:
resul = np.array([[arr[dcoms[i],dcoms[j]].mean() for j in dcoms] for i in dcoms])
It gives as expected:
array([[5.25 , 5. , 3.5 ],
[5.33333333, 3.11111111, 3.33333333],
[6.5 , 3.33333333, 4. ]])

Thanks guys, I performed some benchmark tests to compare these solutions.
The setup is following
np.random.seed(0)
mat = (np.random.random((500, 500)) - 0.5) * 100
# Create 5 communities of size 50 and 10 communities of size 15
comms = np.concatenate((np.repeat(np.arange(5), 50), np.repeat(np.arange(5, 15), 25)))
My original solution using for loops:
unique_comms = np.unique(comms)
block = np.zeros((len(unique_comms), len(unique_comms)))
for i in unique_comms:
for j in unique_comms:
block[i, j] = np.mean(mat[comms == i][:, comms == j])
took 7.66 ms ± 43.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each).
Pandas solution:
out = (pd.Series(mat.ravel(),
index = pd.MultiIndex.from_product((comms, comms)))
.groupby(level=(0,1)).mean()
.unstack().to_numpy())
took 22.1 ms ± 221 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
whereas slice solution:
dcoms = {k: slice(min(x), 1 + max(x)) for k in np.unique(comms)
for x in (np.where(comms==k)[0],)}
resul = np.array([[mat[dcoms[i],dcoms[j]].mean() for j in dcoms] for i in dcoms])
took 2.1 ms ± 61.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each).
Unfortunately Pandas solution is much slower, whereas slice solution is apparently ~3-4 fold faster than regular for loop solution. One drawback of slice solution is that (I believe) it does not work when community vector is shuffled (i.e. np.array([1, 0, 1, 0, 2, 1]) instead of np.array([0, 0, 1, 1, 1, 2]).

Best way (preferably numpythonic) to count the length before value changes occur, and number of transitions, in a numpy array?

Sorry if the title is a little confusing, but I'll explain more here. Say I have a large array with a small number of unique elements that looks like this:
arr = np.array([0,0,1,1,1,1,1],
[0,2,0,0,1,1,1],
[0,2,0,0,1,1,1],
[0,2,1,1,1,0,0],
[0,3,2,2,0,2,1])
In this case, the array is 5x6 for example purposes, but in reality, I could be working with something as large as a 10000x10000 array (still with a small amount of unique elements).
I was wondering how to iterate through each rows and 'count' the amount of times the array element changes as you move from right to left, as well as the number of constant elements between transitions.
For example, in the above array, the first row has 1 transition, and lengths 2 and 5 for the values 0 and 1, respectively. In the second-to-last row, there are 3 transitions, with lengths 1, 1, 2, and 2, for the values 0, 2, 1, and 0, respectively.
Ideally, some function transition_count would take arr above and return the something like:
row0: [1, (0,2), (1,5)]
row1: [3, (0,1), (2,1), (0,2), (1,3)]
row2: ...
and so forth.
My thinking for this is to iterate through each row of the array, arr[i,:], and analyze it separately (maybe as a list?). But even for just a single row, I'm not sure how to 'count' the number of transitions and the obtain length of each constant element.
Any help would be appreciated, thank you!

This works on a per-row basis. Not sure we can readily vectorize further given the jagged nature of the output.
for row in arr:
d = np.diff(row) != 0
idx = np.concatenate(([0], np.flatnonzero(d) + 1))
c = np.diff(np.concatenate((idx, [len(row)])))
print(len(c))
print('v', row[idx])
print('c', c)
Here is a fully vectorized solution, if you are willing to accept a slightly different output format:
d = np.diff(arr, axis=1) != 0
t = np.ones(shape=arr.shape, dtype=np.bool)
t[:, 1:] = d
e = np.ones(shape=arr.shape, dtype=np.bool)
e[:, :-1] = d
sr, sc = np.nonzero(t)
er, ec = np.nonzero(e)
v = arr[sr, sc]
print(sr)
print(sc)
print(v)
print(ec-sc + 1)
Note: you can group and split there outputs by sr to arrive at your original stated format; but usually it is best to stay away from jagged arrays entirely if you can (and you almost always can!), also in any downstream processing.

Here's a vectorized way to get all values and counts -
# Look for interval changes and pad with bool 1s on either sides to set the
# first interval for each row and for setting boundary wrt the next row
p = np.ones((len(a),1), dtype=bool)
m = np.hstack((p, a[:,:-1]!=a[:,1:], p))
# Look for interval change indices in flattened array version
intv = m.sum(1).cumsum()-1
# Get index and counts
idx = np.diff(np.flatnonzero(m.ravel()))
count = np.delete(idx, intv[:-1])
val = a[m[:,:-1]]
To get to the final split ones split based on rows -
# Get couples and setup offsetted interval change indices
grps = np.c_[val,count]
intvo = np.r_[0,intv-np.arange(len(intv))]
# Finally slice and get output
out = [grps[i:j] for (i,j) in zip(intvo[:-1], intvo[1:])]
Benchmarking
Solution to get counts and values as functions :
# #Eelco Hoogendoorn's soln
def eh(arr):
d = np.diff(arr, axis=1) != 0
t = np.ones(shape=arr.shape, dtype=np.bool)
t[:, 1:] = d
e = np.ones(shape=arr.shape, dtype=np.bool)
e[:, :-1] = d
sr, sc = np.nonzero(t)
er, ec = np.nonzero(e)
v = arr[sr, sc]
return ec-sc + 1,v
# Function form of proposed solution from this post
def grouped_info(a):
p = np.ones((len(a),1), dtype=bool)
m = np.hstack((p, a[:,:-1]!=a[:,1:], p))
intv = m.sum(1).cumsum()-1
idx = np.diff(np.flatnonzero(m.ravel()))
count = np.delete(idx, intv[:-1])
val = a[m[:,:-1]]
return count,val
We will try to get closer to your actual use-case scenario of 10000x10000 by tiling the given sample along the two axes and time the proposed solutions.
In [48]: a
Out[48]:
array([[0, 0, 1, 1, 1, 1, 1],
[0, 2, 0, 0, 1, 1, 1],
[0, 2, 0, 0, 1, 1, 1],
[0, 2, 1, 1, 1, 0, 0],
[0, 3, 2, 2, 0, 2, 1]])
In [49]: a = np.repeat(np.repeat(a,1000,axis=0),1000,axis=1)
In [50]: %timeit grouped_info(a)
126 ms ± 7.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [52]: %timeit eh(a)
389 ms ± 41.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Map an element in a multi-dimension array to its index

I am using the function get_tuples(length, total) from here
to generate an array of all tuples of given length and sum, an example and the function are shown below. After I have created the array I need to find a way to return the indices of a given number of elements in the array. I was able to do that using .index() by changing the array to a list, as shown below. However, this solution or another solution that is also based on searching (for example using np.where) takes a lot of time to find the indices. Since all elements in the array (array s in the example) are different, I was wondering if we can construct a one-to-one mapping, i.e., a function such that given the element in the array it returns the index of the element by doing some addition and multiplication on the values of this element. Any ideas if that is possible? Thanks!
import numpy as np
def get_tuples(length, total):
if length == 1:
yield (total,)
return
for i in range(total + 1):
for t in get_tuples(length - 1, total - i):
yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))
# array s
In [1]: s
Out[1]:
array([[ 0, 0, 0, 20],
[ 0, 0, 1, 19],
[ 0, 0, 2, 18],
...,
[19, 0, 1, 0],
[19, 1, 0, 0],
[20, 0, 0, 0]])
#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]])
#change array to list
s_list = s.tolist()
#find the indices
indx=[s_list.index(i) for i in elements_to_find.tolist()]
#output
In [2]: indx
Out[2]: [0, 7, 100, 5, 45]

Here is a formula that calculates the index based on the tuple alone, i.e. it needn't see the full array. To compute the index of an N-tuple it needs to evaluate N-1 binomial coefficients. The following implementation is (part-) vectorized, it accepts ND-arrays but the tuples must be in the last dimension.
import numpy as np
from scipy.special import comb
# unfortunately, comb with option exact=True is not vectorized
def bc(N,k):
return np.round(comb(N,k)).astype(int)
def get_idx(s):
N = s.shape[-1] - 1
R = np.arange(1,N)
ps = s[...,::-1].cumsum(-1)
B = bc(ps[...,1:-1]+R,1+R)
return bc(ps[...,-1]+N,N) - ps[...,0] - 1 - B.sum(-1)
# OP's generator
def get_tuples(length, total):
if length == 1:
yield (total,)
return
for i in range(total + 1):
for t in get_tuples(length - 1, total - i):
yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))
# compute each index
r = get_idx(s)
# expected: 0,1,2,3,...
assert (r == np.arange(len(r))).all()
print("all ok")
#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]])
print(get_idx(elements_to_find))
Sample run:
all ok
[ 0 7 100 5 45]
How to derive formula:
Use stars and bars to express the full partition count #part(N,k) (N is total, k is length) as a single binomial coefficient (N + k - 1) choose (k - 1).
Count back-to-front: It is not hard to verify that after the i-th full iteration of the outer loop of OP's generator exactly #part(N-i,k) have not yet been enumerated. Indeed, what's left are all partitions p1+p2+... = N with p1>=i; we can write p1=q1+i such that q1+p2+... = N-i and this latter partition is constraint-free so we can use 1. to count.

You can use binary search to make the search a lot faster.
Binary search makes the search O(log(n)) rather than O(n) (using Index)
We do not need to sort the tuples since they are already sorted by the generator
import bisect
def get_tuples(length, total):
" Generates tuples "
if length == 1:
yield (total,)
return
yield from ((i,) + t for i in range(total + 1) for t in get_tuples(length - 1, total - i))
def find_indexes(x, indexes):
if len(indexes) > 100:
# Faster to generate all indexes when we have a large
# number to check
d = dict(zip(x, range(len(x))))
return [d[tuple(i)] for i in indexes]
else:
return [bisect.bisect_left(x, tuple(i)) for i in indexes]
# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))
# Tuples are generated in sorted order [(0,0,0,20), ...(20,0,0,0)]
# which allows binary search to be used
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
y = find_indexes(x, indexes)
print('Found indexes:', *y)
print('Indexes & Tuples:')
for i in y:
print(i, x[i])
Output
Found indexes: 0 7 100 5 45
Indexes & Tuples:
0 (0, 0, 0, 20)
7 (0, 0, 7, 13)
100 (0, 5, 5, 10)
5 (0, 0, 5, 15)
45 (0, 2, 4, 14)
Performance
Scenario 1--Tuples already computed and we just want to find the index of certain tuples
For instance x = list(get_tuples(4, 20)) has already been perform.
Search for
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
Binary Search
%timeit find_indexes(x, indexes)
100000 loops, best of 3: 11.2 µs per loop
Calculates the index based on the tuple alone (courtesy #PaulPanzer approach)
%timeit get_idx(indexes)
10000 loops, best of 3: 92.7 µs per loop
In this scenario, binary search is ~8x faster when tuples have already been pre-computed.
Scenario 2--the tuples have not been pre-computed.
%%timeit
import bisect
def find_indexes(x, t):
" finds the index of each tuple in list t (assumes x is sorted) "
return [bisect.bisect_left(x, tuple(i)) for i in t]
# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
y = find_indexes(x, indexes)
100 loops, best of 3: 2.69 ms per loop
#PaulPanzer approach is the same timing in this scenario (92.97 us)
=> #PaulPanzer approach ~29 times faster when the tuples don't have to be computed
Scenario 3--Large number of indexes (#PJORR)
A large number of random indexes is generated
x = list(get_tuples(4, 20))
xnp = np.array(x)
indices = xnp[np.random.randint(0,len(xnp), 2000)]
indexes = indices.tolist()
%timeit find_indexes(x, indexes)
#Result: 1000 loops, best of 3: 1.1 ms per loop
%timeit get_idx(indices)
#Result: 1000 loops, best of 3: 716 µs per loop
In this case, we are #PaulPanzer is 53% faster

get the index of the last negative value in a 2d array per column

I'm trying to get the index of the last negative value of an array per column (in order to slice it after).
a simple working example on a 1d vector is :
import numpy as np
A = np.arange(10) - 5
A[2] = 2
print A # [-5 -4 2 -2 -1 0 1 2 3 4]
idx = np.max(np.where(A <= 0)[0])
print idx # 5
A[:idx] = 0
print A # [0 0 0 0 0 0 1 2 3 4]
Now I wanna do the same thing on each column of a 2D array :
A = np.arange(10) - 5
A[2] = 2
A2 = np.tile(A, 3).reshape((3, 10)) - np.array([0, 2, -1]).reshape((3, 1))
print A2
# [[-5 -4 2 -2 -1 0 1 2 3 4]
# [-7 -6 0 -4 -3 -2 -1 0 1 2]
# [-4 -3 3 -1 0 1 2 3 4 5]]
And I would like to obtain :
print A2
# [[0 0 0 0 0 0 1 2 3 4]
# [0 0 0 0 0 0 0 0 1 2]
# [0 0 0 0 0 1 2 3 4 5]]
but I can't manage to figure out how to translate the max/where statement to the this 2d array...

You already have good answers, but I wanted to propose a potentially quicker variation using the function np.maximum.accumulate. Since your method for a 1D array uses max/where, you may also find this approach quite intuitive. (Edit: quicker Cython implementation added below).
The overall approach is very similar to the others; the mask is created with:
np.maximum.accumulate((A2 < 0)[:, ::-1], axis=1)[:, ::-1]
This line of code does the following:
(A2 < 0) creates a Boolean array, indicating whether a value is negative or not. The index [:, ::-1] flips this left-to-right.
np.maximum.accumulate is used to return the cumulative maximum along each row (i.e. axis=1). For example [False, True, False] would become [False, True, True].
The final indexing operation [:, ::-1] flips this new Boolean array left-to-right.
Then all that's left to do is to use the Boolean array as a mask to set the True values to zero.
Borrowing the timing methodology and two functions from #Divakar's answer, here are the benchmarks for my proposed method:
# method using np.maximum.accumulate
def accumulate_based(A2):
A2[np.maximum.accumulate((A2 < 0)[:, ::-1], axis=1)[:, ::-1]] = 0
return A2
# large sample array
A2 = np.random.randint(-4, 10, size=(100000, 100))
A2c = A2.copy()
A2c2 = A2.copy()
The timings are:
In [47]: %timeit broadcasting_based(A2)
10 loops, best of 3: 61.7 ms per loop
In [48]: %timeit cumsum_based(A2c)
10 loops, best of 3: 127 ms per loop
In [49]: %timeit accumulate_based(A2c2) # quickest
10 loops, best of 3: 43.2 ms per loop
So using np.maximum.accumulate can be as much as 30% faster than the next fastest solution for arrays of this size and shape.
As #tom10 points out, each NumPy operation processes arrays in their entirety, which can be inefficient when multiple operations are needed to get a result. An iterative approach which works through the array just once may fare better.
Below is a naive function written in Cython which could more than twice as fast as a pure NumPy approach.
This function may be able to be sped up further using memory views.
cimport cython
import numpy as np
cimport numpy as np
#cython.boundscheck(False)
#cython.wraparound(False)
#cython.nonecheck(False)
def cython_based(np.ndarray[long, ndim=2, mode="c"] array):
cdef int rows, cols, i, j, seen_neg
rows = array.shape[0]
cols = array.shape[1]
for i in range(rows):
seen_neg = 0
for j in range(cols-1, -1, -1):
if seen_neg or array[i, j] < 0:
seen_neg = 1
array[i, j] = 0
return array
This function works backwards through each row and starts setting values to zero once it has seen a negative value.
Testing it works:
A2 = np.random.randint(-4, 10, size=(100000, 100))
A2c = A2.copy()
np.array_equal(accumulate_based(A2), cython_based(A2c))
# True
Comparing the performance of the function:
In [52]: %timeit accumulate_based(A2)
10 loops, best of 3: 49.8 ms per loop
In [53]: %timeit cython_based(A2c)
100 loops, best of 3: 18.6 ms per loop

Assuming that you are looking to set all elements for each row until the last negative element to be set to zero (as per the expected output listed in the question for a sample case), two approaches could be suggested here.
Approach #1
This one is based on np.cumsum to generate a mask of elements to be set to zeros as listed next -
# Get boolean mask with TRUEs for each row starting at the first element and
# ending at the last negative element
mask = (np.cumsum(A2[:,::-1]<0,1)>0)[:,::-1]
# Use mask to set all such al TRUEs to zeros as per the expected output in OP
A2[mask] = 0
Sample run -
In [280]: A2 = np.random.randint(-4,10,(6,7)) # Random input 2D array
In [281]: A2
Out[281]:
array([[-2, 9, 8, -3, 2, 0, 5],
[-1, 9, 5, 1, -3, -3, -2],
[ 3, -3, 3, 5, 5, 2, 9],
[ 4, 6, -1, 6, 1, 2, 2],
[ 4, 4, 6, -3, 7, -3, -3],
[ 0, 2, -2, -3, 9, 4, 3]])
In [282]: A2[(np.cumsum(A2[:,::-1]<0,1)>0)[:,::-1]] = 0 # Use mask to set zeros
In [283]: A2
Out[283]:
array([[0, 0, 0, 0, 2, 0, 5],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 3, 5, 5, 2, 9],
[0, 0, 0, 6, 1, 2, 2],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 9, 4, 3]])
Approach #2
This one starts with the idea of finding the last negative element indices from #tom10's answer and develops into a mask finding method using broadcasting to get us the desired output, similar to approach #1.
# Find last negative index for each row
last_idx = A2.shape[1] - 1 - np.argmax(A2[:,::-1]<0, axis=1)
# Find the invalid indices (rows with no negative indices)
invalid_idx = A2[np.arange(A2.shape[0]),last_idx]>=0
# Set the indices for invalid ones to "-1"
last_idx[invalid_idx] = -1
# Boolean mask with each row starting with TRUE as the first element
# and ending at the last negative element
mask = np.arange(A2.shape[1]) < (last_idx[:,None] + 1)
# Set masked elements to zeros, for the desired output
A2[mask] = 0
Runtime tests -
Function defintions:
def broadcasting_based(A2):
last_idx = A2.shape[1] - 1 - np.argmax(A2[:,::-1]<0, axis=1)
last_idx[A2[np.arange(A2.shape[0]),last_idx]>=0] = -1
A2[np.arange(A2.shape[1]) < (last_idx[:,None] + 1)] = 0
return A2
def cumsum_based(A2):
A2[(np.cumsum(A2[:,::-1]<0,1)>0)[:,::-1]] = 0
return A2
Runtimes:
In [379]: A2 = np.random.randint(-4,10,(100000,100))
...: A2c = A2.copy()
...:
In [380]: %timeit broadcasting_based(A2)
10 loops, best of 3: 106 ms per loop
In [381]: %timeit cumsum_based(A2c)
1 loops, best of 3: 167 ms per loop
Verify results -
In [384]: A2 = np.random.randint(-4,10,(100000,100))
...: A2c = A2.copy()
...:
In [385]: np.array_equal(broadcasting_based(A2),cumsum_based(A2c))
Out[385]: True

Finding the first is usually easier and faster than finding the last, so here I reverse the array and then find the first negative (using the OP's version of A2):
im = A2.shape[1] - 1 - np.argmax(A2[:,::-1]<0, axis=1)
# [4 6 3] # which are the indices of the last negative in A2
Also, though, note that if you have large arrays with many negative numbers, it might actually be faster to use a non-numpy approach so you can short circuit the search. That is, numpy will do the calculation on the entire array, so if you have 10000 elements in a row but typically will hit a negative number in the first 10 elements (of a reverse search), a pure Python approach might end up being faster.
Overall, iterating the rows might be faster for subsequent operations as well. For example, if your next step is multiplication, it could be faster to just multiply the slices at the ends that are non-zeros, or maybe find that longest non-zero section and just deal with the truncated array.
This basically comes down to number of negatives per row. If you have 1000 negatives per row you'll on average have non-zeros segments that are 1/1000th of your full row length, so you could get a 1000x speed-up by just looking at the ends. The short example given in the question is great for understanding and answering the basic question, but I wouldn't take timing tests too seriously when your end application is a very different use case; especially since your fractional time savings by using iteration improves in proportion to array size (assuming a constant ratio and random distribution of negative numbers).

You can access individual rows:
A2[0] == array([-5, -4, 2, -2, -1, 0, 1, 2, 3, 4])

Better way than T.eye in theano

The problem is, given a arbitrary 1-d vector y, expanded it into d basis vectors with n dimension.
The rule of the expansion is: each element in y is the index of columns in the n*n identity matrix.
For example:
y = [3, 0, 1]
n = 4
Since n = 4, we have the 4*4 identity matrix:
[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]
Expand each element y using the rule, we have:
[0, 1, 0]
[0, 0, 1]
[0, 0, 0]
[1, 0, 0]
I want to solve this problem using theano, with very large n (>50k) and very long y (>10k), so efficiency is important.
The solution using numpy is trivial, but the numpy.eye function may cost too much, we may use anther method to make it faster. Comparing the following methods:
import numpy as np
import theano
import theano.tensor as T
n = 25500
y_value = np.asarray([2, 0, 10, 4], dtype='int32')
# method 1
%timeit np.eye(n)[y_value]
# 10 loops, best of 3: 56.9 ms per loop
# method 2
def vec(i):
e = np.zeros(n)
e[i] = 1
return e
%timeit np.vstack([vec(i) for i in y_value])
# 100 loops, best of 3: 16.3 ms per loop
However, the second method may not work in theano since loop in symbolic variable may not trivial. Is there a method which can avoid using T.eye?
y_value can be an arbitrary 1-d vector.

You can try another approach. In my computer:
>>> %timeit np.eye(n)[y_value]
1 loops, best of 3: 544 ms per loop
However, you don't need to create the whole array if you know in advance the rows you want. You can do this:
>>> n = 25500
>>> n_rows = y_value.size
>>> r = np.zeros((n_rows, n))
>>> r[range(n_rows), y_value] = 1
You create a way smaller array, only y x n where y is the size of your index vector, and populate it in every row. The timing in my computer is:
>>> %%timeit
..: r = np.zeros((n_rows, n))
..: r[range(n_rows), y_value] = 1
100 loops, best of 3: 3.8 ms per loop
x151 speedup in my laptop.
Additionally, if you don't want an array full of zeros at the rear (x-axis), you could do:
>>> %%timeit
..: r = np.zeros((n_rows, y_value.max()+1))
..: r[range(n_rows), y_value] = 1
100000 loops, best of 3: 16 µs per loop
Which is even faster, but the resulting array is y x ymax, in this case 99 x 100, which might not be what you want.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

numpy sum antidiagonals of array - python

Related

How to efficiently calculate block mean (irregular blocks) for numpy 2D array?

Best way (preferably numpythonic) to count the length before value changes occur, and number of transitions, in a numpy array?

Map an element in a multi-dimension array to its index

get the index of the last negative value in a 2d array per column

Better way than T.eye in theano

Categories

Resources