Inverse cumsum for numpy - python

A is a (d, e) numpy array. I compute a (d, e) numpy array B, where the entry B[i,j] is computed as follows:
b = 0
for k in range(i+1, d):
    for l in range(j+1, e):
        b = b + A[k, l]
B[i, j] = b
In other words, B[i,j] is the sum of A[k,l] taken over all indices k>i, l>j; this is sort of the opposite of the usual cumsum applied to both axes. I am wondering if there is a more elegant and faster way to do this (e.g. using np.cumsum)?

Assuming you're trying to do this:
import numpy as np

A = np.arange(15).reshape((5, -1))

def cumsum2_reverse(arr):
    out = np.empty_like(arr)
    d, e = arr.shape
    for i in range(d):
        for j in range(e):
            b = 0
            for k in range(i + 1, d):
                for l in range(j + 1, e):
                    b += arr[k, l]
            out[i, j] = b
    return out
Then if you do,
In [1]: A_revsum = cumsum2_reverse(A)
In [2]: A_revsum
Out[2]:
array([[72, 38,  0],
       [63, 33,  0],
       [48, 25,  0],
       [27, 14,  0],
       [ 0,  0,  0]])
You could use np.cumsum on the reverse-ordered arrays to compute the sum. For example, at first you might try something similar to what @Jaime suggested:
In [3]: np.cumsum(np.cumsum(A[::-1, ::-1], 0), 1)[::-1, ::-1]
Out[3]:
array([[105,  75,  40],
       [102,  72,  38],
       [ 90,  63,  33],
       [ 69,  48,  25],
       [ 39,  27,  14]])
Here we remember that np.cumsum's first output entry is the first input value itself, not zero (after the reversal, this lands in the last row and column), whereas B must be zero there. To ensure those zeros, you can shift the output of this operation. This might look like:
def cumsum2_reverse_alt(arr):
    out = np.zeros_like(arr)
    out[:-1, :-1] = np.cumsum(np.cumsum(arr[:0:-1, :0:-1], 0), 1)[::-1, ::-1]
    return out
This gives the same values as above.
In [4]: (cumsum2_reverse(A) == cumsum2_reverse_alt(A)).all()
Out[4]: True
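Equivalently, the double reversal can be written with np.flip, which some find easier to read; a sketch, equivalent to cumsum2_reverse_alt above, assuming NumPy >= 1.15 (where np.flip can flip over all axes at once):
from numpy import flip

def cumsum2_reverse_flip(arr):
    out = np.zeros_like(arr)
    # reverse the block of elements with k > i and l > j, double-cumsum,
    # then reverse back so out[i, j] sums over all k > i, l > j
    tail = np.flip(arr[1:, 1:])
    out[:-1, :-1] = np.flip(tail.cumsum(axis=0).cumsum(axis=1))
    return out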
Note that the version that utilizes np.cumsum is much faster for large arrays. For example:
In [5]: A=np.arange(3000).reshape((50, -1))
In [6]: %timeit cumsum2_reverse(A)
1 loops, best of 3: 453 ms per loop
In [7]: %timeit cumsum2_reverse_alt(A)
10000 loops, best of 3: 24.7 us per loop

Related

identifying sub-arrays in numpy

I have two two-dimensional arrays a and b (#columns of a <= #columns of b). I would like to find an efficient way of matching a row in array a to a contiguous part of a row in array b.
a = np.array([[ 25,  28],
              [ 84,  97],
              [105,  24],
              [ 28, 900]])
b = np.array([[ 25,  28,  84,  97],
              [ 22,  25,  28, 900],
              [ 11,  12, 105,  24]])
The output should be np.array([[0,0], [0,1], [1,0], [2,2], [3,1]]). Row 0 in array a matches Row 0 in array b (first two positions). Row 1 in array a matches row 0 in array b (third and fourth positions).
We can leverage scikit-image's view_as_windows (built on np.lib.stride_tricks.as_strided) for efficient patch extraction, and then compare those patches against each row of a, all of it in a vectorized manner. Then, get the matching indices with np.argwhere -
# a and b from posted question
In [325]: from skimage.util.shape import view_as_windows
In [428]: w = view_as_windows(b,(1,a.shape[1]))
In [429]: np.argwhere((w == a).all(-1).any(-2))[:,::-1]
Out[429]:
array([[0, 0],
       [1, 0],
       [0, 1],
       [3, 1],
       [2, 2]])
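If you'd rather avoid the scikit-image dependency, NumPy 1.20+ provides np.lib.stride_tricks.sliding_window_view, which builds the same windows; a sketch under that assumption -
from numpy.lib.stride_tricks import sliding_window_view

w = sliding_window_view(b, (1, a.shape[1]))  # same windows as view_as_windows
out = np.argwhere((w == a).all(-1).any(-2))[:, ::-1]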
Alternatively, we could get the indices by the order of rows in a by pushing forward the first axis of a while performing broadcasted comparisons -
In [444]: np.argwhere((w[:,:,0] == a[:,None,None,:]).all(-1).any(-1))
Out[444]:
array([[0, 0],
       [0, 1],
       [1, 0],
       [2, 2],
       [3, 1]])
Another way I can think of is to loop over each row in a and perform a 2D correlation between b (treated as a 2D signal) and that row of a.
We would look for results that are equal to the sum of squares of the values in the row. If we subtract this sum of squares from the correlation result, matches show up as zeros. Any row of b that gives a 0 result means the subarray was found in that row. If you are using floating-point numbers, you may want to compare against some small threshold just above 0 instead of exactly 0.
If you can use SciPy, the scipy.signal.correlate2d method is what I had in mind.
import numpy as np
from scipy.signal import correlate2d
a = np.array([[ 25,  28],
              [ 84,  97],
              [105,  24]])
b = np.array([[ 25,  28,  84,  97],
              [ 22,  25,  28, 900],
              [ 11,  12, 105,  24]])
EPS = 1e-8
result = []
for (i, row) in enumerate(a):
    out = correlate2d(b, row[None, :], mode='valid') - np.square(row).sum()
    locs = np.where(np.abs(out) <= EPS)[0]
    unique_rows = np.unique(locs)
    for res in unique_rows:
        result.append((i, res))
We get:
In [32]: result
Out[32]: [(0, 0), (0, 1), (1, 0), (2, 2)]
The time complexity of this could be better, especially since we're looping over each row of a to find any subarrays in b.

Finding max in numpy skipping some rows and columns

I want to find the row and column index of the max in a numpy matrix, but the max must not lie in a given set of rows or columns. Thus, it should skip those rows and columns while computing the max.
Example:
# finding max in numpy matrix
[row,col] = np.where(mat == mat.max())
But it should skip the rows in removed_rows=[] and the columns in removed_cols=[]
I don't want to create a new sub matrix for the computation.
Let a be the input array, and rows_rem and cols_rem the row and column indices to be skipped, respectively. We would have an approach using masking, like so -
m,n = a.shape
d0,d1 = np.ogrid[:m,:n]
a_masked = a*~(np.in1d(d0,rows_rem)[:,None] | np.in1d(d1,cols_rem))
max_row, max_col = np.where(a_masked == a_masked.max())
Sample run -
In [204]: # Inputs
...: a = np.random.randint(11,99,(4,5))
...: rows_rem = [1,3]
...: cols_rem = [1,2,4]
...:
In [205]: a
Out[205]:
array([[36, 51, 72, 18, 31],
       [78, 42, 12, 71, 72],
       [38, 46, 42, 67, 12],
       [87, 56, 76, 14, 21]])
In [206]: a_masked
Out[206]:
array([[36,  0,  0, 18,  0],
       [ 0,  0,  0,  0,  0],
       [38,  0,  0, 67,  0],
       [ 0,  0,  0,  0,  0]])
In [207]: max_row, max_col
Out[207]: (array([2]), array([3]))
Please note that if there's more than one element with the same max value, we would get all of them in the output. So, if you want any one (or just the first) of those, we can use argmax, like so -
max_row, max_col = np.unravel_index(a_masked.argmax(),a.shape)
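Note that the multiplicative masking above writes zeros into the skipped positions, so it could report a wrong maximum if a contains negative values. A hedged alternative is a masked array, whose argmax simply ignores the masked entries:
skip = np.zeros(a.shape, dtype=bool)
skip[rows_rem, :] = True   # mask the rows to be skipped
skip[:, cols_rem] = True   # mask the columns to be skipped
a_ma = np.ma.masked_array(a, mask=skip)
max_row, max_col = np.unravel_index(a_ma.argmax(), a.shape)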
remove_rows = [2,3]
remove_cols = [0,1]
a = np.random.randint(11,99,(4,5))
>>> a
array([[60, 86, 89, 66, 20],
       [77, 86, 78, 90, 44],
       [68, 57, 83, 48, 25],
       [30, 81, 42, 11, 63]])
>>>
Get the row and column indices that you are interested in by filtering out the indices you want removed:
r, c = a.shape
r = [x for x in range(r) if x not in remove_rows]
c = [x for x in range(c) if x not in remove_cols]
>>> r,c
([0, 1], [2, 3, 4])
>>>
Now r and c can be used for integer indexing, numpy.ix_ helps with this.
>>> a[np.ix_(r,c)]
array([[89, 66, 20],
       [78, 90, 44]])
>>>
Tack on ndarray.max() to get the max value:
>>> a[np.ix_(r,c)].max()
90
>>>
Finally, use numpy.where to find where it is in the original array:
>>> row, col = np.where(a == a[np.ix_(r,c)].max())
>>> row, col
(array([1]), array([3]))
>>>
This method will also work if removing non-sequential rows or columns.
For example:
remove_rows = [0,3]
remove_cols = [1,4]

Index multiple dimensions of a multi-dimensional array with another - NumPy/ Python

Let's say I have a tensor of the following form:
import numpy as np
a = np.array([[[1, 2],
               [3, 4]],
              [[5, 6],
               [7, 3]]])
# a.shape: (2, 2, 2) -- a tensor containing two 2x2 matrices
indices = np.argmax(a, axis=2)
# print(indices)
for mat in a:
    max_i = np.argmax(mat, axis=1)
    # Not really working; I would like to
    # change the 4 in the first matrix to -1
    # and the 3 in the last to -1
    mat[max_i] = -1
print(a)
Now what I would like to do is to use indices as a mask on a to replace every max element with, say, -1. Is there a numpy way of doing this? So far all I have figured out uses for loops.
Here's one way using linear indexing in 3D -
m,n,r = a.shape
offset = n*r*np.arange(m)[:,None] + r*np.arange(n)
np.put(a,indices + offset,-1)
Sample run -
In [92]: a
Out[92]:
array([[[28, 59, 26, 70],
        [57, 28, 71, 49],
        [33,  6, 10, 90]],
       [[24, 16, 83, 67],
        [96, 16, 72, 56],
        [74,  4, 71, 81]]])
In [93]: indices = np.argmax(a, axis=2)
In [94]: m,n,r = a.shape
...: offset = n*r*np.arange(m)[:,None] + r*np.arange(n)
...: np.put(a,indices + offset,-1)
...:
In [95]: a
Out[95]:
array([[[28, 59, 26, -1],
        [57, 28, -1, 49],
        [33,  6, 10, -1]],
       [[24, 16, -1, 67],
        [-1, 16, 72, 56],
        [74,  4, 71, -1]]])
Here's another way with linear indexing again, but in 2D -
m,n,r = a.shape
a.reshape(-1,r)[np.arange(m*n),indices.ravel()] = -1
Runtime tests and verify output -
In [156]: def vectorized_app1(a, indices):  # 3D linear indexing
     ...:     m, n, r = a.shape
     ...:     offset = n*r*np.arange(m)[:,None] + r*np.arange(n)
     ...:     np.put(a, indices + offset, -1)
     ...:
     ...: def vectorized_app2(a, indices):  # 2D linear indexing
     ...:     m, n, r = a.shape
     ...:     a.reshape(-1, r)[np.arange(m*n), indices.ravel()] = -1
     ...:
In [157]: # Generate random 3D array and the corresponding indices array
...: a = np.random.randint(0,99,(100,100,100))
...: indices = np.argmax(a, axis=2)
...:
...: # Make copies for feeding into functions
...: ac1 = a.copy()
...: ac2 = a.copy()
...:
In [158]: vectorized_app1(ac1,indices)
In [159]: vectorized_app2(ac2,indices)
In [160]: np.allclose(ac1,ac2)
Out[160]: True
In [161]: # Make copies for feeding into functions
...: ac1 = a.copy()
...: ac2 = a.copy()
...:
In [162]: %timeit vectorized_app1(ac1,indices)
1000 loops, best of 3: 311 µs per loop
In [163]: %timeit vectorized_app2(ac2,indices)
10000 loops, best of 3: 145 µs per loop
You can use indices to index into the last dimension of a, provided you also supply index arrays for the first two dimensions that broadcast against it correctly; the row index needs a trailing unit axis:
import numpy as np

a = np.array([[[1, 2],
               [3, 4]],
              [[5, 6],
               [7, 3]]])
indices = np.argmax(a, axis=2)
i = np.arange(a.shape[0])[:, None]  # shape (2, 1), broadcasts over columns
j = np.arange(a.shape[1])           # shape (2,)
print(repr(a[i, j, indices]))
# array([[2, 4],
#        [6, 7]])
a[i, j, indices] = -1
print(repr(a))
# array([[[ 1, -1],
#         [ 3, -1]],
#        [[ 5, -1],
#         [-1,  3]]])
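For completeness, on NumPy 1.15+ the same fetch-and-overwrite can be expressed with np.take_along_axis/np.put_along_axis; the index array needs a trailing axis so its ndim matches a. A sketch, run on a fresh copy of a:
out = a.copy()
maxvals = np.take_along_axis(out, indices[..., None], axis=2)  # per-row maxima, shape (2, 2, 1)
np.put_along_axis(out, indices[..., None], -1, axis=2)         # overwrite the maxima with -1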

Reshaping vector with indices in python

I am having some problems resizing a list in python. I have a vector (A) with -9999999 as a few of the elements. I want to find those elements, remove them, and remove the corresponding elements in B.
I have tried to index the non -9999999 values like this:
i = [i for i in range(len(press)) if press[i] !=-9999999]
But I get an error when I try to use the index to reshape press and my other vector.
TypeError: list indices must be integers, not list
The vectors have a length of about 26000
Basically, if I have vector A, I want to remove the -9999999 elements from A and the corresponding 65 and 32 from B.
A = [33,55,-9999999,44,78,22,-9999999,10,34]
B = [22,33,65,87,43,87,32,77,99]
Since you mentioned vector, I think you're looking for a NumPy-based solution:
>>> import numpy as np
>>> a = np.array(A)
>>> b = np.array(B)
>>> b[a!=-9999999]
array([22, 33, 87, 43, 87, 77, 99])
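If you also need the cleaned-up A, the same boolean mask indexes both arrays:
>>> mask = a != -9999999
>>> a[mask]
array([33, 55, 44, 78, 22, 10, 34])
>>> b[mask]
array([22, 33, 87, 43, 87, 77, 99])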
Pure Python solution using itertools.compress:
>>> from itertools import compress
>>> list(compress(B, (x != -9999999 for x in A)))
[22, 33, 87, 43, 87, 77, 99]
Timing comparisons:
>>> A = [33,55,-9999999,44,78,22,-9999999,10,34]*10000
>>> B = [22,33,65,87,43,87,32,77,99]*10000
>>> a = np.array(A)
>>> b = np.array(B)
>>> %timeit b[a!=-9999999]
100 loops, best of 3: 2.78 ms per loop
>>> %timeit list(compress(B, (x != -9999999 for x in A)))
10 loops, best of 3: 22.3 ms per loop
A = [33,55,-9999999,44,78,22,-9999999,10,34]
B = [22,33,65,87,43,87,32,77,99]
A1, B1 = (list(x) for x in zip(*((a, b) for a, b in zip(A, B) if a != -9999999)))
print(A1)
print(B1)
This yields:
[33, 55, 44, 78, 22, 10, 34]
[22, 33, 87, 43, 87, 77, 99]
c = [j for i, j in zip(A, B) if i != -9999999]
zip merges the two lists, creating a list of pairs (x, y). Using a list comprehension you can filter out the elements whose counterpart in A is -9999999.

Numpy: fast calculations considering items' neighbors and their position inside the array

I have 4 2D numpy arrays, called a, b, c, d, each of them made of n rows and m columns. What I need to do is assign to each element of b and d a value calculated as follows (pseudo-code):
min_coords = min_of_neighbors_coords(x, y)
b[x, y] = a[x, y] * a[min_coords]
d[x, y] = c[min_coords]
Where min_of_neighbors_coords is a function that, given the coordinates of an element of the array, returns the coordinates of the 'neighbor' element that has the lower value. I.e., considering the array:
1, 2, 5
3, 7, 2
2, 3, 6
min_of_neighbors_coords(1, 1) will refer to the central element with the value of 7, and will return the tuple (0, 0): the coordinates of the number 1.
I managed to do this using for loops (element by element), but the algorithm is VERY slow and I'm searching for a way to improve it, avoiding loops and delegating the calculations to numpy.
Is it possible?
EDIT: I have kept my original answer at the bottom. As Paul points out in the comments, the original answer didn't really answer the OP's question, and could be more easily achieved with an ndimage filter. The following much more cumbersome function should do the right thing. It takes two arrays, a and c, and returns the windowed minimum of a together with the values of c at the positions of the windowed minimums in a:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def neighbor_min(a, c):
    ac = np.concatenate((a[None], c[None]))
    rows, cols = ac.shape[1:]
    ret = np.empty_like(ac)
    # Fill in the center
    win_ac = as_strided(ac, shape=(2, rows-2, cols, 3),
                        strides=ac.strides+ac.strides[1:2])
    win_ac = win_ac[tuple(np.ogrid[:2, :rows-2, :cols]) +
                    (np.argmin(win_ac[0], axis=2),)]
    win_ac = as_strided(win_ac, shape=(2, rows-2, cols-2, 3),
                        strides=win_ac.strides+win_ac.strides[2:3])
    ret[:, 1:-1, 1:-1] = win_ac[tuple(np.ogrid[:2, :rows-2, :cols-2]) +
                                (np.argmin(win_ac[0], axis=2),)]
    # Fill the top, bottom, left and right borders
    win_ac = as_strided(ac[:, :2, :], shape=(2, 2, cols-2, 3),
                        strides=ac.strides+ac.strides[2:3])
    win_ac = win_ac[tuple(np.ogrid[:2, :2, :cols-2]) +
                    (np.argmin(win_ac[0], axis=2),)]
    ret[:, 0, 1:-1] = win_ac[:, np.argmin(win_ac[0], axis=0),
                             np.ogrid[:cols-2]]
    win_ac = as_strided(ac[:, -2:, :], shape=(2, 2, cols-2, 3),
                        strides=ac.strides+ac.strides[2:3])
    win_ac = win_ac[tuple(np.ogrid[:2, :2, :cols-2]) +
                    (np.argmin(win_ac[0], axis=2),)]
    ret[:, -1, 1:-1] = win_ac[:, np.argmin(win_ac[0], axis=0),
                              np.ogrid[:cols-2]]
    win_ac = as_strided(ac[:, :, :2], shape=(2, rows-2, 2, 3),
                        strides=ac.strides+ac.strides[1:2])
    win_ac = win_ac[tuple(np.ogrid[:2, :rows-2, :2]) +
                    (np.argmin(win_ac[0], axis=2),)]
    ret[:, 1:-1, 0] = win_ac[:, np.ogrid[:rows-2],
                             np.argmin(win_ac[0], axis=1)]
    win_ac = as_strided(ac[:, :, -2:], shape=(2, rows-2, 2, 3),
                        strides=ac.strides+ac.strides[1:2])
    win_ac = win_ac[tuple(np.ogrid[:2, :rows-2, :2]) +
                    (np.argmin(win_ac[0], axis=2),)]
    ret[:, 1:-1, -1] = win_ac[:, np.ogrid[:rows-2],
                              np.argmin(win_ac[0], axis=1)]
    # Fill the corners
    win_ac = ac[:, :2, :2]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, 0, 0] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    win_ac = ac[:, :2, -2:]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, 0, -1] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    win_ac = ac[:, -2:, -2:]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, -1, -1] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    win_ac = ac[:, -2:, :2]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, -1, 0] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    return ret
The return is a (2, rows, cols) array that can be unpacked into the two arrays:
>>> a = np.random.randint(100, size=(5,5))
>>> c = np.random.randint(100, size=(5,5))
>>> a
array([[42, 54, 18, 88, 26],
       [80, 65, 83, 31,  4],
       [51, 52, 18, 88, 52],
       [ 1, 70,  5,  0, 89],
       [47, 34, 27, 67, 68]])
>>> c
array([[94, 94, 29,  6, 76],
       [81, 47, 67, 21, 26],
       [44, 92, 20, 32, 90],
       [81, 25, 32, 68, 25],
       [49, 43, 71, 79, 77]])
>>> neighbor_min(a, c)
array([[[42, 18, 18,  4,  4],
        [42, 18, 18,  4,  4],
        [ 1,  1,  0,  0,  0],
        [ 1,  1,  0,  0,  0],
        [ 1,  1,  0,  0,  0]],
       [[94, 29, 29, 26, 26],
        [94, 29, 29, 26, 26],
        [81, 81, 68, 68, 68],
        [81, 81, 68, 68, 68],
        [81, 81, 68, 68, 68]]])
The OP's case could then be solved as:
def bd_from_ac(a, c):
    b, d = neighbor_min(a, c)
    return a*b, d
And while there is a serious performance hit, it is pretty fast still:
In [3]: a = np.random.rand(1000, 1000)
In [4]: c = np.random.rand(1000, 1000)
In [5]: %timeit bd_from_ac(a, c)
1 loops, best of 3: 570 ms per loop
You are not really using the coordinates of the minimum neighboring element for anything other than fetching it, so you may as well skip that part and create a min_neighbor function. If you don't want to resort to cython for fast looping, you are going to have to go with rolling window views, such as outlined in Paul's link. This will typically convert your (m, n) array into an (m-2, n-2, 3, 3) view of the same data, and you would then apply np.min over the last two axes.
Unfortunately, you have to apply it one axis at a time, so you will have to create an (m-2, n-2, 3) copy of your data. Fortunately, you can compute the minimum in two steps, first windowing and minimizing along one axis, then along the other, and obtain the same result. So at most you are going to need intermediate storage the size of your input. If needed, you could even reuse the output array as intermediate storage and avoid memory allocations, but that is left as an exercise...
The following function does that. It is kind of lengthy because it has to deal not only with the central area, but also with the special cases of the four edges and four corners. Other than that it is a pretty compact implementation:
def neighbor_min(a):
    rows, cols = a.shape
    ret = np.empty_like(a)
    # Fill in the center
    win_a = as_strided(a, shape=(rows-2, cols, 3),
                       strides=a.strides+a.strides[:1])
    win_a = win_a.min(axis=2)
    win_a = as_strided(win_a, shape=(rows-2, cols-2, 3),
                       strides=win_a.strides+win_a.strides[1:])
    ret[1:-1, 1:-1] = win_a.min(axis=2)
    # Fill the top, bottom, left and right borders
    win_a = as_strided(a[:2, :], shape=(2, cols-2, 3),
                       strides=a.strides+a.strides[1:])
    ret[0, 1:-1] = win_a.min(axis=2).min(axis=0)
    win_a = as_strided(a[-2:, :], shape=(2, cols-2, 3),
                       strides=a.strides+a.strides[1:])
    ret[-1, 1:-1] = win_a.min(axis=2).min(axis=0)
    win_a = as_strided(a[:, :2], shape=(rows-2, 2, 3),
                       strides=a.strides+a.strides[:1])
    ret[1:-1, 0] = win_a.min(axis=2).min(axis=1)
    win_a = as_strided(a[:, -2:], shape=(rows-2, 2, 3),
                       strides=a.strides+a.strides[:1])
    ret[1:-1, -1] = win_a.min(axis=2).min(axis=1)
    # Fill the corners
    ret[0, 0] = a[:2, :2].min()
    ret[0, -1] = a[:2, -2:].min()
    ret[-1, -1] = a[-2:, -2:].min()
    ret[-1, 0] = a[-2:, :2].min()
    return ret
You can now do things like:
>>> a = np.random.randint(10, size=(5, 5))
>>> a
array([[0, 3, 1, 8, 9],
       [7, 2, 7, 5, 7],
       [4, 2, 6, 1, 9],
       [2, 8, 1, 2, 3],
       [7, 7, 6, 8, 0]])
>>> neighbor_min(a)
array([[0, 0, 1, 1, 5],
       [0, 0, 1, 1, 1],
       [2, 1, 1, 1, 1],
       [2, 1, 1, 0, 0],
       [2, 1, 1, 0, 0]])
And your original question can be solved as:
def bd_from_ac(a, c):
    return a*neighbor_min(a), neighbor_min(c)
As a performance benchmark:
In [2]: m, n = 1000, 1000
In [3]: a = np.random.rand(m, n)
In [4]: c = np.random.rand(m, n)
In [5]: %timeit bd_from_ac(a, c)
1 loops, best of 3: 123 ms per loop
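For reference, if SciPy is acceptable, this windowed minimum is exactly what scipy.ndimage.minimum_filter computes (the ndimage filter alluded to in the EDIT above); with mode='nearest' the edge values are replicated, which leaves the border window minima unchanged. A sketch:
from scipy.ndimage import minimum_filter

def neighbor_min_ndimage(arr):
    # 3x3 moving minimum; replicated edges keep border results identical
    return minimum_filter(arr, size=3, mode='nearest')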
Finding a[min_coords] is a rolling window operation. Several clever solutions are outlined in this post. You'll want to make the creation of the c[min_coords] array a side effect of whichever solution you choose.
I hope this helps. I can post some sample code later when I have some time.
I am interested in helping you, and I believe there are possibly better solutions outside the scope of your question, but before putting my own time into writing code I need some feedback from you, because I am not 100% sure I understand what you need.
One thing to consider: if you are a C# developer, maybe a "brute-force" implementation in C# can outperform a clever NumPy implementation, so you could at least consider testing your rather simple operations implemented in C#. GeoTIFF (which I suppose you are reading) has a relatively friendly specification, and I guess there might be .NET GeoTIFF libraries around.
But supposing you want to give Numpy a try (and I believe you should), let's take a look at what you're trying to achieve:
If you are going to run min_coords(array) on every element of the arrays a and c, you might consider "stacking" nine copies of the same array, each copy shifted by one of the nine neighborhood offsets, using numpy.dstack() and numpy.roll(). Then you apply numpy.argmin(stacked_array, axis=2) and get back an array of values between 0 and 8, where each value maps to a tuple containing the offset indexes.
Using this principle, your min_coords() function would be vectorized, operating on the whole array at once, and giving back an array of indices into a lookup table containing the offsets, as in the sketch below.
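A minimal sketch of that idea (note that np.roll wraps around at the edges, so the first and last rows/columns would need separate handling):
import numpy as np

def neighbor_argmin_roll(a):
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    # nine copies of a, each shifted so that stacked[x, y, k] == a[x+di, y+dj]
    stacked = np.dstack([np.roll(np.roll(a, -di, axis=0), -dj, axis=1)
                         for di, dj in offsets])
    k = np.argmin(stacked, axis=2)  # values 0..8, indexing the offsets table
    return k, offsets               # offsets is the lookup table of (di, dj)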
If you have interest in elaborating this, please leave a comment.
Hope this helps!
