Fill mask efficiently based on start indices - python

I have a 2D array (2D for this example, but in general it can be N-D), and I would like to create a mask for it that masks the end of each row, starting from a given index. For example:
import numpy as np
np.random.seed(0xBEEF)
a = np.random.randint(10, size=(5, 6))
mask_indices = np.argmax(a, axis=1)
I would like to convert mask_indices to a boolean mask. Currently, I can't think of a better way than
mask = np.zeros(a.shape, dtype=bool)
for r, m in enumerate(mask_indices):
    mask[r, m:] = True
So for
a = np.array([[6, 5, 0, 2, 1, 2],
[8, 1, 3, 7, 1, 9],
[8, 7, 6, 7, 3, 6],
[2, 7, 0, 3, 1, 7],
[5, 4, 0, 7, 6, 0]])
and
mask_indices = np.array([0, 5, 0, 1, 3])
I would like to see
mask = np.array([[ True, True, True, True, True, True],
[False, False, False, False, False, True],
[ True, True, True, True, True, True],
[False, True, True, True, True, True],
[False, False, False, True, True, True]])
Is there a vectorized form of this operation?
In general, I would like to be able to do this across all the dimensions besides the one that defines the index points.

I. N-dim array: masking along the last axis (rows)
For an n-dim array, masking along the rows (last axis), we could do -
def mask_from_start_indices(a, mask_indices):
    r = np.arange(a.shape[-1])
    return mask_indices[...,None]<=r
Sample run -
In [177]: np.random.seed(0)
...: a = np.random.randint(10, size=(2, 2, 5))
...: mask_indices = np.argmax(a, axis=-1)
In [178]: a
Out[178]:
array([[[5, 0, 3, 3, 7],
[9, 3, 5, 2, 4]],
[[7, 6, 8, 8, 1],
[6, 7, 7, 8, 1]]])
In [179]: mask_indices
Out[179]:
array([[4, 0],
[2, 3]])
In [180]: mask_from_start_indices(a, mask_indices)
Out[180]:
array([[[False, False, False, False, True],
[ True, True, True, True, True]],
[[False, False, True, True, True],
[False, False, False, True, True]]])
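As a sanity check, the last-axis version can be run on the 2D example from the question and compared with the question's own loop. This is a minimal sketch; the variable names simply follow the question.

```python
import numpy as np

def mask_from_start_indices(a, mask_indices):
    # Positions 0..n-1 along the last axis.
    r = np.arange(a.shape[-1])
    # mask_indices[..., None] gets a trailing length-1 axis, so it broadcasts
    # against r: element [..., j] is True when j >= the start index.
    return mask_indices[..., None] <= r

a = np.array([[6, 5, 0, 2, 1, 2],
              [8, 1, 3, 7, 1, 9],
              [8, 7, 6, 7, 3, 6],
              [2, 7, 0, 3, 1, 7],
              [5, 4, 0, 7, 6, 0]])
mask_indices = np.argmax(a, axis=1)

# Loop from the question, used as the reference result.
expected = np.zeros(a.shape, dtype=bool)
for r, m in enumerate(mask_indices):
    expected[r, m:] = True

mask = mask_from_start_indices(a, mask_indices)
print(np.array_equal(mask, expected))
```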
II. N-dim array: masking along a generic axis
For n-dim arrays, masking along a generic (non-negative) axis would be -
def mask_from_start_indices_genericaxis(a, mask_indices, axis):
    # axis must be non-negative here: a negative value breaks the reshape arithmetic below
    r = np.arange(a.shape[axis]).reshape((-1,)+(1,)*(a.ndim-axis-1))
    mask_indices_nd = mask_indices.reshape(np.insert(mask_indices.shape,axis,1))
    return mask_indices_nd<=r
Sample runs -
Data array setup :
In [288]: np.random.seed(0)
...: a = np.random.randint(10, size=(2, 3, 5))
In [289]: a
Out[289]:
array([[[5, 0, 3, 3, 7],
[9, 3, 5, 2, 4],
[7, 6, 8, 8, 1]],
[[6, 7, 7, 8, 1],
[5, 9, 8, 9, 4],
[3, 0, 3, 5, 0]]])
Indices setup and masking along axis=1 -
In [290]: mask_indices = np.argmax(a, axis=1)
In [291]: mask_indices
Out[291]:
array([[1, 2, 2, 2, 0],
[0, 1, 1, 1, 1]])
In [292]: mask_from_start_indices_genericaxis(a, mask_indices, axis=1)
Out[292]:
array([[[False, False, False, False, True],
[ True, False, False, False, True],
[ True, True, True, True, True]],
[[ True, False, False, False, False],
[ True, True, True, True, True],
[ True, True, True, True, True]]])
Indices setup and masking along axis=2 -
In [293]: mask_indices = np.argmax(a, axis=2)
In [294]: mask_indices
Out[294]:
array([[4, 0, 2],
[3, 1, 3]])
In [295]: mask_from_start_indices_genericaxis(a, mask_indices, axis=2)
Out[295]:
array([[[False, False, False, False, True],
[ True, True, True, True, True],
[False, False, True, True, True]],
[[False, False, False, True, True],
[False, True, True, True, True],
[False, False, False, True, True]]])
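The generic-axis version assumes a non-negative axis. A minimal sketch of one way to also accept negative axis values, by normalising the axis first and using np.expand_dims to re-insert the reduced dimension. The function name is hypothetical, and the check uses np.indices, whose entry [ax] holds each element's position along axis ax.

```python
import numpy as np

def mask_from_start_indices_anyaxis(shape, mask_indices, axis):
    """Hypothetical variant of mask_from_start_indices_genericaxis that
    also accepts negative axis values."""
    ndim = len(shape)
    axis = axis % ndim  # e.g. -1 -> ndim - 1
    # Positions along `axis`, shaped (n, 1, ..., 1) so they line up with
    # `axis` and broadcast over every later axis.
    r = np.arange(shape[axis]).reshape((-1,) + (1,) * (ndim - axis - 1))
    # Re-insert the reduced axis as length 1 so both operands broadcast.
    return np.expand_dims(mask_indices, axis) <= r

np.random.seed(0)
a = np.random.randint(10, size=(2, 3, 5))
for ax in (0, 1, 2, -1, -2, -3):
    idx = np.argmax(a, axis=ax)
    out = mask_from_start_indices_anyaxis(a.shape, idx, ax)
    # Reference: compare every element's position along `ax` with its start index.
    ref = np.indices(a.shape)[ax] >= np.expand_dims(idx, ax)
    assert np.array_equal(out, ref)
```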
Other scenarios
A. Extending to given end/stop-indices for masking
To extend the solutions for cases when we are given end/stop-indices for masking, i.e. we are looking to vectorize mask[r, :m] = True, we just need to edit the last step of comparison in the posted solutions to the following -
return mask_indices_nd>r
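For the last-axis version (I), mask_indices_nd is simply mask_indices[..., None]. A minimal 2D sketch of the stop-index variant, checked against the loop it replaces. The index values are borrowed from the question purely as example input.

```python
import numpy as np

stop_indices = np.array([0, 5, 0, 1, 3])
n_cols = 6

r = np.arange(n_cols)
# Position j is masked when j < m, which is what mask[row, :m] = True does.
mask = stop_indices[:, None] > r

# Loop reference for comparison.
ref = np.zeros((len(stop_indices), n_cols), dtype=bool)
for row, m in enumerate(stop_indices):
    ref[row, :m] = True
```

A stop index of 0 leaves its row entirely unmasked, matching the empty slice :0.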
B. Outputting an integer array
There may be cases where we want an int array instead. In those cases, simply view the output as such. Hence, if out is the output of the posted solutions, we can simply do out.view('i1') or out.view('u1') for int8 and uint8 dtype outputs, respectively.
For other datatypes, we would need to use .astype() for dtype conversions.
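A short sketch of the difference. A bool array uses one byte per element, so it can be reinterpreted as int8/uint8 without copying. Wider integer types need .astype(), which allocates a new array.

```python
import numpy as np

out = np.array([[False, True, True],
                [True, False, False]])

as_i1 = out.view('i1')         # int8 view of the same memory, no copy
as_u1 = out.view('u1')         # uint8 view, also no copy
as_i64 = out.astype(np.int64)  # wider dtype: a converted copy
```

Because the views share memory with out, writing to as_i1 also changes out. Use .astype() (or .copy()) when an independent array is needed.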
C. For index-inclusive masking for stop-indices
For index-inclusive masking, i.e. when the stop index itself is to be included in the stop-indices case, we simply need to include equality in the comparison. Hence, the last step would be -
return mask_indices_nd>=r
D. For index-exclusive masking for start-indices
This is the case when start indices are given but the elements at those indices are not to be masked: masking starts only from the next element and runs to the end. So, by the same reasoning as in the previous section, for this case the last step would be modified to -
return mask_indices_nd<r
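Sections A, C and D differ only in the comparison operator. A minimal sketch of all four variants on a single row with index 2, with the equivalent slice noted for each:

```python
import numpy as np

idx = np.array([2])       # one row whose index is 2
r = np.arange(5)
d = idx[:, None]

start_inclusive = d <= r  # like mask[2:]  -> [F F T T T]
start_exclusive = d < r   # like mask[3:]  -> [F F F T T]
stop_exclusive = d > r    # like mask[:2]  -> [T T F F F]
stop_inclusive = d >= r   # like mask[:3]  -> [T T T F F]
```

Each start variant is the complement of one stop variant (d <= r is ~(d > r), d < r is ~(d >= r)), so either can be obtained from the other with ~.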

>>> az = np.zeros(a.shape)
>>> az[np.arange(az.shape[0]), mask_indices] = 1
>>> az.cumsum(axis=1).astype(bool) # for N-D, cumsum along the masked axis (and adapt the indexing above)
array([[ True, True, True, True, True, True],
[False, False, False, False, False, True],
[ True, True, True, True, True, True],
[False, True, True, True, True, True],
[False, False, False, True, True, True]])
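The cumsum idea can be extended to N-D by placing the markers with np.put_along_axis instead of the 2D fancy indexing. A minimal sketch; mask_via_cumsum is a hypothetical name.

```python
import numpy as np

def mask_via_cumsum(shape, mask_indices, axis=-1):
    """Hypothetical N-D generalisation of the cumsum answer above."""
    axis = axis % len(shape)
    marker = np.zeros(shape, dtype=np.int8)
    # Write a single 1 at each start index along `axis`.
    np.put_along_axis(marker, np.expand_dims(mask_indices, axis), 1, axis=axis)
    # The running sum is 0 before the marker and 1 from the marker onwards.
    return marker.cumsum(axis=axis).astype(bool)

a = np.array([[6, 5, 0, 2, 1, 2],
              [8, 1, 3, 7, 1, 9],
              [8, 7, 6, 7, 3, 6],
              [2, 7, 0, 3, 1, 7],
              [5, 4, 0, 7, 6, 0]])
mask = mask_via_cumsum(a.shape, np.argmax(a, axis=1), axis=1)
```

Unlike the comparison-based answer, this requires every start index to be a valid position (smaller than shape[axis]). An index equal to the axis length, meaning "mask nothing", would be out of range for put_along_axis.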

Related

How can I perform the "reverse" of numpy argwhere? [duplicate]

This question already has an answer here:
Replace 2D numpy array elements based on 2D indexes [duplicate]
Suppose I have a boolean NumPy array, and I perform np.argwhere() on it. Is there any way to easily and efficiently do the reverse operation? In other words, given the final shape of a and the results of argwhere(), how can I find a? I've tried to use the argwhere results together with an array full of False, but can't figure out how to use them to do it. Maybe somehow use np.where()?
>>> a = np.array([[False, True, False, True, False],
[False, False, True, False, False]])
>>> results = np.argwhere(a)
>>> results
array([[0, 1],
[0, 3],
[1, 2]], dtype=int64)
>>> recover_a = np.full(shape=a.shape, fill_value=False)
>>> # I am guessing I could start here and then do something...
Use the columns of results as indices to update the values in recover_a:
recover_a[results[:,0], results[:,1]] = True
recover_a
# array([[False, True, False, True, False],
# [False, False, True, False, False]])
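The same column-wise indexing extends to any number of dimensions by unpacking all columns at once. tuple(results.T) is a tuple of per-axis index arrays, the same form np.nonzero returns. A minimal sketch; from_argwhere is a hypothetical name.

```python
import numpy as np

def from_argwhere(coords, shape):
    """Hypothetical helper: rebuild a boolean array from np.argwhere output.

    coords has one row per True element and one column per axis.
    """
    out = np.zeros(shape, dtype=bool)
    out[tuple(coords.T)] = True
    return out

a = np.array([[False, True, False, True, False],
              [False, False, True, False, False]])
recover_a = from_argwhere(np.argwhere(a), a.shape)
```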
In [233]: a = np.array([[False, True, False, True, False], [False, False, True,
...: False, False]])
In [234]: np.argwhere(a)
Out[234]:
array([[0, 1],
[0, 3],
[1, 2]])
In [235]: np.nonzero(a)
Out[235]: (array([0, 0, 1]), array([1, 3, 2]))
argwhere is just the np.transpose(np.nonzero(a)). One is a tuple of arrays, the other a 2d array with those arrays arranged as columns.
The nonzero/where result is better for indexing, since it is a tuple of indices.
In [236]: res = np.zeros(a.shape, bool)
In [237]: res[np.nonzero(a)] = True
In [238]: res
Out[238]:
array([[False, True, False, True, False],
[False, False, True, False, False]])
In [239]: a[np.nonzero(a)]
Out[239]: array([ True, True, True])

Identify the interior of a boolean array / blob - NumPy / Python

Suppose I have a boolean array with shape (nrows, ncols). True represents that I have a defined value (a real number) and False represents an undefined (not of interest) value.
I'm trying to figure out an efficient way to extract the rows and cols of both the boundary and the interior. For example, suppose I had the following boolean array, where I'm marking the boundary in red and the interior in green:
Then a desired output would be (the positions of the green Trues):
interior = [(2,3), (2,4)]
We can assume that the interior is always connected.
Using np.where(array == False)[0], I get the indices of the Falses, but how do I go from there to the boundary indices and then to the interior? I can of course loop through each boolean and check whether any of its neighbours is False; if not, then it's an interior cell.
Any tips on how to do this efficiently without looping? Another example to be clear:
desired output:
interior = [(2,3) , (2,4) , (3,3) , (3,4) , (3,5) , (4,3), (4,4), (4,5)]
The output can be a boolean array as well, containing Trues in interior positions, False otherwise. It does not matter. Thanks in advance.
Approach #1
We can use 2D convolution -
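import numpy as np
from scipy.signal import convolve2d
def interior_indices(a):
    # 3x3 window of ones: the convolution counts the True cells around each cell (itself included)
    kernel = np.ones((3,3),dtype=int)
    # 'same' pads with zeros, so cells on the border can never reach 9
    return np.argwhere(convolve2d(a,kernel,'same')==9)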
Sample runs -
In [44]: a1
Out[44]:
array([[False, False, False, False, False, False, False, False],
[ True, True, True, True, True, True, False, False],
[False, True, True, True, True, True, True, False],
[False, False, True, True, True, True, False, False]])
In [45]: interior_indices(a1)
Out[45]:
array([[2, 3],
[2, 4]])
In [46]: a2
Out[46]:
array([[False, False, False, False, False, False, False, False],
[False, True, True, True, True, True, False, False],
[False, True, True, True, True, True, True, False],
[False, False, True, True, True, True, True, False],
[False, False, True, True, True, True, True, False],
[False, True, True, True, True, True, True, False],
[False, False, False, True, True, False, False, False]])
In [47]: interior_indices(a2)
Out[47]:
array([[2, 3],
[2, 4],
[3, 3],
[3, 4],
[3, 5],
[4, 3],
[4, 4],
[4, 5]])
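If SciPy is not available, the same "all nine cells are True" test can be written in NumPy alone. This swaps in a different technique (zero padding plus numpy.lib.stride_tricks.sliding_window_view, available in NumPy 1.20+). The sketch assumes, as in the convolution above, that a cell is interior when it and all 8 neighbours are True; interior_mask is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def interior_mask(a):
    # Pad with False so cells on the border can never count as interior.
    padded = np.pad(a, 1, mode='constant', constant_values=False)
    # windows[i, j] is the 3x3 neighbourhood centred on a[i, j].
    windows = sliding_window_view(padded, (3, 3))
    return windows.all(axis=(-2, -1))

a1 = np.array([[False, False, False, False, False, False, False, False],
               [ True,  True,  True,  True,  True,  True, False, False],
               [False,  True,  True,  True,  True,  True,  True, False],
               [False, False,  True,  True,  True,  True, False, False]])
print(np.argwhere(interior_mask(a1)))
```

This returns a boolean array of the input's shape, which the question also accepts. np.argwhere converts it to the index form.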
Approach #2
Alternatively, with uniform-filter -
In [61]: from scipy.ndimage import uniform_filter
In [62]: np.argwhere(uniform_filter(a1,mode='constant'))
Out[62]:
array([[2, 3],
[2, 4]])
In [63]: np.argwhere(uniform_filter(a2,mode='constant'))
Out[63]:
array([[2, 3],
[2, 4],
[3, 3],
[3, 4],
[3, 5],
[4, 3],
[4, 4],
[4, 5]])
Approach #3
And with binary-erosion -
In [72]: from scipy.ndimage import binary_erosion
In [73]: kernel = np.ones((3,3),dtype=bool)
In [74]: np.argwhere(binary_erosion(a1,kernel))
Out[74]:
array([[2, 3],
[2, 4]])
In [75]: np.argwhere(binary_erosion(a2,kernel))
Out[75]:
array([[2, 3],
[2, 4],
[3, 3],
[3, 4],
[3, 5],
[4, 3],
[4, 4],
[4, 5]])
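Found a way! If it's too trivial, vote to delete :)
import numpy as np

# data: the 2D boolean array
c = data[1:-1, 1:-1]   # the cell itself (cells on the border are never interior)
d0 = data[1:-1, 2:]    # right neighbour
d1 = data[:-2, 2:]     # upper-right
d2 = data[:-2, 1:-1]   # above
d3 = data[:-2, :-2]    # upper-left
d4 = data[1:-1, :-2]   # left
d5 = data[2:, :-2]     # lower-left
d6 = data[2:, 1:-1]    # below
d7 = data[2:, 2:]      # lower-right
# The & chain is already boolean, so np.where(..., True, False) is not needed.
# It has shape (nrows-2, ncols-2), so write it back into a full-size array
# to keep the original row/col indices.
interior = np.zeros_like(data)
interior[1:-1, 1:-1] = c & d0 & d1 & d2 & d3 & d4 & d5 & d6 & d7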

Vectorized approach for masking individual slices per column

I have a numpy array:
>>> a = np.arange(20).reshape(5, -1)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])
I have an array of row ranges, one [start, stop) pair per column in column order, that I would like to turn into a boolean mask:
idx = np.array([[0,2], [1,3], [2,4], [1,4]])
My desired mask for this set of indices is:
array([[ True, False, False, False],
[ True, True, False, True],
[False, True, True, True],
[False, False, True, True],
[False, False, False, False]])
So column 0 has 0:2 masked, column 1 has 1:3 masked, etc. My current approach works, but I am looking for something vectorized:
def foo(shape, idx):
    # shape: (n_rows, n_cols) of the output; idx: one [start, stop) pair per column
    out = np.zeros(shape, dtype=bool)
    for (i, j), k in zip(idx, np.arange(shape[1])):
        out[i:j, k] = True
    return out
In action:
foo(a.shape, idx)
array([[ True, False, False, False],
[ True, True, False, True],
[False, True, True, True],
[False, False, True, True],
[False, False, False, False]])
Using broadcasting -
In [434]: r = np.arange(a.shape[0])[:,None]
In [435]: (idx[:,0] <= r) & (idx[:,1] > r)
Out[435]:
array([[ True, False, False, False],
[ True, True, False, True],
[False, True, True, True],
[False, False, True, True],
[False, False, False, False]])
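Wrapped as a function, the broadcasting answer can be checked against the question's loop on random ranges, including ranges whose stop runs past the last row (slicing clamps these, and the comparison r < stop handles them the same way). A minimal sketch; column_slices_mask is a hypothetical name.

```python
import numpy as np

def column_slices_mask(shape, idx):
    """Hypothetical wrapper: idx[k] = [start, stop) is the row range for column k."""
    r = np.arange(shape[0])[:, None]  # row numbers, shape (n_rows, 1)
    # (n_cols,) against (n_rows, 1) broadcasts to (n_rows, n_cols)
    return (idx[:, 0] <= r) & (idx[:, 1] > r)

rng = np.random.default_rng(0)
n_rows, n_cols = 7, 10
starts = rng.integers(0, n_rows, size=n_cols)
stops = starts + rng.integers(0, n_rows, size=n_cols)  # may exceed n_rows
idx = np.column_stack([starts, stops])

# Loop from the question, used as the reference.
loop = np.zeros((n_rows, n_cols), dtype=bool)
for k, (i, j) in enumerate(idx):
    loop[i:j, k] = True

vec = column_slices_mask((n_rows, n_cols), idx)
```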

How to obtain the same result as numpy.where over a 2D array without getting 2 indices from the same row

I have a numpy array with booleans:
bool_array.shape
Out[84]: (78, 8)
bool_array.dtype
Out[85]: dtype('bool')
And I would like to find the indices where the second dimension is True:
bool_array[30:35]
Out[87]:
array([[False, False, False, False, True, False, False, False],
[ True, False, False, False, True, False, False, False],
[False, False, False, False, False, True, False, False],
[ True, False, False, False, False, False, False, False],
[ True, False, False, False, False, False, False, False]], dtype=bool)
I have been using numpy.where to do this, but sometimes a row has more than one True value along the second dimension.
I would like to find a way to obtain the same result as numpy.where, but without getting two indices from the same row:
np.where(bool_array)[0][30:35]
Out[88]: array([30, 31, 31, 32, 33])
I currently solve this by looping over the results of numpy.where, finding which indices are equal to the one before them, and using numpy.delete to remove the unwanted indices.
I would like to know if there is a more direct way to obtain the kind of result that I want.
Notes:
The rows of the boolean arrays that I use always have at least one True value.
I don't care which one of the multiple True values remains; I only care to have just one.
IIUC, and given that there is at least one True element per row, you can simply use np.argmax along the second axis to select the first True element along each row, like so -
col_idx = bool_array.argmax(1)
Sample run -
In [246]: bool_array
Out[246]:
array([[ True, True, True, True, False],
[False, False, True, True, False],
[ True, True, False, False, True],
[ True, True, False, False, True]], dtype=bool)
In [247]: np.where(bool_array)[0]
Out[247]: array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
In [248]: np.where(bool_array)[1]
Out[248]: array([0, 1, 2, 3, 2, 3, 0, 1, 4, 0, 1, 4])
In [249]: bool_array.argmax(1)
Out[249]: array([0, 2, 0, 0])
Explanation -
Corresponding to the duplicates from the output of np.where(bool_array)[0], i.e. :
array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
, we need to select any one of the corresponding entries from the output of np.where(bool_array)[1] (the first entry for each row is marked with ^), i.e. :
array([0, 1, 2, 3, 2, 3, 0, 1, 4, 0, 1, 4])
       ^           ^     ^        ^
Thus, selecting the first True from each row with bool_array.argmax(1) gives us :
array([0, 2, 0, 0])
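The argmax trick relies on the question's guarantee that every row has at least one True. For an all-False row argmax also returns 0, which cannot be told apart from a True in column 0. A minimal sketch of one way to flag such rows (the -1 sentinel is an arbitrary choice):

```python
import numpy as np

b = np.array([[False, True, True],
              [False, False, False],
              [True, False, True]])

first = b.argmax(axis=1)   # row 1 has no True, yet argmax still gives 0
has_true = b.any(axis=1)
first_or_minus1 = np.where(has_true, first, -1)
```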
You could call np.unique on the resultant array like so:
>>> np.where(bool_array)[0][30:35]
Out[4]: array([0, 1, 1, 2, 3, 4])
>>> np.unique(np.where(bool_array)[0][30:35])
Out[5]: array([0, 1, 2, 3, 4])
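np.unique can also recover one column index per row. With return_index=True it reports where each distinct row index first occurs, and because np.nonzero returns indices in row-major order, that position is the first True in each row. A short sketch using the 4x5 bool_array from the argmax answer above:

```python
import numpy as np

bool_array = np.array([[True, True, True, True, False],
                       [False, False, True, True, False],
                       [True, True, False, False, True],
                       [True, True, False, False, True]])

rows, cols = np.nonzero(bool_array)
# first_pos[i] is the position in `rows` where unique_rows[i] first occurs
unique_rows, first_pos = np.unique(rows, return_index=True)
first_cols = cols[first_pos]
```

Here first_cols matches bool_array.argmax(1). Unlike argmax, this also simply omits rows with no True at all.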

Access all off-diagonal elements of a boolean numpy matrix

Suppose there is a diagonal matrix M:
import numpy as np
M = np.matrix(np.eye(5, dtype=bool))
Does anybody know a simple way to access all off-diagonal elements, meaning all elements that are False? In R I can simply do this by executing
M[!M]
Unfortunately this is not valid in Python.
You need the bitwise not operator:
M[~M]
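~ acts as a logical NOT only for boolean arrays; on integer arrays it is a bitwise NOT. A short sketch, using plain ndarrays rather than np.matrix, that builds a boolean off-diagonal mask with np.eye(..., dtype=bool) so the same indexing also works when M itself is not boolean:

```python
import numpy as np

M = np.eye(5, dtype=bool)
off_diag = M[~M]                 # the 20 off-diagonal elements (all False here)

N = np.arange(1, 10).reshape(3, 3)
off_mask = ~np.eye(3, dtype=bool)
off_diag_N = N[off_mask]         # elements 2, 3, 4, 6, 7, 8

# On an integer array ~ flips bits instead: ~1 == -2 and ~0 == -1,
# so ~np.eye(3, dtype=int) is not a usable mask.
bitwise = ~np.eye(3, dtype=int)
```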
You might try np.extract combined with np.eye. For example:
M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
np.extract(1 - np.eye(3), M)
# result: array([2, 3, 4, 6, 7, 8])
In your example M is the identity matrix, so every extracted off-diagonal element is False:
M = np.matrix(np.eye(5, dtype=bool))
np.extract(1 - np.eye(5), M)
#result:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False], dtype=bool)
