Suppose I have a boolean numpy array, and I perform np.argwhere() on it. Is there any way to easily and efficiently do the reverse operation? In other words, given the final shape of a and the results of argwhere(), how can I find a? I've tried to use the argwhere results together with an array full of False, but can't figure out how to use them to do it. Maybe somehow use np.where()?
>>> a = np.array([[False, True, False, True, False],
...               [False, False, True, False, False]])
>>> results = np.argwhere(a)
>>> results
array([[0, 1],
       [0, 3],
       [1, 2]], dtype=int64)
>>> recover_a = np.full(shape=a.shape, fill_value=False)  # I guess I could start here, then do something...
Use the columns of results as indices to set the corresponding elements of recover_a to True:
recover_a[results[:,0], results[:,1]] = True
recover_a
# array([[False, True, False, True, False],
# [False, False, True, False, False]])
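The same idea generalizes to any number of dimensions: each column of the argwhere result indexes one axis, so tuple(results.T) is exactly the tuple-of-index-arrays form that fancy indexing expects. A minimal sketch, using a hypothetical helper name unargwhere (not a NumPy function):

```python
import numpy as np

def unargwhere(indices, shape):
    """Rebuild a boolean mask from np.argwhere output (hypothetical helper)."""
    indices = np.asarray(indices)
    mask = np.zeros(shape, dtype=bool)
    # Column k of `indices` holds the coordinates along axis k, so the
    # transposed rows, wrapped in a tuple, give one index array per axis.
    mask[tuple(indices.T)] = True
    return mask

a = np.array([[False, True, False, True, False],
              [False, False, True, False, False]])
recovered = unargwhere(np.argwhere(a), a.shape)
```

It expects the (N, ndim) integer array that np.argwhere returns; that holds even when there are no True elements, since argwhere then returns an empty (0, ndim) array.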
In [233]: a = np.array([[False, True, False, True, False], [False, False, True, False, False]])
In [234]: np.argwhere(a)
Out[234]:
array([[0, 1],
[0, 3],
[1, 2]])
In [235]: np.nonzero(a)
Out[235]: (array([0, 0, 1]), array([1, 3, 2]))
argwhere is just np.transpose(np.nonzero(a)). nonzero returns a tuple of arrays, one per dimension; argwhere returns a 2D array with those arrays arranged as columns.
The nonzero/where result is better suited for indexing, since it is already a tuple of index arrays.
In [236]: res = np.zeros(a.shape, bool)
In [237]: res[np.nonzero(a)] = True
In [238]: res
Out[238]:
array([[False, True, False, True, False],
[False, False, True, False, False]])
In [239]: a[np.nonzero(a)]
Out[239]: array([ True, True, True])
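A quick check of the argwhere/nonzero relationship described above, on the same a. Converting the argwhere result back with tuple(coords.T) restores the tuple-of-arrays form, so either result can be used to rebuild the mask (the names rows_cols, coords and back are just for illustration):

```python
import numpy as np

a = np.array([[False, True, False, True, False],
              [False, False, True, False, False]])

rows_cols = np.nonzero(a)    # tuple of index arrays, one per axis
coords = np.argwhere(a)      # 2D array, one (row, col) pair per True element

# argwhere stacks the nonzero arrays as columns ...
assert np.array_equal(coords, np.transpose(rows_cols))

# ... so transposing back and wrapping in a tuple restores the nonzero form
back = tuple(coords.T)
assert all(np.array_equal(x, y) for x, y in zip(back, rows_cols))

res = np.zeros(a.shape, dtype=bool)
res[back] = True
```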
I have the following problem, which I want to solve with NumPy array indexing.
The problem is:
Matrix = np.zeros((4*4), dtype = bool) which gives this 2D matrix.
Matrix = [[False, False, False, False],
[False, False, False, False],
[False, False, False, False],
[False, False, False, False]]
Let us suppose that we have another array a = np.array([0,1], [2,1], [3,3])
a = [[0, 1],
[2, 1],
[3, 3]]
My question is: how can I use the elements of the a array as indices to fill my matrix with True values? The output should look like this:
Matrix = [[False, True, False, False], # [0, 1]
[False, False, False, False],
[False, True, False, False], # [2, 1]
[False, False, False, True]] # [3, 3]
import numpy as np
Matrix = np.zeros((4*4), dtype = bool).reshape(4,4)
a = [[0, 1],
[2, 1],
[3, 3]]
Unroll them into a proper pair of index arrays (row indices, column indices) for a 2D array:
a = ([x[0] for x in a], [x[1] for x in a])
Matrix[a] = True
>>> Matrix
array([[False, True, False, False],
[False, False, False, False],
[False, True, False, False],
[False, False, False, True]])
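Because the question starts from np.zeros((4*4), dtype=bool), which is a flat array of 16 elements, another option is to convert each (row, col) pair into a flat position with np.ravel_multi_index (row * 4 + col here), set those positions, and reshape at the end. This is a sketch of a different technique from the answer above, not a replacement for it:

```python
import numpy as np

a = np.array([[0, 1], [2, 1], [3, 3]])
shape = (4, 4)

flat = np.zeros(shape[0] * shape[1], dtype=bool)  # same as np.zeros((4*4), dtype=bool)
# Each (row, col) pair becomes row * 4 + col; out-of-range pairs raise a ValueError
flat_idx = np.ravel_multi_index((a[:, 0], a[:, 1]), shape)
flat[flat_idx] = True
Matrix = flat.reshape(shape)
```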
Simple way to make the (4,4) bool array:
In [390]: arr = np.zeros((4,4), dtype = bool)
In [391]: arr
Out[391]:
array([[False, False, False, False],
[False, False, False, False],
[False, False, False, False],
[False, False, False, False]])
Proper syntax for making a:
In [392]: a = np.array([[0,1], [2,1], [3,3]])
In [393]: a
Out[393]:
array([[0, 1],
[2, 1],
[3, 3]])
Use the 2 columns of a as indices for the 2 dimensions of arr:
In [394]: arr[a[:,0],a[:,1]]=True
In [395]: arr
Out[395]:
array([[False, True, False, False],
[False, False, False, False],
[False, True, False, False],
[False, False, False, True]])
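The same pair of index arrays also works for reading, which gives a quick way to check the assignment, and np.argwhere(arr) recovers the pairs. Note that argwhere returns them in row-major order, so it matches a only when a is already sorted that way; the sketch below deliberately uses an unsorted copy of the same pairs to show this:

```python
import numpy as np

arr = np.zeros((4, 4), dtype=bool)
a = np.array([[3, 3], [0, 1], [2, 1]])   # same pairs as above, deliberately unsorted
arr[a[:, 0], a[:, 1]] = True

# Reading with the same index arrays returns the values at those cells
print(arr[a[:, 0], a[:, 1]])              # [ True  True  True]

# argwhere recovers the same pairs, but always in row-major (sorted) order
recovered = np.argwhere(arr)
print(recovered.tolist())                 # [[0, 1], [2, 1], [3, 3]]
```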
I have two arrays a and b of length n and m respectively, where n > m, a has values in 1,...,m and b is a permutation of 1,...,m:
# n > m
n = 20000
m = 10000
a = np.random.randint(1, m + 1, size=n)
b = np.random.permutation(m) + 1
How can I find an array c of length n with values in 1,...,m such that the following holds?
assert(b[c-1]==a)
This is one way:
_, c = np.nonzero(b == a[:, None])
assert np.array_equal(b[c], a)
Just note that this asserts b[c] == a rather than b[c-1] == a: c holds 0-based indices, so c + 1 is the array your original assert expects.
Working:
The expression b == a[:, None] returns a boolean array of shape n x m. Through broadcasting, a[:, None] (shape (n, 1)) is compared elementwise with b (shape (m,)), so row i compares a[i] with every element of b. Each row therefore holds m booleans, with True in the column col where a[i] equals b[col].
This is a small illustration:
>>> m = 5
>>> n = 10
>>> a = np.random.randint(1, m+1, size=n)
>>> b = np.random.permutation(m) + 1
>>> a
array([5, 4, 2, 1, 4, 2, 5, 4, 5, 2])
>>> b
array([3, 5, 1, 2, 4])
>>> b == a[:, None]
array([[False, True, False, False, False],
[False, False, False, False, True],
[False, False, False, True, False],
[False, False, True, False, False],
[False, False, False, False, True],
[False, False, False, True, False],
[False, True, False, False, False],
[False, False, False, False, True],
[False, True, False, False, False],
[False, False, False, True, False]])
Applying np.nonzero() to this 2D boolean array gives two 1D arrays holding the row and column indices of its True elements, i.e. position (i[k], j[k]) of the boolean array is True for every k. Here the row and column index arrays are named i and j.
>>> i, j = np.nonzero(b == a[:, None])
>>> i
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> j
array([1, 4, 3, 2, 4, 3, 1, 4, 1, 3])
In other words, the column indices j tell you how a can be obtained by indexing b with j:
>>> b[j]
array([5, 4, 2, 1, 4, 2, 5, 4, 5, 2])
>>> a
array([5, 4, 2, 1, 4, 2, 5, 4, 5, 2])
Essentially, every element of a comes from the set of values in b. The idea above is just to find where each element of a appears in b and take that position as the index.
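The broadcast comparison builds an n x m boolean array, which for n = 20000 and m = 10000 is 200 million booleans (about 200 MB). Since b is a permutation of 1..m, an alternative that needs only O(n + m) memory is to build the inverse permutation of b once and look up each element of a in it. This is a swapped-in technique, not the answer's; pos is a hypothetical name, and np.argsort(b) yields the same array:

```python
import numpy as np

n, m = 20000, 10000
rng = np.random.default_rng(0)
a = rng.integers(1, m + 1, size=n)
b = rng.permutation(m) + 1

# pos[v - 1] holds the 0-based position of value v inside b (inverse permutation)
pos = np.empty(m, dtype=np.intp)
pos[b - 1] = np.arange(m)

c = pos[a - 1] + 1   # 1-based, matching the question's convention
assert (b[c - 1] == a).all()
```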
I have a 2D array (2D for this example, but it can actually be N-D), and I would like to create a mask for it that masks the end of each row, starting at a given index. For example:
np.random.seed(0xBEEF)
a = np.random.randint(10, size=(5, 6))
mask_indices = np.argmax(a, axis=1)
I would like to convert mask_indices to a boolean mask. Currently, I can't think of a better way than
mask = np.zeros(a.shape, dtype=bool)
for r, m in enumerate(mask_indices):
    mask[r, m:] = True
So for
a = np.array([[6, 5, 0, 2, 1, 2],
[8, 1, 3, 7, 1, 9],
[8, 7, 6, 7, 3, 6],
[2, 7, 0, 3, 1, 7],
[5, 4, 0, 7, 6, 0]])
and
mask_indices = np.array([0, 5, 0, 1, 3])
I would like to see
mask = np.array([[ True, True, True, True, True, True],
[False, False, False, False, False, True],
[ True, True, True, True, True, True],
[False, True, True, True, True, True],
[False, False, False, True, True, True]])
Is there a vectorized form of this operation?
In general, I would like to be able to do this across all the dimensions besides the one that defines the index points.
I. Ndim array-masking along last axis (rows)
For an n-dim array masked along the last axis (rows), we could do -
import numpy as np

def mask_from_start_indices(a, mask_indices):
    r = np.arange(a.shape[-1])
    return mask_indices[...,None]<=r
Sample run -
In [177]: np.random.seed(0)
...: a = np.random.randint(10, size=(2, 2, 5))
...: mask_indices = np.argmax(a, axis=-1)
In [178]: a
Out[178]:
array([[[5, 0, 3, 3, 7],
[9, 3, 5, 2, 4]],
[[7, 6, 8, 8, 1],
[6, 7, 7, 8, 1]]])
In [179]: mask_indices
Out[179]:
array([[4, 0],
[2, 3]])
In [180]: mask_from_start_indices(a, mask_indices)
Out[180]:
array([[[False, False, False, False, True],
[ True, True, True, True, True]],
[[False, False, True, True, True],
[False, False, False, True, True]]])
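As a sanity check, the vectorized version can be compared against the loop from the question on the question's own 2D example, with dtype=bool in place of np.bool (which recent NumPy versions no longer provide):

```python
import numpy as np

def mask_from_start_indices(a, mask_indices):
    r = np.arange(a.shape[-1])
    return mask_indices[..., None] <= r

a = np.array([[6, 5, 0, 2, 1, 2],
              [8, 1, 3, 7, 1, 9],
              [8, 7, 6, 7, 3, 6],
              [2, 7, 0, 3, 1, 7],
              [5, 4, 0, 7, 6, 0]])
mask_indices = np.argmax(a, axis=1)

# Reference result from the loop in the question
loop_mask = np.zeros(a.shape, dtype=bool)
for row, start in enumerate(mask_indices):
    loop_mask[row, start:] = True

assert np.array_equal(mask_from_start_indices(a, mask_indices), loop_mask)
```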
II. Ndim array-masking along generic axis
For n-dim arrays masking along a generic axis, it would be -
def mask_from_start_indices_genericaxis(a, mask_indices, axis):
    r = np.arange(a.shape[axis]).reshape((-1,)+(1,)*(a.ndim-axis-1))
    mask_indices_nd = mask_indices.reshape(np.insert(mask_indices.shape,axis,1))
    return mask_indices_nd<=r
Sample runs -
Data array setup :
In [288]: np.random.seed(0)
...: a = np.random.randint(10, size=(2, 3, 5))
In [289]: a
Out[289]:
array([[[5, 0, 3, 3, 7],
[9, 3, 5, 2, 4],
[7, 6, 8, 8, 1]],
[[6, 7, 7, 8, 1],
[5, 9, 8, 9, 4],
[3, 0, 3, 5, 0]]])
Indices setup and masking along axis=1 -
In [290]: mask_indices = np.argmax(a, axis=1)
In [291]: mask_indices
Out[291]:
array([[1, 2, 2, 2, 0],
[0, 1, 1, 1, 1]])
In [292]: mask_from_start_indices_genericaxis(a, mask_indices, axis=1)
Out[292]:
array([[[False, False, False, False, True],
[ True, False, False, False, True],
[ True, True, True, True, True]],
[[ True, False, False, False, False],
[ True, True, True, True, True],
[ True, True, True, True, True]]])
Indices setup and masking along axis=2 -
In [293]: mask_indices = np.argmax(a, axis=2)
In [294]: mask_indices
Out[294]:
array([[4, 0, 2],
[3, 1, 3]])
In [295]: mask_from_start_indices_genericaxis(a, mask_indices, axis=2)
Out[295]:
array([[[False, False, False, False, True],
[ True, True, True, True, True],
[False, False, True, True, True]],
[[False, False, False, True, True],
[False, True, True, True, True],
[False, False, False, True, True]]])
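mask_from_start_indices_genericaxis assumes a non-negative axis: with axis=-1 on a 3D array, for instance, the reshape appends three trailing 1s and np.insert puts the new length-1 axis in the wrong place, so the comparison broadcasts to a wrongly shaped result. One way to also accept negative axes is to normalize the axis first; the sketch below does that and uses np.expand_dims in place of the reshape/np.insert step. The function name is hypothetical:

```python
import numpy as np

def mask_from_start_indices_anyaxis(a, mask_indices, axis):
    """Variant of mask_from_start_indices_genericaxis that also accepts a negative axis."""
    axis = axis % a.ndim                                  # e.g. -1 -> a.ndim - 1
    # positions along `axis`, shaped to broadcast against the trailing axes
    r = np.arange(a.shape[axis]).reshape((-1,) + (1,) * (a.ndim - axis - 1))
    # re-insert the reduced axis as length 1 so the shapes line up
    mask_indices_nd = np.expand_dims(mask_indices, axis)
    return mask_indices_nd <= r

np.random.seed(0)
a = np.random.randint(10, size=(2, 3, 5))
mask_last = mask_from_start_indices_anyaxis(a, np.argmax(a, axis=-1), axis=-1)
```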
Other scenarios
A. Extending to given end/stop-indices for masking
To extend the solutions for cases when we are given end/stop-indices for masking, i.e. we are looking to vectorize mask[r, :m] = True, we just need to edit the last step of comparison in the posted solutions to the following -
return mask_indices_nd>r
B. Outputting an integer array
In some cases we might want an int array instead. Since each bool element occupies one byte, we can simply view the output as a 1-byte integer type. Hence, if out is the output of the posted solutions, we can simply do out.view('i1') or out.view('u1') for int8 and uint8 dtype outputs, respectively.
For other datatypes, we would need to use .astype() for dtype conversions.
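A small illustration of the difference between viewing and converting; the array out here is made up for the example:

```python
import numpy as np

out = np.array([[False, True, True],
                [True, True, True]])

as_int8 = out.view('i1')          # reinterpret the 1-byte bools, no copy
as_uint8 = out.view('u1')
as_int64 = out.astype(np.int64)   # wider dtypes need a real conversion (a copy)

assert np.shares_memory(as_int8, out)
assert not np.shares_memory(as_int64, out)
```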
C. For index-inclusive masking for stop-indices
For index-inclusive masking, i.e. when the stop index itself is to be included in the mask, we simply include the equality in the comparison. Hence, the last step would be -
return mask_indices_nd>=r
D. For index-exclusive masking for start-indices
This is the case when start indices are given but those indices are not to be masked; masking starts only from the next element onwards until the end. By the same reasoning as in the previous section, the last step for this case becomes -
return mask_indices_nd<r
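To see how the four comparison variants relate to slicing, here they are on a single 1D row with a made-up index idx = 2; r plays the same role as in the solutions above, and each comparison is equivalent to the slice noted in its comment:

```python
import numpy as np

r = np.arange(6)   # positions along the masked axis
idx = 2            # a single start/stop index, for illustration only

start_inclusive = idx <= r   # mask[idx:] = True      (I and II)
stop_exclusive = idx > r     # mask[:idx] = True      (A)
stop_inclusive = idx >= r    # mask[:idx + 1] = True  (C)
start_exclusive = idx < r    # mask[idx + 1:] = True  (D)

print(start_inclusive.astype(int))   # [0 0 1 1 1 1]
print(stop_exclusive.astype(int))    # [1 1 0 0 0 0]
print(stop_inclusive.astype(int))    # [1 1 1 0 0 0]
print(start_exclusive.astype(int))   # [0 0 0 1 1 1]
```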
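Alternatively, put a 1 at each start index and take a cumulative sum along each row; every position at or after the 1 becomes nonzero, i.e. True:
>>> az = np.zeros(a.shape)
>>> az[np.arange(az.shape[0]), mask_indices] = 1
>>> az.cumsum(axis=1).astype(bool)  # for the N-D case, take the cumsum along the masked axis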
array([[ True, True, True, True, True, True],
[False, False, False, False, False, True],
[ True, True, True, True, True, True],
[False, True, True, True, True, True],
[False, False, False, True, True, True]])
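The cumsum idea can be extended to N-D arrays and an arbitrary axis by writing the 1s with np.put_along_axis instead of the arange-based fancy index. A sketch, where mask_by_cumsum is a hypothetical name:

```python
import numpy as np

def mask_by_cumsum(a, mask_indices, axis=-1):
    """Start-index mask via cumulative sum, generalized with np.put_along_axis."""
    marks = np.zeros(a.shape, dtype=np.int8)
    # put a single 1 at each start index along `axis`
    np.put_along_axis(marks, np.expand_dims(mask_indices, axis), 1, axis=axis)
    # the running sum is nonzero at and after that 1
    return marks.cumsum(axis=axis).astype(bool)

a = np.array([[6, 5, 0, 2, 1, 2],
              [8, 1, 3, 7, 1, 9],
              [8, 7, 6, 7, 3, 6],
              [2, 7, 0, 3, 1, 7],
              [5, 4, 0, 7, 6, 0]])
mask = mask_by_cumsum(a, np.argmax(a, axis=1), axis=1)
```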
I have a numpy array with booleans:
bool_array.shape
Out[84]: (78, 8)
bool_array.dtype
Out[85]: dtype('bool')
And I would like to find the row indices where the second dimension has a True value:
bool_array[30:35]
Out[87]:
array([[False, False, False, False, True, False, False, False],
[ True, False, False, False, True, False, False, False],
[False, False, False, False, False, True, False, False],
[ True, False, False, False, False, False, False, False],
[ True, False, False, False, False, False, False, False]], dtype=bool)
I have been using numpy.where to do this, but some rows have more than one True value along the second dimension.
I would like to find a way to obtain the same result as numpy.where while avoiding two indices from the same row:
np.where(bool_array)[0][30:35]
Out[88]: array([30, 31, 31, 32, 33])
I currently solve this by looping over the results of numpy.where, finding where index n is equal to index n-1, and using numpy.delete to remove the unwanted indices.
I would like to know if there is a more direct way to obtain the kind of result that I want.
Notes:
The rows of the boolean arrays that I use always have at least one True value.
I don't care which one of the multiple True values remains; I only care to have just one.
IIUC, and given that there is at least one True element per row, you can simply use np.argmax along the second axis to select the first True element of each row, like so -
col_idx = bool_array.argmax(1)
Sample run -
In [246]: bool_array
Out[246]:
array([[ True, True, True, True, False],
[False, False, True, True, False],
[ True, True, False, False, True],
[ True, True, False, False, True]], dtype=bool)
In [247]: np.where(bool_array)[0]
Out[247]: array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
In [248]: np.where(bool_array)[1]
Out[248]: array([0, 1, 2, 3, 2, 3, 0, 1, 4, 0, 1, 4])
In [249]: bool_array.argmax(1)
Out[249]: array([0, 2, 0, 0])
Explanation -
Corresponding to the duplicates in the output of np.where(bool_array)[0], i.e.:
array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
we need to select any one of the matching entries from the output of np.where(bool_array)[1], i.e.:
array([0, 1, 2, 3, 2, 3, 0, 1, 4, 0, 1, 4])
       ^           ^     ^        ^
Thus, selecting the first True from each row with bool_array.argmax(1) gives us:
array([0, 2, 0, 0])
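If you want to stay with the np.where output, np.unique with return_index=True applied to the row indices gives the position of each row's first hit, and indexing the column array there reproduces argmax. One difference: this simply omits rows that contain no True at all, whereas argmax would report column 0 for such a row. A sketch on the sample bool_array:

```python
import numpy as np

bool_array = np.array([[ True,  True,  True,  True, False],
                       [False, False,  True,  True, False],
                       [ True,  True, False, False,  True],
                       [ True,  True, False, False,  True]])

rows, cols = np.where(bool_array)
# np.where lists hits in row-major order, so the first occurrence of each
# row index carries the smallest True column of that row.
unique_rows, first = np.unique(rows, return_index=True)
print(unique_rows)    # [0 1 2 3]
print(cols[first])    # [0 2 0 0]
assert np.array_equal(cols[first], bool_array.argmax(1))
```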
You could call np.unique on the row indices returned by np.where to remove the duplicates, like so:
>>> np.where(bool_array)[0][30:35]
Out[4]: array([0, 1, 1, 2, 3, 4])
>>> np.unique(np.where(bool_array)[0][30:35])
Out[5]: array([0, 1, 2, 3, 4])