Numpy 2D indexing of a 1D array with known min, max indices

I have a 1D numpy array of False booleans, and a 2D numpy array containing the min,max indices of values in the first array to change to True.
An example:
my_data = numpy.zeros((10,), dtype=bool)
inds2true = numpy.array([[1, 3], [8, 9]])
And I want the following result:
out = numpy.array([False, True, True, True, False, False, False, False, True, True])
How is this possible in Python with Numpy?
Edit: I would like this to be performed in one step (i.e. no looping).

There's one rule-breaking hack:
my_data[inds2true] = True
my_data = np.cumsum(my_data) % 2 == 1
my_data
>>> array([False, True, True, False, False, False, False, False, True, False])
The most common practice is to change the indices within np.arange(1, 3) and np.arange(8, 9), not including 3 or 9. If you still want to include them, additionally run: my_data[inds2true[:, 1]] = True
Note that this parity trick assumes the ranges do not overlap or share endpoints.
If you're looking for other ways to do it in one go, they will most probably involve np.cumsum tricks.
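One such np.cumsum trick, as a minimal sketch: mark +1 where each range starts and -1 just past where it ends, then take a running sum. It assumes each [min, max] pair is inclusive, as in the question; the names ranges_to_mask and delta are illustrative and do not come from the original posts.

```python
import numpy as np

def ranges_to_mask(length, ranges):
    """Return a boolean mask of `length` that is True inside every inclusive [lo, hi] range."""
    ranges = np.asarray(ranges)
    # One extra slot so that hi + 1 is a valid index when hi == length - 1.
    delta = np.zeros(length + 1, dtype=int)
    np.add.at(delta, ranges[:, 0], 1)        # a range opens at lo
    np.add.at(delta, ranges[:, 1] + 1, -1)   # and closes just after hi
    # The running sum counts how many ranges cover each position.
    return np.cumsum(delta[:-1]) > 0

mask = ranges_to_mask(10, [[1, 3], [8, 9]])
print(mask)
```

Because the running sum counts how many ranges cover each position, this variant should also tolerate overlapping ranges, which the parity-based hack does not.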

import numpy as np
my_data = np.zeros((10,), dtype=bool)
inds2true = np.array([[1, 3], [8, 9]])
indices = []
for ix_range in inds2true:
    # collect every index from min to max, inclusive
    indices += list(range(ix_range[0], ix_range[1] + 1))
my_data[indices] = True

Related

Numpy: Duplicate mask for an array (returning True if we've seen that value before, False otherwise)

I'm looking for a vectorized function that returns a mask with values of True if the value in the array has been seen before and False otherwise.
I'm looking for the fastest solution possible as speed is very important.
For example this is what I would like to see:
array = [1, 2, 1, 2, 3]
mask = [False, False, True, True, False]
So is_duplicate = array[mask] should return [1, 2].
Is there a fast, vectorized way to do this? Thanks!

Approach #1 : With sorting
def mask_firstocc(a):
    sidx = a.argsort(kind='stable')
    b = a[sidx]
    out = np.r_[False, b[:-1] == b[1:]][sidx.argsort()]
    return out
We can use array-assignment to boost perf. further -
def mask_firstocc_v2(a):
    sidx = a.argsort(kind='stable')
    b = a[sidx]
    mask = np.r_[False, b[:-1] == b[1:]]
    out = np.empty(len(a), dtype=bool)
    out[sidx] = mask
    return out
Sample run -
In [166]: a
Out[166]: array([2, 1, 1, 0, 0, 4, 0, 3])
In [167]: mask_firstocc(a)
Out[167]: array([False, False, True, False, True, False, True, False])
Approach #2 : With np.unique(..., return_index)
We can leverage np.unique with return_index=True, which returns the index of the first occurrence of each unique element, so a simple array assignment gives the mask -
def mask_firstocc_with_unique(a):
    mask = np.ones(len(a), dtype=bool)
    mask[np.unique(a, return_index=True)[1]] = False
    return mask

Use np.unique
a = np.array([1, 2, 1, 2, 3])
_, ix = np.unique(a, return_index=True)
b = np.full(a.shape, True)
b[ix] = False
In [45]: b
Out[45]: array([False, False, True, True, False])

You can achieve that using the enumerate function, which lets you loop over index and value together:
array = [1, 2, 1, 2, 3]
mask = []
for i, v in enumerate(array):
    # array.index(v) is the position of the first occurrence of v
    if array.index(v) == i:
        mask.append(False)
    else:
        mask.append(True)
print(mask)
Output:
[False, False, True, True, False]

Almost by definition, this can't be vectorized. The value of mask at any index depends on the values of array at every position between 0 and that index. There may be some algorithm where you expand array into an NxN matrix and do fancy tests, but you're still going to have an O(n^2) algorithm. The straightforward set-based algorithm is O(n) on average, because set lookups and insertions take constant expected time.
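The set algorithm mentioned above could look like the following minimal sketch in plain Python. The function name duplicate_mask is an illustrative choice, not something from the question, and the values are assumed to be hashable.

```python
def duplicate_mask(values):
    """Return a list that is True wherever the value has already been seen."""
    seen = set()
    mask = []
    for v in values:
        mask.append(v in seen)  # True only if v appeared at an earlier position
        seen.add(v)
    return mask

print(duplicate_mask([1, 2, 1, 2, 3]))
```

Each element is visited once, so the loop does a single pass over the input regardless of how many duplicates it contains.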

ndarray row-wise index of values greater than array

I have one array of shape (X, 5):
M = [[1,2,3,4,5],
[6,7,8,9,1],
[2,5,7,8,3]
...]
and one array of shape (X, 1):
n = [[3],
[7],
[100],
...]
Now I need to get the first index of M >= n for each row, or nan if there is no such index.
For example:
np.where(np.array([1,2,3,4,5]) >= 3)[0][0] # Returns 2
np.searchsorted(np.array([1,2,3,4,5]), 3) # Returns 2
These examples are applied to each row individually (I could loop X times as both arrays have the length X).
I wonder, is there a way to do it in a multidimensional way to get an output of all indices at once?
Something like:
np.where(M>=n)
Thank you
Edit: Values in M are unsorted, I'm still looking for the first index/occurrence fitting M >= n (so probably not searchsorted)

You could start by checking which elements of M are greater than or equal to n in each row, and use argmax to get the index of the first True per row. For the rows where all columns are False, we can use np.where to set the result to np.nan, for instance:
M = np.array([[1,2,3,4,5],
[6,7,8,9,1],
[2,5,7,8,3]])
n = np.array([[3],[7],[100]])
le = n<=M
# array([[False, False, True, True, True],
# [False, True, True, True, False],
# [False, False, False, False, False]])
lea = le.argmax(1)
has_any = le[np.arange(len(le)), lea]
np.where(has_any, lea, np.nan)
# array([ 2., 1., nan])
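Because np.nan forces the result to a float array, an integer sentinel may be more convenient when the indices are used for further indexing. The following is a small sketch of that variant; the function name first_index_at_least and the default sentinel of -1 are assumptions made for illustration.

```python
import numpy as np

def first_index_at_least(M, n, missing=-1):
    """Per row, index of the first element of M that is >= n, or `missing` if none."""
    hit = np.asarray(M) >= np.asarray(n)   # n of shape (X, 1) broadcasts across columns
    first = hit.argmax(axis=1)             # position of the first True (0 if the row has none)
    return np.where(hit.any(axis=1), first, missing)

M = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 1],
              [2, 5, 7, 8, 3]])
n = np.array([[3], [7], [100]])
print(first_index_at_least(M, n))
```

Keep in mind that -1 is itself a valid NumPy index (the last element), so callers should check for the sentinel before indexing with the result.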

Creating a "bitmask" from several boolean numpy arrays

I'm trying to convert several masks (boolean arrays) to a bitmask with numpy. While that works in principle, I feel that I'm doing too many operations.
For example to create the bitmask I use:
import numpy as np
flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False])
]
flag_bits = np.zeros(3, dtype=np.int8)
for idx, flag in enumerate(flags):
    flag_bits += flag.astype(np.int8) << idx  # equivalent to flag * 2 ** idx
Which gives me the expected "bitmask":
>>> flag_bits
array([1, 6, 0], dtype=int8)
>>> [np.binary_repr(bit, width=7) for bit in flag_bits]
['0000001', '0000110', '0000000']
However, I feel that the casting to int8 and the addition to the flag_bits array in particular are too complicated. Therefore I wanted to ask whether there is any NumPy functionality I missed that could be used to create such a "bitmask" array.
Note: I'm calling an external function that expects such a bitmask, otherwise I would stick with the boolean arrays.

>>> x = np.array([2**i for i in range(np.shape(flags)[1])])
>>> np.dot(flags, x)
array([1, 2, 2])
How it works: in a bit mask, every bit is effectively an original array element multiplied by a power of 2 according to its position, e.g. 2 = False * 1 + True * 2 + False * 4. Effectively this can be represented as matrix multiplication, which is really efficient in numpy.
So, the first line uses a list comprehension to create these weights: x = [1, 2, 4, 8, ... 2^(n-1)].
Then, each row in flags is multiplied element-wise by x and everything is summed up (this is how matrix multiplication works). At the end, we get one value per array in flags. To combine the arrays position by position, as in the question, put the weights on the left instead (with one weight per array in flags): np.dot(x, flags) gives array([1, 6, 0]).

How about this (added conversion to int8, if desired):
flag_bits = (np.transpose(flags) << np.arange(len(flags))).sum(axis=1)\
.astype(np.int8)
#array([1, 6, 0], dtype=int8)

Here's an approach to directly get to the string bitmask with boolean-indexing -
out = np.repeat('0000000',3).astype('S7')
out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T
Sample run -
In [41]: flags
Out[41]:
[array([ True, False, False], dtype=bool),
array([False, True, False], dtype=bool),
array([False, True, False], dtype=bool)]
In [42]: out = np.repeat('0000000',3).astype('S7')
In [43]: out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T
In [44]: out
Out[44]:
array([b'0000001', b'0000110', b'0000000'],
dtype='|S7')
Using the same matrix-multiplication strategy as discussed in detail in @Marat's solution, but using a vectorized scaling array that gives us flag_bits -
np.dot(2**np.arange(3),flags)
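Another option worth considering is np.packbits, which packs boolean values into bytes directly. The sketch below assumes at most 8 flag arrays and a NumPy version that supports the bitorder argument (1.17 or later); note that the result is uint8 rather than int8.

```python
import numpy as np

flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False]),
]

stacked = np.stack(flags)  # shape (n_flags, n_items)
# Pack along the flag axis; bitorder="little" makes flags[0] the least significant bit.
packed = np.packbits(stacked, axis=0, bitorder="little")
flag_bits = packed[0]      # with at most 8 flags there is a single output byte per item
print(flag_bits)
```

With more than 8 flags, packbits would produce additional rows (one per group of 8 flags), which would then have to be combined separately.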

Using numpy any() in bool array of arrays

I have a list of lists of bools, say l = [[False, False], [True, False]], and I need to convert l to a numpy array of arrays of booleans. I converted every sublist into a bool array, and the whole list to a numpy array too. My real list has 121 sublists, but np.any() returns just five results, not the 121 expected. My code is this:
>>> result = np.array([ np.array(extracted[aindices[i]:aindices[i + 1]]) for i in range(len(aux_regions)) ])
>>> np.any(result)
[false, false, false, false, false]
extracted[aindices[i]:aindices[i + 1]] is the sublist which I convert to a bool array. The list generated in the whole line is converted to array too.
For the first example l, the expected result (assuming the list has been converted) is one value per subarray: [False, True]
What is the problem with using np.any? Or are the data types of the converted list not the right ones?

If you have a list of list of bools, you could skip numpy and use a simple comprehension:
In [1]: l = [[False, False], [True, False]]
In [2]: [any(subl) for subl in l]
Out[2]: [False, True]
If the sublists are all the same length, you can pass the list directly to np.array to get a numpy array of bools:
In [3]: import numpy as np
In [4]: result = np.array(l)
In [5]: result
Out[5]:
array([[False, False],
[ True, False]], dtype=bool)
Then you can use the any method on axis 1 to get the result for each row:
In [6]: result.any(axis=1) # or `np.any(result, axis=1)`
Out[6]: array([False, True], dtype=bool)
If the sublists are not all the same length, then a numpy array might not be the best data structure for this problem.
This part of my answer should be considered a "side bar" to what I wrote above. If the sublists have variable lengths, the list comprehension given above is my recommendation. The following is an alternative that uses an advanced numpy feature. I only suggest it because it looks like you already have the data structures needed to use numpy's reduceat function. It works without having to explicitly form the list of lists.
From reading your code, I infer the following:
extracted is a list of bools. You are splitting this up into sublists.
aindices is a list of integers. Each consecutive pair of integers in aindices specifies a range in extracted that is a sublist.
len(aux_regions) is the number of sublists; I'll call this n. The length of aindices is n+1, and the last value in aindices is the length of extracted.
For example, if the data looks like this:
In [74]: extracted
Out[74]: [False, True, False, False, False, False, True, True, True, True, False, False]
In [75]: aindices
Out[75]: [0, 3, 7, 10, 12]
it means there are four sublists:
In [76]: extracted[0:3]
Out[76]: [False, True, False]
In [77]: extracted[3:7]
Out[77]: [False, False, False, True]
In [78]: extracted[7:10]
Out[78]: [True, True, True]
In [79]: extracted[10:12]
Out[79]: [False, False]
With these data structures, you are set up to use the reduceat feature of numpy. The ufunc in this case is logical_or. You can compute the result with this one line:
In [80]: np.logical_or.reduceat(extracted, aindices[:-1])
Out[80]: array([ True, True, True, False], dtype=bool)
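As a quick sanity check, the reduceat result can be compared against the plain list comprehension on the same data. This is only a sketch; the names via_reduceat and via_loop are illustrative.

```python
import numpy as np

extracted = [False, True, False, False, False, False,
             True, True, True, True, False, False]
aindices = [0, 3, 7, 10, 12]

# Vectorized: OR-reduce each segment that starts at aindices[i].
via_reduceat = np.logical_or.reduceat(np.asarray(extracted), aindices[:-1])

# Reference: apply any() to each explicit sublist.
via_loop = [any(extracted[a:b]) for a, b in zip(aindices[:-1], aindices[1:])]

print(via_reduceat.tolist() == via_loop)
```

One caveat: if two consecutive entries in aindices are equal (an empty sublist), reduceat returns the single element at that index rather than False, so the two approaches can disagree in that case.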

Creating a 2D python array to store data

Looking for a way to store this code in a 2D array in python. I've tried making a 1D array and then turning it into a 2D array but my code is still cumbersome and not working. The gap between 4 and 6 is not a typo. Any help would be greatly appreciated.
recno1inds11 = nonzero(data11[:,1]==no1)[0]
recno2inds11 = nonzero(data11[:,1]==no2)[0]
recno3inds11 = nonzero(data11[:,1]==no3)[0]
recno4inds11 = nonzero(data11[:,1]==no4)[0]
recno6inds11 = nonzero(data11[:,1]==no6)[0]
recno7inds11 = nonzero(data11[:,1]==no7)[0]
recno8inds11 = nonzero(data11[:,1]==no8)[0]
recno9inds11 = nonzero(data11[:,1]==no9)[0]
recno10inds11 = nonzero(data11[:,1]==no10)[0]
recno11inds11 = nonzero(data11[:,1]==no11)[0]
recno12inds11 = nonzero(data11[:,1]==no12)[0]
recno13inds11 = nonzero(data11[:,1]==no13)[0]
recno14inds11 = nonzero(data11[:,1]==no14)[0]
recno15inds11 = nonzero(data11[:,1]==no15)[0]
recno16inds11 = nonzero(data11[:,1]==no16)[0]
recno17inds11 = nonzero(data11[:,1]==no17)[0]
recno18inds11 = nonzero(data11[:,1]==no18)[0]
recno19inds11 = nonzero(data11[:,1]==no19)[0]
recno20inds11 = nonzero(data11[:,1]==no20)[0]
recno21inds11 = nonzero(data11[:,1]==no21)[0]
recno22inds11 = nonzero(data11[:,1]==no22)[0]
recno23inds11 = nonzero(data11[:,1]==no23)[0]
recno24inds11 = nonzero(data11[:,1]==no24)[0]
recno25inds11 = nonzero(data11[:,1]==no25)[0]
recno26inds11 = nonzero(data11[:,1]==no26)[0]
recno27inds11 = nonzero(data11[:,1]==no27)[0]
recno28inds11 = nonzero(data11[:,1]==no28)[0]
recno29inds11 = nonzero(data11[:,1]==no29)[0]
recno30inds11 = nonzero(data11[:,1]==no30)[0]

Normally, you don't want to have 30 separate variables like this; you want to have an array of 30 values.
And if you had that, this would be a one-liner; you would need to reshape the right-hand array onto a second axis, then use the == operator.
>>> data11 = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> data11[:,1]
array([2, 5, 8])
>>> no1to5 = np.array([1, 2, 3, 4, 5])
>>> data11[:,1] == no1to5.reshape((5,1))
array([[False, False, False],
[ True, False, False],
[False, False, False],
[False, False, False],
[False, True, False]], dtype=bool)
Of course you can also apply nonzero, grab the first axis, … whatever you want to do, you can vectorize it as long as you have a vector in the first place, instead of a big collection of separate values that are only related by the meta-information in the variable names you happen to have bound them to.
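To get back the per-value index arrays from the question, nonzero can be applied to each row of that comparison. The sketch below uses made-up data; rec_numbers is a hypothetical stand-in for the values held in no1, no2, and so on. Since each value may match a different number of rows, the results are kept in a list rather than a rectangular 2D array.

```python
import numpy as np

data11 = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [0, 5, 0]])
rec_numbers = np.array([2, 5, 8])  # hypothetical stand-ins for no1, no2, ...

# One row per record number, one column per row of data11.
matches = data11[:, 1] == rec_numbers[:, None]

# Row indices of data11 whose second column equals each record number.
inds = [np.nonzero(row)[0] for row in matches]
print(inds)
```

A dict keyed by record number, such as dict(zip(rec_numbers.tolist(), inds)), may be easier to work with if the numbers are not contiguous.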
