indices of NumPy arrays intersection - python

I have two NumPy arrays. For example:
arr1 = np.array(['a','b','a','c','c','b','a','d'])
arr2 = np.array(['a','b','c','d'])
My task is to create a list that gives, for each element of arr1, the index of the equal element in arr2.
The length of the desired list should be equal to len(arr1). For instance, in my case the correct answer is [0,1,0,2,2,1,0,3].
Is there a short way to do this? Is it possible to use a list comprehension here?

I noticed that arr2 is sorted. Is that by design? If so you can do:
arr1 = np.array(['a','b','a','c','c','b','a','d'])
arr2 = np.array(['a','b','c','d'])
arr2.searchsorted(arr1)
# array([0, 1, 0, 2, 2, 1, 0, 3])
As @JAB has mentioned, you could use the sorter keyword to searchsorted when arr2 is not sorted:
arr2 = np.array(['d', 'c', 'b', 'a'])
sorter = arr2.argsort()
sorter[arr2.searchsorted(arr1, sorter=sorter)]
# array([3, 2, 3, 1, 1, 2, 3, 0])
This is an O(N*log(N)) method because of the argsort, but it should still be very fast for many use-cases.
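One caveat with the searchsorted approach is that it silently returns an insertion point for values that do not occur in arr2. A minimal sketch of a wrapper that checks for this, assuming you want an error in that case (the helper name index_in is hypothetical, not a NumPy function):

```python
import numpy as np

def index_in(values, reference):
    """For each element of `values`, return its index in `reference`.

    Hypothetical helper: raises ValueError if a value is missing.
    """
    values = np.asarray(values)
    reference = np.asarray(reference)
    sorter = np.argsort(reference)
    # Positions within the sorted view of `reference`.
    pos = np.searchsorted(reference, values, sorter=sorter)
    # searchsorted can return len(reference) for values past the end,
    # so clip before using the positions as indices.
    pos = np.clip(pos, 0, len(reference) - 1)
    idx = sorter[pos]
    # A value that is absent maps to some neighbour; detect that here.
    if not np.array_equal(reference[idx], values):
        raise ValueError("some values are not present in reference")
    return idx

arr1 = np.array(['a', 'b', 'a', 'c', 'c', 'b', 'a', 'd'])
arr2 = np.array(['d', 'c', 'b', 'a'])
result = index_in(arr1, arr2)
print(result)  # [3 2 3 1 1 2 3 0]
```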

Not sure if numpy has a method for this, but here is a builtin approach, which takes O(N) in time:
In [9]: lookup = {v:i for i, v in enumerate(arr2)}
In [10]: [lookup[v] for v in arr1]
Out[10]: [0, 1, 0, 2, 2, 1, 0, 3]
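If a NumPy array is wanted rather than a list, the same dictionary lookup can feed np.fromiter. This is only a sketch; the -1 marker for values missing from arr2 is an assumption of this example, not something the question asks for:

```python
import numpy as np

arr1 = np.array(['a', 'b', 'a', 'c', 'c', 'b', 'a', 'd'])
arr2 = np.array(['a', 'b', 'c', 'd'])

# Map each value of arr2 to its position (for duplicates the last one wins).
lookup = {v: i for i, v in enumerate(arr2.tolist())}

# Build an integer array directly; -1 marks values that are not in arr2.
indices = np.fromiter(
    (lookup.get(v, -1) for v in arr1.tolist()),
    dtype=np.intp,
    count=len(arr1),
)
print(indices)  # [0 1 0 2 2 1 0 3]
```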

You can do it like this with NumPy using broadcasting. However, if your arrays are large you can end up allocating a lot of memory for the intermediate result:
>>> import numpy as np
>>> arr1, arr2 = np.array(['a','b','a','c','c','b','a','d']), np.array(['a','b','c','d'])
>>> arr1 == arr2[:, None]
array([[ True, False,  True, False, False, False,  True, False],
       [False,  True, False, False, False,  True, False, False],
       [False, False, False,  True,  True, False, False, False],
       [False, False, False, False, False, False, False,  True]], dtype=bool)
>>> (arr1 == arr2[:, None]).argmax(axis=0)
array([0, 1, 0, 2, 2, 1, 0, 3])
>>>
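Otherwise keep an eye on arraysetops in case someone adds a return_index parameter to intersect1d. (NumPy 1.15 and later do accept return_indices=True there, but it reports only the first occurrence of each common value, not one index per element of arr1.)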
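Note that argmax returns 0 for a column that contains no True at all, so a value of arr1 that is missing from arr2 would look like a match with arr2[0]. A small sketch of how this could be detected, assuming -1 is an acceptable marker for "not found" (the 'z' entry is made up for illustration):

```python
import numpy as np

arr1 = np.array(['a', 'b', 'z'])          # 'z' does not occur in arr2
arr2 = np.array(['a', 'b', 'c', 'd'])

matches = arr1 == arr2[:, None]           # shape (len(arr2), len(arr1))
idx = matches.argmax(axis=0)              # 0 for columns without any True
found = matches.any(axis=0)               # True where a match really exists
idx_checked = np.where(found, idx, -1)    # replace false hits with -1

print(idx)          # [0 1 0]
print(idx_checked)  # [ 0  1 -1]
```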

Related

Get matrix entries based on upper and lower bound vectors?

So let's say I have a matrix mat = [[1,2,3,4,5,6],[1,2,3,4,5,6],[1,2,3,4,5,6]]
and a lower bound vector vector_low = [2.1,1.9,1.7] and upper bound vector vector_up = [3.1,3.5,4.1].
How do I get the values in the matrix in between the upper and lower bounds for every row?
Expected Output:
[[3],[2,3],[2,3,4]] (it's a list @mozway)
alternatively a vector with all of them would also do...
(Extra question: get the values of the matrix that are between the upper and lower bound, but rounded down/up to the next value in the matrix.
Expected Output:
[[2,3,4],[1,2,3,4],[1,2,3,4,5]])
There should be a fast solution without loop, hope someone can help, thanks!
PS: In the end I just want to sum over the list entries, so the output format is not important...
I probably shouldn't indulge you since you haven't provided the code I asked for, but to satisfy my own curiosity, here are my solution(s).
Your lists:
In [72]: alist = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]]
In [73]: low = [2.1,1.9,1.7]; up = [3.1,3.5,4.1]
A utility function:
In [74]: def between(row, l, u):
    ...:     return [i for i in row if l <= i <= u]
and the straightforward list comprehension solution - VERY PYTHONIC:
In [75]: [between(row, l, u) for row, l, u in zip(alist, low, up)]
Out[75]: [[3], [2, 3], [2, 3, 4]]
A numpy solution requires starting with arrays:
In [76]: arr = np.array(alist)
In [77]: Low = np.array(low)
...: Up = np.array(up)
We can check the bounds with:
In [79]: Low[:, None] <= arr
Out[79]:
array([[False, False,  True,  True,  True,  True],
       [False,  True,  True,  True,  True,  True],
       [False,  True,  True,  True,  True,  True]])
In [80]: (Low[:, None] <= arr) & (Up[:,None] >= arr)
Out[80]:
array([[False, False,  True, False, False, False],
       [False,  True,  True, False, False, False],
       [False,  True,  True,  True, False, False]])
Applying the mask to index arr produces a flat array of values:
In [81]: arr[_]
Out[81]: array([3, 2, 3, 2, 3, 4])
To get values by row, we still have to iterate:
In [82]: [row[mask] for row, mask in zip(arr, Out[80])]
Out[82]: [array([3]), array([2, 3]), array([2, 3, 4])]
For the small case I expect the list approach to be faster. For larger cases [81] will do better - IF we already have arrays. Creating arrays from the lists is not a time-trivial task.
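Since the question says that only the sum of the selected entries matters, the per-row iteration can be skipped. A minimal sketch, assuming the same data as above (the variable names are this example's own): zero out the entries outside the bounds and sum along each row.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5, 6],
                [1, 2, 3, 4, 5, 6],
                [1, 2, 3, 4, 5, 6]])
low = np.array([2.1, 1.9, 1.7])
up = np.array([3.1, 3.5, 4.1])

# One bound per row, broadcast against the columns.
mask = (low[:, None] <= arr) & (arr <= up[:, None])

# Entries outside the bounds contribute 0 to the sum.
row_sums = np.where(mask, arr, 0).sum(axis=1)
print(row_sums)         # [3 5 9]
print(arr[mask].sum())  # 17, the total over all rows
```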

ValueError with union of two arrays using the OR operator

I am using Python and numpy, where I have a couple of numpy arrays of the same shape, and I am trying to create a union of these arrays. These arrays contain only 0 and 1, and basically I want to merge them into a new array using the OR operation. So, I do the following:
import numpy as np
segs = list()
a = np.ones((10, 10)).astype('uint8')
b = np.zeros((10, 10)).astype('uint8')
segs.append(a)
segs.append(b)
mask = np.asarray([any(tup) for tup in zip(*segs)]).astype('uint8')
With the last statement I get the error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
If I use np.any, somehow my array shape is now just (10,). How can I create this merge without explicitly looping through the arrays?
EDIT
mask = np.asarray([any(tup) for tup in zip(segs)]).astype('uint8')
also results in the same error.
Your segs is a list of 2 arrays:
In [25]: segs = [np.ones((3,6),'uint8'), np.zeros((3,6),'uint8')]
In [26]: [tup for tup in zip(*segs)]
Out[26]:
[(array([1, 1, 1, 1, 1, 1], dtype=uint8),
  array([0, 0, 0, 0, 0, 0], dtype=uint8)),
 (array([1, 1, 1, 1, 1, 1], dtype=uint8),
  array([0, 0, 0, 0, 0, 0], dtype=uint8)),
 (array([1, 1, 1, 1, 1, 1], dtype=uint8),
  array([0, 0, 0, 0, 0, 0], dtype=uint8))]
The zip produces tuples of 1d arrays (pairing rows of the two arrays). Python any applied to arrays gives the ambiguity error - that's true for other logical Python operations like if, or, etc, which expect a scalar True/False.
You tried np.any - that turns the tuple of arrays into a 2d array. But without an axis parameter it works on the flattened version, returning a scalar True/False. With an axis parameter we can apply this any across rows:
In [27]: [np.any(tup, axis=0) for tup in zip(*segs)]
Out[27]:
[array([ True, True, True, True, True, True]),
array([ True, True, True, True, True, True]),
array([ True, True, True, True, True, True])]
Using the logical_or ufunc as suggested in a comment:
In [31]: np.logical_or(segs[0],segs[1])
Out[31]:
array([[ True, True, True, True, True, True],
[ True, True, True, True, True, True],
[ True, True, True, True, True, True]])
In [32]: np.logical_or.reduce(segs)
Out[32]:
array([[ True, True, True, True, True, True],
[ True, True, True, True, True, True],
[ True, True, True, True, True, True]])
Using the '|' operator isn't quite the same:
In [33]: segs[0] | segs[1]
Out[33]:
array([[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1]], dtype=uint8)
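It uses the segs[0].__or__(segs[1]) method, which for NumPy arrays is the element-wise bitwise OR (np.bitwise_or). Applied to uint8 (or other integer values) it ORs the bits and keeps the integer dtype; applied to bool it behaves like logical_or. For arrays that hold only 0 and 1 the result happens to equal the element-wise max.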
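Putting this together for the original (10, 10) arrays, here is a sketch of three loop-free ways to build the uint8 mask. The variable names are this example's own; all three should agree as long as the arrays contain only 0 and 1.

```python
import numpy as np

segs = [np.ones((10, 10), dtype='uint8'),
        np.zeros((10, 10), dtype='uint8')]

# np.any stacks the list into shape (2, 10, 10) and reduces over axis 0.
mask_any = np.any(segs, axis=0).astype('uint8')

# The same reduction spelled with the logical_or ufunc.
mask_or = np.logical_or.reduce(segs).astype('uint8')

# Bitwise OR keeps the uint8 dtype, so no cast is needed.
mask_bit = np.bitwise_or.reduce(segs)

print(mask_any.shape, mask_any.dtype)  # (10, 10) uint8
```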

How to obtain the same result as numpy.where over a 2D array without getting 2 indices from the same row

I have a numpy array with booleans:
bool_array.shape
Out[84]: (78, 8)
bool_array.dtype
Out[85]: dtype('bool')
And I would like to find the indices where the second dimension is True:
bool_array[30:35]
Out[87]:
array([[False, False, False, False,  True, False, False, False],
       [ True, False, False, False,  True, False, False, False],
       [False, False, False, False, False,  True, False, False],
       [ True, False, False, False, False, False, False, False],
       [ True, False, False, False, False, False, False, False]], dtype=bool)
I have been using numpy.where to do this, but sometimes more than one index along the second dimension has the value True.
I would like to find a way to obtain the same result as numpy.where without getting 2 indices from the same row:
np.where(bool_array)[0][30:35]
Out[88]: array([30, 31, 31, 32, 33])
I currently solve this by looping over the results of numpy.where, finding which indices n are equal to index n-1, and using numpy.delete to remove the unwanted indices.
I would like to know if there is a more direct way to obtain the kind of results that I want.
Notes:
The rows of the boolean arrays that I use always have at least 1 True value.
I don't care which one of the multiple True values remains; I only care to have just 1.
IIUC and given the fact that there is at least one TRUE element per row, you can simply use np.argmax along the second axis to select the first TRUE element along each row, like so -
col_idx = bool_array.argmax(1)
Sample run -
In [246]: bool_array
Out[246]:
array([[ True,  True,  True,  True, False],
       [False, False,  True,  True, False],
       [ True,  True, False, False,  True],
       [ True,  True, False, False,  True]], dtype=bool)
In [247]: np.where(bool_array)[0]
Out[247]: array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
In [248]: np.where(bool_array)[1]
Out[248]: array([0, 1, 2, 3, 2, 3, 0, 1, 4, 0, 1, 4])
In [249]: bool_array.argmax(1)
Out[249]: array([0, 2, 0, 0])
Explanation -
Corresponding to the duplicates from the output of np.where(bool_array)[0], i.e. :
array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
, we need to select any one per row from the output of np.where(bool_array)[1], i.e. :
array([0, 1, 2, 3, 2, 3, 0, 1, 4, 0, 1, 4])
       ^           ^     ^        ^
Thus, selecting the first True from each row with bool_array.argmax(1) gives us :
array([0, 2, 0, 0])
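To get output in the same (rows, cols) form that np.where returns, the argmax result can be paired with a plain row range. A minimal sketch using the sample array above; the any(axis=1) filter is an extra safeguard assumed here for rows without a True, which the question says never occur:

```python
import numpy as np

bool_array = np.array([[True, True, True, True, False],
                       [False, False, True, True, False],
                       [True, True, False, False, True],
                       [True, True, False, False, True]])

rows = np.arange(bool_array.shape[0])
cols = bool_array.argmax(axis=1)     # first True in each row

# argmax yields 0 for an all-False row, so keep only rows with a True.
has_true = bool_array.any(axis=1)
rows, cols = rows[has_true], cols[has_true]

print(rows)  # [0 1 2 3]
print(cols)  # [0 2 0 0]
```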
You could call np.unique on the resultant array like so:
>>> np.where(bool_array)[0][30:35]
Out[4]: array([0, 1, 1, 2, 3, 4])
>>> np.unique(np.where(bool_array)[0][30:35])
Out[5]: array([0, 1, 2, 3, 4])

Built-in function in numpy to interpret an integer to an array of boolean values in a bitwise manner?

I'm wondering if there is a simple, built-in function in Python / Numpy for converting an integer datatype to an array/list of booleans, corresponding to a bitwise interpretation of the number please?
e.g:
x = 5 # i.e. 101 in binary
print(FUNCTION(x))
and then I'd like returned:
[True, False, True]
or ideally, with padding to always return 8 boolean values (i.e. one full byte):
[False, False, False, False, False, True, False, True]
Thanks
You can use numpy's unpackbits.
From the docs (http://docs.scipy.org/doc/numpy/reference/generated/numpy.unpackbits.html)
>>> a = np.array([[2], [7], [23]], dtype=np.uint8)
>>> a
array([[ 2],
[ 7],
[23]], dtype=uint8)
>>> b = np.unpackbits(a, axis=1)
>>> b
array([[0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 1, 0, 1, 1, 1]], dtype=uint8)
To get to a bool array:
In [49]: np.unpackbits(np.array([1],dtype="uint8")).astype("bool")
Out[49]: array([False, False, False, False, False, False, False, True], dtype=bool)
Not a built in method, but something to get you going (and fun to write)
>>> def int_to_binary_bool(num):
...     return [bool(int(i)) for i in "{0:08b}".format(num)]
>>> int_to_binary_bool(5)
[False, False, False, False, False, True, False, True]
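np.unpackbits only accepts uint8 input, so for other widths one option is to shift and mask. This is a sketch rather than a built-in; the function name int_to_bools and its width parameter are made up for this example, and it assumes a non-negative integer that fits in 64 bits.

```python
import numpy as np

def int_to_bools(x, width=8):
    """Return the `width` lowest bits of x as booleans, most significant first."""
    shifts = np.arange(width - 1, -1, -1)   # width-1, ..., 1, 0
    return ((x >> shifts) & 1).astype(bool)

print(int_to_bools(5))           # [False False False False False  True False  True]
print(int_to_bools(5, width=3))  # [ True False  True]

# For a single byte this should agree with np.unpackbits.
same = np.array_equal(
    int_to_bools(5),
    np.unpackbits(np.array([5], dtype=np.uint8)).astype(bool),
)
```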

How to make this kind of equality array fast (in numpy)?

I have two 2-dimensional numpy arrays, e.g.
a1 = array([["a","b"],["a","c"],["b","b"],["a","b"]])
a2 = array([["a","b"],["b","b"],["c","a"],["a","c"]])
What is the most elegant way of getting a matrix like this:
array([[1,0,0,0],
[0,0,0,1],
[0,1,0,0],
[1,0,0,0]])
Where element (i,j) is 1 if all(a1[i,:] == a2[j,:]) and otherwise 0
(everything involving two for loops I don't consider elegant)
>>> (a1[:,numpy.newaxis] == a2).all(axis=2)
array([[ True, False, False, False],
       [False, False, False,  True],
       [False,  True, False, False],
       [ True, False, False, False]], dtype=bool)
If you really need integers, convert to int as last step:
>>> (a1[:,numpy.newaxis] == a2).all(axis=2).astype(int)
array([[1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 1, 0, 0],
       [1, 0, 0, 0]])
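For reference, here is the same idea as a self-contained script with the intermediate shapes spelled out. The explicit np.newaxis on a2 is optional and only added here to make the broadcasting visible:

```python
import numpy as np

a1 = np.array([["a", "b"], ["a", "c"], ["b", "b"], ["a", "b"]])
a2 = np.array([["a", "b"], ["b", "b"], ["c", "a"], ["a", "c"]])

# (4, 1, 2) against (1, 4, 2) broadcasts to (4, 4, 2):
# entry [i, j, k] compares a1[i, k] with a2[j, k].
pairwise = a1[:, np.newaxis, :] == a2[np.newaxis, :, :]

# Rows i and j are equal only if every column k matches.
equal_rows = pairwise.all(axis=2).astype(int)
print(equal_rows)
```

The intermediate array has len(a1) * len(a2) * ncols elements, so for large inputs memory use grows quickly.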
