Numpy compare array to multiple scalars at once - python

Suppose I have an array
a = np.array([1,2,3])
and I want to compare it to some scalar; this works fine like
a == 2 # [False, True, False]
Is there a way I can do such a comparison but with multiple scalars at once? The default behavior when comparing two arrays is to do an elementwise comparison, but instead I want each element of one array to be compared elementwise with the entire other array, like this:
scalars = np.array([1, 2])
some_function(a, scalars)
[[True, False, False],
[False, True, False]]
Obviously I can do this, e.g., with a for loop and then stacking, but is there any vectorized way to achieve the same result?

Outer product, except it's equality instead of product:
numpy.equal.outer(scalars, a)
or adjust the dimensions and perform a broadcasted comparison:
scalars[:, None] == a
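As a quick check, here is a minimal sketch (reusing the `a` and `scalars` from the question; the names `via_outer` and `via_broadcast` are only illustrative) suggesting that both spellings give the same (2, 3) boolean array:

```python
import numpy as np

a = np.array([1, 2, 3])
scalars = np.array([1, 2])

# ufunc.outer applies np.equal to every (scalar, element) pair
via_outer = np.equal.outer(scalars, a)

# scalars[:, None] has shape (2, 1); broadcasting it against a (shape (3,))
# yields a (2, 3) result, one row per scalar
via_broadcast = scalars[:, None] == a

print(via_outer)
print(np.array_equal(via_outer, via_broadcast))
```

Each row of the result corresponds to one entry of `scalars`, each column to one entry of `a`.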

Related

Is there a 2-D "where" in numpy?

This might seem an odd question, but it boils down to quite a simple operation that I can't find a numpy equivalent for. I've looked at np.where as well as many other operations but can't find anything that does this:
a = np.array([1,2,3])
b = np.array([1,2,3,4])
c = np.array([i<b for i in a])
The output is a 2-D array (3,4), of booleans comparing each value.
If you're asking how to get c without a loop, try this:
# make "a" a column vector
# > broadcasts to produce a len(a) x len(b) array
c = b > a[:, None]
c
array([[False, True, True, True],
[False, False, True, True],
[False, False, False, True]])
You can extend the approach in the other answer to get the values of a and b. Given a mask of
c = b > a[:, None]
You can extract the indices for each dimension using np.where or np.nonzero:
row, col = np.nonzero(c)
And use the indices to get the corresponding values:
ag = a[row]
bg = b[col]
Elements of a and b may be repeated in the result.
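Putting the mask and the index extraction together, a small self-contained sketch (the `pairs` name is only for illustration) might look like this:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])

# Boolean mask of shape (len(a), len(b)): True where b[j] > a[i]
c = b > a[:, None]

# Row and column indices of the True entries
row, col = np.nonzero(c)

# Values of a and b for each True entry; values repeat when an
# element takes part in more than one matching pair
ag = a[row]
bg = b[col]

pairs = list(zip(ag.tolist(), bg.tolist()))
print(pairs)
```

Here every pair `(ag[k], bg[k])` satisfies `bg[k] > ag[k]`.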

Get boolean array indicating which elements in an array belong to a list

This seems to be a simple question, but I have been struggling with errors for quite some time.
Imagine an array
a = np.array([2,3,4,5,6])
I want to test which elements in the array belong to another list
[2,3,6]
If I do
a in [2,3,6]
Python raises "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
In return, I would like to get a boolean array like
array([ True, True, False, False, True], dtype=bool)
Use np.isin to create a boolean mask, then use np.argwhere on this mask to find the indices of the elements that are True:
lst = [2, 3, 6]
m = np.isin(a, lst)
indices = np.argwhere(m)
# print(m)
array([ True, True, False, False, True])
# print(indices)
array([[0], [1], [4]])
import numpy as np
arr1 = np.array([2,3,4,5,6])
arr2 = np.array([2,3,6])
arr_result = [bool(a1 in arr2) for a1 in arr1]
print(arr_result)
I have used simple list-comprehension logic to do this.
Output:
[True,True,False,False,True]
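For comparison, here is a short sketch that runs the vectorized membership test and a plain Python loop side by side. The names `lst`, `mask`, `mask_loop`, and `selected` are illustrative, not taken from the question:

```python
import numpy as np

a = np.array([2, 3, 4, 5, 6])
lst = [2, 3, 6]

# Vectorized membership test: one boolean per element of a
mask = np.isin(a, lst)

# Equivalent pure-Python loop, shown only to cross-check the result
mask_loop = np.array([x in lst for x in a])

# The mask can be used directly for boolean indexing
selected = a[mask]

print(mask)
print(selected)
```

For larger arrays the `np.isin` version is likely to be the faster of the two, since the loop runs in Python.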

numpy multiple boolean index arrays

I have an array which I want to use boolean indexing on, with multiple index arrays, each producing a different array. Example:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
Should return something along the lines of:
[[2,3], [1]]
I assume that since the number of cells containing True can vary between masks, I cannot expect the result to reside in a 2d numpy array, but I'm still hoping for something more elegant than iterating over the masks and appending the result of indexing w by the i-th b mask to it.
Am I missing a better option?
Edit: The next step I want to do afterwards is to sum each of the arrays returned by w[b], returning a list of scalars. If that somehow makes the problem easier, I'd love to know as well.
Assuming you want a list of numpy arrays you can simply use a comprehension:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
[w[mask] for mask in b]
# [array([2, 3]), array([1])]
If your goal is just a sum of the masked values, you can use:
np.sum(w*b) # 6
or
np.sum(w*b, axis=1) # array([5, 1])
# or b @ w
…since False times your number will be 0 and therefore won't affect the sum.
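To make the per-mask sums concrete, the following sketch computes them three ways: by indexing with each mask, by multiplying and summing along axis 1, and with the matrix product `b @ w`. The variable names are illustrative only:

```python
import numpy as np

w = np.array([1, 2, 3])
b = np.array([[False, True, True], [True, False, False]])

# Index with each mask, then sum the selected values
per_mask_indexing = [int(w[mask].sum()) for mask in b]

# False acts as 0 and True as 1, so masked-out values drop out of the sum
per_mask_multiply = np.sum(w * b, axis=1)

# The same row-wise sums expressed as a matrix-vector product
per_mask_matmul = b @ w

print(per_mask_indexing)
print(per_mask_multiply)
print(per_mask_matmul)
```

The multiply and matmul forms avoid the Python-level loop, which may matter when there are many masks.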
Try this:
[w[x] for x in b]
Hope this helps.

How to compare two numpy arrays of strings with the "in" operator to get a boolean array using array broadcasting?

Python allows for a simple check if a string is contained in another string:
'ab' in 'abcd'
which evaluates to True.
Now take a numpy array of strings and you can do this:
import numpy as np
A0 = np.array(['z', 'u', 'w'],dtype=object)
A0[:,None] != A0
Resulting in a boolean array:
array([[False, True, True],
[ True, False, True],
[ True, True, False]], dtype=bool)
Let's now take another array:
A1 = np.array(['u_w', 'u_z', 'w_z'],dtype=object)
I want to check where a string of A0 is not contained in a string in A1, essentially creating unique combinations, but the following does not yield a boolean array, only a single boolean, regardless of how I write the indices:
A0[:,None] not in A1
I also tried using numpy.in1d and np.ndarray.__contains__ but those methods don't seem to do the trick either.
Performance is an issue here so I want to make full use of numpy's optimizations.
How do I achieve this?
EDIT:
I found it can be done like this:
fv = np.vectorize(lambda x,y: x not in y)
fv(A0[:,None],A1)
But as the numpy docs state:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
So this is the same as just looping over the array, and it would be nice to solve this without explicit or implicit for-loop.
We can convert to string dtype and then use one of those NumPy based string functions.
Thus, using np.char.count, one solution would be -
np.char.count(A1.astype(str),A0.astype(str)[:,None])==0
Alternative using np.char.find -
np.char.find(A1.astype(str),A0.astype(str)[:,None])==-1
One more using np.char.rfind -
np.char.rfind(A1.astype(str),A0.astype(str)[:,None])==-1
If we are converting one to str dtype, we can skip the conversion for the other array, as internally it would be done anyway. So, the last method could be simplified to -
np.char.rfind(A1.astype(str),A0[:,None])==-1
Sample run -
In [97]: A0
Out[97]: array(['z', 'u', 'w'], dtype=object)
In [98]: A1
Out[98]: array(['u_w', 'u_z', 'w_z', 'zz'], dtype=object)
In [99]: np.char.rfind(A1.astype(str),A0[:,None])==-1
Out[99]:
array([[ True, False, False, False],
[False, False, True, True],
[False, True, False, True]], dtype=bool)
# Loopy solution using np.vectorize for verification
In [100]: fv = np.vectorize(lambda x,y: x not in y)
In [102]: fv(A0[:,None],A1)
Out[102]:
array([[ True, False, False, False],
[False, False, True, True],
[False, True, False, True]], dtype=bool)
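A self-contained version of this check, using the three-element A1 from the question and converting both arrays to str explicitly (a conservative assumption, in case object arrays are not converted automatically in a given NumPy version), could be sketched as:

```python
import numpy as np

A0 = np.array(['z', 'u', 'w'], dtype=object)
A1 = np.array(['u_w', 'u_z', 'w_z'], dtype=object)

# Convert both object arrays to a fixed-width unicode dtype
A0s = A0.astype(str)
A1s = A1.astype(str)

# find returns -1 where the substring is absent; A0s[:, None] makes the
# comparison broadcast to shape (len(A0), len(A1))
not_contained = np.char.find(A1s, A0s[:, None]) == -1

# Loop-based reference result, used only for verification
fv = np.vectorize(lambda x, y: x not in y)
reference = fv(A0[:, None], A1)

print(not_contained)
```

Row i, column j is True when `A0[i]` does not occur anywhere in `A1[j]`.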

Combining logic statements AND in numpy array

What would be the way to select elements when two conditions are True in a matrix?
In R, it is basically possible to combine vectors of booleans.
So what I'm aiming for:
A = np.array([2,2,2,2,2])
A < 3 and A > 1 # A < 3 & A > 1 does not work either
Evals to:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
It should eval to:
array([True,True,True,True,True])
My workaround usually is to sum these boolean vectors and equate to 2, but there must be a better way. What is it?
You could just use &, e.g.:
x = np.arange(10)
(x<8) & (x>2)
gives
array([False, False, False, True, True, True, True, True, False, False], dtype=bool)
A few details:
This works because & is shorthand for the numpy ufunc bitwise_and, which for the bool type is the same as logical_and. That is, this could also be spelled out as bitwise_and(less(x,8), greater(x,2)).
You need the parentheses because in Python & has higher precedence than < and >.
and does not work because it is ambiguous for numpy arrays, so rather than guess, numpy raises the exception.
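The precedence point can be seen in a short sketch: without parentheses, Python groups the expression around `8 & x` and then chains the comparisons, which ends up calling bool() on an array. The names `ok` and `raised` are illustrative:

```python
import numpy as np

x = np.arange(10)

# Parenthesized: two boolean arrays combined elementwise
ok = (x < 8) & (x > 2)

# Unparenthesized: parsed as the chained comparison x < (8 & x) > 2,
# i.e. (x < (8 & x)) and ((8 & x) > 2). The implicit `and` needs a single
# truth value for an array, so NumPy raises ValueError.
try:
    x < 8 & x > 2
    raised = False
except ValueError:
    raised = True

print(ok)
print(raised)
```

`np.logical_and(x < 8, x > 2)` gives the same result as the parenthesized form and sidesteps the precedence question.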
There's a function for that:
In [8]: np.logical_and(A < 3, A > 1)
Out[8]: array([ True, True, True, True, True], dtype=bool)
Since you can't override the and operator in Python it always tries to cast its arguments to bool. That's why the code you have gives an error.
Numpy has defined the __and__ function for arrays which overrides the & operator. That's what the other answer is using.
While this is primitive, what is wrong with
A = [2, 2, 2, 2, 2]
b = []
for x in A:
    # compare the element itself, not A[x]
    b.append(x > 1 and x < 3)
print(b)
The output is [True, True, True, True, True]
