How to map numpy arrays to one another? - python

I have two boolean arrays, A and B, of the same shape; the shape is finite but arbitrarily large, and both it and the number of dimensions are only known at runtime.
I want to evaluate a boolean function of corresponding elements of A and B and store the results in C. At last I need a list of tuples of the indices where C is true.
How do I get there?
I don't want to iterate over the single elements, because I don't know how many dimensions there are; there must be a better way.
>>> A = array([True, False, True, False, True, False]).reshape(2,3)
>>> B = array([True, True, False, True, True, False]).reshape(2,3)
>>> A == B
array([[ True, False, False],
       [False,  True,  True]], dtype=bool)
as wanted, but:
>>> A and B
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
How do I get "A and B"?
I tried "map", "zip", "nditer" and searched for other methods unsuccessfully.
As for the list of tuples, I need something like "argmax" for booleans, but I didn't find anything for that either.
Do you know something that might help?

You can also use the & operator:
In [5]: A & B
array([[ True, False, False],
       [False,  True, False]], dtype=bool)
The big win with the logical_and call is that you can use the out parameter:
In [6]: C = empty_like(A)
In [7]: logical_and(A, B, C)
array([[ True, False, False],
       [False,  True, False]], dtype=bool)

Yes, there is a function in NumPy:
numpy.logical_and(A,B)
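For the second part of the question (a list of tuples of the indices where C is true), np.argwhere collects the indices row by row and works for any number of dimensions. A minimal sketch:

```python
import numpy as np

A = np.array([True, False, True, False, True, False]).reshape(2, 3)
B = np.array([True, True, False, True, True, False]).reshape(2, 3)

# elementwise boolean function of A and B
C = np.logical_and(A, B)

# indices where C is True, as a list of tuples, regardless of ndim
coords = [tuple(idx) for idx in np.argwhere(C)]
```

Here coords comes out as [(0, 0), (1, 1)]. np.nonzero(C) gives the same information as a tuple of per-axis index arrays, which can be more convenient for fancy indexing.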

Related

Check if each element in a numpy array is in a separate list

I'd like to do something like this:
>>> y = np.arange(5)
>>> y in (0, 1, 2)
array([True, True, True, False, False])
This syntax doesn't work. What's the best way to achieve the desired result?
(I'm looking for a general solution. Obviously in this specific case I could do y < 3.)
I'll spell this out a little more clearly for you guys, since at least a few people seem to be confused.
Here is a long way of getting my desired behavior:
new_y = np.empty_like(y, dtype=bool)
for i in range(len(y)):
    if y[i] in (0, 1, 2):
        new_y[i] = True
    else:
        new_y[i] = False
I'm looking for this behavior in a more compact form.
Here's another solution:
new_y = np.array([True if item in (0, 1, 2) else False for item in y])
Again, just looking for a simpler way.
A good general purpose tool is a broadcasted, or 'outer', comparison between elements of two arrays:
In [35]: y=np.arange(5)
In [36]: x=np.array([0,1,2])
In [37]: y[:,None]==x
Out[37]:
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True],
       [False, False, False],
       [False, False, False]])
This is doing a fast comparison between every element of y and every element of x. Depending on your needs, you can condense this array along one of the axes:
In [38]: (y[:,None]==x).any(axis=1)
Out[38]: array([ True, True, True, False, False])
A comment suggested in1d. I think it's a good idea to look at its code. It has several strategies depending on the relative sizes of the inputs.
In [40]: np.in1d(y,x)
Out[40]: array([ True, True, True, False, False])
In [41]: np.array([True if item in x else False for item in y])
Out[41]: array([ True, True, True, False, False])
Which is fastest may depend on the size of the inputs. If you are starting from lists, your list comprehension might be faster. This pure-list version is by far the fastest:
[True if item in (0,1,2) else False for item in (0,1,2,3,4)]
[item in (0,1,2) for item in (0,1,2,3,4)] # simpler
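In more recent NumPy versions, np.isin supersedes np.in1d and preserves the shape of its first argument, so it applies directly here. A short sketch:

```python
import numpy as np

y = np.arange(5)

# elementwise membership test; returns a boolean array shaped like y
mask = np.isin(y, (0, 1, 2))
```

mask comes out as array([ True, True, True, False, False]), matching the in1d result above.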

Finding False-True transitions in a numpy array

Given a numpy array:
x = np.array([False, True, True, False, False, False, False, False, True, False])
How do I find the number of times the values transitions from False to True?
For the above example, the answer would be 2. I don't want to include transitions from True to False in the count.
From the answers to How do I identify sequences of values in a boolean array?, the following produces the indices at which the values are about to change, which is not what I want as this includes True-False transitions.
np.argwhere(np.diff(x)).squeeze()
# [0 2 7 8]
I know that this can be done by looping through the array, however I was wondering if there was a faster way to do this?
Take one-off slices: x[:-1] (from the first element through the second-to-last) and x[1:] (from the second element to the end). Then look for places where the first slice is less than the second, i.e. catch the pattern [False, True], and finally get the count with ndarray.sum() or np.count_nonzero() -
(x[:-1] < x[1:]).sum()
np.count_nonzero(x[:-1] < x[1:])
Another way would be to look for the first slice being False and the second one as True, the idea again being to catch that pattern of [False, True] -
(~x[:-1] & x[1:]).sum()
np.count_nonzero(~x[:-1] & x[1:])
I kind of like to use the numpy function roll for this kind of problem.
roll rotates the array to the left by some step length (-1, -2, ...) or to the right (1, 2, ...):
import numpy as np
np.roll(x, -1)
This gives x shifted one step to the left:
array([ True,  True, False, False, False, False, False,  True, False, False], dtype=bool)
A False followed by a True can then be expressed as:
~x & np.roll(x,-1)
array([ True, False, False, False, False, False, False,  True, False, False], dtype=bool)
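One caveat with the roll variant, worth noting as a sketch: np.roll wraps around, so the last position gets paired with x[0]. If the array ended in False and began with True, that wrapped pair would count as a spurious transition, so it is safest to mask it out before counting:

```python
import numpy as np

x = np.array([False, True, True, False, False, False,
              False, False, True, False])

# slice-based count of False -> True transitions
count = np.count_nonzero(~x[:-1] & x[1:])

# roll-based variant: zero out the wrapped-around final pair
rolled = ~x & np.roll(x, -1)
rolled[-1] = False
count_roll = np.count_nonzero(rolled)
```

Both counts come out as 2 for this example; only the explicit rolled[-1] = False makes the roll version safe for arbitrary endpoints.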

How to compare two numpy arrays of strings with the "in" operator to get a boolean array using array broadcasting?

Python allows for a simple check if a string is contained in another string:
'ab' in 'abcd'
which evaluates to True.
Now take a numpy array of strings and you can do this:
import numpy as np
A0 = np.array(['z', 'u', 'w'],dtype=object)
A0[:,None] != A0
Resulting in a boolean array:
array([[False,  True,  True],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)
Let's now take another array:
A1 = np.array(['u_w', 'u_z', 'w_z'],dtype=object)
I want to check where a string of A0 is not contained in a string in A1, essentially creating unique combinations, but the following does not yield a boolean array, only a single boolean, regardless of how I write the indices:
A0[:,None] not in A1
I also tried using numpy.in1d and np.ndarray.__contains__ but those methods don't seem to do the trick either.
Performance is an issue here so I want to make full use of numpy's optimizations.
How do I achieve this?
EDIT:
I found it can be done like this:
fv = np.vectorize(lambda x,y: x not in y)
fv(A0[:,None],A1)
But as the numpy docs state:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
So this is the same as just looping over the array, and it would be nice to solve this without explicit or implicit for-loop.
We can convert to string dtype and then use one of those NumPy based string functions.
Thus, using np.char.count, one solution would be -
np.char.count(A1.astype(str),A0.astype(str)[:,None])==0
Alternative using np.char.find -
np.char.find(A1.astype(str),A0.astype(str)[:,None])==-1
One more using np.char.rfind -
np.char.rfind(A1.astype(str),A0.astype(str)[:,None])==-1
If we are converting one to str dtype, we can skip the conversion for the other array, as internally it would be done anyway. So, the last method could be simplified to -
np.char.rfind(A1.astype(str),A0[:,None])==-1
Sample run -
In [97]: A0
Out[97]: array(['z', 'u', 'w'], dtype=object)
In [98]: A1
Out[98]: array(['u_w', 'u_z', 'w_z', 'zz'], dtype=object)
In [99]: np.char.rfind(A1.astype(str),A0[:,None])==-1
Out[99]:
array([[ True, False, False, False],
       [False, False,  True,  True],
       [False,  True, False,  True]], dtype=bool)
# Loopy solution using np.vectorize for verification
In [100]: fv = np.vectorize(lambda x,y: x not in y)
In [102]: fv(A0[:,None],A1)
Out[102]:
array([[ True, False, False, False],
       [False, False,  True,  True],
       [False,  True, False,  True]], dtype=bool)
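Since the question was about building unique combinations, the boolean mask can be turned back into concrete (A0, A1) string pairs with np.argwhere. A sketch using the np.char.find variant (the pairs list comprehension is my own addition, not from the answer above):

```python
import numpy as np

A0 = np.array(['z', 'u', 'w'], dtype=object)
A1 = np.array(['u_w', 'u_z', 'w_z'], dtype=object)

# broadcasted "not a substring" test via np.char.find on str dtype
mask = np.char.find(A1.astype(str), A0.astype(str)[:, None]) == -1

# recover the concrete (A0, A1) pairs from the boolean mask
pairs = [(A0[i], A1[j]) for i, j in np.argwhere(mask)]
```

For these inputs, pairs is [('z', 'u_w'), ('u', 'w_z'), ('w', 'u_z')]: each A0 string matched with the A1 strings it does not appear in.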

Using numpy any() in bool array of arrays

I have a list of lists of bools, say l = [[False, False], [True, False]], and I need to convert l to a numpy array of boolean arrays. I converted every sublist into a bool array, and the whole list to a numpy array too. My current real list has 121 sublists, but np.any() returns just five results, not the 121 expected. My code is this:
>>> result = np.array([ np.array(extracted[aindices[i]:aindices[i + 1]]) for i in range(len(aux_regions)) ])
>>> np.any(result)
[false, false, false, false, false]
extracted[aindices[i]:aindices[i + 1]] is the sublist which I convert to a bool array. The list generated by the whole line is converted to an array too.
In the first example l, the expected result for every subarray (assuming the list is converted) should be [False, True].
What is the problem with using np.any? Or are the data types of the converted list not the right ones?
If you have a list of list of bools, you could skip numpy and use a simple comprehension:
In [1]: l = [[False, False], [True, False]]
In [2]: [any(subl) for subl in l]
Out[2]: [False, True]
If the sublists are all the same length, you can pass the list directly to np.array to get a numpy array of bools:
In [3]: import numpy as np
In [4]: result = np.array(l)
In [5]: result
Out[5]:
array([[False, False],
       [ True, False]], dtype=bool)
Then you can use the any method on axis 1 to get the result for each row:
In [6]: result.any(axis=1) # or `np.any(result, axis=1)`
Out[6]: array([False, True], dtype=bool)
If the sublists are not all the same length, then a numpy array might not be the best data structure for this problem.
This part of my answer should be considered a "sidebar" to what I wrote above. If the sublists have variable lengths, the list comprehension given above is my recommendation. The following is an alternative that uses an advanced numpy feature. I only suggest it because it looks like you already have the data structures needed to use numpy's reduceat function. It works without having to explicitly form the list of lists.
From reading your code, I infer the following:
extracted is a list of bools. You are splitting this up into sublists.
aindices is a list of integers. Each consecutive pair of integers in aindices specifies a range in extracted that is a sublist.
len(aux_regions) is the number of sublists; I'll call this n. The length of aindices is n+1, and the last value in aindices is the length of extracted.
For example, if the data looks like this:
In [74]: extracted
Out[74]: [False, True, False, False, False, False, True, True, True, True, False, False]
In [75]: aindices
Out[75]: [0, 3, 7, 10, 12]
it means there are four sublists:
In [76]: extracted[0:3]
Out[76]: [False, True, False]
In [77]: extracted[3:7]
Out[77]: [False, False, False, True]
In [78]: extracted[7:10]
Out[78]: [True, True, True]
In [79]: extracted[10:12]
Out[79]: [False, False]
With these data structures, you are set up to use the reduceat feature of numpy. The ufunc in this case is logical_or. You can compute the result with this one line:
In [80]: np.logical_or.reduceat(extracted, aindices[:-1])
Out[80]: array([ True, True, True, False], dtype=bool)

Combining logic statements AND in numpy array

What would be the way to select elements when two conditions are True in a matrix?
In R, it is basically possible to combine vectors of booleans.
So what I'm aiming for:
A = np.array([2,2,2,2,2])
A < 3 and A > 1 # A < 3 & A > 1 does not work either
Evals to:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
It should eval to:
array([True,True,True,True,True])
My workaround usually is to sum these boolean vectors and equate to 2, but there must be a better way. What is it?
You could just use &, e.g.:
x = np.arange(10)
(x<8) & (x>2)
gives
array([False, False, False, True, True, True, True, True, False, False], dtype=bool)
A few details:
This works because & is shorthand for the numpy ufunc bitwise_and, which for the bool dtype is the same as logical_and. That is, this could also be spelled out as bitwise_and(less(x, 8), greater(x, 2)).
You need the parentheses because in numpy & has higher precedence than < and >.
and does not work because its truth value is ambiguous for numpy arrays, so rather than guess, NumPy raises the exception.
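The precedence point is easy to trip over, so here is a small sketch of both forms. Without parentheses, Python parses x < 8 & x > 2 as the chained comparison x < (8 & x) > 2, and chaining uses `and` internally, which triggers the same ambiguity error:

```python
import numpy as np

x = np.arange(10)

# with parentheses: elementwise AND of the two comparisons
mask = (x < 8) & (x > 2)

# without them, & binds tighter than < and >, so Python sees the
# chained comparison x < (8 & x) > 2; the implicit `and` between the
# two comparisons needs a single truth value and raises ValueError
try:
    x < 8 & x > 2
    raised = False
except ValueError:
    raised = True
```

mask ends up True exactly for 3 through 7, and raised is True, confirming the unparenthesized form fails.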
There's a function for that:
In [8]: np.logical_and(A < 3, A > 1)
Out[8]: array([ True, True, True, True, True], dtype=bool)
Since you can't overload the and operator in Python, it always tries to cast its operands to bool. That's why the code you have gives an error.
Numpy has defined the __and__ function for arrays which overrides the & operator. That's what the other answer is using.
While this is primitive, what is wrong with
A = [2, 2, 2, 2, 2]
b = []
for x in A:
    b.append(x > 1 and x < 3)
print(b)
The output is [True, True, True, True, True]
