finding elements in an array containing numpy array [duplicate] - python

I have a list of numpy arrays, say,
a = [np.random.rand(3, 3), np.random.rand(3, 3), np.random.rand(3, 3)]
and I have a test array, say
b = np.random.rand(3, 3)
I want to check whether a contains b or not. However
b in a
throws the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
What is the proper way for what I want?

You can just make one array of shape (3, 3, 3) out of a:
a = np.asarray(a)
And then compare it with b (we're comparing floats here, so we should use isclose())
np.all(np.isclose(a, b), axis=(1, 2))
For example:
a = [np.random.rand(3,3),np.random.rand(3,3),np.random.rand(3,3)]
a = np.asarray(a)
b = a[1, ...] # set b to some value we know will yield True
np.all(np.isclose(a, b), axis=(1, 2))
# array([False, True, False])

As highlighted by #jotasi the truth value is ambiguous due to element-wise comparison within the array.
There was a previous answer to this question here. Overall your task can be done in various ways:
list-to-array:
You can use the "in" operator by converting the list to a (3,3,3)-shaped array as follows:
>>> a = [np.random.rand(3, 3), np.random.rand(3, 3), np.random.rand(3, 3)]
>>> a= np.asarray(a)
>>> b= a[1].copy()
>>> b in a
True
np.all:
>>> any(np.all((b==a),axis=(1,2)))
True
list-comperhension:
This done by iterating over each array:
>>> any([(b == a_s).all() for a_s in a])
True
Below is a speed comparison of the three approaches above:
Speed Comparison
import numpy as np
import perfplot
perfplot.show(
setup=lambda n: np.asarray([np.random.rand(3*3).reshape(3,3) for i in range(n)]),
kernels=[
lambda a: a[-1] in a,
lambda a: any(np.all((a[-1]==a),axis=(1,2))),
lambda a: any([(a[-1] == a_s).all() for a_s in a])
],
labels=[
'in', 'np.all', 'list_comperhension'
],
n_range=[2**k for k in range(1,20)],
xlabel='Array size',
logx=True,
logy=True,
)

Ok so in doesn't work because it's effectively doing
def in_(obj, iterable):
for elem in iterable:
if obj == elem:
return True
return False
Now, the problem is that for two ndarrays a and b, a == b is an array (try it), not a boolean, so if a == b fails. The solution is do define a new function
def array_in(arr, list_of_arr):
for elem in list_of_arr:
if (arr == elem).all():
return True
return False
a = [np.arange(5)] * 3
b = np.ones(5)
array_in(b, a) # --> False

This error is because if a and b are numpy arrays then a == b doesn't return True or False, but array of boolean values after comparing a and b element-wise.
You can try something like this:
np.any([np.all(a_s == b) for a_s in a])
[np.all(a_s == b) for a_s in a] Here you are creating list of boolean values, iterating through elements of a and checking if all elements in b and particular element of a are the same.
With np.any you can check if any element in your array is True

As pointed out in this answer, the documentation states that:
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
a[0]==b is an array, though, containing an element-wise comparison of a[0] and b. The overall truth value of this array is obviously ambiguous. Are they the same if all elements match, or if most match of if at least one matches? Therefore, numpy forces you to be explicit in what you mean. What you want to know, is to test whether all elements are the same. You can do that by using numpy's all method:
any((b is e) or (b == e).all() for e in a)
or put in a function:
def numpy_in(arrayToTest, listOfArrays):
return any((arrayToTest is e) or (arrayToTest == e).all()
for e in listOfArrays)

Use array_equal from numpy
import numpy as np
a = [np.random.rand(3,3),np.random.rand(3,3),np.random.rand(3,3)]
b = np.random.rand(3,3)
for i in a:
if np.array_equal(b,i):
print("yes")

Related

While loop for two conditions to be checked simultaneously in array form?

I am writing a while loop for the following statement in python:
I have a vector valued function which returns a 2x1 array with x and y values
I want to write a code that ensures the loop is only run when the [x,y] given by the function are less than [x,y]
I tried to use a.all() however getting an attribute error
Is there another way to check two conditions simultaneously ?
I assumed that the array in your code is np.array.
Let's define the data a, b and c.
a = np.array([[3], [2]]) #[[3], [2]]
b = np.array([[2], [1]])
c = np.array([[4], [1]])
If we do the following comparison, we will obtain
In [1]: a > b
Out[1]:
array([[ True],
[ True]])
In [2]: a > c
Out[2]:
array([[False],
[ True]])
Since you want to ensure both conditions to be true at the same time, you can use the python built-in all(), which returns True only if all of the parameters are true.
In [3]: all(a > c)
Out[3]: False
In [4]: all(a > b)
Out[4]: True
You can do it either like this:
while all(abs(val) > err for val, err in zip(f(x, y), a):
...
Or like this:
while abs(f(x, y)[0]) > a[0] and abs(f(x, y)[1]) > a[1]:
...

Elementwise and in python list

I have two python lists A and B of equal length each containing only boolean values. Is it possible to get a third list C where C[i] = A[i] and B[i] for 0 <= i < len(A) without using loop?
I tried following
C = A and B
but probably it gives the list B
I also tried
C = A or B
which gives first list
I know it can easily be done using for loop in single line like C = [x and y for x, y in zip(A, B)].
I'd recommend you use numpy to use these kind of predicates over arrays. Now, I don't think you can avoid loops to achieve what you want, but... if you don't consider mapping or enumerating as a form of looping, you could do something like this (C1):
A = [True, True, True, True]
B = [False, False, True, True]
C = [x and y for x, y in zip(A, B)]
C1 = map(lambda (i,x): B[i] and x, enumerate(A))
C2 = [B[i] and x for i,x in enumerate(A)]
print C==C1==C2
You can do it without an explicit loop by using map, which performs the loop internally, at C speed. Of course, the actual and operation is still happening at Python speed, so I don't think it'll save much time (compared to doing essentially the same thing with Numpy, which can not only do the looping at C speed, it can do the and operation at C speed too. Of course, there's also the overhead of converting between native Python lists & Numpy arrays).
Demo:
from operator import and_
a = [0, 1, 0, 1]
b = [0, 0, 1, 1]
c = map(and_, a, b)
print c
output
[0, 0, 0, 1]
Note that the and_ function performs a bitwise and operation, but that should be ok since you're operating on boolean values.
Simple answer: you can't.
Except in the trivial way, which is by calling a function that does this for you, using a loop. If you want this kind of nice syntax you can use libraries as suggested: map, numpy, etc. Or you can write your own function.
If what you are looking for is syntactic convenience, Python does not allow overloading operators for built-in types such as list.
Oh, and you can use recursion, if that's "not a loop" for you.
Is it possible to get a third list C where C[i] = A[i] and B[i] for 0 <= i < len(A) without using loop?
Kind of:
class AndList(list):
def __init__(self, A, B):
self.A = A
self.B = B
def __getitem__(self, index):
return self.A[index] and self.B[index]
A = [False, False, True, True]
B = [False, True, False, True]
C = AndList(A, B)
print isinstance(C, list) and all(C[i] == (A[i] and B[i])
for i in range(len(A)))
Prints True.

Optimizing logical operations on nested numpy arrays

I start with a numpy array of numpy arrays, where each of the inner numpy arrays can have different lengths. An example is given below:
import numpy as np
a = np.array([1,2,3])
b = np.array([4,5])
c = np.array([a, b])
print c
[[1 2 3] [4 5]]
I'd like to be able to perform a boolean operation on every element in every element in array c, but when I try to I get the following value error:
print c > 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
I'd like to be able to get the result:
[[True True True] [True True]]
Without using a for loop or iterating on the outer array. Is this possible, and if so, how can I accomplish it?
I can think of two broad approaches, either pad your arrays so that you can a single 2d array instead of nested arrays, or treat your nested arrays as a list of arrays. The first would look something like this:
import numpy as np
a = np.array([1,2,3])
b = np.array([4,5, -99])
c = np.array([a, b])
print c.shape
# (2, 3)
print c > 0
# [[ True True True]
# [ True True False]]
Or do something like:
import numpy as np
a = np.array([1,2,3])
b = np.array([4,5])
c = np.array([a, b])
out = [i > 0 for i in c]
print out
# [array([ True, True, True], dtype=bool), array([ True, True], dtype=bool)]
If padding is not an option, you might in fact find that lists of arrays are better behaved than arrays of arrays.

Comparing two NumPy arrays for equality, element-wise

What is the simplest way to compare two NumPy arrays for equality (where equality is defined as: A = B iff for all indices i: A[i] == B[i])?
Simply using == gives me a boolean array:
>>> numpy.array([1,1,1]) == numpy.array([1,1,1])
array([ True, True, True], dtype=bool)
Do I have to and the elements of this array to determine if the arrays are equal, or is there a simpler way to compare?
(A==B).all()
test if all values of array (A==B) are True.
Note: maybe you also want to test A and B shape, such as A.shape == B.shape
Special cases and alternatives (from dbaupp's answer and yoavram's comment)
It should be noted that:
this solution can have a strange behavior in a particular case: if either A or B is empty and the other one contains a single element, then it return True. For some reason, the comparison A==B returns an empty array, for which the all operator returns True.
Another risk is if A and B don't have the same shape and aren't broadcastable, then this approach will raise an error.
In conclusion, if you have a doubt about A and B shape or simply want to be safe: use one of the specialized functions:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
The (A==B).all() solution is very neat, but there are some built-in functions for this task. Namely array_equal, allclose and array_equiv.
(Although, some quick testing with timeit seems to indicate that the (A==B).all() method is the fastest, which is a little peculiar, given it has to allocate a whole new array.)
If you want to check if two arrays have the same shape AND elements you should use np.array_equal as it is the method recommended in the documentation.
Performance-wise don't expect that any equality check will beat another, as there is not much room to optimize comparing two elements. Just for the sake, i still did some tests.
import numpy as np
import timeit
A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))
timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761
So pretty much equal, no need to talk about the speed.
The (A==B).all() behaves pretty much as the following code snippet:
x = [1,2,3]
y = [1,2,3]
print all([x[i]==y[i] for i in range(len(x))])
> True
Let's measure the performance by using the following piece of code.
import numpy as np
import time
exec_time0 = []
exec_time1 = []
exec_time2 = []
sizeOfArray = 5000
numOfIterations = 200
for i in xrange(numOfIterations):
A = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
B = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
a = time.clock()
res = (A==B).all()
b = time.clock()
exec_time0.append( b - a )
a = time.clock()
res = np.array_equal(A,B)
b = time.clock()
exec_time1.append( b - a )
a = time.clock()
res = np.array_equiv(A,B)
b = time.clock()
exec_time2.append( b - a )
print 'Method: (A==B).all(), ', np.mean(exec_time0)
print 'Method: np.array_equal(A,B),', np.mean(exec_time1)
print 'Method: np.array_equiv(A,B),', np.mean(exec_time2)
Output
Method: (A==B).all(), 0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515
According to the results above, the numpy methods seem to be faster than the combination of the == operator and the all() method and by comparing the numpy methods the fastest one seems to be the numpy.array_equal method.
Usually two arrays will have some small numeric errors,
You can use numpy.allclose(A,B), instead of (A==B).all(). This returns a bool True/False
Now use np.array_equal. From documentation:
np.array_equal([1, 2], [1, 2])
True
np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
np.array_equal([1, 2], [1, 2, 3])
False
np.array_equal([1, 2], [1, 4])
False
On top of the other answers, you can now use an assertion:
numpy.testing.assert_array_equal(x, y)
You also have similar function such as numpy.testing.assert_almost_equal()
https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html
Just for the sake of completeness. I will add the
pandas approach for comparing two arrays:
import numpy as np
a = np.arange(0.0, 10.2, 0.12)
b = np.arange(0.0, 10.2, 0.12)
ap = pd.DataFrame(a)
bp = pd.DataFrame(b)
ap.equals(bp)
True
FYI: In case you are looking of How to
compare Vectors, Arrays or Dataframes in R.
You just you can use:
identical(iris1, iris2)
#[1] TRUE
all.equal(array1, array2)
#> [1] TRUE

Comparing all elements of two tuples (with all() functionality)

So i know that comparisons on tuples work lexicographically:
Tuples and lists are compared lexicographically using comparison of corresponding elements. This means that to compare equal, each element must compare equal and the two sequences must be of the same type and have the same length.
If not equal, the sequences are ordered the same as their first differing elements. For example, cmp([1,2,x], [1,2,y]) returns the same as cmp(x,y). If the corresponding element does not exist, the shorter sequence is ordered first (for example, [1,2] < [1,2,3]).
So from this:
>>> a = (100, 0)
>>> b = (50, 50)
>>> a > b
True
But i want to compare all elements of 2 tuples in order, so functionally i want something akin to (using values from above):
>>> a > b
(True, False) #returned tuple containing each comparison
>>> all(a > b)
False
As an example in practice, for something like screen coordinates, if you wanted to check if something was 'inside' the screen at (0,0), but done a comparison like coord > (0,0), if the x coord was bigger than 0, but the y coord was smaller it would still return true, which isn't what is needed in this case.
As sort of a sub question/discussion:
I am not sure why comparing 2 tuples of different values is returned in such a way. You are not given any sort of index, so the only thing you get from comparing a tuple (that isn't testing equality) is that at some point in the tuple, one of the comparisons will throw a true or false value when they are not equal. How could you take advantage of that?
You can achieve this with a list comprehension and the zip built-in:
>>> a = (100, 0)
>>> b = (50, 50)
>>> [(a > b) for a, b in zip(a,b)]
[True, False]
You can use all() or any() on the returned list.
Replace a > b with tuple(i > j for i, j in zip(a,b)) in your second code sample.
>>> a = (100, 0)
>>> b = (50, 50)
>>> tuple(i > j for i, j in zip(a,b))
(True, False)
>>> all(i > j for i, j in zip(a,b))
False
You might consider using the following vectorized approach, which is usually more performant, and syntactically/semantically very clear:
>>> import numpy
>>>
>>> a = (100, 0)
>>> b = (50, 50)
>>> numpy.array(a) > b
array([ True, False], dtype=bool)
>>>
>>> (numpy.array(a) > b).any()
True
>>> (numpy.array(a) > b).all()
False
numpy is quite performant, and the resulting objects above also embed the any()/all() query methods you want. If you will be performing vector-like operations (as your screen coordinates example suggests), you may consider working with 'a' and 'b' as numpy arrays, instead of tuples. That results in the most efficient implementation of what you seek: no pre-conversion necessary, and Python-based loops are replaced with efficient numpy-based loops. This is worth highlighting because there are two and potentially three loops involved: (1) a preprocessing loop during conversion (which you can eliminate); (2) an item-by-item comparison loop; and (3) a query loop to answer the any/all question.
Note that I could've also created a numpy array from 'b', but not doing so eliminated one conversion step and pre-processing time. Since that approach results in one operand being a numpy array and the other a tuple, as the sequences grow, that may/may-not result in less speedy item-by-item comparisons (which strict numpy-to-numpy is good at). Try it. :)
I felt like the use of map and lambda functions was missing from the answers
>>> a = (100, 0)
>>> b = (50, 50)
>>> all(map(lambda x,y: x > y, a, b))
False
To get the described behavior, try:
[ai > bi for ai,bi in zip(a,b)]
The reason that comparisons of tuples are returned in that way is that you might want to write something like:
if a >= (0.,0.):
print "a has only positive values"
else:
print "a has at least one negative value"
If Python were to return the tuple that you describe, then the else would never happen. Try
if (False,False):
print "True!" # This is what is printed.
else:
print "False!"
I hope this helps.

Categories

Resources