Test the equality of multiple arguments with Numpy - python

I would like to test the equality of multiple args (i.e. it should return True if all args are equal and False if at least one argument differs).
Since numpy.equal only accepts two arguments, I tried reduce, but it obviously fails:
reduce(np.equal, (4, 4, 4)) # return False because...
reduce(np.equal, (True, 4)) # ... is False

You can use np.unique to check if the length of unique items within your array is 1:
np.unique(array).size == 1
Or np.all() in order to check if all of the items are equal with one of your items (for example the first one):
np.all(array == array[0])
Demo:
>>> a = np.array([1, 1, 1, 1])
>>> b = np.array([1, 1, 1, 2])
>>> np.unique(a).size == 1
True
>>> np.unique(b).size == 1
False
>>> np.all(a==a[0])
True
>>> np.all(b==b[0])
False

The numpy_indexed package has a built-in function for this. Note that it also works on multidimensional arrays, i.e. you can use it to check whether a stack of images are all identical, for instance.
import numpy_indexed as npi
npi.all_equal(array)

If your args are floating-point values, the equality test can produce unexpected results due to round-off errors. In this case you should use a more robust approach, for example numpy.allclose:
In [636]: x = [2./3., .2/.3]
In [637]: x
Out[637]: [0.6666666666666666, 0.6666666666666667]
In [638]: xarr = np.array(x)
In [639]: np.unique(xarr).size == 1
Out[639]: False
In [640]: np.all(xarr == xarr[0])
Out[640]: False
In [641]: reduce(np.allclose, x)
Out[641]: True
Note: Python 3 users will need to add the statement from functools import reduce, since reduce is no longer a built-in function in Python 3.
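One caveat about the reduce approach: after the first step, reduce feeds the boolean result into the next comparison, so it only behaves as intended for exactly two values. A minimal sketch of a multi-argument check follows, assuming the arguments are scalars (or arrays of one common shape). The names all_equal and tol are hypothetical and not part of NumPy.

```python
import numpy as np

def all_equal(*args, tol=None):
    """Return True if every argument equals the first one.

    `all_equal` and `tol` are hypothetical names for this sketch.
    With tol=None the comparison is exact; otherwise values may
    differ from the first one by at most `tol` (absolute tolerance).
    """
    arr = np.asarray(args)
    if arr.size == 0:
        return True  # no arguments, so nothing can differ
    if tol is None:
        return bool(np.all(arr == arr[0]))
    # Compare everything against the first value, not against a
    # previous boolean result as reduce would do.
    return bool(np.allclose(arr, arr[0], rtol=0.0, atol=tol))

exact = all_equal(4, 4, 4)
mixed = all_equal(True, 4)
strict = all_equal(2. / 3., .2 / .3)
close = all_equal(2. / 3., .2 / .3, 2. / 3., tol=1e-12)
```

Here exact and close come out True, while mixed and strict come out False. The tolerance value is an arbitrary choice for illustration and should be picked to suit the data.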

Related

How to extract numpy array stored in tuple?

Let's consider a very simple example:
import numpy as np
a = np.array([0, 1, 2])
print(np.where(a < -1))
(array([], dtype=int64),)
print(np.where(a < 2))
(array([0, 1]),)
I'm wondering if it's possible to extract the length of those arrays, i.e. I want to know that the first array is empty and the second is not. Usually this is easily done with the len function, but here the numpy array is stored in a tuple. Do you know how it can be done?
Just use this:
import numpy as np
a = np.array([0, 1, 2])
x = np.where(a < 2)[0]
print(len(x))
Outputs 2
To find the number of values in the array satisfying the predicate, you can skip np.where and use np.count_nonzero instead:
a = np.array([0, 1, 2])
print(np.count_nonzero(a < -1))
>>> 0
print(np.count_nonzero(a < 2))
>>> 2
If you need to know whether there are any values in a that satisfy the predicate, but not how many there are, a cleaner way of doing so is with np.any:
a = np.array([0, 1, 2])
print(np.any(a < -1))
>>> False
print(np.any(a < 2))
>>> True
np.where takes three arguments: condition, x, y, where the last two are arrays and are optional. When they are provided, the function returns elements from x at indices where condition is True, and elements from y otherwise. When only condition is provided, it acts like np.asarray(condition).nonzero() and returns a tuple, as in your case. For more details see the Note in the np.where documentation.
Alternatively, because you only need the number of elements for which the condition is True, you can simply use np.sum(condition):
a = np.array([0, 1, 2])
print(np.sum(a < -1))
>>> 0
print(np.sum(a < 2))
>>> 2
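The two calling forms of np.where described above can be put side by side in a short sketch. The 2-D array m and the fill value -1 are made up for illustration.

```python
import numpy as np

a = np.array([0, 1, 2])

# One-argument form: returns a tuple with one index array per dimension.
result = np.where(a < 2)
(indices,) = result      # a 1-D input gives a tuple of length one
count = len(indices)

# Three-argument form: take values from `a` where the condition holds,
# and the fill value -1 everywhere else.
picked = np.where(a < 2, a, -1)

# For a 2-D input the tuple holds row indices and column indices.
m = np.array([[0, 5], [7, 0]])
rows, cols = np.where(m > 0)
```

The tuple exists so that the same return type works for any number of dimensions, which is why a 1-D input still needs the [0] or the unpacking step.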

Comparing values in two numpy arrays with 'if'

I'm fairly new to numpy arrays and have encountered a problem when comparing one array with another.
I have two arrays, such that:
a = np.array([1,2,3,4,5])
b = np.array([2,4,3,5,2])
I want to do something like the following:
if b > a:
    c = b
else:
    c = a
so that I end up with an array c = np.array([2,4,3,5,5]).
This can be otherwise thought of as taking the max value for each element of the two arrays.
However, I am running into the error
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all().
I have tried using these, but I'm not sure that they are right for what I want.
Is someone able to offer some advice in solving this?
You are looking for the function np.fmax. It takes the element-wise maximum of the two arrays, ignoring NaNs.
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 3, 5, 2])
c = np.fmax(a, b)
The output is
array([2, 4, 3, 5, 5])
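The "ignoring NaNs" part is what separates np.fmax from np.maximum, and it only matters for floating-point input. A small sketch with made-up arrays x and y:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 2.0, np.nan])

# np.fmax returns the non-NaN value when only one operand is NaN.
ignoring_nan = np.fmax(x, y)

# np.maximum propagates NaN whenever either operand is NaN.
propagating_nan = np.maximum(x, y)
```

For the integer arrays in the question the two functions give the same result, so the choice only matters if NaNs can appear in the data.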
As with almost everything else in numpy, comparisons are done element-wise, returning a whole array:
>>> b > a
array([ True, True, False, True, False], dtype=bool)
So, is that true or false? What should an if statement do with it?
Numpy's answer is that it shouldn't try to guess, it should just raise an exception.
If you want to consider it true because at least one value is true, use any:
>>> if np.any(b > a): print('Yes!')
Yes!
If you want to consider it false because not all values are true, use all:
>>> if np.all(b > a): print('Yes!')
But I'm pretty sure you don't want either of these. You want to broadcast the whole if/else over the array.
You could of course wrap the if/else logic for a single value in a function, then explicitly vectorize it and call it:
>>> def mymax(a, b):
...     if b > a:
...         return b
...     else:
...         return a
...
>>> vmymax = np.vectorize(mymax)
>>> vmymax(a, b)
array([2, 4, 3, 5, 5])
This is worth knowing how to do… but very rarely worth doing. There's usually a more indirect way to do it using natively-vectorized functions—and often a more direct way, too.
One way to do it indirectly is by using the fact that True and False are numerical 1 and 0:
>>> (b>a)*b + (b<=a)*a
array([2, 4, 3, 5, 5])
This will add the 1*b[i] + 0*a[i] when b>a, and 0*b[i] + 1*a[i] when b<=a. A bit ugly, but not too hard to understand. There are clearer, but more verbose, ways to write this.
But let's look for an even better, direct solution.
First, notice that your mymax function will do exactly the same as Python's built-in max, for 2 values:
>>> vmymax = np.vectorize(max)
>>> vmymax(a, b)
array([2, 4, 3, 5, 5])
Then consider that for something so useful, numpy probably already has it. And a quick search will turn up maximum:
>>> np.maximum(a, b)
array([2, 4, 3, 5, 5])
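For if/else logic that does not happen to match an existing function like maximum, the three-argument form of np.where is one general way to broadcast the branch over the arrays. A minimal sketch using the arrays from the question:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 3, 5, 2])

# Element-wise "b if b > a else a", evaluated for every position at once.
c = np.where(b > a, b, a)
```

Both branches are evaluated in full before the selection, so this suits cheap expressions. It is not a way to skip work or avoid errors in the branch that is not selected.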
Here's another way of achieving this:
c = np.array([y if y>z else z for y,z in zip(a,b)])
The following methods also work:
Use numpy.maximum
>>> np.maximum(a, b)
Use numpy.max and numpy.vstack
>>> np.max(np.vstack((a, b)), axis=0)
May not be the most efficient one, but this is a more suitable answer to the original question:
import numpy as np
c = np.zeros(shape=(5,1))
a = np.array([1,2,3,4,5])
b = np.array([2,4,3,5,2])
for i in range(5):
    if b.item(i) > a.item(i):
        c[i] = b.item(i)
    else:
        c[i] = a.item(i)

Equality without using operator

I was asked if it was possible to compare two (say) lists without invoking operators, to determine if they were the same (or rather, contained the same elements).
I first entertained using
x in y
before I realised that it would not care for order, but for mere presence. Of course, if the lists were to contain purely numbers it would be trivial to do a modulus test or so, but lists can contain strings. (is didn't work either, but I didn't really expect it to, considering it tests identity...)
So I was wondering if it's (even) possible to pull off equality tests without using operators (==, !=)?
It was a mere rhetorical question, but it's been gnawing at me for some time and I've rather given up trying to solve it myself with my not-very extensive python knowledge.
Sure it is, just bypass the operators and go straight for the __eq__ special method:
>>> x = [1, 2, 3]
>>> y = [1, 2, 3]
>>> x.__eq__(y)
True
>>> z = [42]
>>> x.__eq__(z)
False
You can also use the operator module:
>>> import operator
>>> operator.eq(x, y)
True
>>> operator.eq(x, z)
False
In Python 2, you could use looping with any() and cmp(), with itertools.izip_longest() to make sure we don't ignore uneven lengths:
>>> from itertools import izip_longest
>>> not any(cmp(i, j) for i, j in izip_longest(x, y, fillvalue=object()))
True
>>> not any(cmp(i, j) for i, j in izip_longest(x, z, fillvalue=object()))
False
This works because cmp() returns 0 for values that are equal. any() returns False only if all results are false (e.g. 0).
Hell, go straight for cmp() without looping:
>>> not cmp(x, y)
True
>>> not cmp(x, z)
False
For Python 3 you'd have to create your own cmp() function, perhaps using .__lt__ and .__gt__ if you want to avoid the < and > operators too.
For lists with only integers, you can forgo the cmp() function and go straight to subtraction; let's use map() here and include the list lengths:
>>> not (len(x) - len(y)) and not any(map(lambda i, j: i - j, x, y))
True
>>> not (len(x) - len(z)) and not any(map(lambda i, j: i - j, x, z))
False
This works because map() zips up the values in the lists and passes these pairs to the first argument, a callable. That subtracts the values and only if the integers are equal do we get all 0 values and any() returns False.
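For Python 3, where cmp() and izip_longest() are gone, the same looping idea can be sketched with operator.eq and itertools.zip_longest. The function name lists_equal is hypothetical, and the sketch still relies on the elements' own equality methods, only without writing == or != in the source.

```python
import operator
from itertools import zip_longest

def lists_equal(x, y):
    """Compare two sequences element by element without == or !=.

    `lists_equal` is a hypothetical helper name for this sketch.
    """
    # A fresh object() equals nothing but itself, so padding the shorter
    # sequence with it makes any length mismatch compare unequal.
    sentinel = object()
    pairs = zip_longest(x, y, fillvalue=sentinel)
    return all(operator.eq(i, j) for i, j in pairs)

x = [1, 2, 3]
y = [1, 2, 3]
z = [42]
same = lists_equal(x, y)
different = lists_equal(x, z)
```

Unlike the subtraction trick, this also works for lists of strings, because it never assumes the elements are numbers.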
Apart from Martijn Pieters's answer, I could think of the following options:
Using XOR (works for integer elements):
x = [1, 2, 3]
y = [1, 2, 3]
result = "list equal"
if len(x) - len(y):
    result = "list not equal"
else:
    for i, j in zip(x, y):
        if i ^ j:
            result = "list not equal"
            break
print(result)
Using set (ignores order and duplicates, and only detects elements of x that are missing from y):
if set(x).difference(set(y)):
    print("list not equal")
else:
    print("list equal")
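If "contained the same elements" is meant regardless of order, the one-directional set difference can report equality too eagerly. A hedged alternative is to compare element counts with collections.Counter, again going through operator.eq rather than ==. The name same_elements is hypothetical, and the sketch assumes the elements are hashable.

```python
import operator
from collections import Counter

def same_elements(x, y):
    """True if x and y hold the same elements with the same multiplicities.

    `same_elements` is a hypothetical helper; order is ignored and
    the elements must be hashable.
    """
    return operator.eq(Counter(x), Counter(y))

x = [1, 2]
y = [1, 2, 3]

# The set difference only looks for elements of x missing from y,
# so it finds nothing here even though y has an extra element.
set_says_equal = not set(x).difference(set(y))
counter_says_equal = same_elements(x, y)
```

With these inputs set_says_equal is True while counter_says_equal is False.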

Comparing two NumPy arrays for equality, element-wise

What is the simplest way to compare two NumPy arrays for equality (where equality is defined as: A = B iff for all indices i: A[i] == B[i])?
Simply using == gives me a boolean array:
>>> numpy.array([1,1,1]) == numpy.array([1,1,1])
array([ True, True, True], dtype=bool)
Do I have to AND together the elements of this array to determine whether the arrays are equal, or is there a simpler way to compare?
(A==B).all()
This tests whether all values of the array (A==B) are True.
Note: you may also want to compare the shapes of A and B, e.g. A.shape == B.shape.
Special cases and alternatives (from dbaupp's answer and yoavram's comment)
It should be noted that:
This solution can behave strangely in one particular case: if either A or B is empty and the other contains a single element, it returns True. The comparison A==B broadcasts to an empty array, for which all() returns True.
Another risk: if A and B don't have the same shape and aren't broadcastable, this approach will raise an error.
In conclusion, if you are in doubt about the shapes of A and B, or simply want to be safe, use one of the specialized functions:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
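The special cases above can be reproduced with a few small arrays. This is only an illustrative sketch; the variable names and values are made up.

```python
import numpy as np

empty = np.array([])
single = np.array([1])

# Broadcasting shapes (0,) and (1,) gives an empty boolean array,
# and all() of an empty array is True.
naive = bool((empty == single).all())
# array_equal checks the shapes first, so it reports a mismatch.
safe = bool(np.array_equal(empty, single))

row = np.array([1, 1])
grid = np.array([[1, 1], [1, 1]])

# Same values but different shapes: not equal, yet "equivalent",
# because `row` broadcasts against `grid`.
equal_shapes = bool(np.array_equal(row, grid))
equiv_broadcast = bool(np.array_equiv(row, grid))
```

Here naive and equiv_broadcast are True, while safe and equal_shapes are False.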
The (A==B).all() solution is very neat, but there are some built-in functions for this task. Namely array_equal, allclose and array_equiv.
(Although, some quick testing with timeit seems to indicate that the (A==B).all() method is the fastest, which is a little peculiar, given it has to allocate a whole new array.)
If you want to check if two arrays have the same shape AND elements you should use np.array_equal as it is the method recommended in the documentation.
Performance-wise, don't expect any equality check to beat another, as there is not much room to optimize comparing two elements. For the sake of completeness, I still ran some tests.
import numpy as np
import timeit
A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))
timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761
So pretty much equal, no need to talk about the speed.
The (A==B).all() behaves pretty much as the following code snippet:
x = [1,2,3]
y = [1,2,3]
print(all(x[i] == y[i] for i in range(len(x))))
> True
Let's measure the performance by using the following piece of code.
import numpy as np
import time
exec_time0 = []
exec_time1 = []
exec_time2 = []
sizeOfArray = 5000
numOfIterations = 200
# time.perf_counter() replaces time.clock(), which was removed in Python 3.8
for i in range(numOfIterations):
    A = np.random.randint(0, 255, (sizeOfArray, sizeOfArray))
    B = np.random.randint(0, 255, (sizeOfArray, sizeOfArray))
    a = time.perf_counter()
    res = (A == B).all()
    b = time.perf_counter()
    exec_time0.append(b - a)
    a = time.perf_counter()
    res = np.array_equal(A, B)
    b = time.perf_counter()
    exec_time1.append(b - a)
    a = time.perf_counter()
    res = np.array_equiv(A, B)
    b = time.perf_counter()
    exec_time2.append(b - a)
print('Method: (A==B).all(), ', np.mean(exec_time0))
print('Method: np.array_equal(A,B),', np.mean(exec_time1))
print('Method: np.array_equiv(A,B),', np.mean(exec_time2))
Output
Method: (A==B).all(), 0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515
According to the results above, the numpy methods seem to be marginally faster than the combination of the == operator and the all() method, and among the numpy methods numpy.array_equal seems to be the fastest.
Usually two arrays will have some small numeric errors.
You can use numpy.allclose(A,B) instead of (A==B).all(). This returns a bool True/False.
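A short sketch of the difference, using a classic round-off example. The array contents are made up; rtol, atol and equal_nan are keyword parameters of np.allclose.

```python
import numpy as np

A = np.array([0.1 + 0.2, 1.0])
B = np.array([0.3, 1.0])

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
exact = bool((A == B).all())
# allclose accepts differences within rtol * abs(B) + atol.
close = bool(np.allclose(A, B))

# NaN never compares equal to itself unless equal_nan=True is passed.
N = np.array([1.0, np.nan])
nan_default = bool(np.allclose(N, N))
nan_equal = bool(np.allclose(N, N, equal_nan=True))
```

Here exact and nan_default are False, while close and nan_equal are True. The default tolerances are a general-purpose choice and may need tightening or loosening depending on the scale of the data.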
Now use np.array_equal. From documentation:
np.array_equal([1, 2], [1, 2])
True
np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
np.array_equal([1, 2], [1, 2, 3])
False
np.array_equal([1, 2], [1, 4])
False
On top of the other answers, you can now use an assertion:
numpy.testing.assert_array_equal(x, y)
There are also similar functions, such as numpy.testing.assert_almost_equal().
https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html
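Unlike the boolean checks, the numpy.testing functions return nothing on success and raise AssertionError on a mismatch, which makes them a natural fit for test suites. A minimal sketch with made-up arrays:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = x + 1e-9   # differs from x only by tiny offsets

# Passes silently: the arrays are exactly equal.
np.testing.assert_array_equal(x, x.copy())

# Exact comparison fails for the slightly perturbed array.
try:
    np.testing.assert_array_equal(x, y)
    raised = False
except AssertionError:
    raised = True

# A tolerance-based assertion accepts the small difference.
np.testing.assert_allclose(x, y, rtol=1e-6)
```

After this runs, raised is True. The error message produced on failure also summarizes the mismatching elements, which a bare assert on a boolean does not.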
Just for the sake of completeness, I will add the pandas approach for comparing two arrays:
import numpy as np
import pandas as pd
a = np.arange(0.0, 10.2, 0.12)
b = np.arange(0.0, 10.2, 0.12)
ap = pd.DataFrame(a)
bp = pd.DataFrame(b)
ap.equals(bp)
True
FYI: in case you are looking for how to compare vectors, arrays or data frames in R, you can use:
identical(iris1, iris2)
#[1] TRUE
all.equal(array1, array2)
#> [1] TRUE

Comparing all elements of two tuples (with all() functionality)

I know that comparisons on tuples work lexicographically:
Tuples and lists are compared lexicographically using comparison of corresponding elements. This means that to compare equal, each element must compare equal and the two sequences must be of the same type and have the same length.
If not equal, the sequences are ordered the same as their first differing elements. For example, cmp([1,2,x], [1,2,y]) returns the same as cmp(x,y). If the corresponding element does not exist, the shorter sequence is ordered first (for example, [1,2] < [1,2,3]).
So from this:
>>> a = (100, 0)
>>> b = (50, 50)
>>> a > b
True
But I want to compare all elements of two tuples in order, so functionally I want something akin to (using values from above):
>>> a > b
(True, False) #returned tuple containing each comparison
>>> all(a > b)
False
As a practical example, take something like screen coordinates: if you wanted to check whether something was 'inside' the screen at (0,0), but did a comparison like coord > (0,0), then if the x coord was bigger than 0 but the y coord was smaller, it would still return True, which isn't what is needed in this case.
As sort of a sub question/discussion:
I am not sure why comparing 2 tuples of different values is returned in such a way. You are not given any sort of index, so the only thing you get from comparing a tuple (that isn't testing equality) is that at some point in the tuple, one of the comparisons will throw a true or false value when they are not equal. How could you take advantage of that?
You can achieve this with a list comprehension and the zip built-in:
>>> a = (100, 0)
>>> b = (50, 50)
>>> [(x > y) for x, y in zip(a, b)]
[True, False]
You can use all() or any() on the returned list.
Replace a > b with tuple(i > j for i, j in zip(a,b)) in your second code sample.
>>> a = (100, 0)
>>> b = (50, 50)
>>> tuple(i > j for i, j in zip(a,b))
(True, False)
>>> all(i > j for i, j in zip(a,b))
False
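For the screen-coordinate case, the element-wise check can be wrapped in a small helper. The name all_greater is hypothetical, and raising an error on a length mismatch is a design choice made here, since zip would otherwise silently stop at the shorter tuple.

```python
def all_greater(a, b):
    """True if every element of a is greater than its counterpart in b.

    `all_greater` is a hypothetical helper name for this sketch.
    """
    if len(a) != len(b):
        raise ValueError("tuples must have the same length")
    return all(i > j for i, j in zip(a, b))

origin = (0, 0)
coord = (5, -1)

lexicographic = coord > origin           # decided by the first element alone
elementwise = all_greater(coord, origin)
```

With these values lexicographic is True but elementwise is False, which is the behaviour the question asks for.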
You might consider using the following vectorized approach, which is usually more performant, and syntactically/semantically very clear:
>>> import numpy
>>>
>>> a = (100, 0)
>>> b = (50, 50)
>>> numpy.array(a) > b
array([ True, False], dtype=bool)
>>>
>>> (numpy.array(a) > b).any()
True
>>> (numpy.array(a) > b).all()
False
numpy is quite performant, and the resulting objects above also embed the any()/all() query methods you want. If you will be performing vector-like operations (as your screen coordinates example suggests), you may consider working with 'a' and 'b' as numpy arrays, instead of tuples. That results in the most efficient implementation of what you seek: no pre-conversion necessary, and Python-based loops are replaced with efficient numpy-based loops. This is worth highlighting because there are two and potentially three loops involved: (1) a preprocessing loop during conversion (which you can eliminate); (2) an item-by-item comparison loop; and (3) a query loop to answer the any/all question.
Note that I could've also created a numpy array from 'b', but not doing so eliminated one conversion step and pre-processing time. Since that approach results in one operand being a numpy array and the other a tuple, as the sequences grow, that may/may-not result in less speedy item-by-item comparisons (which strict numpy-to-numpy is good at). Try it. :)
I felt like the use of map and lambda functions was missing from the answers:
>>> a = (100, 0)
>>> b = (50, 50)
>>> all(map(lambda x,y: x > y, a, b))
False
To get the described behavior, try:
[ai > bi for ai,bi in zip(a,b)]
The reason that comparisons of tuples are returned in that way is that you might want to write something like:
if a >= (0., 0.):
    print("a has only positive values")
else:
    print("a has at least one negative value")
If Python were to return the tuple that you describe, then the else would never happen. Try
if (False, False):
    print("True!")  # This is what is printed.
else:
    print("False!")
I hope this helps.
