numpy testing assert array NOT equal - python

We have numpy.testing.assert_array_equal to assert that two arrays are equal.
But what is the best way to do numpy.testing.assert_array_not_equal, that is, to make sure that two arrays are NOT equal?

If you specifically want to use NumPy testing, then you can use numpy.testing.assert_array_equal together with numpy.testing.assert_raises to assert the opposite result. For example:
assert_raises(AssertionError, assert_array_equal, array_1, array_2)
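For completeness, here is a self-contained sketch of that pattern (the arrays are just illustrative):
import numpy as np
from numpy.testing import assert_array_equal, assert_raises

array_1 = np.array([1, 2, 3])
array_2 = np.array([1, 2, 4])
# Passes: the arrays differ, so assert_array_equal raises AssertionError.
assert_raises(AssertionError, assert_array_equal, array_1, array_2)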
Also there is numpy.testing.utils.assert_array_compare (it is used by numpy.testing.assert_array_equal), but I don't see it documented anywhere, so use with caution. This one will check that every element is different, so I guess this is not your use case:
import operator
from numpy.testing.utils import assert_array_compare

assert_array_compare(operator.__ne__, array_1, array_2)

I don't think there is anything built directly into the NumPy testing framework but you could just use:
np.any(np.not_equal(a1, a2))
and assert that it is true with the built-in unittest framework, or check it against True with NumPy's assert_equal, e.g.
np.testing.assert_equal(np.any(np.not_equal(a1, a2)), True)

Not sure why this hasn't been posted; maybe I didn't understand the question properly, but what about:
assert not np.array_equal(array1, array2)
Any reason why you would like to keep it exclusively in the testing module of numpy?

Cleaner syntax for @Eswcvlad's answer:
import numpy as np
with np.testing.assert_raises(AssertionError):
    np.testing.assert_array_equal(expected, actual)

Perhaps you usually want to test whether something is almost equal (considering decimal precision) and consequently want to test whether something is NOT almost equal in some cases. Building on @Mikhail's answer (and also using pytest.raises), this would give:
import numpy as np
import pytest
with pytest.raises(AssertionError):
    np.testing.assert_almost_equal(...)

Related

numpy: efficient way to do "any" or "all" on the result of an operation

Suppose that you have two NumPy arrays, a and b, and you want to test whether any value of a is greater than the corresponding value of b.
Now you could calculate a boolean array and call its any method:
(a > b).any()
This will do all the looping internally, which is good, but it suffers from the need to perform the comparison on all the pairs even if, say, the very first result evaluates as True.
Alternatively, you could do an explicit loop over scalar comparisons. An example implementation in the case where a and b are the same shape (so broadcasting is not required) might look like:
any(ai > bi for ai, bi in zip(a.flatten(), b.flatten()))
This will benefit from the ability to stop processing after the first True result is encountered, but with all the costs associated with an explicit loop in Python (albeit inside a comprehension).
Is there any way, either in NumPy itself or in an external library, that you could pass in a description of the operation that you wish to perform, rather than the result of that operation, and then have it perform the operation internally (in optimised low-level code) inside an "any" loop that can be broken out from?
One could imagine hypothetically some kind of interface like:
from array_operations import GreaterThan, Any
expression1 = GreaterThan('x', 'y')
expression2 = Any(expression1)
print(expression2.evaluate(x=a, y=b))
If such a thing exists, clearly it could have other uses beyond efficient evaluation of all and any, in terms of being able to create functions dynamically.
Is there anything like this?
One way to solve this is with delayed/deferred/lazy evaluation. The C++ community uses something called "expression templates" to achieve this; you can find an accessible overview here: http://courses.csail.mit.edu/18.337/2015/projects/TylerOlsen/18337_tjolsen_ExpressionTemplates.pdf
In Python the easiest way to do this is using Numba. You basically just write the function you need in Python using for loops, then you decorate it with @numba.njit and it's done. Like this:
import numba

@numba.njit
def any_greater(a, b):
    # Scan element pairs and bail out at the first match.
    for ai, bi in zip(a.flatten(), b.flatten()):
        if ai > bi:
            return True
    return False
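Usage then looks like this (sample arrays assumed; the first call also triggers JIT compilation):
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 2, 1])
any_greater(a, b)  # True: returns as soon as 3 > 1 is found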
There is/was a NumPy enhancement proposal that could help your use case, but I don't think it has been implemented: https://docs.scipy.org/doc/numpy-1.13.0/neps/deferred-ufunc-evaluation.html

Python: what's the difference - abs and operator.abs

In Python, what is the difference between:
abs(a) and operator.abs(a)
They seem to be the very same and work alike. If they are the same, why were two separate functions made to do the same thing? If there is some specific functionality for either of them, please explain it.
There is no difference. The documentation even says so:
>>> import operator
>>> print(operator.abs.__doc__)
abs(a) -- Same as abs(a).
It is implemented as a wrapper just so the documentation can be updated:
from builtins import abs as _abs
# ...
def abs(a):
    "Same as abs(a)."
    return _abs(a)
(Note, the above Python implementation is only used if the C module itself can't be loaded).
It is there purely to complement the other (mathematical) operators; e.g. if you wanted to do dynamic operator lookups on that module you don't have to special-case abs().
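For instance, a small sketch of what such a dynamic lookup might look like (the helper name is illustrative):
import operator

def apply_op(name, *args):
    # Look up the operator function by name; abs needs no special-casing.
    return getattr(operator, name)(*args)

apply_op('add', 2, 3)  # 5
apply_op('abs', -7)    # 7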
No difference at all. You might want to use operator.abs with higher-order functions such as map, just as you use operator.add for + with functools.reduce or itertools.accumulate. There can be a performance difference in that setting, though: per David Beazley, passing operator.add to reduce can be roughly twice as fast as an equivalent lambda.
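A quick illustration of that style (values are arbitrary):
import operator
from functools import reduce

values = [3, -1, -4, 1, -5]
list(map(operator.abs, values))  # [3, 1, 4, 1, 5]
reduce(operator.add, values)     # -6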

How to check if a value is of a NumPy type?

Imagine you have a value that might or might not be one of the NumPy dtypes. How would you write a function that checks which is the case?
def is_numpy(value):
    # how to code?
One way I've found that works was used by Mike T in his answer to Converting numpy dtypes to native python types:
def is_numpy(value):
    return hasattr(value, 'dtype')
I'm not sure whether or not this is the preferred method, but it's relatively simple and clean.
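For example (values are illustrative):
import numpy as np

is_numpy(np.array([1, 2]))   # True: ndarrays carry a dtype
is_numpy(np.float64(3.14))   # True: NumPy scalar types do too
is_numpy(3.14)               # False: a plain Python float has no dtype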

Calculate mean of hue angles

I have been struggling with this for some time, despite there being related questions on SO (e.g. this one).
My current implementation:
import numpy as np

def circmean(arr):
    arr = np.deg2rad(arr)
    return np.rad2deg(np.arctan2(np.mean(np.sin(arr)), np.mean(np.cos(arr))))
But the results I'm getting don't make sense! I regularly get negative values, e.g.:
test = np.array([323.64,161.29])
circmean(test)
>> -117.53500000000004
I don't know if (a) my function is incorrect, (b) the method I'm using is incorrect, or (c) I just have to do a transformation to the negative values (add 360 degrees?). My research suggests that the problem isn't (a), and I've seen implementations (e.g. here) matching my own, so I'm leaning towards (c), but I really don't know.
Following this question, I did some research that led me to the circmean function in the scipy library.
Since you're already using the numpy library, I thought scipy's implementation should suit your needs.
As noted in my answer to the aforementioned question, I haven't found any documentation of that function, but inspecting its source code revealed the proper way it should be invoked:
>>> import numpy as np
>>> from scipy import stats
>>>
>>> test = np.array([323.64,161.29])
>>> stats.circmean(test, high=360)
242.46499999999995
>>>
>>> test = np.array([5, 350])
>>> stats.circmean(test, high=360)
357.49999999999994
This might not be of much use to you, since some time has passed since you posted your question and you've already implemented the function yourself, but I hope it benefits future readers struggling with the same issue.
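As a sanity check, scipy's result agrees with the asker's original function once negative outputs are wrapped into [0, 360): -117.535 + 360 = 242.465. A minimal sketch of that normalization, building on the code from the question:
import numpy as np

def circmean_deg(arr):
    arr = np.deg2rad(arr)
    mean = np.rad2deg(np.arctan2(np.mean(np.sin(arr)), np.mean(np.cos(arr))))
    return mean % 360  # wrap negative angles into [0, 360)

circmean_deg(np.array([323.64, 161.29]))  # 242.46499999999995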

numpy.max or max? Which one is faster?

In Python, which one is faster?
numpy.max(), numpy.min()
or
max(), min()
My list/array length varies from 2 to 600. Which one should I use to save some run time ?
Well, from my timings it follows that if you already have a NumPy array a, you should use a.max (the source tells us it's the same as np.max when a.max is available). But if you have a built-in list, then most of the time is spent converting it into an np.ndarray; that's why max looks better in your timings.
In essence: if you have an np.ndarray, use a.max; if you have a list and don't need all the machinery of np.ndarray, use the standard max.
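A quick, hedged way to check this on your own data with timeit (array size and iteration count are arbitrary):
import timeit

setup = "import numpy as np; a = np.random.rand(600); lst = a.tolist()"
print(timeit.timeit("a.max()", setup=setup, number=10000))      # ndarray method: no conversion
print(timeit.timeit("np.max(lst)", setup=setup, number=10000))  # pays the list -> array conversion
print(timeit.timeit("max(lst)", setup=setup, number=10000))     # builtin max on a plain list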
I was also interested in this and tested the three variants with perfplot (a little project of mine). Result: you can't go wrong with a.max().
Code to reproduce the plot:
import numpy as np
import perfplot

b = perfplot.bench(
    setup=np.random.rand,
    kernels=[max, np.max, lambda a: a.max()],
    labels=["max(a)", "np.max(a)", "a.max()"],
    n_range=[2 ** k for k in range(25)],
    xlabel="len(a)",
)
b.show()
It's probably best if you use something like the Python timeit module to test it for yourself. That way you can test your own data in your own environment, rather than relying on third parties with various test data and environments which aren't necessarily representative of yours.
numpy.min and numpy.max have slightly different semantics (and call signatures) to the builtins, so the choice shouldn't be to do with speed. Use the numpy versions if you need to be able to handle multidimensional data sanely. If you're just using Python lists or other things that don't know about dimensionality, use the builtins.
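A quick illustration of that semantic difference (example array assumed):
import numpy as np

a = np.array([[1, 9], [4, 3]])
np.max(a)           # 9: global maximum over all elements
np.max(a, axis=0)   # array([4, 9]): columnwise maximum
# max(a) would raise ValueError: comparing whole rows is ambiguous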
