I'm trying to test in Python whether a vector of recovered times is close to a vector of ground truth times. Let's ignore how we recover the times, it's not relevant to the question.
My first instinct was to use numpy.allclose, but unless I'm misunderstanding something, allclose is actually a bad fit here because of how it works.
Essentially you specify an absolute tolerance atol and relative tolerance rtol, along with your ground truth vector b and a comparison vector a, and numpy.allclose returns:
all(numpy.abs(a - b) <= atol + rtol * numpy.abs(b))
There's some nuance to what the actual function does, as you can see in the source, but the "pseudo-numpython" above from the docs gives you the basic idea.
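As a quick sanity check (small made-up arrays of my own choosing, not the time vectors below), the documented expression does agree with what allclose returns:
import numpy as np

a = np.array([1.0, 100.0, 10000.0])
b = a + np.array([1e-6, 1e-4, 1e-2])   # small, roughly relative, perturbations
atol, rtol = 1e-8, 1e-5

# The element-wise test from the docs...
manual = np.all(np.abs(a - b) <= atol + rtol * np.abs(b))
# ...agrees with the library call using the same tolerances.
assert manual == np.allclose(a, b, rtol=rtol, atol=atol)
print(manual)   # True: every difference is within atol + rtol * |b|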
The issue is that with any monotonically-increasing vector of positive values, like a time series, your tolerance actually will increase!
Take this series of times in seconds:
>>> times_true = np.array([0.01147392, 0.46244898, 0.78571429, 1.22238095, 1.74857143,
2.30984127, 2.92777778, 3.57      , 4.16634921, 4.76809524])
>>> times_recovered = np.array([0.00944365, 0.46007857, 0.7838881 , 1.22103095, 1.74722143,
2.30849127, 2.92642778, 3.56865   , 4.16499921, 4.76674524])
I want my times to be no more than a millisecond apart, plus or minus some wiggle room. This is basically the case for my example vectors.
>>> np.abs(times_recovered - times_true)
array([0.00203027, 0.00237041, 0.00182619, 0.00135 , 0.00135 ,
0.00135 , 0.00135 , 0.00135 , 0.00135 , 0.00135 ])
Since I want the values to be "roughly 1 msec apart", I specify atol to be 0.001 and my rtol to be 0.001. My understanding of these terms right now is that atol bounds the absolute difference between each element of a and b, i.e., np.abs(a - b), and that rtol adds some extra "slop" tolerance on top. (edit: changed how I defined the terms originally)
Now look at what this gives me for the second term above:
>>> atol, rtol = 0.001, 0.001
>>> rtol * np.abs(times_true)
array([1.14739229e-05, 4.62448980e-04, 7.85714286e-04, 1.22238095e-03,
1.74857143e-03, 2.30984127e-03, 2.92777778e-03, 3.57000000e-03,
4.16634921e-03, 4.76809524e-03])
For this vector of times, we start out with a relative term of about 1e-5 and finish at almost 5e-3, more than two orders of magnitude larger. In other words, allclose will check whether the differences np.abs(a - b) are less than or equal to the following:
>>> atol + rtol * np.abs(times_true)
array([0.00201147, 0.00246245, 0.00278571, 0.00322238, 0.00374857,
0.00430984, 0.00492778, 0.00557 , 0.00616635, 0.0067681 ])
This seems bad? I want my tolerance to be roughly the same at every point, but it's clearly increasing, and it will only continue to increase as I get larger times in my vectors. It's also bad because for small times my tolerance will actually be smaller, giving me false alarms.
It seems like what I should really do is just take np.abs(times_recovered - times_true) and ask whether any of the values are greater than the largest difference I'm willing to tolerate:
>>> MAX_DIFF = 0.003
>>> assert not np.any(np.abs(times_recovered - times_true) > MAX_DIFF)
but if so, then am I just completely misunderstanding how numpy.allclose is supposed to work?
Any feedback from sage scientific Pythonistas would be appreciated.
For your problem you want all your (pointwise) errors to be close to zero. So... just use allclose on the error timeseries and a zero vector (it broadcasts under the hood):
t_err = times_recovered - times_true
MAX_DIFF = 0.003
np.allclose(t_err, 0, rtol=0, atol=MAX_DIFF)
This is effectively the same as your assert statement (but don't use assert in production code, unless it's a test!) - your choice which you want to use.
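For instance, with some made-up numbers (not your actual vectors), the two checks agree:
import numpy as np

t_true = np.array([0.5, 1.0, 1.5, 2.0])                        # illustrative ground-truth times
t_rec = t_true + np.array([0.001, -0.002, 0.0015, -0.001])     # illustrative recovered times

MAX_DIFF = 0.003
t_err = t_rec - t_true

# allclose against zero with rtol=0 reduces to a pure absolute-difference check...
via_allclose = np.allclose(t_err, 0, rtol=0, atol=MAX_DIFF)
# ...which is the same condition as the explicit comparison.
via_assert = not np.any(np.abs(t_err) > MAX_DIFF)
print(via_allclose, via_assert)   # True True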
In numpy.allclose() there are two tolerance factors used to determine if two arrays are close enough to count as the same. There is the relative tolerance rtol and absolute tolerance atol. From the docs:
numpy.allclose(a, b, rtol=1e-05, atol=1e-08)
Also from the docs:
If the following equation is element-wise True, then allclose returns True.
absolute(a - b) <= (atol + rtol * absolute(b))
Mathematically I understand this, but I am confused about the point of rtol. Why not just use a single tolerance value tol, and return True if |a-b| < tol? Obviously, following the above equation, I could just do this manually by setting rtol to zero, thereby making everything symmetric. What is the point of the symmetry-breaking rtol factor?
Related question: How allclose() works?
The confusing part is that the equation shows both parameters being used at the same time. Look at it like this instead:
Use case 1, absolute tolerance (atol): absolute(a - b) <= atol
Use case 2, relative tolerance (rtol): absolute(a - b) <= rtol * absolute(b)
An alternative way to implement both with a single tolerance parameter would be to add a flag that determines whether the tolerance is relative or absolute. Separating the use cases like that breaks down when array values can be both large and zero. If only one array can have zeros, make that one a and use the asymmetrical equation to your benefit without atol. If either one can have zeros, simply set rtol to some acceptable value for large elements, and set atol to the value you want to kick in for zeros.
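For example, a small sketch with made-up arrays that mix zeros and large values: rtol covers the large elements and atol kicks in at the zeros.
import numpy as np

b = np.array([0.0, 1e6])          # reference values: a zero and a large number
a = np.array([1e-9, 1e6 + 0.5])   # tiny absolute error at zero, small relative error at 1e6

# rtol alone fails at the zero entry (rtol * |0| == 0 leaves no slack)...
print(np.allclose(a, b, rtol=1e-5, atol=0))      # False
# ...while a small atol covers the zero without noticeably loosening the large entry.
print(np.allclose(a, b, rtol=1e-5, atol=1e-8))   # True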
You generally want to use rtol: since the precision of numbers and calculations is very much finite, larger numbers will almost always be less precise than smaller ones, and the difference scales linearly (again, in general). The only time you use atol is for numbers that are so close to zero that rounding errors are liable to be larger than the number itself.
Another way to look at it is atol compares fixed decimal places, while rtol compares significant figures.
Which tolerance(s) to use depends on your problem statement. For example, what if my array has a wide domain of values, ranging from 1e-10 to 1e10? A small atol would work well for small values, but poorly for large values, and vice-versa for a large atol. But rtol is perfect in this case because I can specify that the acceptable delta should scale with each value.
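For instance, a sketch with a made-up array spanning 1e-10 to 1e10: no single atol serves both ends, but rtol scales the allowed delta with each value.
import numpy as np

b = np.array([1e-10, 1.0, 1e10])
a = b * (1 + 1e-7)                # every element is off by a relative error of 1e-7

# A fixed absolute tolerance is far too loose for the small value
# and too tight for the large one:
print(np.allclose(a, b, rtol=0, atol=1e-6))   # False: the 1e10 entry is off by ~1e3
# A relative tolerance accepts the same relative error everywhere:
print(np.allclose(a, b, rtol=1e-6, atol=0))   # True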
I have the following method in my python code that compares values between two objects to determine if they are equal:
def equals(self, vec, tol):
    # True only if every element-wise absolute difference is below tol.
    return all(abs(a - b) < tol for a, b in zip(self, vec))
I want to give a default value to my tolerance variable, tol, such that it is the smallest possible value that is always greater than error that could occur from floating-point inaccuracies. What is this value?
The largest possible error is infinity, and NaN (Not a Number) is also possible. There is no general formula that is correct for tol. Determining what error could occur always requires knowledge of the values used and the operations performed.
Additionally, there are limited situations where “comparing for equality using a tolerance” is a proper technique. (Testing software is one of them.) Comparing for equality using a tolerance reduces the risk of deciding two numbers are unequal even though they would be equal if computed with exact mathematics, but it does so at the expense of falsely accepting as equal two numbers that would be unequal if computed with exact mathematics. Even deciding whether such a trade-off is acceptable is application-dependent, let alone deciding what the tolerance should be.
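One way to see why there is no general answer: the gap between adjacent representable doubles, and hence the size of a single rounding error, grows with the magnitude of the values involved. A small illustration with numpy (printed values are approximate):
import numpy as np

# np.spacing(x) is the gap to the next representable double above x,
# i.e. roughly the smallest error a single rounding step can introduce.
for x in [1.0, 1e8, 1e16]:
    print(x, np.spacing(x))
# 1.0    ~2.2e-16
# 1e8    ~1.5e-8
# 1e16   ~2.0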
I usually use something like this with numpy:
tol = max(np.finfo(float).eps, np.finfo(float).eps * abs(a - b))
I want to find numerical solutions to the following exponential equation where a,b,c,d are constants and I want to solve for r, which is not equal to 1.
a^r + b^r = c^r + d^r (Equation 1)
I define a function in order to use scipy.optimize.fsolve:
from scipy.optimize import fsolve

def func(r, a, b, c, d):
    if r == 1:
        return 10**5
    else:
        return (a**(1-r) + b**(1-r)) - (c**(1-r) + d**(1-r))

fsolve(func, 0.1, args=(5, 5, 4, 7))
However, fsolve always returns 1 as the solution, which is not what I want. Can someone help me with this issue? Or, in general, tell me how to solve (Equation 1)? I used an online numerical solver a long time ago, but I cannot find it anymore. That's why I am trying to figure it out using Python.
You need to apply some mathematical reasoning when choosing the initial guess. Consider your problem f(r) = (5^(1-r) + 5^(1-r)) - (4^(1-r) + 7^(1-r)).
When r ≤ 1, f(r) is always negative and decreasing (since 7^(1-r) grows much faster than the other terms). Therefore, all root-finding algorithms will be pushed to the right, towards 1, until reaching this local solution.
You need to pick a point far away from 1 on the right to find the nontrivial solution:
>>> scipy.optimize.fsolve(lambda r: 5**(1-r)+5**(1-r)-4**(1-r)-7**(1-r), 2.0)
array([ 2.48866034])
Simply setting f(1) = 10^5 is not going to have any effect, as the root-finding algorithm won't check f(1) until the very last step (see note).
If you wish to apply a penalty, the penalty must be applied to a range of values around 1. One way to do so, without affecting the position of other roots, is to divide the whole function by (r − 1):
>>> scipy.optimize.fsolve(lambda r: (5**(1-r)+5**(1-r)-4**(1-r)-7**(1-r)) / (r-1), 0.1)
array([ 2.48866034])
(note): they may climb like f(0.1) → f(0.4) → f(0.7) → f(0.86) → f(0.96) → f(0.997) → … and stop as soon as |f(x)| < 10^-5, so your f(1) is never evaluated
First of all, your code seems to use a different equation than your question: 1-r in the exponent instead of just r.
Valid answers to the equation are 1 and approximately 2.4886, as can be seen here. With the second argument of fsolve you specify a starting estimate. I think you get the result 1 because 0.1 is close to 1. Using 2.1 as the starting estimate, I get the other answer, 2.4886.
from scipy.optimize import fsolve

def func(r, a, b, c, d):
    if r == 1:
        return 10**5
    else:
        return (a**(1-r) + b**(1-r)) - (c**(1-r) + d**(1-r))

print(fsolve(func, 2.1, args=(5, 5, 4, 7)))
Choosing a starting estimate is tricky, as many choices give the following error: ValueError: Integers to negative integer powers are not allowed.
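I believe (though you may want to verify against your scipy version) that the error shows up when the starting estimate is an integer: fsolve then passes an integer array into func, and 5**(1 - r) with a negative integer exponent is exactly what numpy refuses. Using a float starting estimate seems to sidestep it:
from scipy.optimize import fsolve

def func(r, a, b, c, d):
    return (a**(1-r) + b**(1-r)) - (c**(1-r) + d**(1-r))

# fsolve(func, 3, args=(5, 5, 4, 7))          # integer guess: can raise the ValueError above
print(fsolve(func, 3.0, args=(5, 5, 4, 7)))   # float guess runs fine and should land near 2.4886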
I have some problems due to really low numbers used with numpy. It took me several weeks to trace my constant problems with numerical integration back to the fact that, when I add up floats in a function, float64 precision gets lost. Performing the mathematically identical calculation with a product instead of a sum leads to values that are alright.
Here is a code sample and a plot of the results:
from matplotlib.pyplot import *
from numpy import vectorize, arange
import math
def func_product(x):
    return math.exp(-x)/(1+math.exp(x))

def func_sum(x):
    return math.exp(-x)-1/(1+math.exp(x))
#mathematically, both functions are the same
vecfunc_sum = vectorize(func_sum)
vecfunc_product = vectorize(func_product)
x = arange(0.,300.,1.)
y_sum = vecfunc_sum(x)
y_product = vecfunc_product(x)
plot(x,y_sum, 'k.-', label='sum')
plot(x,y_product,'r--',label='product')
yscale('symlog', linthreshy=1E-256)
legend(loc='lower right')
show()
As you can see, the summed values that are quite low are scattered around zero or are exactly zero, while the values from the product form are fine...
Please, could someone help/explain? Thanks a lot!
Floating point precision is pretty sensitive to addition/subtraction due to roundoff error. Eventually, exp(x) gets so big that adding 1 to it gives the same thing as exp(x). In double precision that happens somewhere around exp(x) == 1e16:
>>> (1e16 + 1) == (1e16)
True
>>> (1e15 + 1) == (1e15)
False
Note that math.log(1e16) is approximately 37 -- Which is roughly where things go crazy on your plot.
You can have the same problem, but on different scales:
>>> (1e-16 + 1.) == (1.)
True
>>> (1e-15 + 1.) == (1.)
False
For a vast majority of the points in your regime, your func_product is actually calculating:
exp(-x)/exp(x) == exp(-2*x)
Which is why your graph has a nice slope of -2.
Taking it to the other extreme, your other version is calculating (at least approximately):
exp(-x) - 1./exp(x)
which is approximately
exp(-x) - exp(-x)
This is an example of catastrophic cancellation.
Let's look at the first point where the calculation goes awry, when x = 36.0
In [42]: np.exp(-x)
Out[42]: 2.3195228302435691e-16
In [43]: - 1/(1+np.exp(x))
Out[43]: -2.3195228302435691e-16
In [44]: np.exp(-x) - 1/(1+np.exp(x))
Out[44]: 0.0
The calculation using func_product does not subtract nearly equal numbers, so it avoids the catastrophic cancellation.
By the way, if you change math.exp to np.exp, you can get rid of np.vectorize (which is slow):
def func_product(x):
    return np.exp(-x)/(1+np.exp(x))

def func_sum(x):
    return np.exp(-x)-1/(1+np.exp(x))

y_sum = func_sum(x)
y_product = func_product(x)
The problem is that your func_sum is numerically unstable because it involves a subtraction between two very close values.
In the calculation of func_sum(200), for example, math.exp(-200) and 1/(1+math.exp(200)) have the same value, because adding 1 to math.exp(200) has no effect, since it is outside the precision of 64-bit floating point:
math.exp(200).hex()
0x1.73f60ea79f5b9p+288
(math.exp(200) + 1).hex()
0x1.73f60ea79f5b9p+288
(1/(math.exp(200) + 1)).hex()
0x1.6061812054cfap-289
math.exp(-200).hex()
0x1.6061812054cfap-289
This explains why func_sum(200) gives zero, but what about the points that lie off the x axis? These are also caused by floating-point imprecision: it occasionally happens that math.exp(-x) is not equal to 1/math.exp(x). Ideally, math.exp(x) is the closest floating-point value to e^x, and 1/math.exp(x) is the closest floating-point value to the reciprocal of the floating-point number calculated by math.exp(x), not necessarily to e^-x. Indeed, math.exp(-100) and 1/(1+math.exp(100)) are very close and in fact differ only in the last unit:
math.exp(-100).hex()
0x1.a8c1f14e2af5dp-145
(1/math.exp(100)).hex()
0x1.a8c1f14e2af5cp-145
(1/(1+math.exp(100))).hex()
0x1.a8c1f14e2af5cp-145
func_sum(100).hex()
0x1.0000000000000p-197
So what you have actually calculated is the difference, if any, between math.exp(-x) and 1/math.exp(x). You can plot the function math.pow(2, -52) * math.exp(-x) to see that it passes through the positive values of func_sum (recall that 52 is the size of the significand in 64-bit floating point).
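To tie this back to the hex dumps: the nonzero residue func_sum(100) returns should be exactly one unit in the last place of math.exp(-100). A quick check (math.ulp requires Python 3.9+; func_sum is the function from the question):
import math

def func_sum(x):
    return math.exp(-x) - 1/(1 + math.exp(x))

# One ulp of exp(-100) is 2**-197, which matches the residue shown above.
print(math.ulp(math.exp(-100)).hex())   # 0x1.0000000000000p-197
print(func_sum(100).hex())              # 0x1.0000000000000p-197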
I have been asked to test a library provided by a 3rd party. The library is known to be accurate to n significant figures. Any less-significant errors can safely be ignored. I want to write a function to help me compare the results:
def nearlyequal( a, b, sigfig=5 ):
The purpose of this function is to determine if two floating-point numbers (a and b) are approximately equal. The function will return True if a==b (exact match) or if a and b have the same value when rounded to sigfig significant-figures when written in decimal.
Can anybody suggest a good implementation? I've written a mini unit-test. Unless you can see a bug in my tests then a good implementation should pass the following:
assert nearlyequal(1, 1, 5)
assert nearlyequal(1.0, 1.0, 5)
assert nearlyequal(1.0, 1.0, 5)
assert nearlyequal(-1e-9, 1e-9, 5)
assert nearlyequal(1e9, 1e9 + 1 , 5)
assert not nearlyequal( 1e4, 1e4 + 1, 5)
assert nearlyequal( 0.0, 1e-15, 5 )
assert not nearlyequal( 0.0, 1e-4, 6 )
Additional notes:
Values a and b might be of type int, float or numpy.float64. Values a and b will always be of the same type. It's vital that conversion does not introduce additional error into the function.
Let's keep this numerical, so functions that convert to strings or use non-mathematical tricks are not ideal. This program will be audited by a mathematician who will want to be able to prove that the function does what it is supposed to do.
Speed... I've got to compare a lot of numbers so the faster the better.
I've got numpy, scipy and the standard-library. Anything else will be hard for me to get, especially for such a small part of the project.
As of Python 3.5, the standard way to do this (using the standard library) is with the math.isclose function.
It has the following signature:
isclose(a, b, rel_tol=1e-9, abs_tol=0.0)
An example of usage with absolute error tolerance:
from math import isclose
a = 1.0
b = 1.00000001
assert isclose(a, b, abs_tol=1e-8)
If you want agreement to roughly n significant digits, use a relative tolerance instead and replace the last line with:
assert isclose(a, b, rel_tol=10**-n)
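Checked against a few of the test cases from the question (illustrative only), a relative tolerance behaves like a significant-figure comparison for nonzero values, while comparisons against zero still need abs_tol:
from math import isclose

print(isclose(1e9, 1e9 + 1, rel_tol=1e-5))   # True: the values agree to ~9 significant figures
print(isclose(1e4, 1e4 + 1, rel_tol=1e-5))   # False: only ~4 figures agree
print(isclose(0.0, 1e-15, rel_tol=1e-5))     # False: a relative tolerance is useless at zero
print(isclose(0.0, 1e-15, abs_tol=1e-9))     # True: an absolute tolerance handles zero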
There is a function assert_approx_equal in numpy.testing (source here) which may be a good starting point.
def assert_approx_equal(actual, desired, significant=7, err_msg='', verbose=True):
    """
    Raise an assertion if two items are not equal up to significant digits.

    .. note:: It is recommended to use one of `assert_allclose`,
              `assert_array_almost_equal_nulp` or `assert_array_max_ulp`
              instead of this function for more consistent floating point
              comparisons.

    Given two numbers, check that they are approximately equal.
    Approximately equal is defined as the number of significant digits
    that agree.
    """
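For illustration, a minimal usage sketch with values I picked myself; note that assert_approx_equal raises AssertionError rather than returning False:
import numpy as np

# Agreement in the first five significant digits: passes silently.
np.testing.assert_approx_equal(0.12345678, 0.12345670, significant=5)

# Disagreement in the third significant digit: raises.
try:
    np.testing.assert_approx_equal(0.123, 0.124, significant=5)
except AssertionError as e:
    print("not approximately equal:", e)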
Here's a take.
def nearly_equal(a, b, sig_fig=5):
    return a == b or int(a * 10**sig_fig) == int(b * 10**sig_fig)
I believe your question is not defined well enough, and the unit-tests you present prove it:
If by 'round to N sig-fig decimal places' you mean 'N decimal places to the right of the decimal point', then the test assert nearlyequal(1e9, 1e9 + 1 , 5) should fail, because even when you round 1000000000 and 1000000001 to 0.00001 accuracy, they are still different.
And if by 'round to N sig-fig decimal places' you mean 'The N most significant digits, regardless of the decimal point', then the test assert nearlyequal(-1e-9, 1e-9, 5) should fail, because 0.000000001 and -0.000000001 are totally different when viewed this way.
If you meant the first definition, then the first answer on this page (by Triptych) is good.
If you meant the second definition, please say it, I promise to think about it :-)
There are already plenty of great answers, but here's a thought:
import math

def closeness(a, b):
    """Returns a measure of equality (for two floats), in units
    of decimal significant figures."""
    if a == b:
        return float("infinity")
    difference = abs(a - b)
    avg = (a + b) / 2
    return math.log10(avg / difference)

if closeness(1000, 1000.1) > 3:
    print("Joy!")
This is a fairly common issue with floating point numbers. I solve it based on the discussion in Section 1.5 of Demmel[1]. (1) Calculate the roundoff error. (2) Check that the roundoff error is less than some epsilon. I haven't used python in some time and only have version 2.4.3, but I'll try to get this correct.
Step 1. Roundoff error
def roundoff_error(exact, approximate):
    return abs(approximate/exact - 1.0)
Step 2. Floating point equality
def float_equal(float1, float2, epsilon=2.0e-9):
    return roundoff_error(float1, float2) < epsilon
There are a couple of obvious deficiencies with this code.
Division by zero error if the exact value is zero.
Does not verify that the arguments are floating-point values.
Revision 1.
def roundoff_error(exact, approximate):
    if exact == 0.0 or approximate == 0.0:
        return abs(exact + approximate)
    else:
        return abs(approximate/exact - 1.0)

def float_equal(float1, float2, epsilon=2.0e-9):
    if not isinstance(float1, float):
        raise TypeError("First argument is not a float.")
    elif not isinstance(float2, float):
        raise TypeError("Second argument is not a float.")
    else:
        return roundoff_error(float1, float2) < epsilon
That's a little better. If either the exact or the approximate value is zero, then the error is equal to the value of the other. If something besides a floating-point value is provided, a TypeError is raised.
At this point, the only difficult thing is setting the correct value for epsilon. I noticed in the documentation for version 2.6.1 that there is an epsilon attribute in sys.float_info, so I would use twice that value as the default epsilon. But the correct value depends on both your application and your algorithm.
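For instance, a small sketch of that suggested default, reusing the revised functions above (0.1 + 0.2 differs from 0.3 only by rounding, which twice machine epsilon comfortably covers):
import sys

EPSILON = 2 * sys.float_info.epsilon   # ~4.44e-16, the suggested default

# 0.1 + 0.2 differs from 0.3 only by floating-point rounding...
print(float_equal(0.1 + 0.2, 0.3, epsilon=EPSILON))    # True
# ...while a genuinely different value is rejected.
print(float_equal(0.1 + 0.2, 0.301, epsilon=EPSILON))  # False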
[1] James W. Demmel, Applied Numerical Linear Algebra, SIAM, 1997.
"Significant figures" in decimal is a matter of adjusting the decimal point and truncating to an integer.
>>> int(3.1415926 * 10**3)
3141
>>> int(1234567 * 10**-3)
1234
Oren Shemesh got part of the problem with the problem as stated, but there's more:
assert nearlyequal( 0.0, 1e-15, 5 )
also fails the second definition (and that's the definition I learned in school).
No matter how many digits you are looking at, 0 will not equal a not-zero. This could prove to be a headache for such tests if you have a case whose correct answer is zero.
There is an interesting solution to this by B. Dawson (with C++ code)
at "Comparing Floating Point Numbers". His approach relies on the strict IEEE representation of two numbers and the enforced lexicographical ordering when said numbers are represented as unsigned integers.
I have been asked to test a library provided by a 3rd party
If you are using the default Python unittest framework, you can use assertAlmostEqual
self.assertAlmostEqual(a, b, places=5)
There are lots of ways of comparing two numbers to see if they agree to N significant digits. Roughly speaking you just want to make sure that their difference is less than 10^-N times the largest of the two numbers being compared. That's easy enough.
But, what if one of the numbers is zero? The whole concept of relative-differences or significant-digits falls down when comparing against zero. To handle that case you need to have an absolute-difference as well, which should be specified differently from the relative-difference.
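As a minimal sketch of that combined check (the name, defaults, and thresholds here are my own, not from the blog post): take the larger of a relative threshold and an absolute threshold.
def roughly_equal(a, b, sig_digits=5, abs_tol=1e-12):
    """Compare to roughly `sig_digits` significant digits, with an
    absolute fallback so comparisons against zero don't always fail."""
    rel_tol = 10.0 ** (-sig_digits)
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

print(roughly_equal(1e9, 1e9 + 1))   # True: relative difference ~1e-9
print(roughly_equal(1e4, 1e4 + 1))   # False: relative difference ~1e-4
print(roughly_equal(0.0, 1e-15))     # True: caught by the absolute fallback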
I discuss the problems of comparing floating-point numbers -- including a specific case of handling zero -- in this blog post:
http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/