Why does numpy.std() use abs()? - python

I checked the numpy library and found the following definition for the standard deviation in numpy:
std = sqrt(mean(abs(x - x.mean())**2))
Why is the abs() function used? Mathematically, the square of a number is positive by definition. So I thought:
abs(x - x.mean())**2 == (x - x.mean())**2

The square of a real number is always positive, but this is not true for complex numbers.
A very simple example: 1j**2 = -1
A more complex (pun intended) example: (3-2j)**2 = 5-12j
From documentation:
Note that, for complex numbers, std takes the absolute value before squaring, so that the result is always real and nonnegative.
Note:
Python uses j for the imaginary unit, while mathematicians use i.
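To see the difference on complex input, here is a small sketch (the array values are arbitrary examples):

import numpy as np

x = np.array([1 + 2j, 3 - 1j, -2 + 0.5j])

# Without abs(), squaring complex deviations gives a complex "variance":
print(np.mean((x - x.mean())**2))

# With abs(), |z|**2 is real and nonnegative, so the result is a real std:
print(np.sqrt(np.mean(np.abs(x - x.mean())**2)))

# This matches what numpy computes:
print(np.std(x))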


What is rtol for in numpy's allclose function?

In numpy.allclose() there are two tolerance factors used to determine if two arrays are close enough to count as the same. There is the relative tolerance rtol and absolute tolerance atol. From the docs:
numpy.allclose(a, b, rtol=1e-05, atol=1e-08)
Also from the docs:
If the following equation is element-wise True, then allclose returns True.
absolute(a - b) <= (atol + rtol * absolute(b))
Mathematically I understand this, but I am confused about the point of rtol. Why not just use a single tolerance value tol and return True if |a - b| < tol? Obviously, following the above equation, I could do this manually by setting rtol to zero, thereby making everything symmetric. What is the point of the symmetry-breaking rtol factor?
Related question: How allclose() works?
The confusing part is that the equation shows both parameters being used at the same time. Look at it like this instead:
Use case 1, absolute tolerance (atol): absolute(a - b) <= atol
Use case 2, relative tolerance (rtol): absolute(a - b) <= rtol * absolute(b)
An alternative way to implement both with a single tolerance parameter would be to add a flag that determines whether the tolerance is relative or absolute. But separating the use cases like that breaks down when array values can be both large and zero. If only one array can have zeros, make that one a and use the asymmetrical equation to your benefit without atol. If either one can have zeros, simply set rtol to some acceptable value for large elements, and set atol to the value you want to kick in for zeros.
You generally want to use rtol: since the precision of numbers and calculations is very much finite, larger numbers will almost always be less precise than smaller ones, and the difference scales linearly (again, in general). The only time you use atol is for numbers that are so close to zero that rounding errors are liable to be larger than the number itself.
Another way to look at it is atol compares fixed decimal places, while rtol compares significant figures.
Which tolerance(s) to use depends on your problem statement. For example, what if my array has a wide domain of values, ranging from 1e-10 to 1e10? A small atol would work well for small values, but poorly for large values, and vice-versa for a large atol. But rtol is perfect in this case because I can specify that the acceptable delta should scale with each value.
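A small demonstration of the two use cases (the values are chosen only for illustration):

import numpy as np

# Relative tolerance handles large values: 1e4 apart, but only 1 part in 1e6.
print(np.allclose(1e10, 1e10 + 1e4, rtol=1e-5, atol=0))   # True
# Against an exact zero, rtol alone is useless (rtol * 0 == 0)...
print(np.allclose(0.0, 1e-12, rtol=1e-5, atol=0))         # False
# ...so atol provides the floor that kicks in near zero.
print(np.allclose(0.0, 1e-12, rtol=1e-5, atol=1e-8))      # True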

Max Expected Floating Point Error in Python

I have the following method in my python code that compares values between two objects to determine if they are equal:
def equals(self, vec, tol):
    return all(i < tol for i in [abs(a - b) for a, b in zip(self, vec)])
I want to give a default value to my tolerance variable, tol, such that it is the smallest possible value that is always greater than error that could occur from floating-point inaccuracies. What is this value?
The largest possible error is infinity, and NaN (Not a Number) is also possible. There is no general formula that is correct for tol. Determining what error could occur always requires knowledge of the values used and the operations performed.
Additionally, there are limited situations where “comparing for equality using a tolerance” is a proper technique. (Testing software is one of them.) Comparing for equality using a tolerance reduces the risk of deciding two numbers are unequal even though they would be equal if computed with exact mathematics, but it does so at the expense of falsely accepting as equal two numbers that would be unequal if computed with exact mathematics. Even deciding whether such a trade-off is acceptable is application-dependent, let alone deciding what the tolerance should be.
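Two toy cases make the point that the error depends on the operations performed, not on a universal constant:

print(0.1 + 0.2 == 0.3)    # False: the rounding error here is about 5.6e-17
print(1e16 + 1.0 - 1e16)   # 0.0: an entire unit was lost to rounding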
I usually use something like this with numpy, which scales the tolerance with the magnitude of the values being compared and uses machine epsilon as a floor:
tol = max(np.finfo(float).eps, np.finfo(float).eps * max(abs(a), abs(b)))

python: why does float's zero detection fail after calculating i as an exponent?

0 == ((-1)**.5).real
... is False in python 3.5.1, whereas:
0 == complex(0,1).real
... is True. How are these two cases handled differently? When do the zero-detecting features of the float class work, and when do they not?
>>> (-1)**0.5
(6.123233995736766e-17+1j)
That's all there is to it - due to floating-point vagaries, the real part of the computed result isn't exactly zero. But in your other case it is:
>>> complex(0,1).real
0.0
By the way, ** invokes a general-purpose exponentiation routine, which adds several layers of floating-point roundoff errors under the covers. If you know you want a square root, it's better to use a square root function:
>>> import cmath
>>> cmath.sqrt(-1)
1j
The fractional power is computed, simplifying somewhat, in polar form as r cis theta, i.e. r(cos theta + i sin theta). Since theta (here derived from pi) cannot be exactly represented as a binary fraction, the result is not exactly what you'd expect from hand calculation. There are various "equal within a tolerance" functions you can apply to work around this.
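For example, cmath.isclose (available since Python 3.5, alongside math.isclose) absorbs the roundoff:

import cmath

z = (-1) ** 0.5
print(z)                      # (6.123233995736766e-17+1j)
print(z.real == 0.0)          # False: exact comparison fails
print(cmath.isclose(z, 1j))   # True: within the default relative tolerance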

python strange result using math.fmod()

I am playing around with the math module in Python 3.4 and I got some curious results when using the fmod function, for which I am having a hard time finding detailed info on the Python website.
One simple example is the following:
from math import *
x = 99809175801648148531
y = 6.5169020832937505
sqrt(x)-cos(x)**fmod(x, y)*log10(x)
it returns:
(9990454237.014296+8.722374238018135j)
How to interpret this result? What is j?
Is it an imaginary number like i?
If so, why j and not i?
Any info, as well as links to some resources about fmod are very welcome.
The result you got was a complex number because you exponentiated a negative number. i and j are just notational choices to represent the imaginary number unit, i being used in mathematics more and j being used in engineering more. You can see in the docs that Python has chosen to use j:
https://docs.python.org/2/library/cmath.html#conversions-to-and-from-polar-coordinates
Here, j is the same as i, the square root of -1. It is a convention commonly used in engineering, where i is used to denote electrical current.
The reason complex numbers arise in your case is that you're raising a negative number to a fractional power. See How do you compute negative numbers to fractional powers? for further discussion.
cos(x) is a negative number. When you raise a negative number to a non-integral power, it is not surprising to get a complex result. Most roots of negative numbers are complex.
>>> x = 99809175801648148531
>>> y = 6.5169020832937505
>>> cos(x)
-0.7962325418899466
>>> fmod(x,y)
3.3940870272073056
>>> cos(x)**fmod(x,y)
(-0.1507219382442201-0.436136801343955j)
Imaginary numbers can be represented with either an 'i' or a 'j'. I believe the reasons are historical. Mathematicians preferred 'i' for imaginary. Electrical engineers didn't want to confuse an imaginary 'i' with an 'i' for current, so they used 'j'. Now both are used.
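As for fmod itself: per the Python docs, math.fmod follows the C library and gives the result the sign of the first argument, unlike Python's % operator:

from math import fmod

print(fmod(-7.5, 2.0))   # -1.5: sign follows the first argument (C convention)
print(-7.5 % 2.0)        #  0.5: % gives the result the sign of the divisor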

Function to determine if two numbers are nearly equal when rounded to n significant decimal digits

I have been asked to test a library provided by a 3rd party. The library is known to be accurate to n significant figures. Any less-significant errors can safely be ignored. I want to write a function to help me compare the results:
def nearlyequal(a, b, sigfig=5):
The purpose of this function is to determine if two floating-point numbers (a and b) are approximately equal. The function will return True if a==b (exact match) or if a and b have the same value when rounded to sigfig significant-figures when written in decimal.
Can anybody suggest a good implementation? I've written a mini unit test. Unless you can see a bug in my tests, a good implementation should pass the following:
assert nearlyequal(1, 1, 5)
assert nearlyequal(1.0, 1.0, 5)
assert nearlyequal(-1e-9, 1e-9, 5)
assert nearlyequal(1e9, 1e9 + 1, 5)
assert not nearlyequal(1e4, 1e4 + 1, 5)
assert nearlyequal(0.0, 1e-15, 5)
assert not nearlyequal(0.0, 1e-4, 6)
Additional notes:
Values a and b might be of type int, float or numpy.float64. Values a and b will always be of the same type. It's vital that conversion does not introduce additional error into the function.
Let's keep this numerical, so functions that convert to strings or use non-mathematical tricks are not ideal. This program will be audited by a mathematician who will want to be able to prove that the function does what it is supposed to do.
Speed... I've got to compare a lot of numbers so the faster the better.
I've got numpy, scipy and the standard-library. Anything else will be hard for me to get, especially for such a small part of the project.
As of Python 3.5, the standard way to do this (using the standard library) is with the math.isclose function.
It has the following signature:
isclose(a, b, rel_tol=1e-9, abs_tol=0.0)
An example of usage with absolute error tolerance:
from math import isclose
a = 1.0
b = 1.00000001
assert isclose(a, b, abs_tol=1e-8)
If you want a precision of n significant digits rather than a fixed absolute error, use the relative tolerance instead and replace the last line with:
assert isclose(a, b, rel_tol=10**-n)
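Checked against the unit tests in the question (a sketch, not part of the original answer):

from math import isclose

print(isclose(1e9, 1e9 + 1, rel_tol=1e-5))   # True: agree to ~5 significant digits
print(isclose(1e4, 1e4 + 1, rel_tol=1e-5))   # False: differ in the 5th digit
print(isclose(0.0, 1e-15, rel_tol=1e-5))     # False: rel_tol never matches zero
print(isclose(0.0, 1e-15, abs_tol=1e-9))     # True: abs_tol handles zero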
There is a function assert_approx_equal in numpy.testing (source here) which may be a good starting point.
def assert_approx_equal(actual, desired, significant=7, err_msg='', verbose=True):
    """
    Raise an assertion if two items are not equal up to significant digits.

    .. note:: It is recommended to use one of `assert_allclose`,
              `assert_array_almost_equal_nulp` or `assert_array_max_ulp`
              instead of this function for more consistent floating point
              comparisons.

    Given two numbers, check that they are approximately equal.
    Approximately equal is defined as the number of significant digits
    that agree.
    """
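For instance (the values here are illustrative):

from numpy.testing import assert_approx_equal

assert_approx_equal(1.234567e9, 1.234568e9, significant=6)    # passes
# assert_approx_equal(1.234567e9, 1.234568e9, significant=8)  # AssertionError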
Here's a take.
def nearly_equal(a, b, sig_fig=5):
    return (a == b or
            int(a * 10**sig_fig) == int(b * 10**sig_fig))
I believe your question is not defined well enough, and the unit-tests you present prove it:
If by 'round to N sig-fig decimal places' you mean 'N decimal places to the right of the decimal point', then the test assert nearlyequal(1e9, 1e9 + 1 , 5) should fail, because even when you round 1000000000 and 1000000001 to 0.00001 accuracy, they are still different.
And if by 'round to N sig-fig decimal places' you mean 'The N most significant digits, regardless of the decimal point', then the test assert nearlyequal(-1e-9, 1e-9, 5) should fail, because 0.000000001 and -0.000000001 are totally different when viewed this way.
If you meant the first definition, then the first answer on this page (by Triptych) is good.
If you meant the second definition, please say it, I promise to think about it :-)
There are already plenty of great answers, but here's a thought:

import math

def closeness(a, b):
    """Return a measure of equality for two floats, in units
    of decimal significant figures."""
    if a == b:
        return float("infinity")
    difference = abs(a - b)
    avg = abs(a + b) / 2  # use the magnitude so log10 stays defined
    return math.log10(avg / difference)

if closeness(1000, 1000.1) > 3:
    print("Joy!")
This is a fairly common issue with floating point numbers. I solve it based on the discussion in Section 1.5 of Demmel[1]. (1) Calculate the roundoff error. (2) Check that the roundoff error is less than some epsilon. I haven't used python in some time and only have version 2.4.3, but I'll try to get this correct.
Step 1. Roundoff error
def roundoff_error(exact, approximate):
    return abs(approximate / exact - 1.0)
Step 2. Floating point equality
def float_equal(float1, float2, epsilon=2.0e-9):
    return roundoff_error(float1, float2) < epsilon
There are a couple obvious deficiencies with this code.
Division by zero error if the exact value is Zero.
Does not verify that the arguments are floating point values.
Revision 1.
def roundoff_error(exact, approximate):
    if exact == 0.0 or approximate == 0.0:
        return abs(exact + approximate)
    else:
        return abs(approximate / exact - 1.0)

def float_equal(float1, float2, epsilon=2.0e-9):
    if not isinstance(float1, float):
        raise TypeError("First argument is not a float.")
    elif not isinstance(float2, float):
        raise TypeError("Second argument is not a float.")
    else:
        return roundoff_error(float1, float2) < epsilon
That's a little better. If either the exact or the approximate value is zero, then the error is equal to the value of the other. If something besides a floating point value is provided, a TypeError is raised.
At this point, the only difficult thing is setting the correct value for epsilon. I noticed in the documentation for version 2.6.1 that there is an epsilon attribute in sys.float_info, so I would use twice that value as the default epsilon. But the correct value depends on both your application and your algorithm.
[1] James W. Demmel, Applied Numerical Linear Algebra, SIAM, 1997.
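A quick look at that attribute (the value shown is for IEEE 754 doubles):

import sys

print(sys.float_info.epsilon)                 # 2.220446049250313e-16
default_epsilon = 2 * sys.float_info.epsilon  # twice eps, as suggested above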
"Significant figures" in decimal is a matter of adjusting the decimal point and truncating to an integer.
>>> int(3.1415926 * 10**3)
3141
>>> int(1234567 * 10**-3)
1234
Oren Shemesh got part of the problem with the problem as stated but there's more:
assert nearlyequal(0.0, 1e-15, 5)
also fails the second definition (and that's the definition I learned in school.)
No matter how many digits you are looking at, zero will not equal a nonzero number. This could prove to be a headache for such tests if you have a case whose correct answer is zero.
There is an interesting solution to this by B. Dawson (with C++ code) at "Comparing Floating Point Numbers". His approach relies on the strict IEEE representation of two numbers and the lexicographical ordering enforced when said numbers are represented as unsigned integers.
I have been asked to test a library provided by a 3rd party
If you are using the default Python unittest framework, you can use assertAlmostEqual
self.assertAlmostEqual(a, b, places=5)
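Note that places counts decimal places rather than significant figures: the check is round(a - b, places) == 0. A minimal sketch:

import unittest

class TestNearlyEqual(unittest.TestCase):
    def test_close(self):
        # Passes: round(1.000001 - 1.000002, 5) == 0
        self.assertAlmostEqual(1.000001, 1.000002, places=5)
        # The same relative agreement fails at a larger magnitude,
        # since round(10000.1 - 10000.2, 5) != 0.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(10000.1, 10000.2, places=5)

if __name__ == "__main__":
    unittest.main()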
There are lots of ways of comparing two numbers to see if they agree to N significant digits. Roughly speaking you just want to make sure that their difference is less than 10^-N times the largest of the two numbers being compared. That's easy enough.
But, what if one of the numbers is zero? The whole concept of relative-differences or significant-digits falls down when comparing against zero. To handle that case you need to have an absolute-difference as well, which should be specified differently from the relative-difference.
I discuss the problems of comparing floating-point numbers -- including a specific case of handling zero -- in this blog post:
http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
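A minimal sketch of the combined test described above (the function name and default tolerances are mine, not from the post):

def almost_equal(a, b, rel=1e-5, abs_floor=1e-8):
    # Relative check scaled by the larger magnitude, with an absolute
    # floor so comparisons against zero can still succeed.
    return abs(a - b) <= max(rel * max(abs(a), abs(b)), abs_floor)

assert almost_equal(1e9, 1e9 + 1)       # relative test passes
assert almost_equal(0.0, 1e-15)         # absolute floor handles zero
assert not almost_equal(1e4, 1e4 + 1)   # off by more than 1 part in 1e5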
