I found myself needing to compute the "integer cube root", meaning the cube root of an integer, rounded down to the nearest integer. In Python, we could use the NumPy floating-point cbrt() function:
import numpy as np
def icbrt(x):
    return int(np.cbrt(x))
Though this works most of the time, it fails for certain inputs x, with the result being one less than expected. For example, icbrt(15**3) == 14, which comes about because np.cbrt(15**3) == 14.999999999999998. The following lists every such failure for x below 100,000:
print([x for x in range(100_000) if (icbrt(x) + 1)**3 == x])
# [3375, 19683, 27000, 50653] == [15**3, 27**3, 30**3, 37**3]
Question: What is special about 15, 27, 30, 37, ..., making cbrt() return ever so slightly below the exact result? I can find no obvious underlying pattern for these numbers.
A few observations:
The story is the same if we switch from NumPy's cbrt() to that of Python's math module, or if we switch from Python to C (not surprising, as I believe that both numpy.cbrt() and math.cbrt() delegate to cbrt() from the C math library in the end).
Replacing cbrt(x) with x**(1/3) (pow(x, 1./3.) in C) leads to many more cases of failure. Let us stick to cbrt().
For the square root, a similar problem does not arise, meaning that
import numpy as np
def isqrt(x):
    return int(np.sqrt(x))
returns the correct result for all x (tested up to 100,000,000). Test code:
print([x for x in range(100_000) if (y := isqrt(x))**2 != x and (y + 1)**2 <= x])
Extra
As the above icbrt() only seems to fail on cubic input, we can correct for the occasional mistakes by adding a fixup, like so:
import numpy as np
def icbrt(x):
    y = int(np.cbrt(x))
    if (y + 1)**3 == x:
        y += 1
    return y
A different solution is to stick to exact integer computation, implementing icbrt() without the use of floating-point numbers. This is discussed e.g. in this SO question. An extra benefit of such approaches is that they are (or can be) faster than using the floating-point cbrt().
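As a minimal sketch of the exact-integer approach (not necessarily the method from the linked question), a binary search over Python's arbitrary-precision integers avoids floating point entirely. The name icbrt_exact is made up for this illustration, and no claim is made here about its speed.

```python
def icbrt_exact(x: int) -> int:
    """Floor of the cube root of a non-negative integer, using integers only."""
    if x < 0:
        raise ValueError("x must be non-negative")
    # Grow an upper bound until hi**3 > x.
    lo, hi = 0, 1
    while hi**3 <= x:
        hi *= 2
    # Invariant: lo**3 <= x < hi**3. Narrow the bracket to width 1.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid**3 <= x:
            lo = mid
        else:
            hi = mid
    return lo

print(icbrt_exact(15**3))      # 15
print(icbrt_exact(15**3 - 1))  # 14
```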
To be clear, my question is not about how to write a better icbrt(), but about why cbrt() fails at some specific inputs.
This problem is caused by a bad implementation of cbrt. It is not caused by floating-point arithmetic because floating-point arithmetic is not a barrier to computing the cube root well enough to return an exactly correct result when the exactly correct result is representable in the floating-point format.
For example, if one were to use integer arithmetic to compute nine-fifths of 80, we would expect a correct result of 144. If a routine to compute nine-fifths of a number were implemented as int NineFifths(int x) { return 9/5*x; }, we would blame that routine for being implemented incorrectly, not blame integer arithmetic for not handling fractions. Similarly, if a routine uses floating-point arithmetic to calculate an incorrect result when a correct result is representable, we blame the routine, not floating-point arithmetic.
Some mathematical functions are difficult to calculate, and we accept some amount of error in them. In fact, for some of the routines in the math library, humans have not yet figured out how to calculate them with correct rounding in a known-bounded execution time. So we accept that not every math routine is correctly rounded.
However, when the mathematical value of a function is exactly representable in a floating-point format, faithful rounding is enough to obtain the exact result; correct rounding is not required. So faithful rounding is a desirable goal for math library functions.
Correctly rounded means the computed result equals the number you would obtain by rounding the exact mathematical result to the nearest representable value.¹ Faithfully rounded means the computed result is less than one ULP from the exact mathematical result. An ULP is the unit of least precision, the distance between two adjacent representable numbers.
Correctly rounding a function can be difficult because, in general, a function can be arbitrarily close to a rounding decision point. For round-to-nearest, this is midway between two adjacent representable numbers. Consider two adjacent representable numbers a and b. Their midpoint is m = (a+b)/2. If the mathematical value of some function f(x) is just below m, it should be rounded to a. If it is just above, it should be rounded to b. As we implement f in software, we might compute it with some very small error e. When we compute f(x), if our computed result lies in [m-e, m+e], and we only know the error bound is e, then we cannot tell whether f(x) is below m or above m. And because, in general, a function f(x) can be arbitrarily close to m, this is always a problem: No matter how accurately we compute f, no matter how small we make the error bound e, there is a possibility that our computed value will lie very close to a midpoint m, closer than e, and therefore our computation will not tell us whether to round down or to round up.
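The rounding decision point can be made concrete with exact rational arithmetic. The sketch below takes a = 1.0 and its upper neighbor b, forms their exact midpoint m, and shows that values an arbitrarily small distance on either side of m round to different doubles. The offset 10^-40 is an arbitrary choice for illustration; it relies on Python converting a Fraction to a float with round-to-nearest.

```python
import math
from fractions import Fraction

a = 1.0
b = math.nextafter(a, 2.0)            # next representable double above a
m = (Fraction(a) + Fraction(b)) / 2   # exact midpoint, not itself a double

tiny = Fraction(1, 10**40)            # far smaller than any practical error bound e
below = float(m - tiny)               # just under the midpoint: rounds down to a
above = float(m + tiny)               # just over the midpoint: rounds up to b

print(below == a, above == b)         # True True
```

An implementation whose error bound e exceeds 10^-40 could not tell these two inputs apart, which is the difficulty described above.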
For some specific functions and floating-point formats, studies have been made and proofs have been written about how close the functions approach such rounding decision points, and so certain functions like sine and cosine can be implemented with correct rounding with known bounds on the compute time. Other functions have eluded proof so far.
In contrast, faithful rounding is easier to implement. If we compute a function with an error bound less than ½ ULP, then we can always return a faithfully rounded result, one that is within one ULP of the exact mathematical result. Once we have computed some result y, we round that to the nearest representable value² and return that. Starting with y having error less than ½ ULP, the rounding may add up to ½ ULP more error, so the total error is less than one ULP, which is faithfully rounded.
A benefit of faithful rounding is that a faithfully rounded implementation of a function always produces the exact result when the exact result is representable. This is because the next nearest result is one ULP away, but faithful rounding always has an error less than one ULP. Thus, a faithfully rounded cbrt function returns exact results when they are representable.
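To check this against the value from the question, the error can be measured in ULPs with exact arithmetic. This is a rough sketch: error_in_ulps and is_faithful are hypothetical helper names, and the ULP is taken at the exact result, which is assumed to be an integer representable as a double.

```python
import math
from fractions import Fraction

def error_in_ulps(computed: float, exact: int) -> Fraction:
    """Exact distance between computed and exact, in ULPs of the exact value."""
    one_ulp = Fraction(math.ulp(float(exact)))
    return abs(Fraction(computed) - exact) / one_ulp

def is_faithful(computed: float, exact: int) -> bool:
    """Faithful rounding requires an error strictly below one ULP."""
    return error_in_ulps(computed, exact) < 1

reported = 14.999999999999998          # np.cbrt(15**3) as quoted in the question
print(error_in_ulps(reported, 15))     # 1
print(is_faithful(reported, 15))       # False
```

The quoted result is off by exactly one ULP, so it is not faithfully rounded in the sense defined above.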
What is special about 15, 27, 30, 37, ..., making cbrt() return ever so slightly below the exact result? I can find no obvious underlying pattern for these numbers.
The bad cbrt implementation might compute the cube root by reducing the argument to a value in [1, 8) or similar interval and then applying a precomputed polynomial approximation. Each addition and multiplication in that polynomial may introduce a rounding error as the result of each operation is rounded to the nearest representable value in floating-point format. Additionally, the polynomial has inherent error. Rounding errors behave somewhat like a random process, sometimes rounding up, sometimes down. As they accumulate over several calculations, they may happen to round in different directions and cancel, or they may round in the same direction and reinforce. If the errors happen to cancel by the end of the calculations, you get an exact result from cbrt. Otherwise, you may get an incorrect result from cbrt.
Footnotes
¹ In general, there is a choice of rounding rules. The default and most common is round-to-nearest, ties-to-even. Others include round-upward, round-downward, and round-toward-zero. This answer focuses on round-to-nearest.
² Inside a mathematical function, numbers may be computed using extended precision, so we may have computed results that are not representable in the destination floating-point format; they will have more precision.
Related
I'm attempting a bit of code that would help me approximate the derivatives of an arbitrary function. I saw there were four options on another post:
Finite Differences
Automatic Derivatives
Symbolic Differentiation
Compute derivatives by hand
I saw that my approach falls best in the first option, which had the note, "prone to numerical error". So I'm aware that this method isn't expected to be exact, which is fine.
That being said, I did some research into what size numbers can be stored by different data types, and found in this post that it can be quite small (on the order of 10^-308) and that "In the normal range, results of elementary operations will be accurate within the normal precision of the format".
That being said, I seem to be getting extremely bad outcomes for this following code snippet where I explore different sized intervals; the smallest difference shouldn't be much smaller than 10^-27 (10^-9, cubed), which is much larger than the limiting value. I would appreciate maybe a more specific response?
epsilon = 0.01 # is "small" w.r.t. to 3
def approx_derivative(func): # rough derivative factory function
    return lambda x : (func(x + epsilon) - func(x)) / epsilon
while epsilon > 10**-9:
    nth_deriv = lambda x : x ** 3 # 0th derivative
    for i in range(5): # should read about 27, 27, 18, 6, 0
        print(nth_deriv(3), end=', ')
        nth_deriv = approx_derivative(nth_deriv) # take derivative
    print('\n')
    epsilon *= 0.1
The output is:
27, 27.090099999999495, 18.0599999999842, 6.000000002615025, -3.552713678800501e-07,
27, 27.009000999996147, 18.00600000123609, 6.000000496442226, -0.007105427357601002,
27, 27.00090001006572, 18.000599766310188, 6.004086117172847, -71.05427357601002,
27, 27.000090000228735, 18.000072543600254, 3.5527136788005005, 355271.36788005003,
27, 27.000009005462285, 17.998047496803334, 0.0, 3552713678.8005,
27, 27.000000848431675, 18.11883976188255, 0.0, -35527136788004.99,
27, 27.0000001023618, 0.0, 0.0, 3.552713678800497e+17,
27, 27.000002233990003, 0.0, 0.0, 0.0,
As we can see in the first couple of examples, the results aren't exact but are pretty good. For certain interval sizes, though, some values are blown up; others go to 0; and some are just plain wrong, like giving half the value, despite the intuition that they should become more accurate for smaller epsilons. What main things can I attribute to this error? What should I be looking out for/be cautious of? Are there errors I should be worried about catching with a block (like division by 0)?
Is there a value for epsilon that is generally considered "best" for doing computations with floats? Or is there a "rule-of-thumb" for choosing a good-sized epsilon based on your input? Is there a preferred definition of the derivative to use over the one I implemented?
First, the minimum value representable in the floating-point type is irrelevant here. The precision is a concern. The Python specification is not specific about what floating-point format a Python implementation uses, but many use the IEEE-754 binary64 format, which has 53 bits in its significand, meaning the smallest representable change in value is about 2^−52 times the magnitude of the number represented. We will look at how that affects your calculations.
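These properties can be inspected directly. The snippet below assumes a Python implementation whose float is IEEE-754 binary64, which is common but not guaranteed.

```python
import math
import sys

print(sys.float_info.mant_dig)           # 53 significand bits on binary64
print(sys.float_info.epsilon == 2**-52)  # spacing of doubles just above 1.0
print(math.ulp(27.0) == 2**-48)          # spacing of doubles near x**3 = 27
```

So near 27, the magnitude that matters for x = 3, adjacent doubles are about 3.6e-15 apart.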
Computing the approximation of the derivative, f', computes (x+e)^3 − x^3. The result is 3e·x^2 + 3e^2·x + e^3, approximately 3e·x^2, which is then divided by e to yield approximately 3x^2. Note that two terms around x^3 in magnitude were involved, and they differed by about 3e·x^2. Since you are taking the derivative at x = 3, 3e·x^2 happens to equal e·x^3, so this is lower than x^3 by a factor of e. As long as e is modest, two binary64 numbers near x^3 can readily differ by e·x^3 without losing accuracy.
Computing the approximation of the second derivative, f'', computes ((x+2e)^3 − (x+e)^3) − ((x+e)^3 − x^3) = (x+2e)^3 − 2(x+e)^3 + x^3 = 6e^2·x + 6e^3. Along the way, this gets divided by e^2, but we can ignore that for error analysis for now. The original terms had magnitude around x^3, but the result of our subtractions is around 6e^2·x. So this relies on the numbers being able to differ by about e^2. Since 2^−52 is about 2•10^−16, this fails considerably when e is around 10^−8, with lesser errors before that. When e is around this size, the binary64 format simply cannot record the differences between the numbers precisely enough to retain accuracy when they are later subtracted.
For the third derivative, you need to represent differences around e^3 relative to the primary magnitudes, so it fails considerably around e = 10^−5, since then e^3 is getting near the limit of 2•10^−16.
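The loss of accuracy in the second difference can be seen by running the same formula once in exact rational arithmetic and once in floating point. This is an illustrative sketch: second_difference and cube are names chosen here, and e = 10^-8 is picked to match the failure point estimated above.

```python
from fractions import Fraction

def second_difference(f, x, e):
    """((f(x+2e) - f(x+e)) - (f(x+e) - f(x))) / e^2, the second-derivative estimate."""
    return ((f(x + 2 * e) - f(x + e)) - (f(x + e) - f(x))) / (e * e)

def cube(x):
    return x ** 3

# Exact arithmetic reproduces 6x + 6e with no rounding at all.
exact = second_difference(cube, Fraction(3), Fraction(1, 10**8))
print(exact == 18 + Fraction(6, 10**8))   # True

# The same formula in binary64: the numerator is a small multiple of the
# spacing of doubles near 27, so dividing by e^2 cannot land near 18.
approx = second_difference(cube, 3.0, 1e-8)
print(approx)
```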
For the moment, put aside any issues relating to pseudorandom number generators and assume that numpy.random.rand perfectly samples from the discrete distribution of floating point numbers over [0, 1). What are the odds getting at least two exactly identical floating point numbers in the result of:
numpy.random.rand(n)
for any given value of n?
Mathematically, I think this is equivalent to first asking how many IEEE 754 singles or doubles there are in the interval [0, 1). Then I guess the next step would be to solve the equivalent birthday problem? I'm not really sure. Anyone have some insight?
The computation performed by numpy.random.rand for each element generates a number 0.<53 random bits>, for a total of 2^53 equally likely outputs. (Of course, the memory representation isn't a fixed-point 0.stuff; it's still floating point.) This computation is incapable of producing most binary64 floating-point numbers between 0 and 1; for example, it cannot produce 1/2^60. You can see the code in numpy/random/mtrand/randomkit.c:
double
rk_double(rk_state *state)
{
    /* shifts : 67108864 = 0x4000000, 9007199254740992 = 0x20000000000000 */
    long a = rk_random(state) >> 5, b = rk_random(state) >> 6;
    return (a * 67108864.0 + b) / 9007199254740992.0;
}
(Note that rk_random produces 32-bit outputs, regardless of the size of long.)
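For readers who prefer Python, here is a rough transliteration of the C function above. It is not NumPy code: rk_double_like is a hypothetical name, and the two 32-bit words that rk_random would supply are passed in as arguments so the arithmetic can be inspected.

```python
def rk_double_like(r1: int, r2: int) -> float:
    """Combine two 32-bit words into one of 2^53 equally spaced doubles in [0, 1)."""
    a = r1 >> 5   # keep the top 27 bits of the first word
    b = r2 >> 6   # keep the top 26 bits of the second word
    return (a * 67108864.0 + b) / 9007199254740992.0  # (a * 2^26 + b) / 2^53

# Largest possible output: every random bit set.
print(rk_double_like(0xFFFFFFFF, 0xFFFFFFFF) == 1 - 2**-53)  # True
# Smallest nonzero output: only the lowest kept bit set.
print(rk_double_like(0, 1 << 6) == 2**-53)                   # True
```

Every output is a multiple of 2^-53, which is why a value such as 1/2^60 can never appear.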
Assuming a perfect source of randomness, the probability of repeats in numpy.random.rand(n) is 1-(1-0/k)(1-1/k)(1-2/k)...(1-(n-1)/k), where k=2^53. It's probably best to use an approximation instead of calculating this directly for large values of n. (The approximation may even be more accurate, depending on how the approximation error compares to the rounding error accumulated in a direct computation.)
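As a sketch of both routes, the function below evaluates the product exactly with rational arithmetic (practical only for small n), and a second function uses the standard birthday-problem approximation 1 − exp(−n(n−1)/(2k)). The approximation is one common choice, not something taken from NumPy, and both function names are made up for this example.

```python
import math
from fractions import Fraction

def collision_probability_exact(n: int, k: int) -> Fraction:
    """1 - (1 - 0/k)(1 - 1/k)...(1 - (n-1)/k), computed without rounding."""
    p_distinct = Fraction(1)
    for i in range(n):
        p_distinct *= Fraction(k - i, k)
    return 1 - p_distinct

def collision_probability_approx(n: int, k: int = 2**53) -> float:
    """Birthday approximation; expm1 keeps precision when the result is tiny."""
    return -math.expm1(-n * (n - 1) / (2 * k))

# Sanity check on the classic case: 23 people, 365 days.
print(float(collision_probability_exact(23, 365)))
print(collision_probability_approx(23, 365))

# One million draws from 2^53 equally likely values.
print(collision_probability_approx(10**6))
```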
I think you are correct, this is like the birthday problem.
But you need to decide on the number of possible options. You do this by deciding the precision of your floating point numbers.
For example, if you decide to have a precision of 2 numbers after the dot, then there are 100 options (including zero and excluding 1).
And if you have n numbers then the probability of not having a collision is:
P = (100/100) * (99/100) * ... * ((100 - n + 1)/100)
or when given R possible numbers and N data points, the probability of no collision is:
P = R! / ((R - N)! * R^N)
And of collision is 1 - P.
This is because the probability of getting any given number is 1/R. And at any point, the probability of a data point not colliding with prior data points is (R-i)/R for i being the index of the data point. But to get the probability of no data points colliding with each other, we need to multiply all the probabilities of data points not colliding with those prior to them. Applying some algebraic operations, we get the equation above.
Inspired by this answer, I wonder why numpy.nextafter gives different results for the smallest positive float number from numpy.finfo(float).tiny and sys.float_info.min:
import numpy, sys
nextafter = numpy.nextafter(0., 1.) # 5e-324
tiny = numpy.finfo(float).tiny # 2.2250738585072014e-308
info = sys.float_info.min # 2.2250738585072014e-308
According to the documentation:
numpy.nextafter
Return the next floating-point value after x1 towards x2, element-wise.
finfo(float).tiny
The smallest positive usable number. Type of tiny is an appropriate floating point type.
sys.float_info
A structseq holding information about the float type. It contains low level information about the precision and internal representation. Please study your system's :file:float.h for more information.
Does someone have an explanation for this?
The documentation’s wording on this is bad; “usable” is colloquial and not defined. Apparently tiny is meant to be the smallest positive normal number.
nextafter is returning the actual next representable value after zero, which is subnormal.
Python does not rigidly specify its floating-point properties. Python implementations commonly inherit them from underlying hardware or software, and use of IEEE-754 formats (but not full conformance to IEEE-754 semantics) is common. In IEEE-754, numbers are represented with an implicit leading one bit in the significand¹ until the exponent reaches its minimum value for the format, after which the implicit bit is zero instead of one and smaller values are representable only by reducing the significand instead of reducing the exponent. These numbers with the implicit leading zero are the subnormal numbers. They serve to preserve some useful arithmetic properties, such as x-y == 0 if and only if x == y. (Without subnormal numbers, two very small numbers might be different, but their even smaller difference might not be representable because it was below the exponent limit, so computing x-y would round to zero, resulting in code like if (x != y) quotient = t / (x-y) getting a divide-by-zero error.)
Note
¹ “Significand” is the term preferred by experts for the fraction portion of a floating-point representation. “Mantissa” is an old term for the fraction portion of a logarithm. Mantissas are logarithmic, while significands are linear.
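The relationship between the smallest normal number, the smallest subnormal number, and the x - y property described above can be checked in a few lines. This sketch assumes IEEE-754 binary64 floats with subnormals enabled (no flush-to-zero mode) and uses math.nextafter from the standard library rather than the NumPy function.

```python
import math
import sys

smallest_normal = sys.float_info.min             # 2.2250738585072014e-308
smallest_subnormal = math.nextafter(0.0, 1.0)    # 5e-324

# Two distinct values right at the bottom of the normal range.
x = smallest_normal
y = math.nextafter(x, 1.0)

# Their difference lies below the normal range, yet it is still
# representable, as a subnormal, so x != y implies y - x != 0.
diff = y - x
print(diff)                        # 5e-324
print(diff == smallest_subnormal)  # True
```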
I'm doing calculations with 3D vectors with floating point coordinates. Occasionally, I want to check if a vector is nonzero. However, with floating point numbers, there's always a chance of a rounding error.
Is there a standard way in Python to check if a floating point number is sufficiently close to zero? I could write abs(x) < 0.00001, but it's the hard-coded cutoff that bugs me on general grounds ...
Like Ami wrote in the comments, it depends on what you're doing. The system epsilon is good for single operation errors, but when you use already rounded values in further calculations, the errors can get much larger than the system epsilon. Take this extreme example:
import sys
print('%.20f\n' % sys.float_info.epsilon)
x = 0.1
for _ in range(25):
    print('%.20f' % x)
    x = 11*x - 1
With exact values, x would always be 0.1, since 11*0.1-1 is 0.1 again. But what really happens is this:
0.00000000000000022204
0.10000000000000000555
0.10000000000000008882
0.10000000000000097700
0.10000000000001074696
0.10000000000011821655
0.10000000000130038202
0.10000000001430420227
0.10000000015734622494
0.10000000173080847432
0.10000001903889321753
0.10000020942782539279
0.10000230370607932073
0.10002534076687252806
0.10027874843559780871
0.10306623279157589579
0.13372856070733485367
0.47101416778068339042
4.18115584558751685051
44.99271430146268357930
493.91985731608951937233
5432.11843047698494046926
59752.30273524683434516191
657274.33008771517779678106
7230016.63096486683934926987
79530181.94061353802680969238
Note that the original x differed from 0.1 by far less than my system epsilon, but the error quickly grew larger than that epsilon and even your 0.00001 and now it's in the millions.
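To separate the effect of rounding from the recurrence itself, the same iteration can be run with exact rational numbers next to the float version. A small sketch, with iterate as a made-up helper name:

```python
from fractions import Fraction

def iterate(x, steps=25):
    """Apply x -> 11*x - 1 repeatedly; 1/10 is a fixed point in exact arithmetic."""
    for _ in range(steps):
        x = 11 * x - 1
    return x

print(iterate(Fraction(1, 10)))   # 1/10: exact arithmetic never drifts
print(iterate(0.1))               # float: any initial error grows about 11x per step
```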
This is an extreme example, though, and it's highly unlikely you'll encounter something this bad. But it shows that the precision really depends on what you're doing, so you'll have to find a good way for your particular calculation.
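On the "standard way" part of the question: the standard library offers math.isclose, but when comparing against zero its relative tolerance has no effect, so an absolute tolerance has to be supplied. The sketch below ties that tolerance to a typical magnitude of the data instead of hard-coding it. The helper names and the parameters scale and rel are hypothetical, and a suitable value still depends on the calculation, as explained above.

```python
import math

def is_near_zero(x: float, abs_tol: float) -> bool:
    """True if |x| <= abs_tol; rel_tol is irrelevant when comparing with 0.0."""
    return math.isclose(x, 0.0, abs_tol=abs_tol)

def vector_is_zero(v, scale: float = 1.0, rel: float = 1e-9) -> bool:
    """Treat v as zero if every component is tiny relative to `scale`,
    a caller-supplied typical magnitude of the coordinates involved."""
    tol = rel * scale
    return all(is_near_zero(c, tol) for c in v)

print(math.isclose(1e-300, 0.0))                # False: default abs_tol is 0.0
print(vector_is_zero((1e-12, 0.0, -1e-12)))     # True
print(vector_is_zero((1e-3, 0.0, 0.0)))         # False
```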
I noticed a glitch while using math.sin(math.pi).
The answer should have been 0 but instead it is 1.2246467991473532e-16.
If the statement is math.sin(math.pi/2) the answer is 1.0 which is correct.
Why does this error occur?
The result is normal: numbers in computers are usually represented with floats, which have a finite precision (they are stored in only a few bytes). This means that only a finite number of real numbers can be represented by floats. In particular, π cannot be represented exactly, so math.pi is not π but a very good approximation of it. This is why math.sin(math.pi) does not have to be sin(π) but only something very close to it.
The precise value that you observe for math.sin(math.pi) is understandable: the relative precision of (double precision) floats is about 1e-16. This means that math.pi can be wrong by about π*1e-16 ~ 3e-16. Since sin(π-ε) ~ ε, the value that you obtain with math.sin(math.pi) can be in the worst case ~3e-16 (in absolute value), which is the case (this calculation is not supposed to give the exact value but just the correct order of magnitude, and it does).
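The order-of-magnitude estimate can be tightened: since sin(π − ε) ≈ ε, the observed value should be close to the gap between π and math.pi. The sketch below computes that gap with the decimal module, using a 50-digit value of π typed in by hand (PI_50 is a name chosen for this example).

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 60
PI_50 = Decimal("3.14159265358979323846264338327950288419716939937510")

# Decimal(math.pi) converts the stored double exactly, so this is the
# true distance between pi and its floating-point stand-in.
gap = PI_50 - Decimal(math.pi)

print(gap)                  # about 1.2246e-16
print(math.sin(math.pi))    # the value reported in the question
```

The two numbers agree to about the precision of a double, so the "glitch" is the sine of a slightly wrong argument, not an error in sin itself.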
Now, the fact that math.sin(math.pi/2) == 1 is not shocking: it may be (I haven't checked) that math.pi/2 (a float) is so close to the exact value π/2 that the float which is the closest to sin(math.pi/2) is exactly 1. In general, you can expect the result of a function applied to a floating point number to be off by about 1e-16 relative (or be about 1e-16 instead of 0).