Probability that a formula fails in IEEE 754

Probability that a formula fails in IEEE 754 - python

On my computer, I can check that
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
evaluates to False.
More generally, I can estimate that the formula (a + b) + c == a + (b + c) fails roughly 17% of the time when a,b,c are chosen uniformly and independently on [0,1], using the following simulation:
import numpy as np
import numexpr
np.random.seed(0)
formula = '(a + b) + c == a + (b + c)'
def failure_expectation(formula=formula, N=10**6):
a, b, c = np.random.rand(3, N)
return 1.0 - numexpr.evaluate(formula).mean()
# e.g. 0.171744
I wonder if it is possible to arrive at this probability by hand, e.g. using the definitions in the floating point standard and some assumption on the uniform distribution.
Given the answer below, I assume that the following part of the original question is out of reach, at least for now.
Is there is a tool that computes the failure probability for a given
formula without running a simulation.
Formulas can be assumed to be simple, e.g. involving the use of
parentheses, addition, subtraction, and possibly multiplication and
division.
(What follows may be an artifact of numpy random number generation, but still seems fun to explore.)
Bonus question based on an observation by NPE. We can use the following code to generate failure probabilities for uniform distributions on a sequence of ranges [[-n,n] for n in range(100)]:
import pandas as pd
def failures_in_symmetric_interval(n):
a, b, c = (np.random.rand(3, 10**4) - 0.5) * n
return 1.0 - numexpr.evaluate(formula).mean()
s = pd.Series({
n: failures_in_symmetric_interval(n)
for n in range(100)
})
The plot looks something like this:
In particular, failure probability dips down to 0 when n is a power of 2 and seems to have a fractal pattern. It also looks like every "dip" has a failure probability equal to that of some previous "peak". Any elucidation of why this happens would be great!

It's definitely possible to evaluate these things by hand, but the only methods I know are tedious and involve a lot of case-by-case enumeration.
For example, for your specific example of determining the probability that (a + b) + c == a + (b + c), that probability is 53/64, to within a few multiples of the machine epsilon. So the probability of a mismatch is 11/64, or around 17.19%, which agrees with what you were observing from your simulation.
To start with, note that there's a major simplifying factor in this particular case, and that's that Python and NumPy's "uniform-on-[0, 1]" random numbers are always of the form n/2**53 for some integer n in range(2**53), and within the constraints of the underlying Mersenne Twister PRNG, each such number is equally likely to occur. Since there are around 2**62 IEEE 754 binary64 representable values in the range [0.0, 1.0], that means that the vast majority of those IEEE 754 values aren't generated by random.random() (or np.random.rand()). This fact greatly simplifies the analysis, but also means that it's a bit of a cheat.
Here's an incomplete sketch, just to give an idea of what's involved. To compute the value of 53/64, I had to divide into five separate cases:
The case where both a + b < 1 and b + c < 1. In this case, both a + b and b + c are computed without error, and (a + b) + c and a + (b + c) therefore both give the closest float to the exact result, rounding ties to even as usual. So in this case, the probability of agreement is 1.
The case where a + b < 1 and b + c >= 1. Here (a + b) + c will be the correctly rounded value of the true sum, but a + (b + c) may not be. We can divide further into subcases, depending on the parity of the least significant bits of a, b and c. Let's abuse terminology and call a "odd" if it's of the form n/2**53 with n odd, and "even" if it's of the form n/2**53 with n even, and similarly for b and c. If b and c have the same parity (which will happen half the time), then (b + c) is computed exactly and again a + (b + c) must match (a + b) + c. For the other cases, the probability of agreement is 1/2 in each case; the details are all very similar, but for example in the case where a is odd, b is odd and c is even, (a + b) + c is computed exactly, while in computing a + (b + c) we incur two rounding errors, each of magnitude exactly 2**-53. If those two errors are in opposite directions, they cancel and we get agreement. If not, we don't. Overall, there's a 3/4 probability of agreement in this case.
The case where a + b >= 1 and b + c < 1. This is identical to the previous case after swapping the roles of a and c; the probability of agreement is again 3/4.
a + b >= 1 and b + c >= 1, but a + b + c < 2. Again, one can split on the parities of a, b and c and look at each of the resulting 8 cases in turn. For the cases even-even-even and odd-odd-odd we always get agreement. For the case odd-even-odd, the probability of agreement turns out to be 3/4 (by yet further subanalysis). For all the other cases, it's 1/2. Putting those together gets an aggregate probability of 21/32 for this case.
Case a + b + c >= 2. In this case, since we're rounding the final result to a multiple of four times 2**-53, it's necessary to look not just at the parities of a, b, and c, but to look at the last two significant bits. I'll spare you the gory details, but the probability of agreement turns out to be 13/16.
Finally, we can put all these cases together. To do that, we also need to know the probability that our triple (a, b, c) lands in each case. The probability that a + b < 1 and b + c < 1 is the volume of the square-based pyramid described by 0 <= a, b, c <= 1, a + b < 1, b + c < 1, which is 1/3. The probabilities of the other four cases can be seen (either by a bit of solid geometry, or by setting up suitable integrals) to be 1/6 each.
So our grand total is 1/3 * 1 + 1/6 * 3/4 + 1/6 * 3/4 + 1/6 * 21/32 + 1/6 * 13/16, which comes out to be 53/64, as claimed.
A final note: 53/64 almost certainly isn't quite the right answer - to get a perfectly accurate answer we'd need to be careful about all the corner cases where a + b, b + c, or a + b + c hit a binade boundary (1.0 or 2.0). It would certainly be possible to do refine the above approach to compute exactly how many of the 2**109 possible triples (a, b, c) satisfy (a + b) + c == a + (b + c), but not before it's time for me to go to bed. But the corner cases should constitute on the order of 1/2**53 of the total number of cases, so our estimate of 53/64 should be accurate to at least 15 decimal places.
Of course, there are lots of details missing above, but I hope it gives some idea of how it might be possible to do this.

Related

How to check if abc == sqrt(a^b^c) very fast (preferably Python)?

Let a,b,c be the first digits of a number (e.g. 523 has a=5, b=2, c=3). I am trying to check if abc == sqrt(a^b^c) for many values of a,b,c. (Note: abc = 523 stands for the number itself.)
I have tried this with Python, but for a>7 it already took a significant amount of time to check just one digit combination. I have tried rewriting the equality as multiple logs, like log_c[log_b[log_a[ (abc)^2 ]]] == 1, however, I encountered Math Domain Errors.
Is there a fast / better way to check this equality (preferably in Python)?
Note: Three digits are an example for StackOverflow. The goal is to test much higher powers with seven to ten digits (or more).
Here is the very basic piece of code I have used so far:
for a in range(1,10):
for b in range(1,10):
for c in range(1,10):
N = a*10**2 + b*10 + c
X = a**(b**c)
if N == X:
print a,b,c

The problem is that you are uselessly calculating very large integers, which can take much time as Python has unlimited size for them.
You should limit the values of c you test.
If your largest possible number is 1000, you want a**b**c < 1000**2, so b**c < log(1000**2, a) = 2*log(1000, a)), so c < log(2*log(1000, a), b)
Note that you should exclude a = 1, as any power of it is 1, and b = 1, as b^c would then be 1, and the whole expression is just a.
To test if the square root of a^b^c is abc, it's better to test if a^b^c is equal to the square of abc, in order to avoid using floats.
So, the code, that (as expected) doesn't find any solution under 1000, but runs very fast:
from math import log
for a in range(2,10):
for b in range(2,10):
for c in range(1,int(log(2*log(1000, a), b))):
N2 = (a*100 + b*10 + c)**2
X = a**(b**c)
if N2 == X:
print(a,b,c)

You are looking for numbers whose square root is equal to a three-digit integer. That means your X has to have at most 6 digits, or more precisely log10(X) < 6. Once your a gets larger, the potential solutions you're generating are much larger than that, so we can eliminate large swathes of them without needing to check them (or needing to calculate a ** b ** c, which can get very large: 9 ** 9 ** 9 has 369_693_100 DIGITS!).
log10(X) < 6 gives us log10(a ** b ** c) < 6 which is the same as b ** c * log10(a) < 6. Bringing it to the other side: log10(a) < 6 / b ** c, and then a < 10 ** (6 / b ** c). That means I know I don't need to check for any a that exceeds that. Correcting for an off-by-one error gives the solution:
for b in range(1, 10):
for c in range(1, 10):
t = b ** c
for a in range(1, 1 + min(9, int(10 ** (6 / t)))):
N = a * 100 + b * 10 + c
X = a ** t
if N * N == X:
print(a, b, c)
Running this shows that there aren't any valid solutions to your equation, sadly!

a**(b**c) will grow quite fast and most of the time it will far exceed three digit number. Most of the calculations you are doing will be useless. To optimize your solution do the following:
Iterate over all 3 digit numbers
For each of these numbers square it and is a power of the first digit of the number
For those that are, check if this power is in turn a power of the second digit
And last check if this power is the third digit

Differences between complex number implementations in Haskell and Python

I'm trying to map the complex number functionality in Python, to Data.Complex in Haskell, but I've reached a point where they differ, and I am unsure as to why.
In python:
>>> x = 3j
3j
>>> x.real
0.0
>>> x.imag
3.0
In Haskell:
> import Data.Complex
> let j n = 0 :+ n
> let x = j 3.0
> realPart x
0.0
> imagPart x
3.0
So far they look the same. Looks like operating on them doesn't differ much either:
Python:
>>> y = 1 + x
(1+3j)
>>> y.real
1.0
>>> y.imag
3.0
Haskell:
> let y = 1 + x
> realPart y
1.0
> imagPart y
3.0
In isolation + - * / ** all seem to work the same way. However this operation yields two different results:
>>> z = (y - 1) ** 2
(-9+0j)
>>> z.real
-9.0
>>> z.imag
0.0
But in Haskell:
> let z = (y - 1) ** 2
> realPart z
-9.000000000000002
> imagPart z
1.1021821192326181e-15
Why is this?

In Haskell, (**) for Complex is essentially
a ** b = exp (b * log a)
which has many opportunities for bad rounding errors to creep in. (I don't know enough Python to check what it would do with an analogous log-then-exp expression; the thing I tried complained that it wasn't ready to handle log(3j).) It has a bunch of special cases to thwart rounding errors, but none check for a fully-real integer exponent. You might consider this a bug or infelicity and report it to the folks in charge of the Complex type as another special case worth adding to the implementation of (**).
In the meantime, if you know your exponent is integral, you can use (^) (for positive numbers only) or (^^) instead:
Data.Complex> (0 :+ 3) ^ 2
(-9.0) :+ 0.0

Although the results given by the two languages are different, they aren't very different (as others have indicated in the comments). So you might guess that it's just a matter of slightly different implementations -- and you'd be right.
Daniel Wagner indicates that in Haskell, the ** operator is defined as
a ** b = exp (b * log a)
Haskell does some special casing, but most of the time, the operation relies on the general-purpose definitions of exp and log for complex numbers.
In Python, it's a little different: powers are calculated using a polar representation. This approach involves using a different set of general-purpose functions -- most of them basic trigonometric functions over ordinary floating point numbers -- and does almost no special-casing. It's not clear to me that this approach is better overall, but it does happen to give a more correct answer in the particular case you've chosen.
Here's the core of the implementation:
vabs = hypot(a.real,a.imag);
len = pow(vabs,b.real);
at = atan2(a.imag, a.real);
phase = at*b.real;
if (b.imag != 0.0) {
len /= exp(at*b.imag);
phase += b.imag*log(vabs);
}
r.real = len*cos(phase);
r.imag = len*sin(phase);
Here, a is the base and b is the exponent. vabs and at give the polar representation of a, such that
a.real = vabs * cos(at)
a.imag = vabs * sin(at)
And as you can see in the last two lines of code, len and phase give the corresponding polar representation of the result, r.
When b is real, the if block isn't executed, and this simplifies to De Moivre's formula. I can't find a canonical formula covering the complex or imaginary cases, but it appears to be pretty simple!

Is it possible to solve equations of bit wise operators?

We can easily find:
a=7
b=8
c=a|b
Then c comes out to be: 15
Now can we find a if c is given?
For example:
b=8
c=15
c=a|b
Find a?
And also if x=2<<1 is given, then we can get x=4. But if 4=y<<1 is given Can we get y?

To begin with, these are just my observations and I have no sources to back them up. There are better ways, but the Wikipedia pages were really long and confusing so I hacked together this method.
Yes, you can, but you need more context (other equations to solve in reference to) and a lot more parsing. This is the method I came up with for doing this, but there are better ways to approach this problem. This was just conceptually easier for me.
Numbers
You can't just put an integer into an equation and have it work. Bitwise operators refer only refer to booleans, we just treat them as if they are meant for integers. In order to simplify an equation, we have to look at it as an array of booleans.
Taking for example an unsigned 8 bit integer:
a = 0b10111001
Now becomes:
a = {1, 0, 1, 1, 1, 0, 0, 1}
Parsing
Once you can get your equations to just booleans, then you can apply the actual bitwise operators to simple 1s and 0s. But you can take it one step further now, at this all bitwise equations can be written in terms of just AND, OR, and NOT. Addition, subtraction and multiplication can also be represented this way, but you need to manually write out the steps taken.
A ^ B = ~( ( A & B ) | ( (~A) & (~B) ) )
This includes bitshifts, but instead of expanding to other bitwise operators, they act as an assignment.
A = 0b10111001
B = 0b10100110
C = (A >> 2) ^ B
This then expands to 8 equations, one for each bit.
C[0] = A[2] ^ B[0]
C[1] = A[3] ^ B[1]
C[2] = A[4] ^ B[2]
C[3] = A[5] ^ B[3]
C[4] = A[6] ^ B[4]
C[5] = A[7] ^ B[5]
C[6] = 0 ^ B[6]
C[7] = 0 ^ B[7]
C[6] and C[7] can then be reduced to just B[6] and B[7] respectively.
Algebra
Now that you have an equation consisting of only AND, OR, and NOT, you can represent them using traditional algebra. In this step, they are no longer treated as bits, but instead as real numbers which just happen to be 0 or 1.
A | B => A + B - AB
A & B => AB
~A => 1 - A
Note that when plugging in 1 and 0, all of these remain true.
For this example, I will be using the Majority function as an example. It's job is to take in three bits and return 1 if there are more 1s than 0s.
It is defined as:
f(a, b, c) = ((a & b) | (a & c) | (b & c))
which becomes
f(a, b, c) = (ab + ac - (ab * ac)) + bc - (ab + ac - (ab * ac) * bc
f(a, b, c) = ab + ac + bc - a2bc - ab2c - abc2 + a2b2c2
And now that you have this information, you can easily combine it with your other equations using standard algebra in order to get a solution. Any non 1 or 0 solution is extraneous.

A solution (if it exists) of such equation can be considered "unique" provided that you allow three states for each bit:
bit is 0
bit is 1
does not matter X
E.g. 7 | 00001XXX(binary) = 15
Of course, such result cannot be converted to decimal.
For some operations it may be necessary to specify the bit width.

For your particular cases, the answer is no, you cannot solve or 'undo' the OR-operation (|) and shifting left or right (<<, >>) since in both cases information is lost by applying the operation. For example, 8|7=15 and 12|7=15, thus given the 7 and 15 it is not possible to obtain a unique solution.
An exception is the XOR operation, for which does hold that when a^b=c, then b^c=a and a^c=b.

you can find an a that solves the equation, but it will not be unique. assume b=c=1 then a=0 and a=1 are solutions. for c=1, b=0 there will be no solution. this is valid for all the bits in the numbers you consider. if the equation is solvable a=c will be (one of the) solution(s).
and left-shifting an integer will always result in an even integer (the least-significant bit is zero). so this only works for even itegers. in that case you can invert the operation by applying a right-shift (>>).

numpy mean of complex numbers with infinities

numpy seems to not be a good friend of complex infinities
While we can evaluate:
In[2]: import numpy as np
In[3]: np.mean([1, 2, np.inf])
Out[3]: inf
The following result is more cumbersome:
In[4]: np.mean([1 + 0j, 2 + 0j, np.inf + 0j])
Out[4]: (inf+nan*j)
...\_methods.py:80: RuntimeWarning: invalid value encountered in cdouble_scalars
ret = ret.dtype.type(ret / rcount)
I'm not sure the imaginary part make sense to me. But please do comment if I'm wrong.
Any insight into interacting with complex infinities in numpy?

Solution
To compute the mean we divide the sum by a real number. This division causes problems because of type promotion (see below). To avoid type promotion we can manually perform this division separately for the real and imaginary part of the sum:
n = 3
s = np.sum([1 + 0j, 2 + 0j, np.inf + 0j])
mean = np.real(s) / n + 1j * np.imag(s) / n
print(mean) # (inf+0j)
Rationale
The issue is not related to numpy but to the way complex division is performed. Observe that ((1 + 0j) + (2 + 0j) + (np.inf + 0j)) / (3+0j) also results in (inf+nanj).
The result needs to be split into a real and imagenary part. For division both operands are promoted to complex, even if you divide by a real number. So basically the division is:
a + bj
--------
c + dj
The division operation does not know that d=0. So to split the result into real and imaginary it has to get rid of the j in the denominator. This is done by multiplying numerator and denominator with the complex conjugate:
a + bj (a + bj) * (c - dj) ac + bd + bcj - adj
-------- = --------------------- = ---------------------
c + dj (c + dj) * (c - dj) c**2 + d**2
Now, if a=inf and d=0 the term a * d * j = inf * 0 * j = nan * j.

when you run the function with a np.inf in your array the result will be the infinity object for np.mean or another functions like np.max(). But in this case for calculating the mean(), since you have complex numbers and an infinity complex numbers is defined as an infinite number in the complex plane whose complex argument is unknown or undefined, you're getting non*j as the imaginary part.
In order to get around this problem, you should ignore the infinity items in such mathematical operations. You can use isfinite() function to detect them and apply the function on finite items:
In [16]: arr = np.array([1 + 0j, 2 + 0j, np.inf + 0j])
In [17]: arr[np.isfinite(arr)]
Out[17]: array([ 1.+0.j, 2.+0.j])
In [18]: np.mean(arr[np.isfinite(arr)])
Out[18]: (1.5+0j)

Because of type promotion.
When you do the division of a complex by a real, like (inf + 0j) / 2, the (real) divisor gets promoted to 2 + 0j.
And by complex division, the imaginary part is equal to (0 * 2 - inf * 0) / 4. Note the inf * 0 here which is an indeterminate form, and it evaluates to NaN. This makes the imaginary part NaN.
And back to the topic. When numpy calculates the mean of a complex array, it really doesn't try to do anything clever. First it reduces the array with the "addition" operation, obtaining the sum. After that, the sum is divided by the count. This sum contains an inf in the real part, which causes the trouble described above when the divisor (count) gets promoted from integral type to complex floating point.
Edit: a word about solution
The IEEE floating point "infinity" is really a very primitive construct that represents indeterminate forms like 1 / 0. These forms are not constant numbers, but possible limits. The special inf or NaN "floating point numbers" are placeholders that notifies you about the presence of indeterminate forms. They do nothing about the existence or type of the limit, which you must determine by the mathematical context.
Even for real numbers, the underlying limit can depend on how you approach the limit. A superficial 1 / 0 form can go to positive or negative infinity. On the complex plane, things are even more complex (well). For example, you may run into branch cuts and (different kinds of) singularities. There's no universal solution that fits all.
Tl;dr: Fix the underlying problem in the face of ambiguous/incomplete/corrupted data, or prove that the end computational result can withstand such corruption (which can happen).

Why does Python's modulus operator (%) not match the Euclidean definition?

Euclidean definition says,
Given two integers a and b, with b ≠ 0, there exist unique integers q and r such that a = bq + r and 0 ≤ r < |b|, where |b| denotes the absolute value of b.
Based on below observation,
>>> -3 % -2 # Ideally it should be (-2 * 2) + 1
-1
>>> -3 % 2 # this looks fine, (-2 * 2) + 1
1
>>> 2 % -3 # Ideally it should be (-3 * 0) + 2
-1
looks like the % operator is running with different rules.
link1 was not helpful,
link2 gives recursive answer, because, as I do not understand how % works, it is difficult to understand How (a // b) * b + (a % b) == a works
My question:
How do I understand the behavior of modulo operator in python? Am not aware of any other language with respect to the working of % operator.

The behaviour of integer division and modulo operations are explained in an article of The History of Python, namely: Why Python's Integer Division Floors . I'll quote the relevant parts:
if one of the operands is negative, the result is floored, i.e.,
rounded away from zero (towards negative infinity):
>>> -5//2
-3
>>> 5//-2
-3
This disturbs some people, but there is a good mathematical reason.
The integer division operation (//) and its sibling, the modulo
operation (%), go together and satisfy a nice mathematical
relationship (all variables are integers):
a/b = q with remainder r
such that
b*q + r = a and 0 <= r < b
(assuming a and b are >= 0).
If you want the relationship to extend for negative a (keeping b
positive), you have two choices: if you truncate q towards zero, r
will become negative, so that the invariant changes to 0 <= abs(r)
otherwise, you can floor q towards negative infinity, and the
invariant remains 0 <= r < b.
In mathematical number theory, mathematicians always prefer the latter
choice (see e.g. Wikipedia). For Python, I made the same choice
because there are some interesting applications of the modulo
operation where the sign of a is uninteresting.
[...]
For negative b, by the way, everything just flips, and the invariant
becomes:
0 >= r > b.
In other words python decided to break the euclidean definition in certain circumstances to obtain a better behaviour in the interesting cases. In particular negative a was considered interesting while negative b was not considered as such. This is a completely arbitrary choice, which is not shared between languages.
Note that many common programming languages (C,C++,Java,...) do not satisfy the euclidean invariant, often in more cases than python (e.g. even when b is positive).
some of them don't even provide any guarantee about the sign of the remainder, leaving that detail as implementation defined.
As a side note: Haskell provides both kind of moduluses and divisions. The standard euclidean modulus and division are called rem and quot, while the floored division and "python style" modulus are called mod and div.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.