Floating point arithmetics: Possible unsafe reliance on specific comparison?


The following python code calculates the number of iterations to do stuff based on some variables.
# a - b - c is always a multiple of d.
i = (a - b - c) / d
while i:
    # do stuff
    i -= 1
The variables will all be of the same type, that is, all ints or all floats or whatever. My concern is whether it will work correctly if the values are floats. I know enough to always consider the pitfalls of relying on exact float values, but I can't tell whether the above is dangerous or not. I can use i = int(round((a - b - c) / d)), but I would like to understand floats better.
It all comes down to the following: a - b - c is an exact multiple of d. So I am relying on (a-b-c)/d to become a value i that I can subtract 1 from and get the expected number of iterations in the while loop, with the implied assumption that i == 0 becomes true. That is, can calculated multiples like this be decremented by 1 to reach exactly 0?
I would like to not only know if it is unsafe, but more importantly, what do I need to understand about floating point to resolve a question like this? If someone knows decisively whether this is safe or not, would it be possible to explain how so?

You can use the decimal module to get an idea of what "hides" between a floating point number such as 0.3:
>>> from decimal import Decimal
>>> Decimal(0.3)
Decimal('0.299999999999999988897769753748434595763683319091796875')
Note that Python 2.7 changed how floating point numbers are written (how repr(f) works) so that it now shows the shortest string that will give the same floating point number if you do float(s). This means that repr(0.3) == '0.3' in Python 2.7, but repr(0.3) == '0.29999999999999999' in earlier versions. I'm mentioning this since it can confuse things further when you really want to see what's behind the numbers.
Using the decimal module, we can see the error in a computation with floats:
>>> (Decimal(2.0) - Decimal(1.1)) / Decimal(0.3) - Decimal(3)
Decimal('-1.85037170771E-16')
Here we might expect (2.0 - 1.1) / 0.3 == 3.0, but there is a small non-zero difference. However, if you do the computation with normal floating point numbers, then you do get zero:
>>> (2 - 1.1) / 0.3 - 3
0.0
>>> bool((2 - 1.1) / 0.3 - 3)
False
The result is rounded somewhere along the way since 1.85e-16 is non-zero:
>>> bool(-1.85037170771E-16)
True
I'm unsure exactly where this rounding takes place.
As for the loop termination in general, there's one clue I can offer: for floats less than 2**53, IEEE 754 can represent all integers:
>>> 2.0**53
9007199254740992.0
>>> 2.0**53 + 1
9007199254740992.0
>>> 2.0**53 + 2
9007199254740994.0
The space between representable numbers is 2 from 2**53 to 2**54, as shown above. But if your i is an integer less than 2**53, then i - 1 will also be a representable integer and you will eventually hit 0.0, which is considered false in Python.
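To make the 2**53 boundary concrete, here is a minimal sketch. The helper name countdown and the constant LIMIT are made up for illustration, and the sketch assumes the starting value is already an exact integer-valued float, which is the case the paragraph above describes.

```python
def countdown(i):
    """Count how many times i can be decremented by 1 before it is exactly 0.

    Only safe for integer-valued floats below 2**53; a value that is not an
    exact integer would step past zero and never terminate.
    """
    steps = 0
    while i:
        i -= 1
        steps += 1
    return steps

LIMIT = 2.0 ** 53

print(countdown(5.0))        # 5: every intermediate value is an exact integer
print(LIMIT + 1 == LIMIT)    # True: above the limit, adding 1 can be lost
print(LIMIT - 1 == LIMIT)    # False: below the limit, each step is exact
```

The loop is the same shape as the one in the question; the point is only that it terminates when the start value is an exact integer in the safe range.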

I will give you a language-agnostic answer (I don't really know Python).
There are multiple potential problems in your code. Firstly, this:
(a - b - c)
If a is (for example) 10^9, and b and c are both 1, then the answer will be 10^9, not 10^9 - 2 (I'm assuming single-precision float here).
Then there's this:
i = (a - b - c) / d
If numerator and denominator are numbers that can't be exactly represented in floating-point (e.g. 0.3 and 0.1), then the result might not be an exact integer (it might be 3.0000001 instead of 3). Therefore, your loop will never terminate.
Then there's this:
i -= 1
Similarly to above, if i is currently 10^9, then the result of this operation will still be 10^9, so your loop will never terminate.
Therefore, you should strongly consider performing all the calculations in integer arithmetic.

You're right that there could be a non-convergence on zero (at least for more iterations than you intend). Why not have your test be: while i >= 1. In that case, as with integers, if your i value dips below 1, the loop will end.
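A small sketch of how the `while i >= 1` guard behaves when the quotient is slightly off, compared with rounding the quotient once up front. The function names are hypothetical, and 0.3 / 0.1 is used only because it is a well-known quotient that lands just below 3.

```python
def countdown_ge_one(i):
    """Count iterations of a loop guarded by `i >= 1` instead of `i != 0`."""
    steps = 0
    while i >= 1:
        i -= 1
        steps += 1
    return steps

def planned_iterations(a, b, c, d):
    """Round the quotient once, so the loop counter is an exact int."""
    return int(round((a - b - c) / d))

q = 0.3 / 0.1
print(q)                                      # 2.9999999999999996
print(countdown_ge_one(q))                    # 2: terminates, but one short
print(planned_iterations(0.3, 0.0, 0.0, 0.1)) # 3
```

So the guard does prevent an endless loop, but it can still produce the wrong count; rounding first addresses both problems when the true quotient is known to be a whole number.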

Related

How to ensure expressions that evaluate to floats, give the expected integer value with int(*)

In this question's most general form, I want to know how I can guarantee that int(x * y) (with x and y both being floats) gives me the arithmetically "correct" answer when I know the result to be a round number. For example: 1.5 * 2.0 = 3, or 16.0 / 2.0 = 8. I worry this could be a problem because int can round down if there is some floating point error. For example: int(16.0 - 5 * sys.float_info.epsilon) gives 15.
And specializing the question a bit, I could also ask about division between two ints where I know the result is a round number. For example 16 / 2 = 8. If this specialization changes the answer to the more general question, I'd like to know how.
By the way, I know that I could do int(round(x * y)). I'm just wondering if there's a more direct built-in, or if there's some proven guarantee that means I don't have to worry about this in the first place.

If both inputs are exact, and the mathematically correct result is representable, then the output is also guaranteed to be exact. This is only true for a limited number of basic floating-point operations, but * and / are such operations.
Note that the "both inputs are exact" condition is only satisfiable for dyadic rationals. Numbers like 1.5 are fine, but numbers like 0.1 cannot be exactly represented in binary floating point. Also, floating point precision limits apply to integers, too, not just fractional values - very large integers may not be exactly representable, due to requiring more precision than a Python float has.
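One way to check this claim for particular inputs is to compare the float result with exact rational arithmetic. This is only a sketch: the helper names are invented here, and Fraction(x) is used because it converts a float to the exact value it stores.

```python
from fractions import Fraction

def product_is_exact(x, y):
    """True if the float product equals the exact product of the stored values."""
    return Fraction(x * y) == Fraction(x) * Fraction(y)

def quotient_is_exact(x, y):
    """True if the float quotient equals the exact quotient of the stored values."""
    return Fraction(x / y) == Fraction(x) / Fraction(y)

print(product_is_exact(1.5, 2.0))    # True: inputs and result are dyadic rationals
print(quotient_is_exact(16.0, 2.0))  # True
print(product_is_exact(0.1, 3.0))    # False: the exact product needs too many bits
print(quotient_is_exact(1.0, 3.0))   # False: 1/3 is not representable in binary
```

When the check returns True, int(x * y) cannot be off by one, because there was no rounding error to begin with.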

sometimes missing a cent when translating euros to euro cents

I have to translate euros (in a string) to euro cents (int):
Examples:
'12,1' => 1210
'14,51' => 1451
I use this python function:
int(round(float(amount.replace(',', '.')), 2) * 100)
But with this amount '1229,84' the result is : 122983
Update
I use the solution from Wim, because I use integers in both Python / Jinja and JavaScript for currency arithmetic. See also the answer from Chepner.
int(round(100 * float(amount.replace(',', '.')), 2))
My question was answered by Mr. Me, who explained the above result.

What the Docs Say, and a simple explanation
I tried it out, and was surprised that this was happening. So I turned to the documentation, and there is a little note in there that says.
Note The behavior of round() for floats can be surprising: for
example, round(2.675, 2) gives 2.67 instead of the expected 2.68. This
is not a bug: it’s a result of the fact that most decimal fractions
can’t be represented exactly as a float.
Now what does that mean, that most decimal fractions can't be represented exactly as a float? Well, the documentation follows up with a great link that explains this, but since you probably didn't come here to read a nerdy technical document, let me summarize what is going on.
Python uses the IEEE-754 floating point standard to represent floats. This standard compromises accuracy for speed. Some numbers cannot be accurately represented. For example .1 is actually represented as 0.1000000000000000055511151231257827021181583404541015625. Interestingly, .1 in binary is actually an infinitely repeating number, just like 1/3 is an infinitely repeating .333333.
An Under the Hood Case Study
Now on to your particular case. This was pretty fun to look into, and this is what I discovered.
First, let's simplify what you were trying to do
>>> amount = '1229,84'
>>> int(round(float(amount.replace(',', '.')), 2) * 100)
122983
to
>>> int(1229.84 * 100)
122983
Sometimes Python[1] is unable to 100% accurately display binary floating point numbers, for the same reason we are unable to display the fraction 1/3 as a decimal. When this happens Python hides any extra digits. .1 is actually stored as -0.10000000000000009[2], but Python will display it as .1 if you type it into the console. We can see those extra digits by doing int(1.1) - 1.1[3]. We can apply this int(myNum) - myNum formula to most floating point numbers to see the extra hidden digits behind them[4]. In your case we would do the following.
>>> int(1229.84) - 1229.84
-0.8399999999999181
1229.84 is actually 1229.8399999999999181. Continuing on[5].
>>> round(1229.84, 2) * 100
122983.99999999999 #there's all of our hidden digits showing up.
Now on to the last step. This is the part we are concerned about. Changing it back to an integer.
>>> int(122983.99999999999)
122983
It rounds downwards instead of upwards, however, if we never had multiplied it by 100, we would still have 2 more 9s at the end, and Python would round up.
>>> int(122983.9999999999999)
122984
??? Now what is going on. Why is Python rounding 122983.99999999999 down, but it rounds 122983.9999999999999 up? Well whenever Python turns a float into a integer it rounds down. However, you have to remember that to Python 122983.9999999999999 with the extra two 99s at the end is the same thing as 122984.0 For example.
>>> 122983.9999999999999
122984.0
>>> a = 122983.9999999999999
>>> int(a) - a
0.0
and without the two extra 99s on the end.
>>> 122983.99999999999
122983.99999999999
>>> a=122983.99999999999
>>> int(a) - a
-0.9999999999854481
Python is definitely treating 122983.9999999999999 as 122984.0 but not 122983.99999999999. Now back to casting 122983.99999999999 to an integer. Because we have created a number slightly less than 122984 that Python sees as being a separate number from 122984, and because casting to an integer always causes Python to round down, we get 122983 as a result.
Whew. That was a lot to go through, but I sure learned a lot writing this out, and I hope you did too. The solution to all of this is to use decimal numbers instead of floats, which compromises speed for accuracy.
What about rounding? The original problem had some rounding in it as well -- it's useless. See appendix item 6.
The Solution
a) The easiest solution is to use the decimal module instead of floating point numbers. This is the preferred way of doing things in any finance or accounting program.
The documentation also mentioned the following solutions which I've summarized.
b) The exact value can be expressed and retrieved in a hexadecimal form via myFloat.hex() and float.fromhex(myHex)
c) The exact value can also be retrieved as a fraction through myFloat.as_integer_ratio()
d) The documentation briefly mentions using SciPy for floating point arithmetic, however this SO question mentions that SciPy's NumPy floats are nothing more than aliases to the built-in float type. The decimal module would be a better solution.
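Options b) and c) can be tried directly in the interpreter. The following is a small sketch using .1 as the example value; it also shows Decimal(x), which prints the exact stored value and is a more reliable way to see the hidden digits than the int(myNum) - myNum trick.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1

# c) the exact stored value as a ratio of two integers
numerator, denominator = x.as_integer_ratio()
print(numerator, denominator)   # 3602879701896397 36028797018963968
print(Fraction(numerator, denominator) == Fraction(x))  # True

# b) hexadecimal form round-trips without any loss
print(x.hex())                       # 0x1.999999999999ap-4
print(float.fromhex(x.hex()) == x)   # True

# the exact decimal expansion of the stored binary value
print(Decimal(x))
# 0.1000000000000000055511151231257827021181583404541015625
```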
Appendix
1 - Even though I will often refer to Python's behavior, the things I talk about are part of the IEEE-754 floating point standard which is what the major programming languages use for their floating point numbers.
2 - int(1.1) - 1.1 gives me -0.10000000000000009, but according to the documentation .1 is really 0.1000000000000000055511151231257827021181583404541015625
3 - We used int(1.1) - 1.1 instead of int(.1) - .1 because int(.1) - .1 does not give us the hidden digits, but according to the documentation they should still be there for .1, hence I say int(someNum) -someNum works most of the time, but not all of the time.
4 - When we use the formula int(myNum) - myNum what is happening is that casting the number to an integer will round the number down so int(3.9) becomes 3, and when we minus 3 from 3.9 we are left with -.9. However, for some reason that I do not know, when we get rid of all the whole numbers, and we're just left with the decimal portion, Python decides to show us everything -- the whole mantissa.
5 - this does not really affect the outcome of our analysis, but when multiplying by 100, instead of the hidden digits being shifted over by 2 decimal places, they changed a little as well.
>>> a = 1229.84
>>> int(a) - a
-0.8399999999999181
>>> a = round(1229.84, 2) * 100
>>> int(a) - a
-0.9999999999854481 #I expected -0.9999999999918100?
6 - It may seem like we can get rid of all those extra digits by rounding to two decimal places.
>>> round(1229.84, 2) # which is really round(1229.8399999999999181, 2)
1229.84
But when we use our int(someNum) - someNum formula to see the hidden digits, they are still there.
>>> a = round(1229.84, 2)
>>> int(a) - a
-0.8399999999999181
This is because Python cannot store 1229.84 as a binary floating point number. It can't be done. So... rounding 1229.84 does absolutely nothing.

Don't use floating-point arithmetic for currency; rounding error for values that cannot be represented exactly will cause the type of loss you are seeing. Instead, convert the string representation to an integer number of cents, which you can convert to euros-and-cents for display as needed.
euros, cents = '12,1'.split(',') # '12,1' -> ('12', '1')
cents = 100*int(euros) + int(cents) * (10 if len(cents) == 1 else 1) # ('12', '1') -> 1210
(Notice you'll need a check to handle cents without a trailing 0.)
display_str = '%d,%02d' % divmod(cents, 100) # 1210 -> (12, 10) -> '12,10'
You can also use the Decimal class from the decimal module, which essentially encapsulates all the logic for using integers to represent fractional values.
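The string-to-cents idea can be wrapped into a pair of helpers. This is a sketch only: the function names are invented, and it assumes non-negative amounts with a comma as the decimal separator and at most two digits of cents (extra digits are simply cut off).

```python
def parse_euro_cents(amount):
    """Convert a string such as '12,1' or '1229,84' to an integer number of cents."""
    euros, _, cents = amount.partition(',')
    cents = (cents + '00')[:2]   # pad '1' to '10', '' to '00'; cut extra digits
    return 100 * int(euros) + int(cents)

def format_euro_cents(total):
    """Format an integer number of cents as euros with two decimals."""
    return '%d,%02d' % divmod(total, 100)

print(parse_euro_cents('12,1'))      # 1210
print(parse_euro_cents('1229,84'))   # 122984
print(format_euro_cents(1210))       # 12,10
```

No float is ever created, so there is nothing to round.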

As @wim mentions in a comment, use the Decimal type from the stdlib decimal module instead of the built-in float type. Decimal objects do not have the binary rounding behavior that floats have and also have a precision that can be user defined.
Decimal should be used anywhere you are doing financial calculations or anywhere you need floating point calculations that behave like the decimal math people learn in school (as opposed to the binary floating point behavior of the built in float type).
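For the euro example from the question, a Decimal-based conversion might look like the sketch below. The function name is made up, and the string is passed to Decimal directly (not through float), which is what keeps the value exact.

```python
from decimal import Decimal

def euros_to_cents(amount):
    """Convert a string such as '1229,84' to an integer number of cents."""
    value = Decimal(amount.replace(',', '.'))  # exact: no binary rounding
    return int(value * 100)                    # truncates any digits past cents

print(euros_to_cents('1229,84'))  # 122984
print(euros_to_cents('12,1'))     # 1210
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))  # True
```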

Format of complex number in Python

I am wondering about the way Python (3.3.0) prints complex numbers. I am looking for an explanation, not a way to change the print.
Example:
>>> complex(1,1)-complex(1,1)
0j
Why doesn't it just print "0"? My guess is: to keep the output of type complex.
Next example:
>>> complex(0,1)*-1
(-0-1j)
Well, a simple "-1j" or "(-1j)" would have done. And why "-0"?? Isn't that the same as +0? It doesn't seem to be a rounding problem:
>>> (complex(0,1)*-1).real == 0.0
True
And when the imaginary part gets positive, the -0 vanishes:
>>> complex(0,1)
1j
>>> complex(0,1)*-1
(-0-1j)
>>> complex(0,1)*-1*-1
1j
Yet another example:
>>> complex(0,1)*complex(0,1)*-1
(1-0j)
>>> complex(0,1)*complex(0,1)*-1*-1
(-1+0j)
>>> (complex(0,1)*complex(0,1)*-1).imag
-0.0
Am I missing something here?

It prints 0j to indicate that it's still a complex value. You can also type it back in that way:
>>> 0j
0j
The rest is probably the result of the magic of IEEE 754 floating point representation, which makes a distinction between 0 and -0, the so-called signed zero. Basically, there's a single bit that says whether the number is positive or negative, regardless of whether the number happens to be zero. This explains why 1j * -1 gives something with a negative zero real part: the positive zero got multiplied by -1.
-0 is required by the standard to compare equal to +0, which explains why (1j * -1).real == 0.0 still holds.
The reason that Python still decides to print the -0, is that in the complex world these make a difference for branch cuts, for instance in the phase function:
>>> phase(complex(-1.0, 0.0))
3.141592653589793
>>> phase(complex(-1.0, -0.0))
-3.141592653589793
This is about the imaginary part, not the real part, but it's easy to imagine situations where the sign of the real part would make a similar difference.
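The signed zero can be made visible with math.copysign, since an ordinary comparison cannot tell the two zeros apart. A short sketch (the helper is_negative_zero is a name invented here):

```python
import cmath
import math

def is_negative_zero(x):
    """True only for -0.0; plain `x == 0.0` is True for both zeros."""
    return x == 0.0 and math.copysign(1.0, x) < 0

z = complex(0, 1) * -1
print(z.real == 0.0)              # True: -0.0 compares equal to 0.0
print(is_negative_zero(z.real))   # True: but the sign bit is set

# The sign of a zero imaginary part picks the side of the branch cut.
print(cmath.phase(complex(-1.0, 0.0)))    # 3.141592653589793
print(cmath.phase(complex(-1.0, -0.0)))   # -3.141592653589793
```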

The answer lies in the Python source code itself.
I'll work with one of your examples. Let
a = complex(0,1)
b = complex(-1, 0)
When you do a*b you're calling this function:
real_part = a.real*b.real - a.imag*b.imag
imag_part = a.real*b.imag + a.imag*b.real
And if you do that in the python interpreter, you'll get
>>> real_part
-0.0
>>> imag_part
-1.0
From IEEE754, you're getting a negative zero, and since that's not +0, you get the parens and the real part when printing it.
if (v->cval.real == 0. && copysign(1.0, v->cval.real)==1.0) {
/* Real part is +0: just output the imaginary part and do not
include parens. */
...
else {
/* Format imaginary part with sign, real part without. Include
parens in the result. */
...
I guess (but I don't know for sure) that the rationale comes from the importance of that sign when calculating with elementary complex functions (there's a reference for this in the wikipedia article on signed zero).

0j is an imaginary literal which indeed indicates a complex number rather than an integer or floating-point one.
The +-0 ("signed zero") is a result of Python's conformance to IEEE 754 floating point representation since in Python, complex is by definition a pair of floating point numbers. Due to the latter, there's no need to print or specify zero fraction parts for a complex too.
The -0 part is printed in order to accurately represent the contents as repr()'s documentation demands (repr() is implicitly called whenever an operation's result is output to the console).
Regarding the question why (-0+1j) = 1j but (1j*-1) = (-0-1j).
Note that (-0+1j) or (-0.0+1j) aren't single complex numbers but expressions - an int/float added to a complex. To compute the result, the first number is first converted to a complex (-0 -> (0.0,0.0) since integers don't have signed zeros, -0.0 -> (-0.0,0.0)). Then its .real and .imag are added to the corresponding ones of 1j, which are (+0.0,1.0). The result is (+0.0,1.0) :^) . To construct a complex directly, use complex(-0.0,1).
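The difference between adding a float to a complex and constructing the complex directly can be seen in a few lines. A minimal sketch:

```python
import math

added = -0.0 + 1j            # an expression: -0.0 + 0.0 gives +0.0 for the real part
built = complex(-0.0, 1)     # constructed directly: the real part stays -0.0

print(repr(added))           # 1j
print(repr(built))           # (-0+1j)
print(added == built)        # True: the two zeros compare equal
print(math.copysign(1.0, added.real), math.copysign(1.0, built.real))  # 1.0 -1.0
```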

As far as the first question is concerned: if it just printed 0 it would be mathematically correct, but you wouldn't know you were dealing with a complex object vs an int. As long as you don't specify .real you will always get a J component.
I'm not sure why you would ever get -0; it's not technically incorrect (-1 * 0 = 0) but it's syntactically odd.
As far as the rest goes, it's strange that it isn't consistent; however, none of the outputs are technically incorrect, they are just an artifact of the implementation.

Pitfalls of number values in Python, "How deep?"

I'm a fairly green programmer, and I'm learning Python right now. I'm up to chapter 17 in "Learn to Think Like a Computer Scientist" (Classes and Methods), and I just wrote my first doctest that failed in a way I truly do not fully understand:
class Point(object):
    '''
    represents a point object.
    attributes: x, y
    '''
    def __init__(self, x = 0, y = 0):
        '''
        >>> point = Point()
        >>> point.y
        0
        >>> point = Point(4.7, 8.2)
        >>> point.x
        4.7
        '''
        self.x = x
        self.y = y
The second doctest for __init__ fails, and returns 4.7000000000000002 instead of 4.7. However, if I rewrite the doctest with a "print" statement as so:
>>> point = Point(4.7, 8.2)
>>> print point.x
4.7
It runs correctly.
So I read up on how Python stores floats, and I now understand that, due to binary representation of decimal numbers, the reason for the discrepancy is that Python stores 4.7 as a string of 1s and 0s that almost but don't quite equal 4.7.
But what I don't understand is why a call to "point.x" returns 4.7000000000000002 and a call to "print point.x" returns 4.7. Under what other circumstances will Python choose to round like it does with "print"? How does this rounding work? Can these trailing significant figures lead to errors in programming (aside from, obviously, failed doctests)? Can a failure to pay attention to rounding create dangerous ambiguity?
Since this has to do with binary representation of decimal numbers, I'm sure that this is in fact a general CS issue and not one specific to Python, but what I really need to know right now is what I can do, specifically as a Python programmer, to avoid any related issues and/or bug infestations.
Also, for bonus points, is there some other way that Python can store floating point numbers aside from the default activated by a line like "a = 4.7"? I know there's the Decimal package, but I'm not totally sure how it works. Honestly, all of this dynamic typing stuff confuses me sometimes.
Edit:
I should specify that I'm using Python 2.6 (at some point I want to use NumPy and Biopython)

>>> point.x
calls the repr function, which produces a string representation holding more technical information than the str function, which is called when
>>> print point.x
occurs

This has to do with how computers store floating point numbers. A detailed description of this is here. However, for your case, the quick solution is to check not the printed representation of point.x but if point.x is equal to 4.7. So...
>>> point = Point(4.7, 8.2)
>>> point.x == 4.7
True
Or better:
>>> point = Point(4.7, 8.2)
>>> eps = 2**-53 #get epsilon for standard double precision number
>>> -eps <= point.x - 4.7 <= eps
True
Where eps is the maximum value for rounding errors in floating-point arithmetic. For details on epsilon, see here.
EDIT: -eps <= point.x - 4.7 <= eps is equivalent to abs(point.x - 4.7) <= eps. I only add this because not everyone is familiar with Python's chaining of comparison operators.
EDIT 2: Since you mentioned numpy, numpy has a method to get the eps without calculating it yourself. Use eps = numpy.finfo(float).eps instead of 2**-53 if you're using numpy. Note that the numpy epsilon is for some reason bigger than it should be and is equal to 2**-52 rather than 2**-53. I have no idea why this is.

When working with floating point numbers, the common approach goes like this:
a == b if abs(a-b) <= eps, where eps is the required precision.
In programming contests, eps is given along with the problem to solve.
My advice is to establish an accuracy that you need for your stuff, and use it
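The comparison rule above can be written as a small helper. This is a sketch: nearly_equal and its default tolerance are choices made here for illustration, and math.isclose is only available from Python 3.5 on, so it would not exist on the 2.6 interpreter mentioned in the question.

```python
import math

def nearly_equal(a, b, eps=1e-9):
    """Absolute-tolerance comparison: a == b if abs(a - b) <= eps."""
    return abs(a - b) <= eps

total = 0.1 + 0.2
print(total == 0.3)               # False: the sum is 0.30000000000000004
print(nearly_equal(total, 0.3))   # True
print(math.isclose(total, 0.3))   # True: relative tolerance, default 1e-09
```

An absolute eps only makes sense relative to the size of the numbers involved; for values spanning many magnitudes a relative tolerance such as the one math.isclose uses is usually the better fit.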

You get a different behavior because print truncates numbers:
In [1]: 1.23456789012334
Out[1]: 1.23456789012334
In [2]: print 1.23456789012334
1.23456789012
Note, at the precision used in Python's floats:
In [3]: 4.7 == 4.7000000000000002
Out[3]: True
This is because floats have a limited (relative) precision because they use a finite number of (binary) digits to represent real numbers. Thus, as above, different decimal representations of a given number can actually be equal for Python, after being approximated by the closest float. This is a general property of floating point numbers.
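The two displays seen on Python 2.6 can be reproduced on any version with explicit format strings. The sketch below assumes, as far as I know, that 2.6 used 17 significant digits for repr and 12 for str; on Python 3 both repr and str print the shortest string that round-trips.

```python
x = 4.7

print('%.17g' % x)   # 4.7000000000000002  (what repr showed on 2.6)
print('%.12g' % x)   # 4.7                 (what print showed on 2.6)

# Both strings name the same float, so no information differs in memory.
print(float('4.7000000000000002') == x)   # True

print('%.12g' % 1.23456789012334)         # 1.23456789012
```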

This comprehensive guide explains everything.
Here are Python-specific explanations.

Python:Which way gives better precision

Is there any difference in precision between one time assignment:
res=n/k
and multiple assignment in for cycle:
for i in range(n):
    res+=1/k
?

Floating-point division a/b is not mathematical division a ÷ b, except in very rare* circumstances.
Generally, floating point division a/b is a ÷ b + ε.
This is true for two reasons.
Float numbers (except in rare cases) are an approximation of the decimal number.
a is a + ε_a.
b is b + ε_b.
Float numbers use a base 2 encoding of the digits to the right of the decimal place. When you write 3.1, this is expanded to a base-2 approximation that differs from the real value by a small amount.
Real decimal numbers have the same problem, by the way. Write down the decimal expansion of 1/3. Oops. You have to stop writing decimal places at some point. Binary floating point numbers have the same problem.
Division has a fixed number of binary places, meaning the answer is truncated. If there's a repeating binary pattern, it gets chopped. In rare cases, this doesn't matter. In general, you've introduced error by doing division.
Therefore, when you do something like repeatedly add 1/k values you're computing
1 ÷ k + ε
And adding those up. Your result (if you had the right range) would be
n × (1 ÷ k + ε) = n ÷ k + n × ε
You've multiplied the small error, ε, by n. Making it a big error. (Except in rare cases.)
This is bad. Very bad. All floating point division introduces an error. Your job as a programmer is to do the algebra to avoid or defer division to prevent this. Good software design means good algebra to prevent errors being introduced by the division operator.
[* The rare cases. In rare cases, the small error happens to be zero. The rare cases occur when your floating point values are small whole numbers or fractions that are sums of powers of two 1/2, 1/4, 1/8, etc. In the rare case that you have a benign number with a benign fractional part, the error will be zero.]
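The accumulation of error can be observed directly. In this sketch n and k are arbitrary example values, and math.fsum is shown only as one standard-library way to add many floats with a single rounding at the end.

```python
import math

n, k = 5000, 10

direct = float(n) / k        # one division, one rounding

looped = 0.0
for _ in range(n):
    looped += 1.0 / k        # n additions, each one rounds again

print(direct)                # 500.0
print(looped == direct)      # False: the small errors have piled up
print(abs(looped - direct))  # tiny, but not zero

# fsum tracks the lost low-order bits and rounds once at the end
print(math.fsum([1.0 / k] * n) == direct)   # True
```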

Sure, they are different, because of how floating point division works.
>>> res = 0
>>> for x in xrange(5000): res += 0.1
...
>>> res == 5000 * 0.1
False
There's a good explanation in the python official tutorial.

Well if k divides n then definitely the first one is more precise :-) To be serious, if the division is floating point and n > 1 then the first one will be more precise anyway though they will probably give different results, as nosklo said.
BTW, in Python 2.6 the division is integer by default so you'll have very different results. 1/k will always give 0 unless k <= 1.

Floating point arithmetic has representation and roundoff errors. For the types of data floating point numbers are intended to represent, real numbers of reasonable size, these errors are generally acceptable.
If you want to calculate the quotient of two numbers, the right way is simply to say result = n / k (beware if these are both integers and you have not said from __future__ import division, this is not what you may expect). The second way is silly, error-prone, and ugly.
There is some discussion of floating point inexactness in the Python tutorial: http://docs.python.org/tutorial/floatingpoint.html

Even if we charitably assume a floating-point division, there's very definitely a difference in precision; the for loop is executed n - 1 times!
assert (n-1) / k != n / k
Also depends on what res is initialised to in the second case :-)

Certainly there is a difference if you use floating point numbers, unless the Python interpreter/compiler you are using is capable of optimizing away the loop (Maybe Jython or IronPython might be able to? C compilers are pretty good at this).
If you actually want these two approaches to be the same precision though, and you are using integers for your numerator and denominator, you can use the python fractions package
from fractions import Fraction
n,k = 999,1000
res = Fraction(0,1)
for i in range(0,n):
    res += Fraction(1,k)
print float(res)
