I'm new to python, and I'm trying to understand the floating point approximation and how floats are represented in Python.
For example:
>>> .1 + .1 + .1 == .3
False
>>> .25 + .25 + .25 == 0.75
True
I understand those two cases, but what about these?
>>> .1 + .1 + .1 +.1 == .4
True
>>> .1 + .1 == .2
True
Is it just a coincidence that the values of .1+.1+.1+.1 and .1+.1 happen to equal .4 and .2 respectively, even though these numbers are not represented exactly in Python? Are there other situations like this, and is there any way to identify them?
Thank you!
Short answer: yes, it's just a coincidence.
Floats are represented as 64-bit IEEE 754 floating-point numbers in Python, also called double precision.
https://en.wikipedia.org/wiki/IEEE_754#Basic_and_interchange_formats
When you write 0.3, Python finds the closest IEEE number to 0.3.
When you add several numbers, these small errors in the last digits accumulate, and you end up with a different number. Sometimes that happens sooner, sometimes later. Sometimes the errors cancel each other out; often they do not.
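To see the accumulation concretely, here is a minimal sketch of my own (plain standard-library Python) that sums 0.1 ten times; math.fsum is shown for contrast because it compensates for the intermediate rounding errors:

import math

total = 0.0
for _ in range(10):
    total += 0.1              # each addition can round, and the errors pile up

print(total)                  # 0.9999999999999999
print(total == 1.0)           # False
print(math.fsum([0.1] * 10))  # 1.0 -- fsum tracks the lost low-order bits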
This answer is a good read:
Is floating point math broken?
To go deeper into your examples, you would need to look at the bit representation of these numbers. It gets complicated, though, because one also needs to look at how rounding and addition work.
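If you want to peek at those bits yourself, here is a small sketch using the standard struct module (the helper name bits is my own):

import struct

def bits(x: float) -> str:
    """Return the raw 64 IEEE 754 bits of a Python float as a string."""
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(n, "064b")

print(bits(0.3))              # the double nearest to 0.3
print(bits(0.1 + 0.1 + 0.1))  # one ulp larger than the double above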
Floating-point numbers are represented in computer hardware as base 2 (binary) fractions. For example, the decimal fraction 0.125 has value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction 0.001 has value 0/2 + 0/4 + 1/8. These two fractions have identical values, the only real difference being that the first is written in base 10 fractional notation, and the second in base 2.
Unfortunately, most decimal fractions cannot be represented exactly as binary fractions. A consequence is that, in general, the decimal floating-point numbers you enter are only approximated by the binary floating-point numbers actually stored in the machine.
One illusion may beget another. For example, since 0.1 is not exactly 1/10, summing three values of 0.1 may not yield exactly 0.3, either:
>>> .1 + .1 + .1 == .3
False
Also, since 0.1 cannot get any closer to the exact value of 1/10, and 0.3 cannot get any closer to the exact value of 3/10, pre-rounding with the round() function cannot help:
>>> round(.1, 1) + round(.1, 1) + round(.1, 1) == round(.3, 1)
False
Though the numbers cannot be made closer to their intended exact values, the round() function can be useful for post-rounding so that results with inexact values become comparable to one another:
>>> round(.1 + .1 + .1, 10) == round(.3, 10)
True
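As an aside (my suggestion, not part of the quoted tutorial), math.isclose has been the usual idiom for such comparisons since Python 3.5:

import math

print(0.1 + 0.1 + 0.1 == 0.3)              # False
print(math.isclose(0.1 + 0.1 + 0.1, 0.3))  # True; the default rel_tol is 1e-09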
Related
I know that most decimals don't have an exact floating point representation (Is floating point math broken?).
But I don't see why 4*0.1 is printed nicely as 0.4 while 3*0.1 isn't, when both values actually have ugly decimal representations:
>>> 3*0.1
0.30000000000000004
>>> 4*0.1
0.4
>>> from decimal import Decimal
>>> Decimal(3*0.1)
Decimal('0.3000000000000000444089209850062616169452667236328125')
>>> Decimal(4*0.1)
Decimal('0.40000000000000002220446049250313080847263336181640625')
The simple answer is because 3*0.1 != 0.3 due to quantization (roundoff) error (whereas 4*0.1 == 0.4 because multiplying by a power of two is usually an "exact" operation). Python tries to find the shortest string that would round to the desired value, so it can display 4*0.1 as 0.4 as these are equal, but it cannot display 3*0.1 as 0.3 because these are not equal.
You can use the .hex method in Python to view the internal representation of a number (basically, the exact binary floating point value, rather than the base-10 approximation). This can help to explain what's going on under the hood.
>>> (0.1).hex()
'0x1.999999999999ap-4'
>>> (0.3).hex()
'0x1.3333333333333p-2'
>>> (0.1*3).hex()
'0x1.3333333333334p-2'
>>> (0.4).hex()
'0x1.999999999999ap-2'
>>> (0.1*4).hex()
'0x1.999999999999ap-2'
0.1 is 0x1.999999999999a times 2^-4. The "a" at the end means the digit 10 - in other words, 0.1 in binary floating point is very slightly larger than the "exact" value of 0.1 (because the final 0x0.99 is rounded up to 0x0.a). When you multiply this by 4, a power of two, the exponent shifts up (from 2^-4 to 2^-2) but the number is otherwise unchanged, so 4*0.1 == 0.4.
However, when you multiply by 3, the tiny little difference between 0x0.99 and 0x0.a0 (0x0.07) magnifies into a 0x0.15 error, which shows up as a one-digit error in the last position. This causes 0.1*3 to be very slightly larger than the rounded value of 0.3.
Python 3's float repr is designed to be round-trippable, that is, the value shown should be exactly convertible into the original value (float(repr(f)) == f for all floats f). Therefore, it cannot display 0.3 and 0.1*3 exactly the same way, or the two different numbers would end up the same after round-tripping. Consequently, Python 3's repr engine chooses to display one with a slight apparent error.
repr (and str in Python 3) will put out as many digits as required to make the value unambiguous. In this case the result of the multiplication 3*0.1 isn't the closest value to 0.3 (0x1.3333333333333p-2 in hex), it's actually one LSB higher (0x1.3333333333334p-2) so it needs more digits to distinguish it from 0.3.
On the other hand, the multiplication 4*0.1 does get the closest value to 0.4 (0x1.999999999999ap-2 in hex), so it doesn't need any additional digits.
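Since Python 3.9 you can also query that one-LSB gap directly with math.ulp; a quick sketch:

import math

# The result of 3*0.1 sits exactly one unit in the last place above
# the double nearest 0.3:
print(math.ulp(0.3))                    # 5.551115123125783e-17
print(0.3 + math.ulp(0.3) == 3 * 0.1)   # True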
You can verify this quite easily:
>>> 3*0.1 == 0.3
False
>>> 4*0.1 == 0.4
True
I used hex notation above because it's nice and compact and shows the bit difference between the two values. You can do this yourself using e.g. (3*0.1).hex(). If you'd rather see them in all their decimal glory, here you go:
>>> Decimal(3*0.1)
Decimal('0.3000000000000000444089209850062616169452667236328125')
>>> Decimal(0.3)
Decimal('0.299999999999999988897769753748434595763683319091796875')
>>> Decimal(4*0.1)
Decimal('0.40000000000000002220446049250313080847263336181640625')
>>> Decimal(0.4)
Decimal('0.40000000000000002220446049250313080847263336181640625')
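You can also redo the arithmetic exactly with fractions.Fraction, which converts a float to the exact rational number it stores; a quick sketch:

from fractions import Fraction

f = Fraction(0.1)              # the exact rational value of the double 0.1
print(f)                       # 3602879701896397/36028797018963968

# Compare the exact products with the doubles 0.3 and 0.4:
print(3 * f == Fraction(0.3))  # False: exact 3*f is not the double nearest 0.3
print(4 * f == Fraction(0.4))  # True: exact 4*f is precisely the double 0.4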
Here's a simplified conclusion from other answers.
If you inspect a float at Python's interactive prompt or print it, it goes through the repr function, which creates its string representation.
Starting with version 3.2, Python's str and repr use a complex rounding scheme that prefers nice-looking decimals if possible, but uses more digits where necessary to guarantee a bijective (one-to-one) mapping between floats and their string representations.
This scheme guarantees that repr(float(s)) looks nice for simple decimals, even if they can't be represented precisely as floats (e.g. when s = "0.1").
At the same time it guarantees that float(repr(x)) == x holds for every float x.
This is not really specific to Python's implementation; it should apply to any float-to-decimal-string conversion function.
A floating point number is essentially a binary number, but in scientific notation with a fixed limit of significant figures.
The reciprocal of any number that has a prime factor not shared with the base will always have a recurring representation after the radix point. For example, 1/7 has a prime factor, 7, that is not shared with 10, and therefore has a recurring decimal representation; the same is true in base 2 for 1/10, whose denominator has prime factors 2 and 5, the latter not shared with base 2. This means that 0.1 cannot be represented exactly by a finite number of bits after the radix point.
Since 0.1 has no exact representation, a function that converts the approximation to a decimal string will usually try to approximate certain values so that it doesn't produce unintuitive results like 0.1000000000004121.
Since floating point is in scientific notation, any multiplication by a power of the base only affects the exponent part of the number. For example, 1.231e+2 * 100 = 1.231e+4 in decimal notation, and likewise 1.00101010e11 * 100 = 1.00101010e101 in binary notation (here 100 in binary is 4, so only the exponent changes). Multiplying by a non-power of the base affects the significant digits as well; for example, 1.2e1 * 3 = 3.6e1.
Depending on the algorithm used, it may try to guess common decimals based on the significant figures only. Both 0.1 and 0.4 have the same significant figures in binary, because their floats are essentially truncations of (8/5)(2^-4) and (8/5)(2^-2) respectively. If the algorithm identifies the 8/5 significant-figure pattern as the decimal 1.6, then it will work on 0.1, 0.2, 0.4, 0.8, etc. It may also have magic significant-figure patterns for other combinations, such as the float 3 divided by the float 10 and other magic patterns statistically likely to be formed by division by 10.
In the case of 3*0.1, the last few significant figures will likely differ from those of a float 3 divided by a float 10, so the algorithm could fail to recognize the magic pattern for the 0.3 constant, depending on its tolerance for precision loss.
Edit:
https://docs.python.org/3.1/tutorial/floatingpoint.html
Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction. For example, the numbers 0.1 and 0.10000000000000001 and 0.1000000000000000055511151231257827021181583404541015625 are all approximated by 3602879701896397 / 2 ** 55. Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant eval(repr(x)) == x.
In fact there is no tolerance for precision loss: if the float x (0.3) is not exactly equal to the float y (0.1*3), then repr(x) is not equal to repr(y).
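You can watch several decimal literals collapse onto one float, and repr pick the shortest spelling, with a quick check:

# Different decimal literals, same underlying double:
print(0.1 == 0.10000000000000001)                                         # True
print(0.1 == 0.1000000000000000055511151231257827021181583404541015625)   # True

# repr chooses the shortest string that round-trips to that double:
print(repr(0.1000000000000000055511151231257827021181583404541015625))    # '0.1'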
Today it was pointed out to me that 0.99 can't be represented by a float:
num = 0.99
print('{0:.20f}'.format(num))
prints 0.98999999999999999112. I'm fine with this concept.
So then how does python know to do this:
num = 0.99
print(num)
prints 0.99.
How does Python remember the number of decimal places one used to specify a float?
It doesn't. Try this:
num = 0.990
print(num)
Notice that that also outputs 0.99, not 0.990.
I can't speak specifically for the print function, but it's common in environments that have IEEE-754 double-precision binary floating point numbers to use an algorithm that outputs only as many digits as are needed to differentiate the number from its closest "representable" neighbour. But it's much more complicated than it might seem on the surface. See this paper on number rounding for details (associated code here and here).
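To get a feel for the "only as many digits as needed" idea, here is a toy sketch of my own (much cruder than the algorithm in that paper): try increasing precision until the string round-trips back to the same float.

def shortest_roundtrip(x: float) -> str:
    """Naive stand-in for repr: fewest significant digits that round-trip."""
    for digits in range(1, 18):       # 17 significant digits always suffice
        s = format(x, f".{digits}g")
        if float(s) == x:             # does this string recover x exactly?
            return s
    return format(x, ".17g")          # not reached for finite floats

print(shortest_roundtrip(0.99))       # '0.99'
print(shortest_roundtrip(3 * 0.1))    # '0.30000000000000004'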
Sam Mason provided some great links related to this:
From Floating Point Arithmetic: Issues and Limitations
This bears out the "closest representable" thing above. It starts by describing the issue in base 10 that you can't accurately represent one-third (1/3). 0.3 comes close, 0.33 comes closer, 0.333 comes even closer, but really 1/3 is an infinitely repeating series of 0.3 followed by 3s forever. In the same way, binary float point (which stores the number as a base 2 fraction rather than a base 10 fraction) can't exactly represent 0.1 (for instance), like 1/3 in base 10 it's an infinitely repeating series of digits in base 2 and anything else is an approximation. It then continues:
In the same way, no matter how many base 2 digits you’re willing to use, the decimal value 0.1 cannot be represented exactly as a base 2 fraction. In base 2, 1/10 is the infinitely repeating fraction
0.0001100110011001100110011001100110011001100110011...
Stop at any finite number of bits, and you get an approximation. On most machines today, floats are approximated using a binary fraction with the numerator using the first 53 bits starting with the most significant bit and with the denominator as a power of two. In the case of 1/10, the binary fraction is 3602879701896397 / 2 ** 55 which is close to but not exactly equal to the true value of 1/10.
Many users are not aware of the approximation because of the way values are displayed. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine. On most machines, if Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display
>>> 0.1
0.1000000000000000055511151231257827021181583404541015625
That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead
>>> 1 / 10
0.1
Just remember, even though the printed result looks like the exact value of 1/10, the actual stored value is the nearest representable binary fraction.
The code for it in CPython
An issue discussing it on the issues list
It's not remembering. It's looking at the value it has and deciding the best way to present it, which in this case it decides is 0.99, because the stored value is as close to 0.99 as a float can get.
If you print(0.98999999999999999112) it will show 0.99, even though that is not the number of decimal places you used to specify it.
>>> .1 + .1 + .1 + .1 == .4
True
>>> .1 + .1 + .1 == .3
False
The above is output from the Python interpreter. I understand that floating-point arithmetic is done in base 2 and that the numbers are stored in binary, which is why differences like the above arise.
Now, I found that .4 = .011(0011) (the part in parentheses repeats infinitely; this is the binary representation of the fraction). Since this cannot be stored exactly, an approximate value is stored. Similarly, 0.3 = .01(0011). So neither 0.4 nor 0.3 can be stored exactly internally.
But then why does Python return True for the first and False for the second? It seems the two cannot even be compared, since neither is stored exactly.
_______________________________________________________________________________
I did some research and found the following:
>>> Decimal(.4)
Decimal('0.40000000000000002220446049250313080847263336181640625')
>>> Decimal(.1+.1+.1+.1)
Decimal('0.40000000000000002220446049250313080847263336181640625')
>>> Decimal(.1+.1+.1)
Decimal('0.3000000000000000444089209850062616169452667236328125')
>>> Decimal(.3)
Decimal('0.299999999999999988897769753748434595763683319091796875')
>>> Decimal(.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
This probably explains why the additions are happening the way they are, assuming that Decimal is giving the exact output of the number stored underneath. But then why does Python return True for the first and False for the second? It seems the two cannot even be compared, since neither is stored exactly.
Floating-point numbers absolutely can be compared for equality. Problems arise only when you expect exact equality to be preserved by an approximate computation. But the semantics of floating-point equality comparison is perfectly well defined.
When you write 0.1 in a program, this is rounded to the nearest IEEE 754 binary64 floating-point number, which is the real number 0.1000000000000000055511151231257827021181583404541015625, or 0x1.999999999999ap−4 in hexadecimal notation (the ‘p−4’ part means × 2⁻⁴). Every (normal) binary64 floating-point number is a real number of the form ±2ⁿ × (1 + 𝑓/2⁵²), where 𝑛 and 𝑓 are integers with −1022 ≤ 𝑛 ≤ 1023 and 0 ≤ 𝑓 < 2⁵²; this one is the nearest such number to 0.1.
When you add that to itself three times in floating-point arithmetic, the exact result 0.3000000000000000166533453693773481063544750213623046875 is rounded to 0.3000000000000000444089209850062616169452667236328125 or 0x1.3333333333334p−2 (since there are only 53 bits of precision available), but when you write 0.3, you get 0.299999999999999988897769753748434595763683319091796875 or 0x1.3333333333333p−2 which is slightly closer to 0.3.
However, four times 0.1000000000000000055511151231257827021181583404541015625 or 0x1.999999999999ap−4 is 0.4000000000000000222044604925031308084726333618164062500 or 0x1.999999999999ap−2, which is also the closest floating-point number to 0.4 and hence is what you get when you write 0.4 in a program. So when you write 4*0.1, the result is exactly the same floating-point number as when you write 0.4.
Now, you didn't write 4*0.1—instead you wrote .1 + .1 + .1 + .1. But it turns out there is a theorem in binary floating-point arithmetic that x + x + x + x—that is, fl(fl(fl(𝑥 + 𝑥) + 𝑥) + 𝑥)—always yields exactly 4𝑥 without rounding (except when it overflows), in spite of the fact that x + x + x or fl(fl(𝑥 + 𝑥) + 𝑥) = fl(3𝑥) may be rounded and not exactly equal to 3𝑥. (Note that fl(𝑥 + 𝑥) = fl(2𝑥) is always equal to 2𝑥, again ignoring overflow, because it's just a matter of adjusting the exponent.)
It just happens that any rounding error committed by adding the fourth term cancels out whatever rounding error may have been committed by adding the third!
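If you want empirical reassurance about that theorem, here is a quick sketch of my own that spot-checks it on random doubles, using Fraction for exact comparison (the range is chosen to stay well clear of overflow):

import random
from fractions import Fraction

# Spot-check: fl(fl(fl(x+x)+x)+x) should equal 4*x exactly, barring overflow.
for _ in range(10_000):
    x = random.uniform(-1e300, 1e300)
    assert Fraction(x + x + x + x) == 4 * Fraction(x)
print("no counterexample found in 10,000 trials")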
I have a number that I have to deal with that I hate (and I am sure there are others).
It is
a17=0.0249999999999999
a18=0.02499999999999999
Case 1:
round(a17,2) gives 0.02
round(a18,2) gives 0.03
Case 2:
round(a17,3)=round(a18,3)=0.025
Case 3:
round(round(a17,3),2)=round(round(a18,3),2)=0.03
but when these numbers are in a data frame...
Case 4:
df = pd.DataFrame([a17, a18])
np.round(df.round(3), 2)   # gives [0.02, 0.02]
Why are the answers I get the same as in Case 1?
When you work with floats, you usually cannot get the exact value, only an approximation, because of how floats are organized in memory. Keep in mind that when you print a float, you always print a decimal approximation, and that approximation is not the same as the stored value. It takes up to 17 significant decimal digits to pin down the exact stored value.
That is why:
>>> round(0.0249999999999999999,2)
0.03
>>> round(0.024999999999999999,2)
0.02
This is true for most programming languages (Fortran, Python, C++, etc.).
Here is a fragment of the Python documentation
(https://docs.python.org/3/tutorial/floatingpoint.html). In base 2, 1/10 is the infinitely repeating fraction
0.0001100110011001100110011001100110011001100110011...
Stop at any finite number of bits, and you get an approximation. On most machines today, floats are approximated using a binary fraction with the numerator using the first 53 bits starting with the most significant bit and with the denominator as a power of two. In the case of 1/10, the binary fraction is 3602879701896397 / 2 ** 55 which is close to but not exactly equal to the true value of 1/10.
Many users are not aware of the approximation because of the way values are displayed. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine. On most machines, if Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display
>>> 0.1
0.1000000000000000055511151231257827021181583404541015625
That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead
>>> 1 / 10
0.1
Just remember, even though the printed result looks like the exact value of 1/10, the actual stored value is the nearest representable binary fraction.
Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction. For example, the numbers 0.1 and 0.10000000000000001 and 0.1000000000000000055511151231257827021181583404541015625 are all approximated by 3602879701896397 / 2 ** 55. Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant eval(repr(x)) == x.
Now let us look at a fragment of the NumPy documentation
(https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.around.html#numpy.around).
Note that np.round uses np.around, which the NumPy documentation describes as follows:
For values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc. Results may also be surprising due to the inexact representation of decimal fractions in the IEEE floating point standard [R9] and errors introduced when scaling by powers of ten.
Conclusions:
In your case, np.round simply rounded 0.025 to 0.02 by the round-half-to-even rule described above (source: the NumPy documentation).
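Here is a sketch of my own showing the two behaviors side by side, assuming NumPy's documented scale-round-unscale approach; the 0.025 case mirrors what df.round(3) produced above:

import numpy as np

# NumPy scales by 10**2, rounds half to even, then unscales:
# 0.025 * 100 rounds to exactly 2.5, and 2.5 rounds to the even value 2.
print(np.round(0.025, 2))  # 0.02

# Python's built-in round() works from the stored value, which is actually
# 0.025000000000000001387..., i.e. strictly above 0.025:
print(round(0.025, 2))     # 0.03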
I understand that floating number has their limitations so this can be expected:
>>> 0.1 + 0.2 == 0.3
False
But why is this valid? Computers can't store 0.45 or 0.55 reliably either, right?
>>> 0.45 + 0.55 == 1.00
True
I want to know how, in the first case, the computer couldn't correct its inaccuracy, while in the latter one it could.
As you know, most decimal numbers can't be stored exactly. That's true for all of your numbers above except 1.0.
But they get stored with a high accuracy. Instead of 0.3, some very close representable number gets used. It's not only very close, it's the closest such number.
When you compute 0.1 + 0.2, then another representable number gets computed, which is also very close to 0.3. You are "unlucky" and it differs from the closest possible representable number.
There's no real luck involved, both 0.1 and 0.2 get represented by a slightly larger number. When added, the two errors add as they're of the same sign and you get something like 0.30000000000000004.
With 0.45 + 0.55, both addends are likewise represented slightly high, but their combined error is less than half an ulp of 1.0, so the final addition rounds to exactly 1.0 and the errors effectively cancel out.
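To see this, decimal.Decimal exposes the exact stored values; a minimal sketch:

from decimal import Decimal

print(Decimal(0.45))         # slightly above 0.45
print(Decimal(0.55))         # slightly above 0.55
print(Decimal(0.45 + 0.55))  # exactly 1: the combined error is under half an
                             # ulp of 1.0, so the addition rounds back to 1.0
print(Decimal(0.1 + 0.2))    # 0.3000000000000000444... -- here the error shows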