I am trying to find the cause of this result:
import numpy
result1 = numpy.rint(1.5)
result2 = numpy.rint(6.5)
print result
The output:
result1-> 2
result2-> 6
This is odd: result1 is correct but I result2 is not (It has to be 7 because rint rounds any float to the nearest integer).
Any idea? (THANKS!)
From numpy's documentation on numpy.around, equivalent to numpy.round, which supposedly also is relevant for numpy.rint:
For values exactly halfway between rounded decimal values, Numpy
rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5
and 0.5 round to 0.0, etc. Results may also be surprising due to the
inexact representation of decimal fractions in the IEEE floating point
standard [R9] and errors introduced when scaling by powers of ten.
Also relevant: While for large numbers there might be representation errors, for small values half integers are exactly representable in binary-base floating points, in particular 1.5 and 6.5 are exactly representable in standard single-precision floats. Without the preference for either odd, even, lower, upper integers or any other scheme one would have undefined behaviour here.
As #wim points out in the comments the behaviour of Python's build-in round is different. It rounds away from zero: It prefers upper integers for positive inputs and lower integers for negative inputs. (see http://docs.python.org/2/library/functions.html#round)
I think this is the rule of the thumb - when you have a float midway between two integers, like 1.5 lies midway between 1 and 2 and since both choices are equally good, we prefer rounding to the even number(which is 2 in this case) and for 6.5, which lies midway between 6 and 7, 6 is the closest even number.
Related
In this question's most general form, I want to know how I can guarantee that int(x * y) (with x and y both being floats gives me the arithmetically "correct" answer when I know the result to be a round number. For example: 1.5 * 2.0 = 3, or 16.0 / 2.0 = 8. I worry this could be a problem because int can round down if there is some floating point error. For example: int(16.0 - 5 * sys.float_info.epsilon) gives 15.
And specializing the question a bit, I could also ask about division between two ints where I know the result is a round number. For example 16 / 2 = 8. If this specialization changes the answer to the more general question, I'd like to know how.
By the way, I know that I could do int(round(x * y). I'm just wondering if there's a more direct built-in, or if there's some proven guarantee that means I don't have to worry about this in the first place.
If both inputs are exact, and the mathematically correct result is representable, then the output is also guaranteed to be exact. This is only true for a limited number of basic floating-point operations, but * and / are such operations.
Note that the "both inputs are exact" condition is only satisfiable for dyadic rationals. Numbers like 1.5 are fine, but numbers like 0.1 cannot be exactly represented in binary floating point. Also, floating point precision limits apply to integers, too, not just fractional values - very large integers may not be exactly representable, due to requiring more precision than a Python float has.
I know that most decimals don't have an exact floating point representation (Is floating point math broken?).
But I don't see why 4*0.1 is printed nicely as 0.4, but 3*0.1 isn't, when
both values actually have ugly decimal representations:
>>> 3*0.1
0.30000000000000004
>>> 4*0.1
0.4
>>> from decimal import Decimal
>>> Decimal(3*0.1)
Decimal('0.3000000000000000444089209850062616169452667236328125')
>>> Decimal(4*0.1)
Decimal('0.40000000000000002220446049250313080847263336181640625')
The simple answer is because 3*0.1 != 0.3 due to quantization (roundoff) error (whereas 4*0.1 == 0.4 because multiplying by a power of two is usually an "exact" operation). Python tries to find the shortest string that would round to the desired value, so it can display 4*0.1 as 0.4 as these are equal, but it cannot display 3*0.1 as 0.3 because these are not equal.
You can use the .hex method in Python to view the internal representation of a number (basically, the exact binary floating point value, rather than the base-10 approximation). This can help to explain what's going on under the hood.
>>> (0.1).hex()
'0x1.999999999999ap-4'
>>> (0.3).hex()
'0x1.3333333333333p-2'
>>> (0.1*3).hex()
'0x1.3333333333334p-2'
>>> (0.4).hex()
'0x1.999999999999ap-2'
>>> (0.1*4).hex()
'0x1.999999999999ap-2'
0.1 is 0x1.999999999999a times 2^-4. The "a" at the end means the digit 10 - in other words, 0.1 in binary floating point is very slightly larger than the "exact" value of 0.1 (because the final 0x0.99 is rounded up to 0x0.a). When you multiply this by 4, a power of two, the exponent shifts up (from 2^-4 to 2^-2) but the number is otherwise unchanged, so 4*0.1 == 0.4.
However, when you multiply by 3, the tiny little difference between 0x0.99 and 0x0.a0 (0x0.07) magnifies into a 0x0.15 error, which shows up as a one-digit error in the last position. This causes 0.1*3 to be very slightly larger than the rounded value of 0.3.
Python 3's float repr is designed to be round-trippable, that is, the value shown should be exactly convertible into the original value (float(repr(f)) == f for all floats f). Therefore, it cannot display 0.3 and 0.1*3 exactly the same way, or the two different numbers would end up the same after round-tripping. Consequently, Python 3's repr engine chooses to display one with a slight apparent error.
repr (and str in Python 3) will put out as many digits as required to make the value unambiguous. In this case the result of the multiplication 3*0.1 isn't the closest value to 0.3 (0x1.3333333333333p-2 in hex), it's actually one LSB higher (0x1.3333333333334p-2) so it needs more digits to distinguish it from 0.3.
On the other hand, the multiplication 4*0.1 does get the closest value to 0.4 (0x1.999999999999ap-2 in hex), so it doesn't need any additional digits.
You can verify this quite easily:
>>> 3*0.1 == 0.3
False
>>> 4*0.1 == 0.4
True
I used hex notation above because it's nice and compact and shows the bit difference between the two values. You can do this yourself using e.g. (3*0.1).hex(). If you'd rather see them in all their decimal glory, here you go:
>>> Decimal(3*0.1)
Decimal('0.3000000000000000444089209850062616169452667236328125')
>>> Decimal(0.3)
Decimal('0.299999999999999988897769753748434595763683319091796875')
>>> Decimal(4*0.1)
Decimal('0.40000000000000002220446049250313080847263336181640625')
>>> Decimal(0.4)
Decimal('0.40000000000000002220446049250313080847263336181640625')
Here's a simplified conclusion from other answers.
If you check a float on Python's command line or print it, it goes through function repr which creates its string representation.
Starting with version 3.2, Python's str and repr use a complex rounding scheme, which prefers
nice-looking decimals if possible, but uses more digits where
necessary to guarantee bijective (one-to-one) mapping between floats
and their string representations.
This scheme guarantees that value of repr(float(s)) looks nice for simple
decimals, even if they can't be
represented precisely as floats (eg. when s = "0.1").
At the same time it guarantees that float(repr(x)) == x holds for every float x
Not really specific to Python's implementation but should apply to any float to decimal string functions.
A floating point number is essentially a binary number, but in scientific notation with a fixed limit of significant figures.
The inverse of any number that has a prime number factor that is not shared with the base will always result in a recurring dot point representation. For example 1/7 has a prime factor, 7, that is not shared with 10, and therefore has a recurring decimal representation, and the same is true for 1/10 with prime factors 2 and 5, the latter not being shared with 2; this means that 0.1 cannot be exactly represented by a finite number of bits after the dot point.
Since 0.1 has no exact representation, a function that converts the approximation to a decimal point string will usually try to approximate certain values so that they don't get unintuitive results like 0.1000000000004121.
Since the floating point is in scientific notation, any multiplication by a power of the base only affects the exponent part of the number. For example 1.231e+2 * 100 = 1.231e+4 for decimal notation, and likewise, 1.00101010e11 * 100 = 1.00101010e101 in binary notation. If I multiply by a non-power of the base, the significant digits will also be affected. For example 1.2e1 * 3 = 3.6e1
Depending on the algorithm used, it may try to guess common decimals based on the significant figures only. Both 0.1 and 0.4 have the same significant figures in binary, because their floats are essentially truncations of (8/5)(2^-4) and (8/5)(2^-6) respectively. If the algorithm identifies the 8/5 sigfig pattern as the decimal 1.6, then it will work on 0.1, 0.2, 0.4, 0.8, etc. It may also have magic sigfig patterns for other combinations, such as the float 3 divided by float 10 and other magic patterns statistically likely to be formed by division by 10.
In the case of 3*0.1, the last few significant figures will likely be different from dividing a float 3 by float 10, causing the algorithm to fail to recognize the magic number for the 0.3 constant depending on its tolerance for precision loss.
Edit:
https://docs.python.org/3.1/tutorial/floatingpoint.html
Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction. For example, the numbers 0.1 and 0.10000000000000001 and 0.1000000000000000055511151231257827021181583404541015625 are all approximated by 3602879701896397 / 2 ** 55. Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant eval(repr(x)) == x.
There is no tolerance for precision loss, if float x (0.3) is not exactly equal to float y (0.1*3), then repr(x) is not exactly equal to repr(y).
I've been reading a book Write Great code - Understanding the Machine. In the section about rounding it says:
Numbers should be rounded to the smallest bigger number if the decimal bit value is more than or equal half the total decimal value that can be represented.
which means:
round(1.5) // equals 2
round(1.49) // equals 1
but when I tried this with Python:
x1 = 1.4999 # rounds to 1
x2 = 1.4999999999999999 # rounds to 2
print(round(x1))
print(round(x2))
the output was:
1
2
I tried the same thing with C# and Swift and it gave the same output. So I assume it's a language-agnostic topic.
But why does this happen?
My assumption is that the floating-point unit rounds the extra bits which convert the "1.4999999999999999999" to "1.5" before applying the programmer's rounding.
In x2 = 1.4999999999999999 and print(round(x2)), there are two operations that affect the value. The round function cannot operate directly on the number 1.4999999999999999 or the numeral “1.4999999999999999”. Its operand must be in the floating-point format that the Python implementation uses.
So, first, 1.4999999999999999 is converted to the floating-point format. Python is not strict about which floating-point format a Python implementation uses, but the IEEE-754 basic 64-bit binary format is common. In this format, the closest representable values to 1.4999999999999999 are 1.5 and 1.4999999999999997779553950749686919152736663818359375. The former is closer to 1.4999999999999999 than the latter is, so the former is used.
Thus, converting 1.4999999999999999 to the floating-point format produces 1.5. Then round(1.5) produces 2.
This is more of a numerical analysis rather than programming question, but I suppose some of you will be able to answer it.
In the sum two floats, is there any precision lost? Why?
In the sum of a float and a integer, is there any precision lost? Why?
Thanks.
In the sum two floats, is there any precision lost?
If both floats have differing magnitude and both are using the complete precision range (of about 7 decimal digits) then yes, you will see some loss in the last places.
Why?
This is because floats are stored in the form of (sign) (mantissa) × 2(exponent). If two values have differing exponents and you add them, then the smaller value will get reduced to less digits in the mantissa (because it has to adapt to the larger exponent):
PS> [float]([float]0.0000001 + [float]1)
1
In the sum of a float and a integer, is there any precision lost?
Yes, a normal 32-bit integer is capable of representing values exactly which do not fit exactly into a float. A float can still store approximately the same number, but no longer exactly. Of course, this only applies to numbers that are large enough, i. e. longer than 24 bits.
Why?
Because float has 24 bits of precision and (32-bit) integers have 32. float will still be able to retain the magnitude and most of the significant digits, but the last places may likely differ:
PS> [float]2100000050 + [float]100
2100000100
The precision depends on the magnitude of the original numbers. In floating point, the computer represents the number 312 internally as scientific notation:
3.12000000000 * 10 ^ 2
The decimal places in the left hand side (mantissa) are fixed. The exponent also has an upper and lower bound. This allows it to represent very large or very small numbers.
If you try to add two numbers which are the same in magnitude, the result should remain the same in precision, because the decimal point doesn't have to move:
312.0 + 643.0 <==>
3.12000000000 * 10 ^ 2 +
6.43000000000 * 10 ^ 2
-----------------------
9.55000000000 * 10 ^ 2
If you tried to add a very big and a very small number, you would lose precision because they must be squeezed into the above format. Consider 312 + 12300000000000000000000. First you have to scale the smaller number to line up with the bigger one, then add:
1.23000000000 * 10 ^ 15 +
0.00000000003 * 10 ^ 15
-----------------------
1.23000000003 <-- precision lost here!
Floating point can handle very large, or very small numbers. But it can't represent both at the same time.
As for ints and doubles being added, the int gets turned into a double immediately, then the above applies.
When adding two floating point numbers, there is generally some error. D. Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" describes the effect and the reasons in detail, and also how to calculate an upper bound on the error, and how to reason about the precision of more complex calculations.
When adding a float to an integer, the integer is first converted to a float by C++, so two floats are being added and error is introduced for the same reasons as above.
The precision available for a float is limited, so of course there is always the risk that any given operation drops precision.
The answer for both your questions is "yes".
If you try adding a very large float to a very small one, you will for instance have problems.
Or if you try to add an integer to a float, where the integer uses more bits than the float has available for its mantissa.
The short answer: a computer represents a float with a limited number of bits, which is often done with mantissa and exponent, so only a few bytes are used for the significant digits, and the others are used to represent the position of the decimal point.
If you were to try to add (say) 10^23 and 7, then it won't be able to accurately represent that result. A similar argument applies when adding a float and integer -- the integer will be promoted to a float.
In the sum two floats, is there any precision lost?
In the sum of a float and a integer, is there any precision lost? Why?
Not always. If the sum is representable with the precision you ask, and you won't get any precision loss.
Example: 0.5 + 0.75 => no precision loss
x * 0.5 => no precision loss (except if x is too much small)
In the general case, one add floats in slightly different ranges so there is a precision loss which actually depends on the rounding mode.
ie: if you're adding numbers with totally different ranges, expect precision problems.
Denormals are here to give extra-precision in extreme cases, at the expense of CPU.
Depending on how your compiler handle floating-point computation, results can vary.
With strict IEEE semantics, adding two 32 bits floats should not give better accuracy than 32 bits.
In practice it may requires more instruction to ensure that, so you shouldn't rely on accurate and repeatable results with floating-point.
In both cases yes:
assert( 1E+36f + 1.0f == 1E+36f );
assert( 1E+36f + 1 == 1E+36f );
The case float + int is the same as float + float, because a standard conversion is applied to the int. In the case of float + float, this is implementation dependent, because an implementation may choose to do the addition at double precision. There may be some loss when you store the result, of course.
In both cases, the answer is "yes". When adding an int to a float, the integer is converted to floating point representation before the addition takes place anyway.
To understand why, I suggest you read this gem: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Sorry, but I really don't know what's the meaning of the defination of round in python 3.3.2 doc:
round(number[, ndigits])
Return the floating point value number rounded to ndigits digits after the decimal point. If ndigits is omitted, it defaults to zero. Delegates to number.__round__(ndigits).
For the built-in types supporting round(), values are rounded to the closest multiple of 10 to the power minus ndigits if two multiples are equally close, rounding is done toward the even choice (so, for example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2). The return value is an integer if called with one argument, otherwise of the same type as number.
I don't know how come the multiple of 10 and pow.
After reading the following examples, I think round(number,n) works like:
if let number be 123.456, let n be 2
round will get two number:123.45 and 123.46
round compares abs(number-123.45) (0.006) and abs(number-123.46) (0.004),and chooses the smaller one.
so, 123.46 is the result.
and if let number be 123.455, let n be 2:
round will get two number:123.45 and 123.46
round compares abs(number-123.45) (0.005) and abs(number-123.46) (0.005). They are equal. So round checks the last digit of 123.45 and 123.46. The even one is the result.
so, the result is 123.46
Am I right?
If not, could you offer a understandable version of values are rounded to the closest multiple of 10 to the power minus ndigits?
ndigits = 0 => pow(10, -ndigits) = 10^(-ndigits) = 1
ndigits = 1 => pow(10, -ndigits) = 10^(-ndigits) = 0.1
etc.
>>> for ndigits in range(6):
... print round(123.456789, ndigits) / pow(10, -ndigits)
123.0
1235.0
12346.0
123457.0
1234568.0
12345679.0
Basically, the number you get is always an integer multiple of 10^(-ndigits). For ndigits=0, that means the number you get is itself an integer, for ndigits=1 it means it won't have more than one non-zero value after the decimal point.
It helps to know that anything to the power of 0 equals 1. As ndigits increases, the function:
f(ndigits) = 10-ndigits gets smaller as you increase ndigits. Specifically as you increase ndigits by 1, you simply shift the decimal place of precision one left. e.g. 10^-0 = 1, 10^-1 = .1 and 10^-2 = .01. The place where the 1 is in the answer is the last point of precision for round.
For the part where it says
For the built-in types supporting round(), values are rounded to the
closest multiple of 10 to the power minus ndigits; if two multiples
are equally close, rounding is done toward the even choice (so, for
example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2).
This has unexpected behavior in Python 3 and it will not work for all floats. Consider the example you gave, round(123.455, 2) yields the value 123.45. This is not expected behavior because the closest even multiple of 10^-2 is 123.46, not 123.45!
To understand this, you have to pay special attention to the note below this:
Note The behavior of round() for floats can be surprising: for
example, round(2.675, 2) gives 2.67 instead of the expected 2.68. This
is not a bug: it’s a result of the fact that most decimal fractions
can’t be represented exactly as a float.
And that is why certain floats will round to the "wrong value" and there is really no easy workaround as far as I am aware. (sadface) You could use fractions (i.e. two variables representing the numerator and the denominator) to represent floats in a custom round function if you want to get different behavior than the unpredictable behavior for floats.