I'm currently trying to round standard deviations from an array of data to the sixth decimal place.
Python's round() didn't work as I wanted it to: some numbers were displayed oddly. For example, what I meant to be 0.013931 showed up as 0.013931099999999998. I worked around that by using Decimal and setting the context precision, but now some standard deviations show up rounded to the 6th decimal while others are rounded to the 7th!
from decimal import *
getcontext().prec = 4
getcontext().rounding = ROUND_HALF_UP
print(Decimal(0.005855795678472189)/10)
print(Decimal(0.013931099999999998)/10)
I expect the output to be 0.00058558 and 0.0013931, yet the actual output is 0.0005856 and 0.001393, which have different lengths!
The precision in the decimal module applies to the significant digits (the coefficient), not to the decimal places. That is, in scientific notation you will always see 4 significant digits if you set getcontext().prec = 4, like so:
>>> print(Decimal(0.005855795678472189)/10)
0.0005856
>>> print(Decimal(0.0005855795678472189)/10)
0.00005856
>>> print(Decimal(0.00005855795678472189)/10)
0.000005856
>>> print(Decimal(0.000005855795678472189)/10)
5.856E-7
>>> print(Decimal(0.0000005855795678472189)/10)
5.856E-8
>>> print(Decimal(0.00000005855795678472189)/10)
5.856E-9
>>> print(Decimal(0.000000005855795678472189)/10)
5.856E-10
Note that floating point numbers are stored in three parts:
the first bit is the sign (plus/minus),
the next few bits are the exponent part (this is 11 bits in floating point numbers that follow the IEEE 754 standard for 64 bits, which includes C++ double and Python float; the stored exponent is the binary scientific-notation exponent plus a bias of 1023, so the all-zeros field encodes the smallest exponents and the all-ones field, 2**11 - 1 = 2047, the largest, with both extremes reserved for special values such as subnormals, infinities and NaN),
the remaining bits are the fractional part.
The Wikipedia article on double-precision floating point has the details.
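For the curious, float.hex() is a quick way to see these pieces from Python itself: it shows the significand and the power-of-two exponent directly. A small illustration (the values 9.5 and 9.2 are worked through in detail further down this page):

# 9.5 is 1.0011 (binary) * 2**3 -> hexadecimal significand 0x1.3
print((9.5).hex())   # 0x1.3000000000000p+3
# 9.2's significand repeats forever and gets cut off at 52 bits
print((9.2).hex())   # 0x1.2666666666666p+3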
Related
In the following example:
import math
x = math.log(2)
print("{:.500f}".format(x))
I tried to get 500 digits of output, but I get only 53 correct decimal digits of ln(2), as follows:
0.69314718055994528622676398299518041312694549560546875000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
How can I fix this problem?
You can't with the Python float type. It's dependent on the underlying machine architecture, and in most cases you're limited to a double-precision float.
However, you can get higher precision with the decimal module:
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 500
>>> d = Decimal(2)
>>> d.ln()
Decimal('0.69314718055994530941723212145817656807550013436025525412068000949339362196969471560586332699641868754200148102057068573368552023575813055703267075163507596193072757082837143519030703862389167347112335011536449795523912047517268157493206515552473413952588295045300709532636664265410423915781495204374043038550080194417064167151864471283996817178454695702627163106454615025720740248163777338963855069526066834113727387372292895649354702576265209885969320196505855476470330679365443254763274495125040607')
>>> print(d.ln())
0.69314718055994530941723212145817656807550013436025525412068000949339362196969471560586332699641868754200148102057068573368552023575813055703267075163507596193072757082837143519030703862389167347112335011536449795523912047517268157493206515552473413952588295045300709532636664265410423915781495204374043038550080194417064167151864471283996817178454695702627163106454615025720740248163777338963855069526066834113727387372292895649354702576265209885969320196505855476470330679365443254763274495125040607
I tried to get 500 digits of output, but I get only 53 correct decimal digits of ln(2), as follows:
The problem is not in the printing. The 500 digit output is the exact value returned from math.log(2).
The return value of math.log(2) is encoded using binary64, which can represent only about 2^64 different finite values, each of them a dyadic rational. Mathematically, log(2) is an irrational number, thus it is impossible for x to encode the math result exactly.
Instead math.log(2) returns the nearest encodable value.
That value is exactly 0.6931471805599452862267639829951804131269454956054687500...
Printing binary64 with more than 17 significant digits typically does not add useful information.
Within the realm of real numbers, which is an infinite set of numbers with arbitrary precision, the floating point numbers are a small subset of numbers with a finite precision. They are the numbers that are represented by a linear combination of powers of two (See Double Precision floating point format).
As ln(2) is not representable as a floating-point number, a computer finds the nearest representable number by numerical approximation. In the case of ln(2), this number is:
6243314768165359 * 2^-53 = 0.69314718055994528622676398299518041312694549560546875
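You can get Python to confirm this exact ratio itself: converting the float to a fractions.Fraction shows the dyadic rational it actually stores. A quick check (note 2**53 = 9007199254740992):

import math
from fractions import Fraction

print(Fraction(math.log(2)))   # 6243314768165359/9007199254740992, i.e. 6243314768165359 * 2**-53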
If you need to do arbitrary-precision arithmetic, you have to use different computational methods. Various software packages exist that allow this; for Python, mpmath is fairly standard:
>>> from mpmath import *
>>> mp.dps = 500
>>> mp.pretty=True
>>> ln(2)
0.69314718055994530941723212145817656807550013436025525412068000949339362196969471560586332699641868754200148102057068573368552023575813055703267075163507596193072757082837143519030703862389167347112335011536449795523912047517268157493206515552473413952588295045300709532636664265410423915781495204374043038550080194417064167151864471283996817178454695702627163106454615025720740248163777338963855069526066834113727387372292895649354702576265209885969320196505855476470330679365443254763274495125040607
We know that multiplication and division are inverses of each other. Suppose I have the number 454546456756765675454.00 in Python and I want to divide it by 32. Let's define a variable, for example:
value = 454546456756765675454.00/32
The output is 1.4204576773648927e+19, or 14204576773648926720.000000. Now I want to multiply the output by 32, but 14204576773648926720.000000 * 32 gives me 454546456756765655040.00, not 454546456756765675454.00. Why does this happen? I am not good at math, but my question is: why does float multiplication give me the wrong answer? (I also tried the decimal module, but it didn't work for me, or maybe I don't know how to use it to get the exact answer.)
Floating point numbers are stored as binary fractions. Some numbers cannot be written precisely in base 2 form, so their approximated value is stored instead.
Now, if this approximation had an error of +0.0001 for some number, and that number is multiplied by 10000, then our result will be off by 0.0001 * 10000 = 1.
It is the same in pretty much all programming languages.
For operations where precision is very important, the decimal module should be preferred.
I also tried the decimal module, but it didn't work for me, or maybe I don't know how to use it to get the exact answer
Your example, using the decimal module, will look something like this:
import decimal

# construct from an int, so the starting value is exact
value = decimal.Decimal(454546456756765675454)
vd = value / decimal.Decimal(32)   # exact within the default 28-digit context
vm = vd * 32                       # multiplying back recovers the original exactly
diff = vm - value
assert diff == decimal.Decimal(0)
# assert diff == 0.0
Within wide limits, multiplication and division by powers of two, including 32, are exact in binary floating point. It is the conversion of the decimal numeral that is inexact: 454546456756765655040 is the closest IEEE 754 64-bit binary number to 454546456756765675454. The division and multiplication by 32 made no difference.
More generally, division and multiplication by the same number can result in rounding error in finite width decimal/binary etc. fractions unless all the prime factors of the divisor are also prime factors of the radix being used to represent fractions. In both binary and decimal fractions, division and multiplication by 3 can cause rounding error, because 3 is a factor of neither 2 nor 10.
Division and multiplication by 32 can be exact, given enough significand width, in both decimal and binary because two, the only prime factor of 32, is a factor of both 10 and 2.
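A quick demonstration with plain Python floats (1/49 is a classic counterexample, since 49's prime factor 7 divides neither 2 nor 10; outputs are for a typical IEEE 754 double):

x = 454546456756765675454.0   # already rounded to the nearest double at this point
print(x / 32 * 32 == x)       # True: scaling by 2**5 only adjusts the exponent
print(1 / 49 * 49)            # 0.9999999999999999: dividing by 49 has to round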
Why do some numbers lose accuracy when stored as floating point numbers?
For example, the decimal number 9.2 can be expressed exactly as a ratio of two decimal integers (92/10), both of which can be expressed exactly in binary (0b1011100/0b1010). However, the same ratio stored as a floating point number is never exactly equal to 9.2:
32-bit "single precision" float: 9.19999980926513671875
64-bit "double precision" float: 9.199999999999999289457264239899814128875732421875
How can such an apparently simple number be "too big" to express in 64 bits of memory?
In most programming languages, floating point numbers are represented a lot like scientific notation: with an exponent and a mantissa (also called the significand). A very simple number, say 9.2, is actually this fraction:
5179139571476070 * 2^-49
Where the exponent is -49 and the mantissa is 5179139571476070. The reason it is impossible to represent some decimal numbers this way is that both the exponent and the mantissa must be integers. In other words, all floats must be an integer multiplied by an integer power of 2.
9.2 may be simply 92/10, but 10 cannot be expressed as 2^n if n is limited to integer values.
Seeing the Data
First, a function to see the components that make up a 32- or 64-bit float. Gloss over it if you only care about the output (example in Python):
import struct
from itertools import islice

def float_to_bin_parts(number, bits=64):
    if bits == 32:  # single precision
        int_pack = 'I'
        float_pack = 'f'
        exponent_bits = 8
        mantissa_bits = 23
        exponent_bias = 127
    elif bits == 64:  # double precision. all python floats are this
        int_pack = 'Q'
        float_pack = 'd'
        exponent_bits = 11
        mantissa_bits = 52
        exponent_bias = 1023
    else:
        raise ValueError('bits argument must be 32 or 64')
    # reinterpret the float's bytes as an integer, render as a padded bit string
    bin_iter = iter(bin(struct.unpack(int_pack, struct.pack(float_pack, number))[0])[2:].rjust(bits, '0'))
    # split the bit string into sign, exponent and mantissa fields
    return [''.join(islice(bin_iter, x)) for x in (1, exponent_bits, mantissa_bits)]
There's a lot of complexity behind that function, and it'd be quite the tangent to explain, but if you're interested, the important resource for our purposes is the struct module.
Python's float is a 64-bit, double-precision number. In other languages such as C, C++, Java and C#, double-precision has a separate type double, which is often implemented as 64 bits.
When we call that function with our example, 9.2, here's what we get:
>>> float_to_bin_parts(9.2)
['0', '10000000010', '0010011001100110011001100110011001100110011001100110']
Interpreting the Data
You'll see I've split the return value into three components. These components are:
Sign
Exponent
Mantissa (also called Significand, or Fraction)
Sign
The sign is stored in the first component as a single bit. It's easy to explain: 0 means the float is a positive number; 1 means it's negative. Because 9.2 is positive, our sign value is 0.
Exponent
The exponent is stored in the middle component as 11 bits. In our case, 0b10000000010. In decimal, that represents the value 1026. A quirk of this component is that you must subtract a number equal to 2^(# of bits - 1) - 1 to get the true exponent; in our case, that means subtracting 0b1111111111 (decimal number 1023) to get the true exponent, 0b00000000011 (decimal number 3).
Mantissa
The mantissa is stored in the third component as 52 bits. However, there's a quirk to this component as well. To understand this quirk, consider a number in scientific notation, like this:
6.0221413 x 10^23
The mantissa would be the 6.0221413. Recall that the mantissa in scientific notation always begins with a single non-zero digit. The same holds true for binary, except that binary only has two digits: 0 and 1. So the binary mantissa always starts with 1! When a float is stored, the 1 at the front of the binary mantissa is omitted to save space; we have to place it back at the front of our third element to get the true mantissa:
1.0010011001100110011001100110011001100110011001100110
This involves more than just a simple addition, because the bits stored in our third component actually represent the fractional part of the mantissa, to the right of the radix point.
When dealing with decimal numbers, we "move the decimal point" by multiplying or dividing by powers of 10. In binary, we can do the same thing by multiplying or dividing by powers of 2. Since our third element has 52 bits, we divide it by 2^52 to move it 52 places to the right:
0.0010011001100110011001100110011001100110011001100110
In decimal notation, that's the same as dividing 675539944105574 by 4503599627370496 to get 0.1499999999999999. (This is one example of a ratio that can be expressed exactly in binary, but only approximately in decimal; for more detail, see: 675539944105574 / 4503599627370496.)
Now that we've transformed the third component into a fractional number, adding 1 gives the true mantissa.
Recapping the Components
Sign (first component): 0 for positive, 1 for negative
Exponent (middle component): Subtract 2^(# of bits - 1) - 1 to get the true exponent
Mantissa (last component): Divide by 2^(# of bits) and add 1 to get the true mantissa
Calculating the Number
Putting all three parts together, we're given this binary number:
1.0010011001100110011001100110011001100110011001100110 x 10^11 (both the base 10 and the exponent 11 here are binary, i.e. x 2^3)
Which we can then convert from binary to decimal:
1.1499999999999999 x 2^3 (inexact!)
And multiply to reveal the final representation of the number we started with (9.2) after being stored as a floating point value:
9.1999999999999993
Representing as a Fraction
9.2
Now that we've built the number, it's possible to reconstruct it into a simple fraction:
1.0010011001100110011001100110011001100110011001100110 x 10^11 (binary)
Shift mantissa to a whole number:
10010011001100110011001100110011001100110011001100110 x 10^(11-110100)
Convert to decimal:
5179139571476070 x 2^(3-52)
Subtract the exponent:
5179139571476070 x 2^-49
Turn negative exponent into division:
5179139571476070 / 2^49
Multiply exponent:
5179139571476070 / 562949953421312
Which equals:
9.1999999999999993
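If you'd rather not do the bit surgery by hand, Python can produce both forms of this result directly. A quick check (the outputs match the fraction derived above and the exact decimal quoted at the top of the question):

from decimal import Decimal
from fractions import Fraction

print(Decimal(9.2))    # 9.199999999999999289457264239899814128875732421875
print(Fraction(9.2))   # 5179139571476070/562949953421312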
9.5
>>> float_to_bin_parts(9.5)
['0', '10000000010', '0011000000000000000000000000000000000000000000000000']
Already you can see the mantissa is only 4 digits followed by a whole lot of zeroes. But let's go through the paces.
Assemble the binary scientific notation:
1.0011 x 10^11 (binary)
Shift the decimal point:
10011 x 10^(11-100)
Subtract the exponent:
10011 x 10^-1
Binary to decimal:
19 x 2^-1
Negative exponent to division:
19 / 2^1
Multiply exponent:
19 / 2
Equals:
9.5
Further reading
The Floating-Point Guide: What Every Programmer Should Know About Floating-Point Arithmetic, or, Why don’t my numbers add up? (floating-point-gui.de)
What Every Computer Scientist Should Know About Floating-Point Arithmetic (Goldberg 1991)
IEEE Double-precision floating-point format (Wikipedia)
Floating Point Arithmetic: Issues and Limitations (docs.python.org)
Floating Point Binary
This isn't a full answer (mhlester already covered a lot of good ground I won't duplicate), but I would like to stress how much the representation of a number depends on the base you are working in.
Consider the fraction 2/3
In good-ol' base 10, we typically write it out as something like
0.666...
0.666
0.667
When we look at those representations, we tend to associate each of them with the fraction 2/3, even though only the first representation is mathematically equal to the fraction. The second and third representations/approximations have an error on the order of 0.001, which is actually much worse than the error between 9.2 and 9.1999999999999993. In fact, the second representation isn't even rounded correctly! Nevertheless, we don't have a problem with 0.666 as an approximation of the number 2/3, so we shouldn't really have a problem with how 9.2 is approximated in most programs. (Yes, in some programs it matters.)
Number bases
So here's where number bases are crucial. If we were trying to represent 2/3 in base 3, then
(2/3)₁₀ = 0.2₃
In other words, we have an exact, finite representation for the same number by switching bases! The take-away is that even though you can convert any number to any base, all rational numbers have exact finite representations in some bases but not in others.
To drive this point home, let's look at 1/2. It might surprise you that even though this perfectly simple number has an exact representation in base 10 and 2, it requires a repeating representation in base 3.
(1/2)₁₀ = 0.5₁₀ = 0.1₂ = 0.1111...₃
Why are floating point numbers inaccurate?
Because oftentimes they are approximating rationals that cannot be represented finitely in base 2 (the digits repeat), and in general they are approximating real (possibly irrational) numbers which may not be representable in finitely many digits in any base.
While all of the other answers are good, there is still one thing missing:
It is impossible to represent irrational numbers (e.g. π, sqrt(2), log(3), etc.) precisely!
And that actually is why they are called irrational. No amount of bit storage in the world would be enough to hold even one of them. Only symbolic arithmetic is able to preserve their precision.
If you limit your math needs to rational numbers only, though, the problem of precision becomes manageable. You would need to store a pair of (possibly very big) integers a and b to hold the number represented by the fraction a/b. All your arithmetic would have to be done on fractions just like in high school math (e.g. a/b * c/d = ac/bd), as shown in the sketch below.
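Python ships exactly this kind of rational arithmetic in its fractions module; a small illustration:

from fractions import Fraction

# a/b * c/d = ac/bd, automatically reduced to lowest terms
print(Fraction(2, 3) * Fraction(3, 4))    # 1/2
# exact where binary floats are not: compare with 0.1 + 0.2
print(Fraction(1, 10) + Fraction(2, 10))  # 3/10

Sums, products and comparisons stay exact for as long as everything stays rational.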
But of course you would still run into the same kind of trouble when pi, sqrt, log, sin, etc. are involved.
TL;DR
For hardware accelerated arithmetic only a limited amount of rational numbers can be represented. Every not-representable number is approximated. Some numbers (i.e. irrational) can never be represented no matter the system.
There are infinitely many real numbers (so many that you can't enumerate them), and there are infinitely many rational numbers (it is possible to enumerate them).
The floating-point representation is a finite one (like anything in a computer), so unavoidably many, many numbers are impossible to represent. In particular, 64 bits only allow you to distinguish among 18,446,744,073,709,551,616 different values (which is nothing compared to infinity). With the standard convention, 9.2 is not one of them. The numbers that can be represented are of the form m * 2^e for some integers m and e.
You might come up with a different numeration system, 10-based for instance, where 9.2 would have an exact representation. But other numbers, say 1/3, would still be impossible to represent.
Also note that double-precision floating-point numbers are extremely accurate. They can represent any number in a very wide range with as many as 15 exact digits. For daily life computations, 4 or 5 digits are more than enough. You will never really need all 15, unless you want to count every millisecond of your lifetime.
Why can we not represent 9.2 in binary floating point?
Floating point numbers are (simplifying slightly) a positional numbering system with a restricted number of digits and a movable radix point.
A fraction can only be expressed exactly using a finite number of digits in a positional numbering system if the prime factors of the denominator (when the fraction is expressed in its lowest terms) are factors of the base.
The prime factors of 10 are 5 and 2, so in base 10 we can represent any fraction of the form a/(2^b * 5^c).
On the other hand, the only prime factor of 2 is 2, so in base 2 we can only represent fractions of the form a/(2^b).
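You can watch this rule in action by asking Python for the exact fraction a float stores: denominators that are powers of two survive intact, anything else becomes the nearest dyadic approximation:

from fractions import Fraction

print(Fraction(0.125))   # 1/8: the denominator 8 = 2**3, so it is stored exactly
print(Fraction(0.1))     # 3602879701896397/36028797018963968: nearest dyadic to 1/10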
Why do computers use this representation?
Because it's a simple format to work with and it is sufficiently accurate for most purposes. Basically the same reason scientists use "scientific notation" and round their results to a reasonable number of digits at each step.
It would certainly be possible to define a fraction format, with (for example) a 32-bit numerator and a 32-bit denominator. It would be able to represent numbers that IEEE double precision floating point could not, but equally there would be many numbers that can be represented in double precision floating point that could not be represented in such a fixed-size fraction format.
However the big problem is that such a format is a pain to do calculations on. For two reasons.
If you want to have exactly one representation of each number, then after each calculation you need to reduce the fraction to its lowest terms. That means that for every operation you basically need to do a greatest-common-divisor calculation.
If after your calculation you end up with an unrepresentable result, because the numerator or denominator is too large, you need to find the closest representable result. This is non-trivial.
Some languages do offer fraction types, but usually they do it in combination with arbitrary precision. This avoids needing to worry about approximating fractions, but it creates its own problem: when a number passes through a large number of calculation steps, the size of the denominator, and hence the storage needed for the fraction, can explode.
Some languages also offer decimal floating point types. These are mainly used in scenarios where it is important that the results the computer gets match pre-existing rounding rules that were written with humans in mind (chiefly financial calculations). These are slightly more difficult to work with than binary floating point, but the biggest problem is that most computers don't offer hardware support for them.
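Python's decimal module is one such (software-implemented) decimal floating point type; a two-line example of the human-style arithmetic it gives you:

from decimal import Decimal

print(0.1 + 0.2)                          # 0.30000000000000004 in binary floating point
print(Decimal('0.1') + Decimal('0.2'))    # 0.3, matching pencil-and-paper arithmetic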
When using log2() in gmpy2, it does not seem to be accurate after 16 digits. It seems to work fine at 15 digits, but after that the answer is not correct when using mpz(mpfr(2) ** mpfr(x)). Do I need to change the precision? I thought Python by itself would be accurate up to 53 digits.
Additionally, is there a way in gmpy2 to use a logarithm operation in bases besides 10 and 2? For example, base 8 or 16.
The standard Python float type is accurate to 53 bits which is roughly 16 decimal digits. gmpy2 uses a default precision of 53 bits. If you want more accurate results, you will need to increase the precision.
>>> import gmpy2
>>> from gmpy2 import mpz,mpfr,log2
>>> a=12345678901234567890
>>> gmpy2.get_context().precision=70
>>> mpz(2**log2(a))
mpz(12345678901234567890)
To calculate a logarithm in a different base, just use:
>>> gmpy2.log(x)/gmpy2.log(base)
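Wrapped into a small helper (a sketch; log_base is my name for it, and all it uses is the change-of-base identity log_b(x) = ln(x)/ln(b) with gmpy2's natural log):

import gmpy2

def log_base(x, base):
    # change of base: log_b(x) = ln(x) / ln(b)
    return gmpy2.log(x) / gmpy2.log(base)

print(log_base(256, 16))   # 2.0 at the current context precision
print(log_base(4096, 8))   # 4.0, though possibly off in the last bit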
Update
Recovering an exact integer result from a sequence of floating point calculations is generally not possible. Depending on the actual calculations, you can increase the precision until you get "close enough".
Let's look at the impact of precision. Note that a is 57 bits long so it cannot be exactly represented with 53 bits of floating point precision.
>>> a=123543221556677776
>>> a.bit_length()
57
>>> gmpy2.get_context().precision=53
>>> mpfr(a);2**log2(a)
mpfr('1.2354322155667778e+17')
mpfr('1.2354322155667752e+17')
Since conversion of a binary floating point number to decimal can introduce a conversion error, let's look at the results in binary.
>>> mpfr(a).digits(2);(2**log2(a)).digits(2)
('11011011011101001111001111100101101001011000011001001', 57, 53)
('11011011011101001111001111100101101001011000010111001', 57, 53)
Let's try increasing the precision to 57 bits.
>>> gmpy2.get_context().precision=57
>>> mpfr(a).digits(2);(2**log2(a)).digits(2)
('110110110111010011110011111001011010010110000110010010000', 57, 57)
('110110110111010011110011111001011010010110000110010011000', 57, 57)
Notice more bits are correct but there is still an error. Let's try 64 bits.
>>> gmpy2.get_context().precision=64
>>> mpfr(a);2**log2(a)
mpfr('123543221556677776.0',64)
mpfr('123543221556677775.953',64)
>>> mpfr(a).digits(2);(2**log2(a)).digits(2)
('1101101101110100111100111110010110100101100001100100100000000000', 57, 64)
('1101101101110100111100111110010110100101100001100100011111111010', 57, 64)
The large number of trailing 1's is roughly equivalent to trailing 9's in decimal.
Once you get "close enough", you can convert to an integer which will round the result to the expected value.
Why isn't 57 bits sufficient? The MPFR library that is used by gmpy2 does perform correct rounding. There is still a small error. Let's also look at the results using the floating point values immediately above and below the correctly rounded value.
>>> gmpy2.get_context().precision=57
>>> b=log2(a)
>>> 2**gmpy2.next_below(b);2**log2(a);2**gmpy2.next_above(b)
mpfr('123543221556677746.0',57)
mpfr('123543221556677784.0',57)
mpfr('123543221556677822.0',57)
Notice that even a small change in b causes a much larger change in 2**b.
Update 2
Floating point arithmetic is only an approximation to the mathematical properties of real numbers. Some numbers are rational (they can be written as a fraction) but most numbers are irrational (they can never be written exactly as a fraction). Floating point arithmetic actually uses a rational approximation to a number.
I've skipped some of the details in the following - I assume all numbers are between 0 and 1.
With binary floating point (what most computers use), the denominator of the rational approximation must be a power of 2. Numbers like 1/2 or 1/4 can be represented exactly. Decimal floating point uses rational approximations that have a denominator that is a power of 10. Numbers like 1/2, 1/4, 1/5, and 1/20 can all be represented exactly. Neither can represent 1/3 exactly. A base-6 implementation of floating point arithmetic can represent 1/2 and 1/3 exactly but not 1/10. The precision of a particular format just specifies the maximum size of the numerator. There will always be some rational numbers that cannot be represented exactly by a given base.
Since irrational numbers cannot be written as a fraction at all, they cannot be represented exactly in any base. Since logarithm and exponential functions almost always result in irrational values, the calculations are almost never exact. By increasing the precision, you can usually get "close enough", but you can never get exact.
There are programs that work symbolically - they remember that a is log2(n) and when you do 2**a, the exact value of a is returned. See SymPy.
Inspired by this question, I was trying to find out what exactly happens there (my answer was more intuitive, but I cannot exactly understand the why of it).
I believe it comes down to this (running 64 bit Python):
>>> sys.maxint
9223372036854775807
>>> float(sys.maxint)
9.2233720368547758e+18
Python uses the IEEE 754 floating-point representation, which effectively has 53 bits for the significand. However, as far as I understand it, the significand in the above example would require 57 bits (56 if you drop the implied leading 1) to be represented. Can someone explain this discrepancy?
Perhaps the following will help clear things up:
>>> hex(int(float(sys.maxint)))
'0x8000000000000000'
This shows that float(sys.maxint) is in fact a power of 2. Therefore, in binary its mantissa is exactly 1. In IEEE 754 the leading 1. is implied, so in the machine representation this number's mantissa consists of all zero bits.
In fact, the IEEE bit pattern representing this number is as follows:
0x43E0000000000000
Observe that only the first three nibbles (the sign and the exponent) are non-zero. The significand consists entirely of zeroes. As such it doesn't require 56 (nor indeed 53) bits to be represented.
You're wrong. It requires 1 bit.
>>> (9.2233720368547758e+18).hex()
'0x1.0000000000000p+63'
When you convert sys.maxint to a float or double, the result is exactly 0x1p63, because the significand contains only 24 or 53 bits (including the implicit bit), so the trailing bits cause a round up. (sys.maxint is 2^63 - 1, and rounding it up produces 2^63.)
Then, when you print this float, some subroutine formats it as a decimal numeral. To do this, it calculates digits to represent 2^63. The fact that it is able to print 9.2233720368547758e+18 does not imply that the original number contains bits that would distinguish it from 9.2233720368547759e+18. It simply means that the bits in it do represent 9.2233720368547758e+18 (approximately). In fact, the next representable floating-point number in double precision is 9223372036854777856 (approximately 9.2233720368547778e+18), which is 2^63 + 2048. So the low 11 bits of these integers are not present in the double. The formatter merely displays the number as if those bits were zero.
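You can check that claim directly from Python (math.nextafter needs Python 3.9 or later):

import math

x = float(2**63)                     # sys.maxint + 1; the rounded-up conversion
print(x)                             # 9.223372036854776e+18
print(math.nextafter(x, math.inf))   # 9.223372036854778e+18, i.e. 2**63 + 2048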