My question is more about why, in C++, C#, and VB,
std::cout << 2.2f * 3.0f << std::endl; // prints 6.6
while in Python, Java, and Ruby
2.2 * 3.0 // prints 6.6000000000000005
I'm very familiar with floating point representation in memory.
I checked that in fact 2.2 cannot be represented precisely in memory with single precision.
But still, why do C++, C#, and VB cut off the irrelevant part of the result when printing the value, while the others do not?
This has nothing to do with Python; it's just the way computers handle floating-point arithmetic.
Imagine trying to write down 1/3 exactly as a decimal in base 10 - you can't as you don't have an infinite amount of time or paper. There are an infinite number of 3s, so any decimal representation can only ever be an approximation.
Similarly, computers don't have an infinite amount of memory, so they can't represent certain fractions exactly (although these are different fractions, as computers work in base 2). So in this case, the nearest the computer can get to 2.2*3.0 is 6.6000000000000005. This isn't due to the multiplication; it's because the computer can't store 2.2 completely accurately. However, most of the time, the degree of accuracy given is near enough.
If you need perfect accuracy in Python, you can use the Decimal module.
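For instance, a minimal sketch (the exact output strings assume a recent CPython, but the idea holds generally):

from decimal import Decimal

# float arithmetic carries the binary representation error of 2.2
print(2.2 * 3.0)                        # 6.6000000000000005

# Decimal stores 2.2 exactly as written, so the product is exact
print(Decimal("2.2") * Decimal("3.0"))  # 6.60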
As for the problems this causes in "precise business logic": when dealing with money, the usual answer is not to encode £1.23 as 1.23, but as 123 (pence). You may need to do something more sophisticated when dividing amounts of money, but that something else shouldn't simply be floats.
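A rough sketch of the integer-pence idea (the amounts are made up for illustration):

# keep money as integer pence so sums never drift
prices_pence = [123, 499, 250]    # £1.23, £4.99, £2.50
total_pence = sum(prices_pence)   # exact integer arithmetic
print(f"£{total_pence // 100}.{total_pence % 100:02d}")  # £8.72 -- format only for display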
In answer to your edited question, it's just that C++ doesn't display as much of the number as Python. It doesn't store it more accurately.
"But still why do C++,C# and VB cut-out the irrelevant part of the result when printing the value while the others do not?"
Because they do. The people who implemented those languages made a different choice from those who implemented the other languages. They weighed the benefits of printing the value in full (it reminds you that floating-point arithmetic is inexact) against the downsides (it is sometimes harder to see what the effective result is, and shortening it gives an accurate-looking result most of the time) and came to different conclusions.
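You can see that it is purely a display choice by asking Python to print the same value with roughly the six significant digits that C++'s default cout precision uses (a sketch, not the languages' actual formatting code):

x = 2.2 * 3.0
print(x)                  # 6.6000000000000005 -- Python shows enough digits to round-trip
print(format(x, ".6g"))   # 6.6 -- about what C++'s default 6-digit stream precision shows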
Good day,
I'm getting a strange rounding error and I'm unsure as to why.
print(-0.0075+0.005)
is coming out in the terminal as
-0.0024999999999999996
Which appears to be throwing off the math in the rest of my program. Given that the numbers in this function could be anywhere between 1 and 0.0001, the number of decimal places can vary.
Any ideas why I'm not getting the expected answer of -0.0025?
Joe,
The 'rounding error' you refer to is a consequence of the arithmetic system Python is using to perform the requested operation (though by no means limited to Python!).
All real numbers need to be represented by a finite number of bits in a computer program, thus they are 'rounded' to a suitable representation. This is necessary because with a finite number of bits it is not possible to represent all real numbers exactly (or even a finite interval of the numbers in the real line). So while you may define a variable to be 'a=0.005', the computer will store it as something very close to, but not exactly, that. Typically this rounding is done through floating-point representations which is the case in standard Python. In the binary version of this system, real numbers are approximated by integers multiplied by powers of 2 for their representation.
Consequently, operations such as the sum that you are performing operate on these 'rounded' versions of the numbers and return another rounded version of the result. This implies that the arithmetic in the computer is always approximate, although usually it is precise enough that we do not care. If these rounding errors are too large for your application, you may try switching to a more precise representation (more bits). You can find a good explainer with examples in Python's docs.
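As a small illustration of switching representations, the Decimal module from the standard library keeps the example in the question exact (a sketch only; whether Decimal is appropriate depends on the application):

from decimal import Decimal

print(-0.0075 + 0.005)                        # -0.0024999999999999996
print(Decimal("-0.0075") + Decimal("0.005"))  # -0.0025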
I'm using the Decimal class for operations that require precision.
I would like to use the 'largest possible' precision. By this I mean as much precision as the system the program runs on can handle.
To set a certain precision it's simple:
import decimal
decimal.getcontext().prec = 123  # 123 digits of decimal precision
I tried to figure out the maximum precision the 'Decimal' class can compute:
print(decimal.MAX_PREC)
>> 999999999999999999
So I tried to set the precision to the maximum precision (knowing it probably won't work..):
decimal.getcontext().prec = decimal.MAX_PREC
But, of course, this throws a MemoryError (on division).
So my question is: How do I figure out the maximum precision the current system can handle?
Extra info:
import sys
print(sys.maxsize)
>> 9223372036854775807
Trying to do this is a mistake. Throwing more precision at a problem is a tempting trap for newcomers to floating-point, but it's not that useful, especially to this extreme.
Your operations wouldn't actually require the "largest possible" precision even if that was a well-defined notion. Either they require exact arithmetic, in which case decimal.Decimal is the wrong tool entirely and you should look into something like fractions.Fraction or symbolic computation, or they don't require that much precision, and you should determine how much precision you actually need and use that.
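For instance, a minimal sketch of the exact-arithmetic route with fractions.Fraction (numbers chosen only for illustration):

from fractions import Fraction

# Fractions stay exact under +, -, *, /; there is no precision setting to tune
x = Fraction(22, 10) * Fraction(3, 1)
print(x)         # 33/5
print(float(x))  # 6.6 -- convert to float only at the very end, if at all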
If you still want to throw all the precision you can at your problem, then how much precision that actually is will depend on what kind of math you're doing, and how many absurdly precise numbers you're attempting to store in memory at once. This can be determined by analyzing your program and the memory requirements of Decimal objects, or you can instead take the precision as a parameter and binary search for the largest precision that doesn't cause a crash.
I'd like to suggest a function that allows you to estimate your maximum precision for a given operation in a brute force way:
import decimal

def find_optimum(a, b, max_iter):
    # binary search between a (a precision that works) and b (a precision that fails)
    for i in range(max_iter):
        print(i)
        c = int((a + b) / 2)
        decimal.getcontext().prec = c
        try:
            dummy = decimal.Decimal(1) / decimal.Decimal(7)  # your operation
            a = c
            print("no fail")
        except MemoryError:
            print("fail")
            dummy = 1
            b = c
        print(c)
        del dummy
This just halves the interval one step at a time and checks whether an error occurs. Calling it with max_iter=10, a=int(1e9), and b=int(1e11) gives:
>>> find_optimum(int(1e9), int(1e11), 10)
0
fail
50500000000
1
no fail
25750000000
2
no fail
38125000000
3
no fail
44312500000
4
fail
47406250000
5
fail
45859375000
6
no fail
45085937500
7
no fail
45472656250
8
no fail
45666015625
9
no fail
45762695312
This may give a rough idea of what you are dealing with. It took approximately half an hour on an i5-3470 with 16 GB RAM, so you would really only use it for testing purposes.
I don't think there is an exact way of determining the maximum precision for your operation, as you'd need exact knowledge of how its memory consumption depends on the precision. I hope this helps you at least a bit, and I would really like to know what you need that kind of precision for.
EDIT: I feel this really needs to be added, since I read your comments under the top-rated post here. Using arbitrarily high precision in this manner is not the way people calculate constants. You would write something that uses disk space in a smart way (for example, calculating a batch of digits in RAM and writing that batch to a text file), but never rely on RAM/swap alone, because that will always limit your results. With modern algorithms for calculating pi, you don't need infinite RAM; you just put another 4 TB hard drive in the machine and let it write the next digits. So much for mathematical constants.
Now for physical constants: they are not precise; they rely on measurement. I'm not quite sure at the moment (will edit), but I think the most exact physical constant is known to an error of about 10**(-8). Throwing more precision at it doesn't make it more exact; you just calculate more wrong digits.
As an experiment though, this was a fun idea, which is why I even posted the answer in the first place.
The maximum precision of the Decimal class is a function of the memory on the device, so there's no good way to set it for the general case. Basically, you're allocating all of the memory on the machine to one variable to get the maximum precision.
If the mathematical operation supports it, long integers will give you unlimited precision. However, you are limited to whole numbers.
Addition, subtraction, multiplication, and simple exponents can be performed exactly with long integers.
Prior to Python 3, the built-in long data type would perform arbitrary precision calculations.
https://docs.python.org/2/library/functions.html#long
In Python >=3, the int data type now represents long integers.
https://docs.python.org/3/library/functions.html#int
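A quick illustration of that unlimited integer precision (the values are arbitrary):

# Python 3 ints never overflow and never round
x = 2**200 + 1
print(x * x)        # exact product, every digit correct
print(10**50 // 7)  # exact floor division, still an integer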
One example of a 64-bit integer math implementation is bitcoind, where transaction calculations require exact values. However, the precision of Bitcoin transactions is limited to 1 "Satoshi"; each Bitcoin is defined as 10^8 (integer) Satoshi.
The Decimal class works similarly under the hood. A Decimal precision of 10^-8 is similar to the Bitcoin-Satoshi paradigm.
From your reply above:
What if I just wanted to find more digits in pi than already found? what if I wanted to test the irrationality of e or mill's constant.
I get it. I really do. My one SO question, several years old, is about arbitrary-precision floating point libraries for Python. If those are the types of numerical representations you want to generate, be prepared for the deep dive. Decimal/FP arithmetic is notoriously tricky in Computer Science.
Some programmers, when confronted with a problem, think “I know, I’ll use floating point arithmetic.” Now they have 1.999999999997 problems. – #tomscott
I think when others have said it's a "mistake" or "it depends" to wonder what the max precision is for a Python Decimal type on a given platform, they're taking your question more literally than I'm guessing it was intended. You asked about the Python Decimal type, but if you're interested in FP arithmetic for educational purposes -- "to find more digits in pi" -- you're going to need more powerful, more flexible tools than Decimal or float. These built-in Python types don't even come close. Those are good enough for NASA maybe, but they have limits... in fact, the very limits you are asking about.
That's what multiple-precision (or arbitrary-precision) floating point libraries are for: arbitrarily-precise representations. Want to compute pi for the next 20 years? Python's Decimal type won't even get you through the day.
The fact is, multi-precision binary FP arithmetic is still kinda fringe science. For Python, you'll need to install the GNU MPFR library on your Linux box, then you can use the Python library gmpy2 to dive as deep as you like.
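A rough sketch of what that looks like, assuming gmpy2 and the underlying GMP/MPFR libraries are installed (note that the precision attribute is in bits, not decimal digits):

import gmpy2

# ~1000 bits is roughly 300 decimal digits
gmpy2.get_context().precision = 1000
print(gmpy2.const_pi())           # pi computed at the current context precision
print(gmpy2.sqrt(gmpy2.mpfr(2)))  # sqrt(2) at the same precision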
Then, the question isn't, "What's the max precision my program can use?"
It's, "How do I write my program so that it'll run until the electricity goes out?"
And that's a whole other problem, but at least it's restricted by your algorithm, not the hardware it runs on.
I am calculating relative frequencies of words (word count / total number of words). This results in quite a few very small numbers (e.g. 1.2551539760140076e-05). I have read about some of the issues with using floats in this context, e.g. in this article
A float has roughly seven decimal digits of precision ...
Some suggest using logged values instead. I am going to multiply these numbers and was wondering:
In general, is the seven digit rule something to go by in Python?
In my case, should I use log values instead?
What bad things could happen if I don't -- just a less accurate value or straight up errors, e.g. in the multiplication?
And if so, do I just convert the float with math.log()? I feel at that point the information is already lost.
Any help is much appreciated!
That article talks about the type float in C, which is a 32-bit quantity. The Python type float is a 64-bit number, like C's double, and can therefore store roughly 15-17 significant decimal digits (a 53-bit significand instead of the 24 bits of C's float). While that too can be too little precision for some applications, it's much less dire than with 32-bit floats.
Furthermore, because it is a floating point format, small numbers such as 1.2551539760140076e-05 (which actually isn't that small) are not inherently disadvantaged. While only about 17 decimal digits can be represented, these 17 digits need not be the first 17 digits after the decimal point. They can be shifted around, so to speak [1]. In fact, you used the same concept of a floating (decimal) point when you gave the number as a bunch of decimal digits times a power of ten (the e-05 part). To give extreme examples, 10**-300 can be represented just fine [2], as can 10**300; problems only happen when these two numbers meet (1e300 + 1e-300 == 1e300).
As for a log representation, you would take the log of all values as early as possible and perform as many calculations as possible in log space. In your example you'd calculate the relative frequency of a word as log(word_count) - log(total_words), which is the same as log(word_count / total_words) but possibly more accurate.
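A small sketch of what that looks like in practice (the counts below are invented for illustration):

import math

word_count, total_words = 7, 557_651

# work in log space from the start instead of logging a tiny float later
log_rel_freq = math.log(word_count) - math.log(total_words)

# multiplying probabilities becomes adding logs, which avoids underflow
log_product = 3 * log_rel_freq  # corresponds to rel_freq ** 3
print(math.exp(log_product))    # convert back only for the final result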
What bad things could happen if I don't -- just a less accurate value or straight up errors, e.g. in the multiplication?
I'm not sure what the distinction is. Numeric calculations can have almost perfect accuracy (relative rounding error on the scale of 2**-50 or better), but unstable algorithms can also give laughably bad results in some cases. There are quite strict bounds on the rounding error of each individual operation [3], but in longer calculations they interact in surprising ways to cause very large errors. For example, even just summing up a large list of floats can introduce significant error, especially if they are of very different magnitudes and signs. The proper analysis and design of reliable numeric algorithms is an art of its own which I cannot do justice here, but thanks to the good design of IEEE-754, most algorithms usually work out okay. Don't worry too much about it, but don't ignore it either.
[1] In reality we're talking about 53 binary digits being shifted around, but this is unimportant for this concept. Decimal floating point formats exist.
[2] With a relative rounding error of less than 2**-54, which occurs for any fraction whose denominator isn't a power of two, including such mundane ones as 1/3 or 0.1.
[3] For basic arithmetic operations, the rounding error should be half a unit in the last place, i.e., the result must be calculated exactly and then rounded correctly. For transcendental functions the error is rarely more than one or two units in the last place, but it can be larger.
I am trying to understand why we get floating point representation error in Python. I know this is not a new question here, but honestly I am finding it hard to understand. I am going through the official Python documentation http://docs.python.org/tutorial/floatingpoint.html, specifically the Representation Error section at the bottom of the page.
But I am not able to see how the expression J/2**N comes into the picture, and why my interpreter gives me this value:
0.1--->0.10000000000000001
The closest questions I found are floating point issue and How are floating point numbers stored in memory?, but I am still not able to understand them.
Can anyone please explain in detail and in simple language? I appreciate any help.
Thanks,
Sunil
You can think of 0.1 as being, for a computer, a rational number whose expansion in the computer's base is not finite - much like 1/3 is for us in base 10.
Take 1/3 for instance. For us humans, we know that it means "one third" (no more, no less). But if we were to write it down without fractions, we would have to write 0.3333... and so on. In fact, there is no way we can represent exactly one third with a decimal notation. So there are numbers we can write using decimal notation, and numbers we can't. For the latter, we have to use fractions - and we can do so because we have been taught maths at school.
On the other hand, the computer works with bits (only two digits: 1 and 0) and can only use binary notation - no fractions. Because of the different base (2 instead of 10), the set of numbers with a finite representation is shifted: numbers that we can represent exactly in decimal notation may not be representable exactly in binary notation, and vice versa. What looks like a simple case for us (1/10 = one tenth = 0.1, exactly) is not necessarily an easy case for a CPU.
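One way to see the J/2**N form the tutorial mentions is to ask Python for the exact fraction a stored 0.1 actually holds (a quick sketch; the numbers below are what CPython's 64-bit doubles give):

j, n = (0.1).as_integer_ratio()
print(j, n)        # 3602879701896397 36028797018963968
print(n == 2**55)  # True: 0.1 is really stored as J / 2**55, slightly above 1/10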
In Python, 2/5.0 or 2/float(5) returns 0.40000000000000002.
Why do I get that error at the end and how can I get the right value to use in additional calculations?
Welcome to IEEE754, enjoy your stay.
Use decimal instead.
Because floating point arithmetic is not exact. You should use this value in your additional calculations, and round off the result when you're finished. If you need it to be exact, use another data type.
Ignacio above has the right answer.
There are IEEE standards for efficiently storing floating point numbers on binary computers. They go into excruciating detail about exactly how numbers are stored, and these rules are followed on almost every computer.
They are also, in a sense, wrong. Binary floating point cannot represent most ordinary decimal fractions exactly, only sums of powers of two. Instead of doing something tricky such as recomputing the bottom bits to round off, the standards choose efficiency.
That way, you get a system you can curse at, but one that runs slightly faster. There are occasional debates about changing Python in some way to work around these problems, but there is no trivial answer that doesn't involve a huge loss in efficiency.
Getting around this:
One option is digging into the "decimal" package of the standard library. If you stick to the examples and ignore the long page, it will get you what you want. No bets on efficiency.
The second is to do the rounding and string truncation yourself in one output function. Who cares if the number is off by a bit if you never print those bits?
Note that Python 3.1 has a new floating point formatting function that avoids this sort of appearance. See What's new in Python 3.1 for more details (search for "floating point").
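A small sketch of the "round only when printing" idea, alongside the stored value (output shown for Python 3; older versions display the long form by default):

x = 2 / 5.0
print("%.17g" % x)  # 0.40000000000000002 -- the value actually stored
print("%.2f" % x)   # 0.40 -- round/truncate only in the output step
print(repr(x))      # 0.4 on Python >= 3.1, which picks the shortest round-tripping form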
See this question for the explanation. The right way would be to either:
use integers until the "final" calculation, or
live with rounding errors.