Python (and almost anything else) has known limitations while working with floating point numbers (nice overview provided here).
While the problem is described well in the documentation, the documentation avoids offering any approach to fixing it. With this question I am looking for a more or less robust way to avoid situations like the following:
print(math.floor(0.09/0.015)) # >> 6
print(math.floor(0.009/0.0015)) # >> 5
print(99.99-99.973) # >> 0.016999999999825377
print(.99-.973) # >> 0.017000000000000015
var = 0.009
step = 0.0015
print(var < math.floor(var/step)*step+step) # False
print(var < (math.floor(var/step)+1)*step) # True
And, contrary to what is suggested in this question, the solution given there does not help with a problem like the following piece of code, which fails seemingly at random:
total_bins = math.ceil((data_max - data_min) / width) # round to upper
new_max = data_min + total_bins * width
assert new_max >= data_max
# fails. because for example 1.9459999999999997 < 1.946
If you deal in discrete quantities, use int.
Sometimes people use float in places where they definitely shouldn't. If you're counting something (like number of cars in the world) as opposed to measuring something (like how much gasoline is used per day), floating-point is probably the wrong choice. Currency is another example where floating point numbers are often abused: if you're storing your bank account balance in a database, it's really not 123.45 dollars, it's 12345 cents. (But also see below about Decimal.)
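A minimal sketch of the difference, assuming two hypothetical prices (the variable names are illustrative only):

```python
# Hypothetical prices, tracked two ways.
price_a, price_b = 0.10, 0.20          # float dollars
cents_a, cents_b = 10, 20              # integer cents

print(price_a + price_b == 0.30)       # False: 0.1 + 0.2 is 0.30000000000000004
print(cents_a + cents_b == 30)         # True: integer addition is exact

# Convert to a display string only at the edge of the program.
total_cents = cents_a + cents_b
dollars, cents = divmod(total_cents, 100)
print('{}.{:02d}'.format(dollars, cents))   # 0.30
```

The integer version never needs a tolerance, because no rounding happens until the value is displayed.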
Most of the rest of the time, use float.
Floating-point numbers are general-purpose. They're extremely accurate; they just can't represent certain fractions, like finite decimal numbers can't represent the number 1/3. Floats are generally suited for any kind of analog quantity where the measurement has error bars: length, mass, frequency, energy -- if there's uncertainty on the order of 2^(-52) or greater, there's probably no good reason not to use float.
If you need human-readable numbers, use float but format it.
"This number looks weird" is a bad reason not to use float. But that doesn't mean you have to display the number to arbitrary precision. If a number with only three significant figures comes out to 19.99909997918947, format it to one decimal place and be done with it.
>>> from math import e, pi
>>> print('{:0.1f}'.format(e**pi - pi))
20.0
If you need precise decimal representation, use Decimal.
Sraw's answer refers to the decimal module, which is part of the standard library. I already mentioned currency as a discrete quantity, but you may need to do calculations on amounts of currency in which not all numbers are discrete, for example calculating interest. If you're writing code for an accounting system, there will be rules that say when rounding is applied and to what accuracy various calculations are done, and those specifications will be written in terms of decimal places. In this situation and others where the decimal representation is inherent to the problem specification, you'll want to use a decimal type.
>>> from decimal import Decimal
>>> rate = Decimal('0.0345')
>>> principal = Decimal('3412.65')
>>> interest = rate*principal
>>> interest
Decimal('117.736425')
>>> interest.quantize(Decimal('0.01'))
Decimal('117.74')
But most importantly, use data types and operations that make sense in context.
Several of your examples use math.floor, which takes a float and rounds it down to the nearest integer (for positive numbers, that chops off the fractional part). In any situation where you should use math.floor, floating-point error doesn't matter. (If you want to round to the nearest integer, use round instead.) Yes, there are ways to use floating-point operations that have wrong results from a mathematical standpoint. But real-world quantities usually fall into one of these categories:
Exact, and therefore should not be put in a float;
Imprecise to a degree far exceeding the likely accumulation of floating-point error.
As a programmer, it's part of your job to know the quantities you're dealing with and choose appropriate data types. So there's no "fix" for floating point numbers, because there's no "problem" really -- just people using the wrong type for the wrong thing.
Let's talk about decimal. This library stores a number as a sequence of decimal digits, much like a string, and does its arithmetic on those digits.
As a result, it can handle very large numbers with almost perfect precision.
But because the arithmetic is done in software on decimal digits rather than in hardware, it costs much more.
Further, if you want to use decimal, you need to use it consistently to ensure precision. If you mix Decimal with normal types such as float, it may cause unexpected problems.
Finally, when you construct a Decimal object, it is better to pass a string than a float.
>>> print(Decimal(99.99) - Decimal(99.973))
0.01699999999999590727384202182
>>> print(Decimal("99.99") - Decimal("99.973"))
0.017
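Applied to the numbers from the question, the advice above might look like the following. This is only a sketch: it assumes the inputs are available as decimal strings, which is not always the case in real code.

```python
import math
from decimal import Decimal

# Build every operand from a string so no binary rounding sneaks in.
var = Decimal('0.009')
step = Decimal('0.0015')

print(math.floor(0.009 / 0.0015))             # 5 with floats, as in the question
print(math.floor(var / step))                 # 6 with Decimal
print(Decimal('99.99') - Decimal('99.973'))   # 0.017

# Mixing Decimal and float in arithmetic is refused outright.
try:
    var + 0.0015
except TypeError as exc:
    print('mixing types failed:', exc)
```

The TypeError is a useful safety net: it makes accidental mixing visible instead of silently reintroducing binary error.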
It depends what your end goal is - there is no way to "perfectly" store floating point numbers. Only "good enough".
If you are working with money, for example (dollars and cents), it is common practice to store only cents, not dollars (1 dollar = 100 cents). This is how PayPal stores your account balance on their servers.
There is also the Python Decimal class for decimal fixed-point and floating-point arithmetic.
Related
When checking whether a float is an integer, the recommended method is is_integer:
However, I get a weird behaviour with the results of the log function:
from math import log
print(log(9, 3))                 # 2.0
print(log(9, 3).is_integer())    # True
print(log(243, 3))               # 5.0
print(log(243, 3).is_integer())  # False
Furthermore:
print(int(log(9, 3)))    # 2
print(int(log(243, 3)))  # 4
Is this normal?
log(243,3) simply doesn't give you exactly 5:
>>> '%.60f' % log(243,3)
'4.999999999999999111821580299874767661094665527343750000000000'
As the docs say, log(x, base) is "calculated as log(x)/log(base)". And neither log(243) nor log(3) can be represented exactly, and you get rounding errors. Sometimes you're lucky, sometimes you're not. Don't count on it.
When you want to compare float numbers, use math.isclose().
When you want to convert a float number that is close to an integer, use round().
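As a rough sketch of both suggestions, using the value from the question (the final check with integer arithmetic is an addition of mine, not part of the advice above):

```python
from math import isclose, log

value = log(243, 3)          # may be a hair below 5 on IEEE 754 doubles

print(isclose(value, 5))     # True: compares within a relative tolerance (default 1e-09)
exponent = round(value)      # nearest integer, returned as an int
print(exponent)              # 5

# Confirm the candidate with exact integer arithmetic instead of trusting the float.
print(3 ** exponent == 243)  # True
```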
Float numbers are too subject to error for "conventional" methods to be used. Their precision (and the precision of functions like log) is too limited, unfortunately. What looks like a 5 may not be an exact 5.
And yes: it is normal. This is not a problem with Python, but with every language I'm aware of (they all use the same underlying representation). Python offers some ways to work around float problems: decimal and fractions. Both have their own drawbacks, but sometimes they help. For example, with fractions, you can represent 1/3 without loss of precision. Similarly, with decimal, you can represent 0.1 exactly. However, you'll still have problems with log, sqrt, irrational numbers, numbers that require many digits to be represented and so on.
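A short sketch of what those two modules can and cannot do, assuming the default decimal context (28 significant digits):

```python
from decimal import Decimal
from fractions import Fraction

third = Fraction(1, 3)
print(third * 3 == 1)                                      # True: exact rational arithmetic
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))   # True
print(0.1 + 0.2 == 0.3)                                    # False with binary floats

# Irrational results are still rounded to the context precision.
root = Decimal(2).sqrt()
print(root * root == 2)                                    # False
```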
What is the default rounding mode (rounding to nearest, etc) in Python? And how can we specify it?
On an IEEE 754-based platform (which most modern ones are, including x86, ARM, MIPS...), the default mode, "round to nearest, ties to even", is the only mode available in the Python standard library. It is "provided" by the standardized defaults and by the absence of library methods to change it. Other languages don't allow changing the rounding mode either, Java for example, so this isn't an isolated Python whim.
In practice, there are very few reasons to change this. The directed rounding modes of IEEE 754 are very specialized in their use. (I'm not defending the decision to stick with the default rounding, just commenting on it.) For example, multiplying 1e308 by 1e308 with rounding toward zero or toward minus infinity yields approximately 1.8e308, a result that is far both from the exact answer and from the one the principle of least astonishment (POLA) would suggest (infinity). If you really need specific modes for your computations, consider using specialized libraries such as MPFR or gmpy2.
If you insist on changing this without external modules specialized in floating-point calculations, try calling the C library's fesetround via the ctypes module or an equivalent, e.g. here. Again, it's your choice to use such hacks and to take responsibility for all the consequences. I'd suggest wrapping every piece of code that needs special rounding in C-level code that restores the default mode on function exit.
The accepted answer is not really correct. While floats are probably what first comes to mind when someone asks about rounding modes, they are not the place you should look.
The reason is simple: rounding is something you use to make your answer have a smaller number of digits. Whenever you mention digits, you must be sure what base you're talking about. I don't know about you specifically, but when people speak of digits, they usually mean decimal digits. For that purpose, floats are obviously inadequate, since they have binary digits. You cannot round a float 0.12 to one decimal digit because it doesn't make sense: despite the appearance, it doesn't have that kind of digits. :-)
Of course, what you can do is try to compensate for the decimal inexactness of floats by rounding them so that the overshoots and undershoots cancel each other in the best possible way. In that context, it was proven long ago that there is only one right answer, ROUND_HALF_EVEN, and it is provided by float (and by the round function, if you need it at some higher decimal place) out of the box. But please note that this is not the same as 'calculating the mean grade' (ROUND_HALF_UP, usually), or 'estimating the mean error' (ROUND_UP), or 'giving you a tax grade' (ROUND_FLOOR), or various other specific tasks that need some fixed number of decimal digits (or, in the case of some now-defunct currencies, some other base, but usually not binary).
And in fact, there is a Python standard library module which gives you all the rounding modes you might find useful, and given the above paragraphs, it is the logical place to look: of course, it's the decimal module. It represents floating point numbers not in base 2, but in base 10, and as such, it offers the meaningful possibility to round a number to a given number of decimals, using a rounding method that makes sense for a particular task.
>>> import statistics, decimal
>>> grades = map(decimal.Decimal, [4, 5])
>>> print(statistics.mean(grades).to_integral_exact(decimal.ROUND_HALF_UP))
5
HTH.
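To make the difference between those modes concrete, here is a small sketch that rounds one hypothetical amount to two decimal places with each of them, using Decimal.quantize:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_UP, ROUND_FLOOR

amount = Decimal('2.665')   # hypothetical value sitting exactly on a tie
cent = Decimal('0.01')      # the exponent of this value sets the target precision

for mode in (ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_UP, ROUND_FLOOR):
    print(mode, amount.quantize(cent, rounding=mode))
# ROUND_HALF_EVEN 2.66
# ROUND_HALF_UP 2.67
# ROUND_UP 2.67
# ROUND_FLOOR 2.66
```

Because the value is built from a string, 2.665 really is a tie here; the float 2.665 would already be slightly off before any rounding mode applied.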
I've used these in the past without any negative effects:
from math import floor, ceil
def round_floor(scale, x):
    return floor(x*(10**scale))/(10**scale)
def round_ceiling(scale, x):
    return ceil(x*(10**scale))/(10**scale)
but I haven't considered the implications for very large or very precise numbers.
>>> round_floor(1,123.456)
123.4
>>> round_floor(-1,123.456)
120.0
>>> round_floor(2,123.456)
123.45
>>> round_ceiling(1,123.456)
123.5
>>> round_ceiling(-1,123.456)
130.0
>>> round_ceiling(-2,123.456)
200.0
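For very precise inputs, one alternative to the float-based helpers is to do the same thing in decimal arithmetic. The following is only a sketch: decimal_round is a hypothetical helper name, and it assumes the value is available as a decimal string.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR

def decimal_round(value, scale, rounding):
    """Round a decimal string to `scale` decimal places using `rounding`.

    Hypothetical helper: `value` is a string so no binary error enters.
    """
    quantum = Decimal(1).scaleb(-scale)   # scale=2 -> 0.01, scale=-1 -> 1E+1
    return Decimal(value).quantize(quantum, rounding=rounding)

print(decimal_round('123.456', 1, ROUND_FLOOR))      # 123.4
print(decimal_round('123.456', 2, ROUND_CEILING))    # 123.46
print(decimal_round('123.456', -1, ROUND_FLOOR))     # 1.2E+2 (equal to 120)
```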
round() is the built-in function for rounding. It works as follows:
round(number[, ndigits])
It returns number rounded to ndigits decimal places.
Values are rounded to the closest multiple of 10 to the power minus ndigits; if two multiples are equally close, rounding is done toward the even choice (so, for example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2). The return value is an integer if called with one argument, otherwise of the same type as number.
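A few quick checks of the behaviour quoted above, assuming Python 3 (Python 2 rounds ties away from zero and always returns a float):

```python
print(round(0.5), round(1.5), round(2.5))   # 0 2 2  (ties go to the even neighbour)
print(type(round(2.7)))                     # <class 'int'> with one argument
print(type(round(2.7, 1)))                  # <class 'float'> when ndigits is given
print(round(2.675, 2))                      # 2.67, because the stored float is just below 2.675
```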
for limitations:
https://docs.python.org/3/tutorial/floatingpoint.html#tut-fp-issues
sources:
https://docs.python.org/3/library/functions.html
I need to round a float variable to 2 decimal places and store the result in a new variable (or the same one as before, it doesn't matter), but this is what happens:
>>> a
981.32000000000005
>>> b= round(a,2)
>>> b
981.32000000000005
I would need this result, but into a variable that cannot be a string since I need to insert it as a float...
>>> print b
981.32
Actually, truncating would also work; I don't need extreme precision in this case.
What you are trying to do is in fact impossible. That's because 981.32 is not exactly representable as a binary floating point value. The closest double precision binary floating point value is:
981.3200000000000500222085975110530853271484375
I suspect that this may come as something of a shock to you. If so, then I suggest that you read What Every Computer Scientist Should Know About Floating-Point Arithmetic.
You might choose to tackle your problem in one of the following ways:
Accept that binary floating point numbers cannot represent such values exactly, and continue to use them. Don't do any rounding at all, and keep the full value. When you wish to display the value as text, format it so that only two decimal places are emitted.
Use a data type that can represent your number exactly. That means a decimal rather than binary type. In Python you would use decimal.
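The two options above can be sketched as follows, using the value from the question:

```python
from decimal import Decimal

a = 981.32                      # stored as the nearest binary double

# Option 1: keep the float, round only when formatting for display.
print(format(a, '.2f'))         # 981.32
print(Decimal(a))               # the exact value actually stored in the float

# Option 2: use a decimal type when the decimal digits themselves matter.
b = Decimal('981.32')
print(b * 100 == 98132)         # True
```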
Try this:
Round = lambda x, n: '%.*f' % (int(n), x)  # formats x with n decimals; returns a string
print(Round(0.1, 2))
0.10
print(Round(0.1, 4))
0.1000
print(Round(981.32000000000005, 2))
981.32
Just pass the number of digits you want as the second argument.
I wrote a solution to this problem.
Please try:
from decimal import *
from autorounddecimal.core import adround,decimal_round_digit
decimal_round_digit(Decimal("981.32000000000005")) #=> Decimal("981.32")
adround(981.32000000000005) # just wrap decimal_round_digit
More detail can be found in https://github.com/niitsuma/autorounddecimal
There is a difference between the way Python prints floats and the way it stores floats. For example:
>>> a = 1.0/5.0
>>> a
0.20000000000000001
>>> print a
0.2
It's not actually possible to store an exact representation of many floats, as David Heffernan points out. It can be done if, looking at the float as a fraction, the denominator is a power of 2 (such as 1/4, 3/8, 5/64). Otherwise, due to the inherent limitations of binary, it has to make do with an approximation.
Python recognizes this, and when you use the print function, it will use the nicer representation seen above. This may make you think that Python is storing the float exactly, when in fact it is not, because it's not possible with the IEEE standard float representation. The difference in calculation is pretty insignificant, though, so for most practical purposes it isn't a problem. If you really really need those significant digits, though, use the decimal package.
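One way to see the stored value rather than the printed one is float.as_integer_ratio, which returns the exact fraction behind a float. A small sketch (is_power_of_two is a helper written for this illustration):

```python
from fractions import Fraction

def is_power_of_two(n):
    """True when the positive integer n is 2**k for some k >= 0."""
    return n > 0 and n & (n - 1) == 0

print((0.375).as_integer_ratio())            # (3, 8): stored exactly
numerator, denominator = (0.2).as_integer_ratio()
print(is_power_of_two(denominator))          # True: the float is some m / 2**k
print(Fraction(0.2) == Fraction(1, 5))       # False: the stored value is not 1/5
```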
I am depending on some code that uses the Decimal class because it needs precision to a certain number of decimal places. Some of the functions allow inputs to be floats because of the way that it interfaces with other parts of the codebase. To convert them to decimal objects, it uses things like
mydec = decimal.Decimal(str(x))
where x is the float taken as input. My question is, does anyone know what the standard is for the 'str' method as applied to floats?
For example, take the number 2.1234512. It is stored internally as 2.12345119999999999 because of how floats are represented.
>>> x = 2.12345119999999999
>>> x
2.1234511999999999
>>> str(x)
'2.1234512'
Ok, str(x) in this case is doing something like '%.6f' % x. This is a problem with the way my code converts to decimals. Take the following:
>>> d = decimal.Decimal('2.12345119999999999')
>>> ds = decimal.Decimal(str(2.12345119999999999))
>>> d - ds
Decimal('-1E-17')
So if I have the float, 2.12345119999999999, and I want to pass it to Decimal, converting it to a string using str() gets me the wrong answer. I need to know what are the rules for str(x) that determine what the formatting will be, because I need to determine whether this code needs to be re-written to avoid this error (note that it might be OK, because, for example, the code might round to the 10th decimal place once we have a decimal object)
There must be some set of rules in python's docs that hopefully someone here can point me to. Thanks!
In the Python source, look in "Include/floatobject.h". The precision for the string conversion is set a few lines from the top, after a comment explaining the choice:
/* The str() precision PyFloat_STR_PRECISION is chosen so that in most cases,
the rounding noise created by various operations is suppressed, while
giving plenty of precision for practical use. */
#define PyFloat_STR_PRECISION 12
You have the option of rebuilding, if you need something different. Any changes will change formatting of floats and complex numbers. See ./Objects/complexobject.c and ./Objects/floatobject.c. Also, you can compare the difference between how repr and str convert doubles in these two files.
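The effect of that constant can be approximated with ordinary string formatting. This sketch assumes the old str() behaved like '%.12g' and the old repr() like '%.17g'; on Python 3.2 and later both str() and repr() instead return the shortest string that round-trips.

```python
x = 2.1234512

print('%.12g' % x)           # 2.1234512 (12 significant digits, like the old str)
print('%.17g' % x)           # 17 significant digits, like the old repr
print(str(x) == repr(x))     # True on Python 3.2 and later
```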
There's a couple of issues worth discussing here, but the summary is: you cannot extract information that is not stored on your system already.
If you've taken a decimal number and stored it as a floating point, you'll have lost information, since most decimal (base 10) numbers with a finite number of digits cannot be stored using a finite number of digits in base 2 (binary).
As was mentioned, str(a_float) will really call a_float.__str__(). As the documentation states, the purpose of that method is to
return a string containing a nicely printable representation of an object
There's no particular definition for the float case. My opinion is that, for your purposes, you should consider __str__'s behavior to be undefined, since there's no official documentation on it - the current implementation can change anytime.
If you don't have the original strings, there's no way to extract the missing digits of the decimal representation from the float objects. All you can do is round predictably, using string formatting (which you mention):
Decimal( "{0:.5f}".format(a_float) )
You can also remove 0s on the right with resulting_string.rstrip("0").
Again, this method does not recover the information that has been lost.
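For comparison, here is a sketch of three ways to get from a float to a Decimal, assuming Python 3, where repr() returns the shortest string that round-trips to the same float:

```python
from decimal import Decimal

a_float = 2.1234512

exact = Decimal(a_float)                       # every digit of the stored binary value
shortest = Decimal(repr(a_float))              # shortest string that maps back to a_float
rounded = Decimal('{0:.5f}'.format(a_float))   # fixed, explicit number of places

print(shortest)            # 2.1234512
print(rounded)             # 2.12345
print(exact == shortest)   # False: the float was never exactly 2.1234512
```

Only the formatted version states its precision explicitly, which is why it is the predictable choice when the number of decimal places is part of the specification.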
Problem: how to see when the computer makes approximations in mathematical calculations when I use Python
Example of the problem:
My old teacher once said the following statement
You can never calculate 200! with your computer.
I am not completely sure whether it is true or not nowadays.
It seems that it is, since I get a lot of zeros for it from a Python script.
How can you see when your Python code makes approximations?
Python uses arbitrary-precision arithmetic to calculate with integers, so it can calculate 200! exactly. For real numbers (so-called floating-point), Python does not use an exact representation. It uses a binary representation called IEEE 754, which is essentially scientific notation, except in base 2 instead of base 10.
Thus, for any real number that cannot be exactly represented in base 2 with 53 bits of precision, Python cannot produce an exact result. For example, 0.1 (in base 10) is an infinitely repeating fraction in base 2, 0.0001100110011..., so it cannot be exactly represented. Hence, if you enter on a Python prompt:
>>> 0.1
0.10000000000000001
The result you get back is different, since it has been converted from decimal to binary (with 53 bits of precision) and back to decimal. As a consequence, you get things like this:
>>> 0.1 + 0.2 == 0.3
False
For a good (but long) read, see What Every Programmer Should Know About Floating-Point Arithmetic.
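To inspect the approximation directly, the standard library offers float.hex and Decimal. A brief sketch, assuming a recent Python 3 (where repr prints the shortest round-tripping string rather than 17 digits):

```python
from decimal import Decimal

print((0.1).hex())        # 0x1.999999999999ap-4: the repeating pattern, cut off and rounded
print(Decimal(0.1))       # 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2)          # 0.30000000000000004
print(repr(0.1))          # 0.1
```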
Python has unbounded integer sizes in the form of a long type. That is to say, if it is a whole number, the limit on the size of the number is restricted by the memory available to Python.
When you compute a large number such as 200! and you see an L on the end of it, that means Python has automatically cast the int to a long, because an int was not large enough to hold that number.
See section 6.4 of this page for more information.
200! is a very large number indeed.
If the range of an IEEE 64-bit double is 1.7E +/- 308 (15 digits), you can see that the largest factorial you can get is around 170!.
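That limit applies only to floats, which a short experiment can confirm. The sketch below uses math.factorial from the standard library:

```python
import math
import sys

print(sys.float_info.max)                 # about 1.8e308, the largest finite double
print(float(math.factorial(170)))         # about 7.26e306: still fits

try:
    float(math.factorial(171))
except OverflowError as exc:
    print('171! does not fit in a float:', exc)

exact = math.factorial(200)               # an exact int, no approximation
digits = str(exact)
trailing_zeros = len(digits) - len(digits.rstrip('0'))
print(len(digits))                        # 375 decimal digits
print(trailing_zeros)                     # 49, one for each factor of 5 in 200!
```

The long run of zeros at the end of 200! is therefore genuine, not a sign of lost precision.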
Python can handle arbitrary sized numbers, as can Java with its BigInteger.
Without some sort of clarification to that statement, it's obviously false. Just from personal experience, early lessons in programming (in the late 1980s) included solving very similar, if not exactly the same, problems. In general, to know some device which does calculations isn't making approximations, you have to prove (in the math sense of a proof) that it isn't.
Python's integer types (named int and long in 2.x, both folded into just the int type in 3.x) are very good, and do not overflow like, for example, the int type in C. If you do the obvious print 200 * 199 * 198 * ... it may be slow, but it will be exact. Similarly, addition, subtraction, and modulus are exact. Division is a mixed bag, as there are two operators, / and //, and their behavior changed during 2.x; in general you can only treat it as inexact.
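A small sketch of the difference between the two division operators, assuming Python 3 semantics (where / on two ints always returns a float):

```python
n = 2 ** 53 + 1            # too large to be represented exactly as a double

print(n // 1 == n)         # True: floor division of ints stays in exact integer arithmetic
print(int(n / 1) == n)     # False: true division returns a float, which rounds n
print((n * n) % 10)        # 9: multiplication and modulus are exact however large the product
```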
If you want more control yet don't want to limit yourself to integers, look at the decimal module.
Python handles large numbers automatically (unlike a language like C where you can overflow its datatypes and the values reset to zero, for example) - over a certain point (sys.maxint or 2147483647) it converts the integer to a "long" (denoted by the L after the number), which can be any length:
>>> def fact(x):
...     return reduce(lambda x, y: x * y, range(1, x+1))
...
>>> fact(10)
3628800
>>> fact(200)
788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000L
Long numbers are "easy", floating point is more complicated, and almost any computer representation of a floating point number is an approximation, for example:
>>> float(1)/3
0.33333333333333331
Obviously you can't store an infinite number of 3's in memory, so it cheats and rounds it a bit.
You may want to look at the decimal module:
Decimal numbers can be represented exactly. In contrast, numbers like 1.1 do not have an exact representation in binary floating point. End users typically would not expect 1.1 to display as 1.1000000000000001 as it does with binary floating point.
Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem
See Handling very large numbers in Python.
Python has a built-in arbitrary-precision integer type (a "bignum") for holding 200! and will use it automatically.
Your teacher's statement, though not exactly true here, is true in general. Computers have limitations, and it is good to know what they are. Remember that every time you add another 32-bit integer of data storage, you can store a number that is 2^32 (4 billion+) times larger. It is hard to comprehend how many more numbers that is, but maths gets slower as you add more integers to store the exact value of a very large number.
As an example (what you can store with about 1000 bits):
>>> 2 << 1000
2143017214372534641896850098120003621122809623411067214887500776740702102249872244986396
7576313917162551893458351062936503742905713846280871969155149397149607869135549648461970
8421492101247422837559083643060929499671638825347975351183310878921541258291423929553730
84335320859663305248773674411336138752L
I tried to illustrate how big a number you can store with 10000 bits, or even 8,000,000 bits (a megabyte) but that number is many pages long.
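Rather than printing such numbers, you can ask an int how many bits it needs with int.bit_length. A minimal sketch:

```python
import math

big = 2 << 1000                            # the same value as 2 ** 1001
print(big == 2 ** 1001)                    # True
print(big.bit_length())                    # 1002 bits are needed to store it
print(math.factorial(200).bit_length())    # bits needed for 200!
print((2 ** 32) ** 3 == 2 ** 96)           # True: each extra 32-bit word multiplies the range by 2**32
```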