I have a variable containing a large floating point number, say a = 999999999999999.99
When I type int(a) in the interpreter, it returns 1000000000000000.
How do I get the output as 999999999999999 for long numbers like these?
999999999999999.99 is a number that can't be precisely represented in the floating-point format, so Python compromises and picks the closest value that can be represented. In this case, that happens to be 1000000000000000. That's why converting that to an integer gives you 1000000000000000.
If you need more precision than floats can provide, consider using decimal.Decimal.
>>> import decimal
>>> a = decimal.Decimal("999999999999999.99")
>>> a
Decimal('999999999999999.99')
>>> int(a)
999999999999999
The problem is not int; the problem is the floating point value itself. Your value would need 17 significant digits of precision to be represented correctly, while double precision floating point values have between 15 and 16 digits of precision. So, when you input it, it is rounded to the nearest representable float value, which is 1000000000000000.0. By the time int is called, there is nothing it can do: the precision is already lost.
If you need to represent values like this exactly, you can use the decimal data type, keeping in mind that performance does suffer compared to regular floats.
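A quick way to see this in the interpreter is sketched below (it assumes Python 3.9+ for `math.ulp`). The literal is already `1e15` before `int` ever runs, neighbouring doubles at this magnitude are 0.125 apart, and only a `Decimal` built from a string keeps the `.99`.

```python
import math
from decimal import Decimal

a = 999999999999999.99          # the literal is rounded to the nearest double at parse time
print(a == 1e15)                # True: the .99 is already gone
print(math.ulp(a))              # 0.125: gap between neighbouring doubles near 1e15 (Python 3.9+)
print(Decimal(a))               # exact value of the stored double: 1000000000000000

exact = Decimal("999999999999999.99")   # built from a string, never passes through float
print(int(exact))               # 999999999999999 (int() truncates toward zero)
```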
In the following example:
import math
x = math.log(2)
print("{:.500f}".format(x))
I tried to get 500 digits of output, but I only get 53 decimals of ln(2):
0.69314718055994528622676398299518041312694549560546875000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
How can I fix this problem?
You can't with the Python float type. It's dependent on the underlying machine architecture, and in most cases you're limited to a double-precision float.
However, you can get higher precision with the decimal module:
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 500
>>> d = Decimal(2)
>>> d.ln()
Decimal('0.69314718055994530941723212145817656807550013436025525412068000949339362196969471560586332699641868754200148102057068573368552023575813055703267075163507596193072757082837143519030703862389167347112335011536449795523912047517268157493206515552473413952588295045300709532636664265410423915781495204374043038550080194417064167151864471283996817178454695702627163106454615025720740248163777338963855069526066834113727387372292895649354702576265209885969320196505855476470330679365443254763274495125040607')
>>> print(d.ln())
0.69314718055994530941723212145817656807550013436025525412068000949339362196969471560586332699641868754200148102057068573368552023575813055703267075163507596193072757082837143519030703862389167347112335011536449795523912047517268157493206515552473413952588295045300709532636664265410423915781495204374043038550080194417064167151864471283996817178454695702627163106454615025720740248163777338963855069526066834113727387372292895649354702576265209885969320196505855476470330679365443254763274495125040607
I tried to get 500 digits of output, but I only get 53 decimals of ln(2):
The problem is not in the printing. The 500 digit output is the exact value returned from math.log(2).
The return value of math.log(2) is encoded using binary64, which can only represent about 2^64 different finite values, each of them a dyadic rational. Mathematically, log(2) is an irrational number, so it is impossible for x to encode the mathematical result exactly.
Instead math.log(2) returns the nearest encodable value.
That value is exactly 0.6931471805599452862267639829951804131269454956054687500...
Printing a binary64 value with more than 17 significant digits typically adds no useful information.
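The following standard-library sketch shows both points: `Decimal(x)` reveals the exact stored value (the same 53 decimals the question printed), while `repr` already prints enough digits to recover the float.

```python
import math
from decimal import Decimal

x = math.log(2)
exact = Decimal(x)              # Decimal(float) converts the binary64 value exactly
print(exact)                    # 0.69314718055994528622676398299518041312694549560546875
print(repr(x))                  # 0.6931471805599453, the shortest string that round-trips
print(float(repr(x)) == x)      # True: the extra digits carry no further information
```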
Within the realm of real numbers, which is an infinite set of numbers with arbitrary precision, the floating point numbers are a small subset of numbers with a finite precision. They are the numbers that are represented by a linear combination of powers of two (See Double Precision floating point format).
As ln(2) is not representable as a floating-point number, a computer finds the nearest number by numerical approximation. In the case of ln(2), this number is:
6243314768165359 * 2^-53 = 0.69314718055994528622676398299518041312694549560546875
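The integer-times-power-of-two form can be checked directly, as in this small sketch: `float.as_integer_ratio` returns the stored value as a reduced fraction, and `math.ldexp` rebuilds the float from the integer and the exponent.

```python
import math

x = math.log(2)
num, den = x.as_integer_ratio()    # exact fraction in lowest terms
print(num, den)                    # 6243314768165359 9007199254740992
print(den == 2**53)                # True: the denominator is a power of two
print(math.ldexp(6243314768165359, -53) == x)   # True: 6243314768165359 * 2**-53
```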
If you need to do arbitrary precision arithmetic, you need different computational methods. Various software packages exist that allow this. For Python, mpmath is fairly standard:
>>> from mpmath import *
>>> mp.dps = 500
>>> mp.pretty=True
>>> ln(2)
0.69314718055994530941723212145817656807550013436025525412068000949339362196969471560586332699641868754200148102057068573368552023575813055703267075163507596193072757082837143519030703862389167347112335011536449795523912047517268157493206515552473413952588295045300709532636664265410423915781495204374043038550080194417064167151864471283996817178454695702627163106454615025720740248163777338963855069526066834113727387372292895649354702576265209885969320196505855476470330679365443254763274495125040607
I have been trying to fix a precision issue that keeps breaking my code. I want the value to be stored exactly as I provided it, but Python seems to be rounding it. Below is a sample.
from decimal import Decimal
x = 3069682093880544.81
print(x)
3069682093880545.0
x = Decimal(3069682093880544.81)
print(x)
3069682093880545
x = float(3069682093880544.81)
print(x)
3069682093880545.0
x = Decimal(str(3069682093880544.81))
print(x)
3069682093880545.0
x = str(3069682093880544.81)
print(x)
3069682093880545.0
All I want is to assign an exact value to the variable and get the same value back when I use it. What am I doing wrong?
The number 3069682093880544.81 is converted into a 64-bit floating point number according to the IEEE 754 format. At this magnitude (between 2^51 and 2^52), adjacent 64-bit floats are 0.5 apart, so the closest representable number is exactly 3069682093880545.0. As you can see, the .81 after the decimal point has been lost.
An IEEE 64-bit float carries only about 15 to 17 significant decimal digits, and Python prints the shortest decimal string that converts back to the same float, which here is 3069682093880545.0.
If you want more decimal places, you need to choose a method that doesn't have an IEEE floating point number anywhere in its processing. (Note that the interpreter already parses a numeric literal like 3069682093880544.81 into a float when it reads your source code.) As mentioned by @LeopardShark,
from decimal import *
print(Decimal("3069682093880544.81"))
goes straight from a string to a Decimal without ever passing through a float.
The problem is that the literal 3069682093880544.81 is parsed as a float. So, your second statement, for example, is sort of equivalent to Decimal(float(3069682093880544.81)). What you want is Decimal("3069682093880544.81") which parses it as a string, and then converts it to a Decimal.
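The sketch below (assuming Python 3.9+ for `math.ulp`) contrasts the two routes. At this magnitude neighbouring doubles are 0.5 apart, so the float literal can only land on a whole or half number, while the string route keeps every digit.

```python
import math
from decimal import Decimal

x = 3069682093880544.81               # parsed as a float before anything else happens
print(math.ulp(x))                    # 0.5: spacing between adjacent doubles here
print(Decimal(x))                     # 3069682093880545, the exact stored double
print(Decimal("3069682093880544.81")) # 3069682093880544.81, parsed from the string
```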
This is more of a numerical analysis question than a programming one, but I suppose some of you will be able to answer it.
In the sum of two floats, is there any precision lost? Why?
In the sum of a float and an integer, is there any precision lost? Why?
Thanks.
In the sum of two floats, is there any precision lost?
If both floats have differing magnitudes and both use the full precision range (about 7 decimal digits for a 32-bit float), then yes, you will see some loss in the last places.
Why?
This is because floats are stored in the form (sign) × (mantissa) × 2^(exponent). If two values have different exponents and you add them, the smaller value loses digits from its mantissa, because it has to be shifted to match the larger exponent:
PS> [float]([float]0.0000001 + [float]1)
1
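Python's own `float` is a 64-bit double, but the single-precision (24-bit significand) behaviour discussed here can be emulated by packing through `struct`'s `'f'` format. In this sketch `to_f32` is a hypothetical helper, and each sum is rounded back to single precision as a float32 addition would be.

```python
import struct

def to_f32(v: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]

one = to_f32(1.0)
tiny = to_f32(1e-8)          # far below half the float32 spacing at 1.0 (about 6e-8)
total = to_f32(one + tiny)   # add, then round the sum back to single precision
print(total == 1.0)          # True: the small addend vanished entirely

# A slightly larger addend survives only as a whole multiple of the spacing 2**-23.
print(to_f32(1.0 + to_f32(1.5e-7)) - 1.0 == 2**-23)   # True: 1.5e-7 became about 1.19e-7
```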
In the sum of a float and an integer, is there any precision lost?
Yes. A normal 32-bit integer can exactly represent values that a float cannot. A float can still store approximately the same number, but no longer exactly. This only applies to numbers that are large enough, i.e. those needing more than 24 significant bits.
Why?
Because a float has 24 bits of precision and a (32-bit) integer has 32. The float will still retain the magnitude and most of the significant digits, but the last places may differ:
PS> [float]2100000050 + [float]100
2100000100
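The integer case can be sketched in Python too: 2**24 + 1 is the first positive integer a binary32 cannot hold (emulated again through `struct`, with `to_f32` a hypothetical helper name), and for Python's 64-bit `float` the same thing happens at 2**53 + 1.

```python
import struct

def to_f32(v: float) -> float:
    """Round a value to the nearest binary32 (single-precision) number."""
    return struct.unpack("f", struct.pack("f", float(v)))[0]

n = 2**24 + 1                      # 16777217 needs 25 significant bits
print(to_f32(n))                   # 16777216.0: the last bit is lost
print(to_f32(2**24) == 2**24)      # True: 2**24 itself is exact

# Python's float has a 53-bit significand, so the threshold is 2**53 + 1.
print(float(2**53 + 1) == 2**53)   # True
print(2.0**53 + 1 == 2.0**53)      # True: the int 1 is converted to float, then the sum rounds
```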
The precision depends on the magnitude of the original numbers. In floating point, the computer stores the number 312 internally in a form of scientific notation (shown here in base 10 for readability):
3.12000000000 * 10 ^ 2
The number of digits in the left-hand part (the mantissa) is fixed. The exponent also has an upper and lower bound. This allows it to represent very large or very small numbers.
If you try to add two numbers which are the same in magnitude, the result should remain the same in precision, because the decimal point doesn't have to move:
312.0 + 643.0 <==>
3.12000000000 * 10 ^ 2 +
6.43000000000 * 10 ^ 2
-----------------------
9.55000000000 * 10 ^ 2
If you tried to add a very big and a very small number, you would lose precision because they must be squeezed into the above format. Consider 312 + 1230000000000. First you have to scale the smaller number to line up with the bigger one, then add:
1.23000000000 * 10 ^ 12 +
0.00000000031 * 10 ^ 12
-----------------------
1.23000000031 * 10 ^ 12   <-- precision lost here! (the exact sum is 1230000000312)
Floating point can handle very large, or very small numbers. But it can't represent both at the same time.
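The decimal analogy can be reproduced with Python's `decimal` module by limiting a context to 12 significant digits, the mantissa width used in the example. This context is an assumption of the sketch; real hardware works in binary, but the alignment-and-round step is the same idea.

```python
from decimal import Context, Decimal

ctx = Context(prec=12)   # 12 significant digits, like 3.12000000000 above

# Same magnitude: the sum fits, nothing is lost.
print(ctx.add(Decimal(312), Decimal(643)))            # 955

# Very different magnitudes: 312 is shifted to line up with 1.23 * 10^12.
print(ctx.add(Decimal(1230000000000), Decimal(312)))  # 1.23000000031E+12 (the final 2 is lost)
```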
As for ints and doubles being added, the int gets turned into a double immediately, then the above applies.
When adding two floating point numbers, there is generally some error. D. Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" describes the effect and the reasons in detail, and also how to calculate an upper bound on the error, and how to reason about the precision of more complex calculations.
When adding a float to an integer, the integer is first converted to a float by C++, so two floats are being added and error is introduced for the same reasons as above.
The precision available for a float is limited, so of course there is always the risk that any given operation drops precision.
The answer for both your questions is "yes".
For instance, if you add a very small float to a very large one, you will run into problems.
Or if you try to add an integer to a float, where the integer uses more bits than the float has available for its mantissa.
The short answer: a computer represents a float with a limited number of bits, often as a mantissa and an exponent, so only a fixed number of bits hold the significant digits and the rest represent the sign and the position of the (binary) point.
If you were to try to add (say) 10^23 and 7, then it won't be able to accurately represent that result. A similar argument applies when adding a float and integer -- the integer will be promoted to a float.
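With Python floats (binary64) the 10^23 example looks like this. 10^23 is itself not exactly representable, and adding the integer 7 (promoted to a float) leaves the stored value unchanged, because neighbouring doubles at that magnitude are 2^24 (about 1.7 × 10^7) apart. `math.ulp` assumes Python 3.9+.

```python
import math

x = 1e23
print(int(x))             # 99999999999999991611392: the nearest double, not 10**23
print(x + 7 == x)         # True: 7 is far below half the gap between neighbours
print(math.ulp(x))        # 16777216.0, i.e. 2**24 (Python 3.9+)
```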
In the sum of two floats, is there any precision lost?
In the sum of a float and an integer, is there any precision lost? Why?
Not always. If the exact sum is representable at the precision you are using, you won't lose any precision.
Example: 0.5 + 0.75 => no precision loss
x * 0.5 => no precision loss (unless x is so small that the result underflows)
In general, though, the floats being added have somewhat different magnitudes, so there is some precision loss, and how much depends on the rounding mode.
That is, if you're adding numbers with very different magnitudes, expect precision problems.
Denormals (subnormal numbers) provide extra precision for values very close to zero, at the cost of CPU time.
Depending on how your compiler handles floating-point computation, results can vary.
With strict IEEE semantics, adding two 32-bit floats should not give better accuracy than 32 bits.
In practice, guaranteeing that may require extra instructions, so you shouldn't rely on accurate and repeatable floating-point results.
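Whether a particular float addition was exact can be tested with `fractions.Fraction`, which converts a float to its exact rational value. `sum_is_exact` is a hypothetical helper written for this sketch; it compares the rounded float sum against the exact sum of the two stored values.

```python
from fractions import Fraction

def sum_is_exact(a: float, b: float) -> bool:
    """True if the rounded float sum equals the exact mathematical sum of a and b."""
    return Fraction(a + b) == Fraction(a) + Fraction(b)

print(sum_is_exact(0.5, 0.75))    # True: 1.25 is representable
print(sum_is_exact(0.1, 0.2))     # False: the exact sum of the stored values is not a double
print(sum_is_exact(1e16, 1.0))    # False: 1e16 + 1 is halfway between doubles and rounds back to 1e16
```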
In both cases yes:
assert( 1E+36f + 1.0f == 1E+36f );
assert( 1E+36f + 1 == 1E+36f );
The case float + int is the same as float + float, because a standard conversion is applied to the int. In the case of float + float, this is implementation dependent, because an implementation may choose to do the addition at double precision. There may be some loss when you store the result, of course.
In both cases, the answer is "yes". When adding an int to a float, the integer is converted to floating point representation before the addition takes place anyway.
To understand why, I suggest you read this gem: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
If you take a number, take its square root, drop the decimal, and then raise it to the second power, the result should always be less than or equal to the original number.
This seems to hold true in python until you try it on 99999999999999975425 for some reason.
import math
def check(n):
    assert math.pow(math.floor(math.sqrt(n)), 2) <= n
check(99999999999999975424)  # No exception.
check(99999999999999975425)  # Throws AssertionError.
It looks like math.pow(math.floor(math.sqrt(99999999999999975425)), 2) returns 1e+20.
I assume this has something to do with the way we store values in python... something related to floating point arithmetic, but I can't reason about specifically how that affects this case.
The problem is not really about sqrt or pow; the problem is that you're using numbers larger than floating point can represent precisely. Standard IEEE 64-bit floating point has a 53-bit significand, so it can't represent every integer beyond 2**53.
Try just converting your inputs to float and back again:
>>> int(float(99999999999999975424))
99999999999999967232
>>> int(float(99999999999999975425))
99999999999999983616
As you can see, at this magnitude adjacent representable values are 16384 apart. The first step in math.sqrt is converting to float (C double), and at that moment your value increased by enough to ruin the end result.
Short version: float can't represent large integers precisely. Use decimal if you need greater precision. Or if you don't care about the fractional component, as of 3.8, you can use math.isqrt, which works entirely in integer space (so you never experience precision loss, only the round down loss you expect), giving you the guarantee you're looking for, that the result is "the greatest integer a such that a² ≤ n".
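A sketch of the `math.isqrt` approach (Python 3.8+) applied to the question's test. It never converts to float, so the invariant holds for the failing input; `check_isqrt` is just a name chosen for this example.

```python
import math

def check_isqrt(n: int) -> None:
    r = math.isqrt(n)                  # exact floor(sqrt(n)), computed on ints
    assert r * r <= n < (r + 1) ** 2   # r is the greatest integer with r*r <= n

check_isqrt(99999999999999975424)
check_isqrt(99999999999999975425)      # passes, unlike the float version
print(math.isqrt(99999999999999975425))  # 9999999999
```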
Contrary to what Evan Rose's (now-deleted) answer claims, this is not due to an epsilon value in the sqrt algorithm.
Most math module functions cast their inputs to float, and math.sqrt is one of them.
99999999999999975425 cannot be represented as a float. For this input, the cast produces a float with exact numeric value 99999999999999983616, which repr shows as 9.999999999999998e+19:
>>> float(99999999999999975425)
9.999999999999998e+19
>>> int(_)
99999999999999983616
The closest float to the square root of this number is 10000000000.0, and that's what math.sqrt returns.
So I know how to print a floating point number with a certain number of decimal places.
My question is how to return it with a specified number of decimal places?
Thanks.
You could use the round() function
The docs about it:
round(x[, n])
x rounded to n digits, rounding half to even. If n is omitted, it defaults to 0.
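A few examples of that behaviour in Python 3 follow. Exact halves go to the even neighbour, and because the literal `2.675` is stored as a double slightly below 2.675, `round(2.675, 2)` gives 2.67 (the Python documentation uses this same example).

```python
print(round(0.5), round(1.5), round(2.5))   # 0 2 2: halves go to the even integer
print(round(3.14159, 2))                      # 3.14
print(round(2.675, 2))                        # 2.67, since 2.675 is really 2.67499999...
```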
In order to get two decimal places, multiply the number by 100, floor it, then divide by 100.
And note that the number you return will not really have only two decimal places, because most decimal fractions (such as the result of dividing by 100) cannot be represented exactly in IEEE-754 binary floating-point arithmetic. It will only be the closest representable approximation to a number with two decimal places.
If you really want numbers with a fixed decimal precision, you could use the decimal module. Its numbers have a user-alterable precision, and you could do your calculation on values with two decimal places.
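The multiply-floor-divide recipe and the `decimal` alternative are sketched below. `truncate_2dp` is a hypothetical helper name; the `0.29` case shows how the multiplication by 100 can itself land just below an integer, so the floor drops a cent, whereas a `Decimal` built from a string and quantized with `ROUND_DOWN` stays exact.

```python
import math
from decimal import Decimal, ROUND_DOWN

def truncate_2dp(x: float) -> float:
    """Multiply by 100, floor, divide by 100 (floor rounds negatives downward)."""
    return math.floor(x * 100) / 100

print(truncate_2dp(3.14159))   # 3.14
print(0.29 * 100)              # 28.999999999999996
print(truncate_2dp(0.29))      # 0.28, not 0.29

# With Decimal built from a string, the two-place result is exact.
d = Decimal("0.29").quantize(Decimal("0.01"), rounding=ROUND_DOWN)
print(d)                       # 0.29
```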
Mathematically, a real number can have an infinite number of decimal places. Its physical representation on the computer depends on the type used (float, double, or whatever), and that in turn depends on a) the language, b) the construct, e.g. float, double, etc., c) the compiler implementation, and d) the hardware.
Now, given that you have a representation of a floating point number (i.e. a real) within a particular language, is your question how to round it off or truncate it to a specific number of digits?
There is no need to do this within the return call, since you can always truncate or round afterwards. In fact, you would usually not want to truncate until actually printing, to preserve more precision. An exception might be if you wanted to ensure that results were consistent across different algorithms or hardware, i.e. say you had some financial trading software that needed to pass unit tests across different languages and platforms.