I have the following code:
import numpy as np
float_number_1 = -0.09115307778120041
float_number_2 = -0.41032475233078003
print(float_number_1) #-0.09115307778120041
print(float_number_2) #-0.41032475233078003
my_array= np.array([[float_number_1, float_number_2]], dtype=np.float64)
for number in my_array:
    print(number) #[-0.09115308 -0.41032475]
Now when I add the following:
np.set_printoptions(precision=50)
and print my_array again I get the full numbers:
[-0.09115307778120041 -0.41032475233078003]
Is set_printoptions just for display purposes or does this affect the actual precision of the numbers held in the numpy array?
I need to keep the precision for calculation purposes.
Also, what is the maximum precision I can set for float64?
Yes, it is just there for display options. The storage format for numbers is not altered. From the numpy.set_printoptions() documentation:
These options determine the way floating point numbers, arrays and other NumPy objects are displayed.
(Bold emphasis mine)
A 64-bit floating point number has 53 bits of significand precision, so the smallest relative step between adjacent values is 2^-52, or 2.220446049250313e-16. That is about 16 decimal digits, so there is probably not much point in going beyond np.set_printoptions(precision=16).
Note that the setting for floatmode also matters; the default is 'maxprec', which means that the number of digits actually shown depends on the values in your array. If you set precision=16 but every value in your array can be uniquely represented with just 4 decimals, then numpy will not use more. Only if floatmode is set to 'fixed' will numpy always print the full precision setting.
On what it means to uniquely represent floating point numbers: because floating point numbers are an approximation using binary fractions, there is a whole range of real numbers that all result in the exact same floating point representation, e.g. 0.5 and 0.50000000000000005 both end up as the same float64 bit pattern, 0x3FE0000000000000. Numpy strives to find the fewest decimal digits that still identify a floating point value uniquely, then shows that to you.
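To make the "display only" point concrete, here is a minimal sketch. It assumes NumPy 1.15 or later for the np.printoptions context manager; the variable names are only illustrative. It compares the raw bytes of the array before and after changing the print settings, and shows how floatmode changes the text for values that need few digits.

```python
import numpy as np

values = np.array([-0.09115307778120041, -0.41032475233078003], dtype=np.float64)
raw_before = values.tobytes()  # exact stored bit patterns

# Print options only change the text that NumPy generates.
with np.printoptions(precision=4):
    short_text = str(values)
with np.printoptions(precision=17):
    long_text = str(values)

raw_after = values.tobytes()
unchanged = raw_before == raw_after  # the stored values were never touched

# floatmode decides whether short values are padded out to the full precision.
simple = np.array([0.5, 0.25])
with np.printoptions(precision=16, floatmode="maxprec"):
    shortest_text = str(simple)
with np.printoptions(precision=16, floatmode="fixed"):
    padded_text = str(simple)

print(short_text)
print(long_text)
print(unchanged)
print(shortest_text)
print(padded_text)
```

Any calculation done on values uses the full float64 contents regardless of which precision is in effect when the array is printed.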
When I run the simple code:
a = np.float64([20269207])
b = np.float32(a)
The output turns out to be
a = array([20269207.])
b = array([20269208.], dtype=float32)
What causes the difference before and after this conversion? And under what conditions will the outputs differ?
It is impossible to store the value 20269207 in the float32 (IEEE 754) format.
Here is why:
It is possible to store the values 20269206 and 20269208; their representations in binary form are (see IEEE-754 Floating Point Converter):
01001011100110101010010001001011 for 20269206
01001011100110101010010001001100 for 20269208
Their binary forms differ by 1, so there is no room for any number between 20269206 and 20269208.
Your number lies exactly halfway between the two. Under either of the IEEE 754 rounding rules “Round to nearest, ties to even” and “Round to nearest, ties away from zero”, it is rounded to 20269208, which is both the neighbor with the even significand and the neighbor farther from zero.
Outputs for integer numbers will be different:
for odd numbers with absolute value greater than 16,777,216,
for almost all numbers with absolute value greater than 33,554,432.
Notes:
The first number is 2^24, the second one is 2^25.
"almost all" - there are "nice" numbers, such as powers of 2, which have precise representations even when they are very large.
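The conversion can be reproduced without NumPy. The following is a small sketch using only the standard struct module; the helper names to_float32 and float32_bits are made up for this illustration. Packing with the "f" format rounds a Python float to the nearest float32.

```python
import struct

def to_float32(value):
    """Round a Python float (float64) to the nearest float32, returned as a float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def float32_bits(value):
    """Return the 32-bit IEEE 754 pattern of value as a binary string."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    return format(as_int, "032b")

print(to_float32(20269207.0))    # 20269208.0
print(float32_bits(20269206.0))  # 01001011100110101010010001001011
print(float32_bits(20269208.0))  # 01001011100110101010010001001100

# The first odd integer above 2^24 is already affected.
print(to_float32(16777217.0))    # 16777216.0
# Powers of two stay exact even when they are large.
print(to_float32(2.0 ** 40) == 2.0 ** 40)  # True
```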
Inspired by this answer, I wonder why numpy.nextafter gives different results for the smallest positive float number from numpy.finfo(float).tiny and sys.float_info.min:
import numpy, sys
nextafter = numpy.nextafter(0., 1.) # 5e-324
tiny = numpy.finfo(float).tiny # 2.2250738585072014e-308
info = sys.float_info.min # 2.2250738585072014e-308
According to the documentation:
numpy.nextafter
Return the next floating-point value after x1 towards x2, element-wise.
finfo(float).tiny
The smallest positive usable number. Type of tiny is an appropriate floating point type.
sys.float_info
A structseq holding information about the float type. It contains low level information about the precision and internal representation. Please study your system's :file:float.h for more information.
Does someone have an explanation for this?
The documentation’s wording on this is bad; “usable” is colloquial and not defined. Apparently tiny is meant to be the smallest positive normal number.
nextafter is returning the actual next representable value after zero, which is subnormal.
Python does not rigidly specify its floating-point properties. Python implementations commonly inherit them from underlying hardware or software, and use of IEEE-754 formats (but not full conformance to IEEE-754 semantics) is common. In IEEE-754, numbers are represented with an implicit leading one bit in the significand [1] until the exponent reaches its minimum value for the format, after which the implicit bit is zero instead of one and smaller values are representable only by reducing the significand instead of reducing the exponent. These numbers with the implicit leading zero are the subnormal numbers. They serve to preserve some useful arithmetic properties, such as x-y == 0 if and only if x == y. (Without subnormal numbers, two very small numbers might be different, but their even smaller difference might not be representable because it was below the exponent limit, so computing x-y would round to zero, resulting in code like if (x != y) quotient = t / (x-y) getting a divide-by-zero error.)
Note
[1] “Significand” is the term preferred by experts for the fraction portion of a floating-point representation. “Mantissa” is an old term for the fraction portion of a logarithm. Mantissas are logarithmic, while significands are linear.
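The distinction between the smallest normal and the smallest subnormal value can be checked with the standard library alone. This is a sketch that assumes the platform's Python float is an IEEE-754 binary64, which is common but not guaranteed; the variable names are illustrative.

```python
import struct
import sys

# Smallest positive normal number, the same value as numpy.finfo(float).tiny.
smallest_normal = sys.float_info.min

# Smallest positive subnormal: the bit pattern with only the lowest
# significand bit set, which is what nextafter(0., 1.) returns.
(smallest_subnormal,) = struct.unpack("<d", struct.pack("<Q", 1))

print(smallest_normal)     # 2.2250738585072014e-308
print(smallest_subnormal)  # 5e-324

# Subnormals keep x - y nonzero for distinct tiny x and y.
x = 1.5 * smallest_normal
y = smallest_normal
difference = x - y
print(x != y, difference != 0.0)  # True True
```

Here difference is half of smallest_normal, so it is only representable because subnormal numbers exist.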
This is more of a numerical analysis question than a programming question, but I suppose some of you will be able to answer it.
In the sum of two floats, is there any precision lost? Why?
In the sum of a float and an integer, is there any precision lost? Why?
Thanks.
In the sum of two floats, is there any precision lost?
If both floats have differing magnitude and both are using the complete precision range (of about 7 decimal digits) then yes, you will see some loss in the last places.
Why?
This is because floats are stored in the form of (sign) (mantissa) × 2^(exponent). If two values have differing exponents and you add them, then the smaller value will get reduced to fewer digits in the mantissa (because it has to adapt to the larger exponent):
PS> [float]([float]0.0000001 + [float]1)
1
In the sum of a float and an integer, is there any precision lost?
Yes, a normal 32-bit integer is capable of representing values exactly which do not fit exactly into a float. A float can still store approximately the same number, but no longer exactly. Of course, this only applies to numbers that are large enough, i.e. longer than 24 bits.
Why?
Because float has 24 bits of precision and (32-bit) integers have 32. float will still be able to retain the magnitude and most of the significant digits, but the last places will likely differ:
PS> [float]2100000050 + [float]100
2100000100
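The same two effects can be seen in Python, with one difference that is worth stating loudly: a Python float is a 64-bit double with a 53-bit significand, so the thresholds in this sketch are 2^53 and roughly 16 decimal digits rather than the 24 bits and 7 digits of a 32-bit float. The variable names are illustrative.

```python
# Integer too wide for the significand: 2**53 + 1 needs 54 bits.
big_int = 2**53 + 1
as_float = float(big_int)
print(big_int)        # 9007199254740993
print(int(as_float))  # 9007199254740992

# Adding values of very different magnitude: the small one is absorbed.
absorbed = (1e16 + 1.0) == 1e16
print(absorbed)       # True

# A float plus an int behaves the same way, because the int is converted first.
print((1e16 + 1) == 1e16)  # True

# Values of similar magnitude add without loss here.
print(312.0 + 643.0)  # 955.0
```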
The precision depends on the magnitude of the original numbers. In floating point, the computer represents the number 312 internally as scientific notation:
3.12000000000 * 10 ^ 2
The decimal places in the left hand side (mantissa) are fixed. The exponent also has an upper and lower bound. This allows it to represent very large or very small numbers.
If you try to add two numbers which are the same in magnitude, the result should remain the same in precision, because the decimal point doesn't have to move:
312.0 + 643.0 <==>
3.12000000000 * 10 ^ 2 +
6.43000000000 * 10 ^ 2
-----------------------
9.55000000000 * 10 ^ 2
If you tried to add a very big and a very small number, you would lose precision because they must be squeezed into the above format. Consider 312 + 12300000000000. First you have to scale the smaller number to line up with the bigger one, then add:
1.23000000000 * 10 ^ 13 +
0.00000000003 * 10 ^ 13
-----------------------
1.23000000003 * 10 ^ 13 <-- precision lost here!
Floating point can handle very large, or very small numbers. But it can't represent both at the same time.
As for ints and doubles being added, the int gets turned into a double immediately, then the above applies.
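The decimal picture above can be imitated with Python's decimal module by limiting a context to 12 significant digits. This is only a sketch of the analogy: real floats use binary significands, and the operand 1.23E13 is an illustrative value chosen for this example.

```python
from decimal import Context, Decimal

# A toy format with 12 significant decimal digits, like the
# d.ddddddddddd * 10 ^ e layout used in the explanation.
twelve_digits = Context(prec=12)

# Same magnitude: nothing has to be shifted, so the sum is exact.
same_scale = twelve_digits.add(Decimal("312"), Decimal("643"))
print(same_scale)   # 955

# Very different magnitudes: the low digits of 312 do not fit.
mixed_scale = twelve_digits.add(Decimal("1.23E13"), Decimal("312"))
print(mixed_scale)  # 1.23000000003E+13

# Exact sum for comparison, computed with plain integers.
exact = 12300000000000 + 312
lost = exact - int(mixed_scale)
print(lost)         # 12
```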
When adding two floating point numbers, there is generally some error. D. Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" describes the effect and the reasons in detail, and also how to calculate an upper bound on the error, and how to reason about the precision of more complex calculations.
When adding a float to an integer, the integer is first converted to a float by C++, so two floats are being added and error is introduced for the same reasons as above.
The precision available for a float is limited, so of course there is always the risk that any given operation drops precision.
The answer for both your questions is "yes".
For instance, you will have problems if you try adding a very large float to a very small one.
The same happens if you try to add an integer to a float, where the integer uses more bits than the float has available for its mantissa.
The short answer: a computer represents a float with a limited number of bits, which is often done with mantissa and exponent, so only a few bytes are used for the significant digits, and the others are used to represent the position of the decimal point.
If you were to try to add (say) 10^23 and 7, then it won't be able to accurately represent that result. A similar argument applies when adding a float and integer -- the integer will be promoted to a float.
In the sum of two floats, is there any precision lost?
In the sum of a float and an integer, is there any precision lost? Why?
Not always. If the sum is representable with the precision you ask for, you won't get any precision loss.
Example: 0.5 + 0.75 => no precision loss
x * 0.5 => no precision loss (except if x is too small)
In the general case, one adds floats in slightly different ranges, so there is a precision loss which actually depends on the rounding mode.
i.e. if you're adding numbers with totally different ranges, expect precision problems.
Denormals are there to give extra precision in extreme cases, at the expense of CPU time.
Depending on how your compiler handles floating-point computation, results can vary.
With strict IEEE semantics, adding two 32-bit floats should not give better accuracy than 32 bits.
In practice it may require more instructions to ensure that, so you shouldn't rely on accurate and repeatable results with floating-point.
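Whether a particular operation loses precision can be tested directly. The sketch below uses Python's fractions module, which converts a float to the exact rational it stores; the helper names sum_is_exact and halving_is_exact are invented for this example, and the floats are Python's 64-bit doubles.

```python
from fractions import Fraction

def sum_is_exact(a, b):
    """True if the floating-point sum a + b equals the exact rational sum."""
    return Fraction(a) + Fraction(b) == Fraction(a + b)

def halving_is_exact(x):
    """True if x * 0.5 equals exactly half of the value stored in x."""
    return Fraction(x) / 2 == Fraction(x * 0.5)

print(sum_is_exact(0.5, 0.75))  # True: 1.25 is representable
print(sum_is_exact(0.1, 0.2))   # False: the result had to be rounded
print(sum_is_exact(1e16, 1.0))  # False: totally different ranges

print(halving_is_exact(0.1))     # True: only the exponent changes
print(halving_is_exact(5e-324))  # False: x is too small to be halved
```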
In both cases yes:
assert( 1E+36f + 1.0f == 1E+36f );
assert( 1E+36f + 1 == 1E+36f );
The case float + int is the same as float + float, because a standard conversion is applied to the int. In the case of float + float, this is implementation dependent, because an implementation may choose to do the addition at double precision. There may be some loss when you store the result, of course.
In both cases, the answer is "yes". When adding an int to a float, the integer is converted to floating point representation before the addition takes place anyway.
To understand why, I suggest you read this gem: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
I have a variable containing a large floating point number, say a = 999999999999999.99
When I type int(a) in the interpreter, it returns 1000000000000000.
How do I get the output as 999999999999999 for long numbers like these?
999999999999999.99 is a number that can't be precisely represented in the floating-point format, so Python compromises and picks the closest value that can be represented. In this case, that happens to be 1000000000000000. That's why converting that to an integer gives you 1000000000000000.
If you need more precision than floats can provide, consider using decimal.Decimal.
>>> import decimal
>>> a = decimal.Decimal("999999999999999.99")
>>> a
Decimal('999999999999999.99')
>>> int(a)
999999999999999
The problem is not int, the problem is the floating point value itself. Your value would need 17 digits of precision to be represented correctly, while double precision floating point values have between 15 and 16 digits of precision. So, when you input it, it is rounded to the nearest representable float value, which is 1000000000000000.0. When int is called it cannot do a thing - the precision is already lost.
If you need to represent values of this kind exactly, you can use the decimal data type, keeping in mind that performance does suffer compared to regular floats.
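A quick way to confirm that the precision is lost before int is ever called is to look at the value the float actually holds. In this sketch, Decimal(a) converts the stored double exactly, with no rounding of its own, and sys.float_info.dig reports how many decimal digits a float is guaranteed to preserve.

```python
import sys
from decimal import Decimal

a = 999999999999999.99

# The literal is rounded when it is parsed, long before int() runs.
print(a == 1e15)   # True
print(Decimal(a))  # 1000000000000000
print(int(a))      # 1000000000000000

# Digits a float can always hold faithfully; the literal above has 17.
print(sys.float_info.dig)  # 15

# Keeping the value as text until Decimal parses it avoids the rounding.
print(int(Decimal("999999999999999.99")))  # 999999999999999
```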
So I know how to print a floating point number with a certain number of decimal places.
My question is how to return it with a specified number of decimal places?
Thanks.
You could use the round() function
The docs about it:
round(x[, n])
x rounded to n digits, rounding half to even. If n is omitted, it defaults to 0.
In order to get two decimal places, multiply the number by 100, floor it, then divide by 100.
And note that the number you will return will not really have only two decimal places because division by 100 cannot be represented exactly in IEEE-754 floating-point arithmetic most of the time. It will only be the closest representable approximation to a number with only two decimal places.
If you really want floating point numbers with a fixed precision you could use the decimal module. Those numbers have a user alterable precision and you could just do your calculation on two-digit decimals.
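The three suggestions above can be put side by side. This is a rough sketch; the function names rounded_float, truncated_float and rounded_decimal are hypothetical helpers written for this illustration, not part of any library.

```python
import math
from decimal import Decimal, ROUND_HALF_EVEN

def rounded_float(value, places=2):
    """Return the float closest to value rounded to the given decimal places."""
    return round(value, places)

def truncated_float(value, places=2):
    """Multiply, floor, divide: cuts digits off instead of rounding to nearest."""
    scale = 10 ** places
    return math.floor(value * scale) / scale

def rounded_decimal(text, places=2):
    """Return a Decimal that really has exactly the given number of places."""
    quantum = Decimal(1).scaleb(-places)  # Decimal('0.01') for places=2
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_EVEN)

print(rounded_float(2.999))      # 3.0
print(truncated_float(2.999))    # 2.99
print(rounded_decimal("2.675"))  # 2.68

# The float 2.675 is stored slightly below 2.675, so round() gives 2.67.
print(round(2.675, 2))           # 2.67

# The returned float is only the closest approximation to 3.14.
print(Decimal(rounded_float(3.14159)) == Decimal("3.14"))  # False
```

The Decimal version takes its input as a string so that the value is never rounded to a binary float on the way in.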
Real numbers can have an infinite number of decimal places. The physical representation on the computer depends on the floating point type (float, double, or whatever), which in turn depends on a) the language, b) the construct, e.g. float, double, etc., c) the compiler implementation and d) the hardware.
Now, given that you have a representation of a floating point number (i.e. a real) within a particular language, is your question how to round it off or truncate it to a specific number of digits?
There is no need to do this within the return call, since you can always truncate/round afterwards. In fact, you would usually not want to truncate until actually printing, to preserve more precision. An exception might be if you wanted to ensure that results were consistent across different algorithms/hardware, ie. say you had some financial trading software that needed to pass unit tests across different languages/platforms etc.