When I run the simple code:
import numpy as np

a = np.float64([20269207])
b = np.float32(a)
the output turns out to be:
a = array([20269207.])
b = array([20269208.], dtype=float32)
What causes the difference between the value before and after this conversion? And under what conditions will the outputs differ?
It is impossible to store the value 20269207 in the float32 (IEEE 754) format.
You can see why:
It is possible to store the values 20269206 and 20269208; their representations in binary form are (see IEEE-754 Floating Point Converter):
01001011100110101010010001001011 for 20269206
01001011100110101010010001001100 for 20269208
Their binary forms differ by 1, so there is no place for any number between 20269206 and 20269208.
By the IEEE 754 rounding rules "round to nearest, ties to even" and "round to nearest, ties away from zero", your number, which lies exactly halfway between these two representable values, is rounded to the higher one, i.e. to 20269208.
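If you want to check such bit patterns from Python itself, the struct module can expose the binary32 encoding (a small sketch; bits32 is just a helper name made up for this example):

import struct

def bits32(x):
    # Pack as IEEE-754 binary32, then reinterpret the 4 bytes as an unsigned int.
    return format(struct.unpack('<I', struct.pack('<f', x))[0], '032b')

print(bits32(20269206.0))  # 01001011100110101010010001001011
print(bits32(20269208.0))  # 01001011100110101010010001001100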
The outputs will differ for integer inputs:
for odd numbers with absolute value greater than 16,777,216,
for almost all numbers with absolute value greater than 33,554,432.
Notes:
The first number is 2^24, the second one is 2^25.
"allmost all" - there are "nice" numbers, such as powers of 2, which have precise representations even for very very large numbers.
Why do some float multiplications in Python produce those weird residues?
e.g.
>>> 50*1.1
55.00000000000001
but
>>> 30*1.1
33.0
The reason must lie somewhere in the binary representation of floats, but what exactly makes the two examples differ?
(This answer assumes your Python implementation uses IEEE-754 binary64, which is common.)
When 1.1 is converted to floating-point, the result is exactly 1.100000000000000088817841970012523233890533447265625, because this is the nearest representable value. (This number is 4953959590107546 · 2^-52, an integer with at most 53 bits multiplied by a power of two.)
When that is multiplied by 50, the exact mathematical result is 55.00000000000000444089209850062616169452667236328125. That cannot be exactly represented in binary64. To fit it into the binary64 format, it is rounded to the nearest representable value, which is 55.00000000000000710542735760100185871124267578125 (which is 7740561859543041 · 2^-47).
When it is multiplied by 30, the exact result is 33.00000000000000266453525910037569701671600341796875. It also cannot be represented exactly in binary64. It is rounded to the nearest representable value, which is 33. (The next higher representable value is 33.00000000000000710542735760100185871124267578125, and we can see that …026 is closer to …000 than to …071.)
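You can verify these exact values from Python itself: the decimal module's Decimal constructor converts a float with no rounding at all, so it reveals exactly what the binary64 value holds.

from decimal import Decimal

print(Decimal(1.1))
# 1.100000000000000088817841970012523233890533447265625
print(Decimal(50 * 1.1))
# 55.00000000000000710542735760100185871124267578125
print(Decimal(30 * 1.1))
# 33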
That explains what the internal results are. Next there is an issue of how your Python implementation formats the output. I do not believe the Python implementation is strict about this, but it is likely one of two methods is used:
In effect, the number is converted to a certain number of decimal digits, and then trailing insignificant zeros are removed. Converting 55.00000000000000710542735760100185871124267578125 to a numeral with 16 digits after the decimal point yields 55.00000000000001, which has no trailing zeros to remove. Converting 33 to a numeral with 16 digits after the decimal point yields 33.00000000000000, which has 15 trailing zeros to remove. (Presumably your Python implementation always leaves at least one trailing zero after a decimal point to clearly distinguish that it is a floating-point number rather than an integer.)
Just enough decimal digits are used to uniquely distinguish the number from adjacent representable values. This method is required in Java and JavaScript but is not yet common in other programming languages. In the case of 55.00000000000000710542735760100185871124267578125, printing “55.00000000000001” distinguishes it from the neighboring values 55 (which would be formatted as “55.0”) and 55.0000000000000142108547152020037174224853515625 (which would be “55.000000000000014”).
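CPython 3 uses this second method: repr() produces the shortest decimal string that round-trips to the same float. A quick demonstration (output assumes a binary64 build):

x = 50 * 1.1
y = 30 * 1.1
print(repr(x))              # 55.00000000000001 (shortest round-tripping form)
print(repr(y))              # 33.0
print(format(x, '.17g'))    # 55.000000000000007 (17 significant digits)
print(float(repr(x)) == x)  # True: the short form still identifies x exactly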
I have the following code:
import numpy as np
float_number_1 = -0.09115307778120041
float_number_2 = -0.41032475233078003
print(float_number_1) #-0.09115307778120041
print(float_number_2) #-0.41032475233078003
my_array = np.array([[float_number_1, float_number_2]], dtype=np.float64)
for number in my_array:
    print(number)  # [-0.09115308 -0.41032475]
Now when I add the following:
np.set_printoptions(precision=50)
and print my_array again, I get the full numbers:
[-0.09115307778120041 -0.41032475233078003]
Is set_printoptions just for display purposes or does this affect the actual precision of the numbers held in the numpy array?
I need to keep the precision for calculation purposes.
Also, what is the maximum precision I can set for float64?
Yes, it is just there for display purposes; the storage format of the numbers is not altered. From the numpy.set_printoptions() documentation:
These options determine the way floating point numbers, arrays and other NumPy objects are displayed.
(Bold emphasis mine)
A 64-bit floating point number has 53 bits of significand precision, so the relative spacing between adjacent values (the machine epsilon) is 2^-52, or 2.220446049250313e-16; that is about 16 significant decimal digits. Since 17 significant digits are always enough to round-trip a float64 exactly, there is not much point going beyond np.set_printoptions(precision=17).
Note that the setting for floatmode also matters; the default is 'maxprec_equal', which means that the number of digits actually shown depends on the actual values in your array; if you set precision=16 but your array values can all be uniquely represented with just 4 decimals, then numpy will not use more. Only if floatmode is set to 'fixed' will numpy stick to the larger precision setting.
On what it means to uniquely represent floating point numbers: because floating point numbers are an approximation using binary fractions, there is a whole range of real numbers that all result in the exact same floating point representation. For example, 0.5 and 0.50000000000000005 both end up as the same float (in 32-bit form, the bit pattern 00111111000000000000000000000000). Numpy strives to find the fewest decimal digits that still uniquely identify the stored value, and shows you that.
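A minimal sketch of the display-only effect (the array values are taken from the question above; floatmode needs a reasonably recent NumPy):

import numpy as np

a = np.array([-0.09115307778120041, -0.41032475233078003])
print(a)  # [-0.09115308 -0.41032475] with the default precision=8

np.set_printoptions(precision=17, floatmode='fixed')
print(a)  # every element now shown with 17 digits after the decimal point

# The stored values never changed; only the display did:
print(a[0] == -0.09115307778120041)  # True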
What is Python's threshold of representable negative numbers? What is the lowest number below which Python will call any other value negative infinity?
There is no most negative integer, as Python integers have arbitrary precision. The most negative finite float, i.e. the smallest value greater than negative infinity (which itself, depending on your implementation, can be written as -float('inf')), can be derived from sys.float_info.
>>> import sys
>>> sys.float_info.max
1.7976931348623157e+308
The actual values depend on the implementation, but CPython typically uses your C library's double type. Since floating-point values use a sign bit, the most negative finite value is simply the negation of the largest positive value. Also, because of how floating point values are stored (separate significand and exponent), you can't simply subtract a small value from the "minimum" value and get back negative infinity; subtracting 1, for example, simply returns the same value due to limited precision.
(In other words, the possible float values are a small subset of the actual real numbers, and operations on two float values are not necessarily equivalent to the same operation on the "equivalent" reals.)
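A short interactive check (assuming a typical IEEE-754 binary64 build of CPython):

>>> import sys
>>> lowest = -sys.float_info.max  # most negative finite float
>>> lowest
-1.7976931348623157e+308
>>> lowest - 1 == lowest  # 1 is far below the precision at this magnitude
True
>>> lowest * 2  # overflowing past the finite range gives negative infinity
-inf
>>> lowest == -float('inf')  # but lowest itself is still finite
False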
Sorry, but I really don't know the meaning of the definition of round in the Python 3.3.2 documentation:
round(number[, ndigits])
Return the floating point value number rounded to ndigits digits after the decimal point. If ndigits is omitted, it defaults to zero. Delegates to number.__round__(ndigits).
For the built-in types supporting round(), values are rounded to the closest multiple of 10 to the power minus ndigits; if two multiples are equally close, rounding is done toward the even choice (so, for example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2). The return value is an integer if called with one argument, otherwise of the same type as number.
I don't understand where the "multiple of 10 to the power minus ndigits" comes from.
After reading the following examples, I think round(number, n) works like this:
Let number be 123.456 and let n be 2.
round finds the two candidate multiples 123.45 and 123.46.
round compares abs(number - 123.45) (0.006) with abs(number - 123.46) (0.004) and chooses the smaller,
so 123.46 is the result.
Now let number be 123.455 and n be 2:
round finds the two candidate multiples 123.45 and 123.46.
round compares abs(number - 123.45) (0.005) with abs(number - 123.46) (0.005). They are equal, so round looks at the last digit of each candidate and picks the even one,
so the result is 123.46.
Am I right?
If not, could you offer an understandable version of "values are rounded to the closest multiple of 10 to the power minus ndigits"?
ndigits = 0 => pow(10, -ndigits) = 10^(-ndigits) = 1
ndigits = 1 => pow(10, -ndigits) = 10^(-ndigits) = 0.1
etc.
>>> for ndigits in range(6):
...     print(round(123.456789, ndigits))
...
123.0
123.5
123.46
123.457
123.4568
123.45679
Basically, the number you get is always an integer multiple of 10^(-ndigits). For ndigits=0, that means the number you get is itself an integer; for ndigits=1, it means it won't have more than one digit after the decimal point.
It helps to know that anything to the power of 0 equals 1, so the function f(ndigits) = 10^(-ndigits) starts at 1 and gets smaller as you increase ndigits: 10^-0 = 1, 10^-1 = 0.1, 10^-2 = 0.01, and so on. Each increase of ndigits by 1 moves the last retained decimal place one position further to the right; the place where the 1 sits is the last point of precision for round.
For the part where it says
For the built-in types supporting round(), values are rounded to the
closest multiple of 10 to the power minus ndigits; if two multiples
are equally close, rounding is done toward the even choice (so, for
example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2).
This has unexpected behavior in Python 3 and it will not work for all floats. Consider the example you gave: round(123.455, 2) yields the value 123.45. This is not the expected behavior, because the closest even multiple of 10^-2 is 123.46, not 123.45!
To understand this, you have to pay special attention to the note below this:
Note The behavior of round() for floats can be surprising: for
example, round(2.675, 2) gives 2.67 instead of the expected 2.68. This
is not a bug: it’s a result of the fact that most decimal fractions
can’t be represented exactly as a float.
And that is why certain floats will round to the "wrong" value, and there is really no easy workaround as far as I am aware (sadface). You could use fractions (i.e. two integers representing the numerator and the denominator) to represent values in a custom round function if you want behavior that is more predictable than what you get with floats.
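The decimal module makes both halves of this visible: Decimal(float) shows the value actually stored, and quantizing a Decimal built from the string gives the exact ties-to-even result (a sketch of one possible approach, not the only one):

from decimal import Decimal, ROUND_HALF_EVEN

# The float literal 123.455 is actually slightly below 123.455:
print(Decimal(123.455))   # 123.45499999999999... (slightly below)
print(round(123.455, 2))  # 123.45, correct for the value actually stored

# Starting from the decimal string sidesteps the binary approximation:
exact = Decimal('123.455')
print(exact.quantize(Decimal('0.01'), rounding=ROUND_HALF_EVEN))  # 123.46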
So I know how to print a floating point number with a certain number of decimal places.
My question is how to return it with a specified number of decimal places?
Thanks.
You could use the round() function
The docs about it:
round(x[, n])
x rounded to n digits, rounding half to even. If n is omitted, it defaults to 0.
In order to get two decimal places, multiply the number by 100, floor it, then divide by 100.
And note that the number you return will not really have only two decimal places, because the result of the division by 100 cannot be represented exactly in IEEE-754 floating-point arithmetic most of the time. It will only be the closest representable approximation of a number with only two decimal places.
If you really want numbers with a fixed decimal precision, you could use the decimal module. Those numbers have a user-alterable precision, and you can do your calculations directly on two-digit decimals.
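To make the options above concrete, here is a small comparison (a sketch; note that the floor-based approach truncates toward minus infinity rather than rounding):

import math
from decimal import Decimal

x = 3.14159
print(round(x, 2))                # 3.14 (round half to even)
print(math.floor(x * 100) / 100)  # 3.14 (multiply, floor, divide: truncation)
print(Decimal('3.14159').quantize(Decimal('0.01')))  # 3.14 (exact decimal arithmetic)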
Floating point numbers do not carry a fixed number of decimal places. Their physical representation on the computer depends on a) the language, b) the construct used (float, double, etc.), c) the compiler implementation, and d) the hardware.
Now, given that you have a representation of a floating point number (i.e. a real) within a particular language, is your question how to round it off or truncate it to a specific number of digits?
There is no need to do this within the return call, since you can always truncate or round afterwards. In fact, you would usually not want to truncate until actually printing, to preserve as much precision as possible. An exception might be if you wanted to ensure that results were consistent across different algorithms or hardware, e.g. financial trading software that needs to pass the same unit tests across different languages and platforms.