What is Python's threshold of representable negative numbers? What is the lowest number, below which Python treats any value as negative infinity?
There is no most negative integer, as Python integers have arbitrary precision. The smallest float greater than negative infinity (which, depending on your implementation, can be represented as -float('inf')) can be found in sys.float_info.
>>> import sys
>>> sys.float_info.max
1.7976931348623157e+308
The actual values depend on the implementation, which typically uses your C library's double type. Since floating-point values typically use a sign bit, the most negative finite value is simply the negation of the largest positive value, i.e. -sys.float_info.max. Also, because of how floating-point values are stored (separate mantissa and exponent), you can't simply subtract a small value from the "minimum" value and get back negative infinity. Subtracting 1, for example, simply returns the same value due to limited precision.
(In other words, the possible float values are a small subset of the actual real numbers, and an operation on two float values is not necessarily equivalent to the same operation on the "equivalent" reals.)
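As a minimal sketch of the behaviour described above, assuming CPython on a platform where float is an IEEE 754 double (the variable names are illustrative only):

```python
import math
import sys

# Most negative finite float: the negation of the largest positive float.
lowest = -sys.float_info.max

# Subtracting 1 is far smaller than the gap between neighbouring floats at
# this magnitude, so the result rounds back to the same value.
same = lowest - 1

# Doubling the value exceeds the exponent range and gives negative infinity.
overflowed = lowest * 2

print(lowest)
print(same == lowest)
print(overflowed == -math.inf)
print(lowest > -math.inf)
```

Only an operation large enough to overflow the exponent, such as the doubling here, moves the result to negative infinity.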
Is this just the smallest number that can be stored in 32 bits for example?
from math import inf
Floating-point numbers (real numbers as typically implemented on a computer) have special values reserved for positive and negative infinity. Rather than just being “the largest or smallest representable 32-bit numbers,” they act as though they really are infinite. For example, adding anything to positive infinity (other than negative infinity) gives positive infinity, and a similar rule holds for negative infinity.
For more on this, do a search for “IEEE-754 infinity.”
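The rules above can be checked directly. This is a small sketch that assumes IEEE 754 semantics for Python floats, which is typical but not guaranteed by the language; the variable names are made up for the example:

```python
import math

inf = math.inf

# Adding any finite value to an infinity leaves it unchanged.
plus_finite = inf + 1e308
minus_finite = -inf - 1e308

# The one exception: adding the two opposite infinities gives NaN.
opposite = inf + (-inf)

print(plus_finite == inf)
print(minus_finite == -inf)
print(math.isnan(opposite))
# Negative infinity compares below even the most negative finite float.
print(-inf < -1.7976931348623157e+308)
```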
When I run this simple code:
import numpy as np
a = np.float64([20269207])
b = np.float32(a)
the output turns out to be
a = array([20269207.])
b = array([20269208.], dtype=float32)
What causes the difference before and after this conversion? And under what conditions will the outputs differ?
It is impossible to store the value 20269207 in the float32 (IEEE 754) format.
Here is why:
It is possible to store the values 20269206 and 20269208; their representations in binary form are (see IEEE-754 Floating Point Converter):
01001011100110101010010001001011 for 20269206
01001011100110101010010001001100 for 20269208
Their binary forms differ by 1 in the last place, so there is no room for any float32 value between 20269206 and 20269208.
The value 20269207 lies exactly halfway between them. Under both of the IEEE 754 rounding rules “Round to nearest, ties to even” and “Round to nearest, ties away from zero”, it is rounded up to 20269208: that neighbour has the even significand (its last bit is 0), and it is also the one farther from zero.
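The rounding can be reproduced without NumPy. The sketch below uses the standard struct module to round a Python float to float32 and to read back its bit pattern; to_float32 and float32_bits are hypothetical helper names introduced for this example, not part of any library:

```python
import struct

def to_float32(value):
    """Round a Python float to the nearest float32 and return it as a float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def float32_bits(value):
    """Return the 32-bit IEEE 754 pattern of value as a string of 0s and 1s."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    return format(raw, "032b")

for n in (20269206, 20269207, 20269208):
    rounded = to_float32(float(n))
    print(n, int(rounded), float32_bits(rounded))
```

The two even inputs come back unchanged, while 20269207 comes back as 20269208 with the same bit pattern as its upper neighbour.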
Outputs for integer numbers will be different:
for odd numbers with absolute value greater than 16,777,216,
for almost all numbers with absolute value greater than 33,554,432.
Notes:
The first number is 2^24, the second one is 2^25.
"Almost all": there are "nice" numbers, such as powers of 2, which have exact representations even when they are very large.
Inspired by this answer, I wonder why numpy.nextafter gives different results for the smallest positive float number from numpy.finfo(float).tiny and sys.float_info.min:
import numpy, sys
nextafter = numpy.nextafter(0., 1.) # 5e-324
tiny = numpy.finfo(float).tiny # 2.2250738585072014e-308
info = sys.float_info.min # 2.2250738585072014e-308
According to the documentation:
numpy.nextafter
Return the next floating-point value after x1 towards x2, element-wise.
finfo(float).tiny
The smallest positive usable number. Type of tiny is an appropriate floating point type.
sys.float_info
A structseq holding information about the float type. It contains low level information about the precision and internal representation. Please study your system's float.h for more information.
Does someone have an explanation for this?
The documentation’s wording on this is bad; “usable” is colloquial and not defined. Apparently tiny is meant to be the smallest positive normal number.
nextafter is returning the actual next representable value after zero, which is subnormal.
Python does not rigidly specify its floating-point properties. Python implementations commonly inherit them from underlying hardware or software, and use of IEEE-754 formats (but not full conformance to IEEE-754 semantics) is common. In IEEE-754, numbers are represented with an implicit leading one bit in the significand [1] until the exponent reaches its minimum value for the format, after which the implicit bit is zero instead of one and smaller values are representable only by reducing the significand instead of reducing the exponent. These numbers with the implicit leading zero are the subnormal numbers. They serve to preserve some useful arithmetic properties, such as x-y == 0 if and only if x == y. (Without subnormal numbers, two very small numbers might be different, but their even smaller difference might not be representable because it was below the exponent limit, so computing x-y would round to zero, resulting in code like if (x != y) quotient = t / (x-y) getting a divide-by-zero error.)
Note
1 “Significand” is the term preferred by experts for the fraction portion of a floating-point representation. “Mantissa” is an old term for the fraction portion of a logarithm. Mantissas are logarithmic, while significands are linear.
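A short sketch of the distinction, assuming an IEEE 754 double with subnormal support (some hardware modes flush subnormals to zero, in which case the last two checks would not hold). The names x, y and difference are chosen only for illustration:

```python
import sys

smallest_normal = sys.float_info.min  # what tiny and sys.float_info.min report
smallest_subnormal = 5e-324           # what nextafter(0., 1.) reports

# Two distinct values whose difference lies below the normal range.
x = 1.5 * smallest_normal
y = smallest_normal
difference = x - y                    # half of smallest_normal, a subnormal

print(smallest_subnormal < smallest_normal)
print(x != y)
print(difference != 0)
print(0 < difference < smallest_normal)
```

Because the difference is representable as a subnormal, x != y and x - y != 0 agree, which is the property described above.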
If you take a number, take its square root, drop the decimal, and then raise it to the second power, the result should always be less than or equal to the original number.
This seems to hold true in python until you try it on 99999999999999975425 for some reason.
import math
def check(n):
    assert math.pow(math.floor(math.sqrt(n)), 2) <= n
check(99999999999999975424) # No exception.
check(99999999999999975425) # Throws AssertionError.
It looks like math.pow(math.floor(math.sqrt(99999999999999975425)), 2) returns 1e+20.
I assume this has something to do with the way we store values in python... something related to floating point arithmetic, but I can't reason about specifically how that affects this case.
The problem is not really about sqrt or pow; the problem is that you're using numbers larger than floating point can represent precisely. Standard IEEE 64-bit floating-point arithmetic can't represent every integer whose magnitude exceeds 2**53, because the significand holds only 53 bits (52 stored plus one implicit, with the sign kept in a separate bit).
Try just converting your inputs to float and back again:
>>> int(float(99999999999999975424))
99999999999999967232
>>> int(float(99999999999999975425))
99999999999999983616
As you can see, the representable value skipped by 16384. The first step in math.sqrt is converting to float (C double), and at that moment, your value increased by enough to ruin the end result.
Short version: float can't represent large integers precisely. Use decimal if you need greater precision. Or if you don't care about the fractional component, as of 3.8, you can use math.isqrt, which works entirely in integer space (so you never experience precision loss, only the round down loss you expect), giving you the guarantee you're looking for, that the result is "the greatest integer a such that a² ≤ n".
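To make the contrast concrete, here is a small sketch comparing the float-based computation with math.isqrt. It assumes Python 3.8 or later and IEEE 754 doubles; float_root and exact_root are names invented for the example:

```python
import math

n = 99999999999999975425

# math.sqrt first rounds n to the nearest float, then takes the root.
float_root = math.floor(math.sqrt(n))

# math.isqrt stays in integer arithmetic throughout.
exact_root = math.isqrt(n)

print(float_root, float_root**2 <= n)
print(exact_root, exact_root**2 <= n < (exact_root + 1)**2)
```

The float-based root overshoots by one, so its square exceeds n, whereas the isqrt result satisfies the guarantee quoted above.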
Contrary to what Evan Rose's (now-deleted) answer claims, this is not due to an epsilon value in the sqrt algorithm.
Most math module functions cast their inputs to float, and math.sqrt is one of them.
99999999999999975425 cannot be represented as a float. For this input, the cast produces a float with exact numeric value 99999999999999983616, which repr shows as 9.999999999999998e+19:
>>> float(99999999999999975425)
9.999999999999998e+19
>>> int(_)
99999999999999983616L
The closest float to the square root of this number is 10000000000.0, and that's what math.sqrt returns.
For an introduction to Python course, I'm looking at generating a random floating point number in Python, and I have seen a standard recommended code of
import random
lower = 5
upper = 10
range_width = upper - lower
x = random.random() * range_width + lower
for a random floating point from 5 up to but not including 10.
It seems to me that the same effect could be achieved by:
import random
x = random.randrange(5, 10) + random.random()
Since that would give an integer of 5, 6, 7, 8, or 9, and then tack on a decimal to it.
The question I have is would this second code still give a fully even probability distribution, or would it not keep the full randomness of the first version?
Yes: according to the documentation, random() draws from a uniform distribution.
random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator.
So both code examples should be fine. To shorten your code, you can equally do:
random.uniform(5, 10)
Note that uniform(a, b) is simply a + (b - a) * random() so the same as your first example.
The second example depends on the version of Python you're using.
Prior to 3.2, randrange() could produce slightly uneven distributions.
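The equivalence between uniform(a, b) and the hand-written formula can be checked by replaying the same seed. This sketch relies on the formula quoted above being what the installed Python actually uses, which holds for CPython but is an assumption for other implementations; the seed value is arbitrary:

```python
import random

lower, upper = 5, 10

rng = random.Random(2024)  # arbitrary seed, chosen only for repeatability
by_hand = lower + (upper - lower) * rng.random()

rng = random.Random(2024)  # same seed, so the same underlying random() draw
by_uniform = rng.uniform(lower, upper)

print(by_hand == by_uniform)
print(lower <= by_uniform <= upper)
```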
There is a difference. Your second method is theoretically superior, although in practice it only matters for large ranges. Indeed, both methods will give you a uniform distribution. But only the second method can return all values in the range that are representable as a floating point number.
Since your range is so small, there is no appreciable difference. But still there is a difference, which you can see by considering a larger range. If you take a random real number between 0 and 1, you get a floating-point representation with a given number of bits. Now suppose your range is, say, in the order of 2**32. By multiplying the original random number by this range, you lose 32 bits of precision in the result. Put differently, there will be gaps between the values that this method can return. The gaps are still there when you multiply by 4: You have lost the two least significant bits of the original random number.
The two methods can give different results, but you'll only notice the difference in fairly extreme situations (with very wide ranges). For instance, if you generate random numbers between 0 and 2/sys.float_info.epsilon (9007199254740992.0, or a little more than 9 quadrillion), you'll notice that the version using multiplication will never give you any floats with fractional values. If you increase the maximum bound to 4/sys.float_info.epsilon, you won't get any odd integers, only even ones. That's because the 64-bit floating point type Python uses doesn't have enough precision to represent all integers at the upper end of that range, and it's trying to maintain a uniform distribution (so it omits small odd integers and fractional values even though those can be represented in parts of the range).
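The gaps left by the multiplication method can be observed empirically. The sketch below assumes CPython's random(), which returns multiples of 2**-53, and IEEE 754 doubles; the seed and sample size are arbitrary:

```python
import random
import sys

rng = random.Random(0)  # arbitrary seed for repeatability

width = 2 / sys.float_info.epsilon  # 2**53
samples = [rng.random() * width for _ in range(10_000)]
fractional = [s for s in samples if not s.is_integer()]

wider = 4 / sys.float_info.epsilon  # 2**54
wide_samples = [rng.random() * wider for _ in range(10_000)]
odd = [s for s in wide_samples if s % 2 != 0]

print(len(fractional))  # samples with a fractional part
print(len(odd))         # samples that are not even integers
```

Both lists stay empty however many samples are drawn, including samples near zero where fractional and odd values are representable.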
The second version of the calculation will give extra precision to the smaller random numbers generated. For instance, if you're generating numbers between 0 and 2/sys.float_info.epsilon and the randrange call returned 0, you can use the full precision of the random call to add a fractional part to the number. On the other hand if the randrange returned the largest number in the range (2/sys.float_info.epsilon - 1), very little of the precision of the fraction would be used (the number will round to the nearest integer without any fractional part remaining).
Adding a fractional value also can't help you deal with ranges that are too large for every integer to be represented. If randrange returns only even numbers, adding a fraction usually won't make odd numbers appear (it can in some parts of the range, but not for others, and the distribution may be very uneven). Even for ranges where all integers can be represented, the odds of a specific floating point number appearing will not be entirely uniform, since the smaller numbers can be more precisely represented. Large but imprecise numbers will be more common than smaller but more precisely represented ones.
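The uneven precision of the randrange-plus-random approach can be sketched by fixing the integer part at the two ends of the range instead of drawing it at random. The seed and sample count are arbitrary, and IEEE 754 doubles are assumed:

```python
import random
import sys

rng = random.Random(1)  # arbitrary seed for repeatability
top = int(2 / sys.float_info.epsilon)  # 2**53

# Integer part at the bottom of the range: the fraction survives in full.
low_values = [0 + rng.random() for _ in range(1000)]

# Integer part at the top of the range: the sum rounds to a whole number.
high_values = [(top - 1) + rng.random() for _ in range(1000)]

low_with_fraction = sum(1 for v in low_values if not v.is_integer())
high_with_fraction = sum(1 for v in high_values if not v.is_integer())

print(low_with_fraction)
print(high_with_fraction)
```

At the top of the range neighbouring floats are a whole unit apart, so the added fraction is rounded away entirely.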