Discrepancy of floating representation - python

In this SO answer, a user provided this short function that returns the binary representation of a floating-point value:
import struct
import sys
def float_to_bin(f):
    """ Convert a float into a binary string. """
    if sys.version_info >= (3,):  # Python 3?
        ba = struct.pack('>d', f)
    else:
        ba = bytearray(struct.pack('>d', f))  # Convert str result.
    s = ''.join('{:08b}'.format(b) for b in ba)
    return s[:-1].lstrip('0') + s[0]  # Strip but one leading zero.
When I call this function with the value 7/3-4/3 (in Python 3.5), or with 1.0000000000000002, I get this binary representation:
11111111110000000000000000000000000000000000000000000000000000
Using this online tool, with the same values, I get this binary representation:
11111111110000000000000000000000000000000000000000000000000001
Why is there a difference between these two representations?
Why is float_to_bin returning the floating representation of 1.0 for 1.0000000000000002?
Is there some precision loss in float_to_bin induced somewhere (maybe when calling struct.pack)?

The logic in that function to "strip but one leading zero" is completely wrong, and is removing significant digits from the result.
The correct representation of the value is neither of the values mentioned in your question; it is:
0011111111110000000000000000000000000000000000000000000000000001
which can be retrieved by replacing the last line of that function with:
return s
or by using the simpler implementation:
def float_to_bin(f):
    [d] = struct.unpack(">Q", struct.pack(">d", f))
    return '{:064b}'.format(d)
Leading and trailing zeroes in the bit pattern of a floating-point value are significant, and cannot be removed without altering the value.
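To see why every bit matters, the 64-character string can be split into the three IEEE 754 binary64 fields. The following is only an illustrative sketch: split_fields is a hypothetical helper name, not part of the original answer, and it assumes the input is exactly 64 characters long.

```python
import struct

def float_to_bin(f):
    """Return the 64-bit big-endian IEEE 754 bit pattern of f as a string."""
    [d] = struct.unpack(">Q", struct.pack(">d", f))
    return '{:064b}'.format(d)

def split_fields(bits):
    """Split a 64-character bit string into (sign, exponent, fraction).

    Hypothetical helper: 1 sign bit, 11 exponent bits, 52 fraction bits.
    """
    return bits[0], bits[1:12], bits[12:]

sign, exponent, fraction = split_fields(float_to_bin(1.0000000000000002))
print(sign)           # 0
print(exponent)       # 01111111111
print(fraction[-4:])  # 0001
```

The two leading zeros belong to the sign bit and the top of the exponent field, so stripping them shifts every remaining bit into the wrong field.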

Related

How can I check the length of a long float? Python is truncating the length [duplicate]

I have some number 0.0000002345E^-60. I want to print the floating point value as it is.
What is the way to do it?
print %f truncates it to 6 digits, and %n.nf gives a fixed number of digits. How can I print it without truncation?
Like this?
>>> print('{:.100f}'.format(0.0000002345E-60))
0.0000000000000000000000000000000000000000000000000000000000000000002344999999999999860343602938602754
As you might notice from the output, it’s not really that clear how you want to do it. Due to the float representation you lose precision and can’t really represent the number precisely. As such it’s not really clear where you want the number to stop displaying.
Also note that the exponential representation is often used to more explicitly show the number of significant digits the number has.
You could also use decimal to not lose the precision due to binary float truncation:
>>> from decimal import Decimal
>>> d = Decimal('0.0000002345E-60')
>>> p = abs(d.as_tuple().exponent)
>>> print(('{:.%df}' % p).format(d))
0.0000000000000000000000000000000000000000000000000000000000000000002345
You can use decimal.Decimal:
>>> from decimal import Decimal
>>> str(Decimal(0.0000002345e-60))
'2.344999999999999860343602938602754401109865640550232148836753621775217856801120686600683401464097113374472942165409862789978024748827516129306833728589548440037314681709534891496105046826414763927459716796875E-67'
This is the actual value of the float created by the literal 0.0000002345e-60. It is the number representable as a Python float that is closest to the exact value 0.0000002345 * 10**-60.
float should generally be used for approximate calculations. If you want accurate results you should use something else, such as the Decimal type mentioned above.
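If the goal is plain (non-exponent) notation without the binary noise digits, one possible sketch is to go through repr() before building the Decimal. The function name fixed_notation is made up for illustration; it assumes that the shortest round-tripping digits from repr() are the digits you want to see.

```python
from decimal import Decimal

def fixed_notation(x):
    """Format float x in plain notation using the digits of repr(x)."""
    # repr(x) is the shortest string that round-trips to x; Decimal keeps
    # exactly those digits, and the 'f' format type expands the exponent.
    return format(Decimal(repr(x)), 'f')

print(fixed_notation(0.0000002345e-60))
```

Unlike Decimal(x), which exposes the exact binary value, Decimal(repr(x)) only carries the short decimal form, so the output ends in 2345 rather than in a long tail of digits.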
If I understand, you want to print a float?
The problem is, you cannot print a float.
You can only print a string representation of a float. So, in short, you cannot print a float, that is your answer.
If you accept that you need to print a string representation of a float, and your question is how to specify your preferred format for the string representations of your floats, then judging by the comments you have been very unclear in your question.
If you would like to print the string representations of your floats in exponent notation, then the format specification language allows this:
{:g} or {:G} (depending on whether or not you want the E in the output to be capitalized). This gets around the default precision for the e and E types, which leads to unwanted trailing 0s in the part before the exponent symbol.
Assuming your value is my_float, "{:G}".format(my_float) would print the output much the way that the Python interpreter prints it, although g and G round to 6 significant digits by default. You could probably just print the number without any formatting and get essentially the same result.
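As a rough illustration of the difference between the e and g presentation types (the variable name my_float is only an example):

```python
my_float = 0.0000002345e-60

# 'e' always shows 6 digits after the point, so trailing zeros appear.
print("{:e}".format(my_float))  # 2.345000e-67
# 'g' uses up to 6 significant digits and drops trailing zeros.
print("{:g}".format(my_float))  # 2.345e-67
print("{:G}".format(my_float))  # 2.345E-67
# repr() gives the shortest string that round-trips to the same float.
print(repr(my_float))           # 2.345e-67
```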
If your goal is to print the string representation of the float with its current precision, in non-exponentiated form, User poke describes a good way to do this by casting the float to a Decimal object.
If, for some reason, you do not want to do this, you can do something like what is mentioned in this answer. However, you should set 'max_digits' to sys.float_info.max_10_exp, instead of the 14 used in the answer. This requires you to import sys at some point prior in the code.
A full example of this would be:
import math
import sys
def precision_and_scale(x):
    max_digits = sys.float_info.max_10_exp
    int_part = int(abs(x))
    magnitude = 1 if int_part == 0 else int(math.log10(int_part)) + 1
    if magnitude >= max_digits:
        return (magnitude, 0)
    frac_part = abs(x) - int_part
    multiplier = 10 ** (max_digits - magnitude)
    frac_digits = multiplier + int(multiplier * frac_part + 0.5)
    while frac_digits % 10 == 0:
        frac_digits //= 10  # integer division keeps frac_digits an int
    scale = int(math.log10(frac_digits))
    return (magnitude + scale, scale)
f = 0.0000002345E-60
p, s = precision_and_scale(f)
print("{:.{p}f}".format(f, p=p))
But I think the method involving casting to Decimal is probably better, overall.

Converting string to binary then xor binary

So I am trying to convert a string to binary then xor the binary by using the following methods
def string_to_binary(s):
    return ' '.join(map(bin,bytearray(s,encoding='utf-8')))
def xor_bin(a,b):
    return int(a,2) ^ int(b,2)
When I try and run the xor_bin function I get the following error:
Exception has occurred: exceptions.ValueError
invalid literal for int() with base 2: '0b1100010 0b1111001 0b1100101 0b1100101 0b1100101'
I can't see what's wrong here.
bin is bad here; it doesn't pad out to eight digits (so you'll lose data alignment whenever the high bit is a 0 and misinterpret all bits to the left of that loss as being lower magnitude than they should be), and it adds a 0b prefix that you don't want. str.format can fix both issues, by zero padding and omitting the 0b prefix (I also removed the space in the joiner string, since you don't want spaces in the result):
def string_to_binary(s):
    return ''.join(map('{:08b}'.format, bytearray(s, encoding='utf-8')))
With that, string_to_binary('byeee') gets you '0110001001111001011001010110010101100101' which is what you want, as opposed to '0b1100010 0b1111001 0b1100101 0b1100101 0b1100101' which is obviously not a (single) valid base-2 integer.
Your question is unclear because you don't show how the two functions you defined were being used when the error occurred, so this answer is a guess.
You can convert a binary string representation of an integer into a Python int (which is stored internally as a binary value) by simply passing it to the int() function, as you're doing in the xor_bin() function. Once you have two int values, you can xor them "in binary" by simply using the ^ operator, which again, you seem to know.
This means that xoring the binary string representations of two integers and converting the result back into a binary string representation can be done using one of your functions just as it is. Here's what I mean:
def xor_bin(a, b):
    return int(a, 2) ^ int(b, 2)
s1 = '0b11000101111001110010111001011100101'
s2 = '0b00000000000000000000000000001111111'
# ---------------------------------------
# '0b11000101111001110010111001010011010' expected result of xoring them
result = xor_bin(s1, s2)
print(bin(result))  # -> 0b11000101111001110010111001010011010
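Putting the two answers together, a minimal end-to-end sketch might look like the following. The helper xor_strings_as_binary is a hypothetical name introduced here, and it assumes both inputs are non-empty strings; padding the result to the longer input's width is one possible design choice, not something the question specifies.

```python
def string_to_binary(s):
    # Zero-pad every byte to 8 bits and omit the '0b' prefix.
    return ''.join('{:08b}'.format(b) for b in bytearray(s, encoding='utf-8'))

def xor_bin(a, b):
    return int(a, 2) ^ int(b, 2)

def xor_strings_as_binary(s1, s2):
    """Hypothetical helper: xor two strings and return a padded bit string."""
    a, b = string_to_binary(s1), string_to_binary(s2)
    width = max(len(a), len(b))
    # Re-pad so leading zero bits of the result are not lost.
    return '{:0{}b}'.format(xor_bin(a, b), width)

print(xor_strings_as_binary('a', 'b'))  # 00000011
```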

How to find an original text representation for lower precision float values in Python?

I've run into an issue displaying float values in Python, loaded from an external data source (they're 32-bit floats, but this would apply to lower-precision floats too).
(In case it's important: these values were typed in by humans in C/C++, so unlike arbitrary calculated values, deviations from round numbers are likely not intended, though they can't be ignored since the values may be constants such as M_PI or multiplied by constants.)
Since CPython uses higher precision (typically 64-bit), a value entered as a lower-precision float may have a repr() that shows the precision loss from being a 32-bit float, where the same value entered as a 64-bit float would show a round value.
eg:
# Examples of 32bit float's displayed as 64bit floats in CPython.
0.0005 -> 0.0005000000237487257
0.025 -> 0.02500000037252903
0.04 -> 0.03999999910593033
0.05 -> 0.05000000074505806
0.3 -> 0.30000001192092896
0.98 -> 0.9800000190734863
1.2 -> 1.2000000476837158
4096.3 -> 4096.2998046875
Simply rounding the values to some arbitrary precision works in most cases, but may be incorrect since it could lose significant digits with values such as 0.00000001.
An example of this can be shown by printing a float converted to a 32bit float.
def as_float_32(f):
    from struct import pack, unpack
    return unpack("f", pack("f", f))[0]
print(0.025)  # --> 0.025
print(as_float_32(0.025))  # --> 0.02500000037252903
So my question is:
What's the most efficient and straightforward way to get the original representation for a 32-bit float, without making assumptions or losing precision?
Put differently, I have a data source containing 32-bit floats. These were originally entered by a human as round values (examples above), but representing them as higher-precision values exposes that the 32-bit float is only an approximation of the original value.
I would like to reverse this process, and get the round number back from the 32-bit float data, but without losing the precision which a 32-bit float gives us (which is why simply rounding isn't a good option).
Examples of why you might want to do this:
Generating API documentation where Python extracts values from a C-API that uses single precision floats internally.
When people need to read/review values of data generated which happens to be provided as single precision floats.
In both cases it's important not to lose significant precision, or show values which can't be easily read by humans at a glance.
Update: I've made a solution which I'll include as an answer (for reference and to show it's possible), but I highly doubt it's an efficient or elegant solution.
Of course you can't know which notation was used when the value was entered (0.1f, 0.1F or 1e-1f); that's not the purpose of this question.
You're looking to solve essentially the same problem that Python's repr solves, namely, finding the shortest decimal string that rounds to a given float. Except that in your case, the float isn't an IEEE 754 binary64 ("double precision") float, but an IEEE 754 binary32 ("single precision") float.
Just for the record, I should of course point out that retrieving the original string representation is impossible, since for example the strings '0.10', '0.1', '1e-1' and '10e-2' all get converted to the same float (or in this case float32). But under suitable conditions we can still hope to produce a string that has the same decimal value as the original string, and that's what I'll do below.
The approach you outline in your answer more-or-less works, but it can be streamlined a bit.
First, some bounds: when it comes to decimal representations of single-precision floats, there are two magic numbers: 6 and 9. The significance of 6 is that any (not-too-large, not-too-small) decimal numeric string with 6 or fewer significant decimal digits will round-trip correctly through a single-precision IEEE 754 float: that is, converting that string to the nearest float32, and then converting that value back to the nearest 6-digit decimal string, will produce a string with the same value as the original. For example:
>>> x = "634278e13"
>>> y = float(np.float32(x))
>>> y
6.342780214942106e+18
>>> "{:.6g}".format(y)
'6.34278e+18'
(Here, by "not-too-large, not-too-small" I just mean that the underflow and overflow ranges of float32 should be avoided. The property above applies for all normal values.)
This means that for your problem, if the original string had 6 or fewer digits, we can recover it by simply formatting the value to 6 significant digits. So if you only care about recovering strings that had 6 or fewer significant decimal digits in the first place, you can stop reading here: a simple '{:.6g}'.format(x) is enough. If you want to solve the problem more generally, read on.
For roundtripping in the other direction, we have the opposite property: given any single-precision float x, converting that float to a 9-digit decimal string (rounding to nearest, as always), and then converting that string back to a single-precision float, will always exactly recover the value of that float.
>>> x = np.float32(3.14159265358979)
>>> x
3.1415927
>>> np.float32('{:.9g}'.format(x)) == x
True
The relevance to your problem is there's always at least one 9-digit string that rounds to x, so we never have to look beyond 9 digits.
Now we can follow the same approach that you used in your answer: first try for a 6-digit string, then a 7-digit, then an 8-digit. If none of those work, the 9-digit string surely will, by the above. Here's some code.
import numpy as np
def original_string(x):
    for places in range(6, 10):  # try 6, 7, 8, 9
        s = '{:.{}g}'.format(x, places)
        y = np.float32(s)
        if x == y:
            return s
    # If x was genuinely a float32, we should never get here.
    raise RuntimeError("We should never get here")
Example outputs:
>>> original_string(0.02500000037252903)
'0.025'
>>> original_string(0.03999999910593033)
'0.04'
>>> original_string(0.05000000074505806)
'0.05'
>>> original_string(0.30000001192092896)
'0.3'
>>> original_string(0.9800000190734863)
'0.98'
However, the above comes with several caveats.
First, for the key properties we're using to be true, we have to assume that np.float32 always does correct rounding. That may or may not be the case, depending on the operating system. (Even in cases where the relevant operating system calls claim to be correctly rounded, there may still be corner cases where that claim fails to be true.) In practice, it's likely that np.float32 is close enough to correctly rounded not to cause issues, but for complete confidence you'd want to know that it was correctly rounded.
Second, the above won't work for values in the subnormal range (so for float32, anything smaller than 2**-126). In the subnormal range, it's no longer true that a 6-digit decimal numeric string will roundtrip correctly through a single-precision float. If you care about subnormals, you'd need to do something more sophisticated there.
Third, there's a really subtle (and interesting!) error in the above that almost doesn't matter at all. The string formatting we're using always rounds x to the nearest places-digit decimal string to the true value of x. However, we want to know simply whether there's any places-digit decimal string that rounds back to x. We're implicitly assuming the (seemingly obvious) fact that if there's any places-digit decimal string that rounds to x, then the closest places-digit decimal string rounds to x. And that's almost true: it follows from the property that the interval of all real numbers that rounds to x is symmetric around x. But that symmetry property fails in one particular case, namely when x is a power of 2.
So when x is an exact power of 2, it's possible (but fairly unlikely) that (for example) the closest 8-digit decimal string to x doesn't round to x, but nevertheless there is an 8-digit decimal string that does round to x. You can do an exhaustive search for cases where this happens within the range of a float32, and it turns out that there are exactly three values of x for which this occurs, namely x = 2**-96, x = 2**87 and x = 2**90. For 7 digits, there are no such values. (And for 6 and 9 digits, this can never happen.) Let's take a closer look at the case x = 2**87:
>>> x = 2.0**87
>>> x
1.5474250491067253e+26
Let's take the closest 8-digit decimal value to x:
>>> s = '{:.8g}'.format(x)
>>> s
'1.547425e+26'
It turns out that this value doesn't round back to x:
>>> np.float32(s) == x
False
But the next 8-digit decimal string up from it does:
>>> np.float32('1.5474251e+26') == x
True
Similarly, here's the case x = 2**-96:
>>> x = 2**-96.
>>> x
1.262177448353619e-29
>>> s = '{:.8g}'.format(x)
>>> s
'1.2621774e-29'
>>> np.float32(s) == x
False
>>> np.float32('1.2621775e-29') == x
True
So ignoring subnormals and overflows, out of all 2 billion or so positive normal single-precision values, there are precisely three values x for which the above code doesn't work. (Note: I originally thought there was just one; thanks to @RickRegan for pointing out the error in comments.) So here's our (slightly tongue-in-cheek) fixed code:
def original_string(x):
    """
    Given a single-precision positive normal value x,
    return the shortest decimal numeric string which produces x.
    """
    # Deal with the three awkward cases.
    if x == 2**-96.:
        return '1.2621775e-29'
    elif x == 2**87:
        return '1.5474251e+26'
    elif x == 2**90:
        return '1.2379401e+27'
    for places in range(6, 10):  # try 6, 7, 8, 9
        s = '{:.{}g}'.format(x, places)
        y = np.float32(s)
        if x == y:
            return s
    # If x was genuinely a float32, we should never get here.
    raise RuntimeError("We should never get here")
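If NumPy is not available, the same search can be sketched with only the standard library. This is a minimal sketch, not the answer's own code: the names to_float32 and shortest_float32_string are made up for illustration, the struct round trip is assumed to round correctly to the nearest binary32 value, and going through a Python float first means the string is rounded twice (to binary64, then to binary32), which could in rare cases differ from a direct conversion. It also omits the three power-of-two special cases handled above.

```python
import struct

def to_float32(value):
    """Round a Python float to the nearest binary32 value (assumed correct)."""
    return struct.unpack('<f', struct.pack('<f', value))[0]

def shortest_float32_string(x):
    """Hypothetical stdlib-only variant of the 6-to-9 digit search."""
    for places in range(6, 10):  # try 6, 7, 8, 9
        s = '{:.{}g}'.format(x, places)
        if to_float32(float(s)) == x:
            return s
    # Only reachable if x was not exactly representable as a float32.
    raise ValueError("x is not a binary32 value")

print(shortest_float32_string(to_float32(0.025)))  # 0.025
```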
I think Decimal.quantize() (to round to a given number of decimal digits) and .normalize() (to strip trailing 0's) is what you need.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from decimal import Decimal
data = (
    0.02500000037252903,
    0.03999999910593033,
    0.05000000074505806,
    0.30000001192092896,
    0.9800000190734863,
)
for f in data:
    dec = Decimal(f).quantize(Decimal('1.0000000')).normalize()
    print("Original %s -> %s" % (f, dec))
Result:
Original 0.0250000003725 -> 0.025
Original 0.0399999991059 -> 0.04
Original 0.0500000007451 -> 0.05
Original 0.300000011921 -> 0.3
Original 0.980000019073 -> 0.98
Here's a solution I've come up with which works (perfectly as far as I can tell) but isn't efficient.
It works by rounding at increasing decimal places, and returning the string when the rounded and non-rounded inputs match (when compared as values converted to lower precision).
Code:
def round_float_32(f):
    from struct import pack, unpack
    return unpack("f", pack("f", f))[0]
def as_float_low_precision_repr(f, round_fn):
    f_round = round_fn(f)
    f_str = repr(f)
    f_str_frac = f_str.partition(".")[2]
    if not f_str_frac:
        return f_str
    for i in range(1, len(f_str_frac)):
        f_test = round(f, i)
        f_test_round = round_fn(f_test)
        if f_test_round == f_round:
            return "%.*f" % (i, f_test)
    return f_str
# ----
data = (
    0.02500000037252903,
    0.03999999910593033,
    0.05000000074505806,
    0.30000001192092896,
    0.9800000190734863,
    1.2000000476837158,
    4096.2998046875,
)
for f in data:
    f_as_float_32 = as_float_low_precision_repr(f, round_float_32)
    print("%s -> %s" % (f, f_as_float_32))
Outputs:
0.02500000037252903 -> 0.025
0.03999999910593033 -> 0.04
0.05000000074505806 -> 0.05
0.30000001192092896 -> 0.3
0.9800000190734863 -> 0.98
1.2000000476837158 -> 1.2
4096.2998046875 -> 4096.3
If you have at least NumPy 1.14.0, you can just use repr(numpy.float32(your_value)). Quoting the release notes:
Float printing now uses “dragon4” algorithm for shortest decimal representation
The str and repr of floating-point values (16, 32, 64 and 128 bit) are now printed to give the shortest decimal representation which uniquely identifies the value from others of the same type. Previously this was only true for float64 values. The remaining float types will now often be shorter than in numpy 1.13.
Here's a demo running against a few of your example values:
>>> repr(numpy.float32(0.0005000000237487257))
'0.0005'
>>> repr(numpy.float32(0.02500000037252903))
'0.025'
>>> repr(numpy.float32(0.03999999910593033))
'0.04'
Probably what you are looking for is decimal:
Decimal “is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle – computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school.”
At least in python3 you can use .as_integer_ratio. That's not exactly a string but the floating point definition as such is not really well suited for giving an exact representation in "finite" strings.
a = 0.1
a.as_integer_ratio()
(3602879701896397, 36028797018963968)
So by saving these two numbers you'll never lose precision because these two exactly represent the saved floating point number. (Just divide the first by the second to get the value).
As an example using numpy dtypes (very similar to c dtypes):
# A value in python floating point precision
a = 0.1
# The value as ratio of integers
b = a.as_integer_ratio()
import numpy as np
# Force the result to have some precision:
res = np.array([0], dtype=np.float16)
np.true_divide(b[0], b[1], res)
print(res)
# Compare that to the wanted result when inputting 0.1
np.true_divide(1, 10, res)
print(res)
# Other precisions:
res = np.array([0], dtype=np.float32)
np.true_divide(b[0], b[1], res)
print(res)
res = np.array([0], dtype=np.float64)
np.true_divide(b[0], b[1], res)
print(res)
The result of all these calculations is:
[ 0.09997559] # Float16 with integer-ratio
[ 0.09997559] # Float16 reference
[ 0.1] # Float32
[ 0.1] # Float64

How to convert a string representing a binary fraction to a number in Python

Let us suppose that we have a string representing a binary fraction such as:
".1"
As a decimal number this is 0.5. Is there a standard way in Python to go from such strings to a number type (whether it is binary or decimal is not strictly important).
For an integer, the solution is straightforward:
>>> int("101", 2)
5
int() takes an optional second argument to provide the base, but float() does not.
I am looking for something functionally equivalent (I think) to this:
def frac_bin_str_to_float(num):
    """Assuming num to be a string representing
    the fractional part of a binary number with
    no integer part, return num as a float."""
    result = 0
    ex = 2.0
    for c in num:
        if c == '1':
            result += 1/ex
        ex *= 2
    return result
I think that does what I want, although I may well have missed some edge cases.
Is there a built-in or standard method of doing this in Python?
The following is a shorter way to express the same algorithm:
def parse_bin(s):
    return int(s[1:], 2) / 2.**(len(s) - 1)
It assumes that the string starts with the dot. If you want something more general, the following will handle both the integer and the fractional parts:
def parse_bin(s):
    t = s.split('.')
    return int(t[0], 2) + int(t[1], 2) / 2.**len(t[1])
For example:
In [56]: parse_bin('10.11')
Out[56]: 2.75
It is reasonable to suppress the point instead of splitting on it, as follows. This bin2float function (unlike parse_bin in previous answer) correctly deals with inputs without points (except for returning an integer instead of a float in that case).
For example, the invocations bin2float('101101'), bin2float('.11101'), and bin2float('101101.11101') return 45, 0.90625, and 45.90625 respectively.
def bin2float(b):
    s, f = b.find('.')+1, int(b.replace('.',''), 2)
    return f/2.**(len(b)-s) if s else f
You could actually generalize James's code to convert from any number system if you replace the hard-coded '2' with that base.
def str2float(s, base=10):
    dot, f = s.find('.') + 1, int(s.replace('.', ''), base)
    return f / float(base)**(len(s) - dot) if dot else f
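If an exact result matters more than getting a float, the same "drop the point, then scale" idea can be sketched with fractions.Fraction from the standard library. The function name parse_bin_exact is hypothetical and not from any of the answers above; it assumes the input contains only binary digits and at most one point.

```python
from fractions import Fraction

def parse_bin_exact(s):
    """Parse a binary string such as '10.11' or '.1' into an exact Fraction."""
    int_part, _, frac_part = s.partition('.')
    # Fall back to '0' so an input of just '.' does not crash int().
    digits = (int_part + frac_part) or '0'
    return Fraction(int(digits, 2), 2 ** len(frac_part))

print(parse_bin_exact('10.11'))         # 11/4
print(float(parse_bin_exact('10.11')))  # 2.75
```

Because every binary fraction has a power-of-two denominator, the Fraction is exact, and converting it to float only rounds if the value needs more than 53 significant bits.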
You can use the Binary fractions package. With this package you can convert binary-fraction strings into floats and vice-versa.
Example:
>>> from binary_fractions import Binary
>>> float(Binary("0.1"))
0.5
>>> str(Binary(0.5))
'0b0.1'
It has many more helper functions to manipulate binary strings such as: shift, add, fill, to_exponential, invert...
PS: Shameless plug, I'm the author of this package.

Python: a could be rounded to b in the general case

As a part of some unit testing code that I'm writing, I wrote the following function. Its purpose is to determine whether 'a' could be rounded to 'b', regardless of how accurate 'a' or 'b' are.
def couldRoundTo(a,b):
    """Can you round a to some number of digits, such that it equals b?"""
    roundEnd = len(str(b))
    if a == b:
        return True
    for x in range(0,roundEnd):
        if round(a,x) == b:
            return True
    return False
Here's some output from the function:
>>> couldRoundTo(3.934567892987, 3.9)
True
>>> couldRoundTo(3.934567892987, 3.3)
False
>>> couldRoundTo(3.934567892987, 3.93)
True
>>> couldRoundTo(3.934567892987, 3.94)
False
As far as I can tell, it works. However, I'm scared of relying on it considering I don't have a perfect grasp of issues concerning floating point accuracy. Could someone tell me if this is an appropriate way to implement this function? If not, how could I improve it?
Could someone tell me if this is an appropriate way to implement this function?
It depends. The given function will behave surprisingly if b isn't precisely equal to a value that would normally be obtained directly from decimal-to-binary-float conversion.
For example:
>>> print(0.1, 0.2/2, 0.3/3)
0.1 0.1 0.1
>>> couldRoundTo(0.123, 0.1)
True
>>> couldRoundTo(0.123, 0.2/2)
True
>>> couldRoundTo(0.123, 0.3/3)
False
This fails because the calculation of 0.3 / 3 results in a slightly different representation than 0.1 and 0.2 / 2 (and round(0.123, 1)).
If not, how could I improve it?
Rule of thumb: if your calculation specifically involves decimal digits in any way, just use Decimal, to avoid all the lossy base-2 round-tripping.
In particular, Decimal includes a helper called quantize that makes this problem trivially easy:
from decimal import Decimal
def roundable(a, b):
    a = Decimal(str(a))
    b = Decimal(str(b))
    return a.quantize(b) == b
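As a quick check, here is the same helper run against the inputs from the question. This is only a usage sketch; note that quantize follows the decimal context's rounding mode (ROUND_HALF_EVEN by default), which is an assumption worth keeping in mind if your tests depend on how ties are rounded.

```python
from decimal import Decimal

def roundable(a, b):
    # str() gives the short decimal form of each float, so the comparison
    # happens on decimal digits rather than on binary approximations.
    a = Decimal(str(a))
    b = Decimal(str(b))
    return a.quantize(b) == b

for target in (3.9, 3.3, 3.93, 3.94):
    print(target, roundable(3.934567892987, target))
# 3.9 True
# 3.3 False
# 3.93 True
# 3.94 False
```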
One way to do it:
def could_round_to(a, b):
    (x, y) = map(len, str(b).split('.'))
    round_format = "%" + "%d.%df"%(x, y)
    return round_format%a == str(b)
First, we take the number of digits before and after the decimal in x and y. Then, we construct a format such as %x.yf. Then, we supply a to the format string.
>>> "%2.2f"%123.1234
'123.12'
>>> "%2.2f"%123.1264
'123.13'
>>> "%3.2f"%000.001
'0.00'
Now, all that's left is comparing the strings.
The only point that I'm afraid of is the conversion from strings to floating points when interpreting floating-point literals (as in http://docs.python.org/reference/lexical_analysis.html#floating-point-literals). I don't know if there is any guarantee that a floating-point literal will evaluate to the floating-point number that is closest to the given string. This mentioned section is the place in the specification where I would expect such a guarantee.
For example, Java is much more specific about what to expect from a string literal. From the documentation of Double.valueOf(String):
[...] [the argument] is regarded as representing an exact decimal value in the usual "computerized scientific notation" or as an exact hexadecimal value; this exact numerical value is then conceptually converted to an "infinitely precise" binary value that is then rounded to type double by the usual round-to-nearest rule of IEEE 754 floating-point arithmetic [...]
Unless you can find such a guarantee somewhere in the Python documentation, you may just be getting lucky, because some earlier floating-point libraries (on which Python might rely) convert a string only to a nearby floating-point number, not to the closest available one.
Unfortunately, it seems to me that neither round, nor float, nor the specification for floating-point literals gives you any usable guarantee.
If your purpose is to test whether the round function will round to the target, then you are correct. Otherwise (what else is the purpose?), if you are in doubt, you should use the decimal module.
