I'm working with some in-place code dealing with formatting user-stored floating point numbers for human display.
The current implementation does this:
"{0:.24f}".format(some_floating_point).rstrip('0')
which makes sense and works just fine for the most part. But when faced with a value such as 0.0003, things don't go as well.
>>> "{0:.24f}".format(0.0003).rstrip('0')
'0.000299999999999999973719'
Some further investigation indicates that Python seems to change the underlying representation based on the number of digits requested?
>>> "{0:.15f}".format(0.0003)
'0.000300000000000'
>>> "{0:.20f}".format(0.0003)
'0.00029999999999999997'
My assumption is that this is a single- vs. double-precision issue.
The user enters these values where they are stored in the database as a double, and when the form is rendered again later the same value is prepopulated in the field. Therefore I need a 1:1 mapping of these representations.
My question is therefore: What is an elegant, and more importantly safe way to deal with this behavior? My best efforts so far have involved log10 and are less than ideal to put it nicely.
EDIT: As Prune points out the value is not actually changing, but rather the rounding done by format will carry over causing a set of 9s to become 0s (d'oh). The behavior makes sense then, but the solution is still escaping me.
You are receiving the number as stored. 0.0003 cannot be stored exactly as a binary fraction. To illustrate:
>>> 0.00029999999999999997 == 0.0003
True
Print formatting rounds the number at the least significant digit. Double precision merely pushes the problem farther to the right. To fully "solve" the problem to base-10 eyes, you need to switch to decimal arithmetic, or perhaps build your own string handler for numbers that are sufficiently close to a simpler value (a suspicious string of 9's or 0's in the fractional part).
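For instance, a minimal sketch of the decimal route, assuming the user's original entry is still available as text:

from decimal import Decimal

# keep the user's entry as text/Decimal instead of a binary float
d = Decimal("0.0003")
print(d)                   # 0.0003 -- exact, no trailing 9s
print(str(d) == "0.0003")  # True: the representation round-trips 1:1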
Here's the start of a function for you. I tested it with 0.0004, which stores as a hair more than 0.0004; the 9's case is left as an exercise :-).

def str_round(x):
    size = 6
    nines = '9' * size
    zeros = '0' * size
    s = "{0:.24f}".format(x).rstrip('0')  # avoid shadowing the built-in str
    s_len = len(s)
    print(s, s_len)
    if nines in s:
        # replace leading digit with one more
        pos = s.index(nines)
        # ADD CODE HERE
        # Turn the leading portion into an integer;
        # increment and convert back to a zero-leading string.
        # Fill out the rest with zeros.
        pass
    elif zeros in s:
        # Change all trailing digits to 0
        pos = s.index(zeros)
        s = s[:pos] + '0' * (s_len - pos)
    return s

print(str_round(0.0004))
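A further note: in Python 3, str() and repr() already produce the shortest string that reads back as the same float, which may be enough for the 1:1 display/storage mapping the question asks for:

# Python 3: str() gives the shortest round-tripping representation
print(str(0.0003))                    # 0.0003
print(float(str(0.0003)) == 0.0003)   # True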
I'm dividing a very large number into a much smaller number. Both are of type decimal.Decimal().
The result is coming out in scientific notation. How do I stop this? I need to print the number in full.
>>> decimal.getcontext().prec
50
>>> val
Decimal('1000000000000000000000000')
>>> units
Decimal('1500000000')
>>> units / val
Decimal('1.5E-15')
The precision is kept internally - you just have to explicitly call for the number of decimal places you want at the point you are exporting your decimal value to a string.
So, if you are going to print, or to insert the value into an HTML template, the first step is to use the string format method (or f-strings) with enough precision that the whole number is included:
In [29]: print(f"{units/val:.50f}")
0.00000000000000150000000000000000000000000000000000
Unfortunately, the string-format mini-language has no way to eliminate the redundant zeros on the right-hand side by itself. (The left side can be padded with "0", " ", or custom characters, whatever one wants, but all the precision after the decimal separator is rendered as trailing 0s.)
Since finding the least significant non-zero digit is complicated - otherwise we could use a parameter extracted from the number instead of the "50" for precision in the format expression - the simpler thing is to remove those zeros after formatting takes place, with the string .rstrip method:
In [30]: print(f"{units/val:.50f}".rstrip("0"))
0.0000000000000015
In short, this seems to be the only way to go: at every interface point where the number leaves the core of your program to be represented as a string, you format it with an excess of precision in fixed-point notation and strip out the trailing zeros:
return template.render(number=f"{number:.50f}".rstrip("0"), ...)
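One caveat with the rstrip("0") approach: when nothing but zeros follows the decimal point, it leaves a dangling separator, so chaining a second .rstrip(".") is a safe addition (a small illustration):

from decimal import Decimal

whole = Decimal(2)
print(f"{whole:.50f}".rstrip("0"))               # '2.' -- dangling point
print(f"{whole:.50f}".rstrip("0").rstrip("."))   # '2'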
Render the decimal into a formatted string with a float type-indicator {:,f}, and it will display just the right number of digits to express the whole number, regardless of whether it is a very large integer or a very large decimal.
>>> val
Decimal('1000000000000000000000000')
>>> units
Decimal('1500000000')
>>> "{:,f}".format(units / val)
'0.0000000000000015'
# very large decimal integer, formatted as float-type string, appears without any decimal places at all when it has none! Nice!
>>> "{:,f}".format(units * val)
'1,500,000,000,000,000,000,000,000,000,000,000'
You don't need to specify the decimal places. It will display only as many as required to express the number, omitting that trail of useless zeros that appear after the final decimal digit when the decimal is shorter than a fixed format width. And you don't get any decimal places if the number has no fraction part.
Very large numbers are therefore accommodated without having to second-guess how large they will be. And you don't have to second-guess whether they will have decimal places either.
Any specified thousands separator {:,f} will likewise only have effect if it turns out that the number is a large integer instead of a long decimal.
Proviso
Decimal(), however, has this idea of significant places, by which it will add trailing zeros if it thinks you want them.
The idea is that it intelligently handles situations where you might be dealing with currency digits such as £ 10.15. To use the example from the documentation:
>>> decimal.Decimal('1.30') + decimal.Decimal('1.20')
Decimal('2.50')
It makes no difference if you format the Decimal() - you still get the trailing zero if the Decimal() deems it to be significant:
>>> "{:,f}".format( decimal.Decimal('1.30') + decimal.Decimal('1.20'))
'2.50'
The same thing happens (perhaps for some good reason?) when you treat thousands and fractions together:
>>> decimal.Decimal(2500) * decimal.Decimal('0.001')
Decimal('2.500')
Remove significant trailing zeros with the Decimal().normalize() method:
>>> (2500 * decimal.Decimal('0.001')).normalize()
Decimal('2.5')
Unfortunately, the printing instruction of a piece of code was written without an end-of-line character, and as a result one in every 26 numbers consists of two numbers joined together. The following code shows an example of such behaviour; at the end there is a fragment of the original database.
import numpy as np

for _ in range(2):
    A = np.random.rand() + np.random.randint(0, 100)
    B = np.random.rand() + np.random.randint(0, 100)
    C = np.random.rand() + np.random.randint(0, 100)
    D = np.random.rand() + np.random.randint(0, 100)
    with open('file.txt', 'a') as f:
        f.write(f'{A},{B},{C},{D}')
And thus the output example file looks very similar to what follows:
40.63358599010553,53.86722741700399,21.800795158561158,13.95828176311762557.217562728494684,2.626308403991772,4.840593988487278,32.401778122213486
With the issue being that there are two numbers 'printed together', in the example they were as follows:
13.95828176311762557.217562728494684
So you cannot know if they should be
13.958281763117625, 57.217562728494684
or
13.9582817631176255, 7.217562728494684
Please understand that in this case there are only two options, but the problem I want to address considers 'unbounded numbers' of Python's float type (where 'unbounded' means in a range we don't know, e.g. +-1E4).
Can the original numbers be reconstructed based on "some" python internal behavior I'm missing?
Actual data with periodicity 27 (i.e. the 26th number consists of 2 joined together):
0.9221878978925224, 0.9331311610066017,0.8600582424784715,0.8754578588852764,0.8738648974725404, 0.8897837559800233,0.6773502027673041,0.736325377603136,0.7956454122424133, 0.8083168444596229,0.7089031184165164, 0.7475306242508357,0.9702361286847581, 0.9900689384633811,0.7453878225174624, 0.7749000030576826,0.7743879170108678, 0.8032590543649807,0.002434,0.003673,0.004194,0.327903,11.357262,13.782266,20.14374,31.828905,33.9260060.9215201173775437, 0.9349343132442707,0.8605282244327555,0.8741626682026793,0.8742163597524663, 0.8874673376386358,0.7109322043854609,0.7376362393985332,0.796158275345
To expand my comment into an actual answer:
We do have some information - a Python float is an IEEE-754 double-precision number: 64 bits, of which 53 make up the significand, so not every decimal number can be represented exactly. For datasets like yours, the values are brushing up against the edge of that precision.
We can make that work for us - we just need to test whether the number can, in fact, be represented by a float, at each possible split point. We can abuse strings for this, by testing num_str == str(float(num_str)) (i.e. a string remains the same after being converted to a float and back to a string)
If your number is able to be represented exactly by the IEEE float standard, then the before and after will be equal
If the number cannot be represented exactly by the IEEE float standard, it will be coerced into the nearest number that the float can represent. Obviously, if we then convert this back to a string, the result will not be identical to the original.
Here's a snippet, for example, that you can play around with
from typing import List

def parse_number(s: str) -> List[float]:
    if s.count('.') == 2:
        first_decimal = s.index('.')
        second_decimal = s.index('.', first_decimal + 1)
        # try the longest possible first number, then shrink it
        for split_idx in range(second_decimal - 1, first_decimal + 1, -1):
            a, b = s[:split_idx], s[split_idx:]
            if str(float(a)) == a and str(float(b)) == b:
                return [float(a), float(b)]
        # default to returning as large an a as possible
        return [float(s[:second_decimal - 1]), float(s[second_decimal - 1:])]
    else:
        return [float(s)]
parse_number('33.9260060.9215201173775437')
# [33.926006, 0.9215201173775437]
# this is the only possible combination that actually works for this particular input
Obviously this isn't foolproof, and for some numbers there may not be enough information to differentiate the first number from the second. Additionally, for this to work, the tool that generated your data needs to have worked with IEEE standards-compliant floats (which does appear to be the case in this example, but may not be if the results were generated using a class like Decimal (python) or BigDecimal (java) or something else).
Some inputs might also have multiple possibilities. In the above snippet I've biased it to take the longest possible [first number], but you could modify it to go in the opposite order and instead take the shortest possible [first number].
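For instance, a hedged sketch of that modification (the helper name here is hypothetical): collect every split that round-trips, so ambiguous inputs can be detected instead of silently resolved:

from typing import List, Tuple

def all_valid_splits(s: str) -> List[Tuple[str, str]]:
    # return every split point where both halves survive a
    # float round-trip, rather than stopping at the first one
    first = s.index('.')
    second = s.index('.', first + 1)
    splits = []
    for i in range(first + 2, second):  # keep a digit on each side of each '.'
        a, b = s[:i], s[i:]
        if str(float(a)) == a and str(float(b)) == b:
            splits.append((a, b))
    return splits

print(all_valid_splits('33.9260060.9215201173775437'))
# [('33.926006', '0.9215201173775437')] -- a single entry means unambiguous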
Yes, you have one available weapon: you're using the default precision to display the numbers. In the example you cite, there are 15 digits after the decimal point, making it easy to reconstruct the original numbers.
Let's take a simple case, where you have only 3 digits after the decimal point. It's trivial to separate
13.95857.217
The formatting requires a maximum of 2 digits before the decimal point, and three after.
Any case that has five digits between the points, is trivial to split.
13.958 57.217
However, you run into the "trailing zero" problem in some cases. If you see, instead
13.9557.217
This could be either
13.950 57.217
or
13.955 07.217
Your data do not contain enough information to differentiate the two cases.
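For the trivial fixed-format case above (always three digits after the point, no stripped zeros), a simple regular expression recovers the split; a minimal sketch:

import re

# assumes every number has exactly three digits after its decimal point
print(re.findall(r"\d+\.\d{3}", "13.95857.217"))
# ['13.958', '57.217']

As the trailing-zero example shows, though, once zeros have been stripped no pattern can recover the original split.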
Hi, I would like to dynamically adjust the displayed decimal places of a string representation of a floating point number, but I couldn't find any information on how to do it.
E.g:
precision = 8
n = 7.12345678911
str_n = '{0:.{precision}}'.format(n)
print(str_n) should display -> 7.12345678
But instead I'm getting a KeyError. What am I missing?
You need to specify where precision in your format string comes from:
precision = 8
n = 7.12345678911
print('{0:.{precision}}'.format(n, precision=precision))
The first time, you specified which argument you'd like to be the number using an index ({0}), so the formatting function knows where to get the argument from, but when you specify a placeholder by some key, you have to explicitly specify that key.
It's a little unusual to mix these two systems; I'd recommend staying with one:
print('{number:.{precision}}'.format(number=n, precision=precision)) # most readable
print('{0:.{1}}'.format(n, precision))
print('{:.{}}'.format(n, precision)) # automatic indexing, least obvious
It is notable that this precision value is the total number of significant digits, including those before the decimal point, so
>>> f"{123.45:.3}"
'1.23e+02'
will drop the decimals and only give the first three significant digits of the number.
Instead, an f can be supplied as the presentation type (see the documentation) to get fixed-point formatting with precision decimal digits.
print('{number:.{precision}f}'.format(number=n, precision=precision)) # most readable
print('{0:.{1}f}'.format(n, precision))
print('{:.{}f}'.format(n, precision)) # automatic indexing, least obvious
In addition to Talon's answer, for those interested in f-strings, this also works.
precision = 8
n = 7.12345678911
print(f'{n:.{precision}f}')
I know that questions about rounding in python have been asked multiple times already, but the answers did not help me. I'm looking for a method that is rounding a float number half up and returns a float number. The method should also accept a parameter that defines the decimal place to round to. I wrote a method that implements this kind of rounding. However, I think it does not look elegant at all.
import decimal

def round_half_up(number, dec_places):
    s = str(number)
    d = decimal.Decimal(s).quantize(
        decimal.Decimal(10) ** -dec_places,
        rounding=decimal.ROUND_HALF_UP)
    return float(d)
I don't like it, that I have to convert float to a string (to avoid floating point inaccuracy) and then work with the decimal module.
Do you have any better solutions?
Edit: As pointed out in the answers below, the solution to my problem is not that obvious as correct rounding requires correct representation of numbers in the first place and this is not the case with float. So I would expect that the following code
def round_half_up(number, dec_places):
    d = decimal.Decimal(number).quantize(
        decimal.Decimal(10) ** -dec_places,
        rounding=decimal.ROUND_HALF_UP)
    return float(d)
(which differs from the code above only in that the float is converted directly into a decimal rather than to a string first) to return 2.18 when called as round_half_up(2.175, 2). But it doesn't, because Decimal(2.175) returns Decimal('2.17499999999999982236431605997495353221893310546875'), which is how the float is actually represented by the computer.
Surprisingly, the first version returns 2.18 because the float number is converted to a string first. It seems that the str() function performs an implicit rounding back to the number that was initially meant to be rounded, so there are two roundings taking place. Even though this is the result I would expect, it is technically wrong.
Rounding is surprisingly hard to do right, because you have to handle floating-point calculations very carefully. If you are looking for an elegant solution (short, easy to understand), what you have looks like a good starting point. To be correct, you should replace decimal.Decimal(str(number)) with creating the decimal from the number itself, which will give you a decimal version of its exact representation:
d = Decimal(number).quantize(...)...
Decimal(str(number)) effectively rounds twice, as formatting the float into the string representation performs its own rounding. This is because str(float value) won't try to print the full decimal representation of the float, it will only print enough digits to ensure that you get the same float back if you pass those exact digits to the float constructor.
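The double rounding is easy to see with the 2.175 value from the question:

from decimal import Decimal

print(Decimal(2.175))       # 2.17499999999999982236431605997495353221893310546875
print(Decimal(str(2.175)))  # 2.175 -- str() already rounded once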
If you want to retain correct rounding, but avoid depending on the big and complex decimal module, you can certainly do it, but you'll still need some way to implement the exact arithmetics needed for correct rounding. For example, you can use fractions:
import fractions, math

def round_half_up(number, dec_places=0):
    sign = math.copysign(1, number)
    number_exact = abs(fractions.Fraction(number))
    shifted = number_exact * 10**dec_places
    shifted_trunc = int(shifted)
    if shifted - shifted_trunc >= fractions.Fraction(1, 2):
        result = (shifted_trunc + 1) / 10**dec_places
    else:
        result = shifted_trunc / 10**dec_places
    return sign * float(result)
assert round_half_up(1.49) == 1
assert round_half_up(1.5) == 2
assert round_half_up(1.51) == 2
assert round_half_up(2.49) == 2
assert round_half_up(2.5) == 3
assert round_half_up(2.51) == 3
Note that the only tricky part in the above code is the precise conversion of a floating-point to a fraction, and that can be off-loaded to the as_integer_ratio() float method, which is what both decimals and fractions do internally. So if you really want to remove the dependency on fractions, you can reduce the fractional arithmetic to pure integer arithmetic; you stay within the same line count at the expense of some legibility:
def round_half_up(number, dec_places=0):
    sign = math.copysign(1, number)
    exact = abs(number).as_integer_ratio()
    shifted = (exact[0] * 10**dec_places), exact[1]
    shifted_trunc = shifted[0] // shifted[1]
    difference = (shifted[0] - shifted_trunc * shifted[1]), shifted[1]
    if difference[0] * 2 >= difference[1]:  # difference >= 1/2
        shifted_trunc += 1
    return sign * (shifted_trunc / 10**dec_places)
Note that testing these functions brings to spotlight the approximations performed when creating floating-point numbers. For example, print(round_half_up(2.175, 2)) prints 2.17 because the decimal number 2.175 cannot be represented exactly in binary, so it is replaced by an approximation that happens to be slightly smaller than the 2.175 decimal. The function receives that value, finds it smaller than the actual fraction corresponding to the 2.175 decimal, and decides to round it down. This is not a quirk of the implementation; the behavior derives from properties of floating-point numbers and is also present in the round built-in of Python 3 and 2.
I don't like it, that I have to convert float to a string (to avoid floating point inaccuracy) and then work with the decimal module. Do you have any better solutions?
Yes; use Decimal to represent your numbers throughout your whole program, if you need to represent numbers such as 2.675 exactly and have them round to 2.68 instead of 2.67.
There is no other way. The floating point number which is shown on your screen as 2.675 is not the real number 2.675; in fact, it is very slightly less than 2.675, which is why it gets rounded down to 2.67:
>>> 2.675 - 2
0.6749999999999998
It only shows in string form as '2.675' because that happens to be the shortest string such that float(s) == 2.6749999999999998. Note that this longer representation (with lots of 9s) isn't exact either.
However you write your rounding function, it is not possible for my_round(2.675, 2) to round up to 2.68 and also for my_round(2 + 0.6749999999999998, 2) to round down to 2.67; because the inputs are actually the same floating point number.
So if your number 2.675 ever gets converted to a float and back again, you have already lost the information about whether it should round up or down. The solution is not to make it float in the first place.
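A minimal sketch of that approach, assuming the value is still available as the user's original input string (the function name here is mine):

from decimal import Decimal, ROUND_HALF_UP

def round_half_up_exact(text, dec_places):
    # never goes through float, so 2.675 really is 2.675
    return Decimal(text).quantize(Decimal(10) ** -dec_places,
                                  rounding=ROUND_HALF_UP)

print(round_half_up_exact("2.675", 2))  # 2.68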
After trying for a very long time to produce an elegant one-line function, I ended up getting something that is comparable to a dictionary in size.
I would say the simplest way to do this is just to
def round_half_up(inp, dec_places):
    return round(inp + 0.0000001, dec_places)
I would acknowledge that this is not accurate in every case, but it should work if you just want a simple, quick workaround.
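For example (a constructed counterexample): the added epsilon can push a digit that should round down over the boundary:

# the epsilon turns ...49 into ...59 at the 7th decimal place:
print(round_half_up(0.12345649, 6))
# 0.123457 -- but half-up rounding of 0.12345649 to 6 places is 0.123456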
I have created the following snippet of code, and I am trying to convert my 5 dp DNumber to a 2 dp one and insert it into a string. However, whichever method I try to use always seems to revert the DNumber back to the original number of decimal places (5).
Code snippet below:
if key == (1, 1):
    DNumber = '{r[csvnum]}'.format(r=row)
    # returns 7.65321
    DNumber = """%.2f""" % (float(DNumber))
    # returns 7.65
    Check2 = False
    if DNumber:
        if DNumber <= float(8):
            Check2 = True
    if Check2:
        print DNumber
        # returns 7.65
    string = 'test {r[csvhello]} TESTHERE test'.format(r=row).replace("TESTHERE", str("""%.2f""" % (float(gtpe))))
    # returns: test Hello 7.65321 test
    string = 'test {r[csvhello]} TESTHERE test'.format(r=row).replace("TESTHERE", str(DNumber))
    # returns: test Hello 7.65321 test
What I hoped it would return: test Hello 7.65 test
Any Ideas or suggestion on alternative methods to try?
It seems like you were hoping that converting the float to a 2-decimal-place string and then back to a float would give you a 2-decimal-place float.
The first problem is that your code doesn't actually do that anywhere. If you'd done that, you would get something very close to 7.65, not 7.65321.
But the bigger problem is that what you're trying to do doesn't make any sense. A float always has 53 binary digits, no matter what. If you round it to two decimal digits (no matter how you do it, including by converting to string and back), what you actually get is a float rounded to two decimal digits and then rounded to 53 binary digits. The closest float to 7.65 is not exactly 7.65, but 7.650000000000000355271368.* So, that's what you'd end up with. And there's no way around that; it's inherent to the way float is stored.
However, there is a different type you can use for this: decimal.Decimal. For example:
>>> f = 7.65321
>>> s = '%.2f' % f
>>> d = decimal.Decimal(s)
>>> f, s, d
(7.65321, '7.65', Decimal('7.65'))
Or, of course, you could just pass around a string instead of a float (as you're accidentally doing in your code already), or you could remember to use the .2f format every time you want to output it.
As a side note, since your DNumber ends up as a string, this line is not doing anything useful:
if DNumber <= 8:
In Python 2.x, comparing two values of different types gives you a consistent but arbitrary and meaningless answer. With CPython 2.x, it will always be False.** In a different Python 2.x implementation, it might be different. In Python 3.x, it raises a TypeError.
And changing it to this doesn't help in any way:
if DNumber <= float(8):
Now, instead of comparing a str to an int, you're comparing a str to a float. This is exactly as meaningless, and follows the exact same rules. (Also, float(8) means the same thing as 8.0, but less readable and potentially slower.)
For that matter, this:
if DNumber:
… is always going to be true. For a number, if foo checks whether it's non-zero. That's a bad idea for float values (you should check whether it's within some absolute or relative error range of 0). But again, you don't have a float value; you have a str. And for strings, if foo checks whether the string is non-empty. So, even if you started off with 0, your string "0.00" is going to be true.
* I'm assuming here that you're using CPython, on a platform that uses IEEE-754 double for its C double type, and that all those extra conversions back and forth between string and float aren't introducing any additional errors.
** The rule is, slightly simplified: If you compare two numbers, they're converted to a type that can hold them both; otherwise, if either value is None it's smaller; otherwise, if either value is a number, it's smaller; otherwise, whichever one's type has an alphabetically earlier name is smaller.
I think you're trying to do the following - combine the formatting with the getter:
>>> a = 123.456789
>>> row = {'csvnum': a}
>>> print 'test {r[csvnum]:.2f} hello'.format(r=row)
test 123.46 hello
If your number is a 7 followed by five digits, you might want to try:
print "%r" % float(str(x)[:4])
where x is the float in question.
Example:
>>> x = 1.11111
>>> print "%r" % float(str(x)[:4])
1.11