How can I format a float with f-strings so that the output doesn't change, except when there is a trailing .0, in which case it is simply dropped?
I am asking because most answers simply say to use the g presentation type, but that is not the correct answer. With g, if the number gets too large, the representation can drop information and/or switch to scientific notation, as you can see here:
>>> f"{202105.35}"
'202105.35'
>>> f"{202105.35:g}"
'202105'
>>> f"{202105.0}"
'202105.0'
>>> f"{202105.0:g}"
'202105'
I want the best of both worlds using format parameters, but without losing precision.
In my research I found that you can specify a precision (digit+), but I don't want to set a specific one. If that's the only way to do it, then I'll have to set one.
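For reference, this is the behavior I'm after, sketched with an explicit check (fmt is just an illustrative helper):
def fmt(x: float) -> str:
    # drop the trailing '.0' on whole numbers, otherwise keep full precision
    return f"{x:.0f}" if x.is_integer() else repr(x)

fmt(202105.35)  # '202105.35'
fmt(202105.0)   # '202105'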
I'm trying to use read_sql_query() to read a query from a MySQL database. One of the fields in the database has type double(24, 8). I want to use the dtype= parameter to have full control of the datatypes and read it as a decimal, but it seems pandas can't recognize the decimal type, so I had to read it as Float64.
In the database, the values for this field look like this:
Value
100.96000000
77.17000000
1.00000000
0.12340000
Then I'm trying to read it from Python code:
from decimal import Decimal
import pandas as pd

dtypes = {
    'id': 'Int64',
    'date': 'datetime64[ns]',
    'value': 'Float64'
}
df = pd.read_sql_query(sql_query, mysql_engine, dtype=dtypes)
but after reading the data from the code above, it looks like this:
Value
100.96
77.17
1.0
0.1234
How can I read this column to decimal and keep all the digits? Thanks.
What "the data looks like in the database" is tricky. This is because the act of printing it out feeds the bits through a formatting algorithm. In this case it removes trailing zeros. To see what is "in the database", one needs to get a hex dump of the file and then decipher it; this is non-trivial.
I believe that DECIMAL numbers hold all the digits specified, packed 2 digits per byte. No, I don't know how they are packed (0..99 versus 2 hex digits; what to do if the number of digits is odd; where is the sign?)
I believe that FLOAT and DOUBLE exactly conform to the IEEE-754 encoding format. No, I don't know how the bytes are stored (big-endian versus little-endian). I suspect Python's Float64 is an IEEE DOUBLE.
For DECIMAL(10,6), I would expect "1.234" to be stored as +, 0001, and 234000, never displayed with leading zeros, and optionally displayed with trailing zeros, depending on the output formatting package.
For DOUBLE, I would expect to find hex 3ff3be76c8b43958 after adjusting for endianness, and I would not be surprised to see the output be 1.23399999999999999e+0. (Yes, I actually got that, given a suitable formatting in PHP, which I am using.) I would hope to see 1.234, since that is presumably the intent of the number.
Do not use DOUBLE(m,n). The (m,n) leads to extra rounding and it is deprecated syntax. Float and Double are not intended for exact number of decimal places; use DECIMAL for such.
For FLOAT: 1.234 becomes hex 3f9df3b6 and displays something like 1.2339999675751 assuming the output method works in DOUBLE and is asked to show lots of decimal places.
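You can check these encodings from Python with the struct module; a quick illustration:
>>> import struct
>>> struct.pack('>d', 1.234).hex()   # big-endian IEEE 754 DOUBLE
'3ff3be76c8b43958'
>>> struct.pack('>f', 1.234).hex()   # big-endian IEEE 754 FLOAT
'3f9df3b6'
>>> struct.unpack('>f', bytes.fromhex('3f9df3b6'))[0]
1.2339999675750732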
Bottom line: The output method you are using is causing the problem.
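If you need every digit preserved end to end, one possible workaround (a sketch only; the table name my_table is a placeholder) is to have MySQL send the column as text, so the value never passes through a float at all:
from decimal import Decimal
import pandas as pd

# CAST to CHAR so the value travels as a string, never as a float
sql_query = "SELECT id, date, CAST(value AS CHAR) AS value FROM my_table"
df = pd.read_sql_query(sql_query, mysql_engine)
df['value'] = df['value'].map(Decimal)   # Decimal('100.96000000') keeps all eight decimals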
I want to format some values with a fixed precision of 3, unless the value is an integer; in that case I don't want any decimal point or trailing 0s.
According to the docs, the 'f' type in string formatting should remove the decimal point if no digits follow it:
If no digits follow the decimal point, the decimal point is also removed unless the # option is used.
But testing it with Python 3.8 I get the following results:
>>> f'{123:.3f}'
'123.000'
>>> f'{123.0:.3f}'
'123.000'
Am I misunderstanding something? How could I achieve the desired result without using if/else checks?
In order to forcefully achieve both your desired outputs with the same f-string expression, you could apply some kung-fu like
i = 123
f"{i:.{3*isinstance(i, float)}f}"
# '123'
i = 123.0
f"{i:.{3*isinstance(i, float)}f}"
# '123.000'
But this won't improve your code in terms of readability. There's no harm in being more explicit.
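If readability matters, a plain helper does the same thing (fmt3 is just an illustrative name):
def fmt3(x):
    # floats always get three decimals; ints get none
    if isinstance(x, float):
        return f"{x:.3f}"
    return f"{x}"

fmt3(123)     # '123'
fmt3(123.0)   # '123.000'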
So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats, which represent values to be fit into the ranges. All of these floats are < 1 but positive, so precision matters. One of my tests to determine whether a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems, I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question then, is why is the float changing when it is accessed. It has decimal values available up to a certain precision when part of a tuple, and a different precision when accessed. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?
The float doesn't change. The built-in numeric types are all immutable. The cause of what you're observing is that:
print range[0] uses str on the float, which (up until very recent versions of Python) printed fewer digits of a float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this isn't true anymore in recent releases, which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit: the limited precision of floats. Try print repr(curValue), repr(range[0]) to see what Python decided was the closest possible representation of your float literals.
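To illustrate with the value from the question (older CPython used 12 significant digits for str(), as discussed further down):
>>> x = 0.0014500000067055225
>>> repr(x)              # full round-trip representation
'0.0014500000067055225'
>>> format(x, '.12g')    # roughly what the old str()/print showed
'0.00145000000671'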
Floats on modern PCs aren't that precise. So even if you enter pi as a constant to 100 decimals, only a few of them are accurate. The same is happening to you: a Python float is a 64-bit IEEE double, which gives you only 53 bits of mantissa, and that limits your precision (in unexpected ways, because it's in base 2).
Please note, 0.00145000000671 isn't the exact value as stored by Python. Python only displays a few decimals of the complete stored float when you use print. If you want to see exactly how Python stores the float, use repr.
If you want better precision use the decimal module.
It isn't changing per se. Python is doing its best to store the data as a float, but that number is too precise for a float, so Python rounds it before it is even accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use an arbitrary-precision fixed-point module like Simple Python Fixed Point or the decimal module.
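For example, with the standard-library decimal module (a minimal sketch, assuming the values can be built from strings):
>>> from decimal import Decimal
>>> curValue = Decimal('0.00145000000671')
>>> lo = Decimal('0.0014500000067055225')
>>> curValue > lo   # exact decimal comparison, no binary rounding
True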
Not sure it would work in this case, because I don't know whether Python is limiting the output or the storage itself, but you could try doing:
if curValue - range[0] > 0 and...
When using Python's decimal.Decimal class, we now have a need to drop extraneous decimal places. So, for instance, '0.00' becomes '0' and '0.50' becomes '0.5'. Is there a cleaner way of doing this than converting to a string and manually dropping trailing zeros and full stops?
To clarify, we need to be able to dynamically round the result without knowing the number of decimal places in advance, and potentially output an integer (or a string representation of one) if no decimal places are needed... Is this already built into Python?
Use Decimal.normalize:
>>> Decimal('0.00').normalize()
Decimal('0')
>>> Decimal('0.50').normalize()
Decimal('0.5')
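One caveat: normalize() strips all trailing zeros, including those left of the decimal point, so round numbers come back in exponent notation:
>>> Decimal('100.00').normalize()
Decimal('1E+2')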
I am depending on some code that uses the Decimal class because it needs precision to a certain number of decimal places. Some of the functions allow inputs to be floats because of the way they interface with other parts of the codebase. To convert them to Decimal objects, it uses things like
mydec = decimal.Decimal(str(x))
where x is the float taken as input. My question is, does anyone know what the standard is for the 'str' method as applied to floats?
For example, take the number 2.1234512. It is stored internally as 2.12345119999999999 because of how floats are represented.
>>> x = 2.12345119999999999
>>> x
2.1234511999999999
>>> str(x)
'2.1234512'
OK, str(x) in this case is doing something like '%.12g' % x. This is a problem for the way my code converts to decimals. Take the following:
>>> d = decimal.Decimal('2.12345119999999999')
>>> ds = decimal.Decimal(str(2.12345119999999999))
>>> d - ds
Decimal('-1E-17')
So if I have the float 2.12345119999999999 and I want to pass it to Decimal, converting it to a string using str() gets me the wrong answer. I need to know the rules for str(x) that determine the formatting, because I need to decide whether this code has to be rewritten to avoid this error. (Note that it might be OK because, for example, the code might round to the 10th decimal place once we have a Decimal object.)
There must be some set of rules in Python's docs that hopefully someone here can point me to. Thanks!
In the Python source, look in "Include/floatobject.h". The precision for the string conversion is set a few lines from the top, after a comment with some explanation of the choice:
/* The str() precision PyFloat_STR_PRECISION is chosen so that in most cases,
the rounding noise created by various operations is suppressed, while
giving plenty of precision for practical use. */
#define PyFloat_STR_PRECISION 12
You have the option of rebuilding if you need something different. Any changes will change the formatting of floats and complex numbers. See ./Objects/complexobject.c and ./Objects/floatobject.c. Also, you can compare how repr and str convert doubles in these two files.
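You can also reproduce or sidestep the 12-digit behavior in your own code without rebuilding Python (a sketch; Decimal has accepted floats directly since Python 2.7):
import decimal

x = 2.12345119999999999
format(x, '.12g')    # '2.1234512' -- the 12-digit str() behavior described above
decimal.Decimal(x)   # the exact stored binary value, many digits, no str() rounding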
There are a couple of issues worth discussing here, but the summary is: you cannot extract information that is not already stored on your system.
If you've taken a decimal number and stored it as a floating-point number, you'll have lost information, since most decimal (base 10) numbers with a finite number of digits cannot be stored using a finite number of digits in base 2 (binary).
As was mentioned, str(a_float) will really call a_float.__str__(). As the documentation states, the purpose of that method is to
return a string containing a nicely printable representation of an object
There's no particular definition for the float case. My opinion is that, for your purposes, you should consider __str__'s behavior to be undefined, since there's no official documentation on it: the current implementation can change at any time.
If you don't have the original strings, there's no way to extract the missing digits of the decimal representation from the float objects. All you can do is round predictably, using string formatting (which you mention):
Decimal( "{0:.5f}".format(a_float) )
You can also remove trailing 0s with resulting_string.rstrip("0") (followed by .rstrip(".") to drop a stray trailing decimal point).
Again, this method does not recover the information that has been lost.
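Putting those steps together (an illustrative helper, with the number of places as a parameter):
from decimal import Decimal

def float_to_decimal(a_float, places=5):
    # Round to a fixed number of places, then strip trailing zeros
    # and any stray trailing decimal point.
    s = "{0:.{p}f}".format(a_float, p=places)
    s = s.rstrip("0").rstrip(".")   # '100.00000' -> '100', '0.50000' -> '0.5'
    return Decimal(s)

float_to_decimal(0.5)     # Decimal('0.5')
float_to_decimal(100.0)   # Decimal('100')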