I am maintaining a Python script that uses xlrd to retrieve values from Excel spreadsheets and then does various things with them. Some of the cells in the spreadsheet are high-precision numbers, and they must remain as such. When retrieving the value of one of these cells, xlrd gives me a float such as 0.38288746115497402.
However, I need to get this value into a string later on in the code. Doing either str(value) or unicode(value) will return something like "0.382887461155". The requirements say that this is not acceptable; the precision needs to be preserved.
I've tried a couple of things so far, with no success. The first was string formatting:
data = "%.40s" % (value)
data2 = "%.40r" % (value)
But both produce the same rounded number, "0.382887461155".
Upon searching around for people with similar problems on SO and elsewhere on the internet, a common suggestion was to use the Decimal class. But I can't change the way the data is given to me (unless somebody knows of a secret way to make xlrd return Decimals). And when I try to do this:
data = Decimal(value)
I get "TypeError: Cannot convert float to Decimal. First convert the float to a string." But obviously I can't convert it to a string, or else I will lose the precision.
So yeah, I'm open to any suggestions -- even really gross/hacky ones if necessary. I'm not terribly experienced with Python (more of a Java/C# guy myself) so feel free to correct me if I've got some kind of fundamental misunderstanding here.
EDIT: Just thought I would add that I am using Python 2.6.4. I don't think there are any formal requirements stopping me from changing versions; it just has to not mess up any of the other code.
I'm the author of xlrd. There is too much confusion in the other answers and comments to rebut it comment by comment, so I'm doing it in an answer.
#katriealex: """precision being lost in the guts of xlrd""" --- entirely unfounded and untrue. xlrd reproduces exactly the 64-bit float that's stored in the XLS file.
#katriealex: """It may be possible to modify your local xlrd installation to change the float cast""" --- I don't know why you would want to do this; you don't lose any precision by floating a 16-bit integer!!! In any case that code is used only when reading Excel 2.X files (which had an INTEGER-type cell record). The OP gives no indication that he is reading such ancient files.
#jloubert: You must be mistaken. "%.40r" % a_float is just a baroque way of getting the same answer as repr(a_float).
#EVERYBODY: You don't need to convert a float to a decimal to preserve the precision. The whole point of the repr() function is that the following is guaranteed:
float(repr(a_float)) == a_float
Python 2.X (X <= 6) repr gives a constant 17 decimal digits of precision, as that is guaranteed to reproduce the original value. Later Pythons (2.7, 3.1) give the minimal number of decimal digits that will reproduce the original value.
Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
>>> f = 0.38288746115497402
>>> repr(f)
'0.38288746115497402'
>>> float(repr(f)) == f
True
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
>>> f = 0.38288746115497402
>>> repr(f)
'0.382887461154974'
>>> float(repr(f)) == f
True
So the bottom line is that if you want a string that preserves all the precision of a float object, use preserved = repr(the_float_object) ... recover the value later by float(preserved). It's that simple. No need for the decimal module.
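Applied to the OP's situation, a minimal sketch might look like this (the file name and cell coordinates are made up; open_workbook, sheet_by_index and cell_value are the standard xlrd calls):
import xlrd
book = xlrd.open_workbook('prices.xls')   # hypothetical file
sheet = book.sheet_by_index(0)
value = sheet.cell_value(0, 0)            # a float such as 0.38288746115497402
preserved = repr(value)                   # string carrying the full precision
assert float(preserved) == value          # round-trips exactly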
You can use repr() to convert to a string without losing precision, then convert to a Decimal:
>>> from decimal import Decimal
>>> f = 0.38288746115497402
>>> d = Decimal(repr(f))
>>> print d
0.38288746115497402
EDIT: I am wrong. I shall leave this answer here so the rest of the thread makes sense, but it's not true. Please see John Machin's answer above. Thanks guys =).
If the above answers work that's great -- it will save you a lot of nasty hacking. However, at least on my system, they won't. You can check this with e.g.
import sys
print( "%.30f" % sys.float_info.epsilon )
That number is the smallest float that your system can distinguish from zero. Anything smaller than that may be randomly added or subtracted from any float when you perform an operation. This means that, at least on my Python setup, the precision is lost inside the guts of xlrd, and there seems to be nothing you can do without modifying it. Which is odd; I'd have expected this case to have occurred before, but apparently not!
It may be possible to modify your local xlrd installation to change the float cast. Open up site-packages\xlrd\sheet.py and go down to line 1099:
...
elif rc == XL_INTEGER:
    rowx, colx, cell_attr, d = local_unpack('<HH3sH', data)
    self_put_number_cell(rowx, colx, float(d), self.fixed_BIFF2_xfindex(cell_attr, rowx, colx))
...
Notice the float cast -- you could try changing that to a decimal.Decimal and see what happens.
EDIT: Cleared my previous answer because it didn't work properly.
I'm on Python 2.6.5 and this works for me:
a = 0.38288746115497402
print repr(a)
type(repr(a)) #Says it's a string
Note: This just converts to a string. You'll need to convert to Decimal yourself later if needed.
As has already been said, a float isn't precise at all - so preserving precision can be somewhat misleading.
Here's a way to get every last bit of information out of a float object:
>>> from decimal import Decimal
>>> str(Decimal.from_float(0.1))
'0.1000000000000000055511151231257827021181583404541015625'
Another way is the float's hex() method:
>>> 0.1.hex()
'0x1.999999999999ap-4'
Both strings represent the exact contents of the float. Almost anything else interprets the float as Python thinks it was probably intended (which most of the time is correct).
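If you ever need the original float back, both strings round-trip exactly (a quick sketch; float.hex()/float.fromhex() exist from Python 2.6 onwards):
>>> f = 0.1
>>> s_dec = str(Decimal.from_float(f))   # exact decimal expansion
>>> s_hex = f.hex()                      # exact hexadecimal expansion
>>> float(s_dec) == f
True
>>> float.fromhex(s_hex) == f
True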
Related
I'm trying to use python to investigate the effect of C++ truncating doubles to floats.
In C++ I have relativistic energies and momenta which are cast to floats, and I'm trying to work out whether at these energies saving them as doubles would actually result in any improved precision in the difference between energy and momentum.
I chose Python because it seemed like a quick and easy way to look at this with some test energies, but now I realise that the difference between C++ 32-bit floats and 64-bit doubles isn't reflected in Python.
I haven't yet found a simple way of taking a number and reducing the number of bits used to store it in python. Please can someone enlighten me?
I'd suggest using NumPy. It exposes various data types, including C-style floats and doubles.
Other useful tools are the C++17 style hex encoding and the decimal module for getting accurate decimal expansions.
For example:
import numpy as np
from decimal import Decimal
for ftype in (np.float32, np.float64):
    v = np.exp(ftype(1))
    pyf = float(v)
    print(f"{v.dtype} {v:20} {pyf.hex()} {Decimal(pyf)}")
giving
float32 2.7182819843292236 0x1.5bf0aa0000000p+1 2.7182819843292236328125
float64 2.718281828459045 0x1.5bf0a8b145769p+1 2.718281828459045090795598298427648842334747314453125
Unfortunately the hex encoding is a bit verbose for float32s (i.e. the trailing zeros are redundant), and the decimal module doesn't know about NumPy floats, so you need to convert to a Python-native float first. But given that binary32s can be converted to binary64s without loss, this doesn't seem too bad.
Just a thought: it sounds like you might want these written out to a file. If so, NumPy scalars (and ndarrays) support the buffer protocol, which means you can write them out directly or use bytes(v) to get the underlying bytes.
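A rough sketch of that (np.frombuffer is just one way to read the bytes back; the exact byte layout depends on your platform's endianness):
import numpy as np
v = np.float32(2.7182819843292236)
raw = bytes(v)                                   # the 4 underlying bytes of the binary32 value
restored = np.frombuffer(raw, dtype=np.float32)[0]
assert restored == v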
I'm trying to have Python replicate some FORTRAN output of real values. My FORTRAN prints the real value as "31380.". I'm trying to replicate the same in Python--note that although I have no decimal places, I actually want the decimal point (period) to be printed. My current code is
htgm=31380.
print '{:6.0f}'.format(htgm)
which yields "31380". What am I doing wrong?
Python's format language includes an 'alternate' form for floats which forces the decimal point by using a '#' in the format string:
>>> htgm=31380.
>>> format(htgm, '#.0f')
'31380.'
Which is what I think you are looking for.
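If you prefer %-style formatting, the same flag works there too (it follows C's printf rules, where # forces the decimal point for f conversions):
>>> htgm = 31380.
>>> '%#6.0f' % htgm
'31380.'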
I thought #g would be what you wanted, but with g the '#' flag also keeps trailing zeros, so Python adds the 0 back on:
>>> htgm=31380.
>>> format(htgm, 'g')
'31380'
>>> format(htgm, '#g')
'31380.0'
It is not possible to do this in Python while keeping the type of htgm as float. However, if you are OK with making it a str, you may do:
htgm=31380.
'{0:.0f}.'.format(htgm)
# returns: '31380.'
# OR, even simply
'{}.'.format(int(htgm))
When you need to display the number, use:
print(str(htgm)[:-1])
This slicing shaves off the trailing '0'.
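That is:
>>> htgm = 31380.
>>> str(htgm)
'31380.0'
>>> str(htgm)[:-1]
'31380.'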
I am depending on some code that uses the Decimal class because it needs precision to a certain number of decimal places. Some of the functions allow inputs to be floats because of the way that it interfaces with other parts of the codebase. To convert them to decimal objects, it uses things like
mydec = decimal.Decimal(str(x))
where x is the float taken as input. My question is, does anyone know what the standard is for the 'str' method as applied to floats?
For example, take the number 2.1234512. It is stored internally as something closer to 2.1234511999999999 because of how floats are represented.
>>> x = 2.12345119999999999
>>> x
2.1234511999999999
>>> str(x)
'2.1234512'
OK, str(x) in this case is doing something like '%.12g' % x (i.e. rounding to 12 significant digits). This is a problem with the way my code converts to decimals. Take the following:
>>> d = decimal.Decimal('2.12345119999999999')
>>> ds = decimal.Decimal(str(2.12345119999999999))
>>> d - ds
Decimal('-1E-17')
So if I have the float 2.12345119999999999 and I want to pass it to Decimal, converting it to a string using str() gets me the wrong answer. I need to know the rules for str(x) that determine the formatting, because I need to determine whether this code needs to be rewritten to avoid this error. (Note that it might be OK; for example, the code might round to the 10th decimal place once we have a Decimal object.)
There must be some set of rules in python's docs that hopefully someone here can point me to. Thanks!
In the Python source, look in "Include/floatobject.h". The precision for the string conversion is set a few lines from the top, after a comment with some explanation of the choice:
/* The str() precision PyFloat_STR_PRECISION is chosen so that in most cases,
   the rounding noise created by various operations is suppressed, while
   giving plenty of precision for practical use. */

#define PyFloat_STR_PRECISION 12
You have the option of rebuilding, if you need something different. Any changes will change formatting of floats and complex numbers. See ./Objects/complexobject.c and ./Objects/floatobject.c. Also, you can compare the difference between how repr and str convert doubles in these two files.
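In other words, on a 2.x build str() behaves roughly like formatting with 12 significant digits (a quick check; CPython 2.6/2.7 assumed):
>>> x = 2.12345119999999999
>>> str(x)
'2.1234512'
>>> '%.12g' % x
'2.1234512'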
There's a couple of issues worth discussing here, but the summary is: you cannot extract information that is not stored on your system already.
If you've taken a decimal number and stored it as a floating point, you'll have lost information, since most decimal (base 10) numbers with a finite number of digits cannot be stored using a finite number of digits in base 2 (binary).
As was mentioned, str(a_float) will really call a_float.__str__(). As the documentation states, the purpose of that method is to
return a string containing a nicely printable representation of an object
There's no particular definition for the float case. My opinion is that, for your purposes, you should consider __str__'s behavior to be undefined, since there's no official documentation on it - the current implementation can change anytime.
If you don't have the original strings, there's no way to extract the missing digits of the decimal representation from the float objects. All you can do is round predictably, using string formatting (which you mention):
Decimal( "{0:.5f}".format(a_float) )
You can also remove 0s on the right with resulting_string.rstrip("0").
Again, this method does not recover the information that has been lost.
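Putting the two together with the questioner's value (the choice of 10 decimal places here is arbitrary):
>>> from decimal import Decimal
>>> a_float = 2.12345119999999999
>>> s = "{0:.10f}".format(a_float)
>>> s
'2.1234512000'
>>> Decimal(s.rstrip("0"))
Decimal('2.1234512')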
I am trying to write a method in Python 3.2 that encrypts a phrase and then decrypts it. The problem is that the numbers are so big that when Python does math with them it immediately converts them into scientific notation. Since my code needs the numbers written out in full, scientific notation is not useful.
What I have is:
coded = ((eval(input(':'))+1213633288469888484)/2)+1042
Basically, I just get a number from the user and do some math to it.
I have tried format() and a couple other things but I can't get them to work.
EDIT: I use only even integers.
In Python 3, / does true division (i.e. floating point). To get integer division, you need to use //. In other words, 100/2 yields 50.0 (a float) whereas 100//2 yields 50 (an integer).
Your code probably needs to be changed to:
coded = ((eval(input(':'))+1213633288469888484)//2)+1042
As a cautionary note, however, you may want to consider using int instead of eval:
coded = ((int(input(':'))+1213633288469888484)//2)+1042
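To see why this matters at these magnitudes (123456 stands in for whatever even number the user enters):
n = 123456
big = n + 1213633288469888484
print(big / 2)     # true division: a float, displayed in scientific notation with low digits lost
print(big // 2)    # floor division: stays an exact int, every digit intact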
If you know that the floating point value is really an integer, or you don't care about dropping the fractional part, you can just convert it to an int before you print it.
>>> print(1.2e16)
1.2e+16
>>> print(int(1.2e16))
12000000000000000
Is there some way I could make decimal.Decimal the default type for all numerical values in Python? I would like to be able to use Python in a manner similar to the bc and dc programs without having to call decimal.Decimal(...) for every number.
EDIT: For the uninitiated: bc.
EDIT 2: Thank you, tokenize module.
At the bottom of the tokenize module's documentation, there is a function that does exactly what I need:
Python 3: "Example of a script rewriter that transforms float literals into Decimal objects"
Python 2: "Example of a script re-writer that transforms float literals into Decimal objects"
You cannot really do what you ask without some serious magic, which I won't try to touch on in my answer, but there is at least a slightly easier way than writing decimal.Decimal(...) for every number:
from decimal import Decimal as D
num = D("1") + D("2.3")
Edit: use the shorter form from the comment.
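For comparison, an interactive session showing what this buys you:
>>> from decimal import Decimal as D
>>> D("0.1") + D("0.2")
Decimal('0.3')
>>> 0.1 + 0.2
0.30000000000000004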