Changing number representation in IDLE

I use Python IDLE a lot in my day-to-day job, mostly for short scripts and as a powerful and convenient calculator.
I usually have to work with different numeric bases (mostly decimal, hexadecimal and binary, less frequently octal and other bases).
I know that int(), hex(), bin() and oct() are a convenient way to move from one base to another, and that prefixing integer literals with the right prefix is another way to express a number.
I find it quite inconvenient to have to wrap a calculation in a function call just to see the result in the right base (and the output of hex() and similar functions is a string), so what I'm trying to achieve is a function (or maybe a statement?) that sets the internal IDLE number representation to a known base (2, 8, 10, 16).
Example:
>>> repr_hex() # from now on, all numbers are considered hexadecimal, in input and in output
>>> 10 # 16 in dec
>>> 0x10 # now output is also in hexadecimal
>>> 1e + 2
>>> 0x20
# override should be possible with integer literal prefixes
# 0x: hex ; 0b: bin ; 0n: dec ; 0o: oct
>>> 0b111 + 10 + 0n10 # dec : 7 + 16 + 10
>>> 0x21 # 33 dec
# still possible to override output representation temporarily with a conversion function
>>> conv(_, 10) # conv(x, output_base, current_base=internal_base)
>>> 0n33
>>> conv(_, 2) # use prefix of previous output to set current_base to 10
>>> 0b100001
>>> conv(10, 8, 16) # convert 10 to base 8 (10 is in base 16: 0x10)
>>> 0o20
>>> repr_dec() # switch to base 10, in input and in output
>>> _
>>> 0n16
>>> 10 + 10
>>> 0n20
Implementing those features doesn't seem difficult. What I don't know is:
Is it possible to change number representation in IDLE?
Is it possible to do this without having to change IDLE (source code) itself? I looked at IDLE extensions, but I don't know where to start to have access to IDLE internals from there.
Thank you.

IDLE does not have a number representation. It sends the code you enter to a Python interpreter and displays the string sent back in response. In this sense, it is irrelevant that IDLE is written in Python. The same is true of any IDE or REPL for Python code.
That said, the CPython sys module has a displayhook function. For 3.5:
>>> import sys
>>> help(sys.displayhook)
Help on built-in function displayhook in module sys:

displayhook(...)
    displayhook(object) -> None

    Print an object to sys.stdout and also save it in builtins._
That actually should be __builtins__._, as in the example below. Note that the input is any Python object. For IDLE, the default sys.displayhook is a function defined in idlelib/rpc.py. Here is an example relevant to your question.
>>> def new_hook(ob):
...     if type(ob) is int:
...         ob = hex(ob)
...     __builtins__._ = ob
...     print(ob)
...
>>> sys.displayhook = new_hook
>>> 33
0x21
>>> 0x21
0x21
This gives you the more important half of what you asked for. Before actually using anything in IDLE, I would look at the default version to make sure I did not miss anything. One could write an extension to add menu entries that would switch displayhooks.
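To make the idea of switching display hooks a little more concrete, here is a minimal sketch of a hook factory covering the four bases from the question. The names make_base_hook, repr_hex and repr_dec are hypothetical (the last two are borrowed from the question). Unlike the example above, this version stores the original object in builtins._ rather than its string form, and it prints nothing for None, as the default hook does. It has not been checked against IDLE's own hook in idlelib, so treat it as a starting point only.

```python
import builtins
import sys

# Hypothetical table: which built-in formats an int in each supported base.
_FORMATTERS = {2: bin, 8: oct, 10: repr, 16: hex}


def make_base_hook(base):
    """Return a displayhook that shows ints in `base`; other objects use repr()."""
    fmt = _FORMATTERS[base]

    def hook(ob):
        if ob is None:       # nothing is echoed for None
            return
        builtins._ = ob      # keep the real object so `_` stays usable in arithmetic
        # type(ob) is int deliberately excludes bool, which subclasses int
        print(fmt(ob) if type(ob) is int else repr(ob))

    return hook


def repr_hex():
    """Switch interactive output of ints to hexadecimal."""
    sys.displayhook = make_base_hook(16)


def repr_dec():
    """Switch interactive output of ints back to decimal."""
    sys.displayhook = make_base_hook(10)
```

After calling repr_hex() in the Shell, entering 33 would echo 0x21 while _ still holds the integer 33. This only affects output; input is still parsed as ordinary Python.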
Python intentionally does not have an input preprocessor function. GvR wants the contents of a .py file to always be Python code as defined in some version of the reference manual.
I have thought about the possibility of adding an inputhook to IDLE, but I would not allow one to be active when running a .py file from the editor. If there were one added for the Shell, I would change the prompt from '>>>' to something else, such as 'hex>' or 'bin>'.
EDIT:
One could also write an extension to rewrite input code when explicitly requested either with a menu selection or a hot key or key binding. Or one could edit the current idlelib/ScriptBinding.py to make rewriting automatic. The hook I have thought about would make this easier, but not expand what can be done now.

Related

Unable to convert string to float in Python [duplicate]

How can I convert a string like 123,456.908 to float 123456.908 in Python?
For ints, see How to convert a string to a number if it has commas in it as thousands separators?, although the techniques are essentially the same.

Using the localization services
The default locale
The standard library locale module is Python's interface to C-based localization routines.
The basic usage is:
import locale
locale.atof('123,456')
In locales where , is treated as a thousands separator, this would return 123456.0; in locales where it is treated as a decimal point, it would return 123.456.
However, by default, this will not work:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/locale.py", line 326, in atof
    return func(delocalize(string))
ValueError: could not convert string to float: '123,456'
This is because by default, the program is "in a locale" that has nothing to do with the platform the code is running on, but is instead defined by the POSIX standard. As the documentation explains:
Initially, when a program is started, the locale is the C locale, no matter what the user’s preferred locale is. There is one exception: the LC_CTYPE category is changed at startup to set the current locale encoding to the user’s preferred locale encoding. The program must explicitly say that it wants the user’s preferred locale settings for other categories by calling setlocale(LC_ALL, '').
That is: aside from making a note of the system's default setting for the preferred character encoding in text files (nowadays, this will likely be UTF-8), by default, the locale module will interpret data the same way that Python itself does (via a locale named C, after the C programming language). locale.atof will do the same thing as float passed a string, and similarly locale.atoi will mimic int.
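One way to see why the call fails is to inspect the separators that the active locale defines. The sketch below pins the numeric category to the C locale so that the result does not depend on the machine; atof_or_none is a hypothetical helper introduced only for this illustration.

```python
import locale

# Assumption for this sketch: force LC_NUMERIC to the C locale so the
# output is the same regardless of how the machine is configured.
locale.setlocale(locale.LC_NUMERIC, "C")

conv = locale.localeconv()
print(repr(conv["decimal_point"]))  # '.'
print(repr(conv["thousands_sep"]))  # '' (the C locale has no grouping character)


def atof_or_none(text):
    """Hypothetical helper: locale.atof, but returning None on failure."""
    try:
        return locale.atof(text)
    except ValueError:
        return None


print(atof_or_none("123456.789"))  # 123456.789
print(atof_or_none("123,456"))     # None, the comma is not recognised
```

Because the C locale has no thousands separator, nothing is stripped from the string before it reaches float, which then rejects the comma.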
Using a locale from the environment
Making the setlocale call mentioned in the above quote from the documentation will pull in locale settings from the user's environment. Thus:
>>> import locale
>>> # passing an empty string asks for a locale configured on the
>>> # local machine; the return value indicates what that locale is.
>>> locale.setlocale(locale.LC_ALL, '')
'en_CA.UTF-8'
>>> locale.atof('123,456.789')
123456.789
>>> locale.atof('123456.789')
123456.789
The locale will not care if the thousands separators are in the right place - it just recognizes and filters them:
>>> locale.atof('12,34,56.789')
123456.789
In 3.6 and up, it will also not care about underscores, which are separately handled by the built-in float and int conversion:
>>> locale.atof('12_34_56.789')
123456.789
On the other side, the string format method, and f-strings, are locale-aware if the n format is used:
>>> f'{123456.789:.9n}' # `.9` specifies 9 significant figures
'123,456.789'
Without the previous setlocale call, the output would not have the comma.
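For comparison, here is a small sketch of the same value formatted under the C locale, next to the `,` format option, which groups with commas regardless of the locale. The `,` option is not part of the answer above; it is shown only as a locale-independent alternative when comma grouping is all that is needed.

```python
import locale

# Pin the C locale so the result is repeatable on any machine.
locale.setlocale(locale.LC_NUMERIC, "C")

value = 123456.789
plain = f"{value:.9n}"     # locale-aware; the C locale defines no grouping
grouped = f"{value:,.3f}"  # ',' option: comma grouping, independent of the locale

print(plain)    # 123456.789
print(grouped)  # 123,456.789
```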
Setting a locale explicitly
It is also possible to make temporary locale settings, using the appropriate locale name, and apply those settings only to a specific aspect of localization. To get localized parsing and formatting only for numbers, for example, use LC_NUMERIC rather than LC_ALL in the setlocale call.
Here are some examples:
>>> # in Denmark, periods are thousands separators and commas are decimal points
>>> locale.setlocale(locale.LC_NUMERIC, 'en_DK.UTF-8')
'en_DK.UTF-8'
>>> locale.atof('123,456.789')
123.456789
>>> # Formatting a number according to the Indian lakh/crore system:
>>> locale.setlocale(locale.LC_NUMERIC, 'en_IN.UTF-8')
'en_IN.UTF-8'
>>> f'{123456.789:9.9n}'
'1,23,456.789'
The necessary locale strings may depend on your operating system, and may require additional work to enable.
To get back to how Python behaves by default, use the C locale described previously, thus: locale.setlocale(locale.LC_ALL, 'C').
Caveats
Setting the locale affects program behaviour globally, and is not thread safe. If done at all, it should normally be done just once at the beginning of the program. Again quoting from documentation:
It is generally a bad idea to call setlocale() in some library routine, since as a side effect it affects the entire program. Saving and restoring it is almost as bad: it is expensive and affects other threads that happen to run before the settings have been restored.
If, when coding a module for general use, you need a locale independent version of an operation that is affected by the locale (such as certain formats used with time.strftime()), you will have to find a way to do it without using the standard library routine. Even better is convincing yourself that using locale settings is okay. Only as a last resort should you document that your module is not compatible with non-C locale settings.
When the Python code is embedded within a C program, setting the locale can even affect the C code:
Extension modules should never call setlocale(), except to find out what the current locale is. But since the return value can only be used portably to restore it, that is not very useful (except perhaps to find out whether or not the locale is C).
(N.B: when setlocale is called with a single category argument, or with None - not an empty string - for the locale name, it does not change anything, and simply returns the name of the existing locale.)
So, this is not meant as a tool, in production code, to try out experimentally parsing or formatting data that was meant for different locales. The above examples are only examples to illustrate how the system works. For this purpose, seek a third-party internationalization library.
However, if the data is all formatted according to a specific locale, specifying that locale ahead of time will make it possible to use locale.atoi and locale.atof as drop-in replacements for int and float calls on string input.

Just remove the , with replace():
float("123,456.908".replace(',',''))

If you don't know the locale and you want to parse any kind of number, use this parseNumber(text) function (My repo). It is not perfect, but it takes most cases into account:
>>> parseNumber("a 125,00 €")
125
>>> parseNumber("100.000,000")
100000
>>> parseNumber("100 000,000")
100000
>>> parseNumber("100,000,000")
100000000
>>> parseNumber("100 000 000")
100000000
>>> parseNumber("100.001 001")
100.001
>>> parseNumber("$.3")
0.3
>>> parseNumber(".003")
0.003
>>> parseNumber(".003 55")
0.003
>>> parseNumber("3 005")
3005
>>> parseNumber("1.190,00 €")
1190
>>> parseNumber("1190,00 €")
1190
>>> parseNumber("1,190.00 €")
1190
>>> parseNumber("$1190.00")
1190
>>> parseNumber("$1 190.99")
1190.99
>>> parseNumber("1 000 000.3")
1000000.3
>>> parseNumber("1 0002,1.2")
10002.1
>>> parseNumber("")
>>> parseNumber(None)
>>> parseNumber(1)
1
>>> parseNumber(1.1)
1.1
>>> parseNumber("rrr1,.2o")
1
>>> parseNumber("rrr ,.o")
>>> parseNumber("rrr1rrr")
1

If the input uses a comma as a decimal point and period as a thousands separator, use .replace twice to convert the data to the format used by the built-in float. Thus:
s = s.replace('.','').replace(',','.')
number = float(s)

What about this?
my_string = "123,456.908"
commas_removed = my_string.replace(',', '') # remove comma separation
my_float = float(commas_removed) # turn from string to float.
In short:
my_float = float(my_string.replace(',', ''))

Better solution for different currency formats:
def text_currency_to_float(text):
    t = text
    dot_pos = t.rfind('.')
    comma_pos = t.rfind(',')
    if comma_pos > dot_pos:
        t = t.replace(".", "")
        t = t.replace(",", ".")
    else:
        t = t.replace(",", "")
    return float(t)
This attempts to detect whether commas are thousands separators and periods are decimal points, or the other way around, by checking where each appears in the string, if at all. (The premise is that thousands separators should not be used in the fractional part of the number.)
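A few sample calls may help show where the heuristic works and where it is ambiguous. The function is repeated here so the snippet runs on its own; the inputs are made up for illustration.

```python
def text_currency_to_float(text):
    """Guess the decimal separator from whichever of '.' and ',' appears last."""
    dot_pos = text.rfind('.')
    comma_pos = text.rfind(',')
    if comma_pos > dot_pos:
        # comma comes last: treat it as the decimal point
        text = text.replace(".", "").replace(",", ".")
    else:
        # period comes last (or neither is present): commas are grouping
        text = text.replace(",", "")
    return float(text)


print(text_currency_to_float("1.234,56"))  # 1234.56
print(text_currency_to_float("1,234.56"))  # 1234.56
print(text_currency_to_float("1234"))      # 1234.0
# Ambiguous case: a lone comma is always read as a decimal point,
# so "1,234" becomes 1.234 rather than 1234.
print(text_currency_to_float("1,234"))     # 1.234
```

The last call is the case to watch: when only one kind of separator is present, the string alone does not say which convention was meant.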

s = "123,456.908"
print(float(s.replace(',', '')))

Here's a simple way I wrote up for you. :)
>>> number = '123,456,789.908'.replace(',', '') # '123456789.908'
>>> float(number)
123456789.908

You may use babel:
from babel.numbers import parse_decimal
f = float(parse_decimal("123,456.908", locale="en_US"))

Converting a Python Float to a String without losing precision

I am maintaining a Python script that uses xlrd to retrieve values from Excel spreadsheets, and then do various things with them. Some of the cells in the spreadsheet are high-precision numbers, and they must remain as such. When retrieving the values of one of these cells, xlrd gives me a float such as 0.38288746115497402.
However, I need to get this value into a string later on in the code. Doing either str(value) or unicode(value) will return something like "0.382887461155". The requirements say that this is not acceptable; the precision needs to be preserved.
I've tried a couple things so far to no success. The first was using a string formatting thingy:
data = "%.40s" % (value)
data2 = "%.40r" % (value)
But both produce the same rounded number, "0.382887461155".
Upon searching around for people with similar problems on SO and elsewhere on the internet, a common suggestion was to use the Decimal class. But I can't change the way the data is given to me (unless somebody knows of a secret way to make xlrd return Decimals). And when I try to do this:
data = Decimal(value)
I get a TypeError: Cannot convert float to Decimal. First convert the float to a string. But obviously I can't convert it to a string, or else I will lose the precision.
So yeah, I'm open to any suggestions -- even really gross/hacky ones if necessary. I'm not terribly experienced with Python (more of a Java/C# guy myself) so feel free to correct me if I've got some kind of fundamental misunderstanding here.
EDIT: Just thought I would add that I am using Python 2.6.4. I don't think there are any formal requirements stopping me from changing versions; it just has to not mess up any of the other code.

I'm the author of xlrd. There is too much confusion in other answers and comments to rebut in comments, so I'm doing it in an answer.
@katriealex: """precision being lost in the guts of xlrd""" --- entirely unfounded and untrue. xlrd reproduces exactly the 64-bit float that's stored in the XLS file.
@katriealex: """It may be possible to modify your local xlrd installation to change the float cast""" --- I don't know why you would want to do this; you don't lose any precision by floating a 16-bit integer!!! In any case that code is used only when reading Excel 2.X files (which had an INTEGER-type cell record). The OP gives no indication that he is reading such ancient files.
@jloubert: You must be mistaken. "%.40r" % a_float is just a baroque way of getting the same answer as repr(a_float).
@EVERYBODY: You don't need to convert a float to a decimal to preserve the precision. The whole point of the repr() function is that the following is guaranteed:
float(repr(a_float)) == a_float
Python 2.X (X <= 6) repr gives a constant 17 decimal digits of precision, as that is guaranteed to reproduce the original value. Later Pythons (2.7, 3.1) give the minimal number of decimal digits that will reproduce the original value.
Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
>>> f = 0.38288746115497402
>>> repr(f)
'0.38288746115497402'
>>> float(repr(f)) == f
True
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
>>> f = 0.38288746115497402
>>> repr(f)
'0.382887461154974'
>>> float(repr(f)) == f
True
So the bottom line is that if you want a string that preserves all the precision of a float object, use preserved = repr(the_float_object) ... recover the value later by float(preserved). It's that simple. No need for the decimal module.
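The round-trip claim can be checked directly. The sketch below is written for Python 3 and uses the value from the question; "%.12g" stands in for what str() produced on Python 2, which is where the rounded "0.382887461155" came from.

```python
f = 0.38288746115497402

short = "%.12g" % f      # 12 significant digits, as Python 2's str() gave
exact = repr(f)          # enough digits to identify this float uniquely
seventeen = "%.17g" % f  # 17 significant digits always suffice for a 64-bit float

print(short)                  # 0.382887461155
print(float(short) == f)      # False, information was lost
print(float(exact) == f)      # True
print(float(seventeen) == f)  # True
```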

You can use repr() to convert to a string without losing precision, then convert to a Decimal:
>>> from decimal import Decimal
>>> f = 0.38288746115497402
>>> d = Decimal(repr(f))
>>> print d
0.38288746115497402

EDIT: I am wrong. I shall leave this answer here so the rest of the thread makes sense, but it's not true. Please see John Machin's answer above. Thanks guys =).
If the above answers work that's great -- it will save you a lot of nasty hacking. However, at least on my system, they won't. You can check this with e.g.
import sys
print( "%.30f" % sys.float_info.epsilon )
That number is the smallest float that your system can distinguish from zero. Anything smaller than that may be randomly added or subtracted from any float when you perform an operation. This means that, at least on my Python setup, the precision is lost inside the guts of xlrd, and there seems to be nothing you can do without modifying it. Which is odd; I'd have expected this case to have occurred before, but apparently not!
It may be possible to modify your local xlrd installation to change the float cast. Open up site-packages\xlrd\sheet.py and go down to line 1099:
...
elif rc == XL_INTEGER:
    rowx, colx, cell_attr, d = local_unpack('<HH3sH', data)
    self_put_number_cell(rowx, colx, float(d), self.fixed_BIFF2_xfindex(cell_attr, rowx, colx))
...
Notice the float cast -- you could try changing that to a decimal.Decimal and see what happens.

EDIT: Cleared my previous answer b/c it didn't work properly.
I'm on Python 2.6.5 and this works for me:
a = 0.38288746115497402
print repr(a)
type(repr(a)) #Says it's a string
Note: This just converts to a string. You'll need to convert to Decimal yourself later if needed.

As has already been said, a float isn't precise at all - so preserving precision can be somewhat misleading.
Here's a way to get every last bit of information out of a float object:
>>> from decimal import Decimal
>>> str(Decimal.from_float(0.1))
'0.1000000000000000055511151231257827021181583404541015625'
Another way would be like so.
>>> 0.1.hex()
'0x1.999999999999ap-4'
Both strings represent the exact contents of the float. Almost anything else interprets the float as Python thinks it was probably intended (which most of the time is correct).
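As a small sketch of why both forms count as exact, each one can be converted back to the identical float, while the exact decimal expansion is a different number from the literal 0.1. The variable names here are only illustrative.

```python
from decimal import Decimal

x = 0.1

exact_decimal = Decimal.from_float(x)   # exact binary value, written in decimal
hex_text = x.hex()                      # exact binary value, written in hex

# Both forms convert back to the identical float.
assert float(exact_decimal) == x
assert float.fromhex(hex_text) == x

# The exact value is not the same number as the decimal literal 0.1.
print(exact_decimal == Decimal("0.1"))
```

The hex form is the more compact of the two when the string has to be stored or sent somewhere and read back later with float.fromhex().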

How do I use extended characters in Python's curses library?

I've been reading tutorials about Curses programming in Python, and many refer to an ability to use extended characters, such as line-drawing symbols. They're characters > 255, and the curses library knows how to display them in the current terminal font.
Some of the tutorials say you use it like this:
c = ACS_ULCORNER
...and some say you use it like this:
c = curses.ACS_ULCORNER
(That's supposed to be the upper-left corner of a box, like an L flipped vertically)
Anyway, regardless of which method I use, the name is not defined and the program thus fails. I tried "import curses" and "from curses import *", and neither works.
Curses' window() function makes use of these characters, so I even tried poking around on my box for the source to see how it does it, but I can't find it anywhere.

You have to set your locale (LC_ALL here), then encode your output as UTF-8, as follows:
import curses
import locale
locale.setlocale(locale.LC_ALL, '') # set your locale
scr = curses.initscr()
scr.clear()
scr.addstr(0, 0, u'\u3042'.encode('utf-8'))
scr.refresh()
# here implement simple code to wait for user input to quit
scr.endwin()
output:
あ

From curses/__init__.py:
Some constants, most notably the ACS_* ones, are only added to the C _curses module's dictionary after initscr() is called. (Some versions of SGI's curses don't define values for those constants until initscr() has been called.) This wrapper function calls the underlying C initscr(), and then copies the constants from the _curses module to the curses package's dictionary. Don't do 'from curses import *' if you'll be needing the ACS_* constants.
In other words:
>>> import curses
>>> curses.ACS_ULCORNER
exception
>>> curses.initscr()
>>> curses.ACS_ULCORNER
4194412
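In a script, one way to make sure the ACS_* names are only touched after initscr() is to do the drawing inside a function passed to curses.wrapper(). This is a minimal sketch, assuming a real terminal is attached; draw_corner is a name I made up for the example.

```python
import curses
import sys

def draw_corner(stdscr):
    """Runs after initscr(), so the ACS_* names are available here."""
    stdscr.clear()
    stdscr.addch(0, 0, curses.ACS_ULCORNER)   # upper-left box corner
    stdscr.addch(0, 1, curses.ACS_HLINE)      # horizontal line next to it
    stdscr.refresh()
    curses.napms(1500)                        # keep it on screen briefly

# curses.wrapper() calls initscr() before draw_corner() and restores the
# terminal afterwards, even if draw_corner() raises.
if __name__ == "__main__" and sys.stdout.isatty():
    curses.wrapper(draw_corner)
```

Note that the constants are looked up as curses.ACS_ULCORNER at call time, which is why a plain "import curses" works here and "from curses import *" does not.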

I believe the below is appropriately related, to be posted under this question. Here I'll be using utfinfo.pl (see also on Super User).
First of all, for the standard ASCII character set, the Unicode code point and the byte encoding are the same:
$ echo 'a' | perl utfinfo.pl
Char: 'a' u: 97 [0x0061] b: 97 [0x61] n: LATIN SMALL LETTER A [Basic Latin]
So we can do in Python's curses:
window.addch('a')
window.border('a')
... and it works as intended
However, if a character is above basic ASCII, then there are differences, which addch docs don't necessarily make explicit. First, I can do:
window.addch(curses.ACS_PI)
window.border(curses.ACS_PI)
... in which case, in my gnome-terminal, the Unicode character 'π' is rendered. However, if you inspect ACS_PI, you'll see it's an integer number, with a value of 4194427 (0x40007b); so the following will also render the same character (or rather, glyph?) 'π':
window.addch(0x40007b)
window.border(0x40007b)
To see what's going on, I grepped through the ncurses source, and found the following:
#define ACS_PI NCURSES_ACS('{') /* Pi */
#define NCURSES_ACS(c) (acs_map[NCURSES_CAST(unsigned char,c)])
#define NCURSES_CAST(type,value) static_cast<type>(value)
#lib_acs.c: NCURSES_EXPORT_VAR(chtype *) _nc_acs_map(void): MyBuffer = typeCalloc(chtype, ACS_LEN);
#define typeCalloc(type,elts) (type *)calloc((elts),sizeof(type))
#./widechar/lib_wacs.c: { '{', { '*', 0x03c0 }}, /* greek pi */
Note here:
$ echo '{π' | perl utfinfo.pl
Got 2 uchars
Char: '{' u: 123 [0x007B] b: 123 [0x7B] n: LEFT CURLY BRACKET [Basic Latin]
Char: 'π' u: 960 [0x03C0] b: 207,128 [0xCF,0x80] n: GREEK SMALL LETTER PI [Greek and Coptic]
... neither of which relates to the value of 4194427 (0x40007b) for ACS_PI.
Thus, when addch and/or border see a character above ASCII (basically an unsigned int, as opposed to an unsigned char), they (at least in this instance) use that number neither as a Unicode code point nor as a UTF-8 encoded byte sequence. Instead, they use it as a look-up index into acs_map (which ultimately, however, would return the Unicode code point, even if it emulates VT-100). That is why the following specification:
window.addch('π')
window.border('π')
will fail in Python 2.7 with argument 1 or 3 must be a ch or an int; and in Python 3.2 would render simply a space instead of a character. When we specify 'π', we've actually specified the UTF-8 encoding [0xCF,0x80] - but even if we specify the Unicode code point:
window.addch(0x03C0)
window.border(0x03C0)
... it simply renders nothing (space) in both Python 2.7 and 3.2.
That being said - the function addstr does accept UTF-8 encoded strings, and works fine:
window.addstr('π')
... but for borders - since border() apparently handles characters in the same way addch() does - we're apparently out of luck, for anything not explicitly specified as an ACS constant (and there's not that many of them, either).
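One observation that may help when reading these numbers: the ACS values quoted in this thread can be split into a low byte and some high bits using plain arithmetic, with no curses call involved. The sketch below does that for the two values reported here; split_acs is a hypothetical helper name. My assumption, which you should check against curses.A_ALTCHARSET on your own build, is that the high bits are the alternate-character-set attribute and the low byte is the key character used in the NCURSES_ACS(...) define.

```python
ACS_PI_VALUE = 0x40007b        # value reported above for curses.ACS_PI
ACS_ULCORNER_VALUE = 4194412   # value reported earlier for curses.ACS_ULCORNER

def split_acs(value):
    """Split a value into (high bits, low byte as a character)."""
    return value & ~0xFF, chr(value & 0xFF)

for name, value in [("ACS_PI", ACS_PI_VALUE), ("ACS_ULCORNER", ACS_ULCORNER_VALUE)]:
    high_bits, low_char = split_acs(value)
    print(name, hex(high_bits), repr(low_char))
```

For ACS_PI the low byte comes out as '{', the same character that appears in the #define quoted above, so on this reading the integer is that key character with an attribute bit set rather than a code point.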
Hope this helps someone,
Cheers!
