Locale-independent string to float conversion in Python


I need to convert strings to floats, but the input can come in different formats, such as '1234,5', '1234.5', '1 234,5' or '1,234.5'. I cannot set the locale's decimal point or thousands separator, because I may not know in advance what data I will get.
Is there a way, method or library to parse such locale-specific values and convert them to float without knowing which locale is used?
P.S. Is there a solution to the same problem for dates?
TIA.

You can make some assumptions about which character is the thousands separator and which is the decimal point. However, there is one case where you cannot know for sure what to do:
Look for the last character that is . or ,. If that character occurs more than once, the number has no decimal point and the character is the thousands separator.
If the string contains exactly one of each, the last one is the decimal point.
If the string contains only one point or comma, you are pretty much out of luck: 123.456 or 123,456 might be the number 123456 or 123.456. However, with a number like 123.45, where the separator is not followed by exactly three digits, you can assume that it is a decimal point.
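The rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: parse_number is a hypothetical helper name, whitespace is treated as a possible thousands separator, and the undecidable case (a single separator followed by exactly three digits) raises ValueError rather than guessing.

```python
import re

def parse_number(text):
    """Guess the float value of a number string of unknown locale.

    Hypothetical helper; it encodes only the heuristics described above.
    """
    s = re.sub(r"\s", "", text)  # spaces may act as thousands separators
    dots, commas = s.count("."), s.count(",")

    if dots and commas:
        # Both characters present: the one that appears last is the decimal point.
        decimal_sep = "." if s.rfind(".") > s.rfind(",") else ","
        thousands_sep = "," if decimal_sep == "." else "."
        s = s.replace(thousands_sep, "").replace(decimal_sep, ".")
    elif dots > 1 or commas > 1:
        # A separator that repeats can only be a thousands separator.
        s = s.replace(".", "").replace(",", "")
    elif dots or commas:
        sep = "." if dots else ","
        if len(s) - s.index(sep) - 1 == 3:
            # Exactly three digits follow a single separator: undecidable.
            raise ValueError(f"ambiguous number: {text!r}")
        s = s.replace(sep, ".")
    return float(s)

for sample in ["1234,5", "1234.5", "1 234,5", "1,234.5", "1.234.567"]:
    print(sample, "->", parse_number(sample))
```

Raising on the ambiguous case is a design choice for this sketch; a caller who knows something about the data source could catch the error and apply a default interpretation instead.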

Related

Convert float to Decimal with fixed digits after decimal

I want to convert some floats to Decimal, retaining 5 digits after the decimal point regardless of how many digits there are before it. Is using string formatting the most efficient way to do this?
I see in the docs:
The significance of a new Decimal is determined solely by the number of digits input. Context precision and rounding only come into play during arithmetic operations.
So that means I need to add 0 to force it to use the specified prec, but prec is the total number of digits, not the number of digits after the decimal point, so it doesn't actually help.
The best thing I can come up with is
from decimal import Decimal
a = [1.132434, 22.2334, 99.33999434]
[Decimal("%.5f" % round(x, 5)) for x in a]
to get [Decimal('1.13243'), Decimal('22.23340'), Decimal('99.33999')]
Is there a better way? It feels like turning floats into strings just to convert them back to a number format isn't very good although I can't articulate why.

Do all the formatting on the way out from your code, inside the print and write statements. There is no reason I can think of to lose precision (and convert the numbers to some fixed format) while doing numeric calculations inside the code.
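If a Decimal with exactly five fractional digits is still needed at some boundary, Decimal.quantize is one alternative to the "%.5f" round trip. The sketch below is not a claim about efficiency; to_decimal_5 and FIVE_PLACES are hypothetical names, and going through repr(x) is an assumption made so that the Decimal starts from the float's shortest round-tripping decimal string.

```python
from decimal import Decimal, ROUND_HALF_EVEN

FIVE_PLACES = Decimal("0.00001")  # exponent template: five fractional digits

def to_decimal_5(x):
    """Hypothetical helper: float -> Decimal with exactly 5 fractional digits."""
    # repr(x) is the shortest string that round-trips the float.
    return Decimal(repr(x)).quantize(FIVE_PLACES, rounding=ROUND_HALF_EVEN)

values = [1.132434, 22.2334, 99.33999434]
result = [to_decimal_5(v) for v in values]
print(result)
```

This still passes through a string once, but the rounding rule is stated explicitly rather than being implied by the format specifier.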

How to convert all floats to two decimal places regardless of number of decimal places float has without repeating convert code?

Goal
I want to convert all floats to two decimal places regardless of number of decimal places float has without repeating convert code.
For example, I want to convert
50 to 50.00
50.5 to 50.50
without repeating the convert code again and again. What I mean is explained in the following section - research.
Not what this question is about
This question is NOT about:
Only limiting floats to two decimal places - if there are fewer than two decimal places then I want it to have two, with zeros in the unused places.
Flooring the decimal or ceiling it.
Rounding the decimal off.
This question is not a duplicate of this question.
That question only answers the first part of my question - convert floats to two decimal places regardless of number of decimal places float has, not the second part - without repeating convert code.
Nor this question.
That is just how to add units before the decimal place. My question is: how to convert all floats to two decimal places regardless of number of decimal places float has without repeating convert code.
Research
I found two ways I can achieve the convert. One is using the decimal module:
from decimal import Decimal
TWOPLACES = Decimal(10) ** -2
print(Decimal('9.9').quantize(TWOPLACES))
Another, without using any other modules:
print(f"{9.9:.2f}")
However, that does not fully answer my question. Notice that the conversion code has to be repeated every time? I keep having to write it again and again. Sadly, my whole program is almost complete, and it would be quite a waste of time to add this code here and there just so the format is correct. Is there any way to convert all floats to two decimal places regardless of number of decimal places float has without repeating convert code?
Clarification
What I mean by convert is, something like what Dmytro Chasovskyi said, that I want all places with floats in my program without extra changes to start to operate like decimals. For example, if I had the operation 1.2345 + 2.7 + 3 + 56.1223183 it should be 1.23 + 2.70 + 3.00 + 56.12.
Also, float is a number, not a function.

The bad news is: there is no "float" with "two decimal places".
Floating point numbers are represented internally with a fixed number of digits in base 2 (see https://floating-point-gui.de/basic/).
And these are both efficient and accurate enough for almost all calculations we perform with any modern program.
What we normally want is for the human-readable text representation of a number, in all outputs of a program, to show only two decimal digits. This is controlled wherever your program writes the value to a text file, prints it to the screen, or renders it into an HTML template (which is, again, "writing it to a text file").
It so happens that the same syntax that converts a number to text embedded in another string also lets you control the exact output of the number. You gave print(f"{9.9:.2f}") as an example. The only thing that looks impractical there is that you hardcoded the number along with its conversion. Typically, the number will be in a variable.
Then all you have to do, wherever you output the number, is write:
print(f"The value is: {myvar:.02f}")
instead of
print(f"The value is: {myvar}")
Or in whatever function you are calling, instead of print, that needs the rendered version of the number. Notice that the use of the word "rendered" here is deliberate: while your program is running, the number is stored in memory in an efficient form, directly usable by the CPU, that is not human readable. At any point where you want to "see" the number, you have to convert it into text. It is just that some calls do this implicitly, like print(myvar). So just convert it explicitly in those places: print(f"{myvar:.02f}").
Really having 2 decimal places in memory
If you use decimal.Decimal, then yes, there are ways to keep the internal representation of the number at 2 decimal digits, but then, instead of just converting the number on output, you must convert it into a "2 decimal place" value on all inputs as well.
That means that whenever a number is ingested into your program, be it typed by the user, read from a binary file or database, or received over the wire from a sensor, you have to apply a transform similar to the one used for output, as detailed above. More precisely: you convert your float to a properly formatted string, and then convert that to a decimal.Decimal.
This will prevent your program from accumulating errors due to base conversion, but you will still need to force the format to 2 decimal places on every output, just like above.
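A minimal sketch of that input-side conversion, using the numbers from the question. The helper name to_two_places is hypothetical, and ROUND_HALF_UP is an assumed rounding rule; the answer does not specify one.

```python
from decimal import Decimal, ROUND_HALF_UP

TWOPLACES = Decimal("0.01")

def to_two_places(value):
    """Hypothetical input helper: normalise any incoming number to 2 places."""
    # str(value) gives the properly formatted string that Decimal then parses.
    return Decimal(str(value)).quantize(TWOPLACES, rounding=ROUND_HALF_UP)

inputs = [1.2345, 2.7, 3, 56.1223183]
converted = [to_two_places(v) for v in inputs]   # 1.23, 2.70, 3.00, 56.12
total = sum(converted, Decimal("0.00"))

# Output still goes through an explicit format, as described above.
print(f"The value is: {total:.2f}")
```

Calling the helper at every point where data enters the program keeps the conversion in one place, which is about as close as this approach gets to "not repeating the convert code".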

Use this function.
def cvt_decimal(value):
    # Convert to float first so ints and numeric strings are accepted too.
    number = float(value)
    return "%.2f" % number
print(cvt_decimal(50))
print(cvt_decimal(50.5))
Output is :
50.00
50.50

You can change the precision of the decimal context; it is then applied to the result of every operation between two Decimal values:
import decimal
from decimal import Decimal
decimal.getcontext().prec = 2
a = Decimal('0.12345')
b = Decimal('0.12345')
print(a + b)
Decimal calculations are precise, but they take more time than float calculations; keep that in mind.
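One caveat worth illustrating: prec counts significant digits, not digits after the decimal point, so it only looks like "two decimal places" for values below 1. The sketch below contrasts it with quantize; the sample values are made up for illustration, and localcontext is used so the global context is left untouched.

```python
from decimal import Decimal, localcontext

a = Decimal("0.12345")
b = Decimal("123.45")

with localcontext() as ctx:
    ctx.prec = 2          # two significant digits for arithmetic results
    small = a + a         # 0.24690 rounds to 0.25
    large = b + b         # 246.90 rounds to 2.5E+2

# quantize fixes the number of fractional digits instead.
fixed = (b + b).quantize(Decimal("0.01"))

print(small, large, fixed)
```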

pandas `read_sql_query` - read `double` datatype in MySQL database to `Decimal`

I'm trying to use read_sql_query() to run a query against a MySQL database. One of the fields in the database has type double(24, 8). I want to use the dtype= parameter to have full control over the datatypes and read that field as a decimal, but pandas doesn't seem to recognize a decimal type, so I had to read it as Float64.
In the database, the values for this field look like this:
Value
100.96000000
77.17000000
1.00000000
0.12340000
Then I'm trying to read it from Python code:
import pandas as pd
from decimal import Decimal
dtypes = {
    'id': 'Int64',
    'date': 'datetime64',
    'value': 'Float64'
}
# sql_query and mysql_engine are defined elsewhere
df = pd.read_sql_query(sql_query, mysql_engine, dtype=dtypes)
but after reading the data from the code above, it looks like this:
Value
100.96
77.17
1.0
0.1234
How can I read this column to decimal and keep all the digits? Thanks.

What "the data looks like in the database" is tricky. This is because the act of printing it out feeds the bits through a formatting algorithm. In this case it removes trailing zeros. To see what is "in the database", one needs to get a hex dump of the file and then decipher it; this is non-trivial.
I believe that DECIMAL numbers hold all the digits specified, packed 2 digits per byte. No, I don't know how they are packed (0..99 versus 2 hex digits; what to do if the number of digits is odd; where is the sign?)
I believe that FLOAT and DOUBLE conform exactly to the IEEE 754 encoding format. No, I don't know how the bytes are stored (big-endian vs little-endian). I suspect pandas' Float64 is IEEE DOUBLE.
For DECIMAL(10,6), I would expect "1.234" to be stored as +, 0001, and 234000, but never displayed with leading zeros and optionally displayed with trailing zeros, depending on the output formatting package.
For DOUBLE, I would expect to find hex 3ff3be76c8b43958 after adjusting for endianness, and I would not be surprised to see the output be 1.23399999999999999e+0. (Yes, I actually got that, given a suitable format in PHP, which I am using.) I would hope to see 1.234, since that is presumably the intent of the number.
Do not use DOUBLE(m,n). The (m,n) leads to extra rounding and it is deprecated syntax. Float and Double are not intended for exact number of decimal places; use DECIMAL for such.
For FLOAT: 1.234 becomes hex 3f9df3b6 and displays something like 1.2339999675751 assuming the output method works in DOUBLE and is asked to show lots of decimal places.
Bottom line: The output method you are using is causing the problem.
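The bit patterns quoted above can be checked from Python with the standard struct module, and the same snippet shows that the "missing" zeros are purely a formatting matter. This is a sketch on plain Python floats, not on the pandas column from the question; applying the same format to a DataFrame column is left as an assumption.

```python
import struct

value = 1.234

# Big-endian IEEE 754 encodings of the same number.
double_hex = struct.pack(">d", value).hex()   # 64-bit DOUBLE
float_hex = struct.pack(">f", value).hex()    # 32-bit FLOAT

# Squeeze the value through 32 bits and widen it again.
as_float32 = struct.unpack(">f", struct.pack(">f", value))[0]

print(double_hex)
print(float_hex)
print(f"{as_float32:.13f}")

# Trailing zeros come back as soon as the output format asks for them.
plain = str(100.96)
padded = f"{100.96:.8f}"
print(plain, padded)
```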

Drop extraneous decimal places when using Python's decimal.Decimal

When using Python's decimal.Decimal class we now have a need to drop extraneous decimal places. So for instance '0.00' becomes '0' and '0.50' becomes '0.5'. Is there a cleaner way of doing this than converting to a string and manually dropping trailing zeros and full stops?
To clarify we need to be able to dynamically round the result without knowing the number of decimal places in advance and potentially output an integer (or a string representation of one) if no decimal places are needed... is this already built-in to Python?

Use Decimal.normalize:
>>> Decimal('0.00').normalize()
Decimal('0')
>>> Decimal('0.50').normalize()
Decimal('0.5')
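One thing to watch for: normalize also strips trailing zeros to the left of the decimal point, so Decimal('100').normalize() is Decimal('1E+2') and its str() uses exponent notation. A small sketch that formats the normalized value with the 'f' presentation type to avoid that; strip_trailing_zeros is a hypothetical helper name.

```python
from decimal import Decimal

def strip_trailing_zeros(d):
    """Hypothetical helper: shortest fixed-point string for a Decimal."""
    # 'f' forces fixed-point notation, so 1E+2 is rendered as 100.
    return format(d.normalize(), "f")

for text in ["0.00", "0.50", "100", "12.3400"]:
    print(text, "->", strip_trailing_zeros(Decimal(text)))
```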

In Python logging, is there a formatter to truncate the string?

Python logging formats strings with a syntax I don't see elsewhere in Python, like
'format': '%(name)s'
Is there any way to truncate an error message using the formatter, or do I need to override the LogRecord class for that?
This truncates parts of the message (though I can't find the documentation for this feature in the normal places):
'format': '%(name).40s %(message).40s'
I'd rather truncate the entire formatted message, if possible (for an 80 column console, say).

This is just the old style of string formatting.
I think you can just use ".L", where L is the maximum length you want to truncate the string to, e.g.:
'format': '%(name).5s'
would truncate the length to 5 characters.
It's a little hard to find, but they actually do mention it in the docs:
The precision is a decimal number indicating how many digits should be displayed after the decimal point for a floating point value formatted with 'f' and 'F', or before and after the decimal point for a floating point value formatted with 'g' or 'G'. For non-number types the field indicates the maximum field size - in other words, how many characters will be used from the field content
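The precision trick truncates individual fields only. For the other half of the question, truncating the entire formatted line, one possible approach is a small Formatter subclass; there is no need to override LogRecord. TruncatingFormatter and its max_length parameter are hypothetical names introduced for this sketch.

```python
import logging

class TruncatingFormatter(logging.Formatter):
    """Hypothetical formatter that cuts the whole formatted line to max_length."""

    def __init__(self, fmt=None, max_length=80, **kwargs):
        super().__init__(fmt, **kwargs)
        self.max_length = max_length

    def format(self, record):
        # Let the base class build the full line, then slice it.
        return super().format(record)[: self.max_length]

record = logging.LogRecord(
    name="my.very.long.logger.name", level=logging.ERROR,
    pathname="example.py", lineno=1, msg="x" * 200, args=None, exc_info=None,
)

# Per-field truncation using the precision in the old-style format string.
per_field = logging.Formatter("%(name).5s %(message).40s").format(record)

# Whole-line truncation using the subclass.
whole_line = TruncatingFormatter("%(name)s %(message)s", max_length=80).format(record)

print(per_field)
print(whole_line)
```

In a dictConfig setup, a class like this would be referenced through the formatter's '()' factory key rather than the 'format' key alone.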
