Python rounding and inserting into array does not round [duplicate] - python

So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats which represent values to be fit into the ranges. All of these floats are < 1 but positive, so precision matters. One of my tests to determine if a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question then, is why is the float changing when it is accessed. It has decimal values available up to a certain precision when part of a tuple, and a different precision when accessed. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?

The float doesn't change. The built-in numeric types are all immutable. The cause of what you're observing is that:
print range[0] uses str on the float, which (up until very recent versions of Python) printed fewer digits of a float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this isn't true anymore in recent releases, which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit, the limited precision of floats. Try print repr(curValue), repr(range[0]) to see what Python decided was the closest possible representation of your float literal.
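A minimal sketch of the effect described above, written for Python 3. It assumes (this is a guess about the asker's data, not something the question confirms) that curValue holds exactly the same float as range[0]. The names lower_bound and cur_value are illustrative, and the '%.12g' format stands in for the 12 significant digits that older versions of str() displayed.

```python
# Assumption: both names hold the same underlying float.
lower_bound = 0.0014500000067055225
cur_value = 0.0014500000067055225

short_form = '%.12g' % cur_value   # 12 significant digits, like the old str()
full_form = repr(cur_value)        # enough digits to round-trip the float

print(short_form)
print(full_form)

# The strict comparison is False because the two floats are equal,
# no matter how few digits were printed.
print(cur_value > lower_bound)
```

If the two values really are equal, a test written with >= instead of > would accept the lower bound of the range.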

Floats on modern PCs aren't that precise. Even if you enter pi as a constant to 100 decimals, only the first 15 to 17 significant digits survive. The same is happening to you. A Python float is normally a 64-bit IEEE 754 double with 53 bits of mantissa (a 32-bit float has only 24), which limits your precision, and in unexpected ways because it's in base 2.
Please note, 0.00145000000671 isn't the exact value as stored by Python. Python only displays a few decimals of the complete stored float if you use print. If you want to see more closely how Python stores the float, use repr.
If you want better precision use the decimal module.

It isn't changing per se. Python is doing its best to store the data as a float, but that number is too precise for float, so Python modifies it before it is even accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use an arbitrary-precision or fixed-point module, like Simple Python Fixed Point or the decimal module.

Not sure it would work in this case, because I don't know if Python's limiting in the output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and...

Related

How do I check the default decimal precision when converting float to str?

When converting a float to a str, I can specify the number of decimal points I want to display
'%.6f' % 0.1
> '0.100000'
'%.6f' % .12345678901234567890
> '0.123457'
But when simply calling str on a float in python 2.7, it seems to default to 12 decimal points max
str(0.1)
>'0.1'
str(.12345678901234567890)
>'0.123456789012'
Where is this max # of decimal points defined/documented? Can I programmatically get this number?
The number of decimals displayed is going to vary greatly, and there won't be a way to predict how many will be displayed in pure Python. Some libraries like numpy allow you to set precision of output.
This is simply because of the limitations of float representation.
The relevant parts of the link talk about how Python chooses to display floats.
Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine
Python keeps the number of digits manageable by displaying a rounded value instead
Now, there is the possibility of overlap here:
Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction
The method for choosing which decimal values to display was changed in Python 3.1 (But the last sentence implies this might be an implementation detail).
For example, the numbers 0.1 and 0.10000000000000001 are both
approximated by 3602879701896397 / 2 ** 55. Since all of these decimal
values share the same approximation, any one of them could be
displayed while still preserving the invariant eval(repr(x)) == x
Historically, the Python prompt and built-in repr() function would
choose the one with 17 significant digits, 0.10000000000000001.
Starting with Python 3.1, Python (on most systems) is now able to
choose the shortest of these and simply display 0.1.
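The quoted passage can be checked directly. The following is a small sketch for Python 3.1 or later on a platform with IEEE 754 doubles (an assumption that holds for typical CPython builds); the variable names are illustrative.

```python
# Two different decimal literals that map to the same binary double.
a = 0.1
b = 0.10000000000000001

same_float = (a == b)

# The exact fraction stored for 0.1, in lowest terms.
numerator, denominator = a.as_integer_ratio()

# repr() picks the shortest string that still converts back to the same float.
round_trips = (float(repr(a)) == a)

print(same_float, numerator, denominator == 2 ** 55, repr(a), round_trips)
```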
I do not believe this exists in the python language spec. However, the cpython implementation does specify it. The float_repr() function, which turns a float into a string, eventually calls a helper function with the 'r' formatter, which eventually calls a utility function that hardcodes the format to what comes down to format(float, '.16g'). That code can be seen here. Note that this is for python3.6.
>>> import math
>>> str(math.pi*4)
12.5663706144
giving the maximum number of significant digits (both before and after the decimal point) as 16. It appears that in the python2.7 implementation, this value was hardcoded to .12g. As for why this happened (and why it is somewhat lacking in documentation), an explanation can be found here.
So if you are trying to find out how long a number will be when printed, simply get its length with .12g.
def len_when_displayed(n):
    return len(format(n, '.12g'))
Well, if you're looking for a pure python way of accomplishing this, you could always use something like,
len(str(.12345678901234567890).split('.')[1])
>>>> 12
I couldn't find it in the documentation and will add it here if I do, but this is a work around that can at least always return the length of precision if you want to know before hand.
As you said, it always seems to be 12 even when feeding bigger floating-points.
From what I was able to find, this number can be highly variable and in these cases, finding it empirically seems to be the most reliable way of doing it. So, what I would do is define a simple method like this,
def max_floating_point():
    counter = 0
    current_length = 0
    str_rep = '.1'
    while counter <= current_length:
        str_rep += '1'
        current_length = len(str(float(str_rep)).split('.')[1])
        counter += 1
    return current_length
This will return you the maximum length representation on your current system,
print max_floating_point()
>>>> 12
By looking at the output of random numbers converted, I have been unable to understand how the length of the str() is determined, e.g. under Python 3.6.6:
>>> str(.123456789123456789123456789)
'0.12345678912345678'
>>> str(.111111111111111111111111111)
'0.1111111111111111'
You may opt for this code that actually simulates your real situation:
import random
maxdec=max(map(lambda x:len(str(x)),filter(lambda x:x>.1,[random.random() for i in range(99)])))-2
Here we are testing the length of ~90 random numbers in the (.1,1) open interval after conversion (and deducting the 0. from the left, hence the -2).
Python 2.7.5 on a 64bit linux gives me 12, and Python 3.4.8 and 3.6.6 give me 17.
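As a possible alternative to measuring string lengths empirically, the limits of the float type itself can be read from sys.float_info. This is a sketch for Python 3, and it assumes the platform uses IEEE 754 doubles (where dig is 15 and mant_dig is 53). These values describe the float type, not the formatting rule used by str().

```python
import sys

# Decimal digits that are guaranteed to survive a str -> float -> str round trip.
digits_safe = sys.float_info.dig
# Number of bits in the significand (mantissa).
mantissa_bits = sys.float_info.mant_dig

x = 0.12345678901234567890

shortest = repr(x)             # shortest string that round-trips (Python 3.1+)
seventeen = format(x, '.17g')  # 17 significant digits always round-trip a double

print(digits_safe, mantissa_bits)
print(shortest, seventeen)
```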

number format is different in linux and windows version of pycharm

I have used one python code in PyCharm in Linux and the format of number was
-91.35357. When I used the same code in PyCharm in Windows format was
-91.35356999999999. The problem is that the value is part of the file name which I need to open (and the list of files to open is long).
Does anyone know a possible explanation and how to fix it?
Floats
Always remember that float numbers have a limited precision. If you think about it, there must be a limit to how exactly you represent a number if you limit storage to 32 or 64 bits (or any other number).
in Python
Python provides just one float type. Float numbers are usually implemented using 64 bits, but they might be 64 bit in one Python binary and 32 bit in another, so you can't really rely on that (however, see @Mark Dickinson's comment below).
Let's test this. But note that, because Python does not provide float32 and float64 alternatives, we will use a different library, numpy, to provide us with those types and operations:
>>> import numpy
>>> n = 1.23456789012345678901234567890
>>> n
1.2345678901234567
>>> numpy.float64(n)
1.2345678901234567
>>> numpy.float32(n)
1.2345679
Here we can see that Python, on my computer, handles the variable as a float64. This already truncates the number we introduced (because a float64 can only handle so much precision).
When we use a float32, precision is further reduced and, because of truncation, the closest number we can represent is slightly different.
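The same loss of precision can be reproduced without numpy. The sketch below is an illustration rather than part of the original answer: it uses the standard struct module to squeeze a Python float through a 4-byte IEEE 754 single and back.

```python
import struct

n = 1.23456789012345678901234567890

# Pack as a 4-byte single-precision value, then unpack into a Python float.
as_float32 = struct.unpack('<f', struct.pack('<f', n))[0]

print(repr(n))           # the 64-bit value
print(repr(as_float32))  # the nearest 32-bit value, widened back to 64 bits
print(abs(as_float32 - n))
```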
Conclusion
Float resolution is limited. Furthermore, some operations behave differently across different architectures.
Even if you are using a consistent float size, not all numbers can be represented, and operations will accumulate truncation errors.
Comparing a float to another float shall be done considering a possible error margin. Do not use float_a == float_b, instead use abs(float_a - float_b) < error_margin.
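A short sketch of the comparison advice, using the classic 0.1 + 0.2 case. The tolerance of 1e-9 is an arbitrary choice for illustration. math.isclose is in the standard library from Python 3.5 onward.

```python
import math

total = 0.1 + 0.2

exact_match = (total == 0.3)                          # fragile: exact equality
manual_match = abs(total - 0.3) < 1e-9                # absolute error margin
close_match = math.isclose(total, 0.3, rel_tol=1e-9)  # relative error margin

print(exact_match, manual_match, close_match)
```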
Relying on float representations is always a bad idea. Python sometimes uses scientific notation:
>>> a = 0.0000000001
>>> str(a)
'1e-10'
You can get consistent rounding approximation (ie, to use in file names), but remember that storage and representation are different things. This other thread may assist you: Limiting floats to two decimal points
In general, I'd advise against using float numbers in file names or as any other kind of identifier.
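If a float does have to appear in a file name, formatting it with an explicit number of decimals gives the same text on both systems. This is a minimal sketch; the helper name coordinate_label and the choice of 5 decimals are assumptions made for illustration.

```python
def coordinate_label(value, decimals=5):
    """Format a float with a fixed number of decimals for use in a name."""
    return '{0:.{1}f}'.format(value, decimals)

linux_value = -91.35357
windows_value = -91.35356999999999

# Both spellings collapse to the same fixed-width text.
print(coordinate_label(linux_value))
print(coordinate_label(windows_value))
```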
Latitude / Longitude
float32 numbers do not have enough precision to represent the 5th and 6th decimal places in latitude/longitude pairs (depending on whether the integer part has one, two or three digits).
If you want to learn what's really happening, check this page and test some of your numbers: https://www.h-schmidt.net/FloatConverter/IEEE754.html
Representing
Note that Python rounds float values when representing them:
>>> lat = 123.456789
>>> "{0:.6f}".format(lat)
'123.456789'
>>> "{0:.5f}".format(lat)
'123.45679'
And as stated above, latitude/longitude cannot be correctly represented by a float32 down to the 6th decimal, and furthermore, the truncated float values are rounded when presented by Python:
>>> lat = 123.456789
>>> lat
123.456789
>>> "{0:.5f}".format(numpy.float64(lat))
'123.45679'
>>> "{0:.5f}".format(numpy.float32(lat))
'123.45679'
>>> "{0:.6f}".format(numpy.float32(lat))
'123.456787'
As you can see, the rounded version of that float32 number fails to match the original number from the 5th decimal onward. The same goes for the float64 number rounded to the 5th decimal.
Your PyCharm on Linux is simply rounding off your long floating point number. Rounding it off to 6 or 7 decimals can resolve your issue, but DON'T USE THESE AS FILE NAMES.
Keeping your code constant in both cases, there can be many explanations:
1) 32-bit processors handle floats differently than 64-bit processors.
2) PyCharm for Linux and PyCharm for Windows behave differently for floating point numbers in ways we cannot determine exactly; maybe PyCharm for Windows is better optimised.
edit 1
Explanation for Point 1
On 32-bit processors everything is really done in 80-bit precision internally. The precision really just determines how many of those bits are stored in memory. This is part of the reason why different optimisation settings can change results slightly: they change the amount of rounding from 80-bit to 32- or 64-bit.
edit 2
You can use hashmapping for saving your data in files and then mapping them onto the co-ordinates.
Example:
# variable = {(long,lat):"<random_file_name>"}
cordinates_and_file ={(-92.45453534,-87.2123123):"AxdwaWAsdAwdz"}

Division in Python and types

I have a code which part of it looks like this,
A=(-1//2+(int(math.sqrt(1+8*t)))//2)
if type(A)==int:
    print(t)
    print(A)
The problem arises when I use "/" to get "A",
Since I am using "/", I always get an extra decimal point. For example 5/5=1.0 or 4/2=2.0 etc., which Python interprets as a float (I am using 3.6.5). Hence, whatever the result is, my code gets stuck at line 2.
When I use // the same thing happens. I get 5//2=2, which is actually a float but appears as an integer.
Since my code depends on the type of this division, how can I solve this problem?
A=(-1//2+(int(math.sqrt(1+8*t)))//2) is actually the formula for finding the roots of the quadratic equation (where a=1, b=1 and c=-2t in ax^2+bx+c). I need only the integer roots with positive values.
What you're trying to do won't work. For two integers x and y, x/y is always a float, even if it happens to be integral, and x//y is always an int, even if it has to truncate (throw away) a fractional part. So testing type(A) == int doesn't test for anything except which of the two you used.
There is a method float.is_integer that you can use, and that works fine for integers divided by 2, but it doesn't work once you're using sqrt. Explaining floating-point rounding issues is a big enough job that it takes up a whole paper that's so important that it's been included by reference in multiple language specifications, but the short version is that sqrt could very easily give you a number that's a tiny bit bigger or smaller than an integer, so is_integer will give you the wrong answer.
What you probably want to do is something like this:
if math.isclose(A, round(A)):
The round function will round a float to the nearest integer. The isclose function will then check whether the resulting integer is "close enough" to the original float. You should read the docs on isclose to understand exactly what it does, but in this case, I think the default values will be fine, unless you're dealing with huge integers.
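Putting the suggestion together, a sketch might look like the following. The function name positive_integer_root is hypothetical, the formula is the asker's quadratic x^2 + x - 2t = 0 written with true division, and the default isclose tolerances are assumed to be adequate for moderately sized t.

```python
import math

def positive_integer_root(t):
    """Return the positive root of x**2 + x - 2*t = 0 if it is an integer, else None."""
    root = (-1 + math.sqrt(1 + 8 * t)) / 2   # true division: always a float
    nearest = round(root)                     # nearest integer (an int in Python 3)
    if nearest > 0 and math.isclose(root, nearest):
        return nearest
    return None

for t in (3, 4, 6):
    print(t, positive_integer_root(t))
```

Testing closeness instead of type(A) == int sidesteps the question of which division operator was used.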

Python default behavior of str(x)

I am depending on some code that uses the Decimal class because it needs precision to a certain number of decimal places. Some of the functions allow inputs to be floats because of the way that it interfaces with other parts of the codebase. To convert them to decimal objects, it uses things like
mydec = decimal.Decimal(str(x))
where x is the float taken as input. My question is, does anyone know what the standard is for the 'str' method as applied to floats?
For example, take the number 2.1234512. It is stored internally as 2.12345119999999999 because of how floats are represented.
>>> x = 2.12345119999999999
>>> x
2.1234511999999999
>>> str(x)
'2.1234512'
Ok, str(x) in this case is doing something like '%.6f' % x. This is a problem with the way my code converts to decimals. Take the following:
>>> d = decimal.Decimal('2.12345119999999999')
>>> ds = decimal.Decimal(str(2.12345119999999999))
>>> d - ds
Decimal('-1E-17')
So if I have the float, 2.12345119999999999, and I want to pass it to Decimal, converting it to a string using str() gets me the wrong answer. I need to know what are the rules for str(x) that determine what the formatting will be, because I need to determine whether this code needs to be re-written to avoid this error (note that it might be OK, because, for example, the code might round to the 10th decimal place once we have a decimal object)
There must be some set of rules in python's docs that hopefully someone here can point me to. Thanks!
In the Python source, look in "Include/floatobject.h". The precision for the string conversion is set a few lines from the top after a comment with some explanation of the choice:
/* The str() precision PyFloat_STR_PRECISION is chosen so that in most cases,
the rounding noise created by various operations is suppressed, while
giving plenty of precision for practical use. */
#define PyFloat_STR_PRECISION 12
You have the option of rebuilding, if you need something different. Any changes will change formatting of floats and complex numbers. See ./Objects/complexobject.c and ./Objects/floatobject.c. Also, you can compare the difference between how repr and str convert doubles in these two files.
There's a couple of issues worth discussing here, but the summary is: you cannot extract information that is not stored on your system already.
If you've taken a decimal number and stored it as a floating point, you'll have lost information, since most decimal (base 10) numbers with a finite number of digits cannot be stored using a finite number of digits in base 2 (binary).
As was mentioned, str(a_float) will really call a_float.__str__(). As the documentation states, the purpose of that method is to
return a string containing a nicely printable representation of an object
There's no particular definition for the float case. My opinion is that, for your purposes, you should consider __str__'s behavior to be undefined, since there's no official documentation on it - the current implementation can change anytime.
If you don't have the original strings, there's no way to extract the missing digits of the decimal representation from the float objects. All you can do is round predictably, using string formatting (which you mention):
Decimal( "{0:.5f}".format(a_float) )
You can also remove 0s on the right with resulting_string.rstrip("0").
Again, this method does not recover the information that has been lost.
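To make the distinction concrete, here is a small sketch (Python 3) that contrasts the exact value held in a float with the predictable rounding suggested above. The value 2.1234512 comes from the question; the variable names are illustrative.

```python
from decimal import Decimal

a_float = 2.1234512

# Decimal(float) converts the stored binary value exactly, digit for digit.
exact = Decimal(a_float)

# Formatting first gives a predictable, explicitly rounded decimal.
rounded = Decimal("{0:.5f}".format(a_float))

print(exact)
print(rounded)
```

The exact conversion shows what the float really holds, which is not the literal that was typed; the rounded one shows only the digits the caller asked for.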

How do I force Python to keep an integer out of scientific notation

I am trying to write a method in Python 3.2 that encrypts a phrase and then decrypts it. The problem is that the numbers are so big that when Python does math with them it immediately converts them into scientific notation. Since my code requires all the digits of the numbers to function, scientific notation is not useful.
What I have is:
coded = ((eval(input(':'))+1213633288469888484)/2)+1042
Basically, I just get a number from the user and do some math to it.
I have tried format() and a couple other things but I can't get them to work.
EDIT: I use only even integers.
In python3, '/' does real division (i.e. floating point). To get integer division, you need to use //. In other words 100/2 yields 50.0 (float) whereas 100//2 yields 50 (integer)
Your code probably needs to be changed as:
coded = ((eval(input(':'))+1213633288469888484)//2)+1042
As a cautionary tale however, you may want to consider using int instead of eval:
coded = ((int(input(':'))+1213633288469888484)//2)+1042
If you know that the floating point value is really an integer, or you don't care about dropping the fractional part, you can just convert it to an int before you print it.
>>> print 1.2e16
1.2e+16
>>> print int(1.2e16)
12000000000000000
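For numbers of this size the difference between / and // is more than cosmetic. The sketch below (Python 3) uses the constant from the question with a hypothetical even input of 2; it suggests that the float result cannot hold every digit, while the integer result stays exact.

```python
offset = 1213633288469888484
user_number = 2  # hypothetical even input, standing in for int(input(':'))

float_result = (user_number + offset) / 2 + 1042    # true division: a float
exact_result = (user_number + offset) // 2 + 1042   # floor division: an int

print(float_result)        # shown in scientific notation
print(int(float_result))   # no exponent, but the low digits are already lost
print(exact_result)        # every digit preserved
```

Converting the float back with int() removes the exponent from the output but does not restore the lost digits, so // is the safer choice when the digits matter.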
