number format is different in linux and windows version of pycharm - python

I have used one python code in PyCharm in Linux and the format of number was
-91.35357. When I used the same code in PyCharm in Windows format was
-91.35356999999999. The problem is that value is consisted in the file name which I need to open (and the list of files to open is long).
Anyone knows possible explanation and hot to fix it?

Floats
Always remember that float numbers have a limited precision. If you think about it, there must be a limit to how exactly you represent a number if you limit storage to 32 or 64 bits (or any other number).
in Python
Python provides just one float type. Float numbers are usually implemented using 64 bits, but yet they might be 64 bit in one Python binary, 32 bit on another, so you can't really rely on that (however, see #Mark Dickinson comment below).
Let's test this. But note that, because Python does not provide float32 and float64 alternatives, we will use a different library, numpy, to provide us with those types and operations:
>>> n = 1.23456789012345678901234567890
>>> n
1.2345678901234567
>>> numpy.float64(n)
1.2345678901234567
>>> numpy.float32(n)
1.2345679
Here we can see that Python, in my computer, handles the variable as a float64. This already truncates the number we introduced (because a float64 can only handle so much precision).
When we use a float32, precision is further reduced and, because of truncation, the closest number we can represent is slightly different.
Conclusion
Float resolution is limited. Furthermore, some operations behave differently across different architectures.
Even if you are using a consistent float size, not all numbers can be represented, and operations will accumulate truncation errors.
Comparing a float to another float shall be done considering a possible error margin. Do not use float_a == float_b, instead use abs(float_a - float_b) < error_margin.
Relying on float representations is always a bad idea. Python sometimes uses scientific notation:
>>> a = 0.0000000001
>>> str(a)
'1e-10'
You can get consistent rounding approximation (ie, to use in file names), but remember that storage and representation are different things. This other thread may assist you: Limiting floats to two decimal points
In general, I'd advise against using float numbers in file names or as any other kind of identifier.
Latitude / Longitude
float32 numbers have not enough precision to represent the 5th and 6th decimal numbers in latitude/longitude pairs (depending on whether the integer part has one, two or three digits).
If you want to learn what's really happening, check this page and test some of your numbers: https://www.h-schmidt.net/FloatConverter/IEEE754.html
Representing
Note that Python rounds float values when representing them:
>>> lat = 123.456789
>>> "{0:.6f}".format(lat)
'123.456789'
>>> "{0:.5f}".format(lat)
'123.45679'
And as stated above, latitude/longitude cannot be correctly represented by a float32 down to the 6th decimal, and furthermore, the truncated float values are rounded when presented by Python:
>>> lat = 123.456789
>>> lat
123.456789
>>> "{0:.5f}".format(numpy.float64(lat))
'123.45679'
>>> "{0:.5f}".format(numpy.float32(lat))
'123.45679'
>>> "{0:.6f}".format(numpy.float32(lat))
'123.456787'
As you can see, the rounded version of that float32 number fails to match the original number from the 5th decimal. But also does the rounded version to the 5th decimal of the float64 number.

Your PyCharm on Linux is simply rounding of your large floating point number. Rounding it off to the nearest 6 or 7 can resolve your issue but DONT USE THESE AS FILE NAMES.
Keeping your code constant in both cases then, their can be many explanations:
1) 32-bit Processors handles float differently than 64-Bit Processors.
2) PyCharm for both Linux and Windows behaves differently for floating points which we cannot determine exactly, may be PyCharm for Windows is better optimised.
edit 1
Explanation for Point 1
on 32-Bit processors everything is really done in 80-bit precision internally. The precision really just determines how many of those bits are stored in memory. This is part of the reason why different optimisation settings can change results slightly: They change the amount of rounding from 80-bit to 32- or 64-bit.
edit 2
You can use hashmapping for saving your data in files and then mapping them onto the co-ordinates.
Example:
# variable = {(long,lat):"<random_file_name>"}
cordinates_and_file ={(-92.45453534,-87.2123123):"AxdwaWAsdAwdz"}

Related

Does Python document its behavior for rounding to a specified number of fractional digits?

Is the algorithm used for rounding a float in Python to a specified number of digits specified in any Python documentation? The semantics of round with zero fractional digits (i.e. rounding to an integer) are simple to understand, but it's not clear to me how the case where the number of digits is nonzero is implemented.
The most straightforward implementation of the function that I can think of (given the existence of round to zero fractional digits) would be:
def round_impl(x, ndigits):
return (10 ** -ndigits) * round(x * (10 ** ndigits))
I'm trying to write some C++ code that mimics the behavior of Python's round() function for all values of ndigits, and the above agrees with Python for the most part, when translated to equivalent C++ calls. However, there are some cases where it differs, e.g.:
>>> round(0.493125, 5)
0.49312
>>> round_impl(0.493125, 5)
0.49313
There is clearly a difference that occurs when the value to be rounded is at or very near the exact midpoint between two potential output values. Therefore, it seems important that I try to use the same technique if I want similar results.
Is the specific means for performing the rounding specified by Python? I'm using CPython 2.7.15 in my tests, but I'm specifically targeting v2.7+.
Also refer to What Every Programmer Should Know About Floating-Point Arithmetic, which has more detailed explanations for why this is happening as it is.
This is a mess. First of all, as far as float is concerned, there is no such number as 0.493125, when you write 0.493125 what you actually get is:
0.493124999999999980015985556747182272374629974365234375
So this number is not exactly between two decimals, it's actually closer to 0.49312 than it is to 0.49313, so it should definitely round to 0.49312, that much is clear.
The problem is that when you multiply by 105, you get the exact number 49312.5. So what happened here is the multiplication gave you an inexact result which by coincidence canceled out the rounding error in the original number. Two rounding errors canceled each other out, yay! But the problem is that when you do this, the rounding is actually incorrect... at least if you want to round up at midpoints, but Python 3 and Python 2 behave differently. Python 2 rounds away from 0, and Python 3 rounds towards even least-significant digits.
Python 2
if two multiples are equally close, rounding is done away from 0
Python 3
...if two multiples are equally close, rounding is done toward the even choice...
Summary
In Python 2,
>>> round(49312.5)
49313.0
>>> round(0.493125, 5)
0.49312
In Python 3,
>>> round(49312.5)
49312
>>> round(0.493125, 5)
0.49312
And in both cases, 0.493125 is really just a short way of writing 0.493124999999999980015985556747182272374629974365234375.
So, how does it work?
I see two plausible ways for round() to actually behave.
Choose the closest decimal number with the specified number of digits, and then round that decimal number to float precision. This is hard to implement, because it requires doing calculations with more precision than you can get from a float.
Take the two closest decimal numbers with the specified number of digits, round them both to float precision, and return whichever is closer. This will give incorrect results, because it rounds numbers twice.
And Python chooses... option #1! The exactly correct, but much harder to implement version. Refer to Objects/floatobject.c:927 double_round(). It uses the following process:
Write the floating-point number to a string in decimal format, using the requested precision.
Parse the string back in as a float.
This uses code based on David Gay's dtoa library. If you want C++ code that gets the actual correct result like Python does, this is a good start. Fortunately you can just include dtoa.c in your program and call it, since its licensing is very permissive.
The Python documentation for and 2.7 specifies the behaviour:
Values are rounded to the closest multiple of 10 to the power minus
ndigits; if two multiples are equally close, rounding is done away
from 0.
For 3.7:
For the built-in types supporting round(), values are rounded to the
closest multiple of 10 to the power minus ndigits; if two multiples
are equally close, rounding is done toward the even choice
Update:
The (cpython) implementation can be found floatobjcet.c in the function float___round___impl, which calls round if ndigits is not given, but double_round if it is.
double_round has two implementations.
One converts the double to a string (aka decimal) and back to a double.
The other one does some floating point calculations, calls to pow and at its core calls round. It seems to have more potential problems with overflows, since it actually multiplies the input by 10**-ndigits.
For the precise algorithm, look at the linked source file.

Dealing with large numbers in R [Inf] and Python

I am learning Python these days, and this is probably my first post on Python. I am relatively new to R as well, and have been using R for about a year. I am comparing both the languages while learning Python. I apologize if this question is too basic.
I am unsure why R outputs Inf for something python doesn't. Let's take 2^1500 as an example.
In R:
nchar(2^1500)
[1] 3
2^1500
[1] Inf
In Python:
len(str(2**1500))
Out[7]: 452
2**1500
Out[8]: 3507466211043403874...
I have two questions:
a) Why is it that R provides Inf when Python doesn't.
b) I researched How to work with large numbers in R? thread. It seems that Brobdingnag could help us out with dealing with large numbers. However, even in such case, I am unable to compute nchar. How do I compute above expression i.e. 2^1500 in R
2^Brobdingnag::as.brob(500)
[1] +exp(346.57)
> nchar(2^Brobdingnag::as.brob(500))
Error in nchar(2^Brobdingnag::as.brob(500)) :
no method for coercing this S4 class to a vector
In answer to your questions:
a) They use different representations for numbers. Most numbers in R are represented as double precision floating point values. These are all 64 bits long, and give about 15 digit precision throughout the range, which goes from -double.xmax to double.xmax, then switches to signed infinite values. R also uses 32 bit integer values sometimes. These cover the range of roughly +/- 2 billion. R chooses these types because it is geared towards statistical and numerical methods, and those rarely need more precision than double precision gives. (They often need a bigger range, but usually taking logs solves that problem.)
Python is more of a general purpose platform, and it has types discussed in MichaelChirico's comment.
b) Besides Brobdingnag, the gmp package can handle arbitrarily large integers. For example,
> as.bigz(2)^1500
Big Integer ('bigz') :
[1] 35074662110434038747627587960280857993524015880330828824075798024790963850563322203657080886584969261653150406795437517399294548941469959754171038918004700847889956485329097264486802711583462946536682184340138629451355458264946342525383619389314960644665052551751442335509249173361130355796109709885580674313954210217657847432626760733004753275317192133674703563372783297041993227052663333668509952000175053355529058880434182538386715523683713208549376
> nchar(as.character(as.bigz(2)^1500))
[1] 452
I imagine the as.character() call would also be needed with Brobdingnag.
Apparently python uses arbitrary precision integers by default when needed. R does not. However, there are many useful R packages to perform arbitrary precision arithmetic. Which package to pick depends on the use case.
To bring up a package that hasn't been discussed yet, consider the Rmpfr package:
> library(Rmpfr)
> a <- 2^mpfr(1500, 10000)
> a
1 'mpfr' number of precision 10000 bits
[1] 35074662110434038747627587960280857993524015880330828824075798024790963850563322203657080886584969261653150406795437517399294548941469959754171038918004700847889956485329097264486802711583462946536682184340138629451355458264946342525383619389314960644665052551751442335509249173361130355796109709885580674313954210217657847432626760733004753275317192133674703563372783297041993227052663333668509952000175053355529058880434182538386715523683713208549376
It requires you to set a precision, but if you make it large enough it can hold 2^1500 as integer.
However, it also doesn't seem to define an as.character() function:
> as.character(a)
[1] "<S4 object of class \"mpfr1\">"
So if your problem is specifically to count digits, then the gmp package as discussed in this answer is probably the way to go. On the other hand, if you're interested in arbitrary precision floating point arithmetic, Rmpfr might be a better choice.

Python rounding and inserting into array does not round [duplicate]

So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats which represent values to be fit into the ranges. All of these floats are < 1 but positive, so precision matter. One of my tests to determine if a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question then, is why is the float changing when it is accessed. It has decimal values available up to a certain precision when part of a tuple, and a different precision when accessed. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?
The float doesn't change. The built-in numberic types are all immutable. The cause for what you're observing is that:
print range[0] uses str on the float, which (up until very recent versions of Python) printed less digits of a float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this isn't true anymore in recent releases which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's propably the usual culprit, the limited precision of floats. Try print repr(curVal), repr(range[0]) to see if what Python decided was the closest representation of your float literal possible.
In modern day PC's floats aren't that precise. So even if you enter pi as a constant to 100 decimals, it's only getting a few of them accurate. The same is happening to you. This is because in 32-bit floats you only get 24 bits of mantissa, which limits your precision (and in unexpected ways because it's in base2).
Please note, 0.00145000000671 isn't the exact value as stored by Python. Python only diplays a few decimals of the complete stored float if you use print. If you want to see exactly how python stores the float use repr.
If you want better precision use the decimal module.
It isn't changing per se. Python is doing its best to store the data as a float, but that number is too precise for float, so Python modifies it before it is even accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use a arbitrary fixed point module like Simple Python Fixed Point or the decimal module.
Not sure it would work in this case, because I don't know if Python's limiting in the output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and...

Python default behavior of str(x)

I am depending on some code that uses the Decimal class because it needs precision to a certain number of decimal places. Some of the functions allow inputs to be floats because of the way that it interfaces with other parts of the codebase. To convert them to decimal objects, it uses things like
mydec = decimal.Decimal(str(x))
where x is the float taken as input. My question is, does anyone know what the standard is for the 'str' method as applied to floats?
For example, take the number 2.1234512. It is stored internally as 2.12345119999999999 because of how floats are represented.
>>> x = 2.12345119999999999
>>> x
2.1234511999999999
>>> str(x)
'2.1234512'
Ok, str(x) in this case is doing something like '%.6f' % x. This is a problem with the way my code converts to decimals. Take the following:
>>> d = decimal.Decimal('2.12345119999999999')
>>> ds = decimal.Decimal(str(2.12345119999999999))
>>> d - ds
Decimal('-1E-17')
So if I have the float, 2.12345119999999999, and I want to pass it to Decimal, converting it to a string using str() gets me the wrong answer. I need to know what are the rules for str(x) that determine what the formatting will be, because I need to determine whether this code needs to be re-written to avoid this error (note that it might be OK, because, for example, the code might round to the 10th decimal place once we have a decimal object)
There must be some set of rules in python's docs that hopefully someone here can point me to. Thanks!
In the Python source, look in "Include/floatobject.h". The precision for the string conversion is set a few lines from the top after an comment with some explanation of the choice:
/* The str() precision PyFloat_STR_PRECISION is chosen so that in most cases,
the rounding noise created by various operations is suppressed, while
giving plenty of precision for practical use. */
#define PyFloat_STR_PRECISION 12
You have the option of rebuilding, if you need something different. Any changes will change formatting of floats and complex numbers. See ./Objects/complexobject.c and ./Objects/floatobject.c. Also, you can compare the difference between how repr and str convert doubles in these two files.
There's a couple of issues worth discussing here, but the summary is: you cannot extract information that is not stored on your system already.
If you've taken a decimal number and stored it as a floating point, you'll have lost information, since most decimal (base 10) numbers with a finite number of digits cannot be stored using a finite number of digits in base 2 (binary).
As was mentioned, str(a_float) will really call a_float.__str__(). As the documentation states, the purpose of that method is to
return a string containing a nicely printable representation of an object
There's no particular definition for the float case. My opinion is that, for your purposes, you should consider __str__'s behavior to be undefined, since there's no official documentation on it - the current implementation can change anytime.
If you don't have the original strings, there's no way to extract the missing digits of the decimal representation from the float objects. All you can do is round predictably, using string formatting (which you mention):
Decimal( "{0:.5f}".format(a_float) )
You can also remove 0s on the right with resulting_string.rstrip("0").
Again, this method does not recover the information that has been lost.

Unable to see Python's approximations in mathematical calculations

Problem: to see when computer makes approximation in mathematical calculations when I use Python
Example of the problem:
My old teacher once said the following statement
You cannot never calculate 200! with your computer.
I am not completely sure whether it is true or not nowadays.
It seems that it is, since I get a lot zeros for it from a Python script.
How can you see when your Python code makes approximations?
Python use arbitrary-precision arithmetic to calculate with integers, so it can exactly calculate 200!. For real numbers (so-called floating-point), Python does not use an exact representation. It uses a binary representation called IEEE 754, which is essentially scientific notation, except in base 2 instead of base 10.
Thus, any real number that cannot be exactly represented in base 2 with 53 bits of precision, Python cannot produce an exact result. For example, 0.1 (in base 10) is an infinite decimal in base 2, 0.0001100110011..., so it cannot be exactly represented. Hence, if you enter on a Python prompt:
>>> 0.1
0.10000000000000001
The result you get back is different, since has been converted from decimal to binary (with 53 bits of precision), back to decimal. As a consequence, you get things like this:
>>> 0.1 + 0.2 == 0.3
False
For a good (but long) read, see What Every Programmer Should Know About Floating-Point Arithmetic.
Python has unbounded integer sizes in the form of a long type. That is to say, if it is a whole number, the limit on the size of the number is restricted by the memory available to Python.
When you compute a large number such as 200! and you see an L on the end of it, that means Python has automatically cast the int to a long, because an int was not large enough to hold that number.
See section 6.4 of this page for more information.
200! is a very large number indeed.
If the range of an IEEE 64-bit double is 1.7E +/- 308 (15 digits), you can see that the largest factorial you can get is around 170!.
Python can handle arbitrary sized numbers, as can Java with its BigInteger.
Without some sort of clarification to that statement, it's obviously false. Just from personal experience, early lessons in programming (in the late 1980s) included solving very similar, if not exactly the same, problems. In general, to know some device which does calculations isn't making approximations, you have to prove (in the math sense of a proof) that it isn't.
Python's integer types (named int and long in 2.x, both folded into just the int type in 3.x) are very good, and do not overflow like, for example, the int type in C. If you do the obvious of print 200 * 199 * 198 * ... it may be slow, but it will be exact. Similiarly, addition, subtraction, and modulus are exact. Division is a mixed bag, as there's two operators, / and //, and they underwent a change in 2.x—in general you can only treat it as inexact.
If you want more control yet don't want to limit yourself to integers, look at the decimal module.
Python handles large numbers automatically (unlike a language like C where you can overflow its datatypes and the values reset to zero, for example) - over a certain point (sys.maxint or 2147483647) it converts the integer to a "long" (denoted by the L after the number), which can be any length:
>>> def fact(x):
... return reduce(lambda x, y: x * y, range(1, x+1))
...
>>> fact(10)
3628800
>>> fact(200)
788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000L
Long numbers are "easy", floating point is more complicated, and almost any computer representation of a floating point number is an approximation, for example:
>>> float(1)/3
0.33333333333333331
Obviously you can't store an infinite number of 3's in memory, so it cheats and rounds it a bit..
You may want to look at the decimal module:
Decimal numbers can be represented exactly. In contrast, numbers like 1.1 do not have an exact representation in binary floating point. End users typically would not expect 1.1 to display as 1.1000000000000001 as it does with binary floating point.
Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem
See Handling very large numbers in Python.
Python has a BigNum class for holding 200! and will use it automatically.
Your teacher's statement, though not exactly true here is true in general. Computers have limitations, and it is good to know what they are. Remember that every time you add another integer of data storage, you can store a number that is 2^32 (4 billion +) times larger. It is hard to comprehend how many more numbers that is - but maths gets slower as you add more integers to store the exact value of a very large number.
As an example (what you can store with 1000 bits)
>>> 2 << 1000
2143017214372534641896850098120003621122809623411067214887500776740702102249872244986396
7576313917162551893458351062936503742905713846280871969155149397149607869135549648461970
8421492101247422837559083643060929499671638825347975351183310878921541258291423929553730
84335320859663305248773674411336138752L
I tried to illustrate how big a number you can store with 10000 bits, or even 8,000,000 bits (a megabyte) but that number is many pages long.

Categories

Resources