How does the decimal accuracy of Python compare to that of C?

I was looking at the Golden Ratio formula for finding the nth Fibonacci number, and it made me curious.
I know Python handles arbitrarily large integers, but what sort of precision do you get with decimals? Is it just straight on top of a C double, or does it use a more accurate modified implementation too? (Obviously not with arbitrary accuracy. ;D)

Almost all platforms map Python floats to IEEE-754 "double precision".
http://docs.python.org/tutorial/floatingpoint.html#representation-error
There's also the decimal module for arbitrary-precision decimal floating-point math.

Python floats use the double type of the underlying C compiler. As Bwmat says, this is generally IEEE-754 double precision.
However, if you need more precision than that, you can use the Python decimal module, which was added in Python 2.4.
Python 2.6 also added the fractions module, which may be a better fit for some problems.
Both of these are going to be slower than using the float type, but that is the price of more precision.
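A quick comparison of the three, using only the standard library:

from decimal import Decimal, getcontext
from fractions import Fraction

print(0.1 + 0.2)                         # 0.30000000000000004 -- IEEE-754 double
getcontext().prec = 50                   # choose your own decimal precision
print(Decimal('0.1') + Decimal('0.2'))   # 0.3
print(Fraction(1, 10) + Fraction(1, 5))  # 3/10, exact rational arithmetic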

Related

Transferring a double from C++ to python without loss of precision

I have some C++ code which outputs an array of double values. I want to use these double values in python. The obvious and easiest way to transfer the values would of course be dumping them into a file and then rereading the file in python. However, this would lead to loss of precision, because not all decimal places may be transferred. On the other hand, if I add more decimal places, the file gets larger. The array I am trying to transfer has a few million entries. Hence, my idea is to use the double's binary representation, dump them into a binary file and rereading that in python.
The first problem is that I do not know how the double values are formatted in memory, for example here. It is easy to read the binary representation of an object from memory, but I have to know where the sign bit, the exponent and the mantissa are located. There are of course standards for this. The first question is therefore: how do I know which standard my compiler uses? I want to use g++-9. I tried googling this question for various compilers, but without any precise answer. The next question would be how to turn the bytes back into a double, given the format.
Another possibility may be to compile the C++ code as a Python module and use it directly, transferring the array straight from memory without a file. But I do not know if this would be easy to set up quickly.
I have also seen that it is possible to compile C++ code directly from a string in python using numpy, but I cannot find any documentation for that.
You could write out the double value(s) in binary form and then read and convert them in python with struct.unpack("d", file.read(8)), thereby assuming that IEEE 754 is used.
There are a couple of issues, however:
C++ does not specify the bit representation of doubles. While it is IEEE 754 on any platform I have come across, this should not be taken for granted.
With no byte-order prefix, struct.unpack uses the machine's native byte order. So if the file is written on a machine with a different endianness, you have to tell struct.unpack explicitly ("<d" for little endian, ">d" for big endian) or change the endianness before writing.
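For example, a minimal reading sketch (the file name is hypothetical, and little-endian IEEE-754 data is assumed):

import struct

# Read a whole array of 8-byte doubles written by the C++ side.
with open('doubles.bin', 'rb') as f:
    data = f.read()

n = len(data) // 8
values = struct.unpack('<%dd' % n, data)  # '<' = little endian; use '>' for big endian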
If this code is targeted at a specific machine, I would advise just testing the approach on that machine.
This code should then not be assumed to work on other architectures, so it is advisable to have checks in your Makefile/CMakeLists.txt that refuse to build on unexpected targets.
Another approach would be to use a common serialization format, such as protobuf. They essentially have to deal with the same problems but I would argue that they have solved it.
I have not checked, but Python's C++ interface will most probably store doubles by just copying the 64-bit binary image they represent, as both languages almost certainly use the same internal representation of binary floating-point numbers (the IEEE-754 binary64 format). There is a reason for this: both use the floating-point hardware to operate on them, and that is the format it requires.
One question arises that you don't address: how have you determined that you are losing precision in the data? Have you only compared decimal digits, or have you exported the actual binary format to check for differences in the bit patterns? A common mistake is to print both numbers with, say, 20 significant digits and then observe differences in the last two or three. That is failing to account for the fact that doubles in binary IEEE-754 format carry only around 17 significant decimal digits, so differences at digit 17 or later are expected; the numbers are binary encoded.
What I strongly recommend against is converting those numbers into a decimal representation and sending them as ASCII strings. You risk losing precision (in the form of rounding errors, see below) in the encoding, and again in the decoding phase in Python. Unless you print at least 17 significant digits, converting a binary floating-point number into decimal and back to binary loses information. The underlying problem is that a number that can be represented exactly in decimal (like 0.1) cannot be represented exactly in binary form: you get an infinitely repeating sequence, just as dividing 1.0 by 3.0 in decimal gives a result that never terminates. The opposite conversion is different, as every finite binary fraction has a finite decimal expansion, but that expansion can need more digits than you might print, while the significand itself holds only 53 bits in the 64-bit format.
So my advice is to recheck where your numbers show differences and compare with what I say here: if the differences appear after the 17th significant digit, they are fine; they only reflect the different algorithms the C++ library and the Python library use to convert numbers into decimal format. If the differences occur before that, check how floating-point numbers are represented in Python, and check whether at some point you lose precision by storing those numbers in a single-precision float variable (this is more frequent than one normally estimates). By the way, showing the actual differences in your question would be a plus (something you have also not done), as we could then tell you whether the differences you observe are normal or not.
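A quick way to see that 17-digit boundary from Python:

x = 1.0 / 3.0

# 17 significant digits always round-trip an IEEE-754 double exactly...
print(float('%.17g' % x) == x)   # True

# ...but fewer digits may not:
print(float('%.15g' % x) == x)   # False for this x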

Getting the most accurate precision with a function involving factorials, division and squaring [duplicate]

I'm using the Decimal class for operations that requires precision.
I would like to use the 'largest possible' precision. By this I mean as precise as the system the program runs on can handle.
To set a certain precision it's simple:
import decimal
decimal.getcontext().prec = 123 #123 decimal precision
I tried to figure out the maximum precision the 'Decimal' class can compute:
print(decimal.MAX_PREC)
>> 999999999999999999
So I tried to set the precision to the maximum precision (knowing it probably won't work..):
decimal.getcontext().prec = decimal.MAX_PREC
But, of course, this throws a MemoryError (on division).
So my question is: How do I figure out the maximum precision the current system can handle?
Extra info:
import sys
print(sys.maxsize)
>> 9223372036854775807
Trying to do this is a mistake. Throwing more precision at a problem is a tempting trap for newcomers to floating-point, but it's not that useful, especially to this extreme.
Your operations wouldn't actually require the "largest possible" precision even if that was a well-defined notion. Either they require exact arithmetic, in which case decimal.Decimal is the wrong tool entirely and you should look into something like fractions.Fraction or symbolic computation, or they don't require that much precision, and you should determine how much precision you actually need and use that.
If you still want to throw all the precision you can at your problem, then how much precision that actually is will depend on what kind of math you're doing, and how many absurdly precise numbers you're attempting to store in memory at once. This can be determined by analyzing your program and the memory requirements of Decimal objects, or you can instead take the precision as a parameter and binary search for the largest precision that doesn't cause a crash.
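For instance, if the requirement is really exactness, fractions gives it directly:

from fractions import Fraction

# Ten exact tenths sum to exactly one; ten float tenths do not.
print(sum([Fraction(1, 10)] * 10) == 1)   # True
print(sum([0.1] * 10) == 1.0)             # False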
I'd like to suggest a function that allows you to estimate your maximum precision for a given operation in a brute force way:
import decimal

def find_optimum(a, b, max_iter):
    """Bisect [a, b] for the largest precision at which the operation still runs."""
    for i in range(max_iter):
        print(i)
        c = int((a + b) / 2)
        decimal.getcontext().prec = c
        try:
            dummy = decimal.Decimal(1) / decimal.Decimal(7)  # your operation
            a = c                # succeeded -> try a higher precision
            print("no fail")
        except MemoryError:
            print("fail")
            dummy = 1
            b = c                # failed -> try a lower precision
        print(c)
    del dummy
This just halves the interval one step at a time and checks whether an error occurs. Calling it with max_iter=10 and a=int(1e9), b=int(1e11) gives:
>>> find_optimum(int(1e9), int(1e11), 10)
0
fail
50500000000
1
no fail
25750000000
2
no fail
38125000000
3
no fail
44312500000
4
fail
47406250000
5
fail
45859375000
6
no fail
45085937500
7
no fail
45472656250
8
no fail
45666015625
9
no fail
45762695312
This may give a rough idea of what you are dealing with. It took approximately half an hour on an i5-3470 with 16 GB RAM, so you really would only use it for testing purposes.
I don't think there is an exact way of getting the maximum precision for your operation, as you'd have to know exactly how its memory consumption depends on the precision. I hope this helps you at least a bit, and I would really like to know what you need that kind of precision for.
EDIT: I feel like this really needs to be added, since I read your comments under the top-rated post here. Using arbitrarily high precision in this manner is not the way people calculate constants. You would program something that utilizes disk space in a smart way (for example, calculating a bunch of digits in RAM and writing that bunch to a text file), but never rely on RAM/swap alone, because that will always limit your results. With modern algorithms for calculating pi, you don't need infinite RAM; you just put another 4 TB hard drive in the machine and let it write the next digits. So much for mathematical constants.
Now for physical constants: they are not precise. They rely on measurement. I'm not quite sure at the moment (will edit), but I think the most exact physical constant has a relative error of around 10**(-8). Throwing more precision at it doesn't make it more exact; you just calculate more wrong digits.
As an experiment, though, this was a fun idea, which is why I even posted the answer in the first place.
The maximum precision of the Decimal class is a function of the memory on the device, so there's no good way to set it for the general case. Basically, you're allocating all of the memory on the machine to one variable to get the maximum precision.
If the mathematical operation supports it, long integers will give you unlimited precision. However, you are limited to whole numbers.
Addition, subtraction, multiplication, and simple exponents can be performed exactly with long integers.
Prior to Python 3, the built-in long data type would perform arbitrary precision calculations.
https://docs.python.org/2/library/functions.html#long
In Python >=3, the int data type now represents long integers.
https://docs.python.org/3/library/functions.html#int
One example of exact 64-bit integer math is bitcoind, where transaction calculations require exact values. However, the precision of Bitcoin transactions is limited to 1 "satoshi"; each Bitcoin is defined as 10^8 (integer) satoshi.
The Decimal class works similarly under the hood: a Decimal with 10^-8 precision is analogous to the Bitcoin/satoshi scheme.
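A hypothetical sketch of that fixed-point style (the constant and helper names here are invented, not bitcoind's):

SATOSHI_PER_BTC = 10 ** 8  # 1 BTC = 10^8 satoshi

def btc_to_satoshi(text):
    # Parse a decimal BTC string into an exact integer count of satoshi.
    whole, _, frac = text.partition('.')
    frac = (frac + '00000000')[:8]       # pad/truncate to 8 fractional places
    return int(whole or '0') * SATOSHI_PER_BTC + int(frac)

a = btc_to_satoshi('0.1')
b = btc_to_satoshi('0.2')
print(a + b)                             # 30000000 -- exactly 0.3 BTC, no rounding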
From your reply above:
What if I just wanted to find more digits in pi than already found? What if I wanted to test the irrationality of e or Mills' constant.
I get it. I really do. My one SO question, several years old, is about arbitrary-precision floating point libraries for Python. If those are the types of numerical representations you want to generate, be prepared for the deep dive. Decimal/FP arithmetic is notoriously tricky in Computer Science.
Some programmers, when confronted with a problem, think “I know, I’ll use floating point arithmetic.” Now they have 1.999999999997 problems. – #tomscott
I think when others have said it's a "mistake" or "it depends" to wonder what the max precision is for a Python Decimal type on a given platform, they're taking your question more literally than I'm guessing it was intended. You asked about the Python Decimal type, but if you're interested in FP arithmetic for educational purposes -- "to find more digits in pi" -- you're going to need more powerful, more flexible tools than Decimal or float. These built-in Python types don't even come close. Those are good enough for NASA maybe, but they have limits... in fact, the very limits you are asking about.
That's what multiple-precision (or arbitrary-precision) floating point libraries are for: arbitrarily-precise representations. Want to compute pi for the next 20 years? Python's Decimal type won't even get you through the day.
The fact is, multi-precision binary FP arithmetic is still kinda fringe science. For Python, you'll need to install the GNU MPFR library on your Linux box, then you can use the Python library gmpy2 to dive as deep as you like.
Then, the question isn't, "What's the max precision my program can use?"
It's, "How do I write my program so that it'll run until the electricity goes out?"
And that's a whole other problem, but at least it's restricted by your algorithm, not the hardware it runs on.
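For a taste of that, a minimal sketch, assuming gmpy2 is installed with MPFR support:

import gmpy2

# gmpy2 counts precision in bits: ~3.32 bits per decimal digit.
gmpy2.get_context().precision = 3350   # roughly 1000 decimal digits
print(gmpy2.const_pi())                # pi, correctly rounded to that precision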

Is there a way of setting a default precision that differs from double in Python?

I'm aware of Decimal, however I am working with a lot of code written by someone else, and I don't want to go through a large amount of code to change every initialization of a floating point number to Decimal. It would be more convenient if there was some kind of package where I could put SetPrecision(128) or such at the top of my scripts and be off to the races. I suspect no such thing exists but I figured I would ask just in case I'm wrong.
To head off XY Problem comments, I'm solving differential equations which are supposed to be positive invariant, and one quantity which has an equilibrium on the order of 1e-12 goes negative regardless of the error tolerance I specify (using scipy's interface to LSODA).
Yes, but no.

The bigfloat package is a Python wrapper for the GNU MPFR library for arbitrary-precision floating-point reliable arithmetic. The MPFR library is a well-known portable C library for arbitrary-precision arithmetic on floating-point numbers. It provides precise control over precisions and rounding modes and gives correctly-rounded, reproducible, platform-independent results.

https://pythonhosted.org/bigfloat
You would then need to coerce the builtin float to be bigfloat everywhere, which would likely be non-trivial.
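If you do experiment with it, the basic idiom looks like this (a minimal sketch following the package's documented usage; precision is counted in bits):

from bigfloat import precision, sqrt

with precision(200):     # 200 bits, roughly 60 decimal digits
    print(sqrt(2))       # correctly rounded square root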
LSODA exposed through scipy.integrate is double precision only.
You might want to look into some rescaling of variables, so that the quantity which is of order 1e-12 becomes closer to unity; see the sketch below.
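A hypothetical sketch of that rescaling (rhs stands in for your actual right-hand side, and the scale values are invented):

import numpy as np

scale = np.array([1.0, 1e-12])   # second component equilibrates near 1e-12

def rhs_scaled(t, z):
    y = z * scale                # back to physical variables
    return rhs(t, y) / scale     # rhs: your original right-hand side (assumed given)

You would then integrate rhs_scaled starting from z0 = y0 / scale and convert back with y = z * scale.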
EDIT. In the comments, you indicated
As I've stated three times, I am open to rewriting to avoid LSODA
Then what you can try doing is to look over the code of solve_ivp, which is pure python. Feed it with decimals or mpmath high-precision floats. Observe where it fails, look for where it assumes double precision. Rewrite, remove this assumption. Rinse and repeat. Whether it'll work in the end, I don't know. Whether it's worth it, I suspect not, but YMMV.

Mongodb lack of precision incrementing floats

I have a problem because Mongodb doesn't seem to maintain precision when incrementing floats. For example, the following should yield 2.0:
from decimal import Decimal  # for python precision
for i in range(40):
    db.test.update({}, {'$inc': {'count': float(Decimal(1) / 20)}}, upsert=True)
print db.test.find_one()['count']
2.000000000000001
How can I get around this issue?
Unfortunately, you can't -- at least not directly. Mongo stores floating-point numbers as double-precision IEEE floats (https://en.wikipedia.org/wiki/IEEE_floating_point), and those rounding errors are inherent to the format.
I'm noticing you're using Decimals in your code -- they're converted to Python floats (which are doubles) before being sent to the DB. If you want to keep your true decimal precision, you'll have to store your numbers as stringified Decimals, which means you'll also have to give up Mongo's number-handling facilities such as $inc.
It is, sadly, a tradeoff you'll be confronted with in most databases and programming languages: IEEE floating-point is the format CPUs natively deal with, and any attempt to stray away from it (to use arbitrary-precision decimals like decimal.Decimal) comes with a big performance and usability penalty.
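If you do go the stringified route, a hypothetical sketch (client-side arithmetic, giving up atomic $inc, and matching the Python 2 style of the question):

from decimal import Decimal

doc = db.test.find_one()
current = Decimal(doc['count']) if doc else Decimal(0)
new_count = current + Decimal(1) / Decimal(20)   # exact: 0.05 per increment
db.test.update({}, {'$set': {'count': str(new_count)}}, upsert=True)
print db.test.find_one()['count']                # an exact decimal string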

Calculating Pi with decimal on Python

I'm trying to calculate pi with arbitrary precision on Python using one of Ramanujan's formulas: http://en.wikipedia.org/wiki/Approximations_of_%CF%80#20th_century. It basically requires lots of factorials and high-precision floating numbers division.
Here's my code so far:
http://pastie.org/private/pa6ijmoowiwiw4xwiqmq
I'm getting an error somewhere around the fifteenth digit of pi (I get 3.1415926535897930 and it should be 3.1415926535897932).
Can you give some advice on why this is happening?
I'm using the decimal type, and the docs say that it allows arbitrary-precision floating-point and integer numbers.
PS: It's a homework assignment, so I can't use another formula.
PPS: I'm using Python 2.7.
Thanks :)
Use Decimal(2).sqrt() instead of Decimal(sqrt(2)).
I've checked the first 1000 digits and it seems to work fine. By the way, for some reason your code outputs 1007 decimal places instead of 1000.
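Since the pastie link may rot, here is a minimal sketch of computing that series with Decimal (this is the 1/pi series from the linked Wikipedia page; the function name and guard-digit choices are mine):

from decimal import Decimal, getcontext
from math import factorial

def ramanujan_pi(digits):
    getcontext().prec = digits + 10                  # guard digits for intermediate rounding
    total = Decimal(0)
    k = 0
    while True:
        term = (Decimal(factorial(4 * k)) * (1103 + 26390 * k)
                / (Decimal(factorial(k)) ** 4 * Decimal(396) ** (4 * k)))
        total += term
        if term < Decimal(10) ** -(digits + 5):      # each term adds ~8 digits
            break
        k += 1
    pi = 9801 / (2 * Decimal(2).sqrt() * total)      # invert 1/pi = 2*sqrt(2)/9801 * sum
    getcontext().prec = digits
    return +pi                                       # unary plus rounds to the final precision

print(ramanujan_pi(50))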
