Faster fractions module in Python

Is there a faster equivalent of the fractions module, something like a cFractions module, just as there is a cdecimal module that is a faster equivalent of the decimal module? The fractions module is too slow.

Use http://code.google.com/p/gmpy/
It uses the GMP multiple-precision library for fast integer and rational arithmetic.
Note: I'm also the maintainer.
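A minimal sketch, assuming the current gmpy2 incarnation is installed (gmpy2.mpq is the GMP-backed rational type, usable much like fractions.Fraction):
from gmpy2 import mpq

a = mpq(1, 3)
b = mpq(22, 7)
print(a + b)         # 73/21
print(float(a * b))  # ~1.0476190476190477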

I was struggling with the lack of such a package as well and decided to implement one called cfractions (source code available on GitHub).
The only thing you need to do is install it
/path/to/python3 -m pip install cfractions
and after that replace fractions with cfractions in your modules, as easy as that.
Main features include
less memory
>>> from cfractions import Fraction
>>> import sys
>>> sys.getsizeof(Fraction())
32
compared to
>>> from fractions import Fraction
>>> import sys
>>> sys.getsizeof(Fraction())
48
so it's basically a plain Python object + 2 pointers for numerator & denominator.
more speed:
construction from pair of int
construction from single float
construction from str
sum of n instances
product of n instances
or, if we take a look at relative performance, we can see fractions.Fraction skyrocketing, yay!
Note: I'm using the perfplot package; all benchmarks were run on Python 3.9.4.
Python 3.5+ support,
plain Python C API, no additional dependencies,
constructing from a numerator/denominator pair, a single int/float/any numbers.Rational value, or a str (from version 1.4.0),
full spectrum of arithmetic & comparison operations,
string representation (both __repr__ & __str__),
pickling and copying,
immutability & hashability,
operating with int and float (with conversion of Fraction instance to float for the latter, as it is for fractions.Fraction),
PyPy support (by falling back to fractions.Fraction proxy),
property-based tests of all operations using Hypothesis framework.
What it doesn't include
operating with complex.

Unfortunately, there's no C equivalent available without needing a compiled external dependency. Depending on your needs, the gist I've made, https://gist.github.com/mscuthbert/f22942537ebbba2c31d4, may help.
It exposes a function opFrac(num) that optionally converts an int, float, or Fraction into a float or Fraction with a denominator limit (I use 65535 because I'm working with small fractions); if the float can be exactly represented in binary (i.e., it's a multiple of some power of two denominator), it leaves it alone. Otherwise it converts it to a Fraction. Similarly, if the Fraction is exactly representable in binary we convert it to a float; otherwise we leave it alone.
The Fraction(float).limit_denominator(x) call is extracted out into a helper function, _preFracLimitDenominator, that only creates one Fraction object rather than the three normally created with the call.
The use cases for this gist are pretty few, but where they exist, the results are spectacular. For my project, music21, we work mostly with notes that are generally placed on a beat (integer) or on a half, quarter, eighth, etc. beat (exactly representable in binary), but on the rarer occasions when notes have placement (offset) or duration that is, say, 1/3 or 1/5 of a beat, we were running into big floating point conversion problems that led to obscure bugs. Our test suite was running in 350 seconds using floating point offsets and durations. Switching everything to Fractions ballooned the time to 1100 seconds -- totally unacceptable. Switching to optional Fractions with fast Fraction creation brought the time back to 360 seconds, or only a 3% performance hit.
If you can deal with sometimes working with floats and sometimes Fractions, this may be the way to go.
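The gist itself is linked above; as a rough sketch of the idea only (op_frac and DENOM_LIMIT are named here for illustration, this is not the actual gist code):
from fractions import Fraction

DENOM_LIMIT = 65535  # the denominator limit mentioned above

def op_frac(num):
    # Floats whose exact binary value already has a small power-of-two
    # denominator (0.5, 0.25, 1.75, ...) stay floats; everything else
    # becomes a denominator-limited Fraction.
    if isinstance(num, float):
        exact = Fraction(num)  # exact: a float's denominator is a power of two
        if exact.denominator <= DENOM_LIMIT:
            return num
        return exact.limit_denominator(DENOM_LIMIT)
    if isinstance(num, Fraction):
        d = num.denominator
        if d & (d - 1) == 0:   # power of two: exactly representable as a float
            return float(num)
        return num
    return num                 # ints pass through unchanged

print(op_frac(0.5))             # 0.5 stays a float
print(op_frac(1 / 3))           # Fraction(1, 3)
print(op_frac(Fraction(3, 4)))  # 0.75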

I couldn't find anything.
You could make one: http://docs.python.org/extending/extending.html
A quick search on fractions in C gave me http://www.spiration.co.uk/post/1400/fractions-in-c---a-rational-arithmetic-library. Use the second post; it also handles negative numbers.
But that may not be what you need, and you may find something else. If you don't want to extend Python and can't find someone who has written a cFractions module, you'll have to stick with fractions. I'm sorry.

Related

Using decimals in math functions in Python

math is a Python module many people use for more advanced mathematical functions, and decimal provides exact decimal arithmetic: with plain floats, 1.2 - 1.1 = 0.0999..., but with the Decimal type the result is exactly 0.1.
My problem is that these two modules don't work well with each other. For example, log(1000, 10) = 2.9999..., and passing a Decimal gives the same result, because math converts its arguments to float. How can I make these two work with each other? Do I need to implement my own functions? Isn't there any way?
You have Decimal.log10, Decimal.ln and Decimal.logb methods of each Decimal instance, and many more (max, sqrt):
from decimal import Decimal
print(Decimal('0.001').log10())
# Decimal('-3')
print(Decimal('0.001').ln())
# Decimal('-6.907755278982137052053974364')
There are no trigonometry functions, though.
A more advanced alternative for arbitrary-precision calculations is mpmath, available on PyPI. Exact sin or tan results are out of reach in any case, because sin(x) is irrational for many x values, so storing it exactly as a Decimal doesn't make sense; however, at a fixed precision you can compute these functions (via Taylor series etc.) with mpmath's help. You can also do Decimal(math.sin(0.17)) to get a Decimal holding something close to the sine of 0.17 radians.
Also refer to official Decimal recipes.
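As a small sketch of the two options mentioned above (the precision of 50 digits and the angle 0.17 are just illustrative values):
import math
from decimal import Decimal
from mpmath import mp, mpf, sin

mp.dps = 50                     # mpmath precision in decimal digits
print(sin(mpf("0.17")))         # sine computed to 50 digits

print(Decimal(math.sin(0.17)))  # double-precision sine wrapped in a Decimal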
There is a log10 method in Decimal.
from decimal import *
a = Decimal('1000')
print(a.log10())
However, it would make more sense to me to use a function that calculates the exact value of a logarithm if you're trying to solve logarithms with exact integer solutions. Logarithm functions are generally expected to output some irrational result in typical usage. You could instead use a for loop and repeated division.
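For instance, an exact integer logarithm by repeated division could look like this (int_log is just an illustrative helper, not a standard function):
def int_log(n, base):
    # Return the exact integer k with base**k == n, or None if no such k exists.
    if n < 1 or base < 2:
        return None
    k = 0
    while n % base == 0:
        n //= base
        k += 1
    return k if n == 1 else None

print(int_log(1000, 10))  # 3
print(int_log(999, 10))   # None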

Transferring a double from C++ to python without loss of precision

I have some C++ code which outputs an array of double values. I want to use these double values in python. The obvious and easiest way to transfer the values would of course be dumping them into a file and then rereading the file in python. However, this would lead to loss of precision, because not all decimal places may be transferred. On the other hand, if I add more decimal places, the file gets larger. The array I am trying to transfer has a few million entries. Hence, my idea is to use the double's binary representation, dump them into a binary file and rereading that in python.
The first problem is that I do not know how the double values are formatted in memory, for example here. It is easy to read the binary representation of an object from memory, but I have to know where the sign bit, the exponent and the mantissa are located. There are of course standards for this. The first question is therefore: how do I know which standard my compiler uses? I want to use g++-9. I tried googling this question for various compilers, but without any precise answer. The next question would be how to turn the bytes back into a double, given the format.
Another possibility may be to compile the C++ code as a python module and use it directly, transferring the array without a file from memory only. But I do not know if this would be easy to set up quickly.
I have also seen that it is possible to compile C++ code directly from a string in python using numpy, but I cannot find any documentation for that.
You could write out the double value(s) in binary form and then read and convert them in python with struct.unpack("d", file.read(8)), thereby assuming that IEEE 754 is used.
There are a couple of issues, however:
C++ does not specify the bit representation of doubles. While it is IEEE 754 on any platform I have come across, this should not be taken for granted.
struct.unpack uses the machine's native byte order unless you pass an explicit prefix ("<d" for little endian, ">d" for big endian), so if the data may be written and read on machines with different endianness you have to spell out the byte order when reading or convert before writing.
If this code is targeted at a specific machine, I would advise just testing the approach on that machine.
This code should then not be assumed to work on other architectures, so it is advisable to have checks in your Makefile/CMake configuration that refuse to build on unexpected targets.
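A sketch of the reading side under those assumptions (the file name doubles.bin and the little-endian "<" prefix are illustrative; the C++ side is assumed to have written raw IEEE 754 doubles):
import struct

with open("doubles.bin", "rb") as f:
    raw = f.read()

count = len(raw) // 8
values = struct.unpack("<{}d".format(count), raw[:count * 8])
print(len(values))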
Another approach would be to use a common serialization format, such as protobuf. They essentially have to deal with the same problems but I would argue that they have solved it.
I have not checked this, but Python's C++ interface most likely stores doubles by simply copying their binary image (the 64-bit pattern), since both languages almost certainly use the same internal representation of binary floating-point numbers (IEEE 754 binary64). There is a practical reason for this: both use the floating-point hardware to operate on them, and that is the format it requires.
One question arises that you don't address: how have you determined that you are losing precision in the data? Have you only compared decimal digits, or have you exported the actual binary format and checked for differences in the bit patterns? A common mistake is to print both numbers with, say, 20 significant digits and then observe differences in the last two or three. This happens because doubles represented this way (binary IEEE 754) carry only around 17 significant decimal digits (it depends on the number, but you can see differences at the 17th digit or later, because the numbers are binary encoded).
What I strongly recommend against is converting the numbers to a decimal representation and sending them as ASCII strings. You will lose some precision (in the form of rounding errors, see below) in the encoding, and then again in the decoding phase in Python. Converting a binary floating-point number into decimal (even at maximum precision) and then back to binary is almost always a lossy process. The problem is that a number that can be represented exactly in decimal (like 0.1) cannot be represented exactly in binary form (you get an infinitely repeating sequence, just as dividing 1.0 by 3.0 in decimal gives a result that is not exact). The opposite conversion is different: every finite binary fraction has an exact finite decimal expansion, but the decimal you started from may not fit back into the 53 bits devoted to the significand of a 64-bit floating-point number.
So my advice is to recheck where your numbers show differences and compare with what I say here: if the differences appear after the 16th significant decimal digit, they are fine; they only reflect the different algorithms the C++ and Python libraries use to convert the numbers to decimal form. If the differences occur before that, check how floating-point numbers are represented in Python, or check whether at some point you lose precision by storing the numbers in a single-precision float variable (this happens more often than one expects), and see whether there is some difference (I don't believe there will be) in the formats used by the two environments. By the way, showing such differences in your question would have been a plus (something you also have not done), as we could then tell you whether the differences you observe are normal or not.

Getting the most accurate precision with a function equating factorial, division and squaring [duplicate]

I'm using the Decimal class for operations that requires precision.
I would like to use 'largest possible' precision. With this, I mean as precise as the system on which the program runs can handle.
To set a certain precision it's simple:
import decimal
decimal.getcontext().prec = 123 #123 decimal precision
I tried to figure out the maximum precision the 'Decimal' class can compute:
print(decimal.MAX_PREC)
>> 999999999999999999
So I tried to set the precision to the maximum precision (knowing it probably won't work..):
decimal.getcontext().prec = decimal.MAX_PREC
But, of course, this throws a MemoryError (on division).
So my question is: How do I figure out the maximum precision the current system can handle?
Extra info:
import sys
print(sys.maxsize)
>> 9223372036854775807
Trying to do this is a mistake. Throwing more precision at a problem is a tempting trap for newcomers to floating-point, but it's not that useful, especially to this extreme.
Your operations wouldn't actually require the "largest possible" precision even if that was a well-defined notion. Either they require exact arithmetic, in which case decimal.Decimal is the wrong tool entirely and you should look into something like fractions.Fraction or symbolic computation, or they don't require that much precision, and you should determine how much precision you actually need and use that.
If you still want to throw all the precision you can at your problem, then how much precision that actually is will depend on what kind of math you're doing, and how many absurdly precise numbers you're attempting to store in memory at once. This can be determined by analyzing your program and the memory requirements of Decimal objects, or you can instead take the precision as a parameter and binary search for the largest precision that doesn't cause a crash.
I'd like to suggest a function that allows you to estimate your maximum precision for a given operation in a brute force way:
import decimal

def find_optimum(a, b, max_iter):
    for i in range(max_iter):
        print(i)
        c = int((a + b) / 2)
        decimal.getcontext().prec = c
        try:
            dummy = decimal.Decimal(1) / decimal.Decimal(7)  # your operation
            a = c
            print("no fail")
        except MemoryError:
            print("fail")
            dummy = 1
            b = c
        print(c)
        del dummy
This is just halving intervals one step at a time and looks if an error occurs. Calling with max_iter=10 and a=int(1e9), b=int(1e11) gives:
>>> find_optimum(int(1e9), int(1e11), 10)
0
fail
50500000000
1
no fail
25750000000
2
no fail
38125000000
3
no fail
44312500000
4
fail
47406250000
5
fail
45859375000
6
no fail
45085937500
7
no fail
45472656250
8
no fail
45666015625
9
no fail
45762695312
This may give a rough idea of what you are dealing with. This took approximately half an hour on an i5-3470 with 16 GB RAM, so you really would only use it for testing purposes.
I don't think there is an exact way of determining the maximum precision for your operation, as you would need exact knowledge of how memory consumption depends on the precision and on the operation itself. I hope this helps you at least a bit, and I would really like to know what you need that kind of precision for.
EDIT: I feel like this really needs to be added, since I read your comments under the top-rated post here. Using arbitrarily high precision in this manner is not the way people calculate constants. You would write something that uses disk space in a smart way (for example, calculating a batch of digits in RAM and writing that batch to a text file), but never rely on RAM/swap alone, because that will always limit your results. With modern algorithms for calculating pi, you don't need infinite RAM; you just put another 4 TB hard drive in the machine and let it write the next digits. So much for mathematical constants.
Now for physical constants: they are not exact. They rely on measurement. I'm not quite sure at the moment (will edit), but I think the most precisely known physical constant has an error of about 10**(-8). Throwing more precision at it doesn't make it more exact; you just calculate more wrong digits.
As an experiment though, this was a fun idea, which is why I even posted the answer in the first place.
The maximum precision of the Decimal class is a function of the memory on the device, so there's no good way to set it for the general case. Basically, you're allocating all of the memory on the machine to one variable to get the maximum precision.
If the mathematical operation supports it, long integers will give you unlimited precision. However, you are limited to whole numbers.
Addition, subtraction, multiplication, and simple exponents can be performed exactly with long integers.
Prior to Python 3, the built-in long data type would perform arbitrary precision calculations.
https://docs.python.org/2/library/functions.html#long
In Python >=3, the int data type now represents long integers.
https://docs.python.org/3/library/functions.html#int
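For example, Python 3 integers never overflow, so whole-number arithmetic stays exact no matter how large the values get:
factorial_100 = 1
for i in range(2, 101):
    factorial_100 *= i          # exact at every step, no overflow

print(len(str(factorial_100)))  # 158 digits, all exact
print(2**200 + 1)               # also exact; no rounding occurs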
One example of 64-bit integer math in practice is bitcoind, where transaction calculations require exact values. However, the precision of Bitcoin transactions is limited to 1 "Satoshi"; each Bitcoin is defined as 10^8 (integer) Satoshi.
The Decimal class works similarly under the hood. A Decimal precision of 10^-8 is similar to the Bitcoin-Satoshi paradigm.
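As a toy illustration of that fixed-point style (btc_to_satoshi is a made-up helper for this example, not part of any library):
SATOSHI_PER_BTC = 10**8  # one Bitcoin is defined as 10**8 satoshi

def btc_to_satoshi(amount):
    # Parse a decimal BTC string into an exact integer number of satoshi.
    whole, _, frac = amount.partition(".")
    frac = (frac + "00000000")[:8]  # pad/truncate to 8 fractional digits
    return int(whole or "0") * SATOSHI_PER_BTC + int(frac)

print(btc_to_satoshi("1.23456789"))  # 123456789
print(btc_to_satoshi("0.1"))         # 10000000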
From your reply above:
What if I just wanted to find more digits in pi than already found? What if I wanted to test the irrationality of e or Mills' constant?
I get it. I really do. My one SO question, several years old, is about arbitrary-precision floating point libraries for Python. If those are the types of numerical representations you want to generate, be prepared for the deep dive. Decimal/FP arithmetic is notoriously tricky in Computer Science.
Some programmers, when confronted with a problem, think “I know, I’ll use floating point arithmetic.” Now they have 1.999999999997 problems. – #tomscott
I think when others have said it's a "mistake" or "it depends" to wonder what the max precision is for a Python Decimal type on a given platform, they're taking your question more literally than I'm guessing it was intended. You asked about the Python Decimal type, but if you're interested in FP arithmetic for educational purposes -- "to find more digits in pi" -- you're going to need more powerful, more flexible tools than Decimal or float. These built-in Python types don't even come close. Those are good enough for NASA maybe, but they have limits... in fact, the very limits you are asking about.
That's what multiple-precision (or arbitrary-precision) floating point libraries are for: arbitrarily-precise representations. Want to compute pi for the next 20 years? Python's Decimal type won't even get you through the day.
The fact is, multi-precision binary FP arithmetic is still kinda fringe science. For Python, you'll need to install the GNU MPFR library on your Linux box, then you can use the Python library gmpy2 to dive as deep as you like.
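A tiny sketch of what that looks like (assumes gmpy2 and the underlying MPFR library are installed; the precision value is arbitrary):
import gmpy2

# Precision is given in bits; roughly 3.32 bits per decimal digit,
# so 3350 bits is about 1000 decimal digits.
gmpy2.get_context().precision = 3350
print(gmpy2.const_pi())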
Then, the question isn't, "What's the max precision my program can use?"
It's, "How do I write my program so that it'll run until the electricity goes out?"
And that's a whole other problem, but at least it's restricted by your algorithm, not the hardware it runs on.

Is there a way of setting a default precision that differs from double in Python?

I'm aware of Decimal, however I am working with a lot of code written by someone else, and I don't want to go through a large amount of code to change every initialization of a floating point number to Decimal. It would be more convenient if there was some kind of package where I could put SetPrecision(128) or such at the top of my scripts and be off to the races. I suspect no such thing exists but I figured I would ask just in case I'm wrong.
To head off XY Problem comments, I'm solving differential equations which are supposed to be positive invariant, and one quantity which has an equilibrium on the order of 1e-12 goes negative regardless of the error tolerance I specify (using scipy's interface to LSODA).
Yes, but no.
The bigfloat package is a Python wrapper for the GNU MPFR library for arbitrary-precision floating-point reliable arithmetic. The MPFR library is a well-known portable C library for arbitrary-precision arithmetic on floating-point numbers. It provides precise control over precisions and rounding modes and gives correctly-rounded reproducible platform-independent results.
https://pythonhosted.org/bigfloat
You would then need to coerce the builtin float to be bigfloat everywhere, which would likely be non-trivial.
LSODA exposed through scipy.integrate is double precision only.
You might want to look into rescaling your variables, so that the quantity with an equilibrium around 1e-12 becomes closer to unity.
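A sketch of the rescaling idea (the ODE and the SCALE value here are made-up stand-ins, not your actual system):
from scipy.integrate import solve_ivp

SCALE = 1e-12  # rough size of the small equilibrium

def rhs_scaled(t, z):
    # Solve for z = y / SCALE, so the equilibrium sits near z = 1.
    return -(z - 1.0)

sol = solve_ivp(rhs_scaled, (0.0, 10.0), [5.0], rtol=1e-9, atol=1e-12)
y = sol.y * SCALE  # convert back to the original variable
print(y[0, -1])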
EDIT. In the comments, you indicated
As I've stated three times, I am open to rewriting to avoid LSODA
Then what you can try doing is to look over the code of solve_ivp, which is pure python. Feed it with decimals or mpmath high-precision floats. Observe where it fails, look for where it assumes double precision. Rewrite, remove this assumption. Rinse and repeat. Whether it'll work in the end, I don't know. Whether it's worth it, I suspect not, but YMMV.

Calculate the shortest decimal number that is approximated by a given float

The python floating point docs (eg https://docs.python.org/3/tutorial/floatingpoint.html) state
Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction. For example, the numbers 0.1 and 0.10000000000000001 and 0.1000000000000000055511151231257827021181583404541015625 are all approximated by 3602879701896397 / 2 ** 55. Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant eval(repr(x)) == x.
Historically, the Python prompt and built-in repr() function would choose the one with 17 significant digits, 0.10000000000000001. Starting with Python 3.1, Python (on most systems) is now able to choose the shortest of these and simply display 0.1.
Is there a way I can get that shortest representation as a decimal.Decimal (or other exact representation)?
Obviously one way would be decimal.Decimal(repr(0.1)), but I'm wondering if there is something explicit that doesn't rely on the vague "on most systems" caveat and is possibly available as a package that would work with earlier versions of Python.
(Also, functions that do this in other languages may be of interest if there is nothing in python, as this is really a general floating point question)
The following may be useful:
Python provides PyOS_double_to_string, but the code comment says this is only used if _Py_dg_dtoa is not available.
_Py_dg_dtoa is available in the source but I'm not sure if it can be accessed publicly. (In particular, it would be nice if there was a way to 'prove' that the Python interpreter we are using does use this internally.) This function has detailed documentation about the way the string conversion is done, and it explains which flags to use to get the shortest decimal representation. If this were available from the interpreter it might be the best option, as using it directly would give certainty about what was happening.
The implementation of Errol can be found on git and the paper found here has references to many other implementations along with a summary of their pros and cons.
Any one of the float -> string methods mentioned above would be a reasonable way to proceed, as would the standard repr method, but it will require digging into the implementation details to be sure that a particular method guarantees to give you what you are looking for.
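For example, relying on repr (and checking the round trip) looks like this:
from decimal import Decimal

x = 0.1
shortest = Decimal(repr(x))  # shortest repr; relies on Python >= 3.1 behaviour
exact = Decimal(x)           # the full exact binary value of the same float

print(shortest)              # 0.1
print(exact)                 # 0.1000000000000000055511151231257827021181583404541015625
assert float(shortest) == x  # the shortest form round-trips to the same float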
