Performance improvements for Python large integer multiplication

In Python, I'm working on a project that involves multiplying very large numbers: I take the nth multifactorial of x, where x is n 1s in a row.
The whole thing works very efficiently, but I'm spending 80%+ of my computation time calculating the product of the integers for the factorial.
This bottleneck becomes especially noticeable at n = 7, where I effectively hit a brick wall: n = 6 takes under 0.1 seconds, n = 7 takes 7.5 seconds, and n = 8 takes so long that I've stopped it after a few minutes without it completing.
Is there any way I can improve the efficiency of this, more specifically the efficiency of math.prod(arr)?
import argparse
import MyFormatter
import datetime
import math

def first_n_digits(num, n):
    return num // 10 ** (int(math.log(num, 10)) - n + 1)

start = datetime.datetime.now()

parser = argparse.ArgumentParser(
    formatter_class=MyFormatter.MyFormatter,
    description="Calcs x factorial",
    usage="",
)
parser.add_argument("-n", "--number", type=int)
args = parser.parse_args()

if args.number == 1:
    print(1)
    exit()

s = ""
for _ in range(0, args.number):
    s = s + "1"

n = 1
s = int(s)
arr = []
while s > 1:
    arr.append(s)
    s -= args.number

n = math.prod(arr)
fnd = str(first_n_digits(n, 3))
print("{}.{}{}e{}".format(fnd[0], fnd[1], fnd[2], int(math.log10(n))))

end = datetime.datetime.now()
print(end - start)

You don't need an exact product. You just need 3 leading digits and an order of magnitude. You're wasting tremendous amounts of time and memory doing the computation in exact integer arithmetic.
One initial step would be to add up logarithms instead of multiplying integers:
log_prod = 0
while s > 1:
    log_prod += math.log10(s)
    s -= args.number

magnitude = int(log_prod)
normalized = 10**(log_prod - magnitude)
This discards millions of digits of precision you don't need, computing results in approximate floating-point arithmetic that still has enough precision for your use case. normalized is a number between 1 and 10 that has the same leading digits as the full product, and magnitude is the full product's order of magnitude.
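For example (a small sketch, not from the original post), you can format the approximation the same way the original script formats its exact answer:
# normalized is in [1, 10), so two decimal places give the three leading
# digits; magnitude is the base-10 exponent.
print("{:.2f}e{}".format(normalized, magnitude))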
This still has to add up a lot of logarithms as the input size increases (the number of terms grows exponentially with n), taking more time and losing more precision. Further steps might involve using a more sophisticated summation routine (helping with precision, but not runtime), or finding a different way to express the multifactorial that's more amenable to computation, as sketched below.
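For instance (a sketch of that second idea, not from the original answer; s and n are the values from the question's code), the product over the arithmetic progression s, s-n, s-2n, ... can be rewritten with the Gamma function, so its log10 is computable in O(1) via math.lgamma instead of summing one logarithm per term:
import math

def log10_multifactorial(s, n):
    # The while-loop in the question multiplies the k terms
    # s, s-n, s-2n, ..., all > 1, where k = (s - 2)//n + 1.
    # That product equals n**k * Gamma(s/n + 1) / Gamma(s/n - k + 1),
    # so its base-10 log follows directly from math.lgamma.
    k = (s - 2) // n + 1
    x = s / n
    return k * math.log10(n) + (math.lgamma(x + 1) - math.lgamma(x - k + 1)) / math.log(10)

log_prod = log10_multifactorial(1111111, 7)   # the n = 7 case
magnitude = int(log_prod)
normalized = 10 ** (log_prod - magnitude)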


Numerical approximation of forward difference in an interval

How can python be used for numerical finite difference calculation without using numpy?
For example, I want to numerically compute first and second order derivatives of a function at multiple points in an interval, with a step size of 0.05.
Why don't you want to use Numpy? It's a good library and very fast for doing numerical computations because it's written in C (which is generally faster for numerical stuff than pure Python).
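For reference, here is roughly what the same forward differences look like with NumPy (a sketch using np.diff, on the 0..1 grid with step 0.1 used in the example further down):
import numpy as np

step = 0.1
ts = np.linspace(0.0, 1.0, 11)   # 0, 0.1, ..., 1.0
y = ts * ts
dydt = np.diff(y) / step         # first-order forward difference
d2ydt2 = np.diff(dydt) / step    # second derivative via two forward differences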
If you're curious how these methods work and how they look in code here's some sample code:
def linspace(a, b, step):
    if a > b:
        # see if going backwards?
        if step < 0:
            return linspace(b, a, -1*step)[::-1]
        # step isn't negative so no points
        return []
    pt = a
    res = [pt]
    # stop once the next point would pass b (small tolerance for float rounding)
    while pt + step <= b + 1e-12:
        pt += step
        res.append(pt)
    return res

def forward(data, step):
    if not data:
        return []
    res = []
    i = 0
    while i + 1 < len(data):
        delta = (data[i+1] - data[i]) / step
        res.append(delta)
        i += 1
    return res

# example usage
size = 0.1
ts = linspace(0, 1, size)
y = [t*t for t in ts]
dydt = forward(y, size)
d2ydt2 = forward(dydt, size)
Note: this still uses normal floating point numbers, so odd rounding errors will still happen, because some numbers don't have an exact binary representation.
Another library to check out is mpmath which has a lot of cool math functions like integration and special functions AND it allows you to specify how much precision you want. Of course using 100 digits of precision is going to be a lot slower than normal floats, but it is still a very cool library!
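For example (a tiny sketch, assuming mpmath is installed), mpmath can take a derivative numerically at whatever working precision you ask for:
from mpmath import mp, mpf, diff

mp.dps = 50                                # work with 50 decimal digits
print(diff(lambda t: t*t, mpf("0.5")))     # derivative of t^2 at 0.5, ~1.0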

When does python start using a different algorithm for big multiplication?

I'm currently in an algorithms class and was interested to see which of two methods of multiplying a list of large numbers gives the faster runtime. What I found was that the recursive multiply performs about 10x faster. For the code below, I got t_sim=53.05s and t_rec=4.73s. I did some other tests and they all seemed to be around the 10x range.
Additionally, you could put the values from the recursive multiply into a tree and reuse them to even more quickly compute multiplications of subsets of the list.
I did a theoretical runtime analysis, and both are O(n^2) using standard multiplication, but when the Karatsuba algorithm is used, the complexity goes down to n^log_2(3).
Every multiply in simple_multiply should have runtime i * 1. Summing over i = 1...n, we get an arithmetic series, and Gauss's formula gives n*(n+1)/2 = O(n^2).
For the second one, the cost of a single multiply at recursion depth d is (2^d)^2, but only n*2^-d values need to be multiplied at that depth. The levels form a geometric series in which the runtime of each level is n*2^d, with a final depth of log_2(n). The sum of the geometric series is n * (1 - 2^log_2(n)) / (1 - 2) = n*(n-1) = O(n^2). Using the Karatsuba algorithm, you can get O(n^log_2(3)) by the same method.
If the code were using the Karatsuba algorithm, then the speedup would make sense, but what doesn't seem to make sense is the linear relationship between the two runtimes, making it seem like Python is using standard multiplication, which according to Wikipedia is only faster below roughly 500 bits. (The final product in the code below has 2^23 bits; it is literally a megabyte long.)
import random
import time

def simple_multiply(values):
    a = 1
    for val in values:
        a *= val
    return a

def recursive_multiply(values):
    if len(values) == 1:
        return values[0]
    temp = []
    i = 0
    while i + 1 < len(values):
        temp.append(values[i] * values[i+1])
        i += 2
    if len(values) % 2 == 1:
        temp.append(values[-1])
    return recursive_multiply(temp)

def test(func, values):
    t1 = time.time()
    func(values)
    print(time.time() - t1)

def main():
    n = 2**11
    scale = 2**12
    values = [random.getrandbits(scale) for i in range(n)]
    test(simple_multiply, values)
    test(recursive_multiply, values)

if __name__ == '__main__':
    main()
Both versions of the code have the same number of multiplications, but in the simple version each multiplication is ~2000 bits long on average.
In the second version n/2 multiplications are 24 bits long, n/4 are 48 bits long, n/8 are 96 bits long, etc... The average length is only 48 bits.
There is something wrong with your assumption. You assume that each multiplication between the different ranks should take the same time, for instance that len(24)*len(72) ≈ len(48)*len(48). But that's not true, as the following snippets show:
%%timeit
random.getrandbits(2**14)*random.getrandbits(2**14)*random.getrandbits(2**14)*random.getrandbits(2**14)
1000 loops, best of 3: 1.48 ms per loop
%%timeit
(random.getrandbits(2**14)*random.getrandbits(2**14))*(random.getrandbits(2**14)*random.getrandbits(2**14))
1000 loops, best of 3: 1.23 ms per loop
The difference is consistent even at this small scale.

Time Complexity - Codility - Ladder - Python

The question is available here. My Python code is
def solution(A, B):
    if len(A) == 1:
        return [1]
    ways = [0] * (len(A) + 1)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = ways[i-1] + ways[i-2]
    result = [1] * len(A)
    for i in xrange(len(A)):
        result[i] = ways[A[i]] & ((1 << B[i]) - 1)
    return result
The time complexity detected by the system is O(L^2), and I can't see why. Thank you in advance.
First, let's show that the runtime genuinely is O(L^2). I copied a section of your code, and ran it with increasing values of L:
import time
import matplotlib.pyplot as plt

def solution(L):
    if L == 0:
        return
    ways = [0] * (L + 5)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = ways[i-1] + ways[i-2]

points = []
for L in xrange(0, 100001, 10000):
    start = time.time()
    solution(L)
    points.append(time.time() - start)

plt.plot(points)
plt.show()
The resulting graph (not reproduced here) shows clearly quadratic growth.
To understand why this is O(L^2) when the obvious "time complexity" calculation suggests O(L), note that "time complexity" is not a well-defined concept on its own, since it depends on which basic operations you're counting. Normally the basic operations are taken for granted, but in some cases you need to be more careful. Here, if you count additions as a basic operation, then the code is O(L). However, if you count bit (or byte) operations, then the code is O(L^2). Here's the reason:
You're building an array of the first L Fibonacci numbers. The length (in digits) of the i'th Fibonacci number is Theta(i). So ways[i] = ways[i-1] + ways[i-2] adds two numbers with approximately i digits, which takes O(i) time if you count bit or byte operations.
This observation gives you an O(L^2) bit operation count for this loop:
for i in xrange(3, len(ways)):
    ways[i] = ways[i-1] + ways[i-2]
In the case of this program, it's quite reasonable to count bit operations: your numbers grow without bound as L increases, and adding huge numbers takes clock time proportional to their length rather than O(1).
You can fix the complexity of your code by computing the Fibonacci numbers mod 2^32 -- since 2^32 is a multiple of 2^B[i]. That will keep a finite bound on the numbers you're dealing with:
for i in xrange(3, len(ways)):
    ways[i] = (ways[i-1] + ways[i-2]) & ((1 << 32) - 1)
There are some other issues with the code, but this will fix the slowness.
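Putting it together, a sketch of the function with just that change applied (keeping the original Python 2 / xrange structure) looks like this:
def solution(A, B):
    if len(A) == 1:
        return [1]
    mask = (1 << 32) - 1                 # 2**32 - 1
    ways = [0] * (len(A) + 1)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = (ways[i-1] + ways[i-2]) & mask   # Fibonacci mod 2**32
    return [ways[A[i]] & ((1 << B[i]) - 1) for i in xrange(len(A))]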
I've taken the relevant parts of the function:
def solution(A, B):
    for i in xrange(3, len(A) + 1): # replaced ways for clarity
        # ...
    for i in xrange(len(A)):
        # ...
    return result
Observations:
A is an iterable object (e.g. a list)
You're iterating over the elements of A in sequence
The behavior of your function depends on the number of elements in A, making it O(A)
You're iterating over A twice, meaning 2 O(A) -> O(A)
On point 4, since 2 is a constant factor, 2 O(A) is still in O(A).
I think the page is not correct in its measurement. Had the loops been nested, then it would've been O(A²), but the loops are not nested.
This short sample is O(N²):
def process_list(my_list):
    for i in range(0, len(my_list)):
        for j in range(0, len(my_list)):
            # do something with my_list[i] and my_list[j]
            pass
I've not seen the code the page is using to 'detect' the time complexity of the code, but my guess is that the page is counting the number of loops you're using without understanding much of the actual structure of the code.
EDIT1:
Note that, based on this answer, the time complexity of the len function is actually O(1), not O(N), so the page is not incorrectly trying to count its use for the time-complexity. If it were doing that, it would've incorrectly claimed a larger order of growth because it's used 4 separate times.
EDIT2:
As #PaulHankin notes, asymptotic analysis also depends on what's considered a "basic operation". In my analysis, I've counted additions and assignments as "basic operations" by using the uniform cost method, not the logarithmic cost method, which I did not mention at first.
Simple arithmetic operations are almost always treated as basic operations. This is what I see most commonly being done, unless the algorithm being analysed is itself an implementation of a basic operation (e.g. the time complexity of a multiplication function), which is not the case here.
The only reason why we have different results appears to be this distinction. I think we're both correct.
EDIT3:
While an algorithm in O(N) is also in O(N²), I think it's reasonable to state that the code is still in O(N), because at the level of abstraction we're using, the computational steps that seem more relevant (i.e. are more influential) are those in the loop, as a function of the size of the input iterable A, not of the number of bits used to represent each value.
Consider the following algorithm to compute a^n:
def function(a, n):
    r = 1
    for i in range(0, n):
        r *= a
    return r
Under the uniform cost method, this is in O(N), because the loop is executed n times, but under logarithmic cost method, the algorithm above turns out to be in O(N²) instead due to the time complexity of the multiplication at line r *= a being in O(N), since the number of bits to represent each number is dependent on the size of the number itself.
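You can see the logarithmic-cost effect directly with a small timing sketch of my own (not from the answer): doubling n much more than doubles the runtime, because r keeps getting longer.
import time

def power(a, n):
    r = 1
    for _ in range(n):
        r *= a        # cost grows with the current size of r
    return r

for n in (2000, 4000, 8000):
    t0 = time.time()
    power(12345678901234567890, n)
    print(n, round(time.time() - t0, 4))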
The Codility Ladder challenge is best solved as follows. It is super tricky.
We first compute the Fibonacci sequence for the first L+2 numbers. The first two numbers are used only as fillers, so we have to index the sequence as A[idx]+1 instead of A[idx]-1. The second step is to replace the modulo operation by removing all but the n lowest bits.

Capturing all data in non-whole train, test, and validate splits

Just wondering if a better solution exists for this sort of problem.
We know that for an X/Y percentage split of an even number we can get an exact split of the data - for example for data size 10:
10 * .6 = 6
10 * .4 = 4
10
Splitting data this way is easy, and we can guarantee we have all of the data and nothing is lost. However where I am struggling is on less friendly numbers - take 11
11 * .6 = 6.6
11 * .4 = 4.4
11
However we can't index into an array at i = 6.6, for example. So we have to decide how to do this. If we take JUST the integer portion we lose 1 data point -
First set = 0..6
Second set = 6..10
This would be the same case if we floored the numbers.
However, if we take the ceiling of the numbers:
First set = 0..7
Second set = 7..12
And we've read past the end of our array.
This gets even worse when we throw in a 3rd or 4th split (30,30,20,20 for example).
Is there a standard splitting procedure for these kinds of problems? Is data loss accepted? It seems like data loss would be unacceptable for dependent data, such as time series.
Thanks!
EDIT: The values .6 and .4 are chosen by me. They could be any two numbers that sum to 1.
First of all, notice that your problem is not limited to odd-sized arrays as you claim, but any-sized arrays. How would you make the 56%-44% split of a 10 element array? Or a 60%-40% split of a 4 element array?
There is no standard procedure. In many cases, programmers do not care that much about an exact split and they either do it by flooring or rounding one quantity (the size of the first set), while taking the complementary (array length - rounded size) for the other (the size of the second).
This might be OK in most cases, when it is a one-off calculation and accuracy is not required. You have to ask yourself what your requirements are. For example: are you taking thousands of 10-sized arrays and each time splitting them 56%-44%, doing some calculations and returning a result? You have to ask yourself what accuracy you want. Do you care if your result effectively ends up being
the 60%-40% split or the 50%-50% split?
As another example imagine that you are doing a 4-way equal split of 25%-25%-25%-25%. If you have 10 elements and you apply the rounding technique you end up with 3,3,3,1 elements. Surely this will mess up your results.
If you do care about all these inaccuracies, then the first step is to consider whether you can adjust either the array size and/or the split ratio(s).
If these are set in stone then the only way to have an accurate split of any ratios of any sized array is to make it probabilistic. You have to split multiple arrays for this to work (meaning you have to apply the same split ratio to same-sized arrays multiple times). The more arrays the better (or you can use the same array multiple times).
So imagine that you have to make a 56%-44% split of a 10 sized array. This means that you need to split it in 5.6 elements and 4.4 elements on the average.
There are many ways you can achieve a 5.6 element average. The easiest one (and the one with the smallest variance in the sequence of tries) is to have 60% of the time a set with 6 elements and 40% of the time a set that has 5 elements.
0.6*6 + 0.4*5 = 5.6
In terms of code this is what you can do to decide on the size of the set each time:
import random
array_size = 10
first_split = 0.56
avg_split_size = array_size * first_split
floored_split_size = int(avg_split_size)
if avg_split_size > floored_split_size:
if random.uniform(0,1) > avg_split_size - floored_split_size:
this_split_size = floored_split_size
else:
this_split_size = floored_split_size + 1
else:
this_split_size = avg_split_size
You could make the code more compact, I just made an outline here so you get the idea. I hope this helps.
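For instance, the branch above collapses to a one-liner (a sketch of the same logic, relying on True counting as 1):
# floored size, plus 1 with probability equal to the fractional part
this_split_size = floored_split_size + (random.uniform(0, 1) < avg_split_size - floored_split_size)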
Instead of using ceil() or floor(), use round(). For example:
>>> round(6.6)
7.0
The value returned will be of float type. For getting the integer value, type-cast it to int as:
>>> int(round(6.6))
7
This will be the value of your first split. For getting the second split, calculate it using len(data) - split1_val. This is applicable in the case of a 2-split problem.
In the case of 3 splits, take the round() value of two splits and take the value of the 3rd split as len(my_list) - val_split_1 - val_split_2.
In a generic way, for N splits:
Take the round() value of the first N-1 splits. For the last value, use len(data) minus the sum of those N-1 round() values.
where len() gives the length of the list.
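A small sketch of that recipe (my own helper, not from the answer; the ratios are assumed to sum to 1):
def split_sizes(length, ratios):
    # round the first N-1 splits, give whatever is left to the last one
    sizes = [int(round(length * r)) for r in ratios[:-1]]
    sizes.append(length - sum(sizes))
    return sizes

print(split_sizes(11, [0.6, 0.4]))             # [7, 4]
print(split_sizes(10, [0.3, 0.3, 0.2, 0.2]))   # [3, 3, 2, 2]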
Let's first consider just splitting the set into two pieces.
Let n be the number of elements we are splitting, and p and q be the proportions, so that
p+q == 1
I assert that the parts after the decimal point will always sum to either 1 or 0, so we should use floor on one and ceil on the other, and we will always be right.
Here is a function that does that, along with a test. I left the print statements in but they are commented out.
import math

def simpleSplitN(n, p, q):
    "split n into proportions p and q and return indices"
    np = math.ceil(n*p)
    nq = math.floor(n*q)
    #print n, sum([np, nq]) #np and nq are the proportions
    return [0, np] #these are the indices we would use

#test for simpleSplitN
for i in range(1, 10):
    p = i/10.0
    q = 1 - p
    simpleSplitN(37, p, q)
For the mathematically inclined, here is the proof that the decimal proportions will sum to 1
-----------------------
We can express p*n as n/(1/p), and so by the division algorithm we get integers k and r with
n == k*(1/p) + r, where 0 <= r < (1/p)
Thus r/(1/p) == p*r < 1.
We can do exactly the same for q, getting
q*r < 1 (this is a different r)
It is important to note that p*r and q*r are exactly the parts after the decimal point of p*n and q*n, since p*n == k + p*r (and likewise for q).
Now we can add them together (with subscripts to keep the two r's and k's apart):
0 <= p*r_1 < 1
0 <= q*r_2 < 1
=> 0 <= p*r_1 + q*r_2 == p*n + q*n - k_1 - k_2 == n - k_1 - k_2 < 2
But by closure of the integers, n - k_1 - k_2 is an integer, and so
0 <= n - k_1 - k_2 < 2
means that p*r_1 + q*r_2 must be either 0 or 1. It will only be 0 in the case that our n is divided evenly.
Otherwise we can now see that our fractional parts will always sum to 1.
-----------------------
We can do a very similar (but slightly more complicated) proof for splitting n into an arbitrary number (say N) parts, but instead of them summing to 1, they will sum to an integer less than N.
Here is the general function, it has uncommented print statements for verification purposes.
import math
import random

def splitN(n, c):
    """Compute indices that can be used to split
    a dataset of n items into a list of proportions c
    by first dividing them naively and then distributing
    the decimal parts of said division randomly
    """
    nc = [n*i for i in c]
    nr = [n*i - int(n*i) for i in c] #the decimal parts
    N = int(round(sum(nr))) #sum of all decimal parts
    print N, nc
    for i in range(0, len(nc)):
        nc[i] = math.floor(nc[i])
    for i in range(N): #randomly distribute leftovers
        nc[random.randint(1, len(nc)) - 1] += 1
    print n, sum(nc) #nc now contains the proportions
    out = [0] #compute a cumulative sum
    for i in range(0, len(nc) - 1):
        out.append(out[-1] + nc[i])
    print out
    return out

#test for splitN with various proportions
c = [.1, .2, .3, .4]
c = [.2, .2, .2, .2, .2]
c = [.3, .2, .2, .3]
for n in range(10, 40):
    print splitN(n, c)
If we have leftovers, we will never get an even split, so we distribute them randomly, like @Thanassis said. If you don't like the dependency on random, then you could just add them all at the beginning or at even intervals.
Both of my functions output indices but they compute proportions and thus could be slightly modified to output those instead per user preference.

How to multiply a super large number with a super small number in python?

I'm doing some probability calculation.
In one of my tasks, I need to multiply the number of combinations for choosing 8000 samples from 10000 items by 0.8**8000.
The combination number is a huge long integer, and with the help of numpy, I get the result of 0.8**8000 as 5.2468172239242176864e-776.
But when I try to multiply these two numbers, I get: [9] 34845 segmentation fault ipython -i.
How can I do such a multiplication, then?
PS: This is a piece of my code
import numpy
d2 = numpy.float128(0.8) ** 8000
d1 = 165555575235503558460892983752748337696863078099010763950122624527927836980322780662408249953188062227721112100054260160204180655980717428736444016909193193353770953722788106404786520413339850951599929567643032803416164290936680088121145665954509987077953596641237451927908536624592636591471456488142060812180933761408708169972797751139799352908109763166895772281109195968567911923343187466596002627570139321755043803267091330804414889831229832744256038117150720178689066894068507531026417815624234453195871008113238128934831837842040515600131726096039123279876153916504647241693083829553081901075278042326502699324012014817969085443550523855284341221708045253558716789811929298590803855947461554713178815399150688529048306222786951038548880400191620565711291586700534540755526276938422405001345270278335726581375322976014611332999126216550500951669985289322635729053541565465940744524663726205818866513444952048185208697438054246674199211750006230637806394882672053335493831407089830994135058867370833787098758113596190447219426121568324685764151601296948654893782399960327514764114467176417125060133454019708700782282480571935020898204763471121684913190735908414301826140125010936910161942130277906874552721346626800201093026689035996876035329180150478191582393837824731994055511844267891121846403164857127885959745644323971338513739214928092232132691519007718752719466750891748327404893783451436251805894736392433617289459646429204124129760273396235033220480921175386059331059354409267348067375581516003852060360378571075522650956157791058846993826792047806030332676423336065499519953076910418838626376480202828151673161942289092221049283902410699951912366163469099917310239336454637062482599733606299329923589714875696509548029668358723465427602758225427644633549944802010973352599970041918971524450218727345622721744933664742499521140235707102217164259438766026322532351208348119475549696983427008567651685921355966036780080415723688044325099562693124488758728102729947753752228785786200998322978801432511608341549234067324280214361346940194251357867820535466891356019219904248859277399657389914429390105240751239760865282709465029549690591863591028864648910033430400L
print d1 * d2
When multiplying an extremely large number by an extremely small number, working with floats can introduce huge inaccuracies. In your case, the magnitude of the numbers is causing overflow errors, so you have bigger problems than just inaccuracies!
Whenever you find yourself in this situation, it can be useful to first check if it is possible to stay in the integer domain, and "massage" the numbers a little first. In your case, it is possible and I'll explain how below.
One operand of the multiplication, the extremely large number, is the number of ways to choose 8000 samples from 10000 items. Use the closed-form equation for the number of combinations, where your population size n is 10000 and the subset size r is 8000. The exclamation mark (!) here is the factorial, which you can find as math.factorial in Python.
C(n,r) = n! / (r! * (n - r)!)
The other operand, 0.8 ** 8000, is the extremely small number, which by the laws of exponents is equal to:
8**8000 / 10**8000
So when we multiply these two numbers together, the answer we want is:
10000! * 8**8000
--------------------------
8000! * 2000! * 10**8000
Let's call this number x and then take logarithms of both sides. Working in the log domain will transform multiplications into additions, and divisions into subtractions, making things more manageable.
from math import log, factorial
numerator = log(factorial(10000)) + 8000*log(8)
denominator = log(factorial(8000)) + log(factorial(2000)) + 8000*log(10)
log_x = numerator - denominator
Now these numbers are of a magnitude that is usable in python.
You will find that log_x is equal to approximately 3214. You now only need to observe that exp(log_x) == x to find your answer. It is a very large, but finite, number.
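If you want the answer in scientific-notation form, you can pull a mantissa and exponent straight out of log_x (continuing from the snippet above; note log_x is a natural log, so convert to base 10 first):
log10_x = log_x / log(10)      # convert natural log to base-10
exponent = int(log10_x)
mantissa = 10 ** (log10_x - exponent)
print("{:.4f}e{}".format(mantissa, exponent))   # roughly 8.686e1395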
Arbitrary-precision integers aren't really the way to go for this problem, since you're destroying any precision you had by calling log, so I'll just let scipy.special.gammaln speak for itself (but see my edit below):
from math import log, factorial
from scipy.special import gammaln

def comp_integral(n, r, p, q):
    numerator = log(factorial(n)) + r*log(8)
    denominator = log(factorial(r)) + log(factorial(n-r)) + r*log(q)
    return numerator - denominator

def comp_gamma(n, r, p, q):
    comb = gammaln(n+1) - gammaln(n-r+1) - gammaln(r+1)
    expon = r*(log(p) - log(q))
    return comb + expon
In [220]: comp_integral(10000, 8000, 8, 10)
Out[220]: 3214.267963130871
In [221]: comp_gamma(10000, 8000, 8, 10)
Out[221]: 3214.2679631308811
In [222]: %timeit comp_integral(10000, 8000, 8, 10)
10 loops, best of 3: 80.3 ms per loop
In [223]: %timeit comp_gamma(10000, 8000, 8, 10)
100000 loops, best of 3: 11.4 µs per loop
Note that the outputs are identical up to 14 digits, but the gammaln version is almost 8000 times faster. If you're going to do this a lot, this will count.
EDIT: What gammaln does is to compute the natural log of the gamma function. The gamma function can be thought of as a generalization of factorial, in that factorial(n) == gamma(n+1). So comb(n,r) == gamma(n+1)/(gamma(n-r+1)*gamma(r+1)). Then taking logs turns it into the form above.
Gamma also has values for fractional inputs and for negative numbers. That doesn't really matter here though.
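If you'd rather not pull in scipy, the same trick works with just the standard library, since math.lgamma is the natural log of the gamma function (a sketch mirroring comp_gamma above):
from math import lgamma, log

def comp_lgamma(n, r, p, q):
    comb = lgamma(n + 1) - lgamma(n - r + 1) - lgamma(r + 1)   # log of C(n, r)
    return comb + r * (log(p) - log(q))

print(comp_lgamma(10000, 8000, 8, 10))   # ~3214.2679..., same as above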
I maintain the gmpy2 library and it can do this very easily.
>>> import gmpy2
>>> gmpy2.comb(10000,8000) * gmpy2.mpfr('0.8')**8000
mpfr('8.6863984366232171e+1395')
Building off of wim's great answer, you can also store this number as a Fraction by building a list of prime factors, doing any cancellations and multiplying everything together.
I've included a rather naive implementation for this problem. It returns a fraction in less than a minute as is but if you implement slightly smarter factorization you can surely make it even faster.
from collections import Counter
from fractions import Fraction
import gmpy2 as gmpy

def get_factors(n):
    factors = Counter()
    factor = 1
    while n != 1:
        factor = int(gmpy.next_prime(factor))
        while not n % factor:
            n //= factor
            factors[factor] += 1
    return factors

factors = Counter()

# multiply by 10000!
for i in range(10000):
    factors += get_factors(i+1)

# multiply by 8^8000
factors[2] += 3*8000

# divide by 2000!
for i in range(2000):
    factors -= get_factors(i+1)

# divide by 8000!
for i in range(8000):
    factors -= get_factors(i+1)

# divide by 10^8000
factors[2] -= 8000
factors[5] -= 8000

# build Fraction
numer = 1
denom = 1
for f, c in factors.items():
    if c > 0:
        numer *= f**c
    elif c < 0:
        denom *= f**-c
frac = Fraction(numer, denom)
Looks like it's around 8.686*10^1395
