Faster bitwise modulus for Lucas-Lehmer primality test - python

The Lucas-Lehmer primality test takes a prime p and determines whether the Mersenne number 2**p - 1 is itself prime. One of the bottlenecks is the modulus operation in the calculation of (s**2 - 2) % (2**p - 1).
Using bitwise operations can speed things up considerably (see the L-L link); the best I have so far is:
def mod(n, p):
    """Return the value of (s**2 - 2) % (2**p - 1), where n = s**2 - 2."""
    Mp = (1 << p) - 1
    while n.bit_length() > p:  # for Python < 2.7 use len(bin(n)) - 2 > p
        n = (n & Mp) + (n >> p)
    if n == Mp:
        return 0
    else:
        return n
A simple test case is where p has 5-9 digits and s has 10,000+ digits (the exact values are unimportant). Solutions can be tested by checking that mod(s**2 - 2, p) == (s**2 - 2) % (2**p - 1). Keep in mind that p - 2 iterations of this modulus operation are required in the L-L test, each with an exponentially growing s, hence the need for optimization.
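For reference, a quick harness in the spirit of that test (the particular p and s below are illustrative stand-ins, not values from the question):

import random

p = 86243                      # a 5-digit exponent, chosen arbitrarily
s = random.getrandbits(40000)  # roughly a 12,000-digit stand-in for s
assert mod(s**2 - 2, p) == (s**2 - 2) % ((1 << p) - 1)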
Is there a way to speed this up further, using pure Python (Python 3 included)? Is there a better way?

The best improvement I could find was removing Mp = (1<<p) - 1 from the modulus function altogether, and pre-calculating it in the L-L function before starting the iterations of the L-L test. Using while n > Mp: instead of while n.bit_length() > p: also saved some time.
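A minimal sketch of what that looks like in context (the lucas_lehmer driver below is my own illustration, not code from the question; it assumes p is an odd prime):

def lucas_lehmer(p):
    Mp = (1 << p) - 1              # pre-calculated once, outside the hot loop
    s = 4
    for _ in range(p - 2):
        n = s * s - 2
        while n > Mp:              # cheaper check than n.bit_length() > p
            n = (n & Mp) + (n >> p)
        s = 0 if n == Mp else n
    return s == 0                  # 2**p - 1 is prime iff the residue is 0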

In the case where n is much larger than 2^p, you can avoid some quadratic-time pain by doing something like this:
def mod1(n, p):
    # repeatedly cut the length of n roughly in half while it is
    # much longer than p bits
    while n.bit_length() > 3*p:
        k = n.bit_length() // p
        k1 = k >> 1
        k1p = k1 * p
        M = (1 << k1p) - 1
        n = (n & M) + (n >> k1p)
    Mp = (1 << p) - 1
    while n.bit_length() > p:
        n = (n & Mp) + (n >> p)
    if n == Mp: return 0
    return n
(Note: the criterion for halving the length of n rather than shaving p bits off it, and the exact choice of k1, are both slightly off, but it doesn't matter so I haven't bothered fixing them.)
If I take p=12345 and n=9**200000 (yes, I know p then isn't prime, but that doesn't matter here) then this is about 13 times faster.
Unfortunately this will not help you, because in the L-L test n is never bigger than about (2^p)^2. Sorry.

Related

efficiency graph size calculation with power function

I am looking to improve the following simple function (written in Python), which calculates the maximum size of a specific graph:
def max_size_BDD(n):
    i = 1
    size = 2
    while i <= n:
        size += min(pow(2, i - 1), pow(2, pow(2, n - i + 1)) - pow(2, pow(2, n - i)))
        i += 1
        print(str(i) + " // " + str(size))
    return size
If I give it n = 45 as input, the process gets killed (probably because it takes too long; I don't think it is a memory thing, right?). How can I redesign this algorithm so that it can handle larger inputs?
My proposal: while the original function starts to run into trouble at around n = 10, I hit practically no limitations (even for n = 100000000, I stay below 1 s).
def exp_base_2(n):
    return 1 << n

def max_size_bdd(n):
    # find the i at which the min branch switches
    start_i = n
    while exp_base_2(n - start_i + 1) < start_i:
        start_i -= 1
    # evaluate all terms up to that point
    size = 1 + 2 ** start_i
    # evaluate the remaining terms (in an uncritical range of n - i)
    for i in range(start_i + 1, n + 1):
        val = exp_base_2(exp_base_2(n - i))
        size += val * (val - 1)
        print(f"{i} // {size}")
    return size
Remarks:
Core idea: avoid computing the large powers of 2, since the min makes them unnecessary.
I did all this in a rush; maybe I can add more explanation later if anyone is interested. I could also do a more decent verification of the new implementation (a quick sanity check is sketched below).
The effect of exp_base_2 should be negligible after doing the math above to optimize the original calculation; I applied that micro-optimization before going into the analysis.
Maybe a complete closed-form solution is possible; I did not invest the time for further investigation.
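For that sanity check, something like the following (my own addition) compares both implementations on small n, where the original is still feasible:

# compare the two implementations where the original is fast enough
for n in range(1, 12):
    a = max_size_BDD(n)
    b = max_size_bdd(n)
    print(n, a == b, a, b)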

Create an algorithm whose recurrence is T(n) = T(n/3) + T(n/4) + O(n^2)

How do I go about problems like this one: T(n) = T(n/3) + T(n/4) + O(n^2)
I am able to use two for loops which will give me the O(n^2), right?
To interpret the equation, read it in English as: "the running time for an input of size n equals the running time for an input of size n/3, plus the running time for an input of size n/4, plus a time proportional to n^2".
Writing code that runs in time proportional to n^2 is possible using nested loops, for example, though it's simpler to write a single loop like for i in range(n ** 2): ....
Writing code that runs in time equal to the time it takes for the algorithm with an input of size n/3 or n/4 is even easier - just call the algorithm recursively with an input of that size. (Don't forget a base case for the recursion to terminate on.)
Putting it together, the code could look something like this:
def nonsense_algorithm(n):
    if n <= 0:
        print('base case')
    else:
        nonsense_algorithm(n // 3)
        nonsense_algorithm(n // 4)
        for i in range(n ** 2):
            print('constant amount of text')
I've used integer division (rounding down) because I don't think it makes sense otherwise.
Here, we would use for and range(start, stop, step), maybe with a simple-to-understand algorithm, similar to:
def algorithm_n2(n: int) -> int:
    """Returns an integer for an O(n^2) addition algorithm

    :n: must be a positive integer
    """
    if n < 1:
        return False
    output = 0
    for i in range(n):
        for j in range(n):
            output += n * 2
    for i in range(0, n, 3):
        output += n / 3
    for i in range(0, n, 4):
        output += n / 4
    return int(output)

# Test
if __name__ == "__main__":
    print(algorithm_n2(24))
Using print() inside a method is not a best practice.
Output
27748

Recursive function for finding combinations without hitting the built-in limit in python

I have put together a function for finding combinations using recursion without hitting the built-in recursion limit in Python. For example, you can calculate choose(1000, 500).
This is how it looks right now:
def choose(n, k):
    if not k:
        return 1
    elif n < k:
        return 0
    else:
        return ((n + 1 - k) / k) * choose(n, k - 1)
It works exactly how I want it to work: if k is 0 it returns 1, and if n is less than k it returns 0 (this is according to the mathematical definitions I found on Wikipedia). However, the problem is that I don't quite understand the last row (I found it while browsing the web). Before the last row I'm using at the moment, this was the last row I had in the function:
return choose(n-1,k-1) + choose(n-1, k)
which I also found on Wikipedia (though I don't think I understand that one 100% either). But it would always result in an error because of Python's built-in recursion limit, whereas the new line does not. I understand that the new line is much more efficient, because, for example, it doesn't split the problem into two subproblems.
So again.. what I'm asking is if there are any kind souls out there that could explain (in an understandable manner) how this line of code works in the function:
return ((n + 1 - k) / k) * choose(n, k - 1)
You would first need to know how the combination C(n, k) is defined. The formula for C(n, k) is:

    C(n, k) = n! / (k! * (n - k)!)

or, equivalently:

    C(n, k) = (n * (n - 1) * ... * (n - k + 1)) / k!

which can be reformed into the recursive expression:

    C(n, k) = ((n + 1 - k) / k) * C(n, k - 1)

which is what you implemented.
The second implementation is Pascal's rule. A naive recursive implementation would be very slow (and could potentially overflow the stack, yes). A more efficient implementation stores each C(n, k) in a two-dimensional array and calculates the values in order, as sketched below.
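A minimal sketch of that tabulated version (my own illustration of the idea, with a hypothetical name):

def choose_table(n, k):
    # C[i][j] holds C(i, j); each row is filled from the previous one
    # using Pascal's rule C(i, j) = C(i-1, j) + C(i-1, j-1)
    C = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        C[i][0] = 1
        for j in range(1, min(i, k) + 1):
            C[i][j] = C[i - 1][j] + C[i - 1][j - 1]
    return C[n][k]

This avoids recursion entirely, at the cost of O(n * k) time and space.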
Spoiler: the bottom line will be that you should use the closed form n! / (k! (n - k)!).
In many other languages, the solution would be to make your function tail-recursive, although Python does not support this kind of optimization. Thus implementing a recursive solution is simply not the best option here.
You could increase the maximal recursion depth with sys.setrecursionlimit, but this is not optimal.
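For completeness (and not recommended as a real fix), raising the limit looks like this:

import sys
sys.setrecursionlimit(10000)  # the default limit is usually 1000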
An improvement would be to compute n-choose-k with iteration.
def choose(n, k):
    if n < k:
        return 0
    ans = 1
    while k > 0:
        ans *= (n + 1 - k) / k
        k -= 1
    return ans
However, the above will accumulate error due to float arithmetic. The very best approach is thus to use the closed form of n-choose-k with integer arithmetic.
from math import factorial

def choose(n, k):
    # integer division keeps the result exact and avoids float
    # overflow for large arguments such as choose(1000, 500)
    return factorial(n) // (factorial(k) * factorial(n - k))
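(For what it's worth, Python 3.8+ exposes this directly as math.comb(n, k), which also returns an exact integer.)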

Prime number generator crashes from memory error if there are too many numbers in array

I have a prime number generator; I was curious to see how small and how fast I could make a prime number generator based on optimizations and such:
from math import sqrt

def p(n):
    if n < 2: return []
    s = [True]*(((n/2)-1+n%2)+1)
    for i in range(int(sqrt(n)) >> 1):
        if not s[i]: continue
        for j in range( (i**i+(3*i) << 1) + 3, ((n/2)-1+n%2), (i<<1)+3): s[j] = False
    q = [2]; q.extend([(i<<1) + 3 for i in range(((n/2)-1+n%2)) if s[i]]); return len(q), q

print p(input())
The generator works great! It is super fast; feel free to try it out. However, if you input numbers greater than 10^9 or 10^10 (I think), it will crash with a memory error. I can't figure out how to expand the memory it uses so that it can take as much as it needs. Any advice would be greatly appreciated!
My question is very similar to this one, but this is Python, not C.
EDIT: This is one of the memory related tracebacks I get for trying to run 10^9.
python prime.py
1000000000
Traceback (most recent call last):
File "prime.py", line 9, in <module>
print p(input())
File "prime.py", line 7, in p
for j in range( (i**i+(3*i) << 1) + 3, ((n/2)-1+n%2), (i<<1)+3): s[j] = False
MemoryError
The problem is in line 7:
for j in range( (i**i+(3*i) << 1) + 3, ((n/2)-1+n%2), (i<<1)+3): s[j] = False
especially this part: i**i
1000000000^1000000000 is a number roughly 9 * 10^9 digits long. Storing it takes multiple GB, if not TB (WolframAlpha couldn't even calculate it anymore).
I know that i is at most the square root of n, but with numbers that large, that doesn't make a big difference.
You would have to split this calculation into smaller parts, if possible, and save the intermediate results to a hard drive. That makes the process slow, but doable.
First of all, there is a problem since the generator says that numbers like 33, 35 and 45 are prime.
Other than that, there are several structures taking up memory here:
s = [True]*(((n/2)-1+n%2)+1)
A list takes up several bytes per element, so for n = 1 billion the s array consumes gigabytes.
range(...) creates a list and then iterates over the elements. Use xrange(...) instead where possible.
Converting range() to xrange() has pitfalls - e.g. see this SO answer:
OverflowError Python int too large to convert to C long
A better implementation of s is to use a Python integer as a bit-array, which has a density of 8 elements per byte. Here is a translation between using a list and using an integer:

    list version                      bit-array version
    ----------------------------      -------------------------
    s = [True]*(((n/2)-1+n%2)+1)      t = (1 << (n/2)+1) - 1
    s[i]                              t & (1 << i)
    not s[i]                          not (t & (1 << i))
    s[j] = False                      m = 1 << j
                                      if t & m: t ^= m
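As a toy check (my own addition, Python 2 style to match the rest) that the two representations stay in sync:

n = 20
s = [True]*(((n/2)-1+n%2)+1)
t = (1 << (n/2)+1) - 1
s[4] = False                  # list version
m = 1 << 4                    # bit-array version of the same update
if t & m: t ^= m
assert all(bool(t & (1 << i)) == s[i] for i in range(len(s)))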
Update
Here's an unoptimized version which uses yield and xrange. For larger values of n take care of the limitations of xrange as noted above.
from math import sqrt

def primes(n):
    if n < 2: return
    yield 2
    end = int(sqrt(n))
    t = (1 << n) - 1
    for p in xrange(3, end, 2):
        if not (t & (1 << p)): continue
        yield p
        for q in xrange(p*p, n, p):
            # clear bit q of t, i.e. mark 'q is composite'
            m = t & (1 << q)
            if t & m: t ^= m
    for p in xrange(end - (end % 2) + 1, n, 2):
        if not (t & (1 << p)): continue
        yield p

def test(n):
    for p in primes(n): print p

test(100000)

Compute sum with huge intermediate values

I would like to compute

    M(n) = sum(k=1..n) C(n, k) * (1/n) * (1 - k/n)^(2n - k) * (k/n)^(k - 1)

for values of n up to 1000000 as accurately as possible. Here is some sample code.
from __future__ import division
from scipy.misc import comb

def M(n):
    return sum(comb(n,k,exact=True)*(1/n)*(1-k/n)**(2*n-k)*(k/n)**(k-1) for k in xrange(1,n+1))

for i in xrange(1,1000000,100):
    print i, M(i)
The first problem is that I get OverflowError: long int too large to convert to float when n = 1101. This is because comb(n,k,exact=True) is too large to be converted to a float. The end result, however, is always a number around 0.159.
I asked a related question at How to compute sum with large intermediate values; however, this question is different for three main reasons.
The formula I want to compute is different, which causes different problems.
The solution proposed before, to use exact=True, does not help here, as can be seen in the example I gave. Coding up my own implementation of comb is also not going to work, as I still need to perform the floating-point division.
I need to compute the answer for much bigger values than before, which causes new problems. I suspect it can't be done without coding up the sum in some clever way.
A solution that doesn't crash is to use
from fractions import Fraction

def M2(n):
    return sum(comb(n,k,exact=True)*Fraction(1,n)*(1-Fraction(k,n))**(2*n-k)*Fraction(k,n)**(k-1) for k in xrange(1,n+1))

for i in xrange(1,1000000,100):
    print i, M2(i)*1.0
Unfortunately it is now so slow that I don't get an answer for n=1101 in a reasonable amount of time.
So the second problem is how to make it fast enough to complete for large n.
You can compute each summand with a logarithm transformation that replaces multiplication, division, and exponentiation with addition, subtraction, and multiplication, respectively.
from math import log, exp

def summand(n, k):
    lk = log(k)
    ln = log(n)
    a = (lk - ln) * (k - 1)            # log of (k/n)**(k-1)
    b = (log(n - k) - ln) * (2*n - k)  # log of (1 - k/n)**(2n-k)
    c = -ln                            # log of 1/n
    d = sum(log(x) for x in xrange(n - k + 1, n + 1)) - sum(log(x) for x in xrange(1, k + 1))  # log of C(n, k)
    return exp(a + b + c + d)

def M(n):
    return sum(summand(n, k) for k in xrange(1, n))
Note that when k = n the summand is zero, so I do not compute it, since the logarithm would be undefined.
You can use gmpy2. It has arbitrary precision floating point arithmetic with large exponent bounds.
from __future__ import division
from gmpy2 import comb, mpfr, fsum

def M(n):
    return fsum(comb(n,k)*(mpfr(1)/n)*(mpfr(1)-mpfr(k)/n)**(mpfr(2)*n-k)*(mpfr(k)/n)**(k-1) for k in xrange(1,n+1))

for i in xrange(1,1000000,100):
    print i, M(i)
Here is an excerpt of the output:
2001 0.15857490038127975
2101 0.15857582611615381
2201 0.15857666768820194
2301 0.15857743607577454
2401 0.15857814042739268
2501 0.15857878842787806
2601 0.15857938657957615
Disclaimer: I maintain gmpy2.
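(Side note, my own addition: gmpy2's mpfr arithmetic defaults to 53-bit precision; if more accuracy is wanted, it can be raised via the context before calling M, e.g.

import gmpy2
gmpy2.get_context().precision = 100  # bits used for subsequent mpfr operations

at the cost of some speed.)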
A rather brutal method is to compute all the factors and then multiply them in such an order that the result stays around 1.0 (Python 3.x):
def M(n):
    return sum(summand(n, k) for k in range(1, n + 1))

def f1(n, k):
    # the large factors: k repeated (k-1) times, then n, n-1, ..., n-k+1
    for i in range(k - 1):
        yield k
    for i in range(k):
        yield n - i

def f2(n, k):
    # the small factors (each at most 1)
    for i in range(k - 1):
        yield 1 / n
    for i in range(2 * n - k):
        yield 1 - k / n
    yield 1 / n
    for i in range(2, k + 1):
        yield 1 / i

def summand(n, k):
    result = 1.0
    factors1 = f1(n, k)
    factors2 = f2(n, k)
    while True:
        empty1 = False
        for factor in factors1:
            result *= factor
            if result > 1:
                break
        else:
            empty1 = True
        for factor in factors2:
            result *= factor
            if result < 1:
                break
        else:
            if empty1:
                break
    return result
For M(1101) I get 0.15855899364641846, but it takes a few seconds. M(2000) takes about 14 seconds and yields 0.15857489065619598.
(I'm sure it can be optimised.)
