efficiency graph size calculation with power function

efficiency graph size calculation with power function - python

I am looking to improve the following simple function (written in python), calculating the maximum size of a specific graph:
def max_size_BDD(n):
i = 1
size = 2
while i <= n:
size += min(pow(2, i-1), pow(2, pow(2, n-i+1))-pow(2, pow(2, n-i)))
i+=1
print(str(i)+" // "+ str(size))
return size
if i give it as input n = 45, the process gets killed (probably because it takes too long, i dont think it is a memory thing, right?). How can i redesign the following algorithm such that it can handle larger inputs?

My proposal: While the original function starts to run into troubles at ~10, I have practically no limitations (even for n = 100000000, I stay below 1s).
def exp_base_2(n):
return 1 << n
def max_size_bdd(n):
# find i at which the min branch switches
start_i = n
while exp_base_2(n - start_i + 1) < start_i:
start_i -= 1
# evaluate all to that point
size = 1 + 2 ** start_i
# evaluate remaining terms (in an uncritical range of n - i)
for i in range(start_i + 1, n + 1):
val = exp_base_2(exp_base_2(n - i))
size += val * (val - 1)
print(f"{i} // {size}")
return size
Remarks:
Core idea: avoid the large powers of 2, as they are not necessary to calculate if you use the min in the end.
I did all this in a rush, maybe I can add more explanation later... if anyone is interested. Then, I could also do a more decent verification of the new implementation.
The effect of exp_base_2 should be negligible after doing all the math to optimize the original calculations. I did this optimization before I went into analysis.
Maybe a complete closed-form solution is possible. I did not invest the time for further investigations.

Related

How do I know if my algorithm is fast? (Gauss's Circle Problem)

When I wrote my thesis on number theory, I included my take on an algorithm for finding the number of integer lattice points in a circle. I did not go into too much detail about it, since I couldn't find out if it was fast or not, compared to existing algorithms.
I read a lot of pages concerning Gauss's Circle Problem but I could not find many code examples. The fastest one I found was the following, from Wolfram Mathworld:
def wolfram_circle_points(r):
iterpoints = 0
for i in range(1, int(r // 1) + 1):
iterpoints += int((r**2-i**2)**(1/2) // 1)
return 1 + 4*int(r // 1) + 4*iterpoints
My algorithm takes about 60% less time to run than the one from Wolfram:
def my_circle_points(r):
if r == 0:
return 1
a = int(-1 * (r/2**(1/2)) // 1 * -1)
iterpoint = 0
for x in range(a, int(r // 1) + 1):
calcs = x - (r**2-x**2)**(1/2)
if calcs - int(calcs // 1) == 0:
iterpoint += int(calcs // 1) - 1
else:
iterpoint += int(calcs // 1)
return 4*a + 4*int(r // 1)**2 - 8*iterpoint - 3
How do I know if this is worth anything to anybody (time savings-wise)? Is there a database that holds the current fastest algorithms for a bunch of different tasks?

Should I consider whether an input value is even or odd when calculating time complexity?

I have the following function:
def pascal_triangle(i:int,j:int):
j = min(j, i-j + 1)
if i == 1 or j == 1:
return 1
elif j > i or j == 0 or i < 1 or j < 1:
return 0
else:
return pascal_triangle(i-1,j-1) + pascal_triangle(i-1,j)
The input value for j has the following constraint:
1<=j<=i/2
My computation for the time complexity is as follows:
f(i,j) = f(i-1,j-1) + f(i-1,j) = f(i-n,j-n) + ... + f(i-n,j)
So, to find the max n, we have:
i-n>=1
j-n>=1
i-n>=j
and, since we know that:
j>=1
j<=i/2
The max n is i/2-1, so the time complexity is O(2^(i/2-1)), and the space complexity is the maximum depth of recursion(n) times needed space for each time(O(1)), O(2^*(i/2-1)).
I hope my calculation is correct. Now my concern is that if i is odd, This number is not divisible by 2, but the terms of function must be an integer. Therefore, I want to know should I write the time complexity like this:
The time complexity and space complexity of the function are both:
O(2^(i/2-1)) if i is even
O(2^(i/2-0.5)) if i is odd

At a first glance, the time and space analysis looks (roughly) correct. I haven't made a super close inspection, however, since it doesn't appear to be the focus of the question.
Regarding the time complexity for even / odd inputs, the answer is that the time complexity is O(sqrt(2)^i), regardless of whether i is even or odd.
In the even case, we have O(2^(i / 2 - 1)) ==> O(1/2 * sqrt(2)^i) ==> O(sqrt(2)^i).
In the odd case, we have O(2^(i / 2 - 0.5)) ==> O(sqrt(2) / 2 * sqrt(2)^i) ==> O(sqrt(2)^i).
What you've written is technically correct, but significantly more verbose than necessary. (At the very least, it's poor style, and if this question was on a homework assignment or an exam, I personally think one could justify a penalty on this basis.)

Create an algorithm whose recurrence is T(n) = T(n/3) + T(n/4) + O(n^2)

How do I go about problems like this one: T(n) = T(n/3) + T(n/4) + O(n^2)
I am able to use two for loops which will give me the O(n^2), right?

To interpret the equation, read it in English as: "the running time for an input of size n equals the running time for an input of size n/3, plus the running time for an input of size n/4, plus a time proportional to n^2".
Writing code that runs in time proportional to n^2 is possible using nested loops, for example, though it's simpler to write a single loop like for i in range(n ** 2): ....
Writing code that runs in time equal to the time it takes for the algorithm with an input of size n/3 or n/4 is even easier - just call the algorithm recursively with an input of that size. (Don't forget a base case for the recursion to terminate on.)
Putting it together, the code could look something like this:
def nonsense_algorithm(n):
if n <= 0:
print('base case')
else:
nonsense_algorithm(n // 3)
nonsense_algorithm(n // 4)
for i in range(n ** 2):
print('constant amount of text')
I've used integer division (rounding down) because I don't think it makes sense otherwise.

Here, we would use for and range(start, stop, steps), maybe with a simple to understand algorithm, similar to:
def algorithm_n2(n: int) -> int:
"""Returns an integer for a O(N2) addition algorithm
:n: must be a positive integer
"""
if n < 1:
return False
output = 0
for i in range(n):
for j in range(n):
output += n * 2
for i in range(0, n, 3):
output += n / 3
for i in range(0, n, 4):
output += n / 4
return int(output)
# Test
if __name__ == "__main__":
print(algorithm_n2(24))
Using print() inside a method is not a best practice.
Output
27748

Fast way to place bits for puzzle

There is a puzzle which I am writing code to solve that goes as follows.
Consider a binary vector of length n that is initially all zeros. You choose a bit of the vector and set it to 1. Now a process starts that sets the bit that is the greatest distance from any 1 bit to $1$ (or an arbitrary choice of furthest bit if there is more than one). This happens repeatedly with the rule that no two 1 bits can be next to each other. It terminates when there is no more space to place a 1 bit. The goal is to place the initial 1 bit so that as many bits as possible are set to 1 on termination.
Say n = 2. Then wherever we set the bit we end up with exactly one bit set.
For n = 3, if we set the first bit we get 101 in the end. But if we set the middle bit we get 010 which is not optimal.
For n = 4, whichever bit we set we end up with two set.
For n = 5, setting the first gives us 10101 with three bits set in the end.
For n = 7, we need to set the third bit to get 1010101 it seems.
I have written code to find the optimal value but it does not scale well to large n. My code starts to get slow around n = 1000 but I would like to solve the problem for n around 1 million.
#!/usr/bin/python
from __future__ import division
from math import *
def findloc(v):
count = 0
maxcount = 0
id = -1
for i in xrange(n):
if (v[i] == 0):
count += 1
if (v[i] == 1):
if (count > maxcount):
maxcount = count
id = i
count = 0
#Deal with vector ending in 0s
if (2*count >= maxcount and count >= v.index(1) and count >1):
return n-1
#Deal with vector starting in 0s
if (2*v.index(1) >= maxcount and v.index(1) > 1):
return 0
if (maxcount <=2):
return -1
return id-int(ceil(maxcount/2))
def addbits(v):
id = findloc(v)
if (id == -1):
return v
v[id] = 1
return addbits(v)
#Set vector length
n=21
max = 0
for i in xrange(n):
v = [0]*n
v[i] = 1
v = addbits(v)
score = sum([1 for j in xrange(n) if v[j] ==1])
# print i, sum([1 for j in xrange(n) if v[j] ==1]), v
if (score > max):
max = score
print max

Latest answer (O(log n) complexity)
If we believe the conjecture by templatetypedef and Aleksi Torhamo (update: proof at the end of this post), there is a closed form solution count(n) calculable in O(log n) (or O(1) if we assume logarithm and bit shifting is O(1)):
Python:
from math import log
def count(n): # The count, using position k conjectured by templatetypedef
k = p(n-1)+1
count_left = k/2
count_right = f(n-k+1)
return count_left + count_right
def f(n): # The f function calculated using Aleksi Torhamo conjecture
return max(p(n-1)/2 + 1, n-p(n-1))
def p(n): # The largest power of 2 not exceeding n
return 1 << int(log(n,2)) if n > 0 else 0
C++:
int log(int n){ // Integer logarithm, by counting the number of leading 0
return 31-__builtin_clz(n);
}
int p(int n){ // The largest power of 2 not exceeding n
if(n==0) return 0;
return 1<<log(n);
}
int f(int n){ // The f function calculated using Aleksi Torhamo conjecture
int val0 = p(n-1);
int val1 = val0/2+1;
int val2 = n-val0;
return val1>val2 ? val1 : val2;
}
int count(int n){ // The count, using position k conjectured by templatetypedef
int k = p(n-1)+1;
int count_left = k/2;
int count_right = f(n-k+1);
return count_left + count_right;
}
This code can calculate the result for n=100,000,000 (and even n=1e24 in Python!) correctly in no time1.
I have tested the codes with various values for n (using my O(n) solution as the standard, see Old Answer section below), and they still seem correct.
This code relies on the two conjectures by templatetypedef and Aleksi Torhamo2. Anyone wants to proof those? =D (Update 2: PROVEN)
1By no time, I meant almost instantly
2The conjecture by Aleksi Torhamo on f function has been empirically proven for n<=100,000,000
Old answer (O(n) complexity)
I can return the count of n=1,000,000 (the result is 475712) in 1.358s (in my iMac) using Python 2.7. Update: It's 0.198s for n=10,000,000 in C++. =)
Here is my idea, which achieves O(n) time complexity.
The Algorithm
Definition of f(n)
Define f(n) as the number of bits that will be set on bitvector of length n, assuming that the first and last bit are set (except for n=2, where only the first or last bit is set). So we know some values of f(n) as follows:
f(1) = 1
f(2) = 1
f(3) = 2
f(4) = 2
f(5) = 3
Note that this is different from the value that we are looking for, since the initial bit might not be at the first or last, as calculated by f(n). For example, we have f(7)=3 instead of 4.
Note that this can be calculated rather efficiently (amortized O(n) to calculate all values of f up to n) using the recurrence relation:
f(2n) = f(n)+f(n+1)-1
f(2n+1) = 2*f(n+1)-1
for n>=5, since the next bit set following the rule will be the middle bit, except for n=1,2,3,4. Then we can split the bitvector into two parts, each independent of each other, and so we can calculate the number of bits set using f( floor(n/2) ) + f( ceil(n/2) ) - 1, as illustrated below:
n=11 n=13
10000100001 1000001000001
<----> <----->
f(6)<----> f(7) <----->
f(6) f(7)
n=12 n=14
100001000001 10000010000001
<----> <----->
f(6)<-----> f(7) <------>
f(7) f(8)
we have the -1 in the formula to exclude the double count of the middle bit.
Now we are ready to count the solution of original problem.
Definition of g(n,i)
Define g(n,i) as the number of bits that will be set on bitvector of length n, following the rules in the problem, where the initial bit is at the i-th bit (1-based). Note that by symmetry the initial bit can be anywhere from the first bit up to the ceil(n/2)-th bit. And for those cases, note that the first bit will be set before any bit in between the first and the initial, and so is the case for the last bit. Therefore the number of bit set in the first partition and the second partition is f(i) and f(n+1-i) respectively.
So the value of g(n,i) can be calculated as:
g(n,i) = f(i) + f(n+1-i) - 1
following the idea when calculating f(n).
Now, to calculate the final result is trivial.
Definition of g(n)
Define g(n) as the count being looked for in the original problem. We can then take the maximum of all possible i, the position of initial bit:
g(n) = maxi=1..ceil(n/2)(f(i) + f(n+1-i) - 1)
Python code:
import time
mem_f = [0,1,1,2,2]
mem_f.extend([-1]*(10**7)) # This will take around 40MB of memory
def f(n):
global mem_f
if mem_f[n]>-1:
return mem_f[n]
if n%2==1:
mem_f[n] = 2*f((n+1)/2)-1
return mem_f[n]
else:
half = n/2
mem_f[n] = f(half)+f(half+1)-1
return mem_f[n]
def g(n):
return max(f(i)+f(n+1-i)-1 for i in range(1,(n+1)/2 + 1))
def main():
while True:
n = input('Enter n (1 <= n <= 10,000,000; 0 to stop): ')
if n==0: break
start_time = time.time()
print 'g(%d) = %d, in %.3fs' % (n, g(n), time.time()-start_time)
if __name__=='__main__':
main()
Complexity Analysis
Now, the interesting thing is, what is the complexity of calculating g(n) with the method described above?
We should first note that we iterate over n/2 values of i, the position of initial bit. And in each iteration we call f(i) and f(n+1-i). Naive analysis will lead to O(n * O(f(n))), but actually we used memoization on f, so it's much faster than that, since each value of f(i) is calculated only once, at most. So the complexity is actually added by the time required to calculate all values of f(n), which would be O(n + f(n)) instead.
So what's the complexity of initializing f(n)?
We can assume that we precompute every value of f(n) first before calculating g(n). Note that due to the recurrence relation and the memoization, generating the whole values of f(n) takes O(n) time. And the next call to f(n) will take O(1) time.
So, the overall complexity is O(n+n) = O(n), as evidenced by this running time in my iMac for n=1,000,000 and n=10,000,000:
> python max_vec_bit.py
Enter n (1 <= n <= 10,000,000; 0 to stop): 1000000
g(1000000) = 475712, in 1.358s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
>
> <restarted the program to remove the effect of memoization>
>
> python max_vec_bit.py
Enter n (1 <= n <= 10,000,000; 0 to stop): 10000000
g(10000000) = 4757120, in 13.484s
Enter n (1 <= n <= 10,000,000; 0 to stop): 6745231
g(6745231) = 3145729, in 3.072s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
And as a by-product of memoization, the calculation of lesser value of n will be much faster after the first call to large n, as you can also see in the sample run. And with language better suited for number crunching such as C++, you might get significantly faster running time
I hope this helps. =)
The code using C++, for performance improvement
The result in C++ is about 68x faster (measured by clock()):
> ./a.out
Enter n (1 <= n <= 10,000,000; 0 to stop): 1000000
g(1000000) = 475712, in 0.020s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
>
> <restarted the program to remove the effect of memoization>
>
> ./a.out
Enter n (1 <= n <= 10,000,000; 0 to stop): 10000000
g(10000000) = 4757120, in 0.198s
Enter n (1 <= n <= 10,000,000; 0 to stop): 6745231
g(6745231) = 3145729, in 0.047s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
Code in C++:
#include <cstdio>
#include <cstring>
#include <ctime>
int mem_f[10000001];
int f(int n){
if(mem_f[n]>-1)
return mem_f[n];
if(n%2==1){
mem_f[n] = 2*f((n+1)/2)-1;
return mem_f[n];
} else {
int half = n/2;
mem_f[n] = f(half)+f(half+1)-1;
return mem_f[n];
}
}
int g(int n){
int result = 0;
for(int i=1; i<=(n+1)/2; i++){
int cnt = f(i)+f(n+1-i)-1;
result = (cnt > result ? cnt : result);
}
return result;
}
int main(){
memset(mem_f,-1,sizeof(mem_f));
mem_f[0] = 0;
mem_f[1] = mem_f[2] = 1;
mem_f[3] = mem_f[4] = 2;
clock_t start, end;
while(true){
int n;
printf("Enter n (1 <= n <= 10,000,000; 0 to stop): ");
scanf("%d",&n);
if(n==0) break;
start = clock();
int result = g(n);
end = clock();
printf("g(%d) = %d, in %.3fs\n",n,result,((double)(end-start))/CLOCKS_PER_SEC);
}
}
Proof
note that for the sake of keeping this answer (which is already very long) simple, I've skipped some steps in the proof
Conjecture of Aleksi Torhamo on the value of f
For `n>=1`, prove that:
f(2n+k) = 2n-1+1 for k=1,2,…,2n-1 ...(1)
f(2n+k) = k for k=2n-1+1,…,2n ...(2)
given f(0)=f(1)=f(2)=1
The result above can be easily proven using induction on the recurrence relation, by considering the four cases:
Case 1: (1) for even k
Case 2: (1) for odd k
Case 3: (2) for even k
Case 4: (2) for odd k
Suppose we have the four cases proven for n. Now consider n+1.
Case 1:
f(2n+1+2i) = f(2n+i) + f(2n+i+1) - 1, for i=1,…,2n-1
= 2n-1+1 + 2n-1+1 - 1
= 2n+1
Case 2:
f(2n+1+2i+1) = 2*f(2n+i+1) - 1, for i=0,…,2n-1-1
= 2*(2n-1+1) - 1
= 2n+1
Case 3:
f(2n+1+2i) = f(2n+i) + f(2n+i+1) - 1, for i=2n-1+1,…,2n
= i + (i+1) - 1
= 2i
Case 4:
f(2n+1+2i+1) = 2*f(2n+i+1) - 1, for i=2n-1+1,…,2n-1
= 2*(i+1) - 1
= 2i+1
So by induction the conjecture is proven.
Conjecture of templatetypedef on the best position
For n>=1 and k=1,…,2n, prove that g(2n+k) = g(2n+k, 2n+1)
That is, prove that placing the first bit on the 2n+1-th position gives maximum number of bits set.
The proof:
First, we have
g(2n+k,2n+1) = f(2n+1) + f(k-1) - 1
Next, by the formula of f, we have the following equalities:
f(2n+1-i) = f(2n+1), for i=-2n-1,…,-1
f(2n+1-i) = f(2n+1)-i, for i=1,…,2n-2-1
f(2n+1-i) = f(2n+1)-2n-2, for i=2n-2,…,2n-1
and also the following inequality:
f(k-1+i) <= f(k-1), for i=-2n-1,…,-1
f(k-1+i) <= f(k-1)+i , for i=1,…,2n-2-1
f(k-1+i) <= f(k-1)+2n-2, for i=2n-2,…,2n-1
and so we have:
f(2n+1-i)+f(k-1+i) <= f(2n+1)+f(k-1), for i=-2n-1,…,2n-1
Now, note that we have:
g(2n+k) = maxi=1..ceil(2n-1+1-k/2)(f(i) + f(2n+k+1-i) - 1)
<= f(2n+1) + f(k-1) - 1
= g(2n+k,2n+1)
And so the conjecture is proven.

So in a break with my normal tradition of not posting algorithms I don't have a proof for, I think I should mention that there's an algorithm that appears to be correct for numbers up to 50,000+ and runs in O(log n) time. This is due to Sophia Westwood, who I worked on this problem with for about three hours today. All credit for this is due to her. Empirically it seems to work beautifully, and it's much, much faster than the O(n) solutions.
One observation about the structure of this problem is that if n is sufficiently large (n ≥ 5), then if you put a 1 anywhere, the problem splits into two subproblems, one to the left of the 1 and one to the right. Although the 1s might be placed in the different halves at different times, the eventual placement is the same as if you solved each half separately and combined them back together.
The next observation is this: suppose you have an array of size 2k + 1 for some k. In that case, suppose that you put a 1 on either side of the array. Then:
The next 1 is placed on the other side of the array.
The next 1 is placed in the middle.
You now have two smaller subproblems of size 2k-1 + 1.
The important part about this is that the resulting bit pattern is an alternating series of 1s and 0s. For example:
For 5 = 4 + 1, we get 10101
For 9 = 8 + 1, we get 101010101
For 17 = 16 + 1, we get 10101010101010101
The reason this matters is the following: suppose you have n total elements in the array and let k be the largest possible value for which 2k + 1 ≤ n. If you place the 1 at position 2k + 1, then the left part of the array up to that position will end up getting tiled with alternating 1s and 0s, which puts a lot of 1s into the array.
What's not obvious is that placing the 1 bit there, for all numbers up to 50,000, appears to yield an optimal solution! I've written a Python script that checks this (using a recurrence relation similar to the one #justhalf) and it seems to work well. The reason that this fact is so useful is that it's really easy to compute this index. In particular, if 2k + 1 ≤ n, then 2k ≤ n - 1, so k ≤ lg (n - 1). Choosing the value ⌊lg (n - 1) ⌋ as your choice of k then lets you compute the bit index by computing 2k + 1. This value of k can be computed in O(log n) time and the exponentiation can be done in O(log n) time as well, so the total runtime is Θ(log n).
The only issue is that I haven't formally proven that this works. All I know is that it's right for the first 50,000 values we've tried. :-)
Hope this helps!

I'll attach what I have. Same as yours, alas, time is basically O(n**3). But at least it avoids recursion (etc), so won't blow up when you get near a million ;-) Note that this returns the best vector found, not the count; e.g.,
>>> solve(23)
[6, 0, 11, 0, 1, 0, 0, 10, 0, 5, 0, 9, 0, 3, 0, 0, 8, 0, 4, 0, 7, 0, 2]
So it also shows the order in which the 1 bits were chosen. The easiest way to get the count is to pass the result to max().
>>> max(solve(23))
11
Or change the function to return maxsofar instead of best.
If you want to run numbers on the order of a million, you'll need something radically different. You can't even afford quadratic time for that (let alone this approach's cubic time). Unlikely to get such a huge O() improvement from fancier data structures - I expect it would require deeper insight into the mathematics of the problem.
def solve(n):
maxsofar, best = 1, [1] + [0] * (n-1)
# by symmetry, no use trying starting points in last half
# (would be a mirror image).
for i in xrange((n + 1)//2):
v = [0] * n
v[i] = count = 1
# d21[i] = distance to closest 1 from index i
d21 = range(i, 0, -1) + range(n-i)
while 1:
d, j = max((d, j) for j, d in enumerate(d21))
if d >= 2:
count += 1
v[j] = count
d21[j] = 0
k = 1
while j-k >= 0 and d21[j-k] > k:
d21[j-k] = k
k += 1
k = 1
while j+k < n and d21[j+k] > k:
d21[j+k] = k
k += 1
else:
if count > maxsofar:
maxsofar = count
best = v[:]
break
return best

Faster bitwise modulus for Lucas-Lehmer primality test

The Lucas-Lehmer primality test tests prime numbers to determine whether they are also Mersenne primes. One of the bottlenecks is the modulus operation in the calculation of (s**2 − 2) % (2**p - 1).
Using bitwise operations can speed things up considerably (see the L-L link), the best I have so far being:
def mod(n,p):
""" Returns the value of (s**2 - 2) % (2**p -1)"""
Mp = (1<<p) - 1
while n.bit_length() > p: # For Python < 2.7 use len(bin(n)) - 2 > p
n = (n & Mp) + (n >> p)
if n == Mp:
return 0
else:
return n
A simple test case is where p has 5-9 digits and s has 10,000+ digits (or more; not important what they are). Solutions can be tested by mod((s**2 - 2), p) == (s**2 - 2) % (2**p -1). Keep in mind that p - 2 iterations of this modulus operation are required in the L-L test, each with exponentially increasing s, hence the need for optimization.
Is there a way to speed this up further, using pure Python (Python 3 included)? Is there a better way?

The best improvement I could find was removing Mp = (1<<p) - 1 from the modulus function altogether, and pre-calculating it in the L-L function before starting the iterations of the L-L test. Using while n > Mp: instead of while n.bit_length() > p: also saved some time.

In the case where n is much longer than 2^p, you can avoid some quadratic-time pain by doing something like this:
def mod1(n,p):
while n.bit_length() > 3*p:
k = n.bit_length() // p
k1 = k>>1
k1p = k1*p
M = (1<<k1p)-1
n = (n & M) + (n >> k1p)
Mp = (1<<p)-1
while n.bit_length() > p:
n = (n&Mp) + (n>>p)
if n==Mp: return 0
return n
[EDITED because I screwed up the formatting before; thanks to Benjamin for pointing this out. Moral: don't copy-and-paste from an Idle window into SO. Sorry!]
(Note: the criterion for halving the length of n rather than taking p off it, and the exact choice of k1, are both a bit wrong, but it doesn't matter so I haven't bothered fixing them.)
If I take p=12345 and n=9**200000 (yes, I know p then isn't prime, but that doesn't matter here) then this is about 13 times faster.
Unfortunately this will not help you, because in the L-L test n is never bigger than about (2^p)^2. Sorry.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.