I'm trying to gather some statistics on prime numbers, among which is the distribution of factors for the number (prime-1)/2. I know there are general formulas for the size of factors of uniformly selected numbers, but I haven't seen anything about the distribution of factors of one less than a prime.
I've written a program to iterate through primes starting at the first prime after 2^63, and then factor the (prime - 1)/2 using trial division by all primes up to 2^32. However, this is extremely slow because that is a lot of primes (and a lot of memory) to iterate through. I store the primes as a single byte each (by storing the increment from one prime to the next). I also use a deterministic variant of the Miller-Rabin primality test for numbers up to 2^64, so I can easily detect when the remaining value (after a successful division) is prime.
I've experimented with variants of Pollard's rho and elliptic-curve factorization, but it is hard to find the right balance between trial division and switching to these more complicated methods. Also, I'm not sure I've implemented them correctly, because sometimes they seem to take a very long time to find a factor, and based on their asymptotic behavior, I'd expect them to be quite quick for such small numbers.
I have not found any information on factoring many numbers (vs just trying to factor one), but it seems like there should be some way to speed up the task by taking advantage of this.
Any suggestions, pointers to alternate approaches, or other guidance on this problem would be greatly appreciated.
Edit:
The way I store the primes is by storing an 8-bit offset to the next prime, with the implicit first prime being 3. Thus, in my algorithms, I have a separate check for division by 2, then I start a loop:
import collections

factorCounts = collections.Counter()
while N % 2 == 0:
    factorCounts[2] += 1
    N //= 2
pp = 3
for gg in smallPrimeGaps:
    if pp*pp > N:
        break
    if N % pp == 0:
        while N % pp == 0:
            factorCounts[pp] += 1
            N //= pp
    pp += gg
# anything remaining in N (if > 1) is checked with the Miller-Rabin test below
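For completeness, here is a simplified sketch of how such a gap table can be built (just a plain sieve, not my tuned wheel sieve). One caveat: a raw gap first exceeds 255 at the gap of 282 following the prime 436,273,009, so to cover the full range up to 2^32 the byte would have to hold gap/2 instead (every gap after 2 -> 3 is even, and gap/2 stays below 256 there):

def build_prime_gaps(limit):
    # simple sieve of Eratosthenes, then differences of consecutive primes
    is_comp = bytearray(limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if not is_comp[i]:
            primes.append(i)
            for j in range(i * i, limit + 1, i):
                is_comp[j] = 1
    # gaps starting from the implicit first prime, 3
    gaps = [q - p for p, q in zip(primes[1:], primes[2:])]
    assert all(g < 256 for g in gaps), "store g // 2 instead for larger limits"
    return bytes(gaps)

smallPrimeGaps = build_prime_gaps(10**6)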
Also, I used a wheel sieve to calculate the primes for trial division, and I use an algorithm based on the remainder by several primes to get the next prime after the given starting point.
I use the following for testing whether a given number is prime (I'm porting the code to C++ now):
bool IsPrime(uint64_t n)
{
    if (n < 341531)
        return MillerRabinMulti(n, {9345883071009581737ull});
    else if (n < 1050535501)
        return MillerRabinMulti(n, {336781006125ull, 9639812373923155ull});
    else if (n < 350269456337)
        return MillerRabinMulti(n, {4230279247111683200ull, 14694767155120705706ull, 1664113952636775035ull});
    else if (n < 55245642489451)
        return MillerRabinMulti(n, {2ull, 141889084524735ull, 1199124725622454117ull, 11096072698276303650ull});
    else if (n < 7999252175582851)
        return MillerRabinMulti(n, {2ull, 4130806001517ull, 149795463772692060ull, 186635894390467037ull, 3967304179347715805ull});
    else if (n < 585226005592931977)
        return MillerRabinMulti(n, {2ull, 123635709730000ull, 9233062284813009ull, 43835965440333360ull, 761179012939631437ull, 1263739024124850375ull});
    else
        return MillerRabinMulti(n, {2ull, 325ull, 9375ull, 28178ull, 450775ull, 9780504ull, 1795265022ull});
}
I don't have a definitive answer, but I do have some observations and some suggestions.
There are about 2*10^17 primes between 2^63 and 2^64, so any program you write is going to run for a while.
Let's talk about a primality test for numbers in the range 2^63 to 2^64. Any general-purpose test will do more work than you need, so you can speed things up by writing a special-purpose test. I suggest strong-pseudoprime tests (as in Miller-Rabin) to bases 2 and 3. If either of those tests shows the number is composite, you're done. Otherwise, look up the number (binary search) in a table of strong-pseudoprimes to bases 2 and 3 (ask Google to find those tables for you). Two strong pseudoprime tests followed by a table lookup will certainly be faster than the deterministic Miller-Rabin test you are currently performing, which probably uses six or seven bases.
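Here is a minimal sketch of that test. The PSP_2_3 table is a placeholder: you would fill it with the published sorted list of numbers in [2^63, 2^64) that are strong pseudoprimes to both base 2 and base 3.

from bisect import bisect_left

def is_sprp(n, a):
    # strong-pseudoprime (single-base Miller-Rabin) test of odd n > 2 to base a
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

PSP_2_3 = []  # placeholder: sorted strong pseudoprimes to bases 2 and 3

def is_prime_63_64(n):
    # two strong-pseudoprime tests, then a binary-search table lookup
    if not (is_sprp(n, 2) and is_sprp(n, 3)):
        return False
    i = bisect_left(PSP_2_3, n)
    return not (i < len(PSP_2_3) and PSP_2_3[i] == n)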
For factoring, trial division to 1000 followed by Brent-Rho until the product of the known prime factors exceeds the cube root of the number being factored ought to be fairly fast, a few milliseconds. Then, if the remaining cofactor is composite, it will necessarily have only two factors, so SQUFOF would be a good algorithm to split them. It is faster than the other methods because all its arithmetic is done with numbers less than the square root of the number being factored, which in your case means the split could be done using 32-bit arithmetic instead of 64-bit arithmetic, so it ought to be fast.
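For reference, a sketch of Brent's variant of the rho method, in its standard textbook form (not tuned; the outer loop retries with a new random constant when a round fails):

from math import gcd
import random

def brent_rho(n):
    # Brent's cycle-finding variant of Pollard's rho.
    # Returns some non-trivial factor of an odd composite n.
    if n % 2 == 0:
        return 2
    while True:
        y = random.randrange(1, n)
        c = random.randrange(1, n)
        m = 128                      # batch size for the gcd products
        g = r = q = 1
        while g == 1:
            x = y
            for _ in range(r):
                y = (y * y + c) % n
            k = 0
            while k < r and g == 1:
                ys = y
                for _ in range(min(m, r - k)):
                    y = (y * y + c) % n
                    q = q * abs(x - y) % n
                g = gcd(q, n)
                k += m
            r *= 2
        if g == n:                   # overshot: redo the last batch one step at a time
            g = 1
            while g == 1:
                ys = (ys * ys + c) % n
                g = gcd(abs(x - ys), n)
        if g != n:                   # otherwise retry with a new constant c
            return g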
Instead of factoring and primality tests, a better method uses a variant of the Sieve of Eratosthenes to factor large blocks of numbers. That will still be slow, as there are 203 million sieving primes less than 2^32, and you will need to deal with the bookkeeping of a segmented sieve, but considering that you factor lots of numbers at once, it's probably the best approach to your task.
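The sieving idea, in sketch form: each small prime strides through a block and divides itself out of the entries it hits, so no candidate is ever trial-divided by a prime that doesn't divide it. This version factors a block of consecutive integers; for your task the block would hold the values (q-1)/2 instead, which only changes how the first multiple inside the block is computed.

def factor_block(lo, hi, sieve_primes):
    # Factor every integer in [lo, hi) at once.
    # sieve_primes must be in ascending order.
    vals = list(range(lo, hi))               # shrinking cofactors
    factors = [[] for _ in range(hi - lo)]
    for p in sieve_primes:
        if p * p >= hi:
            break
        start = ((lo + p - 1) // p) * p      # first multiple of p in the block
        for idx in range(start - lo, hi - lo, p):
            while vals[idx] % p == 0:
                factors[idx].append(p)
                vals[idx] //= p
    for i, v in enumerate(vals):
        if v > 1:                            # prime cofactor, provided the sieve
            factors[i].append(v)             # primes reach sqrt(hi); otherwise v
    return factors                           # may still need a rho/SQUFOF pass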
I have code for everything mentioned above at my blog.
This is how I store primes for later:
(I'm going to assume you want the factors of the number, and not just a primality test).
Copied from my website http://chemicaldevelopment.us/programming/2016/10/03/PGS.html
I’m going to assume you know the binary number system for this part. If not, just think of 1 as a “yes” and 0 as a “no”.
So, there are plenty of algorithms to generate the first few primes. I use the Sieve of Eratosthenes to compute a list.
But, if we stored the primes as an array, like [2, 3, 5, 7] this would take up too much space. How much space exactly?
Well, 32-bit integers (which can hold values up to 2^32) take up 4 bytes each, because each byte is 8 bits, and 32 / 8 = 4.
If we wanted to store every prime under 2,000,000,000, we would have to store over 98,000,000 of them. That takes up more space, and is slower at runtime, than a bitset, which is explained below.
This approach takes 98,000,000 integers of space (each is 32 bits, which is 4 bytes), and when we check a number at runtime, we need to scan through the array until we either find it or find a number greater than it.
For example, say I give you a small list of primes: [2, 3, 5, 7, 11, 13, 17, 19]. I ask you if 15 is prime. How do you tell me?
Well, you would go through the list and compare each to 15.
Is 2 = 15?
Is 3 = 15?
. . .
Is 17 = 15?
At this point, you can stop because you have passed where 15 would be, so you know it isn’t prime.
Now then, let’s say we use a list of bits to tell you if the number is prime. The list above would look like:
001101010001010001010
This starts at index 0 and goes up to 20.
A 1 means that the index is prime. Counting indices from the left (0, 1, 2, ...):
001101010001010001010
the bit at index 2 is 1, which indicates that 2 is prime.
In this case, if I asked you to check whether 15 is prime, you don't need to go through all the numbers in the list; all you need to do is jump straight to index 15 and check that single bit.
And for memory usage, the first approach uses 98,000,000 integers, whereas this one can store 32 numbers in a single integer (using the list of 1s and 0s), so we would need
2,000,000,000 / 32 = 62,500,000 integers.
So it uses about 64% as much memory as the first approach, and is much faster to use.
We store the array of integers from the second approach in a file, then read it back at runtime.
This uses 250 MB of RAM to store primality data for all numbers up to 2,000,000,000.
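In code, that storage and lookup might look like this (a minimal sketch using a Python bytearray as the bitset):

def build_bitset(limit):
    # bit i of the bitset is 1 exactly when i is prime
    bits = bytearray((limit >> 3) + 1)
    is_comp = bytearray(limit + 1)
    for i in range(2, limit + 1):
        if not is_comp[i]:
            bits[i >> 3] |= 1 << (i & 7)
            for j in range(i * i, limit + 1, i):
                is_comp[j] = 1
    return bits

def is_prime_bit(bits, n):
    # jump straight to bit n: no scanning required
    return (bits[n >> 3] >> (n & 7)) & 1

bits = build_bitset(1000)
print(is_prime_bit(bits, 15), is_prime_bit(bits, 17))  # 0 1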
You can reduce this further with wheel sieving (like what you did by storing (prime-1)/2).
I'll go a little more into the wheel sieve.
You got it right by storing (prime - 1)/2, with 2 as a special case.
You can extend this to n# (the primorial: the product of the first n primes).
For example, with 1# = 2, you store numbers of the form 2*k + 1.
More generally, you can use the set of linear progressions (n#)*k + L, where L is the set of residues coprime to n#: the number 1 together with the numbers below n# that are not divisible by any of the first n primes.
So with 2# = 6, you only store info for 6*k+1 and 6*k+5, because L = {1, 2, 3, 5} - {2, 3} = {1, 5}.
These examples should give you an understanding of some of the methods behind it.
You will need some way to implement the bitset, such as a list of 32-bit integers, or a string.
Look at: https://pypi.python.org/pypi/bitarray for a possible abstraction
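To make the wheel concrete, here is a tiny sketch of the mod-6 case: only two bits are stored per block of six numbers, and a helper maps a number to its bit position (or to nothing, if it sits on a spoke of the wheel):

# mod-6 wheel: every prime > 3 has the form 6k+1 or 6k+5,
# so we store 2 bits per 6 numbers
def wheel_index(n):
    # map n -> bit position, or None if n is divisible by 2 or 3
    k, r = divmod(n, 6)
    if r == 1:
        return 2 * k
    if r == 5:
        return 2 * k + 1
    return None  # on a spoke; only 2 and 3 themselves are prime there

print(wheel_index(25), wheel_index(29), wheel_index(15))  # 8 9 None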
Given positive integers b, c, m where (b < m) is True, the task is to find a positive integer e such that
(b**e % m == c) is True
where ** is exponentiation (e.g. in Ruby and Python; ^ in some other languages) and % is the modulo operation. What is the most effective algorithm (with the lowest big-O complexity) to solve it?
Example:
Given b=5; c=8; m=13, this algorithm must find e=7, because 5**7 % 13 = 8.
From the % operator I'm assuming that you are working with integers.
You are trying to solve the Discrete Logarithm problem. A reasonable algorithm is Baby step, giant step, although there are many others, none of which are particularly fast.
The difficulty of finding a fast solution to the discrete logarithm problem is a fundamental part of some popular cryptographic algorithms, so if you find a better solution than any of those on Wikipedia please let me know!
This isn't a simple problem at all. It is called computing the discrete logarithm, and it is the inverse operation to modular exponentiation.
There is no efficient algorithm known. That is, if N denotes the number of bits in m, all known algorithms run in time on the order of 2^(N^C) for some constant C > 0.
Python 3 Solution:
Thankfully, SymPy has implemented this for you!
SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.
This is the documentation on the discrete_log function. Use this to import it:
from sympy.ntheory import discrete_log
Their example computes log_7(15) (mod 41):
>>> discrete_log(41, 15, 7)
3
Because of the (state-of-the-art, mind you) algorithms it employs, you'll get O(sqrt(n)) on most inputs you try. It's considerably faster when your prime modulus has the property that p - 1 factors into a lot of small primes.
Consider a prime on the order of 100 bits (~2^100). With sqrt(n) complexity, that's still 2^50 iterations. That being said, don't reinvent the wheel. This does a pretty good job. I might also add that it was almost 4x more memory efficient than Mathematica's MultiplicativeOrder function when I ran it with largish inputs (44 MiB vs. 173 MiB).
Since a duplicate of this question was asked under the Python tag, here is a Python implementation of baby step, giant step, which, as @MarkBeyers points out, is a reasonable approach (as long as the modulus isn't too large):
import math

def baby_steps_giant_steps(a, b, p, N=None):
    if not N:
        N = 1 + int(math.sqrt(p))
    # initialize baby_steps table: a^r -> r
    baby_steps = {}
    baby_step = 1
    for r in range(N + 1):
        baby_steps[baby_step] = r
        baby_step = baby_step * a % p
    # now take the giant steps: multiply b by a^(-N) repeatedly
    giant_stride = pow(a, (p - 2) * N, p)  # a^(-N) mod p via Fermat (p prime)
    giant_step = b
    for q in range(N + 1):
        if giant_step in baby_steps:
            return q * N + baby_steps[giant_step]
        else:
            giant_step = giant_step * giant_stride % p
    return "No Match"
In the above implementation, an explicit N can be passed to fish for a small exponent even if p is cryptographically large. It will find the exponent as long as the exponent is smaller than N**2. When N is omitted, the exponent will always be found, but not necessarily in your lifetime or with your machine's memory if p is too large.
For example, if
p = 70606432933607
a = 100001
b = 54696545758787
then pow(a, b, p) evaluates to 67385023448517
and
>>> baby_steps_giant_steps(a,67385023448517,p)
54696545758787
This took about 5 seconds on my machine. For the exponent and the modulus of those sizes, I estimate (based on timing experiments) that brute force would have taken several months.
Discrete logarithm is a hard problem
Computing discrete logarithms is believed to be difficult. No efficient general method for computing discrete logarithms on conventional computers is known.
I will add here a simple brute-force algorithm that tries every possible value from 1 to m and outputs a solution if one is found. Note that there may be more than one solution to the problem, or no solutions at all. This algorithm returns the smallest possible value, or -1 if none exists.
def bruteLog(b, c, m):
    s = 1
    for i in xrange(m):
        s = (s * b) % m
        if s == c:
            return i + 1
    return -1
print bruteLog(5, 8, 13)
and here you can see that 3 is in fact the solution:
print 5**3 % 13
There is a better algorithm, but because it is often asked to be implemented in programming competitions, I will just give you a link to an explanation.
As said, the general problem is hard. However, a practical way to find e, provided you know e is going to be small (like in your example), would be just to try each e from 1 upward.
By the way, e == 3 is the first solution to your example, and you can obviously find that in 3 steps. Compare that to solving the non-discrete version and naively looking for integer solutions, i.e.
e = log(c + n*m) / log(b), where n is a non-negative integer,
which finds e == 3 in 9 steps.
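That naive search can be coded directly (a small sketch; the floating-point log only proposes a candidate e, which is then verified exactly in integers):

from math import log

def small_e(b, c, m, limit=10**6):
    # try c, c+m, c+2m, ... and test whether each is an exact power of b
    val = c
    for _ in range(limit):
        e = round(log(val) / log(b))
        if b ** e == val:
            return e
        val += m
    return None

print(small_e(5, 8, 13))  # 3, since 5**3 == 125 == 8 + 9*13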
I tried to use the formula
n = log_phi(F * sqrt(5) + 1/2) - 1 (where phi is the golden ratio)
to find the index of a Fibonacci number F in a programming question, and all the smaller test cases passed, but some cases in which F was close to 10^18 failed. I did some dry runs and found out that if F = 99194853094755497 (the 82nd Fibonacci number), the value of n according to the above formula is 81. I coded this in Python and C++, which can be found here and here respectively. I want to know whether the formula works for every value of F or has some limitations.
Note: After doing some more tests, I found out that the code gives correct answers up to the 52nd Fibonacci number.
Update: The question has t test cases, which is why I used a for loop. The given number F might not necessarily be a Fibonacci number. For example, if F = 6, it lies between the two Fibonacci numbers 5 and 8. The index of 5 in the Fibonacci sequence is 4, so the answer is 4.
The formula works just fine:
import math
n = 99194853094755497
print math.log(n * math.sqrt(5) + 0.5) / math.log(1.61803398875) - 1
Output:
82.0
A remark on your code:
Using int(...) for rounding off to an integer might cause trouble if the floating-point result is very close to 82.0. Numerical issues can put the result slightly on either side of the true value, so truncating with int(...) may be off by one even though the formula is mathematically correct.
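One robust pattern (a sketch, not the poster's code) is to round to the nearest integer and then verify the candidate with exact integer arithmetic; fib below is a plain iterative helper:

import math

def fib(n):
    # exact integer Fibonacci, F(0) = 0, F(1) = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_index(F):
    phi = (1 + math.sqrt(5)) / 2
    n = round(math.log(F * math.sqrt(5)) / math.log(phi))
    # the float estimate can be off by one near 10^18, so check neighbours
    for cand in (n - 1, n, n + 1):
        if fib(cand) <= F < fib(cand + 1):
            return cand
    return n

print(fib_index(99194853094755497))  # 83, with F(0) = 0 indexing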
I think your formula is causing a stack overflow because the number is too large to hold in an int.
F = 99194853094755497 is the 84th Fibonacci number, and hence its index is 83. Use the script below to get the correct index (an integer instead of a float).
import math

n = 99194853094755497
eps = 10**-10
phi = (1 + math.sqrt(5)) / 2  # the golden ratio
fibonacci_index = int(round(math.log(n * math.sqrt(5) + eps) / math.log(phi)))
Additional info and code: see https://github.com/gvavvari/Python/tree/master/Fibonacci_index for more detailed documentation on the implementation.
Lately I've been solving some challenges from Google Foobar for fun, and now I've been stuck in one of them for more than 4 days. It is about a recursive function defined as follows:
R(0) = 1
R(1) = 1
R(2) = 2
R(2n) = R(n) + R(n + 1) + n (for n > 1)
R(2n + 1) = R(n - 1) + R(n) + 1 (for n >= 1)
The challenge is writing a function answer(str_S) where str_S is a base-10 string representation of an integer S, which returns the largest n such that R(n) = S. If there is no such n, return "None". Also, S will be a positive integer no greater than 10^25.
I have investigated a lot about recursive functions and about solving recurrence relations, but with no luck. I outputted the first 500 numbers and I found no relation with each one whatsoever. I used the following code, which uses recursion, so it gets really slow when numbers start getting big.
def getNumberOfZombits(time):
    if time == 0 or time == 1:
        return 1
    elif time == 2:
        return 2
    else:
        if time % 2 == 0:
            newTime = time // 2
            return getNumberOfZombits(newTime) + getNumberOfZombits(newTime + 1) + newTime
        else:
            newTime = time // 2  # integer division, so rounds down
            return getNumberOfZombits(newTime - 1) + getNumberOfZombits(newTime) + 1
The challenge also included some test cases so, here they are:
Test cases
==========
Inputs:
(string) str_S = "7"
Output:
(string) "4"
Inputs:
(string) str_S = "100"
Output:
(string) "None"
I don't know if I need to reduce the recurrence relation to something simpler, but since there is one case for even and one for odd numbers, I find it really hard to do (I haven't learned about this in school yet, so everything I know about the subject comes from internet articles).
So, any help at all guiding me to finish this challenge will be welcome :)
Instead of trying to simplify this function mathematically, I simplified the algorithm in Python. As suggested by @LambdaFairy, I implemented memoization in the getNumberOfZombits(time) function. This optimization sped up the function a lot.
Then, I passed to the next step: finding which input produces that number of rabbits. I had analyzed the function before, by looking at its plot, and I knew the even inputs reach higher outputs first, and only after some time do the odd inputs reach the same level. As we want the highest input for a given output, I first needed to search the even numbers and then the odd numbers.
As the plot shows, the odd inputs always take longer than the even ones to reach the same output.
The problem is that we could not search the numbers by increasing 1 each time (it was too slow). What I did instead was implement a binary-search-like algorithm. First, I searched the even numbers until I either found an answer or ran out of numbers to search. Then I did the same with the odd numbers, and if an answer was found there, I replaced whatever I had before with it (as it was necessarily bigger than the previous answer).
I have the source code I used to solve this, so if anyone needs it I don't mind sharing it :)
The key to solving this puzzle was using a binary search.
As you can see from the sequence generators, they rely on a roughly n/2 recursion, so calculating R(N) takes about 2*log2(N) recursive calls; and of course you need to do it for both the odd and the even.
That's not too bad, but you need to figure out where to search for the N that produces the given output. To do this, I first implemented a search for upper and lower bounds on N: I walked N up by powers of 2 until I had an N and 2N that formed the lower and upper bounds, respectively, for each sequence (odd and even).
With these bounds, I could then do a binary search between them to quickly find the value of N, or its non-existence.
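Putting the two answers together, here is a compact sketch (my own reconstruction, not either poster's exact code): R is memoized with functools.lru_cache, an exponential search finds an upper bound, and a bisection runs separately over even and odd arguments, assuming, as argued above, that R is increasing along each parity.

from functools import lru_cache

@lru_cache(maxsize=None)
def R(n):
    if n < 2:
        return 1
    if n == 2:
        return 2
    h = n // 2
    if n % 2 == 0:
        return R(h) + R(h + 1) + h       # R(2n) = R(n) + R(n+1) + n
    return R(h - 1) + R(h) + 1           # R(2n+1) = R(n-1) + R(n) + 1

def answer(str_S):
    S = int(str_S)
    if S == 1:
        return "1"                       # R(0) = R(1) = 1; largest such n is 1
    best = None
    for parity in (0, 1):                # evens first, then odds
        f = lambda k: R(2 * k + parity)  # assumed increasing in k
        hi = 1
        while f(hi) < S:                 # exponential search for an upper bound
            hi *= 2
        lo = 1
        while lo <= hi:                  # plain bisection on k
            mid = (lo + hi) // 2
            v = f(mid)
            if v == S:
                best = 2 * mid + parity  # the odd pass runs last and its
                break                    # matches are larger, so it wins
            if v < S:
                lo = mid + 1
            else:
                hi = mid - 1
    return str(best) if best is not None else "None"

print(answer("7"), answer("100"))        # 4 None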
OK, so I am feeling a little stupid for not knowing this, but a coworker asked, so I am asking here: I have written a Python algorithm that solves his problem: given x > 0, add together all the numbers from 1 to x.
def intsum(x):
    if x > 0:
        return x + intsum(x - 1)
    else:
        return 0

intsum(10)
55
First, what type of equation is this? And what is the correct way to get this answer, as it is clearly easier using some other method?
This is recursion, though for some reason you're labeling it like it's factorial.
In any case, the sum from 1 to n is also simply:
n * ( n + 1 ) / 2
(You can special case it for negative values if you like.)
Transforming recursively-defined sequences of integers into ones that can be expressed in a closed form is a fascinating part of discrete mathematics -- I heartily recommend Concrete Mathematics: A Foundation for Computer Science, by Ronald Graham, Donald Knuth, and Oren Patashnik (see. e.g. the wikipedia entry about it).
However, the specific sequence you show, fac(x) = fac(x - 1) + x, according to a famous anecdote, was solved by Gauss when he was a child in first grade -- the teacher had given the pupils the task of summing the numbers from 1 to 100 to keep them quiet for a while, but two minutes later there was young Gauss with the answer, 5050, and the explanation: "I noticed that I can sum the first, 1, and the last, 100, that's 101; and the second, 2, and the next-to-last, 99, and that's again 101; and clearly that repeats 50 times, so, 50 times 101, 5050". Not rigorous as proofs go, but quite correct and appropriate for a six-year-old ;-).
In the same way (plus really elementary algebra) you can see that the general case is, as many have already said, (N * (N+1)) / 2 (the product is always even, since one of the two numbers must be odd and the other even, so the division by two always produces an integer, as desired, with no remainder).
Here is how to prove the closed form for an arithmetic progression
 S = 1     + 2     + ... + (n-1) + n
 S = n     + (n-1) + ... + 2     + 1
2S = (n+1) + (n+1) + ... + (n+1) + (n+1)
     ^ you'll note that there are n terms there
2S = n(n+1)
 S = n(n+1)/2
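A quick sanity check of the closed form against the direct sum, in Python:

# verify n*(n+1)/2 against the straightforward sum for a few values
for n in (1, 10, 100, 12345):
    assert n * (n + 1) // 2 == sum(range(1, n + 1))
print(10 * 11 // 2)  # 55, matching intsum(10) above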
I'm not allowed to comment yet, so I'll just add that you'll want to be careful using range(), as it's 0-based. You'll need to use range(n+1) to get the desired effect.
Sorry for the duplication...
sum(range(10)) != 55
sum(range(11)) == 55
OP has asked, in a comment, for a link to the story about Gauss as a schoolchild.
He may want to check out this fascinating article by Brian Hayes. It not only rather convincingly suggests that the Gauss story may be a modern fabrication, but outlines how it would be rather difficult not to see the patterns involved in summing the numbers from 1 to 100. That in fact the only way to miss these patterns would be to solve the problem by writing a program.
The article also talks about different ways to sum arithmetic progressions, which is at the heart of OP's question. There is also an ad-free version here.
Larry is very correct with his formula, and it's the fastest way to calculate the sum of all integers up to n.
But for completeness, there are built-in Python functions that perform what you have done, on lists with arbitrary elements. E.g.
sum()
>>> sum(range(11))
55
>>> sum([2,4,6])
12
or, more generally, reduce():
>>> import operator
>>> reduce(operator.add, range(11))
55
Consider that N+1, (N-1)+2, (N-2)+3, and so on all add up to the same number, and there are approximately N/2 such pairs (exactly N/2 if N is even).
What you have there is called an arithmetic sequence, and as suggested, you can compute it directly without the overhead that the recursion introduces.
And I would say this is homework, despite what you say.