Since I'm starting to get the hang of Python, I've been testing my newly acquired Python skills on some problems on projecteuler.net.
Anyway, at some point I ended up making a function for getting a list of all primes up to a number 'n'.
Here's how the function looks at the moment:
def primes(n):
    """Returns list of all the primes up until the number n."""
    # Gather all potential primes in a list.
    primes = range(2, n + 1)
    # The first potential prime in the list should be two.
    assert primes[0] == 2
    # The last potential prime in the list should be n.
    assert primes[-1] == n
    # 'p' will be the index of the current confirmed prime.
    p = 0
    # As long as 'p' is within the bounds of the list:
    while p < len(primes):
        # Set the candidate index 'c' to start right after 'p'.
        c = p + 1
        # As long as 'c' is within the bounds of the list:
        while c < len(primes):
            # Check if the candidate is divisible by the prime.
            if(primes[c] % primes[p] == 0):
                # If it is, it isn't a prime, and should be removed.
                primes.pop(c)
            # Move on to the next candidate and redo the process.
            c = c + 1
        # The next integer in the list should now be a prime,
        # since it is not divisible by any of the primes before it.
        # Thus we can move on to the next prime and redo the process.
        p = p + 1
    # The list should now only contain primes, and can thus be returned.
    return primes
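For a quick sanity check, calling it with a small n gives the expected result:

    >>> primes(20)
    [2, 3, 5, 7, 11, 13, 17, 19]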
It seems to work fine, although there's one thing that bothers me.
While commenting the code, this piece suddenly seemed off:
# Check if the candidate is divisible by the prime.
if(primes[c] % primes[p] == 0):
    # If it is, it isn't a prime, and should be removed from the list.
    primes.pop(c)
# Move on to the next candidate and redo the process.
c += 1
If the candidate IS NOT divisible by the prime we examine the next candidate located at 'c + 1'. No problem with that.
However, if the candidate IS divisible by the prime, we first pop it and then examine the next candidate located at 'c + 1'.
It struck me that the next candidate, after popping, is not located at 'c + 1', but 'c', since after popping at 'c', the next candidate "falls" into that index.
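A tiny demonstration of that index shift:

    lst = [2, 3, 4, 5]
    lst.pop(2)     # removes the 4 at index 2
    print(lst[2])  # prints 5: the next element has "fallen" into index 2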
I then thought that the block should look like the following:
# If the candidate is divisible by the prime:
if(primes[c] % primes[p] == 0):
    # If it is, it isn't a prime, and should be removed from the list.
    primes.pop(c)
# If not:
else:
    # Move on to the next candidate.
    c += 1
The above block seems more correct to me, but leaves me wondering why the original piece apparently worked just fine.
So, here are my questions:
After popping a candidate which turned out not to be a prime, can we assume, as my original code does, that the next candidate is NOT divisible by that same prime?
If so, why is that?
Would the suggested "safe" code just do unnecessary checks on the candidates which were skipped in the "unsafe" code?
PS:
I've tried writing the above assumption as an assertion into the 'unsafe' function, and tested it with n = 100000. No problems occurred. Here's the modified block:
# If the candidate is divisible by the prime:
if(primes[c] % primes[p] == 0):
    # If it is, it isn't a prime, and should be removed.
    primes.pop(c)
    # If c is still within the bounds of the list:
    if c < len(primes):
        # We assume that the new candidate at 'c' is not divisible by the prime.
        assert primes[c] % primes[p] != 0
# Move on to the next candidate and redo the process.
c = c + 1
It fails for much bigger numbers. The first prime for which a candidate can fail is 71. The smallest failing candidate for 71 is 10986448536829734695346889: when it is popped, it "overshadows" the number 10986448536829734695346889 + 142, which is also divisible by 71 but falls into the just-vacated index and is skipped.
def primes(n, skip_range=None):
    """Modified "primes" with the original assertion from the P.S. of the
    question, with skipping of an unimportant huge range.
    >>> primes(71)
    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
    >>> # The smallest failing number for the first failing prime 71:
    >>> big_n = 10986448536829734695346889
    >>> primes(big_n + 2 * 71, (72, big_n))
    Traceback (most recent call last):
    AssertionError
    """
    if not skip_range:
        primes = list(range(2, n + 1))
    else:
        primes = list(range(2, skip_range[0]))
        primes.extend(range(skip_range[1], n + 1))
    p = 0
    while p < len(primes):
        c = p + 1
        while c < len(primes):
            if(primes[c] % primes[p] == 0):
                primes.pop(c)
                if c < len(primes):
                    assert primes[c] % primes[p] != 0
            c = c + 1
        p = p + 1
    return primes
# Verify that it can fail.
aprime = 71  # the first problematic prime
FIRST_BAD_NUMBERS = (
    10986448536829734695346889, 11078434793489708690791399,
    12367063025234804812185529, 20329913969650068499781719,
    30697401499184410328653969, 35961932865481861481238649,
    40008133490686471804514089, 41414505712084173826517629,
    49440212368558553144898949, 52201441345368693378576229)

for bad_number in FIRST_BAD_NUMBERS:
    try:
        primes(bad_number + 2 * aprime, (aprime + 1, bad_number))
        raise Exception('The number {} should fail'.format(bad_number))
    except AssertionError:
        print('{} OK. It fails as is expected'.format(bad_number))
I solved for these numbers with a complicated, puzzle-like algorithm that searches the possible remainders of n modulo small primes. The last, simple step was to reconstruct the complete n (by the Chinese remainder theorem, in three lines of Python code). I know all 120 basic solutions smaller than primorial(71) = 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 * 29 * 31 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71; they repeat periodically at every multiple of this number. I rewrote the algorithm many times, once for every decade of tested primes, because each decade the solution was much slower than for the previous one. Maybe I will find a smaller solution with the same algorithm for the primes 73 or 79 in acceptable time.
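For reference, reconstructing n from its remainders modulo pairwise coprime moduli really does fit in a few lines; a minimal sketch (not my original code; pow(m, -1, mi) needs Python 3.8+):

    def crt(remainders, moduli):
        # Chinese remainder theorem by successive combination of congruences.
        n, m = 0, 1
        for r, mi in zip(remainders, moduli):
            # solve n + m*k == r (mod mi) for k via the modular inverse of m
            k = ((r - n) * pow(m, -1, mi)) % mi
            n += m * k
            m *= mi
        return n

For example, crt([2, 3], [3, 5]) returns 8, the smallest n with n % 3 == 2 and n % 5 == 3.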
Edit:
I would also like to find a completely silent failure of the original "unsafe" function. Maybe there exists some candidate composed of more distinct primes. That kind of solution would only postpone the final outcome, and every step would be much more expensive in time and resources, so only numbers composed of one or two primes are attractive.
I expect that the hidden candidate c can take only these forms: c = p ** n, c = p1 * p ** n, or c = p1 ** n1 * p ** n, where p and p1 are primes and n is a power greater than 1. The primes function fails if c - 2 * p is divisible by no prime smaller than p and if every number between c - 2 * p and c is divisible by some prime smaller than p. The variant p1 * p ** n also requires that the same c had failed before for p1 (p1 < p), and we already know an infinite number of such candidates.
EDIT: I found a smaller example of failure: the number 121093190175715194562061 for the prime 79 (about ninety times smaller than the example for 71). I can't continue with the same algorithm to find smaller examples, because all 702612 basic solutions took more than 30 hours for the prime 79 on my laptop.
I also verified, for all candidates smaller than 400000000 (4E8) and for all relevant primes, that no candidate will fail the assertion in the question. Unless you have terabytes of memory and thousands of years of time, the assertion in the algorithm will pass, because the time complexity is O((n / log(n)) ** 2) or very similar.
Your observation seems to be accurate, which is quite a good catch.
I suspect the reason that it works, at least in some cases, is that composite numbers are factored into multiple primes. So, the inner loop may miss the value on the first factor, but it then picks it up on a later factor.
For a small'ish "n", you can print out values of the list to see if this is what is happening.
This method of finding primes, by the way, is based on the Sieve of Eratosthenes. The open issue when doing the sieve this way is whether, when "c" is a multiple of "p", the next remaining value can ever be a multiple of the same prime.
The question is: are there any cases where all values between p*x and p*(x+1) are divisible by some prime less than p? (This is where the algorithm would miss a value, and it would not be caught later.) However, one of the two multiples p*x and p*(x+1) is even, so it would be eliminated on round "2". So, the real question is whether there are cases where all values between p*x and p*(x+2) are divisible by numbers less than p.
Off hand, I can't think of any numbers less than 100 that meet this condition. For p = 5, there is always a value that is not divisible by 2 or 3 between two consecutive multiples of 5.
There seems to be a lot written on prime gaps and sequences, but not so much on runs of consecutive integers divisible by numbers less than p. After some (okay, a lot) of trial and error, I've determined that every number between 39,474 (17*2,322) and 39,491 (17*2,323) is divisible by an integer less than 17:
39,475 5
39,476 2
39,477 3
39,478 2
39,479 11
39,480 2
39,481 13
39,482 2
39,483 3
39,484 2
39,485 5
39,486 2
39,487 7
39,488 2
39,489 3
39,490 2
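That claim is easy to double-check with a few lines (a quick verification sketch):

    # every x strictly between 39474 and 39491 has a divisor below 17
    for x in range(39475, 39491):
        d = next(d for d in range(2, 17) if x % d == 0)
        print(x, d)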
I am not sure if this is the first such value. However, we would have to find sequences twice as long as this. I think that is unlikely, but not sure if there is a proof.
My conclusion is that the original code might work, but that your fix is the right thing to do. Without a proof that there are no such sequences, it looks like a bug, albeit a bug that could be very, very, very rare.
Given two numbers n, m in the consecutive sequence of possible primes, such that n and m are not divisible by the last divisor p, then m - n < p.
Given q (the next higher divisor) > p: if n is divisible by q, then the next number divisible by q is n + q > n + p > m,
so m should be skipped in the current iteration of the divisibility test.
Here:
n = primes[c]
m = primes[c + 1], i.e. primes[c] after primes.pop(c)
p = primes[p]
q = primes[p + 1]
This program does not work correctly, i.e., it incorrectly reports a composite number as prime. It turns out to have the same bug as a program by Wirth. The details may be found in Paul Pritchard, Some negative results concerning prime number generators, Communications of the ACM, Vol. 27, no. 1, Jan. 1984, pp. 53–57. This paper gives a proof that the program must fail, and also exhibits an explicit composite which it reports as prime.
This doesn't provide a remotely conclusive answer, but here's what I've tried on this:
I've restated the required assumption here as (lpf stands for Least Prime Factor):
For any composite number, x, where:
lpf(x) = n
There exists a value, m, where 0 < m < 2n and:
lpf(x+m) > n
It can be easily demonstrated that values for x exist where no composite number (x+m) exists to satisfy the inequality. Any squared prime demonstrates this:
lpf(x) = x^.5, so x = n^2
n^2 + 2n < (n + 1)^2 = n^2 + 2n + 1
So, in the case of any squared prime, for this to hold true, there must be a prime number, p, present in the range x < p < x + 2n.
I think that can be concluded given the asymptotic distribution of squares (x^.5) compared to the Prime Number Theorem (asymptotic distribution of primes, approx. x/(ln x)), though, really, my understanding of the Prime Number Theorem is limited at best.
And I have no strategy whatsoever for extending that conclusion to non-square composite numbers, so that may not be a useful avenue.
I've put together a program testing values using the above restatement of the problem.
Testing this statement directly should remove any got-lucky results from just running the algorithm as stated. By got-lucky results, I'm referring to a value being skipped that may not be safe, but that doesn't turn up any incorrect results, due to the skipped value not being divisible by the number currently being iterated on, or being picked up by a subsequent iteration. Essentially, if the algorithm gets the correct result, but either doesn't find the LEAST prime factor of each eliminated value, or doesn't rigorously check each prime result, I'm not satisfied with it. If such cases exist, I think it's reasonable to assume that cases also exist where it would not get lucky (unusual though they may be), and would render an incorrect result.
Running my test, however, shows no counter-examples in the values from 2 - 2,000,000. So, for what it's worth, values from the algorithm as stated should be safe up to, at least, 2,000,000, unless my logic is incorrect.
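A minimal version of such a brute-force check might look like this (an illustrative sketch, not the exact program I ran):

    def lpf(x):
        # smallest prime factor of x (x itself if x is prime)
        p = 2
        while p * p <= x:
            if x % p == 0:
                return p
            p += 1
        return x

    def find_counterexample(limit):
        # look for a composite x with lpf(x) = n such that
        # no m in (0, 2n) gives lpf(x + m) > n
        for x in range(4, limit):
            n = lpf(x)
            if n < x and all(lpf(x + m) <= n for m in range(1, 2 * n)):
                return x
        return None

    print(find_counterexample(2000000))  # prints None: no counter-example found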
That's what I have to add. Great question, Phazyck, had fun with it!
Here is an idea:
Triptych explained[1] that the next number after c cannot be c + p, but we still need to show that it can also never be c + 2p.
If we use primes = [2], we can only have one consecutive "non-prime": a number divisible by 2.
If we use primes = [2,3], we can construct 3 consecutive "non-primes": a number divisible by 2, a number divisible by 3, and a number divisible by 2; and the run cannot be extended any further. Or:
2,3,4 => 3 consecutive "non-primes"
Even though 2 and 3 are not "non-primes", it is easier for me to think in terms of those numbers.
If we use [2,3,5], we get
2,3,4,5,6 => 5 consecutive "non-primes"
If we use [2,3,5,7], we get
2,3,4,5,6,7,8,9,10 => 9 consecutive "non-primes"
The pattern emerges: the most consecutive non-primes that we can get is next_prime - 2.
Therefore, if next_prime < p * 2 + 1, there has to be at least one surviving number between c and c + 2p, because the run of consecutive non-primes is not long enough, given the primes so far.
I don't know about very, very big numbers, but I think this next_prime < p * 2 + 1 is likely to hold for them as well (Bertrand's postulate guarantees a prime strictly between p and 2p for any p > 1).
I hope this makes sense, and adds some light.
[1] Triptych's answer has been deleted.
If prime p divides candidate c, then the next larger candidate that is divisible by p is c + p. Therefore, your original code is correct.
However, it's a rotten way to produce a list of primes; try it with n = 1000000 and see how slow it gets. The problem is that you are performing trial division when you should be using a sieve. Here's a simple sieve (pseudocode, I'll let you do the translation to Python or another language):
function primes(n)
    sieve := makeArray(2..n, True)
    for p from 2 to n step 1
        if sieve[p]
            output p
            for i from p+p to n step p
                sieve[i] := False
That should get the primes less than a million in less than a second. And there are other sieve algorithms that are even faster.
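If it helps, a direct Python translation of that pseudocode might look like this (a sketch; the bounds handling and naming are my own):

    def primes(n):
        sieve = [True] * (n + 1)      # index i is True while i may still be prime
        result = []
        for p in range(2, n + 1):
            if sieve[p]:
                result.append(p)      # p survived all smaller primes
                for i in range(p + p, n + 1, p):
                    sieve[i] = False  # cross off every multiple of p
        return result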
This algorithm is called the Sieve of Eratosthenes, and was invented about 2200 years ago by a Greek mathematician. Eratosthenes was an interesting fellow: besides sieving for primes, he invented the leap day and a system of latitude and longitude, accurately calculated the distance from the Sun to the Earth and the circumference of the Earth, and was for a time the Chief Librarian of Ptolemy's Library in Alexandria.
When you are ready to learn more about programming with prime numbers, I modestly recommend this essay at my blog.
Related
I would like to decompose a number into a tuple of numbers as close to each other in size as possible, whose product is the initial number. The inputs are the number n we want to factor and the number m of factors desired.
For the two-factor situation (m==2), it is enough to look for the largest factor less than or equal to the square root, so I can do something like this:
def get_factors(n):
    i = int(n**0.5 + 0.5)
    while n % i != 0:
        i -= 1
    return i, n/i
So calling this with 120 will result in (10, 12).
I realize there is some ambiguity as to what it means for the numbers to be "close to each other in size". I don't mind if this is interpreted as minimizing Σ|x_i - x_avg| or Σ(x_i - x_avg)^2 or something else generally along those lines.
For the m==3 case, I would expect 336 to produce (6, 7, 8) and 729 to produce (9, 9, 9).
Ideally, I would like a solution for general m, but if someone has an idea even for m==3 it would be much appreciated. I welcome general heuristics too.
EDIT: I would prefer to minimize the sum of the factors. I am still interested in the above, but if someone has an idea for a way of also figuring out the optimal m value, such that the sum of the factors is minimal, that would be great!
To answer your second question (which m minimizes the sum of the factors): it will always be optimal to split the number into its prime factors. Indeed, for any positive composite number except 4, the sum of its prime factors is less than the number itself, so any split that contains a composite number can be improved by splitting that composite number into its prime factors.
To answer your first question, the greedy approaches suggested by others will not work; as I pointed out in the comments, 4104 breaks them: greedy will immediately extract 8 as the first factor, and then will be forced to split the remaining number into [3, 9, 19], failing to find the better solution [6, 6, 6, 19]. However, a simple DP can find the best solution. The state of the DP is the number we are trying to factor and how many factors we want to get; the value of the DP is the best sum possible. Something along the lines of the code below. It can be optimized by doing the factorization smarter.
n = int(raw_input())
left = int(raw_input())
memo = {}

def dp(n, left):  # returns tuple (cost, [factors])
    if (n, left) in memo: return memo[(n, left)]
    if left == 1:
        return (n, [n])
    i = 2
    best = n
    bestTuple = [n]
    while i * i <= n:
        if n % i == 0:
            rem = dp(n / i, left - 1)
            if rem[0] + i < best:
                best = rem[0] + i
                bestTuple = [i] + rem[1]
        i += 1
    memo[(n, left)] = (best, bestTuple)
    return memo[(n, left)]

print dp(n, left)[1]
For example
[In] 4104
[In] 4
[Out] [6, 6, 6, 19]
You can start with the same principle: look for numbers under or equal to the mth root that are factors. Then you can recurse to find the remaining factors.
def get_factors(n, m):
    factors = []
    factor = int(n**(1.0/m) + .1)  # fudged to deal with precision problem with float roots
    while n % factor != 0:
        factor = factor - 1
    factors.append(factor)
    if m > 1:
        factors = factors + get_factors(n / factor, m - 1)
    return factors

print get_factors(729, 3)
How about this, for m=3 and some n:
Get the largest factor of n smaller than the cube root of n, call it f1
Divide n by f1, call it g
Find the "roughly equal factors" of g as in the m=2 example.
For 336, the largest factor smaller than the cube root of 336 is 6 (I think). Dividing 336 by 6 gives 56 (another factor, go figure!) Performing the same math for 56 and looking for two factors, we get 7 and 8.
Note that this doesn't work for any number with fewer than 3 factors. This method could be expanded for m > 3, maybe.
If this is right, and I'm not too crazy, the solution would be a recursive function:
factors=[]
n=336
m=3
def getFactors(howMany, value):
if howMany < 2:
return value
root=getRoot(howMany, value) # get the root of value, eg square root, cube, etc.
factor=getLargestFactor(value, root) # get the largest factor of value smaller than root
otherFactors=getFactors(howMany-1, value / factor)
otherFactors.insert(factor)
return otherFactors
print getFactors(n, m)
I'm too lazy to code the rest, but that should do it.
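For what it's worth, the two missing helpers might look something like this (a rough sketch; the names follow the pseudocode above):

    def getRoot(howMany, value):
        # integer part of the howMany-th root, nudged to dodge float error
        return int(value ** (1.0 / howMany) + 1e-9)

    def getLargestFactor(value, root):
        # largest factor of value that is <= root (1 in the worst case)
        for i in range(root, 0, -1):
            if value % i == 0:
                return i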
For m = 5 and n = 4, compute m^(1/n):
you get:
Answer = 1.495
then
1.495 * 1.495 * 1.495 * 1.495 ≈ 5
In C#:
double Result = Math.Pow(m, 1/(double)n);
The problem is:
Given a range of numbers (x, y), find all the prime numbers (count only) which are sums of the squares of two numbers, with the restriction that 0 <= x < y <= 2*(10^8).
According to Fermat's theorem:
Fermat's theorem on sums of two squares asserts that an odd prime number p can be expressed as p = x^2 + y^2 with integer x and y if and only if p is congruent to 1 (mod 4).
I have done something like this:
import math

def is_prime(n):
    if n % 2 == 0 and n > 2:
        return False
    return all(n % i for i in range(3, int(math.sqrt(n)) + 1, 2))

a, b = map(int, raw_input().split())
count = 0
for i in range(a, b+1):
    if(is_prime(i) and (i-1) % 4 == 0):
        count += 1
print(count)
But this exceeds the time limit and the memory limit in some cases.
Can anyone help me reduce the time complexity and memory usage with a better algorithm?
Problem link (not an ongoing contest, FYI)
Do not check whether each number is prime. Precompute all the prime numbers in the range using the Sieve of Eratosthenes; this will greatly reduce the complexity.
Since you have a maximum of 200M numbers, a 256 MB memory limit, and need at least 4 bytes per number, you need a little hack: do not initialize the sieve with all numbers up to y, but only with the numbers that are not divisible by 2, 3 and 5. That will reduce the initial size of the sieve enough to fit into the memory limit.
UPD: As correctly pointed out by Will Ness in the comments, the sieve contains only flags, not numbers, thus it requires no more than 1 byte per element, and you don't even need this precomputing hack.
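A minimal sketch of this approach (assuming b >= 2 and that the whole range fits in memory as one flag per number; the function name and counting 2 = 1^2 + 1^2 are my choices):

    def count_sum_of_two_squares_primes(a, b):
        sieve = bytearray([1]) * (b + 1)   # one byte per flag
        sieve[0:2] = b'\x00\x00'           # 0 and 1 are not prime
        for p in range(2, int(b ** 0.5) + 1):
            if sieve[p]:
                sieve[p*p::p] = bytearray(len(range(p*p, b + 1, p)))
        # an odd prime is a sum of two squares iff p % 4 == 1 (Fermat);
        # 2 = 1**2 + 1**2 also qualifies
        return sum(1 for p in range(max(a, 2), b + 1)
                   if sieve[p] and (p == 2 or p % 4 == 1))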
You can reduce your memory usage by changing for i in range(a,b+1): to for i in xrange(a,b+1):, so that you are not generating an entire list in memory.
You can do the same thing inside the statement below, but you are right that it does not help with time.
return all(n % i for i in xrange(3, int(math.sqrt(n)) + 1, 2))
One optimization that might not cost as much in terms of memory as the other answer is to use Fermat's little theorem as a quick probabilistic test. It may help you reject many candidates early.
More specifically, you could pick maybe 3 or 4 random values to test with, and if one of them rejects, then you can reject the candidate. Otherwise you can do the test you are currently doing.
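A sketch of that idea (the function name and the choice of 3 trials are illustrative; Carmichael numbers can still fool the test, which is why the full check follows):

    import random

    def fermat_rejects(n, trials=3):
        # Fermat's little theorem: if n is prime, then pow(a, n - 1, n) == 1
        # for every 0 < a < n; a violation proves n composite.
        if n < 5:
            return False  # too small to sample from; test such n directly
        for _ in range(trials):
            a = random.randrange(2, n - 1)
            if pow(a, n - 1, n) != 1:
                return True   # definitely composite
        return False          # probably prime; run the full check to be sure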
First of all, although it will not change the order of your time-complexity, you can still narrow down the list of numbers that you are checking by a factor of 6, since you only need to check numbers that are either equal to 1 mod 12 or equal to 5 mod 12 (such as [1,5], [13,17], [25,29], [37,41], etc).
Since you only need to count the primes which are sum of squares of two numbers, the order doesn't matter. Therefore, you can change range(a,b+1) to range(1,b+1,12)+range(5,b+1,12).
Obviously, you can then remove the if n % 2 == 0 and n > 2 condition in function is_prime, and in addition, change the if is_prime(i) and (i-1)%4 == 0 condition to if is_prime(i).
And finally, you can check the primality of each number by dividing it only with numbers that are adjacent to multiples of 6 (such as [5,7], [11,13], [17,19], [23,25], etc).
So you can change this:
range(3,int(math.sqrt(n))+1,2)
To this:
range(5, int(math.sqrt(n))+1, 6) + range(7, int(math.sqrt(n))+1, 6)
And you might as well calculate int(math.sqrt(n))+1 beforehand.
To summarize all this, here is how you can improve the overall performance of your program:
import math

def is_prime(n):
    max = int(math.sqrt(n)) + 1
    return all(n % i for i in range(5, max, 6) + range(7, max, 6))

count = 0
b = int(raw_input())
for i in range(1, b+1, 12) + range(5, b+1, 12):
    if is_prime(i):
        count += 1
print count
Please note that 1 is typically not regarded as prime, so you might want to print count-1 instead. On the other hand, 2 is not congruent to 1 mod 4, yet it is the sum of two squares (2 = 1^2 + 1^2), so you may leave it as is...
The following is an algorithm that finds the prime factorization of a given number N. I'm wondering if there are any ways to make this faster for HUGE numbers, like 20-35 digit numbers. I want to get these to run as fast as possible. Any ideas?
import time

def prime_factors(n):
    """Returns all the prime factors of a positive integer"""
    factors = []
    divisor = 2
    while n > 1:
        while n % divisor == 0:
            factors.append(divisor)
            n /= divisor
        divisor = divisor + 1
        if divisor*divisor > n:
            if n > 1:
                factors.append(n)
            break
    return factors

# HUGE NUMBERS GO IN HERE!
start_time = time.time()
my_factors = prime_factors(15227063669158801)
end_time = time.time()

print my_factors
print "It took ", end_time-start_time, " seconds."
Your algorithm is trial division, which has time complexity O(sqrt(n)). You can improve your algorithm by using only 2 and the odd numbers as trial divisors, or even better by using only prime numbers as trial divisors, but the time complexity will remain O(sqrt(n)).
To go faster you need a better algorithm. Try this:
from fractions import gcd  # Python 2; on Python 3 use math.gcd

def factor(n, c):
    f = lambda x: (x*x + c) % n
    t, h, d = 2, 2, 1
    while d == 1:
        t = f(t); h = f(f(h)); d = gcd(abs(t - h), n)
    if d == n:
        return factor(n, c+1)
    return d
To call it on your number, say
print factor(15227063669158801, 1)
That returns the (possibly composite) factor 2090327 virtually instantly. It uses an algorithm called the rho algorithm, invented by John Pollard in 1975. The rho algorithm has time complexity O(sqrt(sqrt(n))), so it's much faster than trial division.
There are many other algorithms for factoring integers. For numbers in the 20 to 35 digit range that interests you, the elliptic curve algorithm is well-suited. It should factor numbers of that size in no more than a few seconds. Another algorithm that is well-suited to such numbers, especially those that are semi-primes (have exactly two prime factors), is SQUFOF.
If you're interested in programming with prime numbers, I modestly recommend this essay on my blog. When you're finished with that, other entries on my blog talk about elliptic curve factorization, and SQUFOF, and various other even more-powerful methods of factoring ever-larger integers.
For example, list all prime factors of the number 100:
Check whether 2 is one of the factors. Then every multiple 2*c with 2 < 2*c <= 100 can be removed, e.g. 4, 6, 8, ..., 98.
Check whether 3 is one of the factors. Then every multiple 3*d with 3 < 3*d <= 100 can be removed, e.g. 9, 12, ..., 99.
4 has already been removed from the possible set.
Check 5; then 10, 15, 20, ..., 100 are removed.
6 has already been removed.
Check 7, ...
....
It seems like there is no primality check for the divisors. Sorry if I am wrong, but how do you know whether divisor is prime or not? Your divisor variable increases by 1 after each loop, so I assume it will generate a lot of composite numbers.
No optimizations to that algorithm will allow you to factor 35-digit numbers, at least in the general case. The reason is that the number of primes up to 35 digits is too high to be listed in a reasonable amount of time, let alone tried as divisors one by one. Even if one were inclined to try, the number of bits required to store them would be far too much as well. In this case you'll want to select a different algorithm from the list of general-purpose factorization algorithms.
However, if all the prime factors are small enough (say below 10^12 or so), then you could use a segmented Sieve of Eratosthenes, or simply find a list of primes up to some practical bound (say 10^12 or so) online and use that, instead of trying to calculate the primes, and hope the list is large enough.
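A rough sketch of a segmented sieve, in case it helps (illustrative only; the bounds handling is my own):

    import math

    def segmented_primes(lo, hi):
        # primes in [lo, hi): sieve the base primes up to sqrt(hi) first,
        # then cross off their multiples inside the segment only
        limit = int(math.sqrt(hi)) + 1
        base = [True] * limit
        base_primes = []
        for p in range(2, limit):
            if base[p]:
                base_primes.append(p)
                for m in range(p * p, limit, p):
                    base[m] = False
        seg = [True] * (hi - lo)
        for p in base_primes:
            start = max(p * p, ((lo + p - 1) // p) * p)
            for m in range(start, hi, p):
                seg[m - lo] = False
        return [lo + i for i, flag in enumerate(seg) if flag and lo + i > 1]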
The 10th problem in Project Euler:
The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
Find the sum of all the primes below two million.
I found this snippet :
sieve = [True] * 2000000  # Sieve is faster for 2M primes

def mark(sieve, x):
    for i in xrange(x+x, len(sieve), x):
        sieve[i] = False

for x in xrange(2, int(len(sieve) ** 0.5) + 1):
    if sieve[x]: mark(sieve, x)

print sum(i for i in xrange(2, len(sieve)) if sieve[i])
published here, which runs in 3 seconds.
I wrote this code:
def isprime(n):
    for x in xrange(3, int(n**0.5)+1):
        if n % x == 0:
            return False
    return True

sum = 0
for i in xrange(1, int(2e6), 2):
    if isprime(i):
        sum += i
I don't understand why my code (the second one) is so much slower.
Your algorithm is checking every number individually from 2 to N (where N=2000000) for primality.
Snippet-1 uses the sieve of Eratosthenes algorithm, discovered about 2200 years ago.
It does not check every number but:
Makes a "sieve" of all numbers from 2 to 2000000.
Finds the first number (2), marks it as prime, then deletes all its multiples from the sieve.
Then finds the next undeleted number (3), marks it as prime and deletes all its multiples from the sieve.
Then finds the next undeleted number (5), marks it as prime and deletes all its multiples from the sieve.
...
Until it finds the prime 1409 and deletes all its multiples from the sieve.
Then all primes up to 1414 ~= sqrt(2000000) have been found, and it stops.
The numbers from 1415 up to 2000000 do not have to be checked individually: all of those that have not been deleted are primes, too.
So the algorithm produces all primes up to N.
Notice that it does not do any division, only additions (not even multiplications; not that it matters with such small numbers, but it might with bigger ones). Time complexity is O(n log log n), while your algorithm has something near O(n^(3/2)) (or O(n^(3/2) / log n), as @Daniel Fischer commented), assuming divisions cost the same as multiplications.
From the Wikipedia (linked above) article:
Time complexity in the random access machine model is O(n log log n) operations, a direct consequence of the fact that the prime harmonic series asymptotically approaches log log n.
(with n = 2e6 in this case)
The first version pre-computes all the primes in the range and stores them in the sieve array, then finding the solution is a simple matter of adding the primes in the array. It can be seen as a form of memoization.
The second version tests for each number in the range to see if it is prime, repeating a lot of work already made by previous calculations.
In conclusion, the first version avoids re-computing values, whereas the second version performs the same operations again and again.
To easily understand the difference, try thinking about how many times each number will be used as a potential divisor:
In your solution, the number 2 will be tested against EACH number when that number is tested for being a prime. Every number you pass along the way will then be used as a potential divisor for every subsequent number.
In the first solution, once you have stepped over a number you never look back: you always move forward from the place you reached. By the way, a possible and common optimization is to go for odd numbers only after you have marked 2:
mark(sieve, 2)
for x in xrange(3, int(len(sieve) ** 0.5) + 1, 2):
    if sieve[x]: mark(sieve, x)
This way you only look at each number once and clear out all of its multiples going forward, rather than going through all possible divisors again and again, checking each number against all its predecessors; and the if statement prevents you from doing repeated work for a number you previously encountered.
As Óscar's answer indicates, your algorithm repeats a lot of work. To see just how much processing the other algorithm saves, consider the following modified versions of the mark() and isprime() functions, which keep track of how many times each function has been called and the total number of for-loop iterations:
calls, count = 0, 0

def mark(sieve, x):
    global calls, count
    calls += 1
    for i in xrange(x+x, len(sieve), x):
        count += 1
        sieve[i] = False
After running the first code with this new function we can see that mark() is called 223 times with a total of 4,489,006 (~4.5 million) iterations in the for loop.
calls, count = 0, 0

def isprime(n):
    global calls, count
    calls += 1
    for x in xrange(3, int(n**0.5)+1):
        count += 1
        if n % x == 0:
            return False
    return True
If we make a similar change to your code, we can see that isprime() is called 1,000,000 (1 million) times with 177,492,735 (~177.5 million) iterations of the for loop.
Counting function calls and loop iterations isn't always a conclusive way to determine why an algorithm is faster, but generally fewer steps == less time, and clearly your code could use some optimization to reduce the number of steps.
I am trying to solve a problem that involves printing the product of all divisors of a given number. The number of test cases t satisfies 1 <= t <= 300000, and the number itself can range from 1 <= n <= 500000.
I wrote the following code, but it always exceeds the time limit of 2 seconds. Are there any ways to speed up the code?
from math import sqrt

def divisorsProduct(n):
    ProductOfDivisors = 1
    for i in range(2, int(round(sqrt(n))) + 1):
        if n % i == 0:
            ProductOfDivisors *= i
            if n/i != i:
                ProductOfDivisors *= (n/i)
    if ProductOfDivisors <= 9999:
        print ProductOfDivisors
    else:
        result = str(ProductOfDivisors)
        print result[len(result)-4:]

T = int(raw_input())
for i in range(1, T+1):
    num = int(raw_input())
    divisorsProduct(num)
Thank You.
You need to clarify what you mean by "product of divisors"; the code posted in the question doesn't work for any definition yet. This sounds like a homework question. If it is, then perhaps your instructor was expecting you to think outside the code to meet the time goals.
If you mean the product of unique prime divisors, e.g., 72 gives 2*3 = 6, then having a list of primes is the way to go. Just run through the list up to the square root of the number, multiplying present primes into the result. There are not that many, so you could even hard code them into your program.
If you mean the product of all the divisors, prime or not, then it is helpful to think of what the divisors are. You can make serious speed gains over the brute force method suggested in the other answers and yours. I suspect this is what your instructor intended.
If the divisors are ordered in a list, then they occur in pairs that multiply to n (1 and n, 2 and n/2, etc.), except for the case where n is a perfect square, where the square root is a divisor that is not paired with any other.
So the result will be n to the power of half the number of divisors (regardless of whether or not n is a square).
To compute this, find the prime factorization using your list of primes. That is, find the power of 2 that divides n, then the power of 3, etc. To do this, take out all the 2s, then the 3s, etc.
The number you are taking the factors out of will keep getting smaller, so you can do the square-root test on the smaller intermediate numbers to see if you need to continue up the list of primes. To gain some speed, test p*p <= m rather than p <= sqrt(m).
Once you have the prime factorization, it is easy to find the number of divisors. For example, suppose the factorization is 2^i * 3^j * 7^k. Then, since each divisor uses the same prime factors, with exponents less than or equal to those in n (including the possibility of 0), the number of divisors is (i+1)(j+1)(k+1).
E.g., 72 = 2^3 * 3^2, so the number of divisors is 4*3 = 12, and their product is 72^6 = 139,314,069,504.
By using math, the algorithm can become much better than O(n). But it is hard to estimate your speed gains ahead of time because of the relatively small size of the n in the input.
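Putting the pieces together, a sketch of the whole approach might look like this (Python 3.8+ for math.isqrt; the name and the use of plain trial division instead of a prime list are my choices):

    import math

    def divisor_product(n):
        # count divisors from the prime factorization: multiply the (exponent + 1)s
        m, numdiv = n, 1
        p = 2
        while p * p <= m:
            if m % p == 0:
                e = 0
                while m % p == 0:
                    m //= p
                    e += 1
                numdiv *= e + 1
            p += 1
        if m > 1:              # a prime factor larger than sqrt(n) remains
            numdiv *= 2
        # divisors pair up to a product of n; a perfect square also
        # contributes its unpaired square root
        result = n ** (numdiv // 2)
        if numdiv % 2 == 1:
            result *= math.isqrt(n)
        return result

For example, divisor_product(72) returns 72 ** 6 == 139314069504, matching the worked example above.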
You could eliminate the if statement in the loop by looping only up to strictly less than the square root, and checking whether the square root is an integer outside the loop.
It is a rather strange question you pose. I have a hard time imagining a use for it, other than it possibly being an assignment in a course. My first thought was to pre-compute a list of primes and only test against those, but I assume you are quite deliberately counting non-prime factors? I.e., if the number has factors 2 and 3, you are also counting 6.
If you do use a table of pre-computed primes, you would then also have to include all possible combinations of those primes in your result, which gets more complex.
C is really a great language for that sort of thing, because even suboptimal algorithms run really fast.
Okay, I think this is close to the optimal algorithm. It produces the product_of_divisors for each number in range(500000).
import math

def number_of_divisors(maxval=500001):
    """ Example: the number of divisors of 12 is 6: 1, 2, 3, 4, 6, 12.
    Given a prime factoring of n, the number of divisors of n is the
    product of each factor's multiplicity plus one (mpo in my variables).
    This function works like the Sieve of Eratosthenes, but marks each
    composite n with the multiplicity (plus one) of each prime factor. """
    numdivs = [1] * maxval  # multiplicative identity
    currmpo = [0] * maxval
    # standard logic for 2 < p < sqrt(maxval)
    for p in range(2, int(math.sqrt(maxval))):
        if numdivs[p] == 1:  # if p is prime
            for exp in range(2, 50):  # assume maxval < 2^50
                pexp = p ** exp
                if pexp > maxval:
                    break
                exppo = exp + 1
                for comp in range(pexp, maxval, pexp):
                    currmpo[comp] = exppo
            for comp in range(p, maxval, p):
                thismpo = currmpo[comp] or 2
                numdivs[comp] *= thismpo
                currmpo[comp] = 0  # reset currmpo array in place
    # abbreviated logic for p > sqrt(maxval)
    for p in range(int(math.sqrt(maxval)), maxval):
        if numdivs[p] == 1:  # if p is prime
            for comp in range(p, maxval, p):
                numdivs[comp] *= 2
    return numdivs

# this initialization times at 7s on my machine
NUMDIV = number_of_divisors()

def product_of_divisors(n):
    if NUMDIV[n] % 2 == 0:
        # each pair of divisors has product equal to n, for example
        # 1*12 * 2*6 * 3*4 = 12**3
        return n ** (NUMDIV[n] / 2)
    else:
        # perfect squares have their square root as an unmatched divisor
        return n ** (NUMDIV[n] / 2) * int(math.sqrt(n))

# this loop times at 13s on my machine
for n in range(500000):
    a = product_of_divisors(n)
On my very slow machine, it takes 7s to compute number_of_divisors for each number, then 13s to compute product_of_divisors for each. Of course it can be sped up by translating it into C. (Someone with a fast machine: how long does it take on yours?)