Euler 3 Python. Putting the prime numbers into a list - python

I'm still pretty new to Python and I'm trying to get all of the prime factors of 600851475143 into a list. However, I keep getting a random assortment of numbers in the list instead of the prime numbers. I'm not really sure where I am going wrong. Thank you for your time.
import math
factors_list = []
prime_factors = []
def number_factors(s):
    s = int(math.sqrt(s))
    for num in range(2, s):
        for i in range(2, num):
            if (num % i) == 0:
                factors_list.append(num)
            else:
                prime_factors.append(num)
number_factors(600851475143)
print factors_list
print prime_factors

Currently you append to prime_factors every time (num % i) != 0. So, for example, if num=12 (not prime) and i=5, you'll do the append to prime_factors.
Instead, you should only append if num has no divisors at all, not just because a single number doesn't divide it evenly.
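A minimal sketch of that idea, assuming you only want the primes below some limit (the names has_no_divisors and primes_below are made up for illustration and are not from the question):

```python
def has_no_divisors(num):
    """Return True only if no i in 2..num-1 divides num."""
    for i in range(2, num):
        if num % i == 0:
            # A single divisor is enough to rule num out.
            return False
    # Only reached after every i has been tried without success.
    return True


def primes_below(limit):
    """Collect every num in 2..limit-1 that has no divisors at all."""
    return [num for num in range(2, limit) if has_no_divisors(num)]


print(primes_below(20))
```

The append decision is made once per num, after the whole inner loop has finished, rather than once per (num, i) pair.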
I'll warn you ahead of time though: this problem is not only about calculating prime numbers, it is also about the fact that 600851475143 is a very large number. So you should probably get your current code working as a learning exercise, but you'll need to rethink your approach for the full solution.

Here's a better algorithm for factoring n. I'll describe it in words, so you can work out the coding yourself.
1) Set f = 2. Variable f represents the current trial factor.
2) If f * f > n, then n must be prime, so output n and stop.
3) Divide n by f. If the remainder is 0, then f is a factor of n,
so output f and set n = n / f, then return to Step 2.
4) Since the remainder in the prior step was not 0, set f = f + 1
and return to Step 2.
For instance, to factor 13195, first set f = 2; the test in Step 2 is not satisfied, the remainder in Step 3 is 1, so in Step 4 set f = 3 and return to Step 2. Now the test in Step 2 is not satisfied, the remainder in Step 3 is 1, so in Step 4 set f = 4 and return to Step 2. Now the test in Step 2 is not satisfied, the remainder in Step 3 is 3, so in Step 4 set f = 5 and return to Step 2.
Now the test in Step 2 is not satisfied, but the remainder in Step 3 is 0, so 5 is a factor of 13195; output 5, set n = 2639, and return to Step 2. Now the test in Step 2 is not satisfied, the remainder in Step 3 is 4, so in Step 4 set f = 6 and return to Step 2. Now the test in Step 2 is not satisfied, the remainder in Step 3 is 5, so in Step 4 set f = 7 and return to Step 2.
Now the test in Step 2 is not satisfied, but the remainder in Step 3 is 0, so 7 is a factor of 2639 (and also of 13195); output 7, set n = 377, and return to Step 2. Now the test in Step 2 is not satisfied, the remainder in Step 3 is 6, so in Step 4 set f = 8 and return to Step 2. Continue in this way until f = 13.
Now the test in Step 2 is not satisfied, but the remainder in Step 3 is 0, so 13 is a factor of 377 (and also of 2639 and 13195); output 13, set n = 29, and return to Step 2. Here the test in Step 2 is satisfied, since 13 * 13 = 169 which is greater than 29, so 29 is prime, output it and halt. The final factorization is 5 * 7 * 13 * 29 = 13195.
The factorization of 600851475143 works in exactly the same way, except that it takes longer. There are better ways to factor integers. But this algorithm is simple, and is sufficient for PE3.
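If you get stuck on the coding, the four steps above could be sketched in Python roughly as follows. The function name trial_division_factors is an assumption for illustration, and the n > 1 guard is an addition so that an input of 1 returns an empty list.

```python
def trial_division_factors(n):
    """Factor n by trial division, following Steps 1-4 above."""
    factors = []
    f = 2                      # Step 1: current trial factor
    while f * f <= n:          # Step 2: stop once f * f > n
        if n % f == 0:         # Step 3: f divides n
            factors.append(f)
            n //= f            # integer division keeps n an int
        else:                  # Step 4: try the next trial factor
            f += 1
    if n > 1:                  # whatever is left over is prime
        factors.append(n)
    return factors


print(trial_division_factors(13195))
print(trial_division_factors(600851475143))
```

Note that f is not reset after a successful division, so repeated factors (such as 2 * 2 * 3 for 12) come out correctly.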

This will run quite slowly for large numbers. Consider the case in which the algorithm reaches num = 1000000. Your nested for loop will perform roughly a million operations before the next number is even considered!
Consider using the Sieve of Eratosthenes to get all of the prime numbers up to a certain integer. It is not as efficient as certain other sieves, but it is easy to implement. Spend some time reading the theory behind the sieve before implementing it; this will help your understanding of later problems.
http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes

Related

Does an algorithm exist that converts a (base 10) number into another number for any base in constant time?

I am solving a problem where I am given three integers (a,b,c), all three can be very large and (a>b>c)
I want to identify which base between b and c produces the smallest sum of digits when we convert 'a' to that base.
For example a = 216, b=2, c=7 -> the output= 6, because: 216 base 2 = 11011000, and the sum of digits = 4, if we do the same for all bases between 2 and 7, we find that 216 base 6 produces the smallest sum of digits, because 216 base 6 = 1000, which has sum 1.
My question is, is there any function out there that can convert a number to any base in constant time faster than the below algorithm? Or any suggestions on how to optimise my algorithm?
from collections import defaultdict
n = int(input())
for _ in range(n):
    (N,X) = map(int,input().split())
    array = list(map(int,input().split()))
    my_dict = defaultdict(int)
    #original count of elements in array
    for i in range(len(array)):
        my_dict[array[i]] +=1
    #ensure array contains distinct elements
    array = set(array)
    count = max(my_dict.values()) #count= max of single value
    temp = count
    res = None
    XOR_count = float("inf")
    if X==0:
        print(count,0)
        break
    for j in array:
        if j^X in my_dict:
            curr = my_dict[j^X] + my_dict[j]
            if curr>=count:
                count = curr
                XOR_count = min(my_dict[j],XOR_count)
    if count ==temp:
        XOR_count = 0
    print(f"{count} {XOR_count}")
Here are some sample input and outputs:
Sample Input
3
3 2
1 2 3
5 100
1 2 3 4 5
4 1
2 2 6 6
Sample Output
2 1
1 0
2 0
Which for the problem I am solving runs into time limit exceeded error.
I found this link to be quite useful (https://www.purplemath.com/modules/logrules5.htm) in terms of converting log bases, which I can kind of see how it relates, but I couldn't use it to get a solution for my above problem.
You could separate the problem in smaller concerns by writing a function that returns the sum of digits in a given base and another one that returns a number expressed in a given base (base 2 to 36 in my example below):
def digitSum(N,b=10):
    return N if N<b else N%b+digitSum(N//b,b)
digits = "0123456789abcdefghijklmnopqrstuvwxyz"
def asBase(N,b):
    return "" if N==0 else asBase(N//b,b)+digits[N%b]
def lowestBase(N,a,b):
    return asBase(N, min(range(a,b+1),key=lambda c:digitSum(N,c)) )
output:
print(lowestBase(216,2,7))
1000 # base 6
print(lowestBase(216,2,5))
11011000 # base 2
Note that both digitSum and asBase could be written as iterative instead of recursive if you're manipulating numbers that are greater than base^1000 and don't want to deal with recursion depth limits
Here's a procedural version of digitSum (to avoid recursion limits):
def digitSum(N,b=10):
    result = 0
    while N:
        result += N%b
        N //=b
    return result
and returning only the base (not the encoded number):
def lowestBase(N,a,b):
    return min(range(a,b+1),key=lambda c:digitSum(N,c))
# in which case you don't need the asBase() function at all.
With those changes results for a range of bases from 2 to 1000 are returned in less than 60 milliseconds:
lowestBase(10**250+1,2,1000) --> 10 in 57 ms
lowestBase(10**1000-1,2,1000) --> 3 in 47 ms
I don't know how large is "very large" but it is still sub-second for millions of bases (yet for a relatively smaller number):
lowestBase(10**10-1,2,1000000) --> 99999 in 0.47 second
lowestBase(10**25-7,2,1000000) --> 2 in 0.85 second
[EDIT] optimization
By providing a maximum sum to the digitSum() function, you can make it stop counting as soon as it goes beyond that maximum. This will allow the lowestBase() function to obtain potential improvements more efficiently based on its current best (minimal sum so far). Going through the bases backwards also gives a better chance of hitting small digit sums faster (thus leveraging the maxSum parameter of digitSum()):
def digitSum(N,b=10,maxSum=None):
    result = 0
    while N:
        result += N%b
        if maxSum and result>=maxSum:break
        N //= b
    return result
def lowestBase(N,a,b):
    minBase = a
    minSum = digitSum(N,a)
    for base in range(b,a,-1):
        if N%base >= minSum: continue # last digit already too large
        baseSum = digitSum(N,base,minSum)
        if baseSum < minSum:
            minBase,minSum = base,baseSum
            if minSum == 1: break
    return minBase
This should yield a significant performance improvement in most cases.
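One way to gain some confidence in the early-exit version is to compare it against a plain scan of every base. The sketch below restates the pruned search with snake_case names (digit_sum, digit_sum_capped, lowest_base and check are all illustrative names, not part of the answer) and checks that the base it returns really has the minimal digit sum. Because ties can be resolved differently, it compares digit sums rather than the bases themselves.

```python
def digit_sum(n, base):
    """Plain digit sum of n written in the given base."""
    total = 0
    while n:
        total += n % base
        n //= base
    return total


def digit_sum_capped(n, base, cap):
    """Digit sum of n in base, giving up once the running total reaches cap."""
    total = 0
    while n:
        total += n % base
        if total >= cap:
            break
        n //= base
    return total


def lowest_base(n, lo, hi):
    """Pruned search: scan bases from hi down to lo + 1, starting from lo's sum."""
    best_base, best_sum = lo, digit_sum(n, lo)
    for base in range(hi, lo, -1):
        if n % base >= best_sum:
            continue  # last digit alone is already too large
        s = digit_sum_capped(n, base, best_sum)
        if s < best_sum:
            best_base, best_sum = base, s
            if best_sum == 1:
                break  # a nonzero number cannot do better than 1
    return best_base


def check(n, lo, hi):
    """True if the pruned search finds a base with the true minimum digit sum."""
    expected = min(digit_sum(n, b) for b in range(lo, hi + 1))
    return digit_sum(n, lowest_base(n, lo, hi)) == expected


print(lowest_base(216, 2, 7))
print(all(check(n, 2, 20) for n in range(1, 500)))
```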

Finding max of ANDing between two numbers in Python

I am a beginner to Python coding. I have two numbers A and B from user.
My problem is to find the max(P AND Q) where A <= P < Q <= B
I have two solutions right now for this.
Solution 1: AND every combination. This solution works when the number of combinations is small. For larger values, it fails with a memory error.
import itertools
given = raw_input()
n= list(map(int,given.split()))
A = n[0]
B = n[1]
# only values in [A, B] are allowed
newlist = range(A, B+1)
# print newlist
# Finding all combinations
comb = list(itertools.combinations(newlist,2))
# print comb
# ANDing
l = []
for i in comb:
    x = i[0] & i[1]
    l.append(x)
# print l
print max(l)
Solution 2: After observing many input-outputs, when B == Odd, max(value) = B-1 and for B == Even, max(value) = B-2.
given = raw_input()
n= list(map(int,given.split()))
A = n[0]
B = n[1]
if B % 2 == 0:
    print (B - 2)
else:
    print (B -1)
According to the problem statement I am not using any ANDing for Solution 2. Still I am getting correct output.
But I am looking for much easier and Pythonic logic. Is there any other way/logic to solve this?
Your second solution is the optimal solution. But why? First, consider that a bitwise AND is performed on the binary representations of its operands, and it can only produce a number less than or equal to the smaller operand. For instance, 9 is represented as 1001, and there is no number that 9 can be ANDed with that produces a number higher than 9. Indeed, the only possible results of ANDing another number with 9 are 9, 8, 1 and 0. Put another way, the biggest result from ANDing 9 with a number smaller than 9 is 9 less its least significant bit (so 8). If you're not sure of the binary representation of a number, you can always use the bin function, e.g. bin(9) => '0b1001'.
Let's start with odd numbers (as they're the easiest). Odd numbers are easy because they always have a bit in the unit position. So the maximum possible number that we can get is B less that bit in the unit position (so B - 1 is the maximum). For instance, 9 is represented as 1001. Get rid of the unit bit and we have 1000 or 8. 9 & 8 == 8, so the maximum result is 8.
Now let's try something similar with evens. For instance, 14 is represented as 1110. The maximum number we can get from ANDing 14 with another number would be 1100 (or 12). As with odds, we must always lose one bit, and the smallest possible bit that can be lost is the bit in the 2s position. Here, we're fortunate, as 14 already has a bit in the 2s position. But what about numbers that don't? Let's try 12 (represented as 1100). If we lost the smallest bit from 12, we would have 1000 or 8. However, this is not the maximum possible. And we can easily prove this, because the maximum for 11 is 10 (since we have shown that the maximum for an odd number is the odd number less 1).
We have already shown that the biggest number that can be produced from ANDing two different numbers is the bigger number less its least significant bit. So if that bit has a value of 2 (as in the case of 14), then we can just lose that bit. If that bit has a value higher than 2 (as in the case of 12), then we know the maximum is the maximum for the biggest odd number less than B (which is 1 less than that odd number and 2 less than B).
So there we have it. The maximum for an odd number is the number less 1. And the maximum for an even number is the number less 2.
def and_max(A, B): # note that A is unused
    if B & 1: # has a bit in the 1 position (odd)
        P, Q = B - 1, B
    else:
        P, Q = B - 2, B - 1
    # print("P = ", P, "Q = ", Q)
    return P & Q # essentially, return P
Note that none of this covers negative numbers. This is because most representations of negative numbers use two's complement. What this means is that every negative number is represented as a constant negative number plus a positive number. For instance, using a 4-bit representation of integers, the maximum possible number would be 0111 (or 7, 4 + 2 + 1). Negative numbers would be represented as -8 plus some positive number. This negative part is indicated by a leading bit. Thus -8 is 1000 (-8 + 0) and -1 is 1111 (-8 + 7). And that's the important part. As soon as you have -1, you have an all-1s bitmask, which is guaranteed to lose the negative part when ANDed with a positive number. So max(P and Q) where A <= P < Q <= B and A < 0 is always B. Where B < 0, we can no longer lose the negative bit and so must maximise the positive bits again.
I think this should work:
given = raw_input()
a, b = tuple(map(int,given.split()))
print(max([p & q for q in range(a,b+1) for p in range(a,q)]))
long a,b,c,ans;
for(int i=0;i<n;i++){
    a=s.nextLong();
    b=s.nextLong();
    if(b%2==0)
        ans=b-2;
    else
        ans=b-1;
    if(ans>=a)
        System.out.println(ans);
    else
        System.out.println(a&b);
}
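The Java snippet above adds a range check that the earlier closed-form answer leaves out: when B is even and A == B - 1, the pair (B - 2, B - 1) is not available, so it falls back to A & B. A Python sketch of the same logic, together with a brute-force comparison, might look like this. It assumes 0 <= A < B, and the names and_max_checked and and_max_brute are made up for illustration.

```python
def and_max_checked(a, b):
    """Closed-form answer with the range check, assuming 0 <= a < b."""
    candidate = b - 2 if b % 2 == 0 else b - 1
    # The candidate is only reachable if it is still inside [a, b].
    return candidate if candidate >= a else a & b


def and_max_brute(a, b):
    """Reference answer: try every pair a <= p < q <= b."""
    return max(p & q for q in range(a, b + 1) for p in range(a, q))


# Compare both versions on every small non-negative range.
mismatches = [
    (a, b)
    for b in range(1, 41)
    for a in range(0, b)
    if and_max_checked(a, b) != and_max_brute(a, b)
]
print(mismatches)
```

The brute-force version is only practical for small ranges, which is why it is used here purely as a cross-check.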

Wieferich prime numbers

I need help with an assignment I'm working on. The task is to write a program to find all Wieferich prime numbers between two given values. The condition that determines whether a number is a Wieferich prime is this:
a Wieferich prime number p is such that p^2 divides 2^(p − 1) − 1
This is what I have so far:
start=int(input("enter start value"))
end=int(input("enter end value"))
for c in range(start,end):
    if c%2!=0:
        primedet=(c**2)/((2**(c-1))-1)
        if primedet%1==0:
            print(c," is a Wiefrich Prime")
Every time I run it, it just prints all the odd numbers between the given values. I know that there are only two known Wieferich prime numbers: 1093 and 3511. I'm really just not sure how to make this work. Any guidance would be appreciated.
Modular arithmetic makes this an easier task. You want 2^(p-1) - 1 to be divisible by p^2, that is, 2^(p-1) - 1 = 0 (mod p^2). Rearranging this gives 2^(p-1) = 1 (mod p^2), which in Python is
(2**(p-1)) % (p**2) == 1
but that is inefficient, because it first computes 2^(p-1) in full and only then takes the modulus. Don't worry, though: Python has an efficient way of doing modular exponentiation with the three-argument form of pow
pow(2,p-1,p**2) == 1
Finally, p also needs to be prime, so once you implement a primality test you are ready to go
def isPrime(n:int) -> bool:
    return True #put here the code for primality check
def find_Wieferich_prime_in(start,end) -> [int]:
    result = list()
    for p in range(start,end):
        if isPrime(p) and pow(2,p-1,p**2)==1:
            result.append(p)
    return result
# start at 2: with the placeholder isPrime, p = 0 would reach pow(2,-1,0) and raise
print(find_Wieferich_prime_in(2,4000))
and that is everything that you need to find the Wieferich prime
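For completeness, here is one way the placeholder could be filled in. This is only a sketch: it uses plain trial division for the primality check (any correct test would do), and the names is_prime and wieferich_primes are chosen for illustration.

```python
def is_prime(n):
    """Trial-division primality check (simple, not the fastest option)."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True


def wieferich_primes(start, end):
    """Primes p in [start, end) with 2^(p-1) = 1 (mod p^2)."""
    # is_prime(p) is checked first, so pow() is never called with p = 0.
    return [p for p in range(start, end)
            if is_prime(p) and pow(2, p - 1, p * p) == 1]


print(wieferich_primes(0, 4000))
```

For the range 0 to 4000 this prints [1093, 3511], the two known Wieferich primes.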
Your other mistake is here
primedet=(c**2)/((2**(c-1))-1)
2^(c-1) - 1 is bigger than c^2 for sufficiently large c, so the quotient c^2/(2^(c-1) - 1) is less than 1.
Furthermore, in
primedet%1
primedet is a float, and float % 1 gives you the fractional part of that number. Mix in rounding issues and you will get too many zeros.
More importantly, what you are testing there is not the definition of a Wieferich prime: the division is the wrong way round.
This is very simple. Based on your statement, the numbers have the property of being prime and Wieferich just by means of the equation you gave, so when (2^(p-1) - 1) % p^2 == 0 is True you have found a number. As explained by @Copperfield, this can be written as 2^(p-1) % p^2 == 1. Then you can do (with the help of pow, which is faster):
# I assume we have `start` and `end` given by user. Now we can safely
# start from the first odd number greater or equal to start so we can
# stride by 2 in the `range` call which will halve our iteration
start = start + 1 if start % 2 == 0 else start
# next I'm using filter because it's faster than the normal `for` loop
# and gives us exactly what we need, that is the list of numbers
# that pass the equation test. Note I've also included the `end`
# number. If I were to write `range(start, end, 2)` we wouldn't
# test for `end`
result = list(filter(lambda n: pow(2, n - 1, n*n) == 1, range(start, end + 1, 2)))

Big O notation in python

Does anyone know of any good resources to learn big o notation? In particular learning how to walk through some code and being able to see that it would be O(N^2) or O(logN)? Preferably something that can tell me why a code like this is equal to O(N log N)
def complex(numbers):
    N = len(numbers)
    result = 0
    for i in range(N):
        j = 1
        while j < N:
            result += numbers[i]*numbers[j]
            j = j*2
    return result
Thanks!
To start, let me define what O(N log N) is. It means that the number of operations the program performs grows no faster than a constant multiple of N log N, i.e. it has an upper bound of ~N log N (where N is the size of the input).
Now here, your N is the size of numbers, or in your code:
N = len(numbers)
Notice that the first for loop runs from 0 to N-1, for a total of N iterations. This is where the first N comes from.
Then, where does the log N come from? It is from the while loop.
In the while loop, you keep multiplying j by 2 until j is greater than or equal to N.
This is completed after the loop has executed ~log2(N) times, which is how many times we have to multiply j by 2 to get to N. For example, log2(8) = 3, because we multiply j by 2 three times to get 8:
# of mult.   j    j <- old j * 2
1            2    2 <- 1 * 2
2            4    4 <- 2 * 2
3            8    8 <- 4 * 2
To better illustrate this, I have added a print statement in your code, for i and j:
def complex(numbers):
    N = len(numbers)
    result = 0
    for i in range(N):
        j = 1
        while j < N:
            print(str(i) + " " + str(j))
            result += numbers[i]*numbers[j]
            j = j*2
    return result
When this is run:
>>> complex([2,3,5,1,5,3,7,3])
This is what is outputted:
0 1
0 2
0 4
1 1
1 2
1 4
2 1
2 2
2 4
3 1
3 2
3 4
4 1
4 2
4 4
5 1
5 2
5 4
6 1
6 2
6 4
7 1
7 2
7 4
Notice how our i goes from 0...7 (N times for a total of O(N) ), and the second part, there are always 3 ( log2(N) ) j-outputs for every i.
So, the code is O(N log2 N).
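If you would rather count than read printed output, a small sketch like the one below tallies the inner-loop iterations and compares them with N times the number of doublings. The helper names count_inner_steps and doublings_needed are illustrative assumptions; (n - 1).bit_length() is used as an exact integer way of computing the ceiling of log2(n).

```python
def count_inner_steps(n):
    """Count how many times the body of the while loop runs for input size n."""
    steps = 0
    for i in range(n):
        j = 1
        while j < n:
            steps += 1
            j *= 2
    return steps


def doublings_needed(n):
    """Number of doublings of j = 1 needed to reach at least n, i.e. ceil(log2(n))."""
    return (n - 1).bit_length() if n > 0 else 0


for n in (8, 16, 1000):
    print(n, count_inner_steps(n), n * doublings_needed(n))
```

The two counts agree for every N, which is the N * log2(N) growth described above.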
Also, some good websites I would recommend are:
https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
And, a video from a lecture series from a Stanford professor:
https://www.youtube.com/watch?v=eNsKNfFUqFo
When you multiply j by 2, you're effectively saying "I've done half the remaining problem!". At each step in the while loop, you're solving half the remaining problem. Therefore if your problem is x size, then the number of iterations required would be i = log_2 x, which we just say is log x. In this case your x is just equal to N.
The for loop has you do the above section N times again, so you get N * log N.
We use O(N log N) to mean that, at each step, we might do any CONSTANT number of things (for example, inside the while loop I might do a billion operations), but we don't care about this constant, because N can get arbitrarily big (beyond a certain size, even a billion is nothing in comparison to what N COULD be, e.g. a googol). Hence we have O(N log N).
Here's a short crash course in the form of a pdf:
http://www1.icsi.berkeley.edu/~barath/cs61b-summer2002/lectures/lecture10.pdf
Here's a short crash course in the form of a lecture:
https://www.youtube.com/watch?v=VIS4YDpuP98

Does this prime function actually work?

Since I'm starting to get the hang of Python, I'm starting to test my newly acquired Python skills on some problems on projecteuler.net.
Anyways, at some point, I ended up making a function for getting a list of all primes up until a number 'n'.
Here's how the function looks atm:
def primes(n):
    """Returns list of all the primes up until the number n."""
    # Gather all potential primes in a list.
    # (list(...) so that pop() also works on Python 3, where range is not a list)
    primes = list(range(2, n + 1))
    # The first potential prime in the list should be two.
    assert primes[0] == 2
    # The last potential prime in the list should be n.
    assert primes[-1] == n
    # 'p' will be the index of the current confirmed prime.
    p = 0
    # As long as 'p' is within the bounds of the list:
    while p < len(primes):
        # Set the candidate index 'c' to start right after 'p'.
        c = p + 1
        # As long as 'c' is within the bounds of the list:
        while c < len(primes):
            # Check if the candidate is divisible by the prime.
            if(primes[c] % primes[p] == 0):
                # If it is, it isn't a prime, and should be removed.
                primes.pop(c)
            # Move on to the next candidate and redo the process.
            c = c + 1
        # The next integer in the list should now be a prime,
        # since it is not divisible by any of the primes before it.
        # Thus we can move on to the next prime and redo the process.
        p = p + 1
    # The list should now only contain primes, and can thus be returned.
    return primes
It seems to work fine, although there's one thing that bothers me.
While commenting the code, this piece suddenly seemed off:
# Check if the candidate is divisible by the prime.
if(primes[c] % primes[p] == 0):
    # If it is, it isn't a prime, and should be removed from the list.
    primes.pop(c)
# Move on to the next candidate and redo the process.
c += 1
If the candidate IS NOT divisible by the prime we examine the next candidate located at 'c + 1'. No problem with that.
However, if the candidate IS divisible by the prime, we first pop it and then examine the next candidate located at 'c + 1'.
It struck me that the next candidate, after popping, is not located at 'c + 1', but 'c', since after popping at 'c', the next candidate "falls" into that index.
I then thought that the block should look like the following:
# If the candidate is divisible by the prime:
if(primes[c] % primes[p] == 0):
    # If it is, it isn't a prime, and should be removed from the list.
    primes.pop(c)
# If not:
else:
    # Move on to the next candidate.
    c += 1
This above block seems more correct to me, but leaves me wondering why the original piece apparently worked just fine.
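Dropped into the full function, the corrected block would give something like the sketch below. The name primes_safe and the variable name candidates are chosen here for illustration; the logic is otherwise the same as in the question.

```python
def primes_safe(n):
    """Return all primes up to n, re-checking index c after every pop."""
    candidates = list(range(2, n + 1))
    p = 0
    while p < len(candidates):
        c = p + 1
        while c < len(candidates):
            if candidates[c] % candidates[p] == 0:
                # After the pop, the next candidate slides into index c,
                # so c is deliberately not advanced here.
                candidates.pop(c)
            else:
                c += 1
        p += 1
    return candidates


print(primes_safe(30))
```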
So, here are my questions:
After popping a candidate which turned out not to be a prime, can we assume, as my original code does, that the next candidate is NOT divisible by that same prime?
If so, why is that?
Would the suggested "safe" code just do unnecessary checks on the candidates which were skipped in the "unsafe" code?
PS:
I've tried writing the above assumption as an assertion into the 'unsafe' function, and test it with n = 100000. No problems occurred. Here's the modified block:
# If the candidate is divisible by the prime:
if(primes[c] % primes[p] == 0):
    # If it is, it isn't a prime, and should be removed.
    primes.pop(c)
    # If c is still within the bounds of the list:
    if c < len(primes):
        # We assume that the new candidate at 'c' is not divisible by the prime.
        assert primes[c] % primes[p] != 0
# Move on to the next candidate and redo the process.
c = c + 1
It fails for much bigger numbers. The first prime for which a candidate can fail is 71. The smallest failing candidate for 71 is 10986448536829734695346889, which hides the number 10986448536829734695346889 + 142 from the divisibility check.
def primes(n, skip_range=None):
    """Modified "primes" with the original assertion from P.S. of the question.
    with skipping of an unimportant huge range.
    >>> primes(71)
    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
    >>> # The smallest failing number for the first failing prime 71:
    >>> big_n = 10986448536829734695346889
    >>> primes(big_n + 2 * 71, (72, big_n))
    Traceback (most recent call last):
    AssertionError
    """
    if not skip_range:
        primes = list(range(2, n + 1))
    else:
        primes = list(range(2, skip_range[0]))
        primes.extend(range(skip_range[1], n + 1))
    p = 0
    while p < len(primes):
        c = p + 1
        while c < len(primes):
            if(primes[c] % primes[p] == 0):
                primes.pop(c)
                if c < len(primes):
                    assert primes[c] % primes[p] != 0
            c = c + 1
        p = p + 1
    return primes
# Verify that it can fail.
aprime = 71 # the first problematic prime
FIRST_BAD_NUMBERS = (
    10986448536829734695346889, 11078434793489708690791399,
    12367063025234804812185529, 20329913969650068499781719,
    30697401499184410328653969, 35961932865481861481238649,
    40008133490686471804514089, 41414505712084173826517629,
    49440212368558553144898949, 52201441345368693378576229)
for bad_number in FIRST_BAD_NUMBERS:
    try:
        primes(bad_number + 2 * aprime, (aprime + 1, bad_number))
        raise Exception('The number {} should fail'.format(bad_number))
    except AssertionError:
        print('{} OK. It fails as is expected'.format(bad_number))
I found these numbers with a complicated, puzzle-like algorithm that searches the possible remainders of n modulo small primes. The last simple step was to reconstruct the complete n (by the Chinese remainder theorem, in three lines of Python code). I know all 120 basic solutions smaller than primorial(71) = 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 * 29 * 31 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71; they repeat periodically at all multiples of this number. I rewrote the algorithm many times, once for every decade of tested primes, because each decade was much slower to solve than the previous one. Maybe I will find a smaller solution with the same algorithm for the primes 73 or 79 in acceptable time.
Edit:
I would also like to find a completely silent failure of the unsafe original function. Maybe some candidate composed of different primes exists. That kind of solution would only postpone the final outcome until later, and every step would be much more expensive in time and resources. Therefore only numbers composed of one or two primes are attractive.
I expect that only these forms of the hidden candidate c are good: c = p ** n or c = p1 * p ** n or c = p1 ** n1 * p ** n, where p and p1 are primes and n is a power greater than 1. The primes function fails if c - 2 * p is divisible by no prime smaller than p and if all numbers between c-2n and c are divisible by some prime smaller than p. The variant p1*p**n also requires that the same c had failed before for p1 (p1 < p), as we already know an infinite number of such candidates.
EDIT: I found a smaller example of failure: the number 121093190175715194562061 for the prime 79 (which is about ninety times smaller than the one for 71). I can't continue with the same algorithm to find smaller examples, because all 702612 basic solutions took more than 30 hours for the prime 79 on my laptop.
I also verified, for all candidates smaller than 400000000 (4E10) and for all relevant primes, that no candidate fails the assertion in the question. Unless you have terabytes of memory and thousands of years of time, the assertion in the algorithm will pass, because your time complexity is O((n / log(n)) ^2) or something very similar.
Your observation seems to be accurate, which is quite a good catch.
I suspect the reason that it works, at least in some cases, is because composite numbers are actually factored into multiple primes. So, the inner loop may miss the value on the first factor, but it then picks it up on a later factor.
For a small'ish "n", you can print out values of the list to see if this is what is happening.
This method of finding primes, by the way, is based on the Sieve of Eratosthenes. It is possible when doing the sieve that if "c" is a multiple of "p", then the next value is never a multiple of the same prime.
The question is: are there any cases where all values between p*x and p*(x+1) are divisible by some prime less than p? (This is where the algorithm would miss a value and it would not be caught later.) However, one of these values is even, so it would be eliminated on round "2". So, the real question is whether there are cases where all values between p*x and p*(x+2) are divisible by numbers less than p.
Offhand, I can't think of any numbers less than 100 that meet this condition. For p = 5, there is always a value that is not divisible by 2 or 3 between two consecutive multiples of 5.
There seems to be a lot written on prime gaps and sequences, but not so much on sequences of consecutive integers divisible by numbers less than p. After some (okay, a lot of) trial and error, I've determined that every number between 39,474 (17*2,322) and 39,491 (17*2,323) is divisible by an integer less than 17:
39,475 5
39,476 2
39,477 3
39,478 2
39,479 11
39,480 2
39,481 13
39,482 2
39,483 3
39,484 2
39,485 5
39,486 2
39,487 7
39,488 2
39,489 3
39,490 2
I am not sure if this is the first such value. However, we would have to find sequences twice as long as this. I think that is unlikely, but not sure if there is a proof.
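The table above is easy to re-check mechanically. The sketch below (with an illustrative helper named least_prime_factor) computes the least prime factor of every number strictly between 17*2,322 and 17*2,323 and confirms that each one is smaller than 17.

```python
def least_prime_factor(n):
    """Smallest prime dividing n (n itself if n is prime), for n >= 2."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


p = 17
lo, hi = p * 2322, p * 2323          # 39,474 and 39,491

# Least prime factor of every number strictly between the two multiples.
witnesses = {n: least_prime_factor(n) for n in range(lo + 1, hi)}
all_small = all(f < p for f in witnesses.values())

for n, f in witnesses.items():
    print(n, f)
print(all_small)
```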
My conclusion is that the original code might work, but that your fix is the right thing to do. Without a proof that there are no such sequences, it looks like a bug, albeit a bug that could be very, very, very rare.
Given two numbers n, m in the consecutive sequence of possible primes such that n and m are not divisible by the last divisor p, then m - n < p
Given q (the next higher divisor) > p, then if n is divisible by q, then the next number divisible by q is n + q > n + p > m
so m should be skipped in the current iteration for divisibility test
Here n = primes[c]
m = primes[c + 1], i.e. primes[c] after primes.pop(c)
p = primes[p]
q = primes[p+1]
This program does not work correctly, i.e., it incorrectly reports a composite number as prime. It turns out to have the same bug as a program by Wirth. The details may be found in Paul Pritchard, Some negative results concerning prime number generators, Communications of the ACM, Vol. 27, no. 1, Jan. 1984, pp. 53–57. This paper gives a proof that the program must fail, and also exhibits an explicit composite which it reports as prime.
This doesn't provide a remotely conclusive answer, but here's what I've tried on this:
I've restated the required assumption here as (lpf stands for Least Prime Factor):
For any composite number, x, where:
lpf(x) = n
There exists a value, m, where 0 < m < 2n and:
lpf(x+m) > n
It can be easily demonstrated that values for x exist where no composite number (x+m), exists to satisfy the inequality. Any squared prime demonstrates that:
lpf(x) = x^.5, so x = n^2
n^2 + 2n < (n + 1)^2 = n^2 + 2n + 1
So, in the case of any squared prime, for this to hold true, there must be a prime number, p, present in the range x < p < x + 2n.
I think that can be concluded given the asymptotic distribution of squares (x^.5) compared to the Prime Number Theorem (asymptotic distribution of primes approx. x/(ln x)), though, really, my understanding of the Prime Number Theorem is limited at best.
And I have no strategy whatsoever for extending that conclusion to non-square composite numbers, so that may not be a useful avenue.
I've put together a program testing values using the above restatement of the problem.
Testing this statement directly should remove any got-lucky results from just running the algorithm as stated. By got-lucky results, I'm referring to a value being skipped that may not be safe, but that doesn't turn up any incorrect results, due to a skipped value not being divisible by the number currently being iterated on, or being picked up by subsequent iterations. Essentially, if the algorithm gets the correct result, but either doesn't find the LEAST prime factor of each eliminated value, or doesn't rigorously check each prime result, I'm not satisfied with it. If such cases exist, I think it's reasonable to assume that cases also exist where it would not get lucky (unusual though they may be), and would render an incorrect result.
Running my test, however, shows no counter-examples in the values from 2 - 2,000,000. So, for what it's worth, values from the algorithm as stated should be safe up to, at least, 2,000,000, unless my logic is incorrect.
That's what I have to add. Great question, Phazyck, had fun with it!
Here is an idea:
Triptych explained [1] that the next number after c cannot be c + p, but we still need to show that it can also never be c + 2p.
If we use primes = [2], we can only have one consecutive "non-prime", a number divisible by 2.
If we use primes = [2,3] we can construct 3 consecutive "non-primes", a number divided by 2, a number divided by three, and a number divided by 2, and they cannot get the next number. Or
2,3,4 => 3 consecutive "non-primes"
Even though 2 and 3 are not "non-primes" it is easier for me to think in terms of those numbers.
If we use [2,3,5], we get
2,3,4,5,6 => 5 consecutive "non-primes"
If we use [2,3,5,7], we get
2,3,4,5,6,7,8,9,10 => 9 consecutive "non-primes"
The pattern emerges. The most consecutive non-primes that we can get is next prime - 2.
Therefore, if next_prime < p * 2 + 1, we have to have at least some number between c and c + 2p, because number of consecutive non-primes is not long enough, given the primes yet.
I don't know about very very big number, but I think this next_prime < p * 2 + 1 is likely to hold very big numbers.
I hope this makes sense, and adds some light.
[1] Triptych's answer has been deleted.
If prime p divides candidate c, then the next larger candidate that is divisible by p is c + p. Therefore, your original code is correct.
However, it's a rotten way to produce a list of primes; try it with n = 1000000 and see how slow it gets. The problem is that you are performing trial division when you should be using a sieve. Here's a simple sieve (pseudocode, I'll let you do the translation to Python or another language):
function primes(n)
    sieve := makeArray(2..n, True)
    for p from 2 to n step 1
        if sieve[p]
            output p
            for i from p+p to n step p
                sieve[i] := False
That should get the primes less than a million in less than a second. And there are other sieve algorithms that are even faster.
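One possible Python translation of the pseudocode above is sketched here. The function name primes_up_to is an assumption, and the list is indexed from 0 rather than from 2 so that sieve[p] lines up directly with p.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: return every prime p with 2 <= p <= n."""
    if n < 2:
        return []
    # sieve[i] stays True while i is still a prime candidate.
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    result = []
    for p in range(2, n + 1):
        if sieve[p]:
            result.append(p)
            # Cross off every multiple of p, starting at p + p.
            for i in range(p + p, n + 1, p):
                sieve[i] = False
    return result


print(primes_up_to(30))
```

Unlike trial division, no remainder is ever computed: composites are removed by stepping through multiples, which is what makes the sieve fast.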
This algorithm is called the Sieve of Eratosthenes, and was invented about 2200 years ago by a Greek mathematician. Eratosthenes was an interesting fellow: besides sieving for primes, he invented the leap day and a system of latitude and longitude, accurately calculated the distance from Sun to Earth and the circumference of the Earth, and was for a time the Chief Librarian of Ptolemy's Library in Alexandria.
When you are ready to learn more about programming with prime numbers, I modestly recommend this essay at my blog.
