How does this largest prime factor finding algorithm work? - python

I saw this YouTube video online where this guy finds the largest prime factor of a number using a seemingly simple approach, but I don't quite understand the math of it. Here's the link https://m.youtube.com/watch?v=5kv9q7qgvlI
The code -
n = 1234  # Some number whose largest prime factor I want to find
i = 2     # It seems that he tries to start from the smallest prime number
while i**2 < n:  # I don't understand this part where he checks if the square of variable is less than the target number
    while n % i == 0:  # I know this checks if n is divisible by i
        n /= i
    i += 1  # increments i's value
print(n)
I know that this code works, but why? I get the last two lines, but why is it necessary to check if the square of variable i is less than n?

If your number N is divisible by I (in other terms, I being a factor of N), and I is greater than sqrt(N), then there must be another factor of N, call it J = N/I, being smaller than sqrt(N).
If you exhaustively searched all potential factors up to I, you must have already found J, so you covered that factorization earlier.
We are looking for the largest prime factor, so the question remains whether the final N when terminating the loop is a prime number.
Whenever we encountered a factor of N, we divided N by this factor, so when considering a potential factor I, we can be sure that the current N won't have any factors below I any more.
Can the final N when terminating the loop be composed from more than one factor (i.e. not being prime)?
No.
If N isn't prime, it must be composed from K and L (N = K * L), and K and L must be both bigger than sqrt(N), otherwise we would have divided by K or L in an earlier step. But multiplying two numbers, each bigger than sqrt(N), will always be bigger than N, violating N = K * L.
So, assuming that the final N wasn't prime, runs into a contradiction, and thus we can be sure that the final N is a factor of the original N, and that this factor is indeed a prime number.
Caveats with the original code (thanks JohanC):
The original code checks for I being strictly less than SQRT(N) (while i**2<n). That will miss cases like N=9 where its square root 3 must be included in the iteration. So, this should better read while i**2<=n.
And the code risks some inaccuracies:
Using floating-point division (/= operator instead of //=) might give inexact results. This applies to Python, while in languages like Java, /= would be fine.
In Python, raising an integer to the power of 2 (while i**2<=n) seems to guarantee exact integer arithmetic, so that's ok in the Python context. In languages like Java, I'd recommend not to use the pow() function, as that typically uses floating-point arithmetic with the risk of inexact results, but to simply write i*i.
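Putting both caveats together, here is one way the corrected loop might look. This is my sketch, not the video's code: it divides once per outer iteration so the `i*i <= n` test is re-checked after every division, which also keeps perfect prime powers such as 9 or 4 from collapsing to 1.

```python
def largest_prime_factor(n):
    """Largest prime factor by trial division; assumes n >= 2."""
    i = 2
    while i * i <= n:    # <= so that e.g. n = 9 still tests i = 3
        if n % i == 0:
            n //= i      # floor division keeps n an exact integer
        else:
            i += 1
    return n             # whatever survives the loop is prime
```

For example, `largest_prime_factor(1234)` returns 617 and `largest_prime_factor(9)` returns 3.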


Why is the Python Code running so long and is there any way to get the output quickly?

I have a Python Code to get the largest prime factor of a Number and the below is my code
For inputs of up to about 8 digits it takes a few minutes, but when I tried running the code for the 12-digit number 600851475143 it ran much longer and still didn't give any output or any error.
So is there any way to get the output for 12-digit numbers quickly?
def large_prime_fact(num):
    prime_factors = []
    if num == 2 or num == 3:
        return(prime_factors.append(num))
    if num % 2 == 0:
        prime_factors.append(2)
    for i in range(3, num, 2):
        cnt = 0
        if num % i == 0:
            for j in range(2, i):
                if i % j == 0:
                    cnt += 1
            if cnt == 0 and i != 2:
                prime_factors.append(i)
    return prime_factors

if __name__ == '__main__':
    number = int(input('Enter the number:'))
    print('The Largest prime factor for', number, 'is :', max(large_prime_fact(number)))
As Karl Knechtel pointed out, your approach is seriously flawed. Here on SO there are lots of questions about prime numbers and factorization, which you may want to read.
That said, here's my solution. It may still benefit from further optimizations, but it solves your 600851475143 example in about 1ms.
## define a prime numbers generator
from math import sqrt
from itertools import takewhile

def prime_gen():
    primes = []
    n = 2
    while n < 2**64:
        if primes == []:
            primes.append(n)
            yield n
        elif len(primes) == 1:
            n = 3
            primes.append(n)
            yield n
        else:
            n += 2
            limit = int(sqrt(n)) + 1
            for p in takewhile(lambda x: x < limit, primes):
                if n % p == 0:
                    break
            else:
                primes.append(n)
                yield n

## factorize and return largest factor
def largest_prime(n):
    target = n
    for x in prime_gen():
        while target % x == 0:
            target //= x
        if target == 1:
            return x
You have two algorithms here:
An algorithm for finding all factors of num, which contains
An algorithm for checking the primality of each factor you find
Both algorithms are implemented incredibly inefficiently. The primality tester knows, from the moment it finds a single smaller value which divides it, that it's not prime, but you continue counting all the factors of that number, and use the count solely to check if it's zero or non-zero. Since the vast majority of composite numbers will have a smallish factor, you could just break your checking loop as soon as you find even one, knowing it's composite immediately and avoiding huge amounts of work.
Even better, you could pre-compute the prime numbers up to the square root of the target number (note: Only need to go up to the square root because any factor larger than the square root definitionally has a factor below the square root which you could use to find it without exhaustive search). Efficiently computing all prime numbers in a fixed range (especially such a tiny range as the square root of your target number, which is under 1M) can be done much more efficiently than repeated trial division (search information on the Sieve of Eratosthenes for a decent algorithm for smallish primes).
Replacing slow per factor primality tests from #2 with a cheap precompute of all possible prime factors below the square root (allowing you to cheaply determine any prime factors above the square root) should solve the problem. There are additional optimizations available, but that should get 99% of it.
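As a sketch of that approach (the function names here are mine, not from the original post): sieve the primes up to sqrt(num), divide them out, and whatever remains above 1 is the single prime factor larger than the square root.

```python
def sieve_primes(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]

def largest_prime_factor(num):
    """Divide out primes up to sqrt(num); any leftover > 1 is itself prime."""
    best = 1
    for p in sieve_primes(int(num ** 0.5) + 1):
        while num % p == 0:
            best = p
            num //= p
    return num if num > 1 else best
```

On the 600851475143 example the sieve only needs to reach about 775147, which is quick.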

big O that is less than N?

Being completely self (and StackOverflow) taught, I'm new to Big-O notation and need to understand it better for some upcoming interviews.
My question is how do you annotate in Big-O when the complexity is less than N? Example is a prime calculator that checks for remainder of every integer up to N/2, since if we find no divisor less than half, it's certain there will be none in the upper half.
So is that O(N/2) or does N become N = N/2 for the purposes of the notation?
def primecheck(num):
    i = 2
    while i <= (num // 2):
        if not (num % i):
            return False
        i += 1
    return True
Big-O notation is designed to ignore constant factors. As long as k is a constant (meaning, something unrelated to N, such as 1/2, 1/3, 1, 2, 3, etc), then O(kN) means exactly the same thing as O(N).
Big-O notation describes how the running time grows with the input size, so all constant factors are ignored: O(n/2) is expressed the same as O(4n), namely O(n). However, primality checking only needs to test divisors up to sqrt(n), so the algorithm can be improved to O(sqrt(n)).
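For illustration, the same primecheck with the sqrt(n) bound might look like this; testing `i * i <= num` avoids floating-point square roots:

```python
def primecheck(num):
    """Trial division up to sqrt(num): if num = a*b with a <= b,
    then a <= sqrt(num), so testing i*i <= num suffices."""
    if num < 2:
        return False
    i = 2
    while i * i <= num:
        if num % i == 0:
            return False
        i += 1
    return True
```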

How to improve running time for Project Euler exercises

I am experiencing some running time issues with Project Euler. The exercise can be found here:
Project Euler exercise 12. My solution is:
def triangularnr(n):
    T_n = n*(n+1)/2  # Function to calculate triangular numbers
    return T_n

for n in range(1, 1*10**8):  # Nr with over 500 divisors is large, large range required
    count = 2  # Every nr is divisible by itself and by 1, thus initially count = 2
    for i in range(2, int(triangularnr(n))):  # Defining i to find divisors
        if triangularnr(n) % i == 0:
            count += 1  # Incrementing count to count the nr of divisors
    if count > 500:  # If a triangularnr has over 500 divisors we want to print the nr
        print(triangularnr(n))
        break
I've tried to optimize the code by reducing the nr of steps required and by using a mathematical formula for triangular numbers. My code works for 5 divisors, but it takes ages to find 500 divisors. I've let the code run for 3 hours yesterday and still no output. Should I let the code run for more than 3 hours, or will the output never be printed as there is something wrong with my code?
This is more a mathematical answer than a programming one, but one way to improve your algorithm is to think about how to determine the number of divisors of a number.
The brute force way would be to iterate over all numbers smaller than your test number and check every one of them but this is not very efficient for large numbers as you found out.
A more efficient way would be to consider the prime decomposition of your test number. Any integer can be written as the product of prime numbers. Suppose that N has prime factors p1, p2,... , pn with exponents k1, k2,... ,kn, i.e. N == p1**k1 * p2**k2 * ... * pn**kn, then the number of divisors of N is equal to (k1+1)*(k2+1)*...*(kn+1). So finding the number of divisors is equivalent to finding the prime factors of a number which restricts the number of integers you need to check considerably.
Another thing to realize is that if integers N1 and N2 have no prime factors in common (which is the case for N and N+1), the number of divisors of N1*N2 is equal to the number of divisors of N1 times the number of divisors of N2. This means that since you are considering numbers of the form N*(N+1)//2 and since N and N+1 have no prime factors in common, the number of divisors of your triangular numbers is equal to the product of the number of divisors of N//2 and the number of divisors of N+1 for even N, and to the product of the number of divisors of N and the number of divisors of (N+1)//2 for odd N.
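As a sketch of the divisor-count formula in code (the function name is mine): trial division collects each exponent ki, and the factors (ki+1) are multiplied together.

```python
def num_divisors(n):
    """Count divisors of n via its prime decomposition:
    if n = p1**k1 * ... * pm**km, then d(n) = (k1+1)*...*(km+1)."""
    count = 1
    p = 2
    while p * p <= n:
        k = 0
        while n % p == 0:
            n //= p
            k += 1
        count *= k + 1
        p += 1
    if n > 1:        # a leftover prime factor with exponent 1
        count *= 2
    return count
```

For example, 28 = 2**2 * 7 has (2+1)*(1+1) = 6 divisors.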
Firstly, as a rule of thumb, for most project Euler exercises your code should take less than a minute to run. Remember, questions on project Euler challenge your ability to come up with interesting solutions, not to brute force answers.
Your algorithm is inefficient because:
Triangular numbers increase by the square (expand n*(n+1)/2)
Your algorithm loops through every triangular number
These two things mean that your algorithm probably has a complexity of n^3. Eg. doubling the required number of factors increases the search space by a factor of 8.
One tip that might prove useful is that if you have two numbers, a and b, the number of factors of a*b is equal to the number of factors of a multiplied by the number of factors of b. For example 5 has two factors (5, 1) and 14 has 4 factors (1, 2 ,7 and 14). From this we know that 5*14 has 2*4 factors, without having to search the numbers from 1 to 70.
Lucky for you the triangular number formula T_n = n * (n + 1) / 2 already comes broken down into two factors, so you need to code something to:
Determine if n or n + 1 is even
Divide the even factor by 2
Calculate the number of factors of these factors
Multiply these numbers to find the number of factors of the triangular number
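Those four steps might be sketched like this (all names here are illustrative; the divisor counter uses the prime-decomposition rule that N = p1**k1 * ... * pm**km has (k1+1)*...*(km+1) divisors):

```python
def num_divisors(n):
    """Count divisors of n from its prime decomposition."""
    count = 1
    p = 2
    while p * p <= n:
        k = 0
        while n % p == 0:
            n //= p
            k += 1
        count *= k + 1
        p += 1
    if n > 1:
        count *= 2
    return count

def first_triangle_over(target_divisors):
    """First triangular number with more than target_divisors divisors."""
    n = 1
    while True:
        # T_n = n*(n+1)//2; n and n+1 are coprime, so halve the even one
        if n % 2 == 0:
            d = num_divisors(n // 2) * num_divisors(n + 1)
        else:
            d = num_divisors(n) * num_divisors((n + 1) // 2)
        if d > target_divisors:
            return n * (n + 1) // 2
        n += 1
```

With this approach the 500-divisor case finishes in well under a second.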
Hope that helped :)

Optimizing a Prime Number Factorization algorithm

The following below is an algorithm that finds the prime factorization for a given number N. I'm wondering if there are any ways to make this faster using HUGE numbers. I'm talking like 20-35 digit numbers. I wanna try and get these to go as fast as possible. Any ideas?
import time

def prime_factors(n):
    """Returns all the prime factors of a positive integer"""
    factors = []
    divisor = 2
    while n > 1:
        while n % divisor == 0:
            factors.append(divisor)
            n /= divisor
        divisor = divisor + 1
        if divisor*divisor > n:
            if n > 1:
                factors.append(n)
            break
    return factors

# HUGE NUMBERS GO IN HERE!
start_time = time.time()
my_factors = prime_factors(15227063669158801)
end_time = time.time()
print my_factors
print "It took ", end_time-start_time, " seconds."
Your algorithm is trial division, which has time complexity O(sqrt(n)). You can improve your algorithm by using only 2 and the odd numbers as trial divisors, or even better by using only prime numbers as trial divisors, but the time complexity will remain O(sqrt(n)).
To go faster you need a better algorithm. Try this:
from fractions import gcd  # Python 2; in Python 3 use math.gcd

def factor(n, c):
    f = lambda x: (x*x + c) % n
    t, h, d = 2, 2, 1
    while d == 1:
        t = f(t); h = f(f(h)); d = gcd(t - h, n)
    if d == n:
        return factor(n, c + 1)
    return d
To call it on your number, say
print factor(15227063669158801, 1)
That returns the (possibly composite) factor 2090327 virtually instantly. It uses an algorithm called the rho algorithm, invented by John Pollard in 1975. The rho algorithm has time complexity O(sqrt(sqrt(n))), so it's much faster than trial division.
There are many other algorithms for factoring integers. For numbers in the 20 to 35 digit range that interests you, the elliptic curve algorithm is well-suited. It should factor numbers of that size in no more than a few seconds. Another algorithm that is well-suited to such numbers, especially those that are semi-primes (have exactly two prime factors), is SQUFOF.
If you're interested in programming with prime numbers, I modestly recommend this essay on my blog. When you're finished with that, other entries on my blog talk about elliptic curve factorization, and SQUFOF, and various other even more-powerful methods of factoring ever-larger integers.
For example, to list all the prime factors of 100 with a sieve-like approach:
Check whether 2 is a factor. Then every larger multiple of 2 up to 100 (4, 6, 8, ..., 98) can be removed from the candidate set.
Check whether 3 is a factor. Then every larger multiple of 3 up to 100 (9, 12, ..., 99) can be removed.
4 has already been removed from the candidate set.
Check 5; then 10, 15, 20, ..., 100 are removed.
6 has already been removed.
Check 7, and so on.
It seems like there is no primality check for the divisors. Sorry if I am wrong, but how do you know whether a divisor is prime or not? Your divisor variable increases by 1 after each loop, so I assume it will also try a lot of composite numbers.
No optimizations to that algorithm will allow you to factor 35-digit numbers, at least in the general case. The reason is that the number of primes up to 35 digits is far too high to be listed in a reasonable amount of time, let alone divided by one by one. Even if one were inclined to try, the number of bits required to store them all would be far too large as well. In this case you'll want to select a different algorithm from the list of general-purpose factorization algorithms.
However, if all the prime factors are small enough (say, below 10^12 or so), then you could use a segmented Sieve of Eratosthenes, or simply find a list of primes up to some practical bound (say, 10^12 or so) online and use that instead of trying to calculate the primes, and hope the list is large enough.

How can this be made to do fewer calculations? It is very inefficient with large numbers

num = input()
fact = 0
while fact != num:
    fact = fact + 1
    rem = num % fact
    if rem == 0:
        print fact
You only need to go to the square root of the input number to get all the factors (not as far as half the number, as suggested elsewhere). For example, 24 has factors 1, 2, 3, 4, 6, 8, 12, 24. sqrt(24) is approx 4.9. Check 1 and also get 24, check 2 and also get 12, check 3 and also get 8, check 4 and also get 6. Since 5 > 4.9, no need to check it. (Yes, I know 24 isn't the best example as all whole numbers less than sqrt(24) are factors of 24.)
import math

factors = set()
for i in xrange(1, int(math.sqrt(x)) + 1):  # start at 1, not 0, to avoid dividing by zero
    if x % i == 0:
        factors.add(i)
        factors.add(x/i)
print factors
There are some really complicated ways to do better for large numbers, but this should get you a decent runtime improvement. Depending on your application, caching could also save you a lot of time.
Use for loops, for starters. Then, let Python increment for you, and get rid of the unnecessary rem variable. This code does exactly the same as your code, except in a "Pythonic" way.
num = input()
for x in xrange(1, num):
    if (num % x) == 0:
        print x
xrange(x, y) returns a generator for all integers from x up to, but not including y.
So that prints out all the factors of a number? The first obvious optimisation is that you could quit when fact*2 is greater than num. Anything greater than half of num can't be a factor. That's half the work thrown out instantly.
The second is that you'd be better searching for the prime factorisation and deriving all the possible factors from that. There are a bunch of really smart algorithms for that sort of thing.
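As a sketch of that second idea (the names here are illustrative): factorise num once by trial division, then build every factor as a product of prime powers instead of testing each candidate.

```python
def all_factors(num):
    """Derive every factor of num from its prime factorisation."""
    # Step 1: prime factorisation by trial division
    exponents = {}
    n, p = num, 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exponents[n] = exponents.get(n, 0) + 1
    # Step 2: every factor is a product of prime powers p**e with 0 <= e <= k
    factors = [1]
    for prime, k in exponents.items():
        factors = [f * prime ** e for f in factors for e in range(k + 1)]
    return sorted(factors)
```

For example, 24 = 2**3 * 3 yields the (3+1)*(1+1) = 8 factors 1, 2, 3, 4, 6, 8, 12, 24.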
Once you get halfway there (once fact > num/2), you're not going to discover any new numbers, as the numbers above num/2 can be discovered by calculating num/fact for each one (this can also be used to easily print each number with its pair).
The following code should cut the time down by a few seconds on every calculation and cut it in half where num is odd. Hopefully you can follow it; if not, ask.
I'll add more if I think of something later.
def even(num):
    '''Even numbers can be divided by odd numbers, so test them all'''
    fact = 0
    while fact < num/2:
        fact += 1
        rem = num % fact
        if rem == 0:
            print '%s and %s' % (fact, num/fact)

def odd(num):
    '''Odd numbers can't be divided by even numbers, so why try?'''
    fact = -1
    while fact < num/2:
        fact += 2
        rem = num % fact
        if rem == 0:
            print '%s and %s' % (fact, num/fact)

while True:
    num = input(':')
    if str(num)[-1] in '13579':
        odd(num)
    else:
        even(num)
Research integer factorization methods.
Unfortunately in Python, the divmod operation is implemented as a built-in function. Despite hardware integer division often producing the quotient and the remainder simultaneously, no non-assembly language that I'm aware of has implemented a /% or //% basic operator.
So: the following is a better brute-force algorithm if you count machine operations. It gets all factors in O(sqrt(N)) time without having to calculate sqrt(N) -- look, Mum, no floating point!
def factor_pairs(num):
    '''Generator yielding each factor of num together with its cofactor.'''
    fact = 0
    while 1:
        fact += 1
        fact2, rem = divmod(num, fact)
        if not rem:
            yield fact
            yield fact2
        if fact >= fact2 - 1:
            # fact >= math.sqrt(num)
            break
Yes. Use a quantum computer
Shor's algorithm, named after mathematician Peter Shor, is a quantum algorithm (an algorithm which runs on a quantum computer) for integer factorization formulated in 1994. Informally it solves the following problem: Given an integer N, find its prime factors.
On a quantum computer, to factor an integer N, Shor's algorithm runs in polynomial time (the time taken is polynomial in log N, which is the size of the input). Specifically it takes time O((log N)^3), demonstrating that the integer factorization problem can be efficiently solved on a quantum computer and is thus in the complexity class BQP. This is exponentially faster than the most efficient known classical factoring algorithm, the general number field sieve, which works in sub-exponential time, about O(e^(1.9 (log N)^(1/3) (log log N)^(2/3))). The efficiency of Shor's algorithm is due to the efficiency of the quantum Fourier transform, and modular exponentiation by squarings.
