Speeding up my Fermat Factorization function (Python) - python

My task is to factor very large composite numbers using Fermat's factorization method. The numbers are 1024 bits large, which is around 309 decimal digits.
I have come up with the Python code below, which uses the gmpy2 module for accuracy. It is simply a Python implementation of the pseudo-code shown on the Wikipedia page. I read the "Sieve Improvement" section on that page, but wasn't sure how to implement it.
import gmpy2

def fermat_factor(n):
    assert n % 2 != 0  # Odd integers only
    a = gmpy2.ceil(gmpy2.sqrt(n))
    b2 = gmpy2.square(a) - n
    while not is_square(b2):
        a += 1
        b2 = gmpy2.square(a) - n
    factor1 = a + gmpy2.sqrt(b2)
    factor2 = a - gmpy2.sqrt(b2)
    return int(factor1), int(factor2)

def is_square(n):
    root = gmpy2.sqrt(n)
    return root % 1 == 0  # '4.0' will pass, '4.1212' won't
This code runs fairly fast for small numbers, but takes much too long for numbers as large as those given in the problem. How can I improve the speed of this code? I'm not looking for people to write my code for me, but would appreciate some suggestions. Thank you for any responses.

You need to avoid doing so many square and sqrt operations, especially on large numbers.
The easy way to avoid them is to note that, for 'a' to be a solution, a^2 - N = b^2 must hold modulo every modulus. For example,
a^2 mod 9 - N mod 9 = b^2 mod 9
Let's say your N is 55, so N mod 9 = 1.
Now consider the set of (a mod 9), and square it, modulo 9.
The resulting a^2 mod 9 is the set: {0, 1, 4, 7}. The same must be true for the b^2 mod 9.
If a^2 mod 9 = 0, then 0 - 1 = 8 (all mod 9) is not a solution, since 8 is not a square of a number modulo 9. This eliminates (a mod 9) = {0, 3 and 6}.
If a^2 mod 9 = 1, then 1 - 1 = 0 (all mod 9), so (a mod 9) = {1, 8} are possible solutions.
If a^2 mod 9 = 4, then 4 - 1 = 3 (all mod 9) is not a possible solution.
Ditto for a^2 mod 9 = 7.
So, that one modulus eliminated 7 out of 9 possible values of 'a mod 9'.
And you can have many moduli, each one typically eliminating about half of the possibilities or more.
With a set of, say, 10 moduli, you only have to check about 1 in 1,000 a's for being perfect squares, or having integer square roots. (I use about 10,000 moduli for my work).
Note: Moduli which are powers of a prime are often more useful than a prime.
Also, a modulus of 16 is a useful special case, because 'a' must be odd when N mod 4 is 1,
and 'a' must be even when N mod 4 is 3. "Proof is left as an exercise for the student."
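The sieve described above can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the function names and the particular list of moduli are assumptions chosen for the example, and it uses the standard library's math.isqrt (Python 3.8+) instead of gmpy2 so that it runs without extra packages.

```python
from math import isqrt

def allowed_residues(n, m):
    """Residues r (mod m) for which r*r - n is a square modulo m."""
    squares = {(x * x) % m for x in range(m)}
    return {r for r in range(m) if (r * r - n) % m in squares}

def fermat_factor_sieved(n, moduli=(9, 16, 5, 7, 11, 13)):
    """Fermat factorization that skips values of a ruled out by small moduli.

    The moduli tuple is an arbitrary example, not a recommended set.
    """
    assert n % 2 == 1, "odd integers only"
    tables = [(m, allowed_residues(n, m)) for m in moduli]
    a = isqrt(n)
    if a * a < n:          # round up to ceil(sqrt(n))
        a += 1
    while True:
        # Cheap residue checks first; the exact square test only runs
        # for values of a that survive every modulus.
        if all(a % m in ok for m, ok in tables):
            b2 = a * a - n
            b = isqrt(b2)
            if b * b == b2:
                return a + b, a - b
        a += 1

print(allowed_residues(55, 9))     # the worked example: {1, 8}
print(fermat_factor_sieved(5959))  # (101, 59)
```

A faster version would step 'a' directly between allowed residues instead of testing every value, but the filtering idea is the same.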

Consider rewriting this script to use only integers instead of arbitrary precision floats.
gmpy has support for integer square root (returns the floor of the square root, calculated efficiently). This can be used for the is_square() function by testing if the square of the square root equals the original.
I'm not sure about gmpy2, but in gmpy, sqrt() requires an integer argument and returns an integer result. If you are using floats, then that is probably your problem (since floats are very slow compared to integers, especially when using extended precision). If you are in fact using integers, then is_square() must be doing a tedious conversion from integer to float every time it is called (and gmpy2.sqrt() != gmpy.sqrt()).
For those of you who keep saying that this is a difficult problem, keep in mind that using this method was a hint: The fermat factorization algorithm is based on a weakness present when the composite number to be factored has two prime factors which are close to each other. If this was given as a hint, it is likely that the entity posing the problem knows this to be the case.
Edit: Apparently, gmpy2.isqrt() is the same as gmpy.sqrt() (the integer version of sqrt), and gmpy2.sqrt() is the floating-point version.
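A minimal integer-only sketch of the suggestion above. It uses math.isqrt from the standard library (Python 3.8+) as a stand-in for gmpy2.isqrt so the example is self-contained; swapping in gmpy2.isqrt on mpz values is the assumption you would make for speed on 1024-bit inputs. It also updates b2 incrementally, which is an addition of this sketch and not part of the answer.

```python
from math import isqrt

def is_square(n):
    """Exact perfect-square test: compare the integer root squared with n."""
    if n < 0:
        return False
    root = isqrt(n)
    return root * root == n

def fermat_factor(n):
    assert n % 2 == 1, "odd integers only"
    a = isqrt(n)
    if a * a < n:          # ceil(sqrt(n)) without floats
        a += 1
    b2 = a * a - n
    while not is_square(b2):
        # (a + 1)^2 - n == (a^2 - n) + 2a + 1, so no fresh squaring is needed
        b2 += 2 * a + 1
        a += 1
    b = isqrt(b2)
    return a + b, a - b

print(fermat_factor(5959))  # (101, 59)
```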

Related

Python pow() and modulus

The pow() function in Python 3 computes powers.
>>> pow(2, 3)
8
Python 3 also supports negative exponents, for example pow(10, -1). When I calculated pow(4, -1, 5), it gave the output 4.
>>> pow(4, -1, 5)
4
I couldn't understand how the value 4 was calculated: when I worked through by hand the calculation it performs in the background, I did not get 4 as the remainder.
When a negative exponent is passed with only two arguments, the result matches the manual calculation.
>>> pow(4, -1)
0.25
What is the difference when calculating a negative exponent with a modulus?
From the documentation:
If mod is present and exp is negative, base must be relatively prime to mod. In that case, pow(inv_base, -exp, mod) is returned, where inv_base is an inverse to base modulo mod.
Starting in python 3.8, the pow function allows you to calculate a modular inverse. As other answers have mentioned, this occurs when you use integers, have a negative exp, and base is relatively prime to mod. (this is the case in your example)
What is a modular inverse?
Let's start with normal inverses. Some number Y has an inverse X such that Y * X == 1. Modular inverses are very similar. For some number Y and some modulus mod (with Y relatively prime to mod), there exists an inverse X such that ((X * Y) % mod) == 1. From your example, you will see (4 * 4) % 5 does in fact equal 1, making 4 a valid modular inverse for Y = 4 and mod = 5.
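To make the definition concrete, here is a small sketch that computes a modular inverse with the extended Euclidean algorithm and compares it with the built-in. The helper name mod_inverse is hypothetical; it illustrates the definition and is not how CPython implements pow.

```python
def mod_inverse(base, mod):
    """Return x in [0, mod) with (base * x) % mod == 1."""
    # Invariant: s0 * base == r0 (mod mod) and s1 * base == r1 (mod mod)
    r0, r1 = base % mod, mod
    s0, s1 = 1, 0
    while r1:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
    if r0 != 1:
        raise ValueError("base is not invertible for the given modulus")
    return s0 % mod

print(mod_inverse(4, 5))   # 4, because (4 * 4) % 5 == 1
print(pow(4, -1, 5))       # 4 (requires Python 3.8+)
```

If base and mod share a common factor, no inverse exists, which is why the documentation requires them to be relatively prime.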
How do you get 0.25 when a modulus is given?
Well, you could write it as separate steps (4 ** -1) % 5, but as the documentation says
if mod is present, return base to the power exp, modulo mod (computed more efficiently than pow(base, exp) % mod)
So you may sacrifice performance by using (4 ** -1) % 5. Unfortunately, it does not seem possible to get that result with the three-argument form of pow.

My answer is changing with the same code [duplicate]

I am a complete python beginner and I am trying to solve this problem :
A number is called triangular if it is the sum of the first n positive
integers for some n. For example, 10 is triangular because 10 = 1+2+3+4
and 21 is triangular because 21 = 1+2+3+4+5+6. Write a Python program
to find the smallest 6-digit triangular number. Enter it as your
answer below.
I have written this program:
n = 0
trinum = 0
while len(str(trinum)) < 6:
    trinum = n*(n+1)/2
    n += 1
print(trinum)
It only works in the Python I have installed on my computer if I write while len(str(trinum)) < 8:, but it is supposed to be while len(str(trinum)) < 6:. So I went to http://www.skulpt.org/ and ran my code there, and it gave me the right answer with while len(str(trinum)) < 6:, as it should. But it doesn't work with 6 in the Python I have installed on my computer. Does anyone have any idea what's going on?
Short Answer
In Python 3, the / operator always performs floating-point division. So on the first pass you get str(trinum) == '0.0', which isn't what you want.
You're looking for integer division. The operator for that is //.
Long Answer
The division operator changed between Python 2.x and 3.x. In 2.x, the type of the result depended on the arguments, so 1/2 did integer division but 1./2 did floating-point division.
To clean this up, a new operator was introduced: //. This operator will always do integer division.
So in Python 3.x, this expression (4 * 5)/2 is equal to 10.0. Note that this number is less than 100, but it has 4 characters in it.
If instead, we did (4*5)//2, we would get the integer 10 back. Which would allow your condition to hold true.
In Python 2, the / operator performs integer division when possible: "x divided by y is a remainder b," throwing away the "b" (use the % operator to find "b"). In Python 3, the / operator always performs float division: "x divided by y is a.fgh." Get integer division in Python 3 with the // operator.
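A short Python 3 illustration of the difference, using n = 45 as an arbitrary example value (45 * 46 / 2 is the triangular number 1035):

```python
n = 45

float_result = n * (n + 1) / 2    # true division always returns a float
int_result = n * (n + 1) // 2     # floor division keeps the result an int

print(float_result, len(str(float_result)))  # 1035.0 6
print(int_result, len(str(int_result)))      # 1035 4
```

The float already has 6 characters, so a loop that tests len(str(trinum)) < 6 stops at a 4-digit value.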
You have two problems here, that combine to give you the wrong answer.
The first problem is that you're using /, which means integer division in Python 2 (and the almost-Python language that Skulpt implements), but float division in Python 3. So, when you run it on your local machine with Python 3, you're going to get floating point numbers.
The second problem is that you're not checking for "under 6 digits"; you're checking for "under 6 characters long". For positive integers, those are the same thing, but for floats they are not: 1035.0, say, has only 4 digits before the decimal point, but it's 6 characters long. So you exit early.
If you solve either problem, it will work, at least most of the time. But you really should solve both.
So:
n = 0
trinum = 0
while trinum < 10**5: # note comparing numbers, not string length
    trinum = n*(n+1)//2 # note // instead of /
    n += 1
print(trinum)
The first problem is fixed by using //, which always means integer division, instead of /, which means different things in different Python versions.
The second problem is fixed by comparing the number as a number to 10**5 (that is, 10 to the 5th power, which means 1 with 5 zeros, or 100000, the smallest 6-digit number) instead of comparing its length as a string to 6.
Taking Malik Brahimi's answer further:
from itertools import accumulate, count, dropwhile
print(next(dropwhile(lambda n: n <= 99999, accumulate(count(1)))))
count(1) is all the numbers from 1 to infinity.
accumulate(count(1)) is all the running totals of those numbers.
dropwhile(…) skips the initial running totals until we reach 100000, then yields all the rest of them.
next(…) is the next one after the ones we skipped.
Of course you could argue that a 1-liner that takes 4 lines to describe to a novice isn't as good as a 4-liner that doesn't need any explanation. :)
(Also, the dropwhile is a bit ugly. Most uses of it in Python are. In a language like Haskell, where you can write that predicate with operator sectioning instead of a lambda, like (<= 99999), it's a different story.)
The division method in Py2.x and 3.x is different - so that is probably why you had issues.
Just another suggestion - which doesn't deal with divisions and lengths - so less buggy in general. Plus addition is addition anywhere.
trinum = 0
idx = 0
while trinum <= 99999: # largest 5-digit number
    idx += 1
    trinum += idx
print(trinum)
import itertools # to get the count function
n, c = 0, itertools.count(1) # start at zero
while n <= 99999:
    n = n + next(c)
print(n)

Prime numbers which can be written as sum of the squares of two numbers x and y

The problem is:
Given a range of numbers (x, y), count the prime numbers in the range that are the sum of the squares of two numbers, with the restriction that 0<=x<y<=2*(10^8)
According to Fermat's theorem :
Fermat's theorem on sums of two squares asserts that an odd prime number p can be
expressed as p = x^2 + y^2 with integer x and y if and only if p is congruent to
1 (mod 4).
I have done something like this:
import math
def is_prime(n):
    if n % 2 == 0 and n > 2:
        return False
    return all(n % i for i in range(3, int(math.sqrt(n)) + 1, 2))
a,b=map(int,raw_input().split())
count=0
for i in range(a,b+1):
    if(is_prime(i) and (i-1)%4==0):
        count+=1
print(count)
But this exceeds the time and memory limits in some cases.
Here is my submission result:
Can anyone help me reduce the Time Complexity and Memory limit with better algorithm?
Problem Link(Not an ongoing contest FYI)
Do not check whether each number is prime. Precompute all the prime numbers in the range, using Sieve of Eratosthenes. This will greatly reduce the complexity.
Since you have a maximum of 200M numbers, a 256 MB memory limit, and need at least 4 bytes per number, you need a little hack. Do not initialize the sieve with all numbers up to y, but only with numbers that are not divisible by 2, 3 and 5. That will reduce the initial size of the sieve enough to fit into the memory limit.
UPD As correctly pointed out by Will Ness in comments, sieve contains only flags, not numbers, thus it requires not more than 1 byte per element and you don't even need this precomputing hack.
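A minimal Python 3 sketch of the sieve approach with one byte-sized flag per number. The function name is hypothetical, and counting 2 (which is 1^2 + 1^2 but not congruent to 1 mod 4) is an assumption about what the problem intends. At the upper limit of 2*(10^8) the bytearray alone takes about 200 MB, so an odds-only or bit-packed sieve may still be needed in practice.

```python
from math import isqrt

def count_two_square_primes(lo, hi):
    """Count primes p in [lo, hi] with p == 2 or p % 4 == 1."""
    if hi < 2:
        return 0
    is_prime = bytearray([1]) * (hi + 1)   # one flag byte per number
    is_prime[0] = is_prime[1] = 0
    for p in range(2, isqrt(hi) + 1):
        if is_prime[p]:
            # Clear every multiple of p, starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, hi + 1, p)))
    return sum(
        1
        for p in range(max(lo, 2), hi + 1)
        if is_prime[p] and (p == 2 or p % 4 == 1)
    )

print(count_two_square_primes(1, 30))  # 5  (2, 5, 13, 17, 29)
```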
You can reduce your memory usage by changing for i in range(a,b+1): to for i in xrange(a,b+1):, so that you are not generating an entire list in memory.
You can do the same thing inside the statement below, but you are right that it does not help with time.
return all(n % i for i in xrange(3, int(math.sqrt(n)) + 1, 2))
One time optimization that might not cost as much in terms of memory as the other answer is to use Fermat's Little Theorem. It may help you reject many candidates early.
More specifically, you could pick maybe 3 or 4 random values to test and if one of them rejects, then you can reject. Otherwise you can do the test you are currently doing.
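A sketch of that pre-filter in Python 3, with a hypothetical function name and a default of 4 rounds as an illustrative choice. A False result proves the number is composite; a True result only means "probably prime" (some composites pass for some or all bases), so the exact test still has to run afterwards.

```python
import random

def fermat_probable_prime(n, rounds=4):
    """Return False if n is certainly composite, True if it is probably prime."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        a = random.randrange(2, n - 1)   # random base in [2, n - 2]
        # Fermat's Little Theorem: a^(n-1) == 1 (mod n) for every prime n
        if pow(a, n - 1, n) != 1:
            return False
    return True

print(fermat_probable_prime(97))   # True: 97 is prime
print(fermat_probable_prime(100))  # False
```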
First of all, although it will not change the order of your time-complexity, you can still narrow down the list of numbers that you are checking by a factor of 6, since you only need to check numbers that are either equal to 1 mod 12 or equal to 5 mod 12 (such as [1,5], [13,17], [25,29], [37,41], etc).
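Since you only need to count the primes which are sums of squares of two numbers, the order doesn't matter. Therefore, you can change range(a,b+1) to range(1,b+1,12)+range(5,b+1,12). (This starts counting from 1; to honor a lower bound a, start each range at the first value >= a with the matching residue.)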
Obviously, you can then remove the if n % 2 == 0 and n > 2 condition in function is_prime, and in addition, change the if is_prime(i) and (i-1)%4 == 0 condition to if is_prime(i).
And finally, you can check the primality of each number by dividing it only with numbers that are adjacent to multiples of 6 (such as [5,7], [11,13], [17,19], [23,25], etc).
So you can change this:
range(3,int(math.sqrt(n))+1,2)
To this:
range(5,int(math.sqrt(n))+1,6)+range(7,int(math.sqrt(n))+1,6)
And you might as well calculate int(math.sqrt(n))+1 beforehand.
To summarize all this, here is how you can improve the overall performance of your program:
import math
def is_prime(n):
    limit = int(math.sqrt(n))+1  # renamed from max to avoid shadowing the builtin
    return all(n % i for i in range(5,limit,6)+range(7,limit,6))
count = 0
b = int(raw_input())
for i in range(1,b+1,12)+range(5,b+1,12):
    if is_prime(i):
        count += 1
print count
Please note that 1 is typically not regarded as prime, so you might want to print count-1 instead. On the other hand, 2 is not equal to 1 mod 4, yet it is the sum of two squares, so you may leave it as is...

Optimizing a Prime Number Factorization algorithm

Below is an algorithm that finds the prime factorization of a given number N. I'm wondering if there are any ways to make it faster for HUGE numbers, on the order of 20-35 digits. I want these to run as fast as possible. Any ideas?
import time
def prime_factors(n):
    """Returns all the prime factors of a positive integer"""
    factors = []
    divisor = 2
    while n > 1:
        while n % divisor == 0:
            factors.append(divisor)
            n /= divisor
        divisor = divisor + 1
        if divisor*divisor > n:
            if n > 1:
                factors.append(n)
            break
    return factors
#HUGE NUMBERS GO IN HERE!
start_time = time.time()
my_factors = prime_factors(15227063669158801)
end_time = time.time()
print my_factors
print "It took ", end_time-start_time, " seconds."
Your algorithm is trial division, which has time complexity O(sqrt(n)). You can improve your algorithm by using only 2 and the odd numbers as trial divisors, or even better by using only prime numbers as trial divisors, but the time complexity will remain O(sqrt(n)).
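As a sketch of the first improvement mentioned above (2 and then odd trial divisors only), written for Python 3 and assuming n is a positive integer. It roughly halves the number of divisions but, as noted, does not change the O(sqrt(n)) behavior.

```python
def prime_factors(n):
    """Trial division using 2 and then odd divisors only (n must be >= 1)."""
    factors = []
    while n % 2 == 0:        # strip factors of 2 first
        factors.append(2)
        n //= 2              # // keeps n an int in Python 3
    divisor = 3
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 2         # skip even candidates
    if n > 1:                # whatever remains is a prime factor
        factors.append(n)
    return factors

print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```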
To go faster you need a better algorithm. Try this:
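from math import gcd  # Python 3.5+; on Python 2 use: from fractions import gcd

def factor(n, c):
    f = lambda x: (x*x+c) % n
    t, h, d = 2, 2, 1  # tortoise, hare, current gcd
    while d == 1:
        t = f(t); h = f(f(h)); d = gcd(t-h, n)
    if d == n:  # cycle found without a factor: retry with a different constant
        return factor(n, c+1)
    return d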
To call it on your number, say
print(factor(15227063669158801, 1))
That returns the (possibly composite) factor 2090327 virtually instantly. It uses an algorithm called the rho algorithm, invented by John Pollard in 1975. The rho algorithm has an expected (heuristic) time complexity of about O(sqrt(sqrt(n))), so it's much faster than trial division.
There are many other algorithms for factoring integers. For numbers in the 20 to 35 digit range that interests you, the elliptic curve algorithm is well-suited. It should factor numbers of that size in no more than a few seconds. Another algorithm that is well-suited to such numbers, especially those that are semi-primes (have exactly two prime factors), is SQUFOF.
If you're interested in programming with prime numbers, I modestly recommend this essay on my blog. When you're finished with that, other entries on my blog talk about elliptic curve factorization, and SQUFOF, and various other even more-powerful methods of factoring ever-larger integers.
For example, to list the prime factors of the number 100:
Check whether 2 is a factor. Then every multiple 2*c with 2 < 2*c <= 100 can be removed from the candidate divisors, e.g. 4, 6, 8, ..., 100.
Check whether 3 is a factor. Then every multiple 3*d with 3 < 3*d <= 100 can be removed, e.g. 6, 9, 12, ..., 99.
4 has already been removed from the candidate set.
Check 5; then 10, 15, 20, ..., 100 are removed.
6 has already been removed.
Check 7, ....
....
It seems like there is no check for divisors. Sorry if I am wrong but how do you know if divisor is prime or not? Your divisor variable is increasing by 1 after each loop so I assume it will generate a lot of composite numbers.
No optimization of that algorithm will let you factor 35-digit numbers, at least in the general case. The reason is that the number of primes up to 35 digits is too high to be listed in a reasonable amount of time, let alone to attempt a division by each one. Even if one were inclined to try, the number of bits required to store them would be far too large as well. In this case you'll want to select a different algorithm from the list of general-purpose factorization algorithms.
However, if all the prime factors are small enough (say below 10^12 or so), then you could use a segmented Sieve of Eratosthenes, or simply find a list of primes up to some practical number (say 10^12 or so) online and use that instead of trying to calculate the primes and hope the list is large enough.

question on karatsuba multiplication

I want to implement Karatsuba's 2-split multiplication in Python. However, this requires writing numbers in the form
A=c*x+d
where x is a power of the base (let x=b^m) close to sqrt(A).
How am I supposed to find x, if I can't even use division and multiplication? Should I count the number of digits and shift A to the left by half the number of digits?
Thanks.
Almost. You don't shift A by half the number of digits; you shift 1. Of course, this is only efficient if the base is a power of 2, since "shifting" in base 10 (for example) has to be done with multiplications. (Edit: well, ok, you can multiply with shifts and additions. But it's ever so much simpler with a power of 2.)
If you're using Python 2.7, 3.1 or later, counting the bits is easy, because those versions introduced the int.bit_length() method. For older versions of Python, you can count the bits by copying A and shifting it right until it's 0. This can be done in O(log N) shift steps (N = # of digits) with a sort of binary search method: shift by many bits, if the result is 0 then that was too many, etc.
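A minimal sketch of the split using bit_length() and shifts only, assuming base 2 and a non-negative A. The function name is hypothetical.

```python
def karatsuba_split(A):
    """Return (x, c, d) with A == c*x + d, x a power of 2 near sqrt(A), 0 <= d < x."""
    m = (A.bit_length() + 1) >> 1   # half the bit count, rounded up
    x = 1 << m                      # x = 2**m: shift 1, not A
    c = A >> m                      # high part
    d = A & (x - 1)                 # low part (the bits shifted out of c)
    return x, c, d

x, c, d = karatsuba_split(1000)
print(x, c, d)   # 32 31 8, and 31*32 + 8 == 1000
```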
You already accepted an answer since I started writing this, but:
What Tom said: in Python 3.x you can get n = A.bit_length() directly.
In Python 2.x you get n in O(log2(A)) time by binary-search, like below.
Here is (2.x) code that calculates both. Let the base-2 exponent of x be n, i.e. x = 2**n.
First we get n by binary-search by shifting. (Really we only needed n/2, so that's one unnecessary last iteration).
Then when we know n, getting x,c,d is easy (still no using division)
def karatsuba_form(A,n=32):
    """Binary-search for Karatsuba form using binary shifts"""
    # First search for n ~ log2(A)
    step = n >> 1
    while step>0:
        c = A >> n
        print('n=%2d step=%2d -> c=%d' % (n,step,c))
        if c:
            n += step
        else:
            n -= step
        # More concisely, could say: n = (n+step) if c else (n-step)
        step >>= 1
    # Then take x = 2^(n/2) ~ sqrt(A)
    ndiv2 = n >> 1  # shift instead of n/2, so no division is used
    # Find Karatsuba form
    c = (A >> ndiv2)
    x = (1 << ndiv2)
    d = A - (c << ndiv2)
    return (x,c,d)
Your question is already answered in the article to which you referred: "Karatsuba's basic step works for any base B and any m, but the recursive algorithm is most efficient when m is equal to n/2, rounded up" ... n being the number of digits, and 0 <= value_of_digit < B.
Some perspective that might help:
You are allowed (and required!) to use elementary operations like number_of_digits // 2 and divmod(digit_x * digit_x, B) ... in school arithmetic, where B is 10, you are required (for example) to know that divmod(9 * 8, 10) produces (7, 2).
When implementing large number arithmetic on a computer, it is usual to make B the largest power of 2 that will support the elementary multiplication operation conveniently. For example in the CPython implementation on a 32-bit machine, B is chosen to be 2 ** 15 (i.e. 32768), because then product = digit_x * digit_y; hi = product >> 15; lo = product & 0x7FFF; works without overflow and without concern about a sign bit.
I'm not sure what you are trying to achieve with an implementation in Python that uses B == 2, with numbers represented by Python ints, whose implementation in C already uses the Karatsuba algorithm for multiplying numbers that are large enough to make it worthwhile. It can't be speed.
As a learning exercise, you might like to try representing a number as a list of digits, with the base B being an input parameter.
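For reference, the recursive step quoted above (split at m = n/2, rounded up) can be sketched directly on Python ints with base 2. This is a learning sketch only: the function name and the cutoff_bits threshold are assumptions, the inputs are assumed non-negative, and the base case falls back to the built-in multiplication for small operands.

```python
def karatsuba(a, b, cutoff_bits=64):
    """Multiply non-negative ints a and b with Karatsuba's three-product split."""
    if a.bit_length() <= cutoff_bits or b.bit_length() <= cutoff_bits:
        return a * b                       # small operand: elementary multiply
    # Split both numbers at m bits, m = half the larger bit length, rounded up.
    m = (max(a.bit_length(), b.bit_length()) + 1) >> 1
    mask = (1 << m) - 1
    a_hi, a_lo = a >> m, a & mask
    b_hi, b_lo = b >> m, b & mask
    z2 = karatsuba(a_hi, b_hi, cutoff_bits)                      # high * high
    z0 = karatsuba(a_lo, b_lo, cutoff_bits)                      # low * low
    z1 = karatsuba(a_hi + a_lo, b_hi + b_lo, cutoff_bits) - z2 - z0  # cross terms
    # a*b == z2 * 2**(2m) + z1 * 2**m + z0
    return (z2 << (2 * m)) + (z1 << m) + z0

a, b = 3**200, 7**150
print(karatsuba(a, b) == a * b)  # True
```

As the answer notes, this cannot beat the built-in multiplication for speed; it only shows where the shifts replace multiplication and division by x.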
