question on karatsuba multiplication

question on karatsuba multiplication - python

I want to implement Karatsuba's 2-split multiplication in Python. However, writing numbers in the form
A=c*x+d
where x is a power of the base (let x=b^m) close to sqrt(A).
How am I supposed to find x, if I can't even use division and multiplication? Should I count the number of digits and shift A to the left by half the number of digits?
Thanks.

Almost. You don't shift A by half the number of digits; you shift 1. Of course, this is only efficient if the base is a power of 2, since "shifting" in base 10 (for example) has to be done with multiplications. (Edit: well, ok, you can multiply with shifts and additions. But it's ever so much simpler with a power of 2.)
If you're using Python 3.1 or greater, counting the bits is easy, because 3.1 introduced the int.bit_length() method. For other versions of Python, you can count the bits by copying A and shifting it right until it's 0. This can be done in O(log N) time (N = # of digits) with a sort of binary search method - shift by many bits, if it's 0 then that was too many, etc.

You already accepted an answer since I started writing this, but:
What Tom said: in Python 3.x you can get n = int.bit_length() directly.
In Python 2.x you get n in O(log2(A)) time by binary-search, like below.
Here is (2.x) code that calculates both. Let the base-2 exponent of x be n, i.e. x = 2**n.
First we get n by binary-search by shifting. (Really we only needed n/2, so that's one unnecessary last iteration).
Then when we know n, getting x,c,d is easy (still no using division)
def karatsuba_form(A,n=32):
"""Binary-search for Karatsuba form using binary shifts"""
# First search for n ~ log2(A)
step = n >> 1
while step>0:
c = A >> n
print 'n=%2d step=%2d -> c=%d' % (n,step,c)
if c:
n += step
else:
n -= step
# More concisely, could say: n = (n+step) if c else (n-step)
step >>= 1
# Then take x = 2^(n/2) ˜ sqrt(A)
ndiv2 = n/2
# Find Karatsuba form
c = (A >> ndiv2)
x = (1 << ndiv2)
d = A - (c << ndiv2)
return (x,c,d)

Your question is already answered in the article to which you referred: "Karatsuba's basic step works for any base B and any m, but the recursive algorithm is most efficient when m is equal to n/2, rounded up" ... n being the number of digits, and 0 <= value_of_digit < B.
Some perspective that might help:
You are allowed (and required!) to use elementary operations like number_of_digits // 2 and divmod(digit_x * digit_x, B) ... in school arithmetic, where B is 10, you are required (for example) to know that divmod(9 * 8, 10) produces (7, 2).
When implementing large number arithmetic on a computer, it is usual to make B the largest power of 2 that will support the elementary multiplication operation conveniently. For example in the CPython implementation on a 32-bit machine, B is chosen to to be 2 ** 15 (i.e. 32768), because then product = digit_x * digit_y; hi = product >> 15; lo = product & 0x7FFF; works without overflow and without concern about a sign bit.
I'm not sure what you are trying to achieve with an implementation in Python that uses B == 2, with numbers represented by Python ints, whose implementation in C already uses the Karatsuba algorithm for multiplying numbers that are large enough to make it worthwhile. It can't be speed.
As a learning exercise, you might like to try representing a number as a list of digits, with the base B being an input parameter.

Related

HackerRank bitwiseAnd challenge - algorithm is too slow?

I am working on this code challenge on HackerRank: Day 29: Bitwise AND:
Task
Given set 𝑆={1,2,3,...,𝑁}. Find two integers, 𝐴 and 𝐵 (where 𝐴 < 𝐵), from set 𝑆 such that the value of 𝐴&𝐵 is the
maximum possible and also less than a given integer, 𝐾. In this case,
& represents the bitwise AND operator.
Function Description
Complete the bitwiseAnd function in the editor below.
bitwiseAnd has the following parameter(s):
int N: the maximum integer to consider
int K: the limit of the result, inclusive
Returns
int: the maximum value of 𝐴&𝐵 within the limit.
Input Format
The first line contains an integer, 𝑇, the number of test cases. Each
of the 𝑇 subsequent lines defines a test case as 2 space-separated
integers, 𝑁 and 𝐾, respectively.
Constraints
1 ≤ 𝑇 ≤ 103
2 ≤ 𝑁 ≤ 103
2 ≤ 𝐾 ≤ 𝑁
Sample Input
STDIN Function
----- --------
3 T = 3
5 2 N = 5, K = 2
8 5 N = 8, K = 5
2 2 N = 2, K = 2*
Sample Output
1
4
0
*At the time of writing the original question has an error here. Corrected in this copy.
I was not able to solve it using Python, as the time limit was exceeded every time, with a message telling me to optimise my code (some test cases had like 1000 requests).
I then tried writing the same code in C#, and it worked perfectly, executing like 10 times faster, even without any effort in optimizing the code.
Is it possible to further optimise the code below, for example with some magic that I don't know about?
Code:
def bitwiseAnd(N, K):
result = 0
for x in range(1, N):
for y in range(x+1, N+1):
if result < x&y < K:
result = x&y
return result
for x in range(int(input())):
print(bitwiseAnd(*[int(x) for x in input().split(' ')]))

Your algorithm has a brute force approach, but it can be done more efficiently.
First, observe some properties of this problem:
𝐴 & 𝐵 will never be greater than 𝐴 nor than 𝐵
If we think we have a solution 𝐶, then both 𝐴 and 𝐵 should have the same 1-bits as 𝐶 has, including possibly a few more.
We want 𝐴 and 𝐵 to not be greater than needed, since they need to be not greater than 𝑁, so given the previous point, we should let 𝐴 be equal to 𝐶, and let 𝐵 just have one 1-bit more than 𝐴 (since it should be a different number).
The least possible value for 𝐵 is then to set a 1-bit at the least bit in 𝐴 that is still 0.
If this 𝐵 is still not greater than 𝑁, then we can conclude that 𝐶 is a solution.
With the above steps in mind, it makes sense to first try with the greatest 𝐶 possible, i.e. 𝐶=𝐾−1, and then reduce 𝐶 until the above routine finds a 𝐵 that is not greater than 𝑁.
Here is the code for that:
def bitwiseAnd(N, K):
for A in range(K - 1, 0, -1):
# Find the least bit that has a zero in this number A:
# Use some "magic" to get the number with just one 1-bit in that position
bit = (A + 1) & -(A + 1)
B = A + bit
if B <= N:
# We know that A & B == B here, so just return A
return A
return 0

Number of multiplications done in iterated squaring

The following code computes a**b using iterated squaring:
def power(a,b):
result=1
while b>0:
if b % 2 == 1:
result = result*a
a = a*a
b = b//2
return result
Suppose the decimal numbers a and b have n and m bits in their binary representation.
I'm trying to understand how many multiplications the code does for the smallest and biggest numbers a and b could be depending on n and m.
I know that in lines 5 and 6 of the code, a multiplication is done, but I'm struggling expressing the number of multiplications with the number of bits of a and b in their binary representation.
Any help appreciated.

Well, the number of multiplications depend only on one factor for this algorithm - which is b (while b > 0).
We meet operations that changes b's value inside the loop once, where b = b//2.
While dealing with binary representation, dividing by two leads to the last bit being shifted right - and since we got m bits in b, that would mean the loop will be executed m times.
Since every time we have at least one multiplication and maximum two (depending on the number of 1s in m), and m is guaranteed to be larger than 0 for the loop to occur, we get a total of minimum m+1 and maximum m*2 multiplications.

My answer is changing with the same code [duplicate]

This question already has answers here:
Why does integer division yield a float instead of another integer?
(4 answers)
Closed 5 months ago.
I am a complete python beginner and I am trying to solve this problem :
A number is called triangular if it is the sum of the first n positive
integers for some n For example, 10 is triangular because 10 = 1+2+3+4
and 21 is triangular because 21 = 1+2+3+4+5+6. Write a Python program
to find the smallest 6-digit triangular number. Enter it as your
answer below.
I have written this program:
n = 0
trinum = 0
while len(str(trinum)) < 6:
trinum = n*(n+1)/2
n += 1
print(trinum)
And it only works in the python I have installed on my computer if I say while len(str(trinum)) < 8: but it is supposed to be while len(str(trinum)) < 6:. So I went to http://www.skulpt.org/ and ran my code there and it gave me the right answer with while len(str(trinum)) < 6: like it's supposed to. But it doesn't work with 6 with the python i have installed on my computer. Does anyone have any idea what's going on?

Short Answer
In Python 3, division is always floating point division. So on the first pass you get something like str(trinum) == '0.5'. Which isn't what you want.
You're looking for integer division. The operator for that is //.
Long Answer
The division operator changed in Python 2.x to 3.x. Previously, the type of the result was dependent on the arguments. So 1/2 does integer division, but 1./2 does floating point division.
To clean this up, a new operator was introduced: //. This operator will always do integer division.
So in Python 3.x, this expression (4 * 5)/2 is equal to 10.0. Note that this number is less than 100, but it has 4 characters in it.
If instead, we did (4*5)//2, we would get the integer 10 back. Which would allow your condition to hold true.

In Python 2, the / operator performs integer division when possible: "x divided by y is a remainder b," throwing away the "b" (use the % operator to find "b"). In Python 3, the / operator always performs float division: "x divided by y is a.fgh." Get integer division in Python 3 with the // operator.

You have two problems here, that combine to give you the wrong answer.
The first problem is that you're using /, which means integer division in Python 2 (and the almost-Python language that Skulpt implements), but float division in Python 3. So, when you run it on your local machine with Python 3, you're going to get floating point numbers.
The second problem is that you're not checking for "under 6 digits" you're checking for "under 6 characters long". For positive integers, those are the same thing, but for floats, say, 1035.5 is only 4 digits, but it's 6 characters. So you exit early.
If you solve either problem, it will work, at least most of the time. But you really should solve both.
So:
n = 0
trinum = 0
while trinum < 10**6: # note comparing numbers, not string length
trinum = n*(n+1)//2 # note // instead of /
n += 1
print(trinum)
The first problem is fixed by using //, which always means integer division, instead of /, which means different things in different Python versions.
The second problem is fixed by comparing the number as a number to 10**6 (that is, 10 to the 6th power, which means 1 with 6 zeros, or 1000000) instead of comparing its length as a string to 6.

Taking Malik Brahimi's answer further:
from itertools import *
print(next(dropwhile(lambda n: n <= 99999, accumulate(count(1))))
count(1) is all the numbers from 1 to infinity.
accumulate(count(1)) is all the running totals of those numbers.
dropwhile(…) is skipping the initial running totals until we reach 100000, then all the rest of them.
next(…) is the next one after the ones we skipped.
Of course you could argue that a 1-liner that takes 4 lines to describe to a novice isn't as good as a 4-liner that doesn't need any explanation. :)
(Also, the dropwhile is a bit ugly. Most uses of it in Python are. In a language like Haskell, where you can write that predicate with operator sectioning instead of a lambda, like (<= 99999), it's a different story.)

The division method in Py2.x and 3.x is different - so that is probably why you had issues.
Just another suggestion - which doesn't deal with divisions and lengths - so less buggy in general. Plus addition is addition anywhere.
trinum = 0
idx =0
while trinum < 99999: #largest 5 digit number
idx += 1
trinum += idx
print trinum

import itertools # to get the count function
n, c = 0, itertools.count(1) # start at zero
while n <= 99999:
n = n + next(c)

sum of even fibonacci numbers up to 4million

The method I've used to try and solve this works but I don't think it's very efficient because as soon as I enter a number that is too large it doesn't work.
def fib_even(n):
fib_even = []
a, b = 0, 1
for i in range(0,n):
c = a+b
if c%2 == 0:
fib_even.append(c)
a, b = b, a+b
return fib_even
def sum_fib_even(n):
fib_evens = fib_even(n)
s = 0
for i in fib_evens:
s = s+i
return s
n = 4000000
answer = sum_fib_even(n)
print answer
This for example doesn't work for 4000000 but will work for 400. Is there a more efficient way of doing this?

It is not necessary to compute all the Fibonacci numbers.
Note: I use in what follows the more standard initial values F[0]=0, F[1]=1 for the Fibonacci sequence. Project Euler #2 starts its sequence with F[2]=1,F[3]=2,F[4]=3,.... For this problem the result is the same for either choice.
Summation of all Fibonacci numbers (as a warm-up)
The recursion equation
F[n+1] = F[n] + F[n-1]
can also be read as
F[n-1] = F[n+1] - F[n]
or
F[n] = F[n+2] - F[n+1]
Summing this up for n from 1 to N (remember F[0]=0, F[1]=1) gives on the left the sum of Fibonacci numbers, and on the right a telescoping sum where all of the inner terms cancel
sum(n=1 to N) F[n] = (F[3]-F[2]) + (F[4]-F[3]) + (F[5]-F[4])
+ ... + (F[N+2]-F[N+1])
= F[N+2] - F[2]
So for the sum using the number N=4,000,000 of the question one would have just to compute
F[4,000,002] - 1
with one of the superfast methods for the computation of single Fibonacci numbers. Either halving-and-squaring, equivalent to exponentiation of the iteration matrix, or the exponential formula based on the golden ratio (computed in the necessary precision).
Since about every 20 Fibonacci numbers you gain 4 additional digits, the final result will consist of about 800000 digits. Better use a data type that can contain all of them.
Summation of the even Fibonacci numbers
Just inspecting the first 10 or 20 Fibonacci numbers reveals that all even members have an index of 3*k. Check by subtracting two successive recursions to get
F[n+3]=2*F[n+2]-F[n]
so F[n+3] always has the same parity as F[n]. Investing more computation one finds a recursion for members three indices apart as
F[n+3] = 4*F[n] + F[n-3]
Setting
S = sum(k=1 to K) F[3*k]
and summing the recursion over n=3*k gives
F[3*K+3]+S-F[3] = 4*S + (-F[3*K]+S+F[0])
or
4*S = (F[3*K]+F[3*K]) - (F[3]+F[0]) = 2*F[3*K+2]-2*F[2]
So the desired sum has the formula
S = (F[3*K+2]-1)/2
A quick calculation with the golden ration formula reveals what N should be so that F[N] is just below the boundary, and thus what K=N div 3 should be,
N = Floor( log( sqrt(5)*Max )/log( 0.5*(1+sqrt(5)) ) )
Reduction of the Euler problem to a simple formula
In the original problem, one finds that N=33 and thus the sum is
S = (F[35]-1)/2;
Reduction of the problem in the question and consequences
Taken the mis-represented problem in the question, N=4,000,000, so K=1,333,333 and the sum is
(F[1,333,335]-1)/2
which still has about 533,400 digits. And yes, biginteger types can handle such numbers, it just takes time to compute with them.
If printed in the format of 60 lines a 80 digits, this number fills 112 sheets of paper, just to get the idea what the output would look like.

It should not be necessary to store all intermediate Fibonacci numbers, perhaps the storage causes a performance problem.

Speeding up my Fermat Factorization function (Python)

My task is to factor very large composite numbers using Fermat's factorization method. The numbers are 1024 bits large, which is around 309 decimal digits.
I have come up with the Python code below, which uses the gmpy2 module for accuracy. It is simply a Python implementation of the pseudo-code shown on the Wikipedia page. I read the "Sieve Improvement" section on that page, but wasn't sure how to implement it.
def fermat_factor(n):
assert n % 2 != 0 # Odd integers only
a = gmpy2.ceil(gmpy2.sqrt(n))
b2 = gmpy2.square(a) - n
while not is_square(b2):
a += 1
b2 = gmpy2.square(a) - n
factor1 = a + gmpy2.sqrt(b2)
factor2 = a - gmpy2.sqrt(b2)
return int(factor1), int(factor2)
def is_square(n):
root = gmpy2.sqrt(n)
return root % 1 == 0 # '4.0' will pass, '4.1212' won't
This code runs fairly fast for small numbers, but takes much too long for numbers as large as those given in the problem. How can I improve the speed of this code? I'm not looking for people to write my code for me, but would appreciate some suggestions. Thank you for any responses.

You need to avoid doing so many square and sqrt operations, especially on large numbers.
The easy way to avoid them is to note that a^2 - N = b^2 must be true for all moduli to be a solution. For example,
a^2 mod 9 - N mod 9 = b^2 mod 9
Let's say your N is 55, so N mod 9 = 1.
Now consider the set of (a mod 9), and square it, modulo 9.
The resulting a^2 mod 9 is the set: {0, 1, 4, 7}. The same must be true for the b^2 mod 9.
If a^2 mod 9 = 0, then 0 - 1 = 8 (all mod 9) is not a solution, since 8 is not a square of a number modulo 9. This eliminates (a mod 9) = {0, 3 and 6}.
If a^2 mod 9 = 1, the 1 - 1 = 0 (all mod 9), so (a mod 9) = {1, 8} are possible solutions.
If a^2 mod 9 = 4, then 4 - 1 = 3 (all mod 9) is not a possible solution.
Ditto for a^2 mod 9 = 7.
So, that one modulus eliminated 7 out of 9 possible values of 'a mod 9'.
And you can have many moduli, each one eliminating at least half of the possibilities.
With a set of, say, 10 moduli, you only have to check about 1 in 1,000 a's for being perfect squares, or having integer square roots. (I use about 10,000 moduli for my work).
Note: Moduli which are powers of a prime are often more useful than a prime.
Also, a modulus of 16 is a useful special case, because 'a' must be odd when N mod 4 is 1,
and 'a' must be even when N mod 4 is 3. "Proof is left as an exercise for the student."

Consider rewriting this script to use only integers instead of arbitrary precision floats.
gmpy has support for integer square root (returns the floor of the square root, calculated efficiently). This can be used for the is_square() function by testing if the square of the square root equals the original.
I'm not sure about gmpy2, but in gmpy.sqrt() requires an integer argument, and returns an integer output. If you are using floats, then that is probably your problem (since floats are very slow as compared to integers, especially when using extended precision). If you are in fact using integers, then is_square() must be doing a tedious conversion from integer to float every time it is called (and gmpy2.sqrt() != gmpy.sqrt()).
For those of you who keep saying that this is a difficult problem, keep in mind that using this method was a hint: The fermat factorization algorithm is based on a weakness present when the composite number to be factored has two prime factors which are close to each other. If this was given as a hint, it is likely that the entity posing the problem knows this to be the case.
Edit: Apparently, gmpy2.isqrt() is the same as gmpy.sqrt() (the integer version of sqrt), and gmpy2.sqrt() is the floating-point version.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

question on karatsuba multiplication - python

Related

HackerRank bitwiseAnd challenge - algorithm is too slow?

Number of multiplications done in iterated squaring

My answer is changing with the same code [duplicate]

sum of even fibonacci numbers up to 4million

Speeding up my Fermat Factorization function (Python)

Categories

Resources