I am trying to create a generator that returns numbers in a given range that pass a particular test given by a function foo. However I would like the numbers to be tested in a random order. The following code will achieve this:
from random import shuffle

def MyGenerator(foo, num):
    order = list(range(num))
    shuffle(order)
    for i in order:
        if foo(i):
            yield i
The Problem
The problem with this solution is that sometimes the range will be quite large (num might be of the order 10**8 and upwards). This function can become slow, having such a large list in memory. I have tried to avoid this problem, with the following code:
from random import randint

def MyGenerator(foo, num):
    tried = set()
    while len(tried) <= num - 1:
        i = randint(0, num-1)
        if i in tried:
            continue
        tried.add(i)
        if foo(i):
            yield i
This works well most of the time, since in most cases num will be quite large, foo will pass a reasonable number of numbers and the total number of times the __next__ method will be called will be relatively small (say, a maximum of 200, often much smaller). Therefore it's reasonably likely we stumble upon a value that passes the foo test and the size of tried never gets large. (Even if it only passes 10% of the time, we wouldn't expect tried to get larger than about 2000.)
However, when num is small (close to the number of times that the __next__ method is called), or foo fails most of the time, the above solution becomes very inefficient: it keeps randomly guessing numbers until it guesses one that isn't in tried.
My attempted solution...
I was hoping to use some kind of function that maps the numbers 0,1,2,..., n onto themselves in a roughly random way. (This isn't being used for any security purposes and so doesn't matter if it isn't the most 'random' function in the world). The function here (Create a random bijective function which has same domain and range) maps signed 32-bit integers onto themselves, but I am not sure how to adapt the mapping to a smaller range. Given num I don't even need a bijection on 0,1,..num just a value of n larger than and 'close' to num (using whatever definition of close you see fit). Then I can do the following:
def mix_function_factory(num):
    # something here???
    def foo(index):
        # something else here??
    return foo

def MyGenerator(foo, num):
    mix_function = mix_function_factory(num)
    for i in range(num):
        index = mix_function(i)
        if index <= num:
            if foo(index):
                yield index
(so long as the bijection isn't on a set of numbers massively larger than num the number of times index <= num isn't True will be small).
My Question
Can you think of one of the following:
A potential solution for mix_function_factory or even a few other potential functions for mix_function that I could attempt to generalise for different values of num?
A better way of solving the original problem?
Many thanks in advance....
The problem is basically generating a random permutation of the integers in the range 0..n-1.
Luckily for us, these numbers have a very useful property: they all have a distinct value modulo n. If we can apply some mathematical operations to these numbers while taking care to keep each number distinct modulo n, it's easy to generate a permutation that appears random. And the best part is that we don't need any memory to keep track of numbers we've already generated, because each number is calculated with a simple formula.
Examples of operations we can perform on every number x in the range include:
Addition: We can add any integer c to x.
Multiplication: We can multiply x with any number m that shares no prime factors with n.
Applying just these two operations on the range 0..n-1 already gives quite satisfactory results:
>>> n = 7
>>> c = 1
>>> m = 3
>>> [((x+c) * m) % n for x in range(n)]
[3, 6, 2, 5, 1, 4, 0]
Looks random, doesn't it?
If we generate c and m from a random number, it'll actually be random, too. But keep in mind that there is no guarantee that this algorithm will generate all possible permutations, or that each permutation has the same probability of being generated.
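The condition on m can be checked by brute force. The following is only an illustrative sketch: is_permutation is a hypothetical helper name, and math.gcd stands in for the "shares no prime factors" test (two numbers share no prime factor exactly when their gcd is 1).

```python
from math import gcd

def is_permutation(n, c, m):
    """Return True if x -> ((x + c) * m) % n hits every value in 0..n-1 once."""
    return sorted(((x + c) * m) % n for x in range(n)) == list(range(n))

n = 12
for m in range(1, n):
    # The map is a permutation exactly when m and n are coprime;
    # e.g. m = 4 shares the factor 2 with 12 and produces repeats.
    assert is_permutation(n, c=5, m=m) == (gcd(m, n) == 1)
```

Adding c only rotates the sequence, so it never affects whether the result is a permutation.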
Implementation
The difficult part about the implementation is really just generating a suitable random m. I used the prime factorization code from this answer to do so.
import random

# credit for prime factorization code goes
# to https://stackoverflow.com/a/17000452/1222951
def prime_factors(n):
    gaps = [1,2,2,4,2,4,2,4,6,2,6]
    length, cycle = 11, 3
    f, fs, next_ = 2, [], 0
    while f * f <= n:
        while n % f == 0:
            fs.append(f)
            n //= f  # integer division keeps n an int in Python 3
        f += gaps[next_]
        next_ += 1
        if next_ == length:
            next_ = cycle
    if n > 1: fs.append(n)
    return fs

def generate_c_and_m(n, seed=None):
    # we need to know n's prime factors to find a suitable multiplier m
    p_factors = set(prime_factors(n))
    def is_valid_multiplier(m):
        # m must not share any prime factors with n
        factors = prime_factors(m)
        return not p_factors.intersection(factors)
    # if no seed was given, generate random values for c and m
    if seed is None:
        c = random.randint(0, n)  # randint needs both bounds
        m = random.randint(1, 2*n)
    else:
        c = seed
        m = seed
    # make sure m is valid
    while not is_valid_multiplier(m):
        m += 1
    return c, m
Now that we can generate suitable values for c and m, creating the permutation is trivial:
def random_range(n, seed=None):
    c, m = generate_c_and_m(n, seed)
    for x in range(n):
        yield ((x + c) * m) % n
And your generator function can be implemented as
def MyGenerator(foo, num):
    for x in random_range(num):
        if foo(x):
            yield x
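For what it's worth, the factorization step can probably be avoided. The sketch below is a variant, not the code above: it assumes math.gcd is an acceptable replacement for comparing prime factors, and random_range_gcd and its rng parameter are names made up for this example.

```python
import random
from math import gcd

def random_range_gcd(n, rng=random):
    """Yield 0..n-1 in a pseudo-random order using an affine map modulo n."""
    if n <= 0:
        return
    c = rng.randrange(n)             # any offset works
    m = rng.randrange(1, 2 * n + 1)  # candidate multiplier in 1..2n
    while gcd(m, n) != 1:            # step forward until m is coprime to n
        m += 1
    for x in range(n):
        yield ((x + c) * m) % n

# Usage: every value appears exactly once, in a scrambled order.
print(list(random_range_gcd(10)))
```

The same caveat applies as before: the order only looks random, and multipliers congruent to 1 or -1 modulo n give a plain ascending or descending sweep.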
That may be a case where the best algorithm depends on the value of num, so why not use 2 selectable algorithms wrapped in one generator?
You could mix your shuffle and set solutions with a threshold on the value of num. That's basically assembling your first 2 solutions in one generator:
from random import shuffle, randint

def MyGenerator(foo, num):
    if num < 100000:  # has to be adjusted by experiments
        order = list(range(num))
        shuffle(order)
        for i in order:
            if foo(i):
                yield i
    else:  # big values, few collisions with random generator
        tried = set()
        while len(tried) < num:
            i = randint(0, num-1)
            if i in tried:
                continue
            tried.add(i)
            if foo(i):
                yield i
The randint solution (for big values of num) works well because there aren't so many repeats in the random generator.
Getting the best performance in Python is much trickier than in lower-level languages. For example, in C, you can often save a little bit in hot inner loops by replacing a multiplication by a shift. The overhead of Python's bytecode interpretation erases this. Of course, this changes again when you consider which variant of "python" you're targeting (pypy? numpy? cython?). You really have to write your code based on which one you're using.
But even more important is arranging operations to avoid serialized dependencies, since all CPUs are superscalar these days. Of course, real compilers know about this, but it still matters when choosing an algorithm.
One of the easiest ways to gain a little bit over existing answers would be by generating numbers in chunks using numpy.arange() and applying the ((x + c) * m) % n to the numpy ndarray directly. Every python-level loop that can be avoided helps.
If the function can be applied directly to numpy ndarrays, that might be even better. Of course, a sufficiently-small function in python will be dominated by function-call overhead anyway.
The best fast random-number-generator today is PCG. I wrote a pure-python port here but concentrated on flexibility and ease-of-understanding rather than speed.
Xoroshiro128+ is second-best-quality and faster, but less informative to study.
Python's (and many others') default choice of Mersenne Twister is among the worst.
(there's also something called splitmix64 which I don't know enough about to place - some people say it's better than xoroshiro128+, but it has a period problem - of course, you might want that here)
Both default-PCG and xoroshiro128+ use a 2N-bit state to generate N-bit numbers. This is generally desirable, but means numbers will be repeated. PCG has alternate modes that avoid this, however.
Of course, much of this depends on whether num is (close to) a power of 2. In theory, PCG variants can be created for any bit width, but currently only various word sizes are implemented since you'd need explicit masking. I'm not sure exactly how to generate the parameters for new bit sizes (perhaps it's in the paper?), but they can be tested simply by doing a period/2 jump and verifying that the value is different.
Of course, if you're only making 200 calls to the RNG, you probably don't actually need to avoid duplicates on the math side.
Alternatively, you could use an LFSR, which does exist for every bit size (though note that it never generates the all-zeros value (or equivalently, the all-ones value)). LFSRs are serial and (AFAIK) not jumpable, and thus can't be easily split across multiple tasks. Edit: I figured out that this is untrue, simply represent the advance step as a matrix, and exponentiate it to jump.
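To make the LFSR idea concrete, here is a minimal sketch under stated assumptions: it is fixed at 16 bits and uses the widely published Galois toggle mask 0xB400 (taps 16, 14, 13, 11), which gives a maximal period of 65535. The function name lfsr16_order and the default seed are made up for this example. Because the all-zeros state never appears, each state is shifted down by one and out-of-range values are skipped.

```python
def lfsr16_order(num, seed=0xACE1):
    """Yield each integer in range(num) exactly once, for 1 <= num <= 65535."""
    if not 1 <= num <= 0xFFFF:
        raise ValueError("num must be between 1 and 65535")
    state = seed & 0xFFFF or 1   # all-zeros is a fixed point, so avoid it
    for _ in range(0xFFFF):      # one full period visits every non-zero state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400      # Galois feedback: toggle the tap bits
        value = state - 1        # map states 1..65535 onto 0..65534
        if value < num:
            yield value
```

For a num far below 65535 most steps are skipped, so in practice you would pick the smallest register width that covers num; other widths need their own maximal-length tap masks.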
Note that LFSRs do have the same obvious biases as simply generating numbers in sequential order based on a random start point - for example, if rng_outputs[a:b] all fail your foo function, then rng_outputs[b] will be much more likely as a first output regardless of starting point. PCG's "stream" parameter avoids this by not generating numbers in the same order.
Edit2: I have completed what I thought was a "brief project" implementing LFSRs in python, including jumping, fully tested.
I need help with evaluating the expression. I have just started, but I am at a loss for what to do next, and all the for loops I am using seem unnecessary. The expression has sums, products and combinations:
What I tried is incomplete and, in my opinion, not accurate. I tried several approaches, but this is all I can come up with for now. I don't have the denominator yet.
i = 10
N = 3.1
j = []
for x in range(1, i + 1):
    for y in range(1, i):
        for z in range(1, n - i):
            l = N * y * z
            j.append(l)
ll = sum(j)
Any help is appreciated. I want to be able to understand it so I can do more complex examples.
Here are some hints. If you try them and are still stuck, ask for more help.
First, you know that the expression involves "combinations," also called "binomial coefficients." So you will need a routine that calculates those. Here is a question with multiple answers on how to calculate these numbers. Briefly, you can use the scipy package or make your own routine that uses Python's factorial function or that uses iteration.
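As a rough sketch of the "make your own routine" option: the iterative function below is one common way to do it, and binomial is just a name chosen for this example. On Python 3.8 or later, math.comb gives the same values and is used here only as a cross-check.

```python
from math import comb, factorial

def binomial(n, k):
    """Binomial coefficient "n choose k" computed by iteration."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)   # use symmetry to shorten the loop
    result = 1
    for i in range(1, k + 1):
        # After this step, result equals C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) // i
    return result

# Cross-check against the factorial formula and math.comb (Python 3.8+).
assert binomial(10, 3) == factorial(10) // (factorial(3) * factorial(7)) == comb(10, 3)
```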
Next, you see that the expression involves sums and products and is written as a single expression. Python has a sum function which works on generator expressions (as well as list and set comprehensions and other iterables). Your conversion from math to Python will be easier if you know how to set up such expressions. If you do not understand these generators/iterables and how to sum them, do research on this topic. This approach is not necessary, since you could use loops rather than the generators, but this approach will be easier. Study until you can understand an expression (including why the final number in the range has 1 added to it) such as
sum(N * f(x) for x in range(1, 5+1))
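To see what that expression does, here it is next to the equivalent loop. The f and N below are hypothetical stand-ins chosen only so the example runs; they are not taken from your expression.

```python
def f(x):
    return x * x   # hypothetical stand-in for whatever f is in your formula

N = 2              # hypothetical value

# range(1, 5 + 1) yields 1, 2, 3, 4, 5: the stop value is excluded,
# which is why 1 is added to the upper limit of the math sum.
total = sum(N * f(x) for x in range(1, 5 + 1))

# The same computation written as an explicit loop.
explicit = 0
for x in range(1, 5 + 1):
    explicit += N * f(x)

assert total == explicit
```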
Last, your expression has products. Python 3.8 and later provide math.prod for this; earlier versions have no built-in way to take the product of an iterable. Here is such a function in Python 3.
from operator import mul
from functools import reduce

def prod(iterable):
    """Return the product of the numbers in an iterable."""
    return reduce(mul, iterable, 1)
With all of that, your desired expression will look like this (you will need to finish the job by replacing the ... with something more useful):
numerator = sum(N * prod(... for y in range(1, 1+1)) for x in range(1, 5+1))
denominator = prod(y + N for y in range(1, 5+1))
result = numerator / denominator
Note that your final result is a function of N.
The question is available here. My Python code is
def solution(A, B):
    if len(A) == 1:
        return [1]
    ways = [0] * (len(A) + 1)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = ways[i-1] + ways[i-2]
    result = [1] * len(A)
    for i in xrange(len(A)):
        result[i] = ways[A[i]] & ((1<<B[i]) - 1)
    return result
The detected time complexity by the system is O(L^2) and I can't see why. Thank you in advance.
First, let's show that the runtime genuinely is O(L^2). I copied a section of your code, and ran it with increasing values of L:
import time
import matplotlib.pyplot as plt

def solution(L):
    if L == 0:
        return
    ways = [0] * (L+5)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = ways[i-1] + ways[i-2]

points = []
for L in xrange(0, 100001, 10000):
    start = time.time()
    solution(L)
    points.append(time.time() - start)
plt.plot(points)
plt.show()
The result graph is this:
To understand why this is O(L^2) when the obvious "time complexity" calculation suggests O(L), note that "time complexity" is not a well-defined concept on its own since it depends on which basic operations you're counting. Normally the basic operations are taken for granted, but in some cases you need to be more careful. Here, if you count additions as a basic operation, then the code is O(L). However, if you count bit (or byte) operations then the code is O(L^2). Here's the reason:
You're building an array of the first L Fibonacci numbers. The length (in digits) of the i'th Fibonacci number is Theta(i). So ways[i] = ways[i-1] + ways[i-2] adds two numbers with approximately i digits, which takes O(i) time if you count bit or byte operations.
This observation gives you an O(L^2) bit operation count for this loop:
for i in xrange(3, len(ways)):
    ways[i] = ways[i-1] + ways[i-2]
In the case of this program, it's quite reasonable to count bit operations: your numbers are unboundedly huge as L increases and addition of huge numbers is linear in clock time rather than O(1).
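The linear growth in size is easy to observe directly. The following Python 3 sketch (fib_bit_lengths is a name invented for this illustration) records the bit length of each Fibonacci number; the sum of those lengths approximates the bit-level work of the loop.

```python
def fib_bit_lengths(count):
    """Return the bit lengths of the first `count` Fibonacci numbers 1, 2, 3, 5, ..."""
    a, b = 1, 2
    lengths = []
    for _ in range(count):
        lengths.append(a.bit_length())
        a, b = b, a + b
    return lengths

lengths = fib_bit_lengths(2000)
# Doubling the index roughly doubles the number of bits (linear growth) ...
ratio = lengths[1999] / lengths[999]
# ... so the total number of bits touched by all additions grows quadratically.
total_bits = sum(lengths)
print(ratio, total_bits)
```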
You can fix the complexity of your code by computing the Fibonacci numbers mod 2^32 -- since 2^32 is a multiple of 2^B[i]. That will keep a finite bound on the numbers you're dealing with:
for i in xrange(3, len(ways)):
    ways[i] = (ways[i-1] + ways[i-2]) & ((1<<32) - 1)
There are some other issues with the code, but this will fix the slowness.
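Putting the masking idea into a complete function might look like the Python 3 sketch below. It rests on assumptions not stated above: A is non-empty, every B[i] is at most word_bits, and the table is sized by max(A) rather than len(A). The name ladder_counts and the word_bits parameter are made up for this example.

```python
def ladder_counts(A, B, word_bits=32):
    """For each i, return (number of ways to climb A[i] rungs) mod 2**B[i]."""
    mask = (1 << word_bits) - 1
    top = max(A)
    ways = [1] * (top + 2)       # ways[0] = ways[1] = 1
    for i in range(2, top + 1):
        # Keep only the low word_bits bits so every addition stays O(1).
        ways[i] = (ways[i - 1] + ways[i - 2]) & mask
    return [ways[a] & ((1 << b) - 1) for a, b in zip(A, B)]

print(ladder_counts([4, 4, 5, 5, 1], [3, 2, 4, 3, 1]))  # [5, 1, 8, 0, 1]
```

Sizing the table by max(A) also removes the need for the len(A) == 1 special case.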
I've taken the relevant parts of the function:
def solution(A, B):
    for i in xrange(3, len(A) + 1):  # replaced ways for clarity
        # ...
    for i in xrange(len(A)):
        # ...
    return result
Observations:
1. A is an iterable object (e.g. a list).
2. You're iterating over the elements of A in sequence.
3. The behavior of your function depends on the number of elements in A, making it O(A).
4. You're iterating over A twice, meaning 2 O(A) -> O(A).
On point 4, since 2 is a constant factor, 2 O(A) is still in O(A).
I think the page is not correct in its measurement. Had the loops been nested, then it would've been O(A²), but the loops are not nested.
This short sample is O(N²):
def process_list(my_list):
    for i in range(0, len(my_list)):
        for j in range(0, len(my_list)):
            # do something with my_list[i] and my_list[j]
            pass
I've not seen the code the page is using to 'detect' the time complexity of the code, but my guess is that the page is counting the number of loops you're using without understanding much of the actual structure of the code.
EDIT1:
Note that, based on this answer, the time complexity of the len function is actually O(1), not O(N), so the page is not incorrectly trying to count its use for the time-complexity. If it were doing that, it would've incorrectly claimed a larger order of growth because it's used 4 separate times.
EDIT2:
As @PaulHankin notes, asymptotic analysis also depends on what's considered a "basic operation". In my analysis, I've counted additions and assignments as "basic operations" by using the uniform cost method, not the logarithmic cost method, which I did not mention at first.
Most of the time, simple arithmetic operations are treated as basic operations. This is what I see most commonly being done, unless the algorithm being analysed is for a basic operation itself (e.g. time complexity of a multiplication function), which is not the case here.
The only reason why we have different results appears to be this distinction. I think we're both correct.
EDIT3:
While an algorithm in O(N) is also in O(N²), I think it's reasonable to state that the code is still in O(N) because, at the level of abstraction we're using, the computational steps that seem more relevant (i.e. are more influential) are in the loop as a function of the size of the input iterable A, not the number of bits being used to represent each value.
Consider the following algorithm to compute a^n:
def function(a, n):
    r = 1
    for i in range(0, n):
        r *= a
    return r
Under the uniform cost method, this is in O(N), because the loop is executed n times, but under logarithmic cost method, the algorithm above turns out to be in O(N²) instead due to the time complexity of the multiplication at line r *= a being in O(N), since the number of bits to represent each number is dependent on the size of the number itself.
Codility Ladder competition is best solved in here:
It is super tricky.
We first compute the Fibonacci sequence for the first L+2 numbers. The first two numbers are used only as fillers, so we have to index the sequence as A[idx]+1 instead of A[idx]-1. The second step is to replace the modulo operation by removing all but the n lowest bits.
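The second step relies on a standard identity: for a non-negative integer, taking it modulo 2**n is the same as keeping its n lowest bits. A tiny sketch, with low_bits as a made-up helper name:

```python
def low_bits(value, n):
    """Keep only the n lowest bits of value; equals value % 2**n for value >= 0."""
    return value & ((1 << n) - 1)

# Spot-check the identity on small inputs.
assert all(low_bits(v, n) == v % 2**n for v in range(200) for n in range(1, 8))
```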
[This is related to Minimum set cover ]
I would like to solve the following puzzle by computer for small size of n. Consider all 2^n binary vectors of length n. For each one you delete exactly n/3 of the bits, leaving a binary vector length 2n/3 (assume n is an integer multiple of 3). The goal is to choose the bits you delete so as to minimize the number of different binary vectors of length 2n/3 that remain at the end.
For example, for n = 3 the optimal answer is 2 different vectors 11 and 00. For n = 6 it is 4, for n = 9 it is 6 and for n = 12 it is 10.
I had previously attempted to solve this problem as a minimum set cover problem of the following sort. All the lists contain only 1s and 0s.
I say that a list A covers a list B if you can make B from A by inserting exactly x symbols.
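In code, this definition amounts to a length check plus a subsequence test. The sketch below is only meant to pin down the definition; covers is a name chosen here, not something from the linked question.

```python
def covers(a, b, x):
    """Return True if b can be made from a by inserting exactly x symbols."""
    if len(b) != len(a) + x:
        return False
    remaining = iter(b)
    # Each symbol of a must be found, in order, in what is left of b.
    # `symbol in remaining` consumes the iterator up to the first match.
    return all(symbol in remaining for symbol in a)

print(covers([1, 1], [1, 0, 1], 1))  # True: insert a 0 between the two 1s
print(covers([0, 0], [1, 0, 1], 1))  # False: b contains only one 0
```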
Consider all 2^n lists of 1s and 0s of length n and set x = n/3. I would like to compute a minimal set of lists of length 2n/3 that covers them all. David Eisenstat provided code that converted this minimal set cover problem into a Mixed Integer Programming Problem that could be fed into CPLEX (or http://scip.zib.de/ which is open source).
from collections import defaultdict
from itertools import product, combinations

def all_fill(source, num):
    output_len = (len(source) + num)
    for where in combinations(range(output_len), len(source)):
        poss = ([[0, 1]] * output_len)
        for (w, s) in zip(where, source):
            poss[w] = [s]
        for tup in product(*poss):
            (yield tup)

def variable_name(seq):
    return ('x' + ''.join((str(s) for s in seq)))

n = 12
shortn = ((2 * n) // 3)
x = (n // 3)
all_seqs = list(product([0, 1], repeat=shortn))
hit_sets = defaultdict(set)
for seq in all_seqs:
    for fill in all_fill(seq, x):
        hit_sets[fill].add(seq)
print('Minimize')
print(' + '.join((variable_name(seq) for seq in all_seqs)))
print('Subject To')
for (fill, seqs) in hit_sets.items():
    print(' + '.join((variable_name(seq) for seq in seqs)), '>=', 1)
print('Binary')
for seq in all_seqs:
    print(variable_name(seq))
print('End')
The problem is that if you set n=15 then the instance it outputs is too large for any solver I can find. Is there a more efficient way of solving this problem so I can solve n=15 or even n = 18?
This doesn't solve your problem (well, not quickly enough), but you're not getting many ideas and someone else may find something useful to build on here.
It's a short pure Python 3 program, using backtracking search with some greedy ordering heuristics. It solves the N = 3, 6, and 9 instances very quickly. It finds a cover of size 10 for N=12 quickly too, but will apparently take a much longer time to exhaust the search space (I'm out of time for this, and it's still running). For N=15, the initialization time is already slow.
Bitstrings are represented by plain N-bit integers here, so consume little storage. That's to ease recoding in a faster language. It does make heavy use of sets of integers, but no other "advanced" data structures.
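As a small illustration of that representation, deleting bit i from an integer can be done with shifts and a mask. This is a stand-alone sketch with hypothetical helper names (delete_bit, delete_bits), separate from the program itself.

```python
def delete_bit(value, i):
    """Remove bit i (0 = least significant) and close the gap."""
    high = (value >> (i + 1)) << i   # bits above position i, moved down by one
    low = value & ((1 << i) - 1)     # bits below position i, unchanged
    return high | low

def delete_bits(value, positions):
    """Remove several bit positions, highest first so lower indices stay valid."""
    for i in sorted(positions, reverse=True):
        value = delete_bit(value, i)
    return value

print(bin(delete_bit(0b1011, 1)))         # 0b101
print(bin(delete_bits(0b110100, [0, 3]))) # 0b1110
```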
Hope this helps someone! But it's clear that the combinatorial explosion of possibilities as N increases ensures that nothing will be "fast enough" without digging deeper into the mathematics of the problem.
def dump(cover):
    for s in sorted(cover):
        print(" {:0{width}b}".format(s, width=I))

def new_best(cover):
    global best_cover, best_size
    assert len(cover) < best_size
    best_size = len(cover)
    best_cover = cover.copy()
    print("N =", N, "new best cover, size", best_size)
    dump(best_cover)

def initialize(N, X, I):
    from itertools import combinations
    # Map a "wide" (length N) bitstring to the set of all
    # "narrow" (length I) bitstrings that generate it.
    w2n = [set() for _ in range(2**N)]
    # Map a narrow bitstring to all the wide bitstrings
    # it generates.
    n2w = [set() for _ in range(2**I)]
    for wide, wset in enumerate(w2n):
        for t in combinations(range(N), X):
            narrow = wide
            for i in reversed(t):  # largest i to smallest
                hi, lo = divmod(narrow, 1 << i)
                narrow = ((hi >> 1) << i) | lo
            wset.add(narrow)
            n2w[narrow].add(wide)
    return w2n, n2w

def solve(needed, cover):
    if len(cover) >= best_size:
        return
    if not needed:
        new_best(cover)
        return
    # Find something needed with minimal generating set.
    _, winner = min((len(w2n[g]), g) for g in needed)
    # And order its generators by how much reduction they make
    # to `needed`.
    for g in sorted(w2n[winner],
                    key=lambda g: len(needed & n2w[g]),
                    reverse=True):
        cover.add(g)
        solve(needed - n2w[g], cover)
        cover.remove(g)

N = 9  # CHANGE THIS TO WHAT YOU WANT
assert N % 3 == 0
X = N // 3  # number of bits to exclude
I = N - X   # number of bits to include
print("initializing")
w2n, n2w = initialize(N, X, I)
best_cover = None
best_size = 2**I + 1  # "infinity"
print("solving")
solve(set(range(2**N)), set())
Example output for N=9:
initializing
solving
N = 9 new best cover, size 6
000000
000111
001100
110011
111000
111111
Followup
For N=12 this eventually finished, confirming that the minimal covering set contains 10 elements (which it found very soon at the start). I didn't time it, but it took at least 5 hours.
Why's that? Because it's close to brain-dead ;-) A completely naive search would try all subsets of the 256 8-bit short strings. There are 2**256 such subsets, about 1.2e77 - it wouldn't finish in the expected lifetime of the universe ;-)
The ordering gimmicks here first detect that the "all 0" and "all 1" short strings must be in any covering set, so pick them. That leaves us looking at "only" the 254 remaining short strings. Then the greedy "pick an element that covers the most" strategy very quickly finds a covering set with 11 total elements, and shortly thereafter a covering with 10 elements. That happens to be optimal, but it takes a long time to exhaust all other possibilities.
At this point, any attempt at a covering set that reaches 10 elements is aborted (it can't possibly be smaller than 10 elements then!). If that were done wholly naively too, it would need to try adding (to the "all 0" and "all 1" strings) all 8-element subsets of the 254 remaining, and 254-choose-8 is about 3.8e14. Very much smaller than 1.2e77 - but still way too large to be practical. It's an interesting exercise to understand how the code manages to do so much better than that. Hint: it has a lot to do with the data in this problem.
Industrial-strength solvers are incomparably more sophisticated and complex. I was pleasantly surprised at how well this simple little program did on the smaller problem instances! It got lucky.
But for N=15 this simple approach is hopeless. It quickly finds a cover with 18 elements, but makes no more visible progress for at least hours. Internally, it's still working with needed sets containing hundreds (even thousands) of elements, which makes the body of solve() quite expensive. It still has 2**10 - 2 = 1022 short strings to consider, and 1022-choose-16 is about 6e34. I don't expect it would visibly help even if this code were sped by a factor of a million.
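The search-space sizes quoted above can be reproduced with a few lines. This is just a sanity check, assuming Python 3.8+ for math.comb; the variable names are made up.

```python
from math import comb

naive_subsets = 2 ** 256          # all subsets of the 256 short strings (N=12)
n12_candidates = comb(254, 8)     # 8 more strings chosen from the remaining 254
n15_candidates = comb(1022, 16)   # 16 more strings chosen from 1022 (N=15)

# Print each count in scientific notation for comparison with the text.
print(f"{naive_subsets:.1e}  {n12_candidates:.1e}  {n15_candidates:.1e}")
```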
It was fun to try, though :-)
And a small rewrite
This version runs at least 6 times faster on a full N=12 run, simply by cutting off futile searches one level earlier. Also speeds initialization, and cuts memory use by changing the 2**N w2n sets into lists (no set operations are used on those). It's still hopeless for N=15, though :-(
def dump(cover):
    for s in sorted(cover):
        print(" {:0{width}b}".format(s, width=I))

def new_best(cover):
    global best_cover, best_size
    assert len(cover) < best_size
    best_size = len(cover)
    best_cover = cover.copy()
    print("N =", N, "new best cover, size", best_size)
    dump(best_cover)

def initialize(N, X, I):
    from itertools import combinations
    # Map a "wide" (length N) bitstring to the set of all
    # "narrow" (length I) bitstrings that generate it.
    w2n = [set() for _ in range(2**N)]
    # Map a narrow bitstring to all the wide bitstrings
    # it generates.
    n2w = [set() for _ in range(2**I)]
    # mask[i] is a string of i 1-bits
    mask = [2**i - 1 for i in range(N)]
    for t in combinations(range(N), X):
        t = t[::-1]  # largest i to smallest
        for wide, wset in enumerate(w2n):
            narrow = wide
            for i in t:  # delete bit 2**i
                narrow = ((narrow >> (i+1)) << i) | (narrow & mask[i])
            wset.add(narrow)
            n2w[narrow].add(wide)
    # release some space
    for i, s in enumerate(w2n):
        w2n[i] = list(s)
    return w2n, n2w

def solve(needed, cover):
    if not needed:
        if len(cover) < best_size:
            new_best(cover)
        return
    if len(cover) >= best_size - 1:
        # can't possibly be extended to a cover < best_size
        return
    # Find something needed with minimal generating set.
    _, winner = min((len(w2n[g]), g) for g in needed)
    # And order its generators by how much reduction they make
    # to `needed`.
    for g in sorted(w2n[winner],
                    key=lambda g: len(needed & n2w[g]),
                    reverse=True):
        cover.add(g)
        solve(needed - n2w[g], cover)
        cover.remove(g)

N = 9  # CHANGE THIS TO WHAT YOU WANT
assert N % 3 == 0
X = N // 3  # number of bits to exclude
I = N - X   # number of bits to include
print("initializing")
w2n, n2w = initialize(N, X, I)
best_cover = None
best_size = 2**I + 1  # "infinity"
print("solving")
solve(set(range(2**N)), set())
print("best for N =", N, "has size", best_size)
dump(best_cover)
First consider if you have 6 bits. You can throw away 2 bits. Therefore, any pattern balance of 6-0, 5-1 or 4-2 can be converted to 0000 or 1111. In the case of a 3-3 zero-one balance, any pattern can be converted to one of four cases: 1000, 0001, 0111, or 1110. Therefore, one possible minimum set for 6 bits is:
0000
0001
0111
1110
1000
1111
Now consider 9 bits with 3 thrown away. You have the following set of 14 master patterns:
000000
100000
000001
010000
000010
110000
000011
001111
111100
101111
111101
011111
111110
111111
In other words, each pattern set has ones/zeros in the center, with every permutation of n/3-1 bits on each end. For example, if you have 24 bits then you will have 17 bits in the center and 7 bits on the ends. Since 2^7 = 128 you will have 4 x 128 - 2 = 510 possible patterns.
To find correct deletions there are various algorithms. One method is to find the edit distance between the current bit set and each master pattern. The pattern with the minimum edit distance is the one to convert to. This method uses dynamic programming. Another method would be to do a tree search through the patterns using a set of rules to find the matching pattern.
I want to implement Karatsuba's 2-split multiplication in Python. However, this requires writing numbers in the form
A=c*x+d
where x is a power of the base (let x=b^m) close to sqrt(A).
How am I supposed to find x, if I can't even use division and multiplication? Should I count the number of digits and shift A to the left by half the number of digits?
Thanks.
Almost. You don't shift A by half the number of digits; you shift 1. Of course, this is only efficient if the base is a power of 2, since "shifting" in base 10 (for example) has to be done with multiplications. (Edit: well, ok, you can multiply with shifts and additions. But it's ever so much simpler with a power of 2.)
If you're using Python 3.1 or greater, counting the bits is easy, because 3.1 introduced the int.bit_length() method. For other versions of Python, you can count the bits by copying A and shifting it right until it's 0. This can be done in O(log N) time (N = # of digits) with a sort of binary search method - shift by many bits, if it's 0 then that was too many, etc.
You already accepted an answer since I started writing this, but:
What Tom said: in Python 3.x you can get n = A.bit_length() directly.
In Python 2.x you get n in O(log2(A)) time by binary-search, like below.
Here is (2.x) code that calculates both. Let the base-2 exponent of x be n, i.e. x = 2**n.
First we get n by binary-search by shifting. (Really we only needed n/2, so that's one unnecessary last iteration).
Then when we know n, getting x,c,d is easy (still no using division)
def karatsuba_form(A,n=32):
    """Binary-search for Karatsuba form using binary shifts"""
    # First search for n ~ log2(A)
    step = n >> 1
    while step>0:
        c = A >> n
        print 'n=%2d step=%2d -> c=%d' % (n,step,c)
        if c:
            n += step
        else:
            n -= step
        # More concisely, could say: n = (n+step) if c else (n-step)
        step >>= 1
    # Then take x = 2^(n/2) ~ sqrt(A)
    ndiv2 = n/2
    # Find Karatsuba form
    c = (A >> ndiv2)
    x = (1 << ndiv2)
    d = A - (c << ndiv2)
    return (x,c,d)
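On Python 3 the search loop is unnecessary, since bit_length gives n directly. A minimal sketch of the same decomposition, assuming A is a non-negative int (karatsuba_form_py3 is a name made up for this example):

```python
def karatsuba_form_py3(A):
    """Split A as c*x + d with x = 2**(bits // 2), using only shifts and masks."""
    half = A.bit_length() // 2   # half the number of bits of A
    x = 1 << half                # power of two close to sqrt(A)
    c = A >> half                # high part
    d = A & (x - 1)              # low part, always smaller than x
    return x, c, d

x, c, d = karatsuba_form_py3(123456789)
assert c * x + d == 123456789 and 0 <= d < x
```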
Your question is already answered in the article to which you referred: "Karatsuba's basic step works for any base B and any m, but the recursive algorithm is most efficient when m is equal to n/2, rounded up" ... n being the number of digits, and 0 <= value_of_digit < B.
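For reference, a compact 2-split in base 2 could look like the sketch below. It is an illustration only, assuming non-negative ints; the cutoff_bits parameter is a hypothetical tuning knob, and small operands fall back to the built-in multiplication.

```python
def karatsuba(a, b, cutoff_bits=64):
    """Multiply non-negative ints with Karatsuba's 2-split, splitting by bit shifts."""
    if a.bit_length() <= cutoff_bits or b.bit_length() <= cutoff_bits:
        return a * b                      # small operand: use the built-in multiply
    m = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << m) - 1
    a_hi, a_lo = a >> m, a & mask         # a = a_hi * 2**m + a_lo
    b_hi, b_lo = b >> m, b & mask         # b = b_hi * 2**m + b_lo
    z2 = karatsuba(a_hi, b_hi, cutoff_bits)
    z0 = karatsuba(a_lo, b_lo, cutoff_bits)
    # Three recursive products instead of four: the cross term comes from
    # (a_hi + a_lo) * (b_hi + b_lo) - z2 - z0.
    z1 = karatsuba(a_hi + a_lo, b_hi + b_lo, cutoff_bits) - z2 - z0
    return (z2 << (2 * m)) + (z1 << m) + z0

assert karatsuba(3**500, 7**400) == 3**500 * 7**400
```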
Some perspective that might help:
You are allowed (and required!) to use elementary operations like number_of_digits // 2 and divmod(digit_x * digit_x, B) ... in school arithmetic, where B is 10, you are required (for example) to know that divmod(9 * 8, 10) produces (7, 2).
When implementing large number arithmetic on a computer, it is usual to make B the largest power of 2 that will support the elementary multiplication operation conveniently. For example in the CPython implementation on a 32-bit machine, B is chosen to be 2 ** 15 (i.e. 32768), because then product = digit_x * digit_y; hi = product >> 15; lo = product & 0x7FFF; works without overflow and without concern about a sign bit.
I'm not sure what you are trying to achieve with an implementation in Python that uses B == 2, with numbers represented by Python ints, whose implementation in C already uses the Karatsuba algorithm for multiplying numbers that are large enough to make it worthwhile. It can't be speed.
As a learning exercise, you might like to try representing a number as a list of digits, with the base B being an input parameter.