I am going through a book called "Elements of Programming Interviews" and have gotten stuck at the following problem:
Implement an algorithm that takes as input an array of distinct
elements and a size, and returns a subset of the given size of the
array elements. All subsets should be equally likely. Return the
result in the input array itself.
The solution they provide below is:
import random

def random_sampling(k, A):
    for i in range(k):
        # Generate a random index in [i, len(A) - 1].
        r = random.randint(i, len(A) - 1)
        A[i], A[r] = A[r], A[i]

A = [3, 7, 5, 11]
k = 3
print(random_sampling(k, A))
I do not understand what the authors are trying to do intuitively. Their explanation is below:
Another approach is to enumerate all subsets of size k and then select
one at random from these. Since there are (n choose k) subsets of size k,
the time and space complexity are huge. The key to efficiently
building a random subset of size exactly k is to first build one of
size k - 1 and then add one more element, selected randomly from
the rest. The problem is trivial when k = 1. We make one call to the
random number generator, take the returned value mod n (call it r),
and swap A[0] with A[r]. The entry A[0] now holds the result.
For k > 1, we begin by choosing one element at random as above, and we
then repeat the same process with the (n - 1)-element subarray A[1, n - 1].
Eventually, the random subset occupies the slots A[0, k - 1] and the
remaining elements are in the last n - k slots.
Intuitively, if all subsets of size k are equally likely, then the
construction process ensures that the subsets of size k + 1 are also
equally likely. A formal proof for this uses mathematical induction -
the induction hypothesis is that every permutation of every size-k
subset of A is equally likely to be in A[0, k - 1].
As a concrete example, let the input be A = <3, 7, 5, 11> and the size
be 3. In the first iteration, we use the random number generator to
pick a random integer in the interval [0,3]. Let the returned random
number be 2. We swap A[0] with A[2] - now the array is <5, 7, 3, 11>.
Now we pick a random integer in the interval [1, 3]. Let the returned
random number be 3. We swap A[1] with A[3] - now the resulting array
is <5, 11, 3, 7>. Now we pick a random integer in the interval [2,3].
Let the returned random number be 2. When we swap A[2] with itself the
resulting array is unchanged. The random subset consists of the first
three entries, i.e., {5, 11, 3}.
Sorry for the long text; my questions are these:
What is the key to efficiency they are referring to? It's not clicking in my head.
What did they mean by "eventually, the random subset occupies the slots A[0, k - 1] and the remaining elements are in the last n - k slots"?
Is there a clear reason why "every permutation of every size k subset of A is equally likely to be in A[0, k - 1]"?
Can you explain the theory behind the algorithm in clearer terms?
What is the return of the algorithm supposed to be?
thanks
An intuitive solution might be:
import random

def random_sampling(k, A):
    subset = []
    selected = set()  # indices already chosen
    for i in range(k):
        index = random.randint(0, len(A) - 1)
        while index in selected:
            index = random.randint(0, len(A) - 1)
        selected.add(index)
        subset.append(A[index])
    return subset
but it's not clear that every size-k subset has equal probability (because for the same k you may use a different number of random draws over different ranges),
so a solution that fits the probability condition would be:
import itertools as it
import random

def random_sampling(k, A):
    # Very expensive: enumerate all (n choose k) index combinations.
    index_possibilities = list(it.combinations(range(len(A)), k))
    chosen = index_possibilities[random.randint(0, len(index_possibilities) - 1)]
    selected = []
    for i in chosen:
        selected.append(A[i])
    return selected
so the solution they gave makes sure you use the same procedure of random draws for every set of k elements, without the brute force above.
The order of the list is now: the first k elements are the ones we selected, and the rest of the list holds the remaining items.
This is the induction hypothesis: assume that every subset of size k - 1 is equally likely, and prove it for subsets of size k.
An efficient way to guarantee the same probability for every size-k subset is to produce each of them by exactly the same sequence of steps.
There is no return value, because the list changed inside the function is the same list the caller sees; the subset is the first k elements of the list after the function is called.
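For concreteness, here is the book's function again with a minimal usage sketch: it mutates A in place and returns None, so you read the sample out of the first k slots after the call.

import random

def random_sampling(k, A):
    for i in range(k):
        # Generate a random index in [i, len(A) - 1] and swap it into slot i.
        r = random.randint(i, len(A) - 1)
        A[i], A[r] = A[r], A[i]

A = [3, 7, 5, 11]
k = 3
random_sampling(k, A)  # mutates A in place, returns None
print(A[:k])           # the random subset, e.g. [5, 11, 3]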
Suppose there are n sets of real numbers: S[1], S[2], ..., S[n]. We know two things about these sets:
Each set S[i] has exactly 3 elements.
All elements in each of the sets S[i] are real numbers in the [0, 1] range. (I don't know if this detail can be helpful for the solution, though).
Let's consider a set T of all numbers that can be represented as p[1] * p[2] * p[3] * ... * p[n] where p[i] is an element of S[i]. This set T, obviously, has 3^n elements.
My question is, given the sets S[1], S[2], ..., S[n] (1 <= n <= 30) and some 1 <= k <= 10 as input, can we find the k-th largest number in T faster than in O(3^n) time? It's important that I need not only the k-th largest number, but also the corresponding numbers (p[1], p[2], p[3], ... , p[n]) that produce it.
Even if the answer is no, I would appreciate any hints on how you would solve this problem approximately, maybe, by using some heuristics? I know about beam search, but maybe you could suggest something else? And even for beam search, it is not really clear how to implement it here the best way.
If the exact answer can be obtained algorithmically in less than O(3^n) time, I would greatly appreciate it if you could point out the solution.
Well, you know that the largest product is the one that uses the largest factor from each set.
Furthermore, every other product can be formed by starting with a larger one, and then decreasing the factor chosen in exactly one set.
That leads to a simple search:
Put the largest product in a max-first priority queue.
Repeat k times:
a. Remove the largest product p from the priority queue
b. For each set that has a smaller number than the one selected in p,
generate the product formed by decreasing that number to the next lower one in that set. If this selection of factors hasn't been seen before, then add it to the priority queue.
Products will be removed from the queue in decreasing order, so the kth one you take out is the kth largest.
Complexity is about N*(k log kN), depending on how you implement things.
Note that there may be multiple ways to select the factors that produce the same product. This solution considers those ways to be distinct products, i.e., each way is counted when finding the kth largest. That may or may not be what you want.
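A minimal sketch of that search, assuming each set is sorted in ascending order; the helper names are illustrative, not part of the original answer:

import heapq
from math import prod  # Python 3.8+; use functools.reduce with operator.mul on older versions

def k_largest_products(sets, k):
    """Return the k largest products as (product, index tuple) pairs."""
    sets = [sorted(s) for s in sets]              # ascending order
    start = tuple(len(s) - 1 for s in sets)       # largest factor from every set

    def value(idx):
        return prod(s[i] for s, i in zip(sets, idx))

    heap = [(-value(start), start)]               # max-heap via negated products
    seen = {start}
    out = []
    while heap and len(out) < k:
        neg, idx = heapq.heappop(heap)
        out.append((-neg, idx))
        for pos in range(len(sets)):              # decrease one chosen factor
            if idx[pos] > 0:
                nxt = idx[:pos] + (idx[pos] - 1,) + idx[pos + 1:]
                if nxt not in seen:               # skip selections seen before
                    seen.add(nxt)
                    heapq.heappush(heap, (-value(nxt), nxt))
    return out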
To put the previous discussion into code we can do the following:
import operator
from functools import partial, reduce
import heapq

def prod_by_data(tup, data):
    return reduce(operator.mul, (datum[t] for t, datum in zip(tup, data)), 1)

def downset(tup):
    return [
        tuple(t - (1 if j == i else 0) for j, t in enumerate(tup))
        for i in range(len(tup))
        if tup[i] > 0
    ]

data = [
    [1, 2, 3],
    [4, 2, 1],
    [8, 1, 3],
    [1, 1, 2],
]
data = [sorted(d) for d in data]
prod = partial(prod_by_data, data=data)

k_smallest = [tuple(len(dat) - 1 for dat in data)]
possible_k_smallest = []
while len(k_smallest) < 10:
    new_possible = sorted(downset(k_smallest[-1]), key=prod, reverse=True)
    possible_k_smallest = heapq.merge(possible_k_smallest, new_possible, key=prod, reverse=True)
    k_smallest.append(next(possible_k_smallest))

print(k_smallest)
print([prod(tup) for tup in k_smallest])
We maintain a heap of the smallest elements. After we pop off the smallest, we need to check all of its downset (tuples that differ in exactly one position), because those tuples might be the next smallest element.
We loop k - 1 times, sorting O(n) elements each time with a key that is itself O(n) to evaluate. Because of the key, this makes the sort take O(n^2) instead of O(n log n). The heapq merge is lazy, so popping from it is actually O(k). The initial sorting and preparation should be O(n) as well. Overall I think this makes everything O(k n^2).
I have done this example in O(n^2). Given an array, I do the following:
max_key = 0
for key in set(keys):
    count = 0
    for divisor in keys:
        if key < divisor:
            break
        if key % divisor == 0:
            count += 1
    if count > max_key:
        max_key = count
print(max_key)
An example of this would be:
keys = [2,4,8,2]
Then the element most divisible by all elements in the keys is 8 because there are 4 elements (2,2,4,8) that can divide 8.
Can anyone suggest an approach better than O(n^2) ?
Take keys = [2, 4, 5, 8, 2] as a running example.
We can try something like memoization (from dynamic programming) to speed things up while avoiding repeated calculations.
First, let's keep a hashmap, which stores all the divisors for a number in that array.
num_divisor = {} # hashmap
We keep another hashmap to store if a value is present in the array or not (count of that number).
cnt_num = {2: 2, 4: 1, 5: 1, 8: 1}
Now, we run a prime sieve up to max(keys) to find the smallest prime factor of each number up to max(keys).
Then we traverse the array while factoring out each number (factoring is only O(log n) given that we now know the smallest prime factor of each number).
Pseudo-code:
for a in keys:
    temp_a = a  # copy
    while temp_a != 1:
        prime_factor = smallest_prime_factor[temp_a]
        temp_a = temp_a / prime_factor
        if the solution for the new temp_a is already known in num_divisor, just update from there (no recalculation)
        else:
            if the new temp_a is in keys, increment the solution in num_divisor by 1 and continue
Overall complexity: max(keys) * log(max(keys)) [for the sieve] + n * log(max(keys)).
This should work well if the keys are uniformly distributed. For cases like keys = [2, 4, 1001210] it will do a lot of unnecessary computation, so there it is better to avoid the sieve and compute the prime factors directly, or in extreme cases the pairwise divisor calculation will outperform it.
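To make that concrete, here is a hedged sketch of one way to realize the idea (my own variant, not a literal transcription of the pseudo-code above): build the smallest-prime-factor sieve, use it to enumerate each key's divisors, and sum the counts of the divisors that actually occur in the array.

from collections import Counter

def most_divisible_count(keys):
    """For each key, count how many array elements divide it; return the best count."""
    cnt = Counter(keys)
    limit = max(keys)

    # Smallest-prime-factor sieve up to max(keys).
    spf = list(range(limit + 1))
    i = 2
    while i * i <= limit:
        if spf[i] == i:                          # i is prime
            for j in range(i * i, limit + 1, i):
                if spf[j] == j:
                    spf[j] = i
        i += 1

    best = 0
    for a in set(keys):
        # Factor a with the sieve, then enumerate all of its divisors.
        divisors = [1]
        x = a
        while x > 1:
            p, e = spf[x], 0
            while x % p == 0:
                x //= p
                e += 1
            divisors = [d * p**k for d in divisors for k in range(e + 1)]
        best = max(best, sum(cnt[d] for d in divisors if d in cnt))
    return best

print(most_divisible_count([2, 4, 5, 8, 2]))  # 4: the elements 2, 2, 4, 8 all divide 8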
I think you could change one of the n factors into a pseudo-polynomial factor by inserting the numbers into a dict (amortized, expected).
keys = [2, 4, 8, 2]

# https://stackoverflow.com/a/280156/2472827
# max key, O(keys)
max_key = max(keys)

# https://stackoverflow.com/a/6582852/2472827
# occurrence counter, O(keys)
count = dict()
for k in keys:
    count[k] = count.get(k, 0) + 1

# https://stackoverflow.com/a/18634675/2472827
# transform into an answer counter, O(keys)
answer = dict.fromkeys(keys, 0)

# https://stackoverflow.com/a/1602964/2472827
# fill the answer, O(keys * max_key/min_key)
for a in answer:
    max_factor = int(max_key / a)
    for factor in range(1, max_factor + 1):
        number = a * factor
        if number in answer:
            answer[number] += count[a]

# O(keys)
a = max(answer, key=answer.get)
print(answer[a], "/", len(keys), "list items dividing", a)
I think it works in O(n * max_n/min_n) (expected). In this case it is pretty good, but if you have a high dynamic range of values, it's easy to make it go slow.
You could potentially improve your code by:
Account for duplicates by putting keys in a counting map first (e.g. so you don't have to parse the '2' twice in your example). This helps if there's a lot of repetition.
If the square root of the value being checked is smaller than the number of keys, only check candidate divisors up to that square root, counting both each divisor and the value divided by it (see the sketch below). This helps if there are lots of numbers whose square roots are smaller than the total number of elements.
E.g. If we're checking 30 and the list is big, we only need to check: 1 up to 5 to see if they divide 30 and their counts, as well as 30 divided by any of its divisors in this range (30/1=30, 30/2=15, 30/3=10, 30/5=6) and their counts.
E.g. if we're checking 10^100+17, and there are 10 items total, just check each of them in turn.
Neither of these affect worst case analysis since an adversary could choose inputs where they're useless. They may help in the problems you need to solve depending on your inputs, and may help more broadly if you have some guarantees on the inputs.
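As a hedged sketch of the square-root check (assuming, as suggested above, that the keys are already in a counting map):

from collections import Counter
from math import isqrt  # Python 3.8+

def divisors_in_keys(v, count):
    """Count how many list items divide v; `count` maps each key to its multiplicity."""
    total = 0
    for d in range(1, isqrt(v) + 1):
        if v % d == 0:
            total += count.get(d, 0)           # the small divisor d
            if d != v // d:
                total += count.get(v // d, 0)  # its paired divisor v // d
    return total

keys = [2, 4, 8, 2]
count = Counter(keys)
print(max(divisors_in_keys(v, count) for v in set(keys)))  # 4, attained at v = 8

This only pays off when isqrt(v) is smaller than the number of keys; otherwise checking every key directly, as in the question, is cheaper.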
Let's think about this a different way: instead of an array of numbers, consider a directed acyclic graph where each number becomes a vertex in the graph, and there is an edge from u → v if and only if u divides v. The problem is now to find a vertex with the greatest in-degree.
Note that we can't actually count the in-degree of a vertex directly in less than Θ(n) time, so the naive solution is to count the in-degree of every vertex in Θ(n²) time. To do better than Θ(n²), we need to take advantage of the fact that if there are edges u → v → w then there is also an edge u → w; we get knowledge of three edges for the price of two divisibility tests. If there are edges u → v → w → x then three divisibility tests can buy us knowledge of six edges in the graph, and so on.
However, crucially, we only get "free" information about edges if the numbers we test are divisible by each other. If we do a divisibility test and the result is negative, we get no "free" information about other possible edges. So in the worst case, there is only one edge in the graph (i.e. all the numbers in the array are not multiples of each other, except for one pair). For an algorithm to output the correct result, it must find the single number in the array which has a divisor in the array, but each divisibility test which doesn't find this pair gives no "free" information. So in the worst case, we would indeed have to test every pair to find the correct answer. This proves that a worst-case time complexity of Θ(n²) is the best you can do in the general* case.
*In case the array elements are bounded, then a pseudopolynomial algorithm could plausibly do better.
This question comes from Google Foobar, and my code passes all but the last test, with the input/output hidden.
The prompt
In other words, choose two elements of the array, x[i] and x[j]
(i distinct from j) and simultaneously increment x[i] by 1 and decrement
x[j] by 1. Your goal is to get as many elements of the array to have
equal value as you can.
For example, if the array was [1,4,1] you could perform the operations
as follows:
Send a rabbit from the 1st car to the 0th: increment x[0], decrement
x[1], resulting in [2,3,1] Send a rabbit from the 1st car to the 2nd:
increment x[2], decrement x[1], resulting in [2,2,2].
All the elements of the array are equal now, and you've got a
strategy to report back to Beta Rabbit!
Note that if the array was [1,2], the maximum possible number of equal
elements we could get is 1, as the cars could never have the same
number of rabbits in them.
Write a function answer(x), which takes the array of integers x and
returns the maximum number of equal array elements that we can get, by
doing the above described command as many times as needed.
The number of cars in the train (elements in x) will be at least 2,
and no more than 100. The number of rabbits that want to share a car
(each element of x) will be an integer in the range [0, 1000000].
My code
from collections import Counter

def most_common(lst):
    data = Counter(lst)
    return data.most_common(1)[0][1]

def answer(x):
    """The goal is to take all of the rabbits in list x and distribute
    them equally across the original list elements."""
    total = sum(x)
    length = len(x)
    # Find out how many are left over when distributing naively.
    div, mod = divmod(total, length)
    # Because of the variable size of the list, the remainder
    # might be greater than the length of the list.
    # I just realized this is unnecessary.
    while mod > length:
        div += length
        mod -= length
    # Create a new list the size of x with the base number of rabbits.
    result = [div] * length
    # Distribute the leftovers from earlier across the list.
    for i in xrange(mod):
        result[i] += 1
    # Return the most common element.
    return most_common(result)
It runs well in my own testing, handling one million tries in ten or so seconds. But it fails on an unknown input.
Have I missed something obvious, or did I make an assumption I shouldn't have?
Sorry, but your code doesn't work in my testing. I fed it [0, 0, 0, 0, 22] and got back a list of [5, 5, 4, 4, 4] for an answer of 3; the maximum would be 4 identical cars, with the original input being one such example. [4, 4, 4, 4, 6] would be another. I suspect that's your problem, and that there are quite a few other such examples in the data base.
For N cars, the maximum would be either N (if the rabbit population is divisible by the number of cars) or N-1. This seems so simple that I fear I'm missing a restriction in the problem. It didn't ask for a balanced population, just as many car populations as possible should be equal. In short:
def answer(census):
    size = len(census)
    return size if sum(census) % size == 0 else (size - 1)
Using a single random number and a list, how would you return a random slice of that list?
For example, given the list [0,1,2] there are seven possibilities of random contiguous slices:
[ ]
[ 0 ]
[ 0, 1 ]
[ 0, 1, 2 ]
[ 1 ]
[ 1, 2]
[ 2 ]
Rather than getting a random starting index and a random end index, there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
I need it that way, to ensure these 7 possibilities have equal probability.
Simply fix one order in which you would sort all possible slices, then work out a way to turn an index in that list of all slices back into the slice endpoints. For example, the order you used could be described by
The empty slice is before all other slices
Non-empty slices are ordered by their starting point
Slices with the same starting point are ordered by their endpoint
So the index 0 should return the empty list. Indices 1 through n should return [0:1] through [0:n]. Indices n+1 through n+(n-1)=2n-1 would be [1:2] through [1:n]; 2n through n+(n-1)+(n-2)=3n-3 would be [2:3] through [2:n] and so on. You see a pattern here: the last index for a given starting point is of the form n+(n-1)+(n-2)+(n-3)+…+(n-k), where k is the starting index of the sequence. That's an arithmetic series, so that sum is (k+1)(2n-k)/2=(2n+(2n-1)k-k²)/2. If you set that term equal to a given index, and solve that for k, you get some formula involving square roots. You could then use the ceiling function to turn that into an integral value for k corresponding to the last index for that starting point. And once you know k, computing the end point is rather easy.
But the quadratic equation in the solution above makes things really ugly. So you might be better off using some other order. Right now I can't think of a way which would avoid such a quadratic term. The order Douglas used in his answer doesn't avoid square roots, but at least his square root is a bit simpler due to the fact that he sorts by end point first. The order in your question and my answer is called lexicographical order, his would be called reverse lexicographical and is often easier to handle since it doesn't depend on n. But since most people think about normal (forward) lexicographical order first, this answer might be more intuitive to many and might even be the required way for some applications.
Here is a bit of Python code which lists all sequence elements in order, and does the conversion from index i to endpoints [k:m] the way I described above:
from math import ceil, sqrt

n = 3

print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    b = 1 - 2*n
    c = 2*(i - n) - 1
    # solve k^2 + b*k + c = 0
    k = int(ceil((- b - sqrt(b*b - 4*c))/2.))
    m = k + i - k*(2*n - k + 1)//2
    print("{:3} [{}:{}]".format(i, k, m))
The - 1 term in c doesn't come from the mathematical formula I presented above. It's more like subtracting 0.5 from each value of i. This ensures that even if the result of sqrt is slightly too large, you won't end up with a k which is too large. So that term accounts for numeric imprecision and should make the whole thing pretty robust.
The term k*(2*n-k+1)//2 is the last index belonging to starting point k-1, so i minus that term is the length of the subsequence under consideration.
You can simplify things further. You can perform some computation outside the loop, which might be important if you have to choose random sequences repeatedly. You can divide b by a factor of 2 and then get rid of that factor in a number of other places. The result could look like this:
from math import ceil, sqrt

n = 3
b = n - 0.5
bbc = b*b + 2*n + 1

print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    k = int(ceil(b - sqrt(bbc - 2*i)))
    m = k + i - k*(2*n - k + 1)//2
    print("{:3} [{}:{}]".format(i, k, m))
It is a little strange to give the empty list equal weight with the others. It is more natural for the empty list to be given weight 0 or n+1 times the others, if there are n elements on the list. But if you want it to have equal weight, you can do that.
There are n*(n+1)/2 nonempty contiguous sublists. You can specify these by the end point, from 0 to n-1, and the starting point, from 0 to the endpoint.
Generate a random integer x from 0 to n*(n+1)/2.
If x = 0, return the empty list. Otherwise, x is uniformly distributed from 1 through n(n+1)/2.
Compute e = floor(sqrt(2*x)-1/2). This takes the values 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, etc.
Compute s = (x-1) - e*(e+1)/2. This takes the values 0, 0, 1, 0, 1, 2, 0, 1, 2, 3, ...
Return the interval starting at index s and ending at index e.
(s,e) takes the values (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),...
import random
import math

n = 10
x = random.randint(0, n*(n+1)//2)
if x == 0:
    print(list(range(n))[0:0])  # empty slice
    exit()
e = int(math.floor(math.sqrt(2*x) - 0.5))
s = int(x - 1 - (e*(e + 1)//2))
print(list(range(n))[s:e + 1])  # starting at s, ending at e, inclusive
First create all possible slice indexes.
[0:0], [1:1], etc are equivalent, so we include only one of those.
Finally you pick a random index couple, and apply it.
import random

l = [0, 1, 2]
combination_couples = [(0, 0)]
length = len(l)

# Create all index couples.
for j in range(1, length + 1):
    for i in range(j):
        combination_couples.append((i, j))
print(combination_couples)

rand_tuple = random.sample(combination_couples, 1)[0]
final_slice = l[rand_tuple[0]:rand_tuple[1]]
print(final_slice)
To ensure we got them all:
for i in combination_couples:
    print(l[i[0]:i[1]])
Alternatively, with some math...
For a length-3 list there are 0 to 3 possible index numbers, that is n=4. You have 2 of them, that is k=2. The first index has to be smaller than the second, therefore we need to calculate the combinations as described here.
from math import factorial as f

def total_combinations(n, k=2):
    result = 1
    for i in range(1, k + 1):
        result *= n - k + i
    result //= f(k)
    # We add plus 1 since we included [0:0] as well.
    return result + 1

print(total_combinations(n=4))  # Prints 7 as expected.
there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
It is difficult to say which method is best, but if you're only interested in binding a single random number to your contiguous slice, you can use modulo.
Given a list l and a single random number r, you can get your contiguous slice like this:
l[r % len(l) : some_sparkling_transformation(r) % len(l)]
where some_sparkling_transformation(r) is essential. It depends on your needs, but since I don't see any special requirements in your question, it could be, for example:
l[r % len(l) : (2 * r) % len(l)]
The most important thing here is that both the left and right edges of the slice are correlated with r. That makes it hard to define contiguous slices that won't follow any observable pattern. The example above (with 2 * r) produces slices that are always empty lists or follow the pattern [a : 2 * a].
Let's use some intuition. We know that we want to find a good random representation of the number r in the form of a contiguous slice. It turns out that we need to find two numbers, a and b, which are respectively the left and right edges of the slice. Assuming that r is a good random number (we like it in some way), we can say that a = r % len(l) is a good approach.
Let's now try to find b. The best way to generate another nice random number will be to use random number generator (random or numpy) which supports seeding (both of them). Example with random module:
import random

def contiguous_slice(l, r):
    random.seed(r)
    a = int(random.uniform(0, len(l) + 1))
    b = int(random.uniform(0, len(l) + 1))
    a, b = sorted([a, b])
    return l[a:b]
Good luck and have fun!
I'm trying to create a sorting technique that sorts a list of numbers. What it does is compare two numbers: the first is the first number in the list, and the other is the number at an offset of 2^k - 1.
2^k - 1 = [1, 3, 7, 15, 31, 63, ...]
For example, if I had a list [1, 4, 3, 6, 2, 10, 8, 19]
The length of this list is 8, so the program should find a number in the 2^k - 1 list that is less than 8; in this case it will be 7.
So now it will compare the first number in the list (1) with the number 7 positions further on (19). If the first is greater than the second, they swap positions.
After this step, it will continue on to 4 and the 7th number after that, but that doesn't exist, so now it should compare with the 3rd number after 4, because 3 is the next number down in the 2^k - 1 list.
So it should compare 4 with 2 and swap them if they are not in the right order. This should go on and on until I reach 1 in the 2^k - 1 list, at which point the list will finally be sorted.
I need help getting started on this code.
So far, I've written a small piece of code that builds the 2^k - 1 list, but that's as far as I've gotten.
a = []
for i in range(10):
    a.append(2**(i+1) - 1)
print(a)
EXAMPLE:
Consider sorting the sequence V = 17, 4, 8, 2, 11, 5, 14, 9, 18, 12, 7, 1. The skipping sequence 1, 3, 7, 15, ... yields r = 7 as the biggest value which fits, so looking at V, the first sparse subsequence is 17, 9; as we pass along V we produce 9, 4, 8, 2, 11, 5, 14, 17, 18, 12, 7, 1 after the first swap, and 9, 4, 8, 2, 1, 5, 14, 17, 18, 12, 7, 11 after using r = 7 completely. Using a = 3 (the next smaller term in the skipping sequence), the first sparse subsequence is 9, 2, 14, 12, which when applied to V gives 2, 4, 8, 9, 1, 5, 12, 17, 18, 14, 7, 11; the remaining a = 3 sorts give 2, 1, 8, 9, 4, 5, 12, 7, 18, 14, 17, 11, and then 2, 1, 5, 9, 4, 8, 12, 7, 11, 14, 17, 18. Finally, with a = 1, we get 1, 2, 4, 5, 7, 8, 9, 11, 12, 14, 17, 18.
You might wonder, given that at the end we do a sort with no skips, why this might be any faster than simply doing that final step as the only step at the beginning. Think of it as a comb going through the sequence: notice that in the earlier steps we're using coarse combs to get distant things in the right order, using progressively finer combs until at the end our fine-tuning is dealing with a nearly-sorted sequence needing little adjustment.
p = 0
x = len(V)       # length of V, used to pick the gap from a
for j in a:      # for every element in a (1, 3, 7, ...)
    if x >= j:   # if the length is greater than or equal to the current gap
        p = j    # remember j as the gap p
So that finds the distance at which to compare the first number in the list, but now I need to write something that keeps doing that until the distance goes out of range, then switches from 3 to 1 and keeps checking the smaller distances until the list is sorted.
The sorting algorithm you're describing actually is called Combsort. In fact, the simpler bubblesort is a special case of combsort where the gap is always 1 and doesn't change.
Since you're stuck on how to start this, here's what I recommend:
Implement the bubblesort algorithm first. The logic is simpler and makes it much easier to reason about as you write it.
Once you've done that, you have the important algorithmic structure in place, and from there it's just a matter of adding the gap-length calculation into the mix: compute the gap length with your particular formula, then modify the loop control index and the inner comparison index to use the calculated gap length.
After each iteration of the loop you decrease the gap length (in effect making the comb shorter) by some scaling amount.
The last step would be to experiment with different gap lengths and formulas to see how they affect the algorithm's efficiency; a rough sketch of the overall structure follows.
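Here is a hedged sketch of that plan, combining bubblesort-style passes with the 2^k - 1 gap sequence from the question; treat it as one possible structure rather than the definitive implementation:

def comb_sort_pow2_gaps(V):
    """Sort V in place using the gaps ..., 7, 3, 1 (each of the form 2^k - 1)."""
    n = len(V)
    # Build the gap sequence 1, 3, 7, 15, ... that still fits inside the list.
    gaps = []
    k = 1
    while 2**k - 1 < n:
        gaps.append(2**k - 1)
        k += 1
    # Largest gap first, finishing with gap 1 (plain bubblesort).
    for gap in reversed(gaps):
        swapped = True
        while swapped:
            swapped = False
            for i in range(n - gap):
                if V[i] > V[i + gap]:
                    V[i], V[i + gap] = V[i + gap], V[i]
                    swapped = True
    return V

print(comb_sort_pow2_gaps([17, 4, 8, 2, 11, 5, 14, 9, 18, 12, 7, 1]))

The final gap-1 passes repeat until no swaps occur, which is what guarantees the list ends up fully sorted.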