How to detect repeating digits in an infinite sequence? I tried Floyd's and Brent's cycle-detection algorithms but came up with nothing...
I have a generator that yields numbers ranging from 0 to 9 (inclusive) and I have to recognize a period in it.
Example test case:
import itertools
# of course this is a fake one just to offer an example
def source():
    return itertools.cycle((1, 0, 1, 4, 8, 2, 1, 3, 3, 1))
>>> gen = source()
>>> period(gen)
(1, 0, 1, 4, 8, 2, 1, 3, 3, 1)
Empirical methods
Here's a fun take on the problem. The more general form of your question is this:
Given a repeating sequence of unknown length, determine the period of
the signal.
The standard way to find repeating frequencies is the Fourier transform. In your example the signal is clean and discrete, but the following solution will work even with continuous, noisy data! The FFT approximates the frequencies of the input signal in the so-called "wave-space" or "Fourier space"; a peak in this space corresponds to a repeating component. The period of your signal is related to the longest wavelength that shows a peak.
import itertools
# of course this is a fake one just to offer an example
def source():
    return itertools.cycle((1, 0, 1, 4, 8, 2, 1, 3, 3, 2))
import pylab as plt
import numpy as np
import scipy as sp
# Generate some test data, i.e. our "observations" of the signal
N = 300
vals = source()
X = np.array([next(vals) for _ in range(N)])
# Compute the FFT
W = np.fft.fft(X)
freq = np.fft.fftfreq(N,1)
# Look for the longest signal that is "loud"
threshold = 10**2
idx = np.where(abs(W)>threshold)[0][-1]
max_f = abs(freq[idx])
print "Period estimate: ", 1/max_f
This gives the correct answer for this case, 10, though if N were not an integer multiple of the cycle length you would only get a close estimate. We can visualize this via:
plt.subplot(211)
plt.scatter([max_f,], [np.abs(W[idx]),], s=100,color='r')
plt.plot(freq[:N//2], abs(W[:N//2]))
plt.xlabel(r"$f$")
plt.subplot(212)
plt.plot(1.0/freq[:N//2], abs(W[:N//2]))
plt.scatter([1/max_f,], [np.abs(W[idx]),], s=100,color='r')
plt.xlabel(r"$1/f$")
plt.xlim(0,20)
plt.show()
Evgeny Kluev's answer provides a way to get an answer that might be right.
Definition
Let's assume you have some sequence D that is a repeating sequence. That is, there is some sequence d of length L such that D_i = d_{i mod L}, where t_i denotes the i-th element of a sequence t, indexed from 0. We say that d generates D.
Theorem
Given a sequence D which you know is generated by some finite sequence, and given some candidate d, it is impossible to decide in finite time whether d generates D.
Proof
Since we are only allowed finite time, we can only access a finite number of elements of D. Suppose we access the first F elements of D. We may assume they are the first F because the set of indices we access is finite and hence has a maximum M; we can then let F = M + 1, which is still finite.
Let L be the length of d and suppose that D_i = d_{i mod L} for all i < F. There are two possibilities for D_F: either it equals d_{F mod L} or it does not. In the former case d still seems to work; in the latter it does not. We cannot know which case holds until we access D_F, but that requires accessing F + 1 elements, and we are limited to F element accesses.
Hence, for any F we won't have enough information to decide whether d generates D, and therefore it is impossible to know in finite time whether d generates D.
Conclusions
It is possible to know in finite time that a sequence d does not generate D, but this doesn't help you: you want to find a sequence d that does generate D, which requires, among other things, being able to prove that some sequence generates D.
Unless you have more information about D, this problem is unsolvable. One piece of information that makes it decidable is an upper bound on the length of the shortest d that generates D. If you know the function generating D has only a known, finite amount of state, you can compute such an upper bound.
Hence, my conclusion is that you cannot solve this problem unless you change the specification a bit.
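To illustrate that last point, here is a minimal sketch of my own (not a named algorithm). It assumes the stream is purely periodic, i.e. it repeats from its very first term, and that the shortest period is at most max_period; under that assumption a prefix of 2 * max_period terms is enough to identify the period.
from itertools import islice

def period_with_bound(gen, max_period):
    # Read 2 * max_period terms; this suffices when the stream is purely
    # periodic with shortest period <= max_period (assumption, see above).
    data = list(islice(gen, 2 * max_period))
    for p in range(1, max_period + 1):
        if all(data[i] == data[i % p] for i in range(len(data))):
            return tuple(data[:p])
    return None

print(period_with_bound(source(), 20))   # -> (1, 0, 1, 4, 8, 2, 1, 3, 3, 1)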
I have no idea about proper algorithms to apply here, but my understanding is also that you can never know for sure that you've detected a period if you have consumed only a finite number of terms. Anyway, here's what I've come up with. It is a very naive implementation, more to educate from the comments than to provide a good solution (I guess).
def guess_period(source, minlen=1, maxlen=100, trials=100):
    for n in range(minlen, maxlen+1):
        p = [j for i, j in zip(range(n), source)]
        if all([j for i, j in zip(range(n), source)] == p
               for k in range(trials)):
            return tuple(p)
    return None
This one, however, "forgets" the initial order and returns a tuple that is a cyclic permutation of the actual period:
In [101]: guess_period(gen)
Out[101]: (0, 1, 4, 8, 2, 1, 3, 3, 1, 1)
To compensate for this, you'll need to keep track of the offset.
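One way to do that (a variant of the function above, my own sketch) is to buffer everything consumed from the generator, so the returned tuple starts at the first element the stream produced:
def guess_period_keeping_offset(source, minlen=1, maxlen=100, trials=100):
    buf = []
    def prefix(m):
        # Consume from the generator only as needed, remembering everything seen.
        while len(buf) < m:
            buf.append(next(source))
        return buf
    for n in range(minlen, maxlen + 1):
        data = prefix(n * (trials + 1))
        if all(data[i] == data[i % n] for i in range(len(data))):
            return tuple(buf[:n])
    return None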
Since your sequence is not of the form X_{n+1} = f(X_n), Floyd's or Brent's algorithms are not directly applicable to your case. But they may be extended to do the task:
Use Floyd's or Brent's algorithm to find some repeating element of the sequence.
Find next sequence element with the same value. Distance between these elements is a supposed period (k).
Remember next k elements of the sequence
Find the next occurrence of this k-element subsequence.
If the distance between subsequences is greater than k, update k and continue with step 3.
Repeat step 4 several times to verify the result. If the maximum length of the repeating sub-sequence is known a priori, use an appropriate number of repetitions. Otherwise use as many repetitions as possible, because each repetition increases the confidence in the result.
If the sequence cycling starts from the first element, ignore step 1 and start from step 2 (find next sequence element equal to the first element).
If the sequence cycling does not start from the first element, it is possible that Floyd's or Brent's algorithm finds some repeating element of the sequence that does not belong to the cycle. So it is reasonable to limit the number of iterations in steps 2 and 4, and if this limit is exceeded, continue with step 1.
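Here is a rough Python sketch of steps 2-5, assuming (as in the note above) that the cycle starts at the first element, so step 1 is skipped. The function name and the verify/max_scan parameters are my own choices, not part of the original description:
from itertools import islice

def find_period(gen, verify=3, max_scan=100000):
    buf = [next(gen)]
    # Step 2: the distance to the next occurrence of the first value is the candidate period k.
    k = None
    for i, x in enumerate(islice(gen, max_scan), start=1):
        buf.append(x)
        if x == buf[0]:
            k = i
            break
    if k is None:
        return None
    # Steps 3-5: the next verify*k elements must repeat with period k;
    # on a mismatch, grow k to the distance of the next occurrence of the k-element prefix.
    i = k
    while i < (verify + 1) * k:
        while len(buf) <= i:
            buf.append(next(gen))
        if buf[i] == buf[i % k]:
            i += 1
            continue
        j = i + 1
        while True:
            while len(buf) < j + k:
                buf.append(next(gen))
            if buf[j:j + k] == buf[:k]:
                k = j          # step 5: the larger distance becomes the new candidate period
                break
            j += 1
            if j > max_scan:
                return None
        i = k
    return tuple(buf[:k])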
Related
Suppose there are n sets of real numbers: S[1], S[2], ..., S[n]. We know two things about these sets:
Each set S[i] has exactly 3 elements.
All elements in each of the sets S[i] are real numbers in the [0, 1] range. (I don't know if this detail can be helpful for the solution, though).
Let's consider a set T of all numbers that can be represented as p[1] * p[2] * p[3] * ... * p[n] where p[i] is an element of S[i]. This set T, obviously, has 3^n elements.
My question is, given the sets S[1], S[2], ..., S[n] (1 <= n <= 30) and some 1 <= k <= 10 as input, can we find the k-th largest number in T faster than in O(3^n) time? It's important that I need not only the k-th largest number, but also the corresponding numbers (p[1], p[2], p[3], ... , p[n]) that produce it.
Even if the answer is no, I would appreciate any hints on how you would solve this problem approximately, maybe, by using some heuristics? I know about beam search, but maybe you could suggest something else? And even for beam search, it is not really clear how to implement it here the best way.
If the exact answer can be obtained algorithmically in less than O(3^n) time, I would greatly appreciate it if you could point out the solution.
Well, you know that the largest product is the one that uses the largest factor from each set.
Furthermore, every other product can be formed by starting with a larger one, and then decreasing the factor chosen in exactly one set.
That leads to a simple search:
Put the largest product in a max-first priority queue.
Repeat k times:
a. Remove the largest product p from the priority queue
b. For each set that has a smaller number than the one selected in p, generate the product formed by decreasing that number to the next lower one in that set. If this selection of factors hasn't been seen before, then add it to the priority queue.
Products will be removed from the queue in decreasing order, so the kth one you take out is the kth largest.
Complexity is about N*(k log kN), depending on how you implement things.
Note that there may be multiple ways to select the factors that produce the same product. This solution considers those ways to be distinct products, i.e., each way is counted when finding the kth largest. That may or may not be what you want.
To put the previous discussion into code we can do the following:
import operator
from functools import partial, reduce
import heapq

def prod_by_data(tup, data):
    return reduce(operator.mul, (datum[t] for t, datum in zip(tup, data)), 1)

def downset(tup):
    return [
        tuple(t - (1 if j == i else 0) for j, t in enumerate(tup))
        for i in range(len(tup))
        if tup[i] > 0
    ]

data = [
    [1, 2, 3],
    [4, 2, 1],
    [8, 1, 3],
    [1, 1, 2],
]
data = [sorted(d) for d in data]
prod = partial(prod_by_data, data=data)

k_smallest = [tuple(len(dat) - 1 for dat in data)]
possible_k_smallest = []
while len(k_smallest) < 10:
    new_possible = sorted(downset(k_smallest[-1]), key=prod, reverse=True)
    possible_k_smallest = heapq.merge(possible_k_smallest, new_possible, key=prod, reverse=True)
    k_smallest.append(next(possible_k_smallest))

print(k_smallest)
print([prod(tup) for tup in k_smallest])
We maintain a lazily merged, descending stream (via heapq.merge) of candidate tuples. After we pop off the largest remaining product, we need to check all of its downset (tuples that differ in exactly one position), because one of those might be the next largest element.
We loop k - 1 times, sorting O(n) elements each time with a key that itself takes O(n) to evaluate. Because of the key, each sort takes O(n^2) instead of O(n log n). heapq.merge is lazy, so popping from it is roughly O(k). The initial sorting and preparation should be O(n) as well. Overall I think this makes everything O(k n^2).
I am going through a book called "Elements of Programming Interview" and have gotten stuck at the following problem:
Implement an algorithm that takes as input an array of distinct
elements and a size, and returns a subset of the given size of the
array elements. All subsets should be equally likely. Return the
result in input array itself.
The solution they provide below is:
import random

def random_sampling(k, A):
    for i in range(k):
        # Generate a random index in [i, len(A) - 1].
        r = random.randint(i, len(A) - 1)
        A[i], A[r] = A[r], A[i]

A = [3, 7, 5, 11]
k = 3
print(random_sampling(k, A))
I do not understand what the authors are trying to do intuitively. Their explanation is below:
Another approach is to enumerate all subsets of size k and then select
one at random from these. Since there are (n to k) subsets of size k,
the time and space complexity are huge. The key to efficiently
building a random subset of size exactly k is to first build one of
size k - 1 and then add one more element, selected randomly from
the rest. The problem is trivial when k = 1. We make one call to the
random number generator, take the returned value mod n (call it r),
and swap A[0] with A[r]. The entry A[0] now holds the result.
For k > 1, we begin by choosing one element at random as above and we
now repeat the same process with the (n - 1)-element sub-array A[1, n - 1].
Eventually, the random subset occupies the slots A[0, k - 1] and the
remaining elements are in the last n - k slots.
Intuitively, if all subsets of size k are equally likely, then the
construction process ensures that the subsets of size k + 1 are also
equally likely. A formal proof for this uses mathematical induction -
the induction hypothesis is that every permutation of every size k
subset of A is equally likely to be in A[0, k -1].
As a concrete example, let the input be A = <3, 7, 5, 11> and the size
be 3. In the first iteration, we use the random number generator to
pick a random integer in the interval [0,3]. Let the returned random
number be 2. We swap A[0] with A[2] - now the array is <5, 7, 3, 11>.
Now we pick a random integer in the interval [1, 3]. Let the returned
random number be 3. We swap A[1] with A[3] - now the resulting array
is <5, 11, 3, 7>. Now we pick a random integer in the interval [2,3].
Let the returned random number be 2. When we swap A[2] with itself the
resulting array is unchanged. The random subset consists of the first
three entries, i.e., {5, 11, 3}.
Sorry for the long text; my questions are these:
What is the key to efficiency they are referring to? It's not clicking in my head.
What did they mean by "eventually, the random subset occupies the slots A[0, k-1] and the remaining elements are in the last n - k slots"
Is there a clear reason why "every permutation of every size-k subset of A is equally likely to be in A[0, k - 1]"?
Can you explain the theory behind the algorithm in clearer terms?
What is the return value of the algorithm supposed to be?
Thanks.
An intuitive solution might be:
import random

def random_sampling(k, A):
    subset = []
    selected = set()
    for i in range(k):
        index = random.randint(0, len(A) - 1)
        while index in selected:
            index = random.randint(0, len(A) - 1)
        selected.add(index)
        subset.append(A[index])
    return subset
but it's not clear that every k-subset has equal probability (because for the same k you may use a different number of random draws over different ranges),
so a solution that fits the probability condition would be:
import random
import itertools as it

def random_sampling(k, A):
    possibilities = list(it.combinations(A, k))  # very expensive action
    index = random.randint(0, len(possibilities) - 1)
    return list(possibilities[index])
So the solution they gave makes sure you use the same procedure of random draws for every k-element subset, without the brute force above.
The order of the list is now: the first k elements are the ones we selected, and the rest of the list holds the remaining items.
This is the induction hypothesis: assume that every subset of length k - 1 is equally likely, and prove the same for subsets of length k.
An efficient way to guarantee the same probability for every k-sized subset is to apply exactly the same steps to produce each one.
There is no return value because the list changed inside the function is also changed in the caller; the subset is the first k elements of the list after the function is called.
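For illustration (my own snippet, not from the book), this is how the book's function would typically be called; the result is read out of the first k slots rather than from a return value:
import random

def random_sampling(k, A):
    # Partial Fisher-Yates shuffle: after the loop, A[:k] is the sampled subset.
    for i in range(k):
        r = random.randint(i, len(A) - 1)
        A[i], A[r] = A[r], A[i]

A = [3, 7, 5, 11]
random_sampling(3, A)
print(A[:3])   # the random subset occupies the first k slots
print(A[3:])   # the remaining elements occupy the last n - k slots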
Algorithm problem:
Write a program which takes as input a positive integer n and size
k <= n; return a size-k subset of {0, 1, 2, .. , n -1}. The subset
should be represented as an array. All subsets should be equally
likely, and in addition, all permutations of elements of the array
should be equally likely. You may assume you have a function which
takes as input a nonnegative integer t and returns an integer in the
set {0, 1,...,t-1}.
My original solution to this in pseudocode is as follows:
Set t = n, and output the result of the random number generator into a set() until set() has size(set) == t. Return list(set)
The author solution is as follows:
import random

def online_sampling(n, k):
    changed_elements = {}
    for i in range(k):
        rand_idx = random.randrange(i, n)
        rand_idx_mapped = changed_elements.get(rand_idx, rand_idx)
        i_mapped = changed_elements.get(i, i)
        changed_elements[rand_idx] = i_mapped
        changed_elements[i] = rand_idx_mapped
    return [changed_elements[i] for i in range(k)]
I totally understand the author's solution - my question is more about why my solution is incorrect. My guess is that it becomes greatly inefficient as t approaches n, because in that case the probability of drawing a number that isn't already in the set gets lower and lower, so I have to keep re-running the random number function. If t == n, then for the very last element added to the set there is just a 1/n chance of drawing a new element, and I would probabilistically need to run the given rand() function n times just to get the last item.
Is this the correct reason why my solution isn't efficient? Is there anything else I'm missing? And how would one describe the time complexity of my solution? By the above rationale, I believe it would be O(n^2), since probabilistically I would need to run it n + (n - 1) + (n - 2) + ... times.
Your solution is (almost) correct.
Firstly, it will run in O(n log n) instead of O(n^2), assuming that all operations with the set are O(1). Here's why.
The expected time to add the first element to the set is 1 = n/n.
The expected time to add the second element to the set is n/(n-1), because the probability of randomly choosing a not-yet-chosen element is (n-1)/n. See the geometric distribution for an explanation.
...
For the k-th element, the expected time is n/(n-k+1). So for all n elements the total time is n/n + n/(n-1) + ... + n/1 = n * (1 + 1/2 + ... + 1/n) = O(n log n).
Moreover, we can prove by induction that all chosen subsets will be equiprobable.
However, when you do list(set(...)), it is not guaranteed that the resulting list will contain the elements in the same order as you put them into the set. For example, if the set were implemented as a binary search tree, the list would always be sorted. So you have to store the list of unique found elements separately.
UPD (thanks @JimMischel): we proved the average-case running time. There is still a possibility that the algorithm runs indefinitely (for example, if rand() always returns 1).
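A minimal sketch of the corrected approach, keeping an ordered list alongside the set as suggested above (the function name is mine, and random.randrange stands in for the given rand() primitive):
import random

def sample_by_rejection(n, k):
    seen = set()
    result = []
    while len(result) < k:
        x = random.randrange(n)      # one call to the rand() primitive
        if x not in seen:            # reject duplicates and draw again
            seen.add(x)
            result.append(x)
    return result

print(sample_by_rejection(10, 4))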
Your method has a big problem: you may return duplicate numbers if your random number generator produces the same number twice, right?
If you say that set() will not keep duplicate numbers, then your method has created the members of the set with different chances, so the numbers in your set will not be equally likely.
The problem with your method is not efficiency; it does not create an equally likely result set. The author uses a variation of the Fisher-Yates method for creating that subset, which will be equally likely.
Using a single random number and a list, how would you return a random slice of that list?
For example, given the list [0,1,2] there are seven possibilities of random contiguous slices:
[ ]
[ 0 ]
[ 0, 1 ]
[ 0, 1, 2 ]
[ 1 ]
[ 1, 2]
[ 2 ]
Rather than getting a random starting index and a random end index, there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
I need it that way, to ensure these 7 possibilities have equal probability.
Simply fix one order in which you would sort all possible slices, then work out a way to turn an index in that list of all slices back into the slice endpoints. For example, the order you used could be described by
The empty slice is before all other slices
Non-empty slices are ordered by their starting point
Slices with the same starting point are ordered by their endpoint
So the index 0 should return the empty list. Indices 1 through n should return [0:1] through [0:n]. Indices n+1 through n+(n-1)=2n-1 would be [1:2] through [1:n]; 2n through n+(n-1)+(n-2)=3n-3 would be [2:3] through [2:n] and so on. You see a pattern here: the last index for a given starting point is of the form n+(n-1)+(n-2)+(n-3)+…+(n-k), where k is the starting index of the sequence. That's an arithmetic series, so that sum is (k+1)(2n-k)/2=(2n+(2n-1)k-k²)/2. If you set that term equal to a given index, and solve that for k, you get some formula involving square roots. You could then use the ceiling function to turn that into an integral value for k corresponding to the last index for that starting point. And once you know k, computing the end point is rather easy.
But the quadratic equation in the solution above makes things really ugly. So you might be better off using some other order. Right now I can't think of a way which would avoid such a quadratic term. The order Douglas used in his answer doesn't avoid square roots, but at least his square root is a bit simpler due to the fact that he sorts by end point first. The order in your question and my answer is called lexicographical order, his would be called reverse lexicographical and is often easier to handle since it doesn't depend on n. But since most people think about normal (forward) lexicographical order first, this answer might be more intuitive to many and might even be the required way for some applications.
Here is a bit of Python code which lists all sequence elements in order, and does the conversion from index i to endpoints [k:m] the way I described above:
from math import ceil, sqrt

n = 3
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    b = 1 - 2*n
    c = 2*(i - n) - 1
    # solve k^2 + b*k + c = 0
    k = int(ceil((- b - sqrt(b*b - 4*c))/2.))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))
The - 1 term in c doesn't come from the mathematical formula I presented above. It's more like subtracting 0.5 from each value of i. This ensures that even if the result of sqrt is slightly too large, you won't end up with a k which is too large. So that term accounts for numeric imprecision and should make the whole thing pretty robust.
The term k*(2*n-k+1)//2 is the last index belonging to starting point k-1, so i minus that term is the length of the subsequence under consideration.
You can simplify things further. You can perform some computation outside the loop, which might be important if you have to choose random sequences repeatedly. You can divide b by a factor of 2 and then get rid of that factor in a number of other places. The result could look like this:
from math import ceil, sqrt

n = 3
b = n - 0.5
bbc = b*b + 2*n + 1
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    k = int(ceil(b - sqrt(bbc - 2*i)))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))
It is a little strange to give the empty list equal weight with the others. It is more natural for the empty list to be given weight 0 or n+1 times the others, if there are n elements on the list. But if you want it to have equal weight, you can do that.
There are n*(n+1)/2 nonempty contiguous sublists. You can specify these by the end point, from 0 to n-1, and the starting point, from 0 to the endpoint.
Generate a random integer x from 0 to n*(n+1)/2.
If x=0, return the empty list. Otherwise, x is uniformly distributed from 1 through n(n+1)/2.
Compute e = floor(sqrt(2*x)-1/2). This takes the values 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, etc.
Compute s = (x-1) - e*(e+1)/2. This takes the values 0, 0, 1, 0, 1, 2, 0, 1, 2, 3, ...
Return the interval starting at index s and ending at index e.
(s,e) takes the values (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),...
import random
import math

n = 10
x = random.randint(0, n*(n+1)//2)
if x == 0:
    print(list(range(n))[0:0])  # empty set
    exit()
e = int(math.floor(math.sqrt(2*x) - 0.5))
s = int(x - 1 - (e*(e+1)//2))
print(list(range(n))[s:e+1])  # starting at s, ending at e, inclusive
First create all possible slice indexes.
[0:0], [1:1], etc are equivalent, so we include only one of those.
Finally you pick a random index couple, and apply it.
import random
l = [0, 1, 2]
combination_couples = [(0, 0)]
length = len(l)
# Creates all index couples.
for j in range(1, length+1):
    for i in range(j):
        combination_couples.append((i, j))
print(combination_couples)
rand_tuple = random.sample(combination_couples, 1)[0]
final_slice = l[rand_tuple[0]:rand_tuple[1]]
print(final_slice)
To ensure we got them all:
for i in combination_couples:
    print(l[i[0]:i[1]])
Alternatively, with some math...
For a length-3 list the possible index numbers are 0 to 3, that is n=4. You choose 2 of them, that is k=2. The first index has to be smaller than the second, therefore we need to calculate the combinations as described here.
from math import factorial as f

def total_combinations(n, k=2):
    result = 1
    for i in range(1, k+1):
        result *= n - k + i
    result //= f(k)
    # We add plus 1 since we included [0:0] as well.
    return result + 1

print(total_combinations(n=4))  # Prints 7 as expected.
there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
It is difficult to say which method is best, but if you're only interested in binding a single random number to your contiguous slice you can use modulo.
Given a list l and a single random number r you can get your contiguous slice like that:
l[r % len(l) : some_sparkling_transformation(r) % len(l)]
where some_sparkling_transformation(r) is essential. It depends on your needs, but since I don't see any special requirements in your question it could be, for example:
l[r % len(l) : (2 * r) % len(l)]
The most important thing here is that both the left and right edges of the slice are correlated with r. This makes it a problem to define contiguous slices that won't follow any observable pattern. The example above (with 2 * r) produces slices that are always empty lists or follow a pattern of [a : 2 * a].
Let's use some intuition. We know that we want to find a good random representation of the number r in the form of a contiguous slice. It turns out that we need to find two numbers, a and b, which are respectively the left and right edges of the slice. Assuming that r is a good random number (we like it in some way), we can say that a = r % len(l) is a good approach.
Let's now try to find b. The best way to generate another nice random number will be to use random number generator (random or numpy) which supports seeding (both of them). Example with random module:
import random
def contiguous_slice(l, r):
    random.seed(r)
    a = int(random.uniform(0, len(l)+1))
    b = int(random.uniform(0, len(l)+1))
    a, b = sorted([a, b])
    return l[a:b]
Good luck and have fun!
I'm trying to create a sorting technique that sorts a list of numbers. What it does is compare two numbers: the first being the first number in the list, and the other being the number at an offset of 2^k - 1 from it.
2^k - 1 = [1,3,7, 15, 31, 63...]
For example, if I had a list [1, 4, 3, 6, 2, 10, 8, 19]
The length of this list is 8. So the program should find a number in the 2^k - 1 list that is less than 8; in this case it will be 7.
So now it will compare the first number in the random list (1) with the number 7 positions later in the same list (19). If it is greater than the second number, they swap positions.
After this step, it will continue on to 4 and the 7th number after that, but that doesn't exist, so now it should compare with the 3rd number after 4, because 3 is the next number in the 2^k - 1 list.
So it should compare 4 with 2 and swap if they are not in the right place. This should go on and on until the gap reaches 1 in the 2^k - 1 list, at which point the list will finally be sorted.
I need help getting started on this code.
So far, I've written a small code that makes the 2k - 1 list but thats as far as I've gotten.
a = []
for i in range(10):
    a.append(2**(i+1) - 1)
print(a)
EXAMPLE:
Consider sorting the sequence V = 17,4,8,2,11,5,14,9,18,12,7,1. The skipping
sequence 1, 3, 7, 15, … yields r=7 as the biggest value which fits, so looking at V, the first sparse subsequence =
17,9, so as we pass along V we produce 9,4,8,2,11,5,14,17,18,12,7,1 after the first swap, and
9,4,8,2,1,5,14,17,18,12,7,11 after using r=7 completely. Using a=3 (the next smaller term in the skipping
sequence), the first sparse subsequence = 9,2,14,12, which when applied to V gives 2,4,8,9,1,5,12,17,18,14,7,11, and the remaining a = 3 sorts give 2,1,8,9,4,5,12,7,18,14,17,11, and then 2,1,5,9,4,8,12,7,11,14,17,18. Finally, with a = 1, we get 1,2,4,5,7,8,9,11,12,14,17,18.
You might wonder, given that at the end we do a sort with no skips, why
this might be any faster than simply doing that final step as the only step at the beginning. Think of it as a comb
going through the sequence -- notice that in the earlier steps we're using coarse combs to get distant things in the
right order, using progressively finer combs until at the end our fine-tuning is dealing with a nearly-sorted sequence
needing little adjustment.
p = 0
x = len(V)  # find the length of V to choose the gap from a
for j in a:  # for every element in a (1, 3, 7, ...)
    if x >= j:  # if the length is greater than or equal to the current gap value
        p = j  # set p to j
So that finds the distance at which the first number in the list should be compared, but now I need to write something that keeps doing that until the distance is out of range, so it switches from 3 to 1 and then just checks the smaller distances until the list is sorted.
The sorting algorithm you're describing actually is called Combsort. In fact, the simpler bubblesort is a special case of combsort where the gap is always 1 and doesn't change.
Since you're stuck on how to start this, here's what I recommend:
Implement the bubblesort algorithm first. The logic is simpler and makes it much easier to reason about as you write it.
Once you've done that you have the important algorithmic structure in place, and from there it's just a matter of adding gap-length calculation into the mix. This means computing the gap length with your particular formula. You'll then modify the loop control index and the inner comparison index to use the calculated gap length.
After each iteration of the loop you decrease the gap length(in effect making the comb shorter) by some scaling amount.
The last step would be to experiment with different gap lengths and formulas to see how it affects algorithm efficiency.
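As a rough sketch of that structure (my own code, not a reference implementation), here are bubble-style passes driven by gaps from the 2^k - 1 sequence, largest gap first:
def gap_sort(V):
    n = len(V)
    # Build the gap sequence 1, 3, 7, 15, ... up to the length of the list.
    gaps = []
    k = 1
    while 2**k - 1 <= n:
        gaps.append(2**k - 1)
        k += 1
    # Apply the largest gap first, then progressively smaller ones.
    for gap in reversed(gaps):
        swapped = True
        while swapped:           # bubble-style passes with the current gap
            swapped = False
            for i in range(n - gap):
                if V[i] > V[i + gap]:
                    V[i], V[i + gap] = V[i + gap], V[i]
                    swapped = True
    return V

print(gap_sort([17, 4, 8, 2, 11, 5, 14, 9, 18, 12, 7, 1]))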