Sorting Technique Python


I'm trying to create a sorting technique that sorts a list of numbers. It compares two numbers: the first is the first number in the list, and the other is the number at index 2^k - 1.
2^k - 1 = [1,3,7, 15, 31, 63...]
For example, if I had a list [1, 4, 3, 6, 2, 10, 8, 19]
The length of this list is 8, so the program should find the largest number in the 2^k - 1 list that is less than 8; in this case it is 7.
So now it will compare the first number in the list (1) with the number at index 7 in the same list (19). If the first is greater than the second, they swap positions.
After this step, it will continue on to 4 and the 7th number after that, but that doesn't exist, so now it should compare with the 3rd number after 4 because 3 is the next number in 2^k - 1.
So it should compare 4 with 2 and swap if they are not in the right place. This should go on and on until I reach 1 in 2^k - 1, at which point the list will finally be sorted.
I need help getting started on this code.
So far, I've written a small piece of code that makes the 2^k - 1 list, but that's as far as I've gotten.
a = []
for i in range(10):
    a.append(2**(i+1) - 1)
print(a)
EXAMPLE:
Consider sorting the sequence V = 17,4,8,2,11,5,14,9,18,12,7,1. The skipping
sequence 1, 3, 7, 15, … yields r=7 as the biggest value which fits, so looking at V, the first sparse subsequence =
17,9, so as we pass along V we produce 9,4,8,2,11,5,14,17,18,12,7,1 after the first swap, and
9,4,8,2,1,5,14,17,18,12,7,11 after using r=7 completely. Using a=3 (the next smaller term in the skipping
sequence), the first sparse subsequence = 9,2,14,12, which when applied to V gives 2,4,8,9,1,5,12,17,18,14,7,11, and the remaining a = 3 sorts give 2,1,8,9,4,5,12,7,18,14,17,11, and then 2,1,5,9,4,8,12,7,11,14,17,18. Finally, with a = 1, we get 1,2,4,5,7,8,9,11,12,14,17,18.
You might wonder, given that at the end we do a sort with no skips, why
this might be any faster than simply doing that final step as the only step at the beginning. Think of it as a comb
going through the sequence -- notice that in the earlier steps we’re using coarse combs to get distant things in the
right order, using progressively finer combs until at the end our fine-tuning is dealing with a nearly-sorted sequence
needing little adjustment.
p = 0
x = len(V)  # finding out the length of V to find indexer in a
for j in a:  # for every element in a (1,3,7....)
    if x >= j:  # if the length is greater than or equal to current checking value
        p = j  # sets j as p
So that finds what distance it should compare the first number in the list with, but now I need to write something that keeps doing that until the distance is out of range, so it switches from 3 to 1 and then just checks the smaller distances until the list is sorted.

The sorting algorithm you're describing actually is called Combsort. In fact, the simpler bubblesort is a special case of combsort where the gap is always 1 and doesn't change.
Since you're stuck on how to start this, here's what I recommend:
Implement the bubblesort algorithm first. The logic is simpler and makes it much easier to reason about as you write it.
Once you've done that, you have the important algorithmic structure in place, and from there it's just a matter of adding the gap length calculation into the mix. This means computing the gap length with your particular formula. You'll then modify the loop control index and the inner comparison index to use the calculated gap length.
After each iteration of the loop you decrease the gap length (in effect making the comb finer) by some scaling amount.
The last step would be to experiment with different gap lengths and formulas to see how it affects algorithm efficiency.
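As a rough sketch of where those steps could lead, assuming the gaps come from the question's 2^k - 1 list (largest first) and that the final gap-1 pass is repeated until nothing swaps, exactly as in bubblesort. The names gap_sequence and comb_like_sort are made up for illustration:

```python
def gap_sequence(n):
    """Gaps of the form 2**k - 1 that are smaller than n, largest first."""
    gaps = []
    k = 1
    while 2**k - 1 < n:
        gaps.append(2**k - 1)
        k += 1
    return gaps[::-1]


def comb_like_sort(values):
    """Bubblesort-style passes with a shrinking gap; returns a new list."""
    items = list(values)
    for gap in gap_sequence(len(items)):
        while True:
            swapped = False
            for i in range(len(items) - gap):
                if items[i] > items[i + gap]:
                    items[i], items[i + gap] = items[i + gap], items[i]
                    swapped = True
            # One pass is enough for the coarse gaps; the gap-1 pass
            # repeats until a full pass makes no swaps.
            if gap > 1 or not swapped:
                break
    return items


print(comb_like_sort([17, 4, 8, 2, 11, 5, 14, 9, 18, 12, 7, 1]))
```

The intermediate lists will not necessarily match the worked example in the question, since that example sorts each sparse subsequence fully rather than making a single swap pass per gap.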

Related

Fast way to check consecutive subsequences for total

I have a list (up to 10,000 long) of numbers 0, 1, or 2.
I need to see how many consecutive subsequences have a total which is NOT 1. My current method, for each list, is:
cons = 0
for i in range(seqlen+1):
    for j in range(i+1, seqlen+1):
        if sum(twos[i:j]) != 1:
            cons += 1
So an example input would be:
[0, 1, 2, 0]
and the output would be
cons = 8
as the 8 working subsequences are:
[0] [2] [0] [1,2] [2, 0] [0, 1, 2] [1, 2, 0] [0, 1, 2, 0]
The issue is that simply going through all these subsequences (the i in range, j in range) takes almost more time than is allowed, and when the if statement is added, the code takes far too long to run on the server. (To be clear, this is only a small part of a larger problem, I'm not just asking for the solution to an entire problem). Anyway, is there any other way to check faster? I can't think of anything that wouldn't result in more operations needing to happen every time.

I think I see the problem: your terminology is incorrect / redundant. By definition, a sub-sequence is a series of consecutive elements.
Do not sum every candidate. Instead, identify every candidate whose sum is 1, and then subtract that total from the computed quantity of all sub-sequences (simple algebra).
All of the 1-sum candidates are of the regular expression form 0*10*: a 1 surrounded by any quantity of 0s on either or both sides.
Identify all such maximal-length strings. For instance, in
210002020001002011
you will pick out 1000, 000100, 01, and 1. For each string, compute the quantity of substrings that contain the 1 (a simple equation on the lengths of the runs of 0s on each side). Add up those quantities. Subtract from the total for the entire input. There's your answer.
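Here is a minimal sketch of that counting idea, under the assumption that the input contains only 0, 1 and 2. A window sums to 1 exactly when it holds a single 1 and otherwise only 0s, so each 1 with l zeros directly to its left and r zeros directly to its right accounts for (l + 1) * (r + 1) windows. The function name is hypothetical:

```python
def count_sum_not_one(seq):
    """Count contiguous windows of seq whose total is not 1, in O(n)."""
    n = len(seq)

    # left[i]: number of consecutive zeros immediately before index i
    left = [0] * n
    run = 0
    for i, value in enumerate(seq):
        left[i] = run
        run = run + 1 if value == 0 else 0

    # right[i]: number of consecutive zeros immediately after index i
    right = [0] * n
    run = 0
    for i in range(n - 1, -1, -1):
        right[i] = run
        run = run + 1 if seq[i] == 0 else 0

    sum_is_one = sum((left[i] + 1) * (right[i] + 1)
                     for i, value in enumerate(seq) if value == 1)
    return n * (n + 1) // 2 - sum_is_one


print(count_sum_not_one([0, 1, 2, 0]))  # 8, matching the example in the question
```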

Use the sliding window technique to solve this type of problem. Take two variables, first and last, to track the bounds of the window. You start with the sum equal to the first element. If the sum is larger than the required value, you subtract the element at 'first' from the sum and increment first by 1. If the sum is smaller than required, you add the next element at the 'last' pointer and increment last by 1. Every time the sum is equal to the required value, increment some counter.
As for NOT, count number of sub-sequence having '1' sum and then subtract from total number of sub-sequence possible, i.e. n * (n + 1) / 2

Find longest quasi-constant sub-sequence of a sequence

I had this test earlier today, and I tried to be too clever and hit a road block. Unfortunately I got stuck in this mental rut and wasted too much time, failing this portion of the test. I solved it afterward, but maybe y'all can help me get out of the initial rut I was in.
Problem definition:
An unordered and non-unique sequence A consisting of N integers (all positive) is given. A subsequence of A is any sequence obtained by removing none, some or all elements from A. The amplitude of a sequence is the difference between the largest and the smallest element in this sequence. The amplitude of the empty subsequence is assumed to be 0.
For example, consider the sequence A consisting of six elements such that:
A[0] = 1
A[1] = 7
A[2] = 6
A[3] = 2
A[4] = 6
A[5] = 4
A subsequence of array A is called quasi-constant if its amplitude does not exceed 1. In the example above, the subsequences [1,2], [6,6], and [6,6,7] are quasi-constant. Subsequence [6, 6, 7] is the longest possible quasi-constant subsequence of A.
Now, find a solution that, given a non-empty zero-indexed array A consisting of N integers, returns the length of the longest quasi-constant subsequence of array A. For example, given sequence A outlined above, the function should return 3, as explained.
Now, I solved this in python 3.6 after the fact using a sort-based method with no classes (my code is below), but I didn't initially want to do that as sorting on large lists can be very slow. It seemed this should have a relatively simple formulation as a breadth-first tree-based class, but I couldn't get it right. Any thoughts on this?
My class-less sort-based solution:
def amp(sub_list):
    if len(sub_list) < 2:
        return 0
    else:
        return max(sub_list) - min(sub_list)
def solution(A):
    A.sort()
    longest = 0
    idxStart = 0
    idxEnd = idxStart + 1
    while idxEnd <= len(A):
        tmp = A[idxStart:idxEnd]
        if amp(tmp) < 2:
            idxEnd += 1
            if len(tmp) > longest:
                longest = len(tmp)
        else:
            idxStart = idxEnd
            idxEnd = idxStart + 1
    return longest

As Andrey Tyukin pointed out, you can solve this problem in O(n) time, which is better than the O(n log n) time you'd likely get from either sorting or any kind of tree-based solution. The trick is to use dictionaries to count the number of occurrences of each number in the input, and use the counts to figure out the longest subsequence.
I had a similar idea to his, but I had thought of a slightly different implementation. After a little testing, it looks like my approach is quite a bit faster, so I'm posting it as my own answer. It's quite short!
from collections import Counter
def solution(seq):
    if not seq:  # special case for empty input sequence
        return 0
    counts = Counter(seq)
    return max(counts[x] + counts[x+1] for x in counts)
I suspect this is faster than Andrey's solution because the running time for both of our solutions is really O(n) + O(k), where k is the number of distinct values in the input (and n is the total number of values in the input). My code handles the O(n) part very efficiently by handing off the sequence to the Counter constructor, which is implemented in C. It is likely to be a bit slower (on a per-item basis) to deal with the O(k) part, since it needs a generator expression. Andrey's code does the reverse (it runs slower Python code for the O(n) part, and uses faster builtin C functions for the O(k) part). Since k is always less than or equal to n (perhaps a lot less if the sequence has a lot of repeated values), my code is faster overall. Both solutions are still O(n) though, and both should be much better than sorting for large inputs.

I don't know how BFS is supposed to help here.
Why not simply run once through the sequence and count how many elements every possible quasi-constant subsequence would have?
from collections import defaultdict
def longestQuasiConstantSubseqLength(seq):
    d = defaultdict(int)
    for s in seq:
        d[s] += 1
        d[s+1] += 1
    return max(d.values() or [0])
s = [1,7,6,2,6,4]
print(longestQuasiConstantSubseqLength(s))
prints:
3
as expected.
Explanation: Every non-constant quasi-constant subsequence is uniquely identified by the greatest number that it contains (there can be only two, take the greater one). Now, if you have a number s, it can either contribute to the quasi-constant subsequence that has s or s + 1 as the greatest number. So, just add +1 to the subsequences identified by s and s + 1. Then output the maximum of all counts.
You can't get it faster than O(n), because you have to look at every entry of the input sequence at least once.

Evenly distribute within a list (Google Foobar: Maximum Equality)

This question comes from Google Foobar, and my code passes all but the last test, with the input/output hidden.
The prompt
In other words, choose two elements of the array, x[i] and x[j]
(i distinct from j) and simultaneously increment x[i] by 1 and decrement
x[j] by 1. Your goal is to get as many elements of the array to have
equal value as you can.
For example, if the array was [1,4,1] you could perform the operations
as follows:
Send a rabbit from the 1st car to the 0th: increment x[0], decrement
x[1], resulting in [2,3,1] Send a rabbit from the 1st car to the 2nd:
increment x[2], decrement x[1], resulting in [2,2,2].
All the elements of the array are equal now, and you've got a
strategy to report back to Beta Rabbit!
Note that if the array was [1,2], the maximum possible number of equal
elements we could get is 1, as the cars could never have the same
number of rabbits in them.
Write a function answer(x), which takes the array of integers x and
returns the maximum number of equal array elements that we can get, by
doing the above described command as many times as needed.
The number of cars in the train (elements in x) will be at least 2,
and no more than 100. The number of rabbits that want to share a car
(each element of x) will be an integer in the range [0, 1000000].
My code
from collections import Counter
def most_common(lst):
    data = Counter(lst)
    return data.most_common(1)[0][1]
def answer(x):
    """The goal is to take all of the rabbits in list x and distribute
    them equally across the original list elements."""
    total = sum(x)
    length = len(x)
    # Find out how many are left over when distributing naively.
    div, mod = divmod(total, length)
    # Because of the variable size of the list, the remainder
    # might be greater than the length of the list.
    # I just realized this is unnecessary.
    while mod > length:
        div += length
        mod -= length
    # Create a new list the size of x with the base number of rabbits.
    result = [div] * length
    # Distribute the leftovers from earlier across the list.
    for i in xrange(mod):
        result[i] += 1
    # Return the most common element.
    return most_common(result)
It runs well under my own testing purposes, handling one million tries in ten or so seconds. But it fails under an unknown input.
Have I missed something obvious, or did I make an assumption I shouldn't have?

Sorry, but your code doesn't work in my testing. I fed it [0, 0, 0, 0, 22] and got back a list of [5, 5, 4, 4, 4] for an answer of 3; the maximum would be 4 identical cars, with the original input being one such example. [4, 4, 4, 4, 6] would be another. I suspect that's your problem, and that there are quite a few other such examples in the data base.
For N cars, the maximum would be either N (if the rabbit population is divisible by the number of cars) or N-1. This seems so simple that I fear I'm missing a restriction in the problem. It didn't ask for a balanced population, just as many car populations as possible should be equal. In short:
def answer(census):
    size = len(census)
    return size if sum(census) % size == 0 else (size-1)
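To see why N-1 is always reachable, here is a small illustrative sketch that builds one target arrangement: every car except the last gets the same share, and the last car absorbs whatever is left over. The helper name witness is hypothetical, and the sketch assumes (as the answer does) that any arrangement with the same total can be reached by moving rabbits one at a time:

```python
def witness(census):
    """Return one arrangement with the same total where at least
    len(census) - 1 cars hold the same number of rabbits."""
    size = len(census)
    base, extra = divmod(sum(census), size)
    # size - 1 cars hold `base`; the last car takes the remainder too.
    return [base] * (size - 1) + [base + extra]


print(witness([0, 0, 0, 0, 22]))  # [4, 4, 4, 4, 6]
print(witness([1, 4, 1]))         # [2, 2, 2]
```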

Is it possible to calculate the number of count inversions using quicksort?

I have already solved the problem using mergesort; now I am wondering whether it is possible to calculate the number using quicksort. I also coded the quicksort, but I don't know how to do the counting. Here is my code:
def Merge_and_Count(AL, AR):
    count = 0
    i = 0
    j = 0
    A = []
    for index in range(0, len(AL) + len(AR)):
        if i < len(AL) and j < len(AR):
            if AL[i] > AR[j]:
                A.append(AR[j])
                j = j + 1
                # every element still waiting in AL is greater than AR[j]
                count = count + len(AL) - i
            else:
                A.append(AL[i])
                i = i + 1
        elif i < len(AL):
            A.append(AL[i])
            i = i + 1
        elif j < len(AR):
            A.append(AR[j])
            j = j + 1
    return (count, A)
def Sort_and_Count(Arrays):
    if len(Arrays) <= 1:  # also covers empty input, which would otherwise recurse forever
        return (0, Arrays)
    list1 = Arrays[:len(Arrays) // 2]
    list2 = Arrays[len(Arrays) // 2:]
    (LN, list1) = Sort_and_Count(list1)
    (RN, list2) = Sort_and_Count(list2)
    (M, Arrays) = Merge_and_Count(list1, list2)
    return (LN + RN + M, Arrays)

Generally no, because during the partitioning, when you move a value to its correct side of the pivot, you don't know how many of the values you're moving it past are smaller than it and how many are larger. So, as soon as you do that you've lost information about the number of inversions in the original input.

I have come across this problem a few times. On the whole, I think it should still be possible to use quicksort to compute the inversion count, as long as we modify the original quicksort algorithm a little. (I have not verified this yet, sorry for that.)
Consider the array 3, 6, 2, 5, 4, 1. Suppose we use 3 as the pivot. The most voted answer is right that the exchanges might mess up the order of the other numbers. However, we can do it differently by introducing a new temporary array:
Iterate over the array a first time. During the iteration, move all the numbers less than 3 to the temporary array. For each such number, also record how many numbers larger than 3 come before it. In this case, the number 2 has one such number (6) before it, and the number 1 has three (6, 5, 4) before it. This can be done by simple counting.
Then copy 3 into the temporary array.
Then iterate over the array again and move the numbers larger than 3 into the temporary array. At the end we get 2 1 3 6 5 4.
The question is how many inversion pairs are lost during this process. The number is the sum of all the counts recorded in the first step, plus the count of numbers less than the pivot that came after it. At that point we have counted every inversion in which one number is >= the pivot and the other is < the pivot. Because the relative order inside each part is preserved, we can then recursively deal with the left part and the right part.
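A minimal sketch of this idea, assuming the first element is used as the pivot and the partition is stable (it builds new lists instead of swapping in place). Elements equal to the pivot are handled as a third group, which the description above does not spell out. The function name is made up for illustration:

```python
def count_inversions_quick(items):
    """Count pairs (i, j) with i < j and items[i] > items[j]."""
    if len(items) < 2:
        return 0
    pivot = items[0]
    less, greater = [], []
    seen_equal = seen_greater = 0
    cross = 0  # inversions whose two members land in different groups
    for x in items:
        if x < pivot:
            # every earlier element >= pivot forms an inversion with x
            cross += seen_equal + seen_greater
            less.append(x)
        elif x == pivot:
            # every earlier element > pivot forms an inversion with x
            cross += seen_greater
            seen_equal += 1
        else:
            seen_greater += 1
            greater.append(x)
    # order inside each group is unchanged, so recursion sees the rest
    return cross + count_inversions_quick(less) + count_inversions_quick(greater)


print(count_inversions_quick([3, 6, 2, 5, 4, 1]))  # 10
```

Like plain quicksort, this degrades to quadratic time (and deep recursion) on already sorted input, so the mergesort version remains the safer choice for large lists.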

Detect period of unknown source

How can I detect repeating digits in an infinite sequence? I tried the Floyd and Brent cycle-detection algorithms but got nowhere...
I have a generator that yields numbers ranging from 0 to 9 (inclusive) and I have to recognize a period in it.
Example test case:
import itertools
# of course this is a fake one just to offer an example
def source():
    return itertools.cycle((1, 0, 1, 4, 8, 2, 1, 3, 3, 1))
>>> gen = source()
>>> period(gen)
(1, 0, 1, 4, 8, 2, 1, 3, 3, 1)

Empirical methods
Here's a fun take on the problem. The more general form of your question is this:
Given a repeating sequence of unknown length, determine the period of
the signal.
The process to determine the repeating frequencies is known as the Fourier Transform. In your example case the signal is clean and discrete, but the following solution will work even with continuous noisy data! The FFT will try to duplicate the frequencies of the input signal by approximating them in the so-called "wave-space" or "Fourier-space". Basically a peak in this space corresponds to a repeating signal. The period of your signal is related to the longest wavelength that is peaked.
import itertools
# of course this is a fake one just to offer an example
def source():
    return itertools.cycle((1, 0, 1, 4, 8, 2, 1, 3, 3, 2))
import pylab as plt
import numpy as np
# Generate some test data, i.e. our "observations" of the signal
N = 300
vals = source()
X = np.array([next(vals) for _ in range(N)])
# Compute the FFT
W = np.fft.fft(X)
freq = np.fft.fftfreq(N,1)
# Look for the longest signal that is "loud"
threshold = 10**2
idx = np.where(abs(W)>threshold)[0][-1]
max_f = abs(freq[idx])
print("Period estimate: ", 1/max_f)
This gives the correct answer for this case, 10, though if N didn't divide the cycles cleanly you would get a close estimate. We can visualize this via:
plt.subplot(211)
plt.scatter([max_f,], [np.abs(W[idx]),], s=100,color='r')
plt.plot(freq[:N//2], abs(W[:N//2]))
plt.xlabel(r"$f$")
plt.subplot(212)
# skip index 0, where freq is 0 and 1/f is undefined
plt.plot(1.0/freq[1:N//2], abs(W[1:N//2]))
plt.scatter([1/max_f,], [np.abs(W[idx]),], s=100,color='r')
plt.xlabel(r"$1/f$")
plt.xlim(0,20)
plt.show()

Evgeny Kluev's answer provides a way to get an answer that might be right.
Definition
Let's assume you have some sequence D that is a repeating sequence. That is there is some sequence d of length L such that: D_i = d_{i mod L}, where t_i is the ith element of sequence t that is numbered from 0. We will say sequence d generates D.
Theorem
Given a sequence D which you know is generated by some finite sequence t, and given some d, it is impossible to decide in finite time whether d generates D.
Proof
Since we are only allowed a finite time we can only access a finite number of elements of D. Let us suppose we access the first F elements of D. We chose the first F because if we are only allowed to access a finite number, the set containing the indices of the elements we've accessed is finite and hence has a maximum. Let that maximum be M. We can then let F = M+1, which is still a finite number.
Let L be the length of d, and suppose that D_i = d_{i mod L} for i < F. There are two possibilities for D_F: it is either the same as d_{F mod L} or it is not. In the former case d seems to work, but in the latter case it does not. We cannot know which case is true until we access D_F. This will however require accessing F+1 elements, but we are limited to F element accesses.
Hence, for any F we won't have enough information to decide whether d generates D and therefore it is impossible to know in finite time whether d generates D.
Conclusions
It is possible to know in finite time that a sequence d does not generate D, but this doesn't help you, since you want to find a sequence d that does generate D, and this involves, among other things, being able to prove that some sequence generates D.
Unless you have more information about D this problem is unsolvable. One bit of information that will make this problem decidable is some upper bound on the length of the shortest d that generates D. If you know the function generating D only has a known amount of finite state you can calculate this upper bound.
Hence, my conclusion is that you cannot solve this problem unless you change the specification a bit.

I have no idea about proper algorithms to apply here, but my understanding also is that you can never know for sure that you've detected a period if you have consumed only a finite number of terms. Anyway, here's what I've come up with, this is a very naive implementation, more to educate from the comments than to provide a good solution (I guess).
def guess_period(source, minlen=1, maxlen=100, trials=100):
    for n in range(minlen, maxlen+1):
        p = [j for i, j in zip(range(n), source)]
        if all([j for i, j in zip(range(n), source)] == p
               for k in range(trials)):
            return tuple(p)
    return None
This one, however, "forgets" the initial order and returns a tuple that is a cyclic permutation of the actual period:
In [101]: guess_period(gen)
Out[101]: (0, 1, 4, 8, 2, 1, 3, 3, 1, 1)
To compensate for this, you'll need to keep track of the offset.
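One way to sidestep the offset problem, sketched here under two assumptions that are not in the original code: the cycle starts at the very first element, and a fixed-size prefix is enough evidence. The idea is to read the prefix into a buffer once and then test candidate lengths against the buffer rather than against the live generator. The name guess_period_buffered and the sample parameter are hypothetical:

```python
import itertools


def guess_period_buffered(source, sample=1000, maxlen=100):
    """Return the shortest prefix of length <= maxlen whose repetition
    reproduces the first `sample` items of source, or None."""
    buf = list(itertools.islice(source, sample))
    for n in range(1, min(maxlen, len(buf) // 2) + 1):
        # n is consistent if every item equals the item n places later
        if all(buf[i] == buf[i + n] for i in range(len(buf) - n)):
            return tuple(buf[:n])
    return None


gen = itertools.cycle((1, 0, 1, 4, 8, 2, 1, 3, 3, 1))
print(guess_period_buffered(gen))  # (1, 0, 1, 4, 8, 2, 1, 3, 3, 1)
```

As with the version above, this is still only a guess: a finite sample can never prove that the pattern continues.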

Since your sequence is not of the form X_{n+1} = f(X_n), Floyd's or Brent's algorithms are not directly applicable to your case. But they may be extended to do the task:
1. Use Floyd's or Brent's algorithm to find some repeating element of the sequence.
2. Find the next sequence element with the same value. The distance between these elements is a supposed period (k).
3. Remember the next k elements of the sequence.
4. Find the next occurrence of this k-element subsequence.
5. If the distance between the subsequences is greater than k, update k and continue with step 3.
6. Repeat step 4 several times to verify the result. If the maximum length of the repeating sub-sequence is known a priori, use an appropriate number of repetitions. Otherwise use as many repetitions as possible, because each repetition increases confidence in the result.
If the sequence cycling starts from the first element, ignore step 1 and start from step 2 (find next sequence element equal to the first element).
If the sequence cycling does not start from the first element, it is possible that Floyd's or Brent's algorithm finds some repeating element of the sequence that does not belong to the cycle. So it is reasonable to limit the number of iterations in steps 2 and 4, and if this limit is exceeded, continue with the step 1.
