I know that in Python, variables are passed by giving a copy of the reference to the object. But I do not understand why in the following piece of code I wrote, the function Partition does not change the elements of arr.
def Partition(arr, lo, hi):
    pivot = arr[lo]
    i = lo
    j = hi
    while(True):
        while(arr[i] < pivot):
            i += 1
            if i == hi: break
        while(arr[j] > pivot):
            j -= 1
            if j == lo: break
        if i >= j : break #check if ptrs cross
        arr[i], arr[j] = arr[j], arr[i]
    #swap lo and j
    arr[lo], arr[j] = arr[j], arr[lo]
    return j
def Sort(arr, start, end):
    if (end <= start): return
    right = Partition(arr, start, end)
    Sort(arr, start, right-1)
    Sort(arr, right+1, end)
Your Partition function has a logic problem.
If you follow it with a debugger, you will see that it always restores the array to its initial state before returning. The function does modify the array; the problem is that, after fiddling with it for a while, it puts everything back exactly as it was on entry.
Do you use debugging tools? If not, start doing so now.
If so, put a breakpoint on the return j statement and examine the array; you will see what I mean.
You are trying to implement a Hoare partitioning, right?
I think you got it somewhat mixed up. The issue is that the inner loops test against the pivot before moving i or j, so after a swap you end up comparing the elements you just swapped again.
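For comparison, here is a minimal sketch of Hoare partitioning in the usual textbook form, where each pointer moves before it is tested. The names hoare_partition and hoare_sort are illustrative and not from the question; bounds are inclusive, as in the original code. Note that this variant returns a split point rather than the pivot's final index, so the recursion covers lo..split and split+1..hi.

```python
def hoare_partition(arr, lo, hi):
    """Partition arr[lo..hi] (inclusive) around arr[lo]; return the split index."""
    pivot = arr[lo]
    i, j = lo - 1, hi + 1
    while True:
        # Move each pointer first, then test it (do-while style scan).
        i += 1
        while arr[i] < pivot:
            i += 1
        j -= 1
        while arr[j] > pivot:
            j -= 1
        if i >= j:  # pointers met or crossed
            return j
        arr[i], arr[j] = arr[j], arr[i]


def hoare_sort(arr, lo=0, hi=None):
    """Sort arr in place; the pivot is not excluded from the recursion."""
    if hi is None:
        hi = len(arr) - 1
    if lo < hi:
        split = hoare_partition(arr, lo, hi)
        hoare_sort(arr, lo, split)
        hoare_sort(arr, split + 1, hi)


data = [1, 3, 4, 7, 5, 8, 3, 3]
hoare_sort(data)
print(data)
```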
This seems to be a bug:
Assume arr = [1,3,4,7,5,8], lo=3, hi=6
def Partition(arr, lo, hi):
    pivot = arr[lo]  <- this is arr[3] = 7
    i = lo  <- i = 3
    j = hi
    while(True):
        while(arr[i] < pivot):  <- arr[3] = 7 so condition fails hence no swap
Lately I've been comparing different sorting algorithms in Python. I noticed that my quicksort doesn't handle inputs with many repeated values well.
def compare_asc(a, b):
    return a <= b
def partition(a, p, r, compare):
    pivot = a[r]
    i = p-1
    for j in range(p, r):
        if compare(a[j], pivot):
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i+1], a[r] = a[r], a[i+1]
    return i + 1
def part_quick_sort(a, p, r, compare):
    if p < r:
        q = partition(a, p, r, compare)
        part_quick_sort(a, p, q-1, compare)
        part_quick_sort(a, q+1, r, compare)
def quick_sort(a, compare):
    part_quick_sort(a, 0, len(a)-1, compare)
    return a
Then I test this
import numpy as np
from timeit import default_timer as timer
import sys
test_list1 = np.random.randint(-10000, 10000, size=10000).tolist()
start = timer()
test_list1 = quick_sort(test_list1, compare_asc)
elapsed = timer() - start
print(elapsed)
test_list2 = np.random.randint(0, 2, size=10000).tolist()
start = timer()
test_list2 = quick_sort(test_list2, compare_asc)
elapsed = timer() - start
print(elapsed)
In this example I get RecursionError: maximum recursion depth exceeded in comparison, so I added sys.setrecursionlimit(1000000), and after that I get this output:
0.030029324000224733
5.489867554000284
Can anyone explain why it throws this recursion depth error only when sorting the second list? And why is there such a big time difference?
Here's a hint: pass a list where all the elements are the same, and watch what it does line by line. It will take time quadratic in the number of elements, and recurse to a level approximately equal to the number of elements.
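To make the hint concrete, here is a small sketch that runs the same "pivot is the last element, compare with <=" partitioning as the question, but also reports how deep the recursion goes. The function name lomuto_depth and the depth parameter are illustrative additions, not part of the original code.

```python
def lomuto_depth(a, p, r, depth=1):
    """Sort a[p..r] in place like the question's quicksort.

    Returns the deepest recursion level reached (the first call is level 1).
    """
    if p >= r:
        return depth
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    q = i + 1
    return max(lomuto_depth(a, p, q - 1, depth + 1),
               lomuto_depth(a, q + 1, r, depth + 1))


n = 200
equal_depth = lomuto_depth([7] * n, 0, n - 1)
print(equal_depth)  # 200: one recursion level per element
```

With all elements equal, every element compares <= the pivot, so the pivot always lands at the right end and each call peels off just one element. A list drawn from only two distinct values behaves almost as badly, which is consistent with the timings in the question.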
The usual quicksort partition implementations proceed from both ends, so that in the all-equal case the list slice is approximately cut in half. You can get decent performance in this case for your "only look left-to-right" approach, but the clearest way to do so is to partition into three regions: "less than", "equal", and "greater than".
That can be done in a single left-to-right pass, and is usually called the "Dutch national flag problem". As the text on the linked page says,
The solution to this problem is of interest for designing sorting algorithms; in particular, variants of the quicksort algorithm that must be robust to repeated elements need a three-way partitioning function ...
CODE
For concreteness, here's a complete implementation doing one-pass "left to right" single-pivot 3-way partitioning. It also incorporates other well-known changes needed to make a quicksort robust for production use. Note:
You cannot create a pure quicksort that avoids worst-case quadratic time. The best you can do is average-case O(N*log(N)) time, and (as below, for one way) make worst-case O(N**2) time unlikely.
You can (as below) guarantee worst-case logarithmic recursion depth.
In this approach, a list of all-equal elements is not a bad case, but a very good case: the partitioning routine is called just once total.
The code:
from random import randrange
def partition(a, lo, hi, pivot):
    i = L = lo
    R = hi
    # invariants:
    # a[lo:L] < pivot
    # a[L:i] == pivot
    # a[i:R] unknown
    # a[R:hi] > pivot
    while i < R:
        elt = a[i]
        if elt < pivot:
            a[L], a[i] = elt, a[L]
            L += 1
            i += 1
        elif elt > pivot:
            R -= 1
            a[R], a[i] = elt, a[R]
        else:
            i += 1
    return L, R
def qsort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while True: # sort a[lo:hi] in place
        if hi - lo <= 1:
            return
        # select pivot at random; else it's easy to construct
        # inputs that systematically require quadratic time
        L, R = partition(a, lo, hi, a[randrange(lo, hi)])
        # must recur on only the shorter chunk to guarantee
        # worst-case recursion depth is logarithmic in hi-lo
        if L - lo <= hi - R:
            qsort(a, lo, L)
            # loop to do qsort(a, R, hi)
            lo = R
        else:
            qsort(a, R, hi)
            # loop to do qsort(a, lo, L)
            hi = L
I have done a variation of my merge sort algorithm in python, based on what I've learnt from the CLRS book, and compared it with the implementation done on the introductory computer science book by MIT. I cannot find the problem in my algorithm, and the IDLE gives me an index out of range although everything looks fine to me. I'm unsure if this is due to some confusion in borrowing ideas from the MIT algorithm (see below).
lista = [1,2,3,1,1,1,1,6,7,12,2,7,7,67,4,7,9,6,6,3,1,14,4]
def merge(A, p, q, r):
    q = (p+r)/2
    L = A[p:q+1]
    R = A[q+1:r]
    i = 0
    j = 0
    for k in range(len(A)):
        #if the list R runs out of space and L[i] has nothing to compare
        if i+1 > len(R):
            A[k] = L[i]
            i += 1
        elif j+1 > len(L):
            A[k] = R[j]
            j += 1
        elif L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        elif R[j] <= L[i]:
            A[k] = R[j]
            j += 1
        #when both the sub arrays have run out and all the ifs and elifs done,
        # the for loop has effectively ended
    return A
def mergesort(A, p, r):
    """A is the list, p is the first index and r is the last index for which
    the portion of the list is to be sorted."""
    q = (p+r)/2
    if p<r:
        mergesort(A, p, q)
        mergesort(A, q+1, r)
        merge (A, p, q, r)
    return A
print mergesort(lista, 0, len(lista)-1)
I have followed the pseudocode in CLRS as closely as I could, just without using the "infinity value" at the end of L and R, which would continue to compare (is this less efficient?). I tried to incorporate ideas like that in the MIT book, which is to simply copy down the remaining L or R list to A, to mutate A and return a sorted list. However, I can't seem to find what has gone wrong with it. Also, I don't get why the pseudocode requires a 'q' as an input, given that q would be calculated as (p+r)/2 for the middle index anyway. And why is there a need to put p
On the other hand, from the MIT book, we have something that looks really elegant.
def merge(left, right, compare):
    """Assumes left and right are sorted lists and
    compare defines an ordering on the elements.
    Returns a new sorted(by compare) list containing the
    same elements as(left + right) would contain.
    """
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j]):
            result.append(left[i])
            i += 1
        else :
            result.append(right[j])
            j += 1
    while (i < len(left)):
        result.append(left[i])
        i += 1
    while (j < len(right)):
        result.append(right[j])
        j += 1
    return result
import operator
def mergeSort(L, compare = operator.lt):
    """Assumes L is a list, compare defines an ordering
    on elements of L.
    Returns a new sorted list containing the same elements as L"""
    if len(L) < 2:
        return L[: ]
    else :
        middle = len(L) //2
        left = mergeSort(L[: middle], compare)
        right = mergeSort(L[middle: ], compare)
        return merge(left, right, compare)
Where could I have gone wrong?
Also, I think the key difference in the MIT implementation is that it creates a new list instead of mutating the original list. This makes it quite difficult for me to understand mergesort, because I found the CLRS explanation quite clear, by understanding it in terms of different layers of recursion occurring to sort the most minute components of the original list (the list of length 1 that needs no sorting), thus "storing" the results of recursion within the old list itself.
However, thinking again, is it right to say that in the MIT algorithm this role is played by the "result" list returned by each recursive call, which is in turn combined by its caller?
Thank you!
The fundamental difference between your code and the MIT code is the conditional statement in the mergesort function. Where your if statement is:
if p<r:
theirs is:
if len(L) < 2:
This means that if you were to have, at any point in the recursive call tree, a list with len(A) == 1, it would still call merge on a list of size 1 or even 0. You can see that this causes problems in the merge function, because your L, R, or both sublists can then end up being of size 0, which causes an out-of-bounds index error.
Your problem could then be easily fixed by changing your if statement to something similar to theirs, like len(A) < 2 or r-p < 2.
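As a point of comparison, here is a minimal sketch of an in-place merge in the CLRS style without sentinels, assuming inclusive bounds p..r as in the question. It differs from the question's merge in three places: the right run is sliced as A[q+1:r+1] so that A[r] is included, the loop writes only to positions p..r rather than range(len(A)), and the exhaustion checks test i against the left run and j against the right run. It also uses // so that q stays an integer under Python 3.

```python
def merge(A, p, q, r):
    """Merge the sorted runs A[p..q] and A[q+1..r] (inclusive) in place."""
    left = A[p:q + 1]
    right = A[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        # Take from left if right is exhausted, or if left's head is smaller.
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]
            i += 1
        else:
            A[k] = right[j]
            j += 1


def mergesort(A, p, r):
    """Sort A[p..r] (inclusive bounds) in place and return A."""
    if p < r:
        q = (p + r) // 2
        mergesort(A, p, q)
        mergesort(A, q + 1, r)
        merge(A, p, q, r)
    return A


lista = [1, 2, 3, 1, 1, 1, 1, 6, 7, 12, 2, 7, 7, 67, 4, 7, 9, 6, 6, 3, 1, 14, 4]
print(mergesort(lista, 0, len(lista) - 1))
```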
I have a quicksort program here, but there seems to be a problem with the result. I think there must have been some issue in the areas highlighted below when referencing some values. Any suggestions?
#where l represents low, h represents high
def quick(arr,l,h):
    #is this the correct array for quicksorting?
    if len(x[l:h]) > 1:
        #r is pivot POSITION
        r = h
        #R is pivot ELEMENT
        R = arr[r]
        i = l-1
        for a in range(l,r+1):
            if arr[a] <= arr[r]:
                i+=1
                arr[i], arr[a] = arr[a], arr[i]
        #should I take these values? Note that I have repeated elements below, which is what I want to deal with
        quick(arr,l,arr.index(R)-1)
        quick(arr,arr.index(R)+arr.count(R),h)
x = [6,4,2,1,7,8,5,3]
quick(x,0,len(x)-1)
print(x)
Please check this; I think you will find your answer here.
def partition(array, begin, end):
    pivot = begin
    for i in range(begin+1, end+1):
        if array[i] <= array[begin]:
            pivot += 1
            array[i], array[pivot] = array[pivot], array[i]
    array[pivot], array[begin] = array[begin], array[pivot]
    return pivot
def quicksort(array, begin=0, end=None):
    if end is None:
        end = len(array) - 1
    if begin >= end:
        return
    pivot = partition(array, begin, end)
    quicksort(array, begin, pivot-1)
    quicksort(array, pivot+1, end)
array = [6,4,2,1,7,8,5,3]
quicksort(array)
print (array)
#should I take these values? Note that I have repeated elements below, which is what I want to deal with
quick(arr,l,arr.index(R)-1)
quick(arr,arr.index(R)+arr.count(R),h)
You seem to be assuming that the values equal to the pivot element are already consecutive. This assumption is probably wrong for your current implementation. Test it e.g. by outputting the full list before recursing.
To make the assumption true, partition into three instead of just two groups, as described at Wikipedia.
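The suggested test can be sketched as follows. The helper name partition_pass is illustrative; it runs only the partitioning loop from the question on a small list with repeated values, then looks at the slice that arr.index(R) and arr.count(R) would treat as "the run of pivots".

```python
def partition_pass(arr, l, h):
    """Run only the question's partitioning loop on arr[l..h] (inclusive)."""
    i = l - 1
    for a in range(l, h + 1):
        if arr[a] <= arr[h]:
            i += 1
            arr[i], arr[a] = arr[a], arr[i]
    return arr


pivot = 5
data = partition_pass([5, 1, 5, 2, 5], 0, 4)
first = data.index(pivot)
run = data[first:first + data.count(pivot)]
print(data)  # [5, 1, 5, 2, 5]
print(run)   # [5, 1, 5]
```

The slice contains a 1, so the values equal to the pivot are not consecutive here, and the two recursive calls computed from index and count skip elements that still need sorting.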
I'm trying to implement quicksort in Python. However, my code doesn't sort properly (not quite). For example, on the input array [5,3,4,2,7,6,1], my code outputs [1,2,3,5,4,6,7]. So, the end result transposes the 4 and 5. I admit I am a bit rusty on Python as I've been studying ML (and was fairly new to Python before that). I'm aware of other Python implementations of quicksort, and other similar questions on Stack Overflow about Python and quicksort, but I am trying to understand what is wrong with this chunk of code that I wrote myself:
#still broken 'quicksort'
def partition(array):
    pivot = array[0]
    i = 1
    for j in range(i, len(array)):
        if array[j] < array[i]:
            temp = array[i]
            array[i] = array[j]
            array[j] = temp
            i += 1
    array[0] = array[i]
    array[i] = pivot
    return array[0:(i)], pivot, array[(i+1):(len(array))]
def quick_sort(array):
    if len(array) <= 1: #if i change this to if len(array) == 1 i get an index out of bound error
        return array
    low, pivot, high = partition(array)
    #quick_sort(low)
    #quick_sort(high)
    return quick_sort(low) + [pivot] + quick_sort(high)
array = [5,3,4,2,7,6,1]
print quick_sort(array)
# prints [1,2,3,5,4,6,7]
I'm a little confused about what the algorithm's connection to quicksort is. In quicksort, you typically compare all entries against a pivot, so you get a lower and higher group; the quick_sort function clearly expects your partition function to do this.
However, in the partition function, you never compare anything against the value you name pivot. All comparisons are between index i and j, where j is incremented by the for loop and i is incremented if an item was found out of order. Those comparisons include checking an item against itself. That algorithm is more like a selection sort with a complexity slightly worse than a bubble sort. So you get items bubbling left as long as there are enough items to the left of them, with the first item finally dumped after where the last moved item went; since it was never compared against anything, we know this must be out of order if there are items left of it, simply because it replaced an item that was in order.
Thinking a little more about it, the items are only partially ordered, since you do not return to an item once it has been swapped to the left, and it was only checked against the item it replaced (now found to have been out of order). I think it is easier to write the intended function without index wrangling:
def partition(inlist):
    i=iter(inlist)
    pivot=next(i)
    low,high=[],[]
    for item in i:
        if item<pivot:
            low.append(item)
        else:
            high.append(item)
    return low,pivot,high
You might find these reference implementations helpful while trying to understand your own.
Returning a new list:
def qsort(array):
    if len(array) < 2:
        return array
    head, *tail = array
    less = qsort([i for i in tail if i < head])
    more = qsort([i for i in tail if i >= head])
    return less + [head] + more
Sorting a list in place:
def quicksort(array):
    _quicksort(array, 0, len(array) - 1)
def _quicksort(array, start, stop):
    if stop - start > 0:
        pivot, left, right = array[start], start, stop
        while left <= right:
            while array[left] < pivot:
                left += 1
            while array[right] > pivot:
                right -= 1
            if left <= right:
                array[left], array[right] = array[right], array[left]
                left += 1
                right -= 1
        _quicksort(array, start, right)
        _quicksort(array, left, stop)
Generating sorted items from an iterable:
def qsort(sequence):
    iterator = iter(sequence)
    try:
        head = next(iterator)
    except StopIteration:
        pass
    else:
        try:
            tail, more = chain(next(iterator), iterator), []
            yield from qsort(split(head, tail, more))
            yield head
            yield from qsort(more)
        except StopIteration:
            yield head
def chain(head, iterator):
    yield head
    yield from iterator
def split(head, tail, more):
    for item in tail:
        if item < head:
            yield item
        else:
            more.append(item)
If the pivot needs to stay in its initial position (because it is the lowest value), you swap it with some other element anyway.
Read the Fine Manual :
Quick sort explanation and python implementation :
http://interactivepython.org/courselib/static/pythonds/SortSearch/TheQuickSort.html
Sorry, this should be a comment, but it has too complicated structure for a comment.
See what happens for array being [7, 8]:
pivot = 7
i = 1
for loop does nothing
array[0] becomes array[i] which is 8
array[i] becomes pivot which is 7
you return array[0:1], pivot, and array[2:2], which are [8], 7, and [] (the third subexpression contributes nothing), so the concatenated result is [8, 7]...
If you explicitly include the returned pivot in concatenation, you should skip it in the array returned.
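The trace above can be reproduced with a short sketch. The partition function here is the one from the question, with the indentation as it is understood from the trace (i is advanced only when a swap happens); that indentation is an assumption.

```python
def partition(array):
    """The question's partition, reproduced to replay the [7, 8] trace."""
    pivot = array[0]
    i = 1
    for j in range(i, len(array)):
        if array[j] < array[i]:  # note: never compares against pivot
            array[i], array[j] = array[j], array[i]
            i += 1
    array[0] = array[i]
    array[i] = pivot
    return array[0:i], pivot, array[i + 1:len(array)]


low, pivot, high = partition([7, 8])
print(low, pivot, high)      # [8] 7 []
print(low + [pivot] + high)  # [8, 7]
```

An already sorted two-element input comes back reversed, because the unconditional final swap moves the larger element in front of the pivot.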
okay i "fixed" it, at least on the one input i've tried it on (and idk why... python issues)
def partition(array):
    pivot = array[0]
    i = 1
    for j in range(i, len(array)):
        if array[j] < pivot:
            temp = array[i]
            array[i] = array[j]
            array[j] = temp
            i += 1
    array[0] = array[i-1]
    array[i-1] = pivot
    return array[0:i-1], pivot, array[i:(len(array))]
def quick_sort(array):
    if len(array) <= 1:
        return array
    low, pivot, high = partition(array)
    #quick_sort (low)
    #quick_sort (high)
    return quick_sort (low) + [pivot] + quick_sort (high)
array = [5,3,4,2,7,6,1]
print quick_sort(array)
# prints [1,2,3,4,5,6,7]
I'm learning the quicksort algorithm, but for some reason, the output of this python implementation is just partially sorted, and I get the 'maximum recursion depth reached' for larger inputs. I've been banging my head against this for the last couple of days and I know it's probably something really stupid, but I can't seem to figure it out, so I'll appreciate any help.
def ChoosePivot(list):
    return list[0]
def Partition(A,left,right):
    p = ChoosePivot(A)
    i = left + 1
    for j in range(left + 1,right + 1): #upto right + 1 because of range()
        if A[j] < p:
            A[j], A[i] = A[i], A[j] #swap
        i = i + 1
    A[left], A[i - 1] = A[i-1], A[left] #swap
    return i - 1
def QuickSort(list,left, right):
    if len(list) == 1: return
    if left < right:
        pivot = Partition(list,left,right)
        QuickSort(list,left, pivot - 1)
        QuickSort(list,pivot + 1, right)
        return list[:pivot] + [list[pivot]] + list[pivot+1:]
sample_array = [39,2,41,95,44,8,7,6,9,10,34,56,75,100]
print "Unsorted list: "
print sample_array
sample_array = QuickSort(sample_array,0,len(sample_array)-1)
print "Sorted list:"
print sample_array
Not entirely sure this is the issue, but you are choosing the pivot wrongly:
def ChoosePivot(list):
    return list[0]
def Partition(A,left,right):
    p = ChoosePivot(A)
    ....
You are always taking the head of the original list, and not the head of the modified list.
Assume at some point you reduced the range to left=5,right=10 - you chose list[0] as the pivot - that can't be good.
As a result, in each iteration where left>0 you ignore the first element in the list, and "miss" it - which can explain the partial sorting
def ChoosePivot(list):
    return list[0]
As amit said, this is wrong. You want p = A[left]. However, there is another issue:
if A[j] < p:
    A[j], A[i] = A[i], A[j] #swap
i = i + 1
The pivot index should only be incremented when you swap. Indent i = i + 1 to the same depth as the swap, as part of the if statement.
Bonus question: Why are you partitioning twice?
Also, the last swap,
A[left], A[i - 1] = A[i-1], A[left] #swap
should be done with the pivot.
Besides that, quicksort works in place, so you don't need the following return:
return list[:pivot] + [list[pivot]] + list[pivot+1:]
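Putting the suggestions from the answers together, a minimal corrected sketch could look like this. It keeps the question's function names but takes the pivot from A[left], advances i only when a swap happens, and sorts in place without building a return value.

```python
def Partition(A, left, right):
    """Partition A[left..right] (inclusive) around A[left]; return pivot index."""
    p = A[left]
    i = left + 1
    for j in range(left + 1, right + 1):
        if A[j] < p:
            A[j], A[i] = A[i], A[j]
            i += 1  # advance the boundary only after a swap
    A[left], A[i - 1] = A[i - 1], A[left]  # move the pivot into place
    return i - 1


def QuickSort(A, left, right):
    """Sort A[left..right] in place."""
    if left < right:
        pivot = Partition(A, left, right)
        QuickSort(A, left, pivot - 1)
        QuickSort(A, pivot + 1, right)


sample_array = [39, 2, 41, 95, 44, 8, 7, 6, 9, 10, 34, 56, 75, 100]
QuickSort(sample_array, 0, len(sample_array) - 1)
print(sample_array)
```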
Not exactly an answer to your question, but I believe it's still of most relevance.
Choosing the pivot at the same position every time is a flaw in a quicksort implementation. One can generate a sequence of numbers that makes your algorithm run in O(n^2) time, with an absolute run time probably worse than bubble sort.
In your algorithm, choosing the leftmost item makes the algorithm hit its worst-case time when the array is already sorted or nearly sorted.
The choice of the pivot should be performed randomly to avoid this issue.
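One common way to do this, sketched here under the assumption that the rest of the partitioning stays as in the question, is to pick a random index in the current range and swap that element to the left end before partitioning. The names randomized_partition and randomized_quicksort are illustrative.

```python
import random


def randomized_partition(A, left, right):
    """Partition A[left..right] (inclusive) around a randomly chosen pivot."""
    k = random.randint(left, right)  # inclusive on both ends
    A[left], A[k] = A[k], A[left]    # move the chosen pivot to the front
    p = A[left]
    i = left + 1
    for j in range(left + 1, right + 1):
        if A[j] < p:
            A[j], A[i] = A[i], A[j]
            i += 1
    A[left], A[i - 1] = A[i - 1], A[left]
    return i - 1


def randomized_quicksort(A, left, right):
    """Sort A[left..right] in place."""
    if left < right:
        pivot = randomized_partition(A, left, right)
        randomized_quicksort(A, left, pivot - 1)
        randomized_quicksort(A, pivot + 1, right)


already_sorted = list(range(500))
randomized_quicksort(already_sorted, 0, len(already_sorted) - 1)
print(already_sorted[:5], already_sorted[-5:])
```

This does not remove the quadratic worst case; it only makes it unlikely for any particular input, including already sorted ones. Lists with many equal elements still degrade with this two-way partition.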
Check the algorithms' Implementation issues in Wikipedia: http://en.wikipedia.org/wiki/Quicksort#Implementation_issues
Actually, check the whole article. It IS worth your time.