I'm learning the quicksort algorithm, but the output of this Python implementation is only partially sorted, and I get a 'maximum recursion depth exceeded' error for larger inputs. I've been banging my head against this for the last couple of days; I know it's probably something really simple, but I can't seem to figure it out, so I'd appreciate any help.
def ChoosePivot(list):
    return list[0]

def Partition(A, left, right):
    p = ChoosePivot(A)
    i = left + 1
    for j in range(left + 1, right + 1):  # up to right + 1 because of range()
        if A[j] < p:
            A[j], A[i] = A[i], A[j]  # swap
        i = i + 1
    A[left], A[i - 1] = A[i-1], A[left]  # swap
    return i - 1
def QuickSort(list, left, right):
    if len(list) == 1: return
    if left < right:
        pivot = Partition(list, left, right)
        QuickSort(list, left, pivot - 1)
        QuickSort(list, pivot + 1, right)
        return list[:pivot] + [list[pivot]] + list[pivot+1:]

sample_array = [39,2,41,95,44,8,7,6,9,10,34,56,75,100]
print "Unsorted list: "
print sample_array
sample_array = QuickSort(sample_array,0,len(sample_array)-1)
print "Sorted list:"
print sample_array
Not entirely sure this is the issue, but you are choosing the pivot incorrectly:
def ChoosePivot(list):
    return list[0]

def Partition(A, left, right):
    p = ChoosePivot(A)
    ....
You are always taking the head of the original list, and not the head of the sublist you are currently partitioning.
Assume at some point you reduce the range to left=5, right=10 - you still choose list[0] as the pivot - that can't be good.
As a result, in each call where left > 0 you ignore the first element of the sublist and "miss" it, which can explain the partial sorting.
def ChoosePivot(list):
    return list[0]
As amit said, this is wrong. You want p = A[left]. However, there is another issue:
if A[j] < p:
    A[j], A[i] = A[i], A[j]  # swap
i = i + 1
The index i should only be incremented when you swap: indent i = i + 1 to the same depth as the swap, as part of the if statement.
Bonus question: Why are you partitioning twice?
Also, the last swap:

A[left], A[i - 1] = A[i-1], A[left]  # swap

should be done with the pivot. Besides that, quicksort works in place, so you don't need the following return:

return list[:pivot] + [list[pivot]] + list[pivot+1:]
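Putting all three fixes together, a corrected version might look like this (a sketch, keeping your names and the first-element pivot):

def Partition(A, left, right):
    p = A[left]                            # pivot from the current subrange
    i = left + 1
    for j in range(left + 1, right + 1):
        if A[j] < p:
            A[j], A[i] = A[i], A[j]        # swap
            i = i + 1                      # only advance i when we swap
    A[left], A[i - 1] = A[i - 1], A[left]  # move the pivot into its final slot
    return i - 1

def QuickSort(A, left, right):
    if left < right:
        pivot = Partition(A, left, right)
        QuickSort(A, left, pivot - 1)
        QuickSort(A, pivot + 1, right)

Called as QuickSort(sample_array, 0, len(sample_array) - 1), it sorts the list in place, so just print sample_array afterwards instead of rebinding it.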
Not exactly an answer to your question, but I believe it's still highly relevant.
Always choosing the pivot from the same position when implementing quicksort is a flaw in the algorithm. One can construct a sequence of numbers that makes your algorithm run in O(n^2) time, with an absolute run time probably worse than bubble sort.
In your algorithm, choosing the leftmost item produces this worst case when the array is already sorted or nearly sorted.
The pivot should be chosen randomly to avoid this issue.
Check the "Implementation issues" section of Wikipedia's quicksort article: http://en.wikipedia.org/wiki/Quicksort#Implementation_issues
Actually, read the whole article. It IS worth your time.
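For example, a minimal change (a sketch, assuming Partition is also fixed to read the pivot from A[left]) is to swap a randomly chosen element into position left before partitioning:

import random

def ChoosePivot(A, left, right):
    # move a random element into the pivot slot so the pivot
    # is no longer tied to a fixed position in the subrange
    r = random.randint(left, right)
    A[left], A[r] = A[r], A[left]
    return A[left]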
I am wondering why this brute-force approach to the Maximum Sum Subarray of Size K problem has time complexity O(nk) instead of O((n-k)k). Given that we subtract k elements from the outermost loop, wouldn't the latter be more appropriate? The textbook solution says O(nk), which confuses me slightly.
I have included the short code snippet below!
Thank you
def max_sub_array_of_size_k(k, arr):
    max_sum = 0
    window_sum = 0
    for i in range(len(arr) - k + 1):
        window_sum = 0
        for j in range(i, i+k):
            window_sum += arr[j]
        max_sum = max(max_sum, window_sum)
    return max_sum
I haven't actually tried to fix this, I just want to understand.
In the calculation of time complexity, O(n) = O(n-1) = O(n-k): all of these represent linear growth, thus O(n-k) × O(k) = O(n·k). Of course, this problem can also be optimized to O(n) time complexity by using prefix sums.
def max_sub_array_of_size_k(k, arr):
    s = [0]
    for i in range(len(arr)):
        # s[i+1] = arr[0] + ... + arr[i]
        s.append(s[-1] + arr[i])
    max_sum = float("-inf")
    for i in range(1, len(s) + 1 - k):
        max_sum = max(max_sum, s[i + k - 1] - s[i - 1])
    return max_sum
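The same O(n) bound can also be reached without the prefix array by sliding the window one position at a time; a sketch:

def max_sub_array_of_size_k(k, arr):
    window_sum = sum(arr[:k])  # sum of the first window
    max_sum = window_sum
    for i in range(k, len(arr)):
        # slide right: add the entering element, drop the leaving one
        window_sum += arr[i] - arr[i - k]
        max_sum = max(max_sum, window_sum)
    return max_sum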
It's O(k(n-k+1)), actually, and you are right that that is a tighter bound than O(nk).
Big-O gives an upper bound, though, so O(k(n-k+1)) is a subset of O(nk), and saying that the complexity is in O(nk) is also correct.
It's just a question of whether or not the person making the statement about the algorithm's complexity cares about, or cares to communicate, the fact that it can be faster when k is close to n.
I have written a variation of the merge sort algorithm in Python, based on what I've learnt from the CLRS book, and compared it with the implementation in MIT's introductory computer science book. I cannot find the problem in my algorithm: IDLE gives me an 'index out of range' error although everything looks fine to me. I'm unsure if this is due to some confusion from borrowing ideas from the MIT algorithm (see below).
lista = [1,2,3,1,1,1,1,6,7,12,2,7,7,67,4,7,9,6,6,3,1,14,4]

def merge(A, p, q, r):
    q = (p+r)/2
    L = A[p:q+1]
    R = A[q+1:r]
    i = 0
    j = 0
    for k in range(len(A)):
        # if the list R runs out of space and L[i] has nothing to compare
        if i+1 > len(R):
            A[k] = L[i]
            i += 1
        elif j+1 > len(L):
            A[k] = R[j]
            j += 1
        elif L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        elif R[j] <= L[i]:
            A[k] = R[j]
            j += 1
    # when both the subarrays have run out and all the ifs and elifs are done,
    # the for loop has effectively ended
    return A
def mergesort(A, p, r):
    """A is the list, p is the first index and r is the last index for which
    the portion of the list is to be sorted."""
    q = (p+r)/2
    if p < r:
        mergesort(A, p, q)
        mergesort(A, q+1, r)
        merge(A, p, q, r)
    return A

print mergesort(lista, 0, len(lista)-1)
I have followed the pseudocode in CLRS as closely as I could, just without using the "infinity value" sentinel at the end of L and R, which would keep being compared (is this less efficient?). I tried to incorporate the idea from the MIT book of simply copying the remaining L or R list down into A, mutating A to return a sorted list. However, I can't seem to find what has gone wrong with it. Also, I don't get why the pseudocode requires q as an input, given that q would be calculated as (p+r)/2 for the middle index anyway. And why is there a need to put p
On the other hand, from the MIT book, we have something that looks really elegant.
def merge(left, right, compare):
    """Assumes left and right are sorted lists and
    compare defines an ordering on the elements.
    Returns a new sorted (by compare) list containing the
    same elements as (left + right) would contain.
    """
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while i < len(left):
        result.append(left[i])
        i += 1
    while j < len(right):
        result.append(right[j])
        j += 1
    return result

import operator

def mergeSort(L, compare=operator.lt):
    """Assumes L is a list, compare defines an ordering
    on elements of L.
    Returns a new sorted list containing the same elements as L"""
    if len(L) < 2:
        return L[:]
    else:
        middle = len(L) // 2
        left = mergeSort(L[:middle], compare)
        right = mergeSort(L[middle:], compare)
        return merge(left, right, compare)
Where could I have gone wrong?
Also, I think the key difference in the MIT implementation is that it creates a new list instead of mutating the original one. That makes it harder for me to understand, because I found the CLRS explanation quite clear: each layer of recursion sorts the most minute components of the original list (lists of length 1 that need no sorting), "storing" the results of the recursion within the old list itself.
However, thinking about it again, is it right to say that each recursion in the MIT algorithm returns a new "result" list, which is in turn merged and combined?
Thank you!
The fundamental difference between your code and MIT's is the conditional statement in the mergesort function. Where your if statement is:
if p<r:
theirs is:
if len(L) < 2:
This means that if you were to have, at any point in the recursive call tree, a list with len(A) == 1, it would still call merge on a size-1 or even size-0 list. You can see that this causes problems in the merge function, because your L, R, or both sublists can end up being of size 0, which then causes an out-of-bounds index error.
Your problem can then be easily fixed by changing your if statement to something akin to theirs, like len(A) < 2 or r - p < 2.
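For comparison, here is a sketch of a working CLRS-style in-place version. Beyond the base case, note it needs two more fixes the question code is missing: R must include A[r], and merge should only walk the subrange A[p:r+1] rather than the whole list:

def merge(A, p, q, r):
    # merge the sorted subranges A[p:q+1] and A[q+1:r+1] back into A[p:r+1]
    L = A[p:q+1]
    R = A[q+1:r+1]           # A[q+1:r] would drop A[r]
    i = j = 0
    for k in range(p, r+1):  # only the subrange, not range(len(A))
        if j >= len(R) or (i < len(L) and L[i] <= R[j]):
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1

def mergesort(A, p, r):
    if p < r:                # subranges of length < 2 are already sorted
        q = (p + r) // 2
        mergesort(A, p, q)
        mergesort(A, q + 1, r)
        merge(A, p, q, r)
    return A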
I have a quicksort program here, but there seems to be a problem with the result. I think there must be some issue in the areas highlighted below, where I reference some values. Any suggestions?
# where l represents low, h represents high
def quick(arr, l, h):
    # is this the correct array for quicksorting?
    if len(x[l:h]) > 1:
        # r is pivot POSITION
        r = h
        # R is pivot ELEMENT
        R = arr[r]
        i = l-1
        for a in range(l, r+1):
            if arr[a] <= arr[r]:
                i += 1
                arr[i], arr[a] = arr[a], arr[i]
        # should I take these values? Note that I have repeated elements below, which is what I want to deal with
        quick(arr, l, arr.index(R)-1)
        quick(arr, arr.index(R)+arr.count(R), h)

x = [6,4,2,1,7,8,5,3]
quick(x, 0, len(x)-1)
print(x)
Please check this. I think you'll find your answer.
def partition(array, begin, end):
    pivot = begin
    for i in xrange(begin+1, end+1):
        if array[i] <= array[begin]:
            pivot += 1
            array[i], array[pivot] = array[pivot], array[i]
    array[pivot], array[begin] = array[begin], array[pivot]
    return pivot

def quicksort(array, begin=0, end=None):
    if end is None:
        end = len(array) - 1
    if begin >= end:
        return
    pivot = partition(array, begin, end)
    quicksort(array, begin, pivot-1)
    quicksort(array, pivot+1, end)

array = [6,4,2,1,7,8,5,3]
quicksort(array)
print(array)
#should I take these values? Note that I have repeated elements below, which is what I want to deal with
quick(arr,l,arr.index(R)-1)
quick(arr,arr.index(R)+arr.count(R),h)
You seem to be assuming that the values equal to the pivot element are already consecutive. This assumption is probably wrong for your current implementation; test it, e.g. by printing the full list before recursing.
To make the assumption true, partition into three groups instead of just two, as described at Wikipedia and sketched below.
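A minimal sketch of that three-way ("Dutch national flag") partitioning, assuming the pivot is taken from the low end; after the loop, every copy of the pivot sits in the middle block, so the recursion can skip the whole run:

def quick3(arr, l, h):
    if l >= h:
        return
    pivot = arr[l]
    lt, gt = l, h      # invariant: arr[l:lt] < pivot, arr[gt+1:h+1] > pivot
    i = l
    while i <= gt:
        if arr[i] < pivot:
            arr[lt], arr[i] = arr[i], arr[lt]
            lt += 1
            i += 1
        elif arr[i] > pivot:
            arr[i], arr[gt] = arr[gt], arr[i]
            gt -= 1    # don't advance i: the swapped-in element is unexamined
        else:
            i += 1     # equal to the pivot: leave it in the middle block
    quick3(arr, l, lt - 1)
    quick3(arr, gt + 1, h)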
I'm trying to implement quicksort in Python, but my code doesn't quite sort properly. For example, on the input array [5,3,4,2,7,6,1], my code outputs [1,2,3,5,4,6,7]: the end result transposes the 4 and the 5. I admit I am a bit rusty on Python, as I've been studying ML (and was fairly new to Python before that). I'm aware of other Python implementations of quicksort, and of other similar Stack Overflow questions about Python and quicksort, but I am trying to understand what is wrong with this chunk of code that I wrote myself:
# still broken 'quicksort'
def partition(array):
    pivot = array[0]
    i = 1
    for j in range(i, len(array)):
        if array[j] < array[i]:
            temp = array[i]
            array[i] = array[j]
            array[j] = temp
            i += 1
    array[0] = array[i]
    array[i] = pivot
    return array[0:(i)], pivot, array[(i+1):(len(array))]
def quick_sort(array):
    if len(array) <= 1:  # if I change this to len(array) == 1, I get an index out of bounds error
        return array
    low, pivot, high = partition(array)
    #quick_sort(low)
    #quick_sort(high)
    return quick_sort(low) + [pivot] + quick_sort(high)

array = [5,3,4,2,7,6,1]
print quick_sort(array)
# prints [1,2,3,5,4,6,7]
I'm a little confused about what this algorithm's connection to quicksort is. In quicksort, you typically compare all entries against a pivot, so you get a lower and a higher group; your quick_sort function clearly expects partition to do this.
However, in the partition function, you never compare anything against the value you name pivot. All comparisons are between indices i and j, where j is incremented by the for loop and i is incremented when an item is found out of order; those comparisons include checking an item against itself. That algorithm is more like a selection sort, with complexity slightly worse than bubble sort. So items bubble left as long as there are enough items to the left of them, with the first item finally dumped after the place the last moved item went; since it was never compared against anything, we know it must be out of order whenever there are items to its left, simply because it replaced an item that was in order.
Thinking a little more about it: the items end up only partially ordered, since you never return to an item once it has been swapped left, and it was only checked against the item it replaced (now known to have been out of order). I think it is easier to write the intended function without the index wrangling:
def partition(inlist):
    i = iter(inlist)
    pivot = i.next()
    low, high = [], []
    for item in i:
        if item < pivot:
            low.append(item)
        else:
            high.append(item)
    return low, pivot, high
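With that partition, the quick_sort function from the question should work unchanged, since it already expects a (low, pivot, high) triple and concatenates quick_sort(low) + [pivot] + quick_sort(high).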
You might find these reference implementations helpful while trying to understand your own.
Returning a new list:
def qsort(array):
    if len(array) < 2:
        return array
    head, *tail = array
    less = qsort([i for i in tail if i < head])
    more = qsort([i for i in tail if i >= head])
    return less + [head] + more
Sorting a list in place:
def quicksort(array):
    _quicksort(array, 0, len(array) - 1)

def _quicksort(array, start, stop):
    if stop - start > 0:
        pivot, left, right = array[start], start, stop
        while left <= right:
            while array[left] < pivot:
                left += 1
            while array[right] > pivot:
                right -= 1
            if left <= right:
                array[left], array[right] = array[right], array[left]
                left += 1
                right -= 1
        _quicksort(array, start, right)
        _quicksort(array, left, stop)
Generating sorted items from an iterable:
def qsort(sequence):
    iterator = iter(sequence)
    try:
        head = next(iterator)
    except StopIteration:
        pass
    else:
        try:
            tail, more = chain(next(iterator), iterator), []
            yield from qsort(split(head, tail, more))
            yield head
            yield from qsort(more)
        except StopIteration:
            yield head

def chain(head, iterator):
    yield head
    yield from iterator

def split(head, tail, more):
    for item in tail:
        if item < head:
            yield item
        else:
            more.append(item)
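Note that the last version is a generator, so you materialize its output with list():

print(list(qsort([5, 3, 4, 2, 7, 6, 1])))  # -> [1, 2, 3, 4, 5, 6, 7]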
If the pivot ends up needing to stay in the initial position (because it is the lowest value), you swap it with some other element anyway.
Read the Fine Manual: a quicksort explanation and Python implementation:
http://interactivepython.org/courselib/static/pythonds/SortSearch/TheQuickSort.html
Sorry, this should be a comment, but its structure is too complicated for a comment.
See what happens when array is [7, 8]:
pivot = 7
i = 1
the for loop does nothing
array[0] becomes array[i], which is 8
array[i] becomes pivot, which is 7
you return array[0:1], pivot, and array[2:] - which are [8], 7, and [] - so the concatenation yields [8] + [7] + [] = [8, 7]...
If you explicitly include the returned pivot in the concatenation, you should skip it in the arrays you return.
Okay, I "fixed" it, at least on the one input I've tried it on (and I don't know why... Python issues):
def partition(array):
    pivot = array[0]
    i = 1
    for j in range(i, len(array)):
        if array[j] < pivot:
            temp = array[i]
            array[i] = array[j]
            array[j] = temp
            i += 1
    array[0] = array[i-1]
    array[i-1] = pivot
    return array[0:i-1], pivot, array[i:(len(array))]
def quick_sort(array):
    if len(array) <= 1:
        return array
    low, pivot, high = partition(array)
    #quick_sort(low)
    #quick_sort(high)
    return quick_sort(low) + [pivot] + quick_sort(high)

array = [5,3,4,2,7,6,1]
print quick_sort(array)
# prints [1,2,3,4,5,6,7]
I have implemented insertion sort in Python and was wondering how to determine the complexity of the algorithm. Is this an inefficient way of implementing insertion sort? To me, this seems like the most readable approach.
import random as rand

source = [3,1,0,10,20,2,1]
target = []

while len(source) != 0:
    if len(target) == 0:
        target.append(source[0])
        source.pop(0)
    element = source.pop(0)
    if element <= target[0]:
        target.reverse()
        target.append(element)
        target.reverse()
    elif element > target[len(target)-1]:
        target.append(element)
    else:
        for i in range(0, len(target)-1):
            if element >= target[i] and element <= target[i+1]:
                target.insert(i+1, element)
                break

print target
Instead of:
target.reverse()
target.append(element)
target.reverse()
try:
target.insert(0, element)
Also, maybe use a for loop, instead of a while loop, to avoid source.pop():
for value in source:
...
In the final else block, the first part of the if test is redundant:
else:
for i in range(0,len(target)-1):
if element >= target[i] and element <= target[i+1]:
target.insert(i+1,element)
break
Since the list is already sorted, as soon as you find an element larger than the one you're inserting, you've found the insertion location; see the sketch below.
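Putting these suggestions together, the whole loop collapses to something like this (a sketch, keeping your variable names):

source = [3, 1, 0, 10, 20, 2, 1]
target = []
for element in source:            # for loop instead of while + pop()
    for i in range(len(target)):
        if element <= target[i]:  # first element not smaller marks the spot
            target.insert(i, element)
            break
    else:                         # fell off the end: element is the largest so far
        target.append(element)
print target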
I would say it is rather inefficient. How can you tell? Your approach creates a second array, but you don't need one for an insertion sort. You also use a lot of operations: insertion sort requires lookups and exchanges, but you have lookups, appends, pops, inserts, and reverses. So you know that you can probably do better.
def insertionsort(aList):
    for i in range(1, len(aList)):
        tmp = aList[i]
        k = i
        while k > 0 and tmp < aList[k - 1]:
            aList[k] = aList[k - 1]
            k -= 1
        aList[k] = tmp
This code is taken from geekviewpoint.com. Clearly it's an O(n^2) algorithm, since it uses two nested loops. If the input is already sorted, however, it's O(n), since the body of the while loop is then never entered because tmp < aList[k - 1] fails.