(Python 2.7.8 Windows)
I'm doing a comparison between different sorting algorithms (Quick, bubble and insertion), and mostly it's going as expected, Quick sort is considerably faster with long lists and bubble and insertion are faster with very short lists and alredy sorted ones.
What's raising a problem is Quick Sort and the before mentioned "already sorted" lists. I can sort lists of even 100000 items without problems with this, but with lists of integers from 0...n the limit seems to be considerably lower. 0...500 works but even 0...1000 gives:
RuntimeError: maximum recursion depth exceeded in cmp
Quick Sort:
def quickSort(myList):
if myList == []:
return []
else:
pivot = myList[0]
lesser = quickSort([x for x in myList[1:] if x < pivot])
greater = quickSort([x for x in myList[1:] if x >= pivot])
myList = lesser + [pivot] + greater
return myList
Is there something wrong with the code, or am I missing something?
There are two things going on.
First, Python intentionally limits recursion to a fixed depth. Unlike, say, Scheme, which will keep allocating frames for recursive calls until you run out of memory, Python (at least the most popular implementation, CPython) will only allocate sys.getrecursionlimit() frames (defaulting to 1000) before failing. There are reasons for that,* but really, that isn't relevant here; just the fact that it does this is what you need to know about.
Second, as you may already know, while QuickSort is O(N log N) with most lists, it has a worst case of O(N^2)—in particular (using the standard pivot rules) with already-sorted lists. And when this happens, your stack depth can end up being O(N). So, if you have 1000 elements, arranged in worst-case order, and you're already one frame into the stack, you're going to overflow.
You can work around this in a few ways:
Rewrite the code to be iterative, with an explicit stack, so you're only limited by heap memory instead of stack depth.
Make sure to always recurse into the shorter side first, rather than the left side. This means that even in the O(N^2) case, your stack depth is still O(log N). But only if you've already done the previous step.**
Use a random, median-of-three, or other pivot rule that makes common cases not like already-sorted worst-case. (Of course someone can still intentionally DoS your code; there's really no way to avoid that with quicksort.) The Wikipedia article has some discussion on this, and links to the classic Sedgewick and Knuth papers.
Use a Python implementation with an unlimited stack.***
sys.setrecursionlimit(max(sys.getrecursionlimit(), len(myList)+CONSTANT)). This way, you'll fail right off the bat for an obvious reason if you can't make enough space, and usually won't fail otherwise. (But you might—you could be starting the sort already 900 steps deep in the stack…) But this is a bad idea.****. Besides, you have to figure out the right CONSTANT, which is impossible in general.*****
* Historically, the CPython interpreter recursively calls itself for recursive Python function calls. And the C stack is fixed in size; if you overrun the end, you could segfault, stomp all over heap memory, or all kinds of other problems. This could be changed—in fact, Stackless Python started off as basically just CPython with this change. But the core devs have intentionally chosen not to do so, in part because they don't want to encourage people to write deeply recursive code.
** Or if your language does automatic tail call elimination, but Python doesn't do that. But, as gnibbler points out, you can write a hybrid solution—recurse on the small end, then manually unwrap the tail recursion on the large end—that won't require an explicit stack.
*** Stackless and PyPy can both be configured this way.
**** For one thing, eventually you're going to crash the C stack.
***** The constant isn't really constant; it depends on how deep you already are in the stack (computable non-portably by walking sys._getframe() up to the top) and how much slack you need for comparison functions, etc. (not computable at all, you just have to guess).
You're choosing the first item of each sublist as the pivot. If the list is already in order, this means that your greater sublist is all the items but the first, rather than about half of them. Essentially, each recursive call manages to process only one item. Which means the depth of recursive calls you'll need to make will be about the same as the number of items in the full list. Which overflows Python's built-in limit once you hit about 1000 items. You will have a similar problem sorting lists that are already in reversed order.
To correct this use one of the workarounds suggested in the literature, such as choosing an item at random to be the pivot or the median of the first, middle, and last items.
Always choosing the first (or last) element as the pivot will have problems for quicksort - worst case performance for some common inputs as you have seen
One technique that works fairly well is to choose the average of first,middle and last element
You don't want to make the pivot selection too complicated, or it will dominate the runtime of the search
Related
What space complexity does the python sort take? I can't find any definitive documentation on this anywhere
Space complexity is defined as how much additional space the algorithm needs in terms of the N elements. And even though according to the docs, the sort method sorts a list in place, it does use some additional space, as stated in the description of the implementation:
timsort can require a temp array containing as many as N//2 pointers, which means as many as 2*N extra bytes on 32-bit boxes. It can be expected to require a temp array this large when sorting random data; on data with significant structure, it may get away without using any extra heap memory.
Therefore the worst case space complexity is O(N) and best case O(1)
Python's built in sort method is a spin off of merge sort called Timsort, more information here - https://en.wikipedia.org/wiki/Timsort.
It's essentially no better or worse than merge sort, which means that its run time on average is O(n log n) and its space complexity is Ω(n)
So I just learned about sorting algorithm s bubble, merge, insertion, sort etc. they all seem to be very similar in their methods of sorting with what seems to me minimal changes in their approach. So why do they produce such different sorting times ie O(n^2) vs O(nlogn) as an example
The "similarity" (?!) that you see is completely illusory.
The elementary, O(N squared), approaches, repeat their workings over and over, without taking any advantage, for the "next step", of any work done on the "previous step". So the first step takes time proportional to N, the second one to N-1, and so on -- and the resulting sum of integers from 1 to N is proportional to N squared.
For example, in selection sort, you are looking each time for the smallest element in the I:N section, where I is at first 0, then 1, etc. This is (and must be) done by inspecting all those elements, because no care was previously taken to afford any lesser amount of work on subsequent passes by taking any advantage of previous ones. Once you've found that smallest element, you swap it with the I-th element, increment I, and continue. O(N squared) of course.
The advanced, O(N log N), approaches, are cleverly structured to take advantage in following steps of work done in previous steps. That difference, compared to the elementary approaches, is so pervasive and deep, that, if one cannot perceive it, that speaks chiefly about the acuity of one's perception, not about the approaches themselves:-).
For example, in merge sort, you logically split the array into two sections, 0 to half-length and half-length to length. Once each half is sorted (recursively by the same means, until the length gets short enough), the two halves are merged, which itself is a linear sub-step.
Since you're halving every time, you clearly need a number of steps proportional to log N, and, as each step is O(N), obviously you get the very desirable O(N log N) as a result.
Python's "timsort" is a "natural mergesort", i.e, a variant of mergesort tuned to take advantage of already-sorted (or reverse-sorted) parts of the array, which it recognizes rapidly and avoids spending any further work on. This doesn't change big-O because that's about worst-case time -- but expected time crashes much further down because in so many real-life cases some partial sortedness is present.
(Note that, going by the rigid definition of big-O, quicksort isn't quick at all -- it's worst-case proportional to N squared, when you just happen to pick a terrible pivot each and every time... expected-time wise it's fine, though nowhere as good as timsort, because in real life the situations where you repeatedly pick a disaster pivot are exceedingly rare... but, worst-case, they might happen!-).
timsort is so good as to blow away even very experienced programmers. I don't count because I'm a friend of the inventor, Tim Peters, and a Python fanatic, so my bias is obvious. But, consider...
...I remember a "tech talk" at Google where timsort was being presented. Sitting next to me in the front row was Josh Bloch, then also a Googler, and Java expert extraordinaire. Less than mid-way through the talk he couldn't resist any more - he opened his laptop and started hacking to see if it could possibly be as good as the excellent, sharp technical presentation seemed to show it would be.
As a result, timsort is now also the sorting algorithm in recent releases of the Java Virtual Machine (JVM), though only for user-defined objects (arrays of primitives are still sorted the old way, quickersort [*] I believe -- I don't know which Java peculiarities determined this "split" design choice, my Java-fu being rather weak:-).
[*] that's essentially quicksort plus some hacks for pivot choice to try and avoid the poison cases -- and it's also what Python used to use before Tim Peters gave this one immortal contribution out of the many important ones he's made over the decades.
The results are sometimes surprising to people with CS background (like Tim, I have the luck of having a far-ago academic background, not in CS, but in EE, which helps a lot:-). Say, for example, that you must maintain an ever-growing array that is always sorted at any point in time, as new incoming data points must get added to the array.
The classic approach would use bisection, O(log N), to find the proper insertion point for each new incoming data point -- but then, to put the new data in the right place, you need to shift what comes after it by one slot, that's O(N).
With timsort, you just append the new data point to the array, then sort the array -- that's O(N) for timsort in this case (as it's so awesome in exploiting the already-sorted nature of the first N-1 items!-).
You can think of timsort as pushing the "take advantage of work previously done" to a new extreme -- where not only work previously done by the algorithm itself, but also other influences by other aspects of real-life data processing (causing segments to be sorted in advance), are all exploited to the hilt.
Then we could move into bucket sort and radix sort, which change the plane of discourse -- which in traditional sorting limits one to being able to compare two items -- by exploiting the items' internal structure.
Or a similar example -- presented by Bentley in his immortal book "Programming Pearls" -- of needing to sort an array of several million unique positive integers, each constrained to be 24 bits long.
He solved it with an auxiliary array of 16M bits -- just 2M bytes after all -- initially all zeroes: one pass through the input array to set the corresponding bits in the auxiliary array, then one pass through the auxiliary array to form the required integers again where 1s are found -- and bang, O(N) [and very speedy:-)] sorting for this special but important case!-)
I'm relatively new to python (using v3.x syntax) and would appreciate notes regarding complexity and performance of heapq vs. sorted.
I've already implemented a heapq based solution for a greedy 'find the best job schedule' algorithm. But then I've learned about the possibility of using 'sorted' together with operator.itemgetter() and reverse=True.
Sadly, I could not find any explanation on expected complexity and/or performance of 'sorted' vs. heapq.
If you use binary heap to pop all elements in order, the thing you do is basically heapsort. It is slower than sort algorightm in sorted function apart from it's implementation is pure python.
The heapq is faster than sorted in case if you need to add elements on the fly i.e. additions and insertions could come in unspecified order. Adding new element preserving inner order in any heap is faster than resorting array after each insertion.
The sorted is faster if you will need to retrieve all elements in order later.
The only problem where they can compete - if you need some portion of smallest (or largest) elements from collection. Although there are special algorigthms for that case, whether heapq or sorted will be faster here depends on the size of the initial array and portion you'll need to extract.
The nlargest() and nsmallest() functions of heapq are most appropriate if you are trying to find a relatively small number of items. If you want to find simply single smallest or largest number , min() and max() are most suitable, because it's faster and uses sorted and then slicing. If you are looking for the N smallest or largest items and N is small compared to the overall size of the collection, these functions provide superior performance. Although it's not necessary to use heapq in your code, it's just an interesting topic and a worthwhile subject of study.
heapq is implemented as a binary heap,
The key things to note about binary heaps, and by extension, heapq:
Searching is not supported
Insertions are constant time on average
Deletions are O(log n) time on average
Additional binary heap info described here: http://en.wikipedia.org/wiki/Binary_heap
While heapq is a data structure which has the properties of a binary heap, using sorted is a different concept. sorted returns a sorted list, so that's essentially a result, whereas the heapq is a data structure you are continually working with, which could, optionally, be sorted via sorted.
Additonal sorted info here: https://docs.python.org/3.4/library/functions.html#sorted
What specifically are you trying to accomplish?
Response to OP's comment:
Why do you think you need a heapq specifically? A binary heap is a specialized data structure, and depending on your requirements, it's quite likely not necessary.
You seem to be extremely concerned about performance, but it's not clear why. If something is a "bad performer", but its aggregate time is not significant, then it really doesn't matter in the bigger picture. In the aggregate case, a dict or a list would perform generally perform fine. Why do you specifically think a heapq is needed?
I wonder if this is a don't-let-the-perfect-be-the-enemy-of-the-good type of situation.
Writing Python using C extensions is a niche use case reserved for cases where performance is truly a significant issue. (i.e. it may be better to use, say, an XML parser that is a C extension than something that is pure Python if you're dealing with large files and if performance is your main concern).
Regarding In complex keep playing with structure case: could it be faster to sort with sorted and add elements via .append():
I'm still not clear what the use case is here. As I mentioned above, sorted and heapq are really two different concepts.
What is the use case for which you are so concerned about performance? (Absent other factors not yet specified, I think you may be overly emphasizing the importance of best-case performance in your code here.)
This question already has answers here:
Python Time Complexity (run-time)
(6 answers)
Closed 2 years ago.
When required to show how efficient the algorithm is, we need to show the algorithmic complexity of functions - Big O and so on. In Python code, how can we show or calculate the bounds of functions?
In general, there's no way to do this programmatically (you run into the halting problem).
If you have no idea where to start, you can gain some insight into how a function will perform by running some benchmarks (e.g. using the time module) with inputs of various sizes. You can even collect enough data to form a suspicion about what the runtime might be. But this won't give you a rigorous answer - for that, you need to prove mathematically that your suspected bound is in fact true.
For instance, if I'm playing with a sorting function and observe that the time is increasing roughly proportionally to the square of the input size, I might suspect that the complexity of this sort is O(n**2). But this does not constitute proof - in particular, some algorithms that perform well under typical inputs have pathological inputs that result in very poor performance.
To prove that the bound is in fact O(n**2), I need to look at what the algorithm is doing in the worst case - in this example, I might be analysing a selection sort, which repeatedly sweeps across the entire unsorted portion of the list and picks the lowest unsorted number. It should be evident that I'm examining something like n*(n-1) == O(n**2) elements. If examining elements is a constant-time operation, and placing the final element in the correct place is also not worse than O(n**2), then it follows that my entire algorithm is O(n**2).
If you're trying to get the big O notation for your own functions, you probably need variables keeping track of things like:
the runTime; the number of comparisons; the number of iterations; etc. As well as some calculation investigating how these correspond to the size of your data.
It's probably best to do this manually first, so you can check your understanding of an algorithm.
I am not new to it, but I do not use Python much and my knowledge is rather broad and not very deep in the language, perhaps someone here more knowledgeable can answer my question. I find myself in the situation when I need to add items to a list and keep it sorted as items as added. A quick way of doing this would be.
list.append(item) // O(1)
list.sort() // ??
I would imagine if this is the only way items are added to the list, I would hope the sort would be rather efficient, because the list is sorted with each addition. However there is also this that works:
inserted = False
for i in range(len(list)): // O(N)
if (item < list[i]):
list.insert(i, item) // ??
inserted = True
break
if not inserted: list.append(item)
Can anyone tell me if one of these is obviously more efficient? I am leaning toward the second set of statements, however I really have no idea.
What you are looking for is the bisect module and most possible the insort_left
So your expression could be equivalently written as
from
some_list.append(item) // O(1)
some_list.sort() // ??
to
bisect.insort_left(some_list, item)
Insertion anywhere except near the end takes O(n) time, as it has to move (copy) all elements which come after the insertion point. But on the other hand, all comparison-based sorting algorithm must, on average, make Omega(n log n) comparisons. Many sorts (including timsort, which Python uses) will do significantly better on many inputs, likely including yours (the "almost sorted" case). They still have to move at least that as many elements as inserting in the right position right away though. They also have to do quite a bit of additional work (inspecting all element to ensure they're in the right order, plus more complicated logic that often improves performance, but not in your case). For these reasons, it's probably slower, at least for large lists.
Due to being written in C (in CPython; but similar reasoning applies for other Pythons), it may still be faster than your Python-written linear scan. That leaves the question of how to find the insertion point. Binary search can do this part in O(log n) time, so it's quite useful here (of course, insertion is still O(n), but there's no way around this if you want a sorted list). Unfortunately, binary search is rather tricky to implement. Fortunately, it's already implemented in the standard library: bisect.