bisect.insort complexity not as expected - python

While trying to find the best data structure in Python 3 for a frontier problem I have to develop, I have just realised that the complexity of using the bisect module to do a real-time ordered insert is not O(n log n) as it should be, but grows much faster than that. I don't know the reason for it, so I felt like asking you guys in case you know something about it, since I find it really interesting.
I think I'm using the module right, so it shouldn't be a problem on my end. Anyway, here is the code used to insert node objects, where the insertion is ordered by a random f value the nodes have:
bisect.insort(self._frontier, (node._f, node))
I'm getting lots of objects inserted in a few seconds, but then not that many over time. Bakuriu suggested I ask this question, since he also found it interesting after doing some tests and ending up with the same results as me. The code he used to test it was the following:
python3 -m timeit -s 'import bisect as B; import random as R;seq=[]' 'for _ in range(100000):B.insort(seq, R.randint(0, 1000000))'
And these were his conclusions:
10k insertions is all fine (80 ms, and up to that point it basically scales linearly; keep in mind that it is O(n log n), so it's a little bit worse than linear), but with 100k it takes forever instead of 10 times more. A list of 100k elements isn't really that big, and log(100k) is 16, so it's not that big.
any help will be much appreciated!

You probably missed that the time complexity for insort is O(n), and this is clearly documented for bisect.insort_left():
Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.
Finding the insertion point is cheap, but inserting into a Python list is not, as the elements past the insertion point have to be moved up a step.
Also see the TimeComplexity page on the Python Wiki, where list insertion is documented:
Insert O(n)
You can find the insertion point in O(log n) time, but the insertion step that follows is O(n), making this a rather expensive way to sort.
If you are using this to sort m elements, you have an O(m^2) (quadratic) solution for what should only take O(m log m) time with TimSort (the sorting algorithm used by the sorted() function).
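As a rough illustration of that difference (a minimal sketch; the function names and sizes are chosen only for illustration and the absolute timings will vary by machine), building the whole list first and calling sorted() once scales far better than repeatedly calling insort:

import bisect
import random
import timeit

def sort_by_insort(values):
    # O(m^2): each insort does an O(log m) search plus an O(m) list insertion
    seq = []
    for v in values:
        bisect.insort(seq, v)
    return seq

def sort_once(values):
    # O(m log m): let Timsort do the whole job at the end
    return sorted(values)

values = [random.randint(0, 1000000) for _ in range(100000)]
print(timeit.timeit(lambda: sort_by_insort(values), number=1))
print(timeit.timeit(lambda: sort_once(values), number=1))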

Binary search takes O(log n) comparisons, but insort isn't just a binary search. It also inserts the element, and inserting an element into a length-n list takes O(n) time.
The _frontier naming in your original code snippet suggests some sort of prioritized search algorithm. A heap would probably make more sense for that, or a SortedList from sortedcontainers.
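For example, here is a minimal sketch of a heap-based frontier (assuming a node object with an _f attribute, as in the question; the counter is only a tie-breaker so that nodes with equal f values never get compared to each other):

import heapq
import itertools

class Frontier:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal f values

    def push(self, node):
        # O(log n), instead of the O(n) cost of inserting into a sorted list
        heapq.heappush(self._heap, (node._f, next(self._counter), node))

    def pop(self):
        # O(log n): returns the node with the smallest f value
        return heapq.heappop(self._heap)[2]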

Related

What is the space complexity of the python sort?

What space complexity does Python's sort take? I can't find any definitive documentation on this anywhere.
Space complexity is defined as how much additional space the algorithm needs in terms of the N elements. And even though, according to the docs, the sort method sorts a list in place, it does use some additional space, as stated in the description of the implementation:
timsort can require a temp array containing as many as N//2 pointers, which means as many as 2*N extra bytes on 32-bit boxes. It can be expected to require a temp array this large when sorting random data; on data with significant structure, it may get away without using any extra heap memory.
Therefore the worst-case space complexity is O(N) and the best case is O(1).
Python's built-in sort method is a spin-off of merge sort called Timsort; more information here: https://en.wikipedia.org/wiki/Timsort.
Asymptotically it's essentially no better or worse than merge sort, which means that its run time on average is O(n log n) and its space complexity is O(n) in the worst case.

Python QuickSort maximum recursion depth

(Python 2.7.8 Windows)
I'm doing a comparison between different sorting algorithms (quick, bubble and insertion), and mostly it's going as expected: quicksort is considerably faster with long lists, and bubble and insertion sort are faster with very short lists and already sorted ones.
What's raising a problem is quicksort and the aforementioned "already sorted" lists. I can sort lists of even 100000 items without problems with this, but with lists of integers from 0...n the limit seems to be considerably lower. 0...500 works, but even 0...1000 gives:
RuntimeError: maximum recursion depth exceeded in cmp
Quick Sort:
def quickSort(myList):
    if myList == []:
        return []
    else:
        pivot = myList[0]
        lesser = quickSort([x for x in myList[1:] if x < pivot])
        greater = quickSort([x for x in myList[1:] if x >= pivot])
        myList = lesser + [pivot] + greater
        return myList
Is there something wrong with the code, or am I missing something?
There are two things going on.
First, Python intentionally limits recursion to a fixed depth. Unlike, say, Scheme, which will keep allocating frames for recursive calls until you run out of memory, Python (at least the most popular implementation, CPython) will only allocate sys.getrecursionlimit() frames (defaulting to 1000) before failing. There are reasons for that,* but really, that isn't relevant here; just the fact that it does this is what you need to know about.
Second, as you may already know, while QuickSort is O(N log N) with most lists, it has a worst case of O(N^2)—in particular (using the standard pivot rules) with already-sorted lists. And when this happens, your stack depth can end up being O(N). So, if you have 1000 elements, arranged in worst-case order, and you're already one frame into the stack, you're going to overflow.
You can work around this in a few ways:
Rewrite the code to be iterative, with an explicit stack, so you're only limited by heap memory instead of stack depth.
Make sure to always recurse into the shorter side first, rather than the left side. This means that even in the O(N^2) case, your stack depth is still O(log N). But only if you've already done the previous step.**
Use a random, median-of-three, or other pivot rule that makes common cases not like already-sorted worst-case. (Of course someone can still intentionally DoS your code; there's really no way to avoid that with quicksort.) The Wikipedia article has some discussion on this, and links to the classic Sedgewick and Knuth papers.
Use a Python implementation with an unlimited stack.***
sys.setrecursionlimit(max(sys.getrecursionlimit(), len(myList)+CONSTANT)). This way, you'll fail right off the bat for an obvious reason if you can't make enough space, and usually won't fail otherwise. (But you might—you could be starting the sort already 900 steps deep in the stack…) But this is a bad idea.**** Besides, you have to figure out the right CONSTANT, which is impossible in general.*****
* Historically, the CPython interpreter recursively calls itself for recursive Python function calls. And the C stack is fixed in size; if you overrun the end, you could segfault, stomp all over heap memory, or all kinds of other problems. This could be changed—in fact, Stackless Python started off as basically just CPython with this change. But the core devs have intentionally chosen not to do so, in part because they don't want to encourage people to write deeply recursive code.
** Or if your language does automatic tail call elimination, but Python doesn't do that. But, as gnibbler points out, you can write a hybrid solution—recurse on the small end, then manually unwrap the tail recursion on the large end—that won't require an explicit stack.
*** Stackless and PyPy can both be configured this way.
**** For one thing, eventually you're going to crash the C stack.
***** The constant isn't really constant; it depends on how deep you already are in the stack (computable non-portably by walking sys._getframe() up to the top) and how much slack you need for comparison functions, etc. (not computable at all, you just have to guess).
You're choosing the first item of each sublist as the pivot. If the list is already in order, this means that your greater sublist is all the items but the first, rather than about half of them. Essentially, each recursive call manages to process only one item. Which means the depth of recursive calls you'll need to make will be about the same as the number of items in the full list. Which overflows Python's built-in limit once you hit about 1000 items. You will have a similar problem sorting lists that are already in reversed order.
To correct this use one of the workarounds suggested in the literature, such as choosing an item at random to be the pivot or the median of the first, middle, and last items.
Always choosing the first (or last) element as the pivot will cause problems for quicksort: worst-case performance for some common inputs, as you have seen.
One technique that works fairly well is to choose the median of the first, middle and last elements.
You don't want to make the pivot selection too complicated, or it will dominate the runtime of the sort.
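Putting those suggestions together, here is a minimal sketch (a hypothetical rewrite, not the original code) of the same list-comprehension quicksort with a median-of-three pivot:

def quickSortM3(myList):
    if len(myList) <= 1:
        return myList
    # median-of-three pivot: avoids the O(N^2) / deep-recursion case on sorted input
    first, middle, last = myList[0], myList[len(myList) // 2], myList[-1]
    pivot = sorted([first, middle, last])[1]
    lesser = [x for x in myList if x < pivot]
    equal = [x for x in myList if x == pivot]
    greater = [x for x in myList if x > pivot]
    return quickSortM3(lesser) + equal + quickSortM3(greater)

With this pivot rule, sorting list(range(1000)) or its reverse stays well under the default recursion limit, since the depth on such inputs is roughly log2(N) rather than N (an adversary can still construct bad inputs, as noted above).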

Python heapq vs. sorted complexity and performance

I'm relatively new to python (using v3.x syntax) and would appreciate notes regarding complexity and performance of heapq vs. sorted.
I've already implemented a heapq based solution for a greedy 'find the best job schedule' algorithm. But then I've learned about the possibility of using 'sorted' together with operator.itemgetter() and reverse=True.
Sadly, I could not find any explanation on expected complexity and/or performance of 'sorted' vs. heapq.
If you use a binary heap to pop all elements in order, what you are doing is basically heapsort. It is slower than the sort algorithm used by the sorted() function.
heapq is faster than sorted in the case where you need to add elements on the fly, i.e. additions can come in an unspecified order. Adding a new element while preserving the heap invariant is faster than re-sorting the array after each insertion.
sorted is faster if you need to retrieve all the elements in order later.
The only case where they really compete is when you need some portion of the smallest (or largest) elements from the collection. Although there are special algorithms for that case, whether heapq or sorted will be faster here depends on the size of the initial array and the portion you'll need to extract.
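As a minimal sketch of the add-on-the-fly case described above (the stream of priorities here is made up purely for illustration):

import heapq
import random

heap = []
for priority in (random.random() for _ in range(10000)):
    heapq.heappush(heap, priority)   # O(log n) per addition
    # versus: data.append(priority); data.sort()  -> re-sorting after every insert

# retrieve the current smallest item at any point, in O(log n)
best = heapq.heappop(heap)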
The nlargest() and nsmallest() functions of heapq are most appropriate if you are trying to find a relatively small number of items. If you simply want the single smallest or largest item, min() and max() are more suitable, because they are faster; and if N is about the same size as the collection itself, it is usually faster to sort and then slice. If you are looking for the N smallest or largest items and N is small compared to the overall size of the collection, these functions provide superior performance. Although it's not necessary to use heapq in your code, it's just an interesting topic and a worthwhile subject of study.
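A small illustrative example (the prices list is made up):

import heapq

prices = [91.1, 543.22, 21.09, 31.75, 16.35, 115.65]

print(heapq.nsmallest(3, prices))   # [16.35, 21.09, 31.75]
print(heapq.nlargest(3, prices))    # [543.22, 115.65, 91.1]
print(min(prices), max(prices))     # fastest when you only need one extreme
print(sorted(prices)[:3])           # often better when N is close to len(prices)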
heapq is implemented as a binary heap.
The key things to note about binary heaps, and by extension, heapq:
Searching is not supported
Insertions are constant time on average
Deletions are O(log n) time on average
Additional binary heap info described here: http://en.wikipedia.org/wiki/Binary_heap
While the heapq module gives you a data structure that has the properties of a binary heap, using sorted is a different concept. sorted returns a sorted list, so that's essentially a result, whereas the heap is a data structure you are continually working with, which could, optionally, be sorted via sorted().
Additional sorted info here: https://docs.python.org/3.4/library/functions.html#sorted
What specifically are you trying to accomplish?
Response to OP's comment:
Why do you think you need a heapq specifically? A binary heap is a specialized data structure, and depending on your requirements, it's quite likely not necessary.
You seem to be extremely concerned about performance, but it's not clear why. If something is a "bad performer", but its aggregate time is not significant, then it really doesn't matter in the bigger picture. In the aggregate case, a dict or a list would generally perform fine. Why do you specifically think a heapq is needed?
I wonder if this is a don't-let-the-perfect-be-the-enemy-of-the-good type of situation.
Writing Python using C extensions is a niche use case reserved for cases where performance is truly a significant issue. (i.e. it may be better to use, say, an XML parser that is a C extension than something that is pure Python if you're dealing with large files and if performance is your main concern).
Regarding the "in a complex case where I keep playing with the structure, could it be faster to sort with sorted and add elements via .append()" question:
I'm still not clear what the use case is here. As I mentioned above, sorted and heapq are really two different concepts.
What is the use case for which you are so concerned about performance? (Absent other factors not yet specified, I think you may be overly emphasizing the importance of best-case performance in your code here.)

Is the Big O notation the same for memoized recursion versus iteration?

I am using a simple example off the top of my head here
def factorial_recursive(n):
    if n == 1:
        return 1
    else:
        return n * factorial_recursive(n - 1)

def factorial_iterative(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result
Or functions that are recursive and have memoization vs. dynamic programming where you iterate over an array and fill in values, etc.
I know that sometimes recursion is bad because you can run out of memory (tail recursion?) with the heap (or stack?), but does any of this affect O notation?
Does a recursive memoized algorithm have the same O notation / speed as the iterative version?
Generally when considering an algorithm's complexity we would consider space and time complexity separately.
Two similar algorithms, one recursive, and one converted to be not recursive will often have the same time complexity, but differ in space complexity.
In your example, both factorial functions are O(n) in time complexity, but the recursive version is O(n) in space complexity, versus O(1) for the iterative version.
This difference isn't universal. Memoization takes space, and sometimes the space it takes up is comparable to or even greater than the stack space a recursive version uses.
Depending on the complexity of what you're using to store memoized values, the two will have the same order of complexity. For example, using a dict in Python (which has (amortized) O(1) insert/update/delete times), using memoization will have the same order (O(n)) for calculating a factorial as the basic iterative solution.
However, just as one can talk about time complexity, one can also talk about space complexity. Here, the iterative solution uses O(1) memory, while the memoized solution uses O(n) memory.
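A minimal sketch of the dict-based memoization described above (the cache name is just for illustration, not a prescribed implementation):

_cache = {1: 1}

def factorial_memo(n):
    # First call with a new n: O(n) time, and the cache grows to O(n) entries,
    # which is the O(n) extra space mentioned above.
    # Any n that is already cached is returned in O(1).
    if n not in _cache:
        _cache[n] = n * factorial_memo(n - 1)
    return _cache[n]

print(factorial_memo(10))   # fills the cache for 1..10
print(factorial_memo(15))   # reuses the cached factorial_memo(10)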
If you are talking about asymptotic time complexity, of course it's the same, because you are using the same algorithm.
I guess what you really care about is performance. For a C-like language, it is possible that recursion will be more expensive.
Are you going to actually use the memoized results?
Besides the fact that the order is the same (both scale equivalently), for a single run of factorial, memoizing is useless - you'll walk through a series of arguments, and none of them will repeat - you'll never use your saved memoized values, meaning that you'll have wasted space and time storing them, and not gotten any speed-ups anywhere else.
However... once you have your memoized dictionary, subsequent factorial calls will be less than O(n), and will depend on the history. I.e. if you calculate factorial(10), then the values of factorial between 10 and 1 are available for instant, O(1) lookup. If you then calculate factorial(15), it does 15*14*13*12*11*factorial(10), where factorial(10) is just looked up, for 5 multiplies total (instead of 14).
However, you could also create a lookup dict for the iterative version, I guess. Memoization wouldn't help as much there: in that case, factorial(10) would only store the result for 10, not all the results down to 1, because that's all the argument list would see. But the function could store those intermediate values in the memoization dict directly.

Efficiency of Python sorted() built-in function vs. list insert() method

I am not new to it, but I do not use Python much and my knowledge of the language is rather broad and not very deep, so perhaps someone here more knowledgeable can answer my question. I find myself in a situation where I need to add items to a list and keep it sorted as items are added. A quick way of doing this would be:
list.append(item)  # O(1)
list.sort()        # ??
I would imagine if this is the only way items are added to the list, I would hope the sort would be rather efficient, because the list is sorted with each addition. However there is also this that works:
inserted = False
for i in range(len(list)):  # O(N)
    if item < list[i]:
        list.insert(i, item)  # ??
        inserted = True
        break
if not inserted:
    list.append(item)
Can anyone tell me if one of these is obviously more efficient? I am leaning toward the second set of statements, however I really have no idea.
What you are looking for is the bisect module, and most probably insort_left.
So your expression could be equivalently written as
from
some_list.append(item)  # O(1)
some_list.sort()        # ??
to
bisect.insort_left(some_list, item)
Insertion anywhere except near the end takes O(n) time, as it has to move (copy) all elements which come after the insertion point. On the other hand, all comparison-based sorting algorithms must, on average, make Omega(n log n) comparisons. Many sorts (including Timsort, which Python uses) will do significantly better on many inputs, likely including yours (the "almost sorted" case). They still have to move at least as many elements as inserting in the right position right away would, though. They also have to do quite a bit of additional work (inspecting all elements to ensure they're in the right order, plus more complicated logic that often improves performance, but not in your case). For these reasons, appending and re-sorting is probably slower, at least for large lists.
Due to being written in C (in CPython; similar reasoning applies to other Python implementations), it may still be faster than your Python-written linear scan. That leaves the question of how to find the insertion point. Binary search can do this part in O(log n) time, so it's quite useful here (of course, the insertion is still O(n), but there's no way around that if you want a sorted list). Unfortunately, binary search is rather tricky to implement. Fortunately, it's already implemented in the standard library: bisect.
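For example, a small sketch showing both pieces, the O(log n) search and the O(n) insert:

import bisect

some_list = [1, 4, 4, 9, 16]
item = 7

i = bisect.bisect_left(some_list, item)  # O(log n) search for the insertion point
some_list.insert(i, item)                # O(n) insert: elements after i are shifted

# or, equivalently, in one call:
# bisect.insort_left(some_list, item)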
