There's already a question regarding this, and the answer says that the asymptotic complexity is O(n). But I observed that if an unsorted list is converted into a set, the set can be printed out in a sorted order, which means that at some point in the middle of these operations the list has been sorted. Then, as any comparison sort has the lower bound of Omega(n lg n), the asymptotic complexity of this operation should also be Omega(n lg n). So what exactly is the complexity of this operation?
A set in Python is an unordered collection so any order you see is by chance. As both dict and set are implemented as hash tables in CPython, insertion is average case O(1) and worst case O(N).
So list(set(...)) is always O(N) and set(list(...)) is average case O(N).
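To illustrate why small integers can come out of a set looking sorted, here is a minimal sketch. It assumes CPython, where a small non-negative int hashes to itself; that is an implementation detail, not a language guarantee, and the variable names are only for illustration.

```python
data = [5, 3, 8, 1, 3, 5]

# Building a set hashes each element once: O(n) on average, no comparisons.
unique = set(data)

# In CPython, small non-negative ints hash to themselves, so they tend to land
# in hash-table slots in increasing order. That is a side effect, not a sort.
hashes_match = all(hash(i) == i for i in range(10))

# If sorted output is actually needed, ask for it explicitly: O(n log n).
ordered = sorted(unique)
```

With strings or large integers the iteration order of the set generally no longer looks sorted, which is why the order should never be relied on.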
You can browse the source code for set here.
I am looking for a Python datastructure that functions as a sorted list that has the following asymptotics:
O(1) pop from beginning (pop smallest element)
O(1) pop from end (pop largest element)
>= O(log n) insert
Does such a datastructure with an efficient implementation exist? If so, is there a library that implements it in Python?
A regular red/black tree or B-tree can do this in an amortized sense. If you store pointers to the smallest and biggest elements of the tree, then the cost of deleting those elements is amortized O(1), meaning that any series of d deletions will take time O(d), though individual deletions may take longer than this. The cost of an insertion is O(log n), which is as good as possible, because otherwise you could use the data structure to sort n items in less than O(n log n) time.
As for libraries that implement this - that I’m not sure of.
I am having problems trying to find the Big-O runtime of this. It's building a heap by calling the insert function to insert the elements into the heap.
buildHeap(A)
    h = new empty heap
    for each element e in A
        h.insert(e)
What is the Big-O runtime of this version of buildHeap?
Written this way, for a typical binary heap, it would be O(n log n); you're inserting one element at a time, and each insertion is O(log n). There are optimized ways to build a heap from an array of n elements all at once in O(n) time (referred to as the "heapify" operation), but it's not done by repeated single-element insertions.
The big-O could change depending on the type of heap; some variant heap designs have O(1) insertion, though of course they come with other trade-offs that differ by type, e.g. memory fragmentation, complexity of implementation, higher fixed costs per operation, etc.
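The two approaches can be sketched with Python's standard heapq module, which implements a binary min-heap on top of a list. The function names below are made up for illustration.

```python
import heapq


def build_heap_by_insertion(items):
    """n pushes at O(log n) each: O(n log n) in the worst case."""
    heap = []
    for item in items:
        heapq.heappush(heap, item)
    return heap


def build_heap_by_heapify(items):
    """Bottom-up construction over the whole list at once: O(n)."""
    heap = list(items)  # copy so the caller's list is left untouched
    heapq.heapify(heap)
    return heap


def is_min_heap(heap):
    """Check that every element is >= its parent."""
    return all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))


values = [9, 4, 7, 1, 8, 2]
heap_a = build_heap_by_insertion(values)
heap_b = build_heap_by_heapify(values)
```

Both results are valid heaps with the same smallest element at index 0, although the internal layout of the two lists may differ.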
While trying to find the best data structure in Python 3 for a frontier problem I have to develop, I have just realised that the complexity of using the bisect module to do a real-time ordered insert is not O(n log n) as it should be, and grows exponentially instead. I do not know the reason for it, so I felt like asking here in case someone knows something about it, since I find it really interesting.
I think I'm using the module correctly, so it shouldn't be a problem on my end. Anyway, here is the code used to insert node objects, where the insertion position is determined by a random f value that the nodes have.
bisect.insort(self._frontier, (node._f, node))
I get lots of objects in the first few seconds, but then not that many over time. Bakuriu suggested that I ask this question, since he also found it interesting after doing some tests and getting the same results as me. The code he used to test it was the following:
python3 -m timeit -s 'import bisect as B; import random as R;seq=[]' 'for _ in range(100000):B.insort(seq, R.randint(0, 1000000))'
And these were his conclusions:
10k insertions is all fine (80ms and up to that point it basically scales linearly [keep in mind that it is O(nlog n) so it's a little bit worse than linear]) but with 100k it takes forever instead of 10 times more. A list of 100k elements isn't really that big and log(100k) is 16 so it's not that big.
Any help will be much appreciated!
You probably missed that the time complexity for insort is O(n) and this is documented clearly, for bisect.insort_left():
Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.
Finding the insertion point is cheap, but inserting into a Python list is not, as the elements past the insertion point have to be moved up a step.
Also see the TimeComplexity page on the Python Wiki, where list insertion is documented:
Insert O(n)
You can find the insertion point in O(log n) time, but the insertion step that follows is O(n), making this a rather expensive way to sort.
If you are using this to sort m elements, you have an O(m^2) (quadratic) solution for what should only take O(m log m) time with TimSort (the sorting algorithm used by the sorted() function).
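As a rough sketch of that comparison, the snippet below sorts the same data both ways. The helper name sort_with_insort is made up for illustration, and the input size is kept small so that the quadratic version still finishes quickly.

```python
import bisect
import random


def sort_with_insort(values):
    """Insertion sort via bisect: O(log m) search plus O(m) shift per element."""
    result = []
    for value in values:
        bisect.insort(result, value)  # the list shift makes each call O(m)
    return result


random.seed(0)  # fixed seed so repeated runs use the same data
values = [random.randint(0, 1_000_000) for _ in range(1000)]

slow = sort_with_insort(values)  # O(m^2) overall
fast = sorted(values)            # O(m log m) with Timsort
```

Both calls produce the same sorted list; only the amount of work differs, and the gap widens quickly as the input grows.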
Binary search takes O(log n) comparisons, but insort isn't just a binary search. It also inserts the element, and inserting an element into a length-n list takes O(n) time.
The _frontier naming in your original code snippet suggests some sort of prioritized search algorithm. A heap would probably make more sense for that, or a SortedList from the sortedcontainers package.
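A heap-based frontier could look like the sketch below, using the standard heapq module. The Frontier class, its method names, and the tie-breaking counter are assumptions made for illustration; they are not taken from the original code.

```python
import heapq
import itertools


class Frontier:
    """Hypothetical priority frontier: O(log n) push, O(log n) pop of smallest f."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so two entries with equal f never compare the nodes themselves.
        self._counter = itertools.count()

    def push(self, f, node):
        heapq.heappush(self._heap, (f, next(self._counter), node))

    def pop(self):
        f, _, node = heapq.heappop(self._heap)
        return f, node

    def __len__(self):
        return len(self._heap)


frontier = Frontier()
frontier.push(3.5, "c")
frontier.push(1.0, "a")
frontier.push(2.2, "b")
popped = [frontier.pop() for _ in range(len(frontier))]
```

Unlike a list kept sorted with insort, the heap only guarantees that the smallest entry is at the front, which is all a best-first search needs.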
What is the space complexity of Python's sort? I can't find any definitive documentation on this anywhere.
Space complexity is defined as how much additional space the algorithm needs in terms of the N elements. And even though according to the docs, the sort method sorts a list in place, it does use some additional space, as stated in the description of the implementation:
timsort can require a temp array containing as many as N//2 pointers, which means as many as 2*N extra bytes on 32-bit boxes. It can be expected to require a temp array this large when sorting random data; on data with significant structure, it may get away without using any extra heap memory.
Therefore the worst-case space complexity is O(N) and the best case is O(1).
Python's built-in sort method uses Timsort, a hybrid algorithm derived from merge sort. More information here: https://en.wikipedia.org/wiki/Timsort
In the worst case it behaves like merge sort, which means that its run time is O(n log n) and its space complexity is O(n).
I was wondering what the time complexity is of sorting a dictionary by key and of sorting a dictionary by value.
For example:
for key in sorted(my_dict, key=my_dict.get):
    <some-code>
In the above line, what is the time complexity of sorted? If it is assumed that quicksort is used, is it O(N log N) on average and O(N*N) in the worst case?
And are the time complexities of sorting by value and sorting by key different? Since accessing a value by its key takes only O(1) time, both should be the same?
Thanks.
sorted doesn't really sort a dictionary; it collects the iterable it receives into a list and sorts the list using the Timsort algorithm. Timsort is decidedly not a variant of quicksort; it is a hybrid algorithm closer to merge sort. According to Wikipedia, its complexity is O(n log n) in the worst case, with optimizations to speed up the commonly encountered partially ordered data sets.
Since collecting the dict keys is O(n), and looking up each value by key is O(1) on average, the complexity of both sorts is the same, determined by the sort algorithm as O(n log n).
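A small sketch of both sorts on a made-up dictionary (the contents of my_dict here are invented for illustration):

```python
my_dict = {"pear": 3, "apple": 7, "fig": 1}

# Sort by key: iterating a dict yields its keys, collected in O(n), sorted in O(n log n).
keys_by_key = sorted(my_dict)

# Sort by value: my_dict.get is called once per key (O(1) average each),
# so the total is still O(n log n).
keys_by_value = sorted(my_dict, key=my_dict.get)

# Same ordering, but keeping the values alongside the keys.
items_by_value = sorted(my_dict.items(), key=lambda kv: kv[1])
```

The key function is evaluated once per element before the comparisons start, so it adds O(n) work rather than multiplying the cost of the sort.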