What is the Big-O runtime of buildHeap? - python

I am having problems trying to find the Big-O runtime of this. It's building a heap by calling the insert function to insert the elements into the heap.
buildHeap(A)
    h = new empty heap
    for each element e in A
        h.insert(e)
What is the Big-O runtime of this version of buildHeap?

Written this way, for a typical binary heap, it would be O(n log n): you're inserting elements one at a time, and each insertion is O(log n). There is an optimized way to build a heap from an array of n elements all at once in O(n) time (the bottom-up "heapify" operation), but it's not done by repeated single-element insertions.
The big-O could change depending on the type of heap; some heap variants have O(1) insertion, though of course they come with other trade-offs that differ by type, e.g. memory fragmentation, implementation complexity, higher constant costs per operation, etc.
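To make the distinction concrete, here is a minimal sketch using Python's heapq module (assuming a plain list-backed binary heap is acceptable); the loop mirrors the pseudocode above and is O(n log n), while heapq.heapify does the bottom-up build in O(n):
import heapq

A = [5, 3, 8, 1, 9, 2]

# Version from the question: one insert at a time, O(log n) each, O(n log n) total.
h = []
for e in A:
    heapq.heappush(h, e)

# Bottom-up build: heapify rearranges the list in place in O(n).
h2 = list(A)
heapq.heapify(h2)

print(h[0], h2[0])  # both print 1, the smallest element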

Related

Efficient Sorted List Python

I am looking for a Python datastructure that functions as a sorted list that has the following asymptotics:
O(1) pop from beginning (pop smallest element)
O(1) pop from end (pop largest element)
>= O(log n) insert
Does such a datastructure with an efficient implementation exist? If so, is there a library that implements it in Python?
A regular red/black tree or B-tree can do this in an amortized sense. If you store pointers to the smallest and biggest elements of the tree, then the cost of deleting those elements is amortized O(1), meaning that any series of d deletions will take time O(d), though individual deletions may take longer than this. The cost of an insertion is O(log n), which is as good as possible, because otherwise you could sort n items in less than O(n log n) time with your data structure.
As for libraries that implement this, I'm not sure.
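If a third-party package is acceptable, sortedcontainers.SortedList comes close in practice; a minimal sketch of the interface you describe (note that its end pops are roughly O(log n) amortized rather than a strict O(1)):
from sortedcontainers import SortedList  # pip install sortedcontainers

sl = SortedList()
for x in [5, 1, 9, 3]:
    sl.add(x)             # roughly O(log n) per insert

smallest = sl.pop(0)      # pop the smallest element
largest = sl.pop(-1)      # pop the largest element
print(smallest, largest)  # 1 9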

Asymptotic complexity of converting list to set in Python

There's already a question regarding this, and the answer says that the asymptotic complexity is O(n). But I observed that if an unsorted list is converted into a set, the set can be printed out in a sorted order, which means that at some point in the middle of these operations the list has been sorted. Then, as any comparison sort has the lower bound of Omega(n lg n), the asymptotic complexity of this operation should also be Omega(n lg n). So what exactly is the complexity of this operation?
A set in Python is an unordered collection so any order you see is by chance. As both dict and set are implemented as hash tables in CPython, insertion is average case O(1) and worst case O(N).
So list(set(...)) is always O(N) and set(list(...)) is average case O(N).
You can browse the source code for set here.
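A quick way to see that the "sorted" appearance is a hashing artifact rather than actual sorting: in CPython, small ints hash to themselves, so they tend to land in hash-table slots in increasing order, while strings do not:
# Small ints hash to themselves in CPython, so they often land in slots
# in increasing order and the set *looks* sorted.
print(set([3, 1, 2, 6, 4]))             # typically {1, 2, 3, 4, 6}

# String hashes are effectively arbitrary (and randomized per process),
# so the iteration order shows no such pattern.
print(set(["pear", "kiwi", "apple"]))   # order varies between runs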

What is the space complexity of the python sort?

What space complexity does the python sort take? I can't find any definitive documentation on this anywhere
Space complexity is defined as how much additional space an algorithm needs, as a function of the number of elements N. Even though, according to the docs, the sort method sorts a list in place, it does use some additional space, as stated in the description of the implementation:
timsort can require a temp array containing as many as N//2 pointers, which means as many as 2*N extra bytes on 32-bit boxes. It can be expected to require a temp array this large when sorting random data; on data with significant structure, it may get away without using any extra heap memory.
Therefore the worst-case space complexity is O(N) and the best case is O(1).
Python's built-in sort method is a hybrid of merge sort and insertion sort called Timsort; more information here - https://en.wikipedia.org/wiki/Timsort.
Like merge sort, its average and worst-case running time is O(n log n) (its best case is O(n) on already-ordered data), and its worst-case extra space is O(n).
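A rough way to observe that temporary buffer (just a sketch; the exact numbers depend on the platform, Python version and input data):
import random
import tracemalloc

data = [random.random() for _ in range(100_000)]

tracemalloc.start()
data.sort()                                    # in place, but may allocate a temp array
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"peak extra memory during sort: ~{peak} bytes")
# Random data tends to show a noticeable peak; already-sorted data shows
# little or none, matching the O(N) worst case / O(1) best case above.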

Is the Big O notation the same for memoized recursion versus iteration?

I am using a simple example off the top of my head here
function factorial(n)
    if n == 1 return 1
    else return n * factorial(n-1)

function factorial(n)
    result = 1
    for i = 1 to n
        result *= i
    return result
Or functions that are recursive and have memoization vs. dynamic programming where you iterate over an array and fill in values, etc.
I know that sometimes recursion is bad because you can run out of memory (tail recursion?) with the heap (or stack?), but does any of this affect O notation?
Does a recursive memoized algorithm have the same O notation / speed as the iterative version?
Generally when considering an algorithm's complexity we would consider space and time complexity separately.
Two similar algorithms, one recursive, and one converted to be not recursive will often have the same time complexity, but differ in space complexity.
In your example, both factorial functions are O(n) in time, but the recursive version is O(n) in space (for the call stack), versus O(1) for the iterative version.
This difference isn't universal. Memoization takes space, and sometimes the space it takes up is comparable to, or even greater than, the stack space a recursive version uses.
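The space difference is very visible in CPython, where the default recursion limit (about 1000 frames) turns the recursive version's O(n) stack usage into a hard failure while the O(1)-space iterative version is unaffected; a small sketch:
import sys

def factorial_recursive(n):
    if n == 1:
        return 1
    return n * factorial_recursive(n - 1)

def factorial_iterative(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

print("recursion limit:", sys.getrecursionlimit())  # typically 1000
print(factorial_iterative(5000) > 0)                # fine: O(1) extra space
try:
    factorial_recursive(5000)                       # needs ~5000 stack frames
except RecursionError as err:
    print("recursive version blew the stack:", err)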
Depending on the complexity of what you're using to store memoized values, the two will have the same order of complexity. For example, using a dict in Python (which has (amortized) O(1) insert/update/delete times), using memoization will have the same order (O(n)) for calculating a factorial as the basic iterative solution.
However, just as one can talk about time complexity, one can also talk about space complexity. Here, the iterative solution uses O(1) memory, while the memoized solution uses O(n) memory.
If you are talking about asymptotic time complexity, of course it's the same, because you are using the same algorithm.
I guess what you really care about is performance. For a C-like language, it is possible that recursion will be more expensive.
Are you going to actually use the memoized results?
Besides the fact that the order is the same (both scale equivalently), for a single run of factorial, memoizing is useless - you'll walk through a series of arguments, and none of them will repeat - you'll never use your saved memoized values, meaning that you'll have wasted space and time storing them, and not gotten any speed-ups anywhere else.
However... once you have your memoized dictionary, subsequent factorial calls will be less than O(n), and will depend on the history. I.e. if you calculate factorial(10), then the values of factorial from 1 up to 10 are available for instant lookup - O(1). If you then calculate factorial(15), it does 15*14*13*12*11*factorial(10), where factorial(10) is just looked up, for 5 multiplies total (instead of 14).
However, you could also create a lookup dict for the iterative version, I guess. Memoization wouldn't help as much there - factorial(10) would only store the result for 10, not all the results down to 1, because 10 is the only argument the memoizer would see. But the function could store those intermediate values in the memoization dict directly.
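A minimal dict-memoized sketch of that behaviour (the names are just for illustration): the first call fills the cache for every argument on the way down, and a later, larger call only performs the multiplies above the largest cached value:
memo = {1: 1}

def factorial_memo(n):
    if n not in memo:
        memo[n] = n * factorial_memo(n - 1)
    return memo[n]

factorial_memo(10)      # caches results for 1..10 on the way down
print(sorted(memo))     # [1, 2, 3, ..., 10]
factorial_memo(15)      # only multiplies 11..15, then reuses the cached factorial(10)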

Time complexity of pytables File.get_node() operation

what is the time complexity of the pytables file operation get_node?
Let's say I query
mynode = myfile.get_node(where='group0/group1/..../groupN', name ='mynode')
How does this operation scale with N the number of parent groups of mynode ?
Linearly, i.e. O(N), or worse O(N*d) where d is the average branching factor of my hdf5 node tree, or very fast O(1) because pytables internally keeps some sort of dictionary of all pathways?
Thanks a lot!
HDF5 implements nodes as a B-tree, so get_node() has a time complexity of O(log N) [1]. PyTables does not do any preloading of these paths into a dictionary to make this O(1). However, once a node has been loaded it is tagged as 'alive' and goes into an alive_nodes dictionary, so subsequent access is O(1) as long as the node remains in memory. This is sort of a lazy O(1) operation where you pay the O(log N) cost upfront once.
[1] http://en.wikipedia.org/wiki/B-tree
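A small sketch of that caching behaviour, using a made-up file layout (the PyTables calls themselves are standard): the first get_node() pays the lookup cost, and repeating it while the node is alive is served from the cache:
import tables

# Build a tiny file with a few nested groups (hypothetical layout).
with tables.open_file("demo.h5", mode="w") as f:
    g = f.root
    for i in range(3):
        g = f.create_group(g, f"group{i}")
    f.create_array(g, "mynode", [1, 2, 3])

with tables.open_file("demo.h5", mode="r") as f:
    # First access walks the group hierarchy / B-tree: roughly O(log N).
    node = f.get_node(where="/group0/group1/group2", name="mynode")
    # Second access is served from the alive_nodes cache: effectively O(1).
    node_again = f.get_node(where="/group0/group1/group2", name="mynode")
    print(node is node_again)  # typically True while the node stays alive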
