I want to pop the first (or the nth) element of a list (or deque). My friend told me that for a deque this is an O(1) operation. I am not talking about popping the last element; I am talking about popping the first or the nth element.
Yes, removing the first or the last element (as well as inserting at either end) is O(1).
You can see more information about that here, quoting from there:
The complexity (efficiency) of common operations on deques is as follows:
Random access - constant O(1)
Insertion or removal of elements at the end or beginning - constant O(1)
Insertion or removal of elements - linear O(n)
(Note that this describes C++'s std::deque; for Python's collections.deque, indexed access is O(1) only at the ends and slows to O(n) in the middle, per the Python docs.)
Popping the first element is O(1), but popping the nth element is not: the deque has to reach index n first.
As written in the Python documentation:
https://docs.python.org/3/library/collections.html#deque-objects
Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.
It is possible to build a data structure that can pop the nth element in constant time, but its space complexity would not be satisfactory.
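A quick sketch of the asymmetry with a plain collections.deque: popleft() is O(1), while removing the nth element requires moving O(n) elements to reach index n (here done explicitly with rotate):

```python
from collections import deque

d = deque([10, 20, 30, 40, 50])

first = d.popleft()        # O(1): remove and return the leftmost element
print(first)               # 10

# Removing the nth element is O(n): the deque must move n elements.
n = 2
d.rotate(-n)               # bring index n to the front (moves n elements)
nth = d.popleft()          # O(1) once it is at the front
d.rotate(n)                # rotate the remaining elements back into order
print(nth, list(d))        # 40 [20, 30, 50]
```

`del d[n]` does the same job more directly, but with the same O(n) cost under the hood.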
Related
I am looking for a Python datastructure that functions as a sorted list that has the following asymptotics:
O(1) pop from beginning (pop smallest element)
O(1) pop from end (pop largest element)
>= O(log n) insert
Does such a datastructure with an efficient implementation exist? If so, is there a library that implements it in Python?
A regular red/black tree or B-tree can do this in an amortized sense. If you store pointers to the smallest and biggest elements of the tree, then the cost of deleting those elements is amortized O(1), meaning that any series of d deletions will take time O(d), though individual deletions may take longer than this. The cost of insertions is O(log n), which is as good as possible, because otherwise you could sort n items in less than O(n log n) time with your data structure.
As for libraries that implement this - that I’m not sure of.
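No stdlib structure gives exactly those bounds, but as a stdlib-only approximation (the class and method names here are invented for the sketch), two heaps with lazy deletion give O(log n) insert and amortized O(log n) pop-min/pop-max:

```python
import heapq
from itertools import count

class MinMaxPool:
    """Sketch of a double-ended priority queue: two heaps plus lazy
    deletion. Insert is O(log n); pop_min/pop_max are amortized
    O(log n) -- not the O(1) pops asked for, but stdlib-only."""

    def __init__(self):
        self._min = []        # min-heap of (value, id)
        self._max = []        # max-heap of (-value, id)
        self._dead = set()    # ids already popped via the other heap
        self._ids = count()

    def insert(self, x):
        i = next(self._ids)   # unique id disambiguates duplicate values
        heapq.heappush(self._min, (x, i))
        heapq.heappush(self._max, (-x, i))

    def pop_min(self):
        while self._min[0][1] in self._dead:      # discard stale entries
            self._dead.discard(heapq.heappop(self._min)[1])
        x, i = heapq.heappop(self._min)
        self._dead.add(i)                         # mark stale in max-heap
        return x

    def pop_max(self):
        while self._max[0][1] in self._dead:
            self._dead.discard(heapq.heappop(self._max)[1])
        neg_x, i = heapq.heappop(self._max)
        self._dead.add(i)
        return -neg_x

pool = MinMaxPool()
for x in (3, 1, 4, 1, 5):
    pool.insert(x)
print(pool.pop_min(), pool.pop_max())   # 1 5
```

The lazy-deletion trick keeps each element in both heaps and skips entries that were already removed through the other side, so no O(n) search is ever needed.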
What is the time complexity of operations in SortedList implementation of sortedcontainers module?
As I understand, the underlying data structure is an array list. So does insertion take O(n) time, since the index can be found in O(log n) but inserting the element at the correct location is O(n)? Similarly, popping an element from an index must be O(n) as well.
Insert, remove, get index, bisect left/right, and finding an element in the list are all O(log n) operations. It's similar to TreeSet in Java and multiset in C++, which are implemented with AVL or red-black trees (sortedcontainers itself uses a list of short sublists rather than a tree, but achieves comparable complexity).
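For contrast, the question's intuition is exactly what happens with a plain Python list kept sorted via the stdlib bisect module: the search is O(log n), but the insert shifts every later element, so it is O(n) overall (splitting the list into short sublists is how sortedcontainers avoids the full shift):

```python
import bisect

data = [10, 30, 50, 70]           # a plain sorted list
i = bisect.bisect_left(data, 40)  # O(log n): binary search for the index
data.insert(i, 40)                # O(n): everything after i shifts right
print(i, data)                    # 2 [10, 30, 40, 50, 70]
```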
Why does adding elements to a set take longer than adding elements to a list in Python? I created a loop that iterates over 1,000,000 elements, adding each to a list and to a set. The list consistently takes around 10 seconds versus around 20 seconds for the set.
Both operations are O(1) amortized time-complexity.
Appending elements to a list has a lower coefficient because it does not need to hash the element first, nor does it need to check/handle hash collisions.
In the case of adding x into a set, Python needs to compute hash(x) first, because keeping the hash of all elements is what allows sets to have fast O(1) membership checks (compared to O(n) membership checks for lists).
The time complexity for appending to a list is the same as for adding to a set - both are O(1) amortised operations, meaning on average they each take a constant amount of time, although occasionally the operation may take more than that constant amount of time in order to dynamically resize the array that the data is stored in.
However, just because both are O(1) doesn't mean they take the same amount of time:
Appending to a list should be faster, because a list is implemented as a dynamic array, so appending an element just requires writing that element at the right index (which is already known), and increasing the length by 1.
In contrast, adding to a set is slower, because it requires computing the hash of the element to find the index to start looking from, then testing indices in some sequence to see if the element is already there (by testing if there is an element at the index, and if so, whether it equals the element being inserted) until either finding it, or finding an empty space where the element should be added.
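A rough way to observe the constant-factor difference yourself (absolute timings will vary by machine; only the ratio is interesting):

```python
import timeit

# Both loops perform amortized-O(1) operations; the set loop pays
# extra per element for hashing and probing.
t_list = timeit.timeit("for i in range(100_000): lst.append(i)",
                       setup="lst = []", number=1)
t_set = timeit.timeit("for i in range(100_000): s.add(i)",
                      setup="s = set()", number=1)
print(f"list.append: {t_list:.4f}s   set.add: {t_set:.4f}s")
```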
I'm trying to determine whether the complexity of converting a collections.deque object into a Python list is O(n). I imagine it would have to take every element and copy it into the list, but I cannot seem to find the implementation code behind deque. So has Python built in something more efficient under the hood that could allow for O(1) conversion to a list?
Edit: Based on the following, I do not believe it could be any faster than O(n):
"Indexed access is O(1) at both ends but slows to O(n) in the middle. For fast random access, use lists instead."
If it cannot access a middle node in O(1) time it will not be able to convert without the same complexity.
You have to access every node. O(1) time is impossible for that fact alone.
I would believe that a deque follows the same principles as conventional deques, in that it's constant time to access the first element. You have to do that for n elements, so the runtime to do so would be O(n).
Here is the implementation of deque
However, that is irrelevant for determining complexity to convert a deque to list in python.
If Python is not reusing the data structure internally somehow, conversion into a list will require a walk through the deque, so it will be O(n).
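A minimal check of the behaviour discussed above: there is no shortcut, list() walks the deque once and copies every element.

```python
from collections import deque

d = deque(range(5))
lst = list(d)        # copies every element in order: O(n)
print(lst)           # [0, 1, 2, 3, 4]
mid = d[2]           # indexing near the middle is O(n), not O(1)
print(mid)           # 2
```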
I am writing a program, that does a lot of deletions at either the front or back of a list of data, never the middle.
I understand that deletion of the last element is cheap, but how about deletion of the first element? For example let's say list A's address is at 4000, so element 0 is at 4000 and element 1 is at 4001.
Would deleting element 0 then just make the compiler put list A's address at 4001, or would it shift element 1 at 4001 to the location at 4000, and shift all other elements down by 1?
No, it isn't cheap. Removing an element from the front of the list (using list.pop(0), for example) is an O(N) operation and should be avoided. Similarly, inserting elements at the beginning (using list.insert(0, <value>)) is equally inefficient.
This is because, after the list is resized, its elements must be shifted. For CPython, in the l.pop(0) case this is done with memmove, while for l.insert(0, <value>) the shifting is implemented with a loop through the stored items.
Lists are built for fast random access and O(1) operations on their end.
Since you're doing this operation commonly, though, you should consider using a deque from the collections module (as @ayhan suggested in a comment). The docs on deque also highlight how list objects aren't suitable for these operations:
Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation.
(Emphasis mine)
The deque data structure offers O(1) complexity for both sides (beginning and end) with appendleft/popleft and append/pop methods for the beginning and end respectively.
Of course, the deque incurs some extra space overhead (due to its block structure), which should generally be of no concern as the sizes grow (though, as @juanpa noted in a comment, that doesn't always hold). Finally, as @ShadowRanger's insightful comment notes, with really small sequences the cost of popping or inserting at the front is so small that it becomes of really no concern.
So, in short, for lists with many items, use deque if you need fast appends/pops from both sides, else, if you're randomly accessing and appending to the end, use lists.
Removing elements from the front of a list in Python is O(n), while removing elements from the ends of a collections.deque is only O(1), so a deque would be great for your purpose. It should be noted, however, that accessing or adding/removing in the middle of a deque is more costly than for a list.
The O(n) cost for removal is because a list in CPython is simply implemented as an array of pointers, thus your intuition regarding the shifting cost for each element is correct.
This can be seen in the Python TimeComplexity page on the Wiki.