I'm wondering about the time complexity of remove on a list versus remove on a set.
From what I've read, my understanding is:
Removal of list is O(n)
Removal of set is O(1).
I've read some discussions, but never seen a proof. If anyone could shed some light on this, that would be great. In particular, how does set achieve O(1) removal?
Using Python 2.7.
a = set([1,2,3,4,5])
b = [1,2,3,4,5]
a.remove(3)
b.remove(3)
print a
print b
From the docs:
list.remove(x)
Remove the first item from the list whose value is x.
It is an error if there is no such item.
Without going into implementation details: the item to remove can be anywhere in the list, so a linear scan is needed to find it. Once you find the index of the item, all subsequent elements must be shifted down by one index. Between the index steps of traversal and the size - index steps of shifting, the total work is equivalent to traversing the entire list: O(n).
You can find the source here: https://hg.python.org/cpython/file/tip/Objects/listobject.c#l2197 (also look for list_ass_slice(..)).
However, a set is different. It uses the hash value of the stored object to locate it in its buckets. On average, locating an object by its hash takes constant time. It is not always constant: when there is a hash collision a further search is required. But assuming a good hash function, it usually is.
UPDATE: I must thank Stefan Pochmann for pointing out the mistake.
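As a rough empirical check (not a proof), you can time the two removals with timeit. This is a minimal sketch; the exact numbers depend on your machine, but the list timing grows with n while the set timing stays roughly flat:

import timeit

n = 100000
setup = "s = set(range(%d)); l = list(range(%d))" % (n, n)

# Remove an element from the middle, then put it back, so that every
# iteration does the same amount of work.
print(timeit.timeit("s.remove(50000); s.add(50000)",
                    setup=setup, number=1000))   # roughly constant in n
print(timeit.timeit("l.remove(50000); l.insert(50000, 50000)",
                    setup=setup, number=1000))   # grows linearly with n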
Consider an array which we know will have integers from 0 to n in sequence, starting at 0. For example:
a=[0,1,2]
Now, we pop the element at position 1.
a.pop(1)
The array we are left with:
[0,2]
And we have a[0]=0, a[1]=2.
Let's pop the element at position 0.
a.pop(0)
We are left with:
[2]
Now, referencing a[0] will yield 2.
This is the behavior I want, except that the pop operations are O(n) in time since the entire array needs to be copied over to the left. Is there a way to do this more efficiently so that each pop (or equivalent) operation is no more than O(log(n))? It's okay to take another O(n) of space if required.
After the sequence of operations shown above, querying for index 0 in the end should still return 2. I tried storing the popped elements in a binary search tree (without actually popping), but the logic became too convoluted.
You seem to be asking for an alternative data structure. Linked lists are good at insertion and deletion at an arbitrary position, but only if you already hold a reference to that position; otherwise you'll spend O(n) time scanning to reach it.
As far as I know, there is no linked list in Python's standard library, but you could implement your own.
At the end of the day, you can't do "the exact same thing" in less time; something has to give. You need to relax one of your requirements (e.g. that the list type be used).
An alternative requirement you might be able to relax is "the existing elements need to remain in the same order". If you don't mind a little re-ordering, you can swap lst[i] with lst[-1]. Then, pop from the end (which is efficient for lists). This can be done pretty easily using Python's tuple assignment. Like so:
lst[i], lst[-1] = lst[-1], lst[i]  # swap the target with the last element
lst.pop()  # now the item you wanted removed is gone; only the element
           # that used to be at the end has changed position
This is often a viable alternative, because you don't always care about the order of a list.
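If you want this as a reusable function, the idiom can be wrapped in a small helper. This is just a sketch; swap_pop is a name made up here, not a standard method:

def swap_pop(lst, i):
    """Remove lst[i] in O(1) by swapping it with the last element.
    The relative order of the remaining elements is not preserved."""
    lst[i], lst[-1] = lst[-1], lst[i]
    return lst.pop()

a = [0, 1, 2, 3]
swap_pop(a, 1)   # returns 1
print(a)         # [0, 3, 2] -- the old last element now sits at index 1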
I am writing a Python program to remove duplicates from a list. My code is the following:
some_values_list = [2,2,4,7,7,8]
unique_values_list = []
for i in some_values_list:
    if i not in unique_values_list:
        unique_values_list.append(i)
print(unique_values_list)
This code works fine. However, an alternative solution is given and I am trying to interpret it (as I am still a beginner in Python). Specifically, I do not understand the added value or benefit of creating an empty set: how does that make the code clearer or more efficient? Isn't it enough to create an empty list, as I did in the first example?
The code for the alternative solution is the following:
a = [10,20,30,20,10,50,60,40,80,50,40]
dup_items = set()
uniq_items = []
for x in a:
    if x not in dup_items:
        uniq_items.append(x)
        dup_items.add(x)
print(dup_items)
This code also throws an error, TypeError: set() missing 1 required positional argument: 'items'. (It comes from a website of Python exercises with an answer key, so it is supposed to be correct.)
Determining if an item is present in a set is generally faster than determining if it is present in a list of the same size. Why? Because for a set (at least, for a hash table, which is how CPython sets are implemented) we don't need to traverse the entire collection of elements to check if a particular value is present (whereas we do for a list). Rather, we usually just need to check at most one element. A more precise way to frame this is to say that containment tests for lists take "linear time" (i.e. time proportional to the size of the list), whereas containment tests in sets take "constant time" (i.e. the runtime does not depend on the size of the set).
Looking up an element in a list takes O(N) time (you can find an element in logarithmic time, but only in a sorted list, which is not your case). So if you use the same list both to collect unique elements and to look up newly seen ones, the whole algorithm runs in O(N²) time (N elements, O(N) lookup each). A set in Python is a hash set, so lookups take O(1) on average. Thus, if you use an auxiliary set to keep track of the unique elements already found, the whole algorithm runs in only O(N) time on average: a whole order better.
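If you would rather see the difference than take it on faith, here is a rough benchmark of the two approaches (the timings are machine-dependent and purely illustrative; the point is that the list-only version degrades quadratically as the input grows):

import timeit

def dedup_list(values):
    unique = []
    for v in values:
        if v not in unique:   # O(len(unique)) scan on every iteration
            unique.append(v)
    return unique

def dedup_set(values):
    seen = set()
    unique = []
    for v in values:
        if v not in seen:     # O(1) average-case membership test
            unique.append(v)
            seen.add(v)
    return unique

data = list(range(1000)) * 2  # many distinct values: the hard case
print(timeit.timeit(lambda: dedup_list(data), number=10))  # O(N^2) overall
print(timeit.timeit(lambda: dedup_set(data), number=10))   # O(N) overall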
In most cases sets are faster than lists. One of those cases is looking for an item with the in keyword. The reason sets are faster is that they are implemented as hash tables.
So, in short, if x not in dup_items in the second code snippet runs faster than if i not in unique_values_list.
If you want to check the time complexity of different Python data structures and operations, you can check this link.
Your code is also inefficient in another way: for each item in the list, you search an ever-growing list of unique values, whereas the second snippet looks the item up in a set that is usually smaller. That is not always the case, though: if the list is all unique items, the set grows to the same size.
Hope that clarifies things.
Assume that I have two lists named a and b, both of size n, and I want to do the following slice-setting operation with k < n:
a[:k] = b[:k]
In the Python wiki's Time Complexity page it says that the complexity of slice setting is O(n+k) where k is the length of the slice. I just cannot understand why it is not just O(k) in the above situation.
I know that slicing returns a new list, so it is O(k), and I know that the list holds its data in a continuous way, so inserting an item in the middle would take O(n) time. But the above operation can easily be done in O(k) time. Am I missing something?
Furthermore, is there a documentation where I can find detailed information about such issues? Should I look into the CPython implementation?
Thanks.
O(n+k) is the average case, which includes having to grow or shrink the list to adjust for the number of elements inserted to replace the original slice.
In your case, where the slice is replaced with an equal number of new elements, the implementation only takes O(k) steps. But across all possible combinations of the number of elements inserted and deleted, the average case has to move the n remaining elements of the list up or down.
See the list_ass_slice function for the exact implementation.
You're right, if you want to know the exact details it's best to use the source. The CPython implementation of setting a slice is in listobject.c.
If I read it correctly, it will...
Count how many new elements you're inserting (or deleting!)
Shift the n existing elements of the list over enough places to make room for the new elements, taking O(n) time in the worst case (when every element of the list has to be shifted).
Copy over the new elements into the space that was just created, taking O(k) time.
That adds up to O(n+k).
Of course, your case is probably not that worst case: you're changing the last k elements of the list, so there might be no need for shifting at all, reducing the complexity to O(k) you expected. However, that is not true in general.
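To make the cases concrete, here is a small sketch; only the second and third assignments force the trailing elements to move:

a = [0, 1, 2, 3, 4]

a[:2] = [10, 11]      # same length: slots are overwritten in place, O(k)
print(a)              # [10, 11, 2, 3, 4]

a[:2] = [20]          # shorter: the trailing elements shift left
print(a)              # [20, 2, 3, 4]

a[:1] = [30, 31, 32]  # longer: the list grows and trailing elements shift right
print(a)              # [30, 31, 32, 2, 3, 4]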
I tested two different ways to reverse a list in Python.
import timeit
value = [i for i in range(100)]
def rev1():
    v = []
    for i in value:
        v.append(i)
    v.reverse()

def rev2():
    v = []
    for i in value:
        v.insert(0, i)
print timeit.timeit(rev1)
print timeit.timeit(rev2)
Interestingly, the second method, which inserts the value at the front, is much slower than the first one.
20.4851300716
73.5116429329
Why is this? In terms of operations, inserting an element at the head doesn't seem that expensive.
insert is an O(n) operation as it requires all elements at or after the insert position to be shifted up by one. append, on the other hand, is generally O(1) (and O(n) in the worst case, when more space must be allocated). This explains the substantial time difference.
The time complexities of these methods are thoroughly documented here.
I quote:
Internally, a list is represented as an array; the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move).
Now, going back to your code, we can see that rev1() is an O(n) implementation whereas rev2() is in fact O(n²), so it makes sense that rev2() will be much slower.
In Python, lists are implemented as arrays. If you append an element, it is written into space already reserved at the end of the array (the array is occasionally grown to keep such space available). If you prepend an element, all existing elements must be shifted by one, and that is very expensive.
You can confirm this by reading about Python lists online. Python implements a list as an array whose size is typically larger than the current length of the list; the unused slots sit at the end of the array and represent room for new elements added to the END of the list, not the beginning.

Python uses a classical amortized-cost approach: on average, appending to the end takes O(1) time over a sequence of appends, although occasionally a single append finds the array full, so a new, larger array must be created and all the data copied over. If you always insert at the front instead, every element in the underlying array must be moved over one index to make room at the beginning.

To summarize: if you build a list with N insertions, the total running time is O(N) if you always append new items to the end, and O(N²) if you always insert at the front.
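If you genuinely need to add elements at the front, collections.deque supports O(1) appends at both ends. As a sketch, a third variant of the benchmark above could look like this (rev3 is a hypothetical addition, reusing value from the question):

from collections import deque

def rev3():
    v = deque()
    for i in value:
        v.appendleft(i)   # O(1) per element, unlike list.insert(0, i)
    return list(v)        # convert back to a list if one is needed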
I am trying to move the even numbers in an array to the front and the odd numbers to the back of the array. The problem asks to do this in place with a linear-time algorithm.
I came up with this:
def sort(a):
    for i in range(0, len(a)-1):
        if a[i] % 2 == 0:
            a.insert(0, a.pop(i))
    return a
The issue is that someone told me that, technically, a.insert is an O(n) operation, so this would be considered a non-linear algorithm (once the for i in range part of the function is included). Since the forum thread that asked this question is a couple of months old, I couldn't ask for an explanation.
Basically, I believe he said "technically" because, since this inserts at the front, it does not have to scan another N elements of the array, therefore making it run for practical purposes at O(n) and not O(n²). Is this a correct assessment?
Also, someone on the forum used a.append to modify the above and changed it to look for odd numbers. No one replied, so I was wondering: is a.append not an O(n) function, since it adds to the end? Is it O(1)?
Thanks for explanations and clarifications!
Inserting at index 0 of a list requires shifting every other element along, which makes it an O(N) operation. However, if you use a deque, this operation is O(1).
append is an amortized O(1) operation, since it simply places the item at the end of the list and no shifting is done. Occasionally the list needs to grow, so it is not always O(1) for a single call.
That is correct - insertion at the front of a Python standard list is O(n). Python lists are implemented as arrays, and thus inserting something at the front of the list requires shifting the entire contents over one spot. Appending, on the other hand, does not require any shifting, and thus is amortized O(1).
Note, however, that a.pop(i) is also an O(n) operation, because it requires shifting everything after the popped item over one spot. Thus, simply modifying your code to use append() instead of insert() would still not result in a linear algorithm.
A linear-time algorithm wouldn't use pop() but instead would do something like swap elements around so that the rest of the list doesn't have to be modified. For example, consider this:
def even_to_front(a_list):
    next_even = 0
    for idx in xrange(len(a_list)):
        if not a_list[idx] % 2:
            # a_list[idx] is even, so swap it towards the front
            a_list[idx], a_list[next_even] = a_list[next_even], a_list[idx]
            next_even += 1
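For example (note that the evens keep their relative order, while the odds may be reordered by the swaps):

a = [3, 8, 5, 12, 7, 4]
even_to_front(a)
print(a)   # [8, 12, 4, 3, 7, 5] -- evens first, in their original order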
Check this table of complexity:
Insert - O(n)
Append - O(1) (lists are over-allocated)
Here's how it can be done without append/insert or a deque:
def sort(L):
    i, j = 0, len(L)-1
    while i < j:
        # point i to the next odd number from the start
        while i < j and not L[i] % 2: i += 1
        # point j to the next even number from the end
        while i < j and L[j] % 2: j -= 1
        L[i], L[j] = L[j], L[i]
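A quick check on a sample input (with this two-pointer approach, neither group is guaranteed to keep its original order):

L = [3, 8, 5, 12, 7, 4]
sort(L)
print(L)   # [4, 8, 12, 5, 7, 3] -- evens at the front, odds at the back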
Every time you pop element from a list, you have to copy the trailing portion of the list to move it over one index to fill the hole left by the removed element. This is linear in the distance between the popped element and the tail of the list.
Every time you insert an element into a list, you have to copy the trailing portion of the list to move it over one index to create a spot to insert the new element. This is linear in the distance between the position into which you're inserting the element and the tail of the list.
If you use collections.deque, you can append and pop at both the front and the back in O(1) time. However, removing an element from the middle will still be linear (and I think you'd have to write it yourself).
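As a quick sketch of what deque gives you (appendleft, popleft, and remove are real deque methods; remove is the O(n) case mentioned above):

from collections import deque

d = deque([1, 2, 3, 4, 5])
d.appendleft(0)   # O(1)
d.append(6)       # O(1)
d.popleft()       # O(1), returns 0
d.pop()           # O(1), returns 6
d.remove(3)       # O(n): scans to find 3, then closes the gap
print(d)          # deque([1, 2, 4, 5])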