I have been refreshing my knowledge of data structures and algorithms using a book. I came across some sample code in the book, including some run time analysis values that I cannot seem to make sense of. I don't know if I am overthinking this or I am missing something extremely simple. Please help me out.
This is the case where they explain the logic behind adding an element at a specific index in a list. I get the logic, it is pretty simple. Move all the elements, starting with the rightmost one, one index to the right to make space for the element at the index. The code for this in the book is given by:
for j in range(self._n, k, -1):
    self._A[j] = self._A[j-1]
What I do not get is the range of the loop. Technically, self._n is equivalent to len(list) (an internal count maintained by the list), and if you index at len(list), you immediately get an IndexError. Secondly, even if that were not true, the loop overwrites the element at index n with the one at index n-1; nowhere does it first move the element at n to n+1, so that value is lost. Am I missing something here? I actually tried these conditions out in the Python interpreter and they seem to validate my understanding.
Some of the running-time analyses for list operations also confuse me. For example:
data.index(value) --> O(k+1)
value in data --> O(k+1)
data1 == data2 (similarly !=, <, <=, >, >=) --> O(k+1)
data[j:k] --> O(k-j+1)
I do not get the +1 at the end of each running time. Consider the data.index(value) operation, which returns the first index at which a given value is found. In the worst case it iterates through all n elements of the list, but if the search finds the value earlier, at index k, it returns from there. Why O(k+1)? The same logic applies to the other cases too, especially list slicing. When you slice a list, isn't it just O(k-j), given that the indices actually taken are j through k-1?
This should be quite elementary, and I feel silly for not being able to understand it. Or perhaps these are genuine errata in the book and my understanding is correct. Could someone please clarify this for me? Help is much appreciated.
Note (from the comments): the book in question is Data Structures and Algorithms in Python by Goodrich, Tamassia and Goldwasser, and the questions are about pages 202 to 204.
If you actually look at the whole definition of insert from the book, it makes more sense.
def insert(self, k, value):
    if self.n == self.capacity:
        self.resize(2 * self.capacity)
    for j in range(self.n, k, -1):
        self.A[j] = self.A[j-1]
    self.A[k] = value
    self.n += 1
The first line implies that self.n is the number of elements, and corresponds to the index past-the-end, which means that, for a user of the list, accessing it at that index would be erroneous. But this code belongs to the list, and because it has a capacity in addition to a size, it can use self.A[n] if self.n < self.capacity (which is true when the for loop starts).
The loop starts by moving the last element (at index n-1) into that next space in memory, which is out of bounds for a user but not internally, and then shifts every element down to index k in the same way. At the end, n is incremented to reflect the new size, so the "next space in memory" is now at index n-1 and contains the last element.
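To see it end to end, here is a minimal, runnable sketch of such a dynamic array (an illustration in the spirit of the book's class, not its exact code, which uses a ctypes array for backing storage):

class DynamicArray:
    def __init__(self):
        self.n = 0                        # number of elements in use
        self.capacity = 1                 # slots in the backing array
        self.A = [None] * self.capacity

    def resize(self, c):
        B = [None] * c                    # allocate a larger backing array
        for i in range(self.n):
            B[i] = self.A[i]              # copy the existing elements over
        self.A = B
        self.capacity = c

    def insert(self, k, value):
        if self.n == self.capacity:
            self.resize(2 * self.capacity)
        for j in range(self.n, k, -1):    # shift A[k..n-1] one slot right;
            self.A[j] = self.A[j-1]       # A[n] is valid since n < capacity
        self.A[k] = value
        self.n += 1

arr = DynamicArray()
for v in [10, 20, 40]:
    arr.insert(arr.n, v)                  # append at the end
arr.insert(2, 30)                         # shifts 40 right, then places 30
print(arr.A[:arr.n])                      # [10, 20, 30, 40]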
As for the time complexity of the different operations: well, they are not incorrect. Even though O(n+1) = O(n), you can still write O(n+1) if you want to, and it might be more "precise" in some cases.
For example, it is written that data.index(value) has a complexity of O(k+1), with k the index of the value being searched for. Well, if that value is at the very beginning, then k = 0 and the complexity is O(0+1) = O(1). And it's true: if you always search for a value that you know is at the very beginning, even though this operation is pointless, it has constant time complexity. If you wrote O(k) instead, then you would get O(0) for that operation, which I have never seen used, and which would suggest the operation is instantaneous.
The same thing happens for slicing: they probably wrote O(k-j+1) because if the slice is empty, then j = k and the complexity is O(1) instead of O(0); even producing an empty slice takes a constant amount of work.
Note that time complexity isn't usually defined in terms of the actual indices of a particular application of the function, but instead in terms of the total number of elements in the container on which the function is used. You can think of it as the mean complexity for using the function with every possible index, which in the cases of index and slicing, is simply O(n).
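To make that concrete (a back-of-the-envelope average, assuming the target is equally likely to be at any index): the mean cost of data.index(value) is (1/n) * ((0+1) + (1+1) + ... + ((n-1)+1)) = (n+1)/2, which is O(n).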
For the first case, I think the assumption is that you have a list of fixed maximum length and you are supposed to lose the last data point. Also, are you certain that self._n == len(data) and not self._n == len(data) - 1?
For the second case, as far as I understand, O(k+1) is the same as O(k), so it doesn't make sense to say O(k+1). But if we really want to know how someone might count up to k+1: I would guess the author is counting from 0, so going from the 0th to the kth index takes k+1 operations.
This is just an opinion, and an uninformed one, so please take it with a tablespoon of salt. I don't think that book is any good, man.
Consider an array which we know will have integers from 0 to n in sequence, starting at 0. For example:
a=[0,1,2]
Now, we pop the element at position 1.
a.pop(1)
The array we are left with:
[0,2]
And we have a[0]=0, a[1]=2.
Let's pop the element at position 0.
a.pop(0)
We are left with:
[2]
Now, referencing a[0] will yield 2.
This is the behavior I want, except that the pop operations are O(n) in time, since all the elements after the popped index need to be shifted left. Is there a way to do this more efficiently, so that each pop (or equivalent) operation is no more than O(log(n))? It's okay to use another O(n) of space if required.
After the sequence of operations shown above, querying for index 0 in the end should still return 2. I tried storing the popped elements in a binary search tree (without actually popping), but the logic became too convoluted.
You seem to be asking for an alternative data structure. Linked lists are good at insertion and deletion from an arbitrary position, but you need to have a reference to the location where you want to insert or delete; otherwise, you'll be spending O(n) scanning to the location that you want to insert or delete from.
As far as I know, there is no linked list in Python's standard library, but you could implement your own.
At the end of the day, you can't do "the exact same thing" in less time; something has to give. You need to relax one of your requirements (e.g. that the list type be used).
An alternative requirement you might be able to relax is "the existing elements need to remain in the same order". If you don't mind a little re-ordering, you can swap lst[i] with lst[-1]. Then, pop from the end (which is efficient for lists). This can be done pretty easily using Python's tuple assignment. Like so:
lst[i], lst[-1] = lst[-1], lst[i]  # swap the target with the last element
lst.pop()                          # now the item you wanted removed is gone,
                                   # but the remaining elements are not all
                                   # in the same order as before
                                   # (though they mostly are)
This is often a viable alternative, because you don't always care about the order of a list.
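Wrapped up as a function (swap_pop is a hypothetical name, just for illustration):

def swap_pop(lst, i):
    # Remove lst[i] in O(1) by swapping it with the last element.
    # Does not preserve the order of the remaining elements.
    lst[i], lst[-1] = lst[-1], lst[i]
    return lst.pop()

a = [0, 1, 2]
swap_pop(a, 1)   # a is now [0, 2]
swap_pop(a, 0)   # a is now [2]
print(a[0])      # 2

In this particular sequence the results happen to match the order-preserving example from the question, but that is a coincidence and does not hold in general.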
lt = 1000                                      # list primes to ...
remaining = list(range(2, lt + 1))             # remaining primes
for c in remaining:                            # current "prime" being tested
    for t in remaining[0:remaining.index(c)]:  # test divisor
        if c % t == 0 and c != t:
            if c in remaining:
                remaining.remove(c)
If you don't need context:
How can I either re-run the same target-list value, or use something other than "for" that re-reads the expression list on every iteration?
If you need context:
I am currently creating a program that lists primes from 2 to a given value (lt). I have a list 'remaining' that starts as all integers from 2 to the given value. One at a time, it takes a value 'c' from the list and tests it for divisibility by each smaller number 't' on the list. If 'c' is divisible by 't', it is removed from the list. By the end of the program, in theory, only primes remain. But I have run into the problem that, because I am removing items from the list and 'for' only reads 'remaining' once, 'for' skips values in 'remaining' and thus leaves composites in the list.
What you're trying to do is almost never the right answer (and it's definitely not the right answer here, for reasons I'll get to later), which is why Python doesn't give you a way to do it automatically. In fact, it's illegal to delete from or insert into a list while you're iterating over it, even if CPython and other Python implementations usually don't check for that error.
But there is a way you can simulate what you want, with a little verbosity:
for i in range(remaining.index(c)):
    if i >= remaining.index(c): break
    t = remaining[i]
Now we're not iterating over remaining, we're iterating over its indices. So, if we remove values, we'll be iterating over the indices of the modified list. (Of course we're not really relying on the range there, since the if…break tests the same thing; if you prefer for i in itertools.count():, that will work too.)
And, depending on what you want to do, you can expand it in different ways, such as:
end = remaining.index(c)
for i in range(end):
    if i >= end: break
    t = remaining[i]
    # possibly subtract from end within the loop
    # so we don't have to recalculate remaining.index(c)
… and so on.
However, as I mentioned at the top, this is really not what you want to be doing. If you look at your code, it's not only looping over all the primes less than c, it's calling a bunch of functions inside that loop that also loop over either all the primes less than c or your entire list (that's how index, remove, and in work for lists), meaning you're turning linear work into quadratic work.
The simplest way around this is to stop trying to mutate the original list to remove composite numbers, and instead build a set of primes as you go along. You can search, add, and remove from a set in constant time. And you can just iterate your list in the obvious way because you're no longer mutating it.
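A minimal sketch of that idea (the variable names are mine, for illustration):

lt = 1000
primes = []        # primes found so far, in increasing order, for trial division
prime_set = set()  # the same primes, for O(1) membership tests later

for c in range(2, lt + 1):
    if all(c % p != 0 for p in primes):  # no smaller prime divides c
        primes.append(c)
        prime_set.add(c)

print(997 in prime_set)  # True, answered in constant time on average

(For large limits you would also stop the trial division at sqrt(c), but that is a separate optimization.)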
Finally, this isn't actually implementing a proper prime sieve, but a much less efficient algorithm that for some reason everyone has been teaching as a Scheme example for decades and more recently translating into other languages. See The Genuine Sieve of Eratosthenes for details, or this project for sample code in Python and Ruby that shows how to implement a proper sieve and a bit of commentary on performance tradeoffs.
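For comparison, here is a textbook Sieve of Eratosthenes (a standard implementation, not the code from that project):

def sieve(limit):
    # Return all primes <= limit by crossing off multiples.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):  # start at p*p: smaller
                is_prime[multiple] = False               # multiples are done
    return [n for n, prime in enumerate(is_prime) if prime]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]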
(In the following, I ignore the XY problem of finding primes using a "mutable for".)
It's not entirely trivial to design an iteration over a sequence with well-defined (and efficient) behavior when the sequence is modified. In your case, where the sequence is merely being depleted, one reasonable thing to do is to use a list but "delete" elements by replacing them with a special value. (This makes it easy to preserve the current iteration position and avoids the cost of shifting the subsequent elements.)
To make it efficient to skip the deleted elements (both for the outer iteration and any inner iterations like in your example), the special value should be (or contain) a count of any following deleted elements. Note that there is a special case of deleting the current element, where for maximum efficiency you must move the cursor while you still know how far to move.
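A simplified sketch of the tombstone idea (using a bare sentinel; the run-length counting described above is left out for brevity):

DELETED = object()  # sentinel marking a logically deleted slot

def delete(seq, i):
    seq[i] = DELETED  # O(1): no shifting of the subsequent elements

def live_items(seq):
    for item in seq:
        if item is not DELETED:
            yield item

data = [0, 1, 2, 3, 4]
delete(data, 1)
delete(data, 3)
print(list(live_items(data)))  # [0, 2, 4]

Storing a count of following deleted elements in the sentinel, as described above, would let iteration skip a long run of deletions in one step instead of one slot at a time.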
I am wondering about the time complexity of remove for a list versus a set.
My understanding so far is:
Removal from a list is O(n)
Removal from a set is O(1)
I have read some discussions, but never saw it proven. If anyone could shed some light on this, that would be great. In particular, how does a set achieve O(1) removal?
Using Python 2.7.
a = set([1,2,3,4,5])
b = [1,2,3,4,5]
a.remove(3)
b.remove(3)
print a
print b
From the docs:
list.remove(x)
Remove the first item from the list whose value is x.
It is an error if there is no such item.
Without going into the details of the implementation: the item to remove can be anywhere in the list, so a linear scan is necessary to find it before it can be removed. Once you find the index of the item, you need to shift all the subsequent elements down by one index. In any case there are index steps of traversal and size - index steps of shifting involved, so the removal time is equivalent to traversing the entire list: O(n).
You can find the source here: https://hg.python.org/cpython/file/tip/Objects/listobject.c#l2197 (also look for list_ass_slice(..)).
However, a set is different. It uses the hash value of the object being stored to locate it in its buckets. On average, locating an object by its hash value takes nearly constant time. Note that it might not be constant time when there are hash collisions and a further search is required. But assuming a good hash function, it usually is.
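A rough way to see the difference empirically (a quick-and-dirty benchmark sketch, not a rigorous one):

import timeit

n = 20000

# Removing every element from a list: each remove() scans and shifts, O(n)
list_time = timeit.timeit(
    "for i in range(n): data.remove(i)",
    setup="n = %d; data = list(range(n))" % n,
    number=1)

# Removing every element from a set: each remove() is O(1) on average
set_time = timeit.timeit(
    "for i in range(n): data.remove(i)",
    setup="n = %d; data = set(range(n))" % n,
    number=1)

print(list_time, set_time)  # the list version is dramatically slower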
UPDATE: I must thank Stefan Pochmann for pointing out the mistake.
Assume that I have two lists a and b, both of size n, and I want to perform the following slice-assignment operation with k < n:
a[:k] = b[:k]
In the Python wiki's Time Complexity page it says that the complexity of slice setting is O(n+k) where k is the length of the slice. I just cannot understand why it is not just O(k) in the above situation.
I know that slicing returns a new list, so that is O(k), and I know that a list holds its data contiguously, so inserting an item in the middle takes O(n) time. But the above operation can easily be done in O(k) time. Am I missing something?
Furthermore, is there a documentation where I can find detailed information about such issues? Should I look into the CPython implementation?
Thanks.
O(n+k) is the average case, which includes having to grow or shrink the list to adjust for the number of elements inserted to replace the original slice.
In your case, where you replace the slice with an equal number of new elements, the implementation only takes O(k) steps. But across all possible combinations of elements inserted and deleted, the average case has to move the n remaining elements in the list up or down.
See the list_ass_slice function for the exact implementation.
You're right, if you want to know the exact details it's best to use the source. The CPython implementation of setting a slice is in listobject.c.
If I read it correctly, it will...
Count how many new elements you're inserting (or deleting!)
Shift the n existing elements of the list over enough places to make room for the new elements, taking O(n) time in the worst case (when every element of the list has to be shifted).
Copy over the new elements into the space that was just created, taking O(k) time.
That adds up to O(n+k).
Of course, your case is probably not that worst case: you're replacing the first k elements with exactly k new elements, so no shifting is needed, reducing the complexity to the O(k) you expected. However, that is not true in general.
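To illustrate (sizes chosen arbitrarily):

a = list(range(10))
b = list(range(100, 110))
k = 4

# Same number of elements on both sides: the k slots are overwritten
# in place and the tail is untouched.
a[:k] = b[:k]
print(a)  # [100, 101, 102, 103, 4, 5, 6, 7, 8, 9]

# More elements on the right-hand side: the n-k tail elements must be
# shifted over to make room, which is where the O(n) part of O(n+k)
# comes from.
a[:k] = b[:k + 2]
print(a)  # [100, 101, 102, 103, 104, 105, 4, 5, 6, 7, 8, 9]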
I am trying to move the even numbers in an array to the front and the odd numbers to the back. The problem asks me to do this in place, with a linear algorithm.
I came up with this:
def sort(a):
    for i in range(0, len(a)-1):
        if a[i] % 2 == 0:
            a.insert(0, a.pop(i))
    return a
The issue is that someone told me that, technically, a.insert is an O(n) function, so this would be considered a non-linear algorithm (when including the for i in range part of the function). Since the forum thread that asked this is a couple of months old, I couldn't ask for an explanation.
I believe he said "technically" because, since this inserts at the front, it does not have to check another N elements in the array, therefore making it run for practical purposes at O(n) and not O(n^2). Is this a correct assessment?
Also, someone on the forum used a.append to modify the above and changed it to look for odd numbers. No one replied, so I was wondering: is a.append not an O(n) function, since it moves the item to the end? Is it O(1)?
Thanks for explanations and clarifications!
insert at the 0th index of a list requires shifting every other element along, which makes it an O(n) operation. If you use a deque, however, this operation is O(1).
append is an amortized O(1) operation since it simply requires adding the item on to the end of the list and no shifting is done. Sometimes the list needs to grow so it is not always an O(1) operation.
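You can watch the over-allocation happen (CPython-specific; the exact sizes vary between versions):

import sys

lst = []
for i in range(20):
    lst.append(i)
    print(len(lst), sys.getsizeof(lst))  # the byte size jumps only
                                         # occasionally: the list grows in
                                         # chunks so most appends are O(1)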
That is correct - insertion at the front of a Python standard list is O(n). Python lists are implemented as arrays, and thus inserting something at the front of the list requires shifting the entire contents over one spot. Appending, on the other hand, does not require any shifting, and thus is amortized O(1).
Note, however, that a.pop(i) is also an O(n) operation, because it requires shifting everything after the popped item over one spot. Thus, simply modifying your code to use append() instead of insert() would still not result in a linear algorithm.
A linear-time algorithm wouldn't use pop() but instead would do something like swap elements around so that the rest of the list doesn't have to be modified. For example, consider this:
def even_to_front(a_list):
    next_even = 0
    for idx in range(len(a_list)):
        if not a_list[idx] % 2:
            # a_list[idx] is even, so swap it towards the front
            a_list[idx], a_list[next_even] = a_list[next_even], a_list[idx]
            next_even += 1
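For example:

nums = [3, 8, 5, 2, 7, 6]
even_to_front(nums)
print(nums)  # [8, 2, 6, 3, 7, 5] -- evens first, keeping their relative
             # order; the odds may end up reordered

Each element is examined once and each swap is O(1), so the whole pass is O(n).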
Check this table of complexity:
Insert - O(n)
Append - O(1) (lists are over-allocated)
Here's how it can be done without append/insert or a deque:
def sort(L):
    i, j = 0, len(L)-1
    while i < j:
        # point i to the next odd number from the start
        while i < j and not L[i] % 2: i += 1
        # point j to the next even number from the end
        while i < j and L[j] % 2: j -= 1
        L[i], L[j] = L[j], L[i]
Every time you pop element from a list, you have to copy the trailing portion of the list to move it over one index to fill the hole left by the removed element. This is linear in the distance between the popped element and the tail of the list.
Every time you insert an element into a list, you have to copy the trailing portion of the list to move it over one index to create a spot to insert the new element. This is linear in the distance between the position into which you're inserting the element and the tail of the list.
If you use collections.deque, you can append and pop at both the front and the back in O(1) time. However, removing an element from the middle will still be linear (and I think you'd have to write it yourself).
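A quick illustration of the cheap end operations:

from collections import deque

d = deque([0, 1, 2])
d.appendleft(-1)  # O(1) at the front
d.popleft()       # O(1) at the front, returns -1
d.pop()           # O(1) at the back, returns 2
print(d)          # deque([0, 1])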