Python: Removing elements near the front of a list efficiently?

Is there a way to remove elements from the start of a long list of numbers? Right now I am doing del arr[i:i+x], but it is slow, since everything past that point has to be moved to the left, which is time-consuming for large lists.
I looked into deques, but I'm not sure whether they apply here. Could use some direction!

Yes, deques do apply here, and you should use them: deletions will be very fast when the slice is near the front, but slower when the start index falls towards the middle. As the documentation puts it:
Indexed access is O(1) at both ends but slows to O(n) in the middle.
>>> from collections import deque
>>> def delete_slice(d, start, stop):
...     d.rotate(-start)
...     for i in range(stop - start):  # use xrange on Python 2
...         d.popleft()
...     d.rotate(start)
...
>>> d = deque(range(15))
>>> delete_slice(d, 5, 10)
>>> d
deque([0, 1, 2, 3, 4, 10, 11, 12, 13, 14])
Note: rotating past the middle, as previously stated, will be slow. If you also want to support fast deletions near the right end, you can extend the code like so:
def delete_slice(d, start, stop):
    stop = min(stop, len(d))   # don't go past the end
    start = min(start, stop)   # don't go past stop
    if start < len(d) // 2:
        # the slice is closer to the front: work from the left end
        d.rotate(-start)
        for i in range(stop - start):  # use xrange on Python 2
            d.popleft()
        d.rotate(start)
    else:
        # the slice is closer to the back: work from the right end
        n = len(d) - stop
        d.rotate(n)
        for i in range(stop - start):
            d.pop()
        d.rotate(-n)
Of course there is some other error checking you will need to do, but I'll leave that out for simplicity's sake. Unfortunately, deque does not provide these operations itself, so you have to implement them like this.
To implement deque slicing, use a similar approach applying rotate() to bring a target element to the left side of the deque. Remove old entries with popleft(), add new entries with extend(), and then reverse the rotation. With minor variations on that approach, it is easy to implement Forth style stack manipulations such as dup, drop, swap, over, pick, rot, and roll.
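For example, here is a minimal sketch of slice assignment along those lines (set_slice is a hypothetical helper name; note that to insert at the left end it uses extendleft(reversed(...)) rather than extend()):
from collections import deque

def set_slice(d, start, stop, new_items):
    # Bring the target region to the left end of the deque.
    d.rotate(-start)
    # Remove the old entries.
    for _ in range(stop - start):
        d.popleft()
    # Add the new entries at the left end, preserving their order.
    d.extendleft(reversed(new_items))
    # Reverse the rotation.
    d.rotate(start)

d = deque(range(10))
set_slice(d, 2, 5, ['a', 'b'])
print(d)  # deque([0, 1, 'a', 'b', 5, 6, 7, 8, 9])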

Yes, deque applies here. I have prepared an example which shows how to use it:
import collections

# create a deque from a list
d = collections.deque([1, 2, 3, 4, 5, 6])
# remove the first element
d.popleft()
print(d)
Output:
deque([2, 3, 4, 5, 6])

I'm thinking you probably want a tree or skiplist.
I did a study of Python tree implementations a while back:
http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/
You might be better off asking about this in the algorithms section of the site.

If you're doing several deletions in a row, it might be more efficient to create a new list with a filtering comprehension:
arr = [e for e in arr if not rejected(e)]
If you need to work with indexes, you can use enumerate:
arr = [e for i, e in enumerate(arr) if not rejected(i)]
Both operations are O(n) in time (and use O(n) extra space for the new list), while performing m separate deletions in a row is O(n*m) in time (but O(n) in space, since it works in place).
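As an illustrative micro-benchmark (a sketch, not part of the original answer; rejected here is a stand-in predicate):
import timeit

setup = "arr = list(range(10000)); rejected = lambda e: e % 3 == 0"

# One pass with a comprehension: O(n) time.
one_pass = "arr2 = [e for e in arr if not rejected(e)]"

# Repeated in-place deletions: O(n*m), since each del shifts the tail left.
many_dels = '''
arr2 = list(arr)
i = 0
while i < len(arr2):
    if rejected(arr2[i]):
        del arr2[i]
    else:
        i += 1
'''

print(timeit.timeit(one_pass, setup, number=100))
print(timeit.timeit(many_dels, setup, number=100))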
deque has this characteristic, which may not be what you want:
Indexed access is O(1) at both ends but slows to O(n) in the middle.

Lists are not optimized for appending or popping at the front; deques are.
Note also that deque.remove(item) in the CPython implementation searches from the front. If you need to move or remove queue items frequently, consider whether most items to be removed will be near the end; if so, it may pay to use the queue flipped, i.e. appendleft() and pop() instead of append() and popleft().
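A minimal sketch of the flipped orientation (illustrative values only):
from collections import deque

# Normal orientation: produce on the right, consume from the left.
q = deque()
q.append('job1')
q.append('job2')
assert q.popleft() == 'job1'

# Flipped orientation: produce on the left, consume from the right.
# Handy when deque.remove() targets are usually near the newest items,
# because remove() searches from the left.
fq = deque()
fq.appendleft('job1')
fq.appendleft('job2')
assert fq.pop() == 'job1'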

Related

Which is better: deque or list slicing?

If I use the code
from collections import deque
q = deque(maxlen=2)
while step <= step_max:
    calculate(item)
    q.append(item)
    another_calculation(q)
how does it compare in efficiency and readability to
q = []
while step <= step_max:
    calculate(item)
    q.append(item)
    q = q[-2:]
    another_calculation(q)
calculate() and another_calculation() are not real in this case but in my actual program are simply two calculations. I'm doing these calculations every step for millions of steps (I'm simulating an ion in 2-d space). Because there are so many steps, q gets very long and uses a lot of memory, while another_calculation() only uses the last two values of q. I had been using the latter method, then heard deque mentioned and thought it might be more efficient; thus the question.
I.e., how do deques in Python compare to just normal list slicing?
q = q[-2:]
Now this is a costly operation because it recreates a list every time (and copies the references). (A nasty side effect is that it also rebinds the name q, though you can use q[:] = q[-2:] to avoid that.)
The deque object just advances its internal start pointer and "forgets" the oldest item. So it's faster, and this is one of the use cases it was designed for.
Of course, for 2 values there isn't much difference, but for a bigger number there is.
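A quick illustration of the maxlen behaviour (a sketch):
from collections import deque

q = deque(maxlen=2)
for item in range(5):
    q.append(item)  # once full, the oldest item is discarded automatically
print(q)            # deque([3, 4], maxlen=2)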
If I interpret your question correctly, you have a function that calculates a value, and you want to do another calculation with this value and the previous one. The best way is to use two variables:
previous_item = initial_value  # seed the previous value before the loop
while step <= step_max:
    item = calculate()
    another_calculation(previous_item, item)
    previous_item = item
If the calculations are some form of vector math, you should consider using numpy.
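Purely as an illustrative sketch (the update rule below is a stand-in for the real calculate()):
import numpy as np

previous_item = np.zeros(2)  # (x, y) at the previous step
item = np.zeros(2)           # (x, y) at the current step
for _ in range(1000):
    previous_item, item = item, item + np.array([0.1, -0.05])  # stand-in for calculate()
    displacement = item - previous_item  # stand-in for another_calculation()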

How to peek front of deque without popping?

I want to check a condition against the front of a queue before deciding whether or not to pop. How can I achieve this in python with collections.deque?
list(my_deque)[0]
seems ugly and poor for performance.
TL;DR: assuming your deque is called d, just inspect d[0], since the "leftmost" element in a deque is the front (you may want to check that the deque is not empty first). Taking @asongtoruin's suggestion, use if d: to test whether the deque is non-empty (it's equivalent to if len(d) > 0:, but more pythonic).
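A minimal sketch of that pattern:
from collections import deque

d = deque([3, 1, 4])
if d:              # true iff the deque is non-empty
    front = d[0]   # peek at the front without popping
    print(front)   # 3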
Why not convert to a list?
Because deques are indexable and you're testing the front. While a deque has an interface similar to a list, the implementation is optimized for front- and back- operations. Quoting the documentation:
Deques support thread-safe, memory efficient appends and pops from
either side of the deque with approximately the same O(1) performance
in either direction.
Though list objects support similar operations, they are optimized for
fast fixed-length operations and incur O(n) memory movement costs for
pop(0) and insert(0, v) operations which change both the size and
position of the underlying data representation.
Converting to list might be desirable if you have lots of operations accessing the "middle" of the queue. Again quoting the documentation:
Indexed access is O(1) at both ends but slows to O(n) in the middle.
For fast random access, use lists instead.
Conversion to list is O(n), but every subsequent access is O(1).
You can simply find the last element using my_deque[-1] or my_deque[len(my_deque) - 1].
Here is a simple implementation that allowed me to check the front of the queue before popping (using while and q[0]). Apply your own condition against q[0], before q.popleft(), below:
from collections import deque

testLst = [100, 200, -100, 400, 340]
q = deque(testLst)
while q:
    print(q)
    print('{}{}'.format("length of queue: ", len(q)))
    print('{}{}'.format("head: ", q[0]))
    print()
    q.popleft()
output:
deque([100, 200, -100, 400, 340])
length of queue: 5
head: 100
deque([200, -100, 400, 340])
length of queue: 4
head: 200
deque([-100, 400, 340])
length of queue: 3
head: -100
deque([400, 340])
length of queue: 2
head: 400
deque([340])
length of queue: 1
head: 340
Assuming your deque comes from collections:
from collections import deque
d = deque()  # create an empty deque
A deque can be indexed like a list: you can peek at the front element with d[0] and at the last element with d[-1]. This works without popping elements from the left or right, and it is efficient too.

Explain the time and space complexity of two Python code snippets. Which one is best? (Non-subjective)

These code snippets compute the sum of the even integers in a list without using a loop statement. I would like to know the time complexity and space complexity of both. Which is best?
CODE 1:
class EvenSum:
    # Initialize the class
    def __init__(self):
        self.res = 0

    def sumEvenIntegers(self, integerlist):
        if integerlist:
            if not integerlist[0] % 2:
                self.res += integerlist[0]
                del integerlist[0]
                self.sumEvenIntegers(integerlist)
            else:
                del integerlist[0]
                self.sumEvenIntegers(integerlist)
        return self.res

# main method
if __name__ == "__main__":
    l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    even = EvenSum()
    print(even.sumEvenIntegers(l))
CODE 2:
import numpy as np

def sum_of_all_even_integers(list):
    list_sum = sum(list)
    # materialize the map so it can multiply a numpy array (needed on Python 3)
    bin_arr = np.fromiter(map(lambda x: x % 2, list), dtype=int)
    return list_sum - sum(list * bin_arr)

if __name__ == "__main__":
    list = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    print(sum_of_all_even_integers(list))
According to the Python wiki, deleting an item from a list takes linear time, proportional to the number of elements in the list. Since you delete every item in the list, and each deletion takes linear time, the overall runtime is proportional to the square of the number of items.
In your second code snippet, both sum as well as map take linear time, so the overall complexity is linear, proportional to the number of elements in the list.
What about the following?
import numpy as np
a = np.arange(20)
print(np.sum(a[a % 2 == 0]))
It seems to be much more lightweight compared to your two code snippets.
Small timings with np.arange(998), showing the result and the elapsed time in seconds:
Pure numpy:      248502   0.0
Class recursion: 248502   0.00399994850159
List/Numpy one:  248502   0.00200009346008
And with a 999-element array, your class fails outright, because the maximum recursion depth is reached.
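A quick way to see why (a sketch; the exact limit is implementation-dependent):
import sys

print(sys.getrecursionlimit())  # typically 1000 in CPython
# EvenSum().sumEvenIntegers(list(range(999)))  # depth ~ list length -> RecursionError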
The first code uses item deletion from a list and recursion, two things at which Python is not very good: deleting the first element takes O(n) time, since all the remaining elements are shifted left, and Python does not optimize recursive calls (to keep full traceback information, I think).
So I would go for the second code (which I think actually uses "for loops" under the hood; the loops are just hidden inside map and sum).
If you use numpy, you could actually do something like:
a = np.array([1,2,3,4,5,6,7,8,9,10])
np.sum(np.where((a+1)%2,a,0))
Or, as anki proposed:
np.sum( a[a%2 == 0] )
Which I think would be best since numpy is optimized for array manipulation.
By the way, never name an object list, as it shadows the built-in list constructor.
EDIT:
If you just want the sum of all even numbers in [0, n], you don't need a sum at all. There is a closed-form formula for that:
s = (n//2) * (n//2 + 1)
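A quick sanity check of the formula (a sketch):
n = 10
k = n // 2
assert k * (k + 1) == sum(i for i in range(n + 1) if i % 2 == 0)  # 30 == 2+4+6+8+10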
The first has O(N^2) time complexity and O(N) space complexity. The second has O(N) time complexity and space complexity.
The first uses one stack frame (one piece of stack memory of constant but quite large size) for each element in the array. In addition it executes the function once per element, and each call deletes the first element of the array, which is an O(N) operation.
Much of the second happens behind the scenes. The map function generates a new list of the same size as the original, and it calls a function for each element, which gives the complexity directly. The sum function likewise does one operation per element, while using no more than constant extra memory. Adding these up doesn't make the complexities worse: twice or thrice O(N) is still O(N).
The best is probably the latter, but then again, it depends on your preferences. Maybe you want to consume a lot of time and stack space? In that case the first would suit you much better.
Also note that the first solution modifies the input data, so the two don't do the same thing: after calling the first, the list passed to the function will be empty (which may or may not be a bad thing).

Python: Trim heapq heap so it is only X items long

What would be the fastest way to get the top X items of a heap, as a heap still?
I would figure there is a better way than rebuilding a heap by popping the heap X times.
@Ben is right on all counts, although Python's heapq heaps are min-heaps rather than max-heaps:
newheap = [heappop(oldheap) for _ in range(X)] # removes from oldheap
is usually as good as it gets. However, it can be faster, and especially so if X is nearly as large as len(oldheap), to do this instead:
newheap = sorted(oldheap)[:X] # doesn't change oldheap
At least in CPython, the sort method can take advantage of the partial order already in oldheap, and complete sorting the whole list faster than heappop() can extract the smallest X elements (sorting can need fewer comparisons overall, and the comparisons are the costliest part). The extreme in this vein is when X == len(oldheap) and oldheap already happens to be in sorted order. Then sorting requires a grand total of X-1 comparisons, while repeatedly popping requires on the order of X*log(X) comparisons.
In terms of asymptotic complexity, that's actually the best you can do. You know the front item is the minimal element, and the runner-up is one of its children. But the other child of the root node might rank only 100th, with the 98 elements between them all in the other half of the tree.
Of course, once you've pulled off your X items, you don't need to re-heapify them -- they'll already be sorted, and hence a well-formed binary heap of their own.
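A small runnable sketch of both approaches (illustrative values):
import heapq

oldheap = [9, 4, 7, 1, 0, 8, 5]
heapq.heapify(oldheap)  # min-heap
X = 3

# Approach 1: pop X times (mutates its input, so work on a copy here).
h = list(oldheap)
popped = [heapq.heappop(h) for _ in range(X)]

# Approach 2: sort and slice (leaves the original heap untouched).
sliced = sorted(oldheap)[:X]

print(popped)  # [0, 1, 4]
print(sliced)  # [0, 1, 4]
# Both results are sorted, and a sorted list is itself a valid heap.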

Is there a non-recursive way of separating each list elements into their own list?

I was looking at Wikipedia's pseudo-code (and other webpages like sortvis.org and sorting-algorithm.com) on merge sort and saw the preparation of a merge uses recursion.
I was curious to see if there is a non-recursive way to do it.
Perhaps something like a for each i element in list, i=[i-th element].
I am under the impression that recursion is something to keep to a minimum because it's undesirable, hence this question.
The following is a pseudo-code sample of the recursive part of the merge-sort from Wikipedia:
function merge_sort(list m)
    // if list size is 1, consider it sorted and return it
    if length(m) <= 1
        return m
    // else list size is > 1, so split the list into two sublists
    var list left, right
    var integer middle = length(m) / 2
    for each x in m up to middle
        add x to left
    for each x in m after or equal middle
        add x to right
    // recursively call merge_sort() to further split each sublist
    // until sublist size is 1
    left = merge_sort(left)
    right = merge_sort(right)
Bottom-up merge sort is a non-recursive variant of merge sort.
See the Wikipedia page on merge sort for a more detailed pseudocode implementation.
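A minimal Python sketch of the bottom-up approach, merging runs of width 1, 2, 4, ...:
def merge(left, right):
    # Standard two-way merge of two sorted lists.
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def merge_sort_bottom_up(lst):
    # Iterative merge sort: no recursion, just a loop over run widths.
    items = list(lst)  # work on a copy
    n = len(items)
    width = 1
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            items[lo:hi] = merge(items[lo:mid], items[mid:hi])
        width *= 2
    return items

print(merge_sort_bottom_up([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]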
middle = len(lst) // 2  # use integer division
left = lst[:middle]
right = lst[middle:]
List slicing works fine.
As an aside - recursion is not undesirable per se.
Recursion is undesirable if you have limited stack space (are you afraid of stackoverflow? ;-) ), or in some cases where the time overhead of function calls is of great concern.
For much of the time these conditions do not hold; readability and maintainability of your code will be more relevant. Algorithms like merge sort make more sense when expressed recursively in my opinion.
