Is there a better way to remove the last N elements of a list?
for i in range(0, n):
    lst.pop()
Works for n >= 1
>>> L = [1, 2, 3, 4, 5]
>>> n = 2
>>> del L[-n:]
>>> L
[1, 2, 3]
If you wish to remove the last n elements, in other words, keep the first len(lst) - n elements:
lst = lst[:len(lst)-n]
Note: This is not an in-place operation; it creates a shallow copy of the kept portion and rebinds lst to it.
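For instance, a quick check (the names here are just for illustration) shows that other references to the original list are unaffected:

>>> lst = [1, 2, 3, 4, 5]
>>> alias = lst
>>> n = 2
>>> lst = lst[:len(lst)-n]
>>> lst
[1, 2, 3]
>>> alias
[1, 2, 3, 4, 5]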
As Vincenzooo correctly says, the pythonic lst[:-n] does not work when n==0.
The following works for all n>=0:
lst = lst[:-n or None]
I like this solution because it is kind of readable in English too: "return a slice omitting the last n elements or none (if none needs to be omitted)".
This solution works because of the following:
x or y evaluates to x when x is logically true (e.g., when it is not 0, "", False, None, ...) and to y otherwise. So -n or None is -n when n!=0 and None when n==0.
When slicing, None is equivalent to omitting the value, so lst[:None] is the same as lst[:] (see here).
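For example, checking both the n == 0 and n > 0 cases (a small illustration, not from the original answer):

>>> lst = [1, 2, 3, 4, 5]
>>> lst[:-0 or None]
[1, 2, 3, 4, 5]
>>> lst[:-2 or None]
[1, 2, 3]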
As noted by @swK, this solution creates a new list (but immediately discards the old one unless it's referenced elsewhere) rather than editing the original one. This is often not a problem in terms of performance, as creating a new list in one go is often faster than removing one element at a time (unless n << len(lst)). It is also often not a problem in terms of space, as usually the members of the list take more space than the list itself (unless it's a list of small objects like bytes or the list has many duplicated entries). Please also note that this solution is not exactly equivalent to the OP's: if the original list is referenced by other variables, this solution will not modify (shorten) those other references, unlike the OP's code.
A possible solution (in the same style as my original one) that works for n>=0 but: a) does not create a copy of the list; and b) also affects other references to the same list, could be the following:
lst[-n:n and None] = []
This is definitely not readable and should not be used. Actually, even my original solution requires too much understanding of the language to be quickly read and unambiguously understood by everyone. I wouldn't use either in any real code, and I think the best solution is the one by @wonder.mice: a[len(a)-n:] = [].
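For reference, a quick sanity check of that in-place variant for both cases (again, just an illustration):

>>> lst = [1, 2, 3, 4, 5]
>>> n = 2
>>> lst[-n:n and None] = []
>>> lst
[1, 2, 3]
>>> n = 0
>>> lst[-n:n and None] = []
>>> lst
[1, 2, 3]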
Just use del on a slice, like this:
del lst[-n:]
I see this was asked a long time ago, but none of the answers did it for me. What if we want to get a list without the last N elements, but keep the original one? You just do list[:-n]. If you need to handle cases where n may equal 0, you do list[:-n or None].
>>> a = [1,2,3,4,5,6,7]
>>> b = a[:-4]
>>> b
[1, 2, 3]
>>> a
[1, 2, 3, 4, 5, 6, 7]
As simple as that.
You should be using this:
a[len(a)-n:] = []
or this:
del a[len(a)-n:]
It's much faster, since it really removes items from the existing list. The alternative (a = a[:len(a)-1]) creates a new list object and is less efficient.
>>> timeit.timeit("a = a[:len(a)-1]\na.append(1)", setup="a=range(100)", number=10000000)
6.833014965057373
>>> timeit.timeit("a[len(a)-1:] = []\na.append(1)", setup="a=range(100)", number=10000000)
2.0737061500549316
>>> timeit.timeit("a[-1:] = []\na.append(1)", setup="a=range(100)", number=10000000)
1.507638931274414
>>> timeit.timeit("del a[-1:]\na.append(1)", setup="a=range(100)", number=10000000)
1.2029790878295898
If n > 0 you can use a[-n:] = [] or del a[-n:], which is even faster.
This is one of the cases in which being Pythonic doesn't work for me and can give hidden bugs or a mess.
None of the solutions above works for the case n=0.
Using l[:len(l)-n] works in the general case:
l = range(4)
for n in [2, 1, 0]:  # test values for numbers of points to cut
    print n, l[:len(l)-n]
This is useful for example inside a function to trim edges of a vector, where you want to leave the possibility not to cut anything.
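As a sketch of what such a helper could look like (the function name and parameters are hypothetical, purely for illustration):

>>> def trim_edges(vec, n_start=0, n_end=0):
...     return vec[n_start:len(vec) - n_end]
...
>>> trim_edges([0, 1, 2, 3, 4, 5], 1, 2)
[1, 2, 3]
>>> trim_edges([0, 1, 2, 3, 4, 5])
[0, 1, 2, 3, 4, 5]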
Related
I have multiple lists. I want to merge the elements sequentially one-by-one.
Example:
a = [1, 2, 3]
b = [4, 5, 6]
c = [7, 8, 9]
Result should be:
d = [1, 4, 7, 2, 5, 8, 3, 6, 9]
One way of doing it is:
d = []
for i, j, k in zip(a, b, c):
    d.extend([i, j, k])
Is this efficient? What is most efficient way here?
A one-liner could be
import itertools
list(itertools.chain.from_iterable(zip(a,b,c)))
A variant of your method is
d = []
for i, j, k in zip(a, b, c):
    d += [i, j, k]
Out of curiosity, I've just used timeit to compare your method, that variant, my one-liner, and also the one in Olvin's comment (let's call it the compound version); a sketch of the timing harness is shown after the results, and the verdict is:
yours: 1.06-1.08
my variant (with += instead of extend): 0.94-0.97
my one-liner: 1.10-1.12
Olvin's one-liner: 1.28-1.34
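For reference, here is the kind of timing harness that could produce such a comparison (the list contents and repeat count are assumptions for illustration, not the exact setup behind the numbers above):

import timeit
from itertools import chain

a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]

def with_extend():
    d = []
    for i, j, k in zip(a, b, c):
        d.extend([i, j, k])
    return d

def with_iadd():
    d = []
    for i, j, k in zip(a, b, c):
        d += [i, j, k]
    return d

def with_chain():
    return list(chain.from_iterable(zip(a, b, c)))

for f in (with_extend, with_iadd, with_chain):
    print(f.__name__, timeit.timeit(f, number=1000000))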
Sometimes, the nicest methods aren't the fastest.
Timings may change for longer lists, though.
The fact that += is faster than .extend is quite interesting (since .extend changes the list, while += builds a new one and then replaces the old one; instinct would say that rebuilding a list should be slower than extending it, but maybe memory management says otherwise).
But, well, so far, the fastest one is my second version (with +=), which, incidentally, is also the one I find the most boring, among all solutions seen here.
Edit
Since that ranking bothered me (itertools iterators are supposed to be faster, since they are a little bit less interpreted and a little bit more compiled), I've tried with longer lists. And then it is another story:
a=list(range(1000))
b=list(range(1000,2000))
c=list(range(2000,3000))
And then the timeit verdict (with 100 times fewer runs than before) is:
Your method: 1.91
My += variant: 1.59
My one-liner: 0.98
Olvin's one-liner: 1.88
So, at least, itertools does win in the long run (with big enough data).
The victory of += over .extend is confirmed (I don't really know the internals of memory management, but, coming from the C world, I would say that sometimes a fresh malloc and copy is faster than constantly reallocating. Maybe that's a naive view of what happens under the hood in Python's interpreter, but either way, += is faster than .extend for this usage in the long run).
Olvin's method is roughly equivalent to yours, which surprises me a little, because it is essentially the compound version of the same thing. I would have thought that, while building up a compound list, Python could skip some steps in the intermediary representation that it could not skip in your method, where all the versions of the list (the one with just [1, 4, 7], then the one with [1, 4, 7, 2, 5, 8], etc.) do exist in the interpreter. Maybe the 0.03 difference between Olvin's method and yours is because of that (it is not just noise; with this size, the timings are quite constant, and so is the 0.03 difference). But I would have thought the difference would be higher.
But well, even if the timing differences surprise me, the ranking of the methods makes more sense with big lists: itertools > += > [compound] > .extend.
a = [1, 2, 3]
b = [4, 5, 6]
c = [7, 8, 9]
flat = zip(a,b,c)
d = [x for tpl in flat for x in tpl]
This list comprehension is the same as:
flat_list = []
for sublist in flat:
    for item in sublist:
        flat_list.append(item)
When it comes to lists, we all know and love good old pop, which removes the last item from the list and returns it:
>>> x = range(3)
>>> last_element = x.pop()
>>> last_element
2
>>> x
[0, 1]
But suppose I'm using a one-dimensional numpy array to hold my items, because I'm doing a lot of elementwise computations. What then is the most efficient way for me to achieve a pop?
Of course I can do
>>> import numpy as np
>>> x = np.arange(3)
>>> last_element = x[-1]
>>> x = np.delete(x, -1) # Or x = x[:-1]
>>> last_element
2
>>> x
array([0, 1])
And, really, when it comes down to it, this is fine. But is there a one-liner for arrays I'm missing that removes the last item and returns it at the same time?
And I'm not asking for
>>> last_element, x = x[-1], x[:-1]
I'm not counting this as a one-liner, because it's two distinct assignments achieved by two distinct operations. Syntactic sugar is what puts it all on one line. It's a sugary way to do what I've already done above. (Ha, I was sure someone would rush to give this as the answer, and, indeed, someone has. This answer is the equivalent of my asking, "What's a faster way to get to the store than walking?" and someone answering, "Walk, but walk faster." Uh . . . thanks. I already know how to walk.)
There is no such one-liner for numpy (unless you write your own). numpy is meant to work on fixed-size objects (or objects that change size less frequently), so by that metric a regular old Python list is better for popping.
You are correct that element-wise operations are better with numpy. You're going to have to profile your code, see which performs better, and make a design decision.
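"Write your own" here would just amount to wrapping the two steps from the question in a function (a hypothetical helper, not part of numpy's API):

import numpy as np

def np_pop(arr):
    # Return the last element together with a view of the array without it.
    return arr[-1], arr[:-1]

last_element, x = np_pop(np.arange(3))   # last_element == 2, x == array([0, 1])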
The sort() and reverse() methods modify the list in place for economy of space when sorting or reversing a large list. To remind you that they operate by side effect, they don’t return the sorted or reversed list.
The above text can be found at http://docs.python.org/2/library/stdtypes.html#mutable-sequence-types
What does "modify the list in place for economy of space" mean?
Example:
x = ["happy", "sad"]
y = x.reverse()
would assign None to y. So then why does
x.reverse()
successfully reverse x?
What does "modify the list in place for economy of space" mean?
What this means is that it does not create a copy of the list.
why does x.reverse() successfully reverse x?
I don't understand this question. It does this because that's what it's designed to do (reverse x and return None).
Note that sort() and reverse() have counterparts that don't modify the original: sorted() returns a new sorted list, and reversed() returns an iterator over the items in reverse order:
In [7]: x = [1, 2, 3]
In [8]: y = list(reversed(x))
In [9]: x
Out[9]: [1, 2, 3]
In [10]: y
Out[10]: [3, 2, 1]
list.reverse reverses the list in-place and returns None, it's designed to do so, similar to append, remove, extend, insert, etc.
In [771]: x = ["happy", "sad"]
In [772]: x.reverse()
In [773]: x
Out[773]: ['sad', 'happy']
In almost any programming language there are two kinds of callable definitions: functions and procedures.
Functions are methods or subroutines that do some processing and return a value. For instance, a function to sum two integers a and b will return the result of the sum a + b.
On the other hand, procedures are pretty much the same (even syntactically, in most languages) with the slight difference that they don't return a value; they are used for their side effects, meaning that they change the state of something, or process some data and save it, or print it out to a file.
So in this case, reverse would act as a procedure, which changes the state of the list being reversed (by reversing it), and here is how this can be done without extra space:
def reverse(l):
    for i in range(len(l) // 2):  # integer division, so this works on both Python 2 and 3
        l[i], l[-i-1] = l[-i-1], l[i]
Notice that a new list is never created; instead, the code interchanges the elements of the list l in place, swapping the first and the last, the second and the penultimate, and so on until it reaches the middle of the list.
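A quick check of the procedure in action (just an illustration):

>>> l = [1, 2, 3, 4]
>>> reverse(l)
>>> l
[4, 3, 2, 1]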
Hope this helps you understand ;)
I have the following problem while trying to do some nodal analysis.
For example:
my_list=[[1,2,3,1],[2,3,1,2],[3,2,1,3]]
I want to write a function that treats each element list inside my_list in the following way:
- The number of occurrences of a certain element inside a list is not important; as long as the unique elements of two lists are the same, the lists are identical.
- Find the identical loops based on the above premise, keep only the first one, and ignore the other identical lists of my_list while preserving the order.
Thus, in the above example the function should return just the first list, [1,2,3,1], because all the lists inside my_list are equal based on the above premises.
I wrote a function in python to do this but I think it can be shortened and I am not sure if this is an efficient way to do it. Here is my code:
def _remove_duplicate_loops(duplicate_loop):
    loops = []
    for i in range(len(duplicate_loop)):
        unique_el_list = []
        for j in range(len(duplicate_loop[i])):
            if duplicate_loop[i][j] not in unique_el_list:
                unique_el_list.append(duplicate_loop[i][j])
        loops.append(unique_el_list[:])
    loops_set = [set(x) for x in loops]
    unique_loop_dict = {}
    for k in range(len(loops_set)):
        if loops_set[k] not in list(unique_loop_dict.values()):
            unique_loop_dict[k] = loops_set[k]
    unique_loop_pos = list(unique_loop_dict.keys())
    unique_loops = []
    for l in range(len(unique_loop_pos)):
        # look up the original list at the recorded unique position
        unique_loops.append(duplicate_loop[unique_loop_pos[l]])
    return unique_loops
from collections import OrderedDict
my_list = [[1, 2, 3, 1], [2, 3, 1, 2], [3, 2, 1, 3]]
seen_combos = OrderedDict()
for sublist in my_list:
    unique_elements = frozenset(sublist)
    if unique_elements not in seen_combos:
        seen_combos[unique_elements] = sublist
my_list = seen_combos.values()
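With the example input, all three sublists reduce to the same frozenset, so only the first one survives:

>>> list(my_list)
[[1, 2, 3, 1]]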
You could do it in a fairly straightforward way using dictionaries, but you'll need to use frozenset instead of set, as sets are mutable and therefore not hashable.
from collections import OrderedDict

def _remove_duplicate_lists(duplicate_loop):
    dupdict = OrderedDict((frozenset(x), x) for x in reversed(duplicate_loop))
    return reversed(dupdict.values())
should do it. Note the double reversed(): normally the last duplicate is the one that is preserved, whereas you want the first, and the two reversals accomplish that.
edit: correction, yes, per Steven's answer, it must be an OrderedDict(), or the values returned will not be correct. His version might be slightly faster too.
edit again: You need an ordered dict if the order of the lists is important. Say your list is
[[1,2,3,4], [4,3,2,1], [5,6,7,8]]
The ordered dict version will ALWAYS return
[[1,2,3,4], [5,6,7,8]]
However, the regular dict version may return the above, or may return
[[5,6,7,8], [1,2,3,4]]
If you don't care, a non-ordered dict version may be faster/use less memory.
# I have 3 lists:
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]
# I want to create another that is L1 minus L2's members and L3's members, so:
L4 = (L1 - L2) - L3 # Of course this isn't going to work
I'm wondering, what is the "correct" way to do this. I can do it many different ways, but the Zen of Python says there should be only one obvious way of doing each thing. I've never known what this was.
Here are some tries:
L4 = [ n for n in L1 if (n not in L2) and (n not in L3) ] # parens for clarity
tmpset = set( L2 + L3 )
L4 = [ n for n in L1 if n not in tmpset ]
Now that I have had a moment to think, I realize that the L2 + L3 thing creates a temporary list that immediately gets thrown away. So an even better way is:
tmpset = set(L2)
tmpset.update(L3)
L4 = [ n for n in L1 if n not in tmpset ]
Update: I see some extravagant claims being thrown around about performance, and I want to assert that my solution was already as fast as possible. Creating intermediate results, whether they be intermediate lists or intermediate iterators that then have to be called back into repeatedly, will always be slower than simply giving L2 and L3 to the set to iterate over directly, as I have done here.
$ python -m timeit \
-s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2)' \
'ts = set(L2); ts.update(L3); L4 = [ n for n in L1 if n not in ts ]'
10000 loops, best of 3: 39.7 usec per loop
All other alternatives (that I can think of) will necessarily be slower than this. Doing the loops ourselves, for example, rather than letting the set() constructor do them, adds expense:
$ python -m timeit \
-s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2)' \
'unwanted = frozenset(item for lst in (L2, L3) for item in lst); L4 = [ n for n in L1 if n not in unwanted ]'
10000 loops, best of 3: 46.4 usec per loop
Using iterators, with all of the state-saving and callbacks they involve, will obviously be even more expensive:
$ python -m timeit \
-s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2);from itertools import ifilterfalse, chain' \
'L4 = list(ifilterfalse(frozenset(chain(L2, L3)).__contains__, L1))'
10000 loops, best of 3: 47.1 usec per loop
So I believe that the answer I gave last night is still far and away (for values of "far and away" greater than around 5µsec, obviously) the best, unless the questioner will have duplicates in L1 and wants them removed once each for every time the duplicate appears in one of the other lists.
update::: post contains a reference to false allegations of inferior performance of sets compared to frozensets. I maintain that it's still sensible to use a frozenset in this instance, even though there's no need to hash the set itself, just because it's more correct semantically. Though, in practice, I might not bother typing the extra 6 characters. I'm not feeling motivated to go through and edit the post, so just be advised that the "allegations" link links to some incorrectly run tests. The gory details are hashed out in the comments. :::update
The second chunk of code posted by Brandon Craig Rhodes is quite good, but as he didn't respond to my suggestion about using a frozenset (well, not when I started writing this, anyway), I'm going to go ahead and post it myself.
The whole basis of the undertaking at hand is to check if each of a series of values (L1) are in another set of values; that set of values is the contents of L2 and L3. The use of the word "set" in that sentence is telling: even though L2 and L3 are lists, we don't really care about their list-like properties, like the order that their values are in or how many of each they contain. We just care about the set (there it is again) of values they collectively contain.
If that set of values is stored as a list, you have to go through the list elements one by one, checking each one. It's relatively time-consuming, and it's bad semantics: again, it's a "set" of values, not a list. So Python has these neat set types that hold a bunch of unique values, and can quickly tell you if some value is in them or not. This works in pretty much the same way that python's dict types work when you're looking up a key.
The difference between sets and frozensets is that sets are mutable, meaning that they can be modified after creation. Documentation on both types is here.
Since the set we need to create, the union of the values stored in L2 and L3, is not going to be modified once created, it's semantically appropriate to use an immutable data type. This also allegedly has some performance benefits. Well, it makes sense that it would have some advantage; otherwise, why would Python have frozenset as a builtin?
update...
Brandon has answered this question: the real advantage of frozen sets is that their immutability makes it possible for them to be hashable, allowing them to be dictionary keys or members of other sets.
I ran some informal timing tests comparing the speed for creation of and lookup on relatively large (3000-element) frozen and mutable sets; there wasn't much difference. This conflicts with the above link, but supports what Brandon says about them being identical but for the aspect of mutability.
...update
Now, because frozensets are immutable, they don't have an update method. Brandon used the set.update method to avoid creating and then discarding a temporary list en route to set creation; I'm going to take a different approach.
items = (item for lst in (L2, L3) for item in lst)
This generator expression makes items an iterator over, consecutively, the contents of L2 and L3. Not only that, but it does it without creating a whole list-full of intermediate objects. Using nested for expressions in generators is a bit confusing, but I manage to keep it sorted out by remembering that they nest in the same order that they would if you wrote actual for loops, e.g.
def get_items(lists):
for lst in lists:
for item in lst:
yield item
That generator function is equivalent to the generator expression that we assigned to items. Well, except that it's a parametrized function definition instead of a direct assignment to a variable.
Anyway, enough digression. The big deal with generators is that they don't actually do anything. Well, at least not right away: they just set up work to be done later, when the generator expression is iterated. This is formally referred to as being lazy. We're going to do that (well, I am, anyway) by passing items to the frozenset function, which iterates over it and returns a frosty cold frozenset.
unwanted = frozenset(items)
You could actually combine the last two lines, by putting the generator expression right inside the call to frozenset:
unwanted = frozenset(item for lst in (L2, L3) for item in lst)
This neat syntactical trick works as long as the iterator created by the generator expression is the only parameter to the function you're calling. Otherwise you have to write it in its usual separate set of parentheses, just like you were passing a tuple as an argument to the function.
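For instance (using sorted() purely as an arbitrary example of a call that takes a second argument):

# Sole argument: the generator expression needs no extra parentheses.
unwanted = frozenset(item for lst in (L2, L3) for item in lst)

# Another argument is present, so the generator expression needs its own parentheses.
in_order = sorted((item for lst in (L2, L3) for item in lst), reverse=True)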
Now we can build a new list in the same way that Brandon did, with a list comprehension. These use the same syntax as generator expressions, and do basically the same thing, except that they are eager instead of lazy (again, these are actual technical terms), so they get right to work iterating over the items and creating a list from them.
L4 = [item for item in L1 if item not in unwanted]
This is equivalent to passing a generator expression to list, e.g.
L4 = list(item for item in L1 if item not in unwanted)
but more idiomatic.
So this will create the list L4, containing the elements of L1 which weren't in either L2 or L3, maintaining the order that they were originally in and the number of them that there were.
If you just want to know which values are in L1 but not in L2 or L3, it's much easier: you just create that set:
L1_unique_values = set(L1) - unwanted
You can make a list out of it, as does st0le, but that might not really be what you want. If you really do want the set of values that are only found in L1, you might have a very good reason to keep that set as a set, or indeed a frozenset:
L1_unique_values = frozenset(L1) - unwanted
...Annnnd, now for something completely different:
from itertools import ifilterfalse, chain
L4 = list(ifilterfalse(frozenset(chain(L2, L3)).__contains__, L1))
Assuming your individual lists won't contain duplicates, use set difference:
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]
print(list(set(L1) - set(L2) - set(L3)))
This may be less pythonesque than the list-comprehension answer, but has a simpler look to it:
l1 = [ ... ]
l2 = [ ... ]
diff = list(l1)  # this copies the list
for element in l2:
    diff.remove(element)
The advantage here is that we preserve the order of the list, and if there are duplicate elements, we remove only one occurrence for each time it appears in l2.
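A small illustration of that behavior with duplicates (made-up values):

>>> l1 = [1, 2, 2, 3]
>>> l2 = [2]
>>> diff = list(l1)
>>> for element in l2:
...     diff.remove(element)
...
>>> diff
[1, 2, 3]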
Doing such operations on lists can hurt your program's performance quickly: each remove() has to scan the list and then shift the remaining elements around, which can be expensive if you have a very large list. So I would suggest this:
I am assuming your list has unique elements; otherwise you would need to maintain a list of duplicate values per key in your dict. Anyway, for the data you provided, here it is:
METHOD 1
d = dict()
for x in L1:
    d[x] = True

# Mark the values that also appear in L2 or L3
for x in L2:
    if x in d:
        d[x] = False
for x in L3:
    if x in d:
        d[x] = False
# Finally retrieve all keys with value as True.
final_list = [x for x in d if d[x]]
METHOD 2
If all that looks like too much code, then you could try using set. But this way your list will lose all duplicate elements.
final_set = set.difference(set(L1),set(L2),set(L3))
final_list = list(final_set)
I think intuited's answer is way too long for such a simple problem, and Python already has a standard-library function (itertools.chain) to chain two lists as a generator.
The procedure is as follows:
Use itertools.chain to chain L2 and L3 without creating a memory-consuming copy
Create a set from that (in this case, a frozenset will do because we don't change it after creation)
Use a list comprehension to filter out of L1 the elements that are also in L2 or L3. As set/frozenset lookup (x in someset) is O(1), this will be very fast.
And now the code:
L1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
L2 = [4, 7, 8]
L3 = [5, 2, 9]
from itertools import chain
tmp = frozenset(chain(L2, L3))
L4 = [x for x in L1 if x not in tmp] # [1, 3, 6]
This should be one of the fastest, simplest and least memory-consuming solution.