Below is a simple function to remove duplicates from a list while preserving order. I've tried it and it actually works, so the problem here is my understanding. It seems to me that the second time you run uniq.remove(item) for a given item, it should raise an error (a KeyError or ValueError, I think?) because that item has already been removed from the uniq set. Is this not the case?
def unique(seq):
uniq = set(seq)
return [item for item in seq if item in uniq and not uniq.remove(item)]
There's a check if item in uniq which gets executed before the item is removed. The and operator is nice in that it "short circuits". This means that if the condition on the left evaluates to False-like, then the condition on the right doesn't get evaluated -- We already know the expression can't be True-like.
set.remove is an in-place operation. This means that it does not return anything (well, it returns None); and bool(None) is False.
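A quick interactive-style check of that behaviour (any small set will do):

s = {1, 2, 3}
result = s.remove(2)   # removes 2 from the set in place
print(result)          # None
print(bool(result))    # False
print(s)               # {1, 3}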
So your list comprehension is effectively this:
answer = []
for item in seq:
if item in uniq and not uniq.remove(item):
answer.append(item)
and since python does short circuiting of conditionals (as others have pointed out), this is effectively:
answer = []
for item in seq:
if item in uniq:
if not uniq.remove(item):
answer.append(item)
Of course, since uniq.remove(item) returns None (the bool of which is False), not uniq.remove(item) is always true whenever it is evaluated -- so the nested form behaves exactly like the single combined condition.
The reason that the second condition exists is to remove item from uniq. This way, if/when you encounter item again (as a duplicate in seq), it will not be found in uniq because it was deleted from uniq the last time it was found there.
Now, keep in mind that this is fairly dangerous, as conditions that modify variables are considered bad style (imagine debugging such a conditional when you aren't fully familiar with what it does). Conditionals should really only read the variables they check, not write to them as well.
Hope this helps
mgilson and others have answered this question nicely, as usual. I thought I might point out what is probably the canonical way of doing this in Python, namely using the unique_everseen recipe from the recipes section of the itertools docs, quoted below:
from itertools import ifilterfalse
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
seen = set()
seen_add = seen.add
if key is None:
for element in ifilterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
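A side note: the recipe above is written for Python 2 (ifilterfalse no longer exists in Python 3). On Python 3 the same recipe works with itertools.filterfalse; here is a sketch of the adapted version, using the examples from the recipe's comments as a usage check:

from itertools import filterfalse   # Python 3 spelling of ifilterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

print(list(unique_everseen('AAAABBBCCDAABBB')))      # ['A', 'B', 'C', 'D']
print(list(unique_everseen('ABBCcAD', str.lower)))   # ['A', 'B', 'C', 'D']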
def unique_with_order(seq):
final = []
for item in seq:
if item not in final:
final.append(item)
return final
print unique_with_order([1,2,3,3,4,3,6])
Break it down, make it simple :) Not everything has to be a list comprehension these days.
@mgilson's answer is the right one, but here, for your information, is a possible lazy (generator) version of the same function. This means it'll work for iterables that don't fit in memory - including infinite iterators - as long as the set of their distinct elements does.
def unique(iterable):
uniq = set()
for item in iterable:
if item not in uniq:
uniq.add(item)
yield item
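For example, here is a small usage sketch (itertools.count stands in for an infinite iterator; the generator expression is made up for illustration):

from itertools import count, islice

# using the unique() generator defined above

print(list(unique([1, 2, 3, 3, 4, 3, 6])))        # [1, 2, 3, 4, 6]

squares_mod_7 = (n * n % 7 for n in count())      # an infinite iterator
print(list(islice(unique(squares_mod_7), 4)))     # [0, 1, 4, 2]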
If you build uniq once and then run the list comprehension, the first run gives you the unique items (e.g. [1,2,3,4]) and, as a side effect, empties the set uniq. Running the same comprehension again with that now-empty set gives you [], because nothing is found in uniq any more. (Calling the unique() function again is different: it rebuilds uniq from seq, so the function returns the same result on every call.) The reason you don't get any errors is that Python's and short circuits - it sees the first clause (item in uniq) is false and doesn't bother to run the second clause.
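To see that concretely, here is a small REPL-style sketch (the input list is made up for illustration):

seq = [1, 2, 3, 3, 4, 3, 6]
uniq = set(seq)

print([item for item in seq if item in uniq and not uniq.remove(item)])   # [1, 2, 3, 4, 6]
print(uniq)                                                               # set() -- emptied as a side effect
print([item for item in seq if item in uniq and not uniq.remove(item)])   # [] -- nothing left to find

# Calling unique(seq) again would rebuild the set, so the function itself
# returns the same result every time.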
Related
I was practicing programming in Python on LeetCode.
So this is the problem:
https://leetcode.com/problems/reverse-vowels-of-a-string/
And this is my answer:
def reverseVowels(s):
result = list(s)
v_str = 'aeiouAEIOU'
v_list = [item for item in s if item in v_str]
v_list.reverse()
v_index = 0
for i, item in enumerate(s):
if item in v_list:
result[i] = v_list[v_index]
v_index+=1
return ''.join(result)
The result: Time Limit Exceeded
And I found an extremely similar answer in the discussion:
def reverseVowels(s):
lst = list(s)
vowels_str = "aeiouAEIOU"
vowels_list = [item for item in lst if item in vowels_str]
vowels_list.reverse()
vowels_index = 0
for index, item in enumerate(lst):
if item in vowels_str:
lst[index] = vowels_list[vowels_index]
vowels_index += 1
return ''.join(lst)
The result: Accepted
This is so weird. These two pieces of code look exactly the same to me; the difference seems to be nothing but the variable names. I am curious why they produce such different results.
There are two lines that differ between the two versions.
First one is:
for index, item in enumerate(lst):
for i, item in enumerate(s):
In the first case it iterates through the list and in the second it iterates through the string. There may be some performance loss here but it is not the main problem.
if item in vowels_str:
if item in v_list:
This is where the running time goes. In the first case (working code) it looks for the character in the string made of vowels, which has constant length.
In the second case, it looks for the character in the list of all vowels contained in the string, which can be huge depending on the string given in the test.
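One way to fix the slow version, assuming the membership test really is the bottleneck, is to look characters up in a set of vowels (average O(1) per test) instead of in v_list; a sketch of the asker's code with only that change:

def reverseVowels(s):
    result = list(s)
    vowels = set('aeiouAEIOU')        # O(1) average membership tests
    v_list = [item for item in s if item in vowels]
    v_list.reverse()
    v_index = 0
    for i, item in enumerate(s):
        if item in vowels:            # was: item in v_list, an O(len(v_list)) scan each time
            result[i] = v_list[v_index]
            v_index += 1
    return ''.join(result)

print(reverseVowels('hello world'))   # 'hollo werld'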
In the first of these you are iterating over the string (s) directly, multiple times. In the second, after converting to a list, you are iterating over that list (lst).
The exact reason this causes a difference is an (admittedly large and probably important for correctness) implementation detail in the python interpreter.
See related question for more discussion: Why is it slower to iterate over a small string than a small list?
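If you want to see the string-vs-list iteration difference for yourself, here is a rough timeit sketch (the string and the repeat count are arbitrary, and the absolute numbers will vary by machine and Python version):

import timeit

setup = "s = 'abcdefghij' * 10; lst = list(s)"

print(timeit.timeit('for ch in s: pass', setup=setup, number=100000))
print(timeit.timeit('for ch in lst: pass', setup=setup, number=100000))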
In Python you can do list.pop(i), which removes and returns the element at index i, but is there a built-in function like list.remove(e) that removes and returns the first element equal to e?
Thanks
I mean, there is list.remove, yes.
>>> x = [1,2,3]
>>> x.remove(1)
>>> x
[2, 3]
I don't know why you need it to return the removed element, though. You've already passed it to list.remove, so you know what it is... I guess if you've overloaded __eq__ on the objects in the list so that it doesn't actually correspond to some reasonable notion of equality, you could have problems. But don't do that, because that would be terrible.
If you have done that terrible thing, it's not difficult to roll your own function that does this:
def remove_and_return(lst, item):
return lst.pop(lst.index(item))
Is there a builtin? No. Probably because if you already know the element you want to remove, then why bother returning it? [1]
The best you can do is get the index, and then pop it. Ultimately, this isn't such a big deal -- Chaining 2 O(n) algorithms is still O(n), so you still scale roughly the same ...
def extract(lst, item):
idx = lst.index(item)
return lst.pop(idx)
[1] Sure, there are pathological cases where the item returned might not be the item you already know... but they aren't important enough to warrant a new method which takes only 3 lines to write yourself :-)
Strictly speaking, you would need something like:
def remove(lst, e):
i = lst.index(e)
# error if e not in lst
a = lst[i]
lst.pop(i)
return a
This would make sense only if e == a is true but e is a is false, and you really need a instead of e.
In most cases, though, I would say that this suggests something suspicious in your code.
A short version would be :
a = lst.pop(lst.index(e))
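For what it's worth, here is a small made-up example of the e == a but e is not a situation, where returning the actual list element (rather than the value you searched with) matters:

lst = [1.0, 2.0, 3.0]
e = 1                        # compares equal to 1.0, but is a different object (and type)

a = lst.pop(lst.index(e))
print(a, type(a))            # 1.0 <class 'float'>
print(e == a, e is a)        # True False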
I want to do something different to the first item in a list. What is the most pythonic way of doing so?
for item in list:
# only if its the first item, do something
# otherwise do something else
A few choices, in descending order of Pythonicity:
for index, item in enumerate(lst): # note: don't use list
if not index: # or if index == 0:
# first item
else:
# other items
Or:
first = True
for item in lst:
if first:
first = False
# first item
else:
# other items
Or:
for index in range(len(lst)):
    item = lst[index]
    if not index:
        # first item
    else:
        # other items
You can create an iterator over the list using iter(), then call next() on it to get the first value, then loop over the remainder. I find this a quite elegant way to handle files where the first line is a header and the rest is data, e.g.:
list_iterator = iter(lst)
# consume the first item
first_item = next(list_iterator)
# now loop on the tail
for item in list_iterator:
print(item)
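Here is a self-contained sketch of the header/data pattern mentioned above; io.StringIO stands in for a real file, and the column names are made up:

import io

fake_file = io.StringIO('name,age\nalice,30\nbob,25\n')

lines = iter(fake_file)
header = next(lines).rstrip('\n')        # consume the header line first
print('columns:', header.split(','))     # columns: ['name', 'age']

for line in lines:                       # then loop over the remaining data lines
    name, age = line.rstrip('\n').split(',')
    print(name, age)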
do_something_with_first_item(lst[0])
for item in lst[1:]:
do_something_else(item)
Or:
is_first_item = True
for item in lst:
if is_first_item:
do_something_with_first_item(item)
is_first_item = False
else:
do_something_else(item)
Do not use list as a variable name, because this shadows the built-in function list().
The enumerate-based solution in jonrsharpe's answer is superior to this. You should probably use that instead.
Use a flag that you change after processing the first item. For example:
first = True
for item in my_list:
if first:
# Processing unique to the first item
first = False
else:
# Processing unique to other items
# Shared processing
You could also just process the first item:
first = my_list.pop(0)
# Process first
for item in my_list:
# Process item
# Add the first item back to the list at the beginning
my_list.insert(0, first)
jonrsharpe's first version using enumerate is clean and simple, and works with all iterables:
for index, item in enumerate(lst):
if not index:
do_something_with_first_item(item)
else:
do_something_else(item)
senshin's first solution using lst[0] and lst[1:] is even simpler, but only works with sequences:
do_something_with_first_item(lst[0])
for item in lst[1:]:
do_something_else(item)
You can get the best of both worlds by using iterators directly:
it = iter(lst)
do_something_with_first_item(next(it))
for item in it:
do_something_else(item)
But, despite being the best of both worlds, it's actually not quite as simple as either. This is only clearly worth doing if you know you have an iterator (so you can skip the first line), or you need one anyway to do itertools and genexpr and similar stuff with it (so you were already going to write the first line). Whether it's worth doing in other cases is more of a style issue.
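One caveat worth adding: next(it) raises StopIteration if the iterable is empty, so if that can happen you may want a guard. A minimal sketch, assuming None is never a legitimate item:

it = iter(lst)
first = next(it, None)            # None acts as an "empty input" sentinel here
if first is None:
    print('empty input')          # or whatever makes sense in your case
else:
    do_something_with_first_item(first)
    for item in it:
        do_something_else(item)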
The first print statement works when multiple lists are passed into the function. However, when passing in only a single list, I get the error "AttributeError: 'int' object has no attribute 'pop'".
This code is attempting to remove one item from the list to see if that popped item still exists in the remaining list.
def check_row(p):
for e in p:
while e:
x = e.pop()
if x in e:
return False
return True
print check_row([[8,2,3,4,5],
[2,3,1,5,6],
[4,0,2,3,1]])
print check_row([1,2,3,4,5])
Many thanks.
You are popping the item from the element, not the outer list. If your elements are not lists, then don't try to treat them as such.
You cannot, however, remove items from the outer list while at the same time looping over it, and expect the loop to not jump items.
If you want to see if an item occurs more than once in the list, compare the length of the set() of the list instead:
def check_row(row):
return len(row) == len(set(row))
This only works for hashable values, which nested lists are not, but at the very least does not alter the list in place, as your code would do.
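A couple of quick checks with made-up rows, using the set-based check_row above:

print(check_row([8, 2, 3, 4, 5]))   # True  -- all values distinct
print(check_row([2, 3, 2, 5, 6]))   # False -- 2 appears twice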
You could still use a list scan, but then at least use list.index() to limit the search to a start index beyond the current position:
def check_row(row):
for i, elem in enumerate(row):
try:
row.index(elem, i + 1)
return False # dupe found
except ValueError:
pass # no dupe found
return True
However, this assumes you wanted to only test the outer list for duplicates. Supporting a nested structure and flat structures in the same code without more details on what you expect to happen in each case is much more complicated.
In the case of a single (non-nested) list, you're calling .pop() on elements (e) which aren't lists and therefore presumably don't have a .pop method.
That's because e is an element of your list. In the nested one, e is a list, while in the second one, e is an integer. Therefore, e.pop is invalid for the second one.
You'll have to make it always nested:
>>> print(check_row([[1, 2, 3, 4, 5]]))
True
This way, the value passed to check_row is always a nested list, even if it has only one element.
But as far as checking whether the elements are still in the other lists goes, I would first flatten the list and then check whether there are duplicate elements in it.
from collections.abc import Iterable

def flatten(l):
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, str):
            for sub in flatten(el):
                yield sub
        else:
            yield el

def check_row(p):
    flat = list(flatten(p))
    return len(flat) == len(set(flat))
This way, check_row will always produce the result you wanted, regardless of whether it gets a flat list or a nested one :)
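A couple of made-up calls to illustrate (note that with this version duplicates are detected across the whole flattened structure, not just within one inner list):

# using flatten() and check_row() defined above
print(check_row([1, 2, 3, 4, 5]))     # True  -- flat list, no duplicates
print(check_row([[1, 2], [3, 4]]))    # True  -- nested, still no duplicates
print(check_row([[1, 2], [3, 2]]))    # False -- 2 appears in two inner lists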
Hope this helps!
You're confusing yourself with your naming. The function you called check_row actually checks a list of rows, despite the name, so passing it a single row fails. The fact that you're using meaningless one-letter names doesn't help either. Let's rewrite it more clearly:
def check_rows(rows):
for row in rows:
while row:
element = row.pop()
if element in row:
return False
return True
Now it should be clearer why it fails: You're passing it a single row as the rows, so for row in rows is getting the elements rather than the rows, and it all goes downhill from there.
What you probably want is a check_row function that works on a single row, and then a check_rows that calls check_row on each row:
def check_row(row):
while row:
element = row.pop()
if element in row:
return False
return True
def check_rows(rows):
for row in rows:
if not check_row(row):
return False
return True
But really, I don't know why you want this function at all. It destructively modifies the rows, removing every element up to the first duplicate. Why would you want that? For example, Martijn Pieters's solution is simpler and more efficient, as well as being non-destructive:
def check_row(row):
return len(set(row)) == len(row)
And while we're at it, let's use the all function instead of an explicit loop for check_rows:
def check_rows(rows):
return all(check_row(row) for row in rows)
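Checked against the data from the question (a small usage sketch of the set-based check_row and the all-based check_rows):

rows = [[8, 2, 3, 4, 5],
        [2, 3, 1, 5, 6],
        [4, 0, 2, 3, 1]]

print(check_rows(rows))             # True  -- no row contains a duplicate
print(check_row([1, 2, 3, 4, 5]))   # True  -- a single row can be checked directly
print(check_row([1, 2, 2, 4]))      # False -- made-up row with a duplicate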
Simply put: there is this list, say LST = [[12,1],[23,2],[16,3],[12,4],[14,5]], and I want to get all the minimum elements of this list according to the first element of each inner list. So for the above example the answer would be [12,1] and [12,4]. Is there any typical way in Python of doing this?
Thanking you in advance.
Two passes:
minval = min(LST)[0]
return [x for x in LST if x[0] == minval]
One pass:
def all_minima(iterable, key=None):
    if key is None:
        key = lambda entry: entry   # identity: compare the entries themselves
    hasminvalue = False
    minvalue = None
    minlist = []
    for entry in iterable:
        value = key(entry)
        if not hasminvalue or value < minvalue:
            minvalue = value
            hasminvalue = True
            minlist = [entry]
        elif value == minvalue:
            minlist.append(entry)
    return minlist
from operator import itemgetter
return all_minima(LST, key=itemgetter(0))
A compact single-pass solution requires sorting the list -- that's technically O(N log N) for an N-long list, but Python's sort is so good, and so many sequences "just happen" to have some embedded order in them (which timsort cleverly exploits to go faster), that sorting-based solutions sometimes have surprisingly good performance in the real world.
Here's a solution requiring Python 2.6 or better:
import itertools
import operator
f = operator.itemgetter(0)
def minima(lol):
return list(next(itertools.groupby(sorted(lol, key=f), key=f))[1])
To understand this approach, looking "from the inside, outwards" helps.
f, i.e., operator.itemgetter(0), is a key-function that picks the first item of its argument for ordering purposes -- the very purpose of operator.itemgetter is to easily and compactly build such functions.
sorted(lol, key=f) therefore returns a sorted copy of the list-of-lists lol, ordered by increasing first item. If you omit the key=f the sorted copy will be ordered lexicographically, so it will also be in order of increasing first item, but that acts only as the "primary key" -- items with the same first sub-item will in turn be sorted among them by the values of their second sub-items, and so forth -- while with the key=f you're guaranteed to preserve the original order among items with the same first sub-item. You don't specify which behavior you require (and in your example the two behaviors happen to produce the same result, so we cannot distinguish from that example) which is why I'm carefully detailing both possibilities so you can choose.
itertools.groupby(sorted(lol, key=f), key=f) performs the "grouping" task that is the heart of the operation: it yields groups from the sequence (in this case, the sequence sorted provides) based on the key ordering criteria. That is, a group with all adjacent items producing the same value among themselves when you call f with the item as an argument, then a group with all adjacent items producing a different value from the first group (but the same among themselves), and so forth. groupby respects the ordering of the sequence it takes as its argument, which is why we had to sort the lol first (and this behavior of groupby makes it very useful in many cases in which the sequence's ordering does matter).
Each result yielded by groupby is a pair k, g: a key k, which is the result of f(i) for each item in the group, and an iterator g, which yields each item in the group in sequence.
The next built-in (the only bit in this solution which requires Python 2.6), given an iterator, produces its next item -- in particular, the first item when called on a fresh, newly made iterator (and every generator is of course an iterator, as is groupby's result). In earlier Python versions you would have to write groupby(...).next() (since next was only a method of iterators, not a built-in), a spelling that has been deprecated since 2.6.
So, summarizing, the result of our next(...) is exactly the pair k, g where k is the minimum (i.e., first after sorting) value for the first sub-item, and g is an iterator for the group's items.
So, with that [1] we pick just the iterator, so we have an iterator yielding just the subitems we want.
Since we want a list, not an iterator (per your specs), the outermost list(...) call completes the job.
Is all of this worth it, performance-wise? Not on the tiny example list you give -- minima is actually slower than either snippet in @Kenny's answer (of which the first, "two-pass" solution is speedier). I just think it's worth keeping the ideas in mind for the next sequence-processing problem you may encounter, where the details of typical inputs may be quite different (longer lists, rarer minima, partial ordering in the input, &c, &c;-).
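For reference, a quick check of minima against the list from the question:

# using minima() (and the key-function f) as defined above
LST = [[12, 1], [23, 2], [16, 3], [12, 4], [14, 5]]
print(minima(LST))   # [[12, 1], [12, 4]]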
import operator

m = min(LST, key=operator.itemgetter(0))[0]
print [x for x in LST if x[0] == m]
minval = min(x[0] for x in LST)
result = [x for x in LST if x[0]==minval]