better for-loop syntax for detecting empty sequences? - python

Is there a better way to write the following:
row_counter = 0
for item in iterable_sequence:
    # do stuff with the item
    row_counter += 1
if not row_counter:
    pass  # handle the empty-sequence case
Please keep in mind that I can't use len(iterable_sequence) because 1) not all sequences have known lengths; 2) in some cases calling len() may trigger loading of the sequence's items into memory (as would be the case with SQL query results).
The reason I ask is that I'm simply curious whether there is a way to make the above more concise and idiomatic. What I'm looking for is along the lines of:
for item in sequence:
    # process item
*else*:
    # handle the empty sequence case
(assuming "else" here worked only on empty sequences, which I know it doesn't)

for item in iterable:
    break
else:
    pass  # handle the empty-sequence case here
Or:
sentinel = object()
item = next(iterator, sentinel)
if item is sentinel:
    pass  # handle the empty-sequence case here
In each case one item is consumed if it is present.
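To see the consumption concretely, here is a small sketch that wraps both idioms in functions. The function names and the module-level sentinel are illustrative, not part of the answer. Both functions expect an iterator, such as the result of iter(); next() does not accept a plain list.

```python
_SENTINEL = object()  # hypothetical sentinel; no real item can be identical to it


def is_empty_for_else(iterator):
    """Return True if iterator yields nothing; consumes one item otherwise."""
    for _ in iterator:
        break          # got an item: not empty
    else:
        return True    # loop finished without break: nothing to iterate
    return False


def is_empty_next(iterator):
    """Same check via next() with a default; also consumes one item."""
    return next(iterator, _SENTINEL) is _SENTINEL


it = iter([10, 20, 30])
print(is_empty_for_else(it))  # False
print(list(it))               # [20, 30]: 10 was consumed by the check
```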
An example of empty_adaptor()'s implementation, as mentioned in the comments:
def empty_adaptor(iterable, sentinel=object()):
    it = iter(iterable)
    item = next(it, sentinel)
    if item is sentinel:
        return None  # empty
    else:
        def gen():
            yield item
            for i in it:
                yield i
        return gen()
You could use it as follows:
it = empty_adaptor(some_iter)
if it is not None:
    for i in it:
        pass  # handle items
else:
    pass  # handle empty case
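The same peek-and-restore idea can also be written with itertools.chain, which glues the peeked item back onto the front of the remaining iterator instead of defining an inner generator. This is a sketch: split_empty is a hypothetical name, and returning an (is_empty, iterator) pair rather than None is a design choice of this sketch, not of the answer above.

```python
from itertools import chain


def split_empty(iterable):
    """Return (is_empty, iterator) without losing the first item."""
    it = iter(iterable)
    sentinel = object()
    first = next(it, sentinel)
    if first is sentinel:
        return True, iter(())          # nothing to iterate over
    return False, chain([first], it)   # first item re-attached, rest untouched


empty, items = split_empty(x * x for x in range(4))
if empty:
    print("no items")
else:
    for i in items:
        print(i)  # prints 0, 1, 4, 9 on separate lines
```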
Introducing a special case for an empty sequence in the general case seems wrong. There should be a better solution to the domain-specific problem.

It may be a job for itertools.tee
You "trigger" the sequence on the verification, but you are left with an untouched copy of the sequence afterwards:
from itertools import tee
check, sequence = tee(sequence, 2)
try:
    next(check)
except StopIteration:
    pass  # empty sequence
for item in sequence:
    pass  # do stuff
(It's worth noting that tee does the "right" thing here: it loads just the first element of the sequence at the moment next(check) is executed, and this first element remains available in sequence. The remaining items are only retrieved as part of the for loop.)
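A sketch of the tee-based check wrapped in a function (tee_peek and _MISSING are hypothetical names). It uses next() with a sentinel default instead of try/except, so a leading None or other falsy item is not mistaken for emptiness. One caveat: tee keeps items buffered for any copy that has not read them yet, so the check copy should be dropped rather than kept around un-advanced.

```python
from itertools import tee

_MISSING = object()  # hypothetical sentinel that no real item can be


def tee_peek(iterable):
    """Return (is_empty, iterator); the iterator still yields every item."""
    check, sequence = tee(iterable, 2)
    empty = next(check, _MISSING) is _MISSING  # reads at most one item
    # 'check' is discarded when the function returns; keeping it alive but
    # never advancing it would make tee buffer everything 'sequence' reads.
    return empty, sequence


empty, seq = tee_peek(x for x in "abc")
print(empty, list(seq))  # False ['a', 'b', 'c']
```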
Or just keeping it simple:
If you can't use len, you can't check if the sequence has a bool value of True, for the same reasons.
Therefore, your way seems simple enough.
Another way would be to delete the name "item" before the "for" statement and
check whether it exists after the loop:
try:
    del item  # remove any "item" left over from earlier code
except NameError:
    pass  # "item" was not bound yet, which is fine
for item in sequence:
    pass  # do stuff
try:
    item
except NameError:
    pass  # sequence is empty
But you should stick with your own code, as it's clearer than this.

The second example from J.F. Sebastian seems to be the ticket with a while loop.
NoItem = object()
myiter = (x for x in range(10))
item = next(myiter, NoItem)
if item is NoItem:
    ...
else:
    while item is not NoItem:
        print(item)
        item = next(myiter, NoItem)
Not the most concise but objectively the clearest... Mud, no?

This shouldn't trigger len():
def handle_items(items):
    index = -1
    for index, item in enumerate(items):
        print('processing item #%d: %r' % (index, item))
    # at this point, index will be the offset of the last item,
    # i.e. length of items minus one
    if index == -1:
        print('there were no items to process')
    print('done')
    print()
# test with an empty generator and two generators of different length:
handle_items(x for x in ())
handle_items(x for x in (1,))
handle_items(x for x in (1, 2, 3))

if not iterable_sequence.__length_hint__():  # optional method, and only an estimate
    empty()
else:
    for item in iterable_sequence:
        dostuff()
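__length_hint__ is an optional method (PEP 424) and its result is only an estimate. Generators do not define it, so calling iterable_sequence.__length_hint__() on one raises AttributeError, and operator.length_hint() (Python 3.4+) falls back to a default that looks just like "empty". A quick sketch of the difference:

```python
from operator import length_hint

gen = (x for x in [1, 2, 3])

print(length_hint([1, 2, 3]))           # 3: lists define __len__
print(length_hint(iter([1, 2, 3])))     # 3: list iterators define __length_hint__
print(hasattr(gen, "__length_hint__"))  # False: generators give no hint
print(length_hint(gen))                 # 0: the default, although gen is not empty
print(length_hint(gen, -1))             # -1: an explicit default makes "unknown" visible
print(list(gen))                        # [1, 2, 3]: nothing was consumed
```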

Related

How to do something only to the first item within a loop in python? [duplicate]

I want to do something different to the first item in a list. What is the most Pythonic way of doing so?
for item in list:
    # only if it's the first item, do something
    # otherwise do something else
    pass
A few choices, in descending order of Pythonicity:
for index, item in enumerate(lst):  # note: don't use list
    if not index:  # or if index == 0:
        pass  # first item
    else:
        pass  # other items
Or:
first = True
for item in lst:
    if first:
        first = False
        # first item
    else:
        pass  # other items
Or:
for index in range(len(lst)):
    item = lst[index]
    if not index:
        pass  # first item
    else:
        pass  # other items
You can create an iterator over the list using iter(), then call next() on it to get the first value, then loop on the remainder. I find this quite an elegant way to handle files where the first line is the header and the rest is data, e.g.:
list_iterator = iter(lst)
# consume the first item
first_item = next(list_iterator)
# now loop on the tail
for item in list_iterator:
    print(item)
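Note that next(list_iterator) raises StopIteration if lst is empty. A sketch of the header-then-data case with the csv module, using next()'s default argument to cope with empty input; read_records is a hypothetical helper and the sample text is made up.

```python
import csv
import io


def read_records(text):
    """Parse CSV text whose first line is a header into a list of dicts."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows, None)  # consume the header line (None if there are no lines)
    if header is None:
        return []
    # iterating 'rows' resumes right after the header
    return [dict(zip(header, row)) for row in rows]


print(read_records("name,qty\napple,3\npear,5\n"))
# [{'name': 'apple', 'qty': '3'}, {'name': 'pear', 'qty': '5'}]
```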
do_something_with_first_item(lst[0])
for item in lst[1:]:
    do_something_else(item)
Or:
is_first_item = True
for item in lst:
    if is_first_item:
        do_something_with_first_item(item)
        is_first_item = False
    else:
        do_something_else(item)
Do not use list as a variable name, because this shadows the built-in function list().
The enumerate-based solution in jonrsharpe's answer is superior to this. You should probably use that instead.
Use a flag that you change after processing the first item. For example:
first = True
for item in my_list:
    if first:
        # Processing unique to the first item
        first = False
    else:
        pass  # Processing unique to other items
    # Shared processing
You could also just process the first item:
first = my_list.pop(0)
# Process first
for item in my_list:
    pass  # Process item
# Add the first item back to the list at the beginning
my_list.insert(0, first)
jonrsharpe's first version using enumerate is clean and simple, and works with all iterables:
for index, item in enumerate(lst):
    if not index:
        do_something_with_first_item(item)
    else:
        do_something_else(item)
senshin's first solution using lst[0] and lst[1:] is even simpler, but only works with sequences:
do_something_with_first_item(lst[0])
for item in lst[1:]:
    do_something_else(item)
You can get the best of both worlds by using iterators directly:
it = iter(lst)
do_something_with_first_item(next(it))
for item in it:
    do_something_else(item)
But, despite being the best of both worlds, it's actually not quite as simple as either. This is only clearly worth doing if you know you have an iterator (so you can skip the first line), or you need one anyway to do itertools, genexpr and similar stuff with (so you were already going to write the first line). Whether it's worth doing in other cases is more of a style issue.
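Note that next(it) raises StopIteration when lst is empty. A sketch of one way to keep the iterator-based shape while tolerating empty input: take the first item with a for loop that breaks immediately, then let a second loop continue on the same iterator. process is a hypothetical helper, not from any of the answers.

```python
def process(iterable, first_fn, other_fn):
    """Apply first_fn to the first item and other_fn to every later item."""
    it = iter(iterable)
    results = []
    for first in it:              # body runs at most once because of the break
        results.append(first_fn(first))
        break
    for item in it:               # resumes right after the first item
        results.append(other_fn(item))
    return results


print(process("abc", str.upper, str.lower))              # ['A', 'b', 'c']
print(process((n for n in []), str.upper, str.lower))    # []
```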

Why won't my code work for a single list but works for a nested list?

The first print statement works when multiple lists are passed into the function. However, when passing in only a single list, I get the error "AttributeError: 'int' object has no attribute 'pop'".
This code is attempting to remove one item from the list to see if that popped item still exists in the remaining list.
def check_row(p):
    for e in p:
        while e:
            x = e.pop()
            if x in e:
                return False
    return True
print(check_row([[8,2,3,4,5],
                 [2,3,1,5,6],
                 [4,0,2,3,1]]))
print(check_row([1,2,3,4,5]))
Many thanks.
You are popping the item from the element, not the outer list. If your elements are not lists, then don't try to treat them as such.
You cannot, however, remove items from the outer list while at the same time looping over it, and expect the loop not to skip items.
If you want to see if an item occurs more than once in the list, compare the length of the set() of the list instead:
def check_row(row):
    return len(row) == len(set(row))
This only works for hashable values, which nested lists are not, but at the very least does not alter the list in place, as your code would do.
You could still use a list scan, but then at least use list.index() to limit the search to a start index beyond the current position:
def check_row(row):
    for i, elem in enumerate(row):
        try:
            row.index(elem, i + 1)
            return False  # dupe found
        except ValueError:
            pass  # no dupe found
    return True
However, this assumes you wanted to only test the outer list for duplicates. Supporting a nested structure and flat structures in the same code without more details on what you expect to happen in each case is much more complicated.
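If you want a scan that is non-destructive like the set() version but stops at the first duplicate like the index() scan, a seen-set loop combines both. A sketch (has_no_duplicates is a hypothetical name; values must be hashable, as with set()):

```python
def has_no_duplicates(row):
    """Return True if no value appears twice in row; row is never modified."""
    seen = set()
    for value in row:
        if value in seen:
            return False   # duplicate found; no need to look further
        seen.add(value)
    return True


print(has_no_duplicates([1, 2, 3, 4, 5]))   # True
print(has_no_duplicates([8, 2, 3, 2, 5]))   # False
```

Because it only iterates once, it also accepts iterators and generators, not just lists.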
In the case of a single (non-nested) list, you're calling .pop() on elements (e) which aren't lists and therefore presumably don't have a .pop method.
That's because e is an element of your list. In the nested one, e is a list, while in the second one, e is an integer. Therefore, e.pop is invalid for the second one.
You'll have to make it always nested:
>>> print(check_row([[1, 2, 3, 4, 5]]))
True
This way, the value passed to check_row is always a nested list, even if it has only one element.
But as far as checking whether the elements are still in the other lists, I would first flatten the list, and then check whether there are duplicate elements in it.
from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10
def flatten(l):
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, str):
            for sub in flatten(el):
                yield sub
        else:
            yield el
def check_row(p):
    flat = list(flatten(p))
    return len(flat) == len(set(flat))
This way, check_row will always produce the result you wanted, regardless of whether it's given a list or a nested list :)
Hope this helps!
You're confusing yourself with your naming. The function you called check_row actually checks a list of rows, despite the name, so passing it a single row fails. The fact that you're using meaningless one-letter names doesn't help either. Let's rewrite it more clearly, keeping what it actually does (checking a list of rows):
def check_rows(rows):
    for row in rows:
        while row:
            element = row.pop()
            if element in row:
                return False
    return True
Now it should be clearer why it fails: You're passing it a single row as the rows, so for row in rows is getting the elements rather than the rows, and it all goes downhill from there.
What you probably want is a check_row function that works on a single row, and then a check_rows that calls check_row on each row:
def check_row(row):
    while row:
        element = row.pop()
        if element in row:
            return False
    return True
def check_rows(rows):
    for row in rows:
        if not check_row(row):
            return False
    return True
But really, I don't know why you want this function at all. It destructively modifies the rows, removing every element up to the first duplicate. Why would you want that? For example, Martijn Pieters's solution is simpler and more efficient, as well as being non-destructive:
def check_row(row):
    return len(set(row)) == len(row)
And while we're at it, let's use the all function instead of an explicit loop for check_rows:
def check_rows(rows):
    return all(check_row(row) for row in rows)

I think this should raise an error, but it doesn't

Below is a simple function to remove duplicates from a list while preserving order. I've tried it and it actually works, so the problem here is my understanding. It seems to me that the second time you run uniq.remove(item) for a given item, it will raise an error (KeyError or ValueError, I think?) because that item has already been removed from the set uniq. Is this not the case?
def unique(seq):
    uniq = set(seq)
    return [item for item in seq if item in uniq and not uniq.remove(item)]
There's a check, item in uniq, which gets executed before the item is removed. The and operator is nice in that it "short-circuits". This means that if the condition on the left is falsy, then the condition on the right doesn't get evaluated: we already know the expression can't be truthy.
set.remove is an in-place operation. This means that it does not return anything (well, it returns None); and bool(None) is False.
So your list comprehension is effectively this:
answer = []
for item in seq:
    if item in uniq and not uniq.remove(item):
        answer.append(item)
and since Python short-circuits conditionals (as others have pointed out), this is effectively:
answer = []
for item in seq:
    if item in uniq:
        if not uniq.remove(item):
            answer.append(item)
Of course, since uniq.remove(item) returns None (the bool of which is False), either both conditions are evaluated or neither is.
The reason that the second condition exists is to remove item from uniq. This way, if/when you encounter item again (as a duplicate in seq), it will not be found in uniq because it was deleted from uniq the last time it was found there.
Now, keep in mind that this is fairly dangerous, as conditions that modify variables are considered bad style (imagine debugging such a conditional when you aren't fully familiar with what it does). Conditionals should not modify the variables they check: they should only read them, not write to them as well.
Hope this helps
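To watch the short-circuiting happen, here is a sketch that swaps the plain set for a hypothetical set subclass, RecordingSet, which logs every remove() call. It assumes Python 3 (super() without arguments).

```python
class RecordingSet(set):
    """Hypothetical set subclass that records every remove() call."""

    def __init__(self, *args):
        super().__init__(*args)
        self.removed = []

    def remove(self, item):
        self.removed.append(item)
        super().remove(item)   # returns None, as set.remove always does


def unique(seq):
    uniq = RecordingSet(seq)
    result = [item for item in seq if item in uniq and not uniq.remove(item)]
    return result, uniq.removed


result, removed = unique([3, 1, 3, 2, 1])
print(result)   # [3, 1, 2]
print(removed)  # [3, 1, 2]: remove() ran once per distinct value, never twice
```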
mgilson and others have answered this question nicely, as usual. I thought I might point out what is probably the canonical way of doing this in Python, namely the unique_everseen recipe from the recipes section of the itertools docs, quoted below (with Python 2's ifilterfalse replaced by its Python 3 name, filterfalse):
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
def unique_with_order(seq):
    final = []
    for item in seq:
        if item not in final:
            final.append(item)
    return final
print(unique_with_order([1,2,3,3,4,3,6]))
Break it down, make it simple :) Not everything has to be a list comprehension these days.
mgilson's answer is the right one, but here, for your information, is a possible lazy (generator) version of the same function. This means it'll work for iterables that don't fit in memory, including infinite iterators, as long as the set of their distinct elements does.
def unique(iterable):
    uniq = set()
    for item in iterable:
        if item not in uniq:
            uniq.add(item)
            yield item
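A sketch of the same lazy generator with an optional key function, modelled on the key parameter of the unique_everseen recipe; unique_lazy is a hypothetical name. The usage shows it on an infinite iterator, where only as many items are pulled as are requested.

```python
from itertools import count


def unique_lazy(iterable, key=None):
    """Yield items whose key (the item itself if key is None) is new."""
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Infinite input is fine as long as we only ask for finitely many items;
# a fourth next() here would loop forever, since no new value ever appears.
gen = unique_lazy(n % 3 for n in count())
print([next(gen) for _ in range(3)])               # [0, 1, 2]
print(list(unique_lazy("ABbCcA", key=str.lower)))  # ['A', 'B', 'C']
```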
The first time the list comprehension meets a given value, that value is still in uniq, so it is kept and then removed from uniq; when the same value turns up again, it is no longer in uniq. The reason you don't get any errors on that second encounter is that Python's and short-circuits: it sees that the first clause (item in uniq) is false and doesn't bother to evaluate the second clause.

Print set() item fill with python generator

I have a generator:
foundUnique = set()
def unique_items(myList, index, clearFlag):
    for item in myList:
        if clearFlag is True:
            foundUnique.clear()
            clearFlag = False
        if item[index] not in foundUnique:
            yield item
            foundUnique.add(item[index])
And I am using unique_items to get a unique list:
senderDupSend = unique_items(ip, 4, True)
Now I want my set to be accessible (so I can print its elements or make changes to specific elements), but when I write:
for item in foundUnique:
    print(item)
It prints nothing!
But if I write:
for item in senderDupSend:
    print(item)
for item in foundUnique:
    print(item)
It prints all foundUnique items.
Please tell me what I did wrong. How can I solve this problem?
The problem is that unique_items is a generator so that
senderDupSend = unique_items(ip, 4, True)
is a generator that needs to be iterated over. When you run
for item in foundUnique:
    print(item)
the generator has not actually run yet so foundUnique is still empty.
When you later go on to do
for item in senderDupSend:  # This is what actually fills the set.
    print(item)
for item in foundUnique:
    print(item)
It should print out the set twice: once while it is being constructed and once after it is constructed.
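A small sketch of the underlying rule: calling a generator function only creates a generator object, and the body (including side effects such as adding to a set) runs only as the generator is iterated. The names found and unique_by_index are hypothetical stand-ins for foundUnique and unique_items.

```python
found = set()  # module-level set filled as a side effect, like foundUnique


def unique_by_index(rows, index):
    """Generator: yield rows whose rows[index] has not been seen yet."""
    for row in rows:
        if row[index] not in found:
            found.add(row[index])
            yield row


rows = [("a", 1), ("b", 1), ("c", 2)]
gen = unique_by_index(rows, 1)
print(found)        # set(): creating the generator runs none of its body
first = next(gen)   # runs the body up to the first yield
print(found)        # {1}
rest = list(gen)    # exhausting the generator runs the rest
print(found)        # {1, 2}
```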
It seems like what you are trying to do is construct a set that has the same index taken from every element of some sequence. You can do it like this very easily:
found_unique = set(item[index] for item in sequence)
In the concrete case that you show, it would be:
found_unique = set(item[4] for item in ip)
If you later wanted to extend the set to contain other items, you could do
found_unique.update(item[4] for item in other_ip_list)
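Note that set.union returns a new set and leaves the original untouched, whereas set.update (or the |= operator) modifies the set in place. A quick sketch with made-up rows, where index 4 holds the value of interest:

```python
found_unique = {1, 2}
other = [(0, 0, 0, 0, 3), (0, 0, 0, 0, 4)]   # hypothetical rows; index 4 is the key

merged = found_unique.union(item[4] for item in other)
print(found_unique)  # {1, 2}: union() returns a new set, original unchanged
print(merged)        # {1, 2, 3, 4}

found_unique.update(item[4] for item in other)  # modifies in place, returns None
print(found_unique)  # {1, 2, 3, 4}
```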

Searching values of a list in another List using Python

I'm trying to find a sublist of a list. Meaning, if list1, say [1,5], is in list2, say [1,4,3,5,6], then it should return True. What I have so far is this:
for nums in l1:
    if nums in l2:
        return True
    else:
        return False
This would be true but I'm trying to return True only if list1 is in list2 in the respective order. So if list2 is [5,2,3,4,1], it should return False. I was thinking along the lines of comparing the index values of list1 using < but I'm not sure.
try:
    last_found = -1
    for num in L1:
        last_found = L2.index(num, last_found + 1)
    return True
except ValueError:
    return False
The index method of list L2 returns the position at which the first argument (num) is found in the list; called, like here, with a second arg, it starts looking in the list at that position. If index does not find what it's looking for, it raises a ValueError exception.
So, this code uses this approach to look for each item num of L1, in order, inside L2. The first time it needs to start looking from position 0; each following time, it needs to start looking from the position just after the last one where it found the previous item, i.e. last_found + 1 (so at the start we must set last_found = -1 to start looking from position 0 the first time).
If every item in L1 is found this way (i.e. it's found in L2 after the position where the previous item was found), then the two lists meet the given condition and the code returns True. If any item of L1 is ever not-found, the code catches the resulting ValueError exception and just returns False.
A different approach would be to use iterators over the two lists, that can be formed with the iter built-in function. You can "advance" an iterator by calling built-in next on it; this will raise StopIteration if there is no "next item", i.e., the iterator is exhausted. You can also use for on the iterator for a somewhat smoother interface, where applicable. The low-level approach using the iter/next idea:
i1 = iter(L1)
i2 = iter(L2)
while True:
    try:
        lookfor = next(i1)
    except StopIteration:
        # no more items to look for == all good!
        return True
    while True:
        try:
            maybe = next(i2)
        except StopIteration:
            # item lookfor never matched == nope!
            return False
        if maybe == lookfor:
            break
or, a bit higher-level:
i1 = iter(L1)
i2 = iter(L2)
for lookfor in i1:
    for maybe in i2:
        if maybe == lookfor:
            break
    else:
        # item lookfor never matched == nope!
        return False
# no more items to look for == all good!
return True
In fact, the only crucial use of iter here is to get i2: having the inner loop as for maybe in i2 guarantees the inner loop won't start looking from the beginning every time, but rather will keep looking where it last left off. The outer loop might as well be for lookfor in L1:, since it has no "restarting" issue.
Key, here, is the else: clause of loops, which triggers if, and only if, the loop was not interrupted by break but rather exited naturally.
Working further on this idea we are again reminded of the in operator, which also can be made to continue where it last left off simply by using an iterator. Big simplification:
i2 = iter(L2)
for lookfor in L1:
    if lookfor not in i2:
        return False
# no more items to look for == all good!
return True
But now we recognize that this is exactly the pattern abstracted by the built-in "short-circuiting accumulator" functions any and all, so...:
i2 = iter(L2)
return all(lookfor in i2 for lookfor in L1)
which I believe is just about as simple as you can get. The only non-elementary bit left here is: you need to use an iter(L2) explicitly, just once, to make sure the in operator (intrinsically an inner loop) doesn't restart the search from the beginning but rather continues each time from where it last left off.
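To see why iter(L2) matters: the in operator on an iterator consumes items up to and including the match, so the next in test resumes after it. A sketch (is_subsequence is a hypothetical wrapper around the one-liner above):

```python
def is_subsequence(small, big):
    """True if the items of small appear in big in the same order (gaps allowed)."""
    it = iter(big)
    return all(x in it for x in small)


it = iter([1, 4, 3, 5, 6])
print(3 in it)    # True
print(list(it))   # [5, 6]: the search consumed 1, 4 and 3

print(is_subsequence([1, 5], [1, 4, 3, 5, 6]))   # True
print(is_subsequence([1, 5], [5, 2, 3, 4, 1]))   # False
print(is_subsequence([5, 5], [5]))               # False: each match is used once
```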
This question looks a bit like homework and for this reason I'd like to take the time and discuss what may be going wrong with the snippet shown in the question.
Although you are using a word in its plural form, for the nums variable, you need to understand that Python will use this variable to store ONE item from l1 at a time, and go through the block of code in this "for block", one time for each different item.
The result of your current snippet will therefore be to exit upon the very first iteration, with either True or False depending if by chance the first items in the list happen to match.
Edit: Yes, A1, exactly as you said: the logic exits with True after the first iteration. This is because of the "return" when nums is found in l2.
If you were to do nothing in the "found" case, the loop would proceed with finishing whatever logic is in the block (none here) and would then start the next iteration. Therefore it would only exit with a False return value when an item from l1 is not found in l2 (right after the very first such not-found item). So your logic is almost correct (if it were to do nothing in the "found" case); the one thing missing would be to return True unconditionally after the for loop (since if it didn't exit with a False value within the loop, then all items of l1 are in l2...).
There are two ways to modify the code so it does nothing for the "found case".
- by using pass, which is a convenient way to instruct Python to do nothing; "pass" is typically used when "something", i.e. some action is syntactically required but we don't want anything done, but it can also be used when debugging etc.
- by rewriting the test as a "not in" instead
if nums not in l2:
    return False
# no else:, i.e. do nothing at all if found
Now... Getting into more details.
There may be a flaw in your program (with the suggested changes): it would consider l1 to be a sublist of l2 even if l1 had, say, two items with value 5 whereas l2 had only one such value. I'm not sure whether that kind of consideration is part of the problem (possibly the understanding is that both lists are "sets", with no possible duplicate items). If duplicates were allowed, however, you would have to complicate the logic somewhat (a possible approach would be to initially make a copy of l2 and, each time "nums" is found in the l2 copy, remove that item).
Another consideration is that maybe a list can only be said to be a sublist if its items are found in the same order as the items in the other list... Again, it all depends on how the problem is defined... BTW, some of the proposed solutions, like Alex Martelli's, are written the way they are because they solve the problem in a way where the order of items within the lists matters.
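For the duplicates-aware but order-insensitive reading, one alternative to copying l2 and removing found items is collections.Counter, which counts occurrences. This is a swapped-in technique rather than the approach described above, and it requires hashable items; is_submultiset is a hypothetical name.

```python
from collections import Counter


def is_submultiset(l1, l2):
    """True if every value occurs in l2 at least as often as in l1 (order ignored).

    Counter subtraction keeps only positive counts, so the difference is
    empty exactly when l2 has enough copies of every value in l1.
    """
    return not (Counter(l1) - Counter(l2))


print(is_submultiset([1, 5], [1, 4, 3, 5, 6]))   # True
print(is_submultiset([5, 5], [1, 5, 6]))         # False: l2 has only one 5
print(is_submultiset([1, 5], [5, 2, 3, 4, 1]))   # True: order is ignored here
```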
I think this solution is the fastest, since it iterates only once, albeit on the longer list and exits before finishing the iteration if a match is found. (Edit: However, it is not as succinct or as fast as Alex's latest solution)
def ck(l1, l2):
    i, j = 0, len(l1)
    if j == 0:
        return True  # an empty list is trivially contained
    for e in l2:
        if e == l1[i]:
            i += 1
            if i == j:
                return True
    return False
An improvement was suggested by Anurag Uniyal (see comment) and is reflected in the showdown below.
Here are some speed results for a range of list size ratios (list l1 is a 10-element list containing random values from 1 to 10; list l2 ranges from 10 to 1000 in length and also contains random values from 1 to 10).
Code that compares run times and plots the results:
import random
import os
import pylab
import timeit
def paul(l1, l2):
    i = 0
    j = len(l1)
    try:
        for e in l2:
            if e == l1[i]:
                i += 1
    except IndexError:  # thanks Anurag: all of l1 was matched
        return True
    return i == j  # also True when the last match used the last item of l2
def jed(list1, list2):
    try:
        for num in list1:
            # +1 so the same element of list2 cannot be matched twice
            list2 = list2[list2.index(num) + 1:]
    except ValueError:
        return False
    else:
        return True
def alex(L1, L2):  # wow!
    i2 = iter(L2)
    return all(lookfor in i2 for lookfor in L1)
from itertools import dropwhile
from operator import ne
from functools import partial
def thc4k_andrea(l1, l2):
    it = iter(l2)
    try:
        for e in l1:
            next(dropwhile(partial(ne, e), it))
        return True
    except StopIteration:
        return False
ct = 100
ss = range(10, 1000, 100)
nms = 'paul alex jed thc4k_andrea'.split()
ls = dict.fromkeys(nms)
for nm in nms:
    ls[nm] = []
# the functions above live in the running script, so time them via __main__
setup = 'import __main__ as x'
for s in ss:
    l1 = [random.randint(1, 10) for i in range(10)]
    l2 = [random.randint(1, 10) for i in range(s)]
    for nm in nms:
        stmt = 'x.' + nm + '(%s,%s)' % (str(l1), str(l2))
        t = timeit.Timer(setup=setup, stmt=stmt).timeit(ct)
        ls[nm].append(t)
pylab.clf()
for nm in nms:
    print(len(ss), len(ls[nm]))
    pylab.plot(ss, ls[nm], label=nm)
pylab.legend(loc=0)
pylab.xlabel('length of l2')
pylab.ylabel('time')
pylab.savefig('cmp_lsts.png')
os.startfile('cmp_lsts.png')  # os.startfile exists on Windows only
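Results: (plot of time against length of l2 for each function, saved as cmp_lsts.png; the image is not available here)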
This should be easy to understand and avoids corner cases nicely, as you don't need to work with indexes:
def compare(l1, l2):
    it = iter(l2)
    for e in l1:
        try:
            while next(it) != e:
                pass
        except StopIteration:
            return False
    return True
It compares each element of l1 with the next elements of l2.
If there is no next element (StopIteration), it returns False: it visited the whole of l2 without finding the current e. Otherwise it found e and moves on to the next element of l1; once every element has been found, it returns True.
For faster execution it may help to put the try block outside the for:
def compare(l1, l2):
    it = iter(l2)
    try:
        for e in l1:
            while next(it) != e:
                pass
    except StopIteration:
        return False
    return True
I have a hard time seeing questions like this and not wishing that Python's list handling was more like Haskell's. This seems a much more straightforward solution than anything I could come up with in Python:
contains_inorder :: Eq a => [a] -> [a] -> Bool
contains_inorder [] _ = True
contains_inorder _ [] = False
contains_inorder (x:xs) (y:ys)
  | x == y    = contains_inorder xs ys
  | otherwise = contains_inorder (x:xs) ys
The ultra-optimized version of Andrea's solution:
from itertools import dropwhile
from operator import ne
from functools import partial
def compare(l1, l2):
    it = iter(l2)
    try:
        for e in l1:
            next(dropwhile(partial(ne, e), it))
        return True
    except StopIteration:
        return False
This can be written in an even more functional style:
def compare(l1, l2):
    it = iter(l2)
    # any(True for ...) because any([0]) is False, which we don't want here
    return all(any(True for _ in dropwhile(partial(ne, e), it)) for e in l1)
I have a feeling this is more intensive than Alex's answer, but here was my first thought:
def test(list1, list2):
    try:
        for num in list1:
            # +1 so each element of list2 is matched at most once
            list2 = list2[list2.index(num) + 1:]
    except ValueError:
        return False
    else:
        return True
Edit: Just tried it. His is faster. It's close.
Edit 2: Moved try/except out of the loop (this is why others should look at your code). Thanks, gnibbler.
