Why do exhausted generators raise StopIteration more than once? - python

Why is it that when an exhausted generator is called several times, StopIteration is raised every time, rather than just on the first attempt? Aren't subsequent calls meaningless, and don't they indicate a likely bug in the caller's code?
def gen_func():
    yield 1
    yield 2

gen = gen_func()
next(gen)
next(gen)
next(gen)  # StopIteration, as expected
next(gen)  # why StopIteration again, and not something to warn me that I'm doing something wrong?
This also leads to silent failures when someone accidentally reuses an exhausted generator:
def do_work(gen):
    for x in gen:
        # do stuff with x
        pass
    # here I forgot that I already used up gen,
    # so the loop below does nothing, without raising any exception or warning
    for x in gen:
        # do stuff with x
        pass

def gen_func():
    yield 1
    yield 2

gen = gen_func()
do_work(gen)
If second and later attempts to call an exhausted generator raised a different exception, this type of bug would be easier to catch.
Perhaps there's an important use case for calling exhausted generators multiple times and getting StopIteration?

Perhaps there's an important use case for calling exhausted generators multiple times and getting StopIteration?
There is: specifically, when you want to perform multiple loops over the same iterator. Here's an example from the itertools docs that relies on this behavior:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
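To see why grouper depends on this guarantee: all n slots passed to zip_longest are the same iterator object, so when the input runs out mid-chunk, zip_longest keeps calling next() on the already-exhausted iterator for each remaining slot and needs StopIteration every time in order to substitute the fillvalue:

print(list(grouper('ABCDEFG', 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
# After 'G' the shared iterator is exhausted; next() is still called on it
# twice more (once per remaining slot), and each call must raise
# StopIteration again for the 'x' fillvalues to appear.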

It is part of the iterator protocol:
Once an iterator's __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.
Source: https://docs.python.org/3/library/stdtypes.html#iterator-types
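To illustrate what a "broken" implementation looks like in practice, here is a minimal sketch (the class is made up for illustration) of an iterator that resets itself instead of continuing to raise StopIteration. A second loop over it silently restarts, which is exactly the kind of surprise this rule exists to prevent:

class BrokenCountdown:
    """Deemed broken by the iterator protocol: after raising
    StopIteration once, it resets and yields values again."""
    def __init__(self, n):
        self.start = self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n == 0:
            self.n = self.start  # resets instead of staying exhausted
            raise StopIteration
        self.n -= 1
        return self.n

it = BrokenCountdown(2)
print(list(it))  # [1, 0]
print(list(it))  # [1, 0] -- a second loop silently runs again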

Here's an implementation of a wrapper that raises an error whenever StopIteration is raised more than once. As already noted by VPfB, this implementation is considered broken by the iterator protocol:
#!/usr/bin/env python3.8
"""
https://docs.python.org/3/library/stdtypes.html#iterator-types
This is considered broken by the iterator protocol, god knows why
"""
from typing import TypeVar, Iterator

class IteratorExhaustedError(Exception):
    """Exception raised when exhausted iterators are ``next``d"""

T = TypeVar("T")

class reuse_guard(Iterator[T]):
    """
    Wraps an iterator so that StopIteration is only raised once;
    after that, ``IteratorExhaustedError`` will be raised to detect
    fixed-size iterator misuses
    """

    def __init__(self, iterator: Iterator[T]):
        self._iterated: bool = False
        self._iterator = iterator

    def __next__(self) -> T:
        try:
            return next(self._iterator)
        except StopIteration as e:
            if self._iterated:
                raise IteratorExhaustedError(
                    "This iterator has already reached its end")
            self._iterated = True
            raise e

    def __iter__(self) -> Iterator[T]:
        return self
Example:
In [48]: iterator = reuse_guard(iter((1, 2, 3, 4)))
In [49]: list(iterator)
Out[49]: [1, 2, 3, 4]
In [50]: list(iterator)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-47-456650faec86> in __next__(self)
19 try:
---> 20 return next(self._iterator)
21 except StopIteration as e:
StopIteration:
During handling of the above exception, another exception occurred:
IteratorExhaustedError Traceback (most recent call last)
<ipython-input-50-5070d0fe4365> in <module>
----> 1 list(iterator)
<ipython-input-47-456650faec86> in __next__(self)
21 except StopIteration as e:
22 if self._iterated:
---> 23 raise IteratorExhaustedError(
24 "This iterator has already reached its end")
25 self._iterated = True
IteratorExhaustedError: This iterator has already reached its end
Edit:
After revisiting the documentation on the iterator protocol, it seems to me that the statement that iterators which do not continue to raise StopIteration are to be considered broken is aimed more at iterators that yield values again instead of raising the exception; in that case it makes it clearer that the iterator should not be used once it has been exhausted. This is merely my interpretation, though.

Related

Mixing yield and return. `yield [cand]; return` vs `return [[cand]]`. Why do they lead to different output? [duplicate]

Why does
yield [cand]
return
lead to different output/behavior than
return [[cand]]
Minimal viable example
uses recursion
the output of the version using yield [1]; return is different than the output of the version using return [[1]]
def foo(i):
    if i != 1:
        yield [1]
        return
    yield from foo(i-1)

def bar(i):
    if i != 1:
        return [[1]]
    yield from bar(i-1)

print(list(foo(1)))  # [[1]]
print(list(bar(1)))  # []
Min viable counter example
does not use recursion
the output of the version using yield [1]; return is the same as the output of the version using return [[1]]
def foo():
    yield [1]
    return

def foofoo():
    yield from foo()

def bar():
    return [[1]]

def barbar():
    yield from bar()

print(list(foofoo()))  # [[1]]
print(list(barbar()))  # [[1]]
Full context
I'm solving Leetcode #39: Combination Sum and was wondering why one solution works, but not the other:
Working solution
from functools import cache  # requires Python 3.9+
from typing import List

class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        #cache
        def helper(targ, i=0):
            if i == N or targ < (cand := candidates[i]):
                return
            if targ == cand:
                yield [cand]
                return
            for comb in helper(targ - cand, i):
                yield comb + [cand]
            yield from helper(targ, i+1)
        N = len(candidates)
        candidates.sort()
        yield from helper(target)
Non-working solution
from functools import cache  # requires Python 3.9+
from typing import List

class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        #cache
        def helper(targ, i=0):
            if i == N or targ < (cand := candidates[i]):
                return
            if targ == cand:
                return [[cand]]
            for comb in helper(targ - cand, i):
                yield comb + [cand]
            yield from helper(targ, i+1)
        N = len(candidates)
        candidates.sort()
        yield from helper(target)
Output
On the following input
candidates = [2,3,6,7]
target = 7
print(list(Solution().combinationSum(candidates, target)))  # list() because combinationSum is itself a generator
the working solution correctly prints
[[3,2,2],[7]]
while the non-working solution prints
[]
I'm wondering why yield [cand]; return works, but return [[cand]] doesn't.
In a generator function, return just defines the value associated with the StopIteration exception implicitly raised to indicate an iterator is exhausted. It's not produced during iteration, and most iterating constructs (e.g. for loops) intentionally ignore the StopIteration exception (it means the loop is over, you don't care if someone attached random garbage to a message that just means "we're done").
For example, try:
>>> def foo():
... yield 'onlyvalue' # Existence of yield keyword makes this a generator
... return 'returnvalue'
...
>>> f = foo() # Makes a generator object, stores it in f
>>> next(f) # Pull one value from generator
'onlyvalue'
>>> next(f) # There is no other yielded value, so this hits the return; iteration over
--------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
...
StopIteration: 'returnvalue'
As you can see, your return value does get "returned" in a sense (it's not completely discarded), but it's never seen by anything iterating normally, so it's largely useless. Outside of rare cases where generators are used as coroutines (manually advanced with next(genobj) and driven via .send() and .throw()), the return value of a generator won't be seen.
In short, you have to pick one:
Use yield anywhere in a function, and it's a generator (whether or not the code path of a particular call ever reaches a yield), and return just ends generation (while maybe hiding some data in the StopIteration exception). No matter what you do, calling the generator function "returns" a new generator object (which you can loop over until exhausted); it can never return a raw value computed inside the generator function (which doesn't even begin running until you loop over it at least once).
Don't use yield, and return works as expected (because it's not a generator function).
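For completeness, here is a small sketch of the two usual ways to actually retrieve a generator's return value (standard semantics; the function names are just for illustration):

def computation():
    yield 'progress'
    return 'final result'  # becomes StopIteration('final result')

# Option 1: a delegating generator captures it as the value of yield from
def delegator():
    result = yield from computation()
    print('computation returned:', result)

list(delegator())  # prints: computation returned: final result

# Option 2: catch StopIteration yourself and read its .value attribute
g = computation()
next(g)  # 'progress'
try:
    next(g)
except StopIteration as e:
    print(e.value)  # final result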
As an example to explain what happens to the return value in normal looping constructs, this is what for x in gen(): effectively expands to (the real loop is a C-optimized version of this):
__unnamed_iterator = iter(gen())
while True:
    try:
        x = next(__unnamed_iterator)
    except StopIteration:  # StopIteration caught here without inspecting it
        break  # loop ends; the exception is cleaned even from sys.exc_info() to avoid possible reference cycles
    # body of loop goes here

# Outside of the loop, there is no StopIteration object left
As you can see, the expanded form of the for loop has to look for a StopIteration to know the loop is over, but it doesn't use it. And for anything that's not a generator, the StopIteration never has any associated values; the for loop has no way to report them even if it did (it has to end the loop when it's told iteration is over, and the arguments to StopIteration are explicitly not part of the values being iterated anyway).
Anything else that consumes the generator (e.g. calling list on it) does roughly the same thing as the for loop, ignoring the StopIteration in the same way; nothing except code that specifically expects generators (as opposed to more generalized iterables and iterators) will ever bother to inspect the StopIteration object. At the C layer, there are optimizations such that most iterators don't even produce StopIteration objects: they return NULL and leave no exception set, which everything using the iterator protocol treats as equivalent to returning NULL with a StopIteration set. So for anything but a generator, much of the time there isn't even an exception to inspect.

list-comprehension throws a RuntimeError

Why does this code work fine and not throw any exceptions?
def myzip(*args):
    iters = [iter(arg) for arg in args]
    try:
        while True:
            yield tuple([next(it) for it in iters])
    except StopIteration:
        return

for x, y, z in myzip([1, 2], [3, 4], [5, 6]):
    print(x, y, z)
But if this line
yield tuple([next(it) for it in iters])
is replaced by
yield tuple(next(it) for it in iters)
then everything stops working and a RuntimeError is thrown. Why?
This is a feature introduced in Python 3.5, rather than a bug. Per PEP 479, a StopIteration raised inside a generator is intentionally replaced with a RuntimeError, so that iteration driven by the generator can only be stopped by the generator returning, at which point a StopIteration is raised to end the iteration.
Prior to Python 3.5, a StopIteration raised anywhere inside a generator would stop the generator instead of propagating, so that in the case of:
a = list(F(x) for x in xs)
a = [F(x) for x in xs]
the former would silently produce a truncated result if F(x) raised a StopIteration at some point during the iteration, which made it hard to debug, while the latter would propagate the exception raised from F(x). The goal of the change is to make the two statements behave the same, which is why it affects generators but not list comprehensions.
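A minimal demonstration of the replacement (this behavior became the default in Python 3.7):

def leaky():
    yield 1
    next(iter([]))  # raises StopIteration inside the generator body

try:
    list(leaky())
except RuntimeError as e:
    print(e)                  # generator raised StopIteration
    print(type(e.__cause__))  # <class 'StopIteration'>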

Why are exceptions within a Python generator not caught?

I have the following experimental code whose function is similar to the zip built-in. What it tries to do should be simple and clear: return the zipped tuples one at a time until an IndexError occurs, at which point we stop the generator.
def my_zip(*args):
    i = 0
    while True:
        try:
            yield (arg[i] for arg in args)
        except IndexError:
            raise StopIteration
        i += 1
However, when I tried to execute the following code, the IndexError was not caught but instead thrown by the generator:
gen = my_zip([1,2], ['a','b'])
print(list(next(gen)))
print(list(next(gen)))
print(list(next(gen)))
IndexError Traceback (most recent call last)
I:\Software\WinPython-32bit-3.4.2.4\python-3.4.2\my\temp2.py in <module>()
12 print(list(next(gen)))
13 print(list(next(gen)))
---> 14 print(list(next(gen)))
I:\Software\WinPython-32bit-3.4.2.4\python-3.4.2\my\temp2.py in <genexpr>(.0)
3 while True:
4 try:
----> 5 yield (arg[i] for arg in args)
6 except IndexError:
7 raise StopIteration
IndexError: list index out of range
Why is this happening?
Edit:
Thanks @thefourtheye for providing a nice explanation of what's happening above. Now another problem occurs when I execute:
list(my_zip([1,2], ['a','b']))
This line never returns and seems to hang the machine. What's happening now?
The yield yields a generator object every time, and when those generators were created there was no problem at all. That is why the try...except in my_zip catches nothing. The third time you executed it, the call was effectively reduced (oversimplified for our understanding) to
list(arg[2] for arg in args)
and now, observe carefully: list is iterating that inner generator, not the actual my_zip generator. list calls next on the generator object, arg[2] is evaluated, and 2 turns out not to be a valid index for arg (which is [1, 2] in this case), so IndexError is raised, and list fails to handle it (it has no reason to handle that anyway), so it fails.
As per the edit,
list(my_zip([1,2], ['a','b']))
will be evaluated like this. First, my_zip is called, which gives you a generator object. Then list iterates it: it calls next on it and gets another generator object, (arg[0] for arg in args). Since no exception is raised and no return is encountered, it calls next again, gets yet another generator object, (arg[1] for arg in args), and keeps on iterating. Remember, the yielded generators are never themselves iterated, so we'll never get the IndexError. That is why the code runs forever.
You can confirm this like so:
from itertools import islice
from pprint import pprint
pprint(list(islice(my_zip([1, 2], ["a", 'b']), 10)))
and you will get
[<generator object <genexpr> at 0x7f4d0a709678>,
<generator object <genexpr> at 0x7f4d0a7096c0>,
<generator object <genexpr> at 0x7f4d0a7099d8>,
<generator object <genexpr> at 0x7f4d0a709990>,
<generator object <genexpr> at 0x7f4d0a7095a0>,
<generator object <genexpr> at 0x7f4d0a709510>,
<generator object <genexpr> at 0x7f4d0a7095e8>,
<generator object <genexpr> at 0x7f4d0a71c708>,
<generator object <genexpr> at 0x7f4d0a71c750>,
<generator object <genexpr> at 0x7f4d0a71c798>]
So the code tries to build an infinite list of generator objects.
def my_zip(*args):
    i = 0
    while True:
        try:
            yield (arg[i] for arg in args)
        except IndexError:
            raise StopIteration
        i += 1
IndexError is not caught, because (arg[i] for arg in args) is a generator which is not executed immediately, but when you start iterating over it. And you iterate over it in another scope, when you call list((arg[i] for arg in args)):
# get the generator which yields another generator on each iteration
gen = my_zip([1,2], ['a','b'])
# get the second generator `(arg[i] for arg in args)` from the first one
# then iterate over it: list((arg[i] for arg in args))
print(list(next(gen)))
On the first list(next(gen)) i equals 0.
On the second list(next(gen)) i equals 1.
On the third list(next(gen)) i equals 2. And here you get IndexError -- in the outer scope. The line is treated as list(arg[2] for arg in ([1,2], ['a','b']))
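A two-line way to see that deferred evaluation, using the same data as the question:

args = ([1, 2], ['a', 'b'])
i = 2
g = (arg[i] for arg in args)  # no error yet: the genexp body hasn't run
next(g)                       # IndexError is raised here, at consumption time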
Sorry, I'm not able to offer a coherent explanation of the failure to catch the exception; however, there's an easy way around it: use a for loop over the length of the shortest sequence:
def my_zip(*args):
    for i in range(min(len(arg) for arg in args)):
        yield (arg[i] for arg in args)
>>> gen = my_zip([1,2], ["a",'b','c'])
>>> print(list(next(gen)))
[1, 'a']
>>> print(list(next(gen)))
[2, 'b']
>>> print(list(next(gen)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Try replacing yield (arg[i] for arg in args) with the following:
for arg in args:
    yield arg[i]
But if the arguments are plain numbers rather than sequences, that raises an exception, since 1[1] makes no sense; in that case I suggest replacing arg[i] with just arg.
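Putting the pieces together, here is a minimal corrected sketch that forces evaluation inside the try block (and uses a plain return rather than raise StopIteration, which Python 3.7+ would otherwise convert into a RuntimeError per PEP 479):

def my_zip(*args):
    i = 0
    while True:
        try:
            # tuple(...) consumes the inner genexp right here, so any
            # IndexError is raised inside this try block and gets caught
            value = tuple(arg[i] for arg in args)
        except IndexError:
            return  # ends the generator cleanly
        yield value
        i += 1

print(list(my_zip([1, 2], ['a', 'b'])))  # [(1, 'a'), (2, 'b')]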

Uncatchable Exceptions in Generators

I'm having issues with Python 2.7, whereby an exception raised from a generator is not catchable.
I've lost a fair amount of time, twice, with this behavior.
def gen_function():
    raise Exception("Here.")
    for i in xrange(10):
        yield i

try:
    gen_function()
except Exception as e:
    print("Ex: %s" % (e,))
else:
    print("No exception.")
Output:
No exception.
gen_function() will give you a generator object.
You need to call next() to actually run the code.
You can do it directly with the next function:
g = gen_function()
next(g)
or
for i in g:
    pass  # or whatever you want
Both will trigger the exception.
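In other words, to catch the exception in the question's snippet, the iteration itself, not just the call, has to be inside the try block:

try:
    for i in gen_function():  # the exception surfaces on the first next()
        print(i)
except Exception as e:
    print("Ex: %s" % (e,))    # Ex: Here.
else:
    print("No exception.")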
Calling a generator just gives you the generator object. No code in the generator is actually executed yet. Usually this isn't obvious since you often consume the generator immediately:
for x in gen_function():
    print x
In this case the exception is raised. But where? To make it more explicit when this happens, I've expanded the for ... in loop by hand (this is essentially what it does behind the scenes):
generator_obj = gen_function()  # no exception
it = iter(generator_obj)        # no exception (note iter(generator_obj) is generator_obj)
while True:
    try:
        x = it.next()  # exception raised here
    except StopIteration:
        break
    print x

Return in generator together with yield

In Python 2 it was an error to use return with a value together with yield in a function definition. But for this code in Python 3.3:
def f():
    return 3
    yield 2

x = f()
print(x.__next__())
there is no error about return being used in a function with yield. However, when __next__ is called, a StopIteration exception is raised. Why isn't the value 3 just returned? Is the return somehow ignored?
This is a new feature in Python 3.3. Much like return in a generator has long been equivalent to raise StopIteration(), return <something> in a generator is now equivalent to raise StopIteration(<something>). For that reason, the exception you're seeing should be printed as StopIteration: 3, and the value is accessible through the attribute value on the exception object. If the generator is delegated to using the (also new) yield from syntax, it is the result. See PEP 380 for details.
def f():
    return 1
    yield 2

def g():
    x = yield from f()
    print(x)

# g is still a generator so we need to iterate to run it:
for _ in g():
    pass
This prints 1, but not 2.
The return value is not ignored, but generators only yield values; a return just ends the generator, in this case early. Advancing the generator never reaches the yield statement in that case.
Whenever an iterator reaches the end of the values to yield, a StopIteration must be raised. Generators are no exception. As of Python 3.3, however, any return expression becomes the value of the exception:
>>> def gen():
... return 3
... yield 2
...
>>> try:
... next(gen())
... except StopIteration as ex:
... e = ex
...
>>> e
StopIteration(3,)
>>> e.value
3
Use the next() function to advance iterators, instead of calling .__next__() directly:
print(next(x))
