Why does this code work correctly and not throw exceptions?
def myzip(*args):
    iters = [iter(arg) for arg in args]
    try:
        while True:
            yield tuple([next(it) for it in iters])
    except StopIteration:
        return

for x, y, z in myzip([1, 2], [3, 4], [5, 6]):
    print(x, y, z)
But if I replace this line
yield tuple([next(it) for it in iters])
with
yield tuple(next(it) for it in iters)
then everything stops working and a RuntimeError is raised. Why?
This is a feature introduced in Python 3.5 (and the default behavior since 3.7), rather than a bug. Per PEP 479, a StopIteration raised from inside a generator is intentionally re-raised as a RuntimeError, so that iteration over a generator can only be stopped by the generator returning, at which point a StopIteration is raised to end the iteration. In your code, the generator expression passed to tuple() is itself a generator, so the StopIteration raised by next(it) inside it is converted to a RuntimeError; a list comprehension is not a generator, so in the original version the StopIteration propagates directly to your try/except.
Prior to Python 3.5, a StopIteration raised anywhere in a generator would simply stop the generator rather than propagating, so that in the case of:
a = list(F(x) for x in xs)
a = [F(x) for x in xs]
the former would produce a silently truncated result if F(x) raised a StopIteration at some point during the iteration, which made such bugs hard to debug, while the latter would propagate the exception raised from F(x). The goal of the change is to make the two statements behave the same, which is why it affects generators but not list comprehensions.
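A minimal sketch demonstrating the difference, assuming Python 3.7+ (where the PEP 479 behavior is unconditional); the function names here are illustrative:

def with_listcomp():
    it = iter([])
    try:
        yield tuple([next(i) for i in [it]])
    except StopIteration:
        return  # the StopIteration escapes the list comprehension and is caught here

def with_genexp():
    it = iter([])
    try:
        yield tuple(next(i) for i in [it])
    except StopIteration:
        return  # never reached: the genexp converts StopIteration into RuntimeError

print(list(with_listcomp()))  # []
print(list(with_genexp()))    # RuntimeError: generator raised StopIteration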
Why does
yield [cand]
return
lead to different output/behavior than
return [[cand]]
Minimal viable example
uses recursion
the output of the version using yield [1]; return is different from the output of the version using return [[1]]
def foo(i):
    if i != 1:
        yield [1]
        return
    yield from foo(i-1)

def bar(i):
    if i != 1:
        return [[1]]
    yield from bar(i-1)

print(list(foo(1)))  # [[1]]
print(list(bar(1)))  # []
Minimal viable counterexample
does not use recursion
the output of the version using yield [1]; return is the same as the output of the version using return [[1]]
def foo():
    yield [1]
    return

def foofoo():
    yield from foo()

def bar():
    return [[1]]

def barbar():
    yield from bar()

print(list(foofoo()))  # [[1]]
print(list(barbar()))  # [[1]]
Full context
I'm solving Leetcode #39: Combination Sum and was wondering why one solution works, but not the other:
Working solution
from functools import cache  # requires Python 3.9+
from typing import List

class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        #cache
        def helper(targ, i=0):
            if i == N or targ < (cand := candidates[i]):
                return
            if targ == cand:
                yield [cand]
                return
            for comb in helper(targ - cand, i):
                yield comb + [cand]
            yield from helper(targ, i+1)
        N = len(candidates)
        candidates.sort()
        yield from helper(target)
Non-working solution
from functools import cache  # requires Python 3.9+
from typing import List

class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        #cache
        def helper(targ, i=0):
            if i == N or targ < (cand := candidates[i]):
                return
            if targ == cand:
                return [[cand]]
            for comb in helper(targ - cand, i):
                yield comb + [cand]
            yield from helper(targ, i+1)
        N = len(candidates)
        candidates.sort()
        yield from helper(target)
Output
On the following input
candidates = [2,3,6,7]
target = 7
print(list(Solution().combinationSum(candidates, target)))
the working solution correctly prints
[[3,2,2],[7]]
while the non-working solution prints
[]
I'm wondering why yield [cand]; return works, but return [[cand]] doesn't.
In a generator function, return just defines the value attached to the StopIteration exception implicitly raised to indicate the iterator is exhausted. It's not produced during iteration, and most iterating constructs (e.g. for loops) intentionally ignore the StopIteration exception (it means the loop is over; you don't care if someone attached random garbage to a message that just means "we're done").
For example, try:
>>> def foo():
... yield 'onlyvalue' # Existence of yield keyword makes this a generator
... return 'returnvalue'
...
>>> f = foo() # Makes a generator object, stores it in f
>>> next(f) # Pull one value from generator
'onlyvalue'
>>> next(f) # There is no other yielded value, so this hits the return; iteration over
--------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
...
StopIteration: 'returnvalue'
As you can see, your return value does get "returned" in a sense (it's not completely discarded), but it's never seen by anything iterating normally, so it's largely useless. Outside of rare cases where generators are used as coroutines (you're using .send() and .throw() on the generator and manually advancing it with next(genobj)), the return value of a generator won't be seen.
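If you really do need that value, here's a minimal sketch of retrieving it by catching StopIteration yourself (gen and 'done' are illustrative names):

def gen():
    yield 1
    return 'done'

for value in gen():
    pass  # the loop sees 1, but never 'done'

g = gen()
next(g)  # 1
try:
    next(g)
except StopIteration as e:
    print(e.value)  # 'done'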
In short, you have to pick one:
Use yield anywhere in a function, and it's a generator (whether or not the code path of a particular call ever reaches a yield), and return just ends generation (while maybe stashing some data in the StopIteration exception). No matter what you do, calling the generator function "returns" a new generator object (which you can loop over until exhausted); it can never return a raw value computed inside the generator function (which doesn't even begin running until you loop over it at least once).
Don't use yield, and return works as expected (because it's not a generator function).
As an example of what happens to the return value in normal looping constructs, this is what for x in gen(): effectively expands to (in a C-optimized form):

__unnamed_iterator = iter(gen())
while True:
    try:
        x = next(__unnamed_iterator)
    except StopIteration:  # StopIteration caught here without inspecting it
        break  # Loop ends; the StopIteration is cleared even from sys.exc_info() to avoid reference cycles
    # body of loop goes here

# Outside of the loop, there is no StopIteration object left
As you can see, the expanded form of the for loop has to look for a StopIteration to know the loop is over, but it doesn't use it. And for anything that's not a generator, the StopIteration never has an associated value anyway; the for loop has no way to report one even if it did (it has to end the loop when it's told iteration is over, and the arguments to StopIteration are explicitly not part of the values being iterated). Anything else that consumes the generator (e.g. calling list on it) does roughly the same thing as the for loop, ignoring the StopIteration in the same way.

Nothing except code that specifically expects generators (as opposed to more generalized iterables and iterators) will ever bother to inspect the StopIteration object. At the C layer, there are optimizations such that most iterators don't even produce a StopIteration object: they return NULL and leave the exception state empty, which everything using the iterator protocol knows is equivalent to returning NULL with a StopIteration set, so for anything but a generator there often isn't even an exception object to inspect.
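A quick check of that claim, showing that ordinary consumers all discard the return value the same way (illustrative names):

def gen():
    yield 1
    return 'hidden'

print(list(gen()))         # [1]
print([x for x in gen()])  # [1]
print(max(gen()))          # 1 -- nothing ever surfaces 'hidden'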
As indicated in the documentation, the default value is returned if the iterator is exhausted. However, in the following program the g(x) generator is not exhausted, and I would expect the StopIteration coming from f(x) not to be swallowed by the outer next call.
def f(x):
    if 0:  # to make sure that nothing is generated
        yield 10

def g(x):
    yield next(f(x))

# list(g(3))
next(g(3), None)
What I expect:
Traceback (most recent call last):
File "a.py", line 9, in <module>
next(g(3), None)
File "a.py", line 6, in g
yield next(f(x))
StopIteration
What I actually encountered is that the program ran successfully.
Can I use an alternative approach to achieve the goal? Or can this be fixed in Python?
Edit: The program above can be modified like this to avoid ambiguity.
def f(x):
    if 0:  # to make sure that nothing is generated
        yield 10

def g(x):
    f(x).__next__()  # g(x) is not exhausted at this time
    yield 'something meaningful'
    # I hope that the next function will only catch this line

# list(g(3))
next(g(3), None)
next with a default parameter catches the StopIteration no matter the source.
The behavior you're seeing is expected, and maybe better understood using this code:
def justraise():
    yield next(iter([]))  # raises StopIteration

next(justraise(), None)  # None
next(justraise())  # raises StopIteration
Moving to your code: even though the inner next is called without a default argument, the StopIteration it raises is caught by the outer next with the default argument.
If you have a meaningful error to signal, you should raise a meaningful exception and not StopIteration, which indicates that the iteration ended (and not erroneously) and is exactly what next relies on. (Note that on Python 3.7 and later, PEP 479 changes this picture: a StopIteration escaping inside g is re-raised as a RuntimeError, which the outer next will not swallow.)
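For example, here's a minimal sketch of that advice applied to the code above; NothingGenerated is a hypothetical exception name:

class NothingGenerated(Exception):
    """Hypothetical exception: the inner generator produced nothing."""

def f(x):
    if 0:  # to make sure that nothing is generated
        yield 10

def g(x):
    try:
        yield next(f(x))
    except StopIteration:
        # re-raise as a distinct error so the outer next() cannot swallow it
        raise NothingGenerated("f(%r) produced no values" % x) from None

next(g(3), None)  # raises NothingGenerated instead of silently returning None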
g(x) is a generator whose only step calls next on f(x); f(x) yields nothing, so that call raises a StopIteration (from inside f).
You can check that next(f(some_value)) does throw an exception when called by itself.
As will

def g(x):
    return next(f(x))

But you've passed the default None, so next(g(3), None) runs and simply returns None, since the iterator is exhausted.
If you remove the None, then you see
In [5]: next(g(3))
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-14-05eb86fce40b> in <module>()
----> 1 next(g(3))
<ipython-input-13-a4323284f776> in g(x)
1 def g(x) :
----> 2 yield next(f(x))
3
StopIteration:
Why is it that when an exhausted generator is called several times, StopIteration is raised every time, rather than just on the first attempt? Aren't subsequent calls meaningless, and don't they indicate a likely bug in the caller's code?
def gen_func():
    yield 1
    yield 2

gen = gen_func()
next(gen)
next(gen)
next(gen)  # StopIteration as expected
next(gen)  # why StopIteration, and not something to warn me that I'm doing something wrong?
This also results in the following behavior when someone accidentally uses an exhausted generator:

def do_work(gen):
    for x in gen:
        # do stuff with x
        pass
    # here I forgot that I already used up gen,
    # so this loop does nothing, without raising any exception or warning
    for x in gen:
        # do stuff with x
        pass

def gen_func():
    yield 1
    yield 2

gen = gen_func()
do_work(gen)
If second and later attempts to advance an exhausted generator raised a different exception, it would be easier to catch this type of bug.
Perhaps there's an important use case for calling exhausted generators multiple times and getting StopIteration?
Perhaps there's an important use case for calling exhausted generators multiple times and getting StopIteration?
There is, specifically, when you want to perform multiple passes over the same iterator. Here's an example from the itertools docs that relies on this behavior:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
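A quick demonstration of how this relies on repeated StopIteration: zip_longest keeps calling next on all n copies of the same exhausted iterator while it pads out the final group:

print(list(grouper('ABCDEFG', 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]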
It is part of the iterator protocol:
Once an iterator’s __next__() method raises StopIteration, it must
continue to do so on subsequent calls. Implementations that do not
obey this property are deemed broken.
Source: https://docs.python.org/3/library/stdtypes.html#iterator-types
Here's an implementation of a wrapper that raises an error whenever StopIteration is raised more than once. As already noted by VPfB, this implementation is considered broken by the iterator protocol:
#!/usr/bin/env python3.8
from typing import TypeVar, Iterator

"""
https://docs.python.org/3/library/stdtypes.html#iterator-types
This is considered broken by the iterator protocol, god knows why
"""

class IteratorExhaustedError(Exception):
    """Exception raised when exhausted iterators are ``next``d"""

T = TypeVar("T")

class reuse_guard(Iterator[T]):
    """
    Wraps an iterator so that StopIteration is only raised once;
    after that, ``IteratorExhaustedError`` will be raised to detect
    fixed-size iterator misuses.
    """

    def __init__(self, iterator: Iterator[T]):
        self._iterated: bool = False
        self._iterator = iterator

    def __next__(self) -> T:
        try:
            return next(self._iterator)
        except StopIteration as e:
            if self._iterated:
                raise IteratorExhaustedError(
                    "This iterator has already reached its end")
            self._iterated = True
            raise e

    def __iter__(self) -> Iterator[T]:
        return self
Example:
In [48]: iterator = reuse_guard(iter((1, 2, 3, 4)))
In [49]: list(iterator)
Out[49]: [1, 2, 3, 4]
In [50]: list(iterator)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-47-456650faec86> in __next__(self)
19 try:
---> 20 return next(self._iterator)
21 except StopIteration as e:
StopIteration:
During handling of the above exception, another exception occurred:
IteratorExhaustedError Traceback (most recent call last)
<ipython-input-50-5070d0fe4365> in <module>
----> 1 list(iterator)
<ipython-input-47-456650faec86> in __next__(self)
21 except StopIteration as e:
22 if self._iterated:
---> 23 raise IteratorExhaustedError(
24 "This iterator has already reached its end")
25 self._iterated = True
IteratorExhaustedError: This iterator has already reached its end
Edit:
After revisiting the documentation on the iterator protocol, it seems to me that declaring such iterators broken is aimed more at iterators that resume yielding values after exhaustion than at ones that raise a different exception; in this case the wrapper arguably makes it clearer that the iterator should not be used once it has been exhausted. This is merely my interpretation, though.
I'm having issues with Python 2.7, whereby an exception raised from a generator is not catchable.
I've lost a fair amount of time, twice, with this behavior.
def gen_function():
    raise Exception("Here.")
    for i in xrange(10):
        yield i

try:
    gen_function()
except Exception as e:
    print("Ex: %s" % (e,))
else:
    print("No exception.")
Output:
No exception.
gen_function() just gives you a generator object; you need to call next() to actually run its code. You can do it directly with the next function:

g = gen_function()
next(g)

or with a loop:

for i in g:
    pass  # or whatever you want

Both will trigger the exception.
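Putting it together, a minimal sketch of catching the exception where it actually occurs, reusing gen_function from the question:

g = gen_function()
try:
    next(g)  # the generator body runs now, so the exception is raised here
except Exception as e:
    print("Ex: %s" % (e,))  # Ex: Here.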
Calling a generator just gives you the generator object. No code in the generator is actually executed yet. Usually this isn't obvious since you often apply the generator immediately:

for x in gen_function():
    print x

In this case the exception is raised. But where? To make it more explicit when this happens, I've expanded the for ... in loop into what it essentially does behind the scenes:

generator_obj = gen_function()  # no exception
it = iter(generator_obj)  # no exception (note: iter(generator_obj) is generator_obj)
while True:
    try:
        x = it.next()  # exception raised here
    except StopIteration:
        break
    print x
In Python 2 it was an error to use return with a value together with yield in a function definition. But for this code in Python 3.3:
def f():
    return 3
    yield 2

x = f()
print(x.__next__())
there is no error about return being used in a function that also uses yield. However, when __next__ is called, a StopIteration exception is raised. Why isn't the value 3 simply returned? Is the return somehow ignored?
This is a new feature in Python 3.3. Much like return in a generator has long been equivalent to raise StopIteration(), return <something> in a generator is now equivalent to raise StopIteration(<something>). For that reason, the exception you're seeing should be printed as StopIteration: 3, and the value is accessible through the attribute value on the exception object. If the generator is delegated to using the (also new) yield from syntax, it becomes the result of the yield from expression. See PEP 380 for details.
def f():
    return 1
    yield 2

def g():
    x = yield from f()
    print(x)

# g is still a generator, so we need to iterate to run it:
for _ in g():
    pass
This prints 1, but not 2.
The return value is not ignored, but generators only yield values; a return just ends the generator, in this case early, so advancing the generator never reaches the yield statement.
Whenever an iterator reaches the end of the values it can yield, a StopIteration must be raised. Generators are no exception. As of Python 3.3, however, any return expression becomes the value of the exception:
>>> def gen():
... return 3
... yield 2
...
>>> try:
... next(gen())
... except StopIteration as ex:
... e = ex
...
>>> e
StopIteration(3,)
>>> e.value
3
Use the next() function to advance iterators, instead of calling .__next__() directly:
print(next(x))