Why are exceptions within a Python generator not caught?

I have the following experimental code whose function is similar to the zip built-in. What it tries to do should be simple and clear: return the zipped tuples one at a time until an IndexError occurs, at which point we stop the generator.
def my_zip(*args):
    i = 0
    while True:
        try:
            yield (arg[i] for arg in args)
        except IndexError:
            raise StopIteration
        i += 1
However, when I tried to execute the following code, the IndexError was not caught but instead thrown by the generator:
gen = my_zip([1,2], ['a','b'])
print(list(next(gen)))
print(list(next(gen)))
print(list(next(gen)))
IndexError Traceback (most recent call last)
I:\Software\WinPython-32bit-3.4.2.4\python-3.4.2\my\temp2.py in <module>()
12 print(list(next(gen)))
13 print(list(next(gen)))
---> 14 print(list(next(gen)))
I:\Software\WinPython-32bit-3.4.2.4\python-3.4.2\my\temp2.py in <genexpr>(.0)
3 while True:
4 try:
----> 5 yield (arg[i] for arg in args)
6 except IndexError:
7 raise StopIteration
IndexError: list index out of range
Why is this happening?
Edit:
Thanks @thefourtheye for providing a nice explanation of what's happening above. Now another problem occurs when I execute:
list(my_zip([1,2], ['a','b']))
This line never returns and seems to hang the machine. What's happening now?

The yield yields a generator object every time, and no problem occurs at the point where the generators are created. That is why the try...except in my_zip never catches anything. The third time you executed it,
list(arg[2] for arg in args)
is roughly what it reduces to (over-simplified for our understanding). Observe carefully: list is iterating the yielded generator expression, not the my_zip generator itself. list calls next on that generator object, arg[2] is evaluated, and 2 turns out not to be a valid index for arg (which is [1, 2] in this case), so IndexError is raised; list has no reason to handle it, so the call fails.
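You can see the same thing without my_zip at all: a generator expression raises nothing when it is created, only when it is iterated. A minimal sketch of the failing case:

args = ([1, 2], ['a', 'b'])
g = (arg[2] for arg in args)  # no error yet: nothing has been evaluated
list(g)                       # IndexError: list index out of range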
As per the edit,
list(my_zip([1,2], ['a','b']))
will be evaluated like this. First, my_zip is called and gives you a generator object. Then list iterates it: it calls next and gets another generator object, (arg[0] for arg in args). Since no exception or return is encountered, it calls next again to get yet another generator object, (arg[1] for arg in args), and so on. Remember, the yielded generators are never themselves iterated, so we'll never get the IndexError. That is why the code runs forever.
You can confirm this like so:
from itertools import islice
from pprint import pprint
pprint(list(islice(my_zip([1, 2], ["a", 'b']), 10)))
and you will get
[<generator object <genexpr> at 0x7f4d0a709678>,
 <generator object <genexpr> at 0x7f4d0a7096c0>,
 <generator object <genexpr> at 0x7f4d0a7099d8>,
 <generator object <genexpr> at 0x7f4d0a709990>,
 <generator object <genexpr> at 0x7f4d0a7095a0>,
 <generator object <genexpr> at 0x7f4d0a709510>,
 <generator object <genexpr> at 0x7f4d0a7095e8>,
 <generator object <genexpr> at 0x7f4d0a71c708>,
 <generator object <genexpr> at 0x7f4d0a71c750>,
 <generator object <genexpr> at 0x7f4d0a71c798>]
So the code tries to build an infinite list of generator objects.

def my_zip(*args):
    i = 0
    while True:
        try:
            yield (arg[i] for arg in args)
        except IndexError:
            raise StopIteration
        i += 1
IndexError is not caught because (arg[i] for arg in args) is a generator which is not executed immediately, but only when you start iterating over it. And you iterate over it in another scope, when you call list((arg[i] for arg in args)):
# get the generator which yields another generator on each iteration
gen = my_zip([1,2], ['a','b'])
# get the second generator `(arg[i] for arg in args)` from the first one
# then iterate over it: list((arg[i] for arg in args))
print(list(next(gen)))
On the first list(next(gen)) i equals 0.
On the second list(next(gen)) i equals 1.
On the third list(next(gen)) i equals 2. And here you get IndexError -- in the outer scope. The line is treated as list(arg[2] for arg in ([1,2], ['a','b']))
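Given that explanation, a minimal fix (a sketch, not the only option) is to force the index lookups to happen inside the try block, for example by yielding a tuple instead of a lazy generator expression. Note also that since PEP 479 (Python 3.7+), raising StopIteration inside a generator is converted to RuntimeError, so a plain return is the right way to end the generator:

def my_zip(*args):
    i = 0
    while True:
        try:
            # tuple() consumes the generator expression right here,
            # so any IndexError is raised inside the try block
            yield tuple(arg[i] for arg in args)
        except IndexError:
            return  # ends the generator; don't raise StopIteration
        i += 1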

Sorry, I'm not able to offer a coherent explanation for the failure to catch the exception; however, there's an easy way around it: use a for loop over the length of the shortest sequence:
def my_zip(*args):
    for i in range(min(len(arg) for arg in args)):
        yield (arg[i] for arg in args)
>>> gen = my_zip([1,2], ["a",'b','c'])
>>> print(list(next(gen)))
[1, 'a']
>>> print(list(next(gen)))
[2, 'b']
>>> print(list(next(gen)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration

Try replacing yield (arg[i] for ...) with the following.
for arg in args:
    yield arg[i]
But in the case of numbers that causes an exception, as 1[1] makes no sense. I suggest replacing arg[i] with just arg.
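For reference, this is my reading of how that replacement would sit inside my_zip (a sketch; note it yields the elements one by one rather than grouped per index, and the IndexError is now raised inside the try block where the except clause can catch it):

def my_zip(*args):
    i = 0
    while True:
        try:
            for arg in args:
                yield arg[i]  # IndexError is raised here, inside the try
        except IndexError:
            return  # ends the generator cleanly
        i += 1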

Related

Is the function "next" good practice for finding the first occurrence in an iterable?

I've learned about iterators and such and discovered this quite interesting way of getting the first element of an iterable that satisfies a condition (with a default value in case we don't find one):
first_occurence = next((x for x in range(1,10) if x > 5), None)
For me, it seems a very useful, clear way of obtaining the result.
But since I've never seen that in production code, and since next is a little more "low-level" in Python's structure, I was wondering if it could be bad practice for some reason. Is that the case, and why?
It's fine. It's efficient, it's fairly readable, etc.
If you're expecting a result, or if None is a possible result (so using None as a placeholder makes it hard to tell whether you got a result or the default), it may be better to use the EAFP form rather than providing a default: catch the StopIteration it raises if no item is found, or just let it bubble up if the problem is the caller's input not meeting specs (so it's up to them to handle it). It looks even cleaner at the point of use that way:
first_occurence = next(x for x in range(1,10) if x > 5)
Alternatively, when None is a valid result, you can use an explicit sentinel object that's guaranteed unique like so:
sentinel = object()  # An anonymous object you construct can't possibly appear in the input
first_occurence = next((x for x in range(1,10) if x > 5), sentinel)
if first_occurence is not sentinel:  # Compare with `is` for performance and to avoid a broken __eq__ comparing equal to sentinel
    ...  # we found a real result
A common use case for one of these constructs is to replace a call to any when you not only need to know whether any item passed the test, but also which item (any can only return True or False, so it's unsuited to finding which item passed).
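A quick illustration of that difference:

nums = range(1, 10)
any(x > 5 for x in nums)                 # True -- but which item?
next((x for x in nums if x > 5), None)   # 6   -- the item itself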
We can wrap it up in a function to provide an even nicer interface:
_raise = object()

# can pass either an iterable or an iterator
def first(iterable, condition, *, default=_raise, exctype=None):
    """Get the first value from `iterable` which meets `condition`.

    Will consume elements from the iterable.

    default -> if no element meets the condition, return this instead.
    exctype -> if no element meets the condition and there is no default,
               raise this kind of exception rather than `StopIteration`.
               (It will be chained from the original `StopIteration`.)
    """
    try:
        # `iter` is idempotent; this makes sure we have an iterator
        return next(filter(condition, iter(iterable)))
    except StopIteration as e:
        if default is not _raise:
            return default
        if exctype:
            raise exctype() from e
        raise
Let's test it:
>>> first(range(10), lambda x: x > 5)
6
>>> first(range(10), lambda x: x > 11)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in first
StopIteration
>>> first(range(10), lambda x: x > 11, exctype=ValueError)
Traceback (most recent call last):
File "<stdin>", line 4, in first
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 9, in first
ValueError
>>> first(range(10), lambda x: x > 11, default=None)
>>>

Why do exhausted generators raise StopIteration more than once?

Why is it that when an exhausted generator is called several times, StopIteration is raised every time, rather than just on the first attempt? Aren't subsequent calls meaningless, and don't they indicate a likely bug in the caller's code?
def gen_func():
    yield 1
    yield 2

gen = gen_func()
next(gen)
next(gen)
next(gen)  # StopIteration as expected
next(gen)  # why StopIteration and not something to warn me that I'm doing something wrong?
This also leads to the following behavior when someone accidentally reuses an exhausted generator:
def do_work(gen):
    for x in gen:
        # do stuff with x
        pass
    # here I forgot that I already used up gen
    # so the loop does nothing without raising any exception or warning
    for x in gen:
        # do stuff with x
        pass

def gen_func():
    yield 1
    yield 2

gen = gen_func()
do_work(gen)
If second and later attempts to call an exhausted generator raised a different exception, it would be easier to catch this type of bug.
Perhaps there's an important use case for calling exhausted generators multiple times and getting StopIteration?
Perhaps there's an important use case for calling exhausted generators multiple times and getting StopIteration?
There is: specifically, when you want to perform multiple loops over the same iterator. Here's an example from the itertools docs that relies on this behavior:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
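grouper works because all n positions of zip_longest share the same iterator object: once that iterator is exhausted, zip_longest keeps calling next on it for the remaining positions of the final tuple and relies on getting StopIteration every time so it can substitute fillvalue. For example:

>>> list(grouper('ABCDEFG', 3, 'x'))
[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]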
It is a part of the iteration protocol:
Once an iterator’s __next__() method raises StopIteration, it must
continue to do so on subsequent calls. Implementations that do not
obey this property are deemed broken.
Source: https://docs.python.org/3/library/stdtypes.html#iterator-types
Here's an implementation of a wrapper that raises an error whenever StopIteration would be raised more than once; as already noted by VPfB, this implementation is considered broken:
#!/usr/bin/env python3.8
from typing import TypeVar, Iterator

"""
https://docs.python.org/3/library/stdtypes.html#iterator-types
This is considered broken by the iterator protocol, god knows why
"""

class IteratorExhaustedError(Exception):
    """Exception raised when exhausted iterators are ``next``ed"""

T = TypeVar("T")

class reuse_guard(Iterator[T]):
    """
    Wraps an iterator so that StopIteration is only raised once;
    after that, ``IteratorExhaustedError`` will be raised to detect
    misuses of fixed-size iterators.
    """

    def __init__(self, iterator: Iterator[T]):
        self._iterated: bool = False
        self._iterator = iterator

    def __next__(self) -> T:
        try:
            return next(self._iterator)
        except StopIteration as e:
            if self._iterated:
                raise IteratorExhaustedError(
                    "This iterator has already reached its end")
            self._iterated = True
            raise e

    def __iter__(self) -> Iterator[T]:
        return self
Example:
In [48]: iterator = reuse_guard(iter((1, 2, 3, 4)))
In [49]: list(iterator)
Out[49]: [1, 2, 3, 4]
In [50]: list(iterator)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-47-456650faec86> in __next__(self)
19 try:
---> 20 return next(self._iterator)
21 except StopIteration as e:
StopIteration:
During handling of the above exception, another exception occurred:
IteratorExhaustedError Traceback (most recent call last)
<ipython-input-50-5070d0fe4365> in <module>
----> 1 list(iterator)
<ipython-input-47-456650faec86> in __next__(self)
21 except StopIteration as e:
22 if self._iterated:
---> 23 raise IteratorExhaustedError(
24 "This iterator has already reached its end")
25 self._iterated = True
IteratorExhaustedError: This iterator has already reached its end
Edit:
After revisiting the documentation on the iterator protocol, it seems to me that the statement that iterators which do not continue to raise StopIteration should be considered broken is aimed more at iterators that resume yielding values instead of raising the exception; in that reading, the rule makes it clear that the iterator should not be used once it's been exhausted. This is merely my interpretation, though.
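To illustrate that interpretation, here is a hypothetical iterator (Resurrecting is a made-up name) that the protocol would deem broken: it raises StopIteration once and then starts producing values again, so a second loop silently picks up more items:

class Resurrecting:
    """Deemed broken: raises StopIteration once, then yields values again."""
    def __init__(self):
        self.n = 0
    def __iter__(self):
        return self
    def __next__(self):
        self.n += 1
        if self.n % 3 == 0:
            raise StopIteration  # looks exhausted here...
        return self.n            # ...but later calls produce values again

it = Resurrecting()
print(list(it))  # [1, 2]
print(list(it))  # [4, 5] -- the "exhausted" iterator yields again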

Python unexpected StopIteration

This is my code
class A:
    pass

def f():
    yield A()

def g():
    it = f()
    next(it).a = next(it, None)

g()
that produces the StopIteration error, caused by next(it).a = next(it, None). Why?
The documentation says that the next function does not raise StopIteration if a default value is provided, and I expected it to get me the first item from the generator (the A instance) and set its a attribute to None.
Because f only yields a single value, you can only call next on it once.
The right hand side of your expression (next(it, None)) is evaluated before the left hand side, and thus exhausts the generator.
The call to next(it) on the left-hand side will then raise StopIteration.
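A minimal fix, assuming the intent is "take the single item and set its a attribute to a second item or None", is to call next once and keep the result:

def g():
    it = f()
    first = next(it)          # consume the single yielded A() exactly once
    first.a = next(it, None)  # the generator is now exhausted, so a becomes None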
Your f() generator function yields just one value. After that it is exhausted and raises StopIteration.
>>> class A:
...     pass
...
>>> def f():
...     yield A()
...
>>> generator = f()
>>> generator
<generator object f at 0x10be771f8>
>>> next(generator)
<__main__.A object at 0x10be76f60>
>>> next(generator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
That's because there is no loop in f() to yield more than once; a generator function does not loop on its own just because you can use it in a loop.
Note that for an assignment, Python executes the right-hand-side expression first, before figuring out what to assign it to. So the next(it, None) is called first, the next(it).a for the assignment is called second.
The body of the f() function is executed just like any other Python function, with the addition of pausing. next() on the generator un-pauses the code, and the code then runs until the next yield statement is executed. That statement then pauses the generator again. If the function instead ends (returns), StopIteration is raised.
In your f() generator that means:
when you call f() a new generator object is created. The function body is paused.
you call next() on it the first time. The code starts running, creates an instance of A() and yields that instance. The function is paused again.
you call next() on it a second time. The code starts running, reaches the end of the function, and returns. StopIteration is raised.
If you add a loop to f(), or simply add a second yield line, your code works:
def f():
    yield A()
    yield A()
or
def f():
    while True:
        yield A()

Recursive python generators: why does the yield need to be iterated over?

A good exercise to test one's understanding of recursion is to write a function that generates all permutations of a string:
def get_perms(to_go, so_far=''):
    if not to_go:
        return [so_far]
    else:
        ret = []
        for i in range(len(to_go)):
            ret += get_perms(to_go[:i] + to_go[i+1:], so_far + to_go[i])
        return ret
This code is fine, but we can significantly improve the efficiency from a memory standpoint by using a generator:
def perms_generator(to_go, so_far=''):
    if not to_go:
        yield so_far
    else:
        for i in range(len(to_go)):
            for perm in perms_generator(to_go[:i] + to_go[i+1:], so_far + to_go[i]):
                yield perm
(Note: the inner for loop can also be replaced with yield from in Python 3.3+.)
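As a quick check of either version:

>>> list(perms_generator('abc'))
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']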
My question: why do we need to iterate over the results of each recursive call? I know that yield returns a generator, but from the statement yield so_far it would seem as though we're getting a string, and not something we would need to iterate over. Rather, it would seem as though we could replace
for perm in perms_generator(to_go[:i] + to_go[i+1:], so_far + to_go[i]):
    yield perm
with
yield perms_generator(to_go[:i] + to_go[i+1:], so_far + to_go[i])
Thank you. Please let me know if the title is unclear. I have a feeling the content of this question is related to this SO question.
Remember, any function using yield does not return those values to the caller. Instead, a generator object is returned and the code itself is paused until you iterate over the generator. Each time a yield is encountered, the code is paused again:
>>> def pausing_generator():
...     print 'Top of the generator'
...     yield 1
...     print 'In the middle'
...     yield 2
...     print 'At the end'
...
>>> gen = pausing_generator()
>>> gen
<generator object pausing_generator at 0x1081e0d70>
>>> next(gen)
Top of the generator
1
>>> next(gen)
In the middle
2
>>> next(gen)
At the end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Calling the pausing_generator() function returned a generator object. Only iterating (using the next() function here) runs the actual code in the function, pausing execution each time a yield is encountered.
Your perms_generator function returns such a generator object, and a recursive call likewise returns a generator object. You could yield the whole generator object, but then you are producing a generator that produces generators, and so on down to the innermost generator.
You can visualise this with print statements:
>>> def countdown(i):
...     if not i:
...         return
...     yield i
...     recursive_result = countdown(i - 1)
...     print i, recursive_result
...     for recursive_elem in recursive_result:
...         yield recursive_elem
...
>>> for i in countdown(5):
...     print i
...
5
5 <generator object countdown at 0x1081e0e10>
4
4 <generator object countdown at 0x1081e0e60>
3
3 <generator object countdown at 0x1081e0eb0>
2
2 <generator object countdown at 0x1081e0f00>
1
1 <generator object countdown at 0x1081e0f50>
Here, each recursive call returned a new generator object; if you want the elements produced by that generator, your only choice is to loop over it and hand the elements down, not the generator object itself.
In Python 3, you can use yield from to delegate to a nested generator, including a recursive call:
def perms_generator(to_go, so_far=''):
    if not to_go:
        yield so_far
    else:
        for i, elem in enumerate(to_go):
            yield from perms_generator(to_go[:i] + to_go[i+1:], so_far + elem)
When a yield from is encountered, iteration continues into the recursive call instead of yielding the whole generator object.
The difference between return and yield is that the former just returns a value. The latter means "wrap the value in a generator and then return the generator."
So in all cases, the function perms_generator() returns a generator.
The expression yield perms_generator(...) would again wrap the result of perms_generator() in a generator, giving you a generator of generators. The generator would then produce different kinds of things: sometimes plain values and sometimes nested generators. That would be very confusing for the consumer of your code.
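A small illustration of the difference (inner, outer_yields_generator and outer_yields_values are made-up names for this sketch):

def inner():
    yield 'value'

def outer_yields_generator():
    yield inner()        # yields a generator object, not 'value'

def outer_yields_values():
    yield from inner()   # delegates, so 'value' itself is yielded

print(next(outer_yields_generator()))  # <generator object inner at 0x...>
print(next(outer_yields_values()))     # value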

How to handle an empty (None) tuple returned from a Python function

I have a function that returns either a tuple or None. How is the caller supposed to handle that condition?
def nontest():
    return None

x,y = nontest()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable
EAFP:
try:
    x,y = nontest()
except TypeError:
    # do the None-thing here or pass
    pass
or without try-except:
res = nontest()
if res is None:
    ...
else:
    x, y = res
How about:
x,y = nontest() or (None,None)
If nontest returns a two-item tuple like it should, then x and y are assigned the items in the tuple. Otherwise, x and y are each assigned None. The downside is that you can't run special code if nontest comes back empty (the above answers can help you if that is your goal). The upside is that it is clean and easy to read/maintain.
If you can change the function itself, it's probably a better idea to make it raise a relevant exception instead of returning None to signal an error condition. The caller should then just try/except that.
If the None isn't signalling an error condition, you'll want to rethink your semantics altogether.
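A sketch of that approach (the exception type and message are illustrative choices, not a fixed convention):

def nontest():
    raise LookupError("no (x, y) pair available")  # instead of returning None

try:
    x, y = nontest()
except LookupError:
    # handle the error condition explicitly here
    pass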
