I have a basic generator function that raises an exception if its parameters are not correct before doing any yield.
def my_generator(n):
    if not isinstance(n, int):
        raise TypeError("Expecting an integer")
    for i in range(1, 3):
        yield n
I wanted to cover my whole project with unit tests, so I implemented this test function:
import pytest
from my_package import my_generator
@pytest.mark.parametrize("n, expected_exception", [
    ("1", TypeError), (1.0, TypeError), ([1], TypeError)
])
def test_my_generator_with_bad_parameters(n, expected_exception):
    with pytest.raises(expected_exception):
        my_generator(n)
But when I run pytest, I get:
Failed: DID NOT RAISE
However, if I modify my test to iterate over the resulting generator, the test passes.
def test_my_generator_with_bad_parameters(n, expected_exception):
    res = my_generator(n)
    with pytest.raises(expected_exception):
        next(res)
How am I supposed to write this test? Is there a way to modify my_generator so that the first implementation of my unit test passes (assuming the function remains a generator)?
Normally it's fine to wait for the exception until your generator is actually used, since most of the time this happens in the same for statement or list() call.
If you really need the checks to run at the time your generator is created, you can wrap the generator in an inner function:
def my_generator(n):
    if not isinstance(n, int):
        raise TypeError("Expecting an integer")
    def generator():
        for i in range(1, 3):
            yield n
    return generator()
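With this wrapper the type check runs eagerly, so the first version of the test passes as written. A quick interactive check (assuming the wrapped definition above):
>>> my_generator("1")   # raises immediately, before any iteration
Traceback (most recent call last):
  ...
TypeError: Expecting an integer
>>> list(my_generator(2))   # valid input still behaves like a generator
[2, 2]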
Why does
yield [cand]
return
lead to different output/behavior than
return [[cand]]
Minimal viable example
uses recursion
the output of the version using yield [1]; return is different from the output of the version using return [[1]]
def foo(i):
    if i != 1:
        yield [1]
        return
    yield from foo(i-1)

def bar(i):
    if i != 1:
        return [[1]]
    yield from bar(i-1)

print(list(foo(1)))  # [[1]]
print(list(bar(1)))  # []
Min viable counter example
does not use recursion
the output of the version using yield [1]; return is the same as the output of the version using return [[1]]
def foo():
    yield [1]
    return

def foofoo():
    yield from foo()

def bar():
    return [[1]]

def barbar():
    yield from bar()

print(list(foofoo()))  # [[1]]
print(list(barbar()))  # [[1]]
Full context
I'm solving Leetcode #39: Combination Sum and was wondering why one solution works, but not the other:
Working solution
from functools import cache  # requires Python 3.9+
from typing import List

class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        @cache
        def helper(targ, i=0):
            if i == N or targ < (cand := candidates[i]):
                return
            if targ == cand:
                yield [cand]
                return
            for comb in helper(targ - cand, i):
                yield comb + [cand]
            yield from helper(targ, i+1)

        N = len(candidates)
        candidates.sort()
        yield from helper(target)
Non-working solution
from functools import cache  # requires Python 3.9+
from typing import List

class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        @cache
        def helper(targ, i=0):
            if i == N or targ < (cand := candidates[i]):
                return
            if targ == cand:
                return [[cand]]
            for comb in helper(targ - cand, i):
                yield comb + [cand]
            yield from helper(targ, i+1)

        N = len(candidates)
        candidates.sort()
        yield from helper(target)
Output
On the following input
candidates = [2,3,6,7]
target = 7
print(list(Solution().combinationSum(candidates, target)))  # list() materializes the generator
the working solution correctly prints
[[3,2,2],[7]]
while the non-working solution prints
[]
I'm wondering why yield [cand]; return works, but return [[cand]] doesn't.
In a generator function, return just defines the value associated with the StopIteration exception implicitly raised to indicate an iterator is exhausted. It's not produced during iteration, and most iterating constructs (e.g. for loops) intentionally ignore the StopIteration exception (it means the loop is over, you don't care if someone attached random garbage to a message that just means "we're done").
For example, try:
>>> def foo():
... yield 'onlyvalue' # Existence of yield keyword makes this a generator
... return 'returnvalue'
...
>>> f = foo() # Makes a generator object, stores it in f
>>> next(f) # Pull one value from generator
'onlyvalue'
>>> next(f) # There is no other yielded value, so this hits the return; iteration over
--------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
...
StopIteration: 'returnvalue'
As you can see, your return value does get "returned" in a sense (it's not completely discarded), but it's never seen by anything iterating normally, so it's largely useless. Outside of rare cases involving using generators as coroutines (where you're using .send() and .throw() on instances of the generator and manually advancing it with next(genobj)), the return value of a generator won't be seen.
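If you do want to get at that value without manually catching StopIteration, the usual tool is yield from, whose value as an expression is the sub-generator's return value. A minimal sketch, reusing foo() from above:
>>> def wrapper():
...     result = yield from foo()  # result receives 'returnvalue'
...     yield 'captured: ' + result
...
>>> list(wrapper())
['onlyvalue', 'captured: returnvalue']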
In short, you have to pick one:
Use yield anywhere in a function, and it's a generator (whether or not the code path of a particular call ever reaches a yield), and return just ends generation (while maybe stashing some data in the StopIteration exception). No matter what you do, calling the generator function "returns" a new generator object (which you can loop over until exhausted); it can never return a raw value computed inside the generator function (which doesn't even begin running until you iterate it at least once).
Don't use yield, and return works as expected (because it's not a generator function).
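As a side-by-side sketch of the two options (hypothetical names):
def squares_gen(n):
    # Option 1: a generator; calling it returns a generator object,
    # and a bare return could only end generation early.
    for i in range(n):
        yield i * i

def squares_list(n):
    # Option 2: a plain function; return hands back the computed value.
    return [i * i for i in range(n)]

print(list(squares_gen(3)))  # [0, 1, 4]
print(squares_list(3))       # [0, 1, 4]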
As an example to explain what happens to the return value in normal looping constructs, this is what for x in gen(): effectively expands to a C optimized version of:
__unnamed_iterator = iter(gen())
while True:
    try:
        x = next(__unnamed_iterator)
    except StopIteration:  # StopIteration caught here without inspecting it
        break  # Loop ends; the exception is cleaned up (even from sys.exc_info()) to avoid reference cycles
    # body of loop goes here
# Outside of loop, there is no StopIteration object left
As you can see, the expanded form of the for loop has to look for a StopIteration to know the loop is over, but it doesn't use it. For anything that's not a generator, the StopIteration never has any associated value anyway, and the for loop has no way to report one even if it did (it has to end the loop when it's told iteration is over, and the arguments to StopIteration are explicitly not part of the values iterated).
Anything else that consumes the generator (e.g. calling list on it) does roughly the same thing as the for loop, ignoring the StopIteration in the same way; nothing except code that specifically expects generators (as opposed to more generalized iterables and iterators) will ever bother to inspect the StopIteration object.
(At the C layer, there are optimizations such that most iterators don't even produce StopIteration objects; they return NULL and leave the set exception empty, which everything using the iterator protocol knows is equivalent to returning NULL and setting a StopIteration object. So for anything but a generator, there often isn't even an exception to inspect.)
According to PEP 8 we should be consistent in our function declarations and ensure that they all have the same return-pattern, i.e. all should return an expression or all should not. However, I am not sure how to apply this to generators.
A generator will yield values as long as the code reaches them, unless a return statement is encountered, in which case it stops the iteration. However, I don't see any use case for returning a value from a generator function. In that spirit, I don't see why it is useful - from a PEP 8 perspective - to end such a function with an explicit return None. In other words, why ought we to spell out a return statement for generators if the return expression is only reached when the yield'ing is over?
Example: in the following code, I don't see how hello() can be used to assign 100 to a variable (thus using the return statement). So why does PEP 8 expect us to write a return statement (be it 100 or None)?
def hello():
    for i in range(5):
        yield i
    return 100

h = [x for x in hello()]
g = hello()

print(h)
# [0, 1, 2, 3, 4]
print(g)
# <generator object hello at 0x7fd2f285a7d8>
# can we ever get 100?
You have misread PEP8. PEP8 states:
Be consistent in return statements. Either all return statements in a function should return an expression, or none of them should.
(bold emphasis mine)
You should be consistent with how you use return within a single function, not across your whole project.
Use return, it's the only return statement in the function.
However, I don't see any use-case in which returning a value from a generator function can happen.
The return value of a generator is attached to the StopIteration exception raised:
>>> def gen():
... if False: yield
... return 'Return value'
...
>>> try:
... next(gen())
... except StopIteration as ex:
... print(ex.value)
...
Return value
And this is also the mechanism by which yield from produces a value; the return value of yield from is the value attribute on the StopIteration exception. A generator can thus return a result to code using result = yield from generator by using return result:
>>> def bar():
... result = yield from gen()
... print('gen() returned', result)
...
>>> next(bar(), None)
gen() returned Return value
This feature is used in the Python standard library; e.g. in the asyncio library the value of StopIteration is used to pass along Task results, and the @coroutine decorator uses res = yield from ... to run a wrapped generator or awaitable and pass through the return value.
So, from a PEP 8 point of view, for generators there are two possibilities:
You are using return to exit the generator early, say in a loop with if. Use return, no need to add None:
def foo():
    while bar:
        yield ham
        if spam:
            return
You are using return <something> to exit and set StopIteration.value. Use return <something> consistently throughout your generator, even when returning None:
def foo():
    for bar in baz:
        yield bar
        if spam:
            return 'The bar bazzed the spam'
    return None
For a long time I thought you couldn't put return in front of a yield statement. But actually you can:
def gen():
    return (yield 42)

which is similar to

def gen():
    yield 42
    return
And the only usage I can think of is to attach the sent value to StopIteration: PEP 380
return expr in a generator causes StopIteration(expr) to be raised
upon exit from the generator.
def gen():
    return (yield 42)

g = gen()
print(next(g))  # 42
try:
    g.send('AAAA')
except StopIteration as e:
    print(e.value)  # 'AAAA'
But this can be done using an extra variable too, which is more explicit:
def gen():
    a = yield 42
    return a

g = gen()
print(next(g))
try:
    g.send('AAAA')
except StopIteration as e:
    print(e.value)  # 'AAAA'
So it seems return (yield xxx) is merely a syntactic sugar. Am I missing something?
Inside a generator the expression (yield 42) will yield the value 42, but it also evaluates to a value: either None, if you use next(generator), or the given value if you use generator.send(value).
So, as you say, you could use an intermediate variable to get the same behavior, not because this is syntactic sugar, but because the yield expression literally evaluates to the value you send it.
You could equally do something like
def my_generator():
    return (yield (yield 42) + 10)
If we call this, using the sequence of calls:
g = my_generator()
print(next(g))
try:
    print('first response:', g.send(1))
    print('second response:', g.send(22))
    print('third response:', g.send(3))
except StopIteration as e:
    print('stopped at', e.value)
First we get the output 42, and the generator is essentially paused in a state you could describe as return (yield <input will go here> + 10).
If we then call g.send(1), we get the output 11, and the generator is now in the state return <input will go here>. Sending g.send(22) then raises StopIteration(22), because of the way return is handled in generators.
So you never get to the third send, because of the exception.
I hope this example makes it a bit more apparent how yield works in generators and why the syntax return (yield something) is nothing special or exotic and works exactly how you'd expect.
As for the literal question, when would you do this? Whenever you want to yield something and then later raise a StopIteration echoing the value the user sent to the generator, because that is literally what the code states. I expect that such behavior is very rarely wanted.
I wrote this simple piece of code:
def mymap(func, *seq):
    return (func(*args) for args in zip(*seq))
Should I use the 'return' statement as above to return a generator, or use a 'yield from' instruction like this:
def mymap(func, *seq):
    yield from (func(*args) for args in zip(*seq))
and, beyond the technical difference between 'return' and 'yield from', which is the better approach in the general case?
The difference is that your first mymap is just a usual function,
in this case a factory which returns a generator. Everything
inside the body gets executed as soon as you call the function.
def gen_factory(func, seq):
    """Generator factory returning a generator."""
    # do stuff ... immediately when factory gets called
    print("build generator & return")
    return (func(*args) for args in seq)
The second mymap is also a factory, but it's also a generator
itself, yielding from a self-built sub-generator inside.
Because it is a generator itself, execution of the body does
not start until the first invocation of next(generator).
def gen_generator(func, seq):
    """Generator yielding from sub-generator inside."""
    # do stuff ... first time when 'next' gets called
    print("build generator & yield")
    yield from (func(*args) for args in seq)
I think the following example will make it clearer.
We define data packages which shall be processed with functions,
bundled up in jobs we pass to the generators.
def add(a, b):
    return a + b

def sqrt(a):
    return a ** 0.5

data1 = [*zip(range(1, 5))]  # [(1,), (2,), (3,), (4,)]
data2 = [(2, 1), (3, 1), (4, 1), (5, 1)]

job1 = (sqrt, data1)
job2 = (add, data2)
Now we run the following code inside an interactive shell like IPython to
see the different behavior. gen_factory immediately prints
out, while gen_generator only does so after next() being called.
gen_fac = gen_factory(*job1)
# build generator & return <-- printed immediately
next(gen_fac) # start
# Out: 1.0
[*gen_fac] # deplete rest of generator
# Out: [1.4142135623730951, 1.7320508075688772, 2.0]
gen_gen = gen_generator(*job1)
next(gen_gen) # start
# build generator & yield <-- printed with first next()
# Out: 1.0
[*gen_gen] # deplete rest of generator
# Out: [1.4142135623730951, 1.7320508075688772, 2.0]
To give you a more reasonable use case example for a construct
like gen_generator we'll extend it a little and make a coroutine
out of it by assigning yield to variables, so we can inject jobs
into the running generator with send().
Additionally we create a helper function which will run all tasks
inside a job and ask us for a new one upon completion.
def gen_coroutine():
    """Generator coroutine yielding from sub-generator inside."""
    # do stuff ... first time when 'next' gets called
    print("receive job, build generator & yield, loop")
    while True:
        try:
            func, seq = yield "send me work ... or I quit with next next()"
        except TypeError:
            return "no job left"
        else:
            yield from (func(*args) for args in seq)
def do_job(gen, job):
    """Run all tasks in job."""
    print(gen.send(job))
    while True:
        result = next(gen)
        print(result)
        if result == "send me work ... or I quit with next next()":
            break
Now we run gen_coroutine with our helper function do_job and two jobs.
gen_co = gen_coroutine()
next(gen_co) # start
# receive job, build generator & yield, loop <-- printed with first next()
# Out:'send me work ... or I quit with next next()'
do_job(gen_co, job1) # prints out all results from job
# 1.0
# 1.4142135623730951
# 1.7320508075688772
# 2.0
# send me work ... or I quit with next next()
do_job(gen_co, job2) # send another job into generator
# 3
# 4
# 5
# 6
# send me work ... or I quit with next next()
next(gen_co)
# Traceback ...
# StopIteration: no job left
To come back to your question which version is the better approach in general.
IMO something like gen_factory only makes sense if you need the same thing done for multiple generators you are going to create, or in cases where your construction process for generators is complicated enough to justify the use of a factory instead of building individual generators in place with a generator comprehension.
Note:
The description above for the gen_generator function (second mymap) states
"it is a generator itself". That is a bit vague and technically not
really correct, but facilitates reasoning about the differences of the functions
in this tricky setup where gen_factory also returns a generator, namely
the one built by the generator comprehension inside.
In fact any function (not only those from this question with generator comprehensions inside!) with a yield inside, upon invocation, just
returns a generator object which gets constructed out of the function body.
type(gen_coroutine) # function
gen_co = gen_coroutine(); type(gen_co) # generator
So the whole action we observed above for gen_generator and gen_coroutine takes place within these generator objects, which functions with yield inside have spit out beforehand.
The answer is: return a generator. It's faster:
marco@buzz:~$ python3.9 -m pyperf timeit --rigorous --affinity 3 --value 6 --loops=4096 -s '
a = range(1000)

def f1():
    for x in a:
        yield x

def f2():
    return f1()
' 'tuple(f2())'
........................................
Mean +- std dev: 72.8 us +- 5.8 us
marco@buzz:~$ python3.9 -m pyperf timeit --rigorous --affinity 3 --value 6 --loops=4096 -s '
a = range(1000)

def f1():
    for x in a:
        yield x

def f2():
    yield from f1()
' 'tuple(f2())'
........................................
WARNING: the benchmark result may be unstable
* the standard deviation (12.6 us) is 10% of the mean (121 us)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3.9 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 121 us +- 13 us
If you read PEP 380, the main reason for the introduction of yield from is to make it possible to reuse part of the code of one generator in another generator, without having to duplicate code or change the API:
The rationale behind most of the semantics presented above stems from
the desire to be able to refactor generator code. It should be
possible to take a section of code containing one or more yield
expressions, move it into a separate function (using the usual
techniques to deal with references to variables in the surrounding
scope, etc.), and call the new function using a yield from expression.
Source
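As a sketch of that refactoring idea (hypothetical names): a run of yield expressions is moved into a helper generator and reused via yield from, without changing what callers see:
def header():
    # extracted section: these yields used to live inline in report()
    yield "BEGIN"
    yield "--"

def report(lines):
    yield from header()  # reuse the extracted yields instead of duplicating them
    for line in lines:
        yield line

print(list(report(["a", "b"])))  # ['BEGIN', '--', 'a', 'b']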
The most important difference (I don't know if yield from generator is optimized) is that the context is different for return and yield from: with yield from, the delegating generator appears in the traceback when an exception is raised.
[ins] In [1]: def generator():
...: yield 1
...: raise Exception
...:
[ins] In [2]: def use_generator():
...: return generator()
...:
[ins] In [3]: def yield_generator():
...: yield from generator()
...:
[ins] In [4]: g = use_generator()
[ins] In [5]: next(g); next(g)
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-5-3d9500a8db9f> in <module>
----> 1 next(g); next(g)
<ipython-input-1-b4cc4538f589> in generator()
1 def generator():
2 yield 1
----> 3 raise Exception
4
Exception:
[ins] In [6]: g = yield_generator()
[ins] In [7]: next(g); next(g)
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-7-3d9500a8db9f> in <module>
----> 1 next(g); next(g)
<ipython-input-3-3ab40ecc32f5> in yield_generator()
1 def yield_generator():
----> 2 yield from generator()
3
<ipython-input-1-b4cc4538f589> in generator()
1 def generator():
2 yield 1
----> 3 raise Exception
4
Exception:
I prefer the version with yield from because it makes it easier to handle exceptions and context managers.
Take the example of a generator expression for the lines of a file:
def with_return(some_file):
    with open(some_file, 'rt') as f:
        return (line.strip() for line in f)

for line in with_return('/tmp/some_file.txt'):
    print(line)
The return version raises ValueError: I/O operation on closed file, since the file is no longer open once the with block has been exited by the return statement.
On the other hand, the yield from version works as expected:
def with_yield_from(some_file):
    with open(some_file, 'rt') as f:
        yield from (line.strip() for line in f)

for line in with_yield_from('/tmp/some_file.txt'):
    print(line)
Generators use yield; functions use return.
Generators are generally used in for loops to iterate repeatedly over the values they provide, but they may also be used in other contexts, e.g. passed to the list() function to create a list of their values.
Functions are called to provide a return value: one value per call.
Really it depends on the situation. yield is mainly suited to cases where you just want to iterate over the produced values and then manipulate them. return is mainly suited for when you want to store all of the values your function has generated in memory rather than just iterate over them once. Note that you can iterate over a generator only once; there are some algorithms for which this is definitely not suited.
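A minimal sketch of that one-pass limitation (hypothetical names):
def gen():
    yield 1
    yield 2

g = gen()
print(sum(g))  # 3 -- this consumes the generator
print(sum(g))  # 0 -- already exhausted

lst = [1, 2]   # a returned list, by contrast, can be traversed repeatedly
print(sum(lst))  # 3
print(sum(lst))  # 3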
I would like to have a function that can, optionally, return or yield the result.
Here is an example.
def f(option=True):
    ...
    for ...:
        if option:
            yield result
        else:
            results.append(result)
    if not option:
        return results
Of course, this doesn't work; I have tried it with Python 3, and I always get a generator no matter what value I set option to.
As far as I have understood, Python checks the body of the function, and if a yield is present, then the result will be a generator.
Is there any way to get around this and make a function that can return or yield at will?
You can't. Any use of yield makes the function a generator.
You could wrap your function with one that uses list() to store all values the generator produces in a list object and returns that:
def f_wrapper(option=True):
    gen = f()
    if option:
        return gen       # return the generator unchanged
    return list(gen)     # return all values of the generator as a list
However, generally speaking, this is bad design. Don't have your functions alter behaviour like this; stick to one return type (a generator or an object) and don't have it switch between the two.
Consider splitting this into two functions instead:
def f():
    yield result

def f_as_list():
    return list(f())

and use f() if you need the generator, or f_as_list() if you want a list instead.
Since list() (and next(), to access just one value of a generator) are built-in functions, you rarely need to use a wrapper. Just call those functions directly:
# access elements one by one
gen = f()
one_value = next(gen)
# convert the generator to a list
all_values = list(f())
What about this?
def make_f_or_generator(option):
    def f():
        return "I am a function."
    def g():
        yield "I am a generator."
    if option:
        return f
    else:
        return g
This gives you at least the choice to create a function or a generator.
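Usage might look like this (continuing the sketch above):
f_or_g = make_f_or_generator(True)
print(f_or_g())           # I am a function.

f_or_g = make_f_or_generator(False)
print(list(f_or_g()))     # ['I am a generator.']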
Class-based approach
class FunctionAndGenerator:
    def __init__(self):
        self.counter = 0

    def __iter__(self):
        return self

    # You need a variable to indicate whether __next__ should return the string
    # or raise StopIteration. Raising StopIteration stops the loop from iterating
    # further, so __next__ has to raise it at some point.
    def __next__(self):
        self.counter += 1
        if self.counter > 1:
            raise StopIteration
        return f"I'm a generator and I've generated {self.counter} times"

    def __call__(self):
        return "I'm a function"

x = FunctionAndGenerator()
print(x())
for i in x:
    print(i)
I'm a function
I'm a generator and I've generated 1 times