In Python, is there any difference between creating a generator object through a generator expression versus using the yield statement?
Using yield:
def Generator(x, y):
    for i in xrange(x):
        for j in xrange(y):
            yield (i, j)
Using generator expression:
def Generator(x, y):
    return ((i, j) for i in xrange(x) for j in xrange(y))
Both functions return generator objects, which produce tuples, e.g. (0,0), (0,1) etc.
Any advantages of one or the other? Thoughts?
There are only slight differences between the two. You can use the dis module to examine this sort of thing for yourself.
Edit: My first version decompiled the generator expression created at module-scope in the interactive prompt. That's slightly different from the OP's version with it used inside a function. I've modified this to match the actual case in the question.
As you can see below, the "yield" generator (first case) has three extra instructions in the setup, but from the first FOR_ITER onward they differ in only one respect: the "yield" approach uses LOAD_FAST in place of LOAD_DEREF inside the loop. LOAD_DEREF is "rather slower" than LOAD_FAST, so the "yield" version ends up slightly faster than the generator expression for large enough values of x (the outer loop), because the value of y is loaded slightly faster on each pass. For smaller values of x it would be slightly slower because of the extra overhead of the setup code.
It might also be worth pointing out that the generator expression would usually be used inline in the code, rather than wrapped in a function like that. That removes a bit of the setup overhead and keeps the generator expression slightly faster for smaller loop values, even if LOAD_FAST gives the "yield" version an advantage otherwise.
In neither case would the performance difference be enough to justify deciding between one or the other. Readability counts far more, so use whichever feels most readable for the situation at hand.
>>> def Generator(x, y):
...     for i in xrange(x):
...         for j in xrange(y):
...             yield (i, j)
...
>>> dis.dis(Generator)
2 0 SETUP_LOOP 54 (to 57)
3 LOAD_GLOBAL 0 (xrange)
6 LOAD_FAST 0 (x)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 40 (to 56)
16 STORE_FAST 2 (i)
3 19 SETUP_LOOP 31 (to 53)
22 LOAD_GLOBAL 0 (xrange)
25 LOAD_FAST 1 (y)
28 CALL_FUNCTION 1
31 GET_ITER
>> 32 FOR_ITER 17 (to 52)
35 STORE_FAST 3 (j)
4 38 LOAD_FAST 2 (i)
41 LOAD_FAST 3 (j)
44 BUILD_TUPLE 2
47 YIELD_VALUE
48 POP_TOP
49 JUMP_ABSOLUTE 32
>> 52 POP_BLOCK
>> 53 JUMP_ABSOLUTE 13
>> 56 POP_BLOCK
>> 57 LOAD_CONST 0 (None)
60 RETURN_VALUE
>>> def Generator_expr(x, y):
...     return ((i, j) for i in xrange(x) for j in xrange(y))
...
>>> dis.dis(Generator_expr.func_code.co_consts[1])
2 0 SETUP_LOOP 47 (to 50)
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 40 (to 49)
9 STORE_FAST 1 (i)
12 SETUP_LOOP 31 (to 46)
15 LOAD_GLOBAL 0 (xrange)
18 LOAD_DEREF 0 (y)
21 CALL_FUNCTION 1
24 GET_ITER
>> 25 FOR_ITER 17 (to 45)
28 STORE_FAST 2 (j)
31 LOAD_FAST 1 (i)
34 LOAD_FAST 2 (j)
37 BUILD_TUPLE 2
40 YIELD_VALUE
41 POP_TOP
42 JUMP_ABSOLUTE 25
>> 45 POP_BLOCK
>> 46 JUMP_ABSOLUTE 6
>> 49 POP_BLOCK
>> 50 LOAD_CONST 0 (None)
53 RETURN_VALUE
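If you want to check the timing claim empirically rather than by reading bytecode, a rough timeit comparison along these lines can be used. This is only a sketch: gen_yield and gen_expr are local stand-ins for the two variants above, and the absolute numbers will depend on your interpreter and on the loop sizes.
import timeit

def gen_yield(x, y):
    for i in range(x):          # xrange on Python 2
        for j in range(y):
            yield (i, j)

def gen_expr(x, y):
    return ((i, j) for i in range(x) for j in range(y))

# Exhaust each generator fully; the gap between the two is tiny either way.
print(timeit.timeit(lambda: list(gen_yield(100, 100)), number=1000))
print(timeit.timeit(lambda: list(gen_expr(100, 100)), number=1000))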
In this example, not really. But yield can be used for more complex constructs - for example it can accept values from the caller as well and modify the flow as a result. Read PEP 342 for more details (it's an interesting technique worth knowing).
Anyway, the best advice is use whatever is clearer for your needs.
P.S. Here's a simple coroutine example from Dave Beazley:
def grep(pattern):
    print "Looking for %s" % pattern
    while True:
        line = (yield)
        if pattern in line:
            print line,

# Example use
if __name__ == '__main__':
    g = grep("python")
    g.next()
    g.send("Yeah, but no, but yeah, but no")
    g.send("A series of tubes")
    g.send("python generators rock!")
There is no difference for the kind of simple loops that you can fit into a generator expression. However, yield can be used to create generators that do much more complex processing. Here is a simple example for generating the Fibonacci sequence:
>>> import itertools
>>> def fibgen():
...     a = b = 1
...     while True:
...         yield a
...         a, b = b, a + b
...
>>> list(itertools.takewhile((lambda x: x < 100), fibgen()))
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
In usage, note the distinction between a generator object and a generator function.
A generator object can be consumed only once, whereas a generator function can be reused: each call returns a fresh generator object.
Generator expressions are in practice usually used "raw", without wrapping them in a function, and they return a generator object.
E.g.:
def range_10_gen_func():
    x = 0
    while x < 10:
        yield x
        x = x + 1

print(list(range_10_gen_func()))
print(list(range_10_gen_func()))
print(list(range_10_gen_func()))
which outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Compare with a slightly different usage:
range_10_gen = range_10_gen_func()
print(list(range_10_gen))
print(list(range_10_gen))
print(list(range_10_gen))
which outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
[]
And compare with a generator expression:
range_10_gen_expr = (x for x in range(10))
print(list(range_10_gen_expr))
print(list(range_10_gen_expr))
print(list(range_10_gen_expr))
which also outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
[]
Using yield is nice if the expression is more complicated than just nested loops. Among other things, you can yield a special first or special last value. Consider:
def Generator(x):
    for i in xrange(x):
        yield i
    yield None
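A minimal usage sketch (hypothetical consumer code, not part of the original answer) showing how a caller might treat that trailing None as an end-of-stream marker:
# Generator as defined above (Python 2, since it uses xrange)
for value in Generator(3):
    if value is None:
        print("no more items")  # the special last value
    else:
        print(value)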
Yes, there is a difference.
For the generator expression (x for var in expr), iter(expr) is called when the expression is created.
When using def and yield to create a generator, as in:
def my_generator():
    for var in expr:
        yield x

g = my_generator()
iter(expr) is not yet called. It will be called only when iterating on g (and might not be called at all).
Taking this iterator as an example:
from __future__ import print_function

class CountDown(object):
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        print("ITER")
        return self

    def __next__(self):
        if self.n == 0:
            raise StopIteration()
        self.n -= 1
        return self.n

    next = __next__  # for Python 2
This code:
g1 = (i ** 2 for i in CountDown(3))  # immediately prints "ITER"
print("Go!")
for x in g1:
    print(x)
while this code:
def my_generator():
    for i in CountDown(3):
        yield i ** 2

g2 = my_generator()
print("Go!")
for x in g2:  # "ITER" is only printed here
    print(x)
Since most iterators do not do much work in __iter__, it is easy to miss this behavior. A real-world example is Django's QuerySet, which fetches data in __iter__: data = (f(x) for x in qs) might take a long time, while def g(): for x in qs: yield f(x) followed by data = g() returns immediately.
For more info and the formal definition refer to PEP 289 -- Generator Expressions.
When thinking about iterators, keep the itertools module in mind; it:
... standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an “iterator algebra” making it possible to construct specialized tools succinctly and efficiently in pure Python.
For performance, consider itertools.product(*iterables[, repeat])
Cartesian product of input iterables.
Equivalent to nested for-loops in a generator expression. For example, product(A, B) returns the same as ((x,y) for x in A for y in B).
>>> import itertools
>>> def gen(x, y):
...     return itertools.product(xrange(x), xrange(y))
...
>>> [t for t in gen(3, 2)]
[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
>>>
There is a difference that could be important in some contexts and that hasn't been pointed out yet: using yield prevents you from using return for anything other than implicitly raising StopIteration (and coroutine-related things).
This means this code is ill-formed (and feeding it to an interpreter will give you an AttributeError):
class Tea:
    """With a cloud of milk, please"""
    def __init__(self, temperature):
        self.temperature = temperature

def mary_poppins_purse(tea_time=False):
    """I would like to make one thing clear: I never explain anything."""
    if tea_time:
        return Tea(355)
    else:
        for item in ['lamp', 'mirror', 'coat rack', 'tape measure', 'ficus']:
            yield item

print(mary_poppins_purse(True).temperature)
On the other hand, this code works like a charm:
class Tea:
    """With a cloud of milk, please"""
    def __init__(self, temperature):
        self.temperature = temperature

def mary_poppins_purse(tea_time=False):
    """I would like to make one thing clear: I never explain anything."""
    if tea_time:
        return Tea(355)
    else:
        return (item for item in ['lamp', 'mirror', 'coat rack',
                                  'tape measure', 'ficus'])

print(mary_poppins_purse(True).temperature)
Related
I'm wondering how the Python interpreter actually handles lambda functions in memory.
If I have the following:
def squares():
    return [(lambda x: x**2)(num) for num in range(1000)]
Will this create 1000 instances of the lambda function in memory, or is Python smart enough to know that each of these 1000 lambdas is the same, and therefore store them as one function in memory?
Your code stores only the result of calling the lambda on each iteration, not the lambda itself, so the lambda objects never accumulate in memory. In terms of what ends up in the list, it is equivalent to:
def square(x):
    return x * x  # or: square = lambda x: x * x

[square(x) for x in range(1000)]
By contrast, the following creates and keeps a separate lambda function object for each iteration. See this dummy example:
[lambda x: x*x for _ in range(3)]
gives:
[<function <listcomp>.<lambda> at 0x294358950>,
<function <listcomp>.<lambda> at 0x294358c80>,
<function <listcomp>.<lambda> at 0x294358378>]
The memory addresses of the lambdas are all different, so a distinct object was created for each one.
TL;DR: The memory cost of the lambda objects in your example is the size of one lambda, but only while the squares() function is running, even if you hold a reference to its return value, because the returned list contains no lambda objects.
But even in cases where you do keep more than one function instance created from the same lambda expression (or def statement for that matter), they share the same code object, so the memory cost for each additional instance is less than the cost of the first instance.
In your example,
[(lambda x: x**2)(num) for num in range(1000)]
you're only storing the result of the lambda invocation in the list, not the lambda itself, so the lambda object's memory will be freed.
When exactly the lambda objects get garbage collected depends on your Python implementation. CPython should be able to do it immediately because the reference count drops to 0 each loop:
>>> class PrintsOnDel:
...     def __del__(self):
...         print('del')  # We can see when this gets collected.
...
>>> [[PrintsOnDel(), print(x)][-1] for x in [1, 2, 3]] # Freed each loop.
1
del
2
del
3
del
[None, None, None]
PyPy is another story.
>>>> from __future__ import print_function
>>>> class PrintsOnDel:
....     def __del__(self):
....         print('del')
....
>>>> [[PrintsOnDel(), print(x)][-1] for x in [1, 2, 3]]
1
2
3
[None, None, None]
>>>> import gc
>>>> gc.collect() # Not freed until the gc actually runs!
del
del
del
0
It will create 1000 different lambda instances over time, but they won't all be in memory at once (in CPython) and they'll all point to the same code object, so having multiple instances of a function is not as bad as it sounds:
>>> a, b = [lambda x: x**2 for x in [1, 2]]
>>> a is b # Different lambda objects...
False
>>> a.__code__ is b.__code__ # ...point to the same code object.
True
Disassembling the bytecode can help you to understand exactly what the interpreter is doing:
>>> from dis import dis
>>> dis("[(lambda x: x**2)(num) for num in range(1000)]")
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x000001D11D066870, file "<dis>", line 1>)
2 LOAD_CONST 1 ('<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_NAME 0 (range)
8 LOAD_CONST 2 (1000)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x000001D11D066870, file "<dis>", line 1>:
1 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (num)
8 LOAD_CONST 0 (<code object <lambda> at 0x000001D11D0667C0, file "<dis>", line 1>)
10 LOAD_CONST 1 ('<listcomp>.<lambda>')
12 MAKE_FUNCTION 0
14 LOAD_FAST 1 (num)
16 CALL_FUNCTION 1
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
Disassembly of <code object <lambda> at 0x000001D11D0667C0, file "<dis>", line 1>:
1 0 LOAD_FAST 0 (x)
2 LOAD_CONST 1 (2)
4 BINARY_POWER
6 RETURN_VALUE
Note the 12 MAKE_FUNCTION instruction each loop. It does make a new lambda instance every time. CPython's VM is a stack machine. Arguments get pushed to the stack by other instructions, and then consumed by later instructions that need them. Notice that above that MAKE_FUNCTION instruction is another one pushing an argument for it.
LOAD_CONST 0 (<code object <lambda>...
So it re-uses the code object.
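If the per-iteration MAKE_FUNCTION bothers you, one simple restructuring (just a sketch, not something the original code needs) is to bind the function once outside the comprehension, so that only the call happens per element:
square = lambda x: x ** 2  # function object created once; a def works equally well
results = [square(num) for num in range(1000)]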
The following behaviour seems rather counterintuitive to me (Python 3.4):
>>> [(yield i) for i in range(3)]
<generator object <listcomp> at 0x0245C148>
>>> list([(yield i) for i in range(3)])
[0, 1, 2]
>>> list((yield i) for i in range(3))
[0, None, 1, None, 2, None]
The intermediate values of the last line are actually not always None; they are whatever we send into the generator. It is equivalent (I guess) to the following generator:
def f():
    for i in range(3):
        yield (yield i)
It strikes me as funny that those three lines work at all. The Reference says that yield is only allowed in a function definition (though I may be reading it wrong and/or it may simply have been copied from the older version). The first two lines produce a SyntaxError in Python 2.7, but the third line doesn't.
Also, it seems odd that a list comprehension returns a generator and not a list, and that the generator expression converted to a list and the corresponding list comprehension contain different values.
Could someone provide more information?
Note: this was a bug in CPython's handling of yield in comprehensions and generator expressions, fixed in Python 3.8, with a deprecation warning in Python 3.7. See the Python bug report and the What's New entries for Python 3.7 and Python 3.8.
Generator expressions, and set and dict comprehensions are compiled to (generator) function objects. In Python 3, list comprehensions get the same treatment; they are all, in essence, a new nested scope.
You can see this if you try to disassemble a generator expression:
>>> dis.dis(compile("(i for i in range(3))", '', 'exec'))
1 0 LOAD_CONST 0 (<code object <genexpr> at 0x10f7530c0, file "", line 1>)
3 LOAD_CONST 1 ('<genexpr>')
6 MAKE_FUNCTION 0
9 LOAD_NAME 0 (range)
12 LOAD_CONST 2 (3)
15 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
18 GET_ITER
19 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
22 POP_TOP
23 LOAD_CONST 3 (None)
26 RETURN_VALUE
>>> dis.dis(compile("(i for i in range(3))", '', 'exec').co_consts[0])
1 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 11 (to 17)
6 STORE_FAST 1 (i)
9 LOAD_FAST 1 (i)
12 YIELD_VALUE
13 POP_TOP
14 JUMP_ABSOLUTE 3
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
The above shows that a generator expression is compiled to a code object, loaded as a function (MAKE_FUNCTION creates the function object from the code object). The .co_consts[0] reference lets us see the code object generated for the expression, and it uses YIELD_VALUE just like a generator function would.
As such, the yield expression works in that context, as the compiler sees these as functions-in-disguise.
This is a bug; yield has no place in these expressions. The Python grammar before Python 3.7 allows it (which is why the code is compilable), but the yield expression specification shows that using yield here should not actually work:
The yield expression is only used when defining a generator function and thus can only be used in the body of a function definition.
This has been confirmed to be a bug in issue 10544. The resolution of the bug is that using yield and yield from will raise a SyntaxError in Python 3.8; in Python 3.7 it raises a DeprecationWarning to ensure code stops using this construct. You'll see the same warning in Python 2.7.15 and up if you use the -3 command line switch enabling Python 3 compatibility warnings.
The 3.7.0b1 warning looks like this; turning warnings into errors gives you a SyntaxError exception, like you would in 3.8:
>>> [(yield i) for i in range(3)]
<stdin>:1: DeprecationWarning: 'yield' inside list comprehension
<generator object <listcomp> at 0x1092ec7c8>
>>> import warnings
>>> warnings.simplefilter('error')
>>> [(yield i) for i in range(3)]
File "<stdin>", line 1
SyntaxError: 'yield' inside list comprehension
The differences between how yield in a list comprehension and yield in a generator expression operate stem from the differences in how these two expressions are implemented. In Python 3 a list comprehension uses LIST_APPEND calls to add the top of the stack to the list being built, while a generator expression instead yields that value. Adding in (yield <expr>) just adds another YIELD_VALUE opcode to either:
>>> dis.dis(compile("[(yield i) for i in range(3)]", '', 'exec').co_consts[0])
1 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 13 (to 22)
9 STORE_FAST 1 (i)
12 LOAD_FAST 1 (i)
15 YIELD_VALUE
16 LIST_APPEND 2
19 JUMP_ABSOLUTE 6
>> 22 RETURN_VALUE
>>> dis.dis(compile("((yield i) for i in range(3))", '', 'exec').co_consts[0])
1 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 12 (to 18)
6 STORE_FAST 1 (i)
9 LOAD_FAST 1 (i)
12 YIELD_VALUE
13 YIELD_VALUE
14 POP_TOP
15 JUMP_ABSOLUTE 3
>> 18 LOAD_CONST 0 (None)
21 RETURN_VALUE
The YIELD_VALUE opcode at bytecode indexes 15 and 12 respectively is extra, a cuckoo in the nest. So for the list-comprehension-turned-generator you have 1 yield producing the top of the stack each time (replacing the top of the stack with the yield return value), and for the generator expression variant you yield the top of the stack (the integer) and then yield again, but now the stack contains the return value of the yield and you get None that second time.
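To see that the second yielded value really is whatever the caller sends rather than always None, you can drive the generator-expression variant by hand. This is only a sketch, and it runs only on interpreters that still accept yield inside a generator expression (up to Python 3.6, or 3.7 with a DeprecationWarning):
g = ((yield i) for i in range(3))
print(next(g))      # 0   -- the inner "yield i"
print(g.send('a'))  # 'a' -- the outer yield hands back the value we sent
print(next(g))      # 1
print(g.send('b'))  # 'b'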
For the list comprehension then, the intended list object output is still returned, but Python 3 sees this as a generator so the return value is instead attached to the StopIteration exception as the value attribute:
>>> from itertools import islice
>>> listgen = [(yield i) for i in range(3)]
>>> list(islice(listgen, 3)) # avoid exhausting the generator
[0, 1, 2]
>>> try:
...     next(listgen)
... except StopIteration as si:
...     print(si.value)
...
[None, None, None]
Those None objects are the return values from the yield expressions.
And to reiterate: the same issue applies to dictionary and set comprehensions in Python 2 and Python 3 as well; in Python 2 the yield return values are still added to the intended dictionary or set object, and the return value is 'yielded' last instead of attached to the StopIteration exception:
>>> list({(yield k): (yield v) for k, v in {'foo': 'bar', 'spam': 'eggs'}.items()})
['bar', 'foo', 'eggs', 'spam', {None: None}]
>>> list({(yield i) for i in range(3)})
[0, 1, 2, set([None])]
I have the following list:
x = [1, 2, 3]

if x:
    dosomething()

if len(x) > 0:
    dosomething()
In the above example, which if statement will run faster?
There is (in terms of the result) no difference if x is a list. But the first one is a bit faster:
%%timeit
if x:
    pass
10000000 loops, best of 3: 95 ns per loop
than the second one:
%%timeit
if len(x) > 0:
    pass
1000000 loops, best of 3: 276 ns per loop
You should use the first form, if x, in almost all cases. Only if you need to distinguish between None, False, and an empty list (or something similar) might you need something else.
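For instance, the two forms already diverge when x might not be a sized sequence at all. A quick illustration (hypothetical values, not from the question):
for x in (None, False, [], [0]):
    print('%r -> %r' % (x, bool(x)))  # this is the truth value "if x" uses
# len(None) raises TypeError, so "if len(x) > 0" only works when x is
# guaranteed to be something with a length.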
The first statement works faster because it doesn't need to call a function, whereas the second statement has to call len(x) before the comparison can run.
Internally,
if x:
gets the size of the list object and checks whether it is non-zero.
With
if len(x) > 0:
you are doing the same thing explicitly.
Moreover, PEP-0008 suggests the first form,
For sequences, (strings, lists, tuples), use the fact that empty sequences are false.
Yes: if not seq:
     if seq:
No:  if len(seq)
     if not len(seq)
Expanding on the answer by thefourtheye, here's a demo/proof that __len__ is called when you check the truth value of a list:
>>> class mylist(list):
...     def __len__(self):
...         print('__len__ called')
...         return super(mylist, self).__len__()
...
>>> a = mylist([1, 2, 3])
>>> if a:
...     print('doing something')
...
__len__ called
doing something
>>>
>>> if len(a) > 0:
...     print('doing something')
...
__len__ called
doing something
>>>
>>> bool(a)
__len__ called
True
And here's a quick timing:
In [3]: a = [1,2,3]
In [4]: timeit if a: pass
10000000 loops, best of 3: 28.2 ns per loop
In [5]: timeit if len(a) > 0: pass
10000000 loops, best of 3: 62.2 ns per loop
So the implicit check is slightly faster (probably because there's no overhead from looking up and calling the global len function) and, as already mentioned, it is the form suggested by PEP 8.
If you look at the dis.dis() output for each method, you'll see that the second has to perform almost twice as many steps as the first one.
In [1]: import dis
In [2]: def f(x):
   ....:     if x: pass
   ....:
In [3]: def g(x):
   ....:     if len(x) > 0: pass
   ....:
In [4]: dis.dis(f)
2 0 LOAD_FAST 0 (x)
3 POP_JUMP_IF_FALSE 9
6 JUMP_FORWARD 0 (to 9)
>> 9 LOAD_CONST 0 (None)
12 RETURN_VALUE
In [5]: dis.dis(g)
2 0 LOAD_GLOBAL 0 (len)
3 LOAD_FAST 0 (x)
6 CALL_FUNCTION 1
9 LOAD_CONST 1 (0)
12 COMPARE_OP 4 (>)
15 POP_JUMP_IF_FALSE 21
18 JUMP_FORWARD 0 (to 21)
>> 21 LOAD_CONST 0 (None)
24 RETURN_VALUE
Both of them need to do LOAD_FAST, POP_JUMP_IF_FALSE, JUMP_FORWARD, LOAD_CONST, and RETURN_VALUE. But the second method needs to additionally do
LOAD_GLOBAL, CALL_FUNCTION, LOAD_CONST, and COMPARE_OP. Therefore, the first method will be faster.
In reality, however, the difference in time between the two methods will be so minuscule that unless these if statements run millions of times in your code it will not noticeably impact the performance of your program. This sounds like an example of premature optimization to me.
Suppose I have the following; which is the better, faster, more Pythonic method, and why?
if x == 2 or x == 3 or x == 4:
    do following...
or:
if x in (2, 3, 4):
    do following...
In Python 3 (3.2 and up), you should use a set:
if x in {2, 3, 4}:
as set membership is a O(1) test, versus a worst-case performance of O(N) for testing with separate or equality tests or using membership in a tuple.
In Python 3, the set literal will be optimised to use a frozenset constant:
>>> import dis
>>> dis.dis(compile('x in {1, 2, 3}', '<file>', 'exec'))
1 0 LOAD_NAME 0 (x)
3 LOAD_CONST 4 (frozenset({1, 2, 3}))
6 COMPARE_OP 6 (in)
9 POP_TOP
10 LOAD_CONST 3 (None)
13 RETURN_VALUE
Note that this optimisation was added in Python 3.2; in Python 2, 3.0, or 3.1 you'd be better off using a tuple instead, since there the set literal is rebuilt on every execution and, for a small number of elements, that set-creation cost cancels out any advantage in lookup time.
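As a rough check of the membership-test claim (just a sketch; absolute numbers vary with the interpreter and with which value you test), you can time the three forms in the unfavourable case where x matches none of the alternatives:
import timeit

print(timeit.timeit('x in {2, 3, 4}', setup='x = 5', number=10**7))
print(timeit.timeit('x in (2, 3, 4)', setup='x = 5', number=10**7))
print(timeit.timeit('x == 2 or x == 3 or x == 4', setup='x = 5', number=10**7))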