What are these extra symbols in a comprehension's symtable? - python

I'm using symtable to get the symbol tables of a piece of code. Curiously, when using a comprehension (listcomp, setcomp, etc.), there are some extra symbols I didn't define.
Reproduction (using CPython 3.6):
import symtable
root = symtable.symtable('[x for x in y]', '?', 'exec')
# Print symtable of the listcomp
print(root.get_children()[0].get_symbols())
Output:
[<symbol '.0'>, <symbol '_[1]'>, <symbol 'x'>]
Symbol x is expected. But what are .0 and _[1]?
Note that with any other non-comprehension construct I'm getting exactly the identifiers I used in the code. E.g., lambda x: y only results in the symbols [<symbol 'x'>, <symbol 'y'>].
Also, the docs say that symtable.Symbol is...
An entry in a SymbolTable corresponding to an identifier in the source.
...although these identifiers evidently don't appear in the source.

The two names are used to implement list comprehensions as a separate scope, and they have the following meaning:
.0 is an implicit argument, used for the iterable (sourced from y in your case).
_[1] is a temporary name in the symbol table, used for the target list. This list eventually ends up on the stack.*
A list comprehension (as well as dict and set comprehension and generator expressions) is executed in a new scope. To achieve this, Python effectively creates a new anonymous function.
Because it is a function, really, you need to pass in the iterable you are looping over as an argument. This is what .0 is for, it is the first implicit argument (so at index 0). The symbol table you produced explicitly lists .0 as an argument:
>>> root = symtable.symtable('[x for x in y]', '?', 'exec')
>>> type(root.get_children()[0])
<class 'symtable.Function'>
>>> root.get_children()[0].get_parameters()
('.0',)
The first child of your table is a function with one argument named .0.
A list comprehension also needs to build the output list, and that list could be seen as a local too. This is the _[1] temporary variable. It never actually becomes a named local variable in the code object that is produced; this temporary variable is kept on the stack instead.
You can see the code object produced when you use compile():
>>> code_object = compile('[x for x in y]', '?', 'exec')
>>> code_object
<code object <module> at 0x11a4f3ed0, file "?", line 1>
>>> code_object.co_consts[0]
<code object <listcomp> at 0x11a4ea8a0, file "?", line 1>
So there is an outer code object, and in the constants, is another, nested code object. That latter one is the actual code object for the loop. It uses .0 and x as local variables. It also takes 1 argument; the names for arguments are the first co_argcount values in the co_varnames tuple:
>>> code_object.co_consts[0].co_varnames
('.0', 'x')
>>> code_object.co_consts[0].co_argcount
1
So .0 is the argument name here.
The _[1] temporary variable is handled on the stack, see the disassembly:
>>> import dis
>>> dis.dis(code_object.co_consts[0])
1 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (x)
8 LOAD_FAST 1 (x)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE
Here we see .0 referenced again. _[1] is the BUILD_LIST opcode pushing a list object onto the stack, then .0 is put on the stack for the FOR_ITER opcode to iterate over (the opcode removes the iterable from .0 from the stack again).
Each iteration result is pushed onto the stack by FOR_ITER, popped again and stored in x with STORE_FAST, then loaded onto the stack again with LOAD_FAST. Finally LIST_APPEND takes the top element from the stack, and adds it to the list referenced by the next element on the stack, so to _[1].
JUMP_ABSOLUTE then brings us back to the top of the loop, where we continue to iterate until the iterable is done. Finally, RETURN_VALUE returns the top of the stack, again _[1], to the caller.
The outer code object does the work of loading the nested code object and calling it as a function:
>>> dis.dis(code_object)
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x11a4ea8a0, file "?", line 1>)
2 LOAD_CONST 1 ('<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_NAME 0 (y)
8 GET_ITER
10 CALL_FUNCTION 1
12 POP_TOP
14 LOAD_CONST 2 (None)
16 RETURN_VALUE
So this makes a function object, with the function named <listcomp> (helpful for tracebacks), loads y, produces an iterator for it (the moral equivalent of iter(y), and calls the function with that iterator as the argument.
If you wanted to translate that to Psuedo-code, it would look like:
def <listcomp>(.0):
_[1] = []
for x in .0:
_[1].append(x)
return _[1]
<listcomp>(iter(y))
The _[1] temporary variable is of course not needed for generator expressions:
>>> symtable.symtable('(x for x in y)', '?', 'exec').get_children()[0].get_symbols()
[<symbol '.0'>, <symbol 'x'>]
Instead of appending to a list, the generator expression function object yields the values:
>>> dis.dis(compile('(x for x in y)', '?', 'exec').co_consts[0])
1 0 LOAD_FAST 0 (.0)
>> 2 FOR_ITER 10 (to 14)
4 STORE_FAST 1 (x)
6 LOAD_FAST 1 (x)
8 YIELD_VALUE
10 POP_TOP
12 JUMP_ABSOLUTE 2
>> 14 LOAD_CONST 0 (None)
16 RETURN_VALUE
Together with the outer bytecode, the generator expression is equivalent to:
def <genexpr>(.0):
for x in .0:
yield x
<genexpr>(iter(y))
* The temporary variable is actually no longer needed; they were used in the initial implementation of comprehensions, but this commit from April 2007 moved the compiler to just using the stack, and this has been the norm for all of the 3.x releases, as well as Python 2.7. It still is easier to think of the generated name as a reference to the stack. Because the variable is no longer needed, I filed issue 32836 to have it removed, and Python 3.8 and onwards will no longer include it in the symbol table.
In Python 2.6, you can still see the actual temporary name in the disassembly:
>>> import dis
>>> dis.dis(compile('[x for x in y]', '?', 'exec'))
1 0 BUILD_LIST 0
3 DUP_TOP
4 STORE_NAME 0 (_[1])
7 LOAD_NAME 1 (y)
10 GET_ITER
>> 11 FOR_ITER 13 (to 27)
14 STORE_NAME 2 (x)
17 LOAD_NAME 0 (_[1])
20 LOAD_NAME 2 (x)
23 LIST_APPEND
24 JUMP_ABSOLUTE 11
>> 27 DELETE_NAME 0 (_[1])
30 POP_TOP
31 LOAD_CONST 0 (None)
34 RETURN_VALUE
Note how the name actually has to be deleted again!

So, the way list-comprehensions are implemented is actually by creating a code-object, it is sort of like creating a one-time use anonymous function, for scoping purposes:
>>> import dis
>>> def f(y): [x for x in y]
...
>>> dis.dis(f)
1 0 LOAD_CONST 1 (<code object <listcomp> at 0x101df9db0, file "<stdin>", line 1>)
3 LOAD_CONST 2 ('f.<locals>.<listcomp>')
6 MAKE_FUNCTION 0
9 LOAD_FAST 0 (y)
12 GET_ITER
13 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
16 POP_TOP
17 LOAD_CONST 0 (None)
20 RETURN_VALUE
>>>
Inspecting the code-object, I can find the .0 symbol:
>>> dis.dis(f.__code__.co_consts[1])
1 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
Note, the LOAD_FAST in the list-comp code-object seems to be loading the unnamed argument, which would correspond to the GET_ITER

Related

Is it possible to call a function from within a list comprehension without the overhead of calling the function?

In this trivial example, I want to factor out the i < 5 condition of a list comprehension into it's own function. I also want to eat my cake and have it too, and avoid the overhead of the CALL_FUNCTION bytecode/creating a new frame in the python virtual machine.
Is there any way to factor out the conditions inside of a list comprehension into a new function but somehow get a disassembled result that avoids the large overhead of CALL_FUNCTION?
import dis
import sys
import timeit
def my_filter(n):
return n < 5
def a():
# list comprehension with function call
return [i for i in range(10) if my_filter(i)]
def b():
# list comprehension without function call
return [i for i in range(10) if i < 5]
assert a() == b()
>>> sys.version_info[:]
(3, 6, 5, 'final', 0)
>>> timeit.timeit(a)
1.2616060493517098
>>> timeit.timeit(b)
0.685117881097812
>>> dis.dis(a)
3 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000020F4890B660, file "<stdin>", line 3>)
# ...
>>> dis.dis(b)
3 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000020F48A42270, file "<stdin>", line 3>)
# ...
# list comprehension with function call
# big overhead with that CALL_FUNCTION at address 12
>>> dis.dis(a.__code__.co_consts[1])
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_GLOBAL 0 (my_filter)
10 LOAD_FAST 1 (i)
12 CALL_FUNCTION 1
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
# list comprehension without function call
>>> dis.dis(b.__code__.co_consts[1])
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
I'm willing to take a hacky solution that I would never use in production, like somehow replacing the bytecode at run time.
In other words, is it possible to replace a's addresses 8, 10, and 12 with b's 8, 10, and 12 at runtime?
Consolidating all of the excellent answers in the comments into one.
As georg says, this sounds like you are looking for a way to inline a function or an expression, and there is no such thing in CPython attempts have been made: https://bugs.python.org/issue10399
Therefore, along the lines of "metaprogramming", you can build the lambda's inline and eval:
from typing import Callable
import dis
def b():
# list comprehension without function call
return [i for i in range(10) if i < 5]
def gen_list_comprehension(expr: str) -> Callable:
return eval(f"lambda: [i for i in range(10) if {expr}]")
a = gen_list_comprehension("i < 5")
dis.dis(a.__code__.co_consts[1])
print("=" * 10)
dis.dis(b.__code__.co_consts[1])
which when run under 3.7.6 gives:
6 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
==========
1 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
From a security standpoint "eval" is dangerous, athough here it is less so because what you can do inside a lambda. And what can be done in an IfExp expression is even more limited, but still dangerous like call a function that does evil things.
However, if you want the same effect that is more secure, instead of working with strings you can modify AST's. I find that a lot more cumbersome though.
A hybrid approach would be the call ast.parse() and check the result. For example:
import ast
def is_cond_str(s: str) -> bool:
try:
mod_ast = ast.parse(s)
expr_ast = isinstance(mod_ast.body[0])
if not isinstance(expr_ast, ast.Expr):
return False
compare_ast = expr_ast.value
if not isinstance(compare_ast, ast.Compare):
return False
return True
except:
return False
This is a little more secure, but there still may be evil functions in the condition so you could keep going. Again, I find this a little tedious.
Coming from the other direction of starting off with bytecode, there is my cross-version assembler; see https://pypi.org/project/xasm/

Understanding python tuple to list performance optimization

I have a tuple, say, atup = (1,3,4,5,6,6,7,78,8) and produced dynamically by list of tuples when iterated (generator yield). Each tuple needs to get converted to list so each elements of tuple can be transformed further and used in a method. While doing this, I was surprised to learn that just doing list(atup) is much faster than using list comprehension like this [i for i in atup]. Here is what I did:
Performance Test 1:
timeit.timeit('list((1,3,4,5,6,6,7,78,8))', number=100000)
0.02268475245609025
Performance Test 2:
timeit.timeit('[i for i in (1,3,4,5,6,6,7,78,8)]', number=100000)
0.05304025196801376
Can you please explain this ?
The list comprehension has to iterate over the tuple at the Python level:
>>> dis.dis("[i for i in (1,2,3)]")
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x1075c0c90, file "<dis>", line 1>)
2 LOAD_CONST 1 ('<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_CONST 5 ((1, 2, 3))
8 GET_ITER
10 CALL_FUNCTION 1
12 RETURN_VALUE
list iterates over the tuple itself, and uses the C API to do it without going through (as much of) the Python data model.
>>> dis.dis("list((1,2,3))")
1 0 LOAD_NAME 0 (list)
2 LOAD_CONST 3 ((1, 2, 3))
4 CALL_FUNCTION 1
6 RETURN_VALUE
The Python-level iteration is more clearly seen in Python 2, which implements list comprehensions in a different fashion.
>>> def f():
... return [i for i in (1,2,3)]
...
>>> dis.dis(f)
2 0 BUILD_LIST 0
3 LOAD_CONST 4 ((1, 2, 3))
6 GET_ITER
>> 7 FOR_ITER 12 (to 22)
10 STORE_FAST 0 (i)
13 LOAD_FAST 0 (i)
16 LIST_APPEND 2
19 JUMP_ABSOLUTE 7
>> 22 RETURN_VALUE
As #blhsing points out, you can get disassemble the code object generated by the list comprehension in Python 3 to see the same thing.
>>> code = compile('[i for i in (1,2,3)]', '', 'eval')
>>> dis(code.co_consts[0])
1 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE
The list constructor is implemented purely in C and has therefore minimal overhead, while with a list comprehension the Python compiler has to build a temporary function, build an iterator, store the iterator's output as variable i, and load the variable i to append to a list, which are a lot more Python byte codes to execute than simply loading a tuple and calling the list constructor.

Are list comprehensions syntactic sugar for `list(generator expression)` in Python 3?

In Python 3, is a list comprehension simply syntactic sugar for a generator expression fed into the list function?
e.g. is the following code:
squares = [x**2 for x in range(1000)]
actually converted in the background into the following?
squares = list(x**2 for x in range(1000))
I know the output is identical, and Python 3 fixes the surprising side-effects to surrounding namespaces that list comprehensions had, but in terms of what the CPython interpreter does under the hood, is the former converted to the latter, or are there any difference in how the code gets executed?
Background
I found this claim of equivalence in the comments section to this question, and a quick google search showed the same claim being made here.
There was also some mention of this in the What's New in Python 3.0 docs, but the wording is somewhat vague:
Also note that list comprehensions have different semantics: they are closer to syntactic sugar for a generator expression inside a list() constructor, and in particular the loop control variables are no longer leaked into the surrounding scope.
Both work differently. The list comprehension version takes advantage of the special bytecode LIST_APPEND which calls PyList_Append directly for us. Hence it avoids an attribute lookup to list.append and a function call at the Python level.
>>> def func_lc():
[x**2 for x in y]
...
>>> dis.dis(func_lc)
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>)
3 LOAD_CONST 2 ('func_lc.<locals>.<listcomp>')
6 MAKE_FUNCTION 0
9 LOAD_GLOBAL 0 (y)
12 GET_ITER
13 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
16 POP_TOP
17 LOAD_CONST 0 (None)
20 RETURN_VALUE
>>> lc_object = list(dis.get_instructions(func_lc))[0].argval
>>> lc_object
<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>
>>> dis.dis(lc_object)
2 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 16 (to 25)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LOAD_CONST 0 (2)
18 BINARY_POWER
19 LIST_APPEND 2
22 JUMP_ABSOLUTE 6
>> 25 RETURN_VALUE
On the other hand the list() version simply passes the generator object to list's __init__ method which then calls its extend method internally. As the object is not a list or tuple, CPython then gets its iterator first and then simply adds the items to the list until the iterator is exhausted:
>>> def func_ge():
list(x**2 for x in y)
...
>>> dis.dis(func_ge)
2 0 LOAD_GLOBAL 0 (list)
3 LOAD_CONST 1 (<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>)
6 LOAD_CONST 2 ('func_ge.<locals>.<genexpr>')
9 MAKE_FUNCTION 0
12 LOAD_GLOBAL 1 (y)
15 GET_ITER
16 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
19 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
22 POP_TOP
23 LOAD_CONST 0 (None)
26 RETURN_VALUE
>>> ge_object = list(dis.get_instructions(func_ge))[1].argval
>>> ge_object
<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>
>>> dis.dis(ge_object)
2 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 15 (to 21)
6 STORE_FAST 1 (x)
9 LOAD_FAST 1 (x)
12 LOAD_CONST 0 (2)
15 BINARY_POWER
16 YIELD_VALUE
17 POP_TOP
18 JUMP_ABSOLUTE 3
>> 21 LOAD_CONST 1 (None)
24 RETURN_VALUE
>>>
Timing comparisons:
>>> %timeit [x**2 for x in range(10**6)]
1 loops, best of 3: 453 ms per loop
>>> %timeit list(x**2 for x in range(10**6))
1 loops, best of 3: 478 ms per loop
>>> %%timeit
out = []
for x in range(10**6):
out.append(x**2)
...
1 loops, best of 3: 510 ms per loop
Normal loops are slightly slow due to slow attribute lookup. Cache it and time again.
>>> %%timeit
out = [];append=out.append
for x in range(10**6):
append(x**2)
...
1 loops, best of 3: 467 ms per loop
Apart from the fact that list comprehension don't leak the variables anymore one more difference is that something like this is not valid anymore:
>>> [x**2 for x in 1, 2, 3] # Python 2
[1, 4, 9]
>>> [x**2 for x in 1, 2, 3] # Python 3
File "<ipython-input-69-bea9540dd1d6>", line 1
[x**2 for x in 1, 2, 3]
^
SyntaxError: invalid syntax
>>> [x**2 for x in (1, 2, 3)] # Add parenthesis
[1, 4, 9]
>>> for x in 1, 2, 3: # Python 3: For normal loops it still works
print(x**2)
...
1
4
9
Both forms create and call an anonymous function. However, the list(...) form creates a generator function and passes the returned generator-iterator to list, while with the [...] form, the anonymous function builds the list directly with LIST_APPEND opcodes.
The following code gets decompilation output of the anonymous functions for an example comprehension and its corresponding genexp-passed-to-list:
import dis
def f():
[x for x in []]
def g():
list(x for x in [])
dis.dis(f.__code__.co_consts[1])
dis.dis(g.__code__.co_consts[1])
The output for the comprehension is
4 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
The output for the genexp is
7 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 11 (to 17)
6 STORE_FAST 1 (x)
9 LOAD_FAST 1 (x)
12 YIELD_VALUE
13 POP_TOP
14 JUMP_ABSOLUTE 3
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
You can actually show that the two can have different outcomes to prove they are inherently different:
>>> list(next(iter([])) if x > 3 else x for x in range(10))
[0, 1, 2, 3]
>>> [next(iter([])) if x > 3 else x for x in range(10)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
StopIteration
The expression inside the comprehension is not treated as a generator since the comprehension does not handle the StopIteration, whereas the list constructor does.
They aren't the same, list() will evaluate what ever is given to it after what is in the parentheses has finished executing, not before.
The [] in python is a bit magical, it tells python to wrap what ever is inside it as a list, more like a type hint for the language.

Is this calculation executed in Python?

Disclaimer: I'm new to programming, but new to Python. This may be a pretty basic question.
I have the following block of code:
for x in range(0, 100):
y = 1 + 1;
Is the calculation of 1 + 1 in the second line executed 100 times?
I have two suspicions why it might not:
1) The compiler sees 1 + 1 as a constant value, and thus compiles this line into y = 2;.
2) The compiler sees that y is only set and never referenced, so it omits this line of code.
Are either/both of these correct, or does it actually get executed each iteration over the loop?
Option 1 is executed; the CPython compiler simplifies mathematical expressions with constants in the peephole optimiser.
Python will not eliminate the loop body however.
You can introspect what Python produces by looking at the bytecode; use the dis module to take a look:
>>> import dis
>>> def f():
... for x in range(100):
... y = 1 + 1
...
>>> dis.dis(f)
2 0 SETUP_LOOP 26 (to 29)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 12 (to 28)
16 STORE_FAST 0 (x)
3 19 LOAD_CONST 3 (2)
22 STORE_FAST 1 (y)
25 JUMP_ABSOLUTE 13
>> 28 POP_BLOCK
>> 29 LOAD_CONST 0 (None)
32 RETURN_VALUE
The bytecode at position 19, LOAD_CONST loads the value 2 to store in y.
You can see the constants associated with the code object in the co_consts attribute of a code object; for functions you can find that object under the __code__ attribute:
>>> f.__code__.co_consts
(None, 100, 1, 2)
None is the default return value for any function, 100 the literal passed to the range() call, 1 the original literal, left in place by the peephole optimiser and 2 is the result of the optimisation.
The work is done in peephole.c, in the fold_binops_on_constants() function:
/* Replace LOAD_CONST c1. LOAD_CONST c2 BINOP
with LOAD_CONST binop(c1,c2)
The consts table must still be in list form so that the
new constant can be appended.
Called with codestr pointing to the first LOAD_CONST.
Abandons the transformation if the folding fails (i.e. 1+'a').
If the new constant is a sequence, only folds when the size
is below a threshold value. That keeps pyc files from
becoming large in the presence of code like: (None,)*1000.
*/
Take into account that Python is a highly dynamic language, such optimisations can only be applied to literals and constants that you cannot later dynamically replace.

Lambda function behavior with and without keyword arguments

I am using lambda functions for GUI programming with tkinter.
Recently I got stuck when implementing buttons that open files:
self.file=""
button = Button(conf_f, text="Tools opt.",
command=lambda: tktb.helpers.openfile(self.file))
As you see, I want to define a file path that can be updated, and that is not known when creating the GUI.
The issue I had is that earlier my code was :
button = Button(conf_f, text="Tools opt.",
command=lambda f=self.file: tktb.helpers.openfile(f))
The lambda function had a keyword argument to pass the file path. In this case, the parameter f was not updated when self.file was.
I got the keyword argument from a code snippet and I use it everywhere. Obviously I shouldn't...
This is still not clear to me... Could someone explain me the difference between the two lambda forms and when to use one an another?
PS: The following comment led me to the solution but I'd like a little more explanations:
lambda working oddly with tkinter
I'll try to explain it more in depth.
If you do
i = 0
f = lambda: i
you create a function (lambda is essentially a function) which accesses its enclosing scope's i variable.
Internally, it does so by having a so-called closure which contains the i. It is, loosely spoken, a kind of pointer to the real variable which can hold different values at different points of time.
def a():
# first, yield a function to access i
yield lambda: i
# now, set i to different values successively
for i in range(100): yield
g = a() # create generator
f = next(g) # get the function
f() # -> error as i is not set yet
next(g)
f() # -> 0
next(g)
f() # -> 1
# and so on
f.func_closure # -> an object stemming from the local scope of a()
f.func_closure[0].cell_contents # -> the current value of this variable
Here, all values of i are - at their time - stored in that said closure. If the function f() needs them. it gets them from there.
You can see that difference on the disassembly listings:
These said functions a() and f() disassemble like this:
>>> dis.dis(a)
2 0 LOAD_CLOSURE 0 (i)
3 BUILD_TUPLE 1
6 LOAD_CONST 1 (<code object <lambda> at 0xb72ea650, file "<stdin>", line 2>)
9 MAKE_CLOSURE 0
12 YIELD_VALUE
13 POP_TOP
3 14 SETUP_LOOP 25 (to 42)
17 LOAD_GLOBAL 0 (range)
20 LOAD_CONST 2 (100)
23 CALL_FUNCTION 1
26 GET_ITER
>> 27 FOR_ITER 11 (to 41)
30 STORE_DEREF 0 (i)
33 LOAD_CONST 0 (None)
36 YIELD_VALUE
37 POP_TOP
38 JUMP_ABSOLUTE 27
>> 41 POP_BLOCK
>> 42 LOAD_CONST 0 (None)
45 RETURN_VALUE
>>> dis.dis(f)
2 0 LOAD_DEREF 0 (i)
3 RETURN_VALUE
Compare that to a function b() which looks like
>>> def b():
... for i in range(100): yield
>>> dis.dis(b)
2 0 SETUP_LOOP 25 (to 28)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 11 (to 27)
16 STORE_FAST 0 (i)
19 LOAD_CONST 0 (None)
22 YIELD_VALUE
23 POP_TOP
24 JUMP_ABSOLUTE 13
>> 27 POP_BLOCK
>> 28 LOAD_CONST 0 (None)
31 RETURN_VALUE
The main difference in the loop is
>> 13 FOR_ITER 11 (to 27)
16 STORE_FAST 0 (i)
in b() vs.
>> 27 FOR_ITER 11 (to 41)
30 STORE_DEREF 0 (i)
in a(): the STORE_DEREF stores in a cell object (closure), while STORE_FAST uses a "normal" variable, which (probably) works a little bit faster.
The lambda as well makes a difference:
>>> dis.dis(lambda: i)
1 0 LOAD_GLOBAL 0 (i)
3 RETURN_VALUE
Here you have a LOAD_GLOBAL, while the one above uses LOAD_DEREF. The latter, as well, is for the closure.
I completely forgot about lambda i=i: i.
If you have the value as a default parameter, it finds its way into the function via a completely different path: the current value of i gets passed to the just created function via a default parameter:
>>> i = 42
>>> f = lambda i=i: i
>>> dis.dis(f)
1 0 LOAD_FAST 0 (i)
3 RETURN_VALUE
This way the function gets called as f(). It detects that there is a missing argument and fills the respective parameter with the default value. All this happens before the function is called; from within the function you just see that the value is taken and returned.
And there is yet another way to accomplish your task: Just use the lambda as if it would take a value: lambda i: i. If you call this, it complains about a missing argument.
But you can cope with that with the use of functools.partial:
ff = [functools.partial(lambda i: i, x) for x in range(100)]
ff[12]()
ff[54]()
This wrapper gets a callable and a number of arguments to be passed. The resulting object is a callable which calls the original callable with these arguments plus any arguments you give to it. It can be used here to keep locked to the value intended.

Categories

Resources