I'm trying to learn about the Lambda function in Python by messing up the code to observe what errors Python would throw in various scenarios.
Can anyone please explain what exactly is the meaning of following message displayed by the IDLE and how to interpret it?
def myfunc(n):
return lambda a:a*n
myfunc(2)
error:
<function myfunc.<locals>.<lambda> at 0x037800B8>
Your defined function returns a lambda function. This isn't an error message, it is printing the lambda function object.
I'll assume you mis-typed your indentation, and that the code snippet was
def myfunc(n):
    return lambda a:a*n
myfunc(2)
The lambda operator here returns a function that maps x to n*x. So myfunc is essentially a function that returns another function.
Your code is legit, and Python's output is only the description of the object that myfunc(2) returned. Let's break it down:
the leading function confirms that you got a function
myfunc.<locals>.<lambda> is the name of the object. That part tells you the function was created inside myfunc, using some of myfunc's parameters (note that you use n in the lambda definition). This is called a closure.
0x037800B8 is the memory address where this Python object is stored
You should interpret this as a function object or a functor.
myfunc.<locals>.<lambda> just says that the object is a lambda function defined locally inside myfunc.
myfunc(2) # is a function object
print(myfunc(2)(3)) # prints 6 which is 3 * 2
The returned function remembers the enclosing variable n = 2, and calling it with 3 returns 3 * 2. This concept is called a closure.
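A minimal sketch of the closure at work, reusing myfunc from the question. The names double and triple are only illustrative; each call to myfunc produces a separate function that remembers its own n.

```python
def myfunc(n):
    # The lambda captures n from the enclosing call.
    return lambda a: a * n

double = myfunc(2)
triple = myfunc(3)

print(double(5))   # 10
print(triple(5))   # 15

# The captured value lives in a cell attached to the function object.
print(double.__closure__[0].cell_contents)  # 2
print(double.__qualname__)                  # myfunc.<locals>.<lambda>
```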
You can go into more detail by disassembling the function, if you want to dive deeper.
from dis import dis
def myfunc(n):
    return lambda a:a*n
dis(myfunc)
OUTPUT
➜ codebase git:(master) ✗ python temp.py
3 0 LOAD_CLOSURE 0 (n)
2 BUILD_TUPLE 1
4 LOAD_CONST 1 (<code object <lambda> at 0x7fc5f671f5d0, file "temp.py", line 3>)
6 LOAD_CONST 2 ('myfunc.<locals>.<lambda>')
8 MAKE_FUNCTION 8
10 RETURN_VALUE
Disassembly of <code object <lambda> at 0x7fc5f671f5d0, file "temp.py", line 3>:
3 0 LOAD_FAST 0 (a)
2 LOAD_DEREF 0 (n)
4 BINARY_MULTIPLY
6 RETURN_VALUE
This means that your command
myfunc(2)
returned a function (namely, a lambda function in myfunc scope). 0x037800B8 is its address in memory.
It looks like you're returning the lambda function itself rather than its result. To return a value instead, wrap the lambda in parentheses and call it with an argument, like so:
>>> def myfunc(n):
        return (lambda a: a*n)(n)
>>> myfunc(2)
4
You are simply returning a reference to a function. Lambdas were born to make short functions, short to type, avoiding the boilerplate of writing "def function(): .."
Try looking at the following tutorial to get the basics:
https://www.w3schools.com/python/python_lambda.asp
Related
My apologies for what some may regard as a fundamental question. In the following simple code:
def greet(name):
    def say_hi():
        print('Preparing to greet...')
        print('Hi', name, '!')
        print('Greeting given.')
    return say_hi
What is the sequence of events when 'greet' is called with an argument and the interpreter encounters the 'say_hi' function? I see that a reference to it is returned (forming a closure, I assume?), but is the inner function executed, or simply 'read' and not called until the programmer writes code like the following:
f = greet('Caroline')
f()
Python defines functions at runtime (apart from compile-time work such as the peephole optimizer), and it doesn't run a function's body unless you call it.
You can see this behavior by using the dis function from the dis module, which prints the bytecode of your function:
>>> def greet(name):
...     def say_hi():
...         print('Preparing to greet...')
...         print('Hi', name, '!')
...         print('Greeting given.')
...     return say_hi
...
>>> import dis
>>>
>>> dis.dis(greet)
2 0 LOAD_CLOSURE 0 (name)
3 BUILD_TUPLE 1
6 LOAD_CONST 1 (<code object say_hi at 0x7fdacc12c8b0, file "<stdin>", line 2>)
9 MAKE_CLOSURE 0
12 STORE_FAST 1 (say_hi)
6 15 LOAD_FAST 1 (say_hi)
18 RETURN_VALUE
As you can see at offset 6, Python just loads the function as a code object from a constant; the body of say_hi is not executed at that point.
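The same point can be checked without reading bytecode. This is a small sketch, not part of the original answer; the calls list is a hypothetical log added only to record when the body of say_hi actually runs.

```python
calls = []  # hypothetical log used only for this illustration

def greet(name):
    def say_hi():
        calls.append(name)          # runs only when say_hi() is called
        return 'Hi ' + name + '!'
    return say_hi                   # returns the function, does not call it

f = greet('Caroline')
print(calls)   # [] because the body of say_hi has not run yet
print(f())     # Hi Caroline!
print(calls)   # ['Caroline']
```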
First of all, let me say that I read the many threads with similar topics on creating dynamically named variables, but they mostly relate to Python 2 or they assume you are working with classes. And yes, I read Behavior of exec function in Python 2 and Python 3.
I'm also aware that creating dynamically named variables is a bad idea 99% of the time and dictionaries are the way to go, but I just want to know whether it is still possible and how exactly exec and locals work in Python 3.
I'd like to show a bit of sample code illustrating my question (fibonacci calculates fibonacci numbers, ListOfLetters provides ["A", "B", ...]):
def functionname():
    for index, buchstabe in enumerate(ListOfLetters.create_list("A", "K"), 1):
        exec("{} = {}".format(buchstabe, fibonacci(index)) ) #A = 1, B = 1, C = 2, D = 3, E = 5,...
        print(index, buchstabe, eval(buchstabe)) #works nicely, e.g. prints "4 D 3"
    print(locals()) #prints all locals: {'B': 1, 'A': 1, 'index': 11, 'C': 2, 'H': 21, 'K': 89, ...
    print(locals()['K']) #prints 89 as it should
    print(eval("K")) #prints 89 as it should
    print(K) #NameError: name 'K' is not defined
So, at least in my current understanding, there is some inconsistency in the behaviour of locals(), since it contains the variable names added by exec() but the variables are not available in the function.
I would be grateful if someone could explain this and tell me whether this is by design or a real inconsistency in the language. Yes, I know that locals should not be modified, but I'm not modifying it, I'm calling exec()...
When you're not sure why something works the way it does in Python, it often can help to put the behavior that you're confused by in a function and then disassemble it from the Python bytecode with the dis module.
Let's start with a simpler version of your code:
def foo():
    exec("K = 89")
    print(K)
If you run foo(), you'll get the same exception you're seeing with your more complicated function:
>>> foo()
Traceback (most recent call last):
  File "<pyshell#167>", line 1, in <module>
    foo()
  File "<pyshell#166>", line 3, in foo
    print(K)
NameError: name 'K' is not defined
Let's disassemble it and see why:
>>> import dis
>>> dis.dis(foo)
2 0 LOAD_GLOBAL 0 (exec)
3 LOAD_CONST 1 ('K = 89')
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 POP_TOP
3 10 LOAD_GLOBAL 1 (print)
13 LOAD_GLOBAL 2 (K)
16 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
19 POP_TOP
20 LOAD_CONST 0 (None)
23 RETURN_VALUE
The operation that you need to pay attention to is the one labeled "13". This is where the compiler handles looking up K within the last line of the function (print(K)). It is using the LOAD_GLOBAL opcode, which fails because "K" is not a global variable name, rather it's a value in our locals() dict (added by the exec call).
What if we persuaded the compiler to see K as a local variable (by giving it a value before running the exec), so it will know not to look for a global variable that doesn't exist?
def bar():
    K = None
    exec("K = 89")
    print(K)
This function won't give you an error if you run it, but you won't get the expected value printed out:
>>> bar()
None
Let's disassemble to see why:
>>> dis.dis(bar)
2 0 LOAD_CONST 0 (None)
3 STORE_FAST 0 (K)
3 6 LOAD_GLOBAL 0 (exec)
9 LOAD_CONST 1 ('K = 89')
12 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
15 POP_TOP
4 16 LOAD_GLOBAL 1 (print)
19 LOAD_FAST 0 (K)
22 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
25 POP_TOP
26 LOAD_CONST 0 (None)
29 RETURN_VALUE
Note the opcodes used at "3" and "19". The Python compiler uses STORE_FAST and LOAD_FAST to put the value for the local variable K into slot 0 and later fetch it back out. Using numbered slots is significantly faster than inserting and fetching values from a dictionary like locals(), which is why the Python compiler does it for all local variable access in a function. You can't overwrite a local variable in a slot by modifying the dictionary returned by locals() (as exec does, if you don't pass it a dict to use for its namespace).
Indeed, let's try a third version of our function, where we peek into locals again when we have K defined as a regular local variable:
def baz():
    K = None
    exec("K = 89")
    print(locals())
You won't see 89 in the output this time either!
>>> baz()
{'K': None}
The reason you see the old K value in locals() is explained in the function's documentation:
Update and return a dictionary representing the current local symbol table.
The slot that the local variable K's value is stored in was not changed by the exec call, which only modifies the locals() dict. When you call locals() again, Python "update[s]" the dictionary with the value from the slot, replacing the value stored there by exec.
This is why the docs go on to say:
Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
Your exec call is modifying the locals() dict, and you're finding how its changes are not always seen by your later code.
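The same effect can be seen without exec at all. The following is a minimal sketch (qux is a hypothetical name continuing the foo/bar/baz series): writing into the dictionary returned by locals() inside a function does not touch the slot that holds the local variable.

```python
def qux():
    K = None
    locals()['K'] = 89   # writes to a dict, not to the local variable slot
    return K             # still reads the slot

print(qux())  # None
```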
On the exec/eval/locals question
At least on the CPython implementation modifications to the locals() dictionary do not actually change the names in the local scope, which is why it's meant to be used read-only. You can change it, and you can see your changes in the dictionary object, but the actual local scope is not changed.
exec() takes two optional dictionary arguments, a global scope and a local scope. It defaults to globals() and locals(), but since changes to locals() aren't "real" outside of the dictionary, exec() only affects the "real" local scope when globals() is locals(), i.e. in a module outside of any function. (So in your case it's failing because it's inside a function scope).
The "better" way to use exec() in this case is to pass in your own dictionary, then operate on the values in that.
def foo():
    exec_scope = {}
    exec("y = 2", exec_scope)
    print(exec_scope['y'])
foo()
In this case, exec_scope is used as the global and local scope for the exec, and after the exec it will contain {'y': 2, '__builtins__': __builtins__} (the builtins are inserted for you if not present)
If you want access to more globals you could do exec_scope = dict(globals()).
Passing in different global and local scope dictionaries can produce "interesting" behavior.
If you pass the same dictionary into successive calls to exec or eval, then they have the same scope, which is why your eval worked (it implicitly used the locals() dictionary).
On dynamic variable names
If you set the name from a string, what's so wrong about getting the value as a string (i.e. what a dictionary does)? In other words, why would you want to set locals()['K'] and then access K? If K is in your source it's not really a dynamically set name... hence, dictionaries.
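As a rough sketch of the dictionary approach applied to the question's example: fibonacci and the letters string below are hypothetical stand-ins for the asker's own helpers (fibonacci and ListOfLetters.create_list), which are not shown in the question.

```python
def fibonacci(n):
    # Hypothetical stand-in for the question's helper:
    # fibonacci(1) == fibonacci(2) == 1.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

letters = "ABCDEFGHIJK"   # stand-in for ListOfLetters.create_list("A", "K")

# Map each letter to its value instead of creating a variable per letter.
values = {letter: fibonacci(i) for i, letter in enumerate(letters, 1)}

print(values["D"])  # 3
print(values["K"])  # 89
```

Lookups by a computed name (values[name]) and by a literal name (values["K"]) then go through the same mechanism, so there is no mismatch between what exec wrote and what the compiled function reads.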
While I was hanging out in the Python chatroom, someone dropped in and reported the following exception:
NameError: free variable 'var' referenced before assignment in enclosing scope
I'd never seen that error message before, and the user provided only a small code fragment that couldn't have caused the error by itself, so off I went googling for information, and ... there doesn't seem to be much. While I was searching, the user reported their problem solved as a "whitespace issue", and then left the room.
After playing around a bit, I've only been able to reproduce the exception with toy code like this:
def multiplier(n):
    def multiply(x):
        return x * n
    del n
    return multiply
Which gives me:
>>> triple = multiplier(3)
>>> triple(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in multiply
NameError: free variable 'n' referenced before assignment in enclosing scope
All well and good, but I'm having a hard time working out how this exception could occur in the wild, given that my example above is
Pretty stupid
Unlikely to happen by accident
... but obviously it does, given the report I mentioned at the start of this question.
So - how can this specific exception occur in real code?
Think of a more complex function where n may or may not be bound, depending on some condition. You don't have to del the name in question; it also happens if the compiler sees an assignment, so the name is local, but the code path is not taken and the name never gets assigned anything. Another stupid example:
def f():
    def g(x):
        return x * n
    if False:
        n = 10
    return g
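A slightly more realistic sketch of the same situation, assuming a hypothetical make_handler function whose configuration may lack a key. The assignment to prefix sits on a branch that is not always taken, so the closure cell can stay empty.

```python
def make_handler(config):
    def handler():
        return prefix + "done"    # prefix is a free variable of handler

    if "prefix" in config:        # prefix is bound only on this path
        prefix = config["prefix"]
    return handler

print(make_handler({"prefix": "job: "})())   # job: done

try:
    make_handler({})()            # the cell for prefix was never filled
except NameError as exc:
    print(type(exc).__name__)     # NameError
```

The exact wording of the message depends on the Python version, so the sketch only prints the exception type.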
Too late to answer but I think I can give some detailed information to this situation. It would help future readers to see what's going on here.
So the error message says:
NameError: free variable 'var' referenced before assignment in enclosing scope
When we talk about free variables, we're dealing with nested functions. Python has done some "magic" in order to give nested functions the ability to access the variables defined inside their parent scope. If we have:
def outer():
    foo = 10
    def inner():
        print(foo)
    return inner
outer()() # 10
Normally we shouldn't have access to foo in the inner function. Why? Because after calling and executing the body of the outer function, its namespace is destroyed. Basically, any local variable defined inside the function is no longer available after the function terminates.
But we have access...
That magic happens with the help of the "Cell object":
“Cell” objects are used to implement variables referenced by multiple scopes. For each such variable, a cell object is created to store the value; the local variables of each stack frame that references the value contains a reference to the cells from outer scopes which also use that variable. When the value is accessed, the value contained in the cell is used instead of the cell object itself.
Just to see that hidden stored value in cells (we'll talk about __closure__ a bit later):
def outer():
    foo = 10
    def inner():
        print(foo)
    return inner
print(outer().__closure__[0].cell_contents) # 10
How does it work?
In "compile" time,
when Python sees a function within another function, it takes note of the name of the variables referenced inside the nested function which are actually defined in the outer function. This information is stored in both functions' code objects. co_cellvars for outer function and co_freevars for inner function:
def outer():
    foo = 10
    def inner():
        print(foo)
    return inner
print(outer.__code__.co_cellvars) # ('foo',)
print(outer().__code__.co_freevars) # ('foo',)
Now execution time..., (see the code)
When Python wants to execute the outer function, it creates a "cell object" for each variable (co_cellvars) that it has taken note of.
Then as it goes through the lines, whenever it sees an assignment to such a variable, it fills the corresponding cell object with that value. (Remember, cells hold the actual values indirectly.)
When execution reaches the line that creates the inner function, Python takes all the created cell objects and makes a tuple out of them. This tuple is then assigned to the inner function's __closure__.
The point is that when this tuple is created, some of the cells may not have a value yet. They are empty (see the output)!
At this point, calling the inner function while one of those cells is still empty raises the error mentioned above.
def outer():
    foo = 10
    def inner():
        print(foo)
        try:
            print(boo)
        except NameError as e:
            print(e)
    # Take a look at inner's __closure__ cells
    print(inner.__closure__)
    # So one boo is empty! This raises error
    inner()
    # Now lets look at inner's __closure__ cells one more time (they're filled now)
    boo = 20
    print(inner.__closure__)
    # This works fine now
    inner()
outer()
output from Python 3.10:
(<cell at 0x7f14a5b62710: empty>, <cell at 0x7f14a5b62830: int object at 0x7f14a6f00210>)
10
free variable 'boo' referenced before assignment in enclosing scope
(<cell at 0x7f14a5b62710: int object at 0x7f14a6f00350>, <cell at 0x7f14a5b62830: int object at 0x7f14a6f00210>)
10
20
The error free variable 'boo' referenced before assignment in enclosing scope makes sense now.
Note: This error is reworded in Python 3.11 to:
cannot access free variable 'boo' where it is not associated with a value in enclosing scope
But the idea is the same.
If you look at the bytecode of the outer function, you'd see the steps I mentioned in the "execution time" section in action:
from dis import dis
def outer():
    foo = 10
    def inner():
        print(foo)
        print(boo)
    boo = 20
    return inner
dis(outer)
output from Python 3.11:
0 MAKE_CELL 1 (boo)
2 MAKE_CELL 2 (foo)
3 4 RESUME 0
4 6 LOAD_CONST 1 (10)
8 STORE_DEREF 2 (foo)
5 10 LOAD_CLOSURE 1 (boo)
12 LOAD_CLOSURE 2 (foo)
14 BUILD_TUPLE 2
16 LOAD_CONST 2 (<code object inner at 0x7fb6d4731a30, file "", line 5>)
18 MAKE_FUNCTION 8 (closure)
20 STORE_FAST 0 (inner)
8 22 LOAD_CONST 3 (20)
24 STORE_DEREF 1 (boo)
9 26 LOAD_FAST 0 (inner)
28 RETURN_VALUE
MAKE_CELL is new in Python 3.11.
STORE_DEREF stores the value inside the cell object.
Please consider the following code:
import re
def qcharToUnicode(s):
    p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")
    return p.sub(lambda m: '"' + chr(int(m.group(1),16)) + '"', s)
def fixSurrogatePresence(s) :
    '''Returns the input UTF-16 string with surrogate pairs replaced by the character they represent'''
    # ideas from:
    # http://www.unicode.org/faq/utf_bom.html#utf16-4
    # http://stackoverflow.com/a/6928284/1503120
    def joinSurrogates(match) :
        SURROGATE_OFFSET = 0x10000 - ( 0xD800 << 10 ) - 0xDC00
        return chr ( ( ord(match.group(1)) << 10 ) + ord(match.group(2)) + SURROGATE_OFFSET )
    return re.sub ( '([\uD800-\uDBFF])([\uDC00-\uDFFF])', joinSurrogates, s )
Now my questions below probably reflect a C/C++ way of thinking (and not a "Pythonic" one) but I'm curious nevertheless:
I'd like to know whether the evaluation of the compiled RE object p in qcharToUnicode and SURROGATE_OFFSET in joinSurrogates will take place at each call to the respective functions or only once at the point of definition. I mean, in C/C++ one can declare the values as static const and the compiler will (IIUC) make the construction occur only once, but in Python we do not have any such declarations.
The question is more pertinent in the case of the compiled RE object, since it seems that the only reason to construct such an object is to avoid the repeated compilation, as the Python RE HOWTO says:
Should you use these module-level functions, or should you get the pattern and call its methods yourself? If you’re
accessing a regex within a loop, pre-compiling it will save a few function calls.
... and this purpose would be defeated if the compilation were to occur at each function call. I don't want to put the symbol p (or SURROGATE_OFFSET) at module level since I want to restrict its visibility to the relevant function only.
So does the interpreter do something like heuristically determine that the value pointed to by a particular symbol is constant (and visible within a particular function only) and hence need not be reconstructed at next function? Further, is this defined by the language or implementation-dependent? (I hope I'm not asking too much!)
A related question would be about the construction of the function object lambda m in qcharToUnicode -- is it also defined only once like other named function objects declared by def?
The simple answer is that as written, the code will be executed repeatedly at every function call. There is no implicit caching mechanism in Python for the case you describe.
You should get out of the habit of talking about "declarations". A function definition is in fact also "just" a normal statement, so I can write a loop which defines the same function repeatedly:
for i in range(10):
    def f(x):
        return x*2
    y = f(i)
Here, we incur the cost of creating the function on every loop iteration. Timing reveals that the following code, which defines the function once, runs in about 75% of the time of the code above:
def f(x):
    return x*2
for i in range(10):
    y = f(i)
The standard way of optimising the RE case is as you already know to place the p variable in the module scope, i.e.:
p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")
def qcharToUnicode(s):
    return p.sub(lambda m: '"' + chr(int(m.group(1),16)) + '"', s)
You can use conventions like prepending "_" to the variable name to indicate it is not supposed to be used, but normally people won't use it if you haven't documented it. A trick to make the RE function-local is to exploit a property of default parameters: they are evaluated once, when the function definition is executed, so you can do this:
def qcharToUnicode(s, p=re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")):
    return p.sub(lambda m: '"' + chr(int(m.group(1),16)) + '"', s)
This will allow you the same optimisation but also a little more flexibility in your matching function.
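To make the "evaluated once" behaviour visible, here is a small sketch. compile_pattern and compile_calls are hypothetical helpers added only to count how often the default expression runs; they are not part of the original code.

```python
import re

compile_calls = 0  # hypothetical counter, only to make the evaluation visible

def compile_pattern():
    global compile_calls
    compile_calls += 1
    return re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")

# The default value is computed once, when this def statement executes.
def qcharToUnicode(s, p=compile_pattern()):
    return p.sub(lambda m: '"' + chr(int(m.group(1), 16)) + '"', s)

print(qcharToUnicode("QChar(0x41)"))  # "A"
print(qcharToUnicode("QChar(0x42)"))  # "B"
print(compile_calls)                  # 1
```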
Thinking properly about function definitions also allows you to stop thinking about lambda as different from def. The only difference is that def also binds the function object to a name - the underlying object created is the same.
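A quick check of that claim, as a sketch with illustrative names (double_def and double_lambda are not from the original code): both forms produce objects of the same type, and only the recorded name differs.

```python
def double_def(x):
    return x * 2

double_lambda = lambda x: x * 2

print(type(double_def) is type(double_lambda))  # True
print(double_def.__name__)      # double_def
print(double_lambda.__name__)   # <lambda>
```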
Python is a script/interpreted language... so yes, the assignment will be made every time you call the function. The interpreter will parse your code only once, generating Python bytecode. The next time you call this function, it will be already compiled into Python VM bytecode, so the function will be simply executed.
The re.compile will be called every time, as it would be in other languages. If you want to mimic a static initialization, consider using a global variable, this way it will be called only once. Better, you can create a class with static methods and static members (class and not instance members).
You can check all this using the dis module in Python. So, I just copied and pasted your code in a teste.py module.
>>> import teste
>>> import dis
>>> dis.dis(teste.qcharToUnicode)
4 0 LOAD_GLOBAL 0 (re)
3 LOAD_ATTR 1 (compile)
6 LOAD_CONST 1 ('QChar\\((0x[a-fA-F0-9]*)\\)')
9 CALL_FUNCTION 1
12 STORE_FAST 1 (p)
5 15 LOAD_FAST 1 (p)
18 LOAD_ATTR 2 (sub)
21 LOAD_CONST 2 (<code object <lambda> at 0056C140, file "teste.py", line 5>)
24 MAKE_FUNCTION 0
27 LOAD_FAST 0 (s)
30 CALL_FUNCTION 2
33 RETURN_VALUE
Yes, they are. Suppose re.compile() had a side effect. That side effect would happen every time the assignment to p was made, i.e., every time the function containing said assignment was called.
This can be verified:
def foo():
    print("ahahaha!")
    return bar
def f():
    return foo()
def funcWithSideEffect():
    print("The airspeed velocity of an unladen swallow (european) is...")
    return 25
def funcEnclosingAssignment():
    p = funcWithSideEffect()
    return p
a = funcEnclosingAssignment()
b = funcEnclosingAssignment()
c = funcEnclosingAssignment()
Each time the enclosing function (analogous to your qcharToUnicode) is called, the statement is printed, revealing that p is being re-evaluated.
So AFAIK in CPython, function definitions are compiled into function objects when executed at parse time. But what about inner functions? Do they get compiled into function objects at parse time or do they get compiled (or interpreted) every single time the function is called? Do inner functions incur any performance penalty at all?
To give a general explanation - assuming you have the following code in a module:
def outer(x=1):
    def inner(y=2):
        return x+y
When the file is parsed by python via compile(), the above text is turned into bytecode for how to execute the module. In the module bytecode, there are two "code objects", one for the bytecode of outer() and one for the bytecode of inner(). Note that I said code objects, not functions - the code objects contain little more than the bytecode used by the function, and any information that could be known at compile time - such as the bytecode for outer() containing a ref to the bytecode for inner().
When the module actually loads, by evaluating the code object associated with the module, one thing which happens is an actual "function object" is created for outer(), and stored in the module's outer attribute. The function object acts as a collection of the bytecode and all context-relevant things that are needed to call the function (e.g. which globals dict it should pull from, etc.) that can't be known at compile time. In a way, a code object is a template for a function, which is a template for execution of the actual bytecode with all variables filled in.
None of this involved inner()-as-a-function yet - each time you actually get around to calling outer(), that's when a new inner() function object is created for that invocation of outer. It binds the already-created inner code object to the globals and to a closure holding the value of x as passed into that call to outer. As you can imagine, this is pretty fast, since no parsing is needed, just filling in a quick struct with some pointers to other already-existing objects.
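The distinction between the shared code object and the per-call function object can be checked directly. This sketch uses a variant of outer that returns inner (an addition for illustration, since the snippet above does not return it); f1 and f2 are illustrative names.

```python
def outer(x=1):
    def inner(y=2):
        return x + y
    return inner          # returned here so the result can be inspected

f1 = outer(10)
f2 = outer(20)

print(f1 is f2)                    # False: a new function object per call
print(f1.__code__ is f2.__code__)  # True: the compiled code object is shared
print(f1(), f2())                  # 12 22
```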
Easy test: the default arguments to a function are evaluated once, at definition time.
>>> def foo():
...     def bar(arg=count()):
...         pass
...     pass
...
>>> def count():
...     print "defined"
...
>>> foo()
defined
>>> foo()
defined
So yes: this is a minor (very very! minor) performance hit.
>>> import dis
>>> def foo():
...     def bar():
...         print "stuff"
...     return bar
...
>>> b = foo()
>>> dis.dis(foo)
2 0 LOAD_CONST 1 (<code object bar at 0x20bf738, file "<stdin>", line 2>)
3 MAKE_FUNCTION 0
6 STORE_FAST 0 (bar)
4 9 LOAD_FAST 0 (bar)
12 RETURN_VALUE
>>> dis.dis(b)
3 0 LOAD_CONST 1 ('stuff')
3 PRINT_ITEM
4 PRINT_NEWLINE
5 LOAD_CONST 0 (None)
8 RETURN_VALUE
I suspect this is heavily implementation dependent, but that was CPython 2.6.6, and the inner function looks like it was compiled. Here's another example:
>>> def foo():
...     def bar():
...         return 1
...     return dis.dis(bar)
...
>>> foo()
3 0 LOAD_CONST 1 (1)
3 RETURN_VALUE
So we can conclude that they are compiled. As for their performance characteristics, use them. If you start having performance issues, profile. I know it's not really an answer, but it almost never matters and when it does, general answers don't cut it. Function calls incur some overhead and it looks like inner functions are just like functions.
To extend nmichaels' answer: inner functions are compiled at compile time, as he guessed, and their bytecode is saved in foo.func_code.co_consts and accessed using the LOAD_CONST opcode, as you can see in the disassembly of the function.
Example:
>>> def foo():
...     def inner():
...         pass
>>> print foo.func_code.co_consts
(None, <code object inner at 0x249c6c0, file "<ipython console>", line 2>)
I'm late on this, but as a little experimental complement to these thorough answers: you may use the builtin function id to verify whether a new object is created or not:
In []: # inner version
def foo():
    def bar():
        return id(bar)
    return bar()
foo(), foo()
Out[]: (4352951432, 4352952752)
The actual numbers may differ, but their difference indicates that two distinct instances of bar are indeed created.
In []: # outer version
def bar():
    return id(bar)
def foo():
    return bar()
foo(), foo()
Out[]: (4352950952, 4352950952)
This time, as expected, the two ids are the same.
Now for some timeit measurements. Inner first, outer second:
100000 loops, best of 3: 1.93 µs per loop
1000000 loops, best of 3: 1.25 µs per loop
So, on my machine, it seems that the inner version is 50% slower (Python 2.7, IPython Notebook).
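The measurement code itself is not shown above, so the following is only a sketch of how such a comparison might be reproduced with the timeit module. The function names are illustrative, and the numbers will vary by machine and Python version.

```python
import timeit

def foo_inner():
    def bar():          # a new function object is built on every call
        return 1
    return bar()

def bar_outer():
    return 1

def foo_outer():
    return bar_outer()  # reuses the one module-level function object

inner_time = timeit.timeit(foo_inner, number=100_000)
outer_time = timeit.timeit(foo_outer, number=100_000)
print(f"inner: {inner_time:.4f}s  outer: {outer_time:.4f}s")
```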