Code first:
def another_func(func):
    func()
def outer_func(pa=1, pb=2):
    def inner_func():
        print(pa)
    print(type(inner_func))
    another_func(inner_func)
if __name__ == '__main__':
    outer_func()
    # prints <class 'function'>, then 1
I understand that inner_func uses a parameter of outer_func, and that it works because inner_func is in the body of outer_func. But how can it "know" there is a pa when it is called by another_func?
I mean, what is actually passed to another_func when it's called in outer_func? It seems there is something more than a reference to the function object.
Function objects in Python aren't just functions; they're closures[1]: they carry around a reference to the local environment where the def statement was executed.
In particular, local variables from inside outer_func can be accessed from inside inner_func. (Even if you return inner_func, those values are kept alive so the closure will still work, for as long as inner_func is alive.)
If you add a nonlocal statement inside inner_func, it can even reassign the local variables from the body of outer_func.
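A minimal sketch of both points, using made-up names (make_counter, increment): the returned closure keeps count alive after make_counter has returned, and nonlocal lets it rebind that variable.

```python
def make_counter(start=0):
    count = start  # local to make_counter, captured by increment

    def increment():
        nonlocal count  # rebind make_counter's variable instead of creating a new local
        count += 1
        return count

    return increment  # make_counter's frame is finished after this, but count survives


counter = make_counter(10)
print(counter())  # 11
print(counter())  # 12
```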
How does this work?
Well, a def statement[2] is just a statement, like any other. What it does is something like this:
inner_func = _make_closure(<code from compiling inner_func body>, locals())
That <code from compiling inner_func body> is actually a constant value—the compiler compiles the body of every function in your module into constant code objects at import time.
But the function object that comes back from that _make_closure is a new thing that's created on the fly, and it has a reference to the local variables baked into it. Every time you run outer_func, it creates a new inner_func closure from the same <code>, each one capturing the current local environment.
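A quick way to see this in CPython (outer_func and inner_func here are stripped-down stand-ins for the question's functions): two calls produce two distinct function objects that share a single compiled code object.

```python
def outer_func():
    def inner_func():
        pass
    return inner_func


f1 = outer_func()
f2 = outer_func()

print(f1 is f2)                    # False: each call executed the def again
print(f1.__code__ is f2.__code__)  # True: both reuse the one compiled code object
```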
The details are a little more complicated—and, to some extent, they differ between implementations, so this will be CPython-specific.
Part of the compiler's job is figuring out what kind of variable each name in a function is. You may have read the rules on globals vs. locals (a variable is local if and only if you have an assignment for the name somewhere in the function body, and there's no global statement). But closures make things more complicated.
If a variable would have been local, but a nested function references the variable without assigning to it, or has a nonlocal statement, then it's a cell variable in the outer function, and a free variable in the inner function.[3]
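Assuming CPython 3, the compiler's classification can be inspected through the code objects' co_cellvars and co_freevars attributes. This uses the question's outer_func, modified to return inner_func instead of passing it on:

```python
def outer_func(pa=1, pb=2):
    def inner_func():
        print(pa)  # pa is read here but never assigned, so it becomes a free variable
    return inner_func


# pa is a cell variable of outer_func; pb is not captured, so it stays an ordinary local
print(outer_func.__code__.co_cellvars)    # ('pa',)
print(outer_func().__code__.co_freevars)  # ('pa',)
```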
When the interpreter calls a function, it creates a frame object that holds the local namespace—the references to all of the function's local variables.
But cell variables are special: the interpreter creates a special cell object for each one, and a reference to that cell goes into the namespace, so there's an extra dereference in front of the value every time you access or change it.
And what that _make_closure pseudo-code above does is to copy the cells from the outer function's frame to a special attribute on the nested function called __closure__.
Then, when you call the inner function, the interpreter copies those cells from the __closure__ into the frame for that function.
So, the outer function's frame and the inner function's frame both have references to the same cells, which is how they can share variables.
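A small sketch of that sharing, with hypothetical reader and writer closures: both are created in the same call to outer_func, so their __closure__ tuples hold the very same cell object, and a write through one is visible through the other.

```python
def outer_func():
    shared = "first"

    def reader():
        return shared

    def writer(value):
        nonlocal shared
        shared = value

    return reader, writer


reader, writer = outer_func()
print(reader.__closure__[0] is writer.__closure__[0])  # True: one cell, two closures
writer("second")
print(reader())                             # second
print(reader.__closure__[0].cell_contents)  # second
```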
For more on this, see the inspect module's docs, which show you how to find things like __closure__ and co_freevars in your interactive interpreter, and the dis module, which lets you look at the actual bytecode that your functions get compiled to.
1. This is one of those words that has a slew of related but different meanings. "Closure" can mean the technique of capturing the local namespace in a function, or it can mean the captured namespace, or it can mean the function with a captured namespace attached to it, or it can mean one of the variables in the captured namespace. Usually it's obvious which one you mean from context. If not, you have to say something like "closure capture" or "closure function" or "closure variable".
2. In case you're wondering, lambda expressions work exactly the same way as def statements. And class definitions are not identical, but similar.
3. It's actually still more complicated if you have multiple layers of nesting, but let's ignore that.
You seem to be confusing the code of the function with the function object. The code object is created only once, when the source file is compiled. However, a new function object called inner_func is created every time outer_func is called. This happens because a def statement is a type of assignment: it associates a function object with the specified name.
The function object contains a reference to its code as a matter of course, along with references to all the namespaces it will need to operate, including its parent's nonlocal namespace and the global module namespace.
So the value of pa in inner_func is going to be whatever it is in outer_func at the time of calling. The reference is to the namespace, not the name itself. If outer_func returns (think decorators), the namespace will be fixed, and only accessible through inner_func's special reference to it.
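The "at the time of calling" part is easy to see with a loop (make_callbacks and show are illustrative names): every closure reads the loop variable when it runs, so all of them see its final value. Binding the value as a default argument, which is evaluated when def executes, is one common workaround.

```python
def make_callbacks():
    callbacks = []
    for i in range(3):
        def show():
            return i  # looked up when show() runs, not when it is defined
        callbacks.append(show)
    return callbacks


def make_callbacks_fixed():
    callbacks = []
    for i in range(3):
        def show(i=i):  # the default is evaluated at def time, freezing the current i
            return i
        callbacks.append(show)
    return callbacks


print([cb() for cb in make_callbacks()])        # [2, 2, 2]
print([cb() for cb in make_callbacks_fixed()])  # [0, 1, 2]
```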
Related
Suppose I have the following code:
def outer(information):
    print(locals())
    def inner():
        print("The information given to me is: ", information)
    return inner
func1 = outer("info1")
print(func1)
It returns:
{'information': 'info1'}
<function outer.<locals>.inner at 0x1004d9d30>
Of course, if I call func1, it will print with info1 in the statement. So, from printing the locals() in the outer function, I can see that there is some relationship between the local scope and the storage of the argument.
I was expecting func1 to simply be outer.inner; why does the repr say outer.<locals>.inner instead? Is this a way of clarifying that a different local scope is associated with each of these functions? Imagine I made another one, func2 = outer("info2"), also returned from the outer function.
Also, is there something special about the enclosing <> syntax when used around a name? I see it around both the object and locals.
See PEP 3155 -- Qualified name for classes and functions and the example with nested functions.
For nested classes, methods, and nested functions, the __qualname__ attribute contains a dotted path leading to the object from the module top-level. A function's local namespace is represented in that dotted path by a component named <locals>.
Since the __repr__ of a function uses the __qualname__ attribute, you see this extra component in the output when printing a nested function.
I was expecting func1 to simply be outer.inner
That's not a fully qualified name. With this repr you might mistakenly assume you could import the name outer and dynamically access the attribute inner. Remember the qualname is a "dotted path leading to the object", but in this case attribute access is not possible because inner is a local variable.
Also, is there something special about the enclosing <> syntax when used around a name?
There is nothing special about it, but it gives a pretty strong hint to the programmer that you can't access this namespace directly, because the name is not a valid identifier.
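A short check, reusing the question's outer function: __qualname__ holds the dotted path that the repr shows, outer has no inner attribute to reach, and "<locals>" is not a valid identifier.

```python
def outer(information):
    def inner():
        print("The information given to me is: ", information)
    return inner


func1 = outer("info1")
print(func1.__qualname__)         # outer.<locals>.inner
print(func1.__name__)             # inner
print(hasattr(outer, "inner"))    # False: inner is a local variable, not an attribute
print("<locals>".isidentifier())  # False: not a name you could write in source code
```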
You can think of outer.<locals>.inner as saying that inner is a local variable created by the function. inner is what computer science calls a closure. Roughly speaking, a closure acts like a function (much like a lambda does), but it needs non-global data bundled with it in order to operate. In memory it behaves like a pair of information and a reference to the function being called.
foo = outer("foo")
bar = outer("bar")
# In memory these look more or less like the following:
("foo", outer.inner)
("bar", outer.inner)
# Since each function was created in a local namespace, and the local variables
# bundled with it cannot be reached from a static context, this is shown by
# adding <locals> when it is printed.
# Something like the following might look more convenient, but it would get far
# more annoying to read when the captured local variables are as long as your
# entire terminal screen.
<function outer."foo".inner at 0x1004d9d30>
There is nothing inherently special about the <>, other than informing you that <locals> has some special meaning.
Edit:
I was not completely sure when writing my answer, but after seeing @wim's answer: <locals> applies not only to closures that capture variables from a local scope. It applies more broadly to all functions (or anything else, such as classes) created within a local namespace. So, in summary, foo.<locals>.bar just means "bar was created within the local namespace of foo".
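A quick check of that broader reading, with made-up names foo, bar and Baz: a nested function that captures nothing has no closure at all, yet its qualified name still contains <locals>, and the same goes for a class defined inside a function.

```python
def foo():
    def bar():  # captures nothing from foo
        return 42

    class Baz:  # classes get the same treatment
        pass

    return bar, Baz


bar, Baz = foo()
print(bar.__qualname__)  # foo.<locals>.bar
print(bar.__closure__)   # None: not a closure, yet still marked <locals>
print(Baz.__qualname__)  # foo.<locals>.Baz
```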
As you may know, the scope of a variable is statically determined in python (What are the rules for local and global variables in Python?).
For instance:
a = "global"
def function():
    print(a)
    a = "local"
function()
# UnboundLocalError: local variable 'a' referenced before assignment
The same rule applies to classes, but it seems to default to the global scope instead of raising an AttributeError:
a = "global"
class C():
    print(a)
    a = "local"
# 'global'
Moreover, in the case of a nested function, the behavior is the same (without using nonlocal or global):
a = "global"
def outer_func():
    a = "outer"
    def inner_func():
        print(a)
        a = "local"
    inner_func()
outer_func()
# UnboundLocalError: local variable 'a' referenced before assignment
But in the case of nested classes, it still defaults to the global scope, and not the outer scope (again without using global or nonlocal):
a = "global"
def outer_func():
    a = "outer"
    class InnerClass:
        print(a)
        a = "local"
outer_func()
# 'global'
The weirdest part is that the nested class defaults to the outer scope when a is not assigned in the class body:
a = "global"
def outer_func():
    a = "outer"
    class InnerClass:
        print(a)
outer_func()
# 'outer'
So my questions are:
Why the discrepancy between functions and classes (one raising an exception, the other defaulting to the global scope)?
In nested classes, why does the lookup fall back to the global scope, instead of the outer one, when the variable is defined later in the class body?
The answer is given in great detail in Section 9.2 of the official docs. The crux of the matter is
... On the other hand, the actual search for names is done dynamically, at run time — however, the language definition is evolving towards static name resolution, at “compile” time, so don’t rely on dynamic name resolution! (In fact, local variables are already determined statically.)
When you are in the class definition, which at the moment of its execution is the innermost scope, dynamic name resolution applies. You therefore see printouts of the global value of a.
If the name resolution were static, as in function definitions, the name a would be recognized as a local name even in the print statement. That is why you can't print a in a function before assigning to it.
The rules for class body scoping are alluded to in Section 4.2.2:
Class definition blocks and arguments to exec() and eval() are special in the context of name resolution. A class definition is an executable statement that may use and define names. These references follow the normal rules for name resolution with an exception that unbound local variables are looked up in the global namespace.
Let's parse that last sentence carefully, because it fully covers your last two examples. First off, what is an unbound local variable in this context? A class body creates a new namespace, just like entering a function. If a name is bound somewhere in a class body, it is a local variable. This is determined statically, as mentioned above. If you attempt to reference the name before it is first bound, you have an unbound local variable. Instead of raising an error, as a function call would do, Python jumps straight to the global namespace to perform the lookup, skipping any enclosing function scopes (in CPython, the builtins are still consulted if the global lookup fails). In all other cases (not local variables), normal LEGB lookup order applies.
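A sketch that records what each class body actually finds, rather than printing it (the class names are made up; the behavior shown is CPython's): a name bound later in the class body skips outer_func and comes from the globals, a name never bound there resolves to outer_func's variable, and the builtins are still reached when the global lookup fails.

```python
a = "global"


def outer_func():
    a = "outer"
    seen = []

    class Bound:
        seen.append(a)    # a is bound below, so it is local here: outer_func is skipped
        a = "local"

    class Free:
        seen.append(a)    # a is never bound here, so outer_func's a is found as usual

    class Builtin:
        seen.append(len)  # len is bound below and is not a global: builtins are checked
        len = 0

    return seen


print(outer_func())  # ['global', 'outer', <built-in function len>]
```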
This is indeed a bit counter-intuitive, and I would argue that it pushes if not outright breaks the rule of least surprise.
I seem to have misunderstood something about Python variable binding. What are the precise rules for deciding which variable is accessed given a nested scope with shadowing names?
Let me illustrate with some examples. First the basic shadow.
a = 1
def foo():
    a = 2
    def _foo():
        return a
    return _foo()
print(foo()) # -> 2
Everything is fine here. The value is overwritten and returned accordingly. However, if the value is changed after the function definition, it is still the inner value:
def bar():
    def _bar():
        return a
    a = 2
    return _bar()
print(bar()) # -> 2
What's more, defining a function that references a non-existent variable is possible.
def baz():
    def _baz():
        return b
    return _baz()
Then, if b is defined later, the function can be executed. But not if it is defined in another inner scope:
def qux(f):
    b = 3
    return f()
print(qux(baz())) # -> NameError
Now all of these cases could be explained by having Python know about lines that come later in the program, but that conflicts with my knowledge of Python being an interpreted language, advancing line by line. So are statements parsed at once instead of line by line?
A weird behaviour with shadowing class attributes throws me off a bit more.
class C:
    a = 2
    b = a
    def meth(self):
        return a
    c = meth
print(C.b, C().meth(), C.c) # -> 2 1 C.meth
Here a is defined as a class attribute and is successfully used in b, but this does not carry over to the method definition. The method itself can be used in later attributes, but not for example in other methods without going through self.
Is my guess about the binding happening all at once correct? And in that case are class bodies an exception by design, or are they not a scope at all? Or is something else going on here?
I think you might be overthinking it.
By default, variables when created are put in the narrowest enclosing function's scope.
Variables from all enclosing scopes are available in a read-only capacity, be that an enclosing function's scope or the global scope. If you try to assign to one of them, it'll create a new variable in the narrowest enclosing scope, shadowing those outside. Declaring the name with the global keyword (or nonlocal, for a variable of an enclosing function) stops this from happening and lets you assign to the outer variable instead.
Additionally, keep in mind that a function object is created at the time its def statement is executed (the body itself was already compiled to a code object before that). For nested functions, every call to the outer function executes the inner def again and creates a fresh inner function. This also means that inner functions can read the scope of the outer functions, read-only unless you use nonlocal. Same rules as usual.
Your bar() example works because, by the time python tries to access the variable a, it is present in at least one of the enclosing scopes. Python doesn't check these things until the last possible moment. Your qux() example doesn't work because the scope in which b is declared does not enclose the scope where _baz() is defined, and thus is not accessible.
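A minimal sketch of that lexical lookup, adapted from the question (it passes baz itself to qux rather than baz()): the local b inside qux is never visible to _baz, but a global b defined later is found at call time.

```python
def baz():
    def _baz():
        return b  # not local to _baz or baz, so it is looked up as a global at call time
    return _baz()


def qux(f):
    b = 3  # local to qux; baz and _baz are not nested in qux, so they cannot see it
    return f()


try:
    qux(baz)
except NameError as exc:
    print("NameError:", exc)

b = 4            # now a global b exists...
print(qux(baz))  # 4: found in the global scope where baz was defined
```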
Class scopes are weird. When the class is evaluated, all variables defined inside it are bound to the class. However, the class doesn't really count as a scope of its own, for the purpose of the methods enclosed inside it. Think of meth() as an unbound function, declared in the global scope, which C.meth refers to (and, now, C.c). Calling a function via dot notation is a syntactic shorthand:
# the following two are identical
C().meth()
C.meth(C())
and while C.meth is technically bound to C, it's not enclosed in C's class-level namespace. Trying to do C().meth() will fail when there is no a outside the class, because a is not defined with respect to the function. (Note that if a is defined in the global scope, as in your example, the method works and returns the global value: meth's enclosing scope is the global scope, not C's class-level namespace.)
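A sketch of the class case, adding a hypothetical meth_attr method for comparison: code running directly in the class body sees the class-level a, the method skips the class namespace and finds the global a, and self.a is how a method reaches the class attribute.

```python
a = 1


class C:
    a = 2
    b = a  # class body: finds the class-level a

    def meth(self):
        return a  # function scope: skips the class namespace, finds the global a

    def meth_attr(self):
        return self.a  # explicit attribute lookup finds the class attribute


print(C.b, C().meth(), C().meth_attr())  # 2 1 2
```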
Take the following code sample
var = True
def func1():
    if var:
        print("True")
    else:
        print("False")
    # var = True
func1()
This prints True as one would expect.
However, if I uncomment # var = True, I get the error
UnboundLocalError: local variable 'var' referenced before assignment
Why does writing to a variable make an otherwise accessible variable inaccessible? What was the rationale behind this design choice?
Note I know how to solve it (with the global keyword). My question is why was it decided to behave this way.
Because:
Namespaces exist: the same variable name can be used at module level and inside functions, and have nothing to do with each other.
Python does not require variables to be declared, for ease of use
There still needs to be a way to distinguish between local and global variables
In cases where there is likely unexpected behavior, it is better to throw an error than to silently accept it
So Python chose the rule "if a variable name is assigned to within a function, then that name refers to a local variable" (because if it's never assigned to, it clearly isn't local as it never gets a value).
Your code could have been interpreted as using the module-level variable first (in the if: line), and then using the local variable later for the assignment. But, that will very often not be the expected behavior. So Guido decided that Python would not work like that, and throw the error instead.
Python defaults to implicit variable declaration via assignment, in order to remove the need for additional explicit declarations. Just "implicit declaration" leaves several options what assignment in nested scopes means, most prominently:
Assignment always declares a variable in the inner-most scope.
Assignment always declares a variable in the outer-most scope.
Assignment declares a variable in the inner-most scope, unless declared in any outer scope.
Assignment declares a variable in the inner-most scope, readable only after assignment.
The latter two options mean that a variable does not have a scope well-defined just by the assignment itself. They are "declaration via assignment + X" which can lead to unintended interactions between unrelated code.
That leaves the decision of whether "writing to a variable" should preferably happen to isolated local or shared global variables.
The Python designers consider it more important to explicitly mark writing to a global variable.
Python FAQ: Why am I getting an UnboundLocalError when the variable has a value?
[...]
This explicit declaration is required in order to remind you that (...) you are actually modifying the value of the variable in the outer scope
This is an intentional asymmetry towards purely reading globals, which is considered proper practice.
Python FAQ: What are the rules for local and global variables in Python?
[...]
On one hand, requiring global for assigned variables provides a bar against unintended side-effects. On the other hand, if global was required for all global references, you’d be using global all the time.
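A minimal illustration of that asymmetry (counter, read_only and increment are made-up names): reading a global needs no declaration, while rebinding it needs global.

```python
counter = 0


def read_only():
    return counter  # plain read: no declaration needed


def increment():
    global counter  # without this line, counter += 1 raises UnboundLocalError
    counter += 1


increment()
increment()
print(read_only())  # 2
```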
This is described in section 4.2.2 Resolution of names
When a name is not found at all, a NameError exception is raised. If the current scope is a function scope, and the name refers to a local variable that has not yet been bound to a value at the point where the name is used, an UnboundLocalError exception is raised. UnboundLocalError is a subclass of NameError.
If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. This can lead to errors when a name is used within a block before it is bound. This rule is subtle. Python lacks declarations and allows name binding operations to occur anywhere within a code block. The local variables of a code block can be determined by scanning the entire text of the block for name binding operations.
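That static scan can be observed in CPython: before func1 from the question (with the assignment uncommented) ever runs, its code object already lists var among its local variable names, which is why the very first read fails.

```python
var = True


def func1():
    if var:
        print("True")
    else:
        print("False")
    var = True  # this assignment makes var local for the whole function body


# The compiler decided this before func1 was ever called:
print(func1.__code__.co_varnames)  # ('var',)

try:
    func1()
except UnboundLocalError as exc:  # a subclass of NameError
    print(type(exc).__name__, exc)
```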
If a variable name defined in the outer scope is used in a nested scope, it depends on what you do with it in this nested scope:
If you only read a variable, it is the same variable.
If you write to a variable, then Python automatically creates a new, local variable, different from the one in the outer scope.
This local variable prevents access to a variable with the same name in the outer scope.
So writing to a variable doesn't change its scope; it creates a different, local variable.
You are not able to read this local variable before assigning to it.
This is almost the same as the question "Where are the local, global, static, auto, register, extern, const, volatile variables stored?"; the difference is that this thread asks how the Python language implements them.
Of all those, Python only has "local", "global" and "nonlocal" variables.
Some of those are stored in a Dictionary or dictionary like object, which usually can be explicitly addressed.
"global": "Global" variables are actually global relative to the module where they are defined. They are sometimes called "module-level" variables instead, since most of the evils of global variables in C do not apply: there are no naming conflicts between modules, and it is always clear which module a name came from.
Their values are stored in the dictionary available as the __dict__ attribute of the module object. It is important to note that all names in a module are stored this way, since names in Python can point to any kind of object: there is no distinction, at the language level, between "variables", functions, or classes in a module. The names for all these objects are keys in the module's __dict__, which the language accesses directly. One can even insert or change the objects that module-level names point to at run time with the usual setattr, or by changing the module's __dict__ directly.
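A sketch of this, using types.ModuleType to build a throwaway module called demo (an arbitrary name) instead of importing a real file: the module's function reads its names from the module __dict__ at call time, so setattr and direct dict edits both take effect.

```python
import types

# A throwaway module object created on the fly; "demo" is an arbitrary name
demo = types.ModuleType("demo")
exec("x = 1\ndef f():\n    return x", demo.__dict__)

# Every module-level name, variable or function alike, is a key in __dict__
print(sorted(k for k in demo.__dict__ if not k.startswith("__")))  # ['f', 'x']

setattr(demo, "x", 42)  # rebind a module-level name from outside
print(demo.f())         # 42: f looks x up in demo.__dict__ when it runs

demo.__dict__["x"] = 7  # editing the dict directly has the same effect
print(demo.f())         # 7
```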
"local": Local variables are available to "user code" in a dictionary returned by the built-in locals() function. This dictionary is also referenced by the f_locals attribute of the frame currently being run. Since there are ways of retrieving the frames of the functions that called the currently running code, one can read the variables of those functions through their f_locals attribute. However, in CPython, changing a value in the f_locals dictionary won't be reflected in the actual variable values of the running code, because those values are kept in fast local slots used by the bytecode. (Python 3.13 changed this: under PEP 667, a function frame's f_locals is a write-through proxy.)
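A sketch of reading a caller's local through its frame, with hypothetical caller and callee functions. inspect.currentframe() depends on Python stack frame support and may return None on implementations that lack it, so treat this as CPython-oriented.

```python
import inspect


def callee():
    frame = inspect.currentframe()  # callee's own frame
    try:
        # f_back is the frame of whoever called us; read one of its locals
        return frame.f_back.f_locals["secret"]
    finally:
        del frame  # drop the frame reference to avoid a reference cycle


def caller():
    secret = "found via f_locals"
    return callee()


print(caller())  # found via f_locals
```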
"nonlocal": Variables are special references, inside a function, to variables defined in an outer scope, in the case of functions (or other code, like a class body) defined inside a function. They can be retrieved in running code through the func_closure attribute, which is a tuple of "cell" objects. For example, to retrieve the value of the first nonlocal variable of a function object, one does:
function.func_closure[0].cell_contents. The values are kept separate from the variable names, which can be retrieved as function.func_code.co_freevars. (This naming scheme is valid for Python 2.x; in Python 3 the attributes are function.__closure__ and function.__code__.)
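A Python 3 version of the same introspection, with a made-up outer function capturing two variables: the closure cells line up with __code__.co_freevars, and inspect.getclosurevars() does the pairing for you.

```python
import inspect


def outer():
    x = 10
    y = "ten"

    def inner():
        return x, y

    return inner


fn = outer()
names = fn.__code__.co_freevars                           # ('x', 'y')
values = [cell.cell_contents for cell in fn.__closure__]  # same order as co_freevars
print(dict(zip(names, values)))                           # {'x': 10, 'y': 'ten'}
print(inspect.getclosurevars(fn).nonlocals)               # {'x': 10, 'y': 'ten'}
```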
The bottom-line: Variable "values" are always kept inside objects that are compatible with Python objects and managed by the virtual machine. Some of these data can be made programmatically accessible through introspection - some of it is opaque. (For example, retrieving, through introspection, nonlocal variables from inside the function that owns them itself is a bit tricky)