Behaviour discrepancy between classes and functions scope in Python


As you may know, the scope of a variable is statically determined in python (What are the rules for local and global variables in Python?).
For instance :
a = "global"
def function():
    print(a)
    a = "local"
function()
# UnboundLocalError: local variable 'a' referenced before assignment
The same rule applies to classes, but it seems to default to the global scope instead of raising an AttributeError:
a = "global"
class C():
    print(a)
    a = "local"
# 'global'
Moreover, in the case of a nested function, the behavior is the same (without using nonlocal or global) :
a = "global"
def outer_func():
    a = "outer"
    def inner_func():
        print(a)
        a = "local"
    inner_func()
outer_func()
# UnboundLocalError: local variable 'a' referenced before assignment
But in the case of nested classes, it still defaults to the global scope, and not the outer scope (again without using global or nonlocal) :
a = "global"
def outer_func():
    a = "outer"
    class InnerClass:
        print(a)
        a = "local"
outer_func()
# 'global'
The weirdest part is that the nested class defaults to the outer scope when there is no declaration of a:
a = "global"
def outer_func():
    a = "outer"
    class InnerClass:
        print(a)
outer_func()
# 'outer'
So my questions are:
Why is there a discrepancy between functions and classes (one raising an exception, the other defaulting to the global scope)?
In nested classes, why does the lookup default to the global scope instead of continuing to use the outer one when the variable is defined later in the class body?

The answer is given in great detail in Section 9.2 of the official docs. The crux of the matter is
... On the other hand, the actual search for names is done dynamically, at run time — however, the language definition is evolving towards static name resolution, at “compile” time, so don’t rely on dynamic name resolution! (In fact, local variables are already determined statically.)
When you are in the class definition, which at the moment of its execution is the innermost scope, dynamic name resolution applies. You therefore see printouts of the global value of a.
If the name resolution were static, as in function definitions, the name a would be recognized as a local name even in the print statement. That is why you can't print a in a function before assigning to it.
The rules for class body scoping are alluded to in Section 4.2.2:
Class definition blocks and arguments to exec() and eval() are special in the context of name resolution. A class definition is an executable statement that may use and define names. These references follow the normal rules for name resolution with an exception that unbound local variables are looked up in the global namespace.
Let's parse that last sentence carefully, because it fully covers your last two examples. First off, what is an unbound local variable in this context? A class body creates a new namespace, just like entering a function. If a name is bound somewhere in a class body, it is a local variable. This is determined statically, as mentioned above. If you attempt to reference the name before it is first bound, you have an unbound local variable. Instead of raising an error, as a function call would do, Python jumps straight to the global namespace to perform the lookup, skipping any enclosing function scopes (builtins are still consulted if the global lookup fails). In all other cases (not local variables), normal LEGB lookup order applies.
This is indeed a bit counter-intuitive, and I would argue that it strains, if not outright breaks, the principle of least surprise.
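As a rough illustration of the class-body rule quoted above, the following sketch puts the two nested-class cases side by side and adds a check against a builtin name. The class names Bound, Unbound and Shadow are made up for this example.

```python
a = "global"

def outer_func():
    a = "outer"

    class Bound:
        seen = a      # 'a' is bound later in this body, so it is local to the class;
        a = "local"   # the unbound lookup above skips outer_func and reads the global

    class Unbound:
        seen = a      # 'a' is never bound here, so the enclosing scope is used

    return Bound.seen, Unbound.seen

bound_seen, unbound_seen = outer_func()
print(bound_seen, unbound_seen)  # global outer

class Shadow:
    before = len("abc")  # 'len' is bound below, yet this lookup still finds the builtin
    len = None

print(Shadow.before)  # 3
```

The Shadow case suggests that the fallback for an unbound class-level name continues past the module globals into builtins.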

Related

Precise rules of variable binding in nested scopes [duplicate]

I seem to have misunderstood something about Python variable binding. What are the precise rules for deciding which variable is accessed given a nested scope with shadowing names?
Let me illustrate with some examples. First the basic shadow.
a = 1
def foo():
    a = 2
    def _foo():
        return a
    return _foo()
print(foo()) # -> 2
Everything is fine here. The value is overwritten and returned accordingly. However, if the value is changed after the function definition, it is still the inner value:
def bar():
    def _bar():
        return a
    a = 2
    return _bar()
print(bar()) # -> 2
What's more, defining a function that references a non-existent variable is possible.
def baz():
    def _baz():
        return b
    return _baz()
Then, if b is defined later, the function can be executed. But not if it is defined in another inner scope:
def qux(f):
    b = 3
    return f()
print(qux(baz())) # -> NameError
Now all of these cases could be explained by having Python know about lines that come later in the program, but that conflicts with my knowledge of Python being an interpreted language, advancing line by line. So are statements parsed at once instead of line by line?
A weird behaviour with shadowing class attributes throws me off a bit more.
class C:
    a = 2
    b = a
    def meth(self):
        return a
    c = meth
print(C.b, C().meth(), C.c) # -> 2 1 C.meth
Here a is defined as a class attribute and is successfully used in b, but this does not carry over to the method definition. The method itself can be used in later attributes, but not for example in other methods without going through self.
Is my guess about the binding happening all at once correct? And in that case are class bodies an exception by design, or are they not a scope at all? Or is something else going on here?

I think you might be overthinking it.
By default, variables when created are put in the narrowest enclosing function's scope.
Variables from all enclosing scopes are available in a read-only capacity, be that an enclosing function's scope or the global scope. If you try to assign to this, it'll create a new variable in the narrowest enclosing scope, shadowing those outside. Using the global keyword to bring an external variable into the local scope will stop this from happening, allowing you to assign things to the non-local scope.
Additionally, keep in mind that a function object is created when its def statement is executed. For nested functions this means that every call to the outer function creates the inner functions anew (the bodies themselves are compiled only once). This also means that inner functions have read-only access to the scope of the outer functions. Same rules as usual.
Your bar() example works because, by the time python tries to access the variable a, it is present in at least one of the enclosing scopes. Python doesn't check these things until the last possible moment. Your qux() example doesn't work because the scope in which b is declared does not enclose the scope where _baz() is defined, and thus is not accessible.
Class scopes are weird. When the class is evaluated, all variables defined inside it are bound to the class. However, the class doesn't really count as a scope of its own, for the purpose of the methods enclosed inside it. Think of meth() as an unbound function, declared in the global scope, which C.meth refers to (and, now, C.c). Calling a function via dot notation is a syntactic shorthand:
# the following two are identical
C().meth()
C.meth(C())
and while C.meth is technically bound to C, it's not enclosed in C's class-level namespace. Trying to do C().meth() will fail, because a is not defined with respect to the function. (note that if a is defined in the global scope, the function will work as expected - C.meth() has the global scope as a parent, not C's class-level scope).
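A small sketch of the point about class scopes, reusing the names from the question plus one hypothetical method, meth_via_self, added for contrast:

```python
a = 1

class C:
    a = 2
    b = a              # class body: reads the class-level 'a'

    def meth(self):
        return a       # method body: the class namespace is skipped, so this is the global 'a'

    def meth_via_self(self):
        return self.a  # attribute lookup does reach the class attribute

obj = C()
print(C.b, obj.meth(), obj.meth_via_self())  # 2 1 2
```

Going through self (or C) is the usual way for a method to reach class-level names.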

Why does writing to a variable change its scope?

Take the following code sample
var = True
def func1():
    if var:
        print("True")
    else:
        print("False")
    # var = True
func1()
This prints True as one would expect.
However, if I uncomment # var = True, I get the error
UnboundLocalError: local variable 'var' referenced before assignment
Why does writing to a variable make an otherwise accessible variable inaccessible? What was the rationale behind this design choice?
Note I know how to solve it (with the global keyword). My question is why was it decided to behave this way.

Because:
Namespaces exist: the same variable name can be used at module level and inside functions, and have nothing to do with each other.
Python does not require variables to be declared, for ease of use
There still needs to be a way to distinguish between local and global variables
In cases where there is likely unexpected behavior, it is better to throw an error than to silently accept it
So Python chose the rule "if a variable name is assigned to within a function, then that name refers to a local variable" (because if it's never assigned to, it clearly isn't local as it never gets a value).
Your code could have been interpreted as using the module-level variable first (in the if: line), and then using the local variable later for the assignment. But, that will very often not be the expected behavior. So Guido decided that Python would not work like that, and throw the error instead.

Python defaults to implicit variable declaration via assignment, in order to remove the need for additional explicit declarations. Just "implicit declaration" leaves several options what assignment in nested scopes means, most prominently:
Assignment always declares a variable in the inner-most scope.
Assignment always declares a variable in the outer-most scope.
Assignment declares a variable in the inner-most scope, unless declared in any outer scope.
Assignment declares a variable in the inner-most scope, readable only after assignment.
The latter two options mean that a variable does not have a scope well-defined just by the assignment itself. They are "declaration via assignment + X" which can lead to unintended interactions between unrelated code.
That leaves the decision of whether "writing to a variable" should preferably happen to isolated local or shared global variables.
The Python designers consider it more important to explicitly mark writing to a global variable.
Python FAQ: Why am I getting an UnboundLocalError when the variable has a value?
[...]
This explicit declaration is required in order to remind you that (...) you are actually modifying the value of the variable in the outer scope
This is an intentional asymmetry towards purely reading globals, which is considered proper practice.
Python FAQ: What are the rules for local and global variables in Python?
[...]
On one hand, requiring global for assigned variables provides a bar against unintended side-effects. On the other hand, if global was required for all global references, you’d be using global all the time.

This is described in section 4.2.2 Resolution of names
When a name is not found at all, a NameError exception is raised. If the current scope is a function scope, and the name refers to a local variable that has not yet been bound to a value at the point where the name is used, an UnboundLocalError exception is raised. UnboundLocalError is a subclass of NameError.
If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. This can lead to errors when a name is used within a block before it is bound. This rule is subtle. Python lacks declarations and allows name binding operations to occur anywhere within a code block. The local variables of a code block can be determined by scanning the entire text of the block for name binding operations.
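One way to see this static decision, at least in CPython, is to inspect the compiled code object. The function names below are hypothetical, and co_varnames and co_names are CPython code-object attributes rather than part of the language definition.

```python
var = True

def reads_only():
    return var       # never assigned in this block: compiled as a global lookup

def assigns_later():
    value = var      # 'var' is assigned below, so it is local throughout the block
    var = False
    return value

# Names the compiler treats as local variables of each function
print("var" in reads_only.__code__.co_varnames)     # False
print("var" in assigns_later.__code__.co_varnames)  # True

try:
    assigns_later()
except UnboundLocalError as exc:
    error = exc      # raised at the read, before the assignment ever runs
```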

If a variable name defined in the outer scope is used in a nested scope, it depends on what you do with it in this nested scope:
If you only read a variable, it is the same variable.
If you write to a variable, then Python automatically creates a new, local variable, different from the one in the outer scope.
This local variable prevents access to a variable with the same name in the outer scope.
So writing to a variable doesn't change its scope; it creates a different, local variable.
You are not able to read this local variable before assigning to it.

Loose late binding v. strict late binding

While reading Python’s Execution model documentation, I realized that Python’s free variables do not seem to have a strict late binding property where a name binding occurring in any code block can be used for name resolution. Indeed, executing:
def f():
    return x
def g():
    x = 0
    return f()
print(g())
raises:
NameError: name 'x' is not defined
They have rather a loose late binding property where only a name binding occurring in an outer code block of the code block introducing the free variable can be used for name resolution. Indeed executing
def f():
    return x
x = 0
print(f())
prints:
0
What are the benefits and drawbacks of the loose late binding property compared to the strict late binding property?

This is generally known as dynamic scoping and static scoping. Roughly speaking, dynamic scoping determines scope by call nesting and static scoping determines scope by declaration nesting.
In general, dynamic scoping is very easy to implement for any language with a call stack – a name lookup simply searches the current stack linearly. In contrast, static scoping is more complex, requiring several distinct scopes with their own lifetime.
However, static scoping is generally easier to understand, since the scope of a variable never changes – a name lookup has to be resolved once and will always point to the same scope. In contrast, dynamic scoping is more brittle, with names being resolved in different or no scope when calling a function.
Python's scoping rules are mainly defined by PEP 227 introducing nested scoping ("closures") and PEP 3104 introducing writable nested scoping (nonlocal). The primary use-case of such static scoping is to allow higher-order functions ("function-producing-function") to automatically parameterise inner functions; this is commonly used for callbacks, decorators or factory-functions.
def adder(base=0): # factory function returns a new, parameterised function
    def add(x):
        return base + x # inner function is implicitly parameterised by base
    return add
Both PEPs codify how Python handles the complications of static scoping. Specifically, scope is resolved once at compile time – every name is thereafter strictly either global, nonlocal or local. In return, static scoping allows variable access to be optimised – variables are read either from a fast array of locals, an indirecting array of closure cells, or a slow global dictionary.
An artefact of this statically scoped name resolution is UnboundLocalError : a name may be scoped locally but not yet assigned locally. Even though there is some value assigned to the name somewhere, static scoping forbids accessing it.
>>> some_name = 42
>>> def ask():
...     print("the answer is", some_name)
...     some_name = 13
...
>>> ask()
UnboundLocalError: local variable 'some_name' referenced before assignment
Various means exist to circumvent this, but they all come down to the programmer having to explicitly define how to resolve a name.
While Python does not natively implement dynamic scoping, it can easily be emulated. Since dynamic scoping is identical to a stack of scopes for each stack of calls, this can be implemented explicitly.
Python natively provides threading.local to contextualise a variable to each call stack. Similarly, contextvars allows a variable to be contextualised explicitly – this is useful for e.g. async code which sidesteps the regular call stack. A naive dynamic scope for threads can be built as a literal scope stack that is thread local:
import contextlib
import threading
class DynamicScope(threading.local): # instance data is local to each thread
    """Dynamic scope that supports assignment via a context manager"""
    def __init__(self):
        super().__setattr__('_scopes', []) # keep stack of scopes
    @contextlib.contextmanager # a context enforces pairs of set/unset operations
    def assign(self, **names):
        self._scopes.append(names) # push new assignments to stack
        try:
            yield self # suspend to allow calling other functions
        finally:
            self._scopes.pop() # clear new assignments from stack, even on error
    def __getattr__(self, item):
        for sub_scope in reversed(self._scopes): # linearly search through scopes
            try:
                return sub_scope[item]
            except KeyError:
                pass
        raise NameError(f"name {item!r} not dynamically defined")
    def __setattr__(self, key, value):
        raise TypeError(f'{self.__class__.__name__!r} does not support assignment')
This allows to globally define a dynamic scope, to which a name can be assigned for a restricted duration. Assigned names are automatically visible in called functions.
scope = DynamicScope()
def print_answer():
    print(scope.answer) # read from scope and hope something is assigned
def guess_answer():
    # assign to scope before calling function that uses the scope
    with scope.assign(answer=42):
        print_answer()
with scope.assign(answer=13):
    print_answer() # 13
    guess_answer() # 42
    print_answer() # 13
print_answer() # NameError: name 'answer' not dynamically defined
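For comparison, here is a minimal sketch of the same idea built on contextvars rather than threading.local. The helper assign and the variable answer are hypothetical names for this example; ContextVar.set, ContextVar.reset and ContextVar.get are the standard library calls doing the work.

```python
import contextlib
import contextvars

# One context variable stands in for a single dynamically scoped name
answer = contextvars.ContextVar("answer")

@contextlib.contextmanager
def assign(var, value):
    token = var.set(value)   # remember the previous state of the variable
    try:
        yield
    finally:
        var.reset(token)     # restore it when the block exits

def read_answer():
    return answer.get()      # resolved from whatever the caller's context holds

with assign(answer, 13):
    outer = read_answer()
    with assign(answer, 42):
        inner = read_answer()
    restored = read_answer()

outside = answer.get(None)   # no value is set outside the blocks
print(outer, inner, restored, outside)  # 13 42 13 None
```

Because each asyncio task runs in its own copy of the context, this variant should also behave sensibly in async code, which the thread-local version does not cover.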

Static (Early) and Dynamic (Late) Binding :
Binding refers to the association of names in program text to the storage locations to which they refer. In static binding, this association is predetermined at build time. With dynamic binding, this association is not determined until run-time.
Dynamic binding is the binding which happens in Python. This means that the Python interpreter does binding only as code runs. For example -
>>> if False:
...     x # This line never runs, so no error is raised
... else:
...     1 + 2
...
3
>>>
Advantages of dynamic binding
The main advantage of dynamic type binding is flexibility: it is easier to write generic code.
For example, a program that processes a list of data can be written generically in a language that uses dynamic type binding.
Disadvantages of dynamic binding
The error detection capability of the compiler is diminished: some errors that a compiler could have caught only surface at run time.
There is considerable overhead at run time.

Read-only Namespace in Python

What does read-only mean in the context of names?
In [1]: def outer():
   ...:     x = 1
   ...:     def inner():
   ...:         print(x)
   ...:     inner()
   ...:
In [2]: outer()
1
As in the above example, x is not in the namespace of inner(). Do variables in namespaces have types such as read-only/writeable etc?
Quoting official docs: "To rebind variables found outside of the innermost
scope, the nonlocal statement can be used; if not declared nonlocal,
those variables are read-only (an attempt to write to such a variable
will simply create a new local variable in the innermost scope,
leaving the identically named outer variable unchanged)."
Reference: https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces (fifth-to-last paragraph)

No, variables don't have 'properties' like read-only or write. The namespace implementation may disallow direct manipulation, but that's something entirely different.
The nonlocal and global statements let you override the scope of a variable; without these Python will make a name local if it is being bound to. Binding actions include assignment, for loop, with .. as and except .. as statement targets, function arguments and imports. Changing the scope doesn't alter if a variable is read-only or not, it merely changes the scope that manages the variable. nonlocal means it'll be stored in a parent scope and made a closure.
The term read-only used by the quote you found is very misleading; assignment clearly still alters the local variable. The presence of a local variable doesn't make the same name in the parent scope non-writable; that name is simply not visible in the innermost scope. It's disappointing that this is from the official tutorial; I've reported a bug to see this corrected.
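A short sketch of the difference between shadowing and rebinding, with hypothetical inner functions shadow and rebind:

```python
def outer():
    x = 1

    def shadow():
        x = 2        # binds a new local 'x'; outer's 'x' is untouched
        return x

    def rebind():
        nonlocal x   # 'x' now refers to outer's variable
        x = 3

    results = [x]
    shadow()
    results.append(x)
    rebind()
    results.append(x)
    return results

print(outer())  # [1, 1, 3]
```

Nothing about outer's x is read-only: the plain assignment in shadow simply targets a different variable.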

Immediately enclosing namespace of a top-level function (for the purpose of parsing nonlocal)? [duplicate]

In Python 3.3.1, this works:
i = 76
def A():
    global i
    i += 10
print(i) # 76
A()
print(i) # 86
This also works:
def enclosing_function():
    i = 76
    def A():
        nonlocal i
        i += 10
    print(i) # 76
    A()
    print(i) # 86
enclosing_function()
But this doesn't work:
i = 76
def A():
    nonlocal i # "SyntaxError: no binding for nonlocal 'i' found"
    i += 10
print(i)
A()
print(i)
The documentation for the nonlocal keyword states (emphasis added):
The nonlocal statement causes the listed identifiers to refer to
previously bound variables in the nearest enclosing scope.
In the third example, the "nearest enclosing scope" just happens to be the global scope. So why doesn't it work?
PLEASE READ THIS BIT
I do notice that the documentation goes on to state (emphasis added):
The [nonlocal] statement allows encapsulated code to
rebind variables outside of the local scope besides the global
(module) scope.
but, strictly speaking, this doesn't mean that what I'm doing in the third example shouldn't work.

The search order for names is LEGB, i.e Local, Enclosing, Global, Builtin. So the global scope is not an enclosing scope.
EDIT
From the docs:
The nonlocal statement causes the listed identifiers to refer to
previously bound variables in the nearest enclosing scope. This is
important because the default behavior for binding is to search the
local namespace first. The statement allows encapsulated code to
rebind variables outside of the local scope besides the global
(module) scope.

why is a module's scope considered global and not an enclosing one? It's still not global to other modules (well, unless you do from module import *), is it?
If you put some name into module's namespace; it is visible in any module that uses module i.e., it is global for the whole Python process.
In general, your application should use as few mutable globals as possible. See Why globals are bad?:
Non-locality
No Access Control or Constraint Checking
Implicit coupling
Concurrency issues
Namespace pollution
Testing and Confinement
Therefore it would be bad if nonlocal allowed globals to be created by accident. If you want to modify a global variable, you can use the global keyword directly.
global is the most destructive: may affect all uses of the module anywhere in the program
nonlocal is less destructive: limited by the outer() function scope (the binding is checked at compile time)
no declaration (local variable) is the least destructive option: limited by inner() function scope
You can read about the history and motivation behind nonlocal in PEP 3104: Access to Names in Outer Scopes.

It depends upon the Boundary cases:
nonlocal names come with some sensitive areas which we need to be aware of. First, unlike the global statement, nonlocal names really must have previously been assigned in an enclosing def's scope when a nonlocal is evaluated, or else you'll get an error: you cannot create them dynamically by assigning them anew in the enclosing scope. In fact, they are checked at function definition time, before either the enclosing or the nested function is called.
>>> def tester(start):
...     def nested(label):
...         nonlocal state # nonlocals must already exist in enclosing def!
...         state = 0
...         print(label, state)
...     return nested
...
SyntaxError: no binding for nonlocal 'state' found
>>> def tester(start):
...     def nested(label):
...         global state # Globals don't have to exist yet when declared
...         state = 0 # This creates the name in the module now
...         print(label, state)
...     return nested
...
>>> F = tester(0)
>>> F('abc')
abc 0
>>> state
0
Second, nonlocal restricts the scope lookup to just enclosing defs; nonlocals are not looked up in the enclosing module's global scope or the built-in scope outside all def's, even if they are already there:
for example:-
>>> spam = 99
>>> def tester():
...     def nested():
...         nonlocal spam # Must be in a def, not the module!
...         print('current=', spam)
...         spam += 1
...     return nested
...
SyntaxError: no binding for nonlocal 'spam' found
These restrictions make sense once you realize that Python would not otherwise generally know which enclosing scope to create a brand-new name in. In the prior listing, should spam be assigned in tester, or the module outside? Because this is ambiguous, Python must resolve nonlocals at function creation time, not function call time.
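To see how early the check happens, the snippet below compiles the listing from a string without running it. The variable names bad_source and good_source and the file labels passed to compile are made up for this sketch.

```python
# Module source held in a string so the failure can be caught
bad_source = '''
spam = 99
def tester():
    def nested():
        nonlocal spam   # only a module-level 'spam' exists
        spam += 1
    return nested
'''

# Same code, but declaring the name global instead
good_source = bad_source.replace("nonlocal spam", "global spam")

try:
    compile(bad_source, "<nonlocal-example>", "exec")
except SyntaxError as exc:
    message = exc.msg    # rejected by the compiler; nothing was executed
else:
    message = None

code = compile(good_source, "<global-example>", "exec")  # accepted
print(message)
```

The error comes from compile alone, so in CPython at least the binding is checked before tester or nested could ever be called.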

The answer is that the global scope does not enclose anything - it is global to everything. Use the global keyword in such a case.

Historical reasons
In 2.x, nonlocal didn't exist yet. It wasn't considered necessary to be able to modify enclosing, non-global scopes; the global scope was seen as a special case. After all, the concept of a "global variable" is a lot easier to explain than lexical closures.
The global scope works differently
Because functions are objects, and in particular because a nested function could be returned from its enclosing function (producing an object that persists after the call to the enclosing function), Python needs to implement lookup into enclosing scopes differently from lookup into either local or global scopes. Specifically, in the reference implementation of 3.x, Python will attach a __closure__ attribute to the inner function, which is a tuple of cell instances that work like references (in the C++ sense) to the closed-over variables. (These are also references in the reference-counting garbage-collection sense; they keep the call frame data alive so that it can be accessed after the enclosing function returns.)
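The closure cells described here can be inspected directly in CPython. This sketch uses a hypothetical make_counter factory; `__closure__`, cell_contents and co_freevars are the standard attributes.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment

counter = make_counter()

free_names = counter.__code__.co_freevars  # names resolved through closure cells
cells = counter.__closure__                # one cell per free variable
before = cells[0].cell_contents            # value kept alive after make_counter returned
counter()
after = cells[0].cell_contents             # the same cell now holds the rebound value

print(free_names, before, after)  # ('count',) 0 1
```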
By contrast, global lookup works by doing a chained dictionary lookup: there's a dictionary that implements the global scope, and if that fails, a separate dictionary for the builtin scope is checked. (Of course, writing a global only writes to the global dict, not the builtin dict; there is no builtin keyword.)
Theoretically, of course, there's no reason why the implementation of nonlocal couldn't fall back on a lookup in the global (and then builtin) scope, in the same way that a lookup in the global scope falls back to builtins. Stack Overflow is not the right place to speculate on the reason behind the design decision. I can't find anything relevant in the PEP, so it may simply not have been considered.
The best I can offer is: like with local variable lookup, nonlocal lookup works by determining at compile time what the scope of the variable will be. If you consider builtins as simply pre-defined, shadow-able globals (i.e. the only real difference between the actual implementation and just dumping them into the global scope ahead of time, is that you can recover access to the builtin with del), then so does global lookup. As they say, "simple is better than complex" and "special cases aren't special enough to break the rules"; so, no fallback behaviour.
