I am trying to understand how variables are managed internally by Python.
x = 10
def ex1():
    if False:
        x = 1
    print(x)
ex1()
When ex1() is executed, it raises UnboundLocalError: local variable 'x' referenced before assignment.
How does this happen?
Does parsing happen in an initial pass that just creates the symbol table and determines scope, followed by interpretation in a second pass that skips x=1 because it is unreachable?
Python doesn't have variable declarations, so when variables are created or used it has to determine their scope itself. Python scoping is lexical: a function can read variables from its enclosing scopes, but it cannot rebind them unless it declares them with global (or nonlocal).
Because of the assignment x=1, x is local to ex1(), even though that line never runs. When you run ex1(), print(x) tries to read the local x rather than the global x = 10. Since the local x was never assigned, it raises your UnboundLocalError.
So the variable is managed like this: the compiler sees an assignment inside the function and treats x as local for the whole function body. At run time that local x and the global x are unrelated, so the global value is never used.
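You can check that the scope is decided before the function runs by inspecting the function's code object. The sketch below assumes CPython 3. The wording of the error message differs between versions, so the sketch records only the exception type. It shows that x is already listed as a local name of ex1 before the call:

```python
x = 10

def ex1():
    if False:
        x = 1  # never executed, but still makes x local to ex1
    print(x)

# Before ex1 is ever called, the compiler has recorded x as a local name.
print(ex1.__code__.co_varnames)  # ('x',)

try:
    ex1()
    outcome = "ok"
except UnboundLocalError:
    # The local x was never given a value, so reading it fails;
    # the global x = 10 is never consulted.
    outcome = "UnboundLocalError"
print(outcome)  # UnboundLocalError
```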
Conceptually this makes sense. I can't tell how it's implemented, but I can tell why.
When you assign to a variable, the assignment happens in the local scope unless you explicitly say otherwise with the global keyword. If you only read the variable and never assign to it, Python implicitly uses the global variable, since no local variable is defined.
x = 10
def access_global():
    print(x)
def affect_local():
    x = 0
    print(x)
def affect_global():
    global x
    x = 1
    print(x)
access_global()  # 10
affect_local()  # 0
print(x)  # 10
affect_global()  # 1
print(x)  # 1
If you do this inside a nested function, class or module the rules are similar:
def main():
    y = 10
    def access():
        print(y)
    def affect():
        y = 0
        print(y)
    access()  # 10
    affect()  # 0
    print(y)  # 10
main()
This probably saves hours of painful debugging, since variables of parent scopes are never overwritten unless you explicitly say so.
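For the nested-function case, Python 3 also provides nonlocal, which works like global but targets the nearest enclosing function. The sketch below is for Python 3 and loosely reuses the main/affect naming from the example above. The function affect_enclosing is a hypothetical addition for illustration:

```python
def main():
    y = 10

    def affect():
        y = 0  # creates a new local y; main's y is untouched
        return y

    def affect_enclosing():
        nonlocal y  # rebind main's y instead of creating a local one
        y = 0
        return y

    results = [affect(), y]             # [0, 10]
    results += [affect_enclosing(), y]  # main's y is now 0 as well
    return results

print(main())  # [0, 10, 0, 0]
```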
EDIT
Disassembling the Python bytecode gives some additional insight:
import dis
x = 10
def local():
    if False:
        x = 1
def global_():
    global x
    x = 1
print(local)
dis.dis(local)
print(global_)
dis.dis(global_)
<function local at 0x7fa01ec6cde8>
37 0 LOAD_GLOBAL 0 (False)
3 POP_JUMP_IF_FALSE 15
38 6 LOAD_CONST 1 (1)
9 STORE_FAST 0 (x)
12 JUMP_FORWARD 0 (to 15)
>> 15 LOAD_CONST 0 (None)
18 RETURN_VALUE
<function global_ at 0x7fa01ec6ce60>
42 0 LOAD_CONST 1 (1)
3 STORE_GLOBAL 0 (x)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
We can see that the bytecode for the local function uses STORE_FAST, even though that assignment can never run, while the global_ function uses STORE_GLOBAL.
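On Python 3 you can make the same comparison programmatically with dis.get_instructions. There is a version difference to keep in mind. Recent CPython releases may remove the unreachable `if False:` body from the bytecode entirely, so this sketch uses a reachable assignment instead. The helper store_ops is a hypothetical name used only for illustration:

```python
import dis

x = 10

def local():
    x = 1  # assignment without a declaration -> local variable

def global_():
    global x
    x = 1  # assignment after `global x` -> module-level variable

def store_ops(func):
    """Return the set of STORE_* opcode names used by func (illustrative helper)."""
    return {ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("STORE_")}

print(store_ops(local))    # {'STORE_FAST'}
print(store_ops(global_))  # {'STORE_GLOBAL'}
```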
This question also explains why it is more efficient to compile a function to bytecode once, rather than compiling it every time the function is called:
Why python compile the source to bytecode before interpreting?
Related
def f():
    print("Before", locals())  # line 2
    print(x)  # line 3
    x = 2  # line 4
    print("After", locals())  # line 5
x = 1
f()
I am aware of the LEGB rule for scoping in Python.
For the above code, when I comment out line 4, everything executes as expected. For line 3, Python does not find the variable x in the local scope, so it searches for it in the global scope, where it finds it and prints 1.
But when I execute the whole code as it is, without commenting anything out, it raises UnboundLocalError: local variable 'x' referenced before assignment.
I do know I can use nonlocal and global, but my question is:
How does Python know there is a local variable declaration before it has encountered one?
Even if it does know there is a variable named x in the local scope (although not yet initialised), why doesn't it show it in locals()?
I tried finding the answer in similar questions but failed. Please correct me if any of my understanding is wrong.
To some extent, the answer is implementation specific, as Python only specifies the expected behavior, not how to implement it.
That said, let's look at the byte code generated for f by the usual implementation, CPython:
>>> import dis
>>> dis.dis(f)
2 0 LOAD_GLOBAL 0 (print)
2 LOAD_CONST 1 ('Before')
4 LOAD_GLOBAL 1 (locals)
6 CALL_FUNCTION 0
8 CALL_FUNCTION 2
10 POP_TOP
3 12 LOAD_GLOBAL 0 (print)
14 LOAD_FAST 0 (x)
16 CALL_FUNCTION 1
18 POP_TOP
4 20 LOAD_CONST 2 (2)
22 STORE_FAST 0 (x)
5 24 LOAD_GLOBAL 0 (print)
26 LOAD_CONST 3 ('After')
28 LOAD_GLOBAL 1 (locals)
30 CALL_FUNCTION 0
32 CALL_FUNCTION 2
34 POP_TOP
36 LOAD_CONST 0 (None)
38 RETURN_VALUE
There are several different LOAD_* op codes used to retrieve various values. LOAD_GLOBAL is used for names in the global scope; LOAD_CONST is used for constant values (literals) that are not assigned to any name. LOAD_FAST is used for local variables. Local variables don't even exist by name, but by indices in an array. That's why they are "fast": they are available in an array rather than a hash table. (LOAD_GLOBAL also uses integer arguments, but that's just an index into an array of names; the name itself still needs to be looked up in whatever mapping provides the global scope.)
You can even see the constants and local variable names associated with f:
>>> f.__code__.co_consts
(None, 'Before', 2, 'After')
>>> f.__code__.co_varnames
('x',)
LOAD_CONST 1 puts Before on the stack because f.__code__.co_consts[1] == 'Before', and LOAD_FAST 0 puts the value of x on the stack because f.__code__.co_varnames[0] == 'x'.
The key here is that the bytecode is generated before f is ever executed. Python isn't simply executing each line the first time it sees it. Compiling the code that contains the def statement involves, among other things:
reading the source code
parsing into an abstract syntax tree (AST)
using the entire AST to generate the byte code stored in the __code__ attribute of the function object.
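You can reproduce those steps by hand with the standard ast module and the built-in compile(). In this sketch the source string and variable names are illustrative. It shows that the code object for f, including its list of local names, exists before anything has been executed:

```python
import ast
import types

SOURCE = '''
def f():
    print("Before", locals())
    print(x)
    x = 2
    print("After", locals())
'''

tree = ast.parse(SOURCE)                          # source -> abstract syntax tree
module_code = compile(tree, "<example>", "exec")  # AST -> bytecode; nothing runs yet

# The code object for f is stored as a constant of the module's code object.
f_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
print(f_code.co_name, f_code.co_varnames)  # f ('x',)
```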
Part of the code generation is noting that the name x is a local name, because it is assigned somewhere in the body of the function (even if that assignment is logically unreachable). It therefore must be accessed with LOAD_FAST.
At the time locals is called (and indeed before LOAD_FAST 0 is used the first time), no assignment to x (i.e., STORE_FAST 0) has yet been made, so there is no local value in slot 0 to look up.
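This also answers the locals() part of the question. locals() reports only the names that currently have a value, while the code object lists every local name the compiler found. A small sketch follows; the function and the snapshots variable are illustrative:

```python
def f():
    snapshots = []
    # x is already a local *name* here, but it has no value yet,
    # so locals() does not include it.
    snapshots.append(sorted(locals()))
    x = 2
    snapshots.append(sorted(locals()))
    return snapshots

print(f())                     # [['snapshots'], ['snapshots', 'x']]
print(f.__code__.co_varnames)  # ('snapshots', 'x')
```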
It works because you define x before calling f().
Let's try this one:
def f(y):
    print("Before", locals())  # line 2
    print(y)  # line 3
    y = 2  # line 4
    print("After", locals())  # line 5
f(x)
x = 1
Why does the first print statement in the second function throw an error that x is not defined?
x = 5
def function_a():
    print(x)
def function_b():
    print(x)
    x = 7
    print(x)
Running the first function gives the following result.
>>> function_a()
5
Running the second function, however, throws an error:
UnboundLocalError: local variable 'x' referenced before assignment
Python will infer and use the variable from the inner scope whenever the variable is assigned within that scope, even if the assignment comes after the usage.
To demonstrate this, I created two scenarios.
Variable declared inside the inner scope
Names are looked up in the following order: local, enclosing (nonlocal), global, and finally built-in. Since x is assigned inside the inner scope, Python will use the x in that scope, even though it is assigned after the usage.
Note that Python cannot distinguish modification from declaration. What was meant to modify the global x was interpreted as declaring another variable x within the inner scope.
No variables declared inside the inner scope
If the name is not assigned within the inner scope, Python looks it up in the enclosing (nonlocal) scopes, and then in the global scope.
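Here are the two scenarios as a code sketch. All names are illustrative. The lookup falls through to the enclosing and global scopes only when the inner function never assigns the name:

```python
x = "global"

def outer():
    x = "enclosing"

    def reads_only():
        return x  # no assignment in this function: found in outer's scope

    def assigns_later():
        try:
            value = x  # local, because of the assignment below
        except UnboundLocalError:
            value = "UnboundLocalError"
        x = "local"
        return value

    return reads_only(), assigns_later()

def no_enclosing():
    return x  # nothing local or enclosing: falls through to the global x

print(outer())         # ('enclosing', 'UnboundLocalError')
print(no_enclosing())  # global
```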
Explicitly switching scope
If you explicitly declared the scope of x beforehand, Python would not have to infer.
The following code will not throw an error because the scope it uses is explicitly the global scope instead of the inner one.
x = 5
def scope():
    global x
    print(x)
    x = 7
    print(x)
Scopes
By choosing which scope to work with, you are not only using the variables within that specific scope but also modifying the variables. Therefore, you need extra caution when dealing with scopes.
Because Python cannot distinguish variable declaration from variable modification, my advice is this: if you want to assign to a global variable, explicitly declare it with global beforehand.
This also applies to nested scopes.
x = 5
def outer():
    x = 7
    def inner():
        nonlocal x
        print(x)
        x = 3
        print(x)
    inner()
Running the outer function gives the following result.
>>> outer()
7
3
Try changing the nonlocal keyword to global to see a different result, or remove that line completely to get an error.
In the second function, you have written x = 7, which makes x a local variable for that function. But since you try to access x in the print statement before the line "x = 7", it throws an error saying that the local variable x is referenced before assignment.
If you remove the line x = 7, it would work just fine.
You can try this:
x = 5
def vs_code():
    print(x)
def vs_code1():
    print(x)
    y = 7
    print(y)
vs_code()
vs_code1()
Calling both functions prints:
5
5
7
Since I am not assigning x inside the second function, x is not local to it. If you assign x inside the second function, Python treats it as a local variable, and you can use it only after the assignment.
Hope you understood.
Python is special when it comes to variable scope.
You can use the dis module to see what happened.
def foo():
    print(x)
    print(y)
    x = ...
import dis
dis.dis(foo)
Output:
2 0 LOAD_GLOBAL 0 (print)
2 LOAD_FAST 0 (x)
4 CALL_FUNCTION 1
6 POP_TOP
3 8 LOAD_GLOBAL 0 (print)
10 LOAD_GLOBAL 1 (y)
12 CALL_FUNCTION 1
14 POP_TOP
4 16 LOAD_CONST 1 (Ellipsis)
18 STORE_FAST 0 (x)
20 LOAD_CONST 0 (None)
22 RETURN_VALUE
x uses LOAD_FAST.
y uses LOAD_GLOBAL.
Before running, Python decides that x is a local variable because you assign to it (a function cannot rebind a global variable without declaring it global).
This works fine:
def vs_code1():
    global x
    print(x)
    x = 7
    print(x)
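The LOAD_FAST versus LOAD_GLOBAL distinction in the foo example can also be checked programmatically with dis.get_instructions. This sketch uses a hypothetical helper, load_ops, that maps each loaded name to its opcode. Newer CPython versions may emit a variant such as LOAD_FAST_CHECK instead of plain LOAD_FAST, so the check for x looks only at the prefix:

```python
import dis

def foo():
    print(x)
    print(y)
    x = ...

def load_ops(func):
    """Map each name loaded by func to the first LOAD_* opcode used for it (hypothetical helper)."""
    ops = {}
    for ins in dis.get_instructions(func):
        if ins.opname.startswith("LOAD_") and isinstance(ins.argval, str):
            ops.setdefault(ins.argval, ins.opname)
    return ops

ops = load_ops(foo)  # foo is only inspected, never called
print(ops["x"].startswith("LOAD_FAST"))  # True: x is local to foo
print(ops["y"])                          # LOAD_GLOBAL
```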
When a variable is assigned a value inside a function, that variable name is considered local, unless it is explicitly declared global with the line:
global x
At the first print in the function, the local variable x has no value yet. That's why it throws an error.
So you have to add the line global x at the beginning of the function body for x to be considered global, even though it is assigned in the function.
While I was learning the LEGB scope rule of Python, I wanted a deeper understanding of how global works in Python. It seems that even if I refer to an undefined variable (which is also not in builtins), the code doesn't give me an error. Please help me figure out what is actually happening.
def hey():
    x = 1
    def hey2():
        global ew  # ew not defined in the module
        x = 2
        print(x)
    hey2()
    print(x)
hey()
OUTPUT: 2
1
The global keyword is used to create or update a global variable from inside a function:
def hey():
    x = 1
    def hey2():
        global ew  # reference to create or update a global variable named ew
        ew = 2  # if you comment this out, the global variable will not be created
        x = 2
        # print(x)
    hey2()
    # print(x)
print('\t ------Before function call-----')
print(globals())
hey()
print('\n')
print('\t -----After function call------ ')
print(globals())
globals() returns a dictionary of all the names the global scope contains.
You can see that ew is present in the second dictionary, but not in the first.
Yes, the global statement can apply to a name that is not bound (an undefined variable) or even never used. It doesn't create the name, but informs the compiler that this name should be looked up only in a global scope, not in the local scope. The difference shows up in the compiled code as distinct operations:
>>> def foo():
... global g
... l = 1
... g = 2
...
>>> dis.dis(foo)
3 0 LOAD_CONST 1 (1)
3 STORE_FAST 0 (l)
4 6 LOAD_CONST 2 (2)
9 STORE_GLOBAL 0 (g)
12 LOAD_CONST 0 (None)
15 RETURN_VALUE
We see that STORE_FAST was used for a local variable, while STORE_GLOBAL was used for the global variable. There isn't any output for the global statement itself; it only changed how references to g operate.
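A short sketch shows that the global statement on its own binds nothing, and that only the STORE_GLOBAL produced by an actual assignment creates the module-level name. The function names are illustrative:

```python
import dis

def declares_only():
    global ew  # declaration only: nothing is bound

def declares_and_assigns():
    global ew
    ew = 2     # compiled to STORE_GLOBAL, which creates the module-level name

declares_only()
print("ew" in globals())      # False
declares_and_assigns()
print("ew" in globals(), ew)  # True 2

opnames = {ins.opname for ins in dis.get_instructions(declares_and_assigns)}
print("STORE_GLOBAL" in opnames)  # True
```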
A simple example of a global variable used in two functions:
def hey():
    global x
    x = 1
    print(x)
hey()  # prints 1
def hey2():
    global x
    x += 2
    print(x)
hey2()  # prints 3
I am learning Python and right now I am on the topic of scopes and nonlocal statement.
At some point I thought I figured it all out, but then nonlocal came and broke everything down.
Example number 1:
print("let's begin")
def a():
    def b():
        nonlocal x
        x = 20
    b()
a()
Running it naturally fails.
What is more interesting is that print() does not get executed. Why?
My understanding was that enclosing def a() is not executed until print() is executed, and nested def b() is executed only when a() is called. I am confused...
Ok, let's try example number 2:
print("let's begin")
def a():
    if False: x = 10
    def b():
        nonlocal x
        x = 20
    b()
a()
Aaand... it runs fine.
Whaaat?! How did THAT fix it? x = 10 in function a is never executed!
My understanding was that nonlocal statement is evaluated and executed at run-time, searching enclosing function's call contexts and binding local name x to some particular "outer" x. And if there is no x in outer functions - raise an exception. Again, at run-time.
But now it looks like this is done at the time of syntax analysis, with pretty dumb check "look in outer functions for x = blah, if there is something like this - we're fine," even if that x = blah is never executed...
Can anybody explain to me when and how the nonlocal statement is processed?
You can see what the scope of b knows about free variables (available for binding) from the scope of a, like so:
import inspect
print("let's begin")
def a():
    if False:
        x = 10
    def b():
        print(inspect.currentframe().f_code.co_freevars)
        nonlocal x
        x = 20
    b()
a()
Which gives:
let's begin
('x',)
If you comment out the nonlocal line and remove the if statement with x inside, then you'll see that the free variables available to b are just ().
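The same information is available without inspect. Compare the enclosing function's cell variables with the inner function's free variables. In this sketch, a returns b so that the closure can be examined after the call. That return is an illustrative change to the original example:

```python
def a():
    if False:
        x = 10  # never runs, but makes x a variable of a that b can close over
    def b():
        nonlocal x
        x = 20
    b()
    return b

inner_b = a()
print(a.__code__.co_cellvars)                # ('x',)
print(inner_b.__code__.co_freevars)          # ('x',)
print(inner_b.__closure__[0].cell_contents)  # 20
```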
So let's look at what bytecode instruction this generates, by putting the definition of a into IPython and then using dis.dis:
In [3]: import dis
In [4]: dis.dis(a)
5 0 LOAD_CLOSURE 0 (x)
2 BUILD_TUPLE 1
4 LOAD_CONST 1 (<code object b at 0x7efceaa256f0, file "<ipython-input-1-20ba94fb8214>", line 5>)
6 LOAD_CONST 2 ('a.<locals>.b')
8 MAKE_FUNCTION 8
10 STORE_FAST 0 (b)
10 12 LOAD_FAST 0 (b)
14 CALL_FUNCTION 0
16 POP_TOP
18 LOAD_CONST 0 (None)
20 RETURN_VALUE
So then let's look at how LOAD_CLOSURE is processed in ceval.c.
TARGET(LOAD_CLOSURE) {
    PyObject *cell = freevars[oparg];
    Py_INCREF(cell);
    PUSH(cell);
    DISPATCH();
}
So we see it must look up x from freevars of the enclosing scope(s).
This is mentioned in the Execution Model documentation, where it says:
The nonlocal statement causes corresponding names to refer to previously bound variables in the nearest enclosing function scope. SyntaxError is raised at compile time if the given name does not exist in any enclosing function scope.
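The "compile time" part also explains why print() never runs in example 1: the whole module fails to compile, so nothing executes. The built-in compile() only compiles and never runs the code, so it can check both examples. The source strings and the compiles helper are illustrative:

```python
EXAMPLE_1 = '''
print("let's begin")
def a():
    def b():
        nonlocal x
        x = 20
    b()
a()
'''

EXAMPLE_2 = '''
print("let's begin")
def a():
    if False: x = 10
    def b():
        nonlocal x
        x = 20
    b()
a()
'''

def compiles(source):
    """Return True if source compiles; compile() never executes the code."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        print("SyntaxError:", exc.msg)
        return False
    return True

print(compiles(EXAMPLE_1))  # prints the SyntaxError message, then False
print(compiles(EXAMPLE_2))  # True
```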
First, understand that Python checks your module's syntax, and if it detects something invalid, it raises a SyntaxError, which stops the module from running at all. Your first example raises a SyntaxError. Understanding exactly why is fairly complicated, but it is easier if you know how __slots__ works, so I will quickly introduce that first.
When a class defines __slots__, it is basically saying that its instances should only have those attributes. Each object is allocated memory with space for only those attributes, so trying to assign any other attribute raises an error.
class SlotsTest:
    __slots__ = ["a", "b"]
x = SlotsTest()
x.a = 1; x.b = 2
x.c = 3  # AttributeError: 'SlotsTest' object has no attribute 'c'
The reason x.c = 3 can't work is that there is no memory space to put a .c attribute in.
If you do not specify __slots__, then every instance is created with a dictionary to store its instance variables, and dictionaries have no limit on how many values they contain.
class DictTest:
    pass
y = DictTest()
y.a = 1; y.b = 2; y.c = 3
print(y.__dict__)  # prints {'a': 1, 'b': 2, 'c': 3}
Python functions work similarly to slots. When Python compiles your module, it finds all variables assigned (or attempted to be assigned) in each function definition, and it uses that information when constructing frames during execution.
When you use nonlocal x, it gives an inner function access to a specific variable in the outer function's scope. But if no such variable is defined in the outer function, then nonlocal x has no slot to point to.
Global access doesn't run into the same issue, since Python modules are created with a dictionary to store their attributes. So global x is allowed even if no global variable named x exists yet.
From the Google Style Guide on lexical scoping:
A nested Python function can refer to variables defined in enclosing
functions, but can not assign to them.
This specification can be seen here:
def toplevel():
    a = 5
    def nested():
        # Tries to print local variable `a`, but `a` is created locally after,
        # so `a` is referenced before assignment. You would need `nonlocal a`
        print(a + 2)
        a = 7
    nested()
    return a
toplevel()
# UnboundLocalError: local variable 'a' referenced before assignment
Reversing the order of the two statements in nested gets rid of this issue:
def toplevel():
    a = 5
    def nested():
        # Two statements' order reversed, `a` is now locally assigned and can
        # be referenced
        a = 7
        print(a + 2)
    nested()
    return a
toplevel()
My question is, what is it about Python's implementation that tells the first function that a will be declared locally (after the print statement)? My understanding is that Python is effectively interpreted line by line. So, shouldn't it default to looking for a nonlocal a at that point in the code?
To elaborate, if I only reference a (no assignment):
def toplevel():
    a = 5
    def nested():
        print(a + 2)
    nested()
    return a
toplevel()
somehow the print statement knows to reference the nonlocal a defined in the enclosing function. But if I assign to a local a after that line, the function is almost too smart for its own good.
My understanding is that Python is effectively interpreted line by line.
That's not the right mental model.
The body of the entire function is analysed to determine which names refer to local variables and which don't.
To simplify your example, the following also gives UnboundLocalError:
def func():
    print(a)
    a = 2
func()
Here, func() compiles to the following bytecode (this listing is from Python 2, where print is a statement):
2 0 LOAD_FAST 0 (a)
3 PRINT_ITEM
4 PRINT_NEWLINE
3 5 LOAD_CONST 1 (2)
8 STORE_FAST 0 (a)
11 LOAD_CONST 0 (None)
14 RETURN_VALUE
Compare this with
def gunc():
    print(a)
which compiles to
2 0 LOAD_GLOBAL 0 (a)
3 PRINT_ITEM
4 PRINT_NEWLINE
5 LOAD_CONST 0 (None)
8 RETURN_VALUE
Observe how the absence of assignment to a turns the reference from a local to a global one.
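The analysis of the whole function body can be observed directly with the standard symtable module, which exposes the compiler's symbol tables. In this sketch, the source strings and the helper scope_of_a_in_nested are illustrative. The only difference between the two sources is the assignment a = 7:

```python
import symtable

WITH_ASSIGNMENT = '''
def toplevel():
    a = 5
    def nested():
        print(a + 2)
        a = 7
    nested()
'''

READ_ONLY = '''
def toplevel():
    a = 5
    def nested():
        print(a + 2)
    nested()
'''

def scope_of_a_in_nested(source):
    """Classify `a` inside nested() using the compiler's symbol table (illustrative helper)."""
    module_table = symtable.symtable(source, "<example>", "exec")
    toplevel_table = module_table.get_children()[0]
    nested_table = toplevel_table.get_children()[0]
    symbol = nested_table.lookup("a")
    if symbol.is_local():
        return "local"
    if symbol.is_free():
        return "free (from an enclosing function)"
    return "other"

print(scope_of_a_in_nested(WITH_ASSIGNMENT))  # local
print(scope_of_a_in_nested(READ_ONLY))        # free (from an enclosing function)
```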
My understanding is that Python is effectively interpreted line by line
That's where you're wrong. The whole file is compiled to bytecode before any interpretation begins.
Also, even if the bytecode compilation pass didn't exist, print(a + 2) wouldn't actually be executed before a = 7 is seen, because it's in a function definition. Python would still know about the a = 7 by the time it actually tries to execute print(a + 2).
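A quick way to see that nested's bytecode exists before toplevel is ever called: nested's code object is stored as a constant of toplevel's code object, and it already lists a as a local name. This sketch only inspects the functions and never calls them:

```python
import types

def toplevel():
    a = 5
    def nested():
        print(a + 2)
        a = 7
    nested()
    return a

# toplevel has not been called, yet nested's code object is already compiled.
nested_code = next(c for c in toplevel.__code__.co_consts
                   if isinstance(c, types.CodeType))
print(nested_code.co_name, nested_code.co_varnames)  # nested ('a',)
```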
As per the documentation:
A special quirk of Python is that – if no global statement is in effect – assignments to names always go into the innermost scope. Assignments do not copy data — they just bind names to objects.