Why does the first print statement in the second function throw an error that x is not defined?
x = 5

def function_a():
    print(x)

def function_b():
    print(x)
    x = 7
    print(x)
Running the first function gives the following result.
>>> function_a()
5
While running the second function throws an error.
UnboundLocalError: local variable 'x' referenced before assignment
Python will infer and use the variable from the inner scope whenever it sees the variable declared within the scope, even if that variable is declared after usage.
To demonstrate this, I created two scenarios.
Variable declared inside the inner scope
A variable is inferred in the following order: local, nonlocal, and global. Since x is declared inside the inner scope, Python will infer and use the x in this scope even if it is declared after usage.
Note that Python cannot distinguish modification from declaration; what was meant to modify x from the global scope was interpreted to declare another variable x within that scope.
No variables declared inside the inner scope
If no variables are declared within the inner scope, Python will switch the inferred scope to nonlocal, and then to global.
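As a rough sketch of the two scenarios above: the names reads_only and assigns_later are made up for illustration, and co_varnames is a CPython implementation detail. It suggests that the set of local names is fixed when the function is compiled, before any line of the body runs.

```python
x = 5

def reads_only():
    # No assignment to x in this body, so x resolves to the module-level name.
    return x

def assigns_later():
    # The assignment below makes x local for the whole body,
    # so this read fails before the assignment has run.
    value = x
    x = 7
    return value

# Local names are recorded on the code object at compile time (CPython detail).
reads_only_locals = reads_only.__code__.co_varnames
assigns_later_locals = assigns_later.__code__.co_varnames

try:
    assigns_later()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print("x" in reads_only_locals)     # False
print("x" in assigns_later_locals)  # True
print(error_name)                   # UnboundLocalError
```

Nothing here depends on the order of statements at run time: x is already listed as a local of assigns_later before the function is ever called.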
Explicitly switching scope
If you explicitly declared the scope of x beforehand, Python would not have to infer.
The following code will not throw an error because the scope it uses is explicitly the global scope instead of the inner one.
x = 5

def scope():
    global x
    print(x)
    x = 7
    print(x)
Scopes
By choosing which scope to work with, you are not only using the variables within that specific scope but also modifying the variables. Therefore, you need extra caution when dealing with scopes.
Because Python cannot distinguish variable declaration from variable modification, my advice is that if you want to use a global variable, you should explicitly state it beforehand.
This also applies to nested scopes.
x = 5

def outer():
    x = 7

    def inner():
        nonlocal x
        print(x)
        x = 3
        print(x)

    inner()
Running the outer function gives the following result.
>>> outer()
7
3
Try changing the nonlocal keyword to global and see a different result; or remove the line completely to get an error.
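The two variations suggested above can be sketched as follows. The function names outer_with_global and outer_without_declaration are hypothetical, and the values are returned instead of printed so that they are easy to inspect.

```python
x = 5

def outer_with_global():
    x = 7  # local to outer_with_global

    def inner():
        global x  # bind to the module-level x instead of the enclosing one
        before = x
        x = 3
        return before, x

    return inner(), x  # second item is outer's own x, left untouched

def outer_without_declaration():
    x = 7

    def inner():
        before = x  # x is local to inner because of the assignment below
        x = 3
        return before, x

    return inner()

inner_values, enclosing_x = outer_with_global()
print(inner_values, enclosing_x, x)  # (5, 3) 7 3

try:
    outer_without_declaration()
except UnboundLocalError as exc:
    missing_declaration_error = type(exc).__name__
    print(missing_declaration_error)  # UnboundLocalError
```

With global, inner reads and rebinds the module-level x, so the x = 7 in the enclosing function is never touched. Without any declaration, inner gets its own local x and the first read fails.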
In the second function, you have written x = 7, which makes x a local variable for that function. But since you are trying to access it in the print statement before the line x = 7, it throws an error saying that the local variable x is referenced before assignment.
If you remove the line x = 7, it would work just fine.
You can try this:
x = 5

def vs_code():
    print(x)

def vs_code1():
    print(x)
    y = 7
    print(y)
Calling vs_code() and then vs_code1() will print
5
5
7
Now, since x is not assigned inside the second function, x is not local to it. If you assign to it inside the second function, Python will interpret it as a local variable, and you can only use it after the assignment.
Hope you understood.
Python is special when it comes to variable scope.
You can use the dis module to see what happened.
def foo():
    print(x)
    print(y)
    x = ...

import dis
dis.dis(foo)
Output:
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_FAST                0 (x)
              4 CALL_FUNCTION            1
              6 POP_TOP

  3           8 LOAD_GLOBAL              0 (print)
             10 LOAD_GLOBAL              1 (y)
             12 CALL_FUNCTION            1
             14 POP_TOP

  4          16 LOAD_CONST               1 (Ellipsis)
             18 STORE_FAST               0 (x)
             20 LOAD_CONST               0 (None)
             22 RETURN_VALUE
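x is loaded with LOAD_FAST (a local variable lookup).
y is loaded with LOAD_GLOBAL.
Before running, Python decides that x is a local variable because the function assigns to it (a function cannot rebind a global variable without a global declaration).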
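The same check can be done without reading the listing by eye. This is only a sketch: exact opcode names vary between CPython versions (newer releases may show a variant such as LOAD_FAST_CHECK instead of LOAD_FAST), so it only looks at the opcode family. The helper variables loads, x_ops and y_ops are made up for illustration.

```python
import dis

def foo():
    print(x)
    print(y)
    x = ...

# Collect (opcode name, referenced name) pairs for the two variables.
# foo is only inspected here, never called.
loads = [
    (ins.opname, ins.argval)
    for ins in dis.get_instructions(foo)
    if ins.argval in ("x", "y")
]

x_ops = {op for op, name in loads if name == "x"}
y_ops = {op for op, name in loads if name == "y"}

print(sorted(x_ops))  # local (FAST) opcodes only
print(sorted(y_ops))  # global lookup
```

Every opcode that touches x belongs to the FAST (local) family, while y is only ever reached through LOAD_GLOBAL.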
This works fine:
def vs_code1():
    global x
    print(x)
    x = 7
    print(x)
When a variable is assigned a value inside a function, that variable name is considered local, unless it is explicitly declared as global with the line:
global x
In the first print of the function, the variable x is not yet defined in the local scope. That's why it throws an error.
So you have to add the line global x at the beginning of the function body for the variable x to be considered global even though it is assigned in the function.
From the Google Style Guide on lexical scoping:
A nested Python function can refer to variables defined in enclosing
functions, but can not assign to them.
This specification can be seen here:
def toplevel():
    a = 5
    def nested():
        # Tries to print local variable `a`, but `a` is created locally after,
        # so `a` is referenced before assignment. You would need `nonlocal a`
        print(a + 2)
        a = 7
    nested()
    return a
toplevel()
# UnboundLocalError: local variable 'a' referenced before assignment
Reversing the order of the two statements in nested gets rid of this issue:
def toplevel():
    a = 5
    def nested():
        # Two statements' order reversed, `a` is now locally assigned and can
        # be referenced
        a = 7
        print(a + 2)
    nested()
    return a
toplevel()
My question is, what is it about Python's implementation that tells the first function that a will be declared locally (after the print statement)? My understanding is that Python is effectively interpreted line by line. So, shouldn't it default to looking for a nonlocal a at that point in the code?
To elaborate, if I were to use only a reference (no assignment),
def toplevel():
    a = 5
    def nested():
        print(a + 2)
    nested()
    return a
toplevel()
somehow the print statement knows to reference the nonlocal a defined in the enclosing function. But if I assign to a local a after that line, the function is almost too smart for its own good.
My understanding is that Python is effectively interpreted line by line.
That's not the right mental model.
The body of the entire function is analysed to determine which names refer to local variables and which don't.
To simplify your example, the following also gives UnboundLocalError:
def func():
    print(a)
    a = 2
func()
Here, func() compiles to the following bytecodes:
  2           0 LOAD_FAST                0 (a)
              3 PRINT_ITEM
              4 PRINT_NEWLINE

  3           5 LOAD_CONST               1 (2)
              8 STORE_FAST               0 (a)
             11 LOAD_CONST               0 (None)
             14 RETURN_VALUE
Compare this with
def gunc():
    print(a)
which compiles to
  2           0 LOAD_GLOBAL              0 (a)
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
Observe how the absence of assignment to a turns the reference from a local to a global one.
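To see the whole-body analysis without running or disassembling anything, the standard-library symtable module can be asked how it classifies each name. This is a minimal sketch: the source string simply repeats func and gunc from above, and the variable names module_table, tables, a_in_func and a_in_gunc are made up for illustration.

```python
import symtable

source = """
def func():
    print(a)
    a = 2

def gunc():
    print(a)
"""

# Build the symbol tables only; the source is never executed.
module_table = symtable.symtable(source, "<example>", "exec")
tables = {child.get_name(): child for child in module_table.get_children()}

a_in_func = tables["func"].lookup("a")
a_in_gunc = tables["gunc"].lookup("a")

print(a_in_func.is_local(), a_in_func.is_global())  # True False
print(a_in_gunc.is_local(), a_in_gunc.is_global())  # False True
```

The classification of a is decided per function body, which is why the later a = 2 already affects the earlier print(a).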
My understanding is that Python is effectively interpreted line by line
That's where you're wrong. The whole file is compiled to bytecode before any interpretation begins.
Also, even if the bytecode compilation pass didn't exist, print(a + 2) wouldn't actually be executed before a = 7 is seen, because it's in a function definition. Python would still know about the a = 7 by the time it actually tries to execute print(a + 2).
As per the documentation:
A special quirk of Python is that – if no global statement is in effect – assignments to names always go into the innermost scope. Assignments do not copy data — they just bind names to objects.
I am trying to understand how variables are managed internally by Python.
x = 10
def ex1():
    if False:
        x=1
    print(x)
ex1()
When ex1() is executed, it shows an UnboundLocalError since local variable 'x' is referenced before assignment.
How does this happen?
Does the parsing happen in an initial pass that just creates the symbol table and specifies scope, followed by interpretation in another pass, which skips x=1 as it is unreachable?
Python doesn't have variable declarations, so when variables are created/used it has to determine the scope itself. Python scoping is lexical: a function can read a variable from an enclosing scope, but it cannot rebind it there without a global or nonlocal declaration.
The way ex1() is written, the assignment x=1 makes x local to ex1(). So when you run ex1(), print(x) tries to read the local x rather than the global x = 10, finds it unassigned, and throws your UnboundLocalError.
So the variable is managed like this: Python sees an assignment to x in the function body and treats x as local throughout the function; when the function runs, the read of x looks only at that local and never falls back to the global one.
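A small sketch of this, reusing ex1 from the question but returning x instead of printing it. The variable names local_names and ex1_error are made up, and co_varnames is a CPython detail. It suggests that even an assignment in a branch that can never run is enough to make the name local.

```python
x = 10

def ex1():
    if False:
        x = 1  # never runs, but still makes x local to ex1
    return x

# x is registered as a local of ex1 even though the branch is unreachable.
local_names = ex1.__code__.co_varnames

try:
    ex1()
    ex1_error = None
except UnboundLocalError as exc:
    ex1_error = type(exc).__name__

print("x" in local_names)  # True
print(ex1_error)           # UnboundLocalError
print(x)                   # 10 (the global is untouched)
```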
Conceptually this makes sense. I can't tell how it's implemented, but I can tell why.
When you assign to a variable, the assignment happens in the local scope unless you explicitly say otherwise with the keyword global. If you only read it and there is no assignment, Python will implicitly use the global variable, since no local variable is defined.
x = 10

def access_global():
    print(x)

def affect_local():
    x = 0
    print(x)

def affect_global():
    global x
    x = 1
    print(x)

access_global()  # 10
affect_local()  # 0
print(x)  # 10
affect_global()  # 1
print(x)  # 1
If you do this inside a nested function, class or module the rules are similar:
def main():
    y = 10

    def access():
        print(y)

    def affect():
        y = 0
        print(y)

    access()  # 10
    affect()  # 0
    print(y)  # 10

main()
This probably saves hours of painful debugging, because variables in parent scopes are never overwritten unless it is explicitly stated.
EDIT
Disassembling the Python bytecode gives us some additional information:
import dis

x = 10

def local():
    if False:
        x = 1

def global_():
    global x
    x = 1

print(local)
dis.dis(local)
print(global_)
dis.dis(global_)
<function local at 0x7fa01ec6cde8>
 37           0 LOAD_GLOBAL              0 (False)
              3 POP_JUMP_IF_FALSE       15

 38           6 LOAD_CONST               1 (1)
              9 STORE_FAST               0 (x)
             12 JUMP_FORWARD             0 (to 15)
        >>   15 LOAD_CONST               0 (None)
             18 RETURN_VALUE
<function global_ at 0x7fa01ec6ce60>
 42           0 LOAD_CONST               1 (1)
              3 STORE_GLOBAL             0 (x)
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE
We can see that the bytecode for the local function uses STORE_FAST, while the global_ function uses STORE_GLOBAL.
This question also explains why it is more performant to translate a function to bytecode once, rather than compiling it every time the function is called:
Why python compile the source to bytecode before interpreting?
How do you access other class variables from a list comprehension within the class definition? The following works in Python 2 but fails in Python 3:
class Foo:
    x = 5
    y = [x for i in range(1)]
Python 3.2 gives the error:
NameError: global name 'x' is not defined
Trying Foo.x doesn't work either. Any ideas on how to do this in Python 3?
A slightly more complicated motivating example:
from collections import namedtuple

class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for args in [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ]]
In this example, apply() would have been a decent workaround, but it is sadly removed from Python 3.
Class scope and list, set or dictionary comprehensions, as well as generator expressions do not mix.
The why; or, the official word on this
In Python 3, list comprehensions were given a proper scope (local namespace) of their own, to prevent their local variables bleeding over into the surrounding scope (see List comprehension rebinds names even after scope of comprehension. Is this right?). That's great when using such a list comprehension in a module or in a function, but in classes, scoping is a little, uhm, strange.
This is documented in PEP 227:
Names in class scope are not accessible. Names are resolved in
the innermost enclosing function scope. If a class definition
occurs in a chain of nested scopes, the resolution process skips
class definitions.
and in the class compound statement documentation:
The class’s suite is then executed in a new execution frame (see section Naming and binding), using a newly created local namespace and the original global namespace. (Usually, the suite contains only function definitions.) When the class’s suite finishes execution, its execution frame is discarded but its local namespace is saved. [4] A class object is then created using the inheritance list for the base classes and the saved local namespace for the attribute dictionary.
Emphasis mine; the execution frame is the temporary scope.
Because the scope is repurposed as the attributes on a class object, allowing it to be used as a nonlocal scope as well leads to undefined behaviour; what would happen if a class method referred to x as a nested scope variable, then manipulates Foo.x as well, for example? More importantly, what would that mean for subclasses of Foo? Python has to treat a class scope differently as it is very different from a function scope.
Last, but definitely not least, the linked Naming and binding section in the Execution model documentation mentions class scopes explicitly:
The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:
class A:
    a = 42
    b = list(a + i for i in range(10))
So, to summarize: you cannot access the class scope from functions, list comprehensions or generator expressions enclosed in that scope; they act as if that scope does not exist. In Python 2, list comprehensions were implemented using a shortcut, but in Python 3 they got their own function scope (as they should have had all along) and thus your example breaks. Other comprehension types have their own scope regardless of Python version, so a similar example with a set or dict comprehension would break in Python 2.
# Same error, in Python 2 or 3
y = {x: x for i in range(1)}
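The same rule applies to ordinary methods, which is perhaps the most familiar form of it. A minimal sketch, with a made-up Config class and assuming no global named retries exists:

```python
class Config:
    retries = 3  # lives in the class namespace

    def bare(self):
        # Bare name: looked up as a local, then in enclosing functions,
        # then as a global or builtin. The class body is skipped.
        return retries

    def qualified(self):
        return self.retries  # attribute lookup finds the class attribute

config = Config()
qualified_value = config.qualified()

try:
    config.bare()
    bare_error = None
except NameError as exc:
    bare_error = type(exc).__name__

print(qualified_value)  # 3
print(bare_error)       # NameError
```

A comprehension in the class body is in the same position as bare: it has its own function-like scope, and the class namespace is not part of its lookup chain.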
The (small) exception; or, why one part may still work
There's one part of a comprehension or generator expression that executes in the surrounding scope, regardless of Python version. That would be the expression for the outermost iterable. In your example, it's the range(1):
y = [x for i in range(1)]
#               ^^^^^^^^
Thus, using x in that expression would not throw an error:
# Runs fine
y = [i for i in range(x)]
This only applies to the outermost iterable; if a comprehension has multiple for clauses, the iterables for inner for clauses are evaluated in the comprehension's scope:
# NameError
y = [i for i in range(1) for j in range(x)]
#      ^^^^^^^^^^^^^^^^^ -----------------
#      outer loop        inner, nested loop
This design decision was made in order to throw an error at genexp creation time instead of iteration time when creating the outermost iterable of a generator expression throws an error, or when the outermost iterable turns out not to be iterable. Comprehensions share this behavior for consistency.
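Both cases can be put side by side in a runnable form. This is a sketch assuming Python 3; the class names OutermostIterable and InnerIterable and the attribute limit are made up for illustration, and no global named limit is assumed to exist.

```python
class OutermostIterable:
    limit = 3
    # range(limit) is the outermost iterable, so limit is read in the class scope.
    values = [i for i in range(limit)]

try:
    class InnerIterable:
        limit = 3
        # The second range(limit) is evaluated inside the comprehension scope,
        # where the class-level name is not visible.
        values = [i for i in range(1) for j in range(limit)]
    inner_error = None
except NameError as exc:
    inner_error = type(exc).__name__

print(OutermostIterable.values)  # [0, 1, 2]
print(inner_error)               # NameError
```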
Looking under the hood; or, way more detail than you ever wanted
You can see this all in action using the dis module. I'm using Python 3.3 in the following examples, because it adds qualified names that neatly identify the code objects we want to inspect. The bytecode produced is otherwise functionally identical to Python 3.2.
To create a class, Python essentially takes the whole suite that makes up the class body (so everything indented one level deeper than the class <name>: line), and executes that as if it were a function:
>>> import dis
>>> def foo():
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
...
>>> dis.dis(foo)
  2           0 LOAD_BUILD_CLASS
              1 LOAD_CONST               1 (<code object Foo at 0x10a436030, file "<stdin>", line 2>)
              4 LOAD_CONST               2 ('Foo')
              7 MAKE_FUNCTION            0
             10 LOAD_CONST               2 ('Foo')
             13 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             16 STORE_FAST               0 (Foo)

  5          19 LOAD_FAST                0 (Foo)
             22 RETURN_VALUE
The first LOAD_CONST there loads a code object for the Foo class body, then makes that into a function, and calls it. The result of that call is then used to create the namespace of the class, its __dict__. So far so good.
The thing to note here is that the bytecode contains a nested code object; in Python, class definitions, functions, comprehensions and generators all are represented as code objects that contain not only bytecode, but also structures that represent local variables, constants, variables taken from globals, and variables taken from the nested scope. The compiled bytecode refers to those structures and the python interpreter knows how to access those given the bytecodes presented.
The important thing to remember here is that Python creates these structures at compile time; the class suite is a code object (<code object Foo at 0x10a436030, file "<stdin>", line 2>) that is already compiled.
Let's inspect that code object that creates the class body itself; code objects have a co_consts structure:
>>> foo.__code__.co_consts
(None, <code object Foo at 0x10a436030, file "<stdin>", line 2>, 'Foo')
>>> dis.dis(foo.__code__.co_consts[1])
  2           0 LOAD_FAST                0 (__locals__)
              3 STORE_LOCALS
              4 LOAD_NAME                0 (__name__)
              7 STORE_NAME               1 (__module__)
             10 LOAD_CONST               0 ('foo.<locals>.Foo')
             13 STORE_NAME               2 (__qualname__)

  3          16 LOAD_CONST               1 (5)
             19 STORE_NAME               3 (x)

  4          22 LOAD_CONST               2 (<code object <listcomp> at 0x10a385420, file "<stdin>", line 4>)
             25 LOAD_CONST               3 ('foo.<locals>.Foo.<listcomp>')
             28 MAKE_FUNCTION            0
             31 LOAD_NAME                4 (range)
             34 LOAD_CONST               4 (1)
             37 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             40 GET_ITER
             41 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             44 STORE_NAME               5 (y)
             47 LOAD_CONST               5 (None)
             50 RETURN_VALUE
The above bytecode creates the class body. The function is executed and the resulting locals() namespace, containing x and y is used to create the class (except that it doesn't work because x isn't defined as a global). Note that after storing 5 in x, it loads another code object; that's the list comprehension; it is wrapped in a function object just like the class body was; the created function takes a positional argument, the range(1) iterable to use for its looping code, cast to an iterator. As shown in the bytecode, range(1) is evaluated in the class scope.
From this you can see that the only difference between a code object for a function or a generator, and a code object for a comprehension is that the latter is executed immediately when the parent code object is executed; the bytecode simply creates a function on the fly and executes it in a few small steps.
Python 2.x uses inline bytecode there instead; here is the output from Python 2.7:
  2           0 LOAD_NAME                0 (__name__)
              3 STORE_NAME               1 (__module__)

  3           6 LOAD_CONST               0 (5)
              9 STORE_NAME               2 (x)

  4          12 BUILD_LIST               0
             15 LOAD_NAME                3 (range)
             18 LOAD_CONST               1 (1)
             21 CALL_FUNCTION            1
             24 GET_ITER
        >>   25 FOR_ITER                12 (to 40)
             28 STORE_NAME               4 (i)
             31 LOAD_NAME                2 (x)
             34 LIST_APPEND              2
             37 JUMP_ABSOLUTE           25
        >>   40 STORE_NAME               5 (y)
             43 LOAD_LOCALS
             44 RETURN_VALUE
No code object is loaded; instead, a FOR_ITER loop is run inline. So in Python 3.x, the list comprehension was given a proper code object of its own, which means it has its own scope.
However, the comprehension was compiled together with the rest of the python source code when the module or script was first loaded by the interpreter, and the compiler does not consider a class suite a valid scope. Any referenced variables in a list comprehension must look in the scope surrounding the class definition, recursively. If the variable wasn't found by the compiler, it marks it as a global. Disassembly of the list comprehension code object shows that x is indeed loaded as a global:
>>> foo.__code__.co_consts[1].co_consts
('foo.<locals>.Foo', 5, <code object <listcomp> at 0x10a385420, file "<stdin>", line 4>, 'foo.<locals>.Foo.<listcomp>', 1, None)
>>> dis.dis(foo.__code__.co_consts[1].co_consts[2])
  4           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                12 (to 21)
              9 STORE_FAST               1 (i)
             12 LOAD_GLOBAL              0 (x)
             15 LIST_APPEND              2
             18 JUMP_ABSOLUTE            6
        >>   21 RETURN_VALUE
This chunk of bytecode loads the first argument passed in (the range(1) iterator), and just like the Python 2.x version uses FOR_ITER to loop over it and create its output.
Had we defined x in the foo function instead, x would be a cell variable (cells refer to nested scopes):
>>> def foo():
...     x = 2
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
...
>>> dis.dis(foo.__code__.co_consts[2].co_consts[2])
  5           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                12 (to 21)
              9 STORE_FAST               1 (i)
             12 LOAD_DEREF               0 (x)
             15 LIST_APPEND              2
             18 JUMP_ABSOLUTE            6
        >>   21 RETURN_VALUE
The LOAD_DEREF will indirectly load x from the code object cell objects:
>>> foo.__code__.co_cellvars # foo function `x`
('x',)
>>> foo.__code__.co_consts[2].co_cellvars # Foo class, no cell variables
()
>>> foo.__code__.co_consts[2].co_consts[2].co_freevars # Refers to `x` in foo
('x',)
>>> foo().y
[2]
The actual referencing looks the value up from the current frame data structures, which were initialized from a function object's .__closure__ attribute. Since the function created for the comprehension code object is discarded again, we do not get to inspect that function's closure. To see a closure in action, we'd have to inspect a nested function instead:
>>> def spam(x):
...     def eggs():
...         return x
...     return eggs
...
>>> spam(1).__code__.co_freevars
('x',)
>>> spam(1)()
1
>>> spam(1).__closure__
>>> spam(1).__closure__[0].cell_contents
1
>>> spam(5).__closure__[0].cell_contents
5
So, to summarize:
List comprehensions get their own code objects in Python 3, and there is no difference between code objects for functions, generators or comprehensions; comprehension code objects are wrapped in a temporary function object and called immediately.
Code objects are created at compile time, and any non-local variables are marked as either global or as free variables, based on the nested scopes of the code. The class body is not considered a scope for looking up those variables.
When executing the code, Python has only to look into the globals, or the closure of the currently executing object. Since the compiler didn't include the class body as a scope, the temporary function namespace is not considered.
A workaround; or, what to do about it
If you were to create an explicit scope for the x variable, like in a function, you can use class-scope variables for a list comprehension:
>>> class Foo:
...     x = 5
...     def y(x):
...         return [x for i in range(1)]
...     y = y(x)
...
>>> Foo.y
[5]
The 'temporary' y function can be called directly; we replace it when we do with its return value. Its scope is considered when resolving x:
>>> foo.__code__.co_consts[1].co_consts[2]
<code object y at 0x10a5df5d0, file "<stdin>", line 4>
>>> foo.__code__.co_consts[1].co_consts[2].co_cellvars
('x',)
Of course, people reading your code will scratch their heads over this a little; you may want to put a big fat comment in there explaining why you are doing this.
The best work-around is to just use __init__ to create an instance variable instead:
def __init__(self):
    self.y = [self.x for i in range(1)]
and avoid all the head-scratching, and questions to explain yourself. For your own concrete example, I would not even store the namedtuple on the class; either use the output directly (don't store the generated class at all), or use a global:
from collections import namedtuple

State = namedtuple('State', ['name', 'capital'])

class StateDatabase:
    db = [State(*args) for args in [
        ('Alabama', 'Montgomery'),
        ('Alaska', 'Juneau'),
        # ...
    ]]
In my opinion it is a flaw in Python 3. I hope they change it.
Old Way (works in 2.7, throws NameError: name 'x' is not defined in 3+):
class A:
    x = 4
    y = [x+i for i in range(1)]
NOTE: simply scoping it with A.x would not solve it
New Way (works in 3+):
class A:
    x = 4
    y = (lambda x=x: [x+i for i in range(1)])()
Because the syntax is so ugly, I typically just initialize all my class variables in the constructor.
The accepted answer provides excellent information, but there appear to be a few other wrinkles here -- differences between list comprehension and generator expressions. A demo that I played around with:
class Foo:

    # A class-level variable.
    X = 10

    # I can use that variable to define another class-level variable.
    Y = sum((X, X))

    # Works in Python 2, but not 3.
    # In Python 3, list comprehensions were given their own scope.
    try:
        Z1 = sum([X for _ in range(3)])
    except NameError:
        Z1 = None

    # Fails in both.
    # Apparently, generator expressions (that's what the entire argument
    # to sum() is) did have their own scope even in Python 2.
    try:
        Z2 = sum(X for _ in range(3))
    except NameError:
        Z2 = None

    # Workaround: put the computation in lambda or def.
    compute_z3 = lambda val: sum(val for _ in range(3))

    # Then use that function.
    Z3 = compute_z3(X)

    # Also worth noting: here I can refer to XS in the for-part of the
    # generator expression (Z4 works), but I cannot refer to XS in the
    # inner-part of the generator expression (Z5 fails).
    XS = [15, 15, 15, 15]
    Z4 = sum(val for val in XS)
    try:
        Z5 = sum(XS[i] for i in range(len(XS)))
    except NameError:
        Z5 = None

print(Foo.Z1, Foo.Z2, Foo.Z3, Foo.Z4, Foo.Z5)
Since the outermost iterator is evaluated in the surrounding scope we can use zip together with itertools.repeat to carry the dependencies over to the comprehension's scope:
import itertools as it

class Foo:
    x = 5
    y = [j for i, j in zip(range(3), it.repeat(x))]
One can also use nested for loops in the comprehension and include the dependencies in the outermost iterable:
class Foo:
    x = 5
    y = [j for j in (x,) for i in range(3)]
For the specific example of the OP:
from collections import namedtuple
import itertools as it

class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for State, args in zip(it.repeat(State), [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ])]
This is a bug in Python. Comprehensions are advertised as being equivalent to for loops, but this is not true in classes. At least up to Python 3.6.6, in a comprehension used in a class, only one variable from outside the comprehension is accessible inside the comprehension, and it must be used as the outermost iterator. In a function, this scope limitation does not apply.
To illustrate why this is a bug, let's return to the original example. This fails:
class Foo:
    x = 5
    y = [x for i in range(1)]
But this works:
def Foo():
    x = 5
    y = [x for i in range(1)]
The limitation is stated at the end of this section in the reference guide.
This may be by design, but IMHO, it's a bad design. I know I'm not an expert here, and I've tried reading the rationale behind this, but it just goes over my head, as I think it would for any average Python programmer.
To me, a comprehension doesn't seem that much different from a regular mathematical expression. For example, if 'foo' is a class-level variable, then inside the class body I can easily do something like:
(foo + 5) + 7
But I can't do:
[foo + x for x in [1,2,3]]
To me, the fact that one expression exists in the current scope and the other creates a scope of its own is very surprising and, no pun intended, 'incomprehensible'.
I spent quite some time to understand why this is a feature, not a bug.
Consider the simple code:
a = 5
def myfunc():
print(a)
Since there is no "a" defined in myfunc(), the scope would expand and the code will execute.
Now consider the same code in the class. It cannot work because this would completely mess around accessing the data in the class instances. You would never know, are you accessing a variable in the base class or the instance.
The list comprehension is just a sub-case of the same effect.
One can use a for loop:
class A:
    x=5
    ##Won't work:
    ##    y=[i for i in range(101) if i%x==0]
    y=[]
    for i in range(101):
        if i%x==0:
            y.append(i)
Please correct me if I'm wrong...
How do you access other class variables from a list comprehension within the class definition? The following works in Python 2 but fails in Python 3:
class Foo:
x = 5
y = [x for i in range(1)]
Python 3.2 gives the error:
NameError: global name 'x' is not defined
Trying Foo.x doesn't work either. Any ideas on how to do this in Python 3?
A slightly more complicated motivating example:
from collections import namedtuple
class StateDatabase:
State = namedtuple('State', ['name', 'capital'])
db = [State(*args) for args in [
['Alabama', 'Montgomery'],
['Alaska', 'Juneau'],
# ...
]]
In this example, apply() would have been a decent workaround, but it is sadly removed from Python 3.
Class scope and list, set or dictionary comprehensions, as well as generator expressions do not mix.
The why; or, the official word on this
In Python 3, list comprehensions were given a proper scope (local namespace) of their own, to prevent their local variables bleeding over into the surrounding scope (see List comprehension rebinds names even after scope of comprehension. Is this right?). That's great when using such a list comprehension in a module or in a function, but in classes, scoping is a little, uhm, strange.
This is documented in pep 227:
Names in class scope are not accessible. Names are resolved in
the innermost enclosing function scope. If a class definition
occurs in a chain of nested scopes, the resolution process skips
class definitions.
and in the class compound statement documentation:
The class’s suite is then executed in a new execution frame (see section Naming and binding), using a newly created local namespace and the original global namespace. (Usually, the suite contains only function definitions.) When the class’s suite finishes execution, its execution frame is discarded but its local namespace is saved. [4] A class object is then created using the inheritance list for the base classes and the saved local namespace for the attribute dictionary.
Emphasis mine; the execution frame is the temporary scope.
Because the scope is repurposed as the attributes on a class object, allowing it to be used as a nonlocal scope as well leads to undefined behaviour; what would happen if a class method referred to x as a nested scope variable, then manipulates Foo.x as well, for example? More importantly, what would that mean for subclasses of Foo? Python has to treat a class scope differently as it is very different from a function scope.
Last, but definitely not least, the linked Naming and binding section in the Execution model documentation mentions class scopes explicitly:
The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:
class A:
a = 42
b = list(a + i for i in range(10))
So, to summarize: you cannot access the class scope from functions, list comprehensions or generator expressions enclosed in that scope; they act as if that scope does not exist. In Python 2, list comprehensions were implemented using a shortcut, but in Python 3 they got their own function scope (as they should have had all along) and thus your example breaks. Other comprehension types have their own scope regardless of Python version, so a similar example with a set or dict comprehension would break in Python 2.
# Same error, in Python 2 or 3
y = {x: x for i in range(1)}
The (small) exception; or, why one part may still work
There's one part of a comprehension or generator expression that executes in the surrounding scope, regardless of Python version. That would be the expression for the outermost iterable. In your example, it's the range(1):
y = [x for i in range(1)]
# ^^^^^^^^
Thus, using x in that expression would not throw an error:
# Runs fine
y = [i for i in range(x)]
This only applies to the outermost iterable; if a comprehension has multiple for clauses, the iterables for inner for clauses are evaluated in the comprehension's scope:
# NameError
y = [i for i in range(1) for j in range(x)]
# ^^^^^^^^^^^^^^^^^ -----------------
# outer loop inner, nested loop
This design decision was made in order to throw an error at genexp creation time instead of iteration time when creating the outermost iterable of a generator expression throws an error, or when the outermost iterable turns out not to be iterable. Comprehensions share this behavior for consistency.
Looking under the hood; or, way more detail than you ever wanted
You can see this all in action using the dis module. I'm using Python 3.3 in the following examples, because it adds qualified names that neatly identify the code objects we want to inspect. The bytecode produced is otherwise functionally identical to Python 3.2.
To create a class, Python essentially takes the whole suite that makes up the class body (so everything indented one level deeper than the class <name>: line), and executes that as if it were a function:
>>> import dis
>>> def foo():
... class Foo:
... x = 5
... y = [x for i in range(1)]
... return Foo
...
>>> dis.dis(foo)
2 0 LOAD_BUILD_CLASS
1 LOAD_CONST 1 (<code object Foo at 0x10a436030, file "<stdin>", line 2>)
4 LOAD_CONST 2 ('Foo')
7 MAKE_FUNCTION 0
10 LOAD_CONST 2 ('Foo')
13 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
16 STORE_FAST 0 (Foo)
5 19 LOAD_FAST 0 (Foo)
22 RETURN_VALUE
The first LOAD_CONST there loads a code object for the Foo class body, then makes that into a function, and calls it. The result of that call is then used to create the namespace of the class, its __dict__. So far so good.
The thing to note here is that the bytecode contains a nested code object; in Python, class definitions, functions, comprehensions and generators all are represented as code objects that contain not only bytecode, but also structures that represent local variables, constants, variables taken from globals, and variables taken from the nested scope. The compiled bytecode refers to those structures and the Python interpreter knows how to access those given the bytecodes presented.
The important thing to remember here is that Python creates these structures at compile time; the class suite is a code object (<code object Foo at 0x10a436030, file "<stdin>", line 2>) that is already compiled.
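That the class body already exists as a compiled code object, before the enclosing function has ever been called, can be checked without `dis`. This is a minimal sketch using a simplified `foo`; the exact contents of `co_consts` vary between Python versions, so the code searches by name rather than relying on an index.

```python
import types

def foo():
    class Foo:
        x = 5
    return Foo

# foo() is never called here; the nested code objects exist after compilation.
nested = [c for c in foo.__code__.co_consts if isinstance(c, types.CodeType)]
nested_names = [c.co_name for c in nested]
print(nested_names)

# The class body stores x by name, so x is listed in the body's co_names.
class_body = next(c for c in nested if c.co_name == "Foo")
print("x" in class_body.co_names)
```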
Let's inspect that code object that creates the class body itself; code objects have a co_consts structure:
>>> foo.__code__.co_consts
(None, <code object Foo at 0x10a436030, file "<stdin>", line 2>, 'Foo')
>>> dis.dis(foo.__code__.co_consts[1])
2 0 LOAD_FAST 0 (__locals__)
3 STORE_LOCALS
4 LOAD_NAME 0 (__name__)
7 STORE_NAME 1 (__module__)
10 LOAD_CONST 0 ('foo.<locals>.Foo')
13 STORE_NAME 2 (__qualname__)
3 16 LOAD_CONST 1 (5)
19 STORE_NAME 3 (x)
4 22 LOAD_CONST 2 (<code object <listcomp> at 0x10a385420, file "<stdin>", line 4>)
25 LOAD_CONST 3 ('foo.<locals>.Foo.<listcomp>')
28 MAKE_FUNCTION 0
31 LOAD_NAME 4 (range)
34 LOAD_CONST 4 (1)
37 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
40 GET_ITER
41 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
44 STORE_NAME 5 (y)
47 LOAD_CONST 5 (None)
50 RETURN_VALUE
The above bytecode creates the class body. The function is executed and the resulting locals() namespace, containing x and y, is used to create the class (except that it doesn't work because x isn't defined as a global). Note that after storing 5 in x, it loads another code object: the list comprehension. It is wrapped in a function object just like the class body was. The created function takes one positional argument, the range(1) iterable to use for its looping code, already converted to an iterator. As shown in the bytecode, range(1) is evaluated in the class scope.
From this you can see that the only difference between a code object for a function or a generator, and a code object for a comprehension is that the latter is executed immediately when the parent code object is executed; the bytecode simply creates a function on the fly and executes it in a few small steps.
Python 2.x uses inline bytecode there instead, here is output from Python 2.7:
2 0 LOAD_NAME 0 (__name__)
3 STORE_NAME 1 (__module__)
3 6 LOAD_CONST 0 (5)
9 STORE_NAME 2 (x)
4 12 BUILD_LIST 0
15 LOAD_NAME 3 (range)
18 LOAD_CONST 1 (1)
21 CALL_FUNCTION 1
24 GET_ITER
>> 25 FOR_ITER 12 (to 40)
28 STORE_NAME 4 (i)
31 LOAD_NAME 2 (x)
34 LIST_APPEND 2
37 JUMP_ABSOLUTE 25
>> 40 STORE_NAME 5 (y)
43 LOAD_LOCALS
44 RETURN_VALUE
No code object is loaded; instead a FOR_ITER loop is run inline. So in Python 3.x, the list comprehension was given a proper code object of its own, which means it has its own scope.
However, the comprehension was compiled together with the rest of the Python source code when the module or script was first loaded by the interpreter, and the compiler does not consider a class suite a valid scope. Any referenced variables in a list comprehension must look in the scope surrounding the class definition, recursively. If the variable wasn't found by the compiler, it marks it as a global. Disassembly of the list comprehension code object shows that x is indeed loaded as a global:
>>> foo.__code__.co_consts[1].co_consts
('foo.<locals>.Foo', 5, <code object <listcomp> at 0x10a385420, file "<stdin>", line 4>, 'foo.<locals>.Foo.<listcomp>', 1, None)
>>> dis.dis(foo.__code__.co_consts[1].co_consts[2])
4 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (i)
12 LOAD_GLOBAL 0 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
This chunk of bytecode loads the first argument passed in (the range(1) iterator), and just like the Python 2.x version uses FOR_ITER to loop over it and create its output.
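The global lookup is also observable without disassembly. In this sketch (the string values are chosen only to make the result obvious), a module-level `x` exists alongside the class-level one, and the comprehension picks up the module-level value:

```python
x = "module-level"

class Foo:
    x = "class-level"
    # The comprehension body skips the class scope and finds the global x.
    y = [x for i in range(1)]

print(Foo.y)
```

So the NameError in the original example is simply the case where no such global exists.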
Had we defined x in the foo function instead, x would be a cell variable (cells refer to nested scopes):
>>> def foo():
...     x = 2
...     class Foo:
...         x = 5
...         y = [x for i in range(1)]
...     return Foo
...
>>> dis.dis(foo.__code__.co_consts[2].co_consts[2])
5 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (i)
12 LOAD_DEREF 0 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
The LOAD_DEREF will indirectly load x from the code object cell objects:
>>> foo.__code__.co_cellvars # foo function `x`
('x',)
>>> foo.__code__.co_consts[2].co_cellvars # Foo class, no cell variables
()
>>> foo.__code__.co_consts[2].co_consts[2].co_freevars # Refers to `x` in foo
('x',)
>>> foo().y
[2]
The actual referencing looks the value up from the current frame data structures, which were initialized from a function object's .__closure__ attribute. Since the function created for the comprehension code object is discarded again, we do not get to inspect that function's closure. To see a closure in action, we'd have to inspect a nested function instead:
>>> def spam(x):
...     def eggs():
...         return x
...     return eggs
...
>>> spam(1).__code__.co_freevars
('x',)
>>> spam(1)()
1
>>> spam(1).__closure__
>>> spam(1).__closure__[0].cell_contents
1
>>> spam(5).__closure__[0].cell_contents
5
So, to summarize:
List comprehensions get their own code objects in Python 3, and there is no difference between code objects for functions, generators or comprehensions; comprehension code objects are wrapped in a temporary function object and called immediately.
Code objects are created at compile time, and any non-local variables are marked as either global or as free variables, based on the nested scopes of the code. The class body is not considered a scope for looking up those variables.
When executing the code, Python has only to look into the globals, or the closure of the currently executing object. Since the compiler didn't include the class body as a scope, the temporary function namespace is not considered.
A workaround; or, what to do about it
If you were to create an explicit scope for the x variable, like in a function, you can use class-scope variables for a list comprehension:
>>> class Foo:
...     x = 5
...     def y(x):
...         return [x for i in range(1)]
...     y = y(x)
...
>>> Foo.y
[5]
The 'temporary' y function can be called directly; when we do, we replace it with its return value. Its scope is considered when resolving x:
>>> foo.__code__.co_consts[1].co_consts[2]
<code object y at 0x10a5df5d0, file "<stdin>", line 4>
>>> foo.__code__.co_consts[1].co_consts[2].co_cellvars
('x',)
Of course, people reading your code will scratch their heads over this a little; you may want to put a big fat comment in there explaining why you are doing this.
The best work-around is to just use __init__ to create an instance variable instead:
    def __init__(self):
        self.y = [self.x for i in range(1)]
and avoid all the head-scratching, and questions to explain yourself. For your own concrete example, I would not even store the namedtuple on the class; either use the output directly (don't store the generated class at all), or use a global:
from collections import namedtuple
State = namedtuple('State', ['name', 'capital'])
class StateDatabase:
    db = [State(*args) for args in [
        ('Alabama', 'Montgomery'),
        ('Alaska', 'Juneau'),
        # ...
    ]]
In my opinion it is a flaw in Python 3. I hope they change it.
Old Way (works in 2.7, throws NameError: name 'x' is not defined in 3+):
class A:
    x = 4
    y = [x+i for i in range(1)]
NOTE: simply scoping it with A.x would not solve it
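A short sketch of why `A.x` does not help: the name `A` is only bound once the class body has finished executing, so it cannot be referenced from inside that body. This assumes no earlier definition of `A` exists in the module; `message` is a made-up variable for inspecting the error.

```python
try:
    class A:
        x = 4
        # A is not bound until the class body has finished executing.
        y = [A.x + i for i in range(1)]
except NameError as exc:
    message = str(exc)

print(message)
```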
New Way (works in 3+):
class A:
    x = 4
    y = (lambda x=x: [x+i for i in range(1)])()
Because the syntax is so ugly, I typically just initialize all my class variables in the constructor.
The accepted answer provides excellent information, but there appear to be a few other wrinkles here -- differences between list comprehensions and generator expressions. A demo that I played around with:
class Foo:
    # A class-level variable.
    X = 10
    # I can use that variable to define another class-level variable.
    Y = sum((X, X))
    # Works in Python 2, but not 3.
    # In Python 3, list comprehensions were given their own scope.
    try:
        Z1 = sum([X for _ in range(3)])
    except NameError:
        Z1 = None
    # Fails in both.
    # Apparently, generator expressions (that's what the entire argument
    # to sum() is) did have their own scope even in Python 2.
    try:
        Z2 = sum(X for _ in range(3))
    except NameError:
        Z2 = None
    # Workaround: put the computation in lambda or def.
    compute_z3 = lambda val: sum(val for _ in range(3))
    # Then use that function.
    Z3 = compute_z3(X)
    # Also worth noting: here I can refer to XS in the for-part of the
    # generator expression (Z4 works), but I cannot refer to XS in the
    # inner-part of the generator expression (Z5 fails).
    XS = [15, 15, 15, 15]
    Z4 = sum(val for val in XS)
    try:
        Z5 = sum(XS[i] for i in range(len(XS)))
    except NameError:
        Z5 = None
print(Foo.Z1, Foo.Z2, Foo.Z3, Foo.Z4, Foo.Z5)
Since the outermost iterable is evaluated in the surrounding scope, we can use zip together with itertools.repeat to carry the dependencies over to the comprehension's scope:
import itertools as it
class Foo:
    x = 5
    y = [j for i, j in zip(range(3), it.repeat(x))]
One can also use nested for loops in the comprehension and include the dependencies in the outermost iterable:
class Foo:
    x = 5
    y = [j for j in (x,) for i in range(3)]
For the specific example of the OP:
from collections import namedtuple
import itertools as it
class StateDatabase:
    State = namedtuple('State', ['name', 'capital'])
    db = [State(*args) for State, args in zip(it.repeat(State), [
        ['Alabama', 'Montgomery'],
        ['Alaska', 'Juneau'],
        # ...
    ])]
This is a bug in Python. Comprehensions are advertised as being equivalent to for loops, but this is not true in classes. At least up to Python 3.6.6, in a comprehension used in a class, names from the class body are accessible only in the expression for the outermost iterable; they cannot be used anywhere else inside the comprehension. In a function, this scope limitation does not apply.
To illustrate why this is a bug, let's return to the original example. This fails:
class Foo:
    x = 5
    y = [x for i in range(1)]
But this works:
def Foo():
    x = 5
    y = [x for i in range(1)]
The limitation is stated at the end of this section in the reference guide.
This may be by design, but IMHO, it's a bad design. I know I'm not an expert here, and I've tried reading the rationale behind this, but it just goes over my head, as I think it would for any average Python programmer.
To me, a comprehension doesn't seem that much different than a regular mathematical expression. For example, if 'foo' is a local function variable, I can easily do something like:
(foo + 5) + 7
But I can't do:
[foo + x for x in [1,2,3]]
To me, the fact that one expression exists in the current scope and the other creates a scope of its own is very surprising and, no pun intended, 'incomprehensible'.
I spent quite some time to understand why this is a feature, not a bug.
Consider the simple code:
a = 5
def myfunc():
    print(a)
Since there is no "a" defined in myfunc(), the lookup falls through to the enclosing scope and the code executes.
Now consider the same code in a class. It cannot work, because it would make access to the data on class instances ambiguous: you would never know whether you are accessing a variable in the base class or in the instance.
The list comprehension is just a sub-case of the same effect.
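The same rule can be seen with an ordinary method. In this illustrative sketch (the class `Counter` and its members are invented for the example, and no global named `step` is assumed to exist), a bare name inside a method never resolves to a class attribute; it has to be reached through `self` or the class:

```python
class Counter:
    step = 2

    def bump_bare(self, n):
        # Bare name: searched in the function's locals, enclosing functions,
        # globals and builtins, but never in the class body.
        return n + step

    def bump(self, n):
        # Attribute access goes through the instance, then the class.
        return n + self.step

counter = Counter()
print(counter.bump(1))
try:
    counter.bump_bare(1)
except NameError as exc:
    print(exc)
```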
One can use a for loop:
class A:
    x=5
    ##Won't work:
    ## y=[i for i in range(101) if i%x==0]
    y=[]
    for i in range(101):
        if i%x==0:
            y.append(i)
Please correct me if I'm wrong...