Difference between "global" and "import __main__" - python

I defined three functions that should change a global variable x.
def changeXto1():
    global x
    x = 1

def changeXto2():
    from __main__ import x
    x = 2

def changeXto3():
    import __main__
    __main__.x = 3

x = 0
print x
changeXto1()
print x
changeXto2()
print x
changeXto3()
print x
It gives the result:
0
1
1
3
changeXto1 uses the normal global statement. As expected, x == 1 afterwards. changeXto2 uses from __main__ import x to address x. This doesn't work: afterwards x is still 1. changeXto3 uses import __main__ to address x via __main__.x. As expected, x == 3 afterwards.
Why doesn't from __main__ import work in changeXto2, while import __main__ is working in changeXto3? Why do we need a global statement in Python if we can address global variables also with the __main__ module?

This is related to how Python translates your code to bytecode (the compilation step).
When compiling a function, Python treats every variable that is assigned to as a local variable and performs an optimisation to reduce the number of name lookups it has to do. Each local variable gets assigned an index, and when the function is called, its value is stored in an array of local slots in the function's frame, addressed by that index. The compiler emits LOAD_FAST and STORE_FAST opcodes to access such variables.
The global statement instead tells the compiler that even though the variable is assigned a value, it should not be treated as a local variable and should not get an index. The compiler then uses the LOAD_GLOBAL and STORE_GLOBAL opcodes to access the variable. Those opcodes are slower because they look the name up by string in dictionaries (the module's globals and, for loads, the builtins).
If a variable is only read and never assigned in the function, the compiler also emits LOAD_GLOBAL: since nothing in the function binds the name, it cannot be a local, so the compiler assumes it is a global (unless an enclosing function defines it, in which case it becomes a closure variable).
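A quick way to check this on a modern Python 3 interpreter is to ask dis which opcodes the compiler chose. This sketch is an illustration, not part of the original answer: the helper opnames() and the three function names are made up, and since exact bytecode differs between versions, it only checks for the opcode names discussed above.

```python
import dis

x = 0

def assigns_local():
    x = 1          # assigned with no declaration: compiled as a local (STORE_FAST)

def assigns_global():
    global x       # declared global: the store is compiled as STORE_GLOBAL
    x = 1

def reads_only():
    return x       # never assigned here: compiled as a global lookup (LOAD_GLOBAL)

def opnames(func):
    """Return the set of opcode names the compiler emitted for func."""
    return {ins.opname for ins in dis.get_instructions(func)}

print("STORE_FAST" in opnames(assigns_local))     # True
print("STORE_GLOBAL" in opnames(assigns_global))  # True
print("LOAD_GLOBAL" in opnames(reads_only))       # True
assigns_global()
print(x)                                          # 1
```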
So, in your first function, using global x informs the compiler that you want it to treat the write access to x as writing to a global variable instead of a local variable. The opcodes for the function make it clear:
>>> dis.dis(changeXto1)
3 0 LOAD_CONST 1 (1)
3 STORE_GLOBAL 0 (x)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
In your third example, you import the __main__ module into a local variable named __main__ and then assign to its x attribute. Since modules are objects that store all their top-level names as attributes, you are assigning to the variable x in the __main__ module. And, as you found, the attributes of the __main__ module map directly to the entries of the globals() dictionary, because your code is defined in the __main__ module. The opcodes show that you don't access x directly:
>>> dis.dis(changeXto3)
2 0 LOAD_CONST 1 (-1)
3 LOAD_CONST 0 (None)
6 IMPORT_NAME 0 (__main__)
9 STORE_FAST 0 (__main__)
3 12 LOAD_CONST 2 (3)
15 LOAD_FAST 0 (__main__)
18 STORE_ATTR 1 (x)
21 LOAD_CONST 0 (None)
24 RETURN_VALUE
The second example is interesting. Since you assign a value to the variable x, the compiler assumes it is a local variable and applies the optimisation. Then, from __main__ import x imports the module __main__ and binds the value of x in module __main__ to a new local variable named x. This is always the case: from ${module} import ${name} just creates a new binding in the current namespace. When you assign a new value to x, you only change the local binding, not the unrelated binding in module __main__ (though if the value is mutable and you mutate it, the change will be visible through all the bindings). Here are the opcodes:
>>> dis.dis(changeXto2)
2 0 LOAD_CONST 1 (-1)
3 LOAD_CONST 2 (('x',))
6 IMPORT_NAME 0 (__main__)
9 IMPORT_FROM 1 (x)
12 STORE_FAST 0 (x)
15 POP_TOP
3 16 LOAD_CONST 3 (2)
19 STORE_FAST 0 (x)
22 LOAD_CONST 0 (None)
25 RETURN_VALUE
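To see the difference between rebinding and mutating without depending on what __main__ is in your environment, the sketch below registers a throwaway module object under the hypothetical name demo_mod in sys.modules and lets it play the role of __main__. It is an illustration under that assumption, written for Python 3.

```python
import sys
import types

# Hypothetical stand-in for __main__: a fresh module registered as "demo_mod".
demo = types.ModuleType("demo_mod")
demo.x = 1
demo.items = []
sys.modules["demo_mod"] = demo

def rebind_copy():
    from demo_mod import x   # creates a NEW local name x bound to demo.x's value
    x = 2                    # rebinds only that local name
    return x

def set_attribute():
    import demo_mod
    demo_mod.x = 3           # assigns into the module's own namespace

def mutate_shared():
    from demo_mod import items
    items.append("seen")     # mutates the shared list; no rebinding happens

print(rebind_copy(), demo.x)  # 2 1
set_attribute()
print(demo.x)                 # 3
mutate_shared()
print(demo.items)             # ['seen']
```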
A good way to think about this is that in Python every assignment binds a name to a value in a dictionary, and every dereference is just a dictionary lookup (this is a rough approximation, but pretty close to the conceptual model). When you write obj.field, you are looking up the "field" key in the hidden dictionary of obj (accessible via obj.__dict__).
When you use a bare variable name, it is looked up in the locals() dictionary, then in the globals() dictionary if that is different (they are the same when the code is executed at module level), and finally in the builtins. An assignment always puts the binding in the locals() dictionary, unless you declared that you want global access with global ${name} (this syntax also works at top level).
So, translating your functions, this is almost as if you had written:
# NOTE: this is a conceptual model only, here for demonstration. It is not
# equivalent to the original code: writing to the dict returned by locals()
# inside a function does not change the function's real local variables
# in CPython.
def changeXto1():
    globals()['x'] = 1

def changeXto2():
    locals()['x'] = __import__('__main__').__dict__['x']
    locals()['x'] = 2

def changeXto3():
    locals()['__main__'] = __import__('__main__')
    locals()['__main__'].__dict__['x'] = 3
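The dictionary model holds up well for object attributes and module globals (the unreliable part is only locals() inside a function). A small Python 3 sketch; Point, set_y and hypothetical_mod are names invented for the example:

```python
import types

class Point:
    pass

p = Point()
p.field = 10                 # attribute assignment stores a key in p.__dict__
print(p.__dict__)            # {'field': 10}
p.__dict__["field"] = 11     # writing the dict directly...
print(p.field)               # 11  ...is visible as the attribute

mod = types.ModuleType("hypothetical_mod")
mod.x = 3                    # module attributes live in the module's __dict__
print(mod.__dict__["x"])     # 3

def set_y():
    globals()["y"] = 5       # same effect as: global y; y = 5

set_y()
print(y)                     # 5
```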

Why doesn't from __main__ import work in changeXto2, while import __main__ is working in changeXto3?
It works fine, it just doesn't do what you want. It copies the name and value into the local namespace instead of having the code access __main__'s namespace.
Why do we need a global statement in Python if we can address global variables also with the __main__ module?
Because they only do the same thing when your code is running in __main__. If you're running in, say, othermodule after importing it, then __main__ will refer to the main script and not othermodule.
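A Python 3 sketch of that situation: it builds a module object under the hypothetical name othermodule from a source string (standing in for an imported file) and shows that global x writes to that module, while import __main__ reaches the main script's namespace instead.

```python
import sys
import types

# Hypothetical module source; in real code this would live in othermodule.py.
SOURCE = '''
x = 0

def set_with_global():
    global x
    x = 1            # writes othermodule's own globals

def set_with_main():
    import __main__
    __main__.x = 2   # writes the main script's globals instead
'''

othermodule = types.ModuleType("othermodule")
exec(SOURCE, othermodule.__dict__)   # run the source in the module's namespace

othermodule.set_with_global()
othermodule.set_with_main()

main = sys.modules["__main__"]
print(othermodule.x)   # 1: set_with_main never touched othermodule
print(main.x)          # 2
# A function's global namespace is the module it was defined in:
print(othermodule.set_with_global.__globals__ is othermodule.__dict__)  # True
```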

Related

UnboundLocalError when manipulating variables yields inconsistent behavior

In Python, the following code works:
a = 1
b = 2
def test():
    print a, b
test()
And the following code works:
a = 1
b = 2
def test():
    if a == 1:
        b = 3
    print a, b
test()
But the following does not work:
a = 1
b = 2
def test():
    if a == 1:
        a = 3
    print a, b
test()
The result of this last block is an UnboundLocalError message, saying a is referenced before assignment.
I understand I can make the last block work if I add global a in the test() definition, so it knows which a I am talking about.
Why do I not get an error when assigning a new value to b?
Am I creating a local b variable, and it doesn't yell at me because I'm not trying to reference it before assignment?
But if that's the case, why can I print a, b in the case of the first block, without having to declare global a, b beforehand?
The Python docs state this clearly:
If a variable is assigned a new value anywhere within the function’s body, it’s assumed to be a local.
Thus your variable a is a local variable, not a global one. This is because of the assignment statement
a = 3
on the third line of your function. That assignment makes a a local variable, and referring to a local variable before it has been assigned raises an UnboundLocalError.
However, in your second code block you never assign to a, so you do not get that error.
Another useful reference is the documentation for UnboundLocalError:
Raised when a reference is made to a local variable in a function or method, but no value has been bound to that variable.
Thus you are referring to the local variable you create in the next line.
There are two ways to prevent this:
Good way - Passing parameters
Define your function as def test(a): and call it as test(a).
Bad way - Using global
Put the line global a at the top of your function body.
Python's scoping rules are a little tricky, and you need to master them to get a good grip on the language.
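A Python 3 rendering of the failing block and both fixes (the function names broken, with_parameter and with_global are made up for this sketch):

```python
a = 1
b = 2

def broken():
    if a == 1:          # a is assigned below, so it is local everywhere in here
        a = 3
    return a, b

def with_parameter(a):  # good way: the value comes in as a parameter
    if a == 1:
        a = 3
    return a, b

def with_global():      # bad way: declare that a means the module global
    global a
    if a == 1:
        a = 3
    return a, b

try:
    broken()
except UnboundLocalError as exc:
    print("broken():", type(exc).__name__)  # broken(): UnboundLocalError

print(with_parameter(a), a)  # (3, 2) 1  (the global a is untouched)
print(with_global(), a)      # (3, 2) 3  (the global a was rebound)
```

Passing the value in keeps the function independent of module state, which is why the answer calls it the better option.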
In the third block the compiler has marked a as a local variable since it is being assigned to, therefore when it is used in the expression it is looked for in the local scope. Since it does not exist there, an exception is raised.
In the second block the compiler has marked b as a local variable but not a, hence there is no exception when a is accessed since outer scopes will be searched.
When you modify a, it becomes a local variable. When you're simply referencing it, it is a global. You haven't defined a in the local scope, so you can't modify it.
If you want to modify a global, you need to declare it global in your local scope.
Take a look at the bytecode for the following:
import dis

a = 9  # Global

def foo():
    print a  # Still global

def bar():
    a += 1  # This "a" is local

dis.dis(foo)
Output:
2 0 LOAD_GLOBAL 0 (a)
3 PRINT_ITEM
4 PRINT_NEWLINE
5 LOAD_CONST 0 (None)
8 RETURN_VALUE
For the second function:
dis.dis(bar)
Output:
2 0 LOAD_FAST 0 (a)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_FAST 0 (a)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
The first function's bytecode loads the global a (LOAD_GLOBAL) because it is only being referenced. The second function's bytecode (LOAD_FAST) tries to load a local a but one hasn't been defined.
The only reason your second block works is that a is equal to 1. If a were anything other than 1, the local assignment to b wouldn't happen and you'd get the same error.
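To make that dependency visible, this Python 3 sketch turns a into a parameter (a change from the question's code, which reads the global a) so both paths can be exercised; second_block is a made-up name:

```python
b = 2

def second_block(a):
    if a == 1:
        b = 3        # an assignment anywhere in the body makes b local throughout
    return a, b      # fails whenever the branch above was skipped

print(second_block(1))  # (1, 3)
try:
    second_block(0)
except UnboundLocalError as exc:
    print("second_block(0) raised", type(exc).__name__)
```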

Creating dynamically named variables in a function in python 3 / Understanding exec / eval / locals in python 3

First of all, let me say that I read the many threads with similar topics on creating dynamically named variables, but they mostly relate to Python 2 or they assume you are working with classes. And yes, I read Behavior of exec function in Python 2 and Python 3.
I'm also aware that creating dynamically named variables is a bad idea 99% of the time and dictionaries are the way to go, but I just want to know whether it is still possible and how exactly exec and locals work in Python 3.
I'd like to show a bit of sample code illustrating my question (fibonacci calculates fibonacci numbers, ListOfLetters provides ["A", "B", ...]):
def functionname():
    for index, buchstabe in enumerate(ListOfLetters.create_list("A", "K"), 1):
        exec("{} = {}".format(buchstabe, fibonacci(index)))  # A = 1, B = 1, C = 2, D = 3, E = 5, ...
        print(index, buchstabe, eval(buchstabe))  # works nicely, e.g. prints "4 D 3"
    print(locals())  # prints all locals: {'B': 1, 'A': 1, 'index': 11, 'C': 2, 'H': 21, 'K': 89, ...
    print(locals()['K'])  # prints 89 as it should
    print(eval("K"))  # prints 89 as it should
    print(K)  # NameError: name 'K' is not defined
So, at least as I currently understand it, there is some inconsistency in the behaviour of locals(), since it contains the variable names added by exec() but the variables are not available in the function.
I would be grateful if someone could explain this and tell me whether this is by design or a real inconsistency in the language. Yes, I know that locals should not be modified, but I'm not modifying it, I'm calling exec()...
When you're not sure why something works the way it does in Python, it often can help to put the behavior that you're confused by in a function and then disassemble it from the Python bytecode with the dis module.
Let's start with a simpler version of your code:
def foo():
    exec("K = 89")
    print(K)
If you run foo(), you'll get the same exception you're seeing with your more complicated function:
>>> foo()
Traceback (most recent call last):
File "<pyshell#167>", line 1, in <module>
foo()
File "<pyshell#166>", line 3, in foo
print(K)
NameError: name 'K' is not defined
Let's disassemble it and see why:
>>> import dis
>>> dis.dis(foo)
2 0 LOAD_GLOBAL 0 (exec)
3 LOAD_CONST 1 ('K = 89')
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 POP_TOP
3 10 LOAD_GLOBAL 1 (print)
13 LOAD_GLOBAL 2 (K)
16 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
19 POP_TOP
20 LOAD_CONST 0 (None)
23 RETURN_VALUE
The operation that you need to pay attention to is the one labeled "13". This is where the compiler handles looking up K within the last line of the function (print(K)). It is using the LOAD_GLOBAL opcode, which fails because "K" is not a global variable name, rather it's a value in our locals() dict (added by the exec call).
What if we persuaded the compiler to see K as a local variable (by giving it a value before running the exec), so it will know not to look for a global variable that doesn't exist?
def bar():
    K = None
    exec("K = 89")
    print(K)
This function won't give you an error if you run it, but you won't get the expected value printed out:
>>> bar()
None
Let's disassemble to see why:
>>> dis.dis(bar)
2 0 LOAD_CONST 0 (None)
3 STORE_FAST 0 (K)
3 6 LOAD_GLOBAL 0 (exec)
9 LOAD_CONST 1 ('K = 89')
12 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
15 POP_TOP
4 16 LOAD_GLOBAL 1 (print)
19 LOAD_FAST 0 (K)
22 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
25 POP_TOP
26 LOAD_CONST 0 (None)
29 RETURN_VALUE
Note the opcodes used at "3" and "19". The Python compiler uses STORE_FAST and LOAD_FAST to put the value for the local variable K into slot 0 and later fetch it back out. Using numbered slots is significantly faster than inserting and fetching values from a dictionary like locals(), which is why the Python compiler does it for all local variable access in a function. You can't overwrite a local variable in a slot by modifying the dictionary returned by locals() (as exec does, if you don't pass it a dict to use for its namespace).
Indeed, let's try a third version of our function, where we peek into locals again when we have K defined as a regular local variable:
def baz():
    K = None
    exec("K = 89")
    print(locals())
You won't see 89 in the output this time either!
>>> baz()
{'K': None}
The reason you see the old K value in locals() is explained in the function's documentation:
Update and return a dictionary representing the current local symbol table.
The slot that the local variable K's value is stored in was not changed by the exec statement, which only modifies the locals() dict. When you call locals() again, Python "update[s]" the dictionary with the value from the slot, replacing the value stored there by exec.
This is why the docs go on to say:
Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
Your exec call is modifying the locals() dict, and you're finding how its changes are not always seen by your later code.
On the exec/eval/locals question
At least in the CPython implementation, modifications to the locals() dictionary do not actually change the names in the local scope, which is why it's meant to be used read-only. You can change it, and you can see your changes in the dictionary object, but the actual local scope is not changed.
exec() takes two optional dictionary arguments, a global scope and a local scope. It defaults to globals() and locals(), but since changes to locals() aren't "real" outside of the dictionary, exec() only affects the "real" local scope when globals() is locals(), i.e. in a module outside of any function. (So in your case it's failing because it's inside a function scope).
The "better" way to use exec() in this case is to pass in your own dictionary, then operate on the values in that.
def foo():
    exec_scope = {}
    exec("y = 2", exec_scope)
    print(exec_scope['y'])

foo()
In this case, exec_scope is used as the global and local scope for the exec, and after the exec it will contain {'y': 2, '__builtins__': __builtins__} (the builtins are inserted for you if not present)
If you want access to more globals you could do exec_scope = dict(globals()).
Passing in different global and local scope dictionaries can produce "interesting" behavior.
If you pass the same dictionary into successive calls to exec or eval, then they have the same scope, which is why your eval worked (it implicitly used the locals() dictionary).
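Applied to the question's loop, the explicit-namespace approach looks like this in Python 3. It is a sketch: letters_with_values and the squared values are stand-ins (the question's fibonacci and ListOfLetters helpers are not shown), and the same namespace dict is passed to both exec and eval so that they share one scope.

```python
import string

def letters_with_values():
    namespace = {}                        # explicit scope instead of locals()
    for index, letter in enumerate(string.ascii_uppercase[:5], 1):
        exec(f"{letter} = {index * index}", namespace)
        # eval sees the name because it gets the same dict exec wrote to
        print(index, letter, eval(letter, namespace))
    # drop the '__builtins__' entry that exec inserted
    return {k: v for k, v in namespace.items() if k != "__builtins__"}

result = letters_with_values()
print(result)  # {'A': 1, 'B': 4, 'C': 9, 'D': 16, 'E': 25}
```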
On dynamic variable names
If you set the name from a string, what's so wrong about getting the value as a string (i.e. what a dictionary does)? In other words, why would you want to set locals()['K'] and then access K? If K is in your source it's not really a dynamically set name... hence, dictionaries.
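The plain-dictionary version of the question's loop needs no exec at all. In this sketch fibonacci is a minimal hypothetical stand-in for the question's helper (assumed to return 1, 1, 2, 3, 5, ... for indices 1, 2, 3, ...), and string.ascii_uppercase[:11] stands in for ListOfLetters.create_list("A", "K"):

```python
import string

def fibonacci(n):
    """Hypothetical stand-in: fibonacci(1) == fibonacci(2) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

values = {}
for index, letter in enumerate(string.ascii_uppercase[:11], 1):  # "A".."K"
    values[letter] = fibonacci(index)

print(values["D"])  # 3
print(values["K"])  # 89
```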

How can "NameError: free variable 'var' referenced before assignment in enclosing scope" occur in real code?

While I was hanging out in the Python chatroom, someone dropped in and reported the following exception:
NameError: free variable 'var' referenced before assignment in enclosing scope
I'd never seen that error message before, and the user provided only a small code fragment that couldn't have caused the error by itself, so off I went googling for information, and ... there doesn't seem to be much. While I was searching, the user reported their problem solved as a "whitespace issue", and then left the room.
After playing around a bit, I've only been able to reproduce the exception with toy code like this:
def multiplier(n):
    def multiply(x):
        return x * n
    del n
    return multiply
Which gives me:
>>> triple = multiplier(3)
>>> triple(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in multiply
NameError: free variable 'n' referenced before assignment in enclosing scope
All well and good, but I'm having a hard time working out how this exception could occur in the wild, given that my example above is
Pretty stupid
Unlikely to happen by accident
... but obviously it does, given the report I mentioned at the start of this question.
So - how can this specific exception occur in real code?
Think of a more complex function where n may or may not be bound, depending on some condition. You don't have to del the name in question: the error also occurs if the compiler sees an assignment, so the name is local, but that code path is not taken and the name never gets assigned anything. Another stupid example:
def f():
    def g(x):
        return x * n
    if False:
        n = 10
    return g
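A variant where the branch depends on a runtime argument rather than a constant, which is closer to what happens in real code. The names make_scaler and scale are hypothetical, and the exact error text depends on the Python version (it was reworded in 3.11), so the sketch only relies on the exception type:

```python
def make_scaler(configured):
    def scale(x):
        return x * n        # n is a free variable from make_scaler
    if configured:
        n = 10              # only bound on this code path
    return scale

print(make_scaler(True)(3))   # 30
try:
    make_scaler(False)(3)     # n's cell was never filled
except NameError as exc:
    print(type(exc).__name__, "-", exc)
```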
This is a late answer, but I think I can give some detailed information about this situation, which should help future readers see what's going on here.
So the error message says:
NameError: free variable 'var' referenced before assignment in enclosing scope
When we talk about free variables, we're dealing with nested functions. Python has done some "magic" in order to give nested functions the ability to access the variables defined inside their parent scope. If we have:
def outer():
    foo = 10
    def inner():
        print(foo)
    return inner

outer()()  # 10
Normally we shouldn't have access to foo in the inner function. Why? Because after the outer function has been called and its body has executed, its namespace is destroyed. Basically, any local variable defined inside a function is no longer available after the function terminates.
But we have access...
That magic happens with the help of the "Cell object":
“Cell” objects are used to implement variables referenced by multiple scopes. For each such variable, a cell object is created to store the value; the local variables of each stack frame that references the value contains a reference to the cells from outer scopes which also use that variable. When the value is accessed, the value contained in the cell is used instead of the cell object itself.
Just to see the hidden value stored in the cells (we'll talk about __closure__ a bit later):
def outer():
    foo = 10
    def inner():
        print(foo)
    return inner

print(outer().__closure__[0].cell_contents)  # 10
How does it work?
At compile time,
when Python sees a function within another function, it takes note of the names of the variables referenced inside the nested function that are actually defined in the outer function. This information is stored in both functions' code objects: co_cellvars for the outer function and co_freevars for the inner function:
def outer():
    foo = 10
    def inner():
        print(foo)
    return inner

print(outer.__code__.co_cellvars)  # ('foo',)
print(outer().__code__.co_freevars)  # ('foo',)
Now, execution time (see the code below):
When Python executes the outer function, it creates a "cell object" for each variable (in co_cellvars) that it has taken note of.
Then, as it goes through the lines, whenever it sees an assignment to such a variable, it stores the value in the corresponding cell object. (Remember, cells hold the actual values indirectly.)
When execution reaches the line that creates the inner function, Python takes the cell objects for the inner function's free variables and makes a tuple out of them. This tuple is then assigned to the inner function's __closure__.
The point is that when this tuple is created, some of the cells may not have a value yet. They are empty (see the output)!
At this point, if you call the inner function, the cells without a value will raise the error mentioned above!
def outer():
    foo = 10

    def inner():
        print(foo)
        try:
            print(boo)
        except NameError as e:
            print(e)

    # Take a look at inner's __closure__ cells
    print(inner.__closure__)
    # The cell for boo is still empty, so this call hits the NameError
    inner()
    # Now let's look at inner's __closure__ cells one more time (they're filled now)
    boo = 20
    print(inner.__closure__)
    # This works fine now
    inner()

outer()
output from Python 3.10:
(<cell at 0x7f14a5b62710: empty>, <cell at 0x7f14a5b62830: int object at 0x7f14a6f00210>)
10
free variable 'boo' referenced before assignment in enclosing scope
(<cell at 0x7f14a5b62710: int object at 0x7f14a6f00350>, <cell at 0x7f14a5b62830: int object at 0x7f14a6f00210>)
10
20
The error free variable 'boo' referenced before assignment in enclosing scope makes sense now.
Note: This error is reworded in Python 3.11 to:
cannot access free variable 'boo' where it is not associated with a value in enclosing scope
But the idea is the same.
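You can also observe the empty cell directly: reading cell_contents of a cell that has not been filled yet raises ValueError. A Python 3 sketch (outer, inner and later are illustrative names):

```python
def outer():
    def inner():
        return later                # free variable; its cell starts out empty
    cell = inner.__closure__[0]     # the cell shared with outer
    try:
        cell.cell_contents
        filled_before = True
    except ValueError:              # empty cells raise ValueError when read
        filled_before = False
    later = 20                      # fills the same cell (STORE_DEREF)
    return inner, filled_before

inner, filled_before = outer()
print(inner.__code__.co_freevars)          # ('later',)
print(filled_before)                        # False
print(inner.__closure__[0].cell_contents)   # 20
print(inner())                              # 20
```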
If you look at the bytecode of the outer function, you'd see the steps I mentioned in the "execution time" section in action:
from dis import dis

def outer():
    foo = 10
    def inner():
        print(foo)
        print(boo)
    boo = 20
    return inner

dis(outer)
output from Python 3.11:
0 MAKE_CELL 1 (boo)
2 MAKE_CELL 2 (foo)
3 4 RESUME 0
4 6 LOAD_CONST 1 (10)
8 STORE_DEREF 2 (foo)
5 10 LOAD_CLOSURE 1 (boo)
12 LOAD_CLOSURE 2 (foo)
14 BUILD_TUPLE 2
16 LOAD_CONST 2 (<code object inner at 0x7fb6d4731a30, file "", line 5>)
18 MAKE_FUNCTION 8 (closure)
20 STORE_FAST 0 (inner)
8 22 LOAD_CONST 3 (20)
24 STORE_DEREF 1 (boo)
9 26 LOAD_FAST 0 (inner)
28 RETURN_VALUE
MAKE_CELL is new in Python 3.11.
STORE_DEREF stores the value inside the cell object.

How references to variables are resolved in Python [duplicate]

This question already has answers here:
Short description of the scoping rules?
(9 answers)
This message is a bit long with many examples, but I hope it
will help me and others to better grasp the full story of variables
and attribute lookup in Python 2.7.
I am using the terms of PEP 227
(http://www.python.org/dev/peps/pep-0227/) for code blocks (such as
modules, class definition, function definitions, etc.) and
variable bindings (such as assignments, argument declarations, class
and function declaration, for loops, etc.)
I am using the term variables for names that can be referenced without a
dot, and attributes for names that need to be qualified with an object
name (such as obj.x for the attribute x of object obj).
There are three scopes in Python for all code blocks except functions:
Local
Global
Builtin
There are four scopes in Python for functions only (according to
PEP 227):
Local
Enclosing functions
Global
Builtin
The rule for a variable to bind it to and find it in a block is
quite simple:
any binding of a variable to an object in a block makes this variable
local to this block, unless the variable is declared global (in that
case the variable belongs to the global scope)
a reference to a variable is looked up using the rule LGB (local,
global, builtin) for all blocks except functions
a reference to a variable is looked up using the rule LEGB (local,
enclosing, global, builtin) for the functions only.
Let me now take examples validating this rule, and showing many
special cases. For each example, I will give my understanding. Please
correct me if I am wrong. For the last example, I don't understand the
outcome.
example 1:
x = "x in module"
class A():
    print "A: " + x  # x in module
    x = "x in class A"
    print locals()
    class B():
        print "B: " + x  # x in module
        x = "x in class B"
        print locals()
        def f(self):
            print "f: " + x  # x in module
            self.x = "self.x in f"
            print x, self.x
            print locals()
>>>A.B().f()
A: x in module
{'x': 'x in class A', '__module__': '__main__'}
B: x in module
{'x': 'x in class B', '__module__': '__main__'}
f: x in module
x in module self.x in f
{'self': <__main__.B instance at 0x00000000026FC9C8>}
There is no nested scope for the classes (rule LGB) and a function in
a class cannot access the attributes of the class without using a
qualified name (self.x in this example). This is well described in
PEP227.
example 2:
z = "z in module"
def f():
    z = "z in f()"
    class C():
        z = "z in C"
        def g(self):
            print z
            print C.z
    C().g()
f()
>>>
z in f()
z in C
Here variables in functions are looked up using the LEGB rule, but if
a class is in the path, the class scope is skipped. Here again,
this is what PEP 227 is explaining.
example 3:
var = 0
def func():
    print var
    var = 1
>>> func()
Traceback (most recent call last):
File "<pyshell#102>", line 1, in <module>
func()
File "C:/Users/aa/Desktop/test2.py", line 25, in func
print var
UnboundLocalError: local variable 'var' referenced before assignment
We expect with a dynamic language such as python that everything is
resolved dynamically. But this is not the case for functions. Local
variables are determined at compile time. PEP 227 and
http://docs.python.org/2.7/reference/executionmodel.html describe this
behavior this way
"If a name binding operation occurs anywhere within a code block, all
uses of the name within the block are treated as references to the
current block."
example 4:
x = "x in module"
class A():
    print "A: " + x
    x = "x in A"
    print "A: " + x
    print locals()
    del x
    print locals()
    print "A: " + x
>>>
A: x in module
A: x in A
{'x': 'x in A', '__module__': '__main__'}
{'__module__': '__main__'}
A: x in module
But we see here that this statement in PEP227 "If a name binding
operation occurs anywhere within a code block, all uses of the name
within the block are treated as references to the current block." is
wrong when the code block is a class. Moreover, for classes, it seems
that local name binding is not made at compile time, but during
execution using the class namespace. In that respect,
PEP227 and the execution model in the Python doc are misleading and,
for some parts, wrong.
example 5:
x = 'x in module'
def f2():
    x = 'x in f2'
    def myfunc():
        x = 'x in myfunc'
        class MyClass(object):
            x = x
            print x
        return MyClass
    myfunc()
f2()
>>>
x in module
My understanding of this code is the following. The instruction x = x
first looks up the object that the right-hand x of the expression refers
to. In that case, the object is looked up locally in the class, then
following the rule LGB it is looked up in the global scope, which is
the string 'x in module'. Then a local attribute x to MyClass is
created in the class dictionary and pointed to the string object.
example 6:
Now here is an example I cannot explain.
It is very close to example 5, I am just changing the local MyClass
attribute from x to y.
x = 'x in module'
def f2():
    x = 'x in f2'
    def myfunc():
        x = 'x in myfunc'
        class MyClass(object):
            y = x
            print y
        return MyClass
    myfunc()
f2()
>>>
x in myfunc
Why in that case the x reference in MyClass is looked up in the
innermost function?
In an ideal world, you'd be right and some of the inconsistencies you found would be wrong. However, CPython has optimized some scenarios, specifically function locals. These optimizations, together with how the compiler and evaluation loop interact and historical precedent, lead to the confusion.
Python translates code to bytecode, which is then executed by an interpreter loop. The 'regular' opcode for accessing a name is LOAD_NAME, which looks up a variable name as you would in a dictionary. LOAD_NAME first looks up a name as a local; if that fails, it looks for a global, and then a builtin. LOAD_NAME raises a NameError exception when the name is not found.
For nested scopes, looking up names outside of the current scope is implemented using closures: if a name is not assigned to in the current scope but is available in an enclosing (not global) function scope, its value is handled as a closure. This is needed because a parent scope can hold different values for a given name at different times; two calls to a parent function can lead to different closure values. So Python has LOAD_CLOSURE, MAKE_CLOSURE and LOAD_DEREF opcodes for that situation; the first two opcodes are used in loading and creating a closure for a nested scope, and LOAD_DEREF loads the closed-over value when the nested scope needs it.
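A short Python 3 illustration of why closure values cannot simply be baked into the nested function: each call to the parent creates a new cell, so two closures built from the same code object see different values. make_adder and add are invented for this sketch; LOAD_DEREF is the opcode named above for reading the closed-over value.

```python
import dis

def make_adder(n):
    def add(x):
        return x + n        # n is read from a closure cell
    return add

add2, add10 = make_adder(2), make_adder(10)
print(add2(1), add10(1))                 # 3 11
print(add2.__code__ is add10.__code__)   # True: same code, different cells
print("LOAD_DEREF" in {ins.opname for ins in dis.get_instructions(add2)})  # True
```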
Now, LOAD_NAME is relatively slow: each lookup is a dictionary lookup, which means it has to hash the key first and run a few equality tests (if the name wasn't interned). If the name isn't local, it has to do this again for the globals. Functions can potentially be called tens of thousands of times, so this can get tedious fast. That is why function locals have special opcodes. Loading a local name is implemented by LOAD_FAST, which looks up local variables by index in a special local names array. This is much faster, but it requires the compiler to first determine whether a name is local or global. To still be able to look up global names, another opcode, LOAD_GLOBAL, is used. The compiler explicitly optimizes for this case to generate the special opcodes. LOAD_FAST raises an UnboundLocalError exception when there is not yet a value for the name.
Class definition bodies on the other hand, although they are treated much like a function, do not get this optimization step. Class definitions are not meant to be called all that often; most modules create classes once, when imported. Class scopes don't count when nesting either, so the rules are simpler. As a result, class definition bodies do not act like functions when you start mixing scopes up a little.
So, for non-function scopes, LOAD_NAME and LOAD_DEREF are used for locals and globals, and for closures, respectively. For functions, LOAD_FAST, LOAD_GLOBAL and LOAD_DEREF are used instead.
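The difference between class bodies and function bodies can be seen by compiling a small module and inspecting the nested code objects. A Python 3 sketch; value, C and f are illustrative names, and the helpers nested_code() and load_ops() are invented for it:

```python
import dis
import types

SOURCE = '''
class C:
    a = value      # class body: not optimized
def f():
    return value   # function body: optimized
'''
module_code = compile(SOURCE, "<demo>", "exec")

def nested_code(code, name):
    """Return the code object for the class or function body called name."""
    for const in code.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            return const
    raise LookupError(name)

def load_ops(code, name):
    """Opcode names used to load `name` inside `code`."""
    return {ins.opname for ins in dis.get_instructions(code)
            if ins.opname.startswith("LOAD") and ins.argval == name}

print(load_ops(nested_code(module_code, "C"), "value"))  # {'LOAD_NAME'}
print(load_ops(nested_code(module_code, "f"), "value"))  # {'LOAD_GLOBAL'}
```

The source is only compiled, never executed, so value does not need to exist.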
Note that class bodies are executed as soon as Python executes the class line! So in example 1, class B inside class A is executed as soon as class A is executed, which is when you import the module. In example 2, C is not executed until f() is called, not before.
Let's walk through your examples:
Example 1: You have nested a class A.B in a class A. Class bodies do not form nested scopes, so even though the A.B class body is executed when class A is executed, the compiler will use LOAD_NAME to look up x. A.B().f() is a function (bound to the B() instance as a method), so it uses LOAD_GLOBAL to load x. We'll ignore attribute access here; that's a very well-defined name pattern.
Example 2: Here f().C.z is at class scope, so the function f().C().g() will skip the C scope and look at the f() scope instead, using LOAD_DEREF.
Example 3: Here var was determined to be a local by the compiler because you assign to it within the scope. Functions are optimized, so LOAD_FAST is used to look up the local and an exception is thrown.
Example 4: Now things get a little weird. class A is executed at class scope, so LOAD_NAME is used. A.x was deleted from the locals dictionary for the scope, so the last access to x finds the global x instead; LOAD_NAME looked for a local first, didn't find it there, and fell back to the global lookup.
Yes, this appears inconsistent with the documentation. Python-the-language and CPython-the-implementation are clashing a little here. You are, however, pushing the boundaries of what is possible and practical in a dynamic language; checking in LOAD_NAME whether x should have been a local would be possible, but it would cost precious execution time for a corner case that most developers will never run into.
Now you are confusing the compiler. You used x = x in the class scope, and thus you are setting a local from a name outside of the scope. The compiler finds x is a local here (you assign to it), so it never considers that it could also be a scoped name. The compiler uses LOAD_NAME for all references to x in this scope, because this is not an optimized function body.
When executing the class definition, x = x first requires you to look up x, so it uses LOAD_NAME to do so. No local x is defined yet, so LOAD_NAME doesn't find one and falls back to the global x. The resulting value is stored as a local, which happens to be named x as well. print x uses LOAD_NAME again, and now finds the new local x value.
Here you did not confuse the compiler. You are creating a local y, x is not local, so the compiler recognizes it as a scoped name from parent function f2().myfunc(). x is looked up with LOAD_DEREF from the closure, and stored in y.
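Examples 5 and 6 reduce to the following Python 3 sketch. The function and class names follow the answers in this thread, and the string values are illustrative. The only difference between `f1` and `f2` is whether the class body assigns to `x`, and that alone decides which `x` is found.

```python
x = "x in module"


def f1():
    x = "x in f1"

    def myfunc():
        x = "x in myfunc"

        class MyClass:
            x = x  # x is assigned here, so it is looked up by name (LOAD_NAME)
        return MyClass.x

    return myfunc()


def f2():
    x = "x in f2"

    def myfunc():
        x = "x in myfunc"

        class MyClass:
            y = x  # x is not assigned here, so it comes from myfunc's scope
        return MyClass.y

    return myfunc()
```

Calling `f1()` skips both enclosing functions and returns the module-level value. `f2()` returns the value from `myfunc`.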
You could see the confusion between 5 and 6 as a bug, albeit one that is not worth fixing in my opinion. It was certainly filed as such, see issue 532860 in the Python bug tracker, it has been there for over 10 years now.
The compiler could check for a scoped name x even when x is also a local, for that first assignment in example 5. Or LOAD_NAME could check whether the name is really meant to be a local and throw an UnboundLocalError if no local was found, at the expense of some performance. Had this been in a function scope, LOAD_FAST would have been used for example 5, and an UnboundLocalError would be thrown immediately.
However, as the referenced bug shows, for historical reasons the behaviour is retained. There probably is code out there today that'll break were this bug fixed.
In short, the difference between example 5 and example 6 is that in example 5 the variable x is also assigned to in the same scope, while in example 6 it is not. This triggers a difference that can only be explained by historical reasons.
This raises UnboundLocalError:
x = "foo"
def f():
    print x
    x = 5
f()
instead of printing "foo". It makes a bit of sense, even if it seems strange at first: the function f() defines the variable x locally, even if it is after the print, and so any reference to x in the same function must be to that local variable. At least it makes sense in that it avoids strange surprises if you have accidentally reused the name of a global variable locally and are trying to use both the global and the local variable. This is a good idea because it means that we can statically know, just by looking at a variable, which variable it means. For example, we know that print x refers to the local variable (and thus may raise UnboundLocalError) here:
x = "foo"
def f():
    if some_condition:
        x = 42
    print x
f()
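The compile-time decision can be observed directly. In this Python 3 sketch, `f.__code__.co_varnames` is the tuple of local names the compiler recorded for `f`. It already contains `x` before `f()` is ever called. Calling `f()` then raises `UnboundLocalError` instead of printing `"foo"`.

```python
x = "foo"


def f():
    print(x)  # compiled as a read of the *local* x because of the line below
    x = 5


# The compiler has already classified x as local; nothing has run yet.
x_is_local = "x" in f.__code__.co_varnames

try:
    f()
except UnboundLocalError as exc:
    error = exc
```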
Now, this rule doesn't work for class-level scopes: there, we want expressions like x = x to work, capturing the global variable x into the class-level scope. This means that class-level scopes don't follow the basic rule above: we can't know if x in this scope refers to some outer variable or to the locally-defined x --- for example:
class X:
    x = x # we want to read the global x and assign it locally
    bar = x # but here we want to read the local x of the previous line
class Y:
    if some_condition:
        x = 42
    print x # may refer to either the local x, or some global x
class Z:
    for i in range(2):
        print x # prints the global x the 1st time, and 42 the 2nd time
        x = 42
So in class scopes, a different rule is used: where it would normally raise UnboundLocalError --- and only in that case --- it instead looks up in the module globals. That's all: it doesn't follow the chain of nested scopes.
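The following Python 3 version of the `X` and `Z` cases shows the class-scope rule in action. The `seen` list and the `.upper()` call are additions for illustration.

```python
x = "global x"
seen = []


class X:
    x = x.upper()  # reads the global x, then binds a class-level x
    bar = x        # now reads the class-level x


class Z:
    for i in range(2):
        seen.append(x)  # 1st pass: no class-level x yet, so the global is used
        x = 42          # from the 2nd pass on, the class-level x is found
```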
Why not? I actually doubt there is a better explanation than "for historical reasons". In more technical terms, it could consider that the variable x is both locally defined in the class scope (because it is assigned to) and should be passed in from the parent scope as a lexically nested variable (because it is read). It would be possible to implement it by using a different bytecode than LOAD_NAME that looks up in the local scope, and falls back to using the nested scope's reference if not found.
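The following pure-Python model makes the two lookup orders explicit. Neither function is CPython code. `load_name` is a simplified stand-in for the `LOAD_NAME` behaviour described above. `proposed_lookup` models the hypothetical bytecode suggested in this answer. Plain dicts stand in for namespaces and closure cells.

```python
import builtins


def load_name(name, class_ns, module_globals):
    """Simplified model of LOAD_NAME in a class body: class namespace,
    then module globals, then builtins. Enclosing functions are skipped."""
    for namespace in (class_ns, module_globals, vars(builtins)):
        if name in namespace:
            return namespace[name]
    raise NameError(name)


def proposed_lookup(name, class_ns, enclosing_vars, module_globals):
    """Hypothetical lookup order: class namespace, then the enclosing
    function's variables, then globals and builtins."""
    if name in class_ns:
        return class_ns[name]
    if name in enclosing_vars:
        return enclosing_vars[name]
    return load_name(name, {}, module_globals)


# State at the moment `x = x` starts executing inside MyClass (example 5).
class_ns = {}
enclosing_vars = {"x": "x in myfunc"}
module_globals = {"x": "x in module"}

current = load_name("x", class_ns, module_globals)
proposed = proposed_lookup("x", class_ns, enclosing_vars, module_globals)
```

With the proposed order, `x = x` would pick up the enclosing function's `x`. Code that currently relies on reaching the global would change meaning under that order.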
EDIT: thanks wilberforce for the reference to http://bugs.python.org/issue532860. We may have a chance to get some discussion reactivated with the proposed new bytecode, if we feel that it should be fixed after all (the bug report considers killing support for x = x but was closed for fear of breaking too much existing code; instead what I'm suggesting here would be to make x = x work in more cases). Or I may be missing another fine point...
EDIT2: it seems that CPython did precisely that in the current 3.4 trunk: http://bugs.python.org/issue17853 ... or not? They introduced the bytecode for a slightly different reason and don't use it systematically...
Long story short, this is a corner case of Python's scoping that is a bit inconsistent, but has to be kept for backwards compatibility (and because it's not that clear what the right answer should be). You can see lots of the original discussion about it on the Python mailing list when PEP 227 was being implemented, and some in the bug for which this behaviour is the fix.
We can work out why there's a difference using the dis module, which lets us look inside code objects to see the bytecode a piece of code has been compiled to. I'm on Python 2.6, so the details of this might be slightly different - but I see the same behaviour, so I think it's probably close enough to 2.7.
The code that initialises each nested MyClass lives in a code object that you can get to via the attributes of the top-level functions. (I'm renaming the functions from example 5 and example 6 to f1 and f2 respectively.)
The code object has a co_consts tuple, which contains the myfunc code object, which in turn has the code that runs when MyClass gets created:
In [20]: f1.func_code.co_consts
Out[20]: (None,
'x in f2',
<code object myfunc at 0x1773e40, file "<ipython-input-3-6d9550a9ea41>", line 4>)
In [21]: myfunc1_code = f1.func_code.co_consts[2]
In [22]: MyClass1_code = myfunc1_code.co_consts[3]
In [23]: myfunc2_code = f2.func_code.co_consts[2]
In [24]: MyClass2_code = myfunc2_code.co_consts[3]
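The indices used above (`co_consts[2]`, `co_consts[3]`) depend on the exact compiler version, and Python 3 renamed `func_code` to `__code__`. This sketch searches `co_consts` by name instead; the `find_code` helper is written for this example. It also shows the compile-time difference in a second way. Only in the `f2` variant does the class body's code list `x` in `co_freevars`, with `myfunc` keeping it in a closure cell (`co_cellvars`).

```python
import types


def f1():
    def myfunc():
        x = "x in myfunc"

        class MyClass:
            x = x
        return MyClass

    return myfunc()


def f2():
    def myfunc():
        x = "x in myfunc"

        class MyClass:
            y = x
        return MyClass

    return myfunc()


def find_code(code, name):
    """Depth-first search through co_consts for a nested code object by name."""
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            if const.co_name == name:
                return const
            found = find_code(const, name)
            if found is not None:
                return found
    return None


myfunc1_code = find_code(f1.__code__, "myfunc")
MyClass1_code = find_code(f1.__code__, "MyClass")
myfunc2_code = find_code(f2.__code__, "myfunc")
MyClass2_code = find_code(f2.__code__, "MyClass")
```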
Then you can see the difference between them in bytecode using dis.dis:
In [25]: from dis import dis
In [26]: dis(MyClass1_code)
6 0 LOAD_NAME 0 (__name__)
3 STORE_NAME 1 (__module__)
7 6 LOAD_NAME 2 (x)
9 STORE_NAME 2 (x)
8 12 LOAD_NAME 2 (x)
15 PRINT_ITEM
16 PRINT_NEWLINE
17 LOAD_LOCALS
18 RETURN_VALUE
In [27]: dis(MyClass2_code)
6 0 LOAD_NAME 0 (__name__)
3 STORE_NAME 1 (__module__)
7 6 LOAD_DEREF 0 (x)
9 STORE_NAME 2 (y)
8 12 LOAD_NAME 2 (y)
15 PRINT_ITEM
16 PRINT_NEWLINE
17 LOAD_LOCALS
18 RETURN_VALUE
So the only difference is that in MyClass1, x is loaded using the LOAD_NAME op, while in MyClass2, it's loaded using LOAD_DEREF. LOAD_DEREF looks up a name in an enclosing scope, so it gets 'x in myfunc'. LOAD_NAME doesn't follow nested scopes - since it can't see the x names bound in myfunc or f1, it gets the module-level binding.
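On current CPython the same comparison can be made programmatically. The helpers `find_code` and `load_ops_for` are written for this example. The `x = x` class body still uses `LOAD_NAME`. The `y = x` body, however, no longer shows plain `LOAD_DEREF`. CPython 3.4 added `LOAD_CLASSDEREF` for free variables read in class bodies, and 3.12 replaced it with `LOAD_FROM_DICT_OR_DEREF`. For that reason the check below only looks for an opcode ending in `DEREF`.

```python
import dis
import types


def f1():
    def myfunc():
        x = "x in myfunc"

        class MyClass:
            x = x
        return MyClass

    return myfunc()


def f2():
    def myfunc():
        x = "x in myfunc"

        class MyClass:
            y = x
        return MyClass

    return myfunc()


def find_code(code, name):
    """Depth-first search through co_consts for a nested code object by name."""
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            if const.co_name == name:
                return const
            found = find_code(const, name)
            if found is not None:
                return found
    return None


def load_ops_for(code, name):
    """LOAD_* opcodes that read `name` directly in this code object."""
    return [ins.opname for ins in dis.get_instructions(code)
            if ins.opname.startswith("LOAD") and ins.argval == name]


class1_ops = load_ops_for(find_code(f1.__code__, "MyClass"), "x")
class2_ops = load_ops_for(find_code(f2.__code__, "MyClass"), "x")
```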
Then the question is, why does the code of the two versions of MyClass get compiled to two different opcodes? In f1 the binding is shadowing x in the class scope, while in f2 it's binding a new name. If the MyClass scopes were nested functions instead of classes, the y = x line in f2 would be compiled the same, but the x = x in f1 would be a LOAD_FAST - this is because the compiler would know that x is bound in the function, so it should use the LOAD_FAST to retrieve a local variable. This would fail with an UnboundLocalError when it was called.
In [28]: x = 'x in module'
def f3():
    x = 'x in f2'
    def myfunc():
        x = 'x in myfunc'
        def MyFunc():
            x = x
            print x
        return MyFunc()
    myfunc()
f3()
---------------------------------------------------------------------------
Traceback (most recent call last)
<ipython-input-29-9f04105d64cc> in <module>()
9 return MyFunc()
10 myfunc()
---> 11 f3()
<ipython-input-29-9f04105d64cc> in f3()
8 print x
9 return MyFunc()
---> 10 myfunc()
11 f3()
<ipython-input-29-9f04105d64cc> in myfunc()
7 x = x
8 print x
----> 9 return MyFunc()
10 myfunc()
11 f3()
<ipython-input-29-9f04105d64cc> in MyFunc()
5 x = 'x in myfunc'
6 def MyFunc():
----> 7 x = x
8 print x
9 return MyFunc()
UnboundLocalError: local variable 'x' referenced before assignment
This fails because the MyFunc function then uses LOAD_FAST:
In [31]: myfunc_code = f3.func_code.co_consts[2]
MyFunc_code = myfunc_code.co_consts[2]
In [33]: dis(MyFunc_code)
7 0 LOAD_FAST 0 (x)
3 STORE_FAST 0 (x)
8 6 LOAD_FAST 0 (x)
9 PRINT_ITEM
10 PRINT_NEWLINE
11 LOAD_CONST 0 (None)
14 RETURN_VALUE
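The function version can be checked the same way on Python 3. The opcode name varies by release; for example, 3.12+ emits `LOAD_FAST_CHECK` for a local that may be unbound. This sketch therefore only checks for the `LOAD_FAST` prefix.

```python
import dis


def f3():
    def myfunc():
        x = "x in myfunc"

        def MyFunc():
            x = x  # x is local to MyFunc, so this reads an unbound local
            return x
        return MyFunc

    return myfunc()


MyFunc = f3()
load_ops = [ins.opname for ins in dis.get_instructions(MyFunc)
            if ins.opname.startswith("LOAD") and ins.argval == "x"]

try:
    MyFunc()
except UnboundLocalError as exc:
    error = exc
```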
(As an aside, it's not a big surprise that scoping interacts differently with code in a class body than with code in a function. You can tell this because bindings at the class level aren't available in methods: method scopes aren't nested inside the class scope the way nested functions are. You have to reach them explicitly via the class, or via self (which will fall back to the class if there's no instance-level binding).)
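This short Python 3 sketch illustrates that aside; the `Config` class is invented for illustration. A bare name inside a method skips the class scope entirely. `self.step` and `Config.step` both reach the class attribute.

```python
class Config:
    step = 10

    def via_bare_name(self):
        return step         # method bodies skip the class scope: NameError

    def via_self(self):
        return self.step    # instance lookup falls back to the class attribute

    def via_class(self):
        return Config.step  # explicit reference through the class


cfg = Config()
try:
    cfg.via_bare_name()
except NameError as exc:
    bare_error = exc
```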

Python: Accessing another class of current module by name

I have the following setup in one module:
class A(object):
    pass  # stuff
class B(object):
    pass  # stuff
Now what I want to do is create an instance of class A by name (I only have the class name as a string) inside of B. How can I do this without using the globals() function?
Why not just use A? Or do you only have the string 'A'? If so, globals()['A'] is the way to go. The alternative would be getattr(sys.modules[__name__], 'A'), but globals() is clearly more appropriate.
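If only the string is available, here is a minimal sketch of the `globals()` approach. The `create` method and the `LookupError` check are illustrative additions, not part of the question.

```python
class A(object):
    def __init__(self, value=None):
        self.value = value


class B(object):
    def create(self, class_name, *args, **kwargs):
        # globals() is the namespace of the module this method was defined in.
        cls = globals().get(class_name)
        if not isinstance(cls, type):
            raise LookupError("no class named %r in this module" % (class_name,))
        return cls(*args, **kwargs)
```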
>>> dis.dis(lambda: getattr(sys.modules[__name__], 'Foo'))
1 0 LOAD_GLOBAL 0 (getattr)
3 LOAD_GLOBAL 1 (sys)
6 LOAD_ATTR 2 (modules)
9 LOAD_GLOBAL 3 (__name__)
12 BINARY_SUBSCR
13 LOAD_CONST 1 ('Foo')
16 CALL_FUNCTION 2
19 RETURN_VALUE
>>> dis.dis(lambda: globals()['Foo'])
1 0 LOAD_GLOBAL 0 (globals)
3 CALL_FUNCTION 0
6 LOAD_CONST 1 ('Foo')
9 BINARY_SUBSCR
10 RETURN_VALUE
>>> dis.dis(lambda: Foo)
1 0 LOAD_GLOBAL 0 (Foo)
3 RETURN_VALUE
So by just looking at the instructions used for the various ways to access Foo, using globals() is most likely faster than going through sys.modules.
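The bytecode listing suggests a ranking but doesn't measure it. A rough way to check on your own interpreter is `timeit`. This sketch registers a throwaway module (the name `lookup_demo` is hypothetical) so that `globals()` and `__name__` behave the same however the snippet is run. Results vary by machine and Python version, so no expected numbers are given.

```python
import sys
import timeit
import types

# Throwaway module so the comparison doesn't depend on how this file is run.
demo = types.ModuleType("lookup_demo")
exec("import sys\nclass Foo(object):\n    pass\n", demo.__dict__)
sys.modules["lookup_demo"] = demo

statements = [
    "globals()['Foo']",
    "getattr(sys.modules[__name__], 'Foo')",
    "Foo",
]

# Each statement runs with demo's namespace as its globals, so globals()
# and __name__ refer to the throwaway module.
timings = {
    stmt: timeit.timeit(stmt, globals=demo.__dict__, number=100000)
    for stmt in statements
}
del sys.modules["lookup_demo"]

for stmt, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("%-40s %.4f s" % (stmt, seconds))
```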
Let me see if I understand you right:
You have a settings file that looks something like
...
connection-type: FooConnection
...
You have a bunch of classes
class FooConnection(Connection): ...
class BarConnection(Connection): ...
class BazConnection(Connection): ...
You want to map "FooConnection" from the settings file to the class FooConnection.
If so, I would do this instead:
Put
connection-type: Foo
in the settings file, or some other human-readable name that doesn't depend on the name of the class.
Write a mapping from human-readable names to implementations:
implementations = {
"Foo": FooConnection,
"Bar": BarConnection,
"Baz": BazConnection
}
You can change this mapping if you want to change e.g. how you implement the classes. This also lets you have synonyms.
Look up the value in the settings file in the implementations dictionary to get the class you want.
In fact, you're already doing this; it's just that instead of explicitly writing down the mapping of strings to classes, you're using the globals dictionary. In other words, you're assuming that the end user knows the class names you want to use. That's not nice.
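Here is a minimal Python 3 sketch of that mapping approach. The `key: value` line parsing is an assumption about the settings format, and `connection_class_from_settings` is a hypothetical helper name. An unknown value fails loudly instead of silently resolving to whatever global happens to share the name.

```python
class Connection(object):
    pass


class FooConnection(Connection):
    pass


class BarConnection(Connection):
    pass


class BazConnection(Connection):
    pass


# Human-readable names from the settings file -> implementation classes.
implementations = {
    "Foo": FooConnection,
    "Bar": BarConnection,
    "Baz": BazConnection,
}


def connection_class_from_settings(text):
    """Return the class named by a (hypothetical) 'connection-type: ...' line."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "connection-type":
            name = value.strip()
            if name not in implementations:
                raise ValueError("unknown connection-type %r (expected one of %s)"
                                 % (name, ", ".join(sorted(implementations))))
            return implementations[name]
    raise ValueError("settings contain no connection-type line")
```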
It's not clear to me what you mean by "accessing on class A by name", but there are usually three main approaches, depending on what you actually want to do.
You create an instance of class A inside class B's __init__
You inherit from class A in class B
What @ThiefMaster posted.
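The first two approaches fit in a few lines of Python; the class and method names are hypothetical. The third approach, looking the class up by name, is covered by the `globals()` answer above.

```python
class A(object):
    def greet(self):
        return "hello from A"


class ComposedB(object):
    """Approach 1: B creates and holds its own A instance in __init__."""

    def __init__(self):
        self.a = A()

    def greet(self):
        return self.a.greet()


class InheritedB(A):
    """Approach 2: B inherits A's behaviour directly."""
```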
