I'm trying to set up a simple test example of setattr() in Python, but it fails to assign a new value to the member.
class Foo(object):
    __bar = 0

    def modify_bar(self):
        print(self.__bar)
        setattr(self, "__bar", 1)
        print(self.__bar)
Here I tried variable assignment with setattr(self, "__bar", 1), but was unsuccessful:
>>> foo = Foo()
>>> foo.modify_bar()
0
0
Can someone explain what is happening under the hood. I'm new to python, so please forgive my elementary question.
A leading double underscore invokes Python name mangling.
So:
class Foo(object):
    __bar = 0  # actually `_Foo__bar`

    def modify_bar(self):
        print(self.__bar)  # actually self._Foo__bar
        setattr(self, "__bar", 1)
        print(self.__bar)  # actually self._Foo__bar
Name mangling only applies to identifiers, not strings, which is why the __bar in the setattr function call is unaffected.
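To see this concretely, here is a minimal sketch building on the Foo class above, showing where each spelling of the name actually lands:

foo = Foo()
foo.modify_bar()              # prints 0 twice; setattr stored a literal
                              # "__bar" attribute on the instance
print(getattr(foo, "__bar"))  # 1 -- the value went somewhere, just not
                              #      where the class body's __bar lives
setattr(foo, "_Foo__bar", 2)  # the mangled name reaches the real attribute
print(foo._Foo__bar)          # 2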
class Foo(object):
    _bar = 0

    def modify_bar(self):
        print(self._bar)
        setattr(self, "_bar", 1)
        print(self._bar)
should work as expected.
Leading double underscores are not used very frequently in most Python code (their use is typically discouraged). There are a few valid use cases (mainly avoiding name clashes when subclassing, as sketched below), but those are rare enough that name mangling is generally avoided in the wild.
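A minimal sketch of that subclassing use case (Base and Child are illustrative names, not from the question): each class gets its own private copy of the attribute, so a subclass cannot accidentally clobber its parent's.

class Base(object):
    def __init__(self):
        self.__token = "base"    # stored as _Base__token

class Child(Base):
    def __init__(self):
        Base.__init__(self)
        self.__token = "child"   # stored as _Child__token: no clash

c = Child()
print(c._Base__token)   # base
print(c._Child__token)  # child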
I have run across a few examples of Python code that looks something like this:
class GiveNext:
    list = ''
    def __init__(self, list):
        GiveNext.list = list
    def giveNext(self, i):
        retval = GiveNext.list[i]
        return retval

class GiveABCs(GiveNext):
    i = -1
    def _init__(self, list):
        GiveNext.__init__(self, list)
    def giveNext(self):
        GiveABCs.i += 1
        return GiveNext.giveNext(self, GiveABCs.i)

class Give123s(GiveNext):
    i = -1
    def _init__(self, list):
        GiveNext.__init__(self, list)
    def giveNext(self):
        Give123s.i += 1
        return GiveNext.giveNext(self, Give123s.i)

for i in range(3):
    print(GiveABCs('ABCDEFG').giveNext())
    print(Give123s('12345').giveNext())
the output is: A 1 B 2 C 3
If I were more clever, I could figure out how to put the string literals inside the constructor...but that is not crucial right now.
My question is about the use of classes this way. Yes, an instance of the class gets created each time the call within print() is made. Yet the i's are 'permanent' in each class.
This strikes me as less of an object-oriented approach and more a way of using classes to accomplish encapsulation and/or a functional programming paradigm, since the instances are entirely transitory. In other words, an instance of the class is never instantiated for its own purposes; it exists only to allow access to the class-wide methods and variables within, and then it is tossed away. In many cases, it seems like the class mechanism is used in a back-handed way in order to leverage inheritance and name resolution/namespacing: an instance of the class is never really required to be built or used, conceptually.
Is this standard Python form?
Bonus question: how would I put the string literals inside each class declaration? Right now, even if I change the _init__ for GiveABCs to
GiveNext.__init__(self, 'wxyz')
it completely ignores the 'wxyz' literal and uses the 'ABCDEFG' one - even though it is never mentioned...
Please don't learn Python with this code. As mentioned by others, this code goes against many Python principles.
One example: list is a Python builtin type. Don't overwrite it, especially not with a string instance!
The code also mixes class and instance variables and doesn't use super() in subclasses.
This code tries to simulate an iterator. So simply use an iterator:
give_abcs = iter('ABCDEFG')
give_123s = iter('12345')
for _ in range(3):
    print(next(give_abcs))
    print(next(give_123s))
# A
# 1
# B
# 2
# C
# 3
If you really want to fix the above code, you could use:
class GiveNext:
    def __init__(self, iterable):
        self.i = -1
        self.iterable = iterable

    def giveNext(self):
        self.i += 1
        return self.iterable[self.i]
giveABCs = GiveNext('ABCDEFG')
give123s = GiveNext('12345')
for _ in range(3):
    print(giveABCs.giveNext())
    print(give123s.giveNext())
It outputs:
A
1
B
2
C
3
The code in the OP is an incredible amount of crap: not only is it long, unreadable, and a misuse of OO features, it also ignores what Python provides out of the box (an iterator being a standard Python feature). Here is a suggestion for a more Pythonic approach:
giveABCs = iter('ABCDEFG')
give123s = iter('12345')
for i in range(3):
    print(next(giveABCs))
    print(next(give123s))
About your bonus question: I guess you are modifying the _init__() method of GiveABCs and Give123s. It is normal that whatever code you put in there has no effect, because the Python constructor is __init__() (with two leading underscores, not one). So the constructor from GiveNext is never overridden; see the sketch below.
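For illustration, one possible fix for the bonus question that keeps the OP's structure (it assumes the GiveNext class from the question is in scope): spell the constructor __init__ so it really overrides the parent's, and move the literal inside the subclass.

class GiveABCs(GiveNext):
    i = -1
    def __init__(self):                     # two leading underscores
        GiveNext.__init__(self, 'ABCDEFG')  # literal lives in the subclass
    def giveNext(self):
        GiveABCs.i += 1
        return GiveNext.giveNext(self, GiveABCs.i)

print(GiveABCs().giveNext())  # A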
This is a bit of a silly thing, but I want to know if there is concise way in Python to define class variables that contain string representations of their own names. For example, one can define:
class foo(object):
    bar = 'bar'
    baz = 'baz'
    baf = 'baf'
Probably a more concise way to write it in terms of lines consumed is:
class foo(object):
    bar, baz, baf = 'bar', 'baz', 'baf'
Even there, though, I still have to type each identifier twice, once on each side of the assignment, and the opportunity for typos is rife.
What I want is something like what sympy provides in its var method:
sympy.var('a,b,c')
The above injects into the namespace the variables a, b, and c, defined as the corresponding sympy symbolic variables.
Is there something comparable that would do this for plain strings?
class foo(object):
    [nifty thing]('bar', 'baz', 'baf')
EDIT: To note, I want to be able to access these as separate identifiers in code that uses foo:
>>> f = foo(); print(f.bar)
bar
ADDENDUM: Given the interest in the question, I thought I'd provide more context on why I want to do this. I have two use-cases at present: (1) typecodes for a set of custom exceptions (each Exception subclass has a distinct typecode set); and (2) lightweight enum. My desired feature set is:
Only having to type the typecode / enum name (or value) once in the source definition. class foo(object): bar = 'bar' works fine but means I have to type it out twice in-source, which gets annoying for longer names and exposes a typo risk.
Valid typecodes / enum values exposed for IDE autocomplete.
Values stored internally as comprehensible strings:
For the Exception subclasses, I want to be able to define myError.__str__ as just something like return self.typecode + ": " + self.message + " (" + self.source + ")", without having to do a whole lot of dict-fu to back-reference an int value of self.typecode to a comprehensible and meaningful string.
For the enums, I want to just be able to obtain widget as output from e = myEnum.widget; print(e), again without a lot of dict-fu.
I recognize this will increase overhead. My application is not speed-sensitive (GUI-based tool for driving a separate program), so I don't think this will matter at all.
Straightforward membership testing, by also including (say) a frozenset containing all of the typecode / enum string values on the classes, as myError.typecodes / myEnum.E. This addresses potential problems from accidental (or intentional... but why?!) use of an invalid typecode / enum string via simple sanity checks like if enumVal not in myEnum.E: raise ValueError('Invalid enum value: ' + str(enumVal)).
Ability to import individual enum / exception subclasses via, say, from errmodule import squirrelerror, to avoid cluttering the namespace of the usage environment with non-relevant exception subclasses. I believe this prohibits any solutions requiring post-twiddling on the module level like what Sinux proposed.
For the enum use case, I would rather avoid introducing an additional package dependency since I don't (think I) care about any extra functionality available in the official enum class. In any event, it still wouldn't resolve #1.
I've already figured out implementation I'm satisfied with for all of the above but #1. My interest in a solution to #1 (without breaking the others) is partly a desire to typo-proof entry of the typecode / enum values into source, and partly plain ol' laziness. (Says the guy who just typed up a gigantic SO question on the topic.)
I recommend using collections.namedtuple:
Example:
>>> from collections import namedtuple as nifty_thing
>>> Data = nifty_thing("Data", ["foo", "bar", "baz"])
>>> data = Data(foo=1, bar=2, baz=3)
>>> data.foo
1
>>> data.bar
2
>>> data.baz
3
Side Note: If you are using/on Python 3.x I'd recommend Enum as per #user2357112's comment. This is the standardized approach going forward for Python 3+
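For instance, a hedged sketch of the Enum route on Python 3.6+, using the documented _generate_next_value_ hook so each name is typed only once (MyEnum and its members are illustrative names):

from enum import Enum, auto

class MyEnum(Enum):
    # documented hook: make each member's value equal to its name
    def _generate_next_value_(name, start, count, last_values):
        return name
    widget = auto()
    gadget = auto()

print(MyEnum.widget.value)             # 'widget'
print('widget' in MyEnum.__members__)  # True -- membership test for free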
Update: Okay so if I understand the OP's exact requirement(s) here I think the only way to do this (and presumably sympy does this too) is to inject the names/variables into the globals() or locals() namespaces. Example:
#!/usr/bin/env python

def nifty_thing(*names):
    d = globals()
    for name in names:
        d[name] = None

nifty_thing("foo", "bar", "baz")
print foo, bar, baz
Output:
$ python foo.py
None None None
NB: I don't really recommend this! :)
Update #2: The other example you showed in your question is implemented like this:
#!/usr/bin/env python
import sys

def nifty_thing(*names):
    frame = sys._getframe(1)
    locals = frame.f_locals
    for name in names:
        locals[name] = None

class foo(object):
    nifty_thing("foo", "bar", "baz")

f = foo()
print f.foo, f.bar, f.baz
Output:
$ python foo.py
None None None
NB: This is inspired by zope.interface.implements().
current_list = ['bar', 'baz', 'baf']

class foo(object):
    """to be added"""

for i in current_list:
    setattr(foo, i, i)
then run this:
>>> f = foo()
>>> print(f.bar)
bar
>>> print(f.baz)
baz
This doesn't work exactly like what you asked for, but it seems like it should do the job:
class AutoNamespace(object):
    def __init__(self, names):
        try:
            # Support space-separated name strings
            names = names.split()
        except AttributeError:
            pass
        for name in names:
            setattr(self, name, name)
Demo:
>>> x = AutoNamespace('a b c')
>>> x.a
'a'
If you want to do what SymPy does with var, you can, but I would strongly recommend against it. That said, here's a function based on the source code of sympy.var:
def var(names):
    from inspect import currentframe
    frame = currentframe().f_back
    try:
        names = names.split()
    except AttributeError:
        pass
    for name in names:
        frame.f_globals[name] = name
Demo:
>>> var('foo bar baz')
>>> bar
'bar'
It'll always create global variables, even if you call it from inside a function or class. inspect is used to get at the caller's globals, whereas globals() would get var's own globals.
How about defining __getitem__ so the class simply hands back whatever name you ask for:
class foo(object):
    def __getitem__(self, item):
        return item

foo = foo()
print foo['test']
Here's an extension of bman's idea. This has its advantages and disadvantages, but at least it does work with some autocompleters.
class FooMeta(type):
    def __getattr__(self, attr):
        return attr

    def __dir__(self):
        return ['bar', 'baz', 'baf']

class foo:
    __metaclass__ = FooMeta
This allows access like foo.xxx → 'xxx' for all xxx, but also guides autocomplete through __dir__.
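Note the snippet above uses Python 2 metaclass syntax; Python 3 ignores the __metaclass__ attribute. A minimal sketch of the same idea with Python 3 syntax, reusing FooMeta from above:

class foo(metaclass=FooMeta):
    pass

print(foo.bar)   # 'bar'
print(dir(foo))  # ['baf', 'bar', 'baz']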
Figured out what I was looking for:
>>> class tester:
...     E = frozenset(['this', 'that', 'the', 'other'])
...     for s in E:
...         exec(str(s) + "='" + str(s) + "'")  # <--- THIS
...
>>> tester()
<__main__.tester instance at 0x03018BE8>
>>> t = tester()
>>> t.this
'this'
>>> t.that in tester.E
True
Only have to define the element strings once, and I'm pretty sure it will work for all of my requirements listed in the question. In actual implementation, I plan to encapsulate the str(s) + "='" + str(s) + "'" in a helper function, so that I can just call exec(helper(s)) in the for loop. (I'm pretty sure that the exec has to be placed in the body of the class, not in the helper function, or else the new variables would be injected into the (transitory) scope of the helper function, not that of the class.)
EDIT: Upon detailed testing, this DOES NOT WORK -- the use of exec prevents the introspection of the IDE from knowing of the existence of the created variables.
I think you can achieve a rather beautiful solution using metaclasses, but I'm not fluent enough in using those to present that as an answer, but I do have an option which seems to work rather nicely:
def new_enum(name, *class_members):
    """Builds a class <name> with <class_members> having the name as value."""
    return type(name, (object,), {val: val for val in class_members})
Foo = new_enum('Foo', 'bar', 'baz', 'baf')
This should recreate the class you've given as an example, and if you want you can change the inheritance by changing the second parameter of the call to type(name, bases, dict).
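A quick usage check of the generated class (assuming the definitions above):

print(Foo.bar)             # 'bar'
print('baz' in vars(Foo))  # True -- a simple membership test
f = Foo()
print(f.baf)               # 'baf' -- instances see the class attributes too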
Please consider the following code:
import re

def qcharToUnicode(s):
    p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")
    return p.sub(lambda m: '"' + chr(int(m.group(1), 16)) + '"', s)

def fixSurrogatePresence(s):
    '''Returns the input UTF-16 string with surrogate pairs replaced by the characters they represent'''
    # ideas from:
    # http://www.unicode.org/faq/utf_bom.html#utf16-4
    # http://stackoverflow.com/a/6928284/1503120
    def joinSurrogates(match):
        SURROGATE_OFFSET = 0x10000 - (0xD800 << 10) - 0xDC00
        return chr((ord(match.group(1)) << 10) + ord(match.group(2)) + SURROGATE_OFFSET)
    return re.sub('([\uD800-\uDBFF])([\uDC00-\uDFFF])', joinSurrogates, s)
Now my questions below probably reflect a C/C++ way of thinking (and not a "Pythonic" one) but I'm curious nevertheless:
I'd like to know whether the evaluation of the compiled RE object p in qcharToUnicode and of SURROGATE_OFFSET in joinSurrogates takes place at each call to the respective functions, or only once, at the point of definition. I mean, in C/C++ one can declare the values as static const and the compiler will (IIUC) make the construction occur only once, but in Python we have no such declarations.
The question is more pertinent in the case of the compiled RE object, since it seems that the only reason to construct such an object is to avoid the repeated compilation, as the Python RE HOWTO says:
Should you use these module-level functions, or should you get the pattern and call its methods yourself? If you're accessing a regex within a loop, pre-compiling it will save a few function calls.
... and this purpose would be defeated if the compilation were to occur at each function call. I don't want to put the symbol p (or SURROGATE_OFFSET) at module level since I want to restrict its visibility to the relevant function only.
So does the interpreter do something like heuristically determining that the value bound to a particular symbol is constant (and visible within a particular function only) and hence need not be reconstructed on the next call? Further, is this defined by the language or implementation-dependent? (I hope I'm not asking too much!)
A related question concerns the construction of the lambda function object in qcharToUnicode: is it also created only once, like named function objects declared by def?
The simple answer is that as written, the code will be executed repeatedly at every function call. There is no implicit caching mechanism in Python for the case you describe.
You should get out of the habit of talking about "declarations". A function definition is in fact also "just" a normal statement, so I can write a loop which defines the same function repeatedly:
for i in range(10):
    def f(x):
        return x * 2
    y = f(i)
Here, we incur the cost of creating the function on every run through the loop. Timing reveals that the following code, which hoists the definition out of the loop, runs in about 75% of the time of the previous code:
def f(x):
    return x * 2

for i in range(10):
    y = f(i)
The standard way of optimising the RE case is, as you already know, to place the p variable in the module scope, i.e.:
p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")

def qcharToUnicode(s):
    return p.sub(lambda m: '"' + chr(int(m.group(1), 16)) + '"', s)
You can use conventions like prepending "_" to the variable to indicate it is not supposed to be used, but normally people won't use it if you haven't documented it. A trick to make the RE function-local is to exploit a property of default parameters: they are evaluated once, when the function is defined, so you can do this:
def qcharToUnicode(s, p=re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")):
    return p.sub(lambda m: '"' + chr(int(m.group(1), 16)) + '"', s)
This will allow you the same optimisation but also a little more flexibility in your matching function.
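To make the cost difference concrete, here is a hedged timeit sketch (numbers are machine-dependent, and the re module's internal pattern cache already softens the recompilation penalty):

import re
import timeit

def recompiles(s):
    p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")  # runs on every call
    return p.sub(lambda m: '"' + chr(int(m.group(1), 16)) + '"', s)

def compiled_once(s, p=re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")):
    return p.sub(lambda m: '"' + chr(int(m.group(1), 16)) + '"', s)

text = "QChar(0x41) then QChar(0x42)"
print(timeit.timeit(lambda: recompiles(text), number=10000))
print(timeit.timeit(lambda: compiled_once(text), number=10000))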
Thinking properly about function definitions also allows you to stop thinking about lambda as different from def. The only difference is that def also binds the function object to a name - the underlying object created is the same.
Python is an interpreted language, so yes, the assignment will be made every time you call the function. The interpreter parses your code only once, generating Python bytecode; on subsequent calls the function is already compiled to Python VM bytecode, so it is simply executed.
re.compile will be called every time, as it would be in other languages. If you want to mimic a static initialization, consider using a global variable, so it is called only once. Better, you can create a class with static methods and static members (class members, not instance members); a sketch follows.
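A sketch of that class-based variant (QCharFixer is an illustrative name, not from the OP): the pattern is compiled exactly once, when the class body executes.

import re

class QCharFixer(object):
    PATTERN = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")  # evaluated once

    @staticmethod
    def fix(s):
        return QCharFixer.PATTERN.sub(
            lambda m: '"' + chr(int(m.group(1), 16)) + '"', s)

print(QCharFixer.fix('QChar(0x41)'))  # "A"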
You can check all this using the dis module in Python. So, I just copied and pasted your code in a teste.py module.
>>> import teste
>>> import dis
>>> dis.dis(teste.qcharToUnicode)
4 0 LOAD_GLOBAL 0 (re)
3 LOAD_ATTR 1 (compile)
6 LOAD_CONST 1 ('QChar\\((0x[a-fA-F0-9]*)\\)')
9 CALL_FUNCTION 1
12 STORE_FAST 1 (p)
5 15 LOAD_FAST 1 (p)
18 LOAD_ATTR 2 (sub)
21 LOAD_CONST 2 (<code object <lambda> at 0056C140, file "teste.py", line 5>)
24 MAKE_FUNCTION 0
27 LOAD_FAST 0 (s)
30 CALL_FUNCTION 2
33 RETURN_VALUE
Yes, they are. Suppose re.compile() had a side effect. That side effect would happen every time the assignment to p was made, i.e., every time the function containing said assignment was called.
This can be verified:
def foo():
    print("ahahaha!")
    return bar

def f():
    return foo()

def funcWithSideEffect():
    print("The airspeed velocity of an unladen swallow (european) is...")
    return 25

def funcEnclosingAssignment():
    p = funcWithSideEffect()
    return p

a = funcEnclosingAssignment()
b = funcEnclosingAssignment()
c = funcEnclosingAssignment()
Each time the enclosing function (analogous to your qcharToUnicode) is called, the statement is printed, revealing that p is being re-evaluated.
This message is a bit long, with many examples, but I hope it will help me and others to better grasp the full story of variable and attribute lookup in Python 2.7.
I am using the terms of PEP 227 (http://www.python.org/dev/peps/pep-0227/) for code blocks (such as modules, class definitions, function definitions, etc.) and variable bindings (such as assignments, argument declarations, class and function declarations, for loops, etc.).
I am using the terms variables for names that can be called without a
dot, and attributes for names that need to be qualified with an object
name (such as obj.x for the attribute x of object obj).
There are three scopes in Python for all code blocks except functions:
Local
Global
Builtin
There are four scopes in Python for functions only (according to PEP 227):
Local
Enclosing functions
Global
Builtin
The rules for binding a variable in a block and finding it there are quite simple:
any binding of a variable to an object in a block makes this variable local to this block, unless the variable is declared global (in which case the variable belongs to the global scope)
a reference to a variable is looked up using the LGB rule (local, global, builtin) for all blocks except functions
a reference to a variable is looked up using the LEGB rule (local, enclosing, global, builtin) for functions only.
Let me now take examples validating these rules and showing many special cases. For each example, I will give my understanding. Please correct me if I am wrong. For the last example, I don't understand the outcome.
example 1:
x = "x in module"
class A():
print "A: " + x #x in module
x = "x in class A"
print locals()
class B():
print "B: " + x #x in module
x = "x in class B"
print locals()
def f(self):
print "f: " + x #x in module
self.x = "self.x in f"
print x, self.x
print locals()
>>> A.B().f()
A: x in module
{'x': 'x in class A', '__module__': '__main__'}
B: x in module
{'x': 'x in class B', '__module__': '__main__'}
f: x in module
x in module self.x in f
{'self': <__main__.B instance at 0x00000000026FC9C8>}
There is no nested scope for the classes (rule LGB), and a function in a class cannot access the attributes of the class without using a qualified name (self.x in this example). This is well described in PEP 227.
example 2:
z = "z in module"
def f():
z = "z in f()"
class C():
z = "z in C"
def g(self):
print z
print C.z
C().g()
f()
>>>
z in f()
z in C
Here variables in functions are looked up using the LEGB rule, but if a class is in the path, the class scope is skipped. Here again, this is what PEP 227 explains.
example 3:
var = 0

def func():
    print var
    var = 1
>>> func()
Traceback (most recent call last):
File "<pyshell#102>", line 1, in <module>
func()
File "C:/Users/aa/Desktop/test2.py", line 25, in func
print var
UnboundLocalError: local variable 'var' referenced before assignment
With a dynamic language such as Python, we expect everything to be resolved dynamically. But this is not the case for functions: local variables are determined at compile time. PEP 227 and http://docs.python.org/2.7/reference/executionmodel.html describe this behavior this way:
"If a name binding operation occurs anywhere within a code block, all
uses of the name within the block are treated as references to the
current block."
example 4:
x = "x in module"
class A():
print "A: " + x
x = "x in A"
print "A: " + x
print locals()
del x
print locals()
print "A: " + x
>>>
A: x in module
A: x in A
{'x': 'x in A', '__module__': '__main__'}
{'__module__': '__main__'}
A: x in module
But we see here that this statement in PEP 227, "If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block.", does not hold when the code block is a class. Moreover, for classes, it seems that local name binding is not determined at compile time, but during execution, using the class namespace. In that respect, PEP 227 and the execution model in the Python documentation are misleading, and for some parts wrong.
example 5:
x = 'x in module'

def f2():
    x = 'x in f2'
    def myfunc():
        x = 'x in myfunc'
        class MyClass(object):
            x = x
            print x
        return MyClass
    myfunc()

f2()
>>>
x in module
My understanding of this code is the following. The instruction x = x first looks up the object that the right-hand x refers to. That object is looked up locally in the class, then, following the LGB rule, in the global scope, where the string 'x in module' is found. Then a local attribute x of MyClass is created in the class dictionary and pointed to the string object.
example 6:
Now here is an example I cannot explain.
It is very close to example 5; I am just changing the local MyClass attribute from x to y.
x = 'x in module'

def f2():
    x = 'x in f2'
    def myfunc():
        x = 'x in myfunc'
        class MyClass(object):
            y = x
            print y
        return MyClass
    myfunc()

f2()
>>>
x in myfunc
Why, in that case, is the x reference in MyClass looked up in the innermost function?
In an ideal world, you'd be right and some of the inconsistencies you found would be wrong. However, CPython has optimized some scenarios, specifically function locals. These optimizations, together with how the compiler and evaluation loop interact and historical precedent, lead to the confusion.
Python translates code to bytecodes, and those are then interpreted by an interpreter loop. The 'regular' opcode for accessing a name is LOAD_NAME, which looks up a variable name as you would in a dictionary. LOAD_NAME will first look up a name as a local, and if that fails, looks for a global. LOAD_NAME throws a NameError exception when the name is not found.
For nested scopes, looking up names outside of the current scope is implemented using closures; if a name is not assigned to but is available in a nested (not global) scope, then such values are handled as a closure. This is needed because a parent scope can hold different values for a given name at different times; two calls to a parent function can lead to different closure values. So Python has LOAD_CLOSURE, MAKE_CLOSURE and LOAD_DEREF opcodes for that situation; the first two opcodes are used in loading and creating a closure for a nested scope, and the LOAD_DEREF will load the closed-over value when the nested scope needs it.
Now, LOAD_NAME is relatively slow; it consults two dictionaries, which means it has to hash the key first and run a few equality tests (if the name wasn't interned). If the name isn't local, then it has to do this again for a global. For functions, which can potentially be called tens of thousands of times, this gets tedious fast. So function locals have special opcodes. Loading a local name is implemented by LOAD_FAST, which looks up local variables by index in a special local-names array. This is much faster, but it does require the compiler to first determine whether a name is a local and not a global. To still be able to look up global names, another opcode, LOAD_GLOBAL, is used. The compiler explicitly optimizes for this case to generate the special opcodes. LOAD_FAST throws an UnboundLocalError exception when there is not yet a value for the name.
Class definition bodies on the other hand, although they are treated much like a function, do not get this optimization step. Class definitions are not meant to be called all that often; most modules create classes once, when imported. Class scopes don't count when nesting either, so the rules are simpler. As a result, class definition bodies do not act like functions when you start mixing scopes up a little.
So, for non-function scopes, LOAD_NAME and LOAD_DEREF are used for locals and globals, and for closures, respectively. For functions, LOAD_FAST, LOAD_GLOBAL and LOAD_DEREF are used instead.
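A small, hedged illustration of the split (exact listings vary across CPython versions; the code only disassembles, it never runs func):

import dis

def func():
    local_var = 1
    return local_var + global_var  # LOAD_FAST, then LOAD_GLOBAL

dis.dis(func)

# Class bodies skip the optimization, so x below is fetched with LOAD_NAME:
mod = compile("class C:\n    y = x\n", "<demo>", "exec")
class_body = [c for c in mod.co_consts if hasattr(c, 'co_code')][0]
dis.dis(class_body)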
Note that class bodies are executed as soon as Python executes the class line! So in example 1, class B inside class A is executed as soon as class A is executed, which is when you import the module. In example 2, C is not executed until f() is called, not before.
Let's walk through your examples:
Example 1: You have nested a class A.B in a class A. Class bodies do not form nested scopes, so even though the A.B class body is executed when class A is executed, the compiler will use LOAD_NAME to look up x. A.B().f() is a function (bound to the B() instance as a method), so it uses LOAD_GLOBAL to load x. We'll ignore attribute access here; that's a very well-defined name pattern.
Example 2: Here f().C.z is at class scope, so the function f().C().g() will skip the C scope and look at the f() scope instead, using LOAD_DEREF.
Example 3: Here var was determined to be a local by the compiler, because you assign to it within the scope. Functions are optimized, so LOAD_FAST is used to look up the local and an exception is thrown.
Example 4: Now things get a little weird. class A is executed at class scope, so LOAD_NAME is being used. A.x was deleted from the locals dictionary for the scope, so the second access to x results in the global x being found instead; LOAD_NAME looked for a local first and didn't find it there, falling back to the global lookup.
Yes, this appears inconsistent with the documentation. Python-the-language and CPython-the implementation are clashing a little here. You are, however, pushing the boundaries of what is possible and practical in a dynamic language; checking if x should have been a local in LOAD_NAME would be possible but takes precious execution time for a corner case that most developers will never run into.
Example 5: Now you are confusing the compiler. You used x = x in the class scope, and thus you are setting a local from a name outside of the scope. The compiler finds x is a local here (you assign to it), so it never considers that it could also be a scoped name. The compiler uses LOAD_NAME for all references to x in this scope, because this is not an optimized function body.
When executing the class definition, x = x first requires you to look up x, so it uses LOAD_NAME to do so. No x is defined, LOAD_NAME doesn't find a local, so the global x is found. The resulting value is stored as a local, which happens to be named x as well. print x uses LOAD_NAME again, and now finds the new local x value.
Example 6: Here you did not confuse the compiler. You are creating a local y; x is not local, so the compiler recognizes it as a scoped name from the parent function f2().myfunc(). x is looked up with LOAD_DEREF from the closure and stored in y.
You could see the confusion between 5 and 6 as a bug, albeit one that is not worth fixing in my opinion. It was certainly filed as such, see issue 532860 in the Python bug tracker, it has been there for over 10 years now.
The compiler could check for a scoped name x even when x is also a local, for that first assignment in example 5. Or LOAD_NAME could check if the name is meant to be a local, really, and throw an UnboundLocalError if no local was found, at the expense of more performance. Had this been in a function scope, LOAD_FAST would have been used for example 5, and an UnboundLocalError would be thrown immediately.
However, as the referenced bug shows, for historical reasons the behaviour is retained. There probably is code out there today that'll break were this bug fixed.
In two words: the difference between example 5 and example 6 is that in example 5 the variable x is also assigned to in the same scope, while in example 6 it is not. This triggers a difference that can be understood for historical reasons.
This raises UnboundLocalError:
x = "foo"
def f():
print x
x = 5
f()
instead of printing "foo". It makes a bit of sense, even if it seems strange at first: the function f() defines the variable x locally, even if it is after the print, and so any reference to x in the same function must be to that local variable. At least it makes sense in that it avoids strange surprizes if you have by mistake reused the name of a global variable locally, and are trying to use both the global and the local variable. This is a good idea because it means that we can statically know, just by looking at a variable, which variable it means. For example, we know that print x refers to the local variable (and thus may raise UnboundLocalError) here:
x = "foo"
def f():
if some_condition:
x = 42
print x
f()
Now, this rule doesn't work for class-level scopes: there, we want expressions like x = x to work, capturing the global variable x into the class-level scope. This means that class-level scopes don't follow the basic rule above: we can't know if x in this scope refers to some outer variable or to the locally-defined x --- for example:
class X:
    x = x      # we want to read the global x and assign it locally
    bar = x    # but here we want to read the local x of the previous line

class Y:
    if some_condition:
        x = 42
    print x    # may refer to either the local x, or some global x

class Z:
    for i in range(2):
        print x  # prints the global x the 1st time, and 42 the 2nd time
        x = 42
So in class scopes, a different rule is used: where it would normally raise UnboundLocalError --- and only in that case --- it instead looks up in the module globals. That's all: it doesn't follow the chain of nested scopes.
Why not? I actually doubt there is a better explanation than "for historical reasons". In more technical terms, it could consider that the variable x is both locally defined in the class scope (because it is assigned to) and should be passed in from the parent scope as a lexically nested variable (because it is read). It would be possible to implement it by using a different bytecode than LOAD_NAME that looks up in the local scope, and falls back to using the nested scope's reference if not found.
EDIT: thanks wilberforce for the reference to http://bugs.python.org/issue532860. We may have a chance to get some discussion reactivated with the proposed new bytecode, if we feel that it should be fixed after all (the bug report considers killing support for x = x but was closed for fear of breaking too much existing code; instead what I'm suggesting here would be to make x = x work in more cases). Or I may be missing another fine point...
EDIT2: it seems that CPython did precisely that in the current 3.4 trunk: http://bugs.python.org/issue17853 ... or not? They introduced the bytecode for a slightly different reason and don't use it systematically...
Long story short, this is a corner case of Python's scoping that is a bit inconsistent, but has to be kept for backwards compatibility (and because it's not that clear what the right answer should be). You can see lots of the original discussion about it on the Python mailing list when PEP 227 was being implemented, and some in the bug for which this behaviour is the fix.
We can work out why there's a difference using the dis module, which lets us look inside code objects to see the bytecode a piece of code has been compiled to. I'm on Python 2.6, so the details of this might be slightly different - but I see the same behaviour, so I think it's probably close enough to 2.7.
The code that initialises each nested MyClass lives in a code object that you can get to via the attributes of the top-level functions. (I'm renaming the functions from example 5 and example 6 to f1 and f2 respectively.)
The code object has a co_consts tuple, which contains the myfunc code object, which in turn has the code that runs when MyClass gets created:
In [20]: f1.func_code.co_consts
Out[20]: (None,
'x in f2',
<code object myfunc at 0x1773e40, file "<ipython-input-3-6d9550a9ea41>", line 4>)
In [21]: myfunc1_code = f1.func_code.co_consts[2]
In [22]: MyClass1_code = myfunc1_code.co_consts[3]
In [23]: myfunc2_code = f2.func_code.co_consts[2]
In [24]: MyClass2_code = myfunc2_code.co_consts[3]
Then you can see the difference between them in bytecode using dis.dis:
In [25]: from dis import dis
In [26]: dis(MyClass1_code)
6 0 LOAD_NAME 0 (__name__)
3 STORE_NAME 1 (__module__)
7 6 LOAD_NAME 2 (x)
9 STORE_NAME 2 (x)
8 12 LOAD_NAME 2 (x)
15 PRINT_ITEM
16 PRINT_NEWLINE
17 LOAD_LOCALS
18 RETURN_VALUE
In [27]: dis(MyClass2_code)
6 0 LOAD_NAME 0 (__name__)
3 STORE_NAME 1 (__module__)
7 6 LOAD_DEREF 0 (x)
9 STORE_NAME 2 (y)
8 12 LOAD_NAME 2 (y)
15 PRINT_ITEM
16 PRINT_NEWLINE
17 LOAD_LOCALS
18 RETURN_VALUE
So the only difference is that in MyClass1, x is loaded using the LOAD_NAME op, while in MyClass2, it's loaded using LOAD_DEREF. LOAD_DEREF looks up a name in an enclosing scope, so it gets 'x in myfunc'. LOAD_NAME doesn't follow nested scopes - since it can't see the x names bound in myfunc or f1, it gets the module-level binding.
Then the question is, why does the code of the two versions of MyClass get compiled to two different opcodes? In f1 the binding is shadowing x in the class scope, while in f2 it's binding a new name. If the MyClass scopes were nested functions instead of classes, the y = x line in f2 would be compiled the same, but the x = x in f1 would be a LOAD_FAST - this is because the compiler would know that x is bound in the function, so it should use the LOAD_FAST to retrieve a local variable. This would fail with an UnboundLocalError when it was called.
In [28]: x = 'x in module'
    ...: def f3():
    ...:     x = 'x in f2'
    ...:     def myfunc():
    ...:         x = 'x in myfunc'
    ...:         def MyFunc():
    ...:             x = x
    ...:             print x
    ...:         return MyFunc()
    ...:     myfunc()
    ...: f3()
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-29-9f04105d64cc> in <module>()
9 return MyFunc()
10 myfunc()
---> 11 f3()
<ipython-input-29-9f04105d64cc> in f3()
8 print x
9 return MyFunc()
---> 10 myfunc()
11 f3()
<ipython-input-29-9f04105d64cc> in myfunc()
7 x = x
8 print x
----> 9 return MyFunc()
10 myfunc()
11 f3()
<ipython-input-29-9f04105d64cc> in MyFunc()
5 x = 'x in myfunc'
6 def MyFunc():
----> 7 x = x
8 print x
9 return MyFunc()
UnboundLocalError: local variable 'x' referenced before assignment
This fails because the MyFunc function then uses LOAD_FAST:
In [31]: myfunc_code = f3.func_code.co_consts[2]

In [32]: MyFunc_code = myfunc_code.co_consts[2]
In [33]: dis(MyFunc_code)
7 0 LOAD_FAST 0 (x)
3 STORE_FAST 0 (x)
8 6 LOAD_FAST 0 (x)
9 PRINT_ITEM
10 PRINT_NEWLINE
11 LOAD_CONST 0 (None)
14 RETURN_VALUE
(As an aside, it's not a big surprise that there should be a difference in how scoping interacts with code in the body of classes and code in a function. You can tell this because bindings at the class level aren't available in methods - method scopes aren't nested inside the class scope in the same way as nested functions are. You have to explicitly reach them via the class, or by using self. (which will fall back to the class if there's not also an instance-level binding).)
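A tiny demonstration of that aside (Demo is a hypothetical class, not from the question):

class Demo(object):
    setting = 42
    def show(self):
        # print(setting)     # NameError: the class scope is not enclosing
        print(self.setting)  # reach it via the instance (or Demo.setting)

Demo().show()  # 42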