Function closures and meaning of <locals> syntax in object name in Python - python

Suppose I have the following code:
def outer(information):
    print(locals())
    def inner():
        print("The information given to me is: ", information)
    return inner

func1 = outer("info1")
print(func1)
It returns:
{'information': 'info1'}
<function outer.<locals>.inner at 0x1004d9d30>
Of course, if I call func1, it will print with info1 in the statement. So, from printing the locals() in the outer function, I can see that there is some relationship between the local scope and the storage of the argument.
I was expecting func1 to simply be outer.inner. Why does the repr instead say outer.<locals>.inner? Is this a syntactic way of clarifying that there are different local scopes associated with each of the functions I return from outer (imagine I made another one, func2 = outer("info2"))?
Also, is there something special about the enclosing <> syntax when used around a name? I see it around both the object and locals.

See PEP 3155 -- Qualified name for classes and functions and the example with nested functions.
For nested classes, methods, and nested functions, the __qualname__ attribute contains a dotted path leading to the object from the module top-level. A function's local namespace is represented in that dotted path by a component named <locals>.
Since the __repr__ of a function uses the __qualname__ attribute, you see this extra component in the output when printing a nested function.
I was expecting func1 to simply be outer.inner
That's not a fully qualified name. With this repr you might mistakenly assume you could import the name outer and dynamically access the attribute inner. Remember the qualname is a "dotted path leading to the object", but in this case attribute access is not possible because inner is a local variable.
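As a quick check of the point above, here is a minimal sketch (Python 3, with the body of inner simplified from the question) that prints the qualified name and shows that inner cannot be reached as an attribute of outer:

```python
def outer(information):
    def inner():
        return information
    return inner

func1 = outer("info1")

print(func1.__name__)           # inner
print(func1.__qualname__)       # outer.<locals>.inner
print(hasattr(outer, "inner"))  # False: inner is a local variable, not an attribute
```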
Also, is there something special about the enclosing <> syntax when used around a name?
There is nothing special about it, but it gives a pretty strong hint to the programmer that you can't access this namespace directly, because the name is not a valid identifier.

You can think of outer.<locals>.inner as saying that inner is a local variable created by the function. inner is what is referred to as a closure in computer science. Roughly speaking, a closure is like a lambda in that it acts as a function, but it requires non-global data to be bundled with it in order to operate. In memory it behaves like a tuple pairing information with a reference to the function being called.
foo = outer("foo")
bar = outer("bar")
# In memory these look more or less like the following:
("foo", outer.inner)
("bar", outer.inner)
# Because inner was created in a local namespace, it cannot be reached
# statically (for example, as an attribute of outer). The repr signals this
# by including <locals> in the name.
# Something like the line below might look more convenient, but it would get
# annoying quickly once the captured local variables are as long as your
# entire terminal screen.
<function outer."foo".inner at 0x1004d9d30>
There is nothing inherently special about the <> other than informing you that <locals> has some special meaning.
Edit:
I was not completely sure when writing my answer, but after seeing @wim's answer: <locals> does not apply only to closures that capture variables from a local context. It applies more broadly to all functions (or anything else) created within a local namespace. So in summary, foo.<locals>.bar just means that "bar was created within the local namespace of foo".
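The "tuple" picture and the edit above can be checked in CPython with a small sketch. The names factory and helper are hypothetical, added only to show a nested function that captures nothing:

```python
def outer(information):
    def inner():
        return information
    return inner

def factory():
    def helper():  # uses no variables from factory
        return "no captures"
    return helper

foo = outer("foo")
bar = outer("bar")

print(foo.__code__ is bar.__code__)      # True: both share one code object
print(foo.__closure__[0].cell_contents)  # foo
print(bar.__closure__[0].cell_contents)  # bar

helper = factory()
print(helper.__qualname__)               # factory.<locals>.helper
print(helper.__closure__)                # None: nothing was captured
```

So <locals> appears whether or not anything is captured; only the __closure__ attribute differs.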

Related

How do variables from an outer scope get evaluated in nested functions?

Code first:
def another_func(func):
    func()

def outer_func(pa=1, pb=2):
    def inner_func():
        print(pa)
    print(type(inner_func))
    another_func(inner_func)

if __name__ == '__main__':
    outer_func()
    # prints the type of inner_func, then 1
What I don't understand is this: inner_func uses pa, a parameter of outer_func, which makes sense inside the body of outer_func. But how can it "know" there is a pa when it is called by another_func?
I mean, what is actually passed to another_func when it is called in outer_func? It seems there is something more than a reference to the function object.
Function objects in Python aren't just functions, they're closures:[1] they carry around a reference to the local environment where the def statement was executed.
In particular, local variables from inside outer_func can be accessed from inside inner_func. (Even if you return inner_func, those values are kept alive so the closure will still work, for as long as inner_func is alive.)
If you add a nonlocal statement inside inner_func, it can even reassign the local variables from the body of outer_func.
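A minimal sketch of the nonlocal case (Python 3 only; make_counter and increment are hypothetical names, not from the question):

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable in make_counter's scope
        count += 1
        return count

    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2
```

The second call sees the value left behind by the first, even though make_counter returned long ago.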
How does this work?
Well, a def statement[2] is just a statement, like any other. What it does is something like this:
inner_func = _make_closure(<code from compiling inner_func body>, locals())
That <code from compiling inner_func body> is actually a constant value—the compiler compiles the body of every function in your module into constant code objects at import time.
But the function object that comes back from that _make_closure is a new thing that's created on the fly, and it has a reference to the local variables baked into it. Every time you run outer_func, it creates a new inner_func closure from the same <code>, each one capturing the current local environment.
The details are a little more complicated—and, to some extent, they differ between implementations, so this will be CPython-specific.
Part of the compiler's job is figuring out what kind of variable each name in a function is. You may have read the rules on globals vs. locals (a variable is local if and only if you have an assignment for the name somewhere in the function body, and there's no global statement). But closures make things more complicated.
If a variable would have been local, but a nested function references the variable without assigning to it, or has a nonlocal statement, then it's a cell variable in the outer function, and a free variable in the inner function.[3]
When the interpreter calls a function, it creates a frame object that holds the local namespace—the references to all of the function's local variables.
But cell variables are special: the interpreter creates a special cell object for each one, and a reference to that cell goes into the namespace, so there's an extra dereference in front of the value every time you access or change it.
And what that _make_closure pseudo-code above does is to copy the cells from the outer function's frame to a special attribute on the nested function called __closure__.
Then, when you call the inner function, the interpreter copies those cells from the __closure__ into the frame for that function.
So, the outer function's frame and the inner function's frame both have references to the same cells, which is how they can share variables.
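The cell sharing described above can be observed directly in CPython. This is a sketch with hypothetical nested functions read and write that both use the same outer variable:

```python
def outer_func():
    pa = 1

    def read():
        return pa

    def write(value):
        nonlocal pa
        pa = value

    return read, write

read, write = outer_func()

print(outer_func.__code__.co_cellvars)               # ('pa',)
print(read.__code__.co_freevars)                     # ('pa',)
print(read.__closure__[0] is write.__closure__[0])   # True: one shared cell

write(99)
print(read())                                        # 99
```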
For more on this, see the inspect module's docs, which show you how to find things like __closure__ and co_freevars in your interactive interpreter, and the dis module, which lets you look at the actual bytecode that your functions get compiled to.
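For instance, a short sketch using inspect.getclosurevars and dis.dis on a variant of the question's function (here inner_func is returned rather than passed to another_func, which is an assumption made to keep the example small). The disassembly is printed without an expected listing because opcode names vary between CPython versions:

```python
import dis
import inspect

def outer_func(pa=1, pb=2):
    def inner_func():
        return pa
    return inner_func

inner = outer_func()

closure_vars = inspect.getclosurevars(inner)
print(closure_vars.nonlocals)      # {'pa': 1}
print(inner.__code__.co_freevars)  # ('pa',)

# Show the bytecode of inner_func; the exact output depends on the version.
dis.dis(inner)
```

Note that pb does not appear: only names the inner function actually uses are captured.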
[1] This is one of those words that has a slew of related but different meanings. "Closure" can mean the technique of capturing the local namespace in a function, or it can mean the captured namespace, or it can mean the function with a captured namespace attached to it, or it can mean one of the variables in the captured namespace. Usually it's obvious which one you mean from context. If not, you have to say something like "closure capture" or "closure function" or "closure variable".
[2] In case you're wondering, lambda expressions work exactly the same way as def statements. And class definitions are not identical, but similar.
[3] It's actually still more complicated if you have multiple layers of nesting, but let's ignore that.
You seem to be confusing the code of the function with the function object. The code object is created only once, when the source file is compiled. However, a new function object called inner_func is created every time outer_func is called. This happens because a def statement is a type of assignment: it associates a function object with the specified name.
The function object contains a reference to its code as a matter of course, along with references to all the namespaces it will need to operate, including its parent's nonlocal namespace and the global module namespace.
So the value of pa in inner_func is going to be whatever it is in outer_func at the time of calling. The reference is to the namespace, not the name itself. If outer_func returns (think decorators), the namespace will be fixed, and only accessible through inner_func's special reference to it.
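A small sketch of that point: the inner function sees the value the variable has when it is called, not the value at the time of the def statement. The reassignment of pa is added here purely for illustration:

```python
def outer_func():
    pa = 1

    def inner_func():
        return pa

    pa = 2  # rebinding after the def is still visible to inner_func
    return inner_func

inner = outer_func()
print(inner())  # 2
```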

what does "unqualified on right hand side" mean in OOPs (Python)?

I came across the phrase "unqualified on the right-hand side" while reading about OOP concepts in Python, for a usage like self._customer = customer. What is that phrase trying to explain?
The complete statement is:
For example, the command, self._customer = customer, assigns the instance variable self._customer to the parameter customer; note that because customer is unqualified on the right-hand side, it refers to the parameter in the local namespace. --Data Structures and Algorithms in Python p. 72
According to the Python docs
qualified name
A dotted name showing the “path” from a module’s global scope to a class, function or method defined in that module, as defined in PEP 3155. For top-level functions and classes, the qualified name is the same as the object’s name:
...
When used to refer to modules, the fully qualified name means the entire dotted path to the module, including any parent packages, e.g. email.mime.text:
Put more simply, qualifying a name in Python means that you explicitly specify its scope. Thus self._customer is a qualified name (it identifies the instance variable _customer on the object self), whereas the bare customer reference does not specify any scope qualification.
When a name is unqualified, Python applies lexical scoping rules to find the variable, searching in this order:
1. Local variables (including function parameters)
2. Variables local to any outer functions, if we're dealing with a nested function definition
3. Global variables
4. Built-in variables
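The lookup order above can be illustrated with a short sketch. The class Account, the functions outer and inner, and the module-level customer variable are hypothetical names chosen for this example, not taken from the book:

```python
customer = "global customer"

class Account:
    def __init__(self, customer):
        # Unqualified `customer` on the right: found first in the local scope,
        # so it is the parameter, not the module-level name above.
        # Qualified `self._customer` on the left: an attribute of this instance.
        self._customer = customer

def outer():
    label = "enclosing"

    def inner():
        # `label` is not local to inner, so the enclosing scope is searched;
        # `customer` falls through to the global scope, `len` to the built-ins.
        return label, customer, len("abc")

    return inner()

print(Account("Alice")._customer)  # Alice
print(outer())                     # ('enclosing', 'global customer', 3)
```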

Introspecting for locally-scoped classes (python)

While trying to use introspection to navigate from strings to classes via some of the suggestions in Convert string to Python class object? I noticed that the given approaches won't work to get at a class in scope local to a function. Consider the following code:
import sys

def f():
    class LocalClass:
        pass
    print LocalClass
    print 'LocalClass' in dir(sys.modules[__name__])

f()  # call added here; the output below implies the function was run
which gives output
__main__.LocalClass
False
I'm a bit confused as to why LocalClass seems to belong to the main module according to the class object itself, and yet not accessible through sys.modules. Can someone give an explanation?
And is there a way to generate a class from a string, even if that class is only in non-global scope?
In the function f, LocalClass is indeed local. You can see this by trying __main__.LocalClass and seeing that AttributeError: 'module' object has no attribute 'LocalClass' is raised.
As for why the class prints as __main__.LocalClass: by default, the __repr__ of a class returns <cls.__module__>.<cls.__name__>.
The reason why dir isn't finding it is because it only looks at the variables defined in its scope. LocalClass is local so it won't show up if you are looking in the main module.
There are several ways to create a class from a string.
The first and easiest to understand is exec. You shouldn't go around using exec for random things, though, so I wouldn't recommend this method.
The second method is the type function. Its help page shows the signature type(name, bases, dict). This means you can create a class called LocalClass, subclassing object, with the attribute foo set to "bar", by calling type("LocalClass", (object,), {"foo": "bar"}) and storing the returned class in a variable. You can make the class global by doing globals()["LocalClass"] = ...
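A minimal sketch of the type approach described above. The name "Generated" is a hypothetical class name standing in for a string known only at run time:

```python
# Build a class from a name, a tuple of bases, and a dict of attributes.
LocalClass = type("LocalClass", (object,), {"foo": "bar"})

print(LocalClass.__name__)  # LocalClass
print(LocalClass().foo)     # bar

# Register a class under a name that is only known as a string.
class_name = "Generated"
globals()[class_name] = type(class_name, (object,), {})
print(class_name in globals())  # True
```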
PS: An easier (not sure if prettier) way to get the main module is by doing import __main__. This can be used in any module but I would generally advise against using this unless you know what you are doing because in general, python people don't like you doing this sort of thing.
EDIT: After looking at the linked question, I see you don't want to dynamically create a new class but to retrieve a variable given its name. All the answers in the linked question will do that. I'll leave it up to you to decide which one you prefer.
EDIT2: LocalClass.__module__ is "__main__" because that is the module where you defined the class. If you had defined it in a module Foo that was imported by __main__ (and not run standalone), you would find that __module__ would be "Foo". Even though LocalClass was defined in __main__, it doesn't automatically go into the global table just because it is a class; in Python, as you may already know, (almost) everything is an object. The dir function lists all the variables defined in a scope. Since you are looking in the main scope, that is nearly equivalent to using __dict__ or globals(), with some slight differences. Because LocalClass is local, it isn't defined in the global context. If, however, you called locals() inside the function f, you would find LocalClass in the result.
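The same experiment can be sketched in Python 3 (the question's code is Python 2). Here f returns its findings instead of printing them, which is a change made for this example:

```python
def f():
    class LocalClass:
        pass
    # Report whether the name is visible in f's own local namespace.
    return LocalClass, "LocalClass" in locals()

cls, in_locals = f()

print(in_locals)                  # True
print("LocalClass" in globals())  # False
print(cls.__qualname__)           # f.<locals>.LocalClass
```

In Python 3 the qualified name makes the local origin of the class visible, which the Python 2 output in the question did not.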

execfile() cannot be used reliably to modify a function’s locals

The python documentation states "execfile() cannot be used reliably to modify a function’s locals." on the page http://docs.python.org/2/library/functions.html#execfile
Can anyone provide any further details on this statement? The documentation is fairly minimal. The statement seems very contradictory to "If both dictionaries are omitted, the expression is executed in the environment where execfile() is called." which is also in the documentation. Is there a special case when execfile is used within a function, where execfile then acts like a function in that it creates a new scoping level?
If I use execfile in a function such as
def testfun():
    execfile('thefile.py',globals())
    def testfun2():
        print a
and there are objects created by the commands in 'thefile.py' (such as the object 'a'), how do I know if they are going to be local objects to testfun or global objects? So, in the function testfun2, 'a' will appear to be a global? If I omit globals() from the execfile statement, can anyone give a more detailed explanation why objects created by commands in 'thefile.py' are not available to 'testfun'?
In Python, the way names are looked up is highly optimized inside functions. One of the side effects is that the mapping returned by locals() gives you a copy of the local names inside a function, and altering that mapping does not actually influence the function:
def foo():
    a = 'spam'
    locals()['a'] = 'ham'
    print(a) # prints 'spam'
Internally, Python uses the LOAD_FAST opcode to look up the a name in the current frame by index, instead of the slower LOAD_NAME, which would look for a local name (by name), then in the globals() mapping if not found in the first.
The Python compiler can only emit LOAD_FAST opcodes for local names that are known at compile time; but if you allow locals() to directly influence a function's locals then you cannot know all the local names ahead of time. Nested functions using scoped names (free variables) complicate matters some more.
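One way to see the difference is to compare the opcodes of a function body with those of module-level code, using the dis module. This is only a sketch: exact opcode names differ between CPython versions (newer releases use variants and combined forms of the *_FAST opcodes), so the output is filtered by substring and no expected listing is given:

```python
import dis

def foo():
    a = 'spam'
    return a

# Opcodes used inside the function: locals are accessed by index.
function_ops = {ins.opname for ins in dis.get_instructions(foo)}

# Opcodes used by the same statements at module level: names are looked up by name.
module_code = compile("a = 'spam'\nprint(a)", "<example>", "exec")
module_ops = {ins.opname for ins in dis.get_instructions(module_code)}

print(sorted(op for op in function_ops if "FAST" in op))
print(sorted(op for op in module_ops if "NAME" in op))
```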
In Python 2, you can force the compiler to switch off the optimizations and use LOAD_NAME always by using an exec statement in the function:
def foo():
    a = 'spam'
    exec 'a == a' # a noop, but just the presence of `exec` is important
    locals()['a'] = 'ham'
    print(a) # prints 'ham'
In Python 3, exec has been replaced by exec() and the work-around is gone. In Python 3 all functions are optimized.
And if you didn't follow all this, that's fine too, but that is why the documentation glosses over this a little. It is all due to an implementation detail of the CPython compiler and interpreter that most Python users do not need to understand; all you need to know is that using locals() to change local names in a function does not work, usually.
Locals are kind of weird in Python. Regular locals are generally accessed by index, not by name, in the bytecode (as this is faster), but this means that Python has to know all the local variables at compile time. And that means you can't add new ones at runtime.
Now, if you use exec in a function, in Python 2.x, Python knows not to do this and falls back to the slower method of accessing local variables by name, and you can make new ones programmatically. (This trick was removed in Python 3.) You'd think Python would also do this for execfile(), but it doesn't, because exec is a statement and execfile() is a function call, and the name execfile might not refer to the built-in function at runtime (it can be reassigned, after all).
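In Python 3, where exec is an ordinary function, the consequence can be sketched as follows. The name brand_new_name is hypothetical and is assumed not to exist as a global:

```python
def foo():
    exec("brand_new_name = 1")  # runs, but cannot add a real local to foo
    try:
        # The compiler saw no assignment to this name in foo's body,
        # so this is compiled as a global lookup and fails.
        return brand_new_name
    except NameError:
        return "NameError"

print(foo())  # NameError
```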
What will happen in your example function? Well, try it and find out! As the documentation for execfile states, if you don't pass in a locals dict, the dict you pass in as globals will be used. You pass in globals() (your module's real global variables) so if it assigns to a, then a becomes a global.
Now you might try something like this:
def testfun():
    execfile('thefile.py')
    def testfun2():
        print a
    return testfun2
    exec ""
The exec statement at the end forces testfun() to use the old-style name-based local variables. It doesn't even have to be executed, as it is not here; it just has to be in the function somewhere.
But this doesn't work either, because the name-based locals don't support nesting functions with free variables (a in this case). That functionality also requires Python know all the local variables at function definition time. You can't even define the above function—Python won't let you.
In short, trying to deal with local variables programmatically is a pain and the documentation is correct: execfile() cannot reliably be used to modify a function's locals.
A better solution, probably, is to just import the file as a module. You can do this within the function, then access values in the module the usual way.
def testfun():
    import thefile
    print thefile.a
If you won't know the name of the file to be imported until runtime, you can use __import__ instead. Also, you may need to modify sys.path to make sure the directory you want to import from is first in the path (and put it back afterward, probably).
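As a sketch of the run-time import, the example below uses importlib.import_module, the standard-library convenience function for this, rather than calling __import__ directly. The helper load_attribute is a hypothetical name, and the standard math module stands in for the file being loaded:

```python
import importlib

def load_attribute(module_name, attribute_name):
    # Import a module whose name is only known as a string, then read one value.
    module = importlib.import_module(module_name)
    return getattr(module, attribute_name)

print(load_attribute("math", "pi"))  # 3.141592653589793
```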
You can also just pass in your own dictionary to execfile and afterward, access the variables from the executed file using myVarsDict['a'] and so on.
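A sketch of the own-dictionary approach in Python 3, where execfile no longer exists and exec is a function. To stay self-contained, the source is a string standing in for the contents of the file; my_vars is a hypothetical name:

```python
# Stand-in for the text of the file to execute.
source = "a = 'value from the executed code'\nb = a.upper()"

my_vars = {}
exec(source, my_vars)  # names assigned by the code land in my_vars

print(my_vars["a"])
# exec also adds a __builtins__ entry, so filter out dunder keys.
print(sorted(k for k in my_vars if not k.startswith("__")))  # ['a', 'b']
```

This keeps the executed code's names out of both the function's locals and the module's globals.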

where python store global and local variables?

This is almost the same as the question "Where are the local, global, static, auto, register, extern, const, volatile variables are stored?"; the difference is that this thread asks how the Python language implements this.
Of all those, Python only has "local", "global" and "nonlocal" variables.
Some of those are stored in a dictionary or dictionary-like object, which can usually be addressed explicitly.
"global": "Global" variables are actually global only relative to the module where they are defined. They are sometimes called "module-level" variables instead, since most of the evils of global variables in C do not apply: there are no naming conflicts between modules, and it is always clear where a given module-level name came from.
Their values are stored in the dictionary available as the "__dict__" attribute of the module object. It is important to note that all names in a module are stored this way, since names in Python can point to any kind of object. That is, there is no distinction at the language level between "variables", functions, and classes in a module: the names for all of these objects are keys in the "__dict__" special attribute, which the language accesses directly. Yes, one can insert or change the objects that a module's names point to at run time, with the usual "setattr" or even by changing the module's __dict__ directly.
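A small sketch of this, using a throwaway module object created with types.ModuleType (the module name "demo" and the attribute names are hypothetical):

```python
import types

mod = types.ModuleType("demo")

mod.x = 1                # ordinary attribute assignment
setattr(mod, "y", 2)     # the usual setattr
mod.__dict__["z"] = 3    # writing to the module's __dict__ directly

# All three names are now keys of the same dictionary.
print(sorted(k for k in vars(mod) if not k.startswith("__")))  # ['x', 'y', 'z']
print(mod.z)                                                   # 3
```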
"local": Local variables are available fr "user code" in a dictionary returned by the "locals()" buil-in function call. This dictionary is referenced by the "f_locals" attribute of the current code frame being run. Since there are ways of retrieving the code frame of functions that called the current running code, one can retrieve values of the variables available in those functions using the f_locals attribute, although in the CPython implementation, changing a value in the f_locals dictionary won't reflect on the actuall variable values of the running code - those values are cached by the bytecode machinery.
"nonlocal" Variables are special references, inside a function to variables defined in an outter scope, in the case of functions (or other code, like a class body) defined inside a function. They can be retrieved in running code, by getting the func_closure attribute - which is a tuple of "cell" objects. For example, to retrieve the value of the first nonlocal variable inside a function object, one does:_
function.func_closure[0].cell_contents - the values are kept separate from the variable names, which can be retrieved as function.func_code.co_varnames. (this naming scheme is valid for Python 2.x)
The bottom-line: Variable "values" are always kept inside objects that are compatible with Python objects and managed by the virtual machine. Some of these data can be made programmatically accessible through introspection - some of it is opaque. (For example, retrieving, through introspection, nonlocal variables from inside the function that owns them itself is a bit tricky)
