execfile() cannot be used reliably to modify a function’s locals

execfile() cannot be used reliably to modify a function’s locals - python

The python documentation states "execfile() cannot be used reliably to modify a function’s locals." on the page http://docs.python.org/2/library/functions.html#execfile
Can anyone provide any further details on this statement? The documentation is fairly minimal. The statement seems very contradictory to "If both dictionaries are omitted, the expression is executed in the environment where execfile() is called." which is also in the documentation. Is there a special case when excecfile is used within a function then execfile is then acting similar to a function in that it creates a new scoping level?
If I use execfile in a function such as
def testfun():
execfile('thefile.py',globals())
def testfun2():
print a
and there are objects created by the commands in 'thefile.py' (such as the object 'a'), how do I know if they are going to be local objects to testfun or global objects? So, in the function testfun2, 'a' will appear to be a global? If I omit globals() from the execfile statement, can anyone give a more detailed explanation why objects created by commands in 'thefile.py' are not available to 'testfun'?

In Python, the way names are looked up is highly optimized inside functions. One of the side effects is that the mapping returned by locals() gives you a copy of the local names inside a function, and altering that mapping does not actually influence the function:
def foo():
a = 'spam'
locals()['a'] = 'ham'
print(a) # prints 'spam'
Internally, Python uses the LOAD_FAST opcode to look up the a name in the current frame by index, instead of the slower LOAD_NAME, which would look for a local name (by name), then in the globals() mapping if not found in the first.
The python compiler can only emit LOAD_FAST opcodes for local names that are known at compile time; but if you allow the locals() to directly influence a functions' locals then you cannot know all the local names ahead of time. Nested functions using scoped names (free variables) complicates matters some more.
In Python 2, you can force the compiler to switch off the optimizations and use LOAD_NAME always by using an exec statement in the function:
def foo():
a = 'spam'
exec 'a == a' # a noop, but just the presence of `exec` is important
locals()['a'] = 'ham'
print(a) # prints 'ham'
In Python 3, exec has been replaced by exec() and the work-around is gone. In Python 3 all functions are optimized.
And if you didn't follow all this, that's fine too, but that is why the documentation glosses over this a little. It is all due to an implementation detail of the CPython compiler and interpreter that most Python users do not need to understand; all you need to know that using locals() to change local names in a function does not work, usually.

Locals are kind of weird in Python. Regular locals are generally accessed by index, not by name, in the bytecode (as this is faster), but this means that Python has to know all the local variables at compile time. And that means you can't add new ones at runtime.
Now, if you use exec in a function, in Python 2.x, Python knows not to do this and falls back to the slower method of accessing local variables by name, and you can make new ones programmatically. (This trick was removed in Python 3.) You'd think Python would also do this for execfile(), but it doesn't, because exec is a statement and execfile() is a function call, and the name execfile might not refer to the built-in function at runtime (it can be reassigned, after all).
What will happen in your example function? Well, try it and find out! As the documentation for execfile states, if you don't pass in a locals dict, the dict you pass in as globals will be used. You pass in globals() (your module's real global variables) so if it assigns to a, then a becomes a global.
Now you might try something like this:
def testfun():
execfile('thefile.py')
def testfun2():
print a
return testfun2
exec ""
The exec statement at the end forces testfun() to use the old-style name-based local variables. It doesn't even have to be executed, as it is not here; it just has to be in the function somewhere.
But this doesn't work either, because the name-based locals don't support nesting functions with free variables (a in this case). That functionality also requires Python know all the local variables at function definition time. You can't even define the above function—Python won't let you.
In short, trying to deal with local variables programmatically is a pain and the documentation is correct: execfile() cannot reliably be used to modify a function's locals.
A better solution, probably, is to just import the file as a module. You can do this within the function, then access values in the module the usual way.
def testfun():
import thefile
print thefile.a
If you won't know the name of the file to be imported until runtime, you can use __import__ instead. Also, you may need to modify sys.path to make sure the directory you want to import from is first in the path (and put it back afterward, probably).
You can also just pass in your own dictionary to execfile and afterward, access the variables from the executed file using myVarsDict['a'] and so on.

Related

Function closures and meaning of <locals> syntax in object name in Python

Suppose I have the following code:
def outer(information):
print(locals())
def inner():
print("The information given to me is: ", information)
return inner
func1 = outer("info1")
print(func1)
It returns:
{'information': 'info1'}
<function outer.<locals>.inner at 0x1004d9d30>
Of course, if I call func1, it will print with info1 in the statement. So, from printing the locals() in the outer function, I can see that there is some relationship between the local scope and the storage of the argument.
I was expecting func1 to simply be outer.inner, why does the syntax instead say outer.<locals>.inner? Is this a syntactical way of clarifying that there are different local scopes associated to each of these functions - imagine I made another one func2 = outer("info2") - I return using the outer function?
Also, is there something special about the enclosing <> syntax when used around a name? I see it around both the object and locals.

See PEP 3155 -- Qualified name for classes and functions and the example with nested functions.
For nested classes, methods, and nested functions, the __qualname__ attribute contains a dotted path leading to the object from the module top-level. A function's local namespace is represented in that dotted path by a component named <locals>.
Since the __repr__ of a function uses the __qualname__ attribute, you see this extra component in the output when printing a nested function.
I was expecting func1 to simply be outer.inner
That's not a fully qualified name. With this repr you might mistakenly assume you could import the name outer and dynamically access the attribute inner. Remember the qualname is a "dotted path leading to the object", but in this case attribute access is not possible because inner is a local variable.
Also, is there something special about the enclosing <> syntax when used around a name?
There is nothing special about it, but it gives a pretty strong hint to the programmer that you can't access this namespace directly, because the name is not a valid identifier.

You can think of outer.<locals>.inner as saying that inner is a local variable created by the function. inner is what is referred to a closure in computer science. Roughly speaking a closure is like a lambda in that it acts as a function, but it requires non-global data be bundled with it to operate. In memory it acts as a tuple between information and a reference to the function being called.
foo = outer("foo")
bar = outer("bar")
# In memory these more or less looks like the following:
("foo", outer.inner)
("bar", outer.inner)
# And since it was created from a local namespace and can not be accessed
# from a static context local variables bundled with the function, it
# represents that by adding <local> when printed.
# While something like this looks a whole lot more convenient, it gets way
# more annoying to work with when the local variables used are the length of
# your entire terminal screen.
<function outer."foo".inner at 0x1004d9d30>
There is nothing inherently special about the <> other than informing you that <local> has some special meaning.
Edit:
I was not completely sure when writing my answer, but after seeing #wim's answer <local> not only applies to closures created consuming variables within a local context. It can be applied more broadly to all functions (or anything else) created within a local namespace. So in summary foo.<local>.bar just means that "bar was created within the local namespace of foo".

Is it possible to get a warning when using a global variable in a function without expicitely passing it to the function?

In python if you define a global variable it's known to all functions without explicit passing and you can do e.g. this:
x=1
def func():
return x
I know this is normal behaviour, but it's also enabling unclean code because the interpreter doesn't tell you if you forget to pass a global variable to a function.
I'm usually writing scripts for data processing where usually all the code is in one file as data processing is a linear process. For the same reason I only use functions but not classes. However that way of designing leads to having lots of global variables and sometimes forgetting to pass all of them explicitely to functions.
Is there a way to have Python throw a warning when I use a global variable in a function without explicitely passing it?
I'm using Python 3.7.4 on IPython 7.7.0 in Spyder 3.7.

Unfortunately, no. In python, everything is an truly an object, so "globals" would include classes, functions and imported modules too.
However, you can check if a function uses global variables by something like
import inspect, warnings
y = 5
def f(x):
return x + y
def warn_me(func):
if inspect.getclosurevars(func).globals:
warning.warn(f'function {func.__name__} uses global variables')
warn_me(f)
For debugging purposes, you could make a script that walks through your files & functions and checks.

I would recommend avoiding global variables as much as possible. There are some legitimate reasons to use them, but in general there are cleaner ways to achieve the same results.
If you conform to best practices, you would write your global variables in upper case. This is then easy to spot the global variables in your code. Additionally a linting tool such as pylint would warn you with a message like this in the case of your example: Constant name "x" doesn't conform to UPPER_CASE naming style.
If you have a lot of global variables/constants, you could place them all in a global dict CONFIG = {'MARCO': 'Polo', 'ping': 'pong',} and pass this object around from function to function.

Why is globals() a function in Python?

Python offers the function globals() to access a dictionary of all global variables. Why is that a function and not a variable? The following works:
g = globals()
g["foo"] = "bar"
print foo # Works and outputs "bar"
What is the rationale behind hiding globals in a function? And is it better to call it only once and store a reference somewhere or should I call it each time I need it?
IMHO, this is not a duplicate of Reason for globals() in Python?, because I'm not asking why globals() exist but rather why it must be a function (instead of a variable __globals__).

Because it may depend on the Python implementation how much work it is to build that dictionary.
In CPython, globals are kept in just another mapping, and calling the globals() function returns a reference to that mapping. But other Python implementations are free to create a separate dictionary for the object, as needed, on demand.
This mirrors the locals() function, which in CPython has to create a dictionary on demand because locals are normally stored in an array (local names are translated to array access in CPython bytecode).
So you'd call globals() when you need access to the mapping of global names. Storing a reference to that mapping works in CPython, but don't count on other this in other implementations.

How do you declare a global constant in Python?

I have some heavy calculations that I want to do when my program starts, and then I want to save the result (a big bumpy matrix) in memory so that I can use it again and again. My program contains multiple files and classes, and I would like to be able to access this variable from anywhere, and if possible define it as constant.
How do you define a global constant in Python?

You can just declare a variable on the module level and use it in the module as a global variable. An you can also import it to other modules.
#mymodule.py
GLOBAL_VAR = 'Magic String' #or matrix...
def myfunc():
print(GLOBAL_VAR)
Or in other modules:
from mymodule import GLOBAL_VAR

I do not think the marked as good answer solves the op question. The global keyword in Python is used to modify a global variable in a local context (as explained here). This means that if the op modifies SOME_CONSTANT within myfunc the change will affect also outside the function scope (globally).
Not using the global keyword at the begining of myfunc is closer to the sense of global constant than the one suggested. Despite there are no means to render a value constant or immutable in Python.

There is no way to declare a constant in Python. You can just use
SOME_CONSTANT = [...]
If the file name where it is declared is file1.py, then you can access to it from other files in the following way:
import file1
print file1.SOME_CONSTANT
Assuming that both files are in the same directory.

I am not sure what you mean by 'global constant'; because there are no constants in Python (there is no "data protection", all variables are accessible).
You can implement a singleton pattern, but you will have to regenerate this at runtime each time.
Your other option will be to store the results in an external store (like say, redis) which is accessible from all processes.
Depending on how big your data set is, storing it externally in a fast K/V like redis might offer a performance boost as well.
You would still have to transform and load it though, since redis would not know what a numpy array is (although it has many complex types that you can exploit).

How to make a cross-module variable?

The __debug__ variable is handy in part because it affects every module. If I want to create another variable that works the same way, how would I do it?
The variable (let's be original and call it 'foo') doesn't have to be truly global, in the sense that if I change foo in one module, it is updated in others. I'd be fine if I could set foo before importing other modules and then they would see the same value for it.

If you need a global cross-module variable maybe just simple global module-level variable will suffice.
a.py:
var = 1
b.py:
import a
print a.var
import c
print a.var
c.py:
import a
a.var = 2
Test:
$ python b.py
# -> 1 2
Real-world example: Django's global_settings.py (though in Django apps settings are used by importing the object django.conf.settings).

I don't endorse this solution in any way, shape or form. But if you add a variable to the __builtin__ module, it will be accessible as if a global from any other module that includes __builtin__ -- which is all of them, by default.
a.py contains
print foo
b.py contains
import __builtin__
__builtin__.foo = 1
import a
The result is that "1" is printed.
Edit: The __builtin__ module is available as the local symbol __builtins__ -- that's the reason for the discrepancy between two of these answers. Also note that __builtin__ has been renamed to builtins in python3.

I believe that there are plenty of circumstances in which it does make sense and it simplifies programming to have some globals that are known across several (tightly coupled) modules. In this spirit, I would like to elaborate a bit on the idea of having a module of globals which is imported by those modules which need to reference them.
When there is only one such module, I name it "g". In it, I assign default values for every variable I intend to treat as global. In each module that uses any of them, I do not use "from g import var", as this only results in a local variable which is initialized from g only at the time of the import. I make most references in the form g.var, and the "g." serves as a constant reminder that I am dealing with a variable that is potentially accessible to other modules.
If the value of such a global variable is to be used frequently in some function in a module, then that function can make a local copy: var = g.var. However, it is important to realize that assignments to var are local, and global g.var cannot be updated without referencing g.var explicitly in an assignment.
Note that you can also have multiple such globals modules shared by different subsets of your modules to keep things a little more tightly controlled. The reason I use short names for my globals modules is to avoid cluttering up the code too much with occurrences of them. With only a little experience, they become mnemonic enough with only 1 or 2 characters.
It is still possible to make an assignment to, say, g.x when x was not already defined in g, and a different module can then access g.x. However, even though the interpreter permits it, this approach is not so transparent, and I do avoid it. There is still the possibility of accidentally creating a new variable in g as a result of a typo in the variable name for an assignment. Sometimes an examination of dir(g) is useful to discover any surprise names that may have arisen by such accident.

Define a module ( call it "globalbaz" ) and have the variables defined inside it. All the modules using this "pseudoglobal" should import the "globalbaz" module, and refer to it using "globalbaz.var_name"
This works regardless of the place of the change, you can change the variable before or after the import. The imported module will use the latest value. (I tested this in a toy example)
For clarification, globalbaz.py looks just like this:
var_name = "my_useful_string"

You can pass the globals of one module to onother:
In Module A:
import module_b
my_var=2
module_b.do_something_with_my_globals(globals())
print my_var
In Module B:
def do_something_with_my_globals(glob): # glob is simply a dict.
glob["my_var"]=3

Global variables are usually a bad idea, but you can do this by assigning to __builtins__:
__builtins__.foo = 'something'
print foo
Also, modules themselves are variables that you can access from any module. So if you define a module called my_globals.py:
# my_globals.py
foo = 'something'
Then you can use that from anywhere as well:
import my_globals
print my_globals.foo
Using modules rather than modifying __builtins__ is generally a cleaner way to do globals of this sort.

You can already do this with module-level variables. Modules are the same no matter what module they're being imported from. So you can make the variable a module-level variable in whatever module it makes sense to put it in, and access it or assign to it from other modules. It would be better to call a function to set the variable's value, or to make it a property of some singleton object. That way if you end up needing to run some code when the variable's changed, you can do so without breaking your module's external interface.
It's not usually a great way to do things — using globals seldom is — but I think this is the cleanest way to do it.

I wanted to post an answer that there is a case where the variable won't be found.
Cyclical imports may break the module behavior.
For example:
first.py
import second
var = 1
second.py
import first
print(first.var) # will throw an error because the order of execution happens before var gets declared.
main.py
import first
On this is example it should be obvious, but in a large code-base, this can be really confusing.

I wondered if it would be possible to avoid some of the disadvantages of using global variables (see e.g. http://wiki.c2.com/?GlobalVariablesAreBad) by using a class namespace rather than a global/module namespace to pass values of variables. The following code indicates that the two methods are essentially identical. There is a slight advantage in using class namespaces as explained below.
The following code fragments also show that attributes or variables may be dynamically created and deleted in both global/module namespaces and class namespaces.
wall.py
# Note no definition of global variables
class router:
""" Empty class """
I call this module 'wall' since it is used to bounce variables off of. It will act as a space to temporarily define global variables and class-wide attributes of the empty class 'router'.
source.py
import wall
def sourcefn():
msg = 'Hello world!'
wall.msg = msg
wall.router.msg = msg
This module imports wall and defines a single function sourcefn which defines a message and emits it by two different mechanisms, one via globals and one via the router function. Note that the variables wall.msg and wall.router.message are defined here for the first time in their respective namespaces.
dest.py
import wall
def destfn():
if hasattr(wall, 'msg'):
print 'global: ' + wall.msg
del wall.msg
else:
print 'global: ' + 'no message'
if hasattr(wall.router, 'msg'):
print 'router: ' + wall.router.msg
del wall.router.msg
else:
print 'router: ' + 'no message'
This module defines a function destfn which uses the two different mechanisms to receive the messages emitted by source. It allows for the possibility that the variable 'msg' may not exist. destfn also deletes the variables once they have been displayed.
main.py
import source, dest
source.sourcefn()
dest.destfn() # variables deleted after this call
dest.destfn()
This module calls the previously defined functions in sequence. After the first call to dest.destfn the variables wall.msg and wall.router.msg no longer exist.
The output from the program is:
global: Hello world!
router: Hello world!
global: no message
router: no message
The above code fragments show that the module/global and the class/class variable mechanisms are essentially identical.
If a lot of variables are to be shared, namespace pollution can be managed either by using several wall-type modules, e.g. wall1, wall2 etc. or by defining several router-type classes in a single file. The latter is slightly tidier, so perhaps represents a marginal advantage for use of the class-variable mechanism.

This sounds like modifying the __builtin__ name space. To do it:
import __builtin__
__builtin__.foo = 'some-value'
Do not use the __builtins__ directly (notice the extra "s") - apparently this can be a dictionary or a module. Thanks to ΤΖΩΤΖΙΟΥ for pointing this out, more can be found here.
Now foo is available for use everywhere.
I don't recommend doing this generally, but the use of this is up to the programmer.
Assigning to it must be done as above, just setting foo = 'some-other-value' will only set it in the current namespace.

I use this for a couple built-in primitive functions that I felt were really missing. One example is a find function that has the same usage semantics as filter, map, reduce.
def builtin_find(f, x, d=None):
for i in x:
if f(i):
return i
return d
import __builtin__
__builtin__.find = builtin_find
Once this is run (for instance, by importing near your entry point) all your modules can use find() as though, obviously, it was built in.
find(lambda i: i < 0, [1, 3, 0, -5, -10]) # Yields -5, the first negative.
Note: You can do this, of course, with filter and another line to test for zero length, or with reduce in one sort of weird line, but I always felt it was weird.

I could achieve cross-module modifiable (or mutable) variables by using a dictionary:
# in myapp.__init__
Timeouts = {} # cross-modules global mutable variables for testing purpose
Timeouts['WAIT_APP_UP_IN_SECONDS'] = 60
# in myapp.mod1
from myapp import Timeouts
def wait_app_up(project_name, port):
# wait for app until Timeouts['WAIT_APP_UP_IN_SECONDS']
# ...
# in myapp.test.test_mod1
from myapp import Timeouts
def test_wait_app_up_fail(self):
timeout_bak = Timeouts['WAIT_APP_UP_IN_SECONDS']
Timeouts['WAIT_APP_UP_IN_SECONDS'] = 3
with self.assertRaises(hlp.TimeoutException) as cm:
wait_app_up(PROJECT_NAME, PROJECT_PORT)
self.assertEqual("Timeout while waiting for App to start", str(cm.exception))
Timeouts['WAIT_JENKINS_UP_TIMEOUT_IN_SECONDS'] = timeout_bak
When launching test_wait_app_up_fail, the actual timeout duration is 3 seconds.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

execfile() cannot be used reliably to modify a function’s locals - python

Related

Function closures and meaning of <locals> syntax in object name in Python

Is it possible to get a warning when using a global variable in a function without expicitely passing it to the function?

Why is globals() a function in Python?

How do you declare a global constant in Python?

How to make a cross-module variable?

Categories

Resources