globals from multiple modules not visible to exec code? - python

My app executes bits of python logic stored in a configuration file via exec, as in:
"foo() + 2"
This logic commonly references symbols that I store in a module named "standard". For example, in the above, you can see the logic accesses the method foo(), and foo is defined inside standard.py:
def foo():...
To provide the logic with access to the symbols in standard, I'm extracting out the methods from standard into a dictionary, like so:
import standard
my_globals = standard.__dict__
Then I'm adding in a few other relevant symbols to my_globals (which I don't show here) and providing them to the logic, when I execute it:
exec("foo() + 2", my_globals)
This is working. When I look at globals() from inside foo(), I can see other methods I defined in the module standard.py as well as the other relevant symbols I mentioned above and foo() can access all of those things.
The problem comes in when I want to make another module of functions available to the logic as well. Let's say I have a module named custom.py that has some other symbols I want the logic to access. I'm trying to make those symbols available as well by doing this:
import custom
my_globals.update(custom.__dict__)
Let's say my logic now is "bar() + 1", where "bar" is defined inside of custom.py. bar() also wants to access some of those relevant other symbols I added into my_globals.
The problem I'm running into is that code inside of custom is only seeing the symbols defined inside custom.py, and not everything else stored in my_globals. I.e., bar() can't see foo(), nor the other stuff I tucked away into my_globals.
Yet foo() can. Its code can see any other methods I defined in standard, as well as symbols defined in custom, as well as the "extra" symbols I plugged into my_globals.
Why is this happening? My expectation is that the logic being executed is run in the context of the contents of my_globals, so it seems like both foo() and bar() should be able to access any and all symbols in my_globals.
I suspect this has to do with how I'm creating my_globals. Any insight would be greatly appreciated!

Here is some insight:
"To provide the logic with access to the symbols in standard, I'm extracting out the methods from standard into a dictionary, like so:"
import standard
my_globals = standard.__dict__
Not exactly. You're just creating a local variable, my_globals that now points to standard.__dict__. Whenever you update my_globals, you're really just updating standard.__dict__.
When you add your other symbols to my_globals, again, you're just adding them to standard.__dict__.
Calling:
exec("foo() + 2", my_globals)
works great when foo is defined in standard.py, because you've added all the other methods to this module - you now have access to them all.
When you do:
import custom
my_globals.update(custom.__dict__)
You've added your "symbols" from custom.py to the standard module. All the functions in standard can access functions from custom.py after this point.
Unfortunately, custom.py itself doesn't have direct access to the methods in standard.py (unless you import them), because a function looks up global names in the module where it was defined. From within custom.py, you can see that everything you've created is in standard now:
(from within custom.py):
import sys
def custom_foo():
    print(dir(sys.modules['standard'])) # shows that you've put everything in here
    sys.modules['standard'].foo() # calls foo from standard.py (assuming you've imported standard in your main pgm)
Above is really ugly though - you could just add a:
from standard import *
at the top of custom.py, and you would have access to everything you've added to its __dict__ instead.
I doubt you really want to do what you're attempting with the whole exec thing, but I'm not really sure what your use case is.
EDIT:
If you really want all the symbols you've attached to my_globals available to the methods of custom.py, you could call:
custom.__dict__.update(my_globals)
After this point, functions in custom.py would have access to everything you've added to standard.__dict__ (aka my_globals). (You've also overridden any functions defined in custom.py with functions of the same name in my_globals.)
Please note, doing things this way is pretty atypical (read: somewhat ill advised).
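To make the explanation above concrete, here is a minimal sketch that reproduces the behaviour without any files on disk. The two modules are built in memory with types.ModuleType, and the bodies of foo and bar are made up for illustration; they are not the asker's real code. It shows that a function always resolves globals in the namespace of the module that defined it, no matter which dictionary is passed to exec or eval.

```python
import types

# Throwaway in-memory modules standing in for standard.py and custom.py
# (hypothetical contents, chosen only for this illustration).
standard = types.ModuleType("standard")
custom = types.ModuleType("custom")
exec("def foo():\n    return 40", standard.__dict__)
exec("def bar():\n    return foo() + 1", custom.__dict__)

my_globals = standard.__dict__        # the same dict object, not a copy
my_globals.update(custom.__dict__)    # copies bar into standard's namespace

# bar was defined in custom, so it resolves global names in custom.__dict__.
bar_uses_custom_namespace = custom.bar.__globals__ is custom.__dict__

try:
    eval("bar() + 1", my_globals)     # bar is found, but foo is not visible inside it
    bar_error = None
except NameError as exc:
    bar_error = str(exc)

# Making foo visible in custom's own namespace lets bar() resolve it.
custom.foo = standard.foo
result = eval("bar() + 1", my_globals)
```

Injecting only the names that custom actually needs is a little less invasive than updating its whole __dict__, since nothing defined in custom.py gets overwritten.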

Related

How to mimic Python modules without import?

I’ve tried to develop a « module expander » tool for Python 3 but I've some issues.
The idea is the following : for a given Python script main.py, the tool generates a functionally equivalent Python script expanded_main.py, by replacing each import statement by the actual code of the imported module; this assumes that the Python source code of the imported module is accessible. To do the job the right way, I’m using the builtin module ast of Python as well as astor, a third-party tool allowing to dump the AST back into Python source. The motivation of this import expander is to be able to compile a script into one single bytecode chunk, so that the Python VM does not have to take care of importing modules (this could be useful for MicroPython, for instance).
The simplest case is the statement:
from my_module1 import *
To transform this, my tool looks for a file my_module1.py and it replaces the import statement by the content of this file. Then, the expanded_main.py can access any name defined in my_module1, as if the module was imported the normal way. I don’t care about subtle side effects that may reveal the trick. Also, to simplify, I treat from my_module1 import a, b, c as the previous import (with asterisk), without caring about possible side effects. So far so good.
Now here is my point. How could you handle this flavor of import:
import my_module2
My first idea was to mimic this by creating a class having the same name as the module and copying the content of the Python file indented:
class my_module2:
    # content of my_module2.py
    …
This actually works for many cases but, sadly, I discovered that this has several glitches: one of these is that it fails with functions having a body referring to a global variable defined in the module. For example, consider the following two Python files:
# my_module2.py
g = "Hello"
def greetings():
    print (g + " World!")
and
# main.py
import my_module2
print(my_module2.g)
my_module2.greetings()
At execution, main.py prints "Hello" and "Hello World!". Now, my expander tool shall generate this:
# expanded_main.py
class my_module2:
    g = "Hello"
    def greetings():
        print (g + " World!")
print(my_module2.g)
my_module2.greetings()
At execution of expanded_main.py, the first print statement is OK ("Hello") but the greetings function raises an exception: NameError: name 'g' is not defined.
What happens actually is that
in the module my_module2, g is a global variable,
in the class my_module2, g is a class variable, which should be referred as my_module2.g.
Other similar side effects happen when you define functions, classes, … in my_module2.py and you want to refer to them in other functions, classes, … of the same my_module2.py.
Any idea how these problems could be solved?
Apart from classes, are there other Python constructs that allow to mimic a module?
Final note: I’m aware that the tool should take care 1° of nested imports (recursion), 2° of possible multiple import of the same module. I don't expect to discuss these topics here.
You can execute the source code of a module in the scope of a function, specifically an instance method. The attributes can then be made available by defining __getattr__ on the corresponding class and keeping a copy of the initial function's locals(). Here is some sample code:
class Importer:
    def __init__(self):
        g = "Hello"
        def greetings():
            print (g + " World!")
        self._attributes = locals()
    def __getattr__(self, item):
        return self._attributes[item]
module1 = Importer()
print(module1.g)
module1.greetings()
Nested imports are handled naturally by replacing them the same way with an instance of Importer. Duplicate imports shouldn't be a problem either.
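A different construct that may also be worth considering, and which is not what the answer above proposes, is to build a real module object by hand and execute the module's source in its namespace. The sketch below assumes CPython's types.ModuleType is available (that may not hold on MicroPython); make_module and MODULE_SOURCE are names invented for this illustration, and greetings returns the string instead of printing it so the result is easy to check.

```python
import types

# Source text of the module to be inlined (stand-in for my_module2.py).
MODULE_SOURCE = '''
g = "Hello"
def greetings():
    return g + " World!"
'''

def make_module(name, source):
    """Create a module object and run `source` with the module's dict as globals."""
    module = types.ModuleType(name)
    exec(compile(source, name + ".py", "exec"), module.__dict__)
    return module

my_module2 = make_module("my_module2", MODULE_SOURCE)
greeting = my_module2.greetings()   # g is resolved as a global of the module
```

Because the functions are defined with the module's own dictionary as their globals, references to g and to sibling functions behave as they would after a normal import.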

How exactly do modules in Python work?

I am trying to better understand Python's modules, coming mostly from a C background.
I have main.py with the following:
def g():
    print obj # Need access to the object below
if __name__ == "__main__":
    obj = {}
    import child
    child.f()
And child.py:
def f():
    import main
    main.g()
This particular structure of code may seem strange at first, but rest assured this is stripped from a larger project I am working on, where delegation of responsibility and decoupling forces the kind of inter-module function call sequence you see.
I need to be able to access the actual object I create when first executing main via python main.py. Is this possible without explicitly sending obj as parameter around? Because I will have other variables and I don't want to send these too. If desperate, I can create a "state" object for the entire main module that I need access to, and send it around, but even that is to me a last resort. This is global variables at its simplest in C, but in Python this is a different beast I suppose (module global variables only?)
One solution, excluding parameter passing at least, revolves around the fact that when the main module is executed as a script (e.g. via python main.py, where the if clause succeeds and obj is subsequently bound), the main module and its state exist and are referenced as __main__ (as can be inspected in the sys.modules dictionary). So when the child module needs the actual instance of the main module, it is not main it needs to import but __main__; otherwise two distinct copies would exist, each with its own distinct state.
'Fixed' child.py:
def f():
    import __main__
    __main__.g()
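The "two distinct copies" point can be checked with a small experiment. The sketch below is only an illustration under a few assumptions: it needs Python 3.7+ (for capture_output), it writes a throwaway main.py into a temporary directory, and the script contents are invented for the demo rather than taken from the question.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# A script that imports itself by file name while running as __main__.
MAIN_SRC = textwrap.dedent('''
    import sys
    print("executing as", __name__)
    if __name__ == "__main__":
        import main   # runs the same file a second time, under the name "main"
        print(sys.modules["__main__"] is sys.modules["main"])
''')

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "main.py"), "w") as fh:
        fh.write(MAIN_SRC)
    output = subprocess.run(
        [sys.executable, "main.py"], cwd=tmp,
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
```

The file body runs twice and the final line of output is False, which is why child.py has to import __main__ to reach the object that was actually bound by the script.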

Why import when you need to use the full name?

In python, if you need a module from a different package you have to import it. Coming from a Java background, that makes sense.
import foo.bar
What doesn't make sense though, is why do I need to use the full name whenever I want to use bar? If I wanted to use the full name, why do I need to import? Doesn't using the full name immediately describe which module I'm addressing?
It just seems a little redundant to have from foo import bar when that's what import foo.bar should be doing. Also a little vague why I had to import when I was going to use the full name.
The thing is, even though Python's import statement is designed to look similar to Java's, they do completely different things under the hood. As you know, in Java an import statement is really little more than a hint to the compiler. It basically sets up an alias for a fully qualified class name. For example, when you write
import java.util.Set;
it tells the compiler that throughout that file, when you write Set, you mean java.util.Set. And if you write s.add(o) where s is an object of type Set, the compiler (or rather, linker) goes out and finds the add method in Set.class and puts in a reference to it.
But in Python,
import util.set
(that is a made-up module, by the way) does something completely different. See, in Python, packages and modules are not just names, they're actual objects, and when you write util.set in your code, that instructs Python to access an object named util and look for an attribute on it named set. The job of Python's import statement is to create that object and attribute. The way it works is that the interpreter looks for a file named util/__init__.py, uses the code in it to define properties of an object, and binds that object to the name util. Similarly, the code in util/set.py is used to initialize an object which is bound to util.set. There's a function called __import__ which takes care of all of this, and in fact the statement import util.set is basically equivalent to
util = __import__('util.set')
The point is, when you import a Python module, what you get is an object corresponding to the top-level package, util. In order to get access to util.set you need to go through that, and that's why it seems like you need to use fully qualified names in Python.
There are ways to get around this, of course. Since all these things are objects, one simple approach is to just bind util.set to a simpler name, i.e. after the import statement, you can have
set = util.set
and from that point on you can just use set where you otherwise would have written util.set. (Of course this obscures the built-in set class, so I don't recommend actually using the name set.) Or, as mentioned in at least one other answer, you could write
from util import set
or
import util.set as set
This still imports the package util with the module set in it, but instead of creating a variable util in the current scope, it creates a variable set that refers to util.set. Behind the scenes, this works kind of like
_util = __import__('util', fromlist=['set'])
set = _util.set
del _util
in the former case, or
_util = __import__('util.set')
set = _util.set
del _util
in the latter (although both ways do essentially the same thing). This form is semantically more like what Java's import statement does: it defines an alias (set) to something that would ordinarily only be accessible by a fully qualified name (util.set).
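Since util.set is a made-up module, here is a small sketch of the same behaviour using the standard-library package xml and its subpackage xml.dom instead (my choice of example, not the answerer's). It shows that __import__ with a dotted name returns the top-level package, while importlib.import_module returns the submodule itself.

```python
import importlib

# __import__ with a dotted name returns the TOP-LEVEL package ...
top = __import__("xml.dom")
top_name = top.__name__

# ... whereas importlib.import_module returns the submodule itself.
sub = importlib.import_module("xml.dom")
sub_name = sub.__name__

# The submodule is reachable as an attribute of the package object,
# which is why "import xml.dom" is followed by "xml.dom.<something>".
same_object = top.dom is sub

# The "from xml import dom" flavour: the package is returned and dom is loaded.
pkg = __import__("xml", fromlist=["dom"])
from_style_same = pkg.dom is sub
```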
You can shorten it, if you would like:
import foo.bar as whateveriwant
Using the full name prevents two packages with the same-named submodules from clobbering each other.
There is a module in the standard library called io:
In [84]: import io
In [85]: io
Out[85]: <module 'io' from '/usr/lib/python2.6/io.pyc'>
There is also a module in scipy called io:
In [95]: import scipy.io
In [96]: scipy.io
Out[96]: <module 'scipy.io' from '/usr/lib/python2.6/dist-packages/scipy/io/__init__.pyc'>
If you wanted to use both modules in the same script, then namespaces are a convenient way to distinguish the two.
In [97]: import this
The Zen of Python, by Tim Peters
...
Namespaces are one honking great idea -- let's do more of those!
In Python, importing doesn't just indicate you might use something. The import actually executes code at the module level. You can think of the import as being the moment where the functions are 'interpreted' and created. Any code that is in __init__.py, or that is not inside a function or class definition, runs then.
The import also binds the module object (a reference, not a copy of the module's namespace) to a name in the namespace of the file / module / whatever where it is imported. An IDE then has a list of the functions you might be starting to type for command completion.
Part of the Python philosophy is explicit is better than implicit. Python could automatically import the first time you try to access something from a package, but that's not explicit.
I'm also guessing that package initialization would be much more difficult if the imports were automatic, as it wouldn't be done consistently in the code.
You're a bit confused about how Python imports work. (I was too when I first started.) In Python, you can't simply refer to something within a module by the full name, unlike in Java; you HAVE to import the module first, regardless of how you plan on referring to the imported item. Try typing math.sqrt(5) in the interpreter without importing math or math.sqrt first and see what happens.
Anyway... the reason import foo.bar has you required to use foo.bar instead of just bar is to prevent accidental namespace conflicts. For example, what if you do import foo.bar, and then import baz.bar?
You could, of course, choose to do import foo.bar as bar (i.e. aliasing), but if you're doing that you may as well just use from foo import bar. (EDIT: except when you want to import methods and variables. Then you have to use the from ... import ... syntax. This includes instances where you want to import a method or variable without aliasing, i.e. you can't simply do import foo.bar if bar is a method or variable.)
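The math.sqrt experiment suggested above can also be run as a script. This is just a sketch: it uses a fresh, empty dictionary as the global namespace so that no earlier import can interfere, which is an arrangement I chose for the demo.

```python
import sys

namespace = {}   # a fresh global namespace with nothing imported into it

try:
    eval("math.sqrt(25)", namespace)
    error = None
except NameError as exc:
    error = str(exc)          # the name "math" is simply not bound yet

exec("import math", namespace)            # the import binds the name
value = eval("math.sqrt(25)", namespace)

# The bound name refers to the one shared module object, not to a copy.
same_object = namespace["math"] is sys.modules["math"]
```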
Unlike in Java, in Python import foo.bar declares that you are going to use the thing referred to by foo.bar.
This matches with Python's philosophy that explicit is better than implicit. There are more programming languages that make inter-module dependencies more explicit than Java, for example Ada.
Using the full name makes it possible to disambiguate definitions with the same name coming from different modules.
You don't have to use the full name. Try one of these
from foo import bar
import foo.bar as bar
import foo.bar
bar = foo.bar
from foo import *
A few reasons why explicit imports are good:
They help signal to humans and tools what packages your module depends on.
They avoid the overhead of dynamically determining which packages have to be loaded (and possibly compiled) at run time.
They (along with sys.path) unambiguously distinguish symbols with conflicting names from different namespaces.
They give the programmer some control of what enters the namespace within which he is working.

How can I figure out in my module if the main program uses a specific variable?

I know this does not sound Pythonic, but bear with me for a second.
I am writing a module that depends on some external closed-source module. That module needs to get instantiated to be used (using module.create()).
My module attempts to figure out if my user already loaded that module (easy to do), but then needs to figure out if the module was instantiated. I understand that checking out the type() of each variable can tell me this, but I am not sure how I can get the names of variables defined by the main program. The reason for this is that when one instantiates the model, they also set a bunch of parameters that I do not want to overwrite for any reason.
My attempts so far involved using sys._getframe().f_globals and iterating through the elements, but in my testing it doesn't work. If I instantiate the module as modInst and then call the function in my module, it fails to show the modInst variable. Is there another solution to this? Sample code provided below.
import sys
if moduleName not in sys.modules:
    import moduleName
    modInst = moduleName.create()
else:
    globalVars = sys._getframe().f_globals
    for key, value in globalVars:
        if value == "Module Name Instance":
            return key
    return moduleName.create()
EDIT: Sample code included.
Looks like your code assumes that the .create() function was called, if at all, by the immediate/direct caller of your function (which you show only partially, making it pretty hard to be sure about what's going on) and the results placed in a global variable (of the module where the caller of your function resides). It all seems pretty fragile. Doesn't that third-party module have some global variables of its own that are affected by whether the module's create has been called or not? I imagine it would -- where else is it keeping the state-changes resulting from executing the create -- and I would explore that.
To address a specific issue you raise,
"I am not sure how I can get the names of variables defined by the main program"
that's easy -- the main program is found, as a module, in sys.modules['__main__'], so just use vars(sys.modules['__main__']) to get the global dictionary of the main program (the variable names are the keys in that dictionary, along of course with names of functions, classes, etc -- the module, like any other module, has exactly one top-level/global namespace, not one for variables, a separate one for functions, etc).
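As a rough sketch of that suggestion: the helper below lists the names in the main program's global namespace that are bound to an instance of a given class. FakeInstance and names_bound_in_main are hypothetical names for this illustration (the real closed-source class is unknown), and the main program's assignment is simulated with setattr so that the example is self-contained.

```python
import sys

class FakeInstance:
    """Hypothetical stand-in for whatever moduleName.create() returns."""

def names_bound_in_main(cls):
    """Return the global names in __main__ whose value is an instance of cls."""
    main_globals = vars(sys.modules["__main__"])
    # Copy the items first so the scan is not disturbed if the dict changes.
    return [name for name, value in list(main_globals.items())
            if isinstance(value, cls)]

# Simulate the main program having done: modInst = moduleName.create()
setattr(sys.modules["__main__"], "modInst", FakeInstance())

found = names_bound_in_main(FakeInstance)
```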
Suppose the external closed-sourced module is called extmod.
Create my_extmod.py:
import extmod
INSTANTIATED=False
def create(*args,**kw):
    global INSTANTIATED
    INSTANTIATED=True
    return extmod.create(*args,**kw)
Then require your users to import my_extmod instead of extmod directly.
To test if the create function has been called, just check the value of my_extmod.INSTANTIATED.
Edit: If you open up an IPython session and type import extmod, then type extmod.[TAB], then you'll see all the top-level variables in the extmod namespace. This might help you find some parameter that changes when extmod.create is called.
Barring that, and barring the possibility of training users to import my_extmod, then perhaps you could use something like the function below. find_instance searches through all modules in sys.modules.
import sys
import extmod
def find_instance(cls):
    # Scan the top-level namespace of every loaded module for an instance of cls.
    for modname in list(sys.modules): # snapshot the keys; sys.modules may change
        module=sys.modules[modname]
        for value in vars(module).values():
            if isinstance(value,cls):
                return value
    # Falls through and returns None when no instance exists.
x=find_instance(extmod.ExtmodClass) or extmod.create()

How to make a cross-module variable?

The __debug__ variable is handy in part because it affects every module. If I want to create another variable that works the same way, how would I do it?
The variable (let's be original and call it 'foo') doesn't have to be truly global, in the sense that if I change foo in one module, it is updated in others. I'd be fine if I could set foo before importing other modules and then they would see the same value for it.
If you need a global cross-module variable, a simple module-level variable may suffice.
a.py:
var = 1
b.py:
import a
print a.var
import c
print a.var
c.py:
import a
a.var = 2
Test:
$ python b.py
# -> 1 2
Real-world example: Django's global_settings.py (though in Django apps settings are used by importing the object django.conf.settings).
I don't endorse this solution in any way, shape or form. But if you add a variable to the __builtin__ module, it will be accessible as if a global from any other module that includes __builtin__ -- which is all of them, by default.
a.py contains
print foo
b.py contains
import __builtin__
__builtin__.foo = 1
import a
The result is that "1" is printed.
Edit: The __builtin__ module is available as the local symbol __builtins__ -- that's the reason for the discrepancy between two of these answers. Also note that __builtin__ has been renamed to builtins in python3.
I believe that there are plenty of circumstances in which it does make sense and it simplifies programming to have some globals that are known across several (tightly coupled) modules. In this spirit, I would like to elaborate a bit on the idea of having a module of globals which is imported by those modules which need to reference them.
When there is only one such module, I name it "g". In it, I assign default values for every variable I intend to treat as global. In each module that uses any of them, I do not use "from g import var", as this only results in a local variable which is initialized from g only at the time of the import. I make most references in the form g.var, and the "g." serves as a constant reminder that I am dealing with a variable that is potentially accessible to other modules.
If the value of such a global variable is to be used frequently in some function in a module, then that function can make a local copy: var = g.var. However, it is important to realize that assignments to var are local, and global g.var cannot be updated without referencing g.var explicitly in an assignment.
Note that you can also have multiple such globals modules shared by different subsets of your modules to keep things a little more tightly controlled. The reason I use short names for my globals modules is to avoid cluttering up the code too much with occurrences of them. With only a little experience, they become mnemonic enough with only 1 or 2 characters.
It is still possible to make an assignment to, say, g.x when x was not already defined in g, and a different module can then access g.x. However, even though the interpreter permits it, this approach is not so transparent, and I do avoid it. There is still the possibility of accidentally creating a new variable in g as a result of a typo in the variable name for an assignment. Sometimes an examination of dir(g) is useful to discover any surprise names that may have arisen by such accident.
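The difference between g.var and from g import var described above can be seen in a few lines. In this sketch the module g is created in memory and registered in sys.modules so that no g.py file is needed; that registration is purely a device for the demo.

```python
import sys
import types

# Stand-in for a real g.py containing "var = 1".
shared = types.ModuleType("g")
shared.var = 1
sys.modules["g"] = shared   # make "import g" resolve to the in-memory module

from g import var           # binds a separate local name to the current value
import g

g.var = 2                   # rebinding through the module is seen by every importer

snapshot = var              # still the value seen at import time
live = g.var                # the current value
```

The name brought in by from g import var keeps pointing at the old object, which is the reason the answer prefers the explicit g.var spelling.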
Define a module ( call it "globalbaz" ) and have the variables defined inside it. All the modules using this "pseudoglobal" should import the "globalbaz" module, and refer to it using "globalbaz.var_name"
This works regardless of the place of the change, you can change the variable before or after the import. The imported module will use the latest value. (I tested this in a toy example)
For clarification, globalbaz.py looks just like this:
var_name = "my_useful_string"
You can pass the globals of one module to another:
In Module A:
import module_b
my_var=2
module_b.do_something_with_my_globals(globals())
print my_var
In Module B:
def do_something_with_my_globals(glob): # glob is simply a dict.
    glob["my_var"]=3
Global variables are usually a bad idea, but you can do this by assigning to __builtins__:
__builtins__.foo = 'something'
print foo
Also, modules themselves are variables that you can access from any module. So if you define a module called my_globals.py:
# my_globals.py
foo = 'something'
Then you can use that from anywhere as well:
import my_globals
print my_globals.foo
Using modules rather than modifying __builtins__ is generally a cleaner way to do globals of this sort.
You can already do this with module-level variables. Modules are the same no matter what module they're being imported from. So you can make the variable a module-level variable in whatever module it makes sense to put it in, and access it or assign to it from other modules. It would be better to call a function to set the variable's value, or to make it a property of some singleton object. That way if you end up needing to run some code when the variable's changed, you can do so without breaking your module's external interface.
It's not usually a great way to do things — using globals seldom is — but I think this is the cleanest way to do it.
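One possible shape for the "call a function to set the variable's value" idea is sketched below. All names here (get_foo, set_foo, on_change, the private _foo) are invented for illustration; the point is only that a setter gives you a place to run code when the value changes.

```python
# Contents of a hypothetical shared module, e.g. settings.py.
_foo = None        # the shared value; other modules go through the functions
_listeners = []    # callbacks to run whenever the value changes

def get_foo():
    """Return the current shared value."""
    return _foo

def set_foo(value):
    """Update the shared value and notify registered callbacks."""
    global _foo
    _foo = value
    for callback in _listeners:
        callback(value)

def on_change(callback):
    """Register a function to be called with each new value."""
    _listeners.append(callback)

# Usage, as another module might do after importing the shared module:
seen = []
on_change(seen.append)
set_foo(3)
current = get_foo()
```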
I wanted to post an answer that there is a case where the variable won't be found.
Cyclical imports may break the module behavior.
For example:
first.py
import second
var = 1
second.py
import first
print(first.var) # raises AttributeError: first is only partially initialized at this point, so var has not been assigned yet
main.py
import first
In this example it should be obvious, but in a large code-base, this can be really confusing.
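The failure can be reproduced in a self-contained way. The sketch below assumes Python 3 and writes two throwaway files into a temporary directory; the module names cyc_first and cyc_second are made up so they do not clash with anything installed.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # cyc_first imports cyc_second BEFORE it assigns var ...
    with open(os.path.join(tmp, "cyc_first.py"), "w") as fh:
        fh.write("import cyc_second\nvar = 1\n")
    # ... and cyc_second reads cyc_first.var while cyc_first is half-initialized.
    with open(os.path.join(tmp, "cyc_second.py"), "w") as fh:
        fh.write("import cyc_first\nprint(cyc_first.var)\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        importlib.import_module("cyc_first")
        error_type = None
    except AttributeError as exc:
        error_type = type(exc).__name__
    finally:
        sys.path.remove(tmp)
```

Moving the access to first.var inside a function that is called later, or assigning var before the import, avoids the problem in a case like this.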
I wondered if it would be possible to avoid some of the disadvantages of using global variables (see e.g. http://wiki.c2.com/?GlobalVariablesAreBad) by using a class namespace rather than a global/module namespace to pass values of variables. The following code indicates that the two methods are essentially identical. There is a slight advantage in using class namespaces as explained below.
The following code fragments also show that attributes or variables may be dynamically created and deleted in both global/module namespaces and class namespaces.
wall.py
# Note no definition of global variables
class router:
    """ Empty class """
I call this module 'wall' since it is used to bounce variables off of. It will act as a space to temporarily define global variables and class-wide attributes of the empty class 'router'.
source.py
import wall
def sourcefn():
    msg = 'Hello world!'
    wall.msg = msg
    wall.router.msg = msg
This module imports wall and defines a single function sourcefn which defines a message and emits it by two different mechanisms, one via globals and one via the router class. Note that the variables wall.msg and wall.router.msg are defined here for the first time in their respective namespaces.
dest.py
import wall
def destfn():
    if hasattr(wall, 'msg'):
        print 'global: ' + wall.msg
        del wall.msg
    else:
        print 'global: ' + 'no message'
    if hasattr(wall.router, 'msg'):
        print 'router: ' + wall.router.msg
        del wall.router.msg
    else:
        print 'router: ' + 'no message'
This module defines a function destfn which uses the two different mechanisms to receive the messages emitted by source. It allows for the possibility that the variable 'msg' may not exist. destfn also deletes the variables once they have been displayed.
main.py
import source, dest
source.sourcefn()
dest.destfn() # variables deleted after this call
dest.destfn()
This module calls the previously defined functions in sequence. After the first call to dest.destfn the variables wall.msg and wall.router.msg no longer exist.
The output from the program is:
global: Hello world!
router: Hello world!
global: no message
router: no message
The above code fragments show that the module/global and the class/class variable mechanisms are essentially identical.
If a lot of variables are to be shared, namespace pollution can be managed either by using several wall-type modules, e.g. wall1, wall2 etc. or by defining several router-type classes in a single file. The latter is slightly tidier, so perhaps represents a marginal advantage for use of the class-variable mechanism.
This sounds like modifying the __builtin__ name space. To do it:
import __builtin__
__builtin__.foo = 'some-value'
Do not use the __builtins__ directly (notice the extra "s") - apparently this can be a dictionary or a module. Thanks to ΤΖΩΤΖΙΟΥ for pointing this out, more can be found here.
Now foo is available for use everywhere.
I don't recommend doing this generally, but the use of this is up to the programmer.
Assigning to it must be done as above, just setting foo = 'some-other-value' will only set it in the current namespace.
I use this for a couple built-in primitive functions that I felt were really missing. One example is a find function that has the same usage semantics as filter, map, reduce.
def builtin_find(f, x, d=None):
    for i in x:
        if f(i):
            return i
    return d
import __builtin__
__builtin__.find = builtin_find
Once this is run (for instance, by importing near your entry point) all your modules can use find() as though, obviously, it was built in.
find(lambda i: i < 0, [1, 3, 0, -5, -10]) # Yields -5, the first negative.
Note: You can do this, of course, with filter and another line to test for zero length, or with reduce in one sort of weird line, but I always felt it was weird.
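For readers on Python 3, where __builtin__ is named builtins, a roughly equivalent sketch looks like this. The exec into a separate dictionary stands in for "another module" using find without importing it; that arrangement, and the cleanup at the end, are my additions for the demo.

```python
import builtins

def builtin_find(predicate, iterable, default=None):
    """Return the first item for which predicate(item) is true, else default."""
    return next((item for item in iterable if predicate(item)), default)

builtins.find = builtin_find     # now visible from every module's name lookup

# Code running in an unrelated global namespace can call find() directly.
namespace = {}
exec("result = find(lambda i: i < 0, [1, 3, 0, -5, -10])", namespace)
result = namespace["result"]

del builtins.find                # undo the change so the demo leaves no trace
```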
I could achieve cross-module modifiable (or mutable) variables by using a dictionary:
# in myapp.__init__
Timeouts = {} # cross-modules global mutable variables for testing purpose
Timeouts['WAIT_APP_UP_IN_SECONDS'] = 60
# in myapp.mod1
from myapp import Timeouts
def wait_app_up(project_name, port):
    # wait for app until Timeouts['WAIT_APP_UP_IN_SECONDS']
    # ...
# in myapp.test.test_mod1
from myapp import Timeouts
def test_wait_app_up_fail(self):
    timeout_bak = Timeouts['WAIT_APP_UP_IN_SECONDS']
    Timeouts['WAIT_APP_UP_IN_SECONDS'] = 3
    with self.assertRaises(hlp.TimeoutException) as cm:
        wait_app_up(PROJECT_NAME, PROJECT_PORT)
    self.assertEqual("Timeout while waiting for App to start", str(cm.exception))
    Timeouts['WAIT_APP_UP_IN_SECONDS'] = timeout_bak # restore the same key that was overridden
When launching test_wait_app_up_fail, the actual timeout duration is 3 seconds.
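If restoring the old value by hand feels error-prone, the standard library's unittest.mock.patch.dict may be an option: it overrides dictionary entries for the duration of a with block and restores them afterwards, even if the block raises. The sketch below uses a local Timeouts dict and a trivial current_timeout helper, both invented here so the example runs on its own.

```python
from unittest import mock

# Stand-in for the shared dictionary defined in myapp/__init__.py.
Timeouts = {"WAIT_APP_UP_IN_SECONDS": 60}

def current_timeout():
    """Hypothetical helper that reads the shared setting at call time."""
    return Timeouts["WAIT_APP_UP_IN_SECONDS"]

# Inside the with block the entry is overridden ...
with mock.patch.dict(Timeouts, {"WAIT_APP_UP_IN_SECONDS": 3}):
    inside = current_timeout()

# ... and on exit the original contents are restored automatically.
after = current_timeout()
```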
