Python: setting __globals__ for modules and their imports - python

I'm developing a test engine with Python, but I'm facing some problems related to module loading and global functions.
The main idea of the framework I'm creating is to load a Python file containing functions and annotations "#thisisatest" to tell which functions are tests. I load this file with imp.load_source, and latter, I spawn threads that calls the function from the loaded module. It's something like this:
module = imp.load_source("test", "testdir/test.py")
function = module.testFunction
thread = threading.Thread(target=function)
thread.start()
Anyway, I want to connect to this test a "assertion function", doing something like:
module = imp.load_source("test", "testdir/test.py")
module.__globals__.assertAndTerminate = assertionFunction
function = module.testFunction
thread = threading.Thread(target=function)
thread.start()
And that's all right. The problem starts when the test.py imports another module that uses the assertAndTerminate function inside it. The module loaded by test.py is completely unaware from the __globals__ from test.py and don't know who's the assertAndTerminate I'm talking about (and that makes sense, since each module has its own __globals__).
Does anyone know a way I could set the same assertAndTerminate function for the test.py module and the modules loaded by it in a thread? I would prefer not searching for imports in a tree, is it possible?
Is there something like Thread(target=function, global_vars=["assertAndTerminate":assertionFunction])?

You need to set the attribute directly on the module; that is the global namespace for that module:
module = imp.load_source("test", "testdir/test.py")
module.assertAndTerminate = assertionFunction
You do have to set globals on a per-module basis. Globals from one module do no not propagate to other modules on import.
You can add to the __builtin__ module (builtin in Python 3):
import __builtin__
__builtin__.assertAndTerminate = assertionFunction
These are then visible in all modules:
>>> import __builtin__
>>> __builtin__.foobar = 'barbaz'
>>> foobar
'barbaz'
Generally speaking, you really want to avoid doing this. Find some other method to solve your problem. Import code instead of relying on globals being set.

Related

Python global variable in import * [duplicate]

I've run into a bit of a wall importing modules in a Python script. I'll do my best to describe the error, why I run into it, and why I'm tying this particular approach to solve my problem (which I will describe in a second):
Let's suppose I have a module in which I've defined some utility functions/classes, which refer to entities defined in the namespace into which this auxiliary module will be imported (let "a" be such an entity):
module1:
def f():
print a
And then I have the main program, where "a" is defined, into which I want to import those utilities:
import module1
a=3
module1.f()
Executing the program will trigger the following error:
Traceback (most recent call last):
File "Z:\Python\main.py", line 10, in <module>
module1.f()
File "Z:\Python\module1.py", line 3, in f
print a
NameError: global name 'a' is not defined
Similar questions have been asked in the past (two days ago, d'uh) and several solutions have been suggested, however I don't really think these fit my requirements. Here's my particular context:
I'm trying to make a Python program which connects to a MySQL database server and displays/modifies data with a GUI. For cleanliness sake, I've defined the bunch of auxiliary/utility MySQL-related functions in a separate file. However they all have a common variable, which I had originally defined inside the utilities module, and which is the cursor object from MySQLdb module.
I later realised that the cursor object (which is used to communicate with the db server) should be defined in the main module, so that both the main module and anything that is imported into it can access that object.
End result would be something like this:
utilities_module.py:
def utility_1(args):
code which references a variable named "cur"
def utility_n(args):
etcetera
And my main module:
program.py:
import MySQLdb, Tkinter
db=MySQLdb.connect(#blahblah) ; cur=db.cursor() #cur is defined!
from utilities_module import *
And then, as soon as I try to call any of the utilities functions, it triggers the aforementioned "global name not defined" error.
A particular suggestion was to have a "from program import cur" statement in the utilities file, such as this:
utilities_module.py:
from program import cur
#rest of function definitions
program.py:
import Tkinter, MySQLdb
db=MySQLdb.connect(#blahblah) ; cur=db.cursor() #cur is defined!
from utilities_module import *
But that's cyclic import or something like that and, bottom line, it crashes too. So my question is:
How in hell can I make the "cur" object, defined in the main module, visible to those auxiliary functions which are imported into it?
Thanks for your time and my deepest apologies if the solution has been posted elsewhere. I just can't find the answer myself and I've got no more tricks in my book.
Globals in Python are global to a module, not across all modules. (Many people are confused by this, because in, say, C, a global is the same across all implementation files unless you explicitly make it static.)
There are different ways to solve this, depending on your actual use case.
Before even going down this path, ask yourself whether this really needs to be global. Maybe you really want a class, with f as an instance method, rather than just a free function? Then you could do something like this:
import module1
thingy1 = module1.Thingy(a=3)
thingy1.f()
If you really do want a global, but it's just there to be used by module1, set it in that module.
import module1
module1.a=3
module1.f()
On the other hand, if a is shared by a whole lot of modules, put it somewhere else, and have everyone import it:
import shared_stuff
import module1
shared_stuff.a = 3
module1.f()
… and, in module1.py:
import shared_stuff
def f():
print shared_stuff.a
Don't use a from import unless the variable is intended to be a constant. from shared_stuff import a would create a new a variable initialized to whatever shared_stuff.a referred to at the time of the import, and this new a variable would not be affected by assignments to shared_stuff.a.
Or, in the rare case that you really do need it to be truly global everywhere, like a builtin, add it to the builtin module. The exact details differ between Python 2.x and 3.x. In 3.x, it works like this:
import builtins
import module1
builtins.a = 3
module1.f()
As a workaround, you could consider setting environment variables in the outer layer, like this.
main.py:
import os
os.environ['MYVAL'] = str(myintvariable)
mymodule.py:
import os
myval = None
if 'MYVAL' in os.environ:
myval = os.environ['MYVAL']
As an extra precaution, handle the case when MYVAL is not defined inside the module.
This post is just an observation for Python behaviour I encountered. Maybe the advices you read above don't work for you if you made the same thing I did below.
Namely, I have a module which contains global/shared variables (as suggested above):
#sharedstuff.py
globaltimes_randomnode=[]
globalist_randomnode=[]
Then I had the main module which imports the shared stuff with:
import sharedstuff as shared
and some other modules that actually populated these arrays. These are called by the main module. When exiting these other modules I can clearly see that the arrays are populated. But when reading them back in the main module, they were empty. This was rather strange for me (well, I am new to Python). However, when I change the way I import the sharedstuff.py in the main module to:
from globals import *
it worked (the arrays were populated).
Just sayin'
A function uses the globals of the module it's defined in. Instead of setting a = 3, for example, you should be setting module1.a = 3. So, if you want cur available as a global in utilities_module, set utilities_module.cur.
A better solution: don't use globals. Pass the variables you need into the functions that need it, or create a class to bundle all the data together, and pass it when initializing the instance.
The easiest solution to this particular problem would have been to add another function within the module that would have stored the cursor in a variable global to the module. Then all the other functions could use it as well.
module1:
cursor = None
def setCursor(cur):
global cursor
cursor = cur
def method(some, args):
global cursor
do_stuff(cursor, some, args)
main program:
import module1
cursor = get_a_cursor()
module1.setCursor(cursor)
module1.method()
Since globals are module specific, you can add the following function to all imported modules, and then use it to:
Add singular variables (in dictionary format) as globals for those
Transfer your main module globals to it
.
addglobals = lambda x: globals().update(x)
Then all you need to pass on current globals is:
import module
module.addglobals(globals())
Since I haven't seen it in the answers above, I thought I would add my simple workaround, which is just to add a global_dict argument to the function requiring the calling module's globals, and then pass the dict into the function when calling; e.g:
# external_module
def imported_function(global_dict=None):
print(global_dict["a"])
# calling_module
a = 12
from external_module import imported_function
imported_function(global_dict=globals())
>>> 12
The OOP way of doing this would be to make your module a class instead of a set of unbound methods. Then you could use __init__ or a setter method to set the variables from the caller for use in the module methods.
Update
To test the theory, I created a module and put it on pypi. It all worked perfectly.
pip install superglobals
Short answer
This works fine in Python 2 or 3:
import inspect
def superglobals():
_globals = dict(inspect.getmembers(
inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
return _globals
save as superglobals.py and employ in another module thusly:
from superglobals import *
superglobals()['var'] = value
Extended Answer
You can add some extra functions to make things more attractive.
def superglobals():
_globals = dict(inspect.getmembers(
inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
return _globals
def getglobal(key, default=None):
"""
getglobal(key[, default]) -> value
Return the value for key if key is in the global dictionary, else default.
"""
_globals = dict(inspect.getmembers(
inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
return _globals.get(key, default)
def setglobal(key, value):
_globals = superglobals()
_globals[key] = value
def defaultglobal(key, value):
"""
defaultglobal(key, value)
Set the value of global variable `key` if it is not otherwise st
"""
_globals = superglobals()
if key not in _globals:
_globals[key] = value
Then use thusly:
from superglobals import *
setglobal('test', 123)
defaultglobal('test', 456)
assert(getglobal('test') == 123)
Justification
The "python purity league" answers that litter this question are perfectly correct, but in some environments (such as IDAPython) which is basically single threaded with a large globally instantiated API, it just doesn't matter as much.
It's still bad form and a bad practice to encourage, but sometimes it's just easier. Especially when the code you are writing isn't going to have a very long life.

Share namespace of caller with imported module

The short version of the question first:
Assume we have a module called "module" and a python script "caller.py" that imports module.
Is it possible to share the globals() namespace of caller.py with the module?
Such that i could do something like this:
module.py
def print_handle(fkt_name):
globals()[fkt_name]
caller.py:
def function_from_caller():
return 0
import module
module.print_handle('function_from_caller')
# which then returns something like:
# <function __main__.function_from_caller()>
Long version:
As far as I understand, the scope of imported module in python is restricted to that module.
Anything that is not defined in the module or imported somehow is unknown to it.
If a module is imported I can share it's namespace with the namespace of caller by either specifically naming the functions of interest with
from module import function_of_interest
or to share the full namespace
from module import *
However, as far as I know it is not possible to achieve this the other way around, or is it?
Can I pass the namespace from the caller function to the module in any way?
I.e. with something like
pi = 3
import module with pi
or in case I want to pass everything
import module with *
If this is not possible as suspected, why is that?
I do not see the reason why you should do that.
if the module has anb attribute a in it, you can just do
module.a = 12
If you have many attributes, just use setattr(module, AttrName, AttrValue) in a for loop.
MMan

Importing modules in python (3 modules)

Let's say i have 3 modules within the same directory. (module1,module2,module3)
Suppose the 2nd module imports the 3rd module then if i import module2 in module 1. Does that automatically import module 3 to module 1 ?
Thanks
No. The imports only work inside a module. You can verify that by creating a test.
Saying,
# module1
import module2
# module2
import module3
# in module1
module3.foo() # oops
This is reasonable because you can think in reverse: if imports cause a chain of importing, it'll be hard to decide which function is from which module, thus causing complex naming conflicts.
No, it will not be imported unless you explicitly specify python to, like so:
from module2 import *
What importing does conceptually is outlined below.
import some_module
The statement above is equivalent to:
module_variable = import_module("some_module")
All we have done so far is bind some object to a variable name.
When it comes to the implementation of import_module it is also not that hard to grasp.
def import_module(module_name):
if module_name in sys.modules:
module = sys.modules[module_name]
else:
filename = find_file_for_module(module_name)
python_code = open(filename).read()
module = create_module_from_code(python_code)
sys.modules[module_name] = module
return module
First, we check if the module has been imported before. If it was, then it will be available in the global list of all modules (sys.modules), and so will simply be reused. In the case that the module is not available, we create it from the code. Once the function returns, the module will be assigned to the variable name that you have chosen. As you can see the process is not inefficient or wasteful. All you are doing is creating an alias for your module. In most cases, transparency is prefered, hence having a quick look at the top of the file can tell you what resources are available to you. Otherwise, you might end up in a situation where you are wondering where is a given resource coming from. So, that is why you do not get modules inherently "imported".
Resource:
Python doc on importing

How do I load a Python module with custom globals using importlib?

I'm trying to put together a small build system in Python that generates Ninja files for my C++ project. Its behavior should be similar to CMake; that is, a bldfile.py script defines rules and targets and optionally recurses into one or more directories by calling bld.subdir(). Each bldfile.py script has a corresponding bld.File object. When the bldfile.py script is executing, the bld global should be predefined as that file's bld.File instance, but only in that module's scope.
Additionally, I would like to take advantage of Python's bytecode caching somehow, but the .pyc file should be stored in the build output directory instead of in a __pycache__ directory alongside the bldfile.py script.
I know I should use importlib (requiring Python 3.4+ is fine), but I'm not sure how to:
Load and execute a module file with custom globals.
Re-use the bytecode caching infrastructure.
Any help would be greatly appreciated!
Injecting globals into a module before execution is an interesting idea. However, I think it conflicts with several points of the Zen of Python. In particular, it requires writing code in the module that depends on global values which are not explicitly defined, imported, or otherwise obtained - unless you know the particular procedure required to call the module.
This may be an obvious or slick solution for the specific use case but it is not very intuitive. In general, (Python) code should be explicit. Therefore, I would go for a solution where parameters are explicitly passed to the executing code. Sounds like functions? Right:
bldfile.py
def exec(bld):
print('Working with bld:', bld)
# ...
calling the module:
# set bld
# Option 1: static import
import bldfile
bldfile.exec(bld)
# Option 2: dynamic import if bldfile.py is located dynamically
import importlib.util
spec = importlib.util.spec_from_file_location("unique_name", "subdir/subsubdir/bldfile.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
module.exec(bld)
That way no code (apart from the function definition) is executed when importing the module. The exec function needs to be called explicitly and when looking at the code inside exec it is clear where bld comes from.
I studied importlib's source code and since I don't intend to make a reusable Loader, it seems like a lot of unnecessary complexity. So I just settled on creating a module with types.ModuleType, adding bld to the module's __dict__, compiling and caching the bytecode with compile, and executing the module with exec. At a low level, that's basically all importutil does anyway.
It is possible to overcome the lack of possibility by using dummy module, which would load its globals .
#service.py
module = importlib.import_module('userset')
module.user = user
module = importlib.import_module('config')
#config.py
from userset import *
#now you can use user from service.py

Does python ever multiply-load a module?

I'm noticing some weird situations where tests like the following fail:
x = <a function from some module, passed around some big application for a while>
mod = __import__(x.__module__)
x_ref = getattr(mod, x.__name__)
assert x_ref is x # Fails
(Code like this appears in the pickle module)
I don't think I have any import hooks, reload calls, or sys.modules manipulation that would mess with python's normal import caching behavior.
Is there any other reason why a module would be loaded twice? I've seen claims about this (e.g, https://stackoverflow.com/a/10989692/1332492), but I haven't been able to reproduce it in a simple, isolated script.
I believe you misunderstood how __import__ works:
>>> from my_package import my_module
>>> my_module.function.__module__
'my_package.my_module'
>>> __import__(my_module.function.__module__)
<module 'my_package' from './my_package/__init__.py'>
From the documentation:
When the name variable is of the form package.module, normally, the
top-level package (the name up till the first dot) is returned, not
the module named by name. However, when a non-empty fromlist
argument is given, the module named by name is returned.
As you can see __import__ does not return the sub-module, but only the top package. If you have function also defined at package level you will indeed have different references to it.
If you want to just load a module you should use importlib.import_module instead of __import__.
As to answer you actual question: AFAIK there is no way to import the same module, with the same name, twice without messing around with the importing mechanism. However, you could have a submodule of a package that is also available in the sys.path, in this case you can import it twice using different names:
from some.package import submodule
import submodule as submodule2
print(submodule is submodule2) # False. They have *no* relationships.
This sometimes can cause problems with, e.g., pickle. If you pickle something referenced by submodule you cannot unpickle it using submodule2 as reference.
However this doesn't address the specific example you gave us, because using the __module__ attribute the import should return the correct module.

Categories

Resources