To preface, I think I may have figured out how to get this code working (based on Changing module variables after import), but my question is really about why the following behavior occurs so I can understand what to not do in the future.
I have three files. The first is mod1.py:
# mod1.py
import mod2
var1A = None
def func1A():
global var1
var1 = 'A'
mod2.func2()
def func1B():
global var1
print var1
if __name__ == '__main__':
func1A()
Next I have mod2.py:
# mod2.py
import mod1
def func2():
mod1.func1B()
Finally I have driver.py:
# driver.py
import mod1
if __name__ == '__main__':
mod1.func1A()
If I execute the command python mod1.py then the output is None. Based on the link I referenced above, it seems that there is some distinction between mod1.py being imported as __main__ and mod1.py being imported from mod2.py. Therefore, I created driver.py. If I execute the command python driver.py then I get the expected output: A. I sort of see the difference, but I don't really see the mechanism or the reason for it. How and why does this happen? It seems counterintuitive that the same module would exist twice. If I execute python mod1.py, would it be possible to access the variables in the __main__ version of mod1.py instead of the variables in the version imported by mod2.py?
The __name__ variable always contains the name of the module, except when the file has been loaded into the interpreter as a script instead. Then that variable is set to the string '__main__' instead.
After all, the script is then run as the main file of the whole program, everything else are modules imported directly or indirectly by that main file. By testing the __name__ variable, you can thus detect if a file has been imported as a module, or was run directly.
Internally, modules are given a namespace dictionary, which is stored as part of the metadata for each module, in sys.modules. The main file, the executed script, is stored in that same structure as '__main__'.
But when you import a file as a module, python first looks in sys.modules to see if that module has already been imported before. So, import mod1 means that we first look in sys.modules for the mod1 module. It'll create a new module structure with a namespace if mod1 isn't there yet.
So, if you both run mod1.py as the main file, and later import it as a python module, it'll get two namespace entries in sys.modules. One as '__main__', then later as 'mod1'. These two namespaces are completely separate. Your global var1 is stored in sys.modules['__main__'], but func1B is looking in sys.modules['mod1'] for var1, where it is None.
But when you use python driver.py, driver.py becomes the '__main__' main file of the program, and mod1 will be imported just once into the sys.modules['mod1'] structure. This time round, func1A stores var1 in the sys.modules['mod1'] structure, and that's what func1B will find.
Regarding a practical solution for using a module optionally as main script - supporting consistent cross-imports:
Solution 1:
See e.g. in Python's pdb module, how it is run as a script by importing itself when executing as __main__ (at the end) :
#! /usr/bin/env python
"""A Python debugger."""
# (See pdb.doc for documentation.)
import sys
import linecache
...
# When invoked as main program, invoke the debugger on a script
if __name__ == '__main__':
import pdb
pdb.main()
Just I would recommend to reorganize the __main__ startup to the beginning of the script like this:
#! /usr/bin/env python
"""A Python debugger."""
# When invoked as main program, invoke the debugger on a script
import sys
if __name__ == '__main__':
##assert os.path.splitext(os.path.basename(__file__))[0] == 'pdb'
import pdb
pdb.main()
sys.exit(0)
import linecache
...
This way the module body is not executed twice - which is "costly", undesirable and sometimes critical.
Solution 2:
In rarer cases it is desirable to expose the actual script module __main__ even directly as the actual module alias (mod1):
# mod1.py
import mod2
...
if __name__ == '__main__':
# use main script directly as cross-importable module
_mod = sys.modules['mod1'] = sys.modules[__name__]
##_modname = os.path.splitext(os.path.basename(os.path.realpath(__file__)))[0]
##_mod = sys.modules[_modname] = sys.modules[__name__]
func1A()
Known drawbacks:
reload(_mod) fails
pickle'ed classes would need extra mappings for unpickling (find_global ..)
Related
To illustrate the issue I am having, please consider the following. I have two .py files, one named main.py and the other named mymodule.py. They are both in the same directory.
The contents of main.py:
from mymodule import myfunction
myfunction()
The contents of mymodule.py:
def myfunction():
for number in range(0,10):
print(number)
print("Hi")
I was under the impression that importing a function would only import that function. However, when I run main.py, this is what I get:
Hi
0
1
2
3
4
5
6
7
8
9
Why is print("Hi") being called? It isn't part of the function I imported!
I was under the impression that importing a function would only import that function.
It seems there's an incorrect assumption about what a from-import actually does.
The first time a module is imported, an import statement will execute the entire module, including print calls made at the global scope (docs). This is true regardless of whether the mymodule was first imported by using a statement like import mymodule or by using a statement like from mymodule import myfunction.
Subsequent imports of the same module will re-use an existing module cached in sys.modules, which may be how you arrived at the misunderstanding that the entire module is not executed.
There is a common pattern to avoid global level code being executed by a module import. Often you will find code which is not intended to be executed at import time located inside a conditional, like this:
def myfunction():
for number in range(0,10):
print(number)
if __name__ == "__main__":
print("Hi")
In order to import something from the module Python needs to load this module first. At that moment all the code at module-level is executed.
According to the docs:
A module can contain executable statements as well as function
definitions. These statements are intended to initialize the module.
They are executed only the first time the module name is encountered
in an import statement.
this question seems to be a duplicate of this one.
In short : all the code of a python file is called when importing the module. What is neither a function nor a class is usually put in a main function called here:
if __name__ == "__main__":
# stuff only to run when not called via 'import' here
main()
Please consider closing this thread.
To preface, I think I may have figured out how to get this code working (based on Changing module variables after import), but my question is really about why the following behavior occurs so I can understand what to not do in the future.
I have three files. The first is mod1.py:
# mod1.py
import mod2
var1A = None
def func1A():
global var1
var1 = 'A'
mod2.func2()
def func1B():
global var1
print var1
if __name__ == '__main__':
func1A()
Next I have mod2.py:
# mod2.py
import mod1
def func2():
mod1.func1B()
Finally I have driver.py:
# driver.py
import mod1
if __name__ == '__main__':
mod1.func1A()
If I execute the command python mod1.py then the output is None. Based on the link I referenced above, it seems that there is some distinction between mod1.py being imported as __main__ and mod1.py being imported from mod2.py. Therefore, I created driver.py. If I execute the command python driver.py then I get the expected output: A. I sort of see the difference, but I don't really see the mechanism or the reason for it. How and why does this happen? It seems counterintuitive that the same module would exist twice. If I execute python mod1.py, would it be possible to access the variables in the __main__ version of mod1.py instead of the variables in the version imported by mod2.py?
The __name__ variable always contains the name of the module, except when the file has been loaded into the interpreter as a script instead. Then that variable is set to the string '__main__' instead.
After all, the script is then run as the main file of the whole program, everything else are modules imported directly or indirectly by that main file. By testing the __name__ variable, you can thus detect if a file has been imported as a module, or was run directly.
Internally, modules are given a namespace dictionary, which is stored as part of the metadata for each module, in sys.modules. The main file, the executed script, is stored in that same structure as '__main__'.
But when you import a file as a module, python first looks in sys.modules to see if that module has already been imported before. So, import mod1 means that we first look in sys.modules for the mod1 module. It'll create a new module structure with a namespace if mod1 isn't there yet.
So, if you both run mod1.py as the main file, and later import it as a python module, it'll get two namespace entries in sys.modules. One as '__main__', then later as 'mod1'. These two namespaces are completely separate. Your global var1 is stored in sys.modules['__main__'], but func1B is looking in sys.modules['mod1'] for var1, where it is None.
But when you use python driver.py, driver.py becomes the '__main__' main file of the program, and mod1 will be imported just once into the sys.modules['mod1'] structure. This time round, func1A stores var1 in the sys.modules['mod1'] structure, and that's what func1B will find.
Regarding a practical solution for using a module optionally as main script - supporting consistent cross-imports:
Solution 1:
See e.g. in Python's pdb module, how it is run as a script by importing itself when executing as __main__ (at the end) :
#! /usr/bin/env python
"""A Python debugger."""
# (See pdb.doc for documentation.)
import sys
import linecache
...
# When invoked as main program, invoke the debugger on a script
if __name__ == '__main__':
import pdb
pdb.main()
Just I would recommend to reorganize the __main__ startup to the beginning of the script like this:
#! /usr/bin/env python
"""A Python debugger."""
# When invoked as main program, invoke the debugger on a script
import sys
if __name__ == '__main__':
##assert os.path.splitext(os.path.basename(__file__))[0] == 'pdb'
import pdb
pdb.main()
sys.exit(0)
import linecache
...
This way the module body is not executed twice - which is "costly", undesirable and sometimes critical.
Solution 2:
In rarer cases it is desirable to expose the actual script module __main__ even directly as the actual module alias (mod1):
# mod1.py
import mod2
...
if __name__ == '__main__':
# use main script directly as cross-importable module
_mod = sys.modules['mod1'] = sys.modules[__name__]
##_modname = os.path.splitext(os.path.basename(os.path.realpath(__file__)))[0]
##_mod = sys.modules[_modname] = sys.modules[__name__]
func1A()
Known drawbacks:
reload(_mod) fails
pickle'ed classes would need extra mappings for unpickling (find_global ..)
Say I am using Python 3 (and hence absolute imports) and my directory structure looks like this:
> package:
> sub_directory
__init__.py
sub_dir_file.py
sub_dir_file2.py
__init__.py
main_dir_file.py
In the file sub_dir_file.py I wish to import a function from sub_dir_file2.py. The catch is, I want to be able to run sub_dir_file.py with __name__ == '__main__', as well as import it in main_dir_file.py. Hence, if in sub_dir_file.py I use a relative import:
from .sub_dir_file2 import some_function
the module executes perfectly fine when run from main_dir_file.py, but throws an error when executed directly (as the relative import cannot be executed when __name__ == '__main__'. If I however use a normal absolute import, sub_dir_file.py will execute as a main, but cannot be imported from main_dir_file.py.
What would be the most elegant way of solving this problem? One obvious solution seems to be:
if __name__ == '__main__':
from sub_dir_file2 import some_function
else:
from .sub_dir_file2 import some_function
However, it doesn't seem very pythonic.
You should use the relative import syntax from .sub_dir_file2 import some_function or eventually the absolute syntax from package.sub_directory.sub_dir_file2 import some_function.
Then, in order to call one of the package sub module, it is simpler to use the -m option of the python interpreter to execute its content as the __main__ module.
Search sys.path for the named module and execute its contents as the
main module.
Since the argument is a module name, you must not give a file
extension (.py). The module name should be a valid absolute Python
module name, but the implementation may not always enforce this (e.g.
it may allow you to use a name that includes a hyphen).
For example:
> python -m package.main_dir_file
> python -m package.sub_directory.sub_dir_file
I would suggest using a main() function invoked if name is __main__. It's a good habit anyway, as far as I'm aware.
That way you can just call the imported module's main() yourself. It has other benefits, too, like allowing you to test or re-invoke the module without necessarily executing the file every time.
To preface, I think I may have figured out how to get this code working (based on Changing module variables after import), but my question is really about why the following behavior occurs so I can understand what to not do in the future.
I have three files. The first is mod1.py:
# mod1.py
import mod2
var1A = None
def func1A():
global var1
var1 = 'A'
mod2.func2()
def func1B():
global var1
print var1
if __name__ == '__main__':
func1A()
Next I have mod2.py:
# mod2.py
import mod1
def func2():
mod1.func1B()
Finally I have driver.py:
# driver.py
import mod1
if __name__ == '__main__':
mod1.func1A()
If I execute the command python mod1.py then the output is None. Based on the link I referenced above, it seems that there is some distinction between mod1.py being imported as __main__ and mod1.py being imported from mod2.py. Therefore, I created driver.py. If I execute the command python driver.py then I get the expected output: A. I sort of see the difference, but I don't really see the mechanism or the reason for it. How and why does this happen? It seems counterintuitive that the same module would exist twice. If I execute python mod1.py, would it be possible to access the variables in the __main__ version of mod1.py instead of the variables in the version imported by mod2.py?
The __name__ variable always contains the name of the module, except when the file has been loaded into the interpreter as a script instead. Then that variable is set to the string '__main__' instead.
After all, the script is then run as the main file of the whole program, everything else are modules imported directly or indirectly by that main file. By testing the __name__ variable, you can thus detect if a file has been imported as a module, or was run directly.
Internally, modules are given a namespace dictionary, which is stored as part of the metadata for each module, in sys.modules. The main file, the executed script, is stored in that same structure as '__main__'.
But when you import a file as a module, python first looks in sys.modules to see if that module has already been imported before. So, import mod1 means that we first look in sys.modules for the mod1 module. It'll create a new module structure with a namespace if mod1 isn't there yet.
So, if you both run mod1.py as the main file, and later import it as a python module, it'll get two namespace entries in sys.modules. One as '__main__', then later as 'mod1'. These two namespaces are completely separate. Your global var1 is stored in sys.modules['__main__'], but func1B is looking in sys.modules['mod1'] for var1, where it is None.
But when you use python driver.py, driver.py becomes the '__main__' main file of the program, and mod1 will be imported just once into the sys.modules['mod1'] structure. This time round, func1A stores var1 in the sys.modules['mod1'] structure, and that's what func1B will find.
Regarding a practical solution for using a module optionally as main script - supporting consistent cross-imports:
Solution 1:
See e.g. in Python's pdb module, how it is run as a script by importing itself when executing as __main__ (at the end) :
#! /usr/bin/env python
"""A Python debugger."""
# (See pdb.doc for documentation.)
import sys
import linecache
...
# When invoked as main program, invoke the debugger on a script
if __name__ == '__main__':
import pdb
pdb.main()
Just I would recommend to reorganize the __main__ startup to the beginning of the script like this:
#! /usr/bin/env python
"""A Python debugger."""
# When invoked as main program, invoke the debugger on a script
import sys
if __name__ == '__main__':
##assert os.path.splitext(os.path.basename(__file__))[0] == 'pdb'
import pdb
pdb.main()
sys.exit(0)
import linecache
...
This way the module body is not executed twice - which is "costly", undesirable and sometimes critical.
Solution 2:
In rarer cases it is desirable to expose the actual script module __main__ even directly as the actual module alias (mod1):
# mod1.py
import mod2
...
if __name__ == '__main__':
# use main script directly as cross-importable module
_mod = sys.modules['mod1'] = sys.modules[__name__]
##_modname = os.path.splitext(os.path.basename(os.path.realpath(__file__)))[0]
##_mod = sys.modules[_modname] = sys.modules[__name__]
func1A()
Known drawbacks:
reload(_mod) fails
pickle'ed classes would need extra mappings for unpickling (find_global ..)
When writing python modules, is there a way to prevent it being imported twice by the client codes? Just like the c/c++ header files do:
#ifndef XXX
#define XXX
...
#endif
Thanks very much!
Python modules aren't imported multiple times. Just running import two times will not reload the module. If you want it to be reloaded, you have to use the reload statement. Here's a demo
foo.py is a module with the single line
print("I am being imported")
And here is a screen transcript of multiple import attempts.
>>> import foo
Hello, I am being imported
>>> import foo # Will not print the statement
>>> reload(foo) # Will print it again
Hello, I am being imported
Imports are cached, and only run once. Additional imports only cost the lookup time in sys.modules.
As specified in other answers, Python generally doesn't reload a module when encountering a second import statement for it. Instead, it returns its cached version from sys.modules without executing any of its code.
However there are several pitfalls worth noting:
Importing the main module as an ordinary module effectively creates two instances of the same module under different names.
This occurs because during program startup the main module is set up with the name __main__. Thus, when importing it as an ordinary module, Python doesn't detect it in sys.modules and imports it again, but with its proper name the second time around.
Consider the file /tmp/a.py with the following content:
# /tmp/a.py
import sys
print "%s executing as %s, recognized as %s in sys.modules" % (__file__, __name__, sys.modules[__name__])
import b
Another file /tmp/b.py has a single import statement for a.py (import a).
Executing /tmp/a.py results in the following output:
root#machine:/tmp$ python a.py
a.py executing as __main__, recognized as <module '__main__' from 'a.py'> in sys.modules
/tmp/a.pyc executing as a, recognized as <module 'a' from '/tmp/a.pyc'> in sys.modules
Therefore, it is best to keep the main module fairly minimal and export most of its functionality to an external module, as advised here.
This answer specifies two more possible scenarios:
Slightly different import statements utilizing different entries in sys.path leading to the same module.
Attempting another import of a module after a previous one failed halfway through.