I have two modules as follows:
Module A - moda.py
import modb

x = None

def printx():
    global x
    print(x)

def main():
    global x
    x = 42
    printx()
    modb.printx()
    printx()

if __name__ == '__main__':
    main()
Module B - modb.py
import moda

def printx():
    moda.printx()

print('modb imported')
When I run python moda.py, the output I get is:
modb imported
42
None
42
I don't understand why the second print (coming from modb.printx()) is None. I thought python modules behaved as singletons. What am I missing?
Can someone please explain why the module imported in modb is not same as the original module moda?
When an import statement is encountered, the interpreter looks for a corresponding key in sys.modules. If the key is found, the module stored there is bound to the name you requested. If not, a new empty module object is created, placed in sys.modules, and then populated. The empty module is registered before it is populated precisely to avoid infinite loops with circular imports.
When you run a module, it is imported under the name __main__.
Here is the sequence of events when you run moda as a script:
1. Start of load of moda.py as sys.modules['__main__']. At this point, this is just an empty namespace.
2. import modb encountered in moda.py. New empty namespace created for sys.modules['modb'].
3. import moda encountered in modb.py. New empty namespace created for sys.modules['moda']. Notice that this is not the same object as sys.modules['__main__'] in step 1.
4. import modb encountered in moda.py. Since sys.modules['modb'] exists, it is bound to that name in moda.
5. Since moda.py is currently being loaded under the name moda, it finishes populating its namespace without running main(): the import guard is false because __name__ is 'moda'.
6. modb.py finishes populating its namespace (from step 2) and runs print('modb imported').
7. __main__, defined in moda.py, finishes populating its namespace (from step 1) and runs the import guard.
Hopefully this helps you visualize what happens. You have three modules, not two, that were loaded, because moda is loaded under two different names, and as two entirely different module objects.
The import guard in __main__ calls __main__.main, which does the following:
Set __main__.x = 42 (moda.x is still None)
__main__.printx prints __main__.x, which is 42
modb.printx calls moda.printx, which prints moda.x, which is None.
__main__.printx prints __main__.x again, which is still 42.
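The sequence above can be reproduced with a small script. This is only a sketch: it writes the two modules from the question into a temporary directory and runs moda.py in a child interpreter. The helper name run_demo and the extra identity check printed at the end of main() are additions for illustration and are not part of the original code.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# moda.py from the question, plus one extra line (an addition for this
# sketch) that compares the two module objects.
MODA = textwrap.dedent("""\
    import sys
    import modb

    x = None

    def printx():
        print(x)

    def main():
        global x
        x = 42
        printx()
        modb.printx()
        printx()
        # The script and the imported copy are distinct module objects.
        print(sys.modules['__main__'] is sys.modules['moda'])

    if __name__ == '__main__':
        main()
""")

MODB = textwrap.dedent("""\
    import moda

    def printx():
        moda.printx()

    print('modb imported')
""")


def run_demo():
    """Write both modules to a temp dir, run moda.py, return its output lines."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, src in (("moda.py", MODA), ("modb.py", MODB)):
            with open(os.path.join(tmp, name), "w") as fh:
                fh.write(src)
        result = subprocess.run(
            [sys.executable, "moda.py"],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
        return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The last line of output is False, which confirms that sys.modules holds two separate module objects built from the same file.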
Related
To illustrate the issue I am having, please consider the following. I have two .py files, one named main.py and the other named mymodule.py. They are both in the same directory.
The contents of main.py:
from mymodule import myfunction
myfunction()
The contents of mymodule.py:
def myfunction():
    for number in range(0,10):
        print(number)

print("Hi")
I was under the impression that importing a function would only import that function. However, when I run main.py, this is what I get:
Hi
0
1
2
3
4
5
6
7
8
9
Why is print("Hi") being called? It isn't part of the function I imported!
I was under the impression that importing a function would only import that function.
It seems there's an incorrect assumption about what a from-import actually does.
The first time a module is imported, the import statement executes the entire module, including print calls made at the global scope (docs). This is true regardless of whether mymodule was first imported with a statement like import mymodule or with a statement like from mymodule import myfunction.
Subsequent imports of the same module will re-use an existing module cached in sys.modules, which may be how you arrived at the misunderstanding that the entire module is not executed.
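Both points can be checked with a short experiment. The sketch below creates a throwaway module (the name demo_mymodule and the helper import_twice are made up for this example), imports one function from it with a from-import, then imports the module again, and captures whatever the module prints along the way.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# A stand-in for mymodule.py: one function plus a top-level print.
MODULE_SRC = 'def myfunction():\n    return "called"\n\nprint("Hi")\n'


def import_twice():
    """Return (captured stdout, whether the from-import used the cached module)."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo_mymodule.py"), "w") as fh:
            fh.write(MODULE_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        captured = io.StringIO()
        try:
            with contextlib.redirect_stdout(captured):
                namespace = {}
                # First import: a from-import still runs the whole module body.
                exec("from demo_mymodule import myfunction", namespace)
                # Second import: served from sys.modules, body not re-run.
                module = importlib.import_module("demo_mymodule")
            same_function = namespace["myfunction"] is module.myfunction
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_mymodule", None)
        return captured.getvalue(), same_function


if __name__ == "__main__":
    print(import_twice())
```

The captured output contains "Hi" exactly once, even though only myfunction was requested and the module was imported twice.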
There is a common pattern to avoid global level code being executed by a module import. Often you will find code which is not intended to be executed at import time located inside a conditional, like this:
def myfunction():
    for number in range(0,10):
        print(number)

if __name__ == "__main__":
    print("Hi")
In order to import something from the module Python needs to load this module first. At that moment all the code at module-level is executed.
According to the docs:
A module can contain executable statements as well as function
definitions. These statements are intended to initialize the module.
They are executed only the first time the module name is encountered
in an import statement.
this question seems to be a duplicate of this one.
In short: all the top-level code of a Python file is executed when the module is imported. Code that is neither a function nor a class definition is usually put in a main function, which is called here:
if __name__ == "__main__":
    # stuff only to run when not called via 'import' here
    main()
Please consider closing this thread.
To preface, I think I may have figured out how to get this code working (based on Changing module variables after import), but my question is really about why the following behavior occurs so I can understand what to not do in the future.
I have three files. The first is mod1.py:
# mod1.py
import mod2

var1 = None

def func1A():
    global var1
    var1 = 'A'
    mod2.func2()

def func1B():
    global var1
    print(var1)

if __name__ == '__main__':
    func1A()
Next I have mod2.py:
# mod2.py
import mod1

def func2():
    mod1.func1B()
Finally I have driver.py:
# driver.py
import mod1

if __name__ == '__main__':
    mod1.func1A()
If I execute the command python mod1.py then the output is None. Based on the link I referenced above, it seems that there is some distinction between mod1.py being imported as __main__ and mod1.py being imported from mod2.py. Therefore, I created driver.py. If I execute the command python driver.py then I get the expected output: A. I sort of see the difference, but I don't really see the mechanism or the reason for it. How and why does this happen? It seems counterintuitive that the same module would exist twice. If I execute python mod1.py, would it be possible to access the variables in the __main__ version of mod1.py instead of the variables in the version imported by mod2.py?
The __name__ variable always contains the name of the module, except when the file has been loaded into the interpreter as a script. In that case the variable is set to the string '__main__' instead.
After all, the script is then run as the main file of the whole program, and everything else is a module imported directly or indirectly by that main file. By testing the __name__ variable, you can thus detect whether a file has been imported as a module or was run directly.
Internally, modules are given a namespace dictionary, which is stored as part of the metadata for each module, in sys.modules. The main file, the executed script, is stored in that same structure as '__main__'.
But when you import a file as a module, python first looks in sys.modules to see if that module has already been imported before. So, import mod1 means that we first look in sys.modules for the mod1 module. It'll create a new module structure with a namespace if mod1 isn't there yet.
So, if you both run mod1.py as the main file, and later import it as a python module, it'll get two namespace entries in sys.modules. One as '__main__', then later as 'mod1'. These two namespaces are completely separate. Your global var1 is stored in sys.modules['__main__'], but func1B is looking in sys.modules['mod1'] for var1, where it is None.
But when you use python driver.py, driver.py becomes the '__main__' main file of the program, and mod1 will be imported just once into the sys.modules['mod1'] structure. This time round, func1A stores var1 in the sys.modules['mod1'] structure, and that's what func1B will find.
Regarding a practical solution for using a module optionally as main script - supporting consistent cross-imports:
Solution 1:
See, for example, how Python's pdb module is run as a script: it imports itself when executing as __main__ (at the end):
#! /usr/bin/env python
"""A Python debugger."""
# (See pdb.doc for documentation.)
import sys
import linecache
...
# When invoked as main program, invoke the debugger on a script
if __name__ == '__main__':
    import pdb
    pdb.main()
I would just recommend moving the __main__ startup to the beginning of the script, like this:
#! /usr/bin/env python
"""A Python debugger."""
# When invoked as main program, invoke the debugger on a script
import sys
if __name__ == '__main__':
    ##assert os.path.splitext(os.path.basename(__file__))[0] == 'pdb'
    import pdb
    pdb.main()
    sys.exit(0)

import linecache
...
This way the module body is not executed twice - which is "costly", undesirable and sometimes critical.
Solution 2:
In rarer cases it is desirable to expose the running script module __main__ directly under its regular module name (mod1):
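# mod1.py
import sys

if __name__ == '__main__':
    # use main script directly as cross-importable module;
    # the alias must be registered before mod2 is imported, otherwise
    # mod2's "import mod1" has already loaded and bound a second copy
    _mod = sys.modules['mod1'] = sys.modules[__name__]
    ##_modname = os.path.splitext(os.path.basename(os.path.realpath(__file__)))[0]
    ##_mod = sys.modules[_modname] = sys.modules[__name__]

import mod2
...
if __name__ == '__main__':
    func1A()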
Known drawbacks:
reload(_mod) fails
pickle'ed classes would need extra mappings for unpickling (find_global ..)
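The aliasing idea from Solution 2 can be tried end to end with the sketch below. It assumes the alias is registered before mod2 is imported, so that mod2's import mod1 finds the running script in sys.modules; that ordering is a choice made for this example. The helper name run_alias_demo is made up, and the module sources are the ones from the question with print written as a function call.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# mod1.py with the alias set up first (assumed ordering for this sketch).
MOD1 = textwrap.dedent("""\
    import sys

    if __name__ == '__main__':
        # Alias first, so that mod2's import of mod1 reuses this module object.
        sys.modules['mod1'] = sys.modules[__name__]

    import mod2

    var1 = None

    def func1A():
        global var1
        var1 = 'A'
        mod2.func2()

    def func1B():
        print(var1)

    if __name__ == '__main__':
        func1A()
""")

MOD2 = textwrap.dedent("""\
    import mod1

    def func2():
        mod1.func1B()
""")


def run_alias_demo():
    """Run mod1.py as a script in a temp dir and return what it prints."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, src in (("mod1.py", MOD1), ("mod2.py", MOD2)):
            with open(os.path.join(tmp, name), "w") as fh:
                fh.write(src)
        result = subprocess.run(
            [sys.executable, "mod1.py"],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_alias_demo())
```

With the alias in place, python mod1.py prints A, because mod2.mod1 and __main__ are now the same module object and share one var1.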
Let's say I have a really long script (1000+ lines long, in my case), so I split it into separate files:
main.py #the file i execute
foo1.py #a file my main.py imports
foo2.py #a file imported by foo1.py
(note: main.py imports several files, not just the one)
Foo1.py holds Tkinter, and things related to it, while Foo2.py holds a huge object class with functions related to said class.
My problem is as follows:
Foo1 imports Foo2
Foo2 runs a function that calls another function from Foo1
Foo2 raises a 'global name ' is not defined' error
I also can't import the function into Foo2, because Foo1 already has it and that raises an import error.
When two modules import each other there are a few things you need to keep in mind so everything is defined before it is needed.
First, let's consider the mechanics of importing:
when a module is imported for the first time, an entry is added to sys.modules and the defining file starts executing (pausing the execution of the importer)
subsequent imports simply use the entry in sys.modules, whether or not the file has finished executing
So let's say module A is loaded first and imports module B, which then imports module A. When this happens, execution proceeds as follows:
A is imported from something else for first time, A is added to sys.modules
A is executed up to importing B
B is added to sys.modules
B is executed:
when it imports A the partially loaded module is used from sys.modules
B runs completely before resuming
A resumes executing, having access to the complete module B
*1 So from A import x can only work in B if x is defined in A before A's import B statement. A plain import A gives you the module object, which is updated as the file executes: it may not have all of its definitions right after the import, but it will once the file has had a chance to finish executing.
So the simplest way of solving this is not to rely on the imported module's contents while your own module is still executing: keep every use of the circularly imported module inside def blocks that are not called at module level.
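The partially loaded module and the "use it only inside def blocks" advice can be seen in a small experiment. This is a sketch with made-up module names (circ_a, circ_b) and a made-up helper (circular_import_demo): circ_a imports circ_b before defining x, and circ_b records whether x was visible at import time and also offers a function that reads it later.

```python
import importlib
import os
import sys
import tempfile

# circ_a imports circ_b *before* defining x.
A_SRC = "import circ_b\nx = 1\n"

# circ_b sees a partially loaded circ_a at import time,
# but a function called later sees the finished module.
B_SRC = (
    "import circ_a\n"
    "had_x_at_import = hasattr(circ_a, 'x')\n"
    "def read_x_later():\n"
    "    return circ_a.x\n"
)


def circular_import_demo():
    """Return (was x visible during circ_b's import, x as read afterwards)."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, src in (("circ_a.py", A_SRC), ("circ_b.py", B_SRC)):
            with open(os.path.join(tmp, name), "w") as fh:
                fh.write(src)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("circ_a")
            circ_b = sys.modules["circ_b"]
            return circ_b.had_x_at_import, circ_b.read_x_later()
        finally:
            # Clean up so the demo leaves no trace in the running interpreter.
            sys.path.remove(tmp)
            sys.modules.pop("circ_a", None)
            sys.modules.pop("circ_b", None)


if __name__ == "__main__":
    print(circular_import_demo())
```

The result is (False, 1): at the moment circ_b ran its module-level code, circ_a had no x yet, but the same module object has x by the time read_x_later is called.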