I've read this post about circular imports in Python. It describes the following scenario and argues that this raises an error when run:
# module1
import module2
def function1():
    module2.function2()
def function3():
    print('Goodbye, World!')
# module2
import module1
def function2():
    print('Hello, World!')
    module1.function3()
# __init__.py
import module1
module1.function1()
But when I run this (Python 3.9.5), it runs perfectly fine. The post is pretty old and it doesn't specify the Python version it uses. Maybe there was some change in later Python versions that supports this?
Here's a simplified sequence of the events that happen when this code runs in Python 3:
1. __init__.py starts running.
2. An empty __main__ module is added to sys.modules.
3. import module1 starts loading module1.py.
4. An empty module1 module is added to sys.modules.
5. import module2 starts loading module2.py.
6. An empty module2 module is added to sys.modules.
7. module2.function2 is created and added to module2.__dict__. The fact that function2 references names in module1 does not affect the creation of the function object in any way.
8. module2 is fully loaded and execution returns to module1.
9. module1.function1 and module1.function3 are created and added to module1.__dict__. Again, it does not matter what names the functions reference, because they are not being called; an AttributeError or NameError will be raised at runtime if necessary.
10. module1 is fully loaded and execution returns to __main__.
11. module1.function1() runs successfully, since all the names it references are resolvable by now.
As you can see, there are no circular import issues in this particular sequence because module1 and module2 do not attempt to call each other's functions at import time. The import system allows both modules to load fully before either function is called.
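For contrast, here is a minimal sketch (a hypothetical variant of the same two files) that does break: a from-import asks for a name from the other module at import time, while that module is still half-initialized.
# module1 (failing variant)
from module2 import function2
def function3():
    print('Goodbye, World!')
# module2 (failing variant)
from module1 import function3  # module1 is only partially initialized here
def function2():
    print('Hello, World!')
Importing module1 now fails with something like:
ImportError: cannot import name 'function3' from partially initialized module 'module1' (most likely due to a circular import)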
The post you mention is from 2017, and must be using a version of Python from before 3.0. A hint is found in the link in the following quote, which points to the Python 2.x docs:
This approach doesn't contradict Python syntax, as the Python documentation says: "It is customary but not required to place all import statements at the beginning of a module (or script, for that matter)".
The paragraph after that is a bit misleading by the way:
The Python documentation also says that it is advisable to use import X, instead of other statements, such as from module import *, or from module import a,b,c.
While star imports are certainly discouraged, specific-name imports of the form from module import a,b,c are generally very much encouraged with few exceptions.
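For reference, the three forms being contrasted look like this:
import os                     # plain module import: always fine
from os import path, environ  # specific names: generally encouraged
from os import *              # star import: discouraged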
Related
I currently have a module I created that has a number of functions.
It's getting quite large so I figured I should make it into a package and split the functions up to make it more manageable.
I'm just testing out how this all works before I do this for real so apologies if it seems a bit tenuous.
I've created a folder called pack_test and in it I have:
__init__.py
foo.py
bar.py
__init__.py contains:
__all__ = ['foo', 'bar']
from . import *
import subprocess
from os import environ
In the console I can write import pack_test as pt and this is fine, no errors.
pt. and two tabs shows me that I can see pt.bar, pt.environ, pt.foo and pt.subprocess in there.
All good so far.
If I want to reference subprocess or environ in foo.py or bar.py how do I do it in there?
If in bar.py I have a function which just does return subprocess.call('ls') it errors saying NameError: name 'subprocess' is not defined. There must be something I'm missing which enables me to reference subprocess from the level above? Presumably, once I can get the syntax from that I can also just call environ in a similar way?
The alternative as I could see it would be to have import subprocess in both foo.py and bar.py but then this seems a bit odd to me to have it appear across multiple files when I could have it the once at a higher level, particularly if I went on to have a large number of files rather than just 2 in this example.
TL;DR:
__init__.py:
from . import foo
from . import bar
__all__ = ["foo", "bar"]
foo.py:
import subprocess
from os import environ
# your code here
bar.py:
import subprocess
from os import environ
# your code here
There must be something I'm missing which enables me to reference subprocess from the level above?
Nope, this is the expected behaviour.
import loads a module (if it isn't already loaded), caches it in sys.modules (likewise), and binds the imported name in the current namespace. Each Python module has (or "is") its own namespace (there's no real "global" namespace). In other words, you have to import what you need in each module: if foo.py needs subprocess, it must explicitly import it.
This can seem a bit tedious at first, but in the long run it really helps maintainability - you just have to read the imports at the top of your module (PEP 8: always put all imports at the beginning of the module) to know where a name comes from.
Also, you should not use star imports (aka wildcard imports, from xxx import *) anywhere other than your Python shell (and even then...) - they're a maintenance time bomb. Not only do you not know where each name comes from, it's also a sure way to rebind an already-imported name. Imagine that your foo module defines a function func. Somewhere you have from foo import *; from bar import *, then later in the code a call to func. Now someone edits bar.py and adds a (distinct) func function, and suddenly your call fails, because you're no longer calling the func you expected. Now enjoy debugging that... And real-life examples are usually a bit more complex than this one.
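A minimal sketch of that shadowing scenario (the file names here are hypothetical):
# foo.py
def func():
    return "foo's func"
# bar.py - someone later adds their own func here
def func():
    return "bar's func"
# main.py
from foo import *
from bar import *  # silently rebinds func
print(func())      # "bar's func" - not the one you meant to call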
So if you value your sanity, don't be lazy, and don't try to be smart either; just do the simple, obvious thing: explicitly import the names you're interested in at the top of your modules.
(been here, done that etc)
You could create modules.py containing
import subprocess
import os
Then in foo.py, or any of your files, just have:
from modules import *
Your import statements in your files are then static and just update modules.py when you want to add an additional module accessible to them all.
Consider the following:
a.py
foo = 1
b.py
bar = 2
c.py
import a
kik = 3
d.py
import a
import c
def main():
    import b

main()
main()
How many times is a.py loaded?
How many times is b.py loaded?
More generally, I would like to know how is Python handling imported files and functions/variables?
Both a and b are loaded once. When you import a module, its content is cached, so importing the same module again doesn't re-run the original script; locating the module in the first place is done using a "finder":
https://www.python.org/dev/peps/pep-0451/#finder
https://docs.python.org/3/library/importlib.html#importlib.abc.MetaPathFinder
This works across modules, so any other file that also did import b would be bound to the same cached module.
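You can see the cache at work in an interactive session - every import of an already-loaded module hands back the very same object:
import sys
import json
import json as j2                   # second import: a pure sys.modules lookup
print(json is j2)                   # True
print(json is sys.modules['json'])  # True: one cached module object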
The import system documentation helps in understanding what happens during an import:
https://docs.python.org/3/reference/import.html#importsystem
When a module is first imported, Python searches for the module and, if found, creates a module object and initializes it.
Note that this full search only happens on the first import; every import after that goes through __import__ and hits the cache first. The meta path finders themselves are listed in sys.meta_path.
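You can inspect the finders yourself; a stock CPython ships with the builtin, frozen, and path-based finders:
import sys
for finder in sys.meta_path:
    print(finder)
# e.g. <class '_frozen_importlib.BuiltinImporter'>
#      <class '_frozen_importlib.FrozenImporter'>
#      <class '_frozen_importlib_external.PathFinder'>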
https://docs.python.org/3/library/functions.html#import
You can leverage the import system to invalidate those caches for example:
https://docs.python.org/3/library/importlib.html#importlib.import_module
If you are dynamically importing a module that was created since the interpreter began execution (e.g., created a Python source file), you may need to call invalidate_caches() in order for the new module to be noticed by the import system.
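A sketch of the scenario in that quote, assuming the source file (new_mod.py, hypothetical) is written out by the running process itself:
import importlib

# suppose new_mod.py was just created at runtime
with open('new_mod.py', 'w') as f:
    f.write('VALUE = 42\n')

importlib.invalidate_caches()  # let the finders notice the new file
new_mod = importlib.import_module('new_mod')
print(new_mod.VALUE)           # 42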
The imp module (superseded by importlib since Python 3.4) allows recompiling and re-executing a module after import:
import importlib  # on Python < 3.4, use imp.reload instead
import a
importlib.reload(a)
https://docs.python.org/3/library/importlib.html#importlib.reload
Python module’s code is recompiled and the module-level code re-executed, defining a new set of objects which are bound to names in the module’s dictionary by reusing the loader which originally loaded the module.
https://docs.python.org/3/library/imp.html
I might be completely wrong here, but I can't find a proper Google source for the dilemma that I have:
Let's say we are using python, and we have files
foo.py and bar.py, which have the following pseudocode:
Code in foo.py:
# Code in foo.py
import sys

def foo():
    # Some blah code for foo function
    pass
And code in bar.py is:
# Code in bar.py
import sys
import foo

def bar():
    # Some blah code for bar function
    pass
Now, what I am wondering is: will this not cause code bloat, since we have imported sys twice in bar.py - once via import sys and another time because import foo pulls it in as well?
Additionally, what will be the correct thing to do when you have to include libraries in multiple files, which in turn will be included in other files?
Importing a module twice in Python does not introduce "bloat". A second import is a mere name lookup in a cached modules dictionary (sys.modules, to be precise). In the case of sys this matters even less, since sys is loaded implicitly during interpreter startup anyway - it just isn't exposed in your namespace until you import it.
And what if foo.py imports some parts of a module x, and bar.py needs those parts and possibly others? Having to carefully groom your imports, and then use foo.something_from_x or x.something_else_from_x in bar.py, would be extremely cumbersome to write and maintain.
TLDR: don't worry. Really. Don't.
This would not cause any kind of code bloat. When you import the same library multiple times, Python only actually imports it once (the first time), caches it in sys.modules, and from then on every import sys simply returns the module object from sys.modules.
A very simple example to show this -
Let's say I have an a.py -
print("In A")
This would print In A every time the module is imported. Now let's try importing it multiple times -
>>> import a
In A
>>> import a
>>> import a
>>> import a
As you can see, the code was actually run only once; the rest of the times the cached module object was returned. To check sys.modules -
>>> import sys
>>> sys.modules['a']
<module 'a' from '\\path\\to\\a.py'>
When you import a module, Python imports the code, creates a module object, creates a name in the local namespace (the module's name, or, if an as clause was provided, the name given after as), and assigns the module object to it.
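In other words, both forms below bind a local name to the same cached module object:
import a           # binds the name "a"
import a as alias  # same module object, bound as "alias"
assert a is alias  # both names point at sys.modules['a']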
The same thing happens when importing inside other modules. Another example -
b.py -
import a
print("In B")
c.py -
import b
import a
print("In C")
Result of running c.py -
In A
In B
In C
As you can see, a.py was only imported once.
I want to create a Python package that works like NumPy. The functions are not only in sub-modules at the leaves of the package tree; there is a root namespace containing many functions that I can call directly, and there are also sub-modules. The problem is that the root functions must be defined somewhere. I was thinking of a directory structure like:
module/
__init__.py
core.py
stuff1.py
submodule/
__init__.py
stuff2.py
stuff3.py
Now what I want is for everything inside core to be imported into the module namespace, as if module were a single module.py file and the contents of core.py were inside it. The problem is that module is a directory instead of a file, so where do I define these functions that should sit at the root of the package?
I tried putting from core import * inside __init__.py, but that didn't work. (EDIT: Actually it does.)
Should I have the core functions inside a module.py file, and also a module directory? I don't know if that even works, and it looks pretty awkward.
What I think you want is to be able to do this:
# some_other_script.py
import module
# Do things using routines defined in module.core
What happens when you ask Python to import module is (in a very basic sense) that module/__init__.py is run, and a module object is created and bound in your namespace. This object (again, very basically) encompasses the things that happened when __init__.py was run: name definitions and so on. These can be accessed through module.something.
Now, if your setup looks like this:
# module/__init__.py
from module.core import Clazz
c = Clazz()
print(c) # Note: demo only! Module-level side-effects are usually a bad idea!
When you import module, you'll see a line like this printed:
<module.core.Clazz object at 0x00BBAA90>
Great. But if you then try to access c, you'll get a NameError:
# some_other_script.py
import module # prints "<module.core.Clazz object at 0x00BBAA90>"
print(c) # NameError (c is not defined)
This is because you haven't imported c; you've imported module. If instead your entry-point script looks like this:
# some_other_script.py
import module # prints "<module.core.Clazz object at 0x00BBAA90>"
print(module.c) # Access c *within module*
Everything will run fine. This will also work with from .core import * and/or from module import *, but I (and PEP 8) advise against that, just because it's not very clear what's going on in the script when you start mucking around with wildcard imports. For clarity:
# module/core.py
def core_func():
    return 1

# module/__init__.py
from .core import *

def mod_func():
    return 2
The above is really pretty much fine, although you might as well make core "private" (rename to _core) to indicate that there's no reason to touch it from outside the package anymore.
# some_other_script.py
from module import *
print(core_func()) # Prints 1
print(mod_func()) # Prints 2
Check out information about the __all__ list. It allows you to define what names are exported.
Give each submodule its own __all__ and you can aggregate what the package pulls in from its submodules - a rough sketch (assuming core and submodule both define __all__):
# module/__init__.py
from .core import *
from . import core, submodule

__all__ = list(core.__all__) + list(submodule.__all__)
If you just want the whole damn shebang in module space, here's a way:
$ ipython
In [1]: from foomod import *
In [2]: printbar()
Out[2]: 'Imported from a foreign land'
In [3]: ^D
Do you really want to exit ([y]/n)?
$ ls foomod/
__init__.py __init__.pyc core.py core.pyc submodule
$ grep . foomod/*.py
foomod/__init__.py:from foomod.core import *
foomod/core.py:def printbar():
foomod/core.py: return "Imported from a foreign land"
... and if we make __init__.py empty:
$ echo > foomod/__init__.py
$ ipython
In [1]: from foomod import *
In [2]: printbar()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-ba5b6693441e> in <module>()
----> 1 printbar()
NameError: name 'printbar' is not defined
When writing python modules, is there a way to prevent it being imported twice by the client codes? Just like the c/c++ header files do:
#ifndef XXX
#define XXX
...
#endif
Thanks very much!
Python modules aren't imported multiple times. Just running import twice will not reload the module. If you want it to be reloaded, you have to use the reload function. Here's a demo:
foo.py is a module with the single line
print("I am being imported")
And here is a screen transcript of multiple import attempts.
>>> import foo
Hello, I am being imported
>>> import foo # Will not print the statement
>>> from importlib import reload # Python 3; reload is a builtin in Python 2
>>> reload(foo) # Will print it again
Hello, I am being imported
Imports are cached, and only run once. Additional imports only cost the lookup time in sys.modules.
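So the closest Python analogue to a C header guard is the sys.modules cache itself, which you can observe directly (reusing the foo module from the demo above):
import sys
import foo                   # first import: runs foo's top-level code
print('foo' in sys.modules)  # True from here on
import foo                   # prints nothing: served straight from the cache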
As specified in other answers, Python generally doesn't reload a module when encountering a second import statement for it. Instead, it returns its cached version from sys.modules without executing any of its code.
However there are several pitfalls worth noting:
Importing the main module as an ordinary module effectively creates two instances of the same module under different names.
This occurs because during program startup the main module is set up with the name __main__. Thus, when importing it as an ordinary module, Python doesn't detect it in sys.modules and imports it again, but with its proper name the second time around.
Consider the file /tmp/a.py with the following content:
# /tmp/a.py
import sys
print "%s executing as %s, recognized as %s in sys.modules" % (__file__, __name__, sys.modules[__name__])
import b
Another file /tmp/b.py has a single import statement for a.py (import a).
Executing /tmp/a.py results in the following output:
root@machine:/tmp$ python a.py
a.py executing as __main__, recognized as <module '__main__' from 'a.py'> in sys.modules
/tmp/a.pyc executing as a, recognized as <module 'a' from '/tmp/a.pyc'> in sys.modules
Therefore, it is best to keep the main module fairly minimal and export most of its functionality to an external module, as advised here.
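A common way to follow that advice is to keep the entry point thin and guarded (the app module here is hypothetical):
# main.py
import app                   # all real functionality lives in app.py

if __name__ == '__main__':   # only runs when main.py is executed directly
    app.run()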
This answer specifies two more possible scenarios:
Slightly different import statements utilizing different entries in sys.path leading to the same module.
Attempting another import of a module after a previous one failed halfway through (see the sketch below).
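For that second scenario, note that in Python 3 a module that raises during import is removed from sys.modules, so a retry re-executes it from the top - a minimal sketch with a hypothetical flaky.py:
# flaky.py: fails halfway through its first import
CONFIGURED = True
raise RuntimeError('simulated failure during import')
# client code
import sys
try:
    import flaky
except RuntimeError:
    pass
print('flaky' in sys.modules)  # False: the half-initialized module was discarded
try:
    import flaky               # re-executes flaky.py from scratch
except RuntimeError:
    pass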