How does python library handle internal imports? - python

Consider the following directory
myProject
myCode.py
__init__.py
myProject2
__init__.py
myProject2Inner
myCode.py
__init__.py
myLibrary
__init__.py
myPackage1
__init__.py
myPackage1Code.py
myPackage2
__init__.py
myPackage2Code.py
If myCode.py is dependent on myPackage1Code.py and myPackage1Code.py is dependent on myPackage2Code.py
I am currently doing the following
sys.path.append(os.path.abspath('../myLibrary/myPackage2/'))
import myPackage2Code
in myPackage1Code.py to make the code run successfully. But this is obviously really bad since the library import path is entirely dependent on who is using it. For example if myProject2Inner requires myPackage1 then the code above wouldn't work.
I would have to do
sys.path.append(os.path.abspath('../../myLibrary/myPackage2/'))
import myPackage2Code
I think I am doing something really wrong here, can someone point me a direction of how to handle import path within a self containing library?

In your case, myLibrary, myPackage1 and myPackage2 are packages. To import modules (or packages) from other packages, you must either use an absolute or relative path:
# in myPackage1Code.py
# absolute import
from myLibrary.myPackage2 import myPackage2Code
# relative import
from ..myPackage2 import myPackage2Code
This uniquely identifies the module you actually want, and tells Python where to find it. Note that . and .. are not file-system operations: they also work with dynamically composed namespace packages.
If you want to execute a script contained inside your package, you execute it as part of the package:
python2 -m myLibrary.myPackage1.myPackage1Code
Python2 also has implicit relative imports:
# in myLibrary/__init__.py
from myPackage2 import myPackage2Code
This form is generally discouraged, as it breaks if there is a global myPackage2. It also does not work with Python3.
Note that for packages to work, you have to use them as such! If you directly access part of a package (don't do this at home!)
# directly run code module of a package in the shell
python2 myLibrary/myPackage1/myPackage1Code.py
# directly import module of a package
sys.path.append(os.path.abspath('../../myLibrary/myPackage2/'))
import myPackage2Code
then Python does not know that myPackage2Code belongs to myLibrary.myPackage2.
This has two notable effects:
The myPackage2Code cannot use relative imports. Python considers it a top-level module, so imports cannot go "up" in the package hierarchy.
If another module imports it with its full path, this creates two separate modules myPackage2Code and myLibrary.myPackage2.myPackage2Code. Since these contain separate objects, they for example fail isinstance checks of except clauses.

Related

Why does error occurs when importing from same *sub*directory?

Consider this folder structure:
main.py
module_a/
aa.py
bb.py
__init__.py
In main.py, I import aa as:
from module_a import aa
aa.yyy()
Then in aa.py, I import bb and include its functions as:
import bb
bb.xxx()
However, when I run main.py, python says "No module named 'bb'".
May I know why this happens. What is the correct way to import bb.
Thanks!!!
I have tried to write aa.py as:
import .bb
bb.xxx()
But it still does not work.
why this happens
Because the aa folder is not a place that Python is searching for modules.
Python's imports are absolute by default. They only look in specific places determined by sys.path. In main.py, import module_a.aa works because the root folder of the project happens to be on the sys.path; that folder contains a module_a folder; and that folder contains aa.py.
What is the correct way to import bb.
Please use relative imports between files in your package. In this case, the necessary import in aa.py looks like:
from . import bb
Absolute imports are error-prone; a project that uses two packages whose contents have overlapping names will run into namespace collisions. (Sadly, the standard library uses absolute imports in most places, such that projects need to ban certain module names for safety.) Relative imports will also require much less maintenance, should you later rename a sub-package.
The only thing relative imports require is that the package gets loaded, which typically will happen automatically with the first (yes, absolute) import of any of the package contents. When the package is loaded, it automatically sets a __package__ attribute on the modules in that package, which Python can use to resolve the relative imports. It's important to note that relative imports are relative to the package hierarchy, not the directory structure, which is why this is necessary; imports like from .. import example do not work by figuring out the current file location and then going up a level in the directory hierarchy. Instead, they check the __package__ to figure out what the containing package is, then check the file/folder location for that, and work from there.
If the "driver" script is within the package, run it as a module, using the -m switch for Python. For example, from the root folder, if module_a/aa.py is the driver, use python -m module_a.aa. This instructs Python that module_a is the containing package for aa.py, and ensures it gets loaded even though no import in the code has loaded it.
Contrary to what many people will wrongly tell you, it is almost never required to manipulate sys.path; there are popular Python projects on GitHub, running hundreds of thousands of lines of code, which either do not use it at all or use it only once in an ancillary role (perhaps because of a special requirement for a documentation tool). Just don't do it.
Also contrary to what many people will wrongly tell you, __init__.py files are not required to create packages in Python. They are simply a place where additional code can be placed for package initialization - for example, to create aliases for sub-package contents, or to limit what will be imported with a *-import (by setting __all__).
import .bb
This is just invalid. Relative imports only use the from syntax. See above for the correct syntax.
Suppose the same file structure as stated in OP, and:
main.py:
import module_a.aa
module_a.aa.thisFile()
module_a.aa.module_a.bb.thisFile()
aa.py:
import module_a.bb
def thisFile():
print("aa")
bb.py:
def thisFile():
print("bb")
Then, this will print
aa
bb
like you would expect. The main difference here, is that bb.py is imported in aa.py via module_a.bb. Running main.py gives no problem, however, running aa.py does not work in this way. That is why you might want to add folders to your path, such that you can call a function from a different file without this trouble. This is done via:
import os, inspect, sys
current_folder = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parent_folder = os.path.dirname(current_folder)
sys.path.insert(0,parent_folder)
Then you can import your files such as import file. If you consider this option, I would suggest to do some reading about how this works. Tip: make sure you avoid cyclic import problems.

How to structure imports inside project to work for both scripts and modules?

I have a rather simple setup:
[FOLDER]
|-> [Lib]
__init__.py (__all__=["modA","modB"])
modA.py (contains class named classA)
modB.py (contains class named classB + from modA import classA)
test1.py (from Lib.modA import classA
from Lib.modB import classB)
|-> [example]
test2.py (import sys
sys.path.append("../")
from Lib.modA import classA
from Lib.modB import classB)
Running test1.py from the Lib folder works perfectly without errors. Running test2.py from the example folder on the other hand requires the sys-patch to find Lib at all; however, it then crashes with No module named modA tracing back to the from modA import classA in modB.py via from Lib.modB import classB in test2.py.
How is one supposed to define an import in a module such that it will also work irrespective of the possible location of any future script that may use/import said module?
Python programs should be thought of as packages and modules, not as directories and files. While there is some overlap, packages and modules are more restrictive but also better encapsulated as a result.
Mixing both – say by manually modifying sys.path – should only be done as a last resort.
TLDR:
Use qualified imports based on the package:
from Lib.modA import classA/from .modA import classA instead of from modA import classA .
Use the environment for discovery:
Add search paths via PYTHONPATH instead of sys.path.
Start by deciding which ones are the top-level packages.
This is the point where we go from "directories/files" to "package". Notably, we cannot go "above" the top-level later on, so it should contain everything we need. However, we cannot remove anything "below" either, so it should be a tight enough selection.
In the example we could go as low as treating modA and siblings as their own module-package, and as high as FOLDER encompassing the entire project.
[FOLDER]
|-> [Lib]
| |-> modA.py
: :
It is reasonable to pick Lib since it represents a self-contained part. Going higher to FOLDER would be excessive, going lower to modA and siblings would break their relation as belonging together.
Everything under the top-level package folder now belongs to the package.
Prominently, FOLDER/Lib/modA.py is now the module Lib.modA. It is not FOLDER.Lib.modA, nor just modA.
Import package content only with absolute or relative qualified names.
Now that the top-level is established, all import statements must be made with regard to it. Refer to modules by their qualified name only, even when two modules share a more common parent:
# Lib.modB
# okay, fully qualified import
from Lib.modA import classA
# broken, unqualified import
from modA import classA
In order to avoid typing out the entire fully qualified name, one can use a relative name instead. This reuses the current module name to derive the fully qualified name of the module to import.
# Lib.modB
# okay, relative import
from .modA import classA
Note: Relative imports are a package operation, not a filesystem operation. One cannot go beyond the top-level, but descend into package namespaces.
Everything inside the module is now encapsulated and self-contained.
All qualified imports will work irrespective of the location of the package itself. As long as the top-level can be imported, everything below it can be imported as well.
Notably, this works irrespective of the location of any future script that may use/import the package.
Run packaged code with absolute qualified names
When executing code inside a package, Python has to know that the code is part of the package. Thus, it must be run with its absolute qualified named. Use the -m switch to do so:
# import and execute Lib.test1
python3 -m Lib.test1
Enable packages via the environment, not the program.
The point of defining a package is getting a single, self-contained entity that represents a library. One could zip up a package or similar, and it would still be a package.
In return, this means that the code itself should not break this abstraction by directly adding/changing directories to search packages.
Instead of having scripts assume the location of the package, the environment - of the script, the user, or the entire machine – should expose the package. There are basically two ways to do this:
Announce the package location as a search location. This is suitable for development as it is flexible but hard to maintain. The PYTHONPATH is appropriate for this.
Move the package to a search location used by Python. This is suitable for production and distribution as it requires effort but is easy to maintain. Packaging and using a package manager is appropriate for this.
To what I understand, when you are trying to run manually there is no path set as the environment, and when you just do
python file.py
python doesn't know in which environment we are running, which is precisely why, we do sys.path.append, to add the sibling directory path to env.
In the case if you are using an IDE like spyder or pycharm. you have an option to set the directory in which your program is located, or in pycharm it creates the whole program as a package where all the paths are handled by IDE.

Python imports that work both when using a file as a module and when called directly

I was reading today about absolute and relative imports in Python, and I was wondering if there is a pythonic way of making an import work when the script that contains the import is used as a module and when it is called directly. As an example:
└── project
├── main.py
└── package
├── submoduleA.py
└── submoduleB.py
main.py
from package.submoduleA import submoduleAfunction
submoduleAfunction()
submoduleB.py
def submoduleBfunction():
print('Hi from submoduleBfunction')
submoduleA.py
# Absolute import, works when called directly: python3 package/submoduleA.py
from submoduleB import submoduleBfunction
# Relative import, works when imported as module: python3 main.py
# from .submoduleB import submoduleBfunction
def submoduleAfunction():
submoduleBfunction()
if __name__=="__main__":
submoduleAfunction()
As you see, I have there both absolute and relative imports. Each of them works in a specific situation. But is there a pythonic way of making it work for both cases? So far I've done the trick messing with sys.path, but I'm not sure that this will work in every situation (i.e. called from interpreter or in different OS's) or even if this goes against any recommendations.
import sys
import pathlib
sys.path.append(str(pathlib.Path(__file__).parent.resolve()))
# Absolute import, works when called directly: python3 package/submoduleA.py
from submoduleB import submoduleBfunction
Since submoduleA is the "sibling" of submoduleB, this is actually an implicit relative import. The style guide says of them:
Implicit relative imports should never be used and have been removed in Python 3.
The only reason it even works when running python3 package/submoduleA.py as a script is that Python prepends the directory of the script to sys.path. From the docs:
As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter.
That is, the directory /path/to/package gets injected to sys.path, allowing submoduleB to be imported directly.
Implicit relative imports should never be used, so what should you use instead? It doesn't matter, you can use the correct form of the absolute import:
from package.submoduleB import submoduleBfunction
or a relative import:
from .submoduleB import submoduleBfunction
Either is fine as long as package has been installed into the environment (installing the package is what will put the code into site-packages, which is on sys.path).
If you absolute must run package/submoduleA.py directly as a script, which is considered an anti-pattern, then you will need to use the absolute import version. Relative imports do not work in this case. You would need to use instead python3 -m package.submoduleA in order for the relative-import version to work.
Disclaimer:
Your mileage may vary and I realize that my answer does not fully answer the question (using relative imports). Mea maxima culpa (my bad).
In my defense: googling python avoid relative imports returns 540k hits.
I usually avoid relative imports because they seem to have too many special cases on just this type of things and I am a huge fan of running python files directly as scripts when appropriate.
For example, my constants.py file can be run as a script - it will breakpoint() and I can inspect its contents, a good deal of which come from environment variables.
Assuming that the parent directory for project is in your Python path, I would just use from project.package.submoduleA import submoduleAfunction. That works in all cases.
(One possible additional benefit I found with explicit imports is when I wanted to rename my top-level package (myproject to myproject2). Having everything explicit allowed me to quickly sed the changes. Note Wim's remarks in the comments below)
One possibility is to just enclose the absolute import in a try block and catch an ImportException. In case of an exception, you are presumably using it as a module so then do a relative import.
try:
​from submoduleB import submoduleBfunction
except ImportException:
from .submoduleB import submoduleBfunction

Resolving module conflict in python [duplicate]

Okay, the scenario is very simple. I have this file structure:
.
├── interface.py
├── pkg
│   ├── __init__.py
│   ├── mod1.py
│   ├── mod2.py
Now, these are my conditions:
mod2 needs to import mod1.
both interface.py and mod2 needs to be run independently as a main script. If you want, think interface as the actual program and mod2 as an internal tester of the package.
So, in Python 2 I would simply do import mod1 inside mod2.py and both python2 mod2.py and python2 interface.py would work as expected.
However, and this is the part I less understand, using Python 3.5.2, if I do import mod1; then I can do python3 mod2.py, but python3 interface.py throws: ImportError: No module named 'mod1' :(
So, apparently, python 3 proposes to use import pkg.mod1 to avoid collisions against built-in modules. Ok, If I use that I can do python3 interface.py; but then I can't python3 mod2.py because: ImportError: No module named 'pkg'
Similarly, If I use relative import:
from . import mod1 then python3 interface.py works; but mod2.py says SystemError: Parent module '' not loaded, cannot perform relative import :( :(
The only "solution", I've found is to go up one folder and do python -m pkg.mod2 and then it works. But do we have to be adding the package prefix pkg to every import to other modules within that package? Even more, to run any scripts inside the package, do I have to remember to go one folder up and use the -m switch? That's the only way to go??
I'm confused. This scenario was pretty straightforward with python 2, but looks awkward in python 3.
UPDATE: I have upload those files with the (referred as "solution" above) working source code here: https://gitlab.com/Akronix/test_python3_packages. Note that I still don't like it, and looks much uglier than the python2 solution.
Related SO questions I've already read:
Python -- import the package in a module that is inside the same package
How to do relative imports in Python?
Absolute import module in same package
Related links:
https://docs.python.org/3.5/tutorial/modules.html
https://www.python.org/dev/peps/pep-0328/
https://www.python.org/dev/peps/pep-0366/
TLDR:
Run your code with python -m pkg.mod2.
Import your code with from . import mod1.
The only "solution", I've found is to go up one folder and do python -m pkg.mod2 and then it works.
Using the -m switch is indeed the "only" solution - it was already the only solution before. The old behaviour simply only ever worked out of sheer luck; it could be broken without even modifying your code.
Going "one folder up" merely adds your package to the search path. Installing your package or modifying the search path works as well. See below for details.
But do we have to be adding the package prefix pkg to every import to other modules within that package?
You must have a reference to your package - otherwise it is ambiguous which module you want. The package reference can be either absolute or relative.
A relative import is usually what you want. It saves writing pkg explicitly, making it easier to refactor and move modules.
# inside mod1.py
# import mod2 - this is wrong! It can pull in an arbitrary mod2 module
# these are correct, they uniquely identify the module
import pkg.mod2
from pkg import mod2
from . import mod2
from .mod2 import foo # if pkg.mod2.foo exists
Note that you can always use <import> as <name> to bind your import to a different name. For example, import pkg.mod2 as mod2 lets you work with just the module name.
Even more, to run any scripts inside the package, do I have to remember to go one folder up and use the -m switch? That's the only way to go??
If your package is properly installed, you can use the -m switch from anywhere. For example, you can always use python3 -m json.tool.
echo '{"json":"obj"}' | python -m json.tool
If your package is not installed (yet), you can set PYTHONPATH to its base directory. This includes your package in the search path, and allows the -m switch to find it properly.
If you are in the executable's directory, you can execute export PYTHONPATH="$(pwd)/.." to quickly mount the package for import.
I'm confused. This scenario was pretty straightforward with python 2, but looks awkward in python 3.
This scenario was basically broken in python 2. While it was straightforward in many cases, it was difficult or outright impossible to fix in any other cases.
The new behaviour is more awkward in the straightforward case, but robust and reliable in any case.
I had similar problem.
I solved it adding
import sys
sys.path.insert(0,".package_name")
into the __init__.py file in the package folder.

Python package structure

I have a Python package with several subpackages.
myproject/
__init__.py
models/
__init__.py
...
controllers/
__init__.py
..
scripts/
__init__.py
myscript.py
Within myproject.scripts.myscript, how can I access myproject.models? I've tried
from myproject import models # No module named myproject
import models # No module named models
from .. import models # Attempted relative import in non-package
I've had to solve this before, but I can never remember how it's supposed to be done. It's just not intuitive to me.
This is the correct version:
from myproject import models
If it fails with ImportError: No module named foo it is because you haven't set PYTHONPATH to include the directory which contains myproject/.
I'm afraid other people will suggest tricks to let you avoid setting PYTHONPATH. I urge you to disregard them. This is why PYTHONPATH exists: to tell Python where to look for code to load. It is robust, reasonably well documented, and portable to many environments. Tricks people play to avoid having to set it are none of these things.
The explicit relative import will work even without PYTHONPATH being set, since it can just walk up the directory hierarchy until it finds the right place, it doesn't need to find the top and then walk down. However, it doesn't work in a script you pass as a command line argument to python (or equivalently, invoke directly with a #!/usr/bin/python line). This is because in both these cases, it becomes the __main__ module of the process. There's nowhere to walk up to from __main__ - it's already at the top! If you invoke the code in your script by importing that module, then it will be fine. That is, compare:
python myproject/scripts/myscript.py
to
python -c 'import myproject.scripts.myscript'
You can take advantage of this by not executing your script module directly, but creating a bin/myscript that does the import and perhaps calls a main function:
import myprojects.scripts.myscript
myprojects.scripts.myscript.main()
Compare to how Twisted's command line scripts are defined: http://twistedmatrix.com/trac/browser/trunk/bin/twistd
Your project is not in your path.
Option A
Install your package so that python can find it via its absolute name from anywhere (using from myproject import models )
Option B
Trickery to add the relative parent to your path
sys.path.append(os.path.abspath('..'))
The former option is recommended.

Categories

Resources