The problem
I've found dozens of articles and tutorials about the basics of using import in Python, but none that would provide a comprehensive guide on setting up your own Python project with multiple packages.
This is my project's structure:
codename/
    __init__.py
    package1.py (has class1 and is a script)
    package2.py (has class2)
    package3.py (has function1 and is a script)
    test/
        __init__.py
        test_package1.py (has unit tests for package1)
        test_package3.py (has unit tests for package3)
How do I set up my imports so that the following requirements are met (and do they all make sense?):
class1, class2 and function1 are in namespace codename, i.e. this works:
import codename
obj = codename.class1()
codename.function1(obj)
they may be imported the same way using from codename import * or from codename import class1
function1 can easily access class1 (how?)
package1 and package3 are executable scripts
so are test_package1.py and test_package3.py
tests are also executable via python -m unittest discover
scripts are also executable via python -m codename.package1
For some reason I'm having issues meeting all of these, and when I try to fix one issue, another pops up.
What have I tried?
Leaving codename/__init__.py empty satisfies almost all of the requirements, because everything works, but leaves names like class1 in their module's namespaces - whereas I want them imported into the package.
Adding from codename.package1 import class1 et al again satisfies most of the requirements, but I get a warning when executing the script via python -m codename.package1:
RuntimeWarning: 'codename.package2' found in sys.modules \
after import of package 'codename', but prior to execution of \
'codename.package2'; this may result in unpredictable behaviour
which sort of makes sense...
Running the script via python codename/package1.py works, but I would like both ways to work.
I ran into an answer to a similar question stating that internal modules should not also be scripts, but then I don't understand why the -m switch exists. Anyway, extracting the mains into an external scripts directory works, but is it the only canonical way of setting all of this up?
1. You'll need to add the parent directory of codename/ to the PYTHONPATH environment variable (or write/use a setup.py file, or modify sys.path at runtime).
2. You'll need to import all names that you want to export in codename/__init__.py.
3. Use from .package1 import function1 if you write/use a setup.py file, otherwise from codename.package1 import function1.
4. You should use a setup.py file for scripts/executables, since it makes everything much cleaner (and you'll need a setup.py file sooner or later anyway).
5. (and 6.) I would suggest using py.test; it will find all tests for you automagically (and can run them in parallel, etc.).
That should work out of the box, but if you've written a setup.py then you can run the scripts from anywhere (and on any platform) as just package1.
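Points 2 and 3 can be sketched end to end. Everything below is built in a throwaway directory purely for demonstration; the file contents are stand-ins based on the question's description, not the asker's actual code:

```python
# Sketch: build a throwaway "codename" package in a temp directory to show how
# re-exporting names in __init__.py lifts them into the package namespace.
import sys
import tempfile
import textwrap
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "codename"
pkg.mkdir()
(pkg / "package1.py").write_text("class class1:\n    pass\n")
(pkg / "package3.py").write_text(textwrap.dedent("""\
    from codename.package1 import class1  # function1 accesses class1 this way

    def function1(obj):
        return isinstance(obj, class1)
"""))
# Re-export the public names so they live directly in the codename namespace:
(pkg / "__init__.py").write_text(textwrap.dedent("""\
    from codename.package1 import class1
    from codename.package3 import function1
"""))

sys.path.insert(0, str(root))
import codename

obj = codename.class1()
print(codename.function1(obj))  # True
```

This satisfies the question's first requirement: users write codename.class1 and codename.function1 without ever naming the submodules.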
Related
I encountered a strange issue with unit tests in a namespaced package. Here's an example I built on GitHub. Here's the basic structure:
$ tree -P '*.py' src
src
└── namespace
└── testcase
├── __init__.py
├── a.py
├── sub
│ ├── __init__.py
│ └── b.py
└── tests
├── __init__.py
└── test_imports.py
4 directories, 6 files
I would expect that relative imports within a namespaced package would maintain the namespace. Normally, that seems to be true:
$ cat src/namespace/testcase/a.py
print(__name__)
$ cat src/namespace/testcase/sub/b.py
print(__name__)
from ..a import *
$ python -c 'from namespace.testcase.sub import b'
namespace.testcase.sub.b
namespace.testcase.a
But if I involve a test, I get a surprise:
$ cat src/namespace/testcase/tests/test_imports.py
from namespace.testcase import a
from ..sub import b
$ python -m unittest discover src/namespace/
namespace.testcase.a
testcase.sub.b
testcase.a
----------------------------------------------------------------------
Ran 0 tests in 0.000s
OK
The code in src/namespace/testcase/a.py is getting run twice! In my case, this caused a singleton I had stubbed to be re-initialized as a real object, subsequently causing test failures.
Is this expected behavior? What is the correct usage here? Should I always avoid relative imports (and have to do global search-and-replace if my company decides to rename something?)
Problem: Overlapping sys.path entries
The duplicate imports with different module names happen when you have overlapping sys.path entries: that is, when sys.path contains both a parent and a child directory as separate entries. This situation is almost always an error: it makes Python see the child directory as a separate, unrelated root for imports, which leads to surprising behaviour.
In your example:
$ python -m unittest discover src/namespace/
namespace.testcase.a
testcase.sub.b
testcase.a
This means that both src and src/namespace ended up in sys.path, so that:
namespace.testcase.a was imported relative to src
testcase.sub.b and testcase.a were imported relative to src/namespace
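The overlap can be reproduced in isolation. The tree below is built in a temp directory just for the demo; the point is that one file ends up as two distinct module objects under two names:

```python
# Minimal reproduction of "module imported twice under two names" caused by
# overlapping sys.path entries (both src and src/namespace on the path).
import sys
import tempfile
from pathlib import Path

src = Path(tempfile.mkdtemp()) / "src"
pkg = src / "namespace" / "testcase"
pkg.mkdir(parents=True)
for d in (src / "namespace", pkg):
    (d / "__init__.py").write_text("")
(pkg / "a.py").write_text("print(__name__)\n")

sys.path[:0] = [str(src), str(src / "namespace")]  # the overlapping entries

import namespace.testcase.a   # prints "namespace.testcase.a"
import testcase.a             # prints "testcase.a" -- the same file runs again

print(namespace.testcase.a.__file__ == testcase.a.__file__)  # True: one file
print(namespace.testcase.a is testcase.a)                    # False: two modules
```

sys.modules caches by name, not by file, so 'namespace.testcase.a' and 'testcase.a' are entirely separate entries and the module body executes once for each.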
Why?
In this case, the overlapping sys.path entries happen because unittest discover is trying to be helpful: it defaults to assuming that the start directory for test discovery is also the top-level directory that your imports are relative to, and it will insert that top-level directory into sys.path if it's not already there, as a convenience. (…not so convenient, it turns out. 😔️)
Solution: Explicitly specify the correct top-level directory
You can explicitly specify the correct top-level directory with -t (--top-level-directory):
python -m unittest discover -t src -s src/namespace/
This will work as before, but won't treat src/namespace as a top-level directory to insert into sys.path.
Side note: The -s option prefix for src/namespace/ was implicit in the previous example: the above just makes it explicit.
(unittest discover has weird positional argument handling: it treats its first three positional arguments as values for -s, -p, and -t, in that order.)
Details
The code responsible for this lives in unittest/loader.py:
class TestLoader(object):
def discover(self, start_dir, pattern='test*.py', top_level_dir=None):
...
if top_level_dir is None:
set_implicit_top = True
top_level_dir = start_dir
top_level_dir = os.path.abspath(top_level_dir)
if not top_level_dir in sys.path:
# all test modules must be importable from the top level directory
# should we *unconditionally* put the start directory in first
# in sys.path to minimise likelihood of conflicts between installed
# modules and development versions?
sys.path.insert(0, top_level_dir)
...
Not sure exactly why unittest wouldn't respect your setup.py, but indeed often it does not (maybe a bug, or a difficulty in doing so for the implementers). Or perhaps unittest is by design very "low level" and does not come with any bells or whistles you'd expect from something like pytest.
What you need to do is help unittest out and tell it where your package starts, use the --top-level-directory option for that (or -t for short).
This should work as you expect:
python -m unittest discover -t src/ src/namespace/
The issue is that you probably have something like this in your setup.py:
package_dir={"": "src"},
And unfortunately unittest is not "smart enough" to figure that out.
This is one example detail why I strongly prefer pytest to std-lib's unittest :)
pytest will go to greater lengths to "do the right thing", while not forcing you to be verbose in your test run invocation (for example: it auto-discovers recursively by default etc).
If you want to learn more about how unittest imports things, you can add this line to your a.py file:
assert __package__ == "namespace.testcase"
Then, run your test without the -t src/ option as you originally did, and you will see exactly where unittest goes wrong. If you open that code, you will see that all it does is call __import__(name), where name is simply the thing it just found that looks like a test.
Tests are usually NOT in a package, a more strict project layout would be like:
src/namespace/ # -> your project or lib
tests/ # -> your tests
The above is "more strict" because it makes it harder to confuse your tests with your actual shipped code (ie: no oopsie import ..tests.foo from the actual code).
Now, given this, a lot of testing tools like unittest and pytest, will kind of assume that your tests don't really have a package, so they will import them as-if the package doesn't matter at all...
Ie: they won't necessarily try and import test_foo.py as-if it was under your main top-level name.
So, in theory you should (from my experience writing tests):
use relative imports from within your actual code only (ie: any non-test submodule)
use full absolute import from the tests (that simplifies quite a few things for testing tools + it allows to treat your code "less intimately" from the tests -> kinda forces you to import stuff from your namespace project like any other user would do)
Hope that helps. I don't have handy links to docs on this (and maybe it would be worth a good book). But consider this: if you write this from your test:
from ..sub import b
You are taking shortcuts a user of your library cannot do. Anyone who would pip install namespace for example would have to import b with an absolute import:
from namespace.sub import b
It is helpful I find to isolate tests from the code itself. I know many projects do just add a tests/ subfolder to their main code tree, but I do find that odd, since that ships the tests together with the published package, and one could technically import the tests just like the rest of the code... for example:
from namespace.testcase.tests import test_imports
An example of tests/ outside the main code tree is the requests package.
Followed the code, as this got me curious.
unittest discover looks for test cases, it finds testcase/ which looks like a test folder to it.
So it simply does a "standalone" (ie: regardless of any "top-level" context) import testcase.
Then your test does this (all of these imports are simply cached in sys.modules, by name):
from namespace.testcase import a, which triggers the import of a as a submodule of namespace.testcase as expected
but then it calls from ..sub import b, now in unittest's context, this expands to testcase.sub.b, which then leads to the confusion.
First of all, there are a bunch of solutions on Stack Overflow regarding this, but none of the ones I tried works. I am working on a remote machine (Linux). I am prototyping within the dir-2/module_2.py file using an ipython interpreter. I am also trying to avoid absolute paths, as the absolute path on this remote machine is long and ugly, and I want my code to run on other machines upon download.
My directory structure is as follows:
/project-dir/
    /dir-1/
        __init__.py
        module_1.py
    /dir-2/
        __init__.py
        module_2.py
        module_3.py
Now I want to import module_1 from module_2. However, the solution mentioned in this stackoverflow post (link) of using
sys.path.append('../..')
import module_1
does not work. I get the error: ModuleNotFoundError: No module named 'module_1'
Moreover, within the ipython interpreter, things like import .module_3 within module_2 throw an error:
import .module_3
^ SyntaxError: invalid syntax
Isn't the dot operator supposed to work within the same directory as well? Overall I am quite confused by the importing mechanism. Any help with the initial problem is greatly appreciated! Thanks a lot!
Why it didn't work?
If you run the module_2.py file and you want to import module_1 then you need something like
sys.path.append("../dir-1")
If you use sys.path.append("../..") then the folder you added to the path is the folder containing project-dir, and there is no module_1.py file inside it.
The syntax import .module_3 is simply invalid; relative imports are written from . import module_3. Even then, a relative import does not work if you execute module_2.py directly, because you are using module_2.py as a script. To use relative imports you need to treat both module_2.py and module_3.py as modules: some other file imports module_2, and module_2 imports something from module_3 using this syntax.
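The difference between "run as a script" and "run as a module" can be shown concretely. The package and file names below are made up for the demo; the same file succeeds with python -m but fails when invoked as a plain script:

```python
# Demo: a relative import works under "python -m pkg.mod" but fails when the
# same file is run as a plain script, because scripts lack package context.
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "helper.py").write_text("VALUE = 42\n")
(pkg / "mod.py").write_text("from .helper import VALUE\nprint(VALUE)\n")

# Run as a module: the relative import resolves against the package.
ok = subprocess.run([sys.executable, "-m", "pkg.mod"],
                    cwd=root, capture_output=True, text=True)
print(ok.stdout.strip())            # 42

# Run as a script: the file becomes __main__ with no parent package.
bad = subprocess.run([sys.executable, str(pkg / "mod.py")],
                     capture_output=True, text=True)
print("ImportError" in bad.stderr)  # True
```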
Suggestion on how you can proceed
One possible solution that solves both problems is properly organizing the project and (optionally, but a good idea) packaging your library (that is, making your code "installable"). Then, once your library is installed (in the virtual environment you are working in), you don't need hacky sys.path solutions. You will be able to import your library from any folder.
Furthermore, don't treat your modules as scripts (don't run your modules). Use a separate python file as your "executable" (or entry point) and import everything you need from there. With this, relative imports in your module*.py files will work correctly and you don't get confused.
A possible directory structure could be
/project-dir/
    apps/
        main.py
    yourlib/
        __init__.py
        dir-1/
            __init__.py
            module_1.py
        dir-2/
            __init__.py
            module_2.py
            module_3.py
Notice that the yourlib folder, as well as its subfolders, contains an __init__.py file. With this structure, you only run main.py (the name does not need to be main.py).
Case 1: You don't want to package your library
If you don't want to package your library, then you can add sys.path.append("../") in main.py to add the project-dir/ folder to the path. With that, your yourlib library will be "importable" in main.py. You can do something like from yourlib import module_2 and it will work correctly (and module_2 can use relative imports). Alternatively, you can put main.py directly in the project-dir/ folder, and then you don't need to change sys.path at all, since project-dir/ will be the "working directory" in that case.
Note that you can also have a tests folder inside project-dir and to run a test file you can do the same as you did to run main.py.
Case 2: You want to package your library
The previous solution already solves your problems, but going the extra mile adds some benefits, such as dependency management and no need to change sys.path no matter where you are. There are several options to package your library and I will show one option using poetry due to its simplicity.
After installing poetry, you can run the command below in a terminal to create a new project
poetry new mylib
This creates the following folder structure
mylib/
- README.rst
- mylib/
- __init__.py
- pyproject.toml
- tests
You can then add the apps folder if you want, as well as subfolders inside mylib/ (each with a __init__.py file).
The pyproject.toml file specifies the dependencies and project metadata. You can edit it by hand and/or use poetry to add new dependencies, such as
poetry add pandas
poetry add --dev mypy
to add pandas as a dependency and mypy as a development dependency, for instance. After that, you can run
poetry install
to create a virtual environment and install your library in it. You can activate the virtual environment with poetry shell and you will be able to import your library from anywhere. Note that the library is installed in development mode, so you can change your library files without the need to run poetry install again.
At last, if you want to publish your library in PyPi for everyone to see you can use
poetry publish --username your_pypi_username --password _password_
TL; DR
Use an organized project structure with a clear place for the scripts you execute. Particularly, it is better if the script you execute is outside the folder with your modules. Also, don't run a module as a script (otherwise you can't use relative imports).
Given the directory structure:
/home/user/python/mypackage/src/foo.py
/home/user/python/mypackage/tests
/home/user/python/mypackage/tests/fixtures
/home/user/python/mypackage/tests/fixtures/config.json.sample
/home/user/python/mypackage/tests/foo_tests.py
/home/user/python/mypackage/README.md
Where src contains the source code and tests contains the unit tests, how do I set up a "package" so that the relative imports used in the unit tests located in tests/ can load classes in src/?
Similar questions: Python Relative Imports and Packages and Python: relative imports without packages or modules, but the first doesn't really answer my question (or I don't understand it) and the second relies on symlinks to hack it together (respectively).
I figured it out.
You have to have __init__.py in each of the folders like so:
/home/user/python/mypackage/src/__init__.py
/home/user/python/mypackage/src/Foo.py
/home/user/python/mypackage/tests
/home/user/python/mypackage/tests/fixtures
/home/user/python/mypackage/tests/fixtures/config.json.sample
/home/user/python/mypackage/tests/foo_test.py
/home/user/python/mypackage/tests/__init__.py
/home/user/python/mypackage/README.md
/home/user/python/mypackage/__init__.py
This tells python that we have "a package" in each of the directories including the top level directory. So, at this point, I have the following packages:
mypackage
mypackage.tests
mypackage.src
So, because python will only go "down into" directories, we have to execute the unit tests from the root of the top-most package, which in this case is:
/home/user/python/mypackage/
So, from here, I can execute python and tell it to execute the unittest module and then specify which tests I want it to perform by specifying the module using the command line options
python -m unittest tests.foo_test.TestFoo
This tells python:
Execute python and load the module unittest
Tell unittest to run the tests contained in the class TestFoo, which is in the file foo_test.py, which is in the tests directory.
Python is able to find it because __init__.py in each of these directories promotes them to a package that python and unittest can work with.
Lastly, foo_test.py must contain an import statement like:
from src import Foo
Because we are executing from the top level directory, AND we have packages setup for each of the subdirectories, the src package is available in the namespace, and can be loaded by a test.
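The layout and invocation above can be reproduced end to end. The file contents below are minimal stand-ins invented for the demo (the class is imported with from src.Foo import Foo rather than importing the module itself):

```python
# Build the mypackage layout in a temp dir and run the unittest invocation
# described above, from the top-level package directory.
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp()) / "mypackage"
(root / "src").mkdir(parents=True)
(root / "tests").mkdir()
for d in (root, root / "src", root / "tests"):
    (d / "__init__.py").write_text("")  # promote each directory to a package
(root / "src" / "Foo.py").write_text(
    "class Foo:\n"
    "    def bar(self):\n"
    "        return 1\n"
)
(root / "tests" / "foo_test.py").write_text(
    "import unittest\n"
    "from src.Foo import Foo\n"
    "\n"
    "class TestFoo(unittest.TestCase):\n"
    "    def test_bar(self):\n"
    "        self.assertEqual(Foo().bar(), 1)\n"
)

# Execute from the root of the top-most package, exactly as in the answer:
result = subprocess.run(
    [sys.executable, "-m", "unittest", "tests.foo_test.TestFoo"],
    cwd=root, capture_output=True, text=True)
print("OK" in result.stderr)  # True: unittest reports its summary on stderr
```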
I just got set up to use pytest with Python 2.6. It has worked well so far with the exception of handling "import" statements: I can't seem to get pytest to respond to imports in the same way that my program does.
My directory structure is as follows:
src/
main.py
util.py
test/
test_util.py
geom/
vector.py
region.py
test/
test_vector.py
test_region.py
To run, I call python main.py from src/.
In main.py, I import both vector and region with
from geom.region import Region
from geom.vector import Vector
In vector.py, I import region with
from geom.region import Region
These all work fine when I run the code in a standard run. However, when I call "py.test" from src/, it consistently exits with import errors.
Some Problems and My Solution Attempts
My first problem was that, when running "test/test_foo.py", py.test could not "import foo.py" directly. I solved this by using the "imp" tool. In "test_util.py":
import imp
util = imp.load_source("util", "util.py")
This works great for many files. It also seems to imply that when pytest is running "path/test/test_foo.py" to test "path/foo.py", it is based in the directory "path".
However, this fails for "test_vector.py". Pytest can find and import the vector module, but it cannot locate any of vector's imports. The following imports (from "vector.py") both fail when using pytest:
from geom.region import *
from region import *
These both give errors of the form
ImportError: No module named [geom.region / region]
I don't know what to do next to solve this problem; my understanding of imports in Python is limited.
What is the proper way to handle imports when using pytest?
Edit: Extremely Hacky Solution
In vector.py, I changed the import statement from
from geom.region import Region
to simply
from region import Region
This makes the import relative to the directory of "vector.py".
Next, in "test/test_vector.py", I add the directory of "vector.py" to the path as follows:
import sys, os
sys.path.append(os.path.realpath(os.path.dirname(__file__)+"/.."))
This enables Python to find "../region.py" from "geom/test/test_vector.py".
This works, but it seems extremely problematic because I am adding a ton of new directories to the path. What I'm looking for is either
1) An import strategy that is compatible with pytest, or
2) An option in pytest that makes it compatible with my import strategy
So I am leaving this question open for answers of these kinds.
The issue here is that Pytest walks the filesystem to discover files that contain tests, but then needs to generate a module name that will cause import to load that file. (Remember, files are not modules.)
Pytest comes up with this test package name by finding the first directory at or above the level of the file that does not include an __init__.py file and declaring that the "basedir" for the module tree containing a module generated from this file. It then adds the basedir to sys.path and imports using the module name that will find that file relative to the basedir.
There are some implications of this of which you should beware:
The basepath may not match your intended basepath in which case the module will have a name that doesn't match what you would normally use. E.g., what you think of as geom.test.test_vector will actually be named just test_vector during the Pytest run because it found no __init__.py in src/geom/test/ and so added that directory to sys.path.
You may run into module naming collisions if two files in different directories have the same name. For example, lacking __init__.py files anywhere, adding geom/test/test_util.py will conflict with test/test_util.py because both are loaded as import test_util, with both test/ and geom/test/ in the path.
The system you're using here, without explicit __init__.py modules, is having Python create implicit namespace packages for your directories. (A package is a module with submodules.) Ideally we'd configure Pytest with a path from which it would also generate this, but it doesn't seem to know how to do that.
The easiest solution here is simply to add empty __init__.py files to all of the subdirectories under src/; this will cause Pytest to import everything using package/module names that start with directory names under src/.
The question How do I Pytest a project using PEP 420 namespace packages? discusses other solutions to this.
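The basedir rule described above can be sketched as a small function: walk upward from the test file while directories contain __init__.py, and join the remaining parts into a module name. This is a simplification of what Pytest actually does, built here only to illustrate the rule:

```python
# Sketch of Pytest's basedir rule: the first ancestor directory *without* an
# __init__.py becomes the sys.path entry, and the path parts below it become
# the dotted module name.
import tempfile
from pathlib import Path

def pytest_style_name(path: Path):
    parts = [path.stem]
    d = path.parent
    while (d / "__init__.py").exists():
        parts.insert(0, d.name)
        d = d.parent
    return d, ".".join(parts)

root = Path(tempfile.mkdtemp())
tree = root / "src" / "geom" / "test"
tree.mkdir(parents=True)
(tree / "test_vector.py").write_text("")

# Without __init__.py files, the basedir is the test directory itself:
base, name = pytest_style_name(tree / "test_vector.py")
print(name)   # test_vector

# With __init__.py files under src/, the name becomes fully qualified:
(root / "src" / "geom" / "__init__.py").write_text("")
(tree / "__init__.py").write_text("")
base, name = pytest_style_name(tree / "test_vector.py")
print(name)   # geom.test.test_vector
```

This also shows why adding the __init__.py files fixes the collisions: the module names become distinct, and sys.path gains src/ rather than each test directory.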
import looks in the following directories to find a module:
The home directory of the program. This is the directory of your root script. When you are running pytest your home directory is where it is installed (/usr/local/bin probably). No matter that you are running it from your src directory because the location of your pytest determines your home directory. That is the reason why it doesn't find the modules.
PYTHONPATH. This is an environment variable. You can set it from the command line of your operating system. In Linux/Unix systems you can do this by executing: 'export PYTHONPATH=/your/custom/path' If you wanted Python to find your modules from the test directory you should include the src path in this variable.
The standard libraries directory. This is the directory where all your libraries are installed.
There is a less common option using a pth file.
sys.path is the result of combining the home directory, PYTHONPATH and the standard libraries directory. What you are doing, modifying sys.path is correct. It is something I do regularly. You could try using PYTHONPATH if you don't like messing with sys.path
If you include an __init__.py file inside your tests directory, then when the program is looking to set a home directory it will walk 'upwards' until it finds one that does not contain an init file. In this case src/.
From here you can import by saying :
from geom.region import *
You must also make sure that you have an __init__.py file in any other subdirectories, such as the other nested test directory.
I was wondering what to do about this problem too. After reading this post, and playing around a bit, I figured out an elegant solution. I created a file called "test_setup.py" and put the following code in it:
import sys, os
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
I put this file in the top-level directory (such as src). When pytest is run from the top-level directory, it will run all files prefixed with "test", including this one, even though it contains no tests.
The code appends the directory containing test_setup.py to the system path within the test environment. This is done only once, so the path doesn't accumulate a bunch of extra entries.
Then, from within any test function, you can import modules relative to that top-level folder (such as import geom.region) and it knows where to find it since the src directory was added to the path.
If you want to run a single test file (such as test_util.py) instead of all the files, you would use:
pytest test_setup.py test\test_util.py
This runs both the test_setup and test_util code so that the test_setup code can still be used.
I'm late to answer this question, but with Python 3.9 or 3.10 you just need to add an __init__.py file to your test folders.
When you add this file, Python interprets these folders as packages.
It would look like this:
src/
main.py
util.py
test/
__init__.py
test_util.py
geom/
vector.py
region.py
test/
__init__.py
test_vector.py
test_region.py
Then you can just run pytest.
Not the best solution, but maybe the fastest one:
cd path/python_folder
python -m pytest python_file.py
I have a Python package with several subpackages.
myproject/
__init__.py
models/
__init__.py
...
controllers/
__init__.py
..
scripts/
__init__.py
myscript.py
Within myproject.scripts.myscript, how can I access myproject.models? I've tried
from myproject import models # No module named myproject
import models # No module named models
from .. import models # Attempted relative import in non-package
I've had to solve this before, but I can never remember how it's supposed to be done. It's just not intuitive to me.
This is the correct version:
from myproject import models
If it fails with ImportError: No module named foo it is because you haven't set PYTHONPATH to include the directory which contains myproject/.
I'm afraid other people will suggest tricks to let you avoid setting PYTHONPATH. I urge you to disregard them. This is why PYTHONPATH exists: to tell Python where to look for code to load. It is robust, reasonably well documented, and portable to many environments. Tricks people play to avoid having to set it are none of these things.
The explicit relative import will work even without PYTHONPATH being set, since it can just walk up the directory hierarchy until it finds the right place, it doesn't need to find the top and then walk down. However, it doesn't work in a script you pass as a command line argument to python (or equivalently, invoke directly with a #!/usr/bin/python line). This is because in both these cases, it becomes the __main__ module of the process. There's nowhere to walk up to from __main__ - it's already at the top! If you invoke the code in your script by importing that module, then it will be fine. That is, compare:
python myproject/scripts/myscript.py
to
python -c 'import myproject.scripts.myscript'
You can take advantage of this by not executing your script module directly, but creating a bin/myscript that does the import and perhaps calls a main function:
import myproject.scripts.myscript
myproject.scripts.myscript.main()
Compare to how Twisted's command line scripts are defined: http://twistedmatrix.com/trac/browser/trunk/bin/twistd
Your project is not in your path.
Option A
Install your package so that python can find it via its absolute name from anywhere (using from myproject import models )
Option B
Trickery to add the relative parent to your path
sys.path.append(os.path.abspath('..'))
The former option is recommended.
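If you do go with Option B, a slightly more robust variant anchors the path on the file's own location rather than on the current working directory, so it behaves the same no matter where the script is launched from (a sketch, not tied to any particular layout):

```python
# Option B, made launch-directory independent: compute the package's parent
# from this file's location instead of relying on the working directory.
import os
import sys

parent = os.path.abspath(
    os.path.join(os.path.dirname(os.path.abspath(__file__)), "..")
)
if parent not in sys.path:   # avoid piling up duplicate entries
    sys.path.insert(0, parent)
```

This still counts as trickery compared to setting PYTHONPATH or installing the package, but it at least fails less mysteriously.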