Test discovery drops Python namespaces from relative imports?

I encountered a strange issue with unit tests in a namespaced package. Here's an example I built on GitHub. Here's the basic structure:
$ tree -P '*.py' src
src
└── namespace
    └── testcase
        ├── __init__.py
        ├── a.py
        ├── sub
        │   ├── __init__.py
        │   └── b.py
        └── tests
            ├── __init__.py
            └── test_imports.py

4 directories, 6 files
I would expect that relative imports within a namespaced package would maintain the namespace. Normally, that seems to be true:
$ cat src/namespace/testcase/a.py
print(__name__)
$ cat src/namespace/testcase/sub/b.py
print(__name__)
from ..a import *
$ python -c 'from namespace.testcase.sub import b'
namespace.testcase.sub.b
namespace.testcase.a
But if I involve a test, I get a surprise:
$ cat src/namespace/testcase/tests/test_imports.py
from namespace.testcase import a
from ..sub import b
$ python -m unittest discover src/namespace/
namespace.testcase.a
testcase.sub.b
testcase.a
----------------------------------------------------------------------
Ran 0 tests in 0.000s
OK
The code in src/namespace/testcase/a.py is getting run twice! In my case, this caused a singleton I had stubbed to be re-initialized as a real object, subsequently causing test failures.
Is this expected behavior? What is the correct usage here? Should I always avoid relative imports (and have to do global search-and-replace if my company decides to rename something?)

Problem: Overlapping sys.path entries
The duplicate imports with different module names happen when you have overlapping sys.path entries: that is, when sys.path contains both a parent directory and one of its subdirectories as separate entries. This situation is almost always an error: it makes Python see the child directory as a separate, unrelated root for imports, which leads to surprising behaviour.
In your example:
$ python -m unittest discover src/namespace/
namespace.testcase.a
testcase.sub.b
testcase.a
This means that both src and src/namespace ended up in sys.path, so that:
namespace.testcase.a was imported relative to src
testcase.sub.b and testcase.a were imported relative to src/namespace
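The mechanism is easy to reproduce in isolation. The sketch below builds a throwaway tree in a temp directory (the names mirror the question purely for illustration), puts both a parent directory and its child on sys.path, and imports the same file under two names; the module-level code runs twice and you get two distinct module objects:

```python
# Minimal reproduction of the double import: when sys.path contains both a
# parent directory and its child, the same file is importable under two names
# and its module-level code runs twice.
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "namespace", "testcase")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()  # namespace/ stays a PEP 420 namespace package
with open(os.path.join(pkg, "a.py"), "w") as f:
    f.write("VALUE = object()\n")  # stands in for a module-level singleton

sys.path[:0] = [root, os.path.join(root, "namespace")]  # overlapping roots!

import namespace.testcase.a  # resolved against root
import testcase.a            # resolved against root/namespace -> second copy

# Two distinct module objects for one file: the "singleton" exists twice.
print(namespace.testcase.a.VALUE is testcase.a.VALUE)  # False
```

This is exactly why the stubbed singleton in the question got re-initialized: the second import re-executes a.py under the name testcase.a.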
Why?
In this case, the overlapping sys.path entries happen because unittest discover is trying to be helpful: it defaults to assuming that the start directory for test discovery is also the top-level directory that your imports are relative to, and it will insert that top-level directory into sys.path if it's not already there, as a convenience. (…not so convenient, it turns out. 😔️)
Solution: Explicitly specify the correct top-level directory
You can explicitly specify the correct top-level directory with -t (--top-level-directory):
python -m unittest discover -t src -s src/namespace/
This will work as before, but won't treat src/namespace as a top-level directory to insert into sys.path.
Side note: The -s option prefix for src/namespace/ was implicit in the previous example: the above just makes it explicit.
(unittest discover has weird positional argument handling: it treats its first three positional arguments as values for -s, -p, and -t, in that order.)
Details
The code responsible for this lives in unittest/loader.py:
class TestLoader(object):
    def discover(self, start_dir, pattern='test*.py', top_level_dir=None):
        ...
        if top_level_dir is None:
            set_implicit_top = True
            top_level_dir = start_dir

        top_level_dir = os.path.abspath(top_level_dir)

        if not top_level_dir in sys.path:
            # all test modules must be importable from the top level directory
            # should we *unconditionally* put the start directory in first
            # in sys.path to minimise likelihood of conflicts between installed
            # modules and development versions?
            sys.path.insert(0, top_level_dir)
        ...
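The side effect is easy to observe directly: point discover() at an empty temp directory without passing a top_level_dir, and the directory ends up at the front of sys.path afterwards (a small sketch against the stdlib API quoted above):

```python
# Observing the sys.path insertion performed by TestLoader.discover() when
# top_level_dir is left to default to the start directory.
import sys
import tempfile
import unittest

start = tempfile.mkdtemp()  # empty directory; mkdtemp returns an absolute path
assert start not in sys.path

unittest.TestLoader().discover(start)  # finds no tests; top_level_dir = start

print(sys.path[0] == start)  # True: discover() put the start dir first
```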

I'm not sure exactly why unittest wouldn't respect your setup.py, but indeed it often does not (maybe a bug, or a real difficulty for the implementers). Or perhaps unittest is by design very "low level" and doesn't come with any of the bells and whistles you'd expect from something like pytest.
What you need to do is help unittest out and tell it where your package starts; use the --top-level-directory option for that (or -t for short).
This should work as you expect:
python -m unittest discover -t src/ src/namespace/
The issue is that you probably have something like this in your setup.py:
package_dir={"": "src"},
And unfortunately unittest is not "smart enough" to figure that out.
This is one example detail why I strongly prefer pytest to std-lib's unittest :)
pytest will go to greater lengths to "do the right thing", while not forcing you to be verbose in your test run invocation (for example: it auto-discovers recursively by default etc).
If you want to learn more about how unittest imports things, you can add this line to your a.py file:
assert __package__ == "namespace.testcase"
Then, run your tests without the -t src/ as you originally did -> you will see exactly where unittest's import goes wrong (the assertion fails). If you open that code, you will see that all it does is a simple __import__(name), where name is simply the thing it just found that could look like a test.
Tests are usually NOT in a package; a stricter project layout would be:
src/namespace/ # -> your project or lib
tests/ # -> your tests
The above is "more strict" because it makes it harder to confuse your tests with your actual shipped code (ie: no accidental from ..tests import foo in the actual code).
Now, given this, a lot of testing tools like unittest and pytest will kind of assume that your tests don't really have a package, so they will import them as if the package doesn't matter at all...
Ie: they won't necessarily try to import test_foo.py as if it were under your main top-level name.
So, in theory you should (from my experience writing tests):
use relative imports from within your actual code only (ie: any non-test submodule)
use full absolute imports from the tests (that simplifies quite a few things for testing tools + it lets the tests treat your code "less intimately" -> it kinda forces you to import stuff from your namespace project like any other user would)
Hope that helps. I don't have handy links to docs on this (and maybe it would be worth a good book). But consider this: if you write this from your test:
from ..sub import b
You are taking a shortcut that a user of your library cannot take. Anyone who would pip install namespace, for example, would have to import b with an absolute import:
from namespace.testcase.sub import b
I find it helpful to isolate tests from the code itself. I know many projects just add a tests/ subfolder to their main code tree, but I do find that odd, since it ships the tests together with the published package, and one could technically import the tests just like the rest of the code... for example:
from namespace.testcase.tests import test_imports
An example of tests/ outside the main code tree is the requests package.
I followed the code, as this got me curious.
unittest discover looks for test cases; it finds testcase/, which looks like a test folder to it.
So it simply does a "standalone" import testcase (ie: regardless of any "top-level" context).
Then your test does this (all of these imports are simply cached in sys.modules, by name):
from namespace.testcase import a, which triggers the import of a as a submodule of namespace.testcase, as expected
but then it runs from ..sub import b; in unittest's context this expands to testcase.sub.b, which then leads to the confusion.

Related

sharing a module between tests and core - appropriate project structure

I am trying to improve the project structure while adding to a code base. I found a sample structure here which looks like this:
README.rst
LICENSE
setup.py
requirements.txt
sample/__init__.py
sample/core.py
sample/helpers.py
docs/conf.py
docs/index.rst
tests/test_basic.py
tests/test_advanced.py
I notice in particular that requirements.txt and setup.py are on a higher level than tests/ and sample/
If I add sample/classes.py, you need only write from classes import MyClass in sample/core.py to get it in there. It cannot, however, be so easily imported into tests/test_basic.py; it does not seem like Python 'looks around the corner' like that when importing.
In my case, there is also a MANIFEST.in on the same level with requirements.txt and some files which are not really python but just set things up for the platform on which this runs.
If classes.py were on the same level as requirements.txt, I think it would be easily importable by everything in tests/ and sample/ and their subdirectories, but it may need an __init__.py. That doesn't feel right somehow.
So where should it go if both tests/ and sample/ need to be able to use it?
Let's make it easy.
If I understand correctly, the problem is How to import simple module in test. Which means you want to use something like from simple.classes import MyClass.
That's easy, just add your root path to PYTHONPATH before executing python test/test_basic.py.
That's also what an IDE does for you when you execute tests through it.
Assuming you use Python >= 3.3, you can simply turn the tests folder into a package by adding an __init__.py module to it. Then, in that __init__.py (and only there), you add the path of the parent package to sys.path. That is enough for unittest discover to use it for all the modules in tests.
Mine is just:
import os
import sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
Then if you need to access classes.py from one of the test modules, you can just use:
from sample import classes
or to directly import MyClass:
from sample.classes import MyClass
It just works because sample is already a package, and its parent folder has been added to sys.path when python unittest has loaded the test package.
Of course, this only works if you can have your tests in a package. If for any reason that is not an option, for example because you need to run the test modules individually, then you should put the sys.path modification directly in all the test files.
Write a path_helper.py file in the tests folder:
import os
import sys
core_path = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
if core_path not in sys.path:  # don't add it if it is already here
    sys.path.append(core_path)
You can then import it in all test files:
import path_helper
...

Setting up imports and __init__.py in a multi-module project

The problem
I've found dozens of articles and tutorials about the basics of using import in Python, but none that would provide a comprehensive guide on setting up your own Python project with multiple packages.
This is my project's structure:
codename/
    __init__.py
    package1.py (has class1 and is a script)
    package2.py (has class2)
    package3.py (has function1 and is a script)
    test/
        __init__.py
        test_package1.py (has unit tests for package1)
        test_package3.py (has unit tests for package3)
How do I setup my imports to have the following requirements met (and do all of them make sense?):
class1, class2 and function1 are in namespace codename, i.e. this works:
import codename
obj = codename.class1()
codename.function1(obj)
they may be imported the same way using from codename import * or from codename import class1
function1 can easily access class1 (how?)
package1 and package3 are executable scripts
so are test_package1.py and test_package3.py
tests are also executable via python -m unittest discover
scripts are also executable via python -m codename.package1
For some reasons I'm having issues with having all of these met and when I try to fix one issue, another one pops out.
What have I tried?
Leaving codename/__init__.py empty satisfies almost all of the requirements, because everything works, but leaves names like class1 in their module's namespaces - whereas I want them imported into the package.
Adding from codename.package1 import class1 et al again satisfies most of the requirements, but I get a warning when executing the script via python -m codename.package1:
RuntimeWarning: 'codename.package2' found in sys.modules \
after import of package 'codename', but prior to execution of \
'codename.package2'; this may result in unpredictable behaviour
which sort of makes sense...
Running the script via python codename/package1.py functions, but I guess I would probably like both ways to work.
I ran into an answer to a similar question that stated that internal modules should not also be scripts, but I don't understand why we get the -m switch then? Anyway, extracting the mains into an external scripts directory works, but is it the only canonical way of setting all of this up?
You'll need to add the parent directory of codename/ to the PYTHONPATH environment variable (or write/use a setup.py file, or modify sys.path at runtime).
You'll need to import all names that you want to export in codename/__init__.py:
from .package1 import function1 if you write/use a setup.py file, otherwise from codename.package1 import function1
You should use a setup.py file for scripts/executables, since it makes everything much cleaner (and you'll need a setup.py file sooner or later anyway).
For the tests (requirements 5 and 6) I would suggest using py.test: it will find all tests for you automagically (and can run them in parallel, etc.)
That should work out-of-the-box, but if you've written a setup.py then you can run them from anywhere (and on any platform) as just package1.
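A concrete sketch of the re-export point above, with the module bodies invented as stand-ins (the real package1.py/package3.py obviously contain more); the package is built in a temp directory only so the example is self-contained:

```python
# Re-export public names in codename/__init__.py so that
# "import codename; codename.class1()" works.
import os
import sys
import tempfile
import textwrap

root = tempfile.mkdtemp()
pkg = os.path.join(root, "codename")
os.makedirs(pkg)
with open(os.path.join(pkg, "package1.py"), "w") as f:
    f.write("class class1:\n    pass\n")
with open(os.path.join(pkg, "package3.py"), "w") as f:
    f.write("def function1(obj):\n    return type(obj).__name__\n")
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write(textwrap.dedent("""\
        # Lift the public names into the package namespace
        from .package1 import class1
        from .package3 import function1
        __all__ = ["class1", "function1"]
    """))

sys.path.insert(0, root)  # stand-in for PYTHONPATH / a setup.py install

import codename

obj = codename.class1()
print(codename.function1(obj))  # class1
```

With this __init__.py, from codename import * and from codename import class1 both work as the question requires.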

Nosetests Import Error

I'm trying to use nosetests to run my tests in a directory structure like this
src
- file1.py
- ...
test
- helper.py
- test_file1.py
As you can see, test_file1.py has some functions that test file1.py, so it imports file1.py like this:
# In test_file1.py
import file1
import helper
# Tests go here...
I also use a helper.py file that has some neat functionality built in so that I can create tests more easily. This functionality is achieved by extending a couple of classes in my actual code and overriding some methods. So helper.py looks something like this:
# In helper.py
import file1
# Use stuff in file1.py
I'm having trouble understanding how nose goes about importing these things with its custom importer. I was able to get my test file to import file1.py by running nosetest ../tests within the src directory, but I'm currently getting an error akin to:
File helper.py:
ImportError: cannot import name file1
How does nose do its imports and is there a way I can essentially get it to lump all my tests/src files together so they can all import one another while I keep them in separate folders?
Seeing that you execute tests with nosetests ../tests I assume they are executed from the tests folder itself. Therefore, files from the src directory are not added to sys.path, hence the error.
To fix this one could:
run tests from the parent directory - nosetests will be able to identify the src and test (or tests) directories by itself and will add them to sys.path before running tests
add src directory path to the PYTHONPATH before running nosetests (export PYTHONPATH=../src; nosetests)
Note that you can as well omit the last argument to nosetests, as by default it runs the tests from the current directory. Otherwise, if the tests are not in the directory you launch nosetests from, you can define their location with the --where=<path-to-tests> parameter (or, simply, -w). So, for example, you can execute tests from the src directory, without even setting PYTHONPATH (because the current directory will be added to sys.path by default), like this: nosetests -w ../tests.
Lastly, even though this is itself debatable, the most common way to organize Python source code is to have the Python files and packages start directly in the project directory, and to keep tests in "test" subpackages of the packages they test. So, in your case, it would be:
/file1.py
/test/helper.py
/test/test_file1.py
or better:
/myproject/__init__.py
/myproject/file1.py
/myproject/test/__init__.py
/myproject/test/helper.py
/myproject/test/test_file1.py
(the latter, provided you also use correct imports in your test sources, e.g. from .. import file1).
In which case one runs tests from the project's root directory simply with nosetests without any argument.
Anyway, nosetests is flexible enough to work with any structure - use whatever seems more suitable for you and the project.
More on project structure in What is the best project structure for a Python application?
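The in-package tests layout above can be exercised end to end with the standard library alone; since nose is long unmaintained, this sketch runs the same structure through unittest discovery instead, which follows the same import rules (temp directory, invented file contents):

```python
# Tests living in a "test" subpackage, using a relative import, discovered
# by running "python -m unittest discover" from the project root.
import os
import subprocess
import sys
import tempfile
import textwrap

root = tempfile.mkdtemp()
tests = os.path.join(root, "myproject", "test")
os.makedirs(tests)
open(os.path.join(root, "myproject", "__init__.py"), "w").close()
open(os.path.join(tests, "__init__.py"), "w").close()
with open(os.path.join(root, "myproject", "file1.py"), "w") as f:
    f.write("def double(x):\n    return 2 * x\n")
with open(os.path.join(tests, "test_file1.py"), "w") as f:
    f.write(textwrap.dedent("""\
        import unittest
        from .. import file1  # works: the tests are a subpackage of myproject

        class TestFile1(unittest.TestCase):
            def test_double(self):
                self.assertEqual(file1.double(3), 6)
    """))

result = subprocess.run([sys.executable, "-m", "unittest", "discover"],
                        capture_output=True, text=True, cwd=root)
print(result.returncode)  # 0: one test discovered and passed
```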
This seems generally like an issue I had with nose tests:
Importing with Python and Nose Tests
The work around I found was to insert a try..except block so that BOTH python and nosetest commands will work on the same directory as follows:
(1) In your main file, at the very top before anything else add:
# In file1.py
try:
    # This will allow you to do "python file1.py" inside the src directory
    from file2 import *
    from helper import *
except ImportError:
    # This will allow you to run nosetests in the directory just above
    # the src and test directories.
    from src.file2 import *
    from src.helper import *
(2) Inside your test.py file add:
from src.file2 import *
from src.helper import *

Relative import in Python 3 is not working [duplicate]

This question already has answers here:
Python3 correct way to import relative or absolute?
(2 answers)
Closed 2 years ago.
I have the following directory:
mydirectory
├── __init__.py
├── file1.py
└── file2.py
I have a function f defined in file1.py.
If, in file2.py, I do
from .file1 import f
I get the following error:
SystemError: Parent module '' not loaded, cannot perform relative import
Why? And how to make it work?
Launching modules inside a package as executables is a bad practice.
When you develop something you either build a library, which is intended to be imported by other programs and thus it doesn't make much sense to allow executing its submodules directly, or you build an executable in which case there's no reason to make it part of a package.
This is why in setup.py you distinguish between packages and scripts. The packages will go under site-packages while the scripts will be installed under /usr/bin (or similar location depending on the OS).
My recommendation is thus to use the following layout:
/
├── mydirectory
| ├── __init__.py
| ├── file1.py
└── file2.py
Where file2.py imports file1.py as any other code that wants to use the library mydirectory, with an absolute import:
from mydirectory.file1 import f
When you write a setup.py script for the project you simply list mydirectory as a package and file2.py as a script and everything will work. No need to fiddle with sys.path.
If you ever, for some reason, really want to actually run a submodule of a package, the proper way to do it is to use the -m switch:
python -m mydirectory.file1
This loads the whole package and then executes the module as a script, allowing the relative import to succeed.
I'd personally avoid doing this, also because a lot of people don't even know you can, and they will end up getting the same error as you and thinking that the package is broken.
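The contrast between the two invocations is easy to demonstrate with a throwaway package built in a temp directory (names mirror the question; file contents are invented stand-ins):

```python
# Contrast "python pkg/file2.py" (relative import fails: the module runs as
# __main__ with no parent package) with "python -m pkg.file2" (works).
import os
import subprocess
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "mydirectory")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "file1.py"), "w") as f:
    f.write("def f():\n    return 'f'\n")
with open(os.path.join(pkg, "file2.py"), "w") as f:
    f.write("from .file1 import f\nprint(f())\n")

# Direct execution: the relative import raises (no known parent package)
direct = subprocess.run([sys.executable, os.path.join(pkg, "file2.py")],
                        capture_output=True, text=True)
# -m execution from the project root: the package is imported first
dashm = subprocess.run([sys.executable, "-m", "mydirectory.file2"],
                       capture_output=True, text=True, cwd=root)

print(direct.returncode != 0, dashm.stdout.strip())  # True f
```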
Regarding the currently accepted answer, which says that you should just use an implicit relative import from file1 import f because it will work since they are in the same directory:
This is wrong!
It will not work in python3 where implicit relative imports are disallowed and will surely break if you happen to have installed a file1 module (since it will be imported instead of your module!).
Even if it works the file1 will not be seen as part of the mydirectory package. This can matter.
For example if file1 uses pickle, the name of the package is important for proper loading/unloading of data.
When launching a Python source file directly, it is forbidden to import another file from the same package using a relative import.
In documentation it is said:
Note that relative imports are based on the name of the current module. Since the name of the main module is always "__main__", modules intended for use as the main module of a Python application must always use absolute imports.
So, as #mrKelley said, you need to use absolute import in such situation.
Since file1 and file2 are in the same directory, you don't even need to have an __init__.py file. If you're going to be scaling up, then leave it there.
To import something in a file in the same directory, just do like this
from file1 import f
i.e., you don't need to do the relative path .file1 because they are in the same directory.
If your main function, script, or whatever, that will be running the whole application is in another directory, then you will have to make everything relative to wherever that is being executed.
myproject/
├── mypackage/
│   ├── __init__.py
│   ├── file1.py
│   ├── file2.py
│   └── file3.py
└── mymainscript.py
Example to import from one file to another
#file1.py
from mypackage import file2
from mypackage.file3 import MyClass
Import the package example to the mainscript
#mymainscript.py
import mypackage
https://docs.python.org/3/tutorial/modules.html#packages
https://docs.python.org/3/reference/import.html#regular-packages
https://docs.python.org/3/reference/simple_stmts.html#the-import-statement
https://docs.python.org/3/glossary.html#term-import-path
The variable sys.path is a list of strings that determines the interpreter’s search path for modules. It is initialized to a default path taken from the environment variable PYTHONPATH, or from a built-in default if PYTHONPATH is not set. You can modify it using standard list operations:
import sys
sys.path.append('/ufs/guido/lib/python')
sys.path.insert(0, '/ufs/guido/myhaxxlib/python')
Inserting it at the beginning has the benefit of guaranteeing that the path is searched before others (even built-in ones) in the case of naming conflicts.

Python package structure

I have a Python package with several subpackages.
myproject/
__init__.py
models/
__init__.py
...
controllers/
__init__.py
..
scripts/
__init__.py
myscript.py
Within myproject.scripts.myscript, how can I access myproject.models? I've tried
from myproject import models # No module named myproject
import models # No module named models
from .. import models # Attempted relative import in non-package
I've had to solve this before, but I can never remember how it's supposed to be done. It's just not intuitive to me.
This is the correct version:
from myproject import models
If it fails with ImportError: No module named foo it is because you haven't set PYTHONPATH to include the directory which contains myproject/.
I'm afraid other people will suggest tricks to let you avoid setting PYTHONPATH. I urge you to disregard them. This is why PYTHONPATH exists: to tell Python where to look for code to load. It is robust, reasonably well documented, and portable to many environments. Tricks people play to avoid having to set it are none of these things.
The explicit relative import will work even without PYTHONPATH being set, since it can just walk up the directory hierarchy until it finds the right place, it doesn't need to find the top and then walk down. However, it doesn't work in a script you pass as a command line argument to python (or equivalently, invoke directly with a #!/usr/bin/python line). This is because in both these cases, it becomes the __main__ module of the process. There's nowhere to walk up to from __main__ - it's already at the top! If you invoke the code in your script by importing that module, then it will be fine. That is, compare:
python myproject/scripts/myscript.py
to
python -c 'import myproject.scripts.myscript'
You can take advantage of this by not executing your script module directly, but creating a bin/myscript that does the import and perhaps calls a main function:
import myproject.scripts.myscript
myproject.scripts.myscript.main()
Compare to how Twisted's command line scripts are defined: http://twistedmatrix.com/trac/browser/trunk/bin/twistd
Your project is not in your path.
Option A
Install your package so that python can find it via its absolute name from anywhere (using from myproject import models )
Option B
Trickery to add the relative parent to your path
import os
import sys
sys.path.append(os.path.abspath('..'))
The former option is recommended.
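If you do resort to Option B, a slightly more robust sketch anchors the path on the script's own location rather than on the current working directory, so the script works no matter where it is launched from (the globals() guard is only there so the snippet also runs where __file__ is undefined, e.g. in a REPL):

```python
import os
import sys

# Resolve relative to this file, not to wherever the process was started.
here = (os.path.dirname(os.path.abspath(__file__))
        if "__file__" in globals() else os.getcwd())
project_root = os.path.abspath(os.path.join(here, ".."))

if project_root not in sys.path:  # avoid duplicate entries
    sys.path.insert(0, project_root)

# from myproject import models  # now resolvable regardless of launch dir
```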
