Is there a way of unit testing what modules are imported in a Python file (a bit like ArchUnit in Java)? The context is in implementing a hexagonal architecture and wanting to ensure that the domain model does not import any code that resides in an adapter. I'd like unit tests to fail if there are forbidden imports.
For example, I might like to test that no modules within foo.bar.domain import anything from foo.bar.adapter. Imports of foo.bar.domain should be allowed from within foo.bar.adapter.
Is this possible in Python and what's the best way of achieving this?
You can use the -Ximporttime Python flag to trace imports. I'm not entirely sure what would be the logic for finding forbidden imports in your case, but here's a silly example script that might help:
import subprocess
import sys
process = subprocess.run(
('python3', '-Ximporttime', '-c', 'import ' + 'shlex'),
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
encoding='utf-8',
)
blacklisted_imports = {'enum', 're', 'zipfile'}
data = [
x.rpartition('|')[2].strip() for x in process.stderr.split('\n')
]
for import_ in data:
if import_ in blacklisted_imports:
print('found bad import:', import_)
Output:
found bad import: enum
found bad import: re
I am not aware that testing methods exist for this specific case, but someone might know more about it. One thing that comes to mind are try-catch with the methods from the other module being checked if you can call a method. Another hacky way, would be to add custom string constants in global context of the each module, and if they are exist you know that the submodule imported/used the other module.
Check more on this stack overflow post.
Related
In Python's typing module, they have a really helpful constant that's True when type checking, but False otherwise. This means, for example, that you can import classes dynamically if TYPE_CHECKING evaluates to True.
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from module import Class
It would be super useful if unittest had something similar. I can see in the __init__.py file, there exists a variable defined as __unittest = True:
__all__ = ['TestResult', 'TestCase', 'TestSuite',
'TextTestRunner', 'TestLoader', 'FunctionTestCase', 'main',
'defaultTestLoader', 'SkipTest', 'skip', 'skipIf', 'skipUnless',
'expectedFailure', 'TextTestResult', 'installHandler',
__unittest = True
Is there any way to use __unittest in the same way as TYPE_CHECKING from typing?
Reason for this: I have some user examples in my code-base which can be run and plot graphs. I would like to run these examples as part of the unit tests to see when they break and need fixing. I need a dynamic way of stopping the examples trying to open a plotting window and blocking the unit tests, however.
Any help very much appreciated!
Reason for this: I have some user examples in my code-base which can be run and plot graphs. I would like to run these examples as part of the unit tests to see when they break and need fixing. I need a dynamic way of stopping the examples trying to open a plotting window and blocking the unit tests, however.
The best way to achieve that is by mocking. This will replace functionality with some "mocked" functionality. This is generally used for "plotting code" or code that makes "requests" and such like. Because you can disable certain functionality that you don't need/want while testing.
It also avoids cluttering the production code with unittest-related stuff. For type checking this is needed because that happens at runtime but unittests generally don't happen at runtime or should have any effect at runtime.
You haven't said what kind of plotting you used but in case it's matplotlib they do have some documentation to facilitate testing or see this QA.
To check what you are asking (that seems if unittest has been imported) you can check if the key "unittest" is in sys.modules
import sys
import unittest
if "unittest" in sys.modules:
print("I'm unittest-ing")
For situations like this, I'd suggest using command-line arguments to control whether code is run or not.
In this case, you could have a simple module that defines a value like is_unit_testing based on whether the -unittest argument has been passed to the python process.
To handle commandline arguments, please look here: How do I access command line arguments in Python?
import sys
# sys.argv is a list of all commandline arguments you can check through
command_line_arguments_list = sys.argv # e.g. ["arg1", "arg2", "-unittest"]
is_unit_testing = "-unittest" in command_line_arguments_list
You can then pass the arguments as you'd expect via command-line:
python myModule.py arg1 arg2 -unittest
This works well for anything like this where you want multiple 'builds' from the same code base. Such as having a 'no database'/'no gui' mode etc.
I have a codebase where I'm cleaning up some messy decisions by the previous developer. Frequently, he has done something like:
from scipy import *
from numpy import *
...This, of course, pollutes the name space and makes it difficult to tell where an attribute in the module is originally from.
Is there any way to have Python analyze and fix this for me? Has anyone made a utility for this? If not, how might a utility like this be made?
I think PurityLake's and Martijn Pieters's assisted-manual solutions are probably the best way to go. But it's not impossible to do this programmatically.
First, you need to get a list of all names that existing in the module's dictionary that might be used in the code. I'm assuming your code isn't directly calling any dunder functions, etc.
Then, you need to iterate through them, using inspect.getmodule() to find out which module each object was originally defined in. And I'm assuming that you're not using anything that's been doubly from foo import *-ed. Make a list of all of the names that were defined in the numpy and scipy modules.
Now you can take that output and just replace each foo with numpy.foo.
So, putting it together, something like this:
for modname in sys.argv[1:]:
with open(modname + '.py') as srcfile:
src = srcfile.read()
src = src.replace('from numpy import *', 'import numpy')
src = src.replace('from scipy import *', 'import scipy')
mod = __import__(modname)
for name in dir(mod):
original_mod = inspect.getmodule(getattr(mod, name))
if original_mod.__name__ == 'numpy':
src = src.replace(name, 'numpy.'+name)
elif original_mod.__name__ == 'scipy':
src = src.replace(name, 'scipy.'+name)
with open(modname + '.tmp') as dstfile:
dstfile.write(src)
os.rename(modname + '.py', modname + '.bak')
os.rename(modname + '.tmp', modname + '.py')
If either of the assumptions is wrong, it's not hard to change the code. Also, you might want to use tempfile.NamedTemporaryFile and other improvements to make sure you don't accidentally overwrite things with temporary files. (I just didn't want to deal with the headache of writing something cross-platform; if you're not running on Windows, it's easy.) And add in some error handling, obviously, and probably some reporting.
Yes. Remove the imports and run a linter on the module.
I recommend using flake8, although it may also create a lot of noise about style errors.
Merely removing the imports and trying to run the code is probably not going to be enough, as many name errors won't be raised until you run just the right line of code with just the right input. A linter will instead analyze the code by parsing and will detect potential NameErrors without having to run the code.
This all presumes that there are no reliable unit tests, or that the tests do not provide enough coverage.
In this case, where there are multiple from module import * lines, it gets a little more painful in that you need to figure out for each and every missing name what module supplied that name. That will require manual work, but you can simply import the module in a python interpreter and test if the missing name is defined on that module:
>>> import scipy, numpy
>>> 'loadtxt' in dir(numpy)
True
You do need to take into account that in this specific case, that there is overlap between the numpy and scipy modules; for any name defined in both modules, the module imported last wins.
Note that leaving any from module import * line in place means the linter will not be able to detect what names might raise NameErrors!
I've now made a small utility for doing this which I call 'dedazzler'. It will find lines that are 'from module import *', and then expand the 'dir' of the target modules, replacing the lines.
After running it, you still need to run a linter. Here's the particularly interesting part of the code:
import re
star_match = re.compile('from\s(?P<module>[\.\w]+)\simport\s[*]')
now = str(time.time())
error = lambda x: sys.stderr.write(x + '\n')
def replace_imports(lines):
"""
Iterates through lines in a Python file, looks for 'from module import *'
statements, and attempts to fix them.
"""
for line_num, line in enumerate(lines):
match = star_match.search(line)
if match:
newline = import_generator(match.groupdict()['module'])
if newline:
lines[line_num] = newline
return lines
def import_generator(modulename):
try:
prop_depth = modulename.split('.')[1:]
namespace = __import__(modulename)
for prop in prop_depth:
namespace = getattr(namespace, prop)
except ImportError:
error("Couldn't import module '%s'!" % modulename)
return
directory = [ name for name in dir(namespace) if not name.startswith('_') ]
return "from %s import %s\n"% (modulename, ', '.join(directory))
I'm maintaining this in a more useful stand-alone utility form here:
https://github.com/USGM/dedazzler/
ok, this is what i think you could do, break the program. remove the imports and notice the errors that are made. Then import only the modules that you want, this may take a while but this is the only way I know of doing this, I will be happily surprised if someone does know of a tool to help
EDIT:
ah yes, a linter, I hadn't thought of that.
I'm refactoring and eliminating wildcard imports on some fairly monolithic code.
Pylint seems to do a great job of listing all the unused imports that come along with a wildcard import, but what i wish it did was provide a list of used imports so I can quickly replace the wildcard import. Any quick ways of doing this? I'm about to go parse the output of pyLint and do a set.difference() on this and the dir() of the imported module. But I bet there's some tool/procedure I'm not aware of.
NB: pylint does not recommend a set of used imports. When changing this, you have to be aware of other modules importing the code you are modifying, which could use symbols which belong to the namespace of the module you are refactoring only because you have unused imports.
I recommend the following procedure to refactor from foo import *:
in an interactive shell, type:
import re
import foo as module # XXX use the correct module name here!
module_name = module.__name__
import_line = 'from %s import (%%s)' % module_name
length = len(import_line) - 3
print import_line % (',\n' + length * ' ').join([a for a in dir(module)
if not re.match('__.*[^_]{2}', a)])
replace the from foo import * line with the one printed above
run pylint, and remove the unused imports flagged by pylint
run pylint again on the whole code based, looking for imports of non existing sympols
run your unit tests
repeat with from bar import *
Here's dewildcard, a very simple tool based on Alex's initial ideas:
https://github.com/quentinsf/dewildcard
This is an old question, but I wrote something that does this based on autoflake.
See here: https://github.com/fake-name/autoflake/blob/master/autostar.py
It works the opposite way dewildcard does, in that it attempts to fully qualify all uses of wildcard items.
E.g.
from os.path import *
Is converted to
import os.path
and all uses of os.path.<func> are prepended with the proper function.
Summary: when a certain python module is imported, I want to be able to intercept this action, and instead of loading the required class, I want to load another class of my choice.
Reason: I am working on some legacy code. I need to write some unit test code before I start some enhancement/refactoring. The code imports a certain module which will fail in a unit test setting, however. (Because of database server dependency)
Pseduo Code:
from LegacyDataLoader import load_me_data
...
def do_something():
data = load_me_data()
So, ideally, when python excutes the import line above in a unit test, an alternative class, says MockDataLoader, is loaded instead.
I am still using 2.4.3. I suppose there is an import hook I can manipulate
Edit
Thanks a lot for the answers so far. They are all very helpful.
One particular type of suggestion is about manipulation of PYTHONPATH. It does not work in my case. So I will elaborate my particular situation here.
The original codebase is organised in this way
./dir1/myapp/database/LegacyDataLoader.py
./dir1/myapp/database/Other.py
./dir1/myapp/database/__init__.py
./dir1/myapp/__init__.py
My goal is to enhance the Other class in the Other module. But since it is legacy code, I do not feel comfortable working on it without strapping a test suite around it first.
Now I introduce this unit test code
./unit_test/test.py
The content is simply:
from myapp.database.Other import Other
def test1():
o = Other()
o.do_something()
if __name__ == "__main__":
test1()
When the CI server runs the above test, the test fails. It is because class Other uses LegacyDataLoader, and LegacydataLoader cannot establish database connection to the db server from the CI box.
Now let's add a fake class as suggested:
./unit_test_fake/myapp/database/LegacyDataLoader.py
./unit_test_fake/myapp/database/__init__.py
./unit_test_fake/myapp/__init__.py
Modify the PYTHONPATH to
export PYTHONPATH=unit_test_fake:dir1:unit_test
Now the test fails for another reason
File "unit_test/test.py", line 1, in <module>
from myapp.database.Other import Other
ImportError: No module named Other
It has something to do with the way python resolves classes/attributes in a module
You can intercept import and from ... import statements by defining your own __import__ function and assigning it to __builtin__.__import__ (make sure to save the previous value, since your override will no doubt want to delegate to it; and you'll need to import __builtin__ to get the builtin-objects module).
For example (Py2.4 specific, since that's what you're asking about), save in aim.py the following:
import __builtin__
realimp = __builtin__.__import__
def my_import(name, globals={}, locals={}, fromlist=[]):
print 'importing', name, fromlist
return realimp(name, globals, locals, fromlist)
__builtin__.__import__ = my_import
from os import path
and now:
$ python2.4 aim.py
importing os ('path',)
So this lets you intercept any specific import request you want, and alter the imported module[s] as you wish before you return them -- see the specs here. This is the kind of "hook" you're looking for, right?
There are cleaner ways to do this, but I'll assume that you can't modify the file containing from LegacyDataLoader import load_me_data.
The simplest thing to do is probably to create a new directory called testing_shims, and create LegacyDataLoader.py file in it. In that file, define whatever fake load_me_data you like. When running the unit tests, put testing_shims into your PYTHONPATH environment variable as the first directory. Alternately, you can modify your test runner to insert testing_shims as the first value in sys.path.
This way, your file will be found when importing LegacyDataLoader, and your code will be loaded instead of the real code.
The import statement just grabs stuff from sys.modules if a matching name is found there, so the simplest thing is to make sure you insert your own module into sys.modules under the target name before anything else tries to import the real thing.
# in test code
import sys
import MockDataLoader
sys.modules['LegacyDataLoader'] = MockDataLoader
import module_under_test
There are a handful of variations on the theme, but that basic approach should work fine to do what you describe in the question. A slightly simpler approach would be this, using just a mock function to replace the one in question:
# in test code
import module_under_test
def mock_load_me_data():
# do mock stuff here
module_under_test.load_me_data = mock_load_me_data
That simply replaces the appropriate name right in the module itself, so when you invoke the code under test, presumably do_something() in your question, it calls your mock routine.
Well, if the import fails by raising an exception, you could put it in a try...except loop:
try:
from LegacyDataLoader import load_me_data
except: # put error that occurs here, so as not to mask actual problems
from MockDataLoader import load_me_data
Is that what you're looking for? If it fails, but doesn't raise an exception, you could have it run the unit test with a special command line tag, like --unittest, like this:
import sys
if "--unittest" in sys.argv:
from MockDataLoader import load_me_data
else:
from LegacyDataLoader import load_me_data
I need to make one function in a module platform-independent by offering several implementations, without changing any files that import it. The following works:
do_it = getattr(__import__(__name__), "do_on_" + sys.platform)
...but breaks if the module is put into a package.
An alternative would be an if/elif with hard-coded calls to the others in do_it().
Anything better?
Put the code for platform support in different files in your package. Then add this to the file people are supposed to import from:
if sys.platform.startswith("win"):
from ._windows_support import *
elif sys.platform.startswith("linux"):
from ._unix_support import *
else:
raise ImportError("my module doesn't support this system")
Use globals()['do_on_' + platform] instead of the getattr call and your original idea should work whether this is inside a package or not.
If you need to create a platform specific instance of an class you should look into the Factory Pattern:
link text
Dive Into Python offers the exceptions alternative.