Python: Run The Same Unittest module Tests For Multiple Files - python

I am attempting to create a simple framework that will discover all of the test cases from a specific directory (I am using unittest for these cases) and run each of these test cases against multiples python files that will all implement the same code with the same function signatures.
Autograder.py
TestCasesFolder/
TestCase1.py
TestCase2.py
...
ImplementationFolder/
Implementation1.py
SecondImplementationFolder/
Implementation2.py
The framework succesfully finds all of the test case using (note this is in the class)
self.suites = unittest.defaultTestLoader.discover(self.testDirectory)
From there, I would like to run these suites on both Implementation1 and Implementation2.
I have been using the built in
self.suites.run(unittest.TestResult)
method from unittest to run my tests, and my first attempt at solving this problem was to import the current implementation I wanted to test using
imp.load_source
and then update the global namespace for the TestCase1.py with the correct module reference. However, because each module has its own global namespace I'm not sure if I can hook into the other files namespace. I am also not sure if this the correct approach, or if there is a better way than my implementation. How should I go about doing this?
EDIT
My current solution that seems to work is for the Autograder.py file to update the __builtins__ module with a reference to the Implementation module. The actual line looks like:
__builtins__.ImplementationModule = imp.load_source("Implementation Module", "Implementation1.py")
This means when the TestCase1.py has access to ImplementationModule through __builtins__. Of course the problem is this assumes that the __builtins__ module never implements anything that has the name ImplementationModule otherwise I will overwrite it with unknown implications. Is there a less risky version of doing this?

Have you looked at the nose system? It sounds very similar to what you are doing.
http://readthedocs.org/docs/nose/

Related

Choose Python classes to instantiate at runtime based on either user input or on command line parameters

I am starting a new Python project that is supposed to run both sequentially and in parallel. However, because the behavior is entirely different, running in parallel would require a completely different set of classes than those used when running sequentially. But there is so much overlap between the two codes that it makes sense to have a unified code and defer the parallel/sequential behavior to a certain group of classes.
Coming from a C++ world, I would let the user set a Parallel or Serial class in the main file and use that as a template parameter to instantiate other classes at runtime. In Python there is no compilation time so I'm looking for the most Pythonic way to accomplish this. Ideally, it would be great that the code determines whether the user is running sequentially or in parallel to select the classes automatically. So if the user runs mpirun -np 4 python __main__.py the code should behave entirely different than when the user calls just python __main__.py. Somehow it makes no sense to me to have if statements to determine the type of an object at runtime, there has to be a much more elegant way to do this. In short, I would like to avoid:
if isintance(a, Parallel):
m = ParallelObject()
elif ifinstance(a, Serial):
m = SerialObject()
I've been reading about this, and it seems I can use factories (which somewhat have this conditional statement buried in the implementation). Yet, using factories for this problem is not an option because I would have to create too many factories.
In fact, it would be great if I can just "mimic" C++'s behavior here and somehow use Parallel/Serial classes to choose classes properly. Is this even possible in Python? If so, what's the most Pythonic way to do this?
Another idea would be to detect whether the user is running in parallel or sequentially and then load the appropriate module (either from a parallel or sequential folder) with the appropriate classes. For instance, I could have the user type in the main script:
from myPackage.parallel import *
or
from myPackage.serial import *
and then have the parallel or serial folders import all shared modules. This would allow me to keep all classes that differentiate parallel/serial behavior with the same names. This seems to be the best option so far, but I'm concerned about what would happen when I'm running py.test because some test files will load parallel modules and some other test files would load the serial modules. Would testing work with this setup?
You may want to check how a similar issue is solved in the stdlib: https://github.com/python/cpython/blob/master/Lib/os.py - it's not a 100% match to your own problem, nor the only possible solution FWIW, but you can safely assume this to be a rather "pythonic" solution.
wrt/ the "automagic" thing depending on execution context, if you decide to go for it, by all means make sure that 1/ both implementations can still be explicitely imported (like os.ntpath and os.posixpath) so they are truly unit-testable, and 2/ the user can still manually force the choice.
EDIT:
So if I understand it correctly, this file you points out imports modules depending on (...)
What it "depends on" is actually mostly irrelevant (in this case it's a builtin name because the target OS is known when the runtime is compiled, but this could be an environment variable, a command line argument, a value in a config file etc). The point was about both conditional import of modules with same API but different implementations while still providing direct explicit access to those modules.
So in a similar way, I could let the user type from myPackage.parallel import * and then in myPackage/init.py I could import all the required modules for the parallel calculation. Is this what you suggest?
Not exactly. I posted this as an example of conditional imports mostly, and eventually as a way to build a "bridge" module that can automagically select the appropriate implementation at runtime (on which basis it does so is up to you).
The point is that the end user should be able to either explicitely select an implementation (by explicitely importing the right submodule - serial or parallel and using it directly) OR - still explicitely - ask the system to select one or the other depending on the context.
So you'd have myPackage.serial and myPackage.parallel (just as they are now), and an additional myPackage.automagic that dynamically selects either serial or parallel. The "recommended" choice would then be to use the "automagic" module so the same code can be run either serial or parallel without the user having to care about it, but with still the ability to force using one or the other where it makes sense.
My fear is that py.test will have modules from parallel and serial while testing different files and create a mess
Why and how would this happen ? Remember that Python has no "process-global" namespace - "globals" are really "module-level" only - and that python's import is absolutely nothing like C/C++ includes.
import loads a module object (can be built directly from python source code, or from compiled C code, or even dynamically created - remember, at runtime a module is an object, instance of the module type) and binds this object (or attributes of this object) into the enclosing scope. Also, modules are garanteed (with a couple caveats, but those are to be considered as error cases) to be imported only once for a given process (and then cached) so importing the same module twice in a same process will yield the same object (IOW a module is a singleton).
All this means that given something like
# module A
def foo():
return bar(42)
def bar(x):
return x * 2
and
# module B
def foo():
return bar(33)
def bar(x):
return x / 2
It's garanteed that however you import from A and B, A.foo will ALWAYS call A.bar and NEVER call B.bar and B.foo will only ever call B.bar (unless you explicitely monkeyptach them of course but that's not the point).
Also, this means that within a module you cannot have access to the importing namespace (the module or function that's importing your module), so you cannot have a module depending on "global" names set by the importer.
To make a long story short, you really need to forget about C++ and learn how Python works, as those are wildly different languages with wildly different object models, execution models and idioms. A couple interesting reads are http://effbot.org/zone/import-confusion.htm and https://nedbatchelder.com/text/names.html
EDIT 2:
(about the 'automagic' module)
I would do that based on whether the user runs mpirun or just python. However, it seems it's not possible (see for instance this or this) in a portable way without a hack. Any ideas in that direction?
I've never ever had anything to do with mpi so I can't help with this - but if the general consensus is that there's no reliable portable way to detect this then obviously there's your answer.
This being said, simple stupid solutions are sometimes overlooked. In your case, explicitly setting an environment variable or passing a command-line switch to your main script would JustWork(tm), ie the user should for example use
SOMEFLAG=serial python main.py
vs
SOMEFLAG=parallel mpirun -np4 python main.py
or
python main.py serial
vs
mpirun -np4 python main.py parallel
(whichever works best for you needs - is the most easily portable).
This of course requires a bit more documentation and some more effort from the end-user but well...
I'm not really what you're asking here. Python classes are just (callable/instantiable) objects themselves, so you can of course select and use them conditionally. If multiple classes within multiple modules are involved, you can also make the imports conditional.
if user_says_parallel:
from myPackage.parallel import ParallelObject
ObjectClass = ParallelObject
else:
from myPackage.serial import SerialObject
ObjectClass = SerialObject
my_abstract_object = ObjectClass()
If that's very useful depends on your classes and the effort it takes to make sure they have the same API so they're compatible when replacing each other. Maybe even inheritance à la ParallelObject => SerialObject is possible, or at least a common (virtual) base class to put all the shared code. But that's just the same as in C++.

PyCharm procedural __all__ generation and syntax highlighting

I'm using this decorator to manage __all__ in a DRY manner:
def export(obj):
mod = sys.modules[obj.__module__]
if hasattr(mod, '__all__'):
mod.__all__.append(obj.__name__)
else:
mod.__all__ = [obj.__name__]
return obj
For names imported with import * PyCharm issues an unresolved reference error, which is understandable, since it doesn't run the code before analysis. But it is an obvious inconvenience.
How would you solve it (or maybe already solved)?
My assumptions:
Adding some automatic linter plugin or altering existing PyCharm's inspection code would be fine.
Something that's actually editing a .py source is viable, but not fine.
This method is probably not the best one, therefore suggesting another convenient technique of dealing with exports is fine too.
You may be interested in an alternative approach to managing __all__:
https://pypi.org/project/auto-all/
This provides a start_all() and end_all() function to place in your module around the items you want to make accessible. This approach works with PyCharms code inspection.
from auto_all import start_all, end_all
# Imports outside the start and end function calls are not included in __all__.
from pathlib import Path
def a_private_function():
print("This is a private function.")
# Start defining externally accessible objects.
start_all(globals())
def a_public_function():
print("This is a public function.")
# Stop defining externally accessible objects.
end_all(globals())
I feel like this is a reasonable approach to managing __all__, and one that I used on more complex packages. The source code for the package is small, so could easily be included direct in your code to avoid external dependencies if you need.
The reason I use this is I have some modules where lots of items need to be "exported" and a I want to keep imported items out of the export list. I have multiple developers working on the code and it's easy to add new items and forget to include them in __all__, so automating this helps.

(Monkey) patch for micro python unit test

I can’t find any examples, I will try to make the question specific:
Given that micropython has some form of unit test library, how can I do monkey patch or similar to replace system objects output or input within test cases.
The desire is to write test cases that cannot be mocked without altering the actual implementation code just for test, I.e network or file system objects replaced with mocks using patch - or a similar manual way of overriding the system objects for test purposes.
You can try the technique I laid out in https://forum.micropython.org/viewtopic.php?t=4475#p25925
# load in the module for patching (make sure this gets run before other imports)
import modulename
# create a modified version of modulename, cherry-picking from the real thing
patchedmodule = ...
# replace the module with your modified one for all future imports
sys.modules['modulename']=patchedmodule

Save A Reloaded Python Module For Testing Purposes

I have a Python module that I am testing, and because of the way that the module works (it does some initialization upon import) have been reloading the module during each unittest that is testing the initialization. The reload is done in the setUp method, so all tests are actually reloading the module, which is fine.
This all works great if I am only running tests in that file during any given Python session because I never required a reference to the previous instance of the module. But when I use Pydev or unittest's discover I get errors as seen here because other tests which import this module have lost their reference to objects in the module since they were imported before all of the reloading business in my tests.
There are similar questions around SO like this one, but those all deal with updating objects after reloads have occurred. What I would like to do is save the state of the module after the initial import, run my tests that do all of the reloading, and then in the test tearDown to put the initial reference to the module back so that tests that run downstream that use the module still have the correct reference. Note that I am not making any changes to the module, I am only reloading it to test some initialization pieces that it does.
There are also some solutions that include hooks in the module code which I am not interested in. I don't want to ask developers to push things into the codebase just so tests can run. I am using Python 2.6 and unittest. I see that some projects exist like process-isolation, and while I am not sure if that does entirely what I am asking for, it does not work for Python 2.6 and I don't want to add new packages to our stack if possible. Stub code follows:
import mypackage.mymodule
saved_module = mypackage.mymodule
class SomeTestThatReloads(unittest.TestCase):
def setUp(self):
reload(mypackage.mymodule)
def tearDown(self):
# What to do here with saved_module?
def test_initialization(self):
# testing scenario code
Unfortunately, there is no simple way to do that. If your module's initialization has side effects (and by the looks of it it does -- hooks, etc.), there is no automated way to undo them, short of entirely restarting the Python process.
Similarly, if anything in your code imports something from your module rather than the module itself (e.g. from my_package.my_module import some_object instead of import my_package.my_module), reloading the module won't do anything to the imported objects (some_object will refer to whatever my_package.my_module.some_object referred to when the import statement was executed, regardless of what you reload and what's on the disk).
The problem this all comes down to is that Python's module system works by executing the modules (which is full of side effects, the definition of classes/functions/variables being only one of many) and then exposing the top-level variables they created, and the Python VM itself treats modules as one big chunk of global state with no isolation.
Therefore, the general solution to your problem is to restart a new Python process after each test (which sucks :( ).
If your modules' initialization side effects are limited, you can try running your tests with Nose instead of Unittest (the tests are compatible, you don't have to rewrite anything), whose Isolate plugin attempts to do what you want: http://nose.readthedocs.org/en/latest/plugins/isolate.html
But it's not guaranteed to work in the general case, because of what I said above.

Import statement: Config file Python

I'm maintaining a dictionary and that is loaded inside the config file. The dictionary is loaded from a JSON file.
In config.py
name_dict = json.load(open(dict_file))
I'm importing this config file in several other scripts(file1.py, file2.py,...,filen.py) using
import config
statement. My question is when will the config.py script be executed ? I'm sure it wont be executed for every import call that is made inside my multiple scripts. But, what exactly happens when an import statement is called.
The top-level code in a module is executed once, the first time you import it. After that, the module object will be found in sys.modules, and the code will not be re-executed to re-generate it.
There are a few exceptions to this:
reload, obviously.
Accidentally importing the same module under two different names (e.g., if the module is in a package, and you've got some directory in the middle of the package in sys.path, you could end up with mypackage.mymodule and mymodule being two copies of the same thing, in which case the code gets run twice).
Installing import hooks/custom imported that replace the standard behavior.
Explicitly monkeying with sys.modules.
Directly calling functions out of imp/importlib or the like.
Certain cases with multiprocessing (and modules that use it indirectly, like concurrent.futures).
For Python 3.1 and later, this is all described in detail under The import system. In particular, look at the Searching section. (The multiprocessing-specific cases are described for that module.)
For earlier versions of Python, you pretty much have to infer the behavior from a variety of different sources and either reading the code or experimenting. However, the well-documented new behavior is intended to work like the old behavior except in specifically described ways, so you can usually get away with reading the 3.x docs even for 2.x.
Note that in general, you don't want to rely on whether top-level code in the module is run once or multiple times. For example, given a top-level function definition, as long as you never compare function objects, or rebind any globals that it (meaning the definition itself, not just the body) depends on, it doesn't make any difference. However, there are some exceptions to that, and loading start-time config files is a perfect example of an exception.

Categories

Resources