How can I store testing data for Python nosetests?

I want to write some tests for a python MFCC feature extractor for running with nosetest. As well as some lower-level tests, I would also like to be able to store some standard input and expected-output files with the unit tests.
At the moment we are hard-coding the paths to the files on our servers, but I would prefer the testing files (both input and expected-output) to be somewhere in the code repository so they can be kept under source control alongside the testing code.
The problem I am having is that I'm not sure where the best place to put the testing files would be, and how to know what that path is when nosetest calls each testing function. At the moment I am thinking of storing the testing data in the same folder as the tests and using __file__ to work out where that is (would that work?), but I am open to other suggestions.

I think that using __file__ to figure out where the test is located and storing data alongside it is a good idea. I'm doing the same for some tests that I write.
This:
os.path.dirname(os.path.abspath(__file__))
is probably the best you are going to get, and that's not bad. :-)
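For example, a minimal sketch of that pattern (the 'fixtures' directory and the helper name are illustrative assumptions, not from the question):
import os

# Directory that contains this test module.
HERE = os.path.dirname(os.path.abspath(__file__))

def load_fixture(name):
    # Read a data file stored next to the tests, kept under source control.
    with open(os.path.join(HERE, 'fixtures', name)) as f:
        return f.read()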

Based on the idea of using __file__, maybe you could use a module to help with the path construction. You could find all the files contained in the module directory and gather their names and paths in a dictionary for later use.
Create a module accessible to your tests, i.e. a directory beside your tests such as testData, where you can put your data files. In the __init__.py of this module, insert the following code.
import os
from os.path import join, dirname, abspath

testDataFiles = dict()
baseDir = dirname(abspath(__file__)) + os.path.sep
for root, dirs, files in os.walk(baseDir):
    # Map each file's path relative to the module to its absolute path.
    localDataFiles = [(join(root.replace(baseDir, ""), name), join(root, name))
                      for name in files]
    testDataFiles.update(dict(localDataFiles))
Assuming you called your module testData and it contains a file called data.txt, you can then use the following construct in your tests to obtain the path to the file. aFileOperation is assumed to be a function that takes a path parameter.
import unittest
from testData import testDataFiles

class ATestCase(unittest.TestCase):
    def test_Something(self):
        self.assertEqual(0, aFileOperation(testDataFiles['data.txt']))
It will also allow you to use subdirectories, such as:
    def test_SomethingInASubDir(self):
        self.assertEqual(0, aFileOperation(testDataFiles['subdir\\data.txt']))
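Note that the dictionary keys are built with os.path.join, so they use the platform's native separator: the 'subdir\\data.txt' lookup above is Windows-style, and on Unix the same file would be keyed as 'subdir/data.txt'.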

Related

How do I ensure that a python package module saves results to a sub-directory of that package?

I'm creating a package with the following structure
/package
__init__.py
/sub_package_1
__init__.py
other_stuff.py
/sub_package_2
__init__.py
calc_stuff.py
/results_dir
I want to ensure that calc_stuff.py will save results to /results_dir unless otherwise specified (yes, I'm not entirely certain having a results directory in my package is the best idea, but it should work well for now). However, since I don't know from where, or on which machine, calc_stuff.py will be run, I need the package, or at least calc_stuff.py, to know where it is saved.
So far the two approaches I have tried:
from os import path
saved_dir = path.join(path.dirname(__file__), 'results_dir')
and
from pkg_resources import resource_filename
filepath = resource_filename(__name__, 'results_dir')
have only given me paths relative to the root of the package.
What do I need to do to ensure a statement along the lines of:
pickle.dump(my_data, open(os.path.join(full_path,
                                       'results_dir',
                                       'results.pkl'), 'wb'))
will result in a pickle file being saved into results_dir?
Regarding "I'm not entirely certain having a results directory in my package is the best idea": me neither :)
But, if you were to put a function like the following inside a module in sub_package_2, it should return a path consisting of (module path minus filename, 'results_dir', the filename you passed the function as an argument):
def get_save_path(filename):
    import os
    return os.path.join(os.path.dirname(__file__), "results_dir", filename)
For example, get_save_path("foo.ext") returned this on my Windows machine:
C:\Users\me\workspaces\workspace-oxygen\test36\TestPackage\results_dir\foo.ext

python unittest xml files created in different folder when test fails

The following code results in the log file being written to different folders depending on whether the test passes or not. I have a test case with a single test, and during the run the test calls chdir().
If the test fails (an assert* fails), the XML file is written to the test's current directory. If the test passes, the XML file is written to the start folder. See the code snippet for how I specify the log file folder. Other than using full paths, is there a way to make Python unittest always write it to the start folder?
logFolderName = "TestMyStuff_detail-" + str(scriptPid)
unittest.main(testRunner=xmlrunner.XMLTestRunner(output=logFolderName),
              failfast=False)
Other than using full paths, is there a way to make python unittest always write it to the start folder?
Doubtful since relative paths will always be relative to the current working directory. If your test changes the current working directory, you're kind of out of luck.
With that said, it shouldn't be too hard to use a full path:
import os
cwd = os.getcwd()
localLogFolderName = "TestMyStuff_detail-" + str(scriptPid)
logFolderName = os.path.abspath(os.path.join(cwd, localLogFolderName))
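A minimal sketch of that, assuming scriptPid is the process id and xmlrunner is the unittest-xml-reporting package from the question:
import os
import unittest
import xmlrunner  # the runner used in the question

scriptPid = os.getpid()  # assumption: scriptPid identifies this run

# Resolve the output folder once, before any test can chdir() away.
logFolderName = os.path.abspath("TestMyStuff_detail-" + str(scriptPid))

if __name__ == "__main__":
    unittest.main(testRunner=xmlrunner.XMLTestRunner(output=logFolderName),
                  failfast=False)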
You could use a fixed path to write your output.
Something like
path_to_my_output_folder="/path/to/output/"
test1_write_xml(path_to_my_output_folder+"file1.xml")
test2_write_xml(path_to_my_output_folder+"file2.xml")
test3_write_xml(path_to_my_output_folder+"file3.xml")

Accessing resource files in Python unit tests & main code

I have a Python project with the following directory structure:
project/
project/src/
project/src/somecode.py
project/src/mypackage/mymodule.py
project/src/resources/
project/src/resources/datafile1.txt
In mymodule.py, I have a class (let's call it "MyClass") which needs to load datafile1.txt. This sort of works when I do:
open ("../resources/datafile1.txt")
Assuming the code that creates the MyClass instance created is run from somecode.py.
The gotcha however is that I have unit tests for mymodule.py which are defined in that file, and if I leave the relative pathname as described above, the unittest code blows up as now the code is being run from project/src/mypackage instead of project/src and the relative filepath doesn't resolve correctly.
Any suggestions for a best practice type approach to resolve this problem? If I move my testcases into project/src that clutters the main source folder with testcases.
I usually use this to get a relative path from my module. I've never tried it in a unittest, though.
import os
print(os.path.join(os.path.dirname(__file__),
                   '..',
                   'resources',
                   'datafile1.txt'))
Note: the .. trick works pretty well, but if you change your directory structure you will need to update that part.
On top of the above answers, I'd like to add some Python 3 tricks to make your tests cleaner.
With the help of the pathlib library, you can make your resource imports explicit in your tests. It even handles the separator difference between Unix (/) and Windows (\).
Let's say we have a folder structure like this:
`-- tests
|-- test_1.py <-- You are here!
|-- test_2.py
`-- images
|-- fernando1.jpg <-- You want to import this image!
`-- fernando2.jpg
You are in the test_1.py file, and you want to import fernando1.jpg. With the help of the pathlib library, you can read your test resource with object-oriented logic, as follows:
import os
from pathlib import Path

current_path = Path(os.path.dirname(os.path.realpath(__file__)))
# (equivalently, with pathlib alone: current_path = Path(__file__).resolve().parent)
image_path = current_path / "images" / "fernando1.jpg"
with image_path.open(mode='rb') as image:
    data = image.read()  # do what you want with your image object
But there are actually convenience methods to make your code more explicit than mode='rb', such as:
image_path.read_bytes()     # reads the file's contents as bytes
text_file_path.read_text()  # returns the text file's content as a string
And there you go!
In each directory that contains Python scripts, put a Python module that knows the path to the root of the hierarchy. It can define a single global variable with the relative path. Import this module in each script. Python searches the current directory first, so it will always use the version of the module in the current directory, which will hold the relative path to the root from that directory. Then use this to find your other files. For example:
# rootpath.py
rootpath = "../../../"
# in your scripts
import os
from rootpath import rootpath
datapath = os.path.join(rootpath, "src/resources/datafile1.txt")
If you don't want to put additional modules in each directory, you could use this approach:
Put a sentinel file in the top level of the directory structure, e.g. thisisthetop.txt. Have your Python script move up the directory hierarchy until it finds this file. Write all your pathnames relative to that directory.
Possibly some file you already have in the project directory can be used for this purpose (e.g. keep moving up until you find a src directory), or you can name the project directory in such a way to make it apparent.
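A minimal sketch of that walk-up search (the sentinel filename is the one suggested above; the helper name is an assumption):
import os

def find_project_root(sentinel="thisisthetop.txt"):
    # Walk up from this file's directory until the sentinel file is found.
    d = os.path.dirname(os.path.abspath(__file__))
    while not os.path.exists(os.path.join(d, sentinel)):
        parent = os.path.dirname(d)
        if parent == d:  # reached the filesystem root without finding it
            raise FileNotFoundError(sentinel)
        d = parent
    return d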
You can access files in a package using importlib.resources (mind the Python-version compatibility of the individual functions; there are backports available as importlib_resources), as described here. Thus, if you put your resources folder into your mypackage, like
project/src/mypackage/__init__.py
project/src/mypackage/mymodule.py
project/src/mypackage/resources/
project/src/mypackage/resources/datafile1.txt
you can access your resource file in code without having to rely on inferring file locations of your scripts:
import importlib.resources

file_path = importlib.resources.files('mypackage').joinpath('resources/datafile1.txt')
with open(file_path) as f:
    do_something_with(f)
Note, if you distribute your package, don't forget to include the resources/ folder when creating the package.
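With setuptools, that typically means declaring the files as package data; a minimal sketch of a setup.py fragment, with names assumed from the layout above:
# setup.py (fragment) -- names assumed from the layout above
from setuptools import setup, find_packages

setup(
    name='mypackage',
    packages=find_packages('src'),
    package_dir={'': 'src'},
    package_data={'mypackage': ['resources/*.txt']},
)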
A relative filepath is resolved against the current working directory, i.e. wherever you invoked the script from. I would suggest that you pass the relative path in as an argument to MyClass. This way, you can have different paths depending on which script is invoking MyClass, as in the sketch below.
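A minimal sketch of that suggestion (the default path and the attribute are illustrative assumptions):
class MyClass:
    def __init__(self, datafile="resources/datafile1.txt"):
        # The caller decides how the path should be resolved.
        with open(datafile) as f:
            self.data = f.read()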

Is it possible to 'import * from DIRECTORY', then somehow (anyhow) iterate over the loaded modules?

Let me explain the use case...
In a simple Python web application framework designed for Google App Engine, I'd like to have my models loaded automatically from a 'models' directory, so that all that's needed to add a model to the application is to place a file, user.py for example, which contains a class called 'User', in the 'models/' directory.
This being GAE, I can't read from the file system, so I can't just read the filenames that way; but it seems to me that I must be able to 'import * from models' (or some equivalent) and retrieve at least a list of the module names that were loaded, so I can subject them to further processing logic.
To be clear, I want this to be done WITHOUT having to maintain a separate list of these module names for the application to read from.
You can read from the filesystem in GAE just fine; you just can't write to the filesystem.
from models import * will only import modules listed in __all__ in models/__init__.py; there's no automatic way to import all modules in a package if they're not declared to be part of the package. You just need to read the directory (which you can do) and __import__() everything in it.
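A minimal sketch of that read-and-import approach (layout assumed from the question; runs from the directory containing models/):
import os

models_dir = os.path.join(os.path.dirname(__file__), 'models')
loaded_modules = []
for filename in os.listdir(models_dir):
    if filename.endswith('.py') and filename != '__init__.py':
        name = filename[:-3]
        # __import__ with fromlist returns the submodule itself.
        loaded_modules.append(__import__('models.' + name, fromlist=[name]))
# loaded_modules now holds the module objects for further processing.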
As explained in the Python tutorial, you cannot load all .py files from a directory unless you list them manually in the list named __all__ in the file __init__.py. One of the reasons why this is impossible is that it would not work well on case-insensitive file systems -- Python would not know in which case the module names should be used.
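For completeness, that explicit declaration lives in models/__init__.py and would look like this (module names are hypothetical):
# models/__init__.py
__all__ = ['user', 'post', 'comment']  # 'from models import *' imports exactly these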
Let me start by saying that I'm not familiar with Google App Engine, but the following code demonstrates how to import all python files from a directory. In this case, I am importing files from my 'example' directory, which contains one file, 'dynamic_file.py'.
import os
import imp
import glob

def extract_module_names(python_files):
    # 'example/dynamic_file.py' -> 'dynamic_file'
    module_names = []
    for py_file in python_files:
        module_name = os.path.basename(py_file)[:-3]
        module_names.append(module_name)
    return module_names

def load_modules(modules, py_files):
    # Load each file as a module and bind it to its name in globals().
    module_count = len(modules)
    for i in range(0, module_count):
        globals()[modules[i]] = imp.load_source(modules[i], py_files[i])

if __name__ == "__main__":
    python_files = glob.glob('example/*.py')
    module_names = extract_module_names(python_files)
    load_modules(module_names, python_files)
    dynamic_file.my_func()
Also, if you wish to iterate over these modules, you could modify the load_modules function to return a list of the loaded module objects, as sketched below.
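A variant of load_modules along those lines (imp is assumed to be imported as above):
def load_modules(modules, py_files):
    loaded = []
    for module_name, py_file in zip(modules, py_files):
        module = imp.load_source(module_name, py_file)
        globals()[module_name] = module
        loaded.append(module)
    return loaded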
Hope it helps.

I want to load all of the unit-tests in a tree, can it be done?

I have a hierarchical folder full of Python unit tests. They are all importable ".py" files which define TestCase objects. This folder contains thousands of files in many nested subdirectories and was written by somebody else. I do not have permission to change it; I just have to run it.
I want to generate a single TestSuite object which contains all of the TestCases in the folder. Is there an easy and elegant way to do this?
Thanks
The nose application may be useful for you, either directly, or to show how to implement this.
http://code.google.com/p/python-nose/ seems to be the home page.
Basically, what you want to do is walk the source tree (os.walk), use imp.load_module to load each module, use unittest.defaultTestLoader to load the tests from the module into a TestSuite, and then use that in whatever way you need to.
Or at least that's approximately what I do in my custom TestRunner implementation (bzr get http://code.liw.fi/coverage-test-runner/bzr/trunk).
Look at unittest.TestLoader (https://docs.python.org/library/unittest.html#loading-and-running-tests) and os.walk (https://docs.python.org/library/os.html#files-and-directories).
You should be able to traverse your package tree using the TestLoader to build a suite which you can then run.
Something along the lines of this:
import os
import imp
import unittest

runner = unittest.TextTestRunner()
superSuite = unittest.TestSuite()
for path, dirs, files in os.walk('path/to/tree'):
    # skip a CVS dir or whatever here, if needed
    for f in files:
        if not f.endswith('.py'):  # not a python file
            continue
        # loadTestsFromModule takes a module object, so import the file first.
        name = os.path.splitext(f)[0]
        module = imp.load_source(name, os.path.join(path, f))
        suite = unittest.defaultTestLoader.loadTestsFromModule(module)
        superSuite.addTests(suite)  # OR: runner.run(suite)
runner.run(superSuite)
You can either walk through the tree simply running each test (runner.run(suite)) or you can accumulate a superSuite of all individual suites and run the whole mass as a single test (runner.run(superSuite)).
You don't need to do both, but I included both sets of suggestions in the above (untested) code.
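For reference, unittest itself (since Python 2.7) bundles this walk-and-load logic as TestLoader.discover(); a minimal sketch:
import unittest

# Widen the pattern, since these files are not necessarily named test_*.py.
suite = unittest.TestLoader().discover('path/to/tree', pattern='*.py')
unittest.TextTestRunner().run(suite)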
The test directory of the Python Library source shows the way.
The README file describes how to write Python Regression Tests for library modules.
The regrtest.py module starts with:
"""Regression test.
This will find all modules whose name is "test_*" in the test
directory, and run them.
