Mocking Directory Structure in Python

I have some code below that takes a set of input files, opens and processes them, and then outputs some data. I've got the functionality working and I'm unit testing it now; below is an example of the code.
def foo(dir):
    # note: a leading "/" in the second argument would make join() discard dir
    path_to_search = join(dir, "baz/foo")
    if isdir(path_to_search):
        # path exists, so do stuff...
        for fname in listdir(path_to_search):
            do_stuff()
    else:
        print "path doesn't exist"
I've been able to create a test where the path doesn't exist easily enough, but as you can see above I assert that the "/baz/foo" portion of the directory structure exists (in production the directory structure must have this path; in some cases it won't and we won't need to process it).
I've tried to create a temporary directory structure using TempDir and join, but the code always kicks out saying the path doesn't exist.
Is it possible to mock the output of os.listdir such that I won't need to create a temporary directory structure that follows the needed /baz/foo convention?

You don't need to create a fake directory structure, all you need to do is mock the isdir() and listdir() functions.
Using the unittest.mock library (or the external mock library, which is the same thing backported for Python versions < 3.3):
try:
    # Python >= 3.3
    from unittest import mock
except ImportError:
    # Python < 3.3
    import mock

with mock.patch('yourmodule.isdir') as mocked_isdir, \
        mock.patch('yourmodule.listdir') as mocked_listdir:
    mocked_isdir.return_value = True
    mocked_listdir.return_value = ['filename1', 'filename2']

    yourmodule.foo('/spam/eggs')

    mocked_isdir.assert_called_with('/spam/eggs/baz/foo')
    mocked_listdir.assert_called_with('/spam/eggs/baz/foo')
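To make this concrete, here's a self-contained sketch of the same technique. The `yourmodule` module and its `foo` are stood up inline (and simplified to return the directory listing) purely for illustration; in real code you would simply `import yourmodule`:

```python
import sys
import types
from unittest import mock

# Stand-in for the asker's module, created inline so the example runs as-is.
yourmodule = types.ModuleType('yourmodule')
exec("""
from os.path import isdir, join
from os import listdir

def foo(dir):
    path_to_search = join(dir, "baz/foo")
    if isdir(path_to_search):
        return list(listdir(path_to_search))
    return None
""", yourmodule.__dict__)
sys.modules['yourmodule'] = yourmodule

# Patch the names as *seen by* yourmodule, not os / os.path themselves.
with mock.patch('yourmodule.isdir', return_value=True) as mocked_isdir, \
        mock.patch('yourmodule.listdir') as mocked_listdir:
    mocked_listdir.return_value = ['filename1', 'filename2']
    result = yourmodule.foo('/spam/eggs')

print(result)  # ['filename1', 'filename2']
```

The key point is the patch target string: you patch `yourmodule.isdir`, the reference your code actually calls, rather than `os.path.isdir`.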

Related

Is this the approved way to access data adjacent to/packaged with a Python script?

I have a Python script that needs some data that's stored in a file that will always be in the same location as the script. I have a setup.py for the script, and I want to make sure it's pip installable in a wide variety of environments, and can be turned into a standalone executable if necessary.
Currently the script runs with Python 2.7 and Python 3.3 or higher (though I don't have a test environment for 3.3 so I can't be sure about that).
I came up with this method to get the data. This script isn't part of a module directory with __init__.py or anything; it's just a standalone file that works if run with python directly, but also has an entry point defined in the setup.py file. It's all one file. Is this the correct way?
def fetch_wordlist():
    wordlist = 'wordlist.txt'
    try:
        import importlib.resources as res
        return res.read_binary(__file__, wordlist)
    except ImportError:
        pass
    try:
        import pkg_resources as resources
        req = resources.Requirement.parse('makepw')
        wordlist = resources.resource_filename(req, wordlist)
    except ImportError:
        import os.path
        wordlist = os.path.join(os.path.dirname(__file__), wordlist)
    with open(wordlist, 'rb') as f:
        return f.read()
This seems ridiculously complex. Also, it seems to rely on the package management system in ways I'm uncomfortable with. The script no longer works unless it's been pip-installed, and that also doesn't seem desirable.
Resources living on the filesystem
The standard way to read a file adjacent to your python script would be:
a) If you've got Python >= 3.4, I'd suggest you use the pathlib module, like this:
from pathlib import Path

def fetch_wordlist(filename="wordlist.txt"):
    return (Path(__file__).parent / filename).read_text()

if __name__ == '__main__':
    print(fetch_wordlist())
b) And if you're still on a Python version < 3.4, or you still want to use the good old os.path module, you should do something like this:
import os

def fetch_wordlist(filename="wordlist.txt"):
    with open(os.path.join(os.path.dirname(__file__), filename)) as f:
        return f.read()

if __name__ == '__main__':
    print(fetch_wordlist())
Also, I'd suggest you catch exceptions in the outer callers. The above methods are the standard way to read files in Python, so you don't need to wrap them in a function like fetch_wordlist; put otherwise, reading a file in Python is an "atomic" operation.
Now, it may happen that you've frozen your program with a freezer such as cx_Freeze, PyInstaller, or similar. In that case you'd need to detect that; here's a simple way to check:
a) using os.path:

import os
import sys

if getattr(sys, 'frozen', False):
    app_path = os.path.dirname(sys.executable)
elif __file__:
    app_path = os.path.dirname(__file__)
b) using pathlib:

import sys
from pathlib import Path

if getattr(sys, 'frozen', False):
    app_path = Path(sys.executable).parent
elif __file__:
    app_path = Path(__file__).parent
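The frozen-or-not check and the file reading can be folded into one helper. A minimal sketch, assuming the pathlib variant; the `app_dir` name is mine, not part of any library:

```python
import sys
from pathlib import Path

def app_dir() -> Path:
    """Directory holding the script, whether run from source or frozen."""
    if getattr(sys, 'frozen', False):
        # PyInstaller / cx_Freeze set sys.frozen on the bundled executable
        return Path(sys.executable).parent
    return Path(__file__).parent

def fetch_wordlist(filename="wordlist.txt"):
    # Resolve the data file relative to the application directory
    return (app_dir() / filename).read_bytes()
```

This keeps the frozen-detection logic in one place instead of repeating it at every call site.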
Resources living inside a zip file
The above solutions work when the code lives on the file system, but not when the package lives inside a zip file. When that happens, you can use importlib.resources (new in Python 3.7), or the pkg_resources combo you've already shown in the question (or wrap them up in some helpers), or you can use a nice third-party library called importlib_resources that works with both old and modern Python versions:
pypi: https://pypi.org/project/importlib_resources/
documentation: https://importlib-resources.readthedocs.io/en/latest/
Specifically for your particular problem, I'd suggest you take a look at https://importlib-resources.readthedocs.io/en/latest/using.html#file-system-or-zip-file.
If you want to know what that library is doing behind the curtain because you're not willing to install any third-party library, you can find the code for py2 here and py3 here, in case you want to pull out the relevant bits for your particular problem.
I'm going to go out on a limb and make an assumption because it may drastically simplify your problem. The only way I can imagine that you can claim that this data is "stored in a file that will always be in the same location as the script" is because you created this data, once, and put it in a file in the source code directory. Even though this data is binary, have you considered making the data a literal byte-string in a python file, and then simply importing it as you would anything else?
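The suggestion above can be sketched concretely. Assume a one-off generator script (names like `wordlist_data` are illustrative, and the bytes here are inlined instead of read from the real file):

```python
import os
import sys

# One-off "build step": embed the binary data as a literal in a module.
data = b'alpha\nbeta\ngamma\n'          # stand-in for the real file contents
with open('wordlist_data.py', 'w') as out:
    out.write('DATA = %r\n' % data)

# Consumers then just import it -- no filesystem lookup, works from zip
# files and frozen executables alike, since it's ordinary module data.
sys.path.insert(0, os.getcwd())          # only needed for this inline demo
import wordlist_data

print(wordlist_data.DATA == data)  # True
```

The trade-off is that the data file must be regenerated whenever the source data changes, so this fits best for data that is effectively constant.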
You're right that your method of reading a file is unnecessarily complex. Unless you have a really specific reason to use the importlib and pkg_resources modules, it's rather simple.
import os

def fetch_wordlist():
    if not os.path.exists('wordlist.txt'):
        raise FileNotFoundError
    with open('wordlist.txt', 'rb') as wordlist:
        return wordlist.read()
You haven't given much information regarding your script, so I cannot comment on why it doesn't work unless it's installed using pip. My best guess: your script is probably packed into a python package.

python unittest xml files created in different folder when test fails

The following code results in the log file being written to different folders depending on whether the test passes or not. I have a test case with one test purpose, and during the run the test does a chdir().
If the test fails (an assert* fails), the XML file is written to the test's current directory. If the test passes, the XML file is written to the start folder. See the code snippet for how I specify the log file folder. Other than using full paths, is there a way to make Python unittest always write it to the start folder?
logFolderName = "TestMyStuff_detail-" + str(scriptPid)
unittest.main(testRunner=xmlrunner.XMLTestRunner(output=logFolderName),
              failfast=False)
Other than using full paths, is there a way to make python unittest always write it to the start folder?
Doubtful since relative paths will always be relative to the current working directory. If your test changes the current working directory, you're kind of out of luck.
With that said, it shouldn't be too hard to use a full path:
import os

cwd = os.getcwd()
localLogFolderName = "TestMyStuff_detail-" + str(scriptPid)
logFolderName = os.path.abspath(os.path.join(cwd, localLogFolderName))
You could use a fixed path to write your output, something like:

path_to_my_output_folder = "/path/to/output/"

test1_write_xml(path_to_my_output_folder + "file1.xml")
test2_write_xml(path_to_my_output_folder + "file2.xml")
test3_write_xml(path_to_my_output_folder + "file3.xml")

Get pytest to look within the base directory of the testing script

In pytest, my testing script compares the calculated results with baseline results which are loaded via
SCRIPTLOC = os.path.dirname(__file__)
TESTBASELINE = os.path.join(SCRIPTLOC, 'baseline', 'baseline.csv')
baseline = pandas.DataFrame.from_csv(TESTBASELINE)
Is there a non-boilerplate way to tell pytest to start looking from the root directory of the script rather than get the absolute location through SCRIPTLOC?
If you are simply looking for the pytest equivalent of using __file__, you can add the request fixture to your test and use request.fspath
From the docs:
class FixtureRequest
...
fspath
the file system path of the test module which collected this test.
So an example might look like:
def test_script_loc(request):
    baseline = os.path.join(request.fspath.dirname, 'baseline', 'baseline.csv')
    print(baseline)
If you're wanting to avoid boilerplate, though, you're not going to gain much from doing this (assuming I understand what you mean by 'non-boilerplate').
Personally, I think using the fixture is more explicit (within pytest idioms), but I prefer to wrap the request manipulation in another fixture so I know that I'm specifically grabbing sample test data just by looking at the method signature of a test.
Here's a snippet I use (modified to match your question, I use a subdirectory hierarchy):
# in conftest.py
import pytest

@pytest.fixture(scope="module")
def script_loc(request):
    '''Return the directory of the currently running test script'''
    # uses .join instead of .dirname so we get a LocalPath object instead of
    # a string. LocalPath.join calls normpath for us when joining the path
    return request.fspath.join('..')
And sample usage:

def test_script_loc(script_loc):
    baseline = script_loc.join('baseline/baseline.csv')
    print(baseline)
Update: pathlib
Given that I've long since stopped supporting legacy Python and instead use pathlib, I no longer return LocalPath objects nor depend on their API.
The pytest team is also planning to eventually deprecate py.path and port their internals to the standard library pathlib. This started as early as pytest 3.9.0 with the introduction of tmp_path, though the actual removal of LocalPath attributes may not happen for some time.
Though the pytest team will likely add an alternative attribute (e.g. request.fs_path) that returns a Path object, it's easy enough to convert the LocalPath to a Path ourselves for now.
Here's a variant of the above example using a Path object, tweak as it suits you:
# in conftest.py
import pytest
from pathlib import Path

@pytest.fixture(scope="module")
def script_loc(request):
    '''Return the directory of the currently running test script'''
    return Path(request.fspath).parent
And sample usage:

def test_script_loc(script_loc):
    baseline = script_loc.joinpath('baseline/baseline.csv')
    print(baseline)
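For completeness: pytest 7.0 added request.path, which is already a pathlib.Path, so on current pytest versions the manual conversion above shouldn't be needed. A hedged sketch (check your installed pytest version before relying on it):

```python
# in conftest.py
import pytest

@pytest.fixture(scope="module")
def script_loc(request):
    '''Return the directory of the currently running test script.'''
    # request.path is a pathlib.Path on pytest >= 7.0
    return request.path.parent
```

Usage in a test is identical to the Path-based fixture above.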

Creating a function to load data into a list? Passing filename to import through function

I am new to writing functions and modules, and I need some help creating a function within my new module that will make my currently repetitive process of loading data much more efficient.
I would like this function to reside in a larger overall module with other functions that I can keep stored in my home directory, so I don't have to copy it into my working directory every time I want to call one of my functions.
The data that I have is just some JSON Twitter data from the streaming API, and I would like to use a function to load the data (list of dicts) into a list that I can access after the function runs by using something like data = my_module.my_function('file.json').
I've created a folder in my home directory for my python modules and I have two files in that directory: __init__.py and my_module.py.
I've also gone ahead and added the Python module folder to sys.path by using sys.path.append('C:\python').
Within the Python module folder, the file __init__.py has nothing in it; it's just an empty file.
Do I need to put anything in the __init__.py file?
my_module.py has the following code:
import json

def my_function(parameter1):
    tweets = []
    for line in open(parameter1):
        try:
            tweets.append(json.loads(line))
        except:
            pass
I would like to call the function like so:
import my_module
data = my_module.my_function('tweets.json')
What else do I need to do to create this function to make loading my data more efficient?
To import the module from the package, for example:
import my_package.my_module as my_module
would do what you want. In this case it's fine to leave __init__.py empty, and the module will be found just by being in the package folder "my_package". There are many alternatives for how to define a package/module structure and how to import them; I encourage you to read up, as otherwise you will get confused at some point.
I would like this function to reside in a larger overall module with other function that I can keep stored on my home directory and not have to copy into my working directory every time I want to call one of my function.
For this to work, you need to create a .pth file in C:\Python\site-packages\ (or wherever you have installed Python). This is a simple text file, and inside it you'd put the path to your module:
C:/Users/You/Some/Path/
Call it custom.pth and make sure it's in the site-packages directory, otherwise it won't work.
I've also went ahead and added the python module folder to sys.path by using sys.path.append('C:\python')
You don't need to do this. The site-packages directory is checked by default for modules.
In order to use your function as you intend, you need to make sure:
The file is called my_module.py
It is in the directory you added to the .pth file as explained earlier
That's it, nothing else needs to be done.
As for your code itself, the first thing is that you are missing a return statement.
If each line in the file is a json object, then use this:
from __future__ import with_statement
import json

def my_function(parameter1):
    tweets = []
    with open(parameter1) as inf:
        for line in inf:
            try:
                tweets.append(json.loads(line))
            except:
                pass
    return tweets
If the entire file is a single json object, then you can do this:

def my_function(parameter1):
    tweets = None
    with open(parameter1) as inf:
        tweets = json.load(inf)
    return tweets

How can I store testing data for python nosetests?

I want to write some tests for a python MFCC feature extractor for running with nosetest. As well as some lower-level tests, I would also like to be able to store some standard input and expected-output files with the unit tests.
At the moment we are hard-coding the paths to the files on our servers, but I would prefer the testing files (both input and expected-output) to be somewhere in the code repository so they can be kept under source control alongside the testing code.
The problem I am having is that I'm not sure where the best place to put the testing files would be, and how to know what that path is when nosetest calls each testing function. At the moment I am thinking of storing the testing data in the same folder as the tests and using __file__ to work out where that is (would that work?), but I am open to other suggestions.
I think that using __file__ to figure out where the test is located and storing data alongside it is a good idea. I'm doing the same for some tests that I write.
This:
os.path.dirname(os.path.abspath(__file__))
is probably the best you are going to get, and that's not bad. :-)
Based on the idea of using __file__, maybe you could use a module to help with the path construction. You could find all the files contained in the module directory and gather their names and paths in a dictionary for later use.
Create a module accessible to your tests, i.e. a directory beside your tests such as testData, where you can put your data files. In the __init__.py of this module, insert the following code.
import os
from os.path import join, dirname, abspath

testDataFiles = dict()
baseDir = dirname(abspath(__file__)) + os.path.sep
for root, dirs, files in os.walk(baseDir):
    localDataFiles = [(join(root.replace(baseDir, ""), name), join(root, name))
                      for name in files]
    testDataFiles.update(dict(localDataFiles))
Assuming you called your module testData and it contains a file called data.txt, you can then use the following construct in your test to obtain the path to the file. aFileOperation is assumed to be a function that takes a path parameter.

import unittest
from testData import testDataFiles

class ATestCase(unittest.TestCase):
    def test_Something(self):
        self.assertEqual(0, aFileOperation(testDataFiles['data.txt']))
It will also allow you to use subdirectories, such as:

def test_SomethingInASubDir(self):
    self.assertEqual(0, aFileOperation(testDataFiles['subdir\\data.txt']))
