Successfully unit testing pyinotify?

I'm using pyinotify to mirror files from a source directory to a destination directory. My code seems to be working when I execute it manually, but I'm having trouble getting accurate unit test results. I think the problem boils down to this:
I have to use ThreadedNotifier in my tests, otherwise they will just hang, waiting for manual input.
Because I'm using another thread, my tests and the Notifier get out of sync: checks that pass during observational, manual runs fail when run as unit tests.
Has anyone succeeded in unit testing pyinotify?

When unit testing, things like threads and the file system should normally be factored out. Do you have a reason to unit test with the actual file system, user input, etc.?
Python makes it very easy to monkey patch; you could, for example, replace the entire os or sys module with a mock object (such as Python Mock) so that you never need to deal with the file system. This will also make your tests run much more quickly.
If you want to do functional testing with the file system, I'd recommend setting up a virtual machine that will have a known state, and reverting to that state every time you run the tests. You could also simulate user input, file operations, etc. as needed.
Edit
Here's a simple example of how to fake, or mock, the "open" function.
Say you've got a module, my_module, with a get_text_upper function:
def get_text_upper(filename):
    return open(filename).read().upper()
You want to test this without actually touching the file system (eventually you'll start just passing file objects instead of file names to avoid this but for now...). You can mock the open function so that it returns a StringIO object instead:
from cStringIO import StringIO

def fake_open(text):
    fp = StringIO()
    fp.write(text)
    fp.seek(0)
    return fp
import my_module

def test_get_text():
    my_module.open = lambda *args, **kwargs: fake_open("foo")
    text = my_module.get_text_upper("foo.txt")
    assert text == "FOO", text
Using a mocking library just makes this process a lot easier and more flexible.
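For instance, here is a rough sketch of the same test using the standard library's unittest.mock (the standalone mock package on Python 2); mock_open builds the fake file object for you, and the my_module import is assumed as above:
from unittest.mock import patch, mock_open   # on Python 2: from mock import patch, mock_open

import my_module

def test_get_text_with_mock():
    # Patch the name "open" as my_module sees it, so the real file system is
    # never touched; mock_open supplies the fake file contents.
    with patch("my_module.open", mock_open(read_data="foo"), create=True):
        assert my_module.get_text_upper("foo.txt") == "FOO"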
Here's a stackoverflow post on mocking libraries for python.

Related

Prevent any file system usage in Python's pytest

I have a program that, for data security reasons, should never persist anything to local storage if deployed in the cloud. Instead, any input/output needs to go to the connected (encrypted) storage.
To allow deployment locally as well as to multiple clouds, I am using the very useful fsspec. However, other developers are working on the project as well, and I need a way to make sure that they aren't accidentally using local file I/O methods, which may pass unit tests but fail when deployed to the cloud.
For this, my idea is basically to mock/replace any I/O methods in pytest with ones that don't work and make the test fail. However, this is probably not straightforward to implement. I am wondering whether anyone else has had this problem as well, and whether best practices or a library for this already exist?
During my research, I found pyfakefs, which looks like it is very close to what I am trying to do, except I don't want to simulate another file system; I want there to be no local file system at all.
Any input appreciated.
You cannot use pytest add-ons to make this truly secure. There will always be ways to get around it. Even if you patch everything in the standard Python library, the code can always use third-party C libraries which can't be patched from the Python side.
Even if you somehow restrict every way the Python process can write a file, it will still be able to call the OS or another process to write something.
The only real options are to run only trusted code, or to run the process in some kind of sandbox.
On Unix-like operating systems, a workable solution may be to create a chroot and run the program inside it.
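For illustration only, a very rough sketch of that idea from Python (the jail path and test-runner script are made up; this needs root privileges and a directory tree with the interpreter and its libraries already copied into it):
import os
import subprocess

JAIL = "/srv/test-jail"   # assumed, pre-built chroot directory

os.chroot(JAIL)           # requires root; from now on "/" means the jail
os.chdir("/")
subprocess.run(["/usr/bin/python3", "/run_tests.py"], check=True)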
If you're OK with just preventing files from being opened with the open function, you can patch that function in the builtins module.
import builtins
import pytest

_original_open = builtins.open

class FileSystemUsageError(Exception):
    pass

def patched_open(*args, **kwargs):
    raise FileSystemUsageError()

@pytest.fixture
def disable_fs():
    builtins.open = patched_open
    yield
    builtins.open = _original_open
I based this example on a pytest plugin written by the company I currently work for to prevent network use in tests. You can see the full example here: https://github.com/best-doctor/pytest_network/blob/4e98d816fb93bcbdac4593710ff9b2d38d16134d/pytest_network.py
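A minimal usage sketch, assuming the fixture above lives in a conftest.py: any test that requests disable_fs will see open fail immediately (marking the fixture autouse=True would apply it to every test without opting in).
import pytest

from conftest import FileSystemUsageError  # assumes the conftest.py shown above is importable

def test_no_accidental_local_io(disable_fs):
    # Any attempt at local file I/O now fails the test loudly.
    with pytest.raises(FileSystemUsageError):
        open("some_local_file.txt", "w")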

Safely import module using sys.addaudithook

I recently found out about the new sys.addaudithook in Python. I run a service that involves running some untrusted scripts, and I want to use audit hooks to further secure the system. The scripts run in an isolated container anyway, but extra protection doesn't hurt.
Right now my code is this:
import sys

def audithook(event, args):
    if (event.startswith('os.') or event.startswith('sys.')
            or event.startswith('socket.') or event.startswith('winreg.')
            or event.startswith('webbrowser.') or event.startswith('shutil.')
            or event == "import"):
        raise RuntimeError("Attempt to break out of sandbox")

# Some code here that explicitly requires filesystem access
sys.addaudithook(audithook)

import untrusted_module
# More code that uses other
# Results need to eventually be saved to a file, that seems to work fine if I open
# the file before creating the hook then just write to it later. This is why I
# can't protect the entire program.
However, importing the module triggers quite a few audit events while the other file is compiled to bytecode and run.
I could try to have the hook ignore some number N of events before blocking new ones, but the number of audit events varies significantly depending on whether the script has changed and how recent the stored bytecode is.
The entire audit hook feature seems to be poorly documented, and I can find very little about it online. Does anyone know how to use it to secure one module?
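Not a full answer, but a rough sketch of one possible direction (purely illustrative, names made up): since an audit hook can never be removed once installed, keep a flag the hook consults and only arm it after the import has finished. The untrusted code could in principle flip the flag back, so this only narrows the window; it is not a hardened sandbox.
import sys

_armed = {"on": False}   # hooks cannot be uninstalled, so gate them with mutable state

def audithook(event, args):
    if not _armed["on"]:
        return
    if (event == "import"
            or event.startswith(('os.', 'sys.', 'socket.', 'winreg.',
                                 'webbrowser.', 'shutil.'))):
        raise RuntimeError("Attempt to break out of sandbox")

results = open("results.txt", "w")   # open output before arming, as in the question

sys.addaudithook(audithook)

import untrusted_module             # hook installed but not armed: import events pass

_armed["on"] = True                 # from here on, audited operations are blocked
untrusted_module.main()             # hypothetical entry point of the untrusted script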

Read py.test's output as an object

Earlier I was using Python's unittest in my project, and with it came unittest.TextTestRunner and unittest.defaultTestLoader.loadTestsFromTestCase. I used them for the following reasons:
1. Control the execution of unit tests using a wrapper function which calls the unittest's run method, because I did not want the command line approach.
2. Read the unittest's output from the result object and upload the results to a bug tracking system which allows us to generate some complex reports on code stability.
Recently a decision was made to switch to py.test. How can I do the above using py.test? I don't want to parse any CLI/HTML output to get results from py.test. I also don't want to write too much code in my unit test files to do this.
Can someone help me with this?
You can use pytest's hooks to intercept the test result reporting:
conftest.py:
import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_logreport(report):
    yield
    # Define when you want to report:
    # when=setup/call/teardown,
    # fields: .failed/.passed/.skipped
    if report.when == 'call' and report.failed:
        # Add to the database or an issue tracker or wherever you want.
        print(report.longreprtext)
        print(report.sections)
        print(report.capstdout)
        print(report.capstderr)
Similarly, you can intercept one of these hooks to inject your code at the needed stage (in some cases, with the try-except around yield):
pytest_runtest_protocol(item, nextitem)
pytest_runtest_setup(item)
pytest_runtest_call(item)
pytest_runtest_teardown(item, nextitem)
pytest_runtest_makereport(item, call)
pytest_runtest_logreport(report)
Read more: Writing pytest plugins
All of this can be easily done either with a tiny plugin made as a simple installable library, or as a pseudo-plugin conftest.py which just lies around in one of the directories with the tests.
It looks like pytest lets you launch it from Python code instead of using the command line; you just pass the same arguments to the function call that you would use on the command line.
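That entry point is pytest.main(); a small sketch of calling it in-process and collecting the reports with the hook shown earlier (the collector class and the tests/ path are made up):
import pytest

class ReportCollector:
    # Tiny in-process plugin: keep every call-phase report for later upload.
    def __init__(self):
        self.reports = []

    def pytest_runtest_logreport(self, report):
        if report.when == "call":
            self.reports.append(report)

collector = ReportCollector()
exit_code = pytest.main(["-q", "tests/"], plugins=[collector])

for report in collector.reports:
    print(report.nodeid, report.outcome)   # feed these into the bug tracker instead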
Pytest will create resultlog format files, but the feature is deprecated. The documentation suggests using the pytest-tap plugin that produces files in the Test Anything Protocol.

How can I test a procedural Python script properly?

I'm pretty new to Python. However, I am writing a script that loads some data from a file and generates another file. My script has several functions and it also needs two user inputs (paths) to work.
Now I am wondering if there is a way to test each function individually. Since there are no classes, I don't think I can do it with unit tests, can I?
What is the common way to test a script, if I don't want to run the whole script all the time? Someone else has to maintain the script later. Therefore, something similar to unit tests would be awesome.
Thanks for your inputs!
If you write your code in the form of functions that operate on file objects (streams) or, if the data is small enough, that accept and return strings, you can easily write tests that feed in the appropriate data and check the results. If the real data is large enough to need streams but the test data is not, use StringIO in the test code to adapt.
Then use the __name__ == "__main__" trick to allow your unit test driver to import the file without running the user-facing script.
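A small sketch of that layout (file and function names are made up): the logic works on streams, so the tests can drive it with StringIO, and the path handling only runs when the script is executed directly.
# convert.py -- hypothetical script under test
import sys

def transform(in_stream, out_stream):
    # The actual logic: works on any file-like objects (streams).
    for line in in_stream:
        out_stream.write(line.upper())

def main(argv):
    in_path, out_path = argv[1], argv[2]   # the two user-supplied paths
    with open(in_path) as src, open(out_path, "w") as dst:
        transform(src, dst)

if __name__ == "__main__":
    main(sys.argv)

# test_convert.py -- unit tests that never touch the real file system
from io import StringIO
import convert

def test_transform_uppercases():
    out = StringIO()
    convert.transform(StringIO("hello\n"), out)
    assert out.getvalue() == "HELLO\n"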

Reproducible test framework

I am looking for a test or integration framework that supports long, costly tests for correctness. The tests should only be rerun if the code affecting the test has changed.
Ideally the test framework would:
- find the code of the test,
- produce a hash of it,
- run the code and write to an output file with the hash as the name, or skip the run if that file already exists,
- provide a simple overview of which tests succeeded and which failed.
It would be OK if the test has to specify the modules and files it depends on.
Python would be ideal, but this problem may be high-level enough that other languages would work too.
Perhaps there already exists a test or build integration framework that I can adapt to fit this behaviour?
Basically you need to track what the test is doing so you can check whether it has changed.
Python code can be traced with sys.settrace(tracefunc). There is a trace module in the standard library that can help with it.
But if it is not just Python code (if the tests execute other programs, read test input files, etc., and you need to watch those for changes too), then you would need tracing at the operating system level, with tools like strace, dtrace, or dtruss.
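For the pure-Python case, a rough sketch of recording which functions a test touches with the standard trace module (the test function here is just a placeholder):
import trace

def my_test():                      # placeholder for a real test
    assert sum([1, 2, 3]) == 6

# countfuncs=True records each function that was called, without line counting.
tracer = trace.Trace(count=False, trace=False, countfuncs=True)
tracer.runfunc(my_test)

for filename, modulename, funcname in tracer.results().calledfuncs:
    print(filename, modulename, funcname)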
I've created a small demo/prototype of a simple testing framework that runs only the tests that changed since the last run: https://gist.github.com/messa/3825eba3ad3975840400 It uses the trace module. It works this way:
- collect tests; each test is identified by name
- load test fingerprints from a JSON file (if present)
- for each test:
  - if the fingerprint matches the current bytecode of the functions listed in the fingerprint, the test is skipped
  - otherwise, run the test
  - trace it while running, recording all functions being called
  - create a test fingerprint from the function names and the bytecode MD5 hashes of each recorded function (see the sketch below)
- save the updated test fingerprints to a JSON file
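The fingerprint piece can be as small as hashing each recorded function's bytecode; a sketch of that idea (helper names assumed, not taken from the gist):
import hashlib

def bytecode_fingerprint(func):
    # Hash only the compiled bytecode; note that changes to constants or to
    # nested functions are not captured by co_code alone, so a fuller
    # fingerprint might also walk co_consts.
    return hashlib.md5(func.__code__.co_code).hexdigest()

def fingerprints_match(saved, functions):
    # saved: {qualified_name: md5_hex} loaded from the JSON file
    # functions: {qualified_name: function object} recorded by the tracer
    return all(bytecode_fingerprint(f) == saved.get(name)
               for name, f in functions.items())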
But there is one problem: it's slow. Running code while tracing it with trace.Trace is about 40x slower than without tracing. So maybe you would be better off just running all tests without tracing :) But if the tracer were implemented in C, as it is for example in the coverage module, it should be faster. (The Python trace module is not implemented in C.)
Maybe some other tricks could help with speed. Maybe you are only interested in whether some top-level functions changed, so you don't need to trace all function calls.
Have you considered other ways to speed up expensive tests? Like parallelization or a ramdisk (tmpfs)... For example, if you test against a database, don't use the "system" or development one, but run a special instance of the database with a lightweight configuration (no prealloc, no journal...) from tmpfs. If that is possible, of course; some tests need to run on a configuration similar to production.
Some test frameworks (or their plugins) can run only the tests that failed last time; that's different, but similar functionality.
This may not be the most efficient way to do this, but this can be done with Python's pickle module.
import pickle
At the end of your file, have it save itself as a pickle.
myfile = open('myfile.py', 'r')        # Your script
savefile = open('savefile.pkl', 'wb')  # File the script will be saved to, in binary mode since pickle data is bytes
# Any file extension can be used but I like .pkl for "pickle"

mytext = myfile.readlines()
pickle.dump(mytext, savefile)          # Saves the list from readlines() as a pickle
myfile.close()
savefile.close()
And then at the beginning of your script (after you have pickled it once already), add the code bit that checks it against the pickle.
myfile = open('myfile.py', 'r')
savefile = open('savefile.pkl', 'rb')
mytext = myfile.readlines()
savetext = pickle.load(savefile)
myfile.close()
savefile.close()

if mytext == savetext:
    pass  # Do whatever you want it to do
else:
    pass  # more code
That should work. It's a little long, but it's pure Python and should do what you're looking for.
