How to write unittests for a custom pylint checker?

I have written a custom pylint checker that flags non-i18n'ed strings in Python code, and I'd like to write some unittests for it now.
But how? I'd like tests that take Python code as a string, pass it through my plugin/checker, and check the output.
I'd rather not write the strings out to temporary files and then invoke the pylint binary on them. Is there a cleaner, more robust approach that only tests my code?

Take a look at pylint's test directory. You will find several examples there. For instance, the unittest_checker_*.py files are each unit tests for a single checker.
You may also be interested in functional tests: put an input file in test/input and the expected message file in test/messages, then running python test_func.py should pick it up. While this is geared more toward pylint's core checkers, it can easily be adapted to your own project.
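For a checker that lives in your own project, a test in the same style avoids temporary files and the pylint binary entirely. A minimal sketch follows; NonI18nStringChecker, the myplugin module, the non-i18n-string message id and the visit_assign hook are placeholders for whatever your plugin actually defines, and the testutils class names differ slightly between pylint versions:

import astroid
from pylint.testutils import CheckerTestCase, MessageTest  # called Message in older pylint releases

from myplugin import NonI18nStringChecker  # hypothetical import of your checker


class TestNonI18nStringChecker(CheckerTestCase):
    CHECKER_CLASS = NonI18nStringChecker

    def test_flags_untranslated_string(self):
        # Parse a snippet of source straight into an AST node - no files on disk.
        node = astroid.extract_node('greeting = "hello"')
        # The exact fields expected by MessageTest depend on your pylint version.
        with self.assertAddsMessages(MessageTest(msg_id="non-i18n-string", node=node)):
            self.checker.visit_assign(node)

    def test_accepts_translated_string(self):
        node = astroid.extract_node('greeting = _("hello")')
        with self.assertNoMessages():
            self.checker.visit_assign(node)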


How to test a Python binary in a unit test?

I would like to test a Python binary, main.py, with command-line arguments in a unit test. The usage of this binary is as follows.
main.py --input_file=input.txt --output_file=out.txt
When designing unit tests, I think it is better to test each component, like a class or a method.
However, in some cases like the one above, I would like to do end-to-end testing of the whole Python binary, especially when it was created by someone else. In that case, I want to make sure that main.py generates out.txt correctly.
One option is to use subprocess.check_call, write the output to a temporary directory, then load it and compare it with the golden (expected) output.
Is that a good way? Or, if there is a better way, could you please advise me?
This is called blackbox testing, since the inner structure of the program is unknown to the tester. If you insist on testing the module without getting to know what's happening inside, you could (as you mentioned) use exec or subprocess to check the validity of the output. But the more conventional way is to use the unittest library and test the code through the API it provides.
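If you do want the blackbox route, a sketch along these lines keeps it inside unittest (assuming main.py sits next to the test and expected_out.txt is a hand-written golden file):

import filecmp
import subprocess
import sys
import unittest


class TestMainEndToEnd(unittest.TestCase):
    def test_generates_expected_output(self):
        # Run the script exactly as a user would; a non-zero exit code fails the test.
        subprocess.check_call(
            [sys.executable, "main.py",
             "--input_file=input.txt", "--output_file=out.txt"]
        )
        # Compare the generated file byte-for-byte against the golden file.
        self.assertTrue(filecmp.cmp("out.txt", "expected_out.txt", shallow=False))


if __name__ == "__main__":
    unittest.main()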
If you're testing the handling of arguments as part of the unit tests, just look inside main.py and see how it handles arguments.
For example, you might have a test case that sets sys.argv and then calls main:
import sys
import myapp.main

# Simulate the command line the script expects, then call its entry point directly.
sys.argv = 'main.py --input_file=input.txt --output_file=out.txt'.split()
myapp.main.main()
# ...then assert on whatever out.txt should contain; I have no idea what test you want to run.

Testing a full program by comparing its output file to a reference file: what's it called, and is there a package for it?

I have a completely non-interactive python program that takes some command-line options and input files and produces output files. It can be fairly easily tested by choosing simple cases and writing the input and expected output files by hand, then running the program on the input files and comparing output files to the expected ones.
1) What's the name for this type of testing?
2) Is there a python package to do this type of testing?
It's not difficult to set up by hand in the most basic form, and I did that already. But then I ran into cases like output files containing the date and other information that can legitimately change between runs. I considered writing something that would let me specify which sections of the reference files are allowed to differ and still have the test pass, and realized I might be getting into "reinventing the wheel" territory.
(I rewrote a good part of unittest functionality before I caught myself last time this happened...)
I guess you're referring to a form of system testing.
No package would know which parts can legitimately change. My suggestion is to mock out the sections of code that produce those changes, so that the output is always the same - you can use tools like Mock for that. Comparing two files is then straightforward: read each into a string and compare the strings.
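If mocking isn't practical, a common middle ground is to normalize the parts that are allowed to differ before comparing. A minimal sketch; the ISO-date pattern and helper names here are just an illustration:

import re


def normalize(text):
    # Replace run-dependent parts (here: ISO dates such as 2024-01-31) with a placeholder.
    return re.sub(r"\d{4}-\d{2}-\d{2}", "<DATE>", text)


def assert_matches_golden(output_path, golden_path):
    with open(output_path) as out, open(golden_path) as golden:
        assert normalize(out.read()) == normalize(golden.read())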
Functional testing. Or regression testing, if that is its purpose. Or code coverage, if you structure your data to cover all code paths.

Dangerous Python Keywords?

I am about to get a bunch of python scripts from an untrusted source.
I'd like to be sure that no part of the code can hurt my system, meaning:
(1) the code is not allowed to import ANY MODULE
(2) the code is not allowed to read or write any data, connect to the network etc
(the purpose of each script is to loop through a list, compute some data from input given to it and return the computed value)
Before I execute such code, I'd like to have a script 'examine' it and make sure that there's nothing dangerous there that could hurt my system.
I thought of using the following approach: check that the word 'import' is not used (so we are guaranteed that no modules are imported).
Yet, it would still be possible for the user (if desired) to write code that reads/writes files etc. (say, using open).
Then here comes the question:
(1) where can I get a 'global' list of python methods (like open)?
(2) Is there some code that I could add to each script that is sent to me (at the top) that would make some 'global' methods invalid for that script (for example, any use of the keyword open would lead to an exception)?
I know that there are some solutions of python sandboxing. but please try to answer this question as I feel this is the more relevant approach for my needs.
EDIT: Suppose that I make sure that no import is in the file, and that no potentially harmful methods (such as open, eval, etc.) are in it. Can I conclude that the file is SAFE? (Can you think of any other 'dangerous' ways that built-in methods can be run?)
This point hasn't been made yet, and should be:
You are not going to be able to secure arbitrary Python code.
A VM is the way to go unless you want security issues up the wazoo.
You can still obfuscate import without using eval:
s = '__imp'
s += 'ort__'
f = globals()['__builtins__'].__dict__[s]  # f is now the real __import__
** BOOM **
Look at Python's documented lists of built-in functions and keywords.
Note that you'll need to do things like look for both "file" and "open", as both can open files.
Also, as others have noted, this isn't 100% certain to stop someone determined to insert malicious code.
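You can also ask Python itself for both lists, which is handy when building a blacklist; this just prints what the documentation pages enumerate:

import builtins   # __builtin__ in Python 2
import keyword

print(sorted(keyword.kwlist))    # language keywords (for, import, exec in Python 2, ...)
print(sorted(dir(builtins)))     # built-in names (open, eval, __import__, file in Python 2, ...)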
An approach that should work better than string matching is to use the ast module: parse the Python code, do your whitelist filtering on the tree (e.g. allow only basic operations), then compile and run the tree.
See this nice example by Andrew Dalke on manipulating ASTs.
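A minimal sketch of that idea follows; the whitelist below is only an illustration, not a complete or proven-safe set, and as other answers point out, even this does not make untrusted code truly safe:

import ast

# Node types considered harmless for simple "loop over a list and compute a value" scripts.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.For, ast.List, ast.Tuple, ast.Compare, ast.Lt, ast.Gt, ast.If,
)


def check_source(source):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed construct: %s" % type(node).__name__)
    return tree


# "import os" or "open('x')" would be rejected; this simple loop passes.
tree = check_source("total = 0\nfor x in [1, 2, 3]:\n    total = total + x")
exec(compile(tree, "<untrusted>", "exec"))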
Built-in functions/keywords:
eval
exec
__import__
open
file
input
execfile
print can be dangerous if you have one of those dumb shells that execute code on seeing certain output
stdin
__builtins__
globals() and locals() must be blocked, otherwise they can be used to bypass your rules
There's probably tons of others that I didn't think about.
Unfortunately, crap like this is possible...
object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")
So it turns out that keywords, import restrictions, and the symbols that are in scope by default are not enough on their own; you need to verify the entire object graph...
Use a Virtual Machine instead of running it on a system that you are concerned about.
Without a sandboxed environment, it is impossible to prevent a Python file from doing harm to your system aside from not running it.
It is easy to create a Cryptominer, delete/encrypt/overwrite files, run shell commands, and do general harm to your system.
If you are on Linux, you should be able to use docker to sandbox your code.
For more information, see this GitHub issue: https://github.com/raxod502/python-in-a-box/issues/2.
I did come across this on GitHub, so something like it could be used, but it has a lot of limitations.
Another approach would be to create another Python file which parses the original one, removes the bad code, and runs the file. However, that would still be hit-and-miss.

How to do integration tests for a Python script?

I have a Python command line script that connects to a database, generates files from the data and sends e-mails. I already have some unit-tests for the important components. Now I'd like to do tests that use several or all components together, load the test database with sample data and check for the correct output.
Are there Python libraries that support this kind of testing? Specifically, I'm looking for easy ways to
put sample data in the database
check for specific changes in the database
check for existence and content of specific files
Should I even do these tests in Python or should I just write a bunch of shell scripts?
I use nosetests:
http://somethingaboutorange.com/mrl/projects/nose/1.0.0/
It also fits very well with another tool, coverage, which tells you how much of your code is exercised by the tests.
Yes. The unittest library can be used for this.
Don't be fooled by the name. A "unit" is not always a single, standalone class (or function).
A "unit" can be a composite of one or more classes, modules or packages.

How do I get scons to invoke an external script?

I'm trying to use scons to build a LaTeX document. In particular, I want to get scons to invoke a Python program that generates a file containing a table that is \input{} into the main document. I've looked over the scons documentation but it is not immediately clear to me what I need to do.
What I wish to achieve is essentially what you would get with this makefile:
document.pdf: table.tex
	pdflatex document.tex

table.tex:
	python table_generator.py
How can I express this in scons?
Something along these lines should do -
env.Command ('document.tex', '', 'python table_generator.py')
env.PDF ('document.pdf', 'document.tex')
It declares that 'document.tex' is generated by calling the Python script, and requests that a PDF document be created from this generated 'document.tex' file.
Note that this is in spirit only. It may require some tweaking. In particular, I'm not certain what kind of semantics you would want for the generation of 'document.tex' - should it be generated every time? Only when it doesn't exist? When some other file changes? (You would want to add that dependency as the second argument to Command() in that case.)
In addition, the output of Command() can be used as input to PDF() if desired. For clarity, I didn't do that.
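For the "when some other file changes" case, here is a sketch of an SConstruct that tracks the generator script itself as a dependency (assuming table_generator.py takes no arguments and writes table.tex into the current directory):

# Rebuild table.tex whenever table_generator.py changes, then rebuild the PDF.
env = Environment()
table = env.Command('table.tex', 'table_generator.py', 'python $SOURCE')
env.PDF('document.pdf', 'document.tex')
env.Depends('document.pdf', table)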
In this simple case, the easiest way is to just use the subprocess module
from subprocess import call

# Pass each command as a list of arguments so no shell is involved.
call(["python", "table_generator.py"])
call(["pdflatex", "document.tex"])
Regardless of where in your SConstruct file these lines are placed, they will happen before any of the compiling and linking performed by SCons.
The downside is that these commands will be executed every time you run SCons, rather than only when the files have changed, which is what would happen in your example Makefile. So if those commands take a long time to run, this wouldn't be a good solution.
If you really need to only run these commands when the files have changed, look at the SCons manual section Writing Your Own Builders.
