I'm trying to build a testing system for continuous integration. I want checks that perform specific operations, like formatting files in a particular way, and those checks need to have a flow to them.
Example:
FormatFilesCheck(files):
    Test(see if formatter is installed)
    for prep in prework:
        Test(do prep)
    for file in files:
        Test(try formatting file)
    Test(clean up work)
I was wondering if there is a Python library out there that handles this kind of test control. Obviously there is unittest, but from what I can tell it's meant for linear, independent tests rather than tests with dependencies between steps.
If unittest has this capability an example would be amazing.
*edit: What I'm really looking for is a framework with unittest-like capabilities for error handling and assertions, one that allows me to write checks and have them plug into it.
Thanks ahead of time
If I'm understanding what you're asking, you're looking for something along the lines of Chai JS. Fortunately, someone has written a Python library that approximates it, also called Chai. Your test case would look something like this:
FormatFilesCheck(files):
    expect(formatter.status).equals(installed)
    for prep in prework:
        do prep
        expect(prep).equals(what_prep_should_look_like)
    for file in files:
        format file
        expect(file).equals(what_file_should_look_like)
    clean up work
    # your expect/assert statements to make sure everything was cleaned up
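If you'd rather stay with the standard library, a minimal sketch of the same flow using plain assertions could look like this (the FormatFilesCheck class and the choice of black as the formatter are illustrative, not part of the question):

import shutil
import subprocess

class FormatFilesCheck:
    def __init__(self, files):
        self.files = files

    def run(self):
        # each step only runs if the previous assertion passed,
        # which gives you the sequential flow described above
        assert shutil.which("black") is not None, "formatter is not installed"
        for path in self.files:
            result = subprocess.run(["black", "--check", path])
            assert result.returncode == 0, f"formatting check failed for {path}"

FormatFilesCheck(["app.py", "util.py"]).run()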
Related
So I am trying to find a way to parameterize tests a bit better for data validation. I have basic tests written in pytest that read a .csv file and load it into a pandas DataFrame. From there I do simple checks like checking for duplicates, verifying data types, and so on. All of that works just fine.
However, I want to be able to share the script with colleagues and let them simply execute it from the command line, feeding the file name into the test. I know that pytest allows for parameterization, but everything I've found so far covers simple text and number inputs; I haven't seen anything helpful for file inputs.
Anyone had experience with this?
Maybe I'm not understanding what you're trying to do, but if you're trying to check that some tabular data is properly sanitized, that's not what pytest is for. Pytest is for writing unit tests, and in that context it doesn't really make sense to share scripts with colleagues who would provide their own input files.
I think what you want is a data validation library, e.g. pandera. This would allow you to handle any command-line arguments yourself, and would also probably simplify the actual data validation steps.
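As a rough sketch of that approach (the column names and checks are invented for illustration; pandera's DataFrameSchema and Check APIs are real, but check them against your installed version):

import argparse
import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema({
    "id": pa.Column(int, unique=True),           # no duplicate ids
    "amount": pa.Column(float, pa.Check.ge(0)),  # non-negative amounts
})

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("csv_path", help="CSV file to validate")
    args = parser.parse_args()
    df = pd.read_csv(args.csv_path)
    schema.validate(df)  # raises a SchemaError on violation
    print("validation passed")

Colleagues would then run it as: python validate.py data.csv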
I'm not very familiar with pytest, but I'm trying to incorporate it into my project. I already have some tests and understand the main ideas.
But I got stuck on a test for Excel output. I have a function that builds a report and saves it as an Excel file (I use xlsxwriter to write the Excel format). The report has some merged cells and different fonts and colors, but first of all I would like to be sure that the values in the cells are correct.
I would like a test that automatically checks the content of this file, to be sure the function's logic isn't broken.
I'm not sure that a binary comparison of the generated Excel file against a correct sample is a good idea (the Excel format is rather complex, and a minor change in the xlsxwriter library may make the files completely different).
So I'm seeking advice on how to implement this kind of test. Has anyone had similar experience?
IMHO a unit test should not touch external things (like file system, database, or network). If your test does this, it is an integration test. These usually run much slower and tend to be brittle because of the external resources.
That said, you have two options: unit test it, mocking the xlsx writing, or integration test it, reading the xlsx file back after writing.
When you mock xlsxwriter, you can have your mock check that it receives what should be written. This assumes that you don't want to test xlsxwriter itself, which makes sense because it's not your code, and you usually test only your own code. This makes for a fast test.
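To make the mocking route concrete, a sketch along these lines (report and build_report are hypothetical stand-ins for your module and function; the patch target assumes report.py does "import xlsxwriter"):

from unittest.mock import patch

import report  # hypothetical module under test

@patch("report.xlsxwriter.Workbook")
def test_build_report_writes_expected_cells(mock_workbook):
    worksheet = mock_workbook.return_value.add_worksheet.return_value
    report.build_report({"total": 42}, "ignored.xlsx")
    # assert our code asked the (mocked) writer to write the right values
    worksheet.write.assert_any_call("A1", "total")
    worksheet.write.assert_any_call("B1", 42)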
In the other scenario you open the Excel file again with a reader (e.g. openpyxl) and compare the written file to what is expected. This works best if you can avoid the file system entirely and write the xlsx data to a memory buffer from which you can read it back. If you can't do that, try using a temporary directory for your test, though with that you're already getting into integration-test land. This makes for a slower, more complicated, but also more thorough test.
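xlsxwriter accepts an in-memory BytesIO target, and openpyxl can read it back, so a sketch of the buffer approach might look like this (write_report stands in for your actual report function):

import io
import openpyxl
import xlsxwriter

def write_report(stream):
    # hypothetical report function writing to a file-like object
    workbook = xlsxwriter.Workbook(stream, {"in_memory": True})
    worksheet = workbook.add_worksheet("Report")
    worksheet.write("A1", "total")
    worksheet.write("B1", 42)
    workbook.close()

def test_report_values():
    buffer = io.BytesIO()
    write_report(buffer)
    buffer.seek(0)
    worksheet = openpyxl.load_workbook(buffer)["Report"]
    assert worksheet["A1"].value == "total"
    assert worksheet["B1"].value == 42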
Personally, I'd write one integration test to see that it works in general, and then a lot of unit tests for the different things you want to write.
I'm pretty new to Python. However, I am writing a script that loads some data from a file and generates another file. My script has several functions and it also needs two user inputs (paths) to work.
Now I am wondering if there is a way to test each function individually. Since there are no classes, I don't think I can do it with unit tests, can I?
What is the common way to test a script, if I don't want to run the whole script all the time? Someone else has to maintain the script later. Therefore, something similar to unit tests would be awesome.
Thanks for your inputs!
If you write your code in the form of functions that operate on file objects (streams) or, if the data is small enough, that accept and return strings, you can easily write tests that feed in the appropriate data and check the results. If the real data is large enough to need streams but the test data is not, use io.StringIO in the test code to adapt.
Then use the if __name__ == "__main__": trick so that your unit test driver can import the file without running the user-facing script.
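A minimal sketch of that layout, with an invented transform function:

import sys

def transform(in_stream, out_stream):
    # example logic: upper-case every line
    for line in in_stream:
        out_stream.write(line.upper())

def main(in_path, out_path):
    with open(in_path) as src, open(out_path, "w") as dst:
        transform(src, dst)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])

The test module can then import the function and drive it with StringIO:

import io
from myscript import transform  # hypothetical module name

def test_transform():
    src = io.StringIO("abc\n")
    dst = io.StringIO()
    transform(src, dst)
    assert dst.getvalue() == "ABC\n"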
I have a completely non-interactive python program that takes some command-line options and input files and produces output files. It can be fairly easily tested by choosing simple cases and writing the input and expected output files by hand, then running the program on the input files and comparing output files to the expected ones.
1) What's the name for this type of testing?
2) Is there a python package to do this type of testing?
It's not difficult to set up by hand in the most basic form, and I did that already. But then I ran into cases like output files containing the date and other information that can legitimately change between runs. I considered writing something that would let me specify which sections of the reference files are allowed to differ and still have the test pass, and realized I might be getting into "reinventing the wheel" territory.
(I rewrote a good part of unittest functionality before I caught myself last time this happened...)
I guess you're referring to a form of system testing.
No package would know which parts can legitimately change. My suggestion is to mock out the sections of code that result in the changes so that you can be sure that the output is always the same - you can use tools like Mock for that. Comparing two files is pretty straightforward, just dump each to a string and compare strings.
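For the parts that legitimately change, a lighter-weight alternative to mocking is to normalize the volatile sections of both files before comparing. A sketch (the date pattern is just an example of a volatile section):

import re

VOLATILE = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "<DATE>"),  # e.g. ISO dates
]

def normalized(text):
    for pattern, placeholder in VOLATILE:
        text = pattern.sub(placeholder, text)
    return text

def assert_files_match(actual_path, expected_path):
    with open(actual_path) as actual, open(expected_path) as expected:
        assert normalized(actual.read()) == normalized(expected.read())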
Functional testing. Or regression testing, if that is its purpose. Or code coverage, if you structure your data to cover all code paths.
I am about to get a bunch of python scripts from an untrusted source.
I'd like to be sure that no part of the code can hurt my system, meaning:
(1) the code is not allowed to import ANY MODULE
(2) the code is not allowed to read or write any data, connect to the network etc
(the purpose of each script is to loop through a list, compute some data from input given to it and return the computed value)
Before I execute such code, I'd like to have a script 'examine' it and make sure there's nothing dangerous there that could hurt my system.
I thought of using the following approach: check that the word 'import' is not used (so we'd be guaranteed that no modules are imported).
Yet it would still be possible for the user (if desired) to write code to read/write files etc. (say, using open).
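A naive version of that screen, just to make the idea concrete (the answers below show why it is not sufficient):

BANNED_WORDS = ("import", "open", "eval", "exec", "file", "execfile")

def looks_clean(source):
    return not any(word in source for word in BANNED_WORDS)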
Then here comes the question:
(1) where can I get a 'global' list of python methods (like open)?
(2) Is there some code that I could add to each script that is sent to me (at the top) that would make some 'global' methods invalid for that script (for example, any use of the keyword open would lead to an exception)?
I know there are some Python sandboxing solutions, but please try to answer this question as asked, since I feel this approach is more relevant to my needs.
EDIT: Suppose I make sure that no import is in the file, and that no possibly harmful methods (such as open, eval, etc.) are in it. Can I conclude that the file is SAFE? (Can you think of any other 'dangerous' ways that built-in methods can be run?)
This point hasn't been made yet, and should be:
You are not going to be able to secure arbitrary Python code.
A VM is the way to go unless you want security issues up the wazoo.
You can still obfuscate import without using eval:

s = '__imp'
s += 'ort__'
f = globals()['__builtins__'].__dict__[s]
# BOOM: f is now the real __import__
The Python docs list the built-in functions and the keywords.
Note that you'll need to do things like look for both "file" and "open", as both can open files.
Also, as others have noted, this isn't 100% certain to stop someone determined to insert malicious code.
An approach that should work better than string matching is to use the ast module: parse the Python code, filter the tree against a whitelist (e.g. allow only basic operations), then compile and run the tree.
See this nice example by Andrew Dalke on manipulating ASTs.
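A sketch of that idea (the node whitelist below is illustrative and deliberately tiny; as the rest of this thread shows, this should not be mistaken for a real security boundary):

import ast

ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.For, ast.List, ast.Tuple,
)

def compile_whitelisted(source):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed syntax: " + type(node).__name__)
    return compile(tree, "<untrusted>", "exec")

code = compile_whitelisted("x = 1 + 2")  # ok; "import os" would raise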
Built-in functions/keywords to watch for:
eval
exec
__import__
open
file
input
execfile
print can be dangerous if you have one of those dumb shells that execute code on seeing certain output
stdin
__builtins__
globals() and locals() must be blocked otherwise they can be used to bypass your rules
There's probably tons of others that I didn't think about.
Unfortunately, crap like this is possible...
object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")
So it turns out that keywords, import restrictions, and the symbols in scope by default are not enough to cover; you need to verify the entire object graph...
Use a Virtual Machine instead of running it on a system that you are concerned about.
Without a sandboxed environment, it is impossible to prevent a Python file from doing harm to your system aside from not running it.
It is easy to create a Cryptominer, delete/encrypt/overwrite files, run shell commands, and do general harm to your system.
If you are on Linux, you should be able to use docker to sandbox your code.
For more information, see this GitHub issue: https://github.com/raxod502/python-in-a-box/issues/2.
I did come across this on GitHub, so something like it could be used, but that has a lot of limits.
Another approach would be to create another Python file which parses the original one, removes the bad code, and runs the file. However, that would still be hit-and-miss.