How can I test a procedural Python script properly?

I'm pretty new to Python. However, I am writing a script that loads some data from a file and generates another file. My script has several functions and it also needs two user inputs (paths) to work.
Now I am wondering if there is a way to test each function individually. Since there are no classes, I don't think I can do it with unit tests, can I?
What is the common way to test a script, if I don't want to run the whole script all the time? Someone else has to maintain the script later. Therefore, something similar to unit tests would be awesome.
Thanks for your inputs!

If you write your code in the form of functions that operate on file objects (streams) or, if the data is small enough, that accept and return strings, you can easily write tests that feed in the appropriate data and check the results. If the real data is large enough to need streams but the test data is not, use the io.StringIO class in the test code to adapt.
Then use the __name__ == "__main__" trick so that your unit test driver can import the file without running the user-facing script.
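Here is a minimal sketch of that pattern; the function and file names are made up for illustration, not taken from the question:
import io
import sys

def transform(infile, outfile):
    # The actual logic works on file-like objects, not on paths.
    for line in infile:
        outfile.write(line.upper())

def test_transform():
    # The test feeds an in-memory stream instead of a real file.
    source = io.StringIO("hello\n")
    result = io.StringIO()
    transform(source, result)
    assert result.getvalue() == "HELLO\n"

if __name__ == "__main__":
    # Only runs when the script is executed directly, not when it is
    # imported by a test runner such as pytest or unittest.
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        transform(src, dst)
With this layout a test runner can import the module and call transform() directly, while users still run the script with the two paths as arguments.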

Related

Feed file path into Pandas DF for PyTest as command line parameter

So I am trying to find a way to parameterize tests a bit better for data validation. I have basic tests I've created in pytest that involve reading a .csv file and loading it into a pandas DataFrame. From there I'm doing simple checks like checking for duplicates, verifying data types, and whatnot. All of that works just fine.
However, I want to be able to share the script with colleagues and allow them to simply execute it from the command line by feeding the file name into the test. I know that pytest allows for parameterization, but everything I've found so far is for simple text and number inputs; I haven't seen anything helpful for file inputs.
Anyone had experience with this?
Maybe I'm not understanding what you're trying to do, but if you're trying to check that some tabular data is properly sanitized, that's not what pytest is for. Pytest is for writing unit tests, and in that context it doesn't really make sense to share scripts with colleagues who would provide their own input files.
I think what you want is a data validation library, e.g. pandera. This would allow you to handle any command-line arguments yourself, and would also probably simplify the actual data validation steps.
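For what it's worth, here is a minimal sketch of that direction, assuming a recent pandera; the column names and checks are invented, not taken from the question:
import argparse
import pandas as pd
import pandera as pa

# Declare the expectations once; validate() raises a SchemaError on violations.
schema = pa.DataFrameSchema({
    "id": pa.Column(int, unique=True),           # no duplicate ids
    "amount": pa.Column(float, pa.Check.ge(0)),  # non-negative amounts
})

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Validate a CSV file")
    parser.add_argument("csv_path", help="path to the .csv file to check")
    args = parser.parse_args()
    schema.validate(pd.read_csv(args.csv_path))
    print("validation passed")
Colleagues can then run something like python validate.py data.csv without touching pytest at all.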

Python Unit Test for writing Excel file

I'm not very familiar with pytest, but I'm trying to incorporate it into my project. I already have some tests and understand the main ideas.
But I got stuck with a test for Excel output. I have a function that makes a report and saves it in an Excel file (I use xlsxwriter to save in Excel format). It has some merged cells, different fonts and colors, but first of all I would like to be sure that the values in the cells are correct.
I would like to have a test that will automatically check content of this file to be sure that function logic isn't broken.
I'm not sure that binary comparison of the generated Excel file to a correct sample is a good idea (as the Excel format is rather complex, and a minor change in the xlsxwriter library may make the files completely different).
So I'm looking for advice on how to implement this kind of test. Has anyone had similar experience?
IMHO a unit test should not touch external things (like file system, database, or network). If your test does this, it is an integration test. These usually run much slower and tend to be brittle because of the external resources.
That said, you have two options: unit test it, mocking the Excel writing, or integration test it, reading the Excel file back after writing.
When you mock the xlsxwriter, you can have your mock check that it receives what should be written. This assumes that you don't want to test xlsxwriter itself, which makes sense because it's not your code, and you usually only test your own code. This makes for a fast test.
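A minimal sketch of the mocking approach; build_report and its shape are invented stand-ins, your real report function will look different:
from unittest import mock

def build_report(workbook, rows):
    # Stand-in for the real report code: writes label/value pairs to a sheet.
    ws = workbook.add_worksheet()
    for r, (label, value) in enumerate(rows):
        ws.write(r, 0, label)
        ws.write(r, 1, value)

def test_build_report_writes_expected_cells():
    workbook = mock.MagicMock()  # takes the place of xlsxwriter.Workbook
    build_report(workbook, [("total", 42)])
    ws = workbook.add_worksheet.return_value
    ws.write.assert_any_call(0, 0, "total")
    ws.write.assert_any_call(0, 1, 42)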
In the other scenario you could open the Excel file again with a reader such as openpyxl and compare the written file to what is expected. This works best if you can avoid the file system and write the Excel data to a memory buffer from which you can read it back. If you can't do that, try using a tempdir for your test, but with that you're already getting into integration-test land. This makes for a slower, more complicated, but also more thorough test.
Personally, I'd write one integration test to see that it works in general, and then a lot of unit tests for the different things you want to write.
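And a sketch of the integration-style test, writing into an in-memory buffer with xlsxwriter and reading the values back with openpyxl (build_report is again an invented stand-in):
import io
import openpyxl
import xlsxwriter

def build_report(workbook, rows):
    ws = workbook.add_worksheet()
    for r, (label, value) in enumerate(rows):
        ws.write(r, 0, label)
        ws.write(r, 1, value)

def test_report_values_roundtrip():
    buffer = io.BytesIO()
    workbook = xlsxwriter.Workbook(buffer, {"in_memory": True})
    build_report(workbook, [("total", 42)])
    workbook.close()  # finalizes the .xlsx data in the buffer

    buffer.seek(0)
    sheet = openpyxl.load_workbook(buffer).active
    # xlsxwriter indexes cells from 0, openpyxl from 1.
    assert sheet.cell(row=1, column=1).value == "total"
    assert sheet.cell(row=1, column=2).value == 42
This checks the cell values without comparing the binary .xlsx files, so a new xlsxwriter release that changes the file layout won't break the test.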

Reproducible test framework

I am looking for a test or integration framework that supports long, costly tests for correctness. The tests should only be rerun if the code affecting the test has changed.
Ideally the test framework would
- find the code of the test
- produce a hash of it
- run the code and write to an output file with the hash as the name, or skip the test if that file already exists
- provide a simple overview of which tests succeeded and which failed
It would be OK if the test has to specify the modules and files it depends on.
Python would be ideal, but this problem may be high-level enough that other languages would work too.
Perhaps there exists already a test or build integration framework I can adapt to fit this behaviour?
Basically you need to track what the test is doing so you can check whether that has changed.
Python code can be traced with sys.settrace(tracefunc). There is a module trace that can help with it.
But if it is not just Python code - if the tests execute other programs, use test input files, etc., and you need to watch those for changes too - then you would need tracing at the operating-system level, with tools like strace, dtrace, or dtruss.
I've created a small demo/prototype of simple testing framework that runs only tests that changed from last run: https://gist.github.com/messa/3825eba3ad3975840400 It uses the trace module. It works this way:
- collect the tests; each test is identified by name
- load the test fingerprints from a JSON file (if present)
- for each test:
  - if the fingerprint matches the current bytecode of the functions listed in the fingerprint, skip the test
  - otherwise run the test, tracing it while it runs and recording every function being called
  - create a new test fingerprint from the function names and the MD5 hashes of each recorded function's bytecode
- save the updated test fingerprints to the JSON file
But there is one problem: it's slow. Running code while tracing it with trace.Trace is about 40x slower than running it without tracing, so you might just be better off running all tests without tracing :) If the tracer were implemented in C - as it is, for example, in the coverage module - it should be faster. (The Python trace module is not written in C.)
Maybe some other tricks could help with speed. Perhaps you are only interested in whether some top-level functions changed, so you don't need to trace all function calls.
Have you considered other ways to speed up expensive tests? Like parallelization or a ramdisk (tmpfs)... For example, if you test against a database, don't use the "system" or development one, but run a special instance of the database with a lightweight configuration (no prealloc, no journal...) from tmpfs. If that is possible, of course - some tests need to run on a configuration similar to production.
Some test frameworks (or their plugins) can run only the tests that failed last time - that's different, but a similar kind of functionality.
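Here is a stripped-down sketch of the fingerprint idea without tracing, where each test simply declares the functions it depends on, which the question said would be acceptable (all names are illustrative):
import hashlib
import json
import os

FINGERPRINT_FILE = "fingerprints.json"

def bytecode_hash(func):
    # Hash the compiled bytecode, so formatting-only edits don't count as changes.
    return hashlib.md5(func.__code__.co_code).hexdigest()

def run_if_changed(test_name, test_func, dependencies):
    stored = {}
    if os.path.exists(FINGERPRINT_FILE):
        with open(FINGERPRINT_FILE) as f:
            stored = json.load(f)
    current = {f.__qualname__: bytecode_hash(f) for f in dependencies}
    if stored.get(test_name) == current:
        print(test_name, "unchanged, skipped")
        return
    test_func()  # raises on failure, like a normal assert-based test
    stored[test_name] = current
    with open(FINGERPRINT_FILE, "w") as f:
        json.dump(stored, f, indent=2)

# usage: run_if_changed("test_parse", test_parse, [parse, helper])
It avoids the tracing overhead, at the price of listing the dependencies by hand.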
This may not be the most efficient way to do this, but this can be done with Python's pickle module.
import pickle
At the end of your file, have it save itself as a pickle.
myfile = open('myfile.py', 'r')  # Your script
savefile = open('savefile.pkl', 'wb')  # File the script will be saved to
# Any file extension can be used, but I like .pkl for "pickle";
# pickle needs the file opened in binary mode ('wb').
mytext = myfile.readlines()
pickle.dump(mytext, savefile)  # Saves the list from readlines() as a pickle
myfile.close()
savefile.close()
And then at the beginning of your script (after you have pickled it once already), add the code bit that checks it against the pickle.
myfile = open('myfile.py', 'r')
savefile = open('savefile.pkl', 'rb')
mytext = myfile.readlines()
savetext = pickle.load(savefile)
myfile.close()
savefile.close()

if mytext == savetext:
    pass  # The script hasn't changed: do whatever you want here
else:
    pass  # The script has changed: more code here (e.g. rerun and re-pickle)
That should work. It's a little long, but it's pure Python and should do what you're looking for.

Is this a wrapper script? What is the correct name for it? (Python)

execfile("es.py")
execfile("ew.py")
execfile("ef.py")
execfile("gw.py")
execfile("sh.py")
print "--Complete--"
It seems more like a "batch" script. It looks analogous to writing a bash or batch file in Unix/Windows to do a bunch of pre-defined things. I definitely agree that it's not really a Pythonic thing to do as written, though.
It is a kind of wrapper script, I guess, but there is no "correct name" for it, as it is not very Pythonic to write something like that.
One would probably write something like
import es
import ew
### ...
es.run() # assuming es code is in a run function
ew.run()
# ...
To expand on that: in Python, a file is supposed to represent a unit of functionality. For example, if you implement a list type/class, you might want to put the class for this type and every function related to it into that one file.
In Python, a function is supposed to be a unit of execution, i.e. a piece of code that is supposed to run at once on some particular data. It's hard to advise you without knowing the content of the files, but the content of every file executed in your snippet would probably benefit from being put into functions. The file division could be kept or not, depending on the functionality.
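As a small sketch, one of the executed files could then look like this (run() is just the name assumed above; the body is invented):
# es.py
def run():
    print("doing the work that es.py used to do at import time")

if __name__ == "__main__":
    run()  # still works as a standalone script
The wrapper imports es and calls es.run(), and the same function can also be called from tests.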

Testing full program by comparing output file to reference file: what's it called, and is there a package for it?

I have a completely non-interactive python program that takes some command-line options and input files and produces output files. It can be fairly easily tested by choosing simple cases and writing the input and expected output files by hand, then running the program on the input files and comparing output files to the expected ones.
1) What's the name for this type of testing?
2) Is there a python package to do this type of testing?
It's not difficult to set up by hand in the most basic form, and I did that already. But then I ran into cases like output files containing the date and other information that can legitimately change between the runs - I considered writing something that would let me specify which sections of the reference files should be allowed to be different and still have the test pass, and realized I might be getting into "reinventing the wheel" territory.
(I rewrote a good part of unittest functionality before I caught myself last time this happened...)
I guess you're referring to a form of system testing.
No package would know which parts can legitimately change. My suggestion is to mock out the sections of code that result in the changes so that you can be sure that the output is always the same - you can use tools like Mock for that. Comparing two files is pretty straightforward, just dump each to a string and compare strings.
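A small sketch of that idea; generate_report, its use of the current date, and the reference path are all made up here:
import datetime
import io
from unittest import mock

def generate_report(outfile):
    outfile.write("generated on %s\n" % datetime.date.today())
    outfile.write("result: 42\n")

def test_report_matches_reference():
    fixed = datetime.date(2020, 1, 1)  # capture a real date before patching
    with mock.patch("datetime.date") as fake_date:
        fake_date.today.return_value = fixed
        out = io.StringIO()
        generate_report(out)
    with open("tests/expected_output.txt") as ref:  # hand-written reference file
        assert out.getvalue() == ref.read()
With the date pinned, the generated output can be compared to the reference as plain strings.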
Functional testing. Or regression testing, if that is its purpose. Or code coverage, if you structure your data to cover all code paths.
