Unit testing a script that opens a file - python

I've written a script that opens a file, reads the content, does some operations and calculations, and stores the results in sets and dictionaries.
How would I write a unit test for such a thing? My questions specifically are:
1. Would I test that the file opened?
2. The file is huge (it's the unix dictionary file). How would I unit test the calculations? Do I literally have to manually calculate everything and test that the result is right? I have a feeling that this defeats the whole purpose of unit testing. I'm not taking any input through stdin.

That's not what unit-testing is about!
Your file doesn't represent a UNIT, so no, you don't test the file or WITH the file!
Your unit tests should cover every single function/method that deals with a) file processing and b) calculations.
It's not unusual for your unit tests to exceed the lines of code of the units under test.
Unit testing means (not a complete, by-the-book definition):
minimalistic/atomic - you split your units down to the most basic/simple unit possible; a unit is normally a callable (method, function, callable object)
separation of concerns - you test ONE and only ONE thing in every single test; if you want to test different conditions of a single unit, you write different tests
determinism - you give the unit something to process, with prior knowledge of what its result SHOULD be
if your unit-under-test needs a specific environment, you create a fixture/test-setup/mock
unit tests are (as a rule of thumb) blazingly fast! If a test is slow, check whether you violated one of the points above
if you need to test something that violates one of the points above, you may have taken the next step in testing, towards integration tests
you may use a unit-testing framework for things other than unit testing, but don't call it a unit test just because it uses the unittest framework
This guy (Gary Bernhardt) has some interesting practical examples of what testing and unit-testing means.
Update for some clarifications:
"1. Would I test that the file opened?"
Well, you could do that, but what would be the "UNIT" for that? Keep in mind that a test has just two outcomes: pass and fail. If your test fails, it should (ideally must) have only one reason for that: your unit (= function) sucks! But in this case your test can fail because:
* the file doesn't exist
* is locked
* is corrupted
* no file-handles left
* out of memory (big file)
* moon phase
and so on.
So what would a failing (or passing) "unit" test say about your unit? You wouldn't be testing your unit alone, but the whole surrounding environment with it. That's more of a system test!
If you would nonetheless like to test for successful file opening, you should at least mock a file.
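For example, a minimal sketch using the standard library's mock_open; load_words(path) and mymodule are assumptions standing in for your real unit that opens the file and returns the words in it:

from unittest import mock

from mymodule import load_words  # hypothetical unit under test

def test_load_words_reads_mocked_file():
    fake_file = mock.mock_open(read_data="apple\nbanana\n")
    with mock.patch("builtins.open", fake_file):  # no real file is touched
        words = load_words("ignored.txt")
    # assumes load_words calls open(path) with no extra arguments
    fake_file.assert_called_once_with("ignored.txt")
    assert words == ["apple", "banana"]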
"2 ... How would I unit test the calculations? Do I literally have to manually calculate everything and test that the result is right?"
No. You would write test for the corner- and regular-cases and check the expected outcome against the processed one. The amount of tests needed depends on the complexity of your calculations and the exceptions to the rule.
e.g.:
def test_negative_factor(self):
    result = calculate(negative_factor)  # placeholders: your calculation unit and input
    self.assertEqual(result, expected_result)

def test_discontinuity(self):
    with self.assertRaises(ValueError):  # placeholder: whatever your unit raises
        calculate(undefined_value)       # the point where the function is undefined
I hope I made myself clearer!

You should refactor your code to be unit-testable. Off the top of my head, that would mean:
Move the file-opening functionality into a separate unit. Make that new unit receive the file name and return a stream of the contents.
Make your unit receive a stream and read it, instead of opening a file and reading it.
Write a unit test for your main (calculation) unit. You would need to mock a stream, e.g. from a dictionary (see the sketch after these steps). Write several test cases, each time providing your unit with a different stream, and make sure your unit calculates the correct data for each input.
Get your code coverage as close to 100% as you can. Use nosetests for coverage.
Finally, write a test for your 'stream provider'. Feed it with several files (store them in your test folder), and make sure your stream provider reads them correctly.
Get the second unit test coverage as close to 100% as you can.
Now, and only now, commit your code.
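A minimal sketch of that split, with hypothetical names (get_stream, calculate_stats) standing in for your real units:

import io
import unittest

def get_stream(file_name):
    # stream provider: the only unit that touches the file system
    return open(file_name, encoding="utf-8")

def calculate_stats(stream):
    # calculation unit: works on any iterable of lines, real file or not
    words = {line.strip() for line in stream if line.strip()}
    return {"count": len(words), "longest": max(words, key=len)}

class TestCalculateStats(unittest.TestCase):
    def test_counts_unique_words(self):
        stream = io.StringIO("apple\nbanana\napple\n")
        stats = calculate_stats(stream)
        self.assertEqual(stats["count"], 2)
        self.assertEqual(stats["longest"], "banana")

With this split, the calculation tests never touch the disk, and only the small stream-provider test needs real files from your test folder.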

You haven't explained what the calculations are, but I guess your program should be able to work with a subset of the big file as well. If this is the case, make a unit test which opens up a small file, does the calculations and produces some result, which you can verify is correct.
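For instance, a sketch using pytest's tmp_path fixture; run_calculations and mymodule are assumptions standing in for your real entry point:

from mymodule import run_calculations  # hypothetical function: reads a word list, returns results

def test_calculations_on_small_word_list(tmp_path):
    small_file = tmp_path / "words.txt"
    small_file.write_text("apple\nbanana\ncherry\n")
    result = run_calculations(str(small_file))
    # the input is small enough that the expected values can be worked out by hand
    assert result["word_count"] == 3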

Related

Python Unit Test for writing Excel file

I'm not very familiar with pytest but am trying to incorporate it into my project. I already have some tests and understand the main ideas.
But I got stuck on a test for Excel output. I have a function that makes a report and saves it in an Excel file (I use xlsxwriter to save in Excel format). It has some merged cells, different fonts and colors, but first of all I would like to be sure that the values in the cells are correct.
I would like to have a test that will automatically check the content of this file to be sure the function's logic isn't broken.
I'm not sure that a binary comparison of the generated Excel file to a correct sample is a good idea (the Excel format is rather complex, and a minor change in the xlsxwriter library may make the files completely different).
So I'm seeking advice on how to implement this kind of test. Has anyone had a similar experience? Can you give advice?
IMHO a unit test should not touch external things (like file system, database, or network). If your test does this, it is an integration test. These usually run much slower and tend to be brittle because of the external resources.
That said, you have 2 options: unit test it, mocking the xls writing or integration test it, reading the xls file again after writing.
When you mock the xlswriter, you can have your mock check that it receives what should be written. This assumes that you don't want to test the actual xlswriter, which makes sense cause it's not your code, and you usually just test your own code. This makes for a fast test.
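A minimal sketch of the mocking option; report.make_report(path) is an assumption for a module that does "import xlsxwriter" and writes cell values via worksheet.write(row, col, value):

from unittest import mock

import report  # hypothetical module containing make_report

def test_report_writes_expected_values():
    with mock.patch("report.xlsxwriter.Workbook") as workbook_cls:
        worksheet = workbook_cls.return_value.add_worksheet.return_value
        report.make_report("ignored.xlsx")  # no real file is created
        # the mock records every write call; check the cells you care about
        worksheet.write.assert_any_call(0, 0, "Revenue")
        worksheet.write.assert_any_call(0, 1, 1234.5)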
In the other scenario you could open the Excel file with an xlsx reader (openpyxl, for example) and compare the written file to what is expected. This is probably best if you can avoid the file system and write the xlsx data to a memory buffer from which you can read again. If you can't do that, try using a tempdir for your test, but with that you're already getting into integration-test land. This makes for a slower, more complicated, but also more thorough test.
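A sketch of the read-back option, keeping everything in memory; build_report here is an assumption standing in for your real report function, and openpyxl is just one possible reader:

import io

import openpyxl
import xlsxwriter

def build_report(stream):
    # stand-in for your real report function, writing to any file-like object
    workbook = xlsxwriter.Workbook(stream, {"in_memory": True})
    worksheet = workbook.add_worksheet()
    worksheet.write(0, 0, "Revenue")
    worksheet.write(0, 1, 1234.5)
    workbook.close()

def test_report_values_in_memory():
    buffer = io.BytesIO()
    build_report(buffer)
    buffer.seek(0)
    sheet = openpyxl.load_workbook(buffer).active
    # xlsxwriter indexes cells from 0, openpyxl from 1
    assert sheet.cell(row=1, column=1).value == "Revenue"
    assert sheet.cell(row=1, column=2).value == 1234.5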
Personally, I'd write one integration test to see that it works in general, and then a lot of unit tests for the different things you want to write.

Is there a good way to test partial write failures to files?

Is there a good way to test partial write failures to files? I'm particularly interested in simulating a full disk.
I have some code which modifies a file. For some failures there's nothing the code can do, e.g. if the disk is unplugged while writing. But for other predictable failures, such as a full disk, my code should (and can) catch the exception and undo all changes since the most recent modification began.
I think my code does this well, but am struggling to find a way to exhaustively unit test it. It's difficult to write a unit test that limits a real file system [1]. I don't see any way to limit a BytesIO. I'm not aware of any mock packages for this.
Are there any standard tools/techniques for this before I write my own?
[1] Limiting a real file system is hard for a few reasons. The biggest difficulty is that file systems are usually limited in blocks of a few KiB, not bytes. It's hard to make such a test cover all unhappy paths. That is, a good test would be repeated with limits of different lengths to ensure every individual file.write(...) call errors in the test, but achieving this with block sizes of, say, 4 KiB is going to be difficult.
Disclaimer: I'm a contributor to pyfakefs.
This may be overkill for you, but you could simulate the whole file system using pyfakefs. This allows you to set the file system size beforehand. Here is a trivial example using pytest:
import os

import pytest

def test_disk_full(fs):  # fs is the fake file system fixture provided by pyfakefs
    fs.set_disk_usage(100)  # sets the file system size in bytes
    os.makedirs('/foo')
    with open('/foo/bar.txt', 'w') as f:
        with pytest.raises(OSError):
            f.write('a' * 200)
            f.flush()

How to compare cpp and python outputs?

I am trying to reproduce some functionality from Python in C++. It is an involved numerical method with a bunch of subfunctions.
Is there a nice way to compare the values of the Python functions with the C++ functions that I am writing and that should mirror them?
Could someone paste some code or give a mini tutorial?
Or some pointers or references?
Many thanks!
Artabalt
Testing.
Write tests for the functionality in Python (unless they already exist; Python code is usually tested).
Port the tests to C++. Then make your C++ functions pass the tests. Ideally, make the tests a target in your makefile and run them whenever you can.
Edit:
You can test randomly; whether or not that's a good idea might depend on your particular case.
The rule of thumb is to use a couple of tests for each border case (you can use code coverage to see if there's a case you're missing).
As an example, at my work I use an integration test that requires a lot of components. We simulate a betting process (the whole process excluding the user interface), at a given point in time, with a hardcoded server response, and we simulate the hardware printer with a redirection to a bmp.
Now, you can't test everything. I can't test every valid combination of numbers (6 numbers from 0 to 36?), in every possible order. Neither can I test every invalid combination (infinite?). You test a couple. The meaningful ones.
So we give it always the same numbers. The test is that it has to produce always the same bmp (has to render fonts the same, has to have the same texts, the same betting date, etc.).
Now, the level of automation you can get in generating these tests depends on your code and your application.
I don't know your particular case, but I would start by creating a little program (the simpler, the better) that uses your library: one implementation in Python, one implementation in C++. If you were testing a string library, for example, you would write a small program that deletes a letter inside a string, appends some text, erases every other letter, etc.
Then you automate for a couple of test cases:
cat largetext.txt | ./my_python_program > output.python_program.txt
cat largetext.txt | ./my_cpp_program > output.cpp_program.txt
diff output.python_program.txt output.cpp_program.txt || echo "ERROR"
The ideal case is that every time you compile, the tests get run (this can be done if the code is simple; if not, you can run the tests at the end of the day, for example).
This doesn't guarantee the programs are the same (you can't test every possible input), but it gives you some confidence. If you see something is not right, you first add it to your tests, then make it fail, then fix it until it passes.
Regards.

How can I test a procedural python script properly

I'm pretty new to Python. However, I am writing a script that loads some data from a file and generates another file. My script has several functions and it also needs two user inputs (paths) to work.
Now I am wondering if there is a way to test each function individually. Since there are no classes, I don't think I can do it with unit tests, can I?
What is the common way to test a script, if I don't want to run the whole script all the time? Someone else has to maintain the script later. Therefore, something similar to unit tests would be awesome.
Thanks for your inputs!
If you write your code in the form of functions that operate on file objects (streams) or, if the data is small enough, that accept and return strings, you can easily write tests that feed in the appropriate data and check the results. If the real data is large enough to need streams, but the test data is not, use io.StringIO in the test code to adapt.
Then use the __name__=="__main__" trick to allow your unit test driver to import the file without running the user-facing script.
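A minimal sketch of that layout; myscript.py and the function names are assumptions, not your actual code:

# myscript.py
import sys

def transform(in_stream, out_stream):
    # one individually testable function: reads from a stream, writes to a stream
    for line in in_stream:
        out_stream.write(line.upper())

def main(in_path, out_path):
    with open(in_path) as src, open(out_path, "w") as dst:
        transform(src, dst)

if __name__ == "__main__":  # only runs when invoked as a script, not on import
    main(sys.argv[1], sys.argv[2])

# test_myscript.py
import io
from myscript import transform  # importing does not run main()

def test_transform_uppercases_lines():
    out = io.StringIO()
    transform(io.StringIO("abc\ndef\n"), out)
    assert out.getvalue() == "ABC\nDEF\n"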

Testing full program by comparing output file to reference file: what's it called, and is there a package for it?

I have a completely non-interactive python program that takes some command-line options and input files and produces output files. It can be fairly easily tested by choosing simple cases and writing the input and expected output files by hand, then running the program on the input files and comparing output files to the expected ones.
1) What's the name for this type of testing?
2) Is there a python package to do this type of testing?
It's not difficult to set up by hand in the most basic form, and I did that already. But then I ran into cases like output files containing the date and other information that can legitimately change between the runs - I considered writing something that would let me specify which sections of the reference files should be allowed to be different and still have the test pass, and realized I might be getting into "reinventing the wheel" territory.
(I rewrote a good part of unittest functionality before I caught myself last time this happened...)
I guess you're referring to a form of system testing.
No package would know which parts can legitimately change. My suggestion is to mock out the sections of code that result in the changes so that you can be sure that the output is always the same - you can use tools like Mock for that. Comparing two files is pretty straightforward, just dump each to a string and compare strings.
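A small sketch of that idea, assuming a hypothetical myprog.write_report(path) that does "import datetime" and stamps datetime.datetime.now() into its output:

from pathlib import Path
from unittest import mock
import datetime

import myprog  # hypothetical module under test

def test_output_matches_reference(tmp_path):
    # pin the clock so the date in the output is the same on every run
    fixed = datetime.datetime(2020, 1, 1)
    with mock.patch("myprog.datetime") as fake_dt:
        fake_dt.datetime.now.return_value = fixed
        out_file = tmp_path / "report.txt"
        myprog.write_report(str(out_file))
    expected = Path("tests/reference/report.txt").read_text()
    assert out_file.read_text() == expected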
Functional testing. Or regression testing, if that is its purpose. Or code coverage, if you structure your data to cover all code paths.
