I've never written a proper test until now, only small programs that I would dispose of after the test succeeded. I was looking through Python's unittest module and tutorials around the web, but something's not clear to me.
How much should one TestCase cover? I've seen examples on the web that have TestCase classes with only one method, as well as classes that test almost the entire available functionality.
In my case, I'm trying to write a test for a simple bloom filter. How do you think I should organize my test cases?
To put it simply: one unit test should cover a single feature of your program. That's all there is to it. That's why they're called unit tests.
Of course, what counts as a feature may vary. Think about the smallest parts of your program that might break or not work as expected, and about the business requirements of your code. Those are the parts you want covered, each by a dedicated unit test.
Usually, unit tests are small, isolated and atomic. They should be easy to understand, they should pass or fail independently of one another, and they should execute fast. A fairly good indication of a proper unit test is a single assertion: if you find yourself writing more, you are probably testing too much, which is a sign you need more than one test for the given feature. However, this is not a strict rule; the more complex the code involved, the more complex the unit tests tend to be.
When writing tests, try to split your code's functionality and test those parts separately (this is what the atomicity of your tests is about). For example, if you have a method that verifies input, then calls a service, and finally returns a result, you usually want all three steps (verify, call, return) covered.
I would create one TestCase with several test methods. A bloom filter has simple semantics, so one TestCase is enough; I usually add a TestCase per feature.
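For example, a minimal sketch of such a TestCase; the BloomFilter class here is only a tiny stand-in for your own implementation (its add()/in interface is an assumption, so adapt the tests to your actual API):

import unittest


class BloomFilter:
    """Tiny stand-in for the asker's real bloom filter (illustration only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        return [hash((i, item)) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits & (1 << pos) for pos in self._positions(item))


class TestBloomFilter(unittest.TestCase):
    def setUp(self):
        self.bf = BloomFilter()

    def test_added_item_is_reported_present(self):
        self.bf.add("foo")
        self.assertIn("foo", self.bf)

    def test_empty_filter_reports_nothing_present(self):
        self.assertNotIn("anything", self.bf)

    def test_missing_item_is_usually_reported_absent(self):
        # A bloom filter never gives false negatives but may give false
        # positives, so this only checks the overwhelmingly common case.
        self.bf.add("foo")
        self.assertNotIn("bar", self.bf)


if __name__ == "__main__":
    unittest.main()

Each test method covers one observable behaviour and makes a single assertion, which matches the rule of thumb above.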
Related
I have the following scenario:
I have a list of "dangerous patterns"; each is a string that contains dangerous characters, such as "%s with embedded ' single quote", "%s with embedded \t horizontal tab" and similar (it's about a dozen patterns).
Each test is to be run
once with a vanilla pattern that does not inject dangerous characters (i.e. "%s"), and
once for each dangerous pattern.
If the vanilla test fails, we skip the dangerous pattern tests on the assumption that they don't make much sense if the vanilla case fails.
Various other constraints:
Stick with the batteries included in Python as far as possible (i.e. unittest is hopefully enough, even if nose would work).
I'd like to keep the contract of unittest.TestCase as far as possible, i.e. the solution should not affect test discovery (which might pick up everything that starts with test, but then there's also runTest, which may be overridden via the constructor, and more variation).
I tried a few solutions, and I am not happy with any of them:
Writing an abstract class causes unittest to try to run it as a test case (because it quacks like a test case). This can be worked around, but the code gets ugly fast. Also, a whole lot of functions need to be overridden, and for several of them the documentation is a bit unclear about what properties need to be implemented in the base class. Plus, test discovery would have to be replicated, which means duplicating code from inside unittest.
Writing a function that executes the tests as SubTests, to be called from each test function. Requires boilerplate in every test function, and gives just a single test result for the entire series of tests.
Writing a decorator. This avoids the test-case discovery problem of the abstract class approach, but has all the other problems.
Writing a decorator that takes a TestCase and returns a TestSuite. This has worked best so far, but I don't know whether I can add the same TestCase object multiple times, or whether TestCase objects can be copied (I have control over all of them, but I don't know what the base class does or expects in this regard).
What's the best approach?
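For reference, a minimal sketch of the subTest variant (the second approach above); is_dangerous() is a hypothetical stand-in for the code under test:

import unittest


def is_dangerous(text):
    """Hypothetical stand-in for the real code under test."""
    return "'" in text or "\t" in text


DANGEROUS_PATTERNS = [
    "%s with embedded ' single quote",
    "%s with embedded \t horizontal tab",
    # ... roughly a dozen patterns in the real list
]


class TestPatterns(unittest.TestCase):
    def test_patterns(self):
        # Vanilla case first: if this assertion fails, the dangerous
        # variants below are never executed.
        self.assertFalse(is_dangerous("%s"))
        for pattern in DANGEROUS_PATTERNS:
            with self.subTest(pattern=pattern):
                self.assertTrue(is_dangerous(pattern))


if __name__ == "__main__":
    unittest.main()

The drawback noted above still applies: unittest counts this as a single test, although each failing subTest is reported individually.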
So I'm trying to decide how to plan and organize a testing suite for my Python project, but I'm in doubt about when a unit test is no longer a unit test, and I would love some feedback from the community.
If I understand correctly:
A unit test tests a minimal part of your code, be it a function/method that does one and only one simple thing, even if it has several use cases.
An integration test tests that two or more units of your code, executed under the same context, environment, etc. (while trying to keep the number of units per integration test to a minimum), work well together and not only by themselves.
My doubt is: say I have a simple function that performs an HTTP request and returns the content of that request, be it HTML, JSON, etc. It doesn't matter; the point is that the function is very, very simple but requests information from an external source, like:
import requests


def my_function(arg):
    # Do something very simple with `arg`, like removing surrounding spaces
    # (the simplest thing you can imagine).
    arg = arg.strip()
    return requests.get('http://www.google.com/' + arg).content
Now this is a very stupid example, but my doubt is:
Given that this function is requesting information from an external source, when you write a test for it, can you still consider such a test a unit test?
UPDATE: The test for my_function() would stub out calls to the external source so that it doesn't depend on the network/db/filesystem/etc. and stays isolated. But the fact remains that the function being tested depends on external sources when running, for example, in production.
Thanks in advance!! :)
P.S.: Of course, maybe I'm not understanding 100% the purposes of unit and integration testing, so if I'm mistaken, please point out where; I'll appreciate it.
Based on your update:
The test for my_function() would stub out calls to the external source so that it doesn't depend on the network/db/filesystem/etc. and stays isolated. But the fact remains that the function being tested depends on external sources when running, for example, in production.
As long as the external dependencies are stubbed out during your test then yes you can call it a Unit Test. It's a Unit Test based on how the unit under test behaves in your test suite rather than how the unit behaves in production.
Based on your original question:
Given that this function is requesting information from an external source, when you write a test for it, can you still consider such a test a unit test?
No, any test for code that touches or depends on things external to the unit under test is an integration test. This includes the existence of any web, file system, and database requests.
If your code is not 100% isolated from its dependencies and not 100% reproducible without other components, then it is an integration test.
For your example code to be properly unit tested you would need to mock out the call to google.com.
With the code calling google.com, your test would fail if Google went down or you lost your connection to the internet (i.e. the test is not 100% isolated). Your test would also fail if the behavior of Google changed (i.e. the test is not 100% reproducible).
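For example, a minimal sketch of that mocking with unittest.mock; the function under test is inlined here for brevity (normally you would import it from its own module and patch the requests.get reference in that module):

import unittest
from unittest import mock

import requests


# Inlined copy of the function under test, purely for illustration.
def my_function(arg):
    arg = arg.strip()
    return requests.get('http://www.google.com/' + arg).content


class TestMyFunction(unittest.TestCase):
    @mock.patch('requests.get')
    def test_returns_response_content(self, mock_get):
        # No real network traffic: the test is isolated and reproducible.
        mock_get.return_value.content = b'<html>stub</html>'

        result = my_function('  search  ')

        mock_get.assert_called_once_with('http://www.google.com/search')
        self.assertEqual(result, b'<html>stub</html>')


if __name__ == '__main__':
    unittest.main()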
I don't think this is an integration test, since it doesn't make use of different parts of your application. The function really does one thing, and tests for it can be called unit tests.
On the other side, this particular function has an external dependency and you don't want to be dependent on the network in your tests. This is where mocking would really help a lot.
In other words, isolate the function and make unit tests for it.
Also, make integration tests that would use a more high-level approach and would test parts of your application which call my_function().
Whether or not your code is tested in isolation is not what distinguishes a unit test from other kinds of test. For example, you would (except in very rare cases) not stub standard library functions like sin(x), but not stubbing sin(x) does not make your test an integration test.
When is a test an integration test? A test is an integration test if your goal with the test is to find bugs on integration level. That means, with an integration test you want to find out whether the interaction between two (or more) components is based on the same assumptions on both (all) sides.
Mocking, however, is an orthogonal technique that can be combined with almost all kinds of tests. (In an integration test you cannot mock the partners of the interaction you want to test, but you can mock additional components.)
Since mocking typically causes some effort, it must bring a benefit, like:
significantly speeding up your tests
testing even though some software part is not ready yet or is buggy
testing exceptional cases that are hard to set up in the integrated software
getting rid of nondeterministic behaviour like timing or randomness
...
If, however, mocking does not solve a real problem, you might be better off using the depended-on component directly.
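For instance, the nondeterminism point from the list above; coin_flip() is a made-up function standing in for the code under test:

import random
import unittest
from unittest import mock


def coin_flip():
    """Made-up code under test: nondeterministic without mocking."""
    return "heads" if random.random() < 0.5 else "tails"


class TestCoinFlip(unittest.TestCase):
    def test_low_value_means_heads(self):
        # Pin the random source so the test is reproducible.
        with mock.patch("random.random", return_value=0.1):
            self.assertEqual(coin_flip(), "heads")

    def test_high_value_means_tails(self):
        with mock.patch("random.random", return_value=0.9):
            self.assertEqual(coin_flip(), "tails")


if __name__ == "__main__":
    unittest.main()

Without the patch this test would pass or fail at random; with it, each branch is pinned down deterministically.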
I have a static library I created in C++, and I would like to test it using driver code.
I noticed one of my professors likes to do his tests using Python, but he simply executes the program (not a library in this case, but an executable) using random test arguments.
I would like to take this approach, but I realized that this is a library and doesn't have a main function; that would mean I should either create a Driver.cpp class, or wrap the library into Python using SWIG or Boost.Python.
I'm planning to do the latter because it seems more fun, but logically I feel there are going to be more bugs when wrapping a library into a different language just to test it, rather than testing it in its native language.
Is testing programs in a different language an accepted practice in the real world, or is this bad practice?
I'd say that it's best to test the API that your users will be exposed to. Other tests are good to have as well, but that's the most important aspect.
If your users are going to write C/C++ code linking to your library, then it would be good to have tests making use of your library the same way.
If you are going to ship a Python wrapper (why not?) then you should have Python tests.
Of course, there is a convenience aspect to this, as well. It may be easier to write tests in Python, and you might have time constraints that make it more appealing, etc.
I guess what I'm saying is: There's nothing inherently wrong with tests being in a different language from the code under test (that's totally normal for testing a REST API, for instance), but make sure you have tests for the public-facing API at a minimum.
Aside, on terminology:
I don't think the types of tests you are describing are "unit tests" in the usual sense of the term. "Functional tests" would probably be more accurate.
A unit test typically tests a very small component - such as a function call - that might be one piece of larger functionality. Unit tests like these are often "white box" tests, so you can see the inner workings of your code.
Tests written from a user's point of view (such as your professor's command-line tests) are "black box" tests, and in these examples they sit at a more functional level rather than the "unit" level.
I'm sure plenty of people may disagree with that, though - it's not a rigidly-defined set of terms.
A few things to keep in mind:
If you are writing tests as you code, then, by all means, use whatever language works best to give you rapid feedback. This enables fast test-code cycles (and is fun as well). BUT.
Always have well-written tests in the language of the consumer. How is your client/consumer going to call your functions? What language will they be using? Using the same language minimizes integration issues later on in the life-cycle.
It really depends on what you are trying to test. It almost always makes sense to write unit tests in the same language as the code under test, so that you can construct the objects or invoke the functions under test directly and verify that they work correctly. There are, however, cases in which it makes sense to use a different language, namely:
Integration tests that run a number of different components or applications together.
Tests that verify compilation or interpretation failures which could not be tested in the language, itself, since you are validating that an error occurs at the language level.
An example of #1 might be a program that starts up multiple different servers connected to each other, issues requests to the server, and verifies those responses. Or, as a simpler example, a program that simply forks an application under test as a subprocess and verifies that it produces the expected outputs for a given input.
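A minimal sketch of that simpler case; the one-liner invoked via sys.executable stands in for whatever application you would actually fork:

import subprocess
import sys
import unittest


class TestCommandLineTool(unittest.TestCase):
    def test_uppercases_input(self):
        # Stand-in for the real application under test: a tiny Python
        # one-liner that upper-cases its input. Replace argv with your binary.
        argv = [sys.executable, "-c", "print(input().upper())"]
        result = subprocess.run(
            argv, input="hello\n", capture_output=True, text=True, check=True
        )
        self.assertEqual(result.stdout, "HELLO\n")


if __name__ == "__main__":
    unittest.main()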
An example of #2 might be a program that verifies that a certain piece of C++ code will produce a static assertion failure or that a particular template instantiation which is intentionally disallowed will result in a compilation failure if someone attempts to use it.
To answer your larger question, it is not bad practice per se to write tests in a different language. Whatever makes the tests more convenient to write, easier to understand, more robust to changes in implementation, more sensitive to regressions, or better on any of the other properties that define good tests is a good justification to write them one way versus another. If that means writing the tests in another language, then go for it. That being said, small unit tests typically need to invoke the item under test directly, which in most cases means writing the unit tests in the same language as the component under test.
I would say it depends on what you're actually trying to test. For true unit testing it is, I think, best to test in the same language, or at least a binary-compatible language (e.g. testing Java with Groovy; I use Spock, which is Groovy-based, to unit-test my Java code, since I can intermingle the Java with the Groovy). But if you are testing results, then I think it's fair to switch languages.
For example, I have tested the expected results for a specific set of data when running a Perl application, via nose in Python. This works because I'm not unit testing the Perl code per se, but the outcomes of that Perl code.
In that case, to unit test actual Perl functions that are part of the application, I would use a Perl-based test framework such as Test::More.
Why not? It's an awesome idea, because it makes you treat the unit under test as a black box.
Of course there may be technical issues involved; for instance, if you need to mock some parts of the unit under test, that may be difficult from a different language.
This is a common practice for integration tests, though. I've seen lots of programs driven from external tools, such as a website from Selenium or an application from Cucumber. Both of those can be considered the same as a custom Python script.
If you consider that the difference between integration testing and unit testing is the number of things under test at any given time, the only reason not to do this is tool support.
I have a test suite for my app. As the test suite grew organically, the tests accumulated a lot of repeated code that could be refactored.
However, I would like to ensure that the test suite's behaviour doesn't change with the refactor. How can I verify that my tests are invariant under the refactor?
(I am using Python+UnitTest), but I guess the answer to this can be language agnostic.
The real test for the tests is the production code.
An effective way to check that a test code refactor hasn't broken your tests would be to do Mutation Testing, in which a copy of the code under test is mutated to introduce errors in order to verify that your tests catch the errors. This is a tactic used by some test coverage tools.
I haven't used it (and I'm not really a python coder), but this seems to be supported by the Python Mutant Tester, so that might be worth looking at.
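To make the idea concrete, here is a tiny hand-rolled illustration of what a mutation testing tool automates (both functions are made up):

# Production code under test.
def is_adult(age):
    return age >= 18


# A "mutant" the tool might generate: >= flipped to >.
def is_adult_mutant(age):
    return age > 18


# A suite that only checks an easy value passes against both versions,
# i.e. it does NOT kill the mutant; that is a warning that the
# refactored tests got weaker:
assert is_adult(30) == is_adult_mutant(30)

# A boundary check distinguishes the two, so a suite containing it
# would kill the mutant:
assert is_adult(18) != is_adult_mutant(18)

If your refactored suite no longer kills mutants that the old suite killed, the refactor has weakened the tests.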
Coverage.py is your friend.
Move over all the tests you want to refactor into "system tests" (or some such tag). Refactor the tests you want (you would be doing unit tests here right?) and monitor the coverage:
After running your new unit tests but before running the system tests
After running both the new unit tests and the system tests.
In the ideal case, the coverage would be the same or higher, and you can then throw away your old system tests.
FWIW, py.test provides a mechanism for easily tagging tests and running only specific ones, and it is compatible with unittest2 tests.
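As a sketch of that tagging (the marker name system is an arbitrary choice, and custom markers should be registered under markers = in pytest.ini to avoid warnings):

import pytest


# Old test kept around under a "system" tag so it can be selected as a group.
@pytest.mark.system
def test_old_end_to_end_flow():
    assert True  # placeholder for the original, un-refactored test


# New, refactored unit test, left untagged.
def test_new_refactored_unit():
    assert True  # placeholder for the refactored test

Running pytest -m "not system" then exercises only the new unit tests, while pytest -m system runs only the old ones.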
Interesting question - I'm always keen to hear discussions of the type "how do I test the tests?!". And good points from #marksweb above too.
It's always a challenge to check that your tests are actually doing what you want them to do and testing what you intend, but it's good to get this right and do it properly. I always try to follow the rule of thumb that testing should make up about 1/3 of the development effort in any project... regardless of the time constraints, pressures and problems that inevitably crop up.
If you intend to continue and grow your project, have you considered refactoring as you say, but in a way that creates a proper test framework allowing test-driven development (TDD) of any future additions of functionality or general expansion of the project?
Although you mentioned Python, I would like to comment on how refactoring is applied in Smalltalk. Most modern Smalltalk implementations include a "Refactoring Browser" integrated into the System Browser for restructuring source code. The RB includes a rewrite framework to perform the transformations you asked about dynamically, preserving the system's behavior and stability. One way to use it is to open a scoped browser, apply refactorings, and review/edit the changes with a diff tool before committing. I don't know about the maturity of Python refactoring tools, but it took the Smalltalk community many iteration cycles (years) to arrive at such an amazing piece of software.
Don Roberts and John Brant wrote one of the first refactoring browsers, which now serves as the standard for refactoring tools. There are videos demonstrating some of these features. To promote a method into a superclass in Pharo, you just select the method and use the refactor "pull up" menu item. The rule will detect duplicated sub-implementors and let you review them for deletion before the refactoring is executed. Refactorings are applied regardless of test code.
In theory you could write a test for the test, mocking the actual object under test. But I guess that is just way too much work and not worth it.
So what you are left with are some strategies that will help, but won't make this fail-safe.
Work very carefully and slowly. Use the features of your IDE as much as possible in order to limit the chance of human error.
Work in pairs. A partner looking over your shoulder might just spot the glitch that you missed.
Copy the test, then refactor the copy. When done, introduce errors in the production code to ensure that both tests find the problem in the same (or equivalent) ways. Only then remove the original test.
The last step can be done by tools, although I don't know the Python flavors. The keyword to search for is 'mutation testing'.
Having said all that, I'm personally satisfied with steps 1+2.
I can't see an easy way to refactor a test suite, and depending on the extent of your refactor you're obviously going to have to change the test suite. How big is your test suite?
Refactoring properly takes time and attention to detail (and a lot of Ctrl+C Ctrl+V!). Whenever I've refactored my tests I don't try and find any quick ways of doing things, besides find & replace, because there is too much risk involved.
You're best off doing things properly and manually, albeit slowly, if you want to keep the quality of your tests.
Don't refactor the test suite.
The purpose of refactoring is to make it easier to maintain the code, not to satisfy some abstract criterion of "code niceness". Test code doesn't need to be nice, it doesn't need to avoid repetition, but it does need to be thorough. Once you have a test that is valid (i.e. it really does test necessary conditions on the code under test), you should never remove it or change it, so test code doesn't need to be easy to maintain en masse.
If you like, you can rewrite the existing tests to be nice, and run the new tests in addition to the old ones. This guarantees that the new combined test suite catches all the errors that the old one did (and maybe some more, as you expand the new code in future).
There are two ways that a test can be deemed invalid -- you realise that it's wrong (i.e. it sometimes fails falsely for correct code under test), or else the interface under test has changed (to remove the API tested, or to permit behaviour that previously was a test failure). In that case you can remove a test from the suite. If you realise that a whole bunch of tests are wrong (because they contain duplicated code that is wrong), then you can remove them all and replace them with a refactored and corrected version. You don't remove tests just because you don't like the style of their source.
To answer your specific question: to test that your new test code is equivalent to the old code, you would have to ensure (a) all the new tests pass on your currently-correct-as-far-as-you-known code base, which is easy, but also (b) the new tests detect all the errors that the old tests detect, which is usually not possible because you don't have on hand a suite of faulty implementations of the code under test.
Test code can be the best low-level documentation of your API, since it does not go out of date as long as the tests pass and are correct. But messy test code doesn't serve that purpose very well, so refactoring is essential.
Your tested code will also change over time, and so will the tests. If you want that to go smoothly, code duplication must be minimized and readability is key.
Tests should be easy to read, always test one thing at a time, and make the following explicit:
what are the preconditions?
what is being executed?
what is the expected outcome?
If that is considered, it should be pretty safe to refactor the test code. One step at a time and, as #Don Ruby mentioned, let your production code be the test for the test.
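As an illustration, a test shaped around those three questions might look like this (the Account class is a made-up stand-in for real production code):

import unittest


class Account:
    """Minimal stand-in for the real code under test."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount


class TestWithdrawal(unittest.TestCase):
    def test_withdrawal_reduces_balance(self):
        # Preconditions: an account with a known balance.
        account = Account(balance=100)

        # What is being executed: the single action under test.
        account.withdraw(30)

        # Expected outcome: one explicit assertion.
        self.assertEqual(account.balance, 70)


if __name__ == "__main__":
    unittest.main()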
For many refactorings you can often safely rely on advanced IDE tooling, provided you watch out for side effects in the extracted code.
Although I agree that refactoring without proper test coverage should be avoided, I think writing tests for your tests is almost absurd in usual contexts.
Example
Let's say you have a hypothetical API like this:
import foo
account_name = foo.register()
session = foo.login(account_name)
session.do_something()
The key point being that in order to do_something(), you need to be registered and logged in.
Here's an over-simplified, first-pass, suite of unit tests one might write:
# test_foo.py
import foo


def test_registration_should_succeed():
    foo.register()


def test_login_should_succeed():
    account_name = foo.register()
    foo.login(account_name)


def test_do_something_should_succeed():
    account_name = foo.register()
    session = foo.login(account_name)
    session.do_something()
The Problem
When registration fails, all the tests fail, and that makes it non-obvious where the real problem is. It looks like everything's broken, but really only one crucial thing is broken. It's hard to find that one crucial thing unless you are familiar with all the tests.
The Question
How do you structure your unit tests so that subsequent tests aren't executed when core functionality on which they depend fails?
Ideas
Here are possible solutions I've thought of.
Manually detect failures in each test and raise SkipTest. - Works, but it's a lot of manual, error-prone work.
Leverage generators to not yield subsequent tests when the primary ones fail. - Not sure whether this would actually work (how do I "know" that the previously yielded test failed?).
Group tests into test classes. E.g., these are all the unit tests that require you to be logged in. - Not sure this actually addresses my issue. Wouldn't there be just as many failures?
Rather than answering the explicit question, I think a better answer is to use mock objects. In general, unit tests should not require access to external databases (as is presumably required to log in). However, if you want to have some integration tests that do this (which is a good idea), then those tests should test the integration aspects, and your unit tests should test the unit aspects. I would go so far as to keep the integration tests and the unit tests in separate files, so that you can run the unit tests very regularly and the integration tests slightly less regularly (although still at least once a day).
This problem indicates that foo is doing too much. You need to separate concerns; then testing will become easy. If you had a Registrator, a LoginHandler and a DoSomething class, plus a central controller class that orchestrated the workflow, then everything could be tested separately.
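A rough sketch of that separation; Workflow is a made-up name for the central controller, and the collaborators are plain mocks, so a broken registration backend cannot cascade into this test:

import unittest
from unittest import mock


class Workflow:
    """Hypothetical controller that orchestrates the separated concerns."""

    def __init__(self, registrator, login_handler, do_something):
        self.registrator = registrator
        self.login_handler = login_handler
        self.do_something = do_something

    def run(self):
        account_name = self.registrator.register()
        session = self.login_handler.login(account_name)
        return self.do_something.execute(session)


class TestWorkflow(unittest.TestCase):
    def test_run_wires_the_steps_together(self):
        # Each collaborator is mocked, so this test cannot fail just because
        # registration (or any backend it needs) is broken.
        registrator = mock.Mock()
        login_handler = mock.Mock()
        do_something = mock.Mock()
        registrator.register.return_value = "alice"

        Workflow(registrator, login_handler, do_something).run()

        login_handler.login.assert_called_once_with("alice")
        do_something.execute.assert_called_once_with(login_handler.login.return_value)


if __name__ == "__main__":
    unittest.main()

Registrator, LoginHandler and DoSomething would each get their own focused unit tests, and a separate integration test would exercise the real wiring end to end.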