So I'm trying to decide how to plan and organize a testing suite for my Python project, but I'm unsure about when a unit test stops being a unit test, and I would love some feedback from the community.
If I understand correctly:
A unit test tests a minimal part of your code, e.g. a function/method that does one and only one simple thing, even if it has several use cases.
An integration test tests that two or more units of your code, executed under the same context, environment, etc. (while trying to keep the number of units per integration test to a minimum), work well together and not only by themselves.
My doubt is: say I have a simple function that performs an HTTP request and returns the content of the response, be it HTML, JSON, etc. (it doesn't matter); the point is that the function is very, very simple but requests information from an external source, like:
import requests

def my_function(arg):
    # do something very simple with `arg`, like removing spaces,
    # or the simplest thing you can imagine
    return requests.get('http://www.google.com/' + arg).content
Now this is a very stupid example, but my doubt is:
Given that this function is requesting information from an external source, when you write a test for it, can you still consider such test a Unit test?
UPDATE: The test for my_function() would stub out calls to the external source so that it doesn't depend on network/db/filesystem/etc and is isolated. But the fact remains that the function being tested depends on external sources when running, for example, in production.
Thanks in advance!! :)
P.S.: Of course, maybe I'm not understanding 100% the purposes of unit and integration testing, so if I'm mistaken, please point out where; I'll appreciate it.
Based on your update:
The test for my_function() would stub out calls to the external source so that it doesn't depend on network/db/filesystem/etc and is isolated. But the fact remains that the function being tested depends on external sources when running, for example, in production.
As long as the external dependencies are stubbed out during your test then yes you can call it a Unit Test. It's a Unit Test based on how the unit under test behaves in your test suite rather than how the unit behaves in production.
Based on your original question:
Given that this function is requesting information from an external source, when you write a test for it, can you still consider such test a Unit test?
No, any test for code that touches or depends on things external to the unit under test is an integration test. This includes any web, file system, or database requests.
If your code is not 100% isolated from its dependencies and not 100% reproducible without other components, then it is an integration test.
For your example code to be properly unit tested you would need to mock out the call to google.com.
With the code calling google.com, your test would fail if Google went down or you lost connection to the internet (i.e. the test is not 100% isolated). Your test would also fail if the behavior of Google changed (i.e. the test is not 100% reproducible).
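As a rough illustration of mocking out that call, here is a minimal sketch, assuming Python 3's unittest.mock and that my_function lives in a module named my_module (both assumptions, not part of the question):

# test_my_function.py -- a sketch, not a definitive implementation
from unittest import mock

import my_module  # hypothetical module containing my_function

def test_my_function_returns_content():
    fake_response = mock.Mock()
    fake_response.content = b'<html>stubbed</html>'

    # patch requests.get as seen from my_module, so no real HTTP request happens
    with mock.patch('my_module.requests.get', return_value=fake_response) as fake_get:
        result = my_module.my_function('some-arg')

    fake_get.assert_called_once()
    assert result == b'<html>stubbed</html>'

With the network call stubbed out like this, the test exercises only the unit's own logic, which is what keeps it a unit test in the sense described above.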
I don't think this is an integration test, since it doesn't make use of different parts of your application. The function really does only one thing, and tests for it can be called unit tests.
On the other hand, this particular function has an external dependency, and you don't want to depend on the network in your tests. This is where mocking really helps a lot.
In other words, isolate the function and make unit tests for it.
Also, make integration tests that would use a more high-level approach and would test parts of your application which call my_function().
Whether or not your code is tested in isolation is not the criterion that decides whether your test is a unit test. For example, you would (except in very rare cases) not stub standard library functions like sin(x). But if you don't stub sin(x), that does not make your test an integration test.
When is a test an integration test? A test is an integration test if your goal with the test is to find bugs on integration level. That means, with an integration test you want to find out whether the interaction between two (or more) components is based on the same assumptions on both (all) sides.
Mocking, however, is an orthogonal technique that can be combined with almost all kinds of tests. (In an integration test, however, you cannot mock the partners of the interaction you want to test - but you can mock additional components.)
Since mocking typically causes some effort, it must bring a benefit, like:
significantly speeding up your tests
testing although some software part is not ready yet or buggy
testing exceptional cases that are hard to set up in an integrated software
getting rid of nondeterministic behaviour like timing or randomness
...
If, however, mocking does not solve a real problem, you might be better off using the depended-on component directly.
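As a small illustration of the "nondeterministic behaviour" bullet above, here is a sketch (the function and names are made up for the example) that pins down random.choice so the test result does not vary between runs:

import random
from unittest import mock

def pick_winner(entries):
    # the depended-on component: nondeterministic by design
    return random.choice(entries)

def test_pick_winner_is_deterministic_under_mock():
    # replace random.choice for the duration of the test
    with mock.patch('random.choice', return_value='alice'):
        assert pick_winner(['alice', 'bob', 'carol']) == 'alice'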
Related
I have a static library I created from C++, and I would like to test it using some driver code.
I noticed that one of my professors likes to do his tests using Python, but he simply executes the program (not a library in this case, but an executable) using random test arguments.
I would like to take this approach, but I realized that this is a library and doesn't have a main function; that would mean I should either create a Driver.cpp, or wrap the library into Python using SWIG or Boost.Python.
I'm planning to do the latter because it seems more fun, but logically, I feel that there are going to be more bugs when trying to wrap a library in a different language just to test it, rather than testing it in its native language.
Is testing programs in a different language an accepted practice in the real world, or is this bad practice?
I'd say that it's best to test the API that your users will be exposed to. Other tests are good to have as well, but that's the most important aspect.
If your users are going to write C/C++ code linking to your library, then it would be good to have tests making use of your library the same way.
If you are going to ship a Python wrapper (why not?) then you should have Python tests.
Of course, there is a convenience aspect to this, as well. It may be easier to write tests in Python, and you might have time constraints that make it more appealing, etc.
I guess what I'm saying is: There's nothing inherently wrong with tests being in a different language from the code under test (that's totally normal for testing a REST API, for instance), but make sure you have tests for the public-facing API at a minimum.
Aside, on terminology:
I don't think the types of tests you are describing are "unit tests" in the usual sense of the term. Probably "functional test" would be more accurate.
A unit test typically tests a very small component - such as a function call - that might be one piece of larger functionality. Unit tests like these are often "white box" tests, so you can see the inner workings of your code.
Testing something from a user's point-of-view (such as your professor's commandline tests) are "black box" tests, and in these examples are at a more functional level rather than "unit" level.
I'm sure plenty of people may disagree with that, though - it's not a rigidly-defined set of terms.
A few things to keep in mind:
If you are writing tests as you code, then, by all means, use whatever language works best to give you rapid feedback. This enables fast test-code cycles (and is fun as well). BUT.
Always have well-written tests in the language of the consumer. How is your client/consumer going to call your functions? What language will they be using? Using the same language minimizes integration issues later on in the life-cycle.
It really depends on what it is you are trying to test. It almost always makes sense to write unit tests in the same language as the code you are testing so that you can construct the objects under test or invoke the functions under test, both of which can be most easily done in the same language, and verify that they work correctly. There are, however, cases in which it makes sense to use a different language, namely:
Integration tests that run a number of different components or applications together.
Tests that verify compilation or interpretation failures, which could not be tested in the language itself, since you are validating that an error occurs at the language level.
An example of #1 might be a program that starts up multiple different servers connected to each other, issues requests to the server, and verifies those responses. Or, as a simpler example, a program that simply forks an application under test as a subprocess and verifies that it produces the expected outputs for a given input.
An example of #2 might be a program that verifies that a certain piece of C++ code will produce a static assertion failure or that a particular template instantiation which is intentionally disallowed will result in a compilation failure if someone attempts to use it.
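As a rough sketch of the "fork a subprocess and verify its output" idea from the example for #1 (the executable name, arguments, and expected output below are placeholders, not anything from the answer):

import subprocess

def test_app_produces_expected_output():
    # run the hypothetical application under test as a subprocess
    result = subprocess.run(
        ['./my_app', '--input', 'sample.txt'],
        capture_output=True, text=True, check=True,
    )
    assert 'expected output' in result.stdout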
To answer your larger question, it is not bad practice per se to write tests in a different language. Whatever makes the tests more convenient to write, easier to understand, more robust to changes in implementation, more sensitive to regressions, or better on any of the other properties that define good testing is a good justification to write the tests one way vs. another. If that means writing the tests in another language, then go for it. That being said, small unit tests typically need to be able to invoke the item under test directly, which, in most cases, means writing the unit tests in the same language as the component under test.
I would say it depends on what you're actually trying to test. For true unit testing, it is, I think, best to test in the same language, or at least a binary-compatible language (e.g. testing Java with Groovy -- I use Spock in this case, which is Groovy-based, to unit-test my Java code, since I can intermingle the Java with the Groovy), but if you are testing results, then I think it's fair to switch languages.
For example, I have tested the expected results for a specific set of data when running a Perl application, via nose in Python. This works because I'm not unit testing the Perl code, per se, but the outcomes of that Perl code.
In that case, to unit test actual Perl functions that are part of the application, I would use a Perl-based test framework such as Test::More.
Why not? It's an awesome idea, because it forces you to test the unit like a black box.
Of course there may be technical issues involved: what if you need to mock some parts of the unit under test? That may be difficult in a different language.
This is a common practice for integration tests, though. I've seen lots of programs driven from external tools, such as a website from Selenium, or an application from Cucumber. Both of those can be considered the same as a custom Python script.
If you consider the difference between integration testing and unit testing is the number of things under test at any given time, the only reason why you shouldn't do this is tool support.
Example
Let's say you have a hypothetical API like this:
import foo
account_name = foo.register()
session = foo.login(account_name)
session.do_something()
The key point being that in order to do_something(), you need to be registered and logged in.
Here's an over-simplified, first-pass, suite of unit tests one might write:
# test_foo.py
import foo

def test_registration_should_succeed():
    foo.register()

def test_login_should_succeed():
    account_name = foo.register()
    foo.login(account_name)

def test_do_something_should_succeed():
    account_name = foo.register()
    session = foo.login(account_name)
    session.do_something()
The Problem
When registration fails, all the tests fail, and that makes it non-obvious where
the real problem is. It looks like everything's broken, but really only one crucial thing is broken. It's hard to find that one crucial thing unless you are familiar with all the tests.
The Question
How do you structure your unit tests so that subsequent tests aren't executed when core functionality on which they depend fails?
Ideas
Here are possible solutions I've thought of.
Manually detect failures in each test and raise SkipTest (sketched after this list). - Works, but a lot of manual, error-prone work.
Leverage generators so that subsequent tests are not yielded when the primary ones fail. - Not sure if this would actually work (because how would I "know" that the previously yielded test failed?).
Group tests into test classes. E.g., these are all the unit tests that require you to be logged in. - Not sure this actually addresses my issue. Wouldn't there be just as many failures?
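For idea 1, here is a minimal sketch of what that manual approach could look like, assuming unittest's SkipTest (Python 2.7+) and reusing the hypothetical foo API from above -- which also shows why it is manual and error-prone:

import unittest
import foo  # the hypothetical API from the example above

_registration_works = None

def _require_registration():
    # probe registration once per run; skip dependent tests if it is broken
    global _registration_works
    if _registration_works is None:
        try:
            foo.register()
            _registration_works = True
        except Exception:
            _registration_works = False
    if not _registration_works:
        raise unittest.SkipTest('registration is broken; skipping dependent test')

def test_login_should_succeed():
    _require_registration()
    account_name = foo.register()
    foo.login(account_name)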
Rather than answering the explicit question, I think a better answer is to use mock objects. In general, unit tests should not require access to external databases (as is presumably required to log in). However, if you want to have some integration tests that do this (which is a good idea), then those tests should test the integration aspects, and your unit tests should test the unit aspects. I would go so far as to keep the integration tests and the unit tests in separate files, so that you can quickly run the unit tests on a very regular basis, and run the integration tests on a slightly less regular basis (although still at least once a day).
This problem indicates that foo is doing too much. You need to separate concerns. Then testing will become easy. If you had a Registrator, a LoginHandler and a DoSomething class, plus a central controller class that orchestrated the workflow, then everything could be tested separately.
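A rough sketch of that separation (the class and method names below are illustrative, not taken from the question):

class Registrator:
    def register(self):
        ...  # create and return an account name

class LoginHandler:
    def login(self, account_name):
        ...  # return a session for the account

class DoSomething:
    def run(self, session):
        ...  # the actual feature under test

class Controller:
    """Orchestrates the workflow; each collaborator can be unit tested on its own."""

    def __init__(self, registrator, login_handler, do_something):
        self.registrator = registrator
        self.login_handler = login_handler
        self.do_something = do_something

    def execute(self):
        account_name = self.registrator.register()
        session = self.login_handler.login(account_name)
        return self.do_something.run(session)

In the Controller's unit test you can then pass in mocks for all three collaborators, so a broken registration no longer cascades through the whole suite.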
I'm currently writing a set of unit tests for a Python microblogging library, and following advice received here have begun to use mock objects to return data as if from the service (identi.ca in this case).
However, surely by mocking httplib2 - the module I am using to request data - I am tying the unit tests to a specific implementation of my library, and removing their ability to keep functioning after refactoring (which is obviously one of the primary benefits of unit testing in the first place).
Is there a best of both worlds scenario? The only one I can think of is to set up a microblogging server to use only for testing, but this would clearly be a large amount of work.
You are right that if you refactor your library to use something other than httplib2, then your unit tests will break. That isn't such a horrible dependency, since when that time comes it will be a simple matter to change your tests to mock out the new library.
If you want to avoid that, then write a very minimal wrapper around httplib2, and your tests can mock that. Then if you ever shift away from httplib2, you only have to change your wrapper. But notice the number of lines you have to change is the same either way, all that changes is whether they are in "test code" or "non-test code".
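A minimal sketch of such a wrapper (the class name is a placeholder; the call itself follows httplib2's request() API):

import httplib2

class HttpClient:
    """Thin seam around httplib2 so the rest of the library never imports it directly."""

    def __init__(self):
        self._http = httplib2.Http()

    def get(self, url):
        # httplib2 returns a (response, content) pair
        response, content = self._http.request(url, 'GET')
        return response, content

Your unit tests then mock HttpClient (your own interface) rather than httplib2, and a later switch to another HTTP library only touches this one class.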
Not sure what your problem is. The mock class is part of the tests, conceptually at least. It is ok for the tests to depend on particular behaviour of the mock objects that they inject into the code being tested. Of course the injection itself should be shared across unit tests, so that it is easy to change the mockup implementation.
I'm wondering if there is a test framework that allows for tests to be declared as being dependent on other tests. This would imply that they should not be run, or that their results should not be prominently displayed, if the tests that they depend on do not pass.
The point of such a setup would be to allow the root cause to be more readily determined in a situation where there are many test failures.
As a bonus, it would be great if there some way to use an object created with one test as a fixture for other tests.
Is this feature set provided by any of the Python testing frameworks? Or would such an approach be antithetical to unit testing's underlying philosophy?
Or would such an approach be antithetical to unit testing's underlying philosophy?
Yep... if it is a unit test, it should be able to run on its own. Any time I have found someone wanting to create dependencies between tests, it was due to the code being structured in a poor manner. I am not saying this is the case here, but it can often be a sign of a code smell.
Proboscis is a Python test framework that extends Python’s built-in unittest module and Nose with features from TestNG.
Sounds like what you're looking for. Note that it works a bit differently to unittest and Nose, but that page explains how it works pretty well.
This seems to be a recurring question - e.g. #3396055
It most probably isn't a unit-test, because they should be fast (and independent). So running them all isn't a big drag. I can see where this might help in short-circuiting integration/regression runs to save time. If this is a major need for you, I'd tag the setup tests with [Core] or some such attribute.
I then proceed to write a build script which has two tasks:
Task N: run all tests in the X, Y, Z dlls marked with the tag [Core]
Task N+1 (depends on Task N): run all tests in the X, Y, Z dlls excluding those marked with the tag [Core]
(Task N+1 shouldn't run if Task N didn't succeed.) It isn't a perfect solution - e.g. it would just bail out if any one [Core] test failed. But I guess you should be fixing the Core ones instead of proceeding with the non-Core tests.
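The answer describes this generically in build-script terms; in a Python suite, one way to approximate the [Core] tag could be pytest markers (this is my assumed equivalent, not something the answer specifies):

import pytest

@pytest.mark.core          # the "core" marker name is illustrative
def test_registration_backend_is_reachable():
    ...  # a [Core]-style setup test

def test_reporting_feature():
    ...  # an ordinary, non-core test

A build script can then run pytest -m core first and only continue with pytest -m "not core" if that run succeeds (registering the marker in pytest.ini avoids warnings).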
It looks like what you need is not to prevent the execution of your dependent tests but to report the results of your unit test in a more structured way that allows you to identify when an error in a test cascades onto other failed tests.
The test runners py.test, Nosetests and unit2/unittest2 all support the notion of "exiting after the first failure". py.test more generally allows you to specify "--maxfail=NUM" to stop running and reporting after NUM failures. This may already help your case, especially since maintaining and updating dependencies between tests may not be that interesting a task.
I manage the testing for a very large financial pricing system. Recently our HQ have insisted that we verify that every single part of our project has a meaningful test in place. At the very least they want a system which guarantees that when we change something we can spot unintentional changes to other sub-systems. Preferably they want something which validates the correctness of every component in our system.
That's obviously going to be quite a lot of work! It could take years, but for this kind of project it's worth it.
I need to find out which parts of our code are not covered by any of our unit-tests. If I knew which parts of my system were untested then I could set about developing new tests which would eventually approach towards my goal of complete test-coverage.
So how can I go about running this kind of analysis? What tools are available to me?
I use Python 2.4 on Windows 32bit XP
UPDATE0:
Just to clarify: We have a very comprehensive unit-test suite (plus a separate and very comprehensive regtest suite which is outside the scope of this exercise). We also have a very stable continuous integration platform (built with Hudson) which is designed to split up and run standard Python unit tests across our test facility: approx. 20 PCs built to the company spec.
The object of this exercise is to plug any gaps in our Python unittest suite (only), so that every component has some degree of unittest coverage. Other developers will be taking responsibility for non-Python components of the project (which are also out of scope).
"Component" is intentionally vague: Sometime it will be a class, other time an entire module or assembly of modules. It might even refer to a single financial concept (e.g. a single type of financial option or a financial model used by many types of option). This cake can be cut in many ways.
"Meaningful" tests (to me) are ones which validate that the function does what the developer originally intended. We do not want to simply reproduce the regtests in pure python. Often the developer's intent is not immediatly obvious, hence the need to research and clarify anything which looks vague and then enshrine this knowledge in a unit-test which makes the original intent quite explicit.
For the code coverage alone, you could use coverage.py.
As for coverage.py vs figleaf:
figleaf differs from the gold standard of Python coverage tools ('coverage.py') in several ways. First and foremost, figleaf uses the same criterion for "interesting" lines of code as the sys.settrace function, which obviates some of the complexity in coverage.py (but does mean that your "loc" count goes down). Second, figleaf does not record code executed in the Python standard library, which results in a significant speedup. And third, the format in which the coverage information is saved is very simple and easy to work with.
You might want to use figleaf if you're recording coverage from multiple types of tests and need to aggregate the coverage in interesting ways, and/or control when coverage is recorded. coverage.py is a better choice for command-line execution, and its reporting is a fair bit nicer.
I guess both have their pros and cons.
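If it helps, here is a rough sketch of driving coverage.py programmatically around a test run; the API shown is from recent coverage releases (older versions spell the class coverage.coverage), so treat the exact names as an assumption:

import unittest
import coverage

cov = coverage.Coverage()
cov.start()

# run the test suite while coverage is recording
suite = unittest.defaultTestLoader.discover('tests')
unittest.TextTestRunner().run(suite)

cov.stop()
cov.save()
cov.report(show_missing=True)   # lines never executed show up in the "Missing" column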
The first step would be writing meaningful tests. If you write tests only meant to reach full coverage, you'll be counter-productive; it will probably mean you focus on the unit's implementation details instead of its expectations.
BTW, I'd use nose as the unittest framework (http://somethingaboutorange.com/mrl/projects/nose/0.11.1/); its plugin system is very good and leaves the coverage option to you (--with-coverage for Ned's coverage, --with-figleaf for Titus's figleaf; support for coverage3 should be coming), and you can write plugins for your own build system, too.
FWIW, this is what we do. Since I don't know about your Unit-Test and Regression-Test setup, you have to decide yourself whether this is helpful.
Every Python package has unit tests.
We automatically detect unit tests using nose. Nose automagically detects standard Python unit tests (basically everything that looks like a test). Thereby we don't miss unit-tests. Nose also has a plug-in concept so that you can produce, e.g. nice output.
We strive for 100% coverage for unit testing. To this end, we use Coverage to check, because a nose plugin provides integration.
We have set up Eclipse (our IDE) to automatically run nose whenever a file changes so that the unit-tests always get executed, which shows code-coverage as a by-product.
"every single part of our project has a meaningful test in place"
"Part" is undefined. "Meaningful" is undefined. That's okay, however, since it gets better further on.
"validates the correctness of every component in our system"
"Component" is undefined. But correctness is defined, and we can assign a number of alternatives to component. You only mention Python, so I'll assume the entire project is pure Python.
Validates the correctness of every module.
Validates the correctness of every class of every module.
Validates the correctness of every method of every class of every module.
You haven't asked about line of code coverage or logic path coverage, which is a good thing. That way lies madness.
"guarantees that when we change something we can spot unintentional changes to other sub-systems"
This is regression testing. That's a logical consequence of any unit testing discipline.
Here's what you can do.
Enumerate every module. Create a unittest for that module that is just a unittest.main(). This should be quick -- a few days at most.
Write a nice top-level unittest script that uses a testLoader to load all unit tests in your tests directory and runs them through the text runner. At this point, you'll have a lot of files -- one per module -- but no actual test cases. Getting the testLoader and the top-level script to work will take a few days. It's important to have this overall harness working.
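A minimal sketch of that top-level script; note that TestLoader.discover() only appeared in Python 2.7, so on the 2.4 setup mentioned in the question you would have to collect the per-module test files yourself:

# run_all_tests.py -- top-level harness, names are illustrative
import sys
import unittest

if __name__ == '__main__':
    loader = unittest.TestLoader()
    suite = loader.discover('tests', pattern='test_*.py')
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    sys.exit(0 if result.wasSuccessful() else 1)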
Prioritize your modules. A good rule is "most heavily reused". Another rule is "highest risk from failure". Another rule is "most bugs reported". This takes a few hours.
Start at the top of the list. Write a TestCase per class with no real methods or anything. Just a framework. This takes a few days at most. Be sure the docstring for each TestCase positively identifies the Module and Class under test and the status of the test code. You can use these docstrings to determine test coverage.
At this point you'll have two parallel tracks. You have to actually design and implement the tests. Depending on the class under test, you may have to build test databases, mock objects, all kinds of supporting material.
Testing Rework. Starting with your highest priority untested module, start filling in the TestCases for each class in each module.
New Development. For every code change, a unittest.TestCase must be created for the class being changed.
The test code follows the same rules as any other code. Everything is checked in at the end of the day. It has to run -- even if the tests don't all pass.
Give the test script to the product manager (not the QA manager, the actual product manager who is responsible for shipping product to customers) and make sure they run the script every day and find out why it didn't run or why tests are failing.
The actual running of the master test script is not a QA job -- it's everyone's job. Every manager at every level of the organization has to be part of the daily build script output. All of their jobs have to depend on "all tests passed last night". Otherwise, the product manager will simply pull resources away from testing and you'll have nothing.
Assuming you already have a relatively comprehensive test suite, there are tools for the python part. The C part is much more problematic, depending on tools availability.
For Python unit tests, the coverage tools discussed above (coverage.py or figleaf) cover that side.
For C code, it is difficult on many platforms because gprof, the GNU profiler, cannot handle code built with -fPIC. So you have to build every extension statically in this case, which is not supported by many extensions (see my blog post about numpy, for example). On Windows, there may be better code-coverage tools for compiled code, but that may require you to recompile the extensions with MS compilers.
As for the "right" code coverage, I think a good balance it to avoid writing complicated unit tests as much as possible. If a unit test is more complicated than the thing it tests, then it is a probably not a good test, or a broken test.