What to test when using TDD and asserts

What to test when using TDD and asserts - python

First, I'm a newbie at both TDD and Python. I don't code for a living, but I do write a ton of code.
I'm in the process of moving a large project from Matlab to Python. 1) because the language has been limiting, and 2) when analyzing edge cases and debugging, I was starting to break as much as I was fixing. So I decided I'll start from scratch, with TDD this time.
I get the TDD cycle (red, green, refractor). The question is, -what- does one test? For an individual unit/function, how many tests do you write? In this case, I already have the whole project in my head, even though there will be a few library, and structural, changes between the languages.
Also, I've heard ad nauseam to use asserts everywhere, so I am. But sometimes it seems I'm writing tests to verify my asserts. And it seems like a waste of time to write a valid input type test for every argument, one function at a time.

You've ran into the very common realisation across programmers who try to embrace testing (good for you!) - that unit testing, for most part, sucks. Or at the very least is a wrong tool for what you are trying to do.
And this is why I recommend for you to drop the heresy of unit testing and embrace the elegance and beauty of behaviour testing. There even is a fantastic library for it: behave.
This not only allows you to re-use your code, but also forces you to describe logic of your application in plain English (which, when done right, is equivalent to proper documentation) which helps to narrow logic holes and other design non-senses that may arise. It also removes the problem of "what to test", as with BDD you should test every way the application can behave.
In your specific case you will definitely also be interested in the scenario outline functionality, where you can write the testing code once, and then just write long list of examples to run by this code.

Related

Is it acceptable practice to unit-test a program in a different language?

I have a static library I created from C++, and would like to test this using a Driver code.
I noticed one of my professors like to do his tests using python, but he simply executes the program (not a library in this case, but an executable) using random test arguments.
I would like to take this approach, but I realized that this is a library and doesn't have a main function; that would mean I should either create a Driver.cpp class, or wrap the library into python using SWIG or boost python.
I’m planning to do the latter because it seems more fun, but logically, I feel that there is going to be more bugs when trying to wrap a library to a different language just to test it, rather than test it in its native language.
Is testing programs in a different language an accepted practice in the real world, or is this bad practice?

I'd say that it's best to test the API that your users will be exposed to. Other tests are good to have as well, but that's the most important aspect.
If your users are going to write C/C++ code linking to your library, then it would be good to have tests making use of your library the same way.
If you are going to ship a Python wrapper (why not?) then you should have Python tests.
Of course, there is a convenience aspect to this, as well. It may be easier to write tests in Python, and you might have time constraints that make it more appealing, etc.
I guess what I'm saying is: There's nothing inherently wrong with tests being in a different language from the code under test (that's totally normal for testing a REST API, for instance), but make sure you have tests for the public-facing API at a minimum.
Aside, on terminology:
I don't think the types of tests you are describing are "unit tests" in the usual sense of the term. Probably "functional test" would be more accurate.
A unit test typically tests a very small component - such as a function call - that might be one piece of larger functionality. Unit tests like these are often "white box" tests, so you can see the inner workings of your code.
Testing something from a user's point-of-view (such as your professor's commandline tests) are "black box" tests, and in these examples are at a more functional level rather than "unit" level.
I'm sure plenty of people may disagree with that, though - it's not a rigidly-defined set of terms.

A few things to keep in mind:
If you are writing tests as you code, then, by all means, use whatever language works best to give you rapid feedback. This enables fast test-code cycles (and is fun as well). BUT.
Always have well-written tests in the language of the consumer. How is your client/consumer going to call your functions? What language will they be using? Using the same language minimizes integration issues later on in the life-cycle.

It really depends on what it is you are trying to test. It almost always makes sense to write unit tests in the same language as the code you are testing so that you can construct the objects under test or invoke the functions under test, both of which can be most easily done in the same language, and verify that they work correctly. There are, however, cases in which it makes sense to use a different language, namely:
Integration tests that run a number of different components or applications together.
Tests that verify compilation or interpretation failures which could not be tested in the language, itself, since you are validating that an error occurs at the language level.
An example of #1 might be a program that starts up multiple different servers connected to each other, issues requests to the server, and verifies those responses. Or, as a simpler example, a program that simply forks an application under test as a subprocess and verifies that it produces the expected outputs for a given input.
An example of #2 might be a program that verifies that a certain piece of C++ code will produce a static assertion failure or that a particular template instantiation which is intentionally disallowed will result in a compilation failure if someone attempts to use it.
To answer your larger question, it is not bad practice per-se to write tests in a different language. Whatever makes the tests more convenient to write, easier to understand, more robust to changes in implementation, more sensitive to regressions, and better on any one of the properties that define good testing would be a good justification to write the tests one way vs another. If that means writing the tests in another language, then go for it. That being said, small unit tests typically need to be able to invoke the item under test directly which, in most cases, means writing the unit tests in the same language as the component under test.

I would say it depends on what you're actually trying to test. For true unit testing, it is, I think, best to test in the same language, or at least a binary-compatible language (i.e. testing Java with Groovy -- I use Spock in this case, which is Groovy based, to unit-test my Java code, since I can intermingle the Java with the Groovy), but if you are testing results, then I think it's fair to switch languages.
For example, I have tested the expected results when given a specific set of a data when running a Perl application via nose in Python. This works because I'm not unit testing the Perl code, per se, but the outcomes of that Perl code.
In that case, to unit test actual Perl functions that are part of the application, I would use a Perl-based test framework such as Test::More.

Why not, it's an awesome idea because you really understand that you are testing the unit like a black box.
Of course there may be technical issues involved, what if you need to mock some parts of the unit under test, that may be difficult in a different language.
This is a common practice for integration tests though, I've seen lots of programs driven from external tools such as a website from selenium, or an application from cucumber. Both those can be considered the same as a custom python script.
If you consider the difference between integration testing and unit testing is the number of things under test at any given time, the only reason why you shouldn't do this is tool support.

How do i test/refactor my tests?

I have a test suit for my app. As the test suit grew organically, the tests have a lot of repeated code which can be refactored.
However I would like to ensure that the test suite doesn't change with the refactor. How can test that my tests are invariant with the refactor.
(I am using Python+UnitTest), but I guess the answer to this can be language agnostic.

The real test for the tests is the production code.
An effective way to check that a test code refactor hasn't broken your tests would be to do Mutation Testing, in which a copy of the code under test is mutated to introduce errors in order to verify that your tests catch the errors. This is a tactic used by some test coverage tools.
I haven't used it (and I'm not really a python coder), but this seems to be supported by the Python Mutant Tester, so that might be worth looking at.

Coverage.py is your friend.
Move over all the tests you want to refactor into "system tests" (or some such tag). Refactor the tests you want (you would be doing unit tests here right?) and monitor the coverage:
After running your new unit tests but before running the system tests
After running both the new unit tests and the system tests.
In an ideal case, the coverage would be same or higher but you can thrash your old system tests.
FWIW, py.test provides mechanism for easily tagging tests and running only the specific tests and is compatible with unittest2 tests.

Interesting question - I'm always keen to hear discussions of the type "how do I test the tests?!". And good points from #marksweb above too.
It's always a challenge to check your tests are actually doing what you want them to do and testing what you intend, but good to get this right and do it properly. I always try to consider the rule-of-thumb that testing should make up 1/3 of development effort in any project... regardless of project time constraints, pressures and problems that inevitably crop up.
If you intend to continue and grow your project have you considered refactoring like you say, but in a way that creates a proper test framework that allows test driven development (TDD) of any future additions of functionality or general expansion of the project?

Although you have mentioned Python, I would like to comment how refactoring is applied in Smalltalk. Most modern Smalltalk implementations include a "Refactoring Browser" integrated in a System Browser to restructure source code. The RB includes a rewrite framework to perform dynamically the transformations you asked about preserving the system behavior and stability. A way to use it is to open a scoped browser, apply refactorings and review/edit changes before commiting through a diff tool. I don't know about maturity of Python refactoring tools, but it took many iteration cycles (years) for the Smalltalk community to have such an amazing piece of software.
Don Roberts and John Brant wrote one of the first refactoring browser tools which now serves as the standard for refactoring tools. There are some videos and here demonstrating some of these features. For promoting a method into a superclass, in Pharo you just select the method, refactor and "pull up" menu item. The rule will detect and let you review the proposed duplicated sub-implementors for deletion before its execution. Application of refactorings are regardless of Testing code.

In theory you could write a test for the test, mocking the actualy object under test.But I guess that is just way to much work and not worth it.
So what you are left with are some strategies, that will help, but not make this fail safe.
Work very carefully and slowly. Use the features of you IDEs as much as possible in order to limit the chance of human error.
Work in pairs. A partner looking over your shoulder might just spot the glitch that you missed.
Copy the test, then refactor it. When done introduce errors in the production code to ensure, both tests find the the problem in the same (or equivalent) ways. Only then remove the original test.
The last step can be done by tools, although I don't know the python flavors. The keyword to search for is 'mutation testing'.
Having said all that, I'm personally satisfied with steps 1+2.

I can't see an easy way to refactor a test suite, and depending on the extent of your refactor you're obviously going to have to change the test suite. How big is your test suite?
Refactoring properly takes time and attention to detail (and a lot of Ctrl+C Ctrl+V!). Whenever I've refactored my tests I don't try and find any quick ways of doing things, besides find & replace, because there is too much risk involved.
You're best of doing things properly and manually albeit slowly if you want to make keep the quality of your tests.

Don't refactor the test suite.
The purpose of refactoring is to make it easier to maintain the code, not to satisfy some abstract criterion of "code niceness". Test code doesn't need to be nice, it doesn't need to avoid repetition, but it does need to be thorough. Once you have a test that is valid (i.e. it really does test necessary conditions on the code under test), you should never remove it or change it, so test code doesn't need to be easy to maintain en masse.
If you like, you can rewrite the existing tests to be nice, and run the new tests in addition to the old ones. This guarantees that the new combined test suite catches all the errors that the old one did (and maybe some more, as you expand the new code in future).
There are two ways that a test can be deemed invalid -- you realise that it's wrong (i.e. it sometimes fails falsely for correct code under test), or else the interface under test has changed (to remove the API tested, or to permit behaviour that previously was a test failure). In that case you can remove a test from the suite. If you realise that a whole bunch of tests are wrong (because they contain duplicated code that is wrong), then you can remove them all and replace them with a refactored and corrected version. You don't remove tests just because you don't like the style of their source.
To answer your specific question: to test that your new test code is equivalent to the old code, you would have to ensure (a) all the new tests pass on your currently-correct-as-far-as-you-known code base, which is easy, but also (b) the new tests detect all the errors that the old tests detect, which is usually not possible because you don't have on hand a suite of faulty implementations of the code under test.

Test code can be the best low level documentation of your API since they do not outdate as long as they pass and are correct. But messy test code doesn't serve that purpose very well. So refactoring is essential.
Also might your tested code change over time. So do the tests. If you want that to be smooth, code duplication must be minimized and readability is a key.
Tests should be easy to read and always test one thing at once and make the follwing explicit:
what are the preconditions?
what is being executed?
what is the expected outcome?
If that is considered, it should be pretty safe to refactor the test code. One step at a time and, as #Don Ruby mentioned, let your production code be the test for the test.
For many refactoring you can often safely rely on advanced IDE tooling – if you beware of side effects in the extracted code.
Although I agree that refactoring without proper test coverage should be avoided, I think writing tests for your tests is almost absurd in usual contexts.

Using Python code coverage tool for understanding and pruning back source code of a large library

My project targets a low-cost and low-resource embedded device. I am dependent on a relatively large and sprawling Python code base, of which my use of its APIs is quite specific.
I am keen to prune the code of this library back to its bare minimum, by executing my test suite within a coverage tools like Ned Batchelder's coverage or figleaf, then scripting removal of unused code within the various modules/files. This will help not only with understanding the libraries' internals, but also make writing any patches easier. Ned actually refers to the use of coverage tools to "reverse engineer" complex code in one of his online talks.
My question to the SO community is whether people have experience of using coverage tools in this way that they wouldn't mind sharing? What are the pitfalls if any? Is the coverage tool a good choice? Or would I be better off investing my time with figleaf?
The end-game is to be able to automatically generate a new source tree for the library, based on the original tree, but only including the code actually used when I run nosetests.
If anyone has developed a tool that does a similar job for their Python applications and libraries, it would be terrific to get a baseline from which to start development.
Hopefully my description makes sense to readers...

What you want isn't "test coverage", it is the transitive closure of "can call" from the root of the computation. (In threaded applications, you have to include "can fork").
You want to designate some small set (perhaps only 1) of functions that make up the entry points of your application, and want to trace through all possible callees (conditional or unconditional) of that small set. This is the set of functions you must have.
Python makes this very hard in general (IIRC, I'm not a deep Python expert) because of dynamic dispatch and especially due to "eval". Reasoning about what function can get called can be pretty tricky for a static analyzers applied to highly dynamic languages.
One might use test coverage as a way to seed the "can call" relation with specific "did call" facts; that could catch a lot of dynamic dispatches (dependent on your test suite coverage). Then the result you want is the transitive closure of "can or did" call. This can still be erroneous, but is likely to be less so.
Once you get a set of "necessary" functions, the next problem will be removing the unnecessary functions from the source files you have. If the number of files you start with is large, the manual effort to remove the dead stuff may be pretty high. Worse, you're likely to revise your application, and then the answer as to what to keep changes. So for every change (release), you need to reliably recompute this answer.
My company builds a tool that does this analysis for Java packages (with appropriate caveats regarding dynamic loads and reflection): the input is a set of Java files and (as above) a designated set of root functions. The tool computes the call graph, and also finds all dead member variables and produces two outputs: a) the list of purportedly dead methods and members, and b) a revised set of files with all the "dead" stuff removed. If you believe a), then you use b). If you think a) is wrong, then you add elements listed in a) to the set of roots and repeat the analysis until you think a) is right. To do this, you need a static analysis tool that parse Java, compute the call graph, and then revise the code modules to remove the dead entries. The basic idea applies to any language.
You'd need a similar tool for Python, I'd expect.
Maybe you can stick to just dropping files that are completely unused, although that may still be a lot of work.

As others have pointed out, coverage can tell you what code has been executed. The trick for you is to be sure that your test suite truly exercises the code fully. The failure case here is over-pruning because your tests skipped some code that will really be needed in production.
Be sure to get the latest version of coverage.py (v3.4): it adds a new feature to indicate files that are never executed at all.
BTW:: for a first cut prune, Python provides a neat trick: remove all the .pyc files in your source tree, then run your tests. Files that still have no .pyc file were clearly not executed!

I haven't used coverage for pruning out, but it seems like it should do well. I've used the combination of nosetests + coverage, and it worked better for me than figleaf. In particular, I found the html report from nosetests+coverage to be helpful -- this should be helpful to you in understanding where the unused portions of the library are.

TDD - beginner problems and stumbling blocks

While I've written unit tests for most of the code I've done, I only recently got my hands on a copy of TDD by example by Kent Beck. I have always regretted certain design decisions I made since they prevented the application from being 'testable'. I read through the book and while some of it looks alien, I felt that I could manage it and decided to try it out on my current project which is basically a client/server system where the two pieces communicate via. USB. One on the gadget and the other on the host. The application is in Python.
I started off and very soon got entangled in a mess of rewrites and tiny tests which I later figured didn't really test anything. I threw away most of them and and now have a working application for which the tests have all coagulated into just 2.
Based on my experiences, I have a few questions which I'd like to ask. I gained some information from New to TDD: Are there sample applications with tests to show how to do TDD? but have some specific questions which I'd like answers to/discussion on.
Kent Beck uses a list which he adds to and strikes out from to guide the development process. How do you make such a list? I initially had a few items like "server should start up", "server should abort if channel is not available" etc. but they got mixed and finally now, it's just something like "client should be able to connect to server" (which subsumed server startup etc.).
How do you handle rewrites? I initially selected a half duplex system based on named pipes so that I could develop the application logic on my own machine and then later add the USB communication part. It them moved to become a socket based thing and then moved from using raw sockets to using the Python SocketServer module. Each time things changed, I found that I had to rewrite considerable parts of the tests which was annoying. I'd figured that the tests would be a somewhat invariable guide during my development. They just felt like more code to handle.
I needed a client and a server to communicate through the channel to test either side. I could mock one of the sides to test the other but then the whole channel wouldn't be tested and I worry that I'd miss that. This detracted from the whole red/green/refactor rhythm. Is this just lack of experience or am I doing something wrong?
The "Fake it till you make it" left me with a lot of messy code that I later spent a lot of time to refactor and clean up. Is this the way things work?
At the end of the session, I now have my client and server running with around 3 or 4 unit tests. It took me around a week to do it. I think I could have done it in a day if I were using the unit tests after code way. I fail to see the gain.
I'm looking for comments and advice from people who have implemented large non trivial projects completely (or almost completely) using this methodology. It makes sense to me to follow the way after I have something already running and want to add a new feature but doing it from scratch seems to tiresome and not worth the effort.
P.S. : Please let me know if this should be community wiki and I'll mark it like that.
Update 0 : All the answers were equally helpful. I picked the one I did because it resonated with my experiences the most.
Update 1: Practice Practice Practice!

As a preliminary comment, TDD takes practice. When I look back at the tests I wrote when I began TDD, I see lots of issues, just like when I look at code I wrote a few year ago. Keep doing it, and just like you begin to recognize good code from bad, the same things will happen with your tests - with patience.
How do you make such a list? I
initially had a few items like "server
should start up", "server should abort
if channel is not available" etc. but
they got mixed and finally now, it's
just something like "client should be
able to connect to server"
"The list" can be rather informal (that's the case in Beck's book) but when you move into making the items into tests, try to write the statements in a "[When something happens to this] then [this condition should be true on that]" format. This will force you to think more about what it is you are verifying, how you would verify it and translates directly into tests - or if it doesn't it should give you a clue about which piece of functionality is missing. Think use case / scenario. For instance "server should start up" is unclear, because nobody is initiating an action.
Each time things changed, I found that
I had to rewrite considerable parts of
the tests which was annoying. I'd
figured that the tests would be a
somewhat invariable guide during my
development. They just felt like more
code to handle.
First, yes, tests are more code, and requires maintenance - and writing maintainable tests takes practice. I agree with S. Lott, if you need to change your tests a lot, you are probably testing "too deep". Ideally you want to test at the level of the public interface, which is not likely to change, and not at the level of the implementation detail, which could evolve. But part of the exercise is about coming up with a design, so you should expect to get some of it wrong and have to move/refactor your tests as well.
I could mock one of the sides to test
the other but then the whole channel
wouldn't be tested and I worry that
I'd miss that.
Not totally sure about that one. From the sound of it, using a mock was the right idea: take one side, mock the other one, and check that each side works, assuming the other one is implemented properly. Testing the whole system together is integration testing, which you also want to do, but is typically not part of the TDD process.
The "Fake it till you make it" left me
with a lot of messy code that I later
spent a lot of time to refactor and
clean up. Is this the way things work?
You should spend a lot of time refactoring while doing TDD. On the other hand, when you fake it, it's temporary, and your immediate next step should be to un-fake it. Typically you shouldn't have multiple tests passing because you faked it - you should be focusing on one piece at a time, and work on refactoring it ASAP.
I think I could have done it in a day
if I were using the unit tests after
code way. I fail to see the gain.
Again, it takes practice, and you should get faster over time. Also, sometimes TDD is more fruitful than others, I find that in some situations, when I know exactly the code I want to write, it's just faster to write a good part of the code, and then write tests.
Besides Beck, one book I enjoyed is The Art of Unit Testing, by Roy Osherove. It's not a TDD book, and it is .Net-oriented, but you might want to give it a look anyways: a good part is about how to write maintainable tests, tests quality and related questions. I found that the book resonated with my experience after having written tests and sometimes struggled to do it right...
So my advice is, don't throw the towel too fast, and give it some time. You might also want to give it a shot on something easier - testing server communication related things doesn't sound like the easiest project to start with!

Kent Beck uses a list ... finally now, it's just something like "client should be able to connect to server" (which subsumed server startup etc.).
Often a bad practice.
Separate tests for each separate layer of the architecture are good.
Consolidated tests tend to obscure architectural issues.
However, only test the public functions. Not every function.
And don't invest a lot of time optimizing your testing. Redundancy in the tests doesn't hurt as much as it does in the working application. If things change and one test works, but another test breaks, perhaps then you can refactor your tests. Not before.
2. How do you handle rewrites? ... I found that I had to rewrite considerable parts of the tests.
You're testing at too low a level of detail. Test the outermost, public, visible interface. The part that's supposed to be unchanging.
And
Yes, significant architectural change means significant testing change.
And
The test code is how you prove things work. It is almost as important as the application itself. Yes, it's more code. Yes, you must manage it.
3. I needed a client and a server to communicate through the channel to test either side. I could mock one of the sides to test the other but then the whole channel wouldn't be tested ...
There are unit tests. With mocks.
There are integration tests, which test the whole thing.
Don't confuse them.
You can use unit test tools to do integration tests, but they're different things.
And you need to do both.
4. The "Fake it till you make it" left me with a lot of messy code that I later spent a lot of time to refactor and clean up. Is this the way things work?
Yes. That's exactly how it works. In the long run, some people find this more effective than straining their brains trying to do all the design up front. Some people don't like this and want to do all the design up front; you're free to do a lot of design up front if you want to.
I've found that refactoring is a good thing and design up front is too hard. Maybe it's because I've been coding for almost 40 years and my brain is wearing out.
5. I fail to see the gain.
All the true geniuses find that testing slows them down.
The rest of us can't be sure our code works until we have a complete set of tests that prove that it works.
If you don't need proof that your code works, you don't need testing.

Q. Kent Beck uses a list which he adds to and strikes out from to guide the development process. How do you make such a list? I initially had a few items like "server should start up", "server should abort if channel is not available" etc. but they got mixed and finally now, it's just something like "client should be able to connect to server" (which subsumed server startup etc.).
I start by picking anything I might check. In your example, you chose "server starts".
Server starts
Now I look for any simpler test I might want to write. Something with less variation, and fewer moving parts. I might consider "configured server correctly", for example.
Configured server correctly
Server starts
Really, though, "server starts" depends on "configured server correctly", so I make that link clear.
Configured server correctly
Server starts if configured correctly
Now I look for variations. I ask, "What could go wrong?" I could configure the server incorrectly. How many different ways that matter? Each of those makes a test. How might the server not start even though I configured it correctly? Each case of that makes a test.
Q. How do you handle rewrites? I initially selected a half duplex system based on named pipes so that I could develop the application logic on my own machine and then later add the USB communication part. It them moved to become a socket based thing and then moved from using raw sockets to using the Python SocketServer module. Each time things changed, I found that I had to rewrite considerable parts of the tests which was annoying. I'd figured that the tests would be a somewhat invariable guide during my development. They just felt like more code to handle.
When I change behavior, I find it reasonable to change the tests, and even to change them first! If I have to change tests that don't directly check the behavior I'm in the process of changing, though, that's a sign that my tests depend on too many different behaviors. Those are integration tests, which I think are a scam. (Google "Integration tests are a scam")
Q. I needed a client and a server to communicate through the channel to test either side. I could mock one of the sides to test the other but then the whole channel wouldn't be tested and I worry that I'd miss that. This detracted from the whole red/green/refactor rhythm. Is this just lack of experience or am I doing something wrong?
If I build a client, a server, and a channel, then I try to check each in isolation. I start with the client, and when I test-drive it, I decide how the server and channel need to behave. Then I implement the channel and server each to match the behavior I need. When checking the client, I stub the channel; when checking the server, I mock the channel; when checking the channel, I stub and mock both client and server. I hope this makes sense to you, since I have to make some serious assumptions about the nature of this client, server, and channel.
Q. The "Fake it till you make it" left me with a lot of messy code that I later spent a lot of time to refactor and clean up. Is this the way things work?
If you let your "fake it" code get very messy before cleaning it up, then you might have spent too long faking it. That said, I find that even though I end up cleaning up more code with TDD, the overall rhythm feels much better. This comes from practice.
Q. At the end of the session, I now have my client and server running with around 3 or 4 unit tests. It took me around a week to do it. I think I could have done it in a day if I were using the unit tests after code way. I fail to see the gain.
I have to say that unless your client and server are very, very simple, you need more than 3 or 4 tests each to check them thoroughly. I will guess that your tests check (or at least execute) a number of different behaviors at once, and that might account for the effort it took you to write them.
Also, don't measure the learning curve. My first real TDD experience consisted of re-writing 3 months' worth of work in 9, 14-hour days. I had 125 tests that took 12 minutes to run. I had no idea what I was doing, and it felt slow, but it felt steady, and the results were fantastic. I essentially re-wrote in 3 weeks what originally took 3 months to get wrong. If I wrote it now, I could probably do it in 3-5 days. The difference? My test suite would have 500 tests that take 1-2 seconds to run. That came with practice.

As a novice programmer, the thing I found tricky about test-driven development was the idea that testing should come first.
To the novice, that’s not actually true. Design comes first. (Interfaces, objects and classes, methods, whatever’s appropriate to your language.) Then you write your tests to that. Then you write the code that actually does stuff.
It’s been a while since I looked at the book, but Beck seems to write as if the design of the code just sort of happens unconsciously in your head. For experienced programmers, that may be true, but for noobs like me, nuh-uh.
I found the first few chapters of Code Complete really useful for thinking about design. They emphasise the fact that your design may well change, even once you’re down at the nitty gritty level of implementation. When that happens, you may well have to re-write your tests, because they were based on the same assumptions as your design.
Coding is hard. Let’s go shopping.

For point one, see a question I asked a while back relating to your first point.
Rather than handle the other points in turn, I'll offer some global advice. Practice. It took me a good while and a few 'dodgy' projects (personal though) to actual get TDD. Just Google for much more compelling reasons on why TDD is so good.
Despite the tests driving the design of my code, I still get a whiteboard and scribble out some design. From this, at least you have some idea of what you are meant to be doing. Then I produce the list of tests per fixture that I think I need. Once you start working, more features and tests get added to the list.
One thing that stood out from your question is the act of rewriting your tests again. This sounds like you are carrying out behavioural tests, rather than state. In other words, the tests sound too closely tied to your code. Thus, a simple change that doesn't effect the output will break some tests. Unit testing (at least good unit testing) too, is a skill to master.
I recommend the Google Testing Blog quite heavily because some of the articles on there made my testing for TDD projects much better.

The the named pipes were put behind the right interface, changing how that interface is implemented (from named pipes to sockets to another sockets library) should only impact tests for the component that implements that interface. So cutting things up more/differently would have helped... That interface the sockets are behind will likely evolve to.
I started doing TDD maybe 6 months ago? I am still learning myself. I can say over time my tests and code have gotten much better, so keep it up. I really recommend the book XUnit Design Patterns as well.

How do you make such a list to add to
and strike out from to guide the
development process? I initially had a
few items like "server should start
up", "server should abort if channel
is not available"
Items in TDD TODO lists are finer grained than that, they aim at testing one behavior of one method only, for instance:
test successful client connection
test client connection error type 1
test client connection error type 2
test successful client communication
test client communication fails when not connected
You could build a list of tests (positive and negative) for every example you gave. Moreover, when unit testing you do not establish any connection between the server and the client. You just invoke methods in isolation, ... This answers question 3.
How do you handle rewrites?
If the unit test tests behavior and not implementation, then they do not have to be rewritten. If unit test code really creates a named pipe to communicate with production code and, then obviously the tests have to be modified when switching from pipe to socket.
Unit tests shall stay away from external resources such as filesystems, networks, databases because they are slow, can be unavailable ... see these Unit Testing rules.
This implies the lowest level function are not unit tested, they will be tested with integration tests, where the whole system is tested end-to-end.

Debugging a scripting language like ruby

I am basically from the world of C language programming, now delving into the world of scripting languages like Ruby and Python.
I am wondering how to do debugging.
At present the steps I follow is,
I complete a large script,
Comment everything but the portion I
want to check
Execute the script
Though it works, I am not able to debug like how I would do in, say, a VC++ environment or something like that.
My question is, is there any better way of debugging?
Note: I guess it may be a repeated question, if so, please point me to the answer.

Your sequence seems entirely backwards to me. Here's how I do it:
I write a test for the functionality I want.
I start writing the script, executing bits and verifying test results.
I review what I'd done to document and publish.
Specifically, I execute before I complete. It's way too late by then.
There are debuggers, of course, but with good tests and good design, I've almost never needed one.

Here's a screencast on ruby debugging with ruby-debug.

Seems like the problem here is that your environment (Visual Studio) doesn't support these languages, not that these languages don't support debuggers in general.
Perl, Python, and Ruby all have fully-featured debuggers; you can find other IDEs that help you, too. For Ruby, there's RubyMine; for Perl, there's Komodo. And that's just off the top of my head.

There is a nice gentle introduction to the Python debugger here

If you're working with Python then you can find a list of debugging tools here to which I just want to add Eclipse with the Pydev extension, which makes working with breakpoints etc. also very simple.

My question is, is there any better way of debugging?"
Yes.
Your approach, "1. I complete a large script, 2. Comment everything but the portion I want to check, 3. Execute the script" is not really the best way to write any software in any language (sorry, but that's the truth.)
Do not write a large anything. Ever.
Do this.
Decompose your problem into classes of objects.
For each class, write the class by
2a. Outline the class, focus on the external interface, not the implementation details.
2b. Write tests to prove that interface works.
2c. Run the tests. They'll fail, since you only outlined the class.
2d. Fix the class until it passes the test.
2e. At some points, you'll realize your class designs aren't optimal. Refactor your design, assuring your tests still pass.
Now, write your final script. It should be short. All the classes have already been tested.
3a. Outline the script. Indeed, you can usually write the script.
3b. Write some test cases that prove the script works.
3c. Runt the tests. They may pass. You're done.
3d. If the tests don't pass, fix things until they do.
Write many small things. It works out much better in the long run that writing a large thing and commenting parts of it out.

Script languages have no differences compared with other languages in the sense that you still have to break your problems into manageable pieces -- that is, functions. So, instead of testing the whole script after finishing the whole script, I prefer to test those small functions before integrating them. TDD always helps.

There's a SO question on Ruby IDEs here - and searching for "ruby IDE" offers more.
I complete a large script
That's what caught my eye: "complete", to me, means "done", "finished", "released". Whether or not you write tests before writing the functions that pass them, or whether or not you write tests at all (and I recommend that you do) you should not be writing code that can't be run (which is a test in itself) until it's become large. Ruby and Python offer a multitude of ways to write small, individually-testable (or executable) pieces of code, so that you don't have to wait for (?) days before you can run the thing.
I'm building a (Ruby) database translation/transformation script at the moment - it's up to about 1000 lines and still not done. I seldom go more than 5 minutes without running it, or at least running the part on which I'm working. When it breaks (I'm not perfect, it breaks a lot ;-p) I know where the problem must be - in the code I wrote in the last 5 minutes. Progress is pretty fast.
I'm not asserting that IDEs/debuggers have no place: some problems don't surface until a large body of code is released: it can be really useful on occasion to drop the whole thing into a debugging environment to find out what is going on. When third-party libraries and frameworks are involved it can be extremely useful to debug into their code to locate problems (which are usually - but not always - related to faulty understanding of the library function).

You can debug your Python scripts using the included pdb module. If you want a visual debugger, you can download winpdb - don't be put off by that "win" prefix, winpdb is cross-platform.

The debugging method you described is perfect for a static language like C++, but given that the language is so different, the coding methods are similarly different. One of the big very important things in a dynamic language such as Python or Ruby is the interactive toplevel (what you get by typing, say python on the command line). This means that running a part of your program is very easy.
Even if you've written a large program before testing (which is a bad idea), it is hopefully separated into many functions. So, open up your interactive toplevel, do an import thing (for whatever thing happens to be) and then you can easily start testing your functions one by one, just calling them on the toplevel.
Of course, for a more mature project, you probably want to write out an actual test suite, and most languages have a method to do that (in Python, this is doctest and nose, don't know about other languages). At first, though, when you're writing something not particularly formal, just remember a few simple rules of debugging dynamic languages:
Start small. Don't write large programs and test them. Test each function as you write it, at least cursorily.
Use the toplevel. Running small pieces of code in a language like Python is extremely lightweight: fire up the toplevel and run it. Compare with writing a complete program and the compile-running it in, say, C++. Use that fact that you can quickly change the correctness of any function.
Debuggers are handy. But often, so are print statements. If you're only running a single function, debugging with print statements isn't that inconvenient, and also frees you from dragging along an IDE.

There's a lot of good advice here, i recommend going through some best practices:
http://github.com/edgecase/ruby_koans
http://blog.rubybestpractices.com/
http://on-ruby.blogspot.com/2009/01/ruby-best-practices-mini-interview-2.html
(and read Greg Brown's book, it's superb)
You talk about large scripts. A lot of my workflow is working out logic in irb or the python shell, then capturing them into a cascade of small, single-task focused methods, with appropriate tests (not 100% coverage, more focus on edge and corner cases).
http://binstock.blogspot.com/2008/04/perfecting-oos-small-classes-and-short.html

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.