Debugging a scripting language like ruby

Debugging a scripting language like ruby - python

I am basically from the world of C language programming, now delving into the world of scripting languages like Ruby and Python.
I am wondering how to do debugging.
At present the steps I follow is,
I complete a large script,
Comment everything but the portion I
want to check
Execute the script
Though it works, I am not able to debug like how I would do in, say, a VC++ environment or something like that.
My question is, is there any better way of debugging?
Note: I guess it may be a repeated question, if so, please point me to the answer.

Your sequence seems entirely backwards to me. Here's how I do it:
I write a test for the functionality I want.
I start writing the script, executing bits and verifying test results.
I review what I'd done to document and publish.
Specifically, I execute before I complete. It's way too late by then.
There are debuggers, of course, but with good tests and good design, I've almost never needed one.

Here's a screencast on ruby debugging with ruby-debug.

Seems like the problem here is that your environment (Visual Studio) doesn't support these languages, not that these languages don't support debuggers in general.
Perl, Python, and Ruby all have fully-featured debuggers; you can find other IDEs that help you, too. For Ruby, there's RubyMine; for Perl, there's Komodo. And that's just off the top of my head.

There is a nice gentle introduction to the Python debugger here

If you're working with Python then you can find a list of debugging tools here to which I just want to add Eclipse with the Pydev extension, which makes working with breakpoints etc. also very simple.

My question is, is there any better way of debugging?"
Yes.
Your approach, "1. I complete a large script, 2. Comment everything but the portion I want to check, 3. Execute the script" is not really the best way to write any software in any language (sorry, but that's the truth.)
Do not write a large anything. Ever.
Do this.
Decompose your problem into classes of objects.
For each class, write the class by
2a. Outline the class, focus on the external interface, not the implementation details.
2b. Write tests to prove that interface works.
2c. Run the tests. They'll fail, since you only outlined the class.
2d. Fix the class until it passes the test.
2e. At some points, you'll realize your class designs aren't optimal. Refactor your design, assuring your tests still pass.
Now, write your final script. It should be short. All the classes have already been tested.
3a. Outline the script. Indeed, you can usually write the script.
3b. Write some test cases that prove the script works.
3c. Runt the tests. They may pass. You're done.
3d. If the tests don't pass, fix things until they do.
Write many small things. It works out much better in the long run that writing a large thing and commenting parts of it out.

Script languages have no differences compared with other languages in the sense that you still have to break your problems into manageable pieces -- that is, functions. So, instead of testing the whole script after finishing the whole script, I prefer to test those small functions before integrating them. TDD always helps.

There's a SO question on Ruby IDEs here - and searching for "ruby IDE" offers more.
I complete a large script
That's what caught my eye: "complete", to me, means "done", "finished", "released". Whether or not you write tests before writing the functions that pass them, or whether or not you write tests at all (and I recommend that you do) you should not be writing code that can't be run (which is a test in itself) until it's become large. Ruby and Python offer a multitude of ways to write small, individually-testable (or executable) pieces of code, so that you don't have to wait for (?) days before you can run the thing.
I'm building a (Ruby) database translation/transformation script at the moment - it's up to about 1000 lines and still not done. I seldom go more than 5 minutes without running it, or at least running the part on which I'm working. When it breaks (I'm not perfect, it breaks a lot ;-p) I know where the problem must be - in the code I wrote in the last 5 minutes. Progress is pretty fast.
I'm not asserting that IDEs/debuggers have no place: some problems don't surface until a large body of code is released: it can be really useful on occasion to drop the whole thing into a debugging environment to find out what is going on. When third-party libraries and frameworks are involved it can be extremely useful to debug into their code to locate problems (which are usually - but not always - related to faulty understanding of the library function).

You can debug your Python scripts using the included pdb module. If you want a visual debugger, you can download winpdb - don't be put off by that "win" prefix, winpdb is cross-platform.

The debugging method you described is perfect for a static language like C++, but given that the language is so different, the coding methods are similarly different. One of the big very important things in a dynamic language such as Python or Ruby is the interactive toplevel (what you get by typing, say python on the command line). This means that running a part of your program is very easy.
Even if you've written a large program before testing (which is a bad idea), it is hopefully separated into many functions. So, open up your interactive toplevel, do an import thing (for whatever thing happens to be) and then you can easily start testing your functions one by one, just calling them on the toplevel.
Of course, for a more mature project, you probably want to write out an actual test suite, and most languages have a method to do that (in Python, this is doctest and nose, don't know about other languages). At first, though, when you're writing something not particularly formal, just remember a few simple rules of debugging dynamic languages:
Start small. Don't write large programs and test them. Test each function as you write it, at least cursorily.
Use the toplevel. Running small pieces of code in a language like Python is extremely lightweight: fire up the toplevel and run it. Compare with writing a complete program and the compile-running it in, say, C++. Use that fact that you can quickly change the correctness of any function.
Debuggers are handy. But often, so are print statements. If you're only running a single function, debugging with print statements isn't that inconvenient, and also frees you from dragging along an IDE.

There's a lot of good advice here, i recommend going through some best practices:
http://github.com/edgecase/ruby_koans
http://blog.rubybestpractices.com/
http://on-ruby.blogspot.com/2009/01/ruby-best-practices-mini-interview-2.html
(and read Greg Brown's book, it's superb)
You talk about large scripts. A lot of my workflow is working out logic in irb or the python shell, then capturing them into a cascade of small, single-task focused methods, with appropriate tests (not 100% coverage, more focus on edge and corner cases).
http://binstock.blogspot.com/2008/04/perfecting-oos-small-classes-and-short.html

Related

Is it acceptable practice to unit-test a program in a different language?

I have a static library I created from C++, and would like to test this using a Driver code.
I noticed one of my professors like to do his tests using python, but he simply executes the program (not a library in this case, but an executable) using random test arguments.
I would like to take this approach, but I realized that this is a library and doesn't have a main function; that would mean I should either create a Driver.cpp class, or wrap the library into python using SWIG or boost python.
I’m planning to do the latter because it seems more fun, but logically, I feel that there is going to be more bugs when trying to wrap a library to a different language just to test it, rather than test it in its native language.
Is testing programs in a different language an accepted practice in the real world, or is this bad practice?

I'd say that it's best to test the API that your users will be exposed to. Other tests are good to have as well, but that's the most important aspect.
If your users are going to write C/C++ code linking to your library, then it would be good to have tests making use of your library the same way.
If you are going to ship a Python wrapper (why not?) then you should have Python tests.
Of course, there is a convenience aspect to this, as well. It may be easier to write tests in Python, and you might have time constraints that make it more appealing, etc.
I guess what I'm saying is: There's nothing inherently wrong with tests being in a different language from the code under test (that's totally normal for testing a REST API, for instance), but make sure you have tests for the public-facing API at a minimum.
Aside, on terminology:
I don't think the types of tests you are describing are "unit tests" in the usual sense of the term. Probably "functional test" would be more accurate.
A unit test typically tests a very small component - such as a function call - that might be one piece of larger functionality. Unit tests like these are often "white box" tests, so you can see the inner workings of your code.
Testing something from a user's point-of-view (such as your professor's commandline tests) are "black box" tests, and in these examples are at a more functional level rather than "unit" level.
I'm sure plenty of people may disagree with that, though - it's not a rigidly-defined set of terms.

A few things to keep in mind:
If you are writing tests as you code, then, by all means, use whatever language works best to give you rapid feedback. This enables fast test-code cycles (and is fun as well). BUT.
Always have well-written tests in the language of the consumer. How is your client/consumer going to call your functions? What language will they be using? Using the same language minimizes integration issues later on in the life-cycle.

It really depends on what it is you are trying to test. It almost always makes sense to write unit tests in the same language as the code you are testing so that you can construct the objects under test or invoke the functions under test, both of which can be most easily done in the same language, and verify that they work correctly. There are, however, cases in which it makes sense to use a different language, namely:
Integration tests that run a number of different components or applications together.
Tests that verify compilation or interpretation failures which could not be tested in the language, itself, since you are validating that an error occurs at the language level.
An example of #1 might be a program that starts up multiple different servers connected to each other, issues requests to the server, and verifies those responses. Or, as a simpler example, a program that simply forks an application under test as a subprocess and verifies that it produces the expected outputs for a given input.
An example of #2 might be a program that verifies that a certain piece of C++ code will produce a static assertion failure or that a particular template instantiation which is intentionally disallowed will result in a compilation failure if someone attempts to use it.
To answer your larger question, it is not bad practice per-se to write tests in a different language. Whatever makes the tests more convenient to write, easier to understand, more robust to changes in implementation, more sensitive to regressions, and better on any one of the properties that define good testing would be a good justification to write the tests one way vs another. If that means writing the tests in another language, then go for it. That being said, small unit tests typically need to be able to invoke the item under test directly which, in most cases, means writing the unit tests in the same language as the component under test.

I would say it depends on what you're actually trying to test. For true unit testing, it is, I think, best to test in the same language, or at least a binary-compatible language (i.e. testing Java with Groovy -- I use Spock in this case, which is Groovy based, to unit-test my Java code, since I can intermingle the Java with the Groovy), but if you are testing results, then I think it's fair to switch languages.
For example, I have tested the expected results when given a specific set of a data when running a Perl application via nose in Python. This works because I'm not unit testing the Perl code, per se, but the outcomes of that Perl code.
In that case, to unit test actual Perl functions that are part of the application, I would use a Perl-based test framework such as Test::More.

Why not, it's an awesome idea because you really understand that you are testing the unit like a black box.
Of course there may be technical issues involved, what if you need to mock some parts of the unit under test, that may be difficult in a different language.
This is a common practice for integration tests though, I've seen lots of programs driven from external tools such as a website from selenium, or an application from cucumber. Both those can be considered the same as a custom python script.
If you consider the difference between integration testing and unit testing is the number of things under test at any given time, the only reason why you shouldn't do this is tool support.

Why don't you need a powerful ide for writing Python? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 12 years ago.
I have heard before that many Python developers don't use an IDE like Eclipse because it is unnecessary with a language like Python.
What are the reasons people use to justify this claim?

I'd say the main reason is because Python isn't horribly verbose like, e.g., Java. You don't need an IDE to generate 100s of lines of boilerplate because you don't need 100s of lines of boilerplate in Python. You tend to automate stuff within the language instead of further up the toolchain.
A second reason is that you don't need build process automation b/c there's no build process.

I'm going to risk offending some people and express something that I think a lot of python lovers will agree with: Java is so bloody cumbersome and verbose that one almost needs an IDE like Eclipse just to manage its unwieldy bloat.
With python, the main programming-specific features I want from my editor are syntax highlighting and a jump-to-definition command. Bonus points for a complementary return-from-jump command.
I find Geany does what I need, and is refreshingly light, quick, and stable compared to monster IDEs like Eclipse. For other suggestions, take a look at this question.

I know why you need (can benefit from) a good IDE - Rapid Application Development
Time is money :) And I'd much rather spend my time solving problems than typing every little piece of code in.

Two Main Reasons
Because of the dynamic typing and generally super-powerful functionality, there just isn't much extra typing that the IDE can do for you. Implementing a Java interface in a package is a lot of work and dozens of lines of boilerplate. In Python or Ruby it just doesn't have to be typed in the first place.
Because of the dynamic typing, the fancy editor doesn't have nearly as much information at its fingertips, and so the capability is reduced as well.
So the squeeze is top-down from the reduced need and bottom-up from the reduced editor capability, the net result being a really small application area. The benefit of using a fast and familiar day-to-day editor ends up higher than the benefit of the mostly-pointless IDE.
I suppose it's also possible that the categories are a bit fuzzy now. Vi(1) is the lightest-weight and fastest "plain" editor around, and yet vim(1) can now colorize every language under the sun and TextMate is classified as "lightweight", so all of the categories have really begun to merge a bit.

Python is dynamically typed and the way it handles modules as objects makes it impossible to determine what a name will resolve to at a certain time without actually running the code. Therefore, the 'tab completion' feature of IDEs is pretty useless.
Also, since Python doesn't have a build step, an IDE isn't needed to automate this. You can just fire up python app.py in a terminal and have much more control over how it runs.

It sounds like 'You don't need vehicle to go to work' to me. It might be true or not, depends on how far your workplace is.

IDE's assist in developer productivity and can equally apply to Python. The defining thing about an IDE is the ability to not have to "mode switch" between tasks such as editing, compiling, testing, running and debugging etc.

Python uses dynamic typing and interpreting, rather than compiling.
The interpreter itself will output comprehensive error messages, similar to Perl.
If you look at dynamically typed programming languages in general, you'll find that most of them are not really suitable for IDEs. RAD components (code completion, code generation, code templates, etc) can be included in almost any smart text editor, like Vim, Emacs, Gedit, or SciTE.
I use Vim and Gedit for most of my programming, and I find, that I don't need IDE-ish stuff other than that, what is already included in those text editors. When I program in Java, however, I use Eclipse most of the time, since keeping track of all those parts manually, would be too time consuming. I tend not to use IDE's for my C++ stuff, too, but when the projects grows beyond a certain size, I tend to use either Eclipse (CDT), NetBeans, Code::Blocks, or something like that.
So it's the family of languages itself, that make IDE's unnecessary, but it doesn't mean that working with IDE's with those languages, is bad practice.
Side Note: There's even a Lua environment for Eclipse. Out of all languages, Lua is probably one of the least that needs an IDE...

Well I use an IDE when programming in Python on my computer. Its easier that way . But when on the run or on university's terminal , I prefer terminal .

I'm still fairly new to Python and use an IDE with code completion but find myself rarely needing it, Python does a really good job of not having an uncessarily large number of verbose calls, as dsimcha pointed out above. I find that just using a basic IDE I can work efficiently in it and the fact that the code is a lot less cluttered without having brackets makes it easier to work with files that have a lot of lines of code (something that I found unbearable in PHP due to all its syntax clutter)
As far as #Postman's answer, I'm not sure that having an IDE makes RAD any faster, at least not in the case of python, its such a succinct language, the only thing that it would help in would be code completion, the way you answered it it sounds more like you are hinting at the use of a framework, which I believe is still very important in Python which does make RAD much more possible than otherwise.

The problems is IDEs dont work very well with dynamic languages.
The IDE cannot second guess runtime duck typing so other than some basic syntax checking and displaying the keywords in pretty colours they ar enot much help.
My personal experience is with groovy and eclipse where eclipse is actually pretty annoying. Method completion for a groovy object brings up about 200 posabilties, it constantly insert quotes and brackets exactly where you dont want them and messes up the syntax coloring whenever it encounters a reasonably complex regular expression. I would ditch eclipse except the majority of the code base is in Java where eclipse is useful.

Are there Python statical analysis/validation tools?

I've never been a huge Python fan. I learned it for a course where the teacher was really into it, but his enthusiasm never quite made it to the rest of our class it seems: as soon as we had the chance, we all jumped off to C#/Java.
Anyways. This wasn't a concluding experience, and what annoyed me the most in the language was that to find out if Python code would work, you actually have to execute it, and risk dying halfway through because of something stupid like a typo in a variable name (throwing up a NameError). Stuff that compilers for compiled languages catch at the very first glance, but that Python won't bother to complain about until it's too late. (I know you can always die half through a test with compiled programs too, but at least it won't be from a typo.)
I'm not really giving it a second chance yet, but for the sake of the next students, are there Python statical analysis or validation tools out there that would catch most errors (I understand you can't catch them all) compilers would catch at compile-time?

"but that Python won't bother to complain about until it's too late"
It's not that the message comes too late. It's that you're waiting too long to use Python. Don't type a mountain of code and then complain that one small piece is bad.
Use Unit Testing. Write less code before running a test.
Use python Interactively to experiment. You can do most statistical processing from the >>> prompt.
Don't write long, main-program-like scripts. Write short scripts -- in small pieces -- and test the small pieces.

Take a look at the following programs:
pylint
pyflakes
pychecker

In addition to the ones mentioned by ars.
Try Pydev, it has static code analysis build-in. Or Pida which has a couple of different static analysis tools available.
Or if you are looking for a standalone library, try Rope

Why use Python interactive mode?

When I first started reading about Python, all of the tutorials have you use Python's Interactive Mode. It is difficult to save, write long programs, or edit your existing lines (for me at least). It seems like a far more difficult way of writing Python code than opening up a code.py file and running the interpreter on that file.
python code.py
I am coming from a Java background, so I have ingrained expectations of writing and compiling files for programs. I also know that a feature would not be so prominent in Python documentation if it were not somehow useful. So what am I missing?

Let's see:
If you want to know how something works, you can just try it. There is no need to write up a file. I almost always scratch write my programs in the interpreter before coding them. It's not just for things that you don't know how they work in the programming language. I never remember what the correct arguments to range are to create, for example, [-2, -1, 0, 1]. I don't need to. I just have to fire up the interpreter and try stuff until I figure out it is range(-2, 2) (did that just now, actually).
You can use it as a calculator.
Python is a very introspective programming language. If you want to know anything about an object, you can just do dir(object). If you use IPython, you can even do object.<TAB> and it will tab-complete the methods and attributes of that object. That's way faster than looking stuff up in documentation or even in code.
help(anything) for documentation. It's way faster than any web interface.
Again, you have to use IPython (highly recommended), but you can time stuff. %timeit func1() and %timeit func2() is a common idiom to determine what is faster.
How often have you wanted to write a program to use once, and then never again. The fastest way to do this is to just do it in the Python interpreter. Sure, you have to be careful writing loops or functions (they must have the correct syntax the first time), but most stuff is just line by line, and you can play around with it.
Debugging. You don't need to put selective print statements in code to see what variables are when you write it in the interpreter. You just have to type >>> a, and it will show what a is. Nice again to see if you constructed something correctly. The building Python debugger pdb also uses the intrepeter functionality, so you can not only see what a variable is when debugging, but you can also manipulate or even change it without halting debugging.
When people say that Python is faster to develop in, I guarantee that this is a big part of what they are talking about.
Commenters: anything I am forgetting?

REPL Loops (like Python's interactive mode) provide immediate feedback to the programmer. As such, you can rapidly write and test small pieces of code, and assemble those pieces into a larger program.

You're talking about running Python in the console by simply typing "python"? That's just for little tests and for practicing with the language. It's very useful when learning the language and testing out other modules.
Of course any real software project is written in .py files and later executed by the interpreter!

The Python interpreter is a least common denominator: you can run it on multiple platforms, and it acts the same way (modulo platform-specific modules), so it's pretty easy to get a newbie going with.
It's a lot easier to tell a newbie to launch the interpreter and "do this" than to have them open a file, type in some code, save it, make it executable, make sure python is in your PATH, or use a #! line, etc etc. Scrap all of that and just launch the interpreter. For simple examples, you can't beat it. It was never meant for long programs, so if you were using it for that, you probably missed the part of the tutorial that told you "longer scripts go in a file". :)

you use the interactive interpreter to test snippets of your code before you put them into your script.

As already mentioned, the Python interactive interpreter gives a quick and dirty way to test simple Python functions and/or code snippets.
I personally use the Python shell as a very quick way to perform simple Numerical operations (provided by the math module). I have my environment setup, so that the math module is automatically imported whenever I start a Python shell. In fact, its a good way to "market" Python to non-Pythoniasts. Show them how they can use Python as a neat scientific calculator, and for simple mathematical prototyping.

One thing I use interactive mode for that others haven't mentioned: To see if a module is installed. Just fire up Python and try to import the module; if it dies, then your PYTHONPATH is broke or the module is not installed.
This is a great first step for "Hey, it's not working on my machine" or "Which Python did that get installed in, anyway" bugs.

I find the interactive interpreter very, very good for testing quick code, or to show others the Power of Python. Sometimes I use the interpreter as a handy calculator, too. It's amazing what you can do in a very short amount of time.
Aside from the built-in console, I also have to recommend Pyshell. It has auto-completion, and a decent syntax highlighting. You can also edit multiple lines of code at once. Of course, it's not perfect, but certainly better than the default python console.

When coding in Java, you almost always will have the API open in some browser window. However with the python interpreter, you can always import any module that you are thinking about using and check what it offers. You can also test the behavior of new methods that you are unsure of, to eliminate the "Oh! so THAT's how it works" as a source of bugs.

Interactive mode makes it easy to test code snippets before incorporating them into a larger program. If you use IDLE there's syntax highlighting and argument pop-ups to help you out. It's also a quick way of checking that you've figured out how to use a module without having to write a test program.

When to use the Python debugger

Since Python is a dynamic, interpreted language you don't have to compile your code before running it. Hence, it's very easy to simply write your code, run it, see what problems occur, and fix them. Using hotkeys or macros can make this incredibly quick.
So, because it's so easy to immediately see the output of your program and any errors that may occur, I haven't uses a debugger tool yet. What situations may call for using a real debugger vs. the method I currently use?
I'd like to know before I get into a situation and get frustrated because I don't know how to fix the problem.

In 30 years of programming I've used a debugger exactly 4 times. All four times were to read the core file produced from a C program crashing to locate the traceback information that's buried in there.
I don't think debuggers help much, even in compiled languages. Many people like debuggers, there are some reasons for using them, I'm sure, or people wouldn't lavish such love and care on them.
Here's the point -- software is knowledge capture.
Yes, it does have to run. More importantly, however, software has meaning.
This is not an indictment of your use of a debugger. However, I find that the folks who rely on debugging will sometimes produce really odd-looking code and won't have a good justification for what it means. They can only say "it may be a hack, but it works."
My suggestion on debuggers is "don't bother".
"But, what if I'm totally stumped?" you ask, "should I learn the debugger then?" Totally stumped by what? The language? Python's too simple for utter befuddlement. Some library? Perhaps.
Here's what you do -- with or without a debugger.
You have the source, read it.
You write small tests to exercise the library. Using the interactive shell, if possible. [All the really good libraries seem to show their features using the interactive Python mode -- I strive for this level of tight, clear simplicity.]
You have the source, add print functions.

I use pdb for basic python debugging. Some of the situations I use it are:
When you have a loop iterating over 100,000 entries and want to break at a specific point, it becomes really helpful.(conditional breaks)
Trace the control flow of someone else's code.
Its always better to use a debugger than litter the code with prints.
Normally there can be more than one point of failures resulting in a bug, all are not obvious in the first look. So you look for obvious places, if nothing is wrong there, you move ahead and add some more prints.. debugger can save you time here, you dont need to add the print and run again.

Usually when the error is buried in some function, but I don't know exactly what or where. Either I insert dozens of log.debug() calls and then have to take them back out, or just put in:
import pdb
pdb.set_trace ()
and then run the program. The debugger will launch when it reaches that point and give me a full REPL to poke around in.

Any time you want to inspect the contents of variables that may have caused the error. The only way you can do that is to stop execution and take a look at the stack.
pydev in Eclipse is a pretty good IDE if you are looking for one.

I find it very useful to drop into a debugger in a failing test case.
I add import pdb; pdb.set_trace() just before the failure point of the test. The test runs, building up a potentially quite large context (e.g. importing a database fixture or constructing an HTTP request). When the test reaches the pdb.set_trace() line, it drops into the interactive debugger and I can inspect the context in which the failure occurs with the usual pdb commands looking for clues as to the cause.

You might want to take a look at this other SO post:
Why is debugging better in an IDE?
It's directly relevant to what you're asking about.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.