How to profile combined python and c code

How to profile combined python and c code - python

I have an application that consists of multiple python scripts. Some of these scripts are calling C code. The application is now running much slower than it was, so I would like to profile it to see where the problem lies. Is there a tool, software package or just a way to profile such an application? A tool that will follow the python code into the C code and profile these calls as well?
Note 1: I am well aware of the standard Python profiling tools. I'm specifically looking here for combined Python/C profiling.
Note 2: the Python modules are calling C code using ctypes (see http://docs.python.org/library/ctypes.html for details).
Thanks!

Stackshots work. Since you have combined Python and C you can handle them separately. For Python, you can hit Ctrl-C while it's being slow to examine the stack. Do this several times. That will expose anything you can fix in the python code. For the C code, run the whole thing under a debugger like GDB and hit Ctrl-C to get a stack trace in C. Several of those will expose anything you can fix in the C code. I'm told OProfile can also do this. (Another way is to use lsstack if it is available.)
This is a little-known method that works on this principle: Suppose you have an infinite loop or a nearly infinite loop. How would you find it? You would halt the program and see what it was doing, right? Suppose the program only took twice as long as necessary. Each time you halted it, the chance that you would catch it doing the unnecessary thing is 50%. So all you have to do is halt it a number of times. As soon as you see it doing something that could be improved, on as few as 2 samples, you know you can fix that for a healthy speedup. Then you can repeat it to get the next problem. Measuring is not the point. Catching things you can improve is the point.

The combination would be pretty hard, but you can use some of the standard profilers like valgrind, gprof or even oprofile (although I never managed to get meaningful output out of it).

Related

Is writing a daemon in Python a good idea?

I have to write a daemon program that constantly runs in the background and performs some simple tasks. The logic is not complicated at all, however it has to run for extended periods of time and be stable.
I think C++ would be a good choice for writing this kind of application, however I'm also considering Python since it's easier to write and test something quickly in it.
The problem that I have with Python is that I'm not sure how its runtime environment is going to behave over extended periods of time. Can it eat up more and more memory because of some GC quirks? Can it crash unexpectedly? I've never written daemons in Python before, so if anyone here did, please share your experience. Thanks!

I've written a number of daemons in Python for my last company. The short answer is, it works just fine. As long as the code itself doesn't have some huge memory bomb, I've never seen any gradual degradation or memory hogging. Be mindful of anything in the global or class scopes, because they'll live on, so use del more liberally than you might normally. Otherwise, like I said, no issues I can personally report.
And in case you're wondering, they ran for months and months (let's say 6 months usually) between routine reboots with zero problems.

Yes it can leak. Yes it can crash unexpectedly. Anything can.
I'd say you're far more likely to end up accidentally leaking in an environment with manual memory management (e.g. C++) than you are with something like Python.
As for crashing unexpectedly, well, chances are an arbitrary lump of Python might be more likely to crash unexpectedly than an arbitrary lump of Java, because the latter benefits from static typing where you can catch a whole load of errors at compile time, that Python with its duck typing and other forms of flexibility.
Realistically, Python sounds a perfectly reasonable choice for what you want to do. Take a look at something like Twisted for a decent engine to build things around, or at least for an idea of structure (your question sounds like some sort of school assignment, so I'm not sure how much freedom of implementation you get)

I've written many things in C/C++ and Perl that are initiated when a LINUX box O.S. boots, launching them using the rc.d.
Also I've written a couple of java and python scripts that are started the same way I've mentioned above, but I needed a little shell-script (.sh file) to launch them and I used rc.5.
Let me tell you that your concerns about their runtime environments are completely valid, you will have to be careful about wich runlevel you'll use... (only from rc.2 to rc.5, because rc.1 and rc.6 are for the System).
If the runlevel is too low, the python runtime might not be up at the time you are launching your program and it could flop. e.g.: In a LAMP Server MySQL and Apache are started in rc.3 where the Network is already available.
I think your best shot is to make your script in python and launch it using a .sh file from rc.5.
Good luck!

Why use Python interactive mode?

When I first started reading about Python, all of the tutorials have you use Python's Interactive Mode. It is difficult to save, write long programs, or edit your existing lines (for me at least). It seems like a far more difficult way of writing Python code than opening up a code.py file and running the interpreter on that file.
python code.py
I am coming from a Java background, so I have ingrained expectations of writing and compiling files for programs. I also know that a feature would not be so prominent in Python documentation if it were not somehow useful. So what am I missing?

Let's see:
If you want to know how something works, you can just try it. There is no need to write up a file. I almost always scratch write my programs in the interpreter before coding them. It's not just for things that you don't know how they work in the programming language. I never remember what the correct arguments to range are to create, for example, [-2, -1, 0, 1]. I don't need to. I just have to fire up the interpreter and try stuff until I figure out it is range(-2, 2) (did that just now, actually).
You can use it as a calculator.
Python is a very introspective programming language. If you want to know anything about an object, you can just do dir(object). If you use IPython, you can even do object.<TAB> and it will tab-complete the methods and attributes of that object. That's way faster than looking stuff up in documentation or even in code.
help(anything) for documentation. It's way faster than any web interface.
Again, you have to use IPython (highly recommended), but you can time stuff. %timeit func1() and %timeit func2() is a common idiom to determine what is faster.
How often have you wanted to write a program to use once, and then never again. The fastest way to do this is to just do it in the Python interpreter. Sure, you have to be careful writing loops or functions (they must have the correct syntax the first time), but most stuff is just line by line, and you can play around with it.
Debugging. You don't need to put selective print statements in code to see what variables are when you write it in the interpreter. You just have to type >>> a, and it will show what a is. Nice again to see if you constructed something correctly. The building Python debugger pdb also uses the intrepeter functionality, so you can not only see what a variable is when debugging, but you can also manipulate or even change it without halting debugging.
When people say that Python is faster to develop in, I guarantee that this is a big part of what they are talking about.
Commenters: anything I am forgetting?

REPL Loops (like Python's interactive mode) provide immediate feedback to the programmer. As such, you can rapidly write and test small pieces of code, and assemble those pieces into a larger program.

You're talking about running Python in the console by simply typing "python"? That's just for little tests and for practicing with the language. It's very useful when learning the language and testing out other modules.
Of course any real software project is written in .py files and later executed by the interpreter!

The Python interpreter is a least common denominator: you can run it on multiple platforms, and it acts the same way (modulo platform-specific modules), so it's pretty easy to get a newbie going with.
It's a lot easier to tell a newbie to launch the interpreter and "do this" than to have them open a file, type in some code, save it, make it executable, make sure python is in your PATH, or use a #! line, etc etc. Scrap all of that and just launch the interpreter. For simple examples, you can't beat it. It was never meant for long programs, so if you were using it for that, you probably missed the part of the tutorial that told you "longer scripts go in a file". :)

you use the interactive interpreter to test snippets of your code before you put them into your script.

As already mentioned, the Python interactive interpreter gives a quick and dirty way to test simple Python functions and/or code snippets.
I personally use the Python shell as a very quick way to perform simple Numerical operations (provided by the math module). I have my environment setup, so that the math module is automatically imported whenever I start a Python shell. In fact, its a good way to "market" Python to non-Pythoniasts. Show them how they can use Python as a neat scientific calculator, and for simple mathematical prototyping.

One thing I use interactive mode for that others haven't mentioned: To see if a module is installed. Just fire up Python and try to import the module; if it dies, then your PYTHONPATH is broke or the module is not installed.
This is a great first step for "Hey, it's not working on my machine" or "Which Python did that get installed in, anyway" bugs.

I find the interactive interpreter very, very good for testing quick code, or to show others the Power of Python. Sometimes I use the interpreter as a handy calculator, too. It's amazing what you can do in a very short amount of time.
Aside from the built-in console, I also have to recommend Pyshell. It has auto-completion, and a decent syntax highlighting. You can also edit multiple lines of code at once. Of course, it's not perfect, but certainly better than the default python console.

When coding in Java, you almost always will have the API open in some browser window. However with the python interpreter, you can always import any module that you are thinking about using and check what it offers. You can also test the behavior of new methods that you are unsure of, to eliminate the "Oh! so THAT's how it works" as a source of bugs.

Interactive mode makes it easy to test code snippets before incorporating them into a larger program. If you use IDLE there's syntax highlighting and argument pop-ups to help you out. It's also a quick way of checking that you've figured out how to use a module without having to write a test program.

Debugging a scripting language like ruby

I am basically from the world of C language programming, now delving into the world of scripting languages like Ruby and Python.
I am wondering how to do debugging.
At present the steps I follow is,
I complete a large script,
Comment everything but the portion I
want to check
Execute the script
Though it works, I am not able to debug like how I would do in, say, a VC++ environment or something like that.
My question is, is there any better way of debugging?
Note: I guess it may be a repeated question, if so, please point me to the answer.

Your sequence seems entirely backwards to me. Here's how I do it:
I write a test for the functionality I want.
I start writing the script, executing bits and verifying test results.
I review what I'd done to document and publish.
Specifically, I execute before I complete. It's way too late by then.
There are debuggers, of course, but with good tests and good design, I've almost never needed one.

Here's a screencast on ruby debugging with ruby-debug.

Seems like the problem here is that your environment (Visual Studio) doesn't support these languages, not that these languages don't support debuggers in general.
Perl, Python, and Ruby all have fully-featured debuggers; you can find other IDEs that help you, too. For Ruby, there's RubyMine; for Perl, there's Komodo. And that's just off the top of my head.

There is a nice gentle introduction to the Python debugger here

If you're working with Python then you can find a list of debugging tools here to which I just want to add Eclipse with the Pydev extension, which makes working with breakpoints etc. also very simple.

My question is, is there any better way of debugging?"
Yes.
Your approach, "1. I complete a large script, 2. Comment everything but the portion I want to check, 3. Execute the script" is not really the best way to write any software in any language (sorry, but that's the truth.)
Do not write a large anything. Ever.
Do this.
Decompose your problem into classes of objects.
For each class, write the class by
2a. Outline the class, focus on the external interface, not the implementation details.
2b. Write tests to prove that interface works.
2c. Run the tests. They'll fail, since you only outlined the class.
2d. Fix the class until it passes the test.
2e. At some points, you'll realize your class designs aren't optimal. Refactor your design, assuring your tests still pass.
Now, write your final script. It should be short. All the classes have already been tested.
3a. Outline the script. Indeed, you can usually write the script.
3b. Write some test cases that prove the script works.
3c. Runt the tests. They may pass. You're done.
3d. If the tests don't pass, fix things until they do.
Write many small things. It works out much better in the long run that writing a large thing and commenting parts of it out.

Script languages have no differences compared with other languages in the sense that you still have to break your problems into manageable pieces -- that is, functions. So, instead of testing the whole script after finishing the whole script, I prefer to test those small functions before integrating them. TDD always helps.

There's a SO question on Ruby IDEs here - and searching for "ruby IDE" offers more.
I complete a large script
That's what caught my eye: "complete", to me, means "done", "finished", "released". Whether or not you write tests before writing the functions that pass them, or whether or not you write tests at all (and I recommend that you do) you should not be writing code that can't be run (which is a test in itself) until it's become large. Ruby and Python offer a multitude of ways to write small, individually-testable (or executable) pieces of code, so that you don't have to wait for (?) days before you can run the thing.
I'm building a (Ruby) database translation/transformation script at the moment - it's up to about 1000 lines and still not done. I seldom go more than 5 minutes without running it, or at least running the part on which I'm working. When it breaks (I'm not perfect, it breaks a lot ;-p) I know where the problem must be - in the code I wrote in the last 5 minutes. Progress is pretty fast.
I'm not asserting that IDEs/debuggers have no place: some problems don't surface until a large body of code is released: it can be really useful on occasion to drop the whole thing into a debugging environment to find out what is going on. When third-party libraries and frameworks are involved it can be extremely useful to debug into their code to locate problems (which are usually - but not always - related to faulty understanding of the library function).

You can debug your Python scripts using the included pdb module. If you want a visual debugger, you can download winpdb - don't be put off by that "win" prefix, winpdb is cross-platform.

The debugging method you described is perfect for a static language like C++, but given that the language is so different, the coding methods are similarly different. One of the big very important things in a dynamic language such as Python or Ruby is the interactive toplevel (what you get by typing, say python on the command line). This means that running a part of your program is very easy.
Even if you've written a large program before testing (which is a bad idea), it is hopefully separated into many functions. So, open up your interactive toplevel, do an import thing (for whatever thing happens to be) and then you can easily start testing your functions one by one, just calling them on the toplevel.
Of course, for a more mature project, you probably want to write out an actual test suite, and most languages have a method to do that (in Python, this is doctest and nose, don't know about other languages). At first, though, when you're writing something not particularly formal, just remember a few simple rules of debugging dynamic languages:
Start small. Don't write large programs and test them. Test each function as you write it, at least cursorily.
Use the toplevel. Running small pieces of code in a language like Python is extremely lightweight: fire up the toplevel and run it. Compare with writing a complete program and the compile-running it in, say, C++. Use that fact that you can quickly change the correctness of any function.
Debuggers are handy. But often, so are print statements. If you're only running a single function, debugging with print statements isn't that inconvenient, and also frees you from dragging along an IDE.

There's a lot of good advice here, i recommend going through some best practices:
http://github.com/edgecase/ruby_koans
http://blog.rubybestpractices.com/
http://on-ruby.blogspot.com/2009/01/ruby-best-practices-mini-interview-2.html
(and read Greg Brown's book, it's superb)
You talk about large scripts. A lot of my workflow is working out logic in irb or the python shell, then capturing them into a cascade of small, single-task focused methods, with appropriate tests (not 100% coverage, more focus on edge and corner cases).
http://binstock.blogspot.com/2008/04/perfecting-oos-small-classes-and-short.html

What is the feasibility of porting a legacy C program to Python?

I have a program in C that communicates via UDP with another program (in Java) and then does process manipulation (start/stop) based on the UDP pkt exchange.
Now this C program has been legacy and I want to convert it to Python - do you think Python will be a good choice for the tasks mentioned?

Yes, I do think that Python would be a good replacement. I understand that the Twisted Python framework is quite popular.

I'd say that if:
Your C code contains no platform specific requirements
You are sure speed is not going to be an issue going from C to python
You have a desire to not compile anymore
You would like to try utilise exception handling
You want to dabble in OO
You might choose to run on many platforms without porting
You are curious about dynamic typing
You want memory handled for you
You know or want to learn python
Then sure, why not.
There doesn't seem to be any technical reason you shouldn't use python here, so it's a preference in this case.

Remember as well, you can leave parts of your program in C, turn them into Python modules and build python code around them - you don't need to re-write everything up-front.

Assuming that you have control over the environment which this application will run, and that the performance of interpreted language (python) compared to a compiled one (C) can be ignored, I believe Python is a great choice for this.

If I was faced with a similar situation I'd ask myself a couple of questions:
Is there anything more important I could be working on?
Does Python bring anything to the table that is currently handled poorly by the current application?
Will this allow me to add functionality that was previously too difficult to implement?
Is this going to disrupt service in any way?
If I can't answer those satisfactorily, then I'd put off the rewrite.

Yes, I think Python is a good choice, if all your platforms support it. Since this is a network program, I'm assuming the network is your runtime bottleneck? That's likely to still be the case in Python. If you really do need to speed it up, you can include your long-since-debugged, speedy C as Python modules.

If this is an embedded program, then it might be a problem to port it since Python programs typically rely on the Python runtime and library, and those are fairly large. Especially when compared to a C program doing a well-defined task. Of course, it's likely you've already considered that aspect, but I wanted to mention it in the context of the question anyway, since I feel it's an important aspect when doing this type of comparison.

When to use the Python debugger

Since Python is a dynamic, interpreted language you don't have to compile your code before running it. Hence, it's very easy to simply write your code, run it, see what problems occur, and fix them. Using hotkeys or macros can make this incredibly quick.
So, because it's so easy to immediately see the output of your program and any errors that may occur, I haven't uses a debugger tool yet. What situations may call for using a real debugger vs. the method I currently use?
I'd like to know before I get into a situation and get frustrated because I don't know how to fix the problem.

In 30 years of programming I've used a debugger exactly 4 times. All four times were to read the core file produced from a C program crashing to locate the traceback information that's buried in there.
I don't think debuggers help much, even in compiled languages. Many people like debuggers, there are some reasons for using them, I'm sure, or people wouldn't lavish such love and care on them.
Here's the point -- software is knowledge capture.
Yes, it does have to run. More importantly, however, software has meaning.
This is not an indictment of your use of a debugger. However, I find that the folks who rely on debugging will sometimes produce really odd-looking code and won't have a good justification for what it means. They can only say "it may be a hack, but it works."
My suggestion on debuggers is "don't bother".
"But, what if I'm totally stumped?" you ask, "should I learn the debugger then?" Totally stumped by what? The language? Python's too simple for utter befuddlement. Some library? Perhaps.
Here's what you do -- with or without a debugger.
You have the source, read it.
You write small tests to exercise the library. Using the interactive shell, if possible. [All the really good libraries seem to show their features using the interactive Python mode -- I strive for this level of tight, clear simplicity.]
You have the source, add print functions.

I use pdb for basic python debugging. Some of the situations I use it are:
When you have a loop iterating over 100,000 entries and want to break at a specific point, it becomes really helpful.(conditional breaks)
Trace the control flow of someone else's code.
Its always better to use a debugger than litter the code with prints.
Normally there can be more than one point of failures resulting in a bug, all are not obvious in the first look. So you look for obvious places, if nothing is wrong there, you move ahead and add some more prints.. debugger can save you time here, you dont need to add the print and run again.

Usually when the error is buried in some function, but I don't know exactly what or where. Either I insert dozens of log.debug() calls and then have to take them back out, or just put in:
import pdb
pdb.set_trace ()
and then run the program. The debugger will launch when it reaches that point and give me a full REPL to poke around in.

Any time you want to inspect the contents of variables that may have caused the error. The only way you can do that is to stop execution and take a look at the stack.
pydev in Eclipse is a pretty good IDE if you are looking for one.

I find it very useful to drop into a debugger in a failing test case.
I add import pdb; pdb.set_trace() just before the failure point of the test. The test runs, building up a potentially quite large context (e.g. importing a database fixture or constructing an HTTP request). When the test reaches the pdb.set_trace() line, it drops into the interactive debugger and I can inspect the context in which the failure occurs with the usual pdb commands looking for clues as to the cause.

You might want to take a look at this other SO post:
Why is debugging better in an IDE?
It's directly relevant to what you're asking about.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.