When Does It Make Sense To Rewrite A Python Module in C?

In a game that I am writing, I use a 2D vector class which I have written to handle the speeds of the objects. This is called a large number of times every frame as there are a lot of objects on the screen, so any increase I can make in its speed will be useful.
It is pretty simple, consisting mostly of wrappers to the related math functions. It would be quite trivial to rewrite in C, but I am not sure whether doing so will make any significant difference as all it really does is call the underlying math functions, add, multiply or divide.
So, my question is under what circumstances does it make sense to rewrite in C? Where will you see a significant speed boost, and where can you see a reasonable speed boost without rewriting an extensive amount of the program?

If you're vector-munging, give numpy a try first. Chances are you will get speeds not far from C if you utilize numpy's vector manipulation functions wisely.
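For example, here is a minimal numpy sketch (the object count, array shapes, and dt are made up for illustration) of how per-object vector math collapses into one array operation:
import numpy as np

dt = 1.0 / 60.0                          # hypothetical frame time
positions = np.zeros((10000, 2))         # one row per object: x, y
velocities = np.random.randn(10000, 2)   # made-up per-object speeds

# one vectorized statement replaces 10000 per-object Python method calls
positions += velocities * dt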
Other than that, your question is very heuristic. If your code is too slow:
Profile it - chances are you'll be able to improve it in Python
Use the correct optimized C-based libraries (numpy in your case)
Try psyco
Try rewriting parts with cython
If all else fails, rewrite in C

First measure, then optimize

You should never optimize anything, be it in C or any other language, without timing your code before and after your optimization:
your clever optimization could in fact induce a slow down
optimizing something that takes 1% of the total execution time can never buy you more than a 1% performance gain
The common approach is:
1. profile your code
2. identify a hotspot
3. time this hotspot
4. optimize it
5. time the hotspot again; if it isn't faster, go back to step 3.
If you can't find hotspots, it could mean that your app is already optimized, or that you are not using the right algorithm for your problem. In either case, profiling helps you understand what your code does.
For profiling python code under Linux, you can use pyprof2calltree which works in conjunction with kcachegrind, and is totally awesome.
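The workflow, roughly, assuming pyprof2calltree and kcachegrind are installed (the script name is a placeholder):
python -m cProfile -o myscript.profile myscript.py
pyprof2calltree -i myscript.profile -k    # -k opens the result in kcachegrind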

Common wisdom is "profile", "measure", etc. Well, maybe. Just get in the debugger and take 10 stackshots. If more than one of them terminates in your wrapper code, then it is costing roughly 10% or more, so you should consider re-doing it in C to save that time. Chances are you will find other things as well that are costing more than that.
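If you prefer not to sit in a debugger, a throwaway sampler along these lines does the same job (a Unix-only sketch; the choice of SIGQUIT is arbitrary):
import signal
import traceback

def dump_stack(signum, frame):
    # print the current Python call stack: one "stackshot"
    traceback.print_stack(frame)

signal.signal(signal.SIGQUIT, dump_stack)   # trigger with: kill -QUIT <pid>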

A nice Profiler I use on Linux is pycallgraph - however, as your program gets bigger it starts to create much larger images which are harder to trace. I'm pretty sure you can exclude modules, though.

Related

Julia for image processing and speech recognition

I recently stumbled upon the Julia language and I was surprised by its claims. It claims to be many times faster than languages like Python, which I'm currently using for machine learning algorithms in speech recognition.
Their example claim is that the Fibonacci sequence runs about 30 times faster than in Python.
I don't understand what it is that makes it so much faster. However, what I really want to know is whether someone has actually verified this. If so, I would move from Python to Julia for the speech recognizer I am building, as I am hitting the slow threshold of my response time and cannot afford to make it any slower.
Also, are there any projects on GitHub (or anywhere) where I can find Julia projects which do heavy number crunching, like image processing, speech recognition, etc.?
I searched some other links, such as this one, but could not decide effectively. I can run the programs and verify the claims myself, but if someone has already done so, it would be helpful to me.
It's not that surprising; Python made a number of decisions that make it a fairly slow language, and Guido van Rossum, its creator, says "It is usually much more effective to take that one piece and replace that one function or module with a little bit of code you wrote in C or C++ rather than rewriting your entire system in a faster language, because for most of what you're doing, the speed of the language is irrelevant." As a general rule, any language that is concerned with speed will be faster than Python: C, C++, Ada, Java, Scala, Clojure, and a number of other languages all show up more than an order of magnitude faster than Python in benchmarks of typical implementations. Unless the authors of Julia have completely failed in their attempts to make a faster language, it will be faster than Python.
I don't understand what it is that makes it so much faster.
People often assume Julia is fast due to some sort of superior JIT compiler, and that Python could be optimized in the same way since they are both dynamic languages.
However, it is actually mostly down to clever language design choices. All functions in Julia use multiple dispatch, unlike Python functions. That means at runtime Julia can pick one particular code implementation for every possible combination of argument types. Since types are immutable in Julia, the compiled machine code handling a particular combination of argument types can be cached.
In Julia, all libraries have also been designed with type stability in mind, meaning that for a given set of argument types a function will always return objects of the same type. So if a mathematical operation on two integers returns an integer, then most of the time the JIT compiler will know it ALWAYS returns an integer, and thus doesn't have to generate lots of code checking every possible case.
If you are uncertain about the performance of Julia, I suggest you actually look at some small code examples and play with the code generator. With @code_native, you can see the assembly code Julia generates for a function. You will find it is possible to emit essentially the same code as a C compiler would. Quite dynamic and complex-looking Julia code surprisingly often turns into just a few assembly instructions.
The thing with Julia is that you can't just look at the performance of some random existing library that does what you want. It could be a quick port, or not well optimized. It is easy to write slow Julia code if you have no idea what you are doing. The key point is that Julia gives you quite good tools for writing highly optimized code.
If you need to, you can typically get performance close to C, but of course you have to spend some time optimizing your code. Julia has functions which let you analyze your code and tell you where you did something that causes potential performance problems.
Fortunately, Julia performance benefits from lots of small functions, so you can typically analyze performance issues in very small isolated cases.
Regarding packages, there is a link to all registered "external packages" from the homepage. You may find some of the things you want there.

using cython or PyPy to optimise tuples/lists (graph theory algorithm implemented in python)

I am working on a theoretical graph theory problem which involves taking combinations of hyperedges in a hypergraph to analyse the various cases.
I have implemented an initial version of the main algorithm in Python, but due to its combinatorial structure (and probably my implementation) the algorithm is quite slow.
One way I am considering speeding it up is by using either PyPy or Cython.
Looking at the documentation, it seems Cython doesn't offer a great speedup when it comes to tuples. This might be problematic for the implementation, since I am representing hyperedges as tuples, so the majority of the algorithm consists of manipulating tuples (they are all the same length, around 6 elements each).
Since both my C and Python skills are quite minimal I would appreciate it if someone can advise what would be the best way to proceed in optimising the code given its reliance on tuples/lists. Is there a documentation of using lists/tuples with Cython (or PyPy)?
If your algorithm is bad in terms of computational complexity, then you cannot be saved; you need a better algorithm. Consult a good graph theory book or Wikipedia; it's usually relatively easy to find one, although some problems have algorithms that are both non-trivial and crazy hard to implement. This sounds like the kind of thing PyPy can speed up quite significantly, but only by a constant factor, and it does not require any modifications to your code. Cython does not speed up your code all that much without type declarations (see the sketch after the next paragraph), and it seems this sort of problem cannot really be sped up just by adding types.
The constant part is what's crucial here: if the algorithm's complexity grows like, say, 2^n (which is typical for a naive algorithm), then adding an extra node to the graph doubles your time. That means 10 extra nodes multiply the time by 1024, 20 extra nodes by 1024*1024, and so on. If you're super-lucky, PyPy can speed up your algorithm by 100x, but that factor stays constant however big the graph gets (and one way or another you quickly run out of the lifetime of the universe).
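For reference, a hedged sketch of what Cython type declarations look like (this would live in a .pyx file; the function and names are invented, assuming hyperedges are fixed-length tuples of 6 elements):
def count_equal(tuple a, tuple b):
    # with the cdef declarations the loop body compiles to plain C
    cdef int i
    cdef int shared = 0
    for i in range(6):
        if a[i] == b[i]:
            shared += 1
    return shared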
what would be the best way to proceed in optimising the code...
Profile first. There is a standard cProfile module that does simple profiling very well. Optimising your code before profiling is quite pointless.
Besides, for graphs you can try using the excellent networkx module. Also, if you deal with long sorted lists you can have a look at bisect and heapq modules.
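For instance, a minimal bisect sketch (the list contents are invented):
import bisect

weights = [1, 3, 4, 7]
bisect.insort(weights, 5)             # insert while keeping the list sorted
i = bisect.bisect_left(weights, 4)    # binary search, O(log n)
print weights, i                      # [1, 3, 4, 5, 7] 2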

Optimizing Python Code

I've been working on one of the coding challenges on InterviewStreet.com and I've run into a bit of an efficiency problem. Can anyone suggest where I might change the code to make it faster and more efficient?
Here's the code
Here's the problem statement if you're interested
If your question is about optimising python code generally (which I think it should be ;) then there are all sorts of interesting things you can do, but first:
You probably shouldn't be obsessively optimising python code! If you're using the fastest algorithm for the problem you're trying to solve and python doesn't do it fast enough you should probably be using a different language.
That said, there are several approaches you can take (because sometimes, you really do want to make python code faster):
Profile (do this first!)
There are lots of ways of profiling python code, but there are two that I'll mention: cProfile (or profile) module, and PyCallGraph.
cProfile
This is what you should actually use, though interpreting the results can be a bit daunting.
It works by recording when each function is entered or exited, and what the calling function was (and tracking exceptions).
You can run a function in cProfile like this:
import cProfile
cProfile.run('myFunction()', 'myFunction.profile')  # writes the stats to a file
Then to view the results:
import pstats
stats = pstats.Stats('myFunction.profile')
stats.strip_dirs().sort_stats('time').print_stats()  # heaviest functions first
This will show you in which functions most of the time is spent.
PyCallGraph
PyCallGraph provides one of the prettiest, and maybe the easiest, ways of profiling python programs, and it's a good introduction to understanding where the time in your program is spent. However, it adds significant execution overhead.
To run pycallgraph:
pycallgraph graphviz ./myprogram.py
Simple! You get a png graph image as output (perhaps after a while...)
Use Libraries
If you're trying to do something in python that a module already exists for (maybe even in the standard library), then use that module instead!
Most of the standard library modules are written in C, and they will execute hundreds of times faster than equivalent python implementations of, say, bisection search.
Make the Interpreter do as Much of Your Work as You Can
The interpreter will do some things for you, like looping. Really? Yes! You can use the map, reduce, and filter built-in functions to significantly speed up tight loops:
consider:
for x in xrange(0, 100):
    doSomethingWithX(x)
vs:
map(doSomethingWithX, xrange(0,100))
Well obviously this could be faster because the interpreter only has to deal with a single statement, rather than two, but that's a bit vague... in fact, this is faster for two reasons:
all flow control (have we finished looping yet...) is done in the interpreter
the doSomethingWithX function name is only resolved once
In the for loop, each time around the loop python has to check exactly where the doSomethingWithX function is! Even with caching this is a bit of an overhead.
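If you want to verify this on your own machine, here is a hedged micro-benchmark (absolute numbers will vary by interpreter and hardware; the built-in abs stands in for a real function):
import timeit

loop = timeit.timeit('for x in xrange(100): f(x)',
                     setup='f = abs', number=10000)
mapped = timeit.timeit('map(f, xrange(100))',
                       setup='f = abs', number=10000)
print loop, mapped    # on most Python 2 builds, mapped is the smaller number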
Remember that Python is an Interpreted Language
(Note that this section really is about tiny tiny optimisations that you shouldn't let affect your normal, readable coding style!)
If you come from a background of a programming in a compiled language, like c or Fortran, then some things about the performance of different python statements might be surprising:
try:ing is cheap, ifing is expensive
If you have code like this:
if somethingcrazy_happened:
    uhOhBetterDoSomething()
else:
    doWhatWeNormallyDo()
And doWhatWeNormallyDo() would throw an exception if something crazy had happened, then it would be faster to arrange your code like this:
try:
    doWhatWeNormallyDo()
except SomethingCrazy:
    uhOhBetterDoSomething()
Why? Well, the interpreter can dive straight in and start doing what you normally do; in the first case the interpreter has to do a symbol look up each time the if statement is executed, because the name could refer to something different since the last time the statement was executed! (And a name lookup, especially if somethingcrazy_happened is global, can be nontrivial.)
You mean Who??
Because of the cost of name lookups it can also be better to cache global values within functions, and to bake simple boolean tests into functions like this:
Unoptimised function:
def foo():
    if condition_that_rarely_changes:
        doSomething()
    else:
        doSomethingElse()
Optimised approach: instead of testing a variable inside the function, exploit the fact that the interpreter is doing a name lookup on the function anyway!
When the condition becomes true:
foo = doSomething # now foo() calls doSomething()
When the condition becomes false:
foo = doSomethingElse # now foo() calls doSomethingElse()
PyPy
PyPy is a python implementation written in python (more precisely, in RPython, a restricted subset of it). Surely that means it will run code infinitely slower? Well, no. PyPy actually uses a Just-In-Time compiler (JIT) to run python programs.
If you don't use any external libraries (or the ones you do use are compatible with PyPy), then this is an extremely easy way to (almost certainly) speed up repetitive tasks in your program.
Basically the JIT can generate code that will do what the python interpreter would, but much faster, since it is generated for a single case, rather than having to deal with every possible legal python expression.
Where to look Next
Of course, the first place you should have looked was to improve your algorithms and data structures, and to consider things like caching, or even whether you need to be doing so much in the first place, but anyway:
This page of the python.org wiki provides lots of information about how to speed up python code, though some of it is a bit out of date.
Here's the BDFL himself on the subject of optimising loops.
There are quite a few things, even from my own limited experience that I've missed out, but this answer was long enough already!
This is all based on my own recent experiences with some python code that just wasn't fast enough, and I'd like to stress again that I don't really think any of what I've suggested is actually a good idea; sometimes, though, you have to....
First off, profile your code so you know where the problems lie. There are many examples of how to do this; here's one: https://codereview.stackexchange.com/questions/3393/im-trying-to-understand-how-to-make-my-application-more-efficient
You do a lot of indexed access as in:
for pair in range(i-1, j):
    if coordinates[pair][0] >= 0 and coordinates[pair][1] >= 0:
Which could be written more plainly as:
for coord in coordinates[i-1:j]:
    if coord[0] >= 0 and coord[1] >= 0:
List comprehensions are cool and "pythonic", but this code would probably run faster if you didn't create 4 lists:
N = int(raw_input())
coordinates = []
coordinates = [raw_input() for i in xrange(N)]
coordinates = [pair.split(" ") for pair in coordinates]
coordinates = [[int(pair[0]), int(pair[1])] for pair in coordinates]
I would instead roll all those together into one simple loop, or, if you're really dead set on list comprehensions, encapsulate the multiple transformations into a function which operates on the raw_input(). For example:
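One possible consolidation (a sketch, keeping the question's Python 2 idioms):
N = int(raw_input())
# one pass: read, split and convert each line as it arrives,
# instead of materializing four intermediate lists
coordinates = [[int(x) for x in raw_input().split(" ")] for i in xrange(N)]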
This answer shows how I locate code to optimize.
Suppose there is some line of code you could replace, and it is costing, say, 40% of the time.
Then it resides on the call stack 40% of the time.
If you take 10 samples of the call stack, it will appear on 4 of them, give or take.
It really doesn't matter how many samples show it.
If it appears on two or more, and if you can replace it, you will save whatever time it costs.
Most of the InterviewStreet problems seem to be tested in a way that verifies you have found an algorithm with the right big-O complexity, rather than that you have coded the solution in the most optimal way possible.
In other words if you are failing some of the test cases due to running out of time the problem is likely that you need to figure out a solution with lower algorithmic complexity rather than micro-optimize the algorithm you have. This is why they generally state that N can be quite large.

Writing a CPU bound script to gauge rough CPU performance

I have written a script and am running it on different machines. The script looks like this:
import time

def simple_math(n):            # placeholder for some cheap arithmetic
    return n * n + 1

def f(n):
    x = None
    while n:
        x = simple_math(n)
        n -= 1
    return x

BIGNUM = 10 ** 7               # placeholder iteration count
start = time.time()
f(BIGNUM)
print time.time() - start
At the end, the script prints how long it took to finish. Is this good enough to compare different machines' practical CPU speed for simple Python scripts?
By simple I mean it does not use multiprocessing module or any other technique to take advantage of multi-core machines.
This question is not about
making python programs run faster
multiprocessing module
GIL, I/O efficiency etc.
non cPython programs
I just want to make sure my approach to understanding CPU performance across machines is roughly correct.
What's wrong with all of the countless existing benchmarks? The more sophisticated ones are probably a bit more robust. The major problems I can spot with your naive approach (and I'm not an expert on this topic, mind you) are:
Modern CPUs are highly complex and employ very clever optimizations. The speed of a purely CPU-bound program can vary widely depending on how often the cache helps, how often the program causes pipeline stalls, how often branch prediction is correct, and probably many more factors (these were just off the top of my head). Although many of these shouldn't make a difference when you use the same build of the same executable running the same script doing the same pure calculations, they can matter, to a degree none of us can predict, once you change any of these parameters (e.g. using a different build because of a different OS or architecture).
Multitasking OSs will never let a program occupy the CPU exclusively. There will always be some other program running at the same time stealing time, and you can't really know how many of the x seconds were spent running your program and how many were spent on other programs. At the very least, you should run the program many times and take the minimum time as the time it takes with relatively little interference from other programs. And even then, you need roughly the same system load in both benchmarks for the numbers to be somewhat meaningful.
CPython won't use multiple threads for your script, so you only measure the speed of one core.
But since your requirements seem to be "a very rough estimate of CPU speed only, in full awareness that these numbers can't be used for anything except putting CPU speed into orders of magnitude, must be taken with a grain of salt even then, and don't tell you anything about the actual performance of any real applications", it might be okay. Just don't consider it anywhere close to accurate. Still, why not use a hardened benchmark suite that has already put some effort into mitigating (not removing; nobody can do that) these problems?
Also note that the timeit stdlib module is both easier to use than manually wielding the stopwatch, and it tries (not too hard, but it's a start) to address the second point via the method I mentioned.
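A minimal sketch, assuming the f from the script above is defined in the same file:
import timeit

# run the statement 5 separate times and keep the best result; the
# minimum is the run with the least interference from other processes
best = min(timeit.repeat('f(100000)',
                         setup='from __main__ import f',
                         repeat=5, number=100))
print best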
You can get a rough idea by using this type of method, but it will not be an exact measurement. The execution time of the script depends on many things other than CPU speed, like the OS and interpreter version used, the current system load, memory speed, etc. My suggestion is not to depend on this.
EDIT: Just a note. When it comes to performance, many people think only about CPU speed, but performance can be hampered by almost anything on the system. For example, if you have a high speed CPU but slow or scarce RAM, the CPU will give you no performance boost.
In essence: No.
Benchmarking is a very difficult problem which usually is not worth solving yourself. It all depends on why you care. Your method will surely give a very rough estimate on if System A is better than System B, but really only when the outcome is vastly different.
What you're trying to do is determine how Real World Application X will perform on different computers. Very rarely is a real world application approximated by a loop of simple math. Even when it is (scientific computing mostly) you're better off measuring times on the actual program.
Real world applications are usually non-linear, and difficult to measure and simulate. It's really one of those problems which has been solved by someone else much better than you could reasonably solve it yourself.
If you want a very rough estimate of performance, sure, do it your way. Just don't put too much faith in the results, because they will be far from what you might call "scientific".
If I understand your intention correctly (and you could clarify it a bit: what exactly are you trying to measure or estimate, processor speed, code speed, or something else, and for what purpose?), why not check how it is done in timeit?

Performance differences between Python and C

Working on different projects I have the choice of selecting different programming languages, as long as the task is done.
I was wondering what the real difference is, in terms of performance, between writing a program in Python, versus doing it in C.
The tasks to be done are pretty varied, e.g. sorting textfiles, disk access, network access, textfile parsing.
Is there really a noticeable difference between sorting a textfile using the same algorithm in C versus Python, for example?
And in your experience, given the power of current CPUs (e.g. an i7), is there really a noticeable difference? (Consider that it's a program that doesn't bring the system to its knees.)
Use python until you have a performance problem. If you ever have one figure out what the problem is (often it isn't what you would have guessed up front). Then solve that specific performance problem which will likely be an algorithm or data structure change. In the rare case that your problem really needs C then you can write just that portion in C and use it from your python code.
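As an illustration of that last step, here is a hedged sketch of calling a C function from Python with the stdlib ctypes module (the library name is Linux-specific; a real project would compile and load its own shared library):
import ctypes

libm = ctypes.CDLL('libm.so.6')          # the C math library on Linux
libm.sqrt.restype = ctypes.c_double      # declare the C signature so that
libm.sqrt.argtypes = [ctypes.c_double]   # ctypes converts values correctly

print libm.sqrt(2.0)                     # 1.4142135623730951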
C will absolutely crush Python in almost any performance category, but C is far more difficult to write and maintain and high performance isn't always worth the trade off of increased time and difficulty in development.
You say you're doing things like text file processing, but what you omit is how much text file processing you're doing. If you're processing 10 million files an hour, you might benefit from writing it in C. But if you're processing 100 files an hour, why not use python? Do you really need to be able to process a text file in 10ms vs 50ms? If you're planning for the future, ask yourself, "Is this something I can just throw more hardware at later?"
Writing solid code in C is hard. Be sure you can justify that investment in effort.
In general, IO bound work will depend more on the algorithm than the language. In this case I would go with Python because it has first-class strings and lots of easy to use libraries for manipulating files, etc.
Is there really a noticeable difference between sorting a textfile using the same algorithm in C versus Python, for example?
Yes.
The noticeable differences are these
There's much less Python code.
The Python code is much easier to read.
Python supports really nice unit testing, so the Python code tends to be higher quality.
You can write the Python code more quickly, since there are fewer quirky language features. No preprocessor, for example, really saves a lot of hacking around. Super-experienced C programmers hardly notice it, but all that #include sandwich stuff and making the .h files correct is remarkably time-consuming.
Python can be easier to package and deploy, since you don't need a big fancy make script to do a build.
The first rule of computer performance questions: Your mileage will vary. If small performance differences are important to you, the only way you will get valid information is to test with your configuration, your data, and your benchmark. "Small" here is, say, a factor of two or so.
The second rule of computer performance questions: For most applications, performance doesn't matter -- the easiest way to write the app gives adequate performance, even when the problem scales. If that is the case (and it is usually the case) don't worry about performance.
That said:
C compiles down to a machine executable and thus has the potential to execute at least as fast as any other language
Python is generally interpreted and thus may take more CPU than a compiled language
Very few applications are "CPU bound." I/O (to disk, display, or memory) is not greatly affected by compiled vs interpreted considerations and frequently is a major part of computer time spent on an application
Python works at a higher level of abstraction than C, so your development and debugging time may be shorter
My advice: Develop in the language you find the easiest with which to work. Get your program working, then check for adequate performance. If, as usual, performance is adequate, you're done. If not, profile your specific app to find out what is taking longer than expected or tolerable. See if and how you can fix that part of the app, and repeat as necessary.
Yes, sometimes you might need to abandon work and start over to get the performance you need. But having a working (albeit slow) version of the app will be a big help in making progress. When you do reach and conquer that performance goal you'll be answering performance questions in SO rather than asking them.
If your text files that you are sorting and parsing are large, use C. If they aren't, it doesn't matter. You can write poor code in any language though. I have seen simple code in C for calculating areas of triangles run 10x slower than other C code, because of poor memory management, use of structures, pointers, etc.
Your I/O algorithm should be independent of your compute algorithm. If this is the case, then using C for the compute algorithm can be much faster.
(Assumption - The question implies that the author is familiar with C but not Python, therefore I will base my answer with that in mind.)
I was wondering what the real difference is, in terms of performance, between writing a program in Python, versus doing it in C.
C will almost certainly be faster unless it is implemented poorly, but the real questions are:
What are the development implications (development time, maintenance, etc.) for either implementation?
Is the performance benefit significant?
Learning Python can take some time, but there are Python modules that can greatly speed development time. For example, the csv module in Python makes reading and writing csv easy. Also, Python strings, arrays, maps, and other objects make it more flexible than plain C and more elegant, in my opinion, than the equivalent C++. Some things like network access may be much quicker to develop in Python as well.
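For instance, a minimal sketch of the csv module (the file name is invented; 'rb' mode is the Python 2 convention for csv files):
import csv

# the csv module handles quoting and embedded commas for you
with open('data.csv', 'rb') as fh:
    for row in csv.reader(fh):
        print row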
However, it may take time to learn how to program Python well enough to accomplish your task. Since you are concerned with performance, I suggest trying a simple task, such as sorting a text file, in both C and Python. That will give you a better baseline on both languages in terms of performance, development time, and possibly maintenance.
It really depends a lot on what you're doing and whether the algorithm in question is available in Python via a natively compiled library. If it is, then I believe you'll be looking at performance numbers close enough that Python is most likely your answer, assuming it's your preferred language. If you must implement the algorithm yourself, depending on the amount of logic required and the size of your data set, C/C++ may be the better option. It's hard to provide a less nebulous answer without more information.
To get an idea of the raw difference in speed, check out the Computer Languages Benchmark Game.
Then you have to decide whether that difference matters to you.
Personally, I ended up deciding that it did, but most of the time instead of using C, I ended up using other higher-level languages. Personally I mostly use Scala, but Haskell and C# and Java each have their advantages also.
Across all programs, it isn't really possible to say whether things will be quicker or slower on average in Python or C.
For the programs that I've implemented in both languages, using similar algorithms, I've seen no improvement (and sometimes a performance degradation) for string- and IO-heavy code, when reimplementing python code in C. The execution time is dominated by allocation and manipulation of strings (which functionality python implements very efficiently) and waiting for IO operations (which incurs the same overhead in either language), so the extra overhead of python makes very little difference.
But for programs that do even simple operations on image files, say (images being large enough for processing time to be noticeable compared to IO), C is enormously quicker. For this sort of task the bulk of the time running the python code is spent doing Python Stuff, and this dwarfs the time spent on the underlying operations (multiply, add, compare, etc.). When reimplemented as C, the bureaucracy goes away, the computer spends its time doing real honest work, and for that reason the thing runs much quicker.
It's not uncommon for the python code to run in (say) 5 seconds where the C code runs in (say) 0.05. So that's a 100x speedup, but in absolute terms it's not so big a deal. It takes so much less time to write python code than it does to write C code that your program would have to be run some huge number of times to turn a time profit. I often reimplement in C, for various reasons, but if you don't have this requirement then it's probably not worth bothering. You won't get that part of your life back, and next year computers will be quicker.
Actually you can solve most of your tasks efficiently with python.
You just have to know which tools to use. For text processing there is a brilliant package from the Egenix guys: http://www.egenix.com/products/python/mxBase/mxTextTools/. I was able to create very efficient parsers with it in python, since all the heavy lifting is done by native code.
Same approach goes for any other problem - if you have performance problems, get a C/C++ library with Python interface which implements whatever bottleneck you got efficiently.
C is definitely faster than Python; after all, CPython, the reference Python interpreter, is itself written in C, and every Python operation pays its interpreter overhead on top of the underlying C.
C is a lower-level language and hence faster, but for many tasks there is not as great a difference in execution time between C and Python as you might expect.
However, it is really much easier to write code in Python than in C, and it takes much less time to write and to learn Python than C.
Because it's easy to write, it's easy to test as well.
You will find C much slower to develop in. Your developers will have to keep track of memory allocation, and use libraries (such as glib) to handle simple things such as dictionaries or lists, which python has built in.
Moreover, when an error occurs, your C program will typically just crash, which means you'll need to get the error to happen in a debugger. Python would give you a stack trace (typically).
Your code will be bigger, which means it will contain more bugs. So not only will it take longer to write, it will take longer to debug, and will ship with more bugs. This means that customers will notice the bugs more often.
So your developers will spend longer fixing old bugs and thus new features will get done more slowly.
In the meantime, your competitors will be using a sensible programming language and their products will be increasing in features and usability rapidly; yours will look bad by comparison. Your customers will leave and you'll go out of business.
The extra time needed to write the code in C, compared to Python, will far outweigh the difference between C and Python execution speed.
