Preserve code readability while optimising - python

I am writing a scientific program in Python and C with some complex physical simulation algorithms. After implementing the algorithms, I found that there are a lot of possible optimizations to improve performance. Common ones are precalculating values, hoisting calculations out of loops, replacing simple matrix algorithms with more complex ones, and so on. But a problem arises: the unoptimized algorithm is much slower, but its logic and its connection with the theory look much clearer and more readable. The optimized algorithm is also harder to extend and modify.
So the question is: what techniques should I use to keep readability while improving performance? Right now I am trying to keep both fast and clear branches and develop them in parallel, but maybe there are better methods?

Just as a general remark (I'm not too familiar with Python): I would suggest you make sure that you can easily exchange the slow parts of the 'reference implementation' with the 'optimized' parts (e.g., use something like the Strategy pattern).
This will allow you to cross-validate the results of the more sophisticated algorithms (to ensure you did not mess up the results), and will keep the overall structure of the simulation algorithm clear (separation of concerns). You can place the optimized algorithms into separate source files / folders / packages and document them separately, in as much detail as necessary.
Apart from this, try to avoid the usual traps: don't do premature optimization (check if it is actually worth it, e.g. with a profiler), and don't re-invent the wheel (look for available libraries).
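A minimal sketch of what that separation could look like in Python; the pair-potential example and all names here are illustrative, not taken from the question:

import numpy as np

def potential_naive(positions):
    # Reference implementation: mirrors the theory, plain O(n^2) loops.
    n = len(positions)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            total += 1.0 / r
    return total

def potential_fast(positions):
    # Optimized implementation: vectorized pairwise distances.
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(positions), k=1)
    return (1.0 / r[iu]).sum()

def run_simulation(positions, potential=potential_naive):
    # Strategy pattern: the potential is injected, so the clear reference
    # version and the fast version are interchangeable at the call site.
    return potential(positions)

# Cross-validate the optimized branch against the reference:
pos = np.random.rand(50, 3)
assert np.isclose(potential_naive(pos), potential_fast(pos))

With the strategy injected this way, the readable branch doubles as a test oracle for every optimized branch you add later.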

Yours is a very good question that arises in almost every piece of code, however simple or complex, that's written by any programmer who wants to call himself a pro.
I try to remember and keep in mind that a reader newly come to my code has pretty much the same crude view of the problem and the same straightforward (maybe brute force) approach that I originally had. Then, as I get a deeper understanding of the problem and paths to the solution become clearer, I try to write comments that reflect that better understanding. I sometimes succeed and those comments help readers and, especially, they help me when I come back to the code six weeks later. My style is to write plenty of comments anyway and, when I don't (because: a sudden insight gets me excited; I want to see it run; my brain is fried), I almost always greatly regret it later.
It would be great if I could maintain two parallel code streams: the naïve way and the more sophisticated optimized way. But I have never succeeded in that.
To me, the bottom line is that if I can write clear, complete, succinct, accurate and up-to-date comments, that's about the best I can do.
Just one more thing that you know already: optimization usually doesn't mean shoehorning a ton of code onto one source line, perhaps by calling a function whose argument is another function whose argument is another function whose argument is yet another function. I know that some do this to avoid storing a function's value temporarily. But it does very little (usually nothing) to speed up the code and it's a bitch to follow. No news to you, I know.

It is common to assume you must give up readability to get performance.
That's not necessarily so.
You need to find out what exactly it is spending much of its time doing, and why.
Notice, I didn't say you need to do any measuring.
Here's an example of what I mean.
Chances are very good that you can do some simple changes to avoid waste motion, but don't fix anything until the program itself has told you what to fix.

def whatYouShouldDo(servings, integration_method=oven):
    """
    Make chicken soup
    """
    # Comments:
    # They are important. With some syntax highlighting, the comments are
    # the first thing a new programmer will look for. Therefore, they should
    # motivate your algorithm, outline it, and break it up into stages.
    # You can MAKE IT FEEL AS IF YOU ARE READING TEXT, interspersing code
    # amongst the text.
    #
    # Algorithm overview:
    # To make chicken soup, we will acquire chicken broth and some noodles.
    # Preprocessing ingredients is done to optimize cooking time. Then we
    # will output in SOUP format via stdout.
    #
    # BEGIN ALGORITHM
    #
    # Preprocessing:
    # 1. Thaw chicken broth.
    broth = chickenstore.deserialize()
    # 2. Mix with noodles
    if noodles not in cache:
        with begin_transaction(locals=poultry) as t:
            cache[noodles] = t.buy(noodles)  # get from local store
    noodles = cache[noodles]
    # 3. Perform 4th-order Runge-Kutta numerical integration
    import kitchensink  # FIXME: poor form, better to `from kitchensink import pan` at the beginning
    result = boilerplate.RK4(semidirect_product(broth, noodles))
    # 4. Serve hot
    log.debug('attempting to serve')
    log.debug('serve successful')
    return result
Also see http://en.wikipedia.org/wiki/Literate_programming#Example
I've also heard that this is what http://en.wikipedia.org/wiki/Aspect-oriented_programming attempts to help with, though I haven't really looked into it. (It just seems to be a fancy way of saying "put your optimizations and your debugs and your other junk outside of the function you're writing".)

Related

Running parallel iterations

I am trying to run a sort of simulation where there are fixed parameters I need to iterate over and find out the combinations which have the least cost. I am using Python multiprocessing for this purpose, but the time consumed is too high. Is there something wrong with my implementation? Or is there a better solution? Thanks in advance.
import multiprocessing

class Iters(object):
    # parameters for iterations
    iters = {}
    iters['cwm'] = {'min': 100, 'max': 130, 'step': 5}
    iters['fx'] = {'min': 1.45, 'max': 1.45, 'step': 0.01}
    iters['lvt'] = {'min': 106, 'max': 110, 'step': 1}
    iters['lvw'] = {'min': 9.2, 'max': 10, 'step': 0.1}
    iters['lvk'] = {'min': 3.3, 'max': 4.3, 'step': 0.1}
    iters['hvw'] = {'min': 1, 'max': 2, 'step': 0.1}
    iters['lvh'] = {'min': 6, 'max': 7, 'step': 1}

    def run_mp(self):
        mps = []
        m = multiprocessing.Manager()
        q = m.list()
        cmain = self.iters['cwm']['min']
        while cmain <= self.iters['cwm']['max']:
            t2 = multiprocessing.Process(target=mp_main, args=(cmain, self.iters, q))
            mps.append(t2)
            t2.start()
            cmain = cmain + self.iters['cwm']['step']
        for mp in mps:
            mp.join()
        r1 = sorted(q, key=lambda x: x['costing'])
        returning = r1[:20]  # the twenty cheapest combinations
        self.counter = len(q)
        return returning

def mp_main(cmain, iters, q):
    fmain = iters['fx']['min']
    while fmain <= iters['fx']['max']:
        lvtmain = iters['lvt']['min']
        while lvtmain <= iters['lvt']['max']:
            lvwmain = iters['lvw']['min']
            while lvwmain <= iters['lvw']['max']:
                lvkmain = iters['lvk']['min']
                while lvkmain <= iters['lvk']['max']:
                    hvwmain = iters['hvw']['min']
                    while hvwmain <= iters['hvw']['max']:
                        lvhmain = iters['lvh']['min']
                        while lvhmain <= iters['lvh']['max']:
                            test = {'cmain': cmain, 'fmain': fmain,
                                    'lvtmain': lvtmain, 'lvwmain': lvwmain,
                                    'lvkmain': lvkmain, 'hvwmain': hvwmain,
                                    'lvhmain': lvhmain}
                            y = calculations(test, q)
                            lvhmain = lvhmain + iters['lvh']['step']
                        hvwmain = hvwmain + iters['hvw']['step']
                    lvkmain = lvkmain + iters['lvk']['step']
                lvwmain = lvwmain + iters['lvw']['step']
            lvtmain = lvtmain + iters['lvt']['step']
        fmain = fmain + iters['fx']['step']

def calculations(test, que):
    # perform huge number of calculations here
    output = {}
    output['data'] = test
    output['costing'] = 'foo'
    que.append(output)

if __name__ == '__main__':  # guard needed for multiprocessing on spawning platforms
    x = Iters()
    x.run_mp()
From a theoretical standpoint:
You're iterating every possible combination of 6 different variables. Unless your search space is very small, or you wanted just a very rough solution, there's no way you'll get any meaningful results within reasonable time.
I need to iterate over and find out the combinations which have the least cost
This very much sounds like an optimization problem.
There are many different efficient ways of dealing with these problems, depending on the properties of the function you're trying to optimize. If it has a straightforward "shape" (for example, it's unimodal), you can use a greedy algorithm such as hill climbing, or gradient descent. If it's more complex, you can try shotgun hill climbing.
There are a lot of more complex algorithms, but these are the basics, and they will probably help you a lot in this situation.
From a more practical programming standpoint:
You are using very large steps - so large, in fact, that you'll only probe the function 19,200 times. If this is what you want, it seems very feasible. In fact, if I comment out the y=calculations(test,q) line, this returns instantly on my computer.
As you indicate, there's a "huge number of calculations" there - so maybe that is your real problem, and not the code you're asking for help with.
As to multiprocessing, my honest advice is to not use it until you already have your code executing reasonably fast. Unless you're running a supercomputing cluster (you're not programming a supercomputing cluster in python, are you??), parallel processing will get you speedups of 2-4x. That's almost negligible compared to the gains you get from the kind of algorithmic changes I mentioned.
As an aside, I don't think I've ever seen that many nested loops in my life (excluding code jokes). If you don't want to switch to another algorithm, you might want to consider using itertools.product together with numpy.arange, as sketched below.
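A sketch of that suggestion, reusing the parameter dictionary from the question (note that numpy.arange with a float step can overshoot or undershoot the endpoint, hence the half-step fudge):

import itertools
import numpy as np

iters = {'cwm': {'min': 100, 'max': 130, 'step': 5},
         'fx': {'min': 1.45, 'max': 1.45, 'step': 0.01},
         'lvt': {'min': 106, 'max': 110, 'step': 1},
         'lvw': {'min': 9.2, 'max': 10, 'step': 0.1},
         'lvk': {'min': 3.3, 'max': 4.3, 'step': 0.1},
         'hvw': {'min': 1, 'max': 2, 'step': 0.1},
         'lvh': {'min': 6, 'max': 7, 'step': 1}}

# One arange per parameter; the cartesian product then replaces all of
# the nested while loops.
grids = {name: np.arange(s['min'], s['max'] + s['step'] / 2.0, s['step'])
         for name, s in iters.items()}
for combo in itertools.product(*grids.values()):
    test = dict(zip(grids.keys(), combo))
    # calculations(test, q) would go here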

Optimizing Python Code [closed]

I've been working on one of the coding challenges on InterviewStreet.com and I've run into a bit of an efficiency problem. Can anyone suggest where I might change the code to make it faster and more efficient?
Here's the code
Here's the problem statement if you're interested
If your question is about optimising python code generally (which I think it should be ;) then there are all sorts of interesting things you can do, but first:
You probably shouldn't be obsessively optimising python code! If you're using the fastest algorithm for the problem you're trying to solve and python doesn't do it fast enough you should probably be using a different language.
That said, there are several approaches you can take (because sometimes, you really do want to make python code faster):
Profile (do this first!)
There are lots of ways of profiling python code, but there are two that I'll mention: cProfile (or profile) module, and PyCallGraph.
cProfile
This is what you should actually use, though interpreting the results can be a bit daunting.
It works by recording when each function is entered or exited, and what the calling function was (and tracking exceptions).
You can run a function in cProfile like this:
import cProfile
cProfile.run('myFunction()', 'myFunction.profile')
Then to view the results:
import pstats
stats = pstats.Stats('myFunction.profile')
stats.strip_dirs().sort_stats('time').print_stats()
This will show you in which functions most of the time is spent.
PyCallGraph
PyCallGraph provides the prettiest, and maybe the easiest, way of profiling python programs -- and it's a good introduction to understanding where the time in your program is spent. However, it adds significant execution overhead.
To run pycallgraph:
pycallgraph graphviz ./myprogram.py
Simple! You get a png graph image as output (perhaps after a while...)
Use Libraries
If you're trying to do something in python that a module already exists for (maybe even in the standard library), then use that module instead!
Most of the standard library modules are written in C, and they will execute hundreds of times faster than equivalent python implementations of, say, bisection search.
Make the Interpreter do as Much of Your Work as You Can
The interpreter will do some things for you, like looping. Really? Yes! You can use the built-in map, reduce, and filter functions to significantly speed up tight loops:
consider:
for x in xrange(0, 100):
    doSomethingWithX(x)
vs:
map(doSomethingWithX, xrange(0,100))
Well obviously this could be faster because the interpreter only has to deal with a single statement, rather than two, but that's a bit vague... in fact, this is faster for two reasons:
all flow control (have we finished looping yet...) is done in the interpreter
the doSomethingWithX function name is only resolved once
In the for loop, each time around the loop python has to check exactly where the doSomethingWithX function is! Even with caching, this is a bit of an overhead.
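If you want to check that claim on your own machine, the standard timeit module makes the comparison straightforward (Python 3 shown; map is lazy there, so list() forces the iteration, which also means map builds a list the bare loop does not - the comparison is approximate):

import timeit

setup = "xs = range(100000)"
loop_stmt = "for x in xs:\n    abs(x)"
map_stmt = "list(map(abs, xs))"
print(timeit.timeit(loop_stmt, setup=setup, number=100))
print(timeit.timeit(map_stmt, setup=setup, number=100))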
Remember that Python is an Interpreted Language
(Note that this section really is about tiny tiny optimisations that you shouldn't let affect your normal, readable coding style!)
If you come from a background of a programming in a compiled language, like c or Fortran, then some things about the performance of different python statements might be surprising:
try:ing is cheap, ifing is expensive
If you have code like this:
if somethingcrazy_happened:
    uhOhBetterDoSomething()
else:
    doWhatWeNormallyDo()
And doWhatWeNormallyDo() would throw an exception if something crazy had happened, then it would be faster to arrange your code like this:
try:
    doWhatWeNormallyDo()
except SomethingCrazy:
    uhOhBetterDoSomething()
Why? Well, the interpreter can dive straight in and start doing what you normally do; in the first case the interpreter has to do a symbol lookup each time the if statement is executed, because the name could refer to something different since the last time the statement was executed! (And a name lookup, especially if somethingcrazy_happened is global, can be nontrivial.)
You mean Who??
Because of cost of name lookups it can also be better to cache global values within functions, and bake-in simple boolean tests into functions like this:
Unoptimised function:
def foo():
    if condition_that_rarely_changes:
        doSomething()
    else:
        doSomethingElse()
Optimised approach: instead of using a variable, exploit the fact that the interpreter is doing a name lookup on the function anyway!
When the condition becomes true:
foo = doSomething # now foo() calls doSomething()
When the condition becomes false:
foo = doSomethingElse # now foo() calls doSomethingElse()
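The other half of that advice - caching global values within functions - is often done with the default-argument trick; a small sketch (the gain is tiny, so as noted above, don't let this distort code you want to keep readable):

def total_length(items, _len=len):
    # _len is bound once, when the function is defined; inside the loop
    # it is a fast local lookup instead of a global/builtin lookup.
    total = 0
    for item in items:
        total += _len(item)
    return total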
PyPy
PyPy is a Python implementation written in (a restricted subset of) Python. Surely that means it will run code infinitely slower? Well, no. PyPy actually uses a Just-In-Time (JIT) compiler to run Python programs.
If you don't use any external libraries (or the ones you do use are compatible with PyPy), then this is an extremely easy way to (almost certainly) speed up repetitive tasks in your program.
Basically the JIT can generate code that will do what the python interpreter would, but much faster, since it is generated for a single case, rather than having to deal with every possible legal python expression.
Where to look Next
Of course, the first place you should have looked was to improve your algorithms and data structures, and to consider things like caching, or even whether you need to be doing so much in the first place, but anyway:
This page of the python.org wiki provides lots of information about how to speed up python code, though some of it is a bit out of date.
Here's the BDFL himself on the subject of optimising loops.
There are quite a few things, even from my own limited experience that I've missed out, but this answer was long enough already!
This is all based on my own recent experiences with some python code that just wasn't fast enough, and I'd like to stress again that I don't really think any of what I've suggested is actually a good idea, sometimes though, you have to....
First off, profile your code so you know where the problems lie. There are many examples of how to do this, here's one: https://codereview.stackexchange.com/questions/3393/im-trying-to-understand-how-to-make-my-application-more-efficient
You do a lot of indexed access as in:
for pair in range(i-1, j):
    if coordinates[pair][0] >= 0 and coordinates[pair][1] >= 0:
Which could be written more plainly as:
for coord in coordinates[i-1:j]:
    if coord[0] >= 0 and coord[1] >= 0:
List comprehensions are cool and "pythonic", but this code would probably run faster if you didn't create 4 lists:
N = int(raw_input())
coordinates = []
coordinates = [raw_input() for i in xrange(N)]
coordinates = [pair.split(" ") for pair in coordinates]
coordinates = [[int(pair[0]), int(pair[1])] for pair in coordinates]
I would instead roll all those together into one simple loop or if you're really dead set on list comprehensions, encapsulate the multiple transformations into a function which operates on the raw_input().
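A sketch of what that single pass could look like (Python 2 syntax, to match the original):

N = int(raw_input())
# read, split, and convert each line in one comprehension, instead of
# building four intermediate lists
coordinates = [[int(x) for x in raw_input().split(" ")]
               for _ in xrange(N)]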
This answer shows how I locate code to optimize.
Suppose there is some line of code you could replace, and it is costing, say, 40% of the time.
Then it resides on the call stack 40% of the time.
If you take 10 samples of the call stack, it will appear on 4 of them, give or take.
It really doesn't matter how many samples show it.
If it appears on two or more, and if you can replace it, you will save whatever time it costs.
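The arithmetic behind those numbers is easy to verify; a quick check (math.comb needs Python 3.8+):

from math import comb

p = 0.4       # fraction of time the line is on the stack
samples = 10
# probability the line shows up on at least two of the ten samples
at_least_two = 1 - sum(comb(samples, k) * p**k * (1 - p)**(samples - k)
                       for k in (0, 1))
print(round(at_least_two, 3))  # about 0.95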
Most of the interview street problems seem to be tested in a way that will verify that you have found an algorithm with the right big-O complexity, rather than that you have coded the solution in the most efficient way possible.
In other words if you are failing some of the test cases due to running out of time the problem is likely that you need to figure out a solution with lower algorithmic complexity rather than micro-optimize the algorithm you have. This is why they generally state that N can be quite large.

Performance differences between Python and C

Working on different projects I have the choice of selecting different programming languages, as long as the task is done.
I was wondering what the real difference is, in terms of performance, between writing a program in Python, versus doing it in C.
The tasks to be done are pretty varied, e.g. sorting textfiles, disk access, network access, textfile parsing.
Is there really a noticeable difference between sorting a textfile using the same algorithm in C versus Python, for example?
And in your experience, given the power of current CPUs (i7), is it really a noticeable difference? (Consider that it's a program that doesn't bring the system to its knees.)
Use python until you have a performance problem. If you ever have one figure out what the problem is (often it isn't what you would have guessed up front). Then solve that specific performance problem which will likely be an algorithm or data structure change. In the rare case that your problem really needs C then you can write just that portion in C and use it from your python code.
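A minimal sketch of that last step using ctypes from the standard library; the C routine, file names, and compile command are illustrative assumptions, not a prescribed recipe:

# hot.c (build with: cc -O2 -shared -fPIC -o libhot.so hot.c):
#     double dot(const double *a, const double *b, int n) {
#         double s = 0.0;
#         for (int i = 0; i < n; i++) s += a[i] * b[i];
#         return s;
#     }
import ctypes

lib = ctypes.CDLL("./libhot.so")
lib.dot.restype = ctypes.c_double
lib.dot.argtypes = [ctypes.POINTER(ctypes.c_double),
                    ctypes.POINTER(ctypes.c_double),
                    ctypes.c_int]

def dot(a, b):
    # marshal two Python sequences into C arrays and call the C routine
    arr_type = ctypes.c_double * len(a)
    return lib.dot(arr_type(*a), arr_type(*b), len(a))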
C will absolutely crush Python in almost any performance category, but C is far more difficult to write and maintain and high performance isn't always worth the trade off of increased time and difficulty in development.
You say you're doing things like text file processing, but what you omit is how much text file processing you're doing. If you're processing 10 million files an hour, you might benefit from writing it in C. But if you're processing 100 files an hour, why not use python? Do you really need to be able to process a text file in 10ms vs 50ms? If you're planning for the future, ask yourself, "Is this something I can just throw more hardware at later?"
Writing solid code in C is hard. Be sure you can justify that investment in effort.
In general, IO-bound work will depend more on the algorithm than on the language. In this case I would go with Python because it has first-class strings and lots of easy-to-use libraries for manipulating files, etc.
Is there really a noticeable difference between sorting a textfile using the same algorithm in C versus Python, for example?
Yes.
The noticeable differences are these
There's much less Python code.
The Python code is much easier to read.
Python supports really nice unit testing, so the Python code tends to be higher quality.
You can write the Python code more quickly, since there are fewer quirky language features. No preprocessor, for example, really saves a lot of hacking around. Super-experienced C programmers hardly notice it. But all that #include sandwich stuff and making the .h files correct is remarkably time-consuming.
Python can be easier to package and deploy, since you don't need a big fancy make script to do a build.
The first rule of computer performance questions: Your mileage will vary. If small performance differences are important to you, the only way you will get valid information is to test with your configuration, your data, and your benchmark. "Small" here is, say, a factor of two or so.
The second rule of computer performance questions: For most applications, performance doesn't matter -- the easiest way to write the app gives adequate performance, even when the problem scales. If that is the case (and it is usually the case) don't worry about performance.
That said:
C compiles down to machine executable and thus has the potential to execute as at least as fast as any other language
Python is generally interpreted and thus may take more CPU than a compiled language
Very few applications are "CPU bound." I/O (to disk, display, or memory) is not greatly affected by compiled vs interpreted considerations and frequently is a major part of computer time spent on an application
Python works at a higher level of abstraction than C, so your development and debugging time may be shorter
My advice: Develop in the language you find the easiest with which to work. Get your program working, then check for adequate performance. If, as usual, performance is adequate, you're done. If not, profile your specific app to find out what is taking longer than expected or tolerable. See if and how you can fix that part of the app, and repeat as necessary.
Yes, sometimes you might need to abandon work and start over to get the performance you need. But having a working (albeit slow) version of the app will be a big help in making progress. When you do reach and conquer that performance goal you'll be answering performance questions in SO rather than asking them.
If your text files that you are sorting and parsing are large, use C. If they aren't, it doesn't matter. You can write poor code in any language though. I have seen simple code in C for calculating areas of triangles run 10x slower than other C code, because of poor memory management, use of structures, pointers, etc.
Your I/O algorithm should be independent of your compute algorithm. If this is the case, then using C for the compute algorithm can be much faster.
(Assumption - The question implies that the author is familiar with C but not Python, therefore I will base my answer with that in mind.)
I was wondering what the real difference is, in terms of performance, between writing a program in Python, versus doing it in C.
C will almost certainly be faster unless it is implemented poorly, but the real questions are:
What are the development implications (development time, maintenance, etc.) for either implementation?
Is the performance benefit significant?
Learning Python can take some time, but there are Python modules that can greatly speed development time. For example, the csv module in Python makes reading and writing csv easy. Also, Python strings, arrays, maps, and other objects make it more flexible than plain C and more elegant, in my opinion, than the equivalent C++. Some things like network access may be much quicker to develop in Python as well.
However, it may take time to learn how to program Python well enough to accomplish your task. Since you are concerned with performance, I suggest trying a simple task, such as sorting a text file, in both C and Python. That will give you a better baseline on both languages in terms of performance, development time, and possibly maintenance.
It really depends a lot on what you're doing and whether the algorithm in question is available in Python via a natively compiled library. If it is, then I believe you'll be looking at performance numbers close enough that Python is most likely your answer -- assuming it's your preferred language. If you must implement the algorithm yourself, depending on the amount of logic required and the size of your data set, C/C++ may be the better option. It's hard to provide a less nebulous answer without more information.
To get an idea of the raw difference in speed, check out the Computer Languages Benchmark Game.
Then you have to decide whether that difference matters to you.
Personally, I ended up deciding that it did, but most of the time instead of using C, I ended up using other higher-level languages. Personally I mostly use Scala, but Haskell and C# and Java each have their advantages also.
Across all programs, it isn't really possible to say whether things will be quicker or slower on average in Python or C.
For the programs that I've implemented in both languages, using similar algorithms, I've seen no improvement (and sometimes a performance degradation) for string- and IO-heavy code, when reimplementing python code in C. The execution time is dominated by allocation and manipulation of strings (which functionality python implements very efficiently) and waiting for IO operations (which incurs the same overhead in either language), so the extra overhead of python makes very little difference.
But for programs that do even simple operations on image files, say (images being large enough for processing time to be noticeable compared to IO), C is enormously quicker. For this sort of task the bulk of the time running the python code is spent doing Python Stuff, and this dwarfs the time spent on the underlying operations (multiply, add, compare, etc.). When reimplemented as C, the bureaucracy goes away, the computer spends its time doing real honest work, and for that reason the thing runs much quicker.
It's not uncommon for the python code to run in (say) 5 seconds where the C code runs in (say) 0.05. So that's a 100x speedup -- but in absolute terms, this is not so big a deal. It takes so much less time to write python code than it does to write C code that your program would have to be run a huge number of times to turn a time profit. I often reimplement in C, for various reasons, but if you don't have this requirement then it's probably not worth bothering. You won't get that part of your life back, and next year computers will be quicker.
Actually you can solve most of your tasks efficiently with python.
You just should know which tools to use. For text processing there is a brilliant package from the eGenix guys - http://www.egenix.com/products/python/mxBase/mxTextTools/. I was able to create very efficient parsers with it in python, since all the heavy lifting is done by native code.
Same approach goes for any other problem - if you have performance problems, get a C/C++ library with Python interface which implements whatever bottleneck you got efficiently.
C is definitely faster than Python, not least because Python (CPython) is itself written in C: every Python operation passes through interpreter machinery that plain C code avoids.
C is a lower-level language and hence generally faster, though depending on the task the difference in execution time between C and Python may not be great.
But it is really much easier to write code in Python than in C, and it takes much less time to write the code and to learn Python than C.
Because it's easy to write, it's easy to test as well.
You will find C is much slower. Your developers will have to keep track of memory allocation, and use libraries (such as glib) to handle simple things such as dictionaries, or lists, which python has built-in.
Moreover, when an error occurs, your C program will typically just crash, which means you'll need to get the error to happen in a debugger. Python would give you a stack trace (typically).
Your code will be bigger, which means it will contain more bugs. So not only will it take longer to write, it will take longer to debug, and will ship with more bugs. This means that customers will notice the bugs more often.
So your developers will spend longer fixing old bugs and thus new features will get done more slowly.
In the meantime, your competitors will be using a sensible programming language and their products will be gaining features and usability rapidly; yours will look bad by comparison. Your customers will leave and you'll go out of business.
The extra time needed to write the code in C compared to Python will usually far outweigh the difference between C and Python execution speed.

Does creating separate functions instead of one big one slow processing time?

I'm working in the Google App Engine environment and programming in Python. I am creating a function that essentially generates a random number/letter string and then stores to the memcache.
import random
import string

def generate_random_string():
    # return a random 6-character number/letter string
    chars = string.ascii_letters + string.digits
    return ''.join(random.choice(chars) for _ in range(6))

def check_and_store_to_memcache():
    randomstring = generate_random_string()
    # check against memcache
    # if ok, then store key value with another value
    # if not ok, run generate_random_string() again and check again.
Does creating two functions instead of just one big one affect performance? I prefer two, as it better matches how I think, but don't mind combining them if that's "best practice".
Focus on being able to read and easily understand your code.
Once you've done this, if you have a performance problem, then look into what might be causing it.
Most languages, python included, tend to have fairly low overhead for making method calls. Putting this code into a single function is not going to (dramatically) change the performance metrics - I'd guess that your random number generation will probably be the bulk of the time, not having 2 functions.
That being said, splitting functions does have a (very, very minor) impact on performance. However, I'd think of it this way - it may take you from going 80 mph on the highway to 79.99mph (which you'll never really notice). The important things to watch for are avoiding stoplights and traffic jams, since they're going to make you have to stop altogether...
In almost all cases, "inlining" functions to increase speed is like getting a hair cut to lose weight.
Reed is right. For the change you're considering, the cost of a function call is a small number of cycles, and you'd have to be doing it 10^8 or so times per second before you'd notice.
However, I would caution that often people go to the other extreme, and then it is as if function calls were costly. I've seen this in over-designed systems where there were many layers of abstraction.
What happens is there is some human psychology that says if something is easy to call, then it is fast. This leads to writing more function calls than strictly necessary, and when this occurs over multiple layers of abstraction, the wastage can be exponential.
Following Reed's driving example, a function call can be like a detour, and if the detour contains detours, and if those also contain detours, soon there is tremendous time being wasted, for no obvious reason, because each function call looks innocent.
Like others have said, I wouldn't worry about it in this particular scenario. The very small overhead involved in function calls would pale in comparison to what is done inside each function. And as long as these functions don't get called in rapid succession, it probably wouldn't matter much anyway.
It is a good question though. In some cases it's best not to break code into multiple functions. For example, when working with math intensive tasks with nested loops it's best to make as few function calls as possible in the inner loop. That's because the simple math operations themselves are very cheap, and next to that the function-call-overhead can cause a noticeable performance penalty.
Years ago I discovered the hypot (hypotenuse) function in the math library I was using in a VC++ app was very slow. It seemed ridiculous to me because it's such a simple set of functionality -- return sqrt(a * a + b * b) -- how hard is that? So I wrote my own and managed to improve performance 16X over. Then I added the "inline" keyword to the function and made it 3X faster than that (about 50X faster at this point). Then I took the code out of the function and put it in my loop itself and saw yet another small performance increase. So... yeah, those are the types of scenarios where you can see a difference.

Should I optimise my python code like C++? Does it matter?

I had an argument with a colleague about writing python efficiently. He claimed that though you are programming python you still have to optimise the little bits of your software as much as possible, as if you are writing an efficient algorithm in C++.
Things like:
In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings.
Call as few functions as possible, even if it comes at the expense of readability, because of the overhead this creates.
I say that in most cases it doesn't matter. I should also say that the context of the code is not a super-efficient NOC or a missile-guidance system. We're mostly writing tests in python.
What's your view of the matter?
My answer to that would be :
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
(Quoting Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268)
If your application is doing anything like a query to the database, that one query will take more time than anything you can gain with those kind of small optimizations, anyway...
And if chasing performance like that, why not code in assembly language, after all? Because Python is easier/faster to write and maintain? Well, if so, you are right :-)
The most important thing is that your code is easy to maintain; not a couple of microseconds of CPU time!
Well, maybe except if you have thousands of servers -- but is that your case?
The answer is really simple:
Follow Python best practices, not C++ best practices.
Readability in Python is more important than speed.
If performance becomes an issue, measure, then start optimizing.
This sort of premature micro-optimisation is usually a waste of time in my experience, even in C and C++. Write readable code first. If it's running too slowly, run it through a profiler, and if necessary, fix the hot-spots.
Fundamentally, you need to think about return on investment. Is it worth the extra effort in reading and maintaining "optimised" code for the couple of microseconds it saves you? In most cases it isn't.
(Also, compilers and runtimes are getting cleverer. Some micro-optimisations may become micro-pessimisations over time.)
I agree with others: readable code first ("Performance is not a problem until performance is a problem.").
I only want to add that when you absolutely need to write some unreadable and/or non-intuitive code, you can generally isolate it in few specific methods, for which you can write detailed comments, and keep the rest of your code highly readable. If you do so, you'll end up having easy to maintain code, and you'll only have to go through the unreadable parts when you really need to.
I should also say that the context of the code is not a super-efficient NOC or a missile-guidance system. We're mostly writing tests in python.
Given this, I'd say that you should take your colleague's advice about writing efficient Python but ignore anything he says that goes against prioritizing readability and maintainability of the code, which will probably be more important than the speed at which it'll execute.
In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
This is generally good advice, but it also depends on the logic of your program. If it makes sense that the second statement is not evaluated when the first returns false, then do so. Doing the opposite could otherwise be a bug.
Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings.
I don't really get this point. Of course you should use the library provided functions, because they are probably implemented in C, and a pure python implementation is most likely to be slower. In any case, no need to reinvent the wheel.
Call as few functions as possible, even if it comes at the expense of readability, because of the overhead this creates.
$ cat withcall.py
def square(a):
    return a*a

for i in xrange(1,100000):
    i_square = square(i)
$ cat withoutcall.py
for i in xrange(1,100000):
    i_square = i*i
$ time python2.3 withcall.py
real 0m5.769s
user 0m4.304s
sys 0m0.215s
$ time python2.3 withcall.py
real 0m5.884s
user 0m4.315s
sys 0m0.206s
$ time python2.3 withoutcall.py
real 0m5.806s
user 0m4.172s
sys 0m0.209s
$ time python2.3 withoutcall.py
real 0m5.613s
user 0m4.171s
sys 0m0.216s
I mean... come on... please.
I think there are several related 'urban legends' here.
False: Putting the more often-checked condition first in a conditional, and similar optimizations, save enough time for a typical program that it is worthwhile for a typical programmer.
True: Some, but not many, people are using such styles in Python in the incorrect belief outlined above.
True: Many people use such a style in Python when they think that it improves the readability of a Python program.
About readability: I think it's indeed useful when you give the most useful conditional first, since this is what people notice first anyway. You should also use ''.join() if you mean concatenation of strings since it's the most direct way to do it (the s += x operation could mean something different).
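A tiny illustration of the difference (''.join makes one pass, while += may copy the partial string on every iteration):

words = ["spam", "and", "eggs"]
s = "".join(words)   # preferred: one pass, one allocation

s2 = ""              # the incremental alternative
for w in words:
    s2 += w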
"Call as less functions as possible" decreases readability and goes against Pythonic principle of code reuse. And so it's not a style people use in Python.
Before introducing performance optimizations at the expense of readability, look into modules like psyco that will do some JIT-ish compiling of distinct functions, often with striking results, with no impairment of readability.
Then if you really want to embark on the optimization path, you must first learn to measure and profile. Optimization MUST BE QUANTITATIVE - do not go with your gut. The hotspot profiler will show you the functions where your program is burning up the most time.
If optimization turns up a function like this is being frequently called:
def get_order_qty(ordernumber):
    # look up order in database and return quantity
If there is any repetition of ordernumbers, then memoization would be a good optimization technique to learn, and it is easily packaged in a @memoize decorator so that there is little impact on program readability. The effect of memoizing is that values returned for a given set of input arguments are cached, so that the expensive function is called only once, with subsequent calls resolved against the cache.
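A minimal sketch of such a decorator (in modern Python, functools.lru_cache does the same job out of the box):

import functools

def memoize(fn):
    cache = {}
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)   # the expensive call runs once per args
        return cache[args]
    return wrapper

@memoize
def get_order_qty(ordernumber):
    ...  # the expensive database lookup would go here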
Lastly, consider lifting invariants out of loops. For large multi-dimensional structures, this can save a lot of time - in fact in this case, I would argue that this optimization improves readability, as it often serves to make clear that some expression can be computed at a high-level dimension in the nested logic.
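A small sketch of that kind of hoisting, with illustrative names:

n, m, norm = 3, 4, 2.0
scale = [1.0, 0.5, 0.25]
data = [[float(i + j) for j in range(m)] for i in range(n)]
out = [[0.0] * m for _ in range(n)]

for i in range(n):
    row_factor = scale[i] * norm   # invariant in j: computed once per row
    for j in range(m):             # instead of m times inside the inner loop
        out[i][j] = data[i][j] * row_factor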
(BTW, is this really what you meant?
•In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
I should think this might be the case for "and", but an "or" will short-circuit if the first value is True, saving the evaluation of the second term of the conditional. So I would change this optimization "rule" to:
If testing "A and B", put A first if
it is more likely to evaluate to
False.
If testing "A or B", put A first if
it is more likely to evaluate to
True.
But often, the sequence of conditions is driven by the tests themselves:
if obj is not None and hasattr(obj,"name") and obj.name.startswith("X"):
You can't reorder these for optimization - they have to be in this order (or just let the exceptions fly and catch them later):
try:
    if obj.name.startswith("X"):
        ...
except AttributeError:
    pass  # obj was None, or had no name
Sure, follow Python best practices (and in fact I agree with the first two recommendations), but maintainability and efficiency are not opposites, they are mostly togethers (if that's a word).
Statements like "always write your IF statements a certain way for performance" are a-priori, i.e. not based on knowledge of what your program spends time on, and are therefore guesses. The first (or second, or third, whatever) rule of performance tuning is don't guess.
If after you measure, profile, or in my case do this, you actually know that you can save much time by re-ordering tests, by all means, do. My money says that's at the 1% level or less.
My visceral reaction is this:
I've worked with guys like your colleague and in general I wouldn't take advice from them.
Ask him if he's ever even used a profiler.
