Is Python allowed to optimize a function definition to eliminate unused code?

Is Python allowed to optimize a function definition to eliminate unused code? - python

If I defined a function like this:
def ccid_year(seq):
year, prefix, index, suffix = seq
return year
Is Python allowed to optimize it to be effectively:
def ccid_year(seq):
return seq[0]
I'd prefer to write the first function because it documents the format of the data being passed in but would hope that Python would generate code that is effectively as efficient as the second definition.

The two functions are not equivalent:
def ccid_year_1(seq):
year, prefix, index, suffix = seq
return year
def ccid_year_2(seq):
return seq[0]
arg = {1:'a', 2:'b', 0:'c', 3:'d'}
print ccid_year_1(arg)
print ccid_year_2(arg)
The first call prints 0 and the second prints c.

I'll answer the question at face value later, but first: When in doubt, benchmark it! But first, recall that most time is spent in a small portion of the code (i.e., most code is irrelevant to performance!) and, in CPython, function call overhead usually dominates small inefficiencies. Not to mention that large-scale algorithmic inefficiencies (a.k.a. freaking stupid code) dwarfs micro-optimization concerns.
So either don't worry about this at all, or if you have reason to worry about it, first benchmark alternatives and second don't put it in a function. Note that "reasons to worry about it" must be weighted against the time spent worrying, and the maintenance burden (if there is one) of the manual optimization.
CPython, the reference implementation you most like use, is very conservative about optimizing at this level. While there is a peephole optimizer operating on bytecode, it is limited in scale. More generally, you can't expect much optimization crossing a single statement. The problem with statically optimizing Python code is that there's a billion ways even the most innocently-looking program frament can call into arbitrary code, which might do anything at all, so you can't omit these calls.
While we're at it, your proposed optimization is invalid (in the sense that the program doesn't have the same behavior) if seq is of the wrong type (not a sequence, or a very weird sequence) or length (not exactly three items long)! Any program claiming to implement Python must maintain such differences, so it won't do the transformation you suggest literally. I assume this was just an off-hand illustration, but it does indicate you seriously underestimate how complex Python is (to implement, and doubly so to optimize). I and others have written about this at length before, so I'll stop now before this post becomes even larger.
PyPy on the other hand will, if this function is indeed called from a hot loop, probably optimize this and a million other things you didn't even think of, while compiling it down to a machine code loop that iterates faster than any Python loop could ever iterate on CPython. It will still contain a few checks to break out of the loop and take the proper action (e.g. raise an exception) if necessary, but they'll also be highly efficient if not triggered.
I do not know much about IronPython and Jython and other implementations, but if their lack of consistent several-times-faster-than-CPython benchmark results is any indicator, they do not perform significant optimizations. While the VMs IronPython and Jython include JIT compilers (not - but not quite - entirely unlike PyPy's), these JIT compilers are built for very different languages, and I'd be very surprised if they could look through the mess of code IronPython/Jython must execute to achieve Python semantics and perform such optimizations on it.

Related

Does every simple mathematical operation use the same amount of power (as in, battery power)?

Recently I have been revising some of my old python codes, which are essentially loops of algebra, in order to have them execute faster, generally by eliminating some un-necessary operations. Often, changing the value of an entry in a list from 0 (as a python float, which I believe is a double by default) to the same value, which is obviously not necessary. Or, checking if a float is equal to something, when it MUST be that thing, because a preceeding "if" would not have triggered if it wasn't, or some other extraneous operation. This got me wondering about what will preserve my battery more, as I do a some of my coding on the bus where I can't plug my laptop in.
For example, which of the following two operations would be expected to use less battery power?
if b != 0: #b was assigned previously, and I know it is zero already
b = 0
or:
b = 0
The first one checks if b is zero, and it is, so it doesn't do the next part. The second one just assigns b to zero without bothering to check. I believe the first one is more time-efficient, as you don't have to change anything in memory. Is that correct, and if so, would it also be more power-efficient? Does "more time efficient" always imply "more power efficient"?

I suggest watching this talk by Chandler Carruth: "Efficiency with Algorithms, Performance with Data Structures"
He addresses the idea of "Power efficient instructions" at 4m 49s in the video. I agree with him, thinking about how much watt particular code consumes is useless. As he put it
Q: "How to save battery life?"
A: "Finish ruining the program".
Also, in Python you do not have low level control to be even thinking about low level problems like this. Use appropriate data structures and algorithms, and pray that Python interpreter will give you well optimized byte-code.

Does every simple mathematical operation use the same amount of power (as in, battery power)?
No. It's not the same to compute a two number addition than a fourier transform of a 20 megapixel photo.
I believe the first one is more time-efficient, as you don't have to change anything in memory. Is that correct, and if so, would it also be more power-efficient? Does "more time efficient" always imply "more power efficient"?
Yes. You are right on your intuitions but these are very trivial examples. And if you dig deeper you will get into uncharted territory of weird optimization that's quite difficult to grasp (e.g., see this question: Times two faster than bit shift?)

In general the more your code utilizes system resources the greater power those resources would use. However it is more useful to optimize your code based on time or size constraints instead of thinking about high level code in terms of power draw.
One way of doing this is Big O notation. In essence, Big O notation is a way of comparing the size and or runtime complexity of an algorithm. https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
A computer at its lowest level is large quantity of transistors which require power to change and maintain their state.
It would be extremely difficult to anticipate how much power any one line of python code would draw.

I once had questions like this. Still do sometimes. Here's the answer I wish someone told me earlier.
Summary
You are correct that generally, if your computer does less work, it'll use less power.
But we have to be really careful in figuring out which logical operations involve more work and which ones involve less work - in this case:
Reading vs writing memory is usually the same amount of work.
if and any other conditional execution also costs work.
Python's "simple operations" are not "simple operations" for the CPU.
But the idea you had is probably correct for some cases you had in mind.
If you're concerned about power consumption, measure where power is being used.
For some perspective: You're asking about which Python code costs you one more drop of water, but really in Python every operation costs a bucket and your whole Python program is using a river and your computer as a whole is using an ocean.
Direct Answers
Don't apply these answers to Python yet. Read the rest of the answer first, because there's so much indirection between Python and the CPU that you'll mislead yourself about how they're connected if you don't take that into account.
I believe the first one is more time-efficient, as you don't have to change anything in memory.
As a general rule, reading memory is just as slow as writing to memory, or even slower depending on exactly what your computer is doing. For further reading you'll want to look into CPU memory cache levels, memory access times, and how out-of-order execution and data dependencies factor into modern CPU architectures.
As a general rule, the if statement in a language is itself an operation which can have a non-negligible cost. For further reading you should look into how CPU pipelining relates to branch prediction and branch penalties. Also look into how if statements are implemented in typical CPU instruction sets.
Does "more time efficient" always imply "more power efficient"?
As a general rule, more work efficient (doing less work - less machine instructions, for example) implies more power efficient, because on modern hardware (this wasn't always this way) your hardware will use less power when it's not doing anything.
You should be careful about the idea of "more time efficient" though, because modern hardware doesn't always execute the same amount of work in the same amount of time: for further reading you'll want to look into CPU frequency scaling, ARM's big.LITTLE architectures, and discussions about the "Race to Idle" concept as a starting point.
"One Simple Operation" - CPU vs. Python
Your question is about Python, so it's important to realize that Python's x != 0, if, and x = 0 do not map directly to simple operations in the CPU.
For further reading, especially if you're familiar with C, I would recommend taking a long look at how Python is implemented. There are many implementations - the main one is CPython, which is a C program that reads and interprets Python source, converts it into Python "bytecode" and then when running interprets that bytecode one by one.
As a baseline, if you're using Python, any one "simple" operation is actually a lot of CPU operations, as each step in the Python interpreter is multiple CPU operations, but which ones cost more might be surprising.
Let's break down the three used in our example (I'm primarily describing this from the perspective of the main Python implementation written in C, called "CPython", which I am the most closely familiar with, but in general this explanation is roughly applicable to all of them, though some will be able to optimize out certain steps):
x != 0
It looks like a simple operation, and if this was C and x was an int it would be just one machine instruction - but Python allows for operator overloading, so first Python has to:
look up x (at least one memory read, but may involve one or more hashmap lookups in Python's internals, which is many machine operations),
check the type of x (more memory reading),
based on the type look up a function pointer that implements the not-equality operation (one or arbitrarily many memory reads and one or more arbitrarily many conditional branches, with data dependencies between them),
only then it can finally call that function with references to Python objects holding the values of x and 0 (which is also not "free" - look up function calling ABI for more on that).
All that and more has to be done by the CPU even if x is a Python int or float mapping closely to the CPU's native numerical data types.
x = 0
An assignment is actually far cheaper in Python (though still not trivial): it only has to get as far as step 1 above, because once it knows "where" x is, it can just overwrite that pointer with the pointer to the Python object representing 0.
if
Abstractly speaking, the Python if statement has to be able to handle "truthy" and "falsey" values, which in the most naive implementation would involves running through more CPU instructions to evaluate what result of the condition is according to Python's semantics of what's true and what's false.
Sidenote About Optimizations
Different Python implementations go to different lengths to get Python operations closer to as few CPU operations in possible. For example, an optimizing JIT (Just In Time) compiler might notice that, inside some loop on an array, all elements of the array are native integers and actually reduce the if x != 0 and x = 0 parts into their respective minimal machine instructions, but that only happens in very specific circumstances when the optimizing logic can prove that it can safely bypass a lot of the behavior it would normally need to do.
The biggest thing here is this: a high-level language like Python is so removed from the hardware that "simple" operations are often complex "under the covers".
What You Asked vs. What I Think You Wanted To Ask
Correct me if I'm wrong, but I suspect the use-case you actually had in mind was this:
if x != 0:
# some code
x = 0
vs. this:
if x != 0:
# some code
x = 0
In that case, the first option is superior to the second, because you are already paying the cost of if x != 0 anyway.
Last Point of Emphasis
The hardest breakthrough for me was moving away from trying to reason about individual instructions in my head, and instead switching into looking at how things work and measuring real systems.
Looking at how things work will teach you how to optimize, but measuring will show you where to optimize.
This question is great for exploring the former, but for your stated motivation of reducing power consumption on your laptop, you would benefit more from the latter.

Is there any way to "call a function" in python without incurring the usual performance hit?

Captain Hindsight, reporting in:
After reading through the comment and answer and running a few tests, I found out that I had made a subtle error in my calculations. Turns out, I was
comparing compiled lookups to interpreted calls. When I precompiled
the call using the NON-IPython line magic version ( ie:
timeit.timeit(codestr, setup_codestr), I found that the function calls
were indeed on the same order of magnitude as the lookups :)
Now there's a whole world of caching function results, precompiling functions, and precompiling types to explore! ..and that's nice :)
For posterity:
I realize that sounds like a strange question, but someone might know a way around this, and that would be great. So here goes:
If I do something like:
%%timeit somelist[42]
Then I get times in the 90 nanosecond range. A slice will get it up to 190ish; and, to my pleasant surprise, even big crazy ones were still fast. This bad boy, for instance, weighs in at 385 nanseconds:
%%timeit some_nested_list[2:5][1][6:13]
Here's the thing. Function calls, it seems, are a lot slower than that. I like decomposing problems functionally, and am starting to give functional programming a bit more thought, but the speed difference is significant and (3.34 microseconds vs 100-150 nanoseconds (realistic actual avgs of conditionals, etc)). The following takes 3.34 micros:
def func():
some_nested_list[2:5][1][6:13]
%%timeit func()
So, there's presumably a lot of functional programmers out there? You all must have dealt with this little hiccup? Someone care to point me in the right direction?

Not really. Python function calls involve a certain amount of overhead for setting up the stack frame, etc., and you can't eliminate that overhead while still writing a Python function. The reason the operations in your example are fast is that you're doing them on a list, and lists are written in C.
One thing to keep in mind is that, in many practical situations, the function call overhead will be small relative to what the function actually does. See this question for some discussion. However, if you move toward a pure-functional style in which each function just evaluates one expression, you may indeed suffer a performance penalty.
An alternative is to look at PyPy, which makes many pure-Python operations faster. I don't know whether it improves function call speed specifically. Also, by using PyPy you restrict the set of libraries you can use.
Finally, there is Cython, which allows you to write code in a language that looks basically the same as Python, but actually compiles to C. This can be much faster than Python in some cases.
The bottom line is that how to speed up your functions depends on what your functions actually do. There is no magic way to just magically make all function calls faster while still keeping everything else about Python the same. If there were, they probably would have already added it to Python.

Optimizing Python Code [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Improve this question
I've been working on one of the coding challenges on InterviewStreet.com and I've run into a bit of an efficiency problem. Can anyone suggest where I might change the code to make it faster and more efficient?
Here's the code
Here's the problem statement if you're interested

If your question is about optimising python code generally (which I think it should be ;) then there are all sorts of intesting things you can do, but first:
You probably shouldn't be obsessively optimising python code! If you're using the fastest algorithm for the problem you're trying to solve and python doesn't do it fast enough you should probably be using a different language.
That said, there are several approaches you can take (because sometimes, you really do want to make python code faster):
Profile (do this first!)
There are lots of ways of profiling python code, but there are two that I'll mention: cProfile (or profile) module, and PyCallGraph.
cProfile
This is what you should actually use, though interpreting the results can be a bit daunting.
It works by recording when each function is entered or exited, and what the calling function was (and tracking exceptions).
You can run a function in cProfile like this:
import cProfile
cProfile.run('myFunction()', 'myFunction.profile')
Then to view the results:
import pstats
stats = pstats.Stats('myFunction.profile')
stats.strip_dirs().sort_stats('time').print_stats()
This will show you in which functions most of the time is spent.
PyCallGraph
PyCallGraph provides a prettiest and maybe the easiest way of profiling python programs -- and it's a good introduction to understanding where the time in your program is spent, however it adds significant execution overhead
To run pycallgraph:
pycallgraph graphviz ./myprogram.py
Simple! You get a png graph image as output (perhaps after a while...)
Use Libraries
If you're trying to do something in python that a module already exists for (maybe even in the standard library), then use that module instead!
Most of the standard library modules are written in C, and they will execute hundreds of times faster than equivilent python implementations of, say, bisection search.
Make the Interpreter do as Much of Your Work as You Can
The interpreter will do some things for you, like looping. Really? Yes! You can use the map, reduce, and filter keywords to significantly speed up tight loops:
consider:
for x in xrange(0, 100):
doSomethingWithX(x)
vs:
map(doSomethingWithX, xrange(0,100))
Well obviously this could be faster because the interpreter only has to deal with a single statement, rather than two, but that's a bit vague... in fact, this is faster for two reasons:
all flow control (have we finished looping yet...) is done in the interpreter
the doSomethingWithX function name is only resolved once
In the for loop, each time around the loop python has to check exactly where the doSomethingWithX function is! even with cacheing this is a bit of an overhead.
Remember that Python is an Interpreted Language
(Note that this section really is about tiny tiny optimisations that you shouldn't let affect your normal, readable coding style!)
If you come from a background of a programming in a compiled language, like c or Fortran, then some things about the performance of different python statements might be surprising:
try:ing is cheap, ifing is expensive
If you have code like this:
if somethingcrazy_happened:
uhOhBetterDoSomething()
else:
doWhatWeNormallyDo()
And doWhatWeNormallyDo() would throw an exception if something crazy had happened, then it would be faster to arrange your code like this:
try:
doWhatWeNormallyDo()
except SomethingCrazy:
uhOhBetterDoSomething()
Why? well the interpreter can dive straight in and start doing what you normally do; in the first case the interpreter has to do a symbol look up each time the if statement is executed, because the name could refer to something different since the last time the statement was executed! (And a name lookup, especially if somethingcrazy_happened is global can be nontrivial).
You mean Who??
Because of cost of name lookups it can also be better to cache global values within functions, and bake-in simple boolean tests into functions like this:
Unoptimised function:
def foo():
if condition_that_rarely_changes:
doSomething()
else:
doSomethingElse()
Optimised approach, instead of using a variable, exploit the fact that the interpreter is doing a name lookup on the function anyway!
When the condition becomes true:
foo = doSomething # now foo() calls doSomething()
When the condition becomes false:
foo = doSomethingElse # now foo() calls doSomethingElse()
PyPy
PyPy is a python implementation written in python. Surely that means it will run code infinitely slower? Well, no. PyPy actually uses a Just-In-Time compiler (JIT) to run python programs.
If you don't use any external libraries (or the ones you do use are compatible with PyPy), then this is an extremely easy way to (almost certainly) speed up repetitive tasks in your program.
Basically the JIT can generate code that will do what the python interpreter would, but much faster, since it is generated for a single case, rather than having to deal with every possible legal python expression.
Where to look Next
Of course, the first place you should have looked was to improve your algorithms and data structures, and to consider things like caching, or even whether you need to be doing so much in the first place, but anyway:
This page of the python.org wiki provides lots of information about how to speed up python code, though some of it is a bit out of date.
Here's the BDFL himself on the subject of optimising loops.
There are quite a few things, even from my own limited experience that I've missed out, but this answer was long enough already!
This is all based on my own recent experiences with some python code that just wasn't fast enough, and I'd like to stress again that I don't really think any of what I've suggested is actually a good idea, sometimes though, you have to....

First off, profile your code so you know where the problems lie. There are many examples of how to do this, here's one: https://codereview.stackexchange.com/questions/3393/im-trying-to-understand-how-to-make-my-application-more-efficient
You do a lot of indexed access as in:
for pair in range(i-1, j):
if coordinates[pair][0] >= 0 and coordinates[pair][1] >= 0:
Which could be written more plainly as:
for coord in coordinates[i-1:j]:
if coord[0] >= 0 and cood[1] >= 0:
List comprehensions are cool and "pythonic", but this code would probably run faster if you didn't create 4 lists:
N = int(raw_input())
coordinates = []
coordinates = [raw_input() for i in xrange(N)]
coordinates = [pair.split(" ") for pair in coordinates]
coordinates = [[int(pair[0]), int(pair[1])] for pair in coordinates]
I would instead roll all those together into one simple loop or if you're really dead set on list comprehensions, encapsulate the multiple transformations into a function which operates on the raw_input().

This answer shows how I locate code to optimize.
Suppose there is some line of code you could replace, and it is costing, say, 40% of the time.
Then it resides on the call stack 40% of the time.
If you take 10 samples of the call stack, it will appear on 4 of them, give or take.
It really doesn't matter how many samples show it.
If it appears on two or more, and if you can replace it, you will save whatever time it costs.

Most of the interview street problems seem to be tested in a way that will verify that you have found an algorithm with the right big O complexity rather than that you have coded the solution in the most optimal way possible.
In other words if you are failing some of the test cases due to running out of time the problem is likely that you need to figure out a solution with lower algorithmic complexity rather than micro-optimize the algorithm you have. This is why they generally state that N can be quite large.

Generator speed in python 3

I am going through a link about generators that someone posted. In the beginning he compares the two functions below. On his setup he showed a speed increase of 5% with the generator.
I'm running windows XP, python 3.1.1, and cannot seem to duplicate the results. I keep showing the "old way"(logs1) as being slightly faster when tested with the provided logs and up to 1GB of duplicated data.
Can someone help me understand whats happening differently?
Thanks!
def logs1():
wwwlog = open("big-access-log")
total = 0
for line in wwwlog:
bytestr = line.rsplit(None,1)[1]
if bytestr != '-':
total += int(bytestr)
return total
def logs2():
wwwlog = open("big-access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
getbytes = (int(x) for x in bytecolumn if x != '-')
return sum(getbytes)
*edit, spacing messed up in copy/paste

For what it's worth, the main purpose of the speed comparison in the presentation was to point out that using generators does not introduce a huge performance overhead. Many programmers, when first seeing generators, might start wondering about the hidden costs. For example, is there all sorts of fancy magic going on behind the scenes? Is using this feature going to make my program run twice as slow?
In general that's not the case. The example is meant to show that a generator solution can run at essentially the same speed, if not slightly faster in some cases (although it depends on the situation, version of Python, etc.). If you are observing huge differences in performance between the two versions though, then that would be something worth investigating.

In David Beazley's slides that you linked to, he states that all tests were run with "Python 2.5.1 on OS X 10.4.11," and you say you're running tests with Python 3.1 on Windows XP. So, realize you're doing some apples to oranges comparison. I suspect of the two variables, the Python version matters much more.
Python 3 is a different beast than Python 2. Many things have changed under the hood, (even within the Python 2 branch). This includes performance optimizations as well as performance regressions (see, for example, Beazley's own recent blog post on I/O in Python 3). For this reason, the Python Performance Tips page states explicitly,
You should always test these tips with
your application and the version of
Python you intend to use and not just
blindly accept that one method is
faster than another.
I should mention that one area that you can count on generators helping is in reducing memory consumption, rather than CPU consumption. If you have a large amount of data where you calculate or extract something from each individual piece, and you don't need the data after, generators will shine. See generator comprehension for more details.

You don't have an answer after almost a half an hour. I'm posting something that makes sense to me, not necessarily the right answer. I figure that this is better than nothing after almost half an hour:
The first algorithm uses a generator. A generator functions by loading the first page of results from the list (into memory) and continually loads the successive pages (into memory) until there is nothing left to read from input.
The second algorithm uses two generators, each with an if statement for a total of two comparisons per loop as opposed to the first algorithm's one comparison.
Also the second algorithm calls the sum function at the end as opposed to the first algorithm that simply keeps adding relevant integers as it keeps encountering them.
As such, for sufficiently large inputs, the second algorithm has more comparisons and an extra function call than the first. This could possibly explain why it takes longer to finish than the first algorithm.
Hope this helps

Should I optimise my python code like C++? Does it matter?

I had an argument with a colleague about writing python efficiently. He claimed that though you are programming python you still have to optimise the little bits of your software as much as possible, as if you are writing an efficient algorithm in C++.
Things like:
In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings.
Call as less functions as possible, even if it comes on the expense of readability, because of the overhead this creates.
I say, that in most cases it doesn't matter. I should also say that context of the code is not a super-efficient NOC or missile-guidance systems. We're mostly writing tests in python.
What's your view of the matter?

My answer to that would be :
We should forget about small
efficiencies, say about 97% of the
time: premature optimization is the
root of all evil.
(Quoting Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268)
If your application is doing anything like a query to the database, that one query will take more time than anything you can gain with those kind of small optimizations, anyway...
And if running after performances like that, why not code in assembly language, afterall ? Because Python is easier/faster to write and maintain ? Well, if so, you are right :-)
The most important thing is that your code is easy to maintain ; not a couple micro-seconds of CPU-time !
Well, maybe except if you have thousands of servers -- but is it your case ?

The answer is really simple :
Follow Python best practices, not C++ best practices.
Readability in Python is more important that speed.
If performance becomes an issue, measure, then start optimizing.

This sort of premature micro-optimisation is usually a waste of time in my experience, even in C and C++. Write readable code first. If it's running too slowly, run it through a profiler, and if necessary, fix the hot-spots.
Fundamentally, you need to think about return on investment. Is it worth the extra effort in reading and maintaining "optimised" code for the couple of microseconds it saves you? In most cases it isn't.
(Also, compilers and runtimes are getting cleverer. Some micro-optimisations may become micro-pessimisations over time.)

I agree with others: readable code first ("Performance is not a problem until performance is a problem.").
I only want to add that when you absolutely need to write some unreadable and/or non-intuitive code, you can generally isolate it in few specific methods, for which you can write detailed comments, and keep the rest of your code highly readable. If you do so, you'll end up having easy to maintain code, and you'll only have to go through the unreadable parts when you really need to.

I should also say that context of the code is not a super-efficient NOC or missile-guidance systems. We're mostly writing tests in python.
Given this, I'd say that you should take your colleague's advice about writing efficient Python but ignore anything he says that goes against prioritizing readability and maintainability of the code, which will probably be more important than the speed at which it'll execute.

In an if statement with an or always
put the condition most likely to fail
first, so the second will not be
checked.
This is generally a good advice, and also depends on the logic of your program. If it makes sense that the second statement is not evaluated if the first returns false, then do so. Doing the opposite could be a bug otherwise.
Use the most efficient functions for
manipulating strings in common use.
Not code that grinds strings, but
simple things like doing joins and
splits, and finding substrings.
I don't really get this point. Of course you should use the library provided functions, because they are probably implemented in C, and a pure python implementation is most likely to be slower. In any case, no need to reinvent the wheel.
Call as less functions as possible,
even if it comes on the expense of
readability, because of the overhead
this creates.
$ cat withcall.py
def square(a):
return a*a
for i in xrange(1,100000):
i_square = square(i)
$ cat withoutcall.py
for i in xrange(1,100000):
i_square = i*i
$ time python2.3 withcall.py
real 0m5.769s
user 0m4.304s
sys 0m0.215s
$ time python2.3 withcall.py
real 0m5.884s
user 0m4.315s
sys 0m0.206s
$ time python2.3 withoutcall.py
real 0m5.806s
user 0m4.172s
sys 0m0.209s
$ time python2.3 withoutcall.py
real 0m5.613s
user 0m4.171s
sys 0m0.216s
I mean... come on... please.

I think there are several related 'urban legends' here.
False Putting the more often-checked condition first in a conditional and similar optimizations save enough time for a typical program that it is worthy for a typical programmer.
True Some, but not many, people are using such styles in Python in the incorrect belief outlined above.
True Many people use such style in Python when they think that it improves readability of a Python program.
About readability: I think it's indeed useful when you give the most useful conditional first, since this is what people notice first anyway. You should also use ''.join() if you mean concatenation of strings since it's the most direct way to do it (the s += x operation could mean something different).
"Call as less functions as possible" decreases readability and goes against Pythonic principle of code reuse. And so it's not a style people use in Python.

Before introducing performance optimizations at the expense of readability, look into modules like psyco that will do some JIT-ish compiling of distinct functions, often with striking results, with no impairment of readability.
Then if you really want to embark on the optimization path, you must first learn to measure and profile. Optimization MUST BE QUANTITATIVE - do not go with your gut. The hotspot profiler will show you the functions where your program is burning up the most time.
If optimization turns up a function like this is being frequently called:
def get_order_qty(ordernumber):
# look up order in database and return quantity
If there is any repetition of ordernumbers, then memoization would be a good optimization technique to learn, and it is easily packaged in an #memoize decorator so that there is little impact to program readability. The effect of memoizing is that values returned for a given set of input arguments are cached, so that the expensive function can be called only once, with subseqent calls resolved against the cache.
Lastly, consider lifting invariants out of loops. For large multi-dimensional structures, this can save a lot of time - in fact in this case, I would argue that this optimization improves readability, as it often serves to make clear that some expression can be computed at a high-level dimension in the nested logic.
(BTW, is this really what you meant?
•In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
I should think this might be the case for "and", but an "or" will short-circuit if the first value is True, saving the evaluation of the second term of the conditional. So I would change this optimization "rule" to:
If testing "A and B", put A first if
it is more likely to evaluate to
False.
If testing "A or B", put A first if
it is more likely to evaluate to
True.
But often, the sequence of conditions is driven by the tests themselves:
if obj is not None and hasattr(obj,"name") and obj.name.startswith("X"):
You can't reorder these for optimization - they have to be in this order (or just let the exceptions fly and catch them later:
if obj.name.startswith("X"):

Sure follow Python best-practices (and in fact I agree with the first two recommendations), but maintainability and efficiency are not opposites, they are mostly togethers (if that's a word).
Statements like "always write your IF statements a certain way for performance" are a-priori, i.e. not based on knowledge of what your program spends time on, and are therefore guesses. The first (or second, or third, whatever) rule of performance tuning is don't guess.
If after you measure, profile, or in my case do this, you actually know that you can save much time by re-ordering tests, by all means, do. My money says that's at the 1% level or less.

My visceral reaction is this:
I've worked with guys like your colleague and in general I wouldn't take advice from them.
Ask him if he's ever even used a profiler.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.