Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I've been working on one of the coding challenges on InterviewStreet.com and I've run into a bit of an efficiency problem. Can anyone suggest where I might change the code to make it faster and more efficient?
Here's the code
Here's the problem statement if you're interested
If your question is about optimising python code generally (which I think it should be ;) then there are all sorts of interesting things you can do, but first:
You probably shouldn't be obsessively optimising python code! If you're using the fastest algorithm for the problem you're trying to solve and python doesn't do it fast enough, you should probably be using a different language.
That said, there are several approaches you can take (because sometimes, you really do want to make python code faster):
Profile (do this first!)
There are lots of ways of profiling python code, but there are two that I'll mention: the cProfile (or profile) module, and PyCallGraph.
cProfile
This is what you should actually use, though interpreting the results can be a bit daunting.
It works by recording when each function is entered or exited, and what the calling function was (and tracking exceptions).
You can run a function in cProfile like this:
import cProfile
cProfile.run('myFunction()', 'myFunction.profile')
Then to view the results:
import pstats
stats = pstats.Stats('myFunction.profile')
stats.strip_dirs().sort_stats('time').print_stats()
This will show you in which functions most of the time is spent.
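For a self-contained run of the steps above, the sketch below profiles a workload in-process with `cProfile.Profile` and sends the `pstats` report to a string instead of a file. The `slow_sum` workload and the `my_function` body are hypothetical stand-ins, not part of the original answer.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Hypothetical workload: round-trips each number through str() and int()
    return sum(int(str(i)) for i in range(n))

def my_function():
    return slow_sum(100_000)

profiler = cProfile.Profile()
profiler.enable()
result = my_function()
profiler.disable()

# Write the report to an in-memory buffer instead of stdout
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("tottime").print_stats(10)  # ten most expensive entries
report = buffer.getvalue()
print(report)
```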
PyCallGraph
PyCallGraph provides perhaps the prettiest and easiest way of profiling python programs, and it's a good introduction to understanding where the time in your program is spent; however, it adds significant execution overhead.
To run pycallgraph:
pycallgraph graphviz ./myprogram.py
Simple! You get a png graph image as output (perhaps after a while...)
Use Libraries
If you're trying to do something in python that a module already exists for (maybe even in the standard library), then use that module instead!
Many standard library modules are implemented in C (or have C accelerators), and they can execute many times faster than equivalent python implementations of, say, bisection search.
Make the Interpreter do as Much of Your Work as You Can
The interpreter will do some things for you, like looping. Really? Yes! You can use the built-in map, reduce, and filter functions (reduce lives in functools in Python 3) to speed up tight loops:
consider:
for x in xrange(0, 100):
    doSomethingWithX(x)
vs:
map(doSomethingWithX, xrange(0,100))
Well obviously this could be faster because the interpreter only has to deal with a single statement, rather than two, but that's a bit vague... in fact, this is faster for two reasons:
all flow control (have we finished looping yet...) is done in the interpreter
the doSomethingWithX function name is only resolved once
In the for loop, each time around the loop python has to look up exactly where the doSomethingWithX function is! Even with caching, this is a bit of an overhead.
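A minimal way to check this claim on your own interpreter, assuming Python 3: there `xrange` is gone and `map` returns a lazy iterator, so the sketch wraps it in `list()` to force the work. `do_something_with_x` is a hypothetical stand-in for `doSomethingWithX`; the timings depend on the function body and the Python version, so treat them as something to measure rather than a given.

```python
import timeit

def do_something_with_x(x):
    # Hypothetical stand-in for doSomethingWithX
    return x * 2

def with_loop(n=100):
    results = []
    for x in range(n):
        results.append(do_something_with_x(x))
    return results

def with_map(n=100):
    # list() forces evaluation; map() is lazy in Python 3
    return list(map(do_something_with_x, range(n)))

for fn in (with_loop, with_map):
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{fn.__name__}: {seconds:.3f}s for 10,000 runs")
```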
Remember that Python is an Interpreted Language
(Note that this section really is about tiny tiny optimisations that you shouldn't let affect your normal, readable coding style!)
If you come from a background of programming in a compiled language, like C or Fortran, then some things about the performance of different python statements might be surprising:
try:ing is cheap, ifing is expensive
If you have code like this:
if somethingcrazy_happened:
    uhOhBetterDoSomething()
else:
    doWhatWeNormallyDo()
And doWhatWeNormallyDo() would throw an exception if something crazy had happened, then it would be faster to arrange your code like this:
try:
    doWhatWeNormallyDo()
except SomethingCrazy:
    uhOhBetterDoSomething()
Why? Well, the interpreter can dive straight in and start doing what you normally do; in the first case the interpreter has to do a symbol lookup each time the if statement is executed, because the name could refer to something different since the last time the statement was executed! (And a name lookup, especially if somethingcrazy_happened is global, can be nontrivial.)
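A sketch of the same "try when the exception is rare" trade-off using a dictionary lookup, where the exception is `KeyError`; the `prices` data and function names are illustrative assumptions, since the names above are placeholders. The `try` version does one lookup on the common path and the `if` version does two; when misses are frequent, raising and catching becomes the expensive part, so the balance can flip.

```python
import timeit

prices = {f"item{i}": i for i in range(1000)}

def lookup_if(key):
    # Look before you leap: a membership test, then a second lookup
    if key in prices:
        return prices[key]
    return None

def lookup_try(key):
    # Easier to ask forgiveness: one lookup on the common path
    try:
        return prices[key]
    except KeyError:
        return None

for fn in (lookup_if, lookup_try):
    hit = timeit.timeit(lambda: fn("item500"), number=200_000)
    miss = timeit.timeit(lambda: fn("missing"), number=200_000)
    print(f"{fn.__name__}: hit {hit:.3f}s, miss {miss:.3f}s")
```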
You mean Who??
Because of the cost of name lookups, it can also be better to cache global values within functions, and to bake simple boolean tests into functions like this:
Unoptimised function:
def foo():
    if condition_that_rarely_changes:
        doSomething()
    else:
        doSomethingElse()
Optimised approach: instead of using a variable, exploit the fact that the interpreter is doing a name lookup on the function anyway!
When the condition becomes true:
foo = doSomething # now foo() calls doSomething()
When the condition becomes false:
foo = doSomethingElse # now foo() calls doSomethingElse()
PyPy
PyPy is a python implementation written in python (strictly, in RPython, a restricted subset of python). Surely that means it will run code infinitely slower? Well, no. PyPy actually uses a Just-In-Time (JIT) compiler to run python programs.
If you don't use any external libraries (or the ones you do use are compatible with PyPy), then this is an extremely easy way to (almost certainly) speed up repetitive tasks in your program.
Basically the JIT can generate code that will do what the python interpreter would, but much faster, since it is generated for a single case, rather than having to deal with every possible legal python expression.
Where to look Next
Of course, the first place you should have looked was to improve your algorithms and data structures, and to consider things like caching, or even whether you need to be doing so much in the first place, but anyway:
This page of the python.org wiki provides lots of information about how to speed up python code, though some of it is a bit out of date.
Here's the BDFL himself on the subject of optimising loops.
There are quite a few things, even from my own limited experience that I've missed out, but this answer was long enough already!
This is all based on my own recent experiences with some python code that just wasn't fast enough, and I'd like to stress again that I don't really think any of what I've suggested is actually a good idea. Sometimes, though, you have to...
First off, profile your code so you know where the problems lie. There are many examples of how to do this, here's one: https://codereview.stackexchange.com/questions/3393/im-trying-to-understand-how-to-make-my-application-more-efficient
You do a lot of indexed access as in:
for pair in range(i-1, j):
    if coordinates[pair][0] >= 0 and coordinates[pair][1] >= 0:
Which could be written more plainly as:
for coord in coordinates[i-1:j]:
    if coord[0] >= 0 and coord[1] >= 0:
List comprehensions are cool and "pythonic", but this code would probably run faster if you didn't create 4 lists:
N = int(raw_input())
coordinates = []
coordinates = [raw_input() for i in xrange(N)]
coordinates = [pair.split(" ") for pair in coordinates]
coordinates = [[int(pair[0]), int(pair[1])] for pair in coordinates]
I would instead roll all those together into one simple loop or, if you're really dead set on list comprehensions, encapsulate the multiple transformations into a function which operates on the raw_input().
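A sketch of both suggestions, assuming Python 3 (`sys.stdin` instead of `raw_input`). The helper names `read_coordinates`, `parse_pair`, and `main` are hypothetical; each version makes a single pass instead of building several intermediate lists. `split()` with no argument also tolerates repeated spaces and a trailing newline.

```python
import sys

def read_coordinates(lines):
    # One loop: split and convert each "x y" line as it is read
    coordinates = []
    for line in lines:
        x, y = line.split()
        coordinates.append([int(x), int(y)])
    return coordinates

def parse_pair(line):
    # Comprehension variant: all transformations folded into one function
    x, y = line.split()
    return [int(x), int(y)]

def main():
    # Reads N, then N "x y" lines, from standard input
    n = int(sys.stdin.readline())
    return [parse_pair(sys.stdin.readline()) for _ in range(n)]
```

With the problem's input on standard input, `main()` should return the same list the original four list comprehensions built.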
This answer shows how I locate code to optimize.
Suppose there is some line of code you could replace, and it is costing, say, 40% of the time.
Then it resides on the call stack 40% of the time.
If you take 10 samples of the call stack, it will appear on 4 of them, give or take.
It really doesn't matter how many samples show it.
If it appears on two or more, and if you can replace it, you will save whatever time it costs.
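The "give or take" arithmetic can be checked with the binomial distribution. Assuming each sample independently lands in the costly line with probability equal to its share of the time (40% here), the sketch computes the expected number of hits in 10 samples and the chance of seeing it on at least two of them. This is an idealized model, not a property of any particular profiler.

```python
from math import comb

def prob_at_least(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p): the chance that a line costing a
    # fraction p of the run time shows up on at least k of n stack samples
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_samples, cost = 10, 0.40
print(n_samples * cost)                             # 4.0 samples expected
print(round(prob_at_least(2, n_samples, cost), 3))  # 0.954
```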
Most of the interview street problems seem to be tested in a way that will verify that you have found an algorithm with the right big O complexity rather than that you have coded the solution in the most optimal way possible.
In other words, if you are failing some of the test cases due to running out of time, the problem is likely that you need to figure out a solution with lower algorithmic complexity rather than micro-optimize the algorithm you have. This is why they generally state that N can be quite large.
Captain Hindsight, reporting in:
After reading through the comment and answer and running a few tests, I found out that I had made a subtle error in my calculations. Turns out, I was comparing compiled lookups to interpreted calls. When I precompiled the call using the non-IPython-line-magic version (i.e., timeit.timeit(codestr, setup_codestr)), I found that the function calls were indeed on the same order of magnitude as the lookups :)
Now there's a whole world of caching function results, precompiling functions, and precompiling types to explore! ..and that's nice :)
For posterity:
I realize that sounds like a strange question, but someone might know a way around this, and that would be great. So here goes:
If I do something like:
%%timeit somelist[42]
Then I get times in the 90 nanosecond range. A slice will get it up to 190ish; and, to my pleasant surprise, even big crazy ones were still fast. This bad boy, for instance, weighs in at 385 nanoseconds:
%%timeit some_nested_list[2:5][1][6:13]
Here's the thing. Function calls, it seems, are a lot slower than that. I like decomposing problems functionally, and am starting to give functional programming a bit more thought, but the speed difference is significant (3.34 microseconds vs. 100-150 nanoseconds, which are realistic averages for conditionals, etc.). The following takes 3.34 microseconds:
def func():
    some_nested_list[2:5][1][6:13]
%%timeit func()
So, there are presumably a lot of functional programmers out there? You all must have dealt with this little hiccup? Would someone care to point me in the right direction?
Not really. Python function calls involve a certain amount of overhead for setting up the stack frame, etc., and you can't eliminate that overhead while still writing a Python function. The reason the operations in your example are fast is that you're doing them on a list, and lists are written in C.
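A sketch of how to reproduce the comparison outside IPython with the standard `timeit` module (the `globals` argument needs Python 3.5+). `some_nested_list` is filled with made-up data because the question does not show it; absolute numbers depend on the machine and Python version, so only the gap between the two lines is of interest.

```python
import timeit

# Hypothetical data: the question does not show how some_nested_list is built
some_nested_list = [list(range(20)) for _ in range(10)]

def func():
    return some_nested_list[2:5][1][6:13]

number = 200_000
inline = timeit.timeit("some_nested_list[2:5][1][6:13]", globals=globals(), number=number)
called = timeit.timeit("func()", globals=globals(), number=number)
print(f"inline expression: {inline / number * 1e9:.0f} ns per loop")
print(f"via function call: {called / number * 1e9:.0f} ns per loop")
```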
One thing to keep in mind is that, in many practical situations, the function call overhead will be small relative to what the function actually does. See this question for some discussion. However, if you move toward a pure-functional style in which each function just evaluates one expression, you may indeed suffer a performance penalty.
An alternative is to look at PyPy, which makes many pure-Python operations faster. I don't know whether it improves function call speed specifically. Also, by using PyPy you restrict the set of libraries you can use.
Finally, there is Cython, which allows you to write code in a language that looks basically the same as Python, but actually compiles to C. This can be much faster than Python in some cases.
The bottom line is that how to speed up your functions depends on what your functions actually do. There is no way to magically make all function calls faster while still keeping everything else about Python the same. If there were, they probably would have already added it to Python.
If I defined a function like this:
def ccid_year(seq):
    year, prefix, index, suffix = seq
    return year
Is Python allowed to optimize it to be effectively:
def ccid_year(seq):
    return seq[0]
I'd prefer to write the first function because it documents the format of the data being passed in but would hope that Python would generate code that is effectively as efficient as the second definition.
The two functions are not equivalent:
def ccid_year_1(seq):
    year, prefix, index, suffix = seq
    return year

def ccid_year_2(seq):
    return seq[0]
arg = {1:'a', 2:'b', 0:'c', 3:'d'}
print ccid_year_1(arg)
print ccid_year_2(arg)
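In Python 2, the first call prints 0 and the second prints c. (Ported to Python 3.7+, where dicts iterate in insertion order, the first call returns 1 instead, but the two still disagree.)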
I'll answer the question at face value later, but first: when in doubt, benchmark it! Before that, though, recall that most time is spent in a small portion of the code (i.e., most code is irrelevant to performance!) and, in CPython, function call overhead usually dominates small inefficiencies. Not to mention that large-scale algorithmic inefficiencies (a.k.a. freaking stupid code) dwarf micro-optimization concerns.
So either don't worry about this at all, or, if you have reason to worry about it, first benchmark the alternatives and second don't put it in a function. Note that "reasons to worry about it" must be weighed against the time spent worrying, and the maintenance burden (if there is one) of the manual optimization.
CPython, the reference implementation you most likely use, is very conservative about optimizing at this level. While there is a peephole optimizer operating on bytecode, it is limited in scope. More generally, you can't expect much optimization spanning more than a single statement. The problem with statically optimizing Python code is that there are a billion ways even the most innocent-looking program fragment can call into arbitrary code, which might do anything at all, so you can't omit these calls.
While we're at it, your proposed optimization is invalid (in the sense that the program doesn't have the same behavior) if seq is of the wrong type (not a sequence, or a very weird sequence) or length (not exactly four items long)! Any program claiming to implement Python must preserve such differences, so it won't do the transformation you suggest literally. I assume this was just an off-hand illustration, but it does indicate you seriously underestimate how complex Python is (to implement, and doubly so to optimize). I and others have written about this at length before, so I'll stop now before this post becomes even larger.
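Both points can be checked directly, assuming CPython: the sketch lists the bytecode of the first function with the standard `dis` module to show that the unpack is still there, then feeds the two functions inputs on which they disagree (a two-item tuple and an iterator). The sample values are made up for illustration.

```python
import dis

def ccid_year_1(seq):
    year, prefix, index, suffix = seq
    return year

def ccid_year_2(seq):
    return seq[0]

def opnames(func):
    # Names of the bytecode instructions compiled for func
    return [ins.opname for ins in dis.get_instructions(func)]

# CPython keeps the full four-way unpack in the bytecode
print("UNPACK_SEQUENCE" in opnames(ccid_year_1))  # True

# Wrong length: indexing succeeds, unpacking raises ValueError
short = (2012, "AB")
print(ccid_year_2(short))  # 2012
try:
    ccid_year_1(short)
except ValueError as exc:
    print("ccid_year_1:", exc)

# Not a sequence: unpacking accepts any iterable, indexing raises TypeError
print(ccid_year_1(iter([2012, "AB", 7, "X"])))  # 2012
try:
    ccid_year_2(iter([2012, "AB", 7, "X"]))
except TypeError as exc:
    print("ccid_year_2:", exc)
```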
PyPy on the other hand will, if this function is indeed called from a hot loop, probably optimize this and a million other things you didn't even think of, while compiling it down to a machine code loop that iterates faster than any Python loop could ever iterate on CPython. It will still contain a few checks to break out of the loop and take the proper action (e.g. raise an exception) if necessary, but they'll also be highly efficient if not triggered.
I do not know much about IronPython, Jython, and other implementations, but if their lack of consistent several-times-faster-than-CPython benchmark results is any indicator, they do not perform significant optimizations. While the VMs that IronPython and Jython run on include JIT compilers (not entirely unlike PyPy's), these JIT compilers are built for very different languages, and I'd be very surprised if they could see through the mess of code IronPython/Jython must execute to achieve Python semantics and perform such optimizations on it.
I am going through a link about generators that someone posted. In the beginning he compares the two functions below. On his setup he showed a speed increase of 5% with the generator.
I'm running Windows XP and Python 3.1.1, and cannot seem to duplicate the results. I keep showing the "old way" (logs1) as being slightly faster when tested with the provided logs and up to 1GB of duplicated data.
Can someone help me understand what's happening differently?
Thanks!
def logs1():
    wwwlog = open("big-access-log")
    total = 0
    for line in wwwlog:
        bytestr = line.rsplit(None,1)[1]
        if bytestr != '-':
            total += int(bytestr)
    return total

def logs2():
    wwwlog = open("big-access-log")
    bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
    getbytes = (int(x) for x in bytecolumn if x != '-')
    return sum(getbytes)
For what it's worth, the main purpose of the speed comparison in the presentation was to point out that using generators does not introduce a huge performance overhead. Many programmers, when first seeing generators, might start wondering about the hidden costs. For example, is there all sorts of fancy magic going on behind the scenes? Is using this feature going to make my program run twice as slow?
In general that's not the case. The example is meant to show that a generator solution can run at essentially the same speed, if not slightly faster in some cases (although it depends on the situation, version of Python, etc.). If you are observing huge differences in performance between the two versions though, then that would be something worth investigating.
In David Beazley's slides that you linked to, he states that all tests were run with "Python 2.5.1 on OS X 10.4.11," and you say you're running tests with Python 3.1 on Windows XP. So, realize you're doing something of an apples-to-oranges comparison. I suspect that, of the two variables, the Python version matters much more.
Python 3 is a different beast than Python 2. Many things have changed under the hood (even within the Python 2 branch). This includes performance optimizations as well as performance regressions (see, for example, Beazley's own recent blog post on I/O in Python 3). For this reason, the Python Performance Tips page states explicitly:
You should always test these tips with your application and the version of Python you intend to use and not just blindly accept that one method is faster than another.
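In that spirit, here is a sketch for timing the two styles yourself without a multi-gigabyte file: the functions are `logs1`/`logs2` rewritten to take any iterable of lines, fed with synthetic log lines whose format (byte count or `-` as the last field) is an assumption based on the question's code. Expect the result to vary with the Python version, as this answer says.

```python
import timeit

def total_bytes_loop(lines):
    # logs1 style: explicit loop with a running total
    total = 0
    for line in lines:
        bytestr = line.rsplit(None, 1)[1]
        if bytestr != '-':
            total += int(bytestr)
    return total

def total_bytes_gen(lines):
    # logs2 style: a pipeline of generator expressions fed to sum()
    bytecolumn = (line.rsplit(None, 1)[1] for line in lines)
    getbytes = (int(x) for x in bytecolumn if x != '-')
    return sum(getbytes)

# Synthetic access-log lines; every seventh line has '-' as the byte count
sample = [
    f'127.0.0.1 - - "GET /page{i} HTTP/1.1" 200 {i % 5000}\n' if i % 7
    else '127.0.0.1 - - "GET / HTTP/1.1" 304 -\n'
    for i in range(50_000)
]

for fn in (total_bytes_loop, total_bytes_gen):
    print(fn.__name__, min(timeit.repeat(lambda: fn(sample), number=3, repeat=3)))
```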
I should mention that one area that you can count on generators helping is in reducing memory consumption, rather than CPU consumption. If you have a large amount of data where you calculate or extract something from each individual piece, and you don't need the data after, generators will shine. See generator comprehension for more details.
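A small illustration of the memory point using `sys.getsizeof`, which reports the size of the container object itself (not the integers it refers to): a list comprehension materializes every element up front, while the equivalent generator expression keeps only its current state. The one-million-element size is an arbitrary choice.

```python
import sys

n = 1_000_000
squares_list = [i * i for i in range(n)]  # every result stored at once
squares_gen = (i * i for i in range(n))   # results produced one at a time

print(sys.getsizeof(squares_list))  # several megabytes for the list object alone
print(sys.getsizeof(squares_gen))   # a small constant size, independent of n

# Both give the same answer when consumed once
print(sum(squares_list) == sum(squares_gen))  # True
```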
You don't have an answer after almost half an hour, so I'm posting something that makes sense to me, not necessarily the right answer. I figure that this is better than nothing:
The first algorithm uses a generator. A generator functions by loading the first page of results from the list (into memory) and continually loads the successive pages (into memory) until there is nothing left to read from input.
The second algorithm uses two generators, each with an if statement for a total of two comparisons per loop as opposed to the first algorithm's one comparison.
Also the second algorithm calls the sum function at the end as opposed to the first algorithm that simply keeps adding relevant integers as it keeps encountering them.
As such, for sufficiently large inputs, the second algorithm has more comparisons and an extra function call than the first. This could possibly explain why it takes longer to finish than the first algorithm.
Hope this helps
In a game that I am writing, I use a 2D vector class which I have written to handle the speeds of the objects. This is called a large number of times every frame as there are a lot of objects on the screen, so any increase I can make in its speed will be useful.
It is pretty simple, consisting mostly of wrappers to the related math functions. It would be quite trivial to rewrite in C, but I am not sure whether doing so will make any significant difference as all it really does is call the underlying math functions, add, multiply or divide.
So, my question is under what circumstances does it make sense to rewrite in C? Where will you see a significant speed boost, and where can you see a reasonable speed boost without rewriting an extensive amount of the program?
If you're vector-munging, give numpy a try first. Chances are you will get speeds not far from C if you utilize numpy's vector manipulation functions wisely.
Other than that, your question is very heuristic. If your code is too slow:
Profile it - chances are you'll be able to improve it in Python
Use the correct optimized C-based libraries (numpy in your case)
Try psyco
Try rewriting parts with cython
If all else fails, rewrite in C
First measure then optimize
You should never optimize anything, be it in C or any other language, without timing your code before and after your optimization:
your clever optimization could in fact induce a slow down
optimizing something that takes 1% of the total execution time will never give you more than 1% performance
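The 1% bound is Amdahl's law (named here; the answer does not use the term): if a fraction p of the run time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A quick sketch of the arithmetic, with the example fractions chosen for illustration:

```python
def overall_speedup(p, s):
    # Amdahl's law: p = fraction of run time optimized, s = speedup of that part
    return 1 / ((1 - p) + p / s)

print(overall_speedup(0.01, float("inf")))  # ~1.0101: at most ~1% less run time
print(overall_speedup(0.50, 2))             # ~1.333
```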
The common approach is:
1. profile your code
2. identify a hotspot
3. time this hotspot
4. optimize it
5. time the hotspot again, see if it's faster. If it's not, goto 3.
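A minimal sketch of the time-optimize-retime loop with `timeit`. The hot spot here is a hypothetical string-building loop; CPython already optimizes some in-place `str +=` cases, which is exactly why the re-timing step matters instead of assuming the rewrite wins. The result check comes first so that a faster but wrong "optimization" is caught.

```python
import timeit

def build_slow(words):
    # Hypothetical hot spot: repeated string concatenation
    out = ""
    for w in words:
        out += w
    return out

def build_fast(words):
    # Candidate optimization: a single str.join call
    return "".join(words)

words = ["token"] * 10_000
assert build_slow(words) == build_fast(words)  # same result before comparing speed

before = min(timeit.repeat(lambda: build_slow(words), number=100, repeat=3))
after = min(timeit.repeat(lambda: build_fast(words), number=100, repeat=3))
print(f"before: {before:.4f}s  after: {after:.4f}s")
```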
If you can't find hotspots, it could mean that your app is already optimized, or that you are not using the right algorithm for your problem. In both cases, profiling helps you understand what your code does.
For profiling python code under Linux, you can use pyprof2calltree which works in conjunction with kcachegrind, and is totally awesome.
Common wisdom is "profile", "measure", etc. Well - maybe. Just get in the debugger and take 10 stackshots. If more than one of them terminates in your wrapper code, then it is costing more than 10% roughly, so you should consider re-doing it in C, to save that time. Chances are you will find other things also that are costing more than that.
A nice profiler I use on Linux is pycallgraph; however, as your program gets bigger, it starts to create much larger images which are harder to trace. I'm pretty sure you can exclude modules, though.
I had an argument with a colleague about writing python efficiently. He claimed that though you are programming python you still have to optimise the little bits of your software as much as possible, as if you are writing an efficient algorithm in C++.
Things like:
In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings.
Call as few functions as possible, even at the expense of readability, because of the overhead this creates.
I say that in most cases it doesn't matter. I should also say that the context of the code is not a super-efficient NOC or a missile-guidance system. We're mostly writing tests in python.
What's your view of the matter?
My answer to that would be:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
(Quoting Knuth, Donald. "Structured Programming with go to Statements", ACM Computing Surveys, Vol. 6, No. 4, Dec. 1974, p. 268.)
If your application is doing anything like a query to the database, that one query will take more time than anything you can gain with those kind of small optimizations, anyway...
And if you're chasing performance like that, why not code in assembly language, after all? Because Python is easier/faster to write and maintain? Well, if so, you are right :-)
The most important thing is that your code is easy to maintain, not a couple of microseconds of CPU time!
Well, maybe except if you have thousands of servers, but is that your case?
The answer is really simple:
Follow Python best practices, not C++ best practices.
Readability in Python is more important than speed.
If performance becomes an issue, measure, then start optimizing.
This sort of premature micro-optimisation is usually a waste of time in my experience, even in C and C++. Write readable code first. If it's running too slowly, run it through a profiler, and if necessary, fix the hot-spots.
Fundamentally, you need to think about return on investment. Is it worth the extra effort in reading and maintaining "optimised" code for the couple of microseconds it saves you? In most cases it isn't.
(Also, compilers and runtimes are getting cleverer. Some micro-optimisations may become micro-pessimisations over time.)
I agree with others: readable code first ("Performance is not a problem until performance is a problem.").
I only want to add that when you absolutely need to write some unreadable and/or non-intuitive code, you can generally isolate it in few specific methods, for which you can write detailed comments, and keep the rest of your code highly readable. If you do so, you'll end up having easy to maintain code, and you'll only have to go through the unreadable parts when you really need to.
I should also say that the context of the code is not a super-efficient NOC or a missile-guidance system. We're mostly writing tests in python.
Given this, I'd say that you should take your colleague's advice about writing efficient Python but ignore anything he says that goes against prioritizing readability and maintainability of the code, which will probably be more important than the speed at which it'll execute.
In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
This is generally good advice, and it also depends on the logic of your program. If it makes sense that the second statement is not evaluated if the first returns false, then do so. Otherwise, doing the opposite could be a bug.
Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings.
I don't really get this point. Of course you should use the library provided functions, because they are probably implemented in C, and a pure python implementation is most likely to be slower. In any case, no need to reinvent the wheel.
Call as few functions as possible, even at the expense of readability, because of the overhead this creates.
$ cat withcall.py
def square(a):
    return a*a
for i in xrange(1,100000):
    i_square = square(i)
$ cat withoutcall.py
for i in xrange(1,100000):
    i_square = i*i
$ time python2.3 withcall.py
real 0m5.769s
user 0m4.304s
sys 0m0.215s
$ time python2.3 withcall.py
real 0m5.884s
user 0m4.315s
sys 0m0.206s
$ time python2.3 withoutcall.py
real 0m5.806s
user 0m4.172s
sys 0m0.209s
$ time python2.3 withoutcall.py
real 0m5.613s
user 0m4.171s
sys 0m0.216s
I mean... come on... please.
I think there are several related 'urban legends' here.
False: Putting the more often-checked condition first in a conditional, and similar optimizations, save enough time in a typical program that they are worthwhile for a typical programmer.
True: Some, but not many, people use such styles in Python in the incorrect belief outlined above.
True: Many people use such styles in Python when they think they improve the readability of a Python program.
About readability: I think it's indeed useful when you give the most useful conditional first, since this is what people notice first anyway. You should also use ''.join() if you mean concatenation of strings since it's the most direct way to do it (the s += x operation could mean something different).
"Call as few functions as possible" decreases readability and goes against the Pythonic principle of code reuse, so it's not a style people use in Python.
Before introducing performance optimizations at the expense of readability, look into modules like psyco that will do some JIT-ish compiling of distinct functions, often with striking results, with no impairment of readability.
Then if you really want to embark on the optimization path, you must first learn to measure and profile. Optimization MUST BE QUANTITATIVE - do not go with your gut. The hotspot profiler will show you the functions where your program is burning up the most time.
If profiling shows that a function like this is being called frequently:
def get_order_qty(ordernumber):
    # look up order in database and return quantity
    ...
If there is any repetition of ordernumbers, then memoization would be a good optimization technique to learn, and it is easily packaged in a @memoize decorator so that there is little impact on program readability. The effect of memoizing is that values returned for a given set of input arguments are cached, so that the expensive function is called only once per set of arguments, with subsequent calls resolved against the cache.
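A minimal sketch of the idea, using the standard library's `functools.lru_cache` in place of a hand-written `@memoize` decorator. The database lookup is replaced by a hypothetical calculation plus a call counter, so you can see that repeated order numbers never reach the function body. This only makes sense when the result for a given order number cannot change while the program runs.

```python
import functools

CALLS = 0  # counts how often the "expensive" body actually runs

@functools.lru_cache(maxsize=None)
def get_order_qty(ordernumber):
    # Hypothetical stand-in for the database lookup in the text
    global CALLS
    CALLS += 1
    return ordernumber * 10

for n in [101, 102, 101, 101, 102]:
    get_order_qty(n)

print(CALLS)  # 2: repeated order numbers are served from the cache
```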
Lastly, consider lifting invariants out of loops. For large multi-dimensional structures, this can save a lot of time - in fact in this case, I would argue that this optimization improves readability, as it often serves to make clear that some expression can be computed at a high-level dimension in the nested logic.
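A small sketch of hoisting an invariant out of nested loops; the matrix-scaling task is a hypothetical example, not from the answer. Both functions return the same result, but the second computes the square root once instead of once per element.

```python
import math

def scale_rows_naive(matrix, factor):
    result = []
    for row in matrix:
        # math.sqrt(factor) never changes, yet it is recomputed for every element
        result.append([x * math.sqrt(factor) for x in row])
    return result

def scale_rows_hoisted(matrix, factor):
    scale = math.sqrt(factor)  # loop invariant, computed once up front
    return [[x * scale for x in row] for row in matrix]

grid = [[1, 2], [3, 4]]
print(scale_rows_hoisted(grid, 4))  # [[2.0, 4.0], [6.0, 8.0]]
```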
BTW, is this really what you meant?
• In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
I should think this might be the case for "and", but an "or" will short-circuit if the first value is True, saving the evaluation of the second term of the conditional. So I would change this optimization "rule" to:
If testing "A and B", put A first if it is more likely to evaluate to False.
If testing "A or B", put A first if it is more likely to evaluate to True.
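A sketch that makes the short-circuit rule visible: each hypothetical `check` call records that it ran, so you can see which operand is skipped for `and` and for `or`.

```python
calls = []

def check(name, result):
    # Record that this operand was evaluated, then return its truth value
    calls.append(name)
    return result

calls.clear()
check("A", False) and check("B", True)
print(calls)  # ['A']: 'and' stops at the first False

calls.clear()
check("A", True) or check("B", False)
print(calls)  # ['A']: 'or' stops at the first True

calls.clear()
check("A", False) or check("B", True)
print(calls)  # ['A', 'B']: a False first operand forces 'or' to evaluate B
```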
But often, the sequence of conditions is driven by the tests themselves:
if obj is not None and hasattr(obj,"name") and obj.name.startswith("X"):
You can't reorder these for optimization; they have to be in this order (or you can just let the exceptions fly and catch them later):
if obj.name.startswith("X"):
Sure, follow Python best practices (and in fact I agree with the first two recommendations), but maintainability and efficiency are not opposites; they are mostly togethers (if that's a word).
Statements like "always write your IF statements a certain way for performance" are a-priori, i.e. not based on knowledge of what your program spends time on, and are therefore guesses. The first (or second, or third, whatever) rule of performance tuning is don't guess.
If after you measure, profile, or in my case do this, you actually know that you can save much time by re-ordering tests, by all means, do. My money says that's at the 1% level or less.
My visceral reaction is this:
I've worked with guys like your colleague and in general I wouldn't take advice from them.
Ask him if he's ever even used a profiler.