Will using fewer calls to functions increase the speed of total execution? [duplicate] - python

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Does creating separate functions instead of one big one slow processing time?
OOP Bloop boop!
Yes, OOP is great, keeps code clean and well organized. That's why I like to use it. I'm referring to OOP very primitively, specifically in using functions (defs).
Will taking out my function calls and sticking the content of the function straight into my algo increase speed of execution of the overall code?
Yes, I know I can run a test myself, but I'm choosing to ask it here in the forum of my fellow coder colleagues, because I know this is a question that floats around in many heads....
def myFunc(var):
    return var * 3243  # perhaps more complicated function code will make a difference?

i = 0
hungry = True
while hungry:
    i = i + 1
    x = myFunc(i)
    if i > 50:
        hungry = False

Write it correctly (i.e. with concerns properly separated into distinct functions and classes), then use PyPy to make it fast.
PyPy uses function inlining and assorted other tricks in its Just-In-Time compiler to speed up code execution without having to make it unmaintainable in the name of speed.
Note that this solution only applies if you're using the Python 2.x series, and any C extensions you rely on must be compatible with cpyext.

Apparently there is very high function call overhead in Python.
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Data_Aggregation
Is it worth keeping in mind while choosing the best way to write a piece of code?
Sure.
Is the speed-up worth making your code harder to understand?
Probably not.
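You can measure the overhead yourself with the standard timeit module; here is a minimal sketch (the triple function and the constants are made up for illustration):

import timeit

def triple(n):
    return n * 3243

def with_call():
    total = 0
    for i in range(1000):
        total += triple(i)   # pays the call overhead on every iteration
    return total

def inlined():
    total = 0
    for i in range(1000):
        total += i * 3243    # same work with the body written inline
    return total

print(timeit.timeit(with_call, number=1000))
print(timeit.timeit(inlined, number=1000))

The difference is measurable in a micro-benchmark like this, but in real code the function body usually dominates.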

Python won't "inline" the function for you, so of course there is overhead. Usually the code in the function will take enough time that the call overhead is not significant.
Bear in mind that it's much easier to test, debug, and profile programs that are broken up into functions.
If you really need more performance, it's usually a better idea to write the function in C or Cython than to eliminate the call overhead.
It's also worth noting that code set up in the usual way, like this:
def main():
    ...

if __name__ == "__main__":
    main()
is faster than just running the code at the top level, because identifier lookups are faster in functions/methods than in the global namespace.
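A rough way to see the lookup difference (a sketch; the loop bodies are arbitrary busywork):

import timeit

g = 0

def bump_global():
    global g
    for _ in range(100000):
        g += 1               # STORE_GLOBAL / LOAD_GLOBAL: dict lookups by name

def bump_local():
    x = 0
    for _ in range(100000):
        x += 1               # STORE_FAST / LOAD_FAST: indexed array slots
    return x

print(timeit.timeit(bump_global, number=100))
print(timeit.timeit(bump_local, number=100))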

Related

Limit of performance benefit gained in Python from using local variables instead of global variables?

I came upon a question which says that Python code runs faster inside functions. So I thought that breaking code down into as many parts as I could would be the faster approach. But when timing some functions, I found that this wasn't really correct.
I won't post the code here, as it is currently placed for review at codereview. I am still figuring out the best way to do the timing, as that code is also up for review there, and I am not getting many answers despite a bounty.
I figured that the performance benefit cannot be unlimited; there has to be a point where breaking code into more functions stops giving any benefit.
So what is the limit at which the performance benefit of breaking Python code into functions stops? From the point of view of performance, when is breaking code into functions no longer useful?
It is true that code runs faster in a function than in the global scope because of access times. A function has its own local scope which is implemented as an array, whereas the global scope is really just a dictionary. Arrays are accessed faster than dicts and this is happening under the hood, on the C level.
This doesn't imply that breaking the code up into many functions will make it quicker, it only implies that moving code from global to inside a function will improve access times, which might result in an overall boost in speed.
Even if you were to shove all your code inside one function just for the sake of using local scope instead of global, it doesn't guarantee a performance improvement, which can only be determined by actually profiling the code. This is because function calls have a relatively high overhead in Python which might dwarf any performance gain from faster local access times.
There's a lot of info on the page at the first link you provided that confirms this.
As I understand it, that performance boost only applies when global variables are moved into a local scope of a function, as it improves their access time. Further breaking your code down into more functions won't provide any similar boost.
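You can see the mechanism with the standard dis module (a minimal sketch; the function and variable names are arbitrary):

import dis

g = 1

def local_lookup():
    x = 1
    return x + x     # x is read with LOAD_FAST: an index into the frame's local array

def global_lookup():
    return g + g     # g is read with LOAD_GLOBAL: a dict lookup by name

dis.dis(local_lookup)
dis.dis(global_lookup)

LOAD_FAST fetches a slot by index from the frame's local array, while LOAD_GLOBAL hashes the name and looks it up in the module's dict, which is the access-time difference described above.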

Are there advantages to using the Python/C interface instead of Cython?

I want to extend python and numpy by writing some modules in C or C++, using BLAS and LAPACK. I also want to be able to distribute the code as standalone C/C++ libraries. I would like these libraries to support both single- and double-precision floats. Some examples of functions I will write are conjugate gradient for solving linear systems and accelerated first-order methods. Some functions will need to call a Python function from the C/C++ code.
After playing a little with the Python/C API and the Numpy/C API, I discovered that many people advocate the use of Cython instead (see for example this question or this one). I am not an expert on Cython, but it seems that for some cases you still need to use the Numpy/C API and know how it works. Given that I already have (some little) knowledge of the Python/C API and none of Cython, I was wondering if it makes sense to keep on using the Python/C API, and whether using this API has some advantages over Cython. In the future, I will certainly develop some stuff not involving numerical computing, so this question is not only about numpy. One of the things I like about the Python/C API is that I learn something about how the Python interpreter works.
Thanks.
The current "top answer" sounds a bit too much like FUD in my ears. For one, it is not immediately obvious that the Average Developer would write faster code in C than what NumPy+Cython gives you anyway. Quite the contrary, the time it takes to even get the necessary C code to work correctly in a Python environment is usually much better invested in writing a quick prototype in Cython, benchmarking it, optimising it, rewriting it in a faster way, benchmarking it again, and then deciding if there is anything in it that truly requires the 5-10% more performance that you may or may not get from rewriting 2% of the code in hand-tuned C and calling it from your Cython code.
I'm writing a library in Cython that currently has about 18K lines of Cython code, which translate to almost 200K lines of C code. I once managed to get a speed-up of almost 25% for a couple of very important internal base level functions, by injecting some 20 lines of hand-tuned C code in the right places. It took me a couple of hours to rewrite and optimise this tiny part. That's truly nothing compared to the huge amount of time I saved by not writing (and having to maintain) the library in plain C in the first place.
Even if you know C a lot better than Cython, if you know Python and C, you will learn Cython so quickly that it's worth the investment in any case, especially when you are into numerics. 80-95% of the code you write will benefit so much from being written in a high-level language, that you can safely lay back and invest half of the time you saved into making your code just as fast as if you had written it in a low-level language right away.
That being said, your comment that you want "to be able to distribute the code as standalone C/C++ libraries" is a valid reason to stick to plain C/C++. Cython always depends on CPython, which is quite a dependency. However, using plain C/C++ (except for the Python interface) will not allow you to take advantage of NumPy either, as that also depends on CPython. So, as usual when writing something in C, you will have to do a lot of ground work before you get to the actual functionality. You should seriously think about this twice before you start this work.
First, there is one point in your question I don't get:
[...] also want to be able to distribute the code as standalone C/C++ libraries. [...] Some functions will need to call a Python function from the C/C++ code.
How is this supposed to work?
Next, as to your actual question, there are certainly advantages of using the Python/C API directly:
Most likely, you are more familiar with writing C code than writing Cython code.
Writing your code in C gives you maximum control. To get the same performance from Cython code as from equivalent C code, you'll have to be very careful. You'll not only need to make sure to declare the types of all variables, you'll also have to set some flags adequately -- just one example is bounds checking. You will need intimate knowledge how Cython is working to get the best performance.
Cython code depends on Python. It does not seem to be a good idea to write code that should also be distributed as a standalone C library in Cython.
The main disadvantage of the Python/C API is that it can be very slow if it's used in an inner loop. I'm seeing that calling a Python function takes an 80-160x hit over calling an equivalent C++ function.
If that doesn't bother your code, then you benefit from being able to write some chunks of code in Python, have access to Python libraries, and support callbacks written directly in Python. That also means that you can make some changes without recompiling, making prototyping easier.

Python Language Nuances [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Common Pitfalls in Python
I'm learning Python and I come from a diverse background of programming languages. In the last five years, I've written quite a bit of Java, C++, VB.Net, and PHP. As many of you might agree, once you learn one programming language, learning another is just a matter of learning the differences in syntax and best practices.
Coming down from PHP, I've become very accustomed to a lot of script-style language features. For instance, stuff like this tickles me insides:
# Retrieve the value from the cache; otherwise redownload.
if (!($value = $cache->get($key)))
    # Redownload the value and store in the cache.
    $cache->set($key, $value = redownload($key));
Python, though, doesn't consider assignment to be an expression. OTOH, it does support nice things like the in construct, which I find to be one of the greatest inventions of all time. x in y is so much nicer than !empty($y[$x]).
What other nuances, "missing" features, and performance bottlenecks should I watch out for? I'm hoping to make as seamless a transition into Python development as possible, and hope to learn some of the secrets that will help to smooth out development time and eliminate trial and error. Your insight is appreciated!
This one took me a few hours to figure out when I first encountered it in a real program:
A default argument is evaluated only once, when the function is defined, so a mutable default value is effectively static and shared between calls.
def foo(bar=[]):
    bar.append(1)
    print(bar)

foo()
foo()
This will print
[1]
[1, 1]
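The usual way to avoid the shared default (a sketch of the standard idiom) is to use None as a sentinel and build the list inside the function:

def foo(bar=None):
    if bar is None:
        bar = []          # a fresh list on every call
    bar.append(1)
    print(bar)

foo()  # [1]
foo()  # [1]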
For your example, the usual way would be something like this
try:
    value = cache[key]
except KeyError:
    value = cache[key] = redownload(key)
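If the cache is a plain dict and a missing key is never stored as None, a get-based variant of the same idiom also works (just a sketch, reusing the cache, key, and redownload names from above):

value = cache.get(key)
if value is None:
    value = cache[key] = redownload(key)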
Threads do not do what you think they do, and probably shouldn't be used the way you're used to using them. This is a huge gotcha for many people, especially those used to Java, where the custom is to subclass Thread or implement the Runnable interface to do asynchronous work, and where there is language support for running threads in parallel (on machines with multiple CPU cores).
In general you probably don't want threads at all, but subprocesses. See my answer to the question "python threading and performace?".
(More generally, there might be a better way altogether.)
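For CPU-bound work, a process pool is the usual substitute; here is a minimal sketch using the standard multiprocessing module (cpu_bound is a made-up placeholder for the real work):

from multiprocessing import Pool

def cpu_bound(n):
    # placeholder for real CPU-intensive work
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    pool = Pool()                             # one worker process per CPU core by default
    results = pool.map(cpu_bound, [10 ** 5] * 8)
    pool.close()
    pool.join()
    print(results[:2])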
Exceptions are your friend.
In contrast with languages like C and PHP which use the return value to indicate errors, Python uses exceptions to interrupt the program instead of allowing the errors to cause further problems down the line.
Pythonic code is usually much faster than C-like code.
Something like:
new_list = []
for i in xrange(len(old_list)):
    new_list.append(some_function(old_list[i]))
is better written as:
new_list = [some_function(x) for x in old_list]
or
new_list = map(some_function, old_list)
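If you want to check the claim, timeit makes the comparison easy; a sketch (some_function is stubbed out, and range is used in place of xrange so it runs on Python 3):

import timeit

setup = """
old_list = list(range(1000))
def some_function(x):
    return x * 2
"""

loop_version = """
new_list = []
for i in range(len(old_list)):
    new_list.append(some_function(old_list[i]))
"""

comprehension_version = "new_list = [some_function(x) for x in old_list]"

print(timeit.timeit(loop_version, setup=setup, number=1000))
print(timeit.timeit(comprehension_version, setup=setup, number=1000))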

Program structure in long running data processing python script

For my current job I am writing some long-running (think hours to days) scripts that do CPU-intensive data processing. The program flow is very simple: it proceeds into the main loop, completes the main loop, saves output, and terminates. The basic structure of my programs tends to look like this:
<import statements>
<constant declarations>
<misc function declarations>

def main():
    for blah in blahs():
        <lots of local variables>
        <lots of tightly coupled computation>
        for something in somethings():
            <lots more local variables>
            <lots more computation>
        <etc., etc.>
    <save results>

if __name__ == "__main__":
    main()
This gets unmanageable quickly, so I want to refactor it into something more manageable. I want to make this more maintainable, without sacrificing execution speed.
Each chunk of code relies on a large number of variables, however, so refactoring parts of the computation out into functions would make the parameter lists grow out of hand very quickly. Should I put this sort of code into a Python class and change the local variables into instance attributes? It doesn't make a great deal of sense to me conceptually to turn the program into a class, as the class would never be reused and only one instance would ever be created per run.
What is the best-practice structure for this kind of program? I am using Python, but the question is relatively language-agnostic, assuming a modern language with object-oriented features.
First off, if your program is going to be running for hours/days then the overhead of switching to using classes/methods instead of putting everything in a giant main is pretty much non-existent.
Additionally, refactoring (even if it does involve passing a lot of variables) should help you improve speed in the long run. Profiling an application that is designed well is much easier, because you can pinpoint the slow parts and optimize there. Maybe a new library comes along that's highly optimized for your calculations... a well-designed program will let you plug it in and test right away. Or perhaps you decide to write a C extension module to improve the speed of a subset of your calculations; a well-designed application will make that easy too.
It's hard to give concrete advice without seeing <lots of tightly coupled computation> and <lots more computation>. But I would start by making every for block its own method and go from there.
Not too clean, but works well in little projects...
You can start using modules as if they were singleton instances, and create real classes only when you feel the complexity of the module or the computation justifies them.
If you do that, you would want to use "import module" and not "from module import stuff" -- it's cleaner and will work better if "stuff" can be reassigned. Besides, it's recommended in the Google guidelines.
Using a class (or classes) can help you organize your code.
Simplicity of form (such as through use of class attributes and methods) is important because it helps you see your algorithm, and can help you more easily unit test the parts.
IMO, these benefits far outweigh the slight loss of speed that may come with using OOP.
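To make the earlier suggestions concrete, here is a hedged sketch of one possible shape (the names blahs, Processor, and the method bodies are placeholders, not a prescribed design): shared state becomes attributes on one class, so the helper methods don't need long parameter lists.

class Processor(object):
    """Holds the state shared between processing steps as attributes,
    so the helper methods don't need long parameter lists."""

    def __init__(self, config):
        self.config = config
        self.results = []

    def run(self):
        for blah in self.load_blahs():
            self.process_blah(blah)
        self.save_results()

    def load_blahs(self):
        return []            # placeholder for the real data source

    def process_blah(self, blah):
        # tightly coupled computation reads/writes self.* instead of
        # passing a dozen locals around
        self.results.append(blah)

    def save_results(self):
        pass                 # placeholder for the real output step

if __name__ == "__main__":
    Processor(config=None).run()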

Encapsulation severely hurts performance?

I know this question is kind of stupid; maybe it's just a part of writing code, but it seems defining simple functions can really hurt performance severely... I've tried this simple test:
def make_legal_foo_string(x):
    return "This is a foo string: " + str(x)

def sum_up_to(x):
    return x * (x + 1) / 2

def foo(x):
    return [make_legal_foo_string(x), sum_up_to(x), x + 1]

def bar(x):
    return ''.join([str(foo(x)), " -- bar !! "])
It's very good style and makes the code clear, but it can be three times as slow as just writing it out literally. Calls are unavoidable for functions that have side effects, but it seems almost trivial to identify functions that should literally be replaced with their body every time they appear, translate the source code accordingly, and only then compile it. Same, I think, for magic numbers: reading them from memory doesn't take much time, but if they're not supposed to change, why not just replace every instance of 'magic' with a literal before the code compiles?
Function call overheads are not big; you won't normally notice them. You only see them in this case because your actual code (x*x) is itself so completely trivial. In any real program that does real work, the amount of time spent in function-calling overhead will be negligibly small.
(Not that I'd really recommend using foo, identity and square in the example, in any case; they're so trivial it's just as readable to have them inline and they don't really encapsulate or abstract anything.)
if they're not supposed to be changed then why not just replace every instance of 'magic' with a literal before the code compiles?
Because programs are written to be easy for you to read and maintain. You could replace constants with their literal values, but it'd make the program harder to work with, for a benefit so tiny you'll probably never even be able to measure it: the height of premature optimisation.
Encapsulation is about one thing and one thing only: Readability. If you're really so worried about performance that you're willing to start stripping out encapsulated logic, you may as well just start coding in assembly.
Encapsulation also assists in debugging and feature adding. Consider the following: let's say you have a simple game and need to add code that depletes the player's health under some circumstances. Easy, yes?
def DamagePlayer(dmg):
    player.health -= dmg
This IS very trivial code, so it's very tempting to simply scatter "player.health -=" everywhere. But what if later you want to add a powerup that halves damage done to the player while active? If the logic is still encapsulated, it's easy:
def DamagePlayer(dmg):
    if player.hasCoolPowerUp:
        player.health -= dmg / 2
    else:
        player.health -= dmg
Now, consider if you had neglected to encapsulate that bit of logic because of its simplicity. Now you're looking at coding the same logic into 50 different places, at least one of which you are almost certain to forget, which leads to weird bugs like: "When the player has the powerup, all damage is halved except when hit by AlienSheep enemies..."
Do you want to have problems with Alien Sheep? I don't think so. :)
In all seriousness, the point I'm trying to make is that encapsulation is a very good thing in the right circumstances. Of course, it's also easy to over-encapsulate, which can be just as problematic. Also, there are situations where the speed really truly does matter (though they are rare), and that extra few clock cycles is worth it. About the only way to find the right balance is practice. Don't shun encapsulation because it's slower, though. The benefits usually far outweigh the costs.
I don't know how good python compilers are, but the answer to this question for many languages is that the compiler will optimize calls to small procedures / functions / methods by inlining them. In fact, in some language implementations you generally get better performance by NOT trying to "micro-optimize" the code yourself.
What you are talking about is the effect of inlining functions for gaining efficiency.
It is certainly true in your Python example that encapsulation hurts performance. But there are some counter-examples:
In Java, defining getters and setters instead of public member variables does not result in performance degradation, as the JIT inlines the getters and setters.
Sometimes calling a function repeatedly may be better than inlining, as the code executed may then fit in the cache. Inlining may cause code explosion...
Figuring out what to make into a function and what to just include inline is something of an art. Many factors (performance, readability, maintainability) feed into the equation.
I actually find your example kind of silly in many ways - a function that just returns its argument? Unless it's an overload that's changing the rules, it's stupid. A function to square things? Again, why bother. Your function 'foo' should probably return a string, so that it can be used directly:
''.join([foo(x), " -- bar !! "])
That's probably a more correct level of encapsulation in this example.
As I say, it really depends on the circumstances. Unfortunately, this is the sort of thing that doesn't lend itself well to examples.
IMO, this is related to function call costs, which are usually negligible but not zero. Splitting the code into a lot of very small functions may hurt, especially in interpreted languages where full optimization is not available.
Inlining functions may improve performance but it may also deteriorate. See, for example, C++ FQA Lite for explanations (“Inlining can make code faster by eliminating function call overhead, or slower by generating too much code, causing instruction cache misses”). This is not C++ specific. Better leave optimizations for compiler/interpreter unless they are really necessary.
By the way, I don't see a huge difference between two versions:
$ python bench.py
fine-grained function decomposition: 5.46632194519
one-liner: 4.46827578545
$ python --version
Python 2.5.2
I think this result is acceptable. See bench.py in the pastebin.
There is a performance hit for using functions, since there is the overhead of jumping to a new address, pushing registers on the stack, and returning at the end. This overhead is very small, however, and even in performance-critical systems worrying about such overhead is most likely premature optimization.
Many languages avoid these issues in small, frequently called functions by using inlining, which is essentially what you do above.
Python does not do inlining. The closest thing you can do is use macros to replace the function calls.
This sort of performance issue is better served by another language; if you need the sort of speed gained by inlining (mostly marginal, and sometimes detrimental), then you need to consider not using Python for whatever you are working on.
There is a good technical reason why what you suggested is impossible. In Python, functions, constants, and everything else are accessible at runtime and can be changed at any time if necessary; they could also be changed by outside modules. This is an explicit promise of Python and one would need some extremely important reasons to break it.
For example, here's the common logging idiom:
# beginning of the file xxx.py

log = lambda *x: None

def something():
    ...
    log(...)
    ...
(here log does nothing), and then in some other module or at the interactive prompt:
import xxx
xxx.log = print
xxx.something()
As you see, here log is modified by a completely different module --- or by the user --- so that logging now works. That would be impossible if log were optimized away.
Similarly, if an exception were to happen in make_legal_foo_string (this is possible, e.g. if x.__str__() is broken and returns None), you'd be pointed at a source quote from the wrong line, and perhaps even from the wrong file, in your scenario.
There are some tools that do apply some optimizations to Python code, but not, as far as I know, of the kind you suggested.
