Python: Reduce vs For loop vs built in functions - python

newbie programmer here. Just started learning some functional programming and I was wondering what's going on behind the scenes in the various scenarios of reduce, a for loop, and built in functions. One thing I noticed when I calculated the times for running each of these was that using reduce() took the longest, the for loop inside the function took the second longest, and using a built in function max() took the shortest. Can somebody explain what's going on behind the scenes that causes these speed differences?
I defined the for loop as:
def f(iterable):
    j = next(iterable)
    for i in iterable:
        if i > j:
            j = i
    return j
and then compared it with
max(iterable)
and
reduce(lambda x, y: x if x>y else y, iterable)
and noticed, as stated previously, that using reduce() took the longest, the for loop inside the function took the second longest, and using a built in function max() took the shortest.

Python is an interpreted language. (At least, it's partly interpreted. Technically source code is compiled into byte code which is then interpreted.) Code running in an interpreter is almost always going to be a lot slower than native code running on the raw hardware of your machine.
But a lot of the builtin functions and objects of Python are not written in the Python language itself. A function like max is implemented in C, so it can be pretty fast. It can be a lot faster than pure Python code that the interpreter has to step through one bytecode instruction at a time.
Furthermore, some parts of pure Python code are faster than other parts. Function calls are notoriously slower than most other bits of code, so doing a lot of function calls is generally to be avoided if possible in performance-sensitive sections of your code.
So let's examine your three examples again with these performance thoughts in mind. The max function is implemented in C, so it's fastest. The pure-Python function is slower because its loop and comparisons all need to be interpreted, and while it contains several function calls, most of them are to builtin functions (like next, which in turn calls the __next__ method of your iterator, both of which are likely builtins). The slowest example is the one using reduce, which, though it is builtin itself, keeps calling back out to the lambda function you gave it as an argument. The repeated function calls to the relatively slow lambda function are what make it the slowest of your three examples.
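The comparison described above can be reproduced with a short timing script. This is only a sketch: the data, the run count, and the helper names loop_max and reduce_max are made up for illustration, and it assumes Python 3, where reduce has to be imported from functools. Absolute timings will vary by machine and Python version.

```python
import timeit
from functools import reduce  # in Python 3, reduce lives in functools


def loop_max(iterable):
    """Pure-Python maximum, mirroring the for-loop version in the question."""
    it = iter(iterable)
    best = next(it)
    for item in it:
        if item > best:
            best = item
    return best


def reduce_max(iterable):
    """Maximum via reduce; the lambda is called once per item."""
    return reduce(lambda x, y: x if x > y else y, iterable)


data = list(range(10_000))  # hypothetical test data

for name, func in [("max", max), ("for loop", loop_max), ("reduce", reduce_max)]:
    # Wrap the call in a lambda so timeit can run it repeatedly.
    seconds = timeit.timeit(lambda: func(data), number=50)
    print(f"{name:>8}: {seconds:.4f} s for 50 runs")
```

All three compute the same value; only the amount of interpreted work per item differs.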
Note that none of these speed differences change the asymptotic performance of your code. All three of your examples are O(N) where N is the number of items in the iterable. And often asymptotic performance is a lot more important than raw per-item speed if you need your code to be able to scale up to a larger problem. If you were instead comparing an exponentially scaling algorithm with an alternative that was linear (or even polynomial), you'd see vastly different performance numbers once the input size got large enough. Of course it's also possible that you won't care about scalability, if you only need the code to work once for a relatively modest data set. But in that case, the performance differences between builtin functions and lambdas probably don't matter all that much either.

Related

Efficiency of for loops in python3

I am currently learning Python (3), having mostly experience with R as main programming language. While in R for-loops have mostly the same functionality as in Python, I was taught to avoid using it for big operations and instead use apply, which is more efficient.
My question is: how efficient are for-loops in Python, are there alternatives and is it worth exploring those possibilities as a Python newbie?
For example:
p = some_candidate_parameter_generator(data)
for i in p:
    fit_model_with_parameter(data, i)
Bear with me, it is tricky to give an example without going too much into specific code. But this is something that in R I would have written with apply, especially if p is large.
The comments correctly point out that for loops are "only as efficient as your logic"; however, the range and xrange in Python do have performance implications, and this may be what you had in mind when asking this question. These methods have nothing to do with the intrinsic performance of for loops though.
In Python 3, xrange is gone and range has taken over its lazy behavior; in Python versions before 3.0, however, there was a distinction: range built the entire list in memory and then iterated over each item, while xrange was more akin to a generator, where each item was produced only when needed and discarded after it was iterated over.
After your updated question:
In other words, if you have a giant list of items that you need to iterate over via a for loop, it is often more memory efficient to use a generator, not a list or a tuple, etc. Again though, this has nothing to do with how the Python for-loop operates, but more to do with what you're iterating over. If in doubt, use a generator, and your memory-efficiency will be as good as it will get with Python.
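To make the memory point concrete, here is a small sketch comparing a list with a generator expression. The size n and the variable names are arbitrary choices for illustration. Note that sys.getsizeof reports only the size of the container object itself, not of the integers it refers to, so the real gap is larger than the printed numbers suggest.

```python
import sys

n = 100_000  # hypothetical size

squares_list = [i * i for i in range(n)]   # all items exist at once
squares_gen = (i * i for i in range(n))    # items are produced on demand

list_bytes = sys.getsizeof(squares_list)   # grows with n
gen_bytes = sys.getsizeof(squares_gen)     # small and independent of n

print(list_bytes, gen_bytes)

# Both give the same total, but the generator can only be consumed once.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
```

The for loop itself behaves the same in both cases; only the object being iterated over changes.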

Is there any way to "call a function" in python without incurring the usual performance hit?

Captain Hindsight, reporting in:
After reading through the comment and answer and running a few tests, I found out that I had made a subtle error in my calculations. Turns out, I was comparing compiled lookups to interpreted calls. When I precompiled the call using the NON-IPython line magic version (i.e. timeit.timeit(codestr, setup_codestr)), I found that the function calls were indeed on the same order of magnitude as the lookups :)
Now there's a whole world of caching function results, precompiling functions, and precompiling types to explore! ..and that's nice :)
For posterity:
I realize that sounds like a strange question, but someone might know a way around this, and that would be great. So here goes:
If I do something like:
%%timeit somelist[42]
Then I get times in the 90 nanosecond range. A slice will get it up to 190ish; and, to my pleasant surprise, even big crazy ones were still fast. This bad boy, for instance, weighs in at 385 nanoseconds:
%%timeit some_nested_list[2:5][1][6:13]
Here's the thing. Function calls, it seems, are a lot slower than that. I like decomposing problems functionally, and am starting to give functional programming a bit more thought, but the speed difference is significant (3.34 microseconds vs 100-150 nanoseconds (realistic actual avgs of conditionals, etc)). The following takes 3.34 micros:
def func():
    some_nested_list[2:5][1][6:13]
%%timeit func()
So, there's presumably a lot of functional programmers out there? You all must have dealt with this little hiccup? Someone care to point me in the right direction?
Not really. Python function calls involve a certain amount of overhead for setting up the stack frame, etc., and you can't eliminate that overhead while still writing a Python function. The reason the operations in your example are fast is that you're doing them on a list, and lists are written in C.
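One way to see the call overhead in isolation is to time the same expression inline and wrapped in a function, using the plain timeit module rather than the IPython magic. This is a sketch under assumptions: the contents of some_nested_list are invented here, and the numbers depend on the machine and Python version.

```python
import timeit

# Setup code runs once and is not included in the measured time.
setup = """
some_nested_list = [list(range(20)) for _ in range(10)]

def func():
    return some_nested_list[2:5][1][6:13]
"""

number = 100_000
inline = timeit.timeit("some_nested_list[2:5][1][6:13]", setup=setup, number=number)
called = timeit.timeit("func()", setup=setup, number=number)

print(f"inline: {inline / number * 1e9:.0f} ns per run")
print(f"called: {called / number * 1e9:.0f} ns per run")
```

The difference between the two figures is roughly the cost of one Python-level function call.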
One thing to keep in mind is that, in many practical situations, the function call overhead will be small relative to what the function actually does. See this question for some discussion. However, if you move toward a pure-functional style in which each function just evaluates one expression, you may indeed suffer a performance penalty.
An alternative is to look at PyPy, which makes many pure-Python operations faster. I don't know whether it improves function call speed specifically. Also, by using PyPy you restrict the set of libraries you can use.
Finally, there is Cython, which allows you to write code in a language that looks basically the same as Python, but actually compiles to C. This can be much faster than Python in some cases.
The bottom line is that how to speed up your functions depends on what your functions actually do. There is no magic way to make all function calls faster while still keeping everything else about Python the same. If there were, they probably would have already added it to Python.

Is Python allowed to optimize a function definition to eliminate unused code?

If I defined a function like this:
def ccid_year(seq):
    year, prefix, index, suffix = seq
    return year
Is Python allowed to optimize it to be effectively:
def ccid_year(seq):
    return seq[0]
I'd prefer to write the first function because it documents the format of the data being passed in but would hope that Python would generate code that is effectively as efficient as the second definition.
The two functions are not equivalent:
def ccid_year_1(seq):
    year, prefix, index, suffix = seq
    return year
def ccid_year_2(seq):
    return seq[0]
arg = {1:'a', 2:'b', 0:'c', 3:'d'}
print ccid_year_1(arg)
print ccid_year_2(arg)
The first call prints 0 and the second prints c.
I'll answer the question at face value later, but first: when in doubt, benchmark it! Before that, though, recall that most time is spent in a small portion of the code (i.e., most code is irrelevant to performance!) and, in CPython, function call overhead usually dominates small inefficiencies. Not to mention that large-scale algorithmic inefficiencies (a.k.a. freaking stupid code) dwarf micro-optimization concerns.
So either don't worry about this at all, or if you have reason to worry about it, first benchmark alternatives and second don't put it in a function. Note that "reasons to worry about it" must be weighted against the time spent worrying, and the maintenance burden (if there is one) of the manual optimization.
CPython, the reference implementation you most likely use, is very conservative about optimizing at this level. While there is a peephole optimizer operating on bytecode, it is limited in scale. More generally, you can't expect much optimization crossing a single statement. The problem with statically optimizing Python code is that there's a billion ways even the most innocent-looking program fragment can call into arbitrary code, which might do anything at all, so you can't omit these calls.
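You can check what CPython actually compiles with the standard dis module. The sketch below collects the opcode names for both versions of the function; it assumes CPython, and the exact opcode names differ between versions, but the unpacking variant should show an UNPACK_SEQUENCE instruction that the indexing variant lacks.

```python
import dis


def ccid_year_1(seq):
    year, prefix, index, suffix = seq
    return year


def ccid_year_2(seq):
    return seq[0]


# Collect the opcode names CPython generated for each function.
ops_1 = [ins.opname for ins in dis.get_instructions(ccid_year_1)]
ops_2 = [ins.opname for ins in dis.get_instructions(ccid_year_2)]

print(ops_1)
print(ops_2)
```

The unpacking and the stores to the unused names are still present in the first listing, which shows that the compiler did not rewrite the function into the second form.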
While we're at it, your proposed optimization is invalid (in the sense that the program doesn't have the same behavior) if seq is of the wrong type (not a sequence, or a very weird sequence) or length (not exactly four items long)! Any program claiming to implement Python must maintain such differences, so it won't do the transformation you suggest literally. I assume this was just an off-hand illustration, but it does indicate you seriously underestimate how complex Python is (to implement, and doubly so to optimize). I and others have written about this at length before, so I'll stop now before this post becomes even larger.
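The length case is easy to demonstrate. In this sketch the three-item record is a made-up example; the unpacking version raises ValueError for it, while the indexing version happily returns the first item.

```python
def ccid_year_1(seq):
    year, prefix, index, suffix = seq  # requires exactly four items
    return year


def ccid_year_2(seq):
    return seq[0]  # only requires that item 0 exists


short = (2011, "AB", 7)  # hypothetical record that is missing its suffix

try:
    ccid_year_1(short)
    unpack_result = "no error"
except ValueError as exc:
    unpack_result = f"ValueError: {exc}"

index_result = ccid_year_2(short)

print(unpack_result)
print(index_result)
```

An optimizer that silently replaced the first function with the second would hide this error, which is why a conforming implementation cannot do it unconditionally.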
PyPy on the other hand will, if this function is indeed called from a hot loop, probably optimize this and a million other things you didn't even think of, while compiling it down to a machine code loop that iterates faster than any Python loop could ever iterate on CPython. It will still contain a few checks to break out of the loop and take the proper action (e.g. raise an exception) if necessary, but they'll also be highly efficient if not triggered.
I do not know much about IronPython and Jython and other implementations, but if their lack of consistent several-times-faster-than-CPython benchmark results is any indicator, they do not perform significant optimizations. While the VMs IronPython and Jython run on include JIT compilers (almost, but not quite, entirely unlike PyPy's), these JIT compilers are built for very different languages, and I'd be very surprised if they could look through the mess of code IronPython/Jython must execute to achieve Python semantics and perform such optimizations on it.

Optimizing Python Code [closed]

I've been working on one of the coding challenges on InterviewStreet.com and I've run into a bit of an efficiency problem. Can anyone suggest where I might change the code to make it faster and more efficient?
Here's the code
Here's the problem statement if you're interested
If your question is about optimising python code generally (which I think it should be ;) then there are all sorts of interesting things you can do, but first:
You probably shouldn't be obsessively optimising python code! If you're using the fastest algorithm for the problem you're trying to solve and python doesn't do it fast enough you should probably be using a different language.
That said, there are several approaches you can take (because sometimes, you really do want to make python code faster):
Profile (do this first!)
There are lots of ways of profiling python code, but there are two that I'll mention: cProfile (or profile) module, and PyCallGraph.
cProfile
This is what you should actually use, though interpreting the results can be a bit daunting.
It works by recording when each function is entered or exited, and what the calling function was (and tracking exceptions).
You can run a function in cProfile like this:
import cProfile
cProfile.run('myFunction()', 'myFunction.profile')
Then to view the results:
import pstats
stats = pstats.Stats('myFunction.profile')
stats.strip_dirs().sort_stats('time').print_stats()
This will show you in which functions most of the time is spent.
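If you would rather not write a profile file to disk, the same modules can be used in memory. This is a sketch only: slow_sum is a hypothetical stand-in for your own function, and the report is captured in a string buffer so it can be printed or inspected.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares in a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()       # start recording function entries and exits
slow_sum(200_000)
profiler.disable()      # stop recording

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)  # top 5 entries

report = buffer.getvalue()
print(report)
```

Sorting by "cumulative" shows time including callees, while "time" (as used above) shows time spent inside each function alone.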
PyCallGraph
PyCallGraph provides the prettiest and maybe the easiest way of profiling python programs -- and it's a good introduction to understanding where the time in your program is spent. However, it adds significant execution overhead.
To run pycallgraph:
pycallgraph graphviz ./myprogram.py
Simple! You get a png graph image as output (perhaps after a while...)
Use Libraries
If you're trying to do something in python that a module already exists for (maybe even in the standard library), then use that module instead!
Many of the standard library modules are written in C, and they can execute far faster than equivalent python implementations of, say, bisection search.
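As an illustration of the bisection example, the sketch below puts a hand-written binary search next to bisect.bisect_left from the standard library. The function name bisect_left_py and the test data are invented for this example; both should return the same insertion points, with the library version normally running faster in CPython because it has a C implementation.

```python
import bisect


def bisect_left_py(sorted_items, target):
    """Hand-written binary search: leftmost index where target could be inserted."""
    lo, hi = 0, len(sorted_items)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


data = list(range(0, 1000, 3))  # hypothetical sorted data

print(bisect_left_py(data, 500), bisect.bisect_left(data, 500))
```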
Make the Interpreter do as Much of Your Work as You Can
The interpreter will do some things for you, like looping. Really? Yes! You can use the map, reduce, and filter functions to significantly speed up tight loops:
consider:
for x in xrange(0, 100):
    doSomethingWithX(x)
vs:
map(doSomethingWithX, xrange(0,100))
Well obviously this could be faster because the interpreter only has to deal with a single statement, rather than two, but that's a bit vague... in fact, this is faster for two reasons:
all flow control (have we finished looping yet...) is done in the interpreter
the doSomethingWithX function name is only resolved once
In the for loop, each time around the loop python has to check exactly where the doSomethingWithX function is! Even with caching this is a bit of an overhead.
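A Python 3 version of the comparison looks slightly different, because map returns a lazy iterator there and does no work until it is consumed. In this sketch do_something_with_x is a hypothetical stand-in for the function in the example above, and list() is used to force the map to run.

```python
def do_something_with_x(x):
    """Hypothetical per-item work."""
    return x * 2


# Explicit loop: the function name is looked up on every iteration.
results_loop = []
for x in range(100):
    results_loop.append(do_something_with_x(x))

# map: the function object is resolved once and the looping happens in C.
# In Python 3 the result must be consumed (here with list) for anything to run.
results_map = list(map(do_something_with_x, range(100)))

print(results_loop == results_map)
```

Whether map is actually faster depends on the function being called, so it is worth timing both forms on your own workload.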
Remember that Python is an Interpreted Language
(Note that this section really is about tiny tiny optimisations that you shouldn't let affect your normal, readable coding style!)
If you come from a background of programming in a compiled language, like C or Fortran, then some things about the performance of different python statements might be surprising:
try:ing is cheap, ifing is expensive
If you have code like this:
if somethingcrazy_happened:
    uhOhBetterDoSomething()
else:
    doWhatWeNormallyDo()
And doWhatWeNormallyDo() would throw an exception if something crazy had happened, then it would be faster to arrange your code like this:
try:
    doWhatWeNormallyDo()
except SomethingCrazy:
    uhOhBetterDoSomething()
Why? Well, the interpreter can dive straight in and start doing what you normally do; in the first case the interpreter has to do a symbol look up each time the if statement is executed, because the name could refer to something different since the last time the statement was executed! (And a name lookup, especially if somethingcrazy_happened is global, can be nontrivial.)
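A common concrete case of this pattern is a dictionary lookup. The sketch below uses made-up names (counts, get_with_if, get_with_try) and shows the two styles side by side; they return the same results. The try version tends to win only when the exception is rare, since actually raising and catching an exception is comparatively expensive.

```python
counts = {"spam": 3}  # hypothetical data


def get_with_if(key):
    """Check first, then look up: two dictionary operations on the common path."""
    if key in counts:
        return counts[key]
    return 0


def get_with_try(key):
    """Just do it, and handle the rare failure: one lookup on the common path."""
    try:
        return counts[key]
    except KeyError:
        return 0


print(get_with_if("spam"), get_with_try("spam"))
print(get_with_if("eggs"), get_with_try("eggs"))
```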
You mean Who??
Because of cost of name lookups it can also be better to cache global values within functions, and bake-in simple boolean tests into functions like this:
Unoptimised function:
def foo():
    if condition_that_rarely_changes:
        doSomething()
    else:
        doSomethingElse()
Optimised approach, instead of using a variable, exploit the fact that the interpreter is doing a name lookup on the function anyway!
When the condition becomes true:
foo = doSomething # now foo() calls doSomething()
When the condition becomes false:
foo = doSomethingElse # now foo() calls doSomethingElse()
PyPy
PyPy is a python implementation written in python. Surely that means it will run code infinitely slower? Well, no. PyPy actually uses a Just-In-Time compiler (JIT) to run python programs.
If you don't use any external libraries (or the ones you do use are compatible with PyPy), then this is an extremely easy way to (almost certainly) speed up repetitive tasks in your program.
Basically the JIT can generate code that will do what the python interpreter would, but much faster, since it is generated for a single case, rather than having to deal with every possible legal python expression.
Where to look Next
Of course, the first place you should have looked was to improve your algorithms and data structures, and to consider things like caching, or even whether you need to be doing so much in the first place, but anyway:
This page of the python.org wiki provides lots of information about how to speed up python code, though some of it is a bit out of date.
Here's the BDFL himself on the subject of optimising loops.
There are quite a few things, even from my own limited experience that I've missed out, but this answer was long enough already!
This is all based on my own recent experiences with some python code that just wasn't fast enough, and I'd like to stress again that I don't really think any of what I've suggested is actually a good idea, sometimes though, you have to....
First off, profile your code so you know where the problems lie. There are many examples of how to do this, here's one: https://codereview.stackexchange.com/questions/3393/im-trying-to-understand-how-to-make-my-application-more-efficient
You do a lot of indexed access as in:
for pair in range(i-1, j):
    if coordinates[pair][0] >= 0 and coordinates[pair][1] >= 0:
Which could be written more plainly as:
for coord in coordinates[i-1:j]:
    if coord[0] >= 0 and coord[1] >= 0:
List comprehensions are cool and "pythonic", but this code would probably run faster if you didn't create 4 lists:
N = int(raw_input())
coordinates = []
coordinates = [raw_input() for i in xrange(N)]
coordinates = [pair.split(" ") for pair in coordinates]
coordinates = [[int(pair[0]), int(pair[1])] for pair in coordinates]
I would instead roll all those together into one simple loop or if you're really dead set on list comprehensions, encapsulate the multiple transformations into a function which operates on the raw_input().
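A sketch of the "one function, one pass" suggestion follows. To keep it self-contained it parses a list of strings instead of calling raw_input(); the names parse_pair, read_coordinates, and sample_lines are invented for this example.

```python
def parse_pair(line):
    """Turn a line like '3 -4' into [3, -4]."""
    x, y = line.split()
    return [int(x), int(y)]


def read_coordinates(lines):
    """Build the coordinate list in a single pass, without intermediate lists."""
    return [parse_pair(line) for line in lines]


sample_lines = ["1 2", "-3 4", "0 0"]  # hypothetical input lines
coordinates = read_coordinates(sample_lines)
print(coordinates)
```

With real input, the lines argument could be a generator that reads one line per item, so only the final list is ever held in memory.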
This answer shows how I locate code to optimize.
Suppose there is some line of code you could replace, and it is costing, say, 40% of the time.
Then it resides on the call stack 40% of the time.
If you take 10 samples of the call stack, it will appear on 4 of them, give or take.
It really doesn't matter how many samples show it.
If it appears on two or more, and if you can replace it, you will save whatever time it costs.
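The "two or more out of ten" rule can be checked with a little arithmetic. Assuming each stack sample independently lands on the costly line with probability equal to its share of the run time (an idealisation), the chance of seeing it on at least two samples follows a binomial distribution. For a 40% line and 10 samples this comes out at about 95%.

```python
from math import comb


def prob_seen_at_least(k, samples, fraction):
    """Probability that a line costing `fraction` of the time shows up
    on at least k of `samples` independent stack samples."""
    return sum(
        comb(samples, i) * fraction**i * (1 - fraction) ** (samples - i)
        for i in range(k, samples + 1)
    )


p = prob_seen_at_least(2, 10, 0.4)
print(round(p, 3))
```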
Most of the interview street problems seem to be tested in a way that will verify that you have found an algorithm with the right big O complexity rather than that you have coded the solution in the most optimal way possible.
In other words if you are failing some of the test cases due to running out of time the problem is likely that you need to figure out a solution with lower algorithmic complexity rather than micro-optimize the algorithm you have. This is why they generally state that N can be quite large.

Generator speed in python 3

I am going through a link about generators that someone posted. In the beginning he compares the two functions below. On his setup he showed a speed increase of 5% with the generator.
I'm running windows XP, python 3.1.1, and cannot seem to duplicate the results. I keep showing the "old way"(logs1) as being slightly faster when tested with the provided logs and up to 1GB of duplicated data.
Can someone help me understand what's happening differently?
Thanks!
def logs1():
    wwwlog = open("big-access-log")
    total = 0
    for line in wwwlog:
        bytestr = line.rsplit(None,1)[1]
        if bytestr != '-':
            total += int(bytestr)
    return total
def logs2():
    wwwlog = open("big-access-log")
    bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
    getbytes = (int(x) for x in bytecolumn if x != '-')
    return sum(getbytes)
*edit, spacing messed up in copy/paste
For what it's worth, the main purpose of the speed comparison in the presentation was to point out that using generators does not introduce a huge performance overhead. Many programmers, when first seeing generators, might start wondering about the hidden costs. For example, is there all sorts of fancy magic going on behind the scenes? Is using this feature going to make my program run twice as slow?
In general that's not the case. The example is meant to show that a generator solution can run at essentially the same speed, if not slightly faster in some cases (although it depends on the situation, version of Python, etc.). If you are observing huge differences in performance between the two versions though, then that would be something worth investigating.
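To try the comparison without a large log file, both versions can be rewritten to accept any iterable of lines. This is a sketch: the function names and the three sample log lines are made up, and only the last whitespace-separated field (the byte count, or '-') matters to the logic.

```python
def total_bytes_loop(lines):
    """Explicit loop, as in logs1."""
    total = 0
    for line in lines:
        bytestr = line.rsplit(None, 1)[1]   # last whitespace-separated field
        if bytestr != '-':
            total += int(bytestr)
    return total


def total_bytes_gen(lines):
    """Chained generator expressions, as in logs2."""
    bytecolumn = (line.rsplit(None, 1)[1] for line in lines)
    getbytes = (int(x) for x in bytecolumn if x != '-')
    return sum(getbytes)


sample_log = [  # hypothetical access-log lines
    '10.0.0.1 - - [date] "GET /a HTTP/1.1" 200 512',
    '10.0.0.2 - - [date] "GET /b HTTP/1.1" 304 -',
    '10.0.0.3 - - [date] "GET /c HTTP/1.1" 200 1024',
]

print(total_bytes_loop(sample_log), total_bytes_gen(sample_log))
```

Timing both functions with timeit on your own data and Python version is the only reliable way to see which is faster for you.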
In David Beazley's slides that you linked to, he states that all tests were run with "Python 2.5.1 on OS X 10.4.11," and you say you're running tests with Python 3.1 on Windows XP. So, realize you're doing some apples to oranges comparison. I suspect of the two variables, the Python version matters much more.
Python 3 is a different beast than Python 2. Many things have changed under the hood (even within the Python 2 branch). This includes performance optimizations as well as performance regressions (see, for example, Beazley's own recent blog post on I/O in Python 3). For this reason, the Python Performance Tips page states explicitly,
"You should always test these tips with your application and the version of Python you intend to use and not just blindly accept that one method is faster than another."
I should mention that one area that you can count on generators helping is in reducing memory consumption, rather than CPU consumption. If you have a large amount of data where you calculate or extract something from each individual piece, and you don't need the data after, generators will shine. See generator comprehension for more details.
You don't have an answer after almost a half an hour. I'm posting something that makes sense to me, not necessarily the right answer. I figure that this is better than nothing after almost half an hour:
The first algorithm uses a generator. A generator functions by loading the first page of results from the list (into memory) and continually loads the successive pages (into memory) until there is nothing left to read from input.
The second algorithm uses two generators, each with an if statement for a total of two comparisons per loop as opposed to the first algorithm's one comparison.
Also the second algorithm calls the sum function at the end as opposed to the first algorithm that simply keeps adding relevant integers as it keeps encountering them.
As such, for sufficiently large inputs, the second algorithm has more comparisons and an extra function call than the first. This could possibly explain why it takes longer to finish than the first algorithm.
Hope this helps
