Newbie programmer here. I just started learning some functional programming, and I was wondering what's going on behind the scenes with reduce, a for loop, and built-in functions. When I timed each of these, reduce() took the longest, the for loop inside a function took the second longest, and the built-in function max() took the shortest. Can somebody explain what's going on behind the scenes that causes these speed differences?
I defined the for loop as:
def f(iterable):
    j = next(iterable)
    for i in iterable:
        if i > j:
            j = i
    return j
and then compared it with
max(iterable)
and
reduce(lambda x, y: x if x>y else y, iterable)
and noticed, as stated previously, that using reduce() took the longest, the for loop inside the function took the second longest, and using a built in function max() took the shortest.
Python is an interpreted language. (At least, it's partly interpreted. Technically source code is compiled into byte code which is then interpreted.) Code running in an interpreter is almost always going to be a lot slower than native code running on the raw hardware of your machine.
But a lot of the builtin functions and objects of Python are not written in the Python language itself. A function like max is implemented in C, so it can be pretty fast. It can be a lot faster than pure Python code that the interpreter needs to step through.
Furthermore, some parts of pure Python code are faster than other parts. Function calls are notoriously slower than most other bits of code, so doing a lot of function calls is generally to be avoided if possible in performance-sensitive sections of your code.
So let's examine your three examples again with these performance thoughts in mind. The max function is implemented in C, so it's fastest. The pure-Python function is slower because its loop and comparisons all need to be interpreted, and while it contains several function calls, most of them are to builtin functions (like next, which in turn calls the __next__ method of your iterator, both of which are likely builtins). The slowest example is the one using reduce, which, though it is builtin itself, keeps calling back out to the lambda function you gave it as an argument. The repeated function calls to the relatively slow lambda function are what make it the slowest of your three examples.
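To see this for yourself, here is a minimal timing sketch using the standard timeit module. It assumes Python 3, where reduce lives in functools; the names loop_max and reduce_max and the data size are made up for illustration, and the absolute numbers will vary from machine to machine.

```python
import timeit
from functools import reduce  # in Python 3, reduce lives in functools


def loop_max(iterable):
    # Pure-Python loop: every comparison is executed by the interpreter.
    iterator = iter(iterable)
    best = next(iterator)
    for item in iterator:
        if item > best:
            best = item
    return best


def reduce_max(iterable):
    # reduce itself is implemented in C, but it calls the Python-level
    # lambda once per element.
    return reduce(lambda x, y: x if x > y else y, iterable)


data = list(range(10_000))

for name, func in [("max", max), ("loop", loop_max), ("reduce", reduce_max)]:
    seconds = timeit.timeit(lambda: func(data), number=100)
    print(f"{name:>6}: {seconds:.4f} s for 100 runs")
```

All three return the same value; only the time per run should differ.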
Note that none of these speed differences change the asymptotic performance of your code. All three of your examples are O(N), where N is the number of items in the iterable. And often asymptotic performance is a lot more important than raw per-item speed if you need your code to be able to scale up to a larger problem. If you were instead comparing an exponentially scaling algorithm with an alternative that was linear (or even polynomial), you'd see vastly different performance numbers once the input size got large enough. Of course, it's also possible that you won't care about scalability, if you only need the code to work once for a relatively modest data set. But in that case, the performance differences between builtin functions and lambdas probably don't matter all that much either.
I've seen a few examples of getting Python to do tail call optimization by using a while True loop. E.g.
def tail_exp(base, exponent, acc=1):
    if exponent == 0:
        return acc
    return tail_exp(base, exponent - 1, acc * base)
becomes
def tail_exp_2(base, exponent, acc=1):
    while True:
        if exponent == 0:
            return acc
        exponent, acc = exponent - 1, acc * base
I'm curious to know if this technique is applicable to all/most recursive algorithms in Python, and if there are any downsides or "gotchas" to look out for when optimizing recursive algorithms in this way?
Any recursive algorithm can be replaced by an iterative one. However, some examples will require an additional stack be added to the code, to manage state that is handled by the recursive calls in the original form. With tail recursion, there is no state to be managed, so no separate stack is needed.
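As an illustration of the "additional stack" case, here is a small sketch that sums a nested list. The function names and the sample data are made up for this example; the point is only that the iterative version has to carry its own stack of pending work, which the recursive version gets for free from the call stack.

```python
def nested_sum_recursive(values):
    # The call stack remembers where we are in each enclosing list.
    total = 0
    for value in values:
        if isinstance(value, list):
            total += nested_sum_recursive(value)
        else:
            total += value
    return total


def nested_sum_iterative(values):
    # An explicit stack of lists still to be processed replaces the
    # state that the recursive calls kept implicitly.
    total = 0
    stack = [values]
    while stack:
        current = stack.pop()
        for value in current:
            if isinstance(value, list):
                stack.append(value)
            else:
                total += value
    return total


nested = [1, [2, 3, [4]], [[5], 6]]
print(nested_sum_recursive(nested), nested_sum_iterative(nested))
```

Because addition is order-independent, the iterative version can visit the sublists in a different order and still agree with the recursive one.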
Some programming languages take advantage of that fact and design their compilers to optimize out tail calls in recursive code, producing machine code that is equivalent to a loop. Python does not do tail call optimization, so this isn't really relevant to your question. Rewriting code by hand is not tail call optimization, it's just a particular sort of refactoring.
There are a few reasons Python chooses not to do tail call optimization. It's not because it's impossible. Python code is compiled into byte code, so at least theoretically there's an opportunity to translate a recursive call into a loop if that was desired by the developers (in practice it's a little more complicated: since Python variable names are dynamic, you can't necessarily tell whether a function name refers to what you expect it to at runtime, a fact used by techniques like monkeypatching). However, the biggest problem with tail call optimization is that it generally overwrites useful debugging information that would usually be preserved by a call stack, like exactly how deep in the recursion you are and the exact state of those previous function calls. The Python developers have decided that they prefer the simplicity and debuggability of normal recursion over the performance benefits of tail call optimization, even when the latter is possible.
If you want to rewrite an algorithm from a recursive implementation into an iterative one, you can always do so. In some cases, though, it may get a lot more complicated. Recursive implementations of some algorithms can be a lot shorter, simpler, and easier to reason about, even though iterative equivalents may be faster (and won't hit the recursion limit for large inputs). Converting tail calls into a loop is usually quite simple though. The complicated cases are generally not amenable to tail call optimization either, since they're doing complicated stuff with the values returned by their recursion.
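The recursion limit mentioned above is easy to demonstrate with the two functions from the question. This is a sketch assuming Python 3 (where the error is RecursionError); the depth is chosen relative to whatever sys.getrecursionlimit() reports, so it does not depend on the default value.

```python
import sys


def tail_exp(base, exponent, acc=1):
    # Recursive form from the question: one Python frame per step.
    if exponent == 0:
        return acc
    return tail_exp(base, exponent - 1, acc * base)


def tail_exp_2(base, exponent, acc=1):
    # Hand-written loop: constant stack depth.
    while True:
        if exponent == 0:
            return acc
        exponent, acc = exponent - 1, acc * base


# Pick a depth just beyond what the interpreter allows.
depth = sys.getrecursionlimit() + 100

try:
    tail_exp(1, depth)
    recursive_ok = True
except RecursionError:
    recursive_ok = False

iterative_result = tail_exp_2(1, depth)
print("recursive version finished:", recursive_ok)
print("iterative version result:", iterative_result)
```

The recursive call fails even though it is a tail call, because Python creates a new frame for it regardless.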
I know that recursion is sometimes a lot cleaner than looping, and I'm not asking anything about when I should use recursion over iteration, I know there are lots of questions about that already.
What I'm asking is, is recursion ever faster than a loop? To me it seems like you would always be able to refine a loop and get it to perform more quickly than a recursive function, because the loop avoids constantly setting up new stack frames.
I'm specifically looking for whether recursion is faster in applications where recursion is the right way to handle the data, such as in some sorting functions, in binary trees, etc.
This depends on the language being used. You wrote 'language-agnostic', so I'll give some examples.
In Java, C, and Python, recursion is fairly expensive compared to iteration (in general) because it requires the allocation of a new stack frame. In some C compilers, one can use a compiler flag to eliminate this overhead, which transforms certain types of recursion (actually, certain types of tail calls) into jumps instead of function calls.
In functional programming language implementations, sometimes, iteration can be very expensive and recursion can be very cheap. In many, recursion is transformed into a simple jump, but changing the loop variable (which is mutable) sometimes requires some relatively heavy operations, especially on implementations which support multiple threads of execution. Mutation is expensive in some of these environments because of the interaction between the mutator and the garbage collector, if both might be running at the same time.
I know that in some Scheme implementations, recursion will generally be faster than looping.
In short, the answer depends on the code and the implementation. Use whatever style you prefer. If you're using a functional language, recursion might be faster. If you're using an imperative language, iteration is probably faster. In some environments, both methods will result in the same assembly being generated (put that in your pipe and smoke it).
Addendum: In some environments, the best alternative is neither recursion nor iteration but instead higher order functions. These include "map", "filter", and "reduce" (which is also called "fold"). Not only are these the preferred style, not only are they often cleaner, but in some environments these functions are the first (or only) to get a boost from automatic parallelization — so they can be significantly faster than either iteration or recursion. Data Parallel Haskell is an example of such an environment.
List comprehensions are another alternative, but these are usually just syntactic sugar for iteration, recursion, or higher order functions.
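To make the three styles concrete, here is a short Python sketch that computes the sum of the squares of the even numbers in a list as a loop, with higher-order functions, and with a comprehension-style generator expression. The variable names are illustrative, and note that CPython does not parallelize any of these automatically; the example only shows the shape of the code.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# 1. Plain iteration with a mutable accumulator.
total_loop = 0
for n in numbers:
    if n % 2 == 0:
        total_loop += n * n

# 2. Higher-order functions: filter, then map, then reduce (fold).
evens = filter(lambda n: n % 2 == 0, numbers)
squares = map(lambda n: n * n, evens)
total_hof = reduce(lambda acc, n: acc + n, squares, 0)

# 3. Comprehension syntax, here as a generator expression fed to sum.
total_comp = sum(n * n for n in numbers if n % 2 == 0)

print(total_loop, total_hof, total_comp)
```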
is recursion ever faster than a loop?
No, iteration will always be faster than recursion (in a Von Neumann architecture).
Explanation:
If you build the minimum operations of a generic computer from scratch, "Iteration" comes first as a building block and is less resource intensive than "recursion", ergo is faster.
Building a pseudo-computing-machine from scratch:
Question yourself: What do you need to compute a value, i.e. to follow an algorithm and reach a result?
We will establish a hierarchy of concepts, starting from scratch and defining in first place the basic, core concepts, then build second level concepts with those, and so on.
First Concept: Memory cells, storage, State. To do something you need places to store final and intermediate result values. Let’s assume we have an infinite array of "integer" cells, called Memory, M[0..Infinite].
Instructions: do something - transform a cell, change its value. alter state. Every interesting instruction performs a transformation. Basic instructions are:
a) Set & move memory cells
store a value into memory, e.g.: store 5 m[4]
copy a value to another position: e.g.: store m[4] m[8]
b) Logic and arithmetic
and, or, xor, not
add, sub, mul, div. e.g. add m[7] m[8]
An Executing Agent: a core in a modern CPU. An "agent" is something that can execute instructions. An Agent can also be a person following the algorithm on paper.
Order of steps: a sequence of instructions, i.e.: do this first, do this after, etc. An imperative sequence of instructions. Even one-line expressions are "an imperative sequence of instructions". If you have an expression with a specific "order of evaluation" then you have steps. It means that even a single composed expression has implicit “steps” and also has an implicit local variable (let’s call it “result”). e.g.:
4 + 3 * 2 - 5
(- (+ (* 3 2) 4 ) 5)
(sub (add (mul 3 2) 4 ) 5)
The expression above implies 3 steps with an implicit "result" variable.
// pseudocode
1. result = (mul 3 2)
2. result = (add 4 result)
3. result = (sub result 5)
So even infix expressions, since you have a specific order of evaluation, are an imperative sequence of instructions. The expression implies a sequence of operations to be made in a specific order, and because there are steps, there is also an implicit "result" intermediate variable.
Instruction Pointer: If you have a sequence of steps, you have also an implicit "instruction pointer". The instruction pointer marks the next instruction, and advances after the instruction is read but before the instruction is executed.
In this pseudo-computing-machine, the Instruction Pointer is part of Memory. (Note: Normally the Instruction Pointer will be a “special register” in a CPU core, but here we will simplify the concepts and assume all data (registers included) are part of “Memory”)
Jump - Once you have an ordered number of steps and an Instruction Pointer, you can apply the "store" instruction to alter the value of the Instruction Pointer itself. We will call this specific use of the store instruction by a new name: Jump. We use a new name because it is easier to think about it as a new concept. By altering the instruction pointer we're instructing the agent to “go to step x“.
Infinite Iteration: By jumping back, now you can make the agent "repeat" a certain number of steps. At this point we have infinite Iteration.
1. mov 1000 m[30]
2. sub m[30] 1
3. jmp-to 2 // infinite loop
Conditional - Conditional execution of instructions. With the "conditional" clause, you can conditionally execute one of several instructions based on the current state (which can be set with a previous instruction).
Proper Iteration: Now with the conditional clause, we can escape the infinite loop of the jump back instruction. We have now a conditional loop and then proper Iteration
1. mov 1000 m[30]
2. sub m[30] 1
3. (if not-zero) jmp-to 2 // jump only if the previous
                          // sub instruction did not result in 0
// this loop will be repeated 1000 times
// here we have proper ***iteration***, a conditional loop.
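The conditional loop above can be simulated in a few lines of Python. This is only a sketch of the pseudo-machine described so far: the tuple encoding of instructions, the run function, and the opcode name "jnz" (standing in for "(if not-zero) jump") are all made up for illustration, and the steps are numbered from 0 rather than 1.

```python
def run(program, memory):
    """Execute a tiny instruction list; return how many instructions ran."""
    ip = 0              # instruction pointer
    zero_flag = False   # set by the most recent sub
    executed = 0
    while ip < len(program):
        op, *args = program[ip]
        ip += 1  # advance after reading, before executing
        if op == "mov":
            value, cell = args
            memory[cell] = value
        elif op == "sub":
            cell, value = args
            memory[cell] -= value
            zero_flag = memory[cell] == 0
        elif op == "jnz":
            (target,) = args
            if not zero_flag:
                ip = target  # a jump is just a store into the pointer
        executed += 1
    return executed


memory = [0] * 64
program = [
    ("mov", 1000, 30),  # step 0: m[30] = 1000
    ("sub", 30, 1),     # step 1: m[30] -= 1
    ("jnz", 1),         # step 2: if the result was not zero, go to step 1
]
executed = run(program, memory)
print(executed, memory[30])
```

The body of the loop runs 1000 times, so the machine executes one mov, 1000 sub and 1000 jnz instructions.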
Naming: giving names to a specific memory location holding data or holding a step. This is just a "convenience" to have. We do not add any new instructions by having the capacity to define “names” for memory locations. “Naming” is not an instruction for the agent, it’s just a convenience to us. Naming makes code (at this point) easier to read and easier to change.
#define counter m[30] // name a memory location
mov 1000 counter
loop: // name an instruction pointer location
    sub counter 1
    (if not-zero) jmp-to loop
One-level subroutine: Suppose there’s a series of steps you need to execute frequently. You can store the steps in a named position in memory and then jump to that position when you need to execute them (call). At the end of the sequence you'll need to return to the point of calling to continue execution. With this mechanism, you’re creating new instructions (subroutines) by composing core instructions.
Implementation: (no new concepts required)
Store the current Instruction Pointer in a predefined memory position
jump to the subroutine
at the end of the subroutine, you retrieve the Instruction Pointer from the predefined memory location, effectively jumping back to the following instruction of the original call
Problem with the one-level implementation: You cannot call another subroutine from a subroutine. If you do, you'll overwrite the returning address (global variable), so you cannot nest calls.
To have a better Implementation for subroutines: You need a STACK
Stack: You define a memory space to work as a "stack". You can “push” values on the stack, and also “pop” the last “pushed” value. To implement a stack you'll need a Stack Pointer (similar to the Instruction Pointer) which points to the current “head” of the stack. When you “push” a value, the Stack Pointer is decremented and you store the value. When you “pop”, you get the value at the current Stack Pointer and then the Stack Pointer is incremented.
Subroutines: Now that we have a stack we can implement proper subroutines allowing nested calls. The implementation is similar, but instead of storing the Instruction Pointer in a predefined memory position, we "push" the value of the IP onto the stack. At the end of the subroutine, we just “pop” the value from the stack, effectively jumping back to the instruction after the original call. This implementation, having a “stack”, allows calling a subroutine from another subroutine. With this implementation we can create several levels of abstraction when defining new instructions as subroutines, by using core instructions or other subroutines as building blocks.
Recursion: What happens when a subroutine calls itself? This is called "recursion".
Problem: Overwriting the local intermediate results a subroutine can be storing in memory. Since you are calling/reusing the same steps, if the intermediate results are stored in predefined memory locations (global variables) they will be overwritten on the nested calls.
Solution: To allow recursion, subroutines should store local intermediate results in the stack, therefore, on each recursive call (direct or indirect) the intermediate results are stored in different memory locations.
...
having reached recursion we stop here.
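The idea that each recursive call keeps its intermediate results in its own place on the stack can be sketched in Python by managing that stack by hand. The function names here are illustrative; the second function does with an explicit list what the first gets from the call stack.

```python
def factorial_recursive(n):
    # Each call keeps its own n in its own stack frame.
    if n <= 1:
        return 1
    return n * factorial_recursive(n - 1)


def factorial_with_stack(n):
    stack = []
    # "Call" phase: save each call's local value before going deeper.
    while n > 1:
        stack.append(n)
        n -= 1
    result = 1  # the base case
    # "Return" phase: pop the saved values in reverse order of the calls.
    while stack:
        result *= stack.pop()
    return result


print(factorial_recursive(5), factorial_with_stack(5))
```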
Conclusion:
In a Von Neumann Architecture, clearly "Iteration" is a simpler/basic concept than “Recursion". We have a form of "Iteration" at level 7, while "Recursion" is at level 14 of the concepts hierarchy.
Iteration will always be faster in machine code because it implies fewer instructions and therefore fewer CPU cycles.
Which one is "better"?
You should use "iteration" when you are processing simple, sequential data structures, and everywhere a “simple loop” will do.
You should use "recursion" when you need to process a recursive data structure (I like to call them “Fractal Data Structures”), or when the recursive solution is clearly more “elegant”.
Advice: use the best tool for the job, but understand the inner workings of each tool in order to choose wisely.
Finally, note that you have plenty of opportunities to use recursion. You have Recursive Data Structures everywhere, you’re looking at one now: parts of the DOM supporting what you are reading are a RDS, a JSON expression is a RDS, the hierarchical file system in your computer is a RDS, i.e: you have a root directory, containing files and directories, every directory containing files and directories, every one of those directories containing files and directories...
Recursion may well be faster where the alternative is to explicitly manage a stack, like in the sorting or binary tree algorithms you mention.
I've had a case where rewriting a recursive algorithm in Java made it slower.
So the right approach is to first write it in the most natural way, only optimize if profiling shows it is critical, and then measure the supposed improvement.
Tail recursion is as fast as looping, provided the language implementation optimizes tail calls. Many functional languages do.
Most of the answers here are wrong. The right answer is: it depends. For example, here are two C functions that walk through a tree. First the recursive one:
static
void mm_scan_black(mm_rc *m, ptr p) {
    SET_COL(p, COL_BLACK);
    P_FOR_EACH_CHILD(p, {
        INC_RC(p_child);
        if (GET_COL(p_child) != COL_BLACK) {
            mm_scan_black(m, p_child);
        }
    });
}
And here is the same function implemented using iteration:
static
void mm_scan_black(mm_rc *m, ptr p) {
    stack *st = m->black_stack;
    SET_COL(p, COL_BLACK);
    st_push(st, p);
    while (st->used != 0) {
        p = st_pop(st);
        P_FOR_EACH_CHILD(p, {
            INC_RC(p_child);
            if (GET_COL(p_child) != COL_BLACK) {
                SET_COL(p_child, COL_BLACK);
                st_push(st, p_child);
            }
        });
    }
}
It's not important to understand the details of the code, just that p is a node and that P_FOR_EACH_CHILD does the walking. In the iterative version we need an explicit stack st onto which nodes are pushed and then popped and manipulated.
The recursive function runs much faster than the iterative one. The reason is that in the latter, for each item, a CALL to the function st_push is needed and then another to st_pop.
In the former, you only have the recursive CALL for each node.
Plus, accessing variables on the call stack is incredibly fast. It means you are reading from memory which is likely to always be in the innermost cache. An explicit stack, on the other hand, has to be backed by malloc'ed memory from the heap, which is much slower to access.
With careful optimization, such as inlining st_push and st_pop, I can reach roughly parity with the recursive approach. But at least on my computer, the cost of accessing heap memory is bigger than the cost of the recursive call.
But this discussion is mostly moot because recursive tree walking is incorrect. If you have a large enough tree, you will run out of callstack space which is why an iterative algorithm must be used.
Most answers here forget the obvious culprit for why recursion is often slower than iterative solutions. It's linked with the build-up and tear-down of stack frames but is not exactly that. It's generally a big difference in the storage of the auto variables for each recursion. In an iterative algorithm with a loop, the variables are often held in registers, and even if they spill, they will reside in the Level 1 cache. In a recursive algorithm, all intermediate states of the variables are stored on the stack, meaning they will generate many more spills to memory. This means that even if it performs the same number of operations, it will have a lot more memory accesses in the hot loop and, what makes it worse, these memory operations have a lousy reuse rate, making the caches less effective.
TL;DR recursive algorithms have generally a worse cache behavior than iterative ones.
Consider what absolutely must be done for each, iteration and recursion.
iteration: a jump to beginning of loop
recursion: a jump to beginning of called function
You see that there is not much room for differences here.
(I assume recursion being a tail-call and compiler being aware of that optimization).
In general, no, recursion will not be faster than a loop in any realistic usage that has viable implementations in both forms. I mean, sure, you could code up loops that take forever, but there would be better ways to implement the same loop that could outperform any implementation of the same problem via recursion.
You hit the nail on the head regarding the reason; creating and destroying stack frames is more expensive than a simple jump.
However, do note that I said "has viable implementations in both forms". For things like many sorting algorithms, there tends to not be a very viable way of implementing them that doesn't effectively set up its own version of a stack, due to the spawning of child "tasks" that are inherently part of the process. Thus, recursion may be just as fast as attempting to implement the algorithm via looping.
Edit: This answer is assuming non-functional languages, where most basic data types are mutable. It does not apply to functional languages.
In any realistic system, no, creating a stack frame will always be more expensive than an INC and a JMP. That's why really good compilers automatically transform tail recursion into a call to the same frame, i.e. without the overhead, so you get the more readable source version and the more efficient compiled version. A really, really good compiler should even be able to transform normal recursion into tail recursion where that is possible.
Functional programming is more about "what" rather than "how".
The language implementors will find a way to optimize how the code works underneath, if we don't try to make it more optimized than it needs to be. Recursion can also be optimized within the languages that support tail call optimization.
What matters more from a programmer's standpoint is readability and maintainability rather than optimization in the first place. Again, "premature optimization is the root of all evil".
This is a guess. Generally, recursion probably doesn't beat looping often, if ever, on problems of decent size when both use really good algorithms (not counting implementation difficulty). It may be different in a language with tail call optimization (given a tail-recursive algorithm, and with loops also part of the language), where the two would probably perform very similarly, and recursion might even be preferable some of the time.
In theory, they are the same thing.
A recursion and a loop with the same O() complexity run at the same theoretical speed, but of course the real speed depends on the language, compiler, and processor.
For example, raising a number to a power can be coded iteratively in O(log n):
int power(int t, int k) {
    int res = 1;
    while (k) {
        if (k & 1) res *= t;
        t *= t;
        k >>= 1;
    }
    return res;
}
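For comparison, here is a sketch in Python of the same exponentiation-by-squaring idea written both ways. The function names are made up, and both versions assume a non-negative integer exponent; each performs O(log n) multiplications, so any speed difference comes from call overhead rather than from the algorithm.

```python
def power_iterative(base, exponent):
    # Direct translation of the loop above.
    result = 1
    while exponent:
        if exponent & 1:
            result *= base
        base *= base
        exponent >>= 1
    return result


def power_recursive(base, exponent):
    # Same O(log n) idea: halve the exponent on every call.
    if exponent == 0:
        return 1
    half = power_recursive(base, exponent // 2)
    if exponent & 1:
        return half * half * base
    return half * half


print(power_iterative(3, 13), power_recursive(3, 13))
```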
Here is an example where recursion ran faster than a for loop in Java. This is a program which performs Bubble Sort on two arrays. The recBubbleSort(....) method sorts the array arr using recursion, and the bbSort(....) method just uses looping to sort the array narr. The data are the same in both arrays.
public class BBSort_App {
    public static void main(String args[]) {
        int[] arr = {231,414235,23,543,245,6,324,-32552,-4};
        long time = System.nanoTime();
        recBubbleSort(arr, arr.length-1, 0);
        time = System.nanoTime() - time;
        System.out.println("Time Elapsed: "+time+"nanos");
        disp(arr);
        int[] narr = {231,414235,23,543,245,6,324,-32552,-4};
        time = System.nanoTime();
        bbSort(narr);
        time = System.nanoTime()-time;
        System.out.println("Time Elapsed: "+time+"nanos");
        disp(narr);
    }
    static void disp(int[] origin) {
        System.out.print("[");
        for(int b: origin)
            System.out.print(b+", ");
        System.out.println("\b\b \b]");
    }
    static void recBubbleSort(int[] origin, int i, int j) {
        if(i>0)
            if(j!=i) {
                if(origin[i]<origin[j]) {
                    int temp = origin[i];
                    origin[i] = origin[j];
                    origin[j] = temp;
                }
                recBubbleSort(origin, i, j+1);
            }
            else
                recBubbleSort(origin, i-1, 0);
    }
    static void bbSort(int[] origin) {
        for(int out=origin.length-1;out>0;out--)
            for(int in=0;in<out;in++)
                if(origin[out]<origin[in]) {
                    int temp = origin[out];
                    origin[out] = origin[in];
                    origin[in] = temp;
                }
    }
}
Running the test even 50 times gave almost the same results:
The answers given to this question are satisfactory but lack simple examples. Can anybody give the reason why this recursion runs faster?
(Python 2.7.8 Windows)
I'm doing a comparison between different sorting algorithms (quick, bubble and insertion), and mostly it's going as expected: quick sort is considerably faster with long lists, and bubble and insertion are faster with very short lists and already sorted ones.
What's causing a problem is quick sort and the aforementioned "already sorted" lists. I can sort lists of even 100000 items without problems with this, but with lists of integers from 0...n the limit seems to be considerably lower. 0...500 works but even 0...1000 gives:
RuntimeError: maximum recursion depth exceeded in cmp
Quick Sort:
def quickSort(myList):
    if myList == []:
        return []
    else:
        pivot = myList[0]
        lesser = quickSort([x for x in myList[1:] if x < pivot])
        greater = quickSort([x for x in myList[1:] if x >= pivot])
        myList = lesser + [pivot] + greater
        return myList
Is there something wrong with the code, or am I missing something?
There are two things going on.
First, Python intentionally limits recursion to a fixed depth. Unlike, say, Scheme, which will keep allocating frames for recursive calls until you run out of memory, Python (at least the most popular implementation, CPython) will only allocate sys.getrecursionlimit() frames (defaulting to 1000) before failing. There are reasons for that,* but really, that isn't relevant here; just the fact that it does this is what you need to know about.
Second, as you may already know, while QuickSort is O(N log N) with most lists, it has a worst case of O(N^2)—in particular (using the standard pivot rules) with already-sorted lists. And when this happens, your stack depth can end up being O(N). So, if you have 1000 elements, arranged in worst-case order, and you're already one frame into the stack, you're going to overflow.
You can work around this in a few ways:
Rewrite the code to be iterative, with an explicit stack, so you're only limited by heap memory instead of stack depth.
Make sure to always recurse into the shorter side first, rather than the left side. This means that even in the O(N^2) case, your stack depth is still O(log N). But only if you've already done the previous step.**
Use a random, median-of-three, or other pivot rule that makes common cases not like already-sorted worst-case. (Of course someone can still intentionally DoS your code; there's really no way to avoid that with quicksort.) The Wikipedia article has some discussion on this, and links to the classic Sedgewick and Knuth papers.
Use a Python implementation with an unlimited stack.***
sys.setrecursionlimit(max(sys.getrecursionlimit(), len(myList)+CONSTANT)). This way, you'll fail right off the bat for an obvious reason if you can't make enough space, and usually won't fail otherwise. (But you might: you could be starting the sort already 900 steps deep in the stack…) But this is a bad idea.**** Besides, you have to figure out the right CONSTANT, which is impossible in general.*****
* Historically, the CPython interpreter recursively calls itself for recursive Python function calls. And the C stack is fixed in size; if you overrun the end, you could segfault, stomp all over heap memory, or all kinds of other problems. This could be changed—in fact, Stackless Python started off as basically just CPython with this change. But the core devs have intentionally chosen not to do so, in part because they don't want to encourage people to write deeply recursive code.
** Or if your language does automatic tail call elimination, but Python doesn't do that. But, as gnibbler points out, you can write a hybrid solution—recurse on the small end, then manually unwrap the tail recursion on the large end—that won't require an explicit stack.
*** Stackless and PyPy can both be configured this way.
**** For one thing, eventually you're going to crash the C stack.
***** The constant isn't really constant; it depends on how deep you already are in the stack (computable non-portably by walking sys._getframe() up to the top) and how much slack you need for comparison functions, etc. (not computable at all, you just have to guess).
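Here is a minimal sketch of the hybrid approach mentioned in the second footnote, combined with a random pivot: recurse only into the smaller partition and loop on the larger one. It sorts in place, unlike the list-building version in the question, and the names quicksort and partition, the Lomuto partition scheme, and the choice of random.randint are assumptions for illustration, not something the answer prescribes.

```python
import random


def partition(items, lo, hi):
    # Lomuto partition around a randomly chosen pivot.
    k = random.randint(lo, hi)
    items[k], items[hi] = items[hi], items[k]
    pivot = items[hi]
    store = lo
    for i in range(lo, hi):
        if items[i] < pivot:
            items[i], items[store] = items[store], items[i]
            store += 1
    items[store], items[hi] = items[hi], items[store]
    return store


def quicksort(items, lo=0, hi=None):
    """Sort items[lo..hi] in place."""
    if hi is None:
        hi = len(items) - 1
    while lo < hi:
        p = partition(items, lo, hi)
        # Recurse into the smaller side, then loop on the larger side.
        # The smaller side holds at most half the items, so the
        # recursion depth stays O(log N) even when the splits are bad.
        if p - lo < hi - p:
            quicksort(items, lo, p - 1)
            lo = p + 1
        else:
            quicksort(items, p + 1, hi)
            hi = p - 1


data = list(range(5000))  # already sorted: the bad case from the question
quicksort(data)
print(data[:5], data[-5:])
```

The running time can still degrade to O(N^2) on unlucky inputs (for example, many equal items with this partition scheme), but the stack depth no longer grows with it.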
You're choosing the first item of each sublist as the pivot. If the list is already in order, this means that your greater sublist is all the items but the first, rather than about half of them. Essentially, each recursive call manages to process only one item. Which means the depth of recursive calls you'll need to make will be about the same as the number of items in the full list. Which overflows Python's built-in limit once you hit about 1000 items. You will have a similar problem sorting lists that are already in reversed order.
To correct this, use one of the workarounds suggested in the literature, such as choosing an item at random to be the pivot or the median of the first, middle, and last items.
Always choosing the first (or last) element as the pivot causes problems for quicksort: worst-case performance for some common inputs, as you have seen.
One technique that works fairly well is to choose the median of the first, middle and last elements.
You don't want to make the pivot selection too complicated, or it will dominate the runtime of the sort.
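As a sketch of the median-of-three rule mentioned above, here is a variant of the list-building quicksort from the question. The names median_of_three and quick_sort are made up for this example, and the three-way split (lesser, equal, greater) is an addition of mine so that repeated values cannot cause endless recursion.

```python
def median_of_three(values):
    # Middle value of the first, middle and last items.
    first = values[0]
    middle = values[len(values) // 2]
    last = values[-1]
    return sorted([first, middle, last])[1]


def quick_sort(values):
    if len(values) <= 1:
        return list(values)
    pivot = median_of_three(values)
    lesser = [x for x in values if x < pivot]
    equal = [x for x in values if x == pivot]
    greater = [x for x in values if x > pivot]
    return quick_sort(lesser) + equal + quick_sort(greater)


already_sorted = list(range(2000))
print(quick_sort(already_sorted) == already_sorted)
```

On an already sorted or reversed list the pivot is the middle item, so the list splits roughly in half at each level and the recursion depth stays far below the default limit.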
If I defined a function like this:
def ccid_year(seq):
    year, prefix, index, suffix = seq
    return year
Is Python allowed to optimize it to be effectively:
def ccid_year(seq):
    return seq[0]
I'd prefer to write the first function because it documents the format of the data being passed in but would hope that Python would generate code that is effectively as efficient as the second definition.
The two functions are not equivalent:
def ccid_year_1(seq):
    year, prefix, index, suffix = seq
    return year

def ccid_year_2(seq):
    return seq[0]

arg = {1:'a', 2:'b', 0:'c', 3:'d'}
print ccid_year_1(arg)
print ccid_year_2(arg)
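The first call prints 0 and the second prints c. (That is the Python 2 result. On Python 3.7 and later, where dicts preserve insertion order, the first call would return 1, which is still a key rather than a value.)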
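The difference also shows up with ordinary inputs. The following sketch, written for Python 3, feeds both functions a tuple that is too short and a one-shot generator; the sample values are made up for illustration.

```python
def ccid_year_1(seq):
    year, prefix, index, suffix = seq
    return year


def ccid_year_2(seq):
    return seq[0]


record = (2012, "AB", 7, "x")
same_for_valid_input = ccid_year_1(record) == ccid_year_2(record)

# A tuple with only three items: unpacking checks the length,
# indexing does not.
too_short = (2012, "AB", 7)
try:
    ccid_year_1(too_short)
    unpack_failed = False
except ValueError:
    unpack_failed = True
index_result = ccid_year_2(too_short)

# A generator can be unpacked, but it cannot be indexed.
unpack_result = ccid_year_1(x for x in record)
try:
    ccid_year_2(x for x in record)
    index_failed = False
except TypeError:
    index_failed = True

print(same_for_valid_input, unpack_failed, index_result,
      unpack_result, index_failed)
```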
I'll answer the question at face value later, but first: when in doubt, benchmark it! Before that, though, recall that most time is spent in a small portion of the code (i.e., most code is irrelevant to performance!) and that, in CPython, function call overhead usually dominates small inefficiencies. Not to mention that large-scale algorithmic inefficiencies (a.k.a. freaking stupid code) dwarf micro-optimization concerns.
So either don't worry about this at all, or if you have reason to worry about it, first benchmark alternatives and second don't put it in a function. Note that "reasons to worry about it" must be weighed against the time spent worrying, and the maintenance burden (if there is one) of the manual optimization.
CPython, the reference implementation you most likely use, is very conservative about optimizing at this level. While there is a peephole optimizer operating on bytecode, it is limited in scale. More generally, you can't expect much optimization crossing a single statement. The problem with statically optimizing Python code is that there are a billion ways even the most innocent-looking program fragment can call into arbitrary code, which might do anything at all, so you can't omit these calls.
While we're at it, your proposed optimization is invalid (in the sense that the program doesn't have the same behavior) if seq is of the wrong type (not a sequence, or a very weird sequence) or length (not exactly four items long)! Any program claiming to implement Python must maintain such differences, so it won't do the transformation you suggest literally. I assume this was just an off-hand illustration, but it does indicate you seriously underestimate how complex Python is (to implement, and doubly so to optimize). I and others have written about this at length before, so I'll stop now before this post becomes even larger.
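One way to check what CPython actually does is to look at the bytecode with the standard dis module. This is a sketch that assumes CPython 3; the helper opnames is made up for illustration, and the exact instruction listing differs between versions, so the example only checks whether an unpacking instruction is present.

```python
import dis


def ccid_year_1(seq):
    year, prefix, index, suffix = seq
    return year


def ccid_year_2(seq):
    return seq[0]


def opnames(func):
    # Names of the bytecode instructions CPython compiled for func.
    return [instruction.opname for instruction in dis.get_instructions(func)]


# On CPython the unpacking survives compilation: the first function
# still contains an UNPACK_SEQUENCE instruction, the second does not.
print("UNPACK_SEQUENCE" in opnames(ccid_year_1))
print("UNPACK_SEQUENCE" in opnames(ccid_year_2))

# Uncomment to see the full listing for your interpreter version:
# dis.dis(ccid_year_1)
```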
PyPy on the other hand will, if this function is indeed called from a hot loop, probably optimize this and a million other things you didn't even think of, while compiling it down to a machine code loop that iterates faster than any Python loop could ever iterate on CPython. It will still contain a few checks to break out of the loop and take the proper action (e.g. raise an exception) if necessary, but they'll also be highly efficient if not triggered.
I do not know much about IronPython and Jython and other implementations, but if their lack of consistent several-times-faster-than-CPython benchmark results is any indicator, they do not perform significant optimizations. While the VMs IronPython and Jython include JIT compilers (not - but not quite - entirely unlike PyPy's), these JIT compilers are built for very different languages, and I'd be very surprised if they could look through the mess of code IronPython/Jython must execute to achieve Python semantics and perform such optimizations on it.