Benefits to recursive function over using loops outside of readability? [duplicate] - python

I know that recursion is sometimes a lot cleaner than looping, and I'm not asking anything about when I should use recursion over iteration, I know there are lots of questions about that already.
What I'm asking is, is recursion ever faster than a loop? To me it seems like you would always be able to refine a loop and get it to perform more quickly than a recursive function, because a loop doesn't constantly set up new stack frames.
I'm specifically looking for whether recursion is faster in applications where recursion is the right way to handle the data, such as in some sorting functions, in binary trees, etc.

This depends on the language being used. You wrote 'language-agnostic', so I'll give some examples.
In Java, C, and Python, recursion is fairly expensive compared to iteration (in general) because it requires the allocation of a new stack frame. In some C compilers, one can use a compiler flag to eliminate this overhead, which transforms certain types of recursion (actually, certain types of tail calls) into jumps instead of function calls.
In functional programming language implementations, sometimes, iteration can be very expensive and recursion can be very cheap. In many, recursion is transformed into a simple jump, but changing the loop variable (which is mutable) sometimes requires some relatively heavy operations, especially on implementations which support multiple threads of execution. Mutation is expensive in some of these environments because of the interaction between the mutator and the garbage collector, if both might be running at the same time.
I know that in some Scheme implementations, recursion will generally be faster than looping.
In short, the answer depends on the code and the implementation. Use whatever style you prefer. If you're using a functional language, recursion might be faster. If you're using an imperative language, iteration is probably faster. In some environments, both methods will result in the same assembly being generated (put that in your pipe and smoke it).
Addendum: In some environments, the best alternative is neither recursion nor iteration but instead higher order functions. These include "map", "filter", and "reduce" (which is also called "fold"). Not only are these the preferred style, not only are they often cleaner, but in some environments these functions are the first (or only) to get a boost from automatic parallelization — so they can be significantly faster than either iteration or recursion. Data Parallel Haskell is an example of such an environment.
List comprehensions are another alternative, but these are usually just syntactic sugar for iteration, recursion, or higher order functions.
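To make those alternatives concrete, here is one small computation (summing squares, an arbitrary example of my choosing) sketched in Python as a loop, as a recursion, with higher order functions, and as a comprehension; the point is only that all four express the same thing:

from functools import reduce

data = [1, 2, 3, 4, 5]

# Iteration: an explicit loop with a mutable accumulator.
total_loop = 0
for x in data:
    total_loop += x * x

# Recursion: the list is consumed one element per call.
def sum_squares(xs):
    if not xs:
        return 0
    return xs[0] * xs[0] + sum_squares(xs[1:])

# Higher order functions: map plus reduce ("fold").
total_hof = reduce(lambda acc, x: acc + x, map(lambda x: x * x, data), 0)

# Comprehension / generator expression: syntactic sugar over iteration.
total_comp = sum(x * x for x in data)

assert total_loop == sum_squares(data) == total_hof == total_comp == 55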

is recursion ever faster than a loop?
No. On a von Neumann architecture, iteration will always be faster than recursion.
Explanation:
If you build the minimum operations of a generic computer from scratch, "Iteration" comes first as a building block and is less resource-intensive than "Recursion"; ergo it is faster.
Building a pseudo-computing-machine from scratch:
Question yourself: What do you need to compute a value, i.e. to follow an algorithm and reach a result?
We will establish a hierarchy of concepts, starting from scratch and defining in first place the basic, core concepts, then build second level concepts with those, and so on.
First Concept: Memory cells, storage, State. To do something you need places to store final and intermediate result values. Let’s assume we have an infinite array of "integer" cells, called Memory, M[0..Infinite].
Instructions: do something: transform a cell, change its value, alter state. Every interesting instruction performs a transformation. Basic instructions are:
a) Set & move memory cells
store a value into memory, e.g.: store 5 m[4]
copy a value to another position: e.g.: store m[4] m[8]
b) Logic and arithmetic
and, or, xor, not
add, sub, mul, div. e.g. add m[7] m[8]
An Executing Agent: a core in a modern CPU. An "agent" is something that can execute instructions. An Agent can also be a person following the algorithm on paper.
Order of steps: a sequence of instructions, i.e.: do this first, do this after, etc. An imperative sequence of instructions. Even one-line expressions are "an imperative sequence of instructions". If you have an expression with a specific "order of evaluation" then you have steps. It means that even a single composed expression has implicit "steps" and also has an implicit local variable (let's call it "result"). e.g.:
4 + 3 * 2 - 5
(- (+ (* 3 2) 4 ) 5)
(sub (add (mul 3 2) 4 ) 5)
The expression above implies 3 steps with an implicit "result" variable.
// pseudocode
1. result = (mul 3 2)
2. result = (add 4 result)
3. result = (sub result 5)
So even infix expressions, since you have a specific order of evaluation, are an imperative sequence of instructions. The expression implies a sequence of operations to be made in a specific order, and because there are steps, there is also an implicit "result" intermediate variable.
Instruction Pointer: If you have a sequence of steps, you have also an implicit "instruction pointer". The instruction pointer marks the next instruction, and advances after the instruction is read but before the instruction is executed.
In this pseudo-computing-machine, the Instruction Pointer is part of Memory. (Note: Normally the Instruction Pointer will be a “special register” in a CPU core, but here we will simplify the concepts and assume all data (registers included) are part of “Memory”)
Jump - Once you have an ordered number of steps and an Instruction Pointer, you can apply the "store" instruction to alter the value of the Instruction Pointer itself. We will call this specific use of the store instruction by a new name: Jump. We use a new name because it is easier to think about it as a new concept. By altering the instruction pointer we're instructing the agent to "go to step x".
Infinite Iteration: By jumping back, now you can make the agent "repeat" a certain number of steps. At this point we have infinite Iteration.
1. mov 1000 m[30]
2. sub m[30] 1
3. jmp-to 2 // infinite loop
Conditional - Conditional execution of instructions. With the "conditional" clause, you can conditionally execute one of several instructions based on the current state (which can be set with a previous instruction).
Proper Iteration: Now, with the conditional clause, we can escape the infinite loop of the jump-back instruction. We now have a conditional loop, and therefore proper Iteration.
1. mov 1000 m[30]
2. sub m[30] 1
3. (if not-zero) jump 2 // jump only if the previous
// sub instruction did not result in 0
// this loop will be repeated 1000 times
// here we have proper ***iteration***, a conditional loop.
Naming: giving names to a specific memory location holding data or holding a step. This is just a "convenience" to have. We do not add any new instructions by having the capacity to define "names" for memory locations. "Naming" is not an instruction for the agent, it's just a convenience for us. Naming makes code (at this point) easier to read and easier to change.
#define counter m[30] // name a memory location
mov 1000 counter
loop: // name an instruction pointer location
sub counter 1
(if not-zero) jmp-to loop
One-level subroutine: Suppose there’s a series of steps you need to execute frequently. You can store the steps in a named position in memory and then jump to that position when you need to execute them (call). At the end of the sequence you'll need to return to the point of calling to continue execution. With this mechanism, you’re creating new instructions (subroutines) by composing core instructions.
Implementation: (no new concepts required)
Store the current Instruction Pointer in a predefined memory position
jump to the subroutine
at the end of the subroutine, you retrieve the Instruction Pointer from the predefined memory location, effectively jumping back to the following instruction of the original call
Problem with the one-level implementation: You cannot call another subroutine from a subroutine. If you do, you'll overwrite the returning address (global variable), so you cannot nest calls.
To have a better Implementation for subroutines: You need a STACK
Stack: You define a memory space to work as a "stack"; you can "push" values onto the stack, and also "pop" the last "pushed" value. To implement a stack you'll need a Stack Pointer (similar to the Instruction Pointer), which points to the current "head" of the stack. When you "push" a value, the stack pointer decrements and you store the value. When you "pop", you get the value at the current Stack Pointer and then the Stack Pointer is incremented.
Subroutines: Now that we have a stack we can implement proper subroutines, allowing nested calls. The implementation is similar, but instead of storing the Instruction Pointer in a predefined memory position, we "push" the value of the IP onto the stack. At the end of the subroutine, we just "pop" the value from the stack, effectively jumping back to the instruction after the original call. This implementation, having a "stack", allows calling a subroutine from another subroutine. With this implementation we can create several levels of abstraction when defining new instructions as subroutines, by using core instructions or other subroutines as building blocks.
Recursion: What happens when a subroutine calls itself? This is called "recursion".
Problem: Overwriting the local intermediate results a subroutine may be storing in memory. Since you are calling/reusing the same steps, if the intermediate results are stored in predefined memory locations (global variables) they will be overwritten on the nested calls.
Solution: To allow recursion, subroutines should store local intermediate results in the stack, therefore, on each recursive call (direct or indirect) the intermediate results are stored in different memory locations.
...
having reached recursion we stop here.
Conclusion:
In a Von Neumann Architecture, clearly "Iteration" is a simpler/basic concept than “Recursion". We have a form of "Iteration" at level 7, while "Recursion" is at level 14 of the concepts hierarchy.
Iteration will always be faster in machine code because it implies fewer instructions and therefore fewer CPU cycles.
Which one is "better"?
You should use "iteration" when you are processing simple, sequential data structures, and everywhere a “simple loop” will do.
You should use "recursion" when you need to process a recursive data structure (I like to call them “Fractal Data Structures”), or when the recursive solution is clearly more “elegant”.
Advice: use the best tool for the job, but understand the inner workings of each tool in order to choose wisely.
Finally, note that you have plenty of opportunities to use recursion. You have Recursive Data Structures everywhere; you're looking at one now: parts of the DOM supporting what you are reading are an RDS, a JSON expression is an RDS, the hierarchical file system in your computer is an RDS, i.e. you have a root directory containing files and directories, every directory containing files and directories, every one of those directories containing files and directories...
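As a small illustrative sketch (the function and its name are my own, not part of the answer above), here is a recursive walk over one of those recursive data structures, the file system, where the shape of the code mirrors the shape of the data:

import os

def print_tree(path, depth=0):
    # Each directory level is handled by a fresh recursive call, mirroring
    # the recursive structure of the file system itself. A pathologically
    # deep tree would eventually hit Python's recursion limit.
    name = os.path.basename(path.rstrip(os.sep)) or path
    print("  " * depth + name)
    if os.path.isdir(path):
        for entry in sorted(os.listdir(path)):
            print_tree(os.path.join(path, entry), depth + 1)

# Example: print_tree(".") prints the current directory as an indented tree.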

Recursion may well be faster where the alternative is to explicitly manage a stack, like in the sorting or binary tree algorithms you mention.
I've had a case where rewriting a recursive algorithm in Java made it slower.
So the right approach is to first write it in the most natural way, only optimize if profiling shows it is critical, and then measure the supposed improvement.

Tail recursion is as fast as looping. Many functional languages have tail recursion implemented in them.

Most of the answers here are wrong. The right answer is it depends. For example, here are two C functions which walk through a tree. First the recursive one:
static
void mm_scan_black(mm_rc *m, ptr p) {
    SET_COL(p, COL_BLACK);
    P_FOR_EACH_CHILD(p, {
        INC_RC(p_child);
        if (GET_COL(p_child) != COL_BLACK) {
            mm_scan_black(m, p_child);
        }
    });
}
And here is the same function implemented using iteration:
static
void mm_scan_black(mm_rc *m, ptr p) {
    stack *st = m->black_stack;
    SET_COL(p, COL_BLACK);
    st_push(st, p);
    while (st->used != 0) {
        p = st_pop(st);
        P_FOR_EACH_CHILD(p, {
            INC_RC(p_child);
            if (GET_COL(p_child) != COL_BLACK) {
                SET_COL(p_child, COL_BLACK);
                st_push(st, p_child);
            }
        });
    }
}
It's not important to understand the details of the code. Just know that p represents a node and that P_FOR_EACH_CHILD does the walking. In the iterative version we need an explicit stack st onto which nodes are pushed and then popped and manipulated.
The recursive function runs much faster than the iterative one. The reason is that in the latter, for each item, a CALL to the function st_push is needed and then another to st_pop.
In the former, you only have the recursive CALL for each node.
Plus, accessing variables on the call stack is incredibly fast. It means you are reading from memory which is likely to always be in the innermost cache. An explicit stack, on the other hand, has to be backed by malloc'ed memory from the heap, which is much slower to access.
With careful optimization, such as inlining st_push and st_pop, I can roughly reach parity with the recursive approach. But at least on my computer, the cost of accessing heap memory is bigger than the cost of the recursive call.
But this discussion is mostly moot, because recursive tree walking is incorrect: if you have a large enough tree, you will run out of call stack space, which is why an iterative algorithm must be used.

Most answers here forget the obvious culprit for why recursion is often slower than iterative solutions. It's linked to the building up and tearing down of stack frames, but it's not exactly that. It's generally a big difference in the storage of the automatic (local) variables for each recursion. In an iterative algorithm with a loop, the variables are often held in registers, and even if they spill, they will reside in the level 1 cache. In a recursive algorithm, all intermediary states of the variables are stored on the stack, meaning they generate many more spills to memory. This means that even if it performs the same number of operations, it will have a lot more memory accesses in the hot loop, and what makes it worse, those memory operations have a lousy reuse rate, making the caches less effective.
TL;DR recursive algorithms have generally a worse cache behavior than iterative ones.

Consider what absolutely must be done for each, iteration and recursion.
iteration: a jump to beginning of loop
recursion: a jump to beginning of called function
You see that there is not much room for differences here.
(I assume the recursion is a tail call and the compiler is aware of that optimization.)

In general, no, recursion will not be faster than a loop in any realistic usage that has viable implementations in both forms. I mean, sure, you could code up loops that take forever, but there would be better ways to implement the same loop that could outperform any implementation of the same problem via recursion.
You hit the nail on the head regarding the reason; creating and destroying stack frames is more expensive than a simple jump.
However, do note that I said "has viable implementations in both forms". For things like many sorting algorithms, there tends to not be a very viable way of implementing them that doesn't effectively set up its own version of a stack, due to the spawning of child "tasks" that are inherently part of the process. Thus, recursion may be just as fast as attempting to implement the algorithm via looping.
Edit: This answer is assuming non-functional languages, where most basic data types are mutable. It does not apply to functional languages.

In any realistic system, no, creating a stack frame will always be more expensive than an INC and a JMP. That's why really good compilers automatically transform tail recursion into a jump back into the same frame, i.e. without the overhead, so you get the more readable source version and the more efficient compiled version. A really, really good compiler should even be able to transform normal recursion into tail recursion where that is possible.

Functional programming is more about "what" rather than "how".
The language implementors will find a way to optimize how the code works underneath, if we don't try to make it more optimized than it needs to be. Recursion can also be optimized within the languages that support tail call optimization.
What matters more from a programmer's standpoint is readability and maintainability rather than optimization in the first place. Again, "premature optimization is the root of all evil".

This is a guess. Generally, recursion probably doesn't beat looping often, or ever, on problems of decent size if both are using really good algorithms (not counting implementation difficulty). It may be different in a language with tail call elimination (and a tail-recursive algorithm, with loops also part of the language), where the two would probably perform very similarly and recursion might even be preferable some of the time.

In theory, they are the same thing.
Recursion and a loop with the same O() complexity will work at the same theoretical speed, but of course the real speed depends on the language, compiler, and processor.
For example, raising a number to a power can be coded iteratively in O(log n):
int power(int t, int k) {
    int res = 1;
    while (k) {
        if (k & 1) res *= t;
        t *= t;
        k >>= 1;
    }
    return res;
}
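For comparison, here is a recursive version of the same exponentiation-by-squaring idea, sketched in Python (illustrative only): it performs the same O(log n) number of multiplications, but pays for one call frame per halving of k.

def power(t, k):
    # Recursive exponentiation by squaring, same complexity as the loop above.
    if k == 0:
        return 1
    half = power(t, k // 2)
    return half * half * (t if k % 2 else 1)

assert power(3, 10) == 3 ** 10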

Here is an example where recursion ran faster than looping in Java. This is a program which performs Bubble Sort on two arrays. The recBubbleSort(....) method sorts array arr using recursion and the bbSort(....) method just uses looping to sort the array narr. The data are the same in both arrays.
public class BBSort_App {
    public static void main(String args[]) {
        int[] arr = {231,414235,23,543,245,6,324,-32552,-4};
        long time = System.nanoTime();
        recBubbleSort(arr, arr.length-1, 0);
        time = System.nanoTime() - time;
        System.out.println("Time Elapsed: "+time+"nanos");
        disp(arr);
        int[] narr = {231,414235,23,543,245,6,324,-32552,-4};
        time = System.nanoTime();
        bbSort(narr);
        time = System.nanoTime()-time;
        System.out.println("Time Elapsed: "+time+"nanos");
        disp(narr);
    }

    static void disp(int[] origin) {
        System.out.print("[");
        for(int b: origin)
            System.out.print(b+", ");
        System.out.println("\b\b \b]");
    }

    static void recBubbleSort(int[] origin, int i, int j) {
        if(i>0)
            if(j!=i) {
                if(origin[i]<origin[j]) {
                    int temp = origin[i];
                    origin[i] = origin[j];
                    origin[j] = temp;
                }
                recBubbleSort(origin, i, j+1);
            }
            else
                recBubbleSort(origin, i-1, 0);
    }

    static void bbSort(int[] origin) {
        for(int out=origin.length-1;out>0;out--)
            for(int in=0;in<out;in++)
                if(origin[out]<origin[in]) {
                    int temp = origin[out];
                    origin[out] = origin[in];
                    origin[in] = temp;
                }
    }
}
Running the test even 50 times gave almost the same results.
The answers given to this question are satisfactory, but they lack simple examples. Can anybody give the reason why this recursion runs faster?

Related

Python Tail Recursion "Hack" using While Loop

I've seen a few examples of getting Python to do tail call optimization by using a while True loop. E.g.
def tail_exp(base, exponent, acc=1):
    if exponent == 0:
        return acc
    return tail_exp(base, exponent - 1, acc * base)
becomes
def tail_exp_2(base, exponent, acc=1):
    while True:
        if exponent == 0:
            return acc
        exponent, acc = exponent - 1, acc * base
I'm curious to know if this technique is applicable to all/most recursive algorithms in Python, and if there are any downsides or "gotchas" to look out for when optimizing recursive algorithms in this way?
Any recursive algorithm can be replaced by an iterative one. However, some examples will require an additional stack be added to the code, to manage state that is handled by the recursive calls in the original form. With tail recursion, there is no state to be managed, so no separate stack is needed.
Some programming languages take advantage of that fact and design their compilers to optimize out tail calls in recursive code, producing machine code that is equivalent to a loop. Python does not do tail call optimization, so this isn't really relevant to your question. Rewriting code by hand is not tail call optimization, it's just a particular sort of refactoring.
There are a few reasons Python chooses not to do tail call optimization. It's not because it's impossible. Python code is compiled into bytecode, so at least theoretically there's an opportunity to translate a recursive call into a loop if that was desired by the developers (in practice it's a little more complicated: since Python variable names are dynamic, you can't necessarily tell whether a function name refers to what you expect it to at runtime, a fact used by techniques like monkeypatching). However, the biggest problem with tail call optimization is that it generally overwrites useful debugging information that would usually be preserved by a call stack, like exactly how deep in the recursion you are and the exact state of those previous function calls. The Python developers have decided that they prefer the simplicity and debuggability of normal recursion over the performance benefits of tail call optimization, even when the latter is possible.
If you want to rewrite an algorithm from a recursive implementation into an iterative one, you can always do so. In some cases, though, it may get a lot more complicated. Recursive implementations of some algorithms can be a lot shorter, simpler, and easier to reason about, even though iterative equivalents may be faster (and won't hit the recursion limit for large inputs). Converting tail calls into a loop is usually quite simple though. The complicated cases are generally not amenable to tail call optimization either, since they're doing complicated stuff with the values returned by their recursion.
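As an illustrative Python sketch (the flatten example is hypothetical, chosen only to show the pattern), here is a non-tail-recursive function next to an iterative rewrite that needs the extra explicit stack mentioned above:

def flatten_recursive(items):
    # State lives implicitly on the call stack: one frame per nesting level.
    result = []
    for item in items:
        if isinstance(item, list):
            result.extend(flatten_recursive(item))
        else:
            result.append(item)
    return result

def flatten_iterative(items):
    # The same traversal, with the nesting state kept in an explicit stack
    # of iterators instead of in call frames.
    result, stack = [], [iter(items)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()
            continue
        if isinstance(item, list):
            stack.append(iter(item))
        else:
            result.append(item)
    return result

nested = [1, [2, [3, 4]], 5]
assert flatten_recursive(nested) == flatten_iterative(nested) == [1, 2, 3, 4, 5]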

Does every simple mathematical operation use the same amount of power (as in, battery power)?

Recently I have been revising some of my old Python code, which is essentially loops of algebra, in order to have it execute faster, generally by eliminating unnecessary operations. Often these were things like changing the value of an entry in a list from 0 (as a Python float, which I believe is a double by default) to the same value, which is obviously not necessary, or checking whether a float is equal to something when it MUST be that thing, because a preceding "if" would not have triggered if it wasn't, or some other extraneous operation. This got me wondering about what will preserve my battery more, as I do some of my coding on the bus where I can't plug my laptop in.
For example, which of the following two operations would be expected to use less battery power?
if b != 0: # b was assigned previously, and I know it is zero already
    b = 0
or:
b = 0
The first one checks if b is zero, and it is, so it doesn't do the next part. The second one just assigns b to zero without bothering to check. I believe the first one is more time-efficient, as you don't have to change anything in memory. Is that correct, and if so, would it also be more power-efficient? Does "more time efficient" always imply "more power efficient"?
I suggest watching this talk by Chandler Carruth: "Efficiency with Algorithms, Performance with Data Structures"
He addresses the idea of "power efficient instructions" at 4m 49s in the video. I agree with him: thinking about how many watts a particular piece of code consumes is useless. As he put it:
Q: "How to save battery life?"
A: "Finish running the program."
Also, in Python you do not have low-level control to even be thinking about low-level problems like this. Use appropriate data structures and algorithms, and pray that the Python interpreter will give you well-optimized bytecode.
Does every simple mathematical operation use the same amount of power (as in, battery power)?
No. Adding two numbers is not the same as computing the Fourier transform of a 20-megapixel photo.
I believe the first one is more time-efficient, as you don't have to change anything in memory. Is that correct, and if so, would it also be more power-efficient? Does "more time efficient" always imply "more power efficient"?
Yes. Your intuitions are right, but these are very trivial examples. And if you dig deeper you will get into the uncharted territory of weird optimizations that are quite difficult to grasp (e.g., see this question: Times two faster than bit shift?)
In general, the more your code utilizes system resources, the more power those resources will use. However, it is more useful to optimize your code based on time or size constraints instead of thinking about high-level code in terms of power draw.
One way of doing this is Big O notation. In essence, Big O notation is a way of comparing the space and/or runtime complexity of an algorithm. https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
A computer at its lowest level is a large quantity of transistors which require power to change and maintain their state.
It would be extremely difficult to anticipate how much power any one line of Python code would draw.
I once had questions like this. Still do sometimes. Here's the answer I wish someone told me earlier.
Summary
You are correct that generally, if your computer does less work, it'll use less power.
But we have to be really careful in figuring out which logical operations involve more work and which ones involve less work - in this case:
Reading vs writing memory is usually the same amount of work.
if and any other conditional execution also costs work.
Python's "simple operations" are not "simple operations" for the CPU.
But the idea you had is probably correct for some cases you had in mind.
If you're concerned about power consumption, measure where power is being used.
For some perspective: You're asking about which Python code costs you one more drop of water, but really in Python every operation costs a bucket and your whole Python program is using a river and your computer as a whole is using an ocean.
Direct Answers
Don't apply these answers to Python yet. Read the rest of the answer first, because there's so much indirection between Python and the CPU that you'll mislead yourself about how they're connected if you don't take that into account.
I believe the first one is more time-efficient, as you don't have to change anything in memory.
As a general rule, reading memory is just as slow as writing to memory, or even slower depending on exactly what your computer is doing. For further reading you'll want to look into CPU memory cache levels, memory access times, and how out-of-order execution and data dependencies factor into modern CPU architectures.
As a general rule, the if statement in a language is itself an operation which can have a non-negligible cost. For further reading you should look into how CPU pipelining relates to branch prediction and branch penalties. Also look into how if statements are implemented in typical CPU instruction sets.
Does "more time efficient" always imply "more power efficient"?
As a general rule, more work-efficient (doing less work: fewer machine instructions, for example) implies more power-efficient, because on modern hardware (this wasn't always the case) your hardware will use less power when it's not doing anything.
You should be careful about the idea of "more time efficient" though, because modern hardware doesn't always execute the same amount of work in the same amount of time: for further reading you'll want to look into CPU frequency scaling, ARM's big.LITTLE architectures, and discussions about the "Race to Idle" concept as a starting point.
"One Simple Operation" - CPU vs. Python
Your question is about Python, so it's important to realize that Python's x != 0, if, and x = 0 do not map directly to simple operations in the CPU.
For further reading, especially if you're familiar with C, I would recommend taking a long look at how Python is implemented. There are many implementations; the main one is CPython, which is a C program that reads and interprets Python source, converts it into Python "bytecode", and then interprets that bytecode one instruction at a time when running.
As a baseline, if you're using Python, any one "simple" operation is actually a lot of CPU operations, as each step in the Python interpreter is multiple CPU operations, but which ones cost more might be surprising.
Let's break down the three used in our example (I'm primarily describing this from the perspective of the main Python implementation written in C, called "CPython", which I am the most closely familiar with, but in general this explanation is roughly applicable to all of them, though some will be able to optimize out certain steps):
x != 0
It looks like a simple operation, and if this was C and x was an int it would be just one machine instruction - but Python allows for operator overloading, so first Python has to:
look up x (at least one memory read, but may involve one or more hashmap lookups in Python's internals, which is many machine operations),
check the type of x (more memory reading),
based on the type, look up a function pointer that implements the not-equality operation (one or arbitrarily many memory reads and one or arbitrarily many conditional branches, with data dependencies between them),
only then can it finally call that function with references to the Python objects holding the values of x and 0 (which is also not "free": look up function calling ABIs for more on that).
All that and more has to be done by the CPU even if x is a Python int or float mapping closely to the CPU's native numerical data types.
x = 0
An assignment is actually far cheaper in Python (though still not trivial): it only has to get as far as step 1 above, because once it knows "where" x is, it can just overwrite that pointer with the pointer to the Python object representing 0.
if
Abstractly speaking, the Python if statement has to be able to handle "truthy" and "falsey" values, which in the most naive implementation involves running through more CPU instructions to evaluate what the result of the condition is according to Python's semantics of what's true and what's false.
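One way to see all of this for yourself (a minimal sketch, not part of the original explanation) is to disassemble a tiny function containing the three operations with the standard dis module; even at the bytecode level, before the interpreter loop turns each instruction into many machine operations, each line is several instructions:

import dis

def snippet(x):
    if x != 0:
        x = 0
    return x

dis.dis(snippet)
# On CPython 3.x this prints something roughly like:
#   LOAD_FAST x, LOAD_CONST 0, COMPARE_OP !=, POP_JUMP_IF_FALSE ...
#   LOAD_CONST 0, STORE_FAST x, LOAD_FAST x, RETURN_VALUE
# and each of those bytecode instructions is itself many machine
# instructions inside the interpreter.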
Sidenote About Optimizations
Different Python implementations go to different lengths to reduce Python operations to as few CPU operations as possible. For example, an optimizing JIT (Just-In-Time) compiler might notice that, inside some loop on an array, all elements of the array are native integers, and actually reduce the if x != 0 and x = 0 parts into their respective minimal machine instructions; but that only happens in very specific circumstances, when the optimizing logic can prove that it can safely bypass a lot of the behavior it would normally need to perform.
The biggest thing here is this: a high-level language like Python is so removed from the hardware that "simple" operations are often complex "under the covers".
What You Asked vs. What I Think You Wanted To Ask
Correct me if I'm wrong, but I suspect the use-case you actually had in mind was this:
if x != 0:
    # some code
    x = 0
vs. this:
if x != 0:
    # some code
x = 0
In that case, the first option is superior to the second, because you are already paying the cost of if x != 0 anyway.
Last Point of Emphasis
The hardest breakthrough for me was moving away from trying to reason about individual instructions in my head, and instead switching to looking at how things work and measuring real systems.
Looking at how things work will teach you how to optimize, but measuring will show you where to optimize.
This question is great for exploring the former, but for your stated motivation of reducing power consumption on your laptop, you would benefit more from the latter.
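As a minimal sketch of measuring rather than guessing (the numbers it prints are machine-dependent, and only a rough comparison between them is meaningful), the standard timeit module can time the two variants from the question directly:

import timeit

# Time the "check first" variant against the plain assignment.
checked = timeit.timeit("if b != 0:\n    b = 0", setup="b = 0", number=10**7)
plain = timeit.timeit("b = 0", setup="b = 0", number=10**7)
print("check-then-assign:", checked, "seconds")
print("plain assignment: ", plain, "seconds")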

Is it bad practice to use Recursion where it isn't necessary?

In one of my last assignments, I got points docked for using recursion where it was not necessary. Is it bad practice to use Recursion where you don't have to?
For instance, this Python code block could be written two ways:
def test():
    if(foo() == 'success!'):
        print(True)
    else:
        test()
or
def test():
    while(True):
        if(foo() == 'success!'):
            print(True)
            break
Is one inherently better than another? (Performance-wise or Practice-wise)?
While recursion may allow for an easier-to-read solution, there is a cost. Each function call requires a small amount of memory overhead and set-up time that a loop iteration does not.
The Python call stack is also limited to 1000 nested calls by default; each recursive call counts against that limit, so you risk raising a run-time error with any recursive algorithm. There is no such hard limit to the number of iterations a loop may make.
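A minimal sketch of what that limit looks like in practice (illustrative only):

import sys

print(sys.getrecursionlimit())   # 1000 by default in CPython

def deeper(n):
    # No base case: every call adds another frame until the limit is hit.
    return deeper(n + 1)

try:
    deeper(0)
except RecursionError as exc:    # RuntimeError on older Python versions
    print("gave up at the limit:", exc)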
They're not the same. The iterative version can in theory run forever; once it is entered, it doesn't change the state of the Python virtual machine. The recursive version, however, keeps expanding the call stack as it keeps going. As @chepner mentions in his answer, there's a limit to how long you can keep that up.
For the example you give, you'll notice the difference quickly! As foo never changes, when foo() != 'success!' the recursive version will raise an exception once you blow out the stack (which won't take long), but the iterative version will "hang". For other functions that actually terminate, between two implementations of the same algorithm, one recursive and the other iterative, the iterative version usually will outperform the recursive one, as function calls impose some overhead.
Generally, recursions should bottom out -- typically there's a simplest case they handle outright without further recursive calls (n = 0, list or tuple or dict is empty, etc.) For more complex inputs, recursive calls work on constituent parts of the input (elements of a list, items of a dict, ...), returning solutions to their subproblems which the calling instance combines in some way and returns.
Recursion is analogous, and in many ways related, to mathematical induction -- more generally, well-founded induction. You reason about the correctness of a recursive procedure using induction, as the arguments passed recursively are generally "smaller" than those passed to the caller.
In your example, recursion is pointless: you're not passing data to a nested call, it's not a case of divide-and-conquer, there's no base case at all. It's the wrong mechanism for "infinite"/unbounded loops.
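For contrast, a minimal sketch of a recursion that does bottom out: the base case is the empty list, and each recursive call receives a strictly smaller argument, so termination is guaranteed.

def total(xs):
    # Base case: the empty list is handled outright, with no further calls.
    if not xs:
        return 0
    # Recursive case: xs[1:] is strictly smaller, so the recursion is
    # well-founded.
    return xs[0] + total(xs[1:])

assert total([3, 1, 4, 1, 5]) == 14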

Python QuickSort maximum recursion depth

(Python 2.7.8 Windows)
I'm doing a comparison between different sorting algorithms (quick, bubble and insertion), and mostly it's going as expected: Quick Sort is considerably faster with long lists, and bubble and insertion are faster with very short lists and already sorted ones.
What's causing a problem is Quick Sort and the aforementioned "already sorted" lists. I can sort lists of even 100000 items without problems, but with lists of integers from 0...n the limit seems to be considerably lower. 0...500 works, but even 0...1000 gives:
RuntimeError: maximum recursion depth exceeded in cmp
Quick Sort:
def quickSort(myList):
    if myList == []:
        return []
    else:
        pivot = myList[0]
        lesser = quickSort([x for x in myList[1:] if x < pivot])
        greater = quickSort([x for x in myList[1:] if x >= pivot])
        myList = lesser + [pivot] + greater
        return myList
Is there something wrong with the code, or am I missing something?
There are two things going on.
First, Python intentionally limits recursion to a fixed depth. Unlike, say, Scheme, which will keep allocating frames for recursive calls until you run out of memory, Python (at least the most popular implementation, CPython) will only allocate sys.getrecursionlimit() frames (defaulting to 1000) before failing. There are reasons for that,* but really, that isn't relevant here; just the fact that it does this is what you need to know about.
Second, as you may already know, while QuickSort is O(N log N) with most lists, it has a worst case of O(N^2)—in particular (using the standard pivot rules) with already-sorted lists. And when this happens, your stack depth can end up being O(N). So, if you have 1000 elements, arranged in worst-case order, and you're already one frame into the stack, you're going to overflow.
You can work around this in a few ways:
Rewrite the code to be iterative, with an explicit stack, so you're only limited by heap memory instead of stack depth.
Make sure to always recurse into the shorter side first, rather than the left side. This means that even in the O(N^2) case, your stack depth is still O(log N). But only if you've already done the previous step.**
Use a random, median-of-three, or other pivot rule that makes common cases not like already-sorted worst-case. (Of course someone can still intentionally DoS your code; there's really no way to avoid that with quicksort.) The Wikipedia article has some discussion on this, and links to the classic Sedgewick and Knuth papers.
Use a Python implementation with an unlimited stack.***
sys.setrecursionlimit(max(sys.getrecursionlimit(), len(myList)+CONSTANT)). This way, you'll fail right off the bat for an obvious reason if you can't make enough space, and usually won't fail otherwise. (But you might—you could be starting the sort already 900 steps deep in the stack…) But this is a bad idea.**** Besides, you have to figure out the right CONSTANT, which is impossible in general.*****
* Historically, the CPython interpreter recursively calls itself for recursive Python function calls. And the C stack is fixed in size; if you overrun the end, you could segfault, stomp all over heap memory, or all kinds of other problems. This could be changed—in fact, Stackless Python started off as basically just CPython with this change. But the core devs have intentionally chosen not to do so, in part because they don't want to encourage people to write deeply recursive code.
** Or if your language does automatic tail call elimination, but Python doesn't do that. But, as gnibbler points out, you can write a hybrid solution—recurse on the small end, then manually unwrap the tail recursion on the large end—that won't require an explicit stack.
*** Stackless and PyPy can both be configured this way.
**** For one thing, eventually you're going to crash the C stack.
***** The constant isn't really constant; it depends on how deep you already are in the stack (computable non-portably by walking sys._getframe() up to the top) and how much slack you need for comparison functions, etc. (not computable at all, you just have to guess).
You're choosing the first item of each sublist as the pivot. If the list is already in order, this means that your greater sublist is all the items but the first, rather than about half of them. Essentially, each recursive call manages to process only one item. Which means the depth of recursive calls you'll need to make will be about the same as the number of items in the full list. Which overflows Python's built-in limit once you hit about 1000 items. You will have a similar problem sorting lists that are already in reversed order.
To correct this use one of the workarounds suggested in the literature, such as choosing an item at random to be the pivot or the median of the first, middle, and last items.
Always choosing the first (or last) element as the pivot will cause problems for quicksort: worst-case performance for some common inputs, as you have seen.
One technique that works fairly well is to choose the median of the first, middle and last elements.
You don't want to make the pivot selection too complicated, or it will dominate the runtime of the sort.
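Here is a sketch of that pivot rule applied to the quickSort from the question (illustrative; the helper logic is one of several reasonable ways to write it):

def quickSort(myList):
    if len(myList) <= 1:
        return myList
    # Median-of-three: pick the middle value of the first, middle and last
    # elements, so an already-sorted list no longer gives O(n) recursion depth.
    candidates = [myList[0], myList[len(myList) // 2], myList[-1]]
    pivot = sorted(candidates)[1]
    rest = list(myList)
    rest.remove(pivot)   # set aside one occurrence of the pivot
    lesser = quickSort([x for x in rest if x < pivot])
    greater = quickSort([x for x in rest if x >= pivot])
    return lesser + [pivot] + greater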

Is Python allowed to optimize a function definition to eliminate unused code?

If I defined a function like this:
def ccid_year(seq):
    year, prefix, index, suffix = seq
    return year
Is Python allowed to optimize it to be effectively:
def ccid_year(seq):
    return seq[0]
I'd prefer to write the first function because it documents the format of the data being passed in but would hope that Python would generate code that is effectively as efficient as the second definition.
The two functions are not equivalent:
def ccid_year_1(seq):
    year, prefix, index, suffix = seq
    return year

def ccid_year_2(seq):
    return seq[0]

arg = {1:'a', 2:'b', 0:'c', 3:'d'}
print ccid_year_1(arg)
print ccid_year_2(arg)
The first call prints 0 and the second prints c.
I'll answer the question at face value later, but first: when in doubt, benchmark it! Before that, though, recall that most time is spent in a small portion of the code (i.e., most code is irrelevant to performance!) and that, in CPython, function call overhead usually dominates small inefficiencies. Not to mention that large-scale algorithmic inefficiencies (a.k.a. freaking stupid code) dwarf micro-optimization concerns.
So either don't worry about this at all, or if you have reason to worry about it, first benchmark alternatives and second don't put it in a function. Note that "reasons to worry about it" must be weighed against the time spent worrying, and the maintenance burden (if there is one) of the manual optimization.
CPython, the reference implementation you most likely use, is very conservative about optimizing at this level. While there is a peephole optimizer operating on bytecode, it is limited in scale. More generally, you can't expect much optimization crossing a single statement. The problem with statically optimizing Python code is that there are a billion ways even the most innocent-looking program fragment can call into arbitrary code, which might do anything at all, so you can't omit these calls.
While we're at it, your proposed optimization is invalid (in the sense that the program doesn't have the same behavior) if seq is of the wrong type (not a sequence, or a very weird sequence) or length (not exactly four items long)! Any program claiming to implement Python must maintain such differences, so it won't do the transformation you suggest literally. I assume this was just an off-hand illustration, but it does indicate you seriously underestimate how complex Python is (to implement, and doubly so to optimize). I and others have written about this at length before, so I'll stop now before this post becomes even larger.
PyPy on the other hand will, if this function is indeed called from a hot loop, probably optimize this and a million other things you didn't even think of, while compiling it down to a machine code loop that iterates faster than any Python loop could ever iterate on CPython. It will still contain a few checks to break out of the loop and take the proper action (e.g. raise an exception) if necessary, but they'll also be highly efficient if not triggered.
I do not know much about IronPython and Jython and other implementations, but if their lack of consistent several-times-faster-than-CPython benchmark results is any indicator, they do not perform significant optimizations. While the VMs IronPython and Jython run on include JIT compilers (not entirely unlike PyPy's), these JIT compilers are built for very different languages, and I'd be very surprised if they could look through the mess of code IronPython/Jython must execute to achieve Python semantics and perform such optimizations on it.
