When using the in operator on a literal, is it most idiomatic for that literal to be a list, set, or tuple?
e.g.
for x in {'foo', 'bar', 'baz'}:
    doSomething(x)
...
if val in {1, 2, 3}:
    doSomethingElse(val)
I don't see any benefit to the list, but the tuple's immutability means it could be hoisted or reused by an efficient interpreter. And in the case of the if, reuse would bring an efficiency benefit.
Which is the most idiomatic, and which is the most performant in CPython?
Python provides a disassembler, so you can often just check the bytecode:
In [4]: def checktup():
   ...:     for _ in range(10):
   ...:         if val in (1, 2, 3):
   ...:             print("foo")
   ...:
In [5]: def checkset():
   ...:     for _ in range(10):
   ...:         if val in {1, 2, 3}:
   ...:             print("foo")
   ...:
In [6]: import dis
For the tuple literal:
In [7]: dis.dis(checktup)
2 0 SETUP_LOOP 32 (to 34)
2 LOAD_GLOBAL 0 (range)
4 LOAD_CONST 1 (10)
6 CALL_FUNCTION 1
8 GET_ITER
>> 10 FOR_ITER 20 (to 32)
12 STORE_FAST 0 (_)
3 14 LOAD_GLOBAL 1 (val)
16 LOAD_CONST 6 ((1, 2, 3))
18 COMPARE_OP 6 (in)
20 POP_JUMP_IF_FALSE 10
4 22 LOAD_GLOBAL 2 (print)
24 LOAD_CONST 5 ('foo')
26 CALL_FUNCTION 1
28 POP_TOP
30 JUMP_ABSOLUTE 10
>> 32 POP_BLOCK
>> 34 LOAD_CONST 0 (None)
36 RETURN_VALUE
For the set-literal:
In [8]: dis.dis(checkset)
2 0 SETUP_LOOP 32 (to 34)
2 LOAD_GLOBAL 0 (range)
4 LOAD_CONST 1 (10)
6 CALL_FUNCTION 1
8 GET_ITER
>> 10 FOR_ITER 20 (to 32)
12 STORE_FAST 0 (_)
3 14 LOAD_GLOBAL 1 (val)
16 LOAD_CONST 6 (frozenset({1, 2, 3}))
18 COMPARE_OP 6 (in)
20 POP_JUMP_IF_FALSE 10
4 22 LOAD_GLOBAL 2 (print)
24 LOAD_CONST 5 ('foo')
26 CALL_FUNCTION 1
28 POP_TOP
30 JUMP_ABSOLUTE 10
>> 32 POP_BLOCK
>> 34 LOAD_CONST 0 (None)
36 RETURN_VALUE
You'll notice that in both cases the function uses LOAD_CONST, i.e., both literals have been optimized into constants. Even better, in the case of the set literal, the compiler has saved a frozenset: while compiling the function, the peephole optimizer worked out that the set can never be mutated and replaced it with its immutable equivalent.
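The same check can be made without reading the disassembly, by looking at the constants stored on the code object. This is a minimal sketch, assuming a CPython 3 interpreter that folds these literals; the helper name `folded_constants` is made up for illustration.

```python
def check_tuple(val):
    return val in (1, 2, 3)


def check_set(val):
    return val in {1, 2, 3}


def folded_constants(func):
    """Return the tuple and frozenset constants stored in func's code object."""
    return [const for const in func.__code__.co_consts
            if isinstance(const, (tuple, frozenset))]


# If the literal was folded, it shows up here as a single constant.
print(folded_constants(check_tuple))
print(folded_constants(check_set))
```

An empty list for either function would mean that interpreter rebuilds the container on every call.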
Note, on Python 2, the compiler builds a set every time!:
In [1]: import dis
In [2]: def checkset():
   ...:     for _ in range(10):
   ...:         if val in {1, 2, 3}:
   ...:             print("foo")
   ...:
In [3]: dis.dis(checkset)
2 0 SETUP_LOOP 49 (to 52)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (10)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 35 (to 51)
16 STORE_FAST 0 (_)
3 19 LOAD_GLOBAL 1 (val)
22 LOAD_CONST 2 (1)
25 LOAD_CONST 3 (2)
28 LOAD_CONST 4 (3)
31 BUILD_SET 3
34 COMPARE_OP 6 (in)
37 POP_JUMP_IF_FALSE 13
4 40 LOAD_CONST 5 ('foo')
43 PRINT_ITEM
44 PRINT_NEWLINE
45 JUMP_ABSOLUTE 13
48 JUMP_ABSOLUTE 13
>> 51 POP_BLOCK
>> 52 LOAD_CONST 0 (None)
55 RETURN_VALUE
IMO, there's essentially no such thing as "idiomatic" usage of literal values as shown in the question. Such values look like "magic numbers" to me. Using literals for "performance" is probably misguided because it sacrifices readability for marginal gains. In cases where performance really matters, using literals is unlikely to help much and there are better options regardless.
I think the idiomatic thing to do would be to store such values in a global or class variable, especially if you're using them in multiple places (but also even if you aren't). This provides some documentation as to what a value's purpose is and makes it easier to update. You can then memoize these values in function/method definitions to improve performance if necessary.
As to what type of data structure is most appropriate, that would depend on what your program does and how it uses the data. For example, does ordering matter? With an if x in y, it won't, but maybe you're using the data in a for and an if. Without context, it's hard to say what the best choice would be.
Here's an example I think is readable, extensible, and also efficient. Memoizing the global ITEMS in the function definitions makes lookup fast because items is in the local namespace of the function. If you look at the disassembled code, you'll see that items is looked up via LOAD_FAST instead of LOAD_GLOBAL. This approach also avoids making multiple copies of the list of items, which might be relevant if it's big enough (although, if it was big enough, you probably wouldn't try to inline it anyway). Personally, I wouldn't bother with these kinds of optimizations most of the time, but they can be useful in some cases.
# In real code, this would have a domain-specific name instead of the
# generic `ITEMS`.
ITEMS = {'a', 'b', 'c'}
def filter_in_items(values, items=ITEMS):
    matching_items = []
    for value in values:
        if value in items:
            matching_items.append(value)
    return matching_items

def filter_not_in_items(values, items=ITEMS):
    non_matching_items = []
    for value in values:
        if value not in items:
            non_matching_items.append(value)
    return non_matching_items

print(filter_in_items(('a', 'x'))) # -> ['a']
print(filter_not_in_items(('a', 'x'))) # -> ['x']
import dis
dis.dis(filter_in_items)
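The local-versus-global distinction can also be confirmed from the code object's name tables, which is less sensitive to opcode renames between CPython versions than reading the disassembly. A minimal sketch; `uses_global` and `uses_default` are illustrative names, not part of the example above.

```python
ITEMS = {'a', 'b', 'c'}


def uses_global(values):
    result = []
    for value in values:
        if value in ITEMS:          # looked up in module globals on each test
            result.append(value)
    return result


def uses_default(values, items=ITEMS):
    result = []
    for value in values:
        if value in items:          # bound once as a default, then a local
            result.append(value)
    return result


# co_names holds global and attribute names; co_varnames holds locals.
print(uses_global.__code__.co_names)
print(uses_default.__code__.co_varnames)
```

One trade-off to keep in mind: the default is evaluated once, when the function is defined, so rebinding the global `ITEMS` later does not change what `uses_default` sees.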
In this trivial example, I want to factor out the i < 5 condition of a list comprehension into its own function. I also want to eat my cake and have it too, and avoid the overhead of the CALL_FUNCTION bytecode and of creating a new frame in the Python virtual machine.
Is there any way to factor out the conditions inside of a list comprehension into a new function but somehow get a disassembled result that avoids the large overhead of CALL_FUNCTION?
import dis
import sys
import timeit
def my_filter(n):
    return n < 5

def a():
    # list comprehension with function call
    return [i for i in range(10) if my_filter(i)]

def b():
    # list comprehension without function call
    return [i for i in range(10) if i < 5]

assert a() == b()
>>> sys.version_info[:]
(3, 6, 5, 'final', 0)
>>> timeit.timeit(a)
1.2616060493517098
>>> timeit.timeit(b)
0.685117881097812
>>> dis.dis(a)
3 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000020F4890B660, file "<stdin>", line 3>)
# ...
>>> dis.dis(b)
3 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000020F48A42270, file "<stdin>", line 3>)
# ...
# list comprehension with function call
# big overhead with that CALL_FUNCTION at address 12
>>> dis.dis(a.__code__.co_consts[1])
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_GLOBAL 0 (my_filter)
10 LOAD_FAST 1 (i)
12 CALL_FUNCTION 1
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
# list comprehension without function call
>>> dis.dis(b.__code__.co_consts[1])
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
I'm willing to take a hacky solution that I would never use in production, like somehow replacing the bytecode at run time.
In other words, is it possible to replace a's addresses 8, 10, and 12 with b's 8, 10, and 12 at runtime?
Consolidating all of the excellent answers in the comments into one.
As georg says, this sounds like you are looking for a way to inline a function or an expression. There is no such thing in CPython, although attempts have been made: https://bugs.python.org/issue10399
Therefore, along the lines of "metaprogramming", you can build the lambdas inline and eval them:
from typing import Callable
import dis
def b():
    # list comprehension without function call
    return [i for i in range(10) if i < 5]

def gen_list_comprehension(expr: str) -> Callable:
    return eval(f"lambda: [i for i in range(10) if {expr}]")

a = gen_list_comprehension("i < 5")
dis.dis(a.__code__.co_consts[1])
print("=" * 10)
dis.dis(b.__code__.co_consts[1])
which when run under 3.7.6 gives:
6 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
==========
1 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
From a security standpoint, eval is dangerous, although here it is less so because of the limits on what you can do inside a lambda. What can be done in the condition expression is even more limited, but it is still dangerous: it could, for example, call a function that does evil things.
However, if you want the same effect in a more secure way, you can modify ASTs instead of working with strings. I find that a lot more cumbersome, though.
A hybrid approach would be to call ast.parse() and check the result. For example:
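import ast

def is_cond_str(s: str) -> bool:
    try:
        mod_ast = ast.parse(s)
        # The first statement must be a bare expression statement.
        expr_ast = mod_ast.body[0]
        if not isinstance(expr_ast, ast.Expr):
            return False
        # That expression must be a comparison such as `i < 5`.
        compare_ast = expr_ast.value
        if not isinstance(compare_ast, ast.Compare):
            return False
        return True
    except (SyntaxError, ValueError, IndexError):
        # Unparseable or empty input is not a valid condition.
        return False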
This is a little more secure, but there still may be evil functions in the condition so you could keep going. Again, I find this a little tedious.
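One way to "keep going" is to walk the whole tree and reject any node type that is not explicitly allowed. The sketch below is only an illustration of that idea: the whitelist, the function name `is_simple_comparison`, and the `allowed_names` parameter are all assumptions, and it should not be treated as a vetted security boundary.

```python
import ast

# Node types permitted in a condition. Calls, attribute access, subscripts,
# lambdas and everything else not listed here are rejected.
ALLOWED_NODES = (
    ast.Expression, ast.Compare, ast.Name, ast.Load, ast.Constant,
    ast.Lt, ast.LtE, ast.Gt, ast.GtE, ast.Eq, ast.NotEq,
)


def is_simple_comparison(source: str, allowed_names=("i",)) -> bool:
    """Return True only for comparisons built from allowed names and constants."""
    try:
        tree = ast.parse(source, mode="eval")  # a single expression only
    except (SyntaxError, ValueError):
        return False
    if not isinstance(tree.body, ast.Compare):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        if isinstance(node, ast.Name) and node.id not in allowed_names:
            return False
    return True


print(is_simple_comparison("i < 5"))
print(is_simple_comparison("i < len(secrets)"))
```

Only strings that pass such a check would then be handed to eval. A whitelist is used instead of a blacklist so that node types added in future Python versions are rejected by default.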
Coming from the other direction of starting off with bytecode, there is my cross-version assembler; see https://pypi.org/project/xasm/
Which one is better?
for x in range(0,100):
print("Lorem Ipsum")
for x in range(0,10):
for y in range(0,10):
print("Lorem Ipsum")
The second one is harder to read, and it constructs an unnecessary range iterable (a list in Python 2; a range object in Python 3, which uses less memory and is faster to create).
From that unnecessary iterable the inner for loop constructs an unnecessary iterator (a list_iterator in Python 2, a range_iterator in Python 3).
The first one is more readable and easier to understand. Use that.
Regarding performance, I doubt it makes any difference, and if it does, the 0-100 version is faster, because it has less code (if the double loop is not optimized away) and thus a shorter code path.
When in doubt about such things, use the one that is easier to understand when you read the code. Premature optimization is a sin.
You can use dis from the dis module to disassemble and analyse the bytecode and see which of your loops is better (that is, which needs less memory, fewer iterators, etc.).
Here is an example:
from dis import dis
def loop1():
    for x in range(100):
        pass

def loop2():
    for x in range(10):
        for j in range(10):
            pass

Now look under the hood of each loop:
dis(loop1)
2 0 SETUP_LOOP 20 (to 23)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 6 (to 22)
16 STORE_FAST 0 (x)
3 19 JUMP_ABSOLUTE 13
>> 22 POP_BLOCK
>> 23 LOAD_CONST 0 (None)
26 RETURN_VALUE
And look at the amount of data and operations needed in your second loop:
dis(loop2)
2 0 SETUP_LOOP 43 (to 46)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (10)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 29 (to 45)
16 STORE_FAST 0 (x)
3 19 SETUP_LOOP 20 (to 42)
22 LOAD_GLOBAL 0 (range)
25 LOAD_CONST 1 (10)
28 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
31 GET_ITER
>> 32 FOR_ITER 6 (to 41)
35 STORE_FAST 1 (j)
4 38 JUMP_ABSOLUTE 32
>> 41 POP_BLOCK
>> 42 JUMP_ABSOLUTE 13
>> 45 POP_BLOCK
>> 46 LOAD_CONST 0 (None)
49 RETURN_VALUE
Because both loops do the same thing, the first one is far better.
Just imagine how you would modify the nested loop for 101 iterations instead of 100, and the disadvantage is clear.
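The comparison of the two listings can be reduced to a number by counting the instructions in each function. A minimal sketch; the helper `instruction_count` is a made-up name, and the exact counts depend on the CPython version.

```python
import dis


def loop1():
    for x in range(100):
        pass


def loop2():
    for x in range(10):
        for j in range(10):
            pass


def instruction_count(func):
    """Number of bytecode instructions in the compiled function (static count)."""
    return sum(1 for _ in dis.get_instructions(func))


print("loop1:", instruction_count(loop1))
print("loop2:", instruction_count(loop2))
```

This counts instructions in the compiled code, not instructions executed at run time, so it is only a rough indication. In the nested version the inner range object and its iterator are also rebuilt on each of the ten outer iterations.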
Will the following snippet create and destroy the list of constants on each loop, incurring whatever (albeit small) overhead this implies, or is the list created once?
for i in <some-type-of-iterable>:
    if i in [1,3,5,18,3457,40567]:
        print(i)
I am asking about both the Python "standard", such as one exists, and about the common CPython implementation.
I am aware that this example is contrived, and that worrying about performance when using CPython is silly, but I am just curious.
This depends on the Python implementation and version, and on how the "constant lists" are used. On CPython 2.7.10 with your example, it looks like the answer is that the list in the condition of the if statement is only created once...
>>> def foo():
...     for i in iterable:
...         if i in [1, 3, 5]:
...             print(i)
...
>>> import dis
>>> dis.dis(foo)
2 0 SETUP_LOOP 34 (to 37)
3 LOAD_GLOBAL 0 (iterable)
6 GET_ITER
>> 7 FOR_ITER 26 (to 36)
10 STORE_FAST 0 (i)
3 13 LOAD_FAST 0 (i)
16 LOAD_CONST 4 ((1, 3, 5))
19 COMPARE_OP 6 (in)
22 POP_JUMP_IF_FALSE 7
4 25 LOAD_FAST 0 (i)
28 PRINT_ITEM
29 PRINT_NEWLINE
30 JUMP_ABSOLUTE 7
33 JUMP_ABSOLUTE 7
>> 36 POP_BLOCK
>> 37 LOAD_CONST 0 (None)
40 RETURN_VALUE
Notice: 16 LOAD_CONST 4 ((1, 3, 5))
Python's peephole optimizer has turned our list into a tuple (thanks Python!) and stored it as a constant. Note that the peephole optimizer can only do these transforms on objects if it knows that you as the programmer have absolutely no way of getting a reference to the list (otherwise, you could mutate the list and change the meaning of the code). As far as I'm aware, this optimization is only done for list and set literals that are composed entirely of constants and are the RHS of an in operator. There might be other cases that I'm not aware of (dis.dis is your friend for finding these optimizations).
I hinted at it above, but you can do the same thing with set literals in more recent versions of Python (in Python 3.2+, the set is converted to a constant frozenset). The benefit there is that set/frozenset have faster membership testing on average than list/tuple.
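The membership-testing difference can be measured with timeit. The sketch below uses a deliberately large container and a value that is absent, which is the worst case for the tuple because every element gets compared. The sizes, the repeat count and the helper name `time_membership` are arbitrary choices for illustration.

```python
import timeit

CANDIDATES_TUPLE = tuple(range(1000))
CANDIDATES_SET = frozenset(CANDIDATES_TUPLE)
MISSING = -1  # not present, so the tuple scan visits every element


def time_membership(container, value, number=10_000):
    """Total seconds for `number` membership tests against `container`."""
    return timeit.timeit(lambda: value in container, number=number)


tuple_time = time_membership(CANDIDATES_TUPLE, MISSING)
set_time = time_membership(CANDIDATES_SET, MISSING)
print(f"tuple:     {tuple_time:.4f}s")
print(f"frozenset: {set_time:.4f}s")
```

With only a handful of elements, as in the question, the gap is much smaller and may not be measurable, so it is worth timing the actual case before choosing a container on performance grounds.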
Another example, with Python 3.5: here the list is iterated over rather than tested with in, and it is created anew on each iteration of the outer loop.
>>> import dis
>>> def func():
...     for i in iterable:
...         for j in [1,2,3]:
...             print(i+j)
...
>>> dis.dis(func)
2 0 SETUP_LOOP 54 (to 57)
3 LOAD_GLOBAL 0 (iterable)
6 GET_ITER
>> 7 FOR_ITER 46 (to 56)
10 STORE_FAST 0 (i)
3 13 SETUP_LOOP 37 (to 53)
16 LOAD_CONST 1 (1) # building list
19 LOAD_CONST 2 (2)
22 LOAD_CONST 3 (3)
25 BUILD_LIST 3
28 GET_ITER
>> 29 FOR_ITER 20 (to 52) # inner loop body begin
32 STORE_FAST 1 (j)
4 35 LOAD_GLOBAL 1 (print)
38 LOAD_FAST 0 (i)
41 LOAD_FAST 1 (j)
44 BINARY_ADD
45 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
48 POP_TOP
49 JUMP_ABSOLUTE 29 # inner loop body end
>> 52 POP_BLOCK
>> 53 JUMP_ABSOLUTE 7 # outer loop end,
# jumping back before list creation
>> 56 POP_BLOCK
>> 57 LOAD_CONST 0 (None)
60 RETURN_VALUE
As far as I know, monitoring for exceptions makes a program slower.
Would monitoring for an iterator exception such as StopIteration make a for loop slower?
While exception monitoring has some small overhead in the usual case, in the case of iterators there does not appear to be any overhead involved in handling StopIteration exceptions. Python optimises iterators as a special case so that StopIteration doesn't involve any exception handlers. (I'll also observe---and I may be missing something---that it's hard to come up with a Python for loop that doesn't implicitly use iterators).
Here are some examples, first using the built-in range function and a simple for loop:
Python 2.7.5
>>> import dis
>>> def x():
...     for i in range(1,11):
...         pass
...
>>> dis.dis(x)
2 0 SETUP_LOOP 23 (to 26)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (1)
9 LOAD_CONST 2 (11)
12 CALL_FUNCTION 2
15 GET_ITER
>> 16 FOR_ITER 6 (to 25)
19 STORE_FAST 0 (i)
3 22 JUMP_ABSOLUTE 16
>> 25 POP_BLOCK
>> 26 LOAD_CONST 0 (None)
29 RETURN_VALUE
Note that range is essentially being treated as an iterator.
Now, using a simple generator function:
>>> def g(x):
...     while x < 11:
...         yield x
...         x = x + 1
...
>>> def y():
...     for i in g(1):
...         pass
...
>>> dis.dis(y)
2 0 SETUP_LOOP 20 (to 23)
3 LOAD_GLOBAL 0 (g)
6 LOAD_CONST 1 (1)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 6 (to 22)
16 STORE_FAST 0 (i)
3 19 JUMP_ABSOLUTE 13
>> 22 POP_BLOCK
>> 23 LOAD_CONST 0 (None)
26 RETURN_VALUE
>>> dis.dis(g)
2 0 SETUP_LOOP 31 (to 34)
>> 3 LOAD_FAST 0 (x)
6 LOAD_CONST 1 (11)
9 COMPARE_OP 0 (<)
12 POP_JUMP_IF_FALSE 33
3 15 LOAD_FAST 0 (x)
18 YIELD_VALUE
19 POP_TOP
4 20 LOAD_FAST 0 (x)
23 LOAD_CONST 2 (1)
26 BINARY_ADD
27 STORE_FAST 0 (x)
30 JUMP_ABSOLUTE 3
>> 33 POP_BLOCK
>> 34 LOAD_CONST 0 (None)
37 RETURN_VALUE
Note that y here is basically the same as x above, the difference being one LOAD_CONST instruction, since x references the number 11. Likewise, our simple generator is basically equivalent to the same thing written as a while loop:
>>> def q():
...     x = 1
...     while x < 11:
...         x = x + 1
...
>>> dis.dis(q)
2 0 LOAD_CONST 1 (1)
3 STORE_FAST 0 (x)
3 6 SETUP_LOOP 26 (to 35)
>> 9 LOAD_FAST 0 (x)
12 LOAD_CONST 2 (11)
15 COMPARE_OP 0 (<)
18 POP_JUMP_IF_FALSE 34
4 21 LOAD_FAST 0 (x)
24 LOAD_CONST 1 (1)
27 BINARY_ADD
28 STORE_FAST 0 (x)
31 JUMP_ABSOLUTE 9
>> 34 POP_BLOCK
>> 35 LOAD_CONST 0 (None)
38 RETURN_VALUE
Again, there's no specific overhead to handle the iterator or the generator (range may be somewhat more optimised than the generator version, simply because it's a built-in, but not due to the way Python handles it).
Finally, let's look at an actual explicit iterator written with StopIteration:
>>> class G(object):
...     def __init__(self, x):
...         self.x = x
...     def __iter__(self):
...         return self
...     def next(self):
...         x = self.x
...         if x >= 11:
...             raise StopIteration
...         x = x + 1
...         return x - 1
...
>>> dis.dis(G.next)
7 0 LOAD_FAST 0 (self)
3 LOAD_ATTR 0 (x)
6 STORE_FAST 1 (x)
8 9 LOAD_FAST 1 (x)
12 LOAD_CONST 1 (11)
15 COMPARE_OP 5 (>=)
18 POP_JUMP_IF_FALSE 30
9 21 LOAD_GLOBAL 1 (StopIteration)
24 RAISE_VARARGS 1
27 JUMP_FORWARD 0 (to 30)
10 >> 30 LOAD_FAST 1 (x)
33 LOAD_CONST 2 (1)
36 BINARY_ADD
37 STORE_FAST 1 (x)
11 40 LOAD_FAST 1 (x)
43 LOAD_CONST 2 (1)
46 BINARY_SUBTRACT
47 RETURN_VALUE
Now, here we can see that the generator function involves slightly fewer instructions than this simple iterator, mostly because of differences in implementation plus a couple of instructions for raising the StopIteration exception. Nevertheless, a function using this iterator is exactly equivalent to y above:
>>> def z():
...     for i in G(1):
...         pass
...
>>> dis.dis(z)
2 0 SETUP_LOOP 20 (to 23)
3 LOAD_GLOBAL 0 (G)
6 LOAD_CONST 1 (1)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 6 (to 22)
16 STORE_FAST 0 (i)
3 19 JUMP_ABSOLUTE 13
>> 22 POP_BLOCK
>> 23 LOAD_CONST 0 (None)
26 RETURN_VALUE
Of course, these results are based around the fact that Python for-loops will optimise iterators to remove the need for explicit handlers for the StopIteration exception. After all, StopIteration exceptions essentially form a normal part of the operation of a Python for-loop.
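To make concrete what the for-loop handles on your behalf, here is roughly the same loop written out by hand in Python 3. This is a sketch: `manual_loop` is an illustrative name, and this port of `G` uses `__next__` (the Python 3 spelling of `next`) and writes the incremented value back to `self.x`, which the listing above does not do.

```python
class G:
    """Python 3 port of the explicit iterator: yields x, x + 1, ..., 10."""

    def __init__(self, x):
        self.x = x

    def __iter__(self):
        return self

    def __next__(self):
        x = self.x
        if x >= 11:
            raise StopIteration
        self.x = x + 1   # advance the stored position
        return x


def manual_loop(iterable):
    """Roughly what `for item in iterable: collected.append(item)` does."""
    collected = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break            # the for statement does this check internally
        collected.append(item)
    return collected


print(manual_loop(G(1)))
print([i for i in G(1)])
```

Both lines print the integers 1 through 10. In the for-loop version the end-of-iteration check happens inside the interpreter's FOR_ITER handling, not in a Python-level try/except.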
Regarding why it is implemented this way, see PEP-234 which defines iterators. This specifically addresses the issue of the expense of the exception:
It has been questioned whether an exception to signal the end of
the iteration isn't too expensive. Several alternatives for the
StopIteration exception have been proposed: a special value End
to signal the end, a function end() to test whether the iterator
is finished, even reusing the IndexError exception.
A special value has the problem that if a sequence ever
contains that special value, a loop over that sequence will
end prematurely without any warning. If the experience with
null-terminated C strings hasn't taught us the problems this
can cause, imagine the trouble a Python introspection tool
would have iterating over a list of all built-in names,
assuming that the special End value was a built-in name!
Calling an end() function would require two calls per
iteration. Two calls is much more expensive than one call
plus a test for an exception. Especially the time-critical
for loop can test very cheaply for an exception.
Reusing IndexError can cause confusion because it can be a
genuine error, which would be masked by ending the loop
prematurely.
Looking at the bytecode generated for a function with a try/except block, it looks like it would be slightly slower. However, the performance hit is extremely small and honestly negligible in most circumstances. I think the real thing to consider when doing an optimization like this is scoping the exceptions properly.
Output of an example function with try/except block when compiled to bytecode:
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import dis
>>> def x():
        try:
            sd="lol"
        except:
            raise
>>> dis.dis(x)
2 0 SETUP_EXCEPT 10 (to 13)
3 3 LOAD_CONST 1 ('lol')
6 STORE_FAST 0 (sd)
9 POP_BLOCK
10 JUMP_FORWARD 10 (to 23)
4 >> 13 POP_TOP
14 POP_TOP
15 POP_TOP
5 16 RAISE_VARARGS 0
19 JUMP_FORWARD 1 (to 23)
22 END_FINALLY
>> 23 LOAD_CONST 0 (None)
26 RETURN_VALUE
>>>
Consider the following functions:
def fact1(n):
    if n < 2:
        return 1
    else:
        return n * fact1(n-1)

def fact2(n):
    if n < 2:
        return 1
    return n * fact2(n-1)

They should be equivalent. But there's a performance difference:
>>> T(lambda : fact1(1)).repeat(number=10000000)
[2.5754408836364746, 2.5710129737854004, 2.5678811073303223]
>>> T(lambda : fact2(1)).repeat(number=10000000)
[2.8432059288024902, 2.834425926208496, 2.8364310264587402]
The version without the else is 10% slower. This is pretty significant. Why?
What is happening here is that fact2 has a hash conflict with __name__ in your module globals. That makes the lookup of the global fact2 ever so slightly slower.
>>> [(k, hash(k) % 32) for k in globals().keys() ]
[('__builtins__', 8), ('__package__', 15), ('fact2', 25), ('__name__', 25), ('fact1', 26), ('__doc__', 29)]
i.e. the same answer as for "Why is early return slower than else?", except that there the hash conflict was with __builtins__.
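The bucket check above can be wrapped in a small helper that reports only the slots shared by more than one name. This is a sketch under assumptions: `bucket_collisions` is a made-up name, and the table size of 32 is taken from the session above, not queried from the interpreter. On Python 3, string hashes are randomized per process unless PYTHONHASHSEED is set, so which names collide changes from run to run.

```python
from collections import defaultdict


def bucket_collisions(names, table_size=32):
    """Group names by hash(name) % table_size; keep only shared slots."""
    buckets = defaultdict(list)
    for name in names:
        buckets[hash(name) % table_size].append(name)
    return {slot: group for slot, group in buckets.items() if len(group) > 1}


names = ['__builtins__', '__package__', 'fact1', 'fact2', '__name__', '__doc__']
print(bucket_collisions(names))
```

An empty result means none of the names share an initial slot for that table size in the current process.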
For me, they are virtually the same speed: (Python 2.6.6 on Debian)
In [4]: %timeit fact1(1)
10000000 loops, best of 3: 151 ns per loop
In [5]: %timeit fact2(1)
10000000 loops, best of 3: 154 ns per loop
The byte code is also very similar:
In [6]: dis.dis(fact1)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (2)
6 COMPARE_OP 0 (<)
9 JUMP_IF_FALSE 5 (to 17)
12 POP_TOP
3 13 LOAD_CONST 2 (1)
16 RETURN_VALUE
>> 17 POP_TOP
5 18 LOAD_FAST 0 (n)
21 LOAD_GLOBAL 0 (fact)
24 LOAD_FAST 0 (n)
27 LOAD_CONST 2 (1)
30 BINARY_SUBTRACT
31 CALL_FUNCTION 1
34 BINARY_MULTIPLY
35 RETURN_VALUE
36 LOAD_CONST 0 (None)
39 RETURN_VALUE
In [7]: dis.dis(fact2)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (2)
6 COMPARE_OP 0 (<)
9 JUMP_IF_FALSE 5 (to 17)
12 POP_TOP
3 13 LOAD_CONST 2 (1)
16 RETURN_VALUE
>> 17 POP_TOP
4 18 LOAD_FAST 0 (n)
21 LOAD_GLOBAL 0 (fact)
24 LOAD_FAST 0 (n)
27 LOAD_CONST 2 (1)
30 BINARY_SUBTRACT
31 CALL_FUNCTION 1
34 BINARY_MULTIPLY
35 RETURN_VALUE
The only difference is that the version with the else includes code to return None in case control reaches the end of the function body.
I question the timings. The two functions aren't recursing into themselves: fact1 and fact2 both call fact, which isn't shown.
Once that is fixed, the disassembly (in both Py2.6 and Py2.7) shows that both are running the same opcodes except for the name of the function being recursed into. The choice of name triggers a small difference in timings because fact1 may be inserted in the module dictionary with no name collisions while fact2 may have a hash value that collides with another name in the module.
In other words, any differences you see in timings are not due to the choice of whether the else-clause is present :-)
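To rerun the comparison with each function recursing into itself, something like the following can be used. It is a sketch: `best_time` is an illustrative helper, and the repeat and iteration counts are arbitrary. Taking the minimum of several runs reduces the influence of background noise on the result.

```python
import timeit


def fact1(n):
    if n < 2:
        return 1
    else:
        return n * fact1(n - 1)


def fact2(n):
    if n < 2:
        return 1
    return n * fact2(n - 1)


def best_time(func, arg=1, number=100_000, repeat=5):
    """Smallest total time over `repeat` runs of `number` calls each."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number))


print(f"fact1: {best_time(fact1):.4f}s")
print(f"fact2: {best_time(fact2):.4f}s")
```

Renaming the functions and timing again is a quick way to see whether a remaining gap follows the name and not the presence of the else clause.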