Why is the Python Fibonacci sequence loop slower than recursion?

Below is the well-known Fibonacci sequence example:
# test.py
import sys
sys.setrecursionlimit(20000)
def fib_loop(n):
    if n <= 1:
        return n
    fn, fnm1 = 1, 0
    for _ in range(2, n+1):
        fn, fnm1 = fn + fnm1, fn
    return fn

def fib_recursion(n, memo={}):
    if n <= 1:
        return n
    if n not in memo:
        memo[n] = fib_recursion(n-1, memo) + fib_recursion(n-2, memo)
    return memo[n]
Like everybody else, I used to think that the loop variant would be much faster than the recursive one. However, the actual result is quite surprising.
$ python3 -m timeit "import test; test.fib_loop(10000)"
100 loops, best of 5: 1.93 msec per loop
$ python3 -m timeit "import test; test.fib_recursion(10000)"
500000 loops, best of 5: 471 nsec per loop
I have no idea why. Could anybody help me?

Because you are memoizing your result, and you are re-using that memo dict on every call. So the first time it runs, it is slow. On every other invocation, it is a simple dict lookup.
If you use number=1 so it runs only once, you'll see that the first call is actually slower:
>>> import sys
>>> sys.setrecursionlimit(20000)
>>>
>>> def fib_loop(n):
...     if n <= 1:
...         return n
...     fn, fnm1 = 1, 0
...     for _ in range(2, n+1):
...         fn, fnm1 = fn + fnm1, fn
...     return fn
...
>>> def fib_recursion(n, memo={}):
...     if n <= 1:
...         return n
...     if n not in memo:
...         memo[n] = fib_recursion(n-1, memo) + fib_recursion(n-2, memo)
...     return memo[n]
...
>>> import timeit
>>> timeit.timeit("fib_loop(1000)", setup="from __main__ import fib_loop", number=1)
9.027599999456015e-05
>>> timeit.timeit("fib_recursion(1000)", setup="from __main__ import fib_recursion", number=1)
0.0016194200000114733
Alternatively, if you pass a new memo dict for each outer call, you get the same behavior:
>>> timeit.timeit("fib_recursion(1000, {})", setup="from __main__ import fib_recursion", number=1000)
0.38679519899999093
>>> timeit.timeit("fib_loop(1000)", setup="from __main__ import fib_loop", number=1000)
0.07079556799999409
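If you want the cross-call caching without the mutable default argument, a minimal sketch using functools.lru_cache (assuming Python 3) behaves the same way, and cache_clear() lets you reset the cache between timing runs so every measurement starts cold:

import sys
import timeit
from functools import lru_cache

sys.setrecursionlimit(20000)

@lru_cache(maxsize=None)
def fib_recursion(n):
    if n <= 1:
        return n
    return fib_recursion(n - 1) + fib_recursion(n - 2)

# clearing the cache inside the timed statement keeps every run "cold"
print(timeit.timeit("fib_recursion.cache_clear(); fib_recursion(1000)",
                    setup="from __main__ import fib_recursion", number=100))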


Lazy evaluation of variables

I want to lexicographically compare two lists, but the values inside the list should be computed when needed. For instance, for these two lists
a = list([1, 3, 3])
b = list([1, 2, 2])
(a < b) == False
(b < a) == True
I'd like the values in the lists to be functions; in the case of a and b, the values (i.e. the functions) at index 2 would not be evaluated, since the values at index 1 (a[1]==3, b[1]==2) are already sufficient to determine that b < a.
One option would be to compare the elements manually, and that's probably what I will do if I don't find a solution that lets me use the list's comparator, but I found that the manual loop is a tad slower than the list's built-in comparator, which is why I want to make use of it.
Update
Here's a way to accomplish what I am trying to do, but I was wondering if there are any built-in functions that would do this faster (and which makes use of this feature of lists).
def lex_comp(a, b):
    for func_a, func_b in izip(a, b):
        v_a = func_a()
        v_b = func_b()
        if v_a < v_b: return -1
        if v_a > v_b: return +1
    return 0
def foo1(): return 1
def foo2(): return 1
def bar1(): return 1
def bar2(): return 2
def func1(): return ...
def func2(): return ...
list_a = [foo1, bar1, func1, ...]
list_b = [foo2, bar2, func2, ...]
# now you can use the comparator for instance to sort a list of these lists
sorted([list_a, list_b], cmp=lex_comp)
Try this (the extra parameters to the function are just for illustration purposes):
import itertools

def f(a, x):
    print "lazy eval of {}".format(a)
    return x

a = [lambda: f('a', 1), lambda: f('b', 3), lambda: f('c', 3)]
b = [lambda: f('d', 1), lambda: f('e', 2), lambda: f('f', 2)]
c = [lambda: f('g', 1), lambda: f('h', 2), lambda: f('i', 2)]

def lazyCmpList(a, b):
    l = len(list(itertools.takewhile(lambda (x, y): x() == y(), itertools.izip(a, b))))
    if l == len(a):
        return 0
    else:
        return cmp(a[l](), b[l]())
print lazyCmpList(a, b)
print lazyCmpList(b, a)
print lazyCmpList(b, c)
Produces:
lazy eval of a
lazy eval of d
lazy eval of b
lazy eval of e
-1
lazy eval of d
lazy eval of a
lazy eval of e
lazy eval of b
1
lazy eval of d
lazy eval of g
lazy eval of e
lazy eval of h
lazy eval of f
lazy eval of i
0
Note that the code assumes the lists of functions are of the same length. It could be enhanced to support unequal lengths, but you'd have to define what the logic should be, i.e. what should cmp([f1, f2, f3], [f1, f2, f3, f1]) produce?
I haven't compared the speed, but given your updated code I would imagine any speedup will be marginal (the looping is done in C code rather than Python). This solution may actually be slower, as it is more complex and involves more memory allocation.
Given that you are trying to sort a list of functions by evaluating them, it follows that the functions will be evaluated O(n log n) times, so your best speedup may come from memoization, which avoids repeatedly re-evaluating the functions.
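One hedged sketch of that memoization idea: since each element is a zero-argument callable, functools.lru_cache (Python 3) can wrap each one so it is computed at most once, no matter how many comparisons the sort performs:

from functools import lru_cache

def slow_value(x):
    return x * x   # stand-in for an expensive computation

# wrap each zero-argument callable so repeated calls hit the cache
row = [lru_cache(maxsize=None)(lambda v=v: slow_value(v)) for v in (1, 3, 3)]

row[1]()   # computed on the first call
row[1]()   # every later comparison gets the cached result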
Here is an approach that uses lazy evaluation:
>>> def f(x):
...     return 2**x
...
>>> def g(x):
...     return x*2
...
>>> [f(x) for x in range(1,10)]
[2, 4, 8, 16, 32, 64, 128, 256, 512]
>>> [g(x) for x in range(1,10)]
[2, 4, 6, 8, 10, 12, 14, 16, 18]
>>> import itertools
>>> zipped = zip((f(i) for i in range(1,10)),(g(i) for i in range(1,10)))
>>> x,y = next(itertools.dropwhile(lambda t: t[0]==t[1],zipped))
>>> x > y
True
>>> x < y
False
>>> x
8
>>> y
6
>>>
I did some testing and found that #juanpa's answer and the version in my update are the fastest versions:
import random
import itertools
import functools
num_rows = 100
data = [[random.randint(0, 2) for i in xrange(10)] for j in xrange(num_rows)]
# turn data values into functions.
def return_func(value):
    return value
list_funcs = [[functools.partial(return_func, v) for v in row] for row in data]
def lazy_cmp_FujiApple(a, b):
    l = len(list(itertools.takewhile(lambda (x, y): x() == y(), itertools.izip(a, b))))
    if l == len(a):
        return 0
    else:
        return cmp(a[l](), b[l]())
sorted1 = sorted(list_funcs, lazy_cmp_FujiApple)
%timeit sorted(list_funcs, lazy_cmp_FujiApple)
# 100 loops, best of 3: 2.77 ms per loop
def lex_comp_mine(a, b):
    for func_a, func_b in itertools.izip(a, b):
        v_a = func_a()
        v_b = func_b()
        if v_a < v_b: return -1
        if v_a > v_b: return +1
    return 0
sorted2 = sorted(list_funcs, cmp=lex_comp_mine)
%timeit sorted(list_funcs, cmp=lex_comp_mine)
# 1000 loops, best of 3: 930 µs per loop
def lazy_comp_juanpa(a, b):
    x, y = next(itertools.dropwhile(lambda t: t[0]==t[1], itertools.izip(a, b)))
    return cmp(x, y)
sorted3 = sorted(list_funcs, cmp=lazy_comp_juanpa)
%timeit sorted(list_funcs, cmp=lex_comp_mine)
# 1000 loops, best of 3: 949 µs per loop
%timeit sorted(data)
# 10000 loops, best of 3: 45.4 µs per loop
# print sorted(data)
# print [[c() for c in row] for row in sorted1]
# print [[c() for c in row] for row in sorted2]
# print sorted3
I guess the creation of an intermediate list is hurting performance of #FujiApple's version. When running my comparator version on the original data list and comparing the runtime to Python's native list sorting, I note that my version is about 10 times slower (501 µs vs 45.4 µs per loop). I guess there's no easy way to get close to the performance of Python's native implementation...
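For completeness, a sketch of one more option (assuming the callables are cheap enough to run once per row): sorting with a key= evaluates every function exactly once and then compares plain lists at C speed, instead of re-calling the functions inside a comparator O(n log n) times:

import functools
import random

# rebuild a small rows-of-callables structure like list_funcs above (hypothetical sizes)
rows = [[functools.partial(lambda v: v, random.randint(0, 2)) for _ in range(10)]
        for _ in range(100)]

# each row's functions are called exactly once, then plain lists are compared
ordered = sorted(rows, key=lambda row: [f() for f in row])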

Performance of choice vs randint

I want to pick a random integer between a and b, inclusive.
I know 3 ways of doing it. However, their performance seems very counter-intuitive:
import timeit
t1 = timeit.timeit("n=random.randint(0, 2)", setup="import random", number=100000)
t2 = timeit.timeit("n=random.choice([0, 1, 2])", setup="import random", number=100000)
t3 = timeit.timeit("n=random.choice(ar)", setup="import random; ar = [0, 1, 2]", number=100000)
[print(t) for t in [t1, t2, t3]]
On my machine, this gives:
0.29744589625620965
0.19716156798482648
0.17500512311108346
Using an online interpreter, this gives:
0.23830216699570883
0.16536146598809864
0.15081614299560897
Note how the most direct version (#1), which uses the dedicated function for exactly what I'm doing, is about 50% worse than the strangest version (#3), which pre-defines the list and then chooses randomly from it.
What's going on?
It's just implementation details. randint delegates to randrange, so it has another layer of function call overhead, and randrange goes through a lot of argument checking and other crud. In contrast, choice is a really simple one-liner.
Here's the code path randint goes through for this call, with comments and unexecuted code stripped out:
def randint(self, a, b):
    return self.randrange(a, b+1)

def randrange(self, start, stop=None, step=1, _int=int, _maxwidth=1L<<BPF):
    istart = _int(start)
    if istart != start:
        # not executed
    if stop is None:
        # not executed
    istop = _int(stop)
    if istop != stop:
        # not executed
    width = istop - istart
    if step == 1 and width > 0:
        if width >= _maxwidth:
            # not executed
        return _int(istart + _int(self.random()*width))
And here's the code path choice goes through:
def choice(self, seq):
    return seq[int(self.random() * len(seq))]
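Because the random module is implemented in Python on CPython, you can confirm the delegation on your own interpreter; a quick sketch (the exact bodies vary between versions):

import inspect
import random

print(inspect.getsource(random.Random.randint))   # a one-line wrapper around randrange
print(inspect.getsource(random.Random.choice))    # a much shorter body than randrange
print(inspect.getsource(random.Random.randrange)) # where the argument checking lives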

Python: Why is list comprehension slower than for loop

Essentially these are the same functions, except that the list comprehension uses sum instead of x = 0; x += ... since the latter is not supported inside a comprehension. Why does the list comprehension compile to something 40% slower?
# list comprehension
def movingAverage(samples, n=3):
    return [float(sum(samples[i-j] for j in range(n)))/n for i in range(n-1, len(samples))]

# regular
def moving_average(samples, n=3):
    l = []
    for i in range(n-1, len(samples)):
        x = 0
        for j in range(n):
            x += samples[i-j]
        l.append(float(x)/n)
    return l
For timing the sample inputs I used variations on [i*random.random() for i in range(x)]
You are using a generator expression in your list comprehension:
sum(samples[i-j] for j in range(n))
Generator expressions require a new frame to be created each time you run one, just like a function call. This is relatively expensive.
You don't need to use a generator expression at all; you only need to slice the samples list:
sum(samples[i - n + 1:i + 1])
You can specify a second argument, a start value for the sum() function; set it to 0.0 to get a float result:
sum(samples[i - n + 1:i + 1], 0.0)
Together these changes make all the difference:
>>> from timeit import timeit
>>> import random
>>> testdata = [i*random.random() for i in range(1000)]
>>> def slow_moving_average(samples, n=3):
...     return [float(sum(samples[i-j] for j in range(n)))/n for i in range(n-1, len(samples))]
...
>>> def fast_moving_average(samples, n=3):
...     return [sum(samples[i - n + 1:i + 1], 0.0) / n for i in range(n-1, len(samples))]
...
>>> def verbose_moving_average(samples, n=3):
...     l = []
...     for i in range(n-1, len(samples)):
...         x = 0.0
...         for j in range(n):
...             x += samples[i-j]
...         l.append(x / n)
...     return l
...
>>> timeit('f(s)', 'from __main__ import verbose_moving_average as f, testdata as s', number=1000)
0.9375386269966839
>>> timeit('f(s)', 'from __main__ import slow_moving_average as f, testdata as s', number=1000)
1.9631599469939829
>>> timeit('f(s)', 'from __main__ import fast_moving_average as f, testdata as s', number=1000)
0.5647804250038462
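If the window gets large, even the sliced version re-sums n items per output element. A sketch of an O(len(samples)) alternative (assuming n >= 1) that maintains a running window sum instead:

def running_moving_average(samples, n=3):
    result = []
    window = sum(samples[:n - 1], 0.0)        # sum of the first n-1 items
    for i in range(n - 1, len(samples)):
        window += samples[i]                  # slide the window forward
        result.append(window / n)
        window -= samples[i - n + 1]          # drop the oldest item
    return result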

Performance considerations when populating lists vs dictionaries

Say I need to collect millions of strings in an iterable that I can later randomly index by position.
I need to populate the iterable one item at a time, sequentially, for millions of entries.
Given the above, which method could in principle be more efficient:
Populating a list:
while <condition>:
    if <condition>:
        my_list[count] = value
        count += 1
Populating a dictionary:
while <condition>:
    if <condition>:
        my_dict[count] = value
        count += 1
(the above is pseudocode; everything would be initialized before running the snippets).
I am specifically interested in the CPython implementation for Python 3.4.
Lists are definitely faster, if you use them in the right way.
In [19]: %%timeit l = []
....: for i in range(1000000): l.append(str(i))
....:
1 loops, best of 3: 182 ms per loop
In [20]: %%timeit d = {}
....: for i in range(1000000): d[i] = str(i)
....:
1 loops, best of 3: 207 ms per loop
In [21]: %timeit [str(i) for i in range(1000000)]
10 loops, best of 3: 158 ms per loop
Pushing the Python loop down to the C level with a comprehension buys you quite a bit of time. It also makes more sense to use a list when the keys are just the consecutive integers 0, 1, 2, .... Pre-allocating saves even more time:
>>> %%timeit
... l = [None] * 1000000
... for i in xrange(1000000): l[i] = str(i)
...
10 loops, best of 3: 147 ms per loop
For completeness, a dict comprehension does not speed things up:
In [22]: %timeit {i: str(i) for i in range(1000000)}
1 loops, best of 3: 213 ms per loop
With larger strings, I see very similar differences in performance (try str(i) * 10). This is CPython 2.7.6 on an x86-64.
I don't understand why you want to create an empty list or dict and then populate it. Why not create a new list or dictionary directly from the generation process?
results = list(a_generator)
# Or if you really want to use a dict for some reason:
results = dict(enumerate(a_generator))
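A small end-to-end sketch of that pattern (a_generator here is just a stand-in for whatever actually produces the strings):

a_generator = (str(i) for i in range(10 ** 6))
results = list(a_generator)
print(results[123456])   # random access by position, as required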
You can get even better times by using the map function:
>>> def test1():
        l = []
        for i in range(10 ** 6):
            l.append(str(i))
>>> def test2():
        d = {}
        for i in range(10 ** 6):
            d[i] = str(i)
>>> def test3():
        [str(i) for i in range(10 ** 6)]
>>> def test4():
        {i: str(i) for i in range(10 ** 6)}
>>> def test5():
        list(map(str, range(10 ** 6)))
>>> def test6():
        r = range(10 ** 6)
        dict(zip(r, map(str, r)))
>>> timeit.Timer('test1()', 'from __main__ import test1').timeit(100)
30.628035710889932
>>> timeit.Timer('test2()', 'from __main__ import test2').timeit(100)
31.093550469839613
>>> timeit.Timer('test3()', 'from __main__ import test3').timeit(100)
25.778271498509355
>>> timeit.Timer('test4()', 'from __main__ import test4').timeit(100)
30.10892986559668
>>> timeit.Timer('test5()', 'from __main__ import test5').timeit(100)
20.633583353028826
>>> timeit.Timer('test6()', 'from __main__ import test6').timeit(100)
28.660790917067914
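Beyond speed, the list is also the more compact container for position-indexed data. A rough sketch using sys.getsizeof, which measures only the container itself, not the stored strings (exact numbers depend on the Python version):

import sys

n = 10 ** 6
l = [str(i) for i in range(n)]
d = {i: str(i) for i in range(n)}

print(sys.getsizeof(l))   # just a pointer slot per element
print(sys.getsizeof(d))   # noticeably larger: hash table slots plus the integer keys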

What is the best way to compute a factorial in Python?

I am researching the speed of computing factorials, but I am only using two ways:
import timeit
def fact(N):
    B = N
    while N > 1:
        B = B * (N-1)
        N = N-1
    return B

def fact1(N):
    B = 1
    for i in range(1, N+1):
        B = B * i
    return B
print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
Here is the output,
0.540276050568 120
0.654400110245 120
From the above code I have observed that the while loop takes less time than the for loop.
My question is: is this the best way to find the factorial in Python?
If you're looking for the best, why not use the one provided in the math module?
>>> import math
>>> math.factorial
<built-in function factorial>
>>> math.factorial(10)
3628800
And a comparison of timings on my machine:
>>> print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
0.840167045593 120
>>> print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
1.04350399971 120
>>> print timeit.timeit('factorial(5)', setup="from math import factorial")
0.149857997894
We see that the builtin is significantly better than either of the pure python variants you proposed.
TL;DR: microbenchmarks aren't very useful.
For CPython, try this:
>>> from math import factorial
>>> print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
1.38128209114 120
>>> print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
1.46199703217 120
>>> print timeit.timeit('factorial(5)', setup="from math import factorial"), factorial(5)
0.397044181824 120
But under PyPy, the while version is faster than the one from math:
>>>> print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
0.170556783676 120
>>>> print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
0.319650173187 120
>>>> print timeit.timeit('factorial(5)', setup="from math import factorial"), factorial(5)
0.210616111755 120
So it depends on the implementation. Now try bigger numbers
>>>> print timeit.timeit('fact(50)', setup="from __main__ import fact"), fact(50)
7.71517109871 30414093201713378043612608166064768844377641568960512000000000000
>>>> print timeit.timeit('fact1(50)', setup="from __main__ import fact1"), fact1(50)
6.58060312271 30414093201713378043612608166064768844377641568960512000000000000
>>>> print timeit.timeit('factorial(50)', setup="from math import factorial"), factorial(50)
6.53072690964 30414093201713378043612608166064768844377641568960512000000000000
Now while is in last place, and the for version is about the same as the one from the math module.
Otherwise, if you're looking for a Python implementation (this is my favourite):
from operator import mul
def factorial(n):
    return reduce(mul, range(1, (n + 1)), 1)
Usage:
>>> factorial(0)
1
>>> factorial(1)
1
>>> factorial(2)
2
>>> factorial(3)
6
>>> factorial(4)
24
>>> factorial(5)
120
>>> factorial(10)
3628800
Performance: (On my desktop:)
$ python -m timeit -c -s "fact = lambda n: reduce(lambda a, x: a * x, range(1, (n + 1)), 1)" "fact(10)"
1000000 loops, best of 3: 1.98 usec per loop
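In Python 3, reduce has moved to functools, and math.prod (3.8+) offers an equivalent one-liner; a small sketch of both, assuming Python 3.8 or newer:

from functools import reduce
from operator import mul
import math

def factorial_reduce(n):
    return reduce(mul, range(1, n + 1), 1)

def factorial_prod(n):
    return math.prod(range(1, n + 1))

assert factorial_reduce(10) == factorial_prod(10) == math.factorial(10) == 3628800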
I have tried with reduce(lambda x, y: x*y, range(1, 5))
>>>timeit("import math; math.factorial(4)")
1.0205099133840179
>>>timeit("reduce(lambda x, y: x*y, range(1, 5))")
1.4047879075160665
>>>timeit("from operator import mul;reduce(mul, range(1, 5))")
2.530837320051319
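Two of the three statements above also time an import along with the call. A sketch of a fairer comparison keeps the imports in setup= so only the call itself is measured (written for Python 3, where reduce lives in functools):

import timeit

print(timeit.timeit("factorial(4)", setup="from math import factorial"))
print(timeit.timeit("reduce(mul, range(1, 5))",
                    setup="from functools import reduce; from operator import mul"))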
