I've been playing around with memoization and recursion in Python 3.3.
Ignoring the fact that Python is the wrong language to be doing this in, I've found that I get inconsistent results between using functools.lru_cache to memoize and not using it.
I'm not changing the recursion limit - it stays at the default, which for me is 1000.
To test the problem, I've written a simple recursive function to sum the numbers from 1 through i:
#!/usr/bin/python

def sumtil(i):
    """Recursive function to sum all numbers from 1 through i"""
    # Base case, the sum of all numbers from 1 through 1 is 1...
    if i == 1:
        return 1
    else:
        return i + sumtil(i-1)

# This will not throw an exception
sumtil(998)

# This will throw an exception
sumtil(999)
Running this function normally, I can run sumtil(998) comfortably without hitting the recursion limit. sumtil(999) or above will throw an exception.
However, if I try decorating this function with @functools.lru_cache(), the recursion limit exception is thrown at one third of the depth, when running sumtil(333):
#!/usr/bin/python
import functools

@functools.lru_cache(maxsize=128)
def sumtil(i):
    """Recursive function to sum all numbers from 1 through i"""
    # Base case, the sum of all numbers from 1 through 1 is 1...
    if i == 1:
        return 1
    else:
        return i + sumtil(i-1)

# This will not throw an exception
sumtil(332)

# This will throw an exception
sumtil(333)
Since 332 * 3 = 996 and 333 * 3 = 999, it appears to me that the lru_cache decorator is causing each level of recursion in my function to become three levels of recursion.
Why do I get three times as many levels of recursion when using functools.lru_cache to memoize a function?
Because a decorator is an extra function, it "uses" one extra level in the stack per call. Example:
>>> def foo(f):
...     def bar(i):
...         if i == 1:
...             raise Exception()
...         return f(i)
...     return bar
...
>>> @foo
... def sumtil(i):
...     if i == 1:
...         return 1
...     else:
...         return i+sumtil(i-1)
...
>>> sumtil(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in bar
File "<stdin>", line 6, in sumtil
File "<stdin>", line 5, in bar
File "<stdin>", line 6, in sumtil
File "<stdin>", line 4, in bar
Exception
>>>
Besides, if the decorator uses argument packing/unpacking, then an extra level is used (though I'm not knowledgeable enough about the Python runtime to explain why that happens).
def foo(f):
    def bar(*args, **kwargs):
        return f(*args, **kwargs)
    return bar
Max. recursion depth exceeded:
undecorated: 1000
w/o packing: 500
with packing: 334
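If you want to keep both the recursion and the cache, one practical workaround is to warm the cache bottom-up so that no single call ever recurses deeply. A minimal sketch; the helper name sumtil_warmed and the step size of 200 are my own choices, not part of lru_cache:

import functools

@functools.lru_cache(maxsize=None)
def sumtil(i):
    """Recursive function to sum all numbers from 1 through i"""
    return 1 if i == 1 else i + sumtil(i - 1)

def sumtil_warmed(i, step=200):
    # Each warming call recurses at most `step` levels past what is
    # already cached, staying well under the default limit of 1000
    # even with the extra wrapper frames the decorator adds.
    for j in range(step, i, step):
        sumtil(j)
    return sumtil(i)

print(sumtil_warmed(10000))  # 50005000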
I was revisiting some dynamic programming concepts and wrote some code to calculate Fibonacci numbers with memoization.
Here is the code:
def fib(n, memo={}):
    if n in memo:
        return memo[n]
    if n <= 2:
        return 1
    memo[n] = fib(n-1, memo) + fib(n-2, memo)
    return memo[n]
Now I ran some test cases, and here are the results:
>>> fib(2)
1
>>> fib(1000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in fib
File "<stdin>", line 6, in fib
File "<stdin>", line 6, in fib
[Previous line repeated 995 more times]
File "<stdin>", line 4, in fib
RecursionError: maximum recursion depth exceeded in comparison
>>> fib(6)
8
>>> fib(10)
55
>>> fib(100)
354224848179261915075
...
>>> fib(980)
2873442049110331124686847975839596483580184681281842823699466692000268066325404550898791458706068536914736664630537864515125212890415097163803163111745199726085365105
>>> fib(990)
3534100091787525753399448335204590682849450463581549776041091752538906966342713601215835661100647255108360758515849851434123968685864251091027232911065706187500753920
>>> fib(999)
2686381002448535938614672720214292396761660931898695234012317599761798170024788168933836965448335656419182785616144335631297667364221035032463485041037768036733415116
>>> fib(1000)
4346655768693745643568852767504062580256466051737178040248172908953655541794905189040387984007925516929592259308032263477520968962323987332247116164299644090653318795
When I ran fib(1000) the first time, it said maximum recursion depth exceeded. However, when I gradually increased n, fib(1000) worked fine.
Then I tried fib(2000) and got the same exception.
>>> fib(2000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in fib
File "<stdin>", line 6, in fib
File "<stdin>", line 6, in fib
[Previous line repeated 995 more times]
File "<stdin>", line 4, in fib
RecursionError: maximum recursion depth exceeded in comparison
I tried gradually increasing n and it worked fine again:
>>> fib(1200)
2726988445540627015799161531364219870500077999291772582118050289497472647637302680948250928456231003117017238012762721449359761674385644301603997220584740591763466070
>>> fib(1500)
1355112566856310195163693686714840837778601071241849724213354315322148731087352875061225935403571726530037377881434732025769925708235655004534991410292424959599748390
>>> fib(1700)
8501653935514177120246392248625833924052052390491381030300605977750345588982825628424071479174753549360050542305550855066813804919653208931716726270523366654632196915
>>> fib(1900)
5333735470177196739708654380013216364182711606231750028692155598599810955874132791398352277818697705852238294681640540003099177608752396895596802978549351480795061055
>>> fib(2000)
4224696333392304878706725602341482782579852840250681098010280137314308584370130707224123599639141511088446087538909603607640194711643596029271983312598737326253555805
>>> fib(2500)
1317090516751949629522763087125316412066606964992507141887746936727530870405038425764503130123186407746570862185871925952766836352119119528156315582632460790383834605
>>> fib(2900)
5184080332847202181832545365520373859688699234105705045492742368770388504951261158081878962852500283133276036303031796698449718008155302155556519351587134410081144235
>>> fib(3000)
4106158863079712603335683787192671052201251086373692524088854309269055842741134037313304916608500445608300368357069422745885693621454765026743730454468521604866062920
Same thing happens if I run fib(4000) immediately afterwards, but it works fine if I gradually increase n. I am basically trying to understand why that is the case. The memo object is not global and should be initialized in the first call to the function, so successively increasing n to 1000 should, in theory, be no different from directly calling fib(1000).
This is because if memo is empty, the recursion needs to go all the way down to the base case of n <= 2. So if your very first call is fib(1000), you may bump into a stack overflow.
However, when you start with smaller values, like the call of fib(10), memo will collect lots of results, including for 10 and 9. And so the next time you make a call increasing the argument that you pass, it doesn't have to recur all the way to 2, but can already back track when it reaches 9 or 10, as it will find it already available in memo.
Note that memo is only initialised to {} at the moment the function is defined, so you just keep extending the same dict across calls, thereby reducing the need to use the call stack for deep recursions.
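A minimal illustration of that last point, with a throwaway function: the default dict is created once, when the def statement runs, and every later call sees the same object.

def f(x, seen={}):
    seen[x] = True       # mutate the shared default dict
    return len(seen)

print(f('a'))  # 1
print(f('b'))  # 2 -- the same dict as before, not a fresh one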
As trincot mentioned, when memo has no data yet, each call finds no cached value and has to make yet another recursive call, until the remaining value is computable within the stack limit. Each call reserves some space on the stack for the return point, the function's arguments, and perhaps some other bookkeeping. With an empty memo you repeat that reservation (return point + arguments + other stuff) N times before any computation has produced a result, so the only possible outcome is exhausting either the recursion limit or the stack itself, depending on which is smaller.
However, you get a clean exception, so when such a case occurs you can simply catch it, split the number into smaller parts (e.g. divide it by two or four), compute those first, and retry recursively. This way, even if one call blows the stack, you fall back to a smaller number that fills the memo, and then you slowly bootstrap your way up to the larger numbers, as in the sketch below.
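A minimal sketch of that idea, reusing the fib from the question; the wrapper name fib_safe and the step size of 500 are arbitrary choices of mine:

def fib(n, memo={}):
    if n in memo:
        return memo[n]
    if n <= 2:
        return 1
    memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]

def fib_safe(n, step=500):
    try:
        return fib(n)
    except RecursionError:
        # Warm the memo bottom-up: each call now recurses at most
        # `step` levels past what is already cached.
        for k in range(step, n, step):
            fib(k)
        return fib(n)

print(fib_safe(5000))  # works, even though a cold fib(5000) overflows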
I wrote the following code, which works for calculating Fibonacci sequences:
arr = [0]
i = 1

def get_fib(position):
    if position == 0:
        return arr[0]
    if position > 0:
        global i
        arr.append(i)
        i = i + arr[-2]
        get_fib(position - 1)
        return arr[position]
Is this still recursion, even though I don't use return before get_fib?
Do I need to include return for a function to be recursive?
The function is recursive because it calls itself. So, no, technically you don't need to return the value from that call for it to be recursive.
However, for this function to work, you do. Consider this example:
def factorial(n):
    if n == 0:
        return 1
    else:
        return factorial(n-1) * n
This does the same as:
def factorial(n):
    if n == 0:
        result = 1
    else:
        result = factorial(n-1) * n
    return result
What do you think would happen if we change the next to last line to just:
factorial(n-1) * n
Now there is no longer a result being assigned, and the function will fail with an UnboundLocalError, because result never gets a value. If we change the original in a similar way:
def factorial(n):
    if n == 0:
        return 1
    else:
        factorial(n-1) * n
It would calculate factorial(n-1) * n, but it would simply discard the result, and since there is no statement after it, the function would return (!) without a return statement, returning None instead. (In fact, for n >= 2 the inner call already returns None, so the multiplication itself raises a TypeError.)
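A quick demonstration of both failure modes, using the broken variant above:

def factorial(n):
    if n == 0:
        return 1
    else:
        factorial(n - 1) * n  # computed, then thrown away

print(factorial(0))  # 1    -- the base case still returns
print(factorial(1))  # None -- the else branch falls through
factorial(2)         # TypeError: the inner call returned None, and None * 2 fails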
An example of a recursive function that does something useful without returning anything:
from pathlib import Path

def list_txt(dir):
    for name in Path(dir).glob('*'):
        if name.is_dir():
            list_txt(name)
        elif name.suffix.lower() == '.txt':
            print(name)
This function is recursive, because it calls itself with list_txt(name), but it doesn't need to return anything; it just returns None whenever it is done. It goes through a directory and all its subdirectories and lists all the .txt files in them. A recursive function isn't necessarily the best choice here, but it's very easy to write, read, and maintain.
Yes, the function is recursive, by definition, because it calls itself. Where the call to return is placed is not what determines whether or not the function is recursive. However, a recursive function must stop recursing at some point (known as the "base case"), because if it doesn't, it will cause infinite recursion, which throws a "RecursionError: maximum recursion depth exceeded" exception once you pass the Python interpreter's recursion limit.
Consider two examples:
Example 1:
>>> def fun_a():
...     fun_a()
This is a simple function that calls itself. It has no terminating condition (the condition at which it stops calling itself and starts popping the stack frames that were built up by those calls). This is an example of infinite recursion. If you execute such a function, you'll get an error message similar to this:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in fun_a
File "<stdin>", line 2, in fun_a
File "<stdin>", line 2, in fun_a
[Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded
Example 2:
>>> def fun_b(n):
...     if n == 0:
...         return
...     fun_b(n-1)
In this case, even though the function calls itself over and over again, there is a termination condition that stops it from calling itself indefinitely. This is an example of finite recursion, and we say that the recursion unwinds at the base case (here, the base case is when the value of n becomes 0).
To conclude, this is the general shape of a finite recursion. The base case must be checked before the recursive call; otherwise, the function will never stop calling itself and will lead to infinite recursion.
function_name(parameters)
    base case
    recursive call
A function can be recursive even if there is no return statement. A classic example is the inorder traversal of a binary tree. It doesn't have a return statement. The only requirement for a function to be recursive is that it should call itself. Below is the code (in C).
void inorder(struct node* root)
{
    if (root)
    {
        inorder(root->left);
        printf("%d", root->data);
        inorder(root->right);
    }
}
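For consistency with the rest of this page, here is the same traversal as a Python sketch with a hypothetical Node class; note again that it never returns anything:

class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def inorder(root):
    if root:
        inorder(root.left)   # visit the left subtree first
        print(root.data)     # then the node itself
        inorder(root.right)  # then the right subtree

inorder(Node(2, Node(1), Node(3)))  # prints 1, 2, 3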
I'm trying out Python's timeit function in my REPL. It can time small pieces of code in two ways: either as a callable, or as a quoted expression. I'd like to know why the following code produces different timing results.
>>> import timeit
>>> timeit.timeit("lambda *args: None")
0.058281898498535156
>>> timeit.timeit(lambda *args: None)
0.0947730541229248
>>>
My intuition tells me that there should be more 'overhead' associated with the quoted string variant, because it requires interpretation, but this does not appear to be the case. Apparently my intuition is mistaken.
Here's another code snippet. There does not appear to be a huge time difference between timing the callable function and timing the quoted expression:
>>> def costly_func():
...     return list(map(lambda x: x^2, range(10)))
...
>>> import timeit
>>> timeit.timeit(costly_func)
2.421797037124634
>>> timeit.timeit("list(map(lambda x: x^2, range(10)))")
2.3588619232177734
Observe:
>>> def costly():
...     return list(map(str, list(range(1_000_000))))
...
>>> timeit.timeit(costly, number=100)
30.65105245400082
>>> timeit.timeit('costly', number=1_000_000_000, globals=globals())
27.45540758000061
Look at the number argument. It took 30 seconds to execute the function costly 100 times. It took almost the same time to execute the expression costly 1,000,000,000 (!) times.
Why? Because the second call does not execute the function costly! The only thing it executes is the expression costly: notice the lack of parentheses, which means it is not a function call. The expression costly is basically a no-op (well, it just requires checking whether the name "costly" exists in the current scope), which is why it is so fast; if Python were smart enough to optimise it away, evaluating the expression costly (not costly()!) would be instantaneous.
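To confirm, time the expression costly() (with parentheses, so the function is actually called); the result should land close to the 30 seconds measured for the callable:

>>> timeit.timeit('costly()', number=100, globals=globals())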
In your case, saying lambda *args: None is simply defining an anonymous function, right? When you execute this exact code, a new function is created, but not executed (in order to do that, you should call it: (lambda *args: None)()).
So, timing the string "lambda *args: None" with timeit.timeit("lambda *args: None") basically tests how fast Python can spit out new anonymous functions.
Timing the function itself with timeit.timeit(lambda *args: None) tests how fast Python can execute an existing function.
Spitting out newly created functions is a piece of cake, while actually running them can be really hard.
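Applied to the original snippet, the three variants below time three different things; only the second and third actually run the function (a small sketch, numbers will vary by machine):

import timeit

print(timeit.timeit("lambda *args: None"))      # creates a new function each iteration
print(timeit.timeit("(lambda *args: None)()"))  # creates it *and* calls it
print(timeit.timeit(lambda *args: None))        # calls one existing function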
Take this code for example:
def Ackermann(m, n):
    if m == 0:
        return n + 1
    if m > 0:
        if n == 0:
            return Ackermann(m - 1, 1)
        elif n > 0:
            return Ackermann(m - 1, Ackermann(m, n - 1))
If you put that exact code in a string and timeit it, you'll get something like this:
>>> code = """def Ackermann(m, n):
... if m == 0:
... return 0
... if m > 0:
... if n == 0:
... return Ackermann(m - 1, 1)
... elif n > 0:
... return Ackermann(m - 1, Ackermann(m, n - 1))"""
>>> timeit.timeit(code, number=1_000_000)
0.10481472999890684
Now try to timeit the function itself:
>>> timeit.timeit(lambda : Ackermann(6, 4), number=1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/timeit.py", line 232, in timeit
return Timer(stmt, setup, timer, globals).timeit(number)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/timeit.py", line 176, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "<stdin>", line 1, in <lambda>
File "<stdin>", line 8, in Ackermann
File "<stdin>", line 8, in Ackermann
File "<stdin>", line 8, in Ackermann
[Previous line repeated 1 more time]
File "<stdin>", line 6, in Ackermann
File "<stdin>", line 8, in Ackermann
File "<stdin>", line 6, in Ackermann
File "<stdin>", line 8, in Ackermann
File "<stdin>", line 8, in Ackermann
File "<stdin>", line 8, in Ackermann
[Previous line repeated 983 more times]
File "<stdin>", line 6, in Ackermann
File "<stdin>", line 2, in Ackermann
RecursionError: maximum recursion depth exceeded in comparison
See - you can't even run that! Actually, probably nobody can since it's so much recursion!
Why did the first call succeed, though? Because it didn't execute anything; it just spat out a lot of new functions and got rid of all of them shortly after.
When a function is called with argument unpacking, it seems to consume two levels of recursion depth per call. I would like to know why this happens.
Normally:
depth = 0

def f():
    global depth
    depth += 1
    f()

try:
    f()
except RuntimeError:
    print(depth)
#>>> 999
With an unpacking call:
depth = 0

def f():
    global depth
    depth += 1
    f(*())

try:
    f()
except RuntimeError:
    print(depth)
#>>> 500
In theory both should reach about 1000:
import sys
sys.getrecursionlimit()
#>>> 1000
This happens on CPython 2.7 and CPython 3.3.
On PyPy 2.7 and PyPy 3.3 there is a difference, but it is much smaller (1480 vs 1395 and 1526 vs 1395).
As you can see from the disassembly, there is little difference between the two, other than the type of call (CALL_FUNCTION vs CALL_FUNCTION_VAR):
import dis

def f():
    f()

dis.dis(f)
#>>> 34 0 LOAD_GLOBAL 0 (f)
#>>> 3 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
#>>> 6 POP_TOP
#>>> 7 LOAD_CONST 0 (None)
#>>> 10 RETURN_VALUE
def f():
    f(*())

dis.dis(f)
#>>> 47 0 LOAD_GLOBAL 0 (f)
#>>> 3 BUILD_TUPLE 0
#>>> 6 CALL_FUNCTION_VAR 0 (0 positional, 0 keyword pair)
#>>> 9 POP_TOP
#>>> 10 LOAD_CONST 0 (None)
#>>> 13 RETURN_VALUE
The exception message actually offers you a hint. Compare the non-unpacking option:
>>> import sys
>>> sys.setrecursionlimit(4) # to get there faster
>>> def f(): f()
...
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
RuntimeError: maximum recursion depth exceeded
with:
>>> def f(): f(*())
...
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
RuntimeError: maximum recursion depth exceeded while calling a Python object
Note the addition of the while calling a Python object. This exception is specific to the PyObject_CallObject() function. You won't see this exception when you set an odd recursion limit:
>>> sys.setrecursionlimit(5)
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
RuntimeError: maximum recursion depth exceeded
because that is the specific exception raised in the ceval.c frame evaluation code inside PyEval_EvalFrameEx():
/* push frame */
if (Py_EnterRecursiveCall(""))
    return NULL;
Note the empty message there. This is a crucial difference.
For your 'regular' function (no variable arguments), what happens is that an optimized path is picked; a Python function that doesn't need tuple or keyword argument unpacking support is handled directly in the fast_function() function of the evaluation loop. A new frameobject with the Python bytecode object for the function is created, and run. This is one recursion check.
But for a function call with variable arguments (tuple or dictionary or both), the fast_function() call cannot be used. Instead, ext_do_call() (extended call) is used, which handles the argument unpacking, then uses PyObject_Call() to invoke the function. PyObject_Call() does a recursion limit check, and 'calls' the function object. The function object is invoked via the function_call() function, which calls PyEval_EvalCodeEx(), which calls PyEval_EvalFrameEx(), which makes the second recursion limit check.
TL;DR version
Python functions calling Python functions are optimised and bypass the PyObject_Call() C-API function, unless argument unpacking takes place. Both Python frame execution and PyObject_Call() make recursion limit tests, so bypassing PyObject_Call() avoids incrementing the recursion limit check per call.
More places with 'extra' recursion depth checks
You can grep the Python source code for Py_EnterRecursiveCall for other locations where recursion depth checks are made; various libraries, such as json and pickle, use it to avoid parsing structures that are too deeply nested or recursive, for example. Other checks are placed in the list and tuple __repr__ implementations, rich comparisons (__gt__, __lt__, __eq__, etc.), handling the __call__ callable object hook, and handling __str__ calls.
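For instance, json's encoder performs such a check while serialising, so a deeply nested structure raises RecursionError (RuntimeError on older Pythons) without any deeply recursive Python code of your own; an illustrative sketch:

import json

deep = prev = []
for _ in range(2000):  # build a list nested 2000 levels deep
    prev.append([])
    prev = prev[0]

json.dumps(deep)  # the encoder's recursion depth check trips here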
As such, you can hit the recursion limit much faster still:
>>> class C:
...     def __str__(self):
...         global depth
...         depth += 1
...         return self()
...     def __call__(self):
...         global depth
...         depth += 1
...         return str(self)
...
>>> depth = 0
>>> sys.setrecursionlimit(10)
>>> C()()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 9, in __call__
File "<stdin>", line 5, in __str__
RuntimeError: maximum recursion depth exceeded while calling a Python object
>>> depth
2
Why/how does this create a seemingly infinite loop? I incorrectly assumed this would cause some form of stack overflow error.
i = 0

def foo():
    global i
    i += 1
    try:
        foo()
    except RuntimeError:
        # This call recursively goes off toward infinity, apparently.
        foo()

foo()
print i
The RuntimeError exception will be raised if the recursion limit is exceeded.
Since you're catching this exception, your machine is going to continue on, but you're only adding to a single global int value, which doesn't use much memory.
You can set the recursion limit with sys.setrecursionlimit().
The current limit can be found with sys.getrecursionlimit().
>>> import sys
>>> sys.setrecursionlimit(100)
>>>
>>> def foo(i):
...     i += 1
...     foo(i)
...
>>> foo(1)
Traceback (most recent call last):
...
File "<stdin>", line 3, in foo
RuntimeError: maximum recursion depth exceeded
>>>
If you want to run out of memory, try consuming more of it:
>>> def foo(l):
...     l = l * 100
...     foo(l)
...
>>> foo(["hello"])
Traceback (most recent call last):
...
File "<stdin>", line 2, in foo
MemoryError
>>>
If you change the code to
i = 0

def foo():
    global i
    i += 1
    print i
    try:
        foo()
    except RuntimeError:
        # This call recursively goes off toward infinity, apparently.
        foo()
    finally:
        i -= 1
        print i

foo()
You'll observe that the output oscillates just below 999 (1000 being Python's default recursion limit). That means that when the limit is hit (RuntimeError), the last call of foo() is terminated, and another one is set off to replace it immediately.
If you raise a KeyboardInterrupt, you'll observe how the entire stack of calls is terminated at once.
UPDATE
Interestingly, the second call of foo() is no longer protected by the try...except block. Therefore the application will in fact terminate eventually. This becomes apparent if you set the recursion limit to a smaller number; e.g., this is the output for sys.setrecursionlimit(3):
$ python test.py
1
2
1
2
1
0
Traceback (most recent call last):
File "test.py", line 19, in <module>
foo()
File "test.py", line 14, in foo
foo()
File "test.py", line 14, in foo
foo()
RuntimeError