I was revisiting some dynamic programming concepts and wrote some code to calculate Fibonacci numbers with memoization.
Here is the code:
def fib(n, memo={}):
    if n in memo:
        return memo[n]
    if n <= 2:
        return 1
    memo[n] = fib(n-1, memo) + fib(n-2, memo)
    return memo[n]
Now I ran some test cases, and here are the results:
>>> fib(2)
1
>>> fib(1000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in fib
File "<stdin>", line 6, in fib
File "<stdin>", line 6, in fib
[Previous line repeated 995 more times]
File "<stdin>", line 4, in fib
RecursionError: maximum recursion depth exceeded in comparison
>>> fib(6)
8
>>> fib(10)
55
>>> fib(100)
354224848179261915075
...
>>> fib(980)
2873442049110331124686847975839596483580184681281842823699466692000268066325404550898791458706068536914736664630537864515125212890415097163803163111745199726085365105
>>> fib(990)
3534100091787525753399448335204590682849450463581549776041091752538906966342713601215835661100647255108360758515849851434123968685864251091027232911065706187500753920
>>> fib(999)
2686381002448535938614672720214292396761660931898695234012317599761798170024788168933836965448335656419182785616144335631297667364221035032463485041037768036733415116
>>> fib(1000)
4346655768693745643568852767504062580256466051737178040248172908953655541794905189040387984007925516929592259308032263477520968962323987332247116164299644090653318795
When I ran fib(1000) the first time, it said maximum recursion depth exceeded. However, when I gradually increased n, fib(1000) worked fine.
Then I tried fib(2000) and got the same exception.
>>> fib(2000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in fib
File "<stdin>", line 6, in fib
File "<stdin>", line 6, in fib
[Previous line repeated 995 more times]
File "<stdin>", line 4, in fib
RecursionError: maximum recursion depth exceeded in comparison
I tried gradually increasing n and it worked fine again:
>>> fib(1200)
2726988445540627015799161531364219870500077999291772582118050289497472647637302680948250928456231003117017238012762721449359761674385644301603997220584740591763466070
>>> fib(1500)
1355112566856310195163693686714840837778601071241849724213354315322148731087352875061225935403571726530037377881434732025769925708235655004534991410292424959599748390
>>> fib(1700)
8501653935514177120246392248625833924052052390491381030300605977750345588982825628424071479174753549360050542305550855066813804919653208931716726270523366654632196915
>>> fib(1900)
5333735470177196739708654380013216364182711606231750028692155598599810955874132791398352277818697705852238294681640540003099177608752396895596802978549351480795061055
>>> fib(2000)
4224696333392304878706725602341482782579852840250681098010280137314308584370130707224123599639141511088446087538909603607640194711643596029271983312598737326253555805
>>> fib(2500)
1317090516751949629522763087125316412066606964992507141887746936727530870405038425764503130123186407746570862185871925952766836352119119528156315582632460790383834605
>>> fib(2900)
5184080332847202181832545365520373859688699234105705045492742368770388504951261158081878962852500283133276036303031796698449718008155302155556519351587134410081144235
>>> fib(3000)
4106158863079712603335683787192671052201251086373692524088854309269055842741134037313304916608500445608300368357069422745885693621454765026743730454468521604866062920
Same thing happens if I run fib(4000) immediately afterwards, but it works fine if I gradually increase n. I am basically trying to understand why that is the case. The memo object is not global and should be initialized on the first call to the function, so successively increasing n to 1000 should, in theory, be no different from directly calling fib(1000).
This is because if memo is empty, the recursion needs to go all the way down to the base case of n <= 2. So if your very first call is fib(1000), you may run into a stack overflow.
However, when you start with smaller values, like a call of fib(10), memo collects lots of results, including those for 10 and 9. So the next time you make a call with a larger argument, the recursion doesn't have to go all the way down to 2; it can already backtrack when it reaches 9 or 10, as those results are available in memo.
Note that memo is only initialised to {} at the moment the function is defined, so the same dict just keeps being extended, thereby reducing the need to use the call stack for deep recursions.
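You can see that persistence directly by inspecting the function's default arguments (a quick demonstration using the fib defined above; the memo dict is the first entry of fib.__defaults__):
>>> fib(10)
55
>>> fib.__defaults__[0][10]  # the memo dict lives on the function object itself
55
>>> len(fib.__defaults__[0])  # entries for 3..10; it keeps growing across calls
8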
As trincot mentioned, when there's no data in memo yet, the code keeps issuing another call to the function instead of finding a stored value, until the value is computable within the stack limit. Each function call reserves some space on the stack: a return point, the function's arguments, and perhaps some other bookkeeping. When memo is empty and no computation has started yet, you repeat that storing of return point + args + other stuff N times, so the only possible outcome is either recursion-limit exhaustion or stack exhaustion, depending on the two sizes.
However, you get a nice exception, so when such a case occurs, simply catch it, split the number into parts (e.g. divide it by two/four/...) and call again (recursively). This way, even if your recursion would blow the stack, you recursively reach a smaller number that fills the memo, and then you slowly bootstrap your way up to the larger numbers.
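Here is a sketch of that bootstrapping idea. It is my own variant (the name fib_bootstrap and the fixed step size are inventions): rather than halving, it warms the memo in fixed-size steps, which guarantees each retry stays within the stack limit.
def fib_bootstrap(n, memo={}):
    """Hypothetical helper: if fib(n) exhausts the stack, warm the memo
    in steps small enough to fit under the recursion limit, then retry."""
    step = 500  # comfortably below the default recursion limit of 1000
    try:
        return fib(n, memo)
    except RecursionError:
        for k in range(step, n, step):
            fib(k, memo)  # each warm-up call needs only ~step new frames
        return fib(n, memo)

fib_bootstrap(10_000)  # succeeds where a cold fib(10_000) would raise RecursionError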
I've learned about iterators and such, and discovered this quite interesting way of getting the first element of a sequence that satisfies a condition (with a default value in case we don't find one):
first_occurence = next((x for x in range(1,10) if x > 5), None)
For me, it seems a very useful, clear way of obtaining the result.
But since I've never seen it in production code, and since next is a little more "low-level" in the Python structure, I was wondering whether it could be bad practice for some reason. Is that the case, and why?
It's fine. It's efficient, it's fairly readable, etc.
If you're expecting a result, or if None is a possible result (so that using None as a placeholder makes it hard to tell whether you got a real result or the default), it may be better to use the EAFP form rather than providing a default: catch the StopIteration it raises when no item is found, or just let it bubble up if the problem is the caller's input not meeting specs (so it's up to them to handle it). It looks even cleaner at the point of use that way:
first_occurence = next(x for x in range(1,10) if x > 5)
Alternatively, when None is a valid result, you can use an explicit sentinel object that's guaranteed unique like so:
sentinel = object()  # An anonymous object you construct can't possibly appear in the input
first_occurence = next((x for x in range(1, 10) if x > 5), sentinel)
if first_occurence is not sentinel:  # Compare with `is` for performance and to avoid a broken __eq__ comparing equal to sentinel
    ...  # a real result was found
A common use case for one of these constructs is to replace a call to any when you not only need to know whether any item passed the test, but which item did (any can only return True or False, so it's unsuited to finding which item passed).
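For example, a small illustration of that use case (the list here is made up):
>>> words = ["spam", "ham", "eggs"]
>>> any(w.startswith("e") for w in words)  # only tells you *that* something matched
True
>>> next((w for w in words if w.startswith("e")), None)  # tells you *which* item matched
'eggs'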
We can wrap it up in a function to provide an even nicer interface:
_raise = object()

# Can be passed either an iterable or an iterator.
def first(iterable, condition, *, default=_raise, exctype=None):
    """Get the first value from `iterable` which meets `condition`.

    Will consume elements from the iterable.

    default -> if no element meets the condition, return this instead.
    exctype -> if no element meets the condition and there is no default,
               raise this kind of exception rather than `StopIteration`.
               (It will be chained from the original `StopIteration`.)
    """
    try:
        # `iter` is idempotent; this makes sure we have an iterator
        return next(filter(condition, iter(iterable)))
    except StopIteration as e:
        if default is not _raise:
            return default
        if exctype:
            raise exctype() from e
        raise
Let's test it:
>>> first(range(10), lambda x: x > 5)
6
>>> first(range(10), lambda x: x > 11)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in first
StopIteration
>>> first(range(10), lambda x: x > 11, exctype=ValueError)
Traceback (most recent call last):
File "<stdin>", line 4, in first
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 9, in first
ValueError
>>> first(range(10), lambda x: x > 11, default=None)
>>>
I'm trying out Python's timeit function in my REPL. It can time small pieces of code in two ways: either as a callable, or as a quoted expression. I'd like to know why the following code produces different timing results.
>>> import timeit
>>> timeit.timeit("lambda *args: None")
0.058281898498535156
>>> timeit.timeit(lambda *args: None)
0.0947730541229248
>>>
My intuition tells me that there should be more 'overhead' associated with the quoted-string variant because it requires interpretation, but apparently my intuition is mistaken.
Here's another code snippet. There does not appear to be a huge time difference between timing the callable function and timing the quoted function statement:
>>> def costly_func():
... return list(map(lambda x: x^2, range(10)))
...
>>> import timeit
>>> timeit.timeit(costly_func)
2.421797037124634
>>> timeit.timeit("list(map(lambda x: x^2, range(10)))")
2.3588619232177734
Observe:
>>> def costly():
... return list(map(str, list(range(1_000_000))))
...
>>> timeit.timeit(costly, number=100)
30.65105245400082
>>> timeit.timeit('costly', number=1_000_000_000, globals=globals())
27.45540758000061
Look at the number argument. It took 30 seconds to execute the function costly 100 times. It took almost 30 seconds to execute the expression costly 1'000'000'000 (!) times.
Why? Because the second call does not actually execute the function costly! The only thing it executes is the expression costly: note the lack of parentheses, which means it's not a function call. The expression costly is basically a no-op (it just requires checking whether the name "costly" exists in the current scope), which is why it's so fast; if Python were smart enough to optimise it away, evaluating the expression costly (not costly()!) would be instantaneous.
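To actually time the call through the string form, include the parentheses and pass globals so the name resolves (a sketch, not from the original session; the timing then lands in the same ballpark as the callable form):
timeit.timeit('costly()', number=100, globals=globals())  # now it really calls costly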
In your case, saying lambda *args: None is simply defining an anonymous function, right? When you execute this exact code, a new function is created, but not executed (in order to do that, you should call it: (lambda *args: None)()).
So, timing the string "lambda *args: None" with timeit.timeit("lambda *args: None") basically tests how fast Python can spit out new anonymous functions.
Timing the function itself with timeit.timeit(lambda *args: None) tests how fast Python can execute an existing function.
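So an apples-to-apples comparison of the two lines from the question would call the lambda inside the string as well (again a sketch; exact numbers will vary):
timeit.timeit("(lambda *args: None)()")  # creates *and* calls the function each iteration
timeit.timeit(lambda *args: None)        # only calls an already-existing function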
Spitting out newly created functions is a piece of cake, while actually running them can be really hard.
Take this code for example:
def Ackermann(m, n):
    if m == 0:
        return n + 1
    if m > 0:
        if n == 0:
            return Ackermann(m - 1, 1)
        elif n > 0:
            return Ackermann(m - 1, Ackermann(m, n - 1))
If you put that exact code in a string and timeit it, you'll get something like this:
>>> code = """def Ackermann(m, n):
...     if m == 0:
...         return n + 1
...     if m > 0:
...         if n == 0:
...             return Ackermann(m - 1, 1)
...         elif n > 0:
...             return Ackermann(m - 1, Ackermann(m, n - 1))"""
>>> timeit.timeit(code, number=1_000_000)
0.10481472999890684
Now try to timeit the function itself:
>>> timeit.timeit(lambda : Ackermann(6, 4), number=1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/timeit.py", line 232, in timeit
return Timer(stmt, setup, timer, globals).timeit(number)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/timeit.py", line 176, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "<stdin>", line 1, in <lambda>
File "<stdin>", line 8, in Ackermann
File "<stdin>", line 8, in Ackermann
File "<stdin>", line 8, in Ackermann
[Previous line repeated 1 more time]
File "<stdin>", line 6, in Ackermann
File "<stdin>", line 8, in Ackermann
File "<stdin>", line 6, in Ackermann
File "<stdin>", line 8, in Ackermann
File "<stdin>", line 8, in Ackermann
File "<stdin>", line 8, in Ackermann
[Previous line repeated 983 more times]
File "<stdin>", line 6, in Ackermann
File "<stdin>", line 2, in Ackermann
RecursionError: maximum recursion depth exceeded in comparison
See - you can't even run that! Actually, probably nobody can, since it recurses so deeply!
Why did the first call succeed, though? Because it didn't execute anything: it just spat out a lot of new functions and got rid of all of them shortly after.
When a function is called with argument unpacking, it seems to use up recursion depth twice as fast. I would like to know why this happens.
Normally:
depth = 0

def f():
    global depth
    depth += 1
    f()

try:
    f()
except RuntimeError:
    print(depth)
#>>> 999
With an unpacking call:
depth = 0

def f():
    global depth
    depth += 1
    f(*())

try:
    f()
except RuntimeError:
    print(depth)
#>>> 500
In theory both should reach about 1000:
import sys
sys.getrecursionlimit()
#>>> 1000
This happens on CPython 2.7 and CPython 3.3.
On PyPy 2.7 and PyPy 3.3 there is a difference, but it is much smaller (1480 vs 1395 and 1526 vs 1395).
As you can see from the disassembly, there is little difference between the two, other than the type of call (CALL_FUNCTION vs CALL_FUNCTION_VAR):
import dis

def f():
    f()

dis.dis(f)
#>>> 34 0 LOAD_GLOBAL 0 (f)
#>>> 3 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
#>>> 6 POP_TOP
#>>> 7 LOAD_CONST 0 (None)
#>>> 10 RETURN_VALUE
def f():
    f(*())

dis.dis(f)
#>>> 47 0 LOAD_GLOBAL 0 (f)
#>>> 3 BUILD_TUPLE 0
#>>> 6 CALL_FUNCTION_VAR 0 (0 positional, 0 keyword pair)
#>>> 9 POP_TOP
#>>> 10 LOAD_CONST 0 (None)
#>>> 13 RETURN_VALUE
The exception message actually offers you a hint. Compare the non-unpacking option:
>>> import sys
>>> sys.setrecursionlimit(4) # to get there faster
>>> def f(): f()
...
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
RuntimeError: maximum recursion depth exceeded
with:
>>> def f(): f(*())
...
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
RuntimeError: maximum recursion depth exceeded while calling a Python object
Note the addition of the while calling a Python object. This exception is specific to the PyObject_CallObject() function. You won't see this exception when you set an odd recursion limit:
>>> sys.setrecursionlimit(5)
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
RuntimeError: maximum recursion depth exceeded
because that is the specific exception raised in the ceval.c frame evaluation code inside PyEval_EvalFrameEx():
/* push frame */
if (Py_EnterRecursiveCall(""))
return NULL;
Note the empty message there. This is a crucial difference.
For your 'regular' function (no variable arguments), what happens is that an optimized path is picked; a Python function that doesn't need tuple or keyword argument unpacking support is handled directly in the fast_function() function of the evaluation loop. A new frameobject with the Python bytecode object for the function is created, and run. This is one recursion check.
But for a function call with variable arguments (tuple or dictionary or both), the fast_function() call cannot be used. Instead, ext_do_call() (extended call) is used, which handles the argument unpacking, then uses PyObject_Call() to invoke the function. PyObject_Call() does a recursion limit check, and 'calls' the function object. The function object is invoked via the function_call() function, which calls PyEval_EvalCodeEx(), which calls PyEval_EvalFrameEx(), which makes the second recursion limit check.
TL;DR version
Python functions calling Python functions are optimised and bypass the PyObject_Call() C-API function, unless argument unpacking takes place. Both Python frame execution and PyObject_Call() make recursion limit tests, so bypassing PyObject_Call() avoids incrementing the recursion limit check per call.
More places with 'extra' recursion depth checks
You can grep the Python source code for Py_EnterRecursiveCall for other locations where recursion depth checks are made; various libraries, such as json and pickle use it to avoid parsing structures that are too deeply nested or recursive, for example. Other checks are placed in the list and tuple __repr__ implementations, rich comparisons (__gt__, __lt__, __eq__, etc.), handling the __call__ callable object hook and handling __str__ calls.
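The repr check, for instance, is easy to trigger on its own (a quick illustration, assuming the default recursion limit of 1000):
x = []
for _ in range(2000):
    x = [x]   # build a list nested 2000 levels deep
repr(x)       # fails with a "maximum recursion depth exceeded" error from repr's own check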
As such, you can hit the recursion limit much faster still:
>>> class C:
...     def __str__(self):
...         global depth
...         depth += 1
...         return self()
...     def __call__(self):
...         global depth
...         depth += 1
...         return str(self)
...
>>> depth = 0
>>> sys.setrecursionlimit(10)
>>> C()()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 9, in __call__
File "<stdin>", line 5, in __str__
RuntimeError: maximum recursion depth exceeded while calling a Python object
>>> depth
2
One of my coworkers was using the builtin max function (on Python 2.7), and he found a weird behavior.
By mistake, instead of using the keyword argument key (as in key=lambda n: n) to pre-sort the list passed as a parameter, he did:
>>> max([1,2,3,3], lambda n : n)
[1, 2, 3, 3]
He was doing what the documentation explains as: "If two or more positional arguments are provided, the largest of the positional arguments is returned." So now I'm curious about why this happens:
>>> (lambda n:n) < []
True
>>> def hello():
... pass
...
>>> hello < []
True
>>> len(hello)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'function' has no len()
I know it's not a big deal, but I'd appreciate it if any of the stackoverflowers could explain how those comparisons are made internally (or point me in a direction where I can find that information). :-)
Thank you in advance!
Python 2 orders objects of different types rather arbitrarily. It did this to make lists always sortable, whatever their contents. Which direction such a comparison comes out is really not important, just that one side always wins consistently. As it happens, the C implementation falls back to comparing the type names; a lambda's type name is 'function', which sorts before 'list'.
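You can inspect the names that fallback compares (this part runs on Python 3 too, since it only looks at the names):
>>> type(lambda n: n).__name__
'function'
>>> 'function' < 'list'  # hence function objects sort before lists in Python 2
True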
In Python 3, your code would raise an exception instead:
>>> (lambda n: n) < []
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: function() < list()
because, as you found out, supporting arbitrary comparisons mostly leads to hard-to-crack bugs.
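For completeness, the call your coworker presumably intended uses the key keyword argument, which behaves the same on both Python versions:
>>> max([1, 2, 3, 3], key=lambda n: n)
3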
Everything in Python 2 can be compared, but some comparisons are fairly nonsensical, as you've seen.
>>> (lambda n:n) < []
True
Python 3 resolves this, and produces exceptions instead.
I've been playing around with memoization and recursion in Python 3.3.
Ignoring the fact that Python is the wrong language to be doing this in, I've found that I get inconsistent results between using functools.lru_cache to memoize and not using it.
I'm not changing the recursion limit; it stays at the default, which for me is 1000.
To test the problem, I've written up a simple recursive function to sum all numbers from 1 through i:
#!/usr/bin/python

def sumtil(i):
    """Recursive function to sum all numbers from 1 through i"""
    # Base case: the sum of all numbers from 1 through 1 is 1...
    if i == 1:
        return 1
    else:
        return i + sumtil(i-1)

# This will not throw an exception
sumtil(998)

# This will throw an exception
sumtil(999)
Running this function normally, I can run sumtil(998) comfortably without hitting the recursion limit. sumtil(999) or above will throw an exception.
However, if I decorate this function with @functools.lru_cache(), the recursion limit exception is thrown three times sooner, when running sumtil(333):
#!/usr/bin/python
import functools

@functools.lru_cache(maxsize=128)
def sumtil(i):
    """Recursive function to sum all numbers from 1 through i"""
    # Base case: the sum of all numbers from 1 through 1 is 1...
    if i == 1:
        return 1
    else:
        return i + sumtil(i-1)

# This will not throw an exception
sumtil(332)

# This will throw an exception
sumtil(333)
Given that 332*3 = 996 and 333*3 = 999, it appears to me that the lru_cache decorator causes each level of recursion in my function to become three levels of recursion.
Why do I get three times as many levels of recursion when using functools.lru_cache to memoize a function?
Because a decorator adds an extra function call, it "uses" one more level in the stack per call. Example:
>>> def foo(f):
...     def bar(i):
...         if i == 1:
...             raise Exception()
...         return f(i)
...     return bar
...
>>> @foo
... def sumtil(i):
...     if i == 1:
...         return 1
...     else:
...         return i + sumtil(i-1)
...
>>> sumtil(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in bar
File "<stdin>", line 6, in sumtil
File "<stdin>", line 5, in bar
File "<stdin>", line 6, in sumtil
File "<stdin>", line 4, in bar
Exception
>>>
Besides, if the decorator uses argument packing/unpacking, then an extra level is used (the earlier answer about f(*()) calls explains why: argument unpacking routes the call through PyObject_Call(), which performs its own recursion-depth check).
def foo(f):
    def bar(*args, **kwargs):
        return f(*args, **kwargs)
    return bar
Max. recursion depth exceeded:
undecorated: 1000
w/o packing: 500
with packing: 334
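If you need the memoized recursive version to work for large i anyway, one workaround (my own sketch, echoing the "gradually increase n" trick from the Fibonacci question above; sumtil_big and the step size are inventions) is to warm the cache in steps small enough to fit in the stack:
import functools

@functools.lru_cache(maxsize=None)
def sumtil(i):
    return 1 if i == 1 else i + sumtil(i - 1)

def sumtil_big(i, step=100):
    # Each sumtil(k) needs only ~step*3 new stack frames, because
    # everything below k - step is already cached.
    for k in range(step, i, step):
        sumtil(k)
    return sumtil(i)

print(sumtil_big(10_000))  # 50005000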