Recursive "decorator" in Kotlin - python

Suppose I have a recursive function like fibonacci:
fun fibonacci(n: Int): BigInteger =
if (n < 2)
n.toBigInteger()
else
fibonacci(n-1) + fibonacci(n-2)
This is slow because I'm recalculating known values a bunch of times. I can fix this by adding a "memo":
val memo = ConcurrentSkipListMap<Int, BigInteger>()
fun mFibonacci(n: Int): BigInteger =
memo.computeIfAbsent(n) {
if (n < 2)
it.toBigInteger()
else
mFibonacci(n-1) + mFibonacci(n-2)
}
Works like a charm, but can I do this without touching the function? My first thought was to use a wrapper class:
class Cached<in T, out R>(private val f: (T) -> R) : (T) -> R {
private val cache = ConcurrentSkipListMap<T, R>()
override fun invoke(x: T): R = cache.computeIfAbsent(x, f)
}
cFibonacci = Cached(::fibonacci)
... but the problem is, this only memoizes the outer-most call. If I call cFibonacci with a "big" number like 42, it takes a long time and then puts the correct value in the memo; subsequent calls with the 42 will be fast, but 41 will be slow again. Compare this to mFibonacci, which runs fast the first time, and populates the memo with values from 0 up to 42.
In Python, I can write a "decorator" which does this.
def memoized(f):
def helper(t):
if x in helper.memo:
return helper.memo[t]
else:
r = f(t)
helper.memo[t] = r
return r
helper.memo = {}
return helper
#memoized
def fib(n):
if n < 2:
return n
else:
return fib(n-1) + fib(n-2)
This works just like mFibonacci above. I can also call it as fib = memoized(fib) if I imported fib from somewhere else and don't have access to the definition. Interestingly, c_fib = memoized(fib) works like Cached/cFibonacci above, hinting that maybe mutability of function reference is necessary.
The question is: (how) can I wrap/"decorate" a recursive function in a way that affects the inner calls in Kotlin the way I can in Python?

I'll suggest a workaround in the absence of a solution. This pattern requires access to the definition of the function (i.e. it can't be an import):
object fibonacci: (Int) -> BigInteger {
private val memo = ConcurrentSkipListMap<Int, BigInteger>()
override fun invoke(n: Int): BigInteger = fibonacci(n)
private fun fibonacci(n: Int): BigInteger = memo.computeIfAbsent(n) {
if (n < 2)
n.toBigInteger()
else
fibonacci(n-1) + fibonacci(n-2)
}
}
There are few decisions here that may need justification:
I'm using a camelCase name instead of PascalCase. Despite the fact that object is a class, it's being called as a function and so I feel the naming convention of a function is better. With this, you can call the fibonacci function exactly as you normally would.
I've renamed invoke to fibonacci. Without this, the recursive calls use invoke which seems less readable to me. With this, you can read and write the fibonacci function (almost*) exactly as you normally would.
In general, the idea is to be the least intrusive as possible while still adding the desired functionality. I'm open to suggestions on how to improve it though!
*Something to note is that the function is defined using lambda syntax, so there is no return. If you have a single return at the end of the function, you just remove the return keyword. If you have multiple returns, you'll have to use the less-than-beautiful return#computeIfAbsent for the short-circuits.

As requested, this is a variation of #AlexJones's workaround that doesn't wrap the function in an object with unconventional naming. I did not test this--it's based on the assumption that the other solution works. The following code would be at the top level of a .kt file.
private val memo = ConcurrentSkipListMap<Int, BigInteger>()
fun fibonacci(n: Int): BigInteger = fibonacciImpl(n)
private fun fibonacciImpl(n: Int): BigInteger = memo.computeIfAbsent(n) {
if (n < 2)
n.toBigInteger()
else
fibonacci(n-1) + fibonacci(n-2)
}

Related

Turning a recursive function into an iterative function

I have written the following recursive function, but am incurring a runtime error due to maximum recursion depth. I was wondering is it possible to write an iterative function to overcome this:
def finaldistance(n):
if n%2 == 0:
return 1 + finaldistance(n//2)
elif n != 1:
a = finaldistance(n-1)+1
b = distance(n)
return min(a,b)
else:
return 0
What I have tried is this but it does not seem to be working,
def finaldistance(n, acc):
while n > 1:
if n%2 == 0:
(n, acc) = (n//2, acc+1)
else:
a = finaldistance(n-1, acc) + 1
b = distance(n)
if a < b:
(n, acc) = (n-1, acc+1)
else:
(n, acc) =(1, acc + distance(n))
return acc
Johnbot's solution shows you how to solve your specific problem. How in general can we remove this recursion? Let me show you how, by making a series of small, clearly correct, clearly safe refactorings.
First, here's a slightly rewritten version of your function. I hope you agree it is the same:
def f(n):
if n % 2 == 0:
return 1 + f(n // 2)
elif n != 1:
a = f(n - 1) + 1
b = d(n)
return min(a, b)
else:
return 0
I want the base case to be first. This function is logically the same:
def f(n):
if n == 1:
return 0
if n % 2 == 0:
return 1 + f(n // 2)
a = f(n - 1) + 1
b = d(n)
return min(a, b)
I want the code that comes after each recursive call to be a method call and nothing else. These functions are logically the same:
def add_one(n, x):
return 1 + x
def min_distance(n, x):
a = x + 1
b = d(n)
return min(a, b)
def f(n):
if n == 1:
return 0
if n % 2 == 0:
return add_one(n, f(n // 2))
return min_distance(n, f(n - 1))
Similarly, we add helper functions that compute the recursive argument:
def half(n):
return n // 2
def less_one(n):
return n - 1
def f(n):
if n == 1:
return 0
if n % 2 == 0:
return add_one(n, f(half(n))
return min_distance(n, f(less_one(n))
Again, make sure you agree that this program is logically the same. Now I'm going to simplify the computation of the argument:
def get_argument(n):
return half if n % 2 == 0 else less_one
def f(n):
if n == 1:
return 0
argument = get_argument(n) # argument is a function!
if n % 2 == 0:
return add_one(n, f(argument(n)))
return min_distance(n, f(argument(n)))
Now I'm going to do the same thing to the code after the recursion, and we'll get down to a single recursion:
def get_after(n):
return add_one if n % 2 == 0 else min_distance
def f(n):
if n == 1:
return 0
argument = get_argument(n)
after = get_after(n) # this is also a function!
return after(n, f(argument(n)))
Now I'm noticing that we're passing n to get_after, and then passing it right along to "after" again. I'm going to curry these functions to eliminate that problem. This step is tricky. Make sure you understand it!
def add_one(n):
return lambda x: x + 1
def min_distance(n):
def nested(x):
a = x + 1
b = d(n)
return min(a, b)
return nested
These functions did take two arguments. Now they take one argument, and return a function that takes one argument! So we refactor the use site:
def get_after(n):
return add_one(n) if n % 2 == 0 else min_distance(n)
and here:
def f(n):
if n == 1:
return 0
argument = get_argument(n)
after = get_after(n) # now this is a function of one argument, not two
return after(f(argument(n)))
Similarly we notice that we are calling get_argument(n)(n) to get the argument. Let's simplify that:
def get_argument(n):
return half(n) if n % 2 == 0 else less_one(n)
And let's make it just slightly more general:
base_case_value = 0
def is_base_case(n):
return n == 1
def f(n):
if is_base_case(n):
return base_case_value
argument = get_argument(n)
after = get_after(n)
return after(f(argument))
OK, we now have our program in an extremely compact form. The logic has been spread out into multiple functions, and some of them are curried, to be sure. But now that the function is in this form we can easily remove the recursion. This is the bit that is really tricky is turning the whole thing into an explicit stack:
def f(n):
# Let's make a stack of afters.
afters = [ ]
while not is_base_case(n) :
argument = get_argument(n)
after = get_after(n)
afters.append(after)
n = argument
# Now we have a stack of afters:
x = base_case_value
while len(afters) != 0:
after = afters.pop()
x = after(x)
return x
Study this implementation very carefully. You will learn a lot from it. Remember, when you do a recursive call:
after(f(something))
you are saying that after is the continuation -- the thing that comes next -- of the call to f. We typically implement continuations by putting information about the location in the callers code onto the "call stack". What we're doing in this removal of recursion is simply moving continuation information off of the call stack and onto a stack data structure. But the information is exactly the same.
The important thing to realize here is that we typically think of the call stack as "what is the thing that happened in the past that got me here?". That is exactly backwards. The call stack tells you what you have to do after this call is finished! So that's the information that we encode in the explicit stack. Nowhere do we encode what we did before each step as we "unwind the stack", because we don't need that information.
As I said in my initial comment: there is always a way to turn a recursive algorithm into an iterative one but it is not always easy. I've shown you here how to do it: carefully refactor the recursive method until it is extremely simple. Get it down to a single recursion by refactoring it. Then, and only then, apply this transformation to get it into an explicit stack form. Practice that until you are comfortable with this program transformation. You can then move on to more advanced techniques for removing recursions.
Note that of course this is almost certainly not the "pythonic" way to solve this problem; you could likely build a much more compact, understandable method using lazily evaluated list comprehensions. This answer was intended to answer the specific question that was asked: how in general do we turn recursive methods into iterative methods?
I mentioned in a comment that a standard technique for removing a recursion is to build an explicit list as a stack. This shows that technique. There are other techniques: tail recursion, continuation passing style and trampolines. This answer is already too long, so I'll cover those in a follow-up answer.
Read this answer after you read my first answer.
Again, we are answering the question in general of "how do you turn a recursive algorithm into an iterative algorithm", in this case in Python. As noted previously, this is about exploring the general idea of transforming a program; this is not the "pythonic" way to solve the specific problem.
In my first answer I started by rewriting the program into this form:
def f(n):
if is_base_case(n):
return base_case_value
argument = get_argument(n)
after = get_after(n)
return after(f(argument))
And then transformed it into this form:
def f(n):
# Let's make a stack of afters.
afters = [ ]
while not is_base_case(n) :
argument = get_argument(n)
after = get_after(n)
afters.append(after)
n = argument
# Now we have a stack of afters:
x = base_case_value
while len(afters) != 0:
after = afters.pop()
x = after(x)
return x
The technique here is to construct an explicit stack of "after" calls for a particular input, and then once we have it, run down the whole stack. We are essentially simulating what the runtime already does: constructs a stack of "continuations" that say what to do next.
A different technique is to let the function itself decide what to do with its continuation; this is called "continuation passing style". Let's explore it.
This time, we're going to add a parameter c to the recursive method f. c is a function that takes what would normally be the return value of f, and does whatever was suppose to happen after the call to f. That is, it is explicitly the continuation of f. The method f then becomes "void returning".
The base case is easy. What do we do if we're in the base case? We call the continuation with the value we would have returned:
def f(n, c):
if is_base_case(n):
c(base_case_value)
return
Easy peasy. What about the non-base case? Well, what were we going to do in the original program? We were going to (1) get the arguments, (2) get the "after" -- the continuation of the recursive call, (3) do the recursive call, (4) call "after", its continuation, and (5) return the computed value to whatever the continuation of f is.
We're going to do all the same things, except that when we do step (3) now we need to pass in a continuation that does steps 4 and 5:
argument = get_argument(n)
after = get_after(n)
f(argument, lambda x: c(after(x)))
Hey, that is so easy! What do we do after the recursive call? Well, we call after with the value returned by the recursive call. But now that value is going to be passed to the recursive call's continuation function, so it just goes into x. What happens after that? Well, whatever was going to happen next, and that's in c, so it needs to be called, and we're done.
Let's try it out. Previously we would have said
print(f(100))
but now we have to pass in what happens after f(100). Well, what happens is, the value gets printed!
f(100, print)
and we're done.
So... big deal. The function is still recursive. Why is this interesting? Because the function is now tail recursive! That is, the last thing it does in the non-base case is call itself. Consider a silly case:
def tailcall(x, sum):
if x <= 0:
return sum
return tailcall(x - 1, sum + x)
If we call tailcall(10, 0) it calls tailcall(9, 10), which calls (8, 19), and so on. But any tail-recursive method we can rewrite into a loop very, very easily:
def tailcall(x, sum):
while True:
if x <= 0:
return sum
x = x - 1
sum = sum + x
So can we do the same thing with our general case?
# This is wrong!
def f(n, c):
while True:
if is_base_case(n):
c(base_case_value)
return
argument = get_argument(n)
after = get_after(n)
n = argument
c = lambda x: c(after(x))
Do you see what is wrong? the lambda is closed over c and after, which means that every lambda will use the current value of c and after, not the value it had when the lambda was created. So this is broken, but we can fix it easily by creating a scope which introduces new variables every time it is invoked:
def continuation_factory(c, after)
return lambda x: c(after(x))
def f(n, c):
while True:
if is_base_case(n):
c(base_case_value)
return
argument = get_argument(n)
after = get_after(n)
n = argument
c = continuation_factory(c, after)
And we're done! We've turned this recursive algorithm into an iterative algorithm.
Or... have we?
Think about this really carefully before you read on. Your spider sense should be telling you that something is wrong here.
The problem we started with was that a recursive algorithm is blowing the stack. We've turned this into an iterative algorithm -- there's no recursive call at all here! We just sit in a loop updating local variables.
The question though is -- what happens when the final continuation is called, in the base case? What does that continuation do? Well, it calls its after, and then it calls its continuation. What does that continuation do? Same thing.
All we've done here is moved the recursive control flow into a collection of function objects that we've built up iteratively, and calling that thing is still going to blow the stack. So we haven't actually solved the problem.
Or... have we?
What we can do here is add one more level of indirection, and that will solve the problem. (This solves every problem in computer programming except one problem; do you know what that problem is?)
What we'll do is we'll change the contract of f so that it is no longer "I am void-returning and will call my continuation when I'm done". We will change it to "I will return a function that, when it is called, calls my continuation. And furthermore, my continuation will do the same."
That sounds a little tricky but really its not. Again, let's reason it through. What does the base case have to do? It has to return a function which, when called, calls my continuation. But my continuation already meets that requirement:
def f(n, c):
if is_base_case(n):
return c(base_case_value)
What about the recursive case? We need to return a function, which when called, executes the recursion. The continuation of that call needs to be a function that takes a value and returns a function that when called executes the continuation on that value. We know how to do that:
argument = get_argument(n)
after = get_after(n)
return lambda : f(argument, lambda x: lambda: c(after(x)))
OK, so how does this help? We can now move the loop into a helper function:
def trampoline(f, n, c):
t = f(n, c)
while t != None:
t = t()
And call it:
trampoline(f, 3, print)
And holy goodness it works.
Follow along what happens here. Here's the call sequence with indentation showing stack depth:
trampoline(f, 3, print)
f(3, print)
What does this call return? It effectively returns lambda : f(2, lambda x: lambda : print(min_distance(x)), so that's the new value of t.
That's not None, so we call t(), which calls:
f(2, lambda x: lambda : print(min_distance(x))
What does that thing do? It immediately returns
lambda : f(1,
lambda x:
lambda:
(lambda x: lambda : print(min_distance(x)))(add_one(x))
So that's the new value of t. It's not None, so we invoke it. That calls:
f(1,
lambda x:
lambda:
(lambda x: lambda : print(min_distance(x)))(add_one(x))
Now we're in the base case, so we *call the continuation, substituting 0 for x. It returns:
lambda: (lambda x: lambda : print(min_distance(x)))(add_one(0))
So that's the new value of t. It's not None, so we invoke it.
That calls add_one(0) and gets 1. It then passes 1 for x in the middle lambda. That thing returns:
lambda : print(min_distance(1))
So that's the new value of t. It's not None, so we invoke it. And that calls
print(min_distance(1))
Which prints out the correct answer, print returns None, and the loop stops.
Notice what happened there. The stack never got more than two deep because every call returned a function that said what to do next to the loop, rather than calling the function.
If this sounds familiar, it should. Basically what we're doing here is making a very simple work queue. Every time we "enqueue" a job, it is immediately dequeued, and the only thing the job does is enqueues the next job by returning a lambda to the trampoline, which sticks it in its "queue", the variable t.
We break the problem up into little pieces, and make each piece responsible for saying what the next piece is.
Now, you'll notice that we end up with arbitrarily deep nested lambdas, just as we ended up in the previous technique with an arbitrarily deep queue. Essentially what we've done here is moved the workflow description from an explicit list into a network of nested lambdas, but unlike before, this time we've done a little trick to avoid those lambdas ever calling each other in a manner that increases the stack depth.
Once you see this pattern of "break it up into pieces and describe a workflow that coordinates execution of the pieces", you start to see it everywhere. This is how Windows works; each window has a queue of messages, and messages can represent portions of a workflow. When a portion of a workflow wishes to say what the next portion is, it posts a message to the queue, and it runs later. This is how async await works -- again, we break up the workflow into pieces, and each await is the boundary of a piece. It's how generators work, where each yield is the boundary, and so on. Of course they don't actually use trampolines like this, but they could.
The key thing to understand here is the notion of continuation. Once you realize that you can treat continuations as objects that can be manipulated by the program, any control flow becomes possible. Want to implement your own try-catch? try-catch is just a workflow where every step has two continuations: the normal continuation and the exceptional continuation. When there's an exception, you branch to the exceptional continuation instead of the regular continuation. And so on.
The question here was again, how do we eliminate an out-of-stack caused by a deep recursion in general. I've shown that any recursive method of the form
def f(n):
if is_base_case(n):
return base_case_value
argument = get_argument(n)
after = get_after(n)
return after(f(argument))
...
print(f(10))
can be rewritten as:
def f(n, c):
if is_base_case(n):
return c(base_case_value)
argument = get_argument(n)
after = get_after(n)
return lambda : f(argument, lambda x: lambda: c(after(x)))
...
trampoline(f, 10, print)
and that the "recursive" method will now use only a very small, fixed amount of stack.
First you need to find all the values of n, luckily your sequence is strictly descending and only depends on the next distance:
values = []
while n > 1:
values.append(n)
n = n // 2 if n % 2 == 0 else n - 1
Next you need to calculate the distance at each value. To do that we need to start from the buttom:
values.reverse()
And now we can easily keep track of the previous distance if we need it to calculate the next distance.
distance_so_far = 0
for v in values:
if v % 2 == 0:
distance_so_far += 1
else:
distance_so_far = min(distance(v), distance_so_far + 1)
return distance_so_far
Stick it all together:
def finaldistance(n):
values = []
while n > 1:
values.append(n)
n = n // 2 if n % 2 == 0 else n - 1
values.reverse()
distance_so_far = 0
for v in values:
if v % 2 == 0:
distance_so_far += 1
else:
distance_so_far = min(distance(v), distance_so_far + 1)
return distance_so_far
And now you're using memory instead of stack.
(I don't program in Python so this is probably not be idiomatic Python)

[Python]Function that runs once then remembers result when called again [duplicate]

I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).

python 3: setting up a variable once inside a function that is called multiple times [duplicate]

I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).

Is there a way to refer to the current function in python?

I want a function to refer to itself. e.g. to be recursive.
So I do something like that:
def fib(n):
return n if n <= 1 else fib(n-1)+fib(n-2)
This is fine most of the time, but fib does not, actually, refer to itself; it refers to the the binding of fib in the enclosing block. So if for some reason fib is reassigned, it will break:
>>> foo = fib
>>> fib = foo(10)
>>> x = foo(8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in fib
TypeError: 'int' object is not callable
How can I prevent this from happening (from inside fib), if at all possible? As far as I know, the name of fib does not exist before the function-definition is fully executed; Are there any workarounds?
I don't have a real use case where it may actually happen; I am asking out of sheer curiosity.
I'd make a decorator for this
from functools import wraps
def selfcaller(func):
#wraps(func)
def wrapper(*args, **kwargs):
return func(wrapper, *args, **kwargs)
return wrapper
And use it like
#selfcaller
def fib(self, n):
return n if n <= 1 else self(n-1)+self(n-2)
This is actually a readable way to define a Fixed Point Combinator (or Y Combinator):
fix = lambda g: (lambda f: g(lambda arg: f(f)(arg))) (lambda f: g(lambda arg: f(f)(arg)))
usage:
fib = fix(lambda self: lambda n: n if n <= 1 else self(n-1)+self(n-2))
or:
#fix
def fib(self):
return lambda n: n if n <= 1 else self(n-1)+self(n-2)
The binding here happens in the formal parameter, so the problem does not arise.
There's no way to do what you're trying to do. You're right that fib does not exist before the function definition is executed (or, worse, it exists but refers to something completely different…), which means there is no workaround from inside fib that can possibly work.*
However, if you're willing to drop that requirement, there are workarounds that do work. For example:
def _fibmaker():
def fib(n):
return n if n <= 1 else fib(n-1)+fib(n-2)
return fib
fib = _fibmaker()
del _fibmaker
Now fib is referring to the binding in the closure from the local environment of a call to _fibmaker. Of course even that can be replaced if you really want to, but it's not easy (the fib.__closure__ attribute is not writable; it's a tuple, so you can't replace any of its cells; each cell's cell_contents is a readonly attribute, …), and there's no way you're going to do it by accident.
There are other ways to do this (e.g., use a special placeholder inside fib, and a decorator that replaces the placeholder with the decorated function), and they're all about equally unobvious and ugly, which may seem to violate TOOWTDI. But in this case, the "it" is something you probably don't want to do, so it doesn't really matter.
Here's one way you can write a general, pure-python decorator for a function that uses self instead of its own name, without needing an extra self parameter to the function:
def selfcaller(func):
env = {}
newfunc = types.FunctionType(func.__code__, globals=env)
env['self'] = newfunc
return newfunc
#selfcaller
def fib(n):
return n if n <= 1 else self(n-1)+self(n-2)
Of course this won't work on a function that has any free variables that are bound from globals, but you can fix that with a bit of introspection. And, while we're at it, we can also remove the need to use self inside the function's definition:
def selfcaller(func):
env = dict(func.__globals__)
newfunc = types.FunctionType(func.__code__, globals=env)
env[func.__code__.co_name] = newfunc
return newfunc
This is Python 3.x-specific; some of the attribute names are different in 2.x, but otherwise it's the same.
This still isn't 100% fully general. For example, if you want to be able to use it on methods so they can still call themselves even if the class or object redefines their name, you need slightly different tricks. And there are some pathological cases that might require building a new CodeType out of func.__code__.co_code. But the basic idea is the same.
* As far as Python is concerned, until the name is bound, it doesn't exist… but obviously, under the covers, the interpreter has to know the name of the function you're defining. And at least some interpreters offer non-portable ways to get at that information.
For example, in CPython 3.x, you can very easily get the name of the function currently being defined—it's just sys._getframe().f_code.co_name.
Of course this won't directly do you any good, because nothing (or the wrong thing) is bound to that name. But notice that f_code in there. That's the current frame's code object. Of course you can't call a code object directly, but you can do so indirectly, either by generating a new function out of it, or by using bytecodehacks.
For example:
def fib2(n):
f = sys._getframe()
fib2 = types.FunctionType(f.f_code, globals=globals())
return n if n<=1 else fib2(n-1)+fib2(n-2)
Again, this won't handle every pathological case… but the only way I can think of to do so is to actually keep a circular reference to the frame, or at least its globals (e.g., by passing globals=f.f_globals), which seems like a very bad idea.
See Frame Hacks for more clever things you can do.
Finally, if you're willing to step out of Python entirely, you can create an import hook that preprocesses or compiles your code from a Python custom-extended with, say, defrec into pure Python and/or bytecode.
And if you're thinking "But that sounds like it would be a lot nicer as a macro than as a preprocessor hack, if only Python had macros"… then you'll probably prefer to use a preprocessor hack that gives Python macros, like MacroPy, and then write your extensions as macros.
Like abamert said "..there is no way around the problem from inside ..".
Here's my approach:
def fib(n):
def fib(n):
return n if n <= 1 else fib(n-1)+fib(n-2)
return fib(n)
Someone asked me for a macro based solution for this, so here it is:
# macropy/my_macro.py
from macropy.core.macros import *
macros = Macros()
#macros.decorator()
def recursive(tree, **kw):
tree.decorator_list = []
wrapper = FunctionDef(
name=tree.name,
args=tree.args,
body=[],
decorator_list=tree.decorator_list
)
return_call = Return(
Call(
func = Name(id=tree.name),
args = tree.args.args,
keywords = [],
starargs = tree.args.vararg,
kwargs = tree.args.kwarg
)
)
return_call = parse_stmt(unparse_ast(return_call))[0]
wrapper.body = [tree, return_call]
return wrapper
This can be used as follows:
>>> import macropy.core.console
0=[]=====> MacroPy Enabled <=====[]=0
>>> from macropy.my_macro import macros, recursive
>>> #recursive
... def fib(n):
... return n if n <= 1 else fib(n-1)+fib(n-2)
...
>>> foo = fib
>>> fib = foo(10)
>>> x = foo(8)
>>> x
21
It basically does exactly the wrapping that hus787 gave:
Create a new statement which does return fib(...), which uses the argument list of the original function as the ...
Create a new def, with the same name, same args, same decorator_list as the old one
Place the old function, together followed by the return statement, in the body of the new functiondef
Strip the original function of its decorators (I assume you'd want to decorate the wrapper instead)
The parse_stmt(unparse_ast(return_call))[0] rubbish is a quick hack to get stuff to work (you actually can't just copy the argument AST from the param list of the function and use them in a Call AST) but that's just detail.
To show that it's actually doing that, you can add a print unparse_ast statement to see what the transformed function looks like:
#macros.decorator()
def recursive(tree, **kw):
...
print unparse_ast(wrapper)
return wrapper
which, when run as above, prints
def fib(n):
def fib(n):
return (n if (n <= 1) else (fib((n - 1)) + fib((n - 2))))
return fib(n)
Looks like exactly what you want! It should work for any function, with multiple args, kwargs, defaults, etc., but I'm too lazy to test. Working with the AST is a bit verbose, and MacroPy is still super-experimental, but i think it's pretty neat.

What is memoization and how can I use it in Python?

I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).

Categories

Resources