Difference between Memoization Implementations - Python - python

What is the difference (if any exists) between these memoization implementations? Is there a use case where one is preferable to the other? (I included this Fibo recursion as an example)
Put another way: is there a difference between checking if some_value in self.memo: and if some_value not in self.memo:, and if so, is there a case where one presents a better implementation (better optimized for performance, etc.)?
class Fibo:
def __init__(self):
self.memo = {}
"""Implementation 1"""
def fib1(self, n):
if n in [0,1]:
return n
if n in self.memo:
return self.memo[n]
result = self.fib1(n - 1) + self.fib1(n - 2)
self.memo[n] = result
return result
"""Implementation 2"""
def fib2(self, n):
if n in [0,1]:
return n
if n not in self.memo:
result = self.fib2(n - 1) + self.fib2(n - 2)
self.memo[n] = result
return self.memo[n]
# Fibo().fib1(8) returns 21
# Fibo().fib2(8) returns 21

There is no significant performance difference in these implementations. In my opinion fib2 is a more readable/pythonic implementation, and should be preferred.
One other recommendation I would make, is to initialise the memo in __init__ like this:
self.memo = {0:0, 1:1}
This avoids the need to make a conditional check inside each and every call, you can simply remove the first two lines of the fib method now.

Related

Why recursive function is considering last element in an array twice?

Trying to understand recursive functions and so created this program but the output is incorrect. Would like to understand what am I doing wrong here
class Recursive:
def __init__(self, arr):
self.arr = arr
self.sum = 0
def sumRecursive(self):
if len(self.arr) == 0:
return self.sum
self.sum = self.arr.pop(0)
return self.sum + self.sumRecursive()
def main():
recur = Recursive([1,2,3])
print(recur.sumRecursive())
main()
output: 9
There are two types of recursion to consider: tail recursion, where the return value of a single recursive call is returned as-is, and "regular" recursion, where you do something with the return value(s) of the recursive call(s) before returning yourself.
You are combining the two. You either add a value from the list to the recursive sum, using no accumulator:
def non_tail_recursive(self):
if len(self.arr) == 0:
return 0
return self.arr.pop(0) + self.non_tail_recursive()
or you use an accumulator:
def tail_recursive(self):
if len(self.arr) == 0:
return self.sum
self.sum += self.arr.pop(0)
return self.tail_recursive()
You don't usually use an object with state to implement recursion. If you're keeping state, then you often don't need a recursive solution at all.
Here's how to do a "stateless" recursive sum.
def sumRecursive(arr):
if not arr:
return 0
return arr[0] + sumRecursive(arr[1:])
def main():
print(sumRecursive([1,2,3]))
main()
Your self.sum attribute is redundant. The information being processed by a recursive algorithm rarely needs members to pass information along.
class Recursive:
def __init__(self, arr):
self.arr = arr
def sumRecursive(self):
if not len(self.arr):
return 0
return self.arr.pop(0) + self.sumRecursive()
def main():
recur = Recursive([1,2,3])
print(recur.sumRecursive())
main()
Output: 6
To answer your question without rewriting your code (since obviously there are better ways to sum arrays), the answer is that the last leg of your recursion calls self.sum + self.sumRecursive which hits your sumRecursive function one last time which returns (from your if statement) self.sum which is the last element in your list (which you already summed).
Instead when the array is empty, return 0.
class Recursive:
def __init__(self, arr):
self.arr = arr
self.sum = 0
def sumRecursive(self):
if len(self.arr) == 0:
return 0
self.sum = self.arr.pop(0)
return self.sum + self.sumRecursive()
def main():
recur = Recursive([1,2,3,4])
print(recur.sumRecursive())
main()
Optionally move your if to the bottom where I personally think it makes more sense:
def sumRecursive(self):
self.sum = self.arr.pop(0)
if len(self.arr) == 0:
return self.sum
else:
return self.sum + self.sumRecursive()
fundamentals
You are making this more challenging for yourself because you are tangling your sum function with the class. The sum function simply needs to work on a list but the context of your object, self, is making it hard for you to focus.
Recursion is a functional heritage and this means writing your code in a functional and modular way. It's easier to read/write small, single-purpose functions and it promotes reuse within other areas of your program. Next time you need to sum a list in another class, do you want to rewrite sumRecursive again?
Write your summing function once and then import it where it's needed -
# mymath.py
def mysum(t):
if not t:
return 0
else:
return t.pop() + mysum(t)
See how mysum has no concern for context, self? Why should it? All it does it sum the elements of a list.
Now write your recursive module -
# recursive.py
from mymath import mysum
class Recursive:
def __init__(self, arr): self.arr = arr
def sum(self): return mysum(self.arr)
See how Recursive.sum just hands off self.arr to mysum? It doesn't have to be more complicated than that. mysum will work on every list, not just lists within your Recursive module. We don't have to know how mysum works, that is the concern of the mymath module.
Now we write the main module. Each module represents a barrier of abstraction. This means that the details of a module shouldn't spill over into other modules. We don't know how Recursive actually sums the input, and from the caller's point of view, we don't care. That is the concern of Recursive module.
# main.py
from recursive import Recursive
recur = Recursive([1,2,3])
print(recur.sum())
6
going functional
Above we wrote mysum using the .pop technique in your question. I did this because that seems to be how you are understanding the problem right now. But watch what happens when we do this -
x = [1,2,3]
print(mysum(x)) # 6
print(mysum(x)) # 0
Why does the mysum return a different answer the second time? Because as it is written now, mysum uses t.pop(), which mutates t. When mysum is finished running, t is completely emptied!
By why would we write our function like this? What if 5 + x returned a different result each time we called it?
x = 3
print(5 + x) # 8
print(5 + x) # 8
How annoying it would be if we could not depend on values not to change. The sum of the input, [1,2,3], is 6. But as it is written, the sum of the input is to return 6 and empty the input. This second part of emptying (changing) the input is known as a side effect. Ie, the desired effect is to sum and emptying of the input list is unintended but a consequence of us using .pop to calculate the result. This is not the functional way. Functional style means avoiding mutations, variable reassignments, and other side effects.
def mysum(t):
if not t:
return 0
else:
return t[0] + mysum(t[1:])
When written in this way, t, is not changed. This allows us to use equational reasoning whereby we can substitute any function call for its return value and always get the correct answer
x = [1,2,3]
mysum(x)
== 1 + mysum([2,3])
== 1 + 2 + mysum([3])
== 1 + 2 + 3 + mysum([])
== 1 + 2 + 3 + 0
== 1 + 2 + 3
== 1 + 5
== 6
And x was not changed as a result of running mysum -
print(x)
# [1,2,3]
Note, the Recursive module does not need to make any change to receive the benefit of rewriting mysum in this way. Before the change, we would've seen this behavior -
# main.py
from recursive import Recursive
recur = Recursive([1,2,3])
print(recur.sum())
print(recur.sum())
6
0
Because the first call to sum passes self.arr to mysum which empties self.arr as a side effect. A second call to recur.sum() will sum an empty self.arr! After fixing mysum we get the intended behaviour -
# main.py
from recursive import Recursive
recur = Recursive([1,2,3])
print(recur.sum())
print(recur.sum())
6
6
additional reading
I've written extensively about the techniques used in this answer. Follow the links to see them used in other contexts with additional explanation provided -
I want to reverse the stack but i dont know how to use recursion for reversing this… How can i reverse the stack without using Recursion
Finding all maze solutions with Python
Return middle node of linked list with recursion
How do i recursively find a size of subtree based on any given node? (BST)
Deleting node in BST Python

Closing over a variable in Python

In Scheme I can say
(define f
(let ((a (... some long computation ...)))
(lambda (args)
(...some expression involving a ...))))
Then the long computation that computes a will be performed only once, and a will be available inside the lambda. I can even set! a to some different value.
How do I accomplish the same thing in Python?
I've looked at lots of Google references to 'Python closures' and all of them refer to multiple local procedures inside an outer procedure, which is not what I want.
EDIT: I want to write a function that determines if a number is a perfect square. This code works using quadratic residues to various bases, and is quite fast, calling the expensive square root function only 6 times out of 715 (less than 1%) on average:
def iroot(k, n): # newton
u, s = n, n+1
while u < s:
s = u
t=(k-1)*s+n//pow(s,k-1)
u = t // k
return s
from sets import Set
q64 = Set()
for k in xrange(0,64):
q64.add(pow(k,2,64))
q63 = Set()
for k in xrange(0,63):
q63.add(pow(k,2,63))
q65 = Set()
for k in xrange(0,65):
q65.add(pow(k,2,65))
q11 = Set()
for k in xrange(0,11):
q11.add(pow(k,2,11))
def isSquare(n):
if n % 64 not in q64:
return False
r = n % 45045
if r % 63 not in q63:
return False
if r % 65 not in q65:
return False
if r % 11 not in q11:
return False
s = iroot(2, n)
return s * s == n
I want to hide the computations of q64, q63, q65 and q11 inside the isSquare function, so no other code can modify them. How can I do that?
A typical Python closure combined with the fact that functions are first-class citizens in this language looks almost like what you're requesting:
def f(arg1, arg2):
a = tremendously_long_computation()
def closure():
return a + arg1 + arg2 # sorry, lack of imaginantion
return closure
Here, a call to f(arg1, arg2) will return a function which closes over a and has it already computed. The only difference is that a is read-only since a closure is constructed using static program's text (this is, however, may be evaded with ugly solutions, which involve using mutable containers).
As for Python 3, the latter seems to be achievable with nonlocal keyword.
EDIT: for your purpose, a caching decorator seems the best choice:
import functools
def memoize(f):
if not hasattr(f, "cache"):
f.cache = {}
#functools.wraps(f)
def caching_function(*args, **kwargs):
key = (args, tuple(sorted(kwargs.items())))
if key not in f.cache:
result = f(*args, **kwargs)
f.cache[key] = result
return f.cache[key]
return caching_function
#memoize
def q(base):
return set(pow(k, 2, base) for k in xrange(0, base))
def test(n, base):
return n % base in q(base)
def is_square(n):
if not test(n, 64):
return False
r = n % 45045
if not all((test(r, 63), test(r, 65), test(r, 11))):
return False
s = iroot(2, n)
return s * s == n
This way, q(base) is calculated exactly once for every base. Oh, and you could have made iroot and is_square cache-able as well!
Of course, my implementation of a caching decorator is error-prone and doesn't look after memory it consumes -- better make use of functools.lru_cache (at least in Python 3), but it gives a good understanding of what goes on.

[Python]Function that runs once then remembers result when called again [duplicate]

I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).

python 3: setting up a variable once inside a function that is called multiple times [duplicate]

I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).

What is memoization and how can I use it in Python?

I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).

Categories

Resources