I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → "to be remembered") the results of method calls based on the method inputs, and then returning the remembered result rather than computing it again. You can think of it as a cache for method results. For further details, see the definition on page 387 of Introduction to Algorithms (3rd edition), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}

def factorial(k):
    if k < 2: return 1
    if k not in factorial_memo:
        factorial_memo[k] = k * factorial(k-1)
    return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
    def __init__(self, f):
        self.f = f
        self.memo = {}
    def __call__(self, *args):
        if args not in self.memo:
            self.memo[args] = self.f(*args)
        # Warning: You may wish to do a deepcopy here if returning objects
        return self.memo[args]
Then:
def factorial(k):
    if k < 2: return 1
    return k * factorial(k - 1)

factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4, which lets you simply write the following to accomplish the same thing:
@Memoize
def factorial(k):
    if k < 2: return 1
    return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
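For a flavour of what "more robust" means here, a sketch in the same style: the main refinement is degrading gracefully when the arguments aren't hashable (this is a paraphrase for illustration, not the library's exact code, and MemoizeRobust is a made-up name):

class MemoizeRobust:
    """Like Memoize, but falls back to calling the function
    directly when the arguments can't be used as a dict key."""
    def __init__(self, f):
        self.f = f
        self.memo = {}
    def __call__(self, *args):
        try:
            hash(args)
        except TypeError:         # unhashable argument (e.g. a list)
            return self.f(*args)  # can't cache, just compute
        if args not in self.memo:
            self.memo[args] = self.f(*args)
        return self.memo[args]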
functools.cache decorator:
Python 3.9 added a new function, functools.cache. It caches in memory the result of a function called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time

@functools.cache
def calculate_double(num):
    time.sleep(1)  # sleep for 1 second to simulate a slow calculation
    return num * 2
The first time you call calculate_double(5), it will take a second and return 10. The second time you call the function with the same argument, calculate_double(5) will return 10 instantly.
Adding the cache decorator ensures that if the function has already been called for a particular value, it will not recompute that value, but use the cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default it only caches the results of the 128 most recent calls, but you can set maxsize to None to make the cache unbounded:
@functools.lru_cache(maxsize=None)
def calculate_double(num):
    # etc
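Filled in, the lru_cache version is a drop-in replacement for the functools.cache example above; the wrapped function also gains cache_info() for inspecting hits and misses (a quick runnable sketch):

import functools
import time

@functools.lru_cache(maxsize=None)
def calculate_double(num):
    time.sleep(1)  # simulate a slow calculation
    return num * 2

calculate_double(5)                   # slow: cache miss
calculate_double(5)                   # instant: cache hit
print(calculate_double.cache_info())  # e.g. CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)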
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply to any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward; it would be something like this:
memoised_function = memoise(actual_function)
or expressed as a decorator:
@memoise
def actual_function(arg1, arg2):
    # body
I've found this extremely useful:
from functools import wraps

def memoize(function):
    memo = {}
    @wraps(function)
    def wrapper(*args):
        # add the new key to the dict if it doesn't exist already
        if args not in memo:
            memo[args] = function(*args)
        return memo[args]
    return wrapper

@memoize
def fibonacci(n):
    if n < 2: return n
    return fibonacci(n - 1) + fibonacci(n - 2)

fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
    if input not in self.cache:
        result = ...  # do the expensive calculation here
        self.cache[input] = result
    return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft their own memoization. That way you can keep the memo cache inside the function definition (as opposed to a global):
def fact(n):
    if not hasattr(fact, 'mem'):
        fact.mem = {0: 1, 1: 1}  # seed 0 and 1 so fact(0) doesn't recurse forever
    if n not in fact.mem:
        fact.mem[n] = n * fact(n - 1)
    return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}

def fib(num):
    if num in fibcache:
        return fibcache[num]
    else:
        fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
        return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element, or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side effects.
Well I should answer the first part first: what's memoization?
It's just a technique that trades memory for time. Think of a multiplication table.
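To make the multiplication-table analogy concrete, a toy sketch: pay memory once to build the table, then answer every lookup in O(1) instead of recomputing.

# build a 10x10 multiplication table once (the memory cost)...
table = {(a, b): a * b for a in range(1, 11) for b in range(1, 11)}

# ...then every "calculation" is just a lookup (the time saved)
print(table[(7, 8)])  # 56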
Using a mutable object as a default value in Python is usually considered bad practice. But if you use it wisely, it can actually be useful for implementing memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g., when calculating factorial(10) after calculating factorial(9), we can reuse all of the intermediate results):
def factorial(n, _cache={1: 1}):
    try:
        return _cache[n]
    except KeyError:  # dict lookups raise KeyError, not IndexError
        _cache[n] = factorial(n - 1) * n
        return _cache[n]
Here is a solution that will work with list or dict type arguments without complaining about unhashable types:
def memoize(fn):
    """returns a memoized version of any function that can be called
    with the same list of arguments.
    Usage: foo = memoize(foo)"""

    def handle_item(x):
        if isinstance(x, dict):
            return make_tuple(sorted(x.items()))
        elif hasattr(x, '__iter__'):
            return make_tuple(x)
        else:
            return x

    def make_tuple(L):
        return tuple(handle_item(x) for x in L)

    def foo(*args, **kwargs):
        items_cache = make_tuple(sorted(kwargs.items()))
        args_cache = make_tuple(args)
        if (args_cache, items_cache) not in foo.past_calls:
            foo.past_calls[(args_cache, items_cache)] = fn(*args, **kwargs)
        return foo.past_calls[(args_cache, items_cache)]

    foo.past_calls = {}
    foo.__name__ = 'memoized_' + fn.__name__
    return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if isinstance(x, set):
    return make_tuple(sorted(list(x)))
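For example, the decorator then happily accepts list and dict arguments (count_items here is a made-up function, purely for illustration):

@memoize
def count_items(seq, weights):
    # stand-in for something expensive
    return sum(weights.get(x, 0) for x in seq)

print(count_items([1, 2, 3], {1: 10, 2: 20}))  # computed: 30
print(count_items([1, 2, 3], {2: 20, 1: 10}))  # cache hit: dict items are sorted into a canonical key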
A solution that works with both positional and keyword arguments, independently of the order in which the keyword args were passed (using inspect.getargspec):
import inspect
import functools

def memoize(fn):
    cache = fn.cache = {}
    @functools.wraps(fn)
    def memoizer(*args, **kwargs):
        kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
        key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
        if key not in cache:
            cache[key] = fn(**kwargs)
        return cache[key]
    return memoizer
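A quick usage sketch (area is an illustrative function, not from the original): calls that bind the same values, whether positionally or by keyword in any order, share one cache entry.

@memoize
def area(width, height):
    print('computing...')  # prints only on a cache miss
    return width * height

area(3, 4)               # computes: the key normalizes to (3, 4)
area(height=4, width=3)  # cache hit: same key, nothing printed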
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just to add to the answers already provided: the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}

def fib(n):
    if n <= 1:
        return n
    else:
        if n not in cache:
            cache[n] = fib(n-1) + fib(n-2)
        return cache[n]
If speed is a consideration:
@functools.cache and @functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similarly to how the currently top-voted answer describes them.
This is without memory-exhaustion prevention, i.e. I did not add code in the class or global methods to limit the cache's size; this is really the bare-bones implementation. The lru_cache method has that for free, if you need it.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).
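For what it's worth, functools gives you a hook for exactly this: functions wrapped with functools.cache or functools.lru_cache expose cache_clear() (and cache_info()), so a test can reset the cache between cases. A minimal sketch (the function and test names are illustrative):

import functools

@functools.lru_cache(maxsize=None)
def calculate_double(num):
    return num * 2

def test_calculate_double():
    calculate_double.cache_clear()  # start each test from a cold cache
    assert calculate_double(5) == 10
    assert calculate_double.cache_info().misses == 1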
I am coming from C.
The concept of first-class functions is interesting and exciting.
However, I am struggling to find a practical use case for returning a function.
I have seen the examples of building a function that returns a function that prints a greeting...
Hello = grating('Hello')
Hi = grating('Hi')
But why is this better than just using
grating('Hello') and grating('Hi')?
Consider:
def func_a():
    def func_b():
        # do something
        return something
    return func_b
When is this better than:
def func_b():
    # do something
    return something

def func_a():
    res = func_b()
    return res
Can someone point me to a real-world useful example?
Thanks!
Your examples aren't helpful as written. But there are other cases where it's useful, e.g. decorators (which are functions that are called on functions and return functions, to modify the behavior of the function they're called on for future callers) and closures. The closure case can be a convenience (e.g. some other part of your code doesn't want to have to pass 10 arguments on every call when eight of them are always the same value; this is usually covered by functools.partial, but closures also work), or it can be a caching time saver. For an example of the latter, imagine a stupid function that tests which of a set of numbers less than some bound are prime by computing all primes up to that bound, then filtering the inputs to those in the set of primes.
If you write it as:
def get_primes(*args):
    maxtotest = max(args)
    primes = sieve_of_eratosthenes(maxtotest)  # (expensive) produce the set of primes up to maxtotest
    return primes.intersection(args)
then you're redoing the Sieve of Eratosthenes every call, which swamps the cost of a set intersection. If you implement it as a closure, you might do:
def make_get_primes(initialmax=1000):
    primes = sieve_of_eratosthenes(initialmax)  # (expensive) produce the set of primes up to initialmax
    currentmax = initialmax
    def get_primes(*args):
        nonlocal currentmax
        maxtotest = max(args)
        if maxtotest > currentmax:
            primes.update(partial_sieve_of_eratosthenes(currentmax, maxtotest))  # (less expensive) fill in the additional primes not yet sieved
            currentmax = maxtotest
        return primes.intersection(args)
    return get_primes
Now, if you need a tester for a while, you can do:
get_primes = make_get_primes()
and each call to get_primes is cheap (essentially free if the cached primes already cover you, and cheaper than a full re-sieve if it has to compute more).
Imagine you wanted to pass a function that would choose a function based on parameters:
def compare(a, b):
    ...

def anticompare(a, b):  # compare, but backwards
    ...

def get_comparator(reverse):
    if reverse: return anticompare
    else: return compare

def sort(arr, reverse=False):
    comparator = get_comparator(reverse)
    ...
Obviously this is mildly contrived, but it separates the logic of choosing a comparator from the comparator functions themselves.
A perfect example of a function returning a function is a Python decorator:
def foo(func):
    def bar():
        return func().upper()
    return bar

@foo
def hello_world():
    return "Hello World!"

print(hello_world())
The decorator is the @foo symbol above the hello_world() function declaration. In essence, we tell the interpreter to rebind hello_world to foo(hello_world) at definition time, so calling hello_world() actually calls bar(). Therefore, the output of the code above is:
HELLO WORLD!
The idea of a function that returns a function in Python is for decorators, a design pattern that allows you to add new functionality to an existing object.
I recommend reading about decorators.
One example would be creating a timer function. Let's say we want a function that accepts another function and times how long it takes to complete:
import time

def timeit(func, args):
    st = time.time()
    func(args)
    return 'Took ' + str(time.time() - st) + ' seconds to complete.'

timeit(lambda x: x*10, 10)
# Took 1.9073486328125e-06 seconds to complete.
Edit: Using a function that returns a function
import time

def timeit(func):
    st = time.time()
    func(10)
    return 'Took ' + str(time.time() - st) + ' seconds to complete.'

def make_func(x):
    return lambda x: x  # note: the lambda's parameter shadows the captured x, so it just echoes its argument

timeit(make_func(10))
# Took 1.90734863281e-06 seconds to complete.
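As a natural extension, the timer is often written as a decorator, which is exactly a function returning a function; a minimal sketch (timed and tenfold are illustrative names, not from the original post):

import time
import functools

def timed(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        st = time.time()
        result = func(*args, **kwargs)
        print('Took ' + str(time.time() - st) + ' seconds to complete.')
        return result
    return wrapper

@timed
def tenfold(x):
    return x * 10

tenfold(10)  # prints the elapsed time, then returns 100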
BACKGROUND
When playing around, I often write simple recursive functions looking something like:
def f(a, b):
    if a >= 0 and b >= 0:
        return min(f(a-1, b), f(b, a-1))  # + some cost that depends on a, b
    else:
        return 0
(For example, when computing weighted edit distances, or evaluating recursively defined mathematical formulas.)
I then use a memoizing decorator to cache the results automatically.
PROBLEM
When I try something like f(200,10) I get:
RuntimeError: maximum recursion depth exceeded
This is as expected because the recursive implementation exhausts Python's stack space / recursion limit.
WORKAROUNDS
I usually work around this problem by one of:
Increasing recursion limit with sys.setrecursionlimit (only works up to about 1000 depth)
Using a for loop to fill up the cache for smaller values
Changing the function to use a list as a manual stack (via append and pop calls) (in other words, moving from a recursive implementation to an iterative one)
Using an alternative programming language
but I find all of these quite error prone.
QUESTION
Is there a way to write a @Bigstack decorator that would simulate the effect of having a really big stack?
Note that my functions normally make several recursive function calls so this is not the same as tail recursion - I really do want to save all the internal state of each function on the stack.
WHAT I'VE TRIED
I've been thinking about using a list of generator expressions as my stack. By probing the stackframe I could work out when the function has been called recursively and then trigger an exception to return to the decorator code. However, I can't work out a way of gluing these ideas together to make anything that actually works.
Alternatively, I could try accessing the abstract syntax tree for the function and try transforming calls to recursive functions to yield statements, but this seems like it's heading in the wrong direction.
Any suggestions?
EDIT
It certainly looks like I am misusing Python, but another approach I have been considering is to use a different thread for each block of, say, 500 stack frames and then insert queues between each consecutive pair of threads - one queue for arguments, and another queue for return values. (Each queue will have at most one entry in it.) I think this probably doesn't work for some reason - but I'll probably only work out why after I've tried to implement it.
To get around the recursion limit, you can catch the RuntimeError exception to detect when you've run out of stack space, and then return a continuation-ish function that, when called, restarts the recursion at the level where you ran out of space. Call this (and its return value, and so on) until you get a value, then try again from the top. Once you've memoized the lower levels, the higher levels won't run into a recursion limit, so eventually this will work. Put the repeated-calling-until-it-works in a wrapper function. Basically it's a lazy version of your warming-up-the-cache idea.
Here's an example with a simple recursive "add numbers from 1 to n inclusive" function.
import functools

def memoize(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = args, tuple(sorted(kwargs.items()))
        if key in cache:
            return cache[key]
        else:
            result = func(*args, **kwargs)
            if not callable(result):
                cache[key] = result
            return result
    return wrapper

@memoize
def _addup(n):
    if n < 2:
        return n
    else:
        try:
            result = _addup(n - 1)
        except RuntimeError:
            return lambda: _addup(n)
        else:
            return result if callable(result) else result + n

def addup(n):
    result = _addup(n)
    while callable(result):
        while callable(result):
            result = result()
        result = _addup(n)
    return result

assert addup(5000) == sum(xrange(5001))
Rather than returning the lambda function all the way back up the call chain, we can raise an exception to short-circuit that, which both improves performance and simplifies the code:
# memoize function as above, or you can probably use functools.lru_cache

class UnwindStack(Exception):
    pass

@memoize
def _addup(n):
    if n < 2:
        return n
    else:
        try:
            return _addup(n - 1) + n
        except RuntimeError:
            raise UnwindStack(lambda: _addup(n))

def _try(func, *args, **kwargs):
    try:
        return func(*args, **kwargs)
    except UnwindStack as e:
        return e.args[0]  # the continuation carried by the exception

def addup(n):
    result = _try(_addup, n)
    while callable(result):
        while callable(result):
            result = _try(result)
        result = _try(_addup, n)
    return result
This remains pretty inelegant, though, and still has a fair amount of overhead, and I can't imagine how you'd make a decorator out of it. Python isn't really suited to this kind of thing, I guess.
Here's an implementation that uses a list of generator expressions as the stack:
def run_stackless(frame):
    stack, return_stack = [(False, frame)], []
    while stack:
        active, frame = stack.pop()
        action, res = frame.send(return_stack.pop() if active else None)
        if action == 'call':
            stack.extend([(True, frame), (False, res)])
        elif action == 'tail':
            stack.append((False, res))
        elif action == 'return':
            return_stack.append(res)
        else:
            raise ValueError('Unknown action', action)
    return return_stack.pop()
To use it you need to transform the recursive function according to the following rules:
return expr -> yield 'return', expr
recursive_call(args...) -> (yield 'call', recursive_call(args...))
return recursive_call(args...) -> yield 'tail', recursive_call(args...)
For example, with the cost function as a * b, your function becomes:
def f(a, b):
    if a >= 0 and b >= 0:
        yield 'return', min((yield 'call', f(a-1, b)),
                            (yield 'call', f(b, a-1))) + (a * b)
    else:
        yield 'return', 0
Testing:
In [140]: run_stackless(f(30, 4))
Out[140]: 410
In Python 2.6.2 it appears to incur a ~8-10x performance hit compared to direct calls.
The tail action is for tail recursion:
def factorial(n):
    acc = [1]
    def fact(n):
        if n == 0:
            yield 'return', 0
        else:
            acc[0] *= n
            yield 'tail', fact(n - 1)
    run_stackless(fact(n))
    return acc[0]
The transformation to generator-recursive style is fairly easy, and could probably be done as a bytecode hack.
This approach combines memoisation and increased stack depth into a single decorator.
I generate a pool of threads with each thread responsible for 64 levels of the stack.
Threads are only created once and reused (but currently never deleted).
Queues are used to pass information between threads, although note that only the thread corresponding to the current stack depth will actually have work to do.
My experiments suggest this adds around 10% overhead for a simple recursive function (and should be less for more complicated functions).
import threading, Queue

class BigstackThread(threading.Thread):

    def __init__(self, send, recv, func):
        threading.Thread.__init__(self)
        self.daemon = True
        self.send = send
        self.recv = recv
        self.func = func

    def run(self):
        while 1:
            args = self.send.get()
            v = self.func(*args)
            self.recv.put(v)

class Bigstack(object):

    def __init__(self, func):
        self.func = func
        self.cache = {}
        self.depth = 0
        self.threadpool = {}

    def __call__(self, *args):
        if args in self.cache:
            return self.cache[args]
        self.depth += 1
        if self.depth & 63:
            v = self.func(*args)
        else:
            T = self.threadpool
            if self.depth not in T:
                send = Queue.Queue(1)
                recv = Queue.Queue(1)
                t = BigstackThread(send, recv, self)
                T[self.depth] = send, recv, t
                t.start()
            else:
                send, recv, _ = T[self.depth]
            send.put(args)
            v = recv.get()
        self.depth -= 1
        self.cache[args] = v
        return v

@Bigstack
def f(a, b):
    if a >= 0 and b >= 0:
        return min(f(a-1, b), f(b-1, a)) + 1
    return 0
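With the decorator in place, a deep call like the question's f(200, 10) should complete, since every 64th frame is handed off to a fresh thread with its own stack (a quick smoke test, in Python 2 syntax to match the Queue-based code above):

# smoke test: hundreds of frames deep, but sliced across threads
print f(200, 10)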
I am solving a problem in three different ways; two are recursive, and I memoize them myself. The third is not recursive but uses math.factorial. I need to know whether I need to add explicit memoization to it.
Thanks.
Search for math_factorial on this link and you will find its implementation in python:
http://svn.python.org/view/python/trunk/Modules/mathmodule.c?view=markup
P.S. This is for python2.6
Python's math.factorial is not memoized; it is a simple for loop multiplying the values from 1 to your arg. If you need memoization, you need to do it explicitly.
Here is a simple way to memoize using the dictionary setdefault method:
import math

cache = {}
def myfact(x):
    return cache.setdefault(x, math.factorial(x))

print myfact(10000)
print myfact(10000)
Python's math.factorial is not memoized.
I'm going to guide you through some trial-and-error examples to show why, to get a really memoized and working factorial function, you have to redefine it from scratch, taking a couple of things into account.
The other answer actually is not correct. Here,
import math

cache = {}
def myfact(x):
    return cache.setdefault(x, math.factorial(x))
the line
return cache.setdefault(x, math.factorial(x))
evaluates math.factorial(x) on every call, even when x is already in the cache (setdefault's second argument is computed eagerly), and therefore you gain no performance improvement.
You may think of doing something like this:
if x not in cache:
    cache[x] = math.factorial(x)
return cache[x]
but actually this is wrong as well. Yes, you avoid computing the factorial of the same x again, but think, for example, of what happens if you calculate myfact(1000) and soon after that myfact(999). Both of them get calculated completely, taking no advantage of the fact that myfact(1000) automatically computes myfact(999) along the way.
It comes naturally, then, to write something like this:
def memoize(f):
    """Returns a memoized version of f"""
    memory = {}
    def memoized(*args):
        if args not in memory:
            memory[args] = f(*args)
        return memory[args]
    return memoized

@memoize
def my_fact(x):
    assert x >= 0
    if x == 0:
        return 1
    return x * my_fact(x - 1)
This is going to work. Unfortunately it soon reaches the maximum recursion depth.
So how to implement it?
Here is an example of a truly memoized factorial that takes advantage of how factorials work and does not consume all the stack with recursive calls:
# The 'max' key stores the maximum number for which the factorial is stored.
fact_memory = {0: 1, 1: 1, 'max': 1}

def my_fact(num):
    # Factorial is defined only for non-negative numbers
    assert num >= 0
    if num <= fact_memory['max']:
        return fact_memory[num]
    for x in range(fact_memory['max'] + 1, num + 1):
        fact_memory[x] = fact_memory[x-1] * x
    fact_memory['max'] = num
    return fact_memory[num]
I hope you find this useful.
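A quick illustration of the reuse this buys, mirroring the myfact(1000)/myfact(999) scenario discussed above (the calls are just for demonstration):

my_fact(1000)  # fills fact_memory with every factorial up to 1000 in one pass
my_fact(999)   # O(1): read straight out of fact_memory
my_fact(1001)  # just one extra multiplication on top of the stored results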
EDIT:
Just as a note, a way to achieve this same optimization while keeping the conciseness and elegance of recursion would be to redefine the function as a tail-recursive function.
def memoize(f):
    """Returns a memoized version of f"""
    memory = {}
    def memoized(*args):
        if args not in memory:
            memory[args] = f(*args)
        return memory[args]
    return memoized

@memoize
def my_fact(x, fac=1):
    assert x >= 0
    if x < 2:
        return fac
    return my_fact(x - 1, x * fac)
Tail-recursive functions can in fact be recognized by the interpreter/compiler and automatically translated/optimized into an iterative version, but not all interpreters/compilers support this.
Unfortunately Python does not support tail-recursion optimization, so you still get:
RuntimeError: maximum recursion depth exceeded
when the input of my_fact is high.
I'm late to the party, yet here are my 2c on implementing an efficient memoized factorial function in Python. This approach is more efficient since it relies on an array-like structure (that is, a list) rather than a hashed container (that is, a dict). No recursion is involved (sparing you some Python function-call overhead) and no slow for-loops are involved. And it is (arguably) functionally pure, as there are no outer side effects involved (that is, it doesn't modify a global variable). It caches all intermediate factorials, hence if you've already calculated factorial(n), it will take you O(1) to calculate factorial(m) for any 0 <= m <= n, and O(m-n) for any m > n.
def inner_func(f):
    return f()

@inner_func
def factorial():
    factorials = [1]
    def calculate_factorial(n):
        assert n >= 0
        return reduce(lambda cache, num: (cache.append(cache[-1] * num) or cache),
                      xrange(len(factorials), n+1), factorials)[n]
    return calculate_factorial
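Usage is the same as for an ordinary function; the closed-over list keeps growing, so repeated and smaller arguments become O(1) lookups (illustrative calls, in Python 2 to match the reduce/xrange code above):

print factorial(10)  # 3628800: computes and caches 1! through 10!
print factorial(5)   # 120: O(1), read from the cached list
print factorial(12)  # only two more multiplications beyond the cache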