Are inner functions redefined every time their parent function is called? - python

I know functions in Python are 1st class citizens, meaning that they are objects of the function class similar to how 5 is an object of the int class. This means that at some point in their life-time a constructor is called. For most functions I would expect this to happen when they are defined (as most functions are presumably defined only once) so that we only pay the construction price once no matter how many times we use it.
But what about nested functions? Those are redefined every time their parent is called. Does this mean we re-construct the object every time? If yes, isn't that grossly inefficient? Wouldn't a private method (assuming the parent function is a method) or another function be much more efficient? I am ignoring scoping arguments in favour of nesting for this discussion.
I run a simple experiment that seems to support my aforementioned argument as the inner function edition is slower:
import time
def f(x, y):
def defined_inside(x, y):
return x + y
return defined_inside(x, y)
def defined_outside(x, y):
return x + y
def g(x, y):
return defined_outside(x, y)
start = time.time()
for i in range(10000000):
_ = f(3, 4)
end = time.time()
print("Using inner function it took {} seconds".format(end - start))
start = time.time()
for i in range(10000000):
_ = g(3, 4)
end = time.time()
print("Using outer function it took {} seconds".format(end - start))
Results:
Using inner function it took 2.494696855545044 seconds
Using outer function it took 1.8862690925598145 seconds
Bonus question: In case the above is true, how does the situation relate to compiled languages, such as Scala? I grew into a huge fun of nesting and higher order functions, it would be terrible if the trick is as inefficient as it seems.

Related

In Python, when/why should you return a function?

I am coming from c
The concept of first class function is interesting and exiting.
However, I am struggling to find a practical use-case to returning a function.
I have seen the examples of building a function that returns print a grating...
Hello = grating('Hello')
Hi = grating('Hi')
But why is this better then just using
grating('Hello') and grating('Hi')?
Consider:
def func_a():
def func_b():
do sonthing
return somthing
return func_b
When this is better then:
def func_b():
do sonthing
return somthing
def func_a():
res = func_b()
return (res)
can someone point me to a real world useful example?
Thanks!
Your examples aren't helpful as written. But there are other cases where it's useful, e.g. decorators (which are functions that are called on functions and return functions, to modify the behavior of the function they're called on for future callers) and closures. The closure case can be a convenience (e.g. some other part of your code doesn't want to have to pass 10 arguments on every call when eight of them are always the same value; this is usually covered by functools.partial, but closures also work), or it can be a caching time saver. For an example of the latter, imagine a stupid function that tests which of a set of numbers less than some bound are prime by computing all primes up to that bound, then filtering the inputs to those in the set of primes.
If you write it as:
def get_primes(*args):
maxtotest = max(args)
primes = sieve_of_eratosthenes(maxtotest) # (expensive) Produce set of primes up to maxtotest
return primes.intersection(args)
then you're redoing the Sieve of Eratosthenes every call, which swamps the cost of a set intersection. If you implement it as a closure, you might do:
def make_get_primes(initialmax=1000):
primes = sieve_of_eratosthenes(initialmax) # (expensive) Produce set of primes up to maxtotest
currentmax = initialmax
def get_primes(*args):
nonlocal currentmax
maxtotest = max(args)
if maxtotest > currentmax:
primes.update(partial_sieve_of_eratosthenes(currentmax, maxtotest)) # (less expensive) Fill in additional primes not sieved
currentmax = maxtotest
return primes.intersection(args)
return get_primes
Now, if you need a tester for a while, you can do:
get_primes = make_get_primes()
and each call to get_primes is cheap (essentially free if the cached primes already cover you, and cheaper if it has to compute more).
Imagine you wanted to pass a function that would choose a function based on parameters:
def compare(a, b):
...
def anticompare(a, b): # Compare but backwards
...
def get_comparator(reverse):
if reverse: return anticompare
else: return compare
def sort(arr, reverse=false):
comparator = get_comparator(reverse)
...
Obviously this is mildly contrived, but it separates the logic of choosing a comparator from the comparator functions themselves.
A perfect example of a function returning a function is a Python decorator:
def foo(func):
def bar():
return func().upper()
return bar
#foo
def hello_world():
return "Hello World!"
print(hello_world())
The decorator is the #foo symbol above the hello_world() function declaration. In essence, we tell the interpreter that whenever it sees the mention of hello_world(), to actually call foo(hello_world()), therefore, the output of the code above is:
HELLO WORLD!
The idea of a function that returns a function in Python is for decorators, which is a design pattern that allows to add new functionality to an existing object.
I recommend you to read about decorators
On example would be creating a timer function. Let's say we want a function that accepts another function, and times how long it takes to complete:
import time
def timeit(func, args):
st = time.time()
func(args)
return str('Took ' + str(time.time() - st) + ' seconds to complete.'
timeit(lambda x: x*10, 10)
# Took 1.9073486328125e-06 seconds to complete
Edit: Using a function that returns a function
import time
def timeit(func):
st = time.time()
func(10)
return str('Took ' + str(time.time() - st) + ' seconds to complete.')
def make_func(x):
return lambda x: x
timeit(make_func(10))
# Took 1.90734863281e-06 seconds to complete.

[Python]Function that runs once then remembers result when called again [duplicate]

I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).

python 3: setting up a variable once inside a function that is called multiple times [duplicate]

I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).

Function closure vs. callable class

In many cases, there are two implementation choices: a closure and a callable class. For example,
class F:
def __init__(self, op):
self.op = op
def __call__(self, arg1, arg2):
if (self.op == 'mult'):
return arg1 * arg2
if (self.op == 'add'):
return arg1 + arg2
raise InvalidOp(op)
f = F('add')
or
def F(op):
if op == 'or':
def f_(arg1, arg2):
return arg1 | arg2
return f_
if op == 'and':
def g_(arg1, arg2):
return arg1 & arg2
return g_
raise InvalidOp(op)
f = F('add')
What factors should one consider in making the choice, in either direction?
I can think of two:
It seems a closure would always have better performance (can't
think of a counterexample).
I think there are cases when a closure cannot do the job (e.g., if
its state changes over time).
Am I correct in these? What else could be added?
Closures are faster. Classes are more flexible (i.e. more methods available than just __call__).
I realize this is an older posting, but one factor I didn't see listed is that in Python (pre-nonlocal) you cannot modify a local variable contained in the referencing environment. (In your example such modification is not important, but technically speaking the lack of being able to modify such a variable means it's not a true closure.)
For example, the following code doesn't work:
def counter():
i = 0
def f():
i += 1
return i
return f
c = counter()
c()
The call to c above will raise a UnboundLocalError exception.
This is easy to get around by using a mutable, such as a dictionary:
def counter():
d = {'i': 0}
def f():
d['i'] += 1
return d['i']
return f
c = counter()
c() # 1
c() # 2
but of course that's just a workaround.
Please note that because of an error previously found in my testing code, my original answer was incorrect. The revised version follows.
I made a small program to measure running time and memory consumption. I created the following callable class and a closure:
class CallMe:
def __init__(self, context):
self.context = context
def __call__(self, *args, **kwargs):
return self.context(*args, **kwargs)
def call_me(func):
return lambda *args, **kwargs: func(*args, **kwargs)
I timed calls to simple functions accepting different number of arguments (math.sqrt() with 1 argument, math.pow() with 2 and max() with 12).
I used CPython 2.7.10 and 3.4.3+ on Linux x64. I was only able to do memory profiling on Python 2. The source code I used is available here.
My conclusions are:
Closures run faster than equivalent callable classes: about 3 times faster on Python 2, but only 1.5 times faster on Python 3. The narrowing is both because closure became slower and callable classes slower.
Closures take less memory than equivalent callable classes: roughly 2/3 of the memory (only tested on Python 2).
While not part of the original question, it's interesting to note that the run time overhead for calls made via a closure is roughly the same as a call to math.pow(), while via a callable class it is roughly double that.
These are very rough estimates, and they may vary with hardware, operating system and the function you're comparing it too. However, it gives you an idea about the impact of using each kind of callable.
Therefore, this supports (conversely to what I've written before), that the accepted answer given by #RaymondHettinger is correct, and closures should be preferred for indirect calls, at least as long as it doesn't impede on readability. Also, thanks to #AXO for pointing out the mistake in my original code.
I consider the class approach to be easier to understand at one glance, and therefore, more maintainable. As this is one of the premises of good Python code, I think that all things being equal, one is better off using a class rather than a nested function. This is one of the cases where the flexible nature of Python makes the language violate the "there should be one, and preferably only one, obvious way of doing something" predicate for coding in Python.
The performance difference for either side should be negligible - and if you have code where performance matters at this level, you certainly should profile it and optimize the relevant parts, possibly rewriting some of your code as native code.
But yes, if there was a tight loop using the state variables, assessing the closure variables should be slight faster than assessing the class attributes. Of course, this would be overcome by simply inserting a line like op = self.op inside the class method, before entering the loop, making the variable access inside the loop to be made to a local variable - this would avoid an attribute look-up and fetching for each access. Again, performance differences should be negligible, and you have a more serious problem if you need this little much extra performance and are coding in Python.
Mr. Hettinger's answer still is true ten years later in Python3.10. For anyone wondering:
from timeit import timeit
class A: # Naive class
def __init__(self, op):
if op == "mut":
self.exc = lambda x, y: x * y
elif op == "add":
self.exc = lambda x, y: x + y
def __call__(self, x, y):
return self.exc(x,y)
class B: # More optimized class
__slots__ = ('__call__')
def __init__(self, op):
if op == "mut":
self.__call__ = lambda x, y: x * y
elif op == "add":
self.__call__ = lambda x, y: x + y
def C(op): # Closure
if op == "mut":
def _f(x,y):
return x * y
elif op == "add":
def _f(x,t):
return x + y
return _f
a = A("mut")
b = B("mut")
c = C("mut")
print(timeit("[a(x,y) for x in range(100) for y in range(100)]", globals=globals(), number=10000))
# 26.47s naive class
print(timeit("[b(x,y) for x in range(100) for y in range(100)]", globals=globals(), number=10000))
# 18.00s optimized class
print(timeit("[c(x,y) for x in range(100) for y in range(100)]", globals=globals(), number=10000))
# 12.12s closure
Using closure seems to offer significant speed gains in cases where the call number is high. However, classes have extensive customization and are superior choice at times.
I'd re-write class example with something like:
class F(object):
__slots__ = ('__call__')
def __init__(self, op):
if op == 'mult':
self.__call__ = lambda a, b: a * b
elif op == 'add':
self.__call__ = lambda a, b: a + b
else:
raise InvalidOp(op)
That gives 0.40 usec/pass (function 0.31, so it 29% slower) at my machine with Python 3.2.2. Without using object as a base class it gives 0.65 usec/pass (i.e. 55% slower than object based). And by some reason code with checking op in __call__ gives almost the same results as if it was done in __init__. With object as a base and check inside __call__ gives 0.61 usec/pass.
The reason why would you use classes might be polymorphism.
class UserFunctions(object):
__slots__ = ('__call__')
def __init__(self, name):
f = getattr(self, '_func_' + name, None)
if f is None: raise InvalidOp(name)
else: self.__call__ = f
class MyOps(UserFunctions):
#classmethod
def _func_mult(cls, a, b): return a * b
#classmethod
def _func_add(cls, a, b): return a + b

What is memoization and how can I use it in Python?

I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
if k < 2: return 1
if k not in factorial_memo:
factorial_memo[k] = k * factorial(k-1)
return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
def __init__(self, f):
self.f = f
self.memo = {}
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.f(*args)
#Warning: You may wish to do a deepcopy here if returning objects
return self.memo[args]
Then:
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4 which allow you to now simply write the following to accomplish the same thing:
#Memoize
def factorial(k):
if k < 2: return 1
return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 released a new function functools.cache. It caches in memory the result of a functional called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
#functools.cache
def calculate_double(num):
time.sleep(1) # sleep for 1 second to simulate a slow calculation
return num * 2
The first time you call caculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
#functools.lru_cache(maxsize=None)
def calculate_double(num):
# etc
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
#memoise
def actual_function(arg1, arg2):
#body
I've found this extremely useful
from functools import wraps
def memoize(function):
memo = {}
#wraps(function)
def wrapper(*args):
# add the new key to dict if it doesn't exist already
if args not in memo:
memo[args] = function(*args)
return memo[args]
return wrapper
#memoize
def fibonacci(n):
if n < 2: return n
return fibonacci(n - 1) + fibonacci(n - 2)
fibonacci(25)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
if input not in self.cache:
<do expensive calculation>
self.cache[input] = result
return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
if not hasattr(fact, 'mem'):
fact.mem = {1: 1}
if not n in fact.mem:
fact.mem[n] = n * fact(n - 1)
return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
see http://scriptbucket.wordpress.com/2012/12/11/introduction-to-memoization/
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
if num in fibcache:
return fibcache[num]
else:
fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of Multiplication Table.
Using mutable object as default value in Python is usually considered bad. But if use it wisely, it can actually be useful to implement a memoization.
Here's an example adapted from http://docs.python.org/2/faq/design.html#why-are-default-values-shared-between-objects
Using a mutable dict in the function definition, the intermediate computed results can be cached (e.g. when calculating factorial(10) after calculate factorial(9), we can reuse all the intermediate results)
def factorial(n, _cache={1:1}):
try:
return _cache[n]
except IndexError:
_cache[n] = factorial(n-1)*n
return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""
def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x
def make_tuple(L):
return tuple(handle_item(x) for x in L)
def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if is_instance(x, set):
return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of order in which keyword args were passed (using inspect.getargspec):
import inspect
import functools
def memoize(fn):
cache = fn.cache = {}
#functools.wraps(fn)
def memoizer(*args, **kwargs):
kwargs.update(dict(zip(inspect.getargspec(fn).args, args)))
key = tuple(kwargs.get(k, None) for k in inspect.getargspec(fn).args)
if key not in cache:
cache[key] = fn(**kwargs)
return cache[key]
return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
if n <= 1:
return n
else:
if n not in cache:
cache[n] = fib(n-1) + fib(n-2)
return cache[n]
If speed is a consideration:
#functools.cache and #functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).

Categories

Resources