In Scheme I can say
(define f
  (let ((a (... some long computation ...)))
    (lambda (args)
      (...some expression involving a ...))))
Then the long computation that computes a will be performed only once, and a will be available inside the lambda. I can even set! a to some different value.
How do I accomplish the same thing in Python?
I've looked at lots of Google references to 'Python closures' and all of them refer to multiple local procedures inside an outer procedure, which is not what I want.
EDIT: I want to write a function that determines if a number is a perfect square. This code works using quadratic residues to various bases, and is quite fast, calling the expensive square root function only 6 times out of 715 (less than 1%) on average:
def iroot(k, n): # newton
    u, s = n, n+1
    while u < s:
        s = u
        t = (k-1) * s + n // pow(s, k-1)
        u = t // k
    return s
from sets import Set
q64 = Set()
for k in xrange(0,64):
    q64.add(pow(k,2,64))
q63 = Set()
for k in xrange(0,63):
    q63.add(pow(k,2,63))
q65 = Set()
for k in xrange(0,65):
    q65.add(pow(k,2,65))
q11 = Set()
for k in xrange(0,11):
    q11.add(pow(k,2,11))
def isSquare(n):
    if n % 64 not in q64:
        return False
    r = n % 45045
    if r % 63 not in q63:
        return False
    if r % 65 not in q65:
        return False
    if r % 11 not in q11:
        return False
    s = iroot(2, n)
    return s * s == n
I want to hide the computations of q64, q63, q65 and q11 inside the isSquare function, so no other code can modify them. How can I do that?
A typical Python closure combined with the fact that functions are first-class citizens in this language looks almost like what you're requesting:
def f(arg1, arg2):
    a = tremendously_long_computation()
    def closure():
        return a + arg1 + arg2 # sorry, lack of imagination
    return closure
Here, a call to f(arg1, arg2) returns a function that closes over a, which has already been computed. The only difference is that a is read-only, since the closure is constructed from the static program text (this can, however, be worked around with ugly solutions that use mutable containers).
In Python 3, rebinding a appears to be achievable with the nonlocal keyword.
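A minimal sketch of how the Scheme snippet from the question might translate to Python 3. The names `long_computation`, `make_f` and `set_a` are hypothetical stand-ins, not part of the original code; `nonlocal` lets an inner function rebind `a`, roughly like `set!`:

```python
def long_computation():
    # Hypothetical stand-in for the expensive setup work.
    return sum(range(1000))

def make_f():
    a = long_computation()  # runs once, when make_f() is called

    def f(x):
        # Reads the enclosing variable a on every call.
        return a + x

    def set_a(value):
        nonlocal a  # rebind the enclosing variable instead of creating a local
        a = value

    return f, set_a

f, set_a = make_f()
```

Calling `f` repeatedly never re-runs `long_computation`, and `set_a` changes the value that later calls to `f` see.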
EDIT: for your purpose, a caching decorator seems the best choice:
import functools
def memoize(f):
    if not hasattr(f, "cache"):
        f.cache = {}
    @functools.wraps(f)
    def caching_function(*args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        if key not in f.cache:
            result = f(*args, **kwargs)
            f.cache[key] = result
        return f.cache[key]
    return caching_function
@memoize
def q(base):
    return set(pow(k, 2, base) for k in xrange(0, base))
def test(n, base):
    return n % base in q(base)
def is_square(n):
    if not test(n, 64):
        return False
    r = n % 45045
    if not all((test(r, 63), test(r, 65), test(r, 11))):
        return False
    s = iroot(2, n)
    return s * s == n
This way, q(base) is calculated exactly once for every base. Oh, and you could have made iroot and is_square cache-able as well!
Of course, my implementation of a caching decorator is error-prone and doesn't look after memory it consumes -- better make use of functools.lru_cache (at least in Python 3), but it gives a good understanding of what goes on.
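For comparison, here is a rough Python 3 sketch of the same idea built on functools.lru_cache. It swaps the question's iroot for math.isqrt (available from Python 3.8) and adds a guard for negative input; both are assumptions of this sketch rather than part of the original code:

```python
import functools
import math

@functools.lru_cache(maxsize=None)
def q(base):
    # Quadratic residues modulo base, computed once per distinct base.
    return frozenset(pow(k, 2, base) for k in range(base))

def is_square(n):
    if n < 0:
        return False
    if n % 64 not in q(64):
        return False
    r = n % 45045  # 45045 == 63 * 65 * 11
    if any(r % base not in q(base) for base in (63, 65, 11)):
        return False
    s = math.isqrt(n)  # integer square root, replaces iroot(2, n)
    return s * s == n
```

The residue tables live only inside the cache of `q`, so no module-level `q64`, `q63`, `q65` or `q11` names are exposed.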
I just started Python and I've got no idea what memoization is and how to use it. Also, may I have a simplified example?
Memoization effectively refers to remembering ("memoization" → "memorandum" → to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. You can think of it as a cache for method results. For further details, see page 387 for the definition in Introduction To Algorithms (3e), Cormen et al.
A simple example for computing factorials using memoization in Python would be something like this:
factorial_memo = {}
def factorial(k):
    if k < 2: return 1
    if k not in factorial_memo:
        factorial_memo[k] = k * factorial(k-1)
    return factorial_memo[k]
You can get more complicated and encapsulate the memoization process into a class:
class Memoize:
    def __init__(self, f):
        self.f = f
        self.memo = {}
    def __call__(self, *args):
        if args not in self.memo:
            self.memo[args] = self.f(*args)
        #Warning: You may wish to do a deepcopy here if returning objects
        return self.memo[args]
def factorial(k):
    if k < 2: return 1
    return k * factorial(k - 1)
factorial = Memoize(factorial)
A feature known as "decorators" was added in Python 2.4, which allows you to simply write the following to accomplish the same thing:
@Memoize
def factorial(k):
    if k < 2: return 1
    return k * factorial(k - 1)
The Python Decorator Library has a similar decorator called memoized that is slightly more robust than the Memoize class shown here.
functools.cache decorator:
Python 3.9 added a new function, functools.cache. It caches in memory the result of a function called with a particular set of arguments, which is memoization. It's easy to use:
import functools
import time
@functools.cache
def calculate_double(num):
    time.sleep(1) # sleep for 1 second to simulate a slow calculation
    return num * 2
The first time you call calculate_double(5), it will take a second and return 10. The second time you call the function with the same argument calculate_double(5), it will return 10 instantly.
Adding the cache decorator ensures that if the function has been called recently for a particular value, it will not recompute that value, but use a cached previous result. In this case, it leads to a tremendous speed improvement, while the code is not cluttered with the details of caching.
(Edit: the previous example calculated a fibonacci number using recursion, but I changed the example to prevent confusion, hence the old comments.)
functools.lru_cache decorator:
If you need to support older versions of Python, functools.lru_cache works in Python 3.2+. By default, it only caches the 128 most recently used calls, but you can set the maxsize to None to indicate that the cache should never expire:
@functools.lru_cache(maxsize=None)
def calculate_double(num):
    # etc
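To make the effect of maxsize visible, here is a small sketch that uses a deliberately tiny cache and inspects it with cache_info(). The function name `square` is only illustrative:

```python
import functools

@functools.lru_cache(maxsize=2)
def square(num):
    return num * num

square(1)
square(2)
square(1)  # served from the cache
square(3)  # cache is full, so the least recently used entry (2) is evicted
info = square.cache_info()  # named tuple: hits, misses, maxsize, currsize
```

With maxsize=None the eviction step never happens and the cache simply keeps growing.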
The other answers cover what it is quite well. I'm not repeating that. Just some points that might be useful to you.
Usually, memoisation is an operation you can apply on any function that computes something (expensive) and returns a value. Because of this, it's often implemented as a decorator. The implementation is straightforward and it would be something like this
memoised_function = memoise(actual_function)
or expressed as a decorator
@memoise
def actual_function(arg1, arg2):
    ...
I've found this extremely useful
from functools import wraps
def memoize(function):
    memo = {}
    @wraps(function)
    def wrapper(*args):
        # add the new key to dict if it doesn't exist already
        if args not in memo:
            memo[args] = function(*args)
        return memo[args]
    return wrapper
@memoize
def fibonacci(n):
    if n < 2: return n
    return fibonacci(n - 1) + fibonacci(n - 2)
Memoization is keeping the results of expensive calculations and returning the cached result rather than continuously recalculating it.
Here's an example:
def doSomeExpensiveCalculation(self, input):
    if input not in self.cache:
        <do expensive calculation>
        self.cache[input] = result
    return self.cache[input]
A more complete description can be found in the wikipedia entry on memoization.
Let's not forget the built-in hasattr function, for those who want to hand-craft. That way you can keep the mem cache inside the function definition (as opposed to a global).
def fact(n):
    if not hasattr(fact, 'mem'):
        fact.mem = {1: 1}
    if n not in fact.mem:
        fact.mem[n] = n * fact(n - 1)
    return fact.mem[n]
Memoization is basically saving the results of past operations done with recursive algorithms in order to reduce the need to traverse the recursion tree if the same calculation is required at a later stage.
Fibonacci Memoization example in Python:
fibcache = {}
def fib(num):
    if num in fibcache:
        return fibcache[num]
    fibcache[num] = num if num < 2 else fib(num-1) + fib(num-2)
    return fibcache[num]
Memoization is the conversion of functions into data structures. Usually one wants the conversion to occur incrementally and lazily (on demand of a given domain element--or "key"). In lazy functional languages, this lazy conversion can happen automatically, and thus memoization can be implemented without (explicit) side-effects.
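One way to picture "a function turned into a data structure" in Python is a dict that fills itself in on demand through `__missing__`. This is only a sketch with an illustrative class name, and unlike the lazy functional case it does rely on mutation:

```python
class FibTable(dict):
    """A dict that computes and stores missing entries on demand."""

    def __missing__(self, n):
        # dict calls this only when n is not yet a key.
        value = n if n < 2 else self[n - 1] + self[n - 2]
        self[n] = value
        return value

fib = FibTable()
```

Indexing `fib[30]` computes each entry from 0 to 30 once; later lookups of those keys are plain dictionary reads.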
Well I should answer the first part first: what's memoization?
It's just a method to trade memory for time. Think of a multiplication table.
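The multiplication-table picture can be sketched directly: spend memory on a table once, then answer by lookup. The names `TABLE` and `multiply` are illustrative only:

```python
# Pay the memory cost once: precompute every product for 1..9.
TABLE = {(a, b): a * b for a in range(1, 10) for b in range(1, 10)}

def multiply(a, b):
    # A lookup replaces the recomputation.
    return TABLE[(a, b)]
```

Memoization differs from this only in that the table is filled lazily, as each result is first requested.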
Using a mutable object as a default value in Python is usually considered bad practice. But used wisely, it can actually be a handy way to implement memoization.
Here's an example adapted from
By using a mutable dict as a default argument in the function definition, the intermediate results can be cached (e.g. when calculating factorial(10) after factorial(9), all the intermediate results are reused):
def factorial(n, _cache={1:1}):
    try:
        return _cache[n]
    except KeyError:
        _cache[n] = factorial(n-1)*n
        return _cache[n]
Here is a solution that will work with list or dict type arguments without whining:
def memoize(fn):
    """returns a memoized version of any function that can be called
    with the same list of arguments.
    Usage: foo = memoize(foo)"""
    def handle_item(x):
        if isinstance(x, dict):
            return make_tuple(sorted(x.items()))
        elif hasattr(x, '__iter__') and not isinstance(x, str):
            # strings are iterable in Python 3; treat them as atoms
            return make_tuple(x)
        else:
            return x
    def make_tuple(L):
        return tuple(handle_item(x) for x in L)
    def foo(*args, **kwargs):
        items_cache = make_tuple(sorted(kwargs.items()))
        args_cache = make_tuple(args)
        if (args_cache, items_cache) not in foo.past_calls:
            foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
        return foo.past_calls[(args_cache, items_cache)]
    foo.past_calls = {}
    foo.__name__ = 'memoized_' + fn.__name__
    return foo
Note that this approach can be naturally extended to any object by implementing your own hash function as a special case in handle_item. For example, to make this approach work for a function that takes a set as an input argument, you could add to handle_item:
if isinstance(x, set):
    return make_tuple(sorted(list(x)))
Solution that works with both positional and keyword arguments independently of the order in which keyword args were passed (using inspect.getfullargspec, the successor to inspect.getargspec, which was removed in Python 3.11):
import inspect
import functools
def memoize(fn):
    cache = fn.cache = {}
    @functools.wraps(fn)
    def memoizer(*args, **kwargs):
        kwargs.update(dict(zip(inspect.getfullargspec(fn).args, args)))
        key = tuple(kwargs.get(k, None) for k in inspect.getfullargspec(fn).args)
        if key not in cache:
            cache[key] = fn(**kwargs)
        return cache[key]
    return memoizer
Similar question: Identifying equivalent varargs function calls for memoization in Python
Just wanted to add to the answers already provided, the Python decorator library has some simple yet useful implementations that can also memoize "unhashable types", unlike functools.lru_cache.
cache = {}
def fib(n):
    if n <= 1:
        return n
    if n not in cache:
        cache[n] = fib(n-1) + fib(n-2)
    return cache[n]
If speed is a consideration:
@functools.cache and @functools.lru_cache(maxsize=None) are equally fast, taking 0.122 seconds (best of 15 runs) to loop a million times on my system
a global cache variable is quite a lot slower, taking 0.180 seconds (best of 15 runs) to loop a million times on my system
a self.cache class variable is a bit slower still, taking 0.214 seconds (best of 15 runs) to loop a million times on my system
The latter two are implemented similar to how it is described in the currently top-voted answer.
This is without memory exhaustion prevention, i.e. I did not add code in the class or global methods to limit that cache's size, this is really the barebones implementation. The lru_cache method has that for free, if you need this.
One open question for me would be how to unit test something that has a functools decorator. Is it possible to empty the cache somehow? Unit tests seem like they would be cleanest using the class method (where you can instantiate a new class for each test) or, secondarily, the global variable method (since you can do yourimportedmodule.cachevariable = {} to empty it).
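On the open question above: functions wrapped by functools.lru_cache (and functools.cache) expose a cache_clear() method, which should be enough to reset state between tests. A minimal sketch, with the hypothetical names `slow_double` and `calls`:

```python
import functools

calls = []  # records every real (uncached) invocation

@functools.lru_cache(maxsize=None)
def slow_double(num):
    calls.append(num)
    return num * 2

slow_double(5)
slow_double(5)             # cache hit: nothing new is recorded in calls
slow_double.cache_clear()  # empty the cache, e.g. in a test's setUp()
slow_double(5)             # computed again after the clear
```

The undecorated function also remains reachable as `slow_double.__wrapped__`, which lets a test bypass the cache entirely.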
What is the difference (if any exists) between these memoization implementations? Is there a use case where one is preferable to the other? (I included this Fibo recursion as an example)
Put another way: is there a difference between checking if some_value in self.memo: and if some_value not in self.memo:, and if so, is there a case where one presents a better implementation (better optimized for performance, etc.)?
class Fibo:
    def __init__(self):
        self.memo = {}
    """Implementation 1"""
    def fib1(self, n):
        if n in [0,1]:
            return n
        if n in self.memo:
            return self.memo[n]
        result = self.fib1(n - 1) + self.fib1(n - 2)
        self.memo[n] = result
        return result
    """Implementation 2"""
    def fib2(self, n):
        if n in [0,1]:
            return n
        if n not in self.memo:
            result = self.fib2(n - 1) + self.fib2(n - 2)
            self.memo[n] = result
        return self.memo[n]
# Fibo().fib1(8) returns 21
# Fibo().fib2(8) returns 21
There is no significant performance difference in these implementations. In my opinion fib2 is a more readable/pythonic implementation, and should be preferred.
One other recommendation I would make, is to initialise the memo in __init__ like this:
self.memo = {0:0, 1:1}
This avoids the need for a conditional check inside each and every call, so you can simply remove the first two lines of the fib method.
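Put together, the suggestion might look like the following sketch (the method is renamed to plain `fib` here for brevity):

```python
class Fibo:
    def __init__(self):
        # Seeding the base cases removes the `n in [0, 1]` check.
        self.memo = {0: 0, 1: 1}

    def fib(self, n):
        if n not in self.memo:
            self.memo[n] = self.fib(n - 1) + self.fib(n - 2)
        return self.memo[n]
```

The base cases are now ordinary cache hits, so every call goes through a single membership test.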
I'm supposed to write code that takes a mathematical function and a number n, and returns a function that applies the given function n times.
For example if n=3 I would get f(f(f(x))).
When I run my code I get an error, what should I fix in it?
Running examples:
>>> repeated(lambda x:x*x, 2)(5)
625
>>> repeated(lambda x:x*x, 4)(3)
43046721
This is my code :
def repeated(f, n):
    for i in range(n):
        g=lambda x: (g(g(x)))
    return (g)
Return a new function that does the repeated applying only when called:
from functools import reduce  # reduce is no longer a builtin in Python 3
def repeated(f, n):
    def repeat(arg):
        return reduce(lambda r, g: g(r), [f] * n, arg)
    return repeat
The reduce() function uses the list of f function references to create the right number of nested calls, starting with arg as the initial value.
>>> from functools import reduce
>>> def repeated(f, n):
...     def repeat(arg):
...         return reduce(lambda r, g: g(r), [f] * n, arg)
...     return repeat
...
>>> repeated(lambda x:x*x, 2)(5)
625
>>> repeated(lambda x:x*x, 4)(3)
43046721
A version that doesn't use reduce() would be:
def repeated(f, n):
    def repeat(arg):
        res = arg
        for _ in range(n):
            res = f(res)
        return res
    return repeat
Depending on the context of your task (e.g. a programming class), you might be interested in the following straightforward solution:
def repeated(f, n):
    if n < 1:
        raise ValueError()
    elif n == 1:
        return f
    else:
        return lambda x: repeated(f, n-1)(f(x))
This is a naive recursive solution, which maps more directly to the requirements. If you already know about higher-order functions such as reduce, I suggest going with Martijn Pieters's solutions. Nevertheless, this does work:
>>> repeated(lambda x:x*x, 2)(5)
625
>>> repeated(lambda x:x*x, 4)(3)
43046721
I thought this was an interesting enough problem that I wanted to think about it for a couple days before answering. I've created a set of generalizable, pythonic (I think), ways for composing a function on itself in the way described in the question. The most generic solution is just nest, which returns a generator that yields successively nested values of the function on the initial argument. Everything else builds off that, but the decorators could be implemented using one of the above solutions, as well.
#!/usr/bin/env python
"""
Attempt to create a callable that can compose itself using operators
Also attempt to create a function-composition decorator.
f(x) composed once is f(x)
f(x) composed twice is f(f(x))
f(x) composed thrice is f(f(f(x)))
This only makes sense at all if the function takes at least one argument:
f() * 2 -> f(?)
But regardless of its arity, a function can only return exactly one value (even if that value is iterable). So I don't think it makes sense for the function to have >1 arity, either. I could unpack the result:
f(x, y) * 2 -> f(*f(x, y))
But that introduces ambiguity -- not every iterable value should be unpacked. Must I inspect the function to tell its arity and decide whether or not to unpack on the fly? Too much work!
So for now, I just ignore cases other than 1-arity.
"""
def nest(func, arg):
    """Generator that calls a function on the results of the previous call.
    The initial call just returns the original argument."""
    while True:
        yield arg
        arg = func(arg)
def compose(n):
    """Return a decorator that composes the given function on itself n times."""
    if n < 1: raise ValueError
    def decorator(func):
        def nested(arg):
            gen = nest(func, arg)
            for i in range(n):
                next(gen)  # skip arg and the first n - 1 applications
            return next(gen)
        return nested
    return decorator
class composable(object):
    """A callable that can be added and multiplied."""
    def __init__(self, func):
        self.func = func
    def __add__(self, func2):
        """self + func2 => self(func2(a))"""
        def added(a):
            return self(func2(a))
        return composable(added)
    def __mul__(self, n):
        """self * 3 => self(self(self(a)))"""
        def nested(a):
            gen = nest(self, a)
            for i in range(n):
                next(gen)  # skip a and the first n - 1 applications
            return next(gen)
        return composable(nested)
    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)
@compose(2)
def sq(x):
    return x*x
@compose(4)
def qu(x):
    return x*x
@composable
def add1(x):
    return x + 1
compset = composable(set)
assert (compset + str.split)('abc def') == set(['abc', 'def']), (compset + str.split)('abc def')
assert add1(1) == 2, add1(1)
assert (add1 + (lambda x: x * x))(4) == 17, (add1 + (lambda x: x * x))(4)
assert (add1 * 3)(5) == 8, (add1 * 3)(5)
assert 625 == sq(5), sq(5)
assert 43046721 == qu(3), qu(3)
Is there a way in python to intercept (and change) the return value of an already compiled function?
The basic idea is: I have a function
def a (n, acc):
    if n == 0: return acc
    return a (n - 1, acc + n)
and I want to patch it, so that it behaves like this:
def a (n, acc):
    if n == 0: return acc
    return lambda: a (n - 1, acc + n)
Is it possible to write a function f such that f (a) yields a function as in the second code snippet?
I can patch the function via inspect if python can locate its source and then return the newly compiled patched function, but this won't help much.
If I'm understanding correctly what you want, it is theoretically impossible; the transformation that you seem to be describing is one that would have different effects on equivalent functions, depending on superficial details of their source-code that likely won't be preserved in the compiled form. For example, consider the following two versions of a given function:
def a (n, acc):
    print('called a(%d,%d)' % (n, acc))
    if n == 0: return acc
    return a (n - 1, acc + n)
def a (n, acc):
    print('called a(%d,%d)' % (n, acc))
    if n == 0: return acc
    ret = a (n - 1, acc + n)
    return ret
Clearly they are functionally identical. In the source code, the only difference is that the former uses return directly on a certain expression, whereas the latter saves the result of that expression into a local variable and then uses return on that variable. In the compiled form, there need be no difference at all.
Now consider the "patched" versions:
def a (n, acc):
    print('called a(%d,%d)' % (n, acc))
    if n == 0: return acc
    return lambda: a (n - 1, acc + n)
def a (n, acc):
    print('called a(%d,%d)' % (n, acc))
    if n == 0: return acc
    ret = a (n - 1, acc + n)
    return lambda: ret
Clearly these are very different: for example, if n is 3 and acc is 0, then the former prints called a(3,0) and returns a function that prints called a(2,3) and returns a function that prints called a(1,5) and returns a function that prints called a(0,6) and returns 6, whereas the latter prints called a(3,0) and called a(2,3) and called a(1,5) and called a(0,6) and returns a function that returns a function that returns a function that returns 6.
The broader difference is that the first "patched" function performs one step of the computation each time the new return-value is called, whereas the second "patched" version performs all steps of the computation during the initial call, and simply arranges a series of subsequent calls for the sake of entertainment. This difference will matter whenever there's a side-effect (such as printing a message, or such as recursing so deeply that you overflow the stack). It can also matter if the caller introduces a side-effect: note that these functions will only be recursive until some other bit of code redefines a, at which point there is a difference between the version that plans to continue re-calling a and the version that has already completed all of its calls.
Since you can't distinguish the two "unpatched" versions, you obviously can't generate the distinct "patched" versions that your transformation implies.
Thanks for your input. I wasn't seeing the obvious:
def inject (f):
    def result (*args, **kwargs):
        return lambda: f (*args, **kwargs)
    return result
I accepted davidchambers's answer as he pushed me into the right direction.
Luke's answer is very detailed, but this may still be helpful:
>>> def f(*args, **kwargs):
...     return lambda: a(*args, **kwargs)
...
>>> f(10, 0)()
55
In many cases, there are two implementation choices: a closure and a callable class. For example,
class F:
    def __init__(self, op):
        self.op = op
    def __call__(self, arg1, arg2):
        if (self.op == 'mult'):
            return arg1 * arg2
        if (self.op == 'add'):
            return arg1 + arg2
        raise InvalidOp(self.op)
f = F('add')
def F(op):
    if op == 'or':
        def f_(arg1, arg2):
            return arg1 | arg2
        return f_
    if op == 'and':
        def g_(arg1, arg2):
            return arg1 & arg2
        return g_
    raise InvalidOp(op)
f = F('and')
What factors should one consider in making the choice, in either direction?
I can think of two:
It seems a closure would always have better performance (can't think of a counterexample).
I think there are cases when a closure cannot do the job (e.g., if its state changes over time).
Am I correct in these? What else could be added?
Closures are faster. Classes are more flexible (i.e. more methods available than just __call__).
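As a rough illustration of the flexibility point, a callable class can expose extra operations next to `__call__`. The `Accumulator` class and its `reset` method are hypothetical names chosen for this sketch:

```python
class Accumulator:
    """Callable object that also offers a method besides __call__."""

    def __init__(self):
        self.total = 0

    def __call__(self, amount):
        self.total += amount
        return self.total

    def reset(self):
        # An extra operation that a bare closure would not expose.
        self.total = 0

acc = Accumulator()
```

A closure could offer something similar only by returning several functions or by attaching attributes to the returned function.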
I realize this is an older posting, but one factor I didn't see listed is that in Python (pre-nonlocal) you cannot modify a local variable contained in the referencing environment. (In your example such modification is not important, but technically speaking the lack of being able to modify such a variable means it's not a true closure.)
For example, the following code doesn't work:
def counter():
    i = 0
    def f():
        i += 1
        return i
    return f
c = counter()
c()
The call to c above will raise an UnboundLocalError exception.
This is easy to get around by using a mutable, such as a dictionary:
def counter():
    d = {'i': 0}
    def f():
        d['i'] += 1
        return d['i']
    return f
c = counter()
c() # 1
c() # 2
but of course that's just a workaround.
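On Python 3 the same counter can be written without the dictionary, using the nonlocal statement. A minimal sketch:

```python
def counter():
    i = 0

    def f():
        nonlocal i  # rebind the variable in counter's scope
        i += 1
        return i

    return f

c = counter()
```

Each call to `counter()` creates an independent `i`, so separate counters do not interfere with one another.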
Please note that because of an error previously found in my testing code, my original answer was incorrect. The revised version follows.
I made a small program to measure running time and memory consumption. I created the following callable class and a closure:
class CallMe:
    def __init__(self, context):
        self.context = context
    def __call__(self, *args, **kwargs):
        return self.context(*args, **kwargs)
def call_me(func):
    return lambda *args, **kwargs: func(*args, **kwargs)
I timed calls to simple functions accepting different numbers of arguments (math.sqrt() with 1 argument, math.pow() with 2 and max() with 12).
I used CPython 2.7.10 and 3.4.3+ on Linux x64. I was only able to do memory profiling on Python 2. The source code I used is available here.
My conclusions are:
Closures run faster than equivalent callable classes: about 3 times faster on Python 2, but only 1.5 times faster on Python 3. The narrowing is both because closures became slower and because callable classes became faster.
Closures take less memory than equivalent callable classes: roughly 2/3 of the memory (only tested on Python 2).
While not part of the original question, it's interesting to note that the run time overhead for calls made via a closure is roughly the same as a call to math.pow(), while via a callable class it is roughly double that.
These are very rough estimates, and they may vary with hardware, operating system and the function you're comparing it to. However, they give you an idea about the impact of using each kind of callable.
Therefore, this supports (contrary to what I've written before) the conclusion that the accepted answer given by @RaymondHettinger is correct, and closures should be preferred for indirect calls, at least as long as it doesn't impede readability. Also, thanks to @AXO for pointing out the mistake in my original code.
I consider the class approach to be easier to understand at one glance, and therefore, more maintainable. As this is one of the premises of good Python code, I think that all things being equal, one is better off using a class rather than a nested function. This is one of the cases where the flexible nature of Python makes the language violate the "there should be one, and preferably only one, obvious way of doing something" predicate for coding in Python.
The performance difference for either side should be negligible - and if you have code where performance matters at this level, you certainly should profile it and optimize the relevant parts, possibly rewriting some of your code as native code.
But yes, if there was a tight loop using the state variables, accessing the closure variables should be slightly faster than accessing the class attributes. Of course, this could be overcome by simply inserting a line like op = self.op inside the class method before entering the loop, so that variable access inside the loop goes to a local variable; this avoids an attribute look-up and fetch for each access. Again, performance differences should be negligible, and you have a more serious problem if you need this little extra performance and are coding in Python.
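The "copy the attribute into a local before the loop" idea could look like this sketch. The `Scaler` class is a hypothetical example, not taken from the question:

```python
class Scaler:
    """Illustrative callable class that multiplies each value by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, values):
        factor = self.factor  # one attribute lookup instead of one per iteration
        result = []
        for v in values:
            result.append(v * factor)
        return result

scale = Scaler(3)
```

Whether the saving is measurable depends on the loop; profiling with timeit on the real workload is the only reliable way to tell.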
Mr. Hettinger's answer is still true ten years later in Python 3.10. For anyone wondering:
from timeit import timeit
class A: # Naive class
    def __init__(self, op):
        if op == "mut":
            self.exc = lambda x, y: x * y
        elif op == "add":
            self.exc = lambda x, y: x + y
    def __call__(self, x, y):
        return self.exc(x,y)
class B: # More optimized class
    __slots__ = ('__call__')
    def __init__(self, op):
        if op == "mut":
            self.__call__ = lambda x, y: x * y
        elif op == "add":
            self.__call__ = lambda x, y: x + y
def C(op): # Closure
    if op == "mut":
        def _f(x,y):
            return x * y
    elif op == "add":
        def _f(x,y):
            return x + y
    return _f
a = A("mut")
b = B("mut")
c = C("mut")
print(timeit("[a(x,y) for x in range(100) for y in range(100)]", globals=globals(), number=10000))
# 26.47s naive class
print(timeit("[b(x,y) for x in range(100) for y in range(100)]", globals=globals(), number=10000))
# 18.00s optimized class
print(timeit("[c(x,y) for x in range(100) for y in range(100)]", globals=globals(), number=10000))
# 12.12s closure
Using a closure seems to offer significant speed gains in cases where the number of calls is high. However, classes offer extensive customization and are the superior choice at times.
I'd re-write class example with something like:
class F(object):
    __slots__ = ('__call__')
    def __init__(self, op):
        if op == 'mult':
            self.__call__ = lambda a, b: a * b
        elif op == 'add':
            self.__call__ = lambda a, b: a + b
        else:
            raise InvalidOp(op)
That gives 0.40 usec/pass (the plain function takes 0.31, so it is 29% slower) on my machine with Python 3.2.2. Without object as a base class it gives 0.65 usec/pass (i.e. 55% slower than the object-based version). And for some reason, code that checks op in __call__ gives almost the same results as when the check is done in __init__. With object as a base and the check inside __call__, it gives 0.61 usec/pass.
The reason you would use classes might be polymorphism.
class UserFunctions(object):
    __slots__ = ('__call__')
    def __init__(self, name):
        f = getattr(self, '_func_' + name, None)
        if f is None: raise InvalidOp(name)
        else: self.__call__ = f
class MyOps(UserFunctions):
    @classmethod
    def _func_mult(cls, a, b): return a * b
    @classmethod
    def _func_add(cls, a, b): return a + b