cleaning up nested function calls

I have written several functions that run sequentially, each taking the output of the previous function as its input. To run them, I have to write this line of code:
make_list(cleanup(get_text(get_page(URL))))
I find that ugly and inefficient. Is there a better way to do sequential function calls?

Really, this is the same as any case where you want to refactor commonly-used complex expressions or statements: just turn the expression or statement into a function. The fact that your expression happens to be a composition of function calls doesn't make any difference (but see below).
So, the obvious thing to do is to write a wrapper function that composes the functions together in one place, so everywhere else you can make a simple call to the wrapper:
def get_page_list(url):
    return make_list(cleanup(get_text(get_page(url))))
things = get_page_list(url)
stuff = get_page_list(another_url)
spam = get_page_list(eggs)
If you don't always call the exact same chain of functions, you can always factor out into the pieces that you frequently call. For example:
def get_clean_text(page):
    return cleanup(get_text(page))
def get_clean_page(url):
    return get_clean_text(get_page(url))
This refactoring also opens the door to making the code a bit more verbose but a lot easier to debug, since it only appears once instead of multiple times:
def get_page_list(url):
    page = get_page(url)
    text = get_text(page)
    cleantext = cleanup(text)
    return make_list(cleantext)
If you find yourself needing to do exactly this kind of refactoring of composed functions very often, you can always write a helper that generates the refactored functions. For example:
from functools import wraps
def compose1(*funcs):
    @wraps(funcs[0])
    def composed(arg):
        # apply the right-most function first, as in the nested call
        for func in reversed(funcs):
            arg = func(arg)
        return arg
    return composed
get_page_list = compose1(make_list, cleanup, get_text, get_page)
If you want a more complicated compose function (that, e.g., allows passing multiple args/return values around), it can get a bit complicated to design, so you might want to look around on PyPI and ActiveState for the various existing implementations.
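As a minimal sketch of such a helper, the loop in compose1 can also be written with functools.reduce. The name compose and the toy functions add_one and double are illustrative assumptions, not part of any library:

```python
from functools import reduce


def compose(*funcs):
    """Return a function equivalent to funcs[0](funcs[1](...funcs[-1](x)))."""
    def composed(arg):
        # Apply the right-most function first, as in nested calls.
        return reduce(lambda value, func: func(value), reversed(funcs), arg)
    return composed


# Toy stand-ins (hypothetical) that make the call order visible.
def add_one(x):
    return x + 1


def double(x):
    return x * 2


add_one_after_double = compose(add_one, double)
print(add_one_after_double(5))  # add_one(double(5)) -> 11
```

Calling compose() with no functions returns the argument unchanged, because reduce falls back to its initial value.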

You could try something like this. I always like separating train wrecks (the book "Clean Code" calls those nested function calls train wrecks). This is easier to read and debug. Remember that you probably spend twice as long reading your code as writing it, so make it easier to read. You will thank yourself later.
page = get_page(URL)
page_text = get_text(page)
make_list(cleanup(page_text))
# you can also encapsulate that into its own function
def build_page_list_from_url(url):
    page = get_page(url)
    page_text = get_text(page)
    return make_list(cleanup(page_text))

Options:
Refactor: implement this series of function calls as one, aptly-named method.
Look into decorators. They're syntactic sugar for 'chaining' functions in this way. E.g. implement cleanup and make_list as decorators, then decorate get_text with them.
Compose the functions. See code in this answer.

You could shorten constructs like that with something like the following:
class ChainCalls(object):
    def __init__(self, *funcs):
        self.funcs = funcs
    def __call__(self, *args, **kwargs):
        result = self.funcs[-1](*args, **kwargs)
        for func in self.funcs[-2::-1]:
            result = func(result)
        return result
def make_list(arg): return 'make_list(%s)' % arg
def cleanup(arg): return 'cleanup(%s)' % arg
def get_text(arg): return 'get_text(%s)' % arg
def get_page(arg): return 'get_page(%r)' % arg
mychain = ChainCalls(make_list, cleanup, get_text, get_page)
print( mychain('http://is.gd') )
Output:
make_list(cleanup(get_text(get_page('http://is.gd'))))

Related

Python: More elegant way to add optional parameters to method call

This will seem trivial perhaps, but it is a condition that I run into fairly frequently and would like to find a more elegant way of writing this code. The method, while not terribly relevant to the question, takes a text value and an optional is_checked value to create a radio button (using dominate). In this case, I can't set 'checked' to None, or false - it either has to be there or not. It doesn't seem like I should have to write the 'input' line twice though, just to optionally add an argument.
def _get_radio_button(text: str, is_checked=False):
    with label(text, cls="radio-inline") as lbl:
        if is_checked:
            input(text, type="radio", name="optradio", checked='checked')
        else:
            input(text, type="radio", name="optradio")
    return lbl
This would be my second approach, but it is the same lines of code and less readable - though perhaps a tiny bit more DRY.
a = dict(type='radio', name='optradio')
if is_checked:
    a['checked']='checked'
with label(text, cls="radio-inline") as lbl:
    input(text, **a)
Question: How can I handle this code case with the fewest lines possible without sacrificing readability?

Your code looks fine, except obviously for the naming of a, which could be input_opts or something like that.
Another possibility to make it a bit clearer is to use direct keyword arguments for the common stuff and just inject the optional ones using **. When only one is optional, this can be quite short, e.g.:
checked_arg = {'checked': 'checked'} if is_checked else {}
with label(text, cls="radio-inline") as lbl:
    input(text, type="radio", name="optradio", **checked_arg)
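The same idea can be checked without dominate. In the sketch below, fake_input is a hypothetical stand-in that simply returns the keyword arguments it receives, so the effect of the conditional ** unpacking is visible:

```python
def fake_input(text, **attrs):
    """Hypothetical stand-in for dominate's input tag; returns its attributes."""
    return attrs


def radio_attrs(text, is_checked=False):
    # An empty dict contributes no keyword arguments when unpacked.
    checked_arg = {'checked': 'checked'} if is_checked else {}
    return fake_input(text, type="radio", name="optradio", **checked_arg)


print(radio_attrs("yes", is_checked=True))
print(radio_attrs("no"))
```

The 'checked' key appears only in the first result, which is the "present or absent" behaviour the question asks for.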

This is only a concept :) You can decorate your own functions or third-party (library) functions this way. You can even write the decorator as a class (with a __call__ method that wraps the underlying function) and parameterize it with simple "morphisms" of the underlying function's arguments (for example, a list of functions passed to the decorator class constructor). You can also write a more declarative decorator that inspects the underlying function's arguments (for default values, for example); you are limited only by your imagination :) So:
from functools import wraps
def adapt_gui_args(callable):
    @wraps(callable)
    def w(*args, **kwargs):
        # translate the boolean flag into the attribute the tag expects
        if kwargs.pop('is_checked', False):
            kwargs['checked'] = 'checked'
        return callable(*args, **kwargs)
    return w
# may be decorated with adapt_gui_args if it's your function
def input(*args, **kwargs):
    print("args: ", args)
    print("kwargs: ", kwargs)
# decorate input function outside its source body
input = adapt_gui_args(input)
def test(is_checked=False):
    input(1, 2, type="radio", is_checked=is_checked)
test(False)
test(True)

Replacing parts of the function code on-the-fly

This is the solution I came up with for another question of mine: how to remove all the costly calls to a debug output function scattered through a function's code (the slowdown was 25x when using the empty function lambda *p: None).
The solution is to edit the function code dynamically and prepend the comment sign # to every call of that function.
from __future__ import print_function
DEBUG = False
def dprint(*args,**kwargs):
    '''Debug print'''
    print(*args,**kwargs)
def debug(on=False,string='dprint'):
    '''Decorator to comment all the lines of the function code starting with string'''
    def helper(f):
        if not on:
            import inspect
            source = inspect.getsource(f)
            source = source.replace(string, '#'+string) #Beware! Switches off the whole line after dprint statement
            with open('temp_f.py','w') as file:
                file.write(source)
            from temp_f import f as f_new
            return f_new
        else:
            return f #return f intact
    return helper
def f():
    dprint('f() started')
    print('Important output')
    dprint('f() ended')
f = debug(DEBUG,'dprint')(f) #If decorator @debug(True) is used above f(), inspect.getsource somehow includes @debug(True) inside the code.
f()
The problems I see now are these:
# comments out the whole line to its end, but there may be other statements on it separated by ;. This could be addressed by deleting all dprint calls in f instead of commenting them out, but that may not be trivial, as there may be nested parentheses.
temp_f.py is created, and then the new f code is loaded from it. There should be a better way to do this without writing to the hard drive. I found this recipe, but haven't managed to make it work.
If the decorator is applied with the special @debug syntax, inspect.getsource includes the decorator line in the function code. This line can be removed from the string manually, but that may lead to bugs if more than one decorator is applied to f. I worked around it by resorting to old-style decorator application, f = decorator(f).
What other problems do you see here?
How can all these problems be solved?
What are upsides and downsides of this approach?
What can be improved here?
Is there any better way to do what I try to achieve with this code?
I think it's a very interesting and contentious technique to preprocess function code before compilation to byte-code. Strange though that nobody got interested in it. I think the code I gave may have a lot of shaky points.

A decorator can return either a wrapper, or the decorated function unaltered. Use it to create a better debugger:
from functools import wraps
def debug(enabled=False):
    if not enabled:
        return lambda x: x # Noop, returns decorated function unaltered
    def debug_decorator(f):
        @wraps(f)
        def print_start(*args, **kw):
            print('{0}() started'.format(f.__name__))
            try:
                return f(*args, **kw)
            finally:
                print('{0}() completed'.format(f.__name__))
        return print_start
    return debug_decorator
The debug function is a decorator factory: when called, it produces a decorator function. If debugging is disabled, it simply returns a lambda that returns its argument unchanged, a no-op decorator. When debugging is enabled, it returns a debugging decorator that prints when a decorated function has started and prints again when it returns.
The returned decorator is then applied to the decorated function.
Usage:
DEBUG = True
@debug(DEBUG)
def my_function_to_be_tested():
    print('Hello world!')
To reiterate: when DEBUG is set to False, my_function_to_be_tested remains unaltered, so runtime performance is not affected at all.
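One way to confirm this is an identity check. The sketch below re-declares a trimmed-down version of the factory so that it runs on its own; the names greet, quiet_greet and loud_greet are illustrative:

```python
from functools import wraps


def debug(enabled=False):
    if not enabled:
        return lambda f: f  # no-op decorator
    def debug_decorator(f):
        @wraps(f)
        def wrapper(*args, **kw):
            print('{0}() started'.format(f.__name__))
            try:
                return f(*args, **kw)
            finally:
                print('{0}() completed'.format(f.__name__))
        return wrapper
    return debug_decorator


def greet():
    return 'Hello world!'


quiet_greet = debug(False)(greet)
loud_greet = debug(True)(greet)

print(quiet_greet is greet)  # True: the original function object, no wrapper
print(loud_greet is greet)   # False: wrapped, prints start/completed messages
```

Because the disabled path hands back the very same function object, there is no extra call layer at runtime.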

Here is the solution I came up with after combining answers to other questions I asked here on Stack Overflow.
This solution doesn't comment anything out; it just deletes standalone dprint statements. It uses the ast module and works with the Abstract Syntax Tree, which lets us avoid parsing the source code by hand. This idea was written in the comment here.
Writing to temp_f.py is replaced with executing f in the necessary environment. This solution was offered here.
Also, this solution addresses the problem of recursive decorator application. It's solved by using the _blocked global variable.
This code solves the problem posed in the question. Still, it is not recommended for use in real projects:
You are correct, you should never resort to this, there are so many
ways it can go wrong. First, Python is not a language designed for
source-level transformations, and it's hard to write it a transformer
such as comment_1 without gratuitously breaking valid code. Second,
this hack would break in all kinds of circumstances - for example,
when defining methods, when defining nested functions, when used in
Cython, when inspect.getsource fails for whatever reason. Python is
dynamic enough that you really don't need this kind of hack to
customize its behavior.
from __future__ import print_function
DEBUG = False
def dprint(*args,**kwargs):
    '''Debug print'''
    print(*args,**kwargs)
_blocked = False
def nodebug(name='dprint'):
    '''Decorator to remove all calls to the function 'name' that are separate expressions'''
    def helper(f):
        global _blocked
        if _blocked:
            return f
        import inspect, ast, sys
        source = inspect.getsource(f)
        a = ast.parse(source) #get ast tree of f
        class Transformer(ast.NodeTransformer):
            '''Will delete all expressions containing 'name' functions at the top level'''
            def visit_Expr(self, node): #visit all expressions
                try:
                    if node.value.func.id == name: #if expression is a call to the function 'name'
                        return None #delete it
                except AttributeError: #not a call to a plain name
                    pass
                return node #return node unchanged
        transformer = Transformer()
        a_new = transformer.visit(a)
        f_new_compiled = compile(a_new,'<string>','exec')
        env = sys.modules[f.__module__].__dict__
        _blocked = True
        try:
            exec(f_new_compiled,env)
        finally:
            _blocked = False
        return env[f.__name__]
    return helper
@nodebug('dprint')
def f():
    dprint('f() started')
    print('Important output')
    dprint('f() ended')
    print('Important output2')
f()
Other relevant links:
Switching off debug prints

Efficient way of having a function only execute once in a loop

At the moment, I'm doing stuff like the following, which is getting tedious:
run_once = 0
while 1:
    if run_once == 0:
        myFunction()
        run_once = 1
I'm guessing there is some more accepted way of handling this stuff?
What I'm looking for is having a function execute once, on demand. For example, at the press of a certain button. It is an interactive app which has a lot of user controlled switches. Having a junk variable for every switch, just for keeping track of whether it has been run or not, seemed kind of inefficient.

I would use a decorator on the function to handle keeping track of how many times it runs.
def run_once(f):
    def wrapper(*args, **kwargs):
        if not wrapper.has_run:
            wrapper.has_run = True
            return f(*args, **kwargs)
    wrapper.has_run = False
    return wrapper
@run_once
def my_function(foo, bar):
    return foo+bar
Now my_function will only run once. Other calls to it will return None. Just add an else clause to the if if you want it to return something else. From your example, it doesn't need to return anything ever.
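For instance, a variant that returns the first call's result on every later call could look like this sketch. The names run_once_cached and the result attribute are assumptions made for illustration:

```python
from functools import wraps


def run_once_cached(f):
    """Run f on the first call only; later calls return that first result."""
    @wraps(f)
    def wrapper(*args, **kwargs):
        if not wrapper.has_run:
            wrapper.has_run = True
            wrapper.result = f(*args, **kwargs)
        return wrapper.result
    wrapper.has_run = False
    wrapper.result = None
    return wrapper


@run_once_cached
def add(foo, bar):
    return foo + bar


print(add(1, 2))    # 3
print(add(10, 20))  # still 3: the body did not run again
```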
If you don't control the creation of the function, or the function needs to be used normally in other contexts, you can just apply the decorator manually as well.
action = run_once(my_function)
while 1:
    if predicate:
        action()
This will leave my_function available for other uses.
Finally, if you need to only run it once twice, then you can just do
action = run_once(my_function)
action() # run once the first time
action.has_run = False
action() # run once the second time

Another option is to set the code object of your function (func_code in Python 2, also available as __code__ from Python 2.6 onwards and the only spelling in Python 3) to the code object of a function that does nothing. This should be done at the end of your function body.
For example:
def run_once():
    # Code for something you only want to execute once
    run_once.__code__ = (lambda: None).__code__
Here run_once.__code__ = (lambda: None).__code__ replaces your function's executable code with the code for lambda: None, so all subsequent calls to run_once() will do nothing.
This technique is less flexible than the decorator approach suggested in the accepted answer, but may be more concise if you only have one function you want to run once.

Run the function before the loop. Example:
myFunction()
while True:
    # all the other code being executed in your loop
This is the obvious solution. If there's more than meets the eye, the solution may be a bit more complicated.

I'm assuming this is an action that you want to be performed at most one time, if some conditions are met. Since you won't always perform the action, you can't do it unconditionally outside the loop. Something like lazily retrieving some data (and caching it) if you get a request, but not retrieving it otherwise.
def do_something():
    [x() for x in expensive_operations]
    global action
    action = lambda : None
action = do_something
while True:
    # some sort of complex logic...
    if foo:
        action()

There are many ways to do what you want; note, however, that given what the question describes, you quite possibly don't have to call the function inside the loop at all.
If you insist on having the function call inside the loop, you can also do:
needs_to_run = expensive_function
while 1:
    …
    if needs_to_run:
        needs_to_run()
        needs_to_run = None
    …

I've thought of another—slightly unusual, but very effective—way to do this that doesn't require decorator functions or classes. Instead it just uses a mutable keyword argument, which ought to work in most versions of Python. Most of the time these are something to be avoided since normally you wouldn't want a default argument value to change from call-to-call—but that ability can be leveraged in this case and used as a cheap storage mechanism. Here's how that would work:
def my_function1(_has_run=[]):
    if _has_run: return
    print("my_function1 doing stuff")
    _has_run.append(1)
def my_function2(_has_run=[]):
    if _has_run: return
    print("my_function2 doing some other stuff")
    _has_run.append(1)
for i in range(10):
    my_function1()
    my_function2()
print('----')
my_function1(_has_run=[]) # Force it to run.
Output:
my_function1 doing stuff
my_function2 doing some other stuff
----
my_function1 doing stuff
This could be simplified a little further by doing what @gnibbler suggested in his answer and using an iterator (iterators were introduced in Python 2.2):
from itertools import count
def my_function3(_count=count()):
    if next(_count): return
    print("my_function3 doing something")
for i in range(10):
    my_function3()
print('----')
my_function3(_count=count()) # Force it to run.
Output:
my_function3 doing something
----
my_function3 doing something

Here's an answer that doesn't involve reassignment of functions, yet still prevents the need for that ugly "is first" check.
__missing__ is supported by Python 2.5 and above.
def do_once_varname1():
    print('performing varname1')
    return 'only done once for varname1'
def do_once_varname2():
    print('performing varname2')
    return 'only done once for varname2'
class cdict(dict):
    def __missing__(self,key):
        val=self['do_once_'+key]()
        self[key]=val
        return val
cache_dict=cdict(do_once_varname1=do_once_varname1,do_once_varname2=do_once_varname2)
if __name__=='__main__':
    print(cache_dict['varname1']) # causes 2 prints
    print(cache_dict['varname2']) # causes 2 prints
    print(cache_dict['varname1']) # just 1 print
    print(cache_dict['varname2']) # just 1 print
Output:
performing varname1
only done once for varname1
performing varname2
only done once for varname2
only done once for varname1
only done once for varname2

One object-oriented approach is to make your function a class, also known as a "functor", whose instances automatically keep track of whether they've been run when each instance is created.
Since your updated question indicates you may need many of them, I've updated my answer to deal with that by using a class factory pattern. This is a bit unusual, and it may have been down-voted for that reason (although we'll never know for sure because they never left a comment). It could also be done with a metaclass, but it's not much simpler.
def RunOnceFactory():
    class RunOnceBase(object): # abstract base class
        _shared_state = {} # shared state of all instances (borg pattern)
        has_run = False
        def __init__(self, *args, **kwargs):
            self.__dict__ = self._shared_state
            if not self.has_run:
                self.stuff_done_once(*args, **kwargs)
                self.has_run = True
    return RunOnceBase
if __name__ == '__main__':
    class MyFunction1(RunOnceFactory()):
        def stuff_done_once(self, *args, **kwargs):
            print("MyFunction1.stuff_done_once() called")
    class MyFunction2(RunOnceFactory()):
        def stuff_done_once(self, *args, **kwargs):
            print("MyFunction2.stuff_done_once() called")
    for _ in range(10):
        MyFunction1() # will only call its stuff_done_once() method once
        MyFunction2() # ditto
Output:
MyFunction1.stuff_done_once() called
MyFunction2.stuff_done_once() called
Note: You could make a function/class able to do stuff again by adding a reset() method to its subclass that reset the shared has_run attribute. It's also possible to pass regular and keyword arguments to the stuff_done_once() method when the functor is created and the method is called, if desired.
And, yes, it would be applicable given the information you added to your question.

Assuming there is some reason why myFunction() can't be called before the loop
from itertools import count
for i in count():
    if i==0:
        myFunction()

Here's an explicit way to code this up, where the state of which functions have been called is kept locally (so global state is avoided). I don't much like the non-explicit forms suggested in other answers: it's too surprising to see f() and for this not to mean that f() gets called.
This works by using dict.pop which looks up a key in a dict, removes the key from the dict, and takes a default value to use in case the key isn't found.
def do_nothing(*args, **kwargs):
    pass
# A list of all the functions you want to run just once.
actions = [
    my_function,
    other_function
]
actions = dict((action, action) for action in actions)
while True:
    if some_condition:
        actions.pop(my_function, do_nothing)()
    if some_other_condition:
        actions.pop(other_function, do_nothing)()
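The behaviour of dict.pop with a default can be seen in isolation in the following sketch. The greet function and the calls list are hypothetical, added only to record what actually ran:

```python
calls = []


def do_nothing(*args, **kwargs):
    pass


def greet():
    calls.append('greet')


actions = {greet: greet}

for _ in range(3):
    # First pass: pop returns greet and removes the key.
    # Later passes: the key is gone, so pop returns do_nothing.
    actions.pop(greet, do_nothing)()

print(calls)  # ['greet']
```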

I use the cached_property decorator from functools to run the method just once and save the value. Example from the official documentation: https://docs.python.org/3/library/functools.html
import statistics
from functools import cached_property
class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = tuple(sequence_of_numbers)
    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

You can also use one of the standard library functools.lru_cache or functools.cache decorators in front of the function:
from functools import lru_cache
@lru_cache
def expensive_function():
    return None
https://docs.python.org/3/library/functools.html
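A short sketch of the effect, using a hypothetical load_settings function and a counter to show that the body runs only once. Note that the cache is keyed on the arguments, so this gives "run once" behaviour per distinct (hashable) argument set:

```python
from functools import lru_cache

call_count = 0


@lru_cache(maxsize=None)
def load_settings():
    """Hypothetical expensive setup step."""
    global call_count
    call_count += 1
    return {'debug': False}


for _ in range(5):
    settings = load_settings()

print(call_count)  # 1
```

Unlike the run_once decorator, later calls return the cached result rather than None.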

If I understand the updated question correctly, something like this should work
def function1():
    print("function1 called")
def function2():
    print("function2 called")
def function3():
    print("function3 called")
called_functions = set()
while True:
    n = input("choose a function: 1,2 or 3 ")
    func = {"1": function1,
            "2": function2,
            "3": function3}.get(n)
    if func in called_functions:
        print("That function has already been called")
    else:
        called_functions.add(func)
        func()

You have all those 'junk variables' outside of your mainline while True loop. To make the code easier to read those variables can be brought inside the loop, right next to where they are used. You can also set up a variable naming convention for these program control switches. So for example:
# _already_done checkpoint logic
try:
    ran_this_user_request_already_done
except NameError:
    this_user_request()
    ran_this_user_request_already_done = 1
Note that on the first execution of this code the variable ran_this_user_request_already_done is not defined until after this_user_request() is called.

A simple function you can reuse in many places in your code (based on the other answers here):
def firstrun(keyword, _keys=[]):
    """Returns True only the first time it's called with each keyword."""
    if keyword in _keys:
        return False
    else:
        _keys.append(keyword)
        return True
or equivalently (if you like to rely on other standard-library modules):
from collections import defaultdict
from itertools import count
def firstrun(keyword, _keys=defaultdict(count)):
    """Returns True only the first time it's called with each keyword."""
    return not next(_keys[keyword])
Sample usage:
for i in range(20):
    if firstrun('house'):
        build_house() # runs only once
if firstrun(42): # True
    print('This will print.')
if firstrun(42): # False
    print('This will never print.')

I've taken a more flexible approach inspired by functools.partial function:
DO_ONCE_MEMORY = []
def do_once(id, func, *args, **kwargs):
    if id not in DO_ONCE_MEMORY:
        DO_ONCE_MEMORY.append(id)
        return func(*args, **kwargs)
    else:
        return None
With this approach you are able to have more complex and explicit interactions:
do_once('foobar', print, "first try")
do_once('foo', print, "first try")
do_once('bar', print, "second try")
# first try
# first try
# second try
The exciting part about this approach is that it can be used anywhere and does not require factories: it's just a small memory tracker. Each id runs its function at most once, so repeating any of the calls above prints nothing.

Depending on the situation, an alternative to the decorator could be the following:
from itertools import chain, repeat
func_iter = chain((myFunction,), repeat(lambda *args, **kwds: None))
while True:
    next(func_iter)()
The idea is based on iterators, which yield the function once (or, using repeat(myFunction, n), n times), and then endlessly yield the lambda that does nothing.
The main advantage is that you don't need a decorator which sometimes complicates things, here everything happens in a single (to my mind) readable line. The disadvantage is that you have an ugly next in your code.
Performance wise there seems to be not much of a difference, on my machine both approaches have an overhead of around 130 ns.

If the condition check needs to happen only once you are in the loop, having a flag signaling that you have already run the function helps. In this case you used a counter; a boolean variable would work just as well.
signal = False
count = 0
def callme():
    print("I am being called")
while count < 2:
    if not signal:
        callme()
        signal = True
    count += 1

I'm not sure that I understood your problem, but I think you can split the loop in two: one part that includes the function call and one part without it, and run the two loops one after the other.

Is there a high-level profiling module for Python?

I want to profile my Python code. I am well-aware of cProfile, and I use it, but it's too low-level. (For example, there isn't even a straightforward way to catch the return value from the function you're profiling.)
One of the things I would like to do: I want to take a function in my program and set it to be profiled on the fly while running the program.
For example, let's say I have a function heavy_func in my program. I want to start the program and have the heavy_func function not profile itself. But sometime during the runtime of my program, I want to change heavy_func to profile itself while it's running. (If you're wondering how I can manipulate stuff while the program is running: I can do it either from the debug probe or from the shell that's integrated into my GUI app.)
Is there a module already written which does stuff like this? I can write it myself but I just wanted to ask before so I won't be reinventing the wheel.

It may be a little mind-bending, but this technique should help you find the "bottlenecks", if that's what you want to do.
You're pretty sure of what routine you want to focus on.
If that's the routine you need to focus on, it will prove you right.
If the real problem(s) are somewhere else, it will show you where they are.
If you want a tedious list of reasons why, look here.

I wrote my own module for it. I called it cute_profile. Here is the code. Here are the tests.
Here is the blog post explaining how to use it.
It's part of GarlicSim, so if you want to use it you can install garlicsim and do from garlicsim.general_misc import cute_profile.
If you want to use it on Python 3 code, just install the Python 3 fork of garlicsim.
Here's an outdated excerpt from the code:
import functools
from garlicsim.general_misc import decorator_tools
from . import base_profile
def profile_ready(condition=None, off_after=True, sort=2):
    '''
    Decorator for setting a function to be ready for profiling.
    For example:
        @profile_ready()
        def f(x, y):
            do_something_long_and_complicated()
    The advantages of this over regular `cProfile` are:
     1. It doesn't interfere with the function's return value.
     2. You can set the function to be profiled *when* you want, on the fly.
    How can you set the function to be profiled? There are a few ways:
    You can set `f.profiling_on=True` for the function to be profiled on the
    next call. It will only be profiled once, unless you set
    `f.off_after=False`, and then it will be profiled every time until you set
    `f.profiling_on=False`.
    You can also set `f.condition`. You set it to a condition function taking
    as arguments the decorated function and any arguments (positional and
    keyword) that were given to the decorated function. If the condition
    function returns `True`, profiling will be on for this function call,
    `f.condition` will be reset to `None` afterwards, and profiling will be
    turned off afterwards as well. (Unless, again, `f.off_after` is set to
    `False`.)
    `sort` is an `int` specifying which column the results will be sorted by.
    '''
    def decorator(function):
        def inner(function_, *args, **kwargs):
            if decorated_function.condition is not None:
                if decorated_function.condition is True or \
                   decorated_function.condition(
                       decorated_function.original_function,
                       *args,
                       **kwargs
                       ):
                    decorated_function.profiling_on = True
            if decorated_function.profiling_on:
                if decorated_function.off_after:
                    decorated_function.profiling_on = False
                    decorated_function.condition = None
                # This line puts it in locals, weird:
                decorated_function.original_function
                base_profile.runctx(
                    'result = '
                    'decorated_function.original_function(*args, **kwargs)',
                    globals(), locals(), sort=decorated_function.sort
                )
                return locals()['result']
            else: # decorated_function.profiling_on is False
                return decorated_function.original_function(*args, **kwargs)
        decorated_function = decorator_tools.decorator(inner, function)
        decorated_function.original_function = function
        decorated_function.profiling_on = None
        decorated_function.condition = condition
        decorated_function.off_after = off_after
        decorated_function.sort = sort
        return decorated_function
    return decorator
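For comparison, a much smaller version of the same idea can be sketched with only the standard library. This is not the cute_profile implementation: the decorator below is a simplified assumption that reuses the profiling_on attribute name, adds a hypothetical last_report attribute, and relies on cProfile.Profile.runcall, which returns the profiled function's return value:

```python
import cProfile
import functools
import io
import pstats


def profile_ready(function):
    """Wrap function so profiling can be switched on at runtime."""
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        if not wrapper.profiling_on:
            return function(*args, **kwargs)
        wrapper.profiling_on = False  # profile a single call only
        profiler = cProfile.Profile()
        # runcall returns whatever the profiled function returns.
        result = profiler.runcall(function, *args, **kwargs)
        stream = io.StringIO()
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats('cumulative').print_stats()
        wrapper.last_report = stream.getvalue()
        return result
    wrapper.profiling_on = False
    wrapper.last_report = None
    return wrapper


@profile_ready
def heavy_func(n):
    return sum(i * i for i in range(n))


heavy_func(1000)                # normal call, no profiling
heavy_func.profiling_on = True  # flip the switch, e.g. from a debug shell
total = heavy_func(1000)        # profiled call; return value is preserved
print(total)
print(heavy_func.last_report is not None)
```

Setting heavy_func.profiling_on = True from a debug probe or embedded shell is what makes the next call profiled; the flag resets itself so later calls run at full speed.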

Hashing a python function to regenerate output when the function is modified

I have a python function that has a deterministic result. It takes a long time to run and generates a large output:
def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result
I modify time_consuming_function from time to time, but I would like to avoid having it run again while it's unchanged. [time_consuming_function only depends on functions that are immutable for the purposes considered here; i.e. it might have functions from Python libraries but not from other pieces of my code that I'd change.] The solution that suggests itself to me is to cache the output and also cache some "hash" of the function. If the hash changes, the function will have been modified, and we have to re-generate the output.
Is this possible or ridiculous?
Updated: based on the answers, it looks like what I want to do is to "memoize" time_consuming_function, except instead of (or in addition to) arguments passed into an invariant function, I want to account for a function that itself will change.

If I understand your problem, I think I'd tackle it like this. It's a touch evil, but I think it's more reliable and on-point than the other solutions I see here.
import inspect
import functools
import json
def memoize_zeroadic_function_to_disk(memo_filename):
    def decorator(f):
        try:
            with open(memo_filename, 'r') as fp:
                cache = json.load(fp)
        except IOError:
            # file doesn't exist yet
            cache = {}
        source = inspect.getsource(f)
        @functools.wraps(f)
        def wrapper():
            if source not in cache:
                cache[source] = f()
                with open(memo_filename, 'w') as fp:
                    json.dump(cache, fp)
            return cache[source]
        return wrapper
    return decorator
@memoize_zeroadic_function_to_disk(...SOME PATH HERE...)
def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result
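The decorator above keys the cache on the full source text. If a short, fixed-length key is preferred, the source can be hashed first. The following is a sketch for plain Python functions; function_fingerprint, version_one and version_two are hypothetical names, and the bytecode fallback is an assumption that is only stable within one Python version:

```python
import hashlib
import inspect


def function_fingerprint(f):
    """Return a hex digest that changes when f's definition changes."""
    try:
        payload = inspect.getsource(f).encode('utf-8')
    except (OSError, TypeError):
        # Source unavailable (e.g. defined interactively): fall back to
        # bytecode plus constants, which is Python-version specific.
        code = f.__code__
        payload = code.co_code + repr(code.co_consts).encode('utf-8')
    return hashlib.sha256(payload).hexdigest()


def version_one():
    return 1 + 1


def version_two():
    return 2 + 2


print(function_fingerprint(version_one) == function_fingerprint(version_one))  # True
print(function_fingerprint(version_one) == function_fingerprint(version_two))  # False
```

As with the source-keyed cache, this only notices edits to the function itself, not to helpers it calls.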

Rather than putting the function in a string, I would put the function in its own file. Call it time_consuming.py, for example. It would look something like this:
import os
def time_consuming_method():
    # your existing method here
# Is the cached data older than this file?
if (not os.path.exists(data_file_name)
        or os.stat(data_file_name).st_mtime < os.stat(__file__).st_mtime):
    data = time_consuming_method()
    save_data(data_file_name, data)
else:
    data = load_data(data_file_name)
# redefine method
def time_consuming_method():
    return data
While testing the infrastructure for this to work, I'd comment out the slow parts. Make a simple function that just returns 0, get all of the save/load stuff working to your satisfaction, then put the slow bits back in.

The first part is memoization and serialization of your lookup table. That should be straightforward enough with some Python serialization library. The second part is that you want to delete your serialized lookup table when the source code changes. Perhaps this is being overthought into some fancy solution. Presumably when you change the code you check it in somewhere? Why not add a hook to your checkin routine that deletes your serialized table? Or, if this is not research data and is in production, make it part of your release process: if the revision number of your file (put this function in its own file) has changed, your release script deletes the serialized lookup table.

So, here is a really neat trick using decorators:
def memoize(f):
    cache={}
    def result(*args):
        if args not in cache:
            cache[args]=f(*args)
        return cache[args]
    return result
With the above, you can then use:
@memoize
def myfunc(x,y,z):
    # Some really long running computation
    ...
When you invoke myfunc, you will actually be invoking the memoized version of it. Pretty neat, huh? Whenever you want to redefine your function, simply use "@memoize" again, or explicitly write:
myfunc = memoize(new_definition_for_myfunc)
Edit
I didn't realize that you wanted to cache between multiple runs. In that case, you can do the following:
import os
import os.path
import pickle
class MemoizedFunction(object):
    def __init__(self,f):
        self.function=f
        self.filename=str(hash(f))+".cache"
        self.cache={}
        if os.path.exists(self.filename):
            with open(self.filename,'rb') as file:
                self.cache=pickle.load(file)
    def __call__(self,*args):
        if args not in self.cache:
            self.cache[args]=self.function(*args)
        return self.cache[args]
    def __del__(self):
        with open(self.filename,'wb') as file:
            pickle.dump(self.cache,file,pickle.HIGHEST_PROTOCOL)
def memoize(f):
    return MemoizedFunction(f)

What you describe is effectively memoization. Most common functions can be memoized by defining a decorator.
An (overly simplified) example:
def memoized(f):
    cache={}
    def memo(*args):
        if args in cache:
            return cache[args]
        else:
            ret=f(*args)
            cache[args]=ret
            return ret
    return memo
@memoized
def time_consuming_method():
    # lots_of_computing_time to come up with the_result
    return the_result
Edit:
From Mike Graham's comment and the OP's update, it is now clear that values need to be cached over different runs of the program. This can be done by using some form of persistent storage for the cache (e.g. something as simple as pickle or a plain text file, a full-blown database, or anything in between). The choice of method depends on what the OP needs. Several other answers already give solutions for this, so I'm not going to repeat them here.
