In essence, I want to put a variable on the stack that will be reachable by all calls below that point on the stack until the block exits. In Java I would solve this using a static thread-local with support methods, which could then be accessed from other methods.
Typical example: you get a request and open a database connection. Until the request is complete, you want all code to use this database connection. After finishing and closing the request, you close the database connection.
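For reference, the thread-local pattern I have in mind translates to Python roughly like this (just a sketch; the helper names are made up):
import threading

_request_ctx = threading.local()  # one namespace per thread

def set_connection(conn):
    _request_ctx.connection = conn

def get_connection():
    # reachable from any code running on this thread's call stack
    return _request_ctx.connection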
What I need this for is a report generator. Each report consists of multiple parts, each part can rely on different calculations, and sometimes different parts rely in part on the same calculation. As I don't want to repeat heavy calculations, I need to cache them. My idea is to decorate methods with a cache decorator. The cache creates an id based on the method's name, module, and arguments, checks whether it has already calculated this in a stack variable, and executes the method if not.
I will try to clarify by showing my current implementation. What I want to do is simplify the code for those implementing calculations.
First, I have the central cache access object, which I call MathContext:
class MathContext(object):
def __init__(self, fn):
self.fn = fn
self.cache = dict()
def get(self, calc_config):
id = create_id(calc_config)
if id not in self.cache:
self.cache[id] = calc_config.exec(self)
return self.cache[id]
The fn argument is the filename the context is created in relation to; data to be calculated is read from this file.
Then we have the calculation base class:
class CalcBase(object):
def exec(self, math_context):
raise NotImplementedError
And here is a stupid Fibonacci example. None of the real methods is actually recursive (they work on large sets of data instead), but it works to demonstrate how you would depend on other calculations:
class Fibonacci(CalcBase):
def __init__(self, n): self.n = n
def exec(self, math_context):
if self.n < 2: return 1
a = math_context.get(Fibonacci(self.n-1))
b = math_context.get(Fibonacci(self.n-2))
return a+b
What I want Fibonacci to be instead is just a decorated method:
@cache
def fib(n):
if n<2: return 1
return fib(n-1)+fib(n-2)
With the math_context example, when math_context goes out of scope, so do all its cached values. I want the same thing for the decorator: i.e., at point X, everything cached by @cache is dereferenced so it can be garbage-collected.
I went ahead and made something that might just do what you want. It can be used as both a decorator and a context manager:
from __future__ import with_statement
try:
import cPickle as pickle
except ImportError:
import pickle
class cached(object):
"""Decorator/context manager for caching function call results.
All results are cached in one dictionary that is shared by all cached
functions.
To use this as a decorator:
@cached
def function(...):
...
The results returned by a decorated function are not cleared from the
cache until decorated_function.clear_my_cache() or cached.clear_cache()
is called
To use this as a context manager:
with cached(function) as function:
...
function(...)
...
The function's return values will be cleared from the cache when the
with block ends
To clear all cached results, call the cached.clear_cache() class method
"""
_CACHE = {}
def __init__(self, fn):
self._fn = fn
def __call__(self, *args, **kwds):
key = self._cache_key(*args, **kwds)
function_cache = self._CACHE.setdefault(self._fn, {})
try:
return function_cache[key]
except KeyError:
function_cache[key] = result = self._fn(*args, **kwds)
return result
def clear_my_cache(self):
"""Clear the cache for a decorated function
"""
try:
del self._CACHE[self._fn]
except KeyError:
pass # no cached results
def __enter__(self):
return self
def __exit__(self, type, value, traceback):
self.clear_my_cache()
def _cache_key(self, *args, **kwds):
"""Create a cache key for the given positional and keyword
arguments. pickle.dumps() is used because there could be
unhashable objects in the arguments, but passing them to
pickle.dumps() will result in a string, which is always hashable.
I used this to make the cached class as generic as possible. Depending
on your requirements, other key generating techniques may be more
efficient
"""
return pickle.dumps((args, sorted(kwds.items())), pickle.HIGHEST_PROTOCOL)
@classmethod
def clear_cache(cls):
"""Clear everything from all functions from the cache
"""
cls._CACHE = {}
if __name__ == '__main__':
# used as decorator
@cached
def fibonacci(n):
print "calculating fibonacci(%d)" % n
if n == 0:
return 0
if n == 1:
return 1
return fibonacci(n - 1) + fibonacci(n - 2)
for n in xrange(10):
print 'fibonacci(%d) = %d' % (n, fibonacci(n))
def lucas(n):
print "calculating lucas(%d)" % n
if n == 0:
return 2
if n == 1:
return 1
return lucas(n - 1) + lucas(n - 2)
# used as context manager
with cached(lucas) as lucas:
for i in xrange(10):
print 'lucas(%d) = %d' % (i, lucas(i))
for n in xrange(9, -1, -1):
print 'fibonacci(%d) = %d' % (n, fibonacci(n))
cached.clear_cache()
for n in xrange(9, -1, -1):
print 'fibonacci(%d) = %d' % (n, fibonacci(n))
This question seems to be two questions:
a) sharing a DB connection
b) caching/memoizing
You have answered (b) yourself. As for (a), I don't understand why you need to put it on the stack. You can do one of these:
- use a class; the connection can be an attribute of it
- decorate all your functions so that they get a connection from a central location (sketched below)
- have each function explicitly use a global connection method
- create a connection and pass it around, or create a context object and pass that around; the connection can be part of the context
- etc., etc.
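For example, the "central location" option might look roughly like this (a sketch only; create_connection stands in for whatever connection factory you use):
import functools

_connection = None  # the central location

def get_connection():
    global _connection
    if _connection is None:
        _connection = create_connection()  # assumed factory
    return _connection

def with_connection(func):
    # hand the shared connection to the function as its first argument
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(get_connection(), *args, **kwargs)
    return wrapper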
You could use a global variable wrapped in a getter function:
connection = None

def getConnection():
    global connection
    if connection is None:
        connection = createConnection()
    return connection
"you get a request, and open a database connection.... you close the database connection."
This is what objects are for. Create the connection object, pass it to other objects, and then close it when you're done. Globals are not appropriate. Simply pass the value around as a parameter to the other objects that are doing the work.
"Each report consist of multiple parts, each part can rely on different calculations, sometimes different parts relies in part on the same calculation.... I need to cache them"
This is what objects are for. Create a dictionary with useful calculation results and pass that around from report part to report part.
You don't need to mess with "stack variables", "static thread local" or anything like that.
Just pass ordinary variable arguments to ordinary method functions. You'll be a lot happier.
class MemoizedCalculation( object ):
pass
class Fibonacci( MemoizedCalculation ):
def __init__( self ):
self.cache= { 0: 1, 1: 1 }
def __call__( self, arg ):
if arg not in self.cache:
self.cache[arg]= self(arg-1) + self(arg-2)
return self.cache[arg]
class MathContext( object ):
def __init__( self ):
self.fibonacci = Fibonacci()
You can use it like this:
>>> mc= MathContext()
>>> mc.fibonacci( 4 )
5
You can define any number of calculations and fold them all into a single container object.
If you want, you can make the MathContext into a formal Context Manager so that it works with the with statement. Add these two methods to MathContext.
def __enter__( self ):
print "Initialize"
return self
def __exit__( self, type_, value, traceback ):
print "Release"
Then you can do this.
with MathContext() as mc:
print mc.fibonacci( 4 )
At the end of the with statement, you are guaranteed that the __exit__ method was called.
I have the following code, in which I simply have a decorator for caching a function's results, and as a concrete implementation I used the Fibonacci function.
After playing around with the code, I wanted to print the cache variable that's initialized in the cache wrapper.
(It's not because I suspect the cache might be faulty; I simply want to know how to access it without going into debug mode and putting a breakpoint inside the decorator.)
I tried to explore the fib_w_cache function in debug mode, which is supposed to actually be the wrapped fib_w_cache, but with no success.
import timeit
def cache(f, cache = dict()):
def args_to_str(*args, **kwargs):
return str(args) + str(kwargs)
def wrapper(*args, **kwargs):
args_str = args_to_str(*args, **kwargs)
if args_str in cache:
#print("cache used for: %s" % args_str)
return cache[args_str]
else:
val = f(*args, **kwargs)
cache[args_str] = val
return val
return wrapper
@cache
def fib_w_cache(n):
if n == 0: return 0
elif n == 1: return 1
else:
return fib_w_cache(n-2) + fib_w_cache(n-1)
def fib_wo_cache(n):
if n == 0: return 0
elif n == 1: return 1
else:
return fib_wo_cache(n-1) + fib_wo_cache(n-2)
print(timeit.timeit('[fib_wo_cache(i) for i in range(0,30)]', globals=globals(), number=1))
print(timeit.timeit('[fib_w_cache(i) for i in range(0,30)]', globals=globals(), number=1))
I admit this is not an "elegant" solution in a sense, but keep in mind that Python functions are also objects. So with some slight modification to your code, I managed to inject the cache as an attribute of the decorated function:
import timeit
def cache(f):
def args_to_str(*args, **kwargs):
return str(args) + str(kwargs)
def wrapper(*args, **kwargs):
args_str = args_to_str(*args, **kwargs)
if args_str in wrapper._cache:
#print("cache used for: %s" % args_str)
return wrapper._cache[args_str]
else:
val = f(*args, **kwargs)
wrapper._cache[args_str] = val
return val
wrapper._cache = {}
return wrapper
@cache
def fib_w_cache(n):
if n == 0: return 0
elif n == 1: return 1
else:
return fib_w_cache(n-2) + fib_w_cache(n-1)
@cache
def fib_w_cache_1(n):
if n == 0: return 0
elif n == 1: return 1
else:
return fib_w_cache_1(n-2) + fib_w_cache_1(n-1)
def fib_wo_cache(n):
if n == 0: return 0
elif n == 1: return 1
else:
return fib_wo_cache(n-1) + fib_wo_cache(n-2)
print(timeit.timeit('[fib_wo_cache(i) for i in range(0,30)]', globals=globals(), number=1))
print(timeit.timeit('[fib_w_cache(i) for i in range(0,30)]', globals=globals(), number=1))
print(fib_w_cache._cache)
print(fib_w_cache_1._cache) # to prove that caches are different instances for different functions
cache is of course a perfectly normal local variable in scope within the cache function, and a perfectly normal nonlocal cellvar in scope within the wrapper function, so if you want to access the value from there, you just do it—as you already are.
But what if you wanted to access it from somewhere else? Then there are two options.
First, cache happens to be defined at the global level, meaning any code anywhere (that hasn't hidden it with a local variable named cache) can access the function object.
And if you're trying to access the values of a function's default parameters from outside the function, they're available in the attributes of the function object. The inspect module docs explain the inspection-oriented attributes of each builtin type:
__defaults__ is a sequence of the values for all positional-or-keyword parameters, in order.
__kwdefaults__ is a mapping from keywords to values for all keyword-only parameters.
So:
>>> def f(a, b=0, c=1, *, d=2, e=3): pass
>>> f.__defaults__
(0, 1)
>>> f.__kwdefaults__
{'e': 3, 'd': 2}
So, for a simple case where you know there's exactly one default value and know which argument it belongs to, all you need is:
>>> cache.__defaults__[0]
{}
If you need to do something more complicated or dynamic, like get the default value for c in the f function above, you need to dig into other information—the only way to know that c's default value will be the second one in __defaults__ is to look at the attributes of the function's code object, like f.__code__.co_varnames, and figure it out from there. But usually, it's better to just use the inspect module's helpers. For example:
>>> inspect.signature(f).parameters['c'].default
1
>>> inspect.signature(cache).parameters['cache'].default
{}
Alternatively, if you're trying to access the cache from inside fib_w_cache, while there's no variable in lexical scope in that function body you can look at, you do know that the function body is only called by the decorator wrapper, and it is available there.
So, you can get your stack frame
frame = inspect.currentframe()
… follow it back to your caller:
back = frame.f_back
… and grab it from that frame's locals:
back.f_locals['cache']
It's worth noting that f_locals works like the locals function: it's actually a copy of the internal locals storage, so modifying it may have no effect, and that copy flattens nonlocal cell variables to regular local variables. If you wanted to access the actual cell variable, you'd have to grub around in things like back.f_code.co_freevars to get the index and then dig it out of the function object's __closure__. But usually, you don't care about that.
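For completeness, here is what that digging looks like on a plain function object (toy names):
def make_counter():
    count = [0]
    def bump():
        count[0] += 1
        return count[0]
    return bump

bump = make_counter()
# 'count' is a free variable of bump; find its cell by index
idx = bump.__code__.co_freevars.index('count')
print(bump.__closure__[idx].cell_contents)  # [0], the actual list, not a copy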
Just for the sake of completeness, Python has a caching decorator built in: functools.lru_cache, with some introspection mechanisms:
from functools import lru_cache
@lru_cache(maxsize=None)
def fib_w_cache(n):
if n == 0: return 0
elif n == 1: return 1
else:
return fib_w_cache(n-2) + fib_w_cache(n-1)
print('fib_w_cache(10) = ', fib_w_cache(10))
print(fib_w_cache.cache_info())
Prints:
fib_w_cache(10) = 55
CacheInfo(hits=8, misses=11, maxsize=None, currsize=11)
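The wrapper also exposes cache_clear() for dropping everything, which maps onto the "clear at point X" requirement from the original question:
fib_w_cache.cache_clear()
print(fib_w_cache.cache_info())
# CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)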
I managed to find a solution (in some sense thanks to @Patrick Haugh's advice).
I simply accessed cache.__defaults__[0] which holds the cache's dict.
The insights about the shared cache and how to avoid it were also quite useful.
Just as a note, the cache dictionary can only be accessed through the cache function object; it cannot be accessed through the decorated functions (at least as far as I understand). This aligns logically with the fact that the cache is shared in my implementation, whereas in the alternative implementation that was proposed it is local to each decorated function.
You can make a class into a wrapper.
def args_to_str(*args, **kwargs):
return str(args) + str(kwargs)
class Cache(object):
def __init__(self, func):
self.func = func
self.cache = {}
def __call__(self, *args, **kwargs):
args_str = args_to_str(*args, **kwargs)
if args_str in self.cache:
return self.cache[args_str]
else:
val = self.func(*args, **kwargs)
self.cache[args_str] = val
return val
Each function has its own cache. You can access it as function.cache. This also allows for any methods you wish to attach to your function.
If you wanted all decorated functions to share the same cache, you could use a class variable instead of an instance variable:
class SharedCache(object):
cache = {}
def __init__(self, func):
self.func = func
#rest of the the code is the same
@SharedCache
def function_1(stuff):
things
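For instance, a quick sketch of the class-based decorator in use:
@Cache
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(10)
print(fib.cache)  # the per-function dict, keyed by stringified arguments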
I have a very long function func which takes a browser handle and performs a bunch of requests and reads a bunch of responses in a specific order:
def func(browser):
# make sure we are logged in otherwise log in
# make request to /search and check that the page has loaded
# fill form in /search and submit it
# read table of response and return the result as list of objects
Each operation requires a large amount of code due to the complexity of the DOM, and they tend to grow really fast.
What would be the best way to refactor this function into smaller components so that the following properties still hold?
- the execution flow of the operations and/or their preconditions is guaranteed, just like in the current version
- the preconditions are not checked with asserts against the state, as this is a very costly operation
- func can be called multiple times on the browser
Just wrap the three helper methods in a class, and track which methods are allowed to run in an instance.
class Helper(object):
def __init__(self):
self.a = True
self.b = False
self.c = False
def funcA(self):
if not self.a:
    raise RuntimeError("Cannot run funcA now")
# do stuff here
self.a = False
self.b = True
return whatever
def funcB(self):
if not self.b:
    raise RuntimeError("Cannot run funcB now")
# do stuff here
self.b = False
self.c = True
return whatever
def funcC(self):
if not self.c:
    raise RuntimeError("Cannot run funcC now")
# do stuff here
self.c = False
self.a = True
return whatever
def func(...):
h = Helper()
h.funcA()
h.funcB()
h.funcC()
# etc
The only way to call a method is if its flag is true, and each method clears its own flag and sets the next method's flag before exiting. As long as you don't touch h.a et al. directly, this ensures that each method can only be called in the proper order.
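A quick sketch of the failure mode:
h = Helper()
h.funcA()  # ok; enables funcB
h.funcC()  # raises RuntimeError: funcB has to run first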
Alternately, you can use a single flag that is a reference to the function currently allowed to run.
class Helper(object):
def __init__(self):
self.allowed = self.funcA
def funcA(self):
if self.allowed != self.funcA:  # bound methods compare equal; 'is' would not work here
raise Error("Cannot run funcA now")
# do stuff
self.allowed = self.funcB
return whatever
# etc
Here's the solution I came up with. I used a decorator (closely related to the one in this blog post) which only allows a function to be called once.
def call_only_once(func):
def new_func(*args, **kwargs):
if not new_func._called:
try:
return func(*args, **kwargs)
finally:
new_func._called = True
else:
raise Exception("Already called this once.")
new_func._called = False
return new_func
@call_only_once
def stateA():
print 'Calling stateA only this time'
@call_only_once
def stateB():
print 'Calling stateB only this time'
@call_only_once
def stateC():
print 'Calling stateC only this time'
def state():
stateA()
stateB()
stateC()
if __name__ == "__main__":
state()
You'll see that if you re-call any of the functions, it will throw an Exception stating that the function has already been called.
The problem with this is that if you ever need to call state() again, you're hosed. Unless you implement these functions as private functions, I don't think you can do exactly what you want due to the nature of Python's scoping rules.
Edit
You can also remove the else in the decorator; repeated calls will then simply return None instead of raising.
Here is a snippet I used once for my state machine:
class InitializationError(Exception):
    pass

class StateMachine(object):
def __init__(self):
self.handlers = {}
self.start_state = None
self.end_states = []
def add_state(self, name, handler, end_state=0):
name = name.upper()
self.handlers[name] = handler
if end_state:
self.end_states.append(name)
def set_start(self, name):
# startup state
self.start_state = name
def run(self, **kw):
"""
Run
:param kw:
:return:
"""
# the first .run call invokes the first handler with the kw keywords
# each registered handler should return the next handler and the kw it needs
try:
handler = self.handlers[self.start_state]
except KeyError:
raise InitializationError("must call .set_start() before .run()")
while True:
(new_state, kw) = handler(**kw)
if isinstance(new_state, str):
if new_state in self.end_states:
print("reached ", new_state)
break
else:
handler = self.handlers[new_state]
elif hasattr(new_state, "__call__"):
handler = new_state
else:
return
Usage:
class MyParser(StateMachine):
def __init__(self):
super().__init__()
# define handlers
# we can define as many handlers as we want
self.handlers["begin_parse"] = self.begin_parse
# define the startup handler
self.set_start("begin_parse")
def end(self, **kw):
logging.info("End of parsing ")
# no callable handler => end
return None, None
def second(self, **kw):
logging.info("second ")
# do something
# if the condition is reached, call the `self.end` handler
if ...:
return self.end, {}
def begin_parse(self, **kw):
logging.info("start of parsing ")
# long process; when the condition is reached, return the `self.second` handler with new kw keywords
while True:
kw = {}
if ...:
return self.second, kw
# elif other cond:
# return self.other_handler, kw
# elif other cond 2:
# return self.other_handler 2, kw
else:
return self.end, kw
# start the state machine
MyParser().run()
This will print:
INFO:root:start of parsing
INFO:root:second
INFO:root:End of parsing
You could use local functions in your func function. OK, they are still declared inside one single global function, but Python is nice enough to still give you access to them for tests.
Here is an example of a function declaring and executing 3 (supposedly heavy) subfunctions. It takes one optional parameter test that, when set to 'TEST', prevents actual execution and instead gives external access to the individual sub-functions and to a local variable:
def func(test=None):
glob = []
def partA():
glob.append('A')
def partB():
glob.append('B')
def partC():
glob.append('C')
if (test == 'TEST'):
global testA, testB, testC, testCR
testA, testB, testC, testCR = partA, partB, partC, glob
return None
partA()
partB()
partC()
return glob
When you call func, the 3 parts are executed in sequence. But if you first call func('TEST'), you can then access the local glob variable as testCR, and the 3 subfunctions as testA, testB and testC. This way you can still test the 3 parts individually with well-defined input and control their output.
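A quick demonstration of both modes:
assert func() == ['A', 'B', 'C']  # normal run executes all parts in order

func('TEST')  # test mode: exposes the parts, executes nothing
testA()
testB()
assert testCR == ['A', 'B']  # testCR is the glob list from that call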
I would insist on the suggestion given by @user3159253 in his comment on the original question:
If the sole purpose is readability I would split the func into three "private" or "protected" ones (i.e. _func1 or __func1) and a private or protected property which keeps the state shared between the functions.
This makes a lot of sense to me and seems more usual in object-oriented programming than the other options. Consider this example as an alternative:
Your class (teste.py):
class Test:
def __init__(self):
self.__environment = {} # Protected information to be shared
self.public_stuff = 'public info' # Accessible to outside callers
def func(self):
print "Main function"
self.__func_a()
self.__func_b()
self.__func_c()
print self.__environment
def __func_a(self):
self.__environment['function a says'] = 'hi'
def __func_b(self):
self.__environment['function b says'] = 'hello'
def __func_c(self):
self.__environment['function c says'] = 'hey'
Other file:
from teste import Test
t = Test()
t.func()
This will output:
Main function
{'function a says': 'hi', 'function b says': 'hello', 'function c says': 'hey'}
If you try to call one of the protected functions, an error occurs:
Traceback (most recent call last):
File "C:/Users/Lucas/PycharmProjects/testes/other.py", line 6, in <module>
t.__func_a()
AttributeError: Test instance has no attribute '__func_a'
Same thing if you try to access the protected environment variable:
Traceback (most recent call last):
File "C:/Users/Lucas/PycharmProjects/testes/other.py", line 5, in <module>
print t.__environment
AttributeError: Test instance has no attribute '__environment'
In my view this is the most elegant, simple and readable way to solve your problem. Let me know if it fits your needs :)
Basically I want to do something like this:
How can I hook a function in a python module?
but I want to call the old function after my own code.
like this:
import whatever
oldfunc = whatever.this_is_a_function
def this_is_a_function(parameter):
#my own code here
# and call original function back
oldfunc(parameter)
whatever.this_is_a_function = this_is_a_function
Is this possible?
I tried copy.copy and copy.deepcopy on the original function, but it didn't work.
Something like this? It avoids using globals, which is generally a good thing.
import whatever
import functools
def prefix_function(function, prefunction):
@functools.wraps(function)
def run(*args, **kwargs):
prefunction(*args, **kwargs)
return function(*args, **kwargs)
return run
def this_is_a_function(parameter):
pass # Your own code here that will be run before
whatever.this_is_a_function = prefix_function(
whatever.this_is_a_function, this_is_a_function)
prefix_function is a function that takes two functions: function and prefunction. It returns a function that takes any parameters and calls prefunction followed by function with the same parameters. prefix_function works for any callable, so you only need to program the prefixing code once for any other hooking you might need to do.
@functools.wraps makes it so that the docstring and name of the returned wrapper function are the same as the wrapped function's.
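For example, hooking a toy function (hypothetical names):
def greet(name):
    return "hello %s" % name

def announce(name):
    print("about to greet %s" % name)

greet = prefix_function(greet, announce)
print(greet("world"))
# about to greet world
# hello world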
If you need this_is_a_function to call the old whatever.this_is_a_function with arguments different than what was passed to it, you could do something like this:
import whatever
import functools
def wrap_function(oldfunction, newfunction):
@functools.wraps(oldfunction)
def run(*args, **kwargs):
return newfunction(oldfunction, *args, **kwargs)
return run
def this_is_a_function(oldfunc, parameter):
# Do some processing or something to customize the parameters to pass
newparams = parameter * 2 # Example of a change to newparams
return oldfunc(newparams)
whatever.this_is_a_function = wrap_function(
whatever.this_is_a_function, this_is_a_function)
There is a problem that if whatever is a pure C module, it's typically impossible (or very difficult) to change its internals in the first place.
So, here's an example of monkey-patching the time function from the time module.
import time

old_time = time.time  # keep a reference to the original

def new_time():
    print('It is today... but more specifically the time is:')
    return old_time()

time.time = new_time
print(time.time())
# Output:
# It is today... but more specifically the time is:
# 1456954003.2
However, if you are trying to do this to C code, you will most likely get an error like cannot overwrite attribute. In that case, you probably want to subclass the C module.
You may want to take a look at this question.
This is the perfect time to tout my super-simplistic Hooker
def hook(hookfunc, oldfunc):
def foo(*args, **kwargs):
hookfunc(*args, **kwargs)
return oldfunc(*args, **kwargs)
return foo
Incredibly simple. It will return a function that first runs the desired hook function (with the same parameters, mind you) and then runs the original function that you are hooking, returning the original value. This also works to overwrite a class method. Say we have a static method in a class.
class Foo:
@staticmethod
def bar(data):
for datum in data:
print(datum, end=" ") # assuming python3 for this
print()
But we want to print the length of the data before we print out its elements:
def myNewFunction(data):
print("The length is {}.".format(len(data)))
And now we simply hook the function:
Foo.bar(["a", "b", "c"])
# => a b c
Foo.bar = hook(myNewFunction, Foo.bar)
Foo.bar(["x", "y", "z"])
# => The length is 3.
# => x y z
Actually, you can replace the target function's func_code (called __code__ in Python 3). The example below hooks a normal function, a class method, and a closure:
# a normal function
def old_func():
print "i am old"
# a class method
class A(object):
def old_method(self):
print "i am old_method"
# a closure function
def make_closure(freevar1, freevar2):
def wrapper():
print "i am old_clofunc, freevars:", freevar1, freevar2
return wrapper
old_clofunc = make_closure('fv1', 'fv2')
# ===============================================
# the new function
def new_func(*args):
print "i am new, args:", args
# the new closure function
def make_closure2(freevar1, freevar2):
def wrapper():
print "i am new_clofunc, freevars:", freevar1, freevar2
return wrapper
new_clofunc = make_closure2('fv1', 'fv2')
# ===============================================
# hook normal function
old_func.func_code = new_func.func_code
# hook class method
A.old_method.im_func.func_code = new_func.func_code
# hook closure function
# Note: the closure function's `co_freevars` count should match the original's
old_clofunc.func_code = new_clofunc.func_code
# ===============================================
# call the old
old_func()
A().old_method()
old_clofunc()
output:
i am new, args: ()
i am new, args: (<__main__.A object at 0x0000000004A5AC50>,)
i am new_clofunc, freevars: fv1 fv2
Are there any exemplary examples of the GoF Observer implemented in Python? I have a bit of code which currently has bits of debugging code laced through the key class (currently generating messages to stderr if a magic environment variable is set). Additionally, the class has an interface for incrementally returning results as well as storing them (in memory) for post-processing. (The class itself is a job manager for concurrently executing commands on remote machines over ssh.)
Currently the usage of the class looks something like:
job = SSHJobMan(hostlist, cmd)
job.start()
while not job.done():
for each in job.poll():
incrementally_process(job.results[each])
time.sleep(0.2) # or other more useful work
post_process(job.results)
An alternative usage model is:
job = SSHJobMan(hostlist, cmd)
job.wait() # implicitly performs a start()
process(job.results)
This all works fine for the current utility. However, it does lack flexibility. For example, I currently support a brief output format or a progress bar as incremental results, and I also support brief, complete, and "merged message" outputs for the post_process() function.
However, I'd like to support multiple results/output streams (progress bar to the terminal, debugging and warnings to a log file, outputs from successful jobs to one file/directory, error messages and other results from non-successful jobs to another, etc).
This sounds like a situation that calls for Observer ... have instances of my class accept registration from other objects and call them back with specific types of events as they occur.
I'm looking at PyPubSub since I saw several references to that in SO related questions. I'm not sure I'm ready to add the external dependency to my utility but I could see value in using their interface as a model for mine if that's going to make it easier for others to use. (The project is intended as both a standalone command line utility and a class for writing other scripts/utilities).
In short, I know how to do what I want... but there are numerous ways to accomplish it. I want suggestions on what's most likely to work for other users of the code in the long run.
The code itself is at: classh.
However it does lack flexibility.
Well... actually, this looks like a good design to me if an asynchronous API is what you want. It usually is. Maybe all you need is to switch from stderr to Python's logging module, which has a sort of publish/subscribe model of its own, what with Logger.addHandler() and so on.
If you do want to support observers, my advice is to keep it simple. You really only need a few lines of code.
class Event(object):
pass
class Observable(object):
def __init__(self):
self.callbacks = []
def subscribe(self, callback):
self.callbacks.append(callback)
def fire(self, **attrs):
e = Event()
e.source = self
for k, v in attrs.items():
setattr(e, k, v)
for fn in self.callbacks:
fn(e)
Your Job class can subclass Observable. When something of interest happens, call self.fire(type="progress", percent=50) or the like.
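A minimal sketch of that in use (the Job class here is hypothetical):
class Job(Observable):
    def run(self):
        # ... do some work, then notify everyone ...
        self.fire(type="progress", percent=50)

job = Job()
job.subscribe(lambda e: print(e.type, e.percent))
job.run()  # prints: progress 50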
I think people in the other answers overdo it. You can easily achieve events in Python with fewer than 15 lines of code.
You simply have two classes: Event and Observer. Any class that wants to listen for an event needs to inherit from Observer and register to listen (observe) for a specific event. When an Event is instantiated and fired, all observers listening to that event will run the specified callback functions.
class Observer():
_observers = []
def __init__(self):
self._observers.append(self)
self._observables = {}
def observe(self, event_name, callback):
self._observables[event_name] = callback
class Event():
def __init__(self, name, data, autofire = True):
self.name = name
self.data = data
if autofire:
self.fire()
def fire(self):
for observer in Observer._observers:
if self.name in observer._observables:
observer._observables[self.name](self.data)
Example:
class Room(Observer):
def __init__(self):
print("Room is ready.")
Observer.__init__(self) # Observer's init needs to be called
def someone_arrived(self, who):
print(who + " has arrived!")
room = Room()
room.observe('someone arrived', room.someone_arrived)
Event('someone arrived', 'Lenard')
Output:
Room is ready.
Lenard has arrived!
A few more approaches...
Example: the logging module
Maybe all you need is to switch from stderr to Python's logging module, which has a powerful publish/subscribe model.
It's easy to get started producing log records.
# producer
import logging
log = logging.getLogger("myjobs") # that's all the setup you need
class MyJob(object):
def run(self):
log.info("starting job")
n = 10
for i in range(n):
log.info("%.1f%% done" % (100.0 * i / n))
log.info("work complete")
On the consumer side there's a bit more work. Unfortunately configuring logger output takes, like, 7 whole lines of code to do. ;)
# consumer
import myjobs, sys, logging
if user_wants_log_output:
ch = logging.StreamHandler(sys.stderr)
ch.setLevel(logging.INFO)
formatter = logging.Formatter(
"%(asctime)s - %(name)s - %(levelname)s - %(message)s")
ch.setFormatter(formatter)
myjobs.log.addHandler(ch)
myjobs.log.setLevel(logging.INFO)
myjobs.MyJob().run()
On the other hand there's an amazing amount of stuff in the logging package. If you ever need to send log data to a rotating set of files, an email address, and the Windows Event Log, you're covered.
Example: simplest possible observer
But you don't need to use any library at all. An extremely simple way to support observers is to call a method that does nothing.
# producer
class MyJob(object):
def on_progress(self, pct):
"""Called when progress is made. pct is the percent complete.
By default this does nothing. The user may override this method
or even just assign to it."""
pass
def run(self):
n = 10
for i in range(n):
self.on_progress(100.0 * i / n)
self.on_progress(100.0)
# consumer
import sys, myjobs
job = myjobs.MyJob()
job.on_progress = lambda pct: sys.stdout.write("%.1f%% done\n" % pct)
job.run()
Sometimes instead of writing a lambda, you can just say job.on_progress = progressBar.update, which is nice.
This is about as simple as it gets. One drawback is that it doesn't naturally support multiple listeners subscribing to the same events.
Example: C#-like events
With a bit of support code, you can get C#-like events in Python. Here's the code:
# glue code
class event(object):
def __init__(self, func):
self.__doc__ = func.__doc__
self._key = ' ' + func.__name__
def __get__(self, obj, cls):
try:
return obj.__dict__[self._key]
except KeyError:
be = obj.__dict__[self._key] = boundevent()
return be
class boundevent(object):
def __init__(self):
self._fns = []
def __iadd__(self, fn):
self._fns.append(fn)
return self
def __isub__(self, fn):
self._fns.remove(fn)
return self
def __call__(self, *args, **kwargs):
for f in self._fns[:]:
f(*args, **kwargs)
The producer declares the event using a decorator:
# producer
class MyJob(object):
@event
def progress(pct):
"""Called when progress is made. pct is the percent complete."""
def run(self):
n = 10
for i in range(n+1):
self.progress(100.0 * i / n)
# consumer
import sys, myjobs
job = myjobs.MyJob()
job.progress += lambda pct: sys.stdout.write("%.1f%% done\n" % pct)
job.run()
This works exactly like the "simple observer" code above, but you can add as many listeners as you like using +=. (Unlike C#, there are no event handler types, you don't have to write new EventHandler(foo.bar) when subscribing to an event, and you don't have to check for null before firing the event. Like C#, events do not squelch exceptions.)
How to choose
If logging does everything you need, use that. Otherwise do the simplest thing that works for you. The key thing to note is that you don't need to take on a big external dependency.
How about an implementation where objects aren't kept alive just because they're observing something? Below please find an implementation of the observer pattern with the following features:
Usage is pythonic. To add an observer to a bound method .bar of instance foo, just do foo.bar.addObserver(observer).
Observers are not kept alive by virtue of being observers. In other words, the observer code uses no strong references.
No sub-classing necessary (descriptors ftw).
Can be used with unhashable types.
Can be used as many times as you want in a single class.
(bonus) As of today the code exists in a proper downloadable, installable package on github.
Here's the code (the github package or PyPI package have the most up to date implementation):
import weakref
import functools
class ObservableMethod(object):
"""
A proxy for a bound method which can be observed.
I behave like a bound method, but other bound methods can subscribe to be
called whenever I am called.
"""
def __init__(self, obj, func):
self.func = func
functools.update_wrapper(self, func)
self.objectWeakRef = weakref.ref(obj)
self.callbacks = {} #observing object ID -> weak ref, methodNames
def addObserver(self, boundMethod):
"""
Register a bound method to observe this ObservableMethod.
The observing method will be called whenever this ObservableMethod is
called, and with the same arguments and keyword arguments. If a
boundMethod has already been registered as a callback, trying to add
it again does nothing. In other words, there is no way to sign up an
observer to be called back multiple times.
"""
obj = boundMethod.__self__
ID = id(obj)
if ID in self.callbacks:
s = self.callbacks[ID][1]
else:
wr = weakref.ref(obj, Cleanup(ID, self.callbacks))
s = set()
self.callbacks[ID] = (wr, s)
s.add(boundMethod.__name__)
def discardObserver(self, boundMethod):
"""
Un-register a bound method.
"""
obj = boundMethod.__self__
if id(obj) in self.callbacks:
self.callbacks[id(obj)][1].discard(boundMethod.__name__)
def __call__(self, *arg, **kw):
"""
Invoke the method which I proxy, and all of its callbacks.
The callbacks are called with the same *args and **kw as the main
method.
"""
result = self.func(self.objectWeakRef(), *arg, **kw)
for ID in self.callbacks:
wr, methodNames = self.callbacks[ID]
obj = wr()
for methodName in methodNames:
getattr(obj, methodName)(*arg, **kw)
return result
@property
def __self__(self):
"""
Get a strong reference to the object owning this ObservableMethod
This is needed so that ObservableMethod instances can observe other
ObservableMethod instances.
"""
return self.objectWeakRef()
class ObservableMethodDescriptor(object):
def __init__(self, func):
"""
To each instance of the class using this descriptor, I associate an
ObservableMethod.
"""
self.instances = {} # Instance id -> (weak ref, Observablemethod)
self._func = func
def __get__(self, inst, cls):
if inst is None:
return self
ID = id(inst)
if ID in self.instances:
wr, om = self.instances[ID]
if not wr():
msg = "Object id %d should have been cleaned up"%(ID,)
raise RuntimeError(msg)
else:
wr = weakref.ref(inst, Cleanup(ID, self.instances))
om = ObservableMethod(inst, self._func)
self.instances[ID] = (wr, om)
return om
def __set__(self, inst, val):
raise RuntimeError("Assigning to ObservableMethod not supported")
def event(func):
return ObservableMethodDescriptor(func)
class Cleanup(object):
"""
I remove elements from a dict whenever I'm called.
Use me as a weakref.ref callback to remove an object's id from a dict
when that object is garbage collected.
"""
def __init__(self, key, d):
self.key = key
self.d = d
def __call__(self, wr):
del self.d[self.key]
To use this we just decorate the methods we want to make observable with @event. Here's an example:
class Foo(object):
def __init__(self, name):
self.name = name
@event
def bar(self):
print("%s called bar"%(self.name,))
def baz(self):
print("%s called baz"%(self.name,))
a = Foo('a')
b = Foo('b')
a.bar.addObserver(b.bar)
a.bar()
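Calling a.bar() invokes the real method first and then its observers, so this prints "a called bar" followed by "b called bar".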
From wikipedia:
from collections import defaultdict
class Observable (defaultdict):
def __init__ (self):
defaultdict.__init__(self, object)
def emit (self, *args):
'''Pass parameters to all observers and update states.'''
for subscriber in self:
response = subscriber(*args)
self[subscriber] = response
def subscribe (self, subscriber):
'''Add a new subscriber to self.'''
self[subscriber]
def stat (self):
'''Return a tuple containing the state of each observer.'''
return tuple(self.values())
The Observable is used like this.
myObservable = Observable ()
# subscribe some inlined functions.
# myObservable[lambda x, y: x * y] would also work here.
myObservable.subscribe(lambda x, y: x * y)
myObservable.subscribe(lambda x, y: float(x) / y)
myObservable.subscribe(lambda x, y: x + y)
myObservable.subscribe(lambda x, y: x - y)
# emit parameters to each observer
myObservable.emit(6, 2)
# get updated values
myObservable.stat() # returns: (12, 3.0, 8, 4) on Python 3.7+, where dicts preserve insertion order
Based on Jason's answer, I implemented the C#-like events example as a fully fledged Python module including documentation and tests. I love fancy pythonic stuff :)
So, if you want some ready-to-use solution, you can just use the code on github.
Example: twisted log observers
To register an observer yourCallable() (a callable that accepts a dictionary) to receive all log events (in addition to any other observers):
twisted.python.log.addObserver(yourCallable)
Example: complete producer/consumer example
From Twisted-Python mailing list:
#!/usr/bin/env python
"""Serve as a sample implementation of a twisted producer/consumer
system, with a simple TCP server which asks the user how many random
integers they want, and it sends the result set back to the user, one
result per line."""
import random
from zope.interface import implements
from twisted.internet import interfaces, reactor
from twisted.internet.protocol import Factory
from twisted.protocols.basic import LineReceiver
class Producer:
"""Send back the requested number of random integers to the client."""
implements(interfaces.IPushProducer)
def __init__(self, proto, cnt):
self._proto = proto
self._goal = cnt
self._produced = 0
self._paused = False
def pauseProducing(self):
"""When we've produced data too fast, pauseProducing() will be
called (reentrantly from within resumeProducing's transport.write
method, most likely), so set a flag that causes production to pause
temporarily."""
self._paused = True
print('pausing connection from %s' % (self._proto.transport.getPeer()))
def resumeProducing(self):
self._paused = False
while not self._paused and self._produced < self._goal:
next_int = random.randint(0, 10000)
self._proto.transport.write('%d\r\n' % (next_int))
self._produced += 1
if self._produced == self._goal:
self._proto.transport.unregisterProducer()
self._proto.transport.loseConnection()
def stopProducing(self):
pass
class ServeRandom(LineReceiver):
"""Serve up random data."""
def connectionMade(self):
print('connection made from %s' % (self.transport.getPeer()))
self.transport.write('how many random integers do you want?\r\n')
def lineReceived(self, line):
cnt = int(line.strip())
producer = Producer(self, cnt)
self.transport.registerProducer(producer, True)
producer.resumeProducing()
def connectionLost(self, reason):
print('connection lost from %s' % (self.transport.getPeer()))
factory = Factory()
factory.protocol = ServeRandom
reactor.listenTCP(1234, factory)
print('listening on 1234...')
reactor.run()
OP asks "Are there any exemplary examples of the GoF Observer implemented in Python?"
This is an example in Python 3.7. This Observable class meets the requirement of creating a relationship between one observable and many observers while remaining independent of their structure.
from functools import partial
from dataclasses import dataclass, field
import sys
from typing import List, Callable
@dataclass
class Observable:
observers: List[Callable] = field(default_factory=list)
def register(self, observer: Callable):
self.observers.append(observer)
def deregister(self, observer: Callable):
self.observers.remove(observer)
def notify(self, *args, **kwargs):
for observer in self.observers:
observer(*args, **kwargs)
def usage_demo():
observable = Observable()
# Register two anonymous observers using lambda.
observable.register(
lambda *args, **kwargs: print(f'Observer 1 called with args={args}, kwargs={kwargs}'))
observable.register(
lambda *args, **kwargs: print(f'Observer 2 called with args={args}, kwargs={kwargs}'))
# Create an observer function, register it, then deregister it.
def callable_3():
print('Observer 3 NOT called.')
observable.register(callable_3)
observable.deregister(callable_3)
# Create a general purpose observer function and register four observers.
def callable_x(*args, **kwargs):
print(f'{args[0]} observer called with args={args}, kwargs={kwargs}')
for gui_field in ['Form field 4', 'Form field 5', 'Form field 6', 'Form field 7']:
observable.register(partial(callable_x, gui_field))
observable.notify('test')
if __name__ == '__main__':
sys.exit(usage_demo())
A functional approach to observer design:
from functools import wraps

def add_listener(obj, method_name, listener):
# Get any existing listeners
listener_attr = method_name + '_listeners'
listeners = getattr(obj, listener_attr, None)
# If this is the first listener, then set up the method wrapper
if not listeners:
listeners = [listener]
setattr(obj, listener_attr, listeners)
# Get the object's method
method = getattr(obj, method_name)
@wraps(method)
def method_wrapper(*args, **kwargs):
    method(*args, **kwargs)
    for l in listeners:
        l(obj, *args, **kwargs)  # listener also receives the object
# Replace the original method with the wrapper
setattr(obj, method_name, method_wrapper)
else:
# Event is already set up, so just add another listener
listeners.append(listener)
def remove_listener(obj, method_name, listener):
# Get any existing listeners
listener_attr = method_name + '_listeners'
listeners = getattr(obj, listener_attr, None)
if listeners:
# Remove the listener
next((listeners.pop(i)
for i, l in enumerate(listeners)
if l == listener),
None)
# If this was the last listener, then remove the method wrapper
if not listeners:
method = getattr(obj, method_name)
delattr(obj, listener_attr)
setattr(obj, method_name, method.__wrapped__)
These methods can then be used to add a listener to any class method. For example:
class MyClass(object):
def __init__(self, prop):
self.prop = prop
def some_method(self, num, string):
print('method:', num, string)
def listener_method(obj, num, string):
print('listener:', num, string, obj.prop)
my = MyClass('my_prop')
add_listener(my, 'some_method', listener_method)
my.some_method(42, 'with listener')
remove_listener(my, 'some_method', listener_method)
my.some_method(42, 'without listener')
And the output is:
method: 42 with listener
listener: 42 with listener my_prop
method: 42 without listener
I've been working on a way to get tests produced from a generator in nose to have descriptions that are customized for the specific iteration being tested. I have something that works, as long as my generator target method never tries to access self from my generator class.
I'm seeing that all my generator target instances share a common test class instance, while nose generates a one-off instance of the test class for each test run from the generator. This results in setUp being run on each test instance nose creates, but never on the instance the generator target is bound to (of course, the real problem is that I can't see how to bind the nose-created instance to the generator target).
Here's the code I'm using to try to figure this all out (yes, I know the decorator would probably be better as a callable class, but nose, at least version 1.2.1 that I have, explicitly checks that tests are either functions or methods, so a callable class won't run at all):
import inspect
def labelable_yielded_case(case):
argspec = inspect.getargspec(case)
if argspec.defaults is not None:
defaults_list = [''] * (len(argspec.args) - len(argspec.defaults)) + argspec.defaults
else:
defaults_list = [''] * len(argspec.args)
argument_defaults_list = zip(argspec.args, defaults_list)
case_wrappers = []
def add_description(wrapper_id, argument_dict):
case_wrappers[wrapper_id].description = case.__doc__.format(**argument_dict)
def case_factory(*factory_args, **factory_kwargs):
def case_wrapper_wrapper():
wrapper_id = len(case_wrappers)
def case_wrapper(*args, **kwargs):
args = factory_args + args
argument_list = []
for argument in argument_defaults_list:
argument_list.append(list(argument))
for index, value in enumerate(args):
argument_list[index][1] = value
argument_dict = dict(argument_list)
argument_dict.update(factory_kwargs)
argument_dict.update(kwargs)
add_description(wrapper_id, argument_dict)
return case(*args, **kwargs)
case_wrappers.append(case_wrapper)
case_wrapper.__name__ = case.__name__
return case_wrapper
return case_wrapper_wrapper()
return case_factory
class TestTest(object):
def __init__(self):
self.data = None
def setUp(self):
print 'setup', self
self.data = (1,2,3)
def test_all(self):
for index, value in enumerate((1,2,3)):
yield self.validate_equality(), index, value
def test_all_again(self):
for index, value in enumerate((1,2,3)):
yield self.validate_equality_again, index, value
@labelable_yielded_case
def validate_equality(self, index, value):
'''element {index} equals {value}'''
print 'test', self
assert self.data[index] == value, 'expected %d got %d' % (value, self.data[index])
def validate_equality_again(self, index, value):
print 'test', self
assert self.data[index] == value, 'expected %d got %d' % (value, self.data[index])
validate_equality_again.description = 'again'
When run through nose, the again tests work just fine, but the set of tests using the decorated generator target all fail because self.data is None (because setUp is never run, since the instance of TestTest stored in the closures is not the instance run by nose). I tried making the decorator an instance member of a base class for TestTest, but then nose threw errors about having too few arguments (no self) passed to the unbound labelable_yielded_case. Is there any way I can make this work (short of hacking nose), or am I stuck choosing between not being able to have the yield target be an instance member and not having per-test labeling for each yielded test?
Fixed it (at least for the case here, though I think I got it for all cases). I had to fiddle with case_wrapper_wrapper and case_wrapper to get the factory to return the wrapped cases attached to the correct class, but not bound to any given instance in any way. I also had another issue: I was building the argument dict in the wrapper wrapper but then not passing it to the case. Working code:
import inspect
def labelable_yielded_case(case):
argspec = inspect.getargspec(case)
if argspec.defaults is not None:
defaults_list = [''] * (len(argspec.args) - len(argspec.defaults)) + argspec.defaults
else:
defaults_list = [''] * len(argspec.args)
argument_defaults_list = zip(argspec.args, defaults_list)
case_wrappers = []
def add_description(wrapper_id, argument_dict):
case_wrappers[wrapper_id].description = case.__doc__.format(**argument_dict)
def case_factory(*factory_args, **factory_kwargs):
def case_wrapper_wrapper():
wrapper_id = len(case_wrappers)
def case_wrapper(*args, **kwargs):
argument_list = []
for argument in argument_defaults_list:
argument_list.append(list(argument))
for index, value in enumerate(args):
argument_list[index][1] = value
argument_dict = dict(argument_list)
argument_dict.update(kwargs)
add_description(wrapper_id, argument_dict)
return case(**argument_dict)
case_wrappers.append(case_wrapper)
case_name = case.__name__ + str(wrapper_id)
case_wrapper.__name__ = case_name
if factory_args:
setattr(factory_args[0].__class__, case_name, case_wrapper)
return getattr(factory_args[0].__class__, case_name)
else:
return case_wrapper
return case_wrapper_wrapper()
return case_factory
class TestTest(object):
def __init__(self):
self.data = None
def setUp(self):
self.data = (1,2,3)
def test_all(self):
for index, value in enumerate((1,2,3)):
yield self.validate_equality(), index, value
@labelable_yielded_case
def validate_equality(self, index, value):
'''element {index} equals {value}'''
assert self.data[index] == value, 'expected %d got %d' % (value, self.data[index])