I have a utility module that I use to provide data to other scripts. I can't get my head around the best way of utilising this whilst minimising the number of function calls (which are all, for the sake of the argument, slow).
It looks something like this:
helper.py
dataset1 = slow_process1()
dataset2 = slow_process2()

def get_specific_data1():
    data = ...  # do stuff with dataset1
    return data

def get_specific_data2():
    data = ...  # do stuff with dataset1
    return data

def get_specific_data3():
    data = ...  # do stuff with dataset2
    return data
Now, say I need to run get_specific_data1 in a script. In the setup above, I'm importing the module, which means I call slow_process2 on import, unnecessarily.
If I nest the assignment of dataset1 and dataset2, but then need to call get_specific_data1 and get_specific_data2 in the same script, I run slow_process1 twice, which again is unnecessary.
If I create a Helper class whose methods are the get_specific_data functions, and which runs slow_process1 or slow_process2 only when required and stores the result for later method calls, I can get around this. Is that appropriate?
Something like:
class Helper:
    def __init__(self):
        self.dataset1 = None
        self.dataset2 = None

    def run_dataset1(self):
        self.dataset1 = slow_process1()

    def run_dataset2(self):
        self.dataset2 = slow_process2()

    def get_specific_data1(self):
        if self.dataset1 is None:
            self.run_dataset1()
        data = ...  # do stuff with self.dataset1
        return data

    # etc.
Apologies if this is a stupid question, but I have limited experience with OOP and don't want to make mistakes up front.
Thanks
This is what I meant about using a class with properties, only in this case I've used a custom version of one named lazyproperty. It's considered "lazy" because it is only computed when it's accessed, like a regular property. Unlike a regular property, though, the computed value is effectively cached by being turned into an instance attribute, so it won't be recomputed on every access.
Caveat: Doing this assumes that the value would be the same no matter when it was calculated. Any changes made to it after the first access will be visible to other methods of the same instance of the class in which it was used, i.e. they won't see a freshly recomputed value.
Once this is done, the methods in the class can just reference self.dataset1 or self.dataset2 as though they were regular instance attributes. If it's the first access, the associated data will be computed; otherwise the previously created value will simply be returned. You can see this happening in the output produced (shown far below).
# From the book "Python Cookbook" 3rd Edition.
class lazyproperty:
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, cls):
        if instance is None:
            return self
        else:
            value = self.func(instance)
            # Store the result as an instance attribute of the same name, so
            # later lookups find it directly and never reach this descriptor.
            setattr(instance, self.func.__name__, value)
            return value


def slow_process1():
    print('slow_process1() running')
    return 13

def slow_process2():
    print('slow_process2() running')
    return 21


class Helper:
    def __init__(self):
        """ Does nothing - so not really needed. """
        pass

    @lazyproperty
    def dataset1(self):
        return slow_process1()

    @lazyproperty
    def dataset2(self):
        return slow_process2()

    def process_data1(self):
        print('self.dataset1:', self.dataset1)  # doing stuff with dataset1
        return self.dataset1 * 2

    def process_data2(self):
        print('self.dataset2:', self.dataset2)  # doing stuff with dataset2
        return self.dataset2 * 2

    def process_data3(self):
        print('self.dataset2:', self.dataset2)  # also does stuff with dataset2
        return self.dataset2 * 3


if __name__ == '__main__':
    helper = Helper()
    print(helper.process_data1())  # Will cause slow_process1() to be called
    print(helper.process_data2())  # Will cause slow_process2() to be called
    print(helper.process_data3())  # Won't call slow_process2() again
Output:
slow_process1() running
self.dataset1: 13
26
slow_process2() running
self.dataset2: 21
42
self.dataset2: 21
63
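On Python 3.8 or later, the standard library's functools.cached_property behaves much like the lazyproperty class above, so a hand-written descriptor may not be needed. The following is a minimal sketch, not the answer's own code: slow_process1 is a stand-in for the real slow call, and the calls list is a hypothetical addition that only exists to show how often the loader runs.

```python
import functools

calls = []  # hypothetical: records each run of the slow loader


def slow_process1():
    # Stand-in for the real slow computation.
    calls.append("slow_process1")
    return 13


class Helper:
    @functools.cached_property
    def dataset1(self):
        # Runs on first access only; the result is then stored on the instance.
        return slow_process1()

    def process_data1(self):
        return self.dataset1 * 2


helper = Helper()
first = helper.process_data1()   # triggers slow_process1()
second = helper.process_data1()  # reuses the stored value
print(first, second, calls)
```

The same caveat applies: the value is computed once per instance and is never refreshed unless the attribute is deleted from the instance.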
You might be able to solve this with a lazy loading technique:
dataset1 = None
dataset2 = None

def ensureDataset1():
    global dataset1
    if dataset1 is None:
        dataset1 = slow_process1()

def ensureDataset2():
    global dataset2
    if dataset2 is None:
        dataset2 = slow_process2()

def get_specific_data1():
    ensureDataset1()
    data = ...  # do stuff with dataset1
    return data

# etc.
The side effect here is that if you never get around to examining either of dataset1 or dataset2 they never load.
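The same lazy-loading idea can be written without module-level globals by memoizing a zero-argument loader with functools.lru_cache. This is only a sketch under assumptions: slow_process1 is a stand-in returning a small list, load_count is a hypothetical counter added to show the loader runs once, and the sum/max bodies are placeholders for "do stuff with dataset1".

```python
import functools

load_count = {"dataset1": 0}  # hypothetical counter, for illustration only


def slow_process1():
    # Stand-in for the real slow computation.
    load_count["dataset1"] += 1
    return [1, 2, 3]


@functools.lru_cache(maxsize=None)
def get_dataset1():
    # The first call runs slow_process1(); later calls return the cached result.
    return slow_process1()


def get_specific_data1():
    return sum(get_dataset1())  # placeholder for "do stuff with dataset1"


def get_specific_data2():
    return max(get_dataset1())  # placeholder for "do stuff with dataset1"


total = get_specific_data1()
largest = get_specific_data2()
print(total, largest, load_count)
```

Importing such a module runs nothing slow, and a script that calls both functions still loads dataset1 only once.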
Given a class with class methods that contain only self input:
class ABC():
    def __init__(self, input_dict):
        self.variable_0 = input_dict['variable_0']
        self.variable_1 = input_dict['variable_1']
        self.variable_2 = input_dict['variable_2']
        self.variable_3 = input_dict['variable_3']

    def some_operation_0(self):
        return self.variable_0 + self.variable_1

    def some_operation_1(self):
        return self.variable_2 + self.variable_3
First question: Is this very bad practice? Should I just refactor some_operation_0(self) to explicitly take the necessary inputs, some_operation_0(self, variable_0, variable_1)? If so, the testing is very straightforward.
Second question: What is the correct way to setup my unit test on the method some_operation_0(self)?
Should I setup a fixture in which I initialize input_dict, and then instantiate the class with a mock object?
import pytest

@pytest.fixture
def generator_inputs():
    f = open('inputs.txt', 'r')
    input_dict = eval(f.read())
    f.close()
    mock_obj = ABC(input_dict)

def test_some_operation_0():
    assert mock_obj.some_operation_0() == some_value
(I am new to both python and general unit testing...)
Those methods do take an argument: self. There is no need to mock anything. Instead, you can simply create an instance, and verify that the methods return the expected value when invoked.
For your example:
def test_abc():
    a = ABC({'variable_0': 0, 'variable_1': 1, 'variable_2': 2, 'variable_3': 3})
    assert a.some_operation_0() == 1
    assert a.some_operation_1() == 5
If constructing an instance is very difficult, you might want to change your code so that the class can be instantiated from standard in-memory data structures (e.g. a dictionary). In that case, you could create a separate function that reads/parses data from a file and uses the "data-structure-based" __init__ method, e.g. make_abc() or a class method.
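As a rough sketch of that separation, the class below keeps a dictionary-based __init__ and adds a class method that does the file parsing. The name from_file is hypothetical, and ast.literal_eval is swapped in for the question's eval as an assumption that the file holds a plain Python dict literal.

```python
import ast
import os
import tempfile


class ABC:
    def __init__(self, input_dict):
        self.variable_0 = input_dict['variable_0']
        self.variable_1 = input_dict['variable_1']
        self.variable_2 = input_dict['variable_2']
        self.variable_3 = input_dict['variable_3']

    @classmethod
    def from_file(cls, path):
        # Hypothetical helper: parse a dict literal from a file, then reuse
        # the ordinary dictionary-based constructor.
        with open(path) as f:
            return cls(ast.literal_eval(f.read()))

    def some_operation_0(self):
        return self.variable_0 + self.variable_1

    def some_operation_1(self):
        return self.variable_2 + self.variable_3


# Tests can build instances straight from a dict, with no file involved.
in_memory_result = ABC({'variable_0': 0, 'variable_1': 1,
                        'variable_2': 2, 'variable_3': 3}).some_operation_0()

# The file-based path is exercised separately, here with a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("{'variable_0': 0, 'variable_1': 1, 'variable_2': 2, 'variable_3': 3}")
from_file_result = ABC.from_file(tmp.name).some_operation_1()
os.remove(tmp.name)
```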
If this approach does not generalize to your real problem, you could imagine providing programmatic access to the key names or other metadata that ABC recognizes or cares about. Then, you could programmatically construct a "defaulted" instance, e.g. an instance where every value in the input dict is a default-constructed value (such as 0 for int):
class ABC():
    PROPERTY_NAMES = ['variable_0', 'variable_1', 'variable_2', 'variable_3']

    def __init__(self, input_dict):
        # implementation omitted for brevity
        pass

    def some_operation_0(self):
        return self.variable_0 + self.variable_1

    def some_operation_1(self):
        return self.variable_2 + self.variable_3

def test_abc():
    a = ABC({name: 0 for name in ABC.PROPERTY_NAMES})
    assert a.some_operation_0() == 0
    assert a.some_operation_1() == 0
Basically I want to do something like this:
How can I hook a function in a python module?
but I want to call the old function after my own code.
like
import whatever

oldfunc = whatever.this_is_a_function

def this_is_a_function(parameter):
    # my own code here
    # and call original function back
    oldfunc(parameter)

whatever.this_is_a_function = this_is_a_function
Is this possible?
I tried copy.copy, copy.deepcopy original function but it didn't work.
Something like this? It avoids using globals, which is generally a good thing.
import whatever
import functools

def prefix_function(function, prefunction):
    @functools.wraps(function)
    def run(*args, **kwargs):
        prefunction(*args, **kwargs)
        return function(*args, **kwargs)
    return run

def this_is_a_function(parameter):
    pass  # Your own code here that will be run before

whatever.this_is_a_function = prefix_function(
    whatever.this_is_a_function, this_is_a_function)
prefix_function is a function that takes two functions: function and prefunction. It returns a function that takes any parameters, and calls prefunction followed by function with the same parameters. The prefix_function function works for any callable, so you only need to program the prefixing code once for any other hooking you might need to do.
@functools.wraps makes the returned wrapper function carry the same name and docstring as the original function.
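To see the call order and the effect of functools.wraps without a real module to patch, here is a small self-contained sketch. The whatever object is a hypothetical stand-in built from types.SimpleNamespace, and original, before, and events are illustrative names that do not come from the question.

```python
import functools
import types

events = []  # hypothetical log of which function ran, in order


def original(parameter):
    """Return parameter plus one."""
    events.append(("original", parameter))
    return parameter + 1


# Stand-in for the real `whatever` module (an assumption for this sketch).
whatever = types.SimpleNamespace(this_is_a_function=original)


def prefix_function(function, prefunction):
    @functools.wraps(function)
    def run(*args, **kwargs):
        prefunction(*args, **kwargs)
        return function(*args, **kwargs)
    return run


def before(parameter):
    events.append(("before", parameter))


whatever.this_is_a_function = prefix_function(
    whatever.this_is_a_function, before)

result = whatever.this_is_a_function(41)
print(result, events, whatever.this_is_a_function.__name__)
```

The hook runs first, the original's return value is passed through, and the wrapper still reports the original's name and docstring.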
If you need this_is_a_function to call the old whatever.this_is_a_function with arguments different than what was passed to it, you could do something like this:
import whatever
import functools

def wrap_function(oldfunction, newfunction):
    @functools.wraps(oldfunction)
    def run(*args, **kwargs):
        return newfunction(oldfunction, *args, **kwargs)
    return run

def this_is_a_function(oldfunc, parameter):
    # Do some processing or something to customize the parameters to pass
    newparams = parameter * 2  # Example of a change to newparams
    return oldfunc(newparams)

whatever.this_is_a_function = wrap_function(
    whatever.this_is_a_function, this_is_a_function)
One caveat: if whatever is a pure C module, it's typically impossible (or very difficult) to change its internals in the first place.
So, here's an example of monkey-patching the time function from the time module.
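import time

old_time = time.time

# Named new_time rather than time, so that the name `time` keeps
# referring to the module and the assignment below patches the module.
def new_time():
    print('It is today... but more specifically the time is:')
    return old_time()

time.time = new_time
print(time.time())
# Output:
# It is today... but more specifically the time is:
# 1456954003.2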
However, if you are trying to do this to C code, you will most likely get an error like cannot overwrite attribute. In that case, you probably want to subclass the C module.
You may want to take a look at this question.
This is the perfect time to tout my super-simplistic Hooker
def hook(hookfunc, oldfunc):
    def foo(*args, **kwargs):
        hookfunc(*args, **kwargs)
        return oldfunc(*args, **kwargs)
    return foo
Incredibly simple. It returns a function that first runs the desired hook function (with the same parameters, mind you), then runs the original function that you are hooking and returns the original's return value. This also works for overriding a method on a class. Say we have a static method in a class.
class Foo:
    @staticmethod
    def bar(data):
        for datum in data:
            print(datum, end="")  # assuming python3 for this
        print()
But we want to print the length of the data before we print out its elements
def myNewFunction(data):
    print("The length is {}.".format(len(data)))
And now we simply hook the function (the hook goes first, the original second)
Foo.bar(["a", "b", "c"])
# => abc
Foo.bar = hook(myNewFunction, Foo.bar)
Foo.bar(["x", "y", "z"])
# => The length is 3.
# => xyz
Actually, you can replace the target function's code object. The example below uses the __code__ attribute (called func_code in Python 2):
# a normal function
def old_func():
    print("i am old")

# a method
class A(object):
    def old_method(self):
        print("i am old_method")

# a closure function
def make_closure(freevar1, freevar2):
    def wrapper():
        print("i am old_clofunc, freevars:", freevar1, freevar2)
    return wrapper
old_clofunc = make_closure('fv1', 'fv2')

# ===============================================

# the new function
def new_func(*args):
    print("i am new, args:", args)

# the new closure function
def make_closure2(freevar1, freevar2):
    def wrapper():
        print("i am new_clofunc, freevars:", freevar1, freevar2)
    return wrapper
new_clofunc = make_closure2('fv1', 'fv2')

# ===============================================

# hook normal function
old_func.__code__ = new_func.__code__
# hook method (in Python 3, A.old_method is a plain function;
# Python 2 needed A.old_method.im_func.func_code)
A.old_method.__code__ = new_func.__code__
# hook closure function
# Note: the closure function's `co_freevars` count should be equal
old_clofunc.__code__ = new_clofunc.__code__

# ===============================================

# call the old
old_func()
A().old_method()
old_clofunc()
output:
i am new, args: ()
i am new, args: (<__main__.A object at 0x0000000004A5AC50>,)
i am new_clofunc, freevars: fv1 fv2
So the situation is that I have multiple methods which might be threaded simultaneously, but each needs its own lock against being re-threaded until it has finished running. They are established by initialising a class with some data-processing options:
import threading

class InfrequentDataDaemon(object): pass
class FrequentDataDaemon(object): pass

def addMethod(name):
    def wrapper(f):
        setattr(name, f.__name__, staticmethod(f))
        return f
    return wrapper

class DataProcessors(object):
    lock = threading.Lock()

    def __init__(self, options):
        self.common_settings = options['common_settings']
        self.data_processing_configurations = options['data_processing_configurations']  # Configs for each processing method
        self.data_processing_types = options['data_processing_types']
        self.Data_Processing_Functions = {}
        # I __init__ each processing method as a separate function so that it can be locked
        for type in options['data_processing_types']:
            def bindFunction1(name):
                def func1(self, data=None, lock=None):
                    config = self.data_processing_configurations[data['type']]  # I get the right config for the datatype
                    with lock:
                        FetchDataBaseStuff(data['type'])
                        # I don't want this to be run more than once at a time per DataProcessing Type
                        # But it's fine if multiple DoSomethings run at once, as long as each DataType is different!
                        DoSomething(data, config)
                        WriteToDataBase(data['type'])
                func1.__name__ = "Processing_for_{}".format(type)
                self.Data_Processing_Functions[func1.__name__] = func1  # Add this function to the dictionary object
            bindFunction1(type)

        # Then I add some methods to a daemon that are going to check if our DataProcessors need to be called
        def fast_process_types(data):
            if not example_condition is True: return
            if not data['type'] in self.data_processing_types: return  # Check that we are doing something with this type of data
            threading.Thread(target=self.Data_Processing_Functions["Processing_for_{}".format(data['type'])], args=(self, data, self.lock)).start()

        def slow_process_types(data):
            if not some_other_condition is True: return
            if not data['type'] in self.data_processing_types: return  # Check that we are doing something with this type of data
            threading.Thread(target=self.Data_Processing_Functions["Processing_for_{}".format(data['type'])], args=(self, data, self.lock)).start()

        addMethod(InfrequentDataDaemon)(slow_process_types)
        addMethod(FrequentDataDaemon)(fast_process_types)
The idea is to lock each method in DataProcessors.Data_Processing_Functions, so that each method is only accessed by one thread at a time (and the rest of the threads for the same method are queued). How does locking need to be set up to achieve this effect?
I'm not sure I completely follow what you're trying to do here, but could you just create a separate threading.Lock object for each type?
class DataProcessors(object):
    def __init__(self, options):
        self.common_settings = options['common_settings']
        self.data_processing_configurations = options['data_processing_configurations']  # Configs for each processing method
        self.data_processing_types = options['data_processing_types']
        self.Data_Processing_Functions = {}
        self.locks = {}
        # I __init__ each processing method as a separate function so that it can be locked
        for type in options['data_processing_types']:
            self.locks[type] = threading.Lock()
            def bindFunction1(name):
                def func1(self, data=None):
                    config = self.data_processing_configurations[data['type']]  # I get the right config for the datatype
                    with self.locks[data['type']]:
                        FetchDataBaseStuff(data['type'])
                        DoSomething(data, config)
                        WriteToDataBase(data['type'])
                func1.__name__ = "Processing_for_{}".format(type)
                self.Data_Processing_Functions[func1.__name__] = func1  # Add this function to the dictionary object
            bindFunction1(type)

        # Then I add some methods to a daemon that are going to check if our DataProcessors need to be called
        def fast_process_types(data):
            if not example_condition is True: return
            if not data['type'] in self.data_processing_types: return  # Check that we are doing something with this type of data
            threading.Thread(target=self.Data_Processing_Functions["Processing_for_{}".format(data['type'])], args=(self, data)).start()

        def slow_process_types(data):
            if not some_other_condition is True: return
            if not data['type'] in self.data_processing_types: return  # Check that we are doing something with this type of data
            threading.Thread(target=self.Data_Processing_Functions["Processing_for_{}".format(data['type'])], args=(self, data)).start()

        addMethod(InfrequentDataDaemon)(slow_process_types)
        addMethod(FrequentDataDaemon)(fast_process_types)
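Stripped of the database calls, the one-lock-per-type idea can be checked in isolation. The sketch below is a simplified stand-in rather than the class above: process, active, max_active, and _stats_lock are hypothetical names, and time.sleep takes the place of the real work. It records how many threads were ever inside the locked section at once for each type.

```python
import threading
import time


class DataProcessors:
    def __init__(self, types):
        # One lock per data type: same-type work is serialised,
        # different types can run concurrently.
        self.locks = {t: threading.Lock() for t in types}
        self.active = {t: 0 for t in types}
        self.max_active = {t: 0 for t in types}
        self._stats_lock = threading.Lock()  # protects the counters only

    def process(self, data_type):
        with self.locks[data_type]:
            with self._stats_lock:
                self.active[data_type] += 1
                self.max_active[data_type] = max(
                    self.max_active[data_type], self.active[data_type])
            time.sleep(0.01)  # stand-in for fetch / process / write
            with self._stats_lock:
                self.active[data_type] -= 1


processors = DataProcessors(["a", "b"])
threads = [threading.Thread(target=processors.process, args=(t,))
           for t in ["a", "a", "a", "b", "b"]]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(processors.max_active)
```

Threads waiting on the same type block on that type's lock until the holder leaves the with block, which gives the queueing behaviour asked for.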
I am maintaining a little library of useful functions for interacting with my company's APIs and I have come across (what I think is) a neat question that I can't find the answer to.
I frequently have to request large amounts of data from an API, so I do something like:
class Client(object):
    def __init__(self):
        self.data = []

    def get_data(self, offset=0):
        done = False
        while not done:
            data = get_more_starting_at(offset)
            self.data.extend(data)
            offset += 1
            if not data:
                done = True
This works fine and allows me to restart the retrieval where I left off if something goes horribly wrong. However, since python functions are just regular objects, we can do stuff like:
def yo():
    yo.hi = "yo!"
    return None
and then we can interrogate yo about its properties later, like:
yo.hi => "yo!"
My question is: can I rewrite my class-based example to pin the data to the function itself, without referring to the function by name? I know I can do this by:
def get_data(offset=0):
    done = False
    get_data.data = []
    while not done:
        data = get_more_starting_from(offset)
        get_data.data.extend(data)
        offset += 1
        if not data:
            done = True
    return get_data.data
but I would like to do something like:
def get_data(offset=0):
    done = False
    self.data = []  # <===== this is the bit I can't figure out
    while not done:
        data = get_more_starting_from(offset)
        self.data.extend(data)  # <====== also this!
        offset += 1
        if not data:
            done = True
    return self.data  # <======== want to refer to the "current" object
Is it possible to refer to the "current" object by anything other than its name?
Something like "this", "self", or "memememe!" is what I'm looking for.
I don't understand why you want to do this, but it's what a fixed point combinator allows you to do:
import functools

def Y(f):
    @functools.wraps(f)
    def Yf(*args):
        return inner(*args)
    inner = f(Yf)
    return Yf

@Y
def get_data(f):
    def inner_get_data(*args):
        # This is your real get data function
        # define it as normal
        # but just refer to it as 'f' inside itself
        print('setting get_data.foo to', args)
        f.foo = args
    return inner_get_data

get_data(1, 2, 3)
print(get_data.foo)
So you call get_data as normal, and it "magically" knows that f means itself.
You could do this, but (a) the data is not per-function-invocation, but per function (b) it's much easier to achieve this sort of thing with a class.
If you had to do it, you might do something like this:
def ybother(a, b, c, yrselflambda=lambda: ybother):
    yrself = yrselflambda()
    # other stuff
The lambda is necessary, because you need to delay evaluation of the term ybother until something has been bound to it.
Alternatively, and increasingly pointlessly:
from functools import partial

def ybother(a, b, c, yrself=None):
    # whatever
    yrself.data = []  # this will blow up if the default argument is used
    # more stuff

bothered = partial(ybother, yrself=ybother)
Or:
def unbothered(a, b, c):
    def inbothered(yrself):
        # whatever
        yrself.data = []
    return inbothered, inbothered(inbothered)
This last version gives you a different function object each time, which you might like.
There are almost certainly introspective tricks to do this, but they are even less worthwhile.
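For comparison, point (b) above, that a class makes this easier, can be sketched with a callable object: the instance is called like a function, and self is exactly the "current object" the question asks for. DataGetter, PAGES, and the stub get_more_starting_from are hypothetical names for this illustration, standing in for the real API call.

```python
class DataGetter:
    """Callable object: behaves like a function but has a real `self`."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.data = []

    def __call__(self, offset=0):
        while True:
            chunk = self.fetch(offset)
            if not chunk:
                return self.data
            self.data.extend(chunk)
            offset += 1


# Hypothetical stand-in for the real API: three "pages", the last one empty.
PAGES = [[1, 2], [3], []]


def get_more_starting_from(offset):
    return PAGES[offset] if offset < len(PAGES) else []


get_data = DataGetter(get_more_starting_from)
result = get_data()          # called like a plain function
print(result, get_data.data)
```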
Not sure what doing it like this gains you, but what about using a decorator?
import functools

def add_self(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if not getattr(f, 'content', None):
            f.content = []
        return f(f, *args, **kwargs)
    return wrapper

@add_self
def example(self, arg1):
    self.content.append(arg1)
    print(self.content)

example(1)
example(2)
example(3)
OUTPUT
[1]
[1, 2]
[1, 2, 3]
In essence, I want to put a variable on the stack that will be reachable by all calls below that point on the stack until the block exits. In Java I would solve this with a static thread-local plus support methods, which could then be accessed from other methods.
Typical example: you get a request, and open a database connection. Until the request is complete, you want all code to use this database connection. After finishing and closing the request, you close the database connection.
What I need this for is a report generator. Each report consists of multiple parts, each part can rely on different calculations, and sometimes different parts rely in part on the same calculation. As I don't want to repeat heavy calculations, I need to cache them. My idea is to decorate methods with a cache decorator. The cache creates an id based on the method's name, module, and arguments, checks whether the result has already been calculated and stored in a stack variable, and executes the method if not.
I will try to clarify by showing my current implementation. What I want to do is simplify the code for those implementing calculations.
First, I have the central cache access object, which I call MathContext:
class MathContext(object):
    def __init__(self, fn):
        self.fn = fn
        self.cache = dict()

    def get(self, calc_config):
        id = create_id(calc_config)
        if id not in self.cache:
            self.cache[id] = calc_config.exec(self)
        return self.cache[id]
The fn argument is the filename the context is created in relation to, from where data can be read to be calculated.
Then we have the Calculation class:
class CalcBase(object):
    def exec(self, math_context):
        raise NotImplementedError
And here is a stupid Fibonacci example. None of the real methods are actually recursive (they work on large sets of data instead), but it serves to demonstrate how one calculation can depend on others:
class Fibonacci(CalcBase):
    def __init__(self, n):
        self.n = n

    def exec(self, math_context):
        if self.n < 2:
            return 1
        a = math_context.get(Fibonacci(self.n - 1))
        b = math_context.get(Fibonacci(self.n - 2))
        return a + b
What I want Fibonacci to be instead, is just a decorated method:
@cache
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)
With the math_context example, when math_context goes out of scope, so do all its cached values. I want the same thing for the decorator, i.e. at point X, everything cached by @cache is dereferenced so that it can be garbage collected.
I went ahead and made something that might just do what you want. It can be used as both a decorator and a context manager:
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle


class cached(object):
    """Decorator/context manager for caching function call results.

    All results are cached in one dictionary that is shared by all cached
    functions.

    To use this as a decorator:

        @cached
        def function(...):
            ...

    The results returned by a decorated function are not cleared from the
    cache until decorated_function.clear_my_cache() or cached.clear_cache()
    is called

    To use this as a context manager:

        with cached(function) as function:
            ...
            function(...)
            ...

    The function's return values will be cleared from the cache when the
    with block ends

    To clear all cached results, call the cached.clear_cache() class method
    """

    _CACHE = {}

    def __init__(self, fn):
        self._fn = fn

    def __call__(self, *args, **kwds):
        key = self._cache_key(*args, **kwds)
        function_cache = self._CACHE.setdefault(self._fn, {})
        try:
            return function_cache[key]
        except KeyError:
            function_cache[key] = result = self._fn(*args, **kwds)
            return result

    def clear_my_cache(self):
        """Clear the cache for a decorated function
        """
        try:
            del self._CACHE[self._fn]
        except KeyError:
            pass  # no cached results

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.clear_my_cache()

    def _cache_key(self, *args, **kwds):
        """Create a cache key for the given positional and keyword
        arguments. pickle.dumps() is used because there could be
        unhashable objects in the arguments, but passing them to
        pickle.dumps() will result in a string, which is always hashable.

        I used this to make the cached class as generic as possible. Depending
        on your requirements, other key generating techniques may be more
        efficient
        """
        return pickle.dumps((args, sorted(kwds.items())), pickle.HIGHEST_PROTOCOL)

    @classmethod
    def clear_cache(cls):
        """Clear everything from all functions from the cache
        """
        cls._CACHE = {}


if __name__ == '__main__':
    # used as decorator
    @cached
    def fibonacci(n):
        print("calculating fibonacci(%d)" % n)
        if n == 0:
            return 0
        if n == 1:
            return 1
        return fibonacci(n - 1) + fibonacci(n - 2)

    for n in range(10):
        print('fibonacci(%d) = %d' % (n, fibonacci(n)))

    def lucas(n):
        print("calculating lucas(%d)" % n)
        if n == 0:
            return 2
        if n == 1:
            return 1
        return lucas(n - 1) + lucas(n - 2)

    # used as context manager
    with cached(lucas) as lucas:
        for i in range(10):
            print('lucas(%d) = %d' % (i, lucas(i)))

    for n in range(9, -1, -1):
        print('fibonacci(%d) = %d' % (n, fibonacci(n)))

    cached.clear_cache()

    for n in range(9, -1, -1):
        print('fibonacci(%d) = %d' % (n, fibonacci(n)))
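If the arguments are always hashable, the standard library's functools.lru_cache may cover the decorator half of this with less code, since each decorated function gets cache_clear() and cache_info() methods. This is a sketch of that alternative, not a drop-in replacement: unlike the pickle-based key above it rejects unhashable arguments, and it does not act as a context manager.

```python
import functools


@functools.lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


value = fibonacci(30)
# One cache miss per distinct argument, i.e. n = 0..30.
misses_before = fibonacci.cache_info().misses

# "Point X": drop everything cached for this function.
fibonacci.cache_clear()
size_after = fibonacci.cache_info().currsize

print(value, misses_before, size_after)
```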
This question seems to be two questions:
a) sharing a DB connection
b) caching/memoizing
You have answered b) yourself.
As for a), I don't understand why you need to put it on the stack. You can do one of these:
- use a class, with the connection as an attribute of it
- decorate all your functions so that they get a connection from a central location
- have each function explicitly use a global connection method
- create a connection and pass it around, or create a context object and pass that around; the connection can be part of the context
- etc.
You could use a global variable wrapped in a getter function:
connection = None  # module-level cache; stays None until first use

def getConnection():
    global connection
    if connection is None:
        connection = createConnection()
    return connection
"you get a request, and open a database connection.... you close the database connection."
This is what objects are for. Create the connection object, pass it to other objects, and then close it when you're done. Globals are not appropriate. Simply pass the value around as a parameter to the other objects that are doing the work.
"Each report consist of multiple parts, each part can rely on different calculations, sometimes different parts relies in part on the same calculation.... I need to cache them"
This is what objects are for. Create a dictionary with useful calculation results and pass that around from report part to report part.
You don't need to mess with "stack variables", "static thread local" or anything like that.
Just pass ordinary variable arguments to ordinary method functions. You'll be a lot happier.
class MemoizedCalculation(object):
    pass

class Fibonacci(MemoizedCalculation):
    def __init__(self):
        self.cache = {0: 1, 1: 1}

    def __call__(self, arg):
        if arg not in self.cache:
            self.cache[arg] = self(arg - 1) + self(arg - 2)
        return self.cache[arg]

class MathContext(object):
    def __init__(self):
        self.fibonacci = Fibonacci()
You can use it like this
>>> mc= MathContext()
>>> mc.fibonacci( 4 )
5
You can define any number of calculations and fold them all into a single container object.
If you want, you can make MathContext a formal context manager so that it works with the with statement. Add these two methods to MathContext.
    def __enter__(self):
        print("Initialize")
        return self

    def __exit__(self, type_, value, traceback):
        print("Release")
Then you can do this.
with MathContext() as mc:
    print(mc.fibonacci(4))
At the end of the with statement, you are guaranteed that the __exit__ method has been called.
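That guarantee also holds when the body raises. The sketch below shows it with contextlib.contextmanager, which is a different way of writing a context manager from the __enter__/__exit__ pair above; math_context, log, and the dict cache are hypothetical names used only for this illustration.

```python
import contextlib

log = []  # hypothetical record of setup and cleanup


@contextlib.contextmanager
def math_context():
    cache = {}
    log.append("Initialize")
    try:
        yield cache
    finally:
        # Runs whether the with block finishes normally or raises.
        cache.clear()
        log.append("Release")


try:
    with math_context() as cache:
        cache["fib4"] = 5
        raise ValueError("boom")
except ValueError:
    pass

print(log, cache)
```

The cached values are dropped at a well-defined point, the end of the with block, which is the scoping behaviour the question was after.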