Consider

def f(x, *args):
    intermediate = computationally_expensive_fct(x)
    return do_stuff(intermediate, *args)
The problem: For the same x, this function might be called thousands of times with different arguments (other than x), and each call would recompute intermediate (a Cholesky factorisation, cost O(n^3)). In principle, it is enough to compute intermediate only once per x; that result could then be reused by f again and again with different args.
My idea

To remedy this, I tried to create a global dictionary in which the function looks up whether the expensive work for its parameter x has already been done and stored, or whether it has to compute it:
if all_intermediates not in globals():
    global all_intermediates = {}
if all_intermediates.has_key(x):
    pass
else:
    global all_intermediates[x] = computationally_expensive_fct(x)
It turns out I can't do this because globals() is a dict itself and you can't hash dicts in Python. I'm a novice programmer and would be happy if someone could point me towards a Pythonic way to do what I want to achieve.
Solution
A bit more lightweight than writing a decorator and without accessing globals:
def f(x, *args):
    if not hasattr(f, 'all_intermediates'):
        f.all_intermediates = {}
    if x not in f.all_intermediates:
        f.all_intermediates[x] = computationally_expensive_fct(x)
    intermediate = f.all_intermediates[x]
    return do_stuff(intermediate, *args)
Variation
A variation that avoids the if not hasattr check but requires setting all_intermediates as an attribute of f after it is defined:
def f(x, *args):
    if x not in f.all_intermediates:
        f.all_intermediates[x] = computationally_expensive_fct(x)
    intermediate = f.all_intermediates[x]
    return do_stuff(intermediate, *args)

f.all_intermediates = {}
This caches all_intermediates as an attribute of the function itself.
Explanation
Functions are objects and can have attributes. Therefore, you can store the dictionary all_intermediates as an attribute of the function f. This makes the function self-contained, meaning you can move it to another module without worrying about module globals. Using the variation shown above, you need to move f.all_intermediates = {} along with the function.
Putting things into globals() doesn't feel right. I recommend against doing this.
I don't see why you are trying to use globals(). Instead of using globals() you can simply keep computed values in your own module-level dictionary and have a wrapper function that looks up whether intermediate has already been calculated. Something like this:
computed_intermediate = {}

def get_intermediate(x):
    if x not in computed_intermediate:
        computed_intermediate[x] = computationally_expensive_fct(x)
    return computed_intermediate[x]

def f(x, *args):
    intermediate = get_intermediate(x)
    return do_stuff(intermediate, *args)
In this way computationally_expensive_fct(x) will be calculated only once for each x, namely the first time it is accessed.
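For what it's worth, the standard library already packages this pattern: functools.lru_cache memoizes a function by its (hashable) arguments, so the hand-rolled dictionary can be dropped entirely:

import functools

@functools.lru_cache(maxsize=None)   # cache every distinct x; no eviction
def get_intermediate(x):
    return computationally_expensive_fct(x)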
This is often implemented with the @memoized decorator on the expensive function.
It is described at https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize and brief enough to reproduce here, lightly updated for modern Python, in case of link rot:
import functools

class memoized(object):
    '''Decorator. Caches a function's return value each time it is called.
    If called later with the same arguments, the cached value is returned
    (not reevaluated).
    '''
    def __init__(self, func):
        self.func = func
        self.cache = {}

    def __call__(self, *args):
        try:
            hash(args)
        except TypeError:
            # uncacheable arguments (a list, for instance);
            # better to not cache than blow up.
            return self.func(*args)
        if args in self.cache:
            return self.cache[args]
        else:
            value = self.func(*args)
            self.cache[args] = value
            return value

    def __repr__(self):
        '''Return the function's docstring.'''
        return self.func.__doc__

    def __get__(self, obj, objtype):
        '''Support instance methods.'''
        return functools.partial(self.__call__, obj)
Once the expensive function is memoized, using it is transparent:
@memoized
def expensive_function(n):
    # expensive stuff
    return something

p = expensive_function(n)
q = expensive_function(n)
assert p is q
Do note that if the arguments to expensive_function are not hashable (lists are a common example) there will be no performance gain: it will still work, but act as if it isn't memoized.
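For instance, a quick sketch of both behaviours, assuming the decorator above is in scope (fib and total are made-up examples):

@memoized
def fib(n):
    '''Return the n-th Fibonacci number.'''
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(30)           # each subproblem is computed once, then cached
fib(30)           # answered straight from the cache

@memoized
def total(xs):
    '''Sum a sequence.'''
    return sum(xs)

total([1, 2, 3])  # works, but a list argument is unhashable, so nothing is cached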
Related
I am trying to modify an already defined class by changing an attribute's value. Importantly, I want this change to propagate internally.
For example, consider this class:
class Base:
    x = 1
    y = 2 * x
    # Other attributes and methods might follow

assert Base.x == 1
assert Base.y == 2
I would like to change x to 2, making it equivalent to this:
class Base:
    x = 2
    y = 2 * x

assert Base.x == 2
assert Base.y == 4
But I would like to make it in the following way:
Base = injector(Base, x=2)
Is there a way to achieve this WITHOUT recompiling the original class source code?
The effect you want to achieve belongs to the realm of "reactive programming", a programming paradigm (from which the now ubiquitous JavaScript library React took its name).
While Python has a lot of mechanisms to allow that, you need to write your code to actually make use of them.
By default, plain Python code like the one in your example uses the imperative paradigm, which is eager: whenever an expression is encountered, it is executed, and its result is used (in this case, the result is stored in the class attribute).
Python's flexibility also means that once you write a codebase that allows such reactive code, users of that codebase don't have to be aware of it, and things work more or less "magically".
But, as stated above, that is not free. For the case of being able to redefine y when x changes in
class Base:
    x = 1
    y = 2 * x
There are a couple of paths that can be followed. The most important point is that, at the time the "*" operator is executed (which happens when Python is parsing the class body), at least one side of the operation must no longer be a plain number, but a special object implementing a custom __mul__ (or, in this case, __rmul__) method. Then, instead of storing a resulting number in y, the expression itself is stored somewhere, and when y is retrieved as a class attribute, other mechanisms force the expression to resolve.
If you want this at instance level, rather than at class level, it would be easier to implement. But keep in mind that you'd have to define each operator on your special "source" class for primitive values.
Also, both this and the easier instance-descriptor approach using property are "lazily evaluated": the value for y is calculated when it is to be used (it can be cached if it will be used more than once). If you want to evaluate it whenever x is assigned (and not when y is consumed), that will require other mechanisms, although caching the lazy approach can mitigate the need for eager evaluation to the point where it should not be needed.
1 - Before digging there
Python's easiest way to do code like this is simply to write the expressions to be calculated as functions, and use the property built-in as a descriptor to retrieve their values. The drawback is small: you just have to wrap your expressions in a function (and then wrap that function in something that adds the descriptor protocol to it, such as property). The gain is huge: you are free to use any Python code inside your expression, including function calls, object instantiation, I/O, and the like. (Note that the other approach requires wiring up each desired operator just to get started.)
The plain "101" approach to have what you want working for instances of Base is:
class Base:
    x = 1

    @property
    def y(self):
        return self.x * 2

b = Base()
b.y        # -> 2
Base.x = 3
b.y        # -> 6
The work of property can be rewritten so that retrieving y from the class, instead of an instance, achieves the effect as well (this is still easier than the other approach).
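A minimal sketch of that class-level rewrite, assuming a property on a metaclass is acceptable (the name MetaBase and the exact wiring are mine, not from the question):

class MetaBase(type):
    @property
    def y(cls):
        # evaluated on every class-level access, so it tracks cls.x
        return cls.x * 2

class Base(metaclass=MetaBase):
    x = 1

Base.y        # -> 2
Base.x = 3
Base.y        # -> 6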
If this will work for you somehow, I'd recommend doing it. If you need to cache y's value until x actually changes, that can be done with normal coding.
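For instance, a sketch of that caching idea at instance level, assuming a setter on x is acceptable:

class Base:
    def __init__(self):
        self._x = 1
        self._y_cache = None

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value
        self._y_cache = None          # invalidate the cache when x changes

    @property
    def y(self):
        if self._y_cache is None:
            self._y_cache = self._x * 2   # recomputed only after x changed
        return self._y_cache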
2 - Exactly what you asked for, with a metaclass
As stated above, Python would need to know about the special status of your y attribute when calculating its expression 2 * x. At assignment time, it would already be too late.
Fortunately, Python 3 allows class bodies to run in a custom namespace by implementing the __prepare__ method in a metaclass. That namespace can record everything that takes place and replace primitive attributes of interest with specially crafted objects implementing __mul__ and other special methods.
Going this way could even allow values to be eagerly calculated, so they can work as plain Python objects, while registering enough information that a special injector function can recreate the class, redoing all the attributes that depend on expressions. It could also implement lazy evaluation, somewhat as described above.
from collections import UserDict
import operator

class Reactive:
    def __init__(self, value):
        self._initial_value = value
        self.values = {}

    def __set_name__(self, owner, name):
        self.name = name
        self.values[owner] = self._initial_value

    def __get__(self, instance, owner):
        return self.values[owner]

    def __set__(self, instance, value):
        raise AttributeError("value can't be set directly - call 'injector' to change this value")

    def value(self, cls=None):
        return self.values.get(cls, self._initial_value)

    op1 = value

    @property
    def result(self):
        # yields the bound 'value' method, so that "op.result(cls)" works
        # uniformly for plain Reactive values and ReactiveExpr instances
        return self.value

    # dynamically populate magic methods for operator overloading:
    for name in "mul add sub truediv pow contains".split():
        op = getattr(operator, name)
        locals()[f"__{name}__"] = (lambda operator: (lambda self, other: ReactiveExpr(self, other, operator)))(op)
        locals()[f"__r{name}__"] = (lambda operator: (lambda self, other: ReactiveExpr(other, self, operator)))(op)

class ReactiveExpr(Reactive):
    def __init__(self, value, op2, operator):
        self.op2 = op2
        self.operator = operator
        super().__init__(value)

    def result(self, cls):
        op1, op2 = self.op1(cls), self.op2
        if isinstance(op1, Reactive):
            op1 = op1.result(cls)
        if isinstance(op2, Reactive):
            op2 = op2.result(cls)
        return self.operator(op1, op2)

    def __get__(self, instance, owner):
        return self.result(owner)

class AuxDict(UserDict):
    def __init__(self, *args, _parent, **kwargs):
        self.parent = _parent
        super().__init__(*args, **kwargs)

    def __setitem__(self, item, value):
        # wrap primitive class attributes in Reactive so expressions on them
        # are recorded instead of eagerly reduced to a plain value
        if isinstance(value, self.parent.reacttypes) and not item.startswith("_"):
            value = Reactive(value)
        super().__setitem__(item, value)

class MetaReact(type):
    reacttypes = (int, float, str, bytes, list, tuple, dict)

    def __prepare__(*args, **kwargs):
        return AuxDict(_parent=__class__)

    def __new__(mcls, name, bases, ns, **kwargs):
        return super().__new__(mcls, name, bases, ns.data, **kwargs)

def injector(cls, inplace=False, **kwargs):
    original = cls
    if not inplace:
        cls = type(cls.__name__, cls.__bases__, dict(cls.__dict__))
    for name, attr in cls.__dict__.items():
        if isinstance(attr, Reactive):
            if isinstance(attr, ReactiveExpr) and name in kwargs:
                raise AttributeError("Expression attributes can't be modified by injector")
            attr.values[cls] = kwargs.get(name, attr.values[original])
    return cls

class Base(metaclass=MetaReact):
    x = 1
    y = 2 * x
And, after pasting the snippet above in a REPL, here is the result of using injector:
In [97]: Base2 = injector(Base, x=5)
In [98]: Base2.y
Out[98]: 10
The tricky aspect here is that the Base class is declared with dependent, dynamically evaluated attributes. While we can inspect a class's static attributes, I think there's no way of getting at the dynamic expression except for parsing the class's source code, finding and replacing the "injected" attribute name with its value, and exec/eval-ing the definition again. But that's the way you wanted to avoid (moreover: you expected injector to be unified for all classes).
If you want to proceed to rely on dynamically evaluated attributes, define the dependent attribute as a lambda function:
class Base:
    x = 1
    y = lambda: 2 * Base.x

Base.x = 2
print(Base.y())  # 4
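A variation on the same idea, assuming the call-style access is acceptable: a classmethod avoids hard-coding the class name and keeps working in subclasses:

class Base:
    x = 1

    @classmethod
    def y(cls):
        return 2 * cls.x

Base.x = 2
print(Base.y())  # 4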
Is there a convention on how to have both a method and a function that do the same thing (or whether to do this at all)?
Consider, for example,
from random import choice
from collections import Counter

class MyDie:
    def __init__(self, smallest, largest, how_many_rolls):
        self.min = smallest
        self.max = largest
        self.number_of_rolls = how_many_rolls

    def __call__(self):
        return choice(range(self.min, self.max + 1))

    def count_values(self):
        return Counter([self() for n in range(self.number_of_rolls)])

def count_values(randoms_func, number_of_values):
    return Counter([randoms_func() for n in range(number_of_values)])
where count_values is both a method and a function.
I think it's nice to have the method because the result "belongs to" the MyDie object. Also, the method can pull attributes from the MyDie object without having to pass them to count_values. On the other hand, it's nice to have the function in order to operate on callables other than MyDie objects, like
count_values(lambda: choice([3, 5]) + choice([7, 9]), 7)
Is it best to do this as above (where the code is repeated; assume the function is a longer piece of code, not just one line) or replace the count_values method with
def count_values(self):
    return count_values(self, self.number_of_rolls)
or just get rid of the method altogether and only have a function? Or maybe something else?
Here is an alternative that still allows you to encapsulate the logic in MyDie: create a static method in MyDie.
@staticmethod
def count_specified_values(randoms_func, number_of_values):
    return Counter([randoms_func() for n in range(number_of_values)])
You could also add additional formal parameters to the constructor, with default values that you could override to achieve the same functionality.
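A sketch of that constructor idea (the randoms_func parameter and its default are my own invention): the die falls back to its own roll unless another source of randomness is supplied:

from random import choice
from collections import Counter

class MyDie:
    def __init__(self, smallest, largest, how_many_rolls, randoms_func=None):
        self.min = smallest
        self.max = largest
        self.number_of_rolls = how_many_rolls
        # default to this die's own roll when no other callable is given
        self.randoms_func = randoms_func or self.__call__

    def __call__(self):
        return choice(range(self.min, self.max + 1))

    def count_values(self):
        return Counter([self.randoms_func() for n in range(self.number_of_rolls)])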
I have a series of functions that I apply to each record in a dataset to generate a new field I store in a dictionary (the records, "documents", are stored using MongoDB). I broke them all up as they are basically unrelated, and tie them back together by passing them as a list to a function that iterates through each operation for each record and adds on the results.
What irks me is that I'm going about it in what seems like a fairly inelegant manner, semi-duplicating names among other things.
def _midline_length(blob):
    '''Generate a midline sequence for *blob*'''
    return 42

midline_length = {
    'func': _midline_length,
    'key': 'calc_seq_midlen'}  #: Midline sequence key/function pair.
Lots of these...
do_calcs = [midline_length, ] # all the functions ...
Then called like:
for record in mongo_collection.find():
    for calc in do_calcs:
        record[calc['key']] = calc['func'](record)  # add new data to record
    # update record in DB
Splitting up the keys like this makes it easier to remove all the calculated fields in the database (pointless after everything is set, but while developing the code and methodology it's handy).
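For instance, assuming pymongo, keeping the keys in one place means every calculated field can be stripped with a single update:

# remove all calculated fields from every document (handy while developing)
mongo_collection.update_many(
    {},
    {'$unset': {calc['key']: "" for calc in do_calcs}},
)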
I had the thought to maybe use classes, but it seems more like an abuse:
class midline_length(object):
    key = 'calc_seq_midlen'

    @staticmethod
    def __call__(blob):
        return 42
I could then make a list of instances (do_calcs = [midline_length(), ...]) and run through that, calling each thing or pulling out its key member. Alternatively, it seems like I can arbitrarily add members to functions: def myfunc(): then myfunc.key = 'mykey'... that seems even worse. Better ideas?
You might want to use decorators for this purpose.
import collections

RecordFunc = collections.namedtuple('RecordFunc', 'key func')

def record(key):
    def wrapped(func):
        return RecordFunc(key, func)
    return wrapped

@record('midline_length')
def midline_length(blob):
    return 42
Now, midline_length is not actually a function, but it is a RecordFunc object.
>>> midline_length
RecordFunc(key='midline_length', func=<function midline_length at 0x24b92f8>)
It has a func attribute, which is the original function, and a key attribute.
If they get added to the same dictionary, you can do it in the decorator:
RECORD_PARSERS = {}

def record(key):
    def wrapped(func):
        RECORD_PARSERS[key] = func
        return func
    return wrapped

@record('midline_length')
def midline_length(blob):
    return 42
This is a perfect job for a decorator. Something like:
_CALC_FUNCTIONS = {}

def calcfunc(orig_func):
    # derive the db key from the function name
    key = 'calc_%s' % orig_func.__name__
    _CALC_FUNCTIONS[key] = orig_func
    return orig_func

@calcfunc
def _midline_length(blob):
    return 42

print(_CALC_FUNCTIONS)
# prints {'calc__midline_length': <function _midline_length at 0x035F7BF0>}

# then your document update is as follows
for record in mongo_collection.find():
    for key, func in _CALC_FUNCTIONS.items():
        record[key] = func(record)
    # update in db
Note that you could also store the attributes on the function object itself, like Dietrich pointed out, but you'll probably still need to keep a global structure to hold the list of functions.
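A minimal sketch of that function-attribute variant, still keeping a global list so the functions can be found later:

_CALC_FUNCS = []

def calcfunc(orig_func):
    # attach the db key to the function object itself
    orig_func.key = 'calc_%s' % orig_func.__name__
    _CALC_FUNCS.append(orig_func)
    return orig_func

@calcfunc
def _midline_length(blob):
    return 42

# then: record[func.key] = func(record) for each func in _CALC_FUNCS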
The function recursive_repr in the reprlib module, introduced in Python 3.2, has the following source code:
from _thread import get_ident

def recursive_repr(fillvalue='...'):
    'Decorator to make a repr function return fillvalue for a recursive call'

    def decorating_function(user_function):
        repr_running = set()

        def wrapper(self):
            key = id(self), get_ident()
            if key in repr_running:
                return fillvalue
            repr_running.add(key)
            try:
                result = user_function(self)
            finally:
                repr_running.discard(key)
            return result

        # Can't use functools.wraps() here because of bootstrap issues
        wrapper.__module__ = getattr(user_function, '__module__')
        wrapper.__doc__ = getattr(user_function, '__doc__')
        wrapper.__name__ = getattr(user_function, '__name__')
        wrapper.__annotations__ = getattr(user_function, '__annotations__', {})
        return wrapper

    return decorating_function
The key identifying the specific __repr__ function is set to be (id(self), get_ident()).
Why self itself wasn't used as a key?
And why get_ident was needed?
Consider this code:
a = []
b = []
a.append(b)
b.append(a)
a == b
This raises a RecursionError: the comparison recurses forever. The algorithm needs to handle such self-referential objects safely. If you put self into the set, membership tests would compare entries using == and could fail the same way. So id(self) is used instead, which never attempts to check objects for equality; we only care whether it is the exact same object.
As for get_ident, consider what happens if two threads run this code at the same time. The repr_running set is shared between all the threads, and if multiple threads start adding and removing elements from that set, there is no telling what will happen. get_ident() is unique to the thread running it, so by including it in the key we know all threads will use different keys and be OK.
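To see the decorator doing its job, here is a small self-referential example (the Node class is just an illustration):

from reprlib import recursive_repr

class Node:
    def __init__(self):
        self.children = []

    @recursive_repr()
    def __repr__(self):
        return f"Node({self.children!r})"

n = Node()
n.children.append(n)   # the node now contains itself
print(repr(n))         # Node([...]) instead of infinite recursion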
I have created a class MyClass that contains a lot of simulation data. The class groups simulation results for different simulations that have a similar structure. The results can be retrieved with a MyClass.get(foo) method. It returns a dictionary with simulationID/array pairs, the array being the value of foo for each simulation.
Now I want to implement a method in my class to apply any function to all the arrays for foo. It should return a dictionary with simulationID/function(foo) pairs.
For a function that does not need additional arguments, I found the following solution very satisfying (comments always welcome :-) ):
def apply(self, function, variable):
    result = {}
    for k, v in self.get(variable).items():
        result[k] = function(v)
    return result
However, for a function requiring additional arguments I don't see how to do it in an elegant way. A typical operation would be the integration of foo with bar as x-values, like np.trapz(foo, x=bar), where both foo and bar can be retrieved with MyClass.get(...).
I was thinking in this direction:
def apply(self, function_call):
    """
    function_call should be a string with the complete expression to evaluate,
    e.g.: MyClass.apply('np.trapz(QHeat, time)')
    """
    result = {}
    for SID in self.simulations:
        result[SID] = eval(function_call, locals=...)
    return result
The problem is that I don't know how to pass the locals mapping object. Or maybe I'm looking in the wrong direction. Thanks in advance for your help.
Roel
You have two ways. The first is to use functools.partial:
import functools

foo = self.get('foo')
bar = self.get('bar')
partial_func = functools.partial(func, foo, x=bar)
self.apply(partial_func, variable)
The second approach uses the same technique as partial: define a function that accepts an arbitrary argument list:
def apply(self, function, variable, *args, **kwds):
    result = {}
    for k, v in self.get(variable).items():
        result[k] = function(v, *args, **kwds)
    return result
Note that in both cases the function signature remains unchanged. I don't know which one I'd choose, maybe the first, but I don't know the context you are working in.
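For instance, with the question's np.trapz integration, the *args/**kwds version might look like this (my_obj and the shared time array are assumptions; note this passes one fixed x to every simulation, so per-simulation x-values need a different approach, such as the one below):

import numpy as np

time = np.linspace(0.0, 10.0, 101)           # hypothetical shared x-values
results = my_obj.apply(np.trapz, 'QHeat', x=time)
# results == {SID: np.trapz(QHeat_array, x=time)} for every simulation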
I tried to recreate (the relevant part of) the class structure the way I am guessing it is set up on your side (it's always handy if you can provide a simplified code example for people to play/test).
What I think you are trying to do is translate variable names to variables that are obtained from within the class, and then use those variables in a function that was passed in as well. In addition, since each variable is actually a dictionary of values keyed by SID, you want the result to be a dictionary with the function applied to the corresponding values.
class Test:
    def get(self, name):
        if name == "valA":
            return {"1": "valA1", "2": "valA2", "3": "valA3"}
        elif name == "valB":
            return {"1": "valB1", "2": "valB2", "3": "valB3"}

    def apply(self, function, **kwargs):
        # map each of the function's keyword arguments to the class data it names
        arg_dict = {fun_arg: self.get(sim_args) for fun_arg, sim_args in kwargs.items()}
        result = {}
        for SID in arg_dict[next(iter(kwargs))]:
            fun_kwargs = {fun_arg: sim_dict[SID] for fun_arg, sim_dict in arg_dict.items()}
            result[SID] = function(**fun_kwargs)
        return result

def joinstrings(string_a, string_b):
    return string_a + string_b

my_test = Test()
result = my_test.apply(joinstrings, string_a="valA", string_b="valB")
print(result)
So the apply method gets an argument dictionary, fetches the class-specific data for each of the arguments, and creates a new argument dictionary with those (arg_dict).
The SID keys are obtained from this arg_dict and for each of those, a function result is calculated and added to the result dictionary.
The result is:
{'1': 'valA1valB1', '2': 'valA2valB2', '3': 'valA3valB3'}
The code can be altered in many ways, but I thought this would be the most readable. It is of course possible to join the dictionaries instead of using the SIDs from the first element, etc.