How can I cleanly associate a constant with a function? - python

I have a series of functions that I apply to each record in a dataset to generate a new field I store in a dictionary (the records—"documents"—are stored using MongoDB). I broke them all up as they are basically unrelated, and tie them back together by passing them as a list to a function that iterates through each operation for each record and adds on the results.
What irks me is how I'm going about it in what seems like a fairly inelegant manner; semi-duplicating names among other things.
def _midline_length(blob):
'''Generate a midline sequence for *blob*'''
return 42
midline_length = {
'func': _midline_length,
'key': 'calc_seq_midlen'} #: Midline sequence key/function pair.
Lots of these...
do_calcs = [midline_length, ] # all the functions ...
Then called like:
for record in mongo_collection.find():
for calc in do_calcs:
record[calc['key']] = calc['func'](record) # add new data to record
# update record in DB
Splitting up the keys like this makes it easier to remove all the calculated fields in the database (pointless after everything is set, but while developing the code and methodology it's handy).
I had the thought to maybe use classes, but it seems more like an abuse:
class midline_length(object):
key = 'calc_seq_midlen'
#staticmethod
def __call__(blob):
return 42
I could then make a list of instances (do_calcs = [midline_length(), ...]) and run through that calling each thing or pulling out it's key member. Alternatively, it seems like I can arbitrarily add members to functions, def myfunc(): then myfunc.key = 'mykey'...that seems even worse. Better ideas?

You might want to use decorators for this purpose.
import collections
RecordFunc = collections.namedtuple('RecordFunc', 'key func')
def record(key):
def wrapped(func):
return RecordFunc(key, func)
return wrapped
#record('midline_length')
def midline_length(blob):
return 42
Now, midline_length is not actually a function, but it is a RecordFunc object.
>>> midline_length
RecordFunc(key='midline_length', func=<function midline_length at 0x24b92f8>)
It has a func attribute, which is the original function, and a key attribute.
If they get added to the same dictionary, you can do it in the decorator:
RECORD_PARSERS = {}
def record(key):
def wrapped(func):
RECORD_PARSERS[key] = func
return func
return wrapped
#record('midline_length')
def midline_length(blob):
return 42

This is a perfect job for a decorator. Something like:
_CALC_FUNCTIONS = {}
def calcfunc(orig_func):
global _CALC_FUNCTIONS
# format the db name from the function name.
key = 'calc_%s' % orig_func.__name__
# note we're using a set so these will
_CALC_FUNCTIONS[key] = orig_func
return orig_func
#calcfunc
def _midline_length(blob):
return 42
print _CALC_FUNCTIONS
# prints {'calc__midline_length': <function _midline_length at 0x035F7BF0>}
# then your document update is as follows
for record in mongo_collection.find():
for key, func in _CALC_FUNCTIONS.iteritems():
record[key] = func(record)
# update in db
Note that you could also store the attributes on the function object itself like Dietrich pointed out but you'll probably still need to keep a global structure to keep the list of functions.

Related

Create Nothing from falsey values using Returns library

Using the Returns library, I have a function that filters a list. I want it to return Nothing if the list is empty (i.e. falsey) or Some([...]) if the list has values.
Maybe seems to be mostly focused on "true" nothing, being None. But I'm wondering if there's a way to get Nothing from a falsey value without doing something like
data = []
result = Some(data) if len(data) > 0 else Nothing
It looks like you have at least a few options. (1) You can create a new class that inherits from Maybe, and then override any methods you like, (2) create a simple function that returns Nothing is data is false, else returns Maybe.from_optional(data) {or whatever other method of Maybe you prefer), or (3) create your own container as per the returns documentation at https://returns.readthedocs.io/en/latest/pages/create-your-own-container.html.
Here is a class called Possibly, that inherits from Maybe and overrides the from_optional class method. You can add similar overrides for other methods following this pattern.
from typing import Optional
from returns.maybe import Maybe, _NewValueType, _Nothing, Some
class Possibly(Maybe):
def __init__(self):
super().__init__()
#classmethod
def from_optional(
cls, inner_value: Optional[_NewValueType],
) -> 'Maybe[_NewValueType]':
"""
Creates new instance of ``Maybe`` container based on an optional value.
"""
if not inner_value or inner_value is None:
return _Nothing(inner_value)
return Some(inner_value)
data = [1,2,3]
empty_data = []
print(Possibly.from_optional(data))
print(Possibly.from_optional(empty_data))
Here are two equivalent functions:
from returns.maybe import Maybe, _Nothing
data = [1,2,3]
empty_data = []
def my_from_optional(anything):
if not anything:
return _Nothing(anything)
else:
return Maybe.from_optional(anything)
def my_from_optional(anything):
return Maybe.from_optional(anything) if anything else _Nothing(anything)
print(my_from_optional(data))
print(my_from_optional(empty_data))

Alternative for non existing global dictionary

Consider
def f(x,*args):
intermediate = computationally_expensive_fct(x)
return do_stuff(intermediate,*args)
The problem: For the same x, this function might be called thousands of times with different arguments (other than x) and each time the function gets called intermediate would be computed (Cholesky factorisation, cost O(n^3)). In principle however it is enough if for each x, intermediate is computed only once for each x and then that result would be used again and again by f with different args.
My idea To remedy this, I tried to create a global dictionary where the function looks up whether for its parameter x the expensive stuff has already been done and stored in the dictionary or whether it has to compute it:
if all_intermediates not in globals():
global all_intermediates = {}
if all_intermediates.has_key(x):
pass
else:
global all_intermediates[x] = computationally_expensive_fct(x)
It turns out I can't do this because globals() is a dict itself and you can't hash dicts in python. I'm a novice programmer and would be happy if someone could point me towards a pythonic way to do what I want to achieve.
Solution
A bit more lightweight than writing a decorator and without accessing globals:
def f(x, *args):
if not hasattr(f, 'all_intermediates'):
f.all_intermediates = {}
if x not in f.all_intermediates:
f.all_intermediates[x] = computationally_expensive_fct(x)
intermediate = f.all_intermediates[x]
return do_stuff(intermediate,*args)
Variation
A variation that avoids the if not hasattr but needs to set all_intermediates as attribute of f after it is defined:
def f(x, *args):
if x not in f.all_intermediates:
f.all_intermediates[x] = computationally_expensive_fct(x)
intermediate = f.all_intermediates[x]
return do_stuff(intermediate,*args)
f.all_intermediates = {}
This caches all_intermediates as an attribute of the function itself.
Explanation
Functions are objects and can have attributes. Therefore, you can store the dictionary all_intermediates as an attribute of the function f. This makes the function self contained, meaning you can move it to another module without worrying about module globals. Using the variation shown above, you need move f.all_intermediates = {} along with function.
Putting things into globals() feels not right. I recommend against doing this.
I don't get it why you are trying to use globals(). Instead of using globals() you can simply keep computed values in your own module level dictionary and have a wrapper function that will look up whether intermediate is already calculated or not. Something like this:
computed_intermediate = {}
def get_intermediate(x):
if x not in computed_intermediate:
computed_intermediate[x] = computationally_expensive_fct(x)
return computed_intermediate[x]
def f(x,*args):
intermediate = get_intermediate(x)
return do_stuff(intermediate,*args)
In this way computationally_expensive_fct(x) will be calculated only once for each x, namely the first time it is accessed.
This is often implemented with the #memoized decorator on the expensive function.
It is described at https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize and brief enough to duplicate here in case of link rot:
import collections
import functools
class memoized(object):
'''Decorator. Caches a function's return value each time it is called.
If called later with the same arguments, the cached value is returned
(not reevaluated).
'''
def __init__(self, func):
self.func = func
self.cache = {}
def __call__(self, *args):
if not isinstance(args, collections.Hashable):
# uncacheable. a list, for instance.
# better to not cache than blow up.
return self.func(*args)
if args in self.cache:
return self.cache[args]
else:
value = self.func(*args)
self.cache[args] = value
return value
def __repr__(self):
'''Return the function's docstring.'''
return self.func.__doc__
def __get__(self, obj, objtype):
'''Support instance methods.'''
return functools.partial(self.__call__, obj)
Once the expensive function is memoized, using it is invisible:
#memoized
def expensive_function(n):
# expensive stuff
return something
p = expensive_function(n)
q = expensive_function(n)
assert p is q
Do note that if the result of expensive_function is not hashable (lists are a common example) there will not be a performance gain, it will still work, but act as if it isn't memoized.

Can I change function parameters when passing them as variables?

Excuse my poor wording in the title, but here's a longer explanation:
I have a function which as arguments takes some functions which are used to determine which data to retrieve from a database, as such:
def customer_data(customer_name, *args):
# initialize dictionary with ids
codata = dict([(data.__name__, []) for data in args])
codata['customer_observer_id'] = _customer_observer_ids(customer_name)
# add values to dictionary using function name as key
for data in args:
for coid in codata['customer_observer_id']:
codata[data.__name__].append(data(coid))
return codata
Which makes the call to the function looking something like this:
customer_data('customername', target_parts, source_group, ...)
One of these functions is defined with an extra parameter:
def polarization_value(customer_observer_id, timespan='day')
What I would like is a way to change the timespan variable in a clever way. One obvious way is to include a keyword argument in customer_observer and add an exception when the function name being called is 'polarization_value', but I have a feeling there is a better way to do this.
You can use functools.partial and pass polarization_value as :
functools.partial(polarization_value, timespan='day')
Example:
>>> import functools
def func(x, y=1):
print x, y
...
>>> new_func = functools.partial(func, y=20)
>>> new_func(100)
100 20
You may also find this helpful: Python: Why is functools.partial necessary?

how to make my own mapping type in python

I have created a class MyClassthat contains a lot of simulation data. The class groups simulation results for different simulations that have a similar structure. The results can be retreived with a MyClass.get(foo) method. It returns a dictionary with simulationID/array pairs, array being the value of foo for each simulation.
Now I want to implement a method in my class to apply any function to all the arrays for foo. It should return a dictionary with simulationID/function(foo) pairs.
For a function that does not need additional arguments, I found the following solution very satisfying (comments always welcome :-) ):
def apply(self, function, variable):
result={}
for k,v in self.get(variable).items():
result[k] = function(v)
return result
However, for a function requiring additional arguments I don't see how to do it in an elegant way. A typical operation would be the integration of foo with bar as x-values like np.trapz(foo, x=bar), where both foo and bar can be retreived with MyClass.get(...)
I was thinking in this direction:
def apply(self, function_call):
"""
function_call should be a string with the complete expression to evaluate
eg: MyClass.apply('np.trapz(QHeat, time)')
"""
result={}
for SID in self.simulations:
result[SID] = eval(function_call, locals=...)
return result
The problem is that I don't know how to pass the locals mapping object. Or maybe I'm looking in a wrong direction. Thanks on beforehand for your help.
Roel
You have two ways. The first is to use functools.partial:
foo = self.get('foo')
bar = self.get('bar')
callable = functools.partial(func, foo, x=bar)
self.apply(callable, variable)
while the second approach is to use the same technique used by partial, you can define a function that accept arbitrary argument list:
def apply(self, function, variable, *args, **kwds):
result={}
for k,v in self.get(variable).items():
result[k] = function(v, *args, **kwds)
return result
Note that in both case the function signature remains unchanged. I don't know which one I'll choose, maybe the first case but I don't know the context on you are working on.
I tried to recreate (the relevant part of) the class structure the way I am guessing it is set up on your side (it's always handy if you can provide a simplified code example for people to play/test).
What I think you are trying to do is translate variable names to variables that are obtained from within the class and then use those variables in a function that was passed in as well. In addition to that since each variable is actually a dictionary of values with a key (SID), you want the result to be a dictionary of results with the function applied to each of the arguments.
class test:
def get(self, name):
if name == "valA":
return {"1":"valA1", "2":"valA2", "3":"valA3"}
elif name == "valB":
return {"1":"valB1", "2":"valB2", "3":"valB3"}
def apply(self, function, **kwargs):
arg_dict = {fun_arg: self.get(sim_args) for fun_arg, sim_args in kwargs.items()}
result = {}
for SID in arg_dict[kwargs.keys()[0]]:
fun_kwargs = {fun_arg: sim_dict[SID] for fun_arg, sim_dict in arg_dict.items()}
result[SID] = function(**fun_kwargs)
return result
def joinstrings(string_a, string_b):
return string_a+string_b
my_test = test()
result = my_test.apply(joinstrings, string_a="valA", string_b="valB")
print result
So the apply method gets an argument dictionary, gets the class specific data for each of the arguments and creates a new argument dictionary with those (arg_dict).
The SID keys are obtained from this arg_dict and for each of those, a function result is calculated and added to the result dictionary.
The result is:
{'1': 'valA1valB1', '3': 'valA3valB3', '2': 'valA2valB2'}
The code can be altered in many ways, but I thought this would be the most readable. It is of course possible to join the dictionaries instead of using the SID's from the first element etc.

What do I do when I need a self referential dictionary?

I'm new to Python, and am sort of surprised I cannot do this.
dictionary = {
'a' : '123',
'b' : dictionary['a'] + '456'
}
I'm wondering what the Pythonic way to correctly do this in my script, because I feel like I'm not the only one that has tried to do this.
EDIT: Enough people were wondering what I'm doing with this, so here are more details for my use cases. Lets say I want to keep dictionary objects to hold file system paths. The paths are relative to other values in the dictionary. For example, this is what one of my dictionaries may look like.
dictionary = {
'user': 'sholsapp',
'home': '/home/' + dictionary['user']
}
It is important that at any point in time I may change dictionary['user'] and have all of the dictionaries values reflect the change. Again, this is an example of what I'm using it for, so I hope that it conveys my goal.
From my own research I think I will need to implement a class to do this.
No fear of creating new classes -
You can take advantage of Python's string formating capabilities
and simply do:
class MyDict(dict):
def __getitem__(self, item):
return dict.__getitem__(self, item) % self
dictionary = MyDict({
'user' : 'gnucom',
'home' : '/home/%(user)s',
'bin' : '%(home)s/bin'
})
print dictionary["home"]
print dictionary["bin"]
Nearest I came up without doing object:
dictionary = {
'user' : 'gnucom',
'home' : lambda:'/home/'+dictionary['user']
}
print dictionary['home']()
dictionary['user']='tony'
print dictionary['home']()
>>> dictionary = {
... 'a':'123'
... }
>>> dictionary['b'] = dictionary['a'] + '456'
>>> dictionary
{'a': '123', 'b': '123456'}
It works fine but when you're trying to use dictionary it hasn't been defined yet (because it has to evaluate that literal dictionary first).
But be careful because this assigns to the key of 'b' the value referenced by the key of 'a' at the time of assignment and is not going to do the lookup every time. If that is what you are looking for, it's possible but with more work.
What you're describing in your edit is how an INI config file works. Python does have a built in library called ConfigParser which should work for what you're describing.
This is an interesting problem. It seems like Greg has a good solution. But that's no fun ;)
jsbueno as a very elegant solution but that only applies to strings (as you requested).
The trick to a 'general' self referential dictionary is to use a surrogate object. It takes a few (understatement) lines of code to pull off, but the usage is about what you want:
S = SurrogateDict(AdditionSurrogateDictEntry)
d = S.resolve({'user': 'gnucom',
'home': '/home/' + S['user'],
'config': [S['home'] + '/.emacs', S['home'] + '/.bashrc']})
The code to make that happen is not nearly so short. It lives in three classes:
import abc
class SurrogateDictEntry(object):
__metaclass__ = abc.ABCMeta
def __init__(self, key):
"""record the key on the real dictionary that this will resolve to a
value for
"""
self.key = key
def resolve(self, d):
""" return the actual value"""
if hasattr(self, 'op'):
# any operation done on self will store it's name in self.op.
# if this is set, resolve it by calling the appropriate method
# now that we can get self.value out of d
self.value = d[self.key]
return getattr(self, self.op + 'resolve__')()
else:
return d[self.key]
#staticmethod
def make_op(opname):
"""A convience class. This will be the form of all op hooks for subclasses
The actual logic for the op is in __op__resolve__ (e.g. __add__resolve__)
"""
def op(self, other):
self.stored_value = other
self.op = opname
return self
op.__name__ = opname
return op
Next, comes the concrete class. simple enough.
class AdditionSurrogateDictEntry(SurrogateDictEntry):
__add__ = SurrogateDictEntry.make_op('__add__')
__radd__ = SurrogateDictEntry.make_op('__radd__')
def __add__resolve__(self):
return self.value + self.stored_value
def __radd__resolve__(self):
return self.stored_value + self.value
Here's the final class
class SurrogateDict(object):
def __init__(self, EntryClass):
self.EntryClass = EntryClass
def __getitem__(self, key):
"""record the key and return"""
return self.EntryClass(key)
#staticmethod
def resolve(d):
"""I eat generators resolve self references"""
stack = [d]
while stack:
cur = stack.pop()
# This just tries to set it to an appropriate iterable
it = xrange(len(cur)) if not hasattr(cur, 'keys') else cur.keys()
for key in it:
# sorry for being a duche. Just register your class with
# SurrogateDictEntry and you can pass whatever.
while isinstance(cur[key], SurrogateDictEntry):
cur[key] = cur[key].resolve(d)
# I'm just going to check for iter but you can add other
# checks here for items that we should loop over.
if hasattr(cur[key], '__iter__'):
stack.append(cur[key])
return d
In response to gnucoms's question about why I named the classes the way that I did.
The word surrogate is generally associated with standing in for something else so it seemed appropriate because that's what the SurrogateDict class does: an instance replaces the 'self' references in a dictionary literal. That being said, (other than just being straight up stupid sometimes) naming is probably one of the hardest things for me about coding. If you (or anyone else) can suggest a better name, I'm all ears.
I'll provide a brief explanation. Throughout S refers to an instance of SurrogateDict and d is the real dictionary.
A reference S[key] triggers S.__getitem__ and SurrogateDictEntry(key) to be placed in the d.
When S[key] = SurrogateDictEntry(key) is constructed, it stores key. This will be the key into d for the value that this entry of SurrogateDictEntry is acting as a surrogate for.
After S[key] is returned, it is either entered into the d, or has some operation(s) performed on it. If an operation is performed on it, it triggers the relative __op__ method which simple stores the value that the operation is performed on and the name of the operation and then returns itself. We can't actually resolve the operation because d hasn't been constructed yet.
After d is constructed, it is passed to S.resolve. This method loops through d finding any instances of SurrogateDictEntry and replacing them with the result of calling the resolve method on the instance.
The SurrogateDictEntry.resolve method receives the now constructed d as an argument and can use the value of key that it stored at construction time to get the value that it is acting as a surrogate for. If an operation was performed on it after creation, the op attribute will have been set with the name of the operation that was performed. If the class has a __op__ method, then it has a __op__resolve__ method with the actual logic that would normally be in the __op__ method. So now we have the logic (self.op__resolve) and all necessary values (self.value, self.stored_value) to finally get the real value of d[key]. So we return that which step 4 places in the dictionary.
finally the SurrogateDict.resolve method returns d with all references resolved.
That'a a rough sketch. If you have any more questions, feel free to ask.
If you, just like me wandering how to make #jsbueno snippet work with {} style substitutions, below is the example code (which is probably not much efficient though):
import string
class MyDict(dict):
def __init__(self, *args, **kw):
super(MyDict,self).__init__(*args, **kw)
self.itemlist = super(MyDict,self).keys()
self.fmt = string.Formatter()
def __getitem__(self, item):
return self.fmt.vformat(dict.__getitem__(self, item), {}, self)
xs = MyDict({
'user' : 'gnucom',
'home' : '/home/{user}',
'bin' : '{home}/bin'
})
>>> xs["home"]
'/home/gnucom'
>>> xs["bin"]
'/home/gnucom/bin'
I tried to make it work with the simple replacement of % self with .format(**self) but it turns out it wouldn't work for nested expressions (like 'bin' in above listing, which references 'home', which has it's own reference to 'user') because of the evaluation order (** expansion is done before actual format call and it's not delayed like in original % version).
Write a class, maybe something with properties:
class PathInfo(object):
def __init__(self, user):
self.user = user
#property
def home(self):
return '/home/' + self.user
p = PathInfo('thc')
print p.home # /home/thc
As sort of an extended version of #Tony's answer, you could build a dictionary subclass that calls its values if they are callables:
class CallingDict(dict):
"""Returns the result rather than the value of referenced callables.
>>> cd = CallingDict({1: "One", 2: "Two", 'fsh': "Fish",
... "rhyme": lambda d: ' '.join((d[1], d['fsh'],
... d[2], d['fsh']))})
>>> cd["rhyme"]
'One Fish Two Fish'
>>> cd[1] = 'Red'
>>> cd[2] = 'Blue'
>>> cd["rhyme"]
'Red Fish Blue Fish'
"""
def __getitem__(self, item):
it = super(CallingDict, self).__getitem__(item)
if callable(it):
return it(self)
else:
return it
Of course this would only be usable if you're not actually going to store callables as values. If you need to be able to do that, you could wrap the lambda declaration in a function that adds some attribute to the resulting lambda, and check for it in CallingDict.__getitem__, but at that point it's getting complex, and long-winded, enough that it might just be easier to use a class for your data in the first place.
This is very easy in a lazily evaluated language (haskell).
Since Python is strictly evaluated, we can do a little trick to turn things lazy:
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
d1 = lambda self: lambda: {
'a': lambda: 3,
'b': lambda: self()['a']()
}
# fix the d1, and evaluate it
d2 = Y(d1)()
# to get a
d2['a']() # 3
# to get b
d2['b']() # 3
Syntax wise this is not very nice. That's because of us needing to explicitly construct lazy expressions with lambda: ... and explicitly evaluate lazy expression with ...(). It's the opposite problem in lazy languages needing strictness annotations, here in Python we end up needing lazy annotations.
I think with some more meta-programmming and some more tricks, the above could be made more easy to use.
Note that this is basically how let-rec works in some functional languages.
The jsbueno answer in Python 3 :
class MyDict(dict):
def __getitem__(self, item):
return dict.__getitem__(self, item).format(self)
dictionary = MyDict({
'user' : 'gnucom',
'home' : '/home/{0[user]}',
'bin' : '{0[home]}/bin'
})
print(dictionary["home"])
print(dictionary["bin"])
Her ewe use the python 3 string formatting with curly braces {} and the .format() method.
Documentation : https://docs.python.org/3/library/string.html

Categories

Resources