Organizing code for Testing with Constants from a Configuration File - python

My application reads in many constants from a configuration file. These constants are then used at various places throughout the program. Here is an example:
import ConfigParser
config = ConfigParser.SafeConfigParser()
config.read('my.conf')

DB_HOST = config.get('DatabaseInfo', 'address')
DB_PORT_NUMBER = config.getint('DatabaseInfo', 'port_number')
DB_NUMBER = config.getint('DatabaseInfo', 'db_number')
IN_SERVICE = config.get('ServerInfo', 'in_service')
IN_DATA = config.get('ServerInfo', 'in_data')
# etc...
I then have functions defined throughout my program that use these constants. Here is an example:
def connect_to_db():
    return get_connection(DB_HOST, DB_PORT_NUMBER, DB_NUMBER)
Sometimes, when I am testing or using the REPL, however, I don't want to use the values defined in the configuration file.
So I have instead defined the functions to accept the constants as parameters:
def connect_to_db(db_host, db_port_number, db_number):
    return get_connection(db_host, db_port_number, db_number)
Then, when my program is run, the constants are all passed in to my main, which calls other functions that in turn call the functions and create the classes (all possibly in different modules) that need the constants:
def main(db_host, db_port_number, db_number, in_service, in_data):
    intermediate_function(
        db_host, db_port_number, db_number, other, parameters, here
    )
    other_intermediate_function(in_service, in_data, more, params, here)
    # etc...

def intermediate_function(db_host, db_port_number, db_number):
    # Other processing...
    c = connect_to_db(db_host, db_port_number, db_number)
    # Continued...

if __name__ == '__main__':
    main(DB_HOST, DB_PORT_NUMBER, DB_NUMBER, IN_SERVICE, IN_DATA)
The problem is that with too many constants, this quickly becomes unwieldy. And if I need to add another constant, I have several places to modify to ensure that my code doesn't break. This is a maintenance nightmare.
What is the proper Pythonic way of dealing with many different configuration constants, so that the code is still easy to modify and easy to test?

My idea is really simple: use optional keyword arguments.
Here is a little example of what I'm talking about:
# Collect the constants into a dictionary
# with the proper key names
d = dict(a=1, b=2, c=3)

# Create functions where you want to use the constants,
# use the same key names as in the dictionary, and
# add '**kwargs' at the end of the argument list
def func1(a, b, d=4, **kwargs):
    print a, b, d

def func2(c, f=5, **kwargs):
    print c, f

# Now any time you use the original dictionary
# as an argument for one of these functions, each
# function will pick out only those keywords
# it actually uses
func1(**d)
# 1 2 4
func2(**d)
# 3 5
This idea lets you modify the list of constants in only one place: your original dictionary.
Ported back to your code:
This is your configuration.py:
# Your parsing, reading and storing functions go here...

# Now create your dictionary
constants = dict(
    host = DB_HOST,
    pnum = DB_PORT_NUMBER,
    num = DB_NUMBER,
    serv = IN_SERVICE,
    data = IN_DATA
)
Here is your other_file.py:
import configuration as cf

def main(host, pnum, num, serv, data, **kwargs):
    intermediate_function(
        host, pnum, num, 'other', 'params', 'here'
    )

def intermediate_function(host, pnum, num, *args):
    pass

# Now you can call main() with your
# dictionary as keyword arguments
if __name__ == '__main__':
    main(**cf.constants)
Although this is working, I do not recommend this solution!
Your code will be harder to maintain, since every time you call one of those functions and pass in your dictionary of constants, you will only see "one" argument: the dictionary itself, which is not very descriptive. So I believe you should think about a better architecture for your code, one with more deterministic functions (returning "real" values) chained together, so you don't have to pass all those constants around all the time. But this is just my opinion :)
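For what it's worth, one such restructuring is to bundle the settings into a single named object rather than a loose dictionary, so each call site still shows a meaningful name. A minimal sketch (the names DBConfig and connect_to_db here are illustrative, not from the question):

```python
from collections import namedtuple

# Hypothetical config type: bundle related settings into one immutable object
DBConfig = namedtuple('DBConfig', ['host', 'port_number', 'db_number'])

def connect_to_db(cfg):
    # One parameter instead of three; a test can pass a stubbed config
    return (cfg.host, cfg.port_number, cfg.db_number)

prod = DBConfig(host='db.example.com', port_number=5432, db_number=0)
test = prod._replace(host='localhost')  # override a single field for testing

print(connect_to_db(test))  # ('localhost', 5432, 0)
```

Adding a constant then means touching only the DBConfig definition and the one place it is built, while the REPL and tests can construct ad-hoc configs freely.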
EDIT:
If the solution mentioned above suits you, I also have a better idea for how to store and parse your configuration file and turn it into a dictionary automatically: use JSON instead of a plain .cfg file.
This will be your conf.json:
{
    "host" : "DB_HOST",
    "pnum" : "DB_PORT_NUMBER",
    "num"  : "DB_NUMBER",
    "serv" : "IN_SERVICE",
    "data" : "IN_DATA"
}
And your configuration.py will look like this:
import json

with open('conf.json') as conf:
    # The JSON parser will convert it to a dictionary
    constants = json.load(conf)
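That said, if you'd rather keep the .cfg format, the stdlib parser can also hand you a dictionary per section. A short sketch (Python 3 configparser shown; note that, unlike JSON, every value comes back as a string):

```python
import configparser

# A section of a .cfg file can be turned into a dict directly
config = configparser.ConfigParser()
config.read_string("""
[DatabaseInfo]
address = localhost
port_number = 5432
""")

constants = dict(config['DatabaseInfo'])
print(constants)  # {'address': 'localhost', 'port_number': '5432'}
```

You would then still need getint()-style conversions (or your own casting) for the numeric values.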

Related

Overriding function signature (in help) when using functools.wraps

I'm creating a wrapper for a function with functools.wraps. My wrapper has the effect of overriding a default parameter (and it doesn't do anything else):
import functools

def add(*, a=1, b=2):
    "Add numbers"
    return a + b

@functools.wraps(add)
def my_add(**kwargs):
    kwargs.setdefault('b', 3)
    return add(**kwargs)
This my_add definition behaves the same as
@functools.wraps(add)
def my_add(*, a=1, b=3):
    return add(a=a, b=b)
except that I didn't have to manually type out the parameter list.
However, when I run help(my_add), I see the help string for add, which has the wrong function name and the wrong default argument for the parameter b:
add(*, a=1, b=2)
Add numbers
How can I override the function name and the default argument in this help() output?
(Or, is there a different way to define my_add, using for example some magic function my_add = magic(add, func_name='my_add', kwarg_defaults={'b': 3}) that will do what I want?)
Let me try and explain what happens.
When you call the help function, it requests information about your function via the inspect module. Therefore you have to change the function's signature in order to change the default argument shown.
Now, this is not something that is advised or often preferred, but who cares about that, right? The provided solution is considered hacky and probably won't work for all versions of Python, so you might want to reconsider how important the help output is... Anyway, let's start with some explanation of how it was done, followed by the code and a test case.
Copying functions
The first thing we do is copy the entire function, because I only want to change the signature of the new function, not the original. This decouples the new my_add signature (and default values) from the original add function.
See:
How to create a copy of a python function
How can I make a deepcopy of a function in Python?
For ideas of how to do this (I will show my version in a bit).
Copying / updating signature
The next step is to get a copy of the function signature; for that, this post was very useful, except for the part where we adjust the signature parameters to match the new keyword default arguments.
For that we have to change the value of a mappingproxy, which we can see when running the debugger on the return value of inspect.signature(g). So far this can only be done by changing private variables (the values with leading underscores, e.g. _default). Therefore this solution is considered hacky and is not guaranteed to survive future Python updates. That said, let's see the solution!
Full code
import inspect
import types
import functools

def update_func(f, func_name='', update_kwargs: dict = None):
    """Based on http://stackoverflow.com/a/6528148/190597 (Glenn Maynard)"""
    g = types.FunctionType(
        code=f.__code__,
        globals=f.__globals__.copy(),
        name=f.__name__,
        argdefs=f.__defaults__,
        closure=f.__closure__
    )
    g = functools.update_wrapper(g, f)
    g.__signature__ = inspect.signature(g)
    g.__kwdefaults__ = f.__kwdefaults__.copy()

    # Adjust your arguments
    for key, value in (update_kwargs or {}).items():
        g.__kwdefaults__[key] = value
        g.__signature__.parameters[key]._default = value

    g.__name__ = func_name or g.__name__
    return g

def add(*, a=1, b=2):
    "Add numbers"
    return a + b

my_add = update_func(add, func_name="my_add", update_kwargs=dict(b=3))
Example
if __name__ == '__main__':
    a = 2
    print("*" * 50, "\nMy add\n")
    help(my_add)
    print("*" * 50, "\nOriginal add\n")
    help(add)
    print("*" * 50, f"\nResults:"
          f"\n\tMy add      : a = {a}, return = {my_add(a=a)}"
          f"\n\tOriginal add: a = {a}, return = {add(a=a)}")
Output
**************************************************
My add
Help on function my_add in module __main__:
my_add(*, a=1, b=3)
Add numbers
**************************************************
Original add
Help on function add in module __main__:
add(*, a=1, b=2)
Add numbers
**************************************************
Results:
My add : a = 2, return = 5
Original add: a = 2, return = 4
Usages
f: the function that you want to update
func_name: optionally, a new name for the function (if empty, the old name is kept)
update_kwargs: a dictionary containing the names and values of the default arguments you want to update
Notes
The solution makes full copies of the relevant dictionaries, so there is no impact on the original add function.
The _default value is a private variable and may change in future releases of Python.
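For completeness, much of the same effect can be had without touching _default at all: inspect.Parameter.replace() and inspect.Signature.replace() are public APIs for building a modified signature, which help() picks up from __signature__. A sketch (with_new_defaults is a made-up name; this wraps the function rather than copying it, a slightly different trade-off than the answer above):

```python
import functools
import inspect

def with_new_defaults(f, func_name=None, update_kwargs=None):
    # Rewrite the visible signature via the public replace() API
    # instead of poking at private attributes
    update_kwargs = update_kwargs or {}
    sig = inspect.signature(f)
    params = [p.replace(default=update_kwargs.get(name, p.default))
              for name, p in sig.parameters.items()]

    @functools.wraps(f)
    def wrapper(**kwargs):
        for name, value in update_kwargs.items():
            kwargs.setdefault(name, value)   # apply the new defaults
        return f(**kwargs)

    wrapper.__signature__ = sig.replace(parameters=params)  # what help() shows
    if func_name:
        wrapper.__name__ = wrapper.__qualname__ = func_name
    return wrapper

def add(*, a=1, b=2):
    "Add numbers"
    return a + b

my_add = with_new_defaults(add, func_name='my_add', update_kwargs={'b': 3})

print(inspect.signature(my_add))  # (*, a=1, b=3)
print(my_add())                   # 4
```

help(my_add) then reports my_add(*, a=1, b=3), and the original add is untouched.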

Python - Passing functions to another function where the arguments may be modified

I've written what's effectively a parser for a large amount of sequential data chunks, and I need to write a number of functions to analyze the data chunks in various ways. The parser contains some useful functionality for me such as frequency of reading data into (previously-instantiated) objects, conditional filtering of the data, and when to stop reading the file.
I would like to write external analysis functions in separate modules, import the parser, and pass the analysis function into the parser to evaluate at the end of every data chunk read. In general, the analysis functions will require variables modified within the parser itself (i.e. the data chunk that was read), but it may need additional parameters from the module where it's defined.
Here's essentially what I would like to do for the parser:
def parse_chunk(dat_file, dat_obj1, dat_obj2, parse_arg1=None, fun=None, **fargs):
    # Process optional arguments to parser...
    with open(dat_file, 'r') as dat:
        # Parse chunk of dat_file based on parse_arg1 and store
        # data in dat_obj1, dat_obj2, etc.
        dat_obj1.attr = parsed_data
        local_var1 = dat_obj1.some_method()
    # Call analysis function passed to parser
    if fun is not None:
        return fun(**fargs)
In another module, I would have something like:
from parsemod import parse_chunk

def main_script():
    # Preprocess data from other files
    dat_obj1 = ...
    dat_obj2 = ...
    script_var1 = ...
    # Parse data and analyze
    result = parse_chunk(dat_file, dat_obj1, dat_obj2, fun=eval_data,
                         dat_obj1=None, local_var1=None, foo=script_var1)

def eval_data(dat_obj1, local_var1, foo):
    # Analyze data
    ...
    return result
I've looked at similar questions such as this and this, but the issue here is that eval_data() has arguments which are modified or set in parse(), and since **fargs provides a dictionary, the variable names themselves are not in the namespace for parse(), so they aren't modified prior to calling eval_data().
I've thought about modifying the parser to just return all variables after every chunk read and call eval_data() from main_script(), but there are too many different possible variables needed for the different eval_data() functional forms, so this gets very clunky.
Here's another simplified example that's even more general:
def my_eval(fun, **kwargs):
    x = 6
    z = 1
    return fun(**kwargs)

def my_fun(x, y, z):
    return x + y + z

my_eval(my_fun, x=3, y=5, z=None)
I would like the result of my_eval() to be 12, as x gets overwritten from 3 to 6 and z gets set to 1. I looked into functools.partial but it didn't seem to work either.
To override entries in kwargs, you need to assign into the dictionary:

kwargs['variable'] = value  # instead of just variable = value

In your case, in my_eval you need to do:

kwargs['x'] = 6
kwargs['z'] = 1
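With those two assignments in place, the simplified example from the question behaves as intended:

```python
def my_eval(fun, **kwargs):
    # Assign into the kwargs dict; a plain `x = 6` would only create
    # a local variable and never reach fun()
    kwargs['x'] = 6
    kwargs['z'] = 1
    return fun(**kwargs)

def my_fun(x, y, z):
    return x + y + z

print(my_eval(my_fun, x=3, y=5, z=None))  # 12
```

x is overwritten from 3 to 6 and z is set to 1, so the call returns 6 + 5 + 1 = 12.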

Python Class passing value to "self"

I'm programming an optimizer that has to run through several possible variations. The team wants to implement multithreading to get through those variants faster. This means I've had to put all my functions inside a thread-class. My problem is with my call of the wrapper function
class variant_thread(threading.Thread):
    def __init__(self, name, variant, frequencies, fit_vals):
        threading.Thread.__init__(self)
        self.name = name
        self.elementCount = variant
        self.frequencies = frequencies
        self.fit_vals = fit_vals

    def run(self):
        print("Running Variant:", self.elementCount)  # display thread currently running
        fitFunction = self.Wrapper_Function(self.elementCount)
        self.popt, pcov, self.infoRes = curve_fit_my(fitFunction, self.frequencies, self.fit_vals)

    def Optimize_Wrapper(self, frequencies, *params):  # wrapper which returns values in a form the optimizer can work with
        cut = int(len(frequencies)/2)  # <---- ERROR OCCURS HERE
        freq = frequencies[:cut]
        vals = (stuff happens here)
        return (stuff in proper form for optimizer)
I've cut out as much as I could to simplify the example, and I hope you can understand what's going on. Essentially, after the thread is created it calls the optimizer. The optimizer sends the list of frequencies and the parameters it wants to change to the Optimize_Wrapper function.
The problem is that Optimize_Wrapper takes the frequencies list and saves it to "self". This means the "frequencies" variable becomes a single float value, as opposed to the list of floats it should be. Of course this throws an error when I try to take len(frequencies). Keep in mind I also need to use self later in the function, so I can't just create a static method.
I've never had the problem of a class method saving values to "self" before. I know it has to be declared explicitly in Python, but anything I've ever passed to a class method has always skipped "self" and gone to my declared variables. What's going on here?
Don't pass instance variables to methods. They are already accessible through self. And be careful about which variable is which. The first parameter to Optimize_Wrapper is called "frequencies", but you call it as self.Wrapper_Function(self.elementCount) - so you have a self.frequencies and a local frequencies ... and they are different things. Very confusing!
class variant_thread(threading.Thread):
    def __init__(self, name, variant, frequencies, fit_vals):
        threading.Thread.__init__(self)
        self.name = name
        self.elementCount = variant
        self.frequencies = frequencies
        self.fit_vals = fit_vals

    def run(self):
        print("Running Variant:", self.elementCount)  # display thread currently running
        fitFunction = self.Optimize_Wrapper()
        self.popt, pcov, self.infoRes = curve_fit_my(fitFunction, self.frequencies, self.fit_vals)

    def Optimize_Wrapper(self):  # wrapper which returns values in a form the optimizer can work with
        cut = int(len(self.frequencies)/2)  # frequencies now comes from self, no error
        freq = self.frequencies[:cut]
        vals = (stuff happens here)
        return (stuff in proper form for optimizer)
You don't have to subclass Thread to run a thread. It's frequently easier to define a function and have a Thread call that function. In your case, you may be able to put the variant processing in a plain function and use a thread pool to run the variants. This saves all the tedious handling of the thread object itself.
def run_variant(name, variant, frequencies, fit_vals):
    cut = int(len(frequencies)/2)  # plain parameters now, no self needed
    freq = frequencies[:cut]
    vals = (stuff happens here)
    proper_form = (stuff in proper form for optimizer)
    return curve_fit_my(proper_form, frequencies, fit_vals)

if __name__ == "__main__":
    variants = (make the variants)
    name = "name"
    frequencies = (make the frequencies)
    fit_vals = (make the fit_vals)

    from multiprocessing.pool import ThreadPool
    with ThreadPool() as pool:
        for popt, pcov, infoRes in pool.starmap(run_variant,
                ((name, variant, frequencies, fit_vals) for variant in variants)):
            # do the other work here
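To make the thread-pool suggestion concrete, here is a runnable miniature; the data and the body of run_variant are placeholders, and a real version would call the curve-fitting routine instead of summing:

```python
from multiprocessing.pool import ThreadPool

# Toy stand-in for the real fitting work (names are illustrative)
def run_variant(name, variant, frequencies, fit_vals):
    cut = len(frequencies) // 2      # plain parameters, no self
    freq = frequencies[:cut]
    return name, variant, sum(freq)  # pretend this is the fit result

if __name__ == '__main__':
    variants = [1, 2, 3]
    frequencies = [10.0, 20.0, 30.0, 40.0]
    fit_vals = [0.1, 0.2, 0.3, 0.4]

    with ThreadPool() as pool:
        results = pool.starmap(
            run_variant,
            (('name', v, frequencies, fit_vals) for v in variants))
    print(results)
```

starmap blocks until every variant has been processed and returns the results in order, so the collection step stays trivial.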

How can I cleanly associate a constant with a function?

I have a series of functions that I apply to each record in a dataset to generate a new field I store in a dictionary (the records—"documents"—are stored using MongoDB). I broke them all up as they are basically unrelated, and tie them back together by passing them as a list to a function that iterates through each operation for each record and adds on the results.
What irks me is how I'm going about it in what seems like a fairly inelegant manner; semi-duplicating names among other things.
def _midline_length(blob):
    '''Generate a midline sequence for *blob*'''
    return 42

midline_length = {
    'func': _midline_length,
    'key': 'calc_seq_midlen'}  #: Midline sequence key/function pair.
Lots of these...
do_calcs = [midline_length, ] # all the functions ...
Then called like:
for record in mongo_collection.find():
    for calc in do_calcs:
        record[calc['key']] = calc['func'](record)  # add new data to record
    # update record in DB
Splitting up the keys like this makes it easier to remove all the calculated fields in the database (pointless after everything is set, but while developing the code and methodology it's handy).
I had the thought to maybe use classes, but it seems more like an abuse:
class midline_length(object):
    key = 'calc_seq_midlen'

    @staticmethod
    def __call__(blob):
        return 42
I could then make a list of instances (do_calcs = [midline_length(), ...]) and run through that, calling each one or pulling out its key member. Alternatively, it seems like I can arbitrarily add members to functions: def myfunc(): ... then myfunc.key = 'mykey' ... but that seems even worse. Better ideas?
You might want to use decorators for this purpose.
import collections

RecordFunc = collections.namedtuple('RecordFunc', 'key func')

def record(key):
    def wrapped(func):
        return RecordFunc(key, func)
    return wrapped

@record('midline_length')
def midline_length(blob):
    return 42
Now, midline_length is not actually a function, but it is a RecordFunc object.
>>> midline_length
RecordFunc(key='midline_length', func=<function midline_length at 0x24b92f8>)
It has a func attribute, which is the original function, and a key attribute.
If they get added to the same dictionary, you can do it in the decorator:
RECORD_PARSERS = {}

def record(key):
    def wrapped(func):
        RECORD_PARSERS[key] = func
        return func
    return wrapped

@record('midline_length')
def midline_length(blob):
    return 42
This is a perfect job for a decorator. Something like:
_CALC_FUNCTIONS = {}

def calcfunc(orig_func):
    # format the db key from the function name
    key = 'calc_%s' % orig_func.__name__
    _CALC_FUNCTIONS[key] = orig_func
    return orig_func

@calcfunc
def _midline_length(blob):
    return 42

print _CALC_FUNCTIONS
# prints {'calc__midline_length': <function _midline_length at 0x035F7BF0>}

# then your document update is as follows
for record in mongo_collection.find():
    for key, func in _CALC_FUNCTIONS.iteritems():
        record[key] = func(record)
    # update in db
Note that you could also store the attributes on the function object itself like Dietrich pointed out but you'll probably still need to keep a global structure to keep the list of functions.
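A sketch of that attribute-on-the-function variant, keeping a global registry as the note suggests (CALCS and the key name are illustrative; Python 3 syntax used here):

```python
CALCS = []

def calcfunc(key):
    # Store the db key on the function object itself and keep
    # a single global list of the calculation functions
    def wrapped(func):
        func.key = key
        CALCS.append(func)
        return func
    return wrapped

@calcfunc('calc_seq_midlen')
def midline_length(blob):
    return 42

record = {}
for calc in CALCS:
    record[calc.key] = calc(record)

print(record)  # {'calc_seq_midlen': 42}
```

The function stays a plain callable, and the key travels with it, so the update loop needs no parallel data structure beyond the registry list.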

Is this a "pythonic" method of executing functions as a python switch statement for tuple values?

I have a situation where I have six possible situations which can relate to four different results. Instead of using an extended if/else statement, I was wondering if it would be more pythonic to use a dictionary to call the functions that I would call inside the if/else as a replacement for a "switch" statement, like one might use in C# or php.
My switch statement depends on two values which I'm using to build a tuple, which I'll in turn use as the key to the dictionary that will function as my "switch". I will be getting the values for the tuple from two other functions (database calls), which is why I have the example one() and zero() functions.
This is the code pattern I'm thinking of using which I stumbled on with playing around in the python shell:
def one():
    # Simulated database value
    return 1

def zero():
    return 0

def run():
    # Shows the correct function ran
    print "RUN"
    return 1

def walk():
    print "WALK"
    return 1

def main():
    switch_dictionary = {}
    # These are the values that I will want to use to decide
    # which functions to use
    switch_dictionary[(0,0)] = run
    switch_dictionary[(1,1)] = walk
    # These are the tuples that I will build from the database
    zero_tuple = (zero(), zero())
    one_tuple = (one(), one())
    # These actually run the functions. In practice I will simply
    # have the one tuple, dependent on the database information,
    # to run the function that I defined before
    switch_dictionary[zero_tuple]()
    switch_dictionary[one_tuple]()
I don't have the actual code written or I would post it here, as I would like to know if this method is considered a python best practice. I'm still a python learner in university, and if this is a method that's a bad habit, then I would like to kick it now before I get out into the real world.
Note, the result of executing the code above is as expected, simply "RUN" and "WALK".
edit
For those of you who are interested, this is how the relevant code turned out. It's being used on a google app engine application. You should find the code is considerably tidier than my rough example pattern. It works much better than my prior convoluted if/else tree.
def GetAssignedAgent(self):
    tPaypal = PaypalOrder()  # Parent class for this function
    tAgents = []
    Switch = {}
    # These are the different methods for the actions to take
    Switch[(0,0)] = tPaypal.AssignNoAgent
    Switch[(0,1)] = tPaypal.UseBackupAgents
    Switch[(0,2)] = tPaypal.UseBackupAgents
    Switch[(1,0)] = tPaypal.UseFullAgents
    Switch[(1,1)] = tPaypal.UseFullAndBackupAgents
    Switch[(1,2)] = tPaypal.UseFullAndBackupAgents
    Switch[(2,0)] = tPaypal.UseFullAgents
    Switch[(2,1)] = tPaypal.UseFullAgents
    Switch[(2,2)] = tPaypal.UseFullAgents
    # I'm only interested in numbers up to 2, which is why
    # I can consider the Switch dictionary to cover all available options.
    # The "state" is the current status of the customer agent system
    tCurrentState = (tPaypal.GetNumberofAvailableAgents(),
                     tPaypal.GetNumberofBackupAgents())
    tAgents = Switch[tCurrentState]()
Consider this idiom instead:
>>> def run():
... print 'run'
...
>>> def walk():
... print 'walk'
...
>>> def talk():
... print 'talk'
>>> switch={'run':run,'walk':walk,'talk':talk}
>>> switch['run']()
run
I think it is a little more readable than the direction you are heading.
edit
And this works as well:
>>> switch={0:run,1:walk}
>>> switch[0]()
run
>>> switch[max(0,1)]()
walk
You can even use this idiom for a switch / default type structure:
>>> default_value=1
>>> try:
... switch[49]()
... except KeyError:
... switch[default_value]()
Or (the less readable, more terse):
>>> switch.get(49, switch[default_value])()
walk
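Handing get() the fallback function itself also works, and avoids indexing the dict twice. A small self-contained sketch (returning values instead of printing so the results are easy to check):

```python
def run():
    return 'RUN'

def walk():
    return 'WALK'

switch = {0: run, 1: walk}

# If the key is present you get its function; otherwise the default one.
# Either way the result of get() is callable.
print(switch.get(49, walk)())  # WALK
print(switch.get(0, walk)())   # RUN
```

This keeps the default case on the same line as the dispatch, with no try/except needed.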
edit 2
Same idiom, extended to your comment:
>>> def get_t1():
... return 0
...
>>> def get_t2():
... return 1
...
>>> switch={(get_t1(),get_t2()):run}
>>> switch
{(0, 1): <function run at 0x100492d70>}
Readability matters
It is a reasonably common Python practice to dispatch to functions based on a dictionary or sequence lookup.
Given your use of indices for lookup, a list of lists would also work (note that a list must be indexed one dimension at a time, unlike the dictionary keyed by tuples):

switch_list = [[run, None], [None, walk]]
...
i, j = zero_tuple
switch_list[i][j]()
What is considered most Pythonic is whatever maximizes clarity while meeting the other operational requirements. In your example, the lookup tuple doesn't appear to have intrinsic meaning, so the operational intent is being lost behind a magic constant. Try to make sure the business logic doesn't get lost in your dispatch mechanism. Using meaningful names for the constants would likely help.
