Storing configuration defaults in a Python docstring?

I have a Python app with many modules. There's a YAML config file which contains configuration settings for each module. In each module's init(), I load the config and then process it into a config dictionary for that module.
The processing is ugly right now. Dozens and dozens of lines at the beginning of each module with a lot of stuff like:
if 'foo' not in config:
    config['foo'] = bar

or

if 'foo' in config:
    config['foo'] = string_to_list(config['foo'])
else:
    config['foo'] = list()
etc.
So now I want to write a centralized config processing method that each module can use. I'm thinking that I want to use a YAML-formatted string to specify what the final config should look like. (Which settings are mandatory, default values, object types, etc.)
I'm thinking the config specification in each module could be something like this: (with the type|default|required values)
config_template = """MySection:
setting1: int|0
setting2: string|None|required
setting3: int|10"""
So far, so good. My real question is whether there's any way for me to save this config specification in each module, in a docstring for init() or something. Since I'm essentially defining exactly what the module expects for its config, along with the defaults and the types, if I could store this spec in the docstring then I ought to be able to write something, or configure Sphinx, to pretty it up for the documentation.
So I wonder if this approach sounds sane in general, and, if so, is there a way I can store this config info in the docstring to get it to work as double duty?
EDIT: I considered setting up a dictionary with default values first, and certainly that will take care of the defaults. But there are a lot of cases where I need the values to be of a certain type. For example, maybe some have to be a list, but if a single item is entered into the config file it will be read as a string, so I need the config processor to convert the string to a list with that string as its only item.
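For illustration, here's a minimal sketch of the kind of centralized processor described above. The helper name, the exact spec semantics, and the use of PyYAML to parse the template are all assumptions, not a finished design:

import yaml  # assumes PyYAML, since the config files are YAML anyway

def process_config(config, template):
    """Apply a 'type|default|required' template to a raw config dict (sketch)."""
    spec = yaml.safe_load(template)
    processed = {}
    for section, settings in spec.items():
        raw = config.get(section, {}) or {}
        for key, rule in settings.items():
            parts = rule.split('|')
            typename, default = parts[0], parts[1]
            required = 'required' in parts[2:]
            if key in raw:
                value = raw[key]
            elif required:
                raise KeyError('%s.%s is required' % (section, key))
            else:
                value = None if default == 'None' else default
            if value is not None:
                if typename == 'int':
                    value = int(value)
                elif typename == 'list' and not isinstance(value, list):
                    value = [value]  # promote a lone item to a one-item list
            processed[key] = value
    return processed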
EDIT 2: The reason I was asking about the docstring is that my config specification would essentially tell a programmer what this module expects in terms of a config dictionary, so if I can do it in the docstring then I can specify it once and have the module use it for its config as well as have it appear in the Sphinx documentation. But if that's not possible, then so be it. I can keep it in a variable (like in my example above) and write the docstring manually.
Thanks!
Brian

Related

Tuneables/Configurable for a module

I am writing a module in Python which has many functions to be used in a variety of situations, resulting in changing default values over time. (Currently this is a .py file I am importing.) Many of the functions have the same hard-coded defaults. I would like to make these defaults configurable, ideally through some functionality that can be accessed via a Jupyter notebook that has already imported the module, or at the very least that can be set at import time or via a config file.
I know how to do this in other languages but have been struggling to do the same in Python. I do not want the defaults to be hard-coded. I know that part of the difficulty is that the module is only imported once, meaning variables inside the module are no longer accessible after the import is complete. If there is a way of looking at this more pythonically, I would accept an answer that explains why my desired solution is non-pythonic and what a good alternative would be.
For example here is what the functions would look like:
def function1(arg1=default_param1):
    return arg1

def function2(arg1=default_param1, arg2=default_param3):
    # other cool stuff
Here is something similar to what I would like to be able to do:

import foo_module as foo

foo.function1()   # ==> arg1
foo.default_param1 = new_value
foo.function1()   # ==> new_value
Of course, with this setup you can always pass a new value explicitly every time you call the function, but that is less than ideal.
In this case, how would I change default_param1 across the entire module via the code that is importing the module?
Edit: to clarify, this module would not be accessed via the command line. A primary use case is to import it into a Jupyter notebook.
You could use environment variables such that, upon being imported, your module reads these variables and adjusts the defaults accordingly.
You could set the environment variables ahead of time using os.environ. So,
import os
os.environ['BLAH'] = '5'
import my_module
Inside my_module.py, you'd have something like
import os

try:
    BLAH_DEFAULT = int(os.environ['BLAH'])
except (KeyError, ValueError):
    # The variable is unset or not an integer; fall back to the built-in default.
    BLAH_DEFAULT = 3
If you'd rather not fiddle with environment variables and you're okay with the defaults being mutable after importation, my_module.py could store the defaults in a global dict. E.g.
defaults = {
    'BLAH': 3,
    'FOO': 'bar',
    'BAZ': True
}
Your user could update that dictionary manually (my_module.defaults['BAZ']=False) or, if that bothers you, you could hide the mechanics in a function:
def update_default(key, value):
    if key not in defaults:
        raise ValueError('{} is not a valid default parameter.'.format(key))
    defaults[key] = value
You could spiff up that function by doing type/range checks on the passed value.
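For instance, a minimal sketch of such a check, assuming a replacement value should have the same type as the shipped default:

def update_default(key, value):
    if key not in defaults:
        raise ValueError('{} is not a valid default parameter.'.format(key))
    # Assumption: a replacement must match the type of the original default.
    if not isinstance(value, type(defaults[key])):
        raise TypeError('{} expects a value of type {}.'.format(
            key, type(defaults[key]).__name__))
    defaults[key] = value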
However, keep in mind that, unlike in languages like C++ and Java, nothing in Python is truly hidden. A user would be able to directly reference my_module.defaults thus bypassing your function.
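One related gotcha: Python evaluates default argument values once, at function definition time, so for the dict approach to take effect the functions themselves must look the default up at call time. A minimal sketch:

def function1(arg1=None):
    # Resolve the default at call time so later updates to `defaults` are seen.
    if arg1 is None:
        arg1 = defaults['BLAH']
    return arg1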

Efficient way to pass system properties in python 3

I'm using Python 3 for an application. For that utility, I need to pass command-line arguments as follows:
python3 -m com.xxx.executor -Denvironment=dev -Dtoggle=False
Both the environment and toggle parameters are present in a property file too. If a value is specified on the command line, it should override what is present in the property file.
I'm basically a Java guy, and in Java, properties passed in the form -Dkey=value are set as system properties, which can then be read from code as System.getProperty(key, defaultVal).
But when I tried the same in Python 3, it didn't work.
After referring to the Python docs, it seems to me like sys._xoptions are suitable for my requirement:
python3 -Xenvironment=dev -Xtoggle=False -m com.xxx.executor
Then read the properties using sys._xoptions.
I'm using CPython. The aim of this thread is to confirm whether the way I'm proceeding is right, or whether there are better ways in Python to implement the same.
Python veterans, please guide!
For argument parsing, I use the argparse module (docs) to define which are valid named and/or positional arguments.
There are third-party modules as well such as click and docopt. You should use what you feel most comfortable with and whether or not you can use third-party modules. The click documentation contains a (possibly biased) comparison between it, argparse and docopt.
I've never used sys._xoptions, nor did I know of its existence. It seems a bit strange that a name starting with an underscore (normally used to indicate something "private") is mentioned in the docs. Perhaps someone else can shed some light on this.
For the parsing of a property file, I use the configparser module (docs). Of course, you could opt for a JSON or YAML config file if you'd prefer that.
That said, you'll have to come up with the necessary code to overrule properties when specified as arguments (though that shouldn't be too difficult).
From the docs on -X args
Reserved for various implementation-specific options. CPython currently defines the following possible values:
That means you probably shouldn't be hijacking these for your own purposes. As Kristof mentioned, argparse is a pretty reasonable choice. Since you want both a file and command line arguments, here's a quick example using a json file-based config:
import json
import argparse

argparser = argparse.ArgumentParser()
# default=argparse.SUPPRESS keeps unspecified options out of the parsed
# namespace, so they won't clobber values loaded from the file.
argparser.add_argument('--environment', default=argparse.SUPPRESS)
argparser.add_argument('--toggle', action='store_true', default=argparse.SUPPRESS)

try:
    with open('config.json') as f:
        args = json.load(f)
except (IOError, ValueError):
    # The file doesn't exist or contains invalid JSON
    args = {}

# Command-line values override the file-based ones.
args.update(vars(argparser.parse_args()))
print(args)
There are other possible alternatives for the file-based config, like the configparser module.
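For instance, a rough sketch of the same file-based fallback with configparser, assuming a hypothetical config.ini containing an [app] section:

import configparser

parser = configparser.ConfigParser()
parser.read('config.ini')  # read() silently skips a missing file

# Assumed layout:
# [app]
# environment = dev
# toggle = False
args = dict(parser['app']) if parser.has_section('app') else {}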

How does one create an extension loader in Python?

I want to improve a Python framework that I'm writing by having it enumerate and load modules from some specified folder, at runtime, based on certain properties that the modules may have.
Probably properties such as: only modules that contain a certain value (like a tag) in some metadata field, or perhaps only modules that contain a class that derives from a certain base class.
For example, let's say that the extensions are plug-ins that support different types of authentication--I'd like my framework to discover the possible plug-ins at run-time without requiring explicit configuration.
It seems like this sort of "extension loading" should be possible, and has probably been done a zillion times before, but none of the search queries I have thought to try are turning anything up, and I don't know of a specific project that already implements this whose approach I could read.
Any pointers on approaches that would work to build such a thing (or even advice on a more Pythonic way to think about this problem) would be great.
(A good answer for this would give an overview and options, so don't rush to accept my quick answer.)
I do this with one of my projects to load classes from all the modules in a package without using import * and hardcoding names. The code is viewable in context on Google Code.
SPECIES = []
'''An automatically generated list of the available species types.'''

def _do_import():
    '''Automatically populates SPECIES with all the modules in this
    folder.

    :Note:
        Written as a function to prevent local variables from being
        imported.
    '''
    import os
    for _, _, files in os.walk(__path__[0]):
        for filename in (file for file in files if file[0] != '_' and file[-3:] == '.py'):
            modname = filename[:filename.find('.')]
            mod = __import__(modname, globals(), fromlist=[])
            for cls in (getattr(mod, s) for s in dir(mod)):
                if cls is not Species and type(cls) is type and issubclass(cls, Species):
                    if getattr(cls, '_include_automatically', True):
                        SPECIES.append(cls)
                        globals()[cls.__name__] = cls

_do_import()
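For comparison, a minimal sketch of the same idea written against the standard pkgutil and importlib machinery; the load_plugins name and base_class parameter are illustrative, not part of the code above:

import importlib
import pkgutil

def load_plugins(package, base_class):
    """Import every module in *package* and collect subclasses of *base_class*."""
    plugins = []
    for _, modname, _ in pkgutil.iter_modules(package.__path__):
        mod = importlib.import_module(package.__name__ + '.' + modname)
        for obj in vars(mod).values():
            # Keep concrete subclasses, skipping the base class itself.
            if isinstance(obj, type) and issubclass(obj, base_class) and obj is not base_class:
                plugins.append(obj)
    return plugins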

Get "flat" member output for sphinx automodule

I'm using the Sphinx autodoc extension to document a module, and I'd like to get a flat list of the module's members in the documentation output.
I tried using the following:
.. automodule:: modname
   :members:
However, there are two problems with this:
It includes the module's docstring, which I don't want here.
The name of each entry is prefixed with "modname.", which is completely redundant (since this page is specifically for documenting this module)
However, I haven't been able to find any config options that would let me selectively disable these two aspects while still getting the automatic listing of all of the module members.
My current plan is to just use autofunction (etc) and explicitly enumerate the members to be documented, but I'd still like to know if I missed an easy way to achieve what I originally wanted.
Update: I at least found a workaround for the second part: set add_module_names=False in conf.py. That's a global setting though, so it doesn't really answer my original question.
Looking at this answer to a similar question, I've found that you can use the autodoc-process-docstring event to remove the docstrings from modules by appending the following code to your conf.py:
def skip_modules_docstring(app, what, name, obj, options, lines):
    if what == 'module':
        del lines[:]

def setup(app):
    app.connect('autodoc-process-docstring', skip_modules_docstring)
Note that the del statement is needed because, according to the documentation, the modification of lines must happen in place (if you create a new object, it doesn't work).
Finally, you can also use name to filter the docstrings of just a few modules while keeping the ones from others.
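For example, a small variation of the function above that strips only specific modules ('mypkg.internal' here is a placeholder name):

def skip_modules_docstring(app, what, name, obj, options, lines):
    # 'mypkg.internal' is a placeholder; list whichever modules you want stripped.
    if what == 'module' and name in ('mypkg.internal',):
        del lines[:]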

How is a Python project set up?

I am doing some heavy commandline stuff (not really web based) and am new to Python, so I was wondering how to set up my files/folders/etc. Are there "header" files where I can keep all the DB connection stuff?
How/where do I define classes and objects?
Just to give you an example of a typical Python module's source, here's something with some explanation. This is a file named "Dims.py". This is not the whole file, just some parts to give an idea what's going on.
#!/usr/bin/env python
This is the standard first line telling the shell how to execute this file. Saying /usr/bin/env python instead of /usr/bin/python tells the shell to find Python via the user's PATH; the desired Python may well be in ~/bin or /usr/local/bin.
"""Library for dealing with lengths and locations."""
If the first thing in the file is a string, it is the docstring for the module. A docstring is a string that appears immediately after the start of an item, which can be accessed via its __doc__ property. In this case, since it is the module's docstring, if a user imports this file with import Dims, then Dims.__doc__ will return this string.
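For example, with the module above:

>>> import Dims
>>> Dims.__doc__
'Library for dealing with lengths and locations.'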
# Units
MM_BASIC = 1500000
MILS_BASIC = 38100
IN_BASIC = MILS_BASIC * 1000
There are a lot of good guidelines for formatting and naming conventions in a document known as PEP (Python Enhancement Proposal) 8. These are module-level variables (constants, really) so they are written in all caps with underscores. No, I don't follow all the rules; old habits die hard. Since you're starting fresh, follow PEP 8 unless you can't.
_SCALING = 1
_SCALES = {
    "mm_basic": MM_BASIC,
    "mm": MM_BASIC,
    "mils_basic": MILS_BASIC,
    "mil": MILS_BASIC,
    "mils": MILS_BASIC,
    "basic": 1,
    1: 1
}
These module-level variables have leading underscores in their names. This gives them a limited amount of "privacy": from Dims import * will not pull them in. However, if you need to mess with one, you can still reach it explicitly, e.g. from Dims import _SCALING.
def UnitsToScale(units=None):
    """Scales the given units to the current scaling."""
    if units is None:
        return _SCALING
    elif units not in _SCALES:
        raise ValueError("unrecognized units: '%s'." % units)
    return _SCALES[units]
UnitsToScale is a module-level function. Note the docstring and the use of default values and exceptions. No spaces around the = in default value declarations.
class Length(object):
    """A length. Makes unit conversions easier.

    The basic, mm, and mils properties can be used to get or set the length
    in the desired units.

    >>> x = Length(mils=1000)
    >>> x.mils
    1000.0
    >>> x.mm
    25.399999999999999
    >>> x.basic
    38100000L
    >>> x.mils = 100
    >>> x.mm
    2.54
    """
The class declaration. Note that the docstring has things in it that look like interactive Python sessions. These are called doctests: they are test code embedded in the docstring. More on this later.
    def __init__(self, unscaled=0, basic=None, mm=None, mils=None, units=None):
        """Constructs a Length.

        The default constructor creates a length of 0.

        >>> Length()
        Length(basic=0)

        Length(<float>) or Length(<string>) creates a length with the given
        value at the current scale factor.

        >>> Length(1500)
        Length(basic=1500)
        >>> Length("1500")
        Length(basic=1500)
        """
        # Straight copy
        if isinstance(unscaled, Length):
            self._x = unscaled._x
            return
        # rest omitted
This is the initializer. Unlike C++, you only get one, but you can use default arguments to make it look like several different constructors are available.
    def _GetBasic(self): return self._x
    def _SetBasic(self, x): self._x = x
    basic = property(_GetBasic, _SetBasic, doc="""
        This returns the length in basic units.""")
This is a property. It allows you to have getter/setter functions while using the same syntax as you would for accessing any other data member, in this case, myLength.basic = 10 does the same thing as myLength._SetBasic(10). Because you can do this, you should not write getter/setter functions for your data members by default. Just operate directly on the data members. If you need to have getter/setter functions later, you can convert the data member to a property and your module's users won't need to change their code. Note that the docstring is on the property, not the getter/setter functions.
If you have a property that is read-only, you can use property as a decorator to declare it. For example, if the above property was to be read-only, I would write:
    @property
    def basic(self):
        """This returns the length in basic units."""
        return self._x
Note that the name of the property is the name of the getter method. You can also use decorators to declare setter methods in Python 2.6 or later.
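For instance, the read/write version of the same property can be written with the setter decorator:

    @property
    def basic(self):
        """This returns the length in basic units."""
        return self._x

    @basic.setter
    def basic(self, x):
        self._x = x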
    def __mul__(self, other):
        """Multiplies a Length by a scalar.

        >>> Length(10)*10
        Length(basic=100)
        >>> 10*Length(10)
        Length(basic=100)
        """
        if type(other) not in _NumericTypes:
            return NotImplemented
        return Length(basic=self._x * other)
This overrides the * operator. Note that you can return the special value NotImplemented to tell Python that this operation isn't implemented (in this case, if you try to multiply by a non-numeric type like a string).
    __rmul__ = __mul__
Since code is just a value like anything else, you can assign the code of one method to another. This line tells Python that the something * Length operation uses the same code as Length * something. Don't Repeat Yourself.
Now that the class is declared, I can get back to module code. In this case, I have some code that I want to run only if this file is executed by itself, not if it's imported as a module. So I use the following test:
if __name__ == "__main__":
Then the code in the if is executed only if this is being run directly. In this file, I have the code:
    import doctest
    doctest.testmod()
This goes through all the docstrings in the module and looks for lines that look like Python prompts with commands after them. The lines following are assumed to be the output of the command. If the commands output something else, the test is considered to have failed and the actual output is printed. Read the doctest module documentation for all the details.
One final note about doctests: They're useful, but they're not the most versatile or thorough tests available. For those, you'll want to read up on unittests (the unittest module).
Each Python source file is a module. There are no "header" files. The basic idea is that when you import "foo" it'll load the code from "foo.py" (or a previously compiled version of it). You can then access its contents by saying foo.whatever.
There seem to be two ways of arranging things in Python code. Some projects use a flat layout, where all of the modules are at the top level. Others use a hierarchy. You can import foo/bar/baz.py by importing "foo.bar.baz". The big gotcha with a hierarchical layout is that each package directory needs an __init__.py (it can even be empty, but it should exist).
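For instance, a hypothetical hierarchical layout:

myproject/
    foo/
        __init__.py
        bar/
            __init__.py
            baz.py    # importable as foo.bar.baz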
Classes are defined like this:
class MyClass(object):
    def __init__(self, x):
        self.x = x

    def printX(self):
        print self.x
To create an instance:
z = MyClass(5)
You can organize it in whatever way makes the most sense for your application. I don't exactly know what you're doing so I can't be certain what the best organization would be for you, but you can pretty much split it up as you see fit and just import what you need.
You can define classes in any file, and you can define as many classes as you would like in a single file (unlike Java). There are no official header files (not like C or C++), but you can use config files to store info about connecting to a DB and the like, and use configparser (a standard library module) to parse them.
It makes sense to keep like things in the same file, so if you have a GUI, you might have one file for the interface, and if you have a CLI, you might keep that in a file by itself. It's less important how your files are organized and more important how the source is organized into classes and functions.
This would be the place to look for that: http://docs.python.org/reference/.
First of all, compile and install pip: http://pypi.python.org/pypi/pip. It is like Ubuntu's apt-get. You run it via a Terminal by typing in pip install package-name. It has a database of packages, so you can install/uninstall stuff quite easily with it.
As for importing and "header" files, from what I can tell, if you run import foo, Python looks for foo.py in the current folder and then along sys.path. If it's not there, it looks for installed packages (such as eggs unpacked into the Python module directory) and imports those.
As for defining classes and objects, here's a basic example:
class foo(foobar2): # I am extending a class, in this case 'foobar2'. I take no arguments.
    def __init__(self, the, list, of, args=True): # Instead, the arguments get passed to me. This still lets you create a 'foo' object with three arguments; '__init__' takes them.
        self.var = 'foo'

    def bar(self, args):
        self.var = 'bar'

    def foobar(self): # Even if you don't need arguments, never leave out the self argument. It's required for methods.
        print self.var

foobar = foo('the', 'class', 'args') # This is how you initialize me!
Read more on this in the Python Reference, but my only tip is to never forget the self argument in methods. It will save you a lot of debugging headaches...
Good luck!
There's no fixed structure for Python programs, but you can take a Django project as an example. A Django project consists of one settings.py module, where global settings (like your example with DB connection properties) are stored, and pluggable applications. Each application has its own models.py module, which stores database models and, possibly, other domain-specific objects. All the rest is up to you.
Note that this advice is not specific to Python. In C/C++ you probably used a similar structure and kept settings in XML. Just forget about headers and put your settings in a plain .py file, that's all.
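For example, a minimal sketch of that style (the module and setting names are illustrative):

# settings.py
DB_HOST = 'localhost'
DB_NAME = 'mydb'
DB_USER = 'app'

# elsewhere in the project
import settings

def connect():
    # make_connection stands in for your DB driver's connect call.
    return make_connection(settings.DB_HOST, settings.DB_NAME, settings.DB_USER)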
