How to use compile_commands.json with clang python bindings? - python

I have the following script that attempts to print out all the AST nodes in a given C++ file. This works fine when using it on a simple file with trivial includes (header file in the same directory, etc).
#!/usr/bin/env python
from argparse import ArgumentParser, FileType
from clang import cindex
def node_info(node):
    return {'kind': node.kind,
            'usr': node.get_usr(),
            'spelling': node.spelling,
            'location': node.location,
            'file': node.location.file.name,
            'extent.start': node.extent.start,
            'extent.end': node.extent.end,
            'is_definition': node.is_definition()
            }
def get_nodes_in_file(node, filename, ls=None):
    ls = ls if ls is not None else []
    for n in node.get_children():
        if n.location.file is not None and n.location.file.name == filename:
            ls.append(n)
            get_nodes_in_file(n, filename, ls)
    return ls
def main():
    arg_parser = ArgumentParser()
    arg_parser.add_argument('source_file', type=FileType('r+'),
                            help='C++ source file to parse.')
    arg_parser.add_argument('compilation_database', type=FileType('r+'),
                            help='The compile_commands.json to use to parse the source file.')
    args = arg_parser.parse_args()
    compilation_database_path = args.compilation_database.name
    source_file_path = args.source_file.name
    clang_args = ['-x', 'c++', '-std=c++11', '-p', compilation_database_path]
    index = cindex.Index.create()
    translation_unit = index.parse(source_file_path, clang_args)
    file_nodes = get_nodes_in_file(translation_unit.cursor, source_file_path)
    print([p.spelling for p in file_nodes])
if __name__ == '__main__':
    main()
However, I get a clang.cindex.TranslationUnitLoadError: Error parsing translation unit. when I run the script and provide a valid C++ file that has a compile_commands.json file in its parent directory. This code runs and builds fine using CMake with clang, but I can't seem to figure out how to pass the argument for pointing to the compile_commands.json correctly.
I also had difficulty finding this option in the clang documentation and could not get -ast-dump to work. However, clang-check works fine by just passing the file path!

Your own accepted answer is incorrect. libclang does support compilation databases and so does cindex.py, the libclang python binding.
The main source of confusion might be that the compilation flags that libclang knows/uses are only a subset of all arguments that can be passed to the clang frontend. The compilation database is supported but does not work automatically: it must be loaded and queried manually. Something like this should work:
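#!/usr/bin/env python
import os
from clang import cindex
# `args` and get_nodes_in_file() are defined as in the question's script.
compilation_database_path = args.compilation_database.name
source_file_path = args.source_file.name
index = cindex.Index.create()
try:
    # Step 1: load the compilation database. fromDirectory() takes the
    # directory that contains compile_commands.json, not the file itself.
    compdb = cindex.CompilationDatabase.fromDirectory(
        os.path.dirname(os.path.abspath(compilation_database_path)))
    # Step 2: query compilation flags. Each command exposes its arguments;
    # the first one is the compiler executable, so it is skipped here.
    # Output and input-file arguments may need to be filtered out as well.
    commands = compdb.getCompileCommands(source_file_path)
    file_args = [arg for command in commands
                 for arg in list(command.arguments)[1:]]
    translation_unit = index.parse(source_file_path, file_args)
    file_nodes = get_nodes_in_file(translation_unit.cursor, source_file_path)
    print([p.spelling for p in file_nodes])
except cindex.CompilationDatabaseError:
    print('Could not load compilation flags for %s' % source_file_path)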

From what I can tell, libclang does not support the compilation database, but LibTooling does. To get around this, I took the path to the compile_commands.json as an argument and ended up parsing it myself to find the entry for the file of interest and its relevant include flags (the -I and -isystem options).
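A minimal sketch of that manual approach, assuming the database follows the usual JSON layout (a list of entries with a "file" key and either a "command" string or an "arguments" list). The function name include_flags_for is made up for illustration, and the file comparison is a plain string match, so real databases with absolute paths would need path normalization first.

```python
import json
import shlex


def include_flags_for(database_text, source_file):
    """Return the -I / -isystem flags recorded for source_file.

    database_text is the text content of a compile_commands.json file.
    """
    flags = []
    for entry in json.loads(database_text):
        if entry["file"] != source_file:
            continue
        # An entry carries either an "arguments" list or a "command" string.
        args = entry.get("arguments") or shlex.split(entry["command"])
        i = 0
        while i < len(args):
            arg = args[i]
            if arg in ("-I", "-isystem") and i + 1 < len(args):
                # Separated form: the path is the next argument.
                flags.extend([arg, args[i + 1]])
                i += 2
                continue
            if arg.startswith("-I") or arg.startswith("-isystem"):
                # Joined form such as -I/usr/include.
                flags.append(arg)
            i += 1
    return flags
```

The returned list can then be passed as the args of index.parse(), together with any language flags such as -std=c++11.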

The accepted answer seems to be outdated; at minimum, it did not work for me. I had to do this:
import clang.cindex
def main():
    index = clang.cindex.Index.create()
    compdb = clang.cindex.CompilationDatabase.fromDirectory(
        "dir/")
    source_file_path = 'path/to/file.cpp'
    commands = compdb.getCompileCommands(source_file_path)
    file_args = []
    for command in commands:
        for argument in command.arguments:
            file_args.append(argument)
    # Trim the arguments specific to my g++ command lines; adjust as needed.
    file_args = file_args[3:-3]
    print(file_args)
    translation_unit = index.parse(source_file_path, args=file_args)
    # GetDoxygenCommentTokens is my own helper (not shown here).
    comment_tokens = GetDoxygenCommentTokens(translation_unit)
if __name__ == "__main__":
    main()
Basically, I had to iterate over the commands and their arguments to build a flat list, and then strip some g++-specific flags.
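Slicing by fixed positions depends on how a particular build system orders its arguments. A somewhat less fragile sketch is to filter by meaning instead. Which arguments have to go is an assumption here (the compiler executable, -c, -o with its path, and the source file itself), and libclang_args is a hypothetical helper name, not part of cindex.

```python
def libclang_args(arguments, source_file):
    """Filter one compile command's arguments for use with index.parse()."""
    args = list(arguments)[1:]  # drop the compiler executable
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the output path that followed "-o".
            skip_next = False
            continue
        if arg == "-o":
            skip_next = True
            continue
        if arg == "-c" or arg == source_file:
            continue
        kept.append(arg)
    return kept
```

It would be called once per command, for example libclang_args(command.arguments, source_file_path). The joined form -ofile is not handled in this sketch.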

Related

Handling argparse conflicts

I import a Python module that already uses argparse, and I would like to use argparse in my own script as well. How should I go about doing this?
I'm receiving an "unrecognized arguments" error when using the following code and invoking the script with a -t flag:
Snippet:
#!/usr/bin/env python
....
import conflicting_module
import argparse
...
#################################
# Step 0: Configure settings... #
#################################
parser = argparse.ArgumentParser(description='Process command line options.')
parser.add_argument('--test', '-t')
Error:
unrecognized arguments: -t foobar
You need to guard the initialization code in your imported modules with
if __name__ == '__main__':
    ...
so that code such as argument parsing does not run on import. See What does if __name__ == "__main__": do?.
So, in your conflicting_module do
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process command line options in conflicting_module.py.')
    parser.add_argument('--conflicting', '-c')
    ...
instead of just creating the parser globally.
If the parsing in conflicting_module is a mandatory part of application configuration, consider using
args, rest = parser.parse_known_args()
in your main module and passing rest to conflicting_module, where you'd pass either None or rest to parse_args:
args = parser.parse_args(rest)
That is still a bit bad style and actually the classes and functions in conflicting_module would ideally receive parsed configuration arguments from your main module, which would be responsible for parsing them.
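To make the parse_known_args hand-off concrete, here is a small self-contained sketch. The function parse_module_args only stands in for whatever parsing conflicting_module does; both function names are invented for this example.

```python
import argparse


def parse_main_args(argv):
    """Parse the main script's options and return (namespace, leftovers)."""
    parser = argparse.ArgumentParser(description='Process command line options.')
    parser.add_argument('--test', '-t')
    # parse_known_args() returns unrecognized arguments instead of exiting.
    return parser.parse_known_args(argv)


def parse_module_args(argv):
    """Stand-in for the parsing done inside conflicting_module."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--conflicting', '-c')
    return parser.parse_args(argv)


args, rest = parse_main_args(['-t', 'foobar', '-c', 'value'])
module_args = parse_module_args(rest)
```

Here args.test holds 'foobar', rest holds the two leftover arguments, and the module-side parser sees only those.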

Automate compilation of protobuf specs into python classes in setup.py

I have a python project that uses google protobufs as a message format for communicating over the network. Generating python files from the .proto files is straight-forward using the protoc program. How can I configure my setup.py file for the project so that it automatically calls the protoc command?
In a similar situation, I ended up with this code (setup.py, but written in a way to allow extraction into some external Python module for reuse). Note that I took the generate_proto function and several ideas from the setup.py file of the protobuf source distribution.
from __future__ import print_function
import os
import shutil
import subprocess
import sys
from distutils.command.build_py import build_py as _build_py
from distutils.command.clean import clean as _clean
from distutils.debug import DEBUG
from distutils.dist import Distribution
from distutils.spawn import find_executable
from nose.commands import nosetests as _nosetests
from setuptools import setup
PROTO_FILES = [
    'goobuntu/proto/hoststatus.proto',
]
CLEANUP_SUFFIXES = [
    # filepath suffixes of files to remove on "clean" subcommand
    '_pb2.py',
    '.pyc',
    '.so',
    '.o',
    'dependency_links.txt',
    'entry_points.txt',
    'PKG-INFO',
    'top_level.txt',
    'SOURCES.txt',
    '.coverage',
    'protobuf/compiler/__init__.py',
]
CLEANUP_DIRECTORIES = [  # subdirectories to remove on "clean" subcommand
    # 'build'  # Note: the build subdirectory is removed if --all is set.
    'html-coverage',
]
if 'PROTOC' in os.environ and os.path.exists(os.environ['PROTOC']):
    protoc = os.environ['PROTOC']
else:
    protoc = find_executable('protoc')
def generate_proto(source):
    """Invoke Protocol Compiler to generate python from given source .proto."""
    if not os.path.exists(source):
        sys.stderr.write('Can\'t find required file: %s\n' % source)
        sys.exit(1)
    output = source.replace('.proto', '_pb2.py')
    if (not os.path.exists(output) or
            (os.path.getmtime(source) > os.path.getmtime(output))):
        if DEBUG:
            print('Generating %s' % output)
        if protoc is None:
            sys.stderr.write(
                'protoc not found. Is protobuf-compiler installed? \n'
                'Alternatively, you can point the PROTOC environment variable at a '
                'local version.')
            sys.exit(1)
        protoc_command = [protoc, '-I.', '--python_out=.', source]
        if subprocess.call(protoc_command) != 0:
            sys.exit(1)
class MyDistribution(Distribution):
    # Helper class to add the ability to set a few extra arguments
    # in setup():
    # protofiles : Protocol buffer definitions that need compiling
    # cleansuffixes : Filename suffixes (might be full names) to remove when
    #                 "clean" is called
    # cleandirectories : Directories to remove during cleanup
    # Also, the class sets the clean, build_py, test and nosetests cmdclass
    # options to defaults that compile protobufs, implement test as nosetests
    # and enables the nosetests command as well as using our cleanup class.
    def __init__(self, attrs=None):
        self.protofiles = []  # default to no protobuf files
        self.cleansuffixes = ['_pb2.py', '.pyc']  # default to clean generated files
        self.cleandirectories = ['html-coverage']  # clean out coverage directory
        cmdclass = attrs.get('cmdclass')
        if not cmdclass:
            cmdclass = {}
        # These should actually modify attrs['cmdclass'], as we assigned the
        # mutable dict to cmdclass without copying it.
        if 'nosetests' not in cmdclass:
            cmdclass['nosetests'] = MyNosetests
        if 'test' not in cmdclass:
            cmdclass['test'] = MyNosetests
        if 'build_py' not in cmdclass:
            cmdclass['build_py'] = MyBuildPy
        if 'clean' not in cmdclass:
            cmdclass['clean'] = MyClean
        attrs['cmdclass'] = cmdclass
        # call parent __init__ in old style class
        Distribution.__init__(self, attrs)
class MyClean(_clean):
    def run(self):
        try:
            cleandirectories = self.distribution.cleandirectories
        except AttributeError:
            sys.stderr.write(
                'Error: cleandirectories not defined. MyDistribution not used?')
            sys.exit(1)
        try:
            cleansuffixes = self.distribution.cleansuffixes
        except AttributeError:
            sys.stderr.write(
                'Error: cleansuffixes not defined. MyDistribution not used?')
            sys.exit(1)
        # Remove build and html-coverage directories if they exist
        for directory in cleandirectories:
            if os.path.exists(directory):
                if DEBUG:
                    print('Removing directory: "{}"'.format(directory))
                shutil.rmtree(directory)
        # Delete generated files in code tree.
        for dirpath, _, filenames in os.walk('.'):
            for filename in filenames:
                filepath = os.path.join(dirpath, filename)
                for i in cleansuffixes:
                    if filepath.endswith(i):
                        if DEBUG:
                            print('Removing file: "{}"'.format(filepath))
                        os.remove(filepath)
        # _clean is an old-style class, so super() doesn't work
        _clean.run(self)
class MyBuildPy(_build_py):
    def run(self):
        try:
            protofiles = self.distribution.protofiles
        except AttributeError:
            sys.stderr.write(
                'Error: protofiles not defined. MyDistribution not used?')
            sys.exit(1)
        for proto in protofiles:
            generate_proto(proto)
        # _build_py is an old-style class, so super() doesn't work
        _build_py.run(self)
class MyNosetests(_nosetests):
    def run(self):
        try:
            protofiles = self.distribution.protofiles
        except AttributeError:
            sys.stderr.write(
                'Error: protofiles not defined. MyDistribution not used?')
            # Exit here as MyBuildPy does; protofiles is unbound otherwise.
            sys.exit(1)
        for proto in protofiles:
            generate_proto(proto)
        # _nosetests is an old-style class, so super() doesn't work
        _nosetests.run(self)
setup(
    # MyDistribution automatically enables several extensions, including
    # the compilation of protobuf files.
    distclass=MyDistribution,
    ...
    tests_require=['nose'],
    protofiles=PROTO_FILES,
    cleansuffixes=CLEANUP_SUFFIXES,
    cleandirectories=CLEANUP_DIRECTORIES,
)
Here's the solution that I have used for setup.py. The only thing you need to keep in mind is that the version of the protoc compiler must be compatible with the installed protobuf version.
# Here you can specify the proto folder and output folder,
# or take them as parameters to the script.
protoc_command = [
    "python", "-m", "grpc_tools.protoc",
    f"--proto_path={proto_folder}",
    f"--python_out={output_folder}",
    f"--grpc_python_out={output_folder}",
]
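The command list above stops before naming any .proto files, which protoc needs as trailing arguments. A minimal sketch of a helper that completes it follows; build_protoc_command and its parameters are names invented for illustration, and it assumes the grpcio-tools package (which provides the grpc_tools.protoc module) is installed when the command is actually run.

```python
import sys


def build_protoc_command(proto_folder, output_folder, proto_files):
    """Build the argument list for invoking grpc_tools.protoc."""
    return [
        # sys.executable keeps protoc in the interpreter running setup.py.
        sys.executable, "-m", "grpc_tools.protoc",
        f"--proto_path={proto_folder}",
        f"--python_out={output_folder}",
        f"--grpc_python_out={output_folder}",
    ] + list(proto_files)
```

The result can be handed to subprocess.check_call() from a custom build_py command such as the one in the previous answer.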

Reliable way to get the "build" directory from within setup.py

Inside the setup.py script I need to create some temporary files for the installation. The natural place to put them would be the "build/" directory.
Is there a way to retrieve its path that works if installing via pypi, from source, easy_install, pip, ...?
Thanks a lot!
By default, distutils creates build/ in the current working directory, but this can be changed with the --build-base argument. distutils parses it while executing setup, and the parsed value is not accessible from outside, but you can extract it yourself:
import sys
build_base_long = [arg[12:].strip("= ") for arg in sys.argv if arg.startswith("--build-base")]
build_base_short = [arg[2:].strip(" ") for arg in sys.argv if arg.startswith("-b")]
build_base_arg = build_base_long or build_base_short
if build_base_arg:
    build_base = build_base_arg[0]
else:
    build_base = "."
This naive parser is still shorter than an optparse version with proper error handling for unknown flags. You can also use argparse's parser, which has a parse_known_args method.
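A sketch of the argparse variant, for comparison. The helper name find_build_base is made up, and the default of 'build' is an assumption matching what the build command reports when no option is given. Note that other commands may use -b with a different meaning, so this is only a heuristic.

```python
import argparse


def find_build_base(argv, default="build"):
    """Pick --build-base / -b out of argv, ignoring everything else."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-b", "--build-base", default=default)
    # Unknown arguments (commands, other flags) are returned, not rejected.
    known, _unknown = parser.parse_known_args(argv)
    return known.build_base
```

Inside setup.py it would be called as find_build_base(sys.argv[1:]). Unlike the slicing version, it also handles the option and its value being given as two separate arguments.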
distutils/setuptools provide an abstract Command class that users can use to add custom commands to their package's setup process. This is the same class that built-in setup commands like build and install are subclasses of.
Every class that is a subclass of the abstract Command class must implement the initialize_options, finalize_options, and run methods. The "options" these method names refer to are class attributes that are derived from command-line arguments provided by the user (they can also have default values). The initialize_options method is where a class's options are defined, the finalize_options method is where a class's option values are assigned, and the run method is where a class's option values are used to perform the function of the command.
Since command-line arguments may affect more than one command, some command classes may share options with other command classes. For example, all the distutils/setuptools build commands (build, build_py, build_clib, build_ext, and build_scripts) and the install command need to know where the build directory is. Instead of having every one of these command classes define and parse the same command-line arguments into the same options, the build command, which is the first of all these commands to be executed, defines and parses the command-line arguments and options, and all the other classes get the option values from the build command in their finalize_options method.
For example, the build class defines the build_base and build_lib options in its initialize_options method and then computes their values from the command-line arguments in its finalize_options method. The install class also defines the build_base and build_lib options in its initialize_options method, but it gets the values for these options from the build command in its finalize_options method.
You can use the same pattern to add a custom sub-command to the build command as follows (it would be similar for install):
import setuptools
from distutils.command.build import build
class BuildSomething(setuptools.Command):
    def initialize_options(self):
        # define the command's options
        self.build_base = None
        self.build_lib = None
    def finalize_options(self):
        # get the option values from the build command
        self.set_undefined_options('build',
                                   ('build_base', 'build_base'),
                                   ('build_lib', 'build_lib'))
    def run(self):
        # do something with the option values
        print(self.build_base)  # defaults to 'build'
        print(self.build_lib)
build_something_command = 'build_something'
class Build(build):
    def has_something(self):
        # update this to check if your build should run
        return True
    sub_commands = [(build_something_command, has_something)] + build.sub_commands
COMMAND_CLASS = {
    build_something_command: BuildSomething,  # custom command
    'build': Build  # override distutils/setuptools build command
}
setuptools.setup(cmdclass=COMMAND_CLASS)
Alternatively, you could just subclass one of the distutils/setuptools classes if you just want to extend its functionality and it already has the options you need
import setuptools
from setuptools.command.build_py import build_py
class BuildPy(build_py):
    def initialize_options(self):
        # keep the inherited options (including build_lib) defined
        build_py.initialize_options(self)
    def finalize_options(self):
        # let build_py fetch build_lib from the build command
        build_py.finalize_options(self)
    def run(self):
        # do something with the option values
        print(self.build_lib)  # inherited from build_py
        build_py.run(self)  # make sure the regular build_py still runs
COMMAND_CLASS = {
    'build_py': BuildPy  # override distutils/setuptools build_py command
}
setuptools.setup(cmdclass=COMMAND_CLASS)
Unfortunately, none of this is very well documented anywhere. I learned most of it from reading the distutils and setuptools source code. Any of the build*.py and install*.py files in either repository's command directory is informative. The abstract Command class is defined in distutils.
Perhaps something like this? It works in my case with Python 3.8:
...
from distutils.command.build import get_platform
import sys
import os
...
def configuration(parent_package='', top_path=None):
    config = Configuration('', parent_package, top_path)
    # add xxx library
    config.add_library('xxx', ['xxx/src/fil1.F90',
                               'xxx/src/file2.F90',
                               'xxx/src/file3.F90'],
                       language='f90')
    # check for the temporary build directory option
    _tempopt = None
    _chkopt = ('-t', '--build-temp')
    for _opt in _chkopt:
        if _opt in sys.argv:
            _i = sys.argv.index(_opt)
            if _i < len(sys.argv)-1:
                _tempopt = sys.argv[_i+1]
                break
    # check for the base directory option
    _buildopt = 'build'
    _chkopt = ('-b', '--build-base')
    for _opt in _chkopt:
        if _opt in sys.argv:
            _i = sys.argv.index(_opt)
            if _i < len(sys.argv)-1:
                _buildopt = sys.argv[_i+1]
                break
    if _tempopt is None:
        # works with python3 (check distutils/command/build.py)
        platform_specifier = ".%s-%d.%d" % (get_platform(), *sys.version_info[:2])
        _tempopt = '%s%stemp%s' % (_buildopt, os.sep, platform_specifier)
    # add yyy module (wraps fortran code in xxx library)
    config.add_extension('fastpost', sources=['yyy/src/fastpost.f90'],
                         f2py_options=['--quiet'],
                         libraries=['xxx'])
    # to access the mod files produced from fortran modules compilation, add
    # the temp build directory to the include directories of the configuration
    config.add_include_dirs(_tempopt)
    return config
setup(name="pysimpp",
      version="0.0.1",
      description="xxx",
      author="xxx",
      author_email="xxx#yyy",
      configuration=configuration,)

How to extract chains from a PDB file?

I would like to extract chains from pdb files. I have a file named pdb.txt which contains pdb IDs as shown below. The first four characters represent PDB IDs and last character is the chain IDs.
1B68A
1BZ4B
4FUTA
I would like to 1) read the file line by line
2) download the atomic coordinates of each chain from the corresponding PDB files.
3) save the output to a folder.
I used the following script to extract chains. But this code prints only A chains from pdb files.
for i in 1B68 1BZ4 4FUT
do
wget -c "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId="$i -O $i.pdb
grep ATOM $i.pdb | grep 'A' > $i\_A.pdb
done
The following BioPython code should suit your needs well.
It uses PDB.Select to only select the desired chains (in your case, one chain) and PDBIO() to create a structure containing just the chain.
import os
from Bio import PDB
class ChainSplitter:
    def __init__(self, out_dir=None):
        """ Create parsing and writing objects, specify output directory. """
        self.parser = PDB.PDBParser()
        self.writer = PDB.PDBIO()
        if out_dir is None:
            out_dir = os.path.join(os.getcwd(), "chain_PDBs")
        self.out_dir = out_dir
    def make_pdb(self, pdb_path, chain_letters, overwrite=False, struct=None):
        """ Create a new PDB file containing only the specified chains.
        Returns the path to the created file.
        :param pdb_path: full path to the crystal structure
        :param chain_letters: iterable of chain characters (case insensitive)
        :param overwrite: write over the output file if it exists
        """
        chain_letters = [chain.upper() for chain in chain_letters]
        # Input/output files
        (pdb_dir, pdb_fn) = os.path.split(pdb_path)
        pdb_id = pdb_fn[3:7]
        out_name = "pdb%s_%s.ent" % (pdb_id, "".join(chain_letters))
        out_path = os.path.join(self.out_dir, out_name)
        print("OUT PATH: %s" % out_path)
        plural = "s" if (len(chain_letters) > 1) else ""  # for printing
        # Skip PDB generation if the file already exists
        if (not overwrite) and (os.path.isfile(out_path)):
            print("Chain%s %s of '%s' already extracted to '%s'." %
                  (plural, ", ".join(chain_letters), pdb_id, out_name))
            return out_path
        print("Extracting chain%s %s from %s..." % (plural,
              ", ".join(chain_letters), pdb_fn))
        # Get structure, write new file with only given chains
        if struct is None:
            struct = self.parser.get_structure(pdb_id, pdb_path)
        self.writer.set_structure(struct)
        self.writer.save(out_path, select=SelectChains(chain_letters))
        return out_path
class SelectChains(PDB.Select):
    """ Only accept the specified chains when saving. """
    def __init__(self, chain_letters):
        self.chain_letters = chain_letters
    def accept_chain(self, chain):
        return (chain.get_id() in self.chain_letters)
if __name__ == "__main__":
    """ Parses PDB id's desired chains, and creates new PDB structures. """
    import sys
    if not len(sys.argv) == 2:
        print("Usage: $ python %s 'pdb.txt'" % __file__)
        sys.exit()
    pdb_textfn = sys.argv[1]
    pdbList = PDB.PDBList()
    splitter = ChainSplitter("/home/steve/chain_pdbs")  # Change me.
    with open(pdb_textfn) as pdb_textfile:
        for line in pdb_textfile:
            pdb_id = line[:4].lower()
            chain = line[4]
            pdb_fn = pdbList.retrieve_pdb_file(pdb_id)
            splitter.make_pdb(pdb_fn, chain)
One final note: don't write your own parser for PDB files. The format specification is ugly (really ugly), and the amount of faulty PDB files out there is staggering. Use a tool like BioPython that will handle parsing for you!
Furthermore, instead of using wget, you should use tools that interact with the PDB database for you. They take FTP connection limitations into account, the changing nature of the PDB database, and more. I should know - I updated Bio.PDBList to account for changes in the database. =)
It is probably a little late to answer this question, but I will give my opinion.
Biopython has some really handy features that help you achieve such a thing easily. You could use a custom selection class and then call it for each of the chains you want to select, inside a for loop over the original PDB file.
from Bio.PDB import Select, PDBIO
from Bio.PDB.PDBParser import PDBParser
class ChainSelect(Select):
    def __init__(self, chain):
        self.chain = chain
    def accept_chain(self, chain):
        if chain.get_id() == self.chain:
            return 1
        else:
            return 0
chains = ['A','B','C']
p = PDBParser(PERMISSIVE=1)
# pdb_file is the path to the original PDB file, defined elsewhere
structure = p.get_structure(pdb_file, pdb_file)
for chain in chains:
    pdb_chain_file = 'pdb_file_chain_{}.pdb'.format(chain)
    io_w_no_h = PDBIO()
    io_w_no_h.set_structure(structure)
    io_w_no_h.save('{}'.format(pdb_chain_file), ChainSelect(chain))
Let's say you have the following file, pdb_structures:
1B68A
1BZ4B
4FUTA
Then have your code in load_pdb.sh
while read name
do
    chain=${name:4:1}
    name=${name:0:4}
    wget -c "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId="$name -O $name.pdb
    # the chain identifier is column 22 of an ATOM record
    awk -v chain=$chain '$0~/^ATOM/ && substr($0,22,1)==chain {print}' $name.pdb > $name\_$chain.pdb
    # rm $name.pdb
done
Uncomment the rm line if you don't need the original PDB files.
execute
cat pdb_structures | ./load_pdb.sh
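For comparison, the same column-based filter can be sketched in plain Python. This is only an illustration of why a bare grep 'A' matches too much (residue names such as ALA contain an A as well), not a replacement for a real parser; extract_chain is a made-up helper name. It relies on the fixed-column PDB layout, where the chain identifier of an ATOM record sits in column 22.

```python
def extract_chain(pdb_lines, chain_id):
    """Return the ATOM records whose chain identifier equals chain_id."""
    # Column 22 of the fixed-width record is index 21 in Python.
    return [line for line in pdb_lines
            if line.startswith("ATOM")
            and len(line) > 21
            and line[21] == chain_id]
```

It takes any iterable of lines, so an open file object can be passed directly.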

Parse config files, environment, and command-line arguments, to get a single collection of options

Python's standard library has modules for configuration file parsing (configparser), environment variable reading (os.environ), and command-line argument parsing (argparse). I want to write a program that does all those, and also:
Has a cascade of option values:
default option values, overridden by
config file options, overridden by
environment variables, overridden by
command-line options.
Allows one or more configuration file locations specified on the command line with e.g. --config-file foo.conf, and reads that (either instead of, or additional to, the usual configuration file). This must still obey the above cascade.
Allows option definitions in a single place to determine the parsing behaviour for configuration files and the command line.
Unifies the parsed options into a single collection of option values for the rest of the program to access without caring where they came from.
Everything I need is apparently in the Python standard library, but they don't work together smoothly.
How can I achieve this with minimum deviation from the Python standard library?
UPDATE: I finally got around to putting this on pypi. Install latest version via:
pip install configargparser
Full help and instructions are here.
Original post
Here's a little something that I hacked together. Feel free to suggest improvements or report bugs in the comments:
import argparse
import ConfigParser
import os
def _identity(x):
    return x
_SENTINEL = object()
class AddConfigFile(argparse.Action):
    def __call__(self,parser,namespace,values,option_string=None):
        # I can never remember if `values` is a list all the time or if it
        # can be a scalar string; this takes care of both.
        if isinstance(values,basestring):
            parser.config_files.append(values)
        else:
            parser.config_files.extend(values)
class ArgumentConfigEnvParser(argparse.ArgumentParser):
    def __init__(self,*args,**kwargs):
        """
        Added 2 new keyword arguments to the ArgumentParser constructor:
        config --> List of filenames to parse for config goodness
        default_section --> name of the default section in the config file
        """
        self.config_files = kwargs.pop('config',[]) #Must be a list
        self.default_section = kwargs.pop('default_section','MAIN')
        self._action_defaults = {}
        argparse.ArgumentParser.__init__(self,*args,**kwargs)
    def add_argument(self,*args,**kwargs):
        """
        Works like `ArgumentParser.add_argument`, except that we've added an action:
        config: add a config file to the parser
        This also adds the ability to specify which section of the config file to pull the
        data from, via the `section` keyword. This relies on the (undocumented) fact that
        `ArgumentParser.add_argument` actually returns the `Action` object that it creates.
        We need this to reliably get `dest` (although we could probably write a simple
        function to do this for us).
        """
        if 'action' in kwargs and kwargs['action'] == 'config':
            kwargs['action'] = AddConfigFile
            kwargs['default'] = argparse.SUPPRESS
        # argparse won't know what to do with the section, so
        # we'll pop it out and add it back in later.
        #
        # We also have to prevent argparse from doing any type conversion,
        # which is done explicitly in parse_known_args.
        #
        # This way, we can reliably check whether argparse has replaced the default.
        #
        section = kwargs.pop('section', self.default_section)
        type = kwargs.pop('type', _identity)
        default = kwargs.pop('default', _SENTINEL)
        if default is not argparse.SUPPRESS:
            kwargs.update(default=_SENTINEL)
        else:
            kwargs.update(default=argparse.SUPPRESS)
        action = argparse.ArgumentParser.add_argument(self,*args,**kwargs)
        kwargs.update(section=section, type=type, default=default)
        self._action_defaults[action.dest] = (args,kwargs)
        return action
    def parse_known_args(self,args=None, namespace=None):
        # `parse_args` calls `parse_known_args`, so we should be okay with this...
        ns, argv = argparse.ArgumentParser.parse_known_args(self, args=args, namespace=namespace)
        config_parser = ConfigParser.SafeConfigParser()
        config_files = [os.path.expanduser(os.path.expandvars(x)) for x in self.config_files]
        config_parser.read(config_files)
        for dest,(args,init_dict) in self._action_defaults.items():
            type_converter = init_dict['type']
            default = init_dict['default']
            obj = default
            if getattr(ns,dest,_SENTINEL) is not _SENTINEL: # found on command line
                obj = getattr(ns,dest)
            else: # not found on commandline
                try: # get from config file
                    obj = config_parser.get(init_dict['section'],dest)
                except (ConfigParser.NoSectionError, ConfigParser.NoOptionError): # Nope, not in config file
                    try: # get from environment
                        obj = os.environ[dest.upper()]
                    except KeyError:
                        pass
            if obj is _SENTINEL:
                setattr(ns,dest,None)
            elif obj is argparse.SUPPRESS:
                pass
            else:
                setattr(ns,dest,type_converter(obj))
        return ns, argv
if __name__ == '__main__':
    fake_config = """
[MAIN]
foo:bar
bar:1
"""
    with open('_config.file','w') as fout:
        fout.write(fake_config)
    parser = ArgumentConfigEnvParser()
    parser.add_argument('--config-file', action='config', help="location of config file")
    parser.add_argument('--foo', type=str, action='store', default="grape", help="don't know what foo does ...")
    parser.add_argument('--bar', type=int, default=7, action='store', help="This is an integer (I hope)")
    parser.add_argument('--baz', type=float, action='store', help="This is an float(I hope)")
    parser.add_argument('--qux', type=int, default='6', action='store', help="this is another int")
    ns = parser.parse_args([])
    parser_defaults = {'foo':"grape",'bar':7,'baz':None,'qux':6}
    config_defaults = {'foo':'bar','bar':1}
    env_defaults = {"baz":3.14159}
    # This should be the defaults we gave the parser
    print(ns)
    assert ns.__dict__ == parser_defaults
    # This should be the defaults we gave the parser + config defaults
    d = parser_defaults.copy()
    d.update(config_defaults)
    ns = parser.parse_args(['--config-file','_config.file'])
    print(ns)
    assert ns.__dict__ == d
    os.environ['BAZ'] = "3.14159"
    # This should be the parser defaults + config defaults + env_defaults
    d = parser_defaults.copy()
    d.update(config_defaults)
    d.update(env_defaults)
    ns = parser.parse_args(['--config-file','_config.file'])
    print(ns)
    assert ns.__dict__ == d
    # This should be the parser defaults + config defaults + env_defaults + commandline
    commandline = {'foo':'3','qux':4}
    d = parser_defaults.copy()
    d.update(config_defaults)
    d.update(env_defaults)
    d.update(commandline)
    ns = parser.parse_args(['--config-file','_config.file','--foo=3','--qux=4'])
    print(ns)
    assert ns.__dict__ == d
    os.remove('_config.file')
TODO
This implementation is still incomplete. Here's a partial TODO list:
(easy) Interaction with parser defaults
(easy) If type conversion doesn't work, check against how argparse handles error messages
Conform to documented behavior
(easy) Write a function that figures out dest from args in add_argument, instead of relying on the Action object
(trivial) Write a parse_args function which uses parse_known_args. (e.g. copy parse_args from the cpython implementation to guarantee it calls parse_known_args.)
Less Easy Stuff…
I haven't tried any of this yet. It's unlikely—but still possible!—that it could just work…
(hard?) Mutual Exclusion
(hard?) Argument Groups (If implemented, these groups should get a section in the config file.)
(hard?) Sub Commands (Sub-commands should also get a section in the config file.)
The argparse module makes this not nuts, as long as you're happy with a config file that looks like a command line. (I think this is an advantage, because users will only have to learn one syntax.) Setting fromfile_prefix_chars to, for example, @, makes it so that
my_prog --foo=bar
is equivalent to
my_prog @baz.conf
if baz.conf contains
--foo
bar
You can even have your code look for foo.conf automatically by modifying argv:
if os.path.exists('foo.conf'):
    argv = ['@foo.conf'] + argv
args = argparser.parse_args(argv)
The format of these configuration files can be changed by subclassing ArgumentParser and overriding its convert_arg_line_to_args method.
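A small runnable sketch of this idea, using @ as the prefix character (the choice shown in the argparse documentation). The function name parse_with_config and the --foo option are placeholders for illustration.

```python
import argparse
import os


def parse_with_config(argv, config_path):
    """Parse argv, reading extra arguments from config_path if it exists."""
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    parser.add_argument("--foo", default="default")
    if os.path.exists(config_path):
        # Prepend the file so that options given later on the real
        # command line override the values read from the file.
        argv = ["@" + config_path] + list(argv)
    return parser.parse_args(argv)
```

By default, each line of the file becomes one argument, so the file holds --foo on one line and its value on the next.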
While I haven't tried it myself, there is the ConfigArgParse library, which states that it does most of the things you want:
A drop-in replacement for argparse that allows options to also be set via config files and/or environment variables.
There's a library that does exactly this, called configglue.
configglue is a library that glues together python's
optparse.OptionParser and ConfigParser.ConfigParser, so that you don't
have to repeat yourself when you want to export the same options to a
configuration file and a commandline interface.
It also supports environment variables.
There's also another library called ConfigArgParse which is
A drop-in replacement for argparse that allows options to also be set
via config files and/or environment variables.
You might be interested in PyCon talk about configuration by Łukasz Langa - Let Them Configure!
It seems the standard library doesn't address this, leaving each programmer to cobble configparser and argparse and os.environ all together in clunky ways.
To hit all those requirements, I would recommend writing your own library that uses both [opt|arg]parse and configparser for the underlying functionality.
Given the first two and the last requirement, I'd say you want:
Step one: Do a command line parser pass that only looks for the --config-file option.
Step two: Parse the config file.
Step three: set up a second command line parser pass using the output of the config file pass as the defaults.
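The three steps above can be sketched roughly as follows, in Python 3 with argparse and configparser. The option names (--host, --port), the [options] section name, and the hard-coded fallback defaults are assumptions made for the example, not part of the original answer.

```python
import argparse
import configparser


def parse_args(argv=None):
    # Step one: a minimal parser that only knows about --config-file.
    pre_parser = argparse.ArgumentParser(add_help=False)
    pre_parser.add_argument('--config-file', metavar='FILE')
    pre_args, _ = pre_parser.parse_known_args(argv)

    # Step two: read the config file, if one was given.
    defaults = {'host': 'localhost', 'port': '8080'}  # hypothetical options
    if pre_args.config_file:
        config = configparser.ConfigParser()
        config.read(pre_args.config_file)
        if config.has_section('options'):
            defaults.update(config.items('options'))

    # Step three: the full parser, with config values as its defaults.
    parser = argparse.ArgumentParser(parents=[pre_parser])
    parser.add_argument('--host')
    parser.add_argument('--port', type=int)
    # String defaults still go through type=int when the option is omitted.
    parser.set_defaults(**defaults)
    return parser.parse_args(argv)
```

With this layout the precedence is: command line, then config file, then the built-in defaults.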
The third requirement likely means you have to design your own option definition system to expose all the functionality of optparse and configparser that you care about, and write some plumbing to do conversions in between.
The Python standard library does not provide this, as far as I know. I solved this for myself by writing code to use optparse and ConfigParser to parse the command line and config files, and provide an abstraction layer on top of them. However, you would need this as a separate dependency, which from your earlier comment seems to be unpalatable.
If you want to look at the code I wrote, it's at http://liw.fi/cliapp/. It's integrated into my "command line application framework" library, since that's a large part of what the framework needs to do.
I tried something like this recently, using "optparse".
I set it up as a subclass of OptionParser, with a '--Store' and a '--Check' command.
The code below should pretty much have you covered. You just need to define your own 'load' and 'store' methods, which accept/return dictionaries, and you're pretty much set.
import optparse
import sys


class SmartParse(optparse.OptionParser):
    def __init__(self, defaults, *args, **kwargs):
        self.smartDefaults = defaults
        optparse.OptionParser.__init__(self, *args, **kwargs)
        fileGroup = optparse.OptionGroup(self, 'handle stored defaults')
        fileGroup.add_option(
            '-S', '--Store',
            dest='Action',
            action='store_const', const='Store',
            help='store command line settings'
        )
        fileGroup.add_option(
            '-C', '--Check',
            dest='Action',
            action='store_const', const='Check',
            help='check stored settings'
        )
        self.add_option_group(fileGroup)

    def parse_args(self, *args, **kwargs):
        (options, arguments) = optparse.OptionParser.parse_args(self, *args, **kwargs)
        action = options.__dict__.pop('Action')
        if action == 'Check':
            # --Check must not be combined with any other option.
            assert all(
                value is None
                for (key, value) in options.__dict__.iteritems()
            )
            print 'defaults:', self.smartDefaults
            print 'config:', self.load()
            sys.exit()
        elif action == 'Store':
            self.store(options.__dict__)
            sys.exit()
        else:
            config = self.load()
            commandline = dict(
                [key, val]
                for (key, val) in options.__dict__.iteritems()
                if val is not None
            )
            # Precedence: defaults < stored config < command line.
            result = {}
            result.update(self.smartDefaults)
            result.update(config)
            result.update(commandline)
            return result, arguments

    def load(self):
        # Override: return the stored settings as a dictionary.
        return {}

    def store(self, optionDict):
        # Override: persist the given dictionary of settings.
        print 'Storing:', optionDict
Here's a module I hacked together that reads command-line arguments, environment settings, ini files, and keyring values as well. It's also available in a gist.
"""
Configuration Parser

Configurable parser that will parse config files, environment variables,
keyring, and command-line arguments.

Example test.ini file:

    [defaults]
    gini=10

    [app]
    xini = 50

Example test.arg file:

    --xfarg=30

Example test.py file:

    import os
    import sys

    import config


    def main(argv):
        '''Test.'''
        options = [
            config.Option("xpos",
                          help="positional argument",
                          nargs='?',
                          default="all",
                          env="APP_XPOS"),
            config.Option("--xarg",
                          help="optional argument",
                          default=1,
                          type=int,
                          env="APP_XARG"),
            config.Option("--xenv",
                          help="environment argument",
                          default=1,
                          type=int,
                          env="APP_XENV"),
            config.Option("--xfarg",
                          help="#file argument",
                          default=1,
                          type=int,
                          env="APP_XFARG"),
            config.Option("--xini",
                          help="ini argument",
                          default=1,
                          type=int,
                          ini_section="app",
                          env="APP_XINI"),
            config.Option("--gini",
                          help="global ini argument",
                          default=1,
                          type=int,
                          env="APP_GINI"),
            config.Option("--karg",
                          help="secret keyring arg",
                          default=-1,
                          type=int),
        ]
        ini_file_paths = [
            '/etc/default/app.ini',
            os.path.join(os.path.dirname(os.path.abspath(__file__)),
                         'test.ini')
        ]

        # default usage
        conf = config.Config(prog='app', options=options,
                             ini_paths=ini_file_paths)
        conf.parse()
        print conf

        # advanced usage
        cli_args = conf.parse_cli(argv=argv)
        env = conf.parse_env()
        secrets = conf.parse_keyring(namespace="app")
        ini = conf.parse_ini(ini_file_paths)
        sources = {}
        if ini:
            for key, value in ini.iteritems():
                conf[key] = value
                sources[key] = "ini-file"
        if secrets:
            for key, value in secrets.iteritems():
                conf[key] = value
                sources[key] = "keyring"
        if env:
            for key, value in env.iteritems():
                conf[key] = value
                sources[key] = "environment"
        if cli_args:
            for key, value in cli_args.iteritems():
                conf[key] = value
                sources[key] = "command-line"
        print '\n'.join(['%s:\t%s' % (k, v) for k, v in sources.items()])


    if __name__ == "__main__":
        if config.keyring:
            config.keyring.set_password("app", "karg", "13")
        main(sys.argv)

Example results:

    $APP_XENV=10 python test.py api --xarg=2 #test.arg
    <Config xpos=api, gini=1, xenv=10, xini=50, karg=13, xarg=2, xfarg=30>
    xpos: command-line
    xenv: environment
    xini: ini-file
    karg: keyring
    xarg: command-line
    xfarg: command-line
"""
import argparse
import ConfigParser
import copy
import os
import sys

try:
    import keyring
except ImportError:
    keyring = None


class Option(object):
    """Holds a configuration option and the names and locations for it.

    Instantiate options using the same arguments as you would for an
    add_argument call in argparse. However, you have two additional kwargs
    available:

        env: the name of the environment variable to use for this option
        ini_section: the ini file section to look this value up from
    """

    def __init__(self, *args, **kwargs):
        self.args = args or []
        self.kwargs = kwargs or {}

    def add_argument(self, parser, **override_kwargs):
        """Add an option to an argparse parser."""
        kwargs = {}
        if self.kwargs:
            kwargs = copy.copy(self.kwargs)
            # argparse does not know these two extra kwargs; strip them.
            try:
                del kwargs['env']
            except KeyError:
                pass
            try:
                del kwargs['ini_section']
            except KeyError:
                pass
        kwargs.update(override_kwargs)
        parser.add_argument(*self.args, **kwargs)

    @property
    def type(self):
        """The type of the option.

        Should be a callable to parse options.
        """
        return self.kwargs.get("type", str)

    @property
    def name(self):
        """The name of the option as determined from the args."""
        for arg in self.args:
            if arg.startswith("--"):
                return arg[2:].replace("-", "_")
            elif arg.startswith("-"):
                continue
            else:
                return arg.replace("-", "_")

    @property
    def default(self):
        """The default for the option."""
        return self.kwargs.get("default")
class Config(object):
    """Parses configuration sources."""

    def __init__(self, options=None, ini_paths=None, **parser_kwargs):
        """Initialize with list of options.

        :param ini_paths: optional paths to ini files to look up values from
        :param parser_kwargs: kwargs used to init argparse parsers.
        """
        self._parser_kwargs = parser_kwargs or {}
        self._ini_paths = ini_paths or []
        self._options = copy.copy(options) or []
        self._values = {option.name: option.default
                        for option in self._options}
        self._parser = argparse.ArgumentParser(**parser_kwargs)
        self.pass_thru_args = []

    @property
    def prog(self):
        """Program name."""
        return self._parser.prog

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __contains__(self, key):
        return key in self._values

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def get(self, key, *args):
        """
        Return the value for key if it exists otherwise the default.
        """
        return self._values.get(key, *args)

    def __getattr__(self, attr):
        if attr in self._values:
            return self._values[attr]
        else:
            raise AttributeError("'config' object has no attribute '%s'"
                                 % attr)

    def build_parser(self, options, **override_kwargs):
        """Build an argparse parser containing the given options."""
        kwargs = copy.copy(self._parser_kwargs)
        kwargs.update(override_kwargs)
        if 'fromfile_prefix_chars' not in kwargs:
            kwargs['fromfile_prefix_chars'] = '#'
        parser = argparse.ArgumentParser(**kwargs)
        if options:
            for option in options:
                option.add_argument(parser)
        return parser
    def parse_cli(self, argv=None):
        """Parse command-line arguments into values."""
        if not argv:
            argv = sys.argv
        options = []
        for option in self._options:
            temp = Option(*option.args, **option.kwargs)
            # Suppress defaults so only explicitly given values are returned.
            temp.kwargs['default'] = argparse.SUPPRESS
            options.append(temp)
        parser = self.build_parser(options=options)
        parsed, extras = parser.parse_known_args(argv[1:])
        if extras:
            valid, pass_thru = self.parse_passthru_args(argv[1:])
            parsed, extras = parser.parse_known_args(valid)
            if extras:
                raise AttributeError("Unrecognized arguments: %s" %
                                     ' ,'.join(extras))
            self.pass_thru_args = pass_thru + extras
        return vars(parsed)

    def parse_env(self):
        """Read options from their configured environment variables."""
        results = {}
        for option in self._options:
            env_var = option.kwargs.get('env')
            if env_var and env_var in os.environ:
                value = os.environ[env_var]
                results[option.name] = option.type(value)
        return results

    def get_defaults(self):
        """Use argparse to determine and return dict of defaults."""
        parser = self.build_parser(options=self._options)
        parsed, _ = parser.parse_known_args([])
        return vars(parsed)

    def parse_ini(self, paths=None):
        """Parse config files and return configuration options.

        Expects array of files that are in ini format.
        :param paths: list of paths to files to parse (uses ConfigParse logic).
                      If not supplied, uses the ini_paths value supplied on
                      initialization.
        """
        results = {}
        config = ConfigParser.SafeConfigParser()
        config.read(paths or self._ini_paths)
        for option in self._options:
            ini_section = option.kwargs.get('ini_section')
            if ini_section:
                try:
                    value = config.get(ini_section, option.name)
                    results[option.name] = option.type(value)
                except ConfigParser.NoSectionError:
                    pass
        return results

    def parse_keyring(self, namespace=None):
        """Look up each option in the keyring under the given namespace."""
        results = {}
        if not keyring:
            return results
        if not namespace:
            namespace = self.prog
        for option in self._options:
            secret = keyring.get_password(namespace, option.name)
            if secret:
                results[option.name] = option.type(secret)
        return results

    def parse(self, argv=None):
        """Parse all sources; later updates override earlier ones."""
        defaults = self.get_defaults()
        args = self.parse_cli(argv=argv)
        env = self.parse_env()
        secrets = self.parse_keyring()
        ini = self.parse_ini()
        # Precedence: defaults < ini < keyring < environment < command line.
        results = defaults
        results.update(ini)
        results.update(secrets)
        results.update(env)
        results.update(args)
        self._values = results
        return self
    @staticmethod
    def parse_passthru_args(argv):
        """Handles arguments to be passed thru to a subprocess using '--'.

        :returns: tuple of two lists; args and pass-thru-args
        """
        if '--' in argv:
            dashdash = argv.index("--")
            if dashdash == 0:
                return argv[1:], []
            elif dashdash > 0:
                return argv[0:dashdash], argv[dashdash + 1:]
        return argv, []

    def __repr__(self):
        return "<Config %s>" % ', '.join([
            '%s=%s' % (k, v) for k, v in self._values.iteritems()])


def comma_separated_strings(value):
    """Handles comma-separated arguments passed in command-line."""
    return map(str, value.split(","))


def comma_separated_pairs(value):
    """Handles comma-separated key/values passed in command-line."""
    pairs = value.split(",")
    results = {}
    for pair in pairs:
        key, pair_value = pair.split('=')
        results[key] = pair_value
    return results
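The module does not show the two comma-separated helpers in use. Presumably they are meant to be passed as `type=` callables to an option; the sketch below assumes that, restates them for Python 3 so it is self-contained, and uses made-up option names (--tags, --labels) with plain argparse rather than the Config class.

```python
import argparse


def comma_separated_strings(value):
    """Split 'a,b,c' into ['a', 'b', 'c']."""
    return value.split(',')


def comma_separated_pairs(value):
    """Split 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    results = {}
    for pair in value.split(','):
        # Split on the first '=' only, so values may themselves contain '='.
        key, pair_value = pair.split('=', 1)
        results[key] = pair_value
    return results


parser = argparse.ArgumentParser()
parser.add_argument('--tags', type=comma_separated_strings, default=[])
parser.add_argument('--labels', type=comma_separated_pairs, default={})

args = parser.parse_args(['--tags', 'a,b,c', '--labels', 'env=prod,tier=web'])
```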
You can use ChainMap for this. Take a look at my example that I provided for in "Which is the best way to allow configuration options be overridden at the command line in Python?" SO question.
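For readers who don't follow the link, here is a minimal ChainMap sketch in Python 3. It is not the linked example: the option names, the APP_ environment prefix, and the load_settings helper are assumptions for illustration.

```python
import argparse
import os
from collections import ChainMap


def load_settings(argv=None, environ=None):
    defaults = {'color': 'red', 'user': 'guest'}  # hypothetical options

    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--user')
    parser.add_argument('-c', '--color')
    namespace = parser.parse_args(argv)
    # Keep only the options that were actually given on the command line.
    cli_args = {k: v for k, v in vars(namespace).items() if v is not None}

    environ = os.environ if environ is None else environ
    # Hypothetical convention: APP_COLOR in the environment sets 'color'.
    env_args = {k[len('APP_'):].lower(): v
                for k, v in environ.items() if k.startswith('APP_')}

    # Lookups search the maps left to right, so the command line wins,
    # then the environment, then the defaults.
    return ChainMap(cli_args, env_args, defaults)
```

ChainMap does not copy anything; it only layers the dictionaries, so the order in which they are passed is the precedence order.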
I built the confect library precisely to meet most of these needs.
It can load configuration files multiple times, through given file paths or module names.
It loads configurations from environment variables with a given prefix.
It can attach command line options to some click commands
(sorry, it's not argparse, but click is better and much more advanced. confect might support argparse in a future release).
Most importantly, confect loads Python configuration files, not JSON/YAML/TOML/INI. Just like an IPython profile file or a Django settings file, a Python configuration file is flexible and easier to maintain.
For more information, please check the README.rst in the project repository. Be aware that it supports only Python 3.6 and up.
Examples
Attaching command line options
import click
from proj_X.core import conf

@click.command()
@conf.click_options
def cli():
    click.echo(f'cache_expire = {conf.api.cache_expire}')

if __name__ == '__main__':
    cli()
It automatically creates a comprehensive help message with all properties and default values declared.
$ python -m proj_X.cli --help
Usage: cli.py [OPTIONS]
Options:
--api-cache_expire INTEGER [default: 86400]
--api-cache_prefix TEXT [default: proj_X_cache]
--api-url_base_path TEXT [default: api/v2/]
--db-db_name TEXT [default: proj_x]
--db-username TEXT [default: proj_x_admin]
--db-password TEXT [default: your_password]
--db-host TEXT [default: 127.0.0.1]
--help Show this message and exit.
Loading environment variables
It only needs one line to load environment variables
conf.load_envvars('proj_X')
