How are arguments in a configuration file parsed regardless of position?

e.g. A configuration file can have
CFLAGS = "xyz"
CXXFLAGS = "xyz"
OR
CXXFLAGS = "xyz"
CFLAGS = "xyz"
Best implementation I could think of would be to just split the argument and feed the left side into a switch
for line in file
    x = line.split("=")
    switch(x[0])
        case CFLAGS
            do cflags
        case CXXFLAGS
            do cxxflags
But how do people who have way more experience than me do it? I know there are probably some open source programs that do this, but I wouldn't even know where to look in their source for it.
I program mainly in python and C so implementations/pseudocode/whattolookup in both would be preferred although other languages are fine also.
Thanks in advance.
P.S. Try to avoid saying any form of re, regex, regexp, regular expressions, or any derivative thereof in your answers unless it's unavoidable :P.

In Python, just use the ConfigParser module (named configparser in Python 3), which will parse .ini-like configuration files for you.
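A minimal sketch of that suggestion, assuming Python 3 and its configparser module. The parser expects every option to live in a section, so the sketch prepends a hypothetical [build] header to sectionless text like the question's. The section name and the sample values are assumptions for illustration.

```python
import configparser

# Sample text in the style of the question; the values are placeholders.
text = 'CXXFLAGS = "xyz"\nCFLAGS = "abc"\n'

parser = configparser.ConfigParser()
# Keep option names exactly as written (the default lowercases them).
parser.optionxform = str
# configparser requires a section header, so add a hypothetical one.
parser.read_string("[build]\n" + text)

# Lookups are by name, so the order of lines in the file is irrelevant.
cflags = parser["build"]["CFLAGS"]
cxxflags = parser["build"]["CXXFLAGS"]
```

Note that the surrounding quotes are kept as part of each value; strip them yourself if the format treats them as delimiters.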
Re implementing this yourself, I find it convenient to view configuration data as a kind of dictionary. This naturally translates to Python's dicts, so if I split the line to <key> = <value> I just go on and update:
confdict[key] = value
With this scheme, the order of the keys in the configuration file doesn't matter, just like it doesn't matter in the dictionary itself - as long as you can look up values for keys, you're happy.
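As a rough sketch of the dictionary scheme described above: the function name parse_config, the '#' comment rule, and the quote stripping are assumptions for illustration, not part of any particular format.

```python
def parse_config(lines):
    """Collect 'key = value' lines into a dict; line order is irrelevant."""
    confdict = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and (by assumption) '#' comments.
        if not line or line.startswith("#"):
            continue
        # partition() splits on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        confdict[key.strip()] = value.strip().strip('"')
    return confdict


first = parse_config(['CFLAGS = "abc"', 'CXXFLAGS = "xyz"'])
second = parse_config(['CXXFLAGS = "xyz"', 'CFLAGS = "abc"'])
```

Both orderings produce the same dictionary, which is the point of the approach.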
If you look under the hood of ConfigParser, for example (the relevant method is _read), you will find this is exactly what it does. The options are kept in a dictionary (one per section, because .ini configuration files have one level of hierarchy). Lines are parsed from the file using regular expressions and key, value pairs are added to the dictionary as I described above.
This is Python. In C, I imagine there are quite a few libraries for doing this, but implementing your own would follow exactly the same algorithm. You'd use some kind of associative array data structure for the dictionary (hash table, tree, or whatever, doesn't really matter) and do the same parsing & assigning.

As Eli Bendersky says, in Python you should just use the provided ConfigParser.
If you insist on doing it yourself, his method of storing the configuration as a dictionary is one I recommend. Another way is to map the options to functions which process the values and do something with them:
# Map the options to functions to handle them.
handlers = {
    'CFLAGS': some_function,
    'CXXFLAGS': other_function,
}
# Loop through each option.
for line in configuration:
    # Get the option and value; split on the first '=' only, so values may contain '='.
    option, value = line.split('=', 1)
    option = option.strip().upper()
    value = value.strip()
    # Try to find a handler and process the option.
    handler = handlers.get(option)
    if handler:
        handler(option, value)
    else:
        raise Exception('Unknown option.')
Obviously, the handler functions must be able to accept the option and value parameters you're passing them.

Related

Is it possible to import key-value pairs from one INI file to another INI file

I would like to import the key-value pairs of one INI file to another INI file so that whenever I make an update to the "parent" INI file, the changes are automatically applied to the "child" INI file as well.
Is this possible with INI files?
I understand that I could manipulate the config parser to achieve this behavior but I'm looking more for an import solution here.
Thank you!

Just to clarify: I assume what you want is to have an import-statement inside your ini-file, something like:
import other.ini
[new values]
key = value
color = green
...
Basically, an ini-file is just a map of keys to values, something like a dict in the form of a text file. They are deliberately kept rather simple.
Now, while importing another ini-file sounds like a really simple thing to do, it quickly brings a whole series of problems that other import or inheritance mechanisms also have to deal with. What happens, for instance, if two ini-files import each other, or if A imports B and C, and both B and C import D (the so-called diamond problem): do you then import D twice? Hence, importing other ini-files is not quite as simple as one might expect, and therefore not a feature you would necessarily put into a minimalistic design.
That being said, keep in mind that an ini-file is really just a map and therefore an inert entity: it does not do anything at all. In order to read an ini-file, you will usually need a parser like Python's configparser, which reads the textual information and creates the actual map for you. It is also this parser that would have to do the importing of other files. Hence, the question is: is there a parser for ini-files that supports importing?
I am not aware of any such parser as part of a publicly available standard package (although I assume they do exist). You could, of course, write one yourself.
Perhaps the easiest thing to do is to add imports as a special key-value pair to your ini file, something like import=other.ini; another.ini and then have your program follow these 'links' and import whatever other file(s) it is referring to.
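A minimal sketch of that idea, assuming the imports live in a hypothetical [meta] section under an import key, separated by semicolons. The FILES dictionary stands in for files on disk so the sketch runs on its own. The function name load_with_imports is also an assumption.

```python
import configparser

# Hypothetical in-memory "files" so the sketch runs without touching disk.
FILES = {
    "parent.ini": "[colors]\ncolor = green\nsize = 1\n",
    "child.ini": "[meta]\nimport = parent.ini\n[colors]\nsize = 2\n",
}


def load_with_imports(name, files, seen=None):
    """Return {section: {key: value}} for `name`, merged over its imports."""
    seen = set() if seen is None else seen
    if name in seen:
        # Already loaded (cycle or diamond): skip instead of recursing.
        return {}
    seen.add(name)

    parser = configparser.ConfigParser()
    parser.read_string(files[name])

    merged = {}
    imports = parser.get("meta", "import", fallback="")
    for other in filter(None, (part.strip() for part in imports.split(";"))):
        for section, values in load_with_imports(other, files, seen).items():
            merged.setdefault(section, {}).update(values)

    # Values in the importing file override the imported ones.
    for section in parser.sections():
        if section == "meta":
            continue
        merged.setdefault(section, {}).update(parser[section])
    return merged


config = load_with_imports("child.ini", FILES)
```

The seen set is one simple answer to the mutual-import and diamond cases mentioned earlier: each file is loaded at most once.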
Or you go the path of C and write a preprocessor that looks for lines that start with something like #import other.ini in your ini-file, and then merges the other ini-file into your text before parsing everything.
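The preprocessor variant could be sketched like this, again with a hypothetical in-memory FILES dictionary instead of real files and an assumed function name preprocess. Since '#' starts a comment for configparser, a file that is parsed without preprocessing simply ignores the directive.

```python
import configparser

# Hypothetical in-memory "files" so the sketch runs without touching disk.
FILES = {
    "other.ini": "[colors]\ncolor = green\n",
    "main.ini": "#import other.ini\n[new values]\nkey = value\n",
}


def preprocess(name, files, seen=None):
    """Return the text of `name` with each '#import X' line replaced by X's text."""
    seen = set() if seen is None else seen
    if name in seen:
        return ""  # skip files that were already merged (cycles, diamonds)
    seen.add(name)
    out = []
    for line in files[name].splitlines():
        if line.startswith("#import "):
            target = line[len("#import "):].strip()
            out.append(preprocess(target, files, seen))
        else:
            out.append(line)
    return "\n".join(out)


parser = configparser.ConfigParser()
parser.read_string(preprocess("main.ini", FILES))
```

Because this is a purely textual merge, two files that define the same section would produce a duplicate section header, which configparser rejects by default. Handling that case is left out of the sketch.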

Dynamically update complex OrderedDict (based on yaml file)

I'm trying to build a piece of software that will rely on very dynamic configuration (or "ruleset", really). I've tried to capture it in the pseudocode:
import yaml
config = """
---
config:
  item1:
    thething: ${stages.current.variables.item1}
stages:
  dev:
    variables:
      item1: stuff
  prod:
    variables:
      item1: stuf2
"""
config_obj = yaml.safe_load(config)
current_stage = 'dev'
# Insert artificial "current stage" to allow var matching
stages = config_obj['stages']
stages['current'] = stages[current_stage]
updated_config_obj = replace_vars(config_obj)
The goal is to have the updated_config_obj replace all variable-types with the actual value, so for this example it should replace ${stages.current.variables.item1} with stuff. The current part is easily solved by copying whatever's the current stage into a current item in the same OrderedDict, but I'm still stumped by how to actually perform the replace. The config yaml can be quite large and is totally dependent on a plugin system so it must be dynamic.
Right now I'm looking at "walking" the entire object, checking for the existence of a $ on each "leaf" (indicating a variable) and performing a lookup back into the current object to "resolve" the variable, but somehow that seems overly complex. Another alternative is (I guess) to use Jinja2 templating on the "config string", with the parsed object as a lookup. Certainly doable but it somehow feels a little dirty.
I have the feeling that there should be a more elegant solution which can be done solely on the parsed object (without interacting with the string), but it escapes me.
Any pointers appreciated!

First, my two cents: try to avoid using any form of interpolation in your configuration file. This creates another layer of dependencies - one dependency for your program (the configuration file) and another dependency for your configuration file.
It's a slick solution at the moment, but consider that five years down the road some lowly developer might be staring down ${stages.current.variables.item1} for a month trying to figure out what this is, not understanding that this implicitly maps onto stages.dev. And then worse yet, some other developer comes along, and seeing that the floodgates of interpolation have been opened, they start using {{stages_dev}}, to mean that some value should be interpolated from the system's environment variables. And then some other developer starts using their own convention like {{!stagesdev!}}, which means that the value uses its own custom runtime interpolation, invoked in some obscure, downstream back-alley.
And then some consultant is hired to reverse-engineer the whole thing and now they are sailing the seas of spaghetti.
If you still want to do this, I'd recommend opening/parsing the configuration file into a dictionary (presumably using yaml.load()), then iterating through the whole thing, value by value, using a regex to find instances of \$\{(.*?)\}.
For each captured group, create an ordered list like:
# ["stages", "current", "variables", "item1"]
yaml_references = "stages.current.variables.item1".split(".")
Then, you could do something like:
# Stands in for the parsed configuration file.
yaml_config_dict = {"stages": {"current": {"variables": {"item1": "stuff"}}}}
interpolated_reference = yaml_config_dict
for y in yaml_references:
    # Descend one level per path component.
    interpolated_reference = interpolated_reference[y]
i = interpolated_reference
Now i should represent whatever ${stages.current.variables.item1} was pointing to in the context of the .yaml file and you should be able to do a string replace.
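Putting those steps together, a replace_vars function that works purely on the parsed object might look roughly like the sketch below. The helper name lookup and the sample data are assumptions. The sketch makes a single pass, so a referenced value that itself contains a ${...} reference is not resolved further, and an unknown path raises KeyError.

```python
import re

# Matches ${dotted.path} and captures the dotted path.
VAR = re.compile(r"\$\{([^}]+)\}")


def lookup(root, dotted):
    """Follow a dotted path such as 'stages.current.variables.item1'."""
    node = root
    for part in dotted.split("."):
        node = node[part]
    return node


def replace_vars(node, root=None):
    """Return a copy of `node` with ${a.b.c} references resolved against root."""
    root = node if root is None else root
    if isinstance(node, dict):
        return {key: replace_vars(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_vars(value, root) for value in node]
    if isinstance(node, str):
        return VAR.sub(lambda m: str(lookup(root, m.group(1))), node)
    return node


# Stand-in for the parsed YAML from the question.
config_obj = {
    "config": {"item1": {"thething": "${stages.current.variables.item1}"}},
    "stages": {"dev": {"variables": {"item1": "stuff"}}},
}
config_obj["stages"]["current"] = config_obj["stages"]["dev"]
updated = replace_vars(config_obj)
```

The original object is left untouched; the function builds and returns a new structure.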

strange output from yaml.dump

I've just started using yaml and I love it. However, the other day I came across a case that seemed really odd and I am not sure what is causing it. I have a list of file path locations and another list of file path destinations. I create a dictionary out of them and then use yaml to dump it out to read later (I work with artists and use yaml so that it is human readable as well).
sorry for the long lists:
source = ['/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr']
dest = ['/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_diff_diffuse_v0006.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskBurnt_diffuse_v0006.1031.exr']
dictionary = dict(zip(source, dest))
print(yaml.dump(dictionary))
this is the output that I get:
{/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_diff_diffuse_v0006.exr,
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskBurnt_diffuse_v0006.1031.exr,
? /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr
: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr}
It comes back in fine with a yaml.load, but this is not useful for artists to be able to edit if need be.

This is the first question in the FAQ.
By default, PyYAML chooses the style of a collection depending on whether it has nested collections. If a collection has nested collections, it will be assigned the block style. Otherwise it will have the flow style.
If you want collections to be always serialized in the block style, set the parameter default_flow_style of dump() to False.
So:
>>> print(yaml.dump(dictionary, default_flow_style=False))
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_diff_diffuse_v0006.exr
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskBurnt_diffuse_v0006.1031.exr
? /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr
: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr
Still not exactly beautiful, but when you have strings longer than 80 characters as keys, it's about as good as you can reasonably expect.
If you model (part of) the filesystem hierarchy in your object hierarchy, or create aliases (or dynamic aliasers) for parts of the tree, etc., the YAML will look a lot nicer. But that's something you have to actually do at the object-model level; as far as YAML is concerned, those long paths full of repeated prefixes are just strings.
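One way to model the hierarchy before dumping is sketched below, using only the standard library. The function name factor_paths, the layout keys, and the shortened sample paths are assumptions for illustration. It also assumes at least two files per list, since with a single file the common path would be the file itself.

```python
import posixpath


def factor_paths(source, dest):
    """Group a source -> dest file mapping under the shared directories."""
    src_dir = posixpath.commonpath(source)
    dst_dir = posixpath.commonpath(dest)
    return {
        "source_dir": src_dir,
        "dest_dir": dst_dir,
        "files": {
            posixpath.relpath(s, src_dir): posixpath.relpath(d, dst_dir)
            for s, d in zip(source, dest)
        },
    }


# Shortened, hypothetical paths in the style of the question.
source = [
    "/data/job/model/v026_03/blackhawk_diff.exr",
    "/data/job/model/v026_03/blackhawk_maskBurnt.1031.exr",
]
dest = [
    "/data/job/texture/v0006/blackhawk_diff_diffuse_v0006.exr",
    "/data/job/texture/v0006/blackhawk_maskBurnt_diffuse_v0006.1031.exr",
]
layout = factor_paths(source, dest)
```

Dumping layout instead of the flat dictionary gives short keys under files, with each long prefix written only once.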

Keep ConfigParser output files sorted

I've noticed with my source control that the content of the output files generated with ConfigParser is never in the same order. Sometimes sections will change place or options inside sections even without any modifications to the values.
Is there a way to keep things sorted in the configuration file so that I don't have to commit trivial changes every time I launch my application?

Looks like this was fixed in Python 3.1 and 2.7 with the introduction of ordered dictionaries:
The standard library now supports use of ordered dictionaries in several modules. The configparser module uses them by default. This lets configuration files be read, modified, and then written back in their original order.
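A small round trip illustrating that behaviour, assuming Python 3's configparser. The section and option names are made up for the example.

```python
import configparser
import io

# Deliberately non-alphabetical sections and options.
text = "[zeta]\nb = 1\na = 2\n\n[alpha]\nkey = value\n"

parser = configparser.ConfigParser()
parser.read_string(text)
parser["zeta"]["a"] = "3"  # modify a value in place

buffer = io.StringIO()
parser.write(buffer)
output = buffer.getvalue()
```

The sections and options in output appear in the same order as in the input, so only the changed value shows up in a diff.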

If you want to take it a step further than Alexander Ljungberg's answer and also sort the sections and the contents of the sections you can use the following:
import collections
import sys
import ConfigParser  # named configparser in Python 3
config = ConfigParser.ConfigParser({}, collections.OrderedDict)
config.read('testfile.ini')
# Order the content of each section alphabetically
for section in config._sections:
    config._sections[section] = collections.OrderedDict(sorted(config._sections[section].items(), key=lambda t: t[0]))
# Order all sections alphabetically
config._sections = collections.OrderedDict(sorted(config._sections.items(), key=lambda t: t[0]))
# Write ini file to standard output
config.write(sys.stdout)
This uses OrderedDict dictionaries (to keep ordering) and sorts the read ini file from outside ConfigParser by overwriting the internal _sections dictionary.
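If you would rather not touch the private _sections attribute, a similar result can be sketched with the public API by copying into a fresh parser in sorted order. This assumes Python 3, and the function name sorted_copy is made up. Note that values inherited from a [DEFAULT] section would be copied into each section by this sketch.

```python
import configparser
import io


def sorted_copy(parser):
    """Return a new parser with sections and options in alphabetical order."""
    result = configparser.ConfigParser()
    for section in sorted(parser.sections()):
        result.add_section(section)
        # raw=True copies values verbatim, without expanding interpolation.
        for option, value in sorted(parser.items(section, raw=True)):
            result.set(section, option, value)
    return result


parser = configparser.ConfigParser()
parser.read_string("[zeta]\nb = 1\na = 2\n[alpha]\nkey = value\n")

buffer = io.StringIO()
sorted_copy(parser).write(buffer)
output = buffer.getvalue()
```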

No. The ConfigParser library writes things out in dictionary hash order. (You can see this if you look at the source code.) There are replacements for this module that do a better job.
I will see if I can find one and add it here.
http://www.voidspace.org.uk/python/configobj.html#introduction is the one I was thinking of. It's not a drop-in replacement, but it is very easy to use.

ConfigParser is based on the ini file format, which by design is supposed NOT to be sensitive to order. If your config file format is sensitive to order, you can't use ConfigParser. It may also confuse people if you have an ini-type format that is sensitive to the order of the statements...

Selective merge of two or more data files

I have an executable whose input is contained in an ASCII file with format:
$ GENERAL INPUTS
$ PARAM1 = 123.456
PARAM2=456,789,101112
PARAM3(1)=123,456,789
PARAM4 =
1234,5678,91011E2
PARAM5(1,2)='STRING','STRING2'
$ NEW INSTANCE
NEW(1)=.TRUE.
PAR1=123
[More data here]
$ NEW INSTANCE
NEW(2)=.TRUE.
[etcetera]
In other words, some general inputs, and some parameter values for a number of new instances. The declaration of parameters is irregular; some numbers are separated by commas, others are in scientific notation, others are inside quotes, the spacing is not constant, etc.
The evaluation of some scenarios requires that I take the input of one "master" data file and copy the parameter data of, say, instances 2 through 6 to another data file which may already contain data for said instances (in which case data should be overwritten) and possibly others (data which should be left unchanged).
I have written a Flex lexer and a Bison parser; together they can eat a data file and store the parameters in memory. If I use them to open both files (master and "scenario"), it should not be too hard to selectively write to a third, new file the desired parameters (as in "general input from 'scenario'; instances 1 through 5 from 'master'; instances 6 through 9 from 'scenario'; ..."), save it, and delete the original scenario file.
Other information: (1) the files are highly sensitive, it is very important that the user is completely shielded from altering the master file; (2) the files are of manageable size (from 500K to 10M).
I have learned that what I can do in ten lines of code, some fellow here can do in two. How would you approach this problem? A Pythonic answer would make me cry. Seriously.

If you're already able to parse this format (I'd have tried it with pyparsing, but if you already have a working flex/bison solution, that will be just fine), and the parsed data fit well in memory, then you're basically there. You can represent what you read from each file as a simple object with a dict for "general input" and a list of dicts, one per instance (or probably better a dict of instances, with the keys being the instance-numbers, which may give you a bit more flexibility). Then, as you mentioned, you just selectively "update" (add or overwrite) some of the instances copied from the master into the scenario, write the new scenario file, and replace the old one with it.
To use the flex/bison code with Python you have several options -- make it into a DLL/so and access it with ctypes, or call it from a cython-coded extension, a SWIG wrapper, a Python C-API extension, or SIP, Boost, etc etc.
Suppose that, one way or another, you have a parser primitive that (e.g.) accepts an input filename, reads and parses that file, and returns a list of 2-string tuples, each of which is either of the following:
(paramname, paramvalue)
('$$$$', 'General Inputs')
('$$$$', 'New Instance')
just using '$$$$' as a kind of arbitrary marker. Then for the object representing all that you've read from a file you might have:
import re
instidre = re.compile(r'NEW\((\d+)\)')
class Afile(object):
    def __init__(self, filename):
        self.filename = filename
        self.geninput = dict()
        self.instances = dict()
    def feed_data(self, listoftuples):
        it = iter(listoftuples)
        assert next(it) == ('$$$$', 'General Inputs')
        # Everything up to the next marker is a general input.
        for name, value in it:
            if name == '$$$$': break
            self.geninput[name] = value
        else:  # no instances at all!
            return
        currinst = dict()
        for name, value in it:
            if name == '$$$$':
                # A new marker closes the current instance.
                self.finish_inst(currinst)
                currinst = dict()
                continue
            mo = instidre.match(name)
            if mo:
                # NEW(n)=.TRUE. carries the instance number n.
                assert value == '.TRUE.'
                name = '$$$INSTID$$$'
                value = mo.group(1)
            currinst[name] = value
        self.finish_inst(currinst)
    def finish_inst(self, adict):
        instid = adict.pop('$$$INSTID$$$')
        assert instid not in self.instances
        self.instances[instid] = adict
Sanity checking might be improved a bit, diagnosing anomalies more precisely, but net of error cases I think this is roughly what you want.
The merging just requires doing foo.instances[instid] = bar.instances[instid] for the required values of instid, where foo is the Afile instance for the scenario file and bar is the one for the master file -- that will overwrite or add as required.
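A minimal sketch of that merge on plain dictionaries shaped like the instances attribute (instance id mapped to a parameter dict). The function name merge_instances and the sample data are assumptions. Unlike the in-place assignment described above, this version copies everything, so neither input is modified, which fits the requirement that the master stay untouched.

```python
def merge_instances(scenario, master, instids):
    """Return scenario's instances with the chosen ones taken from master."""
    # Copy each parameter dict so neither input is modified.
    merged = {instid: dict(params) for instid, params in scenario.items()}
    for instid in instids:
        # Overwrite if present in the scenario, add otherwise.
        merged[instid] = dict(master[instid])
    return merged


# Hypothetical parsed data, keyed by instance id.
master = {"1": {"PAR1": "123"}, "2": {"PAR1": "456"}, "3": {"PAR1": "789"}}
scenario = {"1": {"PAR1": "000"}, "2": {"PAR1": "111"}}
result = merge_instances(scenario, master, ["2", "3"])
```

Here instance 1 keeps the scenario's values, instance 2 is overwritten from the master, and instance 3 is added.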
I'm assuming that to write out the newly changed scenario file you don't need to repeat all the formatting quirks the specific inputs might have (if you do, then those quirks will need to be recorded during parsing together with names and values), so simply looping on sorted(foo.instances) and writing each out also in sorted order (after writing the general stuff also in sorted order, and with appropriate $ this and that marker lines, and with proper translation of the '$$$INSTID$$$' entry, etc) should suffice.
