Keep ConfigParser output files sorted

Keep ConfigParser output files sorted - python

I've noticed with my source control that the content of the output files generated with ConfigParser is never in the same order. Sometimes sections will change place or options inside sections even without any modifications to the values.
Is there a way to keep things sorted in the configuration file so that I don't have to commit trivial changes every time I launch my application?

Looks like this was fixed in Python 3.1 and 2.7 with the introduction of ordered dictionaries:
The standard library now supports use
of ordered dictionaries in several
modules. The configparser module uses
them by default. This lets
configuration files be read, modified,
and then written back in their
original order.

If you want to take it a step further than Alexander Ljungberg's answer and also sort the sections and the contents of the sections you can use the following:
config = ConfigParser.ConfigParser({}, collections.OrderedDict)
config.read('testfile.ini')
# Order the content of each section alphabetically
for section in config._sections:
config._sections[section] = collections.OrderedDict(sorted(config._sections[section].items(), key=lambda t: t[0]))
# Order all sections alphabetically
config._sections = collections.OrderedDict(sorted(config._sections.items(), key=lambda t: t[0] ))
# Write ini file to standard output
config.write(sys.stdout)
This uses OrderdDict dictionaries (to keep ordering) and sorts the read ini file from outside ConfigParser by overwriting the internal _sections dictionary.

No. The ConfigParser library writes things out in dictionary hash order. (You can see this if you look at the source code.) There are replacements for this module that do a better job.
I will see if I can find one and add it here.
http://www.voidspace.org.uk/python/configobj.html#introduction is the one I was thinking of. It's not a drop-in replacement, but it is very easy to use.

ConfigParser is based on the ini file format, who in it's design is supposed to NOT be sensitive to order. If your config file format is sensitive to order, you can't use ConfigParser. It may also confuse people if you have an ini-type format that is sensitive to the order of the statements...

Related

Is it possible to import key-value pairs from one INI file to another INI file

I would like to import the key-value pairs of one INI file to another INI file so that whenever I make an update to the "parent" INI file, the changes are automatically applied to the "child" INI file as well.
Is this possible with INI files?
I understand that I could manipulate the config parser to achieve this behavior but I'm looking more for an import solution here.
Thank you!

Just to clarify: I assume what you want is to have an import-statement inside your ini-file, something like:
import other.ini
[new values]
key = value
color = green
...
Basically, a ini-file is just a map of keys to values, something like a dict in the form of a text file. They are deliberately kept rather simple.
Now, while importing another ini-file sounds like a really simple thing to do, it quickly comes with an entire series of problems with which other import- or inheritance mechanisms have to deal. What happens, for instance, if two ini-files import each other, or what happens if A imports B and C, and both B and C import D (so called diamond problem)—do you then import D twice? Hence, importing other ini-files is not quite as simple as one might expect, and therefore not a feature you put necessarily into a minimalistic design.
That being said, keep in mind that ini-file is really just a map and therefore an inert entity: it does not do anything at all. In order to read an ini-file, you will usually need a parser like Python's configparser, which reads the textual information and creates the actual map for you. Also, it is this parser that would have to do the importing of other files. Hence, the question is: is there a parser for ini-files that supports importing?
I am not aware of any such parser as part of a publicly available standard package (although I assume they do exist). You could, of course, write one yourself.
Perhaps the easiest thing to do is to add imports as a special key-value pair to your ini file, something like import=other.ini; another.ini and then have your program follow these 'links' and import whatever other file(s) it is referring to.
Or you go the path of C and write a preprocessor that looks for lines that start with something like #import other.ini in your ini-file, and then merges the other ini-file into your text before parsing everything.

strange output from yaml.dump

I've just started using yaml and I love it. However, the other day I came across a case that seemed really odd and I am not sure what is causing it. I have a list of file path locations and another list of file path destinations. I create a dictionary out of them and then use yaml to dump it out to read later (I work with artists and use yaml so that it is human readable as well).
sorry for the long lists:
source = ['/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr']
dest = ['/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_diff_diffuse_v0006.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskBurnt_diffuse_v0006.1031.exr']
dictionary = dict(zip(source, dest))
print yaml.dump(dictionary)
this is the output that I get:
{/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhaw
k_diff_diffuse_v0006.exr,
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v00
06/blackhawk_maskBurnt_diffuse_v0006.1031.exr,
? /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr
: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr}
It comes back in fine with a yaml.load, but this is not useful for artists to be able to edit if need be.

This is the first question in the FAQ.
By default, PyYAML chooses the style of a collection depending on whether it has nested collections. If a collection has nested collections, it will be assigned the block style. Otherwise it will have the flow style.
If you want collections to be always serialized in the block style, set the parameter default_flow_style of dump() to False.
So:
>>> print yaml.dump(dictionary, default_flow_style=False)
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_diff_diffuse_v0006.exr
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskBurnt_diffuse_v0006.1031.exr
? /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr
: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr
Still not exactly beautiful, but when you have strings longer than 80 characters as keys, it's about as good as you can reasonably expect.
If you model (part of) the filesystem hierarchy in your object hierarchy, or create aliases (or dynamic aliasers) for parts of the tree, etc., the YAML will look a lot nicer. But that's something you have to actually do at the object-model level; as far as YAML is concerned, those long paths full of repeated prefixes are just strings.

Elementtree setting attribute order

I am trying to write a python script to standardise generic XML files, used to configure websites and website forms. However to do this I would like to either maintain the original attribute ordering of the elements, or even better be able to rearrange them in a pre-defined way. Currently most xml parsers I have tried re-write the attribute order to be alpha-numeric. As these XML files are human read/written and maintained, this isn't too useful.
For example a generic element may look like this in the XML;
<Question QuestionRef="XXXXX" DataType="Integer" Text="Question Text" Availability="Shown" DefaultAnswer="X">
However once passed through elementtree and re-written to a new file this is changed to:
<Question Availability="Shown" DataType="Integer" DefaultAnswer="X" PartType="X" QuestionRef="XXXXX" Text="Question Text">
As the aim of the script is to standardise a large number of XML files in order to increase readability between colleagues and that the information contained within the element's attributes have varying levels of significance (Eg. QuestionRef is highly important), dicates that attributes need to be sensibly ordered.
I understand that python dicts (which attributes are stored in) are naturally unordered and XML specification states attribute ordering is insignificant, but this the human readability factor is the driving force behind the script.
In other questions (on Stack Overflow) similar to this one I have seen it remarked that pxdom can do this (question link: link), but I cannot find any mention of how it may to do this in pxdom documentation or using a google search. So is there some way to maintain an order of attributes or define it with current XML parsers? Preferably without resorting to hotpatching :)!
Any help anyone can provide would be greatly appreciated :).

Apply monkey patch as mentioned below::
in ElementTree.py file, there is a function named as _serialize_xml;
in this function; apply the below mentioned patch;
##for k, v in sorted(items): # remove the sorted here
for k, v in items:
if isinstance(k, QName):
k = k.text
if isinstance(v, QName):
v = qnames[v.text]
else:
v = _escape_attrib(v, encoding)
write(" %s=\"%s\"" % (qnames[k], v))
here; remove the sorted(items) and make it just items like i have done above.
Also to disable sorting based on namespace(because in above patch; sorting is still present when namespace is present for xml attribute; otherwise if namespace is not present; then above is working fine); so to do that, replace all {} with collections.OrderedDict() from ElementTree.py
Now you have all attributes in a order as you have added them to that xml element.
Before doing all of above; read the copyright message by Fredrik Lundh that is present in ElementTree.py

Elegant way to store dictionary permanently with Python?

Currently expensively parsing a file, which generates a dictionary of ~400 key, value pairs, which is seldomly updated. Previously had a function which parsed the file, wrote it to a text file in dictionary syntax (ie. dict = {'Adam': 'Room 430', 'Bob': 'Room 404'}) etc, and copied and pasted it into another function whose sole purpose was to return that parsed dictionary.
Hence, in every file where I would use that dictionary, I would import that function, and assign it to a variable, which is now that dictionary. Wondering if there's a more elegant way to do this, which does not involve explicitly copying and pasting code around? Using a database kind of seems unnecessary, and the text file gave me the benefit of seeing whether the parsing was done correctly before adding it to the function. But I'm open to suggestions.

Why not dump it to a JSON file, and then load it from there where you need it?
import json
with open('my_dict.json', 'w') as f:
json.dump(my_dict, f)
# elsewhere...
with open('my_dict.json') as f:
my_dict = json.load(f)
Loading from JSON is fairly efficient.
Another option would be to use pickle, but unlike JSON, the files it generates aren't human-readable so you lose out on the visual verification you liked from your old method.

Why mess with all these serialization methods? It's already written to a file as a Python dict (although with the unfortunate name 'dict'). Change your program to write out the data with a better variable name - maybe 'data', or 'catalog', and save the file as a Python file, say data.py. Then you can just import the data directly at runtime without any clumsy copy/pasting or JSON/shelve/etc. parsing:
from data import catalog

JSON is probably the right way to go in many cases; but there might be an alternative. It looks like your keys and your values are always strings, is that right? You might consider using dbm/anydbm. These are "databases" but they act almost exactly like dictionaries. They're great for cheap data persistence.
>>> import anydbm
>>> dict_of_strings = anydbm.open('data', 'c')
>>> dict_of_strings['foo'] = 'bar'
>>> dict_of_strings.close()
>>> dict_of_strings = anydbm.open('data')
>>> dict_of_strings['foo']
'bar'

If the keys are all strings, you can use the shelve module
A shelf is a persistent, dictionary-like object. The difference with
“dbm” databases is that the values (not the keys!) in a shelf can be
essentially arbitrary Python objects — anything that the pickle module
can handle. This includes most class instances, recursive data types,
and objects containing lots of shared sub-objects. The keys are
ordinary strings.
json would be a good choice if you need to use the data from other languages

If storage efficiency matters, use Pickle or CPickle(for execution performance gain). As Amber pointed out, you can also dump/load via Json. It will be human-readable, but takes more disk.

I suggest you consider using the shelve module since your data-structure is a mapping.
That was my answer to a similar question titled If I want to build a custom database, how could I? There's also a bit of sample code in another answer of mine promoting its use for the question How to get a object database?
ActiveState has a highly rated PersistentDict recipe which supports csv, json, and pickle output file formats. It's pretty fast since all three of those formats are implement in C (although the recipe itself is pure Python), so the fact that it reads the whole file into memory when it's opened might be acceptable.

JSON (or YAML, or whatever) serialisation is probably better, but if you're already writing the dictionary to a text file in python syntax, complete with a variable name binding, you could just write that to a .py file instead. Then that python file would be importable and usable as is. There's no need for the "function which returns a dictionary" approach, since you can directly use it as a global in that file. e.g.
# generated.py
please_dont_use_dict_as_a_variable_name = {'Adam': 'Room 430', 'Bob': 'Room 404'}
rather than:
# manually_copied.py
def get_dict():
return {'Adam': 'Room 430', 'Bob': 'Room 404'}
The only difference is that manually_copied.get_dict gives you a fresh copy of the dictionary every time, whereas generated.please_dont_use_dict_as_a_variable_name[1] is a single shared object. This may matter if you're modifying the dictionary in your program after retrieving it, but you can always use copy.copy or copy.deepcopy to create a new copy if you need to modify one independently of the others.
[1] dict, list, str, int, map, etc are generally viewed as bad variable names. The reason is that these are already defined as built-ins, and are used very commonly. So if you give something a name like that, at the least it's going to cause cognitive-dissonance for people reading your code (including you after you've been away for a while) as they have to keep in mind that "dict doesn't mean what it normally does here". It's also quite likely that at some point you'll get an infuriating-to-solve bug reporting that dict objects aren't callable (or something), because some piece of code is trying to use the type dict, but is getting the dictionary object you bound to the name dict instead.

on the JSON direction there is also something called simpleJSON. My first time using json in python the json library didnt work for me/ i couldnt figure it out. simpleJSON was...easier to use

How are arguments in a configuration file parsed regardless of position?

e.g. A configuration file can have
CFLAGS = "xyz"
CXXFLAGS = "xyz"
OR
CXXFLAGS = "xyz"
CFLAGS = "xyz"
Best implementation I could think of would be to just split the argument and feed the left side into a switch
for line in file
x = line.split("=")
switch(x[0])
case CFLAGS
do cflags
case CXXFLAGS
do cxxflags
But how do people who have way more experience than me do it? I know theres probably some open source programs who do this but I wouldn't even know where to look in their source for this.
I program mainly in python and C so implementations/pseudocode/whattolookup in both would be preferred although other languages are fine also.
Thanks in advance.
P.S. try to avoid saying any form of re, regex, regexp, regular expressions, or any derivative thereof in your answers unless its unavoidable :P.

In Python just use the ConfigParser module which will parse .ini-like configuration files for you.
Re implementing this yourself, I find it convenient to view configuration data as a kind of dictionary. This naturally translates to Python's dicts, so if I split the line to <key> = <value> I just go on and update:
confdict[key] = value
With this scheme, the order of the keys in the configuration file doesn't matter, just like it doesn't matter in the dictionary itself - as long as you can lookup values for keys, you're happy.
If you look under the hood of ConfigParser, for example (the relevant method is _read), you will find this is exactly what it does. The options are kept in a dictionary (one per section, because .ini configuration files have one level of hierarchy). Lines are parsed from the file using regular expressions and key, value pairs are added to the dictionary as I described above.
This is Python. In C, I imagine there are quite a few libraries for doing this, but implementing your own would follow exactly the same algorithm. You'd use some kind of associative array data structure for the dictionary (hash table, tree, or whatever, doesn't really matter) and do the same parsing & assigning.

As Eli Bendersky says, in Python you should just use the provided ConfigParser.
If you insist on doing it yourself, his method of storing the configuration as a dictionary is one I recommend. Another way is to map the options to functions which process the values and do something with them:
# Map the options to functions to handle them.
handlers = {
'CFLAGS': some_function,
'CXXFLAGS': other_function,
}
# Loop through each option.
for line in configuration:
# Get the option and value.
option, value = line.split('=')
option = option.strip().upper()
value = value.strip()
# Try to find a handler and process the option.
handler = handlers.get(option)
if handler:
handler(option, value)
else:
raise Exception('Unknown option.')
Obviously, the handler functions must be able to accept the option and value parameters you're passing it.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.