I would like to import the key-value pairs of one INI file to another INI file so that whenever I make an update to the "parent" INI file, the changes are automatically applied to the "child" INI file as well.
Is this possible with INI files?
I understand that I could manipulate the config parser to achieve this behavior but I'm looking more for an import solution here.
Thank you!
Just to clarify: I assume what you want is to have an import statement inside your ini-file, something like:
import other.ini
[new values]
key = value
color = green
...
Basically, an ini-file is just a map of keys to values, something like a dict in the form of a text file. They are deliberately kept rather simple.
Now, while importing another ini-file sounds like a really simple thing to do, it quickly brings up an entire series of problems that other import or inheritance mechanisms have to deal with. What happens, for instance, if two ini-files import each other? Or what happens if A imports B and C, and both B and C import D (the so-called diamond problem): do you then import D twice? Hence, importing other ini-files is not quite as simple as one might expect, and therefore not a feature you necessarily put into a minimalistic design.
That being said, keep in mind that an ini-file is really just a map and therefore an inert entity: it does not do anything at all. In order to read an ini-file, you will usually need a parser like Python's configparser, which reads the textual information and creates the actual map for you. It is also this parser that would have to do the importing of other files. Hence, the question is: is there a parser for ini-files that supports importing?
I am not aware of any such parser as part of a publicly available standard package (although I assume they do exist). You could, of course, write one yourself.
Perhaps the easiest thing to do is to add imports as a special key-value pair to your ini file, something like import=other.ini; another.ini and then have your program follow these 'links' and import whatever other file(s) it is referring to.
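For illustration, here is a minimal sketch of that first approach, assuming Python 3's configparser and an import key kept in [DEFAULT]; all file and key names are made up for the example:

import configparser

def load_with_imports(path, seen=None):
    # Guard against circular imports (A imports B, which imports A again).
    seen = seen if seen is not None else set()
    if path in seen:
        return {}
    seen.add(path)
    parser = configparser.ConfigParser()
    parser.read(path)
    merged = {}
    # Merge imported files first, so the importing file's values win.
    imports = parser.get('DEFAULT', 'import', fallback='')
    for other in filter(None, (p.strip() for p in imports.split(';'))):
        for section, values in load_with_imports(other, seen).items():
            merged.setdefault(section, {}).update(values)
    for section in parser.sections():
        merged.setdefault(section, {}).update(
            (k, v) for k, v in parser.items(section) if k != 'import')
    return merged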
Or you go the path of C and write a preprocessor that looks for lines that start with something like #import other.ini in your ini-file, and then merges the other ini-file into your text before parsing everything.
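A minimal sketch of such a preprocessor, with the #import directive and file names invented for the example:

def preprocess(path, seen=None):
    # Splice in the contents of any file named on a "#import" line.
    seen = seen if seen is not None else set()
    if path in seen:                      # break import cycles
        return ''
    seen.add(path)
    lines = []
    with open(path) as f:
        for line in f:
            if line.startswith('#import '):
                lines.append(preprocess(line[len('#import '):].strip(), seen))
            else:
                lines.append(line)
    return ''.join(lines)

# The merged text can then be fed to any ordinary parser, e.g.:
# parser.read_string(preprocess('child.ini'))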
I'm trying to build a piece of software that will rely on very dynamic configuration (or "ruleset", really). I've tried to capture it in the pseudocode:
"""
---
config:
item1:
thething: ${stages.current.variables.item1}
stages:
dev:
variables:
item1: stuff
prod:
variables:
item1: stuf2
"""
import yaml

config_obj = yaml.load(config)
current_stage = 'dev'
# Insert an artificial "current" stage to allow variable matching
config_obj['stages']['current'] = config_obj['stages'][current_stage]
updated_config_obj = replace_vars(config_obj)
The goal is to have updated_config_obj replace all variable references with their actual values, so in this example it should replace ${stages.current.variables.item1} with stuff. The current part is easily solved by copying whatever the current stage is into a current item in the same OrderedDict, but I'm still stumped by how to actually perform the replace. The config YAML can be quite large and is totally dependent on a plugin system, so it must be dynamic.
Right now I'm looking at "walking" the entire object, checking each "leaf" for a $ (indicating a variable) and performing a lookup back into the current object to "resolve" the variable, but somehow that seems overly complex. Another alternative is (I guess) to use Jinja2 templating on the "config string", with the parsed object as a lookup. Certainly doable, but it somehow feels a little dirty.
I have the feeling that there should be a more elegant solution which can be done solely on the parsed object (without interacting with the string), but it escapes me.
Any pointers appreciated!
First, my two cents: try to avoid using any form of interpolation in your configuration file. It creates another layer of dependencies: your program depends on the configuration file, and the configuration file in turn depends on the interpolation scheme.
It's a slick solution at the moment, but consider that five years down the road some lowly developer might be staring at ${stages.current.variables.item1} for a month trying to figure out what this is, not understanding that it implicitly maps onto stages.dev. And then, worse yet, some other developer comes along, and seeing that the floodgates of interpolation have been opened, they start using {{stages_dev}} to mean that some value should be interpolated from the system's environment variables. And then some other developer starts using their own convention like {{!stagesdev!}}, which means that the value uses its own custom runtime interpolation, invoked in some obscure, downstream back-alley.
And then some consultant is hired to reverse-engineer the whole thing and now they are sailing the seas of spaghetti.
If you still want to do this, I'd recommend opening/parsing the configuration file into a dictionary (presumably using yaml.load()), then iterating through the whole thing, value by value, using a regex to find instances of \$\{(.*)\}.
For each captured group, create an ordered list like:
# ["stages", "current", "variables", item1"]
yaml_references = ".".split("stages.current.variables.item1")
Then, you could do something like:
yaml_config_dict = "" # the parsed configuration file
interpolated_reference = None
for y in yaml_references:
interpolated_reference = yaml_config_dict[y]
i = interpolated_reference[0]
Now i should represent whatever ${stages.current.variables.item1} was pointing to in the context of the .yaml file, and you should be able to do a string replace.
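Putting it together, here's a hedged sketch of the recursive walk the question describes; replace_vars matches the name used in the pseudocode above, while the regex and the lookup helper are my own additions:

import re

VAR_RE = re.compile(r'\$\{([^}]*)\}')

def lookup(root, dotted):
    # Resolve "stages.current.variables.item1" against the parsed mapping.
    node = root
    for part in dotted.split('.'):
        node = node[part]
    return node

def replace_vars(node, root=None):
    # Walk dicts and lists; substitute ${...} in every string leaf.
    root = root if root is not None else node
    if isinstance(node, dict):
        return {k: replace_vars(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_vars(v, root) for v in node]
    if isinstance(node, str):
        return VAR_RE.sub(lambda m: str(lookup(root, m.group(1))), node)
    return node

Note that chained references (a variable resolving to yet another variable) would need repeated passes; this single pass covers the example above.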
I've just started using YAML and I love it. However, the other day I came across a case that seemed really odd, and I am not sure what is causing it. I have a list of file path sources and another list of file path destinations. I create a dictionary out of them and then use yaml to dump it out to read later (I work with artists and use YAML so that it is human-readable as well).
sorry for the long lists:
import yaml

source = ['/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr']
dest = ['/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_diff_diffuse_v0006.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskBurnt_diffuse_v0006.1031.exr']
dictionary = dict(zip(source, dest))
print yaml.dump(dictionary)
this is the output that I get:
{/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhaw
k_diff_diffuse_v0006.exr,
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v00
06/blackhawk_maskBurnt_diffuse_v0006.1031.exr,
? /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr
: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr}
It loads back in fine with yaml.load, but this is not something artists can usefully read or edit if need be.
This is the first question in the FAQ.
By default, PyYAML chooses the style of a collection depending on whether it has nested collections. If a collection has nested collections, it will be assigned the block style. Otherwise it will have the flow style.
If you want collections to be always serialized in the block style, set the parameter default_flow_style of dump() to False.
So:
>>> print yaml.dump(dictionary, default_flow_style=False)
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_diff_diffuse_v0006.exr
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskBurnt_diffuse_v0006.1031.exr
? /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr
: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr
Still not exactly beautiful, but when you have strings longer than 80 characters as keys, it's about as good as you can reasonably expect.
If you model (part of) the filesystem hierarchy in your object hierarchy, or create aliases (or dynamic aliasers) for parts of the tree, etc., the YAML will look a lot nicer. But that's something you have to actually do at the object-model level; as far as YAML is concerned, those long paths full of repeated prefixes are just strings.
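For instance, here's a hedged sketch of one way to do that at the object-model level, reusing dictionary from the question; the grouping rule (split on the final directory) is my own:

import os

nested = {}
for src, dst in dictionary.items():
    folder, name = os.path.split(src)
    # One mapping per directory, so each long prefix appears only once
    nested.setdefault(folder, {})[name] = dst
print yaml.dump(nested, default_flow_style=False)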
I am using SafeConfigParser; my configuration file includes a [DEFAULT] section, and I am using the code below to extract that part.
from ConfigParser import SafeConfigParser  # configparser in Python 3

parser = SafeConfigParser(allow_no_value=True)
parser.optionxform = str  # makes option names case sensitive
parser.read('config.ini')  # file name is illustrative
defaultAttributesDic = parser.defaults()
However, my DEFAULT section includes interpolated values such as:
A = 10000
B = %(A)s
But the problem is that defaults() returns the raw values, not the interpolated ones.
Why is that? When can it be useful? I don't understand the reason behind this decision.
I am using parser.items(section) to read the other sections, and that works fine; values are returned interpolated. Should I skip defaults() and use items("DEFAULT") instead? Please explain this to me.
defaults() is a method inherited from RawConfigParser, which doesn't support interpolation.
I think you should reserve the [DEFAULT] section for providing defaults to other sections instead of trying to "abuse" it as a "normal" section. The [DEFAULT] section has a special meaning and isn't even included in methods like sections() or has_section().
If you need a "default" section, just give it a name other than DEFAULT.
I'm going to implement a tokenizer in Python and I was wondering if you could offer some style advice?
I've implemented a tokenizer before in C and in Java so I'm fine with the theory, I'd just like to ensure I'm following pythonic styles and best practices.
Listing Token Types:
In Java, for example, I would have a list of fields like so:
public static final int TOKEN_INTEGER = 0;
But, obviously, there's no way (I think) to declare a constant variable in Python, so I could just replace this with normal variable declarations, but that doesn't strike me as a great solution, since the declarations could be altered.
Returning Tokens From The Tokenizer:
Is there a better alternative to simply returning a list of tuples, e.g.
[(TOKEN_INTEGER, 17), (TOKEN_STRING, "Sixteen")]?
Cheers,
Pete
There's an undocumented class in the re module called re.Scanner. It's very straightforward to use for a tokenizer:
import re

scanner = re.Scanner([
    (r"[0-9]+",  lambda scanner, token: ("INTEGER", token)),
    (r"[a-z_]+", lambda scanner, token: ("IDENTIFIER", token)),
    (r"[,.]+",   lambda scanner, token: ("PUNCTUATION", token)),
    (r"\s+", None),  # None == skip token.
])

results, remainder = scanner.scan("45 pigeons, 23 cows, 11 spiders.")
print results
will result in
[('INTEGER', '45'),
('IDENTIFIER', 'pigeons'),
('PUNCTUATION', ','),
('INTEGER', '23'),
('IDENTIFIER', 'cows'),
('PUNCTUATION', ','),
('INTEGER', '11'),
('IDENTIFIER', 'spiders'),
('PUNCTUATION', '.')]
I used re.Scanner to write a pretty nifty configuration/structured data format parser in only a couple hundred lines.
Python takes a "we're all consenting adults" approach to information hiding. It's OK to use variables as though they were constants, and trust that users of your code won't do something stupid.
In many situations, especially when parsing long input streams, you may find it more useful to implement your tokenizer as a generator function. That way you can easily iterate over all the tokens without needing lots of memory to build the list of tokens first.
For generators, see the original proposal (PEP 255) or other online docs.
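For instance, a minimal sketch of a generator-based tokenizer; the token patterns are invented for the example:

import re

def tokenize(stream):
    # Lazily yield (type, value) pairs instead of building a full list.
    for line in stream:
        for m in re.finditer(r'(?P<INTEGER>\d+)|(?P<IDENTIFIER>[a-z_]+)', line):
            yield (m.lastgroup, m.group())

# Tokens are produced one at a time, even for very large inputs:
# for token_type, value in tokenize(open('input.txt')):
#     ...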
Thanks for your help; I've started to bring these ideas together, and I've come up with the following. Is there anything terribly wrong with this implementation? (In particular, I'm concerned about passing a file object to the tokenizer.)
class Tokenizer(object):
    def __init__(self, file):
        self.file = file

    def __get_next_character(self):
        return self.file.read(1)

    def __peek_next_character(self):
        character = self.file.read(1)
        if character:  # at EOF read(1) returns "", so don't seek back
            self.file.seek(self.file.tell() - 1, 0)
        return character

    def __read_number(self):
        value = ""
        while self.__peek_next_character().isdigit():
            value += self.__get_next_character()
        return value

    def next_token(self):
        character = self.__peek_next_character()
        if character.isdigit():
            return self.__read_number()
"Is there a better alternative to just simply returning a list of tuples?"
Nope. It works really well.
"Is there a better alternative to just simply returning a list of tuples?"
That's the approach used by the "tokenize" module for parsing Python source code. Returning a simple list of tuples can work very well.
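For reference, a quick sketch of what that module produces (Python 3 interface; the sample source string is arbitrary):

import io
import tokenize

# Each token is a named tuple (type, string, start, end, line).
for tok in tokenize.generate_tokens(io.StringIO('x = 17').readline):
    print(tok.type, tok.string)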
I have recently built a tokenizer too, and ran into some of your issues.
Token types are declared as "constants", i.e. variables with ALL_CAPS names, at the module level. For example,
_INTEGER = 0x0007
_FLOAT = 0x0008
_VARIABLE = 0x0009
and so on. I have used an underscore in front of the name to point out that those fields are somehow "private" to the module, but I really don't know if this is typical or advisable, or even how Pythonic it is. (Also, I'll probably ditch numbers in favour of strings, because they are much more readable during debugging.)
Tokens are returned as named tuples.
from collections import namedtuple
Token = namedtuple('Token', ['value', 'type'])
# so that e.g. somewhere in a function/method I can write...
t = Token(n, _INTEGER)
# ...and return it properly
I have used named tuples because the tokenizer's client code (e.g. the parser) seems a little clearer while using names (e.g. token.value) instead of indexes (e.g. token[0]).
Finally, I've noticed that sometimes, especially when writing tests, I prefer to pass a string to the tokenizer instead of a file object. I call it a "reader", and have a specific method to open it and let the tokenizer access it through the same interface.
def open_reader(self, source):
    """
    Produces a file object from source.
    The source can be either a file object already, or a string.
    """
    if hasattr(source, 'read'):
        return source
    else:
        from io import StringIO
        return StringIO(source)
When I start something new in Python, I usually look first for existing modules or libraries to use. There's a 90%+ chance that something is already available.
For tokenizers and parsers this is certainly so. Have you looked at PyParsing?
I've implemented a tokenizer for a C-like programming language. What I did was to split up the creation of tokens into two layers:
a surface scanner: this one actually reads the text and uses regular expressions to split it up into only the most primitive tokens (operators, identifiers, numbers, ...); it yields tuples (tokenname, scannedstring, startpos, endpos).
a tokenizer: this consumes the tuples from the first layer, turning them into token objects (named tuples would do as well, I think). Its purpose is to detect long-range dependencies in the token stream, particularly strings (with their opening and closing quotes) and comments (with their opening and closing lexemes; yes, I wanted to retain comments!), and coerce them into single tokens. The resulting stream of token objects is then returned to a consuming parser.
Both are generators. The benefits of this approach were:
Reading of the raw text is done only in the most primitive way, with simple regexps - fast and clean.
The second layer is already implemented as a primitive parser, to detect string literals and comments - re-use of parser technology.
You don't have to strain the surface scanner with complex detections.
But the real parser gets tokens on the semantic level of the language to be parsed (again strings, comments).
I feel quite happy with this layered approach.
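To make the shape of this concrete, here is a much-reduced sketch of the two layers; the token set and the one-character quote handling are invented for the example (the real version would also handle comments and escapes):

import re

TOKEN_RE = re.compile(
    r'(?P<NUMBER>\d+)|(?P<NAME>\w+)|(?P<QUOTE>")|(?P<OP>[+\-*/=])'
    r'|(?P<WS>\s+)|(?P<OTHER>.)')

def surface_scan(text):
    # Layer 1: primitive (tokenname, scannedstring, startpos, endpos) tuples.
    for m in TOKEN_RE.finditer(text):
        yield (m.lastgroup, m.group(), m.start(), m.end())

def tokenize(text):
    # Layer 2: coalesce everything between quotes into one STRING token.
    stream = surface_scan(text)
    for name, s, start, end in stream:
        if name == 'WS':
            continue                      # whitespace matters only in strings
        elif name == 'QUOTE':
            parts = []
            for name2, s2, _, end2 in stream:
                if name2 == 'QUOTE':      # long-range dependency: closing quote
                    end = end2
                    break
                parts.append(s2)
            yield ('STRING', ''.join(parts), start, end)
        else:
            yield (name, s, start, end)

# list(tokenize('x = "a b" + 42'))
# -> [('NAME', 'x', 0, 1), ('OP', '=', 2, 3), ('STRING', 'a b', 4, 9),
#     ('OP', '+', 10, 11), ('NUMBER', '42', 12, 14)]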
I'd turn to the excellent Text Processing in Python by David Mertz.
This being a late answer: there is now something in the official documentation, Writing a tokenizer with the re standard library. This content is in the Python 3 documentation and isn't in the Py 2.7 docs, but it is still applicable to older Pythons.
It shows short code, easy setup, and writing a generator, as several answers here have proposed.
If the docs are not Pythonic, I don't know what is :-)
"Is there a better alternative to just simply returning a list of tuples"
I had to implement a tokenizer, but it required a more complex approach than a list of tuples, so I implemented a class for each token. You can then return a list of class instances, or, if you want to save resources, return something implementing the iterator interface and generate the next token as you progress through the parsing.
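A hedged sketch of that idea, with invented token classes and patterns:

import re

class Token(object):
    def __init__(self, text, position):
        self.text = text
        self.position = position

class Integer(Token): pass
class Identifier(Token): pass

def tokenize(text):
    # Yield token instances lazily instead of building the whole list.
    for m in re.finditer(r'\d+|[a-z_]+', text):
        cls = Integer if m.group().isdigit() else Identifier
        yield cls(m.group(), m.start())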