Possible Duplicate:
Python Config Parser (Duplicate Key Support)
I'm trying to read an INI format project file in Python. The file contains duplicate keys (having unique values) within a section. For example, one of the sections looks like this:
[Source Files]
Source="file1.c"
Source="file2.c"
Source="file3.c"
If I read this using the following code:
import configparser
config = configparser.ConfigParser(strict=False)
config.read("project/file/name")
print(config.get("Source Files", "Source"))
the result is:
"file3.c"
Is there any way to get a list of all the values for the key Source instead? I'm open to using some other method to parse the file.
Note that I cannot change the file format.
I ended up inheriting from the RawConfigParser class to implement this feature. In case someone else is interested in this, here's the code I'm using:
import configparser
import sys


class ConfigParserMultiOpt(configparser.RawConfigParser):
    """ConfigParser allowing duplicate keys. Duplicate values are collected in a tuple"""

    def __init__(self):
        configparser.RawConfigParser.__init__(self, empty_lines_in_values=False, strict=False)

    def _read(self, fp, fpname):
        """Parse a sectioned configuration file.

        Each section in a configuration file contains a header, indicated by
        a name in square brackets (`[]'), plus key/value options, indicated by
        `name' and `value' delimited with a specific substring (`=' or `:' by
        default).

        Values can span multiple lines, as long as they are indented deeper
        than the first line of the value. Depending on the parser's mode, blank
        lines may be treated as parts of multiline values or ignored.

        Configuration files may include comments, prefixed by specific
        characters (`#' and `;' by default). Comments may appear on their own
        in an otherwise empty line or may be entered in lines holding values or
        section names.
        """
        elements_added = set()
        cursect = None                        # None, or a dictionary
        sectname = None
        optname = None
        lineno = 0
        indent_level = 0
        e = None                              # None, or an exception
        for lineno, line in enumerate(fp, start=1):
            comment_start = None
            # strip inline comments
            for prefix in self._inline_comment_prefixes:
                index = line.find(prefix)
                if index == 0 or (index > 0 and line[index-1].isspace()):
                    comment_start = index
                    break
            # strip full line comments
            for prefix in self._comment_prefixes:
                if line.strip().startswith(prefix):
                    comment_start = 0
                    break
            value = line[:comment_start].strip()
            if not value:
                if self._empty_lines_in_values:
                    # add empty line to the value, but only if there was no
                    # comment on the line
                    if (comment_start is None and
                            cursect is not None and
                            optname and
                            cursect[optname] is not None):
                        cursect[optname].append('')  # newlines added at join
                else:
                    # empty line marks end of value
                    indent_level = sys.maxsize
                continue
            # continuation line?
            first_nonspace = self.NONSPACECRE.search(line)
            cur_indent_level = first_nonspace.start() if first_nonspace else 0
            if (cursect is not None and optname and
                    cur_indent_level > indent_level):
                cursect[optname].append(value)
            # a section header or option header?
            else:
                indent_level = cur_indent_level
                # is it a section header?
                mo = self.SECTCRE.match(value)
                if mo:
                    sectname = mo.group('header')
                    if sectname in self._sections:
                        if self._strict and sectname in elements_added:
                            raise configparser.DuplicateSectionError(
                                sectname, fpname, lineno)
                        cursect = self._sections[sectname]
                        elements_added.add(sectname)
                    elif sectname == self.default_section:
                        cursect = self._defaults
                    else:
                        cursect = self._dict()
                        self._sections[sectname] = cursect
                        self._proxies[sectname] = configparser.SectionProxy(self, sectname)
                        elements_added.add(sectname)
                    # So sections can't start with a continuation line
                    optname = None
                # no section header in the file?
                elif cursect is None:
                    raise configparser.MissingSectionHeaderError(fpname, lineno, line)
                # an option line?
                else:
                    mo = self._optcre.match(value)
                    if mo:
                        optname, vi, optval = mo.group('option', 'vi', 'value')
                        if not optname:
                            e = self._handle_error(e, fpname, lineno, line)
                        optname = self.optionxform(optname.rstrip())
                        if (self._strict and
                                (sectname, optname) in elements_added):
                            raise configparser.DuplicateOptionError(sectname, optname, fpname, lineno)
                        elements_added.add((sectname, optname))
                        # This check is fine because the OPTCRE cannot
                        # match if it would set optval to None
                        if optval is not None:
                            optval = optval.strip()
                            # Check if this optname already exists
                            if (optname in cursect) and (cursect[optname] is not None):
                                # If it does, convert it to a tuple if it isn't already one
                                if not isinstance(cursect[optname], tuple):
                                    cursect[optname] = tuple(cursect[optname])
                                cursect[optname] = cursect[optname] + tuple([optval])
                            else:
                                cursect[optname] = [optval]
                        else:
                            # valueless option handling
                            cursect[optname] = None
                    else:
                        # a non-fatal parsing error occurred. set up the
                        # exception but keep going. the exception will be
                        # raised at the end of the file and will contain a
                        # list of all bogus lines
                        e = self._handle_error(e, fpname, lineno, line)
        # if any parsing errors occurred, raise an exception
        if e:
            raise e
        self._join_multiline_values()
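The _read function is copy-pasted from configparser.py; the only change I made was adding the if condition after the optval = optval.strip() line. ConfigParserMultiOpt returns the values of a key that is duplicated within a section as a tuple (a key that appears only once still comes back as a plain string). Because _read is a private method, this override is tied to the Python version it was copied from.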
I'm new to Python, so if anyone has suggestions on improving the code above, I'm all ears!
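A lighter-weight option, sketched here for Python 3's configparser, avoids copying _read altogether: pass a dict_type whose __setitem__ extends an existing list instead of replacing it. While reading, the parser stores each value as a one-element list and only joins multi-line values with newlines at the end, so repeated keys come out as one newline-separated string that can be split back apart. The names MultiValueDict and get_all are hypothetical, and the trick relies on internal (undocumented) parser behaviour, so it is worth re-testing on each Python version.

```python
import configparser


class MultiValueDict(dict):
    """Dict that extends an existing list value instead of overwriting it.

    configparser stores each freshly read value as a one-element list,
    so repeated keys in one section accumulate in the same list.
    """

    def __setitem__(self, key, value):
        if isinstance(value, list) and isinstance(self.get(key), list):
            self[key].extend(value)
        else:
            super().__setitem__(key, value)


def get_all(parser, section, option):
    """Return every value stored for a repeated option as a list."""
    return parser.get(section, option).splitlines()


# strict=False is required, otherwise the parser raises DuplicateOptionError
parser = configparser.RawConfigParser(dict_type=MultiValueDict, strict=False)
parser.read_string(
    '[Source Files]\n'
    'Source="file1.c"\n'
    'Source="file2.c"\n'
    'Source="file3.c"\n'
)
print(get_all(parser, "Source Files", "Source"))
```

The quotes around each file name are part of the raw values, exactly as in the single-value result above.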
Related
I recently had to complete a coding challenge for a company: merge three CSV files into one, keyed on the first attribute of each (the same attributes were repeated across all the files).
I wrote the code and sent it to them, but they said it took 2 minutes to run. That was odd, because it ran in 10 seconds on my machine, which had the same processor, 16 GB of RAM and an SSD as well: a very similar environment.
I optimised it and resubmitted it. This time they said it took 11 seconds on an Ubuntu machine, but still 100 seconds on Windows 10.
Another peculiar thing: when I profiled it with the profile module, it seemed to run forever, and I had to terminate it after 450 seconds. I switched to cProfile, which recorded a run of 7 seconds.
EDIT: The exact formulation of the problem is
Write a console program to merge the files provided in a timely and
efficient manner. File paths should be supplied as arguments so that
the program can be evaluated on different data sets. The merged file
should be saved as CSV; use the id column as the unique key for
merging; the program should do any necessary data cleaning and error
checking.
Feel free to use any language you’re comfortable with – only
restriction is no external libraries as this defeats the purpose of
the test. If the language provides CSV parsing libraries (like
Python), please avoid using them as well as this is a part of the
test.
Without further ado here's the code:
#!/usr/bin/python3
import sys
from multiprocessing import Pool

HEADERS = ['id']


def csv_tuple_quotes_valid(a_tuple):
    """
    checks if the quotes in each attribute of an entry (i.e. a tuple) agree with the csv format
    returns True or False
    """
    for attribute in a_tuple:
        in_quotes = False
        attr_len = len(attribute)
        skip_next = False
        for i in range(0, attr_len):
            if not skip_next and attribute[i] == '\"':
                if i < attr_len - 1 and attribute[i + 1] == '\"':
                    skip_next = True
                    continue
                elif i == 0 or i == attr_len - 1:
                    in_quotes = not in_quotes
                else:
                    return False
            else:
                skip_next = False
        if in_quotes:
            return False
    return True


def check_and_parse_potential_tuple(to_parse):
    """
    receives a string and returns an array of the attributes of the csv line
    if the string was not a valid csv line, then returns False
    """
    a_tuple = []
    attribute_start_index = 0
    to_parse_len = len(to_parse)
    in_quotes = False
    i = 0
    #iterate through the string (line from the csv)
    while i < to_parse_len:
        current_char = to_parse[i]
        #this works the following way: if we meet a quote ("), it must be in one
        #of five cases: "" | ", | ," | "\0 | (start_of_string)"
        #in case we are inside a quoted attribute (i.e. "123"), then commas are ignored
        #the following code also extracts the tuples' attributes
        if current_char == '\"':
            if i == 0 or (to_parse[i - 1] == ',' and not in_quotes):  # (start_of_string)" and ," case
                #not including the quote in the next attr
                attribute_start_index = i + 1
                #starting a quoted attr
                in_quotes = True
            elif i + 1 < to_parse_len:
                if to_parse[i + 1] == '\"':  # "" case
                    i += 1  #skip the next " because it is part of a ""
                elif to_parse[i + 1] == ',' and in_quotes:  # ", case
                    a_tuple.append(to_parse[attribute_start_index:i].strip())
                    #not including the quote and comma in the next attr
                    attribute_start_index = i + 2
                    in_quotes = False  #the quoted attr has ended
                    #skip the next comma - we know what it is for
                    i += 1
                else:
                    #since we cannot have a random " in the middle of an attr
                    return False
            elif i == to_parse_len - 1:  # "\0 case
                a_tuple.append(to_parse[attribute_start_index:i].strip())
                #reached end of line, so no more attr's to extract
                attribute_start_index = to_parse_len
                in_quotes = False
            else:
                return False
        elif current_char == ',':
            if not in_quotes:
                a_tuple.append(to_parse[attribute_start_index:i].strip())
                attribute_start_index = i + 1
        i += 1
    #in case the last attr was left empty or unquoted
    if attribute_start_index < to_parse_len or (not in_quotes and to_parse[-1] == ','):
        a_tuple.append(to_parse[attribute_start_index:])
    #line ended while parsing; i.e. a quote was opened but not closed
    if in_quotes:
        return False
    return a_tuple


def parse_tuple(to_parse, no_of_headers):
    """
    parses a string and returns an array with no_of_headers number of headers
    raises an error if the string was not a valid CSV line
    """
    #get rid of the newline at the end of every line
    to_parse = to_parse.strip()
    # return to_parse.split(',') #if we assume the data is in a valid format
    #the following checking of the format of the data increases the execution
    #time by a factor of 2; if the data is known to be valid, uncomment the return line above
    #if there are more commas than fields, then we must take into consideration
    #how the quotes parse and then extract the attributes
    if to_parse.count(',') + 1 > no_of_headers:
        result = check_and_parse_potential_tuple(to_parse)
        if result:
            a_tuple = result
        else:
            raise TypeError('Error while parsing CSV line %s. The quotes do not parse' % to_parse)
    else:
        a_tuple = to_parse.split(',')
        if not csv_tuple_quotes_valid(a_tuple):
            raise TypeError('Error while parsing CSV line %s. The quotes do not parse' % to_parse)
    #if the format is correct but fewer data fields were provided
    #the following works faster than an if statement that checks the length of a_tuple
    try:
        a_tuple[no_of_headers - 1]
    except IndexError:
        raise TypeError('Error while parsing CSV line %s. Unknown reason' % to_parse)
    #this replaces the use of my own hashtables to store the duplicated values for the attributes
    for i in range(1, no_of_headers):
        a_tuple[i] = sys.intern(a_tuple[i])
    return a_tuple


def read_file(path, file_number):
    """
    reads the csv file and returns (dict, int)
    the dict is the mapping of id's to attributes
    the integer is the number of attributes (headers) for the csv file
    """
    global HEADERS
    try:
        file = open(path, 'r')
    except FileNotFoundError as e:
        print("error in %s:\n%s\nexiting..." % (path, e))
        sys.exit(1)
    main_table = {}
    headers = file.readline().strip().split(',')
    no_of_headers = len(headers)
    HEADERS.extend(headers[1:])  #keep the headers from the file
    lines = file.readlines()
    file.close()
    args = []
    for line in lines:
        args.append((line, no_of_headers))
    #pool is a pool of worker processes parsing the lines in parallel
    with Pool() as workers:
        try:
            all_tuples = workers.starmap(parse_tuple, args, 1000)
        except TypeError as e:
            print('Error in file %s:\n%s\nexiting thread...' % (path, e.args))
            sys.exit(1)
    for a_tuple in all_tuples:
        #add quotes to key if needed
        key = a_tuple[0] if a_tuple[0][0] == '\"' else ('\"%s\"' % a_tuple[0])
        main_table[key] = a_tuple[1:]
    return (main_table, no_of_headers)


def merge_files():
    """
    produces a file called merged.csv
    """
    global HEADERS
    no_of_files = len(sys.argv) - 1
    processed_files = [None] * no_of_files
    for i in range(0, no_of_files):
        processed_files[i] = read_file(sys.argv[i + 1], i)
    out_file = open('merged.csv', 'w+')
    merged_str = ','.join(HEADERS)
    all_keys = {}
    #this is to ensure that we include all keys in the final file.
    #even those that are missing from some files and present in others
    for processed_file in processed_files:
        all_keys.update(processed_file[0])
    for key in all_keys:
        merged_str += '\n%s' % key
        for i in range(0, no_of_files):
            (main_table, no_of_headers) = processed_files[i]
            try:
                for attr in main_table[key]:
                    merged_str += ',%s' % attr
            except KeyError:
                print('NOTE: no values found for id %s in file \"%s\"' % (key, sys.argv[i + 1]))
                merged_str += ',' * (no_of_headers - 1)
    out_file.write(merged_str)
    out_file.close()


if __name__ == '__main__':
    # merge_files()
    import cProfile
    cProfile.run('merge_files()')
    # import time
    # start = time.time()
    # print(time.time() - start);
Here is the profiler report I got on Windows.
EDIT: The rest of the csv data provided is here. Pastebin was taking too long to process the files, so...
It might not be the best code, and I know that, but my question is: what slows Windows down so much that doesn't slow Ubuntu down? The merge_files() function takes the longest, with 94 seconds spent in the function itself, not counting calls to other functions, and I can't see any obvious reason why it is so slow.
Thanks
EDIT: Note: we both ran the code on the same dataset.
It turns out that Windows and Linux handle a very long, repeatedly growing string very differently. When I moved out_file.write(merged_str) inside the outer for loop (for key in all_keys:) and stopped appending everything to one merged_str, it ran in 11 seconds as expected. I don't know enough about either OS's memory management to explain the difference with certainty. One plausible cause, which I have not verified, is that CPython grows the string in place on merged_str += ... by calling realloc(); on Linux, realloc can usually extend a large block without copying it, while on Windows each growth may copy the whole string, which makes the loop quadratic.
Either way, writing each line as it is built is the more robust approach, because it is unreasonable to keep a 30 MB string in memory.
Funnily enough, I initially ran both writing strategies a few times on my Linux machine, and the one with the large string seemed faster, so I stuck with it. I guess you never know.
Here's the modified code
out_file.write(','.join(HEADERS) + '\n')  # header row, written before the per-key lines
for key in all_keys:
    merged_str = '%s' % key
    for i in range(0, no_of_files):
        (main_table, no_of_headers) = processed_files[i]
        try:
            for attr in main_table[key]:
                merged_str += ',%s' % attr
        except KeyError:
            print('NOTE: no values found for id %s in file \"%s\"' % (key, sys.argv[i + 1]))
            merged_str += ',' * (no_of_headers - 1)
    out_file.write(merged_str + '\n')
out_file.close()
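Going one step further, each row can be assembled as a list of fields and joined once, so no string is ever grown piece by piece. A minimal sketch, assuming the same data layout that read_file() returns (a dict from quoted id to a list of attribute strings, plus the number of non-id columns per file). The function name write_merged is hypothetical and, unlike the code above, it does not print NOTE messages for missing ids.

```python
import io


def write_merged(out, headers, tables, widths):
    """Write a merged CSV: one line per id, fields joined once per row.

    headers: column names, id column first
    tables:  one dict per input file, mapping id -> list of attribute strings
    widths:  number of non-id columns each file contributes
    """
    out.write(','.join(headers) + '\n')
    # a dict keeps first-seen order and drops duplicate ids
    all_keys = {}
    for table in tables:
        all_keys.update(dict.fromkeys(table))
    for key in all_keys:
        parts = [key]
        for table, width in zip(tables, widths):
            # a file without this id contributes empty fields
            parts.extend(table.get(key, [''] * width))
        out.write(','.join(parts) + '\n')


buf = io.StringIO()
write_merged(buf, ['id', 'a', 'b'],
             [{'"k1"': ['1']}, {'"k1"': ['x'], '"k2"': ['y']}],
             [1, 1])
print(buf.getvalue(), end='')
```

With a real file, out would be the object returned by open('merged.csv', 'w'); the text file's own buffering then batches the many small writes.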
When I run your solution on Ubuntu 16.04 with the three given files, it seems to take ~8 seconds to complete. The only modification I made was to uncomment the timing code at the bottom and use it.
$ python3 dimitar_merge.py file1.csv file2.csv file3.csv
NOTE: no values found for id "aaa5d09b-684b-47d6-8829-3dbefd608b5e" in file "file2.csv"
NOTE: no values found for id "38f79a49-4357-4d5a-90a5-18052ef03882" in file "file2.csv"
NOTE: no values found for id "766590d9-4f5b-4745-885b-83894553394b" in file "file2.csv"
8.039648056030273
$ python3 dimitar_merge.py file1.csv file2.csv file3.csv
NOTE: no values found for id "38f79a49-4357-4d5a-90a5-18052ef03882" in file "file2.csv"
NOTE: no values found for id "766590d9-4f5b-4745-885b-83894553394b" in file "file2.csv"
NOTE: no values found for id "aaa5d09b-684b-47d6-8829-3dbefd608b5e" in file "file2.csv"
7.78482985496521
I rewrote my first attempt without using csv from the standard library and am now getting times of ~4.3 seconds.
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.332579612731934
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.305467367172241
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.27345871925354
This is my solution code (lettuce_merge.py):
from collections import defaultdict


def split_row(csv_row):
    return [col.strip('"') for col in csv_row.rstrip().split(',')]


def merge_csv_files(files):
    file_headers = []
    merged_headers = []
    for i, file in enumerate(files):
        current_header = split_row(next(file))
        unique_key, *current_header = current_header
        if i == 0:
            merged_headers.append(unique_key)
        merged_headers.extend(current_header)
        file_headers.append(current_header)

    result = defaultdict(lambda: [''] * (len(merged_headers) - 1))
    for file_header, file in zip(file_headers, files):
        for line in file:
            key, *values = split_row(line)
            for col_name, col_value in zip(file_header, values):
                result[key][merged_headers.index(col_name) - 1] = col_value
        file.close()

    quotes = '"{}"'.format
    with open('lettuce_merged.csv', 'w') as f:
        f.write(','.join(quotes(a) for a in merged_headers) + '\n')
        for key, values in result.items():
            f.write(','.join(quotes(b) for b in [key] + values) + '\n')


if __name__ == '__main__':
    from argparse import ArgumentParser, FileType
    from time import time

    parser = ArgumentParser()
    parser.add_argument('files', nargs='*', type=FileType('r'))
    args = parser.parse_args()

    start_time = time()
    merge_csv_files(args.files)
    print(time() - start_time)
I'm sure this code could be optimized even further but sometimes just seeing another way to solve a problem can help spark new ideas.
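The split_row above strips quotes but splits on every comma, so a quoted field that contains a comma would be broken apart, and the challenge requires handling that without the csv module. A minimal quote-aware replacement with the same name, as a sketch; it assumes the common CSV convention that a doubled quote ("") inside a quoted field stands for a literal quote, which the challenge text does not spell out.

```python
def split_row(csv_row):
    """Split one CSV line on commas that are outside double quotes.

    A doubled quote ("") inside a quoted field is an escaped quote.
    Raises ValueError if a quoted field is never closed.
    """
    line = csv_row.rstrip('\r\n')
    if '"' not in line:
        return line.split(',')  # fast path: no quoting to worry about
    fields, field = [], []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    field.append('"')  # escaped quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)
        elif ch == '"':
            in_quotes = True  # opening quote
        elif ch == ',':
            fields.append(''.join(field))
            field = []
        else:
            field.append(ch)
        i += 1
    if in_quotes:
        raise ValueError('unterminated quoted field: %r' % csv_row)
    fields.append(''.join(field))
    return fields


print(split_row('a,"b,c",d\n'))
```

The fast path keeps the common unquoted case as cheap as the original one-liner; only lines that actually contain a quote pay for the character-by-character scan.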
I have a text file, which is structured as follows:
segmentA {
content Aa
content Ab
content Ac
....
}
segmentB {
content Ba
content Bb
content Bc
......
}
segmentC {
content Ca
content Cb
content Cc
......
}
I know how to search for certain strings in the whole text file, but how can I restrict the search to a particular segment, for example "segmentC"? I need something like a regular expression to tell the script:
If the text begins with "segmentC {", search for a certain string until the first "}" appears.
Does anyone have an idea?
Thanks in advance!
Not a RegEx solution ...but would do the work!
def SearchStuff(lines, sstr):
    i = 0
    # lines still end with '\n', so strip() before comparing with '}'
    while i < len(lines) and lines[i].strip() != '}':
        # Do stuffff .....for e.g.
        if 'Ca' in lines[i]:
            return lines[i]
        i += 1

def main(search_str):
    f = open('file.txt', 'r')
    lines = f.readlines()
    f.close()
    for line in lines:
        if search_str in line:
            index = lines.index(line)
            break
    lines = lines[index+1:]
    print(SearchStuff(lines, search_str))

search_str = 'segmentC' #set this string accordingly
main(search_str)
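Since the question asks for a regular expression, here is a minimal sketch of that route with Python's re module: one pattern captures everything between the segment header and the first closing brace, and the captured block is then searched line by line. The helper name search_in_segment is hypothetical, and the sketch assumes segments are not nested, so the first "}" really closes the block.

```python
import re


def search_in_segment(text, segment, needle):
    """Return the stripped lines inside `segment { ... }` that contain needle."""
    # ^ with re.M anchors at a line start; re.S lets .*? run across newlines
    # up to the first closing brace
    pattern = re.compile(r'^\s*' + re.escape(segment) + r'\s*\{(.*?)\}',
                         re.M | re.S)
    match = pattern.search(text)
    if match is None:
        return []
    return [line.strip() for line in match.group(1).splitlines()
            if needle in line]


sample = """segmentA {
content Aa
content Ab
}
segmentC {
content Ca
content Cb
}
"""
print(search_in_segment(sample, "segmentC", "Cb"))
```

With a real file, text would be the result of open('file.txt').read().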
Depending on the complexity you need, the options range from a simple state machine with line-based pattern searching to a full lexer.
Line based search
The example below assumes that you are only looking for one segment, and that "segmentC {" and the closing "}" each sit on a line of their own.
def parsesegment(fh):
    # Yields all lines inside "segmentC"
    state = "out"
    for line in fh:
        line = line.strip()  # in case there are whitespaces around
        if state == "out":
            if line.startswith("segmentC {"):
                state = "in"
                continue  # the header line itself is not yielded
        elif state == "in":
            if line.startswith("}"):
                state = "out"
                break
            # Work on the specific lines here
            yield line

with open(...) as fh:
    for line in parsesegment(fh):
        # do something
        pass
Simple Lexer
If you need more flexibility, you can design a simple lexer/parser pair. For example, the following code makes no assumptions about how the syntax is split across lines. It also ignores unknown patterns, which a typical lexer does not do (normally it should raise a syntax error):
import re


class ParseSegment:
    # Dictionary of patterns per state
    # Tuples are (token name, pattern, state change command)
    _regexes = {
        "out": [
            ("open", re.compile(r"segment(?P<segment>\w+)\s+\{"), "in")
        ],
        "in": [
            ("close", re.compile(r"\}"), "out"),
            # Here an example of what you could want to match
            ("content", re.compile(r"content\s+(?P<content>\w+)"), None)
        ]
    }

    def lex(self, source, initpos=0):
        pos = initpos
        end = len(source)
        state = "out"
        while pos < end:
            for token_name, reg, state_chng in self._regexes[state]:
                # Try to get a match
                match = reg.match(source, pos)
                if match:
                    # Advance according to how much was matched
                    pos = match.end()
                    # yield a token if it has a name
                    if token_name is not None:
                        # Yield token name, the full matched part of source
                        # and the match grouped according to (?P<tag>) tags
                        yield (token_name, match.group(), match.groupdict())
                    # Switch state if requested
                    if state_chng is not None:
                        state = state_chng
                    break
            else:
                # No match, advance by one character
                # This is particular to that lexer, usually no match means
                # the input file has an error in the syntax and lexer should
                # yield an exception
                pos += 1

    def parse(self, source, initpos=0):
        # This is an example of use of the lexer with a parser
        # This converts the input file into a dictionary. Keys are segment
        # names, and values are list of contents.
        segments = {}
        cur_segment = None
        # Use lexer to get tokens from source
        for token, fullmatch, groups in self.lex(source, initpos):
            # On open, create the list of content in segments
            if token == "open":
                cur_segment = groups["segment"]
                segments[cur_segment] = []
            # On content, ensure we know the segment and add content to the
            # list
            elif token == "content":
                if cur_segment is None:
                    raise RuntimeError("Content found outside a segment")
                segments[cur_segment].append(groups["content"])
            # On close, set the current segment to unknown
            elif token == "close":
                cur_segment = None
            # ignore unknown tokens, we could raise an error instead
        return segments


def main():
    with open("...", "r") as fh:
        data = fh.read()
    lexer = ParseSegment()
    segments = lexer.parse(data)
    print(segments)
    return 0


if __name__ == '__main__':
    main()
Full Lexer
If you need even more flexibility and reusability, you will have to create a full parser. There is no need to reinvent the wheel: have a look at this list of language parsing modules, and you will probably find one that suits you.
I have a plugin for Sublime Text 3 that lets me move the cursor to a line by its number:
import sublime, sublime_plugin


class prompt_goto_lineCommand(sublime_plugin.WindowCommand):

    def run(self):
        self.window.show_input_panel("Goto Line:", "", self.on_done, None, None)
        pass

    def on_done(self, text):
        try:
            line = int(text)
            if self.window.active_view():
                self.window.active_view().run_command("goto_line", {"line": line})
        except ValueError:
            pass


class go_to_lineCommand(sublime_plugin.TextCommand):

    def run(self, edit, line):
        # Convert from 1 based to a 0 based line number
        line = int(line) - 1

        # Negative line numbers count from the end of the buffer
        if line < 0:
            lines, _ = self.view.rowcol(self.view.size())
            line = lines + line + 1

        pt = self.view.text_point(line, 0)

        self.view.sel().clear()
        self.view.sel().add(sublime.Region(pt))

        self.view.show(pt)
I want to extend it so that it moves the cursor to the first line containing a specified string, like a search in the file.
For example, if I pass it the string "class go_to_lineCommand", the plugin should move the cursor to line 17
and ideally select the string class go_to_lineCommand.
The problem reduces to finding regionWithGivenString; then I can select it:
self.view.sel().add(regionWithGivenString)
But I don't know which method gives me regionWithGivenString.
I tried:
searching Google for "sublime plugin find and select text"
checking the API
but still found nothing.
I am not sure about the typical way, but you can achieve this as follows:
Get the content of the current document.
Search for the target string to find its start position; its end position is the start plus the length of the string.
Add Region(start, end) to the selection.
Example:
def run(self, edit, target):
    if not target:
        return

    content = self.view.substr(sublime.Region(0, self.view.size()))
    begin = content.find(target)
    if begin == -1:
        return
    end = begin + len(target)
    target_region = sublime.Region(begin, end)
    self.view.sel().clear()
    self.view.sel().add(target_region)
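The offset arithmetic behind these steps does not depend on Sublime Text, so it can be checked in plain Python. A sketch with a hypothetical helper find_target, assuming, as the example above already does, that view positions correspond to indices into the string returned by view.substr(). It also returns the 0-based row of the match, so row + 1 is the line number the question asks for.

```python
def find_target(content, target):
    """Return (begin, end, row) for the first occurrence of target, or None.

    begin/end are the offsets that sublime.Region(begin, end) would take;
    row is the 0-based line number of the match.
    """
    if not target:
        return None
    begin = content.find(target)
    if begin == -1:
        return None
    end = begin + len(target)
    row = content.count('\n', 0, begin)  # newlines before the match
    return begin, end, row


content = "import sublime\n\nclass go_to_lineCommand(x):\n"
print(find_target(content, "class go_to_lineCommand"))
```

Inside the plugin, begin and end would feed sublime.Region(begin, end) exactly as in the run() method above.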
It's right there in the API: use the view.find(regex, pos) method, which returns the Region of the first match at or after pos.
s = self.view.find("go_to_lineCommand", 0)
self.view.sel().add(s)
http://www.sublimetext.com/docs/3/api_reference.html
A possible improvement to longhua's answer: also scroll the view to the target line.
import sublime, sublime_plugin


class FindcustomCommand(sublime_plugin.TextCommand):

    def _select(self):
        self.view.sel().clear()
        self.view.sel().add(self._target_region)

    def run(self, edit):
        TARGET = 'http://nabiraem'

        # if not target or target == "":
        #     return

        content = self.view.substr(sublime.Region(0, self.view.size()))
        begin = content.find(TARGET)
        if begin == -1:
            return
        end = begin + len(TARGET)
        self._target_region = sublime.Region(begin, end)
        self._select()
        self.view.show(self._target_region)  # scroll to selection
I have a config file:
[local]
variable1 : val1 ;#comment1
variable2 : val2 ;#comment2
Code like this reads only the value of the key:
import ConfigParser

class Config(object):
    def __init__(self):
        self.config = ConfigParser.ConfigParser()
        self.config.read('config.py')

    def get_path(self):
        return self.config.get('local', 'variable1')

if __name__ == '__main__':
    c = Config()
    print c.get_path()
But I also want to read the comment that appears along with the value. Any suggestions in this regard would be very helpful.
Alas, this is not easily done in the general case. Comments are supposed to be ignored by the parser.
In your specific case it is easy, because # only serves as a comment character if it begins a line. So variable1's value will be "val1 #comment1". I suppose you could use something like this, only less brittle:
val1_line = c.config.get('local', 'variable1')
val1, comment = val1_line.split(' #', 1)
If you need the value of a 'comment', perhaps it is not really a comment? Consider adding explicit keys for the 'comments', like this:
[local]
var1: 108.5j
var1_comment: remember, the flux capacitor capacitance is imaginary!
Your only solution is to write another ConfigParser that overrides the _read() method. In your ConfigParser, delete all the checks that remove comments. This is a dangerous solution, but it should work.
import ConfigParser


class ValuesWithCommentsConfigParser(ConfigParser.ConfigParser):

    def _read(self, fp, fpname):
        from ConfigParser import DEFAULTSECT, MissingSectionHeaderError, ParsingError

        cursect = None                        # None, or a dictionary
        optname = None
        lineno = 0
        e = None                              # None, or an exception
        while True:
            line = fp.readline()
            if not line:
                break
            lineno = lineno + 1
            # comment or blank line?
            if line.strip() == '' or line[0] in '#;':
                continue
            if line.split(None, 1)[0].lower() == 'rem' and line[0] in "rR":
                # no leading whitespace
                continue
            # continuation line?
            if line[0].isspace() and cursect is not None and optname:
                value = line.strip()
                if value:
                    cursect[optname].append(value)
            # a section header or option header?
            else:
                # is it a section header?
                mo = self.SECTCRE.match(line)
                if mo:
                    sectname = mo.group('header')
                    if sectname in self._sections:
                        cursect = self._sections[sectname]
                    elif sectname == DEFAULTSECT:
                        cursect = self._defaults
                    else:
                        cursect = self._dict()
                        cursect['__name__'] = sectname
                        self._sections[sectname] = cursect
                    # So sections can't start with a continuation line
                    optname = None
                # no section header in the file?
                elif cursect is None:
                    raise MissingSectionHeaderError(fpname, lineno, line)
                # an option line?
                else:
                    mo = self._optcre.match(line)
                    if mo:
                        optname, vi, optval = mo.group('option', 'vi', 'value')
                        optname = self.optionxform(optname.rstrip())
                        # This check is fine because the OPTCRE cannot
                        # match if it would set optval to None
                        if optval is not None:
                            optval = optval.strip()
                            # allow empty values
                            if optval == '""':
                                optval = ''
                            cursect[optname] = [optval]
                        else:
                            # valueless option handling
                            cursect[optname] = optval
                    else:
                        # a non-fatal parsing error occurred. set up the
                        # exception but keep going. the exception will be
                        # raised at the end of the file and will contain a
                        # list of all bogus lines
                        if not e:
                            e = ParsingError(fpname)
                        e.append(lineno, repr(line))
        # if any parsing errors occurred, raise an exception
        if e:
            raise e

        # join the multi-line values collected while reading
        all_sections = [self._defaults]
        all_sections.extend(self._sections.values())
        for options in all_sections:
            for name, val in options.items():
                if isinstance(val, list):
                    options[name] = '\n'.join(val)
In the ValuesWithCommentsConfigParser I fixed some imports and deleted the appropriate sections of code.
Using the same config.ini from my previous answer, I can show that the code above works:
config = ValuesWithCommentsConfigParser()
config.read('config.ini')
assert config.get('local', 'variable1') == 'value1 ; comment1'
assert config.get('local', 'variable2') == 'value2 # comment2'
According to the ConfigParser module documentation,
Configuration files may include comments, prefixed by specific
characters (# and ;). Comments may appear on their own in an otherwise
empty line, or may be entered in lines holding values or section
names. In the latter case, they need to be preceded by a whitespace
character to be recognized as a comment. (For backwards compatibility,
only ; starts an inline comment, while # does not.)
If you want to read the "comment" with the value, you can omit the whitespace before the ; character or use the #. But in this case the strings comment1 and comment2 become part of the value and are not considered comments any more.
A better approach would be to use a different property name, such as variable1_comment, or to define another section in the configuration dedicated to comments:
[local]
variable1 = value1
[comments]
variable1 = comment1
The first solution requires you to derive a new key from another one (i.e. compute variable1_comment from variable1), while the second lets you use the same key in different sections of the configuration file.
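A minimal sketch of the second layout, written for Python 3's configparser (the answers here otherwise use Python 2's ConfigParser module). The helper get_with_comment and its comment_section default are hypothetical; the fallback argument of get() is what lets keys without a recorded comment return None instead of raising.

```python
import configparser


def get_with_comment(parser, section, key, comment_section="comments"):
    """Return (value, comment) for key; comment is None if none is recorded."""
    value = parser.get(section, key)
    # fallback= makes a missing section or option return None instead of raising
    comment = parser.get(comment_section, key, fallback=None)
    return value, comment


parser = configparser.ConfigParser()
parser.read_string("""
[local]
variable1 = value1
variable2 = value2

[comments]
variable1 = comment1
""")
print(get_with_comment(parser, "local", "variable1"))
print(get_with_comment(parser, "local", "variable2"))
```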
As of Python 2.7.2, it is always possible to read a comment on the same line if you use the # character. As the docs say, this is for backward compatibility. The following code should run smoothly:
config = ConfigParser.ConfigParser()
config.read('config.ini')
assert config.get('local', 'variable1') == 'value1'
assert config.get('local', 'variable2') == 'value2 # comment2'
for the following config.ini file:
[local]
variable1 = value1 ; comment1
variable2 = value2 # comment2
If you adopt this solution, remember to manually parse the result of get() for values and comments.
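In Python 3's configparser, inline comments are disabled by default (inline_comment_prefixes is None), so both "; comment1" and "# comment2" stay in the value and need the manual split mentioned above. A sketch of that split; the helper name split_value_comment is hypothetical, and it assumes the values themselves never contain ';' or '#'.

```python
import configparser


def split_value_comment(raw, markers=(";", "#")):
    """Split 'value ; comment' into ('value', 'comment').

    The earliest marker found starts the comment; comment is None if no
    marker is present. Assumes values never contain the marker characters.
    """
    positions = [raw.find(m) for m in markers if m in raw]
    if not positions:
        return raw.strip(), None
    cut = min(positions)
    return raw[:cut].strip(), raw[cut + 1:].strip()


config = configparser.ConfigParser()  # Python 3: no inline comment prefixes by default
config.read_string("""
[local]
variable1 = value1 ; comment1
variable2 = value2 # comment2
variable3 = value3
""")
for key in ("variable1", "variable2", "variable3"):
    print(key, split_value_comment(config.get("local", key)))
```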
According to the manual:
Lines beginning with '#' or ';' are ignored and may be used to provide comments.
So the value of variable1 is "val1 #comment1". The comment is part of the value.
You can check whether your config file has a line break before the comment.
In case anyone comes along later: I needed to read a .ini file generated by a Pascal application, which doesn't care whether a key starts with # or ;.
For example the .ini file would look like this
[KEYTYPE PATTERNS]
##-######=CLAIM
Python's configparser skipped that key/value pair, so I had to tell configparser not to treat # as a comment prefix:
config = configparser.ConfigParser(comment_prefixes="")
config.read("thefile")
I'm sure I could set comment_prefixes to whatever Pascal uses for comments, but I didn't see any, so I set it to an empty string.
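If the file does use ';' for genuine comment lines (an assumption, since the Pascal files above showed none), passing a tuple instead of an empty string keeps those lines ignored while letting '#'-prefixed keys through. A minimal sketch:

```python
import configparser

# Assumption: real comment lines in the Pascal-generated file start with ';'.
# Keeping only ';' as a full-line comment prefix (the default is ('#', ';'))
# means keys such as '##-######' are read instead of skipped.
config = configparser.ConfigParser(comment_prefixes=(";",))
config.read_string("""
[KEYTYPE PATTERNS]
; this line is still treated as a comment
##-######=CLAIM
""")
print(config.get("KEYTYPE PATTERNS", "##-######"))
```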
Is there any way to force the RawConfigParser.write() method to export the config file sorted alphabetically?
Even if the original/loaded config file is sorted, the module writes the sections, and the options within each section, in arbitrary order, and editing a huge unsorted config file by hand is really annoying.
PS: I'm using Python 2.6
I was able to solve this issue by sorting the sections in the ConfigParser from the outside like so:
import collections
import sys
import ConfigParser

config = ConfigParser.ConfigParser({}, collections.OrderedDict)
config.read('testfile.ini')
# Order the content of each section alphabetically
for section in config._sections:
    config._sections[section] = collections.OrderedDict(sorted(config._sections[section].items(), key=lambda t: t[0]))
# Order all sections alphabetically
config._sections = collections.OrderedDict(sorted(config._sections.items(), key=lambda t: t[0]))
# Write ini file to standard output
config.write(sys.stdout)
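On Python 3.7+, where plain dicts preserve insertion order and RawConfigParser.write() emits sections and options in that order, the same result can be had through public methods only, without touching the private _sections attribute. A sketch; the helper name sorted_copy is hypothetical, and it assumes the file has no [DEFAULT] section, because items() merges defaults into every section.

```python
import configparser
import io


def sorted_copy(parser):
    """Return a new RawConfigParser with sections and options in alphabetical order."""
    ordered = configparser.RawConfigParser()
    # read_dict inserts in the order given, and write() follows that order
    ordered.read_dict({
        section: dict(sorted(parser.items(section, raw=True)))
        for section in sorted(parser.sections())
    })
    return ordered


src = configparser.RawConfigParser()
src.read_string("[b]\nz = 1\na = 2\n\n[a]\ny = 3\n")
buf = io.StringIO()
sorted_copy(src).write(buf)
print(buf.getvalue(), end="")
```

Passing sys.stdout or an open file to write() instead of the StringIO works the same way.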
Three solutions:
1. Pass in a dict type (the second argument to the constructor) that returns the keys in your preferred sort order.
2. Extend the class and override write() (just copy this method from the original source and modify it).
3. Copy the file ConfigParser.py and add the sorting to its write() method.
See this article for an ordered dict, or use this implementation, which preserves the original insertion order.
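A rough Python 3 sketch of solution 1. SortedDict is a hypothetical dict subclass whose __iter__, keys(), items() and values() return keys in sorted order, and it is passed as dict_type. write() loops over the section dict and over each section's items(), so both come out sorted. This depends on configparser only reading its internal dicts through those methods. That appears to hold for the current CPython implementation, but it is not a documented guarantee.

```python
import configparser
import io


class SortedDict(dict):
    """Hypothetical dict subclass that iterates over its keys in sorted order."""

    def __iter__(self):
        return iter(sorted(dict.keys(self)))

    def keys(self):
        return sorted(dict.keys(self))

    def items(self):
        return [(key, dict.__getitem__(self, key)) for key in self.keys()]

    def values(self):
        return [dict.__getitem__(self, key) for key in self.keys()]


# The section mapping and every section's option mapping are built from dict_type.
config = configparser.RawConfigParser(dict_type=SortedDict)
config.read_string("[zeta]\nb = 2\na = 1\n\n[alpha]\ny = 9\nx = 8\n")

out = io.StringIO()
config.write(out)
print(out.getvalue())
```

Unlike sorting once before writing, this keeps the parser sorted at all times, so sections() also returns names in alphabetical order.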
This is my solution for writing the config file in alphabetical order:
import ConfigParser
from ConfigParser import DEFAULTSECT

class OrderedRawConfigParser(ConfigParser.RawConfigParser):
    """
    Override the standard ConfigParser.RawConfigParser class so that
    write() outputs sections and keys in alphabetical order.
    """
    def __init__(self, defaults=None, dict_type=dict):
        # Pass the arguments through instead of discarding them
        ConfigParser.RawConfigParser.__init__(self, defaults=defaults, dict_type=dict_type)

    def write(self, fp):
        """Write an .ini-format representation of the configuration state."""
        if self._defaults:
            fp.write("[%s]\n" % DEFAULTSECT)
            for key in sorted(self._defaults):
                fp.write("%s = %s\n" % (key, str(self._defaults[key]).replace('\n', '\n\t')))
            fp.write("\n")
        # Sort the sections too, not only the keys inside them
        for section in sorted(self._sections):
            fp.write("[%s]\n" % section)
            for key in sorted(self._sections[section]):
                if key != "__name__":
                    fp.write("%s = %s\n" %
                             (key, str(self._sections[section][key]).replace('\n', '\n\t')))
            fp.write("\n")
The first method looked like the easiest and safest way.
But after looking at the source code of ConfigParser (Python 2.5), I found that it creates an empty built-in dict and then copies the values from the dict passed to the constructor one by one. That means it won't use the OrderedDict type. An easy workaround is to subclass RawConfigParser and override its constructor:
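import ConfigParser

class OrderedRawConfigParser(ConfigParser.RawConfigParser):
    def __init__(self, defaults=None):
        # Reuse the type of the dict passed in (e.g. odict.OrderedDict);
        # fall back to a plain dict when no defaults are given.
        dict_type = type(defaults) if defaults is not None else dict
        self._defaults = dict_type()
        self._sections = dict_type()
        if defaults:
            for key, value in defaults.items():
                self._defaults[self.optionxform(key)] = value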
This leaves only one flaw open, namely in ConfigParser.items(): odict doesn't support update() with, or comparison against, normal dicts.
Workaround (override this method too):
    def items(self, section):
        try:
            d2 = self._sections[section]
        except KeyError:
            if section != ConfigParser.DEFAULTSECT:
                raise ConfigParser.NoSectionError(section)
            d2 = type(self._sections)()  ## Originally: d2 = {}
        d = self._defaults.copy()
        d.update(d2)  ## No more unsupported dict-odict incompatibility here.
        if "__name__" in d:
            del d["__name__"]
        return d.items()
Another solution to the items() issue is to modify odict.OrderedDict.update; that may be easier than this one, but I leave it to you.
PS: I implemented this solution, but it doesn't work: ConfigParser still mixes up the order of the entries. If I figure out why, I will report it.
PS2: Solved. ConfigParser's reader function creates every new section as a plain dict literal, so the section contents lose their order. Only one line had to be changed, plus a few others (the ConfigParser. prefixes) so the method can be overridden from an external file:
    def _read(self, fp, fpname):
        cursect = None
        optname = None
        lineno = 0
        e = None
        while True:
            line = fp.readline()
            if not line:
                break
            lineno = lineno + 1
            if line.strip() == '' or line[0] in '#;':
                continue
            if line.split(None, 1)[0].lower() == 'rem' and line[0] in "rR":
                continue
            if line[0].isspace() and cursect is not None and optname:
                value = line.strip()
                if value:
                    cursect[optname] = "%s\n%s" % (cursect[optname], value)
            else:
                mo = self.SECTCRE.match(line)
                if mo:
                    sectname = mo.group('header')
                    if sectname in self._sections:
                        cursect = self._sections[sectname]
                    ## Add ConfigParser for external overloading
                    elif sectname == ConfigParser.DEFAULTSECT:
                        cursect = self._defaults
                    else:
                        ## The tiny single modification needed
                        cursect = type(self._sections)() ## cursect = {'__name__':sectname}
                        cursect['__name__'] = sectname
                        self._sections[sectname] = cursect
                    optname = None
                elif cursect is None:
                    raise ConfigParser.MissingSectionHeaderError(fpname, lineno, line)
                ## Add ConfigParser for external overloading.
                else:
                    mo = self.OPTCRE.match(line)
                    if mo:
                        optname, vi, optval = mo.group('option', 'vi', 'value')
                        if vi in ('=', ':') and ';' in optval:
                            pos = optval.find(';')
                            if pos != -1 and optval[pos-1].isspace():
                                optval = optval[:pos]
                        optval = optval.strip()
                        if optval == '""':
                            optval = ''
                        optname = self.optionxform(optname.rstrip())
                        cursect[optname] = optval
                    else:
                        if not e:
                            e = ConfigParser.ParsingError(fpname)
                        ## Add ConfigParser for external overloading
                        e.append(lineno, repr(line))
        if e:
            raise e
Trust me, I didn't write this thing; I copy-pasted it from ConfigParser.py and changed only the marked lines.
So, overall, what to do?
1. Download odict.py from one of the links suggested earlier.
2. Import it.
3. Copy-paste the code above into your favorite utils.py (this defines the OrderedRawConfigParser class).
4. Create the parser:
cfg = utils.OrderedRawConfigParser(odict.OrderedDict())
5. Use cfg as usual; it will stay ordered.
6. Sit back, smoke a Havana, relax.
PS3: The problem solved here exists only in Python 2.5. Python 2.6 already addresses it: the constructor (__init__) takes a second parameter, dict_type, which sets the dictionary class used internally.
So this workaround is needed only for 2.5.
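To illustrate the dict_type parameter mentioned in PS3, here is a minimal sketch. It uses Python 3 rather than 2.6 and collections.OrderedDict rather than odict.OrderedDict; both are substitutions. With an order-preserving dict type, write() reproduces the sections and options in the order they were read:

```python
import collections
import configparser
import io

# Sections and options deliberately out of alphabetical order.
TEXT = "[zeta]\nb = 2\na = 1\n\n[alpha]\ny = 9\nx = 8\n\n"

config = configparser.RawConfigParser(dict_type=collections.OrderedDict)
config.read_string(TEXT)

out = io.StringIO()
config.write(out)
print(out.getvalue() == TEXT)  # True: the original order is preserved
```

On Python 3.7+ the default dict already preserves insertion order, so passing OrderedDict here makes the intent explicit rather than being strictly necessary.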
I was looking into this while merging a .gitmodules file during a subtree merge with a supermodule. It was confusing to start with, and having the submodules in different orders made it worse.
Using GitPython helped a lot:
from collections import OrderedDict
import git

filePath = '/tmp/git.config'
# Could use SubmoduleConfigParser to get fancier
c = git.GitConfigParser(filePath, False)  # read_only=False, so the file can be written back
# Force GitPython to load the file before touching _sections
c.sections()
# http://stackoverflow.com/questions/8031418/how-to-sort-ordereddict-in-ordereddict-python
c._sections = OrderedDict(sorted(c._sections.items(), key=lambda x: x[0]))
c.write()
del c