I'm having trouble with this Python code. It's supposed to give me output similar to what is shown here (it should differ for different music files):
album=
artist=Ghost in the Machine
title=A Time Long Forgotten (Concept
genre=31
name=/music/_singles/a_time_long_forgotten_con.mp3
year=1999
comment=http://mp3.com/ghostmachine
But instead gives me the following:
name = derp/whoshotya.mp3
Here is the code given in Chapter 5 of Dive Into Python (slightly modified to accommodate the music sample I am using), which can be found here:
"""Framework for getting filetype-specific metadata.
Instantiate appropriate class with filename. Returned object acts like a dictionary, with key-value pairs for each piece of metadata.
import fileinfo
info = fileinfo.MP3FileInfo("/music/ap/mahadeva.mp3")
print "\\n".join(["%s=%s" % (k, v) for k, v in info.items()])
Or use listDirectory function to get info on all files in a directory.
for info in fileinfo.listDirectory("/music/ap/", [".mp3"]):
...
Framework can be extended by adding classes for particular file types, e.g.
HTMLFileInfo, MPGFileInfo, DOCFileInfo. Each class is completely responsible for
parsing its files appropriately; see MP3FileInfo for example.
This program is part of "Dive Into Python", a free Python book for
experienced programmers. Visit http://diveintopython.net/ for the
latest version.
"""
__author__ = "Mark Pilgrim (mark#diveintopython.org)"
__version__ = "$Revision: 1.3 $"
__date__ = "$Date: 2004/05/05 21:57:19 $"
__copyright__ = "Copyright (c) 2001 Mark Pilgrim"
__license__ = "Python"
import os
import sys
from UserDict import UserDict
def stripnulls(data):
"strip whitespace and nulls"
return data.replace("\00", " ").strip()
class FileInfo(UserDict):
"store file metadata"
def __init__(self, filename=None):
UserDict.__init__(self)
self["name"] = filename
class MP3FileInfo(FileInfo):
"store ID3v1.0 MP3 tags"
tagDataMap = {"title" : ( 3, 33, stripnulls),
"artist" : ( 33, 63, stripnulls),
"album" : ( 63, 93, stripnulls),
"year" : ( 93, 97, stripnulls),
"comment" : ( 97, 126, stripnulls),
"genre" : (127, 128, ord)}
def __parse(self, filename):
"parse ID3v1.0 tags from MP3 file"
self.clear()
try:
fsock = open(filename, "rb", 0)
try:
fsock.seek(-128, 2)
tagdata = fsock.read(128)
finally:
fsock.close()
if tagdata[:3] == 'TAG':
for tag, (start, end, parseFunc) in self.tagDataMap.items():
self[tag] = parseFunc(tagdata[start:end])
except IOError:
pass
def __setitem__(self, key, item):
if key == "name" and item:
self.__parse(item)
FileInfo.__setitem__(self, key, item)
def listDirectory(directory, fileExtList):
"get list of file info objects for files of particular extensions"
fileList = [os.path.normcase(f) for f in os.listdir(directory)]
fileList = [os.path.join(directory, f) for f in fileList \
if os.path.splitext(f)[1] in fileExtList]
def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
"get file info class from filename extension"
subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
return [getFileInfoClass(f)(f) for f in fileList]
if __name__ == "__main__":
for info in listDirectory("derp/", [".mp3"]):
print "\n".join(["%s=%s" % (k, v) for k, v in info.items()])
print
I think the problem may be here:
if __name__ == "__main__":
    for info in listDirectory("derp/", [".mp3"]):
        print "\n".join(["%s=%s" % (k, v) for k, v in info.items()])
        print
My thought is that the file in question (info) doesn't have all the file tags filled like the example does. That's why it's only showing the name field. Try manually changing the file tags (by right clicking the file and going to properties) and running the function again.
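As a quick check before editing anything, here is a small diagnostic sketch (the file name is taken from your output): ID3v1 tags live in the last 128 bytes of the file and start with the literal characters "TAG", so if this prints False, __parse finds nothing and only the name field is ever set:
f = open("derp/whoshotya.mp3", "rb")   # file name from the question's output
try:
    f.seek(-128, 2)                    # ID3v1 tags are the last 128 bytes
    print f.read(3) == "TAG"           # False means there is no ID3v1 tag to parse
finally:
    f.close()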
EDIT:
I'm not sure exactly what's going on in your code, but I have a theory.
class MP3FileInfo(FileInfo):
    ...
    def __setitem__(self, key, item):
        if key == "name" and item:
            self.__parse(item)
        FileInfo.__setitem__(self, key, item)
This part is confusing me. What exactly happens here? It would seem like you would want to add the parsed data to the FileInfo object, wouldn't you? Where does this function get called from? And is key ever going to be anything other than "name"? That might be your problem.
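For what it's worth, here is a stripped-down sketch (not the book's code) of the dispatch in question: FileInfo.__init__ assigns self["name"] = filename, and because the instance is really an MP3FileInfo, that assignment goes through MP3FileInfo.__setitem__, which parses the tags and then stores them through further __setitem__ calls with keys other than "name":
from UserDict import UserDict   # Python 2, matching the question

class Base(UserDict):
    def __init__(self, filename=None):
        UserDict.__init__(self)
        self["name"] = filename                     # triggers the subclass hook below

class Derived(Base):
    def __setitem__(self, key, item):
        print "setting %s=%r" % (key, item)         # shows every key passing through
        if key == "name" and item:
            self["title"] = "parsed-from-" + item   # a key other than "name"
        Base.__setitem__(self, key, item)

d = Derived("song.mp3")   # prints the "name" assignment, then the "title" one set inside the hook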
I want to parse a large wikipedia dump iteratively. I found a tutorial for this here: https://towardsdatascience.com/wikipedia-data-science-working-with-the-worlds-largest-encyclopedia-c08efbac5f5c
However, when I want to read in the data like this:
data_path = 'C:\\Users\\Me\\datasets\\dewiki-latest-pages-articles1.xml-p1p262468.bz2'

import subprocess
import xml.sax

class WikiXmlHandler(xml.sax.handler.ContentHandler):
    """Content handler for Wiki XML data using SAX"""
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self._buffer = None
        self._values = {}
        self._current_tag = None
        self._pages = []

    def characters(self, content):
        """Characters between opening and closing tags"""
        if self._current_tag:
            self._buffer.append(content)

    def startElement(self, name, attrs):
        """Opening tag of element"""
        if name in ('title', 'text'):
            self._current_tag = name
            self._buffer = []

    def endElement(self, name):
        """Closing tag of element"""
        if name == self._current_tag:
            self._values[name] = ' '.join(self._buffer)
        if name == 'page':
            self._pages.append((self._values['title'], self._values['text']))

# Object for handling xml
handler = WikiXmlHandler()

# Parsing object
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Iteratively process file
for line in subprocess.Popen(['bzcat'],
                             stdin=open(data_path),
                             stdout=subprocess.PIPE, shell=True).stdout:
    parser.feed(line)

    # Stop when 3 articles have been found
    if len(handler._pages) > 3:
        break
it seems like nothing happens. The handler._pages list is empty. This is where the parsed articles should be stored. I also added shell=True because otherwise I get the error message FileNotFoundError: [WinError 2].
I have never worked with subprocesses in Python, so I don't know what the problem might be.
I also tried to specify the data_path differently (with / and //).
Thank you in advance.
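One note on the question above: since bzcat is not normally available on Windows, a minimal alternative sketch (assuming Python 3.3+, where bz2.open exists) streams the dump with the built-in bz2 module and feeds the same handler, with no subprocess at all:
import bz2
import xml.sax

handler = WikiXmlHandler()                 # handler class from the question
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

with bz2.open(data_path, 'rb') as dump:    # data_path as defined in the question
    for line in dump:
        parser.feed(line)
        if len(handler._pages) > 3:        # stop after a few articles
            break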
I want to check if class c uses a certain module m. I am able to get two things:
- the module's usage in that file (f),
- the starting line numbers of the classes in file f.
However, in order to know whether a particular class uses the module, I need to know the starting and ending lines of class c. I don't know how to get the end line of class c.
I tried going through the documentation of ast but could not find any method of finding the scope of a class.
My current code is:
source_code = "car"
source_code_data = pyclbr.readmodule(source_code)
This gives the following in source_code_data variable:
[screenshot of source_code_data: each pyclbr class entry has a lineno attribute but no end line]
As you can see in the image, there is lineno but no end lineno to denote the end line of a class.
I expect to get the end line of the class in order to know its scope and finally know the usage of the module. Currently, the module usage information has this structure:
Module: vehicles used in file: /Users/aviralsrivastava/dev/generate_uml/inheritance_and_dependencies/car.py at: [('vehicles.Vehicle', 'Vehicle', 5), ('vehicles.Vehicle', 'Vehicle', 20)]
So, with this information, I would be able to tell whether the module is used within a class's line range, i.e. used in that class.
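For reference, a minimal sketch that sidesteps pyclbr entirely: on Python 3.8+ every ast node carries end_lineno, so a ClassDef's full line range is available directly (the file name car.py is assumed from the question):
import ast

with open('car.py') as fh:
    tree = ast.parse(fh.read())

for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef):
        # lineno and end_lineno give the class's full span (Python 3.8+)
        print(node.name, node.lineno, node.end_lineno)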
My whole code base is:
import ast
import os
import pyclbr
import subprocess
import sys
from operator import itemgetter

from dependency_collector import ModuleUseCollector
from plot_uml_in_excel import WriteInExcel


class GenerateUML:
    def __init__(self):
        self.class_dict = {}  # it will have a class:children mapping.

    def show_class(self, name, class_data):
        print(class_data)
        self.class_dict[name] = []
        self.show_super_classes(name, class_data)

    def show_methods(self, class_name, class_data):
        methods = []
        for name, lineno in sorted(class_data.methods.items(),
                                   key=itemgetter(1)):
            # print('  Method: {0} [{1}]'.format(name, lineno))
            methods.append(name)
        return methods

    def show_super_classes(self, name, class_data):
        super_class_names = []
        for super_class in class_data.super:
            if super_class == 'object':
                continue
            if isinstance(super_class, str):
                super_class_names.append(super_class)
            else:
                super_class_names.append(super_class.name)
        for super_class_name in super_class_names:
            if self.class_dict.get(super_class_name, None):
                self.class_dict[super_class_name].append(name)
            else:
                self.class_dict[super_class_name] = [name]
        # adding all parents for a class in one place for later usage: children
        return super_class_names

    def get_children(self, name):
        if self.class_dict.get(name, None):
            return self.class_dict[name]
        return []


source_code = "car"
source_code_data = pyclbr.readmodule(source_code)
print(source_code_data)
generate_uml = GenerateUML()
for name, class_data in sorted(source_code_data.items(), key=lambda x: x[1].lineno):
    print(
        "Class: {}, Methods: {}, Parent(s): {}, File: {}".format(
            name,
            generate_uml.show_methods(
                name, class_data
            ),
            generate_uml.show_super_classes(name, class_data),
            class_data.file
        )
    )
    print('-----------------------------------------')

# create a list with all the data
# the frame of the list is: [{}, {}, {},....] where each dict is: {"name": <>, "methods": [], "children": []}
agg_data = []
files = {}
for name, class_data in sorted(source_code_data.items(), key=lambda x: x[1].lineno):
    methods = generate_uml.show_methods(name, class_data)
    children = generate_uml.get_children(name)
    # print(
    #     "Class: {}, Methods: {}, Child(ren): {}, File: {}".format(
    #         name,
    #         methods,
    #         children,
    #         class_data.file
    #     )
    # )
    agg_data.append(
        {
            "Class": name,
            "Methods": methods,
            "Children": children,
            "File": class_data.file
        }
    )
    files[class_data.file] = name
    print('-----------------------------------------')

# print(agg_data)
for data_index in range(len(agg_data)):
    agg_data[data_index]['Dependents'] = None
    module = agg_data[data_index]["File"].split('/')[-1].split('.py')[0]
    used_in = []
    for file_ in files.keys():
        if file_ == agg_data[data_index]["File"]:
            continue
        collector = ModuleUseCollector(module)
        source = open(file_).read()
        collector.visit(ast.parse(source))
        print('Module: {} used in file: {} at: {}'.format(
            module, file_, collector.used_at))
        if len(collector.used_at):
            used_in.append(files[file_])
    agg_data[data_index]['Dependents'] = used_in

'''
# checking the dependencies
dependencies = []
for data_index in range(len(agg_data)):
    collector = ModuleUseCollector(
        agg_data[data_index]['File'].split('/')[-1].split('.py')[0]
    )
    collector.visit(ast.parse(source))
    dependencies.append(collector.used_at)
    agg_data[source_code_index]['Dependencies'] = dependencies
# next thing, for each class, find the dependency in each class.
'''

print('------------------------------------------------------------------')
print('FINAL')
for data in agg_data:
    print(data)
    print('-----------')
    print('\n')

# The whole data is now collected and we need to form the dataframe of it:
'''
write_in_excel = WriteInExcel(file_name='dependency_1.xlsx')
df = write_in_excel.create_pandas_dataframe(agg_data)
write_in_excel.write_df_to_excel(df, 'class_to_child_and_dependents')
'''
'''
print(generate_uml.class_dict)
write_in_excel = WriteInExcel(
    classes=generate_uml.class_dict, file_name='{}.xlsx'.format(source_code))
write_in_excel.form_uml_sheet_for_classes()
'''
Edit: Invasive Method
I did manage to find a way to do this with pyclbr, but it involves changing parts of its source code. Essentially, you make the stack (which is normally a list) a custom class. When an item is removed from the stack (when its scope has ended), the ending line number is added. I tried to make it as minimally invasive as possible.
First define a stack class in the pyclbr module:
import inspect   # needed at module level so __delitem__ below can use it

class Stack(list):
    def __init__(self):
        super().__init__()

    def __delitem__(self, key):
        # The caller (_create_tree) holds the start position of the token that
        # triggered the dedent; the construct being popped ends one line earlier.
        frames = inspect.stack()
        setattr(self[key][0], 'endline', frames[1].frame.f_locals['start'][0] - 1)
        super().__delitem__(key)
Then you change the stack in the _create_tree function which is originally on line 193:
stack = Stack()
Now you can access the ending line as well by using
class_data.endline
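A short usage sketch with the patched pyclbr (the module name car is taken from the question; endline only exists once the Stack change above is in place):
import pyclbr

data = pyclbr.readmodule('car')
for name, class_data in data.items():
    # endline is the attribute added by the patched Stack class
    print(name, class_data.lineno, class_data.endline)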
I was not able to find a solution that uses the libraries you are using, so I wrote a simple parser that finds the start and end lines of every class contained in a module. You must pass in either the text wrapper itself or the list created by readlines() on your file. It's also worth noting that if a class continues until the end of the file, then the end is -1.
import re

def collect_classes(source):
    classes = {}
    current_class = None
    for lineno, line in enumerate(source, start=1):
        if current_class and not line.startswith(' '):
            if line != '\n':
                classes[current_class]['end'] = lineno - 1
                current_class = None
        if line.startswith('class'):
            current_class = re.search(r'class (.+?)(?:\(|:)', line).group(1)
            classes[current_class] = {'start': lineno}
    if current_class:
        classes[current_class]['end'] = -1
    return classes
import datetime
file = open(datetime.__file__, 'r') # opening the source of datetime for a test
scopes = collect_classes(file)
print(scopes)
This outputs:
{'timedelta': {'start': 454, 'end': 768}, 'date': {'start': 774, 'end': 1084}, 'tzinfo': {'start': 1092, 'end': 1159}, 'time': {'start': 1162, 'end': 1502}, 'datetime': {'start': 1509, 'end': 2119}, 'timezone': {'start': 2136, 'end': 2251}}
Thanks! The endline is really a missing feature there.
The discussion on this led to an alternative (still invasive) method to achieve this, but it is almost a one-liner and does not require subclassing list and overriding the del operator:
https://github.com/python/cpython/pull/16466#issuecomment-539664180
Basically, just add
stack[-1].endline = start - 1
before all del stack[-1] in pyclbr.py
I'd like to be able to query Rally for an existing defect and then copy that defect changing only a couple of fields while maintaining all attachments. Is there a simple way to do this? I tried calling rally.create and passing the existing defect object, but it failed to serialize all members into JSON. Ultimately, it would be nice if pyral was extended to include this kind of functionality.
Instead, I've written some code to copy each python-native attribute of the existing defect and then use .ref for everything else. It seems to be working quite well. I've leveraged Mark W's code for copying attachments and that's working great also. One remaining frustration is that copying the iteration isn't working. When I call .ref on the Iteration attribute, I get this:
>>> s
<pyral.entity.Defect object at 0x029A74F0>
>>> s.Iteration
<pyral.entity.Iteration object at 0x029A7710>
>>> s.Iteration.ref
No classFor item for |UserIterationCapacity|
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python27\lib\site-packages\pyral\entity.py", line 119, in __getattr__
hydrateAnInstance(self._context, item, existingInstance=self)
File "c:\python27\lib\site-packages\pyral\restapi.py", line 77, in hydrateAnInstance
return hydrator.hydrateInstance(item, existingInstance=existingInstance)
File "c:\python27\lib\site-packages\pyral\hydrate.py", line 62, in hydrateInstance
self._setAppropriateAttrValueForType(instance, attrName, attrValue, 1)
File "c:\python27\lib\site-packages\pyral\hydrate.py", line 128, in _setAppropriateAttrValueForType
elements = [self._unravel(element) for element in attrValue]
File "c:\python27\lib\site-packages\pyral\hydrate.py", line 162, in _unravel
return self._basicInstance(thing)
File "c:\python27\lib\site-packages\pyral\hydrate.py", line 110, in _basicInstance
raise KeyError(itemType)
KeyError: u'UserIterationCapacity'
>>>
Does this look like an issue with Rally or perhaps an issue with a custom field that our project admin might have caused? I was able to work around it by building the ref from the oid:
newArtifact["Iteration"] = { "_ref": "iteration/" + currentArtifact.Iteration.oid }
This feels kludgy to me though.
Check out the Python answer (there are two answers, the other one is for Ruby) to this question:
Rally APIs: How to copy Test Folder and member Test Cases
The answer contains a Python script that copies Test Cases, with attachments. Although the script is for Test Cases, the logic should be quite readily adaptable to Defects, as the operations will be fundamentally the same; only the field attributes will differ. The script copies Attachments, Tags, etc., pretty much the whole artifact.
Final solution including Mark W's code for copying attachments
def getDataCopy( data ):
    """ Given a piece of data, figure out how to copy it. If it's a native python type
        like a string or numeric, just return the value. If it's a rally object, return
        the ref to it. If it's a list, iterate and call ourself recursively for the
        list members. """
    if isinstance( data, types.ListType ):
        copyData = []
        for entry in data:
            copyData.append( getDataCopy(entry) )
    elif hasattr( data, "ref" ):
        copyData = { "_ref": data.ref }
    else:
        copyData = data
    return copyData

def getArtifactCopy( artifact ):
    """ Build a dictionary based on the values in the specified artifact. This dictionary
        can then be passed to a rallyConn.put() call to actually create the new entry in
        Rally. Attachments and Tasks must be copied separately, since they require creation
        of additional artifacts """
    newArtifact = {}
    for attrName in artifact.attributes():
        # Skip the attributes that we can't or shouldn't handle directly
        if attrName.startswith("_") or attrName == "oid" or attrName == "Iteration" or attrName == "Attachments":
            continue
        attrValue = getattr( artifact, attrName )
        newArtifact[attrName] = getDataCopy( attrValue )
    if getattr( artifact, "Iteration", None ) != None:
        newArtifact["Iteration"] = { "_ref": "iteration/" + artifact.Iteration.oid }
    return newArtifact

def copyAttachments( rallyConn, oldArtifact, newArtifact ):
    """ For each attachment in the old artifact, create new attachments and attach them to the new artifact """
    # Copy Attachments
    source_attachments = rallyConn.getAttachments(oldArtifact)
    for source_attachment in source_attachments:
        # First copy the content
        source_attachment_content = source_attachment.Content
        target_attachment_content_fields = { "Content": base64.encodestring(source_attachment_content) }
        try:
            target_attachment_content = rallyConn.put( 'AttachmentContent', target_attachment_content_fields )
            print "\t===> Copied AttachmentContent: %s" % target_attachment_content.ref
        except pyral.RallyRESTAPIError, details:
            sys.stderr.write('ERROR: %s \n' % details)
            sys.exit(2)

        # Next copy the attachment object
        target_attachment_fields = {
            "Name": source_attachment.Name,
            "Description": source_attachment.Description,
            "Content": target_attachment_content.ref,
            "ContentType": source_attachment.ContentType,
            "Size": source_attachment.Size,
            "User": source_attachment.User.ref
        }
        # Attach it to the new artifact
        target_attachment_fields["Artifact"] = newArtifact.ref
        try:
            target_attachment = rallyConn.put( source_attachment._type, target_attachment_fields )
            print "\t===> Copied Attachment: '%s'" % target_attachment.Name
        except pyral.RallyRESTAPIError, details:
            sys.stderr.write('ERROR: %s \n' % details)
            sys.exit(2)

def copyTasks( rallyConn, oldArtifact, newArtifact ):
    """ Iterate over the old artifact's tasks and create new ones, attaching them to the new artifact """
    for currentTask in oldArtifact.Tasks:
        newTask = getArtifactCopy( currentTask )
        # Assign the new task to the new artifact
        newTask["WorkProduct"] = newArtifact.ref
        # Push the new task into rally
        newTaskObj = rallyConn.put( currentTask._type, newTask )
        # Copy any attachments the task had
        copyAttachments( rallyConn, currentTask, newTaskObj )

def copyDefect( rallyConn, currentDefect, addlUpdates = {} ):
    """ Copy a defect including its attachments and tasks. Add the new defect as a
        duplicate to the original """
    newArtifact = getArtifactCopy( currentDefect )

    # Add the current defect as a duplicate for the new one
    newArtifact["Duplicates"].append( { "_ref": currentDefect.ref } )

    # Copy in any updates that might be needed
    for (attrName, attrValue) in addlUpdates.items():
        newArtifact[attrName] = attrValue

    print "Copying %s: %s..." % (currentDefect.Project.Name, currentDefect.FormattedID),
    newDefect = rallyConn.create( currentDefect._type, newArtifact )
    print "done, new item", newDefect.FormattedID

    print "\tCopying attachments"
    copyAttachments( rallyConn, currentDefect, newDefect )

    print "\tCopying tasks"
    copyTasks( rallyConn, currentDefect, newDefect )
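A hypothetical usage sketch for the functions above (Python 2, matching the code; the server, credentials, workspace/project names and the defect ID are placeholders, and the helpers also rely on these imports at the top of the script):
import base64, sys, types
import pyral
from pyral import Rally

rally = Rally('rally1.rallydev.com', 'user@example.com', 'password',
              workspace='My Workspace', project='My Project')

# Fetch the defect to copy, then duplicate it while overriding a couple of fields
defect = rally.get('Defect', fetch=True, query='FormattedID = DE1234', instance=True)
copyDefect(rally, defect, addlUpdates={'Priority': 'Low'})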
I'm trying to get the "title" and "artist" of every music file in a directory so I can do stuff with it. I'm a newbie, so I'm using this guy's code: http://www.diveintopython.net/object_oriented_framework/index.html
Right, so I've got the following output. Using my elite image manipulation skillset, I've pointed out what data I would like to keep:
Using the following code:
if __name__ == "__main__":
    for info in listDirectory("/home/alpha/htdocs/music/", [".mp3"]):
        for song in info.items():
            print song[1]
Full code:
import os
import sys
from UserDict import UserDict

def stripnulls(data):
    "strip whitespace and nulls"
    return data.replace("\00", "").strip()

class FileInfo(UserDict):
    "store file metadata"
    def __init__(self, filename=None):
        UserDict.__init__(self)
        self["name"] = filename

class MP3FileInfo(FileInfo):
    "store ID3v1.0 MP3 tags"
    tagDataMap = {"title"   : (  3,  33, stripnulls),
                  "artist"  : ( 33,  63, stripnulls),
                  "album"   : ( 63,  93, stripnulls),
                  "year"    : ( 93,  97, stripnulls),
                  "comment" : ( 97, 126, stripnulls),
                  "genre"   : (127, 128, ord)}

    def __parse(self, filename):
        "parse ID3v1.0 tags from MP3 file"
        self.clear()
        try:
            fsock = open(filename, "rb", 0)
            try:
                fsock.seek(-128, 2)
                tagdata = fsock.read(128)
            finally:
                fsock.close()
            if tagdata[:3] == "TAG":
                for tag, (start, end, parseFunc) in self.tagDataMap.items():
                    self[tag] = parseFunc(tagdata[start:end])
        except IOError:
            pass

    def __setitem__(self, key, item):
        if key == "name" and item:
            self.__parse(item)
        FileInfo.__setitem__(self, key, item)

def listDirectory(directory, fileExtList):
    "get list of file info objects for files of particular extensions"
    fileList = [os.path.normcase(f)
                for f in os.listdir(directory)]
    fileList = [os.path.join(directory, f)
                for f in fileList
                if os.path.splitext(f)[1] in fileExtList]
    def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
        "get file info class from filename extension"
        subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
        return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
    return [getFileInfoClass(f)(f) for f in fileList]

if __name__ == "__main__":
    for info in listDirectory("/music/_singles/", [".mp3"]):
        print "\n".join(["%s=%s" % (k, v) for k, v in info.items()])
        print
But I'm stuck after that because I have no idea what to do next. Could someone help me please?
You're probably better off using a library like Mutagen.
import mutagen, mutagen.mp3, mutagen.easyid3
mp3 = mutagen.mp3.MP3(file, ID3=mutagen.easyid3.EasyID3)
print mp3['artist'], mp3['album'], mp3['title']
One benefit of Mutagen over your MP3FileInfo is ID3v2 support. You're just reading the old, simple ID3v1 tags. Mutagen parses all standard ID3v2.4 frames.
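Building on the Mutagen suggestion, here is a small directory-walking sketch (the path is the one from the question; EasyID3 only exposes 'title' and 'artist' when the file actually carries those tags):
import os
import mutagen.mp3, mutagen.easyid3

music_dir = "/home/alpha/htdocs/music/"
for name in os.listdir(music_dir):
    if name.lower().endswith(".mp3"):
        mp3 = mutagen.mp3.MP3(os.path.join(music_dir, name),
                              ID3=mutagen.easyid3.EasyID3)
        tags = mp3.tags or {}          # EasyID3 mapping, or None if the file is untagged
        print tags.get("title"), tags.get("artist")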
If you've got that data in info, then just access what you want via:
info['artist']
info['album']
etc... anything in your mapping
In your main loop, info is a dictionary, with keys that you can see defined in tagDataMap in the MP3FileInfo class. So modifying your original code:
if __name__ == "__main__":
    for info in listDirectory("/music/_singles/", [".mp3"]):
        print "\n".join(["%s=%s" % (k, info[k]) for k in ("title", "artist")])
        print
Is there any solution to force the RawConfigParser.write() method to export the config file with an alphabetical sort?
Even if the original/loaded config file is sorted, the module mixes up the sections and the options within the sections arbitrarily, and it is really annoying to manually edit a huge unsorted config file.
PS: I'm using Python 2.6
I was able to solve this issue by sorting the sections in the ConfigParser from the outside like so:
import collections
import sys
import ConfigParser

config = ConfigParser.ConfigParser({}, collections.OrderedDict)
config.read('testfile.ini')

# Order the content of each section alphabetically
for section in config._sections:
    config._sections[section] = collections.OrderedDict(
        sorted(config._sections[section].items(), key=lambda t: t[0]))

# Order all sections alphabetically
config._sections = collections.OrderedDict(
    sorted(config._sections.items(), key=lambda t: t[0]))

# Write ini file to standard output
config.write(sys.stdout)
Three solutions:
Pass in a dict type (second argument to the constructor) which returns the keys in your preferred sort order.
Extend the class and overload write() (just copy this method from the original source and modify it).
Copy the file ConfigParser.py and add the sorting to the method write().
See this article for an ordered dict, or maybe use this implementation, which preserves the original insertion order.
This is my solution for writing a config file in alphabetical order:
class OrderedRawConfigParser( ConfigParser.RawConfigParser ):
    """
    Overload standard class ConfigParser.RawConfigParser
    """
    def __init__( self, defaults = None, dict_type = dict ):
        ConfigParser.RawConfigParser.__init__( self, defaults = None, dict_type = dict )

    def write(self, fp):
        """Write an .ini-format representation of the configuration state."""
        if self._defaults:
            fp.write("[%s]\n" % DEFAULTSECT)
            for key in sorted( self._defaults ):
                fp.write( "%s = %s\n" % (key, str( self._defaults[ key ] ).replace('\n', '\n\t')) )
            fp.write("\n")
        for section in self._sections:
            fp.write("[%s]\n" % section)
            for key in sorted( self._sections[section] ):
                if key != "__name__":
                    fp.write("%s = %s\n" %
                             (key, str( self._sections[section][ key ] ).replace('\n', '\n\t')))
            fp.write("\n")
The first method looked like the easiest and safest way.
But after looking at the source code of ConfigParser, I found that it creates an empty built-in dict and then copies all the values from the "second parameter" one by one. That means it won't use the OrderedDict type. An easy workaround is to overload the RawConfigParser class:
class OrderedRawConfigParser(ConfigParser.RawConfigParser):
    def __init__(self, defaults=None):
        self._defaults = type(defaults)()  ## will be correct with all type of dict.
        self._sections = type(defaults)()
        if defaults:
            for key, value in defaults.items():
                self._defaults[self.optionxform(key)] = value
It leaves only one flaw open... namely in ConfigParser.items(). odict doesn't support update and comparison with normal dicts.
Workaround (overload this function too):
def items(self, section):
    try:
        d2 = self._sections[section]
    except KeyError:
        if section != DEFAULTSECT:
            raise NoSectionError(section)
        d2 = type(self._sections)()  ## Originally: d2 = {}
    d = self._defaults.copy()
    d.update(d2)  ## No more unsupported dict-odict incompatibility here.
    if "__name__" in d:
        del d["__name__"]
    return d.items()
Another solution to the items issue is to modify the odict.OrderedDict.update function; maybe that is easier than this one, but I leave it to you.
PS: I implemented this solution, but it didn't work: ConfigParser was still mixing up the order of the entries. If I figure out why, I will report it.
PS2: Solved. The reader function of ConfigParser is rather dumb. Anyway, only one line had to be changed, plus a few others so it can be overloaded from an external file:
def _read(self, fp, fpname):
    cursect = None
    optname = None
    lineno = 0
    e = None
    while True:
        line = fp.readline()
        if not line:
            break
        lineno = lineno + 1
        if line.strip() == '' or line[0] in '#;':
            continue
        if line.split(None, 1)[0].lower() == 'rem' and line[0] in "rR":
            continue
        if line[0].isspace() and cursect is not None and optname:
            value = line.strip()
            if value:
                cursect[optname] = "%s\n%s" % (cursect[optname], value)
        else:
            mo = self.SECTCRE.match(line)
            if mo:
                sectname = mo.group('header')
                if sectname in self._sections:
                    cursect = self._sections[sectname]
                ## Add ConfigParser for external overloading
                elif sectname == ConfigParser.DEFAULTSECT:
                    cursect = self._defaults
                else:
                    ## The tiny single modification needed
                    cursect = type(self._sections)()  ## cursect = {'__name__': sectname}
                    cursect['__name__'] = sectname
                    self._sections[sectname] = cursect
                optname = None
            elif cursect is None:
                raise ConfigParser.MissingSectionHeaderError(fpname, lineno, line)
                ## Add ConfigParser for external overloading.
            else:
                mo = self.OPTCRE.match(line)
                if mo:
                    optname, vi, optval = mo.group('option', 'vi', 'value')
                    if vi in ('=', ':') and ';' in optval:
                        pos = optval.find(';')
                        if pos != -1 and optval[pos-1].isspace():
                            optval = optval[:pos]
                    optval = optval.strip()
                    if optval == '""':
                        optval = ''
                    optname = self.optionxform(optname.rstrip())
                    cursect[optname] = optval
                else:
                    if not e:
                        e = ConfigParser.ParsingError(fpname)
                        ## Add ConfigParser for external overloading
                    e.append(lineno, repr(line))
    if e:
        raise e
Trust me, I didn't write this thing. I copy-pasted it entirely from ConfigParser.py.
So overall what to do?
Download odict.py from one of the links suggested previously.
Import it.
Copy-paste this code into your favorite utils.py (which will create the OrderedRawConfigParser class for you).
cfg = utils.OrderedRawConfigParser(odict.OrderedDict())
Use cfg as always; it will stay ordered.
Sit back, smoke a Havana, relax.
PS3: The problem I solved here exists only in Python 2.5. In 2.6 there is already a solution for it: they added a second optional parameter to __init__, which is a custom dict_type.
So this workaround is needed only for 2.5
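For completeness, a minimal sketch of that Python 2.6 dict_type approach (it assumes, as in 2.6, that write() simply iterates the section dicts, so a dict that always yields its keys in sorted order produces alphabetical output):
import sys
import ConfigParser

class SortedDict(dict):
    "dict whose keys always come back in sorted order"
    def __iter__(self):
        return iter(sorted(dict.keys(self)))
    def keys(self):
        return sorted(dict.keys(self))
    def items(self):
        return sorted(dict.items(self))

config = ConfigParser.RawConfigParser(dict_type=SortedDict)
config.read('testfile.ini')   # file name borrowed from the answer above
config.write(sys.stdout)      # sections and options come out alphabetically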
I was looking into this for merging a .gitmodules file while doing a subtree merge with a supermodule. I was super confused to start with, and having different orders for the submodules was confusing enough, haha.
Using GitPython helped a lot:
from collections import OrderedDict
import git
filePath = '/tmp/git.config'
# Could use SubmoduleConfigParser to get fancier
c = git.GitConfigParser(filePath, False)
c.sections()
# http://stackoverflow.com/questions/8031418/how-to-sort-ordereddict-in-ordereddict-python
c._sections = OrderedDict(sorted(c._sections.iteritems(), key=lambda x: x[0]))
c.write()
del c