I am working with images that have multiple layers, which are described in their metadata. It looks like this:
print layers
Cube1[visible:true, mode:Normal]{r:Cube1.R, g:Cube1.G, b:Cube1.B, a:Cube1.A}, Ground[visible:true, mode:Lighten, opacity:186]{r:Ground.R, g:Ground.G, b:Ground.B, a:Ground.A}, Cube3[visible:true, mode:Normal]{r:Cube3.R, g:Cube3.G, b:Cube3.B, a:Cube3.A}
I'm wondering whether Python could recognize this format as something more than a string. Ideally I would like to look up the properties of any one of the layers. For example:
print layers[0].mode
"Normal"
On another post someone showed me how to get the names of each layer, which was very helpful, but now I'm looking to use the other info.
PS: if it helps I don't care about any of the info inside the {}
Thanks
print type(layers)
<type 'str'>
In case you don't want to deal with regex ...
layers = "Cube1[visible:true, mode:Normal]{r:Cube1.R, g:Cube1.G, b:Cube1.B, a:Cube1.A}, Ground[visible:true, mode:Lighten, opacity:186]{r:Ground.R, g:Ground.G, b:Ground.B, a:Ground.A}, Cube3[visible:true, mode:Normal]{r:Cube3.R, g:Cube3.G, b:Cube3.B, a:Cube3.A}"
layer_dict = {}
parts = layers.split('}')
for part in parts:
    part = part.strip(', ')
    name_end = part.find('[')
    if name_end < 1:
        continue
    name = part[:name_end]
    attrs_end = part.find(']')
    attrs = part[name_end+1:attrs_end].split(', ')
    layer_dict[name] = {}
    for attr in attrs:
        attr_parts = attr.split(':')
        layer_dict[name][attr_parts[0]] = attr_parts[1]
print 'Cube1 ... mode:', layer_dict.get('Cube1').get('mode')
print 'Ground ... opacity:', layer_dict.get('Ground').get('opacity')
print 'Cube3', layer_dict.get('Cube3')
output ...
Cube1 ... mode: Normal
Ground ... opacity: 186
Cube3 {'visible': 'true', 'mode': 'Normal'}
Parsing (Pyparsing et al) is surely the correct and extensible way to go, but here's a fast-and-dirty object and constructors using regexes and comprehensions to parse properties and bolt them on with setattr(). All constructive criticisms welcome!
import re
#import string
class Layer(object):
    @classmethod
    def make_list_from_string(cls, s):
        # each match is (layer name, contents of the [...] block)
        all_layers_params = re.findall(r'(\w+)\[([^\]]+)\]', s)
        return [cls(lname, largs) for (lname, largs) in all_layers_params]
    def __init__(self, name, args):
        self.name = name
        # bolt every key:value pair on as an attribute
        for (larg, lval) in re.findall(r'(\w+):(\w+)(?:,\w*)?', args):
            setattr(self, larg, lval)
    def __str__(self):
        return self.name + '[' + ','.join('%s:%s' % (k, v) for k, v in self.__dict__.iteritems() if k != 'name') + ']'
    def __repr__(self):
        return self.__str__()
t = 'Cube1[visible:true, mode:Normal]{r:Cube1.R, g:Cube1.G, b:Cube1.B, a:Cube1.A}, Ground[visible:true, mode:Lighten, opacity:186]{r:Ground.R, g:Ground.G, b:Ground.B, a:Ground.A}, Cube3[visible:true, mode:Normal]{r:Cube3.R, g:Cube3.G, b:Cube3.B, a:Cube3.A}'
layers = Layer.make_list_from_string(t)
I moved all the imperative code into __init__() or the classmethod Layer.make_list_from_string().
Currently it stores every value as a string; it doesn't work out that opacity is an int/float, but that's just an extra try...except block.
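A minimal sketch of that extra try...except step, written for Python 3. The helper name coerce_value is hypothetical and not part of the answer above, and treating true/false as booleans is an assumption about the metadata:

```python
def coerce_value(raw):
    """Best-effort conversion of a metadata string to int, float or bool."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass  # not numeric in this form; try the next cast
    if raw in ("true", "false"):
        return raw == "true"
    return raw  # leave anything else (e.g. 'Normal') as a string


print(coerce_value("186"))     # 186
print(coerce_value("Normal"))  # Normal
```

Inside __init__(), calling setattr(self, larg, coerce_value(lval)) would then store typed values instead of raw strings.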
Hey, it does the job you wanted. And as a bonus it throws in mutability:
print layers[0].mode
'Normal'
print layers[1].opacity
'186'
print layers[2]
Cube3[visible:true,mode:Normal]
layers[0].mode = 'Weird'
print layers[0].mode
'Weird'
"I'm wondering if this formatting could be recognizable by Python as more then a string."
Alternatively, I was thinking if you tweaked the format a little, eval()/exec() could be used, but that's yukkier, slower and a security risk.
I'm writing software which does some analysis of the input and returns a result. Part of the requirements is that it generates zero or more warnings or errors and includes them with the result. I'm also writing unit tests which, in particular, use some contrived data to verify that the right warnings are emitted.
I need to be able to parse the warnings/errors and verify that the expected messages are correctly emitted. I figured I'd store the messages in a container and reference them with a message ID which is pretty similar to how I've done localization in the past.
errormessages.py right now looks pretty similar to:
from enum import IntEnum
NO_MESSAGE = ('')
HELLO = ('Hello, World')
GOODBYE = ('Goodbye')
class MsgId(IntEnum):
    NO_MESSAGE = 0
    HELLO = 1
    GOODBYE = 2
Msg = {
    MsgId.NO_MESSAGE: NO_MESSAGE,
    MsgId.HELLO: HELLO,
    MsgId.GOODBYE: GOODBYE,
}
So then the analysis can look similar to this:
from errormessages import Msg, MsgId
def analyse(_):
    errors = []
    errors.append(Msg[MsgId.HELLO])
    return _, errors
And in the unit tests I can do something similar to
from errormessages import Msg, MsgId
from my import analyse
def test_hello():
    _, errors = analyse('toy')
    assert Msg[MsgId.HELLO] in errors
But some of the messages get formatted and I think that's going to play hell with parsing the messages for unit tests. I was thinking I'd add flavors of the messages; one for formatting and the other for parsing:
updated errormessages.py:
from enum import IntEnum
import re
FORMAT_NO_MESSAGE = ('')
FORMAT_HELLO = ('Hello, {}')
FORMAT_GOODBYE = ('Goodbye')
PARSE_NO_MESSAGE = re.compile(r'^$')
PARSE_HELLO = re.compile(r'^Hello, (.*)$')
PARSE_GOODBYE = re.compile('^Goodbye$')
class MsgId(IntEnum):
    NO_MESSAGE = 0
    HELLO = 1
    GOODBYE = 2
Msg = {
    MsgId.NO_MESSAGE: (FORMAT_NO_MESSAGE, PARSE_NO_MESSAGE),
    MsgId.HELLO: (FORMAT_HELLO, PARSE_HELLO),
    MsgId.GOODBYE: (FORMAT_GOODBYE, PARSE_GOODBYE),
}
So then the analysis can look like:
from errormessages import Msg, MsgId
def analyse(_):
    errors = []
    errors.append(Msg[MsgId.HELLO][0].format('World'))
    return _, errors
And in the unit tests I can do:
from errormessages import Msg, MsgId
from my import analyse
import re
def test_hello():
    _, errors = analyse('toy')
    expected = {v: [] for v in MsgId}
    expected[MsgId.HELLO] = [
        Msg[MsgId.HELLO][1].match(msg)
        for msg in errors
    ]
    for _,v in expected.items():
        if _ == MsgId.HELLO:
            assert v
        else:
            assert not v
I was wondering if there's perhaps a better / simpler way? In particular, the messages are effectively repeated twice; once for the formatter and once for the regular expression. Is there a way to use a single string for both formatting and regular expression capturing?
Assuming the messages are all stored as format string templates (e.g. "Hello", or "Hello, {}" or "Hello, {firstname} {surname}"), then you could generate the regexes directly from the templates:
import re
import random
import string
def format_string_to_regex(format_string: str) -> re.Pattern:
    """Convert a format string template to a regex."""
    # Swap each {field} for a random placeholder so that re.escape() leaves it alone.
    unique_string = ''.join(random.choices(string.ascii_letters, k=24))
    stripped_fields = re.sub(r"\{[^\{\}]*\}(?!\})", unique_string, format_string)
    pattern = re.escape(stripped_fields).replace(unique_string, "(.*)")
    # Doubled (literal) braces in the template collapse to one escaped brace.
    pattern = pattern.replace(r"\{\{", r"\{").replace(r"\}\}", r"\}")
    return re.compile(f"^{pattern}$")
def is_error_message(error: str, expected_message: MsgId) -> bool:
    """Returns whether the error plausibly matches the MsgId."""
    expected_format = format_string_to_regex(Msg[expected_message])
    return bool(expected_format.match(error))
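An alternative sketch that avoids the random placeholder: string.Formatter().parse() from the standard library splits a template into literal text and replacement fields, so each literal can be escaped on its own. The name template_to_regex is made up for this example, and it assumes every field should simply capture (.*):

```python
import re
from string import Formatter


def template_to_regex(template):
    """Build an anchored regex that captures each field of a str.format template."""
    parts = []
    for literal, field_name, _spec, _conversion in Formatter().parse(template):
        parts.append(re.escape(literal))  # literal text must match exactly
        if field_name is not None:        # None means no field follows this literal
            parts.append("(.*)")
    return re.compile("^" + "".join(parts) + "$")


match = template_to_regex("Hello, {}").match("Hello, World")
print(match.group(1))  # World
```

Because the regex is derived from the format string, each message only has to be written once.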
I am new to programming in Python, and I am having some trouble understanding the concepts. I wish to compare two XML files. These XML files are quite large.
Here is an example of the type of files I wish to compare.
xmlfile1:
<xml>
    <property1>
        <property2>
            <property3>
            </property3>
        </property2>
    </property1>
</xml>
xml file2:
<xml>
    <property1>
        <property2>
            <property3>
                <property4>
                </property4>
            </property3>
        </property2>
    </property1>
</xml>
The names property1, property2 and so on are placeholders; the real files use different names, and there are a lot of properties within each XML file.
I wish to compare the two XML files.
I am using the lxml parser to try to compare the two files and print out the differences between them.
I do not know how to parse them and compare them automatically.
I tried reading about the lxml parser, but I couldn't understand how to apply it to my problem.
Can someone please tell me how I should proceed with this problem?
Code snippets would be very useful.
One more question: am I following the right approach, or am I missing something? Please point me to any other concepts that you know about.
This is actually a reasonably challenging problem (due to what "difference" means often being in the eye of the beholder here, as there will be semantically "equivalent" information that you probably don't want marked as differences).
You could try using xmldiff, which is based on work in the paper Change Detection in Hierarchically Structured Information.
My approach to the problem was transforming each XML into a xml.etree.ElementTree and iterating through each of the layers.
I also included the functionality to ignore a list of attributes while doing the comparison.
The first block of code holds the class used:
import xml.etree.ElementTree as ET
import logging
class XmlTree():
    def __init__(self):
        # xml_compare() reports through self.logger, so set it up here
        self.logger = logging.getLogger('xml_compare')
        self.logger.setLevel(logging.DEBUG)
        self.hdlr = logging.FileHandler('xml-comparison.log')
        self.formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        self.hdlr.setFormatter(self.formatter)
        self.logger.addHandler(self.hdlr)
    @staticmethod
    def convert_string_to_tree(xmlString):
        return ET.fromstring(xmlString)
    def xml_compare(self, x1, x2, excludes=[]):
        """
        Compares two xml etrees
        :param x1: the first tree
        :param x2: the second tree
        :param excludes: list of string of attributes to exclude from comparison
        :return:
            True if both files match
        """
        if x1.tag != x2.tag:
            self.logger.debug('Tags do not match: %s and %s' % (x1.tag, x2.tag))
            return False
        for name, value in x1.attrib.items():
            if not name in excludes:
                if x2.attrib.get(name) != value:
                    self.logger.debug('Attributes do not match: %s=%r, %s=%r'
                                      % (name, value, name, x2.attrib.get(name)))
                    return False
        for name in x2.attrib.keys():
            if not name in excludes:
                if name not in x1.attrib:
                    self.logger.debug('x2 has an attribute x1 is missing: %s'
                                      % name)
                    return False
        if not self.text_compare(x1.text, x2.text):
            self.logger.debug('text: %r != %r' % (x1.text, x2.text))
            return False
        if not self.text_compare(x1.tail, x2.tail):
            self.logger.debug('tail: %r != %r' % (x1.tail, x2.tail))
            return False
        # list(element) replaces getchildren(), which was removed in Python 3.9
        cl1 = list(x1)
        cl2 = list(x2)
        if len(cl1) != len(cl2):
            self.logger.debug('children length differs, %i != %i'
                              % (len(cl1), len(cl2)))
            return False
        i = 0
        for c1, c2 in zip(cl1, cl2):
            i += 1
            if not c1.tag in excludes:
                if not self.xml_compare(c1, c2, excludes):
                    self.logger.debug('children %i do not match: %s'
                                      % (i, c1.tag))
                    return False
        return True
    def text_compare(self, t1, t2):
        """
        Compare two text strings
        :param t1: text one
        :param t2: text two
        :return:
            True if a match
        """
        if not t1 and not t2:
            return True
        if t1 == '*' or t2 == '*':
            return True
        return (t1 or '').strip() == (t2 or '').strip()
The second block of code holds a couple of XML examples and their comparison:
xml1 = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
xml2 = "<note><to>Tove</to><from>Daniel</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
tree1 = XmlTree.convert_string_to_tree(xml1)
tree2 = XmlTree.convert_string_to_tree(xml2)
comparator = XmlTree()
if comparator.xml_compare(tree1, tree2, ["from"]):
    print "XMLs match"
else:
    print "XMLs don't match"
Most of the credit for this code must be given to syawar
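Since the question asks for the differences to be printed rather than a yes/no answer, here is a rough Python 3 sketch of the same recursive walk that yields a message per mismatch. The function name diff_trees and the message wording are invented for this example, and children are paired purely by position, which is an assumption that will not suit every document:

```python
import xml.etree.ElementTree as ET


def diff_trees(e1, e2, path=""):
    """Yield a human-readable line for each difference between two elements."""
    here = f"{path}/{e1.tag}"
    if e1.tag != e2.tag:
        yield f"{here}: tag differs ({e1.tag!r} vs {e2.tag!r})"
        return  # no point descending into unrelated elements
    if e1.attrib != e2.attrib:
        yield f"{here}: attributes differ ({e1.attrib} vs {e2.attrib})"
    if (e1.text or "").strip() != (e2.text or "").strip():
        yield f"{here}: text differs ({e1.text!r} vs {e2.text!r})"
    kids1, kids2 = list(e1), list(e2)
    if len(kids1) != len(kids2):
        yield f"{here}: child count differs ({len(kids1)} vs {len(kids2)})"
    for c1, c2 in zip(kids1, kids2):  # children are compared position by position
        yield from diff_trees(c1, c2, here)


xml1 = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading></note>"
xml2 = "<note><to>Tove</to><from>Daniel</from><heading>Reminder</heading></note>"
diffs = list(diff_trees(ET.fromstring(xml1), ET.fromstring(xml2)))
for line in diffs:
    print(line)  # /note/from: text differs ('Jani' vs 'Daniel')
```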
If your intent is to compare the XML content and attributes, and not just compare the files byte-by-byte, there are subtleties to the question, so there is no solution that fits all cases.
You have to know something about what is important in the XML files.
The order of attributes listed in an element tag is generally not supposed to matter. That is, two XML files that differ only in the order of element attributes generally ought to be judged the same.
But that's the generic part.
The tricky part is application-dependent. For instance, it may be that white-space formatting of some elements of the file doesn't matter, and white-space might be added to the XML for legibility. And so on.
Recent versions of the ElementTree module have a function canonicalize(), which can take care of simpler cases, by putting the XML string into a canonical format.
I used this function in the unit tests of a recent project, to compare a known XML output with output from a package that sometimes changes the order of attributes. In this case, white space in the text elements was unimportant, but it was sometimes used for formatting.
import xml.etree.ElementTree as ET
def _canonicalize_XML( xml_str ):
    """ Canonicalizes XML strings, so they are safe to
    compare directly.
    Strips white space from text content."""
    if not hasattr( ET, "canonicalize" ):
        raise Exception( "ElementTree missing canonicalize()" )
    root = ET.fromstring( xml_str )
    rootstr = ET.tostring( root )
    return ET.canonicalize( rootstr, strip_text=True )
To use it, something like this:
file1 = ET.parse('file1.xml')
file2 = ET.parse('file2.xml')
canon1 = _canonicalize_XML( ET.tostring( file1.getroot() ) )
canon2 = _canonicalize_XML( ET.tostring( file2.getroot() ) )
print( canon1 == canon2 )
In my distribution, Python 2 doesn't have canonicalize(), but Python 3 does (it was added in Python 3.8).
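To make the attribute-order and white-space points concrete, here is a small self-contained sketch (Python 3.8 or later). The two sample strings are invented for the illustration:

```python
import xml.etree.ElementTree as ET

# Same content, but attribute order and padding inside <to> differ.
a = '<note b="2" a="1"><to> Tove </to></note>'
b = '<note a="1" b="2"><to>Tove</to></note>'

same_strict = ET.canonicalize(a) == ET.canonicalize(b)
same_stripped = (ET.canonicalize(a, strip_text=True)
                 == ET.canonicalize(b, strip_text=True))

print(same_strict)    # False: the padding around "Tove" still differs
print(same_stripped)  # True: attribute order and padding no longer matter
```

Whether strip_text=True is appropriate depends on whether white space carries meaning in the files being compared.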
Another script using xml.etree. It's awful but it works :)
#!/usr/bin/env python
import sys
import xml.etree.ElementTree as ET
from termcolor import colored
tree1 = ET.parse(sys.argv[1])
root1 = tree1.getroot()
tree2 = ET.parse(sys.argv[2])
root2 = tree2.getroot()
class Element:
    def __init__(self,e):
        self.name = e.tag
        self.subs = {}
        self.atts = {}
        for child in e:
            self.subs[child.tag] = Element(child)
        for att in e.attrib.keys():
            self.atts[att] = e.attrib[att]
        print "name: %s, len(subs) = %d, len(atts) = %d" % ( self.name, len(self.subs), len(self.atts) )
    def compare(self,el):
        if self.name!=el.name:
            raise RuntimeError("Two names are not the same")
        print "----------------------------------------------------------------"
        print self.name
        print "----------------------------------------------------------------"
        for att in self.atts.keys():
            v1 = self.atts[att]
            if att not in el.atts.keys():
                v2 = '[NA]'
                color = 'yellow'
            else:
                v2 = el.atts[att]
                if v2==v1:
                    color = 'green'
                else:
                    color = 'red'
            print colored("first:\t%s = %s" % ( att, v1 ), color)
            print colored("second:\t%s = %s" % ( att, v2 ), color)
        for subName in self.subs.keys():
            if subName not in el.subs.keys():
                # termcolor has no 'purple'; 'magenta' is the closest supported color
                print colored("first:\thas got %s" % ( subName), 'magenta')
                print colored("second:\thasn't got %s" % ( subName), 'magenta')
            else:
                self.subs[subName].compare( el.subs[subName] )
e1 = Element(root1)
e2 = Element(root2)
e1.compare(e2)
Consider a reStructuredText document with this skeleton:
Main Title
==========
text text text text text
Subsection
----------
text text text text text
.. my-import-from:: file1
.. my-import-from:: file2
The my-import-from directive is provided by a document-specific Sphinx extension, which is supposed to read the file provided as its argument, parse reST embedded in it, and inject the result as a section in the current input file. (Like autodoc, but for a different file format.) The code I have for that, right now, looks like this:
class MyImportFromDirective(Directive):
    required_arguments = 1
    def run(self):
        src, srcline = self.state_machine.get_source_and_line()
        doc_file = os.path.normpath(os.path.join(os.path.dirname(src),
                                                 self.arguments[0]))
        self.state.document.settings.record_dependencies.add(doc_file)
        doc_text = ViewList()
        try:
            doc_text = extract_doc_from_file(doc_file)
        except EnvironmentError as e:
            raise self.error(e.filename + ": " + e.strerror) from e
        doc_section = nodes.section()
        doc_section.document = self.state.document
        # report line numbers within the nested parse correctly
        old_reporter = self.state.memo.reporter
        self.state.memo.reporter = AutodocReporter(doc_text,
                                                   self.state.memo.reporter)
        nested_parse_with_titles(self.state, doc_text, doc_section)
        self.state.memo.reporter = old_reporter
        if len(doc_section) == 1 and isinstance(doc_section[0], nodes.section):
            doc_section = doc_section[0]
        # If there was no title, synthesize one from the name of the file.
        if len(doc_section) == 0 or not isinstance(doc_section[0], nodes.title):
            doc_title = nodes.title()
            doc_title.append(make_title_text(doc_file))
            doc_section.insert(0, doc_title)
        return [doc_section]
This works, except that the new section is injected as a child of the current section, rather than a sibling. In other words, the example document above produces a TOC tree like this:
Main Title
    Subsection
        File1
        File2
instead of the desired
Main Title
    Subsection
    File1
    File2
How do I fix this? The Docutils documentation is ... inadequate, particularly regarding control of section depth. One obvious thing I have tried is returning doc_section.children instead of [doc_section]; that completely removes File1 and File2 from the TOC tree (but does make the section headers in the body of the document appear to be for the right nesting level).
I don't think it is possible to do this by returning the section from the directive (without doing something along the lines of what Florian suggested), as it will get appended to the 'current' section. You can, however, add the section via self.state.section as I do in the following (handling of options removed for brevity)
class FauxHeading(object):
    """
    A heading level that is not defined by a string. We need this to work with
    the mechanics of
    :py:meth:`docutils.parsers.rst.states.RSTState.check_subsection`.
    The important thing is that the length can vary, but it must be equal to
    any other instance of FauxHeading.
    """
    def __init__(self, length):
        self.length = length
    def __len__(self):
        return self.length
    def __eq__(self, other):
        return isinstance(other, FauxHeading)
class ParmDirective(Directive):
    required_arguments = 1
    optional_arguments = 0
    has_content = True
    option_spec = {
        'type': directives.unchanged,
        'precision': directives.nonnegative_int,
        'scale': directives.nonnegative_int,
        'length': directives.nonnegative_int}
    def run(self):
        variableName = self.arguments[0]
        lineno = self.state_machine.abs_line_number()
        secBody = None
        block_length = 0
        # added for some space
        lineBlock = nodes.line('', '', nodes.line_block())
        # parse the body of the directive
        if self.has_content and len(self.content):
            secBody = nodes.container()
            block_length += nested_parse_with_titles(
                self.state, self.content, secBody)
        # keeping track of the level seems to be required if we want to allow
        # nested content. Not sure why, but fits with the pattern in
        # :py:meth:`docutils.parsers.rst.states.RSTState.new_subsection`
        myLevel = self.state.memo.section_level
        self.state.section(
            variableName,
            '',
            FauxHeading(2 + len(self.options) + block_length),
            lineno,
            [lineBlock] if secBody is None else [lineBlock, secBody])
        self.state.memo.section_level = myLevel
        return []
I don't know how to do it directly inside your custom directive. However, you can use a custom transform to raise the File1 and File2 nodes in the tree after parsing. For example, see the transforms in the docutils.transforms.frontmatter module.
In your Sphinx extension, use the Sphinx.add_transform method to register the custom transform.
Update: You can also directly register the transform in your directive by returning one or more instances of the docutils.nodes.pending class in your node list. Make sure to call the note_pending method of the document in that case (in your directive you can get the document via self.state_machine.document).
I've been mucking about with the following bit of dirty support code for a Pylons app, which works fine in a Python shell, a separate Python file, or when running in paster. Now we've put the application online through mod_wsgi and Apache, and this specific piece of code stopped working completely. First off, the code itself:
def fixStyle(self, text):
    t = text.replace('<p>', '<p style="%s">' % (STYLEDEF,))
    t = t.replace('class="wide"', 'style="width: 125px; %s"' % (STYLEDEF,))
    t = t.replace('<td>', '<td style="%s">' % (STYLEDEF,))
    t = t.replace('<a ', '<a style="%s" ' % (LINKSTYLE,))
    return t
It seems pretty straightforward, and to be honest, it is. So what happens when I put a piece of text in it, for example:
<table><tr><td>Test!</td></tr></table>
The output should be:
<table><tr><td style="stuff-from-styledef">Test!</td></tr></table>
and it is, on most systems. When we put it through the app on Apache/mod_wsgi though, the following happens:
<table><tr><td>Test!</td></tr></table>
You guessed it.
I have put logging at the start outputting the text, and at the end outputting original text and the t variable. It displays what I present here: on most systems t is changed, on the apache environment it isn't.
Of course I made sure to restart apache (to get it to reload the .py files) after every change, and it reflected in the logging output.
I'm currently at a loss and have no idea where to go next. Googling doesn't really work out, so I'm hoping on you guys to help out and perhaps point out a fundamental issue with using whatever-is-causing-this.
If anything is missing I'll edit it in.
Add some print statements and examine the Apache logs:
def fixStyle(self, text):
    print "text:", text
    print "STYLEDEF", STYLEDEF
    t = text.replace('<p>', '<p style="%s">' % (STYLEDEF,))
    print "t:", t
I have no idea what is causing your problem, but I don't like the repeated replace() calls: if all four patterns occur in the string, a new string is created four times.
IMO, this should be better:
def fixStyle(self, text):
    t = text.replace('<p>', '<p style="%s">' % (STYLEDEF,))
    t = t.replace('class="wide"', 'style="width: 125px; %s"' % (STYLEDEF,))
    t = t.replace('<td>', '<td style="%s">' % STYLEDEF)
    t = t.replace('<a ', '<a style="%s" ' % (LINKSTYLE,))
    return t
import re
STYLEDEF = 'stuff-from-styledef'
LINKSTYLE = 'VVVV'
def aux(m, dic = {'<p':('<p style="',STYLEDEF),
                  '<td':('<td style="',STYLEDEF),
                  'class="wide"':('style="width: 125px; ',STYLEDEF),
                  '<a':('<a style="',LINKSTYLE)} ):
    return '%s%s"' % dic[m.group()]
# the lookaheads check for '>' or ' ' without consuming it, so m.group() is always a key of dic
pat = re.compile('<p(?=>)|class="wide"|<td(?=>)|<a(?= )')
ch = '<table><tr><td>Test!</td></tr></table><a type="brown" >'
print ch
print fixStyle(None, ch)
print pat.sub(aux,ch)
result
<table><tr><td>Test!</td></tr></table><a type="brown" >
<table><tr><td style="stuff-from-styledef">Test!</td></tr></table><a style="VVVV" type="brown" >
<table><tr><td style="stuff-from-styledef">Test!</td></tr></table><a style="VVVV" type="brown" >
I think re.sub() makes all the replacements in a single pass over the string.
Because dic is a parameter with a default value, the dictionary is built once, when aux() is defined, and is reused on every call; nothing has to be passed in from the outer level.
Also, the function aux() doesn't need to go and look up the values of STYLEDEF and LINKSTYLE outside the function.
All that should increase the execution speed.
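A compact Python 3 sketch of the single-pass idea, using a plain lookup table keyed on the exact text to replace instead of lookahead assertions. The style values are placeholders and fix_style is a hypothetical name:

```python
import re

STYLEDEF = "stuff-from-styledef"  # placeholder values for the demo
LINKSTYLE = "VVVV"

# Each key is the literal text to find; each value is its replacement.
REPLACEMENTS = {
    "<p>": '<p style="%s">' % STYLEDEF,
    "<td>": '<td style="%s">' % STYLEDEF,
    'class="wide"': 'style="width: 125px; %s"' % STYLEDEF,
    "<a ": '<a style="%s" ' % LINKSTYLE,
}
# One alternation of all keys, escaped so they are matched literally.
PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))


def fix_style(text):
    """Apply every replacement in one pass over the text."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group()], text)


print(fix_style('<table><tr><td>Test!</td></tr></table>'))
```

Whether this is measurably faster than four replace() calls would need to be timed on real input.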
EDIT:
Since ' style="' and STYLEDEF are common to several of the returned results, I tried to shorten the list and came up with:
def aux(m, dic = {'<p' :'<p style="%s"',
                  '<td' :'<td style="%s"',
                  'class="wide"':'style="width: 125px; %s"'} ):
    if m.group(1):
        return '<a style="%s"' % LINKSTYLE
    else:
        return dic[m.group()] % STYLEDEF
pat = re.compile('<p(?=>)|class="wide"|<td(?=>)|(<a)(?= )')
To get rid of the conditional lines, I wrote the preceding solution and, I don't know why, stopped there. The point of that solution was the regular expression with lookahead assertions, which makes John Machin's solution possible, but I polluted it with those oafish tuples.
There is also this solution:
def aux(m, STY = STYLEDEF,LIN = LINKSTYLE ):
    return ( 'style="width: 125px; ' if m.group(3) else m.group(1)+' style="' ) + \
           ( LIN if m.group(2) else STY) + '"'
pat = re.compile('(<p(?=>)|<td(?=>)|(<a(?= )))|(class="wide")')
But the clearer and simpler solution is, as John Machin noticed:
def aux(m, dic = {'<p' :'<p style="%s"' % STYLEDEF,
                  '<td':'<td style="%s"' % STYLEDEF,
                  '<a' :'<a style="%s"' % LINKSTYLE,
                  'class="wide"':'style="%s"' % ('width: 125px; '+STYLEDEF) } ):
    return dic[m.group()]
pat = re.compile('<p(?=>)|class="wide"|<td(?=>)|<a(?= )')
The values in dic are computed only once, when the definition of aux() is executed.
In fact, it is very close to the arguments of the replace() calls.
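One consequence of defaults being evaluated at definition time is worth seeing in isolation. This is a stripped-down Python 3 illustration with invented values, not the code from the answer:

```python
STYLEDEF = "color: red"


def aux(tag, table={"<td": '<td style="%s"' % STYLEDEF}):
    # 'table' was built once, when this def statement ran
    return table[tag]


STYLEDEF = "color: blue"  # rebinding the global later does not touch the default

print(aux("<td"))  # <td style="color: red"
```

So if STYLEDEF can change while the program runs, the precomputed dictionary would have to be rebuilt.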
Sorry, but: Descriptions of debugging that don't mention repr() are not credible. Ensure that you are logging repr(text) and repr(t), NOT text and t.
Run the non-working environment and at least one of the working environments on the same piece of data and edit your question to show the actual code that you used and the actual logging output.
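To illustrate why repr() matters, here is a Python 3 sketch with a contrived input. The zero-width space is an assumption made for the demo, not a diagnosis of the question; it only shows how a string can print as if it contained <td> while replace() never finds it:

```python
text = "<td\u200b>Test!</td>"  # contrived: zero-width space hidden inside the tag

fixed = text.replace("<td>", '<td style="x">')

print(fixed == text)  # True: nothing was replaced
print(repr(text))     # the repr shows the \u200b escape that print(text) hides
```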
mystring = '14| "Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2"|minor'
So I have 3 elements (id, message and level) divided by pipe ("|"). I want to get each element so I have written these little functions:
def get_msg(i):
    x = i.split("|")
    return x[1].strip().replace('"','')
def get_level(i):
    x = i.split("|")
    return x[2].strip()
#testing
print get_msg(mystring ) # Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2
print get_level(mystring )# minor
Right now it works well, but I feel this is not a pythonic way to solve it. How could these two functions be improved? A regular expression feels like a good fit here, but I'm too inexperienced with them to apply one.
I think the most pythonic way is to use the csv module.
From PyMotW with delimiter option:
import csv
import sys
f = open(sys.argv[1], 'rt')
try:
    reader = csv.reader(f, delimiter='|')
    for row in reader:
        print row
finally:
    f.close()
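The same reader can be applied directly to the string from the question, since csv.reader accepts any iterable of lines. A Python 3 sketch; the variable name alarm_id is made up, and skipinitialspace=True is needed here so that the quote after the space is treated as the start of a quoted field:

```python
import csv

mystring = '14| "Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2"|minor'

# A one-element list stands in for a file with a single line.
reader = csv.reader([mystring], delimiter='|', skipinitialspace=True)
alarm_id, message, level = next(reader)

print(message)  # Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2
print(level)    # minor
```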
lst = msg.split('|')
level = lst[2].strip()
message = lst[1].strip(' "')
You're splitting your string twice, which is a bit of a waste; other than that, the modification is minor.
class MyParser(object):
    def __init__(self, value):
        self.lst = value.split('|')
    def id(self):
        return self.lst[0]
    def level(self):
        return self.lst[2].strip()
    def message(self):
        return self.lst[1].strip(' "')
I think the best practice would be to actually have a better formatted string, or not use a string for that. Why is it a string? Where are you parsing this from? A database? Xml? Can the origin be altered?
{ 'id': 14, 'message': 'foo', 'type': 'minor' }
A datatype like this I think would be a best practice, if it's stored in a database then split it up in multiple columns.
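If the source really has to stay a pipe-delimited string, one way to get such a structured record is to parse it once into a named tuple. This is a Python 3 sketch of a swapped-in technique, not the dictionary shown above; Alarm and parse_alarm are hypothetical names, and converting the id to int is an assumption:

```python
from collections import namedtuple

Alarm = namedtuple("Alarm", ["alarm_id", "message", "level"])


def parse_alarm(line):
    """Split once on '|' and strip spaces and quotes from each part."""
    alarm_id, message, level = (part.strip(' "') for part in line.split("|"))
    return Alarm(int(alarm_id), message, level)


alarm = parse_alarm('14| "Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2"|minor')
print(alarm.level)  # minor
```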
Edit: I'm probably going to get stoned for this because it's probably overkill/inefficient but if you add lots of sections later on you could store these in a nice hash map:
>>> formatParts = {
... 'id': lambda x: x[0],
... 'message': lambda x: x[1].strip(' "'),
... 'level': lambda x: x[2].strip()
... }
>>> myList = mystring.split('|')
>>> formatParts['id'](myList)
'14'
>>> formatParts['message'](myList)
'Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2'
>>> formatParts['level'](myList)
'minor'
If you don't need the getter functions, this should work nicely:
>>> m_id,msg,lvl = [s.strip(' "') for s in mystring.split('|')]
>>> m_id,msg,lvl
('14', 'Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2',
'minor')
Note: avoid shadowing built-in function 'id'