How to get parser error as exception in docutils

I have the following simple piece of code to parse a reST file and return the corresponding DOM tree.
from docutils import nodes, utils
from docutils.parsers import rst
def _rst_to_dom(self, txt):
    """Parse reStructuredText and return corresponding DOM tree."""
    document = utils.new_document("Doc")
    document.settings.tab_width = 4
    document.settings.pep_references = 1
    document.settings.rfc_references = 1
    document.settings.raw_enabled = True
    document.settings.file_insertion_enabled = True
    rst.Parser().parse(txt, document)
    return document.asdom()
This works great, but when the parser finds some problem with the input, instead of raising an exception so that my program knows that there is something wrong, it simply prints out an error message to the standard output and returns a tree with what it could do. How can I get it to raise an exception? Or, how can I know that something was amiss?
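One possible approach, sketched here under a few assumptions, is to lower the reporter's halt_level so that docutils raises docutils.utils.SystemMessage instead of only reporting the problem. The function name parse_rst_strict and the sample input are made up for illustration. The settings are built with the parser's defaults because the reporter is created from them inside utils.new_document; frontend.get_default_settings is assumed to be available (docutils 0.19 or later), with a fallback to OptionParser for older releases.

```python
import io

from docutils import frontend, utils
from docutils.parsers import rst


def parse_rst_strict(text, halt_level=2):
    """Parse reST and raise docutils.utils.SystemMessage on problems.

    Levels: 1 = INFO, 2 = WARNING, 3 = ERROR, 4 = SEVERE.
    (Hypothetical helper name, not part of docutils.)
    """
    try:
        # Assumed to exist in docutils >= 0.19.
        settings = frontend.get_default_settings(rst.Parser)
    except AttributeError:
        # Older releases: build the defaults through OptionParser.
        settings = frontend.OptionParser(
            components=(rst.Parser,)).get_default_values()
    # Any system message at or above this level raises an exception.
    settings.halt_level = halt_level
    # Send the textual report to a buffer instead of stderr.
    settings.warning_stream = io.StringIO()
    document = utils.new_document("Doc", settings)
    rst.Parser().parse(text, document)
    return document


error_text = None
try:
    parse_rst_strict(".. no-such-directive::\n")
except utils.SystemMessage as exc:
    error_text = str(exc)
```

If raising is too drastic, another option is to leave halt_level alone and inspect the system_message nodes that the parser leaves in the returned tree.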

Related

Python: How to properly handle an exception using 'ask for forgiveness' (try–except) approach?

I encountered a problem with my custom exceptions since they exit the process (the interpreter displays a traceback) instead of being properly handled in the code. Since I do not have much experience working with custom exceptions, and exceptions imported from modules work properly in the same code, I presume I made some mistake while defining my exceptions, but I cannot find proper documentation to fix it myself.
Here is some sample code.
It is supposed to check whether the XML path entered by the user works (by "works" I mean that it returns the value contained within that XML element node). If it does not work, it raises an XMLPrefixMissing exception (possibly due to a missing namespace prefix in the XML path). It then uses an XML path with a wildcard operator in place of the namespace prefix, and if that does not work either, it raises XMLElementNotFound (possibly because the element is not in the XML file).
import xml.etree.ElementTree as ElementTree
class Error(Exception):
    """Error base class"""
    pass
class XMLPrefixMissing(Error):
    """Error for when an element is not found on an XML path"""
    def __init__(self,
                 message='No element found on an XML path. Possibly missing namespace prefix.'):
        self.message = message
        super(Error, self).__init__(message)
class XMLElementNotFound(Error):
    """Error for when an element value on an XML path is an empty string"""
    def __init__(self, message='No element found on an XML path.'):
        self.message = message
        super(Error, self).__init__(message)
# Some code
file = '.\folder\example_file.xml'
xml_path = './DataArea/Order/Item/Description/ItemName'
xml_path_with_wildcard = './{*}DataArea/{*}Order/{*}Item/{*}Description/{*}ItemName'
namespaces = {'': 'http://firstnamespace.example.com/', 'foo': 'http://secondnamespace.example.com/'}
def xml_parser(file, xml_path, xml_path_with_wildcard, namespaces):
    tree = ElementTree.parse(file)
    root = tree.getroot()
    try:
        if root.find(xml_path, namespaces=namespaces) is None:
            raise XMLElementNotFound
        # Some code
    except XMLPrefixMissing:
        if root.find(xml_path_with_wildcard, namespaces=namespaces) is None:
            raise XMLElementValueEmpty
        # Some code
    except XMLElementNotFound as e:
        print(e)

From experience, try-except blocks kind of operate as an if/elif/else chain.
It looks to me like you're trying to raise exceptions in one except block in hopes that it will be caught in the next except block.
Instead, you should try to account for all exceptions inside the try block, and just have different except blocks to catch the different exceptions.
try:
    if root.find(xml_path, namespaces=namespaces) is None:
        raise XMLElementNotFound
    elif root.find(xml_path_with_wildcard, namespaces=namespaces) is None:
        raise XMLElementValueEmpty
except XMLPrefixMissing:
    pass  # some code
except XMLElementValueEmpty as e:
    print(e)
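The point about sibling except blocks can be shown with a small self-contained sketch. It assumes Python 3.8 or later (for the {*} wildcard), and the sample XML, the find_text helper and the simplified exception classes are made up for illustration. An exception raised inside an except handler is not caught by the other except clauses of the same try; it propagates to the caller, which is where it has to be handled.

```python
import xml.etree.ElementTree as ElementTree


class Error(Exception):
    """Base class for the custom errors below."""


class XMLPrefixMissing(Error):
    """The plain path matched nothing; a namespace prefix may be missing."""


class XMLElementNotFound(Error):
    """Neither the plain path nor the wildcard path matched."""


def find_text(root, xml_path, xml_path_with_wildcard):
    """Return the element text, falling back to the wildcard path."""
    try:
        element = root.find(xml_path)
        if element is None:
            raise XMLPrefixMissing(xml_path)
    except XMLPrefixMissing:
        # Anything raised here skips the sibling except clauses of this
        # try statement and goes straight to the caller.
        element = root.find(xml_path_with_wildcard)
        if element is None:
            raise XMLElementNotFound(xml_path_with_wildcard)
    return element.text


# Hypothetical sample document with a default namespace.
SAMPLE = (
    '<Root xmlns="http://firstnamespace.example.com/">'
    '<DataArea><Order><Item><Description>'
    '<ItemName>Widget</ItemName>'
    '</Description></Item></Order></DataArea>'
    '</Root>'
)
root = ElementTree.fromstring(SAMPLE)
plain = './DataArea/Order/Item/Description/ItemName'
wildcard = './{*}DataArea/{*}Order/{*}Item/{*}Description/{*}ItemName'

item_name = find_text(root, plain, wildcard)

try:
    find_text(root, './Missing', './{*}Missing')
    missing_handled = False
except XMLElementNotFound:
    # The caller, not a sibling except clause, handles the second error.
    missing_handled = True
```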

Trying to implement jsonschema with just filepaths

I've uploaded a json from the user and now I'm trying to compare that json to a schema using the jsonschema validator. I'm getting an error, ValidationError: is not of type u'object'
Failed validating u'type' in schema
This is my code so far:
from __future__ import unicode_literals
from django.shortcuts import render, redirect
import jsonschema
import json
import os
from django.conf import settings
#File to store all the parsers
def jsonVsSchemaParser(project, file):
    baseProjectURL = 'src\media\json\schema'
    projectSchema = project.lower()+'.schema'
    projectPath = os.path.join(baseProjectURL,projectSchema)
    filePath = os.path.join(settings.BASE_DIR,'src\media\json', file)
    actProjectPath = os.path.join(settings.BASE_DIR,projectPath)
    print filePath, actProjectPath
    schemaResponse = open(actProjectPath)
    schema = json.load(schemaResponse)
    response = open(filePath)
    jsonFile = json.load(response)
    jsonschema.validate(jsonFile, schema)
I'm trying to do something similar to this question except instead of using a url I'm using my filepath.
Also I'm using python 2.7 and Django 1.11 if that is helpful at all.
Also I'm pretty sure I don't have a problem with my filepaths because I printed them and it outputted what I was expecting. I also know that my schema and json can be read by jsonschema since I used it on the command line as well.
EDIT: That validation error seemed to be a fluke. The actual validation error I'm consistently getting is "-1 is not of type u'string'". The annoying thing is that this error is expected: sessionid really isn't a string, and I do want jsonschema to catch that, but I don't want my validation errors to be given in this format: . What I want to do is collect all validation errors in an array and then show them to the user on the next page.

I just ended up putting a try-except around my validate call. Here's what it looks like:
validationErrors = []
try:
    jsonschema.validate(jsonFile, schema)
except jsonschema.exceptions.ValidationError as error:
    validationErrors.append(error)
EDIT: This solution only works if you have one error, because validate raises on the first validation error and stops there. To present every error you need to use lazy validation. This is how it looks in my code if you need another example:
v = jsonschema.Draft4Validator(schema)
for error in v.iter_errors(jsonFile):
    validationErrors.append(error)
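To hand the collected errors to a template, it may help to turn each one into a plain string first. The following is a minimal sketch, assuming the jsonschema package is installed; the schema and the document are invented for illustration. It uses the message and path attributes of each error yielded by iter_errors.

```python
import jsonschema

# Hypothetical schema and document, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "sessionid": {"type": "string"},
        "count": {"type": "integer"},
    },
    "required": ["sessionid"],
}
document = {"sessionid": -1, "count": "three"}

validator = jsonschema.Draft4Validator(schema)

# iter_errors yields every violation lazily instead of raising on the first.
validation_errors = sorted(
    "{}: {}".format(
        "/".join(str(part) for part in error.path) or "<root>",
        error.message,
    )
    for error in validator.iter_errors(document)
)
```

Each entry pairs the location of the offending value with the validator's message, so the list can be rendered on the next page without touching the exception objects.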

✓ The try-except-else-finally statement is a good way to catch and handle exceptions (run-time errors) in Python.
✓ So if you want to catch exceptions and store them in an array, a try-except statement is the way to do it. You can store the caught exceptions in any data structure, such as a list, and your program will continue executing instead of terminating.
✓ Below is a modified version of the code, where a for loop catches the error 5 times and stores it in a list.
validationErrors = []
for i in range(5):
    try:
        jsonschema.validate(jsonFile, schema)
    except jsonschema.exceptions.ValidationError as error:
        validationErrors.append(error)
✓ Finally, have a look at the code sample below, where I store ZeroDivisionError instances and their string messages in 2 different lists by iterating over a for loop 5 times.
You can pass the 2nd list, ZeroDivisionErrorMessagesList, to a template if you want to print the messages on a web page. You can use the 1st list as well.
ZeroDivisionErrorsList = []
ZeroDivisionErrorMessagesList = list()  # list() is the same as []
for i in range(5):
    try:
        a = 10 / 0  # it will raise an exception
        print(a)  # it will not execute
    except ZeroDivisionError as error:
        ZeroDivisionErrorsList.append(error)
        ZeroDivisionErrorMessagesList.append(str(error))
print(ZeroDivisionErrorsList)
print()  # new line
print(ZeroDivisionErrorMessagesList)
» Output:
[ZeroDivisionError('division by zero',),
ZeroDivisionError('division by zero',),
ZeroDivisionError('division by zero',),
ZeroDivisionError('division by zero',),
ZeroDivisionError('division by zero',)]
['division by zero', 'division by zero', 'division by zero', 'division by zero', 'division by zero']

Python: Why will this string print but not write to a file?

I am new to Python and working on a utility that changes an XML file into an HTML. The XML comes from a call to request = urllib2.Request(url), where I generate the custom url earlier in the code, and then set response = urllib2.urlopen(request) and, finally, xml_response = response.read(). This works okay, as far as I can tell.
My trouble is with parsing the response. For starters, here is a partial example of the XML structure I get back:
I tried adapting the slideshow example in the minidom tutorial here to parse my XML (which is ebay search results, by the way): http://docs.python.org/2/library/xml.dom.minidom.html
My code so far looks like this, with try blocks as an attempt to diagnose issues:
doc = minidom.parseString(xml_response)
#Extract relevant information and prepare it for HTML formatting.
try:
    handleDocument(doc)
except:
    print "Failed to handle document!"
def getText(nodelist): #taken straight from slideshow example
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            print "A TEXT NODE!"
            rc.append(node.data)
    return ''.join(rc) #this is a string, right?
def handleDocument(doc):
    outputFile = open("EbaySearchResults.html", "w")
    outputFile.write("<html>\n")
    outputFile.write("<body>\n")
    try:
        items = doc.getElementsByTagName("item")
    except:
        "Failed to get elements by tag name."
    handleItems(items)
    outputFile.write("</html>\n")
    outputFile.write("</body>\n")
def handleItems(items):
    for item in items:
        title = item.getElementsByTagName("title")[0] #there should be only one title
        print "<h2>%s</h2>" % getText(title.childNodes) #this works fine!
        try: #none of these things work!
            outputFile.write("<h2>%s</h2>" % getText(title.childNodes))
            #outputFile.write("<h2>" + getText(title.childNodes) + "</h2>")
            #str = getText(title.childNodes)
            #outputFIle.write(string(str))
            #outputFile.write(getText(title.childNodes))
        except:
            print "FAIL"
I do not understand why the correct title text does print to the console but throws an exception and does not work for the output file. Writing plain strings like this works fine: outputFile.write("<html>\n") What is going on with my string construction? As far as I can tell, the getText method I am using from the minidom example returns a string--which is just the sort of thing you can write to a file..?

If I print the actual stack trace...
        ...
        except:
            print "Exception when trying to write to file:"
            print '-'*60
            traceback.print_exc(file=sys.stdout)
            print '-'*60
            traceback.print_tb(sys.last_traceback)
        ...
...I will instantly see the problem:
------------------------------------------------------------
Traceback (most recent call last):
  File "tohtml.py", line 85, in handleItems
    outputFile.write(getText(title.childNodes))
NameError: global name 'outputFile' is not defined
------------------------------------------------------------
Looks like something has gone out of scope!
Fellow beginners, take note.
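One way to fix the scope problem, sketched here in Python 3 with renamed helper functions and a made-up sample document, is to pass the open file object to the function that writes to it instead of relying on a name that only exists inside handleDocument. The sketch writes to an io.StringIO buffer so it runs without touching the disk; a file opened with open(..., "w") can be passed in exactly the same way.

```python
import io
from xml.dom import minidom


def get_text(nodelist):
    """Concatenate the data of all text nodes in nodelist."""
    return ''.join(node.data for node in nodelist
                   if node.nodeType == node.TEXT_NODE)


def handle_items(items, output_file):
    # output_file arrives as a parameter, so the name is defined here.
    for item in items:
        title = item.getElementsByTagName("title")[0]
        output_file.write("<h2>%s</h2>\n" % get_text(title.childNodes))


def handle_document(doc, output_file):
    output_file.write("<html>\n<body>\n")
    handle_items(doc.getElementsByTagName("item"), output_file)
    output_file.write("</body>\n</html>\n")


# Hypothetical stand-in for the search response.
sample = "<results><item><title>Red bike</title></item></results>"
buffer = io.StringIO()
handle_document(minidom.parseString(sample), buffer)
html = buffer.getvalue()
```

Avoiding the bare except blocks from the question also helps, since they are what hid the NameError in the first place.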

Python: Update XML-file using ElementTree while conserving layout as much as possible

I have a document which uses an XML namespace for which I want to increase /group/house/dogs by one: (the file is called houses.xml)
<?xml version="1.0"?>
<group xmlns="http://dogs.house.local">
    <house>
        <id>2821</id>
        <dogs>2</dogs>
    </house>
</group>
My current result using the code below is: (the created file is called houses2.xml)
<ns0:group xmlns:ns0="http://dogs.house.local">
    <ns0:house>
        <ns0:id>2821</ns0:id>
        <ns0:dogs>3</ns0:dogs>
    </ns0:house>
</ns0:group>
I would like to fix two things (if it is possible using ElementTree; if it isn't, I'd be grateful for a suggestion as to what I should use instead):
I want to keep the <?xml version="1.0"?> line.
I do not want to prefix all tags; I'd like to keep them as they are.
In conclusion, I don't want to mess with the document more than I absolutely have to.
My current code (which works except for the above mentioned flaws) generating the above result follows.
I have made a utility function which loads an XML file using ElementTree and returns the elementTree and the namespace (as I do not want to hard code the namespace, and am willing to take the risk it implies):
def elementTreeRootAndNamespace(xml_file):
    from xml.etree import ElementTree
    import re
    element_tree = ElementTree.parse(xml_file)
    # Search for a namespace on the root tag
    namespace_search = re.search('^({\S+})', element_tree.getroot().tag)
    # Keep the namespace empty if none exists, if a namespace exists set
    # namespace to {namespacename}
    namespace = ''
    if namespace_search:
        namespace = namespace_search.group(1)
    return element_tree, namespace
This is my code to update the number of dogs and save it to the new file houses2.xml:
elementTree, namespace = elementTreeRootAndNamespace('houses.xml')
# Insert the namespace before each tag when finding the current number of dogs,
# as ElementTree requires the namespace to be prefixed within {...} when a
# namespace is used in the document.
dogs = elementTree.find('{ns}house/{ns}dogs'.format(ns = namespace))
# Increase the number of dogs by one
dogs.text = str(int(dogs.text) + 1)
# Write the result to the new file houses2.xml.
elementTree.write('houses2.xml')

An XML-based solution to this problem is to write a helper class for ElementTree which:
Grabs the XML declaration line before parsing, as ElementTree at the time of writing is unable to write an XML declaration line without also writing an encoding attribute (I checked the source).
Parses the input file once and grabs the namespace of the root element, then registers that namespace with ElementTree as having the empty string as prefix. When that is done, the source file is parsed using ElementTree again, with that new setting.
It has one major drawback:
XML comments are lost, which I have learned is not acceptable for this situation (I initially didn't think the input data had any comments, but it turns out it has).
My helper class with example:
from xml.etree import ElementTree as ET
import re
class ElementTreeHelper():
    def __init__(self, xml_file_name):
        xml_file = open(xml_file_name, "rb")
        self.__parse_xml_declaration(xml_file)
        self.element_tree = ET.parse(xml_file)
        xml_file.seek(0)
        root_tag_namespace = self.__root_tag_namespace(self.element_tree)
        self.namespace = None
        if root_tag_namespace is not None:
            self.namespace = '{' + root_tag_namespace + '}'
            # Register the root tag namespace as having an empty prefix, as
            # this has to be done before parsing xml_file we re-parse.
            ET.register_namespace('', root_tag_namespace)
            self.element_tree = ET.parse(xml_file)
    def find(self, xpath_query):
        return self.element_tree.find(xpath_query)
    def write(self, xml_file_name):
        xml_file = open(xml_file_name, "wb")
        if self.xml_declaration_line is not None:
            # The file is opened in binary mode, so write bytes.
            xml_file.write(self.xml_declaration_line + b'\n')
        return self.element_tree.write(xml_file)
    def __parse_xml_declaration(self, xml_file):
        # The file is opened in binary mode, so compare against bytes.
        first_line = xml_file.readline().strip()
        if first_line.startswith(b'<?xml') and first_line.endswith(b'?>'):
            self.xml_declaration_line = first_line
        else:
            self.xml_declaration_line = None
        xml_file.seek(0)
    def __root_tag_namespace(self, element_tree):
        namespace_search = re.search(r'^{(\S+)}', element_tree.getroot().tag)
        if namespace_search is not None:
            return namespace_search.group(1)
        else:
            return None
def __main():
    el_tree_hlp = ElementTreeHelper('houses.xml')
    dogs_tag = el_tree_hlp.element_tree.getroot().find(
        '{ns}house/{ns}dogs'.format(
            ns=el_tree_hlp.namespace))
    one_dog_added = int(dogs_tag.text.strip()) + 1
    dogs_tag.text = str(one_dog_added)
    el_tree_hlp.write('hejsan.xml')
if __name__ == '__main__':
    __main()
The output:
<?xml version="1.0"?>
<group xmlns="http://dogs.house.local">
    <house>
        <id>2821</id>
        <dogs>3</dogs>
    </house>
</group>
If someone has an improvement to this solution please don't hesitate to grab the code and improve it.
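As one possible improvement for the lost comments, newer Python versions (3.8 and later) let the standard-library parser keep comments inside the root element through TreeBuilder(insert_comments=True). The sketch below is an assumption-laden variant of the helper above, not a drop-in replacement: it works on a string instead of a file, the add_one_dog name and the sample comment are made up, and the declaration is copied back verbatim because ElementTree would otherwise add an encoding attribute.

```python
import re
import xml.etree.ElementTree as ET

# Sample input based on houses.xml, with a hypothetical comment added.
SOURCE = """<?xml version="1.0"?>
<group xmlns="http://dogs.house.local">
    <!-- kennel data -->
    <house>
        <id>2821</id>
        <dogs>2</dogs>
    </house>
</group>
"""


def add_one_dog(xml_text):
    """Return xml_text with group/house/dogs increased by one."""
    # Remember the declaration, if any, so it can be written back verbatim.
    declaration = re.match(r'\s*(<\?xml[^>]*\?>)', xml_text)

    # insert_comments=True keeps comments that sit inside the root element.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    prefix = ''
    namespace = re.match(r'\{([^}]+)\}', root.tag)
    if namespace:
        # Serialize this namespace as the default one (no ns0: prefixes).
        # Note that register_namespace changes module-wide state.
        ET.register_namespace('', namespace.group(1))
        prefix = '{%s}' % namespace.group(1)

    dogs = root.find('{p}house/{p}dogs'.format(p=prefix))
    dogs.text = str(int(dogs.text) + 1)

    body = ET.tostring(root, encoding='unicode')
    if declaration:
        return declaration.group(1) + '\n' + body
    return body


result = add_one_dog(SOURCE)
```

Comments that appear before or after the root element are still dropped with this approach, so it only narrows the drawback rather than removing it.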

Round-tripping, unfortunately, isn't a trivial problem. With XML, it's generally not possible to preserve the original document unless you use a special parser (like DecentXML but that's for Java).
Depending on your needs, you have the following options:
If you control the source and you can secure your code with unit tests, you can write your own, simple parser. This parser doesn't accept XML but only a limited subset. You can, for example, read the whole document as a string and then use Python's string operations to locate <dogs> and replace anything up to the next <. Hack? Yes.
You can filter the output. XML allows the string <ns0: only in one place, so you can search&replace it with < and then the same with <group xmlns:ns0=" → <group xmlns=". This is pretty safe unless you can have CDATA in your XML.
You can write your own, simple XML parser. Read the input as a string and then create Elements for each pair of <> plus their positions in the input. That allows you to take the input apart quickly but only works for small inputs.

When saving the XML, passing the default_namespace argument is an easy way to avoid the ns0 prefix. From my code:
Key code: xmltree.write(xmlfiile,"utf-8",default_namespace=xmlnamespace)
if os.path.isfile(xmlfiile):
    xmltree = ET.parse(xmlfiile)
    root = xmltree.getroot()
    xmlnamespace = root.tag.split('{')[1].split('}')[0]  # get namespace
    initwin=xmltree.find("./{"+ xmlnamespace +"}test")
    initwin.find("./{"+ xmlnamespace +"}content").text = "aaa"
    xmltree.write(xmlfiile,"utf-8",default_namespace=xmlnamespace)

etree from lxml provides this feature.
elementTree.write('houses2.xml',encoding = "UTF-8",xml_declaration = True) makes sure the declaration is not omitted.
When writing to the file, it does not change the namespaces.
http://lxml.de/parsing.html is the link to its tutorial.
P.S.: lxml has to be installed separately.

How Do I Suppress or Disable Warnings in reSTructuredText?

I'm working on a CMS in Python that uses reStructuredText (via docutils) to format content. A lot of my content is imported from other sources and usually comes in the form of unformatted text documents. reST works great for this because it makes everything look pretty sane by default.
One problem I am having, however, is that I get warnings dumped to stderr on my webserver and injected into my page content. For example, I get warnings like the following on my web page:
System Message: WARNING/2 (, line 296); backlink
My question is: How do I suppress, disable, or otherwise re-direct these warnings?
Ideally, I'd love to write these out to a log file, but if someone can just tell me how to turn off the warnings from being injected into my content then that would be perfect.
The code that's responsible for parsing the reST into HTML:
from docutils import core
import reSTpygments
def reST2HTML( str ):
    parts = core.publish_parts(
        source = str,
        writer_name = 'html')
    return parts['body_pre_docinfo'] + parts['fragment']

def reST2HTML( str ):
    parts = core.publish_parts(
        source = str,
        writer_name = 'html',
        settings_overrides={'report_level':'quiet'},
        )
    return parts['body_pre_docinfo'] + parts['fragment']

It seems that passing report_level as a string only worked in older versions. The following works for me now.
import docutils.core
import docutils.utils
from pathlib import Path
shut_up_level = docutils.utils.Reporter.SEVERE_LEVEL + 1
docutils.core.publish_file(
    source_path=Path(...), destination_path=Path(...),
    settings_overrides={'report_level': shut_up_level},
    writer_name='html')
About the levels:
# docutils.utils.__init__.py
class Reporter(object):
    # system message level constants:
    (DEBUG_LEVEL,
     INFO_LEVEL,
     WARNING_LEVEL,
     ERROR_LEVEL,
     SEVERE_LEVEL) = range(5)
    ...
    def system_message(self, level, message, *children, **kwargs):
        ...
        if self.stream and (level >= self.report_level  # self.report_level was set by you (for example, shut_up_level)
                            or self.debug_flag and level == self.DEBUG_LEVEL
                            or level >= self.halt_level):
            self.stream.write(msg.astext() + '\n')
        ...
        return msg
As the code above shows, setting self.report_level (i.e. settings_overrides={'report_level': ...}) high enough keeps the warnings from being shown.
I set it to SEVERE_LEVEL + 1, so no messages are shown at all. (You can set it according to your needs.)
