Generating XML in Python - python

I'm developing an API using Python which makes server calls using XML. I am debating on whether to use a library (ex. http://wiki.python.org/moin/MiniDom) or if it would be "better" (meaning less overhead and faster) to use string concatenation in order to generate the XML used for each request. Also, the XML I will be generating is going to be quite dynamic so I'm not sure if something that allows me to manage elements dynamically will be a benefit.

Since you are just using authorize.net, why not use a library specifically designed for the Authorize.net API and forget about constructing your own XML calls?
If you want or need to go your own way with XML, don't use minidom, use something with an ElementTree interface such as cElementTree (which is in the standard library). It will be much less painful and probably much faster. You will certainly need an XML library to parse the XML you produce, so you might as well use the same API for both.
It is very unlikely that the overhead of using an XML library will be a problem, and the benefit in clean code and knowing you can't generate invalid XML is very great.
If you absolutely, positively need to be as fast as possible, use one of the extremely fast templating libraries available for Python. They will probably be much faster than any naive string concatenation you do and will also be safe (i.e do proper escaping).

Another option is to use Jinja, especially if the dynamic nature of your xml is fairly simple. This is a common idiom in flask for generating html responses.
Here is an example of a jinja template that generates the XML of an aws S3 list objects response. I usually store the template in a separate file to avoid polluting my elegant python with ugly xml.
from datetime import datetime
from jinja2 import Template
list_bucket_result = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>{{bucket_name}}</Name>
<Prefix/>
<KeyCount>{{object_count}}</KeyCount>
<MaxKeys>{{max_keys}}</MaxKeys>
<IsTruncated>{{is_truncated}}</IsTruncated>
{%- for object in object_list %}
<Contents>
<Key>{{object.key}}</Key>
<LastModified>{{object.last_modified_date.isoformat()}}</LastModified>
<ETag></ETag>
<Size>{{object.size}}</Size>
<StorageClass>STANDARD</StorageClass>
</Contents>{% endfor %}
</ListBucketResult>
"""
class BucketObject:
def __init__(self, key, last_modified_date, size):
self.key = key
self.last_modified_date = last_modified_date
self.size = size
object_list = [
BucketObject('/foo/bar.txt', datetime.utcnow(), 10*1024 ),
BucketObject('/foo/baz.txt', datetime.utcnow(), 29*1024 ),
]
template = Template(list_bucket_result)
result = template.render(
bucket_name='test-bucket',
object_count=len(object_list),
max_keys=1000,
is_truncated=False,
object_list=object_list
)
print result
Output:
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>test-bucket</Name>
<Prefix/>
<KeyCount>2</KeyCount>
<MaxKeys>1000</MaxKeys>
<IsTruncated>False</IsTruncated>
<Contents>
<Key>/foo/bar.txt</Key>
<LastModified>2017-10-31T02:28:34.551000</LastModified>
<ETag></ETag>
<Size>10240</Size>
<StorageClass>STANDARD</StorageClass>
</Contents>
<Contents>
<Key>/foo/baz.txt</Key>
<LastModified>2017-10-31T02:28:34.551000</LastModified>
<ETag></ETag>
<Size>29696</Size>
<StorageClass>STANDARD</StorageClass>
</Contents>
</ListBucketResult>

I would definitely recommend that you make use of one of the Python libraries; such as MiniDom, ElementTree, lxml.etree or pyxser. There is no reason to not to, and the potential performance impact will be minimal.
Although, personally I prefer using simplejson (or simply json) instead.
my_list = ["Value1", "Value2"]
json = simplejson.dumps(my_list)
# send json

My real question is what are the biggest concerns for what you're trying to accomplish? If you're worried about speed/memory, then yes, minidom does take a hit. If you want something that's fairly reliable that you can deploy quickly, I'd say use it.
My suggestion for dealing with XML in any language(Java, Python, C#, Perl, etc) is to consider using something already existing. Everyone has written their own XML parser at least once, and then they never do so again because it's such a pain in the behind. And to be fair, these libraries will have already fixed 99.5% of any problems you would run into.

I recommend LXML. It's a Python library of bindings for the very fast C libraries libxml2 and libxslt.
LXML supports XPATH, and has an elementTree implementation. LXML also has an interface called objectify for writing XML as object hierarchies:
from lxml import etree, objectify
E = objectify.ElementMaker(annotate=False)
my_alpha = my_alpha = E.alpha(E.beta(E.gamma(firstattr='True')),
E.beta(E.delta('text here')))
etree.tostring(my_alpha)
# '<alpha><beta><gamma firstattr="True"/></beta><beta><delta>text here</delta></beta></alpha>'
etree.tostring(my_alpha.beta[0])
# '<beta><gamma firstattr="True"/></beta>'
my_alpha.beta[1].delta.text
# 'text here'

Related

Writing xml elements using multiple functions in python3

I need to write a bunch of elements in an xml file and every couple of elements should be empty so in order to make the code a bit more compact i thought I'd make a function that writes out the empty elements. Of course I can't make a code that looks like this:
def makeOne():
table=etree.SubElement(tables,'table')
values = etree.SubElement(table,'values')
and call it later in the function that does the actual input of values because as much as I gathered the file I'm working with isn't loaded inside of that function. I might be wrong. I didn't do much Python so I have no idea if there's a more elegant way of handling this. For clarity this is what i had in mind.
def writeVals():
tree = etree.parse('singleprog')
root = tree.getroot()
tables = etree.SubElement(korjen[0], 'tables')
makeOne()
I tihnk it's clear what I want to see happen here, the thing is I can't just put the two subelements in the writeVals() function because i need to use that code some 30 times at random places.
That's not really answer the question but alternatively but you can use lxml library and it's wonderful E factory method:
from lxml import etree
from lxml.builder import E
table = E.table(E.values)
etree.dump(table)
You'll get:
<table>
<values/>
</table>
To go further:
table = E.table(
E.values("one"),
E.values("two"),
E.values("there"),
)
etree.dump(table)
You'll get:
<table>
<values>one</values>
<values>two</values>
<values>there</values>
</table>
Introduction to lxml:
The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. The latest release works with all CPython versions from 2.6 to 3.6. See the introduction for more information about background and goals of the lxml project. Some common questions are answered in the FAQ.

Is there a way to get a line number from an ElementTree Element

So I'm parsing some XML files using Python 3.2.1's cElementTree, and during the parsing I noticed that some of the tags were missing attribute information. I was wondering if there is any easy way of getting the line numbers of those Elements in the xml file.
Looking at the docs, I see no way to do this with cElementTree.
However I've had luck with lxmls version of the XML implementation.
Its supposed to be almost a drop in replacement, using libxml2. And elements have a sourceline attribute. (As well as getting a lot of other XML features).
Only caveat is that I've only used it in python 2.x - not sure how/if it works under 3.x - but might be worth a look.
Addendum:
from their front page they say :
The lxml XML toolkit is a Pythonic binding for the C libraries libxml2
and libxslt. It is unique in that it combines the speed and XML
feature completeness of these libraries with the simplicity of a
native Python API, mostly compatible but superior to the well-known
ElementTree API. The latest release works with all CPython versions
from 2.3 to 3.2. See the introduction for more information about
background and goals of the lxml project. Some common questions are
answered in the FAQ.
So it looks like python 3.x is OK.
Took a while for me to work out how to do this using Python 3.x (using 3.3.2 here) so thought I would summarize:
# Force python XML parser not faster C accelerators
# because we can't hook the C implementation
sys.modules['_elementtree'] = None
import xml.etree.ElementTree as ET
class LineNumberingParser(ET.XMLParser):
def _start_list(self, *args, **kwargs):
# Here we assume the default XML parser which is expat
# and copy its element position attributes into output Elements
element = super(self.__class__, self)._start_list(*args, **kwargs)
element._start_line_number = self.parser.CurrentLineNumber
element._start_column_number = self.parser.CurrentColumnNumber
element._start_byte_index = self.parser.CurrentByteIndex
return element
def _end(self, *args, **kwargs):
element = super(self.__class__, self)._end(*args, **kwargs)
element._end_line_number = self.parser.CurrentLineNumber
element._end_column_number = self.parser.CurrentColumnNumber
element._end_byte_index = self.parser.CurrentByteIndex
return element
tree = ET.parse(filename, parser=LineNumberingParser())
I've done this in elementtree by subclassing ElementTree.XMLTreeBuilder. Then where I have access to the self._parser (Expat) it has properties _parser.CurrentLineNumber and _parser.CurrentColumnNumber.
http://docs.python.org/py3k/library/pyexpat.html?highlight=xml.parser#xmlparser-objects has details about these attributes
During parsing you could print out info, or put these values into the output XML element attributes.
If your XML file includes additional XML files, you have to do some stuff that I don't remember and was not well documented to keep track of the current XML file.
One (hackish) way of doing this is by inserting a dummy-attribute holding the line number into each element, before parsing. Here's how I did this with minidom:
python reporting line/column of origin of XML node
This can be trivially adjusted to cElementTree (or in fact any other python XML parser).

How can I change a Python object into XML?

I am looking to convert a Python object into XML data. I've tried lxml, but eventually had to write custom code for saving my object as xml which isn't perfect.
I'm looking for something more like pyxser. Unfortunately pyxser xml code looks different from what I need.
For instance I have my own class Person
Class Person:
name = ""
age = 0
ids = []
and I want to covert it into xml code looking like
<Person>
<name>Mike</name>
<age> 25 </age>
<ids>
<id>1234</id>
<id>333333</id>
<id>999494</id>
</ids>
</Person>
I didn't find any method in lxml.objectify that takes object and returns xml code.
Best is rather subjective and I'm not sure it's possible to say what's best without knowing more about your requirements. However Gnosis has previously been recommended for serializing Python objects to XML so you might want to start with that.
From the Gnosis homepage:
Gnosis Utils contains several Python modules for XML processing, plus other generally useful tools:
xml.pickle (serializes objects to/from XML)
API compatible with the standard pickle module)
xml.objectify (turns arbitrary XML documents into Python objects)
xml.validity (enforces XML validity constraints via DTD or Schema)
xml.indexer (full text indexing/searching)
many more...
Another option is lxml.objectify.
Mike,
you can either implement object rendering into XML :
class Person:
...
def toXml( self):
print '<Person>'
print '\t<name>...</name>
...
print '</Person>'
or you can transform Gnosis or pyxser output using XSLT.

Python HTML generator

I am looking for an easily implemented HTML generator for Python. I found HTML.py, but there is no way to add CSS elements (id, class) for table.
Dominate is an HTML generation library that lets you easily create tags. In dominate, python reserved words are prefixed with an underscore, so it would look like this:
from dominate.tags import *
t = div(table(_id="the_table"), _class="tbl")
print(t)
<div class="tbl">
<table id="the_table"></table>
</div>
Disclaimer: I am the author of dominate
If you want programmatic generation rather than templating, Karrigell's HTMLTags module is one possibility; it can include e.g. the class attribute (which would be a reserved word in Python) by the trick of uppercasing its initial, i.e., quoting the doc URL I just gave:
Attributes with the same name as
Python keywords (class, type) must be
capitalized :
print DIV('bar', Class="title") ==> <DIV class="title">bar</DIV>
HTML Generation is usually done with one of the infinite amounts of HTML templating languages available for Python. Personally I like Templess, but Genshi is probably the most popular. There are infinite amounts of them, there is a list which is highly likely to be incomplete.
Otherwise you might want to use lxml, where you can generate it in a more programmatically XML-ish way. Although I have a hard time seeing the benefit.
There's the venerable HTMLGen by Robin Friedrich, which is hard to find but still available here (dated 2001, but HTML hasn't changed much since then ;-). There's also xist. Of course nowadays HTML generation, as Lennart points out, is generally better done using templating systems such as Jinja or Mako.
This is one ultra-simple HTML generator I have written. I use it build-time to generate html. If one is generating html pages run-time then there are better options available
Here is the link
http://pypi.python.org/pypi/sphc
And a quick example
>> import sphw
>> tf = sphw.TagFactory()
>>> div = tf.DIV("Some Text here.", Class='content', id='foo')
>>> print(div)
<DIV Class="content", id="foo">Some Text here.</DIV>
Actually you can add any attribute such as id and class to objects in HTML.py (http://www.decalage.info/python/html).
attribs is an optional parameter of Table, TableRow and TableCell classes. It is a dictionary of additional attributes you would like to set. For example, the following code sets id and class for a table:
import HTML
table_data = [
['Last name', 'First name', 'Age'],
['Smith', 'John', 30],
['Carpenter', 'Jack', 47],
['Johnson', 'Paul', 62],
]
htmlcode = HTML.table(table_data,
attribs={'id':'table1', 'class':'myclass'})
print htmlcode
The same parameter can be used with TableRow and TableCell objects to format rows and cells. It does not exist for columns yet, but should be easy to implement if needed.
Ok, here's another html generator, or I prefer to think of it as a compiler.
https://pypi.python.org/pypi/python-html-compiler
This is a set of base classes that can be used to define tags and attributes. Thus a tag class has attributes and children. Children are themselves Tag classes that have attributes and children etc etc. Also you can set parameters that start with your root class and work up the various branches.
This will allow you to define all the tag classes you want thus be able to create customised classes and implement any tags or attributes you want.
Just started on this, so if anybody wants to test :)
Html generation or any text generatio,jinja is a powerful template engine.
You might be interested in some of the Python HAML implementations. HAML is like HTML shorthand and only takes a few minutes to learn. There's a CSS version called SASS too.
http://haml.hamptoncatlin.com/
"Is there a HAML implementation for use with Python and Django" talks about Python and HAML a bit more.
I'm using HAML as much as possible when I'm programming in Ruby. And, as a footnote, there's also been some work getting modules for Perl which work with the nice MVC Mojolicious:
http://metacpan.org/pod/Text::Haml

How do I use genshi.builder to programmatically build an HTML document?

I recently discovered the genshi.builder module. It reminds me of Divmod Nevow's Stan module. How would one use genshi.builder.tag to build an HTML document with a particular doctype? Or is this even a good thing to do? If not, what is the right way?
It's not possible to build an entire page using just genshi.builder.tag -- you would need to perform some surgery on the resulting stream to insert the doctype. Besides, the resulting code would look horrific. The recommended way to use Genshi is to use a separate template file, generate a stream from it, and then render that stream to the output type you want.
genshi.builder.tag is mostly useful for when you need to generate simple markup from within Python, such as when you're building a form or doing some sort of logic-heavy modification of the output.
See documentation for:
Creating and using templates
The XML-based template language
genshi.builder API docs
If you really want to generate a full document using only builder.tag, this (completely untested) code could be a good starting point:
from itertools import chain
from genshi.core import DOCTYPE, Stream
from genshi.output import DocType
from genshi.builder import tag as t
# Build the page using `genshi.builder.tag`
page = t.html (t.head (t.title ("Hello world!")), t.body (t.div ("Body text")))
# Convert the page element into a stream
stream = page.generate ()
# Chain the page stream with a stream containing only an HTML4 doctype declaration
stream = Stream (chain ([(DOCTYPE, DocType.get ('html4'), None)], stream))
# Convert the stream to text using the "html" renderer (could also be xml, xhtml, text, etc)
text = stream.render ('html')
The resulting page will have no whitespace in it -- it'll look normal, but you'll have a hard time reading the source code because it will be entirely on one line. Implementing appropriate filters to add whitespace is left as an exercise to the reader.
Genshi.builder is for "programmatically generating markup streams"[1]. I believe the purpose of it is as a backend for the templating language. You're probably looking for the templating language for generating a whole page.
You can, however do the following:
>>> import genshi.output
>>> genshi.output.DocType('html')
('html', '-//W3C//DTD HTML 4.01//EN', 'http://www.w3.org/TR/html4/strict.dtd')
See other Doctypes here: http://genshi.edgewall.org/wiki/ApiDocs/genshi.output#genshi.output:DocType
[1] genshi.builder.__doc__

Categories

Resources