XML creation from a Dictionary in Python - python

I am quite new to XML as well as Python, so please overlook any mistakes. I am trying to unpack a dictionary straight into XML format. My code fragment is as follows:
from xml.dom.minidom import Document
def make_xml(dictionary):
    doc = Document()
    result = doc.createElement('result')
    doc.appendChild(result)
    for key in dictionary:
        attrib = doc.createElement(key)
        result.appendChild(attrib)
        value = doc.createTextNode(dictionary[key])
        attrib.appendChild(value)
    print doc
I expected an answer of the format
<xml>
<result>
<attrib#1>value#1</attrib#1>
...
However all I am getting is
<xml.dom.minidom.Document instance at 0x01BE6130>
Please help

You have not checked the docs:
http://docs.python.org/library/xml.dom.minidom.html
Printing the Document object only shows its repr. Look at the toxml() or toprettyxml() methods, which return the XML as a string.
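A minimal sketch of that suggestion, written for Python 3 and assuming the dictionary keys are valid XML element names. The sample dictionary is made up for illustration, and values are passed through str() because createTextNode expects text:

```python
from xml.dom.minidom import Document

def make_xml(dictionary):
    """Build a minidom Document with one child element per dictionary key."""
    doc = Document()
    result = doc.createElement('result')
    doc.appendChild(result)
    for key, value in dictionary.items():
        element = doc.createElement(key)
        # Text nodes need a string, so convert non-string values explicitly.
        element.appendChild(doc.createTextNode(str(value)))
        result.appendChild(element)
    return doc

doc = make_xml({'name': 'Mike', 'age': 25})  # hypothetical sample data
xml_text = doc.toxml()                       # compact, single-line XML
pretty = doc.toprettyxml(indent='  ')        # indented, one element per line
print(pretty)
```

Returning the Document instead of printing it lets the caller choose between toxml() and toprettyxml().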

You can always use a library like xmler which easily takes a python dictionary and converts it to xml. Full disclosure, I created the package, but I feel like it will probably do what you need.
Also feel free to take a look at the source.

Related

lxml preserves attributes order?

I was writing my application using minidom, but minidom does not preserve attribute order (it sorts alphabetically), so I decided to do it using lxml.
However in the following lines of code I'm not getting the desired order:
import lxml.etree as ET
SATNS = "link_1"
NS = "link_2"
location_attribute = '{%s}schemaLocation' % NS
root = ET.Element('{%s}Catalogo' % SATNS, nsmap={'catalogocuentas':SATNS},
                  attrib={location_attribute: 'http://www.sat.gob.mx/catalogocuentas'},
                  Ano="2014", Mes="02", TotalCtas="219", RFC="ALF040329CX6", Version="1.0")
print (ET.tostring(root, pretty_print=True))
This is what I'm expecting to get:
<catalogocuentas:Catalogo xmlns:catalogocuentas="link_1"
xmlns:xsi="link_2" xsi:schemaLocation="http://www.sat.gob.mx/catalogocuentas"
Ano="2014" Mes="02" TotalCtas="219" RFC="XXX010101XXX" Version="1.0">
</catalogocuentas:Catalogo>
Which is the order in which I filled them in:
root=ET.Element(...)
But I'm getting the following, which has no particular order:
<catalogocuentas:Catalogo xmlns:catalogocuentas="link_1"
xmlns:xsi="link_2" RFC="ALF040329CX6" Version="1.0"
Mes="02" xsi:schemaLocation="http://www.sat.gob.mx/catalogocuentas" Ano="2014" TotalCtas="219">
</catalogocuentas:Catalogo>
Is there a way to fix this problem?
Thanks in advance!!
In older Python versions (before 3.7), dictionaries are unordered. Keyword arguments are passed to functions as a dictionary traditionally named **kwargs, so in those versions the order is lost (Python 3.6 and later preserve keyword-argument order). The function can't possibly know what order the arguments to ET.Element came in.
As stated in this question, there isn't really any way to get this done. The XML specification treats attribute order as insignificant, so there isn't really any good reason to do it.
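To illustrate why attribute order should not matter to a consumer, here is a small sketch using the standard-library xml.etree.ElementTree rather than lxml. The two sample documents are made up for illustration; they differ only in attribute order and parse to equal attribute mappings:

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that differ only in attribute order.
a = ET.fromstring('<Catalogo Ano="2014" Mes="02" Version="1.0"/>')
b = ET.fromstring('<Catalogo Version="1.0" Mes="02" Ano="2014"/>')

# Attributes are exposed as a mapping; dict equality ignores ordering.
same = a.attrib == b.attrib
print(same)
```

Any code that reads attributes by name, as with a.get('Mes'), behaves identically for both documents.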

How do you create a non-nested xml element using Python's lxml.objectify?

My current code is
xml_obj = lxml.objectify.Element('root_name')
xml_obj[root_name] = str('text')
lxml.etree.tostring(xml_obj)
but this creates the following xml:
<root_name><root_name>text</root_name></root_name>
In the application I am using this for I could easily use text substitution to solve this problem, but it would be nice to know how to do it using the library.
I'm not that familiar with objectify, but I don't think that's the way it's intended to be used. The way it represents objects is that a node at any given level is, say, a class name, and the subnodes are field names (with types) and values. The normal way to use it would be something more like this:
xml_obj = lxml.objectify.Element('xml_obj')
xml_obj.root_path = 'text'
etree.dump(xml_obj)
<root_name xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="TREE">
<root_name py:pytype="str">text</root_name>
</root_name>
What you want would be way easier to do with etree:
xml_obj = lxml.etree.Element('root_path')
xml_obj.text = 'text'
etree.dump(xml_obj)
<root_path>text</root_path>
If you really need it to be in objectify, it looks like, while you shouldn't mix the two APIs directly, you can use tostring to generate XML and then objectify.fromstring to bring it back. But if this is what you want, you should probably just use etree to generate it.
I don't think you can write data into the root element. You may need to create a child element like this:
xml_obj = lxml.objectify.Element('root_name')
xml_obj.child_name = str('text')
lxml.etree.tostring(xml_obj)

Python minidom look for empty text node

I am parsing an XML file with the minidom parser, iterating over the XML and storing specific information that stands between the tags in a dictionary.
Like this:
d={}
dom = parseString(data)
macro=dom.getElementsByTagName('macro')
for node in macro:
    d={}
    id_name=node.getElementsByTagName('id')[0].toxml()
    id_data=id_name.replace('<id>','').replace('</id>','')
    print (id_data)
    cl_name=node.getElementsByTagName('cl')[1].toxml()
    cl_data=cl_name.replace('<cl>','').replace('</cl>','')
    print (cl_data)
    d_source[id_data]=(cl_data)
Now, my problem is that the data I'm looking for in cl_name=node.getElementsByTagName('cl')[1].toxml() is sometimes non-existent!
In this case the part of the XML looks like this:
<cl>blabla</cl>
<cl></cl>
Because of this I receive an "index is out of range"-error.
However, I really need this "nothing" in my dictionary. My dictionary should look like this:
d={blabla:'',xyz:'abc'}
I have to look for the empty text node, which I tried by doing this:
if node.getElementsByTagName('cl')[1].toxml is None:
    print ('')
else:
    cl_name=node.getElementsByTagName('cl')[1].toxml()
    cl_data=cl_name.replace('<cl>','').replace('</cl>','')
    print (cl_data)
    d_target[id_data]=(cl_data)
print(d_target)
I still receive that indexing error. I also thought about inserting a white space into the original source file, but I am not sure if this would solve the issue. Any ideas?
If you are not required to use minidom, I suggest switching to the standard xml.etree.ElementTree. It is much easier.
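A rough sketch of what that could look like with ElementTree. The sample document, including the root element and the exact nesting of id and cl inside macro, is an assumption, since the question only shows a fragment. An empty <cl></cl> element has text None, so it is mapped to an empty string:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the real file layout is not shown in the question.
data = """<root>
  <macro><id>blabla</id><cl>first</cl><cl></cl></macro>
  <macro><id>xyz</id><cl>first</cl><cl>abc</cl></macro>
</root>"""

d = {}
for macro in ET.fromstring(data).iter('macro'):
    key = macro.findtext('id')
    cls = macro.findall('cl')
    # Guard against a missing second <cl>, and turn the None text of an
    # empty element into '' so the key still lands in the dictionary.
    d[key] = (cls[1].text or '') if len(cls) > 1 else ''

print(d)
```

Reading .text directly avoids stripping tags out of toxml() output by hand.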
I figured out that it works when I add a white space to the original source file. This looks a bit messy though, so if anyone has a better idea, I'm looking forward to it!

How can you parse xml in Google Refine using jython/python ElementTree

I'm trying to parse some XML in Google Refine using Jython and ElementTree, but I'm struggling to find any documentation to help me get this working (probably not helped by my not being a Python coder).
Here's an extract of the XML I'm trying to parse. I'm trying to return a joined string of all the dc:identifier values:
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:creator>J. Koenig</dc:creator>
<dc:date>2010-01-13T15:47:38Z</dc:date>
<dc:date>2010-01-13T15:47:38Z</dc:date>
<dc:date>2010-01-13T15:47:38Z</dc:date>
<dc:identifier>CCTL0059</dc:identifier>
<dc:identifier>CCTL0059</dc:identifier>
<dc:identifier>http://open.jorum.ac.uk:80/xmlui/handle/123456789/335</dc:identifier>
<dc:format>application/pdf</dc:format>
</oai_dc:dc>
Here's the code I've got so far. This is a test to return anything as right now all I'm getting is 'Error: null'
from elementtree import ElementTree as ET
element = ET.parse(value)
namespace = "{http://www.openarchives.org/OAI/2.0/oai_dc/}"
e = element.findall('{0}identifier'.format(namespace))
for i in e:
    count += 1
return count
You can use a GREL expression like this, try it:
forEach(value.parseHtml().select("dc|identifier"),v,v.htmlText()).join(",")
For each identifier found, give me the htmlText and join them all with commas.
parseHtml() uses the Jsoup library and really just parses tags and structure. It also knows about parsing namespaces with the format ns|identifier, which is a nice way to get what you're after in this case.
You've used the wrong namespace. This works on Jython 2.5.1:
from xml.etree import ElementTree as ET
element = ET.fromstring(value) # `value` is a string with the xml from question
namespace = "{http://purl.org/dc/elements/1.1/}"
for e in element.getiterator(namespace+'identifier'):
    print e.text
Output
CCTL0059
CCTL0059
http://open.jorum.ac.uk:80/xmlui/handle/123456789/335
Here's a slight tweak on J.F. Sebastian's version which can be pasted directly into Google Refine:
from xml.etree import ElementTree as ET
element = ET.fromstring(value)
namespace = "{http://purl.org/dc/elements/1.1/}"
return ','.join([e.text for e in element.getiterator(namespace+'identifier')])
It returns a comma separated list, but you can change the delimiter used in the return statement.
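For trying the same idea outside Refine on a current CPython 3, here is a sketch that uses iter() in place of getiterator(), which was removed in Python 3.9. The value string below is a shortened copy of the XML from the question, standing in for the cell value that Refine supplies:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the cell value that Refine would pass in.
value = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:creator>J. Koenig</dc:creator>
<dc:identifier>CCTL0059</dc:identifier>
<dc:identifier>http://open.jorum.ac.uk:80/xmlui/handle/123456789/335</dc:identifier>
</oai_dc:dc>"""

# The identifier elements live in the dc namespace, not oai_dc.
DC = '{http://purl.org/dc/elements/1.1/}'
root = ET.fromstring(value)
joined = ','.join(e.text for e in root.iter(DC + 'identifier'))
print(joined)
```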

How can I change a Python object into XML?

I am looking to convert a Python object into XML data. I've tried lxml, but eventually had to write custom code for saving my object as xml which isn't perfect.
I'm looking for something more like pyxser. Unfortunately pyxser xml code looks different from what I need.
For instance I have my own class Person
class Person:
    name = ""
    age = 0
    ids = []
and I want to convert it into XML code looking like
<Person>
<name>Mike</name>
<age> 25 </age>
<ids>
<id>1234</id>
<id>333333</id>
<id>999494</id>
</ids>
</Person>
I didn't find any method in lxml.objectify that takes object and returns xml code.
Best is rather subjective and I'm not sure it's possible to say what's best without knowing more about your requirements. However Gnosis has previously been recommended for serializing Python objects to XML so you might want to start with that.
From the Gnosis homepage:
Gnosis Utils contains several Python modules for XML processing, plus other generally useful tools:
xml.pickle (serializes objects to/from XML; API compatible with the standard pickle module)
xml.objectify (turns arbitrary XML documents into Python objects)
xml.validity (enforces XML validity constraints via DTD or Schema)
xml.indexer (full text indexing/searching)
many more...
Another option is lxml.objectify.
Mike,
you can either implement object rendering into XML:
class Person:
    ...
    def toXml(self):
        print('<Person>')
        print('\t<name>...</name>')
        ...
        print('</Person>')
or you can transform Gnosis or pyxser output using XSLT.
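As one possible way to flesh out the hand-written rendering approach, here is a sketch that builds the tree with the standard-library ElementTree instead of printing strings. The constructor and the to_xml method name are assumptions added for illustration; the element names follow the XML layout from the question:

```python
import xml.etree.ElementTree as ET

class Person:
    def __init__(self, name, age, ids):
        self.name = name
        self.age = age
        self.ids = ids

    def to_xml(self):
        """Return this object as an XML string in the layout from the question."""
        root = ET.Element('Person')
        ET.SubElement(root, 'name').text = self.name
        ET.SubElement(root, 'age').text = str(self.age)
        ids = ET.SubElement(root, 'ids')
        for value in self.ids:
            ET.SubElement(ids, 'id').text = str(value)
        return ET.tostring(root, encoding='unicode')

xml_text = Person('Mike', 25, [1234, 333333, 999494]).to_xml()
print(xml_text)
```

Building elements rather than concatenating strings means special characters in values such as & or < are escaped by the library.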
