How to insert a processing instruction in XML file? - python

I want to add an xml-stylesheet processing instruction before the root element of my XML file using ElementTree (Python 3.8).
Below is the code I use to create the XML file:
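import xml.etree.ElementTree as ET  # cElementTree is deprecated and was removed in Python 3.9

def Export_star_xml(self):
    star_element = ET.Element("STAR", **{'xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance'})
    element_node = ET.SubElement(star_element, "STAR_1")
    element_node.text = "Mario adam"
    # Wrap the root element in an ElementTree so it can be written to disk
    tree = ET.ElementTree(star_element)
    tree.write("star.xml", encoding="utf-8", xml_declaration=True)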
Output:
<?xml version="1.0" encoding="windows-1252"?>
<STAR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<STAR_1> Mario adam </STAR_1>
</STAR>
Output Expected:
<?xml version="1.0" encoding="windows-1252"?>
<?xml-stylesheet type="text/xsl" href="ResourceFiles/form_star.xsl"?>
<STAR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<STAR_1> Mario adam </STAR_1>
</STAR>

I cannot figure out how to do this with ElementTree. Here is a solution that uses lxml, which provides an addprevious() method on elements.
from lxml import etree as ET
# Note the use of nsmap. The syntax used in the question is not accepted by lxml
star_element = ET.Element("STAR", nsmap={'xsi': 'http://www.w3.org/2001/XMLSchema-instance'})
element_node = ET.SubElement(star_element ,"STAR_1")
element_node.text = "Mario adam"
# Create the PI and insert it before the root element
pi = ET.ProcessingInstruction("xml-stylesheet", text='type="text/xsl" href="ResourceFiles/form_star.xsl"')
star_element.addprevious(pi)
ET.ElementTree(star_element).write("star.xml", encoding="utf-8",
                                   xml_declaration=True, pretty_print=True)
Result:
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type="text/xsl" href="ResourceFiles/form_star.xsl"?>
<STAR xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<STAR_1>Mario adam</STAR_1>
</STAR>
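If lxml is not available, one possible workaround with the standard library is to assemble the prolog by hand: serialize the processing instruction and the root element separately and join them after a hand-written XML declaration. This is only a minimal sketch. The helper name serialize_with_pi is hypothetical, and the declaration string is written manually rather than generated by ElementTree.

```python
import xml.etree.ElementTree as ET


def serialize_with_pi(root, pi):
    """Return the document as text: declaration, then the PI, then the root."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',  # hand-written declaration
        ET.tostring(pi, encoding="unicode"),       # the processing instruction
        ET.tostring(root, encoding="unicode"),     # the root element and children
    ]
    return "\n".join(parts) + "\n"


root = ET.Element("STAR", {"xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance"})
ET.SubElement(root, "STAR_1").text = "Mario adam"
pi = ET.ProcessingInstruction(
    "xml-stylesheet", 'type="text/xsl" href="ResourceFiles/form_star.xsl"')

document = serialize_with_pi(root, pi)
print(document)
```

To save the result, write document.encode("utf-8") to a file opened in binary mode so that the bytes match the declared encoding.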

Related

Read/Extract data from XML with Python

I am trying to read/extract data from XML with Python using xml.etree.ElementTree.
So far I haven't found how to do it, most probably because I don't understand how XML works.
The idea is to output the DocumentId values as a list.
Here is my XML file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<RegisterSearch TotalResults="4">
<SearchResults>
<Document DocumentId="1348828088501913376">
<DocumentNumber>001</DocumentNumber>
</Document>
<Document DocumentId="1348828088501881434">
<DocumentNumber>001</DocumentNumber>
</Document>
<Document DocumentId="1348828088539553420">
<DocumentNumber>010</DocumentNumber>
</Document>
<Document DocumentId="1348828088539570694">
<DocumentNumber>010</DocumentNumber>
</Document>
</SearchResults>
</RegisterSearch>
And here is my Python code:
#!/usr/bin/python2
import xml.etree.ElementTree as ET
tree = ET.parse('documents.xml')
root = tree.getroot()
for elem in root:
    if elem.tag == 'Document':
        print elem.get('DocumentId')
This is what I try to achieve:
1348828088501913376
1348828088501881434
1348828088539553420
1348828088539570694
However, the code prints nothing.
Thanks in advance for your suggestions.
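Iterating over root directly only visits its immediate children (here, the single SearchResults element), so the Document test never matches. Iterate over the tags you are interested in instead:
for elem in root.iter(tag='Document'):
    print(elem.get('DocumentId'))
Your original solution would also work if the loop walked the whole tree:
for elem in root.iter():
    ...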
v3.8: https://docs.python.org/3/library/xml.etree.elementtree.html#finding-interesting-elements
v2.7: https://docs.python.org/2.7/library/xml.etree.elementtree.html#finding-interesting-elements
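Since the stated goal is a list of DocumentId values, the iteration above can be folded into a list comprehension. The sketch below embeds the sample document as a string (the names XML, document_ids and path_ids are only illustrative) and also shows an equivalent lookup with an explicit path.

```python
import xml.etree.ElementTree as ET

XML = """<RegisterSearch TotalResults="4">
  <SearchResults>
    <Document DocumentId="1348828088501913376">
      <DocumentNumber>001</DocumentNumber>
    </Document>
    <Document DocumentId="1348828088501881434">
      <DocumentNumber>001</DocumentNumber>
    </Document>
    <Document DocumentId="1348828088539553420">
      <DocumentNumber>010</DocumentNumber>
    </Document>
    <Document DocumentId="1348828088539570694">
      <DocumentNumber>010</DocumentNumber>
    </Document>
  </SearchResults>
</RegisterSearch>"""

root = ET.fromstring(XML)

# iter() walks the whole tree, so nesting depth does not matter
document_ids = [doc.get("DocumentId") for doc in root.iter("Document")]

# Same result with an explicit path from the root element
path_ids = [doc.get("DocumentId") for doc in root.findall("./SearchResults/Document")]

print(document_ids)
```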

Python etree XSLT Requires Tag output?

I'm trying to make a simple XML --> CSV script, using XSLT. I found that etree seems to "want" a tag to output... Does anyone know a workaround? Yes, I've seen this post: XML to CSV Using XSLT.
See below...
Here's some sample XML data, just for reference. My code doesn't do anything with the data yet, as it was failing to even write a header.
<projects>
<project>
<name>Shockwave</name>
<language>Ruby</language>
<owner>Brian May</owner>
<state>New</state>
<startDate>31/10/2008 0:00:00</startDate>
</project>
<project>
<name>Other</name>
<language>Erlang</language>
<owner>Takashi Miike</owner>
<state> Canceled </state>
<startDate>07/11/2008 0:00:00</startDate>
</project>
</projects>
Here's my script:
import sys
from lxml import etree
system_file = sys.argv[1]
xml_file = sys.argv[2]
sys_txt = open( system_file,"r" ).read()
xsl_txt = open( "csv_file.xslt","r" ).read()
sysroot = etree.fromstring( sys_txt )
xslroot = etree.fromstring( xsl_txt )
transform = etree.XSLT( xslroot )
with open( xml_file, "w" ) as f:
    f.write(etree.tostring( transform(sysroot) ) )
This XSLT code does NOT work ( etree.tostring... = None ):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
Hi
</xsl:template>
</xsl:stylesheet>
But THIS XSLT does work... seems etree needs to output an XML file?
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<dummy>
Hi
</dummy>
</xsl:template>
</xsl:stylesheet>
At this point I'm thinking I can proceed with a dummy tag, then remove it at end...
"Python etree XSLT Requires Tag output?"
The answer is no.
As shown in the "XSLT result objects" section of the lxml documentation, you can use the standard Python str() function to get the string representation of a transformation result, which is what you need when the result has no root element:
from lxml import etree
raw_xml = '''<projects>
<project>
<name>Shockwave</name>
<language>Ruby</language>
<owner>Brian May</owner>
<state>New</state>
<startDate>31/10/2008 0:00:00</startDate>
</project>
<project>
<name>Other</name>
<language>Erlang</language>
<owner>Takashi Miike</owner>
<state>Canceled</state>
<startDate>07/11/2008 0:00:00</startDate>
</project>
</projects>'''
raw_xslt = '''<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:text>Hi</xsl:text>
</xsl:template>
</xsl:stylesheet>'''
sysroot = etree.fromstring(raw_xml)
xslroot = etree.fromstring(raw_xslt)
transform = etree.XSLT(xslroot)
print(str(transform(sysroot)))
# output:
# Hi
And as you saw, etree.tostring() is still usable when the transformation result has a root element.
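If XSLT is not a hard requirement, the XML to CSV step itself can also be sketched with the standard library alone. Note that this swaps XSLT for xml.etree.ElementTree plus the csv module, so it is an alternative and not the approach described in the answer above. The column list FIELDS is taken from the sample data, and the helper name projects_to_csv is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

RAW_XML = """<projects>
  <project>
    <name>Shockwave</name>
    <language>Ruby</language>
    <owner>Brian May</owner>
    <state>New</state>
    <startDate>31/10/2008 0:00:00</startDate>
  </project>
  <project>
    <name>Other</name>
    <language>Erlang</language>
    <owner>Takashi Miike</owner>
    <state> Canceled </state>
    <startDate>07/11/2008 0:00:00</startDate>
  </project>
</projects>"""

# Column order assumed from the sample document
FIELDS = ["name", "language", "owner", "state", "startDate"]


def projects_to_csv(xml_text):
    """Flatten each <project> element into one CSV row."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    for project in root.iter("project"):
        # findtext() returns None for a missing child; treat that as empty
        writer.writerow([(project.findtext(f) or "").strip() for f in FIELDS])
    return buffer.getvalue()


csv_text = projects_to_csv(RAW_XML)
print(csv_text)
```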

How to keep the xml-stylesheet?

I want to keep the xml-stylesheet processing instruction, but it is lost when I rewrite the file.
I use Python to modify the XML so that I can deploy Hadoop automatically.
XML:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://c11:9000</value>
  </property>
</configuration>
Code:
from xml.etree.ElementTree import ElementTree as ET

def modify_core_site(namenode_hostname):
    tree = ET()
    tree.parse("pkg/core-site.xml")
    root = tree.getroot()
    for p in root.iter("property"):
        name = p.find("name").text
        if name == "fs.default.name":
            text = "hdfs://%s:9000" % namenode_hostname
            p.find("value").text = text
    tree.write("pkg/tmp.xml", encoding="utf-8", xml_declaration=True)

modify_core_site("c80")
Result:
<?xml version='1.0' encoding='utf-8'?>
<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://c80:9000</value>
  </property>
</configuration>
The xml-stylesheet processing instruction disappears.
How can I keep it?
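One solution is to use lxml, which keeps the nodes that precede the root element. After parsing, walk back from the root until you reach the xml-stylesheet node. In this file a comment sits between the processing instruction and the root, hence the two getprevious() calls. Quick sample below: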
>>> import lxml.etree
>>> doc = lxml.etree.parse('C:/downloads/xmltest.xml')
>>> root = doc.getroot()
>>> xslnode=root.getprevious().getprevious()
>>> xslnode
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
Make sure you add some exception handling and check that the node actually exists. You can check whether the node is an xml-stylesheet processing instruction with:
>>> isinstance(xslnode, lxml.etree._XSLTProcessingInstruction)
True
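For setups where lxml cannot be installed, a standard-library alternative is xml.dom.minidom, which keeps processing instructions and comments that precede the root element when the document is serialized again. This is a different technique from the lxml answer above and only a minimal sketch: the helper name set_namenode is hypothetical, and the sample document is embedded as a string instead of being read from pkg/core-site.xml.

```python
from xml.dom import minidom

SOURCE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://c11:9000</value>
  </property>
</configuration>"""


def set_namenode(xml_text, hostname):
    """Update fs.default.name and return the whole document as text."""
    dom = minidom.parseString(xml_text)
    for prop in dom.getElementsByTagName("property"):
        name = prop.getElementsByTagName("name")[0].firstChild.data
        if name == "fs.default.name":
            value = prop.getElementsByTagName("value")[0]
            value.firstChild.data = "hdfs://%s:9000" % hostname
    # toxml() serializes every document-level node, including the PI
    return dom.toxml()


result = set_namenode(SOURCE, "c80")
print(result)
```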

ElementTree returns no nodes parsing simple KML document

I have a very simple KML file which returns no nodes when parsed with ElementTree. This is frustrating me :-). Any clues?
from xml.etree import ElementTree
from pprint import pprint
kml = '''<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
<Document>
<name>NEXRAD Radar Sites</name>
<Schema parent="Placemark" name="wsr">
<SimpleField type="wstring" name="STATE">
</SimpleField>
</Schema>
<wsr>
<name>KABR</name>
</wsr>
</Document>
</kml>
'''
tree = ElementTree.fromstring(kml)
ElementTree.dump(tree)
for node in tree.iter('wsr'):
    pprint(node)
for node in tree.findall('../wsr'):
    pprint(node)
The tags are namespaced. If you call tree.iter() with no tag argument, it shows what ElementTree thinks the tags are called: because of the default xmlns declaration on the kml element, the wsr tag is actually named {http://earth.google.com/kml/2.0}wsr. This returns a node:
list(tree.iter('{http://earth.google.com/kml/2.0}wsr'))
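Spelling out the full {namespace}tag form gets tedious, so another option is to pass a prefix-to-URI mapping to findall() and findtext(). The sketch below assumes an arbitrary prefix, kml, chosen only for this example; the prefix does not have to appear in the document itself.

```python
import xml.etree.ElementTree as ET

KML = """<kml xmlns="http://earth.google.com/kml/2.0">
<Document>
<name>NEXRAD Radar Sites</name>
<Schema parent="Placemark" name="wsr">
<SimpleField type="wstring" name="STATE">
</SimpleField>
</Schema>
<wsr>
<name>KABR</name>
</wsr>
</Document>
</kml>"""

# Map a local prefix of our choosing to the document's default namespace
NS = {"kml": "http://earth.google.com/kml/2.0"}

root = ET.fromstring(KML)
sites = root.findall(".//kml:wsr", NS)
site_names = [site.findtext("kml:name", namespaces=NS) for site in sites]

print(site_names)
```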

How to add an xml-stylesheet processing instruction node with Python 2.6 and minidom?

I'm creating an XML document using minidom - how do I ensure my resultant XML document contains a stylesheet reference like this:
<?xml-stylesheet type="text/xsl" href="mystyle.xslt"?>
Thanks !
Use something like this:
from xml.dom import minidom
xml = """
<root>
<x>text</x>
</root>"""
dom = minidom.parseString(xml)
pi = dom.createProcessingInstruction('xml-stylesheet',
                                     'type="text/xsl" href="mystyle.xslt"')
root = dom.firstChild
dom.insertBefore(pi, root)
print(dom.toprettyxml())
=>
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="mystyle.xslt"?>
<root>
<x>
text
</x>
</root>
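A small variation on the snippet above, offered as a sketch: anchoring the insert on dom.documentElement instead of dom.firstChild places the processing instruction directly before the root element even when a comment or another node comes first. The helper name add_stylesheet and the sample input are hypothetical.

```python
from xml.dom import Node, minidom


def add_stylesheet(xml_text, href):
    """Parse xml_text and insert an xml-stylesheet PI right before the root."""
    dom = minidom.parseString(xml_text)
    pi = dom.createProcessingInstruction(
        "xml-stylesheet", 'type="text/xsl" href="%s"' % href)
    # documentElement is always the root element, whatever precedes it
    dom.insertBefore(pi, dom.documentElement)
    return dom


dom = add_stylesheet("<!-- generated --><root><x>text</x></root>", "mystyle.xslt")

# Inspect the document-level nodes to confirm their order
kinds = [child.nodeType for child in dom.childNodes]
print(dom.toxml())
```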
I am not familiar with minidom, but you must create a processing instruction (PI) node with the name "xml-stylesheet" and the text "type='text/xsl' href='mystyle.xslt'".
See the documentation for how a PI is created.
import xml.dom.minidom  # "import xml.dom" alone does not load the minidom submodule

dom = xml.dom.minidom.parse("C:\\Temp\\Report.xml")
pi = dom.createProcessingInstruction('xml-stylesheet',
                                     'type="text/xsl" href="TestCaseReport.xslt"')
root = dom.firstChild
dom.insertBefore(pi, root)
a = dom.toxml()
# The with block closes the file even if write() raises
with open("C:\\Report(1).xml", 'w') as f:
    f.write(a)
