How to keep the xml-stylesheet? - python

I want to keep the xml-stylesheet processing instruction, but it is lost when I write the file back.
I use Python to modify Hadoop's XML configuration files so that deployment is automated.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
from xml.etree.ElementTree import ElementTree as ET

def modify_core_site(src_path, namenode_hostname):
    tree = ET()
    tree.parse(src_path)  # load the existing configuration file first
    root = tree.getroot()
    for p in root.iter("property"):
        name = p.find("name").text
        if name == "":
            text = "hdfs://%s:9000" % namenode_hostname
            p.find("value").text = text
    tree.write("pkg/tmp.xml", encoding="utf-8", xml_declaration=True)
<?xml version='1.0' encoding='utf-8'?>
The xml-stylesheet line has disappeared.
How can I keep this?

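One solution is to use lxml, which keeps the comments and processing instructions that precede the root element. Once you have parsed the XML, walk backwards from the root until you reach the xsl node (here the comment sits in between, hence the two getprevious() calls). Quick sample below: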
>>> import lxml.etree
>>> doc = lxml.etree.parse('C:/downloads/xmltest.xml')
>>> root = doc.getroot()
>>> xslnode=root.getprevious().getprevious()
>>> xslnode
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
Make sure you add some exception handling and check that the node actually exists. You can check whether the node is an xml-stylesheet processing instruction with:
>>> isinstance(xslnode, lxml.etree._XSLTProcessingInstruction)
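If lxml is not an option, the standard library's xml.dom.minidom also keeps processing instructions and comments that sit before the root element, so the same edit can be done without losing the stylesheet line. This is a minimal sketch under stated assumptions: set_property is a hypothetical helper, the property name example.address and the host namenode1 are placeholders, and each <property> is assumed to have exactly one <name> and one <value> child.

```python
from xml.dom import minidom


def set_property(xml_bytes, prop_name, new_value):
    """Return a copy of a Hadoop-style <configuration> document in which the
    <value> of the <property> named prop_name is replaced by new_value.

    minidom keeps processing instructions and comments that appear before
    the root element, so the xml-stylesheet line survives the round trip.
    """
    dom = minidom.parseString(xml_bytes)
    for prop in dom.getElementsByTagName("property"):
        name_el = prop.getElementsByTagName("name")[0]
        # Join the text children; an empty <name/> yields ""
        name = "".join(n.data for n in name_el.childNodes
                       if n.nodeType == n.TEXT_NODE)
        if name == prop_name:
            value_el = prop.getElementsByTagName("value")[0]
            # Drop the old text and put the new value in its place
            for child in list(value_el.childNodes):
                value_el.removeChild(child)
            value_el.appendChild(dom.createTextNode(new_value))
    return dom.toxml(encoding="utf-8")


# Usage with placeholder data (example.address and namenode1 are hypothetical)
original = b'''<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property><name>example.address</name><value>old</value></property>
</configuration>'''

updated = set_property(original, "example.address", "hdfs://%s:9000" % "namenode1")
print(updated.decode("utf-8"))
```

toxml(encoding="utf-8") returns bytes, which can be written straight to a file opened in binary mode.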


How to insert a processing instruction in XML file?

I want to add an xml-stylesheet processing instruction before the root element in my XML file using ElementTree (Python 3.8).
Here is the code I use to create the XML file:
import xml.etree.cElementTree as ET

def Export_star_xml(self):
    star_element = ET.Element("STAR", **{'xmlns:xsi': ''})
    element_node = ET.SubElement(star_element, "STAR_1")
    element_node.text = "Mario adam"
    tree = ET.ElementTree(star_element)
    tree.write("star.xml", encoding="utf-8", xml_declaration=True)
Output:
<?xml version="1.0" encoding="windows-1252"?>
<STAR xmlns:xsi="">
<STAR_1> Mario adam </STAR_1>
</STAR>
Expected output:
<?xml version="1.0" encoding="windows-1252"?>
<?xml-stylesheet type="text/xsl" href="ResourceFiles/form_star.xsl"?>
<STAR xmlns:xsi="">
<STAR_1> Mario adam </STAR_1>
</STAR>
I cannot figure out how to do this with ElementTree. Here is a solution that uses lxml, which provides an addprevious() method on elements.
from lxml import etree as ET
# Note the use of nsmap. The syntax used in the question is not accepted by lxml
star_element = ET.Element("STAR", nsmap={'xsi': ''})
element_node = ET.SubElement(star_element ,"STAR_1")
element_node.text = "Mario adam"
# Create the PI and insert it before the root element
pi = ET.ProcessingInstruction("xml-stylesheet", text='type="text/xsl" href="ResourceFiles/form_star.xsl"')
star_element.addprevious(pi)
ET.ElementTree(star_element).write("star.xml", encoding="utf-8",
                                   xml_declaration=True, pretty_print=True)
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type="text/xsl" href="ResourceFiles/form_star.xsl"?>
<STAR xmlns:xsi="">
  <STAR_1>Mario adam</STAR_1>
</STAR>
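With the standard library alone, ElementTree.write() only serializes the root element and its descendants, so one workaround is to write the XML declaration and the stylesheet PI by hand and then let ElementTree append the element tree without its own declaration. A minimal sketch: write_with_stylesheet is a hypothetical helper name, and it assumes href contains no double quotes.

```python
import io
import xml.etree.ElementTree as ET


def write_with_stylesheet(tree, out, href):
    """Write tree to the binary file object `out`, preceded by an XML
    declaration and an xml-stylesheet processing instruction.

    ElementTree cannot hold nodes outside the root element, so the
    prolog is written manually before the tree itself.
    """
    out.write(b"<?xml version='1.0' encoding='utf-8'?>\n")
    pi = '<?xml-stylesheet type="text/xsl" href="%s"?>\n' % href
    out.write(pi.encode("utf-8"))
    # xml_declaration=False: the declaration was already written above
    tree.write(out, encoding="utf-8", xml_declaration=False)


# Usage: build the same small document as in the question, in memory
root = ET.Element("STAR")
ET.SubElement(root, "STAR_1").text = "Mario adam"
buf = io.BytesIO()
write_with_stylesheet(ET.ElementTree(root), buf, "ResourceFiles/form_star.xsl")
print(buf.getvalue().decode("utf-8"))
```

To write a real file instead, pass a file opened with open("star.xml", "wb").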

Read/Extract data from XML with Python

I am trying to read/extract data from XML with Python using xml.etree.ElementTree.
Unfortunately, I haven't found out how to do it yet, most likely because I don't understand how XML works.
The idea is to collect the DocumentId values into a list.
Here is my XML file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<RegisterSearch TotalResults="4">
<Document DocumentId="1348828088501913376">
<Document DocumentId="1348828088501881434">
<Document DocumentId="1348828088539553420">
<Document DocumentId="1348828088539570694">
And here is my Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('documents.xml')
root = tree.getroot()
for elem in root:
    print(elem.get('DocumentId'))
This is what I am trying to achieve:
Currently, the code prints nothing.
Thanks in advance for your suggestion.
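Iterate over the tags you are interested in:
for elem in root.iter(tag='Document'):
    print(elem.get('DocumentId'))
Your original solution would also work with root.iter(), which visits every element in the tree (including the root) instead of only the direct children; elements without a DocumentId print None:
for elem in root.iter():
    print(elem.get('DocumentId'))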
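To end up with the list the question asks for, the same root.iter("Document") loop can feed a list comprehension. A minimal sketch: document_ids is a hypothetical helper, and the <SearchResults> wrapper in the sample is an assumed nesting level, included only to show that iter() finds elements at any depth.

```python
import xml.etree.ElementTree as ET


def document_ids(xml_text):
    """Return the DocumentId attribute of every <Document> element,
    at any depth, in document order."""
    root = ET.fromstring(xml_text)
    # iter("Document") walks the whole subtree, not just direct children
    return [doc.get("DocumentId") for doc in root.iter("Document")]


sample = """<RegisterSearch TotalResults="2">
  <SearchResults>
    <Document DocumentId="1348828088501913376"/>
    <Document DocumentId="1348828088501881434"/>
  </SearchResults>
</RegisterSearch>"""

print(document_ids(sample))  # ['1348828088501913376', '1348828088501881434']
```

For a file on disk, replace ET.fromstring(xml_text) with ET.parse(path).getroot().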

Getting root node's attributes (namespace) in Python

I need to extract the namespaces declared at the very beginning of an XML file.
It looks something like this.
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:a="CannotGetThisAttrib" xmlns:b="CannotGetThisAttrib">
<fileHeader c="CanGetThisAttrib"/>
I can extract attributes of elements beneath the root node. However, I cannot get the root node's attributes a and b, which are namespace declarations I need in order to parse the XML file.
import xml.etree.ElementTree as ET
tree = ET.parse("xmlfile.xml")
root = tree.getroot()
root.attrib           # => {} (the xmlns declarations are not listed)
root[0].attrib["c"]   # => 'CanGetThisAttrib'
Any advice is appreciated.
Here is a solution using lxml:
from lxml import etree
data = '''<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:a="CannotGetThisAttrib" xmlns:b="CannotGetThisAttrib">
<fileHeader c="CanGetThisAttrib"/>
</root>'''
# lxml rejects str input that carries an encoding declaration, so pass bytes
data = data.encode('ascii')
tree = etree.fromstring(data)
for k, v in tree.nsmap.items():
    print('{} -> {}'.format(k, v))
a -> CannotGetThisAttrib
b -> CannotGetThisAttrib
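The standard library can also recover the declarations, even though they never show up in .attrib: ET.iterparse() reports each xmlns declaration as a "start-ns" event, emitted before the "start" event of the element that declares it. A minimal sketch: root_namespaces is a hypothetical helper, and the example.com URIs are placeholders.

```python
import io
import xml.etree.ElementTree as ET


def root_namespaces(source):
    """Return {prefix: uri} for the xmlns declarations on the root element.

    Each "start-ns" event carries a (prefix, uri) tuple; the default
    namespace is reported with prefix "".
    """
    namespaces = {}
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start":
            break  # root element reached; its declarations were reported already
        prefix, uri = item
        namespaces[prefix] = uri
    return namespaces


data = b'''<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:a="http://example.com/a" xmlns:b="http://example.com/b">
<fileHeader c="CanGetThisAttrib"/>
</root>'''

print(root_namespaces(io.BytesIO(data)))
```

Stopping at the first "start" event keeps declarations made on nested elements out of the result; iterparse() also accepts a filename instead of a file object.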

Accessing Elements with and without namespaces using lxml

Is there a way, using lxml, to find an element that occurs both with and without a namespace in the same document, in a single query? For example, I want to get all occurrences of the element identifier, regardless of whether it belongs to a specific namespace. Currently I can only access them separately, as below:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
root = xmlfile.getroot()
for l in root.iter('identifier'):
    print(l.text)
for l in root.iter('{}identifier'):
    print(l.text)
File: xmlfile.xml
<?xml version="1.0"?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
<provenance xmlns="" xmlns:xsi="" xsi:schemaLocation="">
<originDescription altered="false" harvestDate="2011-08-11T03:47:51Z">
<originDescription altered="false" harvestDate="2010-10-10T06:15:53Z">
You could use XPath to solve that kind of issue:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
identifier_nodes = xmlfile.xpath("//*[local-name() = 'identifier']")
for node in identifier_nodes:
    print(node.text)
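For comparison, the standard library's ElementTree (Python 3.8 and later) accepts a {*} namespace wildcard in find()/findall() paths, which matches a tag in any namespace or in none. A minimal sketch on a small assumed document; the http://example.com/dc URI and the element layout are placeholders, not the question's file.

```python
import xml.etree.ElementTree as ET

sample = """<record xmlns:dc="http://example.com/dc">
  <dc:identifier>first</dc:identifier>
  <provenance>
    <identifier>second</identifier>
  </provenance>
</record>"""

root = ET.fromstring(sample)
# ".//{*}identifier" finds identifier at any depth below root,
# whether it is in a namespace or not (wildcards need Python 3.8+)
nodes = root.findall(".//{*}identifier")
print([n.text for n in nodes])  # ['first', 'second']
```

Unlike the XPath local-name() approach, this path does not include the root element itself, because ".//" only searches descendants.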

How to add an xml-stylesheet processing instruction node with Python 2.6 and minidom?

I'm creating an XML document using minidom - how do I ensure my resultant XML document contains a stylesheet reference like this:
<?xml-stylesheet type="text/xsl" href="mystyle.xslt"?>
Thanks!
Use something like this:
from xml.dom import minidom
xml = """<root/>"""  # placeholder: substitute your own document
dom = minidom.parseString(xml)
pi = dom.createProcessingInstruction('xml-stylesheet',
                                     'type="text/xsl" href="mystyle.xslt"')
root = dom.firstChild
dom.insertBefore(pi, root)
print(dom.toprettyxml())
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="mystyle.xslt"?>
I am not familiar with minidom, but you need to create a processing instruction (PI) node with the target "xml-stylesheet" and the data 'type="text/xsl" href="mystyle.xslt"'.
Read the documentation on how a PI is created.
import xml.dom.minidom
dom = xml.dom.minidom.parse("C:\\Temp\\Report.xml")
pi = dom.createProcessingInstruction('xml-stylesheet',
                                     'type="text/xsl" href="TestCaseReport.xslt"')
root = dom.firstChild
dom.insertBefore(pi, root)
a = dom.toxml()
with open("C:\\Report(1).xml", 'w') as f:
    f.write(a)
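When the same file may be processed more than once, a variation is to skip the insert if an xml-stylesheet PI is already present, and to anchor on dom.documentElement (always the root element) rather than dom.firstChild (which may be a leading comment or doctype). A minimal sketch: add_stylesheet is a hypothetical helper, and the sample document is a placeholder.

```python
from xml.dom import minidom


def add_stylesheet(dom, href, xsl_type="text/xsl"):
    """Insert <?xml-stylesheet type=... href=...?> directly before the root
    element of a minidom Document, unless one is already present.
    Returns True if a new processing instruction was inserted."""
    for node in dom.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            return False  # keep the existing stylesheet reference
    data = 'type="%s" href="%s"' % (xsl_type, href)
    pi = dom.createProcessingInstruction("xml-stylesheet", data)
    # documentElement is the root element even if a comment comes first
    dom.insertBefore(pi, dom.documentElement)
    return True


dom = minidom.parseString("<!-- report --><Report><Case/></Report>")
add_stylesheet(dom, "TestCaseReport.xslt")
add_stylesheet(dom, "TestCaseReport.xslt")  # second call changes nothing
print(dom.toxml())
```

The helper only inspects top-level nodes, which is where an xml-stylesheet PI is expected to appear.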

