Setting an attribute value in XML file using ElementTree - python

I need to change the value of the approved attribute on the approved-by element in an XML file from 'no' to 'yes'. Here is my XML file:
<?xml version="1.0" encoding="UTF-8" ?>
<!--Arbortext, Inc., 1988-2008, v.4002-->
<!DOCTYPE doc PUBLIC "-//MYCOMPANY//DTD XSEIF 1/FAD 110 05 R5//EN"
"XSEIF_R5.dtd">
<doc version="XSEIF R5" xmlns="urn:x-mycompany:r2:reg-doc:1551-fad.110.05:en:*">
<meta-data>
<?Pub Dtl?>
<confidentiality class="mycompany-internal" />
<doc-name>INSTRUCTIONS</doc-name>
<doc-id>
<doc-no type="registration">1/1531-CRA 119 1364/2</doc-no>
<language code="en" />
<rev>PA1</rev>
<date>
<y>2013</y>
<m>03</m>
<d>12</d>
</date>
</doc-id>
<company-id>
<business-unit></business-unit>
<company-name></company-name>
<company-symbol logotype="X"></company-symbol>
</company-id>
<title>SIM Software Installation Guide</title>
<drafted-by>
<person>
<name>Shahul Hameed</name>
<signature>epeeham</signature>
</person>
</drafted-by>
<approved-by approved="no">
<person>
<name>AB</name>
<signature>errrrrn</signature>
</approved-by>
I tried two approaches and both failed. The first was:
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element
root = ET.parse('Path/1_1531-CRA 119 1364_2.xml')
sh = root.find('approved-by')
sh.set('approved', 'yes')
print etree.tostring(root)
This raised AttributeError: 'NoneType' object has no attribute 'set'.
So I tried another way:
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element
root = ET.parse('C:/Path/1_1531-CRA 119 1364_2.xml')
elem = Element("approved-by")
elem.attrib["approved"] = "yes"
This time I got no error, but the attribute was not set either. I am confused and cannot find what is wrong with this script.

Since the XML you've provided is not well-formed (the closing tags are missing), here's a smaller example:
import xml.etree.ElementTree as ET
xml = """<?xml version="1.0" encoding="UTF-8"?>
<body>
<approved-by approved="no">
<name>AB</name>
<signature>errrrrn</signature>
</approved-by>
</body>
"""
tree = ET.fromstring(xml)
sh = tree.find('approved-by')
sh.set('approved', 'yes')
print(ET.tostring(tree, encoding="unicode"))
prints:
<body>
<approved-by approved="yes">
<name>AB</name>
<signature>errrrrn</signature>
</approved-by>
</body>
So the approach in your first attempt works, provided find() actually locates the element; the AttributeError means it returned None.
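Going by the posted file, a likely reason for the None result is that the doc element declares a default namespace and approved-by is nested inside meta-data, while find('approved-by') only matches un-namespaced direct children. The second attempt fails for a different reason: Element("approved-by") builds a brand-new element that is not attached to the parsed tree. Here is a minimal sketch, assuming a trimmed, well-formed version of that file; the prefix d and the variable names are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the file in the question (an assumption).
xml_text = """<doc version="XSEIF R5" xmlns="urn:x-mycompany:r2:reg-doc:1551-fad.110.05:en:*">
  <meta-data>
    <approved-by approved="no">
      <person><name>AB</name></person>
    </approved-by>
  </meta-data>
</doc>"""

# Map an arbitrary prefix to the document's default namespace URI.
NS = {"d": "urn:x-mycompany:r2:reg-doc:1551-fad.110.05:en:*"}

root = ET.fromstring(xml_text)

# './/' searches all descendants; the prefix makes the tag namespace-qualified.
elem = root.find(".//d:approved-by", NS)
elem.set("approved", "yes")
print(elem.attrib)
```

When writing the result back out, ElementTree may replace the default namespace with a generated prefix such as ns0 unless ET.register_namespace('', uri) is called first.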

add comment in XML root element python code

I am converting data from CSV to XML using Python's xml.etree.ElementTree library.
I want to put a comment before the first node, so that the output looks like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Comment -->
<a xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<b>
<c>xxxxx</c>
</b>
</a>
The code I have tried is:
from lxml import etree
import xml.dom.minidom
import xml.etree.ElementTree as et
name_space = {
# namespace defined below
"xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance"
}
comment = et.Comment('Comment')
root = et.Element('')
root.tag = None
root.insert(0, comment)
root2 = et.Element('a', name_space)
root.insert(1, root2)
xml_data = et.tostring(root, encoding='iso-8859-1', method='xml')
xmlstr = xml.dom.minidom.parseString(xml_data, parser=None).toprettyxml(indent=" ", encoding='iso-8859-1')
The last statement, xml.dom.minidom.parseString(), raises xml.parsers.expat.ExpatError: junk after document element: line 2, column 135
If I print xml_data, its content is:
<?xml version=\'1.0\' encoding=\'iso-8859-1\'?>\n<!--Comment--><a xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" /><b><c>xxxx</c></b>
Is there another way to add the comment?

I believe you're making it a bit too complicated. Try something like the code below. Note that this assumes (as in your question) that xxxxx has already been extracted from the CSV file.
from lxml import etree
cmt = """<?xml version="1.0" encoding="iso-8859-1"?>
<a xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!-- Comment -->
<b>
<c>xxxxx</c>
</b>
</a>
"""
parser = etree.XMLParser(remove_comments=False)
doc = etree.XML(cmt.encode(),parser=parser)
print(etree.tostring(doc).decode())
The output keeps the comment, although here it sits inside the root element rather than before it.
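If the comment really has to come before the root element, one option that stays within the standard library is to build the tree with ElementTree as in the question and then let minidom insert a comment node ahead of the document element. This is only a sketch under that assumption; the element content is taken from the question and the variable names are illustrative.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Build the tree with ElementTree, as in the question.
root = ET.Element("a", {"xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance"})
b = ET.SubElement(root, "b")
ET.SubElement(b, "c").text = "xxxxx"

# Parse the serialized element with minidom, then insert a comment node
# in front of the document element (the real root, <a>).
dom = xml.dom.minidom.parseString(ET.tostring(root))
dom.insertBefore(dom.createComment(" Comment "), dom.documentElement)

# toprettyxml() returns bytes when an encoding is given.
xml_bytes = dom.toprettyxml(indent="  ", encoding="iso-8859-1")
print(xml_bytes.decode("iso-8859-1"))
```

This avoids the root.tag = None trick, which produced two top-level elements and therefore the "junk after document element" error.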

How to modify XML element using Python Elementtree

I would like to change one key inside an attribute value (for example, the value of "strokeColor" inside the "style" attribute) while leaving the other keys in that attribute untouched. I'm using the ElementTree module that ships with Python.
Here is an example of what I did before:
Part of my XML example code:
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#32AC2D;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
My Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('example.xml')
target = tree.find('.//mxCell[@id="line1"]')
target.set("strokeColor","#FF0000")
tree.write('output.xml')
My output XML:
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" strokeColor="#FF0000" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#32AC2D;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
As you can see, this adds a new attribute called "strokeColor" instead of changing the strokeColor value inside the "style" attribute. I want to change the strokeColor inside the "style" attribute. How can I fix this?

Another approach, using the simplified_scrapy library:
from simplified_scrapy import SimplifiedDoc, utils, req
html = '''
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#32AC2D;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
'''
doc = SimplifiedDoc(html)
mxCell = doc.select('mxCell#line1')
style = doc.replaceReg(mxCell['style'],'strokeColor=.*?;','strokeColor=#FF0000;')
mxCell.setAttr('style',style)
print(doc.html)
Result:
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#FF0000;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
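The same change can also be made with the standard library alone by treating the style value as a semicolon-separated list of key=value pairs. This is a minimal sketch: the helper name set_style_key and the wrapping root element are made up for illustration (the wrapper is there because .// only searches descendants of the element it is called on), and it assumes no style value contains a semicolon.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's XML, wrapped in a hypothetical <root>.
xml_text = """<root>
<mxCell edge="1" id="line1" style="endArrow=none;html=1;strokeWidth=5;strokeColor=#32AC2D;rounded=0;" value=""/>
</root>"""


def set_style_key(style, key, value):
    """Return the style string with one key's value replaced, others untouched."""
    parts = [p for p in style.split(";") if p]
    updated = [
        f"{key}={value}" if p.split("=", 1)[0] == key else p
        for p in parts
    ]
    return ";".join(updated) + ";"


root = ET.fromstring(xml_text)
target = root.find('.//mxCell[@id="line1"]')

# Overwrite the whole style attribute with the edited string.
target.set("style", set_style_key(target.get("style"), "strokeColor", "#FF0000"))
print(target.get("style"))
```

If the key is not present in the style string, this helper leaves the string unchanged rather than appending it.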

I want to remove the curly braces and XML namespace using lxml and just report the tag name

I have the following XML document (the real one is much longer):
<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0">
</error>
<product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
</product>
I use the following Python to extract some of the tag names:
doc = etree.fromstring(resulttxt)
print(doc.attrib)
print(doc.tag)
print(doc[4][0][0].tag)
if doc[4][0][0].tag == 'field':
    print('hi')
What I'm getting though is:
{'version': '1.0'}
{http://www.filemaker.com/xml/fmresultset}fmresultset
{http://www.filemaker.com/xml/fmresultset}field
The xmlns doesn't show up as an attribute of the root tag, but it is there.
It is also placed in front of each tag name, which makes it difficult to loop through the elements and use conditionals. I want doc.tag to show just the tag, not the namespace plus the tag.
This is day 1 for me using this. Could anyone help out?

You need to handle namespaces; in your case the document declares a default namespace (one with no prefix):
from lxml import etree as ET
data = """<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0">
</error>
<product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
</product>
</fmresultset>
"""
namespaces = {
"myns": "http://www.filemaker.com/xml/fmresultset"
}
# lxml rejects str input that carries an encoding declaration, so pass bytes
tree = ET.fromstring(data.encode("utf-8"))
print(tree.find("myns:product", namespaces=namespaces).attrib.get("name"))
Prints:
FileMaker Web Publishing Engine
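To get just the tag name without the {namespace} prefix, one simple option is to split the tag on the closing brace. The sketch below uses the standard library's ElementTree so that it runs without extra packages; the helper name local_name is made up for illustration, and the sample document is a shortened version of the one in the question. With lxml, etree.QName(element).localname gives the same result.

```python
import xml.etree.ElementTree as ET

# Shortened version of the document from the question (DOCTYPE omitted).
data = """<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0"/>
<product name="FileMaker Web Publishing Engine"/>
</fmresultset>"""


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree-style tag, if present."""
    return tag.rsplit("}", 1)[-1]


doc = ET.fromstring(data)

# iter() yields the root first, then its descendants in document order.
names = [local_name(el.tag) for el in doc.iter()]
print(names)
```

A conditional such as local_name(el.tag) == 'field' then works regardless of which namespace the element is in.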

How to keep the xml-stylesheet?

I want to keep the xml-stylesheet processing instruction when I rewrite the file, but it gets lost.
I use Python to modify the XML so that I can deploy Hadoop automatically.
XML:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://c11:9000</value>
  </property>
</configuration>
Code:
from xml.etree.ElementTree import ElementTree as ET
def modify_core_site(namenode_hostname):
    tree = ET()
    tree.parse("pkg/core-site.xml")
    root = tree.getroot()
    for p in root.iter("property"):
        name = p.find("name").text
        if name == "fs.default.name":
            text = "hdfs://%s:9000" % namenode_hostname
            p.find("value").text = text
    tree.write("pkg/tmp.xml", encoding="utf-8", xml_declaration=True)
modify_core_site("c80")
Result:
<?xml version='1.0' encoding='utf-8'?>
<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://c80:9000</value>
  </property>
</configuration>
The xml-stylesheet processing instruction disappears.
How can I keep it?

One solution is to use lxml: once you have parsed the XML, walk back from the root until you reach the xml-stylesheet node. A quick sample:
>>> import lxml.etree
>>> doc = lxml.etree.parse('C:/downloads/xmltest.xml')
>>> root = doc.getroot()
>>> xslnode=root.getprevious().getprevious()
>>> xslnode
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
Make sure you add some exception handling and check that the node actually exists. You can check whether the node is an XSLT processing instruction with:
>>> isinstance(xslnode, lxml.etree._XSLTProcessingInstruction)
True
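If staying with the standard library is preferable, a blunt workaround is to write the prolog back by hand, since ElementTree does not keep nodes that appear before the root element. The sketch below assumes the stylesheet line is known in advance and hard-codes it in a constant named HEADER (a hypothetical name); it works on strings rather than files so that it is self-contained.

```python
import xml.etree.ElementTree as ET

source = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://c11:9000</value>
  </property>
</configuration>
"""

# Assumption: the declaration and stylesheet line are fixed and known up front.
HEADER = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>\n'
)


def modify_core_site(xml_text, namenode_hostname):
    """Return the updated XML text with the prolog re-attached."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.default.name":
            prop.find("value").text = "hdfs://%s:9000" % namenode_hostname
    # The parse dropped everything before <configuration>, so prepend it again.
    return HEADER + ET.tostring(root, encoding="unicode")


result = modify_core_site(source, "c80")
print(result)
```

The comment that preceded the root in the original file is lost with this approach as well; it would have to be added to the header string in the same way.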

Accessing Elements with and without namespaces using lxml

Is there a way, using lxml, to search a document in one pass for an element that occurs both with and without a namespace? For example, I want to get all occurrences of the element identifier, whether or not it is associated with a specific namespace. At the moment I can only access them separately, as below.
Code:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
root = xmlfile.getroot()
for l in root.iter('identifier'):
    print(l.text)
for l in root.iter('{http://www.openarchives.org/OAI/2.0/provenance}identifier'):
    print(l.text)
File: xmlfile.xml
<?xml version="1.0"?>
<record>
<header>
<identifier>identifier1</identifier>
<datestamp>datastamp1</datestamp>
<setSpec>setspec1</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>title1</dc:title>
<dc:title>title2</dc:title>
<dc:creator>creator1</dc:creator>
<dc:subject>subject1</dc:subject>
<dc:subject>subject2</dc:subject>
</oai_dc:dc>
</metadata>
<about>
<provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
<originDescription altered="false" harvestDate="2011-08-11T03:47:51Z">
<baseURL>baseURL1</baseURL>
<identifier>identifier3</identifier>
<datestamp>datestamp2</datestamp>
<metadataNamespace>xxxxx</metadataNamespace>
<originDescription altered="false" harvestDate="2010-10-10T06:15:53Z">
<baseURL>xxxxx</baseURL>
<identifier>identifier4</identifier>
<datestamp>2010-04-27T01:10:31Z</datestamp>
<metadataNamespace>xxxxx</metadataNamespace>
</originDescription>
</originDescription>
</provenance>
</about>
</record>
You could use XPath to solve that kind of issue:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
identifier_nodes = xmlfile.xpath("//*[local-name() = 'identifier']")
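For comparison, the standard library's ElementTree offers a similar shortcut on Python 3.8 or newer: the {*} wildcard in a path matches a tag in any namespace or in none. The sketch below uses a cut-down version of xmlfile.xml; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Cut-down version of xmlfile.xml from the question.
xml_text = """<record>
  <header>
    <identifier>identifier1</identifier>
  </header>
  <about>
    <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
      <originDescription>
        <identifier>identifier3</identifier>
      </originDescription>
    </provenance>
  </about>
</record>"""

root = ET.fromstring(xml_text)

# '{*}identifier' matches <identifier> with or without a namespace (Python 3.8+).
texts = [el.text for el in root.findall(".//{*}identifier")]
print(texts)
```

Both the un-namespaced identifier and the one in the provenance namespace are returned, in document order.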
