Recently I've been editing XML files with a Python script for a project I'm working on, but I can't figure out how to edit the attributes of the root element.
For example, the XML file would say:
<root width="200">
<element1>
</element1>
</root>
What I want is for my code to find the width attribute and change it to some other value. I know how to edit elements below the root, but not the root itself.
Code I'm using for changing attributes:
You could use the xml.etree.ElementTree module. With it, you can set an attribute on any element, including the root, using xml.etree.ElementTree.Element.set().
Here is an example snippet you could use:
import xml.etree.ElementTree as ET
tree = ET.parse('input.xml')
root = tree.getroot()
root.set('width','400')
print(root.attrib)
tree.write('output.xml')
I have tried to read the XML file into the following variable and then work on it as a normal XML file. This is not working. How can I approach this situation? I need to edit the XML without modifying the original file and without creating a new XML file. Is that possible?
comment_2 = open("cool.xml").read()
Thanks and Regards
You can use xml.etree.ElementTree to parse the XML file and keep the parsed tree in a variable:
import xml.etree.ElementTree as ET
tree = ET.parse('cool.xml')
root = tree.getroot()
# Edit root here; the changes stay in memory
xml_string = ET.tostring(root, encoding='unicode')
The tree is an ElementTree instance held in memory, so cool.xml is left untouched unless you call tree.write().
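If the file contents are already in a string, as with comment_2 in the question, a minimal sketch of the same idea is to parse that string directly with ET.fromstring. The XML content below is made up for illustration, since the real contents of cool.xml are not shown:

```python
import xml.etree.ElementTree as ET

# Stand-in for: comment_2 = open("cool.xml").read()
# (the actual file contents are unknown, so this string is an assumption)
comment_2 = '<root width="200"><element1 /></root>'

root = ET.fromstring(comment_2)   # parse the string; no file is opened
root.set("width", "400")          # the edit happens in memory only

# Serialize back to a string; nothing is written to disk
comment_2 = ET.tostring(root, encoding="unicode")
print(comment_2)
```

Nothing here touches the disk after the initial read, so the original file stays as it was.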
Assume that I've the following XML which I want to modify using Python's ElementTree:
<root xmlns:prefix="URI">
<child prefix:name="***"/>
...
</root>
I'm doing some modification on the XML file like this:
import xml.etree.ElementTree as ET
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')
Then the XML file looks like:
<root xmlns:ns0="URI">
<child ns0:name="***"/>
...
</root>
As you can see, the namespace prefix changed to ns0. I'm aware of ET.register_namespace().
The problem with ET.register_namespace() is that:
You need to know the prefix and URI
It cannot be used with a default namespace.
e.g. If the xml looks like:
<root xmlns="http://uri">
<child name="name">
...
</child>
</root>
It will be transformed to something like:
<ns0:root xmlns:ns0="http://uri">
<ns0:child name="name">
...
</ns0:child>
</ns0:root>
As you can see, the default namespace is changed to ns0.
Is there any way to solve this problem with ElementTree?
ElementTree replaces the prefixes of any namespaces that are not registered with ET.register_namespace. To preserve a namespace prefix, you need to register it before writing your modifications to a file. The following function does the job and registers all namespaces globally:
def register_all_namespaces(filename):
    namespaces = dict([node for _, node in ET.iterparse(filename, events=['start-ns'])])
    for ns in namespaces:
        ET.register_namespace(ns, namespaces[ns])
Call this function before ET.parse so that the namespace prefixes remain unchanged:
import xml.etree.ElementTree as ET
register_all_namespaces('filename.xml')
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')
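For the default-namespace case from the question, here is a small self-contained sketch of the same approach. It assumes an in-memory document (io.BytesIO) instead of a real file, and the "checked" attribute is only an illustrative modification. The start-ns event reports a default namespace with an empty prefix, and registering that empty prefix appears to be enough to keep xmlns="..." on output:

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for filename.xml (an assumption for this sketch)
XML = b'<root xmlns="http://uri"><child name="name"/></root>'

def register_all_namespaces(source):
    # 'start-ns' events yield (prefix, uri) pairs; a default
    # namespace is reported with an empty-string prefix.
    for _, (prefix, uri) in ET.iterparse(source, events=["start-ns"]):
        ET.register_namespace(prefix, uri)

register_all_namespaces(io.BytesIO(XML))
root = ET.parse(io.BytesIO(XML)).getroot()
root.set("checked", "yes")        # illustrative modification

output = ET.tostring(root, encoding="unicode")
print(output)
```

Keep in mind that ET.register_namespace changes a global registry, so the mapping also affects any other document serialized later in the same process.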
I am trying to parse an XML file using Python's ElementTree.
The XML file is like below:
<App xmlns="test attribute">
<name>sagar</name>
</App>
Parser Code:
from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import Element
import xml.etree.ElementTree as etree
def parser():
    eleTree = etree.parse('app.xml')
    eleRoot = eleTree.getroot()
    print("Tag:"+str(eleRoot.tag)+"\nAttrib:"+str(eleRoot.attrib))
if __name__ == "__main__":
    parser()
Output:
[sagar#linux Parser]$ python test.py
Tag:{test attribute}App <------------- It should print only "App"
Attrib:{}
When I remove the "xmlns" attribute or rename it to something else, eleRoot.tag prints the correct value.
Why is ElementTree unable to parse the tag properly when it has an "xmlns" attribute? Am I missing some prerequisite for parsing XML of this format with ElementTree?
Your XML uses the attribute xmlns, which defines a default XML namespace. XML namespaces are used to solve naming conflicts and are expected to have a URI as their value, so the value "test attribute" is not a proper namespace name, which appears to be affecting how etree parses your XML.
For more information on xml namespaces see XML Namespaces at W3 Schools.
Edit:
After looking into the issue further, it appears that the fully qualified name of an element in Python's ElementTree has the form {namespace_uri}tag_name. This means that, since you defined the default namespace as "test attribute", the fully qualified name of your "App" tag is in fact {test attribute}App, which is what your program prints.
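To illustrate the {namespace}tag form, here is a short sketch that parses the question's XML from a string (rather than app.xml) and splits the qualified tag into its namespace and local name. The variable names are only illustrative:

```python
import xml.etree.ElementTree as ET

# The question's document, inlined as a string for this sketch
root = ET.fromstring('<App xmlns="test attribute"><name>sagar</name></App>')

print(root.tag)   # the qualified name, including the namespace in braces

# Split "{namespace}local" into its two parts
namespace, local_name = root.tag[1:].split("}", 1)
print(local_name)

# Children live in the same default namespace, so lookups need it too
name_text = root.find("{test attribute}name").text
```

The xmlns declaration never shows up in eleRoot.attrib because the parser consumes it as a namespace declaration rather than as an ordinary attribute.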
I am new to etree. I wanted to read an etree and write that information out in another file format such as HTML or XML. I checked, and I can now do that, but what about the other way around? That is, how do I read some other file format and build or write an etree from it? Please give me some suggestions, ideally with an example, on how to proceed.
Suppose you want to write an xml file test.xml like the following:
<?xml version='1.0' encoding='ASCII'?>
<document category = "location">
<name>Timbuktu</name>
<name>Eldorado</name>
</document>
The corresponding code would be:
from lxml import etree
# "location" matches the category value in the target XML above
root = etree.Element("document", {"category": "location"})
for location in ["Timbuktu", "Eldorado"]:
    name = etree.SubElement(root, "name")
    name.text = location
tree = etree.ElementTree(element=root, file=None, parser=None)
tree.write('test.xml', pretty_print=True, xml_declaration=True)
If you want to add further sub-elements under name, nest another for loop and create the sub-elements under the name element object.
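A minimal sketch of that nested loop follows. It uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra packages (Element and SubElement are called the same way in both). The "detail" tag and the values in the dictionary are hypothetical, invented only for this illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical input data: each location maps to made-up detail values
locations = {
    "Timbuktu": ["desert", "river"],
    "Eldorado": ["gold"],
}

root = ET.Element("document", {"category": "location"})
for location, details in locations.items():
    name = ET.SubElement(root, "name")
    name.text = location
    for detail in details:
        # The inner loop creates children under the current <name>
        child = ET.SubElement(name, "detail")
        child.text = detail

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The same pattern works for any input format: read the data into Python structures first (for example with the csv or json modules), then walk them and create one element per item.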
E.g. consider parsing a pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<parent>
<groupId>com.parent</groupId>
<artifactId>parent</artifactId>
<version>1.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<modelVersion>2.0.0</modelVersion>
<groupId>com.parent.somemodule</groupId>
<artifactId>some_module</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<name>Some Module</name>
...
Code:
import xml.etree.ElementTree as ET
tree = ET.parse(pom)
root = tree.getroot()
groupId = root.find("groupId")
artifactId = root.find("artifactId")
Both groupId and artifactId are None. Why, when they are direct children of the root? I tried replacing root with tree (groupId = tree.find("groupId")), but that didn't change anything.
The problem is that you don't have a child named groupId; you have a child named {http://maven.apache.org/POM/4.0.0}groupId, because etree doesn't ignore XML namespaces and instead uses "universal names". See Working with Namespaces and Qualified Names in the effbot docs.
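As a sketch of how the lookup can be written, the snippet below uses a trimmed-down stand-in for the POM (not the full file from the question) and shows two options: passing a prefix-to-URI mapping to find(), or spelling out the universal name. The prefix "m" is an arbitrary choice for this example:

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for pom.xml, inlined for this sketch
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>com.parent.somemodule</groupId>
  <artifactId>some_module</artifactId>
</project>"""

root = ET.fromstring(POM)

# Option 1: map a prefix of your choosing to the namespace URI
ns = {"m": "http://maven.apache.org/POM/4.0.0"}
group_id = root.find("m:groupId", ns)

# Option 2: write the universal name out in full
artifact_id = root.find("{http://maven.apache.org/POM/4.0.0}artifactId")

# The unqualified name still matches nothing
missing = root.find("groupId")

print(group_id.text, artifact_id.text, missing)
```

The prefix in the mapping does not have to match anything in the file; only the URI matters.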
Just to expand on abarnert's comment about BeautifulSoup: if you DO just want a quick and dirty solution to the problem, this is probably the fastest way to go about it. I have implemented this for a personal script. Note that getElementsByTagNameNS is a DOM method (xml.dom.minidom provides it) rather than part of bs4, so dom here needs to be a DOM document, which you can search with
element = dom.getElementsByTagNameNS('*','elementname')
This matches elements in ANY namespace, which is handy if you know the file only contains one, so there's no ambiguity.
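Here is a small runnable sketch of that wildcard lookup, assuming dom is built with the standard library's xml.dom.minidom and using a shortened stand-in for the POM. Note that the call returns a list of every matching descendant, not a single element:

```python
from xml.dom import minidom

# Shortened stand-in for pom.xml, inlined for this sketch
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <artifactId>parent</artifactId>
  </parent>
  <artifactId>some_module</artifactId>
</project>"""

dom = minidom.parseString(POM)

# '*' as the namespace matches elements in any namespace
matches = dom.getElementsByTagNameNS("*", "artifactId")

# Each match is an element node; its first child holds the text
texts = [node.firstChild.data for node in matches]
print(texts)
```

Because the search is recursive, the artifactId nested under parent is returned as well, so filter on node.parentNode if only direct children of the root are wanted.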