I am working with XML files in Python, and I want to know whether there is a way to add a subelement inside another subelement of an XML file.
For example, if the structure of the XML file is as follows, how can I add a new subelement under the container 'b'?
<?xml version .....>
<module name=....>
<augment ....>
<container name="a">
</container>
<container name="b">
</container>
</augment>
</module>
If you want to do it in a more future-proof way, you may want to use an XML parser, e.g. lxml.etree. You can parse your XML, work on its elements, and then dump it back to a file. Here is a simple working example:
from lxml import etree
xml = '''<module name="x">
<augment name="y">
<container name="a">
</container>
<container name="b">
</container>
</augment>
</module>'''.strip()
xtree = etree.fromstring(xml)
for element in xtree.xpath('.//container[@name="b"]'):
    new = etree.Element('something')  # create new element to be inserted
    new.set('name', 'xyz')  # define some attributes for new element
    element.append(new)  # append it to your currently-processed element
print(etree.tostring(xtree,pretty_print=True).decode('ascii'))
For more, see the lxml documentation (https://lxml.de/tutorial.html).
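If lxml is not available, the same steps can be sketched with the standard library's xml.etree.ElementTree, whose findall() supports a small XPath subset that includes attribute predicates. This is only a sketch: the element name and attribute are taken from the example above, and the output file name out.xml is an assumption.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

xml = '''<module name="x">
  <augment name="y">
    <container name="a">
    </container>
    <container name="b">
    </container>
  </augment>
</module>'''

root = ET.fromstring(xml)

# findall() understands a limited XPath subset, including [@attr='value']
for container in root.findall(".//container[@name='b']"):
    new = ET.SubElement(container, 'something')  # create and append in one step
    new.set('name', 'xyz')

# Hypothetical output location: out.xml in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), 'out.xml')
ET.ElementTree(root).write(out_path, encoding='utf-8', xml_declaration=True)
```

SubElement() both creates the element and attaches it to its parent, so no separate append() call is needed here.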
Recently I've been editing XML files with a Python script for a project I'm working on, but I can't figure out how to edit the attributes of the root element.
For example, the XML file would say:
<root width="200">
<element1>
</element1>
</root>
What I want to do is have my code find the width attribute and change it to some other value. I know how to edit elements after the root, but not the root itself.
Code I'm using for changing attributes:
You could use the xml.etree.ElementTree module. With this module you can set attributes using xml.etree.ElementTree.Element.set().
Here is an example snippet you could use:
import xml.etree.ElementTree as ET
tree = ET.parse('input.xml')
root = tree.getroot()
root.set('width','400')
print(root.attrib)
tree.write('output.xml')
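The element returned by getroot() is an ordinary Element, so the same get() and set() calls that work on child elements also work on the root. A self-contained sketch, parsing the XML from the question out of a string rather than input.xml:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<root width="200"><element1></element1></root>')

old_width = root.get('width')   # read the current value
root.set('width', '400')        # overwrite it; attribute values are strings

result = ET.tostring(root, encoding='unicode')
print(old_width, '->', root.get('width'))
print(result)
```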
I want to create a text file from an XML file using XSLT.
Here is my code:
import lxml.etree as ET
dom = ET.parse('a_file.xml')
xslt = ET.parse('a_file.xsl')
transform = ET.XSLT(xslt)
newdom = transform(dom)
print(ET.tostring(newdom, pretty_print=True))
When a_file.xsl does not contain a root element within the template, like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:text>{ this is a test }</xsl:text>
</xsl:template>
</xsl:stylesheet>
the code returns None. However, when I add a root element, it works, i.e. <r><xsl:text>{ this is a test }</xsl:text></r>
If you want to create a text file as the result of an XSLT transformation, then there are two changes you need to make to the code in your question.
Firstly, you need to tell the XSLT that it will generate text output. Add the following element to your stylesheet, as a direct child of the <xsl:stylesheet> element:
<xsl:output method="text" encoding="utf-8" />
Secondly, if you want to convert the result to a string, follow the guidance in the lxml documentation and call str(...) on it, i.e.
print(str(newdom))
instead of
print(ET.tostring(newdom, pretty_print=True))
Assume that I have the following XML, which I want to modify using Python's ElementTree:
<root xmlns:prefix="URI">
<child prefix:name="***"/>
...
</root>
I'm doing some modification on the XML file like this:
import xml.etree.ElementTree as ET
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')
Then the XML file looks like:
<root xmlns:ns0="URI">
<child ns0:name="***"/>
...
</root>
As you can see, the namespace prefix changed to ns0. I'm aware of using ET.register_namespace() as mentioned here.
The problem with ET.register_namespace() is that:
You need to know the prefix and the URI.
It cannot be used with a default namespace.
E.g. if the XML looks like:
<root xmlns="http://uri">
<child name="name">
...
</child>
</root>
It will be transformed to something like:
<ns0:root xmlns:ns0="http://uri">
<ns0:child name="name">
...
</ns0:child>
</ns0:root>
As you can see, the default namespace is changed to ns0.
Is there any way to solve this problem with ElementTree?
ElementTree replaces the prefix of any namespace that is not registered with ET.register_namespace. To preserve a namespace prefix, you need to register it before writing your modifications to a file. The following function does the job and registers all namespaces globally:
def register_all_namespaces(filename):
    namespaces = dict([node for _, node in ET.iterparse(filename, events=['start-ns'])])
    for ns in namespaces:
        ET.register_namespace(ns, namespaces[ns])
This function should be called before ET.parse, so that the namespace prefixes remain unchanged:
import xml.etree.ElementTree as ET
register_all_namespaces('filename.xml')
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')
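To see the effect without touching a file, the same idea can be sketched on an in-memory document. The document below is an assumption for illustration: it combines a default namespace with a prefixed one, and the URI http://other is made up.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: one default namespace and one prefixed namespace
source = b'''<root xmlns="http://uri" xmlns:prefix="http://other">
  <child prefix:name="name"/>
</root>'''

# Each start-ns event carries a (prefix, uri) pair; the default
# namespace is reported with an empty prefix.
for _, (prefix, uri) in ET.iterparse(io.BytesIO(source), events=['start-ns']):
    ET.register_namespace(prefix, uri)

root = ET.fromstring(source)
output = ET.tostring(root, encoding='unicode')
print(output)
```

Note that ET.register_namespace changes module-level state, so the registration affects every later serialization in the same process.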
I am new to etree. I wanted to read an etree and write that information to another file format such as HTML or XML. I have worked out how to do that, but what about the other way around? That is, I want to read some other file format and generate or write an etree from it. Please give me some suggestions, or an example of how to proceed.
Suppose you want to write an xml file test.xml like the following:
<?xml version='1.0' encoding='ASCII'?>
<document category="locations">
<name>Timbuktu</name>
<name>Eldorado</name>
</document>
The corresponding code would be:
from lxml import etree
root = etree.Element("document", {"category" : "locations"})
for location in ["Timbuktu", "Eldorado"]:
    name = etree.SubElement(root, "name")
    name.text = location
tree = etree.ElementTree(element=root, file=None, parser=None)
tree.write('test.xml', pretty_print=True, xml_declaration=True)
If you want to add further sub-elements under name then you have to nest another for loop and create subelements under the name tag object.
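A minimal sketch of that nested loop follows. It uses the standard library's xml.etree.ElementTree, whose Element and SubElement calls mirror the lxml ones used above. The alias child element and the sample spellings are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: each location has a list of alternative spellings
locations = {
    "Timbuktu": ["Tombouctou"],
    "Eldorado": ["El Dorado"],
}

root = ET.Element("document", {"category": "locations"})
for location, aliases in locations.items():
    name = ET.SubElement(root, "name")
    name.text = location
    # Nested loop: each alias becomes a child of the current <name>
    for alias in aliases:
        ET.SubElement(name, "alias").text = alias

output = ET.tostring(root, encoding="unicode")
print(output)
```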
E.g. consider parsing a pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<parent>
<groupId>com.parent</groupId>
<artifactId>parent</artifactId>
<version>1.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<modelVersion>2.0.0</modelVersion>
<groupId>com.parent.somemodule</groupId>
<artifactId>some_module</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<name>Some Module</name>
...
Code:
import xml.etree.ElementTree as ET
tree = ET.parse(pom)
root = tree.getroot()
groupId = root.find("groupId")
artifactId = root.find("artifactId")
Both groupId and artifactId are None. Why, when they are direct children of the root? I tried replacing root with tree (groupId = tree.find("groupId")), but that didn't change anything.
The problem is that you don't have a child named groupId, you have a child named {http://maven.apache.org/POM/4.0.0}groupId, because etree doesn't ignore XML namespaces, it uses "universal names". See Working with Namespaces and Qualified Names in the effbot docs.
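Two common ways to look up the namespaced children are sketched below on a shortened copy of the pom.xml from the question. The prefix "m" in the mapping is an arbitrary choice made for this example and does not need to appear in the document.

```python
import xml.etree.ElementTree as ET

pom_xml = '''<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>com.parent.somemodule</groupId>
  <artifactId>some_module</artifactId>
</project>'''

root = ET.fromstring(pom_xml)

# A plain name does not match, because every tag carries the namespace URI
plain = root.find("groupId")

# Option 1: spell out the universal name, {uri}localname
group_id = root.find("{http://maven.apache.org/POM/4.0.0}groupId")

# Option 2: pass a prefix-to-URI mapping and use the prefix in the path
ns = {"m": "http://maven.apache.org/POM/4.0.0"}
artifact_id = root.find("m:artifactId", ns)

print(plain, group_id.text, artifact_id.text)
```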
Just to expand on abarnert's comment about BeautifulSoup, if you DO just want a quick and dirty solution to the problem, this is probably the fastest way to go about it. I have implemented this (for a personal script) that uses bs4, where you can traverse the tree with
element = dom.getElementsByTagNameNS('*','elementname')
This will reference the dom using ANY namespace, handy if you know you've only got one in the file so there's no ambiguity.
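getElementsByTagNameNS is a DOM method. The sketch below assumes dom is a document parsed with the standard library's xml.dom.minidom, which may differ from the setup the answer had in mind. The XML is a shortened copy of the pom.xml from the question.

```python
from xml.dom import minidom

dom = minidom.parseString(
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<groupId>com.parent.somemodule</groupId>'
    '</project>'
)

# '*' as the namespace URI matches elements in any namespace
elements = dom.getElementsByTagNameNS('*', 'groupId')
texts = [element.firstChild.data for element in elements]
print(texts)
```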