I'm using ElementTree in Python to parse an XML file and add or remove elements in it.
In my XML file the root and the elements just below the root have a namespace, but all the other elements do not.
I see that ElementTree, when printing the modified tree, adds namespaces to every element.
Is there a proper way of telling ElementTree to just keep namespaces in the elements where they originally appeared?
Try registering the namespace prefixes before you write the tree:
import xml.etree.ElementTree as ET
namespaces = {
    '': 'http://tempuri.org/',
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
    'xsd': 'http://www.w3.org/2001/XMLSchema',
}
# Map each namespace URI to the prefix it should get on output.
for prefix, uri in namespaces.items():
    ET.register_namespace(prefix, uri)
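To see the effect, here is a minimal sketch that registers two of those prefixes, parses a small SOAP-style document and serializes it again. The sample document is made up for illustration. Two hedged caveats: register_namespace changes a global mapping, so it has to run before tostring()/write(); and registering '' makes ElementTree declare a default namespace on the root, and as far as I can tell it does not check for elements that have no namespace at all, so such elements could end up looking as if they belonged to that default namespace.

```python
import xml.etree.ElementTree as ET

# Same idea as the answer: map namespace URIs to the prefixes wanted on output.
ET.register_namespace("", "http://tempuri.org/")
ET.register_namespace("soap", "http://schemas.xmlsoap.org/soap/envelope/")

# Hypothetical sample: envelope/body use the "soap" prefix,
# the payload uses tempuri.org as its default namespace.
source = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body>"
    '<Foo xmlns="http://tempuri.org/"><bar>1</bar></Foo>'
    "</soap:Body>"
    "</soap:Envelope>"
)

root = ET.fromstring(source)
output = ET.tostring(root, encoding="unicode")
print(output)
```

Without the two register_namespace calls, the same document is written with generated ns0/ns1 prefixes instead of soap and the default namespace.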
import xml.etree.ElementTree as ET
def addCommentInXml():
    fileXml = 'C:\\Users\\Documents\\config.xml'
    tree = ET.parse(fileXml)
    root = tree.getroot()
    comment = ET.Comment('TEST')
    root.insert(1, comment)  # 1 is the index where the comment is inserted
    tree.write(fileXml, encoding='UTF-8', xml_declaration=True)
    print("Done")
It updates the XML as shown below. How can I add the comment right after the XML declaration line?
<?xml version='1.0' encoding='UTF-8'?>
<ScopeConfig Checksum="5846AFCF4E5D02786">
<ExecutableName>STU</ExecutableName>
<!--TEST--><ZoomT2Encoder>-2230</ZoomT2Encoder>
The ElementTree XML API does not allow this. The documentation for the Comment factory function explicitly states:
An ElementTree will only contain comment nodes if they have been
inserted into to the tree using one of the Element methods.
but you would like to insert a comment outside the tree. The documentation for the TreeBuilder class is even more explicit:
When insert_comments and/or insert_pis is true, comments/pis will be
inserted into the tree if they appear within the root element (but not
outside of it)
So I would suggest writing out the XML file without the comment, using this API, and then reading the file as plain text (not parsed XML) to add your comment after the first line.
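A minimal sketch of that approach, assuming the file fits in memory and is written as UTF-8. The helper name write_with_leading_comment and the small sample document are made up for this example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_with_leading_comment(tree, path, comment_text):
    """Write `tree` to `path`, then put <!--comment_text--> right after the XML declaration."""
    # Step 1: let ElementTree write the file; the comment is not part of the tree.
    tree.write(path, encoding="UTF-8", xml_declaration=True)

    # Step 2: reopen the file as plain text. ElementTree ends the declaration
    # with a newline, so the first line is the declaration itself.
    with open(path, encoding="utf-8") as f:
        declaration, rest = f.read().split("\n", 1)

    # Step 3: write it back with the comment after the declaration.
    # comment_text must not contain "--", which XML comments do not allow.
    with open(path, "w", encoding="utf-8") as f:
        f.write(declaration + "\n<!--" + comment_text + "-->\n" + rest)


# Usage with a small, made-up version of the config file from the question.
tree = ET.ElementTree(ET.fromstring(
    "<ScopeConfig><ExecutableName>STU</ExecutableName></ScopeConfig>"
))
path = os.path.join(tempfile.mkdtemp(), "config.xml")
write_with_leading_comment(tree, path, "TEST")

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines[:2])
```

Keep in mind that parsing this file again with the default ElementTree parser discards the comment, so it has to be re-added after every write.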
I want to read an XML string, edit it, and save it as an XML file.
However, I get an error when I call .write().
I found out that when you read an XML string using ElementTree.fromstring(string), it creates an ElementTree.Element and not an ElementTree itself. An Element has no write method, but an ElementTree does.
How can I write an Element to an XML file? Or how can I create an ElementTree, add my Element to it, and then use the .write method?
You are right that ElementTree.fromstring(string) does not create an ElementTree: you get the top-level element back (also called the "document element").
And yes, an Element has no write method, but an ElementTree does.
The ElementTree constructor signature goes like this:
class xml.etree.ElementTree.ElementTree(element=None, file=None)
Therefore it's completely straightforward:
import xml.etree.ElementTree as ET
doc = ET.fromstring("<test>test öäü</test>")
tree = ET.ElementTree(doc)
tree.write("test.xml", encoding="utf-8")
You should always specify the encoding when writing an XML file. Most of the time, UTF-8 is the best choice.
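To see why the encoding matters, here is a small comparison that writes the same tree into in-memory buffers (io.BytesIO stands in for a real file). In Python 3, ElementTree.write() falls back to US-ASCII when no encoding is given, so non-ASCII text still produces valid XML but is turned into numeric character references.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<test>test öäü</test>"))

# No encoding given: ElementTree uses US-ASCII and escapes
# every non-ASCII character as a character reference.
default_buf = io.BytesIO()
tree.write(default_buf)

# Explicit UTF-8 plus a declaration saying so: the characters are kept as-is.
utf8_buf = io.BytesIO()
tree.write(utf8_buf, encoding="utf-8", xml_declaration=True)

print(default_buf.getvalue())
print(utf8_buf.getvalue().decode("utf-8"))
```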
In case this helps anyone who gets this unclear error message when trying to use ElementTree to write an XML file, and spends way too long on it (like I did):
File "/usr/lib/python3.5/xml/etree/ElementTree.py", line 788, in _get_writer
write = file_or_filename.write
AttributeError: 'str' object has no attribute 'write'
... in my case, it was simply because the path to the directory I was trying to write my xml file to did not exist! For example:
tree.write("/FolderDidNotExist/test.xml", encoding="utf-8")
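A simple mkdir /FolderDidNotExist did the trick. No more error. The AttributeError is misleading because ElementTree first tries to use the argument as a file object and only falls back to opening it as a path when that fails; the real FileNotFoundError from that open() call should appear further down the same traceback, after "During handling of the above exception, another exception occurred". (Of course, this error message could use some "love", so I'm posting this here in case I forget what it means again [which I've done] and need to google it again.)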
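If you would rather not depend on the directory already existing, a small wrapper can create it before writing. A sketch, assuming a local filesystem path; write_xml is a hypothetical helper name, and os.makedirs(..., exist_ok=True) needs Python 3.2 or later.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_xml(tree, path, encoding="utf-8"):
    """Write `tree` to `path`, creating missing parent directories first."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(parent, exist_ok=True)
    tree.write(path, encoding=encoding)


# Usage: the nested "FolderDidNotExist" directory is created on the fly.
base = tempfile.mkdtemp()
target = os.path.join(base, "FolderDidNotExist", "test.xml")
write_xml(ET.ElementTree(ET.fromstring("<test>test</test>")), target)
print(os.path.exists(target))
```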
I have been using xml.etree.ElementTree to parse a Word XML document. After making my changes, I use tree.write('test.xml') to write the tree to a file. Once the XML is saved, Word is unable to read the file. Looking at the XML, it appears that all of the namespace prefixes in the new XML have been renamed.
For example, w:t became ns2:t.
import xml.etree.ElementTree as ET
import re
tree = ET.parse('FL0809spec2.xml')
root = tree.getroot()
l = [' ',' ']
prev = None
count = 0
# Walk every w:t (text) element in the WordprocessingML namespace.
for t in root.iter('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t'):
    l[0] = l[1]
    l[1] = t.text
    if(l[0] != '' and l[1] != '' and re.search(r'[a-zA-Z]', l[0][len(l[0]) - 1]) and re.search(r'[a-z]', l[1][0])):
        words = re.findall(r'(\b\w+\b)(\W+)',l[1])
        if(len(words) > 0):
            prev.text = prev.text + words[0][0]
            t.text = t.text[len(words[0][0]):]
            count += 1
    prev = t
tree.write('FL0809spec2Improved.xml')
It appears that:
a) Python's built-in xml.etree.ElementTree is not idempotent (transparent): if you read an XML file and then immediately write it out, the output is different from the input. The namespace prefixes are changed, for example. Also, the initial <?xml ...?> and <?mso ...?> processing instructions are removed. There may be other differences. The removal of those two initial lines doesn't seem to matter, so it's something about the rest of the XML that Word doesn't like.
and b) MS Word expects the namespaces to be written with exactly the same prefixes as the xml files it generates - IMO this is very poor (if not appalling) style because in pure XML terms it is the namespace URI that defines the namespace, not the prefix used to reference it, but hey ho that's the way it seems to work.
As long as you don't mind installing lxml, solving your problem is easy. Happily, lxml.etree.ElementTree appears to be a lot more careful than xml.etree.ElementTree about not changing anything when writing what it has read: at least it keeps the prefixes that were read in, and those first two lines are written too.
So to use lxml:
Install lxml with pip:
pip install lxml
Change the first line of your code from:
import xml.etree.ElementTree as ET
to:
from lxml import etree as ET
Then (in my testing of your code with the changey bits between reading and writing the xml removed) the output document can be opened without error in MS Word :-)
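If installing lxml is not an option, the prefix part of point a) can at least be addressed with the standard library by registering the prefix Word uses before writing. This is a different technique from the lxml route and only a partial one: it keeps w:t as w:t, but ElementTree still drops the processing instructions and any namespace declaration that no element or attribute name uses, so Word may still reject the result. The tiny document below is made up.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Map the WordprocessingML namespace URI back to Word's own "w" prefix.
# The mapping is global and must be registered before the tree is written.
ET.register_namespace("w", W_NS)

# A made-up fragment standing in for the real document.
source = (
    '<w:document xmlns:w="' + W_NS + '">'
    "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>"
    "</w:document>"
)
root = ET.fromstring(source)

# With the prefix registered, elements come out as w:t rather than ns0:t.
output = ET.tostring(root, encoding="unicode")
print(output)
```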
I'm working on a project to store various bits of text in XML files, but because people besides me are going to look at it and use it, it has to be properly indented and such. I looked at another question on how to generate XML files using cElementTree, and the answerer says something about making the output pretty if people ask, but there isn't anything about it there (I guess because no one asked). So basically, is there a way to properly indent and add whitespace using cElementTree, or should I just throw up my hands and go learn how to use lxml?
You can use minidom to prettify your XML:
from xml.etree import ElementTree as ET
from xml.dom import minidom
# Return a pretty-printed XML string for the Element.
def prettify(elem):
    INDENT = " "
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent=INDENT)
# Build a small example tree.
root = ET.Element("root")
child = ET.SubElement(root, 'child')
child.text = 'This is text of child'
prettified_xmlStr = prettify(root)
with open("Output.xml", "w", encoding="utf-8") as output_file:
    output_file.write(prettified_xmlStr)
print("Done!")
Answering myself here:
Not with ElementTree. The best option would be to install lxml and then simply enable the option
pretty_print=True
when generating new XML files (for example in lxml's etree.tostring() or ElementTree.write()).
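On Python 3.9 and later, the standard library also covers this: xml.etree.ElementTree.indent() inserts indentation whitespace into a tree in place. A minimal sketch, reusing the root/child example from the minidom answer; the four-space indent is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
child = ET.SubElement(root, "child")
child.text = "This is text of child"

# Python 3.9+: rewrite the whitespace (text/tail) of the tree in place,
# using four spaces per nesting level.
ET.indent(root, space="    ")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
# To save it: ET.ElementTree(root).write("Output.xml", encoding="utf-8")
```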
I am reading in a bunch of XML files. If the file only contains an empty root element like:
<?xml version="1.0" encoding="UTF-8"?>
<root />
I want to skip over it. Currently I do:
import xml.etree.ElementTree as ET  # xml.etree.cElementTree was removed in Python 3.9
xml = ET.parse(filename)
if not [el for el in xml.getroot()]:
    pass  # skip
Is there a better way to handle this case?
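Instead of the list comprehension, use the Element API directly. An Element supports len(), which returns its number of direct children (the older getchildren() method was removed in Python 3.9):
if len(xml.getroot()) == 0:
    pass  # skip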
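Put together, the skip logic could look like the sketch below. has_child_elements is a hypothetical helper name, and the two sample files are made up. It only checks for child elements: a root with just text or attributes (such as <root>text</root>) would also be treated as empty.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def has_child_elements(path):
    """Return True if the root element of the XML file at `path` has any child elements."""
    root = ET.parse(path).getroot()
    # len() of an Element is its number of direct children.
    return len(root) > 0


# Usage: two made-up files, one with an empty root and one with content.
folder = tempfile.mkdtemp()
samples = {
    "empty.xml": '<?xml version="1.0" encoding="UTF-8"?>\n<root />',
    "full.xml": '<?xml version="1.0" encoding="UTF-8"?>\n<root><item>1</item></root>',
}
paths = []
for name, text in samples.items():
    path = os.path.join(folder, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    paths.append(path)

kept = [p for p in paths if has_child_elements(p)]
print([os.path.basename(p) for p in kept])
```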