I want to read an XML string, edit it, and save it as an XML file.
However, I get the error mentioned in the title when I call .write().
I found out that when you read an XML string using ElementTree.fromstring(string) it will create an ElementTree.Element and not an ElementTree itself. An Element has no write method but the ElementTree does.
How can I write an Element to an XML file? Or how can I create an ElementTree, add my Element to it, and then use the .write method?
I found out that when you read an XML string using ElementTree.fromstring(string) it will actually create an ElementTree.Element and not an ElementTree itself.
Yes, you get the top-level element back (also called the "document element").
An Element has no write method but the ElementTree does.
The ElementTree constructor signature goes like this:
class xml.etree.ElementTree.ElementTree(element=None, file=None)
Therefore it's completely straightforward:
import xml.etree.ElementTree as ET
doc = ET.fromstring("<test>test öäü</test>")
tree = ET.ElementTree(doc)
tree.write("test.xml", encoding="utf-8")
You should always specify the encoding when writing an XML file. Most of the time, UTF-8 is the best choice.
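The question also asks whether an Element can be written without building an ElementTree first. A minimal sketch of that route, assuming it is acceptable to serialize with ET.tostring() and write the bytes yourself (the temporary output path is only an illustration):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

doc = ET.fromstring("<test>test öäü</test>")  # an Element, not an ElementTree

# Serialize the Element to bytes in the chosen encoding.
xml_bytes = ET.tostring(doc, encoding="utf-8")

# Write in binary mode so that no second encoding step is applied.
out_path = os.path.join(tempfile.mkdtemp(), "test.xml")  # hypothetical location
with open(out_path, "wb") as f:
    f.write(xml_bytes)

# Reading the file back yields the same text.
reloaded = ET.parse(out_path).getroot()
print(reloaded.text)
```

Wrapping the Element in ET.ElementTree(doc) and calling write() is usually simpler; this variant is mainly useful when you need the serialized bytes for something else as well.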
In case this helps anyone who gets this unclear error message when trying to use ElementTree to write an xml file, and spends way too long on it (like I did):
  File "/usr/lib/python3.5/xml/etree/ElementTree.py", line 788, in _get_writer
    write = file_or_filename.write
AttributeError: 'str' object has no attribute 'write'
... in my case, it was simply because the path to the directory I was trying to write my xml file to did not exist! For example:
tree.write("/FolderDidNotExist/test.xml", encoding="utf-8")
A simple mkdir /FolderDidNotExist did the trick. No more error. (Of course, this error message could use some "love", so I'm posting this here in case I forget what it means again [which I've done] and need to google it again.)
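The same fix can be done from Python before writing. The sketch below assumes the target directory may be missing and creates it with os.makedirs(); the path is a hypothetical one under a temporary directory. As far as I can tell, the AttributeError shown above is only the first half of a chained traceback, and the exception that actually ends the program in this situation is normally a FileNotFoundError raised when the file is opened.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<test>data</test>"))

# Hypothetical target path whose parent directory does not exist yet.
target = os.path.join(tempfile.mkdtemp(), "FolderDidNotExist", "test.xml")

# Create the parent directory first; exist_ok avoids an error if it is already there.
os.makedirs(os.path.dirname(target), exist_ok=True)

tree.write(target, encoding="utf-8")
print(os.path.exists(target))
```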
I have been using xml.etree.ElementTree to parse a Word XML document. After making my changes I use tree.write('test.xml') to write the tree to a file. Once the XML is saved, Word was unable to read the file. Looking at the XML, it appears that the new XML has all of the namespaces renamed.
For example, w:t became ns2:t
import xml.etree.ElementTree as ET
import re
tree = ET.parse('FL0809spec2.xml')
root = tree.getroot()
l = [' ',' ']
prev = None
count = 0
for t in root.iter('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t'):
    l[0] = l[1]
    l[1] = t.text
    if(l[0] != '' and l[1] != '' and re.search(r'[a-zA-Z]', l[0][len(l[0]) - 1]) and re.search(r'[a-z]', l[1][0])):
        words = re.findall(r'(\b\w+\b)(\W+)',l[1])
        if(len(words) > 0):
            prev.text = prev.text + words[0][0]
            t.text = t.text[len(words[0][0]):]
            count += 1
    prev = t
tree.write('FL0809spec2Improved.xml')
It appears that:
a) Python built-in xml.etree.ElementTree is not idempotent (transparent) - if you read an XML file and then immediately write out the xml, the output is different from the input. The namespace prefixes are changed, for example. Also the initial ?xml and ?mso tags are removed. There may be other differences. The removal of the two initial tags doesn't seem to matter, so it's something about the rest of the XML that Word doesn't like.
and b) MS Word expects the namespaces to be written with exactly the same prefixes as the xml files it generates - IMO this is very poor (if not appalling) style because in pure XML terms it is the namespace URI that defines the namespace, not the prefix used to reference it, but hey ho that's the way it seems to work.
As long as you don't mind installing lxml, solving your problem is very easy. Happily, lxml.etree.ElementTree appears to be a lot more determined than xml.etree.ElementTree about not changing anything when writing what it has read: at least it maintains the prefixes that were read in, and those first two tags are written too.
So to use lxml:
Install lxml with pip:
pip install lxml
Change the first line of your code from:
import xml.etree.ElementTree as ET
to:
from lxml import etree as ET
Then (in my testing of your code with the changey bits between reading and writing the xml removed) the output document can be opened without error in MS Word :-)
I'm using ElementTree in Python to parse an xml file and add or remove elements in it.
In my XML file the root and the elements just below the root have a namespace, but all the other elements do not.
I see that ElementTree, when printing the modified tree, adds namespaces to every element.
Is there a proper way of telling ElementTree to just keep namespaces in the elements where they originally appeared?
Try with this:
import xml.etree.ElementTree as ET
namespaces = {
'': 'http://tempuri.org/',
'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
'xsd': 'http://www.w3.org/2001/XMLSchema',
}
for prefix, uri in namespaces.items():
    ET.register_namespace(prefix, uri)
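To see what register_namespace() changes, here is a small sketch that serializes the same tree before and after registering a prefix. The namespace URI and the "ex" prefix are made up for illustration. Note that the registration is global for the running process, so it affects every later serialization.

```python
import xml.etree.ElementTree as ET

EXAMPLE_NS = "http://example.com/ns"  # hypothetical namespace URI
source = '<ex:root xmlns:ex="http://example.com/ns"><ex:item>1</ex:item></ex:root>'
root = ET.fromstring(source)

# Without a registered prefix, ElementTree generates one (ns0, ns1, ...).
before = ET.tostring(root, encoding="unicode")

# After registration, the chosen prefix is used when serializing.
ET.register_namespace("ex", EXAMPLE_NS)
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```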
Background
I am using SQLite to access a database and retrieve the desired information. I'm using ElementTree in Python version 2.6 to create an XML file with that information.
Code
import sqlite3
import xml.etree.ElementTree as ET
# NOTE: Omitted code where I access the database,
# pull data, and add elements to the tree
tree = ET.ElementTree(root)
# Pretty printing to Python shell for testing purposes
from xml.dom import minidom
print minidom.parseString(ET.tostring(root)).toprettyxml(indent = " ")
####### Here lies my problem #######
tree.write("New_Database.xml")
Attempts
I've tried using tree.write("New_Database.xml", "utf-8") in place of the last line of code above, but it did not edit the XML's layout at all - it's still a jumbled mess.
I also decided to fiddle around and tried doing:
tree = minidom.parseString(ET.tostring(root)).toprettyxml(indent = " ") instead of printing this to the Python shell, which gives the error AttributeError: 'unicode' object has no attribute 'write'.
Questions
When I write my tree to an XML file on the last line, is there a way to pretty print to the XML file as it does to the Python shell?
Can I use toprettyxml() here or is there a different way to do this?
Whatever your XML string is, you can write it to the file of your choice by opening a file for writing and writing the string to the file.
from xml.dom import minidom
xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ")
with open("New_Database.xml", "w") as f:
    f.write(xmlstr)
There is one possible complication, especially in Python 2, which is both less strict and less sophisticated about Unicode characters in strings. If your toprettyxml method hands back a Unicode string (u"something"), then you may want to cast it to a suitable file encoding, such as UTF-8. E.g. replace the one write line with:
f.write(xmlstr.encode('utf-8'))
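In Python 3 the encoding step can be left to minidom instead. The following is a sketch under the assumption that you want UTF-8 output: passing encoding to toprettyxml() makes it return bytes, which are then written in binary mode. The tree contents and the temporary path are placeholders.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Placeholder tree standing in for the data pulled from the database.
root = ET.Element("root")
ET.SubElement(root, "child").text = "öäü"

# With an encoding argument, toprettyxml() returns bytes rather than str.
pretty_bytes = minidom.parseString(ET.tostring(root)).toprettyxml(
    indent="  ", encoding="utf-8")

out_path = os.path.join(tempfile.mkdtemp(), "New_Database.xml")  # hypothetical location
with open(out_path, "wb") as f:
    f.write(pretty_bytes)
```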
I simply solved it with the indent() function:
xml.etree.ElementTree.indent(tree, space="  ", level=0)
Appends whitespace to the subtree to indent the tree visually. This can be used to generate pretty-printed XML output. tree can be an Element or ElementTree. space is the whitespace string that will be inserted for each indentation level, two space characters by default. For indenting partial subtrees inside of an already indented tree, pass the initial indentation level as level.
tree = ET.ElementTree(root)
ET.indent(tree, space="\t", level=0)
tree.write(file_name, encoding="utf-8")
Note, the indent() function was added in Python 3.9.
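A self-contained sketch of the same call on a small in-memory tree (the element names are placeholders). It needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><data version='1'><value>76939</value></data></root>")

# indent() modifies the tree in place by filling in text and tail whitespace.
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Because indent() changes the tree itself, anything that later reads .text or .tail on container elements will see the added whitespace.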
I found a way using straight ElementTree, but it is rather complex.
ElementTree has functions that edit the text and tail of elements, for example, element.text="text" and element.tail="tail". You have to use these in a specific way to get things to line up, so make sure you know your escape characters.
As a basic example:
I have the following file:
<?xml version='1.0' encoding='utf-8'?>
<root>
    <data version="1">
        <data>76939</data>
    </data>
    <data version="2">
        <data>266720</data>
        <newdata>3569</newdata>
    </data>
</root>
To place a third element in and keep it pretty, you need the following code:
addElement = ET.Element("data") # Make a new element
addElement.set("version", "3") # Set the element's attribute
addElement.tail = "\n" # Edit the element's tail
addElement.text = "\n\t\t" # Edit the element's text
newData = ET.SubElement(addElement, "data") # Make a subelement and attach it to our element
newData.tail = "\n\t" # Edit the subelement's tail
newData.text = "5431" # Edit the subelement's text
root[-1].tail = "\n\t" # Edit the previous element's tail, so that our new element is properly placed
root.append(addElement) # Add the element to the tree.
To indent the internal tags (like the internal data tag), you have to add it to the text of the parent element. If you want to indent anything after an element (usually after subelements), you put it in the tail.
This code gives the following result when you write it to a file:
<?xml version='1.0' encoding='utf-8'?>
<root>
    <data version="1">
        <data>76939</data>
    </data>
    <data version="2">
        <data>266720</data>
        <newdata>3569</newdata>
    </data> <!--root[-1].tail-->
    <data version="3"> <!--addElement's text-->
        <data>5431</data> <!--newData's tail-->
    </data> <!--addElement's tail-->
</root>
As another note, if you wish to make the program uniformly use \t, you may want to parse the file as a string first, and replace all of the spaces used for indentation with \t.
This code was made in Python3.7, but still works in Python2.7.
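The text/tail rule described in this answer can be seen in isolation with a tree that has a single child. This is only a minimal sketch with placeholder element names: the parent's text holds the whitespace before the first child, and the child's tail holds the whitespace after it.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
root.text = "\n\t"    # whitespace between <root> and its first child

child = ET.SubElement(root, "data")
child.text = "5431"   # the child's own content
child.tail = "\n"     # whitespace between </data> and </root>

result = ET.tostring(root, encoding="unicode")
print(result)
```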
Riffing on Ben Anderson's answer as a function.
def _pretty_print(current, parent=None, index=-1, depth=0):
    for i, node in enumerate(current):
        _pretty_print(node, current, i, depth + 1)
    if parent is not None:
        if index == 0:
            parent.text = '\n' + ('\t' * depth)
        else:
            parent[index - 1].tail = '\n' + ('\t' * depth)
        if index == len(parent) - 1:
            current.tail = '\n' + ('\t' * (depth - 1))
So running the test on unpretty data:
import xml.etree.ElementTree as ET
root = ET.fromstring('''<?xml version='1.0' encoding='utf-8'?>
<root>
<data version="1"><data>76939</data>
</data><data version="2">
<data>266720</data><newdata>3569</newdata>
</data> <!--root[-1].tail-->
<data version="3"> <!--addElement's text-->
<data>5431</data> <!--newData's tail-->
</data> <!--addElement's tail-->
</root>
''')
_pretty_print(root)
tree = ET.ElementTree(root)
tree.write("pretty.xml")
with open("pretty.xml", 'r') as f:
    print(f.read())
We get:
<root>
    <data version="1">
        <data>76939</data>
    </data>
    <data version="2">
        <data>266720</data>
        <newdata>3569</newdata>
    </data>
    <data version="3">
        <data>5431</data>
    </data>
</root>
Install bs4
pip install bs4
Use this code to pretty print:
from bs4 import BeautifulSoup
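x = "<root><child>text</child></root>"  # placeholder: substitute your own XML string
print(BeautifulSoup(x, "xml").prettify())  # the "xml" parser requires lxml to be installed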
If one wants to use lxml, it could be done in the following way:
from lxml import etree
xml_object = etree.tostring(root,
                            pretty_print=True,
                            xml_declaration=True,
                            encoding='UTF-8')
with open("xmlfile.xml", "wb") as writer:
    writer.write(xml_object)
If you see XML namespace attributes such as py:pytype="TREE" in the output, you may want to add the following before creating xml_object:
etree.cleanup_namespaces(root)
This should be sufficient for any adaptation in your code.
One liner(*) to read, parse (once) and pretty print XML from file named fname:
from xml.dom import minidom
print(minidom.parseString(open(fname).read()).toprettyxml(indent=" "))
(* not counting import)
Using pure ElementTree and Python 3.9+:
import xml.etree.ElementTree as ET

def prettyPrint(element):
    encoding = 'UTF-8'
    # Create a copy of the input element: convert to string, then parse again
    copy = ET.fromstring(ET.tostring(element))
    # Format the copy in place. This needs Python 3.9+
    ET.indent(copy, space="  ", level=0)
    # tostring() returns bytes here, so we need to decode them to get a string
    return ET.tostring(copy, encoding=encoding).decode(encoding)
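If you need a file, replace the last line with a call to ET.ElementTree(copy).write(...) to avoid the extra overhead. Note that copy is an Element, which has no write() method of its own, so it has to be wrapped in an ElementTree first.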
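The file-writing variant could look like the sketch below. The function name pretty_write and the temporary output path are hypothetical. Since an Element has no write() method, the indented copy is wrapped in an ElementTree before writing. Python 3.9+ is assumed because of ET.indent().

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def pretty_write(element, path):
    """Write an indented copy of element to path, leaving element untouched."""
    # Copy via a serialize/parse round trip so the caller's tree is not modified.
    copy = ET.fromstring(ET.tostring(element))
    ET.indent(copy, space="  ")
    # Element has no write(); wrap the copy in an ElementTree first.
    ET.ElementTree(copy).write(path, encoding="utf-8", xml_declaration=True)

element = ET.fromstring("<root><child>text</child></root>")
out_path = os.path.join(tempfile.mkdtemp(), "pretty.xml")  # hypothetical location
pretty_write(element, out_path)

with open(out_path, encoding="utf-8") as f:
    content = f.read()
print(content)
```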
I'm working on a project to store various bits of text in XML files, but because people besides me are going to look at it and use it, it has to be properly indented and such. I looked at a question on how to generate XML files using cElementTree here, and the guy says something about putting in info about making things pretty if people ask, but there isn't anything there (I guess because no one asked). So basically, is there a way to properly indent and whitespace using cElementTree, or should I just throw up my hands and go learn how to use lxml?
You can use minidom to prettify your XML string:
from xml.etree import ElementTree as ET
from xml.dom import minidom
# Return a pretty-printed XML string for the Element.
def prettify(xmlStr):
    INDENT = " "
    rough_string = ET.tostring(xmlStr, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent=INDENT)
# name of root tag
root = ET.Element("root")
child = ET.SubElement(root, 'child')
child.text = 'This is text of child'
prettified_xmlStr = prettify(root)
# The with block closes the file even if the write fails
with open("Output.xml", "w") as output_file:
    output_file.write(prettified_xmlStr)
print("Done!")
Answering myself here:
Not with ElementTree. The best option would be to download and install the module for lxml, then simply enable the option
pretty_print=True
when generating new XML files.