Modifying and rewriting XML file with Python ElementTree - python

I have an XML file that starts like this:
<?xml version="1.0" encoding="utf-8"?>
<Recipe xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
I need to read it in, modify it, then write it back out. Here is a code snippet:
from xml.etree import ElementTree
with open('base.xml', 'rt') as f:
    tree = ElementTree.parse(f)
recipe = tree.find('')
t = recipe.find('Targets_Params/Target_Table/Target_Name')
t.text = "new Value"
output_file = open('new.xml', 'w' )
output_file.write(ElementTree.tostring(recipe))
output_file.close()
My problem is that when I write the file out I do not get the first line at all, and the second line comes out with just:
<Recipe>
How can I read in the file, modify it, and write it out while preserving the original structure?
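One possible approach, as a minimal sketch rather than a definitive answer: ask `write()` for the XML declaration explicitly, and put the namespace declarations back on the root element by hand. The helper name `rewrite_keeping_header` and its parameters are hypothetical. The sketch assumes the `xsi` and `xsd` prefixes are declared but never used inside the document, as in the sample above; if a prefix is actually used, ElementTree emits its own declaration and the manual one would duplicate it.

```python
import xml.etree.ElementTree as ET


def rewrite_keeping_header(src, dst, path, new_text):
    """Change one element's text and write the document back out.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Collect (prefix, uri) pairs as the parser meets each xmlns declaration.
    namespaces = []
    parser_events = ET.iterparse(src, events=("start-ns",))
    for _event, prefix_and_uri in parser_events:
        namespaces.append(prefix_and_uri)
    root = parser_events.root  # available once the source is fully read

    # Assumption: these prefixes are declared but never used in the document,
    # so ElementTree will not emit its own declarations for them.
    for prefix, uri in namespaces:
        if prefix:
            root.set("xmlns:" + prefix, uri)

    target = root.find(path)
    if target is None:
        raise ValueError("no element matches " + path)
    target.text = new_text

    # xml_declaration=True restores the <?xml ...?> first line.
    ET.ElementTree(root).write(dst, encoding="utf-8", xml_declaration=True)
```

With the file from the question this would be called as `rewrite_keeping_header('base.xml', 'new.xml', 'Targets_Params/Target_Table/Target_Name', 'new Value')`.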

Related

XML Python: XML code is duplicated after saving to file

I have a code that in principle is to open the file content and wrap it with an additional import tag:
with open('oferta-empik.xml', 'r+', encoding='utf-8') as f:
    xml = '<import>' + f.read() + '</import>'
    print(xml)
    f.write(xml)
    f.close()
Unfortunately, after saving, the original content is still in the file unchanged, and the XML wrapped in the import tag is inserted after it.
In other words, the file ends up containing the XML twice: first the unchanged original, then the same content appended to the end of the file, wrapped in the import tag.
ORIGINAL CODE:
<offers>
<offer>
<leadtime-to-ship>1</leadtime-to-ship>
<product-id-type>EAN</product-id-type>
<state>11</state>
<quantity>0</quantity>
<price>146</price>
<sku>B01.001.1.10</sku>
</offer>
</offers>
AFTER CODE:
<offers>
<offer>
<leadtime-to-ship>1</leadtime-to-ship>
<product-id-type>EAN</product-id-type>
<state>11</state>
<quantity>0</quantity>
<price>146</price>
<sku>B01.001.1.10</sku>
</offer>
</offers>
<import><offers>
<offer>
<leadtime-to-ship>1</leadtime-to-ship>
<product-id-type>EAN</product-id-type>
<state>11</state>
<quantity>0</quantity>
<price>146</price>
<sku>B01.001.1.10</sku>
</offer>
</offers></import>
The issue is that you're appending the new text (the new XML) to the end of the file. You read the entire file, which leaves the file position at the end, and then write the modified XML at that position.
There are two solutions:
Recommended: open the file for reading, read the XML, and close it. Then open it again for writing and write the entire thing, overwriting the initial content.
Not recommended: after you read, seek to the beginning of the file (with f.seek(0)) and write the new content. This is not recommended because if, at some point, the new content is shorter than the original content, the tail of the old content is left behind and the result is inconsistent.
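The recommended option could look roughly like this. It is only a sketch: the function name `wrap_file_in_import` is made up for illustration, and it still treats the file as plain text, exactly as the question's code does.

```python
def wrap_file_in_import(path):
    """Read the whole file, then reopen it in 'w' mode and overwrite it.

    Hypothetical helper name; treats the XML as plain text like the question.
    """
    # First pass: read everything and let the 'with' block close the file.
    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()

    wrapped = '<import>' + content + '</import>'

    # Second pass: 'w' truncates the file, so nothing of the old content stays.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(wrapped)
    return wrapped
```

Because `'w'` truncates the file on open, the old content cannot leak through, whatever the length of the new content.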
I have a code that in principle is to open the file content and wrap it with an additional import tag
Your current approach is wrong. Don't open XML files as text files, don't treat XML as text. Always use a parser.
This is a lot better:
import xml.etree.ElementTree as ET
# 1: load current document and top level element
old_tree = ET.parse('oferta-empik.xml')
old_root = old_tree.getroot()
# 2: create <import> element to serve as new top level
new_root = ET.Element('import')
# 3: insert current document root ("wrap it in <import>")
new_root.insert(0, old_root)
# 4: make new ElementTree and write it to file
new_tree = ET.ElementTree(new_root)
with open('output.xml', 'wb') as f:
    new_tree.write(f, encoding='utf8')
Compressed:
new_root = ET.Element('import')
new_root.insert(0, ET.parse('oferta-empik.xml').getroot())
with open('output.xml', 'wb') as f:
    ET.ElementTree(new_root).write(f, encoding='utf8')

lxml.etree: Start tag expected, '<' not found, line 1, column 1

I want to take some simple xml files and convert them all to CSV in one go (though this code is just for one at a time). It looks to me like there are no official name spaces, but I'm not sure.
I have this code (I used one header, SubmittingSystemVendor, but I really want to write all of them to CSV):
import csv
import lxml.etree
x = r'C:\Users\...\jh944.xml'
with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow('SubmittingSystemVendor')
    root = lxml.etree.fromstring(x)
    writer.writerow(row)
Here is a sample of the XML file:
<?xml version="1.0" encoding="utf-8"?>
<EOYGeneralCollectionGroup SchemaVersionMajor="2014-2015" SchemaVersionMinor="1" CollectionId="157" SubmittingSystemName="MISTAR" SubmittingSystemVendor="WayneRESA" SubmittingSystemVersion="2014" xsi:noNamespaceSchemaLocation="http://cepi.state.mi.us/msdsxml/EOYGeneralCollection2014-20151.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<EOYGeneralCollection>
<SubmittingEntity>
<SubmittingEntityTypeCode>D</SubmittingEntityTypeCode>
<SubmittingEntityCode>82730</SubmittingEntityCode>
</SubmittingEntity>
The error is:
lxml.etree: Start tag expected, '<' not found, line 1, column 1
You are using lxml.etree.fromstring, but giving it a file path as the argument. This means it's trying to interpret the string r'C:\Users\...\jh944.xml' itself as the XML data to be parsed, and a path does not start with '<'.
Instead, you want to open the file containing this XML. You can simply replace the call to fromstring with lxml.etree.parse, which will accept a filename or open file object as the argument.
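As a rough illustration of parsing from a path and writing the root element's attributes to CSV, here is a sketch that swaps in the standard library's `xml.etree.ElementTree`, whose `parse` also accepts a filename, so that it runs without lxml installed. The function name `root_attributes_to_csv` is hypothetical, and putting all root attributes into one header row and one data row is an assumption about what "all of them" means.

```python
import csv
import xml.etree.ElementTree as ET


def root_attributes_to_csv(xml_path, csv_path):
    """Write the root element's attribute names and values as two CSV rows.

    Hypothetical helper; uses the stdlib parser in place of lxml.
    """
    # parse() takes a filename (or file object), unlike fromstring().
    root = ET.parse(xml_path).getroot()

    # newline='' is what the csv module expects for files it writes to.
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        # writerow() needs a list; a bare string is split into characters.
        writer.writerow(list(root.attrib.keys()))
        writer.writerow(list(root.attrib.values()))
    return dict(root.attrib)
```

Note that `writerow` takes a sequence of fields, so `writer.writerow('SubmittingSystemVendor')` from the question would write one column per character; wrapping the name in a list avoids that.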

Pretty formatting xml file in Python using lxml

I am trying to add a vhost entry to tomcat server.xml using python lxml
import io
from lxml import etree
newdoc = etree.fromstring('<Host name="getrailo.com" appBase="webapps"><Context path="" docBase="/var/sites/getrailo.org" /><Alias>www.getrailo.org</Alias><Alias>my.getrailo.org</Alias></Host>')
doc = etree.parse('/root/server.xml')
root = doc.getroot()
for node1 in root.iter('Service'):
    for node2 in node1.iter('Engine'):
        node2.append(newdoc)
doc.write('/root/server.xml')
The problem is that it is removing the
<?xml version='1.0' encoding='utf-8'?>
line at the top of the file from the output, and the vhost entry ends up all on one line. How can I add the XML element in a pretty way, like this:
<Host name="getrailo.org" appBase="webapps">
    <Context path="" docBase="/var/sites/getrailo.org" />
    <Alias>www.getrailo.org</Alias>
    <Alias>my.getrailo.org</Alias>
</Host>
First, parse the existing file with remove_blank_text=True so that the whitespace-only text between elements is dropped; I think the leftover whitespace is what prevents clean re-indentation in this case:
parser = etree.XMLParser(remove_blank_text=True)
doc = etree.parse('/root/server.xml', parser)
Then you're safe to write it back to disk with pretty_print and xml_declaration set in doc.write()
doc.write('/root/server.xml',
          xml_declaration=True,
          encoding='utf-8',
          pretty_print=True)
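If lxml is not available, something similar can be sketched with the standard library instead, assuming Python 3.9 or later for `ET.indent`. This swaps in a different technique from the lxml answer above: `ET.indent` re-indents the whole tree, and `write` is asked for the declaration explicitly. The function name `append_host` is hypothetical, and the default stdlib parser discards comments, which may matter for a real server.xml.

```python
import xml.etree.ElementTree as ET


def append_host(server_xml_path, host_xml):
    """Append a <Host> fragment to every <Engine> and rewrite the file.

    Hypothetical helper; stdlib stand-in for the lxml approach.
    Caveat: comments in the input are dropped by the default parser.
    """
    tree = ET.parse(server_xml_path)

    for engine in tree.getroot().iter('Engine'):
        # Parse a fresh copy of the fragment for each <Engine> found.
        engine.append(ET.fromstring(host_xml))

    # Re-indent the whole tree in place (Python 3.9+).
    ET.indent(tree, space="  ")

    # Ask for the <?xml ...?> declaration explicitly.
    tree.write(server_xml_path, encoding='utf-8', xml_declaration=True)
```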

Trying to extract xml element using python 2.7

I am trying to extract the name elements under the sequence in xml files. I have pasted in the top of a sample xml to illustrate. With this I want to get the text from 01 Interview_been successful through mentorship and write it to a file. There are multiple sequence tags in the xml and I am trying to figure out how to go through it and extract it. I have tried to figure out how to use xml.etree and xml.dom.minidom but I can't seem to wrap my brain around it. I was able to get all of the id values from the sequence tags but not the name elements. I'm pasting in my code before the xml.
from xml.etree import ElementTree
file = open("xmldump.txt", "r")
filedata = file.read()
file.close()
with open('test.xml', 'rt') as f:
    tree = ElementTree.parse(f)
for node in tree.iter('name'):
    sequenceid = node.attrib.get('name')
    print ' %s' % (sequenceid)
    newLine = sequenceid + "\n"
file = open("xmldump.txt", "w")
file.write(newLine)
file.close()
Here is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<bin>
<uuid>0F5D72FA-54E4-4DE8-81D7-CC33F5C43836</uuid>
<updatebehavior>add</updatebehavior>
<name>Logged</name>
<children>
<sequence id="01 Interview_been successful through mentorship">
<uuid>12FB944D-83EA-4527-9A54-2130A42E3A06</uuid>
<updatebehavior>add</updatebehavior>
<name>01 Interview_been successful through mentorship</name>
<duration>1195</duration>
<rate>
<ntsc>TRUE</ntsc>
<timebase>24</timebase>
</rate>
<timecode>
Well, I'm not sure if you want the "id" attribute or the name tag (your code is confusing: it tries to extract a "name" attribute, but the "sequence" tag only has an "id" attribute). Below is code that extracts both, which should help you get started on figuring out how ElementTree works:
from xml.etree import ElementTree
with open('test.xml', 'rt') as f:
    tree = ElementTree.parse(f)
for node in tree.iter('sequence'):
    sequenceid = node.attrib.get('id')  # the "id" attribute of <sequence>
    name = node.findtext('name')  # text of the <name> child element
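To also get the names into a file, as the question asks, the loop can collect them and write one per line. This is a sketch written for Python 3 rather than the 2.7 of the question; the function name `dump_sequence_names` is made up, and the file names in the usage note are taken from the question.

```python
from xml.etree import ElementTree


def dump_sequence_names(xml_path, out_path):
    """Write the <name> text of every <sequence> element, one per line.

    Hypothetical helper; Python 3 syntax, unlike the 2.7 code above.
    """
    tree = ElementTree.parse(xml_path)

    # findtext() looks only at direct children, so the <name> of the
    # enclosing <bin> is not picked up here.
    names = [seq.findtext('name', default='') for seq in tree.iter('sequence')]

    # Open the output once and write inside the loop, so every name is
    # kept instead of only the last one.
    with open(out_path, 'w') as out:
        for name in names:
            out.write(name + '\n')
    return names
```

For the files in the question this would be `dump_sequence_names('test.xml', 'xmldump.txt')`.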

How to add an element to xml file by using elementtree

I have an XML file, and I'm trying to add an additional element to it.
The XML has the following structure:
<root>
<OldNode/>
</root>
What I'm looking for is :
<root>
<OldNode/>
<NewNode/>
</root>
but what I'm actually getting is:
<root>
<OldNode/>
</root>
<root>
<OldNode/>
<NewNode/>
</root>
My code looks like this:
file = open("/tmp/" + executionID +".xml", 'a')
xmlRoot = xml.parse("/tmp/" + executionID +".xml").getroot()
child = xml.Element("NewNode")
xmlRoot.append(child)
xml.ElementTree(root).write(file)
file.close()
Thanks.
You opened the file for appending, which adds data to the end. Open the file for writing instead, using the w mode. Better still, just use the .write() method on the ElementTree object:
tree = xml.parse("/tmp/" + executionID +".xml")
xmlRoot = tree.getroot()
child = xml.Element("NewNode")
xmlRoot.append(child)
tree.write("/tmp/" + executionID +".xml")
Using the .write() method has the added advantage that you can set the encoding, force the XML prolog to be written if you need it, etc.
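For instance, a sketch that sets the encoding and forces the prolog could look like this. The helper name `append_child_in_place` is hypothetical; `encoding` and `xml_declaration` are keyword arguments of `ElementTree.write()`.

```python
import xml.etree.ElementTree as ET


def append_child_in_place(path, tag):
    """Append an empty child element to the root and overwrite the file.

    Hypothetical helper name, for illustration only.
    """
    tree = ET.parse(path)
    tree.getroot().append(ET.Element(tag))

    # write() replaces the file's content; nothing is appended.
    # xml_declaration=True forces the <?xml ...?> prolog to be written.
    tree.write(path, encoding='utf-8', xml_declaration=True)
```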
If you must use an open file to prettify the XML, use the 'w' mode; 'a' opens a file for appending, which leads to the behaviour you observed:
with open("/tmp/" + executionID +".xml", 'w') as output:
    output.write(prettify(xmlRoot))
where prettify is something along the lines of:
from xml.etree import ElementTree
from xml.dom import minidom
def prettify(elem):
"""Return a pretty-printed XML string for the Element.
"""
rough_string = ElementTree.tostring(elem, 'utf-8')
reparsed = minidom.parseString(rough_string)
return reparsed.toprettyxml(indent=" ")
i.e. the minidom prettifying trick.
