How to add a root to an XML file? [duplicate] - python

I have an XML file that doesn't have a single root tag, and I want to add a new root tag to it.
Below is the existing XML:
<A>
<Val>123</Val>
</A>
<B>
<Val1>456</Val1>
</B>
Now I want to add a Root tag 'X', so the final XML will look like:
<X>
<A>
<Val>123</Val>
</A>
<B>
<Val1>456</Val1>
</B>
</X>
I tried the following Python code:
from xml.etree import ElementTree as ET
root = ET.parse(Input_FilePath).getroot()
newroot = ET.Element("X")
newroot.insert(0, root)
tree = ET.ElementTree(newroot)
tree.write(Output_FilePath)
But the first statement (the ET.parse call) raises the following error:
xml.etree.ElementTree.ParseError: junk after document element: line 4, column 4

As pointed out in the comments by @kjhughes, the XML spec requires a document to have a single root element, so parsing the file as-is fails:
from xml.etree import ElementTree as ET
node = ET.parse(Input_FilePath)
xml.etree.ElementTree.ParseError: junk after document element: line 4, column 0
You'll need to read the file manually and add the tags yourself:
from xml.etree import ElementTree as ET
with open(Input_FilePath) as f:
    xml_string = '<X>' + f.read() + '</X>'
node = ET.fromstring(xml_string)
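To get the output the question asks for, the wrapped tree can then be written back out. The following is a minimal sketch, assuming the fragment file has no XML declaration (a leading <?xml ...?> would end up inside the wrapper and fail to parse). The helper name wrap_in_root and the inline sample string are illustrative only and do not come from the original answer.

```python
from xml.etree import ElementTree as ET

def wrap_in_root(fragment_text, root_tag="X"):
    """Wrap rootless XML text in a single root element and parse it."""
    return ET.fromstring("<{0}>{1}</{0}>".format(root_tag, fragment_text))

# Sample standing in for the contents of Input_FilePath.
fragments = "<A>\n<Val>123</Val>\n</A>\n<B>\n<Val1>456</Val1>\n</B>\n"
new_root = wrap_in_root(fragments)
child_tags = [child.tag for child in new_root]  # top-level elements, in order

# To save the result: ET.ElementTree(new_root).write(Output_FilePath)
```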

I think you can do it without an XML parser.
If you know that the root tag is missing, you can add it this way:
with open('test.xml', 'r') as f:
    data = f.read()
with open('test.xml', 'w') as f:
    f.write("<x>\n" + data + "\n</x>")
If you don't know, you can check for it first:
import re
if re.match(r"\s*<x>.*</x>", data, re.S) is not None:
    # the <x> root tag is already present
    pass
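A regular expression only recognises one specific tag name. As an alternative sketch (not taken from the answers above), the parser itself can make the decision: try to parse the text, and wrap it only if that raises ParseError. The function name ensure_single_root is hypothetical, and treating every ParseError as "needs a root" is an assumption, since other kinds of malformed input raise the same exception.

```python
from xml.etree import ElementTree as ET

def ensure_single_root(text, root_tag="X"):
    """Return a parsed root element, wrapping the text only if needed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # Assumption: the failure was caused by multiple top-level elements.
        return ET.fromstring("<{0}>{1}</{0}>".format(root_tag, text))

already_rooted = ensure_single_root("<X><A/></X>")  # parsed unchanged
wrapped = ensure_single_root("<A/><B/>")            # wrapped in <X>
```

If the wrapped text still cannot be parsed, the second ParseError propagates to the caller, which is usually the desired signal that the input is broken in some other way.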

Related

Can't get text of an XML element in python

I am trying to parse an XML file in python. Here is a small portion of the XML code:
<body>
<p feature="XXX">
<ph>text1 </ph>
DESIRED TEXT
<ph>text2</ph>
<ph>sometext...</ph>
</p>
</body>
I want to get "DESIRED TEXT". I did the following:
import xml.etree.ElementTree as ET
tree = ET.parse(dir)
root = tree.getroot()
for el in root.findall("./body/p"):
    print(el.attrib, el.text)
el.attrib returns the correct value (XXX in this case), but el.text returns None.
What am I missing? What should I use instead of .text?
Thanks in advance.
You can use the xmltodict library:
import xmltodict
with open('file.xml', 'r') as f:
    result = xmltodict.parse(f.read())['body']['p']['#text']
Output:
DESIRED TEXT
The following uses only the standard library (no need to install an external package):
import xml.etree.ElementTree as ET
xml = '''<body>
<p feature="XXX">
<ph>text1 </ph>
DESIRED TEXT
<ph>text2</ph>
<ph>sometext...</ph>
</p>
</body>'''
root = ET.fromstring(xml)
print(root.findall('.//ph')[0].tail.strip())
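The reason .tail works here is that ElementTree stores text that follows a child's closing tag in that child's tail attribute, while an element's own text holds only what comes before its first child. A small sketch using the XML from the question (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

xml = """<body>
<p feature="XXX">
<ph>text1 </ph>
DESIRED TEXT
<ph>text2</ph>
<ph>sometext...</ph>
</p>
</body>"""

root = ET.fromstring(xml)
p = root.find("p")

# p.text is only the whitespace before the first <ph>.
leading = p.text.strip()
# Text after each child is kept in that child's .tail.
tails = [(child.tail or "").strip() for child in p]
# All text under <p>, in document order, including the children's own text.
all_text = "".join(p.itertext())
```

Here the first entry of tails is the desired string, and itertext() is an option when all text under an element is wanted regardless of where it sits.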

How to determine what the root tag name is for a XML document

I was wondering how to determine what the root tag of an XML document is using xml.dom.minidom.
<?xml version="1.0" encoding="UTF-8"?>
<root>
<child1></child1>
<child2></child2>
<child3></child3>
</root>
In the example XML above, my root tag could be 3 or 4 different things. All I want to do is pull the tag, and then use that value to get the elements by tag name.
def import_from_XML(self, file_name):
    file = open(file_name)
    document = file.read()
    if re.compile('^<\?xml').match(document):
        xml = parseString(document)
        root = '' # <-- THIS IS WHERE IM STUCK
        elements = xml.getElementsByTagName(root)
I tried searching through the documentation for xml.dom.minidom, but it is a little hard for me to wrap my head around, and I couldn't find anything that answered this question outright.
I'm using Python 3.6.x, and I would prefer to keep with the standard library if possible.
For the line you commented as Where I am stuck, the following should assign the value of the root tag of the XML document to the variable theNameOfTheRootElement:
theNameOfTheRootElement = xml.documentElement.tagName
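As a self-contained illustration of that attribute, the sketch below parses the sample document from the question and reads the root tag name, then lists the element children of the root. The variable names are illustrative.

```python
from xml.dom.minidom import parseString

document = """<?xml version="1.0" encoding="UTF-8"?>
<root>
<child1></child1>
<child2></child2>
<child3></child3>
</root>"""

dom = parseString(document)
root_name = dom.documentElement.tagName

# Element children of the root, skipping the whitespace text nodes
# that minidom keeps between tags.
children = [node.tagName for node in dom.documentElement.childNodes
            if node.nodeType == node.ELEMENT_NODE]
```

Since documentElement already is the root node, its children can be walked directly without a further getElementsByTagName lookup on the root name.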
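This is what I did the last time I processed XML. I didn't use your approach, but I hope it helps. (Written for Python 3, which the question uses.)
import urllib.request
import urllib.error
from xml.etree import ElementTree as ET
req = urllib.request.Request(site)  # site: URL of the XML document
try:
    with urllib.request.urlopen(req) as response:
        data = response.read()
except urllib.error.URLError as e:
    print(e.reason)
    raise  # nothing to parse if the download failed
root = ET.fromstring(data)
print("root", root.tag)  # root.tag is the name of the root element
for child in root.findall('parent element'):  # placeholder tag name
    print(child.text, child.attrib)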

Find and replacing text in elementtree

I am very new to programming and Python. I am trying to find and replace text in an XML file. Here is my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!--Arbortext, Inc., 1988-2008, v.4002-->
<!DOCTYPE doc PUBLIC "-//MYCOMPANY//DTD XSEIF 1/FAD 110 05 R5//EN"
"XSEIF_R5.dtd">
<doc version="XSEIF R5"
xmlns="urn:x-mycompany:r2:reg-doc:1551-fad.110.05:en:*">
<meta-data></meta-data>
<front></front>
<body>
<chl1><title xml:id="id_881i">Installation</title>
<p>To install SDK, perform the tasks mentioned in the following
table.</p>
<p><input>ln -s /sim/<var>user_id</var>/.VirtualBox $home/.VirtualBox</input
></p>
</chl1>
</body>
</doc>
<?Pub *0000021917 0?>
I need to replace all occurrences of "VirtualBox" with "Xen". For this I tried ElementTree, but I don't know how to replace the text and write it back to the file. Here is my attempt:
import xml.etree.ElementTree as ET
tree=ET.parse('C:/My_location/1_1531-CRA 119 1364_2.xml')
doc=tree.getroot()
iterator=doc.getiterator()
for body in iterator:
    old_text=body.replace("Virtualbox", "Xen")
The text appears in many sub-tags under body. I found a way to remove a subelement and append a new element, but not a way to replace only the text.
Replace the string in both the text and tail attributes of each element:
import lxml.etree as ET
with open('1.xml', 'rb+') as f:
    tree = ET.parse(f)
    root = tree.getroot()
    # iter() replaces the deprecated getiterator()
    for elem in root.iter():
        if elem.text:
            elem.text = elem.text.replace('VirtualBox', 'Xen')
        if elem.tail:
            elem.tail = elem.tail.replace('VirtualBox', 'Xen')
    # Overwrite the file in place with the modified tree
    f.seek(0)
    f.write(ET.tostring(tree, encoding='UTF-8', xml_declaration=True))
    f.truncate()
Probably the simplest way is to do:
with open('input_file', 'r') as ifile, open('output_file', 'w') as ofile:
    for line in ifile:
        ofile.write(line.replace('VirtualBox', 'Xen'))
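A plain line-by-line replace also rewrites tag names and attribute values that happen to contain the string. For comparison, the sketch below does the text/tail replacement with the standard library's xml.etree.ElementTree on a small made-up sample (the sample string is an assumption, not the asker's file). Note that the standard-library parser discards comments and the DOCTYPE by default, so for the file in the question the result would lose those parts.

```python
import xml.etree.ElementTree as ET

# Made-up sample; "VirtualBox" appears once in .text and once in .tail.
sample = "<body><p>Install <b>VirtualBox</b> then start VirtualBox.</p></body>"
root = ET.fromstring(sample)

for elem in root.iter():
    if elem.text:
        elem.text = elem.text.replace("VirtualBox", "Xen")
    if elem.tail:
        elem.tail = elem.tail.replace("VirtualBox", "Xen")

result = ET.tostring(root, encoding="unicode")
```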

Trying to extract xml element using python 2.7

I am trying to extract the name elements under the sequence in xml files. I have pasted in the top of a sample xml to illustrate. With this I want to get the text from 01 Interview_been successful through mentorship and write it to a file. There are multiple sequence tags in the xml and I am trying to figure out how to go through it and extract it. I have tried to figure out how to use xml.etree and xml.dom.minidom but I can't seem to wrap my brain around it. I was able to get all of the id values from the sequence tags but not the name elements. I'm pasting in my code before the xml.
from xml.etree import ElementTree
file = open("xmldump.txt", "r")
filedata = file.read()
file.close()
with open('test.xml', 'rt') as f:
    tree = ElementTree.parse(f)
for node in tree.iter('name'):
    sequenceid = node.attrib.get('name')
    print ' %s' % (sequenceid)
    newLine = sequenceid + "\n"
    file = open("xmldump.txt", "w")
    file.write(newLine)
    file.close()
Here is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<bin>
<uuid>0F5D72FA-54E4-4DE8-81D7-CC33F5C43836</uuid>
<updatebehavior>add</updatebehavior>
<name>Logged</name>
<children>
<sequence id="01 Interview_been successful through mentorship">
<uuid>12FB944D-83EA-4527-9A54-2130A42E3A06</uuid>
<updatebehavior>add</updatebehavior>
<name>01 Interview_been successful through mentorship</name>
<duration>1195</duration>
<rate>
<ntsc>TRUE</ntsc>
<timebase>24</timebase>
</rate>
<timecode>
Well, I'm not sure whether you want the "id" attribute or the name tag (your code is confusing: it tries to extract a "name" attribute out of the "sequence" tag, but that tag only has an "id" attribute). Below is code that extracts both, which should help you get started on figuring out how ElementTree works:
from xml.etree import ElementTree
with open('test.xml', 'rt') as f:
    tree = ElementTree.parse(f)
for node in tree.iter('sequence'):
    sequenceid = node.attrib.get('id')
    name = node.findtext('name')
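To also cover the "write it to a file" part of the question, here is a hedged sketch that collects the id and name of every sequence and builds the output text. The sample XML is a shortened, closed-off version of the truncated document in the question, so its exact shape is an assumption.

```python
from xml.etree import ElementTree

# Shortened stand-in for the (truncated) XML in the question.
sample = """<xmeml version="5">
<bin>
<name>Logged</name>
<children>
<sequence id="01 Interview_been successful through mentorship">
<name>01 Interview_been successful through mentorship</name>
<duration>1195</duration>
</sequence>
</children>
</bin>
</xmeml>"""

root = ElementTree.fromstring(sample)
# (id attribute, text of the direct <name> child) for every <sequence>.
pairs = [(seq.get("id"), seq.findtext("name")) for seq in root.iter("sequence")]
lines = "".join(name + "\n" for _, name in pairs)

# To save: open the output file once, outside the loop, e.g.
# with open("xmldump.txt", "w") as out: out.write(lines)
```

Opening the output file once matters: reopening it with mode "w" on every iteration, as in the question's code, truncates it each time so only the last name survives.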
