Adding subElement at a specific location with xml.dom.minidom (appendChild) - python

I want to insert a sub-element at a specific location, but I do not know how to do that using appendChild in xml.dom.
Here is my xml code:
<?xml version='1.0' encoding='UTF-8'?>
<VOD>
<root>
<ab>sdsd
<pp>pras</pp>
<ps>sinha</ps>
</ab>
<ab>prashu</ab>
<ab>sakshi</ab>
<cd>dfdf</cd>
</root>
<root>
<ab>pratik</ab>
</root>
<root>
<ab>Mum</ab>
</root>
</VOD>
I would like to insert another sub-element "new" into the first "root" element, just before the "cd" tag. The result should look like this:
<ab>prashu</ab>
<ab>sakshi</ab>
<new>Anydata</new>
<cd>dfdf</cd>
The code I used for this is:
import xml.dom.minidom as m
doc = m.parse("file_notes.xml")
root=doc.getElementsByTagName("root")
valeurs = doc.getElementsByTagName("root")[0]
element = doc.createElement("new")
element.appendChild(doc.createTextNode("Anydata"))
valeurs.appendChild(element)
doc.writexml(open("newxmlfile.xml","w"))
In what way can I achieve my goal?
Thank you in advance!

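appendChild always adds the new node as the last child of its parent. Use insertBefore on the parent node instead, passing the node you want to insert in front of. Something along these lines: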
element = doc.createElement("new")
element.appendChild(doc.createTextNode("Anydata"))
cd = doc.getElementsByTagName("cd")[0]
cd.parentNode.insertBefore(element, cd)
If the document contains several cd elements, pick the reference node by its index in the list returned by getElementsByTagName:
cd_list = doc.getElementsByTagName("cd")
cd_list[0].parentNode.insertBefore(element, cd_list[0])
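To insert at a numeric position rather than before a named tag, a small helper can look up the N-th element child and pass it to insertBefore. This is a minimal sketch: insert_at is a hypothetical helper, and the XML is a trimmed, whitespace-free version of the question's file. With an indented file, minidom keeps the whitespace between tags as text nodes, which is why the helper counts element children only.

```python
import xml.dom.minidom as m

# Trimmed, whitespace-free version of the question's document (assumption).
xml_text = ("<VOD><root><ab>prashu</ab><ab>sakshi</ab><cd>dfdf</cd></root>"
            "<root><ab>pratik</ab></root></VOD>")
doc = m.parseString(xml_text)


def insert_at(parent, new_node, index):
    """Hypothetical helper: insert new_node before the index-th *element* child
    of parent, skipping whitespace/text nodes; append if index is past the end."""
    elements = [n for n in parent.childNodes if n.nodeType == n.ELEMENT_NODE]
    ref = elements[index] if index < len(elements) else None
    parent.insertBefore(new_node, ref)  # ref=None behaves like appendChild


first_root = doc.getElementsByTagName("root")[0]
new = doc.createElement("new")
new.appendChild(doc.createTextNode("Anydata"))
insert_at(first_root, new, 2)  # element index 2 is <cd>, so <new> lands just before it

print(first_root.toxml())
```

Passing None as the reference node makes insertBefore append, so the same helper also covers "insert at the end".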

Related

Get children elements of multiple instances of the same name tag using ElementTree

I have an xml file looking like this:
<?xml version="1.0" encoding="UTF-8"?>
<data>
<boundary_conditions>
<rot>
<rot_instance>
<name>BC_1</name>
<rpm>200</rpm>
<parts>
<name>rim_FL</name>
<name>tire_FL</name>
<name>disk_FL</name>
<name>center_FL</name>
</parts>
</rot_instance>
<rot_instance>
<name>BC_2</name>
<rpm>100</rpm>
<parts>
<name>tire_FR</name>
<name>disk_FR</name>
</parts>
</rot_instance>
</rot>
</boundary_conditions>
</data>
I already know how to extract the data for each instance. For example, for the name tags:
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
names = tree.findall('.//boundary_conditions/rot/rot_instance/name')
for val in names:
    print(val.text)
which gives me:
BC_1
BC_2
But if I do the same thing for the parts tag:
names = tree.findall('.//boundary_conditions/rot/rot_instance/parts/name')
for val in names:
    print(val.text)
It will give me:
rim_FL
tire_FL
disk_FL
center_FL
tire_FR
disk_FR
This lumps all the parts/name values together. Instead, I want the names under each instance's parts element as a separate list, like this:
instance_BC_1 = ['rim_FL', 'tire_FL', 'disk_FL', 'center_FL']
instance_BC_2 = ['tire_FR', 'disk_FR']
Any help is appreciated,
Thanks.
You have to find all parts elements first, and then find all name tags within each parts element.
Take a look:
parts = tree.findall('.//boundary_conditions/rot/rot_instance/parts')
for part in parts:
    for val in part.findall("name"):
        print(val.text)
    print()
instance_BC_1 = [val.text for val in parts[0].findall("name")]
instance_BC_2 = [val.text for val in parts[1].findall("name")]
print(instance_BC_1)
print(instance_BC_2)
Output:
rim_FL
tire_FL
disk_FL
center_FL
tire_FR
disk_FR
['rim_FL', 'tire_FL', 'disk_FL', 'center_FL']
['tire_FR', 'disk_FR']
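Indexing parts[0] and parts[1] works for two instances, but it breaks silently if the file has more or fewer. A sketch that iterates over rot_instance elements instead and keys each list by the instance's own name element (using a trimmed copy of the question's XML) could look like this:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's file, with the missing closing tags added.
xml_data = """<data><boundary_conditions><rot>
<rot_instance><name>BC_1</name><rpm>200</rpm><parts>
<name>rim_FL</name><name>tire_FL</name><name>disk_FL</name><name>center_FL</name>
</parts></rot_instance>
<rot_instance><name>BC_2</name><rpm>100</rpm><parts>
<name>tire_FR</name><name>disk_FR</name>
</parts></rot_instance>
</rot></boundary_conditions></data>"""
root = ET.fromstring(xml_data)

# 'name' matches only the direct <name> child of rot_instance,
# while 'parts/name' matches the names nested inside <parts>.
parts_by_instance = {
    inst.findtext("name"): [n.text for n in inst.findall("parts/name")]
    for inst in root.iterfind(".//rot_instance")
}
print(parts_by_instance)
```

This prints one entry per instance, e.g. parts_by_instance['BC_1'] holds the four front-left part names.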

Python - Deep XML file for loop

I am working with an XML file like the one below; the real file has many more spreekbeurt sessions, but I trimmed it for readability. My goal is to get the text of the voorvoegsels and achternaam elements from every spreekbeurt session.
<?xml version="1.0" encoding="utf-8"?>
<officiele-publicatie xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://technische-documentatie.oep.overheid.nl/schema/op-xsd-2012-2">
<metadata>
<meta name="OVERHEIDop.externMetadataRecord" scheme="" content="https://zoek.officielebekendmakingen.nl/h-tk-20122013-4-2/metadata.xml" />
</metadata>
<handelingen>
<agendapunt>
<spreekbeurt nieuw="ja">
<spreker>
<voorvoegsels>De heer</voorvoegsels>
<naam>
<achternaam>Marcouch</achternaam>
</naam> (<politiek>PvdA</politiek>):</spreker>
<tekst status="goed">
<al>Sample Text</al>
</tekst>
</spreekbeurt>
</agendapunt>
</handelingen>
</officiele-publicatie>
I use a for loop to loop through all the spreekbeurt elements in my XML file. But how do I print the voorvoegsels and achternaam for every spreekbeurt?
import xml.etree.ElementTree as ET
tree = ET.parse('...\directory')
root = tree.getroot()
for spreekbeurt in root.iter('spreekbeurt'):
    print(spreekbeurt.attrib)
This code prints:
{'nieuw': 'nee'}
{'nieuw': 'ja'}
{'nieuw': 'nee'}
{'nieuw': 'nee'}
But how do I print the children of each spreekbeurt?
Thanks in advance!
You can call find() with a path* to the target element to locate a single element within a parent or ancestor, for example:
>>> for spreekbeurt in root.iter('spreekbeurt'):
...     v = spreekbeurt.find('spreker/voorvoegsels')
...     a = spreekbeurt.find('spreker/naam/achternaam')
...     print(v.text, a.text)
...
De heer Marcouch
*) In fact, find() supports more than simple paths: it accepts a limited subset of XPath 1.0 expressions.
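If some spreekbeurt has no voorvoegsels (or no achternaam), find() returns None and v.text raises AttributeError. A more defensive sketch uses findtext(), which accepts a default for missing elements. The second spreekbeurt below ("Jansen", without voorvoegsels) is hypothetical data added only to show that case:

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the "Jansen" entry is hypothetical and has no <voorvoegsels>.
xml_data = """<officiele-publicatie><handelingen><agendapunt>
<spreekbeurt nieuw="ja"><spreker><voorvoegsels>De heer</voorvoegsels>
<naam><achternaam>Marcouch</achternaam></naam> (<politiek>PvdA</politiek>):</spreker></spreekbeurt>
<spreekbeurt nieuw="nee"><spreker><naam><achternaam>Jansen</achternaam></naam></spreker></spreekbeurt>
</agendapunt></handelingen></officiele-publicatie>"""
root = ET.fromstring(xml_data)

speakers = []
for spreekbeurt in root.iter("spreekbeurt"):
    # findtext returns the default instead of failing when the path matches nothing
    voorvoegsels = spreekbeurt.findtext("spreker/voorvoegsels", default="")
    achternaam = spreekbeurt.findtext("spreker/naam/achternaam", default="")
    speakers.append((voorvoegsels, achternaam))

for v, a in speakers:
    print(f"{v} {a}".strip())
```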

Python3 parse XML into dictionary

It seems the original post was too vague, so I'm narrowing down the focus of this post. I have an XML file from which I want to pull values from specific branches, and I am having difficulty in understanding how to effectively navigate the XML paths. Consider the XML file below. There are several <mi> branches. I want to store the <r> value of certain branches, but not others. In this example, I want the <r> values of counter1 and counter3, but not counter2.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="Data.xsl" ?>
<!DOCTYPE mdc SYSTEM "Data.dtd">
<mdc xmlns:HTML="http://www.w3.org/TR/REC-xml">
<mfh>
<vn>TEST</vn>
<cbt>20140126234500.0+0000</cbt>
</mfh>
<mi>
<mts>20140126235000.0+0000</mts>
<mt>counter1</mt>
<mv>
<moid>DEFAULT</moid>
<r>58</r>
</mv>
</mi>
<mi>
<mts>20140126235000.0+0000</mts>
<mt>counter2</mt>
<mv>
<moid>DEFAULT</moid>
<r>100</r>
</mv>
</mi>
<mi>
<mts>20140126235000.0+0000</mts>
<mt>counter3</mt>
<mv>
<moid>DEFAULT</moid>
<r>7</r>
</mv>
</mi>
</mdc>
From that I would like to build a tuple with the following:
('20140126234500.0+0000', 58, 7)
where 20140126234500.0+0000 is taken from <cbt>, 58 is taken from the <r> value of the <mi> element that has <mt>counter1</mt> and 7 is taken from the <mi> element that has <mt>counter3</mt>.
I would like to use xml.etree.cElementTree since it seems to be standard and should be more than capable for my purposes. But I am having difficulty in navigating the tree and extracting the values I need. Below is some of what I have tried.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='Data.xml')
root = tree.getroot()
for mi in root.iter('mi'):
    print(mi.tag)
    for mt in mi.findall("./mt") if mt.value == 'counter1':
        print(mi.find("./mv/r").value) #I know this is invalid syntax, but it's what I want to do :)
From a pseudo code standpoint, what I am wanting to do is:
find the <cbt> value and store it in the first position of the tuple.
find the <mi> element where <mt>counter1</mt> exists and store the <r> value in the second position of the tuple.
find the <mi> element where <mt>counter3</mt> exists and store the <r> value in the third position of the tuple.
I'm not clear on when to use element.iter() versus element.findall(). I'm also not having much luck using XPath within these functions, or extracting the info I need.
Thanks,
Rusty
Starting with:
import xml.etree.ElementTree as ET  # cElementTree was deprecated in Python 3.3 and removed in 3.9
xml_data = """<?xml version="1.0"?> and the rest of your XML here"""
tree = ET.fromstring(xml_data)  # or `ET.parse(<filename>)`
xml_dict = {}
Now tree holds the parsed XML (its root is mdc), and xml_dict will hold the result you're after.
# first get the key & val for 'cbt'
cbt_val = tree.find('mfh').find('cbt').text
xml_dict['cbt'] = cbt_val
The counters are in 'mi':
for elem in tree.findall('mi'):
    counter_name = elem.find('mt').text  # key
    counter_val = elem.find('mv').find('r').text  # value
    xml_dict[counter_name] = counter_val
At this point, xml_dict is:
>>> xml_dict
{'cbt': '20140126234500.0+0000', 'counter1': '58', 'counter2': '100', 'counter3': '7'}
Shorter, though perhaps less readable: the body of the for elem in tree.findall('mi'): loop can be:
xml_dict[elem.find('mt').text] = elem.find('mv').find('r').text
# that combines the key/value extraction to one line
Or further, building the xml_dict can be done in just two lines with the counters first and cbt after:
xml_dict = {elem.find('mt').text: elem.find('mv').find('r').text for elem in tree.findall('mi')}
xml_dict['cbt'] = tree.find('mfh').find('cbt').text
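Edit:
From the docs: Element.findall() with a plain tag name finds only matching elements that are direct children of the current element (a path such as './/r' searches all descendants instead).
find() returns only the first matching element.
iter() iterates recursively over the element itself and all elements below it.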
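The dictionary still has to be turned into the tuple the question asks for. ElementTree's XPath subset supports a [tag='text'] predicate, so the mi element with a given mt value can be selected directly. A minimal sketch, assuming the counter values should become integers and using a trimmed copy of the XML (the stylesheet processing instruction and DOCTYPE are left out); counter_value is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

xml_data = """<?xml version="1.0"?>
<mdc xmlns:HTML="http://www.w3.org/TR/REC-xml">
<mfh><vn>TEST</vn><cbt>20140126234500.0+0000</cbt></mfh>
<mi><mts>20140126235000.0+0000</mts><mt>counter1</mt><mv><moid>DEFAULT</moid><r>58</r></mv></mi>
<mi><mts>20140126235000.0+0000</mts><mt>counter2</mt><mv><moid>DEFAULT</moid><r>100</r></mv></mi>
<mi><mts>20140126235000.0+0000</mts><mt>counter3</mt><mv><moid>DEFAULT</moid><r>7</r></mv></mi>
</mdc>"""
root = ET.fromstring(xml_data)


def counter_value(root, name):
    """Hypothetical helper: <r> of the <mi> whose <mt> equals name, as int (None if absent)."""
    r = root.findtext(f"mi[mt='{name}']/mv/r")
    return int(r) if r is not None else None


result = (
    root.findtext("mfh/cbt"),
    counter_value(root, "counter1"),
    counter_value(root, "counter3"),
)
print(result)
```

Because counter2 is never requested, it is simply skipped; no loop over all mi elements is needed.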

Extract all the text from xml data with python

I'm new to XML data processing. I want to extract the text data in the following XML file:
<data>
<p>12345<strong>45667</strong>abcde</p>
</data>
so that the expected result is:
['12345', '45667', 'abcde']
Currently I have tried:
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
data = tree.getiterator()
text = [data[i].text for i in range(0, len(data))]
But the result only shows ['12345', '45667']; 'abcde' is missing. Can someone help me? Thanks in advance!
Try doing this using XPath and lxml:
import lxml.etree as etree
string = '''
<data>
<p>12345<strong>45667</strong>abcde</p>
</data>
'''
tree = etree.fromstring(string)
print(tree.xpath('//p//text()'))
The XPath expression means: "select all text nodes anywhere below any p element".
OUTPUT:
['12345', '45667', 'abcde']
getiterator() (or its replacement, iter()) iterates over elements only, and 'abcde' is not the .text of any element: it is stored as the .tail of the strong element.
You can use the itertext() method:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
print(list(tree.find('p').itertext()))
Prints:
['12345', '45667', 'abcde']
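To see why 'abcde' was missing, it helps to gather the text by hand: each element's .text holds the text before its first child, and each child's .tail holds the text right after that child's end tag. The sketch below uses a hypothetical helper, collect_text, which walks both, which is roughly what itertext() does for you:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<data><p>12345<strong>45667</strong>abcde</p></data>")


def collect_text(elem):
    """Hypothetical helper: gather elem's text chunks in document order,
    including the .tail text that follows each child element."""
    chunks = [elem.text] if elem.text else []
    for child in elem:
        chunks.extend(collect_text(child))  # text inside the child
        if child.tail:                      # text after the child's end tag
            chunks.append(child.tail)
    return chunks


print(collect_text(root.find("p")))
```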

Python: How to replace a character in an XML file with a new node?

I want to replace every colon ":" in the node below with a new empty element "<colon/>".
I want this:
<shortName>Trigger:Digital Edge:Source</shortName>
to become like this:
<shortName>Trigger<colon/>Digital Edge<colon/>Source</shortName>
I have already tried a plain string search and replace, but in the output all the "<" and ">" characters are escaped as &lt; and &gt;.
Can anyone please suggest any techniques to do this.
Thank You
The idea is to take the node's text, split it on the colon, keep the first piece as the element's text, and append one colon element per remaining piece, with that piece set as the colon's .tail:
import xml.etree.ElementTree as ET
data = """<?xml version="1.0" encoding="UTF-8" ?>
<body>
<shortName>Trigger:Digital Edge:Source</shortName>
</body>"""
tree = ET.fromstring(data)
for element in tree.findall('shortName'):
    if not element.text:  # skip empty elements (text is None)
        continue
    items = element.text.split(':')
    element.text = items[0]
    for item in items[1:]:
        colon = ET.Element('colon')
        colon.tail = item
        element.append(colon)
print(ET.tostring(tree, encoding='unicode'))
Prints:
<body>
<shortName>Trigger<colon />Digital Edge<colon />Source</shortName>
</body>
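The loop above assumes shortName has no child elements yet; if it does, appended colon elements would end up after them, out of order. A sketch that also splits the .tail of existing children and rebuilds the child list in document order might look like this. split_into_elements is a hypothetical helper, and it only handles the element's own text and its direct children's tails, not text nested deeper:

```python
import xml.etree.ElementTree as ET


def split_into_elements(parent, sep=":", tag="colon"):
    """Hypothetical helper: replace every `sep` in parent's own text and in the
    tails of its direct children with an empty <tag/> element."""

    def make_marker(tail):
        marker = ET.Element(tag)
        marker.tail = tail
        return marker

    new_children = []
    # Text before the first child: the first piece stays as .text,
    # the remaining pieces become tails of new marker elements.
    pieces = parent.text.split(sep) if parent.text else [parent.text]
    parent.text = pieces[0]
    new_children.extend(make_marker(p) for p in pieces[1:])
    # Text after each existing child lives in that child's .tail.
    for child in list(parent):
        pieces = child.tail.split(sep) if child.tail else [child.tail]
        child.tail = pieces[0]
        new_children.append(child)
        new_children.extend(make_marker(p) for p in pieces[1:])
    parent[:] = new_children  # rebuild the child list in document order


elem = ET.fromstring("<shortName>Trigger:Digital Edge:Source</shortName>")
split_into_elements(elem)
print(ET.tostring(elem, encoding="unicode"))
```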
