Parsing XML in Python with ElementTree

I'm using the ElementTree documentation to try to get only certain values (name, ip, netmask) from specific elements.
This is an example of the structure of my XML:
<?xml version="1.0" ?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="urn:uuid:5cf32451-91af-4f71-a0bd-ead244b81b1f">
<data>
<interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
<interface>
<name>GigabitEthernet1</name>
<type xmlns:ianaift="urn:ietf:params:xml:ns:yang:iana-if-type">ianaift:ethernetCsmacd</type>
<enabled>true</enabled>
<ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
<address>
<ip>192.168.40.30</ip>
<netmask>255.255.255.0</netmask>
</address>
</ipv4>
<ipv6 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip"/>
</interface>
<interface>
<name>GigabitEthernet2</name>
<type xmlns:ianaift="urn:ietf:params:xml:ns:yang:iana-if-type">ianaift:ethernetCsmacd</type>
<enabled>true</enabled>
<ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
<address>
<ip>10.10.10.1</ip>
<netmask>255.255.255.0</netmask>
</address>
</ipv4>
<ipv6 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip"/>
</interface>
</interfaces>
</data>
</rpc-reply>
Python code (this returns nothing):
import xml.etree.ElementTree as ET
tree = ET.parse("C:\\Users\\Redha\\Documents\\test_network\\interface1234.xml")
root = tree.getroot()
namespaces = {'interfaces': 'urn:ietf:params:xml:ns:yang:ietf-interfaces' }
for elem in root.findall('.//interfaces:interfaces', namespaces):
    s0 = elem.find('.//interfaces:name',namespaces)
    name = s0.text
    print(name)

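import xml.etree.ElementTree as ET
interface = ET.parse('interface2.xml')
interface_root = interface.getroot()
# interface_root[0] is <data>, interface_root[0][0] is <interfaces>
for interface_attribute in interface_root[0][0]:
    # [0] is <name>; [3] is <ipv4>, whose first child <address> holds <ip> and <netmask>
    print(f"{interface_attribute[0].text}, {interface_attribute[3][0][0].text}, {interface_attribute[3][0][1].text}")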
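The positional indexing above depends on the exact child order of each interface. A namespace-aware alternative is sketched here, assuming the namespace URIs from the sample XML; the prefixes "if" and "ip" and the helper name interface_addresses are arbitrary choices for illustration. The key point is that <ipv4> redeclares the default namespace (ietf-ip), so <address>, <ip> and <netmask> must be looked up in that namespace rather than in ietf-interfaces.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the question's XML; the prefixes "if" and "ip" are arbitrary labels
NS = {
    "if": "urn:ietf:params:xml:ns:yang:ietf-interfaces",
    "ip": "urn:ietf:params:xml:ns:yang:ietf-ip",
}

# Trimmed copy of the question's NETCONF reply
SAMPLE = """<?xml version="1.0" ?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data>
<interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
<interface>
<name>GigabitEthernet1</name>
<ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
<address><ip>192.168.40.30</ip><netmask>255.255.255.0</netmask></address>
</ipv4>
</interface>
<interface>
<name>GigabitEthernet2</name>
<ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
<address><ip>10.10.10.1</ip><netmask>255.255.255.0</netmask></address>
</ipv4>
</interface>
</interfaces>
</data>
</rpc-reply>"""


def interface_addresses(root):
    """Return (name, ip, netmask) tuples for every <interface> below root."""
    rows = []
    for iface in root.iterfind(".//if:interface", NS):
        name = iface.findtext("if:name", namespaces=NS)
        # <ipv4 xmlns="...ietf-ip"> switches the default namespace, so its
        # descendants <address>, <ip> and <netmask> live in the "ip" namespace
        for addr in iface.iterfind("ip:ipv4/ip:address", NS):
            rows.append((
                name,
                addr.findtext("ip:ip", namespaces=NS),
                addr.findtext("ip:netmask", namespaces=NS),
            ))
    return rows


root = ET.fromstring(SAMPLE)
for name, ip, netmask in interface_addresses(root):
    print(name, ip, netmask)
```

For a file on disk, pass ET.parse(path).getroot() instead of the parsed string.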

Related

Python - replace root element of one xml file with another root element without its children

I have one XML file that looks like this (XML1):
<?xml version='1.0' encoding='utf-8'?>
<report>
</report>
And another one that looks like this (XML2):
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
....
</child2>
</child1>
</report>
I need to replace the root element of XML1 with the root element of XML2, but without its children, so that XML1 looks like this:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
My current code is below, but instead of removing the children it writes the whole tree:
import xml.etree.ElementTree as ET
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
report = source_root.findall('report')
for child in list(report):
    report.remove(child)
source_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
Does anyone have an idea how I can achieve this?
Thanks!

Try the following (just copy the attributes):
import xml.etree.ElementTree as ET
xml1 = '''<?xml version='1.0' encoding='utf-8'?>
<report>
</report>'''
xml2 = '''<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
</child2>
</child1>
</report>'''
root1 = ET.fromstring(xml1)
root2 = ET.fromstring(xml2)
root1.attrib = root2.attrib
ET.dump(root1)
Output:
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>

So here is the working code for the files:
import xml.etree.ElementTree as ET
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
dest_tree = ET.parse('XML1.xml')
dest_root = dest_tree.getroot()
# copy only the attributes; XML1's (empty) content is left as it is
dest_root.attrib = source_root.attrib
dest_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
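The original attempt removed nothing because findall('report') searches the root's children for <report> elements and returns a plain Python list (empty here, since the root itself is <report>). A minimal sketch of the intended fix, assuming you want to keep XML2's root tag, attributes and text: remove the children from the root element itself. strip_children is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def strip_children(root):
    """Remove every child of root in place, keeping its tag, attributes and text."""
    # Iterate over a snapshot (list(root)) because root is mutated inside the loop
    for child in list(root):
        root.remove(child)
    return root


xml2 = '<report attrib1="blabla" attrib2="blabla"><child1><child2/></child1></report>'
root = strip_children(ET.fromstring(xml2))
print(ET.tostring(root, encoding="unicode"))
```

For the files in the question: tree = ET.parse('XML2.xml'), strip_children(tree.getroot()), then tree.write('XML1.xml', encoding='utf-8', xml_declaration=True). Element.clear() is not a substitute here, because it also deletes the attributes.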

Get text inside xml tags by their name

I have some XML and I want to get the text of specific elements (XML tags) using Python.
I have tried a couple of solutions, but they didn't work:
import xml.etree.ElementTree as ET
tree = ET.fromstring(xml)
for node in tree.iter('Model'):
    print(node)
How can I do that?
XML code:
<soap:Envelope
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetVehicleLimitedInfoResponse
xmlns="http://schemas.conversesolutions.com/xsd/dmticta/v1">
<return>
<ResponseMessage xsi:nil="true" />
<ErrorCode xsi:nil="true" />
<RequestId> 2012290007705 </RequestId>
<TransactionCharge>150</TransactionCharge>
<VehicleNumber>GF-0176</VehicleNumber>
<AbsoluteOwner>SIYAPATHA FINANCE PLC</AbsoluteOwner>
<EngineNo>GA15-483936F</EngineNo>
<ClassOfVehicle>MOTOR CAR</ClassOfVehicle>
<Make>NISSAN</Make>
<Model>PULSAR</Model>
<YearOfManufacture>1998</YearOfManufacture>
<NoOfSpecialConditions>0</NoOfSpecialConditions>
<SpecialConditions xsi:nil="true" />
</return>
</GetVehicleLimitedInfoResponse>
</soap:Body>
</soap:Envelope>

Edited and improved answer:
import xml.etree.ElementTree as ET
import re
ns = {"veh": "http://schemas.conversesolutions.com/xsd/dmticta/v1"}
tree = ET.parse('test.xml') # save your xml as test.xml
root = tree.getroot()
def get_tag_name(tag):
    # strip the "{namespace-uri}" prefix that ElementTree adds to tag names
    return re.sub(r'\{.*\}', '', tag)
for node in root.find(".//veh:return", ns):
    print(get_tag_name(node.tag) + ':', node.text)
It should produce something like this:
ResponseMessage: None
ErrorCode: None
RequestId: 2012290007705
TransactionCharge: 150
VehicleNumber: GF-0176
AbsoluteOwner: SIYAPATHA FINANCE PLC
EngineNo: GA15-483936F
ClassOfVehicle: MOTOR CAR
Make: NISSAN
Model: PULSAR
YearOfManufacture: 1998
NoOfSpecialConditions: 0
SpecialConditions: None
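The question's tree.iter('Model') finds nothing because <Model> inherits the default namespace declared on <GetVehicleLimitedInfoResponse>, so ElementTree stores its tag as {http://schemas.conversesolutions.com/xsd/dmticta/v1}Model. A sketch of three ways to fetch a single tag, using a trimmed copy of the SOAP response; the {*} wildcard requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the SOAP response from the question
SOAP = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetVehicleLimitedInfoResponse xmlns="http://schemas.conversesolutions.com/xsd/dmticta/v1">
<return>
<Make>NISSAN</Make>
<Model>PULSAR</Model>
</return>
</GetVehicleLimitedInfoResponse>
</soap:Body>
</soap:Envelope>'''

VEH = "http://schemas.conversesolutions.com/xsd/dmticta/v1"
ns = {"veh": VEH}
root = ET.fromstring(SOAP)

# Option 1: map a prefix of your choice to the default namespace URI
model = root.findtext(".//veh:Model", namespaces=ns)

# Option 2: Clark notation "{uri}tag", which is what Element.iter() matches against
models = [node.text for node in root.iter("{%s}Model" % VEH)]

# Option 3 (Python 3.8+): "{*}" matches the tag in any namespace
make = root.findtext(".//{*}Make")

print(model, models, make)
```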

How to add space before and after CDATA in XML file

I want to create a function that modifies XML content without changing the format. I managed to change the text, but not without changing the formatting of the XML.
So now, what I want to do is add a space before and after the CDATA section in the XML file.
Original XML file:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>
And I am getting this result:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row><![CDATA[001 001 099]]></Row>
</Data>
</Device>
</Map>
</Maps>
However, I want the new XML to look like this:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 099]]> </Row>
</Data>
</Device>
</Map>
</Maps>
Here is my code:
from lxml import etree as ET
def xml_new(f,fpath,newtext,xmlrow):
    xmlrow = 19
    parser = ET.XMLParser(strip_cdata=False)
    tree = ET.parse(f, parser)
    root = tree.getroot()
    for child in root:
        value = child[0][2][xmlrow].text
        text = ET.CDATA("001 001 099")
        child[0][2][xmlrow] = ET.Element('Row')
        child[0][2][xmlrow].text = text
        child[0][2][xmlrow].tail = "\n"
    ET.register_namespace('A', "http://www.semi.org")
    tree.write(fpath,encoding='utf-8',xml_declaration=True)
    return value
Can anyone help me with this? Thanks in advance!

I don't quite understand what you want to do, but here's an example. I don't know whether it meets your needs.
from simplified_scrapy import SimplifiedDoc,req,utils
html ='''<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>'''
doc = SimplifiedDoc(html)
row = doc.Data.Row # Get the node you want to modify.
row.setContent(" "+row.html+" ") # Modify the node content.
print(doc.html)
Result:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice />
<Bin>
<Bin Bin="001" />
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>

Thanks for all your help. I found another way to achieve the result I want.
This is the code:
# file and newfile hold the input and output paths (defined elsewhere)
# what you want to change
replaceby = '020]]> </Row>\n'
# row you want to change
row = 1
# col you want to change based on list
col = 3
with open(file, 'r') as f:
    line = f.readlines()
i = 0
editedXML = []
for l in line:
    if 'cdata' in l.lower():
        i = i + 1
        if i == row:
            # split the line on single spaces and swap the token at index col
            oldVal = l.split(' ')
            newVal = []
            for index, old in enumerate(oldVal):
                if index == col:
                    newVal.append(replaceby)
                else:
                    newVal.append(old)
            editedXML.append(' '.join(newVal))
        else:
            editedXML.append(l)
    else:
        editedXML.append(l)
with open(newfile, 'w') as file2:
    file2.write(''.join(editedXML))
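Since the lxml round trip in the question lost the spaces around the CDATA section, a text-level edit like the one above is a pragmatic workaround. The sketch here swaps in a regular expression instead of splitting on spaces, so it replaces only the payload of the n-th CDATA section and leaves the surrounding spaces and indentation untouched. replace_nth_cdata is a hypothetical helper; it assumes the new payload contains no "]]>" and that no CDATA-like text appears inside comments.

```python
import re

# Matches one CDATA section; group 1 is its payload (non-greedy, may span lines)
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def replace_nth_cdata(xml_text, n, new_payload):
    """Return xml_text with the payload of the n-th (1-based) CDATA section replaced.

    Everything outside that payload is kept character for character.
    """
    count = 0

    def swap(match):
        nonlocal count
        count += 1
        if count == n:
            return "<![CDATA[" + new_payload + "]]>"
        return match.group(0)  # leave every other CDATA section unchanged

    return CDATA_RE.sub(swap, xml_text)


sample = ('<Data>\n'
          '<Row> <![CDATA[001 001 001]]> </Row>\n'
          '<Row> <![CDATA[002 002 002]]> </Row>\n'
          '</Data>')
print(replace_nth_cdata(sample, 1, "001 001 099"))
```

To apply it to a file, read the whole file into a string, call replace_nth_cdata, and write the result to the output path.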

Parsing XML: Python ElementTree, find elements and its parent elements without other elements in same parent

I am using Python's ElementTree library to parse an XML file with the following structure. I am trying to get the XML string for the entity with id = 192, together with all its parent folders, but without the other entities:
<catalog>
<folder name="entities">
<entity id="102">
</entity>
<folder name="newEntities">
<entity id="192">
</entity>
<entity id="2982">
</entity>
</folder>
</folder>
</catalog>
The required result is:
<catalog>
<folder name="entities">
<folder name="newEntities">
<entity id="192">
</entity>
</folder>
</folder>
</catalog>
Assuming the first XML string is stored in a variable called xml_string:
import xml.etree.ElementTree as ET
tree = ET.fromstring(xml_string)
entity_id = "192"
required_element = tree.find(".//entity[@id='" + entity_id + "']")
This gets the XML element for the required entity, but not its parent folders. Is there a quick fix for this?

The challenge here is that ElementTree elements don't keep a reference to their parent. The solution is to build a parent_map (a child-to-parent dictionary):
import copy
import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom
xml = '''<catalog>
<folder name="entities">
<entity id="102">
</entity>
<folder name="newEntities">
<entity id="192">
</entity>
<entity id="2982">
</entity>
</folder>
</folder>
</catalog>'''
def prettify(elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")
root = ET.fromstring(xml)
parent_map = {c: p for p in root.iter() for c in p}
_id = 192
required_element = root.find(".//entity[@id='" + str(_id) + "']")
_path = [copy.deepcopy(required_element)]
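while True:
    parent = parent_map.get(required_element)
    # compare with None: an Element without children is falsy
    if parent is not None:
        _path.append(copy.deepcopy(parent))
        required_element = parent
    else:
        break
idx = len(_path) - 1
while idx >= 1:
    # clear() also drops attributes, so save them and restore them afterwards
    attrib = dict(_path[idx].attrib)
    _path[idx].clear()
    _path[idx].attrib.update(attrib)
    _path[idx].append(_path[idx-1])
    idx -= 1
print(prettify(_path[-1]))
Output:
<?xml version="1.0" ?>
<catalog>
<folder name="entities">
<folder name="newEntities">
<entity id="192">
</entity>
</folder>
</folder>
</catalog>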
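An alternative sketch that rebuilds each ancestor as a fresh element (tag and attributes only) instead of deep-copying the whole subtree and clearing it, so the folder name attributes survive without extra bookkeeping. It uses the same parent_map idea; isolate is a hypothetical helper name and the sample is a compacted copy of the question's XML.

```python
import copy
import xml.etree.ElementTree as ET

XML = ('<catalog><folder name="entities"><entity id="102"/>'
       '<folder name="newEntities"><entity id="192"/><entity id="2982"/>'
       '</folder></folder></catalog>')


def isolate(root, target):
    """Return a new tree containing only target and its ancestors.

    Ancestors are rebuilt with their tag and attributes but none of their other children.
    """
    # ElementTree elements don't know their parent, so build a child -> parent lookup first
    parent_map = {child: parent for parent in root.iter() for child in parent}
    node = copy.deepcopy(target)  # keep the target's own attributes and subtree
    node.tail = None              # drop any whitespace that followed it in the source
    current = target
    while current in parent_map:  # walk upwards until the root is reached
        current = parent_map[current]
        wrapper = ET.Element(current.tag, dict(current.attrib))
        wrapper.append(node)
        node = wrapper
    return node


root = ET.fromstring(XML)
target = root.find(".//entity[@id='192']")
result = isolate(root, target)
print(ET.tostring(result, encoding="unicode"))
```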

Append on XML element using Python

I have an XML file named user_data.xml that contains:
<?xml version="1.0"?>
<users>
<user name="Rocky" id="1" age="38"/>
<user name="Steve" id="2" age="50"/>
<user name="Melinda" id="3" age="38"/>
</users>
and I want to add a new element inside users, like this:
<?xml version="1.0"?>
<users>
<user name="Rocky" id="1" age="38"/>
<user name="Steve" id="2" age="50"/>
<user name="Melinda" id="3" age="38"/>
<user name="Yondu" id="4" age="55"/>
</users>
I've tried to do it using this Python code:
import sys
import xml.etree.ElementTree as ET

class add_user:
    root_new = ET.Element("users")
    root_new.append((ET.fromstring('<user name="Yondu" id="4" age="55"/>')))
    tree = ET.ElementTree(root_new)
    tree.write(sys.stdout)
    for c in root_new:
        print(root_new)
but it's not working.
Any idea how I can do it?

Parse the input XML file/content with etree.fromstring().
This gives you the root element object.
Then use the etree.Element() method to create the new user element.
Since the root element is users, append the new element to it with the append() method.
Demo:
>>> from lxml import etree
>>> input_data = """<?xml version="1.0"?>
... <users>
... <user name="Rocky" id="1" age="38"/>
... <user name="Steve" id="2" age="50"/>
... <user name="Melinda" id="3" age="38"/>
... </users>"""
>>> root = etree.fromstring(input_data)
>>> new_user = etree.Element("user", {"name":"Yondu", "id":"4", "age": "55"})
>>> root.tag
'users'
>>> root.append(new_user)
>>> print(etree.tostring(root, method="xml", pretty_print=True, encoding="unicode"))
<users>
<user name="Rocky" id="1" age="38"/>
<user name="Steve" id="2" age="50"/>
<user name="Melinda" id="3" age="38"/>
<user name="Yondu" id="4" age="55"/></users>
>>>
Note: add the necessary exception handling.
See the lxml documentation for details.
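For comparison, the same thing can be done with the standard library's xml.etree.ElementTree, which needs no third-party install. A minimal sketch; add_user is a hypothetical helper name, and the sample is the question's user_data.xml inlined as a string.

```python
import xml.etree.ElementTree as ET

USERS = '''<?xml version="1.0"?>
<users>
<user name="Rocky" id="1" age="38"/>
<user name="Steve" id="2" age="50"/>
<user name="Melinda" id="3" age="38"/>
</users>'''


def add_user(root, name, user_id, age):
    """Append a <user> child to root (the <users> element) and return it."""
    # SubElement creates the element and appends it to root in one step;
    # attribute values must be strings, hence str() on the numbers
    return ET.SubElement(root, "user", {"name": name, "id": str(user_id), "age": str(age)})


root = ET.fromstring(USERS)
add_user(root, "Yondu", 4, 55)
print(ET.tostring(root, encoding="unicode"))
```

To update the file itself: tree = ET.parse('user_data.xml'), add_user(tree.getroot(), 'Yondu', 4, 55), then tree.write('user_data.xml', encoding='utf-8', xml_declaration=True). On Python 3.9+, calling ET.indent(tree) before writing re-indents the output.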
