python xml.etree.ElementTree parse

python xml.etree.ElementTree parse - python

The SOAP envelop looks like following. I need to parse the following in python. I tried following without any luck.
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = mytree.getroot()
I tried Namespaces and XPath
Can someone help as I am new to python? Thanks.
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Body>
<dp:response xmlns:dp="http://www.datapower.com/schemas/management">
<dp:timestamp>2008-03-18T17:48:22+01:00</dp:timestamp>
<dp:config>
<User xmlns:env="http://www.w3.org/2003/05/soap-envelope" name="xyz"
<mAdminState read-only="true">enabled</mAdminState>
<userSummary>admin</UserSummary>
</User>
<User xmlns:env="http://www.w3.org/2003/05/soap-envelope" name="abc"
<mAdminState read-only="true">enabled</mAdminState>
<userSummary>admin</UserSummary>
</User>
</dp:config>
</dp:response>
</env:Body>
</env:Envelope>

The XML is not well-formatted, you're missing a > at the end of each user tag. Also userSummary is not case matching with UserSummary. After these fixes, you can go ahead and start parsing the file as follows.
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
namespaces = {
'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
'dp': 'http://www.datapower.com/schemas/management'
}
users = root.findall('./soap:Body/dp:response/dp:config/User/userSummary', namespaces=namespaces)
for user in users:
print(user.text)

Related

Change the key value in XML using Python

I need to modify the online_hostname key value in XML using python. I tried xml element tree but it does not work.
import xml.etree.ElementTree as ET
xml_tree = ET.parse('test.xml')
root = xml_tree.getroot()
root[0][0] = "requiredvalue"
test.xml file is as below:
<?xml version="1.0" encoding="UTF-8" ?>
<bzinfo>
<myidentity online_hostname="testdevice-air_2022_01_25"
bzlogin="me#abc.com" />
</bzinfo>
Error:
IndexError: child assignment index out of range

It is much better to explicitly search the required node (find will do in this case) instead of using indexes on the root node:
import xml.etree.ElementTree as ET
xml_tree = ET.parse('test.xml')
root = xml_tree.getroot()
myidentity_node = root.find('myidentity')
myidentity_node.attrib['online_hostname'] = 'required_value'
xml_tree.write('modified.xml')
modified.xml after running this code:
<bzinfo>
<myidentity bzlogin="me#abc.com" online_hostname="required_value" />
</bzinfo>

python, xml: how to access the 3rd child by element' name

Would you help me, pleace, to get an access to elemnt with name 'id' by the following construction in Python (i have lxml and xml.etree.ElementTree libraries).
Desirable result: '0000000'
Desirable method:
Search in xml-document a child, where it's name is fcsProtocolEF3.
Search in fcsProtocolEF3 an element with name 'id'.
It is crucial to search by element name. Not by ordinal position.
I tried to use something like this: tree.findall('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')[0].findall('{http://zakupki.gov.ru/oos/types/1}id')[0].text
it works, but it requires to input namespaces. XML-document have different namespaces and I don't know how to define them beforehand.
Thank you.
That would be great to use something like XQuery in SQL:
value('(/*:export/*:fcsProtocolEF3/*:id)[1]', 'nvarchar(21)')) AS [id],
XML-document:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>

lxml solution:
xml = '''<?xml version="1.0"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>'''
from lxml import etree as et
root = et.fromstring(xml)
text = root.xpath('//*[local-name()="export"]/*[local-name()="fcsProtocolEF3"]/*[local-name()="id"]/text()')[0]
print(text)

Below is ET based solution. NS are in use.
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
'''
def get_id_text():
root = ET.fromstring(xml)
fcs = root.find('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')
# assuming there is one fcs element and one id under fcs
return fcs.find('{http://zakupki.gov.ru/oos/types/1}id').text
print(get_id_text())
output
0000000

Script does not run correctly. No output

I'm writing a code that should read a XML file. But it doesn't run and I'm not able to find the problem. I think it has something to do with my laptop.
I've already tried to run it in the normal python3 shell and in there it works perfectly fine.
import os
import xml.etree.ElementTree as ET
tree = ET.parse('people.xml')
root = tree.getroot()
root[0].attrib
It should output this : {'name': 'Samy'} and it does in the python3 shell but it doesn't work in the script.
The XML file looks like this
<?xml version="1.0"?>
<PEOPLE>
<Person name="Samy">
<age>99</age>
<number>0176293747238</number>
</Person>
<Person name="Alkoholik">
<age>20</age>
<number>0176234923482</number>
</Person>
</PEOPLE>

adding the print() function gives you the wanted output.
import xml.etree.ElementTree as ET
XML = 'people.xml'
tree = ET.parse(XML)
root = tree.getroot()
print(root[0].attrib) #{'name': 'Samy'}
for i in root:
print(i.attrib) #{'name': 'Samy'}
#{'name': 'Alkoholik'}

parsing XML file in python2.7

I know this is a very common question, but the kind of XML file and the kind of extraction of data i need is a little unique due to the nature of the xml file. So appreciate any help on the steps to extract the required data, with pyhton2.7
I have the below XML
<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<members>Mango.XYZ_DIG_Team_ABCDEF_Mango_Review</members>
<members>Mango.XYZ_DIG_Team_Reporting_Mango_Review</members>
<members>Opportunity.A_T_Occupier_City_Job_List</members>
<name>ListView</name>
</types>
<types>
<members>Modify_All_Data_Permission</members>
<members>Opportunity_Alerts_Implementation</members>
<members>Process_Builder_Permission</members>
<members>Regional_Business_Support</members>
<members>Reports_Dashboards_Data_Export_for_Super_Users</members>
<name>PermissionSet</name>
</types>
<types>
<members>SolutionManager</members>
<members>Standard</members>
<name>Profile</name>
</types>
<types>
<members>Mango.Set Verified Date and System Id</members>
<members>Mango.Update Mango Site With Billing Street%2C City%2C Country</members>
<members>Mango.Update Family Id on Mango when created</members>
<members>Opportunity.Set Opportunity Name</members>
<name>WorkflowRule</name>
</types>
<version>38.0</version>
</Package>
i am trying to extract only the members from the PermissionSet block. So that eventually i will have a file, that only have the entries like
Modify_All_Data_Permission
Opportunity_Alerts_Implementation
Process_Builder_Permission
Regional_Business_Support
Reports_Dashboards_Data_Export_for_Super_Users
I have been able to extract only the 'name' tag by
from xml.dom import minidom
doc = minidom.parse("path_to_xmlFile")
t = doc.getElementsByTagName("types")
for n in t:
name = n.getElementsByTagName("name")[0]
print name.firstChild.data
How can i extract the members and save that to a file?
Note: the number of 'members' are not fixed they varies.
I can also try with a different library, if it serves the purpose.

Probably easiest to use XPath
import xml.etree.ElementTree as ET
root = ET.parse('file.xml').getroot()
for member in root.findall(".//members/")
print(member.text)

This may help you!
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
for data in root[1]:
print data.text

How to keep the xml-stylesheet?

I want to keep the xml-stylesheet. But it doesn't work.
I use Python to modify the XML for deploy hadoop automatically.
XML:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
　　　　<name>fs.default.name</name>
　　　　<value>hdfs://c11:9000</value>
　　</property>
</configuration>
Code:
from xml.etree.ElementTree import ElementTree as ET
def modify_core_site(namenode_hostname):
tree = ET()
tree.parse("pkg/core-site.xml")
root = tree.getroot()
for p in root.iter("property"):
name = p.find("name").text
if name == "fs.default.name":
text = "hdfs://%s:9000" % namenode_hostname
p.find("value").text = text
tree.write("pkg/tmp.xml", encoding="utf-8", xml_declaration=True)
modify_core_site("c80")
Result:
<?xml version='1.0' encoding='utf-8'?>
<configuration>
<property>
　　　　<name>fs.default.name</name>
　　　　<value>hdfs://c80:9000</value>
　　</property>
</configuration>
The xml-stylesheet disappear...
How can I keep this?

One solution is you can use lxml Once you parse xml go till you find the xsl node. Quick sample below:
>>> import lxml.etree
>>> doc = lxml.etree.parse('C:/downloads/xmltest.xml')
>>> root = doc.getroot()
>>> xslnode=root.getprevious().getprevious()
>>> xslnode
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
Make sure you put in some exception handling and check if the node indeed exists. You can check if the node is xslt processing instruction by
>>> isinstance(xslnode, lxml.etree._XSLTProcessingInstruction)
True

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

python xml.etree.ElementTree parse - python

Related

Change the key value in XML using Python

python, xml: how to access the 3rd child by element' name

Script does not run correctly. No output

parsing XML file in python2.7

How to keep the xml-stylesheet?

Categories

Resources