Parsing XML for specific item using ElementTree

I am making a request to the Salesforce merge API and getting a response like this:
xml_result = '''<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns="urn:partner.soap.sforce.com">
<soapenv:Header>
<LimitInfoHeader>
<limitInfo>
<current>62303</current>
<limit>2680000</limit><type>API REQUESTS</type></limitInfo>
</LimitInfoHeader>
</soapenv:Header>
<soapenv:Body>
<mergeResponse>
<result>
<errors>
<message>invalid record type</message>
<statusCode>INSUFFICIENT_ACCESS_ON_CROSS_REFERENCE_ENTITY</statusCode>
</errors>
<id>003skdjf494244</id>
<success>false</success>
</result>
</mergeResponse>
</soapenv:Body>
</soapenv:Envelope>'''
I'd like to be able to parse this response and if success=false, return the errors, statusCode, and the message text.
I've tried the following:
import xml.etree.ElementTree as ET
root = ET.fromstring(xml_result)
root.find('mergeResponse')
root.find('{urn:partner.soap.sforce.com}mergeResponse')
root.findtext('mergeResponse')
root.findall('{urn:partner.soap.sforce.com}mergeResponse')
...and a bunch of other variations of find, findtext and findall but I can't seem to get these to return any results. Here's where I get stuck. I've tried to follow the ElementTree docs, but I don't understand how to parse the tree for specific elements.

Element.find() only finds the first direct child with a particular tag:
https://docs.python.org/2/library/xml.etree.elementtree.html#finding-interesting-elements
Since mergeResponse is a descendant of the root (Envelope > Body > mergeResponse), not a direct child, you need XPath syntax here:
root.find('.//{urn:partner.soap.sforce.com}mergeResponse')
This returns your node. The .// prefix searches all descendants of the current node (in this case the root).
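To go one step further and pull out the error details the question asks for, a minimal sketch might look like this. It assumes the response always has the shape shown in the question. The helper name merge_errors and the sf prefix are made up for illustration, and SAMPLE is a trimmed copy of the response.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used only for lookups; "sf" is an arbitrary local alias.
NS = {"sf": "urn:partner.soap.sforce.com"}

# Trimmed copy of the response from the question.
SAMPLE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns="urn:partner.soap.sforce.com">
<soapenv:Body>
<mergeResponse>
<result>
<errors>
<message>invalid record type</message>
<statusCode>INSUFFICIENT_ACCESS_ON_CROSS_REFERENCE_ENTITY</statusCode>
</errors>
<id>003skdjf494244</id>
<success>false</success>
</result>
</mergeResponse>
</soapenv:Body>
</soapenv:Envelope>"""


def merge_errors(xml_text):
    """Return (statusCode, message) pairs for every failed merge result."""
    root = ET.fromstring(xml_text)
    failures = []
    for result in root.iterfind(".//sf:mergeResponse/sf:result", NS):
        # Only results whose <success> element reads "false" are reported.
        if result.findtext("sf:success", namespaces=NS) == "false":
            for err in result.findall("sf:errors", NS):
                failures.append((
                    err.findtext("sf:statusCode", namespaces=NS),
                    err.findtext("sf:message", namespaces=NS),
                ))
    return failures


for status, message in merge_errors(SAMPLE):
    print(status, message)
```

Passing a prefix map keeps the paths readable compared with spelling out {urn:partner.soap.sforce.com} in every tag.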

Related

Python - how to extract a soap envelope from input xml and send as output

I am new to python and I have an input xml which has a soap envelope embedded in it under a child node.
Input xml:
<SyncShipmentCreation xmlns="http://schema.infor.com/InforOAGIS/2" releaseID="2">
<ApplicationArea>
<CreationDateTime>2022-06-22T14:21:56Z</CreationDateTime>
</ApplicationArea>
<DataArea>
<Sync>
<TenantID>TLD_TST</TenantID>
</Sync>
<ShipmentCreation>
<soapenv:Envelope xmlns:dns="http://schema.infor.com/InforOAGIS/2" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Header>
</soapenv:Header>
<soapenv:Body>
<ShipmentRequest xmlns="http://xxxyyyzzz.com/ShipmentMsgRequest">
...
</ShipmentRequest>
</soapenv:Body>
</soapenv:Envelope>
</ShipmentCreation>
</DataArea>
</SyncShipmentCreation>
The SOAP part is the output I need, like this:
<soapenv:Envelope xmlns:dns="http://schema.infor.com/InforOAGIS/2" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
..............
</soapenv:Envelope>
Is this possible? I could not find how to parse/extract this value and assign to an output variable. Please help.

One of the ways of doing it is by using the lxml library in the following way:
from lxml import etree
soap ="""[your xml above]"""
doc1 = etree.XML(soap)
#locate your target nodes and assign them to a second document
doc2 = doc1.xpath('//*[local-name()="ShipmentCreation"]/*')[0]
print(etree.tostring(doc2).decode())
The output should be your expected output.
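If lxml is not available, roughly the same thing can be sketched with the standard library. This is an illustration under assumptions, not a drop-in replacement: the helper name extract_envelope is hypothetical, SAMPLE is a trimmed copy of the input, and ElementTree does not preserve the original prefixes or unused declarations (the soapenv prefix is registered by hand, the unused xmlns:dns is dropped, and the default namespace on ShipmentRequest gets an auto-generated prefix).

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Ask the serializer to use "soapenv" instead of an auto-generated ns0 prefix.
ET.register_namespace("soapenv", SOAP_NS)

# Trimmed copy of the input from the question.
SAMPLE = """<SyncShipmentCreation xmlns="http://schema.infor.com/InforOAGIS/2" releaseID="2">
<DataArea>
<ShipmentCreation>
<soapenv:Envelope xmlns:dns="http://schema.infor.com/InforOAGIS/2" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Header>
</soapenv:Header>
<soapenv:Body>
<ShipmentRequest xmlns="http://xxxyyyzzz.com/ShipmentMsgRequest">
...
</ShipmentRequest>
</soapenv:Body>
</soapenv:Envelope>
</ShipmentCreation>
</DataArea>
</SyncShipmentCreation>"""


def extract_envelope(xml_text):
    """Return the embedded SOAP envelope as a string, or None if absent."""
    root = ET.fromstring(xml_text)
    envelope = root.find(".//{%s}Envelope" % SOAP_NS)
    if envelope is None:
        return None
    # tostring() also emits the whitespace that follows the element, so strip it.
    return ET.tostring(envelope, encoding="unicode").strip()


print(extract_envelope(SAMPLE))
```

The result is namespace-equivalent to the embedded envelope, but if the receiving system needs the exact original prefixes, the lxml version above is the safer choice.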

Parse xsi:type in XML with ElementTree in Python

I'm trying to connect to a RESTful API and I'm having problems building the XML request. I'm using the ElementTree library for that.
I have an example of the XML I have to send in the request. From that example I build a model and then set the different values in code. But the output XML is not exactly like the example I was given, and I'm unable to connect to the API.
This is the example I have:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetLoc xmlns="http://abc/Getloc">
<request>
<Access>
<string xmlns="http://bcd/Arrays"></string>
</Access>
<Details xsi:type="Request">
<Postcode ></Postcode >
</Details>
<UserConsent>Yes</UserConsent>
</request>
</GetLoc>
</soap:Body>
</soap:Envelope>
This is my code:
import xml.etree.ElementTree as ET
tree = ET.parse('model.xml')
root = tree.getroot()
# Prefix-to-URI map used only for lookups with find()
ns = {'loc': 'http://abc/Getloc',
      'arr': 'http://bcd/Arrays',
      'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
      'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
      'xsd': 'http://www.w3.org/2001/XMLSchema'}
tree.find('.//arr:string', ns).text = 'THC'
# Postcode inherits the default namespace declared on GetLoc
tree.find('.//loc:Postcode', ns).text = '15478'
This is the output XML (SOAP):
<ns0:Envelope xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="http://abc/Getloc" xmlns:ns2="http://bcd/Arrays" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns0:Body>
<ns1:GetLoc >
<ns1:request>
<ns1:Access>
<ns2:string>THC</ns2:string>
</ns1:Access>
<ns1:Details xsi:type="Request">
<ns1:Postcode >15478</ns1:Postcode >
</ns1:Details>
<ns1:UserConsent>Yes</ns1:UserConsent>
</ns1:request>
</ns1:GetLoc >
</ns0:Body>
</ns0:Envelope>
With the example (first above) I have no problem connecting to the API. However, with the second one I get an error:
" status="Service Not Found. The request may have been sent to an invalid URL, or intended for an unsupported operation." xmlns:l7="http://www.layer7tech.com/ws/policy/fault"/>"
Both XML are sent to the same URL with the same headers and auth. I see both XML equivalent so I was expecting same behavior. I don't understand why it isn't working.
EDIT: The output XML needs to be like
<ns0:Envelope xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="http://abc/Getloc" xmlns:ns2="http://bcd/Arrays" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns0:Body>
<ns1:GetLoc >
<ns1:request>
<ns1:Access>
<ns2:string>THC</ns2:string>
</ns1:Access>
<ns1:Details xsi:type="ns1:Request">
<ns1:Postcode >15478</ns1:Postcode >
</ns1:Details>
<ns1:UserConsent>Yes</ns1:UserConsent>
</ns1:request>
</ns1:GetLoc >
</ns0:Body>
</ns0:Envelope>
But I don't know how to change the code to get: xsi:type="ns1:Request"

Finally I found the solution myself.
The solution is in here (an incredibly complete article), since I was already using ElementTree. You may find other solutions like using lxml library.
So, for ElementTree I just need to use my own parser instead of the standard ElementTree.parse('file.xml').
The xsi attribute name is handled by the parser, but the parser doesn’t know that the attribute happens to contain a qualified name as well, so it leaves it as is. To be able to handle such a format, you can use a custom parser that knows how to handle certain attributes and elements, or keep track of the prefix mapping for each element.
To do the latter, you can use the iterparse parser, and ask it to report “start-ns” and “end-ns” events. The following snippet adds an ns_map attribute to each element which contains the prefix/URI mapping that applies to that specific element:
def parse_map(file):
    events = "start", "start-ns", "end-ns"
    root = None
    ns_map = []
    for event, elem in ET.iterparse(file, events):
        if event == "start-ns":
            ns_map.append(elem)
        elif event == "end-ns":
            ns_map.pop()
        elif event == "start":
            if root is None:
                root = elem
            elem.ns_map = dict(ns_map)
    return ET.ElementTree(root)
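On Python 3 this idea may need two adjustments: Element objects created by the C accelerator may refuse new attributes such as ns_map, and the xsi:type value still has to be rewritten before serializing. The following is a sketch under those assumptions rather than the article's method. It resolves the xsi:type value against the namespaces in scope and stores it as an ET.QName, which the ElementTree documentation describes as a way to get proper namespace handling for attribute values on output. The function name parse_with_qualified_types is hypothetical, and SAMPLE is the example request from the question.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# The example request from the question.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetLoc xmlns="http://abc/Getloc">
<request>
<Access>
<string xmlns="http://bcd/Arrays"></string>
</Access>
<Details xsi:type="Request">
<Postcode></Postcode>
</Details>
<UserConsent>Yes</UserConsent>
</request>
</GetLoc>
</soap:Body>
</soap:Envelope>"""


def parse_with_qualified_types(source):
    """Parse source and turn every xsi:type value into a namespace-aware QName."""
    events = ("start", "start-ns", "end-ns")
    ns_stack = []  # (prefix, uri) pairs currently in scope, innermost last
    root = None
    for event, payload in ET.iterparse(source, events):
        if event == "start-ns":
            ns_stack.append(payload)
        elif event == "end-ns":
            ns_stack.pop()
        else:  # "start": attributes are already available at this point
            elem = payload
            if root is None:
                root = elem
            value = elem.get(XSI_TYPE)
            if value is not None:
                # "p:Request" -> prefix "p"; plain "Request" -> default namespace ""
                prefix, _, local = value.rpartition(":")
                uri = dict(ns_stack).get(prefix)
                if uri:
                    elem.set(XSI_TYPE, ET.QName(uri, local))
    return ET.ElementTree(root)


tree = parse_with_qualified_types(io.BytesIO(SAMPLE.encode("utf-8")))
print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Because the value is a QName rather than a plain string, the serializer writes it with whichever prefix it assigns to http://abc/Getloc, so the attribute and the element tags stay consistent.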

parsing XML file in python2.7

I know this is a very common question, but the XML file and the kind of data extraction I need are a little unusual. I'd appreciate any help on the steps to extract the required data with Python 2.7.
I have the below XML
<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<members>Mango.XYZ_DIG_Team_ABCDEF_Mango_Review</members>
<members>Mango.XYZ_DIG_Team_Reporting_Mango_Review</members>
<members>Opportunity.A_T_Occupier_City_Job_List</members>
<name>ListView</name>
</types>
<types>
<members>Modify_All_Data_Permission</members>
<members>Opportunity_Alerts_Implementation</members>
<members>Process_Builder_Permission</members>
<members>Regional_Business_Support</members>
<members>Reports_Dashboards_Data_Export_for_Super_Users</members>
<name>PermissionSet</name>
</types>
<types>
<members>SolutionManager</members>
<members>Standard</members>
<name>Profile</name>
</types>
<types>
<members>Mango.Set Verified Date and System Id</members>
<members>Mango.Update Mango Site With Billing Street%2C City%2C Country</members>
<members>Mango.Update Family Id on Mango when created</members>
<members>Opportunity.Set Opportunity Name</members>
<name>WorkflowRule</name>
</types>
<version>38.0</version>
</Package>
I am trying to extract only the members from the PermissionSet block, so that I eventually have a file containing only entries like
Modify_All_Data_Permission
Opportunity_Alerts_Implementation
Process_Builder_Permission
Regional_Business_Support
Reports_Dashboards_Data_Export_for_Super_Users
I have been able to extract only the 'name' tag by
from xml.dom import minidom
doc = minidom.parse("path_to_xmlFile")
t = doc.getElementsByTagName("types")
for n in t:
    name = n.getElementsByTagName("name")[0]
    print(name.firstChild.data)
How can I extract the members and save them to a file?
Note: the number of 'members' entries is not fixed; it varies from block to block.
I am open to using a different library if it serves the purpose.

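Probably easiest to use XPath. The file declares a default namespace, so the tag has to be qualified:
import xml.etree.ElementTree as ET
ns = {'md': 'http://soap.sforce.com/2006/04/metadata'}
root = ET.parse('file.xml').getroot()
# Prints every <members> entry in the file, from all <types> blocks
for member in root.findall('.//md:members', ns):
    print(member.text)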
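To keep only the members of the PermissionSet block and save them to a file, one possible sketch is shown here. The helper names members_of and save_members, the md prefix, and the file paths are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# "md" is an arbitrary local alias for the document's default namespace.
MD = {"md": "http://soap.sforce.com/2006/04/metadata"}


def members_of(root, type_name):
    """Return the <members> texts of the <types> block whose <name> matches."""
    for types in root.findall("md:types", MD):
        if types.findtext("md:name", namespaces=MD) == type_name:
            return [m.text for m in types.findall("md:members", MD)]
    return []


def save_members(xml_path, out_path, type_name="PermissionSet"):
    """Write one member per line to out_path."""
    root = ET.parse(xml_path).getroot()
    with open(out_path, "w") as out:
        for member in members_of(root, type_name):
            out.write(member + "\n")
```

A call such as save_members('file.xml', 'permission_sets.txt') would then produce the file described in the question. Matching on the name element avoids relying on the position of the block, and it works however many members a block contains.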

This may help you!
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
# root[1] is the second <types> block (PermissionSet); its children include <name> as well
for data in root[1]:
    print(data.text)

Python Spyne custom output parameters

I need an output like this in Spyne:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<getActiveServicesResponse xmlns="http://mci.tajmi.ir/">
<getActiveServicesReturn>12345:2030:hafez poem:hafez </getActiveServicesReturn>
<getActiveServicesReturn>12346:2031:شعر طنز:tanz </getActiveServicesReturn>
<getActiveServicesReturn>bardari123:203861:سرویس بارداري :bar
</getActiveServicesReturn>
</getActiveServicesResponse>
</soapenv:Body>
</soapenv:Envelope>
What I can generate is
<?xml version='1.0' encoding='UTF-8'?>
<soap11env:Envelope xmlns:soap11env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:tns="http://mci.tajmi.ir/">
<soap11env:Body>
<tns:getActiveServicesResponse>
<tns:getActiveServicesReturn>
<tns:string>12345:2030:hafez poem:hafez</tns:string>
<tns:string>12346:2031:شعر طنز:tanz </tns:string>
....
</tns:getActiveServicesReturn>
</tns:getActiveServicesResponse>
</soap11env:Body>
</soap11env:Envelope>
How can I customize the output? I tried complex methods without success.

Have a look at my code at https://github.com/timi-ro/simulator; it shows how to do this. Also read:
Spyne - how to duplicate one elements of wsdl file created by spyne?

Why won't this check for an element work using python elementtree

I finally decided to learn how to parse xml in python. I'm using elementtree just to get a basic understanding. I'm on CentOS 6.5 using python 2.7.9. I've looked through the following pages:
http://www.diveintopython3.net/xml.html
https://pymotw.com/2/xml/etree/ElementTree/parse.html#traversing-the-parsed-tree
and performed several searches on this forum, but I'm having some trouble and I'm not sure if it's my code or the xml I'm trying to parse.
I need to be able to verify if certain elements are in the xml or not. For example, in the xml below, I need to check to see if the element Analyzer is present and if so, get the attribute. Then, if Analyzer is present, I need to check for the location element and get the text then the name element and get that text. I thought that the following code would check to see if the element existed:
if element.find('...') is not None
but that yields inconsistent results and it never seems to find the location or name element. For example:
if tree.find('Alert') is not None:
appears to work, but
if tree.find('location') is not None:
or
if tree.find('Analyzer') is not None:
definitely don't work. I'm guessing that the tree.find() function only works for the top level?
So how do I do this check?
Here is my xml:
<?xml version='1.0' encoding='UTF-8'?>
<Report>
<Alert>
<Analyzer analyzerid="CS">
<Node>
<location>USA</location>
<name>John Smith</name>
</Node>
</Analyzer>
<AnalyzerTime>2016-06-11T00:30:02+0000</AnalyzerTime>
<AdditionalData type="integer" meaning="number of alerts in this report">19</AdditionalData>
<AdditionalData type="string" meaning="report schedule">5 minutes</AdditionalData>
<AdditionalData type="string" meaning="report type">alerts</AdditionalData>
<AdditionalData type="date-time" meaning="report start time">2016-06-11T00:25:16+0000</AdditionalData>
</Alert>
<Alert>
<CreateTime>2016-06-11T00:25:16+0000</CreateTime>
<Source>
<Node>
<Address category="ipv4-addr">
<address>1.5.1.4</address>
</Address>
</Node>
</Source>
<Target>
<Service>
<port>22</port>
<protocol>TCP</protocol>
</Service>
</Target>
<Classification text="SSH scans, direction:ingress, confidence:80, severity:high">
<Reference meaning="Scanning" origin="user-specific">
<name>SSH Attack</name>
<url> </url>
</Reference>
</Classification>
<Assessment>
<Action category="block-installed"/>
</Assessment>
<AdditionalData type="string" meaning="top level domain owner">PH, Philippines</AdditionalData>
<AdditionalData type="integer" meaning="alert threshold">0</AdditionalData>
</Alert>
</Report>
And here is my code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for child in root: print child
all_links = tree.findall('.//Analyzer')
try:
    print all_links[0].attrib.get('analyzerid')
    ID = all_links[0].attrib.get('analyzerid')
    all_links2 = tree.findall('.//location')
    print all_links2
    try:
        print all_links[0].text
    except: print "can't print text location"
    if tree.find('location') is None: print 'lost'
    for kid in tree.iter('location'):
        try:
            location = kid.text
            print kid.text
        except: print 'bad'
except IndexError: print'There was no Analyzer element'

I think you're missing one important line from the Dive Into Python tutorial (just up from here):
There is a way to search for descendant elements, i.e. children, grandchildren, and any element at any nesting level.
That way is to precede the element names with //.
tree.find("someElementName") will only find a direct child of the root element named someElementName, which is why tree.find('Alert') works but tree.find('location') does not. To search for an element named someElementName anywhere within tree, use tree.find(".//someElementName"). The leading dot matters when you call find() on an Element rather than on the ElementTree object, because elements reject absolute paths.
The // notation originates from XPath. The ElementTree module supports only a limited subset of XPath; the ElementTree documentation details which parts of the syntax it supports.
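Applied to the report in the question, the existence checks could be sketched as follows. The helper name analyzer_details is hypothetical and REPORT is a trimmed copy of the XML. Any field that is missing comes back as None.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the report from the question.
REPORT = """<Report>
<Alert>
<Analyzer analyzerid="CS">
<Node>
<location>USA</location>
<name>John Smith</name>
</Node>
</Analyzer>
<AnalyzerTime>2016-06-11T00:30:02+0000</AnalyzerTime>
</Alert>
</Report>"""


def analyzer_details(root):
    """Return the Analyzer id, location and name, or None if no Analyzer exists."""
    analyzer = root.find(".//Analyzer")  # searches at any depth below root
    if analyzer is None:
        return None
    return {
        "analyzerid": analyzer.get("analyzerid"),
        # Searching from the Analyzer element ignores <name> elements elsewhere,
        # such as the one under Classification/Reference.
        "location": analyzer.findtext(".//location"),
        "name": analyzer.findtext(".//name"),
    }


print(analyzer_details(ET.fromstring(REPORT)))
```

Comparing the result of find() with None, rather than testing its truth value, is deliberate: an element without children can evaluate as false even though it exists.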
