How to parse SOAP XML with Python - python

I have some SOAP responses saved in a file which I would like to parse.
Part of an example file:
<?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<ns0:GetList_Operation_0Response xmlns:ns0="urn:COMPANY:TEST:Assets" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns0:getListValues>
<ns0:Status>New</ns0:Status>
<ns0:FormType>Support Group</ns0:FormType>
<ns0:PersonRole>Supported by</ns0:PersonRole>
<ns0:FullName>Data Centre</ns0:FullName>
<ns0:PeopleGroupFormEntryID>SG0003</ns0:PeopleGroupFormEntryID>
<ns0:PeopleGroupInstanceID>ASDAWDASDWADSDWSDWDS</ns0:PeopleGroupInstanceID>
<ns0:AssetClassId>UPS</ns0:AssetClassId>
<ns0:AssetInstanceId>ASDAWDDAWSDWADS66666</ns0:AssetInstanceId>
</ns0:getListValues>
<ns0:getListValues>
<ns0:Status>New</ns0:Status>
<ns0:FormType>Support Group</ns0:FormType>
<ns0:PersonRole>Supported by</ns0:PersonRole>
<ns0:FullName>Unix</ns0:FullName>
<ns0:PeopleGroupFormEntryID>SG0004</ns0:PeopleGroupFormEntryID>
<ns0:PeopleGroupInstanceID>ASDAWDASDWADSDWSDWQQ</ns0:PeopleGroupInstanceID>
<ns0:AssetClassId>COMPUTERSYSTEM</ns0:AssetClassId>
<ns0:AssetInstanceId>ASDAWDDAWSDWADS55555</ns0:AssetInstanceId>
</ns0:getListValues>
</ns0:GetList_Operation_0Response>
</soapenv:Body>
I would like to get (FullName & AssetInstanceId):
Data Centre;ASDAWDDAWSDWADS66666
Unix;ASDAWDDAWSDWADS55555
Could you suggest the best way to do that? Whenever I try it with ElementTree I get the error
"SyntaxError: expected path separator (:)"
probably because of the ns0: prefix at the beginning of every tag.

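The XML as posted is not well-formed (the closing </soapenv:Envelope> tag is missing), so a strict parser will reject it. You can use a regex to pull out the required values instead.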
Demo:
a = """<?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><ns0:GetList_Operation_0Response xmlns:ns0="urn:COMPANY:TEST:Assets" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns0:getListValues>
<ns0:Status>New</ns0:Status>
<ns0:FormType>Support Group</ns0:FormType>
<ns0:PersonRole>Supported by</ns0:PersonRole>
<ns0:FullName>Data Centre</ns0:FullName>
<ns0:PeopleGroupFormEntryID>SG0003</ns0:PeopleGroupFormEntryID>
<ns0:PeopleGroupInstanceID>ASDAWDASDWADSDWSDWDS</ns0:PeopleGroupInstanceID>
<ns0:AssetClassId>UPS</ns0:AssetClassId>
<ns0:AssetInstanceId>ASDAWDDAWSDWADS66666</ns0:AssetInstanceId>
</ns0:getListValues>
<ns0:getListValues>
<ns0:Status>New</ns0:Status>
<ns0:FormType>Support Group</ns0:FormType>
<ns0:PersonRole>Supported by</ns0:PersonRole>
<ns0:FullName>Unix</ns0:FullName>
<ns0:PeopleGroupFormEntryID>SG0004</ns0:PeopleGroupFormEntryID>
<ns0:PeopleGroupInstanceID>ASDAWDASDWADSDWSDWQQ</ns0:PeopleGroupInstanceID>
<ns0:AssetClassId>COMPUTERSYSTEM</ns0:AssetClassId>
<ns0:AssetInstanceId>ASDAWDDAWSDWADS55555</ns0:AssetInstanceId>
</ns0:getListValues>"""
import re
FullName = re.findall("<ns0:FullName>(.*?)</ns0:FullName>", a)
AssetInstanceId = re.findall("<ns0:AssetInstanceId>(.*?)</ns0:AssetInstanceId>", a)
for i in zip(FullName, AssetInstanceId):
    print(i)
Output:
('Data Centre', 'ASDAWDDAWSDWADS66666')
('Unix', 'ASDAWDDAWSDWADS55555')
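The ElementTree error in the question typically comes from using the ns0: prefix in a path without telling ElementTree which URI it maps to. As an alternative to the regex, here is a minimal sketch that passes a prefix-to-URI mapping to the lookups. It assumes the document is complete, that is, the closing </soapenv:Envelope> tag missing from the excerpt is present. The sample is trimmed to the two fields of interest, and the names SOAP, NS and rows are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response; the closing Envelope tag is added (assumption).
SOAP = """<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<ns0:GetList_Operation_0Response xmlns:ns0="urn:COMPANY:TEST:Assets">
<ns0:getListValues>
<ns0:FullName>Data Centre</ns0:FullName>
<ns0:AssetInstanceId>ASDAWDDAWSDWADS66666</ns0:AssetInstanceId>
</ns0:getListValues>
<ns0:getListValues>
<ns0:FullName>Unix</ns0:FullName>
<ns0:AssetInstanceId>ASDAWDDAWSDWADS55555</ns0:AssetInstanceId>
</ns0:getListValues>
</ns0:GetList_Operation_0Response>
</soapenv:Body>
</soapenv:Envelope>"""

# Map the prefix used in the paths to the namespace URI declared in the file.
NS = {"ns0": "urn:COMPANY:TEST:Assets"}

# Parse from bytes so the encoding declaration is honoured.
root = ET.fromstring(SOAP.encode("utf-8"))

# Collect one (FullName, AssetInstanceId) pair per getListValues element.
rows = [
    (item.findtext("ns0:FullName", namespaces=NS),
     item.findtext("ns0:AssetInstanceId", namespaces=NS))
    for item in root.iterfind(".//ns0:getListValues", NS)
]

for full_name, asset_id in rows:
    print(f"{full_name};{asset_id}")
```

Unlike the two separate findall calls in the regex version, each pair here is read from the same getListValues element, so a record with a missing field cannot shift the pairing.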

Related

Remove namespaces and nodes from XML string in python

I get an xml string from a post request and I need to use this xml in a subsequent request. I need to edit the XML from the first request to reflect the correct format for the subsequent request.
I can successfully remove the name spaces but am struggling with extracting the desired node and keeping the xml formatting.
current format
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetExResponse xmlns="http://www.someurl.com/">
<GetExResult>
<DataMap xmlns="" sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
</DataMap>
</GetExResult>
</GetExResponse>
</soap:Body>
</soap:Envelope>
Desired Format
<?xml version="1.0" encoding="UTF-8"?>
<DataMap xmlns="" sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
</DataMap>
--removes namespaces
dmXML = xmlstring
from lxml import etree
root = etree.fromstring(dmXML)
for elem in root.iter():
    elem.tag = etree.QName(elem).localname
etree.cleanup_namespaces(root)
test = etree.tostring(root).decode()
print(test)
--extracts desired node but into dataframe changing the formatting
xdf = pandas.read_xml(dmXML, xpath='.//DataMap/*', namespaces={"doc": "http://www.w3.org/2001/XMLSchema"})
xml = pandas.DataFrame.to_xml(xdf)
You can simply extract the relevant portion into a new document:
import xml.etree.ElementTree as ET
root = ET.fromstring(dmXML)
new_root = root.find('.//DataMap')
print(ET.tostring(new_root, xml_declaration=True, encoding='UTF-8').decode())
Output:
<?xml version='1.0' encoding='UTF-8'?>
<DataMap sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
</DataMap>
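The plain './/DataMap' path works here because the xmlns="" attribute on DataMap resets the default namespace, so DataMap and its children are unqualified, while GetExResult still belongs to http://www.someurl.com/. The following sketch illustrates the difference on a trimmed copy of the document; the svc prefix and the variable names are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

dmXML = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetExResponse xmlns="http://www.someurl.com/">
<GetExResult>
<DataMap xmlns="" sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
</DataMap>
</GetExResult>
</GetExResponse>
</soap:Body>
</soap:Envelope>"""

root = ET.fromstring(dmXML.encode("utf-8"))

# GetExResult inherits the default namespace, so an unqualified path misses it.
plain_result = root.find(".//GetExResult")

# The same lookup with a prefix mapped to the namespace URI finds it.
qualified_result = root.find(".//svc:GetExResult", {"svc": "http://www.someurl.com/"})

# DataMap declares xmlns="", so it is found without any namespace.
data_map = root.find(".//DataMap")

print(plain_result, qualified_result.tag, data_map.tag)
```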

Python - replace root element of one xml file with another root element without its children

I have one xml file that looks like this, XML1:
<?xml version='1.0' encoding='utf-8'?>
<report>
</report>
And the other one that is like this,
XML2:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
....
</child2>
</child1>
</report>
I need to copy the root element of XML2, without its children, into XML1, so that XML1 looks like this:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
Currently my code looks like this, but it doesn't remove the children; it writes the whole tree instead:
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
report = source_root.findall('report')
for child in list(report):
    report.remove(child)
source_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
Does anyone have an idea how I can achieve this?
Thanks!
Try the code below (it just copies the root's attributes):
import xml.etree.ElementTree as ET
xml1 = '''<?xml version='1.0' encoding='utf-8'?>
<report>
</report>'''
xml2 = '''<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
</child2>
</child1>
</report>'''
root1 = ET.fromstring(xml1)
root2 = ET.fromstring(xml2)
root1.attrib = root2.attrib
ET.dump(root1)
output
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
So here is working code:
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
dest_tree = ET.parse('XML1.xml')
dest_root = dest_tree.getroot()
dest_root.attrib = source_root.attrib
dest_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
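For comparison, the approach attempted in the question (removing the children) can also work. In the original code, source_root is already the report element, so source_root.findall('report') looks for child elements named report and returns an empty list, and nothing is removed. A minimal sketch of the corrected loop, using a shortened inline copy of XML2 as sample data:

```python
import xml.etree.ElementTree as ET

xml2 = """<report attrib1="blabla" attrib2="blabla">
<child1>
<child2>
</child2>
</child1>
</report>"""

# fromstring() (like getroot()) already returns the <report> element.
root = ET.fromstring(xml2)

# Iterate over a copy of the child list, because we mutate it while looping.
for child in list(root):
    root.remove(child)

stripped = ET.tostring(root, encoding="unicode")
print(stripped)
```

Copying the attributes, as in the answer above, leaves XML2 untouched; removing the children modifies the parsed tree of XML2 itself, so write the result to a different file if the original must be kept.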

create subelement with namespace in xml

I want to create this XML, but I don't know how to create the subelement ISAddSegments with the namespace:
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
<soapenv:Body>
<ISAddSegments xmlns="http://www.blue-order.com/ma/integrationservicews/api">
<accessKey>key</accessKey>
<objectID>
<guid>guid</guid>
</objectID>
<StratumName>STRATUM</StratumName>
<Segments>
<Segment>
<Begin>00:00:00:00</Begin>
<Content>TEXT</Content>
<End>00:00:10:00</End>
</Segment>
</Segments>
</ISAddSegments>
</soapenv:Body>
</soapenv:Envelope>
This is what I have:
import xml.etree.ElementTree as ET
Envelope = ET.Element("{http://www.w3.org/2003/05/soap-envelope}Envelope")
Body = ET.SubElement(Envelope, '{http://www.w3.org/2003/05/soap-envelope}Body')
ET.register_namespace('soapenv', 'http://www.w3.org/2003/05/soap-envelope')
ISAddSegments = ET.SubElement(Body, '{http://www.blue-order.com/ma/integrationservicews/api}ISAddSegments')
...
But this creates an extra namespace in the main element and that's not what I need.
<?xml version='1.0' encoding='utf-8'?>
<soapenv:Envelope xmlns:ns1="http://www.blue-order.com/ma/integrationservicews/api" xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
<soapenv:Body>
<ns1:ISAddSegments>
I solved it with lxml:
from lxml import etree
ns1 = 'http://www.w3.org/2003/05/soap-envelope'
ns2 = 'http://www.blue-order.com/ma/integrationservicews/api'
# The tag must contain the namespace URI itself, not the variable name
Envelope = etree.Element('{%s}Envelope' % ns1, nsmap = {'soapenv': ns1})
Body = etree.SubElement(Envelope, '{%s}Body' % ns1)
ISAddSegments = etree.SubElement(Body, 'ISAddSegments', nsmap = {None : ns2})
accessKey = etree.SubElement(ISAddSegments, 'accessKey')
...
Consider using a dedicated SOAP module such as suds. Then you can create a custom namespace by providing ns to Element. The value should be a tuple containing the namespace name and a url in which it's defined:
from suds.sax.element import Element
custom_namespace = ('custom_namespace', 'http://url/namespace.xsd')
element_with_custom_namespace = Element('Element', ns=custom_namespace)
print(element_with_custom_namespace)
# <custom_namespace:Element xmlns:custom_namespace="http://url/namespace.xsd"/>
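Regarding the extra xmlns:ns1 declaration that ElementTree produced in the question: this is a serialization detail, because the element names still carry the same namespace URIs as in the desired document. The stdlib-only sketch below builds part of the envelope and parses it back to check this. Whether the receiving service accepts prefixed output instead of a default namespace is an assumption that has to be verified against the service; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
API_NS = "http://www.blue-order.com/ma/integrationservicews/api"

# Ask ElementTree to use the soapenv prefix for the SOAP namespace.
ET.register_namespace("soapenv", SOAP_NS)

# Tags are written as "{namespace-uri}localname".
envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
segments = ET.SubElement(body, f"{{{API_NS}}}ISAddSegments")
ET.SubElement(segments, f"{{{API_NS}}}accessKey").text = "key"

xml_text = ET.tostring(envelope, encoding="unicode")
print(xml_text)

# Parsing the output back shows the elements are in the intended namespaces,
# whatever prefix the serializer picked for the API namespace.
reparsed = ET.fromstring(xml_text)
access_key = reparsed.find(f".//{{{API_NS}}}accessKey")
print(access_key.text)
```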

XML parsing to get list of values in Python

I have XML output like the one below:
<?xml version="1.0" encoding="utf-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><ns1:getValuesResponse soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:ns1="http://soap.core.green.controlj.com"><getValuesReturn soapenc:arrayType="xsd:string[3]" xsi:type="soapenc:Array" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"><getValuesReturn xsi:type="xsd:string">337.81998</getValuesReturn><getValuesReturn xsi:type="xsd:string">129.1</getValuesReturn><getValuesReturn xsi:type="xsd:string">1152.9691</getValuesReturn></getValuesReturn></ns1:getValuesResponse></soapenv:Body></soapenv:Envelope>
I want to get all the values of the "getValuesReturn" elements as a Python list. For this, I used code like the following:
import libxml2
DOC="""<?xml version="1.0" encoding="utf-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><ns1:getValuesResponse soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:ns1="http://soap.core.green.controlj.com"><getValuesReturn soapenc:arrayType="xsd:string[3]" xsi:type="soapenc:Array" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"><getValuesReturn xsi:type="xsd:string">337.81998</getValuesReturn><getValuesReturn xsi:type="xsd:string">129.1</getValuesReturn><getValuesReturn xsi:type="xsd:string">1152.9691</getValuesReturn></getValuesReturn></ns1:getValuesResponse></soapenv:Body></soapenv:Envelope>"""
def getValues(cat):
    return [attr.content for attr in doc.xpathEval("/elements/parent[@name='%s']/child/@value" % (cat))]
# parse the incoming XML document
doc = libxml2.parseDoc(DOC)
# print the values of the getValuesReturn tag
print getValues("getValuesReturn")
It just returns an empty list, but I should get a list such as ["337.81998","129.1","1152.9691"]. Could you please help me out with this?
Thanks in advance.
Where does the XPath expression come from? It doesn't match anything (the document contains no elements or parent elements).
Try the following:
DOC = ...
doc = libxml2.parseDoc(DOC)
print [attr.content for attr in doc.xpathEval(".//getValuesReturn")]
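prints
['337.81998129.11152.9691', '337.81998', '129.1', '1152.9691']
The first item is the concatenated content of the outer getValuesReturn element. To get only the individual values, select the text nodes:
doc = libxml2.parseDoc(DOC)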
print [attr.content for attr in doc.xpathEval('.//getValuesReturn/text()')]
prints
['337.81998', '129.1', '1152.9691']
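If the libxml2 bindings are not available, the same list can be obtained with the standard library. This is a sketch using xml.etree.ElementTree on a copy of the document reflowed over several lines, with the attributes that do not affect the lookup left out; it keeps only the getValuesReturn elements that have no child elements, which skips the outer array wrapper.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" encoding="utf-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<ns1:getValuesResponse xmlns:ns1="http://soap.core.green.controlj.com">
<getValuesReturn xsi:type="soapenc:Array"
 xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/">
<getValuesReturn xsi:type="xsd:string">337.81998</getValuesReturn>
<getValuesReturn xsi:type="xsd:string">129.1</getValuesReturn>
<getValuesReturn xsi:type="xsd:string">1152.9691</getValuesReturn>
</getValuesReturn>
</ns1:getValuesResponse>
</soapenv:Body>
</soapenv:Envelope>"""

root = ET.fromstring(DOC.encode("utf-8"))

# getValuesReturn is unqualified, so no namespace mapping is needed.
# len(el) == 0 keeps only leaf elements and drops the outer wrapper.
values = [el.text for el in root.iter("getValuesReturn") if len(el) == 0]
print(values)
```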

Soap Client using Suds

Soap call in Python
Above is my previous question regarding SOAP, in which I passed a 1D array. Now I need to pass a 2D array to the following SOAP schema.
Request Schema
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<CalculateWeb2DObjectArray xmlns="http://tempuri.org/">
<HCID>string</HCID>
<jaggedobjDataMICRO>
<ArrayOfAnyType>
<anyType />
<anyType />
</ArrayOfAnyType>
<ArrayOfAnyType>
<anyType />
<anyType />
</ArrayOfAnyType>
</jaggedobjDataMICRO>
<numeratorID>int</numeratorID>
</CalculateWeb2DObjectArray>
</soap:Body>
</soap:Envelope>
Response Schema
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<CalculateWeb2DObjectArrayResponse xmlns="http://tempuri.org/">
<CalculateWeb2DObjectArrayResult>
<ArrayOfAnyType>
<anyType />
<anyType />
</ArrayOfAnyType>
<ArrayOfAnyType>
<anyType />
<anyType />
</ArrayOfAnyType>
</CalculateWeb2DObjectArrayResult>
</CalculateWeb2DObjectArrayResponse>
</soap:Body>
</soap:Envelope>
My Code
from suds.xsd.doctor import Import, ImportDoctor
from suds.client import Client
# enable logging to see transmitted XML
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger('suds.client').setLevel(logging.DEBUG)
# fix broken wsdl
# add <s:import namespace="http://www.w3.org/2001/XMLSchema"/> to the wsdl
imp = Import('http://www.w3.org/2001/XMLSchema',
             location='http://www.w3.org/2001/XMLSchema.xsd')
imp.filter.add('http://tempuri.org/')
wsdl_url = 'http://204.9.76.243/nuCast.DataFeedService/Service1.asmx?WSDL'
client = Client(wsdl_url, doctor=ImportDoctor(imp))
# make request
arrayofstring1 = client.factory.create('ArrayOfString')
arrayofstring1.string = [1,2]
arrayofstring2 = client.factory.create('ArrayOfString')
arrayofstring2.string = [5,6]
arrayofstring = client.factory.create('ArrayOfString')
arrayofstring.string = [arrayofstring1,arrayofstring2]
print client.service.CalculateWeb2DObjectArray(1073757, arrayofstring, 99)
But I got an empty value in the output. Please help me solve this.
Thanks
You are passing invalid arguments to the CalculateWeb2DObjectArray() function.
To find out what type of arguments CalculateWeb2DObjectArray() accepts, you could add to your script:
print client
The output contains:
CalculateWeb2DObjectArray(xs:string HCID,
ArrayOfArrayOfAnyType jaggedobjDataMICRO,
xs:int numeratorID, )
So the second argument should be an ArrayOfArrayOfAnyType; use client.factory to create it:
aoaoat = client.factory.create('ArrayOfArrayOfAnyType')
To find out how to populate aoaoat, just print it:
print aoaoat
The output:
(ArrayOfArrayOfAnyType){
ArrayOfAnyType[] = <empty>
}
Repeating the same procedure for ArrayOfAnyType you get:
(ArrayOfAnyType){
anyType[] = <empty>
}
Putting it all together:
aoaoat = client.factory.create('ArrayOfArrayOfAnyType')
lst = aoaoat.ArrayOfAnyType = []
for L in [[1,2], [5,6]]:
    aoat = client.factory.create('ArrayOfAnyType')
    aoat.anyType = L
    lst.append(aoat)
response = client.service.CalculateWeb2DObjectArray(1073757, aoaoat, 99)
print response
Request
DEBUG:suds.client:sending to (
http://204.9.76.243/nuCast.DataFeedService/Service1.asmx)
message:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:ns0="http://tempuri.org/"
xmlns:ns1="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Header/>
<ns1:Body>
<ns0:CalculateWeb2DObjectArray>
<ns0:HCID>1073757</ns0:HCID>
<ns0:jaggedobjDataMICRO>
<ns0:ArrayOfAnyType>
<ns0:anyType>1</ns0:anyType>
<ns0:anyType>2</ns0:anyType>
</ns0:ArrayOfAnyType>
<ns0:ArrayOfAnyType>
<ns0:anyType>5</ns0:anyType>
<ns0:anyType>6</ns0:anyType>
</ns0:ArrayOfAnyType>
</ns0:jaggedobjDataMICRO>
<ns0:numeratorID>99</ns0:numeratorID>
</ns0:CalculateWeb2DObjectArray>
</ns1:Body>
</SOAP-ENV:Envelope>
DEBUG:suds.client:headers = {
'SOAPAction': u'"http://tempuri.org/CalculateWeb2DObjectArray"',
'Content-Type': 'text/xml; charset=utf-8'}
Response
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<CalculateWeb2DObjectArrayResponse xmlns="http://tempuri.org/">
<CalculateWeb2DObjectArrayResult>
<ArrayOfAnyType>
<anyType>1</anyType>
<anyType>2</anyType>
</ArrayOfAnyType>
<ArrayOfAnyType>
<anyType>5</anyType>
<anyType>6</anyType>
</ArrayOfAnyType>
</CalculateWeb2DObjectArrayResult>
</CalculateWeb2DObjectArrayResponse>
</soap:Body>
</soap:Envelope>
Output
(ArrayOfArrayOfAnyType){
ArrayOfAnyType[] =
(ArrayOfAnyType){
anyType[] =
"1",
"2",
},
(ArrayOfAnyType){
anyType[] =
"5",
"6",
},
}
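The construction steps above can be wrapped in a small helper that turns any nested Python list into the jagged array type. This is only a sketch: build_jagged_array and FakeFactory are hypothetical names introduced here, and FakeFactory is a stand-in so the example runs without the WSDL. With suds you would pass client.factory instead.

```python
from types import SimpleNamespace


def build_jagged_array(factory, rows):
    """Build an ArrayOfArrayOfAnyType from a list of lists.

    `factory` is expected to behave like suds' client.factory, i.e. to
    offer create(type_name) returning an object with settable attributes.
    """
    outer = factory.create('ArrayOfArrayOfAnyType')
    outer.ArrayOfAnyType = []
    for row in rows:
        inner = factory.create('ArrayOfAnyType')
        inner.anyType = list(row)
        outer.ArrayOfAnyType.append(inner)
    return outer


class FakeFactory:
    """Hypothetical stand-in for client.factory, used only for this demo."""

    def create(self, type_name):
        return SimpleNamespace(type_name=type_name)


jagged = build_jagged_array(FakeFactory(), [[1, 2], [5, 6]])
print([inner.anyType for inner in jagged.ArrayOfAnyType])
```

With a real client the call would be build_jagged_array(client.factory, [[1, 2], [5, 6]]), and the result would be passed as the second argument to client.service.CalculateWeb2DObjectArray.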
