Print Soap Body data using lxml - python

I have the following XML. I need to store the whole body XML from the SOAP request in a variable.
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:cre="http://www.code.com/abc/V1/createCase">
<soapenv:Header><wsse:Security xmlns:wsse="http://docs.oasis-open.org/2" xmlns:wsu="http://docs.oasis-open.org/a.xsd"></wsse:Security>
</soapenv:Header>
<soapenv:Body xmlns:wsu="http://docs.oasis-open.org/30.xsd" wsu:Id="id-14">
<cre:createCase>
<cre:Request>
<cre:ServiceAttributesGrp>
<cre:MinorVer>?</cre:MinorVer>
</cre:ServiceAttributesGrp>
<cre:CreateCaseReqGrp>
<cre:Language>English</cre:Language>
<cre:CustFirstNm>Issue</cre:CustFirstNm>
<cre:CustLastNm>Detection</cre:CustLastNm>
<cre:AddlDynInfoGrp>
<cre:AddlDynInfo>
<cre:FieldNm>TM3</cre:FieldNm>
<cre:FieldVal></cre:FieldVal>
</cre:AddlDynInfo>
<cre:AddlDynInfo>
<cre:FieldNm>PM417</cre:FieldNm>
<cre:FieldVal>Not Defined</cre:FieldVal>
</cre:AddlDynInfo>
</cre:AddlDynInfoGrp>
<cre:CreateCriteriasGrp>
<cre:CreateCriterias>
<cre:CriteriaNm>CriticalReqDtlValidationReqd</cre:CriteriaNm>
</cre:CreateCriterias>
</cre:CreateCriteriasGrp>
</cre:CreateCaseReqGrp>
</cre:Request>
</cre:createCase>
</soapenv:Body>
</soapenv:Envelope>
So far I have tried to print it in the following way, but it does not work:
ns = {'cre': 'http://www.americanexpress.com/worldservice/CLIC/CaseManagementService/V1/createCase' , 'soapenv':'http://schemas.xmlsoap.org/soap/envelope/'}
tree = etree.parse(template_xml)
root = tree.getroot()
for bodytag in root.xpath('soapenv:Body/cre:createCase',namespaces=ns):
    print bodytag
    datalevel = etree.XPathEvaluator(bodytag,namespaces=ns)
    print datalevel('cre:createCase').text()
I just need to print the createCase part.

I loaded your XML into a variable named root; here is how you can get that piece of XML:
import lxml.etree as ET
createCase=root.find('.//cre:createCase',namespaces=root.nsmap)
print ET.tostring(createCase, pretty_print=True)
prints this:
<cre:createCase xmlns:cre="http://www.code.com/abc/V1/createCase" xmlns:wsu="http://docs.oasis-open.org/30.xsd" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<cre:Request>
<cre:ServiceAttributesGrp>
<cre:MinorVer>?</cre:MinorVer>
</cre:ServiceAttributesGrp>
<cre:CreateCaseReqGrp>
<cre:Language>English</cre:Language>
<cre:CustFirstNm>Issue</cre:CustFirstNm>
<cre:CustLastNm>Detection</cre:CustLastNm>
<cre:AddlDynInfoGrp>
<cre:AddlDynInfo>
<cre:FieldNm>TM3</cre:FieldNm>
<cre:FieldVal/>
</cre:AddlDynInfo>
<cre:AddlDynInfo>
<cre:FieldNm>PM417</cre:FieldNm>
<cre:FieldVal>Not Defined</cre:FieldVal>
</cre:AddlDynInfo>
</cre:AddlDynInfoGrp>
<cre:CreateCriteriasGrp>
<cre:CreateCriterias>
<cre:CriteriaNm>CriticalReqDtlValidationReqd</cre:CriteriaNm>
</cre:CreateCriterias>
</cre:CreateCriteriasGrp>
</cre:CreateCaseReqGrp>
</cre:Request>
</cre:createCase>
EDIT:
The OP was using an older version of python/lxml whose find() did not accept the namespaces argument, so the working code spelled out the namespace URI in the tag name:
createCase = root.find('.//{http://www.code.com/abc/V1/createCase}createCase')
print etree.tostring(createCase, pretty_print=True)
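For readers on Python 3 without lxml, roughly the same extraction can be sketched with the standard library's xml.etree.ElementTree. The `soap_xml` string below is a trimmed copy of the envelope from the question, and the variable names are illustrative. Note that the `ns` mapping in the question binds `cre` to a different URI than the one the document declares, which on its own would make the XPath match nothing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the envelope from the question (illustrative sample data).
soap_xml = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:cre="http://www.code.com/abc/V1/createCase">
<soapenv:Header/>
<soapenv:Body>
<cre:createCase>
<cre:Request>
<cre:CreateCaseReqGrp>
<cre:Language>English</cre:Language>
</cre:CreateCaseReqGrp>
</cre:Request>
</cre:createCase>
</soapenv:Body>
</soapenv:Envelope>"""

# The URIs must match the ones declared in the document exactly;
# the prefixes are only local shorthand for the search path.
ns = {
    "soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
    "cre": "http://www.code.com/abc/V1/createCase",
}

root = ET.fromstring(soap_xml)
create_case = root.find("soapenv:Body/cre:createCase", ns)

# Serialize the element and everything below it into a str variable.
body_xml = ET.tostring(create_case, encoding="unicode")
print(body_xml)
```

Unlike lxml, the standard library does not keep the original prefixes when serializing, so the output uses generated ones such as `ns0` unless `ET.register_namespace` is called first. The element names and namespace URIs are unchanged either way.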

Related

how to parse XML with namespace and attribute in Python?

Hi, I am trying to parse XML that uses namespaces and attributes.
I am getting close using root.findall() and .get(),
but I am still struggling to get the right values from the XML file.
How do I get the XML attribute values?
Input:
<?xml version="1.0" encoding="UTF-8"?><message:GenericData
xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic"
xsi:schemaLocation="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message https://sdw-wsrest.ecb.europa.eu:443/vocabulary/sdmx/2_1/SDMXMessage.xsd
http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common https://sdw-wsrest.ecb.europa.eu:443/vocabulary/sdmx/2_1/SDMXCommon.xsd
http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic https://sdw-wsrest.ecb.europa.eu:443/vocabulary/sdmx/2_1/SDMXDataGeneric.xsd">
<generic:Obs>
<generic:ObsDimension value="1999-01"/>
<generic:ObsValue value="0.7029125"/>
</generic:Obs>
<generic:Obs>
<generic:ObsDimension value="1999-02"/>
<generic:ObsValue value="0.688505"/>
</generic:Obs>
Code:
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
for x in root.findall('.//'):
    print(x.tag, " ", x.get('value'))
Output:
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}Obs None
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}ObsDimension 1999-01
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}ObsValue 0.7029125
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}Obs None
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}ObsDimension 1999-02
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}ObsValue 0.688505
Expected_Output:
1999-01 0.7029125
1999-02 0.688505
How about this:
for parent in root:
    print(' '.join([child.get('value', "") for child in parent]))
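The loop above relies on every Obs element being a direct child of the root. A more defensive sketch, assuming the observations may sit deeper in a full document, searches for them by qualified name with a prefix map. The `sdmx_xml` string is a shortened copy of the input from the question, and the names `ns` and `rows` are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the input from the question (illustrative sample data).
sdmx_xml = """<message:GenericData
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic">
  <generic:Obs>
    <generic:ObsDimension value="1999-01"/>
    <generic:ObsValue value="0.7029125"/>
  </generic:Obs>
  <generic:Obs>
    <generic:ObsDimension value="1999-02"/>
    <generic:ObsValue value="0.688505"/>
  </generic:Obs>
</message:GenericData>"""

# Map a prefix of our choosing to the namespace URI used in the file.
ns = {"generic": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic"}
root = ET.fromstring(sdmx_xml)

rows = []
# './/' finds Obs elements at any depth below the root.
for obs in root.findall(".//generic:Obs", ns):
    dim = obs.find("generic:ObsDimension", ns)
    val = obs.find("generic:ObsValue", ns)
    # Skip observations that lack either child instead of failing on None.
    if dim is not None and val is not None:
        rows.append((dim.get("value"), val.get("value")))

for period, value in rows:
    print(period, value)
```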

Iterating through xml file

I am trying to get all surnames from an XML file, but when I try to use find, it throws an exception:
TypeError: 'NoneType' object is not iterable
This is my code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for elem in root:
    for subelem in elem:
        for subsubelem in subelem.find('surname'):
            print(subsubelem.text)
When I remove the find('surname') from the code, it returns the text of all subsubelements.
This is xml:
<?xml version="1.0" encoding="UTF-8"?>
<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:id>1</pp:id>
<pp:customers>
<pp:customer>
<pp:name>John</pp:name>
<pp:surname>Walker</pp:surname>
<pp:adress>
<pp:street>Walker street</pp:street>
<pp:number>1/1</pp:number>
<pp:state>England</pp:state>
</pp:adress>
<pp:created>2021-03-08Z</pp:created>
</pp:customer>
<pp:customer>
<pp:name>Michael</pp:name>
<pp:surname>Jordan</pp:surname>
<pp:adress>
<pp:street>Jordan street</pp:street>
<pp:number>28</pp:number>
<pp:state>USA</pp:state>
</pp:adress>
<pp:created>2021-03-09Z</pp:created>
</pp:customer>
</pp:customers>
</pp:card>
How should I fix it?
Not really a python person, but should the "find" statement include the "pp:" in its search, such as,
find('pp:surname')
Neither the opening nor closing tags actually match "surname".
Use the namespace when you call findall:
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:id>1</pp:id>
<pp:customers>
<pp:customer>
<pp:name>John</pp:name>
<pp:surname>Walker</pp:surname>
<pp:adress>
<pp:street>Walker street</pp:street>
<pp:number>1/1</pp:number>
<pp:state>England</pp:state>
</pp:adress>
<pp:created>2021-03-08Z</pp:created>
</pp:customer>
<pp:customer>
<pp:name>Michael</pp:name>
<pp:surname>Jordan</pp:surname>
<pp:adress>
<pp:street>Jordan street</pp:street>
<pp:number>28</pp:number>
<pp:state>USA</pp:state>
</pp:adress>
<pp:created>2021-03-09Z</pp:created>
</pp:customer>
</pp:customers>
</pp:card>'''
ns = {'pp': 'http://xmlns.page.com/path/subpath'}
root = ET.fromstring(xml)
names = [sn.text for sn in root.findall('.//pp:surname', ns)]
print(names)
output
['Walker', 'Jordan']
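To make the cause of the original TypeError explicit: find() returns None when nothing matches, and a bare 'surname' matches nothing because every tag in the file lives in the pp namespace. A minimal sketch of a per-customer loop follows. The second customer deliberately has no surname; that record is a hypothetical addition to show the None check and is not part of the original file.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the file from the question. The second customer
# has no surname on purpose (hypothetical case, not in the original data).
card_xml = """<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
  <pp:customers>
    <pp:customer>
      <pp:name>John</pp:name>
      <pp:surname>Walker</pp:surname>
    </pp:customer>
    <pp:customer>
      <pp:name>Michael</pp:name>
    </pp:customer>
  </pp:customers>
</pp:card>"""

ns = {"pp": "http://xmlns.page.com/path/subpath"}
root = ET.fromstring(card_xml)

# Without the namespace nothing matches, so find() hands back None;
# iterating over that None is what raised the TypeError.
unqualified = root.find(".//surname")
print(unqualified)

surnames = []
for customer in root.findall("pp:customers/pp:customer", ns):
    surname = customer.find("pp:surname", ns)
    # find() returns a single element or None, so test it before use.
    surnames.append(surname.text if surname is not None else None)

print(surnames)
```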

Add xml subelement with different namespaces than root element using lxml

This is a simplified version of the xml I'm trying to build:
<BizData xmlns="urn:iso:std:iso:20022:tech:xsd:head.003.001.01"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:n1="urn:iso:std:iso:20022:tech:xsd:head.001.001.02"
xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.003.001.01 head.003.001.02_DTCC.xsd">
<Hdr>
<AppHdr xmlns="urn:iso:std:iso:20022:tech:xsd:head.001.001.02"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.001.001.02 head.001.001.02.xsd">
</AppHdr>
</Hdr>
</BizData>
Python Code
from lxml import etree as etree
if __name__ == '__main__':
    attr_qname = etree.QName('http://www.w3.org/2001/XMLSchema-instance', 'schemaLocation')
    nsmap = {None: 'urn:iso:std:iso:20022:tech:xsd:head.003.001.01',
             'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
             'n1': 'urn:iso:std:iso:20022:tech:xsd:head.001.001.02'
             }
    root = etree.Element('BizData',
                         {attr_qname: 'urn:iso:std:iso:20022:tech:xsd:head.003.001.01 head.003.001.02_DTCC.xsd'},
                         nsmap)
    hdr = etree.Element('hdr')
    attr_qname = etree.QName('http://www.w3.org/2001/XMLSchema-instance', 'schemaLocation')
    nsmap = {None: 'urn:iso:std:iso:20022:tech:xsd:head.001.001.02',
             'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
             }
    app_hdr = etree.Element('AppHdr',
                            {attr_qname: 'urn:iso:std:iso:20022:tech:xsd:head.001.001.02 head.001.001.02.xsd'},
                            nsmap)
    hdr.append(app_hdr)
    root.append(hdr)
When printing hdr before appending to the root I get the correct output:
<Hdr>
<AppHdr xmlns="urn:iso:std:iso:20022:tech:xsd:head.001.001.02"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.001.001.02 head.001.001.02.xsd">
</AppHdr>
</Hdr>
But after appending to root, the namespace declarations xmlns and xmlns:xsi disappear:
<BizData xmlns:n1="urn:iso:std:iso:20022:tech:xsd:head.001.001.02"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:iso:std:iso:20022:tech:xsd:head.003.001.01"
xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.003.001.01 head.003.001.02_DTCC.xsd">
<hdr>
<AppHdr xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.001.001.02 head.001.001.02.xsd"/>
</hdr>
</BizData>
I tried using the set function to set xmlns:xsi, but this causes the error "..not a valid attribute...".
Does anybody have an idea?
DIRTY WORKAROUND
1. Create the envelope (BizData), header (Hdr) and payload (Pyld) as individual etree.Element objects.
2. Convert them to strings.
3. Combine the strings.
4. Write the result to an XML file.
This ignores any sort of validation, but it doesn't mess with the namespaces. Not ideal, but it does the job.

Extract specific XML tags Values in python

I have an XML file which contains tags like these.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DataFlows>
<DataFlow id="ABC">
<Flow name="flow4" type="Ingest">
<Ingest dataSourceName="type1" tableName="table1">
<DataSet>
<DataSetRef>value1-${d1}-${t1}</DataSetRef>
<DataStore>ingest</DataStore>
</DataSet>
<Mode>Overwrite</Mode>
</Ingest>
</Flow>
</DataFlow>
<DataFlow id="MHH" dependsOn="ABC">
<Flow name="flow5" type="Reconcile">
<Reconciliation>
<Source>QW</Source>
<Target>EF</Target>
<ComparisonKey>
<Column>dealNumber</Column>
</ComparisonKey>
<ReconcileColumns mode="required">
<Column>bookId</Column>
</ReconcileColumns>
</Reconciliation>
</Flow>
<Flow name="output" type="Export" format="Native">
<Table publishToSQLServer="true">
<DataSet>
<DataSetRef>value4_${cob}_${ts}</DataSetRef>
<DataStore>recon</DataStore>
<Date>${run_date}</Date>
</DataSet>
<Mode>Overwrite</Mode>
</Table>
</Flow>
</DataFlow>
</DataFlows>
I want to process this XML in Python using the minimal DOM implementation (xml.dom.minidom).
I need to extract the information in the DataSet tag only when the Flow type is "Reconcile".
For example:
If my Flow type is "Reconcile", then I need to go to the next Flow tag, named "output", and extract the values of the DataSetRef, DataSource and Date tags.
So far I have tried the code below, but I am getting blank values in all my fields.
#!/usr/bin/python
from xml.dom.minidom import parse
import xml.dom.minidom
# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("Store.xml")
collection = DOMTree.documentElement
#if collection.hasAttribute("DataFlows"):
#    print "Root element : %s" % collection.getAttribute("DataFlows")
pretty = DOMTree.toprettyxml()
print "Collectio: %s" % collection
dataflows = DOMTree.getElementsByTagName("DataFlow")
# Print detail of each movie.
for dataflow in dataflows:
    print "*****dataflow*****"
    if dataflow.hasAttribute("dependsOn"):
        print "Depends On is present"
    flows = DOMTree.getElementsByTagName("Flow")
    print "flows"
    for flow in flows:
        print "******flow******"
        if flow.hasAttribute("type") and flow.getAttribute("type") == "Reconcile":
            flowByReconcileType = flow.getAttribute("type")
            TagValue = flow.getElementsByTagName("DataSet")
            print "Tag Value is %s" % TagValue
            print "flow type is: %s" % flowByReconcileType
From there onwards I need to pass the 3 values extracted above to Unix shell scripts to process some directories.
Any help would be appreciated.
First of all, check that your XML is well formed. You are missing a root tag, and you have mismatched double quotes, for example here: <Flow name=“flow4" type="Ingest">
In your code you are correctly grabbing the dataflows.
You don't need to query the DOMTree again for the flows; you can get each dataflow's own flows like this:
flows = dataflow.getElementsByTagName("Flow")
Your condition if flow.hasAttribute("type") and flow.getAttribute("type") == "Reconcile": looks OK to me. To get the next flow item, you can do something like this, always checking that the index stays inside the list:
for index, flow in enumerate(flows):
    if flow.hasAttribute("type") and flow.getAttribute("type") == "Reconcile":
        if index + 1 < len(flows):
            your_flow = flows[index + 1]
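Putting the pieces of this answer together, a minimal end-to-end sketch might look like the following. It assumes the values wanted are the text of the DataSetRef, DataStore and Date tags that appear in the sample file. The `flows_xml` string is a shortened copy of that file, and the `first_text` helper is a hypothetical convenience function, not part of minidom.

```python
from xml.dom.minidom import parseString

# Shortened copy of the second DataFlow from the question (sample data).
flows_xml = """<DataFlows>
  <DataFlow id="MHH" dependsOn="ABC">
    <Flow name="flow5" type="Reconcile">
      <Reconciliation><Source>QW</Source></Reconciliation>
    </Flow>
    <Flow name="output" type="Export" format="Native">
      <Table publishToSQLServer="true">
        <DataSet>
          <DataSetRef>value4_${cob}_${ts}</DataSetRef>
          <DataStore>recon</DataStore>
          <Date>${run_date}</Date>
        </DataSet>
      </Table>
    </Flow>
  </DataFlow>
</DataFlows>"""


def first_text(node, tag):
    """Return the text of the first <tag> below node, or '' if it is absent."""
    matches = node.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return ""
    return matches[0].firstChild.data.strip()


results = []
dom = parseString(flows_xml)
for dataflow in dom.getElementsByTagName("DataFlow"):
    # Query the current dataflow, not the whole document.
    flows = dataflow.getElementsByTagName("Flow")
    for index, flow in enumerate(flows):
        # getAttribute returns '' for a missing attribute, so one test is enough.
        if flow.getAttribute("type") == "Reconcile" and index + 1 < len(flows):
            next_flow = flows[index + 1]
            results.append({tag: first_text(next_flow, tag)
                            for tag in ("DataSetRef", "DataStore", "Date")})

print(results)
```

The blank values in the original attempt come from printing the NodeList returned by getElementsByTagName; the text itself lives in the child text node of each element, which is what `firstChild.data` reads here.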

Parsing XML with ElementTree in Python

I have XML like this:
<parameter>
<name>ec_num</name>
<value>none</value>
<units/>
<url/>
<id>2455</id>
<m_date>2008-11-29 13:15:14</m_date>
<user_id>24</user_id>
<user_name>registry</user_name>
</parameter>
<parameter>
<name>swisspro</name>
<value>Q8H6N2</value>
<units/>
I want to parse the XML and extract the <value> entry which is just below the <name> entry marked 'swisspro'. I.e. I want to parse and extract the 'Q8H6N2' value.
How would I do this using ElementTree?
It would be much easier to do via lxml, but here's a solution using the ElementTree library:
import xml.etree.ElementTree as ET
data = """<parameters>
<parameter>
<name>ec_num</name>
<value>none</value>
<units/>
<url/>
<id>2455</id>
<m_date>2008-11-29 13:15:14</m_date>
<user_id>24</user_id>
<user_name>registry</user_name>
</parameter>
<parameter>
<name>swisspro</name>
<value>Q8H6N2</value>
<units/>
</parameter>
</parameters>"""
tree = ET.fromstring(data)
for parameter in tree.iter(tag='parameter'):
    name = parameter.find('name')
    if name is not None and name.text == 'swisspro':
        print(parameter.find('value').text)
        break
prints:
Q8H6N2
The idea is pretty simple: iterate over all parameter tags, check the value of the name tag and if it is equal to swisspro, get the value element.
Hope that helps.
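As a possible shortcut, ElementTree's limited XPath support includes a `[tag='text']` predicate, so the same lookup can be sketched as a single path expression. The `data` string below is a condensed copy of the sample, and `root` and `value` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the sample data from the answer above.
data = """<parameters>
  <parameter><name>ec_num</name><value>none</value></parameter>
  <parameter><name>swisspro</name><value>Q8H6N2</value></parameter>
</parameters>"""

root = ET.fromstring(data)

# Select the <parameter> whose <name> child has the text 'swisspro',
# then read the text of its <value> child. findtext returns None
# when no element matches.
value = root.findtext("parameter[name='swisspro']/value")
print(value)
```

The predicate compares the complete text of the child element, so stray whitespace around the name in the real file would prevent a match; the explicit loop is easier to adapt in that case.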
Here is an example:
xml file
<?xml version="1.0" encoding="utf-8"?>
<root>
<person age="18">
<name>hzj</name>
<sex>man</sex>
</person>
<person age="19" des="hello">
<name>kiki</name>
<sex>female</sex>
</person>
</root>
parse method
from xml.etree import ElementTree
def print_node(node):
    '''Print the basic info of a node.'''
    print("==============================================")
    print("node.attrib:%s" % node.attrib)
    if "age" in node.attrib:
        print("node.attrib['age']:%s" % node.attrib['age'])
    print("node.tag:%s" % node.tag)
    print("node.text:%s" % node.text)
def read_xml(text):
    '''Parse XML text and show several ways to reach its elements.'''
    # root = ElementTree.parse(r"D:/test.xml").getroot()  # first method: parse a file
    root = ElementTree.fromstring(text)  # second method: parse a string
    # get elements
    # 1. by iter() (called getiterator() in old versions)
    lst_node = list(root.iter("person"))
    for node in lst_node:
        print_node(node)
    # 2. by indexing the children (replaces the removed getchildren())
    lst_node_child = lst_node[0][0]
    print_node(lst_node_child)
    # 3. by find()
    node_find = root.find('person')
    print_node(node_find)
    # 4. by findall()
    node_findall = root.findall("person/name")[1]
    print_node(node_findall)
if __name__ == '__main__':
    read_xml(open("test.xml").read())
