Add xml subelement with different namespaces than root element using lxml

This is a simplified version of the xml I'm trying to build:
<BizData xmlns="urn:iso:std:iso:20022:tech:xsd:head.003.001.01"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:n1="urn:iso:std:iso:20022:tech:xsd:head.001.001.02"
xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.003.001.01 head.003.001.02_DTCC.xsd">
<Hdr>
<AppHdr xmlns="urn:iso:std:iso:20022:tech:xsd:head.001.001.02"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.001.001.02 head.001.001.02.xsd">
</AppHdr>
</Hdr>
</BizData>
Python Code
from lxml import etree

if __name__ == '__main__':
    attr_qname = etree.QName('http://www.w3.org/2001/XMLSchema-instance', 'schemaLocation')
    nsmap = {None: 'urn:iso:std:iso:20022:tech:xsd:head.003.001.01',
             'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
             'n1': 'urn:iso:std:iso:20022:tech:xsd:head.001.001.02'
             }
    root = etree.Element('BizData',
                         {attr_qname: 'urn:iso:std:iso:20022:tech:xsd:head.003.001.01 head.003.001.02_DTCC.xsd'},
                         nsmap)
    hdr = etree.Element('hdr')
    attr_qname = etree.QName('http://www.w3.org/2001/XMLSchema-instance', 'schemaLocation')
    nsmap = {None: 'urn:iso:std:iso:20022:tech:xsd:head.001.001.02',
             'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
             }
    app_hdr = etree.Element('AppHdr',
                            {attr_qname: 'urn:iso:std:iso:20022:tech:xsd:head.001.001.02 head.001.001.02.xsd'},
                            nsmap)
    hdr.append(app_hdr)
    root.append(hdr)
When printing hdr before appending to the root I get the correct output:
<Hdr>
<AppHdr xmlns="urn:iso:std:iso:20022:tech:xsd:head.001.001.02"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.001.001.02 head.001.001.02.xsd">
</AppHdr>
</Hdr>
But after appending to root, the namespace declarations xmlns and xmlns:xsi disappear:
<BizData xmlns:n1="urn:iso:std:iso:20022:tech:xsd:head.001.001.02"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:iso:std:iso:20022:tech:xsd:head.003.001.01"
xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.003.001.01 head.003.001.02_DTCC.xsd">
<hdr>
<AppHdr xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.001.001.02 head.001.001.02.xsd"/>
</hdr>
</BizData>
I tried using the set function to set xmlns:xsi, but this causes the error ..not a valid attribute...
Does anybody have an idea?

DIRTY WORKAROUND
1. Create the envelope (BizData), header (Hdr) and payload (Pyld) as individual etree.Element objects.
2. Serialize each of them to a string.
3. Combine the strings.
4. Write the result to an XML file.
This skips any kind of validation but leaves the namespaces untouched. Not ideal, but it does the job.
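
A possible alternative to concatenating strings is to put the namespace into each tag name using the {uri}local notation, so that an element's namespace is part of the element itself rather than something implied by a prefix map. The sketch below is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without third-party packages (lxml accepts the same {uri}local tag notation), the helper name build_bizdata is hypothetical, and the prefixes chosen by the serializer may differ from the target document even though the qualified names are the same.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the target document in the question.
BIZ_NS = "urn:iso:std:iso:20022:tech:xsd:head.003.001.01"
HDR_NS = "urn:iso:std:iso:20022:tech:xsd:head.001.001.02"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def build_bizdata():
    """Hypothetical helper: build BizData/Hdr/AppHdr with qualified tags."""
    root = ET.Element(
        f"{{{BIZ_NS}}}BizData",
        {f"{{{XSI_NS}}}schemaLocation": f"{BIZ_NS} head.003.001.02_DTCC.xsd"},
    )
    # Hdr belongs to the envelope namespace, AppHdr to its own namespace.
    hdr = ET.SubElement(root, f"{{{BIZ_NS}}}Hdr")
    ET.SubElement(
        hdr,
        f"{{{HDR_NS}}}AppHdr",
        {f"{{{XSI_NS}}}schemaLocation": f"{HDR_NS} head.001.001.02.xsd"},
    )
    return root


root = build_bizdata()
xml_text = ET.tostring(root, encoding="unicode")

# Round trip: parse the serialized text again and inspect the qualified names.
reparsed = ET.fromstring(xml_text)
app_hdr = reparsed[0][0]
print(reparsed.tag)
print(app_hdr.tag)
```

Because the namespace travels with the tag, appending a subtree to another parent cannot silently move an element into a different namespace; only the prefix used in the output may change.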

Related

how to parse XML with namespace and attribute in Python?

Hi, I am trying to parse XML with namespaces and attributes.
I am getting close using root.findall() and .get(),
but I am still struggling to get the right values out of the XML file.
How do I get the XML attribute values?
Input:
<?xml version="1.0" encoding="UTF-8"?><message:GenericData
xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic"
xsi:schemaLocation="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message https://sdw-wsrest.ecb.europa.eu:443/vocabulary/sdmx/2_1/SDMXMessage.xsd
http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common https://sdw-wsrest.ecb.europa.eu:443/vocabulary/sdmx/2_1/SDMXCommon.xsd
http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic https://sdw-wsrest.ecb.europa.eu:443/vocabulary/sdmx/2_1/SDMXDataGeneric.xsd">
<generic:Obs>
<generic:ObsDimension value="1999-01"/>
<generic:ObsValue value="0.7029125"/>
</generic:Obs>
<generic:Obs>
<generic:ObsDimension value="1999-02"/>
<generic:ObsValue value="0.688505"/>
</generic:Obs>
Code:
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
for x in root.findall('.//'):
    print(x.tag, " ", x.get('value'))
Output:
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}Obs None
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}ObsDimension 1999-01
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}ObsValue 0.7029125
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}Obs None
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}ObsDimension 1999-02
{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}ObsValue 0.688505
Expected_Output:
1999-01 0.7029125
1999-02 0.688505

How about this:
for parent in root:
    print(' '.join([child.get('value', "") for child in parent]))
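
A slightly more explicit variant selects the elements by their qualified names, so that unrelated children of the root are skipped. This is a minimal sketch, assuming a shortened but well-formed copy of the input; the function name observations and the NS mapping are illustrative names, not part of the original post.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the input from the question.
XML = """<message:GenericData
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic">
  <generic:Obs>
    <generic:ObsDimension value="1999-01"/>
    <generic:ObsValue value="0.7029125"/>
  </generic:Obs>
  <generic:Obs>
    <generic:ObsDimension value="1999-02"/>
    <generic:ObsValue value="0.688505"/>
  </generic:Obs>
</message:GenericData>"""

# Prefix-to-URI mapping used only for the search expressions below.
NS = {"generic": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic"}


def observations(root):
    """Return (period, value) pairs, one per generic:Obs element."""
    rows = []
    for obs in root.findall(".//generic:Obs", NS):
        dim = obs.find("generic:ObsDimension", NS)
        val = obs.find("generic:ObsValue", NS)
        if dim is not None and val is not None:
            rows.append((dim.get("value"), val.get("value")))
    return rows


rows = observations(ET.fromstring(XML))
for period, value in rows:
    print(period, value)
```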

Iterating through xml file

I am trying to get all surnames from an XML file, but when I use find, it throws an exception:
TypeError: 'NoneType' object is not iterable
This is my code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for elem in root:
    for subelem in elem:
        for subsubelem in subelem.find('surname'):
            print(subsubelem.text)
When I remove the find('surname') from the code, it returns the text of all sub-subelements.
This is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:id>1</pp:id>
<pp:customers>
<pp:customer>
<pp:name>John</pp:name>
<pp:surname>Walker</pp:surname>
<pp:adress>
<pp:street>Walker street</pp:street>
<pp:number>1/1</pp:number>
<pp:state>England</pp:state>
</pp:adress>
<pp:created>2021-03-08Z</pp:created>
</pp:customer>
<pp:customer>
<pp:name>Michael</pp:name>
<pp:surname>Jordan</pp:surname>
<pp:adress>
<pp:street>Jordan street</pp:street>
<pp:number>28</pp:number>
<pp:state>USA</pp:state>
</pp:adress>
<pp:created>2021-03-09Z</pp:created>
</pp:customer>
</pp:customers>
</pp:card>
How should I fix it?

Not really a Python person, but shouldn't the find call include the "pp:" prefix in its search, such as
find('pp:surname')
Neither the opening nor the closing tags actually match "surname".
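
To make the cause of the TypeError concrete: ElementTree stores each tag with its namespace URI, so an unqualified name matches nothing and find() returns None, which the original loop then tries to iterate. A prefixed name such as pp:surname is only resolved when a prefix-to-URI mapping is passed along with it. The following is a small sketch on a shortened copy of the document; the NS mapping and variable names are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the document from the question.
XML = """<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
  <pp:customers>
    <pp:customer>
      <pp:name>John</pp:name>
      <pp:surname>Walker</pp:surname>
    </pp:customer>
    <pp:customer>
      <pp:name>Michael</pp:name>
      <pp:surname>Jordan</pp:surname>
    </pp:customer>
  </pp:customers>
</pp:card>"""

NS = {"pp": "http://xmlns.page.com/path/subpath"}
root = ET.fromstring(XML)
customer = root.find("pp:customers/pp:customer", NS)

# The stored tag is '{http://xmlns.page.com/path/subpath}surname',
# so the bare name matches nothing and find() returns None.
unqualified = customer.find("surname")

# With the mapping supplied, the prefixed name resolves to the right element.
qualified = customer.find("pp:surname", NS)

# findtext() with a default avoids handling None for missing children.
surnames = [
    c.findtext("pp:surname", default="", namespaces=NS)
    for c in root.iterfind(".//pp:customer", NS)
]
print(unqualified, qualified.text, surnames)
```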

Use the namespace when you call findall
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:id>1</pp:id>
<pp:customers>
<pp:customer>
<pp:name>John</pp:name>
<pp:surname>Walker</pp:surname>
<pp:adress>
<pp:street>Walker street</pp:street>
<pp:number>1/1</pp:number>
<pp:state>England</pp:state>
</pp:adress>
<pp:created>2021-03-08Z</pp:created>
</pp:customer>
<pp:customer>
<pp:name>Michael</pp:name>
<pp:surname>Jordan</pp:surname>
<pp:adress>
<pp:street>Jordan street</pp:street>
<pp:number>28</pp:number>
<pp:state>USA</pp:state>
</pp:adress>
<pp:created>2021-03-09Z</pp:created>
</pp:customer>
</pp:customers>
</pp:card>'''
ns = {'pp': 'http://xmlns.page.com/path/subpath'}
root = ET.fromstring(xml)
names = [sn.text for sn in root.findall('.//pp:surname', ns)]
print(names)
output
['Walker', 'Jordan']

Extracting Child XML using ElementTree ignoring Namespace

I have the following XML, and I would like to extract the child element whose name attribute is "Adam":
<data>
<a:config version="1.0" xmlns:a="uri:abc.com/a" xmlns:b="uri:abc.com/b">
<a:xxx config="ABC">
<set>option_on</set>
<location>/123/123</location>
<data>123</data>
</a:xxx>
<a:xxx name="Adam">
<a:yyy value="5555-5555">
<log>true</log>
</a:yyy>
</a:xxx>
<a:xxx name="Lisa">
<a:yyy value="2222-2222">
<log>false</log>
</a:yyy>
</a:xxx>
</a:config>
</data>
I managed to extract the section, but the output does not use the original namespace prefix; it shows ns0 and ns1 instead. Below is my code:
import xml.etree.ElementTree as ET
tree2 = ET.parse("mycode.xml")
root2 = tree2.getroot()
for elem in tree2.iter(tag='{uri:abc.com/a}xxx'):
    match = elem.get('name')
    if match == "Adam":
        bla = ET.dump(elem)
The output is as follows:
<ns0:xxx xmlns:ns0="uri:abc.com/a" name="Adam">
<ns0:yyy value="5555-5555">
<log>true</log>
</ns0:yyy>
</ns0:xxx>
I am hoping to get exactly what is in the original document:
<a:xxx name="Adam">
<a:yyy value="5555-5555">
<log>true</log>
</a:yyy>
</a:xxx>

Use the register_namespace function.
import xml.etree.ElementTree as ET
tree2 = ET.parse("mycode.xml")
root2 = tree2.getroot()
# Register the 'a' prefix to be used when serializing
ET.register_namespace("a", "uri:abc.com/a")
for elem in tree2.iter(tag='{uri:abc.com/a}xxx'):
    match = elem.get('name')
    if match == "Adam":
        ET.dump(elem)  # dump() writes to sys.stdout and returns None
Output:
<a:xxx xmlns:a="uri:abc.com/a" name="Adam">
<a:yyy value="5555-5555">
<log>true</log>
</a:yyy>
</a:xxx>
This is not exactly the output that you asked for. You cannot make ElementTree omit the namespace declaration, because a fragment that uses the a: prefix without declaring it would no longer be namespace-well-formed.
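
If the extracted section is needed as a string rather than printed, ET.tostring can be used in place of ET.dump. The sketch below assumes a shortened inline copy of the document instead of mycode.xml; the variable names are illustrative. Note that register_namespace affects serialization only and applies to the whole process.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the document from the question.
XML = """<data>
  <a:config version="1.0" xmlns:a="uri:abc.com/a" xmlns:b="uri:abc.com/b">
    <a:xxx name="Adam">
      <a:yyy value="5555-5555">
        <log>true</log>
      </a:yyy>
    </a:xxx>
    <a:xxx name="Lisa">
      <a:yyy value="2222-2222">
        <log>false</log>
      </a:yyy>
    </a:xxx>
  </a:config>
</data>"""

A_NS = "uri:abc.com/a"
# Use the prefix 'a' for this URI whenever ElementTree serializes.
ET.register_namespace("a", A_NS)

root = ET.fromstring(XML)
snippet = None
for elem in root.iter(f"{{{A_NS}}}xxx"):
    if elem.get("name") == "Adam":
        # tostring() returns the markup; strip() drops the trailing whitespace tail.
        snippet = ET.tostring(elem, encoding="unicode").strip()
        break

print(snippet)
```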

Remove XML node if childnode's childnode contains specific value

I need to filter an XML file for certain values: if a node contains a given value, the node should be removed.
<?xml version="1.0" encoding="utf-8" ?>
<ogr:FeatureCollection
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://ogr.maptools.org/ TZwards.xsd"
xmlns:ogr="http://ogr.maptools.org/"
xmlns:gml="http://www.opengis.net/gml">
<gml:boundedBy></gml:boundedBy>
<gml:featureMember>
<ogr:TZwards fid="F0">
<ogr:Region_Nam>TARGET</ogr:Region_Nam>
<ogr:District_N>Kondoa</ogr:District_N>
<ogr:Ward_Name>Bumbuta</ogr:Ward_Name>
</ogr:TZwards>
</gml:featureMember>
<gml:featureMember>
<ogr:TZwards fid="F1">
<ogr:Region_Nam>REMOVE</ogr:Region_Nam>
<ogr:District_N>Kondoa</ogr:District_N>
<ogr:Ward_Name>Pahi</ogr:Ward_Name>
</ogr:TZwards>
</gml:featureMember>
</ogr:FeatureCollection>
The Python script should keep the <gml:featureMember> node if the <ogr:Region_Nam> contains TARGET and remove all other nodes.
from xml.dom import minidom
import xml.etree.ElementTree as ET
tree = ET.parse('input.xml').getroot()
removeList = list()
for child in tree.iter('gml:featureMember'):
    if child.tag == 'ogr:TZwards':
        name = child.find('ogr:Region_Nam').text
        if (name == 'TARGET'):
            removeList.append(child)
for tag in removeList:
    parent = tree.find('ogr:TZwards')
    parent.remove(tag)
out = ET.ElementTree(tree)
out.write(outputfilepath)
Desired output:
<?xml version="1.0" encoding="utf-8" ?>
<ogr:FeatureCollection>
<gml:boundedBy></gml:boundedBy>
<gml:featureMember>
<ogr:TZwards fid="F0">
<ogr:Region_Nam>TARGET</ogr:Region_Nam>
<ogr:District_N>Kondoa</ogr:District_N>
<ogr:Ward_Name>Bumbuta</ogr:Ward_Name>
</ogr:TZwards>
</gml:featureMember>
</ogr:FeatureCollection>
My output still contains all nodes.

You need to declare the namespaces in the Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('/tmp/input.xml').getroot()
namespaces = {'gml': 'http://www.opengis.net/gml', 'ogr': 'http://ogr.maptools.org/'}
# findall() returns a list, so removing children inside the loop is safe
for child in tree.findall('gml:featureMember', namespaces=namespaces):
    ward = child.find('ogr:TZwards', namespaces=namespaces)
    if ward is not None:
        name = ward.find('ogr:Region_Nam', namespaces=namespaces).text
        if name != 'TARGET':
            tree.remove(child)
out = ET.ElementTree(tree)
out.write("/tmp/out.xml")
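
The same idea can be tried without touching the file system. The following is a self-contained sketch on a shortened inline copy of the document; the helper name keep_region is hypothetical, and registering the prefixes is an optional extra that keeps ogr: and gml: in the serialized output instead of generated prefixes.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the document from the question.
XML = """<ogr:FeatureCollection
    xmlns:ogr="http://ogr.maptools.org/"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <ogr:TZwards fid="F0">
      <ogr:Region_Nam>TARGET</ogr:Region_Nam>
      <ogr:Ward_Name>Bumbuta</ogr:Ward_Name>
    </ogr:TZwards>
  </gml:featureMember>
  <gml:featureMember>
    <ogr:TZwards fid="F1">
      <ogr:Region_Nam>REMOVE</ogr:Region_Nam>
      <ogr:Ward_Name>Pahi</ogr:Ward_Name>
    </ogr:TZwards>
  </gml:featureMember>
</ogr:FeatureCollection>"""

NS = {"gml": "http://www.opengis.net/gml", "ogr": "http://ogr.maptools.org/"}


def keep_region(root, region):
    """Hypothetical helper: drop each featureMember whose Region_Nam is not region."""
    removed = 0
    # findall() returns a list, so removing children while looping is safe.
    for member in root.findall("gml:featureMember", NS):
        # findtext() returns None when the path is missing; such members are dropped too.
        name = member.findtext("ogr:TZwards/ogr:Region_Nam", namespaces=NS)
        if name != region:
            root.remove(member)
            removed += 1
    return removed


# Keep the original prefixes when serializing.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

root = ET.fromstring(XML)
removed = keep_region(root, "TARGET")
remaining = [
    m.find("ogr:TZwards", NS).get("fid")
    for m in root.findall("gml:featureMember", NS)
]
result = ET.tostring(root, encoding="unicode")
print(removed, remaining)
```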

Print Soap Body data using lxml

I have the following XML. I need to store the whole body XML from the SOAP request in a variable.
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:cre="http://www.code.com/abc/V1/createCase">
<soapenv:Header><wsse:Security xmlns:wsse="http://docs.oasis-open.org/2" xmlns:wsu="http://docs.oasis-open.org/a.xsd"></wsse:Security>
</soapenv:Header>
<soapenv:Body xmlns:wsu="http://docs.oasis-open.org/30.xsd" wsu:Id="id-14">
<cre:createCase>
<cre:Request>
<cre:ServiceAttributesGrp>
<cre:MinorVer>?</cre:MinorVer>
</cre:ServiceAttributesGrp>
<cre:CreateCaseReqGrp>
<cre:Language>English</cre:Language>
<cre:CustFirstNm>Issue</cre:CustFirstNm>
<cre:CustLastNm>Detection</cre:CustLastNm>
<cre:AddlDynInfoGrp>
<cre:AddlDynInfo>
<cre:FieldNm>TM3</cre:FieldNm>
<cre:FieldVal></cre:FieldVal>
</cre:AddlDynInfo>
<cre:AddlDynInfo>
<cre:FieldNm>PM417</cre:FieldNm>
<cre:FieldVal>Not Defined</cre:FieldVal>
</cre:AddlDynInfo>
</cre:AddlDynInfoGrp>
<cre:CreateCriteriasGrp>
<cre:CreateCriterias>
<cre:CriteriaNm>CriticalReqDtlValidationReqd</cre:CriteriaNm>
</cre:CreateCriterias>
</cre:CreateCriteriasGrp>
</cre:CreateCaseReqGrp>
</cre:Request>
</cre:createCase>
</soapenv:Body>
</soapenv:Envelope>
So far I have tried to print it in the following manner, without success:
ns = {'cre': 'http://www.americanexpress.com/worldservice/CLIC/CaseManagementService/V1/createCase', 'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/'}
tree = etree.parse(template_xml)
root = tree.getroot()
for bodytag in root.xpath('soapenv:Body/cre:createCase', namespaces=ns):
    print bodytag
    datalevel = etree.XPathEvaluator(bodytag, namespaces=ns)
    print datalevel('cre:createCase').text()
I just need to print the createCase part.

I loaded your XML into the variable root; here is how you can get that piece of XML:
import lxml.etree as ET
createCase = root.find('.//cre:createCase', namespaces=root.nsmap)
print(ET.tostring(createCase, pretty_print=True).decode())
prints this:
<cre:createCase xmlns:cre="http://www.code.com/abc/V1/createCase" xmlns:wsu="http://docs.oasis-open.org/30.xsd" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<cre:Request>
<cre:ServiceAttributesGrp>
<cre:MinorVer>?</cre:MinorVer>
</cre:ServiceAttributesGrp>
<cre:CreateCaseReqGrp>
<cre:Language>English</cre:Language>
<cre:CustFirstNm>Issue</cre:CustFirstNm>
<cre:CustLastNm>Detection</cre:CustLastNm>
<cre:AddlDynInfoGrp>
<cre:AddlDynInfo>
<cre:FieldNm>TM3</cre:FieldNm>
<cre:FieldVal/>
</cre:AddlDynInfo>
<cre:AddlDynInfo>
<cre:FieldNm>PM417</cre:FieldNm>
<cre:FieldVal>Not Defined</cre:FieldVal>
</cre:AddlDynInfo>
</cre:AddlDynInfoGrp>
<cre:CreateCriteriasGrp>
<cre:CreateCriterias>
<cre:CriteriaNm>CriticalReqDtlValidationReqd</cre:CriteriaNm>
</cre:CreateCriterias>
</cre:CreateCriteriasGrp>
</cre:CreateCaseReqGrp>
</cre:Request>
</cre:createCase>
EDIT:
The OP was using an older version of Python/lxml that did not accept the namespaces argument; the working code was:
createCase = root.find('.//{http://www.code.com/abc/V1/createCase}createCase')
print(etree.tostring(createCase, pretty_print=True).decode())
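
For comparison, here is a rough sketch of the same extraction with the standard library's xml.etree.ElementTree instead of lxml, on a shortened copy of the envelope; the NS mapping and variable names are illustrative. One point it relies on: the URI in the mapping has to be the one declared in the document, since a prefix mapped to a different URI matches nothing.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the envelope from the question.
SOAP = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:cre="http://www.code.com/abc/V1/createCase">
  <soapenv:Header/>
  <soapenv:Body>
    <cre:createCase>
      <cre:Request>
        <cre:CreateCaseReqGrp>
          <cre:Language>English</cre:Language>
        </cre:CreateCaseReqGrp>
      </cre:Request>
    </cre:createCase>
  </soapenv:Body>
</soapenv:Envelope>"""

# The URIs must match the ones declared in the document itself.
NS = {
    "soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
    "cre": "http://www.code.com/abc/V1/createCase",
}
# Keep the original prefixes when serializing.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

root = ET.fromstring(SOAP)
create_case = root.find("soapenv:Body/cre:createCase", NS)

# Store the createCase subtree as a string.
body_xml = ET.tostring(create_case, encoding="unicode").strip()
print(body_xml)
```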
