Remove namespaces and nodes from XML string in python - python

I get an xml string from a post request and I need to use this xml in a subsequent request. I need to edit the XML from the first request to reflect the correct format for the subsequent request.
I can successfully remove the name spaces but am struggling with extracting the desired node and keeping the xml formatting.
current format
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetExResponse xmlns="http://www.someurl.com/">
<GetExResult>
<DataMap xmlns="" sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
</DataMap>
</GetExResult>
</GetExResponse>
</soap:Body>
</soap:Envelope>
Desired Format
<?xml version="1.0" encoding="UTF-8"?>
<DataMap xmlns="" sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
</DataMap>
--removes namespaces
dmXML = xmlstring
from lxml import etree
root = etree.fromstring(dmXML)
for elem in root.getiterator():
elem.tag = etree.QName(elem).localname
etree.cleanup_namespaces(root)
test = etree.tostring(root).decode()
print(test)
--extracts desired node but into dataframe changing the formatting
xdf = pandas.read_xml(dmXML, xpath='.//DataMap/*', namespaces={"doc": "http://www.w3.org/2001/XMLSchema"})
xml = pandas.DataFrame.to_xml(xdf)

You can simply extract the relevant portion into a new document:
import xml.etree.ElementTree as ET
root = ET.fromstring(dmXML)
new_root = root.find('.//DataMap')
print(ET.tostring(new_root, xml_declaration=True, encoding='UTF-8').decode())
Output:
<?xml version='1.0' encoding='UTF-8'?>
<DataMap sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
</DataMap>

Related

Python Parsing Multiple XML Nodes with Dynamic Data

I have an log file from an application in XML-like format that I'm trying to parse. As you can see from the file, one "group" starts with a [trace] line, and contains 4 nodes - RequestMeta, Request, ReplyMeta, and Reply.
Once the file is parsed, I want to create an object for each "group" and use the objects for further processing. There could be from 1:n groups depending on the complexity of the log file.
I have been able to parse the XML, but I have some questions on how best to proceed based on it's structure.
The first problem is how to structure/re-structure the file for parsing. Since I'm adding a single root node to more than one "group", there will be no easy way for me to know which children of the root node belong together in that group. In the original file, the group is denoted as everything between the [trace] line and the next [trace] line.
I think I could potentially solve this by taking each string "group" and create a tree for each group instead of a tree for the entire file.
The second problem is how to store the data once it's parsed. Each and every request/reply will contain different data elements under the srvdata node. I'm not sure how to dynamically store a variable number of values that have a variable number of names.
After parsing all of the data, I want to output it in a simple webpage that looks something like https://imgur.com/a/2l6ZSJK
py script
import xml.etree.ElementTree as ET
with open('C:/code/mra/requestreply.txt') as f:
txt = f.read()
pos = 0
# replace all [trace] lines
while pos >= 0:
pos = txt.find('[trace-')
pos2 = txt.find('\n', pos + 1) + 1
if pos >= 0:
txt = txt.replace(txt[pos:pos2], '')
# replace all xml instances because they are out of order
txt = txt.replace('<?xml version="1.0" encoding="utf-8"?>\n', '')
# add a master root node
xml = '<root>\n' + txt + '</root>'
tree = ET.fromstring(xml)
xml file - this is considered a single group (there could be hundreds)
[trace-592] TransactionID=6010 TransactionName=CPM.ExecuteDiscernScript User=MEPPS
<RequestMeta>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</RequestMeta>
<Request>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Request>
<ReplyMeta>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</ReplyMeta>
<Reply>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Reply>
I suggest modify your xml structure like this, I named the file trace.xml:
<?xml version="1.0" encoding="utf-8"?>
<root>
<!--[trace-592] TransactionID=6010 TransactionName=CPM.ExecuteDiscernScript User=MEPPS-->
<RequestMeta>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</RequestMeta>
<Request>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Request>
<ReplyMeta>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</ReplyMeta>
<Reply>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Reply>
</root>
Then you can parse each segment separate like:
import xml.etree.ElementTree as ET
def parseRequestMeta(RequestMeta):
"""Parse your interest here """
for root in RequestMeta:
print(root.tag)
for child in root.iter():
print(child.tag, child.text)
def parseRequest(Request):
psss
def parseReplyMeta(ReplyMeta):
psss
def parseReply(Reply):
psss
RequestMeta = []
Request = []
ReplyMeta = []
Reply = []
events = ["start", "end"]
for event, node in ET.iterparse('trace.xml', events=events):
if event == "end" and node.tag == "RequestMeta":
RequestMeta.append(node)
print(node.tag)
if event == "end" and node.tag == "Request":
Request.append(node)
print(node.tag)
if event == "end" and node.tag == "ReplyMeta":
ReplyMeta.append(node)
print(node.tag)
if event == "end" and node.tag == "Reply":
Reply.append(node)
print(node.tag)
parseRequestMeta(RequestMeta)
parseRequestMeta(Request)
parseRequestMeta(ReplyMeta)
parseRequestMeta(Reply)

Python - replace root element of one xml file with another root element without its children

I have one xml file that looks like this, XML1:
<?xml version='1.0' encoding='utf-8'?>
<report>
</report>
And the other one that is like this,
XML2:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
....
</child2>
</child1>
</report>
I need to replace and put root element of XML2 without its children, so XML1 looks like this:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
Currently my code looks like this but it won't remove children but put whole tree inside:
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
report = source_root.findall('report')
for child in list(report):
report.remove(child)
source_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
Anyone has ide how can I achieve this?
Thanks!
Try the below (just copy attrib)
import xml.etree.ElementTree as ET
xml1 = '''<?xml version='1.0' encoding='utf-8'?>
<report>
</report>'''
xml2 = '''<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
</child2>
</child1>
</report>'''
root1 = ET.fromstring(xml1)
root2 = ET.fromstring(xml2)
root1.attrib = root2.attrib
ET.dump(root1)
output
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
So here is working code:
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
dest_tree = ET.parse('XML1.xml')
dest_root = dest_tree.getroot()
dest_root.attrib = source_root.attrib
dest_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)

Get text inside xml tags by their name

I had a xml code and i want to get text in exact elements(xml tags) using python language .
I have tried couple of solutions and didnt work.
import xml.etree.ElementTree as ET
tree = ET.fromstring(xml)
for node in tree.iter('Model'):
print node
How can i do that ?
Xml Code :
<soap:Envelope
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetVehicleLimitedInfoResponse
xmlns="http://schemas.conversesolutions.com/xsd/dmticta/v1">
<return>
<ResponseMessage xsi:nil="true" />
<ErrorCode xsi:nil="true" />
<RequestId> 2012290007705 </RequestId>
<TransactionCharge>150</TransactionCharge>
<VehicleNumber>GF-0176</VehicleNumber>
<AbsoluteOwner>SIYAPATHA FINANCE PLC</AbsoluteOwner>
<EngineNo>GA15-483936F</EngineNo>
<ClassOfVehicle>MOTOR CAR</ClassOfVehicle>
<Make>NISSAN</Make>
<Model>PULSAR</Model>
<YearOfManufacture>1998</YearOfManufacture>
<NoOfSpecialConditions>0</NoOfSpecialConditions>
<SpecialConditions xsi:nil="true" />
</return>
</GetVehicleLimitedInfoResponse>
</soap:Body>
</soap:Envelope>
Edited and improved answer:
import xml.etree.ElementTree as ET
import re
ns = {"veh": "http://schemas.conversesolutions.com/xsd/dmticta/v1"}
tree = ET.parse('test.xml') # save your xml as test.xml
root = tree.getroot()
def get_tag_name(tag):
return re.sub(r'\{.*\}', '',tag)
for node in root.find(".//veh:return", ns):
print(get_tag_name(node.tag)+': ', node.text)
It should produce something like this:
ResponseMessage: None
ErrorCode: None
RequestId: 2012290007705
TransactionCharge: 150
VehicleNumber: GF-0176
AbsoluteOwner: SIYAPATHA FINANCE PLC
EngineNo: GA15-483936F
ClassOfVehicle: MOTOR CAR
Make: NISSAN
Model: PULSAR
YearOfManufacture: 1998
NoOfSpecialConditions: 0
SpecialConditions: None

How to parse SOAP XML with Python

I have some SOAP responses saved in a file which I would like to parse,
Part of example file:
<?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<ns0:GetList_Operation_0Response xmlns:ns0="urn:COMPANY:TEST:Assets" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns0:getListValues>
<ns0:Status>New</ns0:Status>
<ns0:FormType>Support Group</ns0:FormType>
<ns0:PersonRole>Supported by</ns0:PersonRole>
<ns0:FullName>Data Centre</ns0:FullName>
<ns0:PeopleGroupFormEntryID>SG0003</ns0:PeopleGroupFormEntryID>
<ns0:PeopleGroupInstanceID>ASDAWDASDWADSDWSDWDS</ns0:PeopleGroupInstanceID>
<ns0:AssetClassId>UPS</ns0:AssetClassId>
<ns0:AssetInstanceId>ASDAWDDAWSDWADS66666</ns0:AssetInstanceId>
</ns0:getListValues>
<ns0:getListValues>
<ns0:Status>New</ns0:Status>
<ns0:FormType>Support Group</ns0:FormType>
<ns0:PersonRole>Supported by</ns0:PersonRole>
<ns0:FullName>Unix</ns0:FullName>
<ns0:PeopleGroupFormEntryID>SG0004</ns0:PeopleGroupFormEntryID>
<ns0:PeopleGroupInstanceID>ASDAWDASDWADSDWSDWQQ</ns0:PeopleGroupInstanceID>
<ns0:AssetClassId>COMPUTERSYSTEM</ns0:AssetClassId>
<ns0:AssetInstanceId>ASDAWDDAWSDWADS55555</ns0:AssetInstanceId>
</ns0:getListValues>
</ns0:GetList_Operation_0Response>
</soapenv:Body>
I would like to get (FullName & AssetInstanceId):
Data Centre;ASDAWDDAWSDWADS66666
Unix;ASDAWDDAWSDWADS55555
Could you suggest the best method to do that? Whenever I try to do that with ElementTree I get error of
"SyntaxError: expected path separator (:)"
Probably because of ns0: annex in beginning of every line
Looks like your xml is broken. You can try using regex to get the required values
Demo:
a = """<?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><ns0:GetList_Operation_0Response xmlns:ns0="urn:COMPANY:TEST:Assets" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns0:getListValues>
<ns0:Status>New</ns0:Status>
<ns0:FormType>Support Group</ns0:FormType>
<ns0:PersonRole>Supported by</ns0:PersonRole>
<ns0:FullName>Data Centre</ns0:FullName>
<ns0:PeopleGroupFormEntryID>SG0003</ns0:PeopleGroupFormEntryID>
<ns0:PeopleGroupInstanceID>ASDAWDASDWADSDWSDWDS</ns0:PeopleGroupInstanceID>
<ns0:AssetClassId>UPS</ns0:AssetClassId>
<ns0:AssetInstanceId>ASDAWDDAWSDWADS66666</ns0:AssetInstanceId>
</ns0:getListValues>
<ns0:getListValues>
<ns0:Status>New</ns0:Status>
<ns0:FormType>Support Group</ns0:FormType>
<ns0:PersonRole>Supported by</ns0:PersonRole>
<ns0:FullName>Unix</ns0:FullName>
<ns0:PeopleGroupFormEntryID>SG0004</ns0:PeopleGroupFormEntryID>
<ns0:PeopleGroupInstanceID>ASDAWDASDWADSDWSDWQQ</ns0:PeopleGroupInstanceID>
<ns0:AssetClassId>COMPUTERSYSTEM</ns0:AssetClassId>
<ns0:AssetInstanceId>ASDAWDDAWSDWADS55555</ns0:AssetInstanceId>
</ns0:getListValues>"""
import re
FullName = re.findall("<ns0:FullName>(.*?)</ns0:FullName>", a)
AssetInstanceId = re.findall("<ns0:AssetInstanceId>(.*?)</ns0:AssetInstanceId>", a)
for i in zip(FullName, AssetInstanceId):
print(i)
Output:
'Data Centre', 'ASDAWDDAWSDWADS66666'
'Unix', 'ASDAWDDAWSDWADS55555'

Soap Client using Suds

Soap call in Python
Hi above is my previous question regarding soap. In there i am passing a 1D array. Now my problem is i need to pass the 2D array to the following Soap schema.
Request Schema
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<CalculateWeb2DObjectArray xmlns="http://tempuri.org/">
<HCID>string</HCID>
<jaggedobjDataMICRO>
<ArrayOfAnyType>
<anyType />
<anyType />
</ArrayOfAnyType>
<ArrayOfAnyType>
<anyType />
<anyType />
</ArrayOfAnyType>
</jaggedobjDataMICRO>
<numeratorID>int</numeratorID>
</CalculateWeb2DObjectArray>
</soap:Body>
</soap:Envelope>
Response Schema
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<CalculateWeb2DObjectArrayResponse xmlns="http://tempuri.org/">
<CalculateWeb2DObjectArrayResult>
<ArrayOfAnyType>
<anyType />
<anyType />
</ArrayOfAnyType>
<ArrayOfAnyType>
<anyType />
<anyType />
</ArrayOfAnyType>
</CalculateWeb2DObjectArrayResult>
</CalculateWeb2DObjectArrayResponse>
</soap:Body>
</soap:Envelope>
My Code
from suds.xsd.doctor import Import, ImportDoctor
from suds.client import Client
# enable logging to see transmitted XML
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger('suds.client').setLevel(logging.DEBUG)
# fix broken wsdl
# add <s:import namespace="http://www.w3.org/2001/XMLSchema"/> to the wsdl
imp = Import('http://www.w3.org/2001/XMLSchema',
location='http://www.w3.org/2001/XMLSchema.xsd')
imp.filter.add('http://tempuri.org/')
wsdl_url = 'http://204.9.76.243/nuCast.DataFeedService/Service1.asmx?WSDL'
client = Client(wsdl_url, doctor=ImportDoctor(imp))
# make request
arrayofstring1 = client.factory.create('ArrayOfString')
arrayofstring1.string = [1,2]
arrayofstring2 = client.factory.create('ArrayOfString')
arrayofstring2.string = [5,6]
arrayofstring = client.factory.create('ArrayOfString')
arrayofstring.string = [arrayofstring1,arrayofstring2]
print client.service.CalculateWeb2DObjectArray(1073757, arrayofstring, 99)
But i got empty value in output.Plz help to solve this.
Thanks
You pass invalid arguments to CalculateWeb2DObjectArray() function.
To find out what type of arguments CalculateWeb2DObjectArray() accepts, you could add to your script:
print client
The output contains:
CalculateWeb2DObjectArray(xs:string HCID,
ArrayOfArrayOfAnyType jaggedobjDataMICRO,
xs:int numeratorID, )
So the second argument should be ArrayOfArrayOfAnyType, use client.factory to create it:
aoaoat = client.factory.create('ArrayOfArrayOfAnyType')
To find out how to populate aoaoat, just print it:
print aoaoat
The output:
(ArrayOfArrayOfAnyType){
ArrayOfAnyType[] = <empty>
}
Repeating the same procedure for ArrayOfAnyType you get:
(ArrayOfAnyType){
anyType[] = <empty>
}
Putting it all together:
aoaoat = client.factory.create('ArrayOfArrayOfAnyType')
lst = aoaoat.ArrayOfAnyType = []
for L in [[1,2], [5,6]]:
aoat = client.factory.create('ArrayOfAnyType')
aoat.anyType = L
lst.append(aoat)
response = client.service.CalculateWeb2DObjectArray(1073757, aoaoat, 99)
print response
Request
DEBUG:suds.client:sending to (
http://204.9.76.243/nuCast.DataFeedService/Service1.asmx)
message:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:ns0="http://tempuri.org/"
xmlns:ns1="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Header/>
<ns1:Body>
<ns0:CalculateWeb2DObjectArray>
<ns0:HCID>1073757</ns0:HCID>
<ns0:jaggedobjDataMICRO>
<ns0:ArrayOfAnyType>
<ns0:anyType>1</ns0:anyType>
<ns0:anyType>2</ns0:anyType>
</ns0:ArrayOfAnyType>
<ns0:ArrayOfAnyType>
<ns0:anyType>5</ns0:anyType>
<ns0:anyType>6</ns0:anyType>
</ns0:ArrayOfAnyType>
</ns0:jaggedobjDataMICRO>
<ns0:numeratorID>99</ns0:numeratorID>
</ns0:CalculateWeb2DObjectArray>
</ns1:Body>
</SOAP-ENV:Envelope>
DEBUG:suds.client:headers = {
'SOAPAction': u'"http://tempuri.org/CalculateWeb2DObjectArray"',
'Content-Type': 'text/xml; charset=utf-8'}
Response
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<CalculateWeb2DObjectArrayResponse xmlns="http://tempuri.org/">
<CalculateWeb2DObjectArrayResult>
<ArrayOfAnyType>
<anyType>1</anyType>
<anyType>2</anyType>
</ArrayOfAnyType>
<ArrayOfAnyType>
<anyType>5</anyType>
<anyType>6</anyType>
</ArrayOfAnyType>
</CalculateWeb2DObjectArrayResult>
</CalculateWeb2DObjectArrayResponse>
</soap:Body>
</soap:Envelope>
Output
(ArrayOfArrayOfAnyType){
ArrayOfAnyType[] =
(ArrayOfAnyType){
anyType[] =
"1",
"2",
},
(ArrayOfAnyType){
anyType[] =
"5",
"6",
},
}

Categories

Resources