Can't find a specific node/element using Python ElementTree

I am trying to parse the XML document below and only need one node from it: the text of serviceProfile. I'm banging my head against the desk here... I am new to Python.
<?xml version='1.0' encoding='UTF-8'?>
<soapenv:Envelope
xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<ns:getUserResponse
xmlns:ns="http://www.cisco.com/AXL/API/11.5">
<return>
<user uuid="{blbhbl-bhblb-kbhb}">
<firstName>fname</firstName>
<displayName>fname lname</displayName>
<middleName/>
<lastName>lname</lastName>
<userid>wooty</userid>
<password/>
<pin/>
<mailid>wooty#woot.com</mailid>
<department/>
<manager/>
<userLocale />
<associatedDevices/>
<primaryExtension/>
<associatedPc/>
<enableCti>false</enableCti>
<digestCredentials/>
<phoneProfiles/>
<defaultProfile/>
<presenceGroupName uuid="{sdsds-sdsds-sdsdsd-sdsdsd-sdsd}">Standard Presence group</presenceGroupName>
<subscribeCallingSearchSpaceName/>
<enableMobility>false</enableMobility>
<enableMobileVoiceAccess>false</enableMobileVoiceAccess>
<maxDeskPickupWaitTime>10000</maxDeskPickupWaitTime>
<remoteDestinationLimit>4</remoteDestinationLimit>
<associatedRemoteDestinationProfiles/>
<associatedTodAccess/>
<status>1</status>
<enableEmcc>false</enableEmcc>
<associatedCapfProfiles/>
<ctiControlledDeviceProfiles/>
<patternPrecedence />
<numericUserId />
<mlppPassword />
<customUserFields/>
<homeCluster>true</homeCluster>
<imAndPresenceEnable>true</imAndPresenceEnable>
<serviceProfile uuid="{dsdsdsd-sdsdsd-sdsd-sdsds-sdsds}">1 IM Presence Only</serviceProfile>
<lineAppearanceAssociationForPresences/>
<directoryUri>blah#wooty.com</directoryUri>
<telephoneNumber>555-555-5555</telephoneNumber>
<title/>
<mobileNumber/>
<homeNumber/>
<pagerNumber/>
<extensionsInfo/>
<selfService />
<userProfile/>
<calendarPresence>false</calendarPresence>
<ldapDirectoryName uuid="{sdsd-sdsdsd-sdsds-sdsds}">someinfo</ldapDirectoryName>
<userIdentity>blah#woot.com</userIdentity>
<nameDialing>blehWoot</nameDialing>
<ipccExtension/>
<convertUserAccount uuid="{sdsd-sdsdsd-sdsds-sdsds}">someinfo</convertUserAccount>
<enableUserToHostConferenceNow>false</enableUserToHostConferenceNow>
<attendeesAccessCode/>
</user>
</return>
</ns:getUserResponse>
</soapenv:Body>
</soapenv:Envelope>

Based on @danielHaley's suggestion, I wrote the following code to retrieve the node.
import xml.etree.ElementTree as ET
# Read the XML response and get the service profile
root = ET.fromstring(response.content)
serviceprofile = root.find(".//serviceProfile").text
That worked great. Thank you so much for your help.
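The lookup needs no namespace handling because serviceProfile carries no prefix and the document declares no default namespace, so the element is unqualified. A minimal sketch against a trimmed copy of the response; the `sample` string and the `get_service_profile` helper are illustrative names, not part of the original code:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the AXL response from the question (illustrative only).
sample = b"""<?xml version='1.0' encoding='UTF-8'?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<ns:getUserResponse xmlns:ns="http://www.cisco.com/AXL/API/11.5">
<return>
<user uuid="{blbhbl-bhblb-kbhb}">
<userid>wooty</userid>
<serviceProfile uuid="{dsdsdsd-sdsdsd-sdsd-sdsds-sdsds}">1 IM Presence Only</serviceProfile>
</user>
</return>
</ns:getUserResponse>
</soapenv:Body>
</soapenv:Envelope>"""


def get_service_profile(xml_bytes):
    """Return (text, uuid) of the first serviceProfile, or None if absent."""
    root = ET.fromstring(xml_bytes)
    node = root.find(".//serviceProfile")
    if node is None:  # find() returns None when nothing matches
        return None
    return node.text, node.get("uuid")


print(get_service_profile(sample))
```

Checking for None before reading .text avoids an AttributeError when the response contains no serviceProfile element.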

Related

CDATA sections and comments are lost when parsing XML with ElementTree

I am editing XML files and have run into a problem: when I change a file from a Python script, its structure is lost.
XML file:
<?xml version="1.0" encoding="UTF-8"?>
<main>
<element formatVersion="1.0">
<firstValue>firstText</firstValue>
<secondValue>secondText</secondValue>
<thirdValue>thirdText</thirdValue>
<errors>
<path><![CDATA[path]]></path>
<code_main />
</errors>
<reference>3</reference>
</element>
....
</main>
Using:
ET.parse(xml_file).write("test.xml", encoding='utf-8', xml_declaration=True)
I lose all comments in the file, and when I compare the original file with the modified one using diff (on Linux), the files are reported as completely different.
Is there a way to change the XML file (my task is to add a subelement to <element>) while leaving the overall structure of the file unchanged, including comments and element order?
The order and the comments are essential in this file.
Update:
After running the code above, the source XML is written out in the following form:
<?xml version='1.0' encoding='utf-8'?>
<main>
<element formatVersion="1.0">
<firstValue>firstText</firstValue>
<secondValue>secondText</secondValue>
<thirdValue>thirdText</thirdValue>
<errors>
<path>path</path>
<code_main />
</errors>
<reference>3</reference>
</element>
</main>
Note <path>: the CDATA section is gone.
Comments are not preserved either:
Source:
<main>
<element formatVersion="1.0">
<firstValue>firstText</firstValue>
<secondValue>secondText</secondValue>
<thirdValue>thirdText</thirdValue>
<errors>
<path><![CDATA[path]]></path>
<!--Stt-->
<code_main />
</errors>
<reference>3</reference>
</element>
</main>
Modified:
<main>
<element formatVersion="1.0">
<firstValue>firstText</firstValue>
<secondValue>secondText</secondValue>
<thirdValue>thirdText</thirdValue>
<errors>
<path>path</path>
<code_main />
</errors>
<reference>3</reference>
</element>
</main>
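One possible approach for the comments, assuming Python 3.8 or later: the standard-library TreeBuilder accepts insert_comments=True, which keeps comment nodes in the tree. This sketch does not keep the CDATA markers; ElementTree still writes that text as ordinary character data, which is equivalent for any XML parser but does change the bytes on disk. If the markers themselves must survive, lxml's XMLParser(strip_cdata=False) is worth checking against its documentation. The newValue tag below is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened version of the source file from the question.
source = """<main>
<element formatVersion="1.0">
<errors>
<path><![CDATA[path]]></path>
<!--Stt-->
<code_main />
</errors>
<reference>3</reference>
</element>
</main>"""

# insert_comments=True (Python 3.8+) makes the builder keep comment nodes.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(source, parser=parser)

# Add a subelement to <element>; "newValue" is a hypothetical tag name.
element = root.find("element")
new_child = ET.SubElement(element, "newValue")
new_child.text = "newText"

result = ET.tostring(root, encoding="unicode")
print(result)
```

Element order is preserved by ElementTree in any case; only comments, CDATA markers and some whitespace or quoting details differ from the original file.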

Parsing XML for specific item using ElementTree

I am making a request to the Salesforce merge API and getting a response like this:
xml_result = '''<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns="urn:partner.soap.sforce.com">
<soapenv:Header>
<LimitInfoHeader>
<limitInfo>
<current>62303</current>
<limit>2680000</limit><type>API REQUESTS</type></limitInfo>
</LimitInfoHeader>
</soapenv:Header>
<soapenv:Body>
<mergeResponse>
<result>
<errors>
<message>invalid record type</message>
<statusCode>INSUFFICIENT_ACCESS_ON_CROSS_REFERENCE_ENTITY</statusCode>
</errors>
<id>003skdjf494244</id>
<success>false</success>
</result>
</mergeResponse>
</soapenv:Body>
</soapenv:Envelope>'''
I'd like to be able to parse this response and if success=false, return the errors, statusCode, and the message text.
I've tried the following:
import xml.etree.ElementTree as ET
root = ET.fromstring(xml_result)
root.find('mergeResponse')
root.find('{urn:partner.soap.sforce.com}mergeResponse')
root.findtext('mergeResponse')
root.findall('{urn:partner.soap.sforce.com}mergeResponse')
...and a bunch of other variations of find, findtext and findall but I can't seem to get these to return any results. Here's where I get stuck. I've tried to follow the ElementTree docs, but I don't understand how to parse the tree for specific elements.

Element.find() finds the first child with a particular tag:
https://docs.python.org/2/library/xml.etree.elementtree.html#finding-interesting-elements
Since mergeResponse is a descendant of the root, not a direct child, you need XPath syntax in this case:
root.find('.//{urn:partner.soap.sforce.com}mergeResponse')
This returns your node. The .// prefix searches all descendants of the current node (in this case the root).
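Building on that, a minimal sketch of the full task from the question: return the statusCode and message text when success is false. The `sf` prefix, the `NS` mapping and the `merge_errors` helper are illustrative names; the prefix only has to map to the same namespace URI that the response declares as its default.

```python
import xml.etree.ElementTree as ET

# Shortened response from the question (header omitted).
xml_result = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns="urn:partner.soap.sforce.com">
<soapenv:Body>
<mergeResponse>
<result>
<errors>
<message>invalid record type</message>
<statusCode>INSUFFICIENT_ACCESS_ON_CROSS_REFERENCE_ENTITY</statusCode>
</errors>
<id>003skdjf494244</id>
<success>false</success>
</result>
</mergeResponse>
</soapenv:Body>
</soapenv:Envelope>"""

# Any prefix works here as long as it maps to the document's namespace URI.
NS = {"sf": "urn:partner.soap.sforce.com"}


def merge_errors(xml_text):
    """Return a list of (statusCode, message) for every failed result."""
    root = ET.fromstring(xml_text)
    failures = []
    for result in root.findall(".//sf:mergeResponse/sf:result", NS):
        if result.findtext("sf:success", namespaces=NS) == "false":
            for err in result.findall("sf:errors", NS):
                failures.append((
                    err.findtext("sf:statusCode", namespaces=NS),
                    err.findtext("sf:message", namespaces=NS),
                ))
    return failures


print(merge_errors(xml_result))
```

Passing a prefix mapping keeps the paths readable compared with repeating the {urn:partner.soap.sforce.com} form on every step.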

Append data dynamically to my xml SOAP message

I am calling an API by sending an XML request built with string formatting, like this:
data = '''<?xml version="1.0" encoding="utf-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SOAP-ENV:Body>
<ns2:MultiAvailabilityRequest xmlns:m="http://www.derbysoft.com/doorway" Password="CoolJoe" Token="{token}" UserName="CoolJoe">
<ns2:MultiAvailabilityCriteria NumberOfUnits="{units}">
<ns2:StayDateRange CheckIn="2016-05-02" CheckOut="2016-05-04"/>
<ns2:GuestCounts>
<ns2:GuestCount AdultCount="{adultcount}"/>
</ns2:GuestCounts>
<ns2:HotelCodes>
<ns2:HotelCode>{hotelcode}</ns2:HotelCode>
</ns2:HotelCodes>
</ns2:MultiAvailabilityCriteria>
</ns2:MultiAvailabilityRequest>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>'''.format(token=token, units=units, adultcount=adultcount, hotelcode=hotelcode)
The code above works fine: it fills in the hotel code, token and so on, and the results are returned based on them.
However, I have another requirement where there can be more than one hotel code (2, 3 or more). The required XML would then look like this:
data = '''<?xml version="1.0" encoding="utf-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SOAP-ENV:Body>
<ns2:MultiAvailabilityRequest xmlns:m="http://www.derbysoft.com/doorway" Password="CoolJoe" Token="{token}" UserName="CoolJoe">
<ns2:MultiAvailabilityCriteria NumberOfUnits="{units}">
<ns2:StayDateRange CheckIn="2016-05-02" CheckOut="2016-05-04"/>
<ns2:GuestCounts>
<ns2:GuestCount AdultCount="{adultcount}"/>
</ns2:GuestCounts>
<ns2:HotelCodes>
<ns2:HotelCode>{hotelcode1}</ns2:HotelCode>
<ns2:HotelCode>{hotelcode2}</ns2:HotelCode>
<ns2:HotelCode>{hotelcode3}</ns2:HotelCode>
</ns2:HotelCodes>
</ns2:MultiAvailabilityCriteria>
</ns2:MultiAvailabilityRequest>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>'''.format(token=token, units=units, adultcount=adultcount)
So my question is: how do I handle two or more hotel codes? As the second XML shows, each hotel code adds a new line like this:
<ns2:HotelCode>{hotelcode1}</ns2:HotelCode>
Any help would be appreciated. Thanks.

Basically, you should split the process into two parts:
1. Fill in the hotel codes (it doesn't matter whether there is one or more):
hotelcode_string = ''.join(['<ns2:HotelCode>{hotelcode}</ns2:HotelCode>'.format(hotelcode=code) for code in set([item["hotelcode"] for item in hotelcode])])
2. Put the hotel code section into the XML:
data = '''.... <ns2:HotelCodes>{hotelcode_string}</ns2:HotelCodes>
...'''.format(token=token, units=units, adultcount=adultcount, hotelcode_string=hotelcode_string)
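A self-contained sketch of the same two-step idea, under a few assumptions that are not in the answer: the codes arrive as a plain list of strings, duplicates should be dropped while keeping their original order (set() does not guarantee order), and values are escaped so that characters such as & cannot break the XML. `TEMPLATE` and `build_hotel_codes` are illustrative names.

```python
from xml.sax.saxutils import escape

# Only the relevant fragment of the request template is shown here.
TEMPLATE = "<ns2:HotelCodes>{hotelcode_string}</ns2:HotelCodes>"


def build_hotel_codes(codes):
    """Return one <ns2:HotelCode> element per unique code, in input order."""
    unique = dict.fromkeys(codes)  # drops duplicates, keeps first-seen order
    return "".join(
        "<ns2:HotelCode>{}</ns2:HotelCode>".format(escape(str(code)))
        for code in unique
    )


hotelcode_string = build_hotel_codes(["H1", "H2", "H1", "A&B"])
fragment = TEMPLATE.format(hotelcode_string=hotelcode_string)
print(fragment)
```

The same function covers the single-code case, so the template never needs to know how many codes there are.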

I want to remove the curly braces and XML namespace using lxml and just report the tag name

I have the following XML document (the real one is much longer):
<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0">
</error>
<product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
</product>
I use the following python to extract some of the tag names:
from lxml import etree
doc = etree.fromstring(resulttxt)
print(doc.attrib)
print(doc.tag)
print(doc[4][0][0].tag)
if doc[4][0][0].tag == 'field':
    print('hi')
What I'm getting though is:
{'version': '1.0'}
{http://www.filemaker.com/xml/fmresultset}fmresultset
{http://www.filemaker.com/xml/fmresultset}field
The xmlns doesn't show up as an attribute of the root tag, but it is there.
It is also placed in front of each tag name, which makes it difficult to loop through the elements and use conditionals. I want doc.tag to show just the tag, not the namespace plus the tag.
This is day 1 for me using this. Could anyone help out?

You need to handle namespaces; in your case it is a default (unprefixed) namespace:
from lxml import etree as ET
# Bytes literal: lxml rejects str input that carries an encoding declaration.
data = b"""<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0">
</error>
<product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
</product>
</fmresultset>
"""
# The prefix "myns" is arbitrary; only the URI has to match the document.
namespaces = {
    "myns": "http://www.filemaker.com/xml/fmresultset"
}
tree = ET.fromstring(data)
print(tree.find("myns:product", namespaces=namespaces).attrib.get("name"))
Prints:
FileMaker Web Publishing Engine
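To get just the tag name, as the question asks, the {namespace} prefix can be cut off the tag string. A minimal sketch using the standard-library ElementTree, which uses the same {uri}tag notation as lxml; `local_name` is an illustrative helper, not a library function. With lxml, etree.QName(element).localname should give the same result.

```python
import xml.etree.ElementTree as ET

# Shortened document from the question (declaration and DOCTYPE omitted).
data = """<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0"></error>
<product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518"></product>
</fmresultset>"""


def local_name(tag):
    """Drop a leading '{namespace}' from a tag; plain tags pass through."""
    return tag.rpartition("}")[2]


root = ET.fromstring(data)
names = [local_name(child.tag) for child in root]
print(local_name(root.tag), names)

# Conditionals can then compare against the bare name.
if local_name(root[1].tag) == "product":
    print("hi")
```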

How to remove all attributes of a tag

How can I remove all the attributes of an XML tag, so that I get from this:
<xml blah blah blah> to just <xml>?
With lxml I know I can remove the whole element, but I haven't found a way to remove only the attributes of a tag. (I found solutions on Stack Overflow for C#, but I want Python.)
I am opening a GPX (XML) file and this is my code so far (based on "How do I get the whole content between two xml tags in Python?"):
from lxml import etree
t = etree.parse("1.gpx")
e = t.xpath('//trk')[0]
print((e.text + ''.join(etree.tostring(child, encoding='unicode') for child in e)).strip())
Another approach I did was this:
from lxml import etree
TOPOGRAFIX_NS = './/{http://www.topografix.com/GPX/1/1}'
TRACKPOINT_NS = TOPOGRAFIX_NS + 'extensions/{http://www.garmin.com/xmlschemas/TrackPointExtension/v1}TrackPointExtension/{http://www.garmin.com/xmlschemas/TrackPointExtension/v1}'
doc1 = etree.parse("1.gpx")
for node1 in doc1.findall(TOPOGRAFIX_NS + 'trk'):
    node_to_string1 = etree.tostring(node1)
    print(node_to_string1)
But I get the trk tag with the TOPOGRAFIX_NS attributes, which I don't want, so I want to remove the attributes from the tag. I just want to get:
<trk> all the inside content </trk>
Thank you very much!
P.S. The content of the gpx file:
<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" creator="Endomondo.com" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.garmin.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd" xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<metadata>
<author>
<name>Blah Blah</name>
<email id="blah" domain="blah.com"/>
</author>
<link href="http://www.endomondo.com">
<text>Endomondo</text>
</link>
<time>2014-01-20T10:50:28Z</time>
</metadata>
<trk>
<name>Galati</name>
<src>http://www.endomondo.com/</src>
<link href="http://www.endomondo.com/workouts/260782567/13005122">
<text>Galati</text>
</link>
<type>MOUNTAIN_BIKING</type>
<trkseg>
<trkpt lat="45.431074" lon="28.021038">
<time>2013-10-20T05:49:04Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>
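A possible sketch, using the standard-library ElementTree on a shortened copy of the file. It assumes the unwanted "attributes" on the serialized trk are the xmlns declarations, which the serializer emits for any namespaced tag; rewriting the tags in the subtree without their namespace removes them. Real attributes can be dropped with attrib.clear(). Note that this modifies the parsed tree in place.

```python
import xml.etree.ElementTree as ET

# Shortened GPX file from the question.
gpx = """<gpx version="1.1" xmlns="http://www.topografix.com/GPX/1/1">
<trk>
<name>Galati</name>
<trkseg>
<trkpt lat="45.431074" lon="28.021038">
<time>2013-10-20T05:49:04Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>"""

GPX_NS = "{http://www.topografix.com/GPX/1/1}"

root = ET.fromstring(gpx)
trk = root.find(GPX_NS + "trk")

# Strip the namespace from every tag in the subtree (iter() includes trk),
# so the serializer has no xmlns declaration to write.
for node in trk.iter():
    node.tag = node.tag.rpartition("}")[2]

# Remove all real attributes of <trk> itself (it has none in this sample).
trk.attrib.clear()

trk_xml = ET.tostring(trk, encoding="unicode").strip()
print(trk_xml)
```

Attributes of child elements such as lat and lon on trkpt are left untouched, since only trk.attrib is cleared.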
