Unable to locate element based on absolute path in XML using Python

My XML file looks something like this:
<SCAN_LIST_OUTPUT>
<RESPONSE>
<DATETIME>2018-05-21T11:29:05Z</DATETIME>
<SCAN_LIST>
<SCAN>
<REF>scan/1526727908.25005</REF>
<TITLE><![CDATA[ACRS_Scan]]></TITLE>
<LAUNCH_DATETIME>2018-05-19T11:05:08Z</LAUNCH_DATETIME>
</SCAN>
<SCAN>
<REF>scan/1526549903.07613</REF>
<TITLE><![CDATA[testScan]]></TITLE>
<LAUNCH_DATETIME>2018-05-17T09:38:23Z</LAUNCH_DATETIME>
</SCAN>
</SCAN_LIST>
</RESPONSE>
</SCAN_LIST_OUTPUT>
When I try to find the REF element of the SCAN whose LAUNCH_DATETIME I know, using the path below, I get an "invalid predicate" error.
Here is my code:
import xml.etree.ElementTree as ET
tree = ET.ElementTree(ET.fromstring(response))
groot = tree.getroot()
path = './/REF[../LAUNCH_DATETIME="2018-05-19T11:05:08Z"]'
scan_id = tree.find(path)
Here is the traceback:
KeyError: ('.//REF[../LAUNCH_DATETIME="2018-05-19T11:05:08Z"]', None)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/doomsday/PycharmProjects/untitled/venv/ScanList.py", line 44, in <module>
scan_id = tree.find(path)
File "/usr/lib/python3.5/xml/etree/ElementTree.py", line 651, in find
return self._root.find(path, namespaces)
File "/usr/lib/python3.5/xml/etree/ElementPath.py", line 298, in find
return next(iterfind(elem, path, namespaces), None)
File "/usr/lib/python3.5/xml/etree/ElementPath.py", line 277, in iterfind
selector.append(ops[token[0]](next, token))
File "/usr/lib/python3.5/xml/etree/ElementPath.py", line 233, in prepare_predicate
raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate
When I evaluate the same path in an online XPath evaluator, I get the desired output, but in my code it fails. It would be great if anyone could tell me what the problem is and how to resolve it.
Thanks in advance.

ElementTree supports only a limited subset of XPath, and a predicate cannot contain a path step such as "..". Instead of trying to go back up the tree from REF, put the predicate on SCAN and then select its REF child.
Example...
path = './/SCAN[LAUNCH_DATETIME="2018-05-19T11:05:08Z"]/REF'
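As a minimal sketch of the suggestion above, the following uses only the standard library and a trimmed copy of the XML from the question. The response string is an assumed stand-in for the real API response; the path is the one from the answer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question; stands in for the real response.
response = """<SCAN_LIST_OUTPUT>
  <RESPONSE>
    <SCAN_LIST>
      <SCAN>
        <REF>scan/1526727908.25005</REF>
        <TITLE><![CDATA[ACRS_Scan]]></TITLE>
        <LAUNCH_DATETIME>2018-05-19T11:05:08Z</LAUNCH_DATETIME>
      </SCAN>
      <SCAN>
        <REF>scan/1526549903.07613</REF>
        <TITLE><![CDATA[testScan]]></TITLE>
        <LAUNCH_DATETIME>2018-05-17T09:38:23Z</LAUNCH_DATETIME>
      </SCAN>
    </SCAN_LIST>
  </RESPONSE>
</SCAN_LIST_OUTPUT>"""

root = ET.fromstring(response)

# Select the SCAN whose LAUNCH_DATETIME child has the known text, then its REF child.
path = './/SCAN[LAUNCH_DATETIME="2018-05-19T11:05:08Z"]/REF'
ref = root.find(path)

# find() returns None when nothing matches, so guard before reading .text.
scan_id = ref.text if ref is not None else None
print(scan_id)
```

The guard matters because an unknown LAUNCH_DATETIME yields None rather than an exception.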

Related

XPath SyntaxError: invalid predicate

I have a XML file like this:
$ cat sample.xml
<Requests>
<Request>
<ID>123</ID>
<Items>
<Item>a item</Item>
<Item>b item</Item>
<Item>c item</Item>
</Items>
</Request>
<Request>
<ID>456</ID>
<Items>
<Item>d item</Item>
<Item>e item</Item>
</Items>
</Request>
</Requests>
I simply want to extract the XML of the Request elements that have a certain value in their grandchild element Item. Here is the code:
bash-4.2$ cat xsearch.py
import sys
import xml.etree.ElementTree as ET
if __name__ == '__main__':
    tree = ET.parse(sys.argv[1])
    root = tree.getroot()
    for request in root.findall(".//Item[.='c item']/../.."):
    #for request in root.findall(".//Request[Items/Item = 'c item']"):
        print(request)
I get an "invalid predicate" error:
bash-4.2$ python3 xsearch.py sample.xml
Traceback (most recent call last):
File "/usr/lib64/python3.6/xml/etree/ElementPath.py", line 263, in iterfind
selector = _cache[cache_key]
KeyError: (".//Item[.='c item']/../..", None)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "xsearch.py", line 8, in <module>
for request in root.findall(".//Item[.='c item']/../.."):
File "/usr/lib64/python3.6/xml/etree/ElementPath.py", line 304, in findall
return list(iterfind(elem, path, namespaces))
File "/usr/lib64/python3.6/xml/etree/ElementPath.py", line 277, in iterfind
selector.append(ops[token[0]](next, token))
File "/usr/lib64/python3.6/xml/etree/ElementPath.py", line 233, in prepare_predicate
raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate
Could anyone point out what I got wrong?
In general, an XPath invalid predicate error means something is syntactically wrong with one of the XPath's predicates, the code between the [ and ].
Specifically in your case, there are two issues:
The SyntaxError("invalid predicate") is because there's an extra ) in the predicate:
for request in root.findall(".//Item[.='c item')]/../.."):
                                               ^
Note also that you can hoist the predicate to avoid navigating down and then back up (../..):
Instead of
.//Item[.='c item']/../..
consider
.//Request[Items/Item = 'c item']
to select the Request element with the targeted Item.
The XPath library you're using, ElementTree, is not a full implementation of the XPath standard. You can waste a lot of time trying to identify what ElementTree does support (".//Items[Item='c item']/.." happens to work here) and does not support, but it'd be better to just use a more compliant library such as lxml.
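To illustrate the ElementTree-compatible variant mentioned above, here is a small sketch using the sample XML from the question, embedded as a string for convenience. The hoisted form .//Request[Items/Item = 'c item'] puts a path inside the predicate, so it is meant for a full XPath 1.0 engine such as lxml's xpath() method; the version below stays within what ElementTree accepts.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, embedded as a string for convenience.
sample = """<Requests>
  <Request>
    <ID>123</ID>
    <Items>
      <Item>a item</Item>
      <Item>b item</Item>
      <Item>c item</Item>
    </Items>
  </Request>
  <Request>
    <ID>456</ID>
    <Items>
      <Item>d item</Item>
      <Item>e item</Item>
    </Items>
  </Request>
</Requests>"""

root = ET.fromstring(sample)

# Match Items by the text of an Item child, then step up once to the enclosing Request.
matches = root.findall(".//Items[Item='c item']/..")
matched_ids = [request.findtext("ID") for request in matches]
print(matched_ids)
```

Filtering on the child's text with [Item='c item'] avoids the [.='c item'] form, which older ElementTree versions reject.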

I am not able to resolve 'lxml.etree.XPathEvalError: Invalid expression' error on a legal XPATH expression

I am trying to evaluate an XPath expression, but it raises an "Invalid expression" error.
The code that should work:
x = tree.xpath("//description/caution[1]/preceding-sibling::*/name()!='warning'")
print(x)
The expected result is a boolean value, but I get this error:
Traceback (most recent call last):
File "poc_xpath2.0_v1.py", line 9, in <module>
x = tree.xpath("//description/caution[1]/preceding-sibling::*/name()!='warning'")
File "src\lxml\etree.pyx", line 2276, in lxml.etree._ElementTree.xpath
File "src\lxml\xpath.pxi", line 359, in lxml.etree.XPathDocumentEvaluator.__call__
File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid expression
The exception is because name() isn't a valid node type. Your XPath would only be valid as XPath 2.0 or greater. lxml only supports XPath 1.0.
You would need to move the name() != 'warning' into a predicate.
Also, if you want a True/False result, wrap the xpath in boolean()...
tree.xpath("boolean(//description/caution[1]/preceding-sibling::*[name()!='warning'])")
Full example...
from lxml import etree
xml = """
<doc>
<description>
<warning></warning>
<caution></caution>
</description>
</doc>"""
tree = etree.fromstring(xml)
x = tree.xpath("boolean(//description/caution[1]/preceding-sibling::*[name()!='warning'])")
print(x)
This would print False.

With Python 3 and lxml, how to extract the Version number from a SOAP WSDL?

When I test with a subset of the WSDL file, with namespaces omitted from both the file and the code, it works fine.
# for reference, these are the final lines from the WSDL
#
# <wsdl:service name="Shopping">
# <wsdl:documentation>
# <Version>1027</Version>
# </wsdl:documentation>
# <wsdl:port binding="ns:ShoppingBinding" name="Shopping">
# <wsdlsoap:address location="http://open.api.ebay.com/shopping"/>
# </wsdl:port>
# </wsdl:service>
#</wsdl:definitions>
from lxml import etree
wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
print(wsdl.findtext('wsdl:.//Version')) # wish this would print 1027
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 "/Users/matecsaj/Google Drive/Projects/collectibles/eBay/figure-it3.py"
Traceback (most recent call last):
File "src/lxml/_elementpath.py", line 79, in lxml._elementpath.xpath_tokenizer (src/lxml/_elementpath.c:2414)
KeyError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/matecsaj/Google Drive/Projects/collectibles/eBay/figure-it3.py", line 14, in <module>
print(wsdl.findtext('wsdl:.//Version')) # wish this would print 1027
File "src/lxml/etree.pyx", line 2230, in lxml.etree._ElementTree.findtext (src/lxml/etree.c:69049)
File "src/lxml/etree.pyx", line 1552, in lxml.etree._Element.findtext (src/lxml/etree.c:60629)
File "src/lxml/_elementpath.py", line 329, in lxml._elementpath.findtext (src/lxml/_elementpath.c:10089)
File "src/lxml/_elementpath.py", line 311, in lxml._elementpath.find (src/lxml/_elementpath.c:9610)
File "src/lxml/_elementpath.py", line 300, in lxml._elementpath.iterfind (src/lxml/_elementpath.c:9282)
File "src/lxml/_elementpath.py", line 277, in lxml._elementpath._build_path_iterator (src/lxml/_elementpath.c:8675)
File "src/lxml/_elementpath.py", line 82, in xpath_tokenizer (src/lxml/_elementpath.c:2542)
SyntaxError: prefix 'wsdl' not found in prefix map
Process finished with exit code 1
The XML declares namespaces, so to access an element you need to supply the namespace URI. Please see if the code below helps:
wsdlLink = "http://schemas.xmlsoap.org/wsdl/"
wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
print(wsdl.findtext('{'+wsdlLink+'}//Version'))
With credit to the kind folks that commented, here is a modified solution that does print the Version number. All I could get working was the wildcard search. Also, the iterator skipped the Version element, so I had to get at it from its parent element. Good enough.
from lxml import etree
wsdlLink = "http://schemas.xmlsoap.org/wsdl/"
wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
for element in wsdl.iter('{'+wsdlLink+'}*'):
    if 'documentation' in element.tag:
        for child in element:
            print(child.text)
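An alternative to the wildcard loop is to pass a prefix-to-URI map to the find methods. The sketch below uses the standard library's ElementTree on an assumed stand-in for the tail of the WSDL; lxml's findtext accepts the same namespaces argument. It assumes Version carries no namespace. If the real document declares a default namespace that Version inherits, that URI would have to be added to the map and used in the path as well.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the tail of the WSDL; Version is left un-namespaced here.
wsdl_text = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
  <wsdl:service name="Shopping">
    <wsdl:documentation>
      <Version>1027</Version>
    </wsdl:documentation>
  </wsdl:service>
</wsdl:definitions>"""

root = ET.fromstring(wsdl_text)

# Prefixes used in the path are resolved through this map, not through the document.
ns = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Descend to any wsdl:documentation element, then take its Version child's text.
version = root.findtext(".//wsdl:documentation/Version", namespaces=ns)
print(version)
```

The original error, "prefix 'wsdl' not found in prefix map", is what these methods raise when a prefixed path is used without such a map.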

PyXB - AssertionError: No element bindings in http://www.w3.org/1999/xhtml

I am attempting to generate bindings for a WSDL with PyXB, and it is giving the AssertionError exception in the title.
My understanding, based on the PyXB documentation, is that the bundle archive for http://www.w3.org/1999/xhtml is included with PyXB. However, something appears to be wrong. It either does not get used, or it has incorrect contents.
I use the following command line to attempt to generate the bindings:
python c:\Python27\Scripts\pyxbgen.py --wsdl-location=http://xx.xxx.xxx.xxx/YYY.asmx?WSDL --module=client --write-for-customization
The traceback:
Traceback (most recent call last):
File "c:\Python27\Scripts\pyxbgen.py", line 51, in <module> generator.resolveExternalSchema()
File "c:\Python27\lib\site-packages\pyxb\binding\generate.py", line 2647, in resolveExternalSchema
schema = converter(self, sl)
File "c:\Python27\Scripts\pyxbgen.py", line 28, in WSDLToSchema
spec = wsdl.definitions.createFromDOM(pyxb.utils.domutils.StringToDOM(xmld,
location_base=wsdl_uri), process_schema=True, generation_uid=generator.generationUID())
File "c:\Python27\lib\site-packages\pyxb\binding\basis.py", line 1767, in createFromDOM
return self._createFromDOM(node, expanded_name, **kw)
File "c:\Python27\lib\site-packages\pyxb\binding\basis.py", line 1791, in _createFromDOM
return element.CreateDOMBinding(node, self.elementForName(expanded_name), **kw)
File "c:\Python27\lib\site-packages\pyxb\binding\basis.py", line 1735, in elementForName
assert 'elementBinding' in elt_en.namespace()._categoryMap(), 'No element bindings in %s' % (elt_en.namespace(),)
AssertionError: No element bindings in http://www.w3.org/1999/xhtml
In addition, I set the PYXB_ARCHIVE_PATH environment variable to:
C:\Python27\Lib\site-packages\pyxb\bundles\common\raw
I am not sure if this is the correct way to do this. I also tried specifying the --archive-path command line option, but I got the same error back.
Probably you need to use:
--archive-path=${PYXB_ROOT}/pyxb/bundles/common//:+
as the argument. This recursively searches for available namespaces in the common bundles first, then includes any other search paths. There's an example in the manual that's close to this.

Wikipedia with Python

I have this very simple python code to read xml for the wikipedia api:
import urllib
from xml.dom import minidom
usock = urllib.urlopen("http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500")
xmldoc=minidom.parse(usock)
usock.close()
print xmldoc.toxml()
But this code returns with these errors:
Traceback (most recent call last):
File "/home/user/workspace/wikipediafoundations/src/list.py", line 5, in <module>
xmldoc=minidom.parse(usock)
File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
return expatbuilder.parse(file)
File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 928, in parse
result = builder.parseFile(file)
File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 207, in parseFile
parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: syntax error: line 1, column 62
I have no clue, as I am just learning Python. Is there a way to get a more detailed error? Does anyone know the solution? Also, please recommend a better language to do this in.
Thank You,
Venkat Rao
The URL you're requesting returns an HTML representation of the XML, not the XML itself:
http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500
So the XML parser fails. You can see this by pasting the URL above into a browser. Try adding format=xml to the end of the query string:
http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500&format=xml
This is documented on the API page:
http://en.wikipedia.org/w/api.php
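For readers on Python 3, the fix can be sketched as follows. The helper names build_links_url and fetch_links_xml are hypothetical, introduced only for illustration. The fetch function needs network access and is not called here, and the live endpoint may impose requirements (for example on request headers) that this sketch does not cover.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.dom import minidom

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def build_links_url(title, limit=500):
    """Build the query URL from the answer, including format=xml."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "links",
        "pllimit": limit,
        "format": "xml",  # ask for raw XML instead of the HTML preview
    }
    return API_ENDPOINT + "?" + urlencode(params)


def fetch_links_xml(title):
    """Fetch and parse the response (hypothetical helper; requires network access)."""
    with urlopen(build_links_url(title)) as response:
        return minidom.parse(response)


url = build_links_url("Fractal")
print(url)
```

Building the query string with urlencode keeps the parameters in one place and handles escaping for titles that contain spaces or other special characters.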
