Yahoo BOSS Python Library, ExpatError - python

I tried to install the Yahoo BOSS mashup framework, but am having trouble running the examples provided. Examples 1, 2, 5, and 6 work, but 3 & 4 give Expat errors. Here is the output from ex3.py:
gpython examples/ex3.py
examples/ex3.py:33: Warning: 'as' will become a reserved keyword in Python 2.6
Traceback (most recent call last):
File "examples/ex3.py", line 27, in <module>
digg = db.select(name="dg", udf=titlef, url="http://digg.com/rss_search?search=google+android&area=dig&type=both&section=news")
File "/usr/lib/python2.5/site-packages/yos/yql/db.py", line 214, in select
tb = create(name, data=data, url=url, keep_standards_prefix=keep_standards_prefix)
File "/usr/lib/python2.5/site-packages/yos/yql/db.py", line 201, in create
return WebTable(name, d=rest.load(url), keep_standards_prefix=keep_standards_prefix)
File "/usr/lib/python2.5/site-packages/yos/crawl/rest.py", line 38, in load
return xml2dict.fromstring(dl)
File "/usr/lib/python2.5/site-packages/yos/crawl/xml2dict.py", line 41, in fromstring
t = ET.fromstring(s)
File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 963, in XML
parser.feed(text)
File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 1245, in feed
self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0
It looks like both examples are failing when trying to query Digg.com. Here is the query that is constructed in ex3.py's code:
diggf = lambda r: {"title": r["title"]["value"], "diggs": int(r["diggCount"]["value"])}
digg = db.select(name="dg", udf=diggf, url="http://digg.com/rss_search?search=google+android&area=dig&type=both&section=news")

The problem is the Digg search string: the parameter should be "s=", not "search=".

I believe that must be an error in the example: it's getting a JSON result. Indeed, if you copy and paste that URL into your browser, you'll download a file named search.json, which starts with
{"results":[{"profile_image_url":
"http://a3.twimg.com/profile_images/255524395/KEN_OMALLEY_REVISED_normal.jpg",
"created_at":"Mon, 14 Sep 2009 14:52:07 +0000","from_user":"twilightlords",
i.e. perfectly normal JSON; but then instead of parsing it with modules such as json or simplejson, it tries to parse it as XML -- and obviously this attempt fails.
I believe the fix (which probably needs to be brought to the attention of whoever maintains that code so they can incorporate it) is either to ask for XML instead of JSON output, OR to parse the resulting JSON with appropriate means instead of trying to look at it as XML (not sure how to best implement either change, as I'm not familiar with that code).
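The second option can be sketched as follows. This is a minimal illustration in Python 3, not the BOSS library's own code: parse_payload is a hypothetical helper, and sniffing the first character is only a rough heuristic for telling JSON from XML.

```python
import json
import xml.etree.ElementTree as ET


def parse_payload(text):
    """Parse a response body as JSON or XML, based on its first character."""
    stripped = text.lstrip()
    if stripped.startswith(("{", "[")):
        # Looks like JSON: decode it with the json module.
        return "json", json.loads(stripped)
    # Otherwise assume XML and let ElementTree raise if it is malformed.
    return "xml", ET.fromstring(stripped)


kind, data = parse_payload('{"results": [{"from_user": "twilightlords"}]}')
print(kind, data["results"][0]["from_user"])
```

Checking the Content-Type header of the HTTP response, where available, would be a more reliable way to choose the parser than inspecting the body.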

Related

PyXB - AssertionError: No element bindings in http://www.w3.org/1999/xhtml

I am attempting to generate bindings for a WSDL with PyXB, and it is giving the AssertionError exception in the title.
My understanding, based on the PyXB documentation, is that the bundle archive for http://www.w3.org/1999/xhtml is included with PyXB. However, something appears to be wrong. It either does not get used, or it has incorrect contents.
I use the following command line to attempt to generate the bindings:
python c:\Python27\Scripts\pyxbgen.py --wsdl-location=http://xx.xxx.xxx.xxx/YYY.asmx?WSDL --module=client --write-for-customization
The traceback:
Traceback (most recent call last):
File "c:\Python27\Scripts\pyxbgen.py", line 51, in <module> generator.resolveExternalSchema()
File "c:\Python27\lib\site-packages\pyxb\binding\generate.py", line 2647, in resolveExternalSchema
schema = converter(self, sl)
File "c:\Python27\Scripts\pyxbgen.py", line 28, in WSDLToSchema
spec = wsdl.definitions.createFromDOM(pyxb.utils.domutils.StringToDOM(xmld,
location_base=wsdl_uri), process_schema=True, generation_uid=generator.generationUID())
File "c:\Python27\lib\site-packages\pyxb\binding\basis.py", line 1767, in createFromDOM
return self._createFromDOM(node, expanded_name, **kw)
File "c:\Python27\lib\site-packages\pyxb\binding\basis.py", line 1791, in _createFromDOM
return element.CreateDOMBinding(node, self.elementForName(expanded_name), **kw)
File "c:\Python27\lib\site-packages\pyxb\binding\basis.py", line 1735, in elementForName
assert 'elementBinding' in elt_en.namespace()._categoryMap(), 'No element bindings in %s' % (elt_en.namespace(),)
AssertionError: No element bindings in http://www.w3.org/1999/xhtml
In addition, I set the PYXB_ARCHIVE_PATH environment variable to:
C:\Python27\Lib\site-packages\pyxb\bundles\common\raw
I am not sure if this is the correct way to do this. I also tried specifying the --archive-path command line option, but I got the same error back.
Probably you need to use:
--archive-path=${PYXB_ROOT}/pyxb/bundles/common//:+
as the argument. This recursively searches for available namespaces in the common bundles first, then includes any other search paths. There's an example in the manual that's close to this.

Upper limit of fromstring function in ElementTree

I'm using Python 2.4 version on a Windows 32-bit PC. I'm trying to parse through a very large XML file using the ElementTree module. I downloaded version 1.2.6 of this module from effbot.org.
I followed the below code for my purpose:
import elementtree.ElementTree as ET
input = ''' 001 Chuck 009 Brent '''
stuff = ET.fromstring(input)
lst = stuff.findall("users/user")
print len(lst)
for item in lst:
    print item.attrib["x"]
item = lst[0]
ET.dump(item)
item.get("x") # get works on attributes
item.find("id").text
item.find("id").tag
for user in stuff.getiterator('user'):
    print "User" , user.attrib["x"]
    ET.dump(user)
If the content of input is too large, more than 10,000 lines, the fromstring function raises an error (below). Can anyone help me out in rectifying this error?
This is the error generated:
Traceback (most recent call last):
File "C:\Documents and Settings\hariprar\My Documents\My files\Python Try\xml_try1.py", line 16, in -toplevel-
stuff = ET.fromstring(input)
File "C:\Python24\Lib\site-packages\elementtree\ElementTree.py", line 1012, in XML
return api.fromstring(text)
File "C:\Python24\Lib\site-packages\elementtree\ElementTree.py", line 182, in fromstring
parser.feed(text)
File "C:\Python24\Lib\site-packages\elementtree\ElementTree.py", line 1292, in feed
self._parser.Parse(data, 0)
ExpatError: not well-formed (invalid token): line 2445, column 39
Take a look at the iterparse function. It will let you parse your input incrementally rather than reading it into memory as one big chunk.
It's described here: http://effbot.org/zone/element-iterparse.htm
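A minimal sketch of incremental parsing, written for Python 3 with the standard-library xml.etree.ElementTree rather than the standalone elementtree 1.2.6 package. The sample document and its x attribute values are made up for illustration, and iter_user_ids is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def iter_user_ids(source):
    """Yield the text of each <user>/<id> while streaming through the input."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "user":
            yield elem.findtext("id")
            elem.clear()  # drop the element's children and text to keep memory use low


# Hypothetical stand-in for the large file; a real script would pass a filename.
sample = io.BytesIO(
    b"<stuff><users>"
    b'<user x="1"><id>001</id><name>Chuck</name></user>'
    b'<user x="2"><id>009</id><name>Brent</name></user>'
    b"</users></stuff>"
)
print(list(iter_user_ids(sample)))
```

Note that iterparse still raises a parse error when it reaches input that is not well-formed, so the content at the reported line and column is worth inspecting as well.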

ExpatError: no element found - Python script

Using OS X 10.6.8, libxml 2-2.7.8, libxslt-1.1.26, and python 2.6, I'm trying to run the tumblrRestore.py script linked here:
https://github.com/hughsaunders/Tumblr-Restore/blob/master/tumblrRestore.py
It ran successfully and restored 76 posts before crashing.
However on second run I got an ExpatError: no element found, and have not been able to run it successfully since - it always produces this same error now. Error text:
Tumblr Restore
Traceback (most recent call last):
File "tumblrRestore.py", line 264, in <module>
cli.start()
File "tumblrRestore.py", line 232, in start
bp.parse()
File "tumblrRestore.py", line 51, in parse
postelement=ElementTree.fromstring(xml_string)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/etree/ElementTree.py", line 964, in XM
return parser.close()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/etree/ElementTree.py", line 1254, in close
self._parser.Parse("", 1) # end of data
xml.parsers.expat.ExpatError: no element found: line 1, column 0
I'm wondering whether I have the wrong or competing or outdated versions of python or lxml, though that still doesn't explain why the script ran successfully once.
Complete newbie, any advice appreciated.
Check the extract_xml_string method of the BackupParser class. It is almost certainly returning an empty string, because the begin_re regular expression doesn't match the XML header.
Try this one instead:
begin_re = re.compile("<\? xml .*\?>")
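To confirm that an empty string is what triggers "no element found", a small guard before parsing can help. This is a sketch in Python 3, not part of tumblrRestore.py; parse_post is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_post(xml_string):
    """Parse one post, failing early with a clear message if the text is empty."""
    if not xml_string or not xml_string.strip():
        raise ValueError("no XML text was extracted; check the begin/end patterns")
    return ET.fromstring(xml_string)


try:
    ET.fromstring("")
except ET.ParseError as exc:
    print(exc)  # the parser's own message for empty input

post = parse_post("<post id='1'><title>hello</title></post>")
print(post.findtext("title"))
```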

Python 3.x and TestLink xmlrpc

Thanks in advance for your help; I am new to Python 3.x.
When I try to use Python 3.x to talk to the TestLink XML-RPC server, I get the error below. The same code runs under Python 2.x. Any ideas?
import xmlrpc.client
server = xmlrpc.client.Server("http://172.16.29.132/SITM/lib/api/xmlrpc.php")  # here is my TestLink server
print(server.system.listMethods())  # I can print the methods list here
print(server.tl.ping())  # Got error.
Here is the error:
['system.multicall', 'system.listMethods', 'system.getCapabilities', 'tl.repeat', 'tl.sayHello', 'tl.ping', 'tl.setTestMode', 'tl.about', 'tl.checkDevKey', 'tl.doesUserExist', 'tl.deleteExecution', 'tl.getTestSuiteByID', 'tl.getFullPath', 'tl.getTestCase', 'tl.getTestCaseAttachments', 'tl.getFirstLevelTestSuitesForTestProject', 'tl.getTestCaseCustomFieldDesignValue', 'tl.getTestCaseIDByName', 'tl.getTestCasesForTestPlan', 'tl.getTestCasesForTestSuite', 'tl.getTestSuitesForTestSuite', 'tl.getTestSuitesForTestPlan', 'tl.getLastExecutionResult', 'tl.getLatestBuildForTestPlan', 'tl.getBuildsForTestPlan', 'tl.getTotalsForTestPlan', 'tl.getTestPlanPlatforms', 'tl.getProjectTestPlans', 'tl.getTestPlanByName', 'tl.getTestProjectByName', 'tl.getProjects', 'tl.addTestCaseToTestPlan', 'tl.assignRequirements', 'tl.uploadAttachment', 'tl.uploadTestCaseAttachment', 'tl.uploadTestSuiteAttachment', 'tl.uploadTestProjectAttachment', 'tl.uploadRequirementAttachment', 'tl.uploadRequirementSpecificationAttachment', 'tl.uploadExecutionAttachment', 'tl.createTestSuite', 'tl.createTestProject', 'tl.createTestPlan', 'tl.createTestCase', 'tl.createBuild', 'tl.setTestCaseExecutionResult', 'tl.reportTCResult']
Traceback (most recent call last):
File "F:\SQA\Python\Testlink\Test.py", line 5, in <module>
print (server.tl.ping())
File "C:\Python31\lib\xmlrpc\client.py", line 1029, in __call__
return self.__send(self.__name, args)
File "C:\Python31\lib\xmlrpc\client.py", line 1271, in __request
verbose=self.__verbose
File "C:\Python31\lib\xmlrpc\client.py", line 1070, in request
return self.parse_response(resp)
File "C:\Python31\lib\xmlrpc\client.py", line 1164, in parse_response
p.feed(response)
File "C:\Python31\lib\xmlrpc\client.py", line 454, in feed
self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: junk after document element: line 2, column 0
When I've seen this message before, it happened because the contents of the transported data weren't escaped for XML transport. The solution was to wrap the data in an XMLRPC Binary object.
In your case, you don't control the server side, so the above isn't a solution for you but it may suggest what the actual problem is.
Also, the Python 2 versus Python 3 difference suggests that there is a text/bytes issue at work.
To help diagnose the issue, set verbose=True so you can see the actual HTTP request/response headers and the XML request/response. That may show you what is at line 2, column 0. The issue may be that the PHP script is not wrapping binary data in base64 encoding, as required by the XMLRPC spec.
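A minimal sketch of that diagnosis in Python 3. The helper names make_verbose_proxy and xml_error are hypothetical, and the sample body passed to xml_error is invented to reproduce the same class of error; it is not the server's real output.

```python
import xmlrpc.client
from xml.parsers import expat


def make_verbose_proxy(url):
    """Return a proxy that prints HTTP headers and XML bodies for every call."""
    return xmlrpc.client.ServerProxy(url, verbose=True)


def xml_error(payload):
    """Return expat's error message for a captured response body, or None if it parses."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(payload, True)
    except expat.ExpatError as exc:
        return str(exc)
    return None


# Constructing the proxy does not contact the server; a call such as proxy.tl.ping() would.
proxy = make_verbose_proxy("http://172.16.29.132/SITM/lib/api/xmlrpc.php")

# Hypothetical captured body: a valid document followed by stray output on line 2.
print(xml_error("<methodResponse/>\n<extra/>"))
```

Pasting the response body printed by the verbose proxy into xml_error shows whether anything follows the closing tag of the XML document.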
Thank you. I checked every method in the list, and only 'tl.sayHello', 'tl.ping', and 'tl.about' have this problem. All three pass a string along with an empty PHP autoloader file (*.class.php) to the parser, while the other methods pass an XML document. So I gave up on those methods, and the script works fine.

Wikipedia with Python

I have this very simple python code to read xml for the wikipedia api:
import urllib
from xml.dom import minidom
usock = urllib.urlopen("http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500")
xmldoc=minidom.parse(usock)
usock.close()
print xmldoc.toxml()
But this code returns with these errors:
Traceback (most recent call last):
File "/home/user/workspace/wikipediafoundations/src/list.py", line 5, in <module>
xmldoc=minidom.parse(usock)
File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
return expatbuilder.parse(file)
File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 928, in parse
result = builder.parseFile(file)
File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 207, in parseFile
parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: syntax error: line 1, column 62
I have no clue, as I am just learning Python. Is there a way to get an error with more detail? Does anyone know the solution? Also, please recommend a better language to do this in.
Thank You,
Venkat Rao
The URL you're requesting returns an HTML representation of the XML, not the XML itself:
http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500
So the XML parser fails. You can see this by pasting the above in a browser. Try adding a format=xml at the end:
http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500&format=xml
as documented on the linked page:
http://en.wikipedia.org/w/api.php
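A minimal sketch of the corrected request in Python 3. build_query_url and link_titles are hypothetical helpers, and the sample response, including the pl element name, is a trimmed assumption about the API's XML output rather than a captured reply.

```python
from urllib.parse import urlencode
from xml.dom import minidom

API_URL = "http://en.wikipedia.org/w/api.php"


def build_query_url(title, limit=500):
    """Build the links query from the question, explicitly requesting XML output."""
    params = [
        ("action", "query"),
        ("titles", title),
        ("prop", "links"),
        ("pllimit", limit),
        ("format", "xml"),
    ]
    return API_URL + "?" + urlencode(params)


def link_titles(xml_text):
    """Extract the title attribute of every <pl> element in an XML response."""
    doc = minidom.parseString(xml_text)
    return [node.getAttribute("title") for node in doc.getElementsByTagName("pl")]


print(build_query_url("Fractal"))

# Hypothetical, heavily trimmed response used only to exercise the parser.
sample = '<api><query><pages><page><links><pl ns="0" title="Chaos theory"/></links></page></pages></query></api>'
print(link_titles(sample))
```

The URL returned by build_query_url can be passed to urllib.request.urlopen, and the response body to link_titles, in place of the minidom.parse(usock) call in the question.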
