scrapy xpath exception

self.product_urls.extend(hxs.select("//div[@id="product-list"]//div[@class="product-images"]/table/tr[1]//a')").extract())
This line of code raises an "Invalid Path" exception. I guess something is wrong with "product-list".
How can I write the same @id without getting the error?

The problem is the mismatched quotes and the extra parenthesis: the XPath string opens with a double quote, contains double quotes of its own, and closes with ')". Here is the correct syntax:
self.product_urls.extend(hxs.select('//div[@id="product-list"]//div[@class="product-images"]/table/tr[1]//a').extract())
Google should be your best friend for this kind of issue; you should learn the basics of XPath and Python as well.
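
For illustration, the quoting rule can be checked without Scrapy. The sketch below is a stand-in under stated assumptions: it uses the standard library's xml.etree.ElementTree rather than hxs.select, the sample markup and href values are invented, and a leading "." is prepended because ElementTree only accepts relative paths. The only point is that single quotes around the expression leave the double quotes inside the predicates intact.

```python
import xml.etree.ElementTree as ET

# Invented sample markup, shaped to match the XPath from the question.
SAMPLE = """
<html><body>
  <div id="product-list">
    <div class="product-images">
      <table>
        <tr><td><a href="/p/1">one</a></td></tr>
        <tr><td><a href="/p/2">two</a></td></tr>
      </table>
    </div>
  </div>
</body></html>
""".strip()

# Single quotes delimit the Python string; double quotes stay inside the XPath.
XPATH = '//div[@id="product-list"]//div[@class="product-images"]/table/tr[1]//a'

root = ET.fromstring(SAMPLE)
# ElementTree needs a relative path, hence the "." prefix (not needed in Scrapy).
links = root.findall("." + XPATH)
hrefs = [a.get("href") for a in links]
print(hrefs)
```

Only the link in the first table row is selected, because of the tr[1] step.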

Related

Python - Socket Error 10054 - How to prevent terminal from preventing error?

Since it is not an error that stops execution, I am not sure what my options are to keep it from popping up. I do not think it matters which part of my code causes the error, if there is some universal command that suppresses this error line from printing.
The script simply uses whois to determine whether a domain is registered. I was doing a basic test of the top 1,000 English words to see if their .com domains were taken.
Here is my code:
for url in wordlist:
    try:
        domain = whois.whois(url)
        boom.write( ("%s,%s,%s\r\n"% \
            (str(number), url, "TAKEN")).encode('UTF-8'))
    except:
        boom.write( ("%s,%s,%s\r\n"% \
            (str(number), url, "NOT TAKEN")).encode('UTF-8'))

A bit hard to know for sure without your code, but wrap the section that's generating the error like this:
try:
    # Your error-generating code
except:
    pass
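
A bare except also hides every other failure, so one possible refinement is to catch the network error separately. The following is only a sketch under assumptions: lookup is a hypothetical stand-in for whois.whois, fake_lookup and the "UNKNOWN" status are invented for the demonstration, and it assumes Python 3, where a reset connection is raised as a ConnectionError subclass. It will not silence text that a library prints on its own.

```python
def classify(words, lookup):
    """Return (word, status) pairs; `lookup` is a hypothetical stand-in for whois.whois."""
    results = []
    for word in words:
        try:
            lookup(word)
            status = "TAKEN"
        except ConnectionError:
            # Network-level failure (a reset connection is one example):
            # the registration state is unknown, so do not guess.
            status = "UNKNOWN"
        except Exception:
            # Mirrors the question's assumption that any other failure
            # means the domain is not registered.
            status = "NOT TAKEN"
        results.append((word, status))
    return results


def fake_lookup(word):
    # Invented behaviour, for demonstration only.
    if word == "reset":
        raise ConnectionResetError(10054, "connection reset by peer")
    if word == "free":
        raise ValueError("no match")
    return {"domain": word + ".com"}


statuses = classify(["taken", "free", "reset"], fake_lookup)
print(statuses)
```

Keeping the network case apart means a dropped connection is not silently recorded as "NOT TAKEN".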

Python Xpath Contains %s in Selenium

I know this must be really, really simple but I can't quite figure out the syntax for this. I thought I understood but apparently I don't.
The XPath here works and if I use 'Wednesday' instead of '%s' it works fine but I'm trying to feed the XPath a variable string with this:
texttolookfor = "Wednesday"
buttonelement= webDriver.find_element_by_xpath("//tr[contains(., '%s')]/td[@class='button-cell']" % texttolookfor)
Which gives me:
<class 'selenium.common.exceptions.NoSuchElementException'>
But this works:
buttonelement= webDriver.find_element_by_xpath("//tr[contains(., 'Wednesday')]/td[@class='button-cell']")
So I've clearly misunderstood something or made a mistake feeding the string in. I've been trying variations and looking at sample code, but as far as my understanding goes this should have worked. I'd appreciate it if anyone could shed some light.
As asked in the comments I ran:
print "//tr[contains(., '%s')]/td[@class='button-cell']" % texttolookfor
which produced:
//tr[contains(., ' Wednesday')]/td[@class='button-cell']
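
The printed expression shows a space before Wednesday, which suggests the variable actually held ' Wednesday' rather than the literal shown earlier. A minimal sketch of one way to guard against that, assuming stray whitespace really is the cause (row_button_xpath is a hypothetical helper, and no browser is involved here):

```python
def row_button_xpath(text):
    """Build the row XPath from the question, trimming surrounding whitespace first."""
    return "//tr[contains(., '%s')]/td[@class='button-cell']" % text.strip()


# A value with a leading space, as the printed output above suggests.
texttolookfor = " Wednesday"
xpath = row_button_xpath(texttolookfor)
print(xpath)
```

The resulting string can then be passed to find_element_by_xpath exactly as in the question.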

python lxml xpath AttributeError (NoneType) with correct xpath and usually working

I am trying to migrate a forum to phpbb3 with python/xpath. Although I am pretty new to python and xpath, it is going well. However, I need help with an error.
(The source file has been downloaded and processed with tagsoup.)
Firefox/Firebug show xpath: /html/body/table[5]/tbody/tr[position()>1]/td/a[3]/b
(in my script without tbody)
Here is an abbreviated version of my code:
forumfile="morethread-alte-korken-fruchtweinkeller-89069-6046822-0.html"
XPOSTS = "/html/body/table[5]/tr[position()>1]"
t = etree.parse(forumfile)
allposts = t.xpath(XPOSTS)
XUSER = "td[1]/a[3]/b"
XREG = "td/span"
XTIME = "td[2]/table/tr/td[1]/span"
XTEXT = "td[2]/p"
XSIG = "td[2]/i"
XAVAT = "td/img[last()]"
XPOSTITEL = "/html/body/table[3]/tr/td/table/tr/td/div/h3"
XSUBF = "/html/body/table[3]/tr/td/table/tr/td/div/strong[position()=1]"
for p in allposts:
    unreg=0
    username = None
    username = p.find(XUSER).text #this is where it goes haywire
When the loop hits user "tompson" / position()=11 at the end of the file, I get
AttributeError: 'NoneType' object has no attribute 'text'
I've tried a lot of try except else finallys, but they weren't helpful.
I am getting much more information later in the script such as date of post, date of user registry, the url and attributes of the avatar, the content of the post...
The script works for hundreds of other files/sites of this forum.
This is no encode/decode problem. And it is not "limited" to the XUSER part. I tried to "hardcode" the username, then the date of registry will fail. If I skip those, the text of the post (code see below) will fail...
#text of getpost
text = etree.tostring(p.find(XTEXT),pretty_print=True)
Now, this whole error would make sense if my XPath were wrong. However, all the other files and the first several users in this file work; it is only this one at position()=11.
Is position() incapable of going above 10? I don't think so.
Am I missing something?

Question answered!
I have found the answer...
I must have been very tired when I tried to fix it and came here to ask for help. I did not see something quite obvious...
The way I posted my problem, it was not visible either.
The HTML I downloaded and processed with tagsoup had an additional tag at position 11. It was not visible on the website, and it threw off my XPath.
(It is probably poor HTML generated by the forum, combined with tagsoup's attempt to make it parseable.)
Out of more than 20,000 files, fewer than 20 are affected; this one just happened to be the first.
Additionally, the information is sometimes in table[4] and other times in table[5]. I had accounted for this with a function that determines the correct table. Although I tested that function a lot and thought it worked correctly (hence I did not include it above), it did not.
So I made a better XPath:
'/html/body/table[tr/td[@width="20%"]]/tr[position()>1]'
Although this is unrelated, I also ran into another problem with an unexpected encoding in the HTML file (not UTF-8), which was fixed by adding:
parser = etree.XMLParser(encoding='ISO-8859-15')
t = etree.parse(forumfile, parser)
I am now confident that, after adjusting for the strange additional and duplicated tags, my code will work on all files.
Still, I will be looking into lxml.html, as I mentioned in the comment. I have never used it before, but if it is more robust and allows using the files without tagsoup, it might be a better fit and save me the extensive try/except statements and loops needed to fix the few files that break my current script.
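
The AttributeError in the question arises because find() returns None when a row does not contain the expected path, and None has no .text. A minimal sketch of checking for that case instead of wrapping each lookup in try/except follows. It rests on assumptions: the standard library's xml.etree.ElementTree stands in for lxml here (find() returns None on a miss in both), and the sample rows and the usernames helper are invented.

```python
import xml.etree.ElementTree as ET

# Invented rows: the second one lacks the <b> element the path expects.
SAMPLE = """
<table>
  <tr><td><a>x</a><a>y</a><a><b>alice</b></a></td></tr>
  <tr><td><a>x</a><a>y</a><a>no bold tag</a></td></tr>
</table>
""".strip()

XUSER = "td[1]/a[3]/b"  # same relative path as in the question


def usernames(table):
    """Collect the username per row, or None where the path does not match."""
    names = []
    for row in table.findall("tr"):
        node = row.find(XUSER)
        # find() returns None on a miss, so test before touching .text.
        names.append(node.text if node is not None else None)
    return names


result = usernames(ET.fromstring(SAMPLE))
print(result)
```

Rows that come back as None can then be logged and inspected by hand, which is how an unexpected extra tag like the one described above would show up.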

PyObjC giving strange error - [OC_PythonUnicode representations]: unrecognized selector sent to instance 0x258ae2a0

I have this line:
NSWorkspace.sharedWorkspace().setIcon_forFile_options_(unicode(icon),unicode(target),0)
Why does it give that error and how do I fix it?
Thank you.

I misread the documentation. I need to do this:
NSWorkspace.sharedWorkspace().setIcon_forFile_options_(NSImage.alloc().initWithContentsOfFile_(icon),target,0)
Unfortunately the error is what confused me.

Python in Plone: trying to append a variable to RESPONSE.redirect

I have a Python script in Plone and I'm having trouble appending a variable to RESPONSE.redirect. I get an invalid syntax error.
test = '1000'
RESPONSE.redirect(("/Plone/user_blast/public_blast_results/%s" % (test))

It's me being stupid: there's an extra opening bracket after redirect.
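
One way to make this kind of mismatch harder to introduce is to build the URL on its own line first. A small sketch, assuming the same path as in the question; the variable name url is invented, and the redirect call is left as a comment because RESPONSE only exists inside the Plone script:

```python
test = '1000'

# Format the target separately, so the call itself needs only one pair of parentheses.
url = "/Plone/user_blast/public_blast_results/%s" % test
print(url)

# Inside the Plone script the result would then be passed on:
# RESPONSE.redirect(url)
```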
