While using mechanize to open and process a lot of pages (1000+) on a website, I have hit a strange problem. Every now and then I get stuck trying to load a page, without ever timing out. The problem doesn't seem to be page-specific: if I run the script again and open the same page, it works like a charm. It appears to happen at random.
I'm using this function to open pages:
def openMechanize(br, url):
    while True:
        try:
            print time.localtime()
            print "opening: " + url
            resp = br.open(url, timeout = 2.5)
            print "done\n"
            return resp
        except Exception, errormsg:
            print repr(errormsg)
            print "failed to load page, retrying"
            time.sleep(0.5)
When it gets stuck, it makes the first prints (the current time and the "opening" line) but never gets to the "done" print. I have let it run for hours, but nothing happens.
When I interrupt the script with Ctrl+C while it is stuck, I get the following output:
File "test.py", line 143, in openMechanize
resp = br.open(url, timeout = 2.5)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 230, in _mech_open
response = UserAgentBase.open(self, request, data)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_opener.py", line 193, in open
response = urlopen(self, req, data)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 344, in _open
'_open', req)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
result = func(*args)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1142, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1116, in do_open
r = h.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python2.7/socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
KeyboardInterrupt
Upon inspecting socket.py, where it gets stuck, I see the following:
self._rbuf = StringIO()  # reset _rbuf. we consume it via buf.
while True:
    try:
        data = self._sock.recv(self._rbufsize)
    except error, e:
        if e.args[0] == EINTR:
            continue
        raise
It looks like it gets stuck in this loop, with recv blocking indefinitely for some reason.
Has anyone experienced this error and found some sort of fix?
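Not part of the original question, but one commonly suggested workaround when a recv() blocks past the library's own timeout is a hard watchdog around the call using SIGALRM (Unix-only). A minimal sketch, assuming the same br and url as in openMechanize above:

import signal

class HardTimeout(Exception):
    pass

def _alarm_handler(signum, frame):
    raise HardTimeout("request exceeded hard timeout")

def open_with_watchdog(br, url, hard_timeout=30):
    # Unix-only: SIGALRM interrupts a recv() that the mechanize
    # timeout apparently does not cover.
    old_handler = signal.signal(signal.SIGALRM, _alarm_handler)
    signal.alarm(hard_timeout)
    try:
        return br.open(url, timeout = 2.5)
    finally:
        signal.alarm(0)                             # cancel the pending alarm
        signal.signal(signal.SIGALRM, old_handler)  # restore the old handler

openMechanize could then call open_with_watchdog and treat HardTimeout like any other retryable exception.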
Related
I need to parse news from several JavaScript-heavy sites, and I use Selenium + PhantomJS for it. But there are videos on these sites which are useless to me; I don't need them at all. (I was advised to use Selenium + Chrome or Selenium + Firefox, but I don't want any windows opening during parsing.)
These videos start playing automatically according to the sites' logic, and in the end the exception http.client.RemoteDisconnected: Remote end closed connection without response is thrown.
I think it is thrown because my internet connection is very slow, so the videos can't be fully loaded.
How can I avoid this problem?
Maybe there are content constraints in Selenium or PhantomJS?
Full traceback:
File "viralnova/viralnova.py", line 101, in parse_viralnova
_parse_post_link(postlinktest, driver)
File "viralnova/viralnova.py", line 9, in _parse_post_link
driver.get(post_link)
File "/Users/user/anaconda/envs/env/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 309, in get
self.execute(Command.GET, {'url': url})
File "/Users/user/anaconda/envs/env/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 295, in execute
response = self.command_executor.execute(driver_command, params)
File "/Users/user/anaconda/envs/env/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py", line 464, in execute
return self._request(command_info[0], url, body=data)
File "/Users/user/anaconda/envs/env/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py", line 526, in _request
resp = opener.open(request, timeout=self._timeout)
File "/Users/user/anaconda/envs/env/lib/python3.6/urllib/request.py", line 526, in open
response = self._open(req, data)
File "/Users/user/anaconda/envs/env/lib/python3.6/urllib/request.py", line 544, in _open
'_open', req)
File "/Users/user/anaconda/envs/env/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/Users/user/anaconda/envs/env/lib/python3.6/urllib/request.py", line 1346, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/Users/user/anaconda/envs/env/lib/python3.6/urllib/request.py", line 1321, in do_open
r = h.getresponse()
File "/Users/user/anaconda/envs/env/lib/python3.6/http/client.py", line 1331, in getresponse
response.begin()
File "/Users/user/anaconda/envs/env/lib/python3.6/http/client.py", line 297, in begin
version, status, reason = self._read_status()
File "/Users/user/anaconda/envs/env/lib/python3.6/http/client.py", line 266, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
Here is the code:
from bs4 import BeautifulSoup as Soup
from selenium import webdriver

def _parse_post_link(post_link, driver):
    try:
        driver.get(post_link)
    except Exception:
        return None
    post_page_soup = Soup(driver.page_source, "lxml")
    title = post_page_soup.find('div', attrs={'class': 'post-box-detail article'}).h2.text
    print(title)

def parse_viralnova(to_csv=True):
    driver = webdriver.PhantomJS("/Users/user/.phantomjsdriver/phantomjs")
    postlinktest = 'http://www.viralnova.com/restroom-design-fails/'
    _parse_post_link(postlinktest, driver)
If it's just the text content you're after, you might consider using plain Python and BeautifulSoup (see the sketch below). You wouldn't be triggering anything in a browser this way, since you won't be using one at all (you mentioned you don't need windows opening), and the solution will be faster without the browser overhead.
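A minimal sketch of that approach, assuming requests as the HTTP client (my choice, not something the answer specifies) and reusing the selector from the question. Note that if this part of the page is only rendered by JavaScript, a plain HTTP fetch won't see it:

import requests
from bs4 import BeautifulSoup

resp = requests.get('http://www.viralnova.com/restroom-design-fails/', timeout=10)
soup = BeautifulSoup(resp.text, "lxml")
# Same selector as in _parse_post_link above.
post = soup.find('div', attrs={'class': 'post-box-detail article'})
if post is not None:
    print(post.h2.text)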
If you do need some JavaScript loaded, you can try using dryscrape as well.
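A minimal dryscrape sketch, using the question's test URL; dryscrape needs its webkit-server backend installed, and its headless WebKit has the same general limitations as PhantomJS:

import dryscrape
from bs4 import BeautifulSoup

session = dryscrape.Session()
session.visit('http://www.viralnova.com/restroom-design-fails/')  # URL from the question
soup = BeautifulSoup(session.body(), "lxml")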
While scraping a webpage for data, I check the current URL to make sure I'm on the expected page. However, an error is eventually raised, and it seems to happen when checking the URL. I can't figure out why, and when it happens isn't consistent: sometimes it's several pages into the script, sometimes only a few pages in.
Traceback (most recent call last):
File "scrape.py", line 5, in <module>
scraper.start_search("ebook")
File "/home/ubuntu/workspace/scraper/school/scraper.py", line 56, in start_search
self.scrape_item(product_el)
File "/home/ubuntu/workspace/scraper/school/scraper.py", line 97, in scrape_item
if self.driver.current_url.split("/")[3] != "search":
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 493, in current_url
return self.execute(Command.GET_CURRENT_URL)['value']
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 415, in execute
return self._request(command_info[0], url, body=data)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 489, in _request
resp = opener.open(request, timeout=self._timeout)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>
The seemingly relevant code is just:
if self.driver.current_url.split("/")[3] != "search":
    time.sleep(random.randint(1, 3))
    self.driver.back()
I'm using Python 2.7, Selenium, and PhantomJS.
I don't know why this is happening, though I have also seen current_url be flaky. Have you tried mitigating this with some exception handling?
import random
import time

from retry import retry
from urllib2 import URLError

@retry(URLError, tries=3)
def get_url(driver):
    return driver.current_url

def main():
    # Whatever setup you have goes here
    # <...>
    if get_url(driver).split("/")[3] != "search":
        time.sleep(random.randint(1, 3))
        driver.back()

if __name__ == "__main__":
    main()
The retry package is available from PyPI (pip install retry).
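If you would rather not add a dependency, a hand-rolled equivalent might look like this (a sketch, assuming the same driver object):

import time
from urllib2 import URLError

def get_url_with_retry(driver, tries=3, delay=1):
    # Hand-rolled version of what the @retry decorator does above.
    for attempt in range(tries):
        try:
            return driver.current_url
        except URLError:
            if attempt == tries - 1:
                raise  # out of retries, re-raise the last error
            time.sleep(delay)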
When using Python 2.7 with urllib2 to retrieve data from an API, I get the error [Errno 104] Connection reset by peer. What's causing the error, and how should the error be handled so that the script does not crash?
ticker.py
import httplib
import urllib2

def urlopen(url):
    response = None
    request = urllib2.Request(url=url)
    try:
        response = urllib2.urlopen(request).read()
    except urllib2.HTTPError as err:
        print "HTTPError: {} ({})".format(url, err.code)
    except urllib2.URLError as err:
        print "URLError: {} ({})".format(url, err.reason)
    except httplib.BadStatusLine as err:
        print "BadStatusLine: {}".format(url)
    return response

def get_rate(from_currency="EUR", to_currency="USD"):
    url = "https://finance.yahoo.com/d/quotes.csv?f=sl1&s=%s%s=X" % (
        from_currency, to_currency)
    data = urlopen(url)
    if "%s%s" % (from_currency, to_currency) in data:
        return float(data.strip().split(",")[1])
    return None

counter = 0
while True:
    counter = counter + 1
    if counter == 0 or counter % 10:
        rateEurUsd = float(get_rate('EUR', 'USD'))
        # does more stuff here
Traceback
Traceback (most recent call last):
File "/var/www/testApp/python/ticker.py", line 71, in <module>
rateEurUsd = float(get_rate('EUR', 'USD'))
File "/var/www/testApp/python/ticker.py", line 29, in get_exchange_rate
data = urlopen(url)
File "/var/www/testApp/python/ticker.py", line 16, in urlopen
response = urllib2.urlopen(request).read()
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 438, in error
result = self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 625, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 438, in error
result = self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 625, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1180, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline()
File "/usr/lib/python2.7/socket.py", line 447, in readline
data = self._sock.recv(self._rbufsize)
socket.error: [Errno 104] Connection reset by peer
error: Forever detected script exited with code: 1
"Connection reset by peer" is the TCP/IP equivalent of slamming the phone back on the hook. It's more polite than merely not replying, leaving one hanging. But it's not the FIN-ACK expected of the truly polite TCP/IP converseur. (From other SO answer)
So you can't do anything about it, it is the issue of the server.
But you could use try .. except block to handle that exception:
from socket import error as SocketError
import errno

try:
    response = urllib2.urlopen(request).read()
except SocketError as e:
    if e.errno != errno.ECONNRESET:
        raise  # Not an error we are looking for
    pass  # Handle the error here.
You can try adding some time.sleep calls to your code.
It seems like the server side limits the number of requests per time unit (hour, day, second) as a security measure. You need to guess how many are allowed (maybe using another script with a counter?) and adjust your script so it stays under this limit.
To keep your code from crashing, try to catch this error with try..except around the urllib2 calls, as in the sketch below.
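A sketch combining both suggestions (throttling plus a retry on connection resets), reusing the urlopen() helper from ticker.py above; the retry count and delay are illustrative, not recommendations from the answer:

import errno
import time
from socket import error as SocketError

def fetch_with_retry(url, tries=3, delay=2):
    for attempt in range(tries):
        try:
            return urlopen(url)  # the urlopen() helper from ticker.py
        except SocketError as e:
            if e.errno != errno.ECONNRESET:
                raise
            time.sleep(delay)  # back off before retrying
    return None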
In Python 3 there is a way to catch the error directly in the except clause with ConnectionResetError, which makes it easier to isolate the exact error.
This example also catches the timeout.
from urllib.request import urlopen
from socket import timeout

url = "http://......"
try:
    string = urlopen(url, timeout=5).read()
except ConnectionResetError:
    print("==> ConnectionResetError")
except timeout:
    print("==> Timeout")
There are two solutions you can try.

1. You are requesting too frequently. Try sleeping after each request:

time.sleep(1)

2. The server detects that the requesting client is Python and rejects the request. Add a User-Agent header to handle this:
import requests

headers = {
    "Content-Type": "application/json;charset=UTF-8",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"
}
try:
    res = requests.post("url", json=req, headers=headers)
except Exception as e:
    print(e)
The second solution saved me.
I'm using Selenium with Python bindings to scrape AJAX content from a web page with headless Firefox. It works perfectly when run on my local machine. When I run the exact same script on my VPS, errors get thrown on seemingly random (yet consistent) lines. My local and remote systems have the same exact OS/architecture, so I'm guessing the difference is VPS-related.
For each of these tracebacks, the line is run 4 times before an error is thrown.
I most often get this URLError when executing JavaScript to scroll an element into view.
File "google_scrape.py", line 18, in _get_data
driver.execute_script("arguments[0].scrollIntoView(true);", e)
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 396, in execute_script
{'script': script, 'args':converted_args})['value']
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 162, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 355, in execute
return self._request(url, method=command_info[0], data=data)
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 402, in _request
response = opener.open(request)
File "/usr/lib64/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>
Occasionally I'll get this BadStatusLine when reading text from an element.
File "google_scrape.py", line 19, in _get_data
if e.text.strip():
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 55, in text
return self._execute(Command.GET_ELEMENT_TEXT)['value']
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 233, in _execute
return self._parent.execute(command, params)
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 162, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 355, in execute
return self._request(url, method=command_info[0], data=data)
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 402, in _request
response = opener.open(request)
File "/usr/lib64/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1187, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib64/python2.7/httplib.py", line 1045, in getresponse
response.begin()
File "/usr/lib64/python2.7/httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.7/httplib.py", line 373, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
A couple times I've gotten a socket error:
File "google_scrape.py", line 19, in _get_data
if e.text.strip():
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 55, in text
return self._execute(Command.GET_ELEMENT_TEXT)['value']
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 233, in _execute
return self._parent.execute(command, params)
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 162, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 355, in execute
return self._request(url, method=command_info[0], data=data)
File "/home/ryne/.virtualenvs/DEV/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 402, in _request
response = opener.open(request)
File "/usr/lib64/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1187, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib64/python2.7/httplib.py", line 1045, in getresponse
response.begin()
File "/usr/lib64/python2.7/httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib64/python2.7/socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
socket.error: [Errno 104] Connection reset by peer
I'm scraping from Google without a proxy, so my first thought was that my IP address is recognized as a VPS and put under some five-request page-manipulation limit or something. But my initial research indicates that these errors would not arise from being blocked.
Any insight into what these errors mean collectively, or on the necessary considerations when making HTTP requests from a VPS would be much appreciated.
Update
After a little thinking, and looking into what a webdriver really is -- automated browser input -- I realized I should have been asking why remote_connection.py is making urllib2 requests at all. It would seem that the text method of the WebElement class is an "extra" feature of the Python bindings that isn't part of the Selenium core. That doesn't explain the above errors, but it may indicate that the text method shouldn't be used for scraping.
Update 2
I realized that, for my purposes, Selenium's only function is getting the ajax content to load. So after the page loads, I'm parsing the source with lxml rather than getting elements with Selenium, i.e.:
html = lxml.html.fromstring(driver.page_source)
However, page_source is yet another method that results in a call to urllib2, and I consistently get the BadStatusLine error the second time I use it. Minimizing urllib2 requests is definitely a step in the right direction.
Update 3
Eliminating urllib2 requests by grabbing the source with javascript is better yet:
html = lxml.html.fromstring(driver.execute_script("return window.document.documentElement.outerHTML"))
Conclusion
These errors can be avoided by doing a time.sleep(10) between every few requests. The best explanation I've come up with is that Google's firewall recognizes my IP as a VPS and therefore puts it under a stricter set of blocking rules.
This was my initial thought, but I still find it hard to believe because my web searches return no indication that the above errors could be caused by a firewall.
If this is the case though, I would think the stricter rules could be circumvented with a proxy, though that proxy might have to be a local system or tor to avoid the same restrictions.
As per our conversation, you discovered that even for a small number of daily scrapes, Google has anti-scraping blocking in place. The solution is to put a delay of a few seconds between each fetch.
In the general case, since you are technically transferring non-recoverable costs to a third party, it is always good practice to reduce the extra resource load you place upon the remote server. Without pauses between HTTP fetches, a fast server and connection can amount to a remote denial of service, especially against scrape targets that do not have Google's server resources. A throttled fetch loop is sketched below.
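A hedged sketch of such a loop, combining the delay advice with the page_source-via-JavaScript trick from Update 3; driver and urls are assumed to exist as in the question's setup, and the delay values are illustrative:

import random
import time

import lxml.html

for url in urls:  # `urls` assumed from the surrounding script
    driver.get(url)
    html = lxml.html.fromstring(
        driver.execute_script("return window.document.documentElement.outerHTML"))
    # ... extract data from html here ...
    time.sleep(5 + random.uniform(0, 5))  # pause between fetches to stay under rate limits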
I'm using Selenium Webdriver (in Python) to automate the downloading of thousands of files from a certain website (that can't be webscraped by conventional means like urllib, httplib, etc). My script works perfectly with Firefox, but I don't need to see magic happening, so I'm trying to use PhantomJS. It works almost all the way down, except when it tries to click a certain button in order to close a window. Here's the command at which the script gets stuck:
browser.find_element_by_css_selector("img[alt=\"Close Window\"]").click()
It just hangs in there, nothing happens.
PhantomJS is faster than Firefox (since there are no visuals), so I thought the problem might be related to the 'Close Window' button not being clickable soon enough. Hence I tried using an explicit wait:
element = WebDriverWait(browser, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "img[alt=\"Close Window\"]")))
print "done with waiting"
browser.find_element_by_css_selector("img[alt=\"Close Window\"]").click()
Doesn't work: the wait ends pretty quickly (the "done with waiting" message appears after a second or so), but then the code hangs again. I've also tried using an implicit wait, but that didn't work either.
So, I'm at a loss. The same script runs like a charm when I use Firefox, so why doesn't it work with PhantomJS?
I don't know if this helps, but here is the page source:
http://www.flickr.com/photos/88729961@N00/9512669916/sizes/l/in/photostream/
I don't know if this helps either, but when I break the execution with Ctrl-C, I get this:
Traceback (most recent call last):
File "myscript.py", line 361, in <module>
myfunction(some_argument, some_other_argument)
File "myscript.py", line 277, in myfunction
browser.find_element_by_css_selector("img[alt=\"Close Window\"]").click()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium-2.33.0-py2.7.egg/selenium/webdriver/remote/webelement.py", line 54, in click
self._execute(Command.CLICK_ELEMENT)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium-2.33.0-py2.7.egg/selenium/webdriver/remote/webelement.py", line 228, in _execute
return self._parent.execute(command, params)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium-2.33.0-py2.7.egg/selenium/webdriver/remote/webdriver.py", line 163, in execute
response = self.command_executor.execute(driver_command, params)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium-2.33.0-py2.7.egg/selenium/webdriver/remote/remote_connection.py", line 349, in execute
return self._request(url, method=command_info[0], data=data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium-2.33.0-py2.7.egg/selenium/webdriver/remote/remote_connection.py", line 396, in _request
response = opener.open(request)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1187, in do_open
r = h.getresponse(buffering=True)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1045, in getresponse
response.begin()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
KeyboardInterrupt
I'm new to programming and I can't make sense of this output (I don't even know what a "socket" is). But maybe some of you can point me in the right direction? A quick fix might be too much to ask, but maybe a hint as to what could be going on?
(Mac OS X 10.6.8, Python 2.7.5, Selenium 2.33, PhantomJS 1.9.1)
Running the following line of code in your script solves the problem:
browser.execute_script("closeWindow(false, '/lnacui2api/cart/displayCart.do', 'false');");