Can't run PhantomJS in python via Selenium - python

I have been trying to run PhantomJS via selenium for past 3 days and have had no success.
So far i have tried installing PhantomJS via npm, building it from source, installing via apt-get and downloading prebuilt executable and placing it in /usr/bin/phantomjs.
Every time I was able to run this example script loadspeed.js :
var page = require('webpage').create(),
system = require('system'),
t, address;
if (system.args.length === 1) {
console.log('Usage: loadspeed.js <some URL>');
phantom.exit();
}
t = Date.now();
address = system.args[1];
page.open(address, function (status) {
if (status !== 'success') {
console.log('FAIL to load the address');
} else {
t = Date.now() - t;
console.log('Loading time ' + t + ' msec');
}
phantom.exit();
});
and run it with 'phantomjs test.js http://google.com' and it worked just as it should.
but running PhantomJS via selenium in this small python script produces errors:
from selenium import webdriver
browser = webdriver.PhantomJS()
browser.get('http://seleniumhq.org')
python test.py
Traceback (most recent call last):
File "test.py", line 4, in <module>
browser.get('http://seleniumhq.org/')
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 176, in get
self.execute(Command.GET, {'url': url})
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 162, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 350, in execute
return self._request(url, method=command_info[0], data=data)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 382, in _request
resp = self._conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 373, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
Replacing second LOC with browser = webdriver.Firefox() works fine.
I am on Ubuntu 13.10 desktop and same error occurs on Ubuntu 13.04 aswell.
Python: 2.7
PhantomJS: 1.9.2
What am I doing wrong here?

There seems to be some issue introduced in newer Selenium, see
http://code.google.com/p/selenium/issues/detail?id=6690
I got a bit further using by using
pip install selenium==2.37
Avoids the stack trace above. Still having problems with driver.save_screenshot('foo.png') resulting in an empty file though.

Related

pyppeteer.errors.BrowserError: Failed to connect to browser port

I have problems when using requests-html package on Python 3.6.5, Ubuntu 16.04(x64). To be more specific, the last line of
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('http://python-requests.org/')
r.html.render()
produces the following error:
Traceback (most recent call last): File "", line 1, in
File
"/home/candy/.conda/envs/candy_env/lib/python3.6/site-packages/requests_html.py",
line 572, in render
self.session.browser # Automatycally create a event loop and browser
File
"/home/candy/.conda/envs/candy_env/lib/python3.6/site-packages/requests_html.py",
line 680, in browser
self._browser = self.loop.run_until_complete(pyppeteer.launch(headless=True,
args=['--no-sandbox']))
File
"/home/candy/.conda/envs/candy_env/lib/python3.6/asyncio/base_events.py",
line 468, in run_until_complete
return future.result()
File
"/home/candy/.conda/envs/candy_env/lib/python3.6/site-packages/pyppeteer/launcher.py",
line 243, in launch
return await Launcher(options, **kwargs).launch()
File
"/home/candy/.conda/envs/candy_env/lib/python3.6/site-packages/pyppeteer/launcher.py",
line 160, in launch
self.browserWSEndpoint = self._get_ws_endpoint()
File
"/home/candy/.conda/envs/candy_env/lib/python3.6/site-packages/pyppeteer/launcher.py",
line 178, in _get_ws_endpoint
raise BrowserError(f'Failed to connect to browser port: {url}')
pyppeteer.errors.BrowserError: Failed to connect to browser port:
http://127.0.0.1:43623/json/version
However, the same code works well without errors on another Windows 10 platform, with the same Python requirements configured.
I have checked whether a Chrome has been downloaded successfully on my computer and the result is yes! So I think that's not where the problem is.
(candy_env)
candy#botwriter01:~/.pyppeteer/local-chromium/543305/chrome-linux$ ls
chrome
chrome_sandbox libclearkeycdm.so locales nacl_helper_bootstrap natives_blob.bin resources.pak
xdg-mime chrome_100_percent.pak chrome-wrapper libEGL.so
MEIPreload nacl_helper_nonsfi product_logo_48.png swiftshader
xdg-settings chrome_200_percent.pak icudtl.dat libGLESv2.so
nacl_helper nacl_irt_x86_64.nexe resources
v8_context_snapshot.bin
I have already searched the guidebook of requests-html for answers but got nothing found. I want the command r.html.render() to work correctly, what can I do now?

Pythonanywhere returning BadStatusLine when I am trying to get request of Airbnb through Selenium

This is my code.
import requests
from bs4 import BeautifulSoup
import time
import datetime
from selenium import webdriver
import io
from pyvirtualdisplay import Display
neighbours = []
with io.open('cntr_london.txt', "r", encoding="utf-8") as f:
for q in f:
neighbours.append(q.replace('neighborhoods%5B%5D=', '').replace('\n',''))
#url = 'https://www.airbnb.com/s/paris/homes?room_types%5B%5D=Entire%20home%2Fapt&room_types%5B%5D=Private%20room&price_max=' +str(price_max)+ '&price_min=' + str(price_min)
def scroll_through_bottom():
s = 0
while s <= 4000:
s = s+200
browser.execute_script('window.scrollTo(0, '+ str(s) +');')
def get_links():
link_data = browser.find_elements_by_class_name('_1szwzht')
for link in link_data:
link_tag = link.find_elements_by_tag_name('a')
for l in link_tag:
link_list.append(l.get_attribute("href"))
length = len(link_list)
print length
with Display():
browser = webdriver.Firefox()
try:
browser.get('http://airbnb.com')
finally:
browser.quit()
Every url is working. But when I am trying to get Airbnb, it is giving me this error:
Traceback (most recent call last):
File "airbnb_new.py", line 43, in <module>
browser.get('http://airbnb.com')
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 248, in get
self.execute(Command.GET, {'url': url})
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 401, in execute
return self._request(command_info[0], url, body=data)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 433, in _request
resp = self._conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1089, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 408, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
On the other hand, when I am trying to run my code in Python3, it is giving me no module named pyvirtualdisplay even though I installed it with pip.
Can someone please help me with this problem? I would highly appreciate it.
Airbnb has identified your scraper and as a preventive measure they are rejecting requests from your spider. So you can not do anything may be you can change IP and system info and check if it works or you can wait for couple of hour and check if airbnb system has released lock and accepting requests from your system too.

When initially creating a chromedriver in python, http.client.BadStatusLine: '' is thrown

When creating a new chromedriver instance (in python): webdriver.Chrome("./venv/selenium/webdriver/chromedriver"), I get an error http.client.BadStatusLine: ''. I am not navigating to a site, or using a server, just creating a new chromedriver. I am in a VirtualEnv that has the most recent version of Selenium (3.0.1) and chromedriver (2.24.1). This was working fine a few days ago, and I didn't change any code. I'm not really sure where to begin solving the code. My first step was to run pip install --upgrade -r requirements.txt to make sure all packages were up to date. My only idea now is that selenium is no handling the default start page, with url as data;,, because there is no response. However, as that is the default behavior, I would be surprised if selenium could not handle its' own default behavior. Any help would be much appreciated!
When the code is run (via python from the bash terminal), a new chromedriver instance is successfully created, but the error http.client.BadStatusLine: '' gets thrown, and the python terminal loses the connection to the chromedriver.
Full code:
import pythonscripts
# Creates a new webdriver
driver = pythonscripts.md()
# Never gets here, attempts to use driver get NameError: name 'driver' is not defined
Pythonscripts md method:
def md():
return webdriver.Chrome("./venv/selenium/webdriver/chromedriver")
Full error output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/brydenr/server_scripts/cad_tests/pythonscripts.py", line 65, in md
return webdriver.Chrome("./venv/selenium/webdriver/chromedriver")
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
desired_capabilities=desired_capabilities)
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
response = self.execute(Command.NEW_SESSION, capabilities)
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
response = self.command_executor.execute(driver_command, params)
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 407, in execute
return self._request(command_info[0], url, body=data)
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 439, in _request
resp = self._conn.getresponse()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 1171, in getresponse
response.begin()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 351, in begin
version, status, reason = self._read_status()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 321, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: ''
Tried doing
try:
webdriver.Chrome("./venv/selenium/webdriver/chromedriver")
except Exception:
webdriver.Chrome("./venv/selenium/webdriver/chromedriver")
The result is two of the same traceback as before, and two chromedriver instances. It seems like this question points to an error in urllib, but it is for a slightly different situation.
This happened to me after I updated chrome to the latest version.
I just updated chromedriver to 2.25 and it works again.

How to click on load button dynamically using selenium python?

I want to click on load-more until it disappear on that page.
I have tried but it sometimes work or giving error. It is not perfect solution which i did.
I can have multiple url in a list and hit one by one and load-more until it disappear from that page.
Thanks in advance for helping.
Code
driver = webdriver.Firefox()
url = ["https://www.zomato.com/HauzKhasSocial","https://www.zomato.com/ncr/wendys-sector-29-gurgaon","https://www.zomato.com/vaultcafecp"]
for load in url:
driver.get(load)
xpath_content='//div[#class = "load-more"]'
temp_xpath="true"
while temp_xpath:
try:
#driver.implicitly.wait(15)
#WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH,xpath_content)))
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH,xpath_content)))
#urls=driver.find_element_by_xpath(xpath_content)
urls=driver.find_element_by_xpath(xpath_content)
text=urls.text
if text:
temp_xpath=text
print "XPATH=",temp_xpath
driver.find_element_by_xpath(xpath_content).click()
#driver.execute_script('$("div.load-more").click();')
except TimeoutException:
print driver.title, "no xpath of pagination"
temp_xpath=""
continue
Most of time I get following error while running my program.
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 173, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
return self._request(command_info[0], url, body=data)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 380, in _request
resp = self._conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 373, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
You probably get the BadStatus error because of a bug that has been fixed in the latest versions of Selenium webdrivers. I got into a similar situation recently and here's the thread of discussion with the developers that helped me out.
https://bugs.chromium.org/p/chromedriver/issues/detail?id=1548

Selenium in Python

I've been using urllib2 to access webpages, but it doesn't support javascript, so I took a look at Selenium, but I'm quite confused even having read its docs.
I downloaded Selenium IDE add-on for firefox and I tried some simple things.
from selenium import selenium
import unittest, time, re
class test(unittest.TestCase):
def setUp(self):
self.verificationErrors = []
self.selenium = selenium("localhost", 4444, "*chrome", "http://www.wikipedia.org/")
self.selenium.start()
def test_test(self):
sel = self.selenium
sel.open("/")
sel.type("searchInput", "pacific ocean")
sel.click("go")
sel.wait_for_page_to_load("30000")
def tearDown(self):
self.selenium.stop()
self.assertEqual([], self.verificationErrors)
if __name__ == "__main__":
unittest.main()
I just access wikipedia.org and type pacific ocean in the search field, but when I try to compile it, it gives me a lot of errors.
If running the script results in a [Errno 111] Connection refused error such as this:
% test.py
E
======================================================================
ERROR: test_test (__main__.test)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/unutbu/pybin/test.py", line 11, in setUp
self.selenium.start()
File "/data1/unutbu/pybin/selenium.py", line 189, in start
result = self.get_string("getNewBrowserSession", [self.browserStartCommand, self.browserURL, self.extensionJs])
File "/data1/unutbu/pybin/selenium.py", line 219, in get_string
result = self.do_command(verb, args)
File "/data1/unutbu/pybin/selenium.py", line 207, in do_command
conn.request("POST", "/selenium-server/driver/", body, headers)
File "/usr/lib/python2.6/httplib.py", line 898, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.6/httplib.py", line 935, in _send_request
self.endheaders()
File "/usr/lib/python2.6/httplib.py", line 892, in endheaders
self._send_output()
File "/usr/lib/python2.6/httplib.py", line 764, in _send_output
self.send(msg)
File "/usr/lib/python2.6/httplib.py", line 723, in send
self.connect()
File "/usr/lib/python2.6/httplib.py", line 704, in connect
self.timeout)
File "/usr/lib/python2.6/socket.py", line 514, in create_connection
raise error, msg
error: [Errno 111] Connection refused
----------------------------------------------------------------------
Ran 1 test in 0.063s
FAILED (errors=1)
then the solution is most likely that you need get the selenium server running first.
In the download for SeleniumRC you will find a file called selenium-server.jar (as of a few months ago, that file was located at SeleniumRC/selenium-server-1.0.3/selenium-server.jar).
On Linux, you could run the selenium server in the background with the command
java -jar /path/to/selenium-server.jar 2>/dev/null 1>&2 &
You will find more complete instructions on how to set up the server here.
I would suggest you to use a webdriver, you can find it here: http://code.google.com/p/selenium/downloads/list. If you want to write tests as a coder (and not with the use of your mouse), that thing would work better then the RC version you're trying to use, at least because it would not ask you for an SeleniumRC Jar Instance. You would simply have a binary of a browser or use those ones that are already installed on your system, for example, Firefox.
I faced with this issue in my project and found that problem was in few webdriver.get calls with very small time interval between them. My fix was not to put delay, just remove unneeded calls and error disappears.
Hope, it can helps for somebody.

Categories

Resources