I recently upgraded to 16.04.
I installed selenium==3.0.2 and now when I run the script, this is what I see:
Traceback (most recent call last):
File "aux.py", line 3, in <module>
browser = webdriver.Chrome()
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
response = self.execute(Command.NEW_SESSION, capabilities)
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 408, in execute
return self._request(command_info[0], url, body=data)
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 440, in _request
resp = self._conn.getresponse()
File "/usr/local/lib/python3.4/http/client.py", line 1227, in getresponse
response.begin()
File "/usr/local/lib/python3.4/http/client.py", line 386, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.4/http/client.py", line 356, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: ''
This is the entire script:
from selenium import webdriver
# from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome()
browser.get('http://www.google.com')
I just moved to a new work computer, and I get the following error whenever I try to use pip (Python 3.7.2 on Windows 10).
Could it be that my work password contains characters that aren't allowed?
It's strange, because my older computer didn't have this issue.
Exception:
Traceback (most recent call last):
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\cli\base_command.py", line 143, in main
status = self.run(options, args)
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\commands\install.py", line 318, in run
resolver.resolve(requirement_set)
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\resolve.py", line 102, in resolve
self._resolve_one(requirement_set, req)
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\resolve.py", line 256, in _resolve_one
abstract_dist = self._get_abstract_dist_for(req_to_install)
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\resolve.py", line 199, in _get_abstract_dist_for
skip_reason = self._check_skip_installed(req)
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\resolve.py", line 170, in _check_skip_installed
self.finder.find_requirement(req_to_install, upgrade=True)
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\index.py", line 572, in find_requirement
all_candidates = self.find_all_candidates(req.name)
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\index.py", line 530, in find_all_candidates
for page in self._get_pages(url_locations, project_name):
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\index.py", line 675, in _get_pages
page = self._get_page(location)
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\index.py", line 793, in _get_page
return _get_html_page(link, session=self.session)
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\index.py", line 144, in _get_html_page
"Cache-Control": "max-age=0",
File "C:\Program Files\Python37\lib\site-packages\pip\_vendor\requests\sessions.py", line 525, in get
return self.request('GET', url, **kwargs)
File "C:\Program Files\Python37\lib\site-packages\pip\_internal\download.py", line 396, in request
return super(PipSession, self).request(method, url, *args, **kwargs)
File "C:\Program Files\Python37\lib\site-packages\pip\_vendor\requests\sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "C:\Program Files\Python37\lib\site-packages\pip\_vendor\requests\sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "C:\Program Files\Python37\lib\site-packages\pip\_vendor\requests\adapters.py", line 410, in send
conn = self.get_connection(request.url, proxies)
File "C:\Program Files\Python37\lib\site-packages\pip\_vendor\requests\adapters.py", line 308, in get_connection
proxy_manager = self.proxy_manager_for(proxy)
File "C:\Program Files\Python37\lib\site-packages\pip\_vendor\requests\adapters.py", line 191, in proxy_manager_for
proxy_headers = self.proxy_headers(proxy)
File "C:\Program Files\Python37\lib\site-packages\pip\_vendor\requests\adapters.py", line 389, in proxy_headers
password)
File "C:\Program Files\Python37\lib\site-packages\pip\_vendor\requests\auth.py", line 63, in _basic_auth_str
password = password.encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u201d' in position 0: ordinal not in range(256)
Oops, I think I resolved it minutes later. The instructions for setting the proxy server password in an environment variable included quotation marks ("") that weren't supposed to be part of the variable's value; the error above points at a curly closing quote (U+201D) at position 0.
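The same failure can be checked for without running pip: the traceback shows the proxy password being encoded as latin-1, and a curly quote falls outside that range. A minimal sketch, assuming the proxy is configured through the usual HTTP_PROXY / HTTPS_PROXY environment variables (the helper name find_non_latin1 is made up for illustration):

```python
import os


def find_non_latin1(text):
    """Return (index, character) pairs that cannot be encoded as latin-1."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 255]


def report_proxy_variables():
    """Print every non-latin-1 character found in the proxy variables."""
    # Assumption: the proxy URL is stored in one of these variables.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        value = os.environ.get(name)
        if value is None:
            continue
        for index, ch in find_non_latin1(value):
            print("%s: position %d holds %r (U+%04X)" % (name, index, ch, ord(ch)))


report_proxy_variables()
```

If nothing is printed, the variables that are set contain only characters that latin-1 can represent.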
I'm trying to get started with Selenium and ChromeDriver.
I've downloaded the latest driver for Linux_64 to
/usr/local/share/chromedriver
and created links to it at /usr/local/bin/chromdriver and /usr/bin/chromedriver.
Python code:
#!/usr/bin/env python
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('http://www.google.com/')
browser.save_screenshot('screenie.png')
browser.quit()
The file is executable.
When I run it, the terminal just hangs. The only way to get any output is to press CTRL+C, which then returns this:
Traceback (most recent call last):
File "./test3.py", line 5, in <module>
browser = webdriver.Chrome()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 154, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 243, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 310, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 466, in execute
return self._request(command_info[0], url, body=data)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 490, in _request
resp = self._conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1136, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 409, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python2.7/socket.py", line 480, in readline
data = self._sock.recv(self._rbufsize)
KeyboardInterrupt
Any ideas?
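One thing that can be checked independently of Selenium is which file the name chromedriver actually resolves to on PATH, given the two link names listed above. A minimal sketch, assuming Python 3 (shutil.which does not exist in Python 2.7); describe_executable is a hypothetical helper name:

```python
import os
import shutil


def describe_executable(name):
    """Return the real path that `name` resolves to on PATH, or None."""
    found = shutil.which(name)
    if found is None:
        return None
    # Follow symlinks so a link in /usr/bin reports the file it points to.
    return os.path.realpath(found)


print(describe_executable("chromedriver"))
```

A result of None would mean that no executable by that name is on PATH.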
This error has been under my skin for a few hours now. I decided to code up a separate project just to see if I can replicate it and I can, but ONLY on my server. This works on my Mac.
Mac: OSX El Capitan 10.11.6
Server: CentOS 7.2.1511
Both have PhantomJS version: 2.1.1
Python Mac: Python 2.7.11
Python Server: 2.7.5
Both have Selenium version: 2.53.0
Identical code run on both:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.common.exceptions import NoSuchElementException
import time
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36"
dcap["phantomjs.page.customHeaders.accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
dcap["phantomjs.page.customHeaders.Accept-Language"] = "en-US,en;q=0.8"
dcap["phantomjs.page.customHeaders.connection"] = "keep-alive"
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.set_window_size(1120, 700)
driver.get("https://www.instagram.com/espn/")
while True:
    print len(driver.find_elements_by_css_selector("a[href*='/p/']"))
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        loadMore = driver.find_element_by_link_text("Load more")
        loadMore.click()
    except NoSuchElementException:
        print "No such"
        driver.save_screenshot('none.png')
Mac output:
12
24
No such
24
No such
36
No such
48
No such
48
No such
60
No such
72
No such
84
# This goes until I end it
Server output:
12
24
No such
Traceback (most recent call last):
File "junk.py", line 27, in <module>
driver.save_screenshot('none.png')
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 790, in get_screenshot_as_file
png = self.get_screenshot_as_png()
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 809, in get_screenshot_as_png
return base64.b64decode(self.get_screenshot_as_base64().encode('ascii'))
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 819, in get_screenshot_as_base64
return self.execute(Command.SCREENSHOT)['value']
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
resp = opener.open(request, timeout=self._timeout)
File "/usr/lib64/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1217, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
response.begin()
File "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
Server output after removing the screenshot line:
12
24
No such
24
Traceback (most recent call last):
File "junk.py", line 23, in <module>
loadMore = driver.find_element_by_link_text("Load more")
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in find_element_by_link_text
return self.find_element(by=By.LINK_TEXT, value=link_text)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 744, in find_element
{'using': by, 'value': value})['value']
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
resp = opener.open(request, timeout=self._timeout)
File "/usr/lib64/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1217, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
response.begin()
File "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
One related answer I found was here: Can't run PhantomJS in python via Selenium
So I installed Selenium 2.37 and it gave the same error.
I read this answer suggesting the problem might be related to changing the headers, so I removed the headers by changing the driver to driver = webdriver.PhantomJS(), and I still get the same error.
I also installed Python 2.7.12 on the server to see if there was a difference. Output was:
# python2.7 junk.py
12
24
No such
24
Traceback (most recent call last):
File "junk.py", line 29, in <module>
loadMore = driver.find_element_by_link_text("Load more")
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in find_element_by_link_text
return self.find_element(by=By.LINK_TEXT, value=link_text)
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 744, in find_element
{'using': by, 'value': value})['value']
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
resp = opener.open(request, timeout=self._timeout)
File "/usr/local/lib/python2.7/urllib2.py", line 429, in open
response = self._open(req, data)
File "/usr/local/lib/python2.7/urllib2.py", line 447, in _open
'_open', req)
File "/usr/local/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/local/lib/python2.7/urllib2.py", line 1228, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/local/lib/python2.7/urllib2.py", line 1201, in do_open
r = h.getresponse(buffering=True)
File "/usr/local/lib/python2.7/httplib.py", line 1136, in getresponse
response.begin()
File "/usr/local/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python2.7/httplib.py", line 417, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
Checking space on system. It's a brand new VPS, but still, to confirm:
EDIT 3
Add the following:
except httplib.BadStatusLine:
    pass
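Silently passing hides the failure, so an alternative is to retry the call a bounded number of times. A minimal sketch in Python 3, where httplib is named http.client; call_with_retry and its parameters are hypothetical names, and the retry count is an arbitrary assumption:

```python
import http.client  # this module is called httplib in Python 2
import time


def call_with_retry(action, attempts=3, delay=0.0):
    """Call action(); retry when the response has an empty status line."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except http.client.BadStatusLine as error:
            # Remember the error and pause before the next attempt.
            last_error = error
            time.sleep(delay)
    # Every attempt failed, so surface the last error to the caller.
    raise last_error
```

With the driver from the question, this might be called as call_with_retry(lambda: driver.find_element_by_link_text("Load more")). If the driver process has really exited, retrying will not help and the error is re-raised after the last attempt.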
EDIT 2
The Python WebDriver bindings and PhantomJS have a problem with keep_alive, which could be the cause here. So add keep_alive=False as follows:
driver = webdriver.PhantomJS(desired_capabilities=dcap,keep_alive=False)
end edit
Add the following
import httplib
import socket
from selenium.webdriver.remote.command import Command
def get_status(driver):
    try:
        driver.execute(Command.STATUS)
        return "Alive"
    except (socket.error, httplib.CannotSendRequest):
        return "Dead"
Call get_status(driver) just before the save_screenshot statement and print the result. This will tell us whether the driver has shut down prematurely.
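A complementary check that does not go through Selenium at all: the tracebacks show the bindings talking to the driver over plain HTTP, so a refused TCP connection on the driver's port suggests the process has exited. A sketch in Python 3 using only the standard library; is_listening is a hypothetical helper, and the host and port have to be taken from your own session:

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

This only shows that some process accepts connections on that port, not that it still answers WebDriver commands, so it complements get_status rather than replacing it.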
EDIT
Add the following after driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.implicitly_wait(10) #wait 10 seconds when doing a find_element before carrying on
I want to create a web-based scraper using Python, Selenium and PhantomJS, where you enter a URL into a form and the results of the scrape are returned to the webpage. I can run it on my PC, and I can also get it to work on the host through the terminal.
It is located in a virtual environment on Dreamhost shared hosting with Python 3.5 installed. I have tested that the parameters are passed in correctly, and it does work using just lxml and requests. However, when I run the script from the form on the webpage using PhantomJS, it doesn't work properly. The following error is returned:
Traceback (most recent call last):
File "testscrape.py", line 140, in <module>
driver = init_driver()
File "testscrape.py", line 69, in init_driver
driver = webdriver.PhantomJS(executable_path=phantomPATH,desired_capabilities=dcap)
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/phantomjs/webdriver.py", line 56, in __init__
desired_capabilities=desired_capabilities)
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 91, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 173, in start_session
'desiredCapabilities': desired_capabilities,
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
resp = opener.open(request, timeout=self._timeout)
File "/home/paul/.python35/lib/python3.5/urllib/request.py", line 465, in open
response = self._open(req, data)
File "/home/paul/.python35/lib/python3.5/urllib/request.py", line 483, in _open
'_open', req)
File "/home/paul/.python35/lib/python3.5/urllib/request.py", line 443, in _call_chain
result = func(*args)
File "/home/paul/.python35/lib/python3.5/urllib/request.py", line 1268, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/home/paul/.python35/lib/python3.5/urllib/request.py", line 1243, in do_open
r = h.getresponse()
File "/home/paul/.python35/lib/python3.5/http/client.py", line 1174, in getresponse
response.begin()
File "/home/paul/.python35/lib/python3.5/http/client.py", line 282, in begin
version, status, reason = self._read_status()
File "/home/paul/.python35/lib/python3.5/http/client.py", line 243, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/home/paul/.python35/lib/python3.5/socket.py", line 575, in readinto
return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
I have tried a few different variations of desired_capabilities and even changing file permissions of everything in the virtual environment but to no avail. I must be missing something, or is it just not possible? Any suggestions gratefully received.
I am running a script with Selenium and headless Firefox on my Linux server, where it works well. However, I cannot get the same setup installed and configured on another server.
On that server, my Python script fails with this error:
Traceback (most recent call last):
File "cde.py", line 290, in <module>
acde.Run()
File "cde.py", line 76, in Run
self.driver.get(self.link_to_explore)
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 213, in get
self.execute(Command.GET, {'url': url})
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 199, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 426, in _request
resp = self._conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1127, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 417, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
Maybe I am missing a dependency. Is it possible to clone the configuration of an application and reuse it on another machine?
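A first step toward comparing the two machines is to record what each one has installed and compare the results. A minimal sketch, assuming Python 3.8+ (importlib.metadata is not available in the Python 2.7 shown in the traceback); environment_report is a hypothetical helper and the package list is only an example:

```python
import platform
import sys
from importlib import metadata  # standard library from Python 3.8


def environment_report(packages):
    """Map each package name to its installed version, or None if absent."""
    report = {"python": platform.python_version(), "platform": sys.platform}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


print(environment_report(["selenium"]))
```

Running this on both servers and comparing the output shows Python-level differences only; the browser and its driver binary are outside Python and have to be compared separately.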