Headless Chrome in Python (Linux) - Taking screenshot - python

I'm trying to get started with Selenium with Chromedriver.
I've downloaded the latest driver for Linux_64 to
/usr/local/share/chromedriver
And links to /usr/local/bin/chromdriver and /usr/bin/chromedriver
Python code
#!/usr/bin/env python
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('http://www.google.com/')
browser.save_screenshot('screenie.png')
browser.quit()
The file is executable
When running it the terminal gets idle. Only way to get some output is by pressing CTRL+C which the returns this
Traceback (most recent call last):
File "./test3.py", line 5, in <module>
browser = webdriver.Chrome()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 154, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 243, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 310, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 466, in execute
return self._request(command_info[0], url, body=data)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 490, in _request
resp = self._conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1136, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 409, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python2.7/socket.py", line 480, in readline
data = self._sock.recv(self._rbufsize)
KeyboardInterrupt
Any ideas?

Related

Selenium - Raises httplib.BadStatusLine

I'm using PhantomJS with Selenium, I want to do several searches on stackoverflow. This code worked fine with my local pc, when I changed it to a server with less RAM it raises the httplib.BadStatusLine error.
This sounds either like a problem with the installs or with exceeding system specs, can someone clarify how to solve this?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
start = time.time()
driver = webdriver.PhantomJS() # or add to your PATH
array = ["Python", "Javascript", "Golang", "C++", "C", "CSS", "XML", "Json", "Ruby", "Django",
"Rails", "Java", "OpenCV", "GameMaker", "iOS", "Android", "React", "AngularJS", "React Native"]
for i in array:
driver.get('http://stackoverflow.com/')
sbtn = driver.find_element(By.XPATH, '//*[#id="search"]/input')
sbtn.send_keys(i)
sbtn.send_keys(Keys.ENTER)
driver.save_screenshot('Photos/screen' + i + '.png') # save a screenshot to disk
Error:
Traceback (most recent call last):
File "test.py", line 14, in <module>
driver.save_screenshot('Photos/screen' + i + '.png') # save a screenshot to disk
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 798, in get_screenshot_as_file
png = self.get_screenshot_as_png()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 817, in get_screenshot_as_png
return base64.b64decode(self.get_screenshot_as_base64().encode('ascii'))
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 827, in get_screenshot_as_base64
return self.execute(Command.SCREENSHOT)['value']
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 408, in execute
return self._request(command_info[0], url, body=data)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 478, in _request
resp = opener.open(request, timeout=self._timeout)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1187, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1051, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 415, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 379, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
You're seeing the error because you ran out of memory on the server, so you'll need to increase the amount of memory. I'll assume that you have more memory on your local computer, hence, the reason why it runs as expected. Unfortunately, PhantomJS + Selenium are memory intensive.

this error while running selenium script with Python on Ubuntu

I recently upgraded to 16.04.
I installed selenium==3.0.2 and now when I run the script, this is what I see:
Traceback (most recent call last):
File "aux.py", line 3, in <module>
browser = webdriver.Chrome()
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
response = self.execute(Command.NEW_SESSION, capabilities)
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 408, in execute
return self._request(command_info[0], url, body=data)
File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 440, in _request
resp = self._conn.getresponse()
File "/usr/local/lib/python3.4/http/client.py", line 1227, in getresponse
response.begin()
File "/usr/local/lib/python3.4/http/client.py", line 386, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.4/http/client.py", line 356, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: ''
and this is all in the script:
from selenium import webdriver
# from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome()
browser.get('google.com')

httplib.BadStatusLine: '' on Linux but not Mac

This error has been under my skin for a few hours now. I decided to code up a separate project just to see if I can replicate it and I can, but ONLY on my server. This works on my Mac.
Mac: OSX El Capitan 10.11.6
Server: CentOS 7.2.1511
Both have PhantomJS version: 2.1.1
Python Mac: Python 2.7.11
Python Server: 2.7.5
Both have Selenium version: 2.53.0
Identical code ran on both:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.common.exceptions import NoSuchElementException
import time
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36"
dcap["phantomjs.page.customHeaders.accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
dcap["phantomjs.page.customHeaders.Accept-Language"] = "en-US,en;q=0.8"
dcap["phantomjs.page.customHeaders.connection"] = "keep-alive"
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.set_window_size(1120, 700)
driver.get("https://www.instagram.com/espn/")
while True:
print len(driver.find_elements_by_css_selector("a[href*='/p/']"))
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
try:
loadMore = driver.find_element_by_link_text("Load more")
loadMore.click()
except NoSuchElementException:
print "No such"
driver.save_screenshot('none.png')
Mac output:
12
24
No such
24
No such
36
No such
48
No such
48
No such
60
No such
72
No such
84
# This goes until I end it
Server output:
12
24
No such
Traceback (most recent call last):
File "junk.py", line 27, in <module>
driver.save_screenshot('none.png')
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 790, in get_screenshot_as_file
png = self.get_screenshot_as_png()
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 809, in get_screenshot_as_png
return base64.b64decode(self.get_screenshot_as_base64().encode('ascii'))
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 819, in get_screenshot_as_base64
return self.execute(Command.SCREENSHOT)['value']
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
resp = opener.open(request, timeout=self._timeout)
File "/usr/lib64/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1217, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
response.begin()
File "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
Server output after removing the screenshot line:
12
24
No such
24
Traceback (most recent call last):
File "junk.py", line 23, in <module>
loadMore = driver.find_element_by_link_text("Load more")
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in find_element_by_link_text
return self.find_element(by=By.LINK_TEXT, value=link_text)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 744, in find_element
{'using': by, 'value': value})['value']
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
resp = opener.open(request, timeout=self._timeout)
File "/usr/lib64/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1217, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
response.begin()
File "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
One related answer I found was here: Can't run PhantomJS in python via Selenium
So I installed Selenium 2.37 and it gave the same error.
I read this answer about the problem perhaps behind related to changing the headers, so I removed the headers by changing the driver to driver = webdriver.PhantomJS() and still get the same error.
I also installed 2.7.12 on the server, to see if there was a difference. Output was:
# python2.7 junk.py
12
24
No such
24
Traceback (most recent call last):
File "junk.py", line 29, in <module>
loadMore = driver.find_element_by_link_text("Load more")
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in find_element_by_link_text
return self.find_element(by=By.LINK_TEXT, value=link_text)
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 744, in find_element
{'using': by, 'value': value})['value']
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
resp = opener.open(request, timeout=self._timeout)
File "/usr/local/lib/python2.7/urllib2.py", line 429, in open
response = self._open(req, data)
File "/usr/local/lib/python2.7/urllib2.py", line 447, in _open
'_open', req)
File "/usr/local/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/local/lib/python2.7/urllib2.py", line 1228, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/local/lib/python2.7/urllib2.py", line 1201, in do_open
r = h.getresponse(buffering=True)
File "/usr/local/lib/python2.7/httplib.py", line 1136, in getresponse
response.begin()
File "/usr/local/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python2.7/httplib.py", line 417, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
Checking space on system. It's a brand new VPS, but still, to confirm:
EDIT 3
Add the following:
except httplib.BadStatusLine:
pass
EDIT 2
Python WebDriver and phantomJs have a problem with keep_alive. This could be your problem. So add keep_alive=False as follows:
driver = webdriver.PhantomJS(desired_capabilities=dcap,keep_alive=False)
end edit
Add the following
import httplib
import socket
from selenium.webdriver.remote.command import Command
def get_status(driver):
try:
driver.execute(Command.STATUS)
return "Alive"
except (socket.error, httplib.CannotSendRequest):
return "Dead"
Call get_status(driver) just before the save_screenshot statement and print the result. This will tell us if the driver has prematurely shutdown.
EDIT
Add the following after driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.implicitly_wait(10) #wait 10 seconds when doing a find_element before carrying on

How to Call PhantomJS from webpage using Python - error on initiation

I want to create a web based scraper using Python, Selenium and PhantomJS where you can input a url into a form and the results from the scrape will be returned to the webpage. I can run it on my PC and I can also get it to work through the terminal.
It is located in a virtual environment on Dreamhost shared hosting with Python3.5 installed. I have tested that the parameters are being passed in fine, and it does work using just lxml and requests. However, when I try to run the script from the form on the webpage using PhantomJS then it doesn't work properly. The following error in returned...
Traceback (most recent call last):
File "testscrape.py", line 140, in <module>
driver = init_driver()
File "testscrape.py", line 69, in init_driver
driver = webdriver.PhantomJS(executable_path=phantomPATH,desired_capabilities=dcap)
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/phantomjs/webdriver.py", line 56, in __init__
desired_capabilities=desired_capabilities)
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 91, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 173, in start_session
'desiredCapabilities': desired_capabilities,
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/home/paul/.python35/bin/magenv/lib/python3.5/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
resp = opener.open(request, timeout=self._timeout)
File "/home/paul/.python35/lib/python3.5/urllib/request.py", line 465, in open
response = self._open(req, data)
File "/home/paul/.python35/lib/python3.5/urllib/request.py", line 483, in _open
'_open', req)
File "/home/paul/.python35/lib/python3.5/urllib/request.py", line 443, in _call_chain
result = func(*args)
File "/home/paul/.python35/lib/python3.5/urllib/request.py", line 1268, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/home/paul/.python35/lib/python3.5/urllib/request.py", line 1243, in do_open
r = h.getresponse()
File "/home/paul/.python35/lib/python3.5/http/client.py", line 1174, in getresponse
response.begin()
File "/home/paul/.python35/lib/python3.5/http/client.py", line 282, in begin
version, status, reason = self._read_status()
File "/home/paul/.python35/lib/python3.5/http/client.py", line 243, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/home/paul/.python35/lib/python3.5/socket.py", line 575, in readinto
return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
I have tried a few different variations of desired_capabilities and even changing file permissions of everything in the virtual environment but to no avail. I must be missing something, or is it just not possible? Any suggestions gratefully received.

how to clone Same Linux configuration for another server

I am running a script with Selenium, Firefox headless in my linux server. It is running well for my server. But I cannot install/configure the same thing for another one.
I am getting this error for my python script:
Traceback (most recent call last):
File "cde.py", line 290, in <module>
acde.Run()
File "cde.py", line 76, in Run
self.driver.get(self.link_to_explore)
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 213, in get
self.execute(Command.GET, {'url': url})
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 199, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 426, in _request
resp = self._conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1127, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 417, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
May be I am missing to install something dependency. Is it possible to clone the configuration for certain app and use same in another machine ?

Categories

Resources