I am trying to implement a scraper that checks twice a day whether certain PDFs have changed names. Unfortunately, finding the PDFs requires interacting with the website, so the best solution I can think of is a combination of Selenium and AWS Lambda.
To begin, I followed this tutorial. I completed it but ran into this error from Lambda:
START RequestId: 18637c6d-ea75-40ee-8789-374654700b99 Version: $LATEST
Starting google.com
Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
: WebDriverException
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 46, in lambda_handler
driver = webdriver.Chrome(chrome_options=chrome_options)
File "/var/task/selenium/webdriver/chrome/webdriver.py", line 68, in __init__
self.service.start()
File "/var/task/selenium/webdriver/common/service.py", line 83, in start
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
Others experienced this error too, and the tutorial's author "resolved" it by linking to this Stack Overflow page. I have gone through it, but all the answers pertain to using headless Chromium on a desktop, not on AWS Lambda.
A couple of changes I've tried, to no avail:
1) Changing the chromedriver and headless-chromium to .exe files
2) Changing this line of code to include the executable_path
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=os.getcwd() + "/bin/chromedriver.exe")
Any help getting Selenium and AWS Lambda working together would be greatly appreciated.
I had the same issue, and it was due to the binary files being in a location from which they could not be executed. Adding a function that copies them to /tmp/bin and then running them from that location fixed it. See the example below, which I just got working while researching this error. (Apologies for the messy code.)
import os
import shutil
import time
from selenium import webdriver

BIN_DIR = "/tmp/bin"
CURR_BIN_DIR = os.getcwd() + "/bin"

def _init_bin(executable_name):
    # time.clock() was removed in Python 3.8; perf_counter() replaces it.
    start = time.perf_counter()
    if not os.path.exists(BIN_DIR):
        print("Creating bin folder")
        os.makedirs(BIN_DIR)
    print("Copying binaries for " + executable_name + " in /tmp/bin")
    currfile = os.path.join(CURR_BIN_DIR, executable_name)
    newfile = os.path.join(BIN_DIR, executable_name)
    shutil.copy2(currfile, newfile)
    print("Giving new binaries permissions for lambda")
    os.chmod(newfile, 0o775)
    elapsed = time.perf_counter() - start
    print(executable_name + " ready in " + str(elapsed) + "s.")

def handler(event, context):
    # Copy both binaries to /tmp/bin before starting the browser.
    _init_bin("headless-chromium")
    _init_bin("chromedriver")
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--window-size=1280x1696')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.binary_location = "/tmp/bin/headless-chromium"
    driver = webdriver.Chrome("/tmp/bin/chromedriver", chrome_options=chrome_options)
    driver.get('https://en.wikipedia.org/wiki/Special:Random')
    line = driver.find_element_by_class_name('firstHeading').text
    print(line)
    driver.quit()
    return line
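The copy-then-chmod step in the answer above can also be written so that it skips the copy when the file is already in place. This is only a sketch: the function name ensure_executable_copy is hypothetical, and it assumes that a reused (warm) Lambda container may still hold the files copied to /tmp by an earlier invocation.

```python
import os
import shutil
import stat


def ensure_executable_copy(src_path, dest_dir):
    """Copy src_path into dest_dir unless it is already there, then make
    the copy executable. Returns the path of the copy.

    Hypothetical helper for illustration; not part of Selenium or AWS.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    if not os.path.exists(dest_path):
        # Assumption: only a cold start needs the copy; a warm container
        # may still have the file from a previous invocation.
        shutil.copy2(src_path, dest_path)
    # Add the execute bits on top of whatever mode the file already has.
    mode = os.stat(dest_path).st_mode
    os.chmod(dest_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest_path
```

In the handler, the returned paths could then be used for binary_location and for the chromedriver argument instead of the hard-coded "/tmp/bin/..." strings.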
I also had the same issue, but I have fixed it now. In my case, the Python version on Lambda did not match the one in my Dockerfile.
I am trying to run this script
import crawler
crawler.crawl(url="https://www.montratec.com",output_dir="crawling_test",method="rendered-all")
from this library:
https://github.com/SimFin/pdf-crawler
but I am getting this error:
Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
I already have Firefox installed and I am using Windows.
If you have Firefox installed in a non-default location which is not in your system’s search path, you can specify a binary field on the moz:firefoxOptions capabilities object (documented in README), or use the --binary PATH flag passed to geckodriver when it starts.
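As a rough illustration of the first option, the binary field sits inside the moz:firefoxOptions capability. The sketch below only builds the capabilities payload as a plain dictionary; the function name firefox_capabilities is hypothetical, the alwaysMatch wrapper follows the W3C WebDriver new-session format, and the Firefox path is an example that has to be adjusted to the actual install location.

```python
import json


def firefox_capabilities(binary_path):
    """Build a new-session payload that tells geckodriver where Firefox is.

    Hypothetical helper for illustration; binary_path is an assumption
    and must point at the real firefox executable.
    """
    return {
        "capabilities": {
            "alwaysMatch": {
                "browserName": "firefox",
                "moz:firefoxOptions": {"binary": binary_path},
            }
        }
    }


payload = firefox_capabilities(r"C:\Program Files\Mozilla Firefox\firefox.exe")
print(json.dumps(payload, indent=2))
```

Selenium's FirefoxOptions fills in the same capability when binary_location is set, so building the dictionary by hand is normally only needed when talking to geckodriver directly.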
Since the question is tagged Selenium, you can make the following changes to get rid of the above error.
This is a purely Selenium solution: when you create the driver, configure the Firefox binary location through FirefoxOptions, as below:
from selenium import webdriver
options = webdriver.FirefoxOptions()
options.binary_location = r"C:\Program Files\Mozilla Firefox\firefox.exe"
driver = webdriver.Firefox(executable_path=r'\geckodriver.exe full path here', firefox_options=options)
driver.get("https://www.montratec.com")
For crawler (a web scraping framework based on the Python 3 asyncio and aiohttp libraries):
Installation :
pip install crawler
Sample code :
import re
from itertools import islice
from crawler import Crawler, Request

RE_TITLE = re.compile(r'<title>([^<]+)</title>', re.S | re.I)

class TestCrawler(Crawler):
    def task_generator(self):
        for host in islice(open('var/domains.txt'), 100):
            host = host.strip()
            if host:
                yield Request('http://%s/' % host, tag='page')

    def handler_page(self, req, res):
        print('Result of request to {}'.format(req.url))
        try:
            title = RE_TITLE.search(res.body).group(1)
        except AttributeError:
            title = 'N/A'
        print('Title: {}'.format(title))

bot = TestCrawler(concurrency=10)
bot.run()
Official reference here
How do I solve "NameError: name 'Options' is not defined" in AWS Lambda?
My attempts:
Added the selenium module files to AWS Lambda.
Consulted the AWS documentation, but that did not solve it.
Searched Stack Overflow for the error message, but that did not solve it.
After that the function ran, but the following error appeared in CloudWatch:
2021-06-28T12:30:31.892+09:00 START RequestId: 7bb8408e-2b12-4e16-80be-e6f1b0166a60 Version: $LATEST
2021-06-28T12:30:31.892+09:00 Error in Imports
2021-06-28T12:30:31.895+09:00 [ERROR] NameError: name 'Options' is not defined Traceback (most recent call last): File "/var/task/lambda_function.py", line 203, in lambda_handler instance_ = WebDriver() File "/var/task/lambda_function.py", line 25, in __init__ self.options = Options()
Selenium sample code:
import json
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

class WebDriver:  # class name taken from the traceback above
    def __init__(self):
        self.options = Options()
        self.options.binary_location = '/opt/headless-chromium'
        self.options.add_argument('--headless')
        self.options.add_argument('--no-sandbox')
        self.options.add_argument('--start-maximized')
        self.options.add_argument('--start-fullscreen')
        self.options.add_argument('--single-process')
        self.options.add_argument('--disable-dev-shm-usage')

    def get(self):
        driver = Chrome('/opt/chromedriver', options=self.options)
        return driver
My goal:
I want to use AWS Lambda to collect data by crawling around the clock.
I just had a similar issue, so perhaps this can help you:
Check whether selenium, chromedriver and headless-chromium are zipped properly,
so that when you upload the archive as a layer to AWS Lambda, it has the right path names for you to import selenium in your Lambda code.
You can download the file chrome_headless.zip from: https://github.com/soumilshah1995/Selenium-on-AWS-Lambda-Python3.7
Then zip it again inside that folder, on your local machine:
zip -r chrome_headless.zip chromedriver headless-chromium python
Then upload it to AWS Lambda layers, attach it to your function and test it.
All credit goes to Soumil Nitin Shah and his great tutorial:
https://www.youtube.com/watch?v=jWqbYiHudt8
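The "Error in Imports" line in the CloudWatch log suggests, though this is an assumption since the full handler is not shown, that the imports are wrapped in a try/except that logs the failure and carries on. In that case a missing selenium package only shows up later as a NameError. The sketch below reproduces that pattern with a deliberately nonexistent package name (package_missing_from_layer is hypothetical) and shows a fail-fast alternative that reports sys.path.

```python
import importlib
import sys

# Pattern that hides the real problem: the ImportError is logged and ignored.
try:
    from package_missing_from_layer import Options  # hypothetical, never installed
except ImportError:
    print("Error in Imports")


def make_options():
    # The name was never bound, so this raises NameError, not ImportError.
    return Options()


def import_or_fail(module_name):
    """Import a module or stop immediately with a message that shows sys.path."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            "Cannot import %r; sys.path is %r" % (module_name, sys.path)
        ) from exc
```

If the fail-fast version reports that selenium cannot be imported, the layer layout (the python folder inside the zip) is the first thing to check.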
I want to use the chrome webdriver to connect to "https://www.google.com".
Below is the code.
from selenium import webdriver
import time
driver = webdriver.Chrome("C:\\Users\\faisal\\library")
driver.set_page_load_timeout(10)
driver.get("https://www.google.com")
driver.find_element_by_name("q").send_keys(" automation by name ")
driver.find_element_by_name("blink").click()
time.sleep(5)
driver.close()
When I run the test, the following error message is displayed. It's a permission problem:
C:\Users\faisal\PycharmProjects\firstSeleniumTest2\venv\Scripts\python.exe C:/Users/faisal/PycharmProjects/firstSeleniumTest2/test.py
Traceback (most recent call last):
File "C:\Users\faisal\PycharmProjects\firstSeleniumTest2\venv\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
stdin=PIPE)
File "C:\Python\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Python\lib\subprocess.py", line 997, in _execute_child
startupinfo)
PermissionError: [WinError 5] Access is denied
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/faisal/PycharmProjects/firstSeleniumTest2/test.py", line 4, in <module>
driver = webdriver.Chrome("C:\\Users\\faisal\\library")
File "C:\Users\faisal\PycharmProjects\firstSeleniumTest2\venv\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 68, in __init__
self.service.start()
File "C:\Users\faisal\PycharmProjects\firstSeleniumTest2\venv\lib\site-packages\selenium\webdriver\common\service.py", line 88, in start
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'library' executable may have wrong permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home
Process finished with exit code 1
The error says it all:
selenium.common.exceptions.WebDriverException: Message: 'library' executable may have wrong permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home
In your code block you wrote:
driver = webdriver.Chrome("C:\\Users\\faisal\\library")
The error clearly shows that your program treated library as the ChromeDriver binary, hence the error.
But as per the documentation of selenium.webdriver.chrome.webdriver, the signature of WebDriver() is:
class selenium.webdriver.chrome.webdriver.WebDriver(executable_path='chromedriver', port=0, options=None, service_args=None, desired_capabilities=None, service_log_path=None, chrome_options=None)
So you need to pass the key executable_path with the absolute path of the ChromeDriver binary as its value, written as a raw (r) string within single quotes '', as follows:
driver = webdriver.Chrome(executable_path=r'C:\Users\faisal\library\chromedriver.exe')
Update
As per the counter question from @Mangohero1: of course executable_path is optional, but if you pass only the absolute path, then as per the source code below it is taken as the value of the key executable_path, because that is the first positional parameter.
class WebDriver(RemoteWebDriver):
    """
    Controls the ChromeDriver and allows you to drive the browser.
    You will need to download the ChromeDriver executable from
    http://chromedriver.storage.googleapis.com/index.html
    """
    def __init__(self, executable_path="chromedriver", port=0,
                 options=None, service_args=None,
                 desired_capabilities=None, service_log_path=None,
                 chrome_options=None):
        """
        Creates a new instance of the chrome driver.
        Starts the service and then creates new instance of chrome driver.
        :Args:
         - executable_path - path to the executable. If the default is used it assumes the executable is in the $PATH
C:\Users\faisal\library is not the correct path to chromedriver. Give the actual path to your chromedriver file.
On Linux, granting execute permission will solve the problem.
Use:
sudo chmod +x chromedriver
driver=webdriver.Chrome("C:\\Users\\SQA Anas\\Downloads\\chromedriver.exe")
Please enter the complete ChromeDriver path, like this:
"C:\Users\SQA Anas\Downloads\chromedriver.exe"
It works for me :)
The executable_path should end with chromedriver:
executable_path='/home/selenium/Linkedin-Automation/chromedriver'
I had to use the following to run the 32-bit chromedriver on 64-bit Windows 10:
driver = webdriver.Chrome(executable_path=r'C:\\Users\\My Name\\Downloads\\chromedriver_win32\\chromedriver.exe')
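The answers above all come down to the same check: the path handed to Selenium has to be the driver file itself, and that file has to be executable. A small pre-flight check along those lines might look like the sketch below. The function name check_driver_path is hypothetical and the messages are illustrative; on Windows, os.access with os.X_OK is not a meaningful test, so there only the first two checks are useful.

```python
import os


def check_driver_path(path):
    """Return a list of problems found with a driver path (empty if none).

    Hypothetical helper for illustration; not part of Selenium.
    """
    problems = []
    if not os.path.exists(path):
        problems.append("path does not exist")
    elif os.path.isdir(path):
        # The mistake from the question: a folder was passed, not the binary.
        problems.append("path is a directory, not the driver executable")
    elif not os.access(path, os.X_OK):
        # On Linux/macOS this is what 'chmod +x chromedriver' fixes.
        problems.append("file is not executable")
    return problems
```

Calling check_driver_path on the folder from the question would report that the path is a directory, which matches the "'library' executable may have wrong permissions" message Selenium produced.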
I am using Python 3.6 and the latest version of chromedriver. I have tried an older version of chromedriver and got the same problem, and restarting my PC did not help either. This is the code I run to reproduce the error:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://google.com")
Full error:
driver.get("https://google.com")
File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 268, in get
self.execute(Command.GET, {'url': url})
File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 254, in execute
response = self.command_executor.execute(driver_command, params)
File "C:\Python36\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 464, in execute
return self._request(command_info[0], url, body=data)
File "C:\Python36\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 488, in _request
resp = self._conn.getresponse()
File "C:\Python36\lib\http\client.py", line 1331, in getresponse
response.begin()
File "C:\Python36\lib\http\client.py", line 297, in begin
version, status, reason = self._read_status()
File "C:\Python36\lib\http\client.py", line 258, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Python36\lib\socket.py", line 586, in readinto
return self._sock.recv_into(b)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Put a time.sleep(3) before driver.get("https://google.com"); that will fix your error. Then, if you're like me, you'll get a different error.
Seth and Jack1990's answers above were helpful to me for troubleshooting the use of IEDriverServer from python. I did try Adhithiya's advice, but that did not help with my problem.
This GitHub site was VERY helpful to me. The section to pay attention to there is "Required Configuration". I had followed this first, however, in the statement, "On IE 7 or higher on Windows Vista or Windows 7, you must set the Protected Mode settings for each zone to be the same value. The value can be on or off, as long as it is the same for every zone." I found that I had to do this for Windows 10 also. In fact, the python error messages were very clear on this point. They all need to be enabled or disabled. They do NOT have to be at the same level.
Also, I did have to play around with the value of x in time.sleep(x). This sleep command is the one between driver = webdriver.Ie() and driver.get("http://testwisely.com/demo") in the code below. With it set to 5, the IE driver first opens a localhost page and complains that it can't be reached, and then it connects to the page that I wanted (most of the time!).
The good news is that the other 3 web browsers work great! I found that running a driver.quit() command for the Chrome, Firefox and Edge (in Windows 10) webdrivers successfully shuts down those browsers, whereas the IE driver version did not shut down IE.
My code is below in case you'd like to use it for experimentation.
from selenium import webdriver
import time

browser_to_use = "Edge"  # "Chrome" "Firefox" "Ie"
if browser_to_use == "Chrome":
    driver = webdriver.Chrome()
elif browser_to_use == "Firefox":
    driver = webdriver.Firefox()
elif browser_to_use == "Ie":  # This sucks!
    driver = webdriver.Ie()
    time.sleep(5)
elif browser_to_use == "Edge":
    driver = webdriver.Edge()
driver.get("http://testwisely.com/demo")
time.sleep(5)
driver.quit()
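Since the right value for time.sleep(x) had to be found by trial and error, one alternative is to retry the call a few times with a pause in between instead of sleeping once up front. This is a different technique from the fixed sleep used above and does not address the root cause of the reset connection. The helper name retry and its parameters are hypothetical, not a Selenium API.

```python
import time


def retry(action, exceptions, attempts=3, delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    exceptions is a tuple of exception types worth retrying; anything
    else propagates immediately. The last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(delay)  # pause before the next try
```

With the question's code this could be used as retry(lambda: driver.get("https://google.com"), (ConnectionResetError,)), assuming ConnectionResetError is the exception that actually reaches the caller, as in the traceback shown.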
Chromedriver might be running in the background, check Background processes in your task manager.
If you find more than one instance of chromedriver running, kill all the processes manually and try running the program again.
You should be good to go.
Downloading an old version (3.8) also fixes the issue, but the test will run extremely slowly...
You can find the link here: http://selenium-release.storage.googleapis.com/index.html?path=3.8/
I'm trying to follow a tutorial about Selenium, http://selenium-python.readthedocs.io/getting-started.html. I've downloaded the latest version of geckodriver and copied it to /usr/local/bin. However, when I try
from selenium import webdriver
driver = webdriver.Firefox()
I get the following error message:
Traceback (most recent call last):
File "/Users/kurtpeek/Documents/Scratch/selenium_getting_started.py", line 4, in <module>
driver = webdriver.Firefox()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/firefox/webdriver.py", line 152, in __init__
keep_alive=True)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 188, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in execute
self.error_handler.check_response(response)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Unable to find a matching set of capabilities
[Finished in 1.2s with exit code 1]
From https://github.com/SeleniumHQ/selenium/issues/3884, it seems like other users are experiencing similar issues, but the Selenium team is unable to reproduce it. How can I get Selenium working with Firefox? (It does work with chromedriver and a webdriver.Chrome() instance, so I suspect this might be a bug in Selenium).
Updating Firefox and Selenium solved it for me. I don't pretend to have an explanation for the root cause however.
Updated Firefox 48 → 53
Updated to Selenium 3.4.1
I also reinstalled/updated Geckodriver using Homebrew and explicitly used it as an executable for Selenium WebDriver, but it turned out that it wasn't necessary to mitigate the "Unable to find matching set of capabilities" error.
I had this same issue, and the problem was related to using Firefox ESR (I'm on Debian). To be more specific, I'm on Debian 10 using 64-bit Firefox 68.11.0esr, python3.7, selenium 3.141.0, and geckodriver 0.27.0.
Here's the standard example I used that failed:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://google.com")
As recommended in this answer, I changed:
browser = webdriver.Firefox()
to
browser = webdriver.Firefox(firefox_binary="/usr/bin/firefox-esr")
and it worked.
If you don't know the path to firefox-esr, you can run sudo find / -name firefox-esr on the command line. Several should come up.
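Instead of searching the whole filesystem with find, the binary can often be located through the PATH from Python itself. The sketch below tries a list of candidate executable names in order; the function name find_firefox_binary and the candidate list are assumptions for illustration, and a Firefox installed outside the PATH would not be found this way.

```python
import shutil


def find_firefox_binary(candidates=("firefox", "firefox-esr"), which=shutil.which):
    """Return the full path of the first candidate found on PATH, else None.

    The 'which' parameter exists only so the lookup can be swapped out
    in tests; by default it is shutil.which.
    """
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

The result could then be passed on in the same way as in the answer above, for example webdriver.Firefox(firefox_binary=find_firefox_binary()), after checking that it is not None.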
For me it was enough to just upgrade Firefox.
Mac user here.
I fixed this issue by making sure Firefox is named "Firefox" and in the "Applications" folder. I had called it "Firefox 58" before (I have multiple versions).
Just sharing my success case here.
Note: Remember that the architecture matters here: Windows 64/32-bit or Linux 64/32-bit. Make sure you download the matching 64/32-bit Selenium WebDriver and 64/32-bit geckodriver.
My configuration was as follows:
OS: CentOS 7 64-bit, Windows 7 64-bit
Firefox: 52.0.3
Selenium Webdriver: 3.4.0 (Windows), 3.8.1 (Linux Centos)
GeckoDriver: v0.16.0 (Windows), v0.17.0 (Linux Centos)
Working Code (Without Proxy Settings)
System.setProperty("webdriver.gecko.driver", "/home/seleniumproject/geckodrivers/linux/v0.17/geckodriver");
ProfilesIni ini = new ProfilesIni();
// Change the profile name to your own. The profile name can
// be found under the .mozilla folder: ~/.mozilla/firefox/profile.
// See your profiles.ini for the default profile name.
FirefoxProfile profile = ini.getProfile("default");
DesiredCapabilities cap = new DesiredCapabilities();
cap.setAcceptInsecureCerts(true);
FirefoxBinary firefoxBinary = new FirefoxBinary();
GeckoDriverService service = new GeckoDriverService.Builder(firefoxBinary)
        .usingDriverExecutable(new File("/home/seleniumproject/geckodrivers/linux/v0.17/geckodriver"))
        .usingAnyFreePort()
        .build();
try {
    service.start();
} catch (IOException e) {
    e.printStackTrace();
}
FirefoxOptions options = new FirefoxOptions().setBinary(firefoxBinary).setProfile(profile).addCapabilities(cap);
driver = new FirefoxDriver(options);
driver.get("https://www.google.com");
System.out.println("Life Title -> " + driver.getTitle());
driver.close();
Working Code (With Proxy Settings)
System.setProperty("webdriver.gecko.driver", "/home/seleniumproject/geckodrivers/linux/v0.17/geckodriver");
String PROXY = "my-proxy.co.jp";
int PORT = 8301;
ProfilesIni ini = new ProfilesIni();
// Change the profile name to your own. The profile name can
// be found under the .mozilla folder: ~/.mozilla/firefox/profile.
// See your profiles.ini for the default profile name.
FirefoxProfile profile = ini.getProfile("default");
com.google.gson.JsonObject json = new com.google.gson.JsonObject();
json.addProperty("proxyType", "manual");
json.addProperty("httpProxy", PROXY);
json.addProperty("httpProxyPort", PORT);
json.addProperty("sslProxy", PROXY);
json.addProperty("sslProxyPort", PORT);
DesiredCapabilities cap = new DesiredCapabilities();
cap.setAcceptInsecureCerts(true);
cap.setCapability("proxy", json);
FirefoxBinary firefoxBinary = new FirefoxBinary();
GeckoDriverService service = new GeckoDriverService.Builder(firefoxBinary)
        .usingDriverExecutable(new File("/home/seleniumproject/geckodrivers/linux/v0.17/geckodriver"))
        .usingAnyFreePort()
        .build();
try {
    service.start();
} catch (IOException e) {
    e.printStackTrace();
}
FirefoxOptions options = new FirefoxOptions().setBinary(firefoxBinary).setProfile(profile).addCapabilities(cap);
driver = new FirefoxDriver(options);
driver.get("https://www.google.com");
System.out.println("Life Title -> " + driver.getTitle());
driver.close();
In my case, I only had Firefox Developer Edition installed, and it threw the same error.
Installing a standard Firefox version solved it.
I had the same issue. My geckodriver was 32-bit and Firefox was 64-bit. Resolved by updating geckodriver to 64-bit.
I had exactly the same issue when I was using Selenium's Firefox():
>> webdriver.Firefox()
It was not working and threw an error like "Unable to find a matching set of capabilities".
Then I downloaded geckodriver.exe and put the .exe file inside both of these directories:
C:\Users\<USER-NAME>\AppData\Local\Programs\Python\Python36\Scripts
and
C:\Users\<USER-NAME>\AppData\Local\Programs\Python\Python36\
and added these two paths to the PATH environment variable.
Then it started working.
Here's the solution that solved it for me. Don't overlook this point: make sure you're using the correct 32/64 bit version of the binaries - it should be uniform - e.g. if the firefox is 64bit, so must be the geckodriver.
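On Windows, one way to check whether geckodriver.exe and firefox.exe have the same bitness is to read the machine field from each file's PE header. The sketch below is only an illustration: the function names are hypothetical, it recognizes just three common machine codes, it assumes the PE header starts within the first 4096 bytes, and it does not apply to Linux or macOS binaries.

```python
import struct

# Machine codes from the PE/COFF file header.
PE_MACHINE_NAMES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0xAA64: "ARM64 (64-bit)",
}


def pe_machine(data):
    """Return the architecture named in a PE header, or None if data
    does not look like a Windows executable."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # Offset 0x3C of the DOS header holds the position of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 6 or data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    # The 2-byte machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return PE_MACHINE_NAMES.get(machine, hex(machine))


def exe_architecture(path):
    """Read the start of an .exe file and report its architecture."""
    with open(path, "rb") as handle:
        return pe_machine(handle.read(4096))
```

Comparing exe_architecture for geckodriver.exe and for firefox.exe would then show directly whether the two match.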
Got the same error on a DigitalOcean droplet because Firefox was not installed. The stack trace of the error is shown below:
exception_class
<class 'selenium.common.exceptions.SessionNotCreatedException'>
json
<module 'json' from '/usr/lib/python3.5/json/__init__.py'>
message
'Unable to find a matching set of capabilities'
response
{'status': 500,
'value': '{"value":{"error":"session not created","message":"Unable to find a '
'matching set of capabilities","stacktrace":""}}'}
screen
None
self
<selenium.webdriver.remote.errorhandler.ErrorHandler object at 0x7f428e3f10f0>
stacktrace
None
status
'session not created'
value
{'error': 'session not created',
'message': 'Unable to find a matching set of capabilities',
'stacktrace': ''}
value_json
('{"value":{"error":"session not created","message":"Unable to find a matching '
'set of capabilities","stacktrace":""}}')
Different workarounds seem to make the error go away. After ensuring you have downloaded and installed the 64-bit versions of Firefox and geckodriver.exe, update the PATH with the location of geckodriver.exe. It may also help to launch geckodriver.exe before running the script, which opens a cmd-like window. Now if you run the Python script, you shouldn't run into the error below:
selenium.common.exceptions.SessionNotCreatedException: Message: Unable to find a matching set of capabilities