Scrapy-Selenium: Chrome Driver does not load page

I have two projects: one uses plain Selenium, and the other uses Scrapy-Selenium, which follows the Scrapy spider format but uses Selenium for browser automation.
In the plain Selenium program, Chromedriver loads the page I want, but something in the second project (with Scrapy) prevents it from loading the URL. Instead, the browser stays stuck showing data:, in the URL bar.
First project (works fine):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path="./chromedriver")
driver.get("https://ricemedia.co")
Second project (doesn't load page):
import scrapy
from scrapy_selenium import SeleniumRequest
from selenium import webdriver
import time
class ExampleSpider(scrapy.Spider):
    name = 'rice'
    def start_requests(self):
        yield SeleniumRequest(
            url="https://ricemedia.co",
            wait_time=3,
            callback=self.parse
        )
    def parse(self, response):
        driver = webdriver.Chrome(executable_path="./chromedriver")
        driver.maximize_window()
        time.sleep(20)
I have searched StackOverflow and Google, and the two most commonly cited causes are an outdated Chromedriver and a missing http in the URL. Neither applies to me. The path to chromedriver seems fine too (the two projects are in the same folder and share the same chromedriver). Since one works and the other doesn't, the problem should lie in my Scrapy-Selenium spider.
I should add that I have installed Scrapy, Selenium and Scrapy-Selenium locally in my virtual environment with pip, and I doubt it's an installation issue.
Please help, thanks!

You can use another method to install the Chrome driver.
First, install webdriver-manager with pip install webdriver-manager (or, in a Java project, add it as a Maven dependency).
Then use this code:
# selenium 3
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
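A note on the spider itself, offered as a sketch rather than a confirmed fix. When the scrapy-selenium middleware is enabled, it opens its own browser, loads each SeleniumRequest URL there, and (per the scrapy-selenium README) exposes that browser as response.request.meta['driver']. The parse method in the question starts a second Chrome window and never calls get() on it, and a freshly started Chromedriver window shows data:, until it is told to load something. The setting names below are the ones the README lists; the fallback path and the keys of the yielded dictionary are assumptions for illustration only.

```python
# settings.py (sketch): setting names as listed in the scrapy-selenium README.
from shutil import which

SELENIUM_DRIVER_NAME = "chrome"
# Assumption: use chromedriver from PATH, else the local copy from the question.
SELENIUM_DRIVER_EXECUTABLE_PATH = which("chromedriver") or "./chromedriver"
SELENIUM_DRIVER_ARGUMENTS = []  # e.g. ["--headless"] to hide the window
DOWNLOADER_MIDDLEWARES = {"scrapy_selenium.SeleniumMiddleware": 800}


# Spider callback (sketch): reuse the browser the middleware already pointed
# at the requested URL instead of creating a new webdriver.Chrome().
def parse(self, response):
    driver = response.request.meta["driver"]
    # Illustrative item only; the field names are not from the question.
    yield {"url": driver.current_url, "title": driver.title}
```

In a real project the settings would live in settings.py and parse would be a method of ExampleSpider; they are shown together here only to keep the sketch short.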

Related

Chromedriver driver.get() working on Windows but does not work on Mac (Chrome version 101.0.4951.54)

I am writing a program that uses selenium and chromedriver to load a page. The same code loads the page (nytimes.com) on my Windows computer but not on my Mac. On my Mac, it loads the webdriver with the blank data:, page but just stops and the console log just shows it waiting. I don't know why the driver does not get the page.
This is my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
opts = Options()
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('http://nytimes.com')
html = driver.page_source
Below is the last thing it shows in the console log. After that it just waits there with a blinking cursor.
====== WebDriver manager ======
Current google-chrome version is 101.0.4951
Get LATEST chromedriver version for 101.0.4951 google-chrome
Driver [/Users/me/.wdm/drivers/chromedriver/mac64_m1/101.0.4951.41/chromedriver] found in cache
testing.py:14: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver = webdriver.Chrome(ChromeDriverManager().install())
What could be the problem? I suspect it's the new version of Chrome that I'm using, but why would that change anything?

From the message below, it seems you previously ran code that installed an older version of the driver. Now, when you run the code again, webdriver-manager detects that older version in its cache. Please check the path below:
Driver [/Users/me/.wdm/drivers/chromedriver/mac64_m1/101.0.4951.41/chromedriver] found in cache
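To check what is actually in the cache, something like the following sketch may help. It assumes the directory layout visible in the log line above (platform folder, then version folder, then the chromedriver binary) and compares only the major version number, which is a simplification of how Chromedriver and Chrome versions are matched. The function names are made up for this example.

```python
from pathlib import Path


def cached_driver_versions(cache_root):
    """Return {version: path} for chromedriver binaries under cache_root."""
    found = {}
    # Assumed layout: <cache_root>/<platform>/<version>/chromedriver
    for binary in Path(cache_root).glob("*/*/chromedriver*"):
        if binary.is_file():
            found[binary.parent.name] = binary
    return found


def major(version):
    """Return the leading number of a dotted version string."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    root = Path.home() / ".wdm" / "drivers" / "chromedriver"
    browser_version = "101.0.4951.54"  # Chrome version from the question title
    for version, path in sorted(cached_driver_versions(root).items()):
        same = major(version) == major(browser_version)
        status = "matches" if same else "differs from"
        print(f"{version}: major version {status} Chrome {browser_version} ({path})")
```

If the cached entry turns out to be stale, deleting that version folder should make webdriver-manager download a driver again on the next run.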

How to stop selenium from printing WebDriver manager startup logs?

When I launch a new Selenium driver, I get a message like this:
====== WebDriver manager ======
Current chromium version is 90.0.4430
Get LATEST chromedriver version for 90.0.4430 chromium
Driver [/root/.wdm/drivers/chromedriver/linux64/90.0.4430.24/chromedriver] found in cache
I tried using:
chrome_options.add_experimental_option("excludeSwitches", ["enable-logging"])
chrome_options.add_argument('log-level=2')
But neither worked.
Is there a better way?

To silence the webdriver-manager (Python) logs and remove them from the console, set the environment variable WDM_LOG_LEVEL to 0 before your Selenium tests run, as follows:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import os
os.environ['WDM_LOG_LEVEL'] = '0'
options = Options()
options.add_argument("start-maximized")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://www.google.com")

According to the documentation, just add the code below to your files:
import os
os.environ['WDM_LOG'] = '0'
I have tried it myself, and it works very well.
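The two answers so far use different variable names (WDM_LOG_LEVEL and WDM_LOG), presumably because they target different webdriver-manager releases. A small sketch that sets both, so the same script stays quiet whichever name the installed version reads; the helper name is made up for this example, and which variable a given release honors is an assumption to verify against its documentation.

```python
import logging
import os


def silence_wdm_env(env=os.environ):
    """Set both log variables mentioned in this thread to "0"."""
    quiet = str(logging.NOTSET)   # logging.NOTSET is 0, so this is "0"
    env["WDM_LOG"] = quiet        # name used in the answer just above
    env["WDM_LOG_LEVEL"] = quiet  # name used in the earlier answer
    return env


# Call this before creating ChromeDriverManager(), so the values are in place.
silence_wdm_env()
```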

The log-level that you are setting for chrome_options is completely separate from the logs that you are seeing from using the external library webdrivermanager for Python. That library will have its own way of disabling log messages (or at least it should). There are other Python libraries for managing WebDriver installs, such as SeleniumBase for example. Related, you might be able to change the Python logging level to hide that message, see Dynamically changing log level without restarting the application for details.
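A minimal sketch of the Python-logging route, using only the standard library. It assumes webdriver-manager emits its startup messages at INFO level through a standard logger named "WDM"; that name is an assumption about the library's internals and may differ between versions. The helper name is made up for this example.

```python
import logging


def quiet_logger(name, level=logging.WARNING):
    """Raise the threshold of a named logger so INFO messages are dropped."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


# Assumption: "WDM" is the logger name used by webdriver-manager.
quiet_logger("WDM")
```

Chrome's own log-level and excludeSwitches options can stay as they are; they affect a different set of messages.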

Are you using webdriver-manager (pip install webdriver-manager)? It looks like that is what is producing the logs. I'm using Selenium without webdriver-manager and without any Chrome options to remove logs, and no logs are printed.
Also see: Turning off logging in Selenium (from Python)

This worked for me for webdriver_manager v3.8.3:
import logging
from webdriver_manager.core.logger import __logger as wdm_logger
wdm_logger.setLevel(logging.WARNING)

Checking current browser url with python

I'm trying to make an app that blocks access to certain websites, and I'm stuck on how to check the current URL. I've tried Selenium, but it doesn't work when you change tabs, so I had to try something else. I've been thinking about a Chrome add-on that checks the current URL and sends it to my Python code, but I don't know how to do that without running an additional server. Any help appreciated.

You can use Selenium and webdriver-manager for this.
Install webdriver-manager first using
pip install webdriver-manager
Then run the following code:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
print(driver.current_url)
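Two caveats with the snippet above: a freshly started driver reports data:, until get() is called, and Selenium only sees the browser window it launched itself, one tab at a time. A rough sketch of reading every tab in that browser and testing each URL against a block list follows; the helper names and the idea of matching on host names are assumptions for illustration, not part of the answer.

```python
from urllib.parse import urlsplit


def open_tab_urls(driver):
    """Return the URL of every tab in a browser that Selenium itself started."""
    original = driver.current_window_handle
    urls = []
    for handle in driver.window_handles:
        driver.switch_to.window(handle)  # current_url reflects the selected tab
        urls.append(driver.current_url)
    driver.switch_to.window(original)    # go back to the tab we started on
    return urls


def is_blocked(url, blocked_hosts):
    """True if the URL's host is a blocked host or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == b or host.endswith("." + b) for b in blocked_hosts)
```

With a live driver, this could be used as `[u for u in open_tab_urls(driver) if is_blocked(u, {"example.org"})]`, where example.org stands in for whatever sites should be blocked.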

Selenium Chrome Extension Interaction

I can load a Chrome extension into Selenium's Chromedriver in Python. But I need to log in to this extension to be able to use it. My question is: how can I interact with the extension in order to log in? The extension is "Hoxx VPN".
So far I have the following code:
from selenium import webdriver
chop = webdriver.ChromeOptions()
chop.add_extension("D:/01_PhD/Fogadas/chromeextension/2.2.2_0.crx")
driver = webdriver.Chrome(chrome_options=chop)

Selenium WebDriver can interact with web pages only. I have tried this before but did not succeed.
See this for reference:
https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/7805

You can use the Opera web browser; its latest versions (70+) have a built-in VPN that can be activated easily with Selenium.
GitHub has the code required to drive Opera with Selenium.

Selenium Chromedriver not working in Python, Centos Cpanel Server

I installed selenium and chrome driver.
I then created the below code to run it:
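from pyvirtualdisplay import Display
from selenium import webdriver
# Start a virtual display so Chrome can run on a server without a screen
display = Display(visible=0)
display.start()
chromedriver = "/usr/bin/chromedriver"
driver = webdriver.Chrome(chromedriver)
driver.get(url)  # url is not defined in the snippet as posted
html = driver.page_source
driver.quit()
display.stop()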
The code yields the following error: "'chromedriver.exe' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home"
The path is correct. I have tried some variations of the code, but they all yield the same message. My web host claims you can't install Selenium on a CentOS server that has cPanel. Is this true, or is there something wrong with my code?

You might try setting the chromedriver location through os.environ and passing ChromeOptions to the driver.
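One way to read the os.environ suggestion is to confirm that the driver file exists and is executable, and to put its folder on PATH before starting Selenium. A standard-library sketch follows; the function name is made up for this example, and the default path is the one from the question.

```python
import os
import shutil


def locate_chromedriver(explicit_path="/usr/bin/chromedriver", env=os.environ):
    """Return a usable chromedriver path, or None if nothing executable is found."""
    # 1. Prefer the explicit path if it is a file with execute permission.
    if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
        folder = os.path.dirname(explicit_path)
        parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
        if folder not in parts:
            # Prepend the folder so a lookup by name also finds the driver.
            env["PATH"] = os.pathsep.join([folder] + parts)
        return explicit_path
    # 2. Otherwise fall back to whatever "chromedriver" resolves to on PATH.
    return shutil.which("chromedriver", path=env.get("PATH"))
```

If this returns None on the server, the file is missing or not executable for the user running the script, which would be consistent with the error message in the question.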
