I'm trying to automate a process using Selenium. Everything works perfectly, but the site has anti-bot measures in place that block my Selenium script. To solve this, I came across a Python module called selenium-stealth, which does some things to avoid those anti-bot checks. It works, but only on the original tab that opens at the start. Any new tabs in that same browser don't get the stealth. Is there a way to apply this stealth to every tab?
Here's demo code that reproduces the stealth not working on multiple tabs:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium_stealth import stealth
import time
options = webdriver.ChromeOptions()
options.add_argument("--log-level=3")
options.add_argument("start-maximized")
options.add_argument("--mute-audio")
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')
options.binary_location = "C:\\Program Files\\Google\\Chrome Beta\\Application\\chrome.exe"
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
bot = webdriver.Chrome(service=Service("chromedriver.exe"), options=options)
stealth(bot,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
bot.get("https://infosimples.github.io/detect-headless/")
time.sleep(5)
bot.execute_script('''window.open("https://infosimples.github.io/detect-headless/","_blank");''')
time.sleep(20)
bot.quit()
Outputs (screenshots of the detection page for the main tab and the 2nd tab, not preserved here):
As the screenshots showed, the first tab passes every check, but the 2nd tab for some reason doesn't get the stealth. What could be the reason, and is there any way to make this work?
The fact that selenium-stealth settings aren't effective in adjacent tabs has been a known issue for almost a year now.
You can find the detailed discussion in the selenium-stealth repository issue "Settings doesn't apply on all tabs (but only on the first one)".
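One possible workaround, not verified against every selenium-stealth version, is to re-apply stealth() to each new tab before it loads anything: open a blank tab, switch to it, call stealth() again, and only then navigate. The sketch below assumes Selenium 4 (for driver.switch_to.new_window) and assumes selenium-stealth registers its scripts per tab through the DevTools protocol. open_stealth_tab is a hypothetical helper name, and the stealth function is passed in so the helper itself does not import the library.

```python
from typing import Any, Callable


def open_stealth_tab(driver: Any, url: str,
                     stealth_fn: Callable[..., None], **stealth_kwargs: Any) -> str:
    """Open a new tab, apply stealth to it, then load url.

    Hypothetical helper: assumes a Selenium 4 driver and a stealth function
    with the same call shape as selenium_stealth.stealth(driver, **kwargs).
    Returns the handle of the new tab.
    """
    # Opens an empty tab and makes it the driver's current window.
    driver.switch_to.new_window("tab")
    # Register the stealth scripts on this tab before any page is loaded.
    stealth_fn(driver, **stealth_kwargs)
    # Navigate only now, so the scripts are in place for the first document.
    driver.get(url)
    return driver.current_window_handle
```

With the question's code, this would replace the window.open call with something like open_stealth_tab(bot, "https://infosimples.github.io/detect-headless/", stealth, languages=["en-US", "en"], vendor="Google Inc.", platform="Win32", webgl_vendor="Intel Inc.", renderer="Intel Iris OpenGL Engine", fix_hairline=True). The key design choice is ordering: the tab must exist and have stealth applied before the URL is loaded, which window.open(url) does not allow.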
I wanted to build a semi-automatic solution for scraping a website protected by Cloudflare's hCaptcha. My idea was to solve the captcha manually whenever it appears and then let my scraper run on the website for a while until another captcha has to be solved.
To try out my solution, I open the URL with Selenium while trying to make it look like a regular user:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium_stealth import stealth
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
s=Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=options)
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
driver.get(url_to_scrape) # Fill the captcha manually
I want to get to the actual website after solving the captcha so I can scrape some information from it. The problem is that even when I solve the captcha, Cloudflare doesn't let me see the site. It just reloads the captcha page (with a 403 response) and makes me solve another one, then another, and so on.
What am I doing wrong? There shouldn't be any problem with my solving the captcha, so it must somehow be detecting Selenium as a bot. I thought that, with the snippet above, the website wouldn't see Selenium any differently from a normal user with the Chrome browser, but I'm clearly missing something.
Without the site URL it is impossible to tell exactly what is happening. From previous experience, though, I believe the hCaptcha prompt is probably appearing as a result of the site's protection layer and may not be part of the site itself.
If it's appearing because of the site protection, start your browser with your own Chrome profile. For example, with the Selenium PowerShell module:
$browser = Start-SeDriver -Browser Chrome -Arguments "--user-data-dir=C:\Users\$($env:username)\AppData\Local\Google\Chrome\User Data"
$browser.Navigate().GoToURL("https://google.com")
...then run the rest of your code to scrape the site.
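For a Python script like the one in the question, the equivalent is to pass --user-data-dir through ChromeOptions. The helper below only builds the argument string; chrome_profile_args is a hypothetical name, and the path assumes the default Chrome profile location on Windows. Chrome generally has to be fully closed first, because a profile directory can only be used by one running Chrome instance at a time.

```python
from pathlib import PureWindowsPath
from typing import List


def chrome_profile_args(username: str) -> List[str]:
    """Build the --user-data-dir switch for the default Chrome profile on Windows.

    Assumes the standard per-user install location under AppData\\Local.
    """
    user_data = (PureWindowsPath("C:/Users") / username / "AppData" / "Local"
                 / "Google" / "Chrome" / "User Data")
    # PureWindowsPath renders with backslashes regardless of the host OS.
    return [f"--user-data-dir={user_data}"]
```

Each returned string is passed to options.add_argument() before creating webdriver.Chrome; on the Windows machine itself, os.environ["USERNAME"] can supply the username.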
My OS: Microsoft Windows 11
GOOGLE CHROME:
I have the Google website open and I want to open the Stack Overflow website in a new tab without switching to it, so the screen keeps showing the Google website.
My first attempt used the webbrowser module and its autoraise argument:
import webbrowser
sof = 'https://stackoverflow.com'
webbrowser.open(sof, new=0, autoraise=False)
webbrowser.open(sof, new=2, autoraise=False)
webbrowser.open_new_tab(sof)
None of the above options opened the tab in Chrome in the background while keeping focus on the tab that was already open.
So I tried again using subprocess and its getoutput function:
import subprocess
r = subprocess.getoutput("google-chrome-stable https://stackoverflow.com")
r
That option didn't even open a new tab in my browser.
MOZILLA FIREFOX:
I tried the webbrowser module and its autoraise argument here as well (my default browser is a different one, so I need to register Firefox explicitly):
import webbrowser
sof = 'https://stackoverflow.com'
webbrowser.register('firefox',
                    None,
                    webbrowser.BackgroundBrowser(r"C:\Program Files\Mozilla Firefox\firefox.exe"))
webbrowser.get('firefox').open(sof, new=0, autoraise=False)
I couldn't get this to work in either browser.
How should I proceed?
Chrome:
I don't think it is feasible (at least not with Chrome).
See this StackExchange answer for details, especially the bug it mentions, which will most likely never be fixed.
Firefox:
Same here. After some research, the only way I found to make it work is to set the config option browser.tabs.loadDivertedInBackground to true (in about:config),
then launch the background tab like this (or from Python with the os or subprocess module):
"C:\Program Files\Mozilla Firefox\firefox.exe" -new-tab "https://stackoverflow.com/"
See https://stackoverflow.com/a/2276679/2606766. But again I don't think this solves your problem, does it?
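From Python, the Firefox command above can be launched with subprocess. This is a sketch assuming Firefox is installed at the path used in the question and that browser.tabs.loadDivertedInBackground has already been set to true; firefox_new_tab_cmd and open_in_background are hypothetical helper names.

```python
import subprocess
from typing import List

FIREFOX_EXE = r"C:\Program Files\Mozilla Firefox\firefox.exe"  # assumed install path


def firefox_new_tab_cmd(url: str, exe: str = FIREFOX_EXE) -> List[str]:
    """Build the argument list for opening url in a new Firefox tab."""
    # A list (not a single string) avoids quoting problems with spaces in the path.
    return [exe, "-new-tab", url]


def open_in_background(url: str, exe: str = FIREFOX_EXE) -> subprocess.Popen:
    """Start Firefox without waiting for it to exit.

    Whether the tab stays in the background depends on the
    browser.tabs.loadDivertedInBackground preference, not on this call.
    """
    return subprocess.Popen(firefox_new_tab_cmd(url, exe))
```

Popen is used instead of getoutput so the script does not block waiting for the browser process.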
Maybe you can simulate the keyboard with the pynput library,
and then simulate Ctrl + Tab to switch to the newly opened website?
Edit: to go back to the previous tab, press Ctrl + Shift + Tab, which is what the code below does:
import webbrowser, time
from pynput.keyboard import Key,Controller
keyboard = Controller()
webbrowser.open("https://www.youtube.com/")
time.sleep(3)
keyboard.press(Key.ctrl)
keyboard.press(Key.shift)
keyboard.press(Key.tab)
keyboard.release(Key.ctrl)
keyboard.release(Key.shift)
keyboard.release(Key.tab)
Are you familiar with CDP (the Chrome DevTools Protocol) and Selenium?
Option A:
CDP via a Selenium-controlled browser:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service('/path/bin/chromedriver'))
driver.get("https://example.com/")
driver.execute_cdp_cmd(cmd="Target.createTarget", cmd_args={"url": 'https://stackoverflow.com/', "background": True})
"background": True is the key part: it creates the tab without bringing it to the front.
EDIT:
On Linux the browser doesn't close when the script ends, at least for me.
If it closes when your script exits, try the following (the detach option keeps Chrome open after the script finishes):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
CHROME_DRIVER_PATH = '/bin/chromedriver'
chrome_options = Options()
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(CHROME_DRIVER_PATH), options=chrome_options)
driver.get("https://example.com/")
driver.execute_cdp_cmd(cmd="Target.createTarget",cmd_args={"url": 'https://stackoverflow.com/', "background": True})
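Target.createTarget returns the id of the new target, which is handy when opening several pages at once. A minimal sketch, assuming a Selenium 4 Chrome driver (execute_cdp_cmd is Chromium-specific); open_background_tabs is a hypothetical helper.

```python
from typing import Any, Iterable, List


def open_background_tabs(driver: Any, urls: Iterable[str]) -> List[str]:
    """Open each URL in a background tab via CDP and return the new target ids."""
    target_ids = []
    for url in urls:
        # "background": True asks Chrome not to bring the new tab to the front.
        result = driver.execute_cdp_cmd(
            "Target.createTarget", {"url": url, "background": True}
        )
        target_ids.append(result["targetId"])
    # The driver's current window is not changed by creating targets this way.
    return target_ids
```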
Option B:
Manually run Chrome with a remote debugging port (from cmd, subprocess.Popen, or anything else):
chrome.exe --remote-debugging-port=4455
and then either use a Python CDP client such as trio,
or tell Selenium to use your existing browser:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:4455")
# Change the ChromeDriver path accordingly
CHROME_DRIVER_PATH = r"C:\chromedriver.exe"
driver = webdriver.Chrome(service=Service(CHROME_DRIVER_PATH), options=chrome_options)
driver.get("https://example.com/")
driver.execute_cdp_cmd(cmd="Target.createTarget",cmd_args={"url": 'https://stackoverflow.com/', "background": True})
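For Option B, the manual launch can also be done from Python with subprocess.Popen, so the port passed to Chrome and the debuggerAddress given to Selenium come from one place. This is a sketch: the Chrome path and port are assumptions, and the helper names are hypothetical. Depending on the Chrome version, a separate --user-data-dir may also be needed for the debugging port to be honored.

```python
import subprocess
from typing import List, Optional

CHROME_EXE = r"C:\Program Files\Google\Chrome\Application\chrome.exe"  # assumed path


def chrome_debug_cmd(port: int, user_data_dir: Optional[str] = None,
                     exe: str = CHROME_EXE) -> List[str]:
    """Build a Chrome command line with remote debugging enabled on port."""
    cmd = [exe, f"--remote-debugging-port={port}"]
    if user_data_dir:
        # A dedicated profile directory; newer Chrome versions may ignore the
        # debugging port when the default profile is used.
        cmd.append(f"--user-data-dir={user_data_dir}")
    return cmd


def debugger_address(port: int, host: str = "127.0.0.1") -> str:
    """Value for the "debuggerAddress" experimental option."""
    return f"{host}:{port}"


def launch_chrome(port: int, user_data_dir: Optional[str] = None) -> subprocess.Popen:
    """Start Chrome without waiting for it to exit."""
    return subprocess.Popen(chrome_debug_cmd(port, user_data_dir))
```

Usage would be launch_chrome(4455) followed by chrome_options.add_experimental_option("debuggerAddress", debugger_address(4455)).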
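The simplest way is to switch to the last entry in window_handles (index -1) with ChromeDriver:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service('chrome/driver/path'))
# ... open the new tab here, then make it the current window:
driver.switch_to.window(driver.window_handles[-1])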
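Indexing with -1 relies on the new tab being last in window_handles, and the WebDriver specification leaves the order of handles unspecified. A slightly more robust sketch records the handles before opening the tab and switches to whichever one is new. switch_to_new_tab is a hypothetical helper; in real use you may want to wait for the handle to appear first, for example with WebDriverWait and expected_conditions.new_window_is_opened.

```python
from typing import Any, Iterable


def switch_to_new_tab(driver: Any, handles_before: Iterable[str]) -> str:
    """Switch to the single window handle that was not present before."""
    new_handles = set(driver.window_handles) - set(handles_before)
    if len(new_handles) != 1:
        raise RuntimeError(f"expected exactly one new window, found {len(new_handles)}")
    handle = new_handles.pop()
    driver.switch_to.window(handle)
    return handle
```

Usage: before = driver.window_handles, then open the tab, then switch_to_new_tab(driver, before).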
I am using Selenium with Python and ChromeDriver, and I wanted to know if there is a way to find the command for enabling or disabling an option with the add_argument() function. For example, there is '--disable-infobars', but if I come across a new setting, how do I find the matching command?
One example is the setting to auto-download PDFs.
Any help is appreciated.
Chromium has lots of command-line switches, such as --disable-extensions or --disable-popup-blocking, that can be enabled at runtime using Options().add_argument().
Here is a list of some of the Chromium Command Line Switches.
Chromium also allows other runtime features to be controlled, such as useAutomationExtension or plugins.always_open_pdf_externally. These are set with add_experimental_option() (the latter inside the prefs dictionary), and ChromeDriver receives them as part of the browser capabilities.
I normally review the Chromium source code when I need to find other features to control this way.
The code below uses both command switches and runtime enabled features to automatically save a PDF file to disk without being prompted.
For my answer I downloaded a PDF file from the Library of Congress.
If you have any questions related to this code or something else related to your question please let me know.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
chrome_options = Options()
chrome_options.add_argument('--disable-infobars')
chrome_options.add_argument('--start-maximized')
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--disable-popup-blocking')
# disable the banner "Chrome is being controlled by automated test software"
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_experimental_option("excludeSwitches", ['enable-automation'])
# set the path for your download directory (an absolute path is safest)
prefs = {
    'download.default_directory': 'download_directory',
    'download.prompt_for_download': False,
    # download PDFs instead of opening them in Chrome's built-in viewer
    'plugins.always_open_pdf_externally': True
}
# prefs are sent to ChromeDriver inside the capabilities built from chrome_options
chrome_options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'), options=chrome_options)
url_main = 'https://www.loc.gov/aba/publications/FreeLCC/freelcc.html'
driver.get(url_main)
driver.implicitly_wait(20)
download_pdf_file = driver.find_element(By.XPATH, '//*[@id="main_body"]/ul[2]/li[1]/a')
download_pdf_file.click()
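Because click() returns before the download finishes, a script that quits right away can cut the download short. One common approach is to poll the download directory, sketched here under the assumption of Chrome's usual behavior of writing an in-progress download as a .crdownload file and renaming it when complete. wait_for_download is a hypothetical helper, and the directory should be the same absolute path given in download.default_directory.

```python
import time
from pathlib import Path
from typing import Optional


def wait_for_download(directory: str, suffix: str = ".pdf",
                      timeout: float = 60.0, poll: float = 0.5) -> Optional[Path]:
    """Return the first finished file ending in suffix, or None on timeout.

    Assumes Chrome keeps partial downloads as "<name>.crdownload", so a
    matching file with no .crdownload file present is treated as finished.
    """
    folder = Path(directory)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        partial = list(folder.glob("*.crdownload"))
        done = sorted(folder.glob(f"*{suffix}"))
        if done and not partial:
            return done[0]
        time.sleep(poll)
    return None
```

Call it right after download_pdf_file.click() and before driver.quit().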
You can add options arguments to a chromium webdriver using Python the following way:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
options = webdriver.ChromeOptions()
# Arguments go below
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("--window-size=800,600")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--user-agent={}".format("your user agent string"))
# Etc etc..
options.binary_location = "absolute/path/to/chrome.exe"
driver = webdriver.Chrome(
    service=Service("absolute/path/to/chromium-driver.exe"),
    options=options,
)
Here you can find a list of the supported arguments for Chrome.
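Since every switch is just a --name or --name=value string, a small dict-to-argument helper can keep a growing set of switches in one place. This is only a convenience sketch (switches_to_args is a hypothetical name); which switches Chrome actually honors still has to be checked against the list of supported arguments.

```python
from typing import Dict, List, Optional


def switches_to_args(switches: Dict[str, Optional[str]]) -> List[str]:
    """Turn {"window-size": "800,600", "disable-gpu": None} into Chrome switch strings."""
    args = []
    for name, value in switches.items():
        flag = f"--{name.lstrip('-')}"  # accept names with or without leading dashes
        args.append(flag if value is None else f"{flag}={value}")
    return args
```

Usage: for arg in switches_to_args({"no-sandbox": None, "window-size": "800,600"}): options.add_argument(arg).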
I am currently trying to code a basic smart mirror in Python for my Coding II class in high school. One thing I'm trying to do is open new tabs in full screen (using Chrome). I can currently open URLs, but they don't open in full screen. Any ideas on code I can use to open Chrome in full screen?
If you're using Selenium, code like the following works:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://google.com')
driver.maximize_window()
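For the smart-mirror case of showing several pages, Selenium 4 can open each extra URL in its own tab of the same maximized (or fullscreen) window with driver.switch_to.new_window("tab"). A sketch, with open_urls_in_tabs as a hypothetical helper; if the window is in kiosk mode, the tab strip is hidden, so switching between pages has to be done from code with driver.switch_to.window().

```python
from typing import Any, List, Sequence


def open_urls_in_tabs(driver: Any, urls: Sequence[str]) -> List[str]:
    """Load urls[0] in the current tab and each remaining URL in a new tab.

    Returns the window handles in the order the URLs were opened.
    """
    handles = []
    for i, url in enumerate(urls):
        if i > 0:
            # Opens a blank tab in the same window and switches the driver to it.
            driver.switch_to.new_window("tab")
        driver.get(url)
        handles.append(driver.current_window_handle)
    return handles
```

Later, driver.switch_to.window(handles[n]) brings page n back into view.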
As suggested, Selenium is a good way to accomplish your task.
To get real full screen, not just a maximized window, I would use:
chrome_options.add_argument("--start-fullscreen")
or
chrome_options.add_argument("--kiosk")
The first option emulates pressing F11, and you can exit by pressing F11 again. The second one puts Chrome into kiosk mode, which you can exit with ALT+F4.
Other interesting flags are:
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
These remove the infobar ChromeDriver shows saying that Chrome is being controlled by automated test software.
The complete script is:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
chrome_options = Options()
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
# chrome_options.add_argument("--start-fullscreen")
chrome_options.add_argument("--kiosk")
driver = webdriver.Chrome(service=Service("path/to/chromedriver"),
                          options=chrome_options)
driver.get('https://www.google.com')
"path/to/chromedriver" should point to a ChromeDriver build that matches your Chrome version, downloaded from here.
I'm new to this forum, as well as to programming and Selenium.
I'm trying to maximize my Chrome browser using Selenium with Python (with the code below), but every time Selenium opens the browser, it doesn't maximize the whole window. It only fills half the screen, with "data:," already in the address bar.
self.driver = webdriver.Chrome()
self.driver.maximize_window()
I also tried
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
driver = webdriver.Chrome(options=options)
but it doesn't help.
I just looked at my code and noticed that my import is
from selenium.webdriver.chrome.options import Options
and the rest of the code is
from selenium import webdriver
options = Options()
options.add_argument("--start-maximized")
driver = webdriver.Chrome(options=options)
This does work.
There is an open issue in ChromeDriver that is specific to macOS.
What I remember helped was to first set the dimensions explicitly and then maximize:
driver.set_window_size(1200, 800)
driver.maximize_window()
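If maximize_window() still leaves the window at half size, it helps to check what size the driver actually reports afterwards. A sketch of the workaround from this answer that returns the resulting size for checking; the 1200x800 starting size is the answer's example, and resize_then_maximize is a hypothetical name.

```python
from typing import Any, Dict


def resize_then_maximize(driver: Any, width: int = 1200, height: int = 800) -> Dict[str, int]:
    """Set an explicit size first, then maximize, and return the final size."""
    driver.set_window_size(width, height)
    driver.maximize_window()
    # get_window_size() returns a dict with "width" and "height" keys.
    return driver.get_window_size()
```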