I have a list of websites that I am testing and experimenting on. I visit each website from the list using Selenium and inject a piece of JS into some of the files using MITMproxy scripting. This injected code performs some test and outputs the results to the Chrome console using console.log(), something like
console.log(results of the injected JS)
The injection is successful, and the results I want do appear in the Chrome console when I run my experiment. The issue I am facing is that when I try to capture the console.log output from the Chrome console, it is not successful: I capture warning and error messages, but not console.log output. Currently this is how I am doing it.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException, TimeoutException, StaleElementReferenceException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from time import sleep
option = Options()
option.add_argument('start-maximized')
# Specify the proxy
option.add_argument('--proxy-server=%s:%s' % (proxy_host, proxy_port))
# enable browser logging
d = DesiredCapabilities.CHROME
# d['loggingPrefs'] = { 'browser':'ALL' }
d['goog:loggingPrefs'] = { 'browser':'ALL' }
# Launch Chrome.
driver = webdriver.Chrome(options=option, executable_path = './chromedriver', desired_capabilities=d, service_args=["--verbose", "--log-path=./js_inject/qc1.log"])
for url in list_urls:
    # Navigate to the test page.
    driver.get(url)
    # In these 15 seconds, MITMproxy injects the code, and the injected
    # code writes its output to the Chrome console.
    sleep(15)
    for entry in driver.get_log('browser'):
        print(entry)
Can anyone point out what mistake I might be making, or suggest an alternate approach to perform this task? Thank you.
P.S. Pardon my grammatical errors.
options.add_experimental_option('excludeSwitches', ['enable-logging'])
dc = DesiredCapabilities.CHROME
dc["goog:loggingPrefs"] = {"browser":"INFO"}
self.driver = webdriver.Chrome(chrome_options=options, desired_capabilities=dc)
I managed to make it work a while ago, so I am not sure what exactly did it, but I think it was the experimental option excluding "enable-logging".
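For reference, here is a minimal end-to-end sketch of how these pieces fit together with Selenium 4, where logging preferences can be set on the Options object directly. The `console_messages` helper and the target URL are my own illustration, not part of Selenium's API; the browser-dependent part is kept under a `__main__` guard because it needs Chrome and chromedriver available.

```python
def console_messages(entries, level="INFO"):
    """Pick console messages of a given level out of driver.get_log('browser').

    Each entry is a dict shaped like {'level': 'INFO', 'message': '...'};
    console.log() output normally arrives with level INFO.
    """
    return [e["message"] for e in entries if e.get("level") == level]


if __name__ == "__main__":
    # Browser-dependent part: requires Chrome + chromedriver and Selenium 4+.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # 'ALL' captures console.log (INFO) as well as warnings and errors.
    options.set_capability("goog:loggingPrefs", {"browser": "ALL"})
    driver = webdriver.Chrome(options=options)
    driver.get("https://example.com")
    driver.execute_script("console.log('hello from the page')")
    for msg in console_messages(driver.get_log("browser")):
        print(msg)
    driver.quit()
```

The filtering step matters because get_log('browser') returns warnings and errors mixed in with the INFO-level console.log lines.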
I am running a Python script where I use the WebDriver get() method, which is known to freeze sometimes without raising any exception.
This is a well-known issue with the Selenium module, as you can read below:
discussion 1
discussion 2
I tried many of the solutions on the web (e.g. setting set_page_load_timeout) but had no luck.
So my only option is to restart the script when it gets stuck on the get() method for a given amount of time.
What is the best way of doing that?
Please find a simplified code sample below:
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.support.ui import WebDriverWait
TIMEOUT = 10
options = ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument(f'user-data-dir={DATA["user_data_dir"]}')
chrome_driver = ChromeDriverManager().install()
driver = Chrome(service=Service(chrome_driver), options=options)
# driver.set_page_load_timeout(TIMEOUT)
driver.maximize_window()
wait = WebDriverWait(driver, TIMEOUT)
driver.get('https://www.stackoverflow.com/')
This code works on Windows, where it launches Chrome connected via Tor. Keep in mind that you have to have the Tor browser running beforehand. How can I enable the user profile and start the browser logged in? I have tried the regular method. I have only one profile, Default. It doesn't seem to be working. Any clues?
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
tor_proxy = "127.0.0.1:9150"
chrome_options = Options()
# chrome_options.add_argument("--test-type")
chrome_options.add_argument('--ignore-certificate-errors')
# chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('disable-infobars')
# chrome_options.add_argument("--incognito")
chrome_options.add_argument('--user-data=C:\\Users\\user\\AppData\\Local\\Google\\Chrome\\User Data\\Default')
chrome_options.add_argument('--proxy-server=socks5://%s' % tor_proxy)
driver = webdriver.Chrome(executable_path='C:\\chromedriver.exe', options=chrome_options)
driver.get('https://www.gmail.com')
time.sleep(4)
driver.switch_to.frame(0)
driver.find_element_by_id("introAgreeButton").click()
Use this instead:
chrome_options.add_argument("user-data-dir=C:\\Users\\user\\AppData\\Local\\Google\\Chrome\\User Data")
You don't have to specify the profile directory (Default), as it is used by default when nothing is explicitly specified with the line below. So in your case, use only the line above.
chrome_options.add_argument("profile-directory=Profile 1")
E.g.:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument(r"user-data-dir=C:\Users\prave\AppData\Local\Google\Chrome\User Data")
driver = webdriver.Chrome("./chromedriver.exe", options=options)
driver.get(url)
Using the Chrome driver and Selenium, the script below opens a page.
The annoying thing is that sometimes (not always) it keeps loading and never stops, so I added lines to force-stop the loading.
Besides that, I disabled the display of images and Flash. However, these measures are not very effective. (I also uninstalled Flash on the computer, but it seems Flash content is still being shown.)
Is it because the Flash content delays the page loading? If so, what's the best way to force-stop the loading and stop Flash from being shown on this page? Thank you.
import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
chromedriver = "D:\\Python27\\Scripts\\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
capa = DesiredCapabilities.CHROME
capa["pageLoadStrategy"] = "none"
options= webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2, "plugins.plugins_disabled": ["Adobe Flash Player"]}
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(chromedriver, chrome_options=options, desired_capabilities=capa)
driver.get("https://www.investing.com/crypto/bitcoin/btc-usd")
wait = WebDriverWait(driver, 2)
wait.until(EC.presence_of_element_located((By.ID, 'topBarPopup')))
time.sleep(2)
driver.execute_script("window.stop();")
time.sleep(60)
driver.quit()
I am a web developer in Korea. We have recently been using Python to implement a website crawling feature.
I'm new to Python. We researched this for about two days and applied what we found. The current issues are:
Clicking the Excel download button displays a new window (pop-up).
Clicking Download in the new window opens a new tab in the parent window and shuts all browser windows down as soon as the download starts.
The download page is PHP, and the data is served as Excel via headers so that the browser automatically recognizes the download.
The problem is that the browser shuts down, so the download does not complete and no file is saved.
I used the following source code.
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
chrome_driver = './browser_driver/chromedriver'
options = webdriver.ChromeOptions()
options.add_argument('--headless')
download_path = r"C:\Users\files"
timeout = 10
driver = webdriver.Chrome(executable_path=chrome_driver, chrome_options=options)
driver.command_executor._commands["send_command"] = (
    "POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior',
          'params': {'behavior': 'allow', 'downloadPath': download_path}}
command_result = driver.execute("send_command", params)
driver.get("site_url")
#download new window
down_xls_btn = driver.find_element_by_id("download")
down_xls_btn.click()
driver.switch_to_window(driver.window_handles[1])
#download start
down_xls_btn = driver.find_element_by_id("download2")
down_xls_btn.click()
During testing without headless mode, the browser itself shuts down as soon as the download starts.
In headless mode, the file is not downloaded at all.
Commenting out the DevTools code related to Page.setDownloadBehavior removes the shutdown but does not change the download path.
I am not good at English, so I used a translator. This is hard because I'm a beginner. Please help me.
I just tested it with the Firefox web browser.
Firefox, unlike Chrome, shows the download in a new window rather than a new tab; the download runs automatically and the window closes automatically.
There is a problem here.
The download actually succeeded even in headless mode in Firefox.
However, once the new window closed, the previously defined driver from driver.get() was no longer recognized.
import os
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options
import json
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir",download_path)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","application/octet-stream, application/vnd.ms-excel")
fp.set_preference("dom.webnotifications.serviceworker.enabled",False)
fp.set_preference("dom.webnotifications.enabled",False)
timeout = 10
driver = webdriver.Firefox(executable_path=geckodriver, firefox_options=options, firefox_profile=fp)
driver.get(siteurl)
down_btn = driver.find_element_by_xpath('//*[@id="searchform"]/div/div[1]/div[6]/div/a[2]')
down_btn.click()
# Clicking down_btn displays a new window.
# The download starts automatically in the new window, which then closes itself.
driver.switch_to_window(driver.window_handles[0])
# Selecting the main window via window_handles and then reading from it raises an error:
print(driver.title)
Perhaps this is the same problem as the one I asked about earlier.
Since the download currently succeeds in Firefox, I have written code that defines a new driver and proceeds with post-processing.
Has anyone solved this problem?
I came across the same issue and managed to solve it this way: after you switch to the other window, you should enable the download again.
Isolate this code into a function:
def enable_download_in_headless_chrome(driver, download_path):
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    params = {
        'cmd': 'Page.setDownloadBehavior',
        'params': {'behavior': 'allow', 'downloadPath': download_path}
    }
    driver.execute("send_command", params)
Call it whenever you need to download a file from another window.
Your code will then be:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
chrome_driver = './browser_driver/chromedriver'
options = webdriver.ChromeOptions()
options.add_argument('--headless')
download_path = r"C:\Users\files"
timeout = 10
driver = webdriver.Chrome(executable_path=chrome_driver, chrome_options=options)
enable_download_in_headless_chrome(driver, download_path)
driver.get("site_url")
#download new window
down_xls_btn = driver.find_element_by_id("download")
down_xls_btn.click()
driver.switch_to_window(driver.window_handles[1])
enable_download_in_headless_chrome(driver, download_path) # THIS IS THE MISSING AND SUPER IMPORTANT PART
#download start
down_xls_btn = driver.find_element_by_id("download2")
down_xls_btn.click()
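As a side note, newer Selenium releases (4+) expose Chrome DevTools commands directly through execute_cdp_cmd, so the manual send_command registration above is no longer needed there. A sketch under that assumption; the `download_behavior` payload builder is my own helper, and the download path is a placeholder:

```python
def download_behavior(download_path):
    """Payload for the Page.setDownloadBehavior DevTools command."""
    return {"behavior": "allow", "downloadPath": download_path}


if __name__ == "__main__":
    # Browser-dependent part: requires Selenium 4+ with Chrome.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    driver.execute_cdp_cmd("Page.setDownloadBehavior",
                           download_behavior(r"C:\Users\files"))
    # ...click the download link, switch to the new window, and call
    # execute_cdp_cmd again there, just as the answer above does...
    driver.quit()
```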
Selenium's driver.get(url) waits until the page fully loads. But the page I am scraping tries to load a dead JS script, so my Python script waits for it and does nothing for a few minutes. This problem can occur on every page of the site.
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.cortinadecor.com/productos/17/estores-enrollables-screen/estores-screen-corti-3000')
# It try load: https://www.cetelem.es/eCommerceCalculadora/resources/js/eCalculadoraCetelemCombo.js
driver.find_element_by_name('ANCHO').send_keys("100")
How can I limit the wait time or block the AJAX load of that file? Or is there another way?
Also, I am testing my script with webdriver.Chrome(), but will use PhantomJS(), or probably Firefox(). So if a method relies on changing browser settings, it must be universal.
When Selenium loads a page/URL, it follows a default configuration with pageLoadStrategy set to normal. To keep Selenium from waiting for the full page load, we can configure pageLoadStrategy, which supports 3 different values as follows:
normal (full page load)
eager (interactive)
none
Here is the code block to configure the pageLoadStrategy :
Firefox :
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities().FIREFOX
caps["pageLoadStrategy"] = "normal" # complete
#caps["pageLoadStrategy"] = "eager" # interactive
#caps["pageLoadStrategy"] = "none"
driver = webdriver.Firefox(desired_capabilities=caps, executable_path=r'C:\path\to\geckodriver.exe')
driver.get("http://google.com")
Chrome :
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "normal" # complete
#caps["pageLoadStrategy"] = "eager" # interactive
#caps["pageLoadStrategy"] = "none"
driver = webdriver.Chrome(desired_capabilities=caps, executable_path=r'C:\path\to\chromedriver.exe')
driver.get("http://google.com")
Note: the pageLoadStrategy values normal, eager, and none are a requirement as per the WebDriver W3C Editor's Draft, but the eager value was still a WIP (work in progress) within the ChromeDriver implementation at the time of writing. You can find a detailed discussion in “Eager” Page Load Strategy workaround for Chromedriver Selenium in Python.
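With Selenium 4, the same setting is also exposed as an attribute on the Options classes, which avoids DesiredCapabilities altogether. A minimal sketch, assuming Selenium 4 is installed (the same attribute exists on Firefox's Options class); it is guarded because it needs a live browser:

```python
if __name__ == "__main__":
    # Requires Selenium 4+ with Chrome and chromedriver available.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.page_load_strategy = "eager"  # or "normal" / "none"
    driver = webdriver.Chrome(options=options)
    driver.get("http://google.com")
    driver.quit()
```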
The answer by undetected Selenium works well, but the Chrome part did not work for me; for Chrome, use the code below instead.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
capa = DesiredCapabilities.CHROME
capa["pageLoadStrategy"] = "none"
browser = webdriver.Chrome(desired_capabilities=capa, executable_path='PATH', options=options)
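The earlier question also asked about blocking the AJAX load of one specific file. In Chrome this can be done through the DevTools protocol (Network.setBlockedURLs), so the dead script is never requested at all; note it is Chrome-only, so it does not meet the question's "universal across browsers" requirement. The `blocked_urls_payload` helper is my own naming; the URL pattern comes from the question's comment.

```python
def blocked_urls_payload(patterns):
    """Payload for the Network.setBlockedURLs DevTools command."""
    return {"urls": list(patterns)}


if __name__ == "__main__":
    # Requires Selenium 4+ with Chrome; CDP commands are Chrome-only.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.execute_cdp_cmd("Network.enable", {})
    # Block the dead script by wildcard pattern before navigating.
    driver.execute_cdp_cmd("Network.setBlockedURLs",
                           blocked_urls_payload(["*eCalculadoraCetelemCombo.js"]))
    driver.get("https://www.cortinadecor.com/productos/17/estores-enrollables-screen/estores-screen-corti-3000")
    driver.find_element(By.NAME, "ANCHO").send_keys("100")
    driver.quit()
```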