I'm using ChromeDriver with Selenium, and the script below opens a page.
The annoying thing is that sometimes (not always) the page keeps loading and never stops, so I added lines to force the loading to stop.
I also disabled images and Flash, but these measures are not very effective. (I even uninstalled Flash from the computer, yet Flash content still seems to be shown.)
Is the Flash content delaying the page load? If so, what is the best way to force the loading to stop and to keep Flash from being shown on this page? Thank you.
import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
chromedriver = "D:\\Python27\\Scripts\\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
capa = DesiredCapabilities.CHROME
capa["pageLoadStrategy"] = "none"
options= webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2, "plugins.plugins_disabled": ["Adobe Flash Player"]}
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(chromedriver, chrome_options =options, desired_capabilities=capa)
driver.get("https://www.investing.com/crypto/bitcoin/btc-usd")
wait = WebDriverWait(driver, 2)
wait.until(EC.presence_of_element_located((By.ID, 'topBarPopup')))
time.sleep(2)
driver.execute_script("window.stop();")
time.sleep(60)
driver.quit()
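One detail in the script above: `wait.until` raises `TimeoutException` when the element does not appear within 2 seconds, so `window.stop()` is never reached in exactly the case where it is needed. A minimal sketch of one way around that is to poll with a hard deadline and stop the load whether or not the element showed up. The helper name `stop_loading_after` and its parameters are hypothetical, not part of Selenium; `driver` is any object with Selenium's `execute_script` method, and the helper is meant to be called right after `driver.get(...)` when `pageLoadStrategy` is `"none"`.

```python
import time


def stop_loading_after(driver, is_ready, timeout=10.0, poll=0.5):
    """Poll is_ready(driver) until it is truthy or the deadline passes,
    then stop the page load either way. Returns True if ready in time."""
    deadline = time.monotonic() + timeout
    ready = False
    while True:
        try:
            ready = bool(is_ready(driver))
        except Exception:
            # e.g. the element is not in the DOM yet; keep polling
            ready = False
        if ready or time.monotonic() >= deadline:
            break
        time.sleep(poll)
    # window.stop() aborts resource loads that are still pending
    driver.execute_script("window.stop();")
    return ready
```

With the script above, `is_ready` could be `lambda d: d.find_elements(By.ID, 'topBarPopup')`, since `find_elements` returns an empty list instead of raising when nothing matches.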
Related
I'm using Selenium in Python (3.11) with a Firefox (107) driver.
With the driver I navigate to a page which, after several actions, triggers an OS alert (prompting me to launch a program). When this alert pops up, the driver hangs, and only once it is closed manually does my script continue to run.
I have tried driver.quit(), as well as using
os.system("taskkill /F /pid " + str(process.ProcessId))
with the driver's PID, with no luck.
I have managed to prevent the pop-up from popping up with
options.set_preference("security.external_protocol_requires_permission", False)
but the code still hangs the same way at the point where the popup would have popped up.
I don't care whether the program launches or not, I just need my code to not require human intervention at this key point.
Here is a minimal example of what I currently have:
from selenium.webdriver import ActionChains, Keys
from selenium.webdriver.firefox.options import Options
from seleniumwire import webdriver
options = Options()
options.binary_location = r'C:\Program Files\Mozilla Firefox\firefox.exe'
options.set_preference("security.external_protocol_requires_permission", False)
driver = webdriver.Firefox(options=options)
# Go to the page
driver.get(url)
user_field = driver.find_element("id", "UserName")
user_field.send_keys(username)
pass_field = driver.find_element("id", "Password")
pass_field.send_keys(password)
pass_field.send_keys(Keys.ENTER)
#this is the point where the pop up appears
reqs = driver.requests
print("Success!")
driver.quit()
There are some preferences you can try:
profile = webdriver.FirefoxProfile()
profile.set_preference('dom.push.enabled', False)
# or
profile = webdriver.FirefoxProfile()
profile.set_preference('dom.webnotifications.enabled', False)
profile.set_preference('dom.webnotifications.serviceworker.enabled', False)
Have you tried setting this preference to prevent the particular popup:
profile.set_preference('browser.helperApps.neverAsk.openFile', 'typeOfFile')
# e.g. profile.set_preference('browser.helperApps.neverAsk.openFile', 'application/xml,application/octet-stream')
Or have you tried just dismissing the popup:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
....
pass_field.send_keys(Keys.ENTER)
#this is the point where the pop up appears
WebDriverWait(driver, 5).until(EC.alert_is_present()).dismiss()
reqs = driver.requests
...
Check this checkbox manually, then open the app once for every app associated with the links you use. After that it will work normally.
I am starting to learn about web scraping. For practice, I am trying to get a list of all the course names that appear in this query: "https://www.udemy.com/courses/search/?src=ukw&q=api+python". The problem is that when I start the script, the page does not load and eventually the window closes. I think Udemy may have some kind of protection against automation.
This is my code:
from selenium import webdriver
import time
website = "https://www.udemy.com/courses/search/?src=ukw&q=api+python"
path = "/"
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option("excludeSwitches", ['enable-logging'])
driver = webdriver.Chrome(options=chrome_options)
driver.get(website)
time.sleep(5)
matches = driver.find_elements_by_tag_name("h3")
The Udemy website may fail to load completely because the Chrome browser launched by Selenium's ChromeDriver is detected as a bot, so further navigation is blocked.
Solution
An easier hack to evade the detection would be to add the following argument:
--disable-blink-features=AutomationControlled
So effectively your code block will be:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
# Pass both switches in one call; a second call with the same option name would replace the first
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get('https://www.udemy.com/courses/search/?src=ukw&q=api+python')
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[contains(., 'results for')]")))
driver.save_screenshot("udemy.png")
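A detail worth checking in code like this: `add_experimental_option` stores each value under its option name, so a second call with the same name replaces the earlier value rather than extending it. The helper below is a hypothetical convenience (the name `exclude_switches` is not part of Selenium) that collects every switch into a single call. It works with any object exposing `add_experimental_option(name, value)`, such as Selenium's Chrome `Options`.

```python
def exclude_switches(options, *switches):
    """Set all excluded switches in one call so none is overwritten."""
    unique = list(dict.fromkeys(switches))  # drop duplicates, keep order
    options.add_experimental_option("excludeSwitches", unique)
    return unique
```

For example, `exclude_switches(options, "enable-automation", "enable-logging")` sets both switches at once.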
I can successfully load my Chrome profile using the flags user-data-dir and profile-directory, yet once the profile is loaded and the Chrome window is open, no webpage appears. It simply gets stuck on a blank screen.
When I remove the profile code, Chrome is able to open the webpage stored in the login_url variable.
I tried updating to the latest version of Chrome (94.0.4606.81), and I also followed the exact steps listed here to make sure I have the right ChromeDriver version.
I also did the obvious, like making sure there are no instances of Chrome running in the background.
Code is as follows:
import os
from os.path import exists
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
headless = False
login_url = "https://google.com)"
def startChrome():
    global headless
    try:
        chrome_options = Options()
        if headless:
            chrome_options.add_argument("--headless")
        chrome_options.add_argument("----user-data-dir=C:/Users/ERIK/AppData/Local/Google/Chrome/User Data")
        chrome_options.add_argument("--profile-directory=Profile 1")
        global driver
        driver = webdriver.Chrome(path+"/chromedriver.exe", options=chrome_options)
    except:
        print("Failed to start Chrome!")
        input()
        exit()
startChrome()
driver.get(login_url)
input()
The following successfully opens google.com for me.
Selenium Version 3.141.0
ChromeDriver Version 94.0.4606.61
Chrome Version 94.0.4606.71
from os import path
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
def getDriver(profile_directory, headless = False):
    chrome_options = Options()
    if headless:
        chrome_options.add_argument("--headless")
    userDataDir = path.expandvars(r"%LOCALAPPDATA%\Google\Chrome\User Data")
    chrome_options.add_argument(f"--user-data-dir={userDataDir}")
    chrome_options.add_argument(f"--profile-directory={profile_directory}")
    return webdriver.Chrome("./chromedriver.exe", options=chrome_options)
driver = getDriver("Profile 2")
driver.get("https://google.com")
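Building the two profile arguments in one place makes typos in the flag names easier to spot. This is a small sketch, assuming the profile lives under a user data directory that you pass in; `profile_arguments` is a hypothetical helper name, not a Selenium API.

```python
from os import path


def profile_arguments(user_data_dir, profile_directory):
    """Return the Chrome arguments that select an existing profile."""
    # expandvars resolves environment variables in the path where the
    # platform supports the syntax used (e.g. %LOCALAPPDATA% on Windows)
    user_data_dir = path.expandvars(user_data_dir)
    return [
        f"--user-data-dir={user_data_dir}",
        f"--profile-directory={profile_directory}",
    ]
```

Each returned string can then be passed to `chrome_options.add_argument(...)` in a loop.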
I have created a Python script that enters a temperature into a website automatically, and it works locally. When I deployed it to Heroku, it worked as well. I also scheduled my app to run once in the morning and once in the afternoon, as required.
from selenium import webdriver
import os
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys
import time
chrome_options = webdriver.ChromeOptions()
chrome_options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
browser = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), chrome_options=chrome_options)
browser.get('https://temptaking.ado.sg/group/e306686f4e962fec4c8b20ea8e60d1fe')
select = Select(browser.find_element_by_id('member-select'))
select.select_by_value('79563')
browser.find_element_by_id("ep1").send_keys("2084")
browser.find_element_by_id("td1").send_keys("36")
browser.find_element_by_id("td3").send_keys("5")
browser.find_element_by_class_name("btn")
submit = browser.find_element_by_class_name("btn-warning")
submit.click()
time.sleep(5)
subbmit = browser.find_element_by_id("submit-temp-btn")
subbmit.click()
browser.close()
However, there was an issue. The website has an AM/PM dropdown that changes automatically (to PM after 12 pm). When I ran the script in the afternoon, though, AM was still selected.
I then added a few lines to choose the PM option from the dropdown whenever it's not 8 am (the time I scheduled my app to run in the morning), using the following lines.
import datetime as dt
hour = dt.datetime.now().hour
if hour == 7:
    select2 = Select(browser.find_element_by_id('meridies-input'))
    time.sleep(3)
    select2.select_by_value('AM')
else:
    select2 = Select(browser.find_element_by_id('meridies-input'))
    time.sleep(3)
    select2.select_by_value('PM')
However, once I pushed this to Heroku and ran the script, I got a NoSuchElementException saying it can't locate the value "PM". I have tried selecting by index, select_by_visible_text, and even XPath; all gave similar errors about not being able to locate the target. I added the time.sleep calls because I thought the page was slow to load, but to no avail. What baffles me is that I select from a dropdown in another part of the code without any issues; only this specific block fails.
I've been set back by this small problem, and it's bugging me so much. Please help.
Instead of headless mode, use a virtual display (your element might depend on JavaScript).
Call start() on the display before initializing Chrome, and call stop() on it after all operations are done.
from pyvirtualdisplay import Display
from selenium import webdriver
import os
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys
import time
display = Display(visible=False, size=(800, 600))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--no-sandbox')
browser = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), chrome_options=chrome_options)
# your code
#########
# at end stop display
display.stop()
Instead of depending on the application server's default AM/PM setting, you should handle the AM/PM selection on your own using the datetime module, as follows:
from datetime import datetime
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://temptaking.ado.sg/group/e306686f4e962fec4c8b20ea8e60d1fe')
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//select[@id='meridies-input']")))).select_by_visible_text((datetime.now()).strftime("%p"))
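One thing to keep in mind with `datetime.now()` is that it uses the clock of the machine running the script, and a hosted dyno may not be in the same time zone as the site's users. The sketch below assumes the site expects Singapore time (UTC+8, no daylight saving); that offset and the function name `meridiem` are assumptions for illustration, not something confirmed in this thread. It also avoids `%p`, whose output depends on the locale.

```python
from datetime import datetime, timedelta, timezone


def meridiem(now_utc, utc_offset_hours=8):
    """Return "AM" or "PM" for a timezone-aware datetime, evaluated at
    a fixed UTC offset (assumed here: +8 hours)."""
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return "AM" if local.hour < 12 else "PM"


# Example: the dropdown value for the current moment
value = meridiem(datetime.now(timezone.utc))
```

The result can be passed to `select_by_value` or `select_by_visible_text`, whichever matches the options in the page.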
I am a web developer in Korea. We have recently been using Python to implement a website crawling feature.
I'm new to Python. We spent about two days searching for solutions and trying them out. The current issues are:
Clicking the Excel download button opens a new window (pop-up).
Clicking Download in the new window opens a new tab in the parent window, and all browser windows shut down as soon as the download starts.
The download page is PHP, and the data is sent as Excel via headers so that the browser automatically treats it as a download.
The problem is that the browser shuts down before the download completes, so the file is not saved.
I used the following source code.
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
chrome_driver = './browser_driver/chromedriver'
options = webdriver.ChromeOptions()
options.add_argument('--headless')
download_path = r"C:\Users\files"
timeout = 10
driver = webdriver.Chrome(executable_path=chrome_driver, chrome_options=options)
driver.command_executor._commands["send_command"] = (
"POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior',
'params': {'behavior': 'allow', 'downloadPath': download_path}}
command_result = driver.execute("send_command", params)
driver.get("site_url")
#download new window
down_xls_btn = driver.find_element_by_id("download")
down_xls_btn.click()
driver.switch_to_window(driver.window_handles[1])
#download start
down_xls_btn = driver.find_element_by_id("download2")
down_xls_btn.click()
During testing without headless mode, the browser itself shuts down as soon as the download starts.
In headless mode, the file is not downloaded at all.
Commenting out the DevTools code related to Page.setDownloadBehavior prevents the shutdown, but then the download path is not changed.
I am not good at English, so I used a translator. This is hard for me because I'm a beginner. Please help me.
I just tested it with Firefox.
Unlike Chrome, Firefox opens the download in a new window rather than a new tab; the window starts the download automatically and then closes itself.
There is a problem here.
The download did succeed in Firefox, even in headless mode.
However, once the new window closed, the driver no longer recognized the page that had been opened with driver.get().
import os
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options
import json
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir",download_path)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","application/octet-stream, application/vnd.ms-excel")
fp.set_preference("dom.webnotifications.serviceworker.enabled",False)
fp.set_preference("dom.webnotifications.enabled",False)
timeout = 10
driver = webdriver.Firefox(executable_path=geckodriver, firefox_options=options, firefox_profile=fp)
driver.get(siteurl)
down_btn = driver.find_element_by_xpath('//*[@id="searchform"]/div/div[1]/div[6]/div/a[2]')
down_btn.click()
# Clicking down_btn opens a new window
# The download starts automatically in the new window, which then closes itself
driver.switch_to_window(driver.window_handles[0])
# Switching back to the main window and printing its title raises an error
print(driver.title)
Perhaps this is the same problem as the one we asked about earlier.
Since the download currently succeeds in Firefox, we have written code that creates a new driver and continues with the post-processing.
Has anyone solved this problem?
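A possible cause is that the driver is still focused on the download window when that window closes itself, or that the handles are read before it has closed. One workaround to try, sketched here under the assumption that the main window's handle is saved before the click, is to wait until the extra window is gone and then switch back explicitly. `return_to_main_window` is a hypothetical helper; it relies only on `window_handles` and `switch_to.window`, which Selenium drivers provide.

```python
import time


def return_to_main_window(driver, main_handle, timeout=30.0, poll=0.5):
    """Wait until only the main window is left, then focus it.
    Returns True if the extra windows closed before the timeout."""
    deadline = time.monotonic() + timeout
    closed = False
    while True:
        closed = driver.window_handles == [main_handle]
        if closed or time.monotonic() >= deadline:
            break
        time.sleep(poll)
    # Switch back even on timeout so later commands target the main window
    driver.switch_to.window(main_handle)
    return closed
```

In the script above this would mean storing `main = driver.current_window_handle` before `down_btn.click()` and calling `return_to_main_window(driver, main)` afterwards.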
I came across the same issue and managed to solve it this way:
After you switch to the other window, you have to enable downloads again.
Isolate this code into a function:
def enable_download_in_headless_chrome(driver, download_path):
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    params = {
        'cmd': 'Page.setDownloadBehavior',
        'params': {'behavior': 'allow', 'downloadPath': download_path}
    }
    driver.execute("send_command", params)
Call it whenever you need to download a file from another window.
Your code will then be:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
chrome_driver = './browser_driver/chromedriver'
options = webdriver.ChromeOptions()
options.add_argument('--headless')
download_path = r"C:\Users\files"
timeout = 10
driver = webdriver.Chrome(executable_path=chrome_driver, chrome_options=options)
enable_download_in_headless_chrome(driver, download_path)
driver.get("site_url")
#download new window
down_xls_btn = driver.find_element_by_id("download")
down_xls_btn.click()
driver.switch_to_window(driver.window_handles[1])
enable_download_in_headless_chrome(driver, download_path) # THIS IS THE MISSING AND SUPER IMPORTANT PART
#download start
down_xls_btn = driver.find_element_by_id("download2")
down_xls_btn.click()
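Since the original problem also involved the browser closing before the file was written, it may help to wait for the download to finish before quitting the driver. The following is a sketch under the assumption that the browser marks in-progress files with a temporary suffix (`.crdownload` for Chrome, `.part` for Firefox) and that the download directory is empty beforehand; `wait_for_download` is a hypothetical helper, not part of Selenium.

```python
import os
import time

PARTIAL_SUFFIXES = (".crdownload", ".part")  # in-progress download markers


def wait_for_download(directory, timeout=60.0, poll=0.5):
    """Return the finished files in directory once no partial files remain.
    Raises TimeoutError if nothing completes within timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        partial = [n for n in names if n.endswith(PARTIAL_SUFFIXES)]
        done = sorted(n for n in names if not n.endswith(PARTIAL_SUFFIXES))
        if done and not partial:
            return [os.path.join(directory, n) for n in done]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"download did not finish in {directory}")
        time.sleep(poll)
```

Calling `wait_for_download(download_path)` after the final click, and only then `driver.quit()`, keeps the browser alive until the file is complete.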