Handling "Accept all cookie" popup with selenium when selector is unknown - python

I have a python script, It look like this.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.select import Select
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from os import path
import time
# Tried this code
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications" : 2}
chrome_options.add_experimental_option("prefs",prefs)
browser = webdriver.Chrome(ChromeDriverManager().install(), chrome_options=chrome_options)
links = ["https://www.henleyglobal.com/", "https://markets.ft.com/data"]
for link in links:
browser.get(link)
#WebDriverWait(browser, 20).until(EC.url_changes(link))
#How do I disable/Ignore/remove/escape this "Accept all cookie" popup and then access the website to scrape data?
browser.quit()
So each website in the links array displays an "Accept all cookie" popup after navigating to the site. check the below image.
I have tried many ways nothing works, Check the one after imports
How do I exit/pass/escape this popup and then access the website to scrape data?

If you open your page in a new browser you'll note the page fully loads, then, a moment later your popup appears. The default wait strategy in selenium is just that the page is loaded.
One way to handle this is to simply inspect the page and find the xpath of the popup window. The below code should work for that.
browser.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS)
if link == 'https://www.henleyglobal.com/':
browser.findElement(By.XPATH("/html/body/div[7]/div/div/div/div[2]/div/div[2]/button[2]")).click()
else:
browser.findElement(By.XPATH("/html/body/div[4]/div/div/div[2]/div[2]/a")).click()
The code is waiting until the element of the pop-up is clickable and then clicking it.
For unknown sites you could try:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--disable-notifications")
webdriver.Chrome(os.path.join(path, 'chromedriver'), chrome_options=chrome_options)

generally, you can not use some universal locator that will match the "Accept cookies" buttons for each and every web site in the world.
Even here, you have 2 different sites and the elements you need to click are totally different on these sites.
For https://www.henleyglobal.com/ site the correct locator may be something like this CSS Selector .confirmation button.primary-btn while for https://markets.ft.com/data site I'd advise to use CSS Selector .o-cookie-message__actions a.o-cookie-message__button.
These 2 elements are totally different: the first one is button while the second is a, they have totally different class names and all other attributes.
You may thing about the Accept text. It seems to be common, so you could use this XPath //*[contains(text(),'Accept')] but even this will not work since on the first page it matches 2 elements while the accept cookies element is the second between them...
So, there is no General locators, you will have to define separate locators for each page.
Again, for https://www.henleyglobal.com/ I would prefer
driver.find_element(By.CSS_SELECTOR, ".confirmation button.primary-btn").click()
While for the second page https://markets.ft.com/data I would prefer this
driver.find_element(By.CSS_SELECTOR, ".o-cookie-message__actions a.o-cookie-message__button").click()
Also, generally we always use WebDriverWait expected_conditions explicit waits, so the code will be as following:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
# for the first page
wait.until(EC.element_to_be_clickable((By.XPATH, ".confirmation button.primary-btn"))).click()
# for the second page
wait.until(EC.element_to_be_clickable((By.XPATH, ".o-cookie-message__actions a.o-cookie-message__button"))).click()

Related

Trying to locate an element in a webpage but getting NoSuchElementException

I am trying to get the webdriver to click a button on the site random.org The button is a generator that generates a random integer between 1 and 100. It looks like this:
After inspecting the webpage, I found the corresponding element on the webpage looks something like this:
It is inside an iframe and someone suggested that I should first switch over to that iframe to locate the element, so I incorporated that in my code but I am constantly getting NoSuchElementException error. I have attached my code and the error measage below for your reference. I can't understand why it cannot locate the button element despite referencing the ID, which is supposed to unique in the entire document.
The code:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Edge()
driver.get("https://www.random.org/")
driver.implicitly_wait(15)
driver.switch_to.frame(driver.find_element(By.TAG_NAME, "iframe"))
button = driver.find_element(By.CSS_SELECTOR, "input[id='hnbzsqjufzxezy-button']")
button.click()
The error message:
Make sure that there are no more Iframes on the page. If there are a few an not only one do this:
iframes = driver.find_elements(By.CSS, 'iframe')
// try switching to each iframe:
driver.switch_to.frame(iframes[0])
driver.switch_to.frame(iframes[1])
You can't find the button because its name contain random letters. Every time you will refresh the page you can see that the name value will change. So, do this:
button = driver.findElement(By.CSS, 'input[type="button"][value="Generate"]')
button.click()
There are several issues with your code:
First you need to close the cookies banner
The locator of the button is wrong. It's id is dynamic.
You need to use WebDriverWait to wait for elements clickability.
The following code works:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service('C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(service=webdriver_service, options=options)
url = 'https://www.random.org/'
driver.get(url)
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[onclick*='all']"))).click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[id*='button']"))).click()

Can't select element from page [Selenium[

I have tried to scrap info from that site - specifically, from a table. Every time I occur, info that elements doesn't exist.
https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d
I try to add time.slep(5) to my code or scrolling down function to load all element - ineffective.
Do you have any advice for me?
EDIT
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
# Options
chrome_options = Options()
chrome_options.add_argument("--headless")
# Set drive
chrome_driver_path = r"C:\Users\kacpe\OneDrive\Pulpit\Python\Projekty\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path, options=chrome_options)
driver.get("https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d")
try:
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.XPATH, "//table/tbody/tr[0]")))
print(element)
except TimeoutException as e:
print(e)
I added code in regard to your request. So my main goal is to scrap content from the table at this site. I add Explicit Waits to my code and still I can't select anything from that table - it's looking like the script doesn't see anything from that area.
One way to try solve it, its using the Xpath of the element or the relative position of the same, to make Selenium, get allways the same "line" of position to return the "value" of the information that you are searching.
Ex1:find_element(:xpath,"//*[#id="wmd-input"]")#in that case it's the input of this check box.
If it doesnt work, try this one.
Ex2: browser.implicitly_wait(30) #makes a timer to load all the informations from the web to your machine.

How can I download a image on a click event using selenium?

I want to download the image at this site https://imginn.com/p/CXVmwujLqbV/ via the button. but i always fail.
this is the code i use.
driver.find_element_by_xpath('/html/body/div[2]/div[5]/a').click()
Well,
Check this post for downloading resource. Picture has 'src' attribute in 'img' tag, that holds it.
Also, (though it just might be simplification just for this question), do not hardcode your xpath. Learn to code nicely using "Page Object Pattern".
There are several possible problems here:
You need to add a delay / wait before accessing this element.
You have to scroll the page since the element you wish to click is initially out of the view.
You should improve you locator. It's highly NOT recommended to use absolute XPaths.
This should work:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time
driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.set_window_size(1920,1080)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
driver.get("https://imginn.com/p/CXVmwujLqbV/")
button = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.downloads a")))
time.sleep(0.5)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(1)
#actions.move_to_element(button).perform()
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.downloads a"))).click()

How to find the href attribute of the videos on twitch through selenium and python?

I'm trying to find the twitch video IDs of all videos for a specific user. So for example on this page
https://www.twitch.tv/dyrus/videos/all
So here we have all videos linked, but its not quite so simple as to just scrape the html and find the links since they are generated dynamically it seems.
So I heard about selenium and did something like this:
from selenium import webdriver
# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver')
driver.get('https://www.twitch.tv/dyrus/videos/all')
link_element = driver.find_elements_by_xpath("//*[#href]")
for link in link_element:
print(link.get_attribute('href'))
driver.close()
This returns me a bunch of links on the page but not the videos, they lie "deeper" I think, any input?
Thanks in advance
I would still suggest a couple of changes as follows:
Always open the Web Browser in maximized mode so that all/majority of the desired elements are within the Viewport.
If you are on Windows OS you need to append the extension .exe at the end of the WebDriver variant name, e.g. chromedriver.exe
While you identify for elements always try to include the class attribute in your Locator Strategy.
Always invoke driver.quit() at the end of your #Test to close & destroy the WebDriver and Web Client instances gracefully.
Here is your own code block with the above mentioned tweaks:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\path\to\chromedriver.exe')
driver.get('https://www.twitch.tv/dyrus/videos/all')
link_elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.tw-interactive.tw-link[data-a-target='preview-card-image-link']")))
for link in link_elements:
print(link.get_attribute('href'))
driver.quit()
Console Output:
https://www.twitch.tv/videos/295314690
https://www.twitch.tv/videos/294901947
https://www.twitch.tv/videos/294472813
https://www.twitch.tv/videos/294075254
https://www.twitch.tv/videos/293617036
https://www.twitch.tv/videos/293236560
https://www.twitch.tv/videos/292800601
https://www.twitch.tv/videos/292409437
https://www.twitch.tv/videos/292328170
https://www.twitch.tv/videos/292032996
https://www.twitch.tv/videos/291625563
https://www.twitch.tv/videos/291192151
https://www.twitch.tv/videos/290824842
https://www.twitch.tv/videos/290434348
https://www.twitch.tv/videos/290021370
https://www.twitch.tv/videos/289561690
https://www.twitch.tv/videos/289495488
https://www.twitch.tv/videos/289138003
https://www.twitch.tv/videos/289110429
https://www.twitch.tv/videos/288804893
https://www.twitch.tv/videos/288784992
https://www.twitch.tv/videos/288687479
https://www.twitch.tv/videos/288432438
https://www.twitch.tv/videos/288117849
https://www.twitch.tv/videos/288004968
https://www.twitch.tv/videos/287689102
https://www.twitch.tv/videos/287451192
https://www.twitch.tv/videos/287267032
https://www.twitch.tv/videos/287017431
https://www.twitch.tv/videos/286819343
With your locator, you are returning every element on the page that contains an href attribute. You can be a little more specific than that and get what you are looking for. Switch to a CSS selector...
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver')
driver.get('https://www.twitch.tv/dyrus/videos/all')
links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[data-a-target='preview-card-image-link']")))
for link in links:
print(link.get_attribute('href'))
driver.close()
That prints 40 links from the page.

Selenium: Dynamically loaded page causes automatic scrolling to fail

I'm using Selenium for Python to navigate a dynamic webpage (link).
I would like to automatically scroll down to a y-point on the page. I have tried various ways to code this, which seem to work on other pages, but not on this particular webpage.
Some examples of my efforts that don't seem to work for Firefox with Selenium:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://en.boerse-frankfurt.de/etp/Deka-Oekom-Euro-Nachhaltigkeit-UCITS-ETF-DE000ETFL474")
By scrolling to a y-position:
driver.execute_script("window.scrollTo(0, 1400);")
Or by finding an element:
try:
find_scrollpoint = driver.find_element_by_xpath("//*[#id='main-wrapper']/div[10]/div/div[1]/div[1]")
except:
pass
else:
scrollpoint = find_scrollpoint.location["y"]
driver.execute_script("window.scrollTo(0, scrollpoint);")
Questions:
What could be some unusual circumstances on webpages such as this, where this very typical scrollTo code may fail?
What can possibly be done to mitigate it? Perhaps keypresses to scroll down – but it would still have to know when to stop pressing down.
It is just quite a dynamic page. I would wait for this specific element to be visible and then scroll into it's view. This works for me:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("http://en.boerse-frankfurt.de/etp/Deka-Oekom-Euro-Nachhaltigkeit-UCITS-ETF-DE000ETFL474")
wait = WebDriverWait(driver, 10)
elm = wait.until(EC.visibility_of_element_located((By.XPATH, "//*[#id='main-wrapper']/div[10]/div/div[1]/div[1]")))
driver.execute_script("arguments[0].scrollIntoView();", elm)

Categories

Resources