Selenium: Dynamically loaded page causes automatic scrolling to fail - python

I'm using Selenium for Python to navigate a dynamic webpage (link).
I would like to automatically scroll down to a y-point on the page. I have tried various ways to code this, which seem to work on other pages, but not on this particular webpage.
Some examples of my efforts that don't seem to work for Firefox with Selenium:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://en.boerse-frankfurt.de/etp/Deka-Oekom-Euro-Nachhaltigkeit-UCITS-ETF-DE000ETFL474")
By scrolling to a y-position:
driver.execute_script("window.scrollTo(0, 1400);")
Or by finding an element:
try:
find_scrollpoint = driver.find_element_by_xpath("//*[#id='main-wrapper']/div[10]/div/div[1]/div[1]")
except:
pass
else:
scrollpoint = find_scrollpoint.location["y"]
driver.execute_script("window.scrollTo(0, scrollpoint);")
Questions:
What could be some unusual circumstances on webpages such as this, where this very typical scrollTo code may fail?
What can possibly be done to mitigate it? Perhaps keypresses to scroll down – but it would still have to know when to stop pressing down.

It is just quite a dynamic page. I would wait for this specific element to be visible and then scroll into it's view. This works for me:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("http://en.boerse-frankfurt.de/etp/Deka-Oekom-Euro-Nachhaltigkeit-UCITS-ETF-DE000ETFL474")
wait = WebDriverWait(driver, 10)
elm = wait.until(EC.visibility_of_element_located((By.XPATH, "//*[#id='main-wrapper']/div[10]/div/div[1]/div[1]")))
driver.execute_script("arguments[0].scrollIntoView();", elm)

Related

Handling "Accept all cookie" popup with selenium when selector is unknown

I have a python script, It look like this.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.select import Select
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from os import path
import time
# Tried this code
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications" : 2}
chrome_options.add_experimental_option("prefs",prefs)
browser = webdriver.Chrome(ChromeDriverManager().install(), chrome_options=chrome_options)
links = ["https://www.henleyglobal.com/", "https://markets.ft.com/data"]
for link in links:
browser.get(link)
#WebDriverWait(browser, 20).until(EC.url_changes(link))
#How do I disable/Ignore/remove/escape this "Accept all cookie" popup and then access the website to scrape data?
browser.quit()
So each website in the links array displays an "Accept all cookie" popup after navigating to the site. check the below image.
I have tried many ways nothing works, Check the one after imports
How do I exit/pass/escape this popup and then access the website to scrape data?
If you open your page in a new browser you'll note the page fully loads, then, a moment later your popup appears. The default wait strategy in selenium is just that the page is loaded.
One way to handle this is to simply inspect the page and find the xpath of the popup window. The below code should work for that.
browser.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS)
if link == 'https://www.henleyglobal.com/':
browser.findElement(By.XPATH("/html/body/div[7]/div/div/div/div[2]/div/div[2]/button[2]")).click()
else:
browser.findElement(By.XPATH("/html/body/div[4]/div/div/div[2]/div[2]/a")).click()
The code is waiting until the element of the pop-up is clickable and then clicking it.
For unknown sites you could try:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--disable-notifications")
webdriver.Chrome(os.path.join(path, 'chromedriver'), chrome_options=chrome_options)
generally, you can not use some universal locator that will match the "Accept cookies" buttons for each and every web site in the world.
Even here, you have 2 different sites and the elements you need to click are totally different on these sites.
For https://www.henleyglobal.com/ site the correct locator may be something like this CSS Selector .confirmation button.primary-btn while for https://markets.ft.com/data site I'd advise to use CSS Selector .o-cookie-message__actions a.o-cookie-message__button.
These 2 elements are totally different: the first one is button while the second is a, they have totally different class names and all other attributes.
You may thing about the Accept text. It seems to be common, so you could use this XPath //*[contains(text(),'Accept')] but even this will not work since on the first page it matches 2 elements while the accept cookies element is the second between them...
So, there is no General locators, you will have to define separate locators for each page.
Again, for https://www.henleyglobal.com/ I would prefer
driver.find_element(By.CSS_SELECTOR, ".confirmation button.primary-btn").click()
While for the second page https://markets.ft.com/data I would prefer this
driver.find_element(By.CSS_SELECTOR, ".o-cookie-message__actions a.o-cookie-message__button").click()
Also, generally we always use WebDriverWait expected_conditions explicit waits, so the code will be as following:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
# for the first page
wait.until(EC.element_to_be_clickable((By.XPATH, ".confirmation button.primary-btn"))).click()
# for the second page
wait.until(EC.element_to_be_clickable((By.XPATH, ".o-cookie-message__actions a.o-cookie-message__button"))).click()

How can I download a image on a click event using selenium?

I want to download the image at this site https://imginn.com/p/CXVmwujLqbV/ via the button. but i always fail.
this is the code i use.
driver.find_element_by_xpath('/html/body/div[2]/div[5]/a').click()
Well,
Check this post for downloading resource. Picture has 'src' attribute in 'img' tag, that holds it.
Also, (though it just might be simplification just for this question), do not hardcode your xpath. Learn to code nicely using "Page Object Pattern".
There are several possible problems here:
You need to add a delay / wait before accessing this element.
You have to scroll the page since the element you wish to click is initially out of the view.
You should improve you locator. It's highly NOT recommended to use absolute XPaths.
This should work:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time
driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.set_window_size(1920,1080)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
driver.get("https://imginn.com/p/CXVmwujLqbV/")
button = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.downloads a")))
time.sleep(0.5)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(1)
#actions.move_to_element(button).perform()
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.downloads a"))).click()

Can you help me to get Selenium to click image in Chromedriver?

I am trying to do a tutorial and learn Selenium in python however i cant seem to get Selenium to click the "Checkout" button using "element_to_be_clickable((By.XPATH".
I am using:
Python v3.9
Chrome v87
This is the URL i am practicing on:
https://www.aria.co.uk/myAria/ShoppingBasket
And this is my current code for the clicking:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time
# Open Chromedriver
driver = webdriver.Chrome(r"C:\Users\Ste1337\Desktop\chromedriver\chromedriver.exe")
# Open webpage
driver.get("https://www.aria.co.uk/SuperSpecials/Other+products/ASUS+ROG+Pugio+2+Wireless+Optical+RGB+Gaming+Mouse?productId=72427")
#https://www.aria.co.uk/Products/Components/Graphics+Cards/NVIDIA+GeForce/GeForce+RTX+3060+Ti/Palit+GeForce+RTX+3060+Ti+Dual+8GB+GPU?productId=73054
# Click "Add to Basket" or refresh page if out of stock
try:
element = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.XPATH, "Out of Stock!")))
time.sleep(5)
browser.refresh()
except:
button = driver.find_element_by_id("addQuantityButton")
button.click()
basket = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.ID, "basketContent")))
basket.click()
checkout = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH("//img[contains(#src,'/static/images/checkoutv2.png.png')]"))).click()
I can see your xpath is not correct.
Your Xpath should be.
//img[contains(#src,'/static/images/checkoutv2.png')]
Your code should be.
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//img[contains(#src,'/static/images/checkoutv2.png')]"))).click()
The link you provided contains hCaptcha, which is actually responsible to check whether you are a human being or a bot. I guess that it's also the reason, why you can't click any of the items on the page, because Selenium actually is nothing less than a bot.
You first have to pass the test by clicking on the images, which are asked for.

Webdriver selenium - can't find any element from TradingView

I'm new to using Selenium but I watched enough videos and followed enough articles to know something is missing. I'm trying to get values from TradingView but the problem I'm running into is that I simply can't find any of the elements, not by Xpath or Css. I went ahead and tried to do a simple visibility element test as shown in the code below and to my surprise it times out.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
# Stops the UI interface (chrome browser) from popping up
# chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path='c:\se\chromedriver.exe', options=chrome_options)
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time
driver.get("https://www.tradingview.com/chart/")
timeout = 20
try:
WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[1]")))
print("Page loaded")
except TimeoutException:
print("Timed out waiting for page to load")
driver.quit()
I tried to click on one of the chart buttons too using the following and that doesn't work either. I noticed that unlike many other websites for Tradingview the elements don't have names and don't generate a relative path (only full) using Xpath.
driver.find.element_by_xpath('/html/body/div[2]/div[5]/div/div[2]/div/div/div/div/div[4]').click()
Any help is greatly appreciated!
I think there must be an issue with xpath.
When I try to click the AAPL button it is working for me.
The xpath I used is:
(//div[contains(text(),'AAPL')])[1]
If you specify exactly which element to be clicked I will try.
And also be familiar with the concept of frames because these type of websites has lot of frames in it.

Can't get rid of hardcoded delay even when Explicit Wait is already there

I've written some code in python in combination with selenium to parse the different questions from quora.com. My scraper is doing it's job at this moment. The thing is I've used here hardcoded delay for the scraper to work, even when Explicit Wait has already been defined. As the page is an infinite scrolling one, i tried to make the scrolling process to a limited number. Now, I have got two questions:
Why wait.until(EC.staleness_of(page)) is not working within my scraper. It is commented out now.
If i use something else instead of page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link"))) the scraper throws an error: can't focus element.
Btw, I do not wish to go for page = driver.find_element_by_tag_name('body') this option.
Here is what I've written so far:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://www.quora.com/topic/C-programming-language")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link")))
for scroll in range(10):
page.send_keys(Keys.PAGE_DOWN)
time.sleep(2)
# wait.until(EC.staleness_of(page))
for item in wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "rendered_qtext"))):
print(item.text)
driver.quit()
You can try below code to get as much XHR as possible and then parse the page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
driver = webdriver.Chrome()
driver.get("https://www.quora.com/topic/C-programming-language")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link")))
links_counter = len(wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "question_link"))))
while True:
page.send_keys(Keys.END)
try:
wait.until(lambda driver: len(driver.find_elements_by_class_name("question_link")) > links_counter)
links_counter = len(driver.find_elements_by_class_name("question_link"))
except TimeoutException:
break
for item in wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "rendered_qtext"))):
print(item.text)
driver.quit()
Here we scroll page down and wait up to 10 seconds for more links to be loaded or break the while loop if the number of links remains the same
As for your questions:
wait.until(EC.staleness_of(page)) is not working because when you scroll page down you don't get the new DOM - you just make XHR which adds more links into existed DOM, so the first link (page) will not be stale in this case
(I'm not quite confident about this, but...) I guess you can send keys only to nodes that can be focused (user can set focus manually), e.g. links, input fields, textareas, buttons..., but not content division (div), paragraphs (p), etc

Categories

Resources