Python Selenium web scraping path

Python Selenium web scraping path - python

How do I convert the above to selenium xpath?
I tried:
"//ul[#id = 'productsCatalog AND #class = 'b-catalogList_wrapper clearfix']"
but it gives an error

As per the HTML you have shared the element is a dynamic element so you have to induce WebDriverWait for the element to be visible and you can use either of the following solutions:
CSS_SELECTOR:
element = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "ul.b-catalogList_wrapper.clearfix#productsCatalog")))
XPATH:
element = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//ul[#class = 'b-catalogList_wrapper clearfix' and #id ='productsCatalog']")))
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Related

Unable to find element using Selenium in Python

I'm trying to click on a button, but selenium can't find the element.
I'm using the xpath:
//button[#title="Monitor Chart(Ctrl+Alt+G)"]
but selenium can't locate. I can find the xpath from Chrome manualy using the find(Ctrl+F) tool, as we can see in the image.
Here is my code and error. The xpath does't have an ID.
Code Snapshot:

To click on the clickable element you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.eui-btn.eui-btn-default.eui-btn-normal[title^='Monitor Chart']"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[#class='eui-btn eui-btn-default eui-btn-normal' and starts-with(#title, 'Monitor Chart')]"))).click()
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
hiding_button = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[title ='Monitor Chart(Ctrl+Alt+G)']")))
hiding_button.click()

Thanks to all. But I figure it out the issue. The button was inside an iFrame. So I hadd to switch to the iFrame first.
Code used:
iframe = driver.find_element_by_id('Access_Performance_Monitor')
driver.switch_to.frame(iframe)

Can't Find Element Inside iframe

I want to get the data-sitekey, but it is inside the iframe.
I only can get the element in class="container", can't find the element insde of it.
How can I get the data-sitekey?
driver.get(url)
driver.switch_to.frame("main-iframe")
container= driver.find_element(By.CLASS_NAME, 'container')
print(container)
time.sleep(2)
captcha = driver.find_element(By.CLASS_NAME, 'g-recaptcha')
print(captcha)

The reCAPTCHA element is within an <iframe>
Solution
To extract the value of the data-sitekey attribute you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the visibility_of_element_located.
You can use either of the following locator strategies:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#main-iframe")))
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.g-recaptcha"))).get_attribute("data-sitekey"))
Using XPATH:
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[#id='main-iframe']"))).get_attribute("data-sitekey"))
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='g-recaptcha']")))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

This is how you get that information:
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.options import Options as Firefox_Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support import expected_conditions as EC
import time as t
firefox_options = Firefox_Options()
# firefox_options.add_argument("--width=1280")
# firefox_options.add_argument("--height=720")
# firefox_options.headless = True
driverService = Service('chromedriver/geckodriver')
browser = webdriver.Firefox(service=driverService, options=firefox_options)
url = 'https://premier.hkticketing.com/'
browser.get(url)
t.sleep(5)
WebDriverWait(browser, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//*[#id='main-iframe']")))
print('switched')
t.sleep(5)
element_x = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[#class='g-recaptcha']")) )
print(element_x.get_attribute('data-sitekey'))
Result printed in terminal:
switched
6Ld38BkUAAAAAPATwit3FXvga1PI6iVTb6zgXw62
Setup is for linux/Firefox/geckodriver, but you can adapt it to your own system, just mind the imports, and the code after defining the browser.
Selenium docs: https://www.selenium.dev/documentation/

Getting id or xpath of parts of a website

I want to use selenium to login in the website https://www.winamax.es/account/login.php?redir=/apuestas-deportivas. The case is that I don't find the xpath/id/text to get te next code running successfully:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
options = webdriver.ChromeOptions()
options.add_argument("window-size=1920,1080")
options.add_argument('--disable-blink-features=AutomationControlled')
driver=webdriver.Chrome(options=options,executable_path=r"chromedriver.exe")
driver.get("https://www.winamax.es/account/login.php?redir=/apuestas-deportivas")
WebDriverWait(driver=driver, timeout=15).until(
lambda x: x.execute_script("return document.readyState === 'complete'")
)
upload_field = driver.find_element_by_xpath("//input[#type='email']")
I don't only want the specific xpath for this example, but I prefer a method to obtain the xpath or something similar to get the code working for other parts of the website

<iframe id="iframe-login" data-node="iframe" name="login" scrolling="auto" frameborder="1" style="min-height: 280px; width: 100%;"></iframe>
Your element is in an iframe. Switch to it.
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "iframe-login")))
Import
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Full working code.
wait = WebDriverWait(driver, 10)
driver.get("https://www.winamax.es/account/login.php?redir=/apuestas-deportivas")
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "iframe-login")))
upload_field = wait.until(EC.element_to_be_clickable((By.XPATH, "//input[#type='email']")))
upload_field.send_keys("stuff")

The email element is within an <iframe> so you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired element to be clickable.
You can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get("https://www.winamax.es/account/login.php?redir=/apuestas-deportivas")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#iframe-login")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[type='email']"))).send_keys("scraper#stackoverflow.com")
Using XPATH:
driver.get("https://www.winamax.es/account/login.php?redir=/apuestas-deportivas")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[#id='iframe-login']")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[#type='email']"))).send_keys("scraper#stackoverflow.com")
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Browser Snapshot:
Reference
You can find a couple of relevant discussions in:
Ways to deal with #document under iframe
Switch to an iframe through Selenium and python
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element while trying to click Next button with selenium
selenium in python : NoSuchElementException: Message: no such element: Unable to locate element

Selenium to automate clicking a tab using Python

I am using Selenium with Python to webscrape a page which includes JavaScript.
The racecourse result tabs towards the top of the page eg "Ludlow","Dundalk" are manually clickable but do not have any obvious hyperlinks attached to them.
...
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='C:/A38/chromedriver_win32/chromedriver.exe')
driver.implicitly_wait(30)
driver.maximize_window()
# Navigate to the application home page
driver.get("https://www.sportinglife.com/racing/results/2020-11-23")
...
This works so far. I have used BeautifulSoup to find the label names of the NewGenericTabs eg "Ludlow","Dundalk" etc. However, the following code, to attempt to automate clicking a tab, times out each time.
WebDriverWait(driver, 60).until(EC.element_to_be_clickable((By.LINK_TEXT, "Ludlow"))).click()
Would welcome any help.

The WebElement aren't <a> tags but <span> tags so By.LINK_TEXT wouldn't work.
To click on the desired elements you can use either of the following xpath based Locator Strategies:
Ludlow:
driver.get("https://www.sportinglife.com/racing/results/2020-11-23")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[#mode='primary']"))).click()
driver.find_element_by_xpath("//span[#data-test-id='generic-tab' and text()='Ludlow']").click()
Dundalk:
driver.get("https://www.sportinglife.com/racing/results/2020-11-23")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[#mode='primary']"))).click()
driver.find_element_by_xpath("//span[#data-test-id='generic-tab' and text()='Dundalk']").click()
Ideally, to click on the elements you need to induce WebDriverWait for the element_to_be_clickable() and you can use the following xpath based Locator Strategies:
Ludlow:
driver.get("https://www.sportinglife.com/racing/results/2020-11-23")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[#mode='primary']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[#data-test-id='generic-tab' and text()='Ludlow']"))).click()
Dundalk:
driver.get("https://www.sportinglife.com/racing/results/2020-11-23")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[#mode='primary']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[#data-test-id='generic-tab' and text()='Dundalk']"))).click()
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

How to click the button to crawl the reviews using python

I am trying to crawl the reviews on this website: https://www.bol.com/nl/p/Matras-140x200-7-zones-koudschuim-premium-plus-tijk-15-cm-medium/9200000118425897/.
However, I have to click a button ( Toon meer) to show all the reviews.
<div class="load-more load-more--divider load-more--reviews js-review-load-more-container">
<a data-href="/nl/rnwy/productPage/reviews?productId=9200000118425896&offset=5&limit=10&loadMore=true" class="review-load-more__button js-review-load-more-button" data-test="review-load-more"><div class="css-loader css-loader--reviews"></div>
Toon meer</a>
</div>
I use the below code :
import requests
import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup
from datetime import datetime
start_time = datetime.now()
data = []
link = "https://www.bol.com/nl/p/Matras-140x200-7-zones-koudschuim-premium-plus-tijk-15-cm-medium/9200000118425897/"
op = webdriver.ChromeOptions()
op.add_argument('--ignore-certificate-errors')
op.add_argument('--incognito')
op.add_argument('--headless')
driver = webdriver.Chrome(executable_path='D:/Desktop/work/real/chromedriver.exe',options=op)
driver.get(link)
driver.find_element_by_css_selector('div.review-load-more__button js-review-load-more-button').click()
However, it throws an error:
No such element: Unable to locate element: {"method":"css selector","selector":"div.review-load-more__button js-review-load-more-button"} .
Is there any solution?

Css selectors cannot select an element by containing text.
Try using xpath. The last line of your script should look something like:
wait = WebDriverWait(driver, 10)
wait.until(expected_conditions.element_to_be_clickable((By.XPATH, "//a[contains(., 'Toon meer')]")).click()

To click on Toon meer you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get('https://www.bol.com/nl/p/Matras-140x200-7-zones-koudschuim-premium-plus-tijk-15-cm-medium/9200000118425896/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[data-test='consent-modal-confirm-btn']>span"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.review-load-more__button.js-review-load-more-button"))).click()
Using XPATH:
driver.get('https://www.bol.com/nl/p/Matras-140x200-7-zones-koudschuim-premium-plus-tijk-15-cm-medium/9200000118425896/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[#data-test='consent-modal-confirm-btn']/span"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[#class='review-load-more__button js-review-load-more-button' and contains(., 'Toon meer')]"))).click()
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

When you get the page a popup comes with an accept button click it and then proceed with clicking your element.
driver.get('https://www.bol.com/nl/p/Matras-140x200-7-zones-koudschuim-premium-plus-tijk-15-cm-medium/9200000118425896/')
wait=WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[#class='js-confirm-button']"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[#data-test='review-load-more']"))).click()
Import
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python Selenium web scraping path - python

How do I convert the above to selenium xpath? I tried: "//ul[#id = 'productsCatalog AND #class = 'b-catalogList_wrapper clearfix']" but it gives an error

Related

Unable to find element using Selenium in Python

Can't Find Element Inside iframe

Getting id or xpath of parts of a website

Selenium to automate clicking a tab using Python

How to click the button to crawl the reviews using python

Categories

Resources