Dynamic Web scraping using Selenium

Dynamic Web scraping using Selenium - python

I am trying to scrape the tables from the dynamic webpage below. I am using the code below to find the table rows (they are under the tag name tr), but I get an empty list as output. Is there anything I am missing here?
https://www.taipower.com.tw/tc/page.aspx?mid=206&cid=406&cchk=b6134cc6-838c-4bb9-b77a-0b0094afd49d
from selenium import webdriver
chrome_path = r"C:\Users\upko\Downloads\My Projects\Ibrahim Projects\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.taipower.com.tw/tc/page.aspx?mid=206&cid=406&cchk=b6134cc6-838c-4bb9-b77a-0b0094afd49d")
driver.find_elements_by_tag_name('tr')

The website has iframes, so you need to switch into the desired iframe to access the data. I haven't tested this code, but it should work:
iframe = driver.find_element_by_xpath("//iframe[@id='IframeId']")
driver.switch_to.frame(iframe)
# Now you can get the data
trs = driver.find_elements_by_tag_name('tr')
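
The locator in this answer relies on an XPath attribute predicate: [@id='IframeId'] keeps only the elements whose id attribute has that exact value. As a minimal sketch of how the predicate filters elements, here is the same expression run with the standard library's ElementTree (which supports a limited XPath subset) against made-up markup. The SAMPLE fragment and its src values are assumptions for illustration, not the real page.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup standing in for a page with two frames.
SAMPLE = """
<html>
  <body>
    <iframe id="IframeId" src="chart.html"></iframe>
    <iframe id="ads" src="ads.html"></iframe>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE)

# [@id='IframeId'] keeps only elements whose id attribute equals that value.
matches = root.findall(".//iframe[@id='IframeId']")
matched_sources = [frame.get("src") for frame in matches]
print(matched_sources)
```

Only the first iframe matches, which is why the same predicate can be used to pick out the one frame to switch into.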

The desired elements are within an <iframe> so you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired visibility_of_all_elements_located.
You can use the following locator strategy:
Using XPATH:
driver.get("https://www.taipower.com.tw/tc/page.aspx?mid=206&cid=406&cchk=b6134cc6-838c-4bb9-b77a-0b0094afd49d")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@id='IframeId']")))
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='container-fluid']//div[@class='span6']/strong")))])
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
['核能(Nuclear)', '燃煤(Coal)', '汽電共生(Co-Gen)', '民營電廠-燃煤(IPP-Coal)', '燃氣(LNG)', '民營電廠-燃氣(IPP-LNG)', ' 燃油(Oil)', '輕油(Diesel)', '水力(Hydro)', '風力(Wind)', '太陽能(Solar)', '抽蓄發電(Pumping Gen)']
Reference
You can find a couple of relevant discussions in:
Switch to an iframe through Selenium and python

Related

text based HTML of the element selenium python

I am trying to scrape a website. I have a table like the one below:
Table
I would like to navigate to the line that contains the word "Maître", but I am not able to find this line with Selenium. Below is the HTML code of the page:
Html code 1
Html code 2
And this is my code:
objets_de_risque = driver.find_element(By.ID,'sidebar-link-0-1')
driver.execute_script("arguments[0].click();", objets_de_risque)
code_h = driver.find_element(By.ID,'input-search-1')
code_h.send_keys(Keys.CONTROL + "a")
code_h.send_keys(Keys.DELETE)
code_h.send_keys("H1404")
code_h.send_keys(Keys.ENTER)
driver.switch_to_frame("main1")
line_maitre = driver.find_elements_by_xpath("//*[contains(text(), 'Maître')]")
driver.execute_script("arguments[0].click();", line_maitre)
The last two lines don't seem to work. Is there a way to navigate to this line with Selenium?
Thank you very much

find_elements() returns a list. Instead you need find_element() to locate the element.
Solution
To locate a visible element you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategies:
Using XPATH:
line_maitre = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//td[text()='Maître']")))
However, to click on it as per your code in a single line:
driver.execute_script("arguments[0].click();", WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//td[text()='Maître']"))))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Can't seem to select from drop down menu - Python Selenium

As the title says, I can't seem to select from the drop-down menu on this website no matter what I try.
import time
from selenium import webdriver
from selenium.webdriver.support.ui import Select
driver=webdriver.Chrome()
driver.get('https://assessing.nashuanh.gov/search.asp')
time.sleep(1)
select=Select(driver.find_element_by_xpath('//*[@id="cboSearchType"]'))
select.select_by_value('2')

First you have to handle the frames in your page. It also looks like there is no value 2 inside this dropdown, so you have to pass a valid value.
driver.switch_to.frame("middle")
select = Select(driver.find_element_by_xpath('//*[@id="cboSearchType"]'))
select.select_by_value('Parcel')
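
Since select_by_value() matches the value attribute of an option rather than its position, it helps to list the values that actually exist before choosing one. In Selenium the option elements are available through the options property of Select; the sketch below shows the same idea with the standard library only, on a hypothetical fragment. The SAMPLE_SELECT markup is an assumption for illustration and is not copied from the real page.

```python
from html.parser import HTMLParser


class OptionValueCollector(HTMLParser):
    """Collect the value attribute of every <option> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            attr_map = dict(attrs)
            if "value" in attr_map:
                self.values.append(attr_map["value"])


# Hypothetical markup; the real dropdown may contain different options.
SAMPLE_SELECT = (
    '<select id="cboSearchType">'
    '<option value="Owner">Owner</option>'
    '<option value="Parcel">Parcel</option>'
    "</select>"
)

collector = OptionValueCollector()
collector.feed(SAMPLE_SELECT)
print(collector.values)

# A value such as "2" is simply not among the option values here.
print("2" in collector.values)
```

If the value you intend to pass is not in that list, select_by_value() has nothing to match.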

The <option> element with value/text Owner in the html-select is within a <frame>, so you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired element to be clickable.
You can use either of the following Locator Strategies:
Using CSS_SELECTOR and select_by_value():
driver.get('https://assessing.nashuanh.gov/search.asp')
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"frame[name='middle']")))
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#cboSearchType")))).select_by_value("Owner")
Using XPATH and select_by_visible_text():
driver.get('https://assessing.nashuanh.gov/search.asp')
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//frame[@name='middle']")))
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//select[@id='cboSearchType']")))).select_by_visible_text("Owner")
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Reference
You can find a couple of relevant discussions in:
Ways to deal with #document under iframe
Switch to an iframe through Selenium and python

Selenium Driver: How to find the element added after the webpage is loaded?

The webpage has a button that, when clicked, adds an element to the page, which I can't find using Selenium.
Some imaginary code as follows to explain the problem I experience:
from selenium import webdriver
d = webdriver.Chrome()
#Go to my target website
d.get("https://some_website_url") #ref1
#Okay now loading of the website is done. `d` will not be updated and this is the problem!!
#Click my target button and an element with id="SecretButton" is loaded.
d.find_element_by_css_selector("#secretlyupdatethewebpage").click()
#Find #SecretButton but to no avail.
#It can be found in the html panel of Chrome Developer Tools
#but cannot be found in the webdriver `d`, as `d` won't be
#updated after #ref1
d.find_element_by_css_selector("#SecretButton").click()
How can I find that #SecretButton?

To find and invoke click() on the secret button you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following Locator Strategies:
Using ID:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "SecretButton"))).click()
Using CSS_SELECTOR:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#SecretButton"))).click()
Using XPATH:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='SecretButton']"))).click()
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
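
Conceptually, the explicit wait used above is a polling loop: the condition is retried until it returns something truthy or the timeout runs out, which is why an element added after the initial page load can still be found. The following is a simplified sketch of that idea in plain Python, not Selenium's actual implementation; wait_until and find_secret_button are hypothetical names, and the fake lookup stands in for a real driver call.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose element shows up only on the third lookup.
attempts = {"count": 0}


def find_secret_button():
    attempts["count"] += 1
    return "SecretButton" if attempts["count"] >= 3 else None


found = wait_until(find_secret_button, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

A single find call corresponds to one pass through this loop, so it fails whenever the element has not been added to the page yet.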

How to grab the price information from flight reservation site https://reservations.airarabia.com

I'm very new to python and trying to learn webscraping. Following a tutorial, I'm trying to extract a price from a website but nothing is being printed. What is wrong with my code?
from selenium import webdriver
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://reservations.airarabia.com/service-app/ibe/reservation.html#/fare/en/AED/AE/SHJ/KHI/07-09-2019/N/1/0/0/Y//N/N")
price = driver.find_elements_by_class_name("fare-and-services-flight-select-fare-value ng-isolate-scope")
for post in price:
    print(post.text)

To print the first price you have to induce WebDriverWait for the desired visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "isa-flight-select button:first-child span.fare-and-services-flight-select-fare-value.ng-isolate-scope"))).get_attribute("innerHTML"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//isa-flight-select//following::button[contains(@class, 'button')]//span[@class='fare-and-services-flight-select-fare-value ng-isolate-scope']"))).text)
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console output of two back-to-back executions:
475
You can find a relevant discussion in How to retrieve the title attribute through Selenium using Python?
Outro
As per the documentation:
The get_attribute() method gets the given attribute or property of the element.
The text attribute returns the text of the element.
Difference between text and innerHTML using Selenium

The first reason is that the webpage you are trying to scrape uses JavaScript to load the HTML, so you need to wait until the element is present, which you can do with Selenium's WebDriverWait.
The second reason is that the find_elements_by_class_name method accepts only a single class name, so you need to use either find_elements_by_css_selector or find_elements_by_xpath instead.
This is how your code should look:
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://reservations.airarabia.com/service-app/ibe/reservation.html#/fare/en/AED/AE/SHJ/KHI/07-09-2019/N/1/0/0/Y//N/N")
price = WebDriverWait(driver, 10).until(
    lambda x: x.find_elements_by_css_selector(".currency-value.fare-value.ng-scope.ng-isolate-scope"))
for post in price:
    print(post.get_attribute("innerText"))
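
To illustrate the second point: a class attribute holding several space-separated names becomes a CSS selector by prefixing each name with a dot and joining them without spaces. The helper below is a hypothetical sketch (classes_to_css_selector is not part of Selenium) and assumes the class names contain no characters that would need CSS escaping.

```python
def classes_to_css_selector(class_attribute, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector."""
    class_names = class_attribute.split()
    if not class_names:
        raise ValueError("class_attribute must contain at least one class name")
    # ".a.b" matches elements that carry both class a and class b.
    return tag + "".join("." + name for name in class_names)


selector = classes_to_css_selector(
    "fare-and-services-flight-select-fare-value ng-isolate-scope", tag="span"
)
print(selector)
```

The resulting string is the kind of value you would pass to a CSS selector lookup instead of handing the raw class attribute to a class name lookup.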

Python/Selenium Click on an element in a ul

I am new to Python/Selenium (< 3 days). I am trying to click on the "grades" item of an unordered list (Please see image). I have tried find_element_by_link_text, find_element_by_css_selector but am unable to find it.
Image from inspect element
Thanks.

As per the HTML it is pretty clear that the desired field is within an <iframe> so you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired element to be clickable.
You can use either of the following Locator Strategies:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"frameDetail")))
element = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a#grades")))
Using XPATH:
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"frameDetail")))
element = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='grades']")))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Reference
You can find a couple of relevant discussions in:
Ways to deal with #document under iframe
Switch to an iframe through Selenium and python
