How do I access nested HTML elements using Selenium? - Python

I am using a school class schedule website, and I want to scrape the div element that shows how many seats a class has and who is teaching it. I first find the element that contains the div I want, and then I try to find that div with an XPath. The problem is that when I use either find_element_by_xpath or find_elements_by_xpath to get the div, I get this error:
'list' object has no attribute 'find_element_by_xpath'
Is this error happening because the div I want is nested? Is there a way to get nested elements using a div tag?
Here is the code I have currently:
driver = webdriver.Chrome(ChromeDriverManager().install())
url = "https://app.testudo.umd.edu/soc/202008/INST"
driver.get(url)
section_container = driver.find_elements_by_id('INST366')
sixteen_grid = section_container.find_element_by_xpath(".//div[@class = 'sections sixteen colgrid']").text
The info I want is in this element:
<div class="sections sixteen colgrid"></div>
It is currently inside the element with this id:
<div id="INST366" class="course"></div>
I would greatly appreciate it if anyone could help me out with this.

From the documentation of find_elements_by_id:
Returns : list of WebElement - a list with elements if any was found. An empty list if not
This means section_container is a list. You can't call find_element_by_xpath on a list, but you can call it on each element in the list, because each one is a WebElement.
What does the documentation say about find_element_by_id?
Returns : WebElement - the element if it was found
In this case you can call find_element_by_xpath on the result directly. Which one should you use? It depends on your need: use find_element_by_id if you only need the first match to keep digging for information, or find_elements_by_id if you need all the matches.
After fixing that you will run into a second problem: the information is only rendered by JavaScript after you click "Show Sections", so you need to click it before locating what you want. To do that, find the a element (the link) and click it.
The new code will look like this:
from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
url = "https://app.testudo.umd.edu/soc/202008/INST"
driver.get(url)
section_container = driver.find_element_by_id('INST366')
section_container.find_element_by_xpath(".//a[@class='toggle-sections-link']").click()
sleep(1)
section_info = section_container.find_element_by_xpath(".//div[@class='sections sixteen colgrid']").text
driver.quit()
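For newer Selenium versions, where the find_element_by_* helpers are deprecated and, as far as I know, removed in later 4.x releases, the same flow can be sketched with find_element(by, value). This is only a sketch under assumptions: read_sections is a hypothetical helper name, the id and class names are the ones from the question and may have changed on the live page, and the strings "id" and "xpath" are the values of By.ID and By.XPATH, written out so the snippet needs nothing beyond the standard library. It also polls for the grid instead of using a fixed sleep(1).

```python
import time


def read_sections(driver, course_id, timeout=5.0, poll=0.25):
    """Return the text of the sections grid nested inside one course div."""
    # find_element (singular) returns one WebElement, so further lookups
    # can be chained on it; find_elements would return a list instead.
    course = driver.find_element("id", course_id)  # "id" is the value of By.ID

    # The leading "." keeps the XPath relative to `course`.
    course.find_element("xpath", ".//a[@class='toggle-sections-link']").click()

    # Poll until the JavaScript-rendered grid has text, instead of a fixed sleep.
    deadline = time.monotonic() + timeout
    while True:
        grids = course.find_elements(
            "xpath", ".//div[@class='sections sixteen colgrid']"
        )
        if grids and grids[0].text:
            return grids[0].text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"sections for {course_id} did not load")
        time.sleep(poll)
```

With a live driver that has already loaded the schedule page, this would be called as read_sections(driver, "INST366").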

Related

How do I click on an item on my google search page with Selenium Python?

Good day!
I'm facing a seemingly simple problem, but I've been stuck on it for a while, so I'm asking for your help.
I work with Selenium in Python and I need to click through about 20 items on a Google search page for a random query.
I'll give an example of the elements below. The bottom line is that once the elements are expanded, Google generates new elements of the same kind.
Problem:
I cannot click on the element. I need to click on the existing elements and then on the newly generated ones in this block: (screenshot: the block with the elements)
I tried clicking by XPath after collecting all the elements:
xpath = '//*[@id="qmCCY_adG4Sj3QP025p4__16"]/div/div/div[1]/div[4]'
all_elements = driver.find_element(By.XPATH, value=xpath)
for element in all_elements:
    element.click()
    sleep(2)
Important note!
The id in the XPath changes constantly; it is generated on Google's side.
I tried clicking by class:
class="r21Kzd"
Tried to click on the selector:
#qmCCY_adG4Sj3QP025p4__16 > div > div > div > div.wWOJcd > div.r21Kzd
Errors
This is what I get when I try to click using the XPath:
Message: no such element: Unable to locate element: {"method":"xpath","selector"://*[@id="vU-CY7u3C8PIrgTuuJH4CQ_9"]/div/div[1]/div[4]}
In the other cases the story is almost the same: the driver does not find the element and cannot click on it. Below is a screenshot of the tag I need to click:
(screenshot: tags on the Google search page)
Thanks for the help!
If iDjcJe IX9Lgd wwB5gf are fixed class names of that element, all you need is to use CSS_SELECTOR instead of CLASS_NAME, with the correct CSS selector syntax. By.CLASS_NAME expects a single class name, so a value with spaces in it will not match.
So, instead of driver.find_element(By.CLASS_NAME, "iDjcJe IX9Lgd wwB5gf"), try this:
driver.find_element(By.CSS_SELECTOR, ".iDjcJe.IX9Lgd.wwB5gf")
(a dot before each class name, no spaces between them)
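If you copy class attributes out of the inspector often, the conversion can be done by a tiny helper. This is just a sketch: classes_to_css is a hypothetical name, and it assumes the class names contain no characters that would need escaping in CSS.

```python
def classes_to_css(class_attr):
    """Turn a class attribute value like "a b c" into the selector ".a.b.c"."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # A dot before each name and no spaces: the element must carry all of them.
    return "".join("." + name for name in names)


selector = classes_to_css("iDjcJe IX9Lgd wwB5gf")
print(selector)  # .iDjcJe.IX9Lgd.wwB5gf
```

The resulting string is what you would pass as the value for By.CSS_SELECTOR.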

Python selenium wait until an element located not by xpath?

I am scraping a website that spans multiple pages. Pressing certain buttons advances to the next page, but all the pages share exactly the same URL, which means most of the elements I have located aren't visible until the buttons of the same type before them have been clicked. Trying to click them too early raises the following error:
ElementNotInteractableException: Message: Element <div class="answer"> could not be scrolled into view
The elements all share one class, and they are all children of elements of another class; each parent element has multiple child elements.
To find the elements, I call the .find_elements_by_class_name() method twice:
lists = []
for i in Firefox.find_elements_by_class_name('parent'):
    lists.append(i.find_elements_by_class_name('child'))
There is exactly one element in each of the sublists that needs to be clicked. The element is determined by its attribute and identified using list indexing, so I have absolutely no idea what its XPath is.
None of the elements is visible until the preceding element has been clicked, so waiting for them is a must to avoid ElementNotInteractableException.
I am using the following syntax:
wait = WebDriverWait(Firefox, 3)
wait.until(EC.visibility_of_element_located((By.XPATH, xpath)))
So I need the XPath of an element that was located using the methods above; unfortunately, Selenium doesn't support this natively.
But I have found a trick here:
In [1]: el
Out[1]: <Element span at 0x109187f50>
In [2]: el.getroottree().getpath(el)
Out[2]: '/html/body/div/table[2]/tbody/tr[1]/td[3]/table[2]/tbody/tr/td[1]/p[4]/span'
So I think that if I can build an lxml tree from the Selenium page source and then somehow convert the Selenium elements into lxml elements, it can be done, though I don't know exactly how...
What about something like this (you may need to use contains for the class):
wait.until(EC.visibility_of_element_located((By.XPATH, f'//*[@class="parent"][{index}]//*[@class="child"][{subindex}]')))
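Another option that avoids XPath entirely is to wait on the element object you already hold. Selenium has expected_conditions.visibility_of(element) for that, and WebDriverWait.until accepts any callable that takes the driver. The sketch below shows the same idea as a plain polling loop, so it runs without extra imports; wait_until_displayed and click_in_order are hypothetical helper names, and it assumes the located elements do not go stale between clicks.

```python
import time


def wait_until_displayed(element, timeout=3.0, poll=0.2):
    """Block until element.is_displayed() is true, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        if element.is_displayed():
            return element
        if time.monotonic() >= deadline:
            raise TimeoutError("element did not become visible in time")
        time.sleep(poll)


def click_in_order(elements, timeout=3.0):
    """Click each already-located element once it has become visible."""
    for element in elements:
        wait_until_displayed(element, timeout).click()
```

Here elements would be the one element picked out of each sublist in lists, in the order they have to be clicked.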

How to extract the href attribute of an element using Selenium and Python

I want to scrape the URLs within the HTML of the 'Racing-Next to Go' section of www.tab.com.au.
Here is an excerpt of the HTML:
<a ng-href="/racing/2020-07-31/MACKAY/MAC/R/8" href="/racing/2020-07-31/MACKAY/MAC/R/8"><i ng-
All I want to scrape is the last bit of that HTML which is a link, so:
/racing/2020-07-31/MACKAY/MAC/R/8
I have tried to find the element by using xpath, but I can't get the URL I need.
My code:
driver = webdriver.Firefox(executable_path=r"C:\Users\Harrison Pollock\Downloads\Python\geckodriver-v0.27.0-win64\geckodriver.exe")
driver.get('https://www.tab.com.au/')
elements = driver.find_elements_by_xpath('/html/body/ui-view/main/div[1]/ui-view/version[2]/div/section/section/section/race-list/ul/li[1]/a')
for e in elements:
    print(e.text)
You probably want to use get_attribute instead of .text. Documentation here.
elements = driver.find_elements_by_xpath('/html/body/ui-view/main/div[1]/ui-view/version[2]/div/section/section/section/race-list/ul/li[1]/a')
for e in elements:
    print(e.get_attribute("href"))
Yes, you can use the getAttribute(attributeLocator) function for your requirement.
selenium.getAttribute(//xpath@href);
Specify the XPath of the element whose attribute you need, followed by @ and the attribute name.
The value /racing/2020-07-31/MACKAY/MAC/R/8 in the HTML is the value of the href attribute, not the innerText.
Solution
Instead of using the text attribute you need to use get_attribute("href"), and the effective lines of code will be:
elements = driver.find_elements_by_xpath('/html/body/ui-view/main/div[1]/ui-view/version[2]/div/section/section/section/race-list/ul/li[1]/a')
for e in elements:
    print(e.get_attribute("href"))
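One thing to watch for: in my experience get_attribute("href") usually returns the fully resolved URL (with scheme and host) rather than the literal attribute text. If you only want the /racing/... part, the standard library can strip the rest. A small sketch; href_path is a hypothetical helper name.

```python
from urllib.parse import urlparse


def href_path(href):
    """Return only the path part of a link, dropping scheme and host if present."""
    return urlparse(href).path


# Works for both an absolute URL and an already-relative href.
print(href_path("https://www.tab.com.au/racing/2020-07-31/MACKAY/MAC/R/8"))
print(href_path("/racing/2020-07-31/MACKAY/MAC/R/8"))
```

In the loop above this would be used as href_path(e.get_attribute("href")).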

Python, Selenium: can't find element by xpath when ul list is too long

I'm trying to write a program that extracts all the people I follow on Instagram. I'm using Python, Selenium and Chromedriver.
To do so, I first get the number of followed accounts and click on the 'following' button:
nb_abonnements = int(webdriver.find_element_by_xpath('/html/body/span[1]/section[1]/main/div[1]/header/section[1]/ul/li[3]/a/span').text)
sleep(randrange(1,3))
abonnements = webdriver.find_element_by_xpath('/html/body/span[1]/section[1]/main/div[1]/header/section[1]/ul/li[3]/a')
abonnements.click()
I then use the following code to get the followers and scroll the popup page in case I can't find one:
followers_panel = webdriver.find_element_by_xpath('/html/body/div[3]/div/div/div[2]')
while i < nb_abonnements:
    try:
        print(i)
        followed = webdriver.find_element_by_xpath('/html/body/div[3]/div/div/div[2]/ul/div/li[{}]/div/div[2]/div/div/div/a'.format(i+1)).text
        # the followed accounts are in a ul list
        i += 1
        followed_list.append(followed)
    except NoSuchElementException:
        webdriver.execute_script(
            "arguments[0].scrollBy(0,400)", followers_panel
        )
        sleep(7)
The problem is that once i reaches 12, the program raises the exception and scrolls. From there it still can't find the next follower and gets stuck in a loop where it does nothing but scroll. I've checked the source code of the IG page, and the path is still correct, but apparently I can no longer access the elements the way I did, probably because the ul list I am reading them from has become too long (line 5 of the program).
I can't work out how to solve this. I hope you can help.
UPDATE: the DOM looks like this:
html
body
span
script
...
div[3]
div
...
div
div
div[2]
ul
div
li
li
li
li
...
li
The ul is the list of the followers.
The li elements contain the info I'm trying to extract (the username). Even when I go to the webpage myself, open the popup window, scroll a little and let everything load, I can't find the element I'm looking for by typing the XPath into the search bar of the DOM inspector, although the path is correct as far as I can check by looking at the DOM.
I've tried various webdrivers for Selenium; currently I am using chromedriver 2.45.615291. I've also added an explicit wait for the element to show up (WebDriverWait(webdriver, 10).until(EC.presence_of_element_located((By.XPATH, '/html/body/div[3]/div/div/div[2]/ul/div/li[{}]/div/div[2]/div/div/div/a'.format(i+1))))), but I just get a timeout exception: selenium.common.exceptions.TimeoutException: Message:.
It seems that once the ul list gets too long (that is, from the moment I've scrolled down enough to load new people), I can't access any element of the list by its XPath, not even the elements that were already loaded before I began scrolling.
Instead of using an XPath for each child element, find the ul list element first and then find all its child elements using something like: ul-list element.find_elements_by_tag_name(). Then iterate through each element in the collection and get the required text.
I've found a solution: I just access the element through an XPath like this: find_element_by_xpath("(//*[@class='FPmhX notranslate _0imsa '])[{}]".format(i)). I don't know why it didn't work the other way, but like this it works just fine.
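The first suggestion (query the children relative to the list container and re-query after each scroll) could look roughly like the sketch below. It is written under assumptions: collect_usernames is a hypothetical helper, "li a" is an assumed selector that may not match Instagram's actual markup, and "css selector" is the value of By.CSS_SELECTOR, written out so the snippet needs no Selenium import. It stops when the expected count is reached or when several scrolls in a row reveal nothing new, which avoids the endless scroll loop.

```python
import time


def collect_usernames(driver, panel, expected, pause=1.0, max_idle=3):
    """Gather link texts inside `panel`, scrolling until `expected` are found."""
    seen = []  # usernames in first-seen order
    idle = 0   # consecutive passes that revealed nothing new
    while len(seen) < expected and idle < max_idle:
        before = len(seen)
        # Re-query relative to the panel on every pass, so the lookup does
        # not depend on an absolute path or on a fixed li index.
        for link in panel.find_elements("css selector", "li a"):
            name = link.text.strip()
            if name and name not in seen:
                seen.append(name)
        idle = idle + 1 if len(seen) == before else 0
        driver.execute_script("arguments[0].scrollBy(0, 400)", panel)
        time.sleep(pause)
    return seen
```

With the variables from the question this would be called as collect_usernames(webdriver, followers_panel, nb_abonnements).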

How to select one element with common class names in Selenium Webdriver using Python?

I'm new to Selenium Webdriver using Python.
I want to select the tag that contains 'Stats'. I tried multiple ways but failed, because neither tag has an id and both have the same class names.
Please help me with the code to select the tag that contains 'Stats' using Selenium WebDriver with Python.
Here are a few of my attempts. Even when a result is found, I'm unable to click it, and the error message says that lists cannot be clicked.
driver.find_element_by_class_name("css173kae7").click()
driver.find_elements_by_link_text("stats/dashboard").click()
driver.find_elements_by_xpath("//*[contains(text(), 'Stats')]")
I've attached the image of the Inspect element of the code, please have a look at it.
(Updated) Image
Inspect element of the "Anchor Tags" from which one needs to be selected
Use this code:
driver.find_elements_by_xpath("//*[contains(text(), 'stats')]")
If you are looking for "stats" in the href attribute, use the code below:
driver.find_elements_by_xpath("//*[contains(@href, 'stats')]")
Keep in mind that contains is case-sensitive, so 'stats' will not match the visible text 'Stats'.
If you want to do a case-insensitive search, see here.
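One common way to get a case-insensitive match in XPath 1.0 is to lower-case the text with translate() before comparing. A sketch that builds such an expression; xpath_contains_ci is a hypothetical helper name, and it only folds ASCII letters.

```python
import string


def xpath_contains_ci(needle, node="*"):
    """Build an XPath matching nodes whose text contains `needle`, ignoring ASCII case."""
    if "'" in needle:
        raise ValueError("needle must not contain a single quote")
    upper, lower = string.ascii_uppercase, string.ascii_lowercase
    # translate() maps every upper-case letter in the text to lower case,
    # so comparing against the lower-cased needle ignores case.
    return (
        f"//{node}[contains("
        f"translate(text(), '{upper}', '{lower}'), '{needle.lower()}')]"
    )


print(xpath_contains_ci("Stats", node="a"))
```

The returned string can be passed to find_elements_by_xpath in place of the hard-coded expression.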
Based on the image of the HTML you shared, I would try this XPath locator to select that element:
driver.find_element_by_xpath(".//a[@class='css-1qdedno' and @href='/stats/dashboard']")
Try this XPath:
//nav//a[contains(@href,'/stats/dashboard')]
find_elements_by_xpath returns a list, so you need to pick the desired element by its index after you get all the elements:
elements = driver.find_elements_by_xpath("//*[contains(text(), 'stats')]")
Then access the element by its index and call click() on it.
For example, if the element is at elements[0], then:
elements[0].click()
