I am using Selenium for Python to scrape a site with multiple pages. To get to the next page, I use driver.find_element(By.XPATH, xpath). However, the XPath text changes, so I want to use other attributes instead.
I tried to find the element by class, using "page-link": driver.find_element(By.CLASS_NAME, "page-link"). However, the "page-link" class is also present in the disabled list item. As a result, the Selenium driver won't stop after the last page, in this case page 2.
I want to stop the driver from clicking the disabled item on the page, i.e. I want it to ignore the last item in the list, the one with "page-item disabled", aria-disabled="true" and aria-hidden="true". The idea is that if the script can't find an enabled item, it will end a while loop that relies on the ">" button being enabled.
See the source code below.
Please advise.
<nav>
<ul class="pagination">
<li class="page-item">
<a class="page-link" href="https://www.blucap.net/app/FlightsReport?fromdate=2023-02-01&todate=2023-02-28&filterByMemberId=&view=View%20Report&page=1" rel="prev" aria-label="« Previous">‹</a>
</li>
<li class="page-item">
<a class="page-link" href="https://www.blucap.net/app/FlightsReport?fromdate=2023-02-01&todate=2023-02-28&filterByMemberId=&view=View%20Report&page=1">1</a>
</li>
<li class="page-item active" aria-current="page">
<span class="page-link">2</span>
</li>
<li class="page-item disabled" aria-disabled="true" aria-label="Next »">
<span class="page-link" aria-hidden="true">›</span>
</li>
</ul>
</nav>
To go to the next page, there are a couple of possible approaches.
One option is to use find_element() to locate the <span> child of the <li> whose aria-label starts with "Next" and which does not have aria-disabled="true", and click it as follows:
driver.find_element(By.XPATH, "//li[starts-with(@aria-label, 'Next') and not(@aria-disabled='true')]/span").click()
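The loop from the question, which should end once the next button is disabled, could be sketched as follows. This is a minimal sketch under assumptions, not a tested scraper: it assumes the enabled next link is an <a rel="next"> inside a page-item that is not disabled (inferred from the rel="prev" link in the markup above, not confirmed by it), and scrape_page is a hypothetical callback. The locator strategy is passed as the plain string "xpath", which is the value of By.XPATH, so the function only needs an object that offers find_elements.

```python
# XPath for an enabled "next" link. ASSUMPTION: the enabled link carries
# rel="next", mirroring the rel="prev" link in the question's markup.
NEXT_LINK_XPATH = (
    "//li[contains(@class, 'page-item') and not(contains(@class, 'disabled'))]"
    "/a[@rel='next']"
)


def scrape_all_pages(driver, scrape_page, max_pages=1000):
    """Call scrape_page(driver) on every page; return the number of pages visited."""
    pages = 0
    while pages < max_pages:  # hard cap guards against an endless loop
        scrape_page(driver)  # hypothetical callback that extracts the data
        pages += 1
        # find_elements returns an empty list instead of raising when nothing matches.
        next_links = driver.find_elements("xpath", NEXT_LINK_XPATH)
        if not next_links:
            break  # last page: the next item is disabled or missing
        next_links[0].click()
    return pages
```

Using find_elements rather than find_element avoids a try/except around NoSuchElementException: an empty list is the stop signal.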
Context
I am trying to click the delete button belonging to a GitHub personal access token (PAT) with a certain description, using Selenium in Python. I am able to find the description and the ID of the PAT. However, the button itself does not contain any reference to the ID. Only the form that is spawned after clicking the button contains that reference. So, to find out how to click the right button, I thought I would be able to find the button within the <div id="access-token-836771760" class="access-token js-revoke-item ".. element. However, most solutions that search for elements within elements require one to know the XPath of the parent element. I do not know the XPath of the parent element, because I find this element based on the token description. Apparently it is not practical to get the XPath of an element once you have the element in Selenium.
HTML Code
<div class="listgroup">
<div id="access-token-836771760" class="access-token js-revoke-item " data-id="836771760" data-type="token">
<div class="listgroup-item">
<div class="d-flex float-right">
<details class="ml-2 details-reset details-overlay details-overlay-dark">
<summary data-view-component="true" class="btn-danger btn-sm btn" role="button"> Delete
</summary>
<details-dialog class="anim-fade-in fast Box Box--overlay d-flex flex-column" role="dialog"
aria-modal="true">
<div class="Box-header">
<button class="Box-btn-octicon btn-octicon float-right" type="button"
aria-label="Close dialog" data-close-dialog="">
<svg aria-hidden="true" height="16" viewBox="0 0 16 16" version="1.1" width="16"
data-view-component="true" class="octicon octicon-x">
<path fill-rule="evenodd"
d="M3.72 3.72a.75.75 0 011.06 0L8 6.94l3.22-3.22a.75.75 0 111.06 1.06L9.06 8l3.22 3.22a.75.75 0 11-1.06 1.06L8 9.06l-3.22 3.22a.75.75 0 01-1.06-1.06L6.94 8 3.72 4.78a.75.75 0 010-1.06z">
</path>
</svg>
</button>
<h3 class="Box-title">Are you sure you want to delete this token?</h3>
</div>
<div data-view-component="true" class="flash flash-warn flash-full">
Any applications or scripts using this token will no longer be able to access the GitHub
API. You cannot undo this action.
</div>
<div class="Box-body overflow-auto">
</div>
<div class="Box-footer">
<!-- '"` -->
<!-- </textarea></xmp> -->
<form class="js-revoke-access-form" data-id="836771760" data-type-name="token"
data-turbo="false" action="/settings/tokens/836771760" accept-charset="UTF-8"
method="post" style=""><input type="hidden" name="_method" value="delete"
autocomplete="off"><input type="hidden" name="authenticity_token"
value="somevalue">
<button type="submit" data-view-component="true" class="btn-danger btn btn-block"> I
understand, delete this token
</button>
</form>
</div>
</details-dialog>
</details>
</div>
<small class="last-used float-right">Last used within the last 6 months</small>
<span class="token-description">
<strong>
<a href="/settings/tokens/836771760" data-pjax="">
Set GitHub commit build status values.</a>
</strong>
<span class="color-fg-muted">
<em>— <span title="Access commit status">repo:status</span></em>
</span>
</span>
<div>
<span class="color-fg-attention">
<a class="color-fg-attention" href="/settings/tokens/836771760/regenerate?index_page=1">
Expired <span class="text-semibold text-italic">on Mon, May 2 2022</span>.
</a> </span>
</div>
</div>
</div>
<div id="access-token-826562783" class="access-token js-revoke-item " data-id="826562783" data-type="token">
<div class="listgroup-item">
<div class="d-flex float-right">
<details class="ml-2 details-reset details-overlay details-overlay-dark">
<summary data-view-component="true" class="btn-danger btn-sm btn" role="button"> Delete
</summary>
<details-dialog class="anim-fade-in fast Box Box--overlay d-flex flex-column" role="dialog"
aria-modal="true">
<div class="Box-header">
<button class="Box-btn-octicon btn-octicon float-right" type="button"
aria-label="Close dialog" data-close-dialog="">
<svg aria-hidden="true" height="16" viewBox="0 0 16 16" version="1.1" width="16"
data-view-component="true" class="octicon octicon-x">
<path fill-rule="evenodd"
d="M3.72 3.72a.75.75 0 011.06 0L8 6.94l3.22-3.22a.75.75 0 111.06 1.06L9.06 8l3.22 3.22a.75.75 0 11-1.06 1.06L8 9.06l-3.22 3.22a.75.75 0 01-1.06-1.06L6.94 8 3.72 4.78a.75.75 0 010-1.06z">
</path>
</svg>
</button>
<h3 class="Box-title">Are you sure you want to delete this token?</h3>
</div>
<div data-view-component="true" class="flash flash-warn flash-full">
Any applications or scripts using this token will no longer be able to access the GitHub
API. You cannot undo this action.
</div>
<div class="Box-body overflow-auto">
</div>
<div class="Box-footer">
<!-- '"` -->
<!-- </textarea></xmp> -->
<form class="js-revoke-access-form" data-id="826562783" data-type-name="token"
data-turbo="false" action="/settings/tokens/826562783" accept-charset="UTF-8"
method="post"><input type="hidden" name="_method" value="delete"
autocomplete="off"><input type="hidden" name="authenticity_token"
value="someothervalue">
<button type="submit" data-view-component="true" class="btn-danger btn btn-block"> I
understand, delete this token
</button>
</form>
</div>
</details-dialog>
</details>
</div>
<small class="last-used float-right">Last used within the last 6 months</small>
<span class="token-description">
<strong>
<a href="/settings/tokens/82653355" data-pjax="">
somedescription</a>
</strong>
<span class="color-fg-muted">
<em>— <span title="something">repo</span></em>
</span>
</span>
<div>
<span class="color-fg-attention">
<a class="color-fg-attention" href="/settings/tokens/826562783/regenerate?index_page=1">
Expired <span class="text-semibold text-italic">on Thu, May 19 2022</span>.
</a> </span>
</div>
</div>
</div>
</div>
Question
How could I click the delete button belonging to the element with id access-token-836771760 in Python using Selenium?
Approach
I can find the delete buttons with:
danger_buttons = website_controller.driver.find_elements(By.CSS_SELECTOR, '.btn-danger.btn-sm.btn')
print_attributes_of_elements(danger_buttons, website_controller)

def print_attributes_of_elements(elements, website_controller):
    for elem in elements:
        # Collect all HTML attributes of the element into a dict via JavaScript.
        attrs = website_controller.driver.execute_script('var items = {}; for (index = 0; index < arguments[0].attributes.length; ++index) { items[arguments[0].attributes[index].name] = arguments[0].attributes[index].value }; return items;', elem)
        pprint(attrs)
However, within those buttons, I do not know which button is the right one.
If you already have the <div id="access-token-836771760" class="access-token js-revoke-item ".. element, it should be as easy as this:
# get div by description (you already have your div)
div = driver.find_element(By.XPATH, "//a[normalize-space(text())='Test']//ancestor::div[@data-type='token']")
# click delete button
button = div.find_element(By.XPATH, ".//summary")
button.click()
You don't need to know the XPath if you already have a reference to the div.
Edit:
I am already using a method to find an element within an element here.
You just need to call WebElement.find_element(By.XPATH, ".//tag").
Have a look at the XPath Syntax.
Firstly, the . selects the current node (the WebElement). The // selects nodes in the document from the current node that match the selection. Combined, .//summary only matches <summary> elements that are descendants of the div. I think that is exactly what you want.
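Put together, the whole deletion could look roughly like the sketch below. It is an illustration under assumptions rather than a verified script: token_row_xpath and delete_token are hypothetical names, the description is assumed to contain no double quote, and the confirmation button is assumed to be the submit button of the form inside the same div, as in the HTML from the question. The locator strategy is passed as the string "xpath", which is the value of By.XPATH.

```python
def token_row_xpath(description):
    """Build an XPath for the token <div> whose link text equals description."""
    # ASSUMPTION: description contains no double quote, because it is
    # embedded in a double-quoted XPath string literal.
    return (
        f'//a[normalize-space(text())="{description}"]'
        "/ancestor::div[@data-type='token']"
    )


def delete_token(driver, description):
    """Click Delete, then the confirmation button, inside one token row."""
    row = driver.find_element("xpath", token_row_xpath(description))
    # The leading "." makes both searches relative to row, not the whole page.
    row.find_element("xpath", ".//summary").click()
    row.find_element("xpath", ".//form//button[@type='submit']").click()
```

A call such as delete_token(driver, "somedescription") would then touch only the buttons inside the matching row. On the live page a wait may be needed between the two clicks until the dialog has opened.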
In the end, I was able to get the XPaths relative to another element of which I knew the XPath, by manually analysing what the XPath change pattern was. Still, a general method to find elements within an element would be appreciated.
Here is the verified script that deletes a GitHub personal access token if it already exists, based on the GitHub personal access token description:
from pprint import pprint
from typing import List
from code.project1.src.Website_controller import Website_controller
from code.project1.src.control_website import click_element_by_xpath, open_url, wait_until_page_is_loaded
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from code.project1.src.helper import scroll_shim
def remove_previous_github_pat(hardcoded, website_controller):
    """Assumes the user is logged in to GitHub. Then lists the already
    existing GitHub personal access token (PAT) descriptions. If the new GitHub
    PAT description is already existing, it deletes the existing GitHub PAT.
    Then it verifies the GitHub PAT is not yet in GitHub/is removed
    successfully."""
    # Check if the token exists, and if yes, get a link containing token id.
    github_pat_exists, link = github_pat_description_exists(hardcoded, website_controller)
    if github_pat_exists:
        # Delete the GitHub personal access token.
        delete_github_pat(link, hardcoded, website_controller)
    # Verify token is deleted.
    if github_pat_description_exists(hardcoded, website_controller)[0]:
        raise Exception("Error, GitHub pat is not deleted successfully.")
def github_pat_description_exists(hardcoded, website_controller):
    """Assumes the user is logged in into GitHub. Then lists the already
    existing GitHub personal access token (PAT) descriptions. If the new GitHub
    PAT description is already existing, it returns True, otherwise returns
    False. Also returns the url of the GitHub pat that contains the token id."""
    # Go to url containing GitHub pat.
    website_controller.driver = open_url(
        website_controller.driver,
        hardcoded.github_pat_tokens_url,
    )
    # Wait until url is loaded.
    wait_until_page_is_loaded(6, website_controller)
    # Get the token descriptions through the href element.
    elems = website_controller.driver.find_elements(By.CSS_SELECTOR, f".{hardcoded.github_pat_description_elem_classname} [href]")
    for elem in elems:
        link = elem.get_attribute('href')
        if hardcoded.github_pat_description in elem.text:
            return True, link
    return False, None
def delete_github_pat(link, hardcoded, website_controller):
    """Gets the GitHub pat id from the link, then clicks the delete button, and
    the confirm deletion button, to delete the GitHub pat."""
    if link[:len(hardcoded.github_pat_tokens_url)] == hardcoded.github_pat_tokens_url:
        github_pat_id = int(link[len(hardcoded.github_pat_tokens_url):])
        print(f'github_pat_id={github_pat_id}')
        # Get the right table row nr.
        valid_indices = list_of_valid_xpath_indices([], f"{hardcoded.github_pat_table_xpath}/div[", "]", website_controller)
        row_nr = get_desired_token_index(hardcoded, website_controller, valid_indices)
        # Click delete button and deletion confirmation button.
        click_github_pat_delete_button(hardcoded, website_controller, row_nr)
    else:
        raise Exception(f'{link[:len(hardcoded.github_pat_tokens_url)]} is not:{hardcoded.github_pat_tokens_url}')
def list_of_valid_xpath_indices(valid_indices, left, right, website_controller):
    """Returns the row numbers of the GitHub personal access tokens table,
    starting at index =1. Basically gets how many GitHub pats are stored."""
    if valid_indices == []:
        latest_index = 1
    else:
        latest_index = valid_indices[-1] + 1
    try:
        row = website_controller.driver.find_element(By.XPATH,
            f"{left}{latest_index}{right}"
        )
        if row is not None:
            print(row.text)
            valid_indices.append(latest_index)
            return list_of_valid_xpath_indices(valid_indices, left, right, website_controller)
        else:
            return valid_indices
    except Exception:
        # No row at this index: the previous index was the last row.
        if len(valid_indices) == 0:
            raise Exception("Did not find any valid indices.")
        return valid_indices
def get_desired_token_index(hardcoded, website_controller, valid_indices: List[int]):
    """Finds the index/row number of the GitHub pat's that corresponds to the
    description of the GitHub pat that is to be created, and returns this
    index."""
    for row_nr in valid_indices:
        row_elem = website_controller.driver.find_element(By.XPATH,
            f"{hardcoded.github_pat_table_xpath}/div[{row_nr}]"
        )
        if hardcoded.github_pat_description in row_elem.text:
            return row_nr
def click_github_pat_delete_button(hardcoded, website_controller, row_nr: int):
    """Clicks the delete GitHub pat button, and then clicks the confirm
    deletion button."""
    delete_button = website_controller.driver.find_element(By.XPATH,
        f"{hardcoded.github_pat_table_xpath}/div[{row_nr}]/div/div[1]/details/summary"
    )
    delete_button.click()
    confirm_deletion_button = website_controller.driver.find_element(By.XPATH,
        f"{hardcoded.github_pat_table_xpath}/div[{row_nr}]/div/div[1]/details/details-dialog/div[4]/form/button"
    )
    confirm_deletion_button.click()
I want to find and click the last product on a product page that isn't sold. I'm using XPath to click on the product, but I am having two issues:
Selecting, exclusively, an unsold product.
Selecting the last unsold product.
This is an example of the code:
<li class="product_container">
<a data-testid="product__item">...</a>
<div class="hover overlay">
<img>..</img>
</div>
</li>
<li class="product_container">
<a data-testid="product__item">...</a>
<div class="hover overlay">
<div data-testid="product__sold">Sold</div>
</div>
</li>
The first list tag is an unsold product and the second list tag is a sold product (A hover overlay stating "sold")
So far I can find the last loaded element that satisfies a[@data-testid="product__item"], but every attempt I've made to find an element that doesn't contain div[@data-testid='product__sold'] doesn't work.
I apologise in advance if my writing and terminology are off; this is the first script I've attempted.
Based on this XML:
<li class="product_container">
<a data-testid="product__item">...</a>
<div class="hover overlay">
<img>..</img>
</div>
</li>
<li class="product_container">
<a data-testid="product__item">...</a>
<div class="hover overlay">
<div data-testid="product__sold">Sold</div>
</div>
</li>
You need:
//li[not(descendant::div[@data-testid='product__sold'])][position()=last()]/a
The result is:
<a data-testid="product__item">...</a>
You can search for an element that doesn't have a sibling containing data-testid="product__sold":
(//a[last()][@data-testid="product__item"][not(following-sibling::div/div[@data-testid="product__sold"])])[last()]
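In Selenium, the "last" part can also be handled on the Python side. The sketch below is one possible way to do it, not the only one: click_last_unsold is a hypothetical helper, the XPath assumes the markup shown in the question, and "xpath" is the string value of By.XPATH.

```python
# Links of all products whose <li> contains no "sold" marker.
UNSOLD_LINKS_XPATH = (
    "//li[contains(@class, 'product_container')]"
    "[not(.//div[@data-testid='product__sold'])]"
    "/a[@data-testid='product__item']"
)


def click_last_unsold(driver):
    """Click the last unsold product link; return False if there is none."""
    links = driver.find_elements("xpath", UNSOLD_LINKS_XPATH)
    if not links:
        return False
    links[-1].click()  # matches come back in document order
    return True
```

Taking the last list item in Python sidesteps the question of what last() is counted relative to inside the XPath expression.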
I'm having some trouble trying to get a list in Python.
I'm using Selenium WebDriver, Chrome specifically, and I have the following "button":
<button id="btn" class="btn btn-default dropdown-toggle" type="button" data-toggle="dropdown" aria-expanded="false">Nope</button>
<ul id="ulDropdownNivel2" class="dropdown-menu">
<li>
text1
</li>
<li>
text2
</li>
<li>
text3
</li>
</ul>
I have tried to use the Select class of Selenium WebDriver, but this is a button, so the class can't be used there. I also tried using it on the <ul>, but Select can't be used there either.
I can't use something like:
dropdown = Select(driver.find_element_by_id('ID'))
for elm in dropdown.options:
    print(elm.text)
I tried to figure out a way to iterate through the items, but I didn't get anywhere.
I figured out a way to click by XPath, but it doesn't work for me, because text1, text2, text3 and so on change order every time you open the page.
Any ideas?
EDIT:
What I need here is to iterate through each item and select the one that matches "text1", "text2" or "text3".
I think you should click the button first so that the dropdown opens:
from selenium.webdriver.common.by import By

# Open the dropdown first so that its items are rendered.
driver.find_element(By.ID, "btn").click()
# Add a wait here if the items are not available immediately.
dropdown_elements = driver.find_elements(By.XPATH, "//ul[@id='ulDropdownNivel2']//li")
for elm in dropdown_elements:
    print(elm.text)
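To select an item by its text regardless of its position, as the EDIT in the question asks, the loop can compare each item's text and click the match. This is a minimal sketch assuming the markup from the question: click_dropdown_item is a hypothetical helper, and "id" and "xpath" are the string values of By.ID and By.XPATH.

```python
def click_dropdown_item(driver, wanted_text):
    """Open the dropdown and click the item whose text equals wanted_text."""
    # Open the menu first: Selenium reports empty text for hidden elements.
    driver.find_element("id", "btn").click()
    items = driver.find_elements("xpath", "//ul[@id='ulDropdownNivel2']/li")
    for item in items:
        if item.text.strip() == wanted_text:
            item.click()
            return True
    return False  # no item with that text was found
```

For example, click_dropdown_item(driver, "text2") would click the second entry above wherever it appears in the list.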
I am trying to use selenium to grab text data from a page.
Printing the html attributes:
element = driver.find_element_by_id("divresults")
Results:
print(element.get_attribute('innerHTML'))
<div id="divDesktopResults"> </div>
Results:
print(element.get_attribute('outerHTML'))
<div id="divresults" data-bind="html:resultsContent"><div id="divDesktopResults"> </div></div>
Tried grabbing this element
Results:
driver.find_element_by_css_selector("span[class='glyphicon glyphicon-tasks']")
Message: no such element: Unable to locate element: {"method":"css selector","selector":"span[class='glyphicon glyphicon-tasks']"}
This is the code copied from the browser. There is much more beneath 'divresults' that did not show up in the innerHTML printout.
<div id="divresults" data-bind="html:resultsContent">
<div>
<div class="row" style="font-size:8pt;">
<a data-toggle="tooltip" style="text-decoration:underline" href="#pdfviewer?ID=D218101736">
<strong>D218101736 </strong>
<span class="glyphicon glyphicon-new-window"></span>
</a>
<div class="btn-group" style="font-size:8pt;margin-left:10px;" id="btnD218101736">
<span style="display:none;font-size:8pt;" id="lblD218101736"> Added To Cart</span>
<button type="button" style="font-size:8pt;" class="btn btn-primary dropdown-toggle" data-toggle="dropdown"> Add To Cart
<span class="caret"></span>
</button>
<ul class="dropdown-menu" role="menu">
<li> <strong>Regular ($7.00)</strong> </li>
<li> <strong>Certified ($12.00)</strong> </li>
</ul>
</div>
</div> <br>
<ul class="nav nav-tabs compact">
<li class="active">
<a data-toggle="tab" href="#D218101736_Doc">
<span class="glyphicon glyphicon-file"></span>
<span>Doc Info</span>
</a>
</li>
<li class="hidden-xs">
<a data-toggle="tab" href="#D218101736_Thumbnail">
<span class="glyphicon glyphicon-th-large"></span>
<span>Thumbnail</span>
</a>
</li>
....
How do I get the data beneath divresults in this instance?
My guess is that it's one of two things:
1. There is more than one element that matches that locator. To investigate this, run $$("#divresults") in the dev console and make sure that it returns exactly one element. If it returns more than one, run $$("#divresults")[0] and check whether the element returned is the one you want. If it is, go on to step 2. If it isn't, you will need to find a locator that is more specific. If you want our help, you will need to provide a link to the page or more of the HTML surrounding the desired element.
2. You need to add a wait so that the contents of the element can finish loading. You could wait for a locator like #divresults strong, or any other locator that finds some of the elements that were missing. You would wait for them to be visible (or at least present). See the docs for more info and options.
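The waiting idea from the second point can be illustrated with a small polling loop. Note that this is a hand-rolled stand-in written for illustration; Selenium ships WebDriverWait together with expected_conditions for this purpose, and that is normally what you would use. wait_for_elements is a hypothetical helper, the timeout values are arbitrary, and "css selector" is the string value of By.CSS_SELECTOR.

```python
import time


def wait_for_elements(driver, css_selector, timeout=10.0, poll=0.5):
    """Poll until css_selector matches at least one element, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        found = driver.find_elements("css selector", css_selector)
        if found:
            return found  # the content has been rendered
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"No element matched {css_selector!r} within {timeout}s"
            )
        time.sleep(poll)  # give the page time to fill in the container
```

With this in place, wait_for_elements(driver, "#divresults strong") would return only after the data-bound content has been inserted into divresults, after which reading innerHTML should show the full markup.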