Web scraping ignore "Next" or ">" when hidden (Selenium, Python) - python

I am using Selenium for Python to scrape a site with multiple pages. To get to the next page, I use driver.find_element(By.XPATH, xpath). However, the XPath text changes, so I want to use other attributes instead.
I tried to find the link by class, using "page-link": driver.find_element(By.CLASS_NAME, "page-link"). However, the "page-link" class is also present in the disabled list item. As a result, the Selenium driver won't stop after the last page, in this case page 2.
I want to stop the driver from clicking the disabled item on the page, i.e. I want it to ignore the last item in the list, the one with "page-item disabled", aria-disabled="true" and aria-hidden="true". The idea is that if the script can't find an enabled item, it will end a while loop that relies on the ">" button being enabled.
See the source code below.
Please advise.
<nav>
<ul class="pagination">
<li class="page-item">
<a class="page-link" href="https://www.blucap.net/app/FlightsReport?fromdate=2023-02-01&todate=2023-02-28&filterByMemberId=&view=View%20Report&page=1" rel="prev" aria-label="« Previous">‹</a>
</li>
<li class="page-item">
<a class="page-link" href="https://www.blucap.net/app/FlightsReport?fromdate=2023-02-01&todate=2023-02-28&filterByMemberId=&view=View%20Report&page=1">1</a>
</li>
<li class="page-item active" aria-current="page">
<span class="page-link">2</span>
</li>
<li class="page-item disabled" aria-disabled="true" aria-label="Next »">
<span class="page-link" aria-hidden="true">›</span>
</li>
</ul>
</nav>

To go to the next page, there are a couple of approaches.
You can use find_element() to locate the <li> whose aria-label starts with "Next" and which does not have aria-disabled="true", and then click its child <span>:
driver.find_element(By.XPATH, "//li[starts-with(@aria-label, 'Next') and not(@aria-disabled='true')]/span").click()
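The loop the question describes can be sketched as follows. This is only a sketch under an assumption the question does not confirm: that an enabled Next control is rendered like the enabled Previous link in the HTML above, i.e. as an <a class="page-link"> with rel="next" rather than as a <span> inside the <li>. The names scrape_all_pages, scrape_page and max_pages are hypothetical. The locator strategy is passed as the plain string "xpath", which is the value of By.XPATH, so the sketch needs no imports.

```python
# Assumption: an enabled "Next" control looks like the enabled "Previous" link
# in the question's HTML, i.e. <a class="page-link" rel="next" ...>.
NEXT_LINK_XPATH = "//ul[contains(@class, 'pagination')]//a[@rel='next']"


def scrape_all_pages(driver, scrape_page, max_pages=1000):
    """Call scrape_page(driver) on each page until no enabled Next link exists."""
    results = []
    for _ in range(max_pages):  # hard cap guards against an endless loop
        results.append(scrape_page(driver))
        # find_elements returns an empty list instead of raising when nothing
        # matches; "xpath" is the string value of selenium's By.XPATH.
        next_links = driver.find_elements("xpath", NEXT_LINK_XPATH)
        if not next_links:
            break  # last page: Next is only present as a disabled <span>
        next_links[0].click()
    return results
```

Because the disabled item contains a <span> and no <a>, the locator simply finds nothing on the last page and the loop ends without any exception handling. If the real markup differs, only NEXT_LINK_XPATH should need to change.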

Related

Selecting Toggle Button with only Classes to identify the toggle button with Selenium & Python

I am attempting to toggle a button on an existing HTML page that is online. I changed some of the file names so as not to give away the website I am working on. I know there is a way to do this if the dropdown-toggle has a title or id. However, in this HTML there are only classes. Is there a way to click on the toggle button in an HTML page with this structure?
<li class="dropdown">
<b>Name</b><strong class="caret"></strong>
<ul class="dropdown-menu">
<li>
File1
</li>
<li>
File2s
</li>
<li>
File3
</li>
<li role="presentation" class="divider"></li>
<li>
Add
</li>
<li>
search
</li>
</ul>
</li>
Try this:
from selenium.webdriver.common.by import By
button = driver.find_element(By.CSS_SELECTOR, "a[class='dropdown-toggle']")
# Or, if that does not work:
button = driver.find_element(By.CSS_SELECTOR, "a[data-toggle='dropdown']")
I think you are trying to select the <a> with the class name dropdown-toggle.

How to click on li element of a dropdown list using Selenium in Python

I'm trying to select the li element "US" in this dropdown list of the following website: https://proxyscrape.com/free-proxy-list
Here is the Python code I have, but it does not work:
driver.find_element_by_xpath('/html/body/main/div/div[1]/div[3]/div/div[1]/div[2]').click()
time.sleep(4)
driver.find_element_by_css_selector("#list httpcountry countryselect [value='US']")
And here is the HTML I'm working with:
<div class="nice-select selectortypes open" tabindex="0">
<span>
Country: <span class="current">all</span>
</span>
<ul class="list httpcountry countryselect">
<li data-value="all" class="option">all</li>
<li data-value="US" class="option">US</li>
<li data-value="ES" class="option">ES</li>
<li data-value="RU" class="option">RU</li>
<li data-value="PL" class="option">PL</li>
<li data-value="BD" class="option">BD</li>
<li data-value="IR" class="option">IR</li>
<li data-value="FR" class="option">FR</li>
<li data-value="CN" class="option">CN</li>
<li data-value="CA" class="option">CA</li>
<li data-value="PK" class="option">PK</li>
<li data-value="IN" class="option">IN</li>
<li data-value="ID" class="option">ID</li>
<li data-value="BR" class="option">BR</li>
<li data-value="DE" class="option">DE</li>
<li data-value="GB" class="option">GB</li>
<li data-value="TH" class="option">TH</li>
<li data-value="SG" class="option">SG</li>
<li data-value="EG" class="option">EG</li>
<li data-value="UA" class="option">UA</li>
</ul>
</div>
Any clue on how to select this element?
Solution:
You need to wait before clicking on the country dropdown list, because a top banner appears, the webdriver loses focus, and the dropdown closes.
Here is the code I wrote; the script passed after adding sleeps in these two places:
from time import sleep
from selenium.webdriver.common.by import By

driver.get('https://proxyscrape.com/free-proxy-list')
# The clickable dropdown is the parent <div> of the country <ul>.
country_list = driver.find_element(By.CSS_SELECTOR, '.list.socks4country.countryselect').find_element(By.XPATH, './..')
sleep(2)  # let the top banner appear so it does not steal focus
country_list.click()
sleep(1)  # give the dropdown time to open
country_list.find_element(By.CSS_SELECTOR, '[data-value="US"]').click()
us = driver.find_element(By.CSS_SELECTOR, '[class="list socks4country countryselect"] [data-value="US"]')
assert us.get_attribute('class') == 'option selected'
If you look at what happens when you select the US option, you can see that it changes the parameters in the request here:
From:
Download
To:
Download
So you probably don't need to click the US option at all; you can send the request with the appropriate parameters instead.
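As a rough illustration of that idea, the sketch below rewrites one query parameter of a URL using only the standard library. The actual download URLs were not preserved above, so the URL and the parameter names (type, country) are hypothetical placeholders, as is the helper name with_query_param.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, name, value):
    """Return url with the query parameter `name` set to `value`."""
    parts = urlsplit(url)
    # keep_blank_values preserves parameters such as "filter=" with no value
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[name] = value
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL and parameter names, for illustration only.
all_countries = "https://example.com/proxies?type=socks4&country=all"
us_only = with_query_param(all_countries, "country", "US")
print(us_only)
```

The resulting URL could then be fetched directly, without driving the dropdown through the browser at all.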

Using Selenium Webdriver, grabbing data not showing up in innerhtml

I am trying to use selenium to grab text data from a page.
Printing the HTML attributes:
element = driver.find_element_by_id("divresults")
print(element.get_attribute('innerHTML'))
Results:
<div id="divDesktopResults"> </div>
print(element.get_attribute('outerHTML'))
Results:
<div id="divresults" data-bind="html:resultsContent"><div id="divDesktopResults"> </div></div>
Tried grabbing this element:
driver.find_element_by_css_selector("span[class='glyphicon glyphicon-tasks']")
Results:
Message: no such element: Unable to locate element: {"method":"css selector","selector":"span[class='glyphicon glyphicon-tasks']"}
This is the code when copied from the Browser. There is much more below 'divresults' that did not show up in the innerhtml printout
<div id="divresults" data-bind="html:resultsContent">
<div>
<div class="row" style="font-size:8pt;">
<a data-toggle="tooltip" style="text-decoration:underline" href="#pdfviewer?ID=D218101736">
<strong>D218101736 </strong>
<span class="glyphicon glyphicon-new-window"></span>
</a>
<div class="btn-group" style="font-size:8pt;margin-left:10px;" id="btnD218101736">
<span style="display:none;font-size:8pt;" id="lblD218101736"> Added To Cart</span>
<button type="button" style="font-size:8pt;" class="btn btn-primary dropdown-toggle" data-toggle="dropdown"> Add To Cart
<span class="caret"></span>
</button>
<ul class="dropdown-menu" role="menu">
<li> <strong>Regular ($7.00)</strong> </li>
<li> <strong>Certified ($12.00)</strong> </li>
</ul>
</div>
</div> <br>
<ul class="nav nav-tabs compact">
<li class="active">
<a data-toggle="tab" href="#D218101736_Doc">
<span class="glyphicon glyphicon-file"></span>
<span>Doc Info</span>
</a>
</li>
<li class="hidden-xs">
<a data-toggle="tab" href="#D218101736_Thumbnail">
<span class="glyphicon glyphicon-th-large"></span>
<span>Thumbnail</span>
</a>
</li>
....
How do I get the data beneath divresults in this instance?
My guess is that it's one of two things:
1. There is more than one element that matches that locator. To investigate this, try using $$("#divresults") in the dev console and make sure that it returns 1. If it returns more than one, run $$("#divresults")[0] and make sure the element returned is the one you want. If it is, go on to step 2. If it isn't, you will need to find a locator that is more specific. If you want our help, you will need to provide a link to the page or more of the surrounding HTML to the desired element.
2. You need to add a wait so that the contents of the element can finish loading. You could wait for a locator like #divresults strong or any number of locators to find some of the elements that were missing. You would wait for them to be visible (or at least present). See the docs for more info and options.
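The wait in the second point can be expressed as a small custom condition. WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value once the wait should end, so a plain function is enough. The function name results_loaded is hypothetical, and the locator is the "#divresults strong" example suggested above; "css selector" is the string value of By.CSS_SELECTOR, which keeps the sketch free of imports.

```python
def results_loaded(driver):
    """Wait condition: truthy once #divresults contains rendered content.

    Returns the matching elements when at least one exists, otherwise False,
    which tells WebDriverWait to keep polling.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    found = driver.find_elements("css selector", "#divresults strong")
    return found if found else False
```

It would be used as WebDriverWait(driver, 10).until(results_loaded), after which element.get_attribute('innerHTML') should include the loaded results, assuming the content is not inside an iframe.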

I want to click on the <li> item in the left navigation of a webpage, using Python and Selenium webdriver to locate it

I want to click on MAKE UP in the left navigation. Please find attached the image and link for the webpage.
Image for the Webpage
Link for the Webpage
I am currently using the code below to click on the item, but I am not getting any result. I am able to access the elements by class name ('has-sub'). I can even print them, but I can't click them.
obc = driver.find_elements_by_class_name('has-sub')
for ea in obc:
    if ea.text == "Makeup":
        ea.click()
For more information, below is the HTML code for the webpage:
<li class="has-sub" style="height: 38px;">
Makeup
<ul class="submenu" style="top: 0px;">
<li>
<a id="SBN_facet_Face" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/face" escapexml="false">Face </a>
</li>
<li>
<a id="SBN_facet_Lips" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/lips" escapexml="false">Lips </a>
</li>
<li>
<a id="SBN_facet_Eyes" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/eyes" escapexml="false">Eyes </a>
</li>
<li>
<a id="SBN_facet_Nails" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/nails" escapexml="false">Nails </a>
</li>
<li>
<a id="SBN_facet_Brushes & Tools" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/beauty-brushes-accessories" escapexml="false">Brushes & Tools </a>
</li>
<li>
<a id="SBN_facet_Makeup" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/beauty-makeup" escapexml="false">All Makeup </a>
</li>
</ul>
</li>
Any help will be appreciated.
I am able to click using the code below.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
elements = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//li[@class='has-sub']")))
for element in elements:
    # find_elements returns an empty (falsy) list when the link is absent
    if element.find_elements(By.LINK_TEXT, "Makeup"):
        element.click()
        break
innerElements = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//li[@class='has-sub open']/ul/li")))
for innerElement in innerElements:
    if innerElement.text == "Face":
        innerElement.click()
        break
Hope this will help you.
The problem here is that you are trying to click on the <li> element, while the text is in the <a> under that element. So what you need to do is:
obc = driver.find_elements(By.XPATH, "//li[@class='has-sub']/a[contains(text(), 'Makeup')]")
I tested the XPath on your webpage and it worked.
As per the HTML you have provided, to click on MAKE UP in the left navigation pane, you can use the following code block:
obc = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='aside all-open']/ul//li[@class='has-sub']/a")))
for ea in obc:
    if 'Makeup' in ea.get_attribute("innerHTML"):
        ea.click()
        break

Selenium Python Web Element not visible

I have a submenu item that I used to be able to access via Selenium web navigation. Now I keep getting the following error: "You may only interact with visible elements". I have tried a number of recommendations (implicit and explicit waits, maximizing the window, using the ActionChains object) without success. Can anyone spot why this element would remain invisible by looking at the following HTML and code?
<ul class="nav" >
<li>
EDC
</li>
</ul>
<ul class="nav" >
<li>
Queries
</li>
</ul>
<ul class="nav" >
<li>
Docs
</li>
</ul>
<ul class="nav" >
<li>
Data
</li>
</ul>
<ul class="nav" >
<li>
Audit Log
</li>
</ul>
<ul class="nav" >
<li>
Reports
</li>
</ul>
<ul class="nav" >
<li class="dropdown ">
Tools <b class="caret"></b>
<ul class="dropdown-menu">
<li>
SQL Worksheet
</li>
<li>
Meddra
</li>
<li>
SAE
</li>
<li>
Worksheets
</li>
<li>
Pipelines
</li>
<li>
Sync
</li>
<li>
Project Management
</li>
<li>
RSS
</li>
<li>
IPT
</li>
<li>
Images
</li>
</ul>
</li>
</ul>
And here is the Python code snippet that is not working:
try:
    menu_item = driver.find_element(By.LINK_TEXT, 'Tools')
    actions = ActionChains(driver)
    actions.click(menu_item).perform()
except Exception as error:
    print("Tools menu not found: " + str(error))
try:
    wait = WebDriverWait(driver, 10)
    wait.until(EC.presence_of_element_located((By.XPATH, "/html/body/header/div/div/div/div/ul[7]/li/ul/li[9]/a")))
    ipt_menu_item = driver.find_element(By.XPATH, "/html/body/header/div/div/div/div/ul[7]/li/ul/li[9]/a")
    actions.click(ipt_menu_item).perform()
except Exception as error:
    print("Tools | IPT link not found: " + str(error))
I have dealt with this same issue a few times. I have found that most of the time you can move to the element first and then issue the .click() command:
Element = driver.find_element(By.LINK_TEXT, 'link')
actions = ActionChains(driver)
# try this (the action only runs once .perform() is called)
actions.move_to_element(Element).perform()
# or this
driver.execute_script("return arguments[0].scrollIntoView();", Element)
Element.click()
EDIT:
Or, a third option in case the top two do not work: if you can get the element with Selenium and the element is in view, but you just can't interact with it, then it is probably inside a <div> that is not visible. Try this click instead of your normal .click():
driver.execute_script("arguments[0].click()", Element)
If that does not work, you may need to try interacting with the attributes to get the element in a state of visibility for selenium before you make the click, such as:
driver.execute_script("arguments[0].style.display = 'block'", Element)
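The options above can be combined into one helper. This is only a sketch: the name click_even_if_hidden and its return values are hypothetical, and it relies on just three calls, driver.execute_script(), element.is_displayed() and element.click(), so it needs no imports.

```python
def click_even_if_hidden(driver, element):
    """Scroll the element into view, then click it; fall back to a JS click.

    Returns "native" or "js" so the caller can log which path was taken.
    """
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    if element.is_displayed():
        element.click()  # normal WebDriver click
        return "native"
    # Selenium refuses to click elements it considers invisible; a click
    # dispatched from JavaScript bypasses that visibility check.
    driver.execute_script("arguments[0].click();", element)
    return "js"
```

A JavaScript click does something a real user could not do with a hidden element, so it is best kept as a last resort. For a dropdown like the Tools menu in the question, opening the parent menu first and waiting for the submenu item to become visible is usually the cleaner route.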
