Alternative to time.sleep() in selenium using python while web scraping?


I need to scrape the prices of certain listed food items for different locations in the country. There's a text input box that lets me enter the name of a city, and pressing "Enter" shows me the list of items available in that city.
Here's how I am trying to automate this:
driver.get("https://grofers.com/")
ele = driver.find_element_by_xpath("//input[@data-test-id='area-input-box']")
ele.send_keys(area)
ele.send_keys(Keys.RETURN)
Here's the HTML I'm working with:
<div style="margin-left: 51px; height: 36px;">
<div style="display: flex; height: 100%;">
<button class="btn location-box mask-button">Detect my location</button>
<div class="oval-container">
<div class="oval">
<span class="separator-text">
<div class="or">OR</div>
</span>
</div>
</div>
<div style="width: 220px;">
<div class="modal-right__input-wrapper">
<div class="display--table full-width">
<div class="display--table-cell full-width">
<div id="map-canvas"></div>
<div class="Select location-search-input-v1 is-searchable Select--single">
<div class="Select-control">
<div class="Select-multi-value-wrapper" id="react-select-2--value">
<div class="Select-placeholder">Type your city Society/Colony/Area</div>
<div class="Select-input" style="display: inline-block;"><input data-test-id="area-input-box" aria-activedescendant="react-select-2--value" aria-expanded="false" aria-haspopup="false" aria-owns="" role="combobox" value=""></div>
</div>
<span class="Select-arrow-zone"><span class="Select-arrow"></span></span>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
The problem is that after send_keys, the website takes time to autofill the input box, and only AFTER that do I need to press Enter.
I tried using time.sleep(2) after send_keys, but this leads to the pop-up disappearing and a StaleElementReferenceException when I send Keys.RETURN.
I have been stuck on this for quite some time now. Any help or pointers would be appreciated.

Selenium's documentation has an article on explicit and implicit waits; I think an explicit wait is what you're looking for:
https://selenium-python.readthedocs.io/waits.html
The relevant part of that article:

You can also create custom wait conditions when none of the previous convenience methods fit your requirements. A custom wait condition can be created using a class with a __call__ method which returns False when the condition doesn't match.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
class element_has_css_class(object):
    """An expectation for checking that an element has a particular css class.
    locator - used to find the element
    returns the WebElement once it has the particular css class
    """
    def __init__(self, locator, css_class):
        self.locator = locator
        self.css_class = css_class
    def __call__(self, driver):
        element = driver.find_element(*self.locator)  # Finding the referenced element
        if self.css_class in element.get_attribute("class"):
            return element
        else:
            return False
# Wait until an element with id='myNewInput' has class 'myCSSClass'
wait = WebDriverWait(driver, 10)
element = wait.until(element_has_css_class((By.ID, 'myNewInput'), "myCSSClass"))
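To make the difference from time.sleep() concrete, here is a minimal pure-Python sketch of the polling loop that an explicit wait performs. The names wait_until and WaitTimeout and their parameters are hypothetical and are not part of Selenium; the sketch only illustrates the idea of re-checking a condition until it holds or a deadline passes.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=()):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions listed in the `ignored` tuple are swallowed and treated as
    "not ready yet"; any other exception propagates immediately.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)
```

In Selenium itself the same role is played by WebDriverWait(driver, 10).until(condition), where the condition receives the driver as its argument. For the autofill case in the question, a condition that locates the input box again on every call, instead of reusing the old ele reference, is one way to avoid holding a stale element; WebDriverWait also accepts an ignored_exceptions argument for exceptions that should count as "not ready yet".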

Related

Finding an element within an element without knowing the xpath?

Context
I am trying to click the delete button belonging to a GitHub personal access token (PAT) with a certain description, using Selenium in Python. I am able to find the description and the ID of the PAT. However, the button itself does not contain any reference to the ID; only the form that is spawned after clicking the button contains that reference. So, to find out how to click the right button, I thought I would be able to find the button within the <div id="access-token-836771760" class="access-token js-revoke-item ".. element. However, most solutions that search for elements within elements require one to know the XPath of the parent. I do not know the XPath of the parent element, because I find this element based on the token description. Apparently it is not practical to get the XPath of an element once you have the element in Selenium.
HTML Code
<div class="listgroup">
<div id="access-token-836771760" class="access-token js-revoke-item " data-id="836771760" data-type="token">
<div class="listgroup-item">
<div class="d-flex float-right">
<details class="ml-2 details-reset details-overlay details-overlay-dark">
<summary data-view-component="true" class="btn-danger btn-sm btn" role="button"> Delete
</summary>
<details-dialog class="anim-fade-in fast Box Box--overlay d-flex flex-column" role="dialog"
aria-modal="true">
<div class="Box-header">
<button class="Box-btn-octicon btn-octicon float-right" type="button"
aria-label="Close dialog" data-close-dialog="">
<svg aria-hidden="true" height="16" viewBox="0 0 16 16" version="1.1" width="16"
data-view-component="true" class="octicon octicon-x">
<path fill-rule="evenodd"
d="M3.72 3.72a.75.75 0 011.06 0L8 6.94l3.22-3.22a.75.75 0 111.06 1.06L9.06 8l3.22 3.22a.75.75 0 11-1.06 1.06L8 9.06l-3.22 3.22a.75.75 0 01-1.06-1.06L6.94 8 3.72 4.78a.75.75 0 010-1.06z">
</path>
</svg>
</button>
<h3 class="Box-title">Are you sure you want to delete this token?</h3>
</div>
<div data-view-component="true" class="flash flash-warn flash-full">
Any applications or scripts using this token will no longer be able to access the GitHub
API. You cannot undo this action.
</div>
<div class="Box-body overflow-auto">
</div>
<div class="Box-footer">
<!-- '"` -->
<!-- </textarea></xmp> -->
<form class="js-revoke-access-form" data-id="836771760" data-type-name="token"
data-turbo="false" action="/settings/tokens/836771760" accept-charset="UTF-8"
method="post" style=""><input type="hidden" name="_method" value="delete"
autocomplete="off"><input type="hidden" name="authenticity_token"
value="somevalue">
<button type="submit" data-view-component="true" class="btn-danger btn btn-block"> I
understand, delete this token
</button>
</form>
</div>
</details-dialog>
</details>
</div>
<small class="last-used float-right">Last used within the last 6 months</small>
<span class="token-description">
<strong>
<a href="/settings/tokens/836771760" data-pjax="">
Set GitHub commit build status values.</a>
</strong>
<span class="color-fg-muted">
<em>— <span title="Access commit status">repo:status</span></em>
</span>
</span>
<div>
<span class="color-fg-attention">
<a class="color-fg-attention" href="/settings/tokens/836771760/regenerate?index_page=1">
Expired <span class="text-semibold text-italic">on Mon, May 2 2022</span>.
</a> </span>
</div>
</div>
</div>
<div id="access-token-826562783" class="access-token js-revoke-item " data-id="826562783" data-type="token">
<div class="listgroup-item">
<div class="d-flex float-right">
<details class="ml-2 details-reset details-overlay details-overlay-dark">
<summary data-view-component="true" class="btn-danger btn-sm btn" role="button"> Delete
</summary>
<details-dialog class="anim-fade-in fast Box Box--overlay d-flex flex-column" role="dialog"
aria-modal="true">
<div class="Box-header">
<button class="Box-btn-octicon btn-octicon float-right" type="button"
aria-label="Close dialog" data-close-dialog="">
<svg aria-hidden="true" height="16" viewBox="0 0 16 16" version="1.1" width="16"
data-view-component="true" class="octicon octicon-x">
<path fill-rule="evenodd"
d="M3.72 3.72a.75.75 0 011.06 0L8 6.94l3.22-3.22a.75.75 0 111.06 1.06L9.06 8l3.22 3.22a.75.75 0 11-1.06 1.06L8 9.06l-3.22 3.22a.75.75 0 01-1.06-1.06L6.94 8 3.72 4.78a.75.75 0 010-1.06z">
</path>
</svg>
</button>
<h3 class="Box-title">Are you sure you want to delete this token?</h3>
</div>
<div data-view-component="true" class="flash flash-warn flash-full">
Any applications or scripts using this token will no longer be able to access the GitHub
API. You cannot undo this action.
</div>
<div class="Box-body overflow-auto">
</div>
<div class="Box-footer">
<!-- '"` -->
<!-- </textarea></xmp> -->
<form class="js-revoke-access-form" data-id="826562783" data-type-name="token"
data-turbo="false" action="/settings/tokens/826562783" accept-charset="UTF-8"
method="post"><input type="hidden" name="_method" value="delete"
autocomplete="off"><input type="hidden" name="authenticity_token"
value="someothervalue">
<button type="submit" data-view-component="true" class="btn-danger btn btn-block"> I
understand, delete this token
</button>
</form>
</div>
</details-dialog>
</details>
</div>
<small class="last-used float-right">Last used within the last 6 months</small>
<span class="token-description">
<strong>
<a href="/settings/tokens/82653355" data-pjax="">
somedescription</a>
</strong>
<span class="color-fg-muted">
<em>— <span title="something">repo</span></em>
</span>
</span>
<div>
<span class="color-fg-attention">
<a class="color-fg-attention" href="/settings/tokens/826562783/regenerate?index_page=1">
Expired <span class="text-semibold text-italic">on Thu, May 19 2022</span>.
</a> </span>
</div>
</div>
</div>
</div>
Question
How could I click the delete button belonging to the element with the id access-token-836771760 in Python using Selenium?
Approach
I can find the delete buttons with:
def print_attributes_of_elements(elements, website_controller):
    for elem in elements:
        attrs = website_controller.driver.execute_script('var items = {}; for (index = 0; index < arguments[0].attributes.length; ++index) { items[arguments[0].attributes[index].name] = arguments[0].attributes[index].value }; return items;', elem)
        pprint(attrs)
danger_button = website_controller.driver.find_elements(By.CSS_SELECTOR, '.btn-danger.btn-sm.btn')
print_attributes_of_elements(danger_button, website_controller)
However, within those buttons, I do not know which button is the right one.

If you already have the <div id="access-token-836771760" class="access-token js-revoke-item ".. element, it should be as easy as this:
# get div by description (you already have your div)
div = driver.find_element(By.XPATH, "//a[normalize-space(text())='Test']/ancestor::div[@data-type='token']")
# click delete button
button = div.find_element(By.XPATH, ".//summary")
button.click()
You don't need to know the XPath of the div if you already have a reference to it.
Edit:
I am already using a method to find an element within an element here.
You just need to call WebElement.find_element(By.XPATH, ".//tag").
Have a look at the XPath syntax.
First, the . selects the current node (the WebElement). The // then selects matching nodes anywhere below the current node, not in the whole document. I think that is exactly what you want.
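The two steps above can be wrapped in small helpers. This is only a sketch under assumptions: find_token_row and click_delete_button are hypothetical names, the locator strategy is passed as the plain string "xpath" (the value Selenium exposes as By.XPATH) so the snippet does not depend on a particular Selenium version, and the description is assumed not to contain a single quote.

```python
XPATH = "xpath"  # the same string Selenium exposes as By.XPATH


def find_token_row(driver, description):
    """Return the token <div> whose link text equals `description`.

    Assumes `description` contains no single quotes.
    """
    xpath = (
        "//div[@data-type='token']"
        f"[.//a[normalize-space()='{description}']]"
    )
    return driver.find_element(XPATH, xpath)


def click_delete_button(row):
    """Click the Delete <summary> found beneath `row`."""
    # The leading "." anchors the search at `row`, not at the document root.
    button = row.find_element(XPATH, ".//summary")
    button.click()
    return button
```

Usage would look like row = find_token_row(driver, "somedescription") followed by click_delete_button(row). The predicate form //div[...][.//a[...]] selects the row directly, which is an alternative to walking up from the link with ancestor::.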

In the end, I was able to get the xpaths relative to another element of which I knew the xpath, by manually analysing what the xpath change pattern was. Still a general method to find elements within an element, would be appreciated.
Here is the verified script that deletes a GitHub personal access token if it already exists, based on the GitHub personal access token description:
from pprint import pprint
from typing import List
from code.project1.src.Website_controller import Website_controller
from code.project1.src.control_website import click_element_by_xpath, open_url, wait_until_page_is_loaded
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from code.project1.src.helper import scroll_shim
def remove_previous_github_pat(hardcoded, website_controller):
    """Assumes the user is logged in into GitHub. Then lists the already
    existing GitHub personal access token (PAT) descriptions. If the new GitHub
    PAT description is already existing, it deletes the existing GitHub PAT.
    Then it verifies the GitHub PAT is not yet in GitHub/is removed
    successfully."""
    # Check if the token exists, and if yes, get a link containing token id.
    github_pat_exists, link = github_pat_description_exists(hardcoded, website_controller)
    if github_pat_exists:
        # Delete the GitHub personal access token.
        delete_github_pat(link, hardcoded, website_controller)
    # Verify token is deleted.
    if github_pat_description_exists(hardcoded, website_controller)[0]:
        raise Exception("Error, GitHub pat is not deleted successfully.")
def github_pat_description_exists(hardcoded, website_controller):
    """Assumes the user is logged in into GitHub. Then lists the already
    existing GitHub personal access token (PAT) descriptions. If the new GitHub
    PAT description is already existing, it returns True, otherwise returns
    False. Also returns the url of the GitHub pat that contains the token id."""
    # Go to url containing GitHub pat.
    website_controller.driver = open_url(
        website_controller.driver,
        hardcoded.github_pat_tokens_url,
    )
    # Wait until url is loaded.
    wait_until_page_is_loaded(6, website_controller)
    # Get the token descriptions through the href element.
    elems = website_controller.driver.find_elements(By.CSS_SELECTOR, f".{hardcoded.github_pat_description_elem_classname} [href]")
    for elem in elems:
        link = elem.get_attribute('href')
        if hardcoded.github_pat_description in elem.text:
            return True, link
    return False, None
def delete_github_pat(link, hardcoded, website_controller):
    """Gets the GitHub pat id from the link, then clicks the delete button, and
    the confirm deletion button, to delete the GitHub pat."""
    if link[:len(hardcoded.github_pat_tokens_url)] == hardcoded.github_pat_tokens_url:
        github_pat_id = int(link[len(hardcoded.github_pat_tokens_url):])
        print(f'github_pat_id={github_pat_id}')
        # Get the right table row nr.
        valid_indices = list_of_valid_xpath_indices([], f"{hardcoded.github_pat_table_xpath}/div[", "]", website_controller)
        row_nr = get_desired_token_index(hardcoded, website_controller, valid_indices)
        # Click delete button and deletion confirmation button.
        click_github_pat_delete_button(hardcoded, website_controller, row_nr)
    else:
        raise Exception(f'{link[:len(hardcoded.github_pat_tokens_url)]} is not:{hardcoded.github_pat_tokens_url}')
def list_of_valid_xpath_indices(valid_indices, left, right, website_controller):
    """Returns the row numbers of the GitHub personal access tokens table,
    starting at index =1. Basically gets how many GitHub pats are stored."""
    if valid_indices == []:
        latest_index = 1
    else:
        latest_index = valid_indices[-1] + 1
    try:
        row = website_controller.driver.find_element(By.XPATH,
            f"{left}{latest_index}{right}"
        )
        if row is not None:
            print(row.text)
            valid_indices.append(latest_index)
            return list_of_valid_xpath_indices(valid_indices, left, right, website_controller)
        else:
            return valid_indices
    except:
        if len(valid_indices) == 0:
            raise Exception("Did not find any valid indices.")
        return valid_indices
def get_desired_token_index(hardcoded, website_controller, valid_indices: List[int]):
    """Finds the index/row number of the GitHub pat's that corresponds to the
    description of the GitHub pat that is to be created, and returns this
    index."""
    for row_nr in valid_indices:
        row_elem = website_controller.driver.find_element(By.XPATH,
            f"{hardcoded.github_pat_table_xpath}/div[{row_nr}]"
        )
        if hardcoded.github_pat_description in row_elem.text:
            return row_nr
def click_github_pat_delete_button(hardcoded, website_controller, row_nr: int):
    """Clicks the delete GitHub pat button, and then clicks the confirm
    deletion button."""
    delete_button = website_controller.driver.find_element(By.XPATH,
        f"{hardcoded.github_pat_table_xpath}/div[{row_nr}]/div/div[1]/details/summary"
    )
    delete_button.click()
    confirm_deletion_button = website_controller.driver.find_element(By.XPATH,
        f"{hardcoded.github_pat_table_xpath}/div[{row_nr}]/div/div[1]/details/details-dialog/div[4]/form/button"
    )
    confirm_deletion_button.click()
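As a possible more general alternative to counting row indices, the row can be picked by its text and the buttons then searched for inside that row. This is a sketch, not the verified script above: find_row_by_description and delete_token are hypothetical names, the CSS selectors are read off the HTML in the question, and "css selector" is the string Selenium exposes as By.CSS_SELECTOR. On the live page a wait for the confirmation dialog may be needed between the two clicks.

```python
CSS = "css selector"  # the same string Selenium exposes as By.CSS_SELECTOR


def find_row_by_description(driver, description):
    """Return the first token row whose text contains `description`."""
    for row in driver.find_elements(CSS, "div.access-token"):
        if description in row.text:
            return row
    return None


def delete_token(driver, description):
    """Click Delete, then the confirmation button, inside the matching row.

    Returns False when no row matches, True after both clicks.
    """
    row = find_row_by_description(driver, description)
    if row is None:
        return False
    # Both lookups start at `row`, so buttons of other tokens are never hit.
    row.find_element(CSS, "summary").click()
    row.find_element(
        CSS, "form.js-revoke-access-form button[type='submit']"
    ).click()
    return True
```

Because every lookup is made on the row element rather than on the driver, no absolute XPath of the parent is needed.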

Locating an element using Python and Selenium via innerHTML

I'm new to Selenium and I'm trying to write my first real script using the package for Python.
I'm using:
Windows 10
Python 3.10.5
Selenium 4.3.0
So far I've been able to do everything I need with different selectors, like ID, name, XPATH etc.
However I've stumbled upon an issue where I need to find a specific element by using the innerHTML of it.
The issue I'm facing is that I need to find an element with the innerHTML-value of "Changed" as seen in the HTML below.
The first challenge I'm facing is that the element doesn't have a unique ID, name or otherwise to identify it and there's many objects/elements of "dlx-treeview-node".
The second challenge is that XPATH won't work because the element changes position depending on where you are on the website (the number of "dlx-treeview-node"-elements change), so if I use XPATH I'll get the wrong element depending on where I am.
I can successfully get the name by using the below XPATH, "get_attribute" and printing to console, which is why I know it's innerHTML and not innerText, but as mentioned this will change depending on where I am on the website.
I would really appreciate any help I can get to solve this challenge and to learn more about the use of Selenium with Python.
Code trials:
select_filter_name = wait.until(EC.element_to_be_clickable((By.XPATH, "/html/body/div/app-root/dlx-select-filter-attribute-dialog/dlx-dialog-window/div/div[2]/div/div/div[5]/div/div/dlx-view-column-selector-component/div[1]/dlx-treeview/div/dlx-treeview-nodes/div/dlx-treeview-nodes/div/dlx-treeview-node[16]/div/div/div/div[2]/div/dlx-text-truncater/div")))
filter_name = select_filter_name.get_attribute("innerHTML")
print(filter_name)
HTML:
<dlx-treeview-node _nghost-nrk-c188="" class="ng-star-inserted">
<div _ngcontent-nrk-c188="" dlx-droppable="" dlx-draggable="" dlx-file-drop="" class="d-flex flex-column position-relative dlx-hover on-hover-show-expandable-menu bg-control-active bg-control-hover">
<div _ngcontent-nrk-c188="" class="d-flex flex-row ml-2">
<div _ngcontent-nrk-c188="" class="d-flex flex-row text-nowrap expand-horizontal" style="padding-left: 15px;">
<!---->
<div _ngcontent-nrk-c188="" class="d-flex align-self-center ng-star-inserted" style="min-width: 16px; margin-left: 3px;">
<!---->
</div>
<!---->
<div _ngcontent-nrk-c188="" class="d-flex flex-1 flex-no-overflow-x" style="padding: 3.5px 0px;">
<div class="d-flex flex-row justify-content-start flex-no-overflow-x align-items-center expand-horizontal ng-star-inserted">
<!---->
<dlx-text-truncater class="overflow-hidden d-flex flex-no-overflow-x ng-star-inserted">
<div class="text-truncate expand-horizontal ng-star-inserted">Changed</div>
<!---->
<!---->
</dlx-text-truncater>
<!---->
</div>
<!---->
<!---->
<!---->
</div>
</div>
<!---->
<!---->
</div>
</div>
<!---->
<dlx-attachment-content _ngcontent-nrk-c188="">
<div style="position: fixed; z-index: 10001; left: -10000px; top: -10000px; pointer-events: auto;">
<!---->
<!---->
</div>
</dlx-attachment-content>
</dlx-treeview-node>
Edit-1:
NOTE: I'm not sure I'm using the correct terms for HTML, so please correct me if I'm wrong.
I've learned that I have a follow up question:
How do I search for the text as described, but only searching in the "dlx-treeview-node" (there's about 100 of these)? So basically searching in the "children" of these.
The question is because I've learned that there are more elements with the specific text I'm searching for in other places.
Edit-2/solution:
I ended up finding my own solution before I received answers - I'm writing it here in case it can help anyone else.
The reply that is marked as "answer" is because this came the closest to what I needed.
The final code ended up like this (first searching the nodes - then searching the children for the specific innerHTML):
select_filter_name = wait.until(EC.element_to_be_clickable((By.XPATH, "//dlx-treeview-node[.//div[text()='Changed']]")))

Presuming the text Changed of the <div> element is unique within the HTML DOM, you can locate the element with either of the following XPath-based locator strategies:
Using xpath and text():
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Changed']")))
Using xpath and contains():
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(., 'Changed')]")))
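One caveat with contains(., 'Changed'): the string value of . includes the text of all descendants, so every ancestor <div> of the target also matches. Below is a small sketch for building a scoped XPath like the one in the question's Edit-2, with quoting that also works when the text contains quote characters. The helper names xpath_literal and node_with_text_xpath are hypothetical.

```python
def xpath_literal(text):
    """Quote `text` as an XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types present: split on ' and glue the pieces with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def node_with_text_xpath(container_tag, child_tag, text):
    """XPath for <container_tag> elements that contain a <child_tag>
    whose whitespace-normalized text equals `text`."""
    return (
        f"//{container_tag}"
        f"[.//{child_tag}[normalize-space()={xpath_literal(text)}]]"
    )
```

For example, node_with_text_xpath("dlx-treeview-node", "div", "Changed") produces //dlx-treeview-node[.//div[normalize-space()='Changed']], which can be passed to EC.visibility_of_element_located((By.XPATH, ...)) as in the snippets above.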

Run this code on your page and you will get a list of XPaths for all div elements whose text contains Changed.
# Define the XPath helper functions on window (used in the next step)
driver.execute_script('window.getElementIdx = function (elt) { var n = 1; for (var s = elt.previousElementSibling; s; s = s.previousElementSibling) { if (s.tagName == elt.tagName) n++; } return n; }; window.getXPathOfElement = function (elt) { var path = ""; for (; elt && elt.nodeType == 1; elt = elt.parentNode) { var idx = getElementIdx(elt); var xname = elt.tagName.toLowerCase(); if (idx > 1) xname += "[" + idx + "]"; path = "/" + xname + path; } return path; };')
# Get the XPaths of all div nodes whose text contains "Changed"
xpaths = driver.execute_script("return Array.from(document.querySelectorAll('div')).filter(el => el.textContent.includes('Changed')).map(node => getXPathOfElement(node));")
Write-up
The first execute_script call adds two JavaScript functions to the page. They are attached to window so that they are still available in the next call. getElementIdx (a minimal assumed implementation) returns the position of a node among its siblings with the same tag name, and getXPathOfElement accepts an HTML element and returns the XPath string for that node.
The second call collects all div elements whose text contains Changed, then loops through them and returns a list of strings, where each string is the XPath produced by calling getXPathOfElement on that node. Note that textContent includes the text of descendants, so enclosing divs are returned as well.
The JS is quite simple and harmless.
Tips
Check that the length of xpaths is at least 1.
Index xpaths, such as xpaths[0], or loop over it to make your changes.
You will now have an XPath which can be used like a normal selector.
Good luck.
Edit 1
execute_script() synchronously executes JavaScript in the current window/frame.

Selenium + Python: Print the text attribute of an element

I would like to navigate through a website, find an element and print it.
Python version: 3.10; Selenium Webdriver: Firefox; IDE: PyCharm 2021.3.2 (CE);
OS: Fedora 35 VM
I am able to navigate to the appropriate page where the text is generated in a drop down menu.
When I locate the element by CSS Selector and attempt to print it, the output does print the text "None".
I would like it to print the Plan Name which in this case is "Dual Complete Plan 1".
The element is not always present so I also need to catch any exceptions.
The relevant HTML code of the element I am trying to print:
<span class="OSFillParent" data-expression="" style="font-size: 12px; margin-top: 5px;">Dual Complete Plan 1</span>
More of the HTML code of the element I am trying to print (element I am trying to capture is below the fourth div):
<td data-header="Plan Name">
<div id="b8-b40-l1_0-132_0-$b2" class="OSBlockWidget" data-block="Content.AccordionItem">
<div id="b8-b40-l1_0-132_0-b2-SectionItem" class="section-expandable open is--open small-accordion" data-container="" data-expanded="true" aria-expanded="true" aria-disabled="false" role="tab">
<div id="b8-b40-l1_0-132_0-b2-TitleWrapper" class="section-expandable-title" data-container="" style="cursor: pointer;" role="button" aria-hidden="false" aria-expanded="true" tabindex="0" aria-controls="b8-b40-l1_0-132_0-b2-Content">
<div id="b8-b40-l1_0-132_0-b2-Title" class="dividers full-width">
<span class="OSFillParent" data-expression="" style="font-size: 12px; margin-top: 5px;">Dual Complete Plan 1</span>
</div>
<div class="section-expandable-icon" data-container="" aria-hidden="true">
::after
</div>
</div>
<div id="b8-b40-l1_0-132_0-b2-ContentWrapper" class="section-expandable-content no-padding is--expanded" data-container="" tabindex="0" aria-hidden="false" aria-labelledby="b8-b40-l1_0-132_0-b2-TitleWrapper">
<div id="b8-b40-l1_0-132_0-b2-Content" role="tabpanel">
<a data-link="" href="https://www.communityplan.com" target="_blank" title="Click for more information">
<span class="OSFillParent" data-expression="" style="font-size: 12px;">www.CommunityPlan.com</span>
</a>
<span class="OSFillParent" data-expression="" style="font-size: 12px;">Phone Number: 8005224700</span>
</div>
</div>
</div>
</div>
</td>
My relevant Selenium code:
# Find the Plan Name & if present set it to the variable "Advantage"
try:
    Advantage = (WebDriverWait(driver, 5).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "#b8-b40-l1_0-132_0-b2-Title > span:nth-child(1)"))).get_attribute("value"))
except:
    pass
print('\033[91;46m', Advantage, '\033[0m')
I expect the output to be "Dual Complete Plan 1", which is what I see on the screen and in the HTML. Instead I get the following:
None
Apparently the "Advantage" variable is being set to "None".
Why?
I can see the text "Dual Complete Plan 1" that I want to print in the HTML code above.
What am I doing wrong?
I feel like I need a primer on "get attribute"?

A <span> has no value attribute, so get_attribute("value") returns None. To get the text Dual Complete Plan 1, you need to use
element.text
or
element.get_attribute("innerHTML")
or
element.get_attribute("textContent")
Instead of presence_of_element_located(), use visibility_of_element_located(),
with one of the following CSS selectors to identify the element:
div[id*='Title'] > span.OSFillParent
Or
div.dividers.full-width > span.OSFillParent
Code:
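from selenium.common.exceptions import TimeoutException
Advantage = None  # stays None if the element never becomes visible
try:
    Advantage = WebDriverWait(driver, 5).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, "div[id*='Title'] > span.OSFillParent"))).text
except TimeoutException:
    pass
print(Advantage)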
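The difference between the three options can be captured in a small helper. This is a sketch with a hypothetical name, visible_or_dom_text: .text returns only the text that is rendered on screen, so it is empty for a hidden element, while the textContent property is read from the DOM regardless of visibility.

```python
def visible_or_dom_text(element):
    """Return the element's rendered text, falling back to its DOM text."""
    text = element.text
    if text:
        return text.strip()
    # .text is empty for hidden elements; textContent ignores visibility.
    dom_text = element.get_attribute("textContent")
    return dom_text.strip() if dom_text else None
```

With the element located as in the code above, visible_or_dom_text(element) would be used in place of .text. The helper only needs an object with a .text attribute and a get_attribute() method, which a Selenium WebElement provides.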

all element attributes in specific container selenium python

Let's say I have some HTML code that looks like this, and I use CSS selectors to make a list of elements:
<div class="item-cell">
<div class="item-container">
<div class ="item-price">
<div class = "item-info">
<span class = "price"> </span>
<div class="item-cell">
<div class="item-container">
<div class ="item-price">
<div class = "item-info">
<span class = "price"> </span>
elements = driver.find_elements_by_css_selector('div.item-cell div.item-container')
Now I have a list of elements that are at the item-container level. How would I go about finding the href value of each element in elements?
I was thinking I could do something like
for element in elements:
element.get_attribute("href")
I know I could explicitly go to the href level with the code but I want to check if each container contains href and if it does I want the value in that container. If I go specifically to the href level it will just skip the containers that do not have href in them.

You could try this:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("file://{PATH_TO_YOUR_FILE}")
elements = driver.find_elements(By.CSS_SELECTOR, 'div.item-cell div.item-container')
for element in elements:
    try:
        # search for a link inside this container only
        link = element.find_element(By.TAG_NAME, 'a')
        print(link.get_attribute('href'))
    except NoSuchElementException:
        print('No Data Available!')
driver.close()
Besides, I'd suggest closing your divs with </div> and adding https:// before your URLs.
<div class="item-cell">
<div class="item-container">
<div class="item-price">
</div>
<div class="item-info">
<span class="price"> </span>
</div>
</div>
</div>
<div class="item-cell">
<div class="item-container">
<div class="item-price">
</div>
<div class="item-info">
<span class="price"> </span>
</div>
</div>
</div>
<div class="item-cell">
<div class="item-container">
</div>
</div>
If you don't add https:// before your URLs, the browser will treat them as relative to the local file when you run Selenium against a local file.
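An alternative to catching NoSuchElementException is the plural find_elements, which returns an empty list when nothing matches. The sketch below uses a hypothetical helper name, first_href_per_container, and passes the locator strategy as the string "tag name", which is the value Selenium exposes as By.TAG_NAME.

```python
TAG_NAME = "tag name"  # the same string Selenium exposes as By.TAG_NAME


def first_href_per_container(containers):
    """Map each container to the href of its first <a>, or None."""
    hrefs = []
    for container in containers:
        # find_elements returns an empty list instead of raising.
        links = container.find_elements(TAG_NAME, "a")
        hrefs.append(links[0].get_attribute("href") if links else None)
    return hrefs
```

Called with the elements list from the code above, this gives one entry per container, so containers without a link are kept as None rather than skipped.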

Select checkbox using selenium in python

I want to select a checkbox using selenium in python. Following is the HTML of the checkbox. The span element is getting highlighted when hovering the mouse over checkbox
HTML
<div id="rc-anchor-container" class="rc-anchor rc-anchor-normal rc-anchor-light">
<div id="recaptcha-accessible-status" class="rc-anchor-aria-status" aria-hidden="true">Recaptcha requires verification. </div>
<div class="rc-anchor-error-msg-container" style="display:none"><span class="rc-anchor-error-msg" aria-hidden="true"></span></div>
<div class="rc-anchor-content">
<div class="rc-inline-block">
<div class="rc-anchor-center-container">
<div class="rc-anchor-center-item rc-anchor-checkbox-holder"><span class="recaptcha-checkbox goog-inline-block recaptcha-checkbox-unchecked rc-anchor-checkbox" role="checkbox" aria-checked="false" id="recaptcha-anchor" tabindex="0" dir="ltr" aria-labelledby="recaptcha-anchor-label"><div class="recaptcha-checkbox-border" role="presentation"></div><div class="recaptcha-checkbox-borderAnimation" role="presentation"></div><div class="recaptcha-checkbox-spinner" role="presentation"></div><div class="recaptcha-checkbox-spinnerAnimation" role="presentation"></div><div class="recaptcha-checkbox-checkmark" role="presentation"></div></span></div>
</div>
</div>
<div class="rc-inline-block">
<div class="rc-anchor-center-container">
<label class="rc-anchor-center-item rc-anchor-checkbox-label" aria-hidden="true" role="presentation" id="recaptcha-anchor-label"><span aria-live="polite" aria-labelledby="recaptcha-accessible-status"></span>I'm not a robot</label>
</div>
</div>
</div>
<div class="rc-anchor-normal-footer">
<div class="rc-anchor-logo-portrait" aria-hidden="true" role="presentation">
<div class="rc-anchor-logo-img rc-anchor-logo-img-portrait"></div>
<div class="rc-anchor-logo-text">reCAPTCHA</div>
</div>
<div class="rc-anchor-pt">Privacy<span aria-hidden="true" role="presentation"> - </span>Terms</div>
</div>
</div>
I am trying the following code, but it raises selenium.common.exceptions.NoSuchElementException.
My Code
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
chromedriver = 'C:\Program Files (x86)\Google\Chrome\chromedriver'
browser = webdriver.Chrome(chromedriver)
browser.get(url)
checkBox = browser.find_element_by_id("recaptcha-anchor")
checkBox.click()

This is a reCAPTCHA widget, so it is not like the normal elements in the page.
You have to switch to the captcha frame with Selenium before you can interact with the checkbox element.
To do that, first save the main window handle so you can get back to it when you're done with the reCAPTCHA:
from selenium.webdriver.common.by import By
# save the main window handle
mainwindow = browser.current_window_handle
# get the reCAPTCHA iframe, then switch to it
frame = browser.find_element(By.TAG_NAME, "iframe")
browser.switch_to.frame(frame)
# now you can access the checkbox element
browser.find_element(By.ID, "recaptcha-anchor").click()
# switch back to the main window
browser.switch_to.window(mainwindow)
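The switch-in and switch-back steps can also be packaged so that the switch back is never forgotten. This is a sketch with a hypothetical helper name, inside_frame; it relies only on driver.switch_to.frame() and driver.switch_to.default_content(), and it returns to the top-level document with default_content() instead of re-selecting the window handle.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame` for the duration of the with-block."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Always return to the top-level document, even after an error.
        driver.switch_to.default_content()
```

Usage would be: with inside_frame(browser, frame): followed by the lookup of the checkbox inside the block. Lookups made after the block run against the main page again.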
