Selenium Python: Flagging pages with specific span text

I currently have a Selenium script that runs through a list of part numbers on a website and captures some information, such as the product name (pulled from the page title).
I have noticed that some of the products are identified as "DISCONTINUED" (through a span), and I would like to capture that information so that I can ignore all of those products.
On the website in general, they denote these products through:
<span data*="">DISCONTINUED</span>
Any valid product will not have this element on its page, and in that case I want to ensure that the script doesn't crash and just captures a blank value.
I tried using:
driver.find_element(By.XPATH, '//html/body/div/div/section/div/section/div/div/section/div/div/div/div/div/span/span[text()="DISCONTINUED"]')
However I get this error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//html/body/div/div/section/div/section/div/div/section/div/div/div/div/div/span/span[text()="DISCONTINUED"]"}
I made sure to use only a single part number that did indeed have DISCONTINUED on it.
I also loaded the page in dev tools and searched for that specific XPath, and it did indeed highlight the proper section:
Searched:
//html/body/div/div/section/div/section/div/div/section/div/div/div/div/div/span/span
Highlighted:
<span data*="">DISCONTINUED</span>
What is the best way to do this? Also, I need a way to record a blank value by default so that the script will not crash when a current product is retrieved.

Considering the HTML:
<span data*="">DISCONTINUED</span>
To locate the element with the text DISCONTINUED you can use either of the following locator strategies:
Using xpath and text():
element = driver.find_element(By.XPATH, "//span[text()='DISCONTINUED']")
Using xpath and contains():
element = driver.find_element(By.XPATH, "//span[contains(., 'DISCONTINUED')]")
Ideally to locate the element you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategies:
Using xpath and text():
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[text()='DISCONTINUED']")))
Using xpath and contains():
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., 'DISCONTINUED')]")))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
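For the blank-value requirement: find_elements() returns an empty list instead of raising NoSuchElementException, so a current product can default to an empty string. A minimal sketch, assuming driver is already on a product page:
from selenium.webdriver.common.by import By

# An empty list means the DISCONTINUED span is absent, i.e. a current product
matches = driver.find_elements(By.XPATH, "//span[text()='DISCONTINUED']")
status = matches[0].text if matches else ""  # blank value for current products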

This is what I used to make it work with error avoidance: if the span is found, the product gets flagged, and the timeout marks everything else as current.
from selenium.common.exceptions import TimeoutException

try:
    # The span is only present on archived/discontinued products
    WebDriverWait(driver, 2).until(EC.visibility_of_element_located(
        (By.XPATH, "//html/body/div/div/section/div/section/div/div/section/div/div/div/div/div/span/span[text()='ARCHIVED']")))
    discontinued = "DISCONTINUED"
except TimeoutException:
    discontinued = "CURRENT"
Thanks again for the guidance on using waits instead of the find_element approach!

Related

XPATH cannot locate element with colon in its ID or name

I am on a side project to write a Python script that auto-logs into a website. I am trying to use the script to launch the Firefox browser and log in automatically.
Below is the source code of the element I'm trying to locate:
[screenshot of the element's source code]
I used below
item_email = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//input[contains(@id,'username')]")))
or below
item_email = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//input[@id='j_id0:j_id5:loginComponent:loginForm:username']")))
They all ended up with selenium.common.exceptions.TimeoutException
I was able to locate elements when the input ID didn't have colons (not namespaced, I guess).
erhai950,
An explicit wait will always end in a selenium.common.exceptions.TimeoutException if the element is not found in the HTML DOM for any reason.
You can try the below xpath:
//input[contains(@id,'username') and @placeholder='Email']
Press F12 in Chrome -> go to the Elements section -> press CTRL + F -> then paste //input[contains(@id,'username') and @placeholder='Email'] and see whether your desired element gets highlighted or not.
Now, if there is a 1/1 matching entry, try the below code:
item_email = driver.find_element(By.XPATH, "//input[contains(@id,'username') and @placeholder='Email']")
or
item_email = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//input[contains(@id,'username') and @placeholder='Email']")))
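As an aside, the colons in an id like j_id0:j_id5:loginComponent:loginForm:username are significant characters in CSS, which is why a bare #id selector fails; an attribute selector sidesteps the escaping. A sketch against the same element:
# An attribute selector treats the id as a plain string, so no escaping is needed
item_email = driver.find_element(By.CSS_SELECTOR, "input[id='j_id0:j_id5:loginComponent:loginForm:username']")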
Also check whether this web element is inside an iframe or not.
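If it is inside an iframe, you must switch to the frame before locating the input. A sketch, assuming the page holds a single login iframe (the frame locator below is a guess):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait for the frame, switch into it, then locate the input inside it
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))
item_email = driver.find_element(By.XPATH, "//input[contains(@id, 'username')]")
driver.switch_to.default_content()  # switch back to the main document when done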
Given the HTML, the element is an <input> element which would accept text input.
Solution
Ideally to send a character sequence to an <input> element you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.form-group input[id$='username'][name$='username'][placeholder='Email']"))).send_keys("erhai950")
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='form-group']//input[contains(@id, 'username') and contains(@name, 'username')][@placeholder='Email']"))).send_keys("erhai950")
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Selenium Python: Element is not clickable

I am testing a web app created with Python Dash using Selenium. I am trying to click a tab but always get the ElementClickIntercepted exception.
Note: Of course I stumbled over similar problems, but most times the issue is that the element cannot be clicked because it has not loaded yet - I already built in waits! - or that another element would receive the click, which cannot be the case here either. Note that the first item is always preselected and I want to choose the second.
# Select X-Axis
x_axis = driver.find_element(By.XPATH, "//*[@id='navbar']/a[2]")
x_axis.click()
driver.implicitly_wait(2)
try:
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//a[text()='Even']"))
    )
except:
    raise Exception
even = driver.find_element(By.XPATH, "//a[text()='Even']")
even.click()
Exception snapshot: [screenshot]
HTML snapshot: [screenshot]
Based on what you've posted, By.XPATH, "//a[text()='Even']" is probably selecting an HTML element different from what you're expecting.
There appears to be an <a> with text that starts with 'Even', but you've shown nothing with text that equals 'Even'.
BTW, when asking a question, you ought to at least paste the relevant HTML as text, rather than as a screenshot.
Use contains(text(), 'Even') instead of text()='Even' in your XPath: text()='Even' requires the text to be exactly 'Even', and in your HTML the text appears to contain more than just 'Even'.
To click on either of the elements, instead of presence_of_element_located() you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following locator strategies:
$$$ Response:
Using PARTIAL_LINK_TEXT:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "Response"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[starts-with(#id, 'content')]//ul//div[#class='nav-item']//a[contains(., 'Response')]"))).click()
Even $$$:
Using PARTIAL_LINK_TEXT:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "Even"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[starts-with(#id, 'content')]//ul//div[#class='nav-item']//a[starts-with(., 'Even')]"))).click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
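If another element still intercepts the native click, a common fallback is to dispatch the click through JavaScript, which bypasses Selenium's obscuring-element check. A sketch reusing the locator above:
# Locate the tab, then click it via JavaScript instead of a native click
even = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//a[contains(., 'Even')]")))
driver.execute_script("arguments[0].click();", even)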

Selenium find attribute value (Python)

I would like to get all the attribute values named 'href' from a website; there are about 10 of them. I have successfully got one using the following method:
url = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[2]/div[2]/div/div/div[2]/section[1]/div/div[2]/div[3]/ul/li[1]/div/div[1]/span/a"))).get_attribute("href")
The problem with this is that it only gives back one, not all of them. I have tried to go by ID but it doesn't return anything:
url = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "a"))).get_attribute("href")
Also, the other href values are located at different XPaths:
/html/body/div[2]/div[2]/div/div/div[2]/section[1]/div/div[2]/div[3]/ul/li[2]/div/div[1]/span/a
/html/body/div[2]/div[2]/div/div/div[2]/section[1]/div/div[2]/div[3]/ul/li[3]/div/div[1]/span/a
/html/body/div[2]/div[2]/div/div/div[2]/section[1]/div/div[2]/div[3]/ul/li[4]/div/div[1]/span/a
Here's my element:
<a ph-tevent="job_click" ref="linkEle" href.bind="getUrl(linkEle, 'job', eachJob, '', eachJob.jobUrl)" data-ph-at-id="job-link" data-ph-id="ph-page-element-page20-CRUCUZ" class="au-target" au-target-id="181" ph-click-ctx="job" ph-tref="12313123213" ph-tag="ph-search-results-v2" href="https://hyperlink.com" data-ph-at-job-title-text="title" data-ph-at-job-location-text="Unknown" data-ph-at-job-location-area-text="asd" data-ph-at-job-category-text="Manufacturing" data-access-list-item="2" data-ph-at-job-id-text="A123124" data-ph-at-job-type-text="Regular" data-ph-at-job-industry-text="Manufacturing" data-ph-at-job-post-date-text="2021-12-09T00:00:00.000Z" data-ph-at-job-seqno-text="ASD212ASFS" aria-label="Senior Manager">
<div class="job-title" data-ph-id="ph-page-element-page20-0Mi3Ce">
<!--anchor-->
<!--anchor-->
<span data-ph-id="ph-page-element-page20-PLxqta">Senior Manager </span>
</div><!--anchor--> </a>
Any help is appreciated!
For finding more than one web element you should use find_elements, or, if you are using explicit waits, you can use presence_of_all_elements_located or visibility_of_all_elements_located.
Based on the HTML that you've shared, if
this CSS
a[ph-tevent='job_click'][ref='linkEle']
or this XPath
//a[@ph-tevent='job_click' and @ref='linkEle']
represents all the nodes, below are the steps to check.
Please check in the dev tools (Google Chrome) whether all the desired node entries are present in the HTML DOM.
Steps to check:
Press F12 in Chrome -> go to the Elements section -> press CTRL + F -> then paste the XPath and see whether your desired elements get highlighted or not.
If they are, then the below code should work:
for ele in driver.find_elements(By.XPATH, "//a[@ph-tevent='job_click' and @ref='linkEle']"):
    print(ele.get_attribute('href'))
To extract the value of the href attributes using Selenium and python you have to induce WebDriverWait for visibility_of_all_elements_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[ph-tevent='job_click'][ref='linkEle'][data-ph-at-id='job-link'][ph-click-ctx='job'][href]")))])
Using XPATH:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[#ph-tevent='job_click' and #ref='linkEle'][#data-ph-at-id='job-link' and #ph-click-ctx='job'][#href]")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Unable to locate element using selenium chrome webdriver in python selenium

I am new to Python and trying to do some web scraping but am having some real issues. Maybe you can help me out.
HTML:
<input autocomplete="off" type="search" name="search-search-field" placeholder="150k companies worldwide" data-cy-id="search-search-field" class="sc-dnqmqq grpFhe" value="">
The first part of my code looks as follows and works well without any issues:
driver.get("https:")
login = driver.find_element_by_xpath(email_xpath).send_keys(email)
login = driver.find_element_by_xpath(pwd_xpath).send_keys(pwd)
login = driver.find_element_by_xpath(continue_xpath)
login.click()
time.sleep(10)
email and pwd are variables containing my login details. As I said, that part works fine.
The issue I have is with the following line of code:
search = driver.find_element_by_xpath('/html/body/div[1]/div/div[1]/header/div/nav/div[1]/div/div/fieldset/input')
As a result I get this following error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[1]/div/div[1]/header/div/nav/div[1]/div/div/fieldset/input"}
I tried and tried but could not solve the problem. I would appreciate it very much if anyone could help me out. Thank you!
To locate the search field you can use either of the following Locator Strategies:
Using css_selector:
search = driver.find_element_by_css_selector("input[name='search-search-field'][data-cy-id='search-search-field']")
Using xpath:
search = driver.find_element_by_xpath("//input[@name='search-search-field' and @data-cy-id='search-search-field']")
Ideally, to locate the element you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
search = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='search-search-field'][data-cy-id='search-search-field']")))
Using XPATH:
search = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='search-search-field' and @data-cy-id='search-search-field']")))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
References
You can find a couple of relevant discussions on NoSuchElementException in:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element while trying to click Next button with selenium
selenium in python : NoSuchElementException: Message: no such element: Unable to locate element
The following XPath will work and is much simpler:
/html/body/div[1]//fieldset/input
Do not use absolute XPath or CSS; always use relative locators, as they are more stable.
An absolute (full) XPath depends on its parents, so if a parent changes, the locator will fail to find the element.
In XPath and CSS the locator can be used in the form:
//tagname[@attributename="attributevalue"] - XPath
tagname[attributename="attributevalue"] - CSS
So you can use any attribute - type, name, id, class, whatever attribute is there in your element - e.g.:
//input[@type="search"] - XPath
input[type="search"] - CSS
search = driver.find_element_by_xpath('//input[@type="search"]')
Try wait:
WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.XPATH, '//input[@type="search"]')))
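Once located, the element can be used directly; a sketch with an illustrative query:
search = WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.XPATH, '//input[@type="search"]')))
search.send_keys("example company")  # illustrative search term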

Python Selenium - get href value

I am trying to copy the href value from a website, and the html code looks like this:
<p class="sc-eYdvao kvdWiq">
<a href="https://www.iproperty.com.my/property/setia-eco-park/sale-1653165/">Shah Alam Setia Eco Park, Setia Eco Park</a>
</p>
I've tried driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href") but it returned 'list' object has no attribute 'get_attribute'. Using driver.find_element_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href") returned None. But I can't use XPath because the website has 20+ hrefs, all of which I need to copy; using XPath would only copy one.
If it helps, all the 20+ hrefs are categorised under the same class, which is sc-eYdvao kvdWiq.
Ultimately I want to copy all the 20+ hrefs and export them to a CSV file.
Appreciate any help possible.
You want driver.find_elements if there is more than one element; this returns a list. For the CSS selector, you want to ensure you are selecting, under those classes, the elements that have an href attribute:
elems = driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq [href]")
links = [elem.get_attribute('href') for elem in elems]
You might also need a wait condition for presence of all elements located by css selector.
elems = WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sc-eYdvao.kvdWiq [href]")))
As per the given HTML:
<p class="sc-eYdvao kvdWiq">
  <a href="https://www.iproperty.com.my/property/setia-eco-park/sale-1653165/">Shah Alam Setia Eco Park, Setia Eco Park</a>
</p>
As the href attribute is within the <a> tag, ideally you need to go deeper, down to the <a> node. So to extract the value of the href attribute you can use either of the following locator strategies:
Using css_selector:
print(driver.find_element_by_css_selector("p.sc-eYdvao.kvdWiq > a").get_attribute('href'))
Using xpath:
print(driver.find_element_by_xpath("//p[@class='sc-eYdvao kvdWiq']/a").get_attribute('href'))
If you want to extract all the values of the href attribute you need to use find_elements* instead:
Using css_selector:
print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_css_selector("p.sc-eYdvao.kvdWiq > a")])
Using xpath:
print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_xpath("//p[#class='sc-eYdvao kvdWiq']/a")])
Dynamic elements
However, if you observe the values of the class attributes, i.e. sc-eYdvao and kvdWiq, those are likely dynamic values. So to extract the href attribute you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a"))).get_attribute('href'))
Using XPATH:
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//p[@class='sc-eYdvao kvdWiq']/a"))).get_attribute('href'))
If you want to extract all the values of the href attribute you can use visibility_of_all_elements_located() instead:
Using CSS_SELECTOR:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a")))])
Using XPATH:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[#class='sc-eYdvao kvdWiq']/a")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
The XPath
//p[@class='sc-eYdvao kvdWiq']/a
returns the elements you are looking for.
Writing the data to CSV file is not related to the scraping challenge. Just try to look at examples and you will be able to do it.
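That said, a minimal sketch of the CSV export with the standard library (file name and links are illustrative):
import csv

links = ["https://example.com/a", "https://example.com/b"]  # collected via get_attribute('href')

with open("hrefs.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["href"])  # header row
    writer.writerows([link] for link in links)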
To crawl any hyperlink or href, the ProxyCrawl API is ideal as it uses pre-built functions for fetching the desired information; just pip install the API and follow the code to get the required output. The second approach, fetching href links using Python Selenium, is to run the following code.
Source Code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import time
urls = ['https://www.heliosholland.com/Ampullendoos-voor-63-ampullen', 'https://www.heliosholland.com/lege-testdozen']
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 29)
for i in urls:
    driver.get(i)
    image = wait.until(EC.visibility_of_element_located((By.XPATH, '/html/body/div[1]/div[3]/div[2]/div/div[2]/div/div/form/div[1]/div[1]/div/div/div/div[1]/div/img'))).get_attribute('src')
    print(image)
To scrape the link, use .get_attribute('src').
Get the element you want with driver.find_element(By.XPATH, 'path').
To extract the href link, use get_attribute('href').
Which gives:
driver.find_element(By.XPATH, 'path').get_attribute('href')
(If you use find_elements, iterate over the returned list and call get_attribute('href') on each element.)
try something like:
elems = driver.find_elements_by_xpath("//p[contains(@class, 'sc-eYdvao') and contains(@class, 'kvdWiq')]/a")
for elem in elems:
    print(elem.get_attribute('href'))
