Selenium not extracting info using xpath - python

I am trying to extract some information from the Amazon website using Selenium, but I am not able to scrape that information using XPath.
In the image below I want to extract the info highlighted.
This is the code I am using:
try:
    path = "//div[@id='desktop_buybox']//div[@class='a-box-inner']//span[@class='a-size-small')]"
    seller_element = WebDriverWait(driver, 5).until(
        EC.visibility_of_element_located((By.XPATH, path)))
except Exception as e:
    print(e)
When I run this code, it shows that there is an error at seller_element = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, path))), but it does not say what kind of exception it is.
I tried looking online and found that this happens when Selenium is not able to find the element on the webpage.
But I think the path I have specified is right. Please help me.
Thanks in advance
[EDIT-1]
This is the exception I am getting
Message:

The XPath could be something like this (you can shorten it):
//div[@class='a-section a-spacing-none a-spacing-top-base']//span[@class='a-size-small a-color-secondary']
The CSS selectors could be the following, and so forth:
.a-section.a-spacing-none.a-spacing-top-base
.a-size-small.a-color-secondary

I think the reason is that the XPath expression is not correct.
Take the following element as an example; it means the span has two classes:
<span class="a-size-small a-color-secondary">
So, span[@class='a-size-small') will not work.
Instead of this, you can use an XPath such as
//span[contains(@class, 'a-size-small') and contains(@class, 'a-color-secondary')]
or a CSS selector such as
span.a-size-small.a-color-secondary
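For illustration, here is a minimal sketch (assuming Selenium 4's find_element API and that driver is already on the product page) showing that both locators target the same span:
from selenium.webdriver.common.by import By

# Both locators target the same <span class="a-size-small a-color-secondary"> element
seller_by_xpath = driver.find_element(
    By.XPATH, "//span[contains(@class, 'a-size-small') and contains(@class, 'a-color-secondary')]")
seller_by_css = driver.find_element(
    By.CSS_SELECTOR, "span.a-size-small.a-color-secondary")
print(seller_by_xpath.text)
print(seller_by_css.text)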

Amazon updates its content based on the country you are browsing from. When I clicked on the link you provided, I did not find the element you are looking for, simply because the item is not sold here in India.
So in short, if you are sitting in India and try to find the element, it is not there, but as soon as you change the location to "United States", it appears.
Solution - Change the location

To print the Ships from and sold by Amazon.com text of the element, you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following Locator Strategies:
Using CSS_SELECTOR and get_attribute():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.a-section.a-spacing-none.a-spacing-top-base > span.a-size-small.a-color-secondary"))).get_attribute("innerHTML"))
Using XPATH and text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='a-section a-spacing-none a-spacing-top-base']/span[@class='a-size-small a-color-secondary']"))).text)
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
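For context, a minimal end-to-end sketch of the flow above; the chromedriver setup and the product URL are assumptions for illustration only:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes chromedriver is available on PATH
driver.get("https://www.amazon.com/dp/XXXXXXXXXX")  # hypothetical product URL
try:
    seller = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
        (By.CSS_SELECTOR, "div.a-section.a-spacing-none.a-spacing-top-base > span.a-size-small.a-color-secondary")))
    print(seller.text)  # e.g. the "Ships from and sold by Amazon.com" line
finally:
    driver.quit()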
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
Outro
Link to useful documentation:
The get_attribute() method gets the given attribute or property of the element.
The text attribute returns the text of the element.
Difference between text and innerHTML using Selenium


Web Scraping using Selenium not returning same results as on the UI

Last week User #KunduK kindly helped me scrape a website to return the address of a particular record.
Record in question: https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN
By using the following snippet of code:
address=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"h4[data-aura-rendered-by] ~p:nth-of-type(1)"))).text
print(address)
However, whilst trying to understand the snippet, I started to see some additional data being returned.
In the screenshot below, the left side shows the expected results, while the right side shows what is actually being returned.
Inspecting the element, I can see there is an additional row (highlighted in yellow) that is not being presented on the UI (right-hand side).
I am also trying to get the "Website" and "Reference Number" following the example provided before; however, following these steps (https://www.scrapingbee.com/blog/selenium-python/) I am not able to get the desired results returned.
Current Code:
Website=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".accordion_text h4"))).text
print(Website)
Website Inspect
Looking forward to your help!
To extract the Website address and Firm reference number, ideally you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
For the Website address:
driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[text()='Website']//following-sibling::a[1]"))).get_attribute("href"))
For the Firm reference number:
driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[text()='Firm reference number']//following-sibling::p[1]"))).text)
Console Output:
https://www.masonowen.com/
311960
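For completeness, a combined sketch that fetches both values in one session (assuming a local chromedriver; otherwise it only rearranges the two lines above):
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes chromedriver is available on PATH
driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
wait = WebDriverWait(driver, 10)
# Website address: the <a> that follows the "Website" heading
website = wait.until(EC.visibility_of_element_located(
    (By.XPATH, "//h4[text()='Website']//following-sibling::a[1]"))).get_attribute("href")
# Firm reference number: the <p> that follows the "Firm reference number" heading
reference = wait.until(EC.visibility_of_element_located(
    (By.XPATH, "//h4[text()='Firm reference number']//following-sibling::p[1]"))).text
print(website, reference)
driver.quit()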
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
References
Link to useful documentation:
The get_attribute() method gets the given attribute or property of the element.
The text attribute returns the text of the element.
Difference between text and innerHTML using Selenium

Python Selenium How to take grandchild text

I am working with this website: https://www.offerte.smartpaws.de/
On the final page, once you have put in all your variables, we get a page with this info:
and the HTML for the price looks like this:
I am looking to store the prices for each product, but it seems like I have to select the parent element based on the text "Optimal Care", "Classic Care", etc., and go down to the child element to extract the price.
I am not sure how to do this, my current code for this specific scenario:
price_optimal = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//form[@class="summary-table-title" and @text="Optimal Care"]/*/*/'))).get_attribute("text()")
How can I extract the text based on the parent element's text?
Thanks
If you want to retrieve only the price, then you can use the below XPath:
//div[@class='summary-table-price price-monthly']//div[2]
with find_elements or with explicit waits.
With find_elements
Code:
for price in driver.find_elements(By.XPATH, "//div[@class='summary-table-price price-monthly']//div[2]"):
    print(price.get_attribute('innerText'))
With ExplicitWaits:
for price in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='summary-table-price price-monthly']//div[2]"))):
    print(price.get_attribute('innerText'))
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
If you want to extract the price based on the plan name, such as Classic Care,
you should use the below XPath:
//div[text()='Classic Care']//following-sibling::div[@class='summary-table-price price-monthly']/descendant::div[2]
and use it like this:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Classic Care']//following-sibling::div[@class='summary-table-price price-monthly']/descendant::div[2]"))).get_attribute('innerText'))
Also, for the other plans, all you will have to do is replace Classic Care with Optimal Care or Basic Care in the XPath.
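Building on that, here is a hedged sketch that collects every plan's price into a dict; the plan labels are assumptions based on the screenshots, so adjust them to the exact text shown on the page:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

plans = ["Basic Care", "Classic Care", "Optimal Care"]  # assumed plan labels
prices = {}
for plan in plans:
    # Same XPath pattern as above, with the plan name substituted in
    xpath = ("//div[text()='" + plan + "']"
             "//following-sibling::div[@class='summary-table-price price-monthly']/descendant::div[2]")
    prices[plan] = WebDriverWait(driver, 20).until(
        EC.visibility_of_element_located((By.XPATH, xpath))).get_attribute('innerText')
print(prices)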

Pass a Selenium WebElement to WebDriverWait

I'm trying to click a JavaScript link, but I can't get it to work.
First I'm getting a list of links using this code:
links = driver.find_elements_by_xpath("(//div[@class='market-box-wp collapse in'])[1]//a[@class='truncate']")
then trying to click some of them
links[3].click() #Doesn't work
I found this solution online for JavaScript links, but it uses XPath, and I'm not sure how to pass links[3] to it:
WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH,"Xpath of Element"))).click()
You can use XPath indexing.
This is the XPath:
(//div[@class='market-box-wp collapse in'])[1]//a[@class='truncate']
Now, to locate the 3rd item, you could do this:
((//div[@class='market-box-wp collapse in'])[1]//a[@class='truncate'])[3]
and use it like this:
WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH,"((//div[@class='market-box-wp collapse in'])[1]//a[@class='truncate'])[3]"))).click()
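If you would rather reuse the WebElement you already have, here are two hedged alternatives; note that links[3] is the fourth link, since Python indexing is 0-based while XPath's [3] is 1-based, so pick the index accordingly:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Option 1: in Selenium 4, element_to_be_clickable() also accepts a WebElement
WebDriverWait(browser, 20).until(EC.element_to_be_clickable(links[3])).click()

# Option 2: a plain lambda wait on the element itself (works with older Selenium versions too)
WebDriverWait(browser, 20).until(
    lambda d: links[3] if links[3].is_displayed() and links[3].is_enabled() else False).click()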

How can I get the text in Selenium?

I want to get the text of an element in Selenium. First I did this:
team1_names = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".home span"))
)
for kir in team1_names:
    print(kir.text)
It didn’t work out. So I tried this:
team1_name = driver.find_elements_by_css_selector('.home span')
print(team1_name.getText())
So team1_name.text doesn’t work either.
So what's wrong with it?
You need to take care of a couple of things here:
presence_of_element_located() is the expectation for checking that an element is present on the DOM of a page. This does not necessarily mean that the element is visible.
Additionally, presence_of_element_located() returns a single WebElement, not a list, so iterating over it with for won't work. (Likewise, find_elements_by_css_selector() returns a list, and getText() is a Java method; in Python you would index into the list and read an element's text attribute.)
Solution
As a solution you need to induce WebDriverWait for the visibility_of_element_located(), and you can use either of the following Locator Strategies:
Using CSS_SELECTOR and text attribute:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".home span"))).text)
Using XPATH and get_attribute():
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, "//*[@class='home']//span"))).get_attribute("innerHTML"))
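Since your first attempt apparently expects several team-name spans, here is a hedged sketch using visibility_of_all_elements_located(), which does return a list (same imports as below):
team1_names = WebDriverWait(browser, 20).until(
    EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".home span")))
for name in team1_names:
    print(name.text)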
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
Outro
Link to useful documentation:
The get_attribute() method gets the given attribute or property of the element.
The text attribute returns the text of the element.
Difference between text and innerHTML using Selenium

Python selenium cannot find element even with wait

I am trying to send text to an input field, but Selenium is not able to find the element.
element = WebDriverWait(b, 10).until(EC.presence_of_element_located((By.XPATH, '/html/body/table/tbody/tr[1]/td/form/div/table/tbody/tr[2]/td/table[2]/tbody/tr/td[4]/table/tbody/tr/td[1]/input')))
element.send_keys("Customer Care", Keys.ENTER)
I've tried using the XPath, the full XPath, and the ID to locate it, but it keeps giving me an error indicating that it cannot find the element:
selenium.common.exceptions.TimeoutException
A snippet of the HTML element
<input class="iceInpTxt testBox" id="headerForm:jumpto" maxlength="40" name="headerForm:jumpto" onblur="setFocus('');iceSubmitPartial(form, this, event);" onfocus="setFocus(this.id);" onkeyup="iceSubmit(form,this,event);" onmousedown="this.focus();" type="text" value="">
The element has an ID, so use it as the locator. Also check whether the element is inside an iframe:
wait = WebDriverWait(b, 10)
element = wait.until(EC.element_to_be_clickable((By.ID, 'headerForm:jumpto')))
element.send_keys("Customer Care", Keys.ENTER)
How to switch to the iframe:
wait = WebDriverWait(b, 10)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe_locator")))
element = wait.until(EC.element_to_be_clickable((By.ID, 'headerForm:jumpto')))
element.send_keys("Customer Care", Keys.ENTER)
# How to go back to default content
b.switch_to.default_content()
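If the page only contains a single iframe, a hedged shortcut (reusing wait and b from above) is to switch by tag name or by index instead of a specific locator:
wait.until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))
# or, equivalently, switch to the first frame on the page by index
b.switch_to.frame(0)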
It is a good idea to check whether you have installed and imported Selenium and the other necessary packages. Use pip to check your version and see if there is a known bug online. Please let me know which Python version you are using. It is likely that the XPath you provided is incorrect, or maybe try increasing the amount of time in the second parameter of WebDriverWait(first, second). It would be much more helpful if you had a link to this HTML page so I could check your XPath. If you'd like further help, please provide your HTML page.
Edit:
This is something that needs to be reproduced so that it can be checked. If you have tried the above, I am unable to help unless I see the HTML document. You should remove all sensitive information before sharing it. The other elements of your code seem to be correct.
If your use case involves invoking click() or send_keys() while inducing WebDriverWait, then instead of presence_of_element_located() you need to use the expected condition element_to_be_clickable(), as follows.
So effectively, you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
WebDriverWait(b, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.iceInpTxt.testBox[id^='headerForm'][name$='jumpto']"))).send_keys("Customer Care", Keys.ENTER)
Using XPATH:
WebDriverWait(b, 10).until(EC.element_to_be_clickable((By.XPATH, "//input[@class='iceInpTxt testBox' and @id='headerForm:jumpto'][@name='headerForm:jumpto']"))).send_keys("Customer Care", Keys.ENTER)
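One detail worth noting: both lines above use Keys.ENTER, so in addition to the usual WebDriverWait, By and expected_conditions imports you also need:
from selenium.webdriver.common.keys import Keys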
References
You can find a couple of detailed discussions about the different expected_conditions in:
WebDriverWait not working as expected
