Trying to click Amazon Best Sellers Rank links (Python)

Hello, I'm trying to click those links, but when I try to click with
driver.find_element_by_xpath('//*[@id="productDetails_detailBullets_sections1"]/tbody/tr[6]/td/span/span[2]/a').click()
it works, but the problem is that every item has a different path, the path keeps changing, and it does not work for some items.
URL: https://www.amazon.com/MICHELANGELO-Piece-Rainbow-Kitchen-Knife/dp/B074T6C4YS/ref=zg_bs_289857_1?_encoding=UTF8&psc=1&refRID=K5GAX1GF2SDZMN3NS403

The Amazon webpage has 3 entries for Best Sellers Rank. An effective approach would be to collect the href of all three Best Sellers links, store them in a list, and open each one in a separate tab to scrape (a sketch of the tab handling follows the imports below). To construct the list you have to induce WebDriverWait for visibility_of_all_elements_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get('https://www.amazon.com/dp/B074T6C4YS')
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#productDetails_detailBullets_sections1 td>span>span a")))])
Using XPATH:
driver.get('https://www.amazon.com/dp/B074T6C4YS')
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='productDetails_detailBullets_sections1']//td/span/span//a")))])
Console Output:
['https://www.amazon.com/gp/bestsellers/kitchen/ref=pd_zg_ts_kitchen', 'https://www.amazon.com/gp/bestsellers/kitchen/289857/ref=pd_zg_hrsr_kitchen', 'https://www.amazon.com/gp/bestsellers/kitchen/289862/ref=pd_zg_hrsr_kitchen']
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
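The snippets above only collect the hrefs. A minimal sketch of the "open each link in its own tab" step could look like the following, assuming Selenium 4+ (for switch_to.new_window()) and the product URL from the question:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.amazon.com/dp/B074T6C4YS')
# Collect the three Best Sellers Rank hrefs as shown above
hrefs = [elem.get_attribute("href") for elem in WebDriverWait(driver, 20).until(
    EC.visibility_of_all_elements_located((By.CSS_SELECTOR,
        "table#productDetails_detailBullets_sections1 td>span>span a")))]
product_tab = driver.current_window_handle
for href in hrefs:
    driver.switch_to.new_window('tab')    # open a fresh tab (Selenium 4+)
    driver.get(href)
    # ... scrape the Best Sellers listing here ...
    driver.close()                        # close this tab when done
    driver.switch_to.window(product_tab)  # return to the product page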

This one is easy, even though you did not specify which link you want, only that you want all the different links from the table that take you to the Best Sellers pages.
You need to build the XPath dynamically, e.g.
'//*[@id="productDetails_detailBullets_sections1"]/tbody/tr[6]/td/span/span[' + str(i) + ']/a'
where i is your iterator in a for loop. To get the range of i, use something like
len(driver.find_elements_by_xpath('//*[@id="productDetails_detailBullets_sections1"]/tbody/tr[6]/td/span/span'))
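Put together, a rough sketch of that loop (using the pre-Selenium-4 find_elements_by_xpath() helpers this answer uses; XPath positions are 1-based):
base = '//*[@id="productDetails_detailBullets_sections1"]/tbody/tr[6]/td/span/span'
count = len(driver.find_elements_by_xpath(base))   # how many span entries exist
for i in range(1, count + 1):                      # XPath indices start at 1
    # not every span necessarily wraps a link, so use find_elements and skip empties
    for link in driver.find_elements_by_xpath(base + '[' + str(i) + ']/a'):
        print(link.get_attribute('href'))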

Related

Keep only an element of a webpage while web-scraping

I am trying to extract a table from a webpage with Python. I managed to get all the contents inside the table, but since I am very new to web scraping I don't know how to keep only the elements that I am looking for.
I know that I should look for this class in the code: <a class="_3BFvyrImF3et_ZF21Xd8SC" ...>, which identifies the items in the table.
So how can I keep only those elements and then extract their titles?
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Python" href="/r/Python/">r/Python</a>
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Java" href="/r/Java/">r/Java</a>
I failed miserably at writing code for that. I don't know how I could extract only these classes, so any input will be highly appreciated.
To extract the values of the title attributes you can use a list comprehension with either of the following Locator Strategies:
Using CSS_SELECTOR:
print([my_elem.get_attribute("title") for my_elem in driver.find_elements(By.CSS_SELECTOR, "a._3BFvyrImF3et_ZF21Xd8SC[title]")])
Using XPATH:
print([my_elem.get_attribute("title") for my_elem in driver.find_elements(By.XPATH, "//a[@class='_3BFvyrImF3et_ZF21Xd8SC' and @title]")])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
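For completeness, a self-contained sketch that waits for the anchors before reading title and href; the URL below is only a placeholder, since the question does not say which page the table lives on:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.reddit.com/")  # placeholder URL; point this at the page that contains the table
links = WebDriverWait(driver, 20).until(
    EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a._3BFvyrImF3et_ZF21Xd8SC[title]")))
for link in links:
    print(link.get_attribute("title"), link.get_attribute("href"))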
Okay, I made a very simple thing that worked.
Basically, I pasted the code into VS Code and selected all the occurrences of that class. Then I just had to copy and paste them into another file. Not sure why the shortcut Ctrl + Shift + L did not work, but I managed to get what I needed.
Select all occurrences of selected word in VSCode

Python Selenium How to take grandchild text

I am working with this website: https://www.offerte.smartpaws.de/
On the final page, once you have put in all your variables, you get a page with this info:
and the HTML for the price looks like this:
I am looking to store the price for each product, but it seems like I have to select the parent element based on the text "Optimal Care", "Classic Care", etc., and go down to the child element to extract the price.
I am not sure how to do this; my current code for this specific scenario is:
price_optimal = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//form[@class="summary-table-title" and @text="Optimal Care"]/*/*/'))).get_attribute("text()")
How would I get the text based on a parent element's text?
Thanks
If you want to retrieve only the price, then you can use the below XPath:
//div[@class='summary-table-price price-monthly']//div[2]
with find_elements or with explicit waits.
With find_elements
Code:
for price in driver.find_elements(By.XPATH, "//div[@class='summary-table-price price-monthly']//div[2]"):
    print(price.get_attribute('innerText'))
With ExplicitWaits:
for price in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='summary-table-price price-monthly']//div[2]"))):
    print(price.get_attribute('innerText'))
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
If you want to extract based on the plan name such as Classic Care
You should use the below XPath:
//div[text()='Classic Care']//following-sibling::div[@class='summary-table-price price-monthly']/descendant::div[2]
and use it like this:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Classic Care']//following-sibling::div[@class='summary-table-price price-monthly']/descendant::div[2]"))).get_attribute('innerText'))
Also, for the other plans, all you have to do is replace Classic Care with Optimal Care or Basic Care in the XPath.
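A small sketch that loops over the plan names mentioned in this answer (the capitalization in text()= must match the page exactly, so treat these labels as assumptions):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

for plan in ["Basic Care", "Classic Care", "Optimal Care"]:   # assumed plan labels
    price = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH,
        "//div[text()='" + plan + "']//following-sibling::div"
        "[@class='summary-table-price price-monthly']/descendant::div[2]")))
    print(plan, price.get_attribute('innerText'))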

Selenium not extracting info using xpath

I am trying to extract some information from the amazon website using selenium. But I am not able to scrape that information using xpath in selenium.
In the image below I want to extract the info highlighted.
This is the code I am using
try:
    path = "//div[@id='desktop_buybox']//div[@class='a-box-inner']//span[@class='a-size-small')]"
    seller_element = WebDriverWait(driver, 5).until(
        EC.visibility_of_element_located((By.XPATH, path)))
except Exception as e:
    print(e)
When I run this code, it shows that there is an error with seller_element = WebDriverWait(driver, 5).until( EC.visibility_of_element_located((By.XPATH, path))) but does not say what exception it is.
I tried looking online and found that this happens when selenium is not able to find the element in the webpage.
But I think the path I have specified is right. Please help me.
Thanks in advance
[EDIT-1]
This is the exception I am getting
Message:
XPath could be something like this (you can shorten it):
//div[@class='a-section a-spacing-none a-spacing-top-base']//span[@class='a-size-small a-color-secondary']
A CSS selector could be the following, and so forth:
.a-section.a-spacing-none.a-spacing-top-base .a-size-small.a-color-secondary
I think the reason is that the XPath expression is not correct.
Take the following element as an example; it means the span has two classes:
<span class="a-size-small a-color-secondary">
So, span[@class='a-size-small'] will not work.
Instead, you can use an XPath such as
//span[contains(@class, 'a-size-small') and contains(@class, 'a-color-secondary')]
or a CSS selector such as
span.a-size-small.a-color-secondary
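Wired into the original WebDriverWait call, that could look like this sketch (the CSS form span.a-size-small.a-color-secondary works the same way with By.CSS_SELECTOR; driver is assumed to already be on the product page):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

seller_element = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH,
    "//span[contains(@class, 'a-size-small') and contains(@class, 'a-color-secondary')]")))
print(seller_element.text)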
Amazon updates its content based on the country you are browsing from. When I clicked the link you provided, I could not find the element you are looking for, simply because the item is not sold here in India.
So, in short, if you are sitting in India and try to find your element, it is not there, but as soon as you change the location to "United States", it appears.
Solution: change the location.
To print the Ships from and sold by Amazon.com text of the element, you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR and get_attribute():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.a-section.a-spacing-none.a-spacing-top-base > span.a-size-small.a-color-secondary"))).get_attribute("innerHTML"))
Using XPATH and text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='a-section a-spacing-none a-spacing-top-base']/span[@class='a-size-small a-color-secondary']"))).text)
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
Outro
Links to useful documentation:
get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium

How can I get the text in Selenium?

I want to get the text of an element in selenium. First I did this:
team1_names = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".home span"))
)
for kir in team1_names:
    print(kir.text)
It didn’t work out. So I tried this:
team1_name = driver.find_elements_by_css_selector('.home span')
print(team1_name.getText())
so team1_name.text doesn’t work either.
So what's wrong with it?
You need to take care of a couple of things here:
presence_of_element_located() is the expectation for checking that an element is present on the DOM of a page. This does not necessarily mean that the element is visible.
Additionally, presence_of_element_located() returns a single WebElement, not a list, so iterating over it with a for loop won't work.
Solution
As a solution you need to induce WebDriverWait for the visibility_of_element_located(), and you can use either of the following Locator Strategies:
Using CSS_SELECTOR and text attribute:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".home span"))).text)
Using XPATH and get_attribute():
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, "//*[@class='home']//span"))).get_attribute("innerHTML"))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
Outro
Links to useful documentation:
The get_attribute() method gets the given attribute or property of the element.
The text attribute returns the text of the element.
Difference between text and innerHTML using Selenium
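If .home span actually matches several elements, as the original for loop suggests, a hedged sketch that waits for all of them and then iterates (driver is assumed to already be on the page in question):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

team1_names = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".home span")))
for name in team1_names:
    print(name.text)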

Order of found elements in Selenium

I'm using selenium with python to interact with a webpage.
There is a table in the webpage. I'm trying to access to its rows with this code:
rows = driver.find_elements_by_class_name("data-row")
It works as expected. It returns all elements of the table.
The question is, is the order of the returned elements guaranteed to be the same as they appear on the page?
For example, will the first row that I see in the table in the browser ALWAYS be at index 0 in the list?
You shouldn't depend on whether Selenium returns the elements in the same order as they appear on the webpage or in the DOM tree.
Each WebElement within the HTML DOM can be identified uniquely using either of the Locator Strategies.
Though you were able to pull out all the desired elements using find_elements_by_class_name() as follows:
rows = driver.find_elements_by_class_name("data-row")
Ideally, you need to induce WebDriverWait for the visibility_of_all_elements_located() and you can use either of the following Locator Strategies:
Using CLASS_NAME:
element = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "data-row")))
Using CSS_SELECTOR:
element = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".data-row")))
Using XPATH:
element = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@class='data-row']")))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a detailed discussion in WebDriverWait not working as expected
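As an illustration of addressing a row explicitly instead of relying on list position, a minimal sketch (XPath positions are 1-based, so (...)[1] is the first data-row as it appears in the DOM; driver is assumed to be on the page with the table):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

rows = WebDriverWait(driver, 20).until(
    EC.visibility_of_all_elements_located((By.CLASS_NAME, "data-row")))
print(len(rows), "rows found")
first_row = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, "(//*[@class='data-row'])[1]")))
print(first_row.text)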
