Python - extract image source address from a Selenium div element

We are working on extracting the image source address from the page.
<div class="product-row">
<div class="product-item">
<div class="product-picture"><img src="https://t3a.coupangcdn.com/thumbnails/remote/212x212ex/image/vendor_inventory/6ca9/2e097d911efc291473d0c47052cdc8f42d7b7b8f2a3ebbb0ccc974d76fe4.jpg" alt="product"><div><button type="button" class="ant-btn hover-btn btn-open-detail">
</div></div>
<div class="product-item">
<div class="product-picture">
<img src="https://thumbnail11.coupangcdn.com/thumbnails/remote/212x212ex/image/retail/images/239519218793467-6edc7d92-4165-4476-a528-fa238ffeeeb6.jpg" alt="product"><div></div></div>
I tried to get it in the following way:
ele = driver.find_elements_by_xpath("//div[@class='product-picture']/img")
print(ele)
Output:
<selenium.webdriver.remote.webelement.WebElement (session="d9fd08b93bd5dd83fe520826c1f6fd77", element="27ef8c33-624d-4166-9dc7-3a355c4dcc32")>
<selenium.webdriver.remote.webelement.WebElement (session="d9fd08b93bd5dd83fe520826c1f6fd77", element="a6d77107-fecf-4c84-a048-9b4bda39b9df")>
<selenium.webdriver.remote.webelement.WebElement (session="d9fd08b93bd5dd83fe520826c1f6fd77", element="1f62cb8b-df58-4f06-afe6-6c60cb572527")>
I want the image source address string of every <div class="product-picture"> element on the page. Is there a way to extract a string?

from selenium.webdriver.common.by import By

images = driver.find_elements(By.XPATH, "//div[@class='product-picture']/img")
for img in images:
    print(img.get_attribute("src"))
This will give you the expected output:
https://t3a.coupangcdn.com/thumbnails/remote/212x212ex/image/vendor_inventory/6ca9/2e097d911efc291473d0c47052cdc8f42d7b7b8f2a3ebbb0ccc974d76fe4.jpg
https://thumbnail11.coupangcdn.com/thumbnails/remote/212x212ex/image/retail/images/239519218793467-6edc7d92-4165-4476-a528-fa238ffeeeb6.jpg

Try using the get_attribute('src') method to grab the src value. Note that find_elements_by_xpath() returns a list, so you cannot call get_attribute() on it directly; iterate instead:
srcs = [ele.get_attribute('src') for ele in driver.find_elements_by_xpath("//div[@class='product-picture']/img")]

You are using deprecated syntax. Please see Python Selenium warning "DeprecationWarning: find_element_by_* commands are deprecated"
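In Selenium 4 the equivalent lookup goes through the By API; a minimal sketch of the old-to-new mapping:
from selenium.webdriver.common.by import By

# Deprecated: driver.find_elements_by_xpath("//div[@class='product-picture']/img")
# Selenium 4 equivalent:
images = driver.find_elements(By.XPATH, "//div[@class='product-picture']/img")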
The optimal way of locating elements that are likely to be lazy-loaded is:
images = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='product-picture']/img")))
for i in images:
    print(i.get_attribute('src'))
You will also need the following imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Selenium docs can be found at https://www.selenium.dev/documentation/

Related

Find specific link on a website with Selenium (in Python)

I'm trying to scrape specific links on a website. I'm using Python and Selenium 4.8.
The HTML code looks like this, with multiple list items, each containing a link:
<li>
<div class="programme programme xxx" >
<div class="programme_body">
<h4 class="programme titles">
<a class="br-blocklink__link" href="https://www.example_link1.com">
</a>
</h4>
</div>
</div>
</li>
<li>...</li>
<li>...</li>
So each <li> contains a link.
Ideally, I would like a Python list with all the hrefs, which I can then iterate through to get additional output.
Thank you for your help!
You can try something like the below (untested, as you didn't confirm the URL):
[...]
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
[...]
wait = WebDriverWait(driver, 25)
[...]
wanted_elements = [x.get_attribute('href') for x in wait.until(EC.presence_of_all_elements_located((By.XPATH, '//li//h4[@class="programme titles"]/a[@class="br-blocklink__link"]')))]
Selenium documentation can be found at https://www.selenium.dev/documentation/
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.example.com")
lis = driver.find_elements(By.XPATH, '//li//a[@class="br-blocklink__link"]')
hrefs = []
for li in lis:
    hrefs.append(li.get_attribute('href'))
driver.quit()
This will give you a list, hrefs, with all the hrefs from the website. You can then iterate through this list and use the links for further processing.
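For example, a minimal sketch of that follow-up iteration (printing each page title is just a hypothetical placeholder for your real processing; if you reuse the same driver, call driver.quit() only after this loop):
for href in hrefs:
    driver.get(href)
    print(driver.title)  # placeholder for whatever processing you need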

Not able to target 2nd matching element by XPATH

I have HTML content like this:
<div class="ng-star-inserted">
<span class="ng-star-inserted">
Seismic interpreter
</span>
<span class="ng-star-inserted">
Geophysicist
</span>
</div>
I have a Selenium script where I am trying to get the first role name and the second role name.
I tried this, but it's not giving me the 1st and 2nd elements as expected. Any idea what I am missing? Thanks!
'//div[@class="ng-star-inserted"]//span//a[1]'
'//div[@class="ng-star-inserted"]//span//a[2]'
You can use XPath indexing in this case.
So instead of this:
'//div[@class="ng-star-inserted"]//span//a[1]'
'//div[@class="ng-star-inserted"]//span//a[2]'
use this:
"(//div[@class='ng-star-inserted']//span//a)[1]"
"(//div[@class='ng-star-inserted']//span//a)[2]"
Or, if you think the text won't change, you can also make use of LINK_TEXT or PARTIAL_LINK_TEXT, as shown below.
Sample code:
wait = WebDriverWait(driver, 10)
button1 = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "Seismic interpreter")))
button1.click()
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
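For completeness, a matching sketch for the PARTIAL_LINK_TEXT variant mentioned above (assuming, as the answer does, that the role names are rendered as links):
button2 = wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "Geophysicist")))
button2.click()  # matches a link whose visible text merely contains the given fragment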
So you can use the below approach:
((//a[contains(text(),'Seismic interpreter')])[2])
((//a[contains(text(),'Geophysicist')])[2])
clickLink = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "((//a[contains(text(),'Seismic interpreter')])[2])")))
clickLink.click()

I keep getting a NoSuchElementException in selenium even though the element DOES exist

This is my first scraping project using Selenium. I'm trying to download a couple of Reddit videos saved in a list using this website: https://keepv.id/reddit-video-downloader
The site shows an input tag where I need to enter the URL of the video or GIF, then go to the download page. That input tag has the class name form-control form-control-lg form-control-alternative. So when I try to get that element so I can fill it with a link from a list in Python, I get a selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: .form-control form-control-lg form-control-alternative error.
You can check for yourself using the Developer Tools: the input tag really does have that class.
Here's my code:
for gif in gif_list:
    driver.get('https://keepv.id/reddit-video-downloader')
    input_tag = driver.find_element_by_class_name('form-control form-control-lg form-control-alternative')
    input_tag.send_keys(gif)
    go_button = driver.find_element_by_class_name('btn btn-danger')
    go_button.click()
    second_button = driver.find_element_by_class_name('btn btn-danger sheen waggle spin')
    second_button.click()
    WebDriverWait(driver, 5).until(expected_conditions.presence_of_element_located((By.CLASS_NAME, 'row')))
    download_button = driver.find_element_by_class_name('btn btn-lg btn-danger mb-3 shadow vdlbtn')
    download_button.click()
    gif_url = driver.current_url
    download_gif(gif_url)
find_element_by_class_name() accepts a single class name only; use a CSS selector instead:
driver.find_element_by_css_selector(".form-control.form-control-lg.form-control-alternative").send_keys(gif)
driver.find_element_by_css_selector(".btn.btn-danger").click()
Or you can use find_element_by_id():
driver.find_element_by_id("dlURL").send_keys(gif)
driver.find_element_by_id("dlBTN1").click()
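If you are on Selenium 4, where the find_element_by_* helpers are removed, the same id lookups go through the By API (a sketch, keeping the dlURL and dlBTN1 ids from above):
from selenium.webdriver.common.by import By

driver.find_element(By.ID, "dlURL").send_keys(gif)
driver.find_element(By.ID, "dlBTN1").click()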
Ideally, you should use WebDriverWait(), wait for element_to_be_clickable(), and use the following CSS selector:
WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.CSS_SELECTOR,".form-control.form-control-lg.form-control-alternative"))).send_keys(gif)
To click the Go button:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".btn.btn-danger"))).click()
You need the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
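Putting it together, a sketch of the original loop with these fixes applied (selectors and ids taken from the answers above; untested against the live site):
wait = WebDriverWait(driver, 20)
for gif in gif_list:
    driver.get('https://keepv.id/reddit-video-downloader')
    # Fill the URL field and click the Go button
    wait.until(EC.element_to_be_clickable(
        (By.CSS_SELECTOR, '.form-control.form-control-lg.form-control-alternative'))).send_keys(gif)
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn.btn-danger'))).click()
    # Click through to the download page, then follow the download button
    wait.until(EC.element_to_be_clickable(
        (By.CSS_SELECTOR, '.btn.btn-danger.sheen.waggle.spin'))).click()
    wait.until(EC.element_to_be_clickable(
        (By.CSS_SELECTOR, '.btn.btn-lg.btn-danger.mb-3.shadow.vdlbtn'))).click()
    download_gif(driver.current_url)  # current_url is a property, not a method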

Wait for every element in Selenium

I want to retrieve every div class='abcd' element from a website using Selenium together with the waiter and XPATH helpers from the explicit package.
The source code is something like this:
<div class='abcd'>
<a> Something </a>
</div>
<div class='abcd'>
<a> Something else </a>
...
When I run the following code (Python) I get only 'Something' as a result. I'd like to iterate over every instance of the div class='abcd' appearing in the source code of the website.
from explicit import waiter, XPATH
from selenium import webdriver
driver = webdriver.Chrome(PATH)
result = waiter.find_element(driver, "//div[@class='abcd']/a", by=XPATH).text
Sorry if the explanation isn't very technical; I'm only starting with web scraping. Thanks!
I've done it like this; you can use this procedure too if you like:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(PATH)
css_selector = "div.abcd"
results = WebDriverWait(driver, 10).until(expected_conditions.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector)))
for result in results:
    print(result.text)
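If you prefer to keep the original XPath from the question, the same wait works with By.XPATH (a sketch):
results = WebDriverWait(driver, 10).until(expected_conditions.presence_of_all_elements_located((By.XPATH, "//div[@class='abcd']/a")))
for result in results:
    print(result.text)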

Finding li element by text inside - Selenium Python

I have been trying to find a way to click a specific li element using the text inside it; however, whatever I try fails to find the element.
The HTML is:
<ul class="shoes-sizen-mp" id="ul_top_hypers" style="overflow: auto;">
<li id="li_284" onclick="return select_size('284')"><a class="a_top_hypers"> 3.5 <span style="display: none">,</span><br> 35.5<br></a></li>
<li id="li_285" onclick="return select_size('285')"><a class="a_top_hypers"> 4 <span style="display: none">,</span><br> 36<br></a></li>
This is part of a list where you can select your size (such as 3.5, 4, 4.5, 5). I want to be able to click a specific one using its text, such as 3.5, for example.
Edit
driver.find_element_by_xpath(f"//ul[@id='ul_top_hypers' and starts-with(@id, 'li_')][contains(text(),'{user_shoe_size}')]").click()
sleep(20)
Above is one of the many things I have tried to locate the element, but no luck just yet.
Any help would be greatly appreciated. Thanks a lot!
To click on the dynamic element, induce WebDriverWait(), wait for element_to_be_clickable(), and use the following XPath:
def click_size(size):
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//ul[@class='shoes-sizen-mp']//li[contains(.,'{}')]".format(size)))).click()

click_size("3.5")
You need the following imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
