Python: find element by a href - python

I used webdriver Chrome to scrape data from a website, but I don't know how to extract data from a href.
HTM:
<div class="buySearchResultContent">
<ul id="CARS_LIST_DATA">
<li class="seo_list" data-seo_name="440285">
<div class="buySearchResultContentImg">
<a href="carinfo-333285.php">
<img src="carpics/9400180056/290x200/20180305101502854_4567823.jpg" srcset="carpics/9400180056/290x200/20180305101502854_9098765.jpg 290w, carpics/9400180056/435x300/20180305101502854_00000.jpg 435w , carpics/9400180056/720x520/20180305101502854_00001.jpg 720w" sizes="(min-width: 992px) 75vw, 90vw" alt="auto">
</a>
My code:
driver = webdriver.Chrome("C:/chromedriver.exe")
url = "https://www.asdf.com.tw/price-02.php?v=3&brand=lisa&model=lulu&year1=2009&year2=2018&page=1"
driver.get(url)
content=driver.find_element_by_class_name('buySearchResultContentImg')
print(content)
What I want to extract is "carinfo-333285.php". Thanks!

Try the following code:
from selenium.common.exceptions import NoSuchElementException
try:
a_element = driver.find_element_by_xpath('//div[contains(#class,
"buySearchResultContentImg")]/a[#href]')
link = a_element.get_attribute("href")
except NoSuchElementException:
link = None

As per the HTML you have provided to extract the href attribute you can use either of the following Locator Strategies :
css_selector :
myHref = driver.find_element_by_css_selector("div.buySearchResultContentImg > a").get_attribute("href")
xpath :
myHref = driver.find_element_by_xpath("//div[#class='buySearchResultContentImg']/a").get_attribute("href")

I don't know too much about python, Please try this
Jpg_href= driver.find_element_by_xpath("//div[#class='buySearchResultContentImg']/a[#href='carinfo-333285.php']").get_attribute("href")

Related

Click and scrape 'a href' links by class name using Selenium in Python

I have the following a href link with only a class identifier. I'm trying to have Selenium recursively click through each link, but Selenium isn't returning the proper page sources from each 'a href' links.
<div class="row">
<div class="item">
↳<a href /path/to/link/ class="link-box">
<div class="item">
<div class="item">
<div class="item">
What am I doing wrong here:
driver = webdriver.Chrome("/Users/me/Downloads/chromedriver", options=options)
driver.get("https://the_website")
link_box = driver.find_elements_by_class_name('link-box')
for i in range(len(link_box)):
driver.execute_script("arguments[0].click();", link_box[i])
page_source = driver.page_source
pprint(page_source)
I wrote another code to do it.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
#driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver = webdriver.Firefox(executable_path='geckodriver')
driver.get("url")
l=[]
for a in driver.find_elements_by_class_name('link-box'):
link = a.get_attribute('href')
l.append(link)
print(l)
for b in range(len(l)):
driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[b+1])
driver.get(l[b])
print(l[b])
First, it will take all the link which has class link-box. Then it will open all the links in new tabs. Otherwise, there might be an error. I did this with Firefox but if you are doing with Chrome comment line 4 and uncomment line 3. Then give the right path.

Get links from a certain div using Selenium in Python

I have the following HTML page. I want to get all the links inside a specific div. Here is my HTML code:
<div class="rec_view">
<a href='www.xyz.com/firstlink.html'>
<img src='imga.png'>
</a>
<a href='www.xyz.com/seclink.html'>
<img src='imgb.png'>
</a>
<a href='www.xyz.com/thrdlink.html'>
<img src='imgc.png'>
</a>
</div>
I want to get all the links that are present on the rec_view div. So those links that I want are,
www.xyz.com/firstlink.html
www.xyz.com/seclink.html
www.xyz.com/thrdlink.html
Here is the Python code which I tried with
from selenium import webdriver;
webpage = r"https://www.testurl.com/page/123/"
driver = webdriver.Chrome("C:\chromedriver_win32\chromedriver.exe")
driver.get(webpage)
element = driver.find_element_by_css_selector("div[class='rec_view']>a")
link = element.get_attribute("href")
print(link)
How can I get those links using selenium on Python?
As per the HTML you have shared to get the list of all the links that are present on the rec_view div you can use the following code block :
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe')
driver.get('https://www.testurl.com/page/123/')
elements = driver.find_elements_by_css_selector("div.rec_view a")
for element in elements:
print(element.get_attribute("href"))
Note : As you need to collect all the href attributes from the div tag so instead of find_element_* you need to use find_elements_*. Additionally, > refers to immediate <a> child node where as you need to traverse all the <a> child nodes so the desired css_selector will be div.rec_view a

I m not able to click on a button in a web page using selenium in python

What is wrong in the below code
import os
import time
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://x.x.x.x/html/load.jsp")
elm1 = driver.find_element_by_link_text("load")
time.sleep(10)
elm1.click()
time.sleep(30)
driver.close()
The page source is
<body>
<div class="formcenterdiv">
<form class="form" action="../load" method="post">
<header class="formheader">Loader</header>
<div align="center"><button class="formbutton">load</button></div>
</form>
</div>
</body></html>
I want to click on button load. when I ran the above code getting this error
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: load
As the documentation says, find_elements_by_link_text only works on a tags:
Use this when you know link text used within an anchor tag. With this
strategy, the first element with the link text value matching the
location will be returned. If no element has a matching link text
attribute, a NoSuchElementException will be raised.
The solution is to use a different selector like find_element_by_class_name:
elm1 = driver.find_element_by_class_name('formbutton')
Did you try using Xpath?
As the OP said, find_elements_by_link_text works on a tags only:
Below code might help you out
driver.get_element_by_xpath("/html/body/div/form/div/button")

Getting href value of a tag of selenium web element

I want to get the url of the link of tag. I have attached the class of the element to type selenium.webdriver.remote.webelement.WebElement in python:
elem = driver.find_elements_by_class_name("_5cq3")
and the html is:
<div class="_5cq3" data-ft="{"tn":"E"}">
<a class="_4-eo" href="/9gag/photos/a.109041001839.105995.21785951839/10153954245456840/?type=1" rel="theater" ajaxify="/9gag/photos/a.109041001839.105995.21785951839/10153954245456840/?type=1&src=https%3A%2F%2Fscontent.xx.fbcdn.net%2Fhphotos-xfp1%2Ft31.0-8%2F11894571_10153954245456840_9038620401603938613_o.jpg&smallsrc=https%3A%2F%2Fscontent.xx.fbcdn.net%2Fhphotos-prn2%2Fv%2Ft1.0-9%2F11903991_10153954245456840_9038620401603938613_n.jpg%3Foh%3D0c837ce6b0498cd833f83cfbaeb577e7%26oe%3D567D8819&size=651%2C1000&fbid=10153954245456840&player_origin=profile" style="width:256px;">
<div class="uiScaledImageContainer _4-ep" style="width:256px;height:394px;" id="u_jsonp_2_r">
<img class="scaledImageFitWidth img" src="https://fbcdn-photos-h-a.akamaihd.net/hphotos-ak-prn2/v/t1.0-0/s526x395/11903991_10153954245456840_9038620401603938613_n.jpg?oh=15f59e964665efe28943d12bd00cefd9&oe=5667BDBA&__gda__=1448928574_a7c6da855842af4c152c2fdf8096e1ef" alt="9GAG's photo." width="256" height="395">
</div>
</a>
</div>
I want the href value of the a tag falling inside the class _5cq3.
Why not do it directly?
url = driver.find_element_by_class_name("_4-eo").get_attribute("href")
And if you need the div element first you can do it this way:
divElement = driver.find_elements_by_class_name("_5cq3")
url = divElement.find_element_by_class_name("_4-eo").get_attribute("href")
or another way via xpath (given that there is only one link element inside your 5cq3 Elements:
url = driver.find_element_by_xpath("//div[#class='_5cq3']/a").get_attribute("href")
You can use xpath for same
If you want to take href of "a" tag, 2nd line according to your HTML code then use
url = driver.find_element_by_xpath("//div[#class='_5cq3']/a[#class='_4-eo']").get_attribute("href")
If you want to take href of "img" tag, 4nd line according to your HTML code then use
url = driver.find_element_by_xpath("//div[#class='_5cq3']/a/div/img[#class='scaledImageFitWidth img']").get_attribute("href")
Use:
1)
xpath to specify the path to the href first.
x = '//a[#class="_4-eo"]'
k = driver.find_elements_by_xpath(x).get_attribute("href")
for url in k:
print url
2) Use #drkthng's solution(the simplest).
3)You can use:
parentElement = driver.find_elements_by_class("_4-eo")
elementList = parentElement.find_elements_by_tag_name("href")
You can use whatever you want in Selenium. there are 2-3 more ways to find the same.
And for image src use below xpath:
img_path = '//div[#class="uiScaledImageContainer _4-ep"]//img[#src]'

How to get link from elements with Selenium and Python

Let's say all Author/username elements in one webpage look like this...
How can I get to the href part using python and Selenium?
users = browser.find_elements_by_xpath(?)
<span>
Author:
<a href="/account/57608-bob">
bob
</a>
</span>
Thanks.
Use find_elements_by_tag_name('a') to find the 'a' tags, and then use get_attribute('href') to get the link string.
Use .//span[contains(text(), "Author")]/a as xpath expression.
For example:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://jsfiddle.net/9pKMU/show/')
for a in driver.find_elements_by_xpath('.//span[contains(text(), "Author")]/a'):
print(a.get_attribute('href'))
Using this code you can get the all links from a webpage
from selenium import webdriver
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://your website/")
# identify elements with tagname <a>
lnks=driver.find_elements_by_tag_name("a")
# traverse list
for lnk in lnks:
# get_attribute() to get all href
print(lnk.get_attribute("href"))
driver.quit()

Categories

Resources