How to get link from elements with Selenium and Python - python

Let's say all Author/username elements on a webpage look like this:
<span>
    Author:
    <a href="/account/57608-bob">
        bob
    </a>
</span>
How can I get the href value using Python and Selenium?
users = browser.find_elements_by_xpath(?)
Thanks.

Use find_elements_by_tag_name('a') to find the 'a' tags, and then use get_attribute('href') to get the link string.

Use .//span[contains(text(), "Author")]/a as the XPath expression.
For example:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://jsfiddle.net/9pKMU/show/')
for a in driver.find_elements_by_xpath('.//span[contains(text(), "Author")]/a'):
    print(a.get_attribute('href'))
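Recent Selenium 4 releases no longer provide the find_elements_by_* helpers and expect find_elements(by, value) instead. Below is a minimal sketch of the same lookup in that style. The helper name author_links is made up for illustration, and the string "xpath" is the value of By.XPATH, spelled out so the function can also be tried with a stub driver.

```python
# Locator strategy string; identical to selenium.webdriver.common.by.By.XPATH.
XPATH = "xpath"
AUTHOR_LINK_XPATH = './/span[contains(text(), "Author")]/a'


def author_links(driver):
    """Return the href of every <a> that sits directly inside an 'Author' span.

    `driver` is assumed to be a Selenium 4 WebDriver (or any object that
    offers the same find_elements(by, value) method).
    """
    anchors = driver.find_elements(XPATH, AUTHOR_LINK_XPATH)
    return [a.get_attribute("href") for a in anchors]
```

With a real browser this would be called as author_links(driver) after driver.get(...).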

With this code you can get all the links from a webpage:
from selenium import webdriver
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://your website/")
# identify elements with tagname <a>
lnks = driver.find_elements_by_tag_name("a")
# traverse list
for lnk in lnks:
    # get_attribute() returns the href of each element
    print(lnk.get_attribute("href"))
driver.quit()
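A page-wide scan like this can return anchors that have no href (get_attribute then gives None) as well as repeated URLs. Here is a small post-processing sketch in plain Python that could be applied to the collected strings. unique_links is a hypothetical helper name, not part of Selenium.

```python
def unique_links(hrefs):
    """Drop missing/empty hrefs and duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href or href in seen:
            continue
        seen.add(href)
        result.append(href)
    return result
```

For example: unique_links(lnk.get_attribute("href") for lnk in lnks).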

Related

Get an empty list of XPATH expression in python

I watched a video at https://www.youtube.com/watch?v=EELySnTPeyw and this is the code (I changed the XPath, as the website seems to have changed):
import selenium.webdriver as webdriver
def get_results(search_term):
    url = 'https://www.startpage.com'
    browser = webdriver.Chrome(executable_path="D:\\webdrivers\\chromedriver.exe")
    browser.get(url)
    search_box = browser.find_element_by_id('q')
    search_box.send_keys(search_term)
    results = []
    try:
        links = browser.find_elements_by_xpath("//a[contains(@class, 'w-gl__result-title')]")
    except Exception:
        links = browser.find_elements_by_xpath("//h3//a")
    print(links)
    for link in links:
        href = link.get_attribute('href')
        print(href)
        results.append(href)
    browser.close()
get_results('cat')
The part that opens the browser, navigates to the search box, and sends the keys works well, but links comes back as an empty list, even though the same XPath returns 10 results when I search for it manually in the developer tools.
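You need to send Keys.ENTER after the search term. Without it the search is never submitted, so the browser is still on the start page and there are no result links to find.
search_box.send_keys(search_term + Keys.ENTER)
This requires the import:
from selenium.webdriver.common.keys import Keys
Output: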
https://en.wikipedia.org/wiki/Cat
https://www.cat.com/en_US.html
https://www.cat.com/
https://www.youtube.com/watch?v=cbP2N1BQdYc
https://icatcare.org/advice/thinking-of-getting-a-cat/
https://www.caterpillar.com/en/brands/cat.html
https://www.petfinder.com/cats/
https://www.catfootwear.com/US/en/home
https://www.aspca.org/pet-care/cat-care/general-cat-care
https://www.britannica.com/animal/cat
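Even after the search is submitted, the result links may take a moment to appear, so an immediate find_elements call can still come back empty. Selenium ships WebDriverWait for this purpose. The sketch below is a plain-Python stand-in that only shows the idea of polling until a non-empty list is returned. The name wait_for_nonempty and its default timings are assumptions made for illustration.

```python
import time


def wait_for_nonempty(fetch, timeout=10.0, poll=0.5):
    """Call fetch() repeatedly until it returns something non-empty.

    Gives up once `timeout` seconds have passed and returns the last
    (possibly empty) result.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = fetch()
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(poll)
```

It could be used as links = wait_for_nonempty(lambda: browser.find_elements_by_xpath("//h3//a")).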

Click and scrape 'a href' links by class name using Selenium in Python

I have the following a href link, which has only a class to identify it. I'm trying to have Selenium click through each link in turn, but Selenium isn't returning the proper page source for each of the 'a href' links.
<div class="row">
    <div class="item">
        <a href="/path/to/link/" class="link-box">
    <div class="item">
    <div class="item">
    <div class="item">
What am I doing wrong here:
driver = webdriver.Chrome("/Users/me/Downloads/chromedriver", options=options)
driver.get("https://the_website")
link_box = driver.find_elements_by_class_name('link-box')
for i in range(len(link_box)):
    driver.execute_script("arguments[0].click();", link_box[i])
    page_source = driver.page_source
    pprint(page_source)
Here is a different approach:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
#driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver = webdriver.Firefox(executable_path='geckodriver')
driver.get("url")
l=[]
for a in driver.find_elements_by_class_name('link-box'):
    link = a.get_attribute('href')
    l.append(link)
print(l)
for b in range(len(l)):
    driver.execute_script("window.open('');")
    driver.switch_to.window(driver.window_handles[b+1])
    driver.get(l[b])
    print(l[b])
First, it collects the href of every element with the class link-box. Then it opens each link in a new tab; loading them in the original tab might cause an error. I did this with Firefox, but if you are using Chrome, comment out line 4 and uncomment line 3, then give the right driver path.
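The original loop most likely fails because clicking a link navigates away from the page, after which the element references collected there are no longer valid. Reading all href values first avoids that. If separate tabs are not needed, the same idea can be sketched with plain driver.get calls in a single tab. collect_page_sources is a hypothetical helper name.

```python
def collect_page_sources(driver, urls):
    """Load each URL in turn and return a {url: page_source} dict.

    `driver` is assumed to be a Selenium WebDriver; only get() and
    page_source are used.
    """
    sources = {}
    for url in urls:
        if not url:  # skip anchors that had no href
            continue
        driver.get(url)
        sources[url] = driver.page_source
    return sources
```

With the list from the answer above, this would be called as collect_page_sources(driver, l).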

Selenium how to extract href and label name Python?

I'm trying to pull the href and the data-promoname from this URL:
https://www2.deloitte.com/global/en/pages/about-deloitte/topics/combating-covid-19-with-resilience.html?icid=covid-19_article-nav
With the code below I can only extract the href of elements with the class "promo-focus", but I also want to get the data-promoname value, for example "COVID-19 Economic cases: Scenarios for business leaders".
driver = webdriver.Chrome(executable_path=r'C:\chromedriver.exe')
url = "https://www2.deloitte.com/global/en/pages/about-deloitte/topics/combating-covid-19-with-resilience.html?icid=covid-19_article-nav"
driver.get(url)
for i in driver.find_elements_by_class_name('promo-focus'):
    print(i.get_attribute('href'))
Can anyone tell me how to do that using Python?
Try the .text property to get the link's visible text.
Example:
from selenium import webdriver
chrome_browser = webdriver.Chrome()
url = "https://www2.deloitte.com/global/en/pages/about-deloitte/topics/combating-covid-19-with-resilience.html?icid=covid-19_article-nav"
chrome_browser.get(url)
for a in chrome_browser.find_elements_by_class_name('promo-focus'):
    print(a.get_attribute('href'))
    print(a.text)
To get the value of data-promoname, use the .get_attribute method. It returns the value of any attribute present on the element's tag.
from selenium import webdriver
driver_path = 'C:/chromedriver.exe' #the path to your chrome driver
browser = webdriver.Chrome(driver_path)
url_to_open = 'https://www2.deloitte.com/global/en/pages/about-deloitte/topics/combating-covid-19-with-resilience.html?icid=covid-19_article-nav'
browser.get(url_to_open)
for a in browser.find_elements_by_class_name('promo-focus'):
    print(a.get_attribute('href'))
    print(a.get_attribute("data-promoname"))
If you are looking for the content displayed on the page inside the anchor tags, you can use .text instead:
print(a.text)
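Putting the two answers together, each link's href, data-promoname, and visible text can be gathered in one pass. Here is a small sketch. link_records is a hypothetical helper, and it relies only on get_attribute() and .text, which the answers above already use.

```python
def link_records(elements, attributes=("href", "data-promoname")):
    """Build one dict per element with the requested attributes plus its text."""
    records = []
    for element in elements:
        record = {name: element.get_attribute(name) for name in attributes}
        record["text"] = element.text.strip()
        records.append(record)
    return records
```

For example: link_records(browser.find_elements_by_class_name('promo-focus')).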

Python: find element by a href

I used the Chrome webdriver to scrape data from a website, but I don't know how to extract the value of an a href.
HTML:
<div class="buySearchResultContent">
<ul id="CARS_LIST_DATA">
<li class="seo_list" data-seo_name="440285">
<div class="buySearchResultContentImg">
<a href="carinfo-333285.php">
<img src="carpics/9400180056/290x200/20180305101502854_4567823.jpg" srcset="carpics/9400180056/290x200/20180305101502854_9098765.jpg 290w, carpics/9400180056/435x300/20180305101502854_00000.jpg 435w , carpics/9400180056/720x520/20180305101502854_00001.jpg 720w" sizes="(min-width: 992px) 75vw, 90vw" alt="auto">
</a>
My code:
driver = webdriver.Chrome("C:/chromedriver.exe")
url = "https://www.asdf.com.tw/price-02.php?v=3&brand=lisa&model=lulu&year1=2009&year2=2018&page=1"
driver.get(url)
content=driver.find_element_by_class_name('buySearchResultContentImg')
print(content)
What I want to extract is "carinfo-333285.php". Thanks!
Try the following code:
from selenium.common.exceptions import NoSuchElementException
try:
    a_element = driver.find_element_by_xpath('//div[contains(@class, "buySearchResultContentImg")]/a[@href]')
    link = a_element.get_attribute("href")
except NoSuchElementException:
    link = None
Based on the HTML you have provided, you can extract the href attribute with either of the following locator strategies:
css_selector:
myHref = driver.find_element_by_css_selector("div.buySearchResultContentImg > a").get_attribute("href")
xpath:
myHref = driver.find_element_by_xpath("//div[@class='buySearchResultContentImg']/a").get_attribute("href")
I don't know much about Python, but please try this:
Jpg_href= driver.find_element_by_xpath("//div[@class='buySearchResultContentImg']/a[@href='carinfo-333285.php']").get_attribute("href")
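One caveat: as far as I know, get_attribute("href") hands back the fully resolved URL (something like https://www.asdf.com.tw/carinfo-333285.php) rather than the literal attribute text. Selenium 4 also offers get_dom_attribute("href"), which should return the value as written in the HTML. If only the resolved URL is available, the file name can be recovered with the standard library. href_file_name is a hypothetical helper name.

```python
import posixpath
from urllib.parse import urlsplit


def href_file_name(href):
    """Return the last path segment of a URL, ignoring query and fragment."""
    return posixpath.basename(urlsplit(href).path)
```

For example, href_file_name(link) could be applied to the link value obtained in the answers above.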

Get links from a certain div using Selenium in Python

I have the following HTML page. I want to get all the links inside a specific div. Here is my HTML code:
<div class="rec_view">
<a href='www.xyz.com/firstlink.html'>
<img src='imga.png'>
</a>
<a href='www.xyz.com/seclink.html'>
<img src='imgb.png'>
</a>
<a href='www.xyz.com/thrdlink.html'>
<img src='imgc.png'>
</a>
</div>
I want to get all the links inside the rec_view div. The links I want are:
www.xyz.com/firstlink.html
www.xyz.com/seclink.html
www.xyz.com/thrdlink.html
Here is the Python code I tried:
from selenium import webdriver
webpage = r"https://www.testurl.com/page/123/"
driver = webdriver.Chrome(r"C:\chromedriver_win32\chromedriver.exe")
driver.get(webpage)
element = driver.find_element_by_css_selector("div[class='rec_view']>a")
link = element.get_attribute("href")
print(link)
How can I get those links using Selenium in Python?
Based on the HTML you have shared, you can get the list of all the links inside the rec_view div with the following code block:
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe')
driver.get('https://www.testurl.com/page/123/')
elements = driver.find_elements_by_css_selector("div.rec_view a")
for element in elements:
    print(element.get_attribute("href"))
Note: Because you need all the href attributes inside the div, use find_elements_* instead of find_element_*, which returns only the first match. Additionally, > selects only immediate <a> children, whereas a space selects <a> elements at any depth, so the desired css_selector is div.rec_view a
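The difference between a direct child and any descendant can be tried out without a browser. The sketch below uses the standard library's ElementTree, whose path syntax ./a and .//a plays the same role as the CSS selectors div.rec_view > a and div.rec_view a. The nested <p> link is an addition for illustration and is not part of the question's HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: one <a> directly under the div, one nested in a <p>.
SNIPPET = """
<div class="rec_view">
  <a href="www.xyz.com/firstlink.html"><img src="imga.png"/></a>
  <p><a href="www.xyz.com/nested.html">nested</a></p>
</div>
"""

root = ET.fromstring(SNIPPET)  # root is the <div> element

# "./a" matches only <a> elements that are direct children of the div.
direct = [a.get("href") for a in root.findall("./a")]

# ".//a" matches <a> elements at any depth below the div.
any_depth = [a.get("href") for a in root.findall(".//a")]

print(direct)     # ['www.xyz.com/firstlink.html']
print(any_depth)  # ['www.xyz.com/firstlink.html', 'www.xyz.com/nested.html']
```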
