Python Selenium - get href - python

I want to get only some of the links on a page, but I get all of them instead. How can I pull specific links by specifying a selector?
For example, I'm using:
ids = browser.find_elements_by_xpath('//a[@href]')
for ii in ids:
    print(ii.get_attribute('href'))
Result: all links.
But I only want the links that match a particular selector, such as:
<a class="classifiedTitle" title="MONSTER TULPAR T5V13.1+ 15,6 EKRAN+İ7+6GB GTX1060+16RAM+256GB SS" href="/ilan/ikinci-el-ve-sifir-alisveris-bilgisayar-dizustu-notebook-monster-tulpar-t5v13.1-plus-15%2C6-ekran-plusi7-plus6gb-gtx1060-plus16ram-plus256gb-ss-793070526/detay">
MONSTER TULPAR T5V13.1+ 15,6 EKRAN+İ7+6GB GTX1060+16RAM+256GB SS</a>
So how can I add some selectors?
Thanks & Regards

Try the following css selector to get the specific link.
print(browser.find_element_by_css_selector("a.classifiedTitle[title='MONSTER TULPAR T5V13.1+ 15,6 EKRAN+İ7+6GB GTX1060+16RAM+256GB SS'][href*='/ilan/ikinci-el-ve-sifir-alisveris-bilgisayar-dizustu-notebook-monster-tulpar-']").get_attribute("href"))

If you want just the item in your example:
href = browser.find_element_by_xpath("//a[@title='MONSTER TULPAR T5V13.1+ 15,6 EKRAN+İ7+6GB GTX1060+16RAM+256GB SS' and @href='/ilan/ikinci-el-ve-sifir-alisveris-bilgisayar-dizustu-notebook-monster-tulpar-t5v13.1-plus-15%2C6-ekran-plusi7-plus6gb-gtx1060-plus16ram-plus256gb-ss-793070526/detay']").get_attribute('href')
There is obviously more than one way to identify your element; this is simply an example using XPath.

Related

Can't grab next sibling using css selector within scrapy

I'm trying to fetch the budget with Scrapy using a CSS selector. I can get it when I use XPath, but with a CSS selector I'm lost. I can also get the content when I use BeautifulSoup and next_sibling.
I've tried with:
import requests
from scrapy import Selector
url = "https://www.imdb.com/title/tt0111161/"
res = requests.get(url)
sel = Selector(res)
# budget = sel.xpath("//h4[contains(.,'Budget:')]/following::text()").get()
# print(budget)
budget = sel.css("h4:contains('Budget:')::text").get()
print(budget)
Output I'm getting using css selector:
Budget:
Expected output:
$25,000,000
Relevant portion of html:
<div class="txt-block">
<h4 class="inline">Budget:</h4>$25,000,000
<span class="attribute">(estimated)</span>
</div>
How can I get the budgetary information using css selector when it is used within scrapy?
The selector .css("h4:contains('Budget:')::text") selects the text of the h4 tag, but the text you want is a text node of its parent, the div element.
You could use .css('div.txt-block::text'), but this would return several elements, as the page has several elements like that. CSS has no parent selector. I guess you could use .css('div.txt-block:nth-child(12)::text'), but if you are going to scrape more pages, this will probably fail on other pages.
The best option would be to use XPath:
response.xpath('//h4[text() = "Budget:"]/parent::div/text()').getall()
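To see why a selector that ends at h4 returns only "Budget:", it can help to look at where the amount sits in the tree. The following is a minimal sketch that parses the HTML fragment from the question with the standard library's xml.etree.ElementTree. This is not Scrapy, and it only works here because the fragment happens to be well-formed XML; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, unchanged.
fragment = """<div class="txt-block">
<h4 class="inline">Budget:</h4>$25,000,000
<span class="attribute">(estimated)</span>
</div>"""

div = ET.fromstring(fragment)
h4 = div.find("h4")

# The h4 element itself only contains the label.
label = h4.text

# The amount is a bare text node that follows </h4> inside the div.
# ElementTree exposes it as the "tail" of h4, which is why a selector
# that stops at h4 never reaches it.
budget = h4.tail.strip()

print(label)
print(budget)
```

In XPath terms, the same node should be reachable from the h4 with following-sibling::text()[1], which avoids also collecting the whitespace-only text nodes that parent::div/text() returns.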

Using find_element_by_xpath inside span

I'm trying to automatically navigate web pages using python, selenium, xpath.
I want to click a next page button in a page whose code is like this:
<a _ngcontent-pnd-c43="" automation-id="discover-market-next-button"
class="menu-item-button ng-star-inserted">
<span _ngcontent-pnd-c43="" class="nav-button-right sprite"></span>
</a>
I tried with the following code:
try:
    element='//span[class="nav-button-right sprite"]'
    button_next = driver.find_element_by_xpath(element)
    webdriver.ActionChains(driver).move_to_element(button_next).click(button_next).perform()
    time.sleep(15)
    content = driver.page_source.encode('utf-8')
except NoSuchElementException:
    print("NoSuchElementException")
but I got "NoSuchElementException".
Could anyone help me?
The problem is that class in your predicate is not marked as an attribute, so the expression never matches the span. One option is to select on another attribute instead, changing your XPath expression from:
element='//span[class="nav-button-right sprite"]'
to:
element='//span[@_ngcontent-pnd-c43]'
and see if it works.
You are missing the @ before class. It should be:
element='//span[@class="nav-button-right sprite"]'
I think the problem is in your XPath: you missed the @ symbol before class. Try this, hope it helps:
element="//span[@class='nav-button-right sprite']"
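The effect of the missing @ can be checked without a browser. The sketch below runs both predicates against the markup from the question using the standard library's xml.etree.ElementTree, which supports a small XPath subset. It is only an illustration of how the predicate is read, not of Selenium's behavior in every detail; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The markup from the question (well-formed, so ElementTree can parse it).
markup = """<a _ngcontent-pnd-c43="" automation-id="discover-market-next-button"
class="menu-item-button ng-star-inserted">
<span _ngcontent-pnd-c43="" class="nav-button-right sprite"></span>
</a>"""

root = ET.fromstring(markup)

# Without "@", class is read as a child element name: "a span that has a
# <class> child whose text is 'nav-button-right sprite'". Nothing matches.
without_at = root.find(".//span[class='nav-button-right sprite']")

# With "@", class is read as an attribute and the span is found.
with_at = root.find(".//span[@class='nav-button-right sprite']")

print(without_at)
print(with_at.get("class"))
```

Note that an exact @class comparison matches the whole attribute string, so it stops matching if the page adds or reorders class names.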

how to find element by css selector using python / selenium

I'm trying to pick up links to YouTube channels, which are located as below:
<a id="author-text" class="yt-simple-endpoint style-scope ytd-comment-renderer" href="/channel/UCUSy-h1fPG1L6X7KOe70asA"> <span class="style-scope ytd-comment-renderer">Jörgen Nilsson</span></a>
So in the example above I would want to pick up "/channel/UCUSy-h1fPG1L6X7KOe70asA". So far I have tried many options, but none work:
driver = webdriver.Chrome('C:/Users/me/Chrome Web Driver/chromedriver.exe')
api_url="https://www.youtube.com/watch?v=TQG7m1BFeRc"
driver.get(api_url)
time.sleep(2)
div = driver.find_element_by_class_name("yt-simple-endpoint style-scope ytd-comment-renderer")
but I get the following error:
InvalidSelectorException: Message: invalid selector: Compound class names not permitted
I also tried other approaches:
div = driver.find_elements_by_xpath("yt-simple-endpoint style-scope ytd-comment-renderer")
div = driver.find_element_by_class_name('yt-simple-endpoint style-scope ytd-comment-renderer')
div=driver.find_element_by_css_selector('.yt-simple-endpoint style-scope ytd-comment-renderer').get_attribute('href')
but no luck. If someone could please help, it would be much appreciated. Thank you.
Your selectors are invalid:
driver.find_element_by_class_name("yt-simple-endpoint style-scope ytd-comment-renderer")
You cannot pass more than one class name to the find_element_by_class_name method. You can try driver.find_element_by_class_name("ytd-comment-renderer")
driver.find_elements_by_xpath("yt-simple-endpoint style-scope ytd-comment-renderer")
This is not correct XPath syntax. You probably mean driver.find_elements_by_xpath("//*[@class='yt-simple-endpoint style-scope ytd-comment-renderer']")
driver.find_element_by_css_selector('.yt-simple-endpoint style-scope ytd-comment-renderer')
Each class name should start with a dot: driver.find_element_by_css_selector('.yt-simple-endpoint.style-scope.ytd-comment-renderer')
But the best way, IMHO, is to identify the element by ID:
driver.find_element_by_id("author-text")
You can also use BeautifulSoup in Python to get the anchor tags that have a specific class name, for example soup.find_all('a', attrs={'class':'yt-simple-endpoint'}). You can read more in the BeautifulSoup documentation on find_all and searching by CSS class.
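The "one dot per class name" rule can be captured in a small helper. This is a minimal sketch: css_for_classes is a hypothetical name, not part of Selenium, and it assumes the class names are plain identifiers that need no CSS escaping.

```python
def css_for_classes(class_attr, tag=""):
    """Build a CSS selector that requires every class in class_attr.

    class_attr is the raw value of an HTML class attribute, for example
    "style-scope ytd-comment-renderer". tag is an optional tag name.
    """
    classes = class_attr.split()  # split on any whitespace, drop empties
    if not classes:
        raise ValueError("class_attr contains no class names")
    return tag + "".join("." + name for name in classes)


selector = css_for_classes(
    "yt-simple-endpoint style-scope ytd-comment-renderer", tag="a"
)
print(selector)
```

The resulting string is what would be passed to find_element_by_css_selector. Unlike an exact XPath @class comparison, a selector built this way still matches if the element carries extra classes or lists them in a different order.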

Find nested divs scrapy

I am trying to get the text from a div that is nested. Here is the code that I currently have:
sites = hxs.select('/html/body/div[@class="content"]/div[@class="container listing-page"]/div[@class="listing"]/div[@class="listing-heading"]/div[@class="price-container"]/div[@class="price"]')
But it is not returning a value. Is my syntax wrong? Essentially I just want the text out of <div class="price">
Any ideas?
The URL is here.
The price is inside an iframe, so you should scrape https://www.rentler.com/ksl/listing/index/?sid=17403849&nid=651&ad=452978 instead.
Once you request this URL:
hxs.select('//div[@class="price"]/text()').extract()[0]

How can I get the href of elements found by partial link text?

Using Selenium and the Chrome Driver I do:
links = browser.find_elements_by_partial_link_text('##')
which matches about 160 links.
If I try:
for link in links:
    print(link.text)
I get the text of all the links:
##1
##2
...
##160
The links are like this:
##1
##2
...
##160
How can I get the href attribute of all the links found?
Call get_attribute on each of the links you have found:
links = browser.find_elements_by_partial_link_text('##')
for link in links:
    print(link.get_attribute("href"))
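The same idea (match links by a fragment of their text, then read href from each match) can be tried offline. The sketch below uses the standard library's html.parser instead of Selenium. The sample markup and the names LinkCollector and hrefs_by_partial_text are hypothetical, since the question's actual HTML is not shown.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def hrefs_by_partial_text(html, fragment):
    """Return the href of each link whose text contains fragment."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [href for text, href in parser.links if fragment in text]


# Hypothetical markup standing in for the page in the question.
sample = (
    '<p><a href="/post/1">##1</a> <a href="/about">About</a> '
    '<a href="/post/2">##2</a></p>'
)
print(hrefs_by_partial_text(sample, "##"))
```

One difference to keep in mind: this returns the href exactly as written in the markup, whereas Selenium's get_attribute("href") returns the URL resolved against the page address.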
An existing answer to a similar question (written for the older Selenium Java API) seems like it might apply:
Assume your HTML consists solely of that one tag, then this should do it:
String href = selenium.getAttribute("css=a@href");
You use the DefaultSelenium#getAttribute() method and pass in a CSS locator, an @ symbol, and the name of the attribute you want to fetch. In this case, you select the a and get its @href.
So if the link contains the text "..blablabla...", you can find it this way:
selenium.getAttribute("css=a:contains('..blablabla...')@href");
