How to get the value of href using text

I have the following text: 'abcdef'.
I want to use this text to get the href of the link below. How can I do it?
<span class="thumb_link">
<a class="link_txt" href="/14803/tool/4554"> abcdef</a>
</span>
I tried the following, but it failed:
driver.find_element_by_css_selector('a[text()="abcdef"]')
The text is the only thing I can match on, so the element has to be located by its text.

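Link text will do that. If several links share similar text, another way is XPath. The link text in your HTML has a leading space, so normalize-space() is safer than an exact text() comparison:
driver.find_element_by_xpath("//a[normalize-space()='abcdef']").get_attribute("href")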

Selenium's link text allows you to do that.
driver.find_element_by_link_text("abcdef").get_attribute("href")
will return the href.
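
For readers who want to try the text-to-href lookup without a browser, here is a minimal sketch using only Python's standard library. It is not Selenium; the helper name href_for_link_text and the SNIPPET constant are made up for illustration, and the snippet is parsed as XML, which only works because this fragment happens to be well-formed.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, kept with its leading space in the link text.
SNIPPET = (
    '<span class="thumb_link">'
    '<a class="link_txt" href="/14803/tool/4554"> abcdef</a>'
    '</span>'
)

def href_for_link_text(markup, wanted):
    """Return the href of the first <a> whose trimmed text equals `wanted`."""
    root = ET.fromstring(markup)
    for anchor in root.iter("a"):
        # itertext() also collects text from nested tags such as <span>.
        text = "".join(anchor.itertext()).strip()
        if text == wanted:
            return anchor.get("href")
    return None

print(href_for_link_text(SNIPPET, "abcdef"))
```

The strip() call plays the same role as normalize-space() in XPath: without it, the leading space in " abcdef" would make an exact comparison fail.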

Related

Scrapy: getting an item using XPath

I am trying to extract the text "This is it" from the following HTML code:
<span id="theId" class="theClass1 nUl" title="This is it" >
I am trying this:
response.xpath('//span[@class="theClass1 nUl"]')
But I do not know how to get what is inside title.
How can I do that?

Use this XPath expression to get the value of the title attribute:
response.xpath('//span[@class="theClass1 nUl"]/@title').extract_first()
Output is:
This is it
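
The same attribute lookup can be tried outside Scrapy. This is a small sketch with the standard library's ElementTree, which supports only a subset of XPath; the wrapping <div> and the name title_of_span are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# The span from the question, closed and wrapped in a <div> so it parses as XML.
MARKUP = '<div><span id="theId" class="theClass1 nUl" title="This is it"></span></div>'

def title_of_span(markup):
    """Return the title attribute of the span with class 'theClass1 nUl'."""
    root = ET.fromstring(markup)
    span = root.find(".//span[@class='theClass1 nUl']")
    # ElementTree has no /@title step, so read the attribute from the element.
    return span.get("title") if span is not None else None

print(title_of_span(MARKUP))
```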

Selenium-Python: Class containing link-text

I am using Python and Selenium to scrape the content of a certain webpage. Currently I have the following problem: there are multiple divs with the same class name, but each div has different content. I only need the information from one particular div. In the following example, I need the information in the first "show_result" div, since its link text contains "Important-Element":
<div class="show_result">
<a href="?submitaction=showMoreid=77" title="Go-here">
<span class="new">Important-Element</span></a>
Other text, links, etc within the class...
</div>
<div class="show_result">
<a href="?submitaction=showMoreid=78" title="Go-here">
<span class="new">Not-Important-Element</span></a>
Other text, links, etc within the class...
</div>
<div class="show_result">
<a href="?submitaction=showMoreid=79" title="Go-here">
<span class="new">Not-Important-Element</span></a>
Other text, links, etc within the class...
</div>
With the following code I can get the "Important-Element" and its link:
driver.find_element_by_partial_link_text('Important-Element')
However, I also need the other information within the same "show_result" div. How can I refer to the entire div that contains Important-Element in its link text? driver.find_elements_by_class_name('show_result') does not work, since I do not know which of the divs Important-Element is located in.
Thanks,
Finn
Edit / Update: Oops, I found the solution on my own using XPath:
driver.find_element_by_xpath("//div[contains(@class, 'show_result') and contains(., 'Important-Element')]")

I know you've found an answer, but I believe it's wrong: it would also select the other divs, because "Important-Element" is a substring of "Not-Important-Element".
Maybe it works for your specific case, since that's not really the text you're after. But here are a few more answers:
//div[@class='show_result' and starts-with(normalize-space(.),'Important-Element')]
//div[a/span[text()='Important-Element']]
//div[contains(a/span/text(),'Important-Element') and not(contains(a/span/text(),'Not'))]
There are more ways to write this...
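
The substring problem can be reproduced without a browser. The following is a sketch with the standard library only, not Selenium; the <body> wrapper and the function names are assumptions for illustration, and only two of the three divs are included to keep it short.

```python
import xml.etree.ElementTree as ET

PAGE = """<body>
<div class="show_result">
<a href="?submitaction=showMoreid=77" title="Go-here">
<span class="new">Important-Element</span></a>
Other text, links, etc within the class...
</div>
<div class="show_result">
<a href="?submitaction=showMoreid=78" title="Go-here">
<span class="new">Not-Important-Element</span></a>
Other text, links, etc within the class...
</div>
</body>"""

def blocks_with_substring(markup, label):
    """Mimic contains(., label): match any div whose text contains label."""
    root = ET.fromstring(markup)
    return [div for div in root.findall(".//div[@class='show_result']")
            if label in "".join(div.itertext())]

def blocks_with_exact_label(markup, label):
    """Match only divs whose link span text equals label exactly."""
    root = ET.fromstring(markup)
    matches = []
    for div in root.findall(".//div[@class='show_result']"):
        span = div.find("a/span")
        if span is not None and (span.text or "").strip() == label:
            matches.append(div)
    return matches

print(len(blocks_with_substring(PAGE, "Important-Element")))    # both divs match
print(len(blocks_with_exact_label(PAGE, "Important-Element")))  # only the first
```

Once the right div is found, everything else inside it (other links, trailing text) is reachable from that element, which is what the question asks for.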

Selenium - Extract text in div without other tags (Python)

Trying to figure out how to access the text in the screenshot below without pulling all the span tags.
Doing element = driver.find_elements_by_id('response') gives me a list, but I can't seem to dig down further to access the text I want.
I also tried this after doing some searching:
element = driver.find_element_by_xpath("//div[@id='response']/pre")
But I get the same result.
Any tips?

element.get_attribute('innerHTML')
This returns everything between the element's opening and closing tags, including any nested markup.

element.text
Should give out the contents of the element without any HTML tags.

In my case, when the text sits directly inside a plain div, element.text did not extract it.
Example:
<div>the text here</div>
I recommend the html2text library. After from html2text import html2text, call:
html2text(element.get_attribute("outerHTML"))
It will do the trick!
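
If installing html2text is not an option, the markup returned by get_attribute("innerHTML") or get_attribute("outerHTML") can also be reduced to plain text with the standard library. This is a minimal sketch; the class TextOnly, the function strip_tags, and the sample fragment are made up for illustration, since the original screenshot is not available.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect only the character data, ignoring every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_tags(fragment):
    """Return the text of an HTML fragment with all tags removed."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)

# Hypothetical fragment standing in for the #response div.
sample = '<div id="response"><pre><span>a</span> <span>b</span></pre></div>'
print(strip_tags(sample))
```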

Extracting text from hyperlink using XPath

I am using Python along with XPath to scrape Reddit. Currently I am working on the front page. I am trying to extract links from its front page and display their titles in the shell.
For this I am using the Scrapy framework. I am testing this in the Scrapy shell itself.
My question is this: how do I extract the text from an <a> ABC </a> element? I want the string "ABC", but I cannot get it. I have tried the following expressions, and none of them works:
response.xpath('//p[descendant::a[contains(@class,"title")]]/@value')
response.xpath('//p[descendant::a[contains(@class,"title")]]/@data')
response.xpath('//p[descendant::a[contains(@class,"title")]]').extract()
response.xpath('//p[descendant::a[contains(@class,"title")]]/text()')
When I use extract(), it gives me the whole element. For example, instead of giving me ABC, it gives me <a>ABC</a>.
How can I extract the text string?

If <p> and <a> are in this situation:
<p>
<something>
<a class="title">ABC</a>
</something>
</p>
This will give you "ABC":
>>> print(response.xpath('//p//a[@class="title"]/text()').extract()[0])
ABC
// is shorthand for the descendant axis. p[descendant::a] won't give you the result because the predicate only filters which <p> elements are selected; the expression still returns the <p>, not the <a>.

I only tested it with an online XPath evaluator, but it should work when you adjust it to
response.xpath('//p/descendant::a[contains(@class,"title")]/text()')
If you're evaluating //p[descendant::a[contains(@class,"title")]]/text(), the current element is the <p> (the one with the descendant <a>), not the <a>, so text() returns the text of the <p>.
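
The difference between filtering with a predicate and stepping into the descendant can be seen in a small sketch using the standard library's ElementTree. It supports only a subset of XPath, so the child-presence predicate [something] stands in here for [descendant::a[...]]; the wrapping <div> and the variable names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<div><p><something><a class="title">ABC</a></something></p></div>'
)

# A predicate only filters: the result is still the <p> element.
filtered = root.findall(".//p[something]")

# A path step moves the selection: the result is the <a> element.
stepped = root.findall(".//p//a[@class='title']")

print([element.tag for element in filtered])
print([element.text for element in stepped])
```

The first result has no text of its own, which is why appending text() to the predicate form returns nothing useful, while the second result carries the "ABC" string.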

XPATH to check on a specific text within a node

I have this as a node to parse:
<h3 class="atag">
<a href="http://www.example.com">
<span class="btag">text to be ignored</span>
</a>
<span class="ctag">text to be checked</span>
</h3>
I need to extract "http://www.example.com" but not the "text to be ignored" part; I also need to check whether ctag contains "text to be checked".
I came up with this, but it doesn't seem to do the job:
response.xpath("//h3/a/@*[not(self::span)]").extract()
Any ideas?

If you just need to select the href from the 'a' tag, use @href.
To also check whether the ctag contains some text, I think you can use an expression like this:
'//h3[contains(span[@class="ctag"]/text(), "text to be checked")]/a/@href'
This checks whether the h3 block contains a span with class ctag whose text contains "text to be checked". If it does, "http://www.example.com" is returned; otherwise the result is empty.

Do you mean something like this XPath?
//h3/a[following-sibling::span[@class='ctag' and .='text to be checked']]/@href
The XPath above selects the <a> element that is followed by a sibling <span class="ctag"> whose value is "text to be checked", then returns the href attribute of that <a>.
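
The check-then-extract logic of both answers can be sketched with the standard library alone. ElementTree has no following-sibling axis, so this version walks each <h3> in Python instead; the <body> wrapper and the function name hrefs_when_checked are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NODE = """<body>
<h3 class="atag">
<a href="http://www.example.com">
<span class="btag">text to be ignored</span>
</a>
<span class="ctag">text to be checked</span>
</h3>
</body>"""

def hrefs_when_checked(markup, required_text):
    """Return the href of each h3 link whose sibling ctag span has required_text."""
    root = ET.fromstring(markup)
    hrefs = []
    for h3 in root.findall(".//h3"):
        ctag = h3.find("span[@class='ctag']")  # direct child span only
        link = h3.find("a")
        if ctag is None or link is None:
            continue
        if (ctag.text or "").strip() == required_text:
            hrefs.append(link.get("href"))
    return hrefs

print(hrefs_when_checked(NODE, "text to be checked"))
```

Because the ctag lookup uses a direct-child path, the btag span nested inside the link is never considered, so "text to be ignored" cannot affect the result.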
