Further to the question here:
<a id='1234' href='http://www.google.com' class='alpha'> MY TEXT </a>
<caption>
<em> ABCD </em>
</caption>
I want to extract the text in between, i.e.
id='1234' href='http://www.google.com' class='alpha'
How can I do this using Python and Selenium?
Use
web_element.get_attribute(attribute_name)
method on the web_element object to get the value of any attribute present on a web element; in this case id, href, and class.
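For example, a minimal sketch (assuming the anchor above sits on some page, that the URL below is only a placeholder, and that chromedriver is available on PATH):
from selenium import webdriver

driver = webdriver.Chrome()  # assumes chromedriver is on PATH
driver.get("https://www.example.com/page-with-the-anchor")  # hypothetical URL for illustration

link = driver.find_element_by_id("1234")   # locate the anchor by its id
print(link.get_attribute("id"))            # 1234
print(link.get_attribute("href"))          # http://www.google.com
print(link.get_attribute("class"))         # alpha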
I'm new to Python and Selenium. I have this code:
<div class="Product_ProductInfo__23DMi">
<p style="font-weight: bold;">4.50</p>
<p>Bread</p>
<p>390 g</p>
</div>
I want to get access to the second <p> tag and get its value (I mean Bread).
For the first <p> tag, I used:
self.driver.find_element_by_xpath('//div[@class="Product_ProductInfo__23DMi"]/p')
But I don't know how to get to the other one.
Thanks.
You can do that using the find_elements_by_css_selector() function and then selecting the second element of the result.
a = self.driver.find_element_by_css_selector('div[class="Product_ProductInfo__23DMi"]')
second_p = a.find_elements_by_css_selector('p')[1]
Alternatively, you can use the :nth-of-type(<index>) CSS pseudo-class (the index starts at 1):
second_p = self.driver.find_element_by_css_selector('div[class="Product_ProductInfo__23DMi"] > p:nth-of-type(2)')
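Either way, once the element is located, its visible text is available through the .text property; a quick sketch, assuming the product page from the question is already loaded in self.driver:
# Read the text of the second <p> located with either selector above.
print(second_p.text)  # expected output: Bread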
I have the following HTML page. I want to get all the links inside a specific div. Here is my HTML code:
<div class="rec_view">
<a href='www.xyz.com/firstlink.html'>
<img src='imga.png'>
</a>
<a href='www.xyz.com/seclink.html'>
<img src='imgb.png'>
</a>
<a href='www.xyz.com/thrdlink.html'>
<img src='imgc.png'>
</a>
</div>
I want to get all the links that are present in the rec_view div. The links I want are:
www.xyz.com/firstlink.html
www.xyz.com/seclink.html
www.xyz.com/thrdlink.html
Here is the Python code I tried:
from selenium import webdriver
webpage = r"https://www.testurl.com/page/123/"
driver = webdriver.Chrome("C:\chromedriver_win32\chromedriver.exe")
driver.get(webpage)
element = driver.find_element_by_css_selector("div[class='rec_view']>a")
link = element.get_attribute("href")
print(link)
How can I get those links using Selenium in Python?
As per the HTML you have shared, to get the list of all the links present in the rec_view div you can use the following code block:
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe')
driver.get('https://www.testurl.com/page/123/')
elements = driver.find_elements_by_css_selector("div.rec_view a")
for element in elements:
print(element.get_attribute("href"))
Note: As you need to collect all the href attributes from the div tag, you need to use find_elements_* instead of find_element_*. Additionally, > refers only to an immediate <a> child node, whereas you need to traverse all the <a> child nodes, so the desired css_selector is div.rec_view a.
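To illustrate the difference between the two selectors (a sketch against the same page; with the HTML above both forms match, because the <a> tags happen to be direct children, but the descendant form is the safer choice):
direct_children = driver.find_elements_by_css_selector("div.rec_view > a")  # immediate <a> children only
all_descendants = driver.find_elements_by_css_selector("div.rec_view a")    # every <a> inside the div
hrefs = [a.get_attribute("href") for a in all_descendants]
print(hrefs)  # the three hrefs from the rec_view div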
I want to get the href from a <p> tag using an XPath expression.
I want to use the text from the <h1> tag ('Cable Stripe Knit L/S Polo') and, at the same time, the text from the <p> tag ('White') to find the href for that specific article.
Note: There are multiple colors of one item (more articles with different <p> tags, but the same <h1> text)!
HTML source
<article>
<div class="inner-article">
<a href="/shop/tops-sweaters/ix4leuczr/a1ykz7f2b" style="height:150px;">
</a>
<h1>
<a href="/shop/tops-sweaters/ix4leuczr/a1ykz7f2b" class="name-link">Cable Stripe Knit L/S Polo
</a>
</h1>
<p>
White
</p>
</div>
</article>
I've tried this code, but it didn't work.
specificProductColor = driver.find_element_by_xpath("//div[@class='inner-article' and contains(text(), 'White') and contains(text(), 'Cable')]/p")
driver.get(specificProductColor.get_attribute("href"))
As per the HTML source, the XPath expression to get the href attributes would be something like this:
specificProductColors = driver.find_elements_by_xpath("//div[@class='inner-article']//a[contains(text(), 'White') or contains(text(), 'Cable')]")
specificProductColors[0].get_attribute("href")
specificProductColors[1].get_attribute("href")
Since there are two hyperlink tags, you should be using find_elements_by_xpath which returns a list of elements. In this case it would return two hyperlink tags, and you could get their href using the get_attribute method.
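A compact way to collect every matching href (a sketch, assuming the same page and locator as above):
hrefs = [a.get_attribute("href") for a in specificProductColors]
print(hrefs)  # hrefs of both matching links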
I've got working code. It's not the fastest one - this part takes approximately 550 ms, but it works. If someone could simplify that, I'd be very thankful :)
It takes all products with the specified keyword (Cable) from the product page and all products with a specified color (White) from the product page as well. It compares href links and matches wanted product with wanted color.
I also want to simplify the loop - stop both for loops if the links match.
specificProduct = driver.find_elements_by_xpath("//div[@class='inner-article']//*[contains(text(), '" + productKeyword[arrayCount] + "')]")
specificProductColor = driver.find_elements_by_xpath("//div[@class='inner-article']//*[contains(text(), '" + desiredColor[arrayCount] + "')]")
for i in specificProductColor:
    specProductColor = i.get_attribute("href")
    for i in specificProduct:
        specProduct = i.get_attribute("href")
        if specProductColor == specProduct:
            print(specProduct)
            wantedProduct = specProduct
driver.get(wantedProduct)
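One possible simplification, not from the original post: collect the hrefs into sets and intersect them, which removes the nested loop entirely (a sketch, assuming the same two element lists as above):
color_links = {i.get_attribute("href") for i in specificProductColor}
product_links = {i.get_attribute("href") for i in specificProduct}
matches = color_links & product_links
if matches:
    wantedProduct = next(iter(matches))  # first (and normally only) common href
    print(wantedProduct)
    driver.get(wantedProduct)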
I have to crawl data with Scrapy like this:
<div class="data"
data-name='{"id":"566565", "name":"data1"}'
data-property='{"length":"444", "height":"678"}'
>
data1
</div>
<div class="data"
data-name='{"id":"566566", "name":"data2"}'
data-property='{"length":"555", "height":"777"}'
>
data2
</div>
I need data-name and data-property attributes. My selector is:
selections = Selector(response).xpath('//div[@class="data"]/attribute::data-property').extract()
How can I include data-name attribute in selections?
The following XPath should return the data-property and data-name attributes:
//div[@class='data']/attribute::*[name()='data-property' or name()='data-name']
XPath Demo : http://www.xpathtester.com/xpath/e720602b62461f3600989be73eb15aec
If you need to return the two attributes as a pair in a certain format for each parent div, then this can't be done using pure XPath 1.0. Some Python would be required, maybe using a list comprehension (not tested):
selections = [div.xpath('concat(@data-property, " ", @data-name)').extract()
              for div in Selector(response).xpath('//div[@class="data"]')]
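If you also need to parse the JSON stored in each attribute, a short sketch (not part of the original answer, assuming the attributes really contain the JSON shown above) using the standard json module:
import json

for div in Selector(response).xpath('//div[@class="data"]'):
    name = json.loads(div.xpath('@data-name').extract_first())
    props = json.loads(div.xpath('@data-property').extract_first())
    print(name["id"], name["name"], props["length"], props["height"])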
This HTML block:
<td class="tl-cell tl-popularity" data-tooltip="9,043,725 plays" data-tooltip-instant="">
<div class="pop-meter">
<div class="pop-meter-background"></div>
<div class="pop-meter-overlay" style="width: 57%"></div>
</div>
</td>
equates to this XPath:
xpath = '//*[#id="album-tracks"]/table/tbody/tr[5]/td[6]'
Trying to extract the text 9,043,725 plays with
find_element_by_xpath(xpath).text
returns an empty string. This text is only generated when a user hovers their mouse over the HTML block.
Is there a way to alter the XPath so that an empty string is not returned but the actual string is returned?
Try using get_attribute() instead. The intended element can be located using any of the find_element mechanisms. See the API doc.
element = browser.find_element_by_css_selector('.tl-cell.tl-popularity')
text = element.get_attribute('data-tooltip')
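If every row of the #album-tracks table carries such a cell and you want all of the tooltips at once, a sketch assuming the same page:
cells = browser.find_elements_by_css_selector('#album-tracks .tl-cell.tl-popularity')
tooltips = [cell.get_attribute('data-tooltip') for cell in cells]
print(tooltips)  # e.g. ['9,043,725 plays', ...]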