Find a particular element of an HTML page using Selenium/Python

I have multiple levels of div elements, out of which I need to find just one particular element, get its text value, and store it in a variable.
<div class="Serial">
<p> … </p>
<p>
<span>
<a href="mailto:xyz#xyz.com">
Mr. XYZ
</a>
</span>
</p>
<p> … </p>
<p> … </p>
</div>
So, there are 4 different paragraphs, out of which I only need to read the second one and save the email ID to a variable. When I use the following code,
find_element_by_xpath("//div[#class='Serial']")
I get the information from all 4 paragraphs. Is there any way I can specify which paragraph to read within the div? I know for sure the order doesn't change, and I only want to read the 2nd p element. Appreciate your help.

You could try accessing the <p> tag by giving the XPath as find_element_by_xpath("//div[@class='Serial']/p[2]/span/a") to access the email ID present in the second paragraph.

I don't think it is entirely safe to rely on the order of the paragraphs - one day it may change, and those who come after you may be confused by p[2]. Since you need to find the text from the paragraph containing the email link, I believe this XPath would do the trick:
//p[span/a[starts-with(@href, 'mailto:')]]
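In Selenium that XPath could be used roughly like this (a sketch, assuming driver already has the page loaded and using the older find_element_by_* API from the question):
link = driver.find_element_by_xpath("//p[span/a[starts-with(@href, 'mailto:')]]/span/a")
name = link.text                                           # "Mr. XYZ"
email = link.get_attribute("href").replace("mailto:", "")  # the email ID from the mailto link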

Related

How to combine find_all() and find_next() in BeautifulSoup?

I have pieces of HTML like this that I'm trying to parse. What I want to grab is the price ("84.00 USD"):
<div class="HeaderAndValues_headerDetailSection__3c2SZ ProductCatalog_price__25i2r">
<div class="HeaderAndValues_header__3dB61">Wholesale</div>
<span class="notranslate">
<div class="">84.00 USD</div>
</span>
</div>
soup.find(text="Wholesale").find_next().text gives me exactly what I need, but only for the first search result. Is there any way I could combine find_all() and find_next()? Something like soup.find_all(text="Wholesale").find_next() that would grab the next text for each found "Wholesale".
OK, I've found it! Someone might still find it useful:
[x.find_next().text for x in page.find_all(text = "Wholesale")]
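As a rough, self-contained sketch (html below is just a placeholder for the page source you already have):
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html is a placeholder for your page source
# one price string per "Wholesale" header, e.g. '84.00 USD'
prices = [x.find_next().text for x in soup.find_all(text="Wholesale")]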

Python: XPATH searching within a node

What is a good way to select multiple nodes from within a node in HTML code using XPath?
I have this code (actually it is repeated 23 times):
<li>
<a class="Title" href="http://www.google.com" >Google</a>
<div class="Info">
<p>
text
</p>
<p class="Date">Status: Under development</p>
</div>
</li>
I am trying to get both Title and Date and have two different XPath queries like this:
//a[@class="Title"]/@href
//p[@class="Date"]
But when I do this I get two result sets with 23 and 22 values respectively. This is because at one point in the HTML the Date is not present. Therefore I would like to stay inside the li and search for Title and Date within that li, so I can check whether the values are there.
I changed my XPath to this:
//li
In the returned Element I can see that there are two sub-elements, a and div, but I cannot figure out how I am supposed to handle what is inside the returned Element.
When you want to search for elements within the current node, you need to start your XPath pattern with a dot.
For example:
.//a[@class="Title"]/@href
.//p[@class="Date"]
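For example, with lxml you could iterate over each li and run those relative XPaths against it (a sketch, where page_source stands in for your HTML string):
from lxml import html

tree = html.fromstring(page_source)  # page_source is a placeholder for your HTML
for li in tree.xpath("//li"):
    hrefs = li.xpath('.//a[@class="Title"]/@href')  # list, possibly empty
    dates = li.xpath('.//p[@class="Date"]/text()')  # list, possibly empty
    title_href = hrefs[0] if hrefs else None
    date = dates[0] if dates else None
    print(title_href, date)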

xpath/python search then grab child nodes?

I am working on a scraper using Python and Selenium and I have an issue traversing XPath. I feel like this should be simple, but I'm obviously missing something.
I am able to navigate the site I am browsing fine, but I need to grab some span text based on an XPath search.
I am able to click the appropriate radio button (in this case the 1st one):
(driver.find_elements_by_name("start-date"))[0].click()
But I also need to capture the text next to the radio button, which is contained in the span tags.
<label>
<input type="radio" name="start-date" value="1" data-start-date="/Date(1507854300000)/" data-end-date="/Date(1508200200000)/" group="15" type-id="8">
<span class="start-date">
10/12/2017<br>Summary text
</span>
</label>
In the above example, I'm looking to capture "10/12/2017" and "Summary text" into 2 string variables based on the find_elements_by_name search I used to find the radio button.
I then have a second, similar collection issue, where I need to capture the span tags after searching by class name. This finds the appropriate parent node on the page:
(driver.find_element_by_xpath("//div[@class=\"MyClass\"]"))
Based on the node returned by that search, I want to grab "Text 1" and "Text 2" from the span tags below it.
<div class="MyClass">
<span>
<span>Text 1</span>
</span>
<span class="bullet">
</span>
<span>
<span>Text 2</span>
</span>
</div>
I am new to XPath, but from what I can gather, the span nodes I am looking for should be children of the nodes I found with my searches, and I should be able to traverse down the hierarchy somehow to get the values; I'm just not sure how.
It's actually very simple: all WebElement objects have the same find_element_by_* methods that the WebDriver object has, with the main difference that the element methods change the search context to that element, meaning they will only match descendants of the selected element.
With that in mind you should be able to do:
my_element = driver.find_element_by_class_name('MyClass')
my_spans = my_element.find_elements_by_css_selector('span>span')
What happens here is that we grab the first element with class MyClass, then from the context of that element we search for elements that are span AND children of a span.
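To read the values out of those matches you can then take the text of each element, for example:
texts = [span.text for span in my_spans]  # e.g. ['Text 1', 'Text 2']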
You can try with the following XPaths:
//div[@class='MyClass']/span[1]/span ---- To get Text 1
//div[@class='MyClass']/span[3]/span ---- To get Text 2
or
(//div[@class='MyClass']/span/span)[1] ---- To get Text 1
(//div[@class='MyClass']/span/span)[2] ---- To get Text 2

xpath in lxml to find id number based on href

I'm trying to rewrite someone's library to parse some XML returned with requests. However, they use lxml in a way I'm not used to. I believe it's using regular expressions to find the data, and while most of the library works, it doesn't work when the site being parsed has the file ID in a list structure. Essentially I get a page back and I'm looking for an id that matches the href athlete number. So say I want to just get the IDs for athlete 567377.
</div>
</a></div>
<ul class='list-entries'>
<li class='entity-details feed-entry' id='Activity-123120999590'>
<div class='avatar avatar-athlete avatar-default'>
<a class='avatar-content' href='/athletes/567377' >
</a>
</div>
</li>
<li class='entity-details feed-entry' id='Activity-16784940202'>
<div class='avatar avatar-athlete avatar-default'>
<a class='avatar-content' href='/athletes/5252525'>
</a>
</div>
</li>
</ul>
The code:
lst_group_activity = parser.xpath(".//li[substring(@id, 1, 8)='Activity']")
provides all list items perfectly, but for all activities. I want only the ones related to the right athlete. The library uses the following, which uses an @href check to select the right athlete:
lst_athlethe_act_in_group_activity = parser.xpath(".//li[substring(@id, 1, 8)='Activity']/*[@href='/athletes/"+athlethe_id+"']/..")
However, this never seems to work. It finds the activities but then throws them all away.
Is there a better way to get this working? Any tutorial that could point me in the right direction for correlating to the next element?
The element with the href attribute isn't an immediate child of your li element, so your XPath is failing. You're matching:
.//li/*[@href="..."]
You want:
.//li/div/a[@href="..."]
(You could match * instead of a if you think another element might contain the href attribute, and you can match against .//li//a[@href="..."] if you think the path to the a element might not always be li/div/a).
So to find the li element:
parser.xpath(".//li[substring(#id, 1, 8)='Activity']/div/a[#href='/athletes/%s']/../.." % '5252525')
But you can also write that without the ../..:
parser.xpath(".//li[substring(#id, 1, 8)='Activity' and div/a/#href='/athletes/%s']" % '5252525')

How to read a particular value from a web page in Python/Selenium

I want to read the amount value (24.40) from this HTML.
<div id="order-total" class="clear-fix" style="margin-bottom:20px;">
<h3 class="col-left">Order total</h3>
<h3 class="col-right" style="display: block;">
<span class="credit-total-to-order" data-total-to-order="24.40">$ 24.40</span>
credits
</h3>
</div>
xpath - /html/body/div/header/section/form/div[5]/h3[2]/span
css - html body.ui-lang-en div#slave-edit.string-v2 header#slave-edit-header.edit section#order-form form#frm-order-translation div#order-total.clear-fix h3.col-right span.credit-total-to-order
I know I should use find_element_by_class_name or find_element_by_css_selector, but I am not sure what the argument should be.
How can I do it?
Why not select the value from the element and parse the string to get the answer you need? For example, you can split the string and discard the dollar sign to return the number you need.
someString = selenium.find_element_by_css_selector(".credit-total-to-order").text
someString.split(' ')[1]
Bear in mind - this will only work for the example you have provided.
It's not necessary to use find_element_by_class_name or find_element_by_css_selector. You can achieve it with XPath like this:
driver.find_element_by_xpath("//span[#class='credit-total-to-order']").text
UPDATE:
As per your updated HTML, it looks like the style makes your element hidden. Meanwhile, I also noticed that the value you want is also stored in the attribute data-total-to-order.
So you can do something like this:
driver.find_element_by_xpath("//span[#class='credit-total-to-order']").get_Attribute("data-total-to-order")
