Here is the HTML I'm dealing with
<a class="_54nc" href="#" role="menuitem">
<span>
<span class="_54nh">Other...</span>
</span>
</a>
I can't seem to get my XPath structured correctly to find this element with the link. There are other elements on the page with the same attributes as <a class="_54nc"> so I thought I would start with the child and then go up to the parent.
I've tried a number of variations, but I would think something like this:
crawler.get_element_by_xpath('//span[@class="_54nh"][contains(text(), "Other")]/../..')
None of the things I've tried seem to be working. Any ideas would be much appreciated.
Or, more cleanly, use //*[.='Other...']/../.. Here . refers to the string value of the current node, and each /.. step moves up one level to the parent element.
Alternatively, if you want to find the a tag directly, use the CSS selector [role='menuitem'], which is the better option if the role attribute is unique.
How about trying this (note that contains() is case-sensitive, so the text must be "Other", not "other"):
crawler.get_element_by_xpath('//a[@class="_54nc"][./span/span[contains(text(), "Other")]]')
Try this:
crawler.get_element_by_xpath("//a[@class='_54nc'][.//span[.='Other...']]")
This searches for the a element with the class "_54nc" that contains a span whose exact text is "Other...". You can replace "Other..." with any other text to find the corresponding element(s).
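The predicate approach from the answers above can be checked offline against a small copy of the markup. This is only a sketch under stated assumptions: it uses Python's standard-library xml.etree.ElementTree as a stand-in for the browser, that module supports only a subset of XPath (no contains()), so the text comparison is done in Python, and the second "Friends" link and the helper name links_with_label are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the question's markup. The "Friends" link is a
# hypothetical sibling, added so there is something to filter out.
MARKUP = """
<div>
  <a class="_54nc" href="#" role="menuitem">
    <span><span class="_54nh">Other...</span></span>
  </a>
  <a class="_54nc" href="#" role="menuitem">
    <span><span class="_54nh">Friends</span></span>
  </a>
</div>
"""


def links_with_label(root, label):
    """Return <a class="_54nc"> elements whose inner span text equals label."""
    return [
        a for a in root.findall(".//a[@class='_54nc']")
        if any(span.text == label for span in a.findall("./span/span"))
    ]


root = ET.fromstring(MARKUP)
matches = links_with_label(root, "Other...")
print(len(matches), matches[0].get("role"))
```

Filtering on the parent with a condition on its descendants returns the a element directly, so no ../.. steps are needed.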
I'm trying to rewrite someone's library to parse some XML returned with requests. However, they use lxml in a way I'm not used to. I believe it uses XPath expressions to find the data, and while most of the library works, it doesn't work when the site being parsed has the file id in a list structure. Essentially, I get a page back and I'm looking for an id that matches the href athlete number. So say I want to get only the ids for athlete 567377.
</div>
</a></div>
<ul class='list-entries'>
<li class='entity-details feed-entry' id='Activity-123120999590'>
<div class='avatar avatar-athlete avatar-default'>
<a class='avatar-content' href='/athletes/567377' >
</a>
</div>
</li>
<li class='entity-details feed-entry' id='Activity-16784940202'>
<div class='avatar avatar-athlete avatar-default'>
<a class='avatar-content' href='/athletes/5252525'>
</a>
</div>
The code:
lst_group_activity = parser.xpath(".//li[substring(@id, 1, 8)='Activity']")
This returns the list items perfectly, but for all activities. I only want the ones related to the right athlete. The library uses the following expression, which matches on @href to select the right athlete.
lst_athlethe_act_in_group_activity = parser.xpath(".//li[substring(@id, 1, 8)='Activity']/*[@href='/athletes/"+athlethe_id+"']/..")
However, this never seems to work. It finds the activity but then throws them all away.
Is there a better way to get this working? Any tutorial that can point me in the right direction to correlate to the next element.
The element with the href attribute isn't an immediate child of your li element, so your XPath fails. You're matching:
.//li/*[@href="..."]
You want:
.//li/div/a[@href="..."]
(You could match * instead of a if you think another element might contain the href attribute, and you can match against .//li//a[@href="..."] if you think the path to the a element might not always be li/div/a.)
So to find the li element:
parser.xpath(".//li[substring(@id, 1, 8)='Activity']/div/a[@href='/athletes/%s']/../.." % '5252525')
But you can also write that without the ../..:
parser.xpath(".//li[substring(@id, 1, 8)='Activity' and div/a/@href='/athletes/%s']" % '5252525')
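The same "filter the li by what it contains" idea can be sketched with the standard library alone. Assumptions: xml.etree.ElementTree stands in for lxml here, it has no substring() function, so the id prefix test is done with str.startswith, the feed below is a tidied, well-formed version of the snippet in the question, and activity_ids_for is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Tidied, well-formed version of the feed from the question.
FEED = """
<ul class="list-entries">
  <li class="entity-details feed-entry" id="Activity-123120999590">
    <div class="avatar avatar-athlete avatar-default">
      <a class="avatar-content" href="/athletes/567377"></a>
    </div>
  </li>
  <li class="entity-details feed-entry" id="Activity-16784940202">
    <div class="avatar avatar-athlete avatar-default">
      <a class="avatar-content" href="/athletes/5252525"></a>
    </div>
  </li>
</ul>
"""


def activity_ids_for(root, athlete_id):
    """Return ids of Activity <li> items that link to the given athlete."""
    href = "/athletes/%s" % athlete_id
    return [
        li.get("id")
        for li in root.findall(".//li")
        if li.get("id", "").startswith("Activity")
        # The link sits two levels down (li/div/a), not directly under li.
        and li.find("./div/a[@href='%s']" % href) is not None
    ]


root = ET.fromstring(FEED)
print(activity_ids_for(root, "567377"))
```

With lxml itself, the single XPath expression from the answer does the same job in one call.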
I have a page that has to be scraped. I use this Python code:
div = driver.find_element_by_class_name("parent")
data = div.find_elements_by_class_name("child1")
# I cannot access the web elements of data, e.g. data.find_elements_by...
for tag in data:
    pass  # I cannot print the information of each div here
The HTML:
<div class="Parent">
<div class = child1 >
<div class = "heading">
data
</div>
</div>
<div class = child1 child2 >strong text
<div class = "heading">
<span>data</span>
</div>
</div>
</div>
Is there an easy way to access the data?
Well, you can access HTML tags or text in different ways: http://selenium-python.readthedocs.io/locating-elements.html
For multiple elements you can use :
find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector
As far as I'm aware there is no single simple solution; it depends on the specifics of the information you're looking for.
For instance, let's say you're using XPath (my personal preference):
Absolute XPath: /html/body/div[2]/div/div/footer/section[3]/div/ul/li[3]/a
The XPath above will technically work, but each of those nested
relationships needs to be present 100% of the time, or the locator
will not function. This kind of locator is known as an absolute XPath,
and there is a good chance it will change with every release. It
is usually better to choose a relative XPath, as it reduces
the chance of an element-not-found exception.
Relative XPath: //*[@id='social-media']/ul/li[3]/a
There are different ways to approach the data, and by choosing the right way to select it you can extract only the information you need. Look into each of these methods to understand them better: you're asking for one line of code, but each method has its pros and cons (cases where it is useful and cases where it is not).
It seems you want to access text which is inside heading div, if it is so then you can try the below code.
element=driver.find_element_by_class_name("heading")
data=element.text
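Reading a single heading covers only the first match. To loop over every child block and collect the heading text under each one, a sketch along these lines may help. Assumptions: the standard-library xml.etree.ElementTree stands in for the browser, the markup is a cleaned-up, well-formed version of the question's HTML (class values quoted, stray text removed), and has_class and heading_texts are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Cleaned-up version of the question's HTML with quoted class values.
PAGE = """
<div class="Parent">
  <div class="child1">
    <div class="heading">data</div>
  </div>
  <div class="child1 child2">
    <div class="heading"><span>data</span></div>
  </div>
</div>
"""


def has_class(elem, name):
    """True if name is one of the space-separated class tokens."""
    return name in elem.get("class", "").split()


def heading_texts(root):
    """Collect the text of every heading inside each child1 block."""
    texts = []
    for child in root.findall("./div"):
        if not has_class(child, "child1"):
            continue
        for heading in child.findall(".//div[@class='heading']"):
            # itertext() also picks up text nested in <span> elements.
            texts.append("".join(heading.itertext()).strip())
    return texts


root = ET.fromstring(PAGE)
print(heading_texts(root))
```

The same shape applies in Selenium: find_elements_by_class_name returns a list, so the per-element lookups and the .text read belong inside the loop, on each element, rather than on the list itself.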
Assuming you are asking for a way to loop through data where the information sits under different locators at different nesting levels, there are multiple ways:
Look for selectors that match your pattern and pick the one that fits your problem; a CSS/XPath selector reference will help here.
If there are many selectors (but they are used consistently), you can use the ByChained/ByAll selectors. Look at the implementation in Java and mimic it; it will look something like this:
selector1 = .Heading .child1
selector2 = .Heading .child2
selector3 = .Heading .child3 span
ByAll(selector1, selector2, selector3)
If the parent is the only selector that matches and there is no way to know the child selectors, another way is to read the innerText/textContent property from a common parent:
driver.findElement(By.cssSelector(".child1")).getAttribute("innerText")
If none of these solves your problem, and your application is dynamic enough to use different references and different nesting levels each time across the page, then it was not meant to be scraped, and you should look for other ways of scraping it.
I am using Python & Selenium to scrape the content of a certain webpage. Currently, I have the following problem: there are multiple divs with the same class name, but each div has different content. I only need the information from one particular div. In the following example, I would need the information in the first "show_result" div, since it has "Important-Element" within the link text:
<div class="show_result">
<a href="?submitaction=showMoreid=77" title="Go-here">
<span class="new">Important-Element</span></a>
Other text, links, etc within the class...
</div>
<div class="show_result">
<a href="?submitaction=showMoreid=78" title="Go-here">
<span class="new">Not-Important-Element</span></a>
Other text, links, etc within the class...
</div>
<div class="show_result">
<a href="?submitaction=showMoreid=79" title="Go-here">
<span class="new">Not-Important-Element</span></a>
Other text, links, etc within the class...
</div>
With the following code I can get the "Important-Element" and its link:
driver.find_element_by_partial_link_text('Important-Element')
However, I also need the other information within the same "show_result" div. How can I refer to the entire div that contains Important-Element in the link text? driver.find_elements_by_class_name('show_result') does not work, since I do not know which of the divs contains the Important-Element.
Thanks,
Finn
Edit / Update: Oops, I found the solution on my own using XPath:
driver.find_element_by_xpath("//div[contains(@class, 'show_result') and contains(., 'Important-Element')]")
I know you've found an answer, but I believe it's wrong: it would also select the other nodes, because "Important-Element" is a substring of "Not-Important-Element".
Maybe it works for your specific case, since that's not really the text you're after. But here are a few more answers (note that the span is a child of the a element, not of the div, and that the div's text starts with whitespace):
//div[@class='show_result' and starts-with(normalize-space(.),'Important-Element')]
//div[a/span[text()='Important-Element']]
//div[contains(a/span/text(),'Important-Element') and not(contains(a/span/text(),'Not'))]
There are more ways to write this...
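The substring pitfall is easy to reproduce offline. The sketch below is an assumption-based stand-in: it uses the standard-library xml.etree.ElementTree instead of a browser, keeps only two of the three result blocks, wraps them in a body element to make the snippet well-formed, and does the text comparison in Python because that module has no contains().

```python
import xml.etree.ElementTree as ET

# Two of the result blocks from the question, wrapped in a single root.
RESULTS = """
<body>
  <div class="show_result">
    <a href="?submitaction=showMoreid=77" title="Go-here">
      <span class="new">Important-Element</span></a>
    Other text, links, etc within the class...
  </div>
  <div class="show_result">
    <a href="?submitaction=showMoreid=78" title="Go-here">
      <span class="new">Not-Important-Element</span></a>
    Other text, links, etc within the class...
  </div>
</body>
"""

root = ET.fromstring(RESULTS)
blocks = root.findall(".//div[@class='show_result']")

# Equivalent of contains(., 'Important-Element'): a substring test on all text.
substring_hits = [d for d in blocks
                  if "Important-Element" in "".join(d.itertext())]

# Equivalent of a/span[text()='Important-Element']: an exact match on the span.
exact_hits = [d for d in blocks
              if any(s.text == "Important-Element" for s in d.findall("./a/span"))]

print(len(substring_hits), len(exact_hits))
```

The substring test keeps both blocks, while the exact match on the span keeps only the first one.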
Oops, I found the solution on my own via XPath:
driver.find_element_by_xpath("//div[contains(@class, 'show_result') and contains(., 'Important-Element')]")
I'm looking for a way to find an element that contains some exact text; the problem is that this text changes dynamically every time.
It looks like this:
<div class="some class" ng-class="{ 'ngSorted': !col.noSortVisible90 }">
<span ng-call-text class="ngbinding" style="cursor: defaulte;">some text and digits</span>
Here "some text and digits" is the element that I need.
Could somebody help me with this?
UPD: I have a lot of elements with the same classes on the page. I also know the text phrase that should be found, and I can pass this text to my code as a parameter.
You can use the id attribute
<span ng-call-text id="snarfblat" class="ngbinding" style="cursor: defaulte;">some text and digits</span>
so you can access it within JavaScript with
document.getElementById("snarfblat");
Why don't you use an XPath or CSS selector to reach your target element? Maybe one of its parents has a unique id or another property; start from there and work your way down to your destination, i.e. the HTML tag with the dynamic text.
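Since the text is known at run time and can be passed in as a parameter, one option is to build the XPath from it. This is only a sketch: xpath_literal and span_with_text_xpath are hypothetical helper names, and the quoting follows XPath 1.0, which has no escape character inside string literals, so text containing both quote kinds has to be assembled with concat().

```python
def xpath_literal(text):
    """Quote text as an XPath 1.0 string literal."""
    if "'" not in text:
        return "'%s'" % text
    if '"' not in text:
        return '"%s"' % text
    # Both quote kinds present: split on single quotes and glue with concat().
    pieces = ["'%s'" % part for part in text.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"


def span_with_text_xpath(text):
    """Build an XPath that matches a span whose trimmed text equals text."""
    return "//span[normalize-space(.)=%s]" % xpath_literal(text)


print(span_with_text_xpath("some text and digits"))
```

The resulting string can then be handed to whatever XPath lookup the driver offers, for example the find_element_by_xpath call used elsewhere in this thread.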
I want to read the amount value (24.40) from this HTML.
<div id="order-total" class="clear-fix" style="margin-bottom:20px;">
<h3 class="col-left">Order total</h3>
<h3 class="col-right" style="display: block;">
<span class="credit-total-to-order" data-total-to-order="24.40">$ 24.40</span>
credits
</h3>
</div>
xpath - /html/body/div/header/section/form/div[5]/h3[2]/span
css - html body.ui-lang-en div#slave-edit.string-v2 header#slave-edit-header.edit
section#order-form form#frm-order-translation div#order-total.clear-fix
h3.col-right span.credit-total-to-order
I know I should use find_element_by_class_name or find_element_by_css_selector.
But not sure what should be the argument.
How can I do it?
Why not select the value from the element and parse the string to get the answer you need? For example, you can split the string and discard the dollar sign to get the number you need.
someString = selenium.find_element_by_css_selector(".credit-total-to-order").text
someString.split(' ')[1]
Bear in mind - this will only work for the example you have provided.
It's not necessary to use find_element_by_class_name or find_element_by_css_selector. You can achieve it with XPath like this:
driver.find_element_by_xpath("//span[@class='credit-total-to-order']").text
UPDATE:
As per your updated HTML, it looks like the style may be hiding your element. Meanwhile, I also noticed that the value you want is also stored in the attribute data-total-to-order.
So you can do something like this:
driver.find_element_by_xpath("//span[@class='credit-total-to-order']").get_attribute("data-total-to-order")
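Both routes from the answers (splitting the visible text and reading the data attribute) can be compared offline. Assumptions in this sketch: the standard-library xml.etree.ElementTree stands in for the browser, the markup is copied from the question, and Decimal is my choice for holding a credit amount without float rounding.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Markup copied from the question.
ORDER = """
<div id="order-total" class="clear-fix" style="margin-bottom:20px;">
  <h3 class="col-left">Order total</h3>
  <h3 class="col-right" style="display: block;">
    <span class="credit-total-to-order" data-total-to-order="24.40">$ 24.40</span>
    credits
  </h3>
</div>
"""

root = ET.fromstring(ORDER)
span = root.find(".//span[@class='credit-total-to-order']")

# Route 1: read the machine-friendly attribute.
from_attribute = Decimal(span.get("data-total-to-order"))

# Route 2: split the visible text "$ 24.40" and drop the currency sign.
from_text = Decimal(span.text.split()[1])

print(from_attribute, from_text)
```

The attribute route does not depend on how the amount is formatted for display, which makes it the less fragile of the two if the visible text ever changes.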