Python Selenium: accessing HTML elements on more than two levels

I have a page that has to be scraped. I use this Python code:
div = driver.find_element_by_class_name("parent")
data = div.find_elements_by_class_name("child1")
# I cannot access the web elements of data, e.g. data.find_elements_by...
for tag in data:
    pass  # I cannot print the information of each div here
The HTML:
<div class="Parent">
  <div class="child1">
    <div class="heading">
      data
    </div>
  </div>
  <div class="child1 child2">strong text
    <div class="heading">
      <span>data</span>
    </div>
  </div>
</div>
Is there an easy way to access the data?

You can access HTML tags or text in several ways; see http://selenium-python.readthedocs.io/locating-elements.html
For multiple elements you can use:
find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector
As far as I'm aware there isn't a single simple solution; the right locator depends on the specifics of the information you're looking for.
For instance, let's say you're using XPath (my personal preference):
Absolute XPath: /html/body/div[2]/div/div/footer/section[3]/div/ul/li[3]/a
The XPath above will technically work, but each of those nested
relationships must be present 100% of the time, or the locator
will not function. This kind of XPath is known as an absolute XPath.
There is a good chance that it will change with every release. It
is always better to choose a relative XPath, as it reduces
the chance of an element-not-found exception.
Relative XPath: //*[@id='social-media']/ul/li[3]/a
There are different ways to get at the data, and by choosing the right way to select it you can extract only the information you need. Look into each of these methods to understand them better: you're asking for one line of code, and each method has its pros and cons (cases where it is useful and cases where it is not).

It seems you want to access the text inside the heading div. If so, you can try the code below.
element = driver.find_element_by_class_name("heading")
data = element.text
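The parent, child, heading walk from the question can be sketched without a browser. This is only an illustration: it uses the standard library's xml.etree.ElementTree as a stand-in for the live DOM, on a well-formed copy of the question's HTML, and has_class is a hypothetical helper name. In Selenium the shape is the same: a find_elements_* call returns a list, so you loop over it and call a find method on each item rather than on the list itself.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the question's HTML (assumption: child2 is a second class).
HTML = """<div class="Parent">
  <div class="child1">
    <div class="heading">data</div>
  </div>
  <div class="child1 child2">strong text
    <div class="heading"><span>data</span></div>
  </div>
</div>"""


def has_class(element, name):
    """True if `name` is one of the space-separated classes on the element."""
    return name in element.get("class", "").split()


root = ET.fromstring(HTML)

# Level 2: every direct child div that carries the class "child1".
children = [div for div in root.findall("div") if has_class(div, "child1")]

# Level 3: inside each child, collect the text under its "heading" div.
headings = []
for child in children:
    for div in child.iter("div"):
        if has_class(div, "heading"):
            headings.append("".join(div.itertext()).strip())

print(headings)
```

The key point is that the inner lookup runs once per child element, which is what the question's loop was missing.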

Assuming you are asking for a way to loop through data where the information sits under different locators at different nesting levels, there are multiple ways:
Look for selectors that match your pattern and pick the one that fits your problem; a CSS/XPath selector reference will help.
If there are many selectors (but they are used consistently), you can use the ByChained/ByAll selectors. Look at the implementation in Java and mimic it; it will be like this:
selector1 = .Heading .child2
selector2 = .Heading .child1
selector3 = .Heading .child3 span
ByAll(selector1, selector2, selector3)
If the parent is the only matching selector and there's no way to know about the child selectors, another way is to read the innerText/textContent property from a common parent:
driver.findElement(By.cssSelector(".child1")).getAttribute("innerText")
If none of these solves your problem, and your application is dynamic enough to use different references and different nesting levels each time across the page, then it was probably not meant to be scraped, and you should look for other ways of getting the data.

Related

How do I click on this item using Selenium?

I'm trying to automate the download of a report using Selenium. To get to the page where the report is, I have to click on an image with this code:
<div class="leaflet-marker-icon single-icon-container running hover asset leaflet-zoom-hide leaflet-clickable" tabindex="0" style="margin-left: -22px; margin-top: -41px; width: 44px; height: 44px; opacity: 1; transform: translate3d(525px, 238px, 0px); z-index: 238;"><div class="icon-value" lid="219058"></div></div>
I tried with
wtg = driver.find_elements_by_class_name(
"leaflet-marker-icon single-icon-container running hover asset leaflet-zoom-hide leaflet-clickable")
wtg.click()
but nothing happens.
There are 7 elements with the same class, and a unique "id" that looks like lid="219058", but I don't know how to select that.
leaflet-marker-icon single-icon-container running hover asset leaflet-zoom-hide leaflet-clickable contains multiple class names, while the driver.find_element_by_class_name method expects a single class name.
I can't give you a correct locator for this element since you didn't share the page link. However, if you wish to locate the element based on this combination of class names, you can use a CSS selector or XPath as follows:
wtg = driver.find_element_by_css_selector(".leaflet-marker-icon.single-icon-container.running.hover.asset.leaflet-zoom-hide.leaflet-clickable")
wtg.click()
Or
wtg = driver.find_element_by_xpath("//*[@class='leaflet-marker-icon single-icon-container running hover asset leaflet-zoom-hide leaflet-clickable']")
wtg.click()
Also, you should use driver.find_element_by_class_name, not driver.find_elements_by_class_name, since the latter gives you a list of web elements, not a single web element that can be clicked directly.
Alternatively, you can take the first item in the list of returned web elements, as described by FLAK-ZOSO.
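Turning a multi-class attribute value into a compound CSS selector is mechanical, so it can be sketched as a small helper. The name css_for_classes is hypothetical, not part of Selenium; the resulting string is what you would pass to the CSS selector locator.

```python
def css_for_classes(class_attr, tag=""):
    """Build a CSS selector that requires every class listed in `class_attr`."""
    # Each space-separated class name becomes ".name"; an optional tag goes first.
    return tag + "".join("." + name for name in class_attr.split())


classes = ("leaflet-marker-icon single-icon-container running hover "
           "asset leaflet-zoom-hide leaflet-clickable")

selector = css_for_classes(classes)
print(selector)

# A shorter selector using only some of the classes, restricted to div elements.
short_selector = css_for_classes("running asset", tag="div")
print(short_selector)
```

A compound selector matches regardless of the order of the class names in the attribute, unlike an XPath comparison against the whole class string.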
Generally speaking, the best practice when building web scrapers is to always use xpath, since xpath can apply all the filters (id, class, etc) in a more flexible way (in some cases though, performance in selenium might be decreased).
I recommend you check this article on how to write xpaths for various needs: https://www.softwaretestinghelp.com/xpath-writing-cheat-sheet-tutorial-examples/
For your particular use case, I would use:
driver.find_element_by_xpath('//div[@lid="219058"]')
This locates the inner div (notice that the lid attribute is on the nested div); calling .click() on the result clicks it. If you wish to act on the outer div instead, you can use:
driver.find_element_by_xpath('//div[@lid="219058"]/parent::div')
Again, I recommend learning XPath syntax and using it consistently; I find it easier to manipulate than the other Selenium selectors, and it is also faster in case you ever choose to use a C-compiled HTML parser such as lxml to parse the elements.
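The inner/outer distinction can be checked offline. The sketch below is an assumption-laden illustration: it uses the standard library's xml.etree.ElementTree instead of a browser, a trimmed copy of the marker HTML, and a hypothetical body wrapper. ElementTree has no parent:: axis, so the parent step is written as "..", which means the same thing here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the marker from the question, wrapped in a hypothetical <body>.
HTML = """<body>
<div class="leaflet-marker-icon running asset" tabindex="0">
  <div class="icon-value" lid="219058"></div>
</div>
</body>"""

root = ET.fromstring(HTML)

# The lid attribute sits on the nested div, so this selects the inner element.
inner = root.find(".//div[@lid='219058']")

# Stepping up one level with ".." selects the clickable outer marker div.
outer = root.find(".//div[@lid='219058']/..")

print(inner.get("class"))
print(outer.get("class"))
```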
Remember that driver.find_elements_by_class_name() returns a list.
You have to do something like this when using this get/find method:
driver.find_elements_by_class_name('class')[0]  # if you want the first match on the page
In your case you need to use the css_selector because you have multiple classes, as suggested by @Prophet.
You can also use only one of the classes and simply use the class_name selector.
In your case, if you need the first element of the page with that class, you have to add [0].

How to find the second occurrence of .find_element_by_class_name?

I have tried,
browser.find_element_by_class_name('("my_class")[1]')
browser.find_element_by_class_name('("my_class")[position()=1]')
browser.find_element_by_class_name("my_class")[1]
The "easy" answer
The simple way to get what you want is to use the plural form of the locator method you are already using, find_elements_by_class_name(). The plural forms return a list instead of just the first match so in your case you would use
find_elements_by_class_name("my_class")[1]
The find_elements_* method returns a list and the [1] at the end specifies to return only the second item in the collection (the index starts at 0).
What I would use
I generally don't use *_by_class_name() because it's rare that I'm only looking for a class. I typically at least specify the tag name also, e.g. div.my_class. Another option, and the one I typically use, is a CSS selector. CSS selectors should be preferred over XPath because of better performance, better support, etc.*
An example
<div class="class1 class2 class3">123</div>
<div class="class2">2</div>
<div class="class3">3</div>
<div class="class1 class2">12</div>
<div class="class1 class3">13</div>
<div class="class1">1</div>
<div class="class2 class3">23</div>
If you had the above HTML and wanted the second instance of "class1", you would use
driver.find_elements_by_css_selector("div.class1")[1]
Another advantage of CSS selectors over XPath is that CSS selectors look for the class name amongst multiple class names on an element where XPath can only do a text search which can lead to false or missed matches. The CSS selector above would return 4 total elements: "123", "12", "13", "1". The index [1] returns only the second instance, "12".
If you used the XPath that DebanjanB suggested,
//*[@class='my_class'][position()=2]
it would return nothing. That's because there's only one element that has the exact string "my_class" as the class. It misses all the other elements that contain but are not only "my_class". You could improve it to find them all but it still has all the downfalls of XPath vs CSS selectors, it's much longer, and so on...
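The difference between an exact class-string match and a class-token match can be reproduced offline on the example HTML. This is a sketch under assumptions: it uses the standard library's xml.etree.ElementTree rather than Selenium, a hypothetical body wrapper, and "class1" as the class being searched for. The token filter mimics what the CSS selector div.class1 does.

```python
import xml.etree.ElementTree as ET

# The example HTML from above, wrapped in a hypothetical <body>.
HTML = """<body>
<div class="class1 class2 class3">123</div>
<div class="class2">2</div>
<div class="class3">3</div>
<div class="class1 class2">12</div>
<div class="class1 class3">13</div>
<div class="class1">1</div>
<div class="class2 class3">23</div>
</body>"""

root = ET.fromstring(HTML)

# Exact string comparison, like the XPath [@class='class1'].
exact = [div.text for div in root.findall(".//div[@class='class1']")]

# Token comparison, like the CSS selector div.class1.
token = [div.text for div in root.iter("div")
         if "class1" in div.get("class", "").split()]

print(exact)     # ['1']
print(token)     # ['123', '12', '13', '1']
print(token[1])  # 12
```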
See the Selenium-python docs for more info on ways to find elements.
*If you need more details on the why, there's a number of articles already addressing this or look through the videos on the Selenium Conference YT channel and watch some of the keynote addresses by Simon Stewart or other Selenium contributors.
Don't forget
You may need to use a wait if the page is slow to load.
On some lazy loading pages you will need to scroll the page to get the additional elements to load.

Pulling text value from <a class> <b class> with Selenium

I'm using Selenium to pull some information about ownership for a given PredictIt market (e.g., https://www.predictit.org/Home/SingleOption?contractId=7347#data). The number of shares owned is nested in:
How can I pull out the number?
I've tried
self.driver.find_element_by_class_name("showpointer showOwnership").text
self.driver.find_element_by_id('showpointer showOwnership')
self.driver.find_element_by_class_name("label alert-success label-lg")
self.find_element_by_css_selector("spand[class='label alert-success label-lg']")
self.driver.findElement(By.cssSelector("#ctrlNotesWindow .notesData > .notesDate")).getText())
all to no avail. Any suggestions would be greatly appreciated! Thanks :)
Edit: All errors have been:
"NoSuchElement Error"
Why your attempts failed:
the "by class name" locator requires a single class name, not several of them
there are no id attributes present, so the "by id" locator would match nothing
you are looking for a non-existent spand element with your CSS selector. Plus, you are calling find_element_by_css_selector on self instead of self.driver
your last attempt is in Java, not in Python
Judging by what you've posted, I would use a CSS selector checking classes of a and b:
self.driver.find_element_by_css_selector("a.showOwnership > b.label").text

What is a unique identifier and how to use it to select?

I use Selenium and I am trying to automate a task on a website and in order to select an item I have to use this:
select = driver.find_element_by_*whatever*
However, all the whatevers like find_element_by_id, by name, by tag name etc. are either unavailable or are shared by several items. The only one that seems to be unique to each item is a "data-id" number but there isn't a find_element_by_data_id function as far as I know.
I can get a unique identifier which looks like this:
div.item:nth-child(453)
It seems to fit since it doesn't change when I reload the page and is unique to only one item.
How can I use this unique identifier to select the object? Alternatively, could you suggest a way of how I could select the desired item?
Here's the HTML pertaining to the object:
...
</div>
<div data-id="3817366931"
data-slot="secondary"
data-classes="pyro"
data-content="Level: 30<br/>"
data-appid="440"
class="item hoverable quality6 app440"
style="opacity:1;background-image:url(https://steamcdn-a.akamaihd.net/apps/440/icons/c_drg_manmelter.b76b87bda3242806c05a6201a4024a560269e805.png);"
data-title="Manmelter"
data-defindex="595">
</div>
<div data-id="3820690816"
data-slot="primary"
data-classes="pyro"
data-content="Level: 10<br/>"
data-appid="440"
class="item hoverable quality6 app440"
style="opacity:1;background-image:url(https://steamcdn-a.akamaihd.net/apps/440/icons/c_drg_phlogistinator.99b83086e28b2f85ed4c925ac5e3c6e123289aec.png);"
data-title="Phlogistinator"
data-defindex="594">
</div>
<div data-id="3819377317"
data-slot="primary"
data-classes="pyro"
data-content="Level: 10<br/>"
data-appid="440"
class="item hoverable quality6 app440"
style="opacity:1;background-image:url(https://steamcdn-a.akamaihd.net/apps/440/icons/c_drg_phlogistinator.99b83086e28b2f85ed4c925ac5e3c6e123289aec.png);"
data-title="Phlogistinator"
data-defindex="594">
So the items in the two bottom boxes are the same. The one at the top is different. Let's say I would like a way to select the item in the second box.
I am not sure how easy it will be to automate the scenario with an HTML structure like this. I would suggest talking to the devs to see if they can add some kind of id to each parent div; otherwise the selector will be too brittle. I also see that the data-id attribute is unique in every case, so that could be your best bet if you somehow know the ids beforehand. If you do not have any other option, then the CSS nth-child() function is the next most reliable mechanism, but in that case you have to know the parent. nth-child() is well explained here
On the other hand, if the intention is to find the second data-slot, you can use the following XPath:
//div[@data-slot='primary'][2]
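Although there is no find_element_by_data_id method, an attribute predicate does the same job. The sketch below is an offline illustration using the standard library's xml.etree.ElementTree on a trimmed copy of the question's items; the container div and the by_data_id helper are hypothetical. In Selenium the equivalent lookup would pass a CSS attribute selector such as div[data-id="3820690816"] to the CSS selector locator.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's items (most attributes omitted),
# wrapped in a hypothetical container div.
HTML = """<div class="container">
<div data-id="3817366931" data-slot="secondary" data-title="Manmelter"></div>
<div data-id="3820690816" data-slot="primary" data-title="Phlogistinator"></div>
<div data-id="3819377317" data-slot="primary" data-title="Phlogistinator"></div>
</div>"""


def by_data_id(root, data_id):
    """Return the first div whose data-id attribute equals `data_id`, or None."""
    return root.find(f".//div[@data-id='{data_id}']")


root = ET.fromstring(HTML)

# Select the item in the second box by its unique data-id.
item = by_data_id(root, "3820690816")
print(item.get("data-title"), item.get("data-slot"))
```

Because the two Phlogistinator items share every other attribute, data-id is the only value here that tells them apart.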

Selenium XPath multiple attributes including text

Here is the HTML I'm dealing with
<a class="_54nc" href="#" role="menuitem">
<span>
<span class="_54nh">Other...</span>
</span>
</a>
I can't seem to get my XPath structured correctly to find this element with the link. There are other elements on the page with the same attributes as <a class="_54nc"> so I thought I would start with the child and then go up to the parent.
I've tried a number of variations, but I would think something like this:
crawler.get_element_by_xpath('//span[@class="_54nh"][contains(text(), "Other")]/../..')
None of the things I've tried seem to be working. Any ideas would be much appreciated.
Or, more cleanly, use //*[.='Other...']/../.. where . is the current node's text value and each .. steps up to the parent element.
Alternatively, if you want to find the a tag, use the CSS selector [role='menuitem'], which is a better option if the role attribute is unique.
How about trying this:
crawler.get_element_by_xpath('//a[@class="_54nc"][./span/span[contains(text(), "Other")]]')
Try this:
crawler.get_element_by_xpath("//a[@class='_54nc']//span[.='Other...']")
This searches for an a element with class "_54nc" and, inside it, a span whose exact text is "Other...". You can replace "Other..." with other text to find the corresponding element(s).
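The idea of starting from the link and filtering by the text of its descendants can be tried offline. This is a sketch, not the crawler's API: it uses the standard library's xml.etree.ElementTree, which has no contains() function, so the text test is done in Python. The ul/li wrappers and the second link ("Public") are hypothetical, added only to show why the class alone is not enough.

```python
import xml.etree.ElementTree as ET

# The first link is from the question; the second is a hypothetical sibling
# with the same class, added to show why the text filter is needed.
HTML = """<ul>
<li><a class="_54nc" href="#" role="menuitem"><span><span class="_54nh">Other...</span></span></a></li>
<li><a class="_54nc" href="#" role="menuitem"><span><span class="_54nh">Public</span></span></a></li>
</ul>"""

root = ET.fromstring(HTML)

# Attribute filter: every link with the shared class.
links = root.findall(".//a[@class='_54nc']")

# Text filter: keep only links whose descendant text contains "Other".
matches = [a for a in links if "Other" in "".join(a.itertext())]

print(len(links), len(matches))
```

Note that the text comparison is case-sensitive, just as contains() is in XPath, so "other" would not match "Other...".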
