What is a unique identifier and how to use it to select?

I'm using Selenium to automate a task on a website. To select an item I have to use something like this:
select = driver.find_element_by_*whatever*
However, all the "whatevers" (find_element_by_id, by name, by tag name, etc.) are either unavailable or shared by several items. The only attribute that seems to be unique to each item is a "data-id" number, but as far as I know there isn't a find_element_by_data_id function.
I can get a unique identifier which looks like this:
div.item:nth-child(453)
It seems to fit since it doesn't change when I reload the page and is unique to only one item.
How can I use this unique identifier to select the object? Alternatively, could you suggest another way to select the desired item?
Here's the HTML pertaining to the object:
...
</div>
<div data-id="3817366931"
data-slot="secondary"
data-classes="pyro"
data-content="Level: 30<br/>"
data-appid="440"
class="item hoverable quality6 app440"
style="opacity:1;background-image:url(https://steamcdn-a.akamaihd.net/apps/440/icons/c_drg_manmelter.b76b87bda3242806c05a6201a4024a560269e805.png);"
data-title="Manmelter"
data-defindex="595">
</div>
<div data-id="3820690816"
data-slot="primary"
data-classes="pyro"
data-content="Level: 10<br/>"
data-appid="440"
class="item hoverable quality6 app440"
style="opacity:1;background-image:url(https://steamcdn-a.akamaihd.net/apps/440/icons/c_drg_phlogistinator.99b83086e28b2f85ed4c925ac5e3c6e123289aec.png);"
data-title="Phlogistinator"
data-defindex="594">
</div>
<div data-id="3819377317"
data-slot="primary"
data-classes="pyro"
data-content="Level: 10<br/>"
data-appid="440"
class="item hoverable quality6 app440"
style="opacity:1;background-image:url(https://steamcdn-a.akamaihd.net/apps/440/icons/c_drg_phlogistinator.99b83086e28b2f85ed4c925ac5e3c6e123289aec.png);"
data-title="Phlogistinator"
data-defindex="594">
So the items in the two bottom boxes are the same, and the one at the top is different. Let's say I would like a way to select the item in the second box.

I am not sure how easy it will be to automate this scenario with an HTML structure like this. I would suggest asking the developers whether they can add some kind of id to each parent div; otherwise the selector will be too brittle. I also see that the data-id attribute is unique in every case, so that could be your best bet if you know the ids beforehand. If you have no other option, the CSS nth-child() pseudo-class is the next most reliable mechanism, but in that case you have to know the parent, because nth-child() counts an element's position among its siblings.
On the other hand, if the intention is to find the second item whose data-slot is "primary", you can use the following XPath:
//div[@data-slot='primary'][2]
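Since the data-id value is unique per item, one way to act on this answer is a CSS attribute selector. The sketch below assumes Selenium 4's find_element(By.CSS_SELECTOR, ...) call (the find_element_by_* helpers used in the question come from older releases) and an already created driver. The helper names data_id_selector and find_item are made up for illustration.

```python
def data_id_selector(data_id):
    """Build a CSS attribute selector for one item (hypothetical helper).

    `data_id` is the value of the item's data-id attribute, copied from the page.
    """
    return f'div.item[data-id="{data_id}"]'


def find_item(driver, data_id):
    """Return the element with the given data-id, using the Selenium 4 locator API."""
    # Imported lazily so the selector helper above also works without Selenium installed.
    from selenium.webdriver.common.by import By

    return driver.find_element(By.CSS_SELECTOR, data_id_selector(data_id))


print(data_id_selector("3820690816"))  # div.item[data-id="3820690816"]
```

The nth-child selector from the question can be passed the same way, for example driver.find_element(By.CSS_SELECTOR, "div.item:nth-child(453)"), though it stops pointing at the same item as soon as the item order changes.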

Related

What's the difference between page.locator("xx", has=Locator) and page.locator("xx").filter(has=Locator) in Playwright?

HTML like:
<ul>
<li>
<h3>Product 1</h3>
<button class="button">Add to cart</button>
</li>
<li>
<h3>Product 2</h3>
<button class="button">Add to cart</button>
</li>
</ul>
When I want to locate the first <li>, I can write either of these:
# the first solution:
page.locator("li", has=page.get_by_text("Product 1"))
# the second solution:
page.locator("li").filter(has=page.get_by_text("Product 1"))
Is there any detailed difference between these two solutions?

They both return the same element. However, with the second solution you can make assertions on the unfiltered locator before narrowing it down.
from playwright.sync_api import expect

# get the list items
items = page.locator("li")
# "li" matches two elements here, so assert on the first one to avoid a strict-mode error
expect(items.first).to_be_visible()
# now narrow down to Product 1
product1 = items.filter(has=page.get_by_text("Product 1"))

In your scenario, those two lines of code give you exactly the same result. The two approaches exist because they serve different purposes.
The has option of the locator method is only available when you create the locator with that method; apart from that, it is noted to be similar to filter.
The filter method is more flexible. You can use it on a locator created by any method, such as get_by_role. It also lets you apply the filter separately or later, so you can filter only when a specific scenario needs it. Another benefit is that you can chain as many filters as you want, for instance to check that an element contains two specific elements.
Basically, has within locator is a more limited use case of filter, while filter offers the same functionality plus more.
Since you create the locator with locator, apply a single filter, and don't need to reuse the general locator or filter it again to get a different element, the has option works just as well for you as filter.
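The chaining benefit mentioned above could look roughly like this. It is a sketch for the question's product list, assuming `page` is a Playwright sync-API Page; the function name product_item is hypothetical.

```python
def product_item(page, name):
    """Return the <li> that contains both the given text and an 'Add to cart' button."""
    return (
        page.locator("li")
        # first filter: the item must contain the product name
        .filter(has=page.get_by_text(name))
        # second filter: the item must also contain the button
        .filter(has=page.get_by_role("button", name="Add to cart"))
    )
```

A call such as product_item(page, "Product 1").get_by_role("button").click() would then click the button inside that one list item. The same two conditions cannot be expressed with a single has option on locator, which is where filter earns its keep.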

How to find the second occurrence of .find_element_by_class_name?

I have tried,
browser.find_element_by_class_name('("my_class")[1]')
browser.find_element_by_class_name('("my_class")[position()=1]')
browser.find_element_by_class_name("my_class")[1]

The "easy" answer
The simple way to get what you want is to use the plural form of the locator method you are already using, find_elements_by_class_name(). The plural forms return a list instead of just the first match so in your case you would use
find_elements_by_class_name("my_class")[1]
The find_elements_* method returns a list and the [1] at the end specifies to return only the second item in the collection (the index starts at 0).
What I would use
I generally don't use *_by_class_name() because it's rare that I'm only looking for a class. I typically at least specify the tag name also, e.g. div.my_class. Another option, and the one I typically use, is a CSS selector. CSS selectors should be preferred over XPath because of better performance, better support, etc.*
An example
<div class="class1 class2 class3">123</div>
<div class="class2">2</div>
<div class="class3">3</div>
<div class="class1 class2">12</div>
<div class="class1 class3">13</div>
<div class="class1">1</div>
<div class="class2 class3">23</div>
If you had the above HTML and wanted the second instance of "class1", you would use
driver.find_elements_by_css_selector("div.class1")[1]
Another advantage of CSS selectors over XPath is that CSS selectors look for the class name amongst multiple class names on an element where XPath can only do a text search which can lead to false or missed matches. The CSS selector above would return 4 total elements: "123", "12", "13", "1". The index [1] returns only the second instance, "12".
If you used the XPath that DebanjanB suggested,
//*[@class='my_class'][position()=2]
it would return nothing. That's because there's only one element that has the exact string "my_class" as the class. It misses all the other elements that contain but are not only "my_class". You could improve it to find them all but it still has all the downfalls of XPath vs CSS selectors, it's much longer, and so on...
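The difference between the two kinds of matching can be illustrated without a browser. The following plain-Python sketch only mimics the two comparison rules on the class attributes from the example HTML; it is not how Selenium evaluates selectors, and the function names are invented for the illustration.

```python
# (class attribute, text) pairs copied from the example HTML above
DIVS = [
    ("class1 class2 class3", "123"),
    ("class2", "2"),
    ("class3", "3"),
    ("class1 class2", "12"),
    ("class1 class3", "13"),
    ("class1", "1"),
    ("class2 class3", "23"),
]


def token_matches(divs, name):
    """Mimic CSS `div.name`: the class list contains `name` as one token."""
    return [text for classes, text in divs if name in classes.split()]


def exact_matches(divs, name):
    """Mimic XPath `//div[@class='name']`: the whole attribute equals `name`."""
    return [text for classes, text in divs if classes == name]


print(token_matches(DIVS, "class1"))     # ['123', '12', '13', '1']
print(token_matches(DIVS, "class1")[1])  # 12
print(exact_matches(DIVS, "class1"))     # ['1']
```

With exact comparison there is only one match for "class1", so asking for the second one yields nothing.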
See the Selenium-python docs for more info on ways to find elements.
*If you need more details on the why, there's a number of articles already addressing this or look through the videos on the Selenium Conference YT channel and watch some of the keynote addresses by Simon Stewart or other Selenium contributors.
Don't forget
You may need to use a wait if the page is slow to load.
On some lazy loading pages you will need to scroll the page to get the additional elements to load.
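The wait mentioned above might be sketched as follows, assuming Selenium 4 and its WebDriverWait class. The names at_least and second_match are hypothetical, and the 10-second timeout is an arbitrary default.

```python
def at_least(count, find):
    """Build a wait condition that succeeds once `find(driver)` yields `count` elements."""
    def condition(driver):
        found = find(driver)
        # WebDriverWait keeps polling while the condition returns a falsy value
        return found if len(found) >= count else False
    return condition


def second_match(driver, css, timeout=10):
    """Wait until at least two elements match `css`, then return the second one."""
    # Imported lazily so the condition helper can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    find = lambda d: d.find_elements(By.CSS_SELECTOR, css)
    return WebDriverWait(driver, timeout).until(at_least(2, find))[1]
```

Something like second_match(driver, "div.my_class") would then replace the bare [1] index, raising a timeout error instead of an IndexError when the second element never appears.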

python selenium accessing html elements on more than two levels

I have a page that has to be scraped. I use this Python code:
div = driver.find_element_by_class_name("parent")
data = div.find_elements_by_class_name("child1")
//I cannot access the web elements of **data** for eg: data.find_elements_by
for tag in data
//I cannot print the information of each div here
The HTML:
<div class="Parent">
<div class = child1 >
<div class = "heading">
data
</div>
</div>
<div class = child1 child2 >strong text
<div class = "heading">
<span>data</span>
</div>
</div>
</div>
Is there an easy way to access the data?

Well you can access html tags or text in different ways http://selenium-python.readthedocs.io/locating-elements.html
For multiple elements you can use :
find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector
As far as I'm aware there isn't a single simple solution; it depends on the specifics of the information you're looking for.
For instance, let's say you're using XPath (my personal preference).
Absolute XPath: /html/body/div[2]/div/div/footer/section[3]/div/ul/li[3]/a
The XPath above will technically work, but each of those nested relationships needs to be present 100% of the time, or the locator will not function. A path like this is known as an absolute XPath, and there is a good chance it will change with every release. It is always better to choose a relative XPath, as it reduces the chance of an element-not-found exception.
Relative XPath: //*[@id='social-media']/ul/li[3]/a
There are different ways to get at the data, so by choosing the right way to select it you can extract only the information you need. Look into each of these methods to understand them better: you're asking for one line of code, but each method has its pros and cons (times when it is useful and times when it is not).

It seems you want to access the text inside the heading div. If so, you can try the code below.
element=driver.find_element_by_class_name("heading")
data=element.text
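To reach every heading rather than only the first one, the nested lookups can be done in a loop: find_elements returns a list, and each item of that list can be searched in turn. The sketch below assumes Selenium 4 and the class names from the question's HTML (note the capital P in "Parent"); heading_texts and HEADINGS_CSS are hypothetical names.

```python
# A single descendant selector that reaches the same elements in one call.
HEADINGS_CSS = ".Parent .child1 .heading"


def heading_texts(driver):
    """Collect the text of each .heading nested under .Parent > .child1."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium.webdriver.common.by import By

    parent = driver.find_element(By.CLASS_NAME, "Parent")
    texts = []
    for child in parent.find_elements(By.CLASS_NAME, "child1"):
        # each list item is a WebElement, so it has its own find_element
        heading = child.find_element(By.CLASS_NAME, "heading")
        texts.append(heading.text.strip())
    return texts
```

If the intermediate elements are not needed, driver.find_elements(By.CSS_SELECTOR, HEADINGS_CSS) should return the same heading elements without the explicit loop.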

I'm assuming you are asking for a way to loop through data that sits under different locators at different nesting levels.
There are multiple ways:
Look for selectors that match your pattern and pick one that fits your problem; a CSS/XPath selector reference will help.
If there are many selectors (but they are used consistently), you can use the ByChained/ByAll selectors. Look at their implementation in Java and mimic it; it would look like this:
selector1 = .Heading .child2
selector2 = .Heading .child1
selector3 = .Heading .child3 span
ByAll(selector1, selector2, selector3)
If the parent is the only selector that matches and there's no way to know the child selectors, another way is to read the innerText/textContent property from a common parent:
driver.findElement(By.cssSelector(".child1")).getAttribute("innerText")
If none of these solves your problem, and your application is dynamic enough to use different references and different nesting levels every time across the page, then it was not meant to be scraped this way, and you should look for other ways of scraping it.

to get xpath in python selenium for ID and class together

In Python Selenium, how do I create an XPath for the element below using only its id and class?
<button type="button" id="ext-gen756" class=" x-btn-text">Save</button>
I also need to select Global ID from the drop-down below without clicking it.
<div class="x-combo-list-item">Global ID</div>
My solution below is not working:
//div[@class='x-combo-list-item']/div[contains(.,'Global ID')]
I do not want to rely on the drop-down sequence number like this:
//div[@class='x-combo-list-item']/div[1]

If you want to combine id and class in your XPath, try this:
driver.find_element_by_xpath('//button[@id="ext-gen756"][@class=" x-btn-text"]')
You can also do the same using and:
driver.find_element_by_xpath('//button[@id="ext-gen756" and @class=" x-btn-text"]')
EDITED
Your XPath seems incorrect. Use the following:
driver.find_element_by_xpath('//div[@class="x-combo-list-item"][contains(.,"Global ID")]')

Just answering my own question after taking another look at it a long time later. The question was posted when I was new to XPath.
<button type="button" id="ext-gen756" class=" x-btn-text">Save</button>
In terms of id and class:
driver.find_element_by_xpath("//button[@id='ext-gen756'][@class=' x-btn-text']")
Also, if the ids are dynamic and change on every reload of the page, you may try:
driver.find_element_by_xpath("//button[@type='button'][contains(@id,'ext-gen')][@class=' x-btn-text']")
Here I have used @type and, for @id, the contains option, since the prefix (ext-gen) usually remains the same for dynamic ids.
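Matching the whole class attribute (note the leading space in " x-btn-text") is fragile, and a contains() test on the id also matches the prefix in the middle of an id. A slightly more defensive variant, sketched here with hypothetical helper names, uses starts-with() for the id prefix, the usual XPath 1.0 idiom for matching a single class token, and the button text. It only builds the expression string; passing it to Selenium is left to the caller.

```python
def class_token(name):
    """XPath 1.0 test that `name` is one of the space-separated class tokens."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


def button_xpath(id_prefix, css_class, text):
    """Locate a <button> by a stable id prefix, one class token and its visible text."""
    return (
        f"//button[starts-with(@id, '{id_prefix}')]"
        f"[{class_token(css_class)}]"
        f"[normalize-space()='{text}']"
    )


print(button_xpath("ext-gen", "x-btn-text", "Save"))
```

With Selenium 4 the result would be passed as driver.find_element(By.XPATH, button_xpath("ext-gen", "x-btn-text", "Save")). The helpers do no quoting, so they assume the arguments contain no single quotes.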

Selenium XPath multiple attributes including text

Here is the HTML I'm dealing with
<a class="_54nc" href="#" role="menuitem">
<span>
<span class="_54nh">Other...</span>
</span>
</a>
I can't seem to get my XPath structured correctly to find this element with the link. There are other elements on the page with the same attributes as <a class="_54nc"> so I thought I would start with the child and then go up to the parent.
I've tried a number of variations, but I would think something like this:
crawler.get_element_by_xpath('//span[@class="_54nh"][contains(text(), "Other")]/../..')
None of the things I've tried seem to be working. Any ideas would be much appreciated.

Or, more cleanly, use //*[.='Other...']/../.. where . refers to the text content of the current element, so you match on the text directly and then step up to the parent elements.
Alternatively, if you want to find the a tag, use the CSS selector [role='menuitem'], which is a better option if the role attribute is unique.

How about trying this:
crawler.get_element_by_xpath('//a[@class="_54nc"][./span/span[contains(text(), "Other")]]')

Try this:
crawler.get_element_by_xpath("//a[@class='_54nc']//span[.='Other...']")
This searches inside the a element with class "_54nc" for the span whose exact text is "Other...". You can replace "Other..." with other text to find the corresponding element(s).
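The "start at the child and go up to the parent" idea from the question can be checked offline with Python's built-in xml.etree.ElementTree (Python 3.7 or later). It implements only a small XPath subset: it supports [.='text'] and .., but not contains() or text(), so the expression differs slightly from the ones in the thread. The first link in the snippet is hypothetical, added only to show that the class alone is ambiguous.

```python
import xml.etree.ElementTree as ET

# The question's markup plus a hypothetical second link with the same class,
# wrapped in a root element so the snippet parses as XML.
SNIPPET = """
<div>
  <a class="_54nc" href="#" role="menuitem"><span><span class="_54nh">Custom</span></span></a>
  <a class="_54nc" href="#" role="menuitem">
    <span>
      <span class="_54nh">Other...</span>
    </span>
  </a>
</div>
"""

root = ET.fromstring(SNIPPET)
# Match the inner span by class and exact text, then step up two levels to the <a>.
link = root.find(".//span[@class='_54nh'][.='Other...']/../..")
print(link.tag, link.get("class"))  # a _54nc
```

The equivalent full XPath for a browser driver would be //span[@class="_54nh"][.="Other..."]/../.. with the same two parent steps.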
