Selenium Python - Finding div containing two specific elements - python

I'm building a Python script using Selenium, and have run into quite a confusing problem.
The website lists products using a name, which is not unique, and a color, which is not unique either. The color and name elements have the same parent.
My script takes user input for the product the user wants it to buy, and in which color.
The problem:
I can't for the life of me figure out how to select the right product using the two variables productName and productColor.
DOM:
<div class="inner-article">
<h1>
<a class="product-name">Silk Shirt</a>
</h1>
<p>
<a class="product-color">Black</a>
</p>
</div>
What I've tried so far:
Obviously, selecting the first product named Silk Shirt on the page is quite easy. I considered selecting the first product, then selecting that product's parent, then that element's parent, then that parent's second child, checking whether it was black, and proceeding, but CSS doesn't have a parent selector.
How would I go about doing this?

Create a main loop that selects each div class="inner-article" element.
In the loop, look for elements that have an h1 child element and an a class=product-name grandchild element with text of "Silk Shirt", and a p child element and an a class=product-color grandchild element with text of "Black".

Perhaps try searching using xpath. The xpath below will return the div element that contains the product and color you desire.
driver.find_element_by_xpath('//div[@class="inner-article"][.//a[@class="product-name"][.="Silk Shirt"]][.//a[@class="product-color"][.="Black"]]')
To make it reusable:
def select_product(name, color):
    return driver.find_element_by_xpath('//div[@class="inner-article"][.//a[@class="product-name"][.="{product_name}"]][.//a[@class="product-color"][.="{product_color}"]]'.format(product_name=name, product_color=color))
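Note that on Selenium 4+ the find_element_by_* helpers have been removed in favor of find_element(By.XPATH, ...). A minimal sketch, using a hypothetical helper that builds the XPath as a plain string so the locator logic can be checked without a browser:

```python
# Hypothetical helper: builds the same XPath as above as a plain string.
# Caveat: product names containing double quotes would break this locator.
def product_xpath(name, color):
    return (
        '//div[@class="inner-article"]'
        '[.//a[@class="product-name"][.="{name}"]]'
        '[.//a[@class="product-color"][.="{color}"]]'
    ).format(name=name, color=color)

# With Selenium 4 (assumes a live driver):
# from selenium.webdriver.common.by import By
# element = driver.find_element(By.XPATH, product_xpath("Silk Shirt", "Black"))
```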

Related

How can I search a specific element on the page for text using XPath (Selenium)?

I can't seem to find an example of this.
What I am trying to do is search a specific div element on the page for text that has the potential to change.
So it'd be like this
<div id="coolId">
<div>This</div>
<div>Can</div>
<div>Change depending on the iteration of the page</div>
</div>
In my case, the div coolId will always be present, but the text within its inner divs and child elements will change depending on which iteration of the page is loaded. I need to search for the presence of certain terms within this coolId div only, because I know it will always be there, and I'd like to narrow the search as much as possible so as not to contaminate the results with other text from elsewhere on the page.
In my head, I sort of see it like this (using the above example):
"//div[@id='coolId', contains(text(), 'Change depending on the iteration of the page')]"
Or something to this effect.
Does anyone know how to do this?
I'm not completely sure you can build a correct XPath based on all 3 inner elements' texts.
What you clearly can do is locate the outer div with id = coolId based on one of the inner texts that will be unique, and then extract all the inner text from it.
the_total_text = driver.find_element_by_xpath("//div[@id and contains(.,'Change depending on the iteration of the page')]").text
This will give you
the_total_text = This Can Change depending on the iteration of the page
You should try:
div_element_with_needed_text = driver.find_element_by_xpath("//div[@id='coolId']/div[text()[contains(.,'Change depending on the iteration of the page')]]")
Considering the HTML:
<div id="coolId">
<div>This</div>
<div>Can</div>
<div>Change depending on the iteration of the page</div>
</div>
to retrieve the variable texts with respect to the parent <div id="coolId"> you can use the following solutions:
Extracting This using xpath:
first_child_text = driver.find_element(By.XPATH, "//div[@id='coolId']//following::div[1]").text
Extracting Can using xpath:
second_child_text = driver.find_element(By.XPATH, "//div[@id='coolId']//following::div[2]").text
Extracting Change depending on the iteration of the page using xpath:
third_child_text = driver.find_element(By.XPATH, "//div[@id='coolId']//following::div[3]").text
To extract all the texts from the descendants using xpath:
all_child_text = driver.find_element(By.XPATH, "//div[@id='coolId']").text
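A browser-free way to sanity-check the same extraction, using the stdlib ElementTree parser on the sample HTML. Note it indexes the direct child divs rather than using the following:: axis, which counts every following div in the whole document and can therefore over-match on a real page:

```python
import xml.etree.ElementTree as ET

# The stripped-down sample from the question.
html = """
<div id="coolId">
  <div>This</div>
  <div>Can</div>
  <div>Change depending on the iteration of the page</div>
</div>
"""

root = ET.fromstring(html)
children = root.findall("./div")   # direct child divs, in document order
texts = [c.text for c in children]
full_text = " ".join(texts)        # analogous to Selenium's .text on the parent
```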

How would you click all texts on a page with Xpath - Python

So, this won't be a long description, but I am trying to have XPath click on all of the elements (more specifically, text elements) that are on a page. I really don't know where to start, and all of the other questions on clicking everything on a page are based on a class, not on text using XPath.
Here is some of my code:
browser.find_element_by_xpath("//*[text()='sample']").click()
I really don't know how I would go about to make it click all of the "sample" texts throughout the whole page.
Thanks in advance!
Well, let's say that you have lots of divs or spans that contain text. Let's figure out divs:
<div class="some class name" visibility ="visible" some other attribute> Text here </div>
Now when you open developer mode (F12) and go to the Elements section, you can try //div[contains(@class, 'some class name')], and if there is more than one match you can store all of them in a list, just like below:
driver.find_elements(By.XPATH, "//div[contains(@class, 'some class name')]")
this will give you a list of div web elements.
div_list = driver.find_elements(By.XPATH, "//div[contains(@class, 'some class name')]")
Now you have a python list and you can manipulate this list as per your requirement.
for div_text in div_list:
    print(div_text.text)
Same way you can try for span or different web elements.
You just need to use that xpath to define an array of elements instead, like this:
my_elements = browser.find_elements_by_xpath("//*[text()='sample']")
for element in my_elements:
    element.click()
That loop may not work as is (you could maybe add a wait for element) but that's the idea.
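The same idea as a self-contained sketch, using hypothetical FakeDriver/FakeElement stand-ins so the loop can run without a browser. With real Selenium you would pass the actual driver and locate with By.XPATH:

```python
class FakeElement:
    """Stand-in for a Selenium WebElement; only click() is needed here."""
    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True

class FakeDriver:
    """Stand-in for a WebDriver that returns a fixed list of matches."""
    def __init__(self, matches):
        self.matches = matches

    def find_elements(self, by, locator):
        return self.matches

def click_all(driver, xpath="//*[text()='sample']"):
    # Note: if a click navigates or re-renders the page, the remaining
    # elements may go stale; re-find the list in that case.
    elements = driver.find_elements("xpath", xpath)
    for element in elements:
        element.click()
    return len(elements)

matches = [FakeElement() for _ in range(3)]
clicked = click_all(FakeDriver(matches))
```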

How to search two driver.find_element_by_partial_link_text

I am trying to create an automated bot to purchase items from supreme python/selenium.
When I am on the products page, I use driver.find_element_by_partial_link_text('Flight Pant') to find the product I want to buy. However, I also want to select the colour of the product, so I use driver.find_element_by_partial_link_text('Black'), but by doing this I am returned the first Black product on the page instead of Flight Pants that are Black. Any idea how I would achieve this goal?
Here is the site where I am trying to achieve this:
http://www.supremenewyork.com/shop/all/pants
Note - I am unable to use XPaths for this, as the products change on a weekly basis, so I would be unable to get the XPath for a product before it goes live on the site.
Any advice or guidance would be greatly appreciated.
You can use XPath, but the maneuver is slightly trickier. The XPath would be:
driver.find_element_by_xpath('//*[contains(text(), "Flight Pant")]/../following-sibling::p/a[contains(text(), "Black")]')
Assuming the structure of the page doesn't change on a weekly basis... To explain my XPath:
//*[contains(text(), "Flight Pant")]
Select any node that contains the text "Flight Pant". These are all <a> tags.
/../following-sibling::p
Notice how the DOM looks:
<h1>
<a class="name-link" href="/shop/pants/dfkjdafkj">Flight Pant</a>
</h1>
<p>
<a class="name-link" href="/shop/pants/pvfcp0txzy">Black</a>
</p>
So we need to go to the parent and find its sibling that is a <p> element.
/a[contains(text(), "Black")]
Now go to the <a> tag that has the text Black.
The reason there's not really any alternative to XPath here is that there's no other unique way to identify the desired element (tag name, class, link text, etc.).
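Since the products change weekly, the XPath above can be factored into a small, hypothetical builder that takes the name and colour as parameters; the builder is plain string work, so it needs no browser:

```python
# Hypothetical helper: parametrizes the answer's XPath over name and colour.
def name_color_xpath(name, color):
    return (
        '//*[contains(text(), "{name}")]'
        '/../following-sibling::p'
        '/a[contains(text(), "{color}")]'
    ).format(name=name, color=color)

# With Selenium (assumes a live driver):
# link = driver.find_element_by_xpath(name_color_xpath("Flight Pant", "Black"))
# link.click()
```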
After finding elements by the partial link text "Flight Pant", iterate over each found result and extract its CSS color attribute. It's pseudo-code; you will have to fine-tune the specific color extraction.
elements = driver.find_elements_by_partial_link_text("Flight Pant")
for element in elements:
    # value_of_css_property usually returns an rgba() string, not a colour name
    if element.value_of_css_property('color').lower() == "black":
        element.click()
        break

What is a unique identifier and how to use it to select?

I use Selenium and I am trying to automate a task on a website and in order to select an item I have to use this:
select = driver.find_element_by_*whatever*
However, all the whatevers like find_element_by_id, by name, by tag name etc. are either unavailable or are shared by several items. The only one that seems to be unique to each item is a "data-id" number but there isn't a find_element_by_data_id function as far as I know.
I can get a unique identifier which looks like this:
div.item:nth-child(453)
It seems to fit since it doesn't change when I reload the page and is unique to only one item.
How can I use this unique identifier to select the object? Alternatively, could you suggest a way of how I could select the desired item?
Here's the HTML pertaining to the object:
...
</div>
<div data-id="3817366931"
data-slot="secondary"
data-classes="pyro"
data-content="Level: 30<br/>"
data-appid="440"
class="item hoverable quality6 app440"
style="opacity:1;background-image:url(https://steamcdn-a.akamaihd.net/apps/440/icons/c_drg_manmelter.b76b87bda3242806c05a6201a4024a560269e805.png);"
data-title="Manmelter"
data-defindex="595">
</div>
<div data-id="3820690816"
data-slot="primary"
data-classes="pyro"
data-content="Level: 10<br/>"
data-appid="440"
class="item hoverable quality6 app440"
style="opacity:1;background-image:url(https://steamcdn-a.akamaihd.net/apps/440/icons/c_drg_phlogistinator.99b83086e28b2f85ed4c925ac5e3c6e123289aec.png);"
data-title="Phlogistinator"
data-defindex="594">
</div>
<div data-id="3819377317"
data-slot="primary"
data-classes="pyro"
data-content="Level: 10<br/>"
data-appid="440"
class="item hoverable quality6 app440"
style="opacity:1;background-image:url(https://steamcdn-a.akamaihd.net/apps/440/icons/c_drg_phlogistinator.99b83086e28b2f85ed4c925ac5e3c6e123289aec.png);"
data-title="Phlogistinator"
data-defindex="594">
So the items in the two bottom boxes are the same. The one at the top is different. Say I would like a way to select the item in the second box.
I am not sure how easy it will be to automate this scenario based on an HTML structure like this. I would suggest you talk to the devs to see if they can add some kind of id to each parent div; otherwise the selector will be too brittle. I also see the data-id attribute is unique in every case, so that could be your best bet if you somehow know the ids beforehand. If you do not have any other options, then the CSS nth-child() function is the next most reliable mechanism. But in that case you have to know the parent. nth-child() is well explained here
On the other hand, if the intention is to find the second data-slot you can use the following xpath:
//div[@data-slot='primary'][2]
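As an aside, there effectively is a "find by data-id": data-* attributes can be targeted with a CSS attribute selector. A sketch with a hypothetical helper, using a data-id value from the HTML above:

```python
# Hypothetical helper: a CSS attribute selector targeting a data-id value.
def data_id_selector(data_id):
    return 'div.item[data-id="{}"]'.format(data_id)

# With Selenium 4 (assumes a live driver):
# from selenium.webdriver.common.by import By
# item = driver.find_element(By.CSS_SELECTOR, data_id_selector("3820690816"))
```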

Iterating Over Elements and Sub Elements With lxml

This one is for legitimate lxml gurus. I have a web scraping application where I want to iterate over a number of div.content (content is the class) tags on a website. Once in a div.content tag, I want to see if there are any <a> tags that are the children of <h3> elements. This seems relatively simple by just trying to create a list using XPath from the div.cont tag, i.e.,
linkList = tree.xpath('div[contains(@class,"cont")]//h3//a')
The problem is, I then want to create a tuple that contains the link from the div.content box as well as the text from the paragraph element of the same div.content box. I could obviously iterate over the whole document and store all of the paragraph text as well as all of the links, but I wouldn't have any real way of matching the appropriate paragraphs to the <a> tags.
lxml's Element.iter() function could ALMOST achieve this by iterating over all of the div.cont elements, ignoring those without <a> tags, and pairing up the paragraph/a combos, but unfortunately there doesn't seem to be any option for iterating over class names, only tag names, with that method.
Edit: here's an extremely stripped down version of the HTML I want to parse:
<body>
<div class="cont">
<h1>Random Text</h1>
<p>The text I want to obtain</p>
<h3>The link I want to obtain</h3>
</div>
</body>
There are a number of div.conts like this that I want to work with -- most of them have far more elements than this, but this is just a sketch to give you an idea of what I'm working with.
You could just use a less specific XPath expression:
for matchingdiv in tree.xpath('div[contains(@class,"cont")]'):
    # skip those without a h3 > a setup.
    link = matchingdiv.xpath('.//h3//a')
    if not link:
        continue
# grab the `p` text and of course the link.
You could expand this (be ambitious) and select for the h3 > a tags, then go to the div.cont ancestor (based off XPath query with descendant and descendant text() predicates):
for matchingdiv in tree.xpath('.//h3//a/ancestor::*[self::div[contains(@class,"cont")]]'):
    # no need to skip anymore, this is a div.cont with h3 and a contained
    link = matchingdiv.xpath('.//h3//a')
    # grab the `p` text and of course the link
but since you need to then scan for the link anyway that doesn't actually buy you anything.
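The pairing loop can be sketched without lxml using the stdlib ElementTree parser; the structure is the same, only the XPath engine is weaker. I've assumed an <a> inside the <h3>, as described in the question (the stripped-down sample shows the link text directly in the <h3>), and the href value is invented for illustration:

```python
import xml.etree.ElementTree as ET

# Two div.cont blocks: one with an h3 > a link, one without.
html = """
<body>
  <div class="cont">
    <h1>Random Text</h1>
    <p>The text I want to obtain</p>
    <h3><a href="/target">The link I want to obtain</a></h3>
  </div>
  <div class="cont">
    <h1>No link here</h1>
    <p>Paragraph without a link</p>
  </div>
</body>
"""

root = ET.fromstring(html)
pairs = []
for div in root.findall("./div"):
    if "cont" not in div.get("class", ""):
        continue
    link = div.find(".//h3/a")   # ElementTree's limited XPath subset
    if link is None:
        continue                 # skip div.cont blocks without a link
    para = div.find(".//p")
    pairs.append((para.text, link.get("href")))
```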
