XPath: target div and image in loop? - python

Here's the document structure:
<div class="search-results-container">
<div>
<div class="feed-shared-update-v2">
<div class="update-components-actor">
<div class="update-components-actor__image">
<img class="presence-entity__image" src="https://www.testimage.com/test.jpg"/>
<span></span>
<span>test</span>
</div>
</div>
</div>
</div>
<div>
<div class="feed-shared-update-v2">
<div class="update-components-actor">
<div class="update-components-actor__image">
<img class="presence-entity__image" src="https://www.testimage.com/test.jpg"/>
<span></span>
<span>test</span>
</div>
</div>
</div>
</div>
</div>
Not sure of the best way to do this, but hoping someone can help. I have a for loop that grabs all the divs that precede a div with class "feed-shared-update-v2". This works:
elements = driver.find_elements(By.XPATH, "//*[contains(@class, 'feed-shared-update-v2')]//preceding::div[1]")
I then run a for loop over it:
for card in elements:
However, I'm having trouble targeting the img and the second span inside that loop. I tried:
for card in elements:
    profilePic = card.find_element(By.XPATH, ".//following::div[@class='update-components-actor']//following::img[1]").get_attribute('src')
    text = card.find_element(By.XPATH, ".//following::div[@class='update-components-text']//following::span[2]").text
but this produces an error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//following::div[@class='update-components-actor']//following::img[1]"}
So I'm hoping someone can point me in the right direction as to what I'm doing wrong. I know it's my XPath syntax and that I'm not allowed to chain following:: axes (although even just .//following doesn't work, so is ".//" not the right syntax?), but I'm not sure what the right syntax should be, especially since the span does not have a class. :(
Thanks!

I guess you are overusing the following:: axis. Simply try the following (no pun intended):
For your first expression use
//*[contains(@class, 'feed-shared-update-v2')]/..
This will select the parent <div> of the <div class="feed-shared-update-v2">. So you will select the whole surrounding element.
To retrieve the children you want, use these XPaths: .//img/@src and .//span[2]. The full code is
for card in elements:
    profilePic = card.find_element(By.XPATH, ".//img").get_attribute('src')
    text = card.find_element(By.XPATH, ".//span[2]").text
That's all. Hope it helps.
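If you want to sanity-check these relative XPaths without starting a browser, the stdlib ElementTree supports this subset of XPath 1.0. A minimal sketch against one card from the question's HTML (the URL and "test" text are just the sample data):

```python
import xml.etree.ElementTree as ET

# Offline check of the relative XPaths from the answer, using the
# stdlib ElementTree (it supports this subset of XPath 1.0).
html = """
<div class="search-results-container">
  <div>
    <div class="feed-shared-update-v2">
      <div class="update-components-actor">
        <div class="update-components-actor__image">
          <img class="presence-entity__image" src="https://www.testimage.com/test.jpg"/>
          <span></span>
          <span>test</span>
        </div>
      </div>
    </div>
  </div>
</div>
"""
root = ET.fromstring(html)

results = []
# Each "card" is the parent <div> of a feed-shared-update-v2 div.
for card in root.findall("./div"):
    profile_pic = card.find(".//img").get("src")  # what .//img/@src selects
    text = card.find(".//span[2]").text           # the second span
    results.append((profile_pic, text))

print(results)
```

In Selenium the same two lookups are the `card.find_element(By.XPATH, ...)` calls shown above; the sketch only confirms what the paths select.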

It seems there is no div with the class update-components-text in your HTML - did you mean update-components-actor?
I'm not much of a fan of XPath, but when I copied your HTML and your img selector, it did find 2 imgs for me. Maybe you are not waiting for the elements to load, and then it fails?
Try using implicit/explicit waits in your code.
I know you are using XPath, but consider using CSS.
This might do the trick:
.feed-shared-update-v2 span:nth-of-type(2)
And if you want a css of the img:
.feed-shared-update-v2 img

Related

Python, Selenium: How to get text next to element

I'm fairly new to selenium and I'm trying to get the text of a cell next to a known element.
This is an excerpt of a webtable:
<div class="row">
<div class="cell">
text-to-copy
</div>
<div class="cell">
<input type="text" size="10" id="known_id" onchange="update(this.id);" onclick="setElementId(this.id);"/>
X
</div>
<div class="cell right">
<div id="some_id">?</div>
</div>
</div>
From this table I would like to get the text-to-copy with Selenium. As the composition of the table can vary, there is no way to know that cell's XPath, so I cannot use selenium_driver.find_element_by_xpath(). The only known thing is the id of the cell next to it (id=known_id).
The following pseudo code is to illustrate what I'm looking for:
element = selenium_driver.find_element_by_id("known_id")
result = element.get_visible_text_from_cell_before_element()
Is there a way to get the visible text (text-to-copy) with selenium?
I believe you can fairly use XPath here; the other locators Selenium supports would not work, because we have to traverse upward in the DOM.
The XPath below depends on known_id:
//input[contains(@id,'known_id')]/../preceding-sibling::div
You then have to use .text or .get_attribute() etc. to get the text.
Sample code :
time.sleep(5)  # better: an explicit wait for the element
element = selenium_driver.find_element_by_xpath("//input[contains(@id,'known_id')]/../preceding-sibling::div").get_attribute('innerText')
print(element)
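The stdlib ElementTree has no reverse axes like preceding-sibling::, but if you want to check the traversal logic offline, the same upward-then-backward walk can be emulated in plain Python (a sketch using a trimmed copy of the question's table):

```python
import xml.etree.ElementTree as ET

html = """
<div class="row">
  <div class="cell">text-to-copy</div>
  <div class="cell"><input type="text" id="known_id"/>X</div>
  <div class="cell right"><div id="some_id">?</div></div>
</div>
"""
row = ET.fromstring(html)

# Emulate //input[@id='known_id']/../preceding-sibling::div :
# find the cell that contains the known input, then take the cell before it.
cells = list(row)
target = next(c for c in cells if c.find("./input[@id='known_id']") is not None)
prev_cell = cells[cells.index(target) - 1]
print(prev_cell.text.strip())
```

The real Selenium call stays as in the answer; this only demonstrates why going up to the parent (..) and then back one sibling lands on the cell with the text.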

Can't find html sub-elements from inside an element

I'm somewhat inexperienced in scraping websites with lots of sub-elements and am trying to understand the best way to loop through elements that have the data you want buried further levels down.
Here is an example HTML
<div class="s-item__info clearfix">
<h3 class="s-item__title">The Music Tree Activities Book: Part 1 (Music Tree (Summy)) by Clark, Frances, </h3>
</a>
<div class="s-item__subtitle"><span class="SECONDARY_INFO">Pre-Owned</span></div>
<div class="s-item__reviews">
</div>
<div class="s-item__details clearfix">
<div class="s-item__detail s-item__detail--primary"><span class="s-item__price">$3.99</span></div>
<span class="s-item__detail s-item__detail--secondary">
</span>
<div class="s-item__detail s-item__detail--primary"><span class="s-item__purchase-options-with-icon" aria-label="">Buy It Now</span></div>
<div class="s-item__detail s-item__detail--primary"><span class="s-item__shipping s-item__logisticsCost">Free shipping</span></div>
<div class="s-item__detail s-item__detail--primary"><span class="s-item__free-returns s-item__freeReturnsNoFee">Free returns</span></div>
<div class="s-item__detail s-item__detail--primary"></div>
</div>
</div>
There are multiple items, so I started by getting all of them in a list. I can find each title by iterating through, but I'm having an issue getting the price. Example code:
for item in driver.find_elements_by_class_name("s-item__info"):
    title = item.find_element_by_xpath('.//h3')
    print(title.text)
    details = item.find_element_by_xpath('.//span[@class="s-item__price"]')
    print(details.text)
This gets the title of the item but can't find the price. If I look outside of the "s-item__info" element and just use the driver, I can get all the prices with the code below, but I'm wondering why it can't find them inside the info element. I would think the details are a sub-element and .// would look through those.
driver.find_elements_by_class_name("s-item__price")
I have also tried:
find_element_by_xpath('.//div[@class="s-item__detail"]//span[@class="s-item__price"]')
I can grab the data I need, but I want to understand why I can't get the price when I iterate through each item. Thanks!
See if this works:
for item in driver.find_elements_by_class_name("s-item__info"):
    title = item.find_element_by_xpath('.//h3')
    print(title.text)
    details = item.find_element_by_xpath(".//following::div[contains(@class,'s-item__details')]//span[@class='s-item__price']")
    print(details.text)
OK, there are several problems here:
s-item__info is not the only class name on that element, so you should use //div[contains(@class,'s-item__info')] instead.
The first element matching this class name is not a valid search result.
The simplest approach to make your code work is:
for item in driver.find_elements_by_xpath("//div[contains(@class,'s-item__info')]"):
    title = item.find_elements_by_xpath('.//h3')
    if title:
        print(title[0].text)
    details = item.find_elements_by_xpath('.//span[@class="s-item__price"]')
    if details:
        print(details[0].text)
This will print the data where it exists and simply skip elements that have no match.
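The multi-class pitfall called out above is easy to see with nothing but the stdlib parser: an exact @class match fails when the attribute holds several space-separated class names, which is exactly why contains(@class, ...) behaves differently from [@class='...'] here. A minimal sketch:

```python
import xml.etree.ElementTree as ET

html = '<root><div class="s-item__info clearfix"><h3>Title</h3></div></root>'
root = ET.fromstring(html)

# Exact attribute match fails: the attribute value is the full string
# "s-item__info clearfix", not just "s-item__info".
exact = root.find("./div[@class='s-item__info']")
# Matching the full attribute value (or using contains() in real XPath) works.
full = root.find("./div[@class='s-item__info clearfix']")
print(exact is None, full is not None)
```

In real XPath you would write contains(@class, 's-item__info') instead of spelling out the full attribute value, since the extra classes can vary.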

Select a div for a specific class which has child elements that contains a certain text

Given this html sample :
<div class="measure-tab"> --- i want to select this one
<span class="color_title">someText</span>
</div>
<div class="measure-tab"> --- i dont want to select this one
<span class="color_title">someText</span>
<div>
<span class="qwery">jokerText</span>
</div>
</div>
<div class="measure-tab"> --- i want to select this one
<span class="color_title">someText</span>
</div>
I want to select the divs with @class='measure-tab' that have under them a span with a specific class and text = 'someText', but no nested span with a specific class and text = 'jokerText', all in one XPath.
What I've tried is:
//div[contains(@class, 'measure-tab') and //span[@class="color_title" and (contains(text(),'someText')) and //span[@class="color_title" and not(contains(text(),'jokerText'))]]
But this doesn't seem to work.
I also used this post as inspiration.
EDIT: Corrected the bad description of the goal of this question.
EDIT: I made a new attempt:
//div[contains(@class, 'measure-tab') and //span[contains(@class, 'color_title') and //span[not(contains(@class, 'qwery'))]]]
But this returns all the divs, instead of excluding the one marked --- i dont want to select this one:
<span class="color_title">someText</span>
<div>
<span class="qwery">jokerText</span>
</div>
I feel so close and yet so far, haha. It doesn't make sense to me why it matches <span class="qwery">jokerText</span> when I wrote not(contains(...)) there.
I believe this is what you are looking for:
MyDivs = driver.find_elements_by_xpath("//div[@class='measure-tab' and not(descendant::*[text() = 'jokerText' and @class = 'qwery'])]")
This will select all the measure-tab divs that do not have jokerText anywhere inside them.
You can query with not(following-sibling::div/span...).
Try the following XPath:
//span[@class='color_title' and not(following-sibling::div/span[@class='qwery' and text()='jokerText'])]/parent::div[@class='measure-tab']
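The not(descendant::...) filter can also be checked offline. The stdlib ElementTree has no not() or descendant:: in predicates, but the same condition is a plain Python filter over the candidates (a sketch using the question's sample, filtering on the qwery class as in the accepted expression):

```python
import xml.etree.ElementTree as ET

html = """
<root>
  <div class="measure-tab"><span class="color_title">someText</span></div>
  <div class="measure-tab">
    <span class="color_title">someText</span>
    <div><span class="qwery">jokerText</span></div>
  </div>
  <div class="measure-tab"><span class="color_title">someText</span></div>
</root>
"""
root = ET.fromstring(html)

# Keep only the measure-tab divs with no qwery descendant, i.e. what
# not(descendant::span[@class='qwery']) expresses in full XPath.
wanted = [
    div for div in root.findall(".//div[@class='measure-tab']")
    if div.find(".//span[@class='qwery']") is None
]
print(len(wanted))
```

Two of the three divs survive the filter, matching the "i want to select this one" annotations in the sample.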

Elements Inside Opening Tag

I am writing a spider to download all images on the front page of a subreddit using scrapy. To do so, I have to find the image links to download the images from and use a CSS or XPath selector.
Upon inspection, the links are provided but the HTML looks like this for all of them:
<div class="expando expando-uninitialized" style="display: none" data-cachedhtml=" <div class="media-preview" id="media-preview-7lp06p" style="max-width: 861px"> <div class="media-preview-content"> <img class="preview" src="https://i.redditmedia.com/Q-LKAeFelFa9wAdrnvuwCMyXLrs0ULUKMsJTXSf3y34.jpg?w=861&s=69085fb507bed30f1e4228e83e24b6b2" width="861" height="638"> </div> </div> " data-pin-condition="function() {return this.style.display != 'none';}"><span class="error">loading...</span></div>
From what I can tell, it looks like all of the new elements are being initialized inside the opening tag of the <div> element. Could you explain what exactly is going on here, and how one would go about extracting image information from this?
*Sorry, I'm not quite sure how to properly format the html code, but there really isn't all too much to format, as it is all one big tag anyway.
How to read the mangled attribute, data-cachedhtml
The HTML is a mess. Try the techniques listed in How to parse invalid (bad / not well-formed) XML? to get viable markup before using XPath. It may take three passes:
Cleanup the markup mess.
Get the attribute value of data-cachedhtml.
Use XPath to extract the image links.
XPath part
For the de-mangled data-cachedhtml in this form:
<div class="media-preview" id="media-preview-7lp06p" style="max-width: 861px">
<div class="media-preview-content">
<a href="https://i.redd.it/29moua43so501.jpg" class="may-blank">
<img class="preview" src="https://i.redditmedia.com/elided"
width="861" height="638"/>
</a>
</div>
<span class="error">loading...</span>
</div>
This XPath will retrieve the preview image links:
//a/img/@src
(That is, all src attributes of img element children of a elements.)
or
This XPath will retrieve the click-through image links:
//a[img]/@href
(That is, all href attributes of the a elements that have an img child.)
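Once the attribute value has been unescaped into the clean markup above, both XPaths can be checked with the stdlib ElementTree (the long src is shortened, as in the answer's snippet):

```python
import xml.etree.ElementTree as ET

cached = """
<div class="media-preview" id="media-preview-7lp06p" style="max-width: 861px">
  <div class="media-preview-content">
    <a href="https://i.redd.it/29moua43so501.jpg" class="may-blank">
      <img class="preview" src="https://i.redditmedia.com/elided" width="861" height="638"/>
    </a>
  </div>
  <span class="error">loading...</span>
</div>
"""
root = ET.fromstring(cached)

# //a/img/@src  -> the preview image links
preview_srcs = [img.get("src") for img in root.findall(".//a/img")]
# //a[img]/@href -> the click-through image links
click_hrefs = [a.get("href") for a in root.findall(".//a[img]")]
print(preview_srcs, click_hrefs)
```

ElementTree cannot return attributes directly from a path, so the @src/@href step becomes a .get() call on the selected elements; in scrapy you would run the XPaths as written against the unescaped attribute value.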

how to insert all ids with similar text into a list with selenium (python)

<div>
<div id="ide_1"> </div>
<div id="ide_3"> </div>
<div id="ide_5"> </div>
<div id="ide_7"> </div>
</div>
I want to select all the ids of the child divs and insert them into a list, but I didn't find any way to get into the parent div. I am trying to find all ids similar to ide_, because that's the fixed part that wouldn't change.
You can use a CSS selector to search for all ids that contain ide_:
find_elements_by_css_selector('[id*="ide_"]')
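Collecting the matches into a list is then a comprehension over the elements. A sketch of that collection step using the stdlib ElementTree on the question's HTML (the startswith check mirrors the fixed ide_ prefix; with Selenium you would instead iterate the result of find_elements_by_css_selector and call get_attribute('id')):

```python
import xml.etree.ElementTree as ET

html = """
<div>
  <div id="ide_1"> </div>
  <div id="ide_3"> </div>
  <div id="ide_5"> </div>
  <div id="ide_7"> </div>
</div>
"""
root = ET.fromstring(html)

# Collect every child div id that starts with the fixed "ide_" part.
ids = [d.get("id") for d in root.findall("./div")
       if d.get("id", "").startswith("ide_")]
print(ids)
```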
You can use find_elements_by_xpath(); this returns a list of elements matching the specified path.
Let's say your div is located as:
<html>
<body>
<form>
<table>
<div>
Then you have to specify it as:
driver.find_elements_by_xpath(r'html/body/form/table/div')
In case you have a class name, some text, or anything else on the main div element, you can use any of the find_elements methods. For further reading, see Locating Elements.
Hope it helps. Happy Coding :)
