How to access elements within a web element in Selenium? - python

I have a list of web elements that I defined as follows:
sellersList = browser.find_elements_by_class_name('gig-card-layout')
Each web element looks like this:
<div class="gig-card-layout">
<div>
<div class="gig-wrapper card" data-gig-id="gig_id" data-impression-collected="true">
...
<div class="seller-info text-body-2">...</div>
<h3 class="text-display-7">...</h3>
<footer>
...
</footer>
</div>
</div>
</div>
I would like to access the price text located in the footer of each web element using a for loop.
How could I do that?

You can get the price elements directly using the snippet below.
# get all price elements
priceElems = driver.find_elements_by_css_selector(".gig-card-layout footer a")
# iterate through all price elements and print each price
for priceElem in priceElems:
    print(priceElem.get_attribute('title'))
If you want to reuse sellersList and iterate through the list instead, you can do the following:
for seller in sellersList:
    print(seller.find_element_by_xpath(".//footer/a").get_attribute('title'))
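The relative-XPath logic can be sanity-checked without a live browser. This is a minimal sketch using lxml (an assumption: lxml is not part of the original answer) against a stripped-down copy of the markup, with made-up $5/$10 prices:

```python
from lxml import html

# stripped-down copy of the gig-card markup with hypothetical prices
doc = html.fromstring("""
<div>
  <div class="gig-card-layout"><div><div class="gig-wrapper card">
    <footer><a title="$5">link</a></footer>
  </div></div></div>
  <div class="gig-card-layout"><div><div class="gig-wrapper card">
    <footer><a title="$10">link</a></footer>
  </div></div></div>
</div>
""")

# same per-card pattern as the Selenium loop: ".//footer/a" relative to each card
cards = doc.xpath("//div[@class='gig-card-layout']")
prices = [card.xpath(".//footer/a")[0].get("title") for card in cards]
print(prices)  # ['$5', '$10']
```

The key point is the leading dot in `.//footer/a`, which scopes the search to the current card instead of the whole document.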

Related

Python, Selenium: How to get text next to element

I'm fairly new to Selenium and I'm trying to get the text of the cell next to a known element.
This is an excerpt of a webtable:
<div class="row">
<div class="cell">
text-to-copy
</div>
<div class="cell">
<input type="text" size="10" id="known_id" onchange="update(this.id);" onclick="setElementId(this.id);"/>
X
</div>
<div class="cell right">
<div id="some_id">?</div>
</div>
</div>
From this table I would like to get the text-to-copy with Selenium. As the composition of the table can vary, there is no way to know that cell's XPath in advance, so I cannot use selenium_driver.find_element_by_xpath() directly. The only known thing is the id of the cell next to it (id=known_id).
The following pseudo code is to illustrate what I'm looking for:
element = selenium_driver.find_element_by_id("known_id")
result = element.get_visible_text_from_cell_before_element()
Is there a way to get the visible text (text-to-copy) with selenium?
I believe you can fairly use XPath here; the other locator strategies Selenium supports would not work, because we have to traverse upward in the DOM.
The XPath below depends on known_id:
//input[contains(@id,'known_id')]/../preceding-sibling::div
You then have to use .text or .get_attribute() to read the text.
Sample code:
import time

time.sleep(5)  # crude wait for the page to load; an explicit wait would be more robust
element = selenium_driver.find_element_by_xpath("//input[contains(@id,'known_id')]/../preceding-sibling::div").get_attribute('innerText')
print(element)
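As a quick offline check of the preceding-sibling approach, here is a sketch with lxml (an assumption; the original answer uses Selenium) against the table excerpt from the question:

```python
from lxml import html

# the row excerpt from the question
row = html.fromstring("""
<div class="row">
  <div class="cell">text-to-copy</div>
  <div class="cell"><input type="text" id="known_id"/> X</div>
  <div class="cell right"><div id="some_id">?</div></div>
</div>
""")

# climb from the input to its parent cell, then step back to the preceding cell
cell = row.xpath("//input[contains(@id,'known_id')]/../preceding-sibling::div")[0]
print(cell.text.strip())  # text-to-copy
```

The `..` step moves from the input up to its cell, and the reverse axis then selects the cell before it.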

Can't find html sub-elements from inside an element

I'm somewhat inexperienced at scraping websites with lots of sub-elements and am trying to understand the best way to loop through elements whose data is buried further levels down.
Here is an example HTML
<div class="s-item__info clearfix">
<h3 class="s-item__title">The Music Tree Activities Book: Part 1 (Music Tree (Summy)) by Clark, Frances, </h3>
</a>
<div class="s-item__subtitle"><span class="SECONDARY_INFO">Pre-Owned</span></div>
<div class="s-item__reviews">
</div>
<div class="s-item__details clearfix">
<div class="s-item__detail s-item__detail--primary"><span class="s-item__price">$3.99</span></div>
<span class="s-item__detail s-item__detail--secondary">
</span>
<div class="s-item__detail s-item__detail--primary"><span class="s-item__purchase-options-with-icon" aria-label="">Buy It Now</span></div>
<div class="s-item__detail s-item__detail--primary"><span class="s-item__shipping s-item__logisticsCost">Free shipping</span></div>
<div class="s-item__detail s-item__detail--primary"><span class="s-item__free-returns s-item__freeReturnsNoFee">Free returns</span></div>
<div class="s-item__detail s-item__detail--primary"></div>
</div>
</div>
There are multiple items, so I started by getting all of them in a list. I can find each title by iterating through, but I'm having an issue getting the price. Example code:
for item in driver.find_elements_by_class_name("s-item__info"):
    title = item.find_element_by_xpath('.//h3')
    print(title.text)
    details = item.find_element_by_xpath('.//span[@class="s-item__price"]')
    print(details.text)
This gets the title of the item, but can't find the price. If I look outside of the s-item__info element and just use the driver, I can get all the prices with the code below. But I'm wondering why it can't find the price inside the info element; I would have thought the details were a sub-element and .// would search through those.
driver.find_elements_by_class_name("s-item__price")
Have also tried
find_element_by_xpath('.//div[@class="s-item__detail"]//span[@class="s-item__price"]')
I can grab the data I need but want to understand why I can't get the price when I try to iterate through each item. Thanks
See if this works:
for item in driver.find_elements_by_class_name("s-item__info"):
    title = item.find_element_by_xpath('.//h3')
    print(title.text)
    details = item.find_element_by_xpath(".//following::div[contains(@class,'s-item__details')]//span[@class='s-item__price']")
    print(details.text)
OK, there are several problems here:
s-item__info is not the only class name on that element, so you should use
//div[contains(@class,'s-item__info')] instead.
Also, the first element matching this class name is not a valid search result.
The simplest way to make your code work is:
for item in driver.find_elements_by_xpath("//div[contains(@class,'s-item__info')]"):
    title = item.find_elements_by_xpath('.//h3')
    if title:
        print(title[0].text)
    details = item.find_elements_by_xpath('.//span[@class="s-item__price"]')
    if details:
        print(details[0].text)
This prints the data when it exists and simply skips any element that has no title or price (find_elements returns an empty list instead of raising an exception).
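The compound-class point can be verified offline. This sketch uses lxml (an assumption, not part of the answer) to show that an exact class match fails on class="s-item__info clearfix" while contains() succeeds:

```python
from lxml import html

# trimmed copy of the listing markup from the question
doc = html.fromstring("""
<div class="s-item__info clearfix">
  <h3 class="s-item__title">The Music Tree Activities Book</h3>
  <div class="s-item__details clearfix">
    <div class="s-item__detail s-item__detail--primary">
      <span class="s-item__price">$3.99</span>
    </div>
  </div>
</div>
""")

# exact match fails: the attribute value is "s-item__info clearfix"
assert doc.xpath("//div[@class='s-item__info']") == []

# contains() matches the multi-class element
item = doc.xpath("//div[contains(@class,'s-item__info')]")[0]
price = item.xpath(".//span[@class='s-item__price']")[0].text
print(price)  # $3.99
```

Note that Selenium's find_elements_by_class_name handles multiple classes for you, but in a raw XPath expression @class is compared as one whole string.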

Selenium get next sibling

I have more or less this structure. How can I select the next element after each title? The starting point must be x or y, since the structure has duplicated classes and so on, and this is a unique anchor. Just to clarify: I need to grab the content, and the reference is the title.
x = wd.find_elements_by_class_name('title')[0]  # Title0
y = wd.find_elements_by_class_name('title')[1]  # Title1
HTML:
<div class='global'>
<div class="main">
<p class="title">Title0</p>
<p class="content">Content0</p>
</div>
<div class="main">
<p class="title">Title1</p>
<p class="content">Content1</p>
</div>
</div>
If you are using Selenium, try the following CSS selector to get the p tag that immediately follows each title.
driver.find_elements_by_css_selector(".title+p")
To get the content value:
for item in driver.find_elements_by_css_selector(".title+p"):
    print(item.text)
For a specific element:
driver.find_element(By.XPATH, '(//p[@class="title"]/following-sibling::p)[1]')  # Content0
driver.find_element(By.XPATH, '(//p[@class="title"]/following-sibling::p)[2]')  # Content1
For all elements:
for content in driver.find_elements(By.XPATH, '//p[@class="title"]/following-sibling::p'):
    print(content.text)
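The following-sibling axis can be sanity-checked offline; the sketch below uses lxml's XPath support (an assumption, not from the answers) on the HTML from the question:

```python
from lxml import html

doc = html.fromstring("""
<div class="global">
  <div class="main"><p class="title">Title0</p><p class="content">Content0</p></div>
  <div class="main"><p class="title">Title1</p><p class="content">Content1</p></div>
</div>
""")

# following-sibling picks the <p> right after each title
contents = [p.text for p in doc.xpath("//p[@class='title']/following-sibling::p")]
print(contents)  # ['Content0', 'Content1']

# indexing the whole match set picks a specific one, as in the answer
first = doc.xpath("(//p[@class='title']/following-sibling::p)[1]")[0]
print(first.text)  # Content0
```

The parentheses around the expression matter: they index into the combined result set rather than into each title's siblings separately.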

'NoneType' error when trying to set attributes for the first tag from a parsed PDF document in Python with BeautifulSoup4

I am writing a Python script using pdfminer.six to convert a huge bulk of PDFs to HTML so I can upload them to an e-store afterwards. So far the main text blocks have been parsed quite well, but in the process I had to replace all spans with divs (and strip the spans of their attributes) for obvious reasons, so now a document's structure is as follows:
<div> #first main block
<div>
Product desc heading
</div>
<div>
Product desc text
</div>
#etc etc
</div>
<div> #second main block
<div>
Product specs heading
</div>
<div>
Product specs text
</div>
#etc etc
</div>
The problem is the navigation in identical divs. If I try to find the very first div and add some attributes to it, like the docs suggest:
firstdiv = soup.find('div')
firstdiv['class'] = 'main_productinfo'
The result is quite predictable - IDLE prints out the following error:
File "C:\Users\blabla\AppData\Local\Programs\Python\Python37\lib\site-packages\bs4\element.py", line 1036, in __setitem__
self.attrs[key] = value
TypeError: 'NoneType' object does not support item assignment
since the find() method does not always return a result (it may or may not find a match).
I want to extract the first block in each file and then parse the tables (found in the specs block below) to HTML and join the two in each upload file.
How can I add attributes to the first tag without converting the soup to a string again and again (which makes it really, really ugly, since the conversion loses the whitespace in the newly refined soup) and replacing parts of str(soup)? I'm quite new to Python and nothing readily comes to mind.
UPD:
I'm using Python 3.7.2 on Win 7 64.
I'm not getting that error:
import bs4
html = '''<div> #first main block
<div>
Product desc heading
</div>
<div>
Product desc text
</div>
#etc etc
</div>
<div> #second main block
<div>
Product specs heading
</div>
<div>
Product specs text
</div>
#etc etc
</div>'''
soup = bs4.BeautifulSoup(html, 'html.parser')
firstdiv = soup.find('div')
Output:
print (firstdiv)
<div> #first main block
<div>
Product desc heading
</div>
<div>
Product desc text
</div>
#etc etc
</div>
Then:
firstdiv['class'] = 'main_productinfo'
print (firstdiv)
<div class="main_productinfo"> #first main block
<div>
Product desc heading
</div>
<div>
Product desc text
</div>
#etc etc
</div>
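Since find() returns None when nothing matches, a defensive version of the snippet from the question guards before assigning attributes (a sketch; the class name is taken from the question):

```python
from bs4 import BeautifulSoup

# a document with no <div> at all, to trigger the None case
soup = BeautifulSoup("<p>Product desc heading</p>", "html.parser")

firstdiv = soup.find("div")  # returns None here: no match
if firstdiv is not None:
    firstdiv["class"] = "main_productinfo"
else:
    print("no <div> found, skipping")
```

This avoids the 'NoneType' object does not support item assignment error when a particular PDF produces unexpected markup.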

How to insert all ids with similar text into a list with Selenium (Python)

<div>
<div id="ide_1"> </div>
<div id="ide_3"> </div>
<div id="ide_5"> </div>
<div id="ide_7"> </div>
</div>
I want to select all ids of the child divs and insert them into a list, but I couldn't find any solution to get into the parent div. I am trying to find all ids similar to ide_, since that is the fixed part that won't change.
You can use a CSS selector to search for all ids that contain ide_:
find_elements_by_css_selector('[id*="ide_"]')
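The same substring idea works in XPath. Here is an offline sketch with lxml (an assumption, not part of the answer) that collects the ids into a list:

```python
from lxml import html

# the markup from the question
doc = html.fromstring("""
<div>
  <div id="ide_1"> </div>
  <div id="ide_3"> </div>
  <div id="ide_5"> </div>
  <div id="ide_7"> </div>
</div>
""")

# contains(@id, 'ide_') mirrors the CSS [id*="ide_"] substring match
ids = [el.get("id") for el in doc.xpath("//div[contains(@id,'ide_')]")]
print(ids)  # ['ide_1', 'ide_3', 'ide_5', 'ide_7']
```

With Selenium the equivalent would be a list comprehension over the matched elements, reading each one's id attribute with get_attribute('id').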
You can use find_elements_by_xpath(); this will return a list of elements matching the specified path.
Let's say your div is located as
<html>
<body>
<form>
<table>
<div>
Then you have to specify:
driver.find_elements_by_xpath('/html/body/form/table/div')
If the main div element has a class name, some text, or anything else distinctive, you can use any of the find_elements methods. For further reading: Locating Elements.
Hope it helps. Happy coding :)
