Get a link text and element text based on a condition - python

I need to get the text and link of an element only if it contains a 'theme-cell-card Ace' div, and skip it otherwise. Here is the sample HTML:
<div class="theme-grid-cell-frame">
<a href="/t/490">
<div class="theme-cell">
<div class="image"></div>
<div class="theme-cell-overlay deep"></div>
<h1 class="theme-cell-name"> textqwqw</h1>
<div class="theme-cell-card Ace"></div>
</div>
</a>
</div>
<div class="theme-grid-cell-frame">
<a href="/o/434">
<div class="theme-cell">
<div class="image"></div>
<div class="theme-cell-overlay deep"></div>
<h1 class="theme-cell-name"> textegg</h1>
<div class="theme-cell-card Jack"></div>
</div>
</a>
</div>
<div class="theme-grid-cell-frame">
<a href="/t/4665">
<div class="theme-cell">
<div class="image"></div>
<div class="theme-cell-overlay deep"></div>
<h1 class="theme-cell-name"> textdgfh</h1>
<div class="theme-cell-card Ace"></div>
</div>
</a>
</div>
<div class="theme-grid-cell-frame">
<a href="/o/764">
<div class="theme-cell">
<div class="image"></div>
<div class="theme-cell-overlay deep"></div>
<h1 class="theme-cell-name"> textgrth</h1>
</div>
</a>
</div>
I can get the text of each element, but I only want the ones where class="theme-cell-card Ace" is present.
${grid}    Set Variable    //div[@class='theme-cell']
@{elements}    Get WebElements    ${grid}
FOR    ${element}    IN    @{elements}
    ${text}    Get Text    ${element}
END
I am a newbie, so please let me know if you need more info. Thank you.

I don't know Robot Framework, but this is the XPath locator you want:
//a[.//div[@class='theme-cell-card Ace']]
That will get you the A tags that contain a DIV that has the desired classes. You can get the href from that element along with the contained text.
Since your question is tagged python, you can use something simple like this (Selenium 4 syntax):
from selenium.webdriver.common.by import By
aces = driver.find_elements(By.XPATH, "//a[.//div[@class='theme-cell-card Ace']]")
for ace in aces:
    print(ace.get_attribute("href"))
    print(ace.text)
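If you want to sanity-check that locator without starting a browser, the condition it encodes (keep an <a> only if some descendant div has exactly class="theme-cell-card Ace") can be reproduced offline. This is a sketch using the standard library's xml.etree.ElementTree, assuming the markup is well-formed XML once wrapped in a root element (the sample above is, but real-world HTML often is not); ace_cards and SAMPLE are hypothetical names. ElementTree's limited XPath does not allow a nested .// path inside a predicate, so the loop over <a> tags does that part by hand.

```python
import xml.etree.ElementTree as ET

# The four cards from the question, trimmed to the parts the check needs.
SAMPLE = """
<div class="theme-grid-cell-frame"><a href="/t/490"><div class="theme-cell">
  <h1 class="theme-cell-name"> textqwqw</h1>
  <div class="theme-cell-card Ace"></div>
</div></a></div>
<div class="theme-grid-cell-frame"><a href="/o/434"><div class="theme-cell">
  <h1 class="theme-cell-name"> textegg</h1>
  <div class="theme-cell-card Jack"></div>
</div></a></div>
<div class="theme-grid-cell-frame"><a href="/t/4665"><div class="theme-cell">
  <h1 class="theme-cell-name"> textdgfh</h1>
  <div class="theme-cell-card Ace"></div>
</div></a></div>
<div class="theme-grid-cell-frame"><a href="/o/764"><div class="theme-cell">
  <h1 class="theme-cell-name"> textgrth</h1>
</div></a></div>
"""


def ace_cards(markup):
    """Return (href, name) for every <a> that contains an Ace card div.

    Mirrors the XPath //a[.//div[@class='theme-cell-card Ace']].
    """
    # Wrap the fragment so it parses as a single XML document.
    root = ET.fromstring(f"<root>{markup}</root>")
    results = []
    for link in root.iter("a"):
        # find() returns None when no descendant matches the predicate.
        if link.find(".//div[@class='theme-cell-card Ace']") is not None:
            name = link.find(".//h1").text.strip()
            results.append((link.get("href"), name))
    return results


print(ace_cards(SAMPLE))  # [('/t/490', 'textqwqw'), ('/t/4665', 'textdgfh')]
```

Both this sketch and the XPath compare @class as one exact string, so they would stop matching if the site added another class to that div. The usual XPath 1.0 idiom for matching a single class token is //a[.//div[contains(concat(' ', normalize-space(@class), ' '), ' Ace ')]].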

The same locator in Robot Framework (SeleniumLibrary):
@{elements}    Get WebElements    //a[.//div[@class='theme-cell-card Ace']]
FOR    ${element}    IN    @{elements}
    ${text}    Get Text    ${element}
    ${link}    SeleniumLibrary.Get Element Attribute    ${element}    attribute=href
    Log To Console    ${text}
    Log To Console    ${link}
END

Related

all element attributes in specific container selenium python

Let's say I have some HTML that looks like this, and I use CSS selectors to make a list of elements:
<div class="item-cell">
<div class="item-container">
<div class ="item-price">
<div class = "item-info">
<span class = "price"> </span>
<div class="item-cell">
<div class="item-container">
<div class ="item-price">
<div class = "item-info">
<span class = "price"> </span>
elements = driver.find_elements(By.CSS_SELECTOR, 'div.item-cell div.item-container')
Now I have a list of elements at the item-container level. How would I go about finding the href value of each element in elements?
I was thinking of something like:
for element in elements:
    element.get_attribute("href")
I know I could go straight to the href level in my selector, but I want to check whether each container contains an href and, if it does, get the value from that container. If I select the href elements directly, the containers that have no href are simply skipped.
You could try this:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("file://{PATH_TO_YOUR_FILE}")
elements = driver.find_elements(By.CSS_SELECTOR, 'div.item-cell div.item-container')
for element in elements:
    try:
        # Look for a link inside this container only.
        link = element.find_element(By.TAG_NAME, 'a')
        print(link.get_attribute('href'))
    except NoSuchElementException:
        # This container has no <a>, so there is no href to report.
        print('No Data Available!')
driver.close()
Also, I'd suggest closing your divs with </div> and adding https:// before your URLs.
<div class="item-cell">
<div class="item-container">
<div class="item-price">
</div>
<div class="item-info">
<span class="price"> </span>
</div>
</div>
</div>
<div class="item-cell">
<div class="item-container">
<div class="item-price">
</div>
<div class="item-info">
<span class="price"> </span>
</div>
</div>
</div>
<div class="item-cell">
<div class="item-container">
</div>
</div>
If you don't add https:// before your URLs and you load the page from a local file, the browser resolves them relative to that file, so get_attribute('href') returns a file:// URL instead of the web address you expect.

Using Selenium and BS4, is it possible to scrape the text outside the "=" within the div tag?

I am looking to scrape the information below using both Selenium and bs4, and was wondering: if I find the div tag below, is it possible to scrape the data inside the quotation marks? For example: data-room-type-code="SUK"
<div
class="sl-flexbox room-price-item hidden-top-border"
data-room-name="Superior Shard Room"
data-bed-type="K"
data-bed-name="King"
data-pay-type-tag-filter="No Prepayment"
data-cancel-tag-filter=""
data-breakfast-tag-filter=""
data-room-type-code="SUK"
data-rate-code="ZBAR"
data-price="430"
>
<div class="room-price-basic-info">
<div class="room-price-title title-regular">Flexible Rate / CustomStay</div>
<ul class="abstract text-regular">
<li>No Prepayment</li>
</ul>
<div
class="show-detail text-btn js-show-detail"
data-index="0-productRates-0"
>
OFFER DETAILS
</div>
</div>
<div class="room-price-book-info">
<div class="number text-medium">GBP 430</div>
</div>
<div class="boot-btn text-medium js-booking-room" data-type="PRICE">
Book Now
</div>
</div>
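The data-* values are ordinary attributes, so they can be read like any other. A minimal BeautifulSoup sketch against a trimmed copy of the markup above, assuming the HTML has already been fetched as a string (for example from Selenium's driver.page_source); HTML, room, data and codes are illustrative names. With Selenium alone, calling get_attribute("data-room-type-code") on the located div returns the same string.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the room markup from the question.
HTML = """
<div class="sl-flexbox room-price-item hidden-top-border"
     data-room-name="Superior Shard Room"
     data-bed-type="K"
     data-room-type-code="SUK"
     data-rate-code="ZBAR"
     data-price="430">
  <div class="room-price-title title-regular">Flexible Rate / CustomStay</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# class_ matches any one of the div's classes; attributes are read by indexing.
room = soup.find("div", class_="room-price-item")
print(room["data-room-type-code"])  # SUK

# Collect every data-* attribute of that div into a plain dict.
data = {name: value for name, value in room.attrs.items() if name.startswith("data-")}
print(data["data-price"])  # 430

# attrs={...: True} matches any div that has the attribute, whatever its value.
codes = [div["data-room-type-code"] for div in soup.find_all("div", attrs={"data-room-type-code": True})]
print(codes)  # ['SUK']
```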

Unable to get whole row from BeautifulSoup

I've been practicing my scraping and everything was going fine, but as hard as I try I can't seem to get this specific piece of data.
The structure looks like this:
</div>
<div class="col-xs-12 col-sm-12 col-md-7 list-field-wrap">
<div class="pull-left">
<div class="row">
<div class=" list-field type-field" style="width: 45px"><div class="visible-xs-block visible-sm-block list-label">BIB</div>17584</div>
<div class=" list-field type-age_class" style="width: 65px"><div class="visible-xs-block visible-sm-block list-label">Division</div>20-24</div>
</div>
</div>
What I want is the 17584 next to the div with class="visible-xs-block visible-sm-block list-label".
Unfortunately, every time I try to select it, it only returns:
<div class="visible-xs-block visible-sm-block list-label">BIB</div>
This is the code I've been using to select it:
bib = soup.find('div', class_="visible-xs-block visible-sm-block list-label")
print(bib)
Edit: I was able to figure it out; the structure starts earlier.
17584 is not part of the tag with class visible-xs-block visible-sm-block list-label:
<div class=" list-field type-field" style="width: 45px">
<div class="visible-xs-block visible-sm-block list-label">
BIB
</div>
17584
</div>
Try selecting the div with class list-field type-field instead; 17584 is a text node directly inside it.
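A sketch of that suggestion with BeautifulSoup, using the markup from the question (assumed already loaded into a string). The number is a bare text node inside the outer div, so one option reads only that div's direct text children; another starts at the BIB label and steps to its next sibling.

```python
from bs4 import BeautifulSoup

HTML = """
<div class="row">
<div class=" list-field type-field" style="width: 45px"><div class="visible-xs-block visible-sm-block list-label">BIB</div>17584</div>
<div class=" list-field type-age_class" style="width: 65px"><div class="visible-xs-block visible-sm-block list-label">Division</div>20-24</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Option 1: select the outer div and keep only its direct text nodes,
# which skips the text of the nested label div ("BIB").
field = soup.select_one("div.list-field.type-field")
bib = "".join(field.find_all(string=True, recursive=False)).strip()
print(bib)  # 17584

# Option 2: find the label by its text, then take the text node right after it.
label = soup.find("div", class_="list-label", string="BIB")
print(label.next_sibling.strip())  # 17584
```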

Using Selenium Webdriver, grabbing data not showing up in innerhtml

I am trying to use selenium to grab text data from a page.
Printing the element's HTML:
element = driver.find_element(By.ID, "divresults")
print(element.get_attribute('innerHTML'))
Result:
<div id="divDesktopResults"> </div>
print(element.get_attribute('outerHTML'))
Result:
<div id="divresults" data-bind="html:resultsContent"><div id="divDesktopResults"> </div></div>
Tried grabbing this element:
driver.find_element(By.CSS_SELECTOR, "span[class='glyphicon glyphicon-tasks']")
Result:
Message: no such element: Unable to locate element: {"method":"css selector","selector":"span[class='glyphicon glyphicon-tasks']"}
This is the HTML as copied from the browser. There is much more below 'divresults' that did not show up in the innerHTML printout:
<div id="divresults" data-bind="html:resultsContent">
<div>
<div class="row" style="font-size:8pt;">
<a data-toggle="tooltip" style="text-decoration:underline" href="#pdfviewer?ID=D218101736">
<strong>D218101736 </strong>
<span class="glyphicon glyphicon-new-window"></span>
</a>
<div class="btn-group" style="font-size:8pt;margin-left:10px;" id="btnD218101736">
<span style="display:none;font-size:8pt;" id="lblD218101736"> Added To Cart</span>
<button type="button" style="font-size:8pt;" class="btn btn-primary dropdown-toggle" data-toggle="dropdown"> Add To Cart
<span class="caret"></span>
</button>
<ul class="dropdown-menu" role="menu">
<li> <strong>Regular ($7.00)</strong> </li>
<li> <strong>Certified ($12.00)</strong> </li>
</ul>
</div>
</div> <br>
<ul class="nav nav-tabs compact">
<li class="active">
<a data-toggle="tab" href="#D218101736_Doc">
<span class="glyphicon glyphicon-file"></span>
<span>Doc Info</span>
</a>
</li>
<li class="hidden-xs">
<a data-toggle="tab" href="#D218101736_Thumbnail">
<span class="glyphicon glyphicon-th-large"></span>
<span>Thumbnail</span>
</a>
</li>
....
How do I get the data beneath divresults in this case?
My guess is that it's one of two things:
1. There is more than one element that matches that locator. To check, run $$("#divresults") in the dev console and make sure it returns exactly one element. If it returns more than one, run $$("#divresults")[0] and make sure the element returned is the one you want. If it is, go on to step 2. If it isn't, you will need a more specific locator. If you want our help with that, you will need to provide a link to the page or more of the HTML surrounding the desired element.
2. You need to add a wait so that the contents of the element can finish loading. You could wait for a locator like #divresults strong, or any of a number of locators matching elements that were missing from your printout. Wait for them to be visible (or at least present). See the docs for more info and options.

Need help scraping items from a list with Scrapy using ancestor

I am trying to scrape details such as Contact, Location, Phone and Rate. The HTML is below. The list is dynamic, so sometimes only a few of the items (like Contact and Location) appear on the page, while other times all of them appear. I think I can use the icon tag to get the required text, but I can't find any documentation on this. Any help would be highly appreciated.
Thanks in advance.
<div class="detail-all-label">
<i class="abc-Contact"></i>
<div class="detail-all-text"><b>Contact</b>: Ram Bahadur</div>
</div>
<div class="detail-all-label">
<i class="abc-font abc-Location"></i>
<div class="detail-all-text"><b>Location</b>: Kathmandu</div>
</div>
<div class="detail-all-label">
<i class="abc-font abc-Website"></i>
<div class="detail-all-text"><b>Website</b>: itworkremotely</div>
</div>
<div class="detail-all-label">
<i class="abc-font abc-Phone"></i>
<div class="detail-all-text"><b>Phone</b>: 3283550121</div>
</div>
<div class="detail-all-label">
<i class="abc-font abc-Rate"></i>
<div class="detail-all-text"><b>Rate</b>: €700 - 10000</div>
</div>
You can get all of the detail values that have a preceding b element inside the div with class="detail-all-text":
for detail in response.xpath("//div[@class='detail-all-text']/b"):
    name = detail.xpath("text()").get()
    # The value is the text node right after <b>, e.g. ": Ram Bahadur".
    value = detail.xpath("following-sibling::text()").get(default="").lstrip(":").strip()
    print(name, value)
