Let's say I have some HTML code that looks like this, and I use a CSS selector to make a list of elements:
<div class="item-cell">
<div class="item-container">
<div class ="item-price">
<div class = "item-info">
<span class = "price"> </span>
<div class="item-cell">
<div class="item-container">
<div class ="item-price">
<div class = "item-info">
<span class = "price"> </span>
elements = driver.find_elements_by_css_selector('div.item-cell div.item-container')
Now I have a list of elements at the item-container level. How would I go about finding the href value of each element in elements?
I was thinking of doing something like:
for element in elements:
    element.get_attribute("href")
I know I could go straight to the href level in the selector, but I want to check whether each container contains an href and, if it does, get its value for that container. If I go directly to the href level, the containers that have no href are simply skipped.
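You could try this:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("file://{PATH_TO_YOUR_FILE}")
# Selenium 4 API: the old find_element(s)_by_* helpers were removed in 4.3
elements = driver.find_elements(By.CSS_SELECTOR, 'div.item-cell div.item-container')
for element in elements:
    try:
        # First <a> inside this container; raises if the container has none
        link = element.find_element(By.TAG_NAME, 'a')
        print(link.get_attribute('href'))
    except NoSuchElementException:
        print('No Data Available!')
# quit() ends the whole browser session, not just the current window
driver.quit()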
Also, I'd suggest closing your divs with </div> and adding https:// before your URLs:
<div class="item-cell">
<div class="item-container">
<div class="item-price">
</div>
<div class="item-info">
<span class="price"> </span>
</div>
</div>
</div>
<div class="item-cell">
<div class="item-container">
<div class="item-price">
</div>
<div class="item-info">
<span class="price"> </span>
</div>
</div>
</div>
<div class="item-cell">
<div class="item-container">
</div>
</div>
If you don't add https:// before your URLs, the browser (not Python) treats them as relative URLs, so when Selenium loads a local file they resolve to local file paths next to that file.
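To see the effect, Python's urllib.parse.urljoin resolves a relative reference against a base URL using RFC 3986 rules, which for a simple case like this agree with what the browser does. The local file path below is hypothetical:

```python
from urllib.parse import urljoin

page = "file:///home/user/items.html"  # hypothetical local page loaded by Selenium

# Without a scheme, the href is resolved relative to the file's directory
relative = urljoin(page, "www.example.com/item")
# With https:// it is already absolute and is left unchanged
absolute = urljoin(page, "https://www.example.com/item")

print(relative)  # file:///home/user/www.example.com/item
print(absolute)  # https://www.example.com/item
```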
Related
I'm currently scraping elements from a webpage. Let's say I'm iterating over an HTML response, and part of that response looks like this:
<div class="col-sm-12 col-md-5">
<div class="material">
<div class="material-parts">
<span class="material-part" title="SLT-4 2435">
<img src="/images/train-material/mat_slt4.png"/> </span>
<span class="material-part" title="SLT-6 2631">
<img src="/images/train-material/mat_slt6.png"/> </span>
</div>
</div>
</div>
I know I can access the title of the first span with that class like so:
row[-1].find('span')['title']
"SLT-4 2435"
But I would also like to get the title of the second span (if it exists), so that both come out as one string, like so: "SLT-4 2435, SLT-6 2631"
Any ideas?
You can use the find_all() method to find all the span elements with the class material-part:
titles = []
for material_part in row[-1].find_all('span', class_='material-part'):
    titles.append(material_part['title'])
result = ', '.join(titles)
As an alternative to find() / find_all(), you could use CSS selectors:
soup.select('span.material-part[title]')
then iterate over the ResultSet with a list comprehension and join() the titles into a single string:
','.join([t.get('title') for t in soup.select('span.material-part[title]')])
Example
from bs4 import BeautifulSoup
html = '''<div class="col-sm-12 col-md-5">
<div class="material">
<div class="material-parts">
<span class="material-part" title="SLT-4 2435">
<img src="/images/train-material/mat_slt4.png"/> </span>
<span class="material-part" title="SLT-6 2631">
<img src="/images/train-material/mat_slt6.png"/> </span>
</div>
</div>
</div>'''
soup = BeautifulSoup(html, 'html.parser')
','.join([t.get('title') for t in soup.select('span.material-part[title]')])
Output
SLT-4 2435,SLT-6 2631
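Since you're iterating over rows, you may want one joined string per block rather than one for the whole page. A sketch, assuming each group of titles sits in its own div.material-parts; the second block is hypothetical sample data:

```python
from bs4 import BeautifulSoup

html = """
<div class="material-parts">
  <span class="material-part" title="SLT-4 2435"></span>
  <span class="material-part" title="SLT-6 2631"></span>
</div>
<div class="material-parts">
  <span class="material-part" title="SLT-4 2401"></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One string per block; a block with a single span just yields its one title,
# so the "if it exists" case needs no special handling.
per_block = [
    ", ".join(span["title"] for span in block.select("span.material-part[title]"))
    for block in soup.select("div.material-parts")
]
print(per_block)  # ['SLT-4 2435, SLT-6 2631', 'SLT-4 2401']
```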
Here is an example of the HTML:
<div aria-label="Continue" class="my-class" data-visualcompletion="ignore"></div>
<div class="div1-class">
<div class="div1-class2">
<span class="area-span" dir="auto">
<span class="text-span">Continue</span>
</span>
</div>
</div>
<div class="div2-class" data-visualcompletion="ignore"></div>
I'm trying:
continue = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, 'my-class')) )
continue.click()
but it doesn't work in any of the ways I tried.
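You should test your XPath in the browser console first; in Chrome DevTools, for example, $x("//div[contains(@class,'my-class')]") lists the matching elements.
Then try something like this (note that XPath attributes use @, not #):
driver.find_element(By.XPATH, "//div[contains(@class,'my-class')]").click()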
I'm trying to get the text of class="eventAwayMinute" (57) in every matchEvent div (the parent tag), but only if that matchEvent contains class="eventIcon eventIcon_1":
<div class="matchEvent">
<div class="eventHomePlayer">
</div>
<div class="eventHomeMinute"></div>
<div class="eventIcon eventIcon_1"></div>
<div class="eventAwayMinute">57'</div>
<div class="eventAwayPlayer">
George
<span>(Irakli)</span> </div>
</div>
I tried
Minutes = [(gm.get_text()).strip() for gm in soup.select('matchEvent , div[class$="eventIcon_1"]')]
but it does not work.
I also tried
Minutes = [(gm.get_text()).strip() for gm in soup.select('matchEvent')]
But it returns every minute in every matchEvent (there are several matchEvent divs in the HTML).
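You can use the :has() CSS selector to keep only the matchEvent divs that contain an eventIcon eventIcon_1 element, and then print the text of their eventAwayMinute div. (In your first attempt, the comma creates a selector list, which matches either selector on its own instead of combining them.)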
from bs4 import BeautifulSoup
html = """<div class="matchEvent">
<div class="eventHomePlayer">
</div>
<div class="eventHomeMinute"></div>
<div class="eventIcon eventIcon_1"></div>
<div class="eventAwayMinute">57'</div>
<div class="eventAwayPlayer">
George
<span>(Irakli)</span> </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
for tag in soup.select(".matchEvent:has(.eventIcon.eventIcon_1)"):
print(tag.select_one(".eventAwayMinute").text.strip("'"))
Output:
57
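If you want the result as a list, like the Minutes list in the question, the same :has() selector works in a list comprehension. The second matchEvent below (with eventIcon_2) is made-up sample data that shows non-matching blocks are skipped:

```python
from bs4 import BeautifulSoup

html = """
<div class="matchEvent">
  <div class="eventIcon eventIcon_1"></div>
  <div class="eventAwayMinute">57'</div>
</div>
<div class="matchEvent">
  <div class="eventIcon eventIcon_2"></div>
  <div class="eventAwayMinute">80'</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only matchEvent divs containing an eventIcon_1 element,
# then read each one's eventAwayMinute text without the trailing apostrophe.
minutes = [
    tag.select_one(".eventAwayMinute").get_text(strip=True).rstrip("'")
    for tag in soup.select(".matchEvent:has(.eventIcon.eventIcon_1)")
]
print(minutes)  # ['57']
```

select_one returns None if a matching block has no eventAwayMinute div. If that can happen on the real page, check for None before calling get_text.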
I am looking at scraping the information below using both Selenium and bs4. If I find the div tag below, is it possible to scrape the data inside the quotation marks? For example: data-room-type-code="SUK"
<div
class="sl-flexbox room-price-item hidden-top-border"
data-room-name="Superior Shard Room"
data-bed-type="K"
data-bed-name="King"
data-pay-type-tag-filter="No Prepayment"
data-cancel-tag-filter=""
data-breakfast-tag-filter=""
data-room-type-code="SUK"
data-rate-code="ZBAR"
data-price="430"
>
<div class="room-price-basic-info">
<div class="room-price-title title-regular">Flexible Rate / CustomStay</div>
<ul class="abstract text-regular">
<li>No Prepayment</li>
</ul>
<div
class="show-detail text-btn js-show-detail"
data-index="0-productRates-0"
>
OFFER DETAILS
</div>
</div>
<div class="room-price-book-info">
<div class="number text-medium">GBP 430</div>
</div>
<div class="boot-btn text-medium js-booking-room" data-type="PRICE">
Book Now
</div>
</div>
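Yes. In bs4, a tag's attributes can be read like dictionary keys, and data-* attributes work the same way. Here is a minimal sketch on a trimmed copy of the div above, with only a few of its attributes kept:

```python
from bs4 import BeautifulSoup

html = '''<div class="sl-flexbox room-price-item hidden-top-border"
  data-room-name="Superior Shard Room"
  data-room-type-code="SUK"
  data-rate-code="ZBAR"
  data-price="430">
  <div class="number text-medium">GBP 430</div>
</div>'''

soup = BeautifulSoup(html, "html.parser")
room = soup.select_one("div.room-price-item")

# Index like a dict (KeyError if missing) or use .get() (None if missing)
print(room["data-room-type-code"])  # SUK
print(room.get("data-price"))       # 430

# Collect every data-* attribute at once; values are strings
data = {name: value for name, value in room.attrs.items() if name.startswith("data-")}
```

With Selenium, the equivalent is element.get_attribute("data-room-type-code") on the located WebElement.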
I have run into an issue while working on a web scraping project in Python. I am new to Python and am not sure how to extract a specific line, or a value from part of a line, from the Beautiful Soup output. I would like to get only the data-rarity part from this site, but I haven't found a way to do that without getting the entire element back.
Any help is much appreciated!
I have this:
rarity = soup.find_all('div', {'class': 'profileCards__card'})
print(rarity[0])
This outputs:
<div class="profileCards__card upgrade " data-level="902" data-elixir="2" data-rarity="102" data-arena="802">
<img src="//cdn.statsroyale.com/images/cards/full/snowball.png"><span class="profileCards__level">lvl.9</span>
<div class="profileCards__meter">
<span style="width: 100%"></span>
<div class="profileCards__meter__numbers">
8049/800
</div>
</div>
<div class="ui__tooltip ui__tooltipTop ui__tooltipMiddle cards__tooltip">
Giant Snowball
</div>
</div>
Ideally I would get only the value of data-rarity, so just the 102 from this markup, as seen in the site's Inspect Element view:
<div class="profileCards__cards">
<div class="profileCards__card upgrade " data-level="902" data-elixir="2" data-rarity="102" data-arena="802">
<img src="//cdn.statsroyale.com/images/cards/full/snowball.png"><span class="profileCards__level">lvl.9</span>
<div class="profileCards__meter">
<span style="width: 100%"></span>
<div class="profileCards__meter__numbers">
8049/800
</div>
</div>
<div class="ui__tooltip ui__tooltipTop ui__tooltipMiddle cards__tooltip">
Giant Snowball
</div>
</div>
Use:
rarity = soup.find_all('div', {'class': 'profileCards__card'})
for r in rarity:
    # r is already a card <div>; r.find() would search its children and return None
    print(r['data-rarity'])
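If you want all the values at once, for example as integers, a CSS attribute selector can pick out just the cards that have a data-rarity attribute. Here is a sketch on a trimmed copy of the markup above:

```python
from bs4 import BeautifulSoup

html = '''<div class="profileCards__cards">
<div class="profileCards__card upgrade " data-level="902" data-elixir="2" data-rarity="102" data-arena="802">
<span class="profileCards__level">lvl.9</span>
</div>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# Matches the class token profileCards__card (not the parent's profileCards__cards)
# and only elements that actually carry data-rarity
rarities = [int(card["data-rarity"]) for card in soup.select("div.profileCards__card[data-rarity]")]
print(rarities)  # [102]
```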