I am using Selenium to parse a page containing markup that looks a bit like this:
<html>
<head><title>Example</title></head>
<body>
<div>
<span class="Fw(500) D(ib) Fz(42px)">1</span>
<span class="Fw(500) D(ib) Fz(42px) Green XYZ">2</span>
</div>
</body>
</html>
I want to fetch all span elements that contain the class foobar.
I have tried both of these (the variable wd is an instance of selenium.webdriver):
elem = wd.find_elements_by_css_selector("span[class='Fw(500) D(ib) Fz(42px).']")
elem = wd.find_element_by_xpath("//span[starts-with(@class, 'Fw(500) D(ib) Fz(42px))]")
Neither of these works.
How can I select only the elements whose class starts with Fw(500) D(ib) Fz(42px),
i.e. both span elements in the sample markup given?
Try the following:
elem = wd.find_elements_by_css_selector("span.foobar")
If foo and bar are two separate classes (separated by a space), try:
elem = wd.find_elements_by_css_selector("span.foo.bar")
Edit: if your class contains non-alphabetic characters and you want to find the elements whose class starts with Fw(500) D(ib) Fz(42px), try:
elem = wd.find_elements_by_css_selector("span[class ^= 'Fw(500) D(ib) Fz(42px)']")
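To make the difference between these selectors concrete, here is a minimal plain-Python sketch of the two matching rules. A class selector such as span.foo.bar matches whole space-separated class tokens in any order, while [class^='...'] compares the raw attribute string from its first character. The helper names has_classes and class_starts_with are hypothetical, and the code only imitates the rules; it does not call Selenium or a browser.

```python
def has_classes(class_attr, *tokens):
    """Imitate a selector like span.foo.bar: every token must be present."""
    present = set(class_attr.split())
    return all(token in present for token in tokens)


def class_starts_with(class_attr, prefix):
    """Imitate span[class^='prefix']: plain string prefix test on the attribute."""
    return class_attr.startswith(prefix)


# The two class attributes from the sample markup in the question.
first = "Fw(500) D(ib) Fz(42px)"
second = "Fw(500) D(ib) Fz(42px) Green XYZ"
prefix = "Fw(500) D(ib) Fz(42px)"

# Both attributes begin with the prefix, so both spans would match.
print(class_starts_with(first, prefix), class_starts_with(second, prefix))
# Token matching ignores order.
print(has_classes(second, "XYZ", "Green"))
# A prefix test fails when the text is not at the very start of the attribute.
print(class_starts_with(second, "D(ib)"))
```

One consequence is that the prefix selector stops matching if the page ever emits the same classes in a different order.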
Try to find elements by XPath:
//span[@class='foobar']
This should work.
I'm trying to scrape the class name of the first child (span) of multiple divs.
Here is the HTML code:
<div class="ui_column is-9">
<span class="name1"></span>
<span class="...">...</span>
...
<div class="ui_column is-9">
<span class="name2"></span>
<span class="...">...</span>
...
<div class ..
URL of the page for the complete code.
I'm achieving this task with this code for the first five divs:
i = 0
liste = []
while i <= 4:
    parent = driver.find_elements_by_xpath("//div[@class='ui_column is-9']")[i]
    child = parent.find_element_by_xpath("./child::*")
    class_name = child.get_attribute('class')
    liste.append(class_name)
    i = i + 1
But do you know if there is an easier way to do it?
You can directly get all these first span elements and then extract their class attribute values as follows:
liste = []
first_spans = driver.find_elements_by_xpath("//div[@class='ui_column is-9']//span[1]")
for element in first_spans:
    class_name = element.get_attribute('class')
    liste.append(class_name)
You can also extract the class attribute values from only the first 5 elements by limiting the loop to 5 iterations.
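Limiting the result to the first five elements can be done with a slice instead of a counter. The sketch below assumes only that each element exposes get_attribute('class'), as Selenium WebElements do. first_class_names and the FakeElement stand-in are hypothetical names, used so the snippet runs without a browser.

```python
def first_class_names(elements, limit=5):
    """Return the class attribute of at most `limit` elements."""
    return [element.get_attribute("class") for element in elements[:limit]]


class FakeElement:
    """Stand-in for a WebElement, for illustration only."""

    def __init__(self, class_name):
        self._class_name = class_name

    def get_attribute(self, name):
        return self._class_name if name == "class" else None


# Made-up class names; with Selenium, pass the list of found elements instead.
spans = [FakeElement("name%d" % n) for n in range(1, 8)]
liste = first_class_names(spans)
print(liste)
```

With a real driver this would be called as first_class_names(first_spans).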
UPD
Since you updated your question, the answer is different and much simpler.
You can get the desired elements directly and extract their class attribute values as follows:
liste = []
first_spans = driver.find_elements_by_xpath("//div[@class='ui_column is-9']//span[contains(@class,'ui_bubble_rating')]")
for element in first_spans:
    class_name = element.get_attribute('class')
    liste.append(class_name)
This is the HTML code. I want to get "1" and all the other values written in the nested <li> <a> tags.
I have tried
total = driver.find_element_by_xpath("//a[text()='...']/following-sibling::a").text
and
totl = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='ng-binding']")))
print (totl.text)
but nothing works. I would be grateful for any help.
To get an element's text, the WebElement must be visible, so wait for the visibility of all elements. Code examples to get all a elements (total will be a list of WebElements):
total = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'ul[uib-pagination] li a')))
# or
total = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'li.pagination-page a')))
To get text from total:
# texts of all links in total
total_texts = [element.text for element in total]
print(total_texts)
# text of the first one
first_page_number = total[0].text
# text of the last one
last_page_number = total[-1].text
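Pagination links often include non-numeric entries such as "..." or arrows, so it can help to keep only the numeric texts before converting them. This is a sketch under that assumption. page_numbers is a hypothetical helper, and the sample list is made up rather than taken from the page.

```python
def page_numbers(texts):
    """Keep only texts that are whole numbers and convert them to int."""
    stripped = (text.strip() for text in texts)
    return [int(text) for text in stripped if text.isdecimal()]


# Made-up link texts standing in for total_texts.
sample_texts = ["«", "1", "2", "3", "...", "10", "»"]
numbers = page_numbers(sample_texts)
print(numbers)
# The largest number is a reasonable guess for the last page.
print(max(numbers))
```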
<li tabindex="0" role="tab" aria-selected="false">
<a href="#gift-cards" class="leftnav-links kas-leftnav-links" data-section="gift-cards" data-ajaxurl="/wallet/my_wallet.jsp">
<span class="width200 kas-gift-cards-tab">Gift Cards</span>
<span class="count kas-count">info</span>
</a>
</li>
I have this HTML block, which is repeated about 5 times on the page, and only 2 of these blocks have the information I need. Their classes are the same and I don't know what to do.
Plus, the execute_script in Firefox does not work for me.
html_list = driver.find_element_by_id("rewards-contents")
items = html_list.find_element_by_tag_name("li")
for item in items:
    text = item.text
    print(text)
I tried to do this in Python, but nothing sensible came out.
I expect the script to display info from all 5 blocks.
To get all elements use find_elements instead of find_element. Your code should look like:
html_list = driver.find_element_by_id("rewards-contents")
items = html_list.find_elements_by_tag_name("li")
for item in items:
    print(item.text)
To get the text of the span elements (find_elements returns a list, so select the spans with a single CSS selector):
items = driver.find_elements_by_css_selector("#rewards-contents li span")
for item in items:
    print(item.text)
Thanks to @Sers for the hint with css_selector. I solved my problem in the following way:
info = []
time.sleep(2)
htmllist = driver.find_element_by_class_name("rewards-contents")
items = htmllist.find_elements_by_css_selector(".kas-count")
for item in items:
    info.append(item.text)
    print(item.text)
print(info)
My html page looks like this:
<div class="some class">
<p>
<i class="class1"></i>
Some Text
</p>
<p>
<i class="class2"></i>
Some Text
</p>
. . .
. . .
. . .
</div>
I want to get "Some Text". Currently I am trying:
elem = browser.find_element_by_xpath("//div[@class='some class']")
text = elem.find_element_by_xpath("//p/i[@class='class1']").text
But it returns an empty string and I can't understand why. I am new to Selenium. Please help.
You can use any of the XPath expressions below:
# Find "i" element with "class1" css class and get first parent "p" element
elem = browser.find_element_by_xpath("//i[@class='class1']/ancestor::p[1]")
# Same as previous with added "div"
elem = browser.find_element_by_xpath("//div[@class='some class']//i[@class='class1']/ancestor::p[1]")
# Find "p" element with child "i" element with "class1" css class
elem = browser.find_element_by_xpath("//p[./i[@class='class1']]")
# Same as previous with added "div"
elem = browser.find_element_by_xpath("//div[@class='some class']//p[./i[@class='class1']]")
Your selector is grabbing the element i that has attribute class="class1". i has no text, which is why you get an empty string. To fix that:
elem = browser.find_element_by_xpath("//div[@class='some class']")
# Now find the i element you want; the leading dot keeps the search inside elem
i_elem = elem.find_element_by_xpath(".//i[@class='class1']")
# Now find the parent of that i_elem, which is p
p_elem = i_elem.find_element_by_xpath("./..")
txt = p_elem.text
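Why .text on the i element comes back empty can be checked offline with the standard library. The sketch below parses a cut-down copy of the markup with xml.etree.ElementTree (not Selenium, and only suitable here because the sample is well-formed). It finds the i element, steps up to its parent p, and reads the text there.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<div class="some class">
  <p>
    <i class="class1"></i>
    Some Text
  </p>
  <p>
    <i class="class2"></i>
    Some Text
  </p>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

# The i element itself contains no text at all.
i_elem = root.find(".//i[@class='class1']")
i_text = i_elem.text or ""

# ".." steps up to the parent p, whose text nodes hold the visible string.
p_elem = root.find(".//i[@class='class1']/..")
p_text = "".join(p_elem.itertext()).strip()

print(repr(i_text))
print(repr(p_text))
```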
You can use execute_script:
xPath = "//div[@class='some class']"
try:
    element = driver.find_element_by_xpath(xPath)
    b1Text = driver.execute_script("return arguments[0].childNodes[2].textContent", element)
    print(b1Text)
except:
    print()
Try changing the index inside childNodes[N], for example childNodes[2] or childNodes[1].
Assuming that class1 and class2 are different, you can use this CSS selector:
div.some.class > p:nth-child(1)
Since the text is inside the <p> tag, you can get it from the first <p> tag. Note that class="some class" means two classes, some and class, so each one is written with its own dot.
elem = browser.find_element_by_css_selector("div.some.class > p:nth-child(1)")
text = elem.text
This should get you the text inside the element.
This is the HTML:
<div><div id="NhsjLK">
<li class="EditableListItem NavListItem FollowersNavItem NavItem not_removable">
Followers <span class="list_count">92</span></li></div></div>
I want to extract the text 92, convert it to an integer, and print it in Python 2. How can I do that?
Code:
i = soup.find('div', id='NhsjLK')
print "Followers :", i.find('span', id='list_count').text
I would not get it by the class directly, since I think "list_count" is too broad a class value and might be used for other things on the page.
There are several different options judging by this HTML snippet alone, but one of the nicest, from my point of view, is to use the "Followers" text/label and get its next sibling:
from bs4 import BeautifulSoup
data = """
<div><div id="NhsjLK">
<li class="EditableListItem NavListItem FollowersNavItem NavItem not_removable">
Followers <span class="list_count">92</span></li></div></div>"""
soup = BeautifulSoup(data, "html.parser")
count = soup.find(text=lambda text: text and text.startswith('Followers')).next_sibling.get_text()
count = int(count)
print(count)
Another concise and reliable approach would be to use a partial match (the *= part below) on the href value of the parent a element:
count = int(soup.select_one("a[href*=followers] .list_count").get_text())
Or, you might check the class value of the parent li element:
count = int(soup.select_one("li.FollowersNavItem .list_count").get_text())