Need help parsing HTML elements; execute_script doesn't work (Python)

<li tabindex="0" role="tab" aria-selected="false">
<a href="#gift-cards" class="leftnav-links kas-leftnav-links" data-section="gift-cards" data-ajaxurl="/wallet/my_wallet.jsp">
<span class="width200 kas-gift-cards-tab">Gift Cards</span>
<span class="count kas-count">info</span>
</a>
</li>
I have HTML like this that is repeated about 5 times on a page, and only 2 of these blocks contain the information I need. Their classes are identical, so I don't know how to tell them apart.
Also, execute_script does not work for me in Firefox.
html_list = driver.find_element_by_id("rewards-contents")
items = html_list.find_element_by_tag_name("li")
for item in items:
    text = item.text
    print(text)
I tried to get this working in Python, but nothing sensible came out.
I expect the script to display info from all 5 blocks.

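To get all matching elements, use find_elements instead of find_element: find_element returns only the first match (a single WebElement, which can't be iterated), while find_elements returns a list. Your code should look like: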
html_list = driver.find_element_by_id("rewards-contents")
items = html_list.find_elements_by_tag_name("li")
for item in items:
    print(item.text)
To get the text of the span elements:
spans = driver.find_elements_by_css_selector("#rewards-contents li span")
for span in spans:
    print(span.text)

Thanks to @Sers for the hint with css_selector. I solved my problem this way:
import time

info = []
time.sleep(2)
htmllist = driver.find_element_by_class_name("rewards-contents")
items = htmllist.find_elements_by_css_selector(".kas-count")
for item in items:
    info.append(item.text)
    print(item.text)
print(info)
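Since the blocks share the same classes but differ in their data-section attribute, one way to pick out only the blocks you need is to put that attribute into the selector. Below is a minimal sketch for Selenium 4, where the find_element_by_* helpers used above were removed in favour of find_elements(by, value). It assumes each block's a element carries a distinct data-section value; "gift-cards" comes from the question, while any other section names you pass in are placeholders.

```python
# In Selenium 4, By.CSS_SELECTOR is the string "css selector"; passing the string
# directly keeps this sketch importable even where Selenium isn't installed.
CSS_SELECTOR = "css selector"


def section_counts(driver, sections):
    """Return {data-section value: text of that block's .kas-count span}.

    Sections that aren't present on the page are left out of the result.
    """
    counts = {}
    for section in sections:
        # Match the .kas-count span inside the link for this particular section
        selector = f"a[data-section='{section}'] .kas-count"
        elements = driver.find_elements(CSS_SELECTOR, selector)
        if elements:
            counts[section] = elements[0].text
    return counts
```

With a real driver you would call something like section_counts(driver, ["gift-cards", "other-section"]), where "other-section" is a hypothetical name; checking the data-section values in the page source tells you which two blocks hold the data.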

Related

Scraping the attribute of the first child from multiple divs (Selenium)

I'm trying to scrape the class name of the first child (span) from multiple divs.
Here is the html code:
<div class="ui_column is-9">
<span class="name1"></span>
<span class="...">...</span>
...
<div class="ui_column is-9">
<span class="name2"></span>
<span class="...">...</span>
...
<div class ..
URL of the page for the complete code.
I'm doing this for the first five divs with the following code:
i = 0
liste = []
while i <= 4:
    parent = driver.find_elements_by_xpath("//div[@class='ui_column is-9']")[i]
    child = parent.find_element_by_xpath("./child::*")
    class_name = child.get_attribute('class')
    i = i + 1
    liste.append(class_name)
But is there an easier way to do it?
You can get all of these first span elements directly and then extract their class attribute values as follows:
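liste = []
# /span[1] selects the first span that is a direct child of each div
# (//span[1] would also match first spans nested deeper inside the div)
first_spans = driver.find_elements_by_xpath("//div[@class='ui_column is-9']/span[1]")
for element in first_spans:
    class_name = element.get_attribute('class')
    liste.append(class_name)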
To extract the class attribute values from only the first 5 elements, limit the loop to 5 iterations, for example by looping over first_spans[:5].
Update:
Now that you've updated your question, the answer is different and much simpler.
You can get the desired elements directly and extract their class attribute values as follows:
liste = []
first_spans = driver.find_elements_by_xpath("//div[@class='ui_column is-9']//span[contains(@class,'ui_bubble_rating')]")
for element in first_spans:
    class_name = element.get_attribute('class')
    liste.append(class_name)
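To check offline which elements a "first span child of each div" path selects, here is a small sketch using the standard library's xml.etree.ElementTree. Note that ElementTree supports only a subset of XPath and is not the browser engine Selenium uses; the markup is a hypothetical, well-formed stand-in for the question's snippet (with an extra nested span), and the limit parameter mirrors the "first five divs" requirement.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the question's markup (ElementTree needs valid XML).
MARKUP = """<body>
  <div class="ui_column is-9">
    <span class="name1"></span>
    <span class="other"><span class="nested"></span></span>
  </div>
  <div class="ui_column is-9">
    <span class="name2"></span>
    <span class="other"></span>
  </div>
</body>"""


def first_span_classes(markup, limit=5):
    """Return the class of the first <span> child of each matching div, up to `limit` divs."""
    root = ET.fromstring(markup)
    # "/span[1]" keeps only the first span that is a direct child of each div,
    # so the nested span inside span.other is never selected.
    spans = root.findall(".//div[@class='ui_column is-9']/span[1]")
    return [span.get("class") for span in spans[:limit]]


print(first_span_classes(MARKUP))  # ['name1', 'name2']
```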

Find elements under a <ul> <li> <a> "some text" </a> </li> </ul> tag, as shown in the figure

This is the HTML code. I want to get "1" and all the other values written in the nested <li> <a> tags.
I have tried
total = driver.find_element_by_xpath("//a[text()='...']/following-sibling::a").text
and
totl = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='ng-binding']")))
print (totl.text)
but nothing works. Any help would be greatly appreciated.
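Selenium's .text returns only text that is visible on the page, so the WebElements must be visible before you can read their text; that's why you should wait for the visibility of all the elements. Code examples to get all the a elements (total will be a list of WebElements):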
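from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
total = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'ul[uib-pagination] li a')))
# or
total = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'li.pagination-page a')))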
To get the text from total:
# texts of all links in total
total_texts = [element.text for element in total]
print(total_texts)
# text of the first one
first_page_number = total[0].text
# text of the last one
last_page_number = total[-1].text
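If what you ultimately need are the page numbers as integers, a small helper can convert the collected texts. This is a sketch that assumes the elements are the WebElements in total (anything with a .text attribute works) and that some links may carry non-numeric labels such as "Next", which it skips.

```python
def page_numbers(elements):
    """Return the integer page numbers from elements that expose a .text attribute.

    Non-numeric link texts (e.g. navigation arrows) are skipped.
    """
    numbers = []
    for element in elements:
        text = element.text.strip()
        if text.isdigit():
            numbers.append(int(text))
    return numbers
```

With Selenium you would call page_numbers(total) after the wait; max() of the result then gives the last page number even if the final link in the list is a non-numeric arrow.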

Is there an alternative to bs4's find_all() method that returns another soup object instead of a list, for further navigation?

After finding all the <ul> elements, I'd like to extract the text and the hrefs. The problem specific to this bit of HTML is that I need most, but not all, of the <li> items on the page. When I call find_all(), I get back a list, which I can't navigate further as a soup object.
For example, to ultimately create a dictionary of {'cityName': 'href'} from the snippet below, I have tried:
city_list = soup.find_all('ul', {'class': ''})
city_dict = {}
for city in city_list:
    city_dict[city.text] = city['href']
Here is a minimal sample of the HTML:
<h4>Alabama</h4>
<ul>
<li>auburn</li>
<li>birmingham</li>
<li>tuscaloosa</li>
</ul>
<h4>Alaska</h4>
<ul>
<li>anchorage / mat-su</li>
<li>southeast alaska</li>
</ul>
<h4>Arizona</h4>
<ul>
<li>flagstaff / sedona</li>
<li>yuma</li>
</ul>
<ul>
<li>help</li>
<li>safety</li>
<li class="fsel mobile linklike" data-mode="regular">desktop</li>
</ul>
How can I, essentially, find_all() the <ul>s first, and then find only the <li>s that interest me?
You probably need something like this:
city_dict = {}
for ul in soup.find_all('ul', {'class': ''}):
    state_name = ul.find_previous_sibling('h4').text
    print(state_name)
    for link in ul.find_all('a'):
        print(link['href'])
city_dict = {}
for li in soup.find_all('li'):
    city_name = li.text
    for link in li.find_all('a'):
        city_dict[city_name] = link['href']
Try this, thank me later :)
list_items = soup.find_all('ul', {'class': ''})
list_of_dicts = []
for item in list_items:
    for i in item.find_all('li'):
        link = i.find('a')
        if link is not None:  # skip <li> items that contain no link
            list_of_dicts.append({i.text: link.get('href')})
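One caveat with find_previous_sibling('h4'): it returns the nearest earlier h4 sibling, so the last list (help, safety, desktop) would be attributed to Arizona. A stricter rule is to keep a ul only when the element immediately before it is an h4. The sketch below shows that rule with the standard library's xml.etree.ElementTree instead of BeautifulSoup; it only works here because the sample snippet is well-formed XML once wrapped in a root element (real pages usually aren't, so in practice you'd apply the same rule with bs4). Since the sample li items contain no links, it collects their text rather than hrefs.

```python
import xml.etree.ElementTree as ET

# The question's sample HTML, wrapped in <body> so it parses as XML.
SAMPLE = """<body>
<h4>Alabama</h4>
<ul>
<li>auburn</li>
<li>birmingham</li>
<li>tuscaloosa</li>
</ul>
<h4>Alaska</h4>
<ul>
<li>anchorage / mat-su</li>
<li>southeast alaska</li>
</ul>
<h4>Arizona</h4>
<ul>
<li>flagstaff / sedona</li>
<li>yuma</li>
</ul>
<ul>
<li>help</li>
<li>safety</li>
<li class="fsel mobile linklike" data-mode="regular">desktop</li>
</ul>
</body>"""


def cities_by_state(markup):
    """Map each <h4> heading to the <li> texts of the <ul> immediately after it."""
    root = ET.fromstring(markup)
    result = {}
    previous = None
    for child in root:  # direct children of <body>, in document order
        if child.tag == "ul" and previous is not None and previous.tag == "h4":
            result[previous.text.strip()] = [li.text.strip() for li in child.findall("li")]
        previous = child
    return result
```

The help/safety list is skipped because the element right before it is another ul, not an h4.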

How to find text of <div><span>text</span></div> in beautifulsoup?

This is the HTML:
<div><div id="NhsjLK">
<li class="EditableListItem NavListItem FollowersNavItem NavItem not_removable">
Followers <span class="list_count">92</span></li></div></div>
I want to extract the text 92, convert it to an integer, and print it in Python 2. How can I do that?
Code:
i = soup.find('div', id='NhsjLK')
print "Followers :", i.find('span', id='list_count').text
I wouldn't select it by class directly (note that list_count is a class, not an id, so id='list_count' in your code won't match), since "list_count" is a rather broad class value that might be used for other things on the page.
There are definitely several options judging by this HTML snippet alone, but one of the nicest, from my point of view, is to use the "Followers" text label and get its next sibling:
from bs4 import BeautifulSoup
data = """
<div><div id="NhsjLK">
<li class="EditableListItem NavListItem FollowersNavItem NavItem not_removable">
Followers <span class="list_count">92</span></li></div></div>"""
soup = BeautifulSoup(data, "html.parser")
# The text node is "\nFollowers ", so strip it before checking the prefix
count = soup.find(text=lambda text: text and text.strip().startswith('Followers')).next_sibling.get_text()
count = int(count)
print(count)
Another very concise and reliable approach is to use a partial match (the *= part below) on the href value of the parent a element (this assumes that, on the real page, the span sits inside a link whose href contains "followers"):
count = int(soup.select_one("a[href*=followers] .list_count").get_text())
Or, you might check the class value of the parent li element:
count = int(soup.select_one("li.FollowersNavItem .list_count").get_text())
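If BeautifulSoup isn't available, the same idea (find the list_count span inside the Followers item) can be sketched with the standard library's html.parser. This is a Python 3 sketch (in Python 2 the module is named HTMLParser and the parent constructor has to be called as HTMLParser.__init__(self)); it keys on the FollowersNavItem class from the snippet above and assumes only one such item exists on the page.

```python
from html.parser import HTMLParser


class FollowerCountParser(HTMLParser):
    """Collect the text of span.list_count inside li.FollowersNavItem."""

    def __init__(self):
        super().__init__()
        self.in_followers_item = False
        self.in_count_span = False
        self.count_text = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "FollowersNavItem" in classes:
            self.in_followers_item = True
        elif tag == "span" and self.in_followers_item and "list_count" in classes:
            self.in_count_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_count_span = False
        elif tag == "li":
            self.in_followers_item = False

    def handle_data(self, data):
        if self.in_count_span:
            self.count_text += data


def follower_count(markup):
    parser = FollowerCountParser()
    parser.feed(markup)
    parser.close()
    return int(parser.count_text.strip())


data = """
<div><div id="NhsjLK">
<li class="EditableListItem NavListItem FollowersNavItem NavItem not_removable">
Followers <span class="list_count">92</span></li></div></div>"""
print(follower_count(data))  # 92
```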

Python selenium webdriver. Find elements with specified class name

I am using Selenium to parse a page containing markup that looks a bit like this:
<html>
<head><title>Example</title></head>
<body>
<div>
<span class="Fw(500) D(ib) Fz(42px)">1</span>
<span class="Fw(500) D(ib) Fz(42px) Green XYZ">2</span>
</div>
</body>
</html>
I want to fetch all span elements that contain the class foobar.
I have tried both of these (the variable wd is an instance of selenium.webdriver):
elem = wd.find_elements_by_css_selector("span[class='Fw(500) D(ib) Fz(42px).']")
elem = wd.find_element_by_xpath("//span[starts-with(@class, 'Fw(500) D(ib) Fz(42px))]")
Neither of them works.
How can I select only the elements whose class starts with Fw(500) D(ib) Fz(42px), i.e. both span elements in the sample markup?
Try the following:
elem = wd.find_elements_by_css_selector("span.foobar")
If the element has two classes, foo and bar, separated by a space, try the following:
elem = wd.find_elements_by_css_selector("span.foo.bar")
Edit: if your class contains non-alphabetic characters and you want to find elements whose class starts with Fw(500) D(ib) Fz(42px), try the following:
elem = wd.find_elements_by_css_selector("span[class ^= 'Fw(500) D(ib) Fz(42px)']")
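Try to find the elements by XPath:
//span[@class='foobar']
This should work, although it matches only when the class attribute is exactly foobar, not when foobar is one of several classes.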
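To see why the attempts in the question fail and the ^= selector works, it helps to compare the three kinds of class test directly. The sketch below uses the standard library's html.parser rather than Selenium and collects the text of span elements whose class attribute passes a given test: an exact string comparison (what [class='...'] does), a prefix comparison (what [class^='...'] does), and a whitespace-separated token check (what span.Green does). It assumes spans are not nested inside each other.

```python
from html.parser import HTMLParser


class SpanClassFilter(HTMLParser):
    """Collect the text of <span> elements whose class attribute passes `test`."""

    def __init__(self, test):
        super().__init__()
        self.test = test
        self.capturing = False
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            class_value = dict(attrs).get("class") or ""
            self.capturing = self.test(class_value)
            if self.capturing:
                self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.matches[-1] += data


def span_texts(markup, test):
    parser = SpanClassFilter(test)
    parser.feed(markup)
    parser.close()
    return parser.matches


MARKUP = """<div>
<span class="Fw(500) D(ib) Fz(42px)">1</span>
<span class="Fw(500) D(ib) Fz(42px) Green XYZ">2</span>
</div>"""

PREFIX = "Fw(500) D(ib) Fz(42px)"
print(span_texts(MARKUP, lambda c: c == PREFIX + "."))     # [] like [class='...']
print(span_texts(MARKUP, lambda c: c.startswith(PREFIX)))  # ['1', '2'] like [class^='...']
print(span_texts(MARKUP, lambda c: "Green" in c.split()))  # ['2'] like span.Green
```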
