Text value is empty while extracting div text - python

I'm trying to extract the text "Quesadilla", but when I try to get it, the text comes back empty.
HTML trying to extract from:
<div data-baseweb="block" data-testid="menu-item-name" class="d0 bd ib ic id ie be ed f7 bh fz if">
  <div lines="2" class="ig ih ii bq">Quesadilla</div>
</div>
Code:
menuItemNames = driver.find_elements_by_xpath("//div[@data-testid='menu-item-name']")
for menuItemName in menuItemNames:
    print(menuItemName.text)
Is there a better way to do this?
When I use the above code, some divs return their text while others come back empty or None.

Try introducing a CSS_SELECTOR with an explicit wait for more reliability.
CSS selector:
div[data-testid='menu-item-name'] div
Code with explicit wait :
driver = webdriver.Chrome("C:\\Users\\etc\\Desktop\\Selenium+Python\\chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("Your URL")
ele = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div[data-testid='menu-item-name'] div"))).text
print(ele)
If there are multiple elements matching div[data-testid='menu-item-name'], you could try a find_elements call like this:
elements = driver.find_elements(By.CSS_SELECTOR, "div[data-testid='menu-item-name']")
for ele in elements:
    print(ele.text)
Learn more about explicit wait here
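As a side note, Selenium's .text only returns the text of elements that are rendered visible, which is why wrapper divs can come back empty. To show what the selector div[data-testid='menu-item-name'] div actually targets, here is a minimal stdlib sketch outside the browser (the HTML string is taken from the question; the MenuItemParser class is hypothetical):

```python
from html.parser import HTMLParser

# Hypothetical parser mirroring the CSS selector
# "div[data-testid='menu-item-name'] div": collect the text of divs
# nested inside a <div data-testid="menu-item-name"> block.
class MenuItemParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0        # div nesting depth inside a menu-item block
        self.capture = False  # True while inside a nested child div
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1
            self.capture = True
        elif dict(attrs).get("data-testid") == "menu-item-name":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth <= 1:
                self.capture = False

    def handle_data(self, data):
        if self.capture and data.strip():
            self.names.append(data.strip())

html = ('<div data-baseweb="block" data-testid="menu-item-name" class="d0">'
        '<div lines="2" class="ig ih ii bq">Quesadilla</div></div>')
parser = MenuItemParser()
parser.feed(html)
print(parser.names)  # ['Quesadilla']
```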

Related

How to select something from a dropdown menu that has no select element? (Python Selenium)

Hello, I'm practicing Selenium on a practice form; this is the link for it:
click here
If you visit the page and inspect the dropdown menus for state and city, you will find they consist of only div elements. I tried doing this, but it (obviously) didn't work:
dropdown = Select(d.find_element("xpath", '//*[@id="state"]'))
dropdown.select_by_index(0)
This is the error message:
Select only works on <select> elements, not on <div>
Can someone show me how to loop through the values of the menu, or is there another solution?
This code is working:
search_url = 'https://demoqa.com/automation-practice-form'
driver = webdriver.Chrome(options = options, executable_path= os.path.join(os.environ['USERPROFILE'],"Desktop") + f'\\Python\\available Tender\\chromedriver\\chromedriver.exe')
driver.get(search_url)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
element1 = WebDriverWait(driver, 4).until(EC.presence_of_element_located((By.XPATH, "//div[@id='adplus-anchor']")))
driver.execute_script("""
var element = arguments[0];
element.parentNode.removeChild(element);
""", element1)
element2 = WebDriverWait(driver, 4).until(EC.presence_of_element_located((By.XPATH, "//div[@id='currentAddress-wrapper']")))
driver.execute_script("""
var element = arguments[0];
element.parentNode.removeChild(element);
""", element2)
driver.find_element(By.XPATH, '//*[@id="state"]/div/div[2]/div').click()
e1 = WebDriverWait(driver, 4).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class,'menu')]")))
e1.find_element(By.XPATH, ".//div[contains(text(),'NCR')]").click()

find an element under a <ul> <li> <a> "some text" </a> </li> </ul> tag as shown in figure

This is the HTML code. I want to get "1" and so on, all the values written in the nested <li> <a> tags.
I have tried
total = driver.find_element_by_xpath("//a[text()='...']/following-sibling::a").text
and
totl = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='ng-binding']")))
print (totl.text)
but nothing works. It would be a great favor if you could help me out of it.
To be able to get the text, the WebElement should be visible; that's why we wait for visibility of all elements. Code examples to get all the a elements (total will be a list of WebElements):
total = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'ul[uib-pagination] li a')))
# or
total = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'li.pagination-page a')))
To get text from total:
# texts of all links in total
total_texts = [element.text for element in total]
print(total_texts)
# text of the first one
first_page_number = total[0].text
# text of the last one
last_page_number = total[-1].text
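To illustrate the comprehension and indexing above without a browser, here is a minimal sketch using a hypothetical FakeElement that only mimics a WebElement's .text attribute:

```python
# Hypothetical stand-in for a WebElement: only the .text attribute is
# mimicked, so the list-comprehension pattern can be shown directly.
class FakeElement:
    def __init__(self, text):
        self.text = text

total = [FakeElement("1"), FakeElement("2"), FakeElement("10")]
total_texts = [element.text for element in total]
print(total_texts)                    # ['1', '2', '10']
print(total[0].text, total[-1].text)  # 1 10
```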

Using Selenium to find indexed element within a div

I'm scraping the front end of a webpage and having difficulty getting the HTML text of a div within a div.
Basically, I'm simulating clicks - one for each event listed on the page. From there, I want to scrape the date and time of the event, as well as the location of the event.
Here's an example of one of the pages I'm trying to scrape:
https://www.bandsintown.com/e/1013664851-los-grandes-de-la-banda-at-aura-nightclub?came_from=257&utm_medium=web&utm_source=home&utm_campaign=event
<div class="eventInfoContainer-54d5deb3">
<div class="lineupContainer-570750d2">
<div class="eventInfoContainer-9e539994">
<img src="assets.bandsintown.com/images.clock.svg">
<div>Sunday, April 21st, 2019</div> <!-- *** -->
<div class="eventInfoContainer-50768f6d">5:00PM</div> <!-- *** -->
</div>
<div class="eventInfoContainer-1a68a0e1">
<img src="assets.bandsintown.com/images.clock.svg">
<div class="eventInfoContainer-2d9f07df">
<div>Aura Nightclub</div> <!-- *** -->
<div>283 1st St., San Jose, CA 95113</div> <!-- *** -->
</div>
I've marked the elements I want to extract with asterisks - the date, time, venue, and address. Here's my code:
base_url = 'https://www.bandsintown.com/?came_from=257&page='
events = []
eventContainerBucket = []
for i in range(1, 2):
    driver.get(base_url + str(i))
    # get event links
    event_list = driver.find_elements_by_css_selector('div[class^=eventList-] a[class^=event-]')
    # collect the href attribute of each event in event_list
    events.extend(list(event.get_attribute("href") for event in event_list))
# iterate through all events and open them
for event in events:
    driver.get(event)
    uniqueEventContainer = driver.find_elements_by_css_selector('div[class^=eventInfoContainer-]')[0]
    print("Event information: " + uniqueEventContainer.text)
This prints:
Event information: Sunday, April 21st, 2019
3:00 PM
San Francisco Brewing Co.
3150 Polk St, Sf, CA 94109
View All The Fourth Son Tour Dates
My issue is that I can't access the nested eventInfoContainer divs individually. For example, the 'date' div is at position [1], as it is the second element (after img) in its parent div "eventInfoContainer-9e539994". The parent div "eventInfoContainer-9e539994" is likewise at position [1], as it is the second element in its parent div "eventInfoContainer-54d5deb3" (after "lineupContainer").
By this logic, shouldn't I be able to access the date text with this code (accessing the element at position 1, whose parent is at position 1, within the container at position 0)?
for event in events:
    driver.get(event)
    uniqueEventContainer = driver.find_elements_by_css_selector('div[class^=eventInfoContainer-]')[0][1][1]
I get the following error:
TypeError: 'WebElement' object does not support indexing
When you index into the WebElement list (which is what find_elements_by_css_selector('div[class^=eventInfoContainer-]') returns) you get a WebElement; you cannot index further into that. You can, however, split the text of a WebElement to generate a list for further indexing.
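For instance, the split idea can be sketched on a plain string standing in for the .text value (the string below is copied from the printed output in the question):

```python
# Plain string standing in for uniqueEventContainer.text (the real
# value comes from the browser); split('\n') turns it into an
# indexable list of lines.
container_text = ("Sunday, April 21st, 2019\n3:00 PM\n"
                  "San Francisco Brewing Co.\n3150 Polk St, Sf, CA 94109")
parts = container_text.split("\n")
event_date, event_time = parts[0], parts[1]
venue, address = parts[2], parts[3]
print(event_date, event_time, venue, address)
```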
If there is a regular structure across pages, you could load the HTML for the div into BeautifulSoup. Example URL:
from selenium import webdriver
from bs4 import BeautifulSoup as bs
d = webdriver.Chrome()
d.get('https://www.bandsintown.com/e/1013664851-los-grandes-de-la-banda-at-aura-nightclub?came_from=257&utm_medium=web&utm_source=home&utm_campaign=event')
soup = bs(d.find_element_by_css_selector('[class^=eventInfoContainer-]').get_attribute('outerHTML'), 'lxml')
date = soup.select_one('img + div').text
time = soup.select_one('img + div + div').text
venue = soup.select_one('[class^=eventInfoContainer-]:nth-of-type(3) div > div').text
address = soup.select_one('[class^=eventInfoContainer-]:nth-of-type(3) div + div').text
print(date, time, venue, address)
If line breaks were consistent:
containers = d.find_elements_by_css_selector('div[class^=eventInfoContainer-]')
array = containers[0].text.split('\n')
date = array[3]
time = array[4]
venue = array[5]
address = array[6]
print(date, time, venue, address)
With index and split:
from selenium import webdriver
from bs4 import BeautifulSoup as bs
d = webdriver.Chrome()
d.get('https://www.bandsintown.com/e/1013664851-los-grandes-de-la-banda-at-aura-nightclub?came_from=257&utm_medium=web&utm_source=home&utm_campaign=event')
containers = d.find_elements_by_css_selector('div[class^=eventInfoContainer-]')
date_time = containers[1].text.split('\n')
i_date = date_time[0]
i_time = date_time[1]
venue_address = containers[3].text.split('\n')
venue = venue_address[0]
address = venue_address[1]
print(i_date, i_time, venue, address)
As the error suggests, WebElements don't support indexing. What you are confusing this with is a list.
Here
driver.find_elements_by_css_selector('div[class^=eventInfoContainer-]')
This code returns a list of WebElements. That is why you can access a WebElement using an index into the list. But that element does not itself support indexing into another WebElement; you are not getting a list of lists.
That is why
driver.find_elements_by_css_selector('div[class^=eventInfoContainer-]')[0] works, but driver.find_elements_by_css_selector('div[class^=eventInfoContainer-]')[0][1] doesn't.
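A plain-Python sketch of the same error, using a hypothetical FakeWebElement in place of a real WebElement:

```python
# The *list* returned by find_elements supports indexing, but the items
# (WebElement-like objects, here a hypothetical FakeWebElement) do not.
class FakeWebElement:
    def __init__(self, text):
        self.text = text

elements = [FakeWebElement("a"), FakeWebElement("b")]
first = elements[0]   # fine: indexing into the list
try:
    first[1]          # fails: the object has no __getitem__
except TypeError as err:
    message = str(err)
print(message)
```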
Edit: (answer to the question in the comment)
It is not Selenium code.
The code posted in the answer by QHarr uses BeautifulSoup. It is a python package for parsing HTML and XML documents.
BeautifulSoup has a .select() method which uses CSS selector against a parsed document and return all the matching elements.
There’s also a method called select_one(), which finds only the first tag that matches a selector.
In the code,
time = soup.select_one('img + div + div').text
venue = soup.select_one('[class^=eventInfoContainer-]:nth-of-type(3) div > div').text
Each gets the first element found by the given CSS selector and returns the text inside the tag. The first line finds an img tag, then finds its immediate sibling div tag, and then again finds the sibling div tag of that previous div.
The second line finds the third sibling tag whose class starts with eventInfoContainer-, then finds the child div, and then the child of that div.
Check out CSS selectors
This could be done directly using selenium:
date = driver.find_element_by_css_selector("img[class^='eventInfoContainer-'][src$='clock.svg'] + div")
time = driver.find_element_by_css_selector("img[class^='eventInfoContainer-'] + div + div")
venue = driver.find_element_by_css_selector("img[class^='eventInfoContainer-'][src$='pin.svg'] + div > div")
address = driver.find_element_by_css_selector("img[class^='eventInfoContainer-'][src$='pin.svg'] + div > div:nth-of-type(2)")
I've used different CSS selectors, but they still select the same elements.
I'm not sure about BeautifulSoup, but in QHarr's answer the date selector would return a different value than intended when used with Selenium.

How to extract the texts from the span tags as per the HTML using Selenium and Python

I'm looking to pull the following values from between these three span/div tags.
<span class="engagementInfo-valueNumber js-countValue">496.26K</span>
<div class="websiteRanks-valueContainer js-websiteRanksValue">
<span class="websiteRanks-valueChange websiteRanks-valueChange--isSingleMode websiteRanks-valueChange--up"></span>
180
</div>
<span class="websitePage-relativeChangeNumber">16.35%</span>
When I copy the xpath it turns out like:
/html/body/div[1]/main/div/div/div[2]/div[2]/div[1]/div[3]/div/div/div/div[2]/div/span[2]/span[2]/span
and copying the selector yields:
body > div.wrapper-body.wrapperBody--websiteAnalysis.js-wrapperBody > main > div > div > div.analysisPage-section.analysisPage-section--withFeedback.websitePage-overview.js-section.js-showInCompare.is-active.js-triggered > div.analysisPage-sectionContent.analysisPage-sectionVisits.js-sectionContent.js-print-pageFooter.is-triggered > div.u-clearfix.analysisPage-sectionOverview > div.websitePage-mobileFramed.websitePage-mobileFramed--overview > div > div > div > div:nth-child(2) > div > span.engagementInfo-value.engagementInfo-value--large.u-text-ellipsis > span.engagementInfo-valueRelative.websitePage-relativeChange.websitePage-relativeChange--delay.websitePage-relativeChange--up.js-showOnCount.is-shown > span
Ultimately I would love a few elements with 496.26K, 180, and 16.35%, or those values in a list.
I've tried the following without success, though it has worked for me on other websites in the past:
url = 'https://www.similarweb.com/website/' + domain
driver.get(url) #get response
driver.implicitly_wait(2) #wait to load content
total_vists = driver.find_element_by_xpath(xpath='/html/body/div[1]/main/div/div/section[2]/div/ul/li[1]/div[2]').text
You can try a CSS selector for the first span.
For extracting 496.26K:
first_span = driver.find_element_by_css_selector("span.engagementInfo-valueNumber.js-countValue").text
print(first_span)
for extracting 180 :
second_span= driver.find_element_by_css_selector("span.websiteRanks-valueChange.websiteRanks-valueChange--isSingleMode.websiteRanks-valueChange--up")
print(second_span.text)
for extracting 16.35%
third_span= driver.find_element_by_css_selector("span.websitePage-relativeChangeNumber")
print(third_span.text)
As per the HTML you have shared, the elements are JavaScript-rendered, so you need to induce WebDriverWait for the elements to be visible, and you can use the following solutions:
496.26K:
engagementInfo = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='engagementInfo-valueNumber js-countValue']"))).get_attribute("innerHTML")
180:
websiteRanks = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='websiteRanks-valueContainer js-websiteRanksValue']")))
websiteRanksText = driver.execute_script('return arguments[0].lastChild.textContent;', websiteRanks).strip()
16.35%:
websitePageChangeNumber = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='websitePage-relativeChangeNumber']"))).get_attribute("innerHTML")
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
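The lastChild.textContent trick above works because 180 is a bare text node that follows the nested span. A minimal stdlib sketch of the same idea on the static HTML from the question (the DirectTextParser class is hypothetical):

```python
from html.parser import HTMLParser

# Hypothetical parser collecting only the *direct* text of the outer
# <div>, skipping the nested <span>: the same idea as
# arguments[0].lastChild.textContent in the browser.
class DirectTextParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.direct_text = []

    def handle_starttag(self, tag, attrs):
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1 and data.strip():
            self.direct_text.append(data.strip())

html = ('<div class="websiteRanks-valueContainer js-websiteRanksValue">'
        '<span class="websiteRanks-valueChange"></span>180</div>')
p = DirectTextParser()
p.feed(html)
print(p.direct_text[-1])  # 180
```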

Python selenium webdriver. Find elements with specified class name

I am using Selenium to parse a page containing markup that looks a bit like this:
<html>
<head><title>Example</title></head>
<body>
<div>
<span class="Fw(500) D(ib) Fz(42px)">1</span>
<span class="Fw(500) D(ib) Fz(42px) Green XYZ">2</span>
</div>
</body>
</html>
I want to fetch all span elements that contain the class foobar.
I have tried both of this (the variable wd is an instance of selenium.webdriver):
elem = wd.find_elements_by_css_selector("span[class='Fw(500) D(ib) Fz(42px).']")
elem = wd.find_element_by_xpath("//span[starts-with(@class, 'Fw(500) D(ib) Fz(42px)')]")
Neither of these works.
How can I select only the elements whose class starts with Fw(500) D(ib) Fz(42px),
i.e. both span elements in the sample markup given?
Try as below :-
elem = wd.find_elements_by_css_selector("span.foobar")
If there is a space between classes foo and bar, then try as below :-
elem = wd.find_elements_by_css_selector("span.foo.bar")
Edited: If your class contains non-alphabetical characters and you want to find elements whose class starts with Fw(500) D(ib) Fz(42px), then try as below :-
elem = wd.find_elements_by_css_selector("span[class^='Fw(500) D(ib) Fz(42px)']")
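To illustrate what the ^= (starts-with) attribute selector matches, here is a minimal stdlib sketch on the sample markup (the PrefixSpanParser class is hypothetical):

```python
from html.parser import HTMLParser

# Hypothetical parser counting spans whose class attribute *starts
# with* the given prefix -- what "span[class^='...']" matches in CSS.
class PrefixSpanParser(HTMLParser):
    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class", "").startswith(self.prefix):
            self.matches += 1

html = ('<div><span class="Fw(500) D(ib) Fz(42px)">1</span>'
        '<span class="Fw(500) D(ib) Fz(42px) Green XYZ">2</span></div>')
p = PrefixSpanParser("Fw(500) D(ib) Fz(42px)")
p.feed(html)
print(p.matches)  # 2
```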
Try to find elements by XPath:
//span[@class='foobar']
This should work.
