I am trying to extract the review text from this page.
Here's a condensed version of the html shown in my chrome browser inspector:
<div id="module_product_review" class="pdp-block module">
<div class="lazyload-wrapper ">
<div class="pdp-mod-review" data-spm="ratings_reviews" lazada_pdp_review="expose" itemid="1615006548" data-nosnippet="true" data-aplus-ae="x1_490e4591" data-spm-anchor-id="a2o42.pdp_revamp.0.ratings_reviews.508466b1OJjCoH">
<div>...</div>
<div>...</div>
<div>
<div class="mod-reviews">
<div class="item">
<div class="top">...</div>
<div class="middle">...</div>
<div class="item-content">
<div class="content" data-spm-anchor-id="a2o42.pdp_revamp.ratings_reviews.i3.508466b1OJjCoH">Slim and light. feel good. better if providing 16G version.</div>
<div class="review-image">...</div>
<div class="skuInfo">Color Family:MYSTIC SILVER</div>
<div class="bottom">...</div>
<div class="dialogs"></div>
</div>
<div class="seller-reply-wrapper">...</div>
<div class="item">...</div>
<div class="item">...</div>
<div class="item">...</div>
<div class="item">...</div>
</div>
</div>
</div>
</div>
</div>
I'm trying to extract the "Slim and light. feel good. better if providing 16G version." text from the class="content" element.
But when I try to retrieve the id="module_product_review" element using Selenium in python, this is what I get instead:
<div class="pdp-block module" id="module_product_review">
<div class="lazyload-wrapper">
<div class="lazy-load-placeholder">
<div class="lazy-load-skeleton">
</div>
</div>
</div>
</div>
This is my code:
op = webdriver.ChromeOptions()
op.add_argument('--headless')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=op)
driver.get("https://www.lazada.sg/products/huawei-matebook-d14-laptop-14-fullview-display-intel-i5-processor-8gb512gb-intel-uhd-graphics-i1615006548-s7594078907.html?spm=a2o42.searchlist.list.3.15064828Od60kh&search=1&freeshipping=1")
module_product_review = driver.find_element(By.ID, "module_product_review")
html = module_product_review.get_attribute("outerHTML")
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())
I thought it might have been because I was retrieving the element before it was fully loaded, so I tried to sleep the program for 30 seconds before calling find_element(), but I still get the same result. As far as I can tell, it's not an issue of iframes or shadow roots either.
Is there some other issue that I'm missing?
The element whose text you are trying to get is initially outside the visible viewport. You first have to scroll that element into view.
Also, since you are working in headless mode, you should set the window size. The default window size in headless mode is much smaller than what we normally use.
And you should use explicit waits with expected conditions, so that you access the elements only when they are ready.
This should work better:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from webdriver_manager.chrome import ChromeDriverManager
op = webdriver.ChromeOptions()
op.add_argument('--headless')
# set the window size before the driver is created, otherwise the option has no effect
op.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=op)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
driver.get("https://www.lazada.sg/products/huawei-matebook-d14-laptop-14-fullview-display-intel-i5-processor-8gb512gb-intel-uhd-graphics-i1615006548-s7594078907.html?spm=a2o42.searchlist.list.3.15064828Od60kh&search=1&freeshipping=1")
# wait for the placeholder to exist, then scroll it into view
element = wait.until(EC.presence_of_element_located((By.ID, "module_product_review")))
time.sleep(1)
actions.move_to_element(element).perform()
module_product_review = wait.until(EC.visibility_of_element_located((By.ID, "module_product_review")))
# now you can do what you want here
html = module_product_review.get_attribute("outerHTML")
Also, to find that specific element and get its text, you could use a more precise locator, like this:
your_text = wait.until(EC.visibility_of_element_located((By.XPATH, "(//div[@id='module_product_review']//div[@class='item']//div[@class='content'])[1]"))).text
You can use this after scrolling, as mentioned above.
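Once the real markup has replaced the lazy-load placeholder, the review text can also be pulled out of the outerHTML string. The following is only a minimal sketch that uses the standard library's html.parser as a stand-in for BeautifulSoup; ReviewTextParser and extract_reviews are names made up for this example, and it assumes every review body sits in a div whose class list contains content, as in the inspector snippet in the question.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text of every <div> whose class list contains "content"."""

    def __init__(self):
        super().__init__()
        self.reviews = []   # finished review strings
        self._depth = 0     # greater than 0 while inside a content div
        self._chunks = []   # text pieces of the review being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested div inside a review
        elif "content" in (dict(attrs).get("class") or "").split():
            self._depth = 1   # entering a review body

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.reviews.append("".join(self._chunks).strip())
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_reviews(outer_html):
    """Return the review texts found in an outerHTML string."""
    parser = ReviewTextParser()
    parser.feed(outer_html)
    parser.close()
    return parser.reviews
```

Calling extract_reviews(html) on the placeholder markup returns an empty list, which makes it easy to tell whether the reviews had actually loaded before the HTML was read.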
Related
Hi guys, I'm trying to make a script that will loop through all projects on rarity.tools, click on the divs one by one, and then get each project's details. However, for some reason my script is not doing anything. I'm not getting any errors either; it's just not clicking on the divs. All the divs have a class name of w-72, as you can see below:
<div class="flex flex-row flex-wrap justify-center">
<div class="mb-4 ml-4 overflow-hidden border border-gray-300 rounded-lg shadow-md bgCard dark:border-gray-800">
<div class="w-72">
<a href="/garagexyz-genesis" class="">
<div class="relative w-full overflow-hidden" style="height: 220px;">
<!---->
<img src="https://projects.b-cdn.net/garagexyz-genesis/header.jpg?height=220" class="object-cover object-center w-full h-full">
<div class="flex flex-row mt-2"><div class="p-2 ml-2">
<div class="font-bold text-pink-600 dark:text-gray-300">
GarageXYZ Genesis 500
</div>
Here is the code
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(executable_path="C:\\chromedriver.exe")
driver.maximize_window()
index=0
driver.get("https://rarity.tools/")
while True:
    divs = driver.find_elements(By.CLASS_NAME, 'w-72')
    print(divs)
    try:
        divs[index].click()
    except IndexError:
        break # no more elements, exit the loop
    # get project info
    # ...
    driver.back()
    index += 1
You can simply loop over the found elements and extract each link by accessing the child a tag and getting its href attribute.
To avoid stale elements, you can store the links in a list and open them separately:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://rarity.tools/")
links = [e.find_element(By.XPATH, ".//a").get_attribute("href") for e in driver.find_elements(By.CLASS_NAME, 'w-72')]
for l in links:
    driver.get(l)
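The same idea, collecting plain href strings first and only then navigating, can be wrapped in two small helpers. This is a sketch under assumptions: collect_links, visit_all and handle_page are hypothetical names, and the locators are passed in as (by, value) tuples so the functions only rely on the driver's find_elements, find_element, get_attribute and get methods.

```python
def collect_links(driver, card_locator, link_locator):
    """Return the href of the first matching link inside every card element."""
    links = []
    for card in driver.find_elements(*card_locator):
        href = card.find_element(*link_locator).get_attribute("href")
        if href:
            links.append(href)
    return links


def visit_all(driver, links, handle_page):
    """Open each link in turn and collect whatever handle_page(driver) returns."""
    results = []
    for link in links:
        driver.get(link)
        results.append(handle_page(driver))
    return results
```

With Selenium this might be called as collect_links(driver, (By.CLASS_NAME, "w-72"), (By.XPATH, ".//a")). Because only strings are kept between page loads, no WebElement reference is reused after navigation, so nothing can go stale.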
I'm trying to get the hours of the available time slots from this webpage (the boxes below the calendar):
https://magicescape.it/le-stanze/lo-studio-di-harry-houdini/
I've read other related questions and wrote this code
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
url = 'https://magicescape.it/le-stanze/lo-studio-di-harry-houdini/'
wait_time = 10
options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)
driver.get(url)
driver.switch_to.frame(0)
wait = WebDriverWait(driver, wait_time)
first_result = wait.until(presence_of_element_located((By.ID, "sb_main")))
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup)
driver.quit()
After switching to the iframe containing the time slots, I get this from printing soup
<script id="time_slots_view" type="text/html"><div class="slots-view{{#ifCond (getThemeOption 'timeline_modern_display') '==' 'as_table'}} as-table{{/ifCond}}">
<div class="timeline-wrapper">
<div class="tab-pd">
<div class="container-caption">
{{_t 'available_services_on_this_day'}}
</div>
{{#if error_message}}
<div class="alert alert-danger alert-dismissible" role="alert">
{{error_message}}
</div>
{{/if}}
{{>emptyTimePart is_empty=is_empty is_loaded=is_loaded}}
<div id="sb_time_slots_container"></div>
{{> bookingTimeLegendPart legend="only_available" time_diff=0}}
</div>
</div>
</div></script>
<script id="time_slot_view" type="text/html"><div class="slot">
<a class="sb-cell free {{#ifPluginActive 'slots_count'}}{{#if available_slots}}has-available-slot{{/if}}{{/ifPluginActive}}" href="#{{bookingStepUrl time=time date=date}}">
{{formatDateTime datetime 'time' time_diff}}
{{#ifCond (getThemeOption 'timeline_show_end_time') '==' 1}}
-<span class="end-time">
{{formatDateTime end_datetime 'time' time_diff}}
</span>
{{/ifCond}}
{{#ifPluginActive 'slots_count'}}
{{#if available_slots}}
<span class="slot--available-slot">
{{available_slots}}
{{#ifConfigParam 'slots_count_show_total' '==' true}} / {{total_slots}} {{/ifConfigParam}}
</span>
{{/if}}
{{/ifPluginActive}}
</a>
</div></script>
while from right click > inspect element in the webpage I get this
<div class="slots-view">
<div class="timeline-wrapper">
<div class="tab-pd">
<div class="container-caption">
Orari d'inizio disponibili
</div>
<div id="sb_time_slots_container">
<div class="slot">
<a class="sb-cell free " href="#book/location/4/service/6/count/1/provider/6/date/2020-03-09/time/23:00:00/">
23:00
</a>
</div>
</div>
<div class="time-legend">
<div class="available">
<div class="circle">
</div>
- Disponibile
</div>
</div>
</div>
</div>
</div>
How can I get the hour of the available slots (23:00 in this example) using selenium?
To get the desired response you need to:
1. Correctly identify the iframe you want to switch to (and switch to it). You were trying to switch to frame[0] but needed frame[1]. The following code removes the reliance on indexes and uses XPath instead.
2. Get the elements containing the time. Again, this uses XPath to identify all child divs of the element with id=sb_time_slots_container. We then iterate over these child divs and get the text property, which is nested within an <a> inside each div.
For both steps 1 and 2 you should also use wait.until so that the content has time to load.
...
driver.get(url)
wait = WebDriverWait(driver, wait_time)
# Wait until the iframe exists then switch to it
iframe_element = wait.until(presence_of_element_located((By.XPATH, '//*[@id="prenota"]//iframe')))
driver.switch_to.frame(iframe_element)
# Wait until the times exist then get a list of them
wait.until(presence_of_element_located((By.XPATH, '//*[@id="sb_time_slots_container"]/div')))
all_time_elems = driver.find_elements(By.XPATH, '//*[@id="sb_time_slots_container"]/div')
# Iterate over each element and print the time out
for elem in all_time_elems:
    print(elem.find_element(By.TAG_NAME, "a").text)
driver.quit()
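If the date is needed along with the hour, it may also be possible to read both from each slot's href instead of its text. The sketch below assumes every slot link follows the same pattern as the one shown in the question (.../date/2020-03-09/time/23:00:00/); slot_start and SLOT_HREF are names invented for this example.

```python
import re
from datetime import datetime

# Matches the "/date/YYYY-MM-DD/time/HH:MM:SS/" part of a slot href.
SLOT_HREF = re.compile(r"/date/(\d{4}-\d{2}-\d{2})/time/(\d{2}:\d{2}:\d{2})/")


def slot_start(href):
    """Return the slot's start as a datetime, or None if the href does not match."""
    match = SLOT_HREF.search(href)
    if match is None:
        return None
    date_part, time_part = match.groups()
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
```

The href itself would come from elem.find_element(By.TAG_NAME, "a").get_attribute("href") inside the loop above.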
So I'm trying to write a test for a webpage which has some elements within an iframe. I've been able to successfully run the test using webdriver.Firefox() without any problems but if I switch it over to webdriver.Chrome() I get a timeout exception on the following lines of code:
self.driver.switch_to.frame(0)
self.activity_status = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#overview > div.details.w-66 > div > div.duration-and-status > span.status.stat_incomplete#')))
It'd be great to get a solution to this as I'm all out of ideas.
Thanks for your help.
edit, partial html for the page:
<iframe id="iframe_course_details" allowfullscreen="" src="../Course/Details.aspx?HidePageNav=true&IsInIframe=true"></iframe>
Close
Edit (Inactive)
Edit
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_favourite" class="favourite button tooltipstered" style="display: none;">Favourite</span>
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_basket_dull" class="add-to-basket button delete tooltipstered" style="display: none;">Enrolled (Remove From Enrolments)</span>
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_basket" class="add-to-basket button tooltipstered">Add to Enrolments</span>
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_print" class="print button tooltipstered">Print</span>
</div>
<section id="overview" style="opacity: 1;">
<div id="fullname" class="fullname w-100" style="display: none;">
</div>
<div class="image w-33" style="cursor: pointer;">
<div style="background-image:url(/App_Themes/MainTheme-responsive/Images/Course/webcast.jpg);"></div></div>
<div class="details w-66">
<div class="inner">
<h2>testing activity</h2>
<div class="star-rating-num-ratings">
<div class="star-rating">
<span></span><span></span><span></span><span></span><span></span>
</div>
<span class="num-of-ratings">0 Ratings</span>
</div>
<div class="duration-and-status">
<span class="duration">
<label>
Duration:
</label>
<span>0</span>
</span>
<span class="status stat_incomplete">Started</span>
</div>
Edit 2:
So we've managed to find a solution to this, and it's even more confusing than the original problem:
WebDriverWait(self.driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'iframe_course_details')))
time.sleep(0)
self.activity_status = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="overview"]/div[3]/div/div[2]/span[2]')))
I'd be really curious to hear some theories on why this works; it times out without the time.sleep(0).
If you reference the iframe element directly rather than by an integer index, the switch will work in both Firefox and Chrome.
self.driver.switch_to.frame(self.driver.find_element(By.NAME, "iframe"))
You can locate the iframe element any way you wish, e.g. by CSS selector or XPath.
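One way to keep frame handling tidy, sketched here as an assumption rather than as anything the answers above prescribe, is a small context manager that switches into the frame and always switches back to the top-level document afterwards. inside_frame is a hypothetical name; it relies only on driver.switch_to.frame() and driver.switch_to.default_content().

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into the given frame, then return to the top-level document on exit."""
    # frame may be anything switch_to.frame() accepts: an element, a name/id, or an index
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # runs even if a lookup inside the frame raises
        driver.switch_to.default_content()
```

Usage would look like: with inside_frame(self.driver, iframe_element): followed by the element lookups that belong to that frame.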
As the desired element is within an <iframe>, to invoke click() on it you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Code Block:
# as per your comment assuming -> there is only one frame on the page
WebDriverWait(self.driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME,"iframe")))
self.element = self.activity_status = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#overview > div.details.w-66 > div > div.duration-and-status > span.status.stat_incomplete#')))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Reference
You can find a relevant detailed discussion in:
Ways to deal with #document under iframe
I'm a relative newcomer to selenium so this might be something incredibly simple but I can't seem to access an element even though it appears on the page. I don't think it can be that it hasn't loaded yet because I can reference other elements. The line of code I am trying to use and the html is below.
max_questions = driver.find_element_by_xpath(xpath="//span[contains(@class, 'total-questions')]")
<div data-v-404a90e7="" data-v-084771db="" class="header animated fadeInDown anim-300-duration">
<div data-v-404a90e7="" class="left-section half-width">
<div data-v-404a90e7="" flow="right" class="menu-icon animated fadeIn anim-300-duration">
<div data-v-404a90e7="" class="menu-icon-image"></div>
</div>
<div data-v-404a90e7="" class="question-number-wrapper text-unselectable animated fadeIn anim-300-duration">
<span data-v-404a90e7="" class="current-question">1</span>
<span data-v-404a90e7="" class="total-questions">/10</span>
</div>
</div>
<div data-v-404a90e7="" class="right-section half-width">
<div data-v-404a90e7="" class="room-code animated fadeIn anim-300-duration">712851</div>
<div data-v-404a90e7="" flow="left" class="exit-game-btn-wrapper animated fadeIn anim-300-duration">
<div data-v-404a90e7="" class="exit-game-icon"></div>
</div>
</div>
</div>
You can use WebDriverWait with expected_conditions:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('d:\\chromedriver\\chromedriver.exe')
driver.get(url)
wait = WebDriverWait(driver, 10)
max_questions = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[contains(@class, 'total-questions')]")))
print(max_questions.text)
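As a rough illustration of how an explicit wait differs from a fixed sleep, the sketch below polls a condition until it returns something truthy or a timeout expires. It is not Selenium's implementation; wait_until, timeout and poll are names made up for this example, and it raises the built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # success: hand back whatever the condition produced
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)   # short pause before trying again
```

The point of the design is that the call returns as soon as the condition holds, so a generous timeout costs nothing when the page is fast, whereas a fixed sleep always costs its full duration.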
I have the following Selenium Test for Python/Django application:
class EmailRecordsTest(StaticLiveServerTestCase):
    def test_can_store_email_and_retrieve_it_later(self):
        self.browser.get(self.live_server_url)
        emailbox = self.browser.find_element_by_xpath("//form[@class='pma-subscribe-form']/input[1]")
        self.assertEqual(emailbox.get_attribute("placeholder"), 'Enter your Email')
        print("tested until here")
        print("The placeholder: ", emailbox.get_attribute("placeholder"))
        print(emailbox)
        emailbox.send_keys('vio@mesmerizing.com')
The first occurrence of emailbox is clearly identified, as seen from the print output and the assertEqual on the placeholder. The last line, emailbox.send_keys, throws the following error:
selenium.common.exceptions.ElementNotVisibleException: Message:
Element is not currently visible and so may not be interacted with
I cannot work out why the same element becomes not visible when used with send_keys.
The Html code being tested is as below:
<!-- Start footer -->
<footer id="pma-footer">
<!-- start footer top -->
<div class="pma-footer-top">
<div class="container">
<div class="pma-footer-top-area">
<div class="row">
<div class="col-lg-3 col-md-3 col-sm-3">
<div class="pma-footer-widget">
<h4>News letter</h4>
<p>Get latest update, news & offers</p>
<form class="pma-subscribe-form">
<input id="subscribe-email" type="email" placeholder="Enter your Email">
<button class="btn btn-danger btn-md" type="submit">Subscribe!</button>
</form>
</div>
</div>
</div>
</div>
</div>
</div>
<!-- end footer top -->
Kindly help.
find_element returns an element as long as it is present in the DOM, whether or not it is visible, and you can read its attributes as well. send_keys, however, performs an action on the element, and Selenium only acts on visible elements. So before acting on the element you need to make sure it is visible, using WebDriverWait as below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
emailbox = wait.until(EC.visibility_of_element_located((By.ID, "subscribe-email")))
# do all your other work before sending keys
# now use send_keys
emailbox.send_keys('vio@mesmerizing.com')
Edited: If you are still unable to interact with the element, try using execute_script() to set the value as below:
emailbox = wait.until(EC.presence_of_element_located((By.ID, "subscribe-email")))
# do all your other work before setting the value
# now use execute_script
driver.execute_script("arguments[0].value = 'vio@mesmerizing.com'", emailbox)
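The two approaches can be combined into one helper that types normally when the element is displayed and falls back to JavaScript otherwise. This is a sketch, not part of the original answer: set_input_value is a hypothetical name, and the text is passed as a script argument (arguments[1]) so it does not need to be quoted inside the JavaScript string.

```python
def set_input_value(driver, element, text):
    """Fill an input via send_keys when it is displayed, else via JavaScript.

    Returns the name of the method that was used.
    """
    if element.is_displayed():
        element.clear()
        element.send_keys(text)
        return "send_keys"
    # Fallback: assign the value directly in the page.
    driver.execute_script("arguments[0].value = arguments[1];", element, text)
    return "javascript"
```

Note that assigning value from JavaScript does not simulate keystrokes, so any handlers the page attaches to typing events will not run; for a test that is meant to mimic a user, making the element visible first is usually the safer route.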
Another option, which worked in this case, is to scroll to the element (it was at the bottom of the page) and then use send_keys:
emailbox = self.browser.find_element_by_xpath("//form[@class='pma-subscribe-form']/input[1]")
self.browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
emailbox.send_keys('vio@mesmerizing.com')
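Scrolling to the bottom of the page works here only because the form sits in the footer. A variant that targets the element itself, wherever it is, might look like the sketch below; scroll_into_view is a name invented for this example, and it uses the browser's standard Element.scrollIntoView() through execute_script.

```python
def scroll_into_view(driver, element):
    """Ask the browser to scroll the given element to the middle of the viewport."""
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    return element  # returned so calls can be chained
```

For example, scroll_into_view(self.browser, emailbox).send_keys(...) scrolls first and then types into the same element.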