I'm a relative newcomer to Selenium, so this might be something incredibly simple, but I can't seem to access an element even though it appears on the page. I don't think the cause is that it hasn't loaded yet, because I can reference other elements. The line of code I am trying to use and the HTML are below.
max_questions = driver.find_element_by_xpath(xpath="//span[contains(@class, 'total-questions')]")
<div data-v-404a90e7="" data-v-084771db="" class="header animated fadeInDown anim-300-duration">
<div data-v-404a90e7="" class="left-section half-width">
<div data-v-404a90e7="" flow="right" class="menu-icon animated fadeIn anim-300-duration">
<div data-v-404a90e7="" class="menu-icon-image"></div>
</div>
<div data-v-404a90e7="" class="question-number-wrapper text-unselectable animated fadeIn anim-300-duration">
<span data-v-404a90e7="" class="current-question">1</span>
<span data-v-404a90e7="" class="total-questions">/10</span>
</div>
</div>
<div data-v-404a90e7="" class="right-section half-width">
<div data-v-404a90e7="" class="room-code animated fadeIn anim-300-duration">712851</div>
<div data-v-404a90e7="" flow="left" class="exit-game-btn-wrapper animated fadeIn anim-300-duration">
<div data-v-404a90e7="" class="exit-game-icon"></div>
</div>
</div>
</div>
You can use WebDriverWait with expected_conditions:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('d:\\chromedriver\\chromedriver.exe')
driver.get(url)
wait = WebDriverWait(driver, 10)
max_questions = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[contains(@class, 'total-questions')]")))
print(max_questions.text)
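For intuition, an explicit wait is essentially a polling loop: it calls a condition repeatedly until the condition returns something truthy or a timeout expires. The sketch below is a simplified illustration in plain Python, not Selenium's actual implementation; `poll_until`, its parameters, and the toy condition are hypothetical names chosen for this example.

```python
import time


def poll_until(condition, timeout=10.0, interval=0.5):
    """Call `condition()` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper that mimics the idea behind an explicit wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Toy condition: succeeds on the third check, like an element that renders late.
attempts = {"count": 0}


def element_appears():
    attempts["count"] += 1
    return "/10" if attempts["count"] >= 3 else None


found = poll_until(element_appears, timeout=2.0, interval=0.01)
print(found)
```

This is why an explicit wait is usually preferable to a fixed `time.sleep`: it returns as soon as the condition holds and fails loudly when it never does.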
I am trying to extract the review text from this page.
Here's a condensed version of the html shown in my chrome browser inspector:
<div id="module_product_review" class="pdp-block module">
<div class="lazyload-wrapper ">
<div class="pdp-mod-review" data-spm="ratings_reviews" lazada_pdp_review="expose" itemid="1615006548" data-nosnippet="true" data-aplus-ae="x1_490e4591" data-spm-anchor-id="a2o42.pdp_revamp.0.ratings_reviews.508466b1OJjCoH">
<div>...</div>
<div>...</div>
<div>
<div class="mod-reviews">
<div class="item">
<div class="top">...</div>
<div class="middle">...</div>
<div class="item-content">
<div class="content" data-spm-anchor-id="a2o42.pdp_revamp.ratings_reviews.i3.508466b1OJjCoH">Slim and light. feel good. better if providing 16G version.</div>
<div class="review-image">...</div>
<div class="skuInfo">Color Family:MYSTIC SILVER</div>
<div class="bottom">...</div>
<div class="dialogs"></div>
</div>
<div class="seller-reply-wrapper">...</div>
<div class="item">...</div>
<div class="item">...</div>
<div class="item">...</div>
<div class="item">...</div>
</div>
</div>
</div>
</div>
</div>
I'm trying to extract the "Slim and light. feel good. better if providing 16G version." text from the class="content" element.
But when I try to retrieve the id="module_product_review" element using Selenium in python, this is what I get instead:
<div class="pdp-block module" id="module_product_review">
<div class="lazyload-wrapper">
<div class="lazy-load-placeholder">
<div class="lazy-load-skeleton">
</div>
</div>
</div>
</div>
This is my code:
op = webdriver.ChromeOptions()
op.add_argument('--headless')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=op)
driver.get("https://www.lazada.sg/products/huawei-matebook-d14-laptop-14-fullview-display-intel-i5-processor-8gb512gb-intel-uhd-graphics-i1615006548-s7594078907.html?spm=a2o42.searchlist.list.3.15064828Od60kh&search=1&freeshipping=1")
module_product_review = driver.find_element(By.ID, "module_product_review")
html = module_product_review.get_attribute("outerHTML")
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())
I thought it might have been because I was retrieving the element before it was fully loaded, so I tried to sleep the program for 30 seconds before calling find_element(), but I still get the same result. As far as I can tell, it's not an issue of iframes or shadow roots either.
Is there some other issue that I'm missing?
The element whose text you want is initially outside the visible viewport, and the page lazy-loads the review block, so you first have to scroll that element into view.
Also, since you are working in headless mode, you should set the window size. The default window size in headless mode is much smaller than the one normally used.
And you should use explicit waits with expected conditions so that you access elements only when they are ready.
This should work better:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from webdriver_manager.chrome import ChromeDriverManager
op = webdriver.ChromeOptions()
op.add_argument('--headless')
# set the window size before creating the driver; arguments added afterwards have no effect
op.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=op)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
driver.get("https://www.lazada.sg/products/huawei-matebook-d14-laptop-14-fullview-display-intel-i5-processor-8gb512gb-intel-uhd-graphics-i1615006548-s7594078907.html?spm=a2o42.searchlist.list.3.15064828Od60kh&search=1&freeshipping=1")
# wait until the placeholder block exists in the DOM, then move to it to trigger lazy loading
element = wait.until(EC.presence_of_element_located((By.ID, "module_product_review")))
time.sleep(1)
actions.move_to_element(element).perform()
module_product_review = wait.until(EC.visibility_of_element_located((By.ID, "module_product_review")))
# now you can do what you want here
html = module_product_review.get_attribute("outerHTML")
Also, in order to find that specific element and get that specific text you could use something more precise, like this:
your_text = wait.until(EC.visibility_of_element_located((By.XPATH, "(//div[@id='module_product_review']//div[@class='item']//div[@class='content'])[1]"))).text
You can use this after scrolling, as mentioned above
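If moving the mouse with ActionChains does not trigger the lazy loading, another common option is to scroll with JavaScript. A minimal sketch, assuming `driver` is a Selenium WebDriver and `element` is a WebElement; the helper name `scroll_into_view` is hypothetical. `execute_script` exposes extra positional arguments to the script as `arguments[0]`, and `scrollIntoView` is the standard DOM method.

```python
def scroll_into_view(driver, element):
    """Scroll `element` to the middle of the viewport via the DOM scrollIntoView method."""
    # `driver` is assumed to be a Selenium WebDriver; the element passed as the
    # second argument is available inside the script as arguments[0].
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    return element
```

It could be called right after the `presence_of_element_located` wait, followed by a visibility wait on the review content. Whether this is enough for this particular page is not verified here.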
I'm trying to get a checkmark on "Minutes (1993-Present)" from the federal reserve's document filter page with selenium.
https://www.federalreserve.gov/monetarypolicy/materials/
This is my code. I have tried the following approaches, but I keep getting "Message: no such element: Unable to locate element: Unable to locate element"
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
import time
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.federalreserve.gov/monetarypolicy/materials/")
link = driver.find_element_by_xpath('//*[@id="article"]/app-root/div/ng-component/div[1]/div/div/form/app-doc-types/div[2]/div/div/div[7]/label/input').click()
link = driver.find_element_by_xpath("//div[@class='form-group']/div/div[7]/label/input[contains(text(), 'Minutes (1993-Present)')]").click()
link = driver.find_element_by_css_selector('div.form-group div:nth-child(7) label input').click()
I sliced some parts of the HTML below.
<app-doc-types><div class="eventSearch__label">
<p class="form-control-static">
<strong><legend class="ng-binding">Type:</legend></strong>
</p>
</div>
<div class="eventSearch__inputs">
<div class="form-group">
<div class="row">
<div class="col-lg-6">
<label>
<input type="checkbox" class="ng-untouched ng-pristine ng-valid">
Agendas
</label>
</div><div class="col-lg-6">
<label>
<input type="checkbox" class="ng-untouched ng-pristine ng-valid">
Beige Books/Redbooks
</label>
</div><div class="col-lg-6">
<label>
<input type="checkbox" class="ng-untouched ng-pristine ng-valid">
Bluebooks
</label>
</div><div class="col-lg-6">
<label>
<input type="checkbox" class="ng-untouched ng-pristine ng-valid">
Chairman's FOMC Press Conferences
</label>
</div><div class="col-lg-6">
<label>
<input type="checkbox" class="ng-untouched ng-pristine ng-valid">
Greenbooks
</label>
</div><div class="col-lg-6">
<label>
<input type="checkbox" class="ng-untouched ng-pristine ng-valid">
Memos
</label>
</div><div class="col-lg-6">
<label>
<input type="checkbox" class="ng-valid ng-dirty ng-touched">
Minutes (1993-Present)
</label>
</div><!---->
</div>
</div>
</div>
</app-doc-types>
It takes some time for the website to load, so I added time.sleep(2) after opening the page with the driver.
Your first XPath should work; however, it depends heavily on the exact structure of the page, so even a small change to the HTML can break it.
The second XPath won't work: the caption is text inside the label, not inside the input, so the text() check on the input never matches. You do not need that check at all. This should work:
//div[@class='form-group']/div/div[7]/label/input
The CSS selector should work too, but it can easily be simplified. Check my solution below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
import time
driver = webdriver.Chrome()
driver.get("https://www.federalreserve.gov/monetarypolicy/materials/")
time.sleep(2)
link = driver.find_element_by_css_selector('.col-lg-6:nth-of-type(7) input').click()
Alternatively, you can use an explicit wait.
driver.get("https://www.federalreserve.gov/monetarypolicy/materials/")
element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.col-lg-6:nth-of-type(7) input')))
element.click()
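Returning to the second XPath from the question: an `<input>` is an empty element, so the caption "Minutes (1993-Present)" is text inside the surrounding `<label>`, not inside the `input`. The sketch below uses the standard library's `xml.etree.ElementTree` on a hand-written, XML-valid copy of one checkbox (an assumption for illustration; the real page is HTML rendered by Angular) to show where the text lives.

```python
import xml.etree.ElementTree as ET

# Simplified, XML-valid copy of one checkbox from the question (input self-closed).
snippet = """
<div class="col-lg-6">
  <label>
    <input type="checkbox" class="ng-valid ng-dirty ng-touched"/>
    Minutes (1993-Present)
  </label>
</div>
"""

root = ET.fromstring(snippet.strip())
label = root.find("label")
checkbox = label.find("input")

# The input itself holds no text; the caption follows it inside the label.
print(checkbox.text)                       # None
print(checkbox.tail.strip())               # Minutes (1993-Present)
print("".join(label.itertext()).strip())   # Minutes (1993-Present)
```

In Selenium, an XPath that anchors on the label text and then steps to the input, such as `//label[contains(., 'Minutes (1993-Present)')]/input`, expresses the same idea. That locator has not been tested against the live page.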
you can use playwright
import asyncio
from playwright.async_api import async_playwright
async def run(playwright):
    chromium = playwright.chromium  # or "firefox" or "webkit".
    browser = await chromium.launch()
    page = await browser.new_page()
    await page.goto("https://www.federalreserve.gov/monetarypolicy/materials/")
    await page.click('xpath=/html/body/div[3]/div[2]/div[2]/app-root/div/ng-component/div[1]/div/div/form/app-doc-types/div[2]/div/div/div[7]/label/input')
    await browser.close()
async def main():
    async with async_playwright() as playwright:
        await run(playwright)
asyncio.run(main())
I'm trying to web scrape the person's name and company.
This is what I've tried.
<div id="viewcontact">
<table width="100%">
<tbody><tr>
<td style="display: inline-block; width: 30%">
<div class="formsection_light" style="margin-top:-8px;background:#eaeaea;">
<div style="padding-bottom:10px;">
<div class="left">
<h1>Company Name</h1>
<p class="f16">Person's Name</p>
<div class="theme">
Person's Name
</div>
</div>
<div class="right" style="margin-top:5px;">
driver.find_element_by_xpath('//h1[@class="left"]')
driver.find_element_by_class_name("f16")
The output was empty: there were no errors, but nothing was scraped.
Try something like this :
details = driver.find_elements_by_xpath("//div[@id = 'viewcontact']//tr")
for detail in details:
    name = detail.find_element_by_tag_name("h1").text  # or `.get_attribute("innerText")`
    cpny = detail.find_element_by_tag_name("p").text
    print("{} : {}".format(name, cpny))
to get the company name :
wait = WebDriverWait(driver, 50)
print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.formsection_light h1"))).text)
to get the first person name :
print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.formsection_light p.f16"))).text)
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
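The original `//h1[@class="left"]` cannot match because `left` is the class of the `div` that wraps the heading, not of the `h1` itself. The following sketch checks this offline with the standard library's `xml.etree.ElementTree`, using a trimmed, XML-valid version of the posted markup (closing tags were added as an assumption, since the original snippet is cut off).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question, closed so that it parses as XML.
markup = """
<div id="viewcontact">
  <div class="formsection_light">
    <div class="left">
      <h1>Company Name</h1>
      <p class="f16">Person's Name</p>
    </div>
  </div>
</div>
"""

root = ET.fromstring(markup.strip())

# No h1 carries class="left", so this lookup returns None.
print(root.find(".//h1[@class='left']"))

# Going through the div that actually has the class works.
company = root.find(".//div[@class='left']/h1").text
person = root.find(".//p[@class='f16']").text
print(company, "|", person)
```

The same paths translate to Selenium as `//div[@class='left']/h1` and `//p[@class='f16']`; remember to read `.text` from the returned element.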
<div class="wrapper">
<button class="w-full h-14 pt-2 pb-1 px-3 bg-accent text-dark-1 rounded-full md:rounded select-none cursor-pointer md:hover:shadow-big focus:outline-none md:focus:bg-accent-2 md:focus:shadow-small ">
<div class="font-medium">
<div class="text-17 md:text-18 md:font-bold leading-18">Enter</div>
<div class="text-13 md:text-12 font-normal md:font-medium leading-normal">2 hours</div>
</div>
</button>
</div>
So I'm trying to click this button, but it has a huge class name. One possible way is to use 'driver.find_element_by_css_selector', but I'm not sure if I am doing it right. I'd prefer an approach where I don't have to use 'css_selector', but if that is the only way, I guess that'll have to do.
I tried this, but it did not seem to work:
self.driver.find_element_by_css_selector('.w-full h-14 pt-2 pb-1 px-3 bg-accent text-dark-1 rounded-full md:rounded select-none cursor-pointer md:hover:shadow-big focus:outline-none md:focus:bg-accent-2 md:focus:shadow-small ')
Any suggestions?
Thank you.
Try this
# chain a few of the classes with dots; a space in a CSS selector means "descendant"
btn = self.driver.find_element_by_css_selector('button.w-full.h-14.bg-accent')
btn.click()
Use following locator to identify the element.
Css selector :
self.driver.find_element_by_css_selector("div.wrapper >button:nth-of-type(1)").click()
Xpath :
self.driver.find_element_by_xpath("//div[@class='wrapper']/button[1]").click()
Ideally you should use WebDriverWait() and wait for element clickable
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.wrapper >button:nth-of-type(1)"))).click()
You need to import below libraries.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
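The selector in the question fails because a space in CSS means "descendant", so `.w-full h-14 ...` looks for an `<h-14>` element inside `.w-full`. To match several classes on one element, chain them with dots, and escape characters such as `:` in names like `md:rounded`. A small sketch of that conversion follows; the helper name `classes_to_css` is hypothetical, and it only escapes the colon and the slash, not every character CSS treats specially.

```python
def classes_to_css(tag, class_attr):
    """Build a compound CSS selector from a tag name and a space-separated class attribute."""
    parts = []
    for name in class_attr.split():
        # Escape characters that cannot appear bare in a CSS class selector.
        escaped = name.replace(":", "\\:").replace("/", "\\/")
        parts.append("." + escaped)
    return tag + "".join(parts)


selector = classes_to_css("button", "w-full h-14 bg-accent md:rounded ")
print(selector)  # button.w-full.h-14.bg-accent.md\:rounded
```

The result can be passed to `find_element(By.CSS_SELECTOR, selector)`. In practice, two or three distinctive classes are usually enough, and shorter selectors are less brittle.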
So I'm trying to write a test for a webpage which has some elements within an iframe. I've been able to successfully run the test using webdriver.Firefox() without any problems but if I switch it over to webdriver.Chrome() I get a timeout exception on the following lines of code:
self.driver.switch_to.frame(0)
self.activity_status = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#overview > div.details.w-66 > div > div.duration-and-status > span.status.stat_incomplete')))
It'd be great to get a solution to this as I'm all out of ideas.
Thanks for your help.
edit, partial html for the page:
<iframe id="iframe_course_details" allowfullscreen="" src="../Course/Details.aspx?HidePageNav=true&IsInIframe=true"></iframe>
Close
Edit (Inactive)
Edit
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_favourite" class="favourite button tooltipstered" style="display: none;">Favourite</span>
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_basket_dull" class="add-to-basket button delete tooltipstered" style="display: none;">Enrolled (Remove From Enrolments)</span>
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_basket" class="add-to-basket button tooltipstered">Add to Enrolments</span>
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_print" class="print button tooltipstered">Print</span>
</div>
<section id="overview" style="opacity: 1;">
<div id="fullname" class="fullname w-100" style="display: none;">
</div>
<div class="image w-33" style="cursor: pointer;">
<div style="background-image:url(/App_Themes/MainTheme-responsive/Images/Course/webcast.jpg);"></div></div>
<div class="details w-66">
<div class="inner">
<h2>testing activity</h2>
<div class="star-rating-num-ratings">
<div class="star-rating">
<span></span><span></span><span></span><span></span><span></span>
</div>
<span class="num-of-ratings">0 Ratings</span>
</div>
<div class="duration-and-status">
<span class="duration">
<label>
Duration:
</label>
<span>0</span>
</span>
<span class="status stat_incomplete">Started</span>
</div>
Edit 2:
So we've managed to find a solution to this, and it's even more confusing than the original problem.
WebDriverWait(self.driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'iframe_course_details')))
time.sleep(0)
self.activity_status = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="overview"]/div[3]/div/div[2]/span[2]')))
I'd be really curious to hear some theories on why this works; it times out without the 'time.sleep(0)'.
If you reference the iframe element directly rather than by an integer index, it will work in both Firefox and Chrome.
self.driver.switch_to.frame(self.driver.find_element_by_id("iframe_course_details"))
You can find the iframe element any way you wish, e.g. by CSS selector or XPath.
As the desired element is within an <iframe>, to interact with it you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Code Block:
# as per your comment assuming -> there is only one frame on the page
WebDriverWait(self.driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME,"iframe")))
self.element = self.activity_status = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#overview > div.details.w-66 > div > div.duration-and-status > span.status.stat_incomplete')))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Reference
You can find a relevant detailed discussion in:
Ways to deal with #document under iframe