Selenium WebDriverWait page load: how to write the yield statement - Python

I am trying to wait for a page to fully load with Selenium, using code from other answers here: https://stackoverflow.com/a/30385843/8165689 (the 3rd method in that answer, which uses Selenium's 'staleness_of' condition), originally from: http://www.obeythetestinggoat.com/how-to-get-selenium-to-wait-for-page-load-after-a-click.html
However, I think I have some problem with the Python yield keyword specifically in this code. Based on the above, I have the method:
@contextmanager
def wait_for_page_load(driver, timeout=30):
    old_page = driver.find_element_by_tag_name('html')
    yield WebDriverWait(driver, timeout).until(staleness_of(old_page))
This doesn't get called by Python; a breakpoint shows it is skipped.
I also have the same problem with the apparent original code:
@contextmanager
def wait_for_page_load(driver, timeout=30):
    old_page = driver.find_element_by_tag_name('html')  # up to here with the decorator, the function is called OK; with 'yield' it is NOT called
    yield
    WebDriverWait(driver, timeout).until(staleness_of(old_page))
But if I delete everything from the yield statement onwards, the function does at least get called:
@contextmanager
def wait_for_page_load(driver, timeout=30):
    old_page = driver.find_element_by_tag_name('html')
Does anyone know how I should write the yield statement? I'm not experienced with yield, but it looks like Python has to yield something, so perhaps the original code, which has yield on a line of its own, has a problem?

I think you might have missed the expected conditions here. Please try this code and see if it helps.
from selenium.webdriver.support import expected_conditions as EC
def wait_for_page_load(driver, timeout=30):
    old_page = driver.find_element_by_tag_name('html')
    yield WebDriverWait(driver, timeout).until(EC.staleness_of(old_page))
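For what it's worth, the pattern in the linked blog post only works when the function is decorated with @contextmanager and used in a with block: the code before the bare yield runs when the block is entered, the code after it runs when the block exits, and the function itself is never "called" in the ordinary sense. A minimal sketch of that usage (the 'Next' link is just an illustrative element, not from the question):
from contextlib import contextmanager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

@contextmanager
def wait_for_page_load(driver, timeout=30):
    old_page = driver.find_element_by_tag_name('html')
    yield  # hand control back to the body of the with block
    WebDriverWait(driver, timeout).until(EC.staleness_of(old_page))  # runs after the body

# the click happens inside the with block; the staleness wait runs on exit
with wait_for_page_load(driver):
    driver.find_element_by_link_text('Next').click()  # 'Next' is a hypothetical link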

Related

How to make an XPath optional

This XPath may sometimes be available and sometimes not.
If reject is truthy, then I am using an if statement:
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.firefox.options import Options
import time
import bs4
import requests

url = "abc"
options = Options()
options.set_preference("dom.webnotifications.enabled", False)
driver = webdriver.Firefox(executable_path=r"C:\driver\geckodriver.exe", options=options)
driver.get(url)
driver.maximize_window()

reject = driver.find_element_by_xpath("/html/body/div/div/div/main/div/section/div[2]/div[2]/div/ul/a[1]/div[3]/label")
if reject:
    driver.find_element_by_xpath("/html/body/div[1]/div/div/main/div/section/div[2]/div[2]/div/ul/a[1]/div[1]/span/i").click()
    time.sleep(1)
    driver.find_element_by_xpath("/html/body/div[1]/div/div/main/div/section/div[2]/div[2]/div/ul/a[1]/div[1]/div/ul/li[2]").click()
    time.sleep(2)
    driver.find_element_by_xpath("/html/body/div[3]/div/div/div[3]/button[1]").click()
    time.sleep(5)

# The above code blocks the code below from running (when reject does not exist).
neighbourhood = Select(driver.find_element_by_name("Locality"))
neighbourhood.select_by_value("5001641")
But the problem is that if this reject XPath doesn't exist, it throws an error and blocks the code below.
How can I make this reject step optional: if the XPath is available, handle it; if not, skip it and run the code below?
You could catch the exception. Something like the following:
...
reject = None
try:
    reject = driver.find_element_by_xpath("/html/body/div/div/div/main/div/section/div[2]/div[2]/div/ul/a[1]/div[3]/label")
except:
    print("No element found")
if reject:
    ...
If you need this more often, you could create a utility method for that.
def elementVisible(xpath):
    try:
        driver.find_element_by_xpath(xpath)
        return True
    except:
        return False
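A possible call site for that helper in the original flow might look like this (just a sketch, reusing the reject XPath from the question and assuming the same module-level driver):
reject_xpath = "/html/body/div/div/div/main/div/section/div[2]/div[2]/div/ul/a[1]/div[3]/label"
if elementVisible(reject_xpath):
    # only run the reject-handling clicks when the element actually exists
    driver.find_element_by_xpath("/html/body/div[1]/div/div/main/div/section/div[2]/div[2]/div/ul/a[1]/div[1]/span/i").click()
# otherwise fall straight through to the Locality selection below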
A try-except block will do the trick.
try:
    reject = driver.find_element_by_xpath("/html/body/div/div/div/main/div/section/div[2]/div[2]/div/ul/a[1]/div[3]/label")
except:
    print("An exception occurred")
Whenever the XPath is not found, the print statement is executed, and the rest of the code runs without the error you saw before.

Stale element exception - Python

I am retrieving my link using the following code:
os.environ['MOZ_HEADLESS'] = '1'
binary = FirefoxBinary('C:\\Program Files\\Mozilla Firefox\\firefox.exe', log_file=sys.stdout)
self.driver = webdriver.Firefox(firefox_binary=binary, executable_path='C:/chromedriver/geckodriver')
self.driver.get(link)
Next, I call:
xpath=".//a[#class='tileLink']"
ignored_exceptions = (NoSuchElementException, StaleElementReferenceException,)
your_element = WebDriverWait(self.driver, 30, ignored_exceptions=ignored_exceptions).until(
expected_conditions.presence_of_element_located((By.XPATH, xpath)))
and then
links = self.driver.find_elements_by_xpath(".//a[@class='tileLink']")
for link in links:
    href_ = link.get_attribute("href")  # <<-- Error here
and link.get_attribute(attribute) throws the stale element exception.
Now, given the WebDriverWait I thought I would avoid this issue, yet it persists.
I am tempted to take the page source, once it has loaded, and throw it into lxml to avoid this issue completely.
The time that passes between establishing links and iterating over them is a second at most.
Has anyone else experienced an issue like this, and found a solution?
Any guidance is appreciated.
If I have understood your question, you need to get the href attribute from all the elements identified by the XPath //a[@class='tileLink']. To achieve that, you can use the following code block:
# xpath=".//a[#class='tileLink']"
# ignored_exceptions = (NoSuchElementException, StaleElementReferenceException,)
# links = WebDriverWait(self.driver, 30, ignored_exceptions=ignored_exceptions).until(expected_conditions.presence_of_element_located((By.XPATH, xpath)))
links = WebDriverWait(self.driver, 30).until(expected_conditions.visibility_of_all_elements_located((By.XPATH, "//a[#class='tileLink']")))
for link in links:
print(link.get_attribute("href"))
I had a similar problem with some movable buttons on the page being stale.
How about something like:
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.action_chains import ActionChains

hrefs = []
for index, link in enumerate(links):
    attempts = 0
    while True:
        try:
            for action in range(0, 10):
                ActionChains(context.browser). \
                    move_to_element(links[index]).click().perform()
                href_ = link.get_attribute("href")
                if href_:
                    hrefs.append(href_)
                    break
            break
        except StaleElementReferenceException:
            attempts += 1
            if attempts > 10:
                break
I realise this is a very crude (primitive, even) solution, and assumes that the element becoming "un-stale" is a timing issue.
I'm also not that good with Python, so this may need some tweaking...
Thinking about it, since those elements are links, perhaps you don't want to click them, in which case delete the click() bit from the ActionChains line, or perhaps change it to context_click().
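A less click-heavy variant of the same retry idea (my own sketch, not part of the original answer: it simply re-finds the element by index whenever it goes stale, assuming the same tileLink XPath and the self.driver from the question):
from selenium.common.exceptions import StaleElementReferenceException

hrefs = []
count = len(self.driver.find_elements_by_xpath("//a[@class='tileLink']"))
for index in range(count):
    for attempt in range(10):
        try:
            # re-locate the element on every attempt so a stale reference is refreshed
            link = self.driver.find_elements_by_xpath("//a[@class='tileLink']")[index]
            hrefs.append(link.get_attribute("href"))
            break
        except StaleElementReferenceException:
            continue  # the DOM re-rendered; try again with a fresh lookup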

YouTube scraping with Selenium: not getting all comments

I am trying to scrape YouTube comments using Selenium with Python. Below is the code, which scrapes just one comment and throws an error:
driver = webdriver.Chrome()
url = "https://www.youtube.com/watch?v=MNltVQqJhRE"
driver.get(url)
wait(driver, 5500)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight + 500);")
driver.implicitly_wait(5000)
#content = driver.find_element_by_xpath('//*[@id="contents"]')
comm = driver.find_element_by_xpath('//div[@class="style-scope ytd-item-section-renderer"]')
comm1 = comm.find_elements_by_xpath('//yt-formatted-string[@id="content-text"]')
#print(comm.text)
for i in range(50):
    print(comm1[i].text, end=' ')
This is the output I am getting. How do I get all the comments on that page? Can anyone help me with this?
Being a sucessful phyton freelancer really mean to me because if I able to make $2000 in month I can really help my family financial, improve my skill, and have a lot of time to refreshing. So thanks Qazi, you really help me :D
Traceback (most recent call last):
  File "C:\Python36\programs\Web scrap\YT_Comm.py", line 19, in <module>
    print(comm1[i].text, end=' ')
IndexError: list index out of range
An IndexError means you’re attempting to access a position in a list that doesn’t exist. You’re iterating over your list of elements (comm1) exactly 50 times, but there are fewer than 50 elements in the list, so eventually you attempt to access an index that doesn’t exist.
Superficially, you can solve your problem by changing your iteration to loop over exactly as many elements as exist in your list—no more and no less:
for element in comm1:
    print(element.text, end=' ')
But that leaves you with the problem of why your list has fewer than 50 elements. The video you’re scraping has over 90 comments. Why doesn’t your list have all of them?
If you take a look at the page in your browser, you'll see that the comments load progressively using the infinite scroll technique: when the user scrolls to the bottom of the document, another "page" of comments is fetched and rendered, increasing the length of the document. To load more comments, you will need to trigger this behavior.
But depending on the number of comments, one fetch may not be enough. In order to trigger the fetch and rendering of all of the content, then, you will need to:
attempt to trigger a fetch of additional content, then
determine whether additional content was fetched, and, if so,
repeat (because there might be even more).
Triggering a fetch
We already know that additional content is fetched by scrolling to the bottom of the content container (the element with id #contents), so let's do that:
driver.execute_script(
    "window.scrollTo(0, document.querySelector('#contents').scrollHeight);")
(Note: Because the content resides in an absolute-positioned element, document.body.scrollHeight will always be 0 and will not trigger a scroll.)
Waiting for the content container
But as with any browser automation, we're in a race with the application: What if the content container hasn't rendered yet? Our scroll would fail.
Selenium provides WebDriverWait() to help you wait for the application to be in a particular state. It also provides, via its expected_conditions module, a set of common states to wait for, such as the presence of an element. We can use both of these to wait for the content container to be present:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
TIMEOUT_IN_SECONDS = 10
wait = WebDriverWait(driver, TIMEOUT_IN_SECONDS)
wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#contents")))
Determining whether additional content was fetched
At a high level, we can determine whether additional content was fetched by:
counting the content before we trigger the fetch,
counting the content after we trigger the fetch, then
comparing the two.
Counting the content
Within our container (with id "#contents"), each piece of content has id #content. To count the content, we can simply fetch each of those elements and use Python's built-in len():
count = len(driver.find_elements_by_css_selector("#contents #content"))
Handling a slow render
But again, we're in a race with the application: What happens if either the fetch or the render of additional content is slow? We won't immediately see it.
We need to give the web application time to do its thing. To do this, we can use WebDriverWait() with a custom condition:
def get_count():
    return len(driver.find_elements_by_css_selector("#contents #content"))

count = get_count()
# ...
wait.until(
    lambda _: get_count() > count)
Handling no additional content
But what if there isn't any additional content? Our wait for the count to increase will timeout.
As long as our timeout is high enough to allow sufficient time for the additional content to appear, we can assume that there is no additional content and ignore the timeout:
try:
    wait.until(
        lambda _: get_count() > count)
except TimeoutException:
    # No additional content appeared. Abort our loop.
    break
Putting it all together
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

TIMEOUT_IN_SECONDS = 10

wait = WebDriverWait(driver, TIMEOUT_IN_SECONDS)
driver.get(URL)
wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#contents")))

def get_count():
    return len(driver.find_elements_by_css_selector("#contents #content"))

while True:
    count = get_count()
    driver.execute_script(
        "window.scrollTo(0, document.querySelector('#contents').scrollHeight);")
    try:
        wait.until(
            lambda _: get_count() > count)
    except TimeoutException:
        # No additional content appeared. Abort our loop.
        break

elements = driver.find_elements_by_css_selector("#contents #content")
Bonus: Simplifying with capybara-py
With capybara-py, this becomes a bit simpler:
import capybara
from capybara.dsl import page
from capybara.exceptions import ExpectationNotMet

@capybara.register_driver("selenium_chrome")
def init_selenium_chrome_driver(app):
    from capybara.selenium.driver import Driver
    return Driver(app, browser="chrome")

capybara.current_driver = "selenium_chrome"
capybara.default_max_wait_time = 10

page.visit(URL)
contents = page.find("#contents")

elements = []
while True:
    try:
        elements = contents.find_all("#content", minimum=len(elements) + 1)
    except ExpectationNotMet:
        # No additional content appeared. Abort our loop.
        break
    page.execute_script(
        "window.scrollTo(0, arguments[0].scrollHeight);", contents)

Python3 Selenium: Wait for a Page Load in New Window

I've looked all over and found some very nice solutions to wait for page load and window load (and yes I know there are a lot of Stack questions regarding them in isolation). However, there doesn't seem to be a way to combine these two waits effectively.
Taking my primary inspiration from the end of this post, I came up with these two functions to be used in a "with" statement:
# This is within a class where the browser is at self.driver
@contextmanager
def waitForLoad(self, timeout=60):
    oldPage = self.driver.find_element_by_tag_name('html')
    yield
    WebDriverWait(self.driver, timeout).until(staleness_of(oldPage))

@contextmanager
def waitForWindow(self, timeout=60):
    oldHandles = self.driver.window_handles
    yield
    WebDriverWait(self.driver, timeout).until(
        lambda driver: len(oldHandles) != len(self.driver.window_handles)
    )
# example
button = self.driver.find_element_by_xpath(xpath)
if button:
    with self.waitForLoad():
        button.click()
These work great in isolation. However, they can't be combined due to the way each one checks their own internal conditions. For example, this code will fail to wait until the second page has loaded because the act of switching windows causes "oldPage" to become stale:
@contextmanager
def waitForWindow(self, timeout=60):
    with self.waitForLoad(timeout):
        oldHandles = self.driver.window_handles
        yield
        WebDriverWait(self.driver, timeout).until(
            lambda driver: len(oldHandles) != len(self.driver.window_handles)
        )
        self.driver.switch_to_window(self.driver.window_handles[-1])
Is there some Selenium method that will allow these to work together?
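There is no single built-in Selenium wait that covers both, but one possible workaround (just a sketch, not from the thread: it waits for the new window handle first, switches to it, and only then waits for that window's document to finish loading via document.readyState rather than staleness, since the old page's html element lives in a different window):
from contextlib import contextmanager
from selenium.webdriver.support.ui import WebDriverWait

@contextmanager
def waitForNewWindowLoad(self, timeout=60):  # hypothetical combined helper
    oldHandles = self.driver.window_handles
    yield
    # first wait for the new window handle to appear
    WebDriverWait(self.driver, timeout).until(
        lambda driver: len(driver.window_handles) > len(oldHandles)
    )
    # switch to the new window, then wait for its document to finish loading
    self.driver.switch_to.window(self.driver.window_handles[-1])
    WebDriverWait(self.driver, timeout).until(
        lambda driver: driver.execute_script("return document.readyState") == "complete"
    )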

Do Selenium's waits always need a timeout? (Python)

In the Selenium docs we can see that we must set some timeout for a wait.
For example, here is code from those docs:
wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.ID,'someid')))
I wonder: must we always set some timeout? Or is there some method that will wait until all of the AJAX has finished downloading, and only after that have the driver interact with the web elements (I mean without any fixed timeout; it just loads everything and only then starts interacting)?
Hopefully this code will help you. This is how I solved this issue.
# Check with jQuery if it has any outstanding ajax
def ajax_complete(self):
    try:
        return 0 == self.execute_script("return jQuery.active")
    except:
        pass

# Create a method to wait for ajax to complete
driver.wait_for_ajax = lambda: WebDriverWait(driver, 10).until(ajax_complete, "")
driver.implicitly_wait(30)
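With that in place, a typical call site might look like this (a sketch; the URL and element ID are placeholders, not from the question):
driver.get("https://example.com/some-page")   # placeholder URL
driver.wait_for_ajax()                        # blocks until jQuery.active == 0 (or the 10 s timeout)
driver.find_element_by_id("someid").click()   # interact once the AJAX has settled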
