Python Selenium: click an element while it is displayed

I'm scraping a dynamic page that requires the user to click a "Load More Results" button several times in order to get all data. Is there a better way to approach the task of clicking on an element while displayed?
def clickon(xpath):
    try:
        element = driver.find_element_by_xpath(xpath)
    except:
        print("RETRYING CLICKON() for %s" % xpath)
        time.sleep(1)
        clickon(xpath)
    else:
        element.click()
        time.sleep(3)

def click_element_while_displayed(xpath):
    element = driver.find_element_by_xpath(xpath)
    try:
        while element.is_displayed():
            clickon(xpath)
    except:
        pass

I suspect you are asking this question because the current solution is slow. This is mostly because of the hardcoded time.sleep() delays, which usually wait longer than they need to. To tackle this, I'd switch to Explicit Waits: start an endless loop and break out of it once Selenium can no longer wait for the button to become clickable:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
wait = WebDriverWait(driver, 10)
while True:
    try:
        element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath)))
        element.click()
    except TimeoutException:
        break  # cannot click the button anymore

    # TODO: wait for the results of the click action
Now the last TODO part is also important: here I suggest you wait for a specific condition that indicates the click actually resulted in something on the page, for instance, that more results were loaded. You can use a custom Expected Condition for that; a sketch follows.
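For illustration, a minimal sketch of such a custom condition. The locator and the result markup are hypothetical placeholders, not taken from the original page:

class more_results_loaded(object):
    """Hypothetical custom Expected Condition: truthy once the number of
    result elements exceeds the count recorded before the click."""

    def __init__(self, locator, previous_count):
        self.locator = locator
        self.previous_count = previous_count

    def __call__(self, driver):
        return len(driver.find_elements(*self.locator)) > self.previous_count


# inside the loop, before clicking:
results_locator = (By.XPATH, "//div[@class='result']")  # hypothetical result locator
count = len(driver.find_elements(*results_locator))
element.click()
wait.until(more_results_loaded(results_locator, count))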

Related

Why can't I find an element using Selenium?

I'm trying to find an element on Yandex.ru through Selenium and click on it.
The code runs without errors, but the click does not happen; I'm assuming Selenium doesn't see the element.
def start_bot():
    for i in req:
        browser.get('https://yandex.ru/search/?lr=65&text=' + i)
        time.sleep(2)
        print(browser.find_element(By.XPATH, '//*[@id="search-result"]/li[1]/div/div[2]/div[2]').click())

start_bot()
Referring to the parent class is not an option; how can I solve this problem?
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
# Assuming browser is already defined
try:
    max_delay = 10  # seconds
    browser.get('https://yandex.ru/search/?lr=65&text=1')
    button = WebDriverWait(browser, max_delay).until(
        EC.element_to_be_clickable((By.XPATH, '//*[@id="search-result"]/li[1]/div/div[2]/div[2]'))
    )
    button.click()
except TimeoutException:
    print('Timeout: Element not found or clickable')
Instead of waiting for a fixed delay (2 seconds) as you have done, this code waits for the element specified by the XPath to become clickable. If the element is not clickable after max_delay (10 seconds), a TimeoutException is raised.
You can easily adapt this code to open multiple URLs in a loop (as you have done in your code); see the sketch below.
PS: Please fix your question's formatting (the function call start_bot() is rendered as text, not code).
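Here is that adaptation, reusing max_delay from above and assuming req is the list of search terms from your start_bot():

for term in req:
    browser.get('https://yandex.ru/search/?lr=65&text=' + term)
    try:
        button = WebDriverWait(browser, max_delay).until(
            EC.element_to_be_clickable((By.XPATH, '//*[@id="search-result"]/li[1]/div/div[2]/div[2]'))
        )
        button.click()
    except TimeoutException:
        print('Timeout: element not clickable for query ' + term)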

With Selenium python, do I need to refresh the page when waiting for hidden btn to appear and be clickable?

I'm trying to make a small program that looks at a web page for a hidden button (it uses hide in its class) and waits for it to become clickable before clicking it. The code is below. I'm wondering whether WebDriverWait and element_to_be_clickable will already be refreshing things, or whether I would have to manually refresh the page.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
from selenium.common.exceptions import WebDriverException
driver = webdriver.Firefox()
driver.get(<URL>)

print("beginning 120s wait")
time.sleep(120)
print("finished 120s wait")

try:
    element = WebDriverWait(driver, 1000).until(
        EC.element_to_be_clickable((By.CLASS_NAME, "btn add"))
    )
    print("It went through")
    element.click()
    driver.execute_script("alert('It went through!');")
finally:
    driver.execute_script("alert('Did it work?');")
First of all, I am not really sure whether just searching by the class name minus the "hide" part will actually find the correct element, but the larger issue is that I do not know whether the button only becomes visible after refreshing the page. If I need to refresh, it gets annoying, because most sites throw up additional captchas in both Firefox and Chrome when they figure out a bot is accessing the site. (That's why I have the initial sleep: so that I can finish any captcha manually first.)
So, do I need a refresh in my code, or will it be fine without it? If I do need it, how do I implement it? Do I just add it like:
try:
    element = WebDriverWait(driver, 1000).until(
        driver.refresh()
        EC.element_to_be_clickable((By.CLASS_NAME, "btn add"))
    )
And sorry if this has been answered elsewhere; I searched a bunch but have not quite found the answer on this site.
First, you shouldn't use sleep; WebDriverWait with the correct EC will do the trick.
As for the EC.element_to_be_clickable this is the code behind the function:
def element_to_be_clickable(locator):
    """An Expectation for checking an element is visible and enabled such that
    you can click it."""
    def _predicate(driver):
        element = visibility_of_element_located(locator)(driver)
        if element and element.is_enabled():
            return element
        else:
            return False

    return _predicate
As you can see, the EC.element_to_be_clickable function does not refresh the browser.
If you insist that you need the refresh, the correct way to implement it would be:
from selenium.common.exceptions import StaleElementReferenceException  # needed for the except clause

try:
    element = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.CLASS_NAME, "btn add")))
except (NoSuchElementException, StaleElementReferenceException):
    driver.refresh()
    element = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.CLASS_NAME, "btn add")))
I don't think the refresh will help with the hidden element, though.
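If the button starts out with a hide class that the page later removes, one alternative is to poll the class attribute directly. A minimal sketch, assuming (hypothetically) that the button's other classes are btn and add and that dropping hide is what makes it clickable; note that By.CLASS_NAME cannot take the compound "btn add", so a CSS selector is used instead:

wait = WebDriverWait(driver, 1000)
# keep polling until some button.btn.add no longer carries the "hide" class;
# until() treats a False return as "not ready yet" and retries
element = wait.until(
    lambda d: next(
        (el for el in d.find_elements(By.CSS_SELECTOR, 'button.btn.add')
         if 'hide' not in (el.get_attribute('class') or '')),
        False)
)
element.click()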

Using Selenium to get past dropdown (md-select)

I'm stuck with a dropdown that I can't get past in Selenium.
I'm trying to collect some price data using Selenium from this link:
https://xxx. On this page, you need to click a button (Next), select any option in the subsequent dropdown, then press Next again to advance to the information page from which I want to collect the data. I'm stuck at the dropdown: I am unable to select any option.
This is my code thus far:
browser.get("https://xxx/#/pricePlans/step1")
wait = WebDriverWait(browser, 10)
while True:
try:
button = browser.find_element_by_css_selector('body > div.md-dialog-container.ng-scope > md-dialog > md-dialog-actions > div > button')
except TimeoutException:
break
button.click()
options_box= browser.find_element_by_class_name('bullet-content-title')
wait = WebDriverWait(browser, 5)
options_box.click()
The issue lies with the dropdown options (it has options like HDB 1-room, HDB 2-room, etc.). I tried to reference the option box by XPath, CSS selector, and class name (as seen above), but with the snippet above Spyder reports a time-out. Other snippets I tried included:
ui.WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "bullet-content-title")))
using XPath and class name, but no luck.
I'm a newbie at web scraping who got this far by searching SO, but I am unable to find many solutions regarding md-select dropdowns.
I also attempted to use
ActionChains(driver).move_to_element(options_box).click(options_box)
but I did not see any clicking or mouse movement, so I'm stumped.
I appreciate any advice at this point of time. Thank you so much!
Edit:
Code Snippets and Responses:
from selenium import webdriver
from selenium.webdriver.support import ui
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.action_chains import ActionChains
option = webdriver.ChromeOptions()
option.add_argument('--incognito')
browser = webdriver.Chrome(executable_path='C:\\ChromeDriver\\chromedriver.exe', options=option)
browser.get("https://xxx")
wait = WebDriverWait(browser, 10)
while True:
    try:
        button = browser.find_element_by_css_selector('body > div.md-dialog-container.ng-scope > md-dialog > md-dialog-actions > div > button')
    except TimeoutException:
        break
    button.click()

options_box = browser.find_element_by_class_name('bullet-content-title')
wait = WebDriverWait(browser, 5)
options_box.click()
This returns "StaleElementReferenceException: stale element reference: element is not attached to the page document"
Which I assume it is due to the presence of the second "Next" Button which is inert at the moment.
options_box = ui.WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "bullet-content-title")))
options_box.click()
Does nothing; Spyder eventually returned a TimeoutException.
@AndrewRay's answer is good for getting the value but not for selecting the options. You can do this to select the options:
#browser.get("https://......")
wait = WebDriverWait(browser, 10)

try:
    browser.find_element_by_css_selector('button.green-btn').click()
    # wait until the dialog disappears
    wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, 'md-dialog[aria-describedby="dialogContent_0"]')))
    # click the dropdown
    browser.find_element_by_css_selector('md-input-container').click()
    # select the option element
    setOptionElement = browser.find_element_by_css_selector('md-option[value="HDB Executive"]')
    # need to scrollIntoView if the option is near the bottom,
    # or you get an "element not clickable" error
    browser.execute_script('arguments[0].scrollIntoView();arguments[0].click()', setOptionElement)
except Exception as ex:
    print(ex)
driver.get('https://compare.openelectricitymarket.sg/#/pricePlans/step1')
time.sleep(5)
next_btn = driver.find_element_by_css_selector('button.green-btn')
next_btn.click()
dropdown = driver.find_element_by_id('select_4')
options = dropdown.find_elements_by_tag_name('md-option')
for option in options:
    print(option.get_attribute('value'))
Hope this helps. Use the .get_attribute method to find the value of the option, and click the option that matches the desired value; a sketch follows. :)
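Combining the two snippets above, a hedged sketch that walks the md-option elements and clicks the matching one (the target value 'HDB 1-room' is an assumption taken from the question):

desired = 'HDB 1-room'  # hypothetical target value from the question
dropdown = driver.find_element_by_id('select_4')
for option in dropdown.find_elements_by_tag_name('md-option'):
    if option.get_attribute('value') == desired:
        # scroll into view first, as the earlier answer notes, then click
        driver.execute_script('arguments[0].scrollIntoView();arguments[0].click()', option)
        break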

How can I wait till the page reloads [duplicate]

I want to scrape all the data of a page implemented with infinite scroll. The following Python code works:
for i in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
This means that every time I scroll down to the bottom, I wait 5 seconds, which is generally enough for the page to finish loading the newly generated content. But this may not be time efficient: the page may finish loading the new content within 5 seconds. How can I detect whether the page has finished loading the new content each time I scroll down? If I could detect this, I could scroll down again as soon as the page finished loading, which is more time efficient.
The webdriver waits for a page to load by default via the .get() method.
As you may be looking for some specific element, as @user227215 said, you should use WebDriverWait to wait for an element located on your page:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
I have used it for checking alerts. You can use any of the other locator strategies to find your element.
EDIT 1:
I should mention that the webdriver waits for a page to load by default. It does not wait for loading inside frames or for AJAX requests. It means that when you use .get('url'), your browser will wait until the page is completely loaded and then move on to the next command in the code. But when you are posting an AJAX request, webdriver does not wait, and it's your responsibility to wait an appropriate amount of time for the page, or a part of the page, to load; that is why there is a module named expected_conditions.
Trying to pass find_element_by_id to the constructor for presence_of_element_located (as shown in the accepted answer) caused NoSuchElementException to be raised. I had to use the syntax in fragles' comment:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get('url')
timeout = 5
try:
    element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")
This matches the example in the documentation. Here is a link to the documentation for By.
Find below 3 methods:
readyState
Checking page readyState (not reliable):
def page_has_loaded(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    page_state = self.driver.execute_script('return document.readyState;')
    return page_state == 'complete'
The wait_for helper function is good, but unfortunately click_through_to_new_page is open to the race condition where we manage to execute the script in the old page, before the browser has started processing the click, and page_has_loaded just returns true straight away.
id
Comparing new page ids with the old one:
def page_has_loaded_id(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    try:
        new_page = browser.find_element_by_tag_name('html')
        return new_page.id != old_page.id
    except NoSuchElementException:
        return False
It's possible that comparing ids is not as effective as waiting for stale reference exceptions.
staleness_of
Using staleness_of method:
@contextlib.contextmanager
def wait_for_page_load(self, timeout=10):
    self.log.debug("Waiting for page to load at {}.".format(self.driver.current_url))
    old_page = self.find_element_by_tag_name('html')
    yield
    WebDriverWait(self, timeout).until(staleness_of(old_page))
For more details, check Harry's blog.
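For context, the yield is what lets the click run inside the with block; a hypothetical usage sketch, assuming the method lives on a class that wraps the driver:

# hypothetical usage inside the same driver-wrapping class
with self.wait_for_page_load(timeout=10):
    self.find_element_by_link_text('Next').click()
# only reached once the old <html> element has gone stale, i.e. a new page loaded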
As mentioned in the answer from David Cullen, I've always seen recommendations to use a line like the following:
element_present = EC.presence_of_element_located((By.ID, 'element_id'))
WebDriverWait(driver, timeout).until(element_present)
It was difficult for me to find in one place all the possible locators that can be used with By, so I thought it would be useful to provide the list here.
According to Web Scraping with Python by Ryan Mitchell:
ID: used in the example; finds elements by their HTML id attribute.
CLASS_NAME: used to find elements by their HTML class attribute. Why is this function CLASS_NAME and not simply CLASS? Using the form object.CLASS would create problems for Selenium's Java library, where .class is a reserved method. To keep the Selenium syntax consistent between different languages, CLASS_NAME was used instead.
CSS_SELECTOR: finds elements by their class, id, or tag name, using the #idName, .className, tagName convention.
LINK_TEXT: finds HTML tags by the text they contain. For example, a link that says "Next" can be selected using (By.LINK_TEXT, "Next").
PARTIAL_LINK_TEXT: similar to LINK_TEXT, but matches on a partial string.
NAME: finds HTML tags by their name attribute. This is handy for HTML forms.
TAG_NAME: finds HTML tags by their tag name.
XPATH: uses an XPath expression ... to select matching elements.
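As a quick illustration (the markup is hypothetical, not from the book), several of these strategies can locate the same element:

from selenium.webdriver.common.by import By

# hypothetical markup: <a id="next" class="nav" name="next-link" href="/page/2">Next</a>
same_link = [
    driver.find_element(By.ID, 'next'),
    driver.find_element(By.CLASS_NAME, 'nav'),
    driver.find_element(By.CSS_SELECTOR, 'a#next.nav'),
    driver.find_element(By.LINK_TEXT, 'Next'),
    driver.find_element(By.PARTIAL_LINK_TEXT, 'Nex'),
    driver.find_element(By.NAME, 'next-link'),
    driver.find_element(By.TAG_NAME, 'a'),
    driver.find_element(By.XPATH, "//a[@id='next']"),
]  # each call resolves to the same <a> element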
From selenium/webdriver/support/wait.py:
from selenium.webdriver.support.wait import WebDriverWait

driver = ...
element = WebDriverWait(driver, 10).until(
    lambda x: x.find_element_by_id("someId"))
On a side note: instead of scrolling down 100 times, you can check whether there are no more modifications to the DOM (for the case where the bottom of the page is AJAX lazy-loaded):
def scrollDown(driver, value):
    driver.execute_script("window.scrollBy(0," + str(value) + ")")

# Scroll down the page
def scrollDownAllTheWay(driver):
    old_page = driver.page_source
    while True:
        logging.debug("Scrolling loop")
        for i in range(2):
            scrollDown(driver, 500)
            time.sleep(2)
        new_page = driver.page_source
        if new_page != old_page:
            old_page = new_page
        else:
            break
    return True
Have you tried driver.implicitly_wait? It is like a setting for the driver, so you only call it once per session, and it basically tells the driver to wait the given amount of time until each command can be executed.
driver = webdriver.Chrome()
driver.implicitly_wait(10)
So if you set a wait time of 10 seconds, it will execute the command as soon as possible, waiting up to 10 seconds before it gives up. I've used this in similar scroll-down scenarios, so I don't see why it wouldn't work in your case. Hope this is helpful.
Be sure to use a lowercase 'w' in implicitly_wait.
Here I did it using a rather simple form:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("url")
searchTxt = ''
while not searchTxt:
    try:
        searchTxt = browser.find_element_by_name('NAME OF ELEMENT')
        searchTxt.send_keys("USERNAME")
    except:
        continue
A solution for AJAX pages that continuously load data. The previous methods stated do not work. What we can do instead is grab the page DOM, hash it, and compare old and new hash values over a delta time.
import time
from selenium import webdriver
def page_has_loaded(driver, sleep_time=2):
    '''
    Waits for page to completely load by comparing current page hash values.
    '''

    def get_page_hash(driver):
        '''
        Returns html dom hash
        '''
        # can find element by either the 'html' tag or by the html 'root' id
        dom = driver.find_element_by_tag_name('html').get_attribute('innerHTML')
        # dom = driver.find_element_by_id('root').get_attribute('innerHTML')
        dom_hash = hash(dom.encode('utf-8'))
        return dom_hash

    page_hash = 'empty'
    page_hash_new = ''

    # comparing old and new page DOM hashes to verify the page is fully loaded
    while page_hash != page_hash_new:
        page_hash = get_page_hash(driver)
        time.sleep(sleep_time)
        page_hash_new = get_page_hash(driver)
        print('<page_has_loaded> - page not loaded')

    print('<page_has_loaded> - page loaded: {}'.format(driver.current_url))
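A possible call site, for example right after triggering navigation (the URL is a placeholder); the function simply blocks until two consecutive DOM hashes match:

driver.get('http://example.com')  # placeholder URL
page_has_loaded(driver, sleep_time=2)
# the DOM hash has stabilized; safe to read the freshly loaded content here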
How about putting WebDriverWait in a while loop and catching the exceptions?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3  # seconds

while True:
    try:
        WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
        print("Page is ready!")
        break  # it will break from the loop once the specific element is present
    except TimeoutException:
        print("Loading took too much time! Trying again...")
You can do that very simply with this function:
def page_is_loading(driver):
    # document.readyState reports "complete" once the page and its
    # synchronous resources have finished loading
    return driver.execute_script("return document.readyState") == "complete"
and when you want to do something after the page has loaded completely, you can use:

Driver = webdriver.Firefox(options=Options, executable_path='geckodriver.exe')
Driver.get("https://www.google.com/")

while not page_is_loading(Driver):
    continue

Driver.execute_script("alert('page is loaded')")
Use this in your code:
from selenium import webdriver
driver = webdriver.Firefox() # or Chrome()
driver.implicitly_wait(10) # seconds
driver.get("http://www.......")
Or you can use this code if you are looking for a specific tag:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()  # or Chrome()
driver.get("http://www.......")

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "tag_id"))
    )
finally:
    driver.quit()
Very good answers here. A quick example of waiting for an XPath:

# wait for sizes to load - 2s timeout
try:
    WebDriverWait(driver, 2).until(expected_conditions.presence_of_element_located(
        (By.XPATH, "//div[@id='stockSizes']//a")))
except TimeoutException:
    pass
I struggled a bit to get this working, as it didn't work for me as expected. Anyone who is still struggling to get this working may check this.
I want to wait for an element to be present on the webpage before proceeding with my manipulations.
We can use WebDriverWait(driver, 10, 1).until(), but the catch is that until() expects a function which it keeps executing, every 1 second, for the timeout provided (in our case, 10 seconds). So keeping it like below worked for me:
wait_for_element = WebDriverWait(driver, 10, 1)
element_found = wait_for_element.until(lambda x: x.find_element_by_class_name("MY_ELEMENT_CLASS_NAME").is_displayed())
Here is what until() does behind the scenes:
def until(self, method, message=''):
    """Calls the method provided with the driver as an argument until the \
    return value is not False."""
    screen = None
    stacktrace = None

    end_time = time.time() + self._timeout
    while True:
        try:
            value = method(self._driver)
            if value:
                return value
        except self._ignored_exceptions as exc:
            screen = getattr(exc, 'screen', None)
            stacktrace = getattr(exc, 'stacktrace', None)
        time.sleep(self._poll)
        if time.time() > end_time:
            break
    raise TimeoutException(message, screen, stacktrace)
If you are trying to scroll and find all items on a page, you can consider using the following. It is a combination of a few methods mentioned by others here, and it did the job for me:
while True:
    try:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        driver.implicitly_wait(30)
        time.sleep(4)

        elem1 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
        len_elem_1 = len(elem1)
        print(f"A list length {len_elem_1}")

        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        driver.implicitly_wait(30)
        time.sleep(4)

        elem2 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
        len_elem_2 = len(elem2)
        print(f"B list length {len_elem_2}")

        if len_elem_1 == len_elem_2:
            print(f"final length = {len_elem_1}")
            break
    except TimeoutException:
        print("Loading took too much time!")
Selenium can't detect whether the page is fully loaded, but JavaScript can. I suggest you try this:
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 100).until(lambda driver: driver.execute_script('return document.readyState') == 'complete')
This executes JavaScript instead of Python, because JavaScript can detect when the page is fully loaded: document.readyState will report 'complete'. This code means: for up to 100 seconds, keep polling document.readyState until 'complete' shows.
nono = driver.current_url
driver.find_element(By.XPATH, "//button[@value='Send']").click()
while driver.current_url == nono:
    pass
print("page loaded.")

webdriver wait for ajax request in python

Currently I am writing a webdriver test for a search box which uses AJAX for suggestions. The test works well if I add an explicit wait after typing the search content and before pressing Enter:
wd.find_element_by_xpath("//div[@class='searchbox']/input").send_keys("obama")
time.sleep(2)
wd.find_element_by_xpath("//div[@class='searchbox']/input").send_keys(Keys.RETURN)
but
wd.find_element_by_xpath("//div[@class='searchbox']/input").send_keys("obama")
wd.find_element_by_xpath("//div[@class='searchbox']/input").send_keys(Keys.RETURN)
fails. I am running the tests on EC2 with 1 virtual CPU. I suspect that Enter is pressed even before the GET requests related to the search are sent; if Enter is pressed before the suggestions appear, the test fails.
Is there any better way than adding explicit waits?
Add this method, which ensures the AJAX responses are back from the server:
def wait_for_ajax(driver):
    wait = WebDriverWait(driver, 15)
    try:
        wait.until(lambda driver: driver.execute_script('return jQuery.active') == 0)
        wait.until(lambda driver: driver.execute_script('return document.readyState') == 'complete')
    except Exception:
        pass
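Applied to the search box from the question, the flow would look like this (note that the jQuery.active check assumes the page actually uses jQuery):

search = wd.find_element_by_xpath("//div[@class='searchbox']/input")
search.send_keys("obama")
wait_for_ajax(wd)  # returns once jQuery.active == 0 and readyState is 'complete'
search.send_keys(Keys.RETURN)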
You can indeed add an explicit wait for the presence of an element, like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait  # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC  # available since 2.26.0

ff = webdriver.Firefox()
ff.get("http://somedomain/url_that_delays_loading")
ff.find_element_by_xpath("//div[@class='searchbox']/input").send_keys("obama")
try:
    element = WebDriverWait(ff, 10).until(EC.presence_of_element_located((By.ID, "keywordSuggestion")))
finally:
    ff.find_element_by_xpath("//div[@class='searchbox']/input").send_keys(Keys.RETURN)
ff.quit()
See: http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp#explicit-and-implicit-waits
And what about:
driver.implicitly_wait(10)
for your example:
wd.implicitly_wait(10)
In this case, every time you are going to click or find an element, the driver retries the action every 0.5 seconds for up to 10 seconds. With this you don't need to add a wait every time.
Note: this only applies to locating elements on the screen. It will not wait for JS actions to finish.
